Reconstructing Ruby, Part 6: Getting useful syntax errors
Read Part 5 in case you missed it.
We're going to start expanding our parser to better understand ruby programs. The example program we're going to target first is this one from the Ruby homepage:
# The Greeter class
class Greeter
  def initialize(name)
    @name = name.capitalize
  end

  def salute
    puts "Hello #{@name}!"
  end
end

# Create a new object
g = Greeter.new("world")

# Output "Hello World!"
g.salute
Let's change your program.rb file to match the code above and then run ./ruby program.rb. You should see the following failure:
ID(class)
CONSTANT(Greeter)
ID(def)
ID(initialize)
LPAREN
ID(name)
RPAREN
AT
ID(name)
EQUAL
ID(name)
DOT
ID(capitalize)
ID(end)
ID(def)
ID(salute)
ID(puts)
STRING("Hello #{@name}!")
ID(end)
ID(end)
ID(g)
EQUAL
CONSTANT(Greeter)
DOT
ID(new)
LPAREN
STRING("world")
RPAREN
ID(g)
DOT
ID(salute)
syntax error
That's not a very descriptive error message. To get useful information about our errors, let's move our yyerror function from parse.y to the bottom of ruby.l and make the following changes:
%%
void yyerror(char const *s) {
  fprintf(stderr,
          "%s. Unexpected \"%s\" on line %d\n",
          s,
          yytext,
          yylineno);
}
yytext will contain the text of the current token and yylineno will contain the line number of that token. yylineno isn't provided by default, so we'll have to specify the option in our ruby.l file. Right after the option for noyywrap add:
%option yylineno
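To see what that format string produces without building the whole lexer, here is a minimal standalone sketch. The names demo_text, demo_lineno and format_error are hypothetical stand-ins (they are not part of flex or of this project); in the real yyerror the values come from yytext and yylineno.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the globals flex provides to the real yyerror. */
const char *demo_text = "class"; /* plays the role of yytext   */
int demo_lineno = 2;             /* plays the role of yylineno */

/* Builds the same message as yyerror, but into a buffer instead of stderr
 * so the result can be inspected. Returns what snprintf returns. */
int format_error(char *buf, size_t size, const char *msg) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%s. Unexpected \"%s\" on line %d\n",
                    msg, demo_text, demo_lineno);
}
```

Calling format_error(buf, sizeof buf, "syntax error") with the values above fills buf with the message followed by the offending text and its line number, which is the shape of output we're aiming for from the real parser.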
Now, much like we did for yylex, we'll need to add the following declaration to parse.y:
extern void yyerror(const char *s);
If you run make && ./ruby program.rb again you'll get something rather unexpected:
syntax error. Unexpected "" on line 17
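The token text is empty because our lexer isn't actually generating any tokens (except for tNUMBER and tPLUS). It prints everything else and moves on, so the first thing the parser hears about is the end of the file, and by then there is no token text left to report.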
We're going to change our lexer to stop printing output and produce tokens instead. Everywhere in ruby.l where we have TYPE("SOMETHING") we're going to change it to TOKEN(SOMETHING). Note the lack of double quotes there; this will be important when we define our macro. If you're using Vim you can type :%s/TYPE("\(.*\)")/TOKEN(\1)/<ENTER> to make the change, but you'll need to manually change return tPLUS to TOKEN(PLUS). Then for each of our VTYPE rules we'll also add a TOKEN call. Your rules will look like this:
%%
#.*$ {}
\"([^"]|\\.)*\" { VTYPE("STRING", yytext); TOKEN(STRING); }
\'([^']|\\.)*\' { VTYPE("STRING", yytext); TOKEN(STRING); }
{NUMBER}(\.{NUMBER}|(\.{NUMBER})?[eE][+-]?{NUMBER}) {
VTYPE("FLOAT", yytext); TOKEN(FLOAT); }
{NUMBER} { yylval = atoi(yytext); TOKEN(NUMBER); }
[a-z_][a-zA-Z0-9_]* { VTYPE("ID", yytext); TOKEN(ID); }
[A-Z][a-zA-Z0-9_]* { VTYPE("CONSTANT", yytext); TOKEN(CONSTANT); }
"=" { TOKEN(EQUAL); }
">" { TOKEN(GT); }
"<" { TOKEN(LT); }
">=" { TOKEN(GTE); }
"<=" { TOKEN(LTE); }
"!=" { TOKEN(NEQUAL); }
"+" { TOKEN(PLUS); }
"-" { TOKEN(MINUS); }
"*" { TOKEN(MULT); }
"/" { TOKEN(DIV); }
"%" { TOKEN(MOD); }
"!" { TOKEN(EMARK); }
"?" { TOKEN(QMARK); }
"&" { TOKEN(AND); }
"|" { TOKEN(OR); }
"[" { TOKEN(LSBRACE); }
"]" { TOKEN(RSBRACE); }
"(" { TOKEN(LPAREN); }
")" { TOKEN(RPAREN); }
"{" { TOKEN(LBRACE); }
"}" { TOKEN(RBRACE); }
"@" { TOKEN(AT); }
"." { TOKEN(DOT); }
"," { TOKEN(COMMA); }
":" { TOKEN(COLON); }
[\t ] {}
\n {}
. { fprintf(stderr, "Unknown token '%s'\n", yytext); }
%%
At the top of our file let's replace our TYPE macro with this TOKEN macro:
#define TOKEN(id) return t##id
If you haven't seen ## before, it's the C preprocessor's token-pasting operator. Here it glues t onto the front of whatever is passed as id. Essentially TOKEN(EQUAL); will expand to return tEQUAL; which is the format of our tokens.
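The macro is easy to try outside of flex. The sketch below uses a made-up enum (demo_token) in place of the token definitions bison generates, and two hypothetical functions that stand in for rule actions; only the TOKEN macro itself is taken from the lexer.

```c
#include <assert.h>

/* Hypothetical token codes for this demo only. In the real project these
 * come from the header bison generates out of the %token lines. */
enum demo_token { tEQUAL = 258, tPLUS, tNUMBER };

/* Same macro as in ruby.l: ## pastes "t" and the argument into one name. */
#define TOKEN(id) return t##id

/* Each function plays the role of one lexer action. */
int scan_equal(void) { TOKEN(EQUAL); } /* expands to: return tEQUAL; */
int scan_plus(void)  { TOKEN(PLUS); }  /* expands to: return tPLUS;  */
```

Because the pasting happens on identifiers, TOKEN("EQUAL") with quotes would not produce a valid name, which is why the quotes had to go when converting from TYPE.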
Now we need to define tokens in our parser for all the tokens we're using in our lexer. Inside of parse.y change the lines:
%token tNUMBER
%token tPLUS
to:
%token tSTRING tFLOAT tNUMBER tID tCONSTANT tEQUAL tGT tLT tGTE tLTE
%token tNEQUAL tPLUS tMINUS tMULT tDIV tMOD tEMARK tQMARK tAND tOR
%token tLSBRACE tRSBRACE tLPAREN tRPAREN tLBRACE tRBRACE tAT tDOT
%token tCOMMA tCOLON
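Each name on a %token line ends up as a distinct integer constant that the lexer returns and the parser matches on. As a rough sketch of that idea, assuming the numbering starts at 258 the way bison's generated headers usually do (the exact values are bison's choice, not ours), here are the first few tokens with a hypothetical demo_token_name helper that is handy when debugging:

```c
#include <assert.h>
#include <string.h>

/* Assumed shape of the generated constants for the first five tokens.
 * The real definitions live in the header bison writes for parse.y. */
enum demo_tokentype { tSTRING = 258, tFLOAT, tNUMBER, tID, tCONSTANT };

/* Hypothetical helper: maps a token code back to its name. */
const char *demo_token_name(int token) {
    switch (token) {
    case tSTRING:   return "tSTRING";
    case tFLOAT:    return "tFLOAT";
    case tNUMBER:   return "tNUMBER";
    case tID:       return "tID";
    case tCONSTANT: return "tCONSTANT";
    default:        return "unknown";
    }
}
```

If the lexer returns a name that has no matching %token declaration, the build fails with an undeclared identifier, so the two lists have to be kept in sync.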
Now if you run make && ./ruby program.rb you should see the following syntax error:
ID(class)
syntax error. Unexpected "class" on line 2
This is a far more reasonable error message that we can start doing something about. In the next post we'll fix all our syntax errors to start filling out our grammar.
If you're having any problems you can check the reference implementation on GitHub or look specifically at the diff to see what I've changed. Additionally, if you have any comments or feedback I'd greatly appreciate if you left a comment!
Update: read Part 7 of this series.