Reconstructing Ruby, Part 5: The dreaded shift/reduce
Read Part 4 in case you missed it.
Sorry for the months of silence. While this post is going to be short, I'm getting back to working on new content for the series.
So one problem that you'll often see when writing a parser is a shift/reduce conflict. In fact, at this point, if you run make
you'll probably see the following:
bison -d parse.y
parse.y: conflicts: 1 shift/reduce
flex ruby.l
cc -o ruby main.c parse.tab.c lex.yy.c
A shift/reduce conflict happens when we introduce an ambiguity in our grammer. An ambiguity means the parser doesn't know if it should shift the next terminal onto the stack or if it should reduce according to a specified rule. You might also encounter a reduce/reduce conflict where the parser doesn't know which rule it should use to reduce, but those are typically easy to resolve.
Given we have a shift/reduce conflict, it would be useful for us to be able to see what is causing the current shift/reduce conflict. We can do this by changing our Makefile
rule for parse.tab.c
to the following:
parse.tab.c: parse.y
bison -v -d parse.y
The -v
flag is used to generate an output file which breaks down the rules and states of our parser. A state refers to how the parser has interpreted the different terminals. As it reduces these terminals to non-terminals or shifts new tokens on to its stack it will change which state it's in.
Now, when we run make
it will generate the file parse.output
. At the top of that file you should see something like:
State 8 conflicts: 1 shift/reduce
If we look at state 8, we see the following:
state 8
5 expression: expression . tPLUS expression
5 | expression tPLUS expression .
tPLUS shift, and go to state 7
tPLUS [reduce using rule 5 (expression)]
$default reduce using rule 5 (expression)
Here we can see that our parser is unsure whether it should shift a tPLUS token or reduce using rule 5 which reduces expression tPLUS expression
to expression
. A situation where this ambiguity would show up is in the program:
1 + 2 + 3
Essentially our parser doesn't know if it should perform the operations as (1 + 2) + 3
or 1 + (2 + 3)
. While this isn't a problem mathematically, we should do our best to eliminate ambiquities.
Luckily for us we can easily solve this by telling the "+" token how it should reduce. In this case we'll tell it to always reduce everything to the left of it (essentially choosing the (1 + 2) + 3
path). To solve this add:
%left tPLUS
right above the %token
declarations. Now when we run make
we'll no longer see the shift/reduce conflict. We'll continue to utilize the parser.output
file whenever a shift/reduce conflict arises.
In the next post we'll introduce a simple Abstract Syntax Tree, or AST, and the grammar a bit to cover more of the ruby language.
If you're having any problems you can check the reference implementation on GitHub or look specifically at the diff to see what I've changed. Additionally, if you have any comments or feedback I'd greatly appreciate if you left a comment!
Update: read Part 6 of this series.