FIRST and FOLLOW sets, part 1
In my last post, we considered the grammar restrictions imposed by an LL(1) parser. We looked at how FIRST sets for post-increment and post-decrement statements could lead to ambiguity for the parser, and how to fix the problem.
But even a simple language like SPTL has dozens of production rules, and for many of them the FIRST set may not be obvious.
For example:
We've gone through three production rules, and not found anything resembling a SPTL token. And it continues:
At this rate, it will take forever to get anywhere. We need a method for constructing the set of FIRST sets. Fortunately, there are several rules which will make the job a lot easier.
But first, some definitions:
Terminal: The technical term for token. These are the symbols of the language.
Nonterminal: These are the terms that appear in the BNF but not in the language. A nonterminal will always have a production rule to transform it into something else.
Now we'll define:
A and B as nonterminals
w and x as terminals
And, as in the last post, the following symbols mean:
∅ Empty set
∈ Empty sentence
∩ Intersection - the elements common to both sets
∪ Union - the elements appearing in one or both sets
Now, the FIRST rules:
So far, so good. But how do we apply them?
We can see that every FIRST rule for a nonterminal gives us another FIRST rule, while all FIRST rules for terminals gives us a terminal (even if it's an empty terminal). Eventually we need to get a FIRST set of terminals for each rule. By building from the bottom up, we can get a FIRST set for each element of the BNF grammar.
For example, let's look at a portion of the expression rules for SPTL:
If we start with expr = boolexpr { boolop boolexpr }, we get FIRST(expr) = FIRST(boolexpr), FIRST(boolexpr) = FIRST(relexpr), and so on. By starting at the bottom, we see:
FIRST(escapestring) = "
FIRST(plainstring) = '
By starting out at the bottom, with production rules that lead immediately to terminals, we can apply the FIRST rules repeatedly for the other production rules, to get a set of terminals for every production rule. So:
FIRST(stringconstant)
= FIRST(plainstring) ∪ FIRST(escapestring)
= [", ']
I've uploaded the complete set of FIRST sets for SPTL in gnumeric format or csv format.
My next post will look at FOLLOW sets.
But even a simple language like SPTL has dozens of production rules, and for many of them the FIRST set may not be obvious.
For example:
program = elementblock
elementblock = {simpleelement}+
simpleelement = statement | vardec | functiondef
We've gone through three production rules, and not found anything resembling a SPTL token. And it continues:
statement = assignstatement | breakstatement | callstatement |
closestatement | foreachstatement | forstatement |
ifstatement | openstatement | popstatement |
predecstatement | preincstatement | printstatement |
pushstatement | readstatement | repeatstatement |
shiftstatement | whilestatement
At this rate, it will take forever to get anywhere. We need a method for constructing the set of FIRST sets. Fortunately, there are several rules which will make the job a lot easier.
But first, some definitions:
Terminal: The technical term for token. These are the symbols of the language.
Nonterminal: These are the terms that appear in the BNF but not in the language. A nonterminal will always have a production rule to transform it into something else.
Now we'll define:
A and B as nonterminals
w and x as terminals
And, as in the last post, the following symbols mean:
∅ Empty set
∈ Empty sentence
∩ Intersection - the elements common to both sets
∪ Union - the elements appearing in one or both sets
Now, the FIRST rules:
- FIRST(∈) = ∅
- FIRST(x) = x
- FIRST(w|x) = [w, x]
- FIRST(∈ x) = x;
- FIRST(w x) = w
but if w can be empty, FIRST(w x) = [w, x] - FIRST(A|B) = FIRST(A) ∪ FIRST(B)
- FIRST(A B) = FIRST(A)
but if A can be empty, FIRST(A B) = FIRST(A) ∪ FIRST(B)
So far, so good. But how do we apply them?
We can see that every FIRST rule for a nonterminal gives us another FIRST rule, while all FIRST rules for terminals gives us a terminal (even if it's an empty terminal). Eventually we need to get a FIRST set of terminals for each rule. By building from the bottom up, we can get a FIRST set for each element of the BNF grammar.
For example, let's look at a portion of the expression rules for SPTL:
expr = boolexpr { boolop boolexpr }
boolexpr = relexpr [ relop relexpr ]
boolop = '&&' | '||'
relexpr = [ '-' ] term { addop term }
relop = '<' | '>' | '==' | '<=' | '>=' | '!='
term = factor { multop factor }
addop = '+' | '-'
factor = constant | varname | funcname | '(' expr ')' | negate
multop = '*' | '/' | '%'
negate = '!' factor
constant = boolconstant | numconstant | stringconstant
boolconstant = 'true' | 'false'
numconstant = digit {digit}
stringconstant = plainstring | escapestring
plainstring = ''' text '''
escapestring = '"' text '"'
If we start with expr = boolexpr { boolop boolexpr }, we get FIRST(expr) = FIRST(boolexpr), FIRST(boolexpr) = FIRST(relexpr), and so on. By starting at the bottom, we see:
FIRST(escapestring) = "
FIRST(plainstring) = '
By starting out at the bottom, with production rules that lead immediately to terminals, we can apply the FIRST rules repeatedly for the other production rules, to get a set of terminals for every production rule. So:
FIRST(stringconstant)
= FIRST(plainstring) ∪ FIRST(escapestring)
= [", ']
I've uploaded the complete set of FIRST sets for SPTL in gnumeric format or csv format.
My next post will look at FOLLOW sets.
Labels: FIRST, FOLLOW, grammar rules, recursive descent
0 Comments:
Post a Comment
Links to this post:
Create a Link
<< Home