LL(1) grammar restrictions
Before I discuss the statement syntax for SPTL, I want to look at grammar restrictions. As I mentioned in a previous post, I'm planning to build a recursive descent (top down) parser for SPTL. Specifically, it will be an LL(1) parser. That is, parsing from Left to right, with a Lookahead of 1 token.
This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:
then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.
For a grammar to qualify as LL(1), it must satisfy a set of rules.
Define:
A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence
The following symbols mean:
∀ For all
∃ There exists
∅ Empty set
∈ Empty sentence
∩ Intersection (the elements common to both sets)
A grammar G is LL(1) if and only if:
A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).
So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}
FIRST(incstatement) ∩ FIRST(decstatement) = {var}
{var} ≠ ∅
Therefore, the grammar above is not LL(1).
How can we change it into an LL(1) grammar? Simple:
Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.
FIRST(incop) is {'++'}
FIRST(decop) is {'--'}
FIRST(incop) ∩ FIRST(decop) = ∅
Now the grammar (or at least this portion of it) is LL(1).
This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.
This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:
statement : incstatement | decstatement
incstatement : var '++'
decstatement : var '--'
then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.
For a grammar to qualify as LL(1), it must satisfy a set of rules.
Define:
A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence
The following symbols mean:
∀ For all
∃ There exists
∅ Empty set
∈ Empty sentence
∩ Intersection (the elements common to both sets)
A grammar G is LL(1) if and only if:
- ∀ A, A ≠ ∈,
∀ pairs A → u and A → v,
FIRST(u) ∩ FIRST(v) = ∅ - If ∃ A, A → ∈,
FIRST(A) ∩ FOLLOW(A) = ∅
A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).
So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}
FIRST(incstatement) ∩ FIRST(decstatement) = {var}
{var} ≠ ∅
Therefore, the grammar above is not LL(1).
How can we change it into an LL(1) grammar? Simple:
statement = var (incop | decop)
incop = '++'
decop = '--'
Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.
FIRST(incop) is {'++'}
FIRST(decop) is {'--'}
FIRST(incop) ∩ FIRST(decop) = ∅
Now the grammar (or at least this portion of it) is LL(1).
This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.
Labels: grammar rules, LL(1), recursive descent