this blog considered harmful: LL(1) grammar restrictions

Before I discuss the statement syntax for SPTL, I want to look at grammar restrictions. As I mentioned in a previous post, I'm planning to build a recursive descent (top down) parser for SPTL. Specifically, it will be an LL(1) parser. That is, parsing from Left to right, with a Lookahead of 1 token.

This places restrictions on the grammar. The parser must be able to determine, when it finds a token, what grammar rule to apply. If a language had these rules:

statement : incstatement | decstatement
incstatement : var '++'
decstatement : var '--'

then the parser wouldn't know when it encountered a var whether this was an incstatement or a decstatement.

For a grammar to qualify as LL(1), it must satisfy a set of rules.

Define:

A as a sentence (one or more tokens) in the grammar
u and v as sentences derived from A
w and x as unspecified sentences
FIRST as the set of possible first tokens for the sentence
FOLLOW as the set of possible first tokens for the following sentence

The following symbols mean:

∀ For all
∃ There exists
∅ Empty set
∈ Empty sentence
∩ Intersection (the elements common to both sets)

A grammar G is LL(1) if and only if:

∀ A, A ≠ ∈,
∀ pairs A → u and A → v,
FIRST(u) ∩ FIRST(v) = ∅

If ∃ A, A → ∈,
FIRST(A) ∩ FOLLOW(A) = ∅

A pair is any two elements that can appear in the same part of the sentence (in the BNF-like grammar above, these are denoted by the | symbol).

So, going back to the incstatement/decstatement example above,
statement is A, the sentence we are looking at
incstatement and decstatement are u and v respectively
FIRST(incstatement) is {var}
FIRST(decstatement) is {var}

FIRST(incstatement) ∩ FIRST(decstatement) = {var}
{var} ≠ ∅

Therefore, the grammar above is not LL(1).

How can we change it into an LL(1) grammar? Simple:

statement = var (incop | decop)
incop = '++'
decop = '--'

Now the var is part of statement, and in place of incstatement/decstatement we have incop/decop.

FIRST(incop) is {'++'}
FIRST(decop) is {'--'}

FIRST(incop) ∩ FIRST(decop) = ∅

Now the grammar (or at least this portion of it) is LL(1).

This is a very simple example. But how do we determine FIRST and FOLLOW for all sentences of a grammar? I'll explain in my next post.

Labels: grammar rules, LL(1), recursive descent

this blog considered harmful

Thursday, January 29, 2009

LL(1) grammar restrictions

0 Comments:

Links to this post:

About Me

contact me:

Previous Posts