Oberon and LL(1) Parsing

Discussion:

(too old to reply)

August Karlstrom

2008-08-08 15:04:22 UTC

In the chapter "Compiler Construction" from the book "The School of
Niklaus Wirth"(*) Hanspeter Mössenböck writes:

"Unfortunately even Wirth's languages contain a few constructs which are
not LL(1). To analyze such constructs, it is not sufficient to rely on
syntactic information. One must also take semantic information into
account. For example, a type guard x(T) in Oberon cannot be
distinguished from a procedure call syntactically. One has to consult
the symbol table to find out if x is a procedure or a variable. This
ambiguity could have easily been avoided, for example by using curly
brackets instead of round brackets."

Does anyone here know the motivation for the mentioned irregularity,
i.e. why not write x{T} instead of x(T)? Are there more examples of
non-LL(1) constructs in Oberon?

August

(*) This article is also available at
ftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf

Pascal J. Bourguignon

2008-08-08 17:29:32 UTC

Permalink

Post by August Karlstrom
In the chapter "Compiler Construction" from the book "The School of
"Unfortunately even Wirth's languages contain a few constructs which
are not LL(1). To analyze such constructs, it is not sufficient to
rely on syntactic information. One must also take semantic information
into account. For example, a type guard x(T) in Oberon cannot be
distinguished from a procedure call syntactically. One has to consult
the symbol table to find out if x is a procedure or a variable. This
ambiguity could have easily been avoided, for example by using curly
brackets instead of round brackets."
Does anyone here know the motivation for the mentioned irregularity,
i.e. why not write x{T} instead of x(T)?

Yes. People who developed, from the 50's to the 80's, the theory of
lexers and parsers, worked hard to invent them. If in the 90's you
start to design your languages using the simple LL(1), for which you
can write compilers meaningfully by hand, you nullify all this work.

Notaly, that nice feedback loop, that goes from the parser down to the
lexers thru the symbol table, wouldn't be needed anymore if there
wasn't such an ambiguity left.

As defined, you've got the pleasure to have to do:

in the parser:

typedcl ::= var <id> ';' .
--> put (token-kind = VAR, id = <id>) in the symbol table

in the lexer:

scan an identifier, search the identifier in the symbol table, if
found, return the token-kind and id, else return IDENT as token-kind
and the token text.

So now, you can write the two rules with different tokens.

Moreover, this gives a good exercise for the students.

Post by August Karlstrom
Are there more examples of non-LL(1) constructs in Oberon?

--
__Pascal Bourguignon__ http://www.informatimago.com/

"Logiciels libres : nourris au code source sans farine animale."

August Karlstrom

2008-08-09 22:14:23 UTC

Permalink

Post by Pascal J. Bourguignon

It is my impression that Mr. Wirth is not afraid to do so, hence my
question.

August

F***@gmail.com

2008-08-13 16:01:38 UTC

Permalink

Post by August Karlstrom
In the chapter "Compiler Construction" from the book "The School of
"Unfortunately even Wirth's languages contain a few constructs which are
not LL(1). To analyze such constructs, it is not sufficient to rely on
syntactic information. One must also take semantic information into
account. For example, a type guard x(T) in Oberon cannot be
distinguished from a procedure call syntactically. One has to consult
the symbol table to find out if x is a procedure or a variable. This
ambiguity could have easily been avoided, for example by using curly
brackets instead of round brackets."
Does anyone here know the motivation for the mentioned irregularity,
i.e. why not write x{T} instead of x(T)? Are there more examples of
non-LL(1) constructs in Oberon?
August
(*) This article is also available atftp://ftp.ssw.uni-linz.ac.at/pub/Papers/Moe00b.pdf

Another example of not LL(1) syntax is the set using constant numbers
like {0..3}
After reading 0. you need a lookhaed to determine if is real number
like 0.1 or an upto range as {0..3}. May be was better {0 TO 3}.

Don't know about the reason; don't think it was so intentinally, may
be a just an oversight
Stefano

August Karlstrom

2008-08-14 10:45:04 UTC

Permalink

Post by F***@gmail.com
Another example of not LL(1) syntax is the set using constant numbers
like {0..3}
After reading 0. you need a lookhaed to determine if is real number
like 0.1 or an upto range as {0..3}. May be was better {0 TO 3}.

Decimal comma "." and range delimiter ".." are both terminal symbols so
as far as I know the parser will see something like

leftBrace digit range digit rightBrace

which is clearly LL(1).

Post by F***@gmail.com
Don't know about the reason; don't think it was so intentinally, may
be a just an oversight

If the type guard is the only non-LL(1) construct then yes. If there are
already other "unavoidable" non-LL(1) constructs Mr Wirth may have
argued that one more of them will not add much complexity.

August