DataStructure representing SyntaxDiagrams?

Discussion:

(too old to reply)

A***@gmail.com

2012-10-20 17:10:45 UTC

In the 70's there was a 2 or 3 issue article in the BYTE periodical,
developing a PASCAL compiler in BASIC. Since I had just then learned about
Finite-State-Machines represented by diagrams equivalent to syntax-diagrams
and since I wanted to make the HP [IEEE488-instrument-driver] desktop
calculator into a PASCAL-like language controller for a suite of
IEEE488 instruments, I was able to use the FSM ideas.

It's amazingly simple, and very maintainable.
You just let the source code 'walk through the syntax diagrams'
[which are nested: blok, statement, exprn.. -- I just guessing now]
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

I used to 'code' mainly from my paper-written syntax diagrams IIRC.
Even later when I ported this 'PASCAL' to 8 & 16 bit microprocessors,
running 'its own PASCAL' I prefered to use the FSM method.

Since the HP machine had a BASIC-like [ROM based] language, it had only
numbers and characters and 1-dimensional arrays of both.
Consequently, I never learned to use records. I can read them, but
I've never written any and totally lack a feeling for their use.

The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Can someone suggest a suiatble data stucture for the syntax diagrams?
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

BTW, for google: what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

== TIA

r***@gmail.com

2012-10-20 22:28:21 UTC

Permalink

Hi,

Post by A***@gmail.com
In the 70's there was a 2 or 3 issue article in the BYTE periodical,
developing a PASCAL compiler in BASIC.

I never had a subscription to BYTE, but are you sure you don't mean Ada here? E.g. Augusta (subset)? (Sorry, can't find a link right now.)

Post by A***@gmail.com
You just let the source code 'walk through the syntax diagrams'
[which are nested: blok, statement, exprn.. -- I just guessing now]
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

So you already have the railroad diagrams? For original Pascal, they are also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in the appendix.

I don't offhand know of any online on the web except the Pascal-ish PL/0 offshoot XPL0:

http://www.xpl0.org/syntax.html

I'm no pro myself, but it sounds like something that would be best explained in Wirth's _Algorithms + Data Structures_. I've been somewhat reading over his latest version lately (or maybe the 2004 Oberon edition?), it's interesting. Though of course he is biased towards Oberon these days, but I have no problem with that.

http://www.inf.ethz.ch/personal/wirth/books/AlgorithmE1/AD2012.pdf

Post by A***@gmail.com
The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

I don't understand what this means. AFAICT, Oberon arrays are the same as Pascal. Sure, for dynamic structured data, you'd be better off with POINTER TO RECORD, but ARRAY OF CHAR, INTEGER, etc. is still available. Or maybe you mean Oberon-07 restrictions? Dunno.

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Just try it out and see. If it doesn't work, use something else. There's no fixed answer, just use whatever works.

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?

In Wirth's 1976 A+D book, chapter five was the mini PL/0 compiler example. Later he ripped that out in lieu of a separate book entirely: _Compiler Construction_

http://www.inf.ethz.ch/personal/wirth/books/CBEAll.pdf

Post by A***@gmail.com
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

Good luck with that. I wish compilers were so easy. Despite Wirth's great efforts, many people (myself included) still stumble over this. It can become very complicated, if you let it.

Post by A***@gmail.com
BTW, for google: what is the name [L, LR, ?] of the family of grammars
like PASCAL which need only one-token look-ahead to parse?

I believe Pascal's grammar is called LL(1), but most other languages use more complicated ones, e.g. LALR or LR (or even GLR). I suppose Pat Terry's adaptation of Coco/R is probably the easiest non-YACC/LEX toolset to use.

http://en.wikipedia.org/wiki/LL_parser

http://www.scifac.ru.ac.za/compilers/
http://www.scifac.ru.ac.za/coco/

A***@gmail.com

2012-10-23 09:42:38 UTC

Permalink

Post by r***@gmail.com
So you already have the railroad diagrams? For original Pascal, they are
also found in Wirth's _Algorithms and Data Structures_ (1976 edition)
in the appendix.

Well no. I want to apply the same principle to a completely
different application:
<traversing syntax diagrams to parse other languages,
eg. to make a syntax-aware editor, so that it shows you, what's
the next possible valid entry, like a menu shows you,
so that you just need to recognise, instead of remember>.

Post by r***@gmail.com
http://www.xpl0.org/syntax.html

Ok, that may be good to have on-file.

Post by r***@gmail.com

Post by A***@gmail.com
The FSM was represented by arrays. IIRC, one per each column of
my paper diagram. BTW this is one BIG reason why I oppose
Oberon's abandonment of <data-arrays>.

I don't understand what this means. AFAICT, Oberon arrays are the same as
Pascal. Sure, for dynamic structured data, you'd be better off with POINTER
TO RECORD, but ARRAY OF CHAR, INTEGER, etc. is still available.
Or maybe you mean Oberon-07 restrictions? Dunno.

IIRC in Turbo-PASCAL [and definitely in my PLO-derivative] I could write:
CharY[ "B","S","A","I","R"];
IntR[1,4,7,3,5];
and by suitably spacing/formatting the source code, it looked like
a 2-dim-array. Which is a massively powerfull tool. Which is why
spreadseets were the killier-app. because you can easily map
cause-and-effect, from eg. a bus/train time-table.
Oberon doesn't allow this.

Post by r***@gmail.com

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Just try it out and see. If it doesn't work, use something else. There's no=
fixed answer, just use whatever works.

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?

In Wirth's 1976 A+D book, chapter five was the mini PL/0 compiler example. =
Later he ripped that out in lieu of a separate book entirely: _Compiler Con=
struction_
http://www.inf.ethz.ch/personal/wirth/books/CBEAll.pdf

I believe Pascal's grammar is called LL(1), but most other languages use
more complicated ones, e.g. LALR or LR (or even GLR). I suppose Pat Terry's
adaptation of Coco/R is probably the easiest non-YACC/LEX toolset to use.
http://en.wikipedia.org/wiki/LL_parser
http://www.scifac.ru.ac.za/compilers/
http://www.scifac.ru.ac.za/coco/

I've looked at coco previously. It lacks: what I had was MAGIC!
You just look at the paper, and at any node you can see immediately:
= what the valid Next-Tokens are,
= what action is done at that node.

So you don't need to write a top-down-compiler, with the
syntax of the langauge interwoven in the compiler's code.

You write a utility to read the syntax diagrams, where the reader
is completely separate from the particular syntax. The syntax is in the
diagrams. And the code-generation is also separate, and represented
by notation next the the corresponding node.

So, IIRC, I needed a dummy node, so that when exiting from an
assignment statement: X := 2+3
the action at the dummy-node was:
pop [the SymbtblPointer (which was pushed at node "ID=X"]
and use it to show where to store [block & offset] the
value of the expression: 2+3.
==============

Marco van de Voort

2012-10-23 10:18:31 UTC

Permalink

Post by r***@gmail.com

Post by A***@gmail.com
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

So you already have the railroad diagrams? For original Pascal, they are
also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
the appendix.

Free Pascal "ref" manual has railroad diagrams:

http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2

A***@gmail.com

2012-10-23 15:07:49 UTC

Permalink

Post by Marco van de Voort

Post by r***@gmail.com

Post by A***@gmail.com
and at the relevant nodes [like stations on a railwayline] you
do the action: push, pop, access SymblTbl, generate-code ...

So you already have the railroad diagrams? For original Pascal, they are
also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
the appendix.

http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2

Is it SOO difficult to understand me?!

I'm asking people who know about the syntax diagram method of
syntax specification, if they have ideas about suitable data-structures
for representing such <railroad diagrams> ?

And I'm noting that a previous very successfull method which I used
decades ago [so I'd have to re-invent the details now] used several
arrays: representing [IIRC] one-per-column of my diagrams.
Instead of left to right, I drew them top-to-bottom.

And then I'm commenting on the inconvenience that Oberon,
does NOT provide for such data-arrays.

-> ipNews.Send *

Marco van de Voort

2012-10-23 19:24:08 UTC

Permalink

Post by A***@gmail.com

Post by Marco van de Voort

Post by r***@gmail.com
also found in Wirth's _Algorithms and Data Structures_ (1976 edition) in
the appendix.

http://www.freepascal.org/docs-html/ref/refse90.html#x190-20000016.2

Well, let's do a first attempt:

Basically it is EBNF. The devil is in the possible recursion.

If you require that a repititeve term always has a separate definition, one
could probably structure each production as an array of record.

The record would have tokentype, a variant part (to store literal values),
and some flags (to indicate repetition)

Post by A***@gmail.com
And I'm noting that a previous very successfull method which I used
decades ago [so I'd have to re-invent the details now] used several
arrays: representing [IIRC] one-per-column of my diagrams.
Instead of left to right, I drew them top-to-bottom.

I think if you really want a comment on such vague descriptions, better work
out a detailed example.

Post by A***@gmail.com
And then I'm commenting on the inconvenience that Oberon,
does NOT provide for such data-arrays.

Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays
just fine (but not ragged ones).

However I couldn't make heads or tails of your example, that was most
definitely not Pascal or M2.

c***@hotmail.com

2012-10-23 20:32:30 UTC

Permalink

Post by Marco van de Voort
Well, I have never seen a requirement of "Oberon", since you crossposted
this to M2 and Pascal.misc. And afaik both support multi dimensional arrays
just fine (but not ragged ones).

There seems to be some confusion here. Oberon supports multi dimensional arrays just fine also. He may be referring to constant data arrays (other than string constants). If that is the case they are not supported in Standard Pascal, Standard Modula-2 or Oberon. However, it is no big deal as has been suggested already such data arrays can be initialised by reading from a file.

Richard

2012-10-21 10:27:57 UTC

Permalink

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Some productions of the grammar are usually very similar. E.g., there
is probably one called Factor for multiplication and division, and one
called Term for addition and subtraction. Both have the same number of
subexpressions; thus, a single record type containing a flag
specifying the actual operation will be sufficient for all binary
operations.

Richard

A***@gmail.com

2012-10-23 09:42:22 UTC

Permalink

Post by Richard

Post by A***@gmail.com
This may finally be an opportunity for me to learn to THINK
in terms of records? But it seems absurd to use 1 record per
node of the syntax diagram?

Seeking structures which are similar, in order to factorise,
is too much like low-level optimising.

It's not the computer that matters, it's the human.
A 2-dimensional representastion [spreadsheet-like], with perhaps
an extra dimension given by color seems impossible to beat.

With the system that I used, I could immediately see if/what
code was generated at the last N nodes/tokens before the
error token % in the code:
Total := Max(BestOf(A,B,C), Min(B,C)) + % Y ;

Post by Richard
Richard

BTW I'm ad-libbing this from memory.

The reaon why ETH-Oberon & V4 are marvelous, is that they
are visual: allowing you to recognise instead of needing to
remember. Plus the minimalist cognitive load of eg.
each TextFrame is labeled with what-it-can-do.

Richard

2012-10-23 17:35:24 UTC

Permalink

Post by A***@gmail.com
Seeking structures which are similar, in order to factorise,
is too much like low-level optimising.

I think, it is an interesting challenge to find common patterns and to
limit complexity this way.

Post by A***@gmail.com
A 2-dimensional representastion [spreadsheet-like], with perhaps
an extra dimension given by color seems impossible to beat.

Well, you could store this information in a text file and read it into
appropriate data structures at runtime. The Oberon System already
provides a text scanner which makes this easy.

Of course, this is not as elegant as built-in array literals, but such
is the price of simplicity.

Richard

r***@gmail.com

2013-05-16 17:29:38 UTC

Permalink

Hi, only seven months later! :-)

Post by A***@gmail.com
In the 70's there was a 2 or 3 issue article in the BYTE
periodical, developing a PASCAL compiler in BASIC.

Apparently you were thinking of this:

Chung Kin-Man and Yuen H (1978): A "Tiny" Pascal Compiler,
Byte Magazine, Byte Publications Inc., Petersborough N.H., USA,
Volume 3, Numbers 9, 10, 11, September, October, November 1978.

Apparently it was also later collected into a book:

The BYTE Book of Pascal
BYTE BOOKS (1980), 333 pages
Blaise W. Liffick

What I mistakenly thought you were thinking of was this:

Tiny Ada / Augusta
Edward Mitchell
Dr. Dobb's Journal, issues 75, 77, 79, 81 (1983)

Post by A***@gmail.com
Since the HP machine had a BASIC-like [ROM based] language, it
had only numbers and characters and 1-dimensional arrays of both.
Consequently, I never learned to use records. I can read them,
but I've never written any and totally lack a feeling for their
use.
This may finally be an opportunity for me to learn to THINK in
terms of records? But it seems absurd to use 1 record per node
of the syntax diagram? Can someone suggest a suiatble data
stucture for the syntax diagrams?

It's probably going to be some kind of linked list, aka dynamic
(variant) records using the heap (pointers via new, dispose). You
could probably kludge something up with arrays, but for dynamic
languages with no hard-coded limits (and to avoid mallocing
everything at once), it's not ideal.

Various sources (of various quality) for study exist online:
Pascal-S, Oberon-0, Pascal P4 or P5, M2M-PC, FPC, OP2, GM2, GPC

Bernhard Treutwein

2013-06-08 14:01:02 UTC

Permalink

Post by A***@gmail.com
Can someone suggest a suiatble data stucture for the syntax diagrams?
The correspondence between the code and the paper diagrams must be
easy and obvious, for ease of maintenance, because the intended application
is structured editors for 'your next new languages'.
Fill in the table and let it guide you, when writing code,
instead of having to remember a new syntax.

yet another month later, I was able to track down my memories ...
there is an EBNF visualizer, which creates syntax diagrams from
EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
implemnted in C# and has been generated with the help of Coco
(the Compiler Compiler Toolkit) (see
http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
http://www.ssw.uni-linz.ac.at/coco/).

Bernhard

Richard

2013-06-08 20:16:26 UTC

Permalink

Post by Bernhard Treutwein
there is an EBNF visualizer, which creates syntax diagrams from
EBNF code at http://dotnet.jku.at/applications/visualizer/. It is
implemnted in C# and has been generated with the help of Coco
(the Compiler Compiler Toolkit) (see
http://www.scifac.ru.ac.za/cspt/modula2.htm#Compilertools) and/or
http://www.ssw.uni-linz.ac.at/coco/).

A similar tool, which also works in other operating systems than
Windows, is the Java-based ANTLRWorks:

http://www.antlr3.org/works

It contains an editor for ANTLR grammars (slightly different syntax
than EBNF: e.g. (...)? instead of [...]) and can show syntax diagrams
immediately during editing.

Richard