Project | Simple Compiler

« Back to project details Sort by:

Code Generation

07/20/2017 at 11:06 • 0 comments

Code Generation

I have been testing some fairly big programs. Another grammar fault I found is that you cannot assign negative constants. Fixed.

Here is the grammar at the moment:

  PL/0 Grammar:
  program = block "." .
  block = [ "const" ident "=" ["-"] number {"," ident "=" number} ";"]        // Add negative constants
          [ "var" ident {"," ident} ";"]
          [ "var" ident "[" number "]" {"," ident "[" number "]"} ";"]        // Add arrays
          { "procedure" ident ";" block ";" } statement.
  statement = [ ident ":=" expression
                ident "[" expression "]" ":=" expression                      // Add arrays
                ident ":=" string                                             // Add string assignmnet to ident
                ident "[" expression "]" ":=" string                          // Add string assignmnet to arrays
              | "call" ident
              | "?" ident | "write" ident | "putc" ident                      // Add "write" (int) and "putc" (char)
              | "!" expression | "read" expression | "getc" expression        // Add "read" (int) and "getc" (char)
              | "begin" statement {";" statement } [";"] "end"                // Add optional ";" before "end"
              | "if" condition "then" statement [ "else" statment ]           // Add "else"
              | "while" condition "do" statement ].
  condition = "odd" expression |
              expression ("="|"#"||"!="||"<>"|"<"|"<="|">"|">=") expression . // Add "!+" and "<>"
  expression = [ "+"|"-"] term { ("+"|"-") term}.
  term = factor {("*"|"/"|"%"|"mod") factor}.                                 // Add "%" and "mod"
  factor = ident | ident "[" expression "]" | number | "(" expression ")".    // Add arrays
Add comments (to tokeniser):
  // ... EOL
  { ... }
  /* ... */
  (* ... *)

Still have not fixed the unary minus fault but its not really an issue, just put parentheses around it.

The above grammar may make you eyes roll but you need it to code a compiler.

The Plan

The plan for building the code generator is to rework the interpreter:

Change the stack to downward growth from upward growth (done).
Swap out the P-Code instructions for pseudo CPU OpCode (done).
Convert the interpreter into a OpCode lister (done).
Covert the opcodes to Subleq.

The reason for doing this to the interpreter is that changes can be incrementally checked.

The CPU I am modelling is as minimal as practical. The current pseudo OpCodes are stack focused. Here are the registers:

Registers	Comment
Ax	Primary Register
Bc	Secondary Register
Tx	Temporary Register (used for swap)
PC	Program Counter (?)
SP	Stack Pointer (downward growing)
BP	Base Pointer

The program counter (PC) is not actually a register but a subroutine return address. It is set by the compiler at compile time, that is the real PC register is not actually accessed during execution.

The stack and register pseudo opcodes:

Stack operations:	Register moves:
FromStack (Ax=stack[Ax]	mov Ax,Bx
ToStack (stack[Ax]=Bx)	mov Ax,SP
push Ax	mov SP,Ax
pop Ax	mov Ax,BP
pop Bx	mov BP,Ax
	mov PC,Ax
Move Immediate (literal):	mov Ax,PC
mov Bx,Imm	mov Tx,Ax
	mov Bx,Tx

I have favoured register moves over stack operations because stage operations are expensive. Here are the remaining pseudo opcodes:

Increment/Decrement:	Mathematical Operations:	Boolean Operations:
dec Ax	sub Ax,Bx	odd Ax
inc Ax	add Ax,Bx	eq Ax,Bx
	mul Ax,Bx	lt Ax,Bx
Jump:	div Ax,Bx	le Ax,Bx
jmp	mod Ax,Bx	ne Ax,Bx
jz		gt Ax,Bx
		ge Ax,Bx

Add four input/outputs instructions:

Input/Output:	Code:
read (integer)	read
get (char)	getc
write (integer)	write
put (char)	putc

Recoded the pseudo opcode interpreter into a code lister (done). Called it PCode2OpCode.

I have been reviewing the OpCode output and I have used indirect addressing with the Ax register which of course is not a valid instruction for the x86 instruction set. It is pseudo opcodes after all. I also changed the format to make it easier to import, for example I use "mov_Ax,Bx" rather than mov Ax, Bx" and I dropped the "L" prefix for address labels.

AlanX

PL/0 Completed?

07/19/2017 at 01:39 • 0 comments

PL/0 Completed?

Brave words! Can it really be completed? There are still things I would like to add but I need to move on. The current state of play is:

Recognises most PL/0 dialects.
Tolerant to redundant ";"
One dimensional arrays
Pascal strings

To do:

Would like to add an ASM instruction (pass through to the target CPU).

Rework the error messages to provide better hints.

Here is an example program:

{ Program Test String }
var Str[16],Prompt[16];

  procedure ReadStr;
  var i,test;
  { Read a Pascal string, uses Str }
  begin 
    i:=1;
    test:=1;
    while test=1 do begin
      getc Str[i];
      if Str[i]<32 then begin
        Str:=i-1; { Str == Str[0] }
        test:=0
      end else begin
        if i=16 then begin
          test:=0;
          Str:=i
        end else begin
          i:=i+1
        end
      end
    end
  end;

  procedure WriteStr;
  var i;
  { Write a string, uses Str }
  begin
    i:=1;
    while i<=Str do begin
      putc Str[i];
      i:=i+1
    end
  end;

  procedure WritePrompt;
  var i;
  { Write Prompt }
  begin
    i:=1;
    while i<=Prompt do begin
      putc Prompt[i];
      i:=i+1
    end
  end;

begin
  Prompt:="String Test";
  call WritePrompt;
  putc 10; { new line }

  Prompt:="Enter a string: ";
  call WritePrompt;
  call ReadStr;

  Prompt:="You wrote: ";
  call WritePrompt;
  call WriteStr;
  putc 10; { new line }
end.

You may notice I have used redundant ":" in this example.

And here is the run:

C:\AlanX\pl0>.\PLZero_Tokeniser\PLZero_Tokeniser     -i test18.pl0   -o test18.token -nl

C:\AlanX\pl0>.\PLZero_Compiler\PLZero_Compiler       -i test18.token -o test18.code  -t  test18.table

C:\AlanX\pl0>.\PLZero_Interpreter\PLZero_Interpreter -i test18.code  -o test18.run   -s  test18.stack
String Test
Enter a string: Hello World!
You wrote: Hello World!

C:\AlanX\pl0>pause
Press any key to continue . . .

I think that is pretty cool!

The next task is to write a P-Code to Subleq translator.

AlanX

PLZero Compiler

07/13/2017 at 16:12 • 0 comments

PLZero (PL/0) Compiler

The compiler is almost done. The Nikluas Wirth Pascal has been recoded in C. The structure completely rebuilt. I don't even think Mr Wirth would recognise his code! I removed the tokeniser and interpreter from the compiler and are now stand alone programs (part of the tool train).

Here are a couple of working programs.

Multiplication:

const m=7, n = 85; 
var print, x, y, z; { Assignment to print variable is printed }
 
procedure multiply; 
var a, b; 
begin 
	a := x; b := y; z := 0; 
	while b > 0  do 
	begin 
		if odd b then z := z + a; 
		a := 2 * a; b := b / 2; 
	end; 
end; 
 
begin 
	x := m;
	y := n;
	call multiply;
	print:=z;
end.

Prime number search (http://www.dustyoldcomputers.com/pdp-8/software/pl0/):

{ calculate prime numbers }
const 
  pcmax = 101;  { max on 8 is 2045 }
var
  print, { memory mapped print routine memory location }
  pc,    { prime candidate }
  d,     { divisor }
  t1,    { temp #1 }
  t2,    { temp #2 }
  isnot; { isnot a prime flag }
begin
  print := 1;
  print := 2;
  pc := 3;
  t1 := pc/2;
  while pc <= pcmax do
    begin
      { trial division to test for prime }
      d := 3;
      isnot := 0;
      { This keeps t1 >= to the sqrt(pc) }
      if (t1*t1) < pc then t1 := t1 + 1;
      while d <= t1 do
        begin
          t2 := pc / d;
          if (t2*d) = pc then { not prime }
            begin
              isnot := 1; { remember not prime }
              d := pc;    { force loop exit because we don't have break }
            end;
          d := d + 2;
        end;
      if isnot = 0 then print := pc;  { store at print will print it }
      pc := pc + 2;
    end;
end.

Wirth's test program:

CONST
  m =  7,
  n = 85;

VAR
  print, putc, x, y, z, q, r;

PROCEDURE multiply;
VAR a, b;

BEGIN
  a := x;
  b := y;
  z := 0;
  WHILE b > 0 DO BEGIN
    IF ODD b THEN z := z + a;
    a := 2 * a;
    b := b / 2
  END
END;

PROCEDURE divide;
VAR w;
BEGIN
  r := x;
  q := 0;
  w := y;
  WHILE w <= r DO w := 2 * w;
  WHILE w > y DO BEGIN
    q := 2 * q;
    w := w / 2;
    IF w <= r THEN BEGIN
      r := r - w;
      q := q + 1
    END
  END
END;

PROCEDURE gcd;
VAR f, g;
BEGIN
  f := x;
  g := y;
  WHILE f # g DO BEGIN
    IF f < g THEN g := g - f;
    IF g < f THEN f := f - g
  END;
  z := f
END;

BEGIN
  x := m;
  y := n;
  CALL multiply;
  print:=z;
  putc:=10;
  x := 25;
  y :=  3;
  CALL divide;
  print:=q;
  print:=r;
  putc:=10;
  x := 84;
  y := 36;
  CALL gcd;
  print:=z
END.

I have edited the tokeniser to recognise:

any case for token words (i.e. BEGIN, begin or Begin etc.) but variables are still case sensitive
"var" and "int" as an alternative
"<>", "!=", "#" as alternatives
"print" (lower case) is recognised at any level as the integer output.
"putc" (lower case) is recognised at any level as the char output.

I rather like the upper case for keywords.

Here is the result from Wirth's program:

Begin PL/0:
595

8
1

12
End PL/0.

I need a final check of the code, add a execution list option before moving on to the Subleq translator.

AlanX

Fixing Unary Minus

07/08/2017 at 00:48 • 1 comment

Language Structure

Programming languages can the presented/described in format called Backus-Naur Form (BNF). Here is a proper example from "Let's Build a Compiler!" Jack W. Crenshaw:

<b-expression> ::= <b-term> [<orop> <b-term>]*
<b-term> ::= <not-factor> [AND <not-factor>]*
<not-factor> ::= [NOT] <b-factor>
<b-factor> ::= <b-literal> | <b-variable> | <relation>
<relation> ::= | <expression> [<relop> <expression]
<expression> ::= <term> [<addop> <term>]*
<term> ::= <signed factor> [<mulop> factor]*
<signed factor>::= [<addop>] <factor>
<factor> ::= <integer> | <variable> | (<b-expression>)

Here is what I decoded from Simple Compiler in a "bnf" like form:

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
Term           -> Factor
               -> Factor * Factor+
               -> Factor / Factor+
Expression     -> Term
               -> + Term
               -> - Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

(Note, my decoded format is as "coded" and is upside down compared to the bnf format.)

Okay what does all this mean? Basically the minimum number of precedence levels (from "Let's Build a Compiler!" Jack W. Crenshaw) is 8:

Level	Syntax Element	Operator
0	factor	literal, variable
1	signed factor	unary minus
2	term	*, /
3	expression	+, -
4	b-factor	literal, variable, relop
5	not-factor	NOT
6	b-term	AND
7	b-expression	OR, XOR

The Simple compiler only has 4, so it was doomed from the start. Even without the bit operations it still needed 5 levels. What failed was the unary minus. This is why I had to put brackets around the "-5" the the code:

begin
  A=4;
  B=1;
  if A<B then
    write A*10<b;
  else begin
    write 1<B*10;
    write B*10, (a+b)*(-5);
  end
  C=-1;
  write C;
end

The original coder did try with "Expression -> - Term" as the C=-1 does work, but a true unary minus needs a higher precedence than "*" for "(a+b)*-5" to work.

So I need at added "SignedFactor" between "Factor" and "Term" based on Crenshaw's bnf, and remove the "Expression -> + Term" and "Expression -> - Term":

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor           
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expressiom
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+

It took a little while to get my head around the internals of the recursive procedure calls but eventually I got it to work:

void SignedFactor(void)
{
  char op;
  
  op=tTok;
  if ((op==_plus)||(op==_minus)) {
    GetNextToken();
  }
  Factor();
  if (op==_plus) {
    // Nothing to do
  } else if (op==_minus) {
    Emit(_movBxAx);
    Emit(_xorAxAx);
    Emit(_subBx);
  }
}

In the above code:

Th global "tTok" is saved to local "op" in case it is required for later use (other procedures may "eat it"). Basically parser procedures "eat" the token if it belongs to them.
Next, if the token is for this procedure (i.e. it is a "+" or a "-") then "eat" the token by getting the next token (GetNextToken()).
Okay, the procedure has to wait for the "Factor" that will be operated on to come back (i.e. pass through to Factor()).
Factor() has come back and the result will be in "Ax". If "op" is a "+" or a "-" then do the operation (emit the code).
Okay all done, return to the calling procedure (Term()).

Here is the current language structure for Simple Compiler:

Factor         -> Identifier
               -> Integer
               -> ( BoolExpression )
SignedFactor   -> Factor
               -> + Factor
               -> - Factor
Term           -> SignedFactor
               -> SignedFactor * SignedFactor+
               -> SignedFactor / SignedFactor+
               -> SignedFactor % SignedFactor+
Expression     -> Term
               -> Term + Term+
               -> Term - Term+
BoolExpression -> Expression
               -> Expression = Expression+
               -> Expression <> Expression+
               -> Expression < Expression+
               -> Expression <= Expression+
               -> Expression > Expression+
               -> Expression >= Expression+
Statement      -> Begin
               -> While
               -> If
               -> Write
               -> Read
               -> Assignment
Begin          -> Statement
               -> End
While          -> BoolExpression Statement
If             -> BoolExpression then Statement
               -> BoolExpression then Statement "else" Statement
Write          -> BoolExpression , BoolExpression+
Read           -> Input
Assignment     -> Identifier = BoolExpression

The only thing not shown above is that the semicolon (i.e. ";") is treated as a white space.

Subleq Efficiency
07/06/2017 at 04:29 • 0 comments
A Not Very Happy Interpreter
The interpreter is not very happy. I have used the OpCode jump address without considering the Subleq actual address (i.e. the OpCode instruction size does not match the Subleq macro size). This is a bit tricky as I do not calculate Subleq instructions lengths and therefore cannot calculate the Subleq address directly (at least in OpCode2Subleq). A good fix is to precede each Subleq instruction with a label identifying the OpCode Address, eg. "_OPCxxxxx". This has no cost to the final integer code size.
The use of the OpCode instruction pointer label "_OPCxxxxx" works very well. Very easy to find the OPC code in Subleq now.
Fixed a coding error, where I coded "jz" as if it was "jeqz"? Fixed and now the interpreter executes and presents the correct answer.
Subleq Efficiency
Here is the final test of my tool train:
```
C:\AlanX\SimpleC\OpCode2Subleq>SimpleC  -o 1.opc 0<1.sc
Tokenised Code:
Line      1: BEGIN
Line      2: A = 4
Line      3: B = 1
Line      4: IF A < B THEN
Line      5: WRITE A * 10 < B
Line      6: ELSE BEGIN
Line      7: WRITE 1 < B * 10
Line      8: WRITE B * 10 , ( A + B ) * ( - 5 )
Line      9: END
Line     10: C = - 1
Line     11: WRITE C
Line     12: END
Done.

C:\AlanX\SimpleC\OpCode2Subleq>OpCode2Subleq -i 1.opc -o 1.sasm

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Asm -i 1.sasm -o 1.code -l 1.list

C:\AlanX\SimpleC\OpCode2Subleq>Subleq_Int -i 1.code -l 1.list -t 1.trace

SUBLEQ Interpreter

 1
 10 -25
 -1


Interpreter finished

C:\AlanX\SimpleC\OpCode2Subleq>pause
Press any key to continue . . .
```
Code Efficiency
12 lines of high level code became 95 words (73 OpCodes) which turned into
5530 words of Subleq code (1903 lines of uncommented code). That is about 26 lines of Subleq per OpCode. And the code takes a full 4 seconds to execute!
AlanX
Converting the Simple Compiler to a Subleq Compiler
06/24/2017 at 17:03 • 0 comments
Compiler Subleq Backend
In order to convert Simple Compiler to a Subleq compiler, I have to replace the Simple Compiler Assembler (OpCode) Lister and Assemble (OpCode) Interpreter with a Subleq OpCode exporter. I will use an external Subleq assembler and interpreter. The Subleq OpCode exporter is called the Subleq backend. The Subleq backend contains all the Subleq macros, but only composite macros (which takes an operand) are accessible. Here is the list of Subleq macros:
- jump
- clear
- not
- shl
- inc
- dec
- chs
- label
- return
- putc
- getc
- wrtInt
- rdInt
- sub
- add
- copy
- ncopy (negative copy)
- jlez
- jgez
- jeqz
- jgtz
- jltz
- jnez
- jmin
- store
- absolute
- xor
- and
- or
- nand
- nor
- new sub
- new add
And here is th list of compiler OpCode that are supported:
- system
- movAxVar
- movVarAx
- movAxImm
- movAxBx
- movBxAx
- pushAx
- pushBx
- pushFx
- popAx
- popBx
- popFx
- addBx
- subBx
- mulBx
- divBx
- jmp
- jz
- wrtAx
- wrtLn
- rdAx
- halt
- cmpAx
- orAxAx
- xorAxAx
- setEq
- setNE
- setLT
- setLE
- setGT
- setGE
The "system" command sets up Subleq with:
- HALT (0)
- OUTPUT (-1)
- INPUT (-2)
- Z (zero)
- T (temp)
- t (tmp)
- P (+1)
- N (-1)
- the CPU model:
  - Ax
  - Bx
  - Fx
  - SP
  - DP
- and few useful constants:
  - _SP (-32)
  - _DP (-16384)
  - _MIN (-32768)
  - _CR (13)
  - _LF (10)
  - _SPACE (32)
  - _MINUS (45)
  - _ZERO (48)
  - _NINE (57)
The Subleq system also has a point to pointer copy (PPC) routine (as it is assumed the monitor code will be stored in ROM and pointer to pointer copy requires self modifying code).
I will likely add many more constants to support string output later.
I have added the binary operation macros:
- XOR Ax,Bx
- AND Ax,Bx
- OR Ax,Bx
- NAND Ax,Bx
- NOR Ax,Bx
- NOT Ax
- SHL Ax
- SHR Ax (to be added)
These bit operations don't update the flag register.
Integer Minimum Value
The minimum integer value (MIN) for a 16 bit integer is -32768. Subleq has a major problem with MIN. It treats MIN as equal to 0 unless special precautions are taken. This is not usually a problem unless the code uses the sign bit (such as for bit operations etc.). In the end I rewrote all the test macros (i.e. jeqz, jnez, jlez, jltz, jgez and jgtz) to be MIN aware.
I also used MIN as the return value for multiplication overflow and attempt to divide by zero. There is a "jmin" macro if required.
Working Subleq Backend
I have got a working Subleq backend test-bed (TestMacro2Subleq), for example the following Subleq macros:
```
  compositeMacro("system",0);
  compositeMacro("rdAx",0);
  compositeMacro("pushAx",0);
  compositeMacro("pushAx",0);
  compositeMacro("rdAx",0);
  compositeMacro("movBxAx",0);
  compositeMacro("popAx",0);
  compositeMacro("pushBx",0);
  // iMul
  compositeMacro("mulBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
  // iDiv
  compositeMacro("popBx",0);
  compositeMacro("popAx",0);
  compositeMacro("divBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
  compositeMacro("movAxBx",0);
  compositeMacro("wrtInt",0);
  compositeMacro("wrtLn",0);
```
Produces after assembly and interpretation:
```
C:\AlanX\TestSubleq>TestMacro2Subleq  1>Test.sasm
C:\AlanX\TestSubleq>subleq_asm -i Test.sasm -o Test.code -l Test.list
C:\AlanX\TestSubleq>Subleq_Int -i Test.code -l Test.list -t Test.trace

SUBLEQ Interpreter

6   <- Entered integer
2   <- Entered integer
12  -> Integer multiplication (=6*2)
3   -> Integer division (=6/2)
0   -> Integer remainder (=6%2)


Interpreter finished

C:\AlanX\TestSubleq>pause
Press any key to continue . . .
```
The Subleq code produced is too long to list here.
AlanX
Translated the Pascal Code to C Code
06/22/2017 at 16:18 • 0 comments
C Code
The code has been translated. Quite a bit of clean up:
- General regrouping of code.
- Fixed memory reference error (mixed up low/high byte order).
- Simplified the symbol table.
- Added code to export the code being tokenise to "stderr"
- Reworked the error reporting routine.
- Fixed up the EOF problem.
- Reordered the constants and pushed them out to a header file.
- Pushed out the assembler and interpreter code to include files.
- Changed the CPU model.
- Added some options to the command line.
The current version of the tool tranin has been uploaded.
Understanding the Code
The compiler code in most of the minimalist program codes I reviewed are very similar. Even to the point where missing instructions (i.e.">") and constant names are the same (i.e. SETLSS rather than SETLT and SETLEA instead of SETLE).
A good site for compiler construction (but not complete) is http://zserge.com/blog/cucu-part1.html. I will likely refer back to this site for when I add functions to my code.
A good pdf on compiler construction (by Jack W. Crenshaw) is http://compilers.iecc.com/crenshaw/ (I have uploaded is book compiler.pdf).
But the place to go is "Recursive-Descent Parsing" Chapter 6.6 of the AWK book pp 147-152. (I have uploaded a pdf of the book and the programs from Chapter 6).

The NewCPU Model

The CPU model for nearly all the compiler I have reviewed is:
- Ax: Accumulator register
- Bx: Secondary register
- Fx: Flag register
- SP: Stack Pointer
- DP: Data Pointer (Simple Compiler)
- 16 bit data and address word width
Usually all data is on the stack. The Simple Compiler uses a separate Data Pointer (DP) (i.e. modelling a Pascal style "heap memory"). I will keep this method.
The the assembler code therefore revolves around Ax and Bx:
- MOV Ax, Bx
- MOV Bx, Ax
- PUSH Ax
- PUSH Bx (added)
- PUSH Fx (added)
- POP Ax
- POP Bx
- POP Fx (added)
- MOV AX, IMM
- MOV AX, [ADDR] (uses DP)
- MOV [ADDR], AX (uses DP)
Two basic jumps:
- JMP [ADDR]
- JZ [ADDR]
The remainder maps the language symbols (i.e. '+', '-', '*', '/', '=',' <>','<','<=','>','>=') :
- ADD Ax, Bx
- SUB Ax, Bx
- IMUL Ax, Bx (software)
- IDIV Ax, Bx (software)
- SETEQ
- SETNE
- SETLT (added)
- SETLE (added)
- SETGT
- SETGE
The set commands set (for true) or reset (for false) the AX depending on the Flag register. True is defined as Ax=1 and false is defined as Ax=0.
The following commands set the flag register (Fx):
- OR Ax, Ax (Sets the Flags based on the Ax)
- XOR Ax, Ax (Clears the Ax and sets flags)
- CMP Ax, Bx (Sets flags based on Ax-Bx)
Finally there is a HALT and a couple of I/O commands:
- HALT (same as reset)
- wrtAx (converts an integer to characters before printing)
- wrtLn (write a new line)
- rdAx (added, reeads characters and converts to integer)
Language Construct
Languages can me modelled in Backus–Naur Form (BNF). Before extending the language I need to map the existing language. I use the syntax from the AWK handbook for the language construct:
```
// Factor         -> Identifier
//                -> Integer
//                -> ( BoolExpression )
// Term           -> Factor
//                -> Factor * Factor+
//                -> Factor / Factor+
// Expression     -> Term
//                -> + Term
//                -> - Term
//                -> Term + Term+
//                -> Term - Term+
// BoolExpression -> Expressiom
//                -> Expression = Expression
//                -> Expression <> Expression
//                -> Expression < Expression
//                -> Expression <= Expression
//                -> Expression > Expression
//                -> Expression >= Expression
// Statement      -> Begin
//                -> While
//                -> If
//                -> Write
//                -> Read
//                -> Assignment
// Begin          -> Statement    \\ This is where a statement separator (";") should added!
//                -> End
// While          -> BoolExpression Statement
// If             -> BoolExpression then Statement
//                -> BoolExpression then Statement "else" Statement
// Write          -> BoolExpression , BoolExpression+
// Read           -> Input
// Assignment     -> Identifier = BoolExpression
```
Note: the "+" at the end of a line means it can be repeated.
Its okay but I think the program should begin with a "begin" and end with an "end". The Statement procedure needs to be modified for this. There probably needs to be a semi-colon (";") as a statement separator as well. I think it might get messy without a statement delimiter later.
Bit-wise operations (i.e. shl, shr, or, and, xor, not) are also missing.
The parser code is complicated because the procedures use the CPU stack to communicate between procedures. But once you realise this it get a bit easier to read.
Fixed the EOF bug in the "BeginState" procedure and some more code cleanup to allow export of the tokeniser data. Pretty happy with the code as it stands and I now understand it fully.
AlanX

Code Generation

The Plan

PL/0 Completed?

Next

PLZero (PL/0) Compiler

Language Structure

A Not Very Happy Interpreter

Subleq Efficiency

Code Efficiency

Compiler Subleq Backend

Integer Minimum Value

Working Subleq Backend

AlanX

C Code

Understanding the Code

The NewCPU Model

Language Construct