Translated the Pascal Code to C Code

C Code

The code has been translated. Quite a bit of clean up:

General regrouping of code.
Fixed memory reference error (mixed up low/high byte order).
Simplified the symbol table.
Added code to export the code being tokenise to "stderr"
Reworked the error reporting routine.
Fixed up the EOF problem.
Reordered the constants and pushed them out to a header file.
Pushed out the assembler and interpreter code to include files.
Changed the CPU model.
Added some options to the command line.

The current version of the tool tranin has been uploaded.

Understanding the Code

The compiler code in most of the minimalist program codes I reviewed are very similar. Even to the point where missing instructions (i.e.">") and constant names are the same (i.e. SETLSS rather than SETLT and SETLEA instead of SETLE).

A good site for compiler construction (but not complete) is http://zserge.com/blog/cucu-part1.html. I will likely refer back to this site for when I add functions to my code.

A good pdf on compiler construction (by Jack W. Crenshaw) is http://compilers.iecc.com/crenshaw/ (I have uploaded is book compiler.pdf).

But the place to go is "Recursive-Descent Parsing" Chapter 6.6 of the AWK book pp 147-152. (I have uploaded a pdf of the book and the programs from Chapter 6).

The NewCPU Model

The CPU model for nearly all the compiler I have reviewed is:

Ax: Accumulator register
Bx: Secondary register
Fx: Flag register
SP: Stack Pointer
DP: Data Pointer (Simple Compiler)
16 bit data and address word width

Usually all data is on the stack. The Simple Compiler uses a separate Data Pointer (DP) (i.e. modelling a Pascal style "heap memory"). I will keep this method.

The the assembler code therefore revolves around Ax and Bx:

MOV Ax, Bx
MOV Bx, Ax
PUSH Ax
PUSH Bx (added)
PUSH Fx (added)
POP Ax
POP Bx
POP Fx (added)
MOV AX, IMM
MOV AX, [ADDR] (uses DP)
MOV [ADDR], AX (uses DP)

Two basic jumps:

JMP [ADDR]
JZ [ADDR]

The remainder maps the language symbols (i.e. '+', '-', '*', '/', '=',' <>','<','<=','>','>=') :

ADD Ax, Bx
SUB Ax, Bx
IMUL Ax, Bx (software)
IDIV Ax, Bx (software)
SETEQ
SETNE
SETLT (added)
SETLE (added)
SETGT
SETGE

The set commands set (for true) or reset (for false) the AX depending on the Flag register. True is defined as Ax=1 and false is defined as Ax=0.

The following commands set the flag register (Fx):

OR Ax, Ax (Sets the Flags based on the Ax)
XOR Ax, Ax (Clears the Ax and sets flags)
CMP Ax, Bx (Sets flags based on Ax-Bx)

Finally there is a HALT and a couple of I/O commands:

HALT (same as reset)
wrtAx (converts an integer to characters before printing)
wrtLn (write a new line)
rdAx (added, reeads characters and converts to integer)

Language Construct

Languages can me modelled in Backus–Naur Form (BNF). Before extending the language I need to map the existing language. I use the syntax from the AWK handbook for the language construct:

// Factor         -> Identifier
//                -> Integer
//                -> ( BoolExpression )
// Term           -> Factor
//                -> Factor * Factor+
//                -> Factor / Factor+
// Expression     -> Term
//                -> + Term
//                -> - Term
//                -> Term + Term+
//                -> Term - Term+
// BoolExpression -> Expressiom
//                -> Expression = Expression
//                -> Expression <> Expression
//                -> Expression < Expression
//                -> Expression <= Expression
//                -> Expression > Expression
//                -> Expression >= Expression
// Statement      -> Begin
//                -> While
//                -> If
//                -> Write
//                -> Read
//                -> Assignment
// Begin          -> Statement    \\ This is where a statement separator (";") should added!
//                -> End
// While          -> BoolExpression Statement
// If             -> BoolExpression then Statement
//                -> BoolExpression then Statement "else" Statement
// Write          -> BoolExpression , BoolExpression+
// Read           -> Input
// Assignment     -> Identifier = BoolExpression

Note: the "+" at the end of a line means it can be repeated.

Its okay but I think the program should begin with a "begin" and end with an "end". The Statement procedure needs to be modified for this. There probably needs to be a semi-colon (";") as a statement separator as well. I think it might get messy without a statement delimiter later.

Bit-wise operations (i.e. shl, shr, or, and, xor, not) are also missing.

The parser code is complicated because the procedures use the CPU stack to communicate between procedures. But once you realise this it get a bit easier to read.

Fixed the EOF bug in the "BeginState" procedure and some more code cleanup to allow export of the tokeniser data. Pretty happy with the code as it stands and I now understand it fully.

AlanX

C Code

Understanding the Code

The NewCPU Model

Language Construct

Converting the Simple Compiler to a Subleq Compiler

Discussions

Become a Hackaday.io Member