Code Generation
I have been testing some fairly big programs. Another grammar fault I found is that you cannot assign negative constants. Fixed.
Here is the grammar at the moment:
PL/0 Grammar: program = block "." . block = [ "const" ident "=" ["-"] number {"," ident "=" number} ";"] // Add negative constants [ "var" ident {"," ident} ";"] [ "var" ident "[" number "]" {"," ident "[" number "]"} ";"] // Add arrays { "procedure" ident ";" block ";" } statement. statement = [ ident ":=" expression ident "[" expression "]" ":=" expression // Add arrays ident ":=" string // Add string assignmnet to ident ident "[" expression "]" ":=" string // Add string assignmnet to arrays | "call" ident | "?" ident | "write" ident | "putc" ident // Add "write" (int) and "putc" (char) | "!" expression | "read" expression | "getc" expression // Add "read" (int) and "getc" (char) | "begin" statement {";" statement } [";"] "end" // Add optional ";" before "end" | "if" condition "then" statement [ "else" statment ] // Add "else" | "while" condition "do" statement ]. condition = "odd" expression | expression ("="|"#"||"!="||"<>"|"<"|"<="|">"|">=") expression . // Add "!+" and "<>" expression = [ "+"|"-"] term { ("+"|"-") term}. term = factor {("*"|"/"|"%"|"mod") factor}. // Add "%" and "mod" factor = ident | ident "[" expression "]" | number | "(" expression ")". // Add arrays Add comments (to tokeniser): // ... EOL { ... } /* ... */ (* ... *)
Still have not fixed the unary minus fault but its not really an issue, just put parentheses around it.
The above grammar may make you eyes roll but you need it to code a compiler.
The Plan
The plan for building the code generator is to rework the interpreter:
- Change the stack to downward growth from upward growth (done).
- Swap out the P-Code instructions for pseudo CPU OpCode (done).
- Convert the interpreter into a OpCode lister (done).
- Covert the opcodes to Subleq.
The reason for doing this to the interpreter is that changes can be incrementally checked.
The CPU I am modelling is as minimal as practical. The current pseudo OpCodes are stack focused. Here are the registers:
Registers | Comment |
Ax | Primary Register |
Bc | Secondary Register |
Tx | Temporary Register (used for swap) |
PC | Program Counter (?) |
SP | Stack Pointer (downward growing) |
BP | Base Pointer |
The program counter (PC) is not actually a register but a subroutine return address. It is set by the compiler at compile time, that is the real PC register is not actually accessed during execution.
The stack and register pseudo opcodes:
Stack operations: | Register moves: |
FromStack (Ax=stack[Ax] | mov Ax,Bx |
ToStack (stack[Ax]=Bx) | mov Ax,SP |
push Ax | mov SP,Ax |
pop Ax | mov Ax,BP |
pop Bx | mov BP,Ax |
mov PC,Ax | |
Move Immediate (literal): | mov Ax,PC |
mov Bx,Imm | mov Tx,Ax |
mov Bx,Tx |
I have favoured register moves over stack operations because stage operations are expensive. Here are the remaining pseudo opcodes:
Increment/Decrement: | Mathematical Operations: | Boolean Operations: |
dec Ax | sub Ax,Bx | odd Ax |
inc Ax | add Ax,Bx | eq Ax,Bx |
mul Ax,Bx | lt Ax,Bx | |
Jump: | div Ax,Bx | le Ax,Bx |
jmp | mod Ax,Bx | ne Ax,Bx |
jz | gt Ax,Bx | |
ge Ax,Bx |
Add four input/outputs instructions:
Input/Output: | Code: |
read (integer) | read |
get (char) | getc |
write (integer) | write |
put (char) | putc |
Recoded the pseudo opcode interpreter into a code lister (done). Called it PCode2OpCode.
I have been reviewing the OpCode output and I have used indirect addressing with the Ax register which of course is not a valid instruction for the x86 instruction set. It is pseudo opcodes after all. I also changed the format to make it easier to import, for example I use "mov_Ax,Bx" rather than mov Ax, Bx" and I dropped the "L" prefix for address labels.
AlanX
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.