Language Structure
Programming languages can the presented/described in format called Backus-Naur Form (BNF). Here is a proper example from "Let's Build a Compiler!" Jack W. Crenshaw:
<b-expression> ::= <b-term> [<orop> <b-term>]* <b-term> ::= <not-factor> [AND <not-factor>]* <not-factor> ::= [NOT] <b-factor> <b-factor> ::= <b-literal> | <b-variable> | <relation> <relation> ::= | <expression> [<relop> <expression] <expression> ::= <term> [<addop> <term>]* <term> ::= <signed factor> [<mulop> factor]* <signed factor>::= [<addop>] <factor> <factor> ::= <integer> | <variable> | (<b-expression>)Here is what I decoded from Simple Compiler in a "bnf" like form:
Factor -> Identifier -> Integer -> ( BoolExpression ) Term -> Factor -> Factor * Factor+ -> Factor / Factor+ Expression -> Term -> + Term -> - Term -> Term + Term+ -> Term - Term+ BoolExpression -> Expressiom -> Expression = Expression+ -> Expression <> Expression+ -> Expression < Expression+ -> Expression <= Expression+ -> Expression > Expression+ -> Expression >= Expression+
(Note, my decoded format is as "coded" and is upside down compared to the bnf format.)
Okay what does all this mean? Basically the minimum number of precedence levels (from "Let's Build a Compiler!" Jack W. Crenshaw) is 8:
Level | Syntax Element | Operator |
0 | factor | literal, variable |
1 | signed factor | unary minus |
2 | term | *, / |
3 | expression | +, - |
4 | b-factor | literal, variable, relop |
5 | not-factor | NOT |
6 | b-term | AND |
7 | b-expression | OR, XOR |
The Simple compiler only has 4, so it was doomed from the start. Even without the bit operations it still needed 5 levels. What failed was the unary minus. This is why I had to put brackets around the "-5" the the code:
begin
A=4;
B=1;
if A<B then
write A*10<b;
else begin
write 1<B*10;
write B*10, (a+b)*(-5);
end
C=-1;
write C;
end
The original coder did try with "Expression -> - Term" as the C=-1 does work, but a true unary minus needs a higher precedence than "*" for "(a+b)*-5" to work.
So I need at added "SignedFactor" between "Factor" and "Term" based on Crenshaw's bnf, and remove the "Expression -> + Term" and "Expression -> - Term":
Factor -> Identifier
-> Integer
-> ( BoolExpression )
SignedFactor -> Factor
-> + Factor
-> - Factor
Term -> SignedFactor
-> SignedFactor * SignedFactor+
-> SignedFactor / SignedFactor+
Expression -> Term
-> Term + Term+
-> Term - Term+
BoolExpression -> Expressiom
-> Expression = Expression+
-> Expression <> Expression+
-> Expression < Expression+
-> Expression <= Expression+
-> Expression > Expression+
-> Expression >= Expression+
It took a little while to get my head around the internals of the recursive procedure calls but eventually I got it to work:
void SignedFactor(void) { char op; op=tTok; if ((op==_plus)||(op==_minus)) { GetNextToken(); } Factor(); if (op==_plus) { // Nothing to do } else if (op==_minus) { Emit(_movBxAx); Emit(_xorAxAx); Emit(_subBx); } }In the above code:
- Th global "tTok" is saved to local "op" in case it is required for later use (other procedures may "eat it"). Basically parser procedures "eat" the token if it belongs to them.
- Next, if the token is for this procedure (i.e. it is a "+" or a "-") then "eat" the token by getting the next token (GetNextToken()).
- Okay, the procedure has to wait for the "Factor" that will be operated on to come back (i.e. pass through to Factor()).
- Factor() has come back and the result will be in "Ax". If "op" is a "+" or a "-" then do the operation (emit the code).
- Okay all done, return to the calling procedure (Term()).
Here is the current language structure for Simple Compiler:
Factor -> Identifier -> Integer -> ( BoolExpression ) SignedFactor -> Factor -> + Factor -> - Factor Term -> SignedFactor -> SignedFactor * SignedFactor+ -> SignedFactor / SignedFactor+ -> SignedFactor % SignedFactor+ Expression -> Term -> Term + Term+ -> Term - Term+ BoolExpression -> Expression -> Expression = Expression+ -> Expression <> Expression+ -> Expression < Expression+ -> Expression <= Expression+ -> Expression > Expression+ -> Expression >= Expression+ Statement -> Begin -> While -> If -> Write -> Read -> Assignment Begin -> Statement -> End While -> BoolExpression Statement If -> BoolExpression then Statement -> BoolExpression then Statement "else" Statement Write -> BoolExpression , BoolExpression+ Read -> Input Assignment -> Identifier = BoolExpressionThe only thing not shown above is that the semicolon (i.e. ";") is treated as a white space.
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.
Crenshaw's (expression) installments are not so straight forward. In chapter 6 initial implementations have to be adjusted and renamed several times and it is not always clear what is meant by "rename the previous one to...". Then, in chapter 10 things are reorganized and renamed again. Only by comparing the chapters and copy-pasting the final implementations one can derive a correctly working expression library, which can be used as a guide: https://sharpbasic.com/pascal/exp.pas
Are you sure? yes | no