Project | One kilobyte Tiny BASIC for the 8080 and Z80

« Back to project details Sort by:

Operators and precedence
02/11/2024 at 07:33 • 0 comments

A discussion about this project on a forum on 6502.org led me to rewrite the code for the comparison operators because they didn’t handle edge cases correctly. E.g. I had implemented a<b as a<=b-1. This doesn’t work if b=-32768 because -32768-1 wraps round to 32767.

I rewrote the code to handle this correctly. I also came across another more significant error where I had made incorrect assumptions about the behaviour of the carry flag. It took 6 bytes to fix both of these problems: now <= and >= test for equality first, then fall through to < and >. Comparison is now done using SUB and SBB instructions rather than RST_NegateDE and DAD.

The way that I have implemented operator precedence is a problem. In my implementation no two operators can have the same precedence, whereas in other BASICs operators can have equal precedence and are evaluated in left to right order. E.g. in some other BASICs + and - have the same precedence, and * and / have the same precedence. My operator precedence code is 5 bytes long. Changing it to allow some operators to have equal precedence would increase this. Because BASIC programmers don’t usually use more than one comparison operator in an expression, and because having - as higher precedence than + has not seemed problematic so far, I think it is only * and / that may cause problems when running existing BASIC programs (I discovered this when running Gordon Henderson’s Mandelbrot set program that runs in Tiny BASICs). E.g. 3*4/3 evaluates as 3*(4/3)=3 rather than (3*4)/3=4.

I notice that some BASICs don’t have equal precedence for * and /. ZX80 integer BASIC gives * higher precedence than /, whereas in ZX81 BASIC they are equal precedence. So I could do the same and make * higher precedence than / at no cost (just rearranging the code). Or I could try to be consistent with other Tiny BASICs. I think that this could be done at a cost of 2 bytes by applying a bit wise OR operation to the operator code before comparing it with the operator on the stack during evaluation, but exactly how this will work will depend on the operator function address. This is the kind of nasty dependency that occurs if trying to squeeze as much functionality as possible into 1K. I will leave this issue until I have thoroughly tested all other behaviour of operators and expressions, so that I can be reasonably sure that address will remain fixed.
Error messages
02/04/2024 at 10:16 • 0 comments

Rather than using up memory on lengthy error messages, 1K BASIC displays the address (in decimal) after the call instruction that called the error subroutine. This means that error codes change during development, but has the advantage that you can distinguish one kind of error from another. For example at the time of writing this ‘E706’ means an unrecognised token was encountered during parsing.
Bug fix
01/28/2024 at 09:20 • 0 comments

I noticed that the @ variable name wasn’t displaying correctly (it was showing as M). This was because the ‘end of TokenList’ marker wasn’t being found. This is a byte with value FFh immediately following the list of tokens. If this byte is encountered when looking up a token, it means that the token can’t be found, and during LIST this means that it must be a variable name token because all other possibilities (line number, integer, string, end of program marker) have already been checked.
But the FFh marker is only recognised if it is 2 bytes after the last byte of the last token. I had it one byte before this. Because there is an FFh a few bytes further on, the error only affected @. There is a sequence 00 CD before FFh is encountered - 00 is the token value for @ and CDh=‘M’ or 80h, which is why it was displayed as M.
The problem was fixed by adding a second FFh at the end of TokenList, then deleting one of the two free bytes at address 014Dh (because it is important that TokenList ends at address 0222h). I hope to find a better fix that doesn’t cost a byte. I also noticed there is a potential 1-byte saving in between List_Token_Loop and List_Token_String_Loop.
The array @
01/27/2024 at 09:12 • 0 comments

There is a single array @, and it starts at the first byte of free memory after the parse area. The ^ variable stores the address of the start of the array. If I input LANDER.BAS (the classic LUNAR LANDER game) then PRINT ^ gives 1502. This means that Lunar Lander occupies 414 bytes (1502 - 1024 (start of RAM) - 64 (variable storage area)). Free memory is 2048 - 1502 = 546, but some of this is used by INPUT and for the stack.
There is no range checking on the array index, and the index can be negative. This means that @ can be used to access the whole of memory. This can be used in lieu of PEEK and POKE. PRINT @(-(^+1)/2) shows the 16-bit value at address 0000.
Expression evaluation
01/26/2024 at 07:34 • 0 comments

When I first wrote the expression evaluator I used Dijkstra’s shunting yard algorithm for handling operator precedence and bracketed subexpressions. I changed approach later on when writing code for the RND, USR and ABS functions because I realised the the evaluator needed to be recursive so that the function parameter can be evaluated. In the new approach recursion is also used to evaluate bracketed subexpressions. Within a bracketed subexpression a stack-based evaluator is used, but the top of the evaluation stack is stored in DE - this means that evaluation of simple expressions (constants, variables) doesn’t need to use the physical stack at all.

To the evaluator, an expression is just an alternating sequence of values and operators (because function calls and bracketed subexpressions are handled recursively, they are just like values to the evaluator).

When an operator is encountered, there will be a value in DE, and the stack will either be empty if this is the first operator, or will contain an operator. An empty stack is detected by looking for a value of zero in the high-order byte of the 16-bit value on the stack (because the evaluator is always called from page zero). If there is an operator on the stack and its precedence is greater than the operator encountered, the function corresponding to the operator on the stack is called (leaving a value in DE) and we loop back to the beginning of this paragraph. Otherwise we push DE and the operator onto the stack and look for the next value. When we reach the end of the expression we evaluate every operator remaining on the stack.

At the time of writing this, the evaluator occupies addresses 38h to 7Ch and BCh to CFh, a total of 88 bytes. This doesn’t include any of the code called by the evaluator to evaluate operators or RND, USR, ABS, or code getting the location of a variable.

Space-saving implementation of operators

01/25/2024 at 08:53 • 0 comments

The + - = <> < <= > >= operators are all implemented using the following 23 bytes of code:

GTSub:
  ; Swap operands and fall through
  XCHG
LTSub:
  DCX D
LTESub:
  ; Swap operands and fall through
  XCHG
GTESub:
  RST_NegateDE
  DAD D
	
  MOV A,H
  RAL
	
  DB 11h; LXi D opcode to swallow next byte

EqualSub:
  RST_CompareHLDE ; returns Z iff HL=DE
  CMC
  DB 3eh ; MVI A opcode to swallow next byte
	
NotEqualSub:
  RST_CompareHLDE; returns Z iff HL=DE
  LXI D,1
  RNC
  DCX D
  RET

AddSub:
  DB 3eh ; opcode for MVI A, to eat next byte
SubSub:
  RST_NegateDE
  ;Add DE to HL and keep in DE
  DAD D
  XCHG
	
  RET

This uses a lot of code sharing, and saves space at the expense of speed. The order of the operator …Sub: labels corresponds to operator precedence, so if these are all in the same 256 byte page then the low order byte of the label address effectively encodes operator precedence.

Program storage, parsing, editing
01/25/2024 at 01:46 • 0 comments

1K Tiny BASIC is tokenised, unlike a lot of other Tiny BASICs. Tokens are either line numbers, integers, strings, variable names, or keywords (including operators and syntactic punctuation), or the special ‘syntax error’ token which indicates that a syntax error occurred. For statements, operators, and some other tokens, the token value is the low order byte of the address of the subroutine corresponding to that token. For operators, the value also increases with increasing operator precedence - this means that the order in which operator subroutines occur in memory is important.

Programs are stored as a simple sequence of tokens in memory. LIST simply has to traverse the sequence and print the string corresponding to each token. GOTO and GOSUB look for the target line by starting from the start of the program and scanning forwards until it is found (and produce an error message if not found).

Parsing happens in-place in memory immediately after the current last line of the program. As soon as a sequence of entered characters is recognised as a token, it is converted into a token. This means there is no need for a separate input buffer elsewhere in memory.

After a line has been parsed it is either executed (if it doesn’t start with a line number), or moved to the right place in the program with a triple-reversal memory rotate algorithm implemented in 27 bytes. This same algorithm is used to delete a line if a line number is entered by itself.

The INPUT statement shares the same parser as is used when entering programs, except that a separate area of memory is used. This means that the input statement will accept expressions as well as integer constants.
RND function
01/22/2024 at 23:53 • 0 comments
The RND function is based on an implementation of the XORSHIFT algorithm that I found here. The algorithm is described in this paper.
The core of it is this 20 byte routine:
```
LHLD RNG_SEED
MOV A,H
RAR
MOV A,L
RAR
XRA H
MOV H,A
MOV A,L
RAR
MOV A,H
RAR
XRA L
MOV L,A
XRA H
MOV H,A
SHLD RNG_SEED
```
The middle 14 bytes of this code are instructions that operate on A,H,L and flags. There are about 45 such instructions.
I wonder whether it would be possible to search the space of all 13-byte or shorter sequences of these instructions to look for a shorter RNG. It’s a large search space, but it could be pruned by omitting combinations of instructions that make no change of state or that cause information loss or that erase the effect of a preceding instruction, so e.g. MOV H,A could not be followed by MOV H,A (no change of state) or MOV A,H (no change of state) or MOV L,A (information loss) or MOV H,L (erases effect).
The search would count how many times the sequence has to be executed before HL returns to its original state. Sequences that give a cycle close to 65536 would be RNG candidates. In addition to RNG candidates, the search would throw up trivial things like INX H all by itself, which would be rejected by an RNG tester.
Current status
01/04/2024 at 21:25 • 0 comments
The 1K BASIC interpreter is in a working state, with the following caveats:
- Division is implemented using repeated subtraction, so it is slow. The RND function is slow for the same reason. I hope to replace the division subroutine with a faster version sometime.
- The syntax checker doesn’t give an error if there are extra letters after a keyword e.g. PRINTIFY will work just as well as PRINT.
It runs a couple of example programs okay (the classic ‘Lunar lander’ and ‘Reverse’ games). I’m still working on a comprehensive automatic regression test suite.

The code is 1012 bytes long - 12 bytes short of 1K. Some of these free bytes would be used up when porting PutChar, because it is likely that a ‘ready for output’ test will be needed before outputting a character.
Syntax and semantics
01/04/2024 at 20:53 • 0 comments
The following statements are supported:
- PRINT <expression>
- LET <var> = <expression>
- IF <expression> <statement>
- GOTO <expression>
- GOSUB <expression>
- RETURN
- INPUT <var>
- FOR <var> = <expression> TO <expression> [STEP <expression>]
- NEXT
- END
and additionally, these commands can be issued if not prefixed by a line number:
- LIST
- RUN
- NEW
<var> is a letter A-Z. A single array called @ is supported, its elements are @( <expression> ). The array uses all remaining memory after the program - it will eventually collide with the stack. Two system variables are available in ^ (address of start of array) and _ (current state of Random Number Generator). ^ can be used to work out how much free memory there is by subtracting it from the highest RAM address, and then subtracting 8+however much stack the program is likely to use (a few tens of bytes typically).

Signed 16-bit integer arithmetic is supported, with no overflow detection. Operators are + - * /.

Comparison operators are = <> < > <= >=. The result of a comparison is 1 if true, 0 if false.

Three functions taking a single parameter are available: ABS, USR and RND. The USR(x) function calls the address x. The RND(x) function returns a random number between 0 and x-1.

Program editing features are minimal - lines can be deleted by typing the line number by itself. Unlike in most BASIC implementations, overwriting a line is not permitted, a line must be deleted and then re-entered.

Prev Next