The original Tiny Basic interpreter relies on the BC (String match branch) instruction to do the parsing of keywords. After the 2-byte statement number (which is binary 16-bit value) the rest of Basic statement is a string, ending with CR (0x13). The "spec" for BC sounds a bit convoluted:
BC a "xxx" 80xxxxXx-9FxxxxXx String Match Branch. The ASCII character string in the IL following this opcode is compared to the string beginning with the current position of the BASIC pointer, ignoring blanks in the BASIC program. The comparison continues until either a mismatch is found, or an IL byte is reached with the most significant bit set to one. This is the last byte of the string in the IL, and it is compared as a 7-bit character; if equal, the BASIC pointer is positioned after the last matching character in the BASIC program and the IL program continues with the next instruction in sequence. Otherwise the BASIC pointer is not altered and the low five bits of the Branch opcode are added to the IL program counter to form the address of the next IL instruction. If the strings do not match and the branch offset is zero an error stop occurs.
But seen in the code it is clear - compare with keyword, and if there is a match code to implement the statements continues after BC, otherwise jump to next BC which checks for next keyword:
112 :RUN BC CLER "RUN"
113 J XEC
114 .
115 :CLER BC REM "CLEAR"
116 MT
117 .
118 :REM BC DFLT "REM"
119 NX
120 .
121 :DFLT BV * NO KEYWORD...
122 BC * "=" TRY FOR LET
123 J LET IT'S A GOOD BET.
At the end of the chain, assume the Basic statement was a <variable>=<expression> (assumed LET)
While simple the problem is that the execution time grows with number of keywords implemented. Clever ordering of checks (e.g. PRINT or IF - very common, before CLEAR - which is rare) can speed up a bit but it remains a slow chain:
100 LET x=100 ... executes "fast" because LET is first keyword to be checked
100 x=100 executes much slower because x is recognized as a variable only after checking for LET, PRINT, IF, INPUT, POKE, GOTO, GOSUB, RETURN, REM, FOR, NEXT, CLEAR, NEW all failed.
Obviously, there is a much faster way - instead of n IFs, execute a single "switch" which checks the first alpha (A-Z) character in the Basic statement. If this is "x" we know right away it is implied LET because no statement starts with X. From the faster interpreter:
//STATEMENT EXECUTOR
:STMT SA TRYQ //switch on alpha char, or go to TRYQ label if not
J LETLP //a
J LETLP //b
J C_BEG //clear
J LETLP //d
J E_BEG //end
J F_BEG //for
J G_BEG //goto, gosub
J LETLP //h
J I_BEG //if, input
J LETLP //j
J LETLP //k
J L_BEG //let, list
J LETLP //m
J N_BEG //next, new
J LETLP //o
J P_BEG //print, poke, push, pop
J LETLP //q
J R_BEG //return
J LETLP //s
J LETLP //t
J LETLP //u
J LETLP //v
J LETLP //w
J LETLP //x
J LETLP //y
J LETLP //z
:TRYQ BC TRYARR, "?" //? is short form of PRINT in many Basic dialects
J P0 //yes, continue with PRINT statement
:TRYARR J ARRAY
- If letter has no statement, go directly to implied LET
- if letter has single statement, go directly to execution (e.g. RETURN)
- if letter has multiple statements, execute a much shorter BC chain (e.g NEXT, NEW)
- If no alpha letter, jump to default label. In case of Tiny Basic only 2 valid cases exist then: ? (abbreviation for PRINT, or @ to assign to element of built in array e.g. @(<index>) = <expression>
SA is the new TBIL instruction (2 bytes, 0x28 <defaultcaseoffset>) implemented in microcode:
///////////////////////////* EXTENDED INSTRUCTION */////////////////////////////
.map 0x28; // SA (Switch on Alpha character)
////////////////////////////////////////////////////////////////////////////////
traceString 4; // Trace mnemonic - TODO
skipSpaces();
MDR <= ToUpper, if MDR_IS_ALPHA then next else sa_exit; // test covers lowercase too, so can execute conversion and check in parallel
IL_PC <= pc_plus_off2alpha, goto fetch;
sa_exit: IL_PC <= pc_plus_off8, goto fetch;
I measured the effect on the Mandelbrot test program (more about this test here). The speedup was modest (about 4%) but bigger advantage is that future keyword additions wont increase the interpretation times as much as a chain of BC would.
| Run time in seconds, 100MHz Basic CPU clock, set parameters W=79, H=79, M=20 | TBIL interpreter with BC (chained "ifs") | TBIL interpreter with SA (switch on alpha char) |
| Regular Basic code (use of PRINT, LET everywhere) | 91.305 | 87.616 |
| "Optimized" Basic code (use of ?, LET only for vars with keywords starting with same alpha char) | 90.907 (interpolated) | 87.234 |
As can be seen, tweaking the Basic code itself does not contribute significantly to speeding up the benchmark program.
IL (intermediate language) cache
Let's consider the following program:
... 100 FOR i = 4096 TO 8191 110 POKE i, 0 120 NEXT i ...
To execute line 110, interpreter must determine if the statement is PRINT, POKE, or maybe just P=... (implied LET). This of course takes time, esp. if done in a loop. If it would be possible to store away the fact that 110 is a POKE, and that its parameters start with i, ... then on other 4095 iterations time could be saved avoiding quite a bit of parsing. This is the goal of the IL cache.
IL cache is 32-entries deep and 48 bits wide (with some padding):
-- TBIL cache
il_cache_entry <= il_cache(to_integer(unsigned(Lino_index))); -- IL cache entry pointed to by Lino
il_cache_valid <= il_cache_v(to_integer(unsigned(Lino_index))) when (cnt_statement = X"00") else '0';
il_cache_hit <= il_cache_valid when (il_cache_tag = Lino_tag) else '0';
il_cache_empty <= '1' when (il_cache_v = X"00000000") else '0';
il_cache_full <= '1' when (il_cache_v = X"FFFFFFFF") else '0';32-entries means that the index is 5 bits, and tag 11 bits because it is indexed by 16-bit integer which is the Basic statement (for example 110 = 0x006E = 00000000011:01110)
The IL cache is only active and meaningful in "run mode", when Basic program is running, otherwise the Lino (line number) register contains 0 and each statement would map to cache entry 0 with little effect.
Sequence to update the cache:
- Basic interpreter is modified by adding SX 0 instruction as the first instruction of each Basic command/keyword. SX 0 in original Tiny Basic is a NOP, so could be reused without other side-effect.
- When SX 0 is executed, PC_IL (program counter) point to next IL instruction (which is the first instruction of the statement), and BP (Basic pointer) points to first non-blank character after the keyword
- Microcode that implements SX takes PC_IL, BP, upper 11 bits of Lino and stores this data into the IL cache, in location determined by lowest 5 bits of Lino (classic cache tag/index)
- il_cache_v is a 32 bit flag register - SX 0 also sets the <index> bit in that register to 1, to indicate valid entry
Sequence to read the cache:
- SA instruction ("Switch on alpha") first examines the valid bit and match of tag in il_cache with the upper 11 bits of Lino. If both are true, cache entry has been found
- If cache hit was detected, IL_PC and BP can be set from il_cache - this means the search sequence to find the keyword is skipped, and also the search for first non-blank in program code (RAM), saving time
- If cache hit is not detected, SA instruction execution proceeds as described above
Sequence to clear the cache:
- When Lino goes from 0 to non-0 (program execution starts) then il_cache_v 32-bit flag register is reset to 0x00000000. This invalidates all entries in the cache, allowing SX 0 and SA to write to and read from cache.

The il_cache shaves off about 4 to 5% of execution time, depending on the Basic program. Together with 4% saving by introducing the SA instruction as described above, performance boost is about 8%.
zpekic
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.