Close

Speeding up the Tiny Basic interpreter

A project log for CPU running Basic

Celebrating 50 years of Tiny Basic by implementing a custom micro-coded 16/32-bit CPU that executes it directly (up to 100MHz)

zpekiczpekic 04/01/2026 at 00:120 Comments

The original Tiny Basic interpreter relies on the BC (String match branch) instruction to do the parsing of keywords. After the 2-byte statement number (which is binary 16-bit value) the rest of Basic statement is a string, ending with CR (0x13). The "spec" for BC sounds a bit convoluted:

BC a "xxx"   80xxxxXx-9FxxxxXx  String Match Branch.                                The ASCII character string in the IL
following this opcode is compared to the string beginning with the
current position of the BASIC pointer, ignoring blanks in the BASIC
program. The comparison continues until either a mismatch is found,
or an IL byte is reached with the most significant bit set to one.
This is the last byte of the string in the IL, and it is compared as
a 7-bit character; if equal, the BASIC pointer is positioned after
the last matching character in the BASIC program and the IL program
continues with the next instruction in sequence. Otherwise the BASIC
pointer is not altered and the low five bits of the Branch opcode
are added to the IL program counter to form the address of the next
IL instruction. If the strings do not match and the branch offset is
zero an error stop occurs.

But seen in the code it is clear - compare with keyword, and if there is a match code to implement the statements continues after BC, otherwise jump to next BC which checks for next keyword:

 112 :RUN  BC CLER "RUN"
 113       J XEC
 114 .
 115 :CLER BC REM "CLEAR"
 116       MT
 117 .
 118 :REM  BC DFLT "REM"
 119       NX
 120 .
 121 :DFLT BV *            NO KEYWORD...
 122       BC * "="        TRY FOR LET
 123       J LET           IT'S A GOOD BET.

At the end of the chain, assume the Basic statement was a <variable>=<expression> (assumed LET)

While simple the problem is that the execution time grows with number of keywords implemented. Clever ordering of checks (e.g. PRINT or IF - very common, before CLEAR - which is rare) can speed up a bit but it remains a slow chain:

100 LET x=100 ... executes "fast" because LET is first keyword to be checked

100 x=100 executes much slower because x is recognized as a variable only after checking for LET, PRINT, IF, INPUT, POKE, GOTO, GOSUB, RETURN, REM, FOR, NEXT, CLEAR, NEW all failed.  

Obviously, there is a much faster way - instead of n IFs, execute a single "switch" which checks the first alpha (A-Z) character in the Basic statement. If this is "x" we know right away it is implied LET because no statement starts with X. From the faster interpreter:

//STATEMENT EXECUTOR
:STMT    SA TRYQ        //switch on alpha char, or go to TRYQ label if not
    J LETLP        //a
    J LETLP        //b
    J C_BEG        //clear
    J LETLP        //d
    J E_BEG        //end
    J F_BEG        //for
    J G_BEG        //goto, gosub
    J LETLP        //h
    J I_BEG        //if, input
    J LETLP        //j
    J LETLP        //k
    J L_BEG        //let, list
    J LETLP        //m
    J N_BEG        //next, new
    J LETLP        //o
    J P_BEG        //print, poke, push, pop
    J LETLP        //q
    J R_BEG        //return
    J LETLP        //s
    J LETLP         //t
    J LETLP        //u
    J LETLP        //v
    J LETLP        //w
    J LETLP        //x
    J LETLP        //y
    J LETLP        //z

:TRYQ    BC TRYARR, "?"        //? is short form of PRINT in many Basic dialects
    J P0            //yes, continue with PRINT statement
:TRYARR J ARRAY

SA is the new TBIL instruction (2 bytes, 0x28 <defaultcaseoffset>) implemented in microcode:

        ///////////////////////////* EXTENDED INSTRUCTION */////////////////////////////
        .map 0x28;                                    // SA (Switch on Alpha character)
        ////////////////////////////////////////////////////////////////////////////////
        traceString 4;                                // Trace mnemonic - TODO
        skipSpaces();
        MDR <= ToUpper, if MDR_IS_ALPHA then next else sa_exit;    // test covers lowercase too, so can execute conversion and check in parallel    
        IL_PC <= pc_plus_off2alpha, goto fetch;
sa_exit:    IL_PC <= pc_plus_off8, goto fetch;

I measured the effect on the Mandelbrot test program (more about this test here). The speedup was modest (about 4%) but bigger advantage is that future keyword additions wont increase the interpretation times as much as a chain of BC would. 

Run time in seconds, 100MHz Basic CPU clock, set parameters W=79, H=79, M=20TBIL interpreter with BC (chained "ifs")TBIL interpreter with SA (switch on alpha char)
Regular Basic code (use of PRINT, LET everywhere)91.30587.616
"Optimized" Basic code (use of ?, LET only for vars with keywords starting with same alpha char)90.907 (interpolated)87.234

As can be seen, tweaking the Basic code itself does not contribute significantly to speeding up the benchmark program. 

IL (intermediate language) cache

Let's consider the following program:

...
100 FOR i = 4096 TO 8191
110 POKE i, 0
120 NEXT i
...

To execute line 110, interpreter must determine if the statement is PRINT, POKE, or maybe just P=... (implied LET). This of course takes time, esp. if done in a loop. If it would be possible to store away the fact that 110 is a POKE, and that its parameters start with i, ... then on other 4095 iterations time could be saved avoiding quite a bit of parsing. This is the goal of the IL cache.

IL cache is 32-entries deep and 48 bits wide (with some padding):

-- TBIL cache
il_cache_entry <= il_cache(to_integer(unsigned(Lino_index)));		-- IL cache entry pointed to by Lino 
il_cache_valid <= il_cache_v(to_integer(unsigned(Lino_index))) when (cnt_statement = X"00") else '0';
il_cache_hit <= il_cache_valid when (il_cache_tag = Lino_tag) else '0';
il_cache_empty <= '1' when (il_cache_v = X"00000000") else '0';
il_cache_full <= '1' when (il_cache_v = X"FFFFFFFF") else '0';

 32-entries means that the index is 5 bits, and tag 11 bits because it is indexed by 16-bit integer which is the Basic statement (for example 110 = 0x006E = 00000000011:01110)

The IL cache is only active and meaningful in "run mode", when Basic program is running, otherwise the Lino (line number) register contains 0 and each statement would map to cache entry 0 with little effect. 

Sequence to update the cache:

  1. Basic interpreter is modified by adding SX 0 instruction as the first instruction of each Basic command/keyword. SX 0 in original Tiny Basic is a NOP, so could be reused without other side-effect.
  2. When SX 0 is executed, PC_IL (program counter) point to next IL instruction (which is the first instruction of the statement), and BP (Basic pointer) points to first non-blank character after the keyword
  3. Microcode that implements SX takes PC_IL, BP, upper 11 bits of Lino and stores this data into the IL cache, in location determined by lowest 5 bits of Lino (classic cache tag/index)
  4. il_cache_v is a 32 bit flag register - SX 0 also sets the <index> bit in that register to 1, to indicate valid entry

Sequence to read the cache:

  1. SA instruction ("Switch on alpha") first examines the valid bit and match of tag in il_cache with the upper 11 bits of Lino. If both are true, cache entry has been found
  2. If cache hit was detected, IL_PC and BP can be set from il_cache - this means the search sequence to find the keyword is skipped, and also the search for first non-blank in program code (RAM), saving time
  3. If cache hit is not detected, SA instruction execution proceeds as described above

Sequence to clear the cache:

  1. When Lino goes from 0 to non-0 (program execution starts) then il_cache_v 32-bit flag register is reset to 0x00000000. This invalidates all entries in the cache, allowing SX 0 and SA to write to and read from cache.

The il_cache shaves off about  4 to 5% of execution time, depending on the Basic program. Together with 4% saving by introducing the SA instruction as described above, performance boost is about 8%. 

Discussions