Close

Basic CPU overview

A project log for CPU running Basic

Celebrating 50 years of Tiny Basic by implementing a custom micro-coded 16/32-bit CPU that executes it directly (up to 100MHz)

zpekiczpekic 11/13/2025 at 00:150 Comments

If Basic CPU was a real IC (maybe one day it will be? :-)) the data sheet would brag the following features:

CPU has "Harvard architecture" - its program (IL code) and data (Basic statements and command line) reside in two independent memory stores. Typically, the former is ROM, latter is RAM, but both can be ROM. 

Here is an improvised sketch of the main CPU components:

Implementation is mostly in one VHDL source file (may refactor later), with few subcomponents:

Main components (some may merit separate project logs, stay tuned)

Micro-coded control unit.

(if not familiar with micro-coding, this project goes into some details, including the MCC compiler and the toolchain)

Consists of:

All of these components are automatically generated by running the 2-pass MCC compiler on the microcode source file. Deep dive into details of micro-coded control here

Code processing components.

Consist of:

Input / output.

ALU.

This component merits own project log because it is fairly complex. It has state (registers R, S, Y and some flags) which allows it to execute stateful operations (e.g. division, BCD to binary and binary to BCD conversions, loading of consecutive 8-bit values into 16-bit etc. 

Data processing components.

Basic code (statements) and Basic command line are data for this CPU. This data store can be up to 64k bytes. The memory map is as follows:

This memory ("core" in some of the Tiny Basic documents) can be accessed from Basic program using USR function which exposes PEEK/POKE functionality. A part of this memory can be reserved for I/O devices (memory mapped I/O in 6502 style)

Main components:

Variables

This is a simple store of 32 entries, each 16 bits. Variable A is mapped to location 1, up to Z, leaving a few unused. Within this component, there is also a "vars_index" 5-bit address register which can be loaded from byte at the stack top (as it contains the ASCII code, 64 is subtracted to form the correct address in the store). The value of variable can come only through T register which can get it from multiple sources:

 update_Vars: process(clk, mb_Vars)
 begin
    if (rising_edge(clk)) then
        case mb_Vars is
--            when Vars_same =>
--                Vars <= Vars;
            when Vars_indexFromExpStack =>
                -- top byte on expressions stack (ExpSTLo) contains twice the upper - case ASCII code of the variable name
                vars_index <= std_logic_vector(unsigned('0' & ExpSTLo(7 downto 1)) - X"40");
            when Vars_T =>
                Vars(to_integer(unsigned(vars_index(4 downto 0)))) <= T;
            when others =>
                null;
        end case;
 end if;
 end process;

while writing this, I looked at the implementation of SV (store variable) instruction:

////////////////////////////////////////////////////////////////////////////////
.map 0x13;                                    // SV (Store Variable)
////////////////////////////////////////////////////////////////////////////////
traceString 37;                                // trace mnemonic
if STACK_IS_EMPTY then ESTACK_ERR;
T <= ExpStack, ExpStack <= pop2;        // pop T from stack
if STACK_IS_EMPTY then ESTACK_ERR;
Vars <= indexFromExpStack, ExpStack <= pop1;// pop index (variable name A-Z)
Vars <= T, goto fetch;                // Vars(index) <= T

and realized that 1 clock cycle can be eliminated. Given the frequency of storing variables (each LET statement) this can bring a bit of a perf improvement:

////////////////////////////////////////////////////////////////////////////////
.map 0x13;                                    // SV (Store Variable)
////////////////////////////////////////////////////////////////////////////////
traceString 37;                                // trace mnemonic
if STACK_IS_EMPTY then ESTACK_ERR;
T <= ExpStack, ExpStack <= pop2;        // pop T from stack
Vars <= indexFromExpStack, if STACK_IS_EMPTY then ESTACK_ERR;
ExpStack <= pop1, Vars <= T, goto fetch;    // Vars(index) <= T and remove used stack entry

 Expression / evaluation stack. 

Used for evaluation of expressions as well as execution of various instructions, such as FV, SV, CP, USR, LS which require the data on the stack be in specific form (e.g. for LS - list, the stack top is end line, stack next is first line to list). It is 16 levels deep, 8-bits wide because some stack operations work on bytes not whole 16-bit words. In order to speed up the 16-bit operations, the actual implementation is a 2-port RAM to be able to read/write 16-bits in one clock cycle. 

 -- Stack top value
 ExpSTHi <= ExpStack(to_integer(unsigned(ExpSP) - 2));
 ExpSTLo <= ExpStack(to_integer(unsigned(ExpSP) - 1));

GOSUB stack.

8-levels deep, 32-bits wide. Making it 32 - bit instead of 16 like in other implementations helps performance because both Lino (line number of the GOSUB caller) and BP (Basic Pointer location of the caller statement) are saved, so when RETURN is executed no search in the Basic code is needed, everything needed to resume execution on the caller level is ready. 

////////////////////////////////////////////////////////////////////////////////
.map 0x14;            // GS (GoSub - save Basic line for RETURN later)
////////////////////////////////////////////////////////////////////////////////
traceString 47;            // Trace mnemonic
if IS_RUNMODE then next else INTERNAL_ERR;
if STACK_IS_FULL then BSTACK_ERR;
BasStack <= push_Lino_and_BP, goto fetch;

////////////////////////////////////////////////////////////////////////////////
.map 0x15;            // RS (ReStore saved line - Basic RETURN)
////////////////////////////////////////////////////////////////////////////////
traceString 48;            // Trace mnemonic
if IS_RUNMODE then next else INTERNAL_ERR;
T <= BasStack_Hi,if STACK_IS_EMPTY then BSTACK_ERR;
Lino <= T, T <= BasStack_Lo;
BP <= T, BasStack <= pop, goto fetch;

Miscellaneous registers.

16-bit registers that hold important state needed to execute TBIL code and interpret Basic:

GOTO cache. 

This is the key to Basic CPU performance because (depending on size and complexity of Basic program in core memory) speeds up execution 2+ times. 

Basic statements in core memory are stored in consecutive order. To find a statement (GOTO, GOSUB) it is necessary to traverse the memory from the beginning (from location 0x0080 onwards) and look for the 2-byte binary representation of the destination line number. Once the location of that instruction is found, it is stored in this cache, indexed by its line number. Up to 32 locations can be cached, and cache hit saves lots of searching time. The cache is initialized to all entries free when going from command to run mode (Lino changing from 0 to !=0). Stay tuned for separate project log about the structure and operation of this cache. 

Discussions