
Lies, damn lies, and ... benchmarks!

A project log for CPU running Basic

Celebrating 50 years of Tiny Basic by implementing a custom micro-coded 16/32-bit CPU that executes it directly (up to 100MHz)

zpekic, 11/13/2025 at 19:11

Call to action: if you are reading this and have a working retro-computer with any CPU running Tiny Basic (esp. the version with TBIL) please run the same benchmark test and share the results here!


Update 2025-11-27

@msolajic also ran the benchmark on a computer very special and dear to all enthusiasts from ex-Yugoslavia: the Galaksija.

Update 2025-11-26

Running the benchmark in "extended" mode using FOR/NEXT loops improves performance by about 3%, but the data in the tables below are for the "original" version of the Tiny Basic interpreter.

Update 2025-11-23 / 27

@msolajic graciously ran the 1000-primes benchmark on some additional retro-computers. Here are the results and a comparison with Basic CPU (see the table at the bottom of this project log).

As soon as the CPU started semi-working, I set out to measure and improve the performance. To be precise, I added an elapsed run-time timer to the CPU. It is driven by a 1kHz clock (so its "ticks" have 1ms resolution). It starts when the Lino register (holding the line number of the executing statement) goes from 0 to non-zero (program execution starts) and stops when it goes back to 0.

-- counting ticks (typically 1ms) while the program is running (to be displayed at the end of execution)
on_clk_tick: process(clk_tick, reset)
begin
    if (reset = '1') then
        cnt_tick <= (others => '0');
        cnt_tick1000 <= (others => '0');
        lino_tick <= (others => '0');
    else
        if (rising_edge(clk_tick)) then
            lino_tick <= Lino;
            if (is_runmode = '1') then
                if (lino_tick = X"0000") then
                    -- going from stopped to running, reset counters
                    cnt_tick <= (others => '0');
                    cnt_tick1000 <= (others => '0');
                else
                    -- when running, increment counters
                    if (cnt_tick = X"03E7") then        -- at 999, wrap to 0 and count one more second
                        cnt_tick <= (others => '0');
                        cnt_tick1000 <= std_logic_vector(unsigned(cnt_tick1000) + 1);
                    else
                        cnt_tick <= std_logic_vector(unsigned(cnt_tick) + 1);
                    end if;
                end if;
            end if;
        end if;
    end if;
end process;
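
To make the counter behaviour concrete, here is a minimal Python model of the process above. It is a sketch for illustration, not part of the design: the class name RunTimer and the method tick() are made up, the counters are assumed to be read as whole seconds and milliseconds, and the finite width of the VHDL vectors is ignored.

```python
class RunTimer:
    """Behavioural model of on_clk_tick; call tick() once per 1 kHz clock edge."""

    def __init__(self):
        self.cnt_tick = 0        # milliseconds within the current second, 0..999
        self.cnt_tick1000 = 0    # whole seconds elapsed
        self.lino_tick = 0       # Lino as sampled on the previous tick

    def tick(self, lino, is_runmode=True):
        prev = self.lino_tick    # a VHDL signal read inside the process sees its old value
        self.lino_tick = lino
        if not is_runmode:
            return
        if prev == 0:
            # previous sample was "stopped": reset both counters
            self.cnt_tick = 0
            self.cnt_tick1000 = 0
        elif self.cnt_tick == 999:
            # 1000 ticks counted: wrap milliseconds, add one second
            self.cnt_tick = 0
            self.cnt_tick1000 += 1
        else:
            self.cnt_tick += 1


timer = RunTimer()
timer.tick(lino=0)               # idle, Lino is 0
for _ in range(2500):            # program sits on (hypothetical) line 10 for 2500 ticks
    timer.tick(lino=10)
print(timer.cnt_tick1000, "s", timer.cnt_tick, "ms")
```

In this model, any tick whose previous Lino sample was 0 resets the counters instead of counting, so the first tick after the program starts is not included in the elapsed time.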

At the end of program execution, the value of these 2 counters (seconds and milliseconds elapsed) is displayed:

For the benchmark, I used the "find first 1000 primes" test, which has the advantage of simplicity and portability. Because this version has no FOR/NEXT (I plan to implement it), the test had to change slightly, replacing those loops with IF/GOTO.
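
As a rough illustration of what replacing FOR/NEXT with IF/GOTO means, the Python sketch below writes each counted loop as "initialise, test, body, increment", which is the shape such a rewrite takes. It is not the benchmark listing used here: the function name primes_up_to, the trial-division method and the bound passed in are all assumptions made for the example.

```python
def primes_up_to(limit):
    """Trial division with every loop spelled out as init / test / increment."""
    found = []
    n = 2                        # loop variable initialised by hand (LET N=2)
    while n <= limit:            # explicit exit test, the job of IF ... GOTO
        k = 2                    # inner loop variable, also initialised by hand
        is_prime = True
        while k * k <= n:        # inner exit test
            if n % k == 0:
                is_prime = False
                break            # jump straight out of the inner loop
            k += 1               # manual increment, then back to the inner test
        if is_prime:
            found.append(n)
        n += 1                   # manual increment, then back to the outer test
    return found


primes = primes_up_to(1000)
print(len(primes), "primes found, largest is", primes[-1])
```

The extra test and increment statements are work that a FOR/NEXT loop does implicitly, so an interpreter has more statements to execute per iteration in the IF/GOTO form.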

There are two variations of the test code:

Below is a direct comparison with my previous Tiny Basic project. Meaningless (because it is a different interpreter and CPU) but still fun:

Clock frequency | 25MHz | 25MHz | Acceleration
Serial I/O | 38400 baud, 8N1 | 38400 baud, 8N1 | 1
CPU | Am9080 (implemented using Am2901 bit slices) | Basic CPU | N/A
Tiny Basic version | Native assembler interpreter | Intermediate language based | N/A
Run time (s) | 197 | 36.58 | 5.32

Going back to the original article from 1980, I attempted a comparison by reducing the Basic CPU clock speed to match each of those systems.

Clock (MHz) | CPU | Basic version | Run time (s) | Basic CPU run time (s) | Acceleration
1 | 6502 | Level I Basic | 1346 | 906 | 1.48
2 | 6502 | Level I Basic | 680 | 453 | 1.50
2 | 6502 | Applesoft II Basic | 960 | 453 | 2.12
2 | Z80 | Level II Basic | 1928 | 453 | 4.26
2.4576 | 80C85 | Microsoft Basic (Tandy 102) | 2080 | 366 | 5.68
3 | 8085 | StarDOS Basic | 1438 | 302 | 4.76
3 | 9900 | Super Basic 3.0 | 585 | 302 | 1.94
4 | Z80 | Zilog Basic | 1864 | 227 | 8.21
4 | Z80 | Level III Basic | 955 | 227 | 4.20
5 | 8086 | Business Basic | 1020 | 182 | 5.60
6 | 4*Am2901 | HBASIC+ | 143 | 152 | 0.94

As can be seen, Basic CPU is faster than all of the compared systems except AMD's own HEX-29 system / CPU, which was a showcase of their bit-slice technology. Interestingly, it is also controlled by "horizontal" microcode, similar to the Basic CPU. This CPU has been described in the classic "Bit-slice Microprocessor Design" book.
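
The "Acceleration" column appears to be the other system's run time divided by the Basic CPU run time at the same clock, so values above 1 mean Basic CPU is faster. The short Python sketch below recomputes it for three rows of the 1980 comparison. The helper name acceleration is made up for illustration, and the formula is inferred from the table values rather than stated by the author.

```python
def acceleration(other_run_time_s, basic_cpu_run_time_s):
    """Ratio of run times; above 1 means Basic CPU finished sooner."""
    return other_run_time_s / basic_cpu_run_time_s


# (system, run time in s, Basic CPU run time in s at the same clock), taken from the table
rows = [
    ("6502 @ 2MHz, Applesoft II Basic", 960, 453),
    ("Z80 @ 4MHz, Zilog Basic", 1864, 227),
    ("4*Am2901 @ 6MHz, HBASIC+", 143, 152),
]

for system, other_s, basic_s in rows:
    ratio = acceleration(other_s, basic_s)
    verdict = "Basic CPU faster" if ratio > 1 else "Basic CPU slower"
    print(f"{system}: {ratio:.2f} ({verdict})")
```

Only the Am2901-based row comes out below 1, which matches the exception noted in the text.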

Update 2025-11-20: with some tweaks in microcode, I improved the perf numbers above by about 1-2%. More info about perf here.

Clock | CPU | Basic version | Run time (s) | Basic CPU run time (s) | Acceleration
1MHz | 6510 | Commodore Basic (C64) | 1086 | 906 | 1.2
3.072MHz | Z80 | Galaksija Basic (Galaksija, video generation off) | 2700 | 293 | 9.2
3.5MHz | Z80 | Sinclair Basic (ZX Spectrum) | 1536 | 253 | 6.07
7.328MHz | Z80 | Microsoft ROM Basic V4.7B (Grant Searle Z80 SBC) | 349 | 172 | 2.02
6.144MHz (0 wait state DRAM) | HD64180 | Basic-80 V5.22 (TIM-011B) | 446 | 147 | 3.03
6.144MHz (1 wait state DRAM) | HD64180 | Basic-80 V5.22 (TIM-011B) | 506 | 147 | 3.44
18.432MHz | Z8S180 | Basic-80 V5.21 (S131 SBC) | 146 | 107 | 1.36
18.432MHz | Z8S180 | Microsoft ROM Basic V4.7B (S131 SBC) | 139 | 107 | 1.30
