Call to action: if you are reading this and have a working retro-computer with any CPU running Tiny Basic (esp. the version with TBIL) please run the same benchmark test and share the results here!
Update 2025-11-27
@msolajic also ran the benchmark on a computer very special and dear to all enthusiasts from ex-Yugoslavia: the Galaksija.
Update 2025-11-26
Running the benchmark in "extended" mode using FOR/NEXT loops improves performance by about 3%, but the data in the tables below are for the "original" version of the Tiny Basic interpreter.
Update 2025-11-23 / 27
@msolajic graciously ran the 1000-primes benchmark on some additional retro-computers. The results and a comparison with Basic CPU are in the table at the bottom of this project log.
As soon as the CPU started semi-working, I set out to measure and improve its performance. To do that, I added an elapsed run timer to the CPU. It is driven by a 1kHz clock, so each "tick" gives 1ms resolution. The timer starts when the Lino register (which holds the line number of the executing statement) goes from 0 to a non-zero value (program execution starts) and stops when Lino goes back to 0.
-- counting ticks (typically 1ms) while the program is running (to be displayed at the end of execution)
on_clk_tick: process(clk_tick, reset)
begin
    if (reset = '1') then
        cnt_tick <= (others => '0');
        cnt_tick1000 <= (others => '0');
        lino_tick <= (others => '0');
    elsif (rising_edge(clk_tick)) then
        -- signal assignment: the test below still sees lino_tick from the previous tick
        lino_tick <= Lino;
        if (is_runmode = '1') then
            if (lino_tick = X"0000") then
                -- going from stopped to running, reset counters
                cnt_tick <= (others => '0');
                cnt_tick1000 <= (others => '0');
            else
                -- when running, increment counters
                if (cnt_tick = X"03E7") then -- 999 reached: wrap to 0 and count one more second
                    cnt_tick <= (others => '0');
                    cnt_tick1000 <= std_logic_vector(unsigned(cnt_tick1000) + 1);
                else
                    cnt_tick <= std_logic_vector(unsigned(cnt_tick) + 1);
                end if;
            end if;
        end if;
    end if;
end process;
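To make the counter behavior easier to follow, here is a minimal software sketch of the process above in Python. It is only a model under stated assumptions: the class name `RunTimer` and the methods `tick` and `elapsed_seconds` are made up for illustration, the attribute names mirror the VHDL signals, and the register widths (and any overflow of the seconds counter) are not modeled.

```python
class RunTimer:
    """Software model of the on_clk_tick process (hypothetical helper, not part of the CPU)."""

    def __init__(self):
        # Same state as after reset = '1'
        self.cnt_tick = 0        # milliseconds within the current second, 0..999
        self.cnt_tick1000 = 0    # whole seconds
        self.lino_tick = 0       # Lino as sampled on the previous tick

    def tick(self, lino, is_runmode):
        """One rising edge of the 1kHz clk_tick."""
        previous_lino = self.lino_tick   # VHDL signals keep their old value during the process
        self.lino_tick = lino
        if not is_runmode:
            return                       # counters hold their value
        if previous_lino == 0:
            # going from stopped to running, reset counters
            self.cnt_tick = 0
            self.cnt_tick1000 = 0
        elif self.cnt_tick == 999:
            # 1000 ticks counted: roll over into the seconds counter
            self.cnt_tick = 0
            self.cnt_tick1000 += 1
        else:
            self.cnt_tick += 1

    def elapsed_seconds(self):
        """Combine both counters into one value in seconds."""
        return self.cnt_tick1000 + self.cnt_tick / 1000


timer = RunTimer()
timer.tick(0, True)              # still stopped: Lino is 0
for _ in range(2501):            # Lino becomes non-zero; the first of these ticks only resets
    timer.tick(10, True)
print(timer.cnt_tick1000, timer.cnt_tick, timer.elapsed_seconds())
```

In this model the tick on which Lino is first seen as non-zero resets the counters rather than incrementing them, so the count starts on the following tick.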
At the end of program execution, the values of these two counters (seconds and milliseconds elapsed) are displayed.
For the benchmark, I used the "find first 1000 primes" test, which has the advantage of simplicity and portability. Because this version has no FOR/NEXT (I plan to implement it), the test had to be changed slightly to replace those loops with IF/GOTO.
There are two variations of the test code:
- Without GOSUB (proposed here, modified Basic program here)
- With GOSUB (proposed here, modified Basic program here) - not surprisingly, it is about 20% slower across all clock frequencies.
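The FOR/NEXT to IF/GOTO rewrite mentioned above can be illustrated with a small Python sketch. This is not the benchmark program itself (that is in the linked listings); it is an assumed, simplified trial-division search, and both function names are hypothetical. The first function uses `for` loops (the FOR/NEXT shape), the second keeps explicit counters with a test at the top of each loop and an increment at the bottom, which is the shape an IF/GOTO version takes.

```python
def primes_for(limit):
    """Trial division written with for loops (the FOR/NEXT shape)."""
    found = []
    for n in range(2, limit + 1):
        is_prime = True
        for k in range(2, n):
            if n % k == 0:
                is_prime = False
                break
        if is_prime:
            found.append(n)
    return found


def primes_goto_style(limit):
    """Same search with explicit counters and tests (the IF/GOTO shape)."""
    found = []
    n = 2
    while True:                  # label: top of outer loop
        if n > limit:            # IF N > LIMIT THEN GOTO end
            break
        k = 2
        is_prime = True
        while True:              # label: top of inner loop
            if k >= n:           # IF K >= N THEN GOTO inner loop exit
                break
            if n % k == 0:       # divisor found, N is not prime
                is_prime = False
                break
            k = k + 1            # LET K = K + 1, then GOTO top of inner loop
        if is_prime:
            found.append(n)
        n = n + 1                # LET N = N + 1, then GOTO top of outer loop
    return found


print(primes_for(30) == primes_goto_style(30))
```

The explicit version does the same work, but every loop iteration costs the interpreter an extra comparison, an assignment and a jump by line number, which is presumably where the small FOR/NEXT advantage comes from.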
Below is a direct comparison with my previous Tiny Basic project. It is meaningless (because it is a different interpreter and CPU) but still fun:
| Parameter | Previous Tiny Basic project | Basic CPU project | Acceleration |
| --- | --- | --- | --- |
| Clock frequency | 25MHz | 25MHz | 1 |
| Serial I/O | 38400 baud, 8N1 | 38400 baud, 8N1 | 1 |
| CPU | Am9080 (implemented using Am2901 bit slices) | Basic CPU | N/A |
| Tiny Basic version | Native assembler interpreter | Intermediate language based | N/A |
| Run time (s) | 197 | 36.58 | 5.32 |
Going back to the original article from 1980, I attempted a comparison by reducing the Basic CPU clock speed to match the clock of each system listed there.
| Clock (MHz) | CPU | Basic version | Run time (s) | Basic CPU run time (s) | Acceleration |
| --- | --- | --- | --- | --- | --- |
| 1 | 6502 | Level I Basic | 1346 | 906 | 1.48 |
| 2 | 6502 | Level I Basic | 680 | 453 | 1.50 |
| 2 | 6502 | Applesoft II Basic | 960 | 453 | 2.12 |
| 2 | Z80 | Level II Basic | 1928 | 453 | 4.26 |
| 2.4576 | 80C85 | Microsoft Basic (Tandy 102) | 2080 | 366 | 5.68 |
| 3 | 8085 | StarDOS Basic | 1438 | 302 | 4.76 |
| 3 | 9900 | Super Basic 3.0 | 585 | 302 | 1.94 |
| 4 | Z80 | Zilog Basic | 1864 | 227 | 8.21 |
| 4 | Z80 | Level III Basic | 955 | 227 | 4.20 |
| 5 | 8086 | Business Basic | 1020 | 182 | 5.60 |
| 6 | 4*Am2901 | HBASIC+ | 143 | 152 | 0.94 |
As can be seen, Basic CPU is faster than all compared systems except AMD's own HEX-29 system / CPU, which was a showcase of their bit-slice technology. Interestingly, it is also controlled by "horizontal" micro-code, just like the Basic CPU. The HEX-29 CPU is described in the classic "Bit-slice Microprocessor Design" book.
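For reference, the "Acceleration" column in the table above is the run time of the compared system divided by the Basic CPU run time at the same clock. A small Python sketch of that arithmetic, using three rows copied from the table (the function name `acceleration` is just an illustrative choice):

```python
def acceleration(reference_s, basic_cpu_s):
    """How many times faster Basic CPU finished than the reference system."""
    return reference_s / basic_cpu_s


# (system, reference run time in s, Basic CPU run time in s), copied from the table above
rows = [
    ("6502 @ 2MHz, Applesoft II Basic", 960, 453),
    ("Z80 @ 4MHz, Zilog Basic", 1864, 227),
    ("4*Am2901 @ 6MHz, HBASIC+", 143, 152),
]

for name, reference_s, basic_cpu_s in rows:
    print(f"{name}: {acceleration(reference_s, basic_cpu_s):.2f}")
```

A value above 1 means Basic CPU finished first; the HBASIC+ row is the only one in that table that comes out below 1.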
Update 2025-11-20: with some tweaks in microcode, I improved the perf numbers above by about 1-2%. More info about perf here.
| Clock | CPU | Basic version | Run time (s) | Basic CPU run time (s) | Acceleration |
| --- | --- | --- | --- | --- | --- |
| 1MHz | 6510 | Commodore Basic (C64) | 1086 | 906 | 1.2 |
| 3.072MHz | Z80 | Galaksija Basic (Galaksija, video generation off) | 2700 | 293 | 9.2 |
| 3.5MHz | Z80 | Sinclair Basic (ZX Spectrum) | 1536 | 253 | 6.07 |
| 7.328MHz | Z80 | Microsoft ROM Basic V4.7B (Grant Searle Z80 SBC) | 349 | 172 | 2.02 |
| 6.144MHz (0 wait state DRAM) | HD64180 | Basic-80 V5.22 (TIM-011B) | 446 | 147 | 3.03 |
| 6.144MHz (1 wait state DRAM) | HD64180 | Basic-80 V5.22 (TIM-011B) | 506 | 147 | 3.44 |
| 18.432MHz | Z8S180 | Basic-80 V5.21 (S131 SBC) | 146 | 107 | 1.36 |
| 18.432MHz | Z8S180 | Microsoft ROM Basic V4.7B (S131 SBC) | 139 | 107 | 1.30 |
zpekic