As Michael mentioned, he somewhat nerd sniped me with the the idea of squeezing a Towers of Hanoi solver in not the Science Fair Microcomputer Trainer. As an avid fan of both the Gakken FX R-165 and the Science Fair Microcomputer Trainer, I could't let it go uncompleted.
With a good number of iterations, I managed to squeeze an "interactive" solver down to fit into just the 80 nibbles (40 bytes!) of the Microcomputer Trainer's memory.
The full listing follows.
The Listing
Memory address in column 1, 2, and 3; value to enter in column 4. Logical labels, relevant to program execution, are in column 5. The final two columns are a disassembly of the instruction, back to labels and opcodes. These correspond one-to-one with the final assembly language source (provided at the end of this entry).
; towers of hanoi for science fair microcomputer trainer ; (c) 2024 LambdaMikel & jtjacques ; assembled with https://arpruss.github.io/r165.html ; stack ; set 0x50 to a ; set 0x51 to c ; set 0x52 to b ; set 0x53 to 0 ; initial state ; set z (0x6b) to n (0, 1, 2, 3) ; set y (0x6e) to 3 (pointer to m 0x53) ; start at hanoi (0x17), mode 2 ; when pause (0x33, 0b0110011) ; reset, inspect y (0x6e) ; moves: inspect memory at 0x5[y] (from), 0x5[y+1] (to) ; resume at resume (0x00, or simply reset), mode 2 ; when hlt (0x29, 0b0101001), program complete 0 0000000 00 B resume AIY 1 0000001 01 2 2 2 0000010 02 5 MA 3 0000011 03 B AIY 4 0000100 04 2 2 5 0000101 05 4 AM 6 0000110 06 B AIY 7 0000111 07 D D 8 0001000 08 5 MA 9 0001001 09 B AIY 10 0001010 0A 4 4 11 0001011 0B 4 AM 12 0001100 0C B AIY 13 0001101 0D B B 14 0001110 0E 5 MA 15 0001111 0F B AIY 16 0010000 10 6 6 17 0010001 11 4 AM 18 0010010 12 8 TIA 19 0010011 13 3 3 20 0010100 14 B lbl_6: AIY 21 0010101 15 1 1 22 0010110 16 4 AM 23 0010111 17 2 hanoi CH 24 0011000 18 D CIY 25 0011001 19 0 0 26 0011010 1A F JUMP 27 0011011 1B 3 lbl 28 0011100 1C 6 _0 29 0011101 1D F JUMP 30 0011110 1E 2 lbl 31 0011111 1F 5 _1 32 0100000 20 B lbl_3: AIY 33 0100001 21 C C 34 0100010 22 2 CH 35 0100011 23 B AIY 36 0100100 24 1 1 37 0100101 25 2 lbl_1: CH 38 0100110 26 5 MA 39 0100111 27 E CAL 40 0101000 28 6 SIFT 41 0101001 29 F halt lbl_2: JUMP 42 0101010 2A 2 lbl 43 0101011 2B 9 _2 44 0101100 2C C CIA 45 0101101 2D 0 0 46 0101110 2E F JUMP 47 0101111 2F 2 lbl 48 0110000 30 0 _3 49 0110001 31 B AIY 50 0110010 32 9 9 51 0110011 33 F pause lbl_4: JUMP 52 0110100 34 3 lbl 53 0110101 35 3 _4 54 0110110 36 B lbl_0: AIY 55 0110111 37 F F 56 0111000 38 2 CH 57 0111001 39 B AIY 58 0111010 3A D D 59 0111011 3B 5 MA 60 0111100 3C B AIY 61 0111101 3D 4 4 62 0111110 3E 4 AM 63 0111111 3F B AIY 64 1000000 40 E E 65 1000001 41 5 MA 66 1000010 42 B AIY 67 1000011 43 3 3 68 1000100 44 4 AM 69 1000101 45 B AIY 70 1000110 46 C C 71 1000111 47 5 MA 72 1001000 48 B AIY 73 1001001 49 5 5 74 1001010 4A 4 AM 75 1001011 4B 8 TIA 76 1001100 4C 1 1 77 1001101 4D F JUMP 78 1001110 4E 1 lbl 79 1001111 4F 4 _6 80 1010000 50 A source 81 1010001 51 C dest 82 1010010 52 B spare 83 1010011 53 0 return
Setup
The program does require you to specify the initial parameters for the first stack frame in the data area at addresses 0x50-0x53. This tells the computer the layout of your pegs, and sets the exit condition. For convenience I have included them at the end of the listing and you can simply carry on entry past the program area into the data area. However, for completeness they are as follows:
0x50: source (peg A), 0x51: destination (peg C) 0x52: spare (peg B) 0x53: 0 (return label)
While the names are arbitrary (you could call them pegs 1, 2, and 3, for example, if you prefer) the final return label (0) is required to allow the program to exit correctly.
You also need to configure two more parameters: the number of discs between 0 and 3 in 0x6B (the Z register) and the least significant nibble marking the end of the stack in 0x6E (the Y register), 3.
Important Note: documentation for the FX R-165 and Microcomputer Trainer indicate that the Z register is held in memory at location 0x6D. This appears to be an error. The Z register is located at 0x6B on real hardware (both the FX R-165 and Microcomputer Trainer) and on emulators using the ROM, but other simulations may implement the documented location of 0x6D.
Execution
Once you have entered the program and set your desired number of discs (0-3), you need to start the program from location 0x17, in run mode 2 (run with addresses). Press [1], [7], [addr-set] to set the program counter, then press [2] and [run] to start the program. Using run mode 2 means that we can monitor the program progress on the address bus (the binary LEDs) and read out our moves when the program pauses.
Result
When the program pauses, the address bus will read 0b0110011. When this happens, the program has calculated a move for us.
To read the move, first we must stop the program by pressing [reset]. This allow us to check the Y register, at 0x6E, by pressing [6], [E], [addr-set]. This will display the least significant nibble of the data area address where we will find our move. For example, if you see 4, we would inspect 0x54. e.g. [5], [4], [addr-set]. The computer will now display the from peg on the HEX LED. Press [incr] to see the to peg. For example, if we see [B], [C], we would move the top disc on peg B to peg C.
To find our next move, we simply resume the program in mode 2. For convenience, the resume address is at 0x00. This allows us to simply reset the machine: [reset], [2], [run]. The program will proceed to calculate the next move. If the binary LEDs indicate a pause (0b0110011) then repeat the inspection process, starting with the Y register. If the binary LEDs display 0b0101001 the program has completed.
Note that attempting to solve for numbers of discs 4 or greater will overflow the 16 locations made available as data memory (0x50-5F) and used for the stack. This will cause the program to "crash" or result in other unexpected behaviour.
How the Program Works
It is possible to put executable code in the memory area (0x50-5F) of the Science Fair Microcomputer Trainer, and therefore set return addresses at run time. However, each jump requires 3 nibbles (4-bit words) which is a lot of overhead; both in the stack frame and in code required to write these addresses to memory. Fortunately Michael gave me a great start with his initial work on the problem. Instead, I used his neat trick of constant return labels and return addresses calculated at assembly time.
The program went through multiple iterations, but the draft version fairly closely follows this example pseudo code.
TOH(N, A, B, C): IF (N = 0): STOP TOH(N-1, A, C, B) A -> C TOH(N-1, B, A, C)
Draft program
To make it easier to develop the program I use a very simple assembler. The opcode mnemonics are identical to those used in the Microcomputer Trainer documentation. While the assembler does not offer any macros, labels are supported. Using an assembler allows us to iterate our program faster and, importantly, ensures that all the jumps are correctly calculated in the final program.
The following code represents a basic, first attempt at implementing a (somewhat) interactive solver for Towers of Hanoi.
; towers of hanoi for science fair microcomputer trainer ; (c) 2024 LambdaMikel & jtjacques ; assemble with https://arpruss.github.io/r165.html ; stack ; set 0x50 to a ; set 0x51 to c ; set 0x52 to b ; set 0x53 to 0 ; initial state ; set z (0x6b) to n (0, 1, 2, 3) ; set y (0x6e) to 3 (pointer to m 0x53) ; start at hanoi (0x??), mode 2 ; when pause (0x??, 0b???????) ; reset, inspect y (0x6e) ; moves: inspect memory at 0x5[y] (from), 0x5[y+1] (to) ; resume at resume (0x??), mode 2 ; when hlt (0x??, 0b???????), program complete HANOI: CH ; get b/z CIY 0 ; z <> 0 JUMP CALL1 ; then start JUMP RETURN ; else return CALL1: AIY F ; disk - 1 (z) CH ; restore a/y ; swap 1 ; prepare stack ; s =+0, d =+1, x =+2, r =+3 ; s'=+4, d'=+5, x'=+6, r'=+7 ; y = 3 ; s' <- s ; d' <- x ; x' <- d AIY 3 ; move to x' ; AIY D ; s ; MA ; AIY 4 ; s' ; AM ; AIY E ; x ; MA ; AIY 3 ; d' ; AM ; AIY C ; d ; MA ; AIY 5 ; x' ; AM TIA 1 ; lbl 1 AIY 1 ; advance to r' AM ; set lbl JUMP HANOI ; recurse RET1: ; AIY C ; unwind stack ; ; CH ; get b/z ; ; AIY 1 ; restore disk number (z) ; ; CH ; restore a/y ; AIY D ; move stack to s AIY 9 ; unwind stack and move to s PAUSE: JUMP PAUSE ; pause RESUME: AIY 3 ; move stack to r CALL2: ; CH ; get b/z ; AIY F ; disk - 1 (z) ; CH ; get a/y ; swap 2 ; prepare stack ; s =+0, d =+1, x =+2, r =+3 ; s'=+4, d'=+5, x'=+6, r'=+7 ; y = 3 ; s' <- x ; d' <- d ; x' <- s AIY 3 ; move to x' ; AIY F ; x ; MA ; AIY 2 ; s' ; AM ; AIY D ; d ; MA ; AIY 4 ; d' ; AM ; AIY B ; s ; MA ; AIY 6 ; x' ; AM TIA 2 ; lbl 2 AIY 1 ; advance to r' AM ; set lbl JUMP HANOI ; recurse RET2: AIY C ; unwind stack CH ; get b/z AIY 1 ; restore disk number (z) ;JUMP RETURN ; fallthrough RETURN: CH ; restore a/y MA ; get r CIA 0 ; r <> 0 ? JUMP TEST ; then goto test HLT: ; else JUMP HLT ; hlt TEST: CIA 1 ; r <> 1 ? JUMP RET2 ; then ret2 JUMP RET1 ; else ret1
The program is fairly intelligible and its structure mimics that of the pseudo code. First we check if the number of discs is > 0, if not we exit the function by returning. Otherwise, we begin our first function call. Here we decrement the disc number (disk - 1), then begin to arrange our stack (or we would, but I comment this out for brevity and replaced it with a stub during testing).
Note that the Microcomputer Trainer does not have a subtraction instruction. Subtraction is achieved by causing the sum to overflow and wrap around. To discover the number to add, simply subtract the desired value from 0x10. For example, to subtract 0x1 we instead need to add 0xF as 0x10-0x1=0xF.
Finally, we complete the stack configuration for our first recursive call. We load the constant 1 into the accumulator, advance the stack pointer (the Y register) to the correct location, and store the return label in memory.
The program then does its first recursion, by jumping back to HANOI. When that call is done, the function will return (more on that later) and we will be put back at RET1. Here we pop the stack (or unwind, as I note in the comments), correct the disk number (the decrement should only affect the call we just made) and move to the Y register to point to the source peg in the current stack frame.
Here we see our first optimisations. Notably, there is no need to actually restore the disk number. No one is looking at it right now, and we are going to need to decrement it again in a moment for the second recursive call. So we comment that code out, saving four nibbles. This would leave two adjacent AIY instructions to advance the Y register. We can simplify this by summing these together and doing a single add to save a further two nibbles. The sum of AIY 0xD and AIY 0xC is 0x19. As this is a 4-bit machine, we simply add the lower nibble, and do AIY 0x9.
We then pause the program with a jump to the PAUSE label. This allow the user to exit out of the program and retrieve the calculated move as indicated on the stack at the position indicated by the Y register.
We then expect the user to resume the program at RESUME. We move the stack pointer (the Y register) back to the end of the stack and prepare our second recursive call. We've already commented out the redundant decrement of the disk number (as we didn't restore it), saving another four nibbles. Simply avoiding restoring and re-decrementing the disk number has saved us 10 nibbles, with no impact on execution an only a minor impact on readability.
Next, we managed the stack. For brevity we use a stub to skip over doing the actual swaps while we develop the code. Like with the first call, we set our return label, but this time to 2 to indicate that this is the second recursive call. We load the constant into the accumulator, advance the stack pointer, and store the return label in memory.
When this call completes, we will be retuned to the RET2 label. Here we again pop the stack, restore the disk number (this time we need to!) and jump to RETURN. Or we would, but why waste memory jumping to the very next instruction? Instead we simply fall through. An instant and totally free saving of three nibbles.
Now we get to the RETURN block. Michael gives a great explanation of this concept, which he calls the "jump block", in his video where he demonstrates recursion on the Busch Microtronic 2090. Essentially, we retrieve the label at the stack pointer and do a series of tests. The Microcomputer Trainer tests for inequality, which makes this a bit more confusing, but in summary: if the label is <> 0, we will TEST the label further, otherwise we halt (using a loop to HALT). In the TEST section, if the label is <> 1 (i.e. 2), we jump to the label RET2, otherwise it must be 1 so we jump to RET1.
Shrinking the Program
If we uncomment the swaps (and remove the stub code) and assemble the program we find it is 94 nibbles. While we could just about fit the program into the combined memory and data space of 96 nibbles (80 in the program area at 0x00-4F, plus the first 14 from the adjacent data area at 0x50-5D), that would leave just two nibbles (0x5E-5F) for our stack; not even enough for a single frame. First we double check our code for any easy wins.
We notice two adjacent AIY instructions in the RESUME and CALL2 areas that have surfaced when we uncommented the proper stack management code. We add the values of 0x3 and 0xF to get 0x12, and simply add 0x2 as is sufficient in our 4-bit architecture. This saves us two nibbles. Down to 92.
Our optimisations so far have not materially impacted intelligibility, but to shrink the program further we need to tweak the code somewhat.
Next we notice some duplicate code. Both CALL1 and CALL2 end with the sequence, AIY 1, AM, JUMP HANOI to set the return location on the stack and actually execute the recursive call. We squeeze in a no-cost label of SETR above this code in CALL2 and remove the corresponding code from CALL1. Instead we jump to our new label. This gives us a net saving of three nibbles. Down to 89.
The code is still fairly understandable, but now we will flip that on its head. Literally.
Next, we rearrange the program to maximise the amount of fall through. Each jump takes three nibbles of memory. By rearranging the program we can loose two more jumps (in addition to the one we saved earlier), another saving of 6 nibbles. Down to 83. We also take the opportunity to move the frequently used RESUME block to 0x00, minimising the number of times the user has to explicitly set the program counter.
To squeeze out the final nibbles, we get a little more tricksy. The RETURN block does a bit of an odd pattern of jumps due to the inequality operation. It would be nice to test if the label is 0 and, if it is, halt right away. To do this we can do a no-cost substitution of the two nibble CIA 0 operation, replacing it with a two nibble call to CAL SIFT. This built in subroutine shifts the value in the accumulator to the right and (oddly) sets the jump flag to true if the lowest bit was zero. This allows us to remove the JUMP TEST instruction and immediately halt when we see an even number, like zero. If we have an odd number, the jump will be skipped and we can begin the next TEST. Removing the jump makes a three nibble saving. Down to 80.
Unfortunately, our label of 2 would also now trigger the HALT loop, so we make sure to make a no-cost change and replace the TIA 2 instruction to TIA 3 (or 0b11, as I like to think of it now). Equally, we need to consider the effect of that shift on our comparison in the TEST block, where we change our CIA 1 to CIA 0, to account for the fact that 1 right shifted is 0 (and 3 right shifted would be 1).
The final listing looks like so:
; towers of hanoi for science fair microcomputer trainer ; (c) 2024 LambdaMikel & jtjacques ; assemble with https://arpruss.github.io/r165.html ; stack ; set 0x50 to a ; set 0x51 to c ; set 0x52 to b ; set 0x53 to 0 ; initial state ; set z (0x6b) to n (0, 1, 2, 3) ; set y (0x6e) to 3 (pointer to m 0x53) ; start at hanoi (0x??), mode 2 ; when pause (0x??, 0b???????) ; reset, inspect y (0x6e) ; moves: inspect memory at 0x5[y] (from), 0x5[y+1] (to) ; resume at resume (0x??), mode 2 ; when hlt (0x??, 0b???????), program complete RESUME: ;AIY 3 ; move stack to r CALL2: ; CH ; get b/z ; AIY F ; disk - 1 (z) ; CH ; get a/y ; swap 2 ; prepare stack ; s =+0, d =+1, x =+2, r =+3 ; s'=+4, d'=+5, x'=+6, r'=+7 ; y = 0 ; s' <- x ; d' <- d ; x' <- s ;AIY 6 ; move to x' AIY 2 ; x MA AIY 2 ; s' AM AIY D ; d MA AIY 4 ; d' AM AIY B ; s MA AIY 6 ; x' AM TIA 3 ; lbl 0b11 SETR: AIY 1 ; advance to r' AM ; set lbl ;JUMP HANOI ; recurse ; fallthrough HANOI: CH ; get b/z CIY 0 ; z <> 0 JUMP CALL1 ; then start JUMP RETURN ; else return RET2: AIY C ; unwind stack CH ; get b/z AIY 1 ; restore disk number (z) ;JUMP RETURN ; fallthrough RETURN: CH ; restore a/y MA ; get r CAL SIFT ; (r & 1) <> 1 (i.e. is r even) ? HLT: ; then JUMP HLT ; hlt TEST: ; else CIA 0 ; (r >> 1) <> (0b1 >> 1) ? JUMP RET2 ; then ret2 ;JUMP RET1 ; else ret1 ; fallthrough RET1: ; AIY C ; unwind stack ; ; CH ; get b/z ; ; AIY 1 ; restore disk number (z) ; ; CH ; restore a/y ; AIY D ; move stack to s AIY 9 ; unwind stack and move to s PAUSE: JUMP PAUSE ; pause CALL1: AIY F ; disk - 1 (z) CH ; restore a/y ; swap 1 ; prepare stack ; s =+0, d =+1, x =+2, r =+3 ; s'=+4, d'=+5, x'=+6, r'=+7 ; y = 3 ; s' <- s ; d' <- x ; x' <- d ;AIY 3 ; move to x' AIY D ; s MA AIY 4 ; s' AM AIY E ; x MA AIY 3 ; d' AM AIY C ; d MA AIY 5 ; x' AM TIA 1 ; lbl 0b1 ; AIY 1 ; advance to r' ; AM ; set lbl ; JUMP HANOI ; recurse JUMP SETR
If we run this through the assembler (or assemble it by hand), we get the very same sequence of instructions shown at the beginning of this entry.
Conclusion
In truth, and as Michael could attest, there were a fair few more iterations of this. Previous versions had to dip into the data area to fit. Throughout the process, many subtle sequencing decisions were made at indeterminate points to minimise the use of CH and the swaps of the Y and Z registers. Equally, I admit I did some last minute post-hoc rearranging of code, such as the order of the TIA and AIY instructions around the SETR label to maximise the opportunity for code reuse. Choosing which of the blocks to keep and which to discard may sometimes also require some trial and error in maximising total overall code savings.
Equally, the more abstract program design goals evolved over time. Very early iterations attempted some actual program driven I/O, as seen in the Microtronic version, rather than relying on the address LEDs to indicate state. Unfortunately, this was quickly dismissed as impractical. I did choose to retain the programatic pauses, however, so that there is no need to single-step the program and some semblance of an interactive experience remains.
Overall while many design trade-offs were made, I think it is fair to say we did achieve the goal of squeezing a Towers of Hanoi solver into just 80 nibbles (40 bytes) of memory. I hope you've enjoyed this deep dive into the code as much as I have. Despite its limitations, the Microcomputer Trainer clearly is a "real" computer. And for me, at least, it is yet to truly disappoint!
Discussions
Become a Hackaday.io Member
Create an account to leave a comment. Already have an account? Log In.