
1 Square Inch TTL CPU

A microprocessor built with TTL chips. In one square inch.

What would be cool for the Square Inch Contest or the Hackaday Prize? How about a homebrew CPU built with TTL chips?

This is the smallest TTL Homebrew CPU in the world !

Like the old CPUs of the eighties, it has 40 connections, an 8-bit data bus and a 16-bit address bus. But it also has some very unusual properties.
Interestingly, it occupies less area than an old 40-pin DIP processor.

The Square Inch processor is fully working ! A digital clock, driven by the square inch cpu, was designed to prove this.

This project is very much a race against the clock. The idea occurred to me after the first week of September, leaving three weeks before the end of the Square Inch Contest. Since it involves processor architecture, hardware and pcb design, and software, this is hardly possible in three weeks when you also have a job.

The project was not finished in time for the Square Inch deadline. Perhaps the Hackaday Prize 2019 will bring more luck.

Well let's start with the usual characteristics:

HARDWARE FEATURES

  • 8 data lines
  • 16 address lines
  • Single 64kB memory space for program and data
  • Memory Read and Memory Write lines
  • Reset and clock inputs
  • 5 volt power lines

SOFTWARE FEATURES

  • Zero page addressing
  • Indirect Zero Page addressing
  • Immediate addressing
  • Load, store and compare instructions
  • Stack
  • Conditional branches
  • Subroutine calls
  • Memory-mapped I/O
  • Microprogrammed

UNUSUAL FEATURES

  • Two rows of 20 pins each, with 0.05 Inch (1.27 mm) pin distance
  • Microprogram is in FLASH memory and can be written with a Raspberry Pi as programmer
  • Programmer's registers are in RAM
  • There is NO ALU
  • Only 8 IC's, on both sides of a 1 square inch, 2-layer pcb.

The registers are in RAM. That has been done before (see TMS9900). This will not give you a speed devil, but it is needed to fit the design in one square inch. Another thing left out of the CPU for this reason is...... the ALU. ( I do not intend to connect the square-inch 4 bit TTL ALU to this cpu ).

DEMONSTRATION PROJECT

A development board was made, that has the external ROM and RAM for the CPU. It also has a 32kHz crystal with divider, and 6 displays, to make a digital clock. The clock is working now ! The development board also has an I/O connector, so it can also be used for other projects.

PROGRAMMING ENVIRONMENT

To make programming easy, an online Javascript editor/assembler/simulator was made. The assembly code for the application can be made and assembled in your browser. It can also simulate the cpu. If you open the simulator, just press Assemble and Run to see the simulated working clock (no soldering required) !

If you do this on a Raspberry Pi, you can save the assembly code to your local memory card, and you can download the assembled binary code. The Raspberry Pi can be connected to the development board, and directly burn the binary code into the flash ROM of the development board by means of a Python script.

While connected to the development board, the Raspberry Pi can also program new microcode into the square inch cpu. The required software and Python scripts are available in the files section.

THERE IS NO ALU

NO ALU...  I could have programmed a small PIC or AVR as ALU (Wikipedia: ALU), but that's cheating. With the current microcode version, the only arithmetic that it can do is compare bytes and address items in a table. And the hardware won't allow much more.

For incrementing or decrementing a byte, lookup-tables are set up that contain an incremented or decremented version of the lower 8 address bits. Now the processor can increment or decrement a byte. Nothing more is needed to do arithmetic !
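The lookup-table idea can be sketched in a few lines of Python. This is only an illustration, not the actual microcode: the table base addresses `INC_BASE` and `DEC_BASE` are hypothetical and not taken from the real memory map. Each table is 256 bytes, and the entry at low address byte n holds n+1 (or n-1) modulo 256, so a memory read takes the place of an adder.

```python
# Hypothetical page addresses for the two 256-byte tables
# (assumptions for this sketch, not the real memory map).
INC_BASE = 0x0300
DEC_BASE = 0x0400

memory = bytearray(0x10000)  # single 64 kB address space

# Fill the tables: the entry at low address byte n holds n+1 or n-1 (mod 256).
for n in range(256):
    memory[INC_BASE + n] = (n + 1) & 0xFF
    memory[DEC_BASE + n] = (n - 1) & 0xFF

def increment(value):
    """Increment a byte by using it as the low address byte of a table read."""
    return memory[INC_BASE | (value & 0xFF)]

def decrement(value):
    """Decrement a byte by using it as the low address byte of a table read."""
    return memory[DEC_BASE | (value & 0xFF)]
```

The value to be changed is simply placed on the lower 8 address lines, and the byte that comes back on the databus is the result. Wrap-around (0xFF to 0x00 and back) is stored in the table like any other entry.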

This was also done in the legendary HP9100 programmable calculator that was introduced in 1968. It worked with transistors and diodes, not a single digital IC ! The story behind this calculator is amazing, and can be read on hp9825.com. People have tried to reverse-engineer the diode-transistor logic and came to the conclusion that the hardware of the machine could only increment or decrement digits (described by Tony Duell). Yet it could calculate with...


SW20181025.zip

The ZIP contains the working SW status:

  • index.php, simas_n.js and style.css together are the assembler that runs in a browser. Live at www.enscope.nl/simas_nac.
  • clock.txt is the application SW that is a fully functional clock
  • mc_prog runs on the RPi and programs the microcode
  • 1sq_prog runs on the RPi, programs the application Flash, and can singlestep the processor

x-zip-compressed - 36.59 kB - 10/25/2018 at 17:26


NAC1840.pdf

Schematic of the DIP-IC version of the square inch CPU

Adobe Portable Document Format - 25.42 kB - 10/20/2018 at 10:03


gerbers NAC1840.zip

PCB gerber files for the DIP-IC version of the square inch CPU

x-zip-compressed - 26.64 kB - 10/20/2018 at 10:01


CLK1840.pdf

Schematic of the application pcb, "Clock".

Adobe Portable Document Format - 47.59 kB - 10/20/2018 at 10:00


CLK1840 gerbers.zip

Gerber files for the 3.3 x 3 inch application pcb, "Clock"

x-zip-compressed - 61.49 kB - 10/20/2018 at 09:59


View all 10 files

  • 1 × SST39SF010A-70-4C-WHE Memory ICs / FLASH Memory 128K x 8, stores the microcode, TSOP32, U1
  • 1 × SN74HC161PWR Logic ICs / 4 bit counter, TSSOP-16, U2
  • 1 × SQ1C1841 PCB dual sided, 1x1 Inch, holes 0.3mm, trace/clearance 6/6 mil
  • 1 × SN74HC273PWR Logic ICs / Octal Flip-Flop with clear, high order microcode counter bits, TSSOP-20, U3
  • 3 × SN74HC574PWR Logic ICs / Octal Flip-Flop with 3-state output for B,H,L registers, TSSOP-20, U4, U5, U6

View all 11 components

  • Added the clock to the Tell Time Contest

    roelh, 01/19/2020 at 17:26, 0 comments

    The clock that was built with the 1-square-Inch CPU was entered in the Tell Time Contest, you can see the entry HERE

    And a few months ago, this CPU was featured on Hackaday !

    I also entered another clock to the contest, the Forklift clock .

  • CPU article by Cabe Atwell

    roelh, 09/05/2019 at 12:48, 0 comments

    I am very proud to see that Cabe Atwell wrote about this project and the Kobold computer on Hackster.io !  Read the article HERE !

  • Building a full computer around it

    roelh, 04/09/2019 at 18:11, 0 comments

    Today, the idea of the KOBOLD computer was published here at Hackaday. You can find it here: Kobold - retro TTL computer

  • Clock simulation on webpage working

    roelh, 10/25/2018 at 17:48, 0 comments

    The 6 digits of the clock are added to the Javascript simulator/assembler and are fully functional now. They are addressable in the same way as in the real hardware. You can open the page, press "assemble" and then "run" to see the clock working, driven by the same software that runs on the device itself. (But pressing run without having the code assembled will hang the script, be warned).

    The simulation runs at 1200 cycles per second (depending on your computer and browser speed). That is a bit too slow to have the clock running exactly in the same way as on real hardware. The clock software has a prescaler that counts four edges on the 2Hz hardware signal, to get a one-second tick. The simulator has been tweaked to disable the prescaler, and the one-second-signal is simply generated after each multiplexing "frame" (so the counter is free-running and not coupled to real seconds). The memory locations that are used by the prescaler (0x058 and 0x0030) are manipulated by the simulator. The clock software itself is unchanged. Buttons for setting the time are not available in the simulator, mainly because the software would not scan them frequently enough.
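As a rough illustration of the prescaler described above, here is a small Python sketch. The real clock software is written in the CPU's own assembly language. The class and method names here are hypothetical, and the sketch only shows the counting principle: a 2 Hz square wave has four edges per second, so every fourth edge marks one second.

```python
class Prescaler:
    """Turns edges of a 2 Hz square wave into one-second ticks."""

    EDGES_PER_SECOND = 4  # 2 Hz = 2 rising + 2 falling edges per second

    def __init__(self):
        self.last_level = 0
        self.edge_count = 0

    def sample(self, level):
        """Poll the input level; return True once per four detected edges."""
        if level == self.last_level:
            return False          # no edge since the previous poll
        self.last_level = level
        self.edge_count += 1
        if self.edge_count == self.EDGES_PER_SECOND:
            self.edge_count = 0
            return True           # one second has passed
        return False
```

Because the signal is polled, the software must sample it more often than four times per second, which is why a slow simulation cannot keep real time with the prescaler enabled.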

    In the files section, the software zipfile was updated with this new version.

    Note that you can just change the software in the leftmost window, press assemble and run, and see how the program behaves ! For instance, it is easy to change the starting time of the clock from 00:00:00 to another value.

  • Schematic of the CPU

    roelh, 10/21/2018 at 10:44, 0 comments

    Time to fully disclose the operation of the CPU ! The complete schematics can be found in the files section, SQ1C1841.PDF.

    REGISTER B

    Let's start with the simplest part. Since the CPU has no ALU, the most important task of the CPU is moving data from one place to another. So in one cycle it fetches a byte from memory and stores it in the B register. In the next cycle, it can put the contents of the B register on the databus and store the value somewhere. So the B register can be loaded from the databus (at the end of a low pulse on TW), and its contents can be placed on the databus (when TR is low). 

    The signals are called TR (T-Read) and TW (T-Write) because the B register was previously named "Temp".

    In the decoder part you will see that the write and read signals come almost at the same time. It was expected that at the end of these signals, the databus would keep its last signals for perhaps twenty nanoseconds after the '574 was tri-stated at the end of the TR signal. But this did not work, so the TR signal had to be delayed by a small 100pF capacitor. This capacitor was not on the first version of the board. It is also not on the NAC pcb, it must be added on the NAC as a modification.

    Note that the low side of the capacitor is connected to VCC (5 volt) instead of ground, this is because it was impossible to make this connection to ground with a trace on the pcb. But it was possible to connect it to VCC, so that was chosen. It makes no difference for the operation of the capacitor.

    ADDRESS GENERATION

    The L register can be written directly from the databus by making LW low. The output of the L register generates the address bits A0 - A7. Note that the lowest 3 bits are OR'ed (with HC32 gates) with the microcode bits IR0 - IR2. In most cases, only IR0 is used (when the H and L outputs are active), to select either LSB from the address in HL, or select the MSB from (HL+1).

    The H register can not be loaded from the databus. A value must first be in the L register, that will generate the address bits A0-A7, and then these address bits can be loaded into the H register by making HW low. The output of the H register generates the address bits A8 - A15. Note that the highest bit of the H register is called "Flag". This is connected to one of the address inputs of the microprogram storage, and makes conditional jumps possible.

    An important feature is, that the outputs of the H and L register can be switched off (tri-stated) by the microcode bit IR4 (at pin 1 of the '574 registers). The outputs have pulldown resistors to pull the lines to zero in this case. (For A3 - A14 the pulldown resistors are external to the cpu). Now, by using the microcode bits IR0 - IR3, the microcode can generate 16 fixed addresses: A range in RAM (0x0000-0x0007), used for addressing the "software-registers" A, PC and SP, and a range in flash ROM (0x8000 - 0x8007) used to access constants for loading the pc at startup, and to access constants for jump addresses in microcode.
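The address generation described above can be modelled with a short Python sketch. The function name and parameters are hypothetical. The flag `hl_enabled` stands in for the IR4 output-enable control so that no signal polarity is assumed. That IR3 drives A15 is an inference from the two address ranges given above, not something taken from the schematic.

```python
def address_bus(h, l, ir, hl_enabled):
    """Return the 16-bit address for register values h and l.

    ir holds the microcode bits IR0-IR3 in its low four bits.
    hl_enabled abstracts the IR4 output-enable control of H and L.
    """
    low3 = ir & 0b0111            # IR0-IR2 are OR'ed into A0-A2
    if hl_enabled:
        return ((h & 0xFF) << 8) | (l & 0xFF) | low3
    # H and L are tri-stated: the pulldown resistors force their lines to zero.
    # ASSUMPTION: IR3 drives A15, selecting RAM (0x0000) or flash ROM (0x8000).
    a15 = (ir >> 3) & 1
    return (a15 << 15) | low3

# The 16 fixed addresses that the microcode can generate on its own:
fixed = sorted(address_bus(0, 0, ir, hl_enabled=False) for ir in range(16))
```

Note that OR'ing IR0 into A0 only behaves like HL+1 when the address in L is even, which suggests that 16-bit values are kept at even addresses.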

    The fact that H and L can be loaded independently of each other makes it easy to manipulate tables of 256 bytes, like the stack area, the increment and decrement tables, and the area used for the CMPB instruction.

    MICROPROGRAM

    There are two parts (U2 and U3) that together produce the 12-bit microcode address. Only the lowest four bits are in a '161 counter chip and actually count (at each cpu CLK pulse). That means that a stretch of 16 micro-instructions must always end with a jump to another position in the microcode (otherwise the same 16 micro-instructions will be repeated).
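The wrap-around behaviour of the microcode address can be sketched in Python as follows. The function name and the `jump_target` parameter are hypothetical. How a jump actually loads the new address is simplified here to replacing the whole 12-bit value.

```python
def next_micro_address(address, jump_target=None):
    """Advance a 12-bit microcode address by one CLK pulse.

    Only the low 4 bits (the '161 counter) count; the upper 8 bits
    change only when the microcode performs a jump.
    """
    if jump_target is not None:
        return jump_target & 0xFFF   # simplified: a jump replaces the address
    high = address & 0xFF0           # upper bits are held, they do not count
    low = (address + 1) & 0x00F      # 4-bit counter wraps from 0xF to 0x0
    return high | low
```

Without a jump, the address after 0x0EF is 0x0E0 again, so the same block of 16 micro-instructions repeats.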

    The microcode address is connected to the address bus of the Flash ROM chip U1, together with the flag bit this makes 13 address lines. So why are the unused address lines (A9 -A12) not simply...


  • Clock application pcb

    roelh, 10/20/2018 at 14:43, 0 comments

    CPU demo: CLOCK 

    Since yesterday, the clock application is fully working. It includes button functions for setting the clock to the current time. The browser based Javascript assembler was updated again, it will now show today's clock source code when you open the page. 

    An "Assembler code manual" was made and put in the files section today.

    This log will discuss the pcb that was made for the clock demo.

    For the full schematic of the clock application pcb, please refer to the CLK1840.pdf in the files section. The gerber files are also in the files section.

    ADDRESS DECODER

    The address decoder on the schematic and pcb is this:

    This was designed very quickly before the implications of all design decisions were known. But it has the possibility to swap the address range of ROM and RAM, by leaving R30 and R31 out and cross-connect them. It also shows that when the programming of the application flash is enabled (PR_ENA = 0), the ROM will always be selected. That is needed because for programming the ROM, all 16 address lines must be controllable. 

    However, this circuit must be changed by modifications on the pcb:

    As you can see in other logs of this project, ROM starts at 0x8000 so the R30-R31 swap must be done. And the I/O ports were not properly decoded, that's also corrected here.

    The ROM can be selected by the decoder OR by the PR_ENA signal, so at first sight the HC32 OR gate was OK. Just too bad that these signals are active low, so a HC08 AND gate is needed. I soldered it on top of the HC32 (the outputs not connected), disconnected pin3 of the HC32, and a short piece of wire did the rest.
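Why an AND gate is needed for active-low signals can be checked with a small Python truth table. The function names are hypothetical. A value of 0 means "active" for both inputs and for the output.

```python
def rom_select_and(decoder_n, pr_ena_n):
    """HC08 AND gate: output is low (ROM selected) if EITHER input is low."""
    return decoder_n & pr_ena_n

def rom_select_or(decoder_n, pr_ena_n):
    """HC32 OR gate: output is low only if BOTH inputs are low."""
    return decoder_n | pr_ena_n

# Truth table over all input combinations (0 = active, 1 = inactive).
table = [
    (d, p, rom_select_and(d, p), rom_select_or(d, p))
    for d in (0, 1)
    for p in (0, 1)
]
```

With the OR gate, the ROM would only be selected when the decoder and PR_ENA are active at the same time, which is not the intended behaviour.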

    And during programming of the microcode or application flash, the Raspberry Pi needs control of the databus. That would become a mess if the Flash ROM or RAM of the clock pcb would also put something on the bus. So, during programming, the switch must be closed to disable the ROM and RAM (In my version, the switch is just a wire with crocodile clips). The Raspberry Pi can however still read or write to/from the application Flash ROM because it can enable it by the PR_ENA line. 

    For a next version, the switch should be a signal controlled by the RPi.

    CPU CLOCK

    The cpu clock is a free-running RC oscillator built from a schmitt-trigger port HC132:

    The clock runs at 440kHz but can run a lot faster, that just hasn't been tried yet. For normal operation, the clock must be running, so PR_ENA must be 1 and PR_CLK must be 1. During flash programming, the clock will be stopped by making PR_ENA low, and the RPi can give single (active-low) clock pulses by making PR_CLK high (will give a low pulse on CLK).

    There is a small problem with this circuit. When the CPU is single-stepped by the Raspberry Pi, the PR_ENA must be high (otherwise ROM is enabled all the time, see ADDRESS DECODER). But that means that the clock is not properly stopped. The workaround is, that the timing capacitor C2 must be manually shorted during single stepping. Room for improvement !

    REAL TIME CLOCK

    The real time clock delivers a 2 Hz signal derived from a watch crystal. At first it didn't want to run at the correct frequency. It took me some time, but the solution was simple. The power bypass capacitor for the 4060 was not close enough to the chip (a few cm away). After bypassing VCC directly over the chip, with 100nF, it ran perfect (the schematic shows VCC unconnected, but it certainly is connected on the pcb). The 4060 is the only chip that is SMD (SO16) and mounted on the bottom side. All other chips on this pcb are DIP and mounted on top side.

    CPU CONNECTION

    Here...


  • New instructions

    roelh, 10/15/2018 at 14:39, 1 comment

    The new microcode version, with new instructions, is working now !

    We now have STACK instructions:

     PUSH Z     ; move a 16-bit value from Z-page to stack
    
     POP Z      ; move a 16-bit value from stack to Z-page
    
     CALL label ; push the return address on stack and jump to 16-bit label
                ; 4-byte instruction: opcode, unused, label-lsb, label-msb
    
     RET        ; load the pc from stack (equivalent to POP PC)

    The stack is similar to the 6502 stack, it occupies locations 0100-01FF. There is a single-byte
    stack pointer at location 0 in RAM. The stack grows downwards. All items on stack are 16-bit.
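A Python model of such a stack might look like the sketch below. The stack page, the location of the stack pointer, the downward growth and the 16-bit item size come from the description above. The byte order on the stack, the pre-decrement on push, and the initial stack pointer value are assumptions made for this sketch only.

```python
STACK_PAGE = 0x0100   # stack occupies 0x0100-0x01FF
SP_ADDR = 0x0000      # single-byte stack pointer lives in RAM at location 0

ram = bytearray(0x8000)
ram[SP_ADDR] = 0x00   # ASSUMPTION: initial value, first push wraps to 0xFE

def push16(value):
    """Push a 16-bit value; the stack grows downwards by two bytes."""
    sp = (ram[SP_ADDR] - 2) & 0xFF                             # ASSUMPTION: pre-decrement
    ram[STACK_PAGE | sp] = value & 0xFF                        # ASSUMPTION: LSB at lower address
    ram[STACK_PAGE | ((sp + 1) & 0xFF)] = (value >> 8) & 0xFF
    ram[SP_ADDR] = sp

def pop16():
    """Pop a 16-bit value and move the stack pointer back up."""
    sp = ram[SP_ADDR]
    lsb = ram[STACK_PAGE | sp]
    msb = ram[STACK_PAGE | ((sp + 1) & 0xFF)]
    ram[SP_ADDR] = (sp + 2) & 0xFF
    return (msb << 8) | lsb
```

Because the stack pointer is a single byte that is combined with a fixed page number, the stack can never leave its 256-byte page. On the real CPU the pointer arithmetic would have to go through the increment and decrement tables, since there is no ALU.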

    Note that it would be possible to change the microcode to have a separate stack for return addresses, this would enable a FORTH-style of programming. 

     
    Another new instruction is the BYTE-COMPARE instruction:

     CMPB A,Z    ; compare byte in accumulator with byte in zpage
                 ; result A=0x00 when both are equal, A=0x80 when not equal
    

    This works together with the conditional branches BRM and BRP.  To make programming easier, the BRM and BRP instructions also have another name from now on: BEQ and BNE, that do exactly what you would expect after a compare instruction:

     BEQ label     ; branch if bytes were equal
      
     BNE label     ; branch if bytes were not equal

     The conditional branches BRM and BRP were originally defined to assist in loop counting. A loop counter could count backwards from a certain value (up to 127). When the count changes from 00 to FF, this FF value can be detected by a BRM (branch if minus) or BRP (branch if positive) instruction.

    Back to the byte-compare instruction. How is it possible that it compares two bytes, while there is no ALU or byte-compare chip that can do this function ?

    Suppose the two values to compare are XX and YY (hex). The processor needs the reserved RAM region 0x0200-0x02FF for this function.

    1. It writes 0x80 to 0x02XX
    2. It writes 0x00 to 0x02YY
    3. The result is read from 0x02XX.

    It is easy to see that the result will be 0x80 because that was written to 0x02XX. But if YY is the same as XX, the value at 0x02XX is overwritten by 0x00, so the result is 0x00 when both values are equal ! That is how it works !
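The three steps can be tried out in a few lines of Python. This is a sketch of the principle only, not the microcode. The names `ram`, `CMP_PAGE` and `cmpb` are chosen for illustration.

```python
CMP_PAGE = 0x0200   # reserved RAM region 0x0200-0x02FF

ram = bytearray(0x8000)

def cmpb(xx, yy):
    """Compare two bytes using only memory writes and one read."""
    ram[CMP_PAGE | xx] = 0x80      # step 1: mark the location for XX
    ram[CMP_PAGE | yy] = 0x00      # step 2: clear the location for YY
    return ram[CMP_PAGE | xx]      # step 3: 0x00 if equal, 0x80 if not

equal_result = cmpb(0x41, 0x41)
different_result = cmpb(0x41, 0x42)
```

The result has bit 7 set when the bytes differ, so the existing sign-based conditional branches can act on it without any extra hardware.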

    The microcode that does this can be seen on the Javascript Assembler page. Scroll down, the byte-compare instruction starts at address 0x00E0.

  • Success !

    roelh, 10/11/2018 at 20:24, 2 comments

    The smallest homebrew CPU in the world is working now !

    Last week, I spent a few evenings changing the layout, to correct the pinout of the flash chip. Not a really easy task, this is a very crowded (only 2 sided) layout. One week ago, on Thursday evening, the new layout was sent to China, and I got the pcb's yesterday. It was assembled today (I had ordered a double set of components the first time, perhaps already having the feeling that the first time would not be right). 

    It is now running a simple program, it fills 6 memory locations with the 7-segment values for the digits 1 to 6, and then writes each of them to the multiplexed display in a loop. 

    To show that there's nothing hidden, I disconnected the RPi, and I also show the back side of the pcb.

    The display is a bit dim, that's because in the time allotted for each digit, it is only on during 3 instructions and off during 4 instructions. But in reality it is looking better. Perhaps I'll also have to make the resistors a bit lower in value. You can see the program on the  Browser-based JS assembler page. 

    Back side of the pcb:

    I did not work more on the application program yet. After the strike of Murphy, I decided I needed subroutines for the clock program, so I extended the microcode, also making it a bit more dense. The new microcode is now on the Javascript assembler page, but it refused to work on the NAC pcb so it has an error. Therefore, the version that you see on the picture is running with the old microcode.

    I will soon update the schematics and gerbers, and describe the application pcb, the NAC, and the modifications.

  • Murphy strikes

    roelh, 09/30/2018 at 09:41, 0 comments

    After having the whole system debugged with the NAC pcb, having display multiplexing working, it was time to replace the NAC by the real thing, the 1x1" CPU.

    After having corrected a few bad soldered pins, it was still impossible to program the microcode. When reading the flash, all locations returned 0x00. Strange, because unprogrammed locations normally return 0xFF.....

    Back to the datasheet......  

    Well everything that CAN go wrong, WILL go wrong. So if you don't check the datasheet for the pinout....  The pinout for the TSOP is different from the DIP pinout !   Beginner's mistake, I assumed they would be the same without even thinking about it. 

    So this means re-doing the pcb design and order new PCB's. Today, I will continue to program the 6-digit clock.

  • Debugging struggle

    roelh, 09/28/2018 at 12:20, 1 comment

    The NAC pcb is a bigger version of the square inch pcb. It is intended to make debugging easier. It has two unused footprints, to have room for extra IC's if that would be needed:

    Debugging started with the NAC connected to the application pcb and the Raspberry Pi programmer.

    The Raspberry Pi will have two python scripts, one for programming the microcode and one for programming the application code. The last one can also single-step the processor and display the micro-instructions and databus value at each step.

    I corrected a few problems. At this moment, stepping through the microcode works, but only for about 6 micro-instructions after reset. After that, microcode reads as FF...  have to investigate....

    Next picture shows the square inch cpu on the white application pcb. The application pcb contains RAM (empty socket on picture), Flash-ROM, I/O and support functions like clock generation, reset circuit and connecting to the RPi programmer:

    And here is the backside of the CPU, with the 'big' microcode Flash:


View all 16 project logs


Discussions

Sergei wrote 09/15/2024 at 03:29 point

Hi! Using a large memory chip is cheating. The table in memory replaces many chips, like how sometimes an ALU is made from a memory chip to make it work faster. But the work is good, I enjoyed it. You are great! Good luck to you!


Tingqian Li wrote 12/14/2019 at 20:04 point

I found another design https://www.bigmessowires.com/nibbler/

Instead of using a micro-program for executing an instruction, it just decodes the instruction into a set of control signals, so the uROM or micro-code concept there is more like an instruction decoder. The primary reason why Nibbler can do that and 1square-inch can't is this:

Nibbler has very rich internal inter-connect, so a meaningful operation can be done in one clk cycle, but 1square-inch's internal structure is too simple to perform a meaningful instruction within one cycle, that's why micro-program is required, not just micro-code.

From this I realized that the basic nature of an instruction is a set of control signals that are set up before the next rising clock edge, so that the state of the CPU is updated accordingly. The whole CPU is an FSM, and instructions are the way to change its state.

Even more, sometimes we don't need a full Turing-complete CPU to solve a problem. In that case a simpler FSM can be designed without the BUS/RAM or even ROM.


roelh wrote 12/14/2019 at 22:00 point

In contrast to the Nibbler, the 1-Square-Inch CPU has indirect addressing. Indirect addressing is needed for all programs except the very simple ones. The 1-Square-Inch CPU is Turing-complete, but not very fast because it needs several cycles for an instruction.


Tingqian Li wrote 12/15/2019 at 22:00 point

Yes, you pointed out a very important difference, the indirect addressing, I noticed Nibbler's author also consider it as a "significant limitation", Nibbler will face difficulty when solving stack-based recursive algorithm.


Tingqian Li wrote 12/12/2019 at 14:37 point

Wonderful work. it inspired me very much, Thank you so much for this excellent work. I believe this design roughly belongs to the one-instruction CPU (in microcode we can see, only move instruction between register & RAM/ROM is implemented), and clever design of table lookup instead of ALU, even more interesting is the microcode design, it turns the fundamental HW into an interpreter !!!  I'm impressed with the software mindset that can be a so powerful complementary in the HW design !!!!

Why not add ALU to make it a more practical CPU? I guess it must be the 1 square inch area limited the possibility. after all that's why semiconductor industry keeps trying to integrate more transistors into the chip to make it more powerful.

Just found your another design:

https://hackaday.io/project/164897-kobold-retro-ttl-computer

https://hackaday.io/project/167605-kobold-k2-risc-ttl-computer


I like your TTL-based CPU which is simple enough to understand & build and complex enough to play with fun!


roelh wrote 12/14/2019 at 21:52 point

Thank you for your nice words !  There are many ways to add an ALU to the design. In my case, the result (Kobold K2) has become totally different. It has no microcode and will do every instruction in one fetch and one execute cycle. 


roelh wrote 10/16/2019 at 19:57 point

Thanks Yann !


John Croudy wrote 06/23/2019 at 16:31 point

Extremely impressive!


roelh wrote 06/23/2019 at 17:45 point

Thanks John !


PixelDud wrote 12/07/2018 at 18:27 point

To what extent could this be used?


roelh wrote 12/07/2018 at 19:01 point

Hi Asher, 

of course this processor is very simple and things will not go very fast (but speed can be at least 100000 instructions per second). But the memory space is quite big, so it has enough memory space for programs from the Apple II or ZX spectrum era. 

It would not be too difficult to make a calculator. You could store each digit of a number in a separate byte. Adding digits can be done by repeatedly decrementing A and incrementing B, until A is zero. Then you have to handle the carry if B is more than 9.
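A minimal Python sketch of this digit-by-digit addition is given below, assuming each number is stored as a list of decimal digits with the least significant digit first. The function names are hypothetical. The `+= 1` and `-= 1` steps stand in for the increment and decrement table lookups, and the `> 9` test stands in for whatever comparison sequence the real program would use.

```python
def add_digit(a, b):
    """Add two small values using only increment and decrement steps.

    Returns (digit, carry). The sum must not exceed 19.
    """
    while a != 0:
        a -= 1        # stands in for a decrement-table lookup
        b += 1        # stands in for an increment-table lookup
    if b > 9:
        return b - 10, 1
    return b, 0

def add_numbers(x, y):
    """Add two equal-length digit lists, least significant digit first."""
    result = []
    carry = 0
    for a, b in zip(x, y):
        digit, c1 = add_digit(a, b)
        digit, c2 = add_digit(carry, digit)   # add the carry from the previous digit
        result.append(digit)
        carry = c1 | c2                       # at most one of the two can be set
    result.append(carry)
    return result

total = add_numbers([9, 9], [1, 0])   # 99 + 1, digits least significant first
```

Each digit addition takes at most nine decrement/increment rounds, so even a slow CPU adds multi-digit numbers quickly enough for a calculator.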

Multiplication can be done by adding repeatedly. The speed of the CPU is high enough for a scientific calculator, I think.

The demonstration program (the working clock) shows how counting digits can be done. You can run that software NOW in the javascript simulator.

It would also be nice to build a Video interface and make a few games.


Yann Guidon / YGDES wrote 10/14/2018 at 03:48 point

Oh my ! :-D


Dusan Petrovic wrote 10/02/2018 at 11:40 point

Indeed, added, thanks @BigEd !


Ed S wrote 09/30/2018 at 09:17 point

This would be a good addition to the https://hackaday.io/list/25846-homebrew-cpu list, @Dusan Petrovic !
