-
Big day for DB6502!
06/20/2021 at 21:22
Good things do happen every now and then
Today I want to start with the latest update - and for a change, this will be a very optimistic one. After I shared my last entries on Reddit, I got a suggestion to try out a different PLD chip variant for address decoding, wait state generation and nRD/nWR stretching. Sure, I had used those in the past for my breadboard build, but they never worked very well, causing all sorts of random bus issues. I put them in the components box and never looked back.
As it turns out, it pays off to revisit old findings when circumstances change.
I replaced the ATF22V10C-15 with an ATF22V10C-10 and strange things started happening. For one, I could finally remove the external circuitry for nRD/nWR stretching and use the PLD for this again. Second, it turned out that the PCB build is much more stable, so I started testing various oscillators, and this is where the amazing screenshot came from:
Yes, you are reading this correctly: DB6502 with OS/1 runs perfectly stable at 16MHz on the latest PCB. You can load programs, they all run just fine. MS BASIC works:
It runs MicroChess too, and for the first time it's pretty snappy:
So yeah, that is a small step for computing, but one giant leap for DB6502. After months of struggle I finally reached one of my goals (system clock running at 14MHz), and even went a few more steps further. Feels great, man!
Key takeaway: revisit old assumptions when circumstances change, share to get feedback and use the feedback to improve.
Setup details
So, how does it work? As I said, all that was required was to replace the 15ns PLD with a faster, 10ns one. Suddenly all the timing violations disappeared and I could remove the external circuitry for nRD/nWR stretching. Thanks to that, the PCB is running standalone again.
As for the wait state generator, I'm using no wait states for RAM access, one wait state for I/O operations and two wait states for ROM access (using the 150ns EEPROM variant). It's a pretty neat setup, and it pushes all the components to their limits. The only thing that could work faster is the VIA chip - it's capable of running at full 14MHz speed, but since it's part of the I/O range, it gets one wait state automatically.
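As a rough illustration, the wait-state selection the PLD performs can be sketched in C. Note that the address ranges below are made up for the example - the real decode lives in the ATF22V10 equations, and the actual DB6502 memory map may differ:

```c
#include <stdint.h>

/* Illustrative sketch only: the real logic is combinational PLD decode.
   Address ranges are hypothetical, not the actual DB6502 memory map. */
int wait_states(uint16_t addr) {
    if (addr >= 0x8000) return 2;  /* ROM (150ns EEPROM): two wait states */
    if (addr >= 0x7F00) return 1;  /* I/O range (VIA, DUART): one wait state */
    return 0;                      /* RAM: full speed, no wait states */
}
```

The VIA falls into the I/O range here, which is why it gets its one wait state "for free" even though the chip itself could keep up with the full clock.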
There are also other things on the board I haven't tested fully yet, and the biggest one is memory banking. I have tested all the involved components, but to test drive it properly (and according to original design) I need to change the OS/1 code, memory mapping and other things, so it will not happen overnight. Still, based on what I have seen so far it should be working well.
Should be working well. Famous last words.
Is it all good then?
Well, obviously not. There were some issues with bus ownership handling by the supervisor chip, but that was a simple software bug. I had some issues with using the supervisor as a clock source, but again - this was resolved by a software change.
While fixing those, I discovered something disturbing, and further investigation resulted in another interesting find.
The epic struggle for onboard AVR ISP port
As you probably don't remember, in my first DB6502 v2 prototype board I messed up the AVR ISP port layout. It was actually a pretty silly mistake, but easy to make, so you can read about it here. It was fixed pretty easily by adding special adapter wires reversing the port polarity, and it worked. Or rather, I think it did. Thing is, I hadn't used it a lot - once you upload the supervisor firmware (and it doesn't change frequently), you don't really need the port for anything else most of the time.
This time I made sure the port was oriented correctly, and it was fine. When I discovered the bug mentioned in the previous section, I simply uploaded a new version of the supervisor code and started using it, just as expected. Then I noticed another small bug, fixed it and tried uploading.
Nope.
Nope.
Nope.
All of a sudden, the AVR ISP port stopped working. I unplugged power, waited a few seconds, powered it all up again and... boom, it works. Once. Every consecutive attempt would fail.
Sometimes even the power-down/power-up cycle would not suffice. While not a very important issue, it was annoying, so I started another investigation.
AVR ISP interface explained
In case you haven't had the pleasure of working with AVR chips, their ISP sequence starts with the reset line being pulled low, followed by a conversation over the SPI port on dedicated pins. This is what the ATmega644 pinout looks like:
Image source.
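For context, that "conversation" begins with a fixed command exchange from the AVR serial programming spec: with RESET held low, the programmer sends the 4-byte Programming Enable command, and a target that is in sync echoes the second byte back. A minimal sketch (spi_transfer() here is a hypothetical stand-in simulating a healthy target, not a real SPI driver):

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical stand-in for the programmer's SPI routine: a healthy AVR
   target shifts out the previously received byte while clocking in a new one. */
static uint8_t prev_byte;
uint8_t spi_transfer(uint8_t out) {
    uint8_t echoed = prev_byte;
    prev_byte = out;
    return echoed;
}

/* Programming Enable: $AC $53 $00 $00; a target in sync echoes $53
   in the third byte. */
bool programming_enable(void) {
    spi_transfer(0xAC);
    spi_transfer(0x53);
    uint8_t echo = spi_transfer(0x00);  /* should read back 0x53 */
    spi_transfer(0x00);
    return echo == 0x53;  /* false is what avrdude reports as
                             "target doesn't answer" */
}
```

Anything that corrupts RESET or the three SPI lines during this exchange breaks the echo, which is exactly the failure mode described below.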
Now, since I wanted the supervisor to drive the 6502 bus as fast as possible, I needed these pins connected to the data bus, like so:
That shouldn't be an issue, right? After all, AVR programming happens while the reset line is pulled low. This causes the 6502 bus pins to go into tri-stated mode, so if the programmer is the only thing using these data bus lines doubled with the SPI interface, there will be no problem, right?
Then it struck me: actually, wrong. Sure, the 6502 CPU, 6522 VIA and SC26C92 UART bus connections will be tri-stated during reset, but RAM and ROM? These guys are not connected to the reset line in any way. Same goes for the PLD, which will happily output valid RAM_CS or ROM_CS signals during the reset sequence. The clock will keep ticking, and valid nRD/nWR signals will be produced as well, even if computed from an invalid (floating) r/W signal.
I checked this assumption with my logic analyser, and indeed, it detected a valid low signal on the ROM_CS line during the AVR programming sequence. As a result, the AVR wasn't able to respond correctly to commands sent by the USBASP programmer. I had the culprit then!
Or so I thought.
To test my hypothesis I moved AVR supervisor to breadboard and connected it via the PCB socket using jumper wire.
Side note: do not do that. It turned out that my 22 AWG wire was so thick that pushing it in loosened the contacts in the tooled socket. It took me a few hours the following day to figure that out, after I moved the AVR supervisor back to the socket and couldn't understand why it didn't seem to work anymore. The fun never stops...
Anyway, with the AVR on the breadboard I could temporarily disconnect lines D5, D6 and D7 from the 6502 bus (so they would be connected to the USBASP programmer only) to see if that resolved the programming issue.
Guess what.
It didn't.
Bloody hell.
There was just one last connection from the AVR ISP port left to check - the RESET line.
RESET line in DB6502
For my power-on-reset circuit in DB6502 I use the Maxim Integrated DS1813-10 chip. It's a very convenient device that will do two things:
- Monitor the source voltage and, if it drops below 4.5V (in the -10 variant), protect your system by triggering a reset,
- Monitor the RESET line and, if it's pulled to GND for a while, keep it low for about 150ms, letting all connected systems respond to the RESET condition properly.
Obviously, following the datasheet, I added a RESET button to it like so:
If you look closely at the AVR ISP schematic a few sections above, you will notice that there is another way the nRES line can be pulled low: via the AVR ISP port. This is what USBASP will do at the beginning of the programming sequence.
To summarise: we have two ways to pull nRES low - using the onboard button or using the AVR ISP interface. Both work beautifully every time - each attempt at uploading new supervisor software initiated a correct reset sequence. That being said, there must be something different about how the first reset after a power cycle works compared to all the consecutive ones.
New approach to troubleshooting
By that time I had figured out that the problem was the nRES signal not being correct during the firmware update. Thing is - troubleshooting something like that on a PCB is not easy. Some traces are not easily accessible, and moving the supervisor to the breadboard with the jumper wires caused even more issues. I decided to try something different this time.
If this were a software project (my day job, basically), I wouldn't try to investigate this sort of issue in a huge, complex and interconnected environment with database clusters, mainframe machines and distributed integration nodes. I would try to isolate and replicate the problem in a much simpler environment.
I decided to give this approach a try. Instead of trying to investigate my complex board, I rebuilt a small fraction of it using a different AVR chip (ATmega328P) and the DS1813-10, like so:
So yeah, the simplest possible configuration. And yes, in case you are wondering, it failed exactly the same way. First time it would work just fine:
➜ blink-atmega328p-usbasp git:(main) ✗ make clean all flash
(...)
avrdude -v -B 5 -c usbasp -p m328p -P usb -U flash:w:build/main.hex
(...)
Programmer Type : usbasp
Description     : USBasp, http://www.fischl.de/usbasp/
avrdude: set SCK frequency to 187500 Hz
avrdude: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.00s
avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: safemode: hfuse reads as D7
avrdude: safemode: efuse reads as FD
avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
To disable this feature, specify the -D option.
avrdude: erasing chip
avrdude: set SCK frequency to 187500 Hz
avrdude: reading input file "build/main.hex"
avrdude: input file build/main.hex auto detected as Intel Hex
avrdude: writing flash (176 bytes):
Writing | ################################################## | 100% 0.14s
avrdude: 176 bytes of flash written
avrdude: verifying flash memory against build/main.hex:
avrdude: load data flash data from input file build/main.hex:
avrdude: input file build/main.hex auto detected as Intel Hex
avrdude: input file build/main.hex contains 176 bytes
avrdude: reading on-chip flash data:
Reading | ################################################## | 100% 0.08s
avrdude: verifying ...
avrdude: 176 bytes of flash verified
avrdude: safemode: hfuse reads as D7
avrdude: safemode: efuse reads as FD
avrdude: safemode: Fuses OK (E:FD, H:D7, L:62)
avrdude done. Thank you.
Next time - not so much:
➜ blink-atmega328p-usbasp git:(main) ✗ make flash
avrdude -v -B 5 -c usbasp -p m328p -P usb -U flash:w:build/main.hex
(...)
Programmer Type : usbasp
Description     : USBasp, http://www.fischl.de/usbasp/
avrdude: set SCK frequency to 187500 Hz
avrdude: error: program enable: target doesn't answer. 1
avrdude: initialization failed, rc=-1
Double check connections and try again, or use -F to override this check.
avrdude done. Thank you.
make: *** [flash] Error 1
And that was it. Disconnect everything from power, wait a moment, connect again - it works once, with each consecutive attempt failing.
This was great: I managed to isolate and replicate the problem in a much simpler environment where the investigation was much easier.
Here we go again...
Obviously the first thing to do was to find out what the hell is going on. This is what I captured on the scope (connected to TP1 on the RESET line in the schematic above) during the first reset after power-up:
See how clean this signal is?
This is how it looks each next time:
See that shadow there after about 100ms? This is what it looks like up close:
Honestly, this is it: just a couple dozen spikes, maybe 800mV max. A tiny, insignificant signal. This shouldn't affect RESET line operation on the AVR, right?
Well, wrong again.
See, with a 5V supply, these 800mV spikes are way above anything that could be guaranteed to be read as a solid low. So, most probably, these spikes were confusing the AVR and preventing proper understanding of the programming sequence.
Still, where were these spikes coming from?
Now, I can't say for sure, but the explanation I came up with was the following: I assumed that USBASP can easily pull down the RESET line, but in fact it has pretty limited sink capability, restricted by the 270Ω R6 resistor, as seen here:
With a 5V supply it should be able to sink no more than about 18mA. Probably this is reduced by some parasitic capacitance (?) on the programmer input, and each next time it's just not enough to keep the voltage low enough, causing this strange behavior.
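The 18mA figure above is just Ohm's law; a quick sketch of the worst-case calculation (assuming the full 5V rail ends up across R6 when the programmer pulls its side to GND):

```c
/* Worst case: the programmer grounds its side of R6 while the other side
   sits at the 5V rail, so the full Vcc drops across R6 (Ohm's law). */
double max_sink_ma(double vcc, double r_ohm) {
    return vcc / r_ohm * 1000.0;  /* current in mA */
}
```

With vcc = 5.0 and r_ohm = 270.0 this comes out to roughly 18.5mA - and that is the theoretical ceiling, before any parasitics eat into it.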
Maybe you have some better explanation for it? I would love to hear in the comments below.
Solving the problem
Basically, I needed a way to pull the RESET line to GND harder than the USBASP can. I tried two options, and surprisingly, they both worked. One is to connect the USBASP RESET output to a P-MOSFET gate (with a pull-up resistor) and pull the RESET line to GND via source and drain.
There are some things to be pointed out here. First, there is an interesting difference between the first and second programming run:
Second:
During the second (and each consecutive) run you can see that spike after about 100ms:
There is, however, a much more severe issue - the RESET voltage produced by this setup is limited by the MOSFET's Vgs threshold:
So it's pretty obvious that the MOSFET itself can't pull the reset line all the way to GND - it will stop when the difference between gate (USBASP RESET line pulled to GND) and source (AVR RESET input and DS1813 output) reaches the Vgs threshold value. What did surprise me is that I would have expected the DS1813 to detect this condition as a reset and pull the line all the way to GND. Clearly, this doesn't happen.
What is also surprising is that this build does work each time. After all, this 720mV is not that much lower than the spikes observed in the initial setup. Still, I didn't want to risk another issue with the next revision PCB, so I gave up on the MOSFET idea.
The alternative solution is actually very simple:
This way you never get your reset supervisor fighting your AVR programmer, and the signal looks much better:
The second run also looks pretty good:
There is that spike again! A closer look reveals that it might be a deliberate action from the USBASP:
Pretty clean signal with a clear 5us duration. There must be some reason it's there.
Oh, and as it turns out, it also happens on the first run, just much sooner:
So yeah, maybe there is more to the AVR ISP interface than I can see.
And, what's worse, the problem is not solved yet, but this entry is already long enough. Remember the data bus lines? Yeah, they cause issues too and I will handle them next time. It was still fun though!
Summary
It was amazing to see the build finally work as designed, but it would never have happened if not for a comment I got on Reddit, suggesting I check the faster PLD again. This also means that you can reach your goals - it just takes persistence and a bit of faith. I wasn't sure if the PCB could resolve the issues I had with the breadboard build, but apparently it did. Sure, it doesn't mean that breadboards are useless; it's just a reminder that sometimes going with your gut feeling is the right thing to do.
As for the AVR adventure - I should have seen this coming. My initial design was overly optimistic and I should have anticipated issues. I still don't understand how come I hadn't noticed it before, but frankly? I might have missed it among the plethora of smaller bugs and glitches haunting the previous revision of my build.
Bottom line: after documenting the second part of the AVR ISP glitch, I will move forward with the OS/1 implementation - I still have this memory banking thing to play with - and in parallel I might actually go forward with the final version 2 PCB for DB6502 that I could share with other people. Exciting times ahead!
-
Strange ROM issue
06/15/2021 at 09:45
Glitches, glitches everywhere
Last time I wrote about a simple software bug that caused a very scary-looking issue in my serial communication implementation. Before that, I also covered the problem with the 15ns PLD and nRD/nWR stretching. There was another one I haven't even mentioned so far, but it scared the hell out of me: when I started up my latest DB6502 prototype board, the CPU just wouldn't work. Like, at all. It was powered and all, but it was not doing anything. Pretty soon I realised that it was the RDY line that was permanently held down, and then I understood - the latest revision of the board doesn't use the open collector variant of the RDY circuit, but the parallel RC network for faster operation. All it took was a simple update of the PLD code from this:
RDY.OE = !RDY;
to this:
RDY.OE = 'b'1;
And now, instead of open collector, I had a nice "drive always" RDY output from the PLD. After fixing all the simulation errors I got it to work nicely.
Still, it's pretty frightening when you spend months designing the board, pay for fabrication, spend a couple of hours soldering, and it just doesn't start at all.
Anyway, what I'm trying to say is that apparently my project is getting to the level of complexity where I have to be very, very careful with each step, because the number of moving parts is growing and making sense of it all can be difficult.
ROM flashing issue
Back to where I left off last time - I fixed the simple OS/1 code issue and it should have worked. It didn't, because when I tried flashing the ROM via the onboard AVR programmer, it would just fail silently. Even worse - it failed, but claimed to have succeeded.
Now, I've been meaning to write about this for some time, but haven't gotten around to it yet. See, there are two ways to write to the ROM memory: you can write a byte (or a page of 64 consecutive bytes at a time) and wait 10ms, or you can perform the write and just wait for the chip to finish. These chips have a nice feature where, after performing a write operation, you can read any address and there will be two bits that you can use to determine if the previous write is complete. It makes the whole process much faster, because in most (if not all) cases you don't have to wait the full 10ms.
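This trick is usually called "toggle bit" polling: while a write cycle is in progress, the EEPROM inverts bit 6 on every read, so two consecutive reads returning the same bit 6 mean the write has finished. A minimal sketch - rom_read() is a hypothetical stand-in for the AVR routine that performs a bus read, here simulating a chip that stays busy for a few reads:

```c
#include <stdint.h>

/* Hypothetical stand-in for a real bus read: simulates an EEPROM that
   toggles bit 6 for a few reads while the write is in progress, then settles. */
static int busy_reads = 5;
static uint8_t last_value = 0x00;

uint8_t rom_read(uint16_t addr) {
    (void)addr;
    if (busy_reads-- > 0) last_value ^= 0x40;  /* toggle bit 6 while busy */
    return last_value;
}

/* Toggle-bit polling: the write is complete once two consecutive reads
   agree on bit 6. Returns the last value read. */
uint8_t rom_wait_write_done(uint16_t addr) {
    uint8_t prev = rom_read(addr);
    for (;;) {
        uint8_t curr = rom_read(addr);
        if ((prev & 0x40) == (curr & 0x40)) return curr;  /* done */
        prev = curr;
    }
}
```

The failure described below boils down to this loop: a single glitched nRD pulse produces two "reads" in one clock cycle, the two samples agree, and the loop declares the write complete prematurely.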
Thing is - it has actually already caused one issue in the past, so I wasn't surprised to see it happening again.
This is how the process starts. In this particular scenario, I'm running simple "check Software Data Protection status" code - it reads the first byte in ROM (0xA2) and writes its inverted value (0x5D) back. It then reads the same address after the write operation is completed - if the value is still the old one, it means that SDP is enabled, obviously. If it changed (so SDP is disabled), it will overwrite it again with the initial value to preserve the original state.
All the r/W, CLK, BE signals are coming from the AVR; the ROM_CS (at the bottom) is calculated by the PLD as usual. Actually, RDY is also calculated by the PLD, and if you look closely, you will notice another problem there - but I will deal with that one later.
Anyway, you can see the sequence working as expected: read 0xA2, write 0x5D, keep reading the same address until two consecutive reads yield the same value on bit 6 - this indicates that the write operation is completed. Unfortunately, this is not what happened:
As you can see, there is something odd about the last read operation: the clock signal is clean, but there is something off about the nRD. It looks like the write operation is completed (two reads of 0xFF), but the next read also results in the same value (0xFF), where we expected either 0xA2 or 0x5D. You have probably guessed by now what happened, so let's confirm with a close-up:
Yep, during a single clock cycle there were two read operations performed. Strangely enough, even though these pulses were very short (the low pulse measured 25ns, the high one 10ns), that was sufficient for the ROM chip to respond as if they were valid. Remember that the ROM chip is not connected to the CLK line - it only relies on the ROM_CS and nRD/nWR signals, and based on those, this looked like two very fast, but valid reads. It just responded accordingly.
If you think about it, there is actually a very simple "solution" to the problem - read the status three or four times, not just twice. It would probably have worked most of the time, since this ringing on the nRD line doesn't happen frequently. That thought terrified me - I wonder how many actual hardware problems in my devices at home are "solved" this way. It would definitely explain some of the infuriating random failures of these "smart" gadgets.
Obviously, I didn't want that. I needed a proper solution.
Schroedinger's Cat
Did I mention how ignorant I am when it comes to analog electronics? I love things nice and simple, with clearly defined signals that can be measured, recorded, quantified and processed. Unfortunately, there are cases like this, when even the process of measuring the signal changes it enough to make the measurement useless. It's like opening the box just to find the cat dead every time...
Just as a reminder, this is the circuit responsible for calculation of the nRD signal:
When I measured the nRD (or RDN in the schematic above) signal produced by the redesigned nRD/nWR circuit, it indeed rang as expected:
So, it's a start - I have a solid confirmation of what I saw on the logic analyser output. Time to trace back to the offending signal. First, I checked the upper input of the OR gate (pin 1 of U2A in schematic above):
Nope. The signal is ringing a bit, but well below the range where it could cause issues. Checking the other input (pin 2 on U2A) and output of the AND gate then:
Yeah, that's where the distortion is coming from. So, which of the two inputs of the AND gate is causing the issue? RDY (pin 1 of the U3A gate) maybe:
Nope, it's a bit noisy again, but doesn't look that bad. It must be the inverted clock signal (nCLK) then, since it's the last signal left to check:
And this is what I really hate. Once I connected my probe to the nCLK signal (pin 2 of the U3A gate), the issue stopped occurring. Just like that.
Put the probe on nCLK - no issue.
Take the probe away from nCLK - issue is back.
Ha, easy! All I have to do is add an oscilloscope to the BOM of my project and expose test points for connecting the probe, and the problem is fixed!
Seriously though, I don't know how to properly explain what is going on. After thinking about it for a while, I came up with the following explanation: this nCLK signal is actually routed to a number of input gates, and since these are CMOS chips, each of them works as a little capacitor, storing a bit of charge each time the signal is high. After it falls back to low, this charge has to go somewhere, causing a short spike that is sufficient to trigger another input gate, but not able to overcome the probe tip capacitance.
I don't know, maybe it doesn't make sense, please let me know what you think. There is however something valuable to learn here, even if I'm wrong.
For one - and this is probably obvious to you, but it was an important discovery for me - it means that there is a way to measure this kind of freak signal. If it is sufficient to drive the input of a gate, then instead of trying to measure it directly (because that clearly doesn't work), measure it after it passes through a gate - this will prevent the probe capacitance from altering the signal.
There is, however, another takeaway. If the probe capacitance is enough to clean up the signal, then the "solution" to a problem like this can be adding one small capacitor (I used 22pF) between the signal and ground. It worked immediately - the nCLK signal got much nicer and doesn't ring as much anymore.
Bonus glitch - buy one, get one free
When I was doing the analysis of the signals, I noticed something weird - the RDY signal was not as nice and clean as expected. These two images show it clearly (they are both taken from a single scope run; note the frame number at the bottom):
And just a few frames later:
Got me worried for a second, and indeed, it's a design flaw that needs addressing. Luckily, not as severe as it might seem.
And the good thing is I actually tested one of the designed features of my build, just did so unintentionally.
What's going on here?
Let's start with the good news: this not-that-high RDY signal comes from a design feature I described in the RDY signal experiments entry. In short, sometimes the RDY input on the 6502 can become an output producing a low signal, and this is what happened here: the wait state circuit is driving the RDY signal high, while the CPU is driving the same line low. There is a 470Ω resistor between the two (in parallel with a 22pF capacitor) which acts like a voltage divider, causing the signal to drop a bit on one end and rise a bit on the other, but it works. The only note here is that next time I should use a slightly larger value, like 1K, to ensure that the drop and rise are not as strong.
There is bad news, however. See, the reason the RDY line was pulled low by the 6502 was that it executed a WAI instruction, and it did so during EEPROM flashing. You might be thinking that the CPU should not execute any instructions during programming, and you would be quite right.
As it turns out (and I hinted at this at the beginning), when playing with the PLD setup I introduced a bug - when both WS_DISABLE (disable wait states entirely) and WS_DEBUG (force RDY signal low) are high, the wait state circuitry in the PLD should generate a low RDY signal, but it was doing the opposite - it was generating high RDY. Fixing this part would be easy, but it required changing the nRD/nWR stretching logic as well.
See, in the previous entry I wrote that the nRD/nWR signal translation is easy - it should stretch indefinitely while RDY is low and act normally with RDY high. WS_DISABLE and WS_DEBUG were supposed to be handled "automatically" (the former driving RDY high and the latter driving it low), but during EEPROM programming this is not the case. You want to disable wait states entirely (because you are using the slow, AVR-based clock) and you want to pull the RDY line low, so both signals are high at the same time. This is the schematic I came up with:
This way, even if RDY signal is driven low by wait state generator, WS_DISABLE will prevent indefinite stretching of nWR/nRD signals. Yeah, sometimes the solution is really simple.
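The fixed stretching condition can be written down as a simple boolean sketch (illustrative only - the real logic is a CUPL equation in the ATF22V10, and the signal names are taken from the description above):

```c
#include <stdbool.h>

/* Illustrative sketch of the fixed stretching condition: nRD/nWR stretching
   is active only while RDY is low AND wait states are not disabled. During
   EEPROM programming (WS_DISABLE high, RDY forced low via WS_DEBUG) the
   signals are therefore not stretched indefinitely. */
bool stretch(bool rdy, bool ws_disable) {
    return !rdy && !ws_disable;
}
```

Walking the truth table: a normal wait state (RDY low, WS_DISABLE low) still stretches; programming mode (RDY low, WS_DISABLE high) does not; with RDY high nothing is stretched either way.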
Conclusion
I'm slowly getting there. I keep running into smaller and bigger obstacles, but the project, overall, is progressing nicely. Solving these issues one by one feels good, and hopefully I can share more optimistic news next time.
Please, let me know what you think, and if you understand what the hell is going on with these ringing signals, I would really appreciate the explanation!
-
Strange copy/paste distortion
06/11/2021 at 08:55
Hunting glitches
Last time I wrote about a strange problem caused by a bad solder joint. I fixed it, the signal got considerably better, but was it the end of all problems? Sure it wasn't, and the funny part is that I had to review some of the things I wrote last time, referring to them as "simple" - but I'm getting ahead of myself.
The fun never stops
It's always one thing leading to another. Another simple question, to which there is never a simple answer. After fixing the blinking LED issue I moved on to the next thing - a strange copy-paste distortion. This was a really tricky one; I had known about it for some time, but I wasn't feeling up to the challenge until recently.
First: it only happened at higher frequencies. Running at 1-2MHz was bug-free. At 4MHz it was happening occasionally. At 8MHz - almost every time. It also required the SC26C92 and serial communication at 115200 baud minimum.
Second: it required a very complex software setup. On one hand this was a good sign, suggesting a software bug. On the other, it just made troubleshooting all that more difficult - testing produced plenty of bus data I had to analyse.
Third: it looked like a very nasty hardware bug. Something like crosstalk between the TX/RX channels, or something even worse. A bottomless pit of despair or something similar.
It's no wonder I was intimidated by it. Alas, it was really fascinating!
Problem statement
Yeah, I know I haven't described the problem just yet. This one is too good to spoil the fun with premature... description. So get this: you have to boot OS/1, load Microsoft BASIC, and go to this page to copy the sample BASIC code I use to test the stability of the build. The program I'm using is very useful - while small, it is pretty complex as far as 6502 BASIC goes (using trigonometry functions and floating point numbers), and due to the addressing scheme of MS BASIC it heavily uses the stack and distant RAM pages, switching between them - I used it to detect nRD/nWR timing violations. So the program is:
10 FOR A=0 TO 6.2 STEP 0.2
20 PRINT TAB(40+SIN(A)*20);"*"
30 NEXT A
It's important: you must not type it in, but copy it from the page and paste it into the serial terminal.
What you get in the terminal is this:
B( FOR A=0 TO 6.2 STEP 0.2
20 PRINT TAB(40+SIN(A)*20);"*"
30 NEXT A
Crap, so the data got distorted during copy/paste, right? Well, this is where it gets interesting:
OK
B( FOR A=0 TO 6.2 STEP 0.2
20 PRINT TAB(40+SIN(A)*20);"*"
30 NEXT A
LIST
10 FOR A=0 TO 6.2 STEP 0.2
20 PRINT TAB(40+SIN(A)*20);"*"
30 NEXT A
OK
Do I have your attention now?
As you can see - the code got distorted (and sometimes it was much, much worse than that), but when you LIST it, everything is fine. Oh, and in case you are wondering:
OK
RUN
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
OK
It runs well too.
OS/1 asynchronous I/O
This is where I should probably explain a bit more about the serial I/O implementation. It uses a fully asynchronous, interrupt-driven, buffered mechanism for sending and receiving data. This makes troubleshooting all that more difficult, because the data is moved around the system in a pretty nondeterministic way, but the general flow of a single character is the following:
- Data is received in the SC26C92 RX FIFO channel A buffer (8 bytes long), and the interrupt line is pulled low (already after the first character, but you can change that in the DUART config registers),
- The CPU receives the IRQ from the DUART and processes all possible scenarios:
- The RX flag is checked - if there are characters in the DUART receive buffer, they are read until the FIFO is empty and copied over to the channel A receive buffer of OS/1 in RAM,
- The TX flag is checked - if there is at least one empty slot in the DUART TX FIFO buffer, it checks if there are any characters in the RAM OS/1 TX buffer (again, the one for channel A). If there are, it will copy one of them to the DUART TX FIFO; otherwise it will disable the TX interrupt mask until new data to send is written to the RAM buffer.
Now, this code is pretty similar to the one I wrote for WDC65C51/R6551, but the main difference is that here I had to abstract the "channel" thing, so there are two RAM RX buffers and two RAM TX buffers. It means more pointer manipulation and all that.
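A rough C sketch of one of those RAM buffers (the real implementation is 6502 assembly with one RX and one TX buffer per DUART channel; the names and the 256-byte size here are my assumptions):

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified single-producer/single-consumer ring buffer, as the IRQ
   handler (producer for RX) and the main code (consumer) would share it.
   The 8-bit head/tail indices wrap naturally at 256 bytes. */
typedef struct {
    uint8_t data[256];
    uint8_t head;  /* written by the producer */
    uint8_t tail;  /* advanced by the consumer */
} ringbuf_t;

bool rb_put(ringbuf_t *b, uint8_t c) {
    uint8_t next = (uint8_t)(b->head + 1);
    if (next == b->tail) return false;  /* buffer full - caller must cope */
    b->data[b->head] = c;
    b->head = next;
    return true;
}

bool rb_get(ringbuf_t *b, uint8_t *c) {
    if (b->head == b->tail) return false;  /* buffer empty */
    *c = b->data[b->tail];
    b->tail = (uint8_t)(b->tail + 1);
    return true;
}

/* Tiny usage demo: pushes one byte through the buffer and checks the echo. */
bool rb_demo(void) {
    ringbuf_t b = { {0}, 0, 0 };
    uint8_t c = 0;
    return rb_put(&b, 'A') && rb_get(&b, &c) && c == 'A';
}
```

Every character passes through two such buffers (RX and TX) on its way from the DUART to the screen echo, which is exactly the path traced in the captures below.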
Still, regardless of the complexity, the code seemed to work in most cases, so what could it have been?
Problem simplified
The good thing was that when I ported MS BASIC, I used the native OS/1 routines for reading/writing serial data, so the same problem should occur also in a simpler scenario - and indeed that was the case. I had never copy-pasted OS/1 shell commands, but I tried now, pasting the same string (1234567890ABCDEFGHIJ) each time:
OS/1>12-45-7890ABCDEFGHIJ
Command not recognized - either wrong keyword or incorrect number of parameters
Enter HELP command to list available commands
OS/1>12E45H7890ABCDEFGHIJ
Command not recognized - either wrong keyword or incorrect number of parameters
Enter HELP command to list available commands
OS/1>12F456J890ABCDEFGHIJ
Command not recognized - either wrong keyword or incorrect number of parameters
Enter HELP command to list available commands
OS/1>1264867A90ABCDEFGHIJ
Command not recognized - either wrong keyword or incorrect number of parameters
Enter HELP command to list available commands
OS/1>12045C7890ABCDEFGHIJ
Command not recognized - either wrong keyword or incorrect number of parameters
Enter HELP command to list available commands
OS/1>127450A890ABCDEFGHIJ
First analysis attempt
I executed the scenario a number of times with the logic analyser connected, capturing each bus operation for about 2-3 seconds at 200MHz. It doesn't seem like much, but there was plenty of data to plow through, and the analysis is not that simple.
As you can see above, even after shortening the dump to a mere 4 milliseconds in which everything happens, there were approximately 20000 read operations, over 2000 write operations and roughly 200 accesses to the UART during this simple copy/paste operation. The UART count has been reduced due to the observed ringing - you will see it in the close-ups.
I tried looking for odd-looking data, but this was just stupid - there was nothing really out of order there. So I started checking the actual instances of wrong data being sent by searching the decoded 8080 bus data for specific values. If you take a look at the last sample (127450A890ABCDEFGHIJ), I first traced the flow of the correct characters (1 and 2), and it was exactly as I expected - the data is first read from the UART (indicated by the nUART chip select signal), saved to the RX buffer, copied over to the TX buffer (to echo on the screen) and then copied over to the TX FIFO (again, indicated by nUART chip select). You can see the individual steps below:

Address 0x0263 represents the channel A RX register, and as you can see, value '1' (0x31) is being read correctly. Next, saving to the RX buffer:

The RX buffer for channel A starts at 0x0300 - and this is where the data is being written. Next, after being read by the shell routine, it's written back to the TX buffer to echo on screen, and this is the part where it's picked up and written to the channel A TX register:
You can see the RAM channel A TX buffer being read (0xB1 0x27 - LDA (0x27),Y), which first reads the buffer start address (0x0500), adds the offset from the Y register and finally reads value 0x31, which is then written to the UART channel A TX register at 0x0263.

Please note: the SC26C92 has dual-use registers, depending on whether you write to or read from them - hence the same address is used for both the TX and RX registers.
Then I looked for 3 being replaced by 7. I didn't like what I found - there was nothing wrong with the signal, and it seemed that the strange character was being read from memory, which was the part I couldn't wrap my head around. Loading 3 from the UART RX register:

Write to RX buffer:
It's all good, and this is what is being processed by OS/1. That being said, it still wrote 7 to the shell, and this value was being read correctly from the buffer:

How come it seems to be stored in memory correctly (and read from it correctly when you LIST the program), but is then read incorrectly when echoing? Another timing violation? And if so, how do I find it?
Luckily, it was much simpler, and I noticed it when I looked closer at the data captured by the logic analyser. As it turns out, I was reading from the RAM RX buffer when copying data to the UART TX FIFO. Notice how the pointer stored at 0x27 now points to 0x0300 instead of 0x0500.

Still, how come it happened only sometimes, and only at high frequencies? As it turns out, the problem occurred only with a specific order of send/receive operations, and only if the TX FIFO was being serviced during RX interrupts.
Pattern recognition
Actually, when I said there was nothing special about the captured signals, I was wrong. There was, but I just failed to see this:
See, the two times when the operation failed, there were 5 UART accesses in a row followed by 2, instead of the usual 3 or 4. Three accesses indicate a TX IRQ:
Actually, the first access happens before the IRQ - it's what triggers it. A typical RX IRQ requires 4 accesses:
And the culprit - combined RX/TX IRQ was always accessing UART five times:
This was happening because only in these two instances was there something not yet serviced in the TX buffer at the moment data was received, so the sending took place during the RX interrupt service - as designed, just poorly implemented, because I forgot to update the pointer...
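The class of bug is easy to reproduce in a toy model. Below is a hypothetical sketch (not the actual OS/1 handler): the variable zp_27 plays the role of the zero-page pointer at 0x27, which the buggy handler fails to reload before servicing TX.

```python
# Hypothetical illustration of the IRQ handler bug - not the actual OS/1 code.
# zp_27 plays the role of the zero-page pointer at 0x27.

RX_BUF = 0x0300  # channel A RX buffer base
TX_BUF = 0x0500  # channel A TX buffer base

def irq_handler_buggy(rx_pending, tx_pending, trace):
    zp_27 = None
    if rx_pending:
        zp_27 = RX_BUF                 # point at RX buffer to store the byte
        trace.append(("write", zp_27))
    if tx_pending:
        # BUG: pointer not reloaded - if we just serviced RX,
        # the TX code reads from 0x0300 instead of 0x0500
        if zp_27 is None:
            zp_27 = TX_BUF
        trace.append(("read", zp_27))

def irq_handler_fixed(rx_pending, tx_pending, trace):
    if rx_pending:
        trace.append(("write", RX_BUF))
    if tx_pending:
        trace.append(("read", TX_BUF))  # always reload the pointer first

demo = []
irq_handler_buggy(rx_pending=True, tx_pending=True, trace=demo)
# combined RX/TX IRQ: the TX side reads from the wrong buffer
print(demo)  # [('write', 768), ('read', 768)] - both hit 0x0300
```

A TX-only or RX-only interrupt takes the correct path, which is exactly why the glitch appeared only when both conditions lined up.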
And, to be fair, after you spend some time looking at these logic analyser dumps, you get better at reading them even without looking at specific values. A full IRQ cycle looks like this (the markers indicate the start and end of the handler execution):
How do I know? When an IRQ happens, the CPU has to push three bytes to the stack: the current PC (two bytes) and the status register. No other operation (that I know of) uses three consecutive memory writes:
Similarly, returning from the interrupt handler is pretty specific, but not unique - it triggers four consecutive memory accesses. That's less distinctive, though: four NOP operations fetched from memory can look the same:
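This kind of pattern matching is also easy to automate when post-processing a capture. A sketch, assuming a made-up capture format of (operation, address) tuples:

```python
# Sketch of spotting IRQ entries in a bus capture (the format is made up):
# an IRQ entry shows up as three consecutive writes (PCH, PCL, status)
# to the stack page 0x0100-0x01FF.

def find_irq_entries(capture):
    """Return indices where three consecutive stack-page writes start."""
    hits = []
    for i in range(len(capture) - 2):
        window = capture[i:i + 3]
        if all(op == "W" and 0x0100 <= addr <= 0x01FF for op, addr in window):
            hits.append(i)
    return hits

capture = [
    ("R", 0x8012), ("R", 0x8013),                 # normal fetches
    ("W", 0x01FD), ("W", 0x01FC), ("W", 0x01FB),  # IRQ: push PCH, PCL, P
    ("R", 0xFFFE), ("R", 0xFFFF),                 # fetch IRQ vector
]
print(find_irq_entries(capture))  # [2]
```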
So the point I'm trying to make here is that while intimidating at first, these dumps from logic analyser are super useful. It takes a while to get used to them, but it's worth the effort - it will make your glitch hunting all the more effective.
Is that it?
It would have been too easy if it just ended here, right? I fixed the code, uploaded it to ROM using my onboard EEPROM programmer and tested the whole thing again.
Just to see it fail.
Exactly the same way.
Bloody hell.
So I pull out my logic analyser again, capture the same sequence again, and indeed, it's happening the very same way. Actually, it's all too similar. Didn't I just add several new instructions to my IRQ handler? It should be at least a bit different.
It took me a short while to confirm this. Indeed, the ROM contents hadn't changed at all. Weird, I was sure the upload had worked. What I did find was even more fascinating - but I will write about it next time.
Let me know what you think in the comments below, and I will see you soon with another fascinating glitch!
-
Every day is a school day
06/03/2021 at 18:59 • 3 comments
Recently I wrote about making PCB prototyping cheaper. Today I want to write about things I have learned from the latest PCB prototype of the DB6502 build. Let me start with the update first.
DB6502 PCB prototype update
The good news is that I haven't made any major mistakes in my latest design. Unlike the previous prototype, this one didn't require any extra wires or cutting of existing traces. The AVRISP port is also oriented properly this time!
Does that mean everything is fine and dandy? Sadly, no. The latest prototype does serve its purpose nicely - I can proceed with testing some of my ideas and verify some assumptions. I can learn and improve, but the board, unfortunately, doesn't always work "standalone", without extra chips added to the breadboard next to it. The main achievement here is that I planned for such "failure" and made the board configurable - thanks to this flexibility I was able to spot and fix the problems that were blocking the development of the previous revision.
Let's talk about the problems and lessons learned then.
Problem 1: nRD/nWR signal stretching
In one of the previous entries I wrote about the impact of introducing wait states on the 6502->8080 bus mapping. Basically, one of the issues with the 6502 CPU is that it uses a different encoding of read/write signals than the 8080 and its derivatives. It might seem irrelevant for a 6502-based build, but unfortunately, both your memory chips (RAM and ROM) and the SC26C92 UART interface are designed for the 8080 bus. Instead of the single r/W signal from the 6502, you have nRD/nWR inputs on 8080-compatible devices.
Ben Eater uses the simplest possible trick - he ensures that nWR can go low only during the high phase of the CPU clock cycle. While that works fine at slow speeds (and is a generally accepted solution for most homebrew 6502 builds), it can cause issues when introducing wait states, because it makes write operations happen twice (or more, depending on the number of wait states).
In my previous PCB revision I had already implemented Ben's algorithm in my PLD: nRD and nWR were calculated there, based on the r/W signal and the clock input. It was a very elegant solution and it worked beautifully due to the simplicity of the calculation. Unfortunately, when I added wait states to the equation, things got messy and the calculation got just a little bit longer.
What you have to take into account is that your nWR/nRD signals must rise no more than 10ns after the falling edge of your clock. According to the 6502 datasheet, this is how long the data/address lines remain stable. If you wait longer, your address/data lines might start changing, resulting in bus corruption.
See tDHW above, and this is the actual figure:
The problem with nRD/nWR stretching is that the logic behind it is pretty complex in my case - five signals are included in the computation: r/W, CLK, RDY, WS_DEBUG and WS_DISABLE. In general, the logic for the computation is the following (in order of priority):
- If WS_DEBUG is high, then we are in "debugger-forced" wait state, so nRD or nWR should be low (depending on the value of r/W),
- If WS_DISABLE is high, then wait state computation should not be performed, and nRD/nWR should be calculated only using r/W and CLK (as in Ben's build),
- If RDY is high and CLK is low, it means that we have just completed full CPU cycle, so both nRD and nWR should be high,
- If RDY is low or CLK is high, then nRD/nWR should be calculated using r/W signal.
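The four rules can be written out as a small truth function - a Python sketch rather than actual PLD equations, and the exact CLK gating in rule 2 is my assumption:

```python
# Sketch of the nRD/nWR priority rules from the text (Python instead of
# PLD equations). Signals are booleans (True = high); outputs are the
# active-low nRD/nWR levels. The precise gating in rule 2 is my assumption.

def rd_wr(rW, CLK, RDY, WS_DEBUG, WS_DISABLE):
    """Return (nRD, nWR) according to the four rules, in priority order."""
    if WS_DEBUG:                       # rule 1: debugger-forced wait state
        return (not rW, rW)            # read -> nRD low, write -> nWR low
    if WS_DISABLE:                     # rule 2: Ben's simple scheme,
        return (not (CLK and rW),      # strobes allowed only while CLK high
                not (CLK and not rW))
    if RDY and not CLK:                # rule 3: full CPU cycle completed
        return (True, True)            # both strobes inactive (high)
    return (not rW, rW)                # rule 4: follow r/W

# rule 3: after a completed cycle, both strobes are deasserted
assert rd_wr(rW=True, CLK=False, RDY=True, WS_DEBUG=False, WS_DISABLE=False) == (True, True)
# rule 4: wait state in progress (RDY low), the read strobe stays active
assert rd_wr(rW=True, CLK=False, RDY=False, WS_DEBUG=False, WS_DISABLE=False) == (False, True)
```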
Rules 3 and 4 can be illustrated using the following diagram:
There is a problem, however - we still have two more signals (WS_DEBUG and WS_DISABLE) to include in the computation, and remember, we need the result to be calculated in less than 10ns relative to the falling clock edge. Even if we use AC gates, the propagation delay is going to be too long. The inverter alone will take up to 7ns:
Then there is the AND gate (74AC08) with 7.5ns propagation delay:
Last but not least - OR gate (74AC32) with 7.5ns propagation delay:
It all sums up to 22ns pessimistically, and two signals are still missing (WS_DEBUG and WS_DISABLE). I just gave up on this and never looked back - why bother with a 22ns incomplete solution when you have 15ns guaranteed?
As it turns out, you can actually get this to work much faster, you just need to be smart about it. The first trick is to gain time instead of spending it on inverting the clock. Instead of the above schematic, do the following:
See, with this setup, instead of waiting for the PHI2 (clock) signal to be inverted, you start with the inverted signal, and your computation begins several nanoseconds before the falling clock edge. If you compute the propagation delay now, it looks much, much better: the AND + OR gates still take 15ns pessimistically, but the clock inverter's propagation delay can now be subtracted from (instead of added to) this number. Unfortunately, we can't take the pessimistic number here, we need to assume the optimistic one, which is just 1,5ns. This leaves us with 13,5ns, which is better than the PLD - and in reality, with values closer to typical, you end up with your nRD/nWR signal rising before the 10ns limit. If you want to be super-safe, you can always invert the signal twice more.
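The delay bookkeeping from the paragraph above, written out explicitly (using the worst-case gate figures and the best-case inverter figure quoted in the text):

```python
# Propagation-delay bookkeeping for the two variants described above,
# using the ns figures quoted in the text.

INV_MAX = 7.0   # 74AC inverter, pessimistic (ns)
INV_MIN = 1.5   # 74AC inverter, optimistic (ns)
AND_MAX = 7.5   # 74AC08 AND gate, pessimistic (ns)
OR_MAX  = 7.5   # 74AC32 OR gate, pessimistic (ns)
LIMIT   = 10.0  # tDHW: nRD/nWR must rise within 10ns of the falling edge

# Naive variant: invert the clock first, then AND + OR in series
naive = INV_MAX + AND_MAX + OR_MAX
print(naive)    # prints 22.0 - way over the 10ns budget

# Inverted-clock trick: the computation starts INV_MIN *before* the
# falling edge, so the (best-case) inverter delay is subtracted
trick = AND_MAX + OR_MAX - INV_MIN
print(trick)    # prints 13.5 - better than the 15ns PLD, though still
                # above 10ns in the fully pessimistic case
```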
There is just one problem left - the WS_DISABLE and WS_DEBUG signals. I puzzled over it for quite a while before I realised how dumb that was. Both of these are already included in the schematic, in the RDY signal. When WS_DISABLE is high, so is RDY; when WS_DEBUG is high, RDY is low. Yes, it is that simple.
Sure, if they change in the middle of the cycle, they will only apply at the next calculation point for RDY. So what? They are only debug signals, not intended for high frequency clock operation.
Lesson learned for PCB design
You might be wondering what all that timing nonsense has to do with PCB design. As a matter of fact - quite a lot. In my first revision of the DB6502 prototype board I didn't use the clock-inversion hack and was forced to use the slower variant. Furthermore, I had never imagined having problems with my nRD/nWR translation in the PLD, so in the first board revision these signals were "hardwired" to the board. Suspecting that they might cause issues down the line, I designed the newer prototype with jumpers for common signals, and with a simple jumper removal I disconnected the PLD signals from all the other chips. In the picture below you can see the jumpers (on the right side), where the PLD nRD and nWR are disconnected; on the left side there are lines where these signals are fed back into the board:
Next I built the improved translation logic on a breadboard, connected the signals to dedicated headers on the side of the board and voila! I had a new, improved version to test - mixing the durability of a PCB with the flexibility of a breadboard and jumper cables.
Does it mean that you should put jumpers on each and every signal? Well, obviously not. Some things, like the address/data bus, will not change. Others will be simple and tested, with no need to modify them. You should try to anticipate the weak spots in your design (and everything related to CPU/peripheral timing is probably going to be one) and plan for failure by making it easier to introduce a plan B. Don't overdo it - you can always cut and patch, but doing so might be hard on complicated boards.
Problem 2: Simple issue with blink LED
The worst problems you will ever encounter will usually start with a simple LED not working, or working erratically. If humanity fails to colonize other worlds, it will most definitely start with some abnormal reading on the smallest instrument in the space shuttle, and it will happen because somebody will notice, shrug and ignore the warning. The rest of humankind will perish in a massive explosion hours or days later.
Not that I'm comparing building homebrew 8-bit computer to space race or something. Or maybe I am?
Anyway, this one started in the most innocent way possible. I fixed the nRD/nWR signal translation logic with external chips, inserted an 8MHz clock and started testing the system with a single wait state for ROM access. OS/1 booted as usual, blinking the onboard LED twice to indicate the start and end of the init sequence, then loaded and ran MS BASIC just fine. So far so good. I was really glad.
Then I typed in OS/1 terminal "BLINK ON". Nothing. "BLINK OFF". Still nothing.
I launched the system monitor to see the current state of the DDR register in VIA PORTB (where the LED is connected) and it was incorrect. I expected 0x0f, but there was 0x0e there. A software bug, probably? Then I had this feeling that I had already tested it at a lower frequency, and it was fine. I changed the system clock to 4MHz and tested again. Works like a charm, no problem. Changed back to 8MHz and it fails. I mean: everything seems to work correctly, except for this bloody blink LED.
So the software bug manifests itself only at higher CPU frequencies? And do I care, really? It's just a blink LED. I can add another operation to overwrite incorrect VIA register at the end of load sequence or shell boot, and it will go away, right?
This time I decided to investigate further and what I found was... amazing!
Before I take you on this fascinating journey, I need to explain something very, very important: in my testing I was using the VIA chip (and I/O in general) heavily - it's not like I was just blinking a LED. The same VIA port was used for communication with the onboard LCD, and it worked just fine. The same goes for the UART, which was being used both for asynchronous, interrupt-driven serial communication and as a timing reference (for delays and CPU speed calculation). So, it wasn't just a simple blink on/off; there was this whole I/O subsystem that was - and this is very important - functioning correctly, except for one register and one bit in one situation only.
Problem 2a: VIA_CS signal quality
I tried to investigate the issue using my logic analyser, but failed. For some reason connecting it to the system bus interfered with UART operation, and without a proper shell I couldn't test it correctly. This is still an open question, and one I'm determined to answer. What I did notice was that connecting the logic analyser to the VIA_CS signal caused the UART to fail, so I decided to look closer at this particular signal with a scope.
It took me just a while to see these strange readings (captured at a 4MHz clock; the yellow signal is VIA_CS, purple is the CLK). The first one shows a very strange "high" level output of the VIA_CS:
As you can see, the voltage of the high signal is extremely low, probably too low for the chips to read it correctly. It also takes a while to rise to normal levels:
Or here:
So, with the very slow rise time it was actually possible that the whole thing worked at lower frequencies only because the VIA_CS signal had enough time to rise before other peripherals were accessed - which is important, because the other VIA_CS2 signal was connected to the general IO_CS line. Bottom line: if any write to the UART happened shortly after a VIA access, the still-low VIA_CS could cause the VIA to interpret it as a write to the VIA, resulting in an overwrite of the DDR register.
Now that I had a plausible explanation for the invalid value of the DDR register, I just needed to answer: what makes this level so low? Maybe this is how the 74AC138 is supposed to work?
I decided to test it, so I placed another 74AC138 next to the PCB on a breadboard and routed the same input signals to it, just to see how it behaves with an 8MHz clock. I wasn't very surprised to learn that the other chip produces a much better signal:
So, what is the difference here? I was almost certain that the cause of the failure was my lousy routing of the +5V line on the PCB:
See that large white cross at the bottom of the board - this is where the 74AC138 is located. It's almost at the far end of a very long power line, with plenty of chips along the way to suck the power. I wasn't sure it was this, but seriously, what else could it be? I was sure I would not be able to solve this issue in this revision of the board, but I had a backup plan - I could use correct signals routed from the external 74AC138.
Lesson learned for PCB design
Well, this one is simple - my shitty power distribution was caused by the fact that it was practically one of the last things I routed on the board. I should have started with it, thought about it a bit more and maybe placed one more, larger capacitor near the bottom of the board. That probably wouldn't have hurt either.
So, boys and girls - if you don't want to end up with similar issues, make sure you start your routing with deliberate and careful planning of power and ground lines. It will save you a lot of headache down the line!
Problem 2b: strange noise on VIA_CS line
Solving the previous issue was pretty easy - I could connect the VIA_CS signal generated by the external 74AC138 to the board and go with that, but when scrolling through the history of captures on my scope I noticed something very, very strange. As previously: the yellow signal is VIA_CS, and purple represents the clock line:
It was never supposed to be simple, I know, but seriously - what the hell?
I decided to look at the inputs of 74AC138 chip, hoping to find the one that is responsible for this rapid signal oscillation. It wasn't IO_CS (but there was something off about it too):
Yellow is VIA_CS, purple is IO_CS (active high). As you can see, it's ringing a bit, but VIA_CS doesn't seem correlated to it. Let's try the A7 then:
This time purple is the A7 signal. Nope, not that one either. Then I checked A5 and this was the one:
It's even more visible here:
So, what the hell? It seems like the A5 line is suffering from something strange that causes it to produce a very weak signal. Initially I thought the CPU was driving the line poorly or something like that, but it just didn't seem likely.
It all seemed like there was something trying to "drive" the A5 line - like the signal was struggling to reach high levels, like it was being pulled down. I turned the whole device off and, in a desperate attempt, measured the resistance between the A5 and A6 lines, and took a picture so I never ever forget:
Yeah, that reads 1,7kOhm. Not a dead short, sure, but enough to pull the signal down just enough, assuming that A6 is low at the time the VIA is being accessed (which it is, due to the addressing scheme).
How did it happen? Leftover flux from soldering. Apparently I didn't clean it well enough... After another round of cleaning, the signal got much, much better:
Now the reading is still in the megaohm range, while there should be no contact whatsoever, but it's sufficient for the board to work correctly. With the VIA_CS signal routed from the external chip I now have a perfectly stable blink LED on my board, even at 8MHz.
Even so - the A5 signal still seems quite low. No worries, I will come back to that.
Lesson learned for PCB design
Again, this is obvious - boys and girls, it doesn't matter how pretty you are on the inside, if you don't wash your private parts properly. Same goes for PCB :)
It was the first time ever that I was affected by this, but I learned my lesson. I actually plan to test a few other bits of the board now and clean it a few more times, just to be safe.
Problem 2a again: round two
This is when I was almost ready to post this entry online, but I decided to ask someone way more experienced and much smarter than myself to proof-read it and provide feedback.
What I got was pure gold indeed.
I received a comment that my power distribution was not that shitty after all, and most likely it didn't cause the observed issue. Yeah, it wouldn't work for high voltages or high currents, but with the circuit in question I should be fine. My mentor suggested I check the voltage at the input of my 74AC138 chip for any power fluctuations that would explain the drop in quality of the VIA_CS signal. I had tried something similar before, but I wasn't able to focus the scope on the actual problem and the reading was inconclusive. Since the chip in question is in a SOIC package, you need to hold your probe in place touching the pin - it's not easy and you can't hold it steady forever.
This time I decided to spend some more time figuring out the proper scope trigger setup for the problem analysis. I tried runt mode, but I wasn't able to set it up to trigger on the correct signal. Then, after reiterating the problem aloud, I realised that I needed to catch a slowly rising slope on the VIA_CS signal - and this is what I set up: a rising slope from 2.3V to 3.5V taking more than 50ns.
See, you think buying a scope is hard? Sure, it is - after all, it's a pretty difficult decision to make. The problem is that, contrary to popular belief, owning one doesn't make you instantly smarter and, sadly, it doesn't solve any problems on its own. You have to spend a considerable amount of time learning to use the device, speaking its language and understanding how it works (and when it doesn't). So, when you finally get to the point where you manage to see EXACTLY what you were looking for, the feeling of achievement is overwhelming.
Anyway, when I caught the signal in question, I had my probe touching the nearest decoupling capacitor, and the reading contradicted the initial hypothesis of shitty power distribution: power was pretty solid just next to the chip. Yellow signal is the VIA_CS signal, purple is the +5V side of the cap:
So, if the power was fine just next to the chip, maybe it gets worse down the line? Nope:
This was being measured at the last chip on the long power line - still, the power fluctuates only just a bit.
Finally, I tested the 74AC138 power input (between the cap and the chip) by touching the probe to VCC pin and this is what I got:
What the hell? Power is fine just before entering 74AC138 and it's perfectly good right after it. That doesn't make sense, does it?
I noticed one more thing - the harder I pushed the pin down, the fewer events the scope captured, up to a point where it would not trigger anymore. I tried measuring the resistance between the 74AC138 VCC input and the power jack +5V input (obviously with the device powered down), and guess what - this explains it perfectly. When pressing the pin from the top, it reads (as expected):
Now, when gently touching the pin from the side:
So, it wasn't the power distribution that was bad on my board. Sure, it could have been better, but still, it was good enough for the purpose. It was a simple bad solder joint. After I fixed it with just a dab of extra solder, my "software bug" causing blink to fail stopped occurring. At the same time it fixed my power distribution on the board :)
Seriously though, I need you to let that sink in: the 74AC138 chip was used each time I sent data to the LCD or read its status, and the LCD was running fine. It was also used for enabling the UART, and the serial communication was working perfectly. The only thing that failed to work, and only under certain conditions, was the blink LED. Only one bit of data sent to the VIA DDR register failed, due to something that would seem catastrophic - a failure to properly connect power to the I/O selector chip.
Lesson learned for PCB design
When testing the board, don't assume that things working most of the time imply the build or design is correct. Even very serious issues (like disconnecting power from glue logic chip or disconnecting ground from reset controller) do not have to cause spectacular failures and can haunt you in a form of very rare and mysterious glitches.
And, most of all, test your hypothesis each time. I was really sure it was an error in PCB design that caused the issues with power delivery, and I was about to start the process all over. I would have wasted a lot of time and money on something that took an hour to find and five minutes to fix - had somebody not challenged my theory.
Problem 2b: round two
Remember the strangely low A5 signal during VIA_CS operation? The one that I fixed a bit by cleaning the flux off? This is what I got back then:
It was a bit on the low side, right?
Probably this is where the 74AC138 was getting its power from (together with the IO_CS signal), because after fixing the power issue this is what I see:
So yeah, I had plenty of clues that I didn't read correctly, but as the title says - every day is a school day!
Summary
Actually, the outcome is very, very optimistic: I did run into three serious issues with my latest board (so far - there are probably more!), but I'm very, very happy that I managed to find them and their underlying causes. It was a great exercise, I learned a lot and I feel much more confident in my adventures with electronic design. I hope you learned a thing or two reading this today.
-
Reducing cost of your own PCBs
05/25/2021 at 10:38 • 0 comments
Making PCB on a budget
Experimenting with digital electronics can be a lot of fun, but like every other hobby, it can cost you a small fortune. Especially at the beginning you will make mistakes and buy useless gear, but as time goes by you will learn to choose wisely. What does that mean? A different thing for each one of us, as we work in different environments and are in different financial and social situations. My work is mostly limited by space (no dedicated area to work, sharing a workbench between different hobbies and projects) and time (being the parent of a four-year-old, I can't really plan ahead a lot and have to be able to start/stop work on short notice). My solution to these limitations: using multiple PCB revisions instead of breadboards. Sure, it takes time to design them, but all you need is a laptop and you can work on the design anywhere around the house. The completed project is very durable - all you need to do is grab it from the drawer, plug it in and resume hacking, with no need to spend a couple of hours looking for loose wires.
But then there is the cost factor: making PCBs is not cheap, especially when you order one project at a time - shipping costs from China can be prohibitive. Another thing is that at the beginning you tend to make your boards larger than they need to be, simply through lack of experience.
Don't worry, you will get better at it, and while you do, I want to share some tips/hints I have learned recently.
Tip 1: verify your assumptions!
It wouldn't surprise you if I said that using smaller, SMD packages can help you save some money - after all, the cost of a PCB nowadays is mostly determined by the size of the board, and most of the space is taken up by your ICs. That being said, there are other factors I was surprised to learn about.
Let's start with a simple comparison. Below you will find two revisions of the same Z80 development board - the first one (on the right) was made with thru-hole components only; the second uses surface mount to some extent (only the simplest packages - SOIC and 1206 - and not for all parts). What is also important, the second board is significantly improved - it has an added I/O decoder, and all the SC26C92 GPIO input pins now use pull-down resistors for stability. Still, the second revision is considerably smaller:
For quite a while I would stick to DIP packages for a very simple reason: chips are expensive, and you have to recycle them. I would design my boards with tooled DIP sockets in mind, and only after several iterations did I realize how wrong my assumptions were. Let's look at the costs, shall we?
Variant 1: DIP package
First, you need the chip for about 3,01 PLN, but also a socket, which costs 1,79 PLN. That's not all, you also have to consider the real estate that the package uses.
Variant 2: SOIC package
The second option is to go with the SOIC package - at 2,20 PLN it's considerably cheaper than its DIP counterpart. Actually, it costs only slightly more than the DIP socket alone. When you consider the extra saving from the smaller PCB size required, it gets even cheaper.
Variant 3: TSSOP package
This one is even cheaper - only 2,04 PLN - and it uses very little space on the PCB, but TSSOP can be too hard for other people to solder by hand. I will elaborate below on why this matters for the cost of your own fabrication.
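A quick sanity check of the price arithmetic for the three variants (PLN figures quoted above):

```python
# Per-chip cost comparison using the PLN prices quoted above
dip_chip, dip_socket = 3.01, 1.79
dip = dip_chip + dip_socket   # DIP chip plus tooled socket
soic = 2.20                   # SOIC chip, no socket needed
tssop = 2.04                  # TSSOP chip, smallest footprint
assert round(dip, 2) == 4.80  # more than twice the SOIC price,
                              # before counting the PCB real estate savings
```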
One could argue that you can get a cheaper socket, and it's true - there are non-tooled sockets for 0,50 PLN - but the point is still valid: sometimes using a 0,50 USD socket to be able to recycle a 0,60 USD chip is not that smart, and the SOIC package makes for a great introduction to SMD soldering.
Sure, there are other cases - an EEPROM, SRAM or CPU costs significantly more than the socket it uses, and in those cases the savings on PCB real estate do not justify the expense, unless you plan to desolder the chips afterwards.
Bottom line: before you start planning the layout of your board, shop around for parts for a while - it might turn out that small changes have a significant impact on the final cost. It's also much easier to change footprints at the beginning...
Tip 2: no, seriously, look for alternatives
One of the things I wish I had known a while ago is that 71256 SRAM chips are a thing. They are not only significantly faster (as fast as 12ns even in a DIP package) and smaller (narrow DIP-28), but also cheaper at 13,69 PLN, compared to 14,48 PLN for the chip from Ben's build. Their size also matters when you use them on a breadboard - they leave you much more room for all the mixed address/data line wires.
So yeah, learn to use search engines and spend some time reading about your options - your PCB can get significantly smaller and more flexible with some simple chip substitutions.
Compare sizes of DB6502 protoboards v1 and v2:
As you can see, the second revision is just slightly larger, but it has many, many more chips and ports - VIA, SC26C92, debug/supervisory circuitry, etc. All of this is achieved with smaller footprints, chip replacements (like the memory) and moving some parts to the back side of the PCB. It's amazing what you can do with simple changes like these!
Tip 3: pull-ups and pull-downs
Another thing I had never noticed before was resistor arrays - these things are just perfect for the multiple pull-downs/pull-ups you might need at your board inputs. On my first DB6502 prototyping board I placed seven 4K7 pull-up resistors. In my second revision I replaced them with a single resistor network for only 1,38 PLN. It uses a fraction of the board real estate and is really easy to use. Definitely recommended!
This is how neat they look on finished board:
And this is a comparison of the real estate used (the old board with individual resistors, and the resistor network placed in the ROM socket above):
Tip 4: Don't forget the other side
One of the things that fit great on the backside of your board are decoupling capacitors, especially if you use the 1206 SMD package. They are really easy to solder, and they let you use the space on the front side of your board better. Bonus: you can place them really close to the VCC input of your DIP chips, improving the quality of your build.
You can also try putting some of your SOIC package chips on the back of the board, but the routing can get tricky. I haven't dared to do that yet.
Another thing that fits great on the reverse side of your PCB is the QR code with a link to your project - very convenient!
Tip 5: Make your design flexible
This will increase the size of your board a bit and might make it a bit harder to design, but then again - the worst thing is having to design, order and pay for a new revision when you change your mind. Think about what is set in stone and what is negotiable.
One of the silly things I did in my first PCB ever (the DB6502) was to put two VIA chips on it and wire the first one to the LCD/keyboard/blink ports. A waste of space (which costs money), extra chips (more money) and added complexity. In my latest revision I use only one VIA chip connected to two ports - one preconfigured for the LCD and another simply exposing raw VIA I/O pins. You can choose which interface you prefer, and if you want, you can always hook up a second VIA chip using the extension port on the side of the board.
Add jumpers where you want to let people decide how to configure the board - it will make their life easier, and they will be more likely to reuse your design. While not obvious, this is also an important cost factor for your future projects!
I will write more on that in next entry (where I will summarize lessons learned from the second revision of the DB6502 prototype board), but as it turned out, these jumpers can also save you from design errors!
Tip 6: Don't pay for your PCBs at all
This might sound silly, but you don't really have to pay for your PCBs. I take advantage of the PCBWay project sharing program - you can share your designs on their website, and each time somebody orders a board you designed from them, you get a small commission to be used for future purchases. Sure, it's not a lot of money, but it does add up, and this way you learn to make better designs and start thinking about what other people can use and benefit from. It's pretty much a free proving ground for product design.
There are, however, some factors you need to take into account:
- Using very small packages can discourage people from building your design, so balance your options. Sure, your board would get cheaper if you used only TSSOP packages, but if you intend to share - reconsider this choice, or find another way to make the board beginner-friendly. I use only one chip in this package (FT230XS), but I always provide an alternative solution - direct UART breakout pins that allow you to skip this particular IC altogether,
- Using parts from your drawer might not be the best option. Sure, you have them for free, but are you sure other people will be able to obtain them? Make sure the components you intend to use are available from at least one online retailer and marked as suitable for new projects. And make sure the parts you want to use actually exist - in one of my first designs I used a capacitor variant that nobody ever produced :) It was, obviously, a typo in my bill of materials, but it sent people on a wild goose chase...
- Be careful with the footprints. One would think these are standard and should not vary a lot, but in fact I had to throw away the first batch of VGA breakout boards, because I ignored a seemingly insignificant discrepancy between the connector datasheet and the KiCAD footprint. As it turns out, there is a bug in KiCAD and all the VGA port footprints are based on connectors that are no longer available anywhere. The funny thing is that I do have (and follow) a rigorous footprint matching process - I print the PCB layout on paper and put my parts (purchased prior to the PCB order) on the printout to confirm the match - but with the VGA connector the printout didn't seem to match the part very well. I was sure the process was at fault (not making the holes in the paper precisely enough), but when the PCBs arrived it turned out I had to bend the VGA connector pins to make them fit. Ouch. Obviously, I never shared this design,
- When shopping for parts, keep track of all orders and links to the parts you bought - it will make your BOM easier to assemble and will ensure that you correctly match links with the parts you actually used. Better yet, when designing a PCB for sharing, buy a separate batch of parts just for this specific purpose, to ensure there are no discrepancies,
- No design is too simple or too small for sharing. Worst case scenario - nobody will use it, and you will learn something,
- Document your design - the more you write about your design, explain the application and the problem you are solving, the more likely people will be to give it a try and share their comments with you. Community feedback is invaluable for beginners!
Summary
Maybe it's just me being a beginner, or maybe it's the conditions I live in, but seriously - designing your own PCBs makes the whole electronics project a lot more exciting. It makes you feel instantly more "pro", and even that feeling alone is worth the effort. At the same time you can contribute to the larger community and learn from it. It doesn't have to be expensive, and it will teach you a lot about keeping track of your project, managing changes and resolving issues should they arise. What are your tips and tricks for cheaper PCB design? Please let me know in the comments below!
-
Breaking Ohm's Law... or so it seems
04/09/2021 at 17:01 • 0 comments
Breaking the Law
Last time I wrote about my experiments with a common operational amplifier, but obviously there was a certain context to that, and I found the topic worthy of another post. Again, the inspiration came from this amazing video by George Foot, and this time I would like to tell you the story of building a simple adjustable active load circuit, and how it allowed me to break Ohm's Law. Twice!
Let's start at the beginning: an adjustable active load is a circuit/device that lets you simulate a specific (and configurable) load on your system. It is critical for testing any kind of power circuit, but its greatest value is in the learning opportunity. Also, please note: if you are going to test any kind of commercial product design, you should probably buy a professional-grade device for several hundred dollars.
However, should you decide to build your own, you get to work with different components, and while troubleshooting the issues you come across, you get to understand them much better. It was fun, and it was full of surprises and discoveries. Strongly recommended!
At the basic level, such a load circuit draws a specified amount of current and dissipates the energy, mostly as heat. If the idea were to use a constant, defined voltage and a constant, defined current, you could just use a power resistor - and that would be it. Want 500mA at 5V? Using Ohm's law you calculate the resistance as 10 Ohm. Power dissipation will be 2,5W, so make sure your resistor can handle that.
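The fixed-resistor sizing above boils down to two lines of Ohm's-law arithmetic - here is a quick Python sketch of it:

```python
# Sizing a fixed power resistor as a dumb load (values from the text).
def resistor_load(voltage_v, current_a):
    resistance = voltage_v / current_a   # Ohm's law: R = V / I
    power = voltage_v * current_a        # dissipated power: P = V * I
    return resistance, power

r, p = resistor_load(5.0, 0.5)
print(f"R = {r:.0f} Ohm, P = {p:.1f} W")  # R = 10 Ohm, P = 2.5 W
```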
Things get more complicated when you want your load adjustable (in terms of the current passing through) and working with different voltages. This is where a single resistor will not suffice. I used this blog entry as inspiration, and the schematic was as follows:
Let's explain how this circuit is supposed to work:
- The load input is the VCC/GND pair, and most of the current will flow through the Q1 MOSFET and the R4 shunt resistor,
- The R1 resistor limits the current passing through the U1 reference voltage chip, and C1 is a standard decoupling capacitor for U1,
- U1 is very important for the operation of the circuit - it provides a constant voltage of 2,5V at the connection with the R2 resistor,
- RV1 and R2 form an adjustable voltage divider, and the resulting voltage will be in the range 0 mV to 417 mV. This voltage is fed into the non-inverting input of the OpAmp,
- The magic happens at the inverting input of the OpAmp - by Ohm's Law, the voltage delivered there depends only on the current passing through Q1 and R4, regardless of the load voltage. If the current is 1A, R4 will drop exactly 150mV, and this value is fed into the inverting input,
- The remaining voltage (Vsupply minus the R4 drop) will be dropped by the MOSFET working as a variable resistor (bear with me, please) and dissipated as heat.
Now, what happens if the non-inverting input is higher than the 150mV measured at the OpAmp's inverting input? The OpAmp will increase the output voltage delivered to the Q1 MOSFET gate and, as a result, the MOSFET will pass more current. This keeps happening until the voltage drop on R4 equals the voltage at the non-inverting input. Beautiful usage of the feedback loop!
What is also great - neither of the OpAmp's inputs depends on the load voltage. The R4 voltage drop is measured against ground, and the RV1 output is always derived from the 2,5V reference voltage provided by the U1 chip. Lovely, isn't it?
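The equilibrium the feedback loop settles into can be captured in one line of arithmetic. A small sketch using the R4 value from the description above (0,15 Ohm):

```python
# How the feedback loop settles, using the values from the description above:
# the OpAmp drives the gate until the shunt drop equals the divider setpoint.
R_SHUNT = 0.15  # R4, in Ohm

def settled_current(v_set):
    # At equilibrium the two OpAmp inputs are equal: I * R_SHUNT = v_set
    return v_set / R_SHUNT

print(f"{settled_current(0.150):.2f} A")  # 150 mV setpoint -> 1.00 A
print(f"{settled_current(0.417):.2f} A")  # full pot range -> ~2.78 A in theory
```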
Theory and practice - in practice
The beautiful simplicity of this circuit could be matched only by its utter and complete failure to work. I built it on a breadboard, provided 5,35V power from a standard 2A charger and started testing. It worked pretty well almost halfway through the range, but at around 600mA it wouldn't go any higher. I replaced the MOSFET and tried different values of the R2 biasing resistor. 660mA was the limit and that was that. Sure, I could live with the 600mA limitation, but I wanted to understand where it came from, especially since on paper it looked as if it should be able to pull a full 2A of current.
I checked the power rails of the circuit (right where they enter the drain of the Q1 MOSFET) and the measurement was disappointing: 3,45V. Clearly my power source couldn't keep up with the load. I had wanted to buy a proper lab supply for a long time, and this was an excellent excuse to finally get one.
A few days later the thing arrived (I got a Uni-T UTP1306, because it's very small and almost always completely silent) and I started testing again. A nice, even 5V, current limit set to 2A, let's go.
670mA maximum. Not a single milliamp more! What is going on?!
You wouldn't believe the crazy theories that went through my head. The best one blamed the MOSFET: it would keep "pulsing" (opening and closing), and while open it would let only 2A pass through (the power supply limit), but since it was "oscillating", it would "average out" to the observed 670mA. While the theory was really tempting, there was one observational problem with it - when I raised the current limit on the power supply to 6A, the "averaged" current wouldn't increase, not by a milliamp. That sort of contradicted the idea.
Proper measurements
When everything else fails, it's time to properly troubleshoot the circuit - as in, actually measure it instead of guessing the underlying reason. This is what I measured:
| Id | 50 mA | 100 mA | 250 mA | 500 mA | 613 mA | 630 mA |
|---|---|---|---|---|---|---|
| Vinv | 12 mV | 25 mV | 62 mV | 124 mV | 154 mV | 157 mV |
| Vnon-inv | 13 mV | 26 mV | 63 mV | 125 mV | 155 mV | 234 mV |
| Vgs | 1,897 V | 1,983 V | 2,13 V | 2,32 V | 2,39 V | 2,40 V |

- Id - current passing through Q1 drain-source,
- Vinv - voltage at the inverting input of OpAmp,
- Vnon-inv - voltage at the non-inverting input of OpAmp,
- Vgs - voltage at the Q1 gate (OpAmp output).
As you can see, there is something weird going on at around 2/3 of the pot range (I was using a 10K R2 biasing resistor at the time I took these measurements) - up until then the inverting input of the OpAmp follows the non-inverting one, as expected, but at around 155mV it just stops, and Vgs doesn't rise anymore.
I was really, really confused. Sure, I remembered that operational amplifiers have limits on the minimum and maximum voltage they can output relative to their supply rails. The LM358 I used can go almost all the way down to its V- input (0V in my case), but only up to V+ minus 1,5V:
With a 5V supply, I would expect the output to be capable of going up to 3,5V at least, so what's the problem here?
Actually, there's more than one.
Revise your assumptions!
Now, I had assumed that there was indeed a 5V supply on my breadboard rails. Imagine my surprise when I finally measured it:
| Id | 50 mA | 100 mA | 250 mA | 500 mA | 613 mA | 630 mA |
|---|---|---|---|---|---|---|
| Vsupply | 4,89 V | 4,78 V | 4,46 V | 3,92 V | 3,67 V | 3,64 V |
| Vinv | 12 mV | 25 mV | 62 mV | 124 mV | 154 mV | 157 mV |
| Vnon-inv | 13 mV | 26 mV | 63 mV | 125 mV | 155 mV | 234 mV |
| Vgs | 1,897 V | 1,983 V | 2,13 V | 2,32 V | 2,39 V | 2,40 V |

So, this explains it - the circuit can't pass more current because I have reached the OpAmp output limit, and the MOSFET can't open any further. Still, how come the drop is so large, even with a proper power supply?
I was sitting there, staring at my setup and wondering what the hell was wrong... and why the bloody thing claimed to output 5V when it clearly delivered only slightly above 3,64V.
It was one of those moments when your brain has already noticed something, but can't put a name on it. You can see the picture is wrong, you just don't know why... until you do. I realized that the cables hooked up to the power supply looked a bit different. The ones I had taken from the box had these little holes in them, but these were just plain banana-to-alligator-clip cables.
No. This can't be it. No. NO!
Yeah, this is exactly what happened - instead of using the original power supply cables, I accidentally used some cheap crap that I pulled out of the "box with the cables". Does it matter? Well, yes, apparently it does. At 630mA it takes only about 2,15 Ohms of resistance in the cables to get a 1,36V drop between the power supply and the breadboard... I replaced the cables with the original ones, and everything started working just fine - the full 0..1A range with the 10K R2 biasing resistor, just like that:
I have also measured resistance of the two cables with my meter:
As you can see, the crap cable measures 1,9 Ohm, while the good one 0,2 Ohm - almost ten times less:
You might be wondering how come this also happened with the old power supply - as it turns out, the previous setup used some cheap breadboard jumper wires with an ARK connector (visible on the right side of the "it works" picture above), and that connection was equally bad, resulting in a similar voltage drop. When I replaced it with a proper wire soldered to pin headers connected via the same ARK connector, it turned out that my original power supply was just as good - it was capable of driving up to 2A through the load circuit.
There you go, I got myself a proper power supply only because I had crappy wires connected to the old one. Still, I'm happy I got it - I can use it for other experiments as well!
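The cable story is, fittingly, pure Ohm's law. A quick sketch with the figures mentioned above:

```python
# Back-of-the-envelope check of the supply-lead drop: plain Ohm's law.
def lead_drop(current_a, lead_resistance_ohm):
    return current_a * lead_resistance_ohm  # V = I * R

print(f"{lead_drop(0.630, 2.15):.2f} V")  # ~2.15 Ohm of cheap cable at 630 mA
```

The result (about 1,35V) matches the observed drop between the supply and the breadboard to within rounding.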
Breaking Ohm's Law
I promised you breaking Ohm's Law, so where is it? Well, let's look at the revised measurements, with proper power supply cables this time:
| Id | 50 mA | 100 mA | 250 mA | 500 mA | 750 mA | 998 mA |
|---|---|---|---|---|---|---|
| Vsupply | 4,99 V | 4,98 V | 4,96 V | 4,91 V | 4,87 V | 4,83 V |
| Vinv | 12 mV | 25 mV | 62 mV | 125 mV | 187 mV | 249 mV |
| Vnon-inv | 13 mV | 26 mV | 62 mV | 126 mV | 188 mV | 249 mV |
| Vgs | 1,882 V | 1,976 V | 2,13 V | 2,34 V | 2,52 V | 2,59 V |

Well, the thing that bothered me when I looked at these figures was the inverting OpAmp input voltage. At 1A it measures 249mV, but my resistor is only 0,15 Ohm. Based on Ohm's Law you would expect only 150mV, so what is going on here?
Again, I had some strange ideas about MOSFETs and fairy dust, unicorns and rainbows. The reality, though, was much simpler. It took me quite a while to figure out, even though I had plenty of clues all along. See, as it turns out, I was measuring the voltage drop wrong. I mean, sure, the meter was connected to the right junctions in the circuit, but not at the right spots on the breadboard.
I should have measured the drop close to (or preferably directly at) the R4 leads. What I did wrong was to connect the "GND" probe of my multimeter near the beginning of the power rail, where the power cables were plugged in. As it turns out, breadboard rails have their own resistance, and while it's very low (I calculated it at about 0,02 Ohm), it is sufficient to drop around 40mV along the length of the breadboard when passing 2A of current.
Again, an important lesson learned here: distributing power on a breadboard is not easy, and making any kind of assumptions about the voltages at different spots can lead to serious trouble. Also: it does take a while to develop proper intuition about voltage at slightly higher currents - everything that is negligible in the 10mA range can become an important factor just two orders of magnitude higher.
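The measurement offset is the same V = I * R story, just with a much smaller resistance. Sketching the rail figures from above:

```python
# The breadboard-rail offset from the text: even 0.02 Ohm matters at 2 A.
RAIL_RESISTANCE = 0.02  # Ohm, estimated along the rail length

def rail_offset(current_a):
    return current_a * RAIL_RESISTANCE      # V = I * R again

print(f"{rail_offset(2.0) * 1000:.0f} mV")  # 40 mV measurement error at 2 A
```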
That being said, I'm sad to report that, at least this time, the good old Ohm's Law wasn't broken. Maybe next time!
Bonus discovery - MOSFET characteristics
There is one more thing worth noting here that I haven't mentioned before: the R3 resistor of 220 Ohm. I added it because I read somewhere that it could reduce ringing between the OpAmp and the MOSFET. To be fair, when investigated on the scope it doesn't seem to do anything, but it doesn't harm the circuit either, and who knows? Maybe I'm measuring it wrong? The good thing is that along the way I found this amazing article about common MOSFET misconceptions.
Remember when I said at the beginning that in this circuit the MOSFET is being used as a variable resistor? This is how I imagined the circuit: the more voltage you provide at the gate (measured against the source, so Vgs), the lower the resistance of the MOSFET. This means that if I increase the power rail voltage, the current passing through Q1 and R4 should increase, raising the voltage at the inverting input (dropped on R4), which will lead to "closing" the MOSFET (increasing its resistance) to maintain a constant current flow, right?
Let's measure this then.
| Itotal | 502 mA | 503 mA | 504 mA | 506 mA | 335 mA |
|---|---|---|---|---|---|
| Vsupply | 5,00 V | 5,50 V | 6,00 V | 8,00 V | 5,00 V |
| Vds | 4,80 V | 5,30 V | 5,80 V | 7,80 V | 4,86 V |
| Vgs | 2,083 V | 2,077 V | 2,073 V | 2,056 V | 2,052 V |
| Rds | 9,56 Ohm | 10,53 Ohm | 11,50 Ohm | 15,41 Ohm | 13,69 Ohm |

First of all, the great news is that the circuit is almost perfectly stable - when I set the current to 500mA at 5,00V, it kept the value all the way up to 8,00V with only a 4mA difference, and that difference was probably caused by the increased current passing through R1 (3V/1K = 3mA). Please note: I measured the current drawn by the whole circuit, not just Id (the current passing through Q1 and R4).
However, when you look at the calculated Q1 resistance (Rds = Vds/Itotal), you can see something weird. The resistance changes quite a lot despite rather small changes in Vgs. What is also weird is that in the last measurement (where I adjusted the current to match the previous Vgs value), the resistance differs quite a lot even though the Vgs voltage is almost identical.
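For the rising-supply columns, the Rds figures fall straight out of the tabulated numbers - here is the calculation sketched in Python (small last-digit differences are down to rounding):

```python
# Recomputing the Rds = Vds / Itotal column for the rising-supply rows
# of the measurement table above.
measurements = [  # (Itotal in A, Vds in V)
    (0.502, 4.80),
    (0.503, 5.30),
    (0.504, 5.80),
    (0.506, 7.80),
]
for i_total, vds in measurements:
    print(f"Vds = {vds:.2f} V -> Rds = {vds / i_total:.2f} Ohm")
```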
Is the old German playing tricks on us again? Nope, it's pretty well explained in the article I just linked. See, when you look at the datasheet, you will notice that MOSFET resistance isn't just a function of Vgs - it depends on Vds as well. For certain Vgs values, Rds will remain constant over a small Vds range, but above it the MOSFET will no longer maintain constant resistance. Instead, for a given Vgs, it will act as a constant current source:
As you can see, for a Vgs of 2,5V the MOSFET resistance stays constant for Vds up to about 0,3V, but beyond roughly 0,6V it rises proportionally to Vds, providing an almost constant 1A current. Increasing Vgs increases the current passing through, but for a given Vgs the resistance rises proportionally to Vds. Sure, these lines are not perfectly flat, and they also change with temperature - the whole device was heating up quite a lot during my experiment (to around 66 degrees Celsius), which caused the observed changes in Vgs.
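The two regions can be illustrated with the textbook square-law MOSFET model. This is a deliberately idealized sketch - the VTH and K values below are made up for illustration, not taken from the datasheet of the actual part:

```python
# Idealized square-law MOSFET model of the two operating regions described
# above. VTH and K are assumed illustrative values, not datasheet figures.
VTH = 1.8   # threshold voltage, V (assumed)
K = 2.0     # transconductance parameter, A/V^2 (assumed)

def drain_current(vgs, vds):
    vov = vgs - VTH                          # overdrive voltage
    if vov <= 0:
        return 0.0                           # cut-off: channel not formed
    if vds < vov:                            # triode region: resistor-like
        return K * (vov * vds - vds ** 2 / 2)
    return K / 2 * vov ** 2                  # saturation: ~constant current

# Small Vds: doubling Vds roughly doubles the current (acts like a resistor)
print(f"{drain_current(2.5, 0.1):.3f} A, {drain_current(2.5, 0.2):.3f} A")
# Large Vds: the current stays pinned no matter how much Vds rises
print(f"{drain_current(2.5, 2.0):.3f} A, {drain_current(2.5, 5.0):.3f} A")
```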
The bottom line is: you have to remember that MOSFETs operate in two distinct regions, each with its own important characteristics. Take them into account unless you want to think you just broke Ohm's Law.
And most of all: if you haven't yet, play with simple circuits like these. Digital electronics, Arduino and 6502-based computers are fun, but there is a lot to be learned from simple analog devices. Remember that at the end of the day your super-modern CPU is actually an analog device that has more to it than meets the eye.
-
Fun with OpAmps
04/03/2021 at 20:52 • 0 comments
Every day is a school day
I believe I have said this before, but here goes again: one of the things I don't like about all the EE tutorials out there is that most of them are written by people who are already pretty experienced. They don't remember what was hard to grasp at the beginning and keep using terms that are not at all clear to people like myself who are new to the field. The same goes for most books, every single datasheet (for a good reason, really), and the majority of videos.
Then there is the split between analog and digital electronics. I've been working way more with the latter, and it all seemed so easy. Sure, there were terms like "input capacitance" or "output impedance" that didn't mean anything to me, but hey, as long as you connect the chips like LEGO pieces, it doesn't seem to matter.
Time went by, and this ignorance was like an itch - something you can forget if you try hard, or have too much of a good time, but it comes back whenever things get rough. As it turns out, there are other people out there having the same problem (pretty decent understanding of digital, but much less of analog electronics), and sometimes they provide excellent inspiration. This great video by George Foot reminded me how badly I need to work on my understanding of the simplest circuits. If you haven't seen it yet, please do, it is really amazing: simple, clear explanation of complex concepts, made by someone who still remembers the difficult beginnings.
I decided to build the circuit myself, trying to understand each part of it as best I can. Since the OpAmp is a critical part of the circuit, I started there, and it has been an amazing journey so far. So, even though it's not related to my DB6502 project, I decided to write about it, because it's definitely something interesting to share.
Fun with OpAmps
OpAmps are virtually everywhere. People who can recall diode polarity without checking the datasheet probably know everything about them already, and think that two simple rules explain it all - for everyone.
I tried watching several videos and reading multiple articles, but they all seemed rather convoluted. One way or another, I decided to give it a try and build some basic circuits myself. I will document all the mistakes I made here, because I want to illustrate the learning process and show how an exercise as insignificant as this one can help you build a very strong understanding of, and intuition about, the basic rules of electric circuits. Let's get making then!
Simple comparator circuit
To follow along with my exercises, you will need the following:
- Two 47K resistors,
- Three 10K resistors,
- One 1K resistor,
- One 10K potentiometer,
- LM358P OpAmp chip,
- 100nF decoupling capacitor,
- 1uF decoupling capacitor,
- Breadboard + jumper wires,
- Multimeter (or better yet, oscilloscope).
It all starts simple - two voltage dividers, one of them being adjustable with the potentiometer:
With the 5,35V power supply I'm using, V2 is around 2,65V and V1 can swing between 2,40V and 2,92V. I chose the values of the R1, R3 and RV1 resistors to make sure that the V1 range is pretty small, around 500mV. After all, we are going to amplify that signal, right?
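These figures can be sanity-checked against the parts list with the standard divider formula. A small sketch, assuming R1 = R3 = 47K around the 10K pot and R2 = R4 = 10K for the fixed divider (consistent with the component list above):

```python
# Checking the two divider voltages against the resistor values.
VCC = 5.35  # measured supply voltage

def divider(v_in, r_top, r_bottom):
    return v_in * r_bottom / (r_top + r_bottom)

v2 = divider(VCC, 10e3, 10e3)              # fixed divider -> supply midpoint
v1_lo = divider(VCC, 47e3 + 10e3, 47e3)    # pot turned fully one way
v1_hi = divider(VCC, 47e3, 47e3 + 10e3)    # pot turned fully the other way
print(f"V2 = {v2:.2f} V, V1 = {v1_lo:.2f} V .. {v1_hi:.2f} V")
```

The computed range lands within a few tens of millivolts of the measured values - resistor and pot tolerances easily account for the difference.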
So, let's go ahead and test the output when turning the potentiometer. I use scope in slow "roll" mode to ensure that slow changes introduced that way are clearly visible on screen. Channel one (yellow) is connected to TP1 above and channel two (pink) to TP2.
As you can see, channel 1 oscillates just a little bit below and above the channel 2 - just as I wanted it to.
To use an OpAmp as a comparator, we need something called an open loop configuration. In general, the way to use an OpAmp is to feed some of the output signal back into one of its inputs; this is called a closed loop configuration, and it allows us to control the gain, or signal amplification factor. Sounds complicated? It did to me, so let's start without the feedback loop, with something much simpler.
In the open loop configuration the gain is virtually infinite, causing the OpAmp to behave as a simple voltage comparator. To build this circuit we just need to feed both of our voltages into the LM358 inputs and see what comes out on the output:
Channel 1 (yellow) is connected to "inverting" OpAmp input (the one with minus sign), and channel 2 (pink) to output of the OpAmp. This is what scope shows in this configuration as I change the pot setting manually:
As you can see here, when V1 (yellow) is below V2, the output is "high", as if the OpAmp were trying to tell us that the input condition (voltage at the "+" input is higher than the voltage at the "-" input) is true. As soon as V1 rises above V2, the output swings to the other extreme (GND in this case). This was what I expected, and I was happy to see the result.
Voltage follower circuit
Most OpAmp tutorials out there start with inverting or non-inverting circuits with the feedback added in a complicated way. Again, I don't want to go there just yet, but I did find something simpler as the next step: the voltage follower circuit. It's also pretty simple:
This circuit illustrates the most useful property of operational amplifiers - their output is always a function of the difference between the input voltages. In the comparator (open loop) configuration, the output was always "high" or "low", because even a small change in input voltages translated to a huge difference at the output.
When you close the loop, however, this feature becomes very useful. Look at the above schematic and imagine what happens if at a certain point in time the non-inverting ("+") input is higher than the inverting one ("-"). The output will start rising towards the high rail (like in the comparator example). At the same time, whatever comes out of the output is fed back into the inverting input, so this one will start rising too. This keeps happening until the output (and, with it, the inverting input) catches up with the voltage on the non-inverting input. Obviously, all this "catching up" happens very, very fast, so from our perspective it seems like the output simply matches the non-inverting input all the time - hence the name: voltage follower. This is how it looks on the scope:
You can't see the yellow channel here, because it's equal to the pink one, obviously.
Simple, isn't it? Well, yes, but to me it seemed absolutely useless - I mean, why waste two components (an OpAmp and the accompanying decoupling capacitor) just to get what you already have? No worries, we'll get to that!
Non-inverting amplifier circuit
Actually, the voltage follower is a special (and seemingly useless) variant of the non-inverting amplifier circuit, with the two resistors (described below) equal to 0 and infinity respectively. Compare this schematic with the one above:
There are two differences here:
- Previously, R5 was equal to 0 (no resistor at all), now it's equal to 10K,
- Previously, R6 was equal to infinity (no connection at all), now it's equal to 1K.
So, what does it do? For the non-inverting amplifier, there is an equation stating that the amplification factor equals 1 + R5/R6. You can look it up; there is some simple math behind it, and it's pretty easy to remember.
Please note: sometimes R6 is connected to GND, and the circuit behaves in a similar manner, but the "reference voltage" is then 0V instead of the 2,5V used here (obtained from the R2/R4 voltage divider between +5V and GND). In my case I wanted to amplify the difference between the midpoint (2,5V) and the signal coming out of RV1 - hence the setup.
Going back to the voltage follower case: the amplification factor is 1, because 1 + 0/infinity is just 1. An amplification factor of 1 means the input is simply copied to the output. Note also that in that case the "reference voltage" (the point against which the amplified output is calculated) is irrelevant - you can calculate the output voltage as the difference from 5V or from 0V and you end up with the same result.
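The expected behaviour of the full non-inverting stage can be sketched in a few lines, using the resistor values from the schematic:

```python
# Expected behaviour of the non-inverting stage, referenced to the 2.5 V
# midpoint: output deviation = gain * input deviation, gain = 1 + R5/R6.
R5, R6 = 10e3, 1e3
GAIN = 1 + R5 / R6          # = 11

def vout(v_in, v_ref=2.5):
    return v_ref + GAIN * (v_in - v_ref)

print(f"{vout(2.6):.2f} V")  # +0.1 V at the input -> 3.60 V at the output
print(f"{vout(2.4):.2f} V")  # -0.1 V at the input -> 1.40 V at the output
```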
Look at the equation again. I expected an amplification factor of 11, so any 0,1V change at the RV1 output measured at TP3 (compared against the 2,5V reference voltage measured at TP1) should result in about a 1,1V change at the output measured at TP2. Let's see if it worked:
Channel 1 (yellow) is connected to TP3 and represents RV1 output. Channel 2 (pink) is connected to TP2 and represents OpAmp output.
Hmmm, this doesn't seem to work. Yeah, there is some amplification here, but it doesn't seem right - I can't see the signal amplified 11 times. Two, maybe three times at best, but not more. I was sure there was something wrong with my schematic, but couldn't put my finger on it. Ah, the OpAmp chip is probably broken, that must be it!
Then I came up with the idea of checking whether my reference voltage was correct - and this was interesting:
Channel 2 (pink) is still connected to TP2 (OpAmp output), but channel 1 (yellow) is now connected to TP1 (R2/R4 voltage divider output). This is weird - why is my reference voltage changing? Definitely a broken OpAmp!
Inverting amplifier circuit
Ok, so now that I "know" my OpAmp is broken, maybe I can at least get the inverting amplifier circuit to work? Let me try that:
When you look up this schematic online, you will find that the amplification factor should be -1 * R5/R6. In this specific case, the output signal should be inverted around the 2,5V axis and amplified 10 times. So, does it work?
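Before looking at the scope, the expected behaviour can be sketched the same way as before, with the gain formula for the inverting stage:

```python
# Expected behaviour of the inverting stage around the 2.5 V midpoint:
# the output swings -R5/R6 times the input deviation (inverted, x10).
R5, R6 = 10e3, 1e3
GAIN = -R5 / R6             # = -10

def vout(v_in, v_ref=2.5):
    return v_ref + GAIN * (v_in - v_ref)

print(f"{vout(2.6):.2f} V")  # +0.1 V in -> 1.50 V out (inverted swing)
print(f"{vout(2.4):.2f} V")  # -0.1 V in -> 3.50 V out
```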
Channel 1 (yellow) is connected to TP3 and channel 2 (pink) is connected to TP2:
What is wrong again? Yeah, there is some amplification going on, but again, something is wrong with the input signal. When I zoomed in (200mV/div instead of 1V/div) you can see the amplification, but why is the input signal so "weak"? It used to swing between 2,40V and 2,92V, but now it looks more like 2,49-2,51V or so:
The good news is that the amplification factor seems to be around the expected value, but it's all weird.
Input/output impedance
When I used to read terms like "low output impedance" or "high input impedance" I was like "whatever". It sounded like something I would have to understand one day, but it never made much sense to me.
It does now, because thanks to this amazing experiment I have internalized the meaning of both terms and their importance. It's not only that I understand them - I now feel how important they are.
Another thing I understood was why people always seemed to toss these terms around as if they were obvious, trivial, not worthy of detailed explanation - because they are, but only after you understand them.
So, what does it have to do with my broken OpAmp chip?
Well, for one, it's not really broken. Obviously. The problem is that in both cases (inverting and non-inverting amplifier) one of the input signals got distorted due to input/output impedance. Let's illustrate that using the non-inverting case:
For this circuit to work as expected, we would have to ensure that the voltage measured at TP1 is a constant 2,5V, and that whatever happens at the OpAmp output can't change it. In this case, however, that is not true. Let's imagine (for simplicity's sake) that the OpAmp output is at 5V. We end up with this simplified circuit:
All the uninvolved components have been removed - the OpAmp output (5V) is fed into one end of R5. What we get at TP1 is a sort of voltage divider: a 10K resistor to GND, and a circuit of two parallel resistors (10K from R2 and 11K from R5 and R6 in series) with a total resistance of 5K2 to 5V. The resulting voltage is about 3,28V - a huge difference from the assumed initial voltage of 2,5V.
Now, let's do the math again, but with R2 and R4 replaced by 10 Ohm resistors. It's a completely different story - R2 in parallel with R5 and R6 in series gives a total resistance of 9,99 Ohm. The TP1 voltage will be 2,501V - hardly changed.
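Both cases fall out of the same parallel-resistance calculation - here is the math above sketched in Python:

```python
# Loading of the 2.5 V reference divider when the OpAmp output sits at 5 V:
# the R5 + R6 chain (11K) appears in parallel with the top divider resistor.
def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)

def tp1_voltage(r):                  # r = value of both divider resistors
    r_top = parallel(r, 11e3)        # R2 in parallel with R5 + R6 (to 5 V)
    return 5.0 * r / (r + r_top)     # bottom resistor to GND

print(f"{tp1_voltage(10e3):.2f} V")  # 10K divider: ~3.28 V - badly skewed
print(f"{tp1_voltage(10):.3f} V")    # 10 Ohm divider: ~2.501 V - barely moves
```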
What does it have to do with input or output impedance? That's just it.
You can treat the R2/R4 voltage divider as an output driver, but one with high output impedance. This means it is vulnerable to being "overruled" by any connected circuitry. This is why you want your outputs to have low impedance - so they are not affected by whatever circuitry is connected to them.
The same goes for inputs - you want these to be high impedance, to ensure that reading the input doesn't change the actual voltage level, as it would have if R5+R6 were a significantly smaller resistance.
This was my "eureka" moment, when I understood what the whole story about input/output impedance is, and when I realized what I needed to do to fix my "broken" OpAmp. Sure, I could use smaller resistors (like the aforementioned 10 Ohm ones), but this would only result in excessive current flowing through the circuit: 5V over 20 Ohm gives 0.25A. This is way too much for a voltage divider, and would probably burn most typical resistors (0.25A * 2.5V = 625mW, which might be too much even for 1W resistors). There is, however, a good solution to the problem: using the most "useless" circuit described above.
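The power figures above are easy to verify (again, a quick side calculation, not from the original post):

```python
# Brute-force fix: a 10 + 10 Ohm divider straight across the 5 V supply
v_supply = 5.0
r_total = 10 + 10
current = v_supply / r_total        # current flowing all the time
p_each = current ** 2 * 10          # power burned in each 10 Ohm resistor

print(current)   # 0.25 (A)
print(p_each)    # 0.625 (W) - way too hot for common 1/4 W parts
```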
See, the main point of the voltage follower circuit is that it acts like a buffer with very high input impedance and very low output impedance. It will convert high impedance output of R2/R4 voltage divider into low impedance one, preventing reference voltage changes.
Non-inverting amplifier circuit - fixed
This is how the schematic looks after I fixed it:
First, let's see how stable V2 is when changing the value of RV1. Channel 1 (yellow) is connected to TP3 and channel 2 (pink) to TP1:
That looks much better, V2 voltage is now stable when V1 changes. Output also looks much better - channel 1 is TP3 and channel 2 is TP2 (output):
Now, this is much better, isn't it?
Inverting amplifier circuit - fixed
Just to make sure I don't leave any stones unturned - this is how the corrected version of the inverting amplifier circuit looks:
And the output also looks much better:
Signal is nicely inverted and amplified - exactly as expected.
Conclusion
Sometimes going back to the basics, even if it feels strange, can help you identify and bridge gaps in your understanding. Looking at things from a different perspective can have a profound impact on your thought process, so even if it feels like a detour, always remember to enjoy the road. You might end up learning something new!
And who knows, maybe one day I will understand what input capacitance is? Looking forward to that day!
As for the DB6502 project - I'm working on new PCB layout, trying to reduce fabrication cost while still keeping it "beginner friendly". I have some nice things to share on the subject, so stay tuned!
-
RDY signal experiments
02/28/2021 at 16:56 • 0 comments
Another long overdue update
Unfortunately, recently I haven't been able to work on my project as much as I would like to, and the progress is much slower than I was used to. That being said, taking some time off can give you new perspective and lets you reconsider your assumptions, goals and plans. So, not all is lost...
I decided it's time for another PCB exercise - struggling with the 14MHz experiments, I kept asking myself whether the problems might be caused by poor connections on the breadboard. I know, it seems far-fetched and probably isn't true, but still - the PCB version I'm using right now was supposed to be temporary, to be replaced down the line by the next iteration once I sorted out some of the design questions. I have sorted them out, actually, so I should probably stick to the original plan.
Sure, making PCBs is not cheap and there is certain delay between order being placed and the board arriving, but given how slow my progress has been recently this is something I can live with. On the upside, I want to use this opportunity to test some new ideas, including some fixes to original design. Stay tuned, I should write about it soon.
For now - there was one issue I didn't want to keep open, and since I was about to make a PCB I needed to decide how to solve it. The issue was nothing new, it's something I mentioned previously: RDY pin on WDC65C02 is a bidirectional pin, so it requires careful handling to avoid damage to CPU.
Problem statement
As I wrote in the "Wait states explained" blog entry, the main issue with the RDY pin on the 65C02 is that it can work in both input and output modes. Most of the time you will be using only the input mode, supplying information to the CPU about wait cycles (if that's not clear, please read the previous entry on the subject), and it's tempting to connect the output of your wait state computation logic directly to the RDY pin. There is a serious risk associated with this approach: if, for one reason or another, the CPU executes the WAI instruction, the RDY pin will change mode to output and the line will be pulled low (shorted to GND). At the same time your wait state circuit might be driving the same line high (shorting it to VCC), causing a short between VCC and GND and passing high current through the CPU. If you're lucky it will cause only excessive current draw, but if not, you might burn your CPU.
Sure, there are some standard approaches to the problem, and I will investigate them below. The thing is, the above is not the whole story. You also need to remember another thing: if you intend to use wait cycles, it probably means you are planning to run your CPU at a higher frequency, giving you less time to spare for any of the solutions to work.
This is why I wanted to compare each of the approaches and discuss pros and cons of each. I hope it will help you choose the approach that is suitable to your build.
Experiment description
So, based on the problem statement above, the question I want to answer is: how does each of these approaches perform in a real scenario, given the following constraints:
- Does it protect the build from the WAI instruction issue? Does it limit the current passed through the CPU in such a case?
- What is the impact of the particular solution on system timing? How long does it take to toggle from "not ready" to "ready" mode and vice versa?
- Does it present any other issues?
Now, the proper way to do this would be to test against an actual 65C02 CPU, and I might actually do that in the future, but for the moment I needed a much simpler setup. I just wanted to find the fastest, most energy-efficient way of delivering the RDY signal to the receiver and compare some of the ideas I saw on the 6502.org forums.
Test setup
As described in the paragraph above, this is what I needed: an oscillating high/low CMOS signal exiting the output of one gate and being fed into the input of another gate. This closely resembles the target situation, where the producer of the signal is your wait state circuitry and the consumer is the CPU. The goal: measure the time between the output on one end and the input on the other.
I built this circuit with a clock oscillator (4MHz) and a single 74AC04 hex inverter IC. The first gate (inverting the alternating clock signal) simulates the output of the wait state circuitry, while the last gate pretends to be the CPU reading the signal. This is how the schematic looks:
Now, each tested solution will be placed between test points TP1 and TP2 and measured. Let's start with simple measurements.
Clock output:
Clock output (yellow) against the inverted signal at TP1 (purple):
I have also used this opportunity to test the propagation delay of the 74AC04 gate (input low to high and high to low, respectively):
As you can see, in this (perfectly simple!) scenario tpd is below 3ns. Very nice!
Variant 1: the simplest possible
Well, doing nothing at all is also an option. It's not acceptable, since it provides no current limiting in the WAI scenario, but it's still something to measure.
This is how the setup looks on the breadboard:
Obviously, the delay between TP1 and TP2 is well below anything my scope can measure:
As you can see, TP1 and TP2 are almost perfectly lined up - there is no measurable delay in signal propagation. Still, the fact that this could burn our CPU doesn't make it a viable solution.
Variant 2: open-collector buffer and pull-up
This is the variant I have been using in my build recently - the RDY signal is fed into the CPU via an open-collector buffer (7407) followed by a pull-up resistor:
This means that the output of the wait state circuitry is never directly "high" - it's either "low" or "none" (high impedance). As long as the RDY line on the 65C02 CPU is operating in input mode, it will be driven either by the buffer output (in the "low" case) or by the pull-up resistor (in the "high" case). A high level always comes through the resistor, so when the RDY line switches to output mode and pulls the line low, the resistor will limit the current passing through.
This is how it looks on the breadboard:
Actually, the pull-up resistor is obscured by one of the probes, sorry...
This is the resulting signal:
As you can see, the signal falls very fast, but it takes time to start doing so:
As you can see here, it takes over 10ns to propagate a low signal through the buffer. It's even worse when you check the rise delay:
Here, it measures at almost 15ns, but in reality I should be measuring the time between leaving the low signal range and reaching the high one - and that takes much longer, almost 30ns. When you consider the 70ns clock cycle at 14MHz, you can already see that's a lot!
You can always try a smaller resistor, and this is how it looks with a 220 Ohm one:
As you can see, the rise time looks better, at just below 10ns.
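This behaviour matches a simple RC model: the pull-up has to charge the parasitic capacitance of the breadboard, probe and gate input through the resistor. The ~20pF load below is my rough assumption, not a measured value:

```python
def rise_time_10_90(r_ohm, c_farad):
    # the 10%-90% rise time of an RC charging curve is about 2.2 time constants
    return 2.2 * r_ohm * c_farad

C_LOAD = 20e-12  # assumed parasitic capacitance (breadboard + probe + input)

print(rise_time_10_90(470, C_LOAD) * 1e9)  # ~20.7 ns, close to the ~30 ns seen
print(rise_time_10_90(220, C_LOAD) * 1e9)  # ~9.7 ns, matches the 220 Ohm shot
```

The model is crude, but it explains the basic trade-off: halving the pull-up roughly halves the rise time, at the cost of doubling the WAI-case current.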
Variant 3: series resistor
This one seems too simple to work, but in fact it's pretty effective. When you consider the open-collector variant above from the perspective of the WAI instruction, it's important to realise that the worst case scenario is when the RDY pin is driven low by the CPU. You are then dropping 5V across a 470 Ohm resistor (assuming that value is used), producing a current of about 10mA. Sure, if you want to shorten the rise time, you can use a smaller resistor, at the expense of much higher current being delivered to the CPU. The point is: the amount of current is limited only by the resistor (and, of course, your power supply).
This variant is simpler: it uses series resistor instead of combination of open-collector buffer and pull-up:
This variant is simpler: it uses a series resistor instead of the combination of an open-collector buffer and a pull-up. Now, as simple as it looks, it's actually pretty effective. During normal operation, where the RDY line is in input mode, it will slightly delay signal propagation, and it will do so "symmetrically", so the impact on rise and fall times will be similar. When the RDY line is in output mode, the resistor will have to drop the voltage as in the previous variant, with one difference: the current is also limited by the output gate. Still, for some gates it might be more than 20mA, so you have to be careful. This is how the build looks:
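A quick calculation of the worst-case (WAI) current for a few resistor values makes the trade-off visible - values other than 470 Ohm are just illustrative examples:

```python
def wai_current_ma(r_ohm, v_supply=5.0):
    # current forced through the RDY pin when the driver outputs high
    # while the CPU holds the line low, limited only by the resistor
    return v_supply / r_ohm * 1000

for r in (220, 470, 1000):
    print(r, "Ohm ->", round(wai_current_ma(r), 1), "mA")
# 220 Ohm  -> ~22.7 mA (may exceed a typical 20 mA gate rating)
# 470 Ohm  -> ~10.6 mA
# 1000 Ohm -> ~5.0 mA
```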
So, with the simplicity being obvious upside, what is the impact on timing?
As you can see, rise/fall is now slightly delayed:
Rise time increases to about 9-10ns realistically (here it is measured at 2.5V), which is pretty acceptable. Same goes for fall time:
Again, you probably have to consider a bit longer period, but this is still close to the propagation delay of open-collector buffer.
Variant 4: series resistor with parallel capacitor
This is the solution suggested by Garth Wilson here. I never really understood how it's supposed to work (have I mentioned that I'm a total beginner in electronics?), so I wanted to give it a try - to see what it does and how. I'm so glad I did!
Again, build is very simple:
So, how does that work? Let's see it in action first:
Looks pretty good, doesn't it? Well, there is some ringing here, and I will discuss it below. For now, let's see the rise and fall up close:
And the fall looks very similar:
What bothered me a bit was the ringing, so I tried some other options. I replaced the 22pF capacitor with a 47pF one:
As you can see, it didn't improve that much. 220pF maybe?
That's much better.
What is also important - you can replace the 470 Ohm resistor with something significantly larger, like 1 kOhm:
This protects your circuit much better from high current and all its consequences.
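My intuition for the trick (an interpretation, not Garth's explanation) is that the capacitor passes the fast edge around the resistor, while the resistor alone still limits the DC current in the WAI case. The relevant time constants for the combinations tried here are easy to tabulate:

```python
# R*C time constants for the resistor/capacitor combinations tested above
combos = [(470, 22e-12), (470, 220e-12), (1000, 22e-12), (1000, 220e-12)]
for r, c in combos:
    tau_ns = r * c * 1e9
    print(f"{r} Ohm, {c * 1e12:.0f} pF -> tau = {tau_ns:.1f} ns")
```

Note that even the largest combination (1 kOhm with 220 pF, tau = 220 ns) only shapes the edge for a fraction of a 4 MHz half-cycle, which is consistent with the clean scope traces above.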
Ringing issue in detail
What bothered me was that I had to use a capacitor ten times larger than the one suggested by Garth Wilson to prevent the ringing. I figured that maybe this ringing is not that important after all? So I reverted the capacitor to 22pF, used a 1 kOhm resistor, and measured the effect first:
When you put the CMOS input thresholds in the picture, it looks as if this is not a very valid signal:
The best I can do at the moment is to see how the resulting gate (U1F in the schematic below) interprets this pink signal. Let's move TP2 after the gate and see the resulting signal:
The thinking goes: if the ringing (measured previously on the input of U1F gate) can cause the gate to misinterpret the signal, it will be visible as a transition on output of the gate, right? Luckily nothing like this happens:
So, it seems like there is nothing to worry about. And, while I was at it, I measured how variants 3 (only a 1K resistor) and 4 (a 1K resistor with a 22pF capacitor in parallel) impact the output of the gate:
Variant 3 (single series resistor of 1kOhm):
Variant 4 (1kOhm resistor in parallel with 22pF capacitor):
As you can see, it's much, much faster!
Oh, and in the end I have also tested it against 12MHz clock to see how it works:
Looks like we have a winner!
Conclusion
Obviously, this is not the end. I still haven't tested it in a real-world scenario with the CPU in place. Chances are that the approach will need to be refined. For one, I don't understand why Garth suggested 22pF, when in my scenario 220pF (ten times more) performed much better. I guess I will have to build it with the actual CPU and find out myself...
The main takeaway here is that this kind of experiment, in a very limited environment, can help you see for yourself how things work and test out any ideas you might have. Apparently there is always more than one way to do things, and trying various options can help you make the right decision.
-
Timing is the key
01/30/2021 at 17:15 • 0 comments
Timing issues explained
This is the final part of the 14MHz series, but I'm sure it's not the last entry on the subject. Sorry if it has been a bit stretched out, and maybe too beginner-friendly - I guess for all the experts out there it's all common knowledge. It's beginners like myself that struggle with these things, so I'd rather write a bit more and make it more useful.
As I wrote in my first post on the subject, all the other issues are secondary, but the timing is the key in running 65C02 at full advertised speed. Bus translation is not very difficult, and documentation quality can be worked around with enough research (remember what that word meant before Google?), but both of these challenges are all the harder with tight timing of 14MHz.
Before we get to the point where I can talk about specifics, I would like to cover one more thing on the subject: what is the timing violation, and how can that affect your build. Again, sorry for going into such basic details, but it might not be obvious for everyone; it certainly wasn't obvious for me.
What happens if you violate chip timing?
We have all done that at some point, and what we know for sure is that it didn't cause the universe to implode. That's already good news, but still: where do all these timing restrictions come from, and why? Well, our digital logic integrated circuits are not as digital as we would like them to be, nor are they logical. That part I'm sure of - integration and circuitry are still up for debate :)
What happens in a chip like a simple NAND gate is that whenever voltages change on the input pins (which, by the way, is not all that instant either!), there is a long and complicated process in which different components of the circuit start responding to the changing input, and they all do it in a very analogue and illogical way. Usually this dance of currents and voltages takes from several to several dozen nanoseconds. Anything that happens in between is pretty much random, and as with anything random, you can never assume that your result is the proper, final one. It might just as well be a random value that resembles the final value closely enough.
What's even worse, this dance is not deterministic. It's not like the access will always take the same amount of time, because both internal and external conditions might change the duration of the process. This is why in datasheets you have pessimistic values for each operation, and while these are not very important at slow CPU speeds, the faster you go, the more it matters. Let's look at the NAND gate used in Ben's project:
Now, it's tempting to assume that the worst case possible scenario at room temperature should be around 15-18ns (taken from rows 4 and 7), but this assumption is valid only if you can guarantee that your operating voltage will not drop below 4.5V. Can you? Sure, we have decoupling caps for that purpose exactly, but still, keep that in mind, it might matter! If the voltage drops below 4.5V threshold, propagation delay will be longer and valid response will appear on output later. Will you notice? Not necessarily. You might be lucky to get response faster thanks to the random operation in the IC.
Still, these are pretty simple cases. When you consider more complex chips it gets even worse. More moving parts means much more unexpected behaviour. It's especially interesting in case of reading ROM memory, which usually will be the slowest part of your build (unless you connect LCD directly to the bus, that is). Let's consider simple example (assuming ROM starts at 0x8000):
LDA $2000
CMP $9000
BNE not_equal
As you can see, I'm reading RAM at address 0x2000 and comparing it against the ROM value at 0x9000, jumping to the not_equal label when the values differ. How much can you violate the ROM timing and still have the code work? Basically, how far can you push that read beyond the ROM's limits before it fails? There are two things to consider here:
- How random is the value in 0x2000 - basically, what values can it assume depending on the logic behind the program? If there are just two values (like 0x00 and 0xFF) then the probability you catch the error is higher, so your code is less vulnerable to timing violation. Now, if the value is fully random (it can assume any of 256 values with equal probability), the equality occurs only once in 256 tests. If the value in RAM is 0x55 (0b01010101) and the one in ROM 0xAA (0b10101010) then all you need is one bit of the ROM cell read correctly for the test to pass! Even if ROM reads as 0x54 (0b01010100), having only single bit 0 correct, then the equality test will correctly fail, and you will not notice the violation! So, with timing violations your results can vary between alternate possibilities with different probabilities. That's quantum computing on 8-bit 80's hardware for you!
- You might be thinking that I'm being silly here, and that my example doesn't make sense, since ROM is mostly used to store code, and any timing violation during an opcode read would result in execution failure. Yes, you are correct, but code executing fine sequentially doesn't mean that your timing is correct, because of the second issue: ROM chips are not uniform landscapes of storage cells with uniform random access times. Instead, they are organised in "pages", which are selected during the read operation. This selection takes a pretty long time, but consecutive reads from the same page (or close to it) happen much faster. Your code is usually executed from adjacent cells in ROM, making these accesses much faster and seemingly protected against timing violations. So: running code from ROM just fine doesn't mean you got your timing right!
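The masking effect from the first point can be illustrated with a toy model - each bit of the ROM read flips independently with some error probability. This is a deliberately crude assumption of mine (real violations are not independent per bit), just to show how rarely a bad read produces a false match:

```python
import random

def misread(value, p_bit_error, rng):
    # flip each of the 8 bits independently with probability p_bit_error
    out = 0
    for bit in range(8):
        b = (value >> bit) & 1
        if rng.random() < p_bit_error:
            b ^= 1
        out |= b << bit
    return out

def false_equal_rate(ram, rom, p_bit_error, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(misread(rom, p_bit_error, rng) == ram for _ in range(trials))
    return hits / trials

# RAM holds 0x55, ROM holds 0xAA - every bit differs, so even a read that
# is wrong in 7 bits out of 8 still fails the equality test "correctly"
rate = false_equal_rate(0x55, 0xAA, p_bit_error=0.5)
print(rate)  # roughly 1 in 256 even with completely random reads
```

In other words, the CMP/BNE pair keeps "working" almost all the time even when the ROM read is garbage - which is exactly why this class of violation is so hard to spot.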
The reason I spent so much time describing this is that I want to make sure you understand: it's easy to see that your timing is off only when it's really, really off. Edge cases and minor violations might slip unnoticed for a long time and be very, very difficult to locate and fix. This is another reason why going from one to eight megahertz is pretty simple, while going further gets harder with each additional megahertz. If you want to go down that route, make sure you know why you are doing it...
The worst case of timing violation
This was really, I mean really infuriating. Also: this is something that I still haven't fixed yet, but hoping to be able to work on it and get it right soon. Or some day, really...
One of the things I hated about my first DB6502 build was that all the delays used an arbitrary cycle-counting technique. Sure, it's pretty easy and reliable (unless you have plenty of very frequent interrupts, that is), but it does depend on the CPU clock. If the clock speed changes (as a result of invoking the debugger, for instance), a simple 20ms delay might run for minutes, if not hours. Each time I wanted to test the system at a higher frequency I had to replace the oscillator (pretty easy thanks to the oscillator sockets, which I can't recommend enough!), but also recompile the OS/1 software with new clock counting ratios specific to the new frequency. I hated this and wanted to solve it in my second DB6502 version.
Sure, you might think an RTC would do just fine, but I came up with a "better" idea. The UART chip I used (SC26C92) is a pretty versatile beast and it provides a timer interface based on its own oscillator, used for the baud rate generator. This seemed like a perfect solution, as the BRG clock doesn't change with the CPU clock when going into debug mode. The idea was simple: implement a new version of the delay routine that would translate the desired number of milliseconds into 3.6864 MHz clock ticks, kick off the UART X16 timer and poll periodically to see if the countdown completed. Sure, I could also use an IRQ for this, but I wanted to keep things simple for now. Surprisingly enough, the code worked almost the first time around. Impressed with the result (and how precise the measurements were), I added another feature: a CPU speed calculation routine. It used two clocks at the same time: one in the VIA (counting CPU clock ticks) and one in the UART (counting the fixed 3.6864 MHz). The CPU frequency could be estimated from how many clock ticks the VIA counted during a fixed period measured by the UART. A simple, elegant solution, and it also worked beautifully.
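The millisecond-to-tick conversion is straightforward. This sketch is my reconstruction in Python, not the actual OS/1 assembly, and it assumes the counter runs at the full crystal rate (depending on the configured C/T clock source it may be prescaled, which only changes the constant). It also shows why the delay routine has to count in chunks - the SC26C92 counter is only 16 bits wide:

```python
BRG_CLOCK_HZ = 3_686_400   # fixed UART crystal, independent of the CPU clock

def ms_to_ticks(ms):
    # number of BRG clock ticks in the requested delay
    return ms * BRG_CLOCK_HZ // 1000

print(ms_to_ticks(20))                    # 73728 ticks for a 20 ms delay
print(65_535 * 1000 / BRG_CLOCK_HZ)       # ~17.8 ms - longest single countdown
```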
+---------------------------+
| |
| #### #### # # |
| ## ## ## # ## |
| # # ### # # # |
| ## ## ## # # |
| #### #### # ### |
| |
+---------------------------+
OS/1 version 0.3.5C (Alpha+C)
Welcome to OS/1 shell for DB6502 computer
Enter HELP to get list of possible commands
OS/1>info
OS/1 System Information
System clock running at 0MHz
ROM at address: 0x8000, used: 13420 out of 32768 bytes.
System RAM at address: 0x0300, used: 1517 out of 3328 bytes.
User RAM at address: 0x1000, used: 0 out of 28672 bytes.
ROM code uses 9056 bytes.
ROM data uses 4194 bytes.
SYSCALLS table uses 164 bytes.
VIA1 address: 0x0220
VIA2 address: 0x0240
Serial address: 0x0260
Serial driver: SC26C92
OS/1>info
OS/1 System Information
System clock running at 8MHz
ROM at address: 0x8000, used: 13420 out of 32768 bytes.
System RAM at address: 0x0300, used: 1517 out of 3328 bytes.
User RAM at address: 0x1000, used: 0 out of 28672 bytes.
ROM code uses 9056 bytes.
ROM data uses 4194 bytes.
SYSCALLS table uses 164 bytes.
VIA1 address: 0x0220
VIA2 address: 0x0240
Serial address: 0x0260
Serial driver: SC26C92
OS/1>
Then I moved to 14MHz and strange things started happening out of the blue. The worst part was that the system would boot to the shell and everything seemed fine, but during XMODEM file transfer it would fail randomly, sometimes transferring between one and twenty blocks of data and freezing after that. Long days of troubleshooting revealed the culprit: the UART countdown timer (used in delay routines executed during data transfer) would occasionally stop. Not the first time, not the second, but at some point it would just stop counting down, throwing the delay routine into an infinite loop.
Now, I'm not 100% positive I know the reason. I think I do, and I even managed to capture some data on my logic analyser supporting the main hypothesis. The main problem with the timer function in the UART chip is how you operate it: to start the timer, you read from register 0x0E, and to stop it, you read from register 0x0F. Theoretically, for the chip to register the operation, the read should be no shorter than 55ns, but this is a single value valid for all operations, like reading data from the inbound queues. In reality, it's possible that your stretched nRD signal goes high a bit too late, after the 10ns address hold time guaranteed by the 65C02. It all depends on the complexity of your nRD stretching logic, but the result is that nRD goes high during the phase in which the new address is stabilising on the address bus - and it might happen that an accidental read occurs from register 0x0F, causing the countdown to stop.
I do realise that it seems like a long shot, but I actually did quite a lot of testing to confirm this hypothesis. I changed the polling code so that it compared the last value of the counter between checks and, if it detected no change for five consecutive reads, signalled it on a dedicated line. Then I connected my logic analyser to the bus and set that specific line transition to trigger data acquisition with a large "pre-trigger" buffer. What I got was exactly as expected: a series of correct reads with decreasing counter values until, at a certain point, during UART RX IRQ processing (when a range of UART registers is being read), the counter stopped decrementing. Once, just once, the data on the bus actually registered a 5ns-long read from 0x0F at the end of some other read operation.
What is weird though, is that it happens only with high frequency clock. In theory I would expect it to happen at every possible clock frequency, since signal stretching logic delay has no dependency on the main clock. Yeah, these things are not easy...
Actual timing analysis
So, going back to the actual build, what are the key timing requirements you have to meet? There are actually several steps to consider.
Address decoding
You must ensure that the chip selection process completes during the low CPU clock phase. With 500ns available at 1MHz it's hardly an issue, but at 14MHz this window seems to shrink to about 30ns. Actually, it's much, much worse. First of all, the new address doesn't even start to show up on the bus for the first 10ns of the cycle (tAH - address hold). After that, it can take up to another 20ns before the address is fully stable, and there you go: your 30ns window is gone. In fact, this is where the maximum CPU frequency comes from - as long as we assume that the address has to be stable at the rising clock edge, that is. From what I read on 6502.org, some people have successfully decided to drop this assumption, but let's stick with it for now.
So, we are already 30ns into the clock cycle, and the rising clock edge is coming soon. The address is stable now, but that's not the end of the story - you still haven't selected any chip for the read/write operation. If you decide to stick with the 74HC00, you have to assume that it will take up to 18ns for the RAM/ROM chip select signal to go low. Sure, most of the time it will be closer to the typical 9ns, and the actual output will stabilise faster, but as I wrote above - it's not the typical cases you have to worry about, it's the outliers. These will crash your system.
What can you do then? There are other options. You can use a PLD with a 10ns maximum delay, at the expense of higher power consumption. You can choose the 74AC00 chip, which has a guaranteed 7ns propagation delay from high to low (this is the transition you care about). What you should avoid at all cost is making your address decoding complex - each chip adds nanoseconds to the final calculation. So, let's assume we have the simplest possible address decoding logic, using two NAND gates of a 74AC00 chip, resulting in a maximum 14ns propagation delay. Added on top of the existing 30ns, that makes 44ns total.
Memory access time
In order to ensure that the CPU reads data from the bus correctly, you also need to respect the data read setup time (tDSR), which is 10ns at 5V. This means that out of the 70ns clock cycle (at 14MHz), after having spent up to 44ns selecting the correct memory chip, you have only 16ns left for it to output data on the bus for the CPU to read (70ns - 10ns tDSR - 44ns CE = 16ns).
Sure, these are worst-case numbers, but you have to account for them. If the data shows up on the bus after 17ns, it will probably be fine, but if it goes over 20ns, it will be too late. The CPU will read some bits correctly and some not, and there is nothing you can do about it anymore.
If you look at the 62256 memory in a DIP package, it has a 55ns access time. If you check the timing table, there is an interesting hint there: there are two timing constraints you have to take into account - tAA and tACS of 55ns, and the shorter tOE of 35ns. Now, the important hint is that you can actually spend more time on bus translation of the nRD/nWR signals to nOE/nWR, because the chip allows for it. Still, you need 55ns from the time the address is stable on the bus for valid data to show up there.
Obviously, 55ns is way too long, so you have to choose a different memory chip. There are similar chips available, also in DIP packages, but with a different (narrow) DIP-28 footprint: the 71256. These offer much better access times, down to 12ns in the 71256SA12 variant, where tAA and tACS are at most 12ns. This guarantees that data will be correctly retrieved from RAM, but what about slower peripherals?
Wait state propagation delay
As I wrote in my last post, one of the solutions to slow peripherals is to use wait states, in case of 6502 CPU this means using RDY pin by pulling it down when access to slower peripheral takes place. Let's look at the timing aspects of this feature.
First thing (and I wrote about it last time) is that you need to pull RDY low before falling edge of the clock (effectively - before the end of the full clock cycle), respecting the CPU Control Setup Time (tPCH):
In our case (running at 5V), it will be 10ns:
So, at 14MHz we have approximately 60ns of the clock cycle when we must decide whether CPU should wait or not. Given all the above, it seems like a lot, but there is a catch or two.
If you consider the simplest possible example - a wait state on every other cycle (one wait state, one ready state, and so on...) - you could use a simple 74AC74 D flip-flop. This showcases the model nicely, but in reality it's of very little use. Your logic will most likely get more complex than that, and it will probably depend on information about the selected peripheral. This makes the window shrink rapidly - if you look back at the address decoding example, you might need up to 44ns just to get reliable, usable information about whether the slow ROM is selected or not, leaving you with just 16ns to process it.
Now, the processing of this data seems simple, but it's not really. This is the simplest real life example of wait state generator:
As you can see, there is just one 74AC00 gate between chip select signal and the RDY pin. Given the propagation delay of 7ns in case of 74AC00, you are left with only 9ns to spare before your window closes. Pretty tight! Actually, there is another issue here - in this schematic it is assumed that your CS signal is active high, which is not true in case of Ben Eater's build. In his specific design you could use A15 signal directly here, and you save several ns spent on inverting the signal for active low ROM CS signal.
Unfortunately, if you are using different address decoder (like my PLD based version), you might be stuck with active low signal. In this case you should use different gate: 74AC32, like so:
Luckily this specific OR gate in the AC family has similar propagation delay of 7.5ns, so it has very little impact on the timing.
You might be wondering how come I don't take the 74AC74 delay into account. I actually do, but it just matters less, because the output state of the flip-flop is calculated at the very beginning of the cycle, on the falling edge of the clock, so it will show up several nanoseconds after that. What you do have to consider, however, is that you have to convert the falling clock edge into a rising one, probably using an inverter gate. This means that the output of the 74AC74 will be delayed by the time it takes to invert the clock signal.
There is one trick you might use, however, and it might become useful later on. Instead of this:
You can actually generate PHI2 from inverting PHI1, like so:
And yes, the first inverter is totally unnecessary; it's here just to illustrate the idea. This way the falling edge of the PHI2 clock gets a counterpart rising edge on PHI1 (required by edge-sensitive circuits), occurring shortly before PHI2 actually falls. This might save you a couple of nanoseconds, but consider the implications carefully. In the context of the 74AC74 flip-flop here it doesn't make any sense whatsoever.
So, we have 9ns to spare - we are good to go, right? Obviously not. You have to calculate how many wait states you need, and this is where another catch comes into play. A single wait state for a 150ns ROM will not be enough at 14MHz! A single wait state extends tACC by the full length of one clock cycle, so about 71ns. Considering the previous calculations, where in the simplest possible case we had only 16ns to access memory, even 71ns more will not suffice. You need two wait states for ROM. How does this impact timing?
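The wait state count can be sanity-checked with the same kind of arithmetic: each wait state buys one full clock cycle of extra access time on top of whatever the plain cycle leaves you. A small sketch, reusing the 16ns figure from the earlier calculation (illustrative values again):

```python
import math

CYCLE_NS = 1000 / 14    # one clock cycle at 14MHz, ~71.4ns
BASE_NS = 16.0          # access time left in a plain cycle (from the text)

def wait_states(tacc_ns):
    """Wait states needed for a device with the given access time."""
    return max(0, math.ceil((tacc_ns - BASE_NS) / CYCLE_NS))

print(wait_states(150.0))  # 150ns EEPROM: 2 wait states
print(wait_states(70.0))   # a hypothetical 70ns device: 1 wait state
print(wait_states(10.0))   # fast RAM: 0 wait states
```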
Now, if you analyse the above circuit you will notice one important thing - the additional wait state doesn't add extra delay to the wait state calculation. Sure, there is an additional AND gate in the picture, but its state is determined early in the clock cycle, so this specific input to the OR gate is available pretty early. The ROM_CS signal comes in much later - and this is the critical path for timing analysis.
There is another issue to handle, though - you might need several different wait state sources. For instance, you might need zero wait states for RAM, one wait state for the UART and two wait states for ROM. In this case you will need an additional gate, and this one, unfortunately, will impact the critical path:
Remember the 9ns we had to spare recently? With the additional gate another 7ns are gone, leaving us with a breathing room of just 2ns:
This is really tight, and any voltage variance might be enough to cause timing violation. And that's not even the end of the story!
RDY low to high transition
I wrote about it previously - in the case of the WDC 65C02 CPU you also have to consider the impact of the bidirectional nature of the RDY line. If you decide to go with an open-drain RDY output, you need to consider the voltage rise time through the pull-up resistor. As you can see, with these tight timings it might be difficult to fit within a single cycle. Perhaps the best option would be to use a series resistor instead, but I still need to test this approach.
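To get a feel for why the pull-up rise time matters, the classic RC charging formula t = -RC·ln(1 - V_IH/V_DD) can be plugged in. The component values below (a 3.3k pull-up, ~10pF of bus capacitance, CMOS V_IH at 70% of V_DD) are my assumptions for illustration, not measurements from the build:

```python
import math

def rise_time_ns(r_ohm, c_pf, vih_fraction=0.7):
    """Time for an RC pull-up to charge the line from 0V up to VIH,
    expressed as a fraction of VDD (0.7 * VDD is typical for CMOS)."""
    tau_ns = r_ohm * c_pf / 1000.0  # R[ohm] * C[pF] gives ps; /1000 -> ns
    return -tau_ns * math.log(1.0 - vih_fraction)

# Assumed 3.3k pull-up driving ~10pF of bus capacitance:
print(rise_time_ns(3300, 10))  # ~40ns, more than a 14MHz half-cycle (~36ns)
```

With numbers like these, the line may simply not reach a valid high level in time, which is what makes the open-drain option so risky here.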
Bus translation impact
Another problem you have to handle (if you are using any peripherals not directly compatible with the 6502 interface, like ROM or RAM) is the RD/WR signal stretching. What I came up with was a solution based on the PLD I used for address decoding itself. The rationale was that this was exactly the place where the necessary data was already being processed (the state of the RDY line was also computed there), so it seemed a reasonable choice.
That being said, there was something off about this solution. Initially I used a PLD with a guaranteed propagation delay of 10ns and the system was really unstable. Half of the time it wouldn't even boot to the OS/1 shell, and even if it did, it would fail randomly. What I did notice (and was surprised by) was that this chip was considerably, observably warmer than the surrounding chips. I even used a digital thermometer to confirm this, and indeed - it was warmer by almost 10 degrees Celsius. When I checked the datasheet it turned out that the 10ns variant consumes significantly more energy than the 15ns one, so I replaced it. Instantly everything started working much, much better. The system got perfectly stable again at 8MHz, and it would work reasonably well at 14MHz. Still, there were issues at the higher clock frequency (including the random infinite loop in the delay function), but it was working much better.
Coming back to the bus translation impact on the timing - the problem is the complexity of the logical equation you need to apply at the beginning of the clock cycle: you want your nRD/nWR line to go high while the clock is low only if the last cycle was a "ready" one. In negative terms, that means that if your nRD/nWR line is low and the RDY line is low at the time of the falling clock edge, you should keep nRD/nWR low even though the clock line is low.
So far I have come up with schematics that either require calculation in the PLD or are too slow for discrete chips. The problem here is that your address is stable for only 10ns after the falling edge of the clock (that's the tAH part of the 6502 timing characteristics). If you don't pull your nRD/nWR line high during that time, you might end up in a situation where the address bus changes while the read or write line is still active. In most cases, even if this happens, it will not impact your system, but in my "weird delay glitch" case this was exactly what was crashing the build.
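The stretching rule boils down to a tiny state update evaluated at every falling clock edge. This is just a behavioural sketch of the equation described above, not the PLD implementation:

```python
def next_nrd(nrd, rdy):
    """State of the nRD (or nWR) line right after a falling clock edge.

    nrd - current state of the line (0 = asserted/low)
    rdy - state of the RDY line sampled at the falling edge

    If the line was asserted and RDY was low, the cycle that just ended
    was a wait state: keep the line stretched low. Otherwise release it
    high for the low clock phase.
    """
    if nrd == 0 and rdy == 0:
        return 0  # access still in progress - keep the strobe asserted
    return 1      # last cycle was "ready" - deassert for the low phase

# A ROM read with two wait states: RDY sampled low, low, then high
assert next_nrd(0, 0) == 0  # first wait state - nRD stays low
assert next_nrd(0, 0) == 0  # second wait state - still low
assert next_nrd(0, 1) == 1  # cycle completed - nRD released
```

The catch, as described above, is that this decision has to settle within the ~10ns tAH window after the falling edge.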
What I plan to consider (but haven't designed yet) is to use some kind of multiplexer for the nRD/nWR line translation - this way the signal could be prepared "in advance" and only toggled on the falling clock edge, depending on the state of the RDY line. It feels promising, but further investigation is required.
Nobody expects the Spanish Inquisition
I saved this single screenshot for last. Even if you do your analysis and get your timings right, you might still see random issues, because there is one last thing to consider - the rise time of any digital signal, and what happens when two signals align perfectly in time. Here I have a nice case: two signals being fed into an AND gate (its output is pink in the image below) - one rising and the other falling at the same time. In theory, it should work just fine, but in practice there is a very short spike where one signal hasn't fallen yet while the other has risen just enough to produce a logical high for a couple of nanoseconds - enough to mess with an interfacing IC, yet not enough to be caught by a logic analyser:
Timing summary
Does that mean you should give up? No, definitely not. The whole point of this post is to help you understand all the obstacles that might get in the way of building a stable 65C02 computer running at 10+ MHz. It's fun to try, and I promise you will learn a lot just by trying to get there, but it will be quite a journey, so you'd better be prepared. You will probably need a scope and a high frequency logic analyser (cheap Saleae clones might not be sufficient, due to limited bandwidth and number of channels). What you could also consider (but I haven't tried it yet) is reducing build complexity. You can get rid of ROM altogether by using a slower clock to copy ROM to RAM and running the code from RAM only. You could also use only 6502-bus compatible devices, some of which are rated for 14MHz. There are different ways of going about it, and you have to adapt your solution to the problem you are trying to solve.
If there was one thing I learned while doing this, it's that I approached the problem from the wrong perspective. I wanted my build to run at 14MHz not because I needed it, and not because I saw any specific use for it. I considered it an interesting challenge, but it almost drove me away from the project. It might have been too much too soon. I don't know, but right now I need to get back into the saddle and figure out a better way of approaching the issue. One way or the other - you have been warned. Proceed with caution and don't give up if it doesn't work. Remember there are always plenty of other things you can try and enjoy.
-
CPU families and their interfaces
12/27/2020 at 19:05 • 0 comments
It all stays in the family...
One interesting thing that Ben doesn't seem to elaborate on in his videos is the issue of CPU families and the resulting chip (in)compatibility. I came across this issue when I started using the SC26C92 Dual UART chip, but only much later, when trying to push the 6502 to the 14MHz limit, did I notice some of the resulting issues.
Let's start at the beginning, though. If you followed Ben's project closely, you might have noticed an important difference between the IC interfaces. If not, you will notice shortly...
Conveniently, it's very easy to hook up the 6522 chip to the 6502 CPU bus. No wonder - they belong to the same "family" of CPU and peripherals, and they use the following signals to synchronise operation:
- CS - Chip Select signal used to activate the chip using address decoding logic,
- R/W - single Read/Write signal to indicate whether current operation is read or write,
- PHI2 - common clock source to be shared between CPU and peripherals,
- D0..D7 - data bus,
- RS0..RS3 - register select usually mapped to A0..A3 lines,
- RES - active low Reset signal,
- IRQ - active low IRQ signal.
If you check the ACIA chip (6551), you will notice it has the same set of control signals (with fewer registers, but the idea is the same):
Now, if you look at the ROM/RAM chips, these are a bit different:
As you can see, some details are similar (like the low active Chip Select signal), but part of the interface is a bit different. Instead of a single Read/Write signal, there are two separate lines: low active Output Enable and low active Write Enable. There is no PHI2 signal, and as a result, to prevent accidental writes, Ben's video about RAM timing shows the need to ensure that the write operation is performed only during the high clock phase.
If you haven't played with any other CPU of the era (I hadn't at the time), you might just accept the solution and move on without thinking too much about it. This is exactly what I did, and only after playing with higher frequencies (and, specifically, wait states) did I have to revisit my understanding of the subject. But I'm getting ahead of myself...
Interfacing to SC26C92
Side note: all the issues I ran into when trying to connect this chip are the reason I started this blog in the first place - I wanted this documented somewhere. I will probably need to write more about the initialisation and other details. Some day, I guess...
When you read this specific chip's documentation, you will find it uses an interface similar to the one used in ROM/RAM:
As you can see, there are standard A0..A3 register select lines, a D0..D7 data bus and a low active IRQ output line. The first important difference is the RESET signal, which is high active, but this translation is easy - a single inverter or NAND gate will do. Chip Enable (another name for Chip Select) is predictably low active, and there are two signals to control read/write operations: low active RD (identical to low active OE) and low active WR.
Now, it might seem that connecting to this chip is pretty simple, and that you should do it in a way similar to how Ben connected the RAM:
This way we ensure that the RESET signal is RESB inverted, that RDN is low only when R/W is high (indicating a read operation), and that WRN is low only when the clock is high.
Unfortunately, there is an issue here: early in the clock cycle, while the address lines are still stabilising, you might get random access to the UART chip (your address decoder might react to the unstable address and accidentally pull the UART CEN line low for just a couple of nanoseconds). At the same time RDN might be low, resulting in a read operation being executed.
Sure, the operation would not be valid - it would be at most 10ns long, which is way below the minimum pulse length, but this is actually not a good thing. It might cause issues with chip operation stability, or worse.
How can anything be worse than chip instability? Actually, as I have learned, certain operations can be executed, at least partially, even by very short random read pulses.
What about Ben's build then?
You might be wondering why this hasn't occurred in Ben's build, and rightly so. It might actually occur, but it doesn't matter. Even if the RAM or ROM chip is enabled by a short random pulse during the low clock phase, all it will do (in the worst case scenario) is read data at a random memory location, put it briefly on the data bus while nobody (as in: the CPU) is listening, and that's that.
Basically - no read operation from RAM/ROM can change system state.
Sure, a random read from the VIA chip would cause issues, because reads from certain registers can change chip state. Reading from the PORTA register (marked here as IRA - Input Register A) can clear the IRQ flag and change the state of the handshake lines:
This, however, can't happen, thanks to the CPU family compatibility between the 6502 and 6522 chips. The VIA will ignore any operation performed during the low clock phase - and this is exactly why it requires the PHI2 input.
Fixing the problem
The fix, as you might expect, is actually pretty simple: you have to gate the RDN signal with the high clock input as well, like so:
This is much better. Sure, the CEN signal might still go low during the low clock phase, but the RDN signal will be high at that time, preventing accidental reads. Is that all? Obviously not...
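The gated version can be expressed as a one-line boolean equation: the read strobe is asserted only when the chip is selected, R/W indicates a read, and the clock is in its high phase. A behavioural sketch (signal names follow the text; 0 means an asserted line for the low active signals):

```python
def rdn(cen, rwb, phi2):
    """Low active read strobe for the UART, gated with the clock."""
    reading = (cen == 0) and (rwb == 1) and (phi2 == 1)
    return 0 if reading else 1

# A decoder glitch pulling CEN low during the low clock phase is harmless:
assert rdn(cen=0, rwb=1, phi2=0) == 1  # clock low - no accidental read
assert rdn(cen=0, rwb=1, phi2=1) == 0  # genuine read in the high phase
assert rdn(cen=1, rwb=1, phi2=1) == 1  # chip not selected
```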
Wait states impact
It's really funny how I didn't think of this and wasn't able to guess what was going on. When I started playing with wait states, my first solution was based on the example from the Apple I manual, but instead of using it only for ROM, I added it for all the components: RAM, ROM, VIA and UART. I also used a slow clock (something like 4MHz or so), just to be able to see the results better with my logic analyser.
Everything seemed to work except for some random glitches. Instead of a proper system prompt like this:
OS/1 version 0.3.5C (Alpha+C) Welcome to OS/1 shell for DB6502 computer Enter HELP to get list of possible commands
I would get something like that:
OOSS//11 version 0.3.5C (Alpha+C) Welcome to OS/1 shell for DB6502 computer Enter HELP to get list of possible commands
At first I didn't even notice the issue, but it also occurred in the shell - each character typed in the serial terminal would be displayed twice.
When printing longer strings from the 6502, it would duplicate each of the first 4 characters in the string and print all the following ones just once. At the same time, when typing into the shell, each character would be printed twice, without the 4-character limit.
Can you guess what the problem was?
If you take another look at the diagram above (where I gated the RD/WR signals with the high clock phase), you will notice that when wait states are added, each read/write operation is performed twice, like so:
OK, you might ask, but why did it happen only for the first 4 characters? The answer is simple: the SC26C92 contains an 8-character transmit FIFO buffer, and the first 4 characters filled it up when being written twice. Afterwards, whenever a single character was transmitted, the UART raised an interrupt, causing the next character to be written twice - the first copy stored in the FIFO and the second discarded as queue overflow.
Obviously, when typing on the keyboard, data was sent slowly and no queue overflow ever happened, resulting in double writes for each and every character.
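The FIFO behaviour is easy to reproduce in a few lines. The sketch below simulates the double-write bug against a bounded transmit queue: every character is written twice, writes beyond the queue depth are dropped as overflow, and one character leaves the queue per transmit interrupt once it has filled up. The drain model is my own simplification, but it reproduces the exact garbled prompt:

```python
from collections import deque

def transmit(text, fifo_depth=8):
    """Simulate the double-write bug against a bounded transmit FIFO."""
    fifo = deque()
    sent = []
    for ch in text:
        if len(fifo) == fifo_depth:      # FIFO full: wait for the next
            sent.append(fifo.popleft())  # transmit interrupt to free a slot
        for copy in (ch, ch):            # the buggy double write
            if len(fifo) < fifo_depth:
                fifo.append(copy)        # stored
            # else: silently discarded as queue overflow
    sent.extend(fifo)                    # drain whatever is left
    return "".join(sent)

print(transmit("OS/1 version"))  # -> "OOSS//11 version"
```

With slow input (the queue never fills), every character simply comes out twice, matching the shell behaviour.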
Now that I explain it like that it seems really simple and easy to understand, but it was really strange and scary at first sight...
Another fix then...
So, how do we fix this issue? Basically, for wait state cycles, instead of the above diagram, you want something like this:
You need your RD/WR signals stretched whenever a wait state is in operation. Or, speaking in terms of "negative logic" (for lack of a better term), you want your RD/WR signals to go high only during the first low clock phase of any operation. During each of the consecutive low clock phases of the same operation (due to the introduction of wait cycles), the RD or WR signal should remain low.
So, the logic gets more complex:
And please note: there might be cases where you want to disable wait state processing altogether (like when flashing the EEPROM from the Arduino) or keep the wait state for longer, effectively overriding the RDY line - this would add to the complexity of the signal logic.
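The difference between the naive clock-gated strobe and the stretched one can be sketched by counting the distinct write pulses the peripheral sees during one operation. This is a half-cycle-level behavioural model (my own simplification, not the actual PLD equations):

```python
def write_pulses(wait_states, stretched):
    """Count distinct low pulses on WRN during a single write operation."""
    wrn = []
    cycles = 1 + wait_states
    for cycle in range(cycles):
        # low clock phase: the naive logic deasserts WRN every time,
        # the stretched logic deasserts it only before the first cycle
        wrn.append(1 if (cycle == 0 or not stretched) else 0)
        # high clock phase: WRN asserted, write in progress
        wrn.append(0)
    wrn.append(1)  # operation complete - WRN released
    # each falling edge on WRN is one write pulse seen by the chip
    return sum(1 for a, b in zip(wrn, wrn[1:]) if a == 1 and b == 0)

print(write_pulses(wait_states=1, stretched=False))  # naive: 2 (double write!)
print(write_pulses(wait_states=1, stretched=True))   # stretched: 1
```

Without wait states both variants produce a single pulse, which is exactly why the bug only surfaced once wait states were introduced.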
However, it's not just the complexity that becomes a problem here; the timing also gets in the way. I will write more on that next time.
I was actually lucky with that part - my logic for the nOE and nWR signals was encoded in the address decoding PLD. Thanks to that, I could change it easily, without having to redesign the PCB, and it was immediately applied to the ROM and RAM chips as well, even though these had already been placed on the prototype board.
So, another lesson learned: if possible, keep your potentially mutable logic in programmable chips, or enable sourcing them from off the board with jumper headers.
Now, coming back to CPU families - you might be wondering how this works with the Z80, for instance, and this part is really good. The Z80 will not pull the nRD or nWR signals low while the address is unstable, and it will pull them high prior to changing the address lines. It will also keep the signals low for the whole duration of a wait state operation. Much more convenient, but at the expense of lower per-cycle CPU efficiency. Basically, similar operations take noticeably more clock cycles on the Z80 than on the 6502.
I guess you can't have everything, can you?
Summary
So, does this mean you should give up on Z80-compatible peripherals? No, of course not, especially since you have to work with memory chips (both RAM and ROM) that use this different interface anyway. You just have to understand the consequences, accept the limitations they impose on your project and plan accordingly. You can consider running at slower speeds (so that no wait states are required), for instance, or using clock stretching instead. Each decision has its own pros and cons, and you just have to consider them carefully.
What you should do, however, and what I strongly encourage you to do, is play around with different architectures, chip families and solutions. Ben Eater's projects are a great place to start, but they also provide certain guardrails that ensure your adventure stays safe and comfortable, while the actual fun starts when you leave the comfort zone. For me, the introduction of wait states and a different UART controller was the turning point in the project - for the first time it really challenged my understanding of the architecture and forced me to reconsider what I did and didn't know.