Clocking Registers

At 100MHz, the cycle is only 10ns long. At that time scale, issues that can go ignored at slower clock-rates suddenly become very material. Clock-skew is one such issue. The cycle is so short that even small delays on clock signals will be material.

What kinds of delays could we be dealing with? Suppose we have a clock signal internal to the CPU with a 1.2ns rise-time (Tr) driving a 5" trace with ten flip-flops on it. A 50Ω trace on FR4 will present 3.3pF of parasitic capacitance per inch, and each flip-flop will add 3pF of capacitance in addition (assuming AUC logic). The cumulative delay on that trace is something in the order of 3.5ns relative to the input clock signal (i.e., prop delay = Tr + RC, so 1200ps + (5 * 3.3pF * 50Ω) + (10 * 3pF * 50Ω) = 3.5ns). 3.5ns may not seem like much, but it represents more than a third of the cycle at 100MHz!

The moral of the story is to manage capacitance on clock lines carefully. To that end, I'm contemplating using a CDCVF310 1:10 Clock Driver to distribute the clock around the board. A two level clock tree can provide a dedicated trace for up to 100 destinations with minimum capacitance. We can then adjust for the tpd of the clock drivers themselves by using a CY2302 Zero-Delay-Buffer (ZDB) to synchronize these internal signals to the input clock.

Beyond capacitance, there are four key specs in the CDCVF310 clock-driver datasheet that we should examine to better understand skew:

Tpd = 2.8ns max -- Propagation Delay: CLK input to Yn output propagation delay
Tsk(o) = 150ps max -- Output Skew: the variation in the tpd between outputs, i.e., from Ym to Yn
Tsk(p) = 250ps max -- Pulse skew: the variation in tpd from PLH to PHL
Tsk(pp) = 350ps max -- Part to Part skew: the variations in tpd from various ICs on the board

With a multi-level tree, all four specs may come into play, and the total skew can add up to be a problem if we're not careful. Consider two Flip-Flops in series, like this:Attachment:

If the clock-delay from FF1 to FF2 is longer than the tpd of FF1 plus the data delay to FF2, then FF2 will not latch the intended value correctly and the circuit will fail. One way to ameliorate the problem is to use trace delays in our favour. We can wire CLK signals so traces go from downstream flip-flops to upstream ones, hence clocking them in reverse order. Another option is to introduce delay in the data signals until the travel time between the flip-flops exceeds the longest clock-skew by some safety margin.

And that brings us neatly into the issue of Write-Enable signals (WE) and how they might impact skew. We have a few implementation options to consider:

1) On the C74-6502, write signals are all routed to a 74AC273 register and released together on the clock-edge -- like this:Attachment:

The 74AC273 is cleared mid-cycle by a low-going pulse. Active-high WE signals arrive at the 74AC273 at various times throughout the cycle, but then travel to their destinations more or less together. A challenge with using this approach in this design is the potential skew between the outputs of the’273 register. There is no spec for skew mentioned on the 74AC273 datasheet, but it can be as much as 1ns on a 74LVC273. (From the datasheet, Tsk(o) = 1ns max, “Skew between any two outputs of the same package switching in the same direction."). In addition, it’s also more difficut to generate the mid-cycle pulse to clear the ‘273 reliably at these clock-rates.

2) To minimize skew, we ideally want nothing in the path between the clock and a flip-flop’s CLK input, as in this alternative based on a 2:1 FET Mux at the data inputs of a register:Attachment:

This method accomodates both active-high and active-low WE signals equally well. The FET switch will add 5Ω of series resistance at the data inputs of the flip-flop, and with it some minimal additional delay that we can safely ignore here. The switch-time of the mux becomes the setup time for the WE signal. One consideration is that clocking all flip-flops every cycle will consume some additional power, but the main drawback is the number of additional ICs required implement the approach. We effectively need one ‘3257 2:1 mux for every 4-bits of registered data in the CPU. That's a lot of hardware overhead.

3) Another approach is to use active-high WE signals with AND gates at the CLK inputs of flip-flops. This gating mechanism is sensitive to glitches if the WE signal is allowed to fluctuate while the clock is high. This is very likely to happen with WE signals that are generated by combinatorial logic. In that case, we can use a transparent latch to hold WE steady during the first half of the cycle, like this:Attachment:

In this case, the datasheet for 74AUC08 lacks any skew information so it’s hard to know what kinds of delay we might be dealing with. We can use the ZDB to synch up the signals with an external clock, but that of course won’t help skew.

4) We can also use OR gates with active-low WE signals, as follows:Attachment:

In this case, fluctuations in /WE during the clock-high phase are not harmful. The catch is that once /WE is brought low, the signal must rise again before the following mid-cycle, or a phantom clock-edge will be generated at the flip-flop input when it does rise. The approach is therefore not feasible in situations where WE signals are only available late in the cycle (which is very much the case in this design).

5) Finally, we can use FET switches to gate clock signals, like this:Attachment:

With this approach, brief fluctuations on the WE signal should not cause a problem. However, longer pulses, or indeed a full switching of the WE signal before the mid-cycle is problematic. Like with AND-gated clock above, a transparent latch could be used to hold WE steady during the first half of the cycle. The advantage here is that the tpd of FET switches is much shorter, and so any deviation from the mean is likely to also be minimal.

I don't doubt there are other approaches to these issues, so please feel free to suggest any alternatives. As always, all input is very much welcome. We'll have to see how all this works out in the final analysis. For now it's safe to say that clock-skew, trace-delay and write-enable signals will all need to be carefully managed if we are to get anywhere near the 100MHz goal.

The ALU

Discussions

Become a Hackaday.io Member