-
Mapping the pipeline to iHP PDK
03/21/2026 at 14:46 • 0 comments

It's tempting to just plop the VHDL source code in and let the synthesiser do the heavy lifting.
But the pipeline is in Verilog and I use structural code only, so I must map (actually synthesise) the circuits by hand.
For this, a clear view of the circuit is essential and circuitJS helps, but the exercise also makes me reconsider several choices, and the VHDL coding style must be deeply adapted.
The last log mapped the comparator, so that's one thing off the list.
The remaining circuits are inventoried.
- The adder : I'll use dumb RTL style since I'm not yet operational with Jerem's Logilib. No time to dig further, it will work.
- The DFF: there are three cases.
- X and T are initialised with INIT_X, have feedback/enable
- Y and B are initialised with INIT_Y, have feedback/enable
- A is not initialised, has feedback/enable
- There are muxes, to select the operands during each phase.
- OPM is a basic mux2 done by mux2_x18() (for both sides)
- OPX, OPY2 and OPT are muxed with a constant ADJUST.
- OPB2 is muxed with the modulus and B is inverted.
That's 2 circuits to create here: one just forces the constant, the other also inverts the operand. Here again the PDK does a great job. See the simulations:
![]()
So that's 2 modules to write using only these gates: ConstAdjOrPass() and ConstModOrNeg().
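To pin down the intent before writing the gate-level code, here is a hedged behavioural sketch of these two modules in Python. The ADJUST and MODULUS values below are placeholders, not the project's real constants:

```python
MASK18 = (1 << 18) - 1   # 18-bit datapath
ADJUST = 0x00005         # placeholder constant, not the real value
MODULUS = 0x3FFFB        # placeholder 18-bit modulus, not the real value

def const_adj_or_pass(op, force):
    """Either pass the operand through, or force the ADJUST constant."""
    return ADJUST if force else (op & MASK18)

def const_mod_or_neg(b, force):
    """Either force the modulus, or pass B inverted (one's complement)."""
    return MODULUS if force else ((~b) & MASK18)
```

The real modules only differ by which constant is hardwired and whether the pass path goes through inverters.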
And that's about it for the bulk of the datapath.
-
Number comparison with the iHP PDK
03/21/2026 at 01:28 • 0 comments

It didn't go as planned.
The comparator doesn't use NAND/NOR because that actually makes it larger and/or slower, and what matters is the critical path along the cascade. Fortunately there is A21OI that helps !
![]()
.
-
Testing the Hammer
03/10/2026 at 04:18 • 0 comments

I have a first functional circuit in ASIC and I must write the Python code that tests the sea-of-xor.
There are 19 key values to test: 0 and the 18 single-bit-at-1 configurations.
Then throw a few "random" values for good measure, since the rest should be a linear combination of all the 18 single-bit cases, right ? And the all-ones case of course.
But then the results of some test vectors could become test vectors themselves. So, some VHDL later, we get:
111111111111111111 101011111000110100
000000000000000000 000000000000000000
000000000000000001 110111011111011001
000000000000000010 110111011111011111
000000000000000100 110111101111011111
000000000000001000 110111101110011111
000000000000010000 010111100110110101
000000000000100000 110001101011001010
000000000001000000 000000011111111000
000000000010000000 101110111001000000
000000000100000000 111111000010100000
000000001000000000 111111011111111001
000000010000000000 111110111111111001
000000100000000000 111110111111110101
000001000000000000 001101111111010101
000010000000000000 000111100010110101
000100000000000000 110111000000100110
001000000000000000 111111000000101100
010000000000000000 100001001101110000
100000000000000000 000000011111111010
111111111000000000 011100010000010101
000000000111111111 110111101000100001
101010101010101010 100110111110100101
010101010101010101 001101000110010001
110111011111011001 011011011100100111
110001101011001010 000011011010101011
000000011111111010 110110111000100111
011011011100100111 101100110110001111
001101111111010101 111100010010000001
011011010110110101 011101101001011100
100100101001001010 110110010001101000
110110010001101000 101001010101001101
111100010010000001 001110001011100000
110110111000100111 101000101011111111
100110111110100101 101111000010110000
000011011010101011 100011000001110011
101000101011111111 110110010101011011
100011000001110011 101100010000011011
001110001011100000 101111110000110100
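Since the sea-of-xor is purely linear over GF(2) (note that the zero input maps to zero), the output of any word must equal the XOR of the outputs of its set bits. A quick sanity sketch of the table above, with the single-bit responses copied in, input bit 0 (LSB) first:

```python
# single-bit responses copied from the table above, input bit 0 first
single_bit_out = [int(s, 2) for s in [
    "110111011111011001", "110111011111011111", "110111101111011111",
    "110111101110011111", "010111100110110101", "110001101011001010",
    "000000011111111000", "101110111001000000", "111111000010100000",
    "111111011111111001", "111110111111111001", "111110111111110101",
    "001101111111010101", "000111100010110101", "110111000000100110",
    "111111000000101100", "100001001101110000", "000000011111111010",
]]

def predict(x):
    """Predict the sea-of-xor output of x by XOR-ing single-bit responses."""
    out = 0
    for i in range(18):
        if (x >> i) & 1:
            out ^= single_bit_out[i]
    return out

# the "random" vectors in the table agree with the linear prediction
assert predict(int("1" * 18, 2)) == int("101011111000110100", 2)
assert predict(int("101010101010101010", 2)) == int("100110111110100101", 2)
```

If one of these assertions ever fails, either the table was mis-copied or the circuit is not the pure XOR network it is supposed to be.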
Now I must find again how Python encodes binary data.
.....
It seems to work.
![]()
except that it outputs MSB first.
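Re-finding how Python handles it: `int(s, 2)` parses the leftmost character as the MSB, and `format(x, '018b')` prints MSB first too, which is likely the mismatch noted above; reversing the string gives an LSB-first wire order. A quick reminder sketch:

```python
word = "000000000000000101"      # 18-bit string, MSB first
x = int(word, 2)                 # parse: leftmost char is the MSB
assert x == 5
msb_first = format(x, "018b")    # zero-padded binary, MSB first again
assert msb_first == word
lsb_first = msb_first[::-1]      # reverse for LSB-first wire order
assert lsb_first[0] == "1"       # bit 0 of 5 is set
```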
I consider making a subcircuit just to implement and test the Hammer18 in 3 modes: direct, encode and decode
![]()
..
Result: it works well.
![]()
and the current circuit uses only 22% of the tile.
Routing stats

| Utilisation (%) | Wire length (µm) |
|---|---|
| 21.265 | 17366 |

Cell usage by category

| Category | Cells | Count |
|---|---|---|
| Fill | decap fill | 1742 |
| OR | xor2 or2 | 83 |
| Flip Flops | dfrbpq dfrbp sdfrbpq | 69 |
| Misc | dlygate4sd3 | 67 |
| Buffer | buf | 66 |
| Multiplexer | mux2 | 36 |
| Inverter | inv | 21 |
| Combo Logic | a22oi | 9 |
| NOR | nor4 | 4 |
| AND | and2 and4 | 2 |

357 total cells (excluding fill and tap cells)
The speed easily reaches 100 MHz and can be pushed to 200 MHz:
![]()
but the stats show that about half of the surface is DFFs, one quarter is buffers/delays/fanout, and one quarter is logic gates...
Cell type report

| Cell type | Count | Area |
|---|---|---|
| Fill cell | 1742 | 20588.00 |
| Buffer | 3 | 21.77 |
| Clock buffer | 23 | 493.52 |
| Timing Repair Buffer | 107 | 1701.91 |
| Inverter | 9 | 48.99 |
| Clock inverter | 12 | 65.32 |
| Sequential cell | 69 | 3917.29 |
| Multi-Input combinational cell | 146 | 2104.70 |
| Total | 2111 | 28941.49 |

Now, that makes you think...
-
Fixed the Hammer's RTL
03/07/2026 at 16:08 • 0 comments

@alcim.dev wanted to help me with the tapeout and translated Hammer18.vhdl to Hammer18.v with his custom AI tools. So I had to verify everything in detail. To my surprise, the AI did not hallucinate anything, but it uncovered two typos in the original file!
Note: these typos did not affect the efficiency of the whole system; the change in error detection rate is insignificant, probably of the same order of magnitude as using a different permutation, or swapping wires at the input or output. The correction was necessary for overall coherence, and I expect other, better permutations will appear in the future, so in December it was more important to move forward with a "good enough" permutation and assess the overall system performance.
So I have uploaded the new version of Hammer18.vhdl and, for reference, here is the online sim and here is the corresponding diagram:
![]()
This was originally published in 124. Proof, pudding.
The two typos explain the last glitch in 126. Hammer = Hamming Maximiser
For more reference, the original permutations are
Perm1965 = forward( 3 5 9 17 16 10 15 12 1 2 0 14 6 7 13 8 11 4 ) reverse( 10 8 9 0 17 1 12 13 15 2 5 16 7 14 11 6 4 3 )
Perm7515 = forward( 17 2 11 0 6 16 8 9 10 14 1 7 13 15 5 12 4 3 ) reverse( 3 10 1 17 16 14 4 11 6 7 8 2 15 12 9 13 5 0 )
Perm4021 = forward( 4 17 6 5 1 15 7 14 16 13 0 9 10 8 12 2 3 11 ) reverse( 10 4 15 16 0 3 2 6 13 11 12 17 14 9 7 5 8 1 )
but there was another glitch during the graphic transcription.
And the above permutations were designed and meant to be fed into more structured code, such that copy-pasting the above numbers would indeed avoid any typo. But that transformation will be for later.
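In that spirit, here is a minimal sketch that copy-pastes the numbers above and checks that each reverse list really is the inverse of its forward list:

```python
# the three permutations, copied verbatim from the numbers above
PERMS = {
    "Perm1965": ([3,5,9,17,16,10,15,12,1,2,0,14,6,7,13,8,11,4],
                 [10,8,9,0,17,1,12,13,15,2,5,16,7,14,11,6,4,3]),
    "Perm7515": ([17,2,11,0,6,16,8,9,10,14,1,7,13,15,5,12,4,3],
                 [3,10,1,17,16,14,4,11,6,7,8,2,15,12,9,13,5,0]),
    "Perm4021": ([4,17,6,5,1,15,7,14,16,13,0,9,10,8,12,2,3,11],
                 [10,4,15,16,0,3,2,6,13,11,12,17,14,9,7,5,8,1]),
}

for name, (fwd, rev) in PERMS.items():
    assert sorted(fwd) == list(range(18)), name            # a true permutation
    assert all(rev[fwd[i]] == i for i in range(18)), name  # rev undoes fwd
```

All three pass, so at least the published numbers are internally consistent.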
.............................................................................................................
Another interesting realization is that the sea-of-xor, the latch and the combination-xor can work in different orders, allowing a better integration, where it amounts to only one XOR layer in the pipeline. Here are just a few possibilities:
![]()
TODO:
The new version of the file must be verified and the avalanche profile compared to the original diagram.
===> YES !
~/miniMAC$ ./runme_testHammer.sh
total:200
7 16 100001001101110000
8 14 110111000000100110
8 6 000000011111111000
8 7 101110111001000000
8 8 111111000010100000
9 13 000111100010110101
9 15 111111000000101100
9 17 000000011111111010
9 5 110001101011001010
11 4 010111100110110101
12 12 001101111111010101
13 0 110111011111011001
14 3 110111101110011111
15 10 111110111111111001
15 1 110111011111011111
15 11 111110111111110101
15 2 110111101111011111
15 9 111111011111111001
TODO:
Solve the fanout imbalance issues
.
though so far, the Hammer circuit easily fits in the current 10ns cycle time.
-
Hammer in ASIC
03/04/2026 at 02:16 • 0 comments

So I got a pair of tiles on the soon-to-tape-out iHP26 run by Tiny Tapeout.
The original idea was: one tile works as an encoder, another as a decoder. But then I found out that I can reconfigure the Hammer as encoder or decoder with a mere MUX, and I originally designed the whole error encoder and decoder (including gPEAC) to be much smaller than a tile. So the whole encoder and decoder could both fit, but you never know.
That's all fine but there are 2 big issues: the limited IO pins, and getting something simple done first. And the mode pin (RX/TX) reduces the pin budget to 23.
A first, simpler version would just implement the Hammer circuit, which requires 18 bits in and 18 bits out and exceeds the budget. So I have to multiplex: take the circuit below and replicate it 9 times (except for the control signals).
![]()
Then translating the VHDL code to Verilog should be a breeze.
For the mode selection, it's just a MUX:
![]()
I wonder why I hadn't spotted this simplification earlier.
Anyway, I didn't use this approach, as I decided to implement both encoder and decoder in order to create an on-chip "loopback":
![]()
Check the doc at https://github.com/YannGuidon/miniMAC_tx/blob/main/docs/info.md
-
Article + on hold
02/13/2026 at 03:01 • 0 comments

An article describing the Hammer will soon be published in the French "Hackable" magazine.
But I must reorient my priorities at this moment, to prepare the upcoming March tapeout of TinyTapeout #26...
The transceiver is far from ready and I must also focus heavily on #miniPHY : real-life tests are planned for the end of the month as well. Which made me drift toward low-voltage pulse generation with avalanche through Germanium transistors...
.
20260304:
The article is available!
"Distance de Hamming maximale : la clé de la détection d’erreurs" ("Maximal Hamming distance: the key to error detection")
Hackable n°65 / March 2026
.
.
-
Gaming the error detection
01/30/2026 at 01:00 • 0 comments

The recent Success has been resounding but the battle is not totally over.
Remember the Golden Rule: error detection depends only on the density of parity bits, and the probability of a missed error is then 2^-n. That's it.
There is still a flaw in the test that I have created : it only tests consecutive bit flips within one word, not several words with some bits flipped here and there.
Using several words, there is an easy method to create an error pattern that takes much longer to detect:
- First, flip a bit in the first word, choosing the one that takes the longest to percolate to one of the PEAC control outputs. This could create a wrong sequence that lasts dozens and dozens of cycles.
- Then, the Hammer circuit is looked up again and again to generate the counter-signal that "masks" the original flipped bit.
This is a blueprint to alter one selected bit and there are 18 sequences, one for each bit position.
But the possible harm (if this is an "aggression") is pretty ... limited.
- It is barely possible to control the state of the PEAC registers. There is no way to know if or when a single flipped bit would avalanche toward the control bits, in the first cycle or any later iteration. It would take "a dozen cycles" on average, but it's too dependent on the timing...
- The aggression would affect one, or just a few bits, because otherwise the error would be quickly caught by PEAC. So the number of potential targets (the attack surface) is low.
- The final cleared double-word blows the whole scheme away.
So the joint use of additive and XOR circuits shows that they protect each other, better than pure XOR-based or Add-based solutions alone. Making up alterations that can last an arbitrarily long time is possible but increasingly pointless, and the final checksum validates the whole transaction in case the packet is too short to let the alteration bubble up in the PEAC.
The system could be made even tighter, by routing some of the Hammer's bits and XORing with the PEAC's decoder output. This would foil some attacks but the normal operation would not be better (the few XOR gates would add marginal complexity but the latency would increase).
-
No more gPEAC ?
01/04/2026 at 11:48 • 0 comments

The log 181. PEAC w18 is a mixed bag: there are good things but, overall, the less good aspects stick out.
Given the great performance bump introduced by the Hammer circuit, I wonder why I still keep the gPEAC layer. There are two reasons: it's the best scrambler, and though the very long periods are great, more importantly it can't be "crashed" (which is a flaw of LFSRs).
At a higher level, the system is stronger because it associates two circuits of different nature.
But what if?
.
Removing gPEAC removes the scrambler. Is it required ? Even though the miniPHY handles baseline wander (somehow, at least that's the expectation), and even if it uses a sort of convolutional error correction system, the spectrum still needs to be spread. Scrambling also helps a bit to increase error detection.
LFSRs don't work well: they suffer from easy cancellation. Using the Hammer on the send side would be much better (and it's very tempting) but cancellation remains, even though a wider Hammer could provide hidden states. But it wouldn't work. It probably wouldn't improve error detection, which is already maximised.
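To make "easy cancellation" concrete: any purely XOR-based (additive) scrambler is linear, so a channel bit flip passes through descrambling untouched, and a second flip at the same place cancels the first exactly. A toy sketch (the keystream word here is an arbitrary stand-in, not a real LFSR output):

```python
KEY = 0b101101110010110101      # stand-in keystream word, not a real one

def scramble(x):   return x ^ KEY    # additive scrambling is just XOR
def descramble(x): return x ^ KEY    # ...and so is descrambling

data = 0b000111010101000011
err = 1 << 7                         # one flipped channel bit
# the flip survives descrambling untouched: the scrambler detects nothing
assert descramble(scramble(data) ^ err) == data ^ err
# and a second flip at the same place cancels the first exactly
assert descramble(scramble(data) ^ err ^ err) == data
```

This transparency to error patterns is exactly what the non-linear gPEAC layer is there to break.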
-
Success
12/29/2025 at 17:03 • 0 comments

The Hammer18 circuit fits well inside the NRZI unit and instantly delivers fantastic results. Just as expected. That will be my Christmas then!
Here are the results after 10 million injected errors:
1 : 2241925 - ****************************************************
2 : 5543183 - ********************************************************************************************************************************
3 : 1691752 - ****************************************
4 : 369784 - *********
5 : 112181 - ***
6 : 32917 - *
7 : 6360 - *
8 : 1401 - *
9 : 377 - *
10 : 84 - *
11 : 21 - *
12 : 12 - *
13 : 2 - *
14 : 0 -
15 : 0 -
16 : 0 -
17 : 1 - *
The little 1 at the end is an initialisation bug in the program.
Otherwise, the 4x slope is very apparent: the system has achieved true 2-bit-per-word performance!
There is a little "bump" at the start: 1/4 of the errors are caught immediately, but the next cycle catches 1/2! Then every number is divided by 4, as expected.
- CD0:115 : 115 errors were not caught and passed as the first 0-filled word of a control sequence.
- CD1:6443188 : 2/3 of the detected errors triggered the C/D bit and the rest of the word was not 0. That's 56027× the number that passed with a 0.
- Err:3556696 : the rest (1/3) was caught as number errors: either the number was out of range or the MSB was 1.
I'm still unable to explain why the CD bit catches 2× more errors than the other methods, though I'm not sure it matters. However, we have a way to extrapolate the error handling capability.
10 million (almost 24 bits) gives 2 errors at 13 words; 3 more words (4^3 = 2^6 = 64) will give about one error in 640 million (close to 1 billion).
Notes:
- the error model that was tested here is just one bit. Results will vary a bit depending on the error model. More bits and at different positions will affect the curve a little, but not radically.
- Adding another 0-word during C/D transitions will get us in the 5 billion ballpark for rejection. This is actually a requirement since the gPEAC has a one-word latency (hence the bump at the 2nd word) and an error could come at the last data word and go unnoticed, so a second 0-word acts as a checksum check.
- Since the NRZI+Hamming circuit does a LOT of crazy avalanche, now is the time to check if a more basic binary 18-bit PEAC could work too. I'm looking back at old logs to find some already-calculated data, and there is
- 19. Even more orbits ! : primary orbit of 18 : 172.662.654 (instead of 34.359.738.368 to pass, or 0.5%)
- 44. Test jobs : 18: Total of all reachable arcs: 68719736689
- 90. Post-processing : Width=18 Total= 34359868344 vs 34359869438 (missing 1094)
In fact I now realise that I have very little clue about the topology of w18. I'm taking care of this at 181. PEAC w18.
And I still need to fix this tiny little bug in the program, which leaves one uncaught error. I didn't notice it before because I always got many leftovers, but the bug still appears with no NRZ or Hamming avalanche, even after thousands of cycles: my test code must have a problem somewhere.
....
And it's a weird issue with something that does not clear a register somewhere; it's worked around by double-resetting the circuit. Two clocks seem to solve it, but what, and where...?
But at least I can get clean outputs:
100 errors:
1 : 23 - ***********************
2 : 61 - *************************************************************
3 : 13 - *************
4 : 0 -
5 : 3 - ***
1000 errors:
1 : 229 - ********************************
2 : 582 - ********************************************************************************
3 : 154 - **********************
4 : 23 - ****
5 : 8 - **
6 : 3 - *
7 : 1 - *
10K errors
1 : 2236 - ********************************
2 : 5625 - ********************************************************************************
3 : 1635 - ************************
4 : 351 - *****
5 : 108 - **
6 : 31 - *
7 : 11 - *
8 : 3 - *
100K errors:
1 : 22343 - *********************************
2 : 55555 - ********************************************************************************
3 : 16765 - *************************
4 : 3771 - ******
5 : 1121 - **
6 : 348 - *
7 : 79 - *
8 : 14 - *
9 : 2 - *
10 : 1 - *
11 : 1 - *
1M errors:
1 : 223252 - *********************************
2 : 554817 - ********************************************************************************
3 : 169397 - *************************
4 : 37354 - ******
5 : 11003 - **
6 : 3274 - *
7 : 701 - *
8 : 151 - *
9 : 40 - *
10 : 8 - *
11 : 3 - *
10 millions:
1 : 2231957 - *********************************
2 : 5546325 - ********************************************************************************
3 : 1695563 - *************************
4 : 371915 - ******
5 : 112766 - **
6 : 33113 - *
7 : 6437 - *
8 : 1426 - *
9 : 369 - *
10 : 98 - *
11 : 24 - *
12 : 3 - *
13 : 2 - *
14 : 2 - *
len:1 CD0:138 CD1:6449726 Err:3550136 Missed:0 Ham:1 NoNRZI:0
The progression 5, 7, 8, 11, 11, 14 has some hiccups... Maybe the PRNG is not random enough?
Anyway, it is great to finally get rid of the "long tail"! Look at this amazingly compliant logplot!
![]()
The 10M slope converges to 15, thus 16 words would be good for 100M. High-safety protocols would still work with 16-word buffers but keep the last one in quarantine too.
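The "divide by 4 per extra word" slope can also be checked numerically on the 10M-error histogram above. This is only a sanity sketch, with the counts copied from the log:

```python
# bins 1..11 of the 10M-error run, copied from the histogram above
counts = [2231957, 5546325, 1695563, 371915, 112766, 33113,
          6437, 1426, 369, 98, 24]

# from the 3rd bin onward, the decay should average about 4x per word
steps = len(counts) - 3                            # ratios from bin 3 to bin 11
geo_mean = (counts[2] / counts[-1]) ** (1 / steps) # geometric mean of the decay
assert 3.5 < geo_mean < 4.5                        # close to the expected 4x
```

The geometric mean lands very close to 4, confirming the 2-bit-per-word figure without eyeballing the logplot.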
And here is another logplot that compares the slope versus the number of (consecutive) flipped bits.
![]()
Get it there : miniMAC_2026_20251230.tbz
-
Looping the Hammer
12/29/2025 at 00:29 • 0 comments

I tried to feed the circuit back into itself to see if loops appear, and how long they would be. I start with one bit set:
- Start= 0 or 11 => cycle in 1777 cycles
- 1 : 3556 cycles
- 2 leads to 5 or 16 : 5334
- 3 : not part of an orbit, leads to a 10668-loop
- 4 : leads to 1
- 6 or 8 : 10667
- 7 : 2666
- 9 : not part of a cycle, leads to an 889-loop
- 10 leads to 6/8
- 12 : not part of a cycle, leads to 10668-loop
- 13 : leads to 5 or 16 : 5334
- 14 : leads to 5 or 16 : 5334
- 15 : loop in 2667
- 17 : not part of a cycle, leads to a 762-loop
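The measurement above is easy to reproduce with a generic tail/period finder. Here is a minimal sketch (a dict of first-visit times), demonstrated on a toy 4-bit LFSR rather than the real Hammer18 step function:

```python
def orbit(step, start):
    """Return (tail length, loop length) of the orbit of `start` under `step`."""
    seen = {}          # state -> time of first visit
    x, t = start, 0
    while x not in seen:
        seen[x] = t
        x = step(x)
        t += 1
    # x is the first revisited state: everything before it is the tail
    return seen[x], t - seen[x]

# toy 4-bit Fibonacci LFSR (taps x^4 + x^3 + 1), NOT the real Hammer18 step
def toy_step(x):
    fb = ((x >> 3) ^ (x >> 2)) & 1
    return ((x << 1) | fb) & 0xF

assert orbit(toy_step, 1) == (0, 15)   # maximal 15-state cycle, no tail
```

Plugging the real 18-bit step function into `orbit()` would reproduce the tail/loop lengths listed above, e.g. (0, 3556) for start = 1.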
Actually the lengths of the loops do not matter a lot (unless they are ridiculously short) since this would assume a stream of data=0 which can't happen due to gPEAC.
The fact that the values change so drastically is a big improvement over the previous simple NRZ scheme, since this totally locks the error, while the NRZ could have its effect cancelled as soon as the next cycle if two bits are flipped at the same location on consecutive cycles.
Since the expected buffer size (16 or 32 words max) is way shorter than the observed loop length, there is no need to optimise further, as it could only impede (a bit) directed attacks, not improve error detection in common cases.
And I expect a big jump in error detection efficiency: this additional convolutional layer adds one word of latency but is the key to achieving true 2-bits-per-word error detection: 15 words will lead to 1 chance in a billion of leaking an error, and 32 words (64 bits) will make it virtually impossible to pass in real-life scenarios.
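The arithmetic behind those figures, spelled out (2 bits of detection strength gained per word):

```python
# 2 bits of error-detection strength per word in flight
assert 2 ** (2 * 15) == 1_073_741_824   # 15 words -> ~1e9: one in a billion
assert 2 * 32 == 64                     # 32 words -> 64 bits of protection
```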
This also means that a 32-word buffer is all that's needed. At high/medium error rates, there is no need to transmit "empty commands" anymore, saving 2 or 4 intermediate checksums, or about 1/16th of the bandwidth! So this new unit is very important for overall efficiency, though it couldn't be enough all by itself: its properties are complementary to those of the gPEAC layer. It's the pair working together that reaches the theoretical limit.
That new unit also over-scrambles the transmitted data stream. This is not the intended function but it does it (somehow) anyway. So the data's assumed properties must be re-evaluated, or simply discarded. This implies that the #miniPHY should expect absolutely random data, with no special cases... This removes one of the (initially supposed) advantages of gPEAC, but it's for the overall best.
Yann Guidon / YGDES











