Project Logs


Mapping the pipeline to iHP PDK
Yann Guidon / YGDES • 03/21/2026 at 14:46 • 0 comments
It's tempting to just plop the VHDL source code and let the synthesiser do the heavy lifting.

But the pipeline is in Verilog and I use structural code only so I must map (actually synthesise) the circuits by hand.

For this a clear view of the circuit is essential and circuitJS helps, but doing that also makes me reconsider several choices and the VHDL coding style must be deeply adapted.

The last log has mapped the comparator, so that's one thing off the list.

The remaining circuits are inventoried.
- The adder : I'll use dumb RTL style since I'm not yet operational with Jerem's Logilib. No time to dig further, it will work.
- The DFF: there are three cases.
  - X and T are initialised with INIT_X, have feedback/enable
  - Y and B are initialised with INIT_Y, have feedback/enable
  - A is not initialised, has feedback/enable
  The case of A is already handled by modules.v:dffen_x18(), the others are a bit harder. I have found the solution in the PDK with sg13g2_sdfbbp_1 ! So here comes dffen_rs_x18() with independent SET_B and RESET_B pins. This way I can create the necessary registers.
- There are muxes, to select the operands during each phase.
  - OPM is a basic mux2 done by mux2_x18() (for both sides)
  - OPX, OPY2 and OPT are muxed with a constant ADJUST.
  - OPB2 is muxed with the modulus and B is inverted.
  That's 2 circuits to create here, one just forces the constant, the other also inverts the operand. Here again the PDK does a great job. See the simulations :
  So that's 2 modules to write using only these gates: ConstAdjOrPass() and ConstModOrNeg().
And that's about it for the bulk of the datapath.
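As a sanity check before writing the two structural modules, here is a minimal behavioural sketch in Python. It assumes that "forcing a constant" means a per-bit choice between two gates depending on the constant bit, which is one plausible reading of the description above; the gates actually picked from the PDK are not shown in this log. The `ADJUST` and `MODULUS` values below are hypothetical placeholders, and the `*_gates` function names are mine.

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1

# Hypothetical placeholder constants: the real ADJUST and modulus values
# are not given in this log.
ADJUST = 0b000000000000000110
MODULUS = 0b101010101010101011

def const_adj_or_pass(x, sel, const=ADJUST):
    """Reference behaviour: pass x when sel=0, force the constant when sel=1."""
    return const if sel else x & MASK

def const_mod_or_neg(b, sel, const=MODULUS):
    """Reference behaviour: pass NOT b when sel=0, force the constant when sel=1."""
    return const if sel else ~b & MASK

def const_adj_or_pass_gates(x, sel, const=ADJUST):
    """Per bit: OR(x, sel) where the constant bit is 1, AND(x, NOT sel) where it is 0."""
    s = MASK if sel else 0
    return ((x | s) & const) | ((x & ~s) & ~const & MASK)

def const_mod_or_neg_gates(b, sel, const=MODULUS):
    """Per bit: NAND(b, NOT sel) where the constant bit is 1, NOR(b, sel) where it is 0."""
    s = MASK if sel else 0
    nand = ~(b & ~s) & MASK
    nor = ~(b | s) & MASK
    return (nand & const) | (nor & ~const & MASK)

# Compare the gate-level view with the mux view on a few 18-bit values.
SAMPLES = [0, 1, MASK, 0b101010101010101010, 0b010101010101010101, 0x12345]
for value in SAMPLES:
    for sel in (0, 1):
        assert const_adj_or_pass_gates(value, sel) == const_adj_or_pass(value, sel)
        assert const_mod_or_neg_gates(value, sel) == const_mod_or_neg(value, sel)
```

If this reading is right, each bit of either module costs a single 2-input gate, chosen at design time from the constant, rather than a full mux2.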
Number comparison with the iHP PDK
Yann Guidon / YGDES • 03/21/2026 at 01:28 • 0 comments

It didn't go as planned.

The comparator doesn't use NAND/NOR because that actually makes it larger, and what matters is the critical path along the cascade. Fortunately there is the A21OI gate, which helps!
Comparator circuit

Testing the Hammer

Yann Guidon / YGDES • 03/10/2026 at 04:18 • 0 comments

I have a first functional circuit in ASIC and I must write the Python code that tests the sea-of-xor.

There are 19 key values to test: 0 and the 18 single-bit-at-1 configurations.

Then throw a few "random" values for good measure, since the rest should be a linear combination of all the 18 single-bit cases, right ? And the all-ones case of course.

But then the results of some test vectors could become vectors themselves. So some VHDL later and we get

111111111111111111  101011111000110100  
000000000000000000  000000000000000000  
000000000000000001  110111011111011001  
000000000000000010  110111011111011111  
000000000000000100  110111101111011111  
000000000000001000  110111101110011111  
000000000000010000  010111100110110101  
000000000000100000  110001101011001010  
000000000001000000  000000011111111000  
000000000010000000  101110111001000000  
000000000100000000  111111000010100000  
000000001000000000  111111011111111001  
000000010000000000  111110111111111001  
000000100000000000  111110111111110101  
000001000000000000  001101111111010101  
000010000000000000  000111100010110101  
000100000000000000  110111000000100110  
001000000000000000  111111000000101100  
010000000000000000  100001001101110000  
100000000000000000  000000011111111010  
111111111000000000  011100010000010101  
000000000111111111  110111101000100001  
101010101010101010  100110111110100101  
010101010101010101  001101000110010001  
110111011111011001  011011011100100111  
110001101011001010  000011011010101011  
000000011111111010  110110111000100111  
011011011100100111  101100110110001111  
001101111111010101  111100010010000001  
011011010110110101  011101101001011100  
100100101001001010  110110010001101000  
110110010001101000  101001010101001101  
111100010010000001  001110001011100000  
110110111000100111  101000101011111111  
100110111110100101  101111000010110000  
000011011010101011  100011000001110011  
101000101011111111  110110010101011011  
100011000001110011  101100010000011011  
001110001011100000  101111110000110100
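The "linear combination" claim can be checked directly against the table above. The sketch below is a host-side check, not the author's test bench: it rebuilds the response to a multi-bit input by XORing the 18 single-bit responses, then compares with the five composite rows listed in `KNOWN`. The names `BASIS`, `KNOWN` and `predict` are mine; the remaining rows of the table can be added to `KNOWN` the same way.

```python
# Responses to the 18 single-bit inputs, copied from the table above.
# Index i is the response to the input with only bit i set
# (bit 0 is the rightmost character of the input string).
BASIS = [int(s, 2) for s in """
110111011111011001 110111011111011111 110111101111011111
110111101110011111 010111100110110101 110001101011001010
000000011111111000 101110111001000000 111111000010100000
111111011111111001 111110111111111001 111110111111110101
001101111111010101 000111100010110101 110111000000100110
111111000000101100 100001001101110000 000000011111111010
""".split()]

# A few composite rows from the same table: input -> output.
KNOWN = {
    "111111111111111111": "101011111000110100",
    "111111111000000000": "011100010000010101",
    "000000000111111111": "110111101000100001",
    "101010101010101010": "100110111110100101",
    "010101010101010101": "001101000110010001",
}

def predict(value):
    """XOR together the single-bit responses of every bit set in value."""
    out = 0
    for bit, response in enumerate(BASIS):
        if (value >> bit) & 1:
            out ^= response
    return out

for text_in, text_out in KNOWN.items():
    got = predict(int(text_in, 2))
    status = "ok" if got == int(text_out, 2) else "MISMATCH"
    print(text_in, format(got, "018b"), status)
```

A mismatch on any row would mean either a transcription error or a non-linear path somewhere between input and output.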

Now I must find again how Python encodes binary data.

.....

It seems to work, except that it outputs MSB first.
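For reference, a minimal reminder of the Python built-ins that convert between bit strings, integers and bytes, including a bit reversal for when the order comes out MSB first. This is generic Python, not the actual test script; the helper names are mine.

```python
WIDTH = 18

def parse_bits(text):
    """'000000000000000010' -> 2 (leftmost character is the MSB)."""
    return int(text, 2)

def format_bits(value, width=WIDTH):
    """2 -> '000000000000000010', zero-padded, MSB first."""
    return format(value, f"0{width}b")

def reverse_bits(value, width=WIDTH):
    """Swap the bit order inside a width-bit word (MSB first <-> LSB first)."""
    return int(format_bits(value, width)[::-1], 2)

def to_bytes_le(value):
    """Pack an 18-bit word into 3 bytes, least significant byte first."""
    return value.to_bytes(3, "little")

word = parse_bits("000000000000000010")
print(format_bits(word))                 # 000000000000000010
print(format_bits(reverse_bits(word)))   # 010000000000000000
print(to_bytes_le(word))                 # b'\x02\x00\x00'
```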

I'm considering making a subcircuit just to implement and test the Hammer18 in 3 modes: direct, encode and decode.

Result: it works well, and the current circuit uses only about 21% of the tile.

Routing stats

Utilisation (%)	Wire length (um)
21.265	17366

Cell usage by Category

Category	Cells	Count
Fill	decap fill	1742
OR	xor2 or2	83
Flip Flops	dfrbpq dfrbp sdfrbpq	69
Misc	dlygate4sd3	67
Buffer	buf	66
Multiplexer	mux2	36
Inverter	inv	21
Combo Logic	a22oi	9
NOR	nor4	4
AND	and2 and4	2

357 total cells (excluding fill and tap cells)

The speed easily reaches 100MHz and can be pushed to 200MHz, but the stats show that about half of the surface is DFFs, 1/4 is buffers/delays/fanouts, and 1/4 is logic gates...

Cell type report:                     Count     Area
Fill cell                              1742   20588.00
Buffer                                    3      21.77
Clock buffer                             23     493.52
Timing Repair Buffer                    107    1701.91
Inverter                                  9      48.99
Clock inverter                           12      65.32
Sequential cell                          69    3917.29
Multi-Input combinational cell          146    2104.70
Total                                  2111   28941.49
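The "half / quarter / quarter" split can be recomputed from the area column of the report above, leaving the fill cells out. The grouping into four families is my own reading of the report, not something the tool outputs.

```python
# Areas copied from the cell type report above (fill cells excluded).
AREA = {
    "Buffer": 21.77,
    "Clock buffer": 493.52,
    "Timing Repair Buffer": 1701.91,
    "Inverter": 48.99,
    "Clock inverter": 65.32,
    "Sequential cell": 3917.29,
    "Multi-Input combinational cell": 2104.70,
}

# Assumed grouping into families, for a rough split only.
GROUPS = {
    "flip-flops": ["Sequential cell"],
    "buffers": ["Buffer", "Clock buffer", "Timing Repair Buffer"],
    "inverters": ["Inverter", "Clock inverter"],
    "logic gates": ["Multi-Input combinational cell"],
}

active_area = sum(AREA.values())
SHARE = {
    name: 100.0 * sum(AREA[cell] for cell in cells) / active_area
    for name, cells in GROUPS.items()
}

for name, percent in SHARE.items():
    print(f"{name:12s} {percent:5.1f} %")
```

With these numbers the flip-flops come out slightly under half of the non-fill area, and buffers and logic gates at about a quarter each.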

now, that makes you think..

Fixed the Hammer's RTL

Yann Guidon / YGDES • 03/07/2026 at 16:08 • 0 comments

@alcim.dev wanted to help me with the tapeout and translated Hammer18.vhdl to Hammer18.v with his custom AI tools. So I had to verify everything in detail. To my surprise, the AI did not hallucinate anything, but it uncovered two typos in the original file!

Note: these typos did not affect the efficiency of the whole system, the change in error detection rate is insignificant, probably the same order of magnitude as using a different permutation, or swapping wires at the input or output. The correction was necessary for overall coherence reasons, and I expect other, better permutations will appear in the future so it was more important in December to move forward, with a "good enough" permutation and assess the overall system performance.

So I have uploaded the new version of Hammer18.vhdl and, for reference, here is the online sim and here is the corresponding diagram:

This was originally published in 124. Proof, pudding.

The two typos explain the last glitch in 126. Hammer = Hamming Maximiser

For more reference, the original permutations are

Perm1965 =
  forward(  3  5  9 17 16 10 15 12  1  2  0 14  6  7 13  8 11  4 )
  reverse( 10  8  9  0 17  1 12 13 15  2  5 16  7 14 11  6  4  3 )
Perm7515 =
  forward( 17  2 11  0  6 16  8  9 10 14  1  7 13 15  5 12  4  3 )
  reverse(  3 10  1 17 16 14  4 11  6  7  8  2 15 12  9 13  5  0 )
Perm4021 =
  forward(  4 17  6  5  1 15  7 14 16 13  0  9 10  8 12  2  3 11 )
  reverse( 10  4 15 16  0  3  2  6 13 11 12 17 14  9  7  5  8  1 )

but there was another glitch during the graphic transcription.

And the above permutations were designed and meant to be fed into more structured code, such that copy-pasting the above numbers would indeed avoid any typo. But that transformation will be for later.
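Until that structured code exists, a copy-paste check is cheap. The sketch below takes the numbers exactly as listed above and verifies that each list is a permutation of 0..17 and that each `reverse` list is the inverse of its `forward` list. It says nothing about how the permutations are wired in Hammer18.vhdl; the helper names are mine.

```python
PERMS = {
    "Perm1965": (
        "3  5  9 17 16 10 15 12  1  2  0 14  6  7 13  8 11  4",
        "10  8  9  0 17  1 12 13 15  2  5 16  7 14 11  6  4  3",
    ),
    "Perm7515": (
        "17  2 11  0  6 16  8  9 10 14  1  7 13 15  5 12  4  3",
        "3 10  1 17 16 14  4 11  6  7  8  2 15 12  9 13  5  0",
    ),
    "Perm4021": (
        "4 17  6  5  1 15  7 14 16 13  0  9 10  8 12  2  3 11",
        "10  4 15 16  0  3  2  6 13 11 12 17 14  9  7  5  8  1",
    ),
}

def parse(text):
    """Turn a whitespace-separated list of numbers into a list of ints."""
    return [int(token) for token in text.split()]

def is_permutation(perm):
    """True when every index 0..len-1 appears exactly once."""
    return sorted(perm) == list(range(len(perm)))

def invert(perm):
    """Return inv such that inv[perm[i]] == i."""
    inv = [0] * len(perm)
    for index, target in enumerate(perm):
        inv[target] = index
    return inv

def check(name):
    forward, reverse = (parse(text) for text in PERMS[name])
    return (is_permutation(forward) and is_permutation(reverse)
            and invert(forward) == reverse)

for name in PERMS:
    print(name, "ok" if check(name) else "BROKEN")
```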

.............................................................................................................

Another interesting realization is that the sea-of-xor, the latch and the combination-xor can work in different orders, allowing a better integration in the pipeline, where it amounts to only one XOR layer in the pipeline. Here are only a few possibilities:

TODO:

The new version of the file must be verified and the avalanche profile compared to the original diagram.

===> YES !

~/miniMAC$ ./runme_testHammer.sh
  total:200                                                 
 7   16   100001001101110000                                                 
 8   14   110111000000100110                                                 
 8    6   000000011111111000                                                 
 8    7   101110111001000000                                                 
 8    8   111111000010100000                                                 
 9   13   000111100010110101                                                 
 9   15   111111000000101100                                                 
 9   17   000000011111111010                                                 
 9    5   110001101011001010                                                 
11    4   010111100110110101                                                 
12   12   001101111111010101                                                 
13    0   110111011111011001                                                 
14    3   110111101110011111                                                 
15   10   111110111111111001                                                 
15    1   110111011111011111                                                 
15   11   111110111111110101                                                 
15    2   110111101111011111                                                 
15    9   111111011111111001
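The three columns above read as Hamming weight, index of the flipped input bit, and the resulting output word. A short sketch that re-derives the first column and the total from the third one, using the listing as printed (this is not the content of runme_testHammer.sh, which is not reproduced here):

```python
# (weight, input bit, output word), copied from the listing above.
LISTING = """
 7 16 100001001101110000
 8 14 110111000000100110
 8  6 000000011111111000
 8  7 101110111001000000
 8  8 111111000010100000
 9 13 000111100010110101
 9 15 111111000000101100
 9 17 000000011111111010
 9  5 110001101011001010
11  4 010111100110110101
12 12 001101111111010101
13  0 110111011111011001
14  3 110111101110011111
15 10 111110111111111001
15  1 110111011111011111
15 11 111110111111110101
15  2 110111101111011111
15  9 111111011111111001
"""

rows = []
for line in LISTING.strip().splitlines():
    weight, bit, word = line.split()
    rows.append((int(weight), int(bit), word))

# The weight of each response is simply its number of '1' characters.
weights_match = all(word.count("1") == weight for weight, _, word in rows)
total = sum(word.count("1") for _, _, word in rows)

print("weights consistent:", weights_match)
print("total:", total, "average per input bit:", total / len(rows))
```

The average is a bit over 11 flipped output bits per flipped input bit, with a spread from 7 to 15, which is the imbalance the fanout TODO below refers to.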


TODO:

Solve the fanout imbalance issues

though so far, the Hammer circuit easily fits in the current 10ns cycle time.

Hammer in ASIC
Yann Guidon / YGDES • 03/04/2026 at 02:16 • 0 comments

So I got a pair of tiles on the iHP26 run by Tiny Tapeout, which tapes out soon.

The original idea was : one tile works as an encoder, another as a decoder. But then I found out that I can reconfigure the Hammer as encoder or decoder with a mere MUX, and I originally designed the whole error encoder and decoder (including GPEAC) to be much smaller than a tile. So both the whole encoder and decoder could fit but you never know.

That's all fine but there are 2 big issues : limitation of IO pins and getting something simple done at first. And the mode pin (RX/TX) reduces the pin budget to 23.

A first simpler version would just implement the Hammer circuit, which requires 18 bits in and 18 bits out, which exceeds the budget. So I have to multiplex. Take the circuit below and replicate it 9 times (except the control signals).

Then translating the VHDL code to Verilog should be a breeze.

For the mode selection, it's just a MUX:

I wonder why I hadn't spotted this simplification earlier.

Anyway, I didn't use this approach, as I decided to implement both encoder and decoder in order to create an onchip "loopback":

Check the doc at https://github.com/YannGuidon/miniMAC_tx/blob/main/docs/info.md
Article + on hold
Yann Guidon / YGDES • 02/13/2026 at 03:01 • 0 comments

An article describing the Hammer will soon be published in the French "Hackable" magazine.

But I must reorient my priorities at this moment, as I must prepare the upcoming March tapeout of TinyTapeout#26 ...

The transceiver is far from ready and I must also focus heavily on #miniPHY : real-life tests are planned for the end of the month as well. Which made me drift toward low-voltage pulse generation with avalanche through Germanium transistors...


20260304:

The article is available!

Distance de Hamming maximale : la clé de la détection d’erreurs
Hackable n°65 / mars 2026

Gaming the error detection
Yann Guidon / YGDES • 01/30/2026 at 01:00 • 0 comments
The recent Success has been resounding but the battle is not totally over.

Remember that the Golden Rule is that the error detection is related only to the density of parity bits, and then the probability is 2^-n. That's it.

There is still a flaw in the test that I have created : it only tests consecutive bit flips within one word, not several words with some bits flipped here and there.

Using several words, there is an easy method to create an error pattern that takes much longer to detect:
- First, flip a bit in the first word, which takes the longest to percolate to one of the PEAC control outputs. This could create a wrong sequence that lasts dozens and dozens of cycles.
- Then, the Hammer circuit is looked up again and again to generate the counter-signal that "masks" the original flipped bit.
This is a blueprint to alter one selected bit and there are 18 sequences, one for each bit position.
But the possible harm (if this is an "aggression") is pretty ... limited.
- It is barely possible to control the state of the PEAC registers. There is no way to know if or when a single flipped bit would avalanche toward the control bits, in the first cycle or any later iteration. It would take "a dozen cycles" on average but it's too dependent on the timing...
- The aggression would affect one, or just a few bits, because otherwise the error would be quickly caught by PEAC. So the number of potential targets (the attack surface) is small.
- The final cleared double-word blows the whole scheme away.
So the joint use of additive and XOR circuits shows that they protect each other, better than just pure XOR or Add-based solutions. Making up alterations that can last an arbitrarily long time is possible but increasingly pointless, and the final checksum validates the whole transaction in case the packet is too short to let the alteration bubble up in the PEAC.
The system could be made even tighter, by routing some of the Hammer's bits and XORing with the PEAC's decoder output. This would foil some attacks but the normal operation would not be better (the few XOR gates would add marginal complexity but the latency would increase).
No more gPEAC ?
Yann Guidon / YGDES • 01/04/2026 at 11:48 • 0 comments

The log 181. PEAC w18 is a mixed bag, there are good things but overall, the less good aspects stick.

Given the great performance bump introduced by the Hammer circuit, I wonder why I still keep the gPEAC layer. There are two reasons: it's the best scrambler, and though the very long periods are great, more importantly it can't be "crashed" (which is a flaw of LFSRs).

At a higher level, the system is stronger because it associates two circuits of different nature.

But what if?
Removing gPEAC removes the scrambler. Is it required ? Even though the miniPHY handles baseline wander (somehow, at least that's the expectation), and even if it uses a sort of convolutional error correction system, the spectrum still needs to be spread. Scrambling also helps a bit to increase error detection.
LFSRs don't work well, they suffer from easy cancellation. Using the Hammer on the send side would be much better (and it's very tempting) but cancellation remains, even though a wider Hammer could provide hidden states. But it wouldn't work. It probably wouldn't improve error detection, which is already maximised.
Success
Yann Guidon / YGDES • 12/29/2025 at 17:03 • 0 comments
The Hammer18 circuit fits well inside the NRZI unit and instantly delivers fantastic results. Just as expected. That will be my Christmas then!

Here are the results after 10 million injected errors:
```
 1 : 2241925 - ****************************************************
 2 : 5543183 - ********************************************************************************************************************************
 3 : 1691752 - ****************************************
 4 :  369784 - *********
 5 :  112181 - ***
 6 :   32917 - *
 7 :    6360 - *
 8 :    1401 - *
 9 :     377 - *
10 :      84 - *
11 :      21 - *
12 :      12 - *
13 :       2 - *
14 :       0 -
15 :       0 -
16 :       0 -
17 :       1 - *
```
The little 1 at the end is an initialisation bug in the program.

Otherwise, the 4x slope is very apparent: the system has achieved true 2-bit-per-word performance!

There is a little "bump" at the start, 1/4 of the errors are caught immediately, but the next cycle catches 1/2! Then every number is divided by 4 as expected.
- CD0:115 : 115 errors were not caught and passed as the first 0-filled word of a control sequence.
- CD1:6443188 : 2/3 of the detected errors triggered the C/D bit and the rest of the word was not 0. That's 56027× the number of data that passed with a 0.
- Err:3556696 : the rest (1/3) was caught as number errors: either the number was out of range or the MSB was 1.
I'm still unable to explain why the CD bit catches 2× more errors than the other methods, though I'm not sure it matters. However, we have a way to extrapolate the error handling capability.

10 million (almost 24 bits) give 2 errors at 13 words, so 3 more words (4^3=2^6=64) will give about one error in 640 million (close to 1 billion).
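The slope can be estimated from the histogram itself. The sketch below takes the counts of the 10-million run above and computes the geometric-mean ratio between consecutive bins, from the peak (bin 2) to the last populated bin (13). Ignoring bin 1 (the "bump") and the stray bin 17 (the initialisation bug) is my choice, not the author's method.

```python
import math

# Detection latency (in words) -> number of errors caught, from the run above.
HIST = {
    1: 2241925, 2: 5543183, 3: 1691752, 4: 369784, 5: 112181,
    6: 32917, 7: 6360, 8: 1401, 9: 377, 10: 84, 11: 21, 12: 12, 13: 2,
}
STRAY = 1  # the lone count in bin 17, attributed to an initialisation bug

total = sum(HIST.values()) + STRAY

# Geometric-mean ratio between consecutive bins, from bin 2 to bin 13.
steps = 13 - 2
slope = (HIST[2] / HIST[13]) ** (1.0 / steps)
bits_per_word = math.log2(slope)

print("total injected:", total)
print(f"average ratio per word: {slope:.2f}")
print(f"equivalent bits per word: {bits_per_word:.2f}")
```

The average ratio lands a little under 4, that is a little under 2 bits per word. The last bins hold only a handful of events, so the estimate is noisy.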

Notes:
- the error model that was tested here is just one bit. Results will vary a bit depending on the error model. More bits and at different positions will affect the curve a little, but not radically.
- Adding another 0-word during C/D transitions will get us in the 5 billion ballpark for rejection. This is actually a requirement since the gPEAC has a one-word latency (hence the bump at the 2nd word) and an error could come at the last data word and go unnoticed, so a second 0-word acts as a checksum check.
- Since the NRZI+Hamming circuit does a LOT of crazy avalanche, now comes the time to check if a more basic binary 18-bit PEAC could work too. I'm looking back at old logs, to find some already-calculated data, and there is
  - 19. Even more orbits ! : primary orbit of 18 : 172.662.654 (instead of 34.359.738.368 to pass, or 0.5%)
  - 44. Test jobs : 18: Total of all reachable arcs: 68719736689
  - 90. Post-processing : Width=18 Total= 34359868344 vs 34359869438 (missing 1094)
In fact I now realise that I have very little clue about the topology of w18. I'm taking care of this at 181. PEAC w18.

And I still need to fix this tiny little bug in the program, that leaves one uncaught error. I didn't notice it before because I always got many leftovers but that bug still appears with no NRZ or Hamming avalanche, even after thousands of cycles : my test code must have a problem somewhere.

....

It's a weird issue: something does not clear a register somewhere. Double-resetting the circuit takes care of it (2 clocks seem to solve it), but what and where...?

But at least I can get clean outputs:

100 errors:
```
 1 :     23 - ***********************
 2 :     61 - *************************************************************
 3 :     13 - *************
 4 :      0 -
 5 :      3 - ***
```
1000 errors:
```
 1 :    229 - ********************************
 2 :    582 - ********************************************************************************
 3 :    154 - **********************
 4 :     23 - ****
 5 :      8 - **
 6 :      3 - *
 7 :      1 - *
```
10K errors:
```
 1 :   2236 - ********************************
 2 :   5625 - ********************************************************************************
 3 :   1635 - ************************
 4 :    351 - *****
 5 :    108 - **
 6 :     31 - *
 7...
```
Looping the Hammer
Yann Guidon / YGDES • 12/29/2025 at 00:29 • 0 comments
I tried to feed the circuit from itself and see if loops appear, and how long they would be. I start with one bit set:
- Start= 0 or 11 => cycle in 1777 cycles
- 1 : 3556 cycles
- 2 leads to 5 or 16 : 5334
- 3 : not part of an orbit, leads to a 10668-loop
- 4 : leads to 1
- 6 or 8 : 10667
- 7 : 2666
- 9 : not part of a cycle, leads to a 889-loop
- 10 leads to 6/8
- 12 : not part of a cycle, leads to 10668-loop
- 13 : leads to 5 or 16 : 5334
- 14 : leads to 5 or 16 : 5334
- 15 : loop in 2667
- 17 : not part of a cycle, leads to a 762-loop
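The measurement above amounts to iterating a function on its own output and noting when a value repeats. A minimal sketch of that procedure follows; `find_loop` is my name, and `toy_step` is a tiny stand-in table, since the Hammer function itself is not reproduced here. With the real circuit, `step` would be the 18-bit Hammer mapping and `start` a single-bit word.

```python
def find_loop(step, start):
    """Iterate step() from start until a value repeats.

    Returns (tail, period): the number of steps before entering the loop,
    and the length of the loop. tail == 0 means start is on the loop.
    """
    first_seen = {}
    value, count = start, 0
    while value not in first_seen:
        first_seen[value] = count
        value = step(value)
        count += 1
    tail = first_seen[value]
    return tail, count - tail

# Stand-in function for the demo: 7 -> 0 -> 1 -> 2 -> 3 -> 1 -> ...
TOY = {7: 0, 0: 1, 1: 2, 2: 3, 3: 1}

def toy_step(value):
    return TOY[value]

print(find_loop(toy_step, 7))  # (2, 3): two steps of tail, then a 3-loop
print(find_loop(toy_step, 2))  # (0, 3): already on the loop
```

The dictionary costs one entry per visited state, which is fine here because an 18-bit state space has at most 262144 values.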
Actually the lengths of the loops do not matter a lot (unless they are ridiculously short) since this would assume a stream of data=0 which can't happen due to gPEAC.

The fact that the values change so drastically is a big improvement over the previous simple NRZ scheme, since this totally locks the error, while the NRZ could have its effect cancelled as soon as the next cycle if two bits are flipped at the same location on consecutive cycles.

Since the expected buffer size (16 or 32 words max) is way shorter than the observed loop length, there is no need to optimise further, as it could only impede (a bit) directed attacks, not improve error detection in common cases.

And I expect a big jump in error detection efficiency: this additional convolutional layer adds one word of latency but is the key to achieving true 2-bits-per-word error detection: 15 words will lead to 1 chance in a billion of leaking an error, and 32 words (64 bits) will make it virtually impossible to pass in real-life scenarios.
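The arithmetic behind these figures, assuming the ideal case where every word contributes exactly 2 bits of detection, so that the chance of an error surviving n words is 2^-(2n). The function name is mine.

```python
from fractions import Fraction

def leak_probability(words, bits_per_word=2):
    """Chance that an error survives `words` words, assuming each word
    independently contributes `bits_per_word` bits of detection."""
    return Fraction(1, 2 ** (bits_per_word * words))

for words in (15, 16, 32):
    p = leak_probability(words)
    print(f"{words:2d} words: 1 chance in {p.denominator}")
```

15 words give 2^30, just over a billion, and 32 words give 2^64.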

This also means that a 32-word buffer is all that's needed. In high/medium error rates, there is no need to transmit "empty commands" anymore, saving 2 or 4 intermediate checksums, or about 1/16th of bandwidth! So this new unit is very important for efficiency overall, though it couldn't be enough all by itself: its properties are complementary to those of the gPEAC layer. It's the pair that works together to reach the theoretical limit.

That new unit also over-scrambles the transmitted data stream. This is not the intended function but it does it (somehow) anyway. So the data's properties must be re-evaluated and at least discarded. This implies that the #miniPHY should expect absolutely random data, no special case... This removes one of the (initially supposed) advantages of gPEAC but it's for the overall best.

View all 136 project logs

Hammer18_RTL.tgz x-compressed-tar - 2.94 kB - 03/23/2026 at 14:51
Hammer18.vhdl updated/fixed netlist x-vhdl - 3.83 kB - 03/07/2026 at 16:24
miniMAC_2026_20251230.tbz gPEAC + Hammer18 working together x-bzip-compressed-tar - 193.25 kB - 12/30/2025 at 17:08
Hammer18.tbz implements the error maximiser x-bzip-compressed-tar - 69.42 kB - 12/29/2025 at 00:32
PermParam_20251227.tgz better brute force, better program, better results x-compressed-tar - 7.54 kB - 12/27/2025 at 05:20

miniMAC - Not an Ethernet Transceiver

Description

Details

20260326: #YGMII spin off

20260225: ASIC with Tiny Tapeout

20251113: Architecture update

20250525: spinning #miniPHY off.
