I am still evaluating the options for a massively parallel scanner, described in the previous logs. In case I get to program a high-density FPGA, I wrote a VHDL version of one "core", available in PEAC_core.tgz along with a GHDL script.
I provided a simple wrapper that iterates over the arcs, just like the C version but slower because of the emulation. The point is validation, and it let me spot a single stupid error in the core. But that was only the easy part. The challenge is to connect hundreds or thousands of these small counters without wasting resources or latency.
If I fit 1K cores on an FPGA, the communication bandwidth will be crazy. I might need memory buffers, and probably local schedulers that would each manage their own sub-group of cores. After all, a scheduler is simply a counter, initialised by an external resource, that increments every time a core finishes its job, until a maximum value is reached. It's also a matter of fanout: data must enter and exit (while keeping synchronisation) and FPGAs don't like large buses. I might create a sort of "ring bus", a parallel token ring, probably byte-wide to keep the resource usage low, while the cores are clustered and their results are put in FIFOs...
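To make the "local scheduler is just a counter" idea concrete, here is a rough behavioural sketch in C (not VHDL, and not taken from PEAC_core.tgz). All names (`scheduler_t`, `sched_init`, `sched_request`, `cluster_run`, `CLUSTER_SIZE`) are hypothetical, and the per-job duration is a made-up placeholder: a real core would take as long as its scan takes. The ring bus and the result FIFOs are left out; the sketch only shows a counter being loaded with a sub-range and handing out the next index each time a core becomes idle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLUSTER_SIZE 4  /* assumed number of cores per local scheduler */

/* Hypothetical local scheduler: a counter loaded by an external resource. */
typedef struct {
    uint32_t next;  /* next job index to hand out */
    uint32_t last;  /* last job index of this sub-range (inclusive) */
    bool     done;  /* set once the last index has been handed out */
} scheduler_t;

void sched_init(scheduler_t *s, uint32_t first, uint32_t last)
{
    assert(first <= last);
    s->next = first;
    s->last = last;
    s->done = false;
}

/* Called by an idle core. Returns true and writes the job index
   if work remains, false once the sub-range is exhausted. */
bool sched_request(scheduler_t *s, uint32_t *job)
{
    if (s->done)
        return false;
    *job = s->next;
    if (s->next == s->last)
        s->done = true;   /* explicit flag avoids wrap-around at UINT32_MAX */
    else
        s->next++;
    return true;
}

/* Simulates one cluster tick by tick until every job of the sub-range
   has completed. Returns the number of completed jobs. */
uint32_t cluster_run(uint32_t first, uint32_t last, uint32_t *ticks_out)
{
    scheduler_t s;
    uint32_t busy[CLUSTER_SIZE] = {0};  /* remaining ticks per core, 0 = idle */
    uint32_t completed = 0, ticks = 0;

    sched_init(&s, first, last);
    for (;;) {
        bool active = false;
        for (int i = 0; i < CLUSTER_SIZE; i++) {
            /* A busy core advances; it completes when its count hits 0. */
            if (busy[i] > 0 && --busy[i] == 0)
                completed++;
            /* An idle core asks the scheduler for the next index. */
            if (busy[i] == 0) {
                uint32_t job;
                if (sched_request(&s, &job))
                    busy[i] = 1 + job % 3;  /* placeholder job duration */
            }
            if (busy[i] > 0)
                active = true;
        }
        if (!active)
            break;
        ticks++;
    }
    *ticks_out = ticks;
    return completed;
}
```

Calling `cluster_run(10, 19, &ticks)` hands out the ten indices 10 to 19 across four simulated cores. In hardware, the request would presumably become a small arbiter in front of the counter, since several cores can finish during the same cycle.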
Simulating this in GHDL will be fun!