Compute better, compute more

There are two main, complementary methods to reduce the time needed to scan a whole state space:

compute more arcs in parallel
test for 2 conditions to detect complementary/symmetrical orbits.

The first method yields the best speedup, since it relies on multi-processor parallel processing. So far I can use POSIX threads, but I also plan to use CUDA on an RTX 3070. Until I can set up the software chain, I am also exploring SIMD: AVX (AVX2, for integer operations) promises 256-bit-wide packed integers that GCC seems to handle mostly fine. https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html looks "reasonable", except that I need a bit more than the basic operations: carry wrap-around and conditional jump on comparison will require some intrinsics or even assembly...
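As a rough illustration of "more arcs in parallel" with POSIX threads, here is a minimal sketch. Everything in it is an assumption made for the example: the `step()` function is a simplified PEAC-like add with wrap-around carry, an "arc" is taken to be the walk from (0, start, 0) until x comes back to 0, and the names `arc_length`, `scan_parallel` and `scan_sequential` are hypothetical. Each thread takes every n-th starting point and keeps its own total, so no locking is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Hypothetical PEAC-like state: two values below the modulus plus a carry. */
typedef struct { uint32_t x, y, c; } state_t;

/* One step (assumed form): add with carry, wrap around the modulus m. */
static state_t step(state_t s, uint32_t m) {
    uint32_t t = s.x + s.y + s.c;
    state_t n = { s.y, t, 0 };
    if (t >= m) { n.y = t - m; n.c = 1; }
    return n;
}

/* Steps from (0, start, 0) until x is 0 again; 0 if the bound is hit. */
static uint64_t arc_length(uint32_t start, uint32_t m) {
    state_t s = { 0, start, 0 };
    uint64_t limit = 2ull * m * m;       /* total number of states */
    for (uint64_t n = 1; n <= limit; n++) {
        s = step(s, m);
        if (s.x == 0) return n;
    }
    return 0;
}

/* Each job covers the starts first, first+stride, ... below m. */
typedef struct { uint32_t m, first, stride; uint64_t total; } job_t;

static void *worker(void *arg) {
    job_t *j = arg;
    j->total = 0;
    for (uint32_t s = j->first; s < j->m; s += j->stride)
        j->total += arc_length(s, j->m);
    return NULL;
}

/* Sum of all arc lengths, with the starts 1..m-1 split over nthreads. */
uint64_t scan_parallel(uint32_t m, unsigned nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    job_t job[MAX_THREADS];
    int started[MAX_THREADS];
    uint64_t sum = 0;
    for (unsigned i = 0; i < nthreads; i++) {
        job[i] = (job_t){ m, 1 + i, nthreads, 0 };
        started[i] = pthread_create(&tid[i], NULL, worker, &job[i]) == 0;
        if (!started[i]) worker(&job[i]);  /* fall back to inline work */
    }
    for (unsigned i = 0; i < nthreads; i++) {
        if (started[i]) pthread_join(tid[i], NULL);
        sum += job[i].total;
    }
    return sum;
}

/* Same scan on a single thread, useful as a reference. */
uint64_t scan_sequential(uint32_t m) {
    uint64_t sum = 0;
    for (uint32_t s = 1; s < m; s++) sum += arc_length(s, m);
    return sum;
}
```

Calling `scan_parallel(m, 4)` should return the same total as `scan_sequential(m)`; comparing the two on a small modulus is a cheap sanity check before launching a long run. Build with `-pthread`.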

SIMD code is quite important because this is inherently how GPGPUs work. Developing the AVX version will help make a better CUDA implementation.
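To get a feel for what the SIMD version could look like, here is a small sketch that uses only the GCC vector extensions, with 8 lanes of 32 bits in one 256-bit vector. The step itself is an assumed, simplified PEAC-like form, and `vstep`, `splat` and `vector_matches_scalar` are hypothetical names. The carry wrap-around is done without a jump: the vector comparison produces an all-ones mask in the lanes that overflowed, and the mask selects both the subtraction and the new carry. Detecting the end of an arc in each lane is not covered here, and that is probably where intrinsics come in.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8
/* 8 x 32-bit unsigned lanes, 256 bits in total (GCC vector extension). */
typedef uint32_t v8u __attribute__((vector_size(LANES * sizeof(uint32_t))));

/* One independent (x, y, carry) state per lane. */
typedef struct { v8u x, y, c; } vstate_t;

/* Fill every lane of *r with the same value. */
static inline void splat(v8u *r, uint32_t v) {
    for (int i = 0; i < LANES; i++) (*r)[i] = v;
}

/* Branchless step on all lanes (assumed PEAC-like form). */
static inline void vstep(vstate_t *s, const v8u *m) {
    v8u one;
    splat(&one, 1);
    v8u t = s->x + s->y + s->c;
    v8u wrap = (v8u)(t >= *m);     /* all-ones where the sum reached m */
    s->x = s->y;
    s->y = t - (*m & wrap);        /* subtract m only in wrapped lanes */
    s->c = wrap & one;             /* the carry re-enters on the next step */
}

/* Run the vector step next to a scalar reference; 1 if all lanes agree. */
int vector_matches_scalar(uint32_t m, int steps) {
    assert(m > LANES);
    vstate_t v;
    v8u mv;
    uint32_t x[LANES], y[LANES], c[LANES];
    splat(&mv, m);
    splat(&v.x, 0);
    splat(&v.y, 0);
    splat(&v.c, 0);
    for (int i = 0; i < LANES; i++) {
        v.y[i] = (uint32_t)i + 1;  /* a different start in each lane */
        x[i] = 0; y[i] = (uint32_t)i + 1; c[i] = 0;
    }
    for (int n = 0; n < steps; n++) {
        vstep(&v, &mv);
        for (int i = 0; i < LANES; i++) {
            uint32_t t = x[i] + y[i] + c[i];
            x[i] = y[i];
            c[i] = t >= m;
            y[i] = c[i] ? t - m : t;
            if (v.x[i] != x[i] || v.y[i] != y[i] || v.c[i] != c[i])
                return 0;
        }
    }
    return 1;
}
```

Without `-mavx2`, GCC still compiles this but splits each vector operation into narrower ones, so the same source can be checked on any machine before measuring on AVX2 hardware. The mask-and-select pattern is also close to how divergent lanes are handled on a GPU, which is the point of doing the AVX version first.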

The second method was introduced in "76. More efficient scan of the state space". It only brings a speedup of (almost) 2, but even that is significant: what if a computation ran for 3 months instead of 5? I still have to get the details right though, and this is the priority because the result will be the basis for the parallel versions.

