There are two main, complementary methods to reduce the time needed to scan a whole state space:
- compute more arcs in parallel
- test for 2 conditions to detect complementary/symmetrical orbits.
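To illustrate the first method, here is a minimal POSIX threads sketch of spreading arcs over several threads. It assumes that arcs can be numbered 0..count-1 and scanned independently. `scan_arc()`, `scan_all_arcs()`, `struct job` and `NTHREADS` are hypothetical names, and `scan_arc()` is only a placeholder for the real per-arc work.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 4  /* hypothetical thread count; match the core count */

/* Hypothetical placeholder for the real per-arc work. It returns a cheap
   function of the arc index only so that the sketch runs. */
static uint64_t scan_arc(uint64_t arc)
{
    return arc * arc;
}

struct job {
    uint64_t first;   /* first arc index handled by this thread */
    uint64_t count;   /* total number of arcs to scan */
    uint64_t stride;  /* distance between arcs of one thread (= NTHREADS) */
    uint64_t result;  /* partial result, read after pthread_join() */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    uint64_t acc = 0;
    /* Interleaved split: thread t takes arcs t, t+N, t+2N, ... */
    for (uint64_t a = j->first; a < j->count; a += j->stride)
        acc += scan_arc(a);
    j->result = acc;
    return NULL;
}

/* Scans arcs 0..count-1 on NTHREADS threads and combines the partial
   results. Returns UINT64_MAX if a thread could not be created. */
static uint64_t scan_all_arcs(uint64_t count)
{
    pthread_t tid[NTHREADS];
    struct job jobs[NTHREADS];
    uint64_t total = 0;
    int started = 0;

    for (int t = 0; t < NTHREADS; t++) {
        jobs[t] = (struct job){ .first = (uint64_t)t, .count = count,
                                .stride = NTHREADS, .result = 0 };
        if (pthread_create(&tid[t], NULL, worker, &jobs[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++) {   /* join every started thread */
        pthread_join(tid[t], NULL);
        total += jobs[t].result;
    }
    return started == NTHREADS ? total : UINT64_MAX;
}
```

The interleaved split (thread t takes every NTHREADS-th arc) is one choice among several: it tends to balance the load when the cost of an arc varies with its index, while contiguous blocks would give each thread a more predictable memory range.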
The first method yields the best speedup by relying on multi-processor parallel processing. So far I can use POSIX threads, but I also plan to use CUDA on an RTX 3070. Before I can set up that software chain, I am also exploring SIMD: AVX (AVX2 for integer operations) provides 256-bit-wide packed integers, which GCC seems to handle mostly fine. https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html looks "reasonable", except that I need a bit more than the basic operations: the carry wrap-around and the conditional jump on a comparison will require some intrinsics or even assembly...
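Before reaching for intrinsics, both missing operations can be approximated with GCC vector extensions alone. This is a minimal sketch assuming 32-bit lanes. `v8u32`, `add_with_carry()` and `any_lane_set()` are hypothetical names, and the carry handling shown is ordinary multi-word carry propagation, not necessarily the exact update used in this project.

```c
#include <stdint.h>

/* 8 lanes x 32 bits = 256 bits, the width of an AVX2 register.
   Compile with -mavx2 to get native AVX2 code; without it GCC still
   accepts the code and splits it into narrower operations. */
typedef uint32_t v8u32 __attribute__((vector_size(32)));

/* Lane-wise a + b + cin, with cin holding 0 or 1 in each lane.
   Returns the sum modulo 2^32 and stores the carry-out (0 or 1) in *cout. */
static v8u32 add_with_carry(v8u32 a, v8u32 b, v8u32 cin, v8u32 *cout)
{
    v8u32 t = a + b;              /* wraps modulo 2^32 in each lane */
    /* A vector comparison yields -1 (all ones) where true, 0 where false;
       an unsigned sum smaller than an operand means it wrapped around. */
    v8u32 c1 = (v8u32)(t < a);
    v8u32 s = t + cin;
    v8u32 c2 = (v8u32)(s < t);
    *cout = (c1 | c2) & 1;        /* turn the -1/0 masks into 1/0 */
    return s;
}

/* Returns nonzero if any lane of v is nonzero: the horizontal reduction
   needed before a scalar conditional jump on a vector comparison. */
static int any_lane_set(v8u32 v)
{
    uint32_t acc = 0;
    for (int i = 0; i < 8; i++)
        acc |= v[i];              /* vector subscripting is a GCC/Clang extension */
    return acc != 0;
}
```

The comparison masks replace the carry flag, which SIMD lanes do not expose. For the branch, the AVX intrinsic `_mm256_testz_si256(v, v)` from `<immintrin.h>` (the VPTEST instruction) returns 1 when all bits are zero, so "any lane set" is its negation in a single instruction instead of the loop above.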
SIMD code matters because this is inherently how GPGPUs work, so developing the AVX version will help make a better CUDA implementation.
The second method was introduced in log "76. More efficient scan of the state space". It only brings a speedup of (almost) 2, but even that is significant: what if a computation ran for 3 months instead of 5? I still have to get the details right, though, and this is the priority because the result will be the basis for the parallel versions.