The FHP3 model is outdated from the physics point of view and has been superseded by Lattice Boltzmann models for two decades. However, it remains a simple, powerful, accessible model for desktop simulations of subsonic turbulence. At the turn of the millennium, I thoroughly optimised the computation of particle collisions using a large array of modern methods: SIMD, strip-mining, SMP, boolean reduction... Using the latest and fastest P3@550MHz, I once reached a peak update speed of 150M sites/second, enough to see vortices develop in real time. More information can be found (in French) in my Master's thesis: http://ygdes.com/memoire/ More than 17 years later, computers have evolved to a point where this algorithm would totally scream, thanks to the ubiquitous 128/256-bit registers and on-die SMP. And I have learned a lot more about programming and architecture since then...
Watching https://player.vimeo.com/video/450406346 , at about 32:00 I realise that AVX512 implements ROP3! Had I had this opcode back in 1999, I would have saved SO MUCH DEVELOPMENT TIME!
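For readers who have not met ROP3: it is a three-input bitwise operation whose behaviour is selected by an 8-bit truth table, which is what the AVX-512 VPTERNLOG instructions provide in hardware. Below is a minimal software sketch of the idea on 64-bit words. It is an emulation for illustration only, not the code from the thesis; the names `rop3`, `XOR3` and `MAJ` are made up for this example, and the bit ordering of the table (first operand as the most significant index bit) is assumed to follow the VPTERNLOG convention.

```python
MASK64 = (1 << 64) - 1  # emulate a 64-bit register

def rop3(a, b, c, table):
    """Apply the 3-input boolean function described by the 8-bit
    truth table `table` to every bit position of a, b and c.
    Bit number (a_bit*4 + b_bit*2 + c_bit) of `table` is the output."""
    result = 0
    for index in range(8):
        if not (table >> index) & 1:
            continue  # this input combination yields 0
        # Build the minterm that is 1 exactly where (a, b, c) match `index`.
        ta = a if index & 4 else ~a
        tb = b if index & 2 else ~b
        tc = c if index & 1 else ~c
        result |= ta & tb & tc
    return result & MASK64

# Truth tables can be derived by applying the wanted expression
# to these three constants, one per operand.
A, B, C = 0xF0, 0xCC, 0xAA
XOR3 = A ^ B ^ C                   # three-way XOR (sum bit of a full adder)
MAJ = (A & B) | (B & C) | (A & C)  # majority (carry bit of a full adder)

x, y, z = 0x0123456789ABCDEF, 0xFEDCBA9876543210, 0x0F0F0F0F0F0F0F0F
sum_bits = rop3(x, y, z, XOR3)
carry_bits = rop3(x, y, z, MAJ)
```

With a hardware ROP3, each of these calls would become a single instruction instead of a hand-reduced chain of AND/OR/XOR/NOT operations.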
It's disabled ... did you find out why? Process problems? Edit: I've found this: https://www.tomshardware.com/news/intel-nukes-alder-lake-avx-512-now-fuses-it-off-in-silicon https://www.realmicentral.com/2021/08/18/amd-zen4-will-support-avx512/
Buying the latest hardware for hobby purposes requires deep pockets. If the "ROI" is of a "non-material" kind, having fun with slightly (or really) outdated stuff is just as good (and as the Finnish saying goes: reikä se on rumallakin!)
In chapter 2.4 you wrote: "Measurements made at MIT in January 2000 show that latest-generation desktop PCs are almost as fast as a CAM8 block (8 boards at 25MHz). The very latest general-purpose microprocessors can compete with ASICs created several years ago. In terms of equivalent generation, if one considers Moore's rule, the extensive optimisation of the code probably gained three or four years compared with unoptimised code. The code is 4 times faster than the fastest of the codes tested, which supports the claim that the effort gained 3 years. This gain makes it possible to use an older (hence cheaper) machine at equal speed, or to gain 3 years on the most recent machine. This economy argument holds if the original code was 'sloppy', but it remains part of the demonstration that conscientious coding is not a waste of time in the long run."
I wanted to add something smart but it turned out that in chapter 2.5 you already did that. Chapeau.
@Thomas To be fair, I spent more time overall than normal on this thesis. I played with the algorithms and prepared all the elements for maybe 2 years before the Master's year, which itself turned into two, for the bit-parallel version described. Thus it is quite thorough.
And it seems that Intel's 13th and 14th generations crap on themselves: https://www.youtube.com/watch?v=OVdmK1UGzGs So I'll have to wait even longer, until gen 15, if they finally get their shit together. Maybe, who knows.
Trying to run these types of simulations has always been a pain for me. Maybe you could run my space plane fuselage and see what it does, once I post it as an IGES file on my entry soon.
https://youtu.be/IjmostrFetg?t=167
Only today do I find this video...