So I coded away and it looks promising.
I implemented the ideas from the last log: alternate the reference code and the candidate code, and yield in between. A first run measured a 0.09% difference between the reference code and itself :-) And I have not even added real-time scheduling yet.
Of course, the rest of the computer's activity matters. Here I ran a baseline CPU-bound test and then started 2 web browsers:
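The alternation can be sketched roughly like this in C. This is not the code from the bench program: reference_code(), candidate_code(), measure() and differential_round() are hypothetical names, the batch of 1000 calls is arbitrary, and the sketch assumes a POSIX system that provides clock_gettime() and sched_yield().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the two pieces of code being compared. */
static volatile uint64_t sink;
static void reference_code(void) { sink += 1; }
static void candidate_code(void) { sink += 1; }

/* Monotonic clock, in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call fn in batches for at least slice_ns; return iterations per nanosecond. */
static double measure(void (*fn)(void), uint64_t slice_ns) {
    assert(slice_ns > 0);
    uint64_t start = now_ns();
    uint64_t elapsed;
    uint64_t iterations = 0;
    do {
        for (int i = 0; i < 1000; i++)
            fn();
        iterations += 1000;
        elapsed = now_ns() - start;
    } while (elapsed < slice_ns);
    return (double)iterations / (double)elapsed;
}

/* One round: reference, yield, candidate, yield.
   Returns the relative difference (candidate - reference) / reference. */
static double differential_round(uint64_t slice_ns) {
    double ref = measure(reference_code, slice_ns);
    sched_yield();  /* let other runnable tasks go between the two runs */
    double cand = measure(candidate_code, slice_ns);
    sched_yield();
    return (cand - ref) / ref;
}
```

With both stand-ins doing the same work, the value returned by differential_round() is the self-agreement error of the measurement itself, which is what the "reference against itself" run estimates.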

This shows that even purely CPU-bound performance is affected by whatever else is running on the machine, and Chrome is a big liability because it starts so many sub-processes. May I feign surprise, or did I just launch it to prove my point?
So the system must be as idle as possible to give meaningful results, which is hardly a surprise, right? Going SCHED_RR would reduce the parasitic influence, but I'm not doing that yet because I want to show that differential benchmarking is already somewhat immune to it.
The current code calculates a rough arithmetic average but even this value drifts with time, over minutes, so differential tests are really important. It is not the absolute performance of one 1-second run that matters, but the ratio between two consecutive runs, averaged over time.
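To make the "ratio averaged over time" idea concrete, here is a small sketch in C. It is only an illustration under my own assumptions: running_mean, running_mean_add() and relative_difference() are hypothetical names, and the incremental mean is one possible way to accumulate the average, not necessarily what the bench program does.

```c
#include <assert.h>
#include <stddef.h>

/* Running arithmetic mean, kept in double precision. */
typedef struct {
    double mean;
    size_t count;
} running_mean;

/* Fold one more sample into the mean without storing a large sum. */
static void running_mean_add(running_mean *m, double sample) {
    assert(m != NULL);
    m->count++;
    m->mean += (sample - m->mean) / (double)m->count;
}

/* Relative difference between the throughput of two consecutive runs. */
static double relative_difference(double ref_perf, double cand_perf) {
    assert(ref_perf > 0.0);
    return (cand_perf - ref_perf) / ref_perf;
}
```

Each pair of consecutive runs contributes one relative_difference() sample to the running mean. A slow drift that affects both runs of a pair mostly cancels in the ratio, even if the absolute figures wander over minutes.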
...............
Newer code uses floating point numbers to compute the statistics, which prevents some silly overflows. Further testing shows self-agreement of 0.02% under idle conditions, so this is relevant. I uploaded the source file as bench.diff_float.c. Time for peephole optimisations now!
Update :
chrt -r 95 ./bench | tee baseline8.times
This sets not only the priority but also the scheduling policy (round-robin), and it halves the uncertainty:
# chrt -r 95 ./bench | tee baseline8.times
#iter. duration   iterations perf      | duration   iterations perf      | ratio
0      2693438077 335678945  0.1246284   2693510097 335616415  0.1246019   -0.0002131
1      2693504839 337332967  0.1252394   2693507091 336461227  0.1249157   -0.0025851
2      2693501590 334610597  0.1242288   2693510382 335716993  0.1246392    0.0033033
.....
46     2693634063 334390965  0.1241412   2693562100 336205929  0.1248183    0.0054545
47     2693625754 333694985  0.1238832   2693554345 336532681  0.1249400    0.0085306
48     2693583228 334283753  0.1241037   2693589913 336257767  0.1248363    0.0059027
49     2693619508 335573853  0.1245810   2693680867 334681913  0.1242471   -0.0026806
# 50:                        0.1243920                         0.1245018    0.0008917
0.089% is pretty good :-)
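The same policy could also be requested from inside the program instead of through chrt. The following is a minimal sketch assuming Linux and the POSIX sched_setscheduler() call. request_round_robin() is a hypothetical helper, not part of the bench source, and the call normally fails without root privileges (or CAP_SYS_NICE).

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Ask for SCHED_RR at the given priority for the calling process.
   Returns 0 on success, -1 on failure (typically EPERM when not root). */
static int request_round_robin(int priority) {
    struct sched_param param = { .sched_priority = priority };

    /* The valid range depends on the system; check before asking. */
    assert(priority >= sched_get_priority_min(SCHED_RR));
    assert(priority <= sched_get_priority_max(SCHED_RR));

    if (sched_setscheduler(0, SCHED_RR, &param) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

Calling request_round_robin(95) at the start of main() would mirror chrt -r 95. Falling back to the default policy when it returns -1 keeps the benchmark usable for unprivileged runs.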
Discussions
Is it just me, or does Firefox look better?
This old version of FF spawns only 2 processes (main and media handler) and was not very active. To pile on the load, I opened Chrome, which spawns 6 to 8 processes, and pointed it at a video site that adds quite a few ads.
Yeah Chrome is a performance hog but that old FF is quite slow anyway...
Performance always has a price.