Differential benchmarking

So I coded away and it looks promising.

I implemented the ideas from the last log: alternate the reference code and the candidate code, and yield in between. In a first run, the reference code agreed with itself to within 0.09% :-) And I have not even added real-time scheduling yet.
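The alternation can be sketched roughly as follows. This is a minimal illustration, not the actual bench source: the names `spin_workload`, `run_slice` and `differential_round` are hypothetical, the time base is assumed to be `CLOCK_MONOTONIC` in nanoseconds, and the ratio is assumed to be the candidate's performance relative to the reference, minus one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

typedef void (*workload_fn)(void);

/* Hypothetical stand-in workload: a short busy loop. */
void spin_workload(void) {
    volatile unsigned x = 0;
    for (unsigned i = 0; i < 1000; i++)
        x += i;
}

/* Monotonic clock in nanoseconds (assumed time base). */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call fn repeatedly for at least slice_ns and return the measured
 * performance as iterations per nanosecond. */
double run_slice(workload_fn fn, uint64_t slice_ns) {
    assert(slice_ns > 0);
    uint64_t start = now_ns(), elapsed, iterations = 0;
    do {
        fn();
        iterations++;
        elapsed = now_ns() - start;
    } while (elapsed < slice_ns);
    return (double)iterations / (double)elapsed;
}

/* One differential round: yield, time the reference, yield, time the
 * candidate, then return the relative difference between the two. */
double differential_round(workload_fn ref, workload_fn cand, uint64_t slice_ns) {
    sched_yield();  /* hand the CPU over before the slice, not during it */
    double perf_ref = run_slice(ref, slice_ns);
    sched_yield();
    double perf_cand = run_slice(cand, slice_ns);
    return perf_cand / perf_ref - 1.0;
}
```

Calling `differential_round(spin_workload, spin_workload, 1000000000)` compares the workload against itself over 1-second slices, so the result should hover around zero. Reading the clock on every iteration adds overhead of its own, so a real harness would probably check the time less often.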

Of course, the rest of the computer's activity matters. Here I ran a baseline CPU-bound test and then started 2 web browsers.

This shows that purely CPU-bound performance is affected by other things running elsewhere on the machine, and Chrome is a big liability because it starts so many sub-processes. May I feign surprise, or did I just launch it to prove my point?

So the system must be as idle as possible to provide significant and meaningful results, which is hardly a surprise, right? Going SCHED_RR would reduce the parasitic influence, but not yet: first I want to prove that differential benchmarking is already somewhat immune to it.

The current code calculates a rough arithmetic average but even this value drifts with time, over minutes, so differential tests are really important. It is not the absolute performance of one 1-second run that matters, but the ratio between two consecutive runs, averaged over time.
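To make "the ratio between two consecutive runs, averaged over time" concrete, here is a small sketch. The type and function names are hypothetical, the ratio is assumed to be defined as candidate over reference minus one, and it accumulates in `double`, which is a choice made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Relative difference of the candidate against the reference:
 * 0.0 means identical performance, +0.01 means 1% faster. */
double relative_gain(double perf_ref, double perf_cand) {
    assert(perf_ref > 0.0);
    return perf_cand / perf_ref - 1.0;
}

/* Running arithmetic mean of the per-round ratios. */
typedef struct {
    double sum;
    size_t count;
} ratio_stats;

void ratio_stats_add(ratio_stats *s, double ratio) {
    s->sum += ratio;
    s->count++;
}

double ratio_stats_mean(const ratio_stats *s) {
    return s->count ? s->sum / (double)s->count : 0.0;
}
```

Each round contributes one ratio taken from two back-to-back measurements, so a slow drift that affects both runs of a pair mostly cancels out before the averaging happens.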

...............

Newer code uses floating point numbers to compute the statistics, which prevents some silly overflows. Further testing shows self-agreement to within 0.02% under idle conditions, so the method is relevant. I uploaded the source file as bench.diff_float.c. Time for peep-hole optimisations now!

Update:

chrt -r 95 ./bench | tee baseline8.times

This not only sets the priority (95) but also the scheduling policy (-r selects SCHED_RR), and this halves the uncertainty:

# chrt -r 95 ./bench | tee baseline8.times
#iter.    duration  iterations       perf  |   duration  iterations       perf  |      ratio
0       2693438077   335678945  0.1246284    2693510097   335616415  0.1246019    -0.0002131
1       2693504839   337332967  0.1252394    2693507091   336461227  0.1249157    -0.0025851
2       2693501590   334610597  0.1242288    2693510382   335716993  0.1246392     0.0033033
.....
46      2693634063   334390965  0.1241412    2693562100   336205929  0.1248183     0.0054545
47      2693625754   333694985  0.1238832    2693554345   336532681  0.1249400     0.0085306
48      2693583228   334283753  0.1241037    2693589913   336257767  0.1248363     0.0059027
49      2693619508   335573853  0.1245810    2693680867   334681913  0.1242471    -0.0026806
# 50:                           0.1243920                            0.1245018     0.0008917

0.089% is pretty good :-)
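The same policy change can also be requested from inside the program instead of going through chrt. The following is only a sketch of that alternative, not what bench does: `request_round_robin` is a hypothetical helper, and the call normally needs root or CAP_SYS_NICE, otherwise it fails with EPERM.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Ask the kernel to run the calling process under SCHED_RR at the given
 * priority, clamped to the range the system allows.
 * Returns 0 on success, -1 on failure (errno is left set). */
int request_round_robin(int priority) {
    struct sched_param param;
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);
    assert(lo != -1 && hi != -1);  /* SCHED_RR is a valid policy */

    if (priority < lo) priority = lo;
    if (priority > hi) priority = hi;

    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    /* pid 0 means "the calling process" */
    if (sched_setscheduler(0, SCHED_RR, &param) != 0) {
        fprintf(stderr, "SCHED_RR not granted: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling `request_round_robin(95)` at the start of `main()` would mirror `chrt -r 95`. When the request is refused, the benchmark can still run under the default policy, only with noisier results.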

Discussions

sxpert wrote 05/02/2023 at 16:37

Is it just me, or does Firefox look better?

Yann Guidon / YGDES wrote 05/02/2023 at 16:58

This old version of FF spawns only 2 processes (main and media handler) and was not very active. To load the mule, I opened Chrome, which spawns 6 to 8 processes, and pointed it to a video site that adds quite a few ads.

Yeah Chrome is a performance hog but that old FF is quite slow anyway...

Performance always has a price.
