Close

Parallel scan, at the thread level

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 12/17/2021 at 06:270 Comments

I'll try to map w26 with the i7-10750H in scalar mode first, as a warm-up for later. That's 12 logic cores or threads that could run a whole week with no interruption, this should be enough to gather the required data. After all, I have not progressed much on the fusion program so no need to focus on SIMD/AVX2 right now, this will be for larger widths.

w26 will generate 26×2^26 bytes (or 2^27, by coincidence) = 134217728 bytes, 134MB, which fits nicely in RAM, so I won't need crazy streaming/sorting optimisation, a few flat files will do: the SSD can read or write at 200MB/s so I will not have to wait too long for processing.

The goal at this point is to implement a better version of the scanner, which includes features of the bash script, and to expand on them. This should not be difficult as I have already used pthreads in the log 29. Going both ways and pt_orbit_07.c.

The chosen organisation is pretty simple: a "management" thread distributes work to N worker threads and receives results through FIFOs. This way, OS call overhead will be focused on a single core that will also handle the rest of the computer's activity, by calling sleep() at 10Hz or so. A curses-like status screen would be nice too, so I can monitor the activity and detect anomalous behaviour.

As before, the start and end of the range, as well as the output file name, will be on the command line, so I can script multiple runs and checkpoints. The file output however will not be sorted, sort will do the trick, particularly well if multiple partial results are processed initially.

The little interesting trick for this version (that differs from the bash version) is that the input FIFO are filled before the respective threads are started. The program ends when all the input and output FIFO are empty. A scanner thread would end by itself when its input FIFO is empty, so the FIFO size should be well sized and carefully managed. This should correctly handle various cases, such as when there are fewer X to scan than available FIFO slots, or even FIFOs, without using mathematical gymnastics. Just distribute the work in round-robin way.

Discussions