As the scanner becomes more and more efficient, more and more data are generated and... stay on the hard disks. But the scanner is only the first half of the system: all these semi-arcs must one day (ASAP) be joined back together to rebuild the whole orbits and tally their characteristics. This is the purpose of the "fusion" program that I describe here. I have already discussed the algorithm in these logs:
90. Post-processing
91. Post-processing: the strategy is coming
92. Post-processing: the strategy looks good
but here is a quick reminder, since some details have been confirmed since then.
The chosen algorithm is not obvious, so let's first look at the flaws of the trivial version. The fusion program must be scalable and perform significant work on every pass, or it will take ages to process over 100GB of data, even at 100MB/s.
Let's imagine a program that streams the log file and writes a compacted log file. This program "consumes" a source file, keeps the first entries in memory, and waits for later entries to match the ends of the entries it holds (a small sketch follows this list). This creates many problems:
- The entries are stored in RAM in the order of their appearance, which is usually increasing order. This is not the order in which they are looked up, because we also care about the end of each semi-arc. Not only is this random access, it is also a random lookup... Somehow the order must be restored, which creates all kinds of memory management headaches (particularly in C). A large database could be used, but who would consider that?
- The capacity of the RAM is limited compared to the file sizes. At most, this process can only remove as many entries as the RAM can hold, which is (optimistically) 500M for 16GB. Many passes are required, since this is essentially a linear (O(n)) process. We all know O(log(n)) is better, right?
- Of course, some optimisations are possible, but the program cannot be overly complicated because there is only so much time to spend on it. The simpler the program, the sooner it is operational.
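To make these flaws concrete, here is a minimal sketch of that trivial version. The record layout (a start, an end, a length) is only an assumption for illustration; the real p7k format may differ. The ever-growing "pending" array and the linear search through it are exactly the random lookup and RAM ceiling described above.

```c
/* naive_fuse.c -- sketch of the trivial single-stream approach.
 * The (start, end, length) record layout is an assumption for illustration. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { uint32_t start, end; uint64_t len; } arc_t;

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s file.p7k\n", argv[0]); return 1; }
    FILE *in = fopen(argv[1], "rb");
    if (!in) { perror("fopen"); return 1; }

    size_t cap = 1 << 20, n = 0;          /* pending table, grows until RAM runs out */
    arc_t *pending = malloc(cap * sizeof *pending);

    arc_t a;
    while (fread(&a, sizeof a, 1, in) == 1) {
        /* look for a pending arc that ends where this one starts:
         * a random lookup over an ever-growing, appearance-ordered array */
        size_t i;
        for (i = 0; i < n; i++) {
            if (pending[i].end == a.start) {
                pending[i].end  = a.end;      /* join the two semi-arcs */
                pending[i].len += a.len;
                break;
            }
        }
        if (i == n) {                         /* no match yet: park it in RAM */
            if (n == cap) {
                arc_t *np = realloc(pending, (cap *= 2) * sizeof *pending);
                if (!np) { perror("realloc"); return 1; }
                pending = np;
            }
            pending[n++] = a;
        }
    }
    printf("%zu arcs still pending (bounded by the RAM, not by the file)\n", n);
    fclose(in);
    free(pending);
    return 0;
}
```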
So the program uses a different approach, with 2 streams of input data made of the log sorted in different orders. This doubles the required storage but it also simplifies things (and some operations could be performed "in place" by reusing the space of already-consumed data).
To maximise efficiency and reduce development time, we enlist the sort program to reorder the binary p7k log file. This way, the fusion program does not need to duplicate data in memory or search for random entries. Each pass should cut the number of entries in half, which satisfies the scalability requirement: the whole process, even though it might be quite heavy, works in O(log(n)) passes. So w32 will need at most 32 passes.
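Here is a minimal sketch of one fusion pass under these assumptions: one stream is sorted by the arcs' end coordinate, the other by their start coordinate, with the same hypothetical (start, end, length) record layout as above. The pass walks both streams in lockstep, joins each matching pair and writes the result, so no random lookup or in-memory duplication is needed.

```c
/* fuse_pass.c -- sketch of one fusion pass over two sorted views of the log.
 * Assumptions: stream A is sorted by 'end', stream B by 'start', same
 * illustrative record layout as the previous sketch. */
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t start, end; uint64_t len; } arc_t;

int main(int argc, char **argv) {
    if (argc < 4) {
        fprintf(stderr, "usage: %s by_end.p7k by_start.p7k out.p7k\n", argv[0]);
        return 1;
    }
    FILE *A = fopen(argv[1], "rb"), *B = fopen(argv[2], "rb"), *O = fopen(argv[3], "wb");
    if (!A || !B || !O) { perror("fopen"); return 1; }

    arc_t a, b;
    int have_b = fread(&b, sizeof b, 1, B) == 1;

    while (fread(&a, sizeof a, 1, A) == 1) {
        /* advance B until its start coordinate catches up with the end of a */
        while (have_b && b.start < a.end)
            have_b = fread(&b, sizeof b, 1, B) == 1;

        if (have_b && b.start == a.end) {     /* join: a's end meets b's start */
            arc_t joined = { a.start, b.end, a.len + b.len };
            fwrite(&joined, sizeof joined, 1, O);
            have_b = fread(&b, sizeof b, 1, B) == 1;
        } else {                              /* no partner in this pass: keep as-is */
            fwrite(&a, sizeof a, 1, O);
        }
    }
    fclose(A); fclose(B); fclose(O);
    return 0;
}
```

Note that this sketch skips one piece of bookkeeping: the arc consumed from the second stream also appears in the first stream and must not be emitted again. That is exactly what the scoreboard below is for.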
The only required RAM storage is a "scoreboard" that flags the entries that are being merged. Though, after a bit of thinking, it might not even be necessary, because the information "this entry is already processed" can be found/deduced from the available stream if we bother to look at the right place (hint: one index must be < the current index).
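If the scoreboard is kept, it only needs one bit per entry. A minimal bitmap sketch, assuming entries are identified by their index in the sorted stream:

```c
#include <stdint.h>
#include <stdlib.h>

/* one bit per log entry: set when the entry has already been merged */
static uint8_t *scoreboard;

void sb_alloc(size_t entries) { scoreboard = calloc((entries + 7) / 8, 1); }
void sb_set(size_t i)  { scoreboard[i >> 3] |= (uint8_t)(1u << (i & 7)); }
int  sb_test(size_t i) { return (scoreboard[i >> 3] >> (i & 7)) & 1; }
```

At one bit per entry, even a billion entries only cost about 128MB of RAM.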
Anyway, you must remember one of the conditions for an orbit to be "maximal": the sum of the lengths of all the semi-arcs must equal a precise value. We can already compute this with an adaptation of a simple tool that is already written, dump7ck.c, though in practice its runtime is sub-optimal, I'd say. So as a preliminary check, I can adapt dump7ck.c by 1) speeding it up and 2) accumulating all the semi-arc lengths. This will help discard widths that contain orbits with no 0-crossing.
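A sketch of that preliminary check, still assuming the same illustrative record layout (the real dump7ck.c reads the actual p7k format): read the log in large blocks instead of one small read per entry, accumulate the lengths, and compare the total to the expected value passed on the command line.

```c
/* sum7k.c -- sketch of the adapted checker: block reads + length accumulation.
 * The (start, end, length) record layout and the expected total given on the
 * command line are assumptions for illustration. */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { uint32_t start, end; uint64_t len; } arc_t;

#define BATCH 65536    /* read many records per call instead of one */

int main(int argc, char **argv) {
    if (argc < 3) { fprintf(stderr, "usage: %s file.p7k expected_total\n", argv[0]); return 1; }
    FILE *in = fopen(argv[1], "rb");
    if (!in) { perror("fopen"); return 1; }
    uint64_t expected = strtoull(argv[2], NULL, 0);

    static arc_t buf[BATCH];
    uint64_t total = 0, entries = 0;
    size_t got;
    while ((got = fread(buf, sizeof buf[0], BATCH, in)) > 0) {
        for (size_t i = 0; i < got; i++)
            total += buf[i].len;        /* accumulate all semi-arc lengths */
        entries += got;
    }
    fclose(in);

    printf("%" PRIu64 " entries, total length %" PRIu64 "\n", entries, total);
    printf("total %s the expected value\n", total == expected ? "matches" : "does not match");
    return 0;
}
```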
Then the program can be adapted to read 2 input streams for combination.
__________________________________________________________
Update: dump7ck.tgz is ready and stops hammering the kernel with small read requests.
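One lightweight way to stop the small-read hammering, without restructuring the read loop, is to give stdio a large buffer with setvbuf() so the kernel only sees a few big read() calls. A sketch (the 4MB buffer size is an arbitrary choice):

```c
#include <stdio.h>
#include <stdlib.h>

#define IOBUF (4u << 20)   /* 4 MB: arbitrary but comfortable */

/* open a file for reading with a large, fully-buffered stdio buffer */
FILE *open_buffered(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f)
        setvbuf(f, malloc(IOBUF), _IOFBF, IOBUF);
    return f;
}
```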