It was easy to implement the "MIDI/UTF8"-like packing system (see p7ck.c) and I had no problem integrating it with the scanner. The files are quite large (at a fixed 26 bytes per entry) but compress slightly better than decimal-based representations and each output byte takes only a few instructions for import and export. It's fast, the code is compact and it's efficient. And as the update below shows, sort deals well with it.
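As a rough illustration of this kind of packing (this is not the actual p7ck.c), the sketch below stores a number in 5 bytes of 7 payload bits each. The 5-byte width, the most-significant-group-first order, the always-set high bit and the names `p7_pack`/`p7_unpack` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define P7_BYTES 5  /* assumed width: 5 bytes x 7 bits = 35 bits per number */

/* Pack a value below 2^35 into 5 bytes, most significant group first.
   Bit 7 of every output byte is set, so no output byte is ever 0. */
void p7_pack(uint64_t v, unsigned char out[P7_BYTES])
{
    assert(v < (UINT64_C(1) << (7 * P7_BYTES)));
    for (int i = P7_BYTES - 1; i >= 0; i--) {
        out[i] = (unsigned char)(0x80 | (v & 0x7F));
        v >>= 7;
    }
}

/* Inverse of p7_pack: drop the marker bit and reassemble the 7-bit groups. */
uint64_t p7_unpack(const unsigned char in[P7_BYTES])
{
    uint64_t v = 0;
    for (int i = 0; i < P7_BYTES; i++)
        v = (v << 7) | (in[i] & 0x7F);
    return v;
}
```

Under these assumptions each byte costs one mask, one OR and one shift, a plain byte-wise comparison of two packed numbers gives the same order as comparing the numbers themselves, and a 0 byte stays free to act as a record terminator.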
I already have a script that spreads the computations over parallel programs, and it's quite convenient for the widths between 16 and 23. I could get a 12× speedup on the i7-10750H with no effort, though I should use the pthread library instead of independent programs for convenience: if I want to stop scanning, I must kill all the scanners one by one, or invoke killall...
There is no resumption feature yet but the scan program can start and end at any point. The latest code is in parscan01.tgz; it is still missing some features, which I am integrating one by one.
____________________________________________
Update: sort works!
It's as simple as this:
LC_ALL=C sort -z -k 1.6,1.10 log10.0.p7k | od -An -v -w26 -t x1
The od command is only to display the bytes. I should write a proper program for this...
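A dedicated display program could start from something like the sketch below, which prints one 26-byte record per line in the same layout as the od call. The function name `dump_records` and the choice to ignore a trailing partial record are assumptions for the example.

```c
#include <stdio.h>

#define RECORD_SIZE 26  /* fixed entry size of the .p7k files */

/* Print each complete 26-byte record read from `in` as one line of
   space-separated hex bytes on `out`. Returns the number of records
   printed; a trailing partial record is silently ignored. */
long dump_records(FILE *in, FILE *out)
{
    unsigned char rec[RECORD_SIZE];
    long count = 0;

    while (fread(rec, 1, RECORD_SIZE, in) == RECORD_SIZE) {
        for (int i = 0; i < RECORD_SIZE; i++)
            fprintf(out, " %02x", rec[i]);
        fputc('\n', out);
        count++;
    }
    return count;
}
```

Calling `dump_records(stdin, stdout)` from a `main()` would let the program sit at the end of the pipeline in place of od, and the loop is a natural spot to decode the fields later instead of printing raw hex.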
I use "only" 3 options/tricks to coerce sort to behave as I want:
- LC_ALL=C turns UTF-8 interpretation off, so bytes are compared by their raw values, which also boosts speed.
- -z specifies 0-terminated lines.
- -k 1.6,1.10 selects the 2nd batch of 5 bytes, that is, bytes 6 to 10 of field 1 (there is only 1 field because there is no separator, except the end of line).
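To make the key selection concrete, here is a sketch of roughly the same ordering done in memory with `qsort`. It assumes the records are already loaded as a flat array of 26-byte entries; the names `compare_records` and `sort_records` are made up for the example, and the tie-break on the whole record mirrors GNU sort's last-resort comparison (the one that `-s` disables).

```c
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 26
#define KEY_OFFSET  5   /* -k 1.6 is 1-based, so byte 6 sits at offset 5 */
#define KEY_LENGTH  5   /* bytes 6..10 inclusive */

/* Order two records by the raw bytes of the key, as LC_ALL=C does,
   then by the whole record when the keys are equal. */
int compare_records(const void *a, const void *b)
{
    const unsigned char *ra = a;
    const unsigned char *rb = b;
    int c = memcmp(ra + KEY_OFFSET, rb + KEY_OFFSET, KEY_LENGTH);

    if (c != 0)
        return c;
    return memcmp(ra, rb, RECORD_SIZE);
}

/* Sort `count` consecutive 26-byte records in place. */
void sort_records(unsigned char *records, size_t count)
{
    qsort(records, count, RECORD_SIZE, compare_records);
}
```

This is only meant to show what the key options mean. The sort utility remains the practical choice for files that do not fit in RAM, since it falls back to temporary files and merging.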
And there it is! Problem solved!
The next steps:
- use threads instead of independent programs, (done)
- implement dynamic allocation of ranges, (done)
- start writing the fusion program
- continue with the SIMD development...
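The first two items can be pictured with the sketch below: worker threads grab the next chunk of the range from a shared atomic counter, and one shared flag stops them all at once, with no need for killall. This is not the code from the archive; `CHUNK`, `MAX_THREADS`, `scan_job`, `run_scan`, `request_stop` and the empty scan loop are placeholders for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK       1024   /* hypothetical size of one work unit */
#define MAX_THREADS 64     /* hypothetical upper bound on workers */

typedef struct {
    _Atomic uint64_t next;   /* first index not yet handed out */
    uint64_t end;            /* one past the last index to scan */
    _Atomic uint64_t done;   /* indices processed so far */
} scan_job;

static atomic_bool stop_requested;   /* zero-initialised: false */

/* May be called from another thread to end every worker. */
void request_stop(void) { atomic_store(&stop_requested, true); }

static void *worker(void *arg)
{
    scan_job *job = arg;

    while (!atomic_load(&stop_requested)) {
        /* Dynamic allocation: claim the next chunk, whoever gets there first. */
        uint64_t start = atomic_fetch_add(&job->next, CHUNK);
        if (start >= job->end)
            break;
        uint64_t limit = job->end - start < CHUNK ? job->end : start + CHUNK;
        for (uint64_t i = start; i < limit; i++) {
            /* placeholder: the real scan of index i would go here */
        }
        atomic_fetch_add(&job->done, limit - start);
    }
    return NULL;
}

/* Scan [begin, end) with up to nthreads workers; returns indices processed. */
uint64_t run_scan(uint64_t begin, uint64_t end, int nthreads)
{
    scan_job job;
    pthread_t tid[MAX_THREADS];
    int started = 0;

    atomic_init(&job.next, begin);
    atomic_init(&job.done, 0);
    job.end = end;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tid[started], NULL, worker, &job) != 0)
            break;          /* keep going with the threads that did start */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return atomic_load(&job.done);
}
```

Compared with giving each thread a fixed slice up front, a shared counter keeps every core busy until the very end, even when some chunks take longer than others. The sketch ignores wrap-around of the counter, so it assumes `end` stays well below 2^64.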
tbc...