I had a decent working version of GDUPS when it was published in GNU/Linux Magazine France (GLMF) n°64 (pp. 56-65) in May 2004, and it has evolved a bit since.
http://connect.ed-diamond.com/GNU-Linux-Magazine/GLMF-061/gdups
It is very useful when you have lousy filesystem hygiene or duplicate directories scattered here and there, and it's necessary if you want to "deduplicate" and save some space.
Lately I have noticed false positives and weird behaviour: is this because of how Linux has evolved, or a "scale effect"? Did I introduce bugs?
Several enhancements are necessary, such as a better signature algorithm and a "partial signature" (only the first 64KB) to avoid uselessly signing large or huge files...
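To make the "partial signature" idea concrete, here is a minimal sketch in Python. It is not the GDUPS code: the function names are made up for illustration, SHA-256 from the standard library stands in for whatever "better signature algorithm" ends up being chosen, and the three-pass order (size, then first 64KB, then whole file) is an assumption about how the filtering could be arranged.

```python
import hashlib
import os
from collections import defaultdict

# "Only the first 64KB", as suggested above.
PARTIAL_BYTES = 64 * 1024


def partial_signature(path, limit=PARTIAL_BYTES):
    """Hash only the first `limit` bytes of the file (hypothetical helper)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(limit)).hexdigest()


def full_signature(path, chunk=1 << 20):
    """Hash the whole file, reading it in 1 MiB chunks (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def _group(paths, key):
    """Group paths by key(path) and keep only groups with 2+ members."""
    groups = defaultdict(list)
    for p in paths:
        groups[key(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


def find_duplicates(paths):
    """Return lists of paths whose full contents hash identically.

    Cheap filters run first: file size, then the partial signature.
    A file is read in full only if both of those collide with another file.
    """
    result = []
    for same_size in _group(paths, os.path.getsize):
        for same_head in _group(same_size, partial_signature):
            result.extend(_group(same_head, full_signature))
    return result
```

With this arrangement a huge file that is unique in size, or unique in its first 64KB, is never signed in full. Files that match on the full signature could still be compared byte by byte before deleting anything, if false positives are a worry.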
Please be kind and forgiving, I was only a moderately good programmer at the time :-) I am publishing it here to learn from my youthful mistakes...
The CRC system would be greatly improved by the new methods of the #PEAC (Pisano with End-Around Carry) algorithm :-D