To compare two large files byte for byte, you have to read both files and either compare them directly or compute a CRC for each. Either way the entire file must be read, which can be expensive.
Over time my digital photo directories have grown, and duplicates ended up all over the place. In the past I used a full binary compare, which took time and was inelegant.
I have implemented a small DLL that creates partial fingerprints of files. Instead of reading several MB, you decide, for example, to read 20 blocks of 20 K each and compute the fingerprint from those blocks only. This is of course not absolutely safe, but it is very unlikely that these blocks are equal while the files are not.
Two fingerprint methods may be used: SHA1 (http://en.wikipedia.org/wiki/SHA1) or CRC (http://en.wikipedia.org/wiki/Cyclic_redundancy_check).
The DLL implements two functions that create a fingerprint of a file:
sha1=sha1file(file,blocks-to-read,block-length)
crc=crcfile(file,blocks-to-read,block-length)
blocks-to-read: the number of blocks to read when creating the SHA1 or CRC fingerprint; the default value is 32
block-length: the length of each block that is read; the default value is 16000
If blocks-to-read*block-length is greater than 90% of the file size, the entire file is processed.
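The sampling idea described above can be sketched in Python. This is only an illustration under stated assumptions, not the DLL's implementation: the name partial_fingerprint is hypothetical, and the post does not say where in the file the blocks are taken from, so the sketch assumes evenly spaced offsets. Only the defaults (32 blocks of 16000) and the 90% rule come from the description above.

```python
import hashlib
import os
import zlib


def partial_fingerprint(path, blocks_to_read=32, block_length=16000, method="sha1"):
    """Fingerprint a file from sampled blocks instead of its full content.

    Hypothetical helper: returns a hex SHA1 digest or an unsigned CRC-32 integer.
    """
    size = os.path.getsize(path)
    sha1 = hashlib.sha1()
    crc = 0

    def update(chunk):
        nonlocal crc
        if method == "sha1":
            sha1.update(chunk)
        else:
            crc = zlib.crc32(chunk, crc)

    with open(path, "rb") as f:
        if blocks_to_read * block_length > 0.9 * size:
            # The sample would cover more than 90% of the file: read all of it.
            while chunk := f.read(1 << 20):
                update(chunk)
        else:
            # Assumption: blocks start at evenly spaced offsets, first block at
            # the start of the file, last block ending at or before its end.
            stride = (size - block_length) // max(blocks_to_read - 1, 1)
            for i in range(blocks_to_read):
                f.seek(i * stride)
                update(f.read(block_length))

    return sha1.hexdigest() if method == "sha1" else crc
```

Two files whose sampled blocks match get the same fingerprint even if they differ elsewhere, so equal fingerprints mark likely duplicates; a full compare of the candidates is still needed when certainty matters.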
Additionally, two functions return the fingerprint of a passed string:
sha1=sha1string(string)
crc=crcstring(string)
Example:
envdir=DIRECTORY()"\FingerPrint"
rc=0
SAY FUNCDEF('SHA1File','str,str,32,32',envdir,'SHA1FILE')
SAY FUNCDEF('CRCFile','32u,str,32,32',envdir,'CRCFILE')
SAY FUNCDEF('SHA1String','str,str,32',envdir,'SHA1STRING')
SAY FUNCDEF('CRCString','32u,str,32',envdir,'CRCSTRING')
regdir=VALUE('HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\App Paths\RxLaunch.exe\Path', ,"WIN32")
file=regdir"RXLaunch.exe"
SAY "Reginald Directory: "regdir
SAY "Fingerprint of : "file
SAY "SHA1: "sha1file(file,256,100)
SAY "CRC : "crcfile(file,256,100)
SAY "Fingerprint of strings"
SAY "SHA1: "sha1string("The quick brown fox jumps over the lazy dog")
SAY "SHA1: "sha1string("The quick brown fox jumps over the lazy cog")
SAY "CRC : "crcstring("The quick brown fox jumps over the lazy dog")
SAY "CRC : "crcstring("The quick brown fox jumps over the lazy cog")
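The string results can be cross-checked against an independent implementation. The Python sketch below hashes the same two sentences with the standard hashlib and zlib modules. It assumes that the DLL computes plain SHA-1 and the common CRC-32 variant; the post does not state which CRC polynomial is used or how the results are formatted, so the CRC values in particular may not match. The helper names sha1_string and crc_string are hypothetical.

```python
import hashlib
import zlib


def sha1_string(text):
    """Hex SHA-1 digest of a string, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def crc_string(text):
    """Unsigned CRC-32 (zlib variant) of a string, encoded as UTF-8."""
    return zlib.crc32(text.encode("utf-8"))


dog = "The quick brown fox jumps over the lazy dog"
cog = "The quick brown fox jumps over the lazy cog"

print("SHA1:", sha1_string(dog))  # 2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
print("SHA1:", sha1_string(cog))  # de9f2c7fd25e1b3afad3e85a0bd17d9b100db4b3
print("CRC :", format(crc_string(dog), "08x"))  # 414fa339
print("CRC :", format(crc_string(cog), "08x"))
```

A single changed letter gives a completely different SHA-1 digest, which is why the two sentences make a convenient test pair.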