On Thu, 05 May 2005 22:37:06 +1000, Alex Peters
<(E-Mail Removed)> wrote:
>Sam wrote:
>
>> I was wondering if there was a utility that could search a local hard
>> drive and scan file contents of common file types (documents,
>> presentations, spreadsheets, text files) and report back those that
>> seem to be identical or near perfect matches. Maybe using some sort
>> of percentage match relationship.
>
>DupeLocater v2.0
>http://www.freewareweb.com/cgi-bin/archive.cgi?ID=205
Just some feedback to the group.
I downloaded and tried DupeLocater. It's rather simplistic, but it
does work. It reports only file names, with no "match" data such as a
percentage. It does catch copies whose names were merely
"incremented", and it seems to catch identical content under totally
different file names, but not the same content saved in different
formats.
I took a text file, name.txt, and copied it as mena.xls and eman.doc
(changing only the name, not the format). Everything worked as
expected. DupeLocater identified the two new files as dupes of the
original.
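
That result is what you'd expect from any tool that compares raw file
contents and ignores names. I don't know how DupeLocater works
internally, so the following is only a minimal Python sketch of one
common approach (grouping files by a hash of their bytes). The
function names file_digest and find_exact_dupes are my own, not
anything from DupeLocater.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_dupes(root):
    """Group files under root by content digest.

    Returns a list of groups; each group is a sorted list of paths
    whose bytes are identical. File names and extensions are ignored.
    """
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

With name.txt, mena.xls and eman.doc all holding the same bytes, this
would put the three files in one group, whatever they are called.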
Then I opened mena.xls in Excel and saved it as mena2.xls, a real
worksheet, and used Word to save eman.doc as eman2.doc, a real
document. This didn't work. DupeLocater still reported the renamed
text files as dupes, but the Word- and Excel-formatted files were not
reported, even though to me (the user) the "content" was the same.
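
Catching this case means comparing the text itself rather than the
bytes on disk, which is where the "percentage match" idea from my
original question comes in. Here is a rough sketch using Python's
standard difflib module. It assumes you have already pulled the plain
text out of each file somehow (that part depends on the format and is
not shown), and the names normalize and similarity_percent are just
ones I made up.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lower-case the text and collapse all runs of whitespace."""
    return " ".join(text.split()).lower()


def similarity_percent(text_a, text_b):
    """Return how similar two strings are, from 0.0 to 100.0.

    Uses difflib's ratio, which is 2*M/T where M is the number of
    matching characters and T is the combined length of both strings.
    """
    a, b = normalize(text_a), normalize(text_b)
    return 100.0 * SequenceMatcher(None, a, b).ratio()
```

Anything scoring above some threshold you pick could then be reported
as a near match. SequenceMatcher gets slow on long inputs, so this is
more of an illustration than something to run across a whole drive.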
Going one step further, I copied mena2.xls (the real Excel file) to
drat3.xls, and likewise copied eman2.doc (the real Word file) to
lost.doc. This did work: DupeLocater identified the two new files as
dupes of their respective originals.
It will sometimes report files that don't match, and I haven't figured
out why. I have a couple of index files (binary format) that get
reported as dupes of unrelated list files. The files are not the same
size (or anywhere close). The list files appear to contain something
like RTF (text + markup), or they may be delimited in some way. The
index files are binary (I developed the program that creates them).
The good news is that the false matches are so dissimilar that they're
easy to catch.
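
Since the falsely matched files differ in size, a reported pair can be
double-checked cheaply before trusting it. A small sketch of that
check in Python, using the standard filecmp module; the function name
confirm_duplicate is hypothetical, and this says nothing about why
DupeLocater reports these pairs in the first place.

```python
import filecmp
import os


def confirm_duplicate(path_a, path_b):
    """Return True only if both files have the same size and bytes."""
    # Files of different sizes can never be exact duplicates, so this
    # rejects them without reading any file contents.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False forces a comparison of the actual contents rather
    # than relying on file metadata alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Running each reported pair through a check like this would filter out
the index-file/list-file mismatches described above.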
And lastly, just for fun, I took a key phrase which appears in all
these files and plugged it into Google Desktop Search, and it reported
back all the files regardless of format.
end of report...
Sam