PC Review



File dupe finder - when file names are not the same?

 
 
Sam
      5th May 2005

I was wondering if there was a utility that could search a local hard
drive and scan file contents of common file types (documents,
presentations, spreadsheets, text files) and report back those that
seem to be identical or near perfect matches. Maybe using some sort
of percentage match relationship.

With the advent of e-mail file attachments, it's easy to get large
numbers of duplicate files. My e-mail program (at home) is Eudora and
one of its design choices is that all attachments are in fact detached and
stored in a directory. If you receive the same file many times, Eudora
dutifully stores each copy by indexing the file name (e.g. sam.txt
becomes sam1.txt, sam2.txt, etc.).
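As a rough illustration of the name side of the problem, the Python sketch below groups attachment names that differ only by a trailing number, the way Eudora's sam.txt, sam1.txt and sam2.txt do. It assumes the index is always a run of digits just before the extension. The helper names are hypothetical, and this is not how Eudora or any tool in this thread is known to work. Because it only looks at names, it will also lump together unrelated files such as report2004.txt and report2005.txt, and it says nothing about whether the contents match.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the copy index is a run of digits appended to the stem,
# right before the extension (sam.txt -> sam1.txt, sam2.txt, ...).
INDEX_SUFFIX = re.compile(r"^(.*?)(\d+)$")

def base_name(filename: str) -> str:
    """Strip a trailing numeric index from the stem, keeping the extension."""
    p = Path(filename)
    m = INDEX_SUFFIX.match(p.stem)
    # If the whole stem is digits (e.g. "123.txt"), keep it unchanged.
    stem = m.group(1) if m and m.group(1) else p.stem
    return stem + p.suffix.lower()

def group_indexed_names(filenames):
    """Group names that differ only by a trailing number before the extension."""
    groups = defaultdict(list)
    for name in filenames:
        groups[base_name(name)].append(name)
    # Only groups with more than one member are candidate copies.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}
```

For example, group_indexed_names(["sam.txt", "sam1.txt", "sam2.txt", "notes.doc"]) groups the three sam files under "sam.txt" and leaves notes.doc out.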

At work, we iterate on lots of documents during preparation (nothing
new, I'm sure) and so end up with many versions leading up to the final
one. It would be nice to find all the similar files and then have the
option of doing something with them (sort, group zip, delete, etc.).

If anyone has any ideas, I'd be glad to hear them. File discipline is one
of my strong points, but once I'm overwhelmed it's really hard to clean up
the mess.

Thanks.

Sam
 
Alex Peters
      5th May 2005
Sam wrote:

> I was wondering if there was a utility that could search a local hard
> drive and scan file contents of common file types (documents,
> presentations, spreadsheets, text files) and report back those that
> seem to be identical or near perfect matches. Maybe using some sort
> of percentage match relationship.


DupeLocater v2.0
http://www.freewareweb.com/cgi-bin/archive.cgi?ID=205

--

Best regards,
Alex Peters
 
(ProteanThread)
      5th May 2005
>> I was wondering if there was a utility that could search a local
>> hard drive and scan file contents of common file types
>> (documents, presentations, spreadsheets, text files) and report
>> back those that seem to be identical or near perfect matches.
>> Maybe using some sort of percentage match relationship.


There is also one called DoubleKiller, a standalone, no-install app. As soon
as I find it, I'll post the link unless someone beats me to it.

 
MLC
      5th May 2005
_(ProteanThread)_, Thursday 05/May/2005:

> there is also one called doublekiller, a single no install app; as soon
> as i find it i'll post the link unless someone beats me to it.


http://www.bigbangenterprises.de/en/doublekiller/

;-)
--
Maria Luisa C - 05/05/2005 15.25.46
 
Mel
      5th May 2005
>On Thu, 05 May 2005 12:57:30 GMT, "(ProteanThread)" <(E-Mail Removed)> wrote:

>>> I was wondering if there was a utility that could search a local
>>> hard drive and scan file contents of common file types
>>> (documents, presentations, spreadsheets, text files) and report
>>> back those that seem to be identical or near perfect matches.
>>> Maybe using some sort of percentage match relationship.

>
>there is also one called doublekiller, a single no install app; as soon
>as i find it i'll post the link unless someone beats me to it.


It may be on this page, but I'm not sure:

http://freeware.intrastar.net/uninstal.htm
 
Susan Bugher
      5th May 2005
Alex Peters wrote:

> Sam wrote:
>
>> I was wondering if there was a utility that could search a local hard
>> drive and scan file contents of common file types (documents,
>> presentations, spreadsheets, text files) and report back those that
>> seem to be identical or near perfect matches. Maybe using some sort
>> of percentage match relationship.

>
> DupeLocater v2.0
> http://www.freewareweb.com/cgi-bin/archive.cgi?ID=205


I believe this is the last freeware version:

Program: DupeLocater
Author: Midnight Blue Software
Install: n.i.
W: LFW
Ware: v 1.0.0.1
http://ftp.tdcnorge.no/pub/windows/misc/

More apps are listed here:

http://www.pricelesswarehome.org/acf...ateFileChecker

For graphics files see this list:

http://www.pricelesswarehome.org/acf...Checker;Images

Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://google.ca/advanced_group_sear....comp.freeware
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)

 
John Fitzsimons
      6th May 2005
On Thu, 05 May 2005 02:17:18 GMT, Sam <(E-Mail Removed)> wrote:

>I was wondering if there was a utility that could search a local hard
>drive and scan file contents of common file types (documents,
>presentations, spreadsheets, text files) and report back those that
>seem to be identical or near perfect matches. Maybe using some sort
>of percentage match relationship.


< snip >

If such a program existed, it would be very handy. IIRC there is a
graphics file compare program that tells you what percentage of two
files is identical, but I have never heard of that for non-graphics
files. If you find one, please post the info here. Most (all?) dupe
detectors only register files as duplicates if their contents match 100%.
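For reference, the usual way an exact ("100%") duplicate finder works can be sketched in a few lines of Python: group files by size, since files of different sizes can't be identical, then hash only the files that share a size with another file. This is a generic sketch, not DupeLocater's or DoubleKiller's actual implementation. It assumes matching SHA-256 digests are a safe stand-in for byte-for-byte equality (accidental collisions are not a practical concern), and it does not handle permission errors on unreadable files.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 16):
    """SHA-256 of a file's bytes, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    # Pass 1: files of different sizes can never be identical, so group by size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)
    # Pass 2: hash only the files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, file_digest(path))].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Note that the file name plays no part here, which is why renamed copies are still caught while a re-saved copy in another format (different bytes) is not.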

Regards, John.
--
****************************************************
,-._|\ (A.C.F FAQ) http://clients.net2000.com.au/~johnf/faq.html
/ Oz \ John Fitzsimons - Melbourne, Australia.
\_,--.x/ http://www.vicnet.net.au/~johnf/welcome.htm
v http://clients.net2000.com.au/~johnf/
 
Sam
      7th May 2005
On Thu, 05 May 2005 22:37:06 +1000, Alex Peters
<(E-Mail Removed)> wrote:

>Sam wrote:
>
>> I was wondering if there was a utility that could search a local hard
>> drive and scan file contents of common file types (documents,
>> presentations, spreadsheets, text files) and report back those that
>> seem to be identical or near perfect matches. Maybe using some sort
>> of percentage match relationship.

>
>DupeLocater v2.0
>http://www.freewareweb.com/cgi-bin/archive.cgi?ID=205


Just some feedback to the group:

I downloaded and tried DupeLocater. It's rather simplistic, but
it does work. It does not provide any "match" data, only file names.
It does catch "incremented" file names. It seems to catch exactly the
same content under totally different file names, but not across formats.

I took a text file, 'name.txt', and copied it as mena.xls and eman.doc
(without changing the format, just the name). Everything worked as
expected: DupeLocater identified the two new files as dupes of the
original.

Then I opened mena.xls in Excel and saved it as a worksheet named
mena2.xls, and I used Word to save eman.doc as a Word document named
eman2.doc. This time it didn't work: DupeLocater still reported the
renamed text files as dupes, but the Word- and Excel-formatted files
were not reported, even though to me (the user) the "content" was the same.

Going one step further, I copied the mena2.xls file (a real Excel file)
to drat3.xls, and similarly copied eman2.doc (a real Word file) to
lost.doc. This did work, in that DupeLocater identified the two new
files as dupes of their respective originals.

It sometimes reports non-matching files, and I haven't figured out why.
I have a couple of index files (binary format) that get reported as
dupes of unrelated list files. The files are not the same size (or
anywhere close). The list files appear to contain something like RTF
(text + markup), or they could be delimited files of some kind. The
index files are in a binary format (I developed the program that
creates them). The good news is that the files are so dissimilar that
the false matches are easy to catch.
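One way to weed out false matches like these, assuming you can get at the list of pairs a tool reports, is to re-check each pair byte by byte. The Python sketch below uses the standard library's filecmp.cmp with shallow=False so the file contents are actually read and compared; the function name confirm_duplicate is made up for this example and is not part of DupeLocater.

```python
import filecmp
import os

def confirm_duplicate(path_a, path_b):
    """Return True only if the two files are byte-for-byte identical.

    Intended as a sanity check on a pair that some other tool reported as dupes.
    """
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False makes filecmp compare the actual contents instead of
    # trusting matching os.stat() signatures (type, size, modification time).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

With index and list files that are nowhere near the same size, the size check alone would already reject the pair.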

And lastly, just for fun, I took a key phrase that appears in all
these files and plugged it into Google Desktop Search, and it reported
back all the files regardless of format.

end of report...

Sam


 
David
      8th May 2005
On Sat, 07 May 2005 16:26:59 GMT, Sam <(E-Mail Removed)> typed
furiously:

>On Thu, 05 May 2005 22:37:06 +1000, Alex Peters
><(E-Mail Removed)> wrote:
>
>>Sam wrote:
>>
>>> I was wondering if there was a utility that could search a local hard
>>> drive and scan file contents of common file types (documents,
>>> presentations, spreadsheets, text files) and report back those that
>>> seem to be identical or near perfect matches. Maybe using some sort
>>> of percentage match relationship.

>>
>>DupeLocater v2.0
>>http://www.freewareweb.com/cgi-bin/archive.cgi?ID=205

>
>just some feedback to the group -
>
> Downloaded and tried DupeLocater - it's rather simplistic but
>it does work. It does not provide any "match" data only file names.
>It does catch "incrementing" the file name. It seems to catch exact
>same content in totally different file names but not formats.
>
>I took a text file 'name.txt' and copied it as mena.xls, and eman.doc
>(did not change the format - just the name). Everything worked as
>expected. DupeLocater identified the two new files as dupes of the
>original.
>
>Then I took mena.xls - and saved it using Excel as mena2.xls as a
>worksheet, and I used Word to save eman.doc as eman2.doc. This didn't
>work as a document. DupeLocater sill reported the renamed text files
>as dupes but the word & excel formatted files were not reported even
>though to me (the user) the "content" was the same.
>

Which is what I would expect. Did you think to compare the file sizes
after the Excel and Word saves? They will be much larger, and so the
files could not possibly be considered duplicates.

>Going one step further - I took the mena2.xls (real) excel file and
>copied it to drat3.xls and similarly for eman2.doc (real) word saved
>as lost.doc. This did work in that DupeLocater identified the two new
>files as dupes of their respective originals.
>
>It will sometimes report non match files - I haven't figured out why.
>I have a couple of index files (binary formats) that get reported as
>dupes of unrelated list files. The files are not the same size (or
>anywhere close). The content of the list files is (it appears)
>something like rtf (text + markup) or it could be a delimited file in
>some way. The index files are binary format (I developed the program
>that creates them.) The good news is that they are so dissimilar that
>it's easy to catch.
>
>And lastly - just for fun; I took a key phrase which appears in all
>these files and plugged it into Google Desktop Search and it reported
>back all the files regardless of format.
>

DupeLocater is not about the content of files but about whether they
are identical or not. A single byte of difference makes files
non-identical.
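A quick way to see David's point: exact-match tools effectively compare fingerprints of the raw bytes, and changing a single byte produces a completely different fingerprint. A minimal Python sketch, assuming SHA-256 as the fingerprint (real tools may use other hashes or compare the bytes directly):

```python
import hashlib

def digests_match(data_a: bytes, data_b: bytes) -> bool:
    """True if the SHA-256 digests of the two byte strings are equal."""
    return hashlib.sha256(data_a).digest() == hashlib.sha256(data_b).digest()

original = b"Quarterly report, final version"
one_byte_changed = b"Quarterly report, final versiom"  # last byte differs

print(digests_match(original, original))          # True
print(digests_match(original, one_byte_changed))  # False
```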

--
David
Remove "farook" to reply
At the bottom of the application where it says
"sign here". I put "Sagittarius"
 
Sam
      8th May 2005
On Sun, 08 May 2005 23:32:12 +0930, David <(E-Mail Removed)>
wrote:

>On Sat, 07 May 2005 16:26:59 GMT, Sam <(E-Mail Removed)> typed
>furiously:
>
>>On Thu, 05 May 2005 22:37:06 +1000, Alex Peters
>><(E-Mail Removed)> wrote:
>>
>>>[big snip]

>
>DupeLocater is not about the content of file but about whether they
>are identical or not. One byte of difference makes files
>non-identical.


Which sadly is the point. That is why I was looking for "match data"
in my original post. If DupeLocater (just as an example) had reported
that the content of File X is 99.8% the same as File Y - I would have
more to go on.
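Assuming both files have already been reduced to plain text, Python's standard difflib module can produce this kind of "X% the same" figure. The sketch below only illustrates the idea; it is not a feature of DupeLocater or any other tool named in this thread, and the function name percent_similar is hypothetical.

```python
import difflib

def percent_similar(text_a: str, text_b: str) -> float:
    """Rough similarity score from 0 to 100, based on matching runs of characters."""
    # ratio() returns 2*M/T, where M is the number of matched characters
    # and T is the total length of both strings.
    matcher = difflib.SequenceMatcher(None, text_a, text_b, autojunk=False)
    return 100.0 * matcher.ratio()
```

SequenceMatcher is quadratic in the worst case, so on long documents it can be slow. Comparing lists of lines (text_a.splitlines() and text_b.splitlines()) instead of raw characters is a common way to speed it up, at the cost of a coarser score.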

In today's world of streaming information, we need to find 'similar'
files. But this implies that the tools contain the necessary mechanisms
to decipher file formats. This is particularly noticeable with a plain
text file versus an MS Word document.

I am sadly all too familiar with the computer's binary view of files,
and I don't really care about it (in this context). What I care about
is the user's view of files. If the text in a plain text file is
_identical_ to the text in a Word document, I think the tools should
report that in some way. The fact that their checksums are different
is not the deciding factor. It _is_ about the content.
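Here is a minimal sketch of that "user's view" comparison in Python. It assumes the text has already been pulled out of each file (for example by saving the Word document as plain text, or with a separate extraction tool, which is not shown). It then ignores case and layout and compares fingerprints of what remains. The function names are hypothetical.

```python
import hashlib
import re

def content_fingerprint(text: str) -> str:
    """Fingerprint of a document's words, ignoring layout and letter case."""
    # Collapse every run of whitespace (spaces, tabs, line breaks) to a single
    # space and lower-case everything, so formatting differences don't matter.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def same_content(text_a: str, text_b: str) -> bool:
    """True if the two extracted texts have the same normalized words."""
    return content_fingerprint(text_a) == content_fingerprint(text_b)
```

Because the fingerprint is taken after extraction and normalization, a .txt file and a .doc file holding the same words would match even though their raw bytes, sizes and checksums differ.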

That is why I made the point about Google DTS (Desktop Search). While
DupeLocater determined that the files were not 'identical,' Google
dutifully reported that the 'content', regardless of format, was found
in every file.

There are other tools and I'm still looking. I'm patient and not all
that old.

Sam
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
can I trust 'Easy Duplicate Finder?' And can anyone recommend areliable FREE duplicate file finder? alan Windows XP Help 5 6th Jan 2012 10:22 AM
Searching Dupes file finder for 2 different file sets (!!!) Frank Callone Windows XP General 0 24th Aug 2007 02:36 PM
Searching Dupes file finder for 2 different file sets (!!!) Frank Callone Windows XP Help 0 24th Aug 2007 02:36 PM
dupe file finder? Spoon2001 Freeware 13 1st Dec 2003 07:14 PM
Re: Directory Finder (As Opposed To File Finder) John Fitzsimons Freeware 0 13th Aug 2003 03:47 AM

