Duplicate file checker in C#?

  • Thread starter: rob

Does anyone know of a duplicate file checker project in C#? Couldn't
locate anything on CodeProject or SourceForge.

Has anyone here considered writing one?
 
What do you mean by duplicate file checker?

Do you want to compare the content of two files, or do you want to see if a file
exists in more than one place on a drive or drives?
Good luck with your project,

Otis Mukinfus
http://www.arltex.com
http://www.tomchilders.com
 
I should have said "Finder" rather than "Checker".

Dupe finders usually track down multiple copies of one file within a
set of folders. They are used for hunting down disk-hogging
duplicates of large files. Commercial/PD dupe-finders differ
primarily in the UI, but there are also variations in the method for
fingerprinting files (no assumptions are made that the names or dates
are identical). The usual approach is to identify files either by
computing an MD5 hash of each one, or by sorting by size and doing a
byte-by-byte compare (BTW, I can't see why the MD5 would be any
faster than byte-by-byte, except if more than two copies of one file
are present).
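
A minimal sketch of the two fingerprinting options mentioned above, for illustration only. The class and method names (FileFingerprint, Md5Hex, ContentsEqual) are made up for this example; the framework calls (MD5.Create, File.OpenRead, FileInfo.Length) are standard .NET.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class; names are illustrative, not from any existing project.
public static class FileFingerprint
{
    // Hex-encoded MD5 of a file's contents. The file is streamed,
    // so large files are not loaded into memory at once.
    public static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Byte-by-byte comparison of two files. Different lengths are rejected
    // immediately; otherwise reading stops at the first differing byte.
    public static bool ContentsEqual(string pathA, string pathB)
    {
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        using (var a = new BufferedStream(File.OpenRead(pathA)))
        using (var b = new BufferedStream(File.OpenRead(pathB)))
        {
            int byteA;
            do
            {
                byteA = a.ReadByte();
                int byteB = b.ReadByte();
                if (byteA != byteB) return false;
            } while (byteA != -1); // -1 means end of stream
            return true;
        }
    }
}
```

For a single pair of files, the byte-by-byte compare can stop early at the first difference, while the hash always reads both files in full. With several same-sized candidates, hashing reads each file once, whereas pairwise compares may reread files. Matching hashes are strong evidence but not proof of identical content, so a final ContentsEqual check before deleting anything seems prudent.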

So it's a matter of recursing through folder structures, logging
files, then finding out if they are duplicates. The process after
that is usually where things are missing. Everyone has their own
ideas about how to deal with the dupes after they are located.
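
The recurse, log, and compare steps could be sketched roughly as follows. This is an assumption-laden outline, not a finished tool: DupeFinder, FindDuplicates, and HashOf are hypothetical names, files are bucketed by size first and then by MD5, and what to do with the dupes afterwards is deliberately left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical sketch; names are illustrative only.
public static class DupeFinder
{
    // Returns groups of file paths under root whose size and MD5 hash match.
    // Every returned group contains at least two paths.
    public static List<List<string>> FindDuplicates(string root)
    {
        // Step 1: recurse through the folder tree and bucket files by size.
        // Files with a unique size cannot have a duplicate, so they are dropped.
        var sizeBuckets = Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .GroupBy(path => new FileInfo(path).Length)
            .Where(bucket => bucket.Count() > 1);

        var result = new List<List<string>>();
        foreach (var bucket in sizeBuckets)
        {
            // Step 2: inside each size bucket, group by content hash.
            var hashGroups = bucket
                .GroupBy(path => HashOf(path))
                .Where(group => group.Count() > 1);

            foreach (var dupes in hashGroups)
                result.Add(dupes.OrderBy(p => p, StringComparer.Ordinal).ToList());
        }
        return result;
    }

    // Hex-encoded MD5 of the file contents, read as a stream.
    private static string HashOf(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
            return BitConverter.ToString(md5.ComputeHash(stream));
    }
}
```

One caveat: Directory.EnumerateFiles with SearchOption.AllDirectories throws if it reaches a folder it cannot read, so a real scan over whole drives would probably need its own recursion with error handling around each folder.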

Given the need to customize the UI, I thought this would be one of the
most-hacked types of programs out there, but I found nothing in C# on
SourceForge.

By the way, my own interest is just for my own use, not for any
commercial endeavor. It would be a cool thing to post as a community
effort, so I was surprised it had not been done.
 
I was interested when I saw your post because a co-worker of mine has been given
a similar task. His assignment was to write something that compares the files
on two servers to determine if both have the same set of files. Actually I'm
glad he got the task rather than me. I think he will probably use the FileInfo
and DirectoryInfo classes to find duplicate names, then as you say decide how to
determine if files with the same name truly are the same file. After that he'll
have to figure out which is the correct one.
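
For what it's worth, the name-matching step with DirectoryInfo and FileInfo might look something like the sketch below. It is only a guess at the approach described: TreeDiff, TreeCompare, and Compare are hypothetical names, both roots are assumed to be reachable as ordinary paths (for example, mapped shares), and names are compared case-sensitively as a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical result holder; relative paths fall into one of three lists.
public sealed class TreeDiff
{
    public List<string> OnlyInA = new List<string>();
    public List<string> OnlyInB = new List<string>();
    public List<string> SizeMismatch = new List<string>();
}

public static class TreeCompare
{
    // Maps each file's path relative to root to its FileInfo.
    private static Dictionary<string, FileInfo> Index(string root)
    {
        var dir = new DirectoryInfo(root);
        string prefix = dir.FullName.TrimEnd(Path.DirectorySeparatorChar)
                        + Path.DirectorySeparatorChar;

        // Assumption: ordinal (case-sensitive) name matching.
        var index = new Dictionary<string, FileInfo>(StringComparer.Ordinal);
        foreach (FileInfo file in dir.GetFiles("*", SearchOption.AllDirectories))
            index[file.FullName.Substring(prefix.Length)] = file;
        return index;
    }

    // Reports files missing on either side and same-named files whose sizes differ.
    public static TreeDiff Compare(string rootA, string rootB)
    {
        Dictionary<string, FileInfo> a = Index(rootA);
        Dictionary<string, FileInfo> b = Index(rootB);
        var diff = new TreeDiff();

        foreach (KeyValuePair<string, FileInfo> pair in a)
        {
            FileInfo other;
            if (!b.TryGetValue(pair.Key, out other))
                diff.OnlyInA.Add(pair.Key);
            else if (other.Length != pair.Value.Length)
                diff.SizeMismatch.Add(pair.Key);
        }
        foreach (string name in b.Keys)
            if (!a.ContainsKey(name))
                diff.OnlyInB.Add(name);

        return diff;
    }
}
```

Equal sizes do not prove that two same-named files are identical, so files that pass this check would still need a hash or byte-by-byte compare before being treated as the same file.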

As for your project, it sounds like you have the methodology worked out.
Time to start coding ;o)
Good luck with your project,

Otis Mukinfus
 
If he doesn't need to do that in C#, he could use "Beyond Compare"
(www.ScooterSoftware.com), an excellent folder comparison program.
There may be a way to use it from C# using its plugin interface, but I
haven't tried that.

I need to do a generalized global search, and I can't count on the
file names being the same, so I can't go that route. Looks like I'll
have to write mine from the ground up. Amazing that there's no C#
code available for this.
 
Thanks, Rob. I'll pass that on to him.
Good luck with your project,

Otis Mukinfus
 
