Mahmoud,
I hate to say this, but in the time it probably took to write this post,
you could easily have generated numbers that show the performance profile
for your particular case.
I would look at the Stopwatch class, and then start testing to see how
long it would take to perform each operation. The operations themselves, as
well as the code to perform the timing, aren't difficult at all.
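As a rough illustration of the timing code, here is a minimal sketch built on System.Diagnostics.Stopwatch. The Timing class, the Measure method, and the iteration count are made-up names for this example, not anything from the framework; the idea is just to run each candidate operation many times and compare the totals.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Hypothetical helper: runs `action` once as a warm-up (so JIT
    // compilation is not counted), then times `iterations` further runs.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();

        return sw.Elapsed;
    }
}
```

You would call it once per approach, for example Timing.Measure(() => CompareByHash(a, b), 10000) and Timing.Measure(() => CompareByBytes(a, b), 10000), where CompareByHash and CompareByBytes stand for whatever methods you write, and then compare the two elapsed times on your own files.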
If I had to guess, for files that are 1024 bytes, it is probably easiest
to just loop through them and check whether any of the bytes differ. That
would probably be much faster than hashing both files, since the hash has
to process every byte anyway, whereas the loop can stop as soon as it finds
a difference between the two files.
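For the small-file case, the loop could look something like the sketch below. SmallFileComparer, BytesEqual, and FilesEqual are names invented for this example. It reads both files fully with File.ReadAllBytes, which is only reasonable because the files are around 1 KB, and it checks the lengths first since files of different sizes cannot match.

```csharp
using System.IO;

static class SmallFileComparer
{
    // Returns true only if both arrays have the same length and contents.
    public static bool BytesEqual(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
        {
            return false; // different sizes can never match
        }

        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                return false; // stop at the first differing byte
            }
        }

        return true;
    }

    // Assumes both files are small enough to load into memory at once.
    public static bool FilesEqual(string pathA, string pathB)
    {
        return BytesEqual(File.ReadAllBytes(pathA), File.ReadAllBytes(pathB));
    }
}
```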
Even in the 512 KB case, you might want to use the method that loops
through two streams. This is an important point: make sure you do not load
the entire contents of the two files into memory. For the small files, it's
no big deal, but for large files, you are going to take a hit trying to load
all of that into memory. By reading the files in chunks and comparing them
chunk by chunk, you are going to make the process much more efficient.
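A chunked comparison along those lines might be sketched as follows. StreamComparer, its method names, and the 4096-byte buffer size are all assumptions made for this example, so tune the buffer size against your own measurements. Note that Stream.Read is allowed to return fewer bytes than requested, so the sketch fills each buffer in a loop before comparing.

```csharp
using System.IO;

static class StreamComparer
{
    const int BufferSize = 4096; // assumed chunk size, not a recommendation

    public static bool FilesEqual(string pathA, string pathB)
    {
        using (FileStream a = File.OpenRead(pathA))
        using (FileStream b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length)
            {
                return false; // cheap check before reading any data
            }
            return StreamsEqual(a, b);
        }
    }

    public static bool StreamsEqual(Stream a, Stream b)
    {
        byte[] bufA = new byte[BufferSize];
        byte[] bufB = new byte[BufferSize];

        while (true)
        {
            int readA = FillBuffer(a, bufA);
            int readB = FillBuffer(b, bufB);

            if (readA != readB) return false; // one stream ended early
            if (readA == 0) return true;      // both ended, no difference found

            for (int i = 0; i < readA; i++)
            {
                if (bufA[i] != bufB[i]) return false;
            }
        }
    }

    // Reads until the buffer is full or the stream ends; returns bytes read.
    static int FillBuffer(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }
}
```

Only two buffers are ever held in memory, regardless of how large the files are.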
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
"Mahmoud Al-Qudsi" wrote in message:
> I'm looking to compare the contents of two files. Files will generally
> not exceed 1024 *bytes* in length.
> Given this info, and assuming that the accuracy/reliability of SHA1 is
> more than enough, is it more efficient to
>
> a) Use System.Security.Cryptography and get the SHA1 of each binary
> file and compare the two hashes
> b) Create a byte-by-byte checker that loops through the two files and
> exits with a false when a byte doesn't match in the same location
> between the two files?
>
> Generally speaking, I'd use the second method when dealing with
> anything larger than 512kb, expecting it to take less resources/time.
>
> However, in the case of such small files, is SHA1 a better-performing
> alternative? What about MD5?
> Assuming 99% of the time the two files will match, is MD5's limited
> reliability enough to determine whether the two files are a match? Is
> the performance difference between MD5 and SHA1 worth going with MD5
> or am I better off sticking with the latter?
>
> I'm guessing MD5 is good enough, that SHA1 takes a lot longer, and
> that it won't matter since byte-by-byte is more efficient and faster
> code (assuming it's programmed half-decently of course)... But I'd
> like to make sure since I'm looking for a minimal hit on system
> resources.
>
> Thanks!
>