Byte Array Comparison Not Accurate - MD5CryptoServiceProvider

Guest

I picked up the following code, posted by an MVP in some newsgroup. I am using
the code to compare Excel files. It works well when the changes are substantial, but
when the difference between the Excel files is minor (for instance, if I
change only one cell, or fewer than 10 cells, in one file), the comparison fails to
pick up the differences. Any thoughts? (Code below)

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    Dim abyt1() As Byte = {12, 55, 88, 32}
    Dim abyt2() As Byte = {12, 55, 88, 32}
    Dim fs As IO.FileStream

    fs = New IO.FileStream("File1.xls", IO.FileMode.Open)
    ReDim abyt1(fs.Length)

    fs = New IO.FileStream("File2.xls", IO.FileMode.Open)
    ReDim abyt2(fs.Length)

    Dim IsDifferent As String = CType(ArrayDif(abyt1, abyt2), String)
    System.Windows.Forms.MessageBox.Show(IsDifferent)

End Sub

Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
    Dim Hash1() As Byte = New MD5CryptoServiceProvider().ComputeHash(array1)
    Dim Hash2() As Byte = New MD5CryptoServiceProvider().ComputeHash(array2)
    For i As Int64 = 0 To Math.Min(Hash1.Length, Hash2.Length) - 1
        If Hash1(i) <> Hash2(i) Then
            Return False
            Exit Function
        End If
    Next
    Return True
End Function
 
Patrice

Have you also checked the length of those arrays? (AFAIK they should be of
fixed size.) If they can differ in size, your loop will treat the arrays as
equal whenever the smaller one matches the start of the bigger one, because it
only compares up to the length of the smaller one.

Otherwise it looks like you found a collision. The hash value is not the real
thing, so distinct inputs can give the same hash value.

Have you tried the direct approach? Do you have a problem with it?
(Also keep in mind that you can begin by checking the file sizes, and that you
have to perform a byte-by-byte comparison only if they match in length.)

I would avoid this kind of hack, which tends to introduce subtle problems...

Knowing what you are trying to do may also help (for example, xcopy and robocopy
also use the file timestamp).
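
As an illustration of the direct approach described above, here is a minimal sketch that checks the file sizes first and only then compares byte by byte. The module name, the helper name FilesAreEqual, and the file names are assumptions made for this example; it also assumes the files are small enough to load into memory.

```vb
Imports System
Imports System.IO

Module DirectCompare
    ' Hypothetical helper: returns True when both files have identical contents.
    Function FilesAreEqual(ByVal path1 As String, ByVal path2 As String) As Boolean
        ' Cheap check first: files of different length cannot be identical.
        If New FileInfo(path1).Length <> New FileInfo(path2).Length Then
            Return False
        End If

        Dim bytes1() As Byte = File.ReadAllBytes(path1)
        Dim bytes2() As Byte = File.ReadAllBytes(path2)

        ' Guard again in case a file changed between the two reads.
        If bytes1.Length <> bytes2.Length Then
            Return False
        End If

        ' Byte-by-byte comparison; stop at the first mismatch.
        For i As Integer = 0 To bytes1.Length - 1
            If bytes1(i) <> bytes2(i) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(FilesAreEqual("File1.xls", "File2.xls"))
    End Sub
End Module
```

Unlike a hash comparison, this cannot report two different files as equal, at the cost of holding both files in memory.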
 
Patrice

Also, I suppose you omitted the code that reads the Excel files into the
byte arrays?

I have since found http://www.fastsum.com/. You could give it a try to see
whether you actually have two files with the same hash value...
 
Guest

If you change just one bit in the file, you should get a hash code that
is completely different, so it sounds strange if you manage to change
several bytes without getting a difference.

Show the code that you are actually using instead. The code that you
showed doesn't even read the files.
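
To see this for yourself, a small sketch like the following hashes two byte arrays that differ in a single bit and prints both MD5 values. The module name and the sample data are made up for the example; no particular output is claimed here beyond the two printed hashes differing.

```vb
Imports System
Imports System.Security.Cryptography

Module AvalancheDemo
    Sub Main()
        Dim data1() As Byte = {12, 55, 88, 32}
        ' Same data, but the last byte differs in its lowest bit (32 vs 33).
        Dim data2() As Byte = {12, 55, 88, 33}

        Using hasher As MD5 = MD5.Create()
            ' Print each 16-byte hash as hyphen-separated hex.
            Console.WriteLine(BitConverter.ToString(hasher.ComputeHash(data1)))
            Console.WriteLine(BitConverter.ToString(hasher.ComputeHash(data2)))
        End Using
    End Sub
End Module
```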
 
Guest

Thanks for all your responses. Odd, because I actually DID include the
code to read in the files. Anyway, I managed to fix the problem... apparently
I wasn't reading the files into the arrays properly. For the benefit of
anyone with the same problem, here's the code that works.

' At the top of the form's source file:
Imports System.IO
Imports System.Security.Cryptography

Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    Dim abyt1() As Byte
    Dim abyt2() As Byte
    Dim fs As IO.FileStream
    Dim fs1 As IO.FileStream

    ' Read the whole of File1.xls into abyt1.
    fs = New IO.FileStream("File1.xls", IO.FileMode.Open)
    Dim reader1 As BinaryReader = New BinaryReader(fs)
    ' ReDim takes the upper bound, so Length - 1 gives exactly Length elements.
    ReDim abyt1(CInt(fs.Length) - 1)
    Dim iCount1 As Integer = reader1.Read(abyt1, 0, CInt(fs.Length))
    reader1.Close() ' also closes the underlying stream

    ' Read the whole of File2.xls into abyt2.
    fs1 = New IO.FileStream("File2.xls", IO.FileMode.Open)
    Dim reader2 As BinaryReader = New BinaryReader(fs1)
    ReDim abyt2(CInt(fs1.Length) - 1)
    Dim iCount2 As Integer = reader2.Read(abyt2, 0, CInt(fs1.Length))
    reader2.Close()

    ' Note: ArrayDif returns True when the hashes match,
    ' so "True" here means no difference was detected.
    Dim IsDifferent As String = CType(ArrayDif(abyt1, abyt2), String)
    System.Windows.Forms.MessageBox.Show(IsDifferent)
End Sub

Public Function ArrayDif(ByVal array1() As Byte, ByVal array2() As Byte) As Boolean
    Dim Hash1() As Byte = New MD5CryptoServiceProvider().ComputeHash(array1)
    Dim Hash2() As Byte = New MD5CryptoServiceProvider().ComputeHash(array2)

    If Hash1.Length <> Hash2.Length Then
        Return False
    Else
        ' Compare the two hashes byte by byte.
        For i As Integer = 0 To Hash1.Length - 1
            If Hash1(i) <> Hash2(i) Then
                Return False
            End If
        Next
    End If
    Return True
End Function
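
A possible variation on the code above, offered only as a sketch: ComputeHash also accepts a stream, so the files do not have to be read into byte arrays by hand, which avoids the array-sizing step entirely. The module name and the helper names FileHash and HashesMatch are assumptions made for this example, not part of the original code.

```vb
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module StreamHashCompare
    ' Hypothetical helper: MD5 hash of a file, computed directly from its stream.
    Function FileHash(ByVal filePath As String) As Byte()
        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
            Using hasher As MD5 = MD5.Create()
                Return hasher.ComputeHash(fs)
            End Using
        End Using
    End Function

    ' Hypothetical helper: True when both files produce the same MD5 hash.
    Function HashesMatch(ByVal path1 As String, ByVal path2 As String) As Boolean
        Dim hash1() As Byte = FileHash(path1)
        Dim hash2() As Byte = FileHash(path2)
        ' MD5 hashes are always 16 bytes, so both arrays have the same length.
        For i As Integer = 0 To hash1.Length - 1
            If hash1(i) <> hash2(i) Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        Console.WriteLine(HashesMatch("File1.xls", "File2.xls"))
    End Sub
End Module
```

The Using blocks close the streams even if an exception is thrown, and the whole file never needs to be held in memory at once.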
 
