Aren't All MD5 Hashes the Same?

J

Jonathan Wood

Greetings,

I can't seem to find a solution to this.

According to Google's Safe Browsing API, the following code should produce a
matching base64-encoded checksum.

string clientKey = "8eirwN1kTwCzgWA2HxTaRQ==";
string table =
    "+8070465bdf3b9c6ad6a89c32e8162ef1\t\n+86fa593a025714f89d6bc8c9c5a191ac\t\n+bbbd7247731cbb7ec1b3a5814ed4bc9d\t\n";
string mac = "dRalfTU+bXwUhlk0NCGJtQ==";

byte[] results = Convert.FromBase64String(clientKey);
StringBuilder sb = new StringBuilder(32);
foreach (byte b in results)
    sb.Append(b.ToString("x2"));
string key = sb.ToString();

string s = String.Format("{0}:coolgoog:{1}:coolgoog:{0}", key, table);
MD5 md5 = MD5.Create();
results = md5.ComputeHash(Encoding.Default.GetBytes(s));
s = Convert.ToBase64String(results);

if (s != mac)
    s = "Whoops! Doesn't match!";

So, it appears I have something wrong. However, I found someone who
apparently resolved this same problem at
http://stackoverflow.com/questions/181994/code-to-verify-updates-from-the-google-safe-browsing-api.
While it appears they are doing the same thing I'm doing, they do it in
another language (Python?). Shouldn't an MD5 checksum be the same regardless
of the language?

Can anyone see what I've missed?

Thanks.

Jonathan
 
B

Ben Voigt [C++ MVP]

If the IV and data are the same then the result of the MD5 operation should
also be the same. But are you sure the data is the same? Your string
handling looks very wrong to me. Use array concatenation instead of trying
to use string.Format on this binary data.
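
The point that identical input bytes always give an identical digest can be checked against a published test vector. The sketch below is only an illustration, not code from the thread: it hashes the three ASCII bytes of "abc" and prints the digest in hex. RFC 1321 lists 900150983cd24fb0d6963f7d28e17f72 for that input, so a correct MD5 implementation in any language should print the same value. The names Md5VectorDemo and HexDigest are made up for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class Md5VectorDemo
{
    // Returns the lowercase hex form of the MD5 digest of the given bytes.
    public static string HexDigest(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(data);
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // The three ASCII bytes 0x61 0x62 0x63, a test input from RFC 1321.
        byte[] input = Encoding.ASCII.GetBytes("abc");
        Console.WriteLine(HexDigest(input)); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```

If two programs disagree on a digest, the usual cause is that they fed different bytes to the hash, not that the MD5 implementations differ.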

 
J

Jonathan Wood

Hi Ben,

Ben Voigt said:
If the IV and data are the same then the result of the MD5 operation
should also be the same. But are you sure the data is the same? Your
string handling looks very wrong to me, use array concatenation instead of
trying to use string.Format on this binary data.

I appreciate the input, but it's a little short on details.

Can you be more specific about what is wrong with String.Format? I don't
understand the problem and all you've said is that it looks wrong.

Thanks.

Jonathan

 
J

Jonathan Wood

BTW, none of my arguments to String.Format are binary--only strings.

Jonathan

 
B

Ben Voigt [C++ MVP]

Jonathan said:
Hi Ben,


I appreciate the input, but it's a little short on details.

Can you be more specific about what is wrong with String.Format? I
don't understand the problem and all you've said is that it looks
wrong.

What's wrong with string.Format is that it works on strings. You don't have
strings, you have binary data in byte arrays, and your attempt to convert
binary into strings and back is totally broken.
 
J

Jonathan Wood

Hi Ben,

Ben Voigt said:
What's wrong with string.Format is that it works on strings. You don't
have strings, you have binary data in byte arrays, and your attempt to
convert binary into strings and back is totally broken.

Eh?

If you check the code, my two arguments to String.Format (besides the format
string) are key and table. table is declared as a string and initialized
with a string. key is also declared as a string and constructed using the
following block:

byte[] results = Convert.FromBase64String(clientKey);
StringBuilder sb = new StringBuilder(32);
foreach (byte b in results)
    sb.Append(b.ToString("x2"));
string key = sb.ToString();

So, where is this binary data you are talking about?

Jonathan
 
B

Ben Voigt [C++ MVP]

Jonathan said:
Hi Ben,
What's wrong with string.Format is that it works on strings. You
don't have strings, you have binary data in byte arrays, and your
attempt to convert binary into strings and back is totally broken.

Eh?

If you check the code, my two arguments to String.Format (besides the
format string) are key and table. table is declared as a string and
initialized with a string. key is also declared as a string and
constructed using the following block:

byte[] results = Convert.FromBase64String(clientKey);
StringBuilder sb = new StringBuilder(32);
foreach (byte b in results)
    sb.Append(b.ToString("x2"));
string key = sb.ToString();

So, where is this binary data you are talking about?

byte[] results

That block you show tries to convert binary data into a string, but does it
wrong. You should not use a string to hold binary data, and if you try, you
need to use the same logic to convert in each direction. But you've used a
handcoded loop to go binary->string and Encoding.Default.GetBytes() to go
string->binary and these are not compatible.

Get rid of the strings. Encryption and cryptographic hashes like MD5 work
on blocks of octets, not strings.

If it was safe to use a string to hold binary data, don't you think that
Convert.FromBase64String would have returned a string? They chose byte[]
instead of string for a very good reason; you're trying to change that design
decision without trying to understand the rationale. Encoding.GetBytes (and
its counterpart Encoding.GetString) work on text, not binary data. They are
allowed to do things like merge Unicode combining characters together, which
will do horrible, horrible things to your MD5 check.

String.Format can be replaced by a small number (about five) of calls to
Array.Copy, which will probably run faster, and most importantly you'll be
running MD5 using the value stored as Base64 in clientKey, not some mangled
copy which has been encoded and decoded by functions that don't round-trip.
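
To make the mismatch concrete, here is a minimal sketch (not from the thread) that compares the decoded key with the bytes of its hex text. It assumes Encoding.UTF8 as a stand-in for Encoding.Default; for hex digits, which are plain ASCII, UTF-8 and the common ANSI code pages produce the same bytes. KeyBytesDemo and ToHex are hypothetical names; ToHex repeats the loop from the original post.

```csharp
using System;
using System.Text;

public static class KeyBytesDemo
{
    // Hex-encodes bytes the same way as the loop in the original post.
    public static string ToHex(byte[] data)
    {
        StringBuilder sb = new StringBuilder(data.Length * 2);
        foreach (byte b in data)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    public static void Main()
    {
        byte[] rawKey = Convert.FromBase64String("8eirwN1kTwCzgWA2HxTaRQ==");
        string hexKey = ToHex(rawKey);
        byte[] hexAsBytes = Encoding.UTF8.GetBytes(hexKey);

        Console.WriteLine(rawKey.Length);     // 16 key bytes
        Console.WriteLine(hexKey.Length);     // 32 hex characters
        Console.WriteLine(hexAsBytes.Length); // 32 bytes, one per hex character
        Console.WriteLine(Encoding.Unicode.GetBytes(hexKey).Length); // 64 bytes as UTF-16

        // First raw byte versus the first byte of its hex text ('f').
        Console.WriteLine("{0:x2} vs {1:x2}", rawKey[0], hexAsBytes[0]); // f1 vs 66
    }
}
```

Hashing hexAsBytes instead of rawKey feeds MD5 twice as many bytes with entirely different values, so the digest cannot match one computed over the raw key.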
 
B

Ben Voigt [C++ MVP]

rossum said:
This code works for me. I have written it for clarity rather than
speed or efficiency:

The correct size of the result in Concatenate is trivial to calculate, so
using List isn't really a big advantage. But this is exactly the direction
I was trying to point Jonathan.

/// Concatenates the supplied byte arrays into one.
static byte[] Concatenate(params byte[][] bArrays) {
    // First pass: total length of all inputs.
    int combinedLength = 0;
    foreach (byte[] bAry in bArrays) {
        combinedLength += bAry.Length;
    }
    byte[] result = new byte[combinedLength];
    // Second pass: combinedLength is reused as the write offset.
    combinedLength = 0;
    foreach (byte[] bAry in bArrays) {
        int elemLength = bAry.Length;
        Array.Copy(bAry, 0, result, combinedLength, elemLength);
        combinedLength += elemLength;
    }
    return result;
} // end Concatenate()

rossum said:
// --- Begin Code ---

using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public static class MacCheck {

    public static void Main() {

        // Supplied data.
        string clientKey = "8eirwN1kTwCzgWA2HxTaRQ==";
        string table = "+8070465bdf3b9c6ad6a89c32e8162ef1\t\n" +
                       "+86fa593a025714f89d6bc8c9c5a191ac\t\n" +
                       "+bbbd7247731cbb7ec1b3a5814ed4bc9d\t\n";
        string mac = "dRalfTU+bXwUhlk0NCGJtQ==";
        string coolgoog = ":coolgoog:";

        // Convert input strings to byte arrays.
        byte[] keyBytes = Convert.FromBase64String(clientKey);
        byte[] coolBytes = System.Text.Encoding.UTF8.GetBytes(coolgoog);
        byte[] tableBytes = System.Text.Encoding.UTF8.GetBytes(table);

        // Concatenate the arrays in the right order.
        byte[] theLot = Concatenate(keyBytes, coolBytes,
                                    tableBytes, coolBytes, keyBytes);

        // Calculate the MD5 hash.
        MD5 md5 = MD5.Create();
        byte[] hash = md5.ComputeHash(theLot);

        // Check the hash against the supplied MAC.
        String s = Convert.ToBase64String(hash);
        if (s != mac) { s = "Whoops! Doesn't match!"; }

        Console.WriteLine(s);
    } // end Main()

    /// Concatenates the supplied byte arrays into one.
    static byte[] Concatenate(params byte[][] bArrays) {
        List<byte> result = new List<byte>();
        foreach (byte[] bAry in bArrays) {
            result.AddRange(bAry);
        }
        return result.ToArray();
    } // end Concatenate()

} // end class MacCheck

// --- End Code ---

rossum
 
J

Jonathan Wood

Hi Rossum,

rossum said:
byte[] results = Convert.FromBase64String(clientKey);
You have done well up to here. Think about how many bytes there are
in results[].

16.

rossum said:
StringBuilder sb = new StringBuilder(32);
foreach (byte b in results)
    sb.Append(b.ToString("x2"));
string key = sb.ToString();
Now think about how many bytes there are in key. Does this number
match the number of bytes in results[] above?

No, key is a hex representation of results. It's double the number of
characters (32). Being Unicode, I assume it's twice that number of bytes.

rossum said:
You are concatenating strings, which is easier than concatenating byte
arrays, but more error prone.

Okay, so I was running the hash on the hex representation of the data
rather than the data itself. I don't consider what I did "error prone" as to
the results produced by String.Format; it was just the wrong thing to be
doing the hash on.

Note that I didn't do it that way because it was easier: The entire API
deals with hex representations of MD5 hash codes and I guess I assumed
(incorrectly) that this would be handled the same way (the API docs aren't
very complete).

What's more, I googled around and found some samples doing exactly the same
thing I was (but not working right)! So I really got off on the wrong track.

rossum said:
Better to concatenate byte arrays since
you will eventually be hashing a byte array.

Yep, your code works great. You obviously loaded it into Visual Studio and
worked through it, and it is very much appreciated!

Many thanks.

Jonathan
 
J

Jonathan Wood

Hi Ben,

Jonathan Wood said:
So, where is this binary data you are talking about?

Ben Voigt said:
byte[] results

That block you show tries to convert binary data into a string, but does
it wrong. You should not use a string to hold binary data, and if you
try, you need to use the same logic to convert in each direction. But
you've used a handcoded loop to go binary->string and
Encoding.Default.GetBytes() to go string->binary and these are not
compatible.

Okay, I appreciate you working with me on this. I'm very comfortable with
String.Format and was producing expected results with it--that's why I was
having trouble understanding you. Note that the API I'm working with uses
this approach extensively (all the data tables returned from the API are hex
representations of MD5 hashes).

The problem was that I needed to hash the bytes and not the hex
representation of those bytes.

Ben Voigt said:
If it was safe to use a string to hold binary data, don't you think that
Convert.FromBase64String would have returned a string? They chose byte[]
instead of string for a very good reason, you're trying to change that
design decision without trying to understand the rationale.

Well, again, if you generate a hex representation of the binary data, then
I'm not seeing any inherent problem with that. It is text in every sense at
that point. Again, it's just that it was not the data the algorithm needed.

Thanks.

Jonathan
 
B

Ben Voigt [C++ MVP]

Ben Voigt said:
If it was safe to use a string to hold binary data, don't you think
that Convert.FromBase64String would have returned a string? They chose
byte[] instead of string for a very good reason, you're trying to
change that design decision without trying to understand the
rationale.

Jonathan Wood said:
Well, again, if you generate a hex representation of the binary data,
then I'm not seeing any inherent problem with that. It is text in
every sense at that point. Again, it's just that it was not the data
the algorithm needed.

Well no, it's not text. The binary data is just a bunch of bits. When you
put them in a string they take on *meaning*. String manipulation functions
don't treat them as just a bunch of bits; they do things like combine
characters into ligatures, merge combining accents with the following
character, etc. All these transformations result in the same text but
different bits. Since MD5 is calculated from the bits, allowing these
transformations causes it to break.

Using ASCII characters to hold the values in text is safe from these Unicode
transformations, but it is itself a transformation and the bits that the MD5
hash function sees are different.

The bits in 0xABCD are different from the bits in "ABCD". So the MD5 hash
is different.
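
A small sketch of that last point, offered as an illustration rather than as code from the thread: it hashes the two raw bytes 0xAB 0xCD and the four ASCII bytes of the text "ABCD" and compares the digests. BitsVersusTextDemo and Md5Base64 are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class BitsVersusTextDemo
{
    // Base64 form of the MD5 digest of the given bytes.
    public static string Md5Base64(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
            return Convert.ToBase64String(md5.ComputeHash(data));
    }

    public static void Main()
    {
        byte[] raw = { 0xAB, 0xCD };                   // two bytes
        byte[] text = Encoding.ASCII.GetBytes("ABCD"); // four bytes: 0x41 0x42 0x43 0x44

        Console.WriteLine(BitConverter.ToString(raw));  // AB-CD
        Console.WriteLine(BitConverter.ToString(text)); // 41-42-43-44
        Console.WriteLine(Md5Base64(raw) == Md5Base64(text)); // False
    }
}
```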
 
J

Jonathan Wood

Hi Ben,

Ben Voigt said:
The bits in 0xABCD are different from the bits in "ABCD". So the MD5 hash
is different.

Of course. But, for all programming purposes, I consider "ABCD" to be text,
and fully compatible with strings.

Jonathan
 
B

Ben Voigt [C++ MVP]

Jonathan said:
Hi Ben,


Of course. But, for all programming purposes, I consider "ABCD" to be
text, and fully compatible with strings.

Ok, bad wording.

The bits in { 0xAB, 0xCD } may be different from the bits in "\xAB\xCD"
because the latter is a string and may have text transformations performed
on it.
 
J

Jeff Johnson

Ben Voigt said:
Ok, bad wording.

The bits in { 0xAB, 0xCD } may be different from the bits in "\xAB\xCD"
because the latter is a string and may have text transformations performed
on it.

Because .NET uses Unicode exclusively in strings, I would say that in memory
the bits are most definitely different, because "\xAB\xCD" would be 00AB00CD
in memory, right?
 
J

Jeff Johnson

Jeff Johnson said:
Because .NET uses Unicode exclusively in strings, I would say that in
memory the bits are most definitely different, because "\xAB\xCD" would be
00AB00CD in memory, right?

(Assuming ISO-8859-1 or Windows 1252, that is, which I'm sure is what
Jonathan is set up for.)
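
Which bytes the two-character string turns into depends on the encoding you ask for. The sketch below is an illustration only, with the hypothetical names StringBytesDemo and Dump; it dumps the same string under four encodings. The 00AB00CD layout corresponds to big-endian UTF-16, while Encoding.Unicode in .NET is little-endian UTF-16.

```csharp
using System;
using System.Text;

public static class StringBytesDemo
{
    // Dash-separated hex dump of the string under the given encoding.
    public static string Dump(string s, Encoding enc)
    {
        return BitConverter.ToString(enc.GetBytes(s));
    }

    public static void Main()
    {
        // \u escapes are used because C#'s \x escape takes a variable number of hex digits.
        string s = "\u00AB\u00CD";

        Console.WriteLine(Dump(s, Encoding.BigEndianUnicode));          // 00-AB-00-CD
        Console.WriteLine(Dump(s, Encoding.Unicode));                   // AB-00-CD-00
        Console.WriteLine(Dump(s, Encoding.UTF8));                      // C2-AB-C3-8D
        Console.WriteLine(Dump(s, Encoding.GetEncoding("iso-8859-1"))); // AB-CD
    }
}
```

Only the ISO-8859-1 dump reproduces the original two bytes, and only because both values happen to map one-to-one in that code page.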
 
B

Ben Voigt [C++ MVP]

Jeff said:
Because .NET uses Unicode exclusively in strings, I would say that in
memory the bits are most definitely different, because "\xAB\xCD"
would be 00AB00CD in memory, right?

See, you're making an assumption that codepoints are preserved. But there
is no such expectation. If either of those hexadecimal values represents a
codepoint for a combining accent, then "\xAB\xCD" might turn into EF12 in
memory (just one Unicode codepoint now, bits not related to the original
bits in any mathematical way).

Strings are not good for storing binary data. You can encode the binary
into the ASCII characters for hexadecimal digits, or Base64, or several
other options which are guaranteed not to be transformed. But then don't
forget to undo the transformation before using the data. And don't rely on
casting of individual Unicode codepoints to be a reversible transformation.
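
As a rough sketch of the safe options mentioned above (an illustration, not code from the thread), the following encodes a byte array as Base64 and as hex text, then undoes each transformation and checks that the original bytes come back. RoundTripDemo, ToHex and FromHex are hypothetical helpers; FromHex assumes an even-length string of valid hex digits.

```csharp
using System;
using System.Linq;
using System.Text;

public static class RoundTripDemo
{
    // Encodes each byte as two lowercase hex digits.
    public static string ToHex(byte[] data)
    {
        StringBuilder sb = new StringBuilder(data.Length * 2);
        foreach (byte b in data)
            sb.Append(b.ToString("x2"));
        return sb.ToString();
    }

    // Decodes a hex string (even length, valid hex digits) back to bytes.
    public static byte[] FromHex(string hex)
    {
        byte[] result = new byte[hex.Length / 2];
        for (int i = 0; i < result.Length; i++)
            result[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return result;
    }

    public static void Main()
    {
        byte[] original = { 0x00, 0xAB, 0xCD, 0xFF };

        byte[] viaBase64 = Convert.FromBase64String(Convert.ToBase64String(original));
        byte[] viaHex = FromHex(ToHex(original));

        Console.WriteLine(original.SequenceEqual(viaBase64)); // True
        Console.WriteLine(original.SequenceEqual(viaHex));    // True
    }
}
```

The text form is fine for storage or transport; the decoded bytes are what should be passed to the hash.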
 
