Encodings and MD5


John Puopolo

All:

I have a string that I need to hash using MD5. Since I am using C#,
my original, plain string is in Unicode, e.g.:

string clearText = "foobar";

To use the MD5 hash function, I need to turn my string into a byte
array. However, I have choices:

// Candidate encoders; each one can map the same string to different bytes
ASCIIEncoding encoder1 = new ASCIIEncoding();
UnicodeEncoding encoder2 = new UnicodeEncoding(); // UTF-16, little-endian
UTF7Encoding encoder3 = new UTF7Encoding();
UTF8Encoding encoder4 = new UTF8Encoding();

// Convert given string to bytes required as input by MD5
byte[] clearBytes1 = encoder1.GetBytes(clearText);
byte[] clearBytes2 = encoder2.GetBytes(clearText);
byte[] clearBytes3 = encoder3.GetBytes(clearText);
byte[] clearBytes4 = encoder4.GetBytes(clearText);

A simple test shows that these conversions all do slightly different
things, so the subsequent hash returns different values. Is there a
"right" one to choose to do a "standard" MD5 hash?

Thanks,
John
 

Ben Rush

As to whether there is a standard encoding expected by those reading
an MD5 hash of character data, I have no answer. My guess is that there
isn't, though I could be wrong. You may just want to think about how your
hash will be used. For example, are you the only consumer? If so, who cares
what encoding you use, as long as you use it consistently? Are you going to
be using non-ASCII characters (characters outside the 7-bit ASCII range)?
If so, do not use the ASCII encoding, as you'll be throwing away data: every
character it cannot represent is replaced with '?', so different strings can
end up with the same hash.
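
To make the point about ASCII concrete, here is a minimal sketch. The class name and sample strings are made up for illustration, and it assumes the default behaviour of Encoding.ASCII, which substitutes '?' for characters it cannot represent. Two different strings collapse to the same bytes under ASCII but stay distinct under UTF-8, so only the UTF-8 bytes would hash differently.

```csharp
using System;
using System.Text;

class AsciiLossDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Two different strings that differ only in a non-ASCII character.
        string a = "caf\u00E9"; // ends in e-acute
        string b = "caf\u00E8"; // ends in e-grave

        // Encoding.ASCII replaces anything above U+007F with '?' (0x3F).
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(a))); // 63-61-66-3F
        Console.WriteLine(BitConverter.ToString(Encoding.ASCII.GetBytes(b))); // 63-61-66-3F

        // UTF-8 keeps the two strings distinct.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(a))); // 63-61-66-C3-A9
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(b))); // 63-61-66-C3-A8
    }
}
```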

I do not think there is an exact answer to your question, as I think it's
highly dependent upon your situation. Even if you were to say, "I just want
to grab the bytes that make up the string," you must first define how that
string data is represented as bytes (in the runtime it is held as Unicode,
specifically UTF-16).
 

Jon Skeet [C# MVP]

John Puopolo said:
A simple test shows that these conversions all do slightly different
things, so the subsequent hash returns different values. Is there a
"right" one to choose to do a "standard" MD5 hash?

No, I don't believe there's a standard for it. What's going to be
reading these hashes? If it's only your own code, it doesn't matter
much, in which case I'd suggest picking UTF-8 (definitely that or
Unicode, i.e. UTF-16 - you want something which will give different
results for all Unicode characters).

If something else is going to be using the hash, you need to find out
what that will be expecting.
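
As a rough sketch of the suggestion above: the helper below (HashHex is a hypothetical name, not something from this thread) takes the encoding as a parameter, so the choice is explicit and both sides can agree on it. It uses MD5.Create() and ComputeHash from System.Security.Cryptography. Running it on "foobar" with UTF-8 and with UTF-16 should print two different hashes, since the input bytes differ.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

class Md5EncodingDemo // hypothetical name, for illustration only
{
    // Hash the bytes produced by the given encoding and return upper-case hex.
    static string HashHex(string text, Encoding encoding)
    {
        byte[] input = encoding.GetBytes(text);
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(input);
            // BitConverter yields "AB-CD-..."; strip the dashes.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    static void Main()
    {
        string clearText = "foobar";

        // Same string, different byte sequences, therefore different hashes.
        Console.WriteLine("UTF-8:  " + HashHex(clearText, Encoding.UTF8));
        Console.WriteLine("UTF-16: " + HashHex(clearText, Encoding.Unicode));
    }
}
```

Most command-line tools hash the raw bytes of their input, so the UTF-8 variant is usually the one that matches them for plain ASCII text, but that is worth verifying against whatever system consumes the hash.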
 

John Puopolo

All:

Thanks for the feedback - all right on. The net of it is that both
sides need to compute the hash using the same mechanism.

(My particular problem was that the system I was talking to was
matching the hex representation of the hash in lower case, and
I was transmitting it in upper case.)

Thanks, all -
John
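
For anyone hitting the same case mismatch, a minimal sketch of two ways around it follows. The names and sample bytes are made up for illustration, not taken from the system described above. One option is to emit lower-case hex with the "x2" format specifier; the other, if you control the comparison, is to compare the hex strings case-insensitively.

```csharp
using System;
using System.Text;

class HexCaseDemo // hypothetical name, for illustration only
{
    // Format each byte as two lower-case hex digits.
    static string ToLowerHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder(bytes.Length * 2);
        foreach (byte b in bytes)
        {
            sb.Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Stand-in for a real hash value; a real MD5 hash is 16 bytes.
        byte[] sample = { 0x0A, 0xFF, 0x3C };

        string upper = BitConverter.ToString(sample).Replace("-", ""); // "0AFF3C"
        string lower = ToLowerHex(sample);                             // "0aff3c"

        Console.WriteLine(upper);
        Console.WriteLine(lower);

        // Ordinal comparison is case-sensitive; OrdinalIgnoreCase is not.
        Console.WriteLine(string.Equals(upper, lower, StringComparison.Ordinal));           // False
        Console.WriteLine(string.Equals(upper, lower, StringComparison.OrdinalIgnoreCase)); // True
    }
}
```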
 
