Problem: String.GetHashCode differs between .NET Framework 1.1 and 2.0


Guest

Hello,

I am wondering why GetHashCode() for string uses two different
algorithms in .NET Framework 1.1 and 2.0.

This is creating a big problem, because we relied on this hash code being
unique. How can we fix this problem?
 

Nicholas Paldino [.NET/C# MVP]

Mike,

You are going to have to work around it, unfortunately. There is no
guarantee that the hash code will stay the same across different versions
of the framework, or even across calls to GetHashCode in different
invocations of the program.

The default implementation of GetHashCode for object just returns a
number tied to the object reference, not to the object's contents. For any
type that doesn't override GetHashCode, that value will generally be
different across invocations of a program.
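To make "tied to the object reference" concrete, here is a minimal C# sketch (assuming a current .NET runtime; PlainPoint and DefaultHashDemo are hypothetical names used only for illustration). A class that overrides neither Equals nor GetHashCode gets the runtime's identity hash, the same value RuntimeHelpers.GetHashCode returns, so the field values play no part in it:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical type that overrides neither Equals nor GetHashCode,
// so it inherits the identity-based versions from System.Object.
public sealed class PlainPoint
{
    public int X;
    public int Y;
}

public static class DefaultHashDemo
{
    // True when o's hash code is the identity hash the runtime assigns to
    // the reference itself (what RuntimeHelpers.GetHashCode returns),
    // i.e. the contents of the object are not involved at all.
    public static bool UsesIdentityHash(object o)
    {
        return o.GetHashCode() == RuntimeHelpers.GetHashCode(o);
    }
}
```

Two PlainPoint instances with identical X and Y are not Equal and will usually have different hash codes. The runtime does not persist that value anywhere, so nothing ties it to the next run of the program.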

If you need a hash that is predictable, then you have to go with a
well-known algorithm. You should look in the System.Security.Cryptography
namespace, and use the MD5 or SHA1 algorithm for hashing your data. This
will give you predictable results every time.
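A minimal sketch of the MD5/SHA1 approach in C#. It assumes the string is encoded as UTF-8 before hashing (any fixed encoding works, but it has to be the same one every time), and StableHash is a hypothetical helper name:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StableHash
{
    // MD5 of the UTF-8 bytes of the string, as lowercase hex (32 chars).
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            return ToHex(md5.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    // SHA1 of the UTF-8 bytes of the string, as lowercase hex (40 chars).
    public static string Sha1Hex(string text)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            return ToHex(sha1.ComputeHash(Encoding.UTF8.GetBytes(text)));
        }
    }

    private static string ToHex(byte[] bytes)
    {
        // BitConverter produces "90-01-50-..."; drop the dashes and lowercase it.
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }
}
```

With these helpers, StableHash.Md5Hex("abc") gives 900150983cd24fb0d6963f7d28e17f72 and StableHash.Sha1Hex("abc") gives a9993e364706816aba3e25717850c26c9cd0d89d on any framework version, because both algorithms are fully specified.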

Hope this helps.
 

Jon Skeet [C# MVP]

Mike9900 said:
> I am wondering why GetHashCode() for the string is using two different
> algorithms in .NET Framework 1.1 and 2.0.

Presumably because MS found a better algorithm - potentially the new
one is faster, has fewer collisions, or has a better distribution. I
don't know - but it's entirely reasonable for it to change.

> This is creating a big problem, because we relied on this hashcode as
> unique.

You shouldn't rely on a hash code being unique in the first place
(it's not - there can be multiple strings with the same hash code), and
you shouldn't necessarily rely on a hash code being the same even across
two *runs*, let alone between two versions of the framework.
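Why a hash code can't be unique: it is a 32-bit int, and there are far more possible strings than int values, so some strings must share a hash code. The sketch below uses a hypothetical multiply-by-31 hash (not the algorithm behind .NET's String.GetHashCode in any version) because its collisions are easy to find by hand:

```csharp
using System;

public static class SimpleHash
{
    // Hypothetical h = 31 * h + c string hash, for illustration only.
    // Any function from strings to 32-bit ints must map some distinct
    // strings to the same value.
    public static int Hash31(string s)
    {
        int h = 0;
        foreach (char c in s)
        {
            unchecked { h = 31 * h + c; } // allow int overflow to wrap
        }
        return h;
    }
}
```

Hash31("Aa") and Hash31("BB") both come out as 2112 (65 * 31 + 97 and 66 * 31 + 66), even though the strings differ.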

> How can we fix this problem?

Well, you can probably dig up the 1.1 implementation somewhere, but you
should urgently review your code to see where else you're making the
same inaccurate assumption. Hash codes should be used as a quick "first
pass" check for equality, in a way which allows you to find equal objects
quickly. Equal hash codes do *not* mean the objects are necessarily equal;
they only tell you that if the hash codes are not equal, the objects are
definitely not equal either.
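The "first pass" idea, sketched in C# (HashFirstPass is a hypothetical helper; the real work is done by whatever IEqualityComparer<T> is passed in, which must keep Equals and GetHashCode consistent):

```csharp
using System;
using System.Collections.Generic;

public static class HashFirstPass
{
    // Uses the hash code only as a cheap filter; a match is always confirmed.
    public static bool AreEqual<T>(T a, T b, IEqualityComparer<T> comparer)
    {
        // Different hash codes: the objects are definitely not equal.
        if (comparer.GetHashCode(a) != comparer.GetHashCode(b))
            return false;

        // Equal hash codes only mean "possibly equal", so do the full check.
        return comparer.Equals(a, b);
    }
}
```

This is essentially what Dictionary<TKey, TValue> and Hashtable do internally: the hash code narrows the search, and Equals makes the final decision.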
 

Guest

Thanks for the reply.

I have many strings that I want to store in a file and identify by
hash code, so that when the ID is encountered the application knows which
string it refers to. What is the correct approach and algorithm for this?
--
Mike



Jon Skeet [C# MVP]


Are you saying the file can only contain hashes of some form, not the
full string? Could you give a bit more information about what you're
trying to do? It's not entirely clear. Note that hashing is one way -
you can't go from a hashcode to a full string, the information just
isn't there.

If you have a lookup from some kind of hash (a better specified one
than GetHashCode, with a better chance of uniqueness - such as SHA1 or
MD5) to the string, then you could do it, but there may well be a
better solution. The more information you could give us, the better
we'll be able to help you.
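One way to read that suggestion, as a C# sketch: keep a table from a SHA1 digest to the full string, since the digest by itself cannot be turned back into the string. HashLookup is a hypothetical name, UTF-8 encoding is assumed, and how the table gets written to the file is left open:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Maps a SHA1 digest (lowercase hex) back to the original string.
// The table must store the full string: a hash cannot be reversed.
public sealed class HashLookup
{
    private readonly Dictionary<string, string> _byHash = new Dictionary<string, string>();

    // Stores the string and returns its 40-character hex key.
    public string Add(string text)
    {
        string key = Sha1Hex(text);
        string existing;
        if (_byHash.TryGetValue(key, out existing) && existing != text)
            throw new InvalidOperationException("SHA1 collision for: " + text);
        _byHash[key] = text;
        return key;
    }

    public bool TryGet(string key, out string text)
    {
        return _byHash.TryGetValue(key, out text);
    }

    private static string Sha1Hex(string text)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(text));
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Add throws if two different strings ever produce the same digest. That is not expected to happen by accident with SHA1, but the check is cheap.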
 

Guest

Hello,

Thanks for the reply. I am using MD5, but I am wondering if SHA1 is better.

My problem is this:

I have a set of strings that I want to refer to by an identifier, because
the strings could be in different languages. The identifier doesn't depend
on the language and always refers to the same string.
 

Jon Skeet [C# MVP]


Where does hashing come in? Why not just use a GUID or something
similar?
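If the identifier only has to be stable and independent of the text, a GUID generated once per logical string does the job without any hashing. A minimal C# sketch, assuming a hypothetical StringCatalog type and language codes such as "en" and "fr" chosen by the application:

```csharp
using System;
using System.Collections.Generic;

// Each logical string gets a GUID once; each language stores its own text
// under that GUID. The ID never depends on the text or the framework version.
public sealed class StringCatalog
{
    // language code -> (string ID -> localized text)
    private readonly Dictionary<string, Dictionary<Guid, string>> _byLanguage =
        new Dictionary<string, Dictionary<Guid, string>>(StringComparer.OrdinalIgnoreCase);

    public static Guid NewId()
    {
        return Guid.NewGuid();
    }

    public void Set(Guid id, string language, string text)
    {
        Dictionary<Guid, string> table;
        if (!_byLanguage.TryGetValue(language, out table))
        {
            table = new Dictionary<Guid, string>();
            _byLanguage[language] = table;
        }
        table[id] = text;
    }

    // Returns null when there is no text for that ID in that language.
    public string Get(Guid id, string language)
    {
        Dictionary<Guid, string> table;
        string text;
        if (_byLanguage.TryGetValue(language, out table) && table.TryGetValue(id, out text))
            return text;
        return null;
    }
}
```

To store the catalog in a file, write each ID with id.ToString() and read it back with new Guid(text); the value is the same in every framework version because nothing about it is computed from the string.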
 
