GetHashCode() not consistent?


Michi Henning

From the Object.GetHashCode doc:

"The default implementation of GetHashCode does not guarantee uniqueness or
consistency; therefore, it must not be used as a unique object identifier
for hashing purposes."

I'm blown away by this statement.

Translation: the default implementation of GetHashCode() is unsuitable for hashing.

In particular, the doc also says:

"GetHashCode must always return the same value for a given instance
of the object."

It appears then that the default implementation of GetHashCode() does not meet
this requirement, given that the default implementation "does not guarantee
consistency."

What's happening here? Is the documentation really correct? It would mean
that System.Object.GetHashCode() is simply useless -- I'm finding that
hard to believe. On the other hand, if the documentation is correct, then
GetHashCode() should be abstract, no?

Thanks,

Michi.
 

Chris A. R.

Why should this:
Object objectItem = new object();
int objectCode = objectItem.GetHashCode();
always return the same value? There is nothing in an object on which to
determine a set value for a hash code.

But then, how often do you create new object instances in your code? I
never do. I think setting the default to return different values is a
defense against silly programming practices and prevents the user from using
the default method when they should be overriding it in their class. But,
there is no need to make it abstract, since there are many classes that do
not participate in being a key value in a hashtable.

Chris A.R.
 

Michi Henning

Chris said:
Why should this:
Object objectItem = new object();
int objectCode = objectItem.GetHashCode();
always return the same value? There is nothing in an object on which to
determine a set value for a hash code.

Huh? Suppose I make a set of objects. They need not be of type object.
I insert a bunch of objects into a hashtable. Sometime later, I want
to know whether one of the objects is in the table or not. If the
hash code changes during the life time of an object, I will get incorrect
answers.

Hashing simply doesn't work if the hash code can change unpredictably.
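
The failure described above can be sketched as follows. This is only an illustration, not code from the thread: MutableKey and UnstableHashDemo are hypothetical names, and the sketch assumes a recent C# version with generics and HashSet<T>. The key's hash code depends on a field that is changed while the key is in the set, so the later lookup misses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type whose hash code depends on a mutable field.
sealed class MutableKey
{
    public int Id;
    public MutableKey(int id) { Id = id; }

    public override bool Equals(object obj) =>
        obj is MutableKey other && other.Id == Id;

    public override int GetHashCode() => Id;
}

static class UnstableHashDemo
{
    // Returns whether the set still finds the key after its hash code changed.
    public static bool FoundAfterMutation()
    {
        var key = new MutableKey(1);
        var set = new HashSet<MutableKey> { key };
        key.Id = 2;                // the hash code changes while the key is stored
        return set.Contains(key);  // the lookup uses the new hash code
    }

    static void Main()
    {
        Console.WriteLine(FoundAfterMutation()); // prints False
    }
}
```

The object is still in the set, but the set recorded it under the old hash code, so a lookup with the new one cannot find it.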
But then, how often do you create new object instances in your code? I
never do. I think setting the default to return different values is a
defense against silly programming practices and prevents the user from using
the default method when they should be overriding it in their class.

Excuse me? The way to "help" me is to randomly make the code fail? I think not.

But, there is no need to make it abstract, since there are many classes that do
not participate in being a key value in a hashtable.

Right. The lack of templates in C# is certainly noticeable here.

But C# could do what Java does. In Java, the default implementation of hashCode()
is guaranteed to return the same hash value for the lifetime of an object.
With that, things work as expected and there are no surprises.

Cheers,

Michi.
 

Chris A. R.

The default implementation DOES give the same hash value for the lifetime of
the object.

However, when you override hashing it is expected that, if two items are
identical, they return the same hash code.

For an object, identical items will return different hash codes, by default,
thus the "inconsistency" issue.

For example, two different string instances with the same value will return
the same hash code. Two instances of a class that does not override
GetHashCode, even when they have the same field values, will generally return
different values. As for uniqueness, you'd only have to worry about that with
a large number of items anyway.
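
A small sketch of the difference described above, under the assumption of a recent C# version. Point and DefaultHashDemo are hypothetical names introduced only for this illustration. It compares two separately allocated strings with equal contents, and reads the default hash code of one plain object twice.

```csharp
using System;

// Hypothetical class that overrides neither Equals nor GetHashCode.
sealed class Point
{
    public int X, Y;
    public Point(int x, int y) { X = x; Y = y; }
}

static class DefaultHashDemo
{
    // Two distinct string instances with the same characters hash alike.
    public static bool EqualStringsShareHash()
    {
        string a = new string('x', 3);
        string b = new string('x', 3);
        return !ReferenceEquals(a, b) && a.GetHashCode() == b.GetHashCode();
    }

    // The default hash code of a single instance, read twice.
    public static bool SameInstanceIsStable()
    {
        var p = new Point(1, 2);
        return p.GetHashCode() == p.GetHashCode();
    }

    static void Main()
    {
        var p = new Point(1, 2);
        var q = new Point(1, 2);
        Console.WriteLine(EqualStringsShareHash());
        Console.WriteLine(SameInstanceIsStable());
        // Usually False, although a collision between p and q is permitted.
        Console.WriteLine(p.GetHashCode() == q.GetHashCode());
        // False: the default Equals compares references, not field values.
        Console.WriteLine(p.Equals(q));
    }
}
```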

Note: Java doesn't guarantee uniqueness or consistency either.

Chris A.R.
 

boudino

Don't forget that you must override GetHashCode() if you override Equals(). I think this is the reason for the sentence you mentioned: don't rely on the default implementation if you have classes with your own Equals(), because GetHashCode() is a key part of how objects are stored in Hashtables.
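
As a minimal sketch of that rule (not a definitive pattern): Coord and EqualsHashDemo are hypothetical names, the multiplier 397 is an arbitrary choice, and the code assumes a recent C# version with Dictionary<TKey, TValue>. Equals and GetHashCode are overridden together and use the same fields, so a second instance with equal values finds the stored entry.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical value-like class: equality is based on X and Y.
sealed class Coord
{
    // readonly, so the hash code cannot change after insertion into a table
    public readonly int X, Y;
    public Coord(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) =>
        obj is Coord other && other.X == X && other.Y == Y;

    // Uses the same fields as Equals, so equal objects get equal hash codes.
    public override int GetHashCode() => unchecked((X * 397) ^ Y);
}

static class EqualsHashDemo
{
    public static bool LookupByEqualKey()
    {
        var table = new Dictionary<Coord, string> { [new Coord(1, 2)] = "found" };
        // A different instance with equal field values locates the entry.
        return table.TryGetValue(new Coord(1, 2), out var value) && value == "found";
    }

    static void Main()
    {
        Console.WriteLine(LookupByEqualKey()); // prints True
    }
}
```

If GetHashCode were left at its default here, the two Coord instances would most likely land in different buckets and the lookup would fail even though Equals reports them as equal.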
 

Jon Skeet [C# MVP]

Michi Henning said:
From the Object.GetHashCode doc:

"The default implementation of GetHashCode does not guarantee uniqueness or
consistency; therefore, it must not be used as a unique object identifier
for hashing purposes."

I'm blown away by this statement.

Translation: the default implementation of GetHashCode() is unsuitable for hashing.

Nope, that's fine.
In particular, the doc also says:

"GetHashCode must always return the same value for a given instance
of the object."

It appears then that the default implementation of GetHashCode() does not meet
this requirement, given that the default implementation "does not guarantee
consistency."

I don't believe it means that it violates the requirements - it just
means that you can't predict ahead of time what the hash code will be
based on the values of the fields in the object. I agree it's not very
well worded. (It's worse than that - it then goes on to say that
derived classes must provide a *unique* hashcode, which is somewhat
difficult if you've got more than 32 bits of information...)
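
To make the parenthetical concrete, here is a hedged sketch with a hypothetical Pair struct (the names Pair and CollisionDemo and the XOR hash are illustrative assumptions, not anything from the documentation). The struct carries 64 bits of state but can only return a 32-bit hash code, so distinct values must sometimes collide. A hash-based collection copes with that by falling back on Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical struct with 64 bits of state and a 32-bit hash code.
struct Pair : IEquatable<Pair>
{
    public readonly int A, B;
    public Pair(int a, int b) { A = a; B = b; }

    public bool Equals(Pair other) => A == other.A && B == other.B;
    public override bool Equals(object obj) => obj is Pair p && Equals(p);

    // Deliberately simple hash; it collides whenever A and B are swapped.
    public override int GetHashCode() => A ^ B;
}

static class CollisionDemo
{
    public static bool HashesCollide() =>
        new Pair(1, 2).GetHashCode() == new Pair(2, 1).GetHashCode();

    // Both values are kept, because Equals tells them apart.
    public static int CountDistinct()
    {
        var set = new HashSet<Pair> { new Pair(1, 2), new Pair(2, 1) };
        return set.Count;
    }

    static void Main()
    {
        Console.WriteLine(HashesCollide());  // prints True
        Console.WriteLine(CountDistinct());  // prints 2
    }
}
```

Collisions cost performance, not correctness, which is why a uniqueness requirement on hash codes is neither achievable nor needed.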

I've sent a mail to the documentation team to try to get this improved.
 

Michi Henning

Jon Skeet said:
Michi Henning said:
From the Object.GetHashCode doc:

"The default implementation of GetHashCode does not guarantee uniqueness or
consistency; therefore, it must not be used as a unique object identifier
for hashing purposes."

[...]

"GetHashCode must always return the same value for a given instance
of the object."

[...]

I don't believe it means that it violates the requirements - it just
means that you can't predict ahead of time what the hash code will be
based on the values of the fields in the object.

Sure, that's understandable. The hash code can be chosen as a random
number, for example. As long as GetHashCode() returns the same
random number during the life time of an object, everything is fine.
I agree it's not very
well worded. (It's worse than that - it then goes on to say that
derived classes must provide a *unique* hashcode, which is somewhat
difficult if you've got more than 32 bits of information...)

Right. In particular, the words "it must not be used as a unique
object identifier for hashing purposes" basically say "you cannot
use GetHashCode() for hashing!"

When I first read this, I was incredulous. Worse, the first thought
that came to my mind was "aha, they derive the default hash
value from the memory address of the object and then, if the GC
relocates the object in memory, the hash code can change
unpredictably." This would obviously be a Very Bad Thing (TM) ;-)
I've sent a mail to the documentation team to try to get this improved.

Thanks muchly! Updating this would be a real improvement. BTW --
the Java API docs have a beautifully concise and correct set of
rules of how a hash function must behave. Something along similar
lines would be useful.

So, just to make sure: I *can* indeed rely on the default implementation
of GetHashCode() to return the same value for the life time of an
object?

Cheers,

Michi.
 

Michi Henning

boudino said:
Don't forget that you must override GetHashCode() if you override Equals().
I think this is the reason for the sentence you mentioned: don't rely on the
default implementation if you have classes with your own Equals(), because
GetHashCode() is a key part of how objects are stored in Hashtables.

Sure -- assignment to any of the fields that are used by Equals also allows the
hash code to change. (However, this does not mean that the hash code *has* to
change -- it's just that, if the hash code doesn't change, I'm unlikely to get
good hashing.) Further, if two objects compare as equal with Equals(), they
*must* have the same hash code.

But the words in the doc really seem to say that GetHashCode() cannot be
used for hashing, which is nonsensical.

Cheers,

Michi.
 

Michi Henning

Chris A. R. said:
The default implementation DOES give the same hash value for the lifetime of
the object.

Yes, that appears to be the case, but the words in the doc seem to strongly
suggest otherwise :)
However, when you override hashing it is expected that, if two items are
identical, they return the same hash code.
Exactly.

For an object, identical items will return different hash codes, by default,
thus the "inconsistency" issue.

Well, if I override Equals and don't override GetHashCode(), I have a
problem indeed. But, in the context of hashing, "consistency" means that
the hash function consistently returns the same value if invoked on an object
repeatedly (provided that none of the fields used by Equals have changed
in value). So, if the word "consistency" in the doc isn't used with that
meaning, that's misleading.
Note: Java doesn't guarantee uniqueness or consistency either.

Not uniqueness, but it does guarantee consistency. From the Java doc:

The general contract of hashCode is:

Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. This integer need not remain
consistent from one execution of an application to another execution of
the same application.

If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce the
same integer result.

It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on each
of the two objects must produce distinct integer results. However, the
programmer should be aware that producing distinct integer results for
unequal objects may improve the performance of hashtables.

Cheers,

Michi.
 

Jon Skeet [C# MVP]

Michi Henning said:
Right. In particular, the words "it must not be used as a unique
object identifier for hashing purposes" basically say "you cannot
use GetHashCode() for hashing!"

Right. Using it as a unique object identifier would be bad, but using
it for hashing should be fine.
When I first read this, I was incredulous. Worse, the first thought
that came to my mind was "aha, they derive the default hash
value from the memory address of the object and then, if the GC
relocates the object in memory, the hash code can change
unpredictably." This would obviously be a Very Bad Thing (TM) ;-)
Indeed.


Thanks muchly! Updating this would be a real improvement.

Let's hope it happens :)

BTW -- the Java API docs have a beautifully concise and correct set of
rules of how a hash function must behave. Something along similar
lines would be useful.

The rules in GetHashCode aren't too bad - it's just the extra stuff
that's wrong :(
So, just to make sure: I *can* indeed rely on the default implementation
of GetHashCode() to return the same value for the life time of an
object?

I certainly believe so.
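
One way to probe this empirically is sketched below. It is a spot check rather than a proof, and GcStabilityDemo is a hypothetical name. The sketch reads the default hash code of an object, forces garbage collections so the runtime has an opportunity to move objects, and reads the hash code again. It also compares against RuntimeHelpers.GetHashCode, which returns the default hash code regardless of overrides.

```csharp
using System;
using System.Runtime.CompilerServices;

static class GcStabilityDemo
{
    public static bool HashSurvivesCollection()
    {
        var o = new object();
        int before = o.GetHashCode();

        // Create some garbage, then force collections so that the
        // runtime has a chance to compact the heap.
        for (int i = 0; i < 1000; i++)
        {
            GC.KeepAlive(new byte[1024]);
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        int after = o.GetHashCode();
        return before == after && after == RuntimeHelpers.GetHashCode(o);
    }

    static void Main()
    {
        Console.WriteLine(HashSurvivesCollection());
    }
}
```

A passing run only shows that the value did not change in that run. Whether the object was actually relocated is up to the collector.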
 
