GetHashCode and Equals

T

Tony Johansson

Hi!

When you use a class as key in a HashTable you is recommended to override
both Equals and GetHashCode if you want the
dictionary to work perfect and be fast.

If you override Equals but not GetHashCode the compiler will give you a
warning that you
have probably forgot to override GetHashCode.

But if you only override GetHashCode you will not get any warning about the
forgotten Equals isn't that strange ?
I would have hoped that you will have recived a waring even in this case.

//Tony
 
P

Peter Duniho

Tony said:
[...]
If you override Equals but not GetHashCode the compiler will give you a
warning that you
have probably forgot to override GetHashCode.

But if you only override GetHashCode you will not get any warning about the
forgotten Equals isn't that strange ?
I would have hoped that you will have recived a waring even in this case.

I think it's fine. The requirements tying Equals() and GetHashCode()
together are not symmetric. GetHashCode() is required to return the
same value for two different objects if Equals() returns true. But,
Equasl() is not required to return true if GetHashCode() returns the
same value for two different objects.

And in fact, one reasonable and expected reason for overriding
GetHashCode() but not Equals() is when you already have an equality
comparison that works fine, but you want to improve the hash code
calculation so that it's more discriminating or better-distributed.

And in general, the language designers try to restrict warnings to those
things that are almost certainly errors, but which the compiler can't
provide with 100% certainty that they are. A change to the Equals()
implementation without a corresponding change to GetHashCode() is almost
certainly an error, but cannot be proved to be one (so must be a warning
instead of an error), but a change to GetHashCode() can easily happen in
valid code even without a change to Equals().

Thus, a warning in the one scenario, but no warning in the other.

Pete
 
T

Tony Johansson

Peter Duniho said:
Tony said:
[...]
If you override Equals but not GetHashCode the compiler will give you a
warning that you
have probably forgot to override GetHashCode.

But if you only override GetHashCode you will not get any warning about
the forgotten Equals isn't that strange ?
I would have hoped that you will have recived a waring even in this case.

I think it's fine. The requirements tying Equals() and GetHashCode()
together are not symmetric. GetHashCode() is required to return the same
value for two different objects if Equals() returns true. But, Equasl()
is not required to return true if GetHashCode() returns the same value for
two different objects.

And in fact, one reasonable and expected reason for overriding
GetHashCode() but not Equals() is when you already have an equality
comparison that works fine, but you want to improve the hash code
calculation so that it's more discriminating or better-distributed.

And in general, the language designers try to restrict warnings to those
things that are almost certainly errors, but which the compiler can't
provide with 100% certainty that they are. A change to the Equals()
implementation without a corresponding change to GetHashCode() is almost
certainly an error, but cannot be proved to be one (so must be a warning
instead of an error), but a change to GetHashCode() can easily happen in
valid code even without a change to Equals().

Thus, a warning in the one scenario, but no warning in the other.

Pete

Hi!

Assume the following.
Hashtable hashtable = new Hashtable(),
Fish key1 = new Fish("Herring"); //The class Fish is defined at the end
Fish key2 = new Fish("Herring");
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

Here hashtable.Count will be 2
So here we can see that two items in the collection that have the same name
is stored in the collection which is not good.

In this case Object.GetHashCode will most certainly return different hash
for these two rows
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

So I override the GetHashCode like this
public override int GetHashCode()
{
return name.GetHashCode();
}
Here the GetHashCode will return the same hash but this is not good enough
because when the Hashtable finds two
objects with the same hash, it will call Equals to see whether the two
objects are in fact equal.
The default implementation from Object.Equals will return false because of
comparing references between key1 and key2.

So in this case we had to override the Equals method like this
public override bool Equals(object)
{
Fish otherFish = obj as Fish;
if (otherFish ==null) return false;
return otherFish.name == name;
}

So here is my question:
As I can see here we must always override Equals when using class as key in
Hashtable isn't this correct ?
Or has I missed something again !


The class Fish that was used above has this definition
public class Fish
{
string name;
publick Fish(string name)
{
this.name=name;}
}
}

//Tony
 
J

Jeroen Mostert

Assume the following.
Hashtable hashtable = new Hashtable(),
Fish key1 = new Fish("Herring"); //The class Fish is defined at the end
Fish key2 = new Fish("Herring");
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

Here hashtable.Count will be 2
So here we can see that two items in the collection that have the same name
is stored in the collection which is not good.
Says you! You make the rules for when two objects should be considered the
same object. Just because you passed the same argument to the constructor
doesn't mean they should be considered the same. If they should be, you
should have specified so.
In this case Object.GetHashCode will most certainly return different hash
for these two rows
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

So I override the GetHashCode like this
public override int GetHashCode()
{
return name.GetHashCode();
}
Here the GetHashCode will return the same hash but this is not good enough
because when the Hashtable finds two
objects with the same hash, it will call Equals to see whether the two
objects are in fact equal.
The default implementation from Object.Equals will return false because of
comparing references between key1 and key2.
This is still fine and dandy. According to how you've defined it the objects
are, in fact, not equal and they must be stored separately in the table.
This doesn't mesh with how you've defined GetHashCode(), which just means
you get lots of unnecessary hash collisions, but there's nothing wrong with
that semantically. The hash table will work fine, it'll just be slower.
As I can see here we must always override Equals when using class as key in
Hashtable isn't this correct ?

No, it's not correct. The key difference (pardon the pun) is that if you
override .Equals() but not .GetHashCode() you are almost certainly
introducing a semantic error, because the contract says that if two objects
compare equal, they must have identical hash codes. Failing to adhere to
this will result in hash tables no longer working correctly: if A == B, you
could have table[A] != table because A and B were put into different
buckets based on their hash value. Not good.

If you do it the other way around, nothing bad happens at all as far as
hashing is concerned. Whether reference equality is really what you want is
up to you. It's true that *usually*, you'll want to implement a more useful
..Equals(), but there's nothing that *requires* it.
 
T

Tony Johansson

Jeroen Mostert said:
Assume the following.
Hashtable hashtable = new Hashtable(),
Fish key1 = new Fish("Herring"); //The class Fish is defined at the end
Fish key2 = new Fish("Herring");
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

Here hashtable.Count will be 2
So here we can see that two items in the collection that have the same
name
is stored in the collection which is not good.
Says you! You make the rules for when two objects should be considered the
same object. Just because you passed the same argument to the constructor
doesn't mean they should be considered the same. If they should be, you
should have specified so.
In this case Object.GetHashCode will most certainly return different hash
for these two rows
hashtable[key1] = "Hello";
hashtable[key2] = "Hello";

So I override the GetHashCode like this
public override int GetHashCode()
{
return name.GetHashCode();
}
Here the GetHashCode will return the same hash but this is not good
enough
because when the Hashtable finds two
objects with the same hash, it will call Equals to see whether the two
objects are in fact equal.
The default implementation from Object.Equals will return false because
of
comparing references between key1 and key2.
This is still fine and dandy. According to how you've defined it the
objects are, in fact, not equal and they must be stored separately in the
table. This doesn't mesh with how you've defined GetHashCode(), which just
means you get lots of unnecessary hash collisions, but there's nothing
wrong with that semantically. The hash table will work fine, it'll just be
slower.
As I can see here we must always override Equals when using class as key
in
Hashtable isn't this correct ?

No, it's not correct. The key difference (pardon the pun) is that if you
override .Equals() but not .GetHashCode() you are almost certainly
introducing a semantic error, because the contract says that if two
objects compare equal, they must have identical hash codes. Failing to
adhere to this will result in hash tables no longer working correctly: if
A == B, you could have table[A] != table because A and B were put into
different buckets based on their hash value. Not good.

If you do it the other way around, nothing bad happens at all as far as
hashing is concerned. Whether reference equality is really what you want
is up to you. It's true that *usually*, you'll want to implement a more
useful .Equals(), but there's nothing that *requires* it.


I think you missunderstood me.
What I'm trying to say is that my example show that I must override both
GetHashCode and Equals.
It was not enough to only override the GetHashCode.

So as far as I can see it always best to override both GetHashCode and
Equals when using classes as key
in Hashtable.

//Tony
 
J

Jeroen Mostert

Jeroen Mostert said:
As I can see here we must always override Equals when using class as key
in
Hashtable isn't this correct ?

No, it's not correct. The key difference (pardon the pun) is that if you
override .Equals() but not .GetHashCode() you are almost certainly
introducing a semantic error, because the contract says that if two
objects compare equal, they must have identical hash codes. Failing to
adhere to this will result in hash tables no longer working correctly: if
A == B, you could have table[A] != table because A and B were put into
different buckets based on their hash value. Not good.

If you do it the other way around, nothing bad happens at all as far as
hashing is concerned. Whether reference equality is really what you want
is up to you. It's true that *usually*, you'll want to implement a more
useful .Equals(), but there's nothing that *requires* it.

I think you missunderstood me.


Depends on what you meant by "we must always override Equals when using
class as key in Hashtable". I interpreted the "always" as, well, "always",
and answered the question accordingly. But from your statement below I
gather you meant "typically". I'm a computer programmer discussing compiler
semantics, you can hardly begrudge me some literal interpretation...
What I'm trying to say is that my example show that I must override both
GetHashCode and Equals.
It was not enough to only override the GetHashCode.
For what you wanted to accomplish *specifically*, yes. But your question was
why the compiler didn't issue a warning for overriding .GetHashCode() but
not .Equals(), while it does issue one for the reverse case.
So as far as I can see it always best to override both GetHashCode and
Equals when using classes as key in Hashtable.

From a pragmatic standpoint, yes. It would rarely be appropriate to
override .GetHashCode() but not .Equals() -- in fact, I can't think of a
concrete example where it would not also make sense to override .Equals().

But it's also true that code that overrides .GetHashCode() but not .Equals()
will not experience problems storing and retrieving values from a hash
table. In this case the code will just not do *what you expect*. If you
override .Equals() but not .GetHashCode() the code will not do *what the
framework library expects*, which is a bit more serious and leads to much
more obscure bugs.

The designers could have chosen to warn about both or neither of these
cases, but they chose to warn only about the least obvious case that leads
to the most serious problems.
 
P

Peter Duniho

Tony said:
[...]
What I'm trying to say is that my example show that I must override both
GetHashCode and Equals.
It was not enough to only override the GetHashCode.

So as far as I can see it always best to override both GetHashCode and
Equals when using classes as key
in Hashtable.

Since Jeroen has already the more detailed answer, I will simply say:

No. Just because in _some_ particular example it is important to
override both, it does not then logically follow that in _all_ examples,
it is important to do so.

Pete
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top