about Equals and GetHashCode

Jon Skeet [C# MVP] · Jun 13, 2008

To be perfectly honest, I'm not sure how important it is for the hash
code to be absolutely random as opposed to just good.

I think to know that we'd have to know exactly what you mean by
"random". Completely evenly distributed?

I suspect it's beneficial because whatever the hashtable does to
process hashcodes, if you have some pattern/clustering to your data
you're more likely to get a poor distribution in terms of buckets.
However, I would hope that a good hashtable implementation would make
it *unlikely* that you'd see a problem. Then again, my understanding
of the *exact* implementation of hashtables is somewhat lacking.

I think that would be an interesting topic for a blog. I do try to
read your blog posts so I'll catch it if you do decide to post on the
topic.

I'll put it on my mental list of future posts

Jon

Marc Gravell · Jun 13, 2008

Re adding a seed; actually, it has one advantage... a lot of default
instances get created just with new Foo() or
Activator.CreateInstance()
Without a seed, this type of hash algorithm (using + and *) will
return 0 for every default instance. Which is fine in itself, since
every default instance is broadly comparable... but! Now imagine that
we are comparing tuples of values, or lists of values - where the
tuple's hashcode is a composite (again using + and *) of the contained
values' hashcodes.
It is nice (from a purely theoretical standpoint) to be able to
immediately distinguish a set of 2 (default instances) from a set of 8
(default instances) - or the sets {A,B} and {[default],A,B}.

OK; in reality, this is an unlikely scenario - but it does still seem
desirable to avoid any "obvious" causes of zero. Yes, even with a seed
you will still get zero from some values, but then as a genuine
exception, and so it is not an issue.

Re random - "uniform" would be a better term; Int32.GetHashCode() is
uniform but not random.

Marc

Marc Gravell · Jun 13, 2008

Um, okay. I'll buy that. Still, it seems like a pretty esoteric scenario

to drive a hash code implementation. Especially since it still doesn't
address sets of identical lengths.

Re identical lengths - if the lengths and contents match, I'm happy
for the hashcode to match under that approach, since they sound like
the same set (assuming the definition of equality ties with
GetHashCode(), which it should).

Marc

Brian Gideon · Jun 13, 2008

Okay, I'll have to consult Effective Java, which is where the advice
originally came from. I believe there's some benefit to having two
magic numbers which are coprime to each other. Wikipedia doesn't show
this as one of the example hash functions, unfortunately, so there's
no discussion there to help

Jon

Okay, I'm looking through Knuth Vol 3 (those who have read the books
know that it isn't an easy read) on hashing and he talks about
choosing a prime number as the multiplier for the "scrambling
function" (aka GetHashCode). However, I don't see anything about the
initial seed being a prime number.

My best guess right now is definitely along the lines of what Marc
suggested. It has more to do with the end result you get from
combining the values of more than one call to GetHashCode (via
combinations of + and *).

From clues in Knuth this may be related to random number generation
theory. Since the scrambling function needs to be random it's easy to
see how that works in. Seeding the hash code with a number that is
coprime to the multiplier may be an attempt to prevent unwanted
periods from developing in the final hash code when combinging results
from multiple calls to GetHashCode.

GetHashCode and Equals	6	Feb 6, 2010
Appropriate GetHashCode() override?	4	Nov 14, 2008
about the hashtable and equals	11	Jun 5, 2010
what happens if I skip implementing GetHashCode	7	Jun 13, 2008
GetHashCode() not consistent?	10	Mar 9, 2004
Generic Class example and how to use Generic Classes (format)	1	Oct 9, 2008
What does KeyValuePair.GetHashCode do?	4	Jan 14, 2014
Generating a unique hash of a known length	7	Jul 22, 2009

about Equals and GetHashCode

Jon Skeet [C# MVP]

Marc Gravell

Marc Gravell

Brian Gideon

Ask a Question

Similar Threads