about Equals and GetHashCode

J

Jon Skeet [C# MVP]

To be perfectly honest, I'm not sure how important it is for the hash
code to be absolutely random as opposed to just good.

I think to know that we'd have to know exactly what you mean by
"random". Completely evenly distributed?

I suspect it's beneficial because whatever the hashtable does to
process hashcodes, if you have some pattern/clustering to your data
you're more likely to get a poor distribution in terms of buckets.
However, I would hope that a good hashtable implementation would make
it *unlikely* that you'd see a problem. Then again, my understanding
of the *exact* implementation of hashtables is somewhat lacking.
I think that would be an interesting topic for a blog. I do try to
read your blog posts so I'll catch it if you do decide to post on the
topic.

I'll put it on my mental list of future posts :)

Jon
 
M

Marc Gravell

Re adding a seed; actually, it has one advantage... a lot of default
instances get created just with new Foo() or
Activator.CreateInstance()
Without a seed, this type of hash algorithm (using + and *) will
return 0 for every default instance. Which is fine in itself, since
every default instance is broadly comparable... but! Now imagine that
we are comparing tuples of values, or lists of values - where the
tuple's hashcode is a composite (again using + and *) of the contained
values' hashcodes.
It is nice (from a purely theoretical standpoint) to be able to
immediately distinguish a set of 2 (default instances) from a set of 8
(default instances) - or the sets {A,B} and {[default],A,B}.

OK; in reality, this is an unlikely scenario - but it does still seem
desirable to avoid any "obvious" causes of zero. Yes, even with a seed
you will still get zero from some values, but then as a genuine
exception, and so it is not an issue.

Re random - "uniform" would be a better term; Int32.GetHashCode() is
uniform but not random.

Marc
 
M

Marc Gravell

Um, okay. I'll buy that. Still, it seems like a pretty esoteric scenario
to drive a hash code implementation. Especially since it still doesn't
address sets of identical lengths. :)

Re identical lengths - if the lengths and contents match, I'm happy
for the hashcode to match under that approach, since they sound like
the same set (assuming the definition of equality ties with
GetHashCode(), which it should).

Marc
 
B

Brian Gideon

Okay, I'll have to consult Effective Java, which is where the advice
originally came from. I believe there's some benefit to having two
magic numbers which are coprime to each other. Wikipedia doesn't show
this as one of the example hash functions, unfortunately, so there's
no discussion there to help :(

Jon

Okay, I'm looking through Knuth Vol 3 (those who have read the books
know that it isn't an easy read) on hashing and he talks about
choosing a prime number as the multiplier for the "scrambling
function" (aka GetHashCode). However, I don't see anything about the
initial seed being a prime number.

My best guess right now is definitely along the lines of what Marc
suggested. It has more to do with the end result you get from
combining the values of more than one call to GetHashCode (via
combinations of + and *).

From clues in Knuth this may be related to random number generation
theory. Since the scrambling function needs to be random it's easy to
see how that works in. Seeding the hash code with a number that is
coprime to the multiplier may be an attempt to prevent unwanted
periods from developing in the final hash code when combinging results
from multiple calls to GetHashCode.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top