mortb said:
I've thought a bit about the issue with hashcodes.
I use hashtables and I suppose that they use the GetHashcode function to
determine the identity of the objects stored.
Well, to determine a hash key for the objects, rather than the unique
identity of the objects.
Then I thought that the
GetHashcode would have to return an unique number for every possible object
if I am to use it as key in the hashtable.
No; the point of a hash function is to provide a hash key that will
*usually* be different. Hash collisions aren't the end of the world.
Suppose I have a hashtable that stores books, using the 4th digit of
the ISBN as a hash key. If I get lucky, I will be able to store 10
books before getting a hash collision. But at some point, my hash
buckets are going to start having more than one book in. That's not an
error, it's part of the design of a hashtable. The point is that by
storing the books in ten different buckets, the list of books I have to
search to find a particular one is typically ten times shorter than if
I just kept them all in one list. That's the point of a hash table.
If hashcodes had to be unique, then no hashtable could ever store more
than (the number of possible hashcodes) items, which wouldn't be
useful; and furthermore every object that ever wanted to be stored in a
hashtable would somehow have to obtain a unique hashcode, which would
be extremely unpleasant.
I've seen a few implementations on the internet looking like this:
Class C
{
public int propertyA;
public int propertyB;
public override GetHashcode()
{
return propertyA.GetHashcode ^ propertyB.GetHashcode;
}
}
Whould the bitwize OR generate unique hashcodes for objects with all
possible combinations in propertyA and propertyB?
It wouldn't; the purpose is to create some number that partakes of the
hashcodes for both propertyA and propertyB.
In my current application there are several types of objects that inherit
from a base class that we've written. They are stored in a database and have
a primary key combined from two values in the database. The two values are
object id and version number.Then they have sveral other properties such as
name, createdByUserId, createTimeStamp, updateTimeStamp.
My guess is that I could write a GetHashcode function in the base class
looking like this:
GetHashcode()
{
return (typeof(this)).GetHashcode() ^ this.objectId ^
this.versionNumber();
}
Would this be a good implementation?
Sure.
Are there any resources which sumarize how to quickly write *good*
GetHashcode() functions?
Not that I've seen but I'm sure someone has, so watch this space.