Generic GetHashCode method

G

Guest

I'm trying to write a generic method to generate Hashcodes but am having some
problems with (generic) collections. Here is the code of my method:

public static int GetHashCode(object input)
{
try
{
Type objectType = input.GetType();
PropertyInfo[] properties = objectType.GetProperties();

int totalHashCode = 7;

foreach (PropertyInfo property in properties)
{
// Reset hashcode
int hashcode = 0;

// Get property value
object value = property.GetValue(input, null);

// Invoke GetHashCode method on property
if (property.PropertyType.IsValueType || value !=
null)
{

int.TryParse(property.PropertyType.InvokeMember("GetHashCode",
BindingFlags.Default | BindingFlags.InvokeMethod, null, value, null,
CultureInfo.InvariantCulture).ToString(), out hashcode);
}

totalHashCode ^= hashcode;
}
return totalHashCode;
}
catch(Exception)
{
// Handle exception
return 0;
}
}

public class Customer{
public Customer(){}
public string Name{get{return "Mr Blobby";}}
public string Telephone{get{return "999";}}
}

public class CustomerCollection : List<Customer>
{
public CustomerCollection(){}
}

The problem I am having is that this method doesn't really take into account
objects which are(generic) collections. eg.. (excusing my simplicity) the
Customer class would work but the Customer Collection wouldn't. Can anyone
give me some pointers to improve this? Cheers in advance for any tips!
 
B

Barry Kelly

Metaman said:
I'm trying to write a generic method to generate Hashcodes but am having some
problems with (generic) collections. Here is the code of my method:

Normally, one overrides the GetHashCode() method to provide the hashcode
for an object.

It would be clearer if you described your algorithm for generating this
hashcode. It appears that you are iterating through all the properties
of an object and getting the hashcode of each property, and xoring them
together, and using that as the hashcode.

Because GetHashCode() is virtual and all objects in .NET have it, you
can invoke it directly. As far as I can make out, you could rewrite your
method like this:

---8<---
public static int GetHashCode(object value)
{
if (value == null)
return 0;

int result = 0;
foreach (PropertyInfo prop in value.GetType().GetProperties(
BindingFlags.Instance | BindingFlags.Public))
{
if (prop.GetIndexParameters().Length == 0)
{
object propValue = prop.GetValue(value, null);
if (propValue != null)
result ^= propValue.GetHashCode();
}
}
return result;
}
--->8---

However, I don't think you should do this. I'll explain below.
public class Customer{
public Customer(){}
public string Name{get{return "Mr Blobby";}}
public string Telephone{get{return "999";}}
}

Why don't you override the GetHashCode() method instead?
The problem I am having is that this method doesn't really take into account
objects which are(generic) collections. eg.. (excusing my simplicity) the
Customer class would work but the Customer Collection wouldn't. Can anyone
give me some pointers to improve this? Cheers in advance for any tips!

What you are describing is a GetHashCode() which iterates through an
entire list, and enumerates all the properties of every object in the
list, all to produce a single hash code.

Hash codes are used to insert and retrieve values into and from data
structures which use hash codes, usually Dictionary<,> or Hashtable. One
of the most important concerns in writing a hash function is that it
should be fast.

With the deep definition of equality that you are using, these hash
tables will be expensive for inserts and expensive for lookups.

-- Barry
 
G

Guest

Thanks for your reply Barry. It slipped my mind that GetHashCode method is on
the base object as well so that makes things a little easier. The reason that
I am writing this function is that I have a number of objects that contain
quite a lot of properties. I need a reliable way of comparing two objects of
the same type to see if they are the same. Even if only one of their
properties differs slightly I need to be made aware of it. After doing some
testing the only way that seemed to work reliably was to Xor the hashcodes of
each property together. The problem with this is that for each object I then
need to override the GetHashcode method and write in all the properties
(which can be slow and unmaintainable if there is 30 properties or so)

eg..

public override int GetHashCode()
{
int result = Global.HashCodeMultiplier;
result ^= ( Name.GetHashCode() ^
Street1.GetHashCode() ^
Telephone.GetHashCode() );
return result;
}

so I thought that an alternative would be to write a generic function to
handle this and then just override the GetHashCode method for each object as
follows:

public override int GetHashCode()
{
return ObjectHelper.GetHashCode(this);
}

I appreciate that this is not the most effiecient way of checking whether
objects are the same but not sure what a better way of generating a hashcode
could be.

P.S. I actually override the Equals method first which checks the easy
things (makes sure that they are the same type etc..)
 
H

Helge Jensen

Metaman said:
Thanks for your reply Barry. It slipped my mind that GetHashCode method is on
the base object as well so that makes things a little easier. The reason that
I am writing this function is that I have a number of objects that contain
quite a lot of properties. I need a reliable way of comparing two objects of
the same type to see if they are the same. Even if only one of their
properties differs slightly I need to be made aware of it. After doing some

Then looking at the hash-codes won't be enough.

By convention:
x.Equals(y) => x.GetHashCode() == y.GetHashCode()
but
x.GetHashCode() == y.GetHashCode() =/> x.Equals(y)

If your hash-codes are intractible you may be able to argue equivalence
upto a certain acceptable error-margin, but really 2^32 is a very small
co-domain for an intractible hash.

If you wish to "fit into .NET" you should write an implementation of
IEqualityComparer<T> that uses reflections on the properties and
(recursively) applies hashing/comparison to the objects.

Note that computing the hash-value and comparing those to "optimize"
comparison will probably be a lot slower than just doing the comparison
inline, and unless you rely on intractability of the hash you will need
to do the comparison anyway.
I appreciate that this is not the most effiecient way of checking whether
objects are the same but not sure what a better way of generating a hashcode
could be.

hashing is *not* equality.
P.S. I actually override the Equals method first which checks the easy
things (makes sure that they are the same type etc..)

If x.Equals(y) then it should definatly be the case that x.GetHashCode()
== y .GetHashCode(), not the other way around.
 
G

Guest

Hey Helge,

Thanks for your reply. I appreciate that as Hashcodes determined by a
GetHashcode method are only integer values there is a finite number of
results with collision a possibility. Part of the reason why I am using
hashcodes is that I need to calculate whether objects have changed when
exporting from another system. I store hashcodes in a database and then
compare the hashcode of the exported object with the value that has
previously been stored in the database.

Is there a better way of creating a hashcode of an object that will have
less collision than a GetHashCode integer?

I agree that for comparing two objects (rather than object vs hashcode -
above) implementing the IEqualityComparer<T> would be better. I'm also aware
that hashing is *not* equality (see my overriden Equals method below).
Perhaps my wording was not correct.

public override bool Equals(object obj)
{
// Check that the parameter has value
if (obj == null) return false;

// Check that types are the same
if (GetType() != obj.GetType()) return false;

// safe because of the GetType check
Customer customer = (Customer)obj;

// Check if hash codes match (being the same does not guarantee
equality
// however if the hashcode are different then definitely not
equal)
if (! GetHashCode().Equals(customer.GetHashCode())) return false;

return true;
}
 
H

Helge Jensen

Metaman said:
Hey Helge,

Thanks for your reply. I appreciate that as Hashcodes determined by a
GetHashcode method are only integer values there is a finite number of
results with collision a possibility. Part of the reason why I am using
hashcodes is that I need to calculate whether objects have changed when
exporting from another system. I store hashcodes in a database and then
compare the hashcode of the exported object with the value that has
previously been stored in the database.

So, basically you *are* using your hash-function as an equality
relation? That is a useful technique, since you don't need to store the
original object, only the hash, to decide if anything has changed.

Unfortunately, it comes at a cost: the chance that the object is
different but has the same hash.

You may be willing to accept that risk, which is determined by the
quality of your hash-function and the size of the co-domain.

If your hash-function is perfect the chance of invalidly classifying an
object as unchanged will be 1/(2^32), which may be acceptable to you.

If your hash-function is worse than that, it may be *much* worse. If you
compare, not only to an "old" hash-value, but to a set of old values
chances of failure also increases dramatically.

You were using xor to combine hash'es, that means that any number of
occurrences of the same item modulo 2 will cancel out. As an example,
the objects { x=1; y=1 } and { x=2; y=2 } will collide. This may be a
*real* problem for you. Good hash-functions have a complicated structure
to try and prevent structural equivalence from generating hash-equivalence.
Is there a better way of creating a hashcode of an object that will have
less collision than a GetHashCode integer?

You expect to rely on GetHashCode() of each member right? and then
combine them in some way for a combined hash. I would suggest ordering
the traversal and applying a cryptographic hash (for example one of the
SHA variants) to the contanenation of the GetHashCode()'es.
public override bool Equals(object obj)
{
// Check that the parameter has value
if (obj == null) return false;

You may wish to:

if ( Object.ReferenceEquals(this,obj) )
return true;

depending on the expected self vs. non-self. comparison ratio.
// Check that types are the same
if (GetType() != obj.GetType()) return false;

So inherited implementations, without member-variables are considered
non-equal, even if all members are equal?

Test-implementations also? :)
// safe because of the GetType check
Customer customer = (Customer)obj;

// Check if hash codes match (being the same does not guarantee
equality
// however if the hashcode are different then definitely not
equal)
if (! GetHashCode().Equals(customer.GetHashCode())) return false;

So, you really need:

// Check every part sequentially
return this.x == customer.x && this.y == customer.y && ...;

or you are in fact relying on the hash-function for equality.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top