GetHashCode()

G

Guest

Hi
I have question for GetHashCode() function, Is it correct in following code or there is more efficient way to implement GetHashCode() function

class IntArray
public int[] data

public override int GetHashCode()
int hash = 0
for (int i = 0; i < data.Length; i++

hash ^= data

return hash


public override bool Equals(System.Object cs_

if (obj == null || GetType() != cs_.GetType()) return false

IntArray cs = (IntArray) cs_
if (data.Length != cs.data.Length) return false
for (int i = 0; i < data.Length; i++

if (data != cs.data) return false

return true



As documentation says
If two objects of the same type represent the same value, the hash function must return the same constant value for either object.

Is CLI library is viewable or open source? So I can take reference. Can any one point where to get it

Thank you
Avin Patel
 
M

mikeb

Avin said:
Hi,
I have question for GetHashCode() function, Is it correct in following code or there is more efficient way to implement GetHashCode() function.

class IntArray {
public int[] data;

public override int GetHashCode() {
int hash = 0;
for (int i = 0; i < data.Length; i++)
{
hash ^= data;
}
return hash;
}

public override bool Equals(System.Object cs_)
{
if (obj == null || GetType() != cs_.GetType()) return false;

IntArray cs = (IntArray) cs_;
if (data.Length != cs.data.Length) return false;
for (int i = 0; i < data.Length; i++)
{
if (data != cs.data) return false;
}
return true;
}
}

As documentation says:
If two objects of the same type represent the same value, the hash function must return the same constant value for either object.


In my opinion, if the GetHashCode() function takes about as much
processing as Equals(), then there's not much point to it.

However, only if you have a decent idea of what type of data will be in
your class can you make a really good determination of what the hashcode
function should do.

For example, if your array is only ever going to have at most 10
elements, then there's not much you can do to make the hash function
better - it's going to be pretty fast no matter what.

However, if your arrays are going to routinely have 100000 elements or
more, then it probably makes sense to have a different hash function.
Suppose that for any 2 different arrays, you can expect that 99.9% of
the time that they will also be different in the first 10 elements. if
that's the case, you can hash over only the 1st 10 elements and you'd
have a good hash function.

However, if the 1st 10 elements are likely to be the same (maybe they're
a fileheader), then that would not be a good hash function... but maybe
hashing over the last 10 elements would work.

If you have no idea what kind of data will be in the array, then I'd
probably hash over the 1st X number of elements, where X is a number
that seems to work well after experimentation.
Is CLI library is viewable or open source? So I can take reference. Can any one point where to get it?

See Microsoft's 'Shared Source CLI':


http://www.microsoft.com/downloads/details.aspx?FamilyId=3A1C93FA-7462-47D0-8E56-8DD34C6292F0

or Mono:

http://www.go-mono.com/
 
J

Jon Skeet [C# MVP]

mikeb said:
In my opinion, if the GetHashCode() function takes about as much
processing as Equals(), then there's not much point to it.

Why not? Suppose you have 100 keys in a hashtable, and you know the
hashcodes of all of them, and they're all distinct.

Finding the matching object when you're presented with a new key (to
retrieve with) only requires the hashcode of the new key to be
calculated, and then Equals to be called. Compare that with calling
Equals on *every* key within the table.
 
G

Guest

Hi
So now as I understood, objects are stored in Hash Table. For Equals() function call, First it checks the matching Hash Key, If it is than it checks for matching value. If hashkey doesn't match than it will call Equals() function further

So I guess I should be calculating GetHashKey() function value for first 20 elements in array. As array size is not known to me

Thank you
Avin Patel
 
J

Jon Skeet [C# MVP]

Avin Patel said:
So now as I understood, objects are stored in Hash Table. For
Equals() function call, First it checks the matching Hash Key, If it
is than it checks for matching value. If hashkey doesn't match than
it will call Equals() function further.

No, it only calls Equals if the hash *does* match. My point was that it
would have to call Equals on everything if hashing didn't exist at all.
So I guess I should be calculating GetHashKey() function value for
first 20 elements in array. As array size is not known to me.

Well, you could - but that would give an awful hash function if the
first 20 elements were the same for all instances, for example.
 
M

mikeb

Jon said:
Why not? Suppose you have 100 keys in a hashtable, and you know the
hashcodes of all of them, and they're all distinct.

Finding the matching object when you're presented with a new key (to
retrieve with) only requires the hashcode of the new key to be
calculated, and then Equals to be called. Compare that with calling
Equals on *every* key within the table.

True - I went overboard. However, if you can generate a good hash (no or
very rare collisions) by iterating over 10 elements (or 20 or whatever),
then it's not really necessary to iterate over the entire object (if
it's large).

Note that I'm hiding behind a lot of caveats there. As with so many
things it can all depend on what you know about the data (if anything),
and there's no reason to optimize before knowing there's a problem.
Especially a hash function, since it's generally so easy to speed up if
it did turn out to be a problem.
 
B

Bruno Jouhier [MVP]

The requirement is that a.GetHashCode() and b.GetHashCode() be equal if
a.Equals(b) is true, but not in the other direction: a.Equals(b) may be
false, although a.GetHashCode() and b.GetHashCode() are equal.

So, you don't need to be as precise in your GetHashCode() as in your Equals
method. If your arrays are likely to contain lots of elements, I would use a
"sampling" of the elements to compute GetHashCode(). Something like:

const int HashSampleLen = 20; // adjust as necessary

public override int GetHashCode()
{
int hash = 0;
int step = Max(data.Length / HashSampleLen, 1);
for (int i = 0; i < data.Length; i += step)
hash ^= data;
return hash;
}

The System.String class uses a similar strategy to compute fast hash codes
even on long strings.

Bruno.

Avin Patel said:
Hi,
I have question for GetHashCode() function, Is it correct in following
code or there is more efficient way to implement GetHashCode() function.
class IntArray {
public int[] data;

public override int GetHashCode() {
int hash = 0;
for (int i = 0; i < data.Length; i++)
{
hash ^= data;
}
return hash;
}

public override bool Equals(System.Object cs_)
{
if (obj == null || GetType() != cs_.GetType()) return false;

IntArray cs = (IntArray) cs_;
if (data.Length != cs.data.Length) return false;
for (int i = 0; i < data.Length; i++)
{
if (data != cs.data) return false;
}
return true;
}
}

As documentation says:
If two objects of the same type represent the same value, the hash

function must return the same constant value for either object.
 
G

Guest

Hi Bruno,

The requirement is that a.GetHashCode() and b.GetHashCode() be equal if
a.Equals(b) is true, but not in the other direction: a.Equals(b) may be
false, although a.GetHashCode() and b.GetHashCode() are equal.

This is very good point. I have done sampling for first 20 elements.

Thank You,
Avin Patel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top