What does KeyValuePair.GetHashCode do?

M

Marcel Müller

I have an index provider (part of a query provider) that has an internal
data structure of type
HashSet<KeyValuePair<Nullable<some_struct>, another_struct>>

some_struct and another_struct are generic. some_struct is typically
something like DateTime? or DateRange? while another_struct is always an
object id (primary key) typed to the entity type where the index belongs to.

The key in KeyValuePair cane become null (i.e. HasValue == false) in
case the property has no value assigned. (This is a well defined part of
the data model.) In this case GetHashCode always seems to return
1188265464 regardless of the value. This ends up in excessive hash
collisions.

I tried to figure out what is going on, but the trace ended at
public override extern int ValueType.GetHashCode();
Something strange seem to happen here that causes many struct instances
that do not compare equal all get the same hash code.


Marcel
 
B

bradbury9

El martes, 14 de enero de 2014 14:15:48 UTC+1, Marcel Müller escribió:
I have an index provider (part of a query provider) that has an internal

data structure of type

HashSet<KeyValuePair<Nullable<some_struct>, another_struct>>



some_struct and another_struct are generic. some_struct is typically

something like DateTime? or DateRange? while another_struct is always an

object id (primary key) typed to the entity type where the index belongs to.



The key in KeyValuePair cane become null (i.e. HasValue == false) in

case the property has no value assigned. (This is a well defined part of

the data model.) In this case GetHashCode always seems to return

1188265464 regardless of the value. This ends up in excessive hash

collisions.



I tried to figure out what is going on, but the trace ended at

public override extern int ValueType.GetHashCode();

Something strange seem to happen here that causes many struct instances

that do not compare equal all get the same hash code.





Marcel

According to MS:

http://msdn.microsoft.com/es-es/library/system.object.gethashcode(v=vs.110).aspx

A hash code is a numeric value that is used to identify an object during equality testing. It can also serve as an index for an object in a collection..
The GetHashCode method is suitable for use in hashing algorithms and data structures such as a hash table.

Hope it help, I have never played around with query providers...
 
A

Arne Vajhøj

I have an index provider (part of a query provider) that has an internal
data structure of type
HashSet<KeyValuePair<Nullable<some_struct>, another_struct>>

some_struct and another_struct are generic. some_struct is typically
something like DateTime? or DateRange? while another_struct is always an
object id (primary key) typed to the entity type where the index belongs
to.

The key in KeyValuePair cane become null (i.e. HasValue == false) in
case the property has no value assigned. (This is a well defined part of
the data model.) In this case GetHashCode always seems to return
1188265464 regardless of the value. This ends up in excessive hash
collisions.

I tried to figure out what is going on, but the trace ended at
public override extern int ValueType.GetHashCode();
Something strange seem to happen here that causes many struct instances
that do not compare equal all get the same hash code.

Let us start with the obvious: the behavior is legal. The rule is that
if Equals return true then GetHashCode must return the same value. There
are no rule that Equals return false means GetHashCode should return
different values.

public override int GetHashCode()
{
return 42;
}

is a valid implementation.

For good performance in contexts where the hash is used like
Dictionary and HashSet it is important that the returned values
of GetHashCode is well distributed.

Besides the most simple types (int, double, string etc.), then
one should never expect the given GetHashCode to be good and
if good performance is needed then one should implement a
GetHashCode with the desired characteristics.

The documentation for KeyValuePair is also rather clear.
http://msdn.microsoft.com/en-us/library/system.valuetype.gethashcode.aspx

<quote>
The GetHashCode method applies to types derived from ValueType. One or
more fields of the derived type is used to calculate the return value.
If you call the derived type's GetHashCode method, the return value is
not likely to be suitable for use as a key in a hash table.
Additionally, if the value of one or more of those fields changes, the
return value might become unsuitable for use as a key in a hash table.
In either case, consider writing your own implementation of the
GetHashCode method that more closely represents the concept of a hash
code for the type.
</quote>

In a post to SO it is explained what the ValueType implementation
actually does.

http://stackoverflow.com/questions/3841602/why-is-valuetype-gethashcode-implemented-like-it-is

<quote>
We look for the first non-static field and get it's hashcode.
</quote>

The first non-static field in KeyValuePair is key.

I think that the bottom line is that you should create your own
struct or class with a GetHashCode that matches your requirements.

Arne
 
M

Marcel Müller

Am 15.01.2014 01:07, schrieb Arne Vajhøj:
[...]

You are right. The documentation of ValueType.GetHashCode effectively
says that it is not suitable for any purpose. What else than a hash
table should use a hash?

Throwing an NotImplementedException would have been more fair-minded in
my opinion.

I think that the bottom line is that you should create your own
struct or class with a GetHashCode that matches your requirements.

or my own comparer.

I will try to find a simple to use solution and put it in our utils
assembly. Unfortunately I cannot simply replace Comperer<T>.Default.
I have no good idea so far.


Marcel
 
A

Arne Vajhøj

Am 15.01.2014 01:07, schrieb Arne Vajhøj:
[...]

You are right. The documentation of ValueType.GetHashCode effectively
says that it is not suitable for any purpose. What else than a hash
table should use a hash?

Throwing an NotImplementedException would have been more fair-minded in
my opinion.

http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx

<quote>
The GetHashCode method should not throw exceptions.
</quote>

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top