Own implementation of GetHashCode()

Matthias Kientz

I have a class that should override the GetHashCode() method. It looks
like this:

public class A
{
    public string str1;
    public string str2;

    //other methods...

    public override int GetHashCode()
    {
        return ???;
    }
}

The instance members str1 and str2 are the "primary keys" of my class.
All instances of A with the same values for str1 and str2 should return
the same hash code. The implementation should also be as fast as possible.

An instance with A.str1 = "x" and A.str2 = "y" should return a different
hash code than an instance with A.str1 = "y" and A.str2 = "x".

How can I implement this behavior and ensure that my hash code is
consistent? How does the implementation of System.String do this?

Thanks for any help and suggestions.
Matthias
 
Dennis Myrén

There are probably many ways, but this is one of them.

You could concatenate str1 and str2 and then use the hash code of the
result, relying on the System.String.GetHashCode implementation.


public override int GetHashCode()
{
    return string.Concat(str1, str2).GetHashCode();
}
 
Matthias Kientz

This was my first idea, too.

But this will return the same hash code for the following instances:
InstanceA.str1 = "x"; InstanceA.str2 = "yz";
InstanceB.str1 = "xy"; InstanceB.str2 = "z";

The goal is to find a way to ensure that different values of str1 and
str2 result in a different hash code.
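
The collision described above can be reproduced with a small sketch. The
class and method names (ConcatCollisionDemo, ConcatHash) are made up for
illustration and are not part of the original class A. Both pairs
concatenate to "xyz", so their hash codes must be equal:

```csharp
using System;

public static class ConcatCollisionDemo
{
    // Hypothetical helper: hashes the plain concatenation of both keys,
    // as in the suggestion above.
    public static int ConcatHash(string str1, string str2)
    {
        return string.Concat(str1, str2).GetHashCode();
    }

    public static void Test()
    {
        // "x" + "yz" and "xy" + "z" are the same string, "xyz",
        // so the two hash codes are necessarily identical.
        Console.WriteLine(ConcatHash("x", "yz") == ConcatHash("xy", "z")); // prints True
    }
}
```

Calling ConcatCollisionDemo.Test() shows that the position of the boundary
between the two strings is lost as soon as they are concatenated.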
 
Samuel R. Neff

You can never guarantee that hash codes will be unique, but you can use
additional calculations to reduce the chance of duplication. The
following calculates the hash codes individually, performs a bit shift
on one, and then XORs them.

Note that GetHashCode should be a fast operation; if the calculation
takes too long, it can cancel out the benefit of the additional
uniqueness (i.e., the time it takes to calculate the hash for all values
vs. the time it takes to loop through the buckets for collided values).

HTH,

Sam


using System;

namespace __WinTesterCS
{
    public class GetHashCodeTest
    {
        public static void Test()
        {
            Strings[] a = new Strings[5];
            a[0] = new Strings("a", "bc");
            a[1] = new Strings("ab", "c");
            a[2] = new Strings("abc", "");
            a[3] = new Strings("", "abc");
            a[4] = new Strings(null, "abc");

            // print each pair together with its combined hash code in hex
            for (int i = 0; i < a.Length; i++)
            {
                Console.WriteLine(a[i].String1 + " / " + a[i].String2 + " -> "
                    + a[i].GetHashCode().ToString("X"));
            }
        }
    }

    public class Strings
    {
        public string String1;
        public string String2;

        public Strings(string string1, string string2)
        {
            String1 = string1;
            String2 = string2;
        }

        public override int GetHashCode()
        {
            // a null string contributes 0 to the combined hash
            int i1 = String1 != null ? String1.GetHashCode() : 0;
            int i2 = String2 != null ? String2.GetHashCode() : 0;
            // shift the first hash so that swapping the strings changes the result
            return (i1 >> 1) ^ i2;
        }
    }
}

 
Matthias Kientz

Hi Samuel,

I've tested your suggestion, but it produces too many collisions for my
purposes (I have many short strings as keys).

My solution is to copy the characters of the two strings alternately
into a buffer, convert the buffer to a string, and take its hash code.
Because this is not very fast, I do it in the constructor and save the
hash code in the instance.

public class A
{
    protected string str1;
    protected string str2;
    protected int hash;

    public A(string s1, string s2)
    {
        int n1 = s1.Length;
        int n2 = s2.Length;
        int j = 0;
        str1 = s1;
        str2 = s2;

        // the buffer gets double the length of the longer string
        int nLength = n1 > n2 ?
            n1 * 2 : n2 * 2;

        char[] buffer = new char[nLength];

        // copy the chars alternately, padding the shorter string with spaces
        for (int i = 0; i < nLength; i++)
        {
            buffer[i] = j < n1 ? s1[j] : ' ';
            i++;
            buffer[i] = j < n2 ? s2[j] : ' ';
            j++;
        }

        // convert to string and get the hash code
        hash = new String(buffer).GetHashCode();
    }

    public override int GetHashCode()
    {
        // use our saved hash code, this is very, very fast now
        return hash;
    }
}
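
For comparison, here is a minimal sketch of a different, commonly used
way to combine two hash codes without building an intermediate string:
multiply a running value by a small prime and add each field's hash. This
is not the interleaving approach above; the class name PairKey and the
constants 17 and 31 are illustrative assumptions. The sketch also
overrides Equals, since hash-based collections expect equal objects to
return equal hash codes, and it keeps the fields read-only so the cached
value cannot go stale.

```csharp
public sealed class PairKey
{
    private readonly string str1;
    private readonly string str2;
    private readonly int hash;

    public PairKey(string s1, string s2)
    {
        str1 = s1;
        str2 = s2;

        // unchecked: integer overflow is expected here and simply wraps around
        unchecked
        {
            int h = 17;
            // a null string contributes 0, as in Samuel's example
            h = h * 31 + (s1 != null ? s1.GetHashCode() : 0);
            h = h * 31 + (s2 != null ? s2.GetHashCode() : 0);
            hash = h;
        }
    }

    public override int GetHashCode()
    {
        // computed once in the constructor, safe because the fields never change
        return hash;
    }

    public override bool Equals(object obj)
    {
        PairKey other = obj as PairKey;
        if (other == null)
        {
            return false;
        }
        // ordinal comparison of both keys, position by position
        return string.Equals(str1, other.str1) && string.Equals(str2, other.str2);
    }
}
```

Two instances built from ("x", "y") compare equal and return the same
hash code, while ("x", "y") and ("y", "x") are not equal. Their hash
codes will usually differ as well, because the first string's hash is
multiplied once more than the second's, but as Samuel noted, no scheme
can guarantee that.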
 
Martin

If you have some knowledge about your strings, maybe you can just add an
"unused" divider between the strings to make collisions between any two
string pairs less likely:

public override int GetHashCode()
{
    string DIV = "@"; // or even "@$¤%&098ggj494"
    return string.Concat(str1, DIV, str2).GetHashCode();
}

You can move the calculation to the constructor and return a cached value
of the hash code, as you've already shown.

Martin
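
A small sketch of what the divider does and does not buy, assuming "@"
as the divider. The names DividerDemo and Combine are hypothetical. The
pairs from earlier in the thread now produce different combined strings,
but two pairs still collapse to the same string if the divider itself
can occur in the keys:

```csharp
using System;

public static class DividerDemo
{
    private const string Div = "@";

    // Hypothetical helper: the string whose hash code would be used.
    public static string Combine(string str1, string str2)
    {
        return string.Concat(str1, Div, str2);
    }

    public static void Test()
    {
        Console.WriteLine(Combine("x", "yz"));  // prints x@yz
        Console.WriteLine(Combine("xy", "z"));  // prints xy@z

        // If "@" can appear in the keys, both pairs become "a@@b".
        Console.WriteLine(Combine("a@", "b") == Combine("a", "@b")); // prints True
    }
}
```

This is why the divider should be a character or sequence that is known
never to appear in str1 or str2.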
 
