Internal representation of hashcode

  • Thread starter Thread starter archana
  • Start date Start date
A

archana

Hi all,

I have one question regarding hashing in .net

I have two string containing same data. When i see hashcode by calling
gethascode, i am getting same value.

I want to know how internally strings are storing using hasing.

Please clear my concept regarding hashing in ,net

thanks in advance.
 
Well, personally I don't think you should need to know, since this could be
revised at any point; it is only really guaranteed for the duration of the
app-domain anyway... you certainly shouldn't rely on it between app-domains
(e.g. don't store it in the database) - if this is your intent, use fixed,
published hashes instead.

But the code (from reflector) is below:

Marc

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
public override unsafe int GetHashCode()
{
fixed (char* text1 = ((char*) this))
{
char* chPtr1 = text1;
int num1 = 0x15051505;
int num2 = num1;
int* numPtr1 = (int*) chPtr1;
for (int num3 = this.Length; num3 > 0; num3 -= 4)
{
num1 = (((num1 << 5) + num1) + (num1 >> 0x1b)) ^
numPtr1[0];
if (num3 <= 2)
{
break;
}
num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^
numPtr1[1];
numPtr1 += 2;
}
return (num1 + (num2 * 0x5d588b65));
}
}
 
On a more generic note: yes, it is absolutely normal for 2 completely
different strings (or whatever) to return the same hashcode. Hashcode is
*not* a test of equality; it is a test of *inequality*. If the hashcodes are
different, the strings *cannot* be the same. If the hashcode is the same,
then they *might* be the same.

This is used primarily to sort objects into "buckets" for hash-tables.
Hash-tables group objects based on the hashcode (actually, a mod of the
hashcode); when looking for item "x", the code obtains the hashcode for "x",
then looks in the corresponding bucket. It now only has to look through a
small portion of the entire contents of the hashtable (i.e. all those in the
same bucket) to see if "x" is in the hashtable.

D'ya see?

Marc
 
archana said:
Hi all,

I have one question regarding hashing in .net

I have two string containing same data. When i see hashcode by calling
gethascode, i am getting same value.

That's right. Strings have value semantics, so two strings will compare as
equal if they have the same value. This requires that they also have the
same hashcode.
 
Marc Gravell said:
Well, personally I don't think you should need to know, since this could
be revised at any point; it is only really guaranteed for the duration of
the app-domain anyway... you certainly shouldn't rely on it between
app-domains (e.g. don't store it in the database) - if this is your
intent, use fixed, published hashes instead.

But it is guaranteed for the life of .NET (unless severe incompatibilities
are going to be introduced) that the hash value of a string depends only on
its contents.
 
Yes; the point I was trying to make is that /relying/ on an object hashcode
implementation not changing is not a great idea. I remember a post not so
long ago (here perhaps) from someone who had used object hashcodes to create
password hashes that were stored in the database. The implementation
changed, and of course none of the hashcodes matched any more, and nobody
could log in.

It is quite unlikely that the primitive hashcodes will change (much) in the
foreseeable future, but I would still be very, very paranoid of it, just
through defensive programming. It doesn't take much more effort to use a
well-known, public, fixed hash that will be identical on multiple
languagues, runtimes and OSes.

That's what I was trying to say...

Marc
 
Marc Gravell said:
Yes; the point I was trying to make is that /relying/ on an object
hashcode implementation not changing is not a great idea. I remember a
post not so long ago (here perhaps) from someone who had used object
hashcodes to create password hashes that were stored in the database. The
implementation changed, and of course none of the hashcodes matched any
more, and nobody could log in.

It is quite unlikely that the primitive hashcodes will change (much) in
the foreseeable future, but I would still be very, very paranoid of it,
just through defensive programming. It doesn't take much more effort to
use a well-known, public, fixed hash that will be identical on multiple
languagues, runtimes and OSes.

That's what I was trying to say...

And you were right to do so :-)

I was trying to answer the OP's question in a way that doesn't get into
specific algorithms but still makes the concepts clear:
I have two string containing same data. When i see hashcode by calling
gethascode, i am getting same value.
I want to know how internally strings are storing using hasing.
Please clear my concept regarding hashing in ,net

The answer being:

Classes in .NET can define GetHashCode() however they like, so long as
certain rules are obeyed, e,g,

a.Equals(b) -> a.GetHashCode() == b.GetHashCode()

String, in particular, determines that value of GetHashCode() purely from
the contents of the String (as the rule above implies that it must, since it
also determines Equals() that way.)
 
Mike Schilling said:
But it is guaranteed for the life of .NET (unless severe incompatibilities
are going to be introduced) that the hash value of a string depends only on
its contents.

No - it also depends on the version of .NET, to start with. The
algorithm changed between 1.1 and 2.0. I don't think it would be
unreasonable for it to change between different runs, too (and indeed I
think there's one hashcode which *does* do that - not sure which). I
would only rely on two objects within the same AppDomain which are
equal having the same hashcode.
 
Jon Skeet said:
No - it also depends on the version of .NET, to start with. The
algorithm changed between 1.1 and 2.0. I don't think it would be
unreasonable for it to change between different runs, too (and indeed I
think there's one hashcode which *does* do that - not sure which). I
would only rely on two objects within the same AppDomain which are
equal having the same hashcode.

Perhaps I didn't express myself clearly. I meant that in all versions of
..NET (now and forever), it will be true that the value of the hashcode will
depend only on the contents of the string. In other words, String will
always have value semantics.
 
You are right. I kinda mis-interpreted the original post, and hand-on-heart
admit that my answer was over-complicated for the original question. I saw
the "internal representation" (subject line), and made it all horribly
complex. I think my second post was nearer the mark.

Apologies for any confusion (in particular, to archana). I hope you
(archana) now have the answer you need.

Marc
 
Back
Top