Do string.GetHashCode() and Hashtable's calls to GetHashCode() differ?

Ashish Khandelwal

See the code below:

using System.Collections;

string str = "blair";
string strValue = "ABC";
string str1 = "brainlessness";
string strValue1 = "XYZ";
int hash = str.GetHashCode();   // Returns 175803953
int hash1 = str1.GetHashCode(); // Returns 175803953
Hashtable ht = new Hashtable();
ht.Add(hash, strValue);
ht.Add(hash1, strValue1); // ****ERROR**** (ArgumentException: key already added)
string strTmp = (string) ht[hash]; // the keys here are the ints, not the strings
string strTmp1 = (string) ht[hash1];

In the code above, calling GetHashCode() on both str and str1 returns
the same hash code, 175803953, so adding the second one to the
Hashtable throws an exception. That is expected (I know the same key
cannot be added twice). Now see the code below:


string str = "blair";
string strValue = "ABC";
string str1 = "brainlessness";
string strValue1 = "XYZ";
Hashtable ht = new Hashtable();
ht.Add(str,strValue);
ht.Add(str1,strValue1);

The code above runs without any error. What I want to understand is
this: Hashtable calls GetHashCode() to get the hash code of the key it
is given, and the first example shows that both strings produce the
same hash code, so why is there no exception in the second example?

Does Hashtable use some other algorithm to generate the hash code of
the key? If so, I think it is always better to use the object itself
as the key instead of first generating its hash code and using that
as the key.

(My main concern is strings as keys.)

Please help me understand.
 
Alberto Poblacion

In your first example you are adding two KEYS that are identical, but in
the second example you are adding two different keys with the same
hash value. The first is illegal, but the second is not. When you add to a
hashtable a second key that has the same hash as an existing one, you get
what is called a "collision", and the hashtable code provides an algorithm
to resolve collisions (which will assign a different slot in the
hashtable to the second key). You do want to minimize the number of
collisions, since they reduce the performance of the hashtable, and one way
to do it is to have a good hashing algorithm that distributes the hash codes
evenly along their range of values.
 
Peter Duniho

[...]
Does HashTable use some other algorithm to generate the Hash Code of
passed key? if so, i think then its always better to assign object
directly as a key in stand of first generate the Hash Code and then
assign it to HashTable as a key.

You are not allowed to have duplicate _keys_ in a Hashtable, but no such
requirement is made on the hash value itself.

The Hashtable (and similar collections) use GetHashCode() to provide fast
access to keys in the Hashtable, but collisions are allowed (they have to
be, otherwise the Hashtable could never hold more keys than there are
distinct hash values). Duplication is detected via the comparer being
used for the Hashtable (by default, the key's own Equals() method), not
the hash code itself.

Collisions in the hash value slow things down (a very tiny amount), but
they don't prevent keys that are actually different from being added to
the Hashtable.

In your first example, you are using the hash code itself as a key. Since
keys can't be duplicated, the hash code collision prevents the addition of
the value for that key a second time. The Hashtable doesn't know or care
where you got that int...all it knows is that you tried to use the same
int twice.

In the second example, the string instance itself is the key. Since the
strings are in fact different, you can add each as a key for the
Hashtable. Accessing one of them will be slightly slower than the other
because their hash codes are identical, but otherwise there's no problem.
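
To make the difference between the two examples concrete, here is a minimal sketch. Because the hash codes of real strings vary between .NET versions, it uses a hypothetical NameKey class (not from the original posts) whose deliberately weak hash code looks only at the first character, so the two keys are certain to collide.

```csharp
using System;
using System.Collections;

// Hypothetical key type with a deliberately weak hash code: it only
// looks at the first character, so "blair" and "brainlessness" collide.
public sealed class NameKey
{
    public readonly string Name;

    public NameKey(string name) { Name = name; }

    public override int GetHashCode()
    {
        return Name.Length == 0 ? 0 : Name[0];
    }

    public override bool Equals(object obj)
    {
        NameKey other = obj as NameKey;
        return other != null && other.Name == Name;
    }
}

public static class CollisionDemo
{
    public static void Main()
    {
        NameKey a = new NameKey("blair");
        NameKey b = new NameKey("brainlessness");

        Hashtable ht = new Hashtable();
        ht.Add(a, "ABC");
        ht.Add(b, "XYZ"); // same hash code, different key: no exception

        Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True
        Console.WriteLine(ht[new NameKey("blair")]);           // ABC
        Console.WriteLine(ht[new NameKey("brainlessness")]);   // XYZ

        try
        {
            ht.Add(new NameKey("blair"), "again"); // equal key: rejected
        }
        catch (ArgumentException)
        {
            Console.WriteLine("duplicate key rejected");
        }
    }
}
```

Only the last Add fails, because only there does Equals() report that the key is already present.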

Pete
 
Ashish Khandelwal

Let me clarify my doubt once again.

The two strings above generate the same hash code, so when the
Hashtable calls GetHashCode() on the keys it is given, it will also
get the same hash code for both strings, right?
So how does it work when it has 2 keys with the same hash code? I
think you are saying that in this case the Hashtable will use the
comparer to find the right value; in short, if there is any
delicacy in the hash code (inside the Hashtable), the Hashtable is
capable of handling the case.

One more thing: as per MSDN, it is not certain that the default
GetHashCode() method (which Hashtable uses) will always return the
same hash code for the same object or string. In that case, how does
the Hashtable find the right value? Is it possible for the Hashtable
to return the wrong value when GetHashCode() returns a different hash
code for the same key?
 
Jon Skeet [C# MVP]

the above code runs perfectly without any error, so now here i want to
understand one thing, as HashTable calls GetHashCode() method to get
the Hash Code of passed key and as we show in the 1st example that the
both strings are generating the same Hash Code so why there is no
exception in the 2nd example

The keys aren't the same. The hashtable doesn't *just* use the hash
code - it uses the hash code *and* the key. The hash code is a way of
very quickly finding all the *possible* matching keys when fetching
from the table - it then looks through all of those keys and compares
them with the key you're looking up with Equals().
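
The two-step lookup described above can be sketched with a toy table. This is only an illustration under simplifying assumptions: TinyTable is a hypothetical class using fixed-size chained buckets, and it is not how the real Hashtable is laid out internally.

```csharp
using System;
using System.Collections.Generic;

// Toy chained hash table (hypothetical; not the real Hashtable internals).
public sealed class TinyTable
{
    private readonly List<KeyValuePair<object, object>>[] buckets =
        new List<KeyValuePair<object, object>>[8];

    // Step 1: the hash code only selects a bucket of *candidate* entries.
    private List<KeyValuePair<object, object>> BucketFor(object key)
    {
        int index = (key.GetHashCode() & 0x7FFFFFFF) % buckets.Length;
        if (buckets[index] == null)
            buckets[index] = new List<KeyValuePair<object, object>>();
        return buckets[index];
    }

    public void Add(object key, object value)
    {
        List<KeyValuePair<object, object>> bucket = BucketFor(key);
        // Step 2: Equals() decides whether the key is really a duplicate.
        foreach (KeyValuePair<object, object> entry in bucket)
            if (entry.Key.Equals(key))
                throw new ArgumentException("Key already added.");
        bucket.Add(new KeyValuePair<object, object>(key, value));
    }

    public object Get(object key)
    {
        foreach (KeyValuePair<object, object> entry in BucketFor(key))
            if (entry.Key.Equals(key))
                return entry.Value;
        return null; // not found
    }
}

public static class TinyTableDemo
{
    public static void Main()
    {
        TinyTable table = new TinyTable();
        table.Add("blair", "ABC");
        table.Add("brainlessness", "XYZ"); // fine even if both share a bucket
        Console.WriteLine(table.Get("blair"));         // ABC
        Console.WriteLine(table.Get("brainlessness")); // XYZ
    }
}
```

Whether or not the two strings land in the same bucket on a given runtime, the Equals() check is what keeps them apart.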
 
Peter Duniho

Let me clear my doubt once again

As above given 2 strings are generating the same Hash Code, so when
Hash Table call the GetHashCode() method of passed keys, it will also
get the same Hash Code for both the strings, right?
so now here how it works, it is having 2 keys with same hash code, i
think you are saying that if this will be the case, Hash Code will use
comparer to find out the right value, in short if there is any
delicacy in the Hash Code (inside HashTable) hashtable is capable to
handle the case.

I don't understand this question. What does "delicacy" mean here?

As was explained to you in the previous thread, it is impossible to create
a hash function that generates unique hash codes for every possible
input. By definition, two different inputs can create the same hash
code. So obviously the Hashtable (and similar collections) must deal with
collisions gracefully.

Not only is the Hashtable class capable of dealing with that situation, it
and any other class using hash codes _must_ be able to deal with that
situation in order to operate correctly.

One more thing, As per MSDN, it is not sure that the Default
GetHashCode() (HashTable uses the same) method will always return the
same Hash Code for same object or String

I also don't understand this statement. What do you mean "as per MSDN"?
Do you have a specific reference that explains what you're saying?

GetHashCode() _must_ always return the same value for the same object. It
_must_ also return the same value for different objects that are considered
equal.

So I don't know what you mean by "it is not sure
that...GetHashCode()...will always return the same Hash Code for same
object". I would say it definitely _is_ sure that will happen. Why do
you say it's not?

so in this case how Hashtable
works how it finds the right value, is there also possibility in
Hashtable to return the wrong value as the GetHashCode() method is now
returning the different Hash Code for same key?

When is GetHashCode() returning a different hash code for the same key?
That should never happen (see above).

Pete
 
Ashish Khandelwal

Oooppps... Typing mistake

Delicacy = Duplicate


The default implementation of the GetHashCode method does not
guarantee unique return values for different objects. Furthermore,
the .NET Framework does not guarantee the default implementation of
the GetHashCode method, and the value it returns will be the same
between different versions of the .NET Framework. Consequently, the
default implementation of this method must not be used as a unique
object identifier for hashing purposes.


For detail please see http://msdn2.microsoft.com/en-us/library/system.object.gethashcode.aspx
 
Peter Duniho

Oooppps... Typing mistake

Delicacy = Duplicate

Ah. That makes more sense.

But for the MSDN doc you cited, that doesn't say what you said it does.
The closest it comes to saying that the same object won't return the same
hash code is where it says "the .NET Framework does not guarantee the
default implementation of the GetHashCode method, and the value it returns
will be the same between different versions of the .NET Framework". But
that's very different than saying that there's a possibility you'll get
the wrong value from GetHashCode().

The documentation is basically saying that you can rely on the hash code
for a given execution of your application, but that you should not store
it externally, or even transmit it to some other process that might be
running a different version of .NET, because different implementations of
.NET might calculate the hash code differently.

But as long as you treat the hash code as strictly an internal, run-time
attribute of your objects, there's no problem.

Pete
 
Ashish Khandelwal

Thanks a lot Peter, it's making sense to me.

One more question:
Can you tell me the reason why hash codes cannot be unique for
different objects?
 
Marc Gravell

    Can you able to say me that what is the reason that hash Code can
not be Unique for different Objects?

Very simple; it is an integer, and has only 2^32 possible values. Now
imagine (as a simple case) that your object is a "long" (Int64)... now
keep incrementing "i" (Int64) and get the hash-code; *eventually* you
are going to see duplicates, simply because you have run out of unused
Int32 values.

The same is true of any data type where there are more than 2^32
feasible values.

As such, hash-tables only use the hash-code to group things; they
don't enforce uniqueness on the hash-code. Two different objects can
return the same hash-code, but two objects that should be *considered*
equal *must* report the same hash-code.
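
As a small illustration of the counting argument, the sketch below squeezes a 64-bit value into 32 bits. Fold is a hypothetical hash function written for this example (XOR of the upper and lower halves); it is an assumption, not a statement about what Int64.GetHashCode() does on any particular runtime.

```csharp
using System;

public static class PigeonholeDemo
{
    // Hypothetical 64-to-32-bit hash: XOR the upper and lower halves.
    public static int Fold(long value)
    {
        return unchecked((int)value ^ (int)(value >> 32));
    }

    public static void Main()
    {
        long x = 0L;
        long y = 0x100000001L; // upper half 1, lower half 1

        Console.WriteLine(x == y);             // False
        Console.WriteLine(Fold(x) == Fold(y)); // True
    }
}
```

Any function from 2^64 inputs to 2^32 outputs has to produce collisions like this one; a better function only makes them harder to hit by accident.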

Finally, the following is a perfectly legal (albeit stupid) hash-code
routine:

public override int GetHashCode() {
    return 17;
}

Marc
 
Ashish Khandelwal

Thanks a lot...

I got really good responses and am very satisfied with the answers I got.
 
