about the hashtable and equals

T

Tony Johansson

Hello!

First some background information before I ask the question.The text is just
copied from a book that I'm reading
The .Net Framework doesn't always understand equality as we do, however. For
example, imagine you created class Fish
that holds the name of the fish like below.

public class Fish
{
string name;

public Fish(string theName)
{
name = theName;
}
}

Now if we create two instances of the Fish class with the same name, the
Hashtable treats them as different objects, as shown in the following code
example.

Hashtable duplicates = new Hashtable();
Fish key1 = new Fish("Herring");
Fish key2 = new Fish("Herring");
duplicates[key1] = "Hello";
duplicates[key2] = "Hello";
Console.WriteLine(duplicates.Count); // give 2

The reason we have two items in the collection is because the Object class's
implementation of GetHashCode creates a hash that is likely to be unique for
each instance of a class. I can override the GetHashCode in the Fish class
to try and let the Hashtable known they are equal.

Here is the code that override the GetHashCode in class Fish
public override int GetHashCode()
{
return name.GetHashCode();
}

If I return the hash of the fish's name, the two instances of the fish will
have the same hash code.
But is that enough for the Hashtable to determine they are identical
objects. Unfortunately, no. If the Hashtable find two objects with the same
hash, it calls their Equals method to see whether the two objects are in
fact equal. Again, the default implementation of Objects.Equals will return
false if the two objects are two different instances of the same class. So
we need to also add an override of the Equals method to our Fish class.

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.

//Tony
 
W

Willem van Rumpt

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.

//Tony

Becasue, in general, a hash is not unique (unless you want a perfect
hash). Ideally, it should be as unique as possible, but it doesn't have
to be.

Hashes are a characteristic, and in a hashtable, everything with the
same characteristic goes into the same bucket. So when more than one
item is a the bucket, there has to be a way to find the correct item.
Equals solves that problem.

Think of "wears glasses" as a hashcode.
All your friends wearing glasses hash to "wears glasses", but they are
most definitely not equal to each other.
 
A

Arne Vajhøj

First some background information before I ask the question.The text is just
copied from a book that I'm reading
The .Net Framework doesn't always understand equality as we do, however. For
example, imagine you created class Fish
that holds the name of the fish like below.

public class Fish
{
string name;

public Fish(string theName)
{
name = theName;
}
}

Now if we create two instances of the Fish class with the same name, the
Hashtable treats them as different objects, as shown in the following code
example.

Hashtable duplicates = new Hashtable();
Fish key1 = new Fish("Herring");
Fish key2 = new Fish("Herring");
duplicates[key1] = "Hello";
duplicates[key2] = "Hello";
Console.WriteLine(duplicates.Count); // give 2

The reason we have two items in the collection is because the Object class's
implementation of GetHashCode creates a hash that is likely to be unique for
each instance of a class. I can override the GetHashCode in the Fish class
to try and let the Hashtable known they are equal.

Here is the code that override the GetHashCode in class Fish
public override int GetHashCode()
{
return name.GetHashCode();
}

If I return the hash of the fish's name, the two instances of the fish will
have the same hash code.
But is that enough for the Hashtable to determine they are identical
objects. Unfortunately, no. If the Hashtable find two objects with the same
hash, it calls their Equals method to see whether the two objects are in
fact equal. Again, the default implementation of Objects.Equals will return
false if the two objects are two different instances of the same class. So
we need to also add an override of the Equals method to our Fish class.

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.

The logic is:
- If Equals return true it is the "same" object
- GetHashCode returns an int for all objects, if Equals return true then
GetHashCode must return the same value, but returning "good" values
from GetHashCode is only for good performance

public override int GetHashCode()
{
return 1;
}

is valid implementation.

But it will make Hashtable lookup O(n) instead of O(1), so
it would be very bad for performance.

You want a GetHashCode that returns many different values.

Arne
 
T

Tony Johansson

Arne Vajhøj said:
First some background information before I ask the question.The text is
just
copied from a book that I'm reading
The .Net Framework doesn't always understand equality as we do, however.
For
example, imagine you created class Fish
that holds the name of the fish like below.

public class Fish
{
string name;

public Fish(string theName)
{
name = theName;
}
}

Now if we create two instances of the Fish class with the same name, the
Hashtable treats them as different objects, as shown in the following
code
example.

Hashtable duplicates = new Hashtable();
Fish key1 = new Fish("Herring");
Fish key2 = new Fish("Herring");
duplicates[key1] = "Hello";
duplicates[key2] = "Hello";
Console.WriteLine(duplicates.Count); // give 2

The reason we have two items in the collection is because the Object
class's
implementation of GetHashCode creates a hash that is likely to be unique
for
each instance of a class. I can override the GetHashCode in the Fish
class
to try and let the Hashtable known they are equal.

Here is the code that override the GetHashCode in class Fish
public override int GetHashCode()
{
return name.GetHashCode();
}

If I return the hash of the fish's name, the two instances of the fish
will
have the same hash code.
But is that enough for the Hashtable to determine they are identical
objects. Unfortunately, no. If the Hashtable find two objects with the
same
hash, it calls their Equals method to see whether the two objects are in
fact equal. Again, the default implementation of Objects.Equals will
return
false if the two objects are two different instances of the same class.
So
we need to also add an override of the Equals method to our Fish class.

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.

The logic is:
- If Equals return true it is the "same" object
- GetHashCode returns an int for all objects, if Equals return true then
GetHashCode must return the same value, but returning "good" values
from GetHashCode is only for good performance

public override int GetHashCode()
{
return 1;
}

is valid implementation.

But it will make Hashtable lookup O(n) instead of O(1), so
it would be very bad for performance.

You want a GetHashCode that returns many different values.

Arne

Hi!

I still mean that if the hash code is the same then you must have identical
keys so I don't see any point why the
framework must call the equals. ?
Can somebody exaplin that ?

//Tony



//Tony
 
P

Peter Duniho

Tony said:
I still mean that if the hash code is the same then you must have identical
keys so I don't see any point why the
framework must call the equals. ?
Can somebody exaplin that ?

Three people have already explained that your assertion is wrong.

Think about it. If the hash code was really unique — that is, if only
one string ever has a given hash code — then we'd have a fabulous way to
compress _all_ strings down to 32 bits, no matter how long they are.

Hash codes necessarily can results in "collisions". That is, two
different objects having the same hash code. The fact that two objects
you check have the same hash code most certainly DOES NOT mean that you
have identical keys.

Pete
 
A

Arne Vajhøj

Arne Vajhøj said:
First some background information before I ask the question.The text is
just
copied from a book that I'm reading
The .Net Framework doesn't always understand equality as we do, however.
For
example, imagine you created class Fish
that holds the name of the fish like below.

public class Fish
{
string name;

public Fish(string theName)
{
name = theName;
}
}

Now if we create two instances of the Fish class with the same name, the
Hashtable treats them as different objects, as shown in the following
code
example.

Hashtable duplicates = new Hashtable();
Fish key1 = new Fish("Herring");
Fish key2 = new Fish("Herring");
duplicates[key1] = "Hello";
duplicates[key2] = "Hello";
Console.WriteLine(duplicates.Count); // give 2

The reason we have two items in the collection is because the Object
class's
implementation of GetHashCode creates a hash that is likely to be unique
for
each instance of a class. I can override the GetHashCode in the Fish
class
to try and let the Hashtable known they are equal.

Here is the code that override the GetHashCode in class Fish
public override int GetHashCode()
{
return name.GetHashCode();
}

If I return the hash of the fish's name, the two instances of the fish
will
have the same hash code.
But is that enough for the Hashtable to determine they are identical
objects. Unfortunately, no. If the Hashtable find two objects with the
same
hash, it calls their Equals method to see whether the two objects are in
fact equal. Again, the default implementation of Objects.Equals will
return
false if the two objects are two different instances of the same class.
So
we need to also add an override of the Equals method to our Fish class.

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.

The logic is:
- If Equals return true it is the "same" object
- GetHashCode returns an int for all objects, if Equals return true then
GetHashCode must return the same value, but returning "good" values
from GetHashCode is only for good performance

public override int GetHashCode()
{
return 1;
}

is valid implementation.

But it will make Hashtable lookup O(n) instead of O(1), so
it would be very bad for performance.

You want a GetHashCode that returns many different values.
I still mean that if the hash code is the same then you must have identical
keys so I don't see any point why the
framework must call the equals. ?
Can somebody exaplin that ?

identical key means always identical hash code
identical hash code does not mean always identical keys
for a good hash function identical hash code means usually identical keys

But even with usually it is necessary to test with Equals to be sure.

Arne
 
H

Harlan Messinger

Tony said:
Arne Vajhøj said:
First some background information before I ask the question.The text is
just
copied from a book that I'm reading
The .Net Framework doesn't always understand equality as we do, however.
For
example, imagine you created class Fish
that holds the name of the fish like below.

public class Fish
{
string name;

public Fish(string theName)
{
name = theName;
}
}

Now if we create two instances of the Fish class with the same name, the
Hashtable treats them as different objects, as shown in the following
code
example.

Hashtable duplicates = new Hashtable();
Fish key1 = new Fish("Herring");
Fish key2 = new Fish("Herring");
duplicates[key1] = "Hello";
duplicates[key2] = "Hello";
Console.WriteLine(duplicates.Count); // give 2

The reason we have two items in the collection is because the Object
class's
implementation of GetHashCode creates a hash that is likely to be unique
for
each instance of a class. I can override the GetHashCode in the Fish
class
to try and let the Hashtable known they are equal.

Here is the code that override the GetHashCode in class Fish
public override int GetHashCode()
{
return name.GetHashCode();
}

If I return the hash of the fish's name, the two instances of the fish
will
have the same hash code.
But is that enough for the Hashtable to determine they are identical
objects. Unfortunately, no. If the Hashtable find two objects with the
same
hash, it calls their Equals method to see whether the two objects are in
fact equal. Again, the default implementation of Objects.Equals will
return
false if the two objects are two different instances of the same class.
So
we need to also add an override of the Equals method to our Fish class.

Now to my question: I mean that if the hash code is the same as in my
example then the object are equal so I don't understand why the framework
must call the equals.

Can somebody give me an example when you have the same hashcode but the
object is not equal.
The logic is:
- If Equals return true it is the "same" object
- GetHashCode returns an int for all objects, if Equals return true then
GetHashCode must return the same value, but returning "good" values
from GetHashCode is only for good performance

public override int GetHashCode()
{
return 1;
}

is valid implementation.

But it will make Hashtable lookup O(n) instead of O(1), so
it would be very bad for performance.

You want a GetHashCode that returns many different values.

Arne

Hi!

I still mean that if the hash code is the same then you must have identical
keys so I don't see any point why the
framework must call the equals. ?
Can somebody exaplin that ?

Then you aren't understanding the nature of a hash code. The number of
items in a hash table is going to be smaller, by orders of magnitude,
then the range of possible hash codes, and as long as the hash codes are
well distributed, collisions are going to be rare anyway. In addition,
the consequence of a collision when it does occur is very small.

Besides that, it isn't POSSIBLE to have what you're asking for because
the number of possible distinct object values is infinite, and the range
of available hash codes is finite.

Also, think of a dictionary with a separate tab (basically, a hash code)
for each letter of the alphabet. That helps you find a word faster,
right? Now imagine a dictionary with a separate tab for every word.
Would that be useful?
 
A

Arne Vajhøj

Harlan said:
[...]
Besides that, it isn't POSSIBLE to have what you're asking for because
the number of possible distinct object values is infinite, and the
range of available hash codes is finite. [...]

While it may seem as though modern PCs are capable of infinitely large
data, it's not actually the case. :)

It's true that the number of possible distinct object values is vastly
larger than the number of available hash codes, it's still a finite number.

..NET limits object size to 2 GB.

So there can definitely not be more than 256^2147483648
different objects.

And that is a finite number.

But also a very big number.

Arne
 
A

Arne Vajhøj

Peter said:
Harlan said:
[...]
Besides that, it isn't POSSIBLE to have what you're asking for
because the number of possible distinct object values is infinite,
and the range of available hash codes is finite. [...]

While it may seem as though modern PCs are capable of infinitely large
data, it's not actually the case. :)

The formality of "subject to the physical limitations of the hardware"
is a little tedious sometimes. :) *Conceptually*, of course, there is
no limit.

The limit in .NET is more restricting than the limit of the hardware.

Arne
 
H

Harlan Messinger

Arne said:
Peter said:
Harlan Messinger wrote:
[...]
Besides that, it isn't POSSIBLE to have what you're asking for
because the number of possible distinct object values is infinite,
and the range of available hash codes is finite. [...]

While it may seem as though modern PCs are capable of infinitely large
data, it's not actually the case. :)

The formality of "subject to the physical limitations of the hardware"
is a little tedious sometimes. :) *Conceptually*, of course, there is
no limit.

The limit in .NET is more restricting than the limit of the hardware.

If you're going to be pendantic about it, then I will as well, and I
will note that you can define a type that has a field whose value is a
path to a file, and you can define the hash code for objects of that
type as a function of not only the contents of that file, but also the
contents of all other files, recursively, that are the identified by
paths appearing in files already processed in this formula, leading to a
result that is limited only by the hardware restrictions on the number
and size of files that can be stored on a computer. There is no reason
why you would do this, but that isn't the point.
 
A

Arne Vajhøj

Arne said:
Peter Duniho wrote:
Harlan Messinger wrote:
[...]
Besides that, it isn't POSSIBLE to have what you're asking for
because the number of possible distinct object values is infinite,
and the range of available hash codes is finite. [...]

While it may seem as though modern PCs are capable of infinitely large
data, it's not actually the case. :)

The formality of "subject to the physical limitations of the hardware"
is a little tedious sometimes. :) *Conceptually*, of course, there is
no limit.

The limit in .NET is more restricting than the limit of the hardware.

If you're going to be pendantic about it,

Given that the 2 GB limit is a practical limit on sub-1000$ PC's, then
it is hardly pedantic.
then I will as well, and I
will note that you can define a type that has a field whose value is a
path to a file, and you can define the hash code for objects of that
type as a function of not only the contents of that file, but also the
contents of all other files, recursively, that are the identified by
paths appearing in files already processed in this formula, leading to a
result that is limited only by the hardware restrictions on the number
and size of files that can be stored on a computer.

That does not give more distinct values.

Two objects with the same filename should get the same file data.

In fact the files are not different from any reference type. The
object only contains the pointer. But same pointer means same data.

Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top