GetHashCode returns the same value for two doubles

  • Thread starter Thread starter Dan
  • Start date Start date
D

Dan

Hi,

I did a test in C#

double x1 = 1.0;
double x2 = 5.29980882362664E-315;
int h1 = x1.GetHashCode();
int h2 = x2.GetHashCode();

It turned out that both h1 and h2 = 1072693248

It is not unique! Is it a defect?
Thanks.

Dan
 
Dan said:
I did a test in C#

double x1 = 1.0;
double x2 = 5.29980882362664E-315;
int h1 = x1.GetHashCode();
int h2 = x2.GetHashCode();

It turned out that both h1 and h2 = 1072693248

It is not unique! Is it a defect?

How could all doubles have unique hashcodes? There are more doubles
than there are ints.

Anything which relies on two semantically different values having
different hashcodes is broken.
 
hi,
every time it calculates 1072693248 for x1 and x2. but x2 has a special
value.
For instance;
x2+1.0= ?
answer is quite surprising: 1.0
for x2=2.0 it calculates 1073741824
difference it is 1048576=1024*1024
there are 1048576 distinct values. Hashing function simply calculates in
this way...
 
Yunus Emre ALPÖZEN said:
hi,
every time it calculates 1072693248 for x1 and x2. but x2 has a special
value.
For instance;
x2+1.0= ?
answer is quite surprising: 1.0
for x2=2.0 it calculates 1073741824
difference it is 1048576=1024*1024
there are 1048576 distinct values. Hashing function simply calculates in
this way...

That's quite a leap of logic there...

Here's a counterexample - this class just generates random doubles and
checks whether or not the hashcode has already been seen:

using System;
using System.Collections;

class Test
{
static void Main()
{
Hashtable map = new Hashtable();
Random rng = new Random();

int distinct = 0;
while (true)
{
double d = rng.Next();

int hash = d.GetHashCode();
if (!map.ContainsKey(hash))
{
map[hash]=hash;
distinct++;
if (distinct % 1000==0)
{
Console.WriteLine (distinct);
}
}
}
}
}

It doesn't take it long to go over the number of distinct values you
claim.
 
Could not understand what u mean ???

I made several changes on your code:

using System;
using System.Collections;

class Test
{
static void Main()
{
Hashtable map = new Hashtable();
Random rng = new Random();

int distinct = 0;
int same = 0;
while (true)
{
double d = rng.Next();

int hash = d.GetHashCode();
if (!map.ContainsKey(hash))
{
map[hash]=d;
distinct++;
if (distinct % 1000==0)
{
//Console.WriteLine (distinct);
}
}
else
{
if (((double)map[hash])!=d)
{
same++;
Console.WriteLine("{0}\t{1}",d,map[hash]);
}
}
}
}
}


--

Thanks,
Yunus Emre ALPÖZEN
BSc, MCAD.NET

Yunus Emre ALPÖZEN said:
hi,
every time it calculates 1072693248 for x1 and x2. but x2 has a special
value.
For instance;
x2+1.0= ?
answer is quite surprising: 1.0
for x2=2.0 it calculates 1073741824
difference it is 1048576=1024*1024
there are 1048576 distinct values. Hashing function simply calculates in
this way...

That's quite a leap of logic there...

Here's a counterexample - this class just generates random doubles and
checks whether or not the hashcode has already been seen:

using System;
using System.Collections;

class Test
{
static void Main()
{
Hashtable map = new Hashtable();
Random rng = new Random();

int distinct = 0;
while (true)
{
double d = rng.Next();

int hash = d.GetHashCode();
if (!map.ContainsKey(hash))
{
map[hash]=hash;
distinct++;
if (distinct % 1000==0)
{
Console.WriteLine (distinct);
}
}
}
}
}

It doesn't take it long to go over the number of distinct values you
claim.
 
Yunus Emre ALPÖZEN said:
Could not understand what u mean ???

You claimed there were 1048576 distinct values returned by
Double.GetHashCode. My program claimed that there are many more than
that - and I would strongly suspect that there are actually as many
values as there are ints, seeing as it would be easily to implement
that way.
I made several changes on your code:

Your changes show that collisions occur, which is unsurprising as I
explained in another post - there are far more doubles available than
ints! I don't see what else your version shows.
 
Sorry, I misunderstood u. But i said that there are 1048576 distinct values
between 1.0 and 2.0
GetHashCode run as a one way function and maps a double value to another
integer value.

At last, there is no doubt; GetHashCode never returns a unique value. It
operates on value, not memory reference. I wish, GetHashCode returns a
unique value every time by using memory locations and value...
--

Thanks,
Yunus Emre ALPÖZEN
BSc, MCAD.NET

Yunus Emre ALPÖZEN said:
Could not understand what u mean ???

You claimed there were 1048576 distinct values returned by
Double.GetHashCode. My program claimed that there are many more than
that - and I would strongly suspect that there are actually as many
values as there are ints, seeing as it would be easily to implement
that way.
I made several changes on your code:

Your changes show that collisions occur, which is unsurprising as I
explained in another post - there are far more doubles available than
ints! I don't see what else your version shows.
 
Yunus Emre ALPÖZEN said:
Sorry, I misunderstood u. But i said that there are 1048576 distinct values
between 1.0 and 2.0

Ah - you didn't, but I can accept that you meant to :)

Actually, again there are more distinct values than that, even between
1.0 and 2.0. The following program uses the fact that when you look at
the bit representations of doubles as a long, they're ordered:

using System;
using System.Collections;

class Test
{
static void Main()
{
long d1 = BitConverter.DoubleToInt64Bits(1.0);
long d2 = BitConverter.DoubleToInt64Bits(2.0);

Console.WriteLine (d2-d1);

Hashtable map = new Hashtable();
int distinct = 0;

for (long l = d1; l <= d2; l++)
{
double d = BitConverter.Int64BitsToDouble(l);
int hash = d.GetHashCode();
if (!map.ContainsKey(hash))
{
map[hash]=""; // Avoid double boxing
distinct++;
if ((distinct%1000)==0)
{
Console.WriteLine (distinct);
}
}
}
}
}

The above finds over 10 million distinct hash codes on my box before it
runs out of memory.
GetHashCode run as a one way function and maps a double value to another
integer value.
Yes.

At last, there is no doubt; GetHashCode never returns a unique value.

Not for doubles, no.
It operates on value, not memory reference. I wish, GetHashCode returns a
unique value every time by using memory locations and value...

I'm glad it doesn't, otherwise it would be screwed up by the garbage
collector moving objects around in memory...
 
Back
Top