Binary representation of a double?

G

Guest

Hello,

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant bits
in two double variables (the upper- and lowerbound of the interval containing
the root). Currently I am struggling with the binary representation of a
double variable.

I have written some test code and it seems that the binary representation of
a double is given by:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

Where M denotes the mantissa, E the exponent and S the sign.

The location of the least significant bit (*) is seems to be given by:

MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

And the most significant bit:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE

This would mean that the mantissa part is written as:

qrstuvwxy ijklmnop abcdefgh

Where y is the least significant bit and a the most significant bit. This is
kind of unfortunate, because we cannot simply use a bitwise shift operator to
determine if all but the least significant bit are equal. Could you confirm
if we have this bitwise structure correct?

Regards,
Martijn Kaag

We use test code similar to the following to obtain the bitwise
representation:
public string GetBitString (double d) {

byte[] bytes = ToByte(d);
string rval = “â€;
for (int b=0; b<8;b++) {
rval += GetBitString (bytes) + “ â€;
}
return rval;
}
public unsafe Byte[] ToByte ( double d)
{

*Byte[] byte;
&byte = (*Byte[])&d;
return byt;

}

public string GetBitString (byte b)
{
string rval = "";


for (int i=0; i <8;i++)
{
if (b>=126){
// alleen >=126 als meest significante bit in b = 1
rval += "1";
}
else {
rval += "0";
}



______________________________
www.VECOZO.nl
 
E

Egbert Nierop \(MVP for IIS\)

Hello,

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant
bits
in two double variables (the upper- and lowerbound of the interval
containing
the root). Currently I am struggling with the binary representation of a
double variable.

I have written some test code and it seems that the binary representation
of
a double is given by:

Have you been looking at existing code? This seems something that somebody
else might have solved.

Regards
 
G

Guest

Dear Egbert,

Thanks for your response.

Yes I have been looking [a lot] around for answers before posting. The
question is not about code, it is about standards and the internal
representation of floating point numbers.

I sure some else has already solved this and probably everyone with an
academic degree in computing is able to answer this question. Unfortunatley,
I cannot.

Regards,
Martijn
 
M

Michael Bray

The location of the least significant bit (*) is seems to be given by:

MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

And the most significant bit:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE

This would mean that the mantissa part is written as:

qrstuvwxy ijklmnop abcdefgh

Where y is the least significant bit and a the most significant bit.
This is kind of unfortunate, because we cannot simply use a bitwise
shift operator to determine if all but the least significant bit are
equal. Could you confirm if we have this bitwise structure correct?

This is typical of little-endian systems. I can't speak specifically for
the double byte format - I see no reason to doubt your analysis of what is
where. But it should be simple to and/or/bitshift this into a number that
you can use more easily.

-mdb
 
G

Guest

Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
*double to a *long and use shift operators to determine number of different
digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

Unfortunatly this does not seem to be the case.

Regards
Martijn

______________________________
www.VECOZO.nl
 
M

Michael Bray

Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
the *double to a *long and use shift operators to determine number of
different digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

If all you are trying to do is count the number of bits that are the
same or different between the two binary representations, you can use
BitConverter and BitArray classes to do this quite easily:

double pi = Math.PI;
double e = Math.E;

byte[] pib = BitConverter.GetBytes(pi);
byte[] eb = BitConverter.GetBytes(e);

BitArray bapi = new BitArray(pib);
BitArray baeb = new BitArray(eb);

for (int i = 0; i < bapi.Length; i++)
{
Console.WriteLine(
string.Format(
"PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
i,
bapi ? 0 : 1,
baeb ? 0 : 1,
(bapi == baeb) ? "same" : "different"
)
);
}

If you don't want to compare all of the bits, adjust accordingly.

-mdb
 
G

Guest

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant bits
in two double variables (the upper- and lowerbound of the interval containing
the root). Currently I am struggling with the binary representation of a
double variable.

Double is documented to have 15-16 decimal digits of precision. The number
of mantissa bits is 52 which yields 53 under the hidden bit storage format.
With hidden bit, the mantissa is shifted left until there is a 1 bit in the
high order postion, and then that bit is discarded because there is no need
to store the bit if it is always a one bit. 2^52 is 10^15.65, and 2^53 is
10^15.95, hence 15-16 decimal digits of precision.

..Net defines Double.Epsilon, the smallest nonzero positive double value. As
the relative error between two doubles approaches this Epsilon value,
computational differences between them vanish. The relative error between x
and y is (x-y)/y or nearly equivalently (y-x)/x. The purpose of the division
is to take the double's exponent out of the picture.

My recommendation is to test for relative error in the neighborhood of 10 ^
-15.
 
G

Guest

Hello vecozo,

The binary representation of a double is, in fact (with the bytes ordered):

SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM

The first bit of the mantissa is omitted and is always 1 for normalized numbers.

The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52.
If you take the exponent E, the precision is exactly 2 ^ (E - 53)

For more information, visit, for example:
IEEE Standard 754 Floating Point Numbers
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.


<[email protected]> escribió en el mensaje | Hello,
|
| We are writing an application in C# that finds roots of a function up to
| machine precision. Therefore I have to compare the number of significant bits
| in two double variables (the upper- and lowerbound of the interval containing
| the root). Currently I am struggling with the binary representation of a
| double variable.
|
| I have written some test code and it seems that the binary representation of
| a double is given by:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| Where M denotes the mantissa, E the exponent and S the sign.
|
| The location of the least significant bit (*) is seems to be given by:
|
| MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| And the most significant bit:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
|
| This would mean that the mantissa part is written as:
|
| qrstuvwxy ijklmnop abcdefgh
|
| Where y is the least significant bit and a the most significant bit. This is
| kind of unfortunate, because we cannot simply use a bitwise shift operator to
| determine if all but the least significant bit are equal. Could you confirm
| if we have this bitwise structure correct?
|
| Regards,
| Martijn Kaag
|
| We use test code similar to the following to obtain the bitwise
| representation:
| > public string GetBitString (double d) {
| >
| > byte[] bytes = ToByte(d);
| > string rval = “â€;
| > for (int b=0; b<8;b++) {
| > rval += GetBitString (bytes) + “ â€;
| > }
| > return rval;
| > }
| > public unsafe Byte[] ToByte ( double d)
| > {
| >
| > *Byte[] byte;
| > &byte = (*Byte[])&d;
| > return byt;
| >
| > }
| >
| > public string GetBitString (byte b)
| > {
| > string rval = "";
| >
| >
| > for (int i=0; i <8;i++)
| > {
| > if (b>=126){
| > // alleen >=126 als meest significante bit in b = 1
| > rval += "1";
| > }
| > else {
| > rval += "0";
| > }
|
|
| ______________________________
| www.VECOZO.nl
 
G

Guest

Thanks Michael,

That is actually a more convenient way of handling those bits around. And..
It will help me to find the actual binary representation. To be posted here..
in a few days.

Martijn Kaag


______________________________
Martijn Kaag
VECOZO BV
www.vecozo.nl


Michael Bray said:
Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
the *double to a *long and use shift operators to determine number of
different digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

If all you are trying to do is count the number of bits that are the
same or different between the two binary representations, you can use
BitConverter and BitArray classes to do this quite easily:

double pi = Math.PI;
double e = Math.E;

byte[] pib = BitConverter.GetBytes(pi);
byte[] eb = BitConverter.GetBytes(e);

BitArray bapi = new BitArray(pib);
BitArray baeb = new BitArray(eb);

for (int i = 0; i < bapi.Length; i++)
{
Console.WriteLine(
string.Format(
"PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
i,
bapi ? 0 : 1,
baeb ? 0 : 1,
(bapi == baeb) ? "same" : "different"
)
);
}

If you don't want to compare all of the bits, adjust accordingly.

-mdb
 
G

Guest

Thanx Amercer,

In our current implementation we test if the difference between the two
double is greater than:

public static double GetLeastSignificantBit ( double d )
{
d = Math.Abs (d);
int power2 = (int)Math.Floor(Math.Log (d)/Constants.Ln2);
double bit = Math.Pow (2, power2-52);

System.Diagnostics.Debug.Assert ((d + bit) != d);
System.Diagnostics.Debug.Assert ((d + bit/4) == d);

return bit;
}

This comes very close to your recommendation. Thanks!
 
G

Guest

Thanks José!

That is what my book on numerical mathematics told me as well. Unfortunatly
the real binary representation (on "little endian") systems differs from this
specification.

Hopefully I post the answer here in a few days.

Regards,
Martijn Kaag

José Manuel Agüero said:
Hello vecozo,

The binary representation of a double is, in fact (with the bytes ordered):

SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM

The first bit of the mantissa is omitted and is always 1 for normalized numbers.

The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52.
If you take the exponent E, the precision is exactly 2 ^ (E - 53)

For more information, visit, for example:
IEEE Standard 754 Floating Point Numbers
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.


<[email protected]> escribió en el mensaje | Hello,
|
| We are writing an application in C# that finds roots of a function up to
| machine precision. Therefore I have to compare the number of significant bits
| in two double variables (the upper- and lowerbound of the interval containing
| the root). Currently I am struggling with the binary representation of a
| double variable.
|
| I have written some test code and it seems that the binary representation of
| a double is given by:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| Where M denotes the mantissa, E the exponent and S the sign.
|
| The location of the least significant bit (*) is seems to be given by:
|
| MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| And the most significant bit:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
|
| This would mean that the mantissa part is written as:
|
| qrstuvwxy ijklmnop abcdefgh
|
| Where y is the least significant bit and a the most significant bit. This is
| kind of unfortunate, because we cannot simply use a bitwise shift operator to
| determine if all but the least significant bit are equal. Could you confirm
| if we have this bitwise structure correct?
|
| Regards,
| Martijn Kaag
|
| We use test code similar to the following to obtain the bitwise
| representation:
| > public string GetBitString (double d) {
| >
| > byte[] bytes = ToByte(d);
| > string rval = “â€;
| > for (int b=0; b<8;b++) {
| > rval += GetBitString (bytes) + “ â€;
| > }
| > return rval;
| > }
| > public unsafe Byte[] ToByte ( double d)
| > {
| >
| > *Byte[] byte;
| > &byte = (*Byte[])&d;
| > return byt;
| >
| > }
| >
| > public string GetBitString (byte b)
| > {
| > string rval = "";
| >
| >
| > for (int i=0; i <8;i++)
| > {
| > if (b>=126){
| > // alleen >=126 als meest significante bit in b = 1
| > rval += "1";
| > }
| > else {
| > rval += "0";
| > }
|
|
| ______________________________
| www.VECOZO.nl
 
G

Guest

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
*double to a *long and use shift operators to determine number of different
digits.:
...
Unfortunatly this does not seem to be the case.

Actually, this isn't a problem, and casting from a *double to a *long will
do exactly what you want. The bytes appear to be "backwards" when you look
at the double byte by byte because of Intel endian conventions, but you have
to keep in mind that the representation of a long is similarly "backwards".
So if you cast to a long, the lowest order bit of the double's mantissa will
be in the lowest order bit of the long, and the highest order bit of the
exponent will be in the highest order bit of the long.

-Thomee-
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top