Binary representation of a double?

Guest · Mar 31, 2006

Hello,

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant bits
in two double variables (the upper- and lowerbound of the interval containing
the root). Currently I am struggling with the binary representation of a
double variable.

I have written some test code and it seems that the binary representation of
a double is given by:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

Where M denotes the mantissa, E the exponent and S the sign.

The location of the least significant bit (*) is seems to be given by:

MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

And the most significant bit:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE

This would mean that the mantissa part is written as:

qrstuvwxy ijklmnop abcdefgh

Where y is the least significant bit and a the most significant bit. This is
kind of unfortunate, because we cannot simply use a bitwise shift operator to
determine if all but the least significant bit are equal. Could you confirm
if we have this bitwise structure correct?

Regards,
Martijn Kaag

We use test code similar to the following to obtain the bitwise
representation:

public string GetBitString (double d) {

byte[] bytes = ToByte(d);
string rval = â€œâ€;
for (int b=0; b<8;b++) {
rval += GetBitString (bytes) + â€œ â€;
}
return rval;
}
public unsafe Byte[] ToByte ( double d)
{

*Byte[] byte;
&byte = (*Byte[])&d;
return byt;

}

public string GetBitString (byte b)
{
string rval = "";

for (int i=0; i <8;i++)
{
if (b>=126){
// alleen >=126 als meest significante bit in b = 1
rval += "1";
}
else {
rval += "0";
}

______________________________
www.VECOZO.nl

Egbert Nierop \(MVP for IIS\) · Mar 31, 2006

Hello,

We are writing an application in C# that finds roots of a function up to
machine precision. Therefore I have to compare the number of significant
bits
in two double variables (the upper- and lowerbound of the interval
containing
the root). Currently I am struggling with the binary representation of a
double variable.

I have written some test code and it seems that the binary representation
of
a double is given by:

Have you been looking at existing code? This seems something that somebody
else might have solved.

Regards

Guest · Mar 31, 2006

Dear Egbert,

Thanks for your response.

Yes I have been looking [a lot] around for answers before posting. The
question is not about code, it is about standards and the internal
representation of floating point numbers.

I sure some else has already solved this and probably everyone with an
academic degree in computing is able to answer this question. Unfortunatley,
I cannot.

Regards,
Martijn

Michael Bray · Mar 31, 2006

The location of the least significant bit (*) is seems to be given by:

MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE

And the most significant bit:

MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE

This would mean that the mantissa part is written as:

qrstuvwxy ijklmnop abcdefgh

Where y is the least significant bit and a the most significant bit.
This is kind of unfortunate, because we cannot simply use a bitwise
shift operator to determine if all but the least significant bit are
equal. Could you confirm if we have this bitwise structure correct?

This is typical of little-endian systems. I can't speak specifically for
the double byte format - I see no reason to doubt your analysis of what is
where. But it should be simple to and/or/bitshift this into a number that
you can use more easily.

-mdb

Guest · Mar 31, 2006

Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
*double to a *long and use shift operators to determine number of different
digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

Unfortunatly this does not seem to be the case.

Regards
Martijn

______________________________
www.VECOZO.nl

Michael Bray · Mar 31, 2006

Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
the *double to a *long and use shift operators to determine number of
different digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

If all you are trying to do is count the number of bits that are the
same or different between the two binary representations, you can use
BitConverter and BitArray classes to do this quite easily:

double pi = Math.PI;
double e = Math.E;

byte[] pib = BitConverter.GetBytes(pi);
byte[] eb = BitConverter.GetBytes(e);

BitArray bapi = new BitArray(pib);
BitArray baeb = new BitArray(eb);

for (int i = 0; i < bapi.Length; i++)
{
Console.WriteLine(
string.Format(
"PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
i,
bapi ? 0 : 1,
baeb ? 0 : 1,
(bapi == baeb) ? "same" : "different"
)
);
}

If you don't want to compare all of the bits, adjust accordingly.

-mdb

Guest · Mar 31, 2006

We are writing an application in C# that finds roots of a function up to

machine precision. Therefore I have to compare the number of significant bits
in two double variables (the upper- and lowerbound of the interval containing
the root). Currently I am struggling with the binary representation of a
double variable.

Double is documented to have 15-16 decimal digits of precision. The number
of mantissa bits is 52 which yields 53 under the hidden bit storage format.
With hidden bit, the mantissa is shifted left until there is a 1 bit in the
high order postion, and then that bit is discarded because there is no need
to store the bit if it is always a one bit. 2^52 is 10^15.65, and 2^53 is
10^15.95, hence 15-16 decimal digits of precision.

..Net defines Double.Epsilon, the smallest nonzero positive double value. As
the relative error between two doubles approaches this Epsilon value,
computational differences between them vanish. The relative error between x
and y is (x-y)/y or nearly equivalently (y-x)/x. The purpose of the division
is to take the double's exponent out of the picture.

My recommendation is to test for relative error in the neighborhood of 10 ^
-15.

Guest · Apr 1, 2006

Hello vecozo,

The binary representation of a double is, in fact (with the bytes ordered):

SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM

The first bit of the mantissa is omitted and is always 1 for normalized numbers.

The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52.
If you take the exponent E, the precision is exactly 2 ^ (E - 53)

For more information, visit, for example:
IEEE Standard 754 Floating Point Numbers
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.

<[email protected]> escribiÃ³ en el mensaje | Hello,
|
| We are writing an application in C# that finds roots of a function up to
| machine precision. Therefore I have to compare the number of significant bits
| in two double variables (the upper- and lowerbound of the interval containing
| the root). Currently I am struggling with the binary representation of a
| double variable.
|
| I have written some test code and it seems that the binary representation of
| a double is given by:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| Where M denotes the mantissa, E the exponent and S the sign.
|
| The location of the least significant bit (*) is seems to be given by:
|
| MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| And the most significant bit:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
|
| This would mean that the mantissa part is written as:
|
| qrstuvwxy ijklmnop abcdefgh
|
| Where y is the least significant bit and a the most significant bit. This is
| kind of unfortunate, because we cannot simply use a bitwise shift operator to
| determine if all but the least significant bit are equal. Could you confirm
| if we have this bitwise structure correct?
|
| Regards,
| Martijn Kaag
|
| We use test code similar to the following to obtain the bitwise
| representation:
| > public string GetBitString (double d) {
| >
| > byte[] bytes = ToByte(d);
| > string rval = â€œâ€;
| > for (int b=0; b<8;b++) {
| > rval += GetBitString (bytes) + â€œ â€;
| > }
| > return rval;
| > }
| > public unsafe Byte[] ToByte ( double d)
| > {
| >
| > *Byte[] byte;
| > &byte = (*Byte[])&d;
| > return byt;
| >
| > }
| >
| > public string GetBitString (byte b)
| > {
| > string rval = "";
| >
| >
| > for (int i=0; i <8;i++)
| > {
| > if (b>=126){
| > // alleen >=126 als meest significante bit in b = 1
| > rval += "1";
| > }
| > else {
| > rval += "0";
| > }
|
|
| ______________________________
| www.VECOZO.nl

Guest · Apr 2, 2006

Thanks Michael,

That is actually a more convenient way of handling those bits around. And..
It will help me to find the actual binary representation. To be posted here..
in a few days.

Martijn Kaag

______________________________
Martijn Kaag
VECOZO BV
www.vecozo.nl

Michael Bray said:
Hi Michael,

Thanks for your response.

It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could
the *double to a *long and use shift operators to determine number of
different digits.:

long a
long b

int numDiff = 0;

for (int i = 0; i < 53; i++) {

a = a << 1
b = b << 1

if ( a != b ) {
numDiff++
}
else {
return numDiff;
}

}

Click to expand...

If all you are trying to do is count the number of bits that are the
same or different between the two binary representations, you can use
BitConverter and BitArray classes to do this quite easily:

double pi = Math.PI;
double e = Math.E;

byte[] pib = BitConverter.GetBytes(pi);
byte[] eb = BitConverter.GetBytes(e);

BitArray bapi = new BitArray(pib);
BitArray baeb = new BitArray(eb);

for (int i = 0; i < bapi.Length; i++)
{
Console.WriteLine(
string.Format(
"PI.Bit({0}) = {1}; E.Bit({0}) = {2}; {3}",
i,
bapi ? 0 : 1,
baeb ? 0 : 1,
(bapi == baeb) ? "same" : "different"
)
);
}

If you don't want to compare all of the bits, adjust accordingly.

-mdb

Guest · Apr 2, 2006

Thanx Amercer,

In our current implementation we test if the difference between the two
double is greater than:

public static double GetLeastSignificantBit ( double d )
{
d = Math.Abs (d);
int power2 = (int)Math.Floor(Math.Log (d)/Constants.Ln2);
double bit = Math.Pow (2, power2-52);

System.Diagnostics.Debug.Assert ((d + bit) != d);
System.Diagnostics.Debug.Assert ((d + bit/4) == d);

return bit;
}

This comes very close to your recommendation. Thanks!

Guest · Apr 2, 2006

Thanks JosÃ©!

That is what my book on numerical mathematics told me as well. Unfortunatly
the real binary representation (on "little endian") systems differs from this
specification.

Hopefully I post the answer here in a few days.

Regards,
Martijn Kaag

JosÃ© Manuel AgÃ¼ero said:
Hello vecozo,

The binary representation of a double is, in fact (with the bytes ordered):

SEEEEEEE EEEEMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM

The first bit of the mantissa is omitted and is always 1 for normalized numbers.

The precision P of the representation of a double D is: D * 2 ^ -53 <= P < D * 2 ^ -52.
If you take the exponent E, the precision is exactly 2 ^ (E - 53)

For more information, visit, for example:
IEEE Standard 754 Floating Point Numbers
http://steve.hollasch.net/cgindex/coding/ieeefloat.html
What Every Computer Scientist Should Know About Floating-Point Arithmetic
http://docs.sun.com/source/806-3568/ncg_goldberg.html

Regards.

<[email protected]> escribiÃ³ en el mensaje | Hello,
|
| We are writing an application in C# that finds roots of a function up to
| machine precision. Therefore I have to compare the number of significant bits
| in two double variables (the upper- and lowerbound of the interval containing
| the root). Currently I am struggling with the binary representation of a
| double variable.
|
| I have written some test code and it seems that the binary representation of
| a double is given by:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| Where M denotes the mantissa, E the exponent and S the sign.
|
| The location of the least significant bit (*) is seems to be given by:
|
| MMMMMMM* MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEESMMMM EEEEEEEE
|
| And the most significant bit:
|
| MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM MMMMMMMM EEES*MMM EEEEEEEE
|
| This would mean that the mantissa part is written as:
|
| qrstuvwxy ijklmnop abcdefgh
|
| Where y is the least significant bit and a the most significant bit. This is
| kind of unfortunate, because we cannot simply use a bitwise shift operator to
| determine if all but the least significant bit are equal. Could you confirm
| if we have this bitwise structure correct?
|
| Regards,
| Martijn Kaag
|
| We use test code similar to the following to obtain the bitwise
| representation:
| > public string GetBitString (double d) {
| >
| > byte[] bytes = ToByte(d);
| > string rval = â€œâ€;
| > for (int b=0; b<8;b++) {
| > rval += GetBitString (bytes) + â€œ â€;
| > }
| > return rval;
| > }
| > public unsafe Byte[] ToByte ( double d)
| > {
| >
| > *Byte[] byte;
| > &byte = (*Byte[])&d;
| > return byt;
| >
| > }
| >
| > public string GetBitString (byte b)
| > {
| > string rval = "";
| >
| >
| > for (int i=0; i <8;i++)
| > {
| > if (b>=126){
| > // alleen >=126 als meest significante bit in b = 1
| > rval += "1";
| > }
| > else {
| > rval += "0";
| > }
|
|
| ______________________________
| www.VECOZO.nl

Guest · Apr 5, 2006

[email protected] said:
It would be simple if the binary representation would be

yxwvutsrq ponmlkji hgfedcba

instead of the suspected qrstuvwxy ijklmnop abcdefgh. That we we could the
*double to a *long and use shift operators to determine number of different
digits.:
...
Unfortunatly this does not seem to be the case.

Actually, this isn't a problem, and casting from a *double to a *long will
do exactly what you want. The bytes appear to be "backwards" when you look
at the double byte by byte because of Intel endian conventions, but you have
to keep in mind that the representation of a long is similarly "backwards".
So if you cast to a long, the lowest order bit of the double's mantissa will
be in the lowest order bit of the long, and the highest order bit of the
exponent will be in the highest order bit of the long.

-Thomee-

Hidden members of a base class	4	Jan 7, 2009
Reading a binary field	1	Jul 6, 2007
extracting the text out of a binary file	6	Aug 5, 2005
Out of memory eventhough Garbage collection is in the code	9	Nov 3, 2008
conversion help some vb asp to c#	2	Jul 30, 2004
Bit of a VBA problem.	11	Mar 6, 2006
Ziff-Davis Benchmarking Operation	1	Jul 9, 2007
Beyond3D article on ATI 'Xenos' - the graphics processor of Xbox 360	1	Jun 13, 2005

Binary representation of a double?

Guest

Egbert Nierop \(MVP for IIS\)

Guest

Michael Bray

Guest

Michael Bray

Guest

Guest

Guest

Guest

Guest

Guest

Ask a Question

Similar Threads