How can I speed up ToString of decimal and double?


ThunderMusic

Hi,
Part of our application deals with millions of records and does some
processing on them. We've achieved a pretty good performance gain by
developing a custom DateTime.ToString and a custom int.ToString, but we
can't find any clue on doing the same for decimal and double, which together
would be about half the load. By changing our DateTime and int ToStrings, we
achieved an 84% performance gain, so now we want to optimize decimal and
double. Would anyone have a link or a clue on how we should do it? We
actually speed things up by forcing our own format rather than relying on
cultures, format strings and so on. We found this page for our DateTime (
http://geekswithblogs.net/akraus1/archive/2006/04/23/76146.aspx ), so if you
have something similar for double and decimal, it would be very appreciated.

Thanks

ThunderMusic

P.S. I know it's 100% performance related, but I feel people from the
framework and general forums could be interested too. As for the C# group,
it's just because our code is written in C#, and I would appreciate
examples in C# if possible... ;)
 

rossum

What format of double or decimal do you want: 12.3456 or 1.23456e+001?

Any format can be constructed from a combination of integers and other
characters, so you may well have all that you need already to hand.

rossum
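
As a rough illustration of building the "12.3456" style from integers and a few extra characters, here is a minimal C# sketch. The class and method names (FixedPointSketch, AppendFixed) are made up for this example, and it assumes a fixed number of decimal places (at most 18) and values that still fit in a long after scaling. It is one possible approach, not a tuned implementation.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the framework.
static class FixedPointSketch
{
    // Appends value with exactly `decimals` fractional digits and a '.'
    // separator, without consulting any culture settings.
    // Assumption: 0 <= decimals <= 18 and value * 10^decimals fits in a long
    // (the cast throws OverflowException otherwise).
    public static void AppendFixed(StringBuilder sb, decimal value, int decimals)
    {
        long scale = 1;
        for (int i = 0; i < decimals; i++) scale *= 10;

        // Turn the number into one integer, e.g. 12.3456 -> 123456.
        long scaled = (long)Math.Round(value * scale, MidpointRounding.AwayFromZero);
        if (scaled < 0) { sb.Append('-'); scaled = -scaled; }

        // Emit digits from the right, dropping in the '.' after `decimals` digits.
        char[] buf = new char[24];
        int pos = buf.Length;
        int written = 0;
        do
        {
            buf[--pos] = (char)('0' + scaled % 10);
            scaled /= 10;
            written++;
            if (written == decimals) buf[--pos] = '.';
        } while (scaled != 0 || written <= decimals);

        sb.Append(buf, pos, buf.Length - pos);
    }
}
```

With a reused StringBuilder, AppendFixed(sb, 12.3456m, 4) appends "12.3456" and AppendFixed(sb, 0.5m, 4) appends "0.5000". The temporary char[] could be hoisted out and reused if allocation turns out to matter.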
 

ThunderMusic

Because we are writing them to a CSV file, we must convert them to strings.
And even if we skip ToString and call, for example, StreamWriter.Write(int)
or something like it, it seems to do a ToString in the background, because
it's as slow as doing a ToString and a Write afterward...

Thanks

ThunderMusic
 

ThunderMusic

We want the first one: 12.3456. For decimal, it's possible to get all the
digits as integer values because of the very high precision, but that's not
the case for double: when we multiply, the digits sometimes change because
of the limited precision.

What is a good way of doing it? Right now we go digit by digit using bit
shifts and multiplications. Is there a better way?

Thanks

ThunderMusic
 

Göran Andersson

Are you sure that it's the conversion to strings that really is the
performance problem?

What do you do with the strings after you have converted them? Do you write
them directly to a stream, or do you do any string concatenation first?
 

ThunderMusic

Hi,
I'm 100% sure it's the main bottleneck. For a loop of a million iterations,
DateTime went from about 4 seconds to 0.3 seconds, and we optimized
int.ToString enough to get a 4x performance gain. So we'd like to do the same
for decimal and double, because they are two types we use very heavily in
what we do...

I write directly to the stream, and it's the fastest way possible in our
case: we tried concatenating first to build a buffering system, but it was
slower.

Thanks

ThunderMusic
 

JR

I developed the following algorithm for float-to-string conversion in 1962
or 1963. This is what I remember:

Handle the sign.

Handle the special cases: zero, NaN, infinity.

Split the exponent and the mantissa as integers.

Calculate the decimal exponent k such that 10^(k-1) is less than or equal to
the number and 10^k is larger than the number (if it is equal, you are done).
The number then has k decimal digits in the integer part. Divide the number
by 10^k. Now keep multiplying by 10: each time, the integer part is the next
decimal digit; subtract it and continue.

How to get k quickly? The base-10 logarithm of 2 is approximately 0.3, so if
you multiply the binary exponent by 3, divide by 10 and add 1, you get a good
approximation, which may be off by 1.

I don't know whether the .NET framework uses a similar algorithm, a faster
algorithm or a slower algorithm.

JR
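
The steps above could be sketched in C# roughly as follows. This is only an illustration under several assumptions, not the original 1960s code and not what the framework does. The names (DoubleFormatSketch, ToFixed) are invented, a half-unit rounding step is added before digit extraction, and values of magnitude 1e15 or more are handed back to the framework because plain double arithmetic runs out of precision there. Rounding errors grow with every multiplication by 10, so results are only dependable while the total digit count stays around 15 or fewer.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the framework.
static class DoubleFormatSketch
{
    // 10^0 .. 10^22, all exactly representable as doubles.
    static readonly double[] Pow10 = BuildPow10();

    static double[] BuildPow10()
    {
        double[] t = new double[23];
        t[0] = 1.0;
        for (int i = 1; i < t.Length; i++) t[i] = t[i - 1] * 10.0;
        return t;
    }

    public static string ToFixed(double value, int decimals)
    {
        if (decimals < 0 || decimals > 15) throw new ArgumentOutOfRangeException("decimals");

        // Special cases.
        if (double.IsNaN(value)) return "NaN";
        if (double.IsInfinity(value)) return value > 0 ? "Infinity" : "-Infinity";

        // Too large for this sketch: fall back to the framework.
        if (Math.Abs(value) >= 1e15)
            return value.ToString("F" + decimals.ToString(CultureInfo.InvariantCulture),
                                  CultureInfo.InvariantCulture);

        StringBuilder sb = new StringBuilder(32);
        if (value < 0) { sb.Append('-'); value = -value; }   // sign

        // Added step (not in the description above): round half up
        // at the last requested digit.
        double x = value + 0.5 / Pow10[decimals];

        // Binary exponent taken from the IEEE 754 bit pattern.
        long bits = BitConverter.DoubleToInt64Bits(x);
        int binExp = (int)((bits >> 52) & 0x7FF) - 1023;

        // k = number of digits in the integer part (0 when x < 1).
        int k = 0;
        if (x >= 1.0)
        {
            k = binExp * 3 / 10 + 1;          // log10(2) is roughly 0.3
            while (Pow10[k] <= x) k++;        // fix an estimate that is too low
            while (Pow10[k - 1] > x) k--;     // or too high
        }

        double f = x / Pow10[k];              // now 0 <= f < 1
        if (k == 0) sb.Append('0');
        for (int i = 0; i < k + decimals; i++)
        {
            if (i == k) sb.Append('.');
            f *= 10.0;
            int digit = (int)f;               // integer part is the next digit
            sb.Append((char)('0' + digit));
            f -= digit;
        }
        return sb.ToString();
    }
}
```

For example, ToFixed(12.3456, 4) yields "12.3456" and ToFixed(-0.5, 1) yields "-0.5". Whether this beats double.ToString in a given application would need to be measured.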


 

Phil Wilson

It's easy to see what's actually happening by using Reflector to look at the
code. There's some overhead because date/time formatting is culture-driven.
You didn't mention which ToString() you're using, but if you have a fixed
format then I suspect you can always beat the generic one that's
culture-dependent, and providing your own format provider could be faster
than letting the framework use the culture info.

The string issue that was alluded to is that creating a million strings
might be an appreciable overhead, and re-using one StringBuilder could be
faster.

--
Phil Wilson
[MVP Windows Installer]
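
A minimal sketch of those two suggestions, passing an explicit format provider and re-using one StringBuilder per row, might look like this in C#. The class name, method name, column layout and the 256-character buffer are all assumptions made for the example. The numbers still go through the framework's ToString here, so any gain would have to be confirmed by measuring.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the framework.
static class CsvRowSketch
{
    // Writes one "price,amount" line per record.
    // Assumption: both arrays have the same length and a row fits in 256 chars.
    public static void WriteRows(TextWriter writer, double[] prices, decimal[] amounts)
    {
        // Fixed provider: no lookup of the current thread's culture per call.
        NumberFormatInfo nfi = NumberFormatInfo.InvariantInfo;

        StringBuilder sb = new StringBuilder(64);   // re-used for every row
        char[] buffer = new char[256];              // re-used copy target

        for (int i = 0; i < prices.Length; i++)
        {
            sb.Length = 0;                          // reset instead of reallocating
            sb.Append(prices[i].ToString("R", nfi));
            sb.Append(',');
            sb.Append(amounts[i].ToString(nfi));
            sb.Append('\n');

            // Copy the characters out without creating a string for the row.
            sb.CopyTo(0, buffer, 0, sb.Length);
            writer.Write(buffer, 0, sb.Length);
        }
    }
}
```

Called with a StringWriter, prices { 12.5 } and amounts { 3.10m }, this writes the single line "12.5,3.10".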

 

ThunderMusic

Thanks a lot, that's a great starting point... With exactly that, we can
manage well enough. We'll see what performance gain we can get and start
optimizing from there... ;)

thanks

ThunderMusic

 

ThunderMusic

Hi,
even when using DateTime.ToString("OurFormat"), it was still slow, and we've
got it very, very quick now... I don't know if we will be able to speed up
decimal and double too, but we need a starting point, and I think JR just
gave us the one we need...

We know the structure of a double. Now, can someone tell me what the
structure of a decimal is? Can I find it somewhere on the net? I tried
googling, but found nothing about the decimal structure.

Is it like binary-coded decimal? If the number is 123.456, is it represented
as 0x12, 0x34, 0x56, or something like this?

thanks

ThunderMusic


 

Jon Skeet [C# MVP]


See http://pobox.com/~skeet/csharp/decimal.html

(Or look at MSDN :)
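
For a quick look at the layout from code: a System.Decimal is not binary-coded decimal. It holds a 96-bit unsigned integer plus a sign bit and a scale (a power of ten from 0 to 28), and decimal.GetBits exposes those parts. The sketch below is an illustration only. The names DecimalLayoutSketch, Split and Format are invented, and the fast path assumes the top 32 bits of the integer are zero, falling back to the framework otherwise.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the framework.
static class DecimalLayoutSketch
{
    // Splits a decimal into its 96-bit integer (lo, mid, hi), scale and sign.
    public static void Split(decimal value, out uint lo, out uint mid, out uint hi,
                             out int scale, out bool negative)
    {
        int[] bits = decimal.GetBits(value);   // { lo, mid, hi, flags }
        lo = unchecked((uint)bits[0]);
        mid = unchecked((uint)bits[1]);
        hi = unchecked((uint)bits[2]);
        scale = (bits[3] >> 16) & 0xFF;        // bits 16-23: power of ten divisor
        negative = bits[3] < 0;                // bit 31: sign
    }

    // Formats with '.' as separator, keeping the stored scale (1.50m -> "1.50").
    public static string Format(decimal value)
    {
        uint lo, mid, hi; int scale; bool negative;
        Split(value, out lo, out mid, out hi, out scale, out negative);

        // Assumption: only values whose integer fits in 64 bits take the fast path.
        if (hi != 0) return value.ToString(CultureInfo.InvariantCulture);

        ulong m = ((ulong)mid << 32) | lo;
        bool nonZero = m != 0;

        // Emit digits from the right, inserting '.' after `scale` digits.
        char[] buf = new char[32];
        int pos = buf.Length;
        int written = 0;
        do
        {
            buf[--pos] = (char)('0' + (int)(m % 10));
            m /= 10;
            written++;
            if (written == scale) buf[--pos] = '.';
        } while (m != 0 || written <= scale);

        if (negative && nonZero) buf[--pos] = '-';
        return new string(buf, pos, buf.Length - pos);
    }
}
```

For 123.456m, Split reports lo = 123456, mid = 0, hi = 0 and scale = 3, so the value is stored as the integer 123456 divided by 10^3, and Format returns "123.456".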
 
