Performance - 0 ticks ???

Tamir Khason

Here is the code I use to check performance:
Random r = new Random();
double a = r.Next(9) * 1e307;

DateTime d = DateTime.Now;
double b = Math.Pow(a, 1 / (double)100);
TimeSpan tm = DateTime.Now - d;

Console.WriteLine("{0}---{1}", b, tm.Ticks);

The return value is 0 ticks - that cannot be right! What is the problem?
 
Tamir Khason said:
The return value is 0 ticks - that cannot be right! What is the problem?

The problem is that it's taking less time than the granularity of the
timer. Do the operation a million times, time that, and then divide by
a million.
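
In code, that suggestion might look something like this (a rough sketch; the
iteration count is arbitrary and the Math.Pow call is just the operation from
the original post):

const int iterations = 1000000;
Random r = new Random();
double a = r.Next(9) * 1e307;
double b = 0;

DateTime start = DateTime.Now;
for (int i = 0; i < iterations; i++)
{
    b = Math.Pow(a, 1 / (double)100);
}
TimeSpan total = DateTime.Now - start;

// Average ticks per call; the whole run is now long enough that the
// coarse granularity of DateTime no longer dominates the result.
Console.WriteLine("{0}---{1} ticks per call", b, (double)total.Ticks / iterations);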
 
Thank you, this works. However, how is it possible to time such small
operations without running them many times over?
 
Tamir Khason said:
Thank you, this works. However, how is it possible to time such small
operations without running them many times over?

Well, there's the high resolution performance counter, documented in
various places (I don't have a link to hand) - but to be honest, that's
going to be influenced significantly by other things, such as what else
is happening on the computer at the time. Timing a single very fast
operation is generally a bad idea.
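
A minimal wrapper over that counter (QueryPerformanceCounter and
QueryPerformanceFrequency from kernel32, which Tamir mentions wrapping later
in the thread) might look roughly like this - a sketch only, with error
handling omitted:

using System;
using System.Runtime.InteropServices;

// Sketch of a high-resolution timer built on the Win32 performance counter.
class HiResTimer
{
    [DllImport("kernel32.dll")]
    static extern bool QueryPerformanceCounter(out long count);

    [DllImport("kernel32.dll")]
    static extern bool QueryPerformanceFrequency(out long frequency);

    static readonly long freq;
    long start;

    static HiResTimer()
    {
        QueryPerformanceFrequency(out freq);   // counts per second
    }

    public void Start()
    {
        QueryPerformanceCounter(out start);
    }

    public double ElapsedSeconds
    {
        get
        {
            long now;
            QueryPerformanceCounter(out now);
            return (now - start) / (double)freq;
        }
    }
}

Even with this, the counter has finite granularity and the result is still
affected by whatever else the machine happens to be doing.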
 
Accumulating a million measurements and then dividing by a million will give
the average time of the operation - which is much more meaningful from my
point of view.
Otherwise, using the high-resolution timer is the only option, but the
results will indeed vary depending on CPU usage. The reason is quite
simple - Windows is not a real-time OS by design.

--
Sincerely,
Dmitriy Lapshin [C# / .NET MVP]
Bring the power of unit testing to the VS .NET IDE today!
http://www.x-unity.net/teststudio.aspx

Tamir Khason said:
Thank you ;) But this is what the client wants ;)
 
That's the point. My client wants to benchmark the simplest math
calculations on various platforms, so the accumulation approach is not
suitable for him. I have begun to wrap Win32 and kernel calls for those
measurements. Maybe there is a better approach...

--
Tamir Khason
You want dot.NET? Just ask:
"Please, www.dotnet.us "

Dmitriy Lapshin said:
Accumulating a million measurements and then dividing by a million will give
the average time of the operation - which is much more meaningful from my
point of view.
Otherwise, using the high-resolution timer is the only option, but the
results will indeed vary depending on CPU usage. The reason is quite
simple - Windows is not a real-time OS by design.
 
Tamir,

In that case, I think you need to have a face-to-face conversation with your
client about the nature of the OS and processors, and why this isn't feasible.

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

Tamir Khason said:
That's the point. My client wants to benchmark the simplest math
calculations on various platforms, so the accumulation approach is not
suitable for him. I have begun to wrap Win32 and kernel calls for those
measurements. Maybe there is a better approach...
 
By the way, Jon Skeet is correct: timing a single short operation produces
meaningless data. You must loop so that the whole run takes at least a few
tens of milliseconds to obtain anything useful, and even that is fairly
meaningless unless your CPU is running a single process (impossible). So
plan on looping a thousand times over your little operation, then loop that
loop 10 or 20 times, recording the elapsed time of each pass, and look at
the average of those times, possibly excluding the first pass.

At least the performance counter does produce accurate data. TimeSpan is not
accurate.
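
Sketched out, that batching approach might look something like this (a rough
outline; the counts are arbitrary and should be tuned so each pass takes at
least a few tens of milliseconds, and the performance counter can be
substituted for DateTime):

const int innerLoops = 1000;   // calls per timed pass
const int passes = 20;         // repeat the timed pass several times

double sum = 0;
for (int pass = 0; pass < passes; pass++)
{
    DateTime start = DateTime.Now;
    for (int i = 0; i < innerLoops; i++)
    {
        Math.Pow(1e307, 1 / (double)100);
    }
    double ms = (DateTime.Now - start).TotalMilliseconds;

    if (pass > 0)              // exclude the first (cold) pass from the average
        sum += ms;
    Console.WriteLine("pass {0}: {1} ms", pass, ms);
}
Console.WriteLine("average: {0} ms per {1} calls", sum / (passes - 1), innerLoops);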

Regards,
Frank Hileman

check out VG.net: http://www.vgdotnet.com
Animated vector graphics system
Integrated Visual Studio .NET graphics editor
 
Frank,

The performance counter doesn't produce accurate data either; the
granularity of the high-resolution timer is still an issue when timing
short operations.

Willy.
 
Shawn B. said:
This is what I use to test the duration of individual data query execution
times in my data layer, and it works pretty well. However, using it in my
65c02/65816 C# CPU emulator doesn't work very well as a general way to keep
timing info on individual operations; in that case, I have to group about
25 operations before I can begin to measure and enforce timing...
Nonetheless, I use it anywhere I need timing; it sure beats having to
use a TimeSpan.

The way I see it, if you're going to time things which take long enough
to produce meaningful data, the granularity of DateTime is probably
okay. When I do benchmarks, I like to get results between 10s and a
minute, at which point the slight inaccuracy of DateTime is fairly
irrelevant - and it's easy to use...
 
You're probably right.

But most of the stuff I time requires a high-resolution timer. For example,
in the CPU emulator I speak of, I need to enforce timing to simulate a 1.02 MHz
CPU, so I have to "throttle" the instruction decode/execution. As well,
I have to enforce a 30 FPS display, so I need to do some more throttling
and timing synchronization. Working with DateTime in this case not only
cheats me of the high-resolution timing I require, but also consumes more
clock cycles to work with than this timer class, as per my requirements.
Every clock cycle counts in this case, and so every line of code can make a
difference, especially when running in a loop 1 million times per second.
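
As an illustration only (not the poster's actual emulator code), a throttled
emulation loop built on the QueryPerformanceCounter wrapper sketched earlier
in the thread might be structured along these lines; executeNextInstruction
is a stand-in for the emulator's decode/execute step:

// Hypothetical sketch: run batches of instructions, then spin until the
// batch's share of real time has elapsed, approximating a 1.02 MHz clock.
static void RunThrottled(Action executeNextInstruction)
{
    const double targetHz = 1.02e6;      // simulated CPU clock rate
    const int slice = 1000;              // instructions per timing slice
    double secondsPerSlice = slice / targetHz;

    HiResTimer timer = new HiResTimer(); // wrapper shown earlier in the thread
    while (true)
    {
        timer.Start();
        for (int i = 0; i < slice; i++)
            executeNextInstruction();

        while (timer.ElapsedSeconds < secondsPerSlice)
        {
            // busy-wait; a real emulator might yield or sleep here
        }
    }
}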

For the data-execution timing, I probably could take your approach; however,
I've gotten used to sub-millisecond timing information when I do my traces.
I'm not trying to measure 10,000 inserts or selects as a benchmark. When we
debug and profile in a production environment, we flip a switch in the
config file and start recording certain diagnostics that reflect "real
production" usage and timing, among many other things. In this case, I want
better than millisecond timing. If I run 4 commands against the database
and each reports 0 milliseconds, I won't really know how long things
are taking in reality; if they each report 1 millisecond, I might be inclined
to think that the four commands execute in 4 milliseconds. With the high-
resolution timer, I can see that each command really only takes about 200 ns
and I can execute 5 of them in the same space of 1 millisecond, so the
information is far more meaningful when recording real-world usage on their
server, in their environment, and not in our "workbench" environment.

We've pinpointed numerous bottlenecks this way and will continue to do so.
Each environment is different so we always have a chance to get different
info and find new bottlenecks and tune-and-optimize accordingly.

However, if you are going to measure longer-running operations, a
DateTime/TimeSpan will probably be okay. It is just a matter of what your
needs and requirements are, I suppose. I'm not pushing one way over the
other, just providing an option in case higher-resolution timing is required.

Thanks,
Shawn
 
Granularity is still an issue, but it is far better than TimeSpan. So it is
more accurate for general purpose timing.
 
Frank, Shawn - thank you. Great work.
Both timers are able to measure such an operation: 1e307^(1/100);
both return more or less the same results on a P4 2.4 HT.
I wrote a wrapper over kernel32 as well, for QueryPerformanceCounter and
QueryPerformanceFrequency, and got almost the same results.

I also added support for GetTickCount and timeGetTime to get more exact
results.

Compared with a tiny ASM program I built for this case, the results are
almost the same. So Windows can report these timings with high accuracy
after all?..
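
For reference, the additional Win32 timers mentioned here can be declared
with P/Invoke signatures along these lines (a sketch; both return
milliseconds, so they are much coarser than QueryPerformanceCounter):

using System;
using System.Runtime.InteropServices;

class CoarseTimers
{
    // Milliseconds since the system started (kernel32).
    [DllImport("kernel32.dll")]
    public static extern uint GetTickCount();

    // Multimedia timer, typically around 1 ms resolution (winmm.dll).
    [DllImport("winmm.dll")]
    public static extern uint timeGetTime();
}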
 
Of course timing repeated execution of the operation is misleading too...
It's true that it provides a very accurate measure of how quickly the code
runs when you call it lots and lots of times in quick succession. The
problem is that this doesn't necessarily tell you how fast it will run when
used in the real-life context in which it will eventually be used.

The speed at which a particular piece of code executes varies wildly
according to what the CPU has been doing recently. How good a measure
you'll get from repeatedly running the code in a tight loop will depend on
how the code will get called in real life. If that's how it's going to be
used in reality, then this will be a good test. If it's not, then it
probably won't be.

The very first time the CPU runs a particular bit of code, it'll be fairly
slow - it will have to load the code out of main memory, which is pretty
slow. But if you run it immediately thereafter, it will be much, much
faster, because it's in the cache now - this speeds things up by orders of
magnitude. And there's also the similar issue of whether the data it works
with is in the cache or not. If your simple looping test works with the
same data each time round, but in real life it's going to have to work with
data that might not yet be in the cache, you're going to get pretty
misleading results.

The problem is that microbenchmarks can be misleading by an order of
magnitude or more. The only way to get meaningful results is to try out a
realistic workload, and see what the impact of individual changes is on
your throughput/latency/whatever performance metric matters most to you.

But given what it sounds like the customer has asked for here, it looks like
they are simply demanding a test that will produce some entirely meaningless
data...
 
Shawn,

An offtopic question (would ask by e-mail but I am not sure your e-mail
address is a real one) - judging by your words on 65c02 CPU and 30 FPS,
aren't we going to see a game console emulator written in managed code? ;-)
 
<offtopic>

I started by trying to see what C# was capable of, in terms of raw speed and
capability. My intention was originally to write a CPU, and a development
environment with a custom debugger that allowed me to change my code while I
was still at a breakpoint. So I went about it the typical way you might in
a .NET program, creating a series of interfaces to "black box" the CPU, the
RAM, IO ports, video.

Somewhere in there I noticed that I was only getting 780KHz emulation speed
for the 8-bit 65c02 and so I began to re-design parts of the CPU and voila,
I ended up with 2.5MHz emulation speed (on a 1000MHz PIII). Of course, now,
I have a 64-bit AMD 3500+ so it goes a little faster ;-) Somewhere along
the line, I decided to document various things that can help speed up your
raw processing speed with C# and make a tutorial, except that I don't feel
like going through my SourceGear Vault history and recomposing my CPU, but
I'll get around to it, because it would be an interesting read.

Anyway, my intention was to make the CPU separate from the debugger and so
on, so I can write my debugger later. I then decided to see how well I can
"plug" the CPU into various scenarios. So I decided to reserve 1k of memory
for a text output screen. About 30 times a second it would read memory and
display whatever was there. At some point I got smarter and only updated it
when a change was made. Then I got more creative and created a "ROM"
where I can call "built-in" methods that manipulate text and scroll and
stuff. At some point, I decided to dedicate 8k of RAM to the graphics
display area with only 8 colors. Same thing: created a "ROM" area I can
call to plot a pixel and a line and so on (forget about the C# aspect,
just getting the 8-bit code right to do what I wanted was a real task and
learning experience; it reeks of nostalgia from my earlier days).

Well, it didn't take much before I started looking into how the old NES works
internally. I honestly don't know what my chances of emulating the PPU and
such are, especially since some games are encrypted; not that there isn't open
source I can examine in C/C++, I just don't think C# can handle it, but I
will indeed try. Further, a SuperNES would be even harder to emulate; I
still have problems with my 65816 CPU in its 8-bit emulation part.

As I said, I have an AMD 64-bit CPU, and I've been experimenting with doing
this CPU in assembly. I really like that I can reserve some of the registers
exclusively for the emulated CPU's registers; that makes a massive difference.
I don't like that neither the Microsoft 64-bit C++ nor the Intel C++ 8.1
compiler supports inline assembly. That forces me to write it all in C or in
ASM but not mix them (in the manner that I prefer, by inlining). I'll get
around to benchmarking the 64-bit runtime, but only when it is finally
released; it isn't so important to me at the moment.

So the answer to your question is a yes and no. Hopefully when I make my
source available someone will be able to do something with it more
meaningful than I, since I'm pre-occupied with www.zenofdotnet.com

I created binary8, binary16, binary32, and binary64 value types in C# for
this CPU core and posted them on planet-source-code; you can take a look, but
they didn't meet my performance requirements as well as a raw uint does. I'll
have to figure out why. Nonetheless, if you are interested:

http://tinyurl.com/5egg3 (goes to a www.planet-source-code.com page)

As for my email address, I removed the appropriate vowels from "hotmail",
but I never thought, until you mentioned it, about whether or not my fake
email address is an actual real email address at html.com ... hmm...

</offtopic>

Thanks,
Shawn



 