Daniel Earwicker
I wrote two trivial test programs that do a billion iterations of a
virtual method call, first in C# (Visual Studio 2005):
Thing t = new DerivedThing();
for (System.Int64 n = 0; n < 1000000000; n++)
t.method();
Then in C++ (using Visual C++ 2005):
Thing *t = new DerivedThing();
for (__int64 n = 0; n < 1000000000; n++)
t->method();
... with appropriate declarations in each case for Thing (an abstract
base class) and DerivedThing (whose method increments a counter).
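The declarations were roughly as follows (a sketch - I've shown the
counter as static so that it counts calls across all instances, which
matters for a later test; the C++ version is the obvious mirror image):

abstract class Thing
{
    public abstract void method();
}

class DerivedThing : Thing
{
    public static long counter;  // "method increments a counter"
    private int data;            // the per-instance data it holds

    public override void method() { counter++; }
}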
C# took 47 seconds, C++ took 58 seconds. Both were release builds.
Now, given that the C++ implementation of virtual method dispatch is
very "close to the metal", this must mean that by the time the C#
version is running, there is no virtual method dispatch happening. The
CLR JIT must be inlining the method call, right? (I looked at the IL
and it's not being inlined by the C# compiler).
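If that guess is right, then after devirtualisation and inlining the
JIT-compiled loop would effectively boil down to this (a conceptual
sketch, not actual JIT output):

for (System.Int64 n = 0; n < 1000000000; n++)
    DerivedThing.counter++;  // direct increment: no vtable lookup, no call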
Then I tried moving the allocation of the DerivedThing inside the loop
- for the C++ program this also meant putting a 'delete t' after the
method call. Note that DerivedThing is a class in C#, not a struct,
and it holds some data.
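So the C# loop became something like this (sketch):

for (System.Int64 n = 0; n < 1000000000; n++)
{
    Thing t = new DerivedThing();
    t.method();
}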
C# took 13 seconds, C++ took 175 seconds. I was a bit shocked by this,
so I ran both a few more times, with identical results.
I thought maybe the JIT looks at what I'm doing with the object and
realises that I'm not holding onto a reference to it outside of the
loop scope, and so it doesn't need to be allocated on the garbage
collected heap in the same way as a long-lived object. Of course to
know that, it would have to look at the code of method(), because it
could be stashing the 'this' reference somewhere.
So I modified DerivedThing's method so it stored the 'this' reference
in a static member, but only on the fourteenth time (out of a billion!)
that it was called. Now the CLR has to allocate a garbage collected
object each time around the loop, right?
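Concretely, the change was along these lines (sketch, using the static
counter from the earlier declarations):

class DerivedThing : Thing
{
    private static Thing escaped;  // a GC root: keeps one instance alive
    public static long counter;
    private int data;

    public override void method()
    {
        counter++;
        if (counter == 14)   // only on the fourteenth call, out of a billion
            escaped = this;  // 'this' escapes to a static member here
    }
}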
But this merely increased the running time to 16 seconds, still less
than 10% of the C++ result.
So maybe it inlines method(), then looks at what it does and
completely rewrites it to produce the same effect without allocating a
billion objects?
Are there any articles that will tell me what the CLR's garbage
collected heap (and/or the JIT) is actually doing in this case? How
can it be more than ten times faster than the non-garbage-collected
C++ heap?