Stefan,
Well, if you are using a method in any class, derived or not (and
everything derives from object except object itself, so technically
everything is derived), then yes, you are using a virtual table.
This is definitely a micro-optimization. Also, you don't know whether your
methods are being inlined or not. If you have a good deal of smaller
methods that really aren't doing anything and are being run repeatedly in a
loop, then I would manually move that code into the loop if you are
concerned with the lookup overhead of the virtual method table.
However, if the method is not marked as virtual, it is possible, if the
method is small enough, that the JIT compiler is already doing this for you
(assuming that the method is not marked with the MethodImpl attribute with
the MethodImplOptions.NoInlining value passed to it), so you might not make
any gains here (I believe that the body of the method has to be less than 32
bytes in order to be a candidate for inlining; you will have to
double-check that).
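A minimal sketch of how you might spot-check this: the MethodImplOptions.NoInlining attribute is real, but the 32-byte threshold is an undocumented JIT heuristic, so treat any measured difference as version-specific. The method names here are made up for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

class InliningSketch
{
    // Small enough that the JIT is likely to inline it at the call site.
    public static int AddInlinable(int a, int b)
    {
        return a + b;
    }

    // Identical body, but inlining is explicitly forbidden,
    // so every call pays the full call overhead.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int AddNotInlined(int a, int b)
    {
        return a + b;
    }

    static void Main()
    {
        int sum = 0;
        for (int i = 0; i < 1000000; i++)
        {
            sum += AddInlinable(i, 1);
            sum -= AddNotInlined(i, 1); // same work, only call cost differs
        }
        Console.WriteLine(sum); // 0
    }
}
```

Wrap each loop in a Stopwatch separately to see whether the difference matters for your workload.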
With parameterized types (generic classes), IIRC, each constructed type is
going to have its own type handle and its own virtual method table, so it's
like using a different class; you still have the virtual method table to
contend with.
With parameterized methods (generic methods), I believe (and I could
very well be wrong here) that there is a separate entry in the method table
for each combination of type parameters passed to the generic method, based
on compatibility. All reference types are considered equivalent, while
different value types are treated separately for the sake of comparison.
Then the list of types (with the previous rules applied) is compared when
looking in the method table.
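A small sketch of that sharing behavior, under my understanding of the CLR: all reference-type instantiations of a generic method share one compiled body, while each distinct value type gets its own specialized body, yet the method always sees the exact type argument.

```csharp
using System;

class GenericSharingSketch
{
    // One generic definition; the JIT compiles a shared native body used for
    // all reference types (string, object, ...) and a specialized body for
    // each distinct value type (int, double, ...).
    public static string NameOf<T>(T value)
    {
        return typeof(T).Name;
    }

    static void Main()
    {
        Console.WriteLine(NameOf("abc"));        // String (shared reference-type body)
        Console.WriteLine(NameOf(new object())); // Object (same shared body)
        Console.WriteLine(NameOf(42));           // Int32  (specialized body)
        Console.WriteLine(NameOf(42.0));         // Double (another specialized body)
    }
}
```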
Since you are doing a great amount of calcs on your structures, you
might want to try using unsafe code, as you can do direct memory management
(this is helpful if you want to swap out sections of memory), and you can
also try to turn off bounds checking for arrays in unsafe code if you are
sure you won't violate an array (buffer overruns are still checked for, and
I believe the CLR will tear down the process if you do overrun a buffer).
You can also look into stackalloc if you don't want to allocate memory on
the heap.
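A sketch of both ideas (compile with /unsafe); the method names are illustrative, and whether skipping bounds checks actually wins is something to measure, not assume:

```csharp
using System;

class UnsafeSketch
{
    // Sums an array through a pinned pointer, bypassing per-element
    // bounds checks.
    public static unsafe long SumUnsafe(long[] data)
    {
        long total = 0;
        fixed (long* p = data)      // pin the array so the GC won't move it
        {
            for (int i = 0; i < data.Length; i++)
                total += p[i];      // raw pointer arithmetic, no bounds check
        }
        return total;
    }

    static unsafe void Main()
    {
        Console.WriteLine(SumUnsafe(new long[] { 1, 2, 3 })); // 6

        // stackalloc: a small scratch buffer on the stack, no GC allocation.
        long* scratch = stackalloc long[4];
        for (int i = 0; i < 4; i++) scratch[i] = i * i;
        Console.WriteLine(scratch[3]); // 9
    }
}
```

Note that `fixed` pinning itself has a cost, and stackalloc is only suitable for small, short-lived buffers.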
--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)
Hmm, good point. To be honest: no, I haven't thought about that.
About generics / inheritance: if you're calling a method in a class,
you're using a virtual table, so that adds overhead. If you're
creating a generic (with a where T : interface constraint) and call the
methods in the generic, you're solving the problem by aggregation, but
don't really need the virtual table. That is: unless the generics actually
use a vtable to resolve the interface methods to the type.
And then there's the || class Foo<T> : Bar<T> where T : interface ||
construct. I'm interested in what this generates in terms of vtables and
so on.
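For what it's worth, a sketch of the constrained-call case as I understand it: when T is a struct, the constrained generic call goes straight to the struct's method (no boxing, and a candidate for inlining), whereas calling through an interface-typed variable boxes the struct and dispatches through the interface's method table. All names here are made up for illustration.

```csharp
using System;

interface IComputable
{
    double Compute(double x);
}

// A value type implementing the interface.
struct Doubler : IComputable
{
    public double Compute(double x) { return x * 2; }
}

class ConstraintSketch
{
    // Constrained generic: for a struct T the call is dispatched directly
    // to Doubler.Compute - no box, no vtable lookup.
    public static double RunGeneric<T>(T c, double x) where T : IComputable
    {
        return c.Compute(x);
    }

    // Interface-typed parameter: the struct is boxed at the call site and
    // Compute is resolved through the interface method table.
    public static double RunInterface(IComputable c, double x)
    {
        return c.Compute(x);
    }

    static void Main()
    {
        Console.WriteLine(RunGeneric(new Doubler(), 21));   // 42
        Console.WriteLine(RunInterface(new Doubler(), 21)); // 42 - same result, extra box
    }
}
```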
I am doing a sheer amount of calculations on structs with basic types.
If you are using a good number of longs in your structures, have you
considered a move to a 64-bit processor (if you are not already on one)?
It could help if you are performing a large number of calculations with
that data type. Of course, you would have to do a great deal of other
testing as well to make sure that everything else works correctly.
What kind of inheritance overhead are you thinking of? Also, where do
you think you are being impacted by generics?
If you are doing a large number of sheer calculations on a primitive
type, then unsafe could be of use to you here. Also, the work of moving
elements around in an array could possibly be faster. However, it could
also introduce other bottlenecks into your app, because of the cost of
fixing (pinning) the arrays when you manipulate them in unsafe code.
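As a sketch of the "moving elements" point: bulk moves can also go through Array.Copy (which handles overlapping ranges in one call) rather than a per-element loop; whether an unsafe version beats that is exactly the kind of thing to profile rather than assume. The helper here is hypothetical.

```csharp
using System;

class ShiftSketch
{
    // Removes the element at 'index' by shifting the tail left one slot;
    // Array.Copy performs the overlapping block move in a single call.
    public static void RemoveAt(double[] data, int index, ref int count)
    {
        Array.Copy(data, index + 1, data, index, count - index - 1);
        count--;
    }

    static void Main()
    {
        double[] data = { 1.0, 2.0, 3.0, 4.0 };
        int count = 4;
        RemoveAt(data, 1, ref count);
        Console.WriteLine(data[0] + "," + data[1] + "," + data[2]); // 1,3,4
    }
}
```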
--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)
Thanks for the reactions. They do help.
I do feel like clearing up some things. We work with ANTS profiler
here, which of course helps in determining the bottlenecks. We've already
implemented heuristics on algorithms to speed things up; we even
implemented a custom compression component to speed up disk writes. We've
tried different algorithmic approaches to matching, determining which
algorithm is best in which situation by heuristics. But when you're
doing some algorithmic operations a couple of million times per CPU,
then it pays off to do micro-optimizations.
And of course, optimizing for the sake of optimizing is never
a good idea... unless you're in the position where you're building
Google-like systems, where it pays off to have fewer servers lying
around (the trade-off being that the resulting applications
shouldn't become unstable, which is the main reason we're not using
C++)... Hmm, and well, yes, that's exactly what we're doing. So to put it
bluntly and in the scope of the large picture: the goal is cost
reduction. Which in our case specifies the "large picture" quite
clearly.
About maintainability: we're talking HELL here, where HELL of course
stands for "Highly Efficient Low Level" code. It's not supposed to be
maintainable; that's why we have the non-optimized version. If it
stops working, we'll throw it away and work our way up from the
non-optimized version, taking along the lessons learned (but that's
unlikely for these particular pieces of code). I clearly want to point
out that we're not just optimizing every line of code, and that we
acknowledge the issues concerning maintainability and design.
I already see a lot of good information in these reactions. I'm
particularly interested in large volumes of (floating-point) operations
on arrays (of structs) - like sorting, but also filtering, array
resizing, filling these arrays with calculations based on other
arrays, etc. - and ways of dealing with this. I'm mostly using the
basic types 'long' and 'double' in these structs, where long is used
because 32 bits isn't large enough. I'm also interested in the overhead
caused by inheritance, particularly whether a combination with generics
poses additional overhead.
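For the sorting case on arrays of structs, a sketch under assumed (illustrative) type names: the generic Array.Sort<T> overload with an IComparer<T> sorts in place and avoids the per-element boxing the old non-generic IComparable path would incur - the comparer itself is boxed once at the call, not once per comparison.

```csharp
using System;
using System.Collections.Generic;

struct Sample
{
    public long Key;
    public double Value;
}

// A struct comparer for Sample; boxed once when passed as IComparer<Sample>,
// not once per element compared.
struct ByKey : IComparer<Sample>
{
    public int Compare(Sample a, Sample b) { return a.Key.CompareTo(b.Key); }
}

class SortSketch
{
    static void Main()
    {
        Sample[] data =
        {
            new Sample { Key = 3, Value = 0.3 },
            new Sample { Key = 1, Value = 0.1 },
            new Sample { Key = 2, Value = 0.2 },
        };
        Array.Sort(data, new ByKey()); // in-place sort of the struct array
        Console.WriteLine(data[0].Key + "," + data[1].Key + "," + data[2].Key); // 1,2,3
    }
}
```

Filtering and compaction can follow the same pattern: operate on the struct array in place and track a logical count, rather than reallocating.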
I find it funny that switch uses a dictionary, since I expected a
jump-by-value. Perhaps I should dig into IL some more...
Thanks,
Stefan.
On May 9, 3:09 pm, "Nicholas Paldino [.NET/C# MVP]" wrote:
To follow up a little, performance tuning is a phased approach as
well. You have to look at big-picture issues before you dig down into
micro-optimizations like you are asking about now. Like Marc said, use a
profiler, see what the bottlenecks in your application are, and then
address those. If you are performance tuning, then you have a goal in mind
for what is acceptable for your application in terms of performance (if
you don't have a goal, then performance tuning for the sake of performance
tuning is a waste here, since you are more than likely going to compromise
the maintainability and design of the code in the process).
For switch statements, once you have more than a few cases, the C#
compiler turns it into a Hashtable (possibly a Dictionary in .NET 2.0
and beyond) which is used for the switch. For very large switch
statements, my guess is that this is faster. Also, this should be faster
than repeated if/then statements as well.
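One hedge on the above, from my understanding of the compiler: switches over dense integral cases lower to the IL `switch` opcode (a computed jump table), and it is switches over *strings* with several cases that go through hashing (a Dictionary in .NET 2.0-era codegen). A sketch of both, which you can confirm with ildasm:

```csharp
using System;

class SwitchSketch
{
    // Dense integer cases: the compiler can emit the IL 'switch' opcode,
    // i.e. a computed jump, not a lookup structure.
    public static string FromInt(int code)
    {
        switch (code)
        {
            case 0: return "zero";
            case 1: return "one";
            case 2: return "two";
            default: return "other";
        }
    }

    // String cases: with enough of them, the compiler hashes the input
    // first (a Dictionary<string, int> in .NET 2.0-era output).
    public static string FromString(string name)
    {
        switch (name)
        {
            case "zero": return "0";
            case "one": return "1";
            case "two": return "2";
            default: return "?";
        }
    }

    static void Main()
    {
        Console.WriteLine(FromInt(1));        // one
        Console.WriteLine(FromString("two")); // 2
    }
}
```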
I agree that unsafe isn't going to do much for you here.
IIRC, the speed of invoking delegates was increased dramatically with
.NET 2.0. Making calls on interfaces/class instances is always going to be
faster, but I think that calling delegates might be much closer to the
speed of calling a member directly. Of course, if you are doing this over
a massive number of iterations, this small amount can add up.
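A minimal timing sketch for that comparison (delegate type and method names are made up; absolute numbers will vary by runtime version, so only the relative gap is meaningful):

```csharp
using System;
using System.Diagnostics;

delegate int IntOp(int x);

class DelegateSketch
{
    public static int AddOne(int x) { return x + 1; }

    static void Main()
    {
        IntOp viaDelegate = AddOne;
        const int N = 10000000;

        Stopwatch sw = Stopwatch.StartNew();
        int a = 0;
        for (int i = 0; i < N; i++) a = AddOne(a);      // direct static call
        sw.Stop();
        Console.WriteLine("direct:   " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        int b = 0;
        for (int i = 0; i < N; i++) b = viaDelegate(b); // delegate invocation
        sw.Stop();
        Console.WriteLine("delegate: " + sw.ElapsedMilliseconds + " ms");
    }
}
```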
Using "yield" is always going to add overhead. When you use "yield"
for an IEnumerable implementation, the compiler is generating a state
machine for you in the background, and that's always going to cost more
resources than doing it yourself.
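To make the state-machine point concrete, a sketch: the method below looks like a plain loop, but the compiler rewrites it into a hidden class implementing IEnumerator<int>, with a state field and a MoveNext that resumes after the last yield. "Doing it yourself" would mean hand-writing that enumerator class (or exposing an indexable array instead).

```csharp
using System;
using System.Collections.Generic;

class YieldSketch
{
    // Compiled into a hidden state-machine class; each MoveNext call
    // resumes the loop at the point of the previous yield return.
    public static IEnumerable<int> Squares(int n)
    {
        for (int i = 0; i < n; i++)
            yield return i * i;
    }

    static void Main()
    {
        foreach (int s in Squares(4))
            Console.Write(s + " "); // 0 1 4 9
        Console.WriteLine();
    }
}
```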
Hope this helps.
--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)
Use a decent profiler and find out. Otherwise you are simply guessing,
and could be having little effect, or making things worse. In most cases,
the performance isn't down to micro-optimisations such as if/else vs
switch, but rather down to bulk algorithms, e.g. looping over a pair of
collections rather than using a dictionary for "exists" etc.
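A sketch of that bulk-algorithm point, with illustrative names: the nested-loop version is O(n*m), while building a dictionary once and probing it is O(n+m), which typically dwarfs any if/else-vs-switch difference.

```csharp
using System;
using System.Collections.Generic;

class ExistsSketch
{
    // O(n*m): scan one collection for every element of the other.
    public static int CountCommonNaive(long[] xs, long[] ys)
    {
        int count = 0;
        foreach (long x in xs)
            foreach (long y in ys)
                if (x == y) { count++; break; }
        return count;
    }

    // O(n+m): build a lookup once, then probe it per element.
    public static int CountCommonSet(long[] xs, long[] ys)
    {
        Dictionary<long, bool> seen = new Dictionary<long, bool>();
        foreach (long y in ys) seen[y] = true;
        int count = 0;
        foreach (long x in xs)
            if (seen.ContainsKey(x)) count++;
        return count;
    }

    static void Main()
    {
        long[] a = { 1, 2, 3, 4 };
        long[] b = { 3, 4, 5 };
        Console.WriteLine(CountCommonNaive(a, b)); // 2
        Console.WriteLine(CountCommonSet(a, b));   // 2
    }
}
```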
For specifics; I recall reading an article demonstrating that switch
was
*marginally* quicker, but barely by enough margin to warrant the
risks
from an edit... "unsafe" would do very little unless you actually
use
unsafe functionality