Performance tuning


atlaste

Hi,

I'm currently developing an application that uses a lot of
computational power, disk access and memory caching (to be more exact:
an information retrieval platform). In this kind of application, the
last thing that remains is raw performance tuning.

So, for example, you can branch on a bit with a 'case/switch', with an
'if/then/else', or with a multiplication into a static lookup buffer.
Or you can sort with an inline (anonymous) delegate, with a regular
delegate, or with a static delegate. Which is best in terms of raw
speed? Or should I just use Reflector, paste the sort implementation in
and save myself the time? Perhaps I should mark my critical code
'unsafe'? Should I avoid foreach, or rather use it? What about 'yield'?
Should I avoid exceptions? What are the real performance killers? And so on...

Since I don't feel like testing every possible optimization and
thereby reinventing the wheel, I wonder whether someone has already
put together a document, book or article containing the best practices
for low-level optimizations in C#.

Your help is appreciated.

Thanks,
Stefan.
 

Marc Gravell

Use a decent profiler and find out. Otherwise you are simply guessing,
and could be having little effect, or making things worse. In most
cases, the performance isn't down to micro-optimisations such as
if/else vs switch, but rather down to bulk algorithms e.g. looping
over a pair of collections rather than using a dictionary for "exists"
etc.

For specifics; I recall reading an article demonstrating that switch
was *marginally* quicker, but barely by enough margin to warrant the
risks from an edit... "unsafe" would do very little unless you
actually use unsafe functionality inside the block. For sorting, the
interface-based approach (IComparer<T>) /may/ be quicker than
delegates - but don't trust anybody: rig a test harness and find out,
or profile your app. As for exceptions - well, my code can screw
things up very quickly indeed if I don't correctly handle error
conditions ;-p
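
For what it's worth, "rig a test harness" can be as small as the sketch below (a rough illustration only; the data sizes and names are made up). It times Array.Sort with a Comparison<T> delegate against the IComparer<T> overload, using Stopwatch:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    static class SortHarness
    {
        static int[] MakeData(int n)
        {
            Random rnd = new Random(12345);            // fixed seed so runs are comparable
            int[] data = new int[n];
            for (int i = 0; i < n; i++) data[i] = rnd.Next();
            return data;
        }

        static void Main()
        {
            const int N = 1000000;

            int[] a = MakeData(N);
            Stopwatch sw = Stopwatch.StartNew();
            Array.Sort(a, delegate(int x, int y) { return x.CompareTo(y); });  // Comparison<T> delegate
            sw.Stop();
            Console.WriteLine("Comparison<int> delegate: {0} ms", sw.ElapsedMilliseconds);

            int[] b = MakeData(N);
            sw = Stopwatch.StartNew();
            Array.Sort(b, Comparer<int>.Default);                              // IComparer<T>
            sw.Stop();
            Console.WriteLine("IComparer<int>:           {0} ms", sw.ElapsedMilliseconds);
        }
    }

Run it in a release build, outside the debugger, and repeat each measurement a few times before trusting the numbers.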

Marc
 

Nicholas Paldino [.NET/C# MVP]

To follow up a little, performance tuning is a phased approach as well.
You have to look at big picture issues before you dig down into
micro-optimizations like you are asking about now. Like Marc said, use a
profiler, see what the bottlenecks in your application are, and then address
those. If you are performance tuning, then you have a goal in mind for what
is acceptable for your application in terms of performance (if you don't
have a goal, then performance tuning for the sake of performance tuning is a
waste here since you are more than likely going to compromise the
maintainability and design of the code in the process).

For switch statements over strings, once you have more than a few cases, the
C# compiler turns the switch into a hash-based lookup (a Hashtable, possibly a
Dictionary in .NET 2.0 and beyond); a switch over dense integer values is
compiled to a jump table instead. For very large switch statements, my guess
is that this is faster. Either way, it should be faster than a long chain of
repeated if/then statements.
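
As a hypothetical illustration (the method and case values are made up), a string switch like the one below is where that hash-based lowering can kick in; you can confirm what you actually got by inspecting the IL with ildasm:

    // With many string cases the compiler may emit a dictionary-style lookup;
    // with dense integer cases it emits the IL 'switch' instruction (a jump table).
    static int CommandCode(string command)
    {
        switch (command)
        {
            case "open":  return 1;
            case "close": return 2;
            case "read":  return 3;
            case "write": return 4;
            case "seek":  return 5;
            case "flush": return 6;
            default:      return 0;
        }
    }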

I agree that unsafe isn't going to do much for you here.

IIRC, the speed of invoking delegates was increased dramatically with
.NET 2.0. Making calls on interfaces/class instances is always going to be
faster, but I think that calling delegates might be much closer to the speed
of calling a member directly. Of course, if you are doing this over a
massive number of iterations, this small amount can add up.

Using "yield" is always going to add overhead. When you use "yield" for
an IEnumerable implementation, the compiler is generating a state machine
for you in the background, and that's always going to cost more resources
than doing it yourself.
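
To make that concrete, here is a small hypothetical pair: the iterator below gets rewritten by the compiler into a hidden state-machine class with its own MoveNext/Current plumbing, while the hand-written loop does the same work with no extra object and no per-element interface calls:

    using System.Collections.Generic;

    static class RangeDemo
    {
        // Compiled into a generated class implementing IEnumerator<int>.
        static IEnumerable<int> RangeIterator(int start, int count)
        {
            for (int i = 0; i < count; i++)
                yield return start + i;
        }

        // The same work done "by hand": no iterator object, no MoveNext calls.
        static long SumRangeDirect(int start, int count)
        {
            long sum = 0;
            for (int i = 0; i < count; i++)
                sum += start + i;
            return sum;
        }
    }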

Hope this helps.
 

atlaste

Thanks for the reactions. They do help.

I do feel like clearing up some things. We work with ANTS Profiler
here, which of course helps in determining the bottlenecks. We've already
implemented heuristics on algorithms to speed things up; we even
implemented a custom compression component to speed up disk writes.
We've tried different algorithmic approaches to matching, determining by
heuristics which algorithm is best in which situation. But when you're
doing some algorithmic operation a couple of million times per CPU,
it pays off to do micro-optimizations.

And of course optimizing for the sake of optimizing is never
a good idea... unless you're in the position where you're building
Google-like systems, where it pays off to have fewer servers lying
around (with the trade-off that the resulting applications shouldn't
become unstable, which is the main reason we're not using C++)...
Hmm, and well, yes, that's exactly what we're doing. So to put it
bluntly and in the scope of the big picture: the goal is cost
reduction. Which in our case specifies the "big picture" quite
clearly.

About maintainability: we're talking HELL here, where HELL of course
stands for "Highly Efficient Low-Level" code. It's not supposed to be
maintainable; that's why we have the non-optimized version. If it
stops working, we'll throw it away and work our way up from the
non-optimized version again, taking along the lessons learned (but that's
unlikely for these particular pieces of code). I do want to point out
clearly that we're not just optimizing every line of code, and that we
acknowledge the issues concerning maintainability and design.

I already see a lot of good information in these reactions. I'm
particularly interested in large volumes of (floating point) operations
on arrays (of structs) - like sorting, but also filtering, array
resizing, filling these arrays with calculations based on other
arrays, etc. - and ways of dealing with this. I'm mostly using the
basic types 'long' and 'double' in these structs, where long is used
because 32 bits isn't large enough. I'm also interested in the overhead
caused by inheritance, particularly whether combining it with generics
poses additional overhead.

I find it funny that switch uses a dictionary, since I expected a jump-
by-value. Perhaps I should dig into IL some more...

Thanks,
Stefan.

 

Nicholas Paldino [.NET/C# MVP]

If you are using a good number of longs in your structures, have you
considered a move to a 64-bit processor (if you are not already on one)?
It could help if you are performing a large number of calculations with
that data type. Of course, you would then have to do a great deal of other
testing to make sure that everything else still works correctly.

What kind of inheritance overhead are you thinking of? Also, where do
you think you are being impacted by generics?

If you are doing a large number of raw calculations on a primitive
type, then unsafe code could be of use to you here. Moving elements around
in an array could also possibly be faster. However, it could introduce other
bottlenecks into your app, because of having to fix (pin) the arrays when
you manipulate them in unsafe code.
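
As a rough sketch of what that looks like (illustrative only, with made-up names), the 'fixed' statement pins the array so the GC can't move it while you hold a raw pointer into it:

    // Compile with /unsafe. Sums a double[] through a pinned pointer;
    // no per-element bounds check, but the array stays pinned for the duration.
    static unsafe double SumPinned(double[] values)
    {
        if (values.Length == 0) return 0;       // avoid fixing an empty array
        double sum = 0;
        fixed (double* p = values)
        {
            double* cur = p;
            double* end = p + values.Length;
            while (cur < end)
                sum += *cur++;
        }
        return sum;
    }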

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)


 

David Madden

One thing that I like to do before getting into disk access and network access
tuning is to check my variables. Make sure that each variable is the smallest
type it can be. If you can use Int16 for the elements in an array, that might
help if you use that array intensively (just make sure you would never have to
go beyond Int16, of course); a quick size check is sketched below.
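
As a small, hypothetical illustration (the struct names are made up), halving the element size halves the memory traffic for a large array, which often matters more than any micro-optimization on the loop itself:

    using System;
    using System.Runtime.InteropServices;

    struct WideRecord   { public int Id;   public int Count; }    // 2 x 4 bytes
    struct NarrowRecord { public short Id; public short Count; }  // 2 x 2 bytes

    static class SizeDemo
    {
        static void Main()
        {
            // Marshal.SizeOf reports the unmanaged layout size, which for these
            // simple structs matches the managed size (typically 8 vs 4 bytes).
            Console.WriteLine(Marshal.SizeOf(typeof(WideRecord)));
            Console.WriteLine(Marshal.SizeOf(typeof(NarrowRecord)));
        }
    }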

Variable checking is something that people forget, and it can be a cause of
performance issues when reading data into the program for processing. I
cannot recall the name of the program at this time, but there are tools out
there that help analyze the memory allocation and usage of an application
when it is launched.

Also check for objects that are still in memory when they no longer need to
be. I hope this helps as a start; it is something I usually try to tackle,
since my apps run on PCs that are not always up to my standards.
 

atlaste

David,

You're absolutely right. Since disk is roughly a factor of 1,000 slower than
memory, it helps to compact things. Books like "Managing Gigabytes" tell this
story for the field of information retrieval in great detail, which is why we
apply gamma encoding and a variant of arithmetic coding in the storage
process.

Furthermore, I set variables to null, use 'using', and so on, because I
found this to help the GC. And I use structs rather than classes where
appropriate.

Useful suggestions. Have more? :)

Cheers,
Stefan.
 

atlaste

Hmm, good point. To be honest: no, I hadn't thought about that.

About generics / inheritance: if you're calling a method on a class, you're
using a virtual table, so that adds overhead. If you're writing a generic
(with a 'where T : interface' constraint) and calling the methods through the
constraint, you're solving the problem by aggregation and shouldn't really
need the virtual table - unless the generics actually use a vtable to resolve
the interface methods against the type.

And then there's the 'class Foo<T> : Bar<T> where T : interface' construct.
I'm interested in what this generates in terms of vtables and so on.
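
To sketch what I mean (purely illustrative names, and my understanding of the JIT behavior, so take it with a grain of salt): for a value-type T the runtime specializes the generic per value type, so the constrained call can bind directly to the struct's method without boxing, while for reference types it stays an ordinary interface call.

    interface IScorer
    {
        double Score(double x);
    }

    // Value-type implementation: used through the generic below there is no boxing,
    // because the JIT compiles a separate copy of Apply for each value type T.
    struct LinearScorer : IScorer
    {
        public double Factor;
        public double Score(double x) { return x * Factor; }
    }

    static class ScoreRunner
    {
        // For struct T the interface method is resolved at JIT time for that T;
        // for class T this is still a normal virtual interface call.
        internal static double Apply<T>(T scorer, double[] values) where T : IScorer
        {
            double total = 0;
            for (int i = 0; i < values.Length; i++)
                total += scorer.Score(values[i]);
            return total;
        }
    }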

I am doing a huge number of calculations on structs built from basic types.

Cheers,
Stefan.

 

Nicholas Paldino [.NET/C# MVP]

Stefan,

Well, if you are calling a method on any class, derived or not (and
everything derives from object except object itself, so technically everything
is derived), then yes, you are going through a virtual table.

This is definitely a micro-optimization. Also, you don't know whether your
methods are being inlined or not. If you have a good deal of small methods
that really aren't doing much and that are being run repeatedly in a loop,
then I would manually move that code into the loop if you are concerned with
the lookup overhead of the virtual method table.

However, if the method is not marked as virtual, it is possible, if the
method is small enough, that the JIT compiler is already doing this for you
(assuming that the method is not marked with the MethodImpl attribute with
the MethodImplOptions.NoInlining value passed to it), so you might not make
any gains here. (I believe the body of the method has to be less than 32
bytes of IL in order to be a candidate for inlining; you will have to
double-check that.)

With parameterized types (generic classes), IIRC, each constructed type is
going to have its own type handle and its own virtual method table, so it's
like using a different class: you still have the virtual method table to
contend with.

With parameterized methods (generic methods), I believe (and I could
very well be wrong here) that there is a separate entry in the method table
for each combination of type parameters passed to the generic method, based
on compatibility. All reference types are considered equal, while different
value types are treated as distinct for the sake of this comparison. Then the
list of types (with the previous rules applied) is compared when looking in
the method table.

Since you are doing a great amount of calculations on your structures, you
might want to try using unsafe code, as you can do direct memory manipulation
(helpful if you want to swap out sections of memory), and you can effectively
turn off bounds checking for arrays in unsafe code if you are sure you won't
index outside an array (buffer overruns across the allocation are still
checked for, and I believe the CLR will tear down the process if you do
overrun a buffer). You can also look into stackalloc if you don't want to
allocate memory on the heap.
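
A tiny sketch of the stackalloc idea (hypothetical method, illustrative sizes): the scratch buffer lives on the stack, so there is no GC allocation and it disappears when the method returns, which makes it suitable only for small, short-lived workspace:

    using System;

    static class ScratchDemo
    {
        // Compile with /unsafe. Averages the last 'window' values using a
        // stack-allocated scratch buffer instead of a temporary heap array.
        internal static unsafe double AverageOfTail(double[] values, int window)
        {
            if (window <= 0 || window > values.Length || window > 1024)
                throw new ArgumentOutOfRangeException("window");

            double* scratch = stackalloc double[window];            // no heap allocation
            for (int i = 0; i < window; i++)
                scratch[i] = values[values.Length - window + i];    // copy the tail

            double sum = 0;
            for (int i = 0; i < window; i++)
                sum += scratch[i];
            return sum / window;
        }
    }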


--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)


 

atlaste

Thanks, this was more or less the information I was looking for.

Cheers,
Stefan.

 

David Madden

Also take into consideration the PCs that the software will be running on.
You might be able to reduce the number of times you write out data, which can
improve overall system performance.
 

ThunderMusic

Hi,
Here we have done a bit of optimizing on a raw data export to a CSV file and
achieved a 91% time reduction: a process that took about 35 minutes before we
optimized it now takes about 3 minutes. The only thing we did was write our
own ToString methods for DateTime, int and decimal, which is really simple
once you know how (there are many examples on the net for this kind of
method). If you don't care about localization and use ToString a lot, maybe
that's something to begin with.
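
For the int case, the idea can look something like this sketch (not our actual code, and it only handles non-negative values): write the digits into a char buffer from the end and append that slice straight to the output, skipping the culture/format lookup and the intermediate string that int.ToString() would create:

    // Formats a non-negative int into a caller-supplied char buffer,
    // filling digits from the end; returns the index where the text starts.
    static int FormatInt32(int value, char[] buffer)
    {
        int pos = buffer.Length;
        if (value == 0)
        {
            buffer[--pos] = '0';
            return pos;
        }
        while (value > 0)
        {
            buffer[--pos] = (char)('0' + (value % 10));
            value /= 10;
        }
        return pos;
    }

    // Usage: write buffer[pos] .. buffer[buffer.Length - 1] to the CSV stream;
    // no string object is ever created.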

Working with plain arrays (<Type>[] myVar = new <Type>[<Count>];) is way
faster than using collection objects, even generic ones. It's a bit more
trouble to manage, but then you can optimize the manipulation exactly to your
needs.

Throwing a lot of exceptions will have a big influence in debug mode, but
will tend to become less important when building in release mode.

With the results we got from our tests, it seems foreach is usually slower
than the normal for loop mechanism, but it is sometimes faster (for some
reason) in some situations. So I guess it's just a matter of testing both and
finding some pattern for when it's faster or slower; see the sketch below.
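
For arrays specifically, the two forms below usually compile to near-identical code (the JIT recognizes foreach over an array), which may be why the difference flips around between tests; for List<T> and other collections, foreach goes through an enumerator and tends to cost a little more. A minimal, hypothetical comparison:

    // Two equivalent scans of an array; time both in a release build,
    // outside the debugger, before drawing conclusions.
    static long SumFor(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    static long SumForeach(int[] data)
    {
        long sum = 0;
        foreach (int value in data)
            sum += value;
        return sum;
    }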

I hope it helps

ThunderMusic
 

Marc Gravell

Here we have done a bit of optimizing on a raw data export to csv file

Care to share any code / references? For instance, I do a lot of
custom serialization via IXmlSerializable, which essentially boils
down to the same thing; writing strings (representing lots of values)
to a stream... I'd be interested in practical things I can do to make
things more efficient...

Marc
 
