C# optimization

NigelW

This is really a question for the development team.

Are there plans to improve the optimization of C# to MSIL?

I ask this, as inspection with ILDASM of the MSIL code
shows that, even with the optimization switch on, there is
no inlining even of very short methods, common subexpressions
are evaluated each time they are used, and constants used in
loops are re-evaluated at each iteration. The
following code, where I hand optimized the source, runs
about twice as fast as the equivalent, original code shown
further down. I notice that the MSIL produced by the C++
compiler (with optimization on and the /clr flag set) does
have the optimizations. I'd appreciate your feedback, as I
feel the relatively slow C# code is deterring me from moving
from C++ to C#.

double total = 0.0;
double recip1 = 1.0/999999.0;

for (int rep = 0; rep < 5; rep++)
{
    total *= 0.001;

    for (long i = 0; i < 100000000; i++)
    {
        double di = i;
        total += di*recip1;
        double disc = total*total + di;
        double root = (total + disc)/(200000.0*(di + 1));
        total -= root;
    }
}

Unoptimized (original code):

double total = 0.0;

for (int rep = 0; rep < 5; rep++)
{
    total /= 1000.0;

    for (long i = 0; i < 100000000; i++)
    {
        total += i/999999.0;
        double disc = total*total + i;
        double root = (total + disc)/(200000.0*(i + 1));
        total -= root;
    }
}
 
Eric Gunnerson [MS]

You are correct that MSIL is a largely unoptimized format. The optimization
happens as part of the JIT process, and can thereby be shared by all the
compilers. It is true that the C++ compiler does do some optimizations ahead
of time, but this is largely an artifact of their compilation process, and
(to my knowledge) does not produce big performance gains over what C# and VB
can do (though I've never tested it myself).

I did some testing, and I think the reason you're seeing a difference is
because you're using a long as your loop counter. The current JIT isn't very
aggressive in optimizing operations with 64-bit types, and that shows up.

If you switch the benchmark to use 'int', the results are much, much closer.
They may be identical, but my measurements aren't precise enough to be sure.

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
 
Per Larsen

Eric,
Eric Gunnerson said:
The optimization happens as part of the JIT process, and can thereby be
shared by all the compilers.

I'm somewhat puzzled by this position, but I've seen it represented several
times by other Microsoft people as well.

I agree that if the JIT'er could perform code optimizations /
transformations of arbitrary complexity in linear time, it would make sense
to have it be responsible for all types of optimization, but in the real
world, the JIT'er is likely to always be slower than what we'd ideally want
and yet to crank out sub-optimal code. The reason being that the two goals
for the JIT'er - that it should be fast as well as produce efficient code -
are at odds with each other.

While the C# compiler (and other front-ends) shouldn't (can't) attempt any
optimizations that exploit platform idiosyncrasies, wouldn't it make sense
for front-ends to perform as much of the complex, time-consuming
optimization work as possible and thus give the JIT'er the best possible
quality of IL code to work with?

- Per
 
Jon Skeet [C# MVP]

Per Larsen said:
The optimization happens as part of the JIT process, and can thereby be
shared by all the compilers.

I'm somewhat puzzled by this position, but I've seen it represented several
times by other Microsoft people as well.

I agree that if the JIT'er could perform code optimizations /
transformations of arbitrary complexity in linear time, it would make sense
to have it be responsible for all types of optimization, but in the real
world, the JIT'er is likely to always be slower than what we'd ideally want
and yet to crank out sub-optimal code. The reason being that the two goals
for the JIT'er - that it should be fast as well as produce efficient code -
are at odds with each other.

While the C# compiler (and other front-ends) shouldn't (can't) attempt any
optimizations that exploit platform idiosyncrasies, wouldn't it make sense
for front-ends to perform as much of the complex, time-consuming
optimization work as possible and thus give the JIT'er the best possible
quality of IL code to work with?

Not really - because code which looks highly optimised in IL is often
actually harder for the JIT to work with. For instance, the C# compiler
could do some loop unrolling - but then the JIT would find it harder to
spot the loop to unroll it itself, and it probably has a far better
idea of exactly how much unrolling to do.

JITs generally work best with the simplest representation of the
program possible. Indeed, in the case of Java, as soon as the JIT came
along developers were advised *not* to use the -O (optimise) flag of
the javac compiler as it slowed things down in actual performance.

I've seen a couple of examples where the C# compiler could do better,
but those weren't so much optimisations of "natural" code to
"optimised" code so much as a simpler representation of the natural
code to start with. That's the kind of thing the C# compiler team
should be looking at, IMO.
 
Per Larsen

Jon Skeet said:
Not really - because code which looks highly optimised in IL is often
actually harder for the JIT to work with. For instance, the C# compiler
could do some loop unrolling - but then the JIT would find it harder to
spot the loop to unroll it itself, and it probably has a far better
idea of exactly how much unrolling to do.

Not a very good example, IMHO. For the reasons you mention, loop unrolling
might not belong in the front-end - at least not in all cases, but if the
front-end did unroll loops, then - for fully unrolled loops, at least -
there would be no loop for the JIT'er to even consider unrolling (as it
would already be unrolled), and so it could save the associated analysis
overhead completely.

Also, sometimes unrolling a loop can open opportunities for applying other,
global optimizations.

Jon Skeet said:
JITs generally work best with the simplest representation of the
program possible. Indeed, in the case of Java, as soon as the JIT came
along developers were advised *not* to use the -O (optimise) flag of
the javac compiler as it slowed things down in actual performance.

Hmm - I'm not sure what sort of optimizations that would be. Most
traditional front-end optimizations (dead code elimination, common
sub-expression elimination, and so on) actually make the object code
simpler, and smaller - not more complex.
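To illustrate the point (my own sketch, not code from the thread): a front-end performing common subexpression elimination rewrites the first method below into the second, and the IL for the second form is shorter and simpler, not more complex.

```csharp
using System;

class CseDemo
{
    // Before CSE: the subexpression (a + b) is evaluated twice.
    static double Before(double a, double b)
    {
        return (a + b) * (a + b);
    }

    // After CSE: the common subexpression is evaluated once into a
    // temporary, producing fewer IL instructions.
    static double After(double a, double b)
    {
        double t = a + b;
        return t * t;
    }

    static void Main()
    {
        // Both forms compute the same value.
        Console.WriteLine(Before(2.0, 3.0) == After(2.0, 3.0)); // True
    }
}
```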

- Per
 
Jon Skeet [C# MVP]

Per Larsen said:
Not a very good example, IMHO. For the reasons you mention, loop unrolling
might not belong in the front-end - at least not in all cases, but if the
front-end did unroll loops, then - for fully unrolled loops, at least -
there would be no loop for the JIT'er to even consider unrolling (as it
would already be unrolled), and so it could save the associated analysis
overhead completely.

Yes, although the JIT may know that doing that unrolling is (on that
particular processor) less performant than not unrolling at all.

Per Larsen said:
Also, sometimes unrolling a loop can open opportunities for applying other,
global optimizations.

Sure - but the JIT can do that unrolling, can't it?

Per Larsen said:
Hmm - I'm not sure what sort of optimizations that would be. Most
traditional front-end optimizations (dead code elimination, common
sub-expression elimination, and so on) actually make the object code
simpler, and smaller - not more complex.

Certainly there are optimisations which work in that way, but the JIT
can do those too. The advantage here is that only *one* optimiser needs
to be really heavily worked on, rather than one per language.

As for what optimisations work against the JIT, I don't have any
examples, but I've been assured that it's the case by people with far
more knowledge than myself.
 
Per Larsen

Jon Skeet said:
Yes, although the JIT may know that doing that unrolling is (on that
particular processor) less performant than not unrolling at all.

True.

Jon Skeet said:
Sure - but the JIT can do that unrolling, can't it?

Theoretically, but the analysis is time-consuming (relatively speaking).
I've never personally seen a loop unrolled, though I've tried to present
very suggestive loops to the JIT'er, have you? I must admit this was a while
ago - might have been in the v1.0 time-frame.

Jon Skeet said:
Certainly there are optimisations which work in that way, but the JIT
can do those too.

Again, in theory, but it doesn't - presumably because the analysis required
would slow it down too much.

Jon Skeet said:
The advantage here is that only *one* optimiser needs to be really
heavily worked on, rather than one per language.

This is indeed an advantage. It's also possible for the .NET framework to
have a built-in pre-JIT'er of sorts, which could optimize the IL in advance.

- Per
 
Jon Skeet [C# MVP]

Per Larsen said:
Theoretically, but the analysis is time-consuming (relatively speaking).
I've never personally seen a loop unrolled, though I've tried to present
very suggestive loops to the JIT'er, have you? I must admit this was a while
ago - might have been in the v1.0 time-frame.

I haven't - but then I haven't looked at the JITted code for anything,
to be honest.

Per Larsen said:
Again, in theory, but it doesn't - presumably because the analysis required
would slow it down too much.

Do you know it *definitely* doesn't do any of them? Bearing in mind
what Eric said about the version using int instead of long taking
almost exactly the same time as the C++ version with optimised IL, that
would suggest that either the JIT *is* doing common subexpression
elimination or it isn't actually as important as all that.

Per Larsen said:
This is indeed an advantage. It's also possible for the .NET framework to
have a built-in pre-JIT'er of sorts, which could optimize the IL in advance.

True.
 
Per Larsen

Jon Skeet said:
Do you know it *definitely* doesn't do any of them?

No, sorry - that didn't come out right. I was referring to dead code
elimination specifically, which it doesn't do. The C# compiler might do
common subexpression elimination - I haven't checked for that specifically,
but the last time I looked, it didn't do invariant code hoisting in
general - only for a few specific cases, like taking the length of an array
in a for loop. Dead code elimination isn't too important unless we're
talking global optimizations, which, unfortunately, are not even being
considered, AFAIK. By global optimizations, I'm referring to things like
constant propagation across method boundaries.
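By way of illustration (a hypothetical example of mine, not from the thread), cross-method constant propagation would mean the optimizer noticing that a parameter holds the same constant at every call site and folding the work away:

```csharp
using System;

class ConstPropDemo
{
    static double Scale(double x, double factor)
    {
        return x * factor;
    }

    static void Main()
    {
        // A global (interprocedural) optimizer could observe that
        // 'factor' is 2.0 at the only call site, propagate the constant
        // into Scale, and fold the whole call down to 42.0. Per's point
        // is that neither the C# compiler nor the JIT of that era
        // attempted this kind of cross-method analysis.
        Console.WriteLine(Scale(21.0, 2.0)); // 42
    }
}
```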

- Per
 
NigelW

A very interesting discussion. My experiments using
comparisons between C# and C++ (compiled to MSIL) confirm
that replacing the long with an int does bring the C#
performance up to that of C++ (which, incidentally, for the
small section of code I've used, runs virtually as fast as
MSIL under JIT as when compiled to native code).

The C++ compiler does eliminate the common subexpression
which converts the long or int to a double for use in a
floating point calculation. The C# compiler does not
eliminate it. From this I can conclude that the conversion
when using an int takes an insignificant time, unlike when
using a long.

Worryingly, when I replace (in the code of my original
message) the inline total*total by a call to a method

double Square(double x) { return x*x; }

the execution time more than doubles for C#, but is
unchanged for C++. Examination of the MSIL shows that C++
has inlined the call, C# has not. The different execution
times show that the JITer has not inlined either.

I understand the saving in effort from concentrating
optimization in the JIT, but it does not appear to do much
optimization anyway. Also, optimization is a well-researched
topic, and the C++ compiler does it very well.

Salford Software produce a Fortran for .NET compiler which
they claim achieves over 90% the performance of native
code, a speed with which I would be very happy to see for
C#. Presumably the Fortran compiler is doing lots of
optimisation.
 
NigelW

Eric
Thank you for your speedy response. As you will see from
my detailed reply further down this chain, using an int
instead of a long does bring the C# performance virtually
up to that of C++. The common subexpression converting the
int is still in the MSIL three times, unlike in the C++ version,
where it has been eliminated apart from the one necessary occurrence.
The speed gain is clearly due to the greater efficiency of
int to double over long to double conversion.

Worryingly, I notice that using a call to square a double,
instead of an inline multiplication, more than doubles the
C# execution time (please see my message further down this
chain for details). The C++ time is unchanged - inspection
of C++'s MSIL shows the call has been inlined; the C# call
remains out of line.

Thank you for your interest and help.

Nigel
 
Jon Skeet [C# MVP]

NigelW said:
Worryingly, when I replace (in the code of my original
message) the inline total*total by a call to a method

double Square(double x) { return x*x; }

the execution time more than doubles for C#, but is
unchanged for C++.

Could you post your complete benchmarking code? I don't see that occur
at all.

Here's my benchmarking code:

using System;

public class Test
{
    static void Main(string[] args)
    {
        DateTime start = DateTime.Now;
        double total = 0.0;

        for (int rep = 0; rep < 5; rep++)
        {
            total /= 1000.0;

            for (int i = 0; i < 100000000; i++)
            {
                total += i/999999.0;
                double disc = Square(total) + i;
                double root = (total + disc)/(200000.0*(i + 1));
                total -= root;
            }
        }
        DateTime end = DateTime.Now;

        Console.WriteLine (end-start);
    }

    static double Square (double x)
    {
        return x*x;
    }
}

My *guess* is that your benchmarking code is within a form - and I
don't believe the JITter will inline methods within MarshalByRef
classes, including forms. That's arguably a problem, but it's probably
*less* of a problem as I don't believe that forms should often include
number-crunching code. It certainly shows that the JIT *can* inline
methods appropriately. (Just making the Test class, which is never
instantiated, inherit from System.Windows.Forms.Form doubles the
execution time on my box.)
 
mikeb

Jon said:
Could you post your complete benchmarking code? I don't see that occur
at all.

Here's my benchmarking code:

... snip ...
My *guess* is that your benchmarking code is within a form - and I
don't believe the JITter will inline methods within MarshalByRef
classes, including forms. That's arguably a problem, but it's probably
*less* of a problem as I don't believe that forms should often include
number-crunching code. It certainly shows that the JIT *can* inline
methods appropriately. (Just making the Test class, which is never
instantiated, inherit from System.Windows.Forms.Form doubles the
execution time on my box.)

Also, Nigel needs to make sure not to run benchmarks from within VS.NET
(if that's what's being used). When .NET code is run from VS.NET, the
JITter does not perform optimizations, even if the assembly was built
with optimizations turned on and is not marked for debugging. I assume
this is to allow somewhat easier debugging of release builds.
 
Jon Skeet [C# MVP]

mikeb said:
Also, Nigel needs to make sure not to run benchmarks from within VS.NET
(if that's what's being used). When .NET code is run from VS.NET, the
JITter does not perform optimizations, even if the assembly was built
with optimizations turned on and is not marked for debugging. I assume
this is to allow somewhat easier debugging of release builds.

I would imagine it would work fine when run from VS.NET with "Start
without debugging", wouldn't it?
 
Peter Koen

Jon Skeet said:
I would imagine it would work fine when run from VS.NET with "Start
without debugging", wouldn't it?

No, even when started without debugging, the VS.NET Debugger is attached to
the process and certain JIT Optimizations are turned off.
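A simple safeguard (my sketch; `System.Diagnostics.Debugger.IsAttached` reports whether a managed debugger is attached to the process) is to have the benchmark refuse to report timings when a debugger is present:

```csharp
using System;
using System.Diagnostics;

class BenchmarkGuard
{
    static void Main()
    {
        // If a managed debugger is attached, the JIT may have produced
        // unoptimized code, so any timings would be misleading.
        if (Debugger.IsAttached)
        {
            Console.WriteLine("Debugger attached - timings unreliable.");
            return;
        }
        Console.WriteLine("No debugger attached - benchmark is meaningful.");
    }
}
```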

--
------ooo---OOO---ooo------

Peter Koen - www.kema.at
MCAD MCDBA MCT
CAI/RS CASE/RS IAT

------ooo---OOO---ooo------
 
Per Larsen

Peter Koen said:
No, even when started without debugging, the VS.NET Debugger is attached to
the process and certain JIT Optimizations are turned off.

Not a big deal for me, as I always test outside VS.NET, but I'm curious: Is
this documented anywhere (what specifically is being disabled, and when
exactly)?

- Per
 
mikeb

Jon said:
I would imagine it would work fine when run from VS.NET with "Start
without debugging", wouldn't it?

Well, I thought that it disabled JIT optimization regardless. I'm sure
I got that bit of lore from the microsoft.public.dotnet.* newsgroups
somewhere, sometime ago.

But at your prompting, I actually tested it, and it seems I'm wrong. If
you run a .NET application inside VS.NET (using Ctrl+F5 or the "Start
without Debugging" menu option) the JIT does perform optimizations,
which is what you would intuitively expect.
 
NigelW

Thank you.
I did have my benchmark code in a form. Now that it's in a
separate class, it runs as fast with the call to Square()
as it does with the manually inlined calculation of the square
(and as fast as C++).
I would not normally put such code in a form, but it was a
quick way to run a benchmark; it did not occur to me that
optimization is dependent on the code not being in a
certain class. Is this documented anywhere?
I wonder whether the pre-jitter NGEN is similarly
selective? Are there other classes for which code in their
derivatives is not optimized?
As the jitter does no/less optimization for code in forms,
it might provide a means of finding out what optimisations
it does, by comparing the speed of code when it's outside a
form with when it's inside - I'll try to find time to
investigate over the next few days.
BTW, I was aware of the effect of running in/outside
Visual Studio - I do all timing outside.

Thank you for solving a puzzle for me.

Nigel





 
Jon Skeet [C# MVP]

NigelW said:
I did have my benchmark code in a form. Now that it's in a
separate class, it runs as fast with the call to Square()
as it does with manually inlined calculation of the square
(and as fast as C++).
I would not normally put such code in a form, but it was a
quick way to run a benchmark;

Just out of interest (and this is straying off topic, but it's
something that interests me) why do you think that is? I've always
found it *much* quicker to write a small console project to do
something than a GUI project. You don't even need to fire up VS.NET to
write a console project quickly, but even if you *do*, there's no
mucking around putting a button in or whatever - you just put the code
in Main and run it.

I'm not having a go at you personally - I'm just wondering why everyone
else seems to find it quicker to write short GUI programs than short
console programs. I see it pretty much every day, and my natural
inclination when everyone else does something differently to me is to
ask why - it usually means I'm missing something. Am I in this case?

NigelW said:
it did not occur to me that
optimization is dependent on the code not being in a
certain class. Is this documented anywhere?

I don't know - I only ran across it a while ago in another newsgroup.

NigelW said:
I wonder whether the pre-jitter NGEN is similarly
selective? Are there other classes for which code in their
derivatives is not optimized?
As the jitter does no/less optimization for code in forms,
it might provide a means of finding out what optimisations
it does, by comparing the speed of code when outside a
form with when its inside - I'll try to find time to
investigate over the next few days.

It's not just forms. It's any class which derives from
MarshalByRefObject. I suspect there are some good reasons somewhere,
but hopefully they'll have been ironed out by the time Whidbey comes
out. I don't know whether or not there are other optimisations it
doesn't perform as well as inlining, by the way.
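A minimal way to see the effect for yourself (my own sketch based on Jon's description; the timing outcome is what he reports, not a guarantee) is to time the same arithmetic against each of these two classes:

```csharp
using System;

// Calls to this method are candidates for JIT inlining.
class Plain
{
    public static double Square(double x) { return x * x; }
}

// Per Jon's observation, the JIT of that era declined to inline
// methods declared on classes deriving from MarshalByRefObject
// (which includes System.Windows.Forms.Form), even when, as here,
// the class is never instantiated.
class Remotable : MarshalByRefObject
{
    public static double Square(double x) { return x * x; }
}

class InliningDemo
{
    static void Main()
    {
        // Same arithmetic either way; only inlining behaviour differs,
        // which shows up in timings, not in the result.
        Console.WriteLine(Plain.Square(3.0) == Remotable.Square(3.0)); // True
    }
}
```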
 
Frank Oquendo

Jon said:
I'm not having a go at you personally - I'm just wondering why
everyone else seems to find it quicker to write short GUI programs
than short console programs. I see it pretty much every day, and my
natural inclination when everyone else does something differently to
me is to ask why - it usually means I'm missing something. Am I in
this case?

Well if you're missing something, so am I. I always start my test cases
as console applications.

--
There are 10 kinds of people. Those who understand binary and those who
don't.

http://code.acadx.com
(Pull the pin to reply)
 
