Lacking for-loop optimization in C#

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I stumbled over an optimization (or lack of one, to be specific) when viewing
IL opcodes generated by the compiler using ms .net 2003. I was testing fast
pixel manipulation using Bitmap.LockBits and unsafe pointers to iterate over
an image's pixels. The inner-loop looked like this:

for (int x = 0; x < _buffer.Width*_buffer.Height; x++)
{
// do manipulations here..
}

I noticed how the for-loop performed extremely slow, and ildasm revealed
that the "x < _buffer.Width*_buffer.Height" was computed at each iteration,
resulting in two virtual calls and one mul. The use of /optimize did not
improve the matter, so the solution was to calculate
_buffer.Width*_buffer.Height outside of the for-loop. I was under the
impression that the compiler performed hoisting automatically, am I missing
something here?
 
If the compiler can't guarantee that the value of
_buffer.Width*_buffer.Height changes during any loop iteration, then it'll
have to be calculated each time. That is the nature of a C/C++/C# for-loop.
It's quite common to have for-loops where you want this to occur. (note that
VB never recalculates the ending condition)

David Anton
www.tangiblesoftwaresolutions.com
Home of the Instant C# VB.NET to C# converter
and the Instant VB C# to VB.NET converter
 
Hi MariusI,
for (int x = 0; x < _buffer.Width*_buffer.Height; x++)
{
// do manipulations here..
}

I noticed how the for-loop performed extremely slow, and ildasm revealed
that the "x < _buffer.Width*_buffer.Height" was computed at each iteration,

I think that it is working as it should be.
How do you say about that code:

for (int x = 0; x < _buffer.Width*_buffer.Height; x++)
{
// do manipulations here..
if( _buffer.Height>1 ) {
_buffer.Height=_buffer.Height-1;
}
}

Of course there is no sense to shrink a "_buffer" but
the loop expression should work in this case too.
Why the compiler have to guess what you have in mind?

Regards
Marcin
 
Thanks for the reply. When you say "can't guarantee" do you mean that a
multithreaded system could alter the _buffer size and width? (There is no
code inside the for-loop which refers to the buffer).
 
Not knowing the contents of your loop, I assumed that the compiler may have
determined that _buffer may possibly be changed.

If there is no way that _buffer could have been altered during loop
execution, then this is indeed probably a shortcoming of the compiler.
 
Did it actually get any faster when you hoisted the code out of the loop?

I ask, because in general, these kinds of optimizations are done by the JIT
compiler, not the C# compiler. One reason for that is this enables all
languages to take advantage of these kinds of optimization, without every
single compiler having to repeat the same work all the other compilers do.

Most .NET compilers do very little optimization for this reason.

So I'm wondering if this really is slow for the reason you think it is. The
actual compiled code that runs at run time doesn't necessarily have the same
structure as the IL shown by ILDASM. It often doesn't. Indeed there are
certain cases where it definitely will hoist invariants outside of loops at
JIT compile time.

But if the change really did improve things, then it's probably, as others
have suggested, either because the compiler was unable to be absolutely
certain that the values wouldn't change (e.g. because you are calling
virtual accessors), or because the JIT compiler failed to optimize in this
case.
 
MariusI said:
I stumbled over an optimization (or lack of one, to be specific) when
viewing
IL opcodes generated by the compiler using ms .net 2003. I was testing
fast
pixel manipulation using Bitmap.LockBits and unsafe pointers to iterate
over
an image's pixels. The inner-loop looked like this:

for (int x = 0; x < _buffer.Width*_buffer.Height; x++)
{
// do manipulations here..
}

I noticed how the for-loop performed extremely slow, and ildasm revealed
that the "x < _buffer.Width*_buffer.Height" was computed at each
iteration,
resulting in two virtual calls and one mul. The use of /optimize did not
improve the matter, so the solution was to calculate
_buffer.Width*_buffer.Height outside of the for-loop. I was under the
impression that the compiler performed hoisting automatically, am I
missing
something here?

It's not the C# compiler that performs hoisting, it's the JIT compilers that
does a limited form of hoisting but this depends on what's been done in the
loop. Mind to share the code in the loop?

Willy.
 
Yes, that is a good point, but as long as there is no code within the
for-loop which refers to the buffer, the compiler should optimize away the
buffer.Width*_buffer.Height calculation (move it outside) at compile time. In
my case there was no code inside the for-loop refering to the buffer which
mean that the compiler should have moved the calculation outside.
 
To be honest there isn't any, but i found an old graphical effect which i had
coded which suffered from the same problems. I hoisted all for-loop variables
and the performance increase was truly impressive. I'll give you the code
whole code for the class, just cut'n paste :-)

// --- Field.cs

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Threading;

namespace Field
{
public unsafe sealed class Field : System.Windows.Forms.Form
{
private const int MAXAMP=20, MINAMP=5;

private System.ComponentModel.Container components = null;
private Graphics _hDC;
private Bitmap _buffer;
private int _numPixelsInbuffer;
private int _bufferWidth, _bufferHeight;
private uint [] _palette;
private int [] _hBaseMap, _hRunnerMap;
private Thread _runner;
private bool _done;

public Field()
{
InitializeComponent();
this.SetStyle(ControlStyles.AllPaintingInWmPaint | ControlStyles.UserPaint
| ControlStyles.Opaque, true);
}

/// <summary>
/// Clean up any resources being used.
/// </summary>
protected override void Dispose( bool disposing )
{
if( disposing )
{
if(components != null)
{
components.Dispose();
}
}
base.Dispose( disposing );
}

/// <summary>
/// Generates a heightmap
/// </summary>
/// <param name="map">Map to generate</param>
private void generateHeightMap(ref int [] map, int minamp, int maxamp)
{
Random r = new Random();
int [] xBaseSine = new int[_bufferWidth];

double freqX = r.Next((_bufferWidth-10)/5)+10;
int amplitudeX = r.Next(maxamp-minamp)+minamp;
double freqY = r.Next((_bufferHeight-10)/5)+10;
int amplitudeY = r.Next(maxamp-minamp)+minamp;

fixed (int* pMap = map, pBase = xBaseSine)
{
int* pM = pMap;
int* pB = pBase;

for (int x = 0; x < _bufferWidth; x++ )
{
*pB = (int)(Math.Sin(x/freqX)*amplitudeX);
pB++;
}

for ( int y = 0; y < _bufferHeight; y++ )
{
pB = pBase;
int yBaseSine = (int)(Math.Sin(y/freqY)*amplitudeY);
for ( int x = 0; x < _bufferWidth; x++ )
{
*pM = yBaseSine+*pB;
pM++; pB++;
}
}
}

}

/// <summary>
/// Initializes our palette with shades of blue
/// </summary>
private uint[] initializePalette(int size)
{
uint[] palette = new uint[size];
double blueDecrPrStep = 255f/size;
for (int i = 0; i < size; i++)
palette = 0xFF000000 | (byte)(255f-i*blueDecrPrStep);

return palette;
}

private void run()
{
const int FADE=1, ZEROPOINT = 2*MAXAMP;

bool regenerate = true;
int[] tmpArrayPointer;

_palette = initializePalette(4*MAXAMP);
_hRunnerMap = new int[_numPixelsInbuffer];
_hBaseMap = new int[_numPixelsInbuffer];

generateHeightMap(ref _hRunnerMap, MINAMP, MAXAMP);

while(!_done)
{
if (regenerate)
{
tmpArrayPointer = _hBaseMap;
_hBaseMap = _hRunnerMap;
_hRunnerMap = tmpArrayPointer;
generateHeightMap(ref _hRunnerMap, MINAMP, MAXAMP);
}
regenerate = true;

fixed (int* pBaseMap = _hBaseMap, pRMap = _hRunnerMap)
{
int* pB = pBaseMap; // phMap is read only
int* pR = pRMap; // prHMap is read only

BitmapData bData = _buffer.LockBits(new Rectangle(0, 0, _buffer.Width,
_buffer.Height),
ImageLockMode.WriteOnly, _buffer.PixelFormat);

uint* pPixel = (uint*)bData.Scan0.ToPointer();

for ( int i = 0; i < _numPixelsInbuffer; i++ )
{
*pPixel = _palette[*pB+ZEROPOINT];
++pPixel;

if (*pR > *pB)
{
*pB += FADE;
regenerate = false;
}
else if (*pR < *pB)
{
*pB -= FADE;
regenerate = false;
}
pB++;
pR++;
}

_buffer.UnlockBits(bData);
}

_hDC.DrawImage(_buffer,0,0);
}
}

#region Windows Form Designer generated code
/// <summary>
/// Required method for Designer support - do not modify
/// the contents of this method with the code editor.
/// </summary>
private void InitializeComponent()
{
//
// Field
//
this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
this.ClientSize = new System.Drawing.Size(492, 371);
this.Name = "Field";
this.Text = "Field";
this.Closing += new
System.ComponentModel.CancelEventHandler(this.Field_Closing);
this.Load += new System.EventHandler(this.Field_Load);

}
#endregion

private void Field_Load(object sender, System.EventArgs e)
{
_hDC = this.CreateGraphics();
_buffer = new Bitmap(this.ClientSize.Width, this.ClientSize.Height, _hDC);
_numPixelsInbuffer = _buffer.Width*_buffer.Height;
_bufferWidth = _buffer.Width;
_bufferHeight = _buffer.Height;
_runner = new Thread(new ThreadStart(this.run));
_runner.Start();
}

private void Field_Closing(object sender,
System.ComponentModel.CancelEventArgs e)
{
_done = true;

// wait for thread to end naturally
while (_runner.IsAlive) {}
}
}


}
 
MariusI said:
To be honest there isn't any, but i found an old graphical effect which i had
coded which suffered from the same problems. I hoisted all for-loop variables
and the performance increase was truly impressive. I'll give you the code
whole code for the class, just cut'n paste :-)

Ah, I see these are member variables. I suspect that the compiler is
indeed concerned (not unreasonably) about the possibility of them being
modified in a
multi-threaded environment.
 
Or, more likely, the compiler has no way of knowing what might cause
the .Width and .Height properties to change, so it did the conservative
thing and left the property fetches in the loop. If you were to change
it to this:

int wid = _buffer.Width;
int high = _buffer.Height;
for (int i = 0; i < wid * high; i++)
{
}

I would be very surprised if the compiler did not optimize and do the
multiply only once. The compiler should know that the _buffer reference
isn't changing, but what it can't know is whether something inside the
loop might not affect the values that _buffer chooses to return for
Width and Height.
 
Back
Top