Slow For-Loop (Picture Data Access)

H

hufaunder

I have 16-bit data that I want to display. In order to display it I
compress a certain range of the input data into 8 bit (I need control
over this). All seems to work ok except that it is dead slow both in
release and debug mode. The main problem is the for-loop shown in the
code bellow. Without the for-loop I can get somewhere around 10-20
updates/second. With the for-loop I get an update every 2-3 seconds.
The strange thing is that when I measure the actual time (as shown
bellow) I get values of 400-600ms. That is still VERY slow but what I
observe is even another 5x slower. I don't know where that comes from.

Any idea on how to very significantly improve this or any other
completely different solution is appreciated.

Thanks

Single scale = ...;
bmpData = image.LockBits(rec, ImageLockMode.WriteOnly,
image.PixelFormat);
ptr = bmpData.Scan0;
// DateTime time;
for (i = 0; i < image.Width*image.Height; i++) {
Int32 value;

value = (Int32)((data-l)*scale);
if(value < 0) value = 0;
else if(value > 255) value = 255;
rgbValues[i * 3 + 0] = (Byte)value;
rgbValues[i * 3 + 1] = (Byte)value;
rgbValues[i * 3 + 2] = (Byte)value;
}
//
MessageBox.Show(((TimeSpan)DateTime.Now.Subtract(date)).Milliseconds.ToString());

System.Runtime.InteropServices.Marshal.Copy(rgbValues, 0,
ptr, numberBytes);
image.UnlockBits(bmpData);
 
G

Greg Young

yes ... a good start would be removing the if() from the loop

if(value < 0) value = 0;
else if(value > 255) value = 255;

I would imagine you could do this with a map?

your array data-l ... if you disassemble the code are you getting an
array bounds check?

also

rgbValues[i * 3 + 0] = (Byte)value;
rgbValues[i * 3 + 1] = (Byte)value;
rgbValues[i * 3 + 2] = (Byte)value;

you should be able to do this with 1 write treat rgbvalues as an uint*
instead of a byte array (then add 3 to a byte pointer (cast back to a uint
etc)

Cheers,

Greg
 
B

Bruce Wood

I have 16-bit data that I want to display. In order to display it I
compress a certain range of the input data into 8 bit (I need control
over this). All seems to work ok except that it is dead slow both in
release and debug mode. The main problem is the for-loop shown in the
code bellow. Without the for-loop I can get somewhere around 10-20
updates/second. With the for-loop I get an update every 2-3 seconds.
The strange thing is that when I measure the actual time (as shown
bellow) I get values of 400-600ms. That is still VERY slow but what I
observe is even another 5x slower. I don't know where that comes from.

Any idea on how to very significantly improve this or any other
completely different solution is appreciated.

Thanks

Single scale = ...;
bmpData = image.LockBits(rec, ImageLockMode.WriteOnly,
image.PixelFormat);
ptr = bmpData.Scan0;
// DateTime time;
for (i = 0; i < image.Width*image.Height; i++) {
Int32 value;

value = (Int32)((data-l)*scale);
if(value < 0) value = 0;
else if(value > 255) value = 255;
rgbValues[i * 3 + 0] = (Byte)value;
rgbValues[i * 3 + 1] = (Byte)value;
rgbValues[i * 3 + 2] = (Byte)value;
}
//
MessageBox.Show(((TimeSpan)DateTime.Now.Subtract(date)).Milliseconds.ToString());

System.Runtime.InteropServices.Marshal.Copy(rgbValues, 0,
ptr, numberBytes);
image.UnlockBits(bmpData);


What is the type of "rgbValues"?
 
H

hufaunder

Greg,

This is a great suggestion. What I am going to try is setting up a
table with 2^16 entries and then just do a lookup for each value. Also
in order to make things easier I will use RGBA rather then RGB as that
aligns to 4 bytes. To be honest, from my experience I am not that
confident that this will give a 30-60x improvement but it's worth a
try. Honestly, I relatively simple loop of around 200k should in no
circimstances take 2-3 seconds. Maybe there is some type miss-match
that takes up all this time.

Thanks



Greg said:
yes ... a good start would be removing the if() from the loop

if(value < 0) value = 0;
else if(value > 255) value = 255;

I would imagine you could do this with a map?

your array data-l ... if you disassemble the code are you getting an
array bounds check?

also

rgbValues[i * 3 + 0] = (Byte)value;
rgbValues[i * 3 + 1] = (Byte)value;
rgbValues[i * 3 + 2] = (Byte)value;

you should be able to do this with 1 write treat rgbvalues as an uint*
instead of a byte array (then add 3 to a byte pointer (cast back to a uint
etc)

Cheers,

Greg
I have 16-bit data that I want to display. In order to display it I
compress a certain range of the input data into 8 bit (I need control
over this). All seems to work ok except that it is dead slow both in
release and debug mode. The main problem is the for-loop shown in the
code bellow. Without the for-loop I can get somewhere around 10-20
updates/second. With the for-loop I get an update every 2-3 seconds.
The strange thing is that when I measure the actual time (as shown
bellow) I get values of 400-600ms. That is still VERY slow but what I
observe is even another 5x slower. I don't know where that comes from.

Any idea on how to very significantly improve this or any other
completely different solution is appreciated.

Thanks

Single scale = ...;
bmpData = image.LockBits(rec, ImageLockMode.WriteOnly,
image.PixelFormat);
ptr = bmpData.Scan0;
// DateTime time;
for (i = 0; i < image.Width*image.Height; i++) {
Int32 value;

value = (Int32)((data-l)*scale);
if(value < 0) value = 0;
else if(value > 255) value = 255;
rgbValues[i * 3 + 0] = (Byte)value;
rgbValues[i * 3 + 1] = (Byte)value;
rgbValues[i * 3 + 2] = (Byte)value;
}
//
MessageBox.Show(((TimeSpan)DateTime.Now.Subtract(date)).Milliseconds.ToString());

System.Runtime.InteropServices.Marshal.Copy(rgbValues, 0,
ptr, numberBytes);
image.UnlockBits(bmpData);
 
J

Jon Skeet [C# MVP]

I have 16-bit data that I want to display. In order to display it I
compress a certain range of the input data into 8 bit (I need control
over this). All seems to work ok except that it is dead slow both in
release and debug mode. The main problem is the for-loop shown in the
code bellow. Without the for-loop I can get somewhere around 10-20
updates/second. With the for-loop I get an update every 2-3 seconds.
The strange thing is that when I measure the actual time (as shown
bellow) I get values of 400-600ms. That is still VERY slow but what I
observe is even another 5x slower. I don't know where that comes from.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

As you say, it does sound very slow...
 
H

hufaunder

Jon,

Thanks for the reply. I was going to create some simple test case
tomorrow. I'll send the results within the next 24h.
 
M

Markus

instead of this line:
for (i = 0; i < image.Width*image.Height; i++) {

try to cache the image width/height, as these calls are very slow:

int w = image.Width;
int h = image.Height;
int wh = w*h;
for (i = 0; i < wh; i++) {
//[...]
}

hth
Markus
 
G

Greg Young

Ignore this advice .. The properties get inlined providing you are running
with JIT optimizations (btw you ARE running with JIT optimizations (i.e.
without debugging by default) right?) ... Please see
http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx for
much further explanation. Manually hoisting this may not be a bad idea
though I would have to look at the assembly generated.

Cheers,

Greg

Markus said:
instead of this line:
for (i = 0; i < image.Width*image.Height; i++) {

try to cache the image width/height, as these calls are very slow:

int w = image.Width;
int h = image.Height;
int wh = w*h;
for (i = 0; i < wh; i++) {
//[...]
}

hth
Markus
 
J

Jon Skeet [C# MVP]

Greg Young said:
Ignore this advice .. The properties get inlined providing you are running
with JIT optimizations (btw you ARE running with JIT optimizations (i.e.
without debugging by default) right?) ... Please see
http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx for
much further explanation. Manually hoisting this may not be a bad idea
though I would have to look at the assembly generated.

Inlining has nothing to do with the calls being slow. Even if the
properties are inlined, that won't stop the GDI+ call (which I believe
is being made) any faster.

I certainly didn't infer from Markus' post that it was the act of
calling the property method that was slow - I understood it to be that
the calculation of those properties is slow.

It's certainly worth *trying* that advice rather than blindly accepting
it - but I wouldn't *ignore* it either.
 
H

hufaunder

Markus,

Thanks a lot for that suggestion. Actually, after reading this I
remembered that I had a similar issue before and taking the
multiplication out of the loop header did help a lot. At that time it
took me a while to figure this out because the loop did a lot of stuff
and I just didn't think that this multiplication could have been the
reason. Have to upgrade my brain storage retrival...

When writing assembly code I do these kind of optimizations. But
today's compilers are very sophisticated regarding optimizations so the
trival optimizations I leave to the compiler. Well, maybe I shouldn't.
I am extremly astonished that the C# compiler can't do this simple
optimization itself. Note that release is slow, too, not just the debug
version.

Bellow is the disasembled code (debug version). It indeed seems that
getting the dimensions from the image is the problem.

Thanks

FAST:
for (i = 0; i < numberPixels; i++)
00000251 xor edx,edx
00000253 mov dword ptr [ebp-58h],edx
00000256 nop
00000257 jmp 0000033C
....
00000339 inc dword ptr [ebp-58h]
0000033c mov eax,dword ptr [ebp-58h]
0000033f cmp eax,dword ptr [ebp-5Ch]
00000342 setl al
00000345 movzx eax,al
00000348 mov dword ptr [ebp-70h],eax
0000034b cmp dword ptr [ebp-70h],0
0000034f jne 0000025C

About 100x (rough estimate) slower then above:
for (i = 0; i < image.Width * image.Height; i++)
00000251 xor edx,edx
00000253 mov dword ptr [ebp-58h],edx
00000256 nop
00000257 jmp 0000033C
....
00000339 inc dword ptr [ebp-58h]
0000033c mov ebx,dword ptr [ebp-58h]
0000033f mov eax,dword ptr [ebp-3Ch]
00000342 mov ecx,dword ptr [eax+00000144h]
00000348 cmp dword ptr [ecx],ecx
0000034a call 7A25FEA0
0000034f mov esi,eax
00000351 mov eax,dword ptr [ebp-3Ch]
00000354 mov ecx,dword ptr [eax+00000144h]
0000035a cmp dword ptr [ecx],ecx
0000035c call 7A25FEB4
00000361 mov edi,eax
00000363 imul edi,esi
00000366 cmp ebx,edi
00000368 setl al
0000036b movzx eax,al
0000036e mov dword ptr [ebp-70h],eax
00000371 cmp dword ptr [ebp-70h],0
00000375 jne 0000025C

instead of this line:
for (i = 0; i < image.Width*image.Height; i++) {

try to cache the image width/height, as these calls are very slow:

int w = image.Width;
int h = image.Height;
int wh = w*h;
for (i = 0; i < wh; i++) {
//[...]
}

hth
Markus
 
G

Greg Young

I figured they were managed in which case it makes a _huge_ difference ...

Providing they were being inlined (just kept in a register) you would hoist
due to the multiplication not due to the accesses.

Given I was still saying to hoist (just for a different reason).
 
J

Jon Skeet [C# MVP]

Thanks a lot for that suggestion. Actually, after reading this I
remembered that I had a similar issue before and taking the
multiplication out of the loop header did help a lot. At that time it
took me a while to figure this out because the loop did a lot of stuff
and I just didn't think that this multiplication could have been the
reason. Have to upgrade my brain storage retrival...

When writing assembly code I do these kind of optimizations. But
today's compilers are very sophisticated regarding optimizations so the
trival optimizations I leave to the compiler. Well, maybe I shouldn't.
I am extremly astonished that the C# compiler can't do this simple
optimization itself. Note that release is slow, too, not just the debug
version.

The C# compiler can't possibly hoist the property accesses - how is it
supposed to know that the property values don't change between
iterations of the loop?
 
G

Greg Young

First of all the C# compiler doesn't do anything like this .. the JIT
compiler does.

Jon the JIT compiler DOES hoist property accesses if they are inlined (and
simple variable accesses).. this is what I have been on about.

Go disassemble the volatility example on your website .. the reason it
doesn't work in your starting case in release mode is because the JIT
hoisted it out of the loop and stores the value in a register.

Cheers,

Greg
 
H

hufaunder

Jon said:
The C# compiler can't possibly hoist the property accesses - how is it
supposed to know that the property values don't change between
iterations of the loop?

The compiler can check if the loop somewhere does modify the value. If
there are function calls inside the loop it gets a bit more complicated
but it still can be done. The only problem is if the value does get
changed by some other thread. While I didn't give it any deep thought I
don't see a scenario where that could be safe. In C you would have to
specify that certain values can change from the outside (typically a HW
register but in the above case it's the other thread). If it's not
specified the program will assume it does not change and can make
appropriate optimizations.
 
B

Barry Kelly

The compiler can check if the loop somewhere does modify the value.
If there are function calls inside the loop it gets a bit more complicated
but it still can be done. The only problem is if the value does get
changed by some other thread. While I didn't give it any deep thought I
don't see a scenario where that could be safe.

Properties aren't variables - they are syntax sugar for functions.
Reading a property is calling a function. Even C doesn't typically elide
extra function calls, because C functions generally aren't idempotent.
The CLR JIT, after inlining, may see that the function (i.e. the inlined
expression) is idempotent and thus perform invariant code motion,
hoisting the calculation out of the loop, but it can't in this case
because int System.Drawing.Image.get_Width() etc. calls P/Invoke methods
in unmanaged code (GDI+, in this case).

-- Barry
 
G

Greg Young

2 words ... volatile keyword

The volatile keyword will prevent the optimization that hoists the read and
checks a register forcing a read from memory on every iteration.

Cheers,

Greg
 
G

Greg Young

Barry you are correct .. I just got a bit ahead of myself thinking that the
image class actually followed the framework design guidelines and would have
put this as a method.

Although I have to admit it is kind of a grey area ... the guidelines state
that if it is something that would take an order of magnitude more time than
a simple field access it should be placed in a method.

At the same time they state that anything that represents a logical
attribute of the type.

Here is a case where its attributish but its also an order of magnitude
slower than a field access due to the overhead of interop.

I will fall back on the fact that the language is specified as "consider"
for logical attributes and "do" for the order of magnitude rule and say that
I would not consider this to be conforming to those rules.;
 
M

Markus

Thanks a lot for that suggestion. Actually, after reading this I
remembered that I had a similar issue before and taking the
multiplication out of the loop header did help a lot. At that time it
took me a while to figure this out because the loop did a lot of
stuff and I just didn't think that this multiplication could have
been the reason. Have to upgrade my brain storage retrival...

Good, that I could help you ;-)

Markus
 
J

Jon Skeet [C# MVP]

Greg Young said:
First of all the C# compiler doesn't do anything like this .. the JIT
compiler does.

Jon the JIT compiler DOES hoist property accesses if they are inlined (and
simple variable accesses).. this is what I have been on about.

Note the "and simple variable accesses". That's not the case here.
(It's also only the case if the JIT can determine that the values of
those variables won't change during the loop.)
Go disassemble the volatility example on your website .. the reason it
doesn't work in your starting case in release mode is because the JIT
hoisted it out of the loop and stores the value in a register.

Yes, I'm aware of what the JIT can do in simple cases. Calculating the
image height and width isn't such a simple case (with the current
implementation).
 
B

Barry Kelly

Greg said:
Barry you are correct .. I just got a bit ahead of myself thinking that the
image class actually followed the framework design guidelines and would have
put this as a method.

Although I have to admit it is kind of a grey area ... the guidelines state
that if it is something that would take an order of magnitude more time than
a simple field access it should be placed in a method.

At the same time they state that anything that represents a logical
attribute of the type.

Here is a case where its attributish but its also an order of magnitude
slower than a field access due to the overhead of interop.

I will fall back on the fact that the language is specified as "consider"
for logical attributes and "do" for the order of magnitude rule and say that
I would not consider this to be conforming to those rules.;

I think common sense and usability has to come into play too. Lots of
properties are backed by unmanaged resources. Control.Text if it's not
cached, Control.ModifierKeys, TreeNode.Bounds, most of the interesting
properties on WebBrowser, etc. Check out the work FileInfo.Directory
does. If anything, the problem with Image is that it doesn't cache the
width and height, and have some kind of internal observer set up with
the GDI+ side to refresh them if necessary. There's a risk that you end
up with the Java-itis of 'GetFoo().GetBar().GetBaz()' versus
'Foo.Bar.Baz'.

In many ways, I feel that if you're doing something based on
Width*Height of an image, you're in a state of sin anyway - that's going
to be in the millions for many common images. Most usable image
manipulation code I've seen works with the scanlines directly, often in
assembler tuned for the processor where possible.

-- Barry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top