Unwinding a loop: why can't .NET add 1 + 2 + 3 ... ?

M

Mountain Bikn' Guy

It turns out that there is indeed a simple solution (at least for the first
half of the problem) and that .NET can indeed add correctly ;)

In the statements that generate the brute force sum code (the line 1 + 2 +
.... + n), all I did was add this:
if ((i % 200) == 0)
{
wrtr.WriteLine();
}
This breaks the lines of code into shorter lines. Now the reflection
invocation produces the correct result too.

I find it strange that my csc log file (compiler.out) did not issue any
warning about the lines being too long -- and that the program compiled and
ran and produced incorrect results -- all with no warning.

(I also find it frustrating that Jesse Liberty didn't check the code in his
book more carefully... but I'll withhold judgement until after I've written
a book or two myself and experienced the difficulty I imagine exists in
getting all the examples correct.)

However, I am happy that this was not a serious issue... but there is still
a second part to this puzzle.

1. In my second example (see first post in this thread), that "optimized"
code is always about 5 times slower than the normal looping code.
2. In the second example, the lines are not too long, so we don't have the
problem we had in this first example. However, if the number of rows in that
approach brute force are increased above about 500, then that code refuses
to run via reflection. If anyone is interested, I'll post the rest of that
code too.

Dave

Mountain Bikn' Guy said:
I pasted the complete program to a message in this thread just above. The
example comes from Jesse Liberty's Programming C#. The main issue/problem
seems to be reflection (which is my main interest in working with this type
of code anyway.) All I did was debug his example code so it would compile
and run, and then I added a method to check the results. The method I added
just contains the code Jesse was producing at runtime. I copied the result
into the main file and I run it without reflection. Under those conditions
it produces the correct result. When invoked via reflection it produces a
vastly incorrect result. I'm beginning to think there is an obvious solution
to this. In fact, I'm going to try putting line breaks into the generated
code to shorten the line length... I'll post the results.
 
S

Santiago

A couple of ideas:
1) From the screenshot, I noticed "Exit Code: 1". It's not obvious where
it's coming from but if it's coming from csc.exe that means your program
didn't compile and you are running an old version. This would lead me to
believe the following is happening: The last program that compiled fine
correctly yielded a result of 5050, then you added more +'s to your
statement and the program stopped compiling. Check compile.out.
2) You are using a /optimize+ switch. Could that switch be mangling the code
and everyone else in the ns is testing without this switch?

- Santiago
 
M

Mountain Bikn' Guy

Hi Santiago,
Thanks for your message. I found the solution. It's posted at the end of
this thread. The only thing I didn't figure out yet is why compile.out
didn't contain any error messages.
Dave
 
M

Mountain Bikn' Guy

Good insight -- that was exactly the problem (but it was covered up because
I was using reflection techniques to generate/compile/load and run the
code). See my last message at the end of this thread.
Dave

memememe said:
maybe there is some sort of limit on the ammount of +s and length of your
line, have you tried splitting it on small chunks?


Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result. The method consists entirely of a very
straightforward code statement, but in this case .NET adds the numbers
incorrectly.
public double ComputeSum( )
{
// Brute force sum method
// For iterations == 10000
double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+ 15+ ...
+ 9997+ 9998+ 9999+ 10000;
return sum;
}
The above method returns an incorrect result with any number of terms above
about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT
correctly add 1 + 2 + ... + 1000.

I have just run across this, and I have not yet researched the possible
reasons for this behavior. It may be a known issue related to either stack
size or the length of a code line, but to my knowledge it hasn't been
discussed in any of the "popular" literature on C# and .NET. I need to write
code like this, so if anyone has already encountered this issue, please
advise me.


Here's another example that also creates problems, but of a somewhat
different nature. Take the following code and translate it into a specific,
non-looping method and try to execute it using reflection. It fails.

public double LoopToCompute()
{
double sumOfProducts = 0;
double grandTotal = 0;
for (int i = 0; i < maxRows; ++i)
{
for (int j = 0; j < maxCols; ++j)
{
sumOfProducts += coeff[j] * table[j];
}
a_point = sumOfProducts;
grandTotal += sumOfProducts;
sumOfProducts = 0;
}
return grandTotal;
}//LoopToCompute

The above code works -- but it's equivalent code with loops unrolled (shown
below) doesn't work unless the maxRows is set very small. For small values,
the 2 methods (above and below) produce identical results. There is nothing
"wrong" with the code in that sense. It's similar to the above
situation.
If
the "size" of the code statement or the number of code statements is too
large, .NET fails. In this case (using reflection) it doesn't return the
incorrect result, as the first example did. In this case, reflection calls
it an invalid program and refuses to run it (but only when the value of
maxRows is above about 250). The reason for this is probably
straightforward. However, I have the need to make statements like this for
performance reasons so I need a work-around. Any suggestions are
appreciated! All comments are appreciated.

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point499 +
point500
;

return bruteForceSum;

}//DoBruteForceCompute
 
F

Fergus Cooney

Hi Guys,

4GB Ram.
Super speed.
Modern OS.
Ultra-modern compiler.

Doh! Of course there's an arbitrary small liimit somewhere. ;-))

Regards,
Fergus
 
M

Mountain Bikn' Guy

Anyone have any ideas why the following code "is not a valid program". It
compiles fine, but the runtime refuses to run it. I apologize for it being a
bit long (but I did cut out 497 of the 500 statement lines in the method).
FYI, this is part 2 of my initial question in this thread (and in this case,
the statement lines are not longer than the limit).
Thanks.
Dave

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point500
;

return bruteForceSum;

}//DoBruteForceCompute


Jon Skeet said:
[Removed microsoft.public.dotnet.csharp.general which isn't a valid
group, at least not on the MS server.]

Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result.

Not for me it doesn't (going up to 10000). Could you produce a
*complete* example somewhere which gives the incorrect result, either
posting it or giving a link to it on the web?
 
M

Mountain Bikn' Guy

P.S. I forgot to mention that the problem is related to the number of
statements in the method. 1000 doesn't work. I think it works up to a few
hundred statements, then stops being "a valid program," according to the
runtime. The compiler never complains.
 
J

Jon Skeet [C# MVP]

Mountain Bikn' Guy said:
Anyone have any ideas why the following code "is not a valid program". It
compiles fine, but the runtime refuses to run it. I apologize for it being a
bit long (but I did cut out 497 of the 500 statement lines in the method).
FYI, this is part 2 of my initial question in this thread (and in this case,
the statement lines are not longer than the limit).

<snip>

Presumably you do actually still *have* the full version around? I
appreciate it would be too long to post, but could you mail it to me or
put it on a website? It's easier to test that way :)

If you could make it a full program that we could just compile as-is
and then run, that would be great.
 
F

Fergus Cooney

Hi Dave,

Breaks at 1000 per method?

There's the 'why' and there's the 'what to do about it'.

Here's a possibility for the latter:

public double DoBruteForceCompute_Part_01()

public double DoBruteForceCompute_Part_02()
: : :
public double DoBruteForceCompute_Part_NN()

.. where each gets, say, a mere 250.

Regards,
Fergus
 
M

Mountain Bikn' Guy

Jon,
I will email it to you. Thanks for you interest.
If people wish, I'll also post it, but it IS long.
Dave
 
S

Steve S

how about....... putting it into a briefcase... or something like
that...... so those of us that subscribe to the board... don't pull it
down.. unless we want it..... then lost the location ?

Mountain Bikn' Guy said:
Jon,
I will email it to you. Thanks for you interest.
If people wish, I'll also post it, but it IS long.
Dave
 
M

Mountain Bikn' Guy

I hope the zip file I attached to my other message is OK. If not, let me
know and I'll try something else.
 
M

Mountain Bikn' Guy

This link is interesting

http://groups.google.com/groups?hl=...&selm=%23a9b8ArJCHA.2636%40tkmsftngp09&rnum=3

You can also get to the same place by searching Google Groups on "Common
Language Runtime detected an invalid program" if that URI is too long.

The little block of code posted back in 2002 still causes a compiler error
in VS 2003. Here it is:

I wrote this piece of code in the main method of a console application.

int myInteger = 5;
goto addval;
writeresult:
{
Console.WriteLine("My Integer = {0}", myInteger);
if (myInteger == 15)
{
goto Finish;
}
}

addval:
{
myInteger += 10;
goto writeresult;
}

Finish:
{
Console.WriteLine("End of Code");
}
 
M

Mountain Bikn' Guy

Jon,
I'm becoming convinced this issue is a compiler bug. I'm curious what you or
others find:
can you reproduce it using my code?
is there a work around (with equivalent performance advantages)?
is the bug present in both VS.NET 2002 and 2003?
Regards,
Dave
 
J

Jon Skeet [C# MVP]

Mountain Bikn' Guy said:
I'm becoming convinced this issue is a compiler bug. I'm curious what you or
others find:
can you reproduce it using my code?
Yup.

is there a work around (with equivalent performance advantages)?
is the bug present in both VS.NET 2002 and 2003?

I suspect it's actually in the framework itself (which is where the
guts of the compiler is) rather than VS.NET itself.

Is the simple looping code definitely too slow for your needs?
 
M

Mountain Bikn' Guy

In the event anyone wishes to review this thread now or later, I thought I
would provide some conclusions that resulted from everyone's input and from
my reading:

1. unrolling loops often doesn't provide a performance gain on large (real
life size?) problems. In my experience, the unrolled loop was 5-10 times
SLOWER than the simple nested loop that I used as a reference. I found the
following quote in Eric Gunnerson's book (2nd ed). He said (about unrolled
loops), a "function is so big it doesn't fit into cache and therefore gets
slower performance."

Jon added the following info:
I suspect he means the processor's instruction
cache - if it's a small loop, the processor (not JIT, note!) can decode
the instructions once and keep the microcode available for further
iterations.

2. VS.NET has a 2048 char line limit in code files. In my experience, when
using runtime code generation and compiling, this can cause unusual trouble
if one isn't aware of it. I was able to compile and run an application that
was created from a code file with a line longer than this limit. The only
error I encountered was that the mathematical expression in the executing
program returned incorrect results! (The fix was simply inserting line
breaks into the code file.)

3. VS.NET has some compiler bugs that will result in the following error
message:
Additional information: Common Language Runtime detected an invalid program.

This appears to happen on complex mathematical expressions. However, in my
case the expression itself was not that complex. It was just long (but each
line wasn't too long). When I reached about 1000 subexpressions in a simple
sum of products expression, I encountered this compiler bug. Search on
Google Groups using the error message, and you'll see that others have
encountered it (frequently) when dealing with complex mathematical
expressions.

To my knowledge, there is no solution for this. Anyone from Microsoft care
to comment?

That about sums up (no pun intended) what I've learned so far. HTH.
Dave


Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result. The method consists entirely of a very
straightforward code statement, but in this case .NET adds the numbers
incorrectly.
public double ComputeSum( )
{
// Brute force sum method
// For iterations == 10000
double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+ 15+ ....
+ 9997+ 9998+ 9999+ 10000;
return sum;
}
The above method returns an incorrect result with any number of terms above
about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT
correctly add 1 + 2 + ... + 1000.

I have just run across this, and I have not yet researched the possible
reasons for this behavior. It may be a known issue related to either stack
size or the length of a code line, but to my knowledge it hasn't been
discussed in any of the "popular" literature on C# and .NET. I need to write
code like this, so if anyone has already encountered this issue, please
advise me.


Here's another example that also creates problems, but of a somewhat
different nature. Take the following code and translate it into a specific,
non-looping method and try to execute it using reflection. It fails.

public double LoopToCompute()
{
double sumOfProducts = 0;
double grandTotal = 0;
for (int i = 0; i < maxRows; ++i)
{
for (int j = 0; j < maxCols; ++j)
{
sumOfProducts += coeff[j] * table[j];
}
a_point = sumOfProducts;
grandTotal += sumOfProducts;
sumOfProducts = 0;
}
return grandTotal;
}//LoopToCompute

The above code works -- but it's equivalent code with loops unrolled (shown
below) doesn't work unless the maxRows is set very small. For small values,
the 2 methods (above and below) produce identical results. There is nothing
"wrong" with the code in that sense. It's similar to the above situation. If
the "size" of the code statement or the number of code statements is too
large, .NET fails. In this case (using reflection) it doesn't return the
incorrect result, as the first example did. In this case, reflection calls
it an invalid program and refuses to run it (but only when the value of
maxRows is above about 250). The reason for this is probably
straightforward. However, I have the need to make statements like this for
performance reasons so I need a work-around. Any suggestions are
appreciated! All comments are appreciated.

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point499 +
point500
;

return bruteForceSum;

}//DoBruteForceCompute
 
M

Mountain Bikn' Guy

I think I found the right balance:
Instead of using a nested loop, I unroll the inner loop but leave the outer
loop. This has resulted in the best performance so far (and it doesn't cause
any of the issues/bugs and have been discussed in this thread).

Thanks for all your replies.

Dave
 
A

anonymouse

Did you make these unrolled loops manually or did you use a CodeDom to
generate them?

Curious.

Why do it the hard way:D


Mountain Bikn' Guy said:
In the event anyone wishes to review this thread now or later, I thought I
would provide some conclusions that resulted from everyone's input and from
my reading:

1. unrolling loops often doesn't provide a performance gain on large (real
life size?) problems. In my experience, the unrolled loop was 5-10 times
SLOWER than the simple nested loop that I used as a reference. I found the
following quote in Eric Gunnerson's book (2nd ed). He said (about unrolled
loops), a "function is so big it doesn't fit into cache and therefore gets
slower performance."

Jon added the following info:
I suspect he means the processor's instruction
cache - if it's a small loop, the processor (not JIT, note!) can decode
the instructions once and keep the microcode available for further
iterations.

2. VS.NET has a 2048 char line limit in code files. In my experience, when
using runtime code generation and compiling, this can cause unusual trouble
if one isn't aware of it. I was able to compile and run an application that
was created from a code file with a line longer than this limit. The only
error I encountered was that the mathematical expression in the executing
program returned incorrect results! (The fix was simply inserting line
breaks into the code file.)

3. VS.NET has some compiler bugs that will result in the following error
message:
Additional information: Common Language Runtime detected an invalid program.

This appears to happen on complex mathematical expressions. However, in my
case the expression itself was not that complex. It was just long (but each
line wasn't too long). When I reached about 1000 subexpressions in a simple
sum of products expression, I encountered this compiler bug. Search on
Google Groups using the error message, and you'll see that others have
encountered it (frequently) when dealing with complex mathematical
expressions.

To my knowledge, there is no solution for this. Anyone from Microsoft care
to comment?

That about sums up (no pun intended) what I've learned so far. HTH.
Dave


Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result. The method consists entirely of a very
straightforward code statement, but in this case .NET adds the numbers
incorrectly.
public double ComputeSum( )
{
// Brute force sum method
// For iterations == 10000
double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+ 15+ ...
+ 9997+ 9998+ 9999+ 10000;
return sum;
}
The above method returns an incorrect result with any number of terms above
about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT
correctly add 1 + 2 + ... + 1000.

I have just run across this, and I have not yet researched the possible
reasons for this behavior. It may be a known issue related to either stack
size or the length of a code line, but to my knowledge it hasn't been
discussed in any of the "popular" literature on C# and .NET. I need to write
code like this, so if anyone has already encountered this issue, please
advise me.


Here's another example that also creates problems, but of a somewhat
different nature. Take the following code and translate it into a specific,
non-looping method and try to execute it using reflection. It fails.

public double LoopToCompute()
{
double sumOfProducts = 0;
double grandTotal = 0;
for (int i = 0; i < maxRows; ++i)
{
for (int j = 0; j < maxCols; ++j)
{
sumOfProducts += coeff[j] * table[j];
}
a_point = sumOfProducts;
grandTotal += sumOfProducts;
sumOfProducts = 0;
}
return grandTotal;
}//LoopToCompute

The above code works -- but it's equivalent code with loops unrolled (shown
below) doesn't work unless the maxRows is set very small. For small values,
the 2 methods (above and below) produce identical results. There is nothing
"wrong" with the code in that sense. It's similar to the above
situation.
If
the "size" of the code statement or the number of code statements is too
large, .NET fails. In this case (using reflection) it doesn't return the
incorrect result, as the first example did. In this case, reflection calls
it an invalid program and refuses to run it (but only when the value of
maxRows is above about 250). The reason for this is probably
straightforward. However, I have the need to make statements like this for
performance reasons so I need a work-around. Any suggestions are
appreciated! All comments are appreciated.

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point499 +
point500
;

return bruteForceSum;

}//DoBruteForceCompute
 
M

Mountain Bikn' Guy

Since you asked...

I could use some help with CodeDOM. Here is how I am generating these loops:

//
//TODO: generate this entire expression using CodeDOM classes:
StringBuilder formulaString = new StringBuilder();
formulaString.Append("dataPoints[row] = 0");
if (columns != null)
{
formulaString.Capacity = 40 * (columns.Count + 2);
for (int j = 0; j < columns.Count; ++j)
{
if ( ((MyColumnType)columns[j]).Role == RoleType.Independent)
{
formulaString.Append(System.Environment.NewLine);
formulaString.Append("+ coefficients[" + j + "] * table[row][" + j + "] ");
}
}
}
formulaString.Append(";");
//

The rest of the stuff (namespace, class, outer for loop, return statement,
etc.) is all generated purely without code snippet expressions (with 1
exception -- see bug below). Anyone want to show me how to do the above
expression without code snippets? (Note the need to break the lines to keep
them below the 2046 char limit.)

The one other issue I am having trouble with is this:

compute.ReturnType = new CodeTypeReference("public float[]"); //suggested
workaround for bug below
compute.Attributes |= MemberAttributes.Public;//has a bug

Is there an easy, language neutral way to work around the
MemberAttributes.Public bug when I return an array?

TIA,
Dave



anonymouse said:
Did you make these unrolled loops manually or did you use a CodeDom to
generate them?

Curious.

Why do it the hard way:D


Mountain Bikn' Guy said:
In the event anyone wishes to review this thread now or later, I thought I
would provide some conclusions that resulted from everyone's input and from
my reading:

1. unrolling loops often doesn't provide a performance gain on large (real
life size?) problems. In my experience, the unrolled loop was 5-10 times
SLOWER than the simple nested loop that I used as a reference. I found the
following quote in Eric Gunnerson's book (2nd ed). He said (about unrolled
loops), a "function is so big it doesn't fit into cache and therefore gets
slower performance."

Jon added the following info:
I suspect he means the processor's instruction
cache - if it's a small loop, the processor (not JIT, note!) can decode
the instructions once and keep the microcode available for further
iterations.

2. VS.NET has a 2048 char line limit in code files. In my experience, when
using runtime code generation and compiling, this can cause unusual trouble
if one isn't aware of it. I was able to compile and run an application that
was created from a code file with a line longer than this limit. The only
error I encountered was that the mathematical expression in the executing
program returned incorrect results! (The fix was simply inserting line
breaks into the code file.)

3. VS.NET has some compiler bugs that will result in the following error
message:
Additional information: Common Language Runtime detected an invalid program.

This appears to happen on complex mathematical expressions. However, in my
case the expression itself was not that complex. It was just long (but each
line wasn't too long). When I reached about 1000 subexpressions in a simple
sum of products expression, I encountered this compiler bug. Search on
Google Groups using the error message, and you'll see that others have
encountered it (frequently) when dealing with complex mathematical
expressions.

To my knowledge, there is no solution for this. Anyone from Microsoft care
to comment?

That about sums up (no pun intended) what I've learned so far. HTH.
Dave


Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result. The method consists entirely of a very
straightforward code statement, but in this case .NET adds the numbers
incorrectly.
public double ComputeSum( )
{
// Brute force sum method
// For iterations == 10000
double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+
15+
...
+ 9997+ 9998+ 9999+ 10000;
return sum;
}
The above method returns an incorrect result with any number of terms above
about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT
correctly add 1 + 2 + ... + 1000.

I have just run across this, and I have not yet researched the possible
reasons for this behavior. It may be a known issue related to either stack
size or the length of a code line, but to my knowledge it hasn't been
discussed in any of the "popular" literature on C# and .NET. I need to write
code like this, so if anyone has already encountered this issue, please
advise me.


Here's another example that also creates problems, but of a somewhat
different nature. Take the following code and translate it into a specific,
non-looping method and try to execute it using reflection. It fails.

public double LoopToCompute()
{
double sumOfProducts = 0;
double grandTotal = 0;
for (int i = 0; i < maxRows; ++i)
{
for (int j = 0; j < maxCols; ++j)
{
sumOfProducts += coeff[j] * table[j];
}
a_point = sumOfProducts;
grandTotal += sumOfProducts;
sumOfProducts = 0;
}
return grandTotal;
}//LoopToCompute

The above code works -- but it's equivalent code with loops unrolled (shown
below) doesn't work unless the maxRows is set very small. For small values,
the 2 methods (above and below) produce identical results. There is nothing
"wrong" with the code in that sense. It's similar to the above
situation.
If
the "size" of the code statement or the number of code statements is too
large, .NET fails. In this case (using reflection) it doesn't return the
incorrect result, as the first example did. In this case, reflection calls
it an invalid program and refuses to run it (but only when the value of
maxRows is above about 250). The reason for this is probably
straightforward. However, I have the need to make statements like this for
performance reasons so I need a work-around. Any suggestions are
appreciated! All comments are appreciated.

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point499 +
point500
;

return bruteForceSum;

}//DoBruteForceCompute

 
E

Eric Gunnerson [MS]

Comments inline

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
Mountain Bikn' Guy said:
In the event anyone wishes to review this thread now or later, I thought I
would provide some conclusions that resulted from everyone's input and from
my reading:

1. unrolling loops often doesn't provide a performance gain on large (real
life size?) problems. In my experience, the unrolled loop was 5-10 times
SLOWER than the simple nested loop that I used as a reference. I found the
following quote in Eric Gunnerson's book (2nd ed). He said (about unrolled
loops), a "function is so big it doesn't fit into cache and therefore gets
slower performance."

Jon added the following info:
I suspect he means the processor's instruction
cache - if it's a small loop, the processor (not JIT, note!) can decode
the instructions once and keep the microcode available for further
iterations.

I do mean the processor cache, but it's not necessarily just an instruction
effect.

Jan Gray's excellent article on managed code performance has a lot more
detail on this:

http://msdn.microsoft.com/library/?url=/library/en-us/dndotnet/html/fastmanagedcode.asp
2. VS.NET has a 2048 char line limit in code files. In my experience, when
using runtime code generation and compiling, this can cause unusual trouble
if one isn't aware of it. I was able to compile and run an application that
was created from a code file with a line longer than this limit. The only
error I encountered was that the mathematical expression in the executing
program returned incorrect results! (The fix was simply inserting line
breaks into the code file.)

3. VS.NET has some compiler bugs that will result in the following error
message:
Additional information: Common Language Runtime detected an invalid program.

This appears to happen on complex mathematical expressions. However, in my
case the expression itself was not that complex. It was just long (but each
line wasn't too long). When I reached about 1000 subexpressions in a simple
sum of products expression, I encountered this compiler bug. Search on
Google Groups using the error message, and you'll see that others have
encountered it (frequently) when dealing with complex mathematical
expressions.

To my knowledge, there is no solution for this. Anyone from Microsoft care
to comment?

I haven't heard of this before. Can you send me a program that demonstrates
the problem?
(e-mail address removed)
That about sums up (no pun intended) what I've learned so far. HTH.
Dave


Mountain Bikn' Guy said:
Take some standard code such as shown below. It simply loops to add up a
series of terms and it produces the correct result.

// sum numbers with a loop
public int DoSumLooping(int iterations)
{
int result = 0;
for(int i = 1;i <=iterations;i++)
{
result += i;
}
return result;
}

Now translate this into a specific solution that doesn't use looping (and
use the same value for the number of iterations the loop performs). This
code returns an incorrect result. The method consists entirely of a very
straightforward code statement, but in this case .NET adds the numbers
incorrectly.
public double ComputeSum( )
{
// Brute force sum method
// For iterations == 10000
double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+ 15+ ...
+ 9997+ 9998+ 9999+ 10000;
return sum;
}
The above method returns an incorrect result with any number of terms above
about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT
correctly add 1 + 2 + ... + 1000.

I have just run across this, and I have not yet researched the possible
reasons for this behavior. It may be a known issue related to either stack
size or the length of a code line, but to my knowledge it hasn't been
discussed in any of the "popular" literature on C# and .NET. I need to write
code like this, so if anyone has already encountered this issue, please
advise me.


Here's another example that also creates problems, but of a somewhat
different nature. Take the following code and translate it into a specific,
non-looping method and try to execute it using reflection. It fails.

public double LoopToCompute()
{
double sumOfProducts = 0;
double grandTotal = 0;
for (int i = 0; i < maxRows; ++i)
{
for (int j = 0; j < maxCols; ++j)
{
sumOfProducts += coeff[j] * table[j];
}
a_point = sumOfProducts;
grandTotal += sumOfProducts;
sumOfProducts = 0;
}
return grandTotal;
}//LoopToCompute

The above code works -- but it's equivalent code with loops unrolled (shown
below) doesn't work unless the maxRows is set very small. For small values,
the 2 methods (above and below) produce identical results. There is nothing
"wrong" with the code in that sense. It's similar to the above
situation.
If
the "size" of the code statement or the number of code statements is too
large, .NET fails. In this case (using reflection) it doesn't return the
incorrect result, as the first example did. In this case, reflection calls
it an invalid program and refuses to run it (but only when the value of
maxRows is above about 250). The reason for this is probably
straightforward. However, I have the need to make statements like this for
performance reasons so I need a work-around. Any suggestions are
appreciated! All comments are appreciated.

public double DoBruteForceCompute()
{
double bruteForceSum = 0;

point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2]
+coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5]
+coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8]
+coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11]
+coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14]
+coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17]
+coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20]
+coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23]
+coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26]
+coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29]
+coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32]
+coeff34*table[0][33] +coeff35*table[0][34] ;

point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2]
+coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5]
+coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8]
+coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11]
+coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14]
+coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17]
+coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20]
+coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23]
+coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26]
+coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29]
+coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32]
+coeff34*table[1][33] +coeff35*table[1][34] ;


[...]

point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2]
+coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5]
+coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8]
+coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11]
+coeff13*table[499][12] +coeff14*table[499][13] +coeff15*table[499][14]
+coeff16*table[499][15] +coeff17*table[499][16] +coeff18*table[499][17]
+coeff19*table[499][18] +coeff20*table[499][19] +coeff21*table[499][20]
+coeff22*table[499][21] +coeff23*table[499][22] +coeff24*table[499][23]
+coeff25*table[499][24] +coeff26*table[499][25] +coeff27*table[499][26]
+coeff28*table[499][27] +coeff29*table[499][28] +coeff30*table[499][29]
+coeff31*table[499][30] +coeff32*table[499][31] +coeff33*table[499][32]
+coeff34*table[499][33] +coeff35*table[499][34] ;

bruteForceSum =
point1 +
point2 + ... +

point499 +
point500
;

return bruteForceSum;

}//DoBruteForceCompute
 
Top