Peter said:
I searched high and low and finally found some stuff on unverified code. The
MSDN documentation mentions it yet does not discuss what it means. It could be
anything from security checks to tests for memory leaks. It doesn't say what its
purpose is.
I guess you should google for 'define:verify'. Verifiable benchmarks
have nothing to do with verifiable code. Anders just wanted the
benchmarks to be verifiably *valid*, which they aren't by any means. A
nested loop written the way it is in the benchmark is measuring nothing
but a compiler's ability to optimize nested loops that do more or less
nothing. Verifiable code means (in the .NET vocabulary) that the CLR can
statically verify the code to ensure it will do nothing it is not
expected to do. But enough of that, let's see what native assembly the
compilers generate for the loop and get over with it.
<...>
ha! I won't post the complete disassemblies (the C++ one is terribly
cryptic, so it wouldn't help much), but I've found the following. The C++
compiler (8.0 in my case, but I suspect 6.0 does the same) manages
to hoist the additions out of the inner loops, which the C# compiler
doesn't. Thus, in the C# code all the additions in
x += a + b + c + d + e + f;
are re-evaluated on every single innermost iteration, while the C++
does something like the following:
for (a = 0; a < n; a++)
{
    for (b = 0, tmp_ab = a; b < n; b++, tmp_ab++)
    {
        for (c = 0, tmp_abc = tmp_ab; c < n; c++, tmp_abc++)
        {
            for (d = 0, tmp_abcd = tmp_abc; d < n; d++, tmp_abcd++)
            {
                for (e = 0, tmp_abcde = tmp_abcd; e < n; e++, tmp_abcde++)
                {
                    for (f = 0, tmp = tmp_abcde; f < n; f++, tmp++)
                        x += tmp;
                }
            }
        }
    }
}
If you compile this code in C#, the execution time is 6734 ms for .NET
2.0 and 8593 ms for .NET 1.1, versus 3656 ms for unmanaged C++ (8.0) on
my machine. But note that the innermost loop in the C++ version is only
four instructions. That hardly corresponds to any real-life algorithm, and
I can assure you that if there were anything more than 'nothing' in the
inner loop, the performance difference would be much less than 80% (I
expect it to drop far below 10% for most real-life algorithms).
That said, I wouldn't expect C# (or .NET) to match pure native code
performance for purely computational tasks like the one you describe.
C# can win easily in cases where a lot of dynamic allocation is
involved, etc., but it will probably never outperform optimized native
C++ (or handwritten assembly) for computational tasks. If you need to
squeeze every last clock cycle out of your CPU, you will probably get
the best results with an optimizing compiler targeted at exactly the
CPU the code will run on.
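For example, with GCC or Clang that kind of CPU-specific targeting is a single flag (the file name here is a placeholder):

```shell
# -march=native tells the compiler to generate code for the exact CPU
# of the build machine (instruction set extensions included), instead
# of a lowest-common-denominator target.
g++ -O2 -march=native -o bench bench.cpp
```

The trade-off, of course, is that the resulting binary may not run on older CPUs.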
I didn't want to write all this, but after reading through the long
discussions on this matter without anyone touching the important points,
I just decided to do so.
Just my 2c.
Stefan