Poor array performance


John Mark Howell

I had a customer call about some C# code they had put together that was
handling some large arrays. The performance was rather poor. The C# code
runs in about 22 seconds and the equivalent C++.Net code runs in 0.3
seconds. Can someone help me understand why the C# code performance is so
poor? I rewrote the C# code to use a single-dimensional array and the time
went down to about 3 seconds, but that's still no explanation as to why the
two-dimensional array performance is so bad. I tried this on both C# 1.1 and
C# 2.0.


The original code was:

public void TestLoop4OldMethod()
{
    double[,] emisbase = new double[1000, 8784];
    double[,] vombase = new double[1000, 8784];
    int Iteration_Index;
    int Hourly_Index;
    double differenceInSeconds;
    DateTime myDateTime1;
    DateTime myDateTime2;

    myDateTime1 = DateTime.UtcNow;
    for (int i1 = 0; i1 < 10; )
    {
        for (Hourly_Index = 0; Hourly_Index < 8784; )
        {
            for (Iteration_Index = 0; Iteration_Index < 1000; )
            {
                emisbase[Iteration_Index, Hourly_Index] = Iteration_Index * Iteration_Index;
                vombase[Iteration_Index, Hourly_Index] = emisbase[Iteration_Index, Iteration_Index];
                Iteration_Index++;
            }
            Hourly_Index++;
        }
        //Console.WriteLine("Here we are - Loop 4: {0}", i1);
        i1++;
    }
    myDateTime2 = DateTime.UtcNow;
    TimeSpan ts = myDateTime2 - myDateTime1;
    differenceInSeconds = ts.TotalMilliseconds / 1000;

    Console.WriteLine("RunTime in Seconds - Left Most Index + array reference: {0} ", differenceInSeconds);
    Console.WriteLine(" ");
}


It runs in about 22 seconds. Here I rewrote the code in C++ and it runs in
0.3 seconds:


// --------------------------------------------------------------------------------------------------------------------
int Iteration_Index;
int Hourly_Index;
double differenceInSeconds;
DateTime myDateTime1;
DateTime myDateTime2;

double * emisbase = new double[1000,8784];
double * vombase = new double[1000,8784];

myDateTime1 = DateTime::UtcNow;
for (int i1 = 0; i1 < 10; )
{
    for (Hourly_Index = 0; Hourly_Index < 8784; Hourly_Index++)
    {
        for (Iteration_Index = 0; Iteration_Index < 1000; Iteration_Index++)
        {
            emisbase[Iteration_Index, Hourly_Index] = Iteration_Index * Iteration_Index;
            vombase[Iteration_Index, Hourly_Index] = emisbase[Iteration_Index, Iteration_Index];
            //Iteration_Index++;
        }
        //Hourly_Index++;
    }
    //Console::Write(S"Here we are - Loop 4: ");
    //Console::WriteLine(Convert::ToString(i1));
    i1++;
}
myDateTime2 = DateTime::UtcNow;
TimeSpan ts = myDateTime2 - myDateTime1;
differenceInSeconds = ts.TotalMilliseconds / 1000;

Console::Write(S"RunTime in Seconds - Left Most Index + array reference: ");
Console::WriteLine(Convert::ToString(differenceInSeconds));

// --------------------------------------------------------------------------------------------------------------------
 

Gabriel Magaña

For a loop that runs almost 9 million times, 3 seconds is not that bad! ;-)

I don't know the internals of C# well enough to say something intelligent
about it, but why not leave that part in managed C++? It sounds like this is
a critical section that should have the heck optimized out of it anyhow...


John Mark Howell said:
I had a customer call about some C# code they had put together that was
handling some large arrays. The performance was rather poor. [snip]
 

Chris Dunaway

How many times did you run the loop? The first time the code is
executed it must be JIT compiled and that can take some of the time.

Just a thought.
 

Willy Denoyette [MVP]

Fast, sure, but is it correct? One important thing when running this kind of
benchmark is that the results are correct and repeatable (and I mean both the
timing and the results of the operations performed).
Did you inspect the value of the array elements (say the first 10 locations)
after the run?
IMO they aren't correct in the C++ case.
I will take a look at the code when I find some spare time, but 0.3 sec for
80 million iterations smells like "broken optimization".

Willy.




|I had a customer call about some C# code they had put together that was
| handling some large arrays. The performance was rather poor. [snip]
 

Willy Denoyette [MVP]

JIT compiling such a small piece of code takes less than a millisecond.

Willy.

| How many times did you run the loop? The first time the code is
| executed it must be JIT compiled and that can take some of the time.
|
| Just a thought.
|
 

Guest

Hi John,
I ran your original 2D code in a loop 10 times and the average I get is
about 6.36 seconds (my computer is a 3 GHz Pentium 4 with 512 MB of RAM) to loop
87,840,000 times, which if you look at one iteration is roughly 7.9 x
10^-8 seconds, which is pretty fast :). If I am looping 87 million times I think I
would be happy with the 6-second range.

Not sure why the C++ is so much faster but something smells fishy when
performing so much processing :)

Mark
http://www.markdawson.org



John Mark Howell said:
I had a customer call about some C# code they had put together that was
handling some large arrays. The performance was rather poor. [snip]
 

John Mark Howell

Only once, but the timer is in the code, not external, so the IL is JIT'ed by
the time I get the begin time.
 

John Mark Howell

That's an option I suggested to them. They have code already in Fortran and
were considering migrating to C# for business reasons. Does anyone know how
well FORTRAN.Net from Intel or Lahey performs?
 

John Mark Howell

"broken optimization"? Do you mean that it should take longer or shorter?

The client is looking at a piece of Intel FORTRAN code that is running on a
Windows box in 0.04 seconds as a basis. They are considering migrating to
C# for business reasons. Has anyone looked at the performance of either the
Intel or Lahey FORTRAN.Net products?
 

John Mark Howell

MarkD you may be on to something. I only ran the loop once. C# may have
some type of internal tuning that takes more than one pass. I'll have to
adjust my test and re-run it multiple times to see.
 

C.C. (aka Me)

Here are some tests that I have done with the code included below. I
compiled it in Release mode and ran it outside of VStudio.

Empty loop:
0.01 seconds

count++ only:
0.15 seconds

count++ and vals1[x,y] = x*y+1:
1.5 seconds

count++ and vals1[x,y] = x*y+1 and vals2[x,y] = x*y+2:
4.5 seconds for the first few runs, then it jumped up to about 10 seconds.

It may not give any answers as to why (we may just have to accept it as a
fact), but it does give you some idea of how much array access and
multiplication decrease performance.


Code I used:
=====================
using System;

namespace SpeedTest
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            for (int x = 1; x <= 10; x++)
                RunTest();
            Console.ReadLine();
        }

        public static void RunTest()
        {
            DateTime myDateTime1;
            DateTime myDateTime2;
            double differenceInSeconds;
            long count = 0;
            double[,] vals1 = new double[1001, 8001];
            double[,] vals2 = new double[1001, 8001];
            myDateTime1 = DateTime.UtcNow;
            for (int x = 1; x <= 1000; x++)
            {
                for (int y = 1; y <= 8000; y++)
                {
                    for (int z = 1; z <= 10; z++)
                    {
                        count++;
                        // vals1[x,y] = x*y+1;
                        // vals2[x,y] = x*y+2;
                    }
                }
            }
            myDateTime2 = DateTime.UtcNow;
            TimeSpan ts = myDateTime2 - myDateTime1;
            differenceInSeconds = ts.TotalMilliseconds / 1000;
            Console.WriteLine("{0} loops in {1} seconds", count, differenceInSeconds);
        }
    }
}
=====================
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Charles Cox
VC/VB/C# Developer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

John Mark Howell said:
I had a customer call about some C# code they had put together that was
handling some large arrays. The performance was rather poor. [snip]
 

James Park

John Mark Howell said:
What is 'native'?

In Managed C++, the statements should be:
Double emisbase[,] = new Double[1000, 8784];
Double vombase[,] = new Double[1000, 8784];

In C++/CLI, the statements should be:
array<double, 2>^ emisbase = gcnew array<double, 2>(1000,8784);
array<double, 2>^ vombase = gcnew array<double, 2>(1000,8784);

What you are really doing right now is:
double * emisbase = new double[8784];
double * vombase = new double[8784];

And statements like:
emisbase[Iteration_Index, Hourly_Index]

turn into:
emisbase[Hourly_Index]

because emisbase doesn't refer to a .NET array. It's just a regular C++
pointer to a regular C++ array.

Once you run apples to apples, it takes just about as long.
 

John Mark Howell

Thanks James. That does clarify a lot of questions about the difference.
But do you know of any way to coerce C# into the performance of a native
array as in C++, or is that just asking too much? Would it be worth it to
have sections of unsafe code?

James Park said:
[snip]
 

Willy Denoyette [MVP]

No, I mean that the C++ figures are wrong because of "broken" optimization.
If I run the (managed C++) code you posted it takes 0.210 secs. to complete;
impressive, but I don't trust it.
Now if I watch the memory consumption while running (1000 iterations to make
it possible to measure), the "private bytes" counter stays at ~9 MB and the
working set at ~6.5 MB, which is the minimum for a ".NET" process. This
smells like aggressive optimization. Why? Well, the program should create two
arrays of doubles (the optimizer could optimize away the first array), each
of them 8784*1000*sizeof(double) = 70,272,000 bytes. That means the final
private bytes count should reach at least 70,272,000 (or 140,544,000 bytes).
So it's clear that NO arrays are created; that would mean both array
creations are optimized away (which is possible because they aren't used
outside the loop, right?).
So I decided to look at the results, by including this code after the loop:

for (Hourly_Index = 0; Hourly_Index < 1; Hourly_Index++)
{
    for (Iteration_Index = 0; Iteration_Index < 1000; Iteration_Index++)
    {
        Console::WriteLine("{0}", Convert::ToString(vombase[Iteration_Index, Hourly_Index]));
    }
}

See, I'm only interested in the first 1000 elements, and hey, they are all
0, thus wrong! Then I watched the last elements' values; they are all
998001, which is 999^2, the largest value calculated in the loop. Finally I
checked all the values and, guess what, there are only 0 (the first 1,000,000)
and 998001 (the remaining), but still no arrays; the memory consumption
remains the same.
So this is what I call "broken optimization": the results are wrong, so the
benchmark time is bogus.

Note that I did the same with the C# code, and there the results are correct,
memory consumption is > 140 MB, and the benchmark time is correct (5.2 secs.
on my box).

Actually I didn't look into the generated code; I will do so if I find some
spare time.

Willy.



| "broken optimization"? Do you mean that it should take longer or shorter?
| [snip]
 

James Park

John Mark Howell said:
Thanks James. That does clarify a lot of questions about the difference.
But do you know of any way to coerce C# into the performance of a native
array as in C++, or is that just asking too much?

The implementation given is broken. If you fix it by declaring the arrays:

double (*emisbase)[8784];
double (*vombase)[8784];

and switching all [a,b] to [a][b], it takes just as long. Even if you use
straight native C++, it doesn't really help (I just tested it).
 

Willy Denoyette [MVP]

This is how the code should look:
....

// changed array declarations
double (*emisbase)[8784];
double (*vombase)[8784];
emisbase = new double[1000][8784]; // allocate native array on the heap
vombase = new double[1000][8784];  // "
.....
for (int i1 = 0; i1 < 10; i1++)
{
    for (Hourly_Index = 0; Hourly_Index < 8784; Hourly_Index++)
    {
        for (Iteration_Index = 0; Iteration_Index < 1000; Iteration_Index++)
        {
            emisbase[Iteration_Index][Hourly_Index] = Iteration_Index * Iteration_Index;
            vombase[Iteration_Index][Hourly_Index] = emisbase[Iteration_Index][Iteration_Index];
        }
    }
}

Note that the code you posted only creates arrays of 8784 entries (as per the
C++ comma operator), which leads to broken MSIL code as explained in my
previous post.
Running the above results in the same time to complete as the C#; note also
that allocation of the managed arrays is somewhat faster in C# than the
native array allocation in C++ (managed code).

Willy.


| No I mean that the C++ figures are wrong because of "broken" optimization.
| [snip]
 

John Mark Howell

Ouch. I should have looked closer at the code and tested the results before
posting. That would have avoided some confusion. Thanks for setting me
straight on what was actually happening. After I fixed my code here, you
are correct: I'm getting speed equivalent to the C# code.
 

Willy Denoyette [MVP]

No problem. I was getting confused because I would have expected the
compiler to flag the array declarations as illegal. Further investigation
on my part showed the 'tiny' 8784-element arrays being created, which rang
a bell.

Willy.


| Ouch. I should have looked closer at the code and tested the results
| before posting. [snip]
 
