Performance Breakdown Writing to Memory Location

Guest

Hi Folks!

For some reason, this code:

float tmp = 0.;
for (int i = 0; i < 2000; i++) {
    for (int j = i; j < 2000; j++) {
        for (int k = 0; k < 9000; k++) {
            tmp += tmp_mat(k,i) * tmp_mat(k,j);
        }
        tmp = 0.;
    }
    cout << "J: " << i << endl;
}

is 10-100 times faster than this:

float tmp = 0.;
for (int i = 0; i < 2000; i++) {
    for (int j = i; j < 2000; j++) {
        for (int k = 0; k < 9000; k++) {
            tmp += tmp_mat(k,i) * tmp_mat(k,j);
        }
        data2d[j] = tmp; // the problem
        tmp = 0.;
    }
    cout << "J: " << i << endl;
}

Why is assigning a value that slow?

Can anybody help me?

Thanks in advance for your efforts

-Chucker
 
Carl Daniel [VC++ MVP]

What is tmp_mat? What's data2d?

First off, I hope you're measuring a release build, as debug build timings
are pretty close to meaningless.

Secondly, in the first loop, tmp is probably kept in a floating-point
register for the entire operation, while the second form has to make
roughly 2,000,000 additional memory writes (one for each pair i, j with
j >= i). That's got to take a bit of time.

-cd
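
To put the number of writes in proportion, the loop bounds from the question can be counted directly. The sketch below is only an illustration: count_work and LoopCounts are hypothetical names, and it counts iterations without touching any matrix data.

```cpp
#include <cstdint>

// How often each statement runs for the loop bounds in the question.
struct LoopCounts {
    std::uint64_t stores;        // executions of "data2d[j] = tmp"
    std::uint64_t multiply_adds; // executions of "tmp += a * b"
};

// n is the bound of the i and j loops, rows is the bound of the k loop.
LoopCounts count_work(int n, int rows) {
    LoopCounts c{0, 0};
    for (int i = 0; i < n; i++) {
        for (int j = i; j < n; j++) {
            c.stores += 1;  // one store per (i, j) pair
            c.multiply_adds += static_cast<std::uint64_t>(rows);
        }
    }
    return c;
}
```

For count_work(2000, 9000) this gives 2,001,000 stores against 18,009,000,000 multiply-adds, so each store is preceded by 9000 multiply-adds.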
 
Guest

Sorry, maybe I did not make myself clear.

1.) All the "unknown" variables are float types

2.) I know that loop 1 does not write to a memory location. I am looking for
the most performant way to do this.

Thanks

Chucker

 
Carl Daniel [VC++ MVP]

Chucker said:
Sorry, maybe I did not make myself clear.

1.) All the "unknown" variables are float types
OK.

2.) I know that loop 1 does not write to a memory location. I am
looking for the most performant way to do this.

You may have already found it. Have you looked at a disassembly of the
first loop (without the writes)? In an optimized build, the compiler may
have simply omitted much (or all) of the loop if it can prove that the only
side-effect of the whole thing is to assign 0.0 to tmp.

-cd
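
One way to check whether the fast timing comes from the loop being optimized away is to time a version whose results are all kept and then used. The following is a minimal sketch under several assumptions that are not in the thread: Matrix is a hypothetical stand-in for tmp_mat (column-major floats, filled with 1.0), the results go into a separate vector indexed by both i and j rather than into data2d[j], and the names column_products, checksum and time_column_products are made up for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tmp_mat: rows x cols floats, stored column by column.
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 1.0f) {}
    float operator()(std::size_t k, std::size_t i) const { return data[i * rows + k]; }
};

// Dot product of every column pair (i, j) with j >= i; each result is stored.
std::vector<float> column_products(const Matrix& m) {
    std::vector<float> out(m.cols * m.cols, 0.0f);
    for (std::size_t i = 0; i < m.cols; i++) {
        for (std::size_t j = i; j < m.cols; j++) {
            float tmp = 0.0f;  // local accumulator, reset for every pair
            for (std::size_t k = 0; k < m.rows; k++) {
                tmp += m(k, i) * m(k, j);
            }
            out[i * m.cols + j] = tmp;
        }
    }
    return out;
}

// Sum of all stored results; using this value makes every result observable.
double checksum(const std::vector<float>& v) {
    double s = 0.0;
    for (float x : v) s += x;
    return s;
}

struct Timing {
    double milliseconds;
    double checksum;
};

// Time one full run and return the checksum along with the elapsed time.
Timing time_column_products(const Matrix& m) {
    auto start = std::chrono::steady_clock::now();
    std::vector<float> out = column_products(m);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {elapsed.count(), checksum(out)};
}
```

Printing both fields of the returned Timing in a release build gives a figure that can be compared with the two loops from the question. If the version without the store is far faster than this one, that would be consistent with the compiler having dropped the unused computation, and the disassembly can confirm it.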
 
