A question on the VC++ .NET compiler

Guest

I have some code like this:
///////////////
void test(int* a)
{
a[0]+=((a[1]-a[2])<<3);
}
////////////////
After compiling with VC++ .NET 2003, the generated asm code is:
///////////
PUBLIC ?test@@YIXPAH@Z ; test
; Function compile flags: /Ogty
; File c:\test.cpp
; COMDAT ?test@@YIXPAH@Z
_TEXT SEGMENT
?test@@YIXPAH@Z PROC NEAR ; test, COMDAT
; _a$ = eax
; 144 : void test(int* a)

mov ecx, DWORD PTR [eax+4]
sub ecx, DWORD PTR [eax+8]
add ecx, ecx
add ecx, ecx
add ecx, ecx
add DWORD PTR [eax], ecx
ret 0
?test@@YIXPAH@Z ENDP ; test
////////////////////////////////////////////
Question:
Why does the compiler use "add ecx,ecx" three times instead of a single "shl ecx,3"?

I would expect shl ecx,3 to be faster.
Thanks for any answers.
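
For reference, the two instruction sequences compute the same value: doubling a register three times multiplies it by 8, which is what a left shift by 3 does. Below is a minimal sketch in C++14, not the compiler's actual transformation. The function names `shift_by_3` and `add_three_times` are made up for illustration, and unsigned 32-bit arithmetic is assumed so that overflow wraps around instead of being undefined.

```cpp
#include <cassert>
#include <cstdint>

// Left shift by 3, as in the source expression (a[1]-a[2])<<3.
// Unsigned is an assumption made here so that overflow is well defined.
constexpr std::uint32_t shift_by_3(std::uint32_t x) {
    return x << 3;
}

// The same value via three doublings, mirroring the three "add ecx, ecx".
constexpr std::uint32_t add_three_times(std::uint32_t x) {
    x += x;  // x * 2
    x += x;  // x * 4
    x += x;  // x * 8
    return x;
}

// Compile-time checks that both forms agree, including a wrapping case.
static_assert(shift_by_3(5) == 40, "5 << 3 is 40");
static_assert(add_three_times(5) == 40, "three doublings of 5 give 40");
static_assert(shift_by_3(0xFFFFFFFFu) == add_three_times(0xFFFFFFFFu),
              "both forms wrap the same way");
```

Since the results are identical, the compiler is free to pick whichever form it estimates to be cheaper on the target CPU.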
 
Tom Widmer

mestupid said:
I would expect shl ecx,3 to be faster.

I know very little about assembler, but a quick search on Google seems
to show that you may be wrong, at least for the P4:

http://www.intel.com/cd/ids/developer/asmo-na/eng/44010.htm?prn=Y

It says: "Shifts, rotations Avoid if possible, schedule dependent
instructions as far away as possible; replace shl with additions". It
also says: "the shl µop has a latency of 4", whatever that means.

Tom
 
Carl Daniel [VC++ MVP]

Tom said:
I know very little about assembler, but a quick search on Google seems
to show that you may be wrong, at least for the P4:

http://www.intel.com/cd/ids/developer/asmo-na/eng/44010.htm?prn=Y

It says: "Shifts, rotations Avoid if possible, schedule dependent
instructions as far away as possible; replace shl with additions". It
also says: "the shl µop has a latency of 4", whatever that means.

IIUC, on a Pentium 4 class CPU each add executes in 1/2 clock cycle
in the execution core, so the three adds will be ~2X faster than the
shift would be.

-cd
 
Guest

Thanks -cd and Tom.
That's very helpful. I read through the IA-32 Intel Architecture
Optimization manual, and yes, shl has a long latency.

I have another question I'd like to clear up:
ADD has a latency of 0.5. Does that mean the CPU can handle two ADDs per
clock cycle?

My guess is that each ADD needs a total of 1.5 clock cycles: one for the
instruction and half for the latency. So three ADDs need 4.5 cycles.

SHL, on the other hand, has one instruction cycle plus 4 for latency, so
it takes 5 cycles in total.
Am I right?
 
Carl Daniel [VC++ MVP]

mestupid said:
Thanks -cd and Tom.
That's very helpful. I read through the IA-32 Intel Architecture
Optimization manual, and yes, shl has a long latency.

I have another question I'd like to clear up:
ADD has a latency of 0.5. Does that mean the CPU can handle two ADDs per
clock cycle?

That's correct.

My guess is that each ADD needs a total of 1.5 clock cycles: one for the
instruction and half for the latency. So three ADDs need 4.5 cycles.

SHL, on the other hand, has one instruction cycle plus 4 for latency, so
it takes 5 cycles in total.
Am I right?

You could be - I didn't check the actual instruction timings.

-cd
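
The timing comparison discussed above can be written out as a small calculation. This is only a simplified sketch under stated assumptions: it uses the latency figures quoted in this thread (0.5 cycles for add, 4 cycles for shl on a Pentium 4) without re-measuring them, and it assumes the latency figure already covers the whole time until the result is available, so no separate "instruction cycle" is added on top. The names `chain_latency`, `three_adds`, and `one_shl` are hypothetical helpers, not anything from Intel's documentation.

```cpp
#include <cassert>

// Latency figures as quoted in this thread for the Pentium 4 (assumed, not measured).
constexpr double kAddLatency = 0.5;  // cycles until an add's result is available
constexpr double kShlLatency = 4.0;  // cycles until a shl's result is available

// Latency of a chain of n dependent instructions: each one has to wait
// for the previous result, so the latencies simply add up.
constexpr double chain_latency(int n, double latency_each) {
    return n * latency_each;
}

// The three dependent "add ecx, ecx" instructions from the listing.
constexpr double three_adds() { return chain_latency(3, kAddLatency); }

// The single "shl ecx, 3" alternative.
constexpr double one_shl() { return chain_latency(1, kShlLatency); }

static_assert(three_adds() < one_shl(), "the add chain is shorter under this model");
```

Under this model the three adds take 1.5 cycles against 4 for the shift, which agrees in direction with the "~2X faster" estimate earlier in the thread. Real throughput also depends on the surrounding instructions and on the specific CPU, which this sketch ignores.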
 
