VC6 speed vs VS2003


YAZ

Hello,
I have a DLL that does some number crunching. Performance (execution
speed) is very important in my application. I use VC6 to compile the
DLL.
A friend of mine told me that the optimizations in Visual Studio 2003
.NET were improved and that I should gain performance if I switch to
VS 2003 or the Intel compiler. So I sent him the project and he
returned a DLL compiled with VS 2003.
Result: the DLL compiled with VS 2003 is slower than the VC6 one. For
example, creating a matrix of 2,000,000 random double values takes
230 ms in the VC6 DLL and 330 ms in the VS 2003 one.
Then he compiled the project with the Intel C++ compiler 9.0 inside
VS 2003 with maximum optimization (the /fast flag), and the result is
300 ms.

Finally, he tried the Intel C++ compiler inside VC6 and the result was
200 ms with the /O2 flag. But the DLL raises an external exception when
compiled with the /fast flag in VC6.

So:
Compiler      Flag    Execution time
VC6           /O2     230 ms
VS2003        /O2     330 ms
icl+VS2003    /fast   300 ms
icl+VC6       /O2     200 ms
icl+VC6       /fast   external exception

I can't believe that the compiler in VS 2003 is so bad. Maybe the link
libraries are different? I use libc.lib (/ML) in VC6.

Do you have any experience with switching from VC6 to VS2003 ?
 

Carl Daniel [VC++ MVP]

YAZ said:
[...]

What's the complete set of compiler and linker options that you're using,
for both VC6 and VC7.1?

In general, the VC7.1 optimizer is better, but there may be other options
that are masking that effect (and the Intel compiler will dutifully
reproduce that behavior as well, since it really tries to be
switch-compatible with the corresponding version of VC).

Finally, make sure you're measuring the performance of something that
matters. Creating a large number of random numbers is probably not a
scenario that you're really interested in. You may in fact be comparing the
performance of two different random number generators. If the random number
generator in VC7.1 was improved to produce better randomness, for example,
it's very likely that it would also be slower. If you really want to
measure random number generation, make sure that you're using your own
generator (not rand(), etc.), so that you know you're running the same code
in both environments.
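Along the lines of that advice, here is a minimal sketch of a self-contained generator, so that both builds execute identical code. It assumes a C++11 compiler for <cstdint> (VC6 and VS2003 predate it, so there you would substitute unsigned __int64); XorShift64 and randomMatrix are hypothetical names, and xorshift64 is just one well-known choice, not the routine YAZ actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: Marsaglia's xorshift64 generator. It is plain integer
// arithmetic, so every compiler runs the same algorithm, unlike rand(),
// whose implementation belongs to each C runtime library.
struct XorShift64 {
    std::uint64_t state;

    // A zero state would stay zero forever, so replace it with a fixed seed.
    explicit XorShift64(std::uint64_t seed)
        : state(seed != 0 ? seed : 88172645463325252ULL) {}

    std::uint64_t next() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    // Uniform double in [0, 1) built from the top 53 bits of the output.
    double nextDouble() {
        return static_cast<double>(next() >> 11) * (1.0 / 9007199254740992.0); // 2^53
    }
};

// Hypothetical helper: fill a rows x cols matrix (one contiguous block)
// with random values; the same seed always yields the same matrix.
std::vector<double> randomMatrix(std::size_t rows, std::size_t cols, std::uint64_t seed) {
    // Guard against rows * cols overflowing size_t.
    assert(rows == 0 || cols <= static_cast<std::size_t>(-1) / rows);
    std::vector<double> m(rows * cols);
    XorShift64 rng(seed);
    for (std::size_t i = 0; i < m.size(); ++i)
        m[i] = rng.nextDouble();
    return m;
}
```

Calling randomMatrix(1000, 2000, 42) produces the same 2,000,000 values in every build, so any timing difference comes from the compiler and runtime rather than from a different random number generator.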

-cd
 

Andre Kaufmann

YAZ said:
Hello,
I have a DLL that does some number crunching. Performance (execution
speed) is very important in my application. I use VC6 to compile the
DLL.
A friend of mine told me that the optimizations in Visual Studio 2003
.NET were improved and that I should gain performance if I switch to
VS 2003 or the Intel compiler. So I sent him the project and he
returned a DLL compiled with VS 2003.

Your friend is right. VS 2003 and the Intel compiler are the best
compilers available for Windows in terms of the speed of the generated code.

Optimization has been improved, which of course doesn't mean that your
programs are always faster. But IMHO they shouldn't be that much slower ;-)
Result: the DLL compiled with VS 2003 is slower than the VC6 one. For
example, creating a matrix of 2,000,000 random double values takes
230 ms in the VC6 DLL and 330 ms in the VS 2003 one.

Is that pure memory allocation and filling, or are some floating-point
calculations involved?
Then he compiled the project with the Intel C++ compiler 9.0 inside
VS 2003 with maximum optimization (the /fast flag), and the result is
300 ms.

Finally, he tried the Intel C++ compiler inside VC6 and the result was
200 ms with the /O2 flag. But the DLL raises an external exception when
compiled with the /fast flag in VC6.

Curious. Why should the same compiler (Intel 9.0) emit different code
with the same settings in different IDEs? Your project settings (VC6 /
VC 2003) must be different. Or did you compile from the command line?
[...]
Do you have any experience with switching from VC6 to VS2003 ?

I had a look at the generated code ;-) and VC7 (VS2003) optimizes much
better and generates faster code. If you use link-time code generation
(global optimization and whole program optimization), the compiler can
use more CPU registers, eliminate function calls, and even inline
code defined in other .cpp files.

Perhaps you should have a look at the following settings (VC 2003)

a) /O1 (minimize size) is often faster

b) RTTI / Exception handling may reduce speed.

c) Security checks (/GS) may slightly reduce performance
(enabled in release builds)

d) Runtime library (Single / Multithread) has a significant impact
on the speed of e.g. memory allocations and other runtime library
functions

e) Are you using P4 code optimizations? The generated code still
runs on P3 processors, but it will run slower on non-P4 CPUs.

f) VC 2003 doesn't use an optimized heap for small memory blocks
on Win2K and later. I don't know if this was already the case
for VC6.0. AFAIK, VC 2003 uses the Windows heap instead, to prevent
heap fragmentation.
Allocating many small memory blocks might be slower in VC 2003.
It's one of the reasons why people complain that VC 2003 generates
slower code than other compilers. Still, if the speed difference is
due to memory allocation, I would recommend rewriting the program.

You can, however, enable the small-block heap with

_set_sbh_threshold

(but I wouldn't recommend it in general).


If your program allocates many small memory blocks and the speed
changes significantly when you call the function _set_sbh_threshold,
I suppose the speed difference is due to changes in the runtime library
and its heap management.
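One way to check this is a small timing harness run unchanged in both builds, first as is and then with _set_sbh_threshold called at startup. The sketch below is only an illustration under that assumption (timeSmallAllocations and g_benchmarkSink are hypothetical names); it times many small new[]/delete[] pairs with std::clock, which is coarse but part of the standard C and C++ libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical global: writing the checksum here keeps the compiler from
// optimizing the allocations away.
volatile unsigned long g_benchmarkSink = 0;

// Hypothetical micro-benchmark: allocate `count` blocks of `blockBytes`
// bytes with new[], touch each one, then free them all.
// Returns the elapsed CPU time in seconds.
double timeSmallAllocations(std::size_t count, std::size_t blockBytes) {
    assert(blockBytes >= 1); // each block's first byte is written below
    std::vector<char*> blocks(count); // pointer array allocated outside the timed region

    std::clock_t start = std::clock();
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = new char[blockBytes];
        blocks[i][0] = static_cast<char>(i % 128); // touch the block
    }
    unsigned long checksum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        checksum += static_cast<unsigned char>(blocks[i][0]);
        delete[] blocks[i];
    }
    std::clock_t stop = std::clock();

    g_benchmarkSink = checksum;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

For example, compare timeSmallAllocations(1000000, 32) between the VC6 and VS2003 DLLs; if the gap mostly disappears once the small-block heap is enabled, the heap is the likely cause.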

If not, then you should check the project settings. If they are the
same, or if you are using the compiler from the command line, I can't
explain the speed difference and I would recommend using the
profiler that comes with the Intel compiler to find out which part of the
program is causing the speed difference.

Hope that helps.
Andre
 

YAZ

He imported the .dsw project that I sent him, so the compiler settings
are the same.

My settings in VC6 are:
/ML /O2 /Ob2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_MBCS" /D
"_USRDLL" /D "KERNEL_EXPORTS" /FR"Release/" /Fp"Release/Kernel.pch"
/YX /Fo"Release/" /Fd"Release/" /FD /c

for the compiler, and:
kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib
advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib
odbccp32.lib ws2_32.lib /nologo /dll /incremental:no
/pdb:"Release/Kernel.pdb" /machine:I386 /out:"Release/Kernel.dll"
/implib:"Release/Kernel.lib"

for the linker.

The settings in VS2003 are:
/O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL" /D
"KERNEL_EXPORTS" /D "_WINDLL" /FD /ML /Fp".\Release/Kernel.pch"
/Fo".\Release/" /Fd".\Release/" /FR".\Release/" /W1 /nologo /c

and
/OUT:".\Release/Kernel.dll" /INCREMENTAL:NO /NOLOGO /DLL
/PDB:".\Release/Kernel.pdb" /IMPLIB:".\Release/Kernel.lib" /MACHINE:X86
odbc32.lib odbccp32.lib ws2_32.lib kernel32.lib user32.lib gdi32.lib
winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib uuid.lib odbc32.lib odbccp32.lib

The settings for the Intel compiler are:
/c /fast /Og /Oi /Oy /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL"
/D "KERNEL_EXPORTS" /D "_WINDLL" /FD /ML /Fo".\Release/"
/FR".\Release/" /W1 /nologo /Gd /Qparallel

and
/DLL odbc32.lib odbccp32.lib ws2_32.lib kernel32.lib user32.lib
gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib uuid.lib odbc32.lib odbccp32.lib
/OUT:".\Release/Kernel.dll" /INCREMENTAL:NO /NOLOGO /TLBID:1
/IMPLIB:".\Release/Kernel.lib" /MACHINE:X86

Of course, the speed difference is global and not limited to random
number generation. BTW, I use my own routine to generate the random
numbers.
Loops, for example, are slower. What is strange in the random number
generation is that the VS2003 and Intel builds are slower at allocating
2,000,000 doubles and copying them to another matrix. These operations
are based on operator new and memcpy, which proves that the link library
(libc.lib) is faster in VC6 in my case.
The other functions in the library give the same performance when
applied to small matrices. But if memory allocation and copying are
slower in VS2003 and Intel C++, the whole application will be slower.
After all, the application's job is manipulating objects: creating,
copying and deleting them.
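A minimal sketch for isolating the allocate-and-copy cost described above, assuming plain new[] and memcpy as in the post. timeAllocAndCopy and g_copySink are hypothetical names, and the fill loop stands in for the real random values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ctime>

// Hypothetical global: storing the last copied value keeps the copy observable.
volatile double g_copySink = 0.0;

// Hypothetical micro-benchmark: `repeats` times, allocate a source and a
// destination array of n doubles with new[], fill the source, copy it with
// memcpy, and free both. Returns the elapsed CPU time in seconds.
double timeAllocAndCopy(std::size_t n, int repeats) {
    assert(n >= 1 && repeats >= 0);
    std::clock_t start = std::clock();
    for (int r = 0; r < repeats; ++r) {
        double* src = new double[n];
        double* dst = new double[n];
        for (std::size_t i = 0; i < n; ++i)
            src[i] = static_cast<double>(i);
        std::memcpy(dst, src, n * sizeof(double));
        g_copySink = dst[n - 1];
        delete[] dst;
        delete[] src;
    }
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Timing timeAllocAndCopy(2000000, 10) in each build separates the allocation and copy cost from the rest of the matrix code. The fill loop is inside the timed region, so the result also reflects each compiler's code generation for a simple loop.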

Another thing is that Intel C++ inside VC6 generates external
exceptions!!! My friend says that he has Intel C++ 9.0 installed in
VS2003, but it's accessible from VC6 too. He suspects that the Intel
compiler is mixing libraries from VC6 and VS2003. When he tried /O3
instead of /fast, the DLL works fine, but it is a bit slower than with
/O2.

My impression is that the whole performance difference is due to the link
libraries (libc.lib), which are better in VC6.

One last thing: my CPU is an Athlon 64 at 2 GHz with 1 GB of RAM, and
maybe the Intel and VS2003 DLLs would be faster on a P4.
 

Andre Kaufmann

YAZ said:
He imported the .dsw project that I sent him, so the compiler settings
are the same.

OK, then both settings (VC7 and VC6) should be the same, and the speed
difference is probably caused by the runtime library, as both you and I
assumed in our last posts.
[...]
The settings in VS2003 are:
/O2 /D "WIN32" /D "NDEBUG" /D "_WINDOWS" /D "_USRDLL" /D
"KERNEL_EXPORTS" /D "_WINDLL" /FD /ML /Fp".\Release/Kernel.pch"
/Fo".\Release/" /Fd".\Release/" /FR".\Release/" /W1 /nologo /c

Everything OK.

You could add /Og /GL for global optimization (objects compiled with
/GL must be linked with /LTCG).

This might result in better code when handling doubles (alignment).

and

/G7 for P4 / Athlon-specific optimization
(/G7 won't be supported anymore in VC8).
[...]

generation is that the VS2003 and Intel builds are slower at allocating
2,000,000 doubles and copying them to another matrix. These operations
are based on operator new and memcpy, which proves that the link library
(libc.lib) is faster in VC6 in my case.
The other functions in the library give the same performance when
applied to small matrices. But if memory allocation and copying are
slower in VS2003 and Intel C++, the whole application will be slower.
After all, the application's job is manipulating objects: creating,
copying and deleting them.

You could add the following lines to your main(...) or DllMain(...).

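#include <malloc.h>   // _set_sbh_threshold
#include <tchar.h>    // _tmain, _TCHAR

// Blocks up to this many bytes are served by the small-block heap.
// You may try higher or lower values.
const size_t UseSmallBlockHeapForMaxBytes = 256;

int _tmain(int argc, _TCHAR* argv[])
{
    // Call this once at startup.
    _set_sbh_threshold(UseSmallBlockHeapForMaxBytes);

    // ... rest of your program ...
    return 0;
}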

This should boost your program's speed when compiled with VC 2003.
But I wouldn't recommend this setting in general, because heap
fragmentation might reduce the speed of your programs.
Memory allocations are time-consuming, so you should reduce their number
by using, e.g., Boost's multi_array or specialized math libraries (Matrix
Template Library, etc.). This should also boost the speed of your
VC6-compiled program, especially if you are using dense or sparse matrices.
Every single memory allocation needs additional memory for internal heap
management and alignment.
This is why many STL implementations use their own memory allocators,
which request only large blocks of memory from the program's heap.
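As a sketch of the single-allocation idea (not the interface of Boost's multi_array or MTL), a dense matrix can keep all of its elements in one std::vector<double>. The class name Matrix is hypothetical; column-major order is chosen here because LAPACK routines expect column-major storage, with the number of rows as the leading dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix: all rows*cols elements live in one
// std::vector<double>, so constructing or copying a matrix costs a single
// heap allocation instead of one per row (as with vector<vector<double> >).
class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols, double init = 0.0)
        : rows_(rows), cols_(cols), data_(rows * cols, init) {}

    // Column-major indexing: element (r, c) sits at c * rows + r.
    double& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[c * rows_ + r];
    }
    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return data_[c * rows_ + r];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Raw column-major storage, e.g. for LAPACK-style routines
    // (leading dimension = rows()). Returns a null pointer when empty.
    double* data() { return data_.empty() ? 0 : &data_[0]; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};
```

Creating, copying or destroying such a Matrix costs one heap operation regardless of its size, which is exactly the kind of saving described above.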

[...]
My impression is that the whole performance difference is due to the link
libraries (libc.lib), which are better in VC6.

At first sight, yes. But your programs will suffer from heap
fragmentation and cache misses due to the increased memory overhead.
And if you have multiple DLLs, each using a statically linked runtime
library, the overhead of using both the internal heap and the Windows
heap should be even more significant. This is why a dynamically linked
runtime library sometimes results in faster programs than a statically
linked one.

One last thing: my CPU is an Athlon 64 at 2 GHz with 1 GB of RAM, and
maybe the Intel and VS2003 DLLs would be faster on a P4.

I don't think so, at least not significantly. Normally the Athlon
should be "faster" at executing single-threaded programs.

Andre
 

YAZ

Thank you André

In fact, the _set_sbh_threshold(UseSmallBlockHeapForMaxBytes = 256) call
completely changed the performance of my application. Now I have nearly
the same speed as with VC6. Some functions are slower and others are
faster.
The speed difference is less than 2% in favor of VC6.
I'll try playing with the UseSmallBlockHeapForMaxBytes value in order
to optimize.
Thank you all
 

YAZ

BTW,
I use the ATLAS and LAPACK libraries for the matrix computations.
But I was surprised when you said:
"This is why using a dynamically linked runtime
library sometimes results in faster programs compared to the statically
linked ones."

Because I never got better performance with the DLL runtimes, and even
less with the multithreaded versions of the runtimes.
 

Andre Kaufmann

YAZ said:
BTW,
I use the ATLAS and LAPACK libraries for the matrix computations.
But I was surprised when you said
[...]
Because I never got better performance with the DLL runtimes, and even
less with the multithreaded versions of the runtimes.

Sorry for the late answer - back from vacation.

It depends on your application. The dynamic runtime library isn't
generally faster than the statically linked one.

But if your application uses many DLLs that each use the static
runtime library, memory usage will be much higher than if all the
DLLs used the dynamic runtime library. And if your application uses
a lot of memory, the speed difference might be significant.

André
 
