Performance: VC++ 33% slower than Builder 5 on LineTo() API call??

  • Thread starter Gustavo L. Fabro

Gustavo L. Fabro

Greetings!

Getting straight to the point, here are the results
of my experiment. I've included my comments and questions
after them.

The timing:
(The total time is the sum of each line's drawing time.
Time is measured in clock ticks (from the QueryPerformanceCounter() API).
The counter frequency (QueryPerformanceFrequency()) on my
machine is 3579545 ticks per second.)
------------------------------------------
Visual Studio .NET 2003
Total time: 717230
Average: 89.8165625

Borland Builder 5:
Total Time: 482151
Average: 61.0975

The code (for the DLL):
------------------------------------------
DrawDll.h
#ifdef DRAWDLL_EXPORTS
#define DRAWDLL_API __declspec(dllexport)
#else
#define DRAWDLL_API __declspec(dllimport)
#endif

class DRAWDLL_API CDrawDll {
public:
    CDrawDll(void);
    void MyMethod(HWND handle);
};

DrawDll.cpp
#include "stdafx.h"
#include "DrawDll.h"
#include <stdio.h>

BOOL APIENTRY DllMain( HANDLE hModule,
                       DWORD ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    return TRUE;
}

void CDrawDll::MyMethod(HWND handle)
{
    HDC hDC = ::GetDC(handle);

    LARGE_INTEGER m_StartCounter; // start time
    LARGE_INTEGER m_EndCounter;   // finish time
    __int64 m_ElapsedTime;
    char buff2[255];

    //For 800 different positions
    for(int x=0;x<800;x++)
    {
        //10 times on each position
        for(int rep=0;rep<10;rep++)
        {
            QueryPerformanceCounter (&m_StartCounter);

            ::MoveToEx(hDC, x,0, NULL);
            ::LineTo(hDC, 50+x,50);

            QueryPerformanceCounter (&m_EndCounter);

            //get and store finishing time and calc elapsed time(ticks)
            m_ElapsedTime = (m_EndCounter.QuadPart -
                             m_StartCounter.QuadPart );
            sprintf(buff2, "%d\n", m_ElapsedTime);
            OutputDebugString(buff2);
        }
    }

    ReleaseDC(handle, hDC);
}

CDrawDll::CDrawDll()
{
    return;
}

The explanation
---------------------------------------------------------

While porting a big project to Visual Studio, I started facing some
performance problems. Things were much slower in the VS-compiled
executables. I went to study what exactly was happening and reached some
startling (from my point of view) conclusions.

I made a DLL and compiled it with both Builder and Visual C++ .NET, with all
optimizations enabled for both compilers. The DLL has a class with only
one function, which gets a handle to a DC and draws 8,000 lines on it.

I made 2 executables that run the function from the DLL (compiled with
both compilers too).

The results were astonishing, for me, and I'd like an explanation for
what is happening.

I've run the test several times and the results are always of the
same magnitude. How can that be, if the only thing I'm doing is MoveToEx()
and LineTo() API calls?

It's something simple! I'm not touching the disk, loading large
chunks of memory, using managed extensions (I created a 'pure' Win32
project under VS), or doing anything else that could affect performance.
Only 2 simple API calls.

Is Visual C++ really THAT MUCH slower?

I have the complete code and compiled executables here and will be glad
to send them to anyone who wants to replicate the test. As far as this
posting is concerned:

- Are VS-compiled DLLs and/or executables inherently slower than, for
instance, those built with Builder 5?
- Why does a simple API call take that much longer? Isn't it the same API
call? Shouldn't the call itself be fast, with most of the time spent inside
the API function?
- Is there anything I can do/try to make the code run faster?

We would like to migrate other big projects to Visual C++, but now we're
having second thoughts!

Waiting for a light,

Gustavo L. Fabro
 

Jonathan Allen

Could you give me an example of when I would want to call that function 8000
times in a tight loop?

Jonathan

 

Phil Frisbie, Jr.

Gustavo said:
Greetings!

Getting straight to the point, here are the results
of my experiment. I've included my comments and questions
after them.

The timing:
(The total time is the sum of each line's drawing time.
Time is measured in clock ticks (from the QueryPerformanceCounter() API).
The counter frequency (QueryPerformanceFrequency()) on my
machine is 3579545 ticks per second.)
------------------------------------------
Visual Studio .NET 2003
Total time: 717230
Average: 89.8165625

Borland Builder 5:
Total Time: 482151
Average: 61.0975

Did you look at the assembly produced by both compilers?

But artificial tests like this rarely mean anything in real applications...
 

Tim Robinson

Gustavo said:
In a CAD application, for instance. The function is only called once;
what it does is draw 8000 lines.

A regular CAD drawing needs many more than 8,000 lines to be drawn
completely.

In a CAD program you wouldn't be calling QueryPerformanceCounter or
OutputDebugString for each line:

for(int rep=0;rep<10;rep++)
{
    QueryPerformanceCounter (&m_StartCounter);

    ::MoveToEx(hDC, x,0, NULL);
    ::LineTo(hDC, 50+x,50);

    QueryPerformanceCounter (&m_EndCounter);

    //get and store finishing time and calc elapsed time(ticks)
    m_ElapsedTime = (m_EndCounter.QuadPart -
                     m_StartCounter.QuadPart );
    sprintf(buff2, "%d\n", m_ElapsedTime);
    OutputDebugString(buff2);
}

QPC and ODS both have high overhead: each involves a transition to kernel
mode and back; QPC samples the hardware timer; and when a debugger is
attached, ODS effectively triggers an exception, which causes a full
context switch to the debugger and back.

Move the benchmarking code to the outside of the outer loop -- time the
whole operation -- and then compare results.
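
As a rough illustration of timing the whole operation rather than each line, here is a minimal sketch. It is not the code anyone in the thread ran: `TimeWholeRun` is a hypothetical helper, and `std::chrono::steady_clock` stands in for QueryPerformanceCounter so the example stays portable. The drawing calls are replaced by a caller-supplied `work` callback.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the thread): runs work(i) for i = 0..count-1
// and returns the elapsed time of the whole loop in nanoseconds.
// The clock is read once before and once after the loop, so the timer
// overhead is paid once per run instead of once per line, and no tracing
// happens inside the timed region.
template <typename Work>
std::int64_t TimeWholeRun(int count, Work work)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i)
    {
        work(i);   // in the thread's test this would be MoveToEx + LineTo
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

A caller would pass the number of lines and a lambda that draws line `i`, then report the single returned value once the loop has finished.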
 

Carl Daniel [VC++ MVP]

Gustavo said:
Greetings!

Getting straight to the point, here are the results
of my experiment. I've included my comments and questions
after them.

What command-line options are you using for the VC++ build? If you're
compiling it as managed code (/clr) I wouldn't be surprised to see a 33%
speed reduction since you'd be transitioning in and out of managed code
several times per iteration of your timing loop.

-cd
 

Gustavo L. Fabro

Jonathan Allen said:
Could you give me an example of when I would want to call that function
8000 times in a tight loop?

In a CAD application, for instance. The function is only called once;
what it does is draw 8000 lines.

A regular CAD drawing needs many more than 8,000 lines to be drawn
completely.
 

Gustavo L. Fabro

Phil Frisbie, Jr. said:
Did you look at the assembly produced by both compilers?

Unfortunately, at this point I don't know assembly language well enough
to draw anything concrete from the two listings. If it helps, I can
disassemble both DLLs and post the code here!

Phil Frisbie, Jr. said:
But artificial tests like this rarely mean anything in real
applications...

I'm afraid this is not the case here. This test is just a replication of
something I have seen in practice. Our CAD application took 5 times longer
to draw the same file on the screen with the VS-compiled version than with
our Builder-compiled one.

As the application itself has lots of classes, DLLs, and we used managed and
unmanaged C++ in the middle, I tried to first check out if the API calls
themselves, after all the processing (of elements, points positions, etc)
were running at the same speed. In case that was true, I would then try to
focus on the managed/unmanaged approach, DLL interaction and other factors.

But when I saw that even the API drawing calls themselves were taking
longer, I got intrigued... And decided to do this test! Hence the results
here demonstrated and the question: Is it *really* like this?

Fabro
 

Ken Hagan

Gustavo said:
I've run the test several times and the results are always of the
same magnitude. How can that be, if the only thing I'm doing is
MoveTo() and LineTo() API calls?

It's something simple! I'm not playing with the disk, loading large
chunks of memory, using managed extensions (I created a 'pure' Win32
project under VS), anything that could relate with performance.
Only 2 simple API calls.

Is Visual C++ really THAT MUCH slower?

Well, first off, as you state yourself, the portion of the code that
your compilers generated is only a fraction of the full overhead.
The work of the API calls is done by the same (OS) code in both cases,
so the results are not comparing VC with Builder 5.

Having said that, if your original application shows the same behaviour
then it is quite reasonable for you to ask for an explanation!

Try comparing the interval "m_ElapsedTime" with a millisecond or so
(about 3580 ticks in your case). If the APIs take more than that, then you've
suffered a context switch and you should ignore that time interval.
If this is the case, the question ceases to be "why is VC slower"
but becomes "why is VC provoking context switches" and the answer
probably lies in the run-time library rather than the compiler's
code generation.
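
The filtering idea above could be sketched roughly as follows. Both function names are hypothetical, and treating "one millisecond" as the cut-off is only the rule of thumb suggested here, not a documented property of the scheduler.

```cpp
#include <cstdint>
#include <vector>

// Number of counter ticks in one millisecond for a given counter frequency
// (ticks per second), rounded to the nearest tick.
inline std::int64_t TicksPerMillisecond(std::int64_t frequency)
{
    return (frequency + 500) / 1000;
}

// Hypothetical helper: keeps only the samples at or below the threshold.
// Samples above it are assumed to include a context switch and are dropped.
inline std::vector<std::int64_t> DropSuspectSamples(
    const std::vector<std::int64_t>& samples, std::int64_t thresholdTicks)
{
    std::vector<std::int64_t> kept;
    for (std::int64_t sample : samples)
    {
        if (sample <= thresholdTicks)
        {
            kept.push_back(sample);
        }
    }
    return kept;
}
```

With the counter frequency quoted in the original post (3579545 ticks per second), `TicksPerMillisecond` returns 3580, so per-line samples in the range of 60 to 90 ticks would be kept and anything in the thousands would be discarded.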

A similar test is to use an array of "m_ElapsedTime[10]" and collect
ten iterations of the inner loop between tracing. Yet another test
might be to insert Sleep(0) at the start of the inner loop. If either
of these affects the results, your problem is context switching.

Another variation is to use the RDTSC instruction...

__declspec(naked) __int64 Rdtsc()
{
__asm rdtsc;
__asm ret;
}

This is a higher resolution timer with much lower calling overheads.

Oh, and lastly, %d isn't the correct format for an __int64 variable.
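
For reference, a portable way to print a 64-bit tick count is the `PRId64` macro from `<cinttypes>`, shown in the sketch below. `FormatTicks` is a hypothetical helper. The Visual C++ runtime of that era also documented an `I64` size prefix (`%I64d`) for `__int64` values.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit tick count as decimal text followed by a newline.
// PRId64 expands to the correct printf length modifier for std::int64_t
// on the current platform, so the full 64-bit value is read and printed.
inline std::string FormatTicks(std::int64_t ticks)
{
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%" PRId64 "\n", ticks);
    return std::string(buffer);
}
```

Passing a 64-bit argument to plain `%d` is undefined behaviour; small values often appear to print correctly on little-endian 32-bit builds, which is why the mistake is easy to miss.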
 

Guest

Just out of curiosity: how come you are not using hardware to render your
lines (i.e. DirectX)? If performance is an issue, using DirectX to draw lines
would give you a seemingly infinite boost in performance compared to
rendering your lines in software (even anti-aliased lines).

Just curious.

cheers,
Luis Miguel Huapaya
 

Gustavo L. Fabro

Greetings!

Thanks everybody for the comments. I've run the
tests again, and indeed it was my mistake.

As Tim suggested, with the profiling code moved outside the
outer loop (eliminating a great deal of overhead) and the
call triggered from a better place (I was using menus, but
XP's menu "fading effect" was interfering with the timing),
the results I got matched what I expected in the first place:

Visual Studio:
269996

Borland:
270206

I can now go through the code and try to find what is really
affecting the speed (I had stopped when I saw this).

To answer Carl: I wasn't compiling as managed code. I will
do so later on in my quest to see what is happening in our program.

And regarding Ken's reply, I appreciate the tips for reducing the
interference of context switches in the profile, for better timing.
I will use them next time I find myself in a similar situation!

Fabro
 

Gustavo L. Fabro

Luis Miguel Huapaya said:
Just out of curiosity: how come you are not using hardware to render your
lines (i.e. DirectX)? If performance is an issue, using DirectX to draw
lines would give you a seemingly infinite boost in performance compared to
rendering your lines in software (even anti-aliased lines).

Hmmm... As far as I know (or knew), GDI calls are accelerated by hardware
when available (and when the "Hardware Acceleration" slider in Control Panel,
Video, Configuration, Advanced, Problem Solving is not all the way to the
left).

The profiling for the problem in this post, for instance, was done on
a computer with the "Hardware Acceleration" slider a couple of notches
to the left. It took an average of 270206 ticks to draw 8,000 lines. With
hardware acceleration fully enabled, the time dropped to 62123.

Am I wrong? If DirectX could give an infinite boost in performance
I would definitely be interested!

Fabro
 

Derrick Coetzee [MSFT]

Gustavo said:
for(int rep=0;rep<10;rep++)
{
QueryPerformanceCounter (&m_StartCounter);

::MoveToEx(hDC, x,0, NULL);
::LineTo(hDC, 50+x,50);

QueryPerformanceCounter (&m_EndCounter);

/* snip */
}
------------------------------------------
Visual Studio .NET 2003
Total time: 717230
Average: 89.8165625

Borland Builder 5:
Total Time: 482151
Average: 61.0975

I can't explain the speed difference in your experiment, but I can say
that if you are writing or porting an application for which drawing
primitives are a critical bottleneck, such as the CAD applications you
cite in a later post, you should seriously consider using a
performance-oriented graphics library such as DirectX or OpenGL, which
takes advantage of modern hardware. The GDI is, quite frankly, rarely up
to the task of serious graphics work, just simple business graphics such
as bar charts and buttons.
 

Tom Widmer

Gustavo said:
Hmmm... As far as I know (or knew), GDI calls are accelerated by hardware
when available (and when the "Hardware Acceleration" slider in Control Panel,
Video, Configuration, Advanced, Problem Solving is not all the way to the
left).

The profiling for the problem in this post, for instance, was done on
a computer with the "Hardware Acceleration" slider a couple of notches
to the left. It took an average of 270206 ticks to draw 8,000 lines. With
hardware acceleration fully enabled, the time dropped to 62123.

Am I wrong? If DirectX could give an infinite boost in performance
I would definitely be interested!

Yes, such 2D calls generally are accelerated by hardware. However, for
the ultimate in speed, you should perhaps render using 3D hardware,
although this requires a lot of extra programming work. This would be
appropriate for a CAD application though, perhaps.

Tom
 
