mixed-mode DLL: register corruption occurs when managed C++ calls unmanaged C++

Adam McKee

We are using Visual Studio .NET 2003 in our project with .NET Framework
1.1. One of our libraries is a mixed-mode DLL assembly consisting of
one managed C++ library and several unmanaged C++ libraries. We are
using managed C++ as a bridge between managed .NET code and unmanaged
C++ code, which I'm sure is a fairly common practice. The managed C++
library is compiled with /CLR whereas all other libraries are compiled
without /CLR because they are strictly native C++ code.

Now let me take you on a journey.

A class like the following is written in unmanaged C++:

////////////////////////////////////////////////////////////////

/// UnmanagedClass.h

class UnmanagedClass
{
public:
    virtual bool returnsFalse() const;
};

/// UnmanagedClass.cpp

bool
UnmanagedClass::returnsFalse() const
{
    return false;
}

////////////////////////////////////////////////////////////////

Then consider the following managed code:

////////////////////////////////////////////////////////////////

/// ManagedClass.cpp

void
ManagedClass::foo()
{
    UnmanagedClass* obj = new UnmanagedClass();
    bool res = obj->returnsFalse();
    assert(!res); // <= this assertion fails
}

////////////////////////////////////////////////////////////////

I have been able to consistently reproduce this problem by running an
example very similar to the above *before* executing any of my other
code. At first I could hardly believe what I was seeing -- I actually
watched a function return "false", but the caller got back "true"! Not
good at all. Looking at the machine code, the "returnsFalse" function
is as follows:

07778D90 push ebp
07778D91 mov ebp,esp
07778D93 sub esp,8
07778D96 mov dword ptr [ebp-8],0CCCCCCCCh
07778D9D mov dword ptr [ebp-4],0CCCCCCCCh
07778DA4 mov dword ptr [ebp-4],ecx
07778DA7 xor al,al
07778DA9 mov esp,ebp
07778DAB pop ebp
07778DAC ret 4

////////////////////////////////////////////////////////////////

The boolean result of the function is returned in the "AL" register,
which does equal 0 at the time when the "ret" instruction is executed.
Here are all the register values at the time "ret" is executed:

////////////////////////////////////////////////////////////////

EAX = 07751400 EBX = 0012F594 ECX = 07455B48 EDX = 07455B98
ESI = 001532A8 EDI = 00000000 EIP = 0775147E ESP = 0012F58C
EBP = 0012F5E4 EFL = 00000246

////////////////////////////////////////////////////////////////
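
A side note on reading the dump above: EAX as a whole is nonzero (07751400), but a bool result lives only in AL, the low byte, which is 00 here. A minimal sketch of that reading in plain C++ follows; the helper names lowByte and boolFromEax are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Extract AL (bits 0-7) from a 32-bit EAX value.
inline std::uint8_t lowByte(std::uint32_t eax)
{
    return static_cast<std::uint8_t>(eax & 0xFFu);
}

// Interpret EAX the way the caller of a bool-returning function does:
// only AL is meaningful, the upper 24 bits are leftovers.
inline bool boolFromEax(std::uint32_t eax)
{
    return lowByte(eax) != 0;
}

// Check against the EAX value from the register dump taken at "ret".
inline void checkRegisterDump()
{
    assert(lowByte(0x07751400u) == 0x00);  // AL is 00
    assert(!boolFromEax(0x07751400u));     // so the callee returned false
}
```

So the callee really did hand back "false", even though EAX looks like garbage at first glance.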

The caller looks like this:

00000038 push 170470h
0000003d call F8D18698
00000042 movzx ebx,al <=== control returns here (expecting result in "AL")
00000045 movzx eax,bl
00000048 mov dword ptr [ebp-0Ch],eax
0000004b nop
0000004c mov ebx,dword ptr [ebp-0Ch]
0000004f jmp 00000051
00000051 mov eax,ebx
00000053 pop ebx
00000054 pop esi
00000055 pop edi
00000056 mov esp,ebp
00000058 pop ebp
00000059 ret

////////////////////////////////////////////////////////////////

When control returns to the caller (at 00000042h), the registers
have changed to the following:

////////////////////////////////////////////////////////////////

EAX = 00000001 EBX = 07751470 ECX = 00000004 EDX = 00000000
ESI = 07455B48 EDI = 07455B98 EBP = 0012F63C ESP = 0012F624

////////////////////////////////////////////////////////////////

As you can see, the "AL" register has been set to "01". Actually the
entire EAX register has been set to "00000001". Since the "ret"
instruction should never modify the contents of the AL register, I had
another look at the point in time when the "ret 4" is executed. The
call stack reveals the likely culprit:

////////////////////////////////////////////////////////////////
libjss.dll!cse::ResourceRequirement::equals(
    const cse::ResourceRequirement & rhs={...}) Line 27 C++
mscorwks.dll!7925c098()
^^^^^^^^^^^^^^^^^^^ probable culprit ^^^^^^^^^^^^^^^^^^^^^^^^^^^
libjss.dll!utl::equals<cse::ResourceRequirement>(
    cse::ResourceRequirement* lhs = 0x07455b48,
    cse::ResourceRequirement* rhs = 0x07455b98) Line 134 + 0x15 bytes C++
libjss.dll!jss.ResourceRequirement.Equals(
    System.Object rhs = 0x04ce6ba0) Line 33 + 0xc bytes C++
demo_cs.exe!demo_cs.demo_cs.evilBug() Line 143 + 0x9 bytes C#
demo_cs.exe!demo_cs.demo_cs.Main(string[] args = {Length=5}) Line 317 C#

////////////////////////////////////////////////////////////////

The register mangling must occur in mscorwks.dll, which I assume is
responsible for managing calls from managed into unmanaged code. I
found that if I compile the native C++ projects with the /CLR switch,
the problem goes away (because there is no longer a transition
into unmanaged code). However, compiling with "/CLR" has significant
disadvantages, including:

1. can't use pre-compiled headers -- they make a BIG difference in
compilation performance
2. debugging of unmanaged code becomes very difficult

I should say that I have followed the instructions for mixed-mode DLLs
that are described in a KB article called "Converting Managed Extensions
for C++ Projects from Pure Intermediate Language to Mixed Mode".

The debugging problem when using "/CLR" _also_ implicates mscorwks.dll.
Basically when I step into unmanaged C++ code, the debugger says "there
is no source code for the current location". Looking at the call stack,
I can see that mscorwks.dll!7925c098 is on TOP of the call stack, and
the native C++ code that I am trying to debug is next down on the call
stack. The end result is that debugging the native C++ code (that was
compiled with "/CLR") is a very painful exercise.

So I have to choose between:

a) having my code execute as I wrote it ("/CLR" switch turned ON for
native C++)
-OR-
b) being able to debug the code ("/CLR" switch turned OFF for native
C++)

In other words, if I want to be able to debug the code, then I must
accept the fact that the code will not always execute as I wrote it.
Not a good situation to be in! I don't see what I can do now except
try to get help.

Please ... help ....

-Adam McKee // (e-mail address removed)
 
Carl Daniel [VC++ MVP]

I believe this is a known bug, although when I looked recently I was unable
to find any document on the MSDN web site that seems relevant to this bug.

Another alternative is to return BOOL (a typedef for int) instead of bool.
IIRC, this marshalling bug only affects functions that return bool.
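
A rough sketch of what that workaround could look like, assuming the bug really is limited to bool return values as recalled above. The method name returnsFalseAsBOOL is hypothetical, and BOOL is declared locally as int (matching the Windows headers) so the snippet builds on its own. The wrapper is defined out of line because, in the real project, its body would have to live in the natively compiled UnmanagedClass.cpp so that the bool never crosses the managed/unmanaged boundary.

```cpp
#include <cassert>

// Assumption: same definition as in the Windows headers, repeated here
// so the sketch does not need <windows.h>.
typedef int BOOL;

/// UnmanagedClass.h
class UnmanagedClass
{
public:
    virtual ~UnmanagedClass() {}
    virtual bool returnsFalse() const;

    // Hypothetical int-returning variant for callers compiled with /CLR.
    BOOL returnsFalseAsBOOL() const;
};

/// UnmanagedClass.cpp (compiled WITHOUT /CLR)
bool UnmanagedClass::returnsFalse() const
{
    return false;
}

BOOL UnmanagedClass::returnsFalseAsBOOL() const
{
    // Widen to a full int on the native side, so all of EAX is defined
    // when control returns to the managed caller.
    return returnsFalse() ? 1 : 0;
}

// What the managed caller would do instead of calling returnsFalse().
inline void checkWorkaround()
{
    UnmanagedClass obj;
    BOOL res = obj.returnsFalseAsBOOL();
    assert(res == 0);
}
```

Managed code would then call obj->returnsFalseAsBOOL() and compare the result against 0 instead of calling returnsFalse() directly.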

-cd


 
Carl Daniel [VC++ MVP]

Jochen said:
Yes, indeed this is a known bug for over 1 year!!!
I reported it 12.04.2002.

Has it been reported through a formal channel, or just on the ngs? This is
definitely something that would be nice to get fixed!

-cd
 
Carl Daniel [VC++ MVP]

Carl said:
Has it been reported through a formal channel, or just on the ngs?
This is definitely something that would be nice to get fixed!

Yes, this bug is known to the VC team and AFAIK will be fixed in a future
release of the product.

-cd
 
