Deadlock in multithreaded C# app

Guest

I am currently debugging a deadlock in a multithreaded C# application. It
makes lots of calls to legacy unmanaged code. The application runs on Windows
Server 2003 and uses the server version of the CLR. The application runs on
.NET 1.1 SP1.

When the deadlock happens the process is using 0 CPU.

I was able to collect crash dumps and have been trying to make sense out of
them using windbg.

!threads shows that PreEmptive GC is disabled for both thread 20 and thread 26.
There are other threads running (all have PreEmptive GC enabled), but I have
left them out to keep this shorter.

    ID     ThreadOBJ  State     PreEmptive GC GC Alloc Context      Domain     Lock Count APT Exception
20  0x1658 0x16e083d0 0x1800222 Disabled      0x01275044:0x01276a64 0x001498b8 1          MTA (Threadpool Worker) System.NullReferenceException
26  0x958  0x414274c0 0x1800220 Disabled      0x0d327830:0x0d327844 0x001498b8 2          MTA (GC) (Threadpool Worker) System.NullReferenceException
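
With many threads in a dump, picking out the rows with PreEmptive GC disabled by eye gets tedious. As a rough sketch only, something like the following could filter them. The class and method names are made up for illustration. It assumes each thread's row has been joined onto a single line and that PreEmptive GC is the fifth whitespace-separated column, as in the two rows above. It uses generics, so it is meant to run as a separate helper on a newer framework, not inside the .NET 1.1 application.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of SOS or WinDbg.
public static class ThreadsOutput
{
    // Returns the debugger thread index (first column) of every row whose
    // fifth whitespace-separated column is "Disabled".
    public static List<string> PreemptiveDisabledThreads(IEnumerable<string> rows)
    {
        var result = new List<string>();
        foreach (string row in rows)
        {
            string[] cols = row.Split(new[] { ' ', '\t' },
                                      StringSplitOptions.RemoveEmptyEntries);
            // Skip headers and short lines; assumed column 4 is PreEmptive GC.
            if (cols.Length > 4 && cols[4] == "Disabled")
            {
                result.Add(cols[0]);
            }
        }
        return result;
    }
}
```

Fed the two rows shown above, this would return the thread indexes 20 and 26.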


Here is the clrstack for thread 26. ttGetRteVarVariation is unmanaged code.
It looks like it had a problem and an exception (probably a
NullReferenceException) was thrown. There are GCFrames in the middle, so I
began to wonder if the GC is involved.

ESP        EIP
0x40bae194 0x7c82ed54 [FRAME: HelperMethodFrame]
0x40bae1c0 0x799ef4fc [DEFAULT] [hasThis] Void System.Diagnostics.StackTrace.CaptureStackTrace(I4,Boolean,Class System.Threading.Thread,Class System.Exception)
0x40bae1e0 0x799f11fc [DEFAULT] [hasThis] Void System.Diagnostics.StackTrace..ctor(Class System.Exception,Boolean)
0x40bae1ec 0x799ef21f [DEFAULT] String System.Environment.GetStackTrace(Class System.Exception)
0x40bae248 0x16e32c14 [FRAME: InterceptorFrame] [DEFAULT] String System.Environment.GetStackTrace(Class System.Exception)
0x40bae258 0x799f1b09 [DEFAULT] [hasThis] String System.Exception.get_StackTrace()
0x40bae260 0x799f1a32 [DEFAULT] [hasThis] String System.Exception.ToString()
0x40bae28c 0x16e32b54 [FRAME: InterceptorFrame] [DEFAULT] [hasThis] String System.Exception.ToString()
0x40bae29c 0x799fc50e [DEFAULT] [hasThis] String System.Exception.InternalToString()
0x40bae578 0x791b33cc [FRAME: GCFrame]
0x40baefa4 0x791b33cc [FRAME: GCFrame]
0x40baf670 0x791b33cc [FRAME: NDirectMethodFrameGeneric] [DEFAULT] String <namespaceremoved>.ttGetRteVarVariation(I4)
0x40baf680 0x40278218 [DEFAULT] [hasThis] String <namespaceremoved>.GetRteVarVariation(I4)
  at [+0x8] [+0x0]
0x40baf684 0x402781df [DEFAULT] [hasThis] String <namespaceremoved>.get_Variation()
  at [+0x1f] [+0x0]
<more stack trace after this but not that interesting>

Here is the native stack for thread 26. It looks like the thread was trying to
allocate space (probably for the exception) and forced a call to the GC.

ChildEBP RetAddr
40badb8c 7c822114 ntdll!KiFastSystemCallRet
40badb90 77e6711b ntdll!NtWaitForMultipleObjects+0xc
40badc38 77e61075 kernel32!WaitForMultipleObjectsEx+0x11a
40badc54 791f2ff8 kernel32!WaitForMultipleObjects+0x18
40bade8c 791f311c mscorsvr!Thread::SysSuspendForGC+0x248
40badea4 791f337d mscorsvr!GCHeap::SuspendEE+0xcf
40badec0 791f5775 mscorsvr!GCHeap::GarbageCollectGeneration+0x13f
40badef0 791bd4ae mscorsvr!gc_heap::allocate_more_space+0x181
40bae118 791b5411 mscorsvr!GCHeap::Alloc+0x7b
40bae12c 791b93c3 mscorsvr!Alloc+0x3a
40bae14c 791b9411 mscorsvr!FastAllocateObject+0x25
40bae1b8 799ef4fc mscorsvr!JIT_NewFast+0x2c
WARNING: Stack unwind information not available. Following frames may be wrong.
40bae1cc 799f11fc mscorlib_79990000+0x5f4fc
40bae1e0 799ef21f mscorlib_79990000+0x611fc
40bae204 791d8504 mscorlib_79990000+0x5f21f
40bae218 00000000 mscorsvr!DoDeclarativeSecurity+0x1a

I went looking for the garbage collection thread to see what it was up to.
I found 4 GC threads. They all look the same. Here is the native stack for
one of the GC threads:

2 Id: 2394.1910 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr  Args to Child
WARNING: Stack unwind information not available. Following frames may be wrong.
00daff70 77e6ba12 000000bc ffffffff 00000000 ntdll!KiFastSystemCallRet
00daff84 791f3206 000000bc ffffffff 00000000 kernel32!WaitForSingleObject+0x12
00daffac 79224ac2 00000000 00daffec 77e66063 mscorsvr!gc_heap::gc_thread_function+0x2f
00daffb8 77e66063 00150448 00000000 00000000 mscorsvr!gc_heap::gc_thread_stub+0x1e
00daffec 00000000 79224aa4 00150448 00000000 kernel32!GetModuleFileNameA+0xeb


It looks like the garbage collector is waiting for something to happen. From
what I have read, the garbage collector has to suspend all threads before
doing its work, and it cannot suspend a thread while PreEmptive GC is
disabled for that thread. PreEmptive GC can be disabled if the thread is
calling unmanaged code. It looks like I have a classic deadlock:

1) The garbage collector is waiting for all threads to suspend.
2) A thread is waiting for the garbage collector to finish.
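
To make the suspected cycle concrete, here is a minimal sketch of it as a wait-for graph, where a deadlock corresponds to a cycle. The class, method, and node names are invented for illustration. The second edge (a thread waiting on the GC) is my assumption: the excerpts above show thread 26 blocked in SysSuspendForGC, but thread 20's stack is not included, so the dump shown here does not by itself prove the cycle. The code uses generics, so it is a standalone illustration and not something that runs on .NET 1.1.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of "who waits for whom"; not a CLR or SOS API.
public static class WaitForGraph
{
    // Follows the "waits for" edges from start and returns true if a node
    // is visited twice, meaning the chain of waits loops back on itself.
    public static bool HasCycle(IDictionary<string, string> waitsFor, string start)
    {
        var seen = new HashSet<string>();
        string current = start;
        while (current != null)
        {
            if (!seen.Add(current))
            {
                return true; // already visited: the waits form a cycle
            }
            string next;
            current = waitsFor.TryGetValue(current, out next) ? next : null;
        }
        return false; // the chain ended at a node that waits for nothing
    }

    // Builds the two waits described in the numbered list above.
    public static bool SuspectedScenario()
    {
        var waitsFor = new Dictionary<string, string>
        {
            { "gc-suspension", "worker-thread" }, // 1) GC waits for the thread to suspend
            { "worker-thread", "gc-suspension" }  // 2) thread waits for the GC (assumed)
        };
        return HasCycle(waitsFor, "gc-suspension");
    }
}
```

If the second edge turns out not to hold (for example, if the thread with PreEmptive GC disabled is blocked on something else entirely), the graph has no cycle and the hang has a different cause.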

I believe I know why the code is throwing NullReferenceException and I am
fixing it.

My worry is that this is a general problem (a GC being forced while
PreEmptive GC is disabled) that could still happen even if I get rid of the
NullReferenceExceptions.

Does anybody have any ideas or thoughts here? Have I found a bug/feature in
the .NET framework?
 
