Garbage Collection Issues in long-running services

Guest

I'm having issues with garbage collection in my long-running service
process. If you could review the notes below and point me in the right
direction, it would be a great help. Pointers to any documents that would help
me control the GC would also be appreciated.

The .NET GC does not clean up the memory of our service process unless it is
forced to by another process that hogs memory.
· GC Algorithm - This is an issue because, if the GC is not forced into doing
this, it does not aggressively clean up until the amount of available physical
memory is very small. I understand why it doesn’t want to force cleanup, for
the sake of processor efficiency, but it forces applications into conditions
that are not acceptable. It would be nice to be able to hint an upper limit
for an application that helps the GC be more aggressive when required.
· Race Condition – The GC algorithm causes race conditions because the GC is
not coordinated with our application, and our application throws
OutOfMemoryExceptions. We have very good exception handling that guards
against unhandled exceptions in the main thread and thread pool threads. The
problem is that we use memory to log these issues, so the handlers are
probably throwing another OutOfMemoryException. We can handle this, but the
point is that the OutOfMemoryException causes a transaction to fail.
· Force GC To Collect - I wrote a Memory Hogger application which, when run,
reduces the memory used by the service application from 800MB to 3MB, so this
proves that the GC will clean up the memory when it truly needs to. I noticed
that the Memory Usage shown in the Task Manager Processes tab did not add up
to the total amount of memory in use, which suggests that the inactive
applications probably had their heaps moved to swap. One concern here was
that when the Memory Hogger application was terminated, our service
application reclaimed half of its memory even though we were not processing
transactions. Maybe the GC just moved it to swap.
· GC.Collect – Using this is not recommended within the application. Even
when it is used, it doesn’t make the GC any more aggressive, so I agree that
there is no reason to use it. It would be nice to have a way to make the GC
more aggressive.
· CLR Profiler - I’ve used the CLR Profiler to determine what memory is not
being collected: mostly strings and byte arrays. Our service handles TCP
connections asynchronously, and we reuse the same byte array when receiving
data for the next asynchronous read. We store a reference to the current
IAsyncResult for both the asynchronous send and receive requests. The state
object holds the byte array, which we are currently not setting to null.
There isn’t a way of canceling the async requests, so we might have to
explicitly set this to null. I will try this to see if it makes a difference.
As for the strings, I’m not sure where the problem is.
 
Willy Denoyette [MVP]

Did you ever check the GC performance counters (using perfmon) to see
whether this is true? The GC is more aggressive than you imagine.
Just check the Gen 0, Gen 1 and Gen 2 performance counters and you will see
that the collector runs. Your problem is that you are holding references to
objects (probably large objects) which you never release, so there is little
or nothing to collect.
By starting another process that allocates memory, your service's working set
gets trimmed by the OS; that's all that happens.

Willy.
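The same check can also be made from inside the process. Below is a minimal sketch, assuming .NET 2.0 or later. GC.CollectionCount does not exist in 1.1, so on 1.1 the perfmon ".NET CLR Memory" counters are the way to go. The sketch snapshots the per-generation collection counts and the managed heap size around some allocation-heavy work. If the counts keep climbing while the heap stays large, the collector is running and the memory is still referenced.

```csharp
using System;

// Snapshot of how many times each generation has been collected so far.
// GC.CollectionCount requires .NET 2.0 or later.
int[] SnapshotCollections() => new[]
{
    GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2)
};

int[] before = SnapshotCollections();
long heapBefore = GC.GetTotalMemory(forceFullCollection: false);

// Simulated workload: lots of short-lived 4K buffers that become garbage at once.
for (int i = 0; i < 10_000; i++)
{
    _ = new byte[4096];
}

// Demo only: force a full collection so at least one gen2 run is recorded.
GC.Collect();

int[] after = SnapshotCollections();
long heapAfter = GC.GetTotalMemory(forceFullCollection: false);

Console.WriteLine($"gen0 +{after[0] - before[0]}, gen1 +{after[1] - before[1]}, gen2 +{after[2] - before[2]}");
Console.WriteLine($"managed heap: {heapBefore:N0} -> {heapAfter:N0} bytes");
```

In a real service you would log these numbers periodically rather than calling GC.Collect(). The forced collection here only makes the demo deterministic.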
 
Andreas Mueller

Willy said:
Did you ever check the GC performance counters (using perfmon) to see
whether this is true? The GC is more aggressive than you imagine.
Just check the Gen 0, Gen 1 and Gen 2 performance counters and you will see
that the collector runs. Your problem is that you are holding references to
objects (probably large objects) which you never release, so there is little
or nothing to collect.
By starting another process that allocates memory, your service's working set
gets trimmed by the OS; that's all that happens.

Willy.

It could also be large objects on the Large Object Heap, which does not get
compacted. I found this article a good starting point:
http://tinyurl.com/3e9n3
HTH,
Andy
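Here is a sketch of the Large Object Heap point. It assumes the commonly documented threshold of 85,000 bytes and shows that the 4K buffers discussed in this thread stay on the small object heap, while a 1 MB buffer would go to the LOH. LikelyOnLoh is a hypothetical helper. The GCSettings.LargeObjectHeapCompactionMode part needs .NET Framework 4.5.1 or later and is not available in 1.1.

```csharp
using System;
using System.Runtime;

// Objects of roughly 85,000 bytes or more are allocated on the LOH,
// which is collected together with gen2 but is not compacted by default.
const int LohThresholdBytes = 85_000;

// Hypothetical helper; approximate, because it ignores the array header.
bool LikelyOnLoh(int byteArrayLength) => byteArrayLength >= LohThresholdBytes;

bool receiveBufferOnLoh = LikelyOnLoh(4 * 1024);  // the 4K buffers in this thread
bool bigBufferOnLoh = LikelyOnLoh(1024 * 1024);   // a 1 MB buffer

// .NET Framework 4.5.1+ / .NET Core only: ask the next blocking gen2
// collection to compact the LOH once.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();

// Documented to be reset to Default after that compaction.
var modeAfterCollect = GCSettings.LargeObjectHeapCompactionMode;

Console.WriteLine($"{receiveBufferOnLoh} {bigBufferOnLoh} {modeAfterCollect}");
```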
 
Guest

Thanks for the reply.

I agree that I must be holding on to some references; I will try to isolate
them. If the GC were really being aggressive, then why, after everything came
back to steady state, did the memory in use drop by half (800MB to 350MB)?

I did run some tests on the classes that would give me the most problems and
it appeared that the GC was smart enough to solve the issue. Here is a brief
synopsis:

TCPServer - Has a hashtable holding a class that encapsulates the client
socket (CS), including a reference to a class that handles the application
processing (AP). The first thing AP does when it handles an OnConnect is
store a reference to CS so that it can use it to send data back to the client.

State Object (SO) - This holds a reference to CS and a reference to a 4K byte
array. This is passed on the async send/receive calls.

Connection Termination - After the connection is closed (abnormally or
normally), the CS is removed from the hashtable. I've done tests showing that
once this is done, the GC is smart enough to deal with the circular reference
between CS and AP, so I don't explicitly set each reference to null.
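Below is a minimal sketch of the kind of test described above. The real CS and AP classes are not shown in the thread, so plain dictionaries stand in for them here. A WeakReference tracks the CS object. It survives a collection while the connection table holds it. It is collected once it is removed, even though CS and AP reference each other and CS still points back at the long-lived server table.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Long-lived "TCPServer" state: the table of open connections.
var connections = new Dictionary<int, object>();

WeakReference csRef = OpenConnection(connections, id: 1);
GC.Collect();
bool aliveWhileInTable = csRef.IsAlive;   // rooted via the connections table

connections.Remove(1);                    // connection closed
GC.Collect();
bool aliveAfterRemove = csRef.IsAlive;    // the cycle alone does not root it

Console.WriteLine($"{aliveWhileInTable} {aliveAfterRemove}");

// NoInlining keeps the CS/AP locals out of the caller's stack frame.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference OpenConnection(Dictionary<int, object> table, int id)
{
    // Fields are modeled as dictionary entries to keep the sketch type-free.
    var cs = new Dictionary<string, object>();   // wrapper socket ("CS")
    var ap = new Dictionary<string, object>();   // application processor ("AP")
    cs["server"] = table;    // back-reference to the long-lived server
    cs["processor"] = ap;
    ap["socket"] = cs;       // cycle: AP -> CS
    table[id] = cs;
    return new WeakReference(cs);
}
```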

Outstanding Async Send / Receive - Depending on how the connection is
terminated, there is a chance that there can be an outstanding async send and
receive. I store a reference to the most recent IAsyncResult for both the
send and the receive in the CS object. When closing, I check whether these
are still outstanding and, if so, I retrieve the SO object and set its CS
reference to null. I'm not setting the buffer to null, and I probably should.
I don't expect this condition to be normal, but I do my best to handle it.
Unfortunately, you can't just stop an asynchronous request, and I'm sure the
OS keeps a structure for each one (e.g. the same structure that holds the
state object).

I will continue to try and isolate. Any ideas based on the above?
 
Goran Sliskovic

Larry Herbinaux said:
Thanks for the reply.

I would agree that I must be holding on to some references, I will try to
isolate. If the GC was really being aggressive, then why after everything
came back to steady state did the memory in use reduce by half? (800MB -
350MB)
....

There is a commercial product, .NET Memory Profiler (http://memprofiler.com/),
which will show you live objects, as well as who is holding references to
them and where they were allocated. There is an evaluation version too. It
was a great help to me when I was in a similar situation.

Regards,
Goran
 
Guest

Thanks, I will definitely try this out. In comparison with the CLR Profiler,
do you think it is easier to isolate the issues?

The CLR Profiler is helpful in many respects, and the documentation is very
good. I was using it to look at Objects Allocated By Age, which was helpful
for seeing which objects were being held too long. I was using it during a
stress test, and it was far too much information to isolate the problem. I
should probably go back, send a single transaction and wait for a period of
time.

Anyhow thanks again.
 
Justin Creasy

Have you tried just calling the GC yourself? I was creating an
application that used screen capturing pretty heavily, and it managed
to fill memory up in no time. I simply inserted

GC.Collect();

in my outer loop and just like that, memory usage flattened. Hope this
helps.
 
Guest

Yeah, I tried this, but it only reduced memory by a small amount, much like
the GC was already doing. I think I probably have some references that aren't
being removed. I just need to isolate them.

Thanks
 
Willy Denoyette [MVP]

Larry,

The problem with sockets is the unmanaged memory buffers used by the
underlying Winsock library. Whenever you need to transfer a block of data
(probably a byte array) to and from the unmanaged socket send/receive
functions, these arrays must be pinned. Now, when using asynchronous sockets,
these arrays may stay pinned for a relatively long period of time.
The problem with pinned objects (the arrays) is that they prevent the GC from
compacting the heap (pinned objects can't move!).
The result is that, depending on the number of buffers (your State object?),
you might end up with a highly fragmented GC heap that keeps growing, not
because the GC cannot collect but because it cannot compact.
So what you need to do is review your design and try to find out how many
pinned objects you have in the youngest generations (Gen0 and Gen1; these
have the most impact). Try to prevent pinning of young objects by
(pre)allocating your buffers very early in the process and using the same
buffers for the whole run of the process. That way they will end up in Gen2
after a few collector runs and stay there, where they don't hurt the GC that
much.

Hope this helps.
Willy.
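Below is a minimal sketch of the preallocation Willy describes. It assumes a fixed maximum number of connections: one large byte array is allocated at startup and carved into 4K slices that are reused for the life of the process. The sizes and the Rent/Return helpers are hypothetical, and the sketch is single-threaded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sizes: 256 connections x 4K receive buffer = 1 MB slab.
const int BufferSize = 4096;
const int BufferCount = 256;

// One allocation made at startup and kept for the life of the process.
// At 1 MB it lands on the LOH (logically gen2, not compacted by default),
// so pinning slices of it does not block gen0/gen1 compaction.
byte[] slab = new byte[BufferSize * BufferCount];
var freeOffsets = new Stack<int>();
for (int i = BufferCount - 1; i >= 0; i--)
{
    freeOffsets.Push(i * BufferSize);
}

// Hands out one 4K slice for a connection (not thread-safe: add a lock
// in a real server, since receives complete on thread pool threads).
ArraySegment<byte> Rent()
{
    if (freeOffsets.Count == 0)
    {
        throw new InvalidOperationException("Buffer pool exhausted.");
    }
    return new ArraySegment<byte>(slab, freeOffsets.Pop(), BufferSize);
}

// Gives a slice back when its connection closes, clearing stale data first.
void Return(ArraySegment<byte> segment)
{
    Array.Clear(slab, segment.Offset, segment.Count);
    freeOffsets.Push(segment.Offset);
}

ArraySegment<byte> first = Rent();
ArraySegment<byte> second = Rent();
Console.WriteLine($"first at {first.Offset}, second at {second.Offset}");
Return(first);
```

Each slice can be passed to the existing Socket.BeginReceive(byte[], int, int, SocketFlags, AsyncCallback, object) overload as segment.Array, segment.Offset and segment.Count, so the receive code changes very little. The trade-off is a fixed upper bound on the number of buffers handed out at once.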
 
Bill Gregg

Willy,
Can you please explain further what you mean by saying "the arrays
are pinned"?

Thanks,
Bill Gregg
 
Willy Denoyette [MVP]

Bill Gregg said:
Willy,
Can you please explain further what you mean by saying "the arrays
are pinned"?

Thanks,
Bill Gregg

Bill,

Whenever you pass an instance of a reference type (say an array) to a native
function, you must prevent the GC from moving the object by "pinning" the
object while executing the function.
The PInvoke layer does this automatically for object references passed as
arguments, but you can also pin the object yourself by passing a pointer,
like this:

byte[] buf = new byte[4096];
unsafe // requires compiling with /unsafe
{
    fixed (byte* bufferPtr = buf) // pin buf
    {
        CallSomeNativeFunction(bufferPtr, ...);
    } // unpin buf
}

Here the byte[] referred to by buf gets pinned for the duration of the call
(the scope of the fixed block).
Now, this is exactly what the Socket class methods are doing whenever you
send/receive data to or from a socket. They pin the (send/receive data)
buffer (byte[]) before they call into the native Winsock library (calling
send/recv and WSASend/WSARecv).

Hope this helps.

Willy.
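The fixed statement only pins for the duration of a block. An outstanding asynchronous request needs its buffer pinned until the request completes, and in managed code that corresponds to explicit pinning with GCHandle. Below is a sketch of that pattern; how the Socket class pins internally may differ between framework versions.

```csharp
using System;
using System.Runtime.InteropServices;

byte[] buf = new byte[4096];

// Pin the array explicitly: the GC cannot move it until the handle is freed,
// no matter how many calls or callbacks happen in between.
GCHandle handle = GCHandle.Alloc(buf, GCHandleType.Pinned);
IntPtr address = handle.AddrOfPinnedObject();
bool pinnedWhileOutstanding = handle.IsAllocated;
try
{
    // Native code (or an outstanding overlapped I/O request) may use `address` here.
}
finally
{
    handle.Free();   // unpin: from now on the GC may move or collect buf
}
bool pinnedAfterFree = handle.IsAllocated;

Console.WriteLine($"{address != IntPtr.Zero} {pinnedWhileOutstanding} {pinnedAfterFree}");
```

Every buffer pinned this way for a long time is an immovable object in the heap, which is why Willy recommends keeping long-pinned buffers out of gen0 and gen1.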
 
Goran Sliskovic

Larry Herbinaux said:
Thanks, I will definitely try this out. In comparison with the CLR
Profiler,
do you think it is easier to isolate the issues?
....

I haven't used the CLR Profiler, but .NET Memory Profiler does a great job.
You can create snapshots of all objects, compare two snapshots, show objects
that are unreferenced but were never disposed (great for finding missing
calls to Dispose()), and show live objects, their allocation stacks and who
references them. It also has real-time info (as the program runs) where you
can trace the number of allocated objects and quickly identify classes whose
instance counts grow continuously.

Regards,
Goran
 
Guest

Willy, thanks for pointing this out; it does make sense concerning the pinned
arrays.

In our case, the client should be fulfilling the requests immediately. If
that is the case, the pinned array should get unpinned, and the heap
compacted, fairly soon, since the async call completes quickly, right?

The receive case is handled quite well: I create a 4K buffer and reuse it
for the life of the connection. In most cases I don't expect an outstanding
BeginReceive at the end of the connection, but it could happen. I keep track
of the IAsyncResults, and if one is outstanding, I grab the state object and
remove the reference to my wrapper socket class, but I don't remove the
reference to the buffer. I should probably do this, right? The OS still has
a reference to the request object; can I just call EndReceive if it is
outstanding (not completed)? The one issue I see with this is that if, for
some reason, the remote (client) socket hasn't closed (e.g. the physical route
is broken), then EndReceive would block, according to MSDN (framework 1.1).
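On the EndReceive question: the MSDN documentation for Socket.BeginReceive says that a pending BeginReceive is canceled by calling Close. Closing the socket completes the request, the callback runs, and EndReceive then typically throws an ObjectDisposedException or a SocketException. That makes the callback a natural place to drop the buffer and state references. Below is a self-contained sketch over a loopback connection; the variable names are hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Loopback connection so the sketch is self-contained.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;
var client = new TcpClient();
client.Connect(IPAddress.Loopback, port);
Socket server = listener.AcceptSocket();

byte[]? receiveBuffer = new byte[4096];   // the per-connection 4K buffer
var completed = new ManualResetEventSlim(false);
Exception? endReceiveError = null;

// Start a receive that the client never satisfies (it sends nothing).
server.BeginReceive(receiveBuffer, 0, receiveBuffer.Length, SocketFlags.None, ar =>
{
    try
    {
        server.EndReceive(ar);   // throws once the socket has been closed
    }
    catch (Exception ex) when (ex is SocketException || ex is ObjectDisposedException)
    {
        endReceiveError = ex;
    }
    finally
    {
        receiveBuffer = null;    // drop our reference to the buffer/state
        completed.Set();
    }
}, null);

// Closing the socket cancels the pending BeginReceive and runs the callback.
server.Close();
bool callbackRan = completed.Wait(TimeSpan.FromSeconds(5));

client.Close();
listener.Stop();
Console.WriteLine($"callback ran: {callbackRan}, error: {endReceiveError?.GetType().Name}");
```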

The send case is a little more difficult, and I would appreciate your advice
on it. I also have to support SSL, and there is a maximum of 16K that can be
transferred at a time. I try to make life easy for the application
programmer by buffering the data if the amount is larger than 16K. My wrapper
socket class references a class that contains an ArrayList of byte arrays,
so there could be more than one. Should I go to the extreme of having a
permanent send buffer and copying the data into it prior to sending, so that
the OS only ever references one byte array for the length of the connection?
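Below is a minimal sketch of the "permanent send buffer" option, assuming the 16K per-write SSL limit from the post. One buffer is allocated per connection, and larger payloads are copied through it one chunk at a time. The sendChunk delegate is a hypothetical stand-in for the real (SSL) send. With asynchronous sends, the next chunk may only be copied after EndSend for the previous chunk has completed; this synchronous sketch does not show that chaining.

```csharp
using System;
using System.Collections.Generic;

const int MaxChunk = 16 * 1024;           // per-write SSL limit quoted in the post
byte[] sendBuffer = new byte[MaxChunk];   // allocated once per connection

// Copies the payload through the permanent buffer one chunk at a time and
// hands each filled region to the (hypothetical) send routine.
void SendAll(byte[] payload, Action<byte[], int> sendChunk)
{
    for (int offset = 0; offset < payload.Length; offset += MaxChunk)
    {
        int count = Math.Min(MaxChunk, payload.Length - offset);
        Buffer.BlockCopy(payload, offset, sendBuffer, 0, count);
        sendChunk(sendBuffer, count);   // must finish before the buffer is reused
    }
}

var chunkSizes = new List<int>();
SendAll(new byte[40_000], (buf, count) => chunkSizes.Add(count));
Console.WriteLine(string.Join(", ", chunkSizes)); // 16384, 16384, 7232
```

The design choice is one long-lived buffer per connection instead of a growing ArrayList of payload arrays. At most one array is ever handed to the socket, at the cost of an extra copy per chunk.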

One other note (sorry about the long set of questions): I used the CLR
Profiler with a simple application that mimics the references of my
TCPServer. My one worry was that my wrapper socket class has a reference to
the TCPServer object (which is obviously long-lived), and I was worried that
this would keep the GC from collecting it. The TCPServer also has a hashtable
that references the wrapper socket object. I verified that once I removed the
wrapper socket from the hashtable, the reference back to the TCPServer did
not keep it alive. I could explicitly set this to null, but I feel it is
cleaner not to have to do this throughout the code. What's your opinion?
 
Guest

I think the really cool thing is comparing snapshots, because this weeds out
a ton of stuff that you have to rifle through in the CLR Profiler.

Thanks again.
 
Willy Denoyette [MVP]

Larry,

There is something in this whole story which isn't clear to me:
If your buffers are that small (4K-16K), why does your working set grow to
over 800MB?
Is it because they are much larger, because there are a lot of them, because
you are allocating many more objects in the process, or because you are
pegging the NIC buffers (NDIS or Winsock) with too many requests dispatched by
a lot of threads while you aren't able to process the received buffers in a
timely fashion?

Willy.
 
Guest

We process about 13,000 transactions a day. After processing 300K - 400K
worth of transactions, the working set gets up to 800MB, and the garbage
collector collects it back down to at most 620MB and then stays at that
watermark. While processing those 300K - 400K transactions, because there is
plenty of RAM, the process keeps setting higher watermarks.

The issue is that it appears I am holding on to the Wrapper Socket class and
the Application processor class, which has references to other things that we
don't manually remove, because we expect garbage collection to do its job
once the Wrapper Socket class is removed from the hashtable and is no longer
rooted. We also appear to be holding on to the buffers, because the byte[]
count keeps increasing in the CLR Profiler. String and byte[] take up the
majority. The Wrapper Socket and associated classes aren't that big because
they just hold references to other objects.

I think the problem is that the Wrapper Socket is not becoming unrooted,
because once it is, everything else should become unrooted too. As I stated,
the Wrapper Socket class does have a reference back to the TCPServer, which
is long-lived, but I have already modelled that this does not prevent the
garbage collector from doing its job.

The only other explanation that makes sense is that my state object is not
getting freed and is being held by the OS via an asynchronous request. The
last async receive request could be outstanding, but I think I handle this
situation for the Wrapper Socket class. The state object holds the Wrapper
Socket class and the buffer; I'm not explicitly setting the buffer to null.
In most cases with a normal shutdown, I don't expect to have an outstanding
asynchronous receive request, because if I receive 0 bytes (i.e. the client
is closing), I don't issue another asynchronous receive. Again, based on the
CLR Profiler, it looks like the state object might still be rooted.

Someone recommended another memory profiling tool, which I think will help.
It can compare snapshots, so it weeds out a lot of the stuff I don't really
care about. It has other interesting features that might better indicate
where the issue is occurring. This is going to be my next step.

Thanks again for all the help; it really helps to have people to bounce
ideas off of.
 
Guest

I found the main issue!

SciTech's .Net Memory Profiler is awesome. Doing the differences between
the snapshots after performing a single transaction allowed me to uncover the
issue.

The Wrapper Socket class has a Timer whose Callback is defined in the
instance of the Application Protocol Processor (APP) class. Well, I turn
off the timer before closing, but I don't set it to null, so the Timer stays
rooted, and it holds the APP class, which in turn holds everything else.

I haven't written the fix yet, but I'm sure setting the timer to null will
work. It all makes sense; it's just funny how you can mentally filter
something out like this. I'm using the evaluation version of SciTech's tool,
but the check will be in the mail to buy the full version!

Thanks again for all the help.
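Below is a sketch of this failure mode and the fix. It assumes a System.Threading.Timer, since the post does not say which Timer class was used, and a byte array stands in for the APP object graph. A stopped timer that is still referenced keeps its callback alive, along with everything the callback closes over. Disposing the timer and dropping the reference lets that graph be collected.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

WeakReference sessionRef = StartSession(out Timer? idleTimer);

// "Turning off" the timer: it no longer fires, but it still holds its callback.
idleTimer!.Change(Timeout.Infinite, Timeout.Infinite);
GC.Collect();
bool aliveWhileTimerHeld = sessionRef.IsAlive;

// The fix: dispose the timer and drop the reference to it.
idleTimer.Dispose();
idleTimer = null;
GC.Collect();
bool aliveAfterDispose = sessionRef.IsAlive;

Console.WriteLine($"{aliveWhileTimerHeld} {aliveAfterDispose}");

// NoInlining keeps `session` out of the caller's stack frame.
[MethodImpl(MethodImplOptions.NoInlining)]
static WeakReference StartSession(out Timer? timer)
{
    var session = new byte[4096];   // stands in for the APP object graph
    // The callback closes over `session`, just as the real callback is an
    // instance method of the application protocol processor.
    timer = new Timer(_ => Console.WriteLine(session.Length), null,
                      Timeout.Infinite, Timeout.Infinite);
    return new WeakReference(session);
}
```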
 
Guest

I posted this comment ("I found the main issue!") on another sub-thread, but
I want people reading this sub-thread to strongly consider buying SciTech's
Memory Profiler.
 
