[How weird. This post didn't appear to go through the first time...]
I'm curious what situations you've run into that were the fault of the
GC. Is the LOH bug one of them?
This is going to open a big can-o-worms I should really leave unopened,
but...
Both cases involved arrays of objects (not on the LOH) that were being
allocated at the start of an operation. Both cases were running on
multi-core x86 machines, with the GC operating in Server mode.
The code, in essence, looked like:
ProcessTransaction()
{
    PreLoadDatabaseData[] allData = DataLayer.GetAllUserData[...];
    // Lots of complex synchronous operations here.
    // We hit LDAP, AD, WMI, Sockets, Files, and Databases.
    // We really leverage Reflection, Dynamic Method Invocation,
    // and custom attributes.
    // This method is recursive, as Transactions often kick off
    // other transactions.
    // There is also quite a bit of concurrency going
    // on with various shared data structures.
    // allData should go out of scope here.
}
We would see the state data held by this variable stay around forever. I was
able to confirm with both SciTech and SOS that it was not referenced by
anything, yet the data would never be cleaned up.
Much to my surprise, the solution was to add in "allData=null;" to the end
of the method. I've got lots of angry comments in my source code, none of
which are at all suitable for publication!
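For anyone wondering where that line went, here is a minimal sketch of the workaround. The PreLoadDatabaseData and DataLayer stubs, the parameterless GetAllUserData() and the array size are placeholders I made up so the snippet compiles; the real method body is far bigger and is elided.

```csharp
using System;

// Placeholder type standing in for the real preloaded record (assumption).
public class PreLoadDatabaseData { }

// Placeholder data layer; the real call and its arguments are not shown here.
public static class DataLayer
{
    public static PreLoadDatabaseData[] GetAllUserData()
    {
        return new PreLoadDatabaseData[1000];
    }
}

public static class Transactions
{
    public static void ProcessTransaction()
    {
        PreLoadDatabaseData[] allData = DataLayer.GetAllUserData();

        // ... the long, recursive, concurrent work that reads allData ...
        Console.WriteLine("Loaded {0} slots", allData.Length);

        // The workaround: drop the reference explicitly before returning.
        allData = null;
    }
}
```

Calling Transactions.ProcessTransaction() is all there is to it. Nothing about this sketch reproduces the leak; it only shows where the assignment sits.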
Both times this happened, I spent a significant amount of time trying to
duplicate the problem in a more "unit test" type of case. In both cases, I
failed: even keeping the object hierarchy, all the polymorphic classes, and
such, everything would work great in a unit test scenario. I didn't keep
track of the particular revision of .NET I was using, and I haven't removed
the code and tested on the newest revision of the platform.
In both cases, we discovered the problem via MiniDumps sent to us by
customers who were seeing OOM. And before the obvious question is asked, I
do (very well!) understand the difference between "eligible for GC, but
hasn't been collected yet" and "Why the f*ck doesn't this get collected?".
Disclaimer:
Before people jump on this post as support for "I should always set my
variables to null when I'm done with them", I feel the need to pre-empt that
argument. Please don't do it. Don't set your variables to null! Use the GC
correctly. It's your friend. Don't fight with it. Besides, setting your
stuff to null actually keeps the references around longer in many cases.
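A rough way to convince yourself that the runtime normally doesn't need the null is to watch an object through a WeakReference. This is only a sketch: the LivenessDemo class and its method names are made up for illustration, and what the second check reports depends on build configuration and JIT settings, so treat it as an experiment rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LivenessDemo
{
    // Allocates a buffer and hands back only a weak handle to it.
    // NoInlining keeps the strong local inside this method's own frame.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateAndForget()
    {
        byte[] buffer = new byte[1024];
        return new WeakReference(buffer);
    }

    // The buffer is still used after the collection, so it must survive.
    public static bool SurvivesWhileReferenced()
    {
        byte[] buffer = new byte[1024];
        WeakReference weak = new WeakReference(buffer);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(buffer); // counts as a use, keeping buffer reachable until here
        return alive;
    }

    // No strong reference remains and nothing was ever set to null.
    // Typically reports true, but that is not guaranteed.
    public static bool CollectedAfterLastUse()
    {
        WeakReference weak = AllocateAndForget();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return !weak.IsAlive;
    }
}
```

The point is that reachability follows the last real use of a reference, which is why GC.KeepAlive exists at all. A trailing assignment to null is just one more statement that touches the variable.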
These are two examples out of lots and lots and lots of code, running on
thousands of installations with crazy load and throughput requirements. This
stuff also runs on x86 / x64 / IA64, works well in all cases, and is
frightfully complex. Your code just isn't this complex. Really. If you think
it is, and you think you really need to do this, hire me to do an evaluation
of your code base, and I'll let you know.