Garbage Collector

  • Thread starter Johnny E. Jensen

Ben Voigt [C++ MVP]

I quote you: "to me a memory leak means a very specific thing, which is
Hi Ben,
I guess I just don't see how HeapWalk could be used to identify which
blocks of memory to recover. I see how you can use HeapWalk to
iterate through the heap entries, but how would you know which ones
need to be recovered?

By this new definition of memory leak, .NET can leak memory as well. Say I
have a Form that's disposed but being kept alive by some event handler
somewhere. .NET doesn't provide any better means of cleaning that up and
letting the Form and child controls be reclaimed than HeapWalk provides to
native code.
 
Ben Voigt [C++ MVP]

Willy Denoyette said:
The trouble with C++/CLI is that a number of IDisposable Framework classes
cannot be used with stack allocation semantics.

Try this in C++/CLI:

WindowsIdentity wi = WindowsIdentity::GetCurrent();

it'll fail to compile ...
so, you are forced to resort to heap based semantics, like this..

WindowsIdentity^ wi = WindowsIdentity::GetCurrent();

and as such, to Dispose of this instance by:

delete wi;

right, you are back at square zero, using explicit try/finally, something
C++/CLI's *stack allocation* idiom was meant to arrange for you.

Nope, you still don't need try/finally, just a stack-semantics smart
pointer.
 
Willy Denoyette [MVP]

Ben Voigt said:
Nope, you still don't need try/finally, just a stack-semantics smart
pointer.

Oh, I see: the (yet to be released) auto_handle template. Where is the
advantage here over the C# using idiom? Doesn't this "places extra
unnecessary burden on the programmer"?


auto_handle<WindowsIdentity> myIdentity = WindowsIdentity::GetCurrent();

using(WindowsIdentity myIdentity = WindowsIdentity.GetCurrent())
{

}

Willy.
 
Peter Duniho

Ben said:
Oh, so now even HeapWalk isn't necessary. As long as the internal heap
manager has internal variables that can get at that data, it isn't leaked?

No. That's not what I'm talking about, and IMHO a person who infers
that is intentionally misinterpreting what I'm trying to write just to
make a point.
 
Ben Voigt [C++ MVP]

Oh, I see: the (yet to be released) auto_handle template. Where is the
advantage here over the C# using idiom? Doesn't this "places extra
unnecessary burden on the programmer"?


auto_handle<WindowsIdentity> myIdentity = WindowsIdentity::GetCurrent();

using(WindowsIdentity myIdentity = WindowsIdentity.GetCurrent())
{

}

For such classes, there is little difference.

There may be some advantages, though, when changing ownership. How would
the C# compiler react to trying to change myIdentity inside the using block?
Could you automatically dispose everything, unless the whole block was
successful and you decide to "commit the transaction"?

Sure, the Microsoft auto_handle may not be released yet, but it's trivial to
write your own. This also lets you vary the behavior if you want -- in C#
you can't make something that's almost a using block without making the
syntax far more complicated.
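
To make the "commit the transaction" idea concrete, here is a minimal sketch in standard C++ rather than C++/CLI, so it compiles on its own. ScopedOwner, Resource, and build are hypothetical names for illustration; this is not the Microsoft auto_handle, only an assumption about what a hand-rolled equivalent might look like. The owner disposes of the object on scope exit unless ownership is explicitly released.

```cpp
#include <cassert>

// Hypothetical resource that records whether it has been cleaned up.
struct Resource {
    bool* disposed;
    explicit Resource(bool* flag) : disposed(flag) {}
    ~Resource() { *disposed = true; }
};

// Hypothetical stack-semantics owner: deletes the object when the scope
// ends, unless ownership was handed off with release().
template <typename T>
class ScopedOwner {
public:
    explicit ScopedOwner(T* p) : ptr_(p) {}
    ~ScopedOwner() { delete ptr_; }  // deleting nullptr is a no-op
    ScopedOwner(const ScopedOwner&) = delete;
    ScopedOwner& operator=(const ScopedOwner&) = delete;

    T* get() const { return ptr_; }

    // "Commit": give up ownership so the destructor leaves the object alone.
    T* release() {
        T* p = ptr_;
        ptr_ = nullptr;
        return p;
    }

private:
    T* ptr_;
};

// Hands the resource to the caller only if every step succeeded;
// on the failure path the owner's destructor disposes of it.
Resource* build(bool* flag, bool succeed) {
    ScopedOwner<Resource> owner(new Resource(flag));
    if (!succeed) {
        return nullptr;  // early exit: resource is disposed automatically
    }
    return owner.release();  // success: caller now owns the resource
}
```

Calling build(&flag, false) returns nullptr with the resource already destroyed, while build(&flag, true) returns a live object that the caller must delete. The early-exit path needs no try/finally.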
 
Ben Voigt [C++ MVP]

Peter Duniho said:
No. That's not what I'm talking about, and IMHO a person who infers that
is intentionally misinterpreting what I'm trying to write just to make a
point.

Then could you please explain the difference between that and what you are
talking about here (I quote):

"It seems as though you are trying to say that the list of event handlers is
not accessible by the code, but that's simply false. It may not be
accessible by the code that added the handler, but that's not the same
thing."

Yes, I am trying to make a point.

If you create an object, hand it to a library that isn't smart enough to
destroy it correctly, and then lose your reference to the object, you have a
memory leak. This is equally possible in both native code using C++ or
Win32 heaps and .NET code using the garbage collector.
 
Peter Duniho

Ben said:
Then could you please explain the difference between that and what you are
talking about here (I quote):

"It seems as though you are trying to say that the list of event handlers is
not accessible by the code, but that's simply false. It may not be
accessible by the code that added the handler, but that's not the same
thing."

I have already made the distinction between code that can use the memory
and code that cannot.

In your example, the event handling class still exists, and could in
fact respond to the event, since it's still subscribed to the event.
That's hardly "useless" memory.

Yes, I am trying to make a point.

I wrote "JUST to make a point". I know you're trying to make a point.
Duh. But the ends do not always justify the means. Intentionally
ignoring what I'm writing just for the sake of making a point isn't
useful, or even honorable.

If you create an object, hand it to a library that isn't smart enough to
destroy it correctly, and then lose your reference to the object, you have a
memory leak.

No, you don't. You have buggy code. You have code that mismanages
memory. But that's not the same as a leak.

Frankly, I'm sick of arguing over the semantics of "leak". My original
point was specifically that there are DIFFERING OPINIONS regarding the
use of the term. It is important that when people are communicating to
each other about a specific memory leak or specific class of memory leak
that they agree on the semantics, but otherwise it is pointless to argue
about it.

I don't dispute your right to use "memory leak" in the way that you feel
is best. What I do dispute is your right to assert that I am an idiot
for using it in the way that I feel is best when that's different from
what you feel is best (and whether you use the word "idiot" or not,
that's exactly what you're doing).

Pete
 
John Duval

By this new definition of memory leak, .NET can leak memory as well. Say I
have a Form that's disposed but being kept alive by some event handler
somewhere. .NET doesn't provide any better means of cleaning that up and
letting the Form and child controls be reclaimed than HeapWalk provides to
native code.






Hi Ben,
I'm not sure I follow your example about the Form being kept alive by
an event handler. If you're talking about a circular reference (Form
has reference to child control, child control's event has reference
back to Form's event handler), I'm sure you know that the GC will
handle that and clean up the objects involved once they are no longer
referenced.

I don't think my definition has changed if that's what you're saying
-- I did say originally that with a memory leak there is no reference
to the memory which can be used to recover it. In the case of
HeapWalk, I don't see how it can be used to recover orphaned memory,
mainly because I don't see a way to identify specific memory blocks
that are no longer referenced.

I guess the key point is that (in my mind) there is a difference
between *failing* to release memory and being *unable* to release the
memory because the code no longer has a reference to it. Of course
there are cases where you can write a .NET program that fails to
release memory -- I've run into this a number of times including the
Win32 and Interop cases that Chris Mullins alluded to.

But when I think of the term "memory leak", I think specifically of
the problem that the .NET GC was designed to solve. Are there other
cases where .NET programs can fail to release memory? Sure, but I
don't call them all memory leaks (and neither do a lot of other
devs). It sounds like your definition of "memory leak" is broader.

John
 
Chris Mullins [MVP - C#]

John Duval said:
But when I think of the term "memory leak", I think specifically of
the problem that the .NET GC was designed to solve. Are there other
cases where .NET programs can fail to release memory? Sure, but I
don't call them all memory leaks (and neither do a lot of other
devs). It sounds like your definition of "memory leak" is broader.

In the midst of all this definition madness, I'm going to fall back on a
version of the old Supreme Court definition of pornography: "I can't exactly
define what a memory leak is, but if my program's memory footprint is
growing without bounds, then it's leaking memory."

I've personally only had this be the fault of the GC on two occasions, and
all of the other thousands of times it's been due to concrete bugs in
application code.
 
John Duval

In the midst of all this definition madness, I'm going to fall back on a
version of the old Supreme Court definition of pornography: "I can't exactly
define what a memory leak is, but if my program's memory footprint is
growing without bounds, then it's leaking memory."

I've personally only had this be the fault of the GC on two occasions, and
all of the other thousands of times it's been due to concrete bugs in
application code.

Hi Chris,
Love your definition! :)

I'm curious what situations you've run into that were the fault of the
GC. Is the LOH bug one of them?

John
 
Ben Voigt [C++ MVP]

Hi Ben,
I'm not sure I follow your example about the Form being kept alive by
an event handler. If you're talking about a circular reference (Form
has reference to child control, child control's event has reference
back to Form's event handler), I'm sure you know that the GC will
handle that and clean up the objects involved once they are no longer
referenced.

Well, they are all still referenced. But yes, if the whole bundle becomes
unreachable the GC will take them all out, running finalizers in no
particular order. This is what causes the resurrection issue.

For an example of leaking, take Application.Idle, although any of the
Application events will do.

I don't think my definition has changed if that's what you're saying
-- I did say originally that with a memory leak there is no reference
to the memory which can be used to recover it. In the case of
HeapWalk, I don't see how it can be used to recover orphaned memory,
mainly because I don't see a way to identify specific memory blocks
that are no longer referenced.

I guess the key point is that (in my mind) there is a difference
between *failing* to release memory and being *unable* to release the
memory because the code no longer has a reference to it. Of course
there are cases where you can write a .NET program that fails to
release memory -- I've run into this a number of times including the
Win32 and Interop cases that Chris Mullins alluded to.

But when I think of the term "memory leak", I think specifically of
the problem that the .NET GC was designed to solve. Are there other
cases where .NET programs can fail to release memory? Sure, but I
don't call them all memory leaks (and neither do a lot of other
devs). It sounds like your definition of "memory leak" is broader.

If you register a handler (assume an instance member of a Form) for
Application.Idle and fail to unregister it during Form disposal, you're
going to have one very hard time getting the GC to ever collect that Form.
Yeah, some code somewhere within Application has the list of Delegate
instances that includes a reference to your form, but it isn't any more
accessible to *you* in order to break the cycle than the internal heap
structures are to a native program. So your form, and all the other objects
relying on mass unreachability, get leaked.
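
The same shape can be reproduced outside .NET. Below is a minimal sketch in standard C++, with std::shared_ptr standing in for GC references. IdleEvent, Form, subscribe, and createAndForget are hypothetical stand-ins for illustration, not the Windows Forms types. A long-lived handler list holds a closure that strongly references the form, so the form survives after the creating code has dropped its own reference, and it is only reclaimed once the handler is removed.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a long-lived static event such as Application.Idle.
struct IdleEvent {
    static std::vector<std::function<void()>>& handlers() {
        static std::vector<std::function<void()>> list;
        return list;
    }
};

// Hypothetical stand-in for a Form with an instance-member handler.
struct Form {
    int idleCount = 0;
    void onIdle() { ++idleCount; }
};

// Subscribes the form; the stored closure keeps a strong reference to it.
void subscribe(const std::shared_ptr<Form>& form) {
    IdleEvent::handlers().push_back([form] { form->onIdle(); });
}

// Creates a form, subscribes it, then drops the only reference the caller
// held. Only a weak observer is returned, so the caller cannot keep it alive.
std::weak_ptr<Form> createAndForget() {
    auto form = std::make_shared<Form>();
    subscribe(form);
    return form;  // converts to weak_ptr; the strong local dies here
}
```

After createAndForget() returns, the weak pointer is still not expired and invoking the handlers still reaches the form, which is the behaviour both sides of this argument describe. Clearing IdleEvent::handlers() is what finally lets the form go.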
 
Ben Voigt [C++ MVP]

Frankly, I'm sick of arguing over the semantics of "leak". My original
point was specifically that there are DIFFERING OPINIONS regarding the use
of the term. It is important that when people are communicating to each
other about a specific memory leak or specific class of memory leak that
they agree on the semantics, but otherwise it is pointless to argue about
it.

I know there are different ways to define "memory leak". I've addressed
several of them.

I don't dispute your right to use "memory leak" in the way that you feel
is best. What I do dispute is your right to assert that I am an idiot for
using it in the way that I feel is best when that's different from what
you feel is best (and whether you use the word "idiot" or not, that's
exactly what you're doing).

Pete, I know you aren't an idiot. I don't mean to say that you are. I just
think that in this particular discussion, you weren't originally aware that
native heap managers not only kept references to all outstanding
allocations, but made them available to the programmer via HeapWalk, and now
you're engaging in hand-waving to get rid of it.

Yes, as long as .NET keeps an object alive it is because it could be
accessed somehow, the event referencing it could be fired, etc. The same is
true for native code though, HeapWalk could be used to skim memory looking
for sensitive information, for example. I don't see any substantial
difference between .NET and native code here -- the memory in question (can
we agree it has "leaked"?) is not 100% useless in either case but it is
awfully close.
 
Chris Mullins [MVP - C#]

I'm curious what situations you've run into that were the fault of the
GC. Is the LOH bug one of them?

This is going to open a big can-o-worms I should really leave unopened,
but...

Both cases were related to arrays of objects (not on the LOH) that
were being allocated at the start of an operation. Both cases were running
on multi-core x86 machines, with the GC operating in Server mode.

The code, in essence, looked like:
ProcessTransaction()
{
    PreLoadDatabaseData[] allData = DataLayer.GetAllUserData[...]

    // Lots of complex synchronous operations here.
    // We hit LDAP, AD, WMI, Sockets, Files, and Databases.
    // We really leverage Reflection, Dynamic Method Invocation,
    // and custom attributes.
    // This method is recursive, as Transactions often kick off
    // other transactions.
    // There is also quite a bit of concurrency going
    // on with various shared data structures.

    // variable should go out of scope here.
}

We would see this variable, state data, stay around forever. I was able to
confirm with both Scitech and SOS that things were not referenced by
anything, yet the data would never be cleaned up.

Much to my surprise, the solution was to add in "allData=null;" to the end
of the method. I've got lots of angry comments in my source code, none of
which are at all suitable for publication!

Both times this happened, I spent a significant amount of time trying to
duplicate the problem in a more "unit test" type of case. In both cases, I
failed - even keeping the object hierarchy, all the polymorphic classes, and
such, everything would work great in a unit test scenario. I didn't keep
track of the particular revision of .NET I was using, and I haven't removed
the code and tested on the newest revision to the platform.

In both cases, we discovered the problem via MiniDumps sent to us by
customers who were seeing OOM. And before the obvious question is asked, I
do (very well!) understand the difference between "eligible for GC, but
hasn't been collected yet" and "Why the f*ck doesn't this get collected?".

Disclaimer:
Before people jump on this post as support for "I should always set my
variables to null when I'm done with them", I feel the need to pre-empt that
argument. Please don't do it. Don't set your variables to null! Use the GC
correctly. It's your friend. Don't fight with it. Besides, setting your
stuff to null actually keeps the references around longer in many cases.
These are 2 examples out of lots and lots and lots of code, running on
thousands of installations with crazy load and throughput requirements. This
stuff also runs on x86 / x64 / IA64, works well in all cases, and is
frightfully complex. Your code just isn't this complex. Really. If you think
it is, and you think you really need to do this, hire me to do an evaluation
of your code base, and I'll let you know.
 
Peter Duniho

Ben said:
[...]
Pete, I know you aren't an idiot.

Then please stop treating me like one.

I don't mean to say that you are. I just
think that in this particular discussion, you weren't originally aware that
native heap managers not only kept references to all outstanding
allocations, but made them available to the programmer via HeapWalk,

Well, you're wrong about that. I may not be an expert, but I certainly
have a decent understanding of how memory management in Windows and
similar systems works.

and now
you're engaging in hand-waving to get rid of it.

No, I'm not. I genuinely view the OS memory management data structures
as being very different logically from memory management within an
application.

Yes, as long as .NET keeps an object alive it is because it could be
accessed somehow, the event referencing it could be fired, etc. The same is
true for native code though,

And would be no more a memory leak in native code, should it have the
same behavior (that is, some other code may actually use a reference
that the code that theoretically owns the memory has forgotten about).

HeapWalk could be used to skim memory looking
for sensitive information, for example.

When I talk about some code having access to the memory, I'm not talking
about something like HeapWalk. HeapWalk is not a normal way to manage
memory resources, nor is it an example of the kind of "access" I'm
talking about.

Pete
 
Peter Duniho

Chris said:
In the midst of all this definition madness, I'm going to fall back on a
version of the old Supreme Court definition of pornography: "I can't exactly
define what a memory leak is, but if my program's memory footprint is
growing without bounds, then it's leaking memory."

I like that definition. Not because I agree with it, but because it's a
good point of reference for discussing reasonable differences of opinion
regarding what a memory leak is.

Your definition reminds me of a problem that the Asheron's Call game
client had, where as you moved about in the world, it would accumulate
more and more data in some structure. The memory footprint was growing
practically without bounds. Yes, it's true: there was a finite amount
of data that could represent the game world. But that was far more data
than you could actually store in a 32-bit application's virtual address
space. For all intents and purposes, the memory footprint was growing
unbounded.

The end result was that on lower-end computers you'd run into excessive
memory swapping after playing for a couple of hours, and even on
computers with more system memory you'd suffer a performance penalty as
the game client had to traverse an ever-growing data structure.

There are other examples of programs that intentionally just generate
more and more data, in an effectively unbounded way. In some cases, it
might even be the only practical way to accomplish whatever it is that
program is trying to do.

While it was my personal opinion that the Asheron's Call problem was a
bug, it's not something I would call a memory leak. It's certainly
related to a memory leak, but AC never lost track of the memory. It
just managed it poorly.

Pete
 
