Cleaning Up Custom Collection Classes

jehugaleahsa

Hello:

A while back I wrote my own collection classes for C#. It was merely a
teaching exercise for myself.

The problem is that my collections perform drastically slower than the
built-in classes. I was able to do some performance monitoring and
came to the conclusion that a good portion of my time was being spent
in the Dispose methods I used to clear out values from my collections.

I have my collections call Dispose on their elements, so that
references get set to null. This way I don't have dangling references
holding up the garbage collector.

However, this comes at a somewhat serious runtime cost. My guess is
that the performance hit is due to Dispose being virtual (and it is in
my situation).

How do the built-in collections prevent dangling references?

Thanks for any suggestions. This isn't critical, purely academic.

~Travis
 
Jeroen Mostert

> A while back I wrote my own collection classes for C#. It was merely a
> teaching exercise for myself.
>
> The problem is that my collections perform drastically slower than the
> built-in classes. I was able to do some performance monitoring and
> came to the conclusion that a good portion of my time was being spent
> in the Dispose methods I used to clear out values from my collections.

That's because you probably shouldn't be doing that.

> I have my collections call Dispose on their elements, so that
> references get set to null.

This is *not* what Dispose() normally does, or is supposed to do.

> This way I don't have dangling references holding up the garbage
> collector.

The garbage collector will do just fine on its own. Don't try to "help" it
by setting references to null, this is more likely to reduce performance
than to enhance it. In particular, if a collection becomes eligible for
garbage collection and nobody else is holding references to the elements it
contains, the garbage collection can collect the entire tree of objects in
one swoop (or sweep, rather). There's no point in pre-emptively trying to
trigger GC for the individual elements. There is *very occasionally* a
performance gain to be had by explicitly setting a reference to null, but
only for large objects that wouldn't get collected because they're part of
an even longer-lived object -- which is rare, and often a design flaw.

Dispose() has nothing to do with reclaiming memory, but with cleaning up
resources that need to be cleaned up ASAP (as opposed to when the garbage
collector comes around to collect the object that's holding the resource).
This is called "deterministic finalization" and it's needed exactly because
garbage collection isn't enough.

For example, a FileStream implements IDisposable because every open file
consumes a precious operating system handle, and having a file open will
prevent some other operations on it from succeeding. It's not good enough to
have this happen when the FileStream is "eventually" garbage collected, it
should happen as soon as you're done with it. FileStream.Close() does the
same as .Dispose() in this case. Implementing IDisposable allows you to use
the "using" statement to conveniently create and clean up objects in the
same scope.
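The pattern described above can be sketched like this (a minimal illustration; the temp-file path and console message are just for the demo):

```csharp
using System;
using System.IO;

class UsingDemo
{
    static void Main()
    {
        string path = Path.GetTempFileName();

        // "using" guarantees Dispose() runs when the block exits --
        // even if an exception is thrown -- so the OS file handle is
        // released deterministically, not "eventually" at GC time.
        using (FileStream fs = new FileStream(path, FileMode.Create))
        {
            byte[] data = { 1, 2, 3 };
            fs.Write(data, 0, data.Length);
        } // fs.Dispose() has already run here.

        // Because the handle was released deterministically, the file
        // can be deleted immediately -- no waiting for a GC pass.
        File.Delete(path);
        Console.WriteLine("handle released and file deleted");
    }
}
```

Note that none of this involves reclaiming managed memory; the FileStream object itself is still collected by the GC whenever it gets around to it.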

See the "Garbage Collection" topic in the MSDN. It's rather big and
complicated, but it's thorough. As a first guideline, though: if you don't
know exactly what IDisposable is for, it's best to leave it alone. Use the
"using" statement for objects that implement IDisposable, or call .Dispose()
yourself, but don't create new classes that implement IDisposable unless you
know what you're doing, because it's usually not necessary.
 
jehugaleahsa


I was just looking at the source code for the PowerCollections
red-black tree, and I see how they manage it. When they remove a node
from the tree, there are no longer any pointers to the node, so the GC
collects it automatically. However, the object the node referenced may
survive into a later generation, which means it can take two GC passes
before the underlying store is reclaimed. Is this generally accepted
as the best approach, performance-wise?

What about contiguous-memory collections? Since they are part of an
underlying array, should I manually iterate over removed items and set
them to a default value? What does new MyStruct[50] do when it is
first initialized? I assume it calls the default MyStruct ctor 50
times. Wouldn't it be faster to zero out memory for
50 * sizeof(MyStruct)? Is there any way I can achieve that in my own
code, or does Array.Resize<T>(ref T[], int) handle it for me? What
about when a developer calls a method to remove the last 10 items
(without affecting capacity): should I iterate over the removed items
and zero them out, or should I set them to default(T)? Is there even a
"safe" way to zero out memory in C#?
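For reference, the BCL does expose a safe bulk-clearing call, Array.Clear, which resets a range of slots to default(T) without any per-element loop in your own code. A sketch of trimming a backing array without touching capacity (the variable names here are illustrative, not from any real collection):

```csharp
using System;

class TrimDemo
{
    static void Main()
    {
        // Note: new T[n] does not run a constructor per element; the CLR
        // zero-initializes the whole block, so every slot already starts
        // out as default(T).
        int[] store = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int count = 10;

        // "Remove" the last 4 items without shrinking capacity:
        int removed = 4;
        count -= removed;

        // Array.Clear resets the removed slots to default(T) in one
        // call; for a reference-type element this is exactly what drops
        // the stale references so the GC can reclaim those objects.
        Array.Clear(store, count, removed);

        Console.WriteLine(store[9]); // default(int), i.e. 0
        Console.WriteLine(count);
    }
}
```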

My problem is that I would be more than willing to buy a book about
developing data structures in C#; however, I can't really find one.
Everyone just insists that I use the built-in collections. I am asking
purely academically.

I am also trying to introduce this collection library to my colleagues
to allow them to start using iterators, similar to C++. I have
developed an extensive algorithms and numerics library that uses my
collections as well as generic IEnumerable<T>. But I'm not going to
suggest working with a collection library that runs significantly
slower and potentially has more bugs in it. My implementation of
List<T> runs in about the same time as
System.Collections.Generic.List<T>. If I could get my HashTable and
my RBTree to work just as fast, I would be in good shape.

I don't expect I will be able to compete with some of the built-in
optimizations of the .NET collections. However, if I can approach them
performance-wise, I can justify the use of my collections over
the .NET ones, especially since I have classes such as Set, BigInteger,
and BidirectionalMap.
 
