List<T> constructor taking IEnumerable<T> behaves surprisingly

W

Weeble

I was implementing a collection type today, and when I ran the tests
on it I was surprised to see a stack overflow. It turns out this was
the culprit:

public class MyCollection : ICollection<T>, IEnumerable<T>
{
[...]
public void CopyTo(T[] array, int arrayIndex)
{
List<T> list = new List<T>(this);
list.CopyTo(array, arrayIndex);
}
}

List<T> has three constructors: one with no arguments, one that takes
an integer, and one that takes an IEnumerable<T>. I had assumed that
the one that takes an IEnumerable<T> would create a new, empty list,
and iterate over my collection using the IEnumerable<T> interface,
adding items to the list. Instead, it appears to cast to
ICollection<T>, inspect Count, create itself at an appropriate size,
then call - uh oh - CopyTo in order to copy elements of my collection
straight into itself. D'oh! I'd better write CopyTo the long way...

As a related note, this data structure doesn't have an O(1)
implementation of Count. (I don't think it's actually possible to do
that and meet some of its other complexity requirements.) Is it wise
to implement ICollection<T> at all? I'm a bit worried that algorithms
assume that Count is cheap and will call it unexpectedly. I'm
*especially* worried that this can happen even in methods that take
IEnumerable<T> and not ICollection<T>, given my nasty surprise with
List's constructor. How useful is it to implement ICollection<T>
anyway?
 
J

Jon Skeet [C# MVP]

Weeble said:
List<T> has three constructors: one with no arguments, one that takes
an integer, and one that takes an IEnumerable<T>. I had assumed that
the one that takes an IEnumerable<T> would create a new, empty list,
and iterate over my collection using the IEnumerable<T> interface,
adding items to the list. Instead, it appears to cast to
ICollection<T>, inspect Count, create itself at an appropriate size,
then call - uh oh - CopyTo in order to copy elements of my collection
straight into itself. D'oh! I'd better write CopyTo the long way...

Eek. It's understandable that it does it, but it would be nice to have
that documented.
As a related note, this data structure doesn't have an O(1)
implementation of Count. (I don't think it's actually possible to do
that and meet some of its other complexity requirements.) Is it wise
to implement ICollection<T> at all? I'm a bit worried that algorithms
assume that Count is cheap and will call it unexpectedly. I'm
*especially* worried that this can happen even in methods that take
IEnumerable<T> and not ICollection<T>, given my nasty surprise with
List's constructor. How useful is it to implement ICollection<T>
anyway?

That entirely depends on the usage. How are you expecting this to be
used?

But yes, if Count may well take a while to evaluate, it might be a good
idea not to implement ICollection<T>. On the other hand, if people want
to use it as a collection and don't mind the hit, it would be annoying
for them not to be able to...

How in control of the code are you? Is this for a public API which
might be used by people everywhere, or is it only for use in a single
application?
 
J

Jeroen Mostert

Weeble said:
I was implementing a collection type today, and when I ran the tests
on it I was surprised to see a stack overflow. It turns out this was
the culprit:

public class MyCollection : ICollection<T>, IEnumerable<T>

This is redundant; ICollection said:
{
[...]
public void CopyTo(T[] array, int arrayIndex)
{
List<T> list = new List<T>(this);
list.CopyTo(array, arrayIndex);
}
}

List<T> has three constructors: one with no arguments, one that takes
an integer, and one that takes an IEnumerable<T>. I had assumed that
the one that takes an IEnumerable<T> would create a new, empty list,
and iterate over my collection using the IEnumerable<T> interface,
adding items to the list.

Well, it does. Provided the IEnumerable<T> does not also implement
ICollection said:
Instead, it appears to cast to ICollection<T>, inspect Count, create
itself at an appropriate size, then call - uh oh - CopyTo in order to
copy elements of my collection straight into itself. D'oh! I'd better
write CopyTo the long way...
A rather unfortunate consequence of an otherwise sensible optimization.
As a related note, this data structure doesn't have an O(1)
implementation of Count. (I don't think it's actually possible to do
that and meet some of its other complexity requirements.) Is it wise
to implement ICollection<T> at all? I'm a bit worried that algorithms
assume that Count is cheap and will call it unexpectedly.

They won't call it "unexpectedly", really, but yes, they're going to assume
that .Count is cheap.
I'm *especially* worried that this can happen even in methods that take
IEnumerable<T> and not ICollection<T>, given my nasty surprise with
List's constructor. How useful is it to implement ICollection<T> anyway?

In your case, probably not at all. Implementing general interfaces is an
enabling move, not a requirement. There aren't many classes that go through
ICollection<T> as opposed to either more specific interfaces like IList<T>
or less specific interfaces like IEnumerable<T>. Those classes that do go
through ICollection<T> are definitely going to expect that .Count is O(1).
How much practical impact this will have on efficiency would be unclear
without testing typical scenarios.

By contrast, IEnumerable<T> has an extension method .Count() that's not
expected to be O(1) (because it necessarily has to enumerate all items and
count them individually in the worst case).
 
M

Marc Gravell

You could bypass the List<T>'s tendency to use ICollection<T> by
simply not giving it an ICollection<T> - for example, with the C# 3
extension method below you could use:

List<T> list = new List<T>(source.AsEnumerableOnly());

The AsEnumerableOnly() *only* exposes IEnumerable<T> (OK, it also
exposes IEnumerable and IDisposable ;-p), allowing your non-Count-
friendly implementation to work happily.

Marc

public static class EnumerableExt
{
public static IEnumerable<T> AsEnumerableOnly<T>(
this IEnumerable<T> source)
{
if (source == null) yield break; // GIGO
foreach (T item in source)
{
yield return item;
}
}
}
 
G

Göran Andersson

Weeble said:
I was implementing a collection type today, and when I ran the tests
on it I was surprised to see a stack overflow. It turns out this was
the culprit:

public class MyCollection : ICollection<T>, IEnumerable<T>
{
[...]
public void CopyTo(T[] array, int arrayIndex)
{
List<T> list = new List<T>(this);
list.CopyTo(array, arrayIndex);
}
}

List<T> has three constructors: one with no arguments, one that takes
an integer, and one that takes an IEnumerable<T>. I had assumed that
the one that takes an IEnumerable<T> would create a new, empty list,
and iterate over my collection using the IEnumerable<T> interface,
adding items to the list. Instead, it appears to cast to
ICollection<T>, inspect Count, create itself at an appropriate size,
then call - uh oh - CopyTo in order to copy elements of my collection
straight into itself.

It certainly does. I checked the code of the constructor using reflector.

It makes very much sense to check for a collection, as it's more
efficient to allocate the list at the correct size from the start
instead of having to increase the size of the list all the time when
adding items to it.

Why'd you want to copy all the items to a list before copying them to an
array, anyway? I don't see the point in copying everything twice.
D'oh! I'd better write CopyTo the long way...

Well, it's not that long really...

public void CopyTo(T[] array, int arrayIndex) {
foreach (T t in this) array[arrayIndex++] = t;
}

;)
As a related note, this data structure doesn't have an O(1)
implementation of Count. (I don't think it's actually possible to do
that and meet some of its other complexity requirements.)

Then you should perhaps keep a local variable in your class that keeps
track of the count.
Is it wise
to implement ICollection<T> at all? I'm a bit worried that algorithms
assume that Count is cheap and will call it unexpectedly.

The Count property of a collection is generally cheap, so I think that
most code assumes that it's at least cheaper than getting an enumerator
for example.
I'm
*especially* worried that this can happen even in methods that take
IEnumerable<T> and not ICollection<T>, given my nasty surprise with
List's constructor.

How useful is it to implement ICollection<T> anyway?

That depends on what you need to do with your class.
 
W

Weeble

Why'd you want to copy all the items to a list before copying them to an
array, anyway? I don't see the point in copying everything twice.

I did it that way because CopyTo has a whole bunch of failure cases to
take care of (multi-dimensional array, not enough space, negative
index, null array), and I didn't think it was likely to get used that
much, so I wanted to do something that would behave correctly without
a lot of work. If it became a performance issue I would certainly have
changed it later.
Then you should perhaps keep a local variable in your class that keeps
track of the count.

Yeah, I figured I would have to explain this. The requirements:

List of nodes, similar to LinkedList<T>.
O(1) splicing together of two lists to make one.
O(1) removal of a node from a list.

The problem is the combination of the splicing requirement and the
removal requirement. To support the splicing, I can't have every node
hold a reference to the list, otherwise I'd need to update them all
during the splice. However, this makes removal tricky. If removal is a
method on the list, then it has no way to ensure that the node to be
removed actually belongs to it and not some other list. Instead,
removal is a method on the node. This means that when we're doing the
removal, we don't actually know what list we're removing from, so we
can't update the count. (It's a little bit more complicated than that:
removing the head of the list would mess things up, so the head node
actually holds a reference to the list to cope with this situation.
This doesn't compromise the O(1) splicing because there's only ever
one item in a list that has a reference back up to the list.)
The List<T> constructor actually takes an IEnumerable<T> argument. It's
the code in the constructor that then checks if it actually is a collection.

Yep, that's what I thought. It's perfectly understandable from a
performance point of view, but as Jon Skeet mentions, it's surprising
that this sort of behaviour isn't documented.
That depends on what you need to do with your class.

I'm tending to think that it's not actually going to be useful for
anything I'm going to do with the class. I thought I should implement
it because it made sense for my class to be considered a collection,
but the more I think about it the less value I see in going to the
bother of implementing (and testing) the ICollection behaviour.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top