Array.Clear vs List<>.Clear

  • Thread starter: Lee Crabtree
  • Start date: Oct 4, 2007

Lee Crabtree

This seems inconsistent and more than a little bizarre.

Array.Clear sets all elements of the array to their default values (0,
null, whatever), whereas List<>.Clear removes all items from the list.
That part makes a reasonable amount of sense, as you can't actually take
items away from an Array. However, there doesn't seem to be a way to
perform the same operation in one fell swoop on a List<>.

For example:

byte[] byteArr = new byte[10];

....things happen and bytes get set...

Array.Clear(byteArr, 0, 10);

Now all the bytes are set to 0.


But if you use a List<byte>:

List<byte> byteList = new List<byte>(new byte[10]);

....things happen and bytes get set...

There's no way to reset all the bytes, so you're forced to iterate over
the list. Now, I'm sure that the performance hit of having to run a for
loop across the list isn't incredible. But aside from the apparent
inconsistency, I have to wonder if there isn't some mechanism to do the
same thing to a generic List.

Lee Crabtree
 
IMO, the Array.Clear() behavior is incorrect, as it doesn't follow the
normal intent of Clear(), and should throw a NotSupportedException. Re
the generic list - I can't think of many scenarios when I would *want*
to do this on a list - however, looping (using indexer, not foreach)
is a reasonable option and won't generally be too slow. You could
simply Clear() and AddRange(new T[count]), but this unnecessarily
creates the array.
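A minimal sketch of both options (the variable names here are illustrative):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        List<byte> list = new List<byte>(new byte[] { 1, 2, 3, 4, 5 });

        // Option 1: loop with the indexer, overwriting in place (no allocation).
        for (int i = 0; i < list.Count; i++)
            list[i] = default(byte);

        // Option 2: Clear then AddRange -- same end state, but it
        // allocates a throwaway zeroed array.
        list.Clear();
        list.AddRange(new byte[5]);

        Console.WriteLine(list.Count);   // 5
        Console.WriteLine(list[0]);      // 0
    }
}
```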

Marc
 
Hello, Lee!

What's the point of using List<T> like an Array?

List<T> internally maintains an array of type T, so when you call List<T>.Clear
all internal array items will be set to 0,
and the List<T> size will be reset.
--
With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote on Thu, 04 Oct 2007 10:17:20 -0500:

LC> This seems inconsistent and more than a little bizarre.

LC> Array.Clear sets all elements of the array to their default values
LC> (0, null, whatever), whereas List<>.Clear removes all items from
LC> the list.
LC> That part makes a reasonable amount of sense, as you can't actually
LC> take items away from an Array. However, there doesn't seem to be a
LC> way to perform the same operation in one fell swoop on a List<>.

LC> For example:

LC> byte[] byteArr = new byte[10];

LC> ...things happen and bytes get set...

LC> Array.Clear(byteArr, 0, 10);

LC> Now all the bytes are set to 0.


LC> But if you use a List<byte>:

LC> List<byte> byteList = new List<byte>(new byte[10]);

LC> ...things happen and bytes get set...

LC> There's no way to reset all the bytes, so you're forced to iterate
LC> over the list. Now, I'm sure that the performance hit of having to
LC> run a for loop across the list isn't incredible. But aside from
LC> the apparent inconsistency, I have to wonder if there isn't some
LC> mechanism to do the same thing to a generic List.

LC> Lee Crabtree
 
IMO, the Array.Clear() behavior is incorrect...

For clarity, I'm talking about its implementation on IList / IList<T>.

What Array / T[] does on its own behalf is up to itself.

Marc
 
Lee,

There isn't a mechanism to do this, but writing your own is easy:

public static void ClearList<T>(List<T> list, int index, int length)
{
    // Precompute the end index; comparing index against (index + length)
    // while also incrementing index would never terminate.
    int end = index + length;

    // Cycle through the list and set each item to the default.
    for (; index < end; ++index)
    {
        // Set the item in the list to the default value.
        list[index] = default(T);
    }
}
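A quick usage sketch, assuming the loop runs over the half-open range [index, index + length):

```csharp
List<int> data = new List<int>(new int[] { 1, 2, 3, 4, 5 });
ClearList(data, 1, 3);
// data is now { 1, 0, 0, 0, 5 }
```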
 
The major reason I use a List<T> in a place where an Array would
normally be more common is that the line length has to be extended or
shortened at certain times, and recreating the array then copying
elements across seems unnecessarily ridiculous.

Lee Crabtree
 
That's more or less what I wound up doing anyway, but curiosity got the
better of me, and there's rarely harm in asking.

Lee Crabtree
 
Lee Crabtree said:
The major reason I use a List<T> in a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<T> anyway, copying and more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an array (T[]), so
every time you add one element over the List<T>.Capacity, it will create
a new array, copy the elements, add the new item, and then destroy the old
array.
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.
So if you know how much you must expand, you should set the List<T>.Capacity
in advance to the max you need, as you cannot shrink a List<T> capacity.
The performance of simple System.Array could result better.
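A minimal sketch of that pre-sizing advice (the element counts here are arbitrary):

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        // Reserve the known maximum up front: Add never has to reallocate.
        List<int> presized = new List<int>(100000);
        for (int i = 0; i < 100000; i++)
            presized.Add(i);
        Console.WriteLine(presized.Capacity);   // still 100000

        // Capacity can also be raised later, before a burst of Adds.
        List<int> list = new List<int>();
        list.Capacity = 100000;
        Console.WriteLine(list.Capacity);       // 100000
    }
}
```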

Vadym said:
Hello, Lee!

What's the point using List<T> like Array ?

List<T> internally maintains array of type T, so when you call
List<T>.Clear all internal array items will be set to 0,
and List<T> size will be reset.
--
With best regards, Vadym Stetsiak.
Blog: http://vadmyst.blogspot.com

You wrote on Thu, 04 Oct 2007 10:17:20 -0500:

LC> This seems inconsistent and more than a little bizarre.

LC> Array.Clear sets all elements of the array to their default values
LC> (0, null, whatever), whereas List<>.Clear removes all items from
LC> the list.
LC> That part makes a reasonable amount of sense, as you can't actually
LC> take items away from an Array. However, there doesn't seem to be a
LC> way to perform the same operation in one fell swoop on a List<>.

LC> For example:

LC> byte[] byteArr = new byte[10];

LC> ...things happen and bytes get set...

LC> Array.Clear(byteArr, 0, 10);

LC> Now all the bytes are set to 0.


LC> But if you use a List<byte>:

LC> List<byte> byteList = new List<byte>(new byte[10]);

LC> ...things happen and bytes get set...

LC> There's no way to reset all the bytes, so you're forced to iterate
LC> over the list. Now, I'm sure that the performance hit of having to
LC> run a for loop across the list isn't incredible. But aside from
LC> the apparent inconsistency, I have to wonder if there isn't some
LC> mechanism to do the same thing to a generic List.

LC> Lee Crabtree
 
Okay, this defies what I would have expected, but a few quick tests make
it look as though chucking the whole list out and re-newing it is almost
twice as fast, viz:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ListDefaultingTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch testTime = new Stopwatch();
            List<byte> testList = new List<byte>(new byte[5000]);

            testTime.Start();

            for (int i = 0; i < testList.Count; i++)
                testList[i] = 0;

            testTime.Stop();

            Console.WriteLine(testTime.Elapsed);

            testTime.Reset();
            testTime.Start();

            testList = new List<byte>(new byte[5000]);

            testTime.Stop();

            Console.WriteLine(testTime.Elapsed);

            Console.ReadLine();
        }
    }
}

Running this on my box (2.6 GHz dual-core) gives between 450 and 650 ms
for the first version, and the second gives between 250 and 400 ms.

The List reference being fast makes sense, as there isn't an amazing
amount of work in generating a reference type, but I was pretty shocked
that creating a great big value array was still faster. Whether this
has to do with the speed of accessing List elements or the speed of
contiguous memory allocation (arrays ARE contiguous, aren't they?) I
don't know, but this was more than a little weird, to my mind.

Lee Crabtree
 
Lee Crabtree said:
The major reason I use a List<T> in a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<T> anyway, copying and more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, the it will create
a new array, copy the elements, add the new item and then destroying the old
array.
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).
 
Lee said:
Okay, this defies what I would have expected, but a few quick tests make
it look as though chucking the whole list out and re-newing it is almost
twice as fast, viz:

[...]
Running this on my box (2.6 Ghz dual-core) gives between 450 and 650 ms
for the first version, and the second gives between 250 and 400 ms.

The List reference being fast makes sense, as there isn't an amazing
amount of work in generating a reference type, but I was pretty shocked
that creating a great big value array was still faster. Whether this
has to do with the speed of accessing List elements or the speed of
contiguous memory allocation (arrays ARE contiguous, aren't they?) I
don't know, but this was more than a little weird, to my mind.

Well, my first caveat would be to note the variance in your timings. If
your tests show differences for the same code of almost 100% (250 to 400
ms), that suggests that there are external factors that have much more
significance than the actual implementation.

With that in mind, your results aren't all that surprising to me, as it's
almost always going to be slower to access individual elements than to
do things to an entire block of memory at a time. Your second section of
code allows the compiler and run-time to do the latter, and so it seems
reasonably expected that it would be faster.

Pete
 
Lee said:
Okay, this defies what I would have expected, but a few quick tests make
it look as though chucking the whole list out and re-newing it is almost
twice as fast, viz:
[...]
Running this on my box (2.6 Ghz dual-core) gives between 450 and 650 ms
for the first version, and the second gives between 250 and 400 ms.
The List reference being fast makes sense, as there isn't an amazing
amount of work in generating a reference type, but I was pretty shocked
that creating a great big value array was still faster. Whether this
has to do with the speed of accessing List elements or the speed of
contiguous memory allocation (arrays ARE contiguous, aren't they?) I
don't know, but this was more than a little weird, to my mind.

Well, my first caveat would be to note the variance in your timings. If
your tests show differences for the same code of almost 100% (250 to 400
ms), that suggests that there are external factors that have much more
significance than the actual implementation.

With that in mind, your results aren't all that surprising to me, as it's
almost always going to be slower to access individual elements than to
do things to an entire block of memory at a time. Your second section of
code allows the compiler and run-time to do the latter, and so it seems
reasonably expected that it would be faster.

Pete

Well... List.Clear actually calls Array.Clear internally, which is an
internal framework function. I wouldn't be surprised if it didn't
just eventually call some sort of memset variant (which is really
fast).

I would say that calling List.Clear() is going to be slightly faster
than an iterative version (but since the test didn't measure a
List<T>.Clear call it is unclear... pardon the pun).

Really, I think the main advantage to throwing out the old list is the
fact that you can't shrink a list's size once it has been grown, and
that every time it grows the array size doubles. So if you have a
list that is 4 megabytes, the next growth will be to 8 megabytes, even if
you only want 4 MB + 2 items.
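Since the earlier test never timed List<T>.Clear itself, here is a sketch that does (timings are machine-dependent, and the element count is arbitrary):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        const int N = 5000000;
        List<byte> list = new List<byte>(new byte[N]);

        // Indexer loop: touches each element one at a time.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < list.Count; i++)
            list[i] = 0;
        sw.Stop();
        Console.WriteLine("indexer loop:   " + sw.Elapsed);

        // Clear + AddRange: Clear is a block operation over the internal
        // array, then AddRange restores the Count (at the cost of one
        // temporary array allocation).
        sw = Stopwatch.StartNew();
        list.Clear();
        list.AddRange(new byte[N]);
        sw.Stop();
        Console.WriteLine("Clear+AddRange: " + sw.Elapsed);
    }
}
```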
 
Laura said:
> Well, that's what you are going to get with List<T> anyway, copying and
> more (ridiculous) copying.

I wouldn't call the copying "ridiculous", and it is required in any case
with _any_ array implementation. The only way to avoid it would be to
use some linked data structure (e.g. LinkedList).

> As Vadym said, the List<T> internally manages the list as an Array,[],
> so every time you add one element over the List<T>.Capacity, it will
> create a new array, copy the elements, add the new item and then
> destroy the old array.

This would be required managing a dynamically-sized Array as well,
except you'd have to do it yourself.

> The unfortunate thing with List<T> is that the capacity does grow up one
> by one. I'd like to see something like "expand it by 10%", so that the
> next Add would not incur reallocations.

That's not correct. The List<T> does NOT "grow up one by one". The
default size starts out at 4 elements, and each time you exceed the
capacity of the list, the newly allocated storage has double the
previous capacity.

> So if you know how much you must expand, you should set the
> List<T>.Capacity in advance to the max you need as you cannot shrink a
> List<T> capacity.

Yes, you definitely should provide whatever information you have, if you
have it. If you know what the eventual required capacity will be, it is
better to provide that.

> The performance of simple System.Array could result better.

I don't see how, since the underlying behavior is basically the same.

For what it's worth, I wrote a short program to test the performance and
behavior of the List<T> class. It confirms that with each new
allocation, the capacity is doubled. It also shows that at least on a
reasonably fast computer (Core 2 Duo, 2.33 GHz), it takes less than 100 ms
to add in sequence 1 million integers to a List<int>.

It's true that for smaller lists, there are more allocations, and so the
copying is a larger part of the overall time cost. But then, for
smaller lists, the total time cost is significantly lower anyway, so
that proportional difference isn't really a big deal.

I really don't think "all that copying" is a significant performance factor.
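Pete's exact test program isn't shown; a sketch along the lines he describes (1 million sequential Adds to a List<int>, starting from the default capacity) would be:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();

        // Default capacity, so the internal array doubles repeatedly
        // (4, 8, 16, ...) as the million items are added.
        List<int> list = new List<int>();
        for (int i = 0; i < 1000000; i++)
            list.Add(i);

        sw.Stop();
        Console.WriteLine("added " + list.Count + " ints in "
            + sw.ElapsedMilliseconds + " ms");
    }
}
```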

Pete
 
Doug Semler said:
The major reason I use a List<T> in a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<T> anyway, copying and more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, it will create
a new array, copy the elements, add the new item and then destroy the old
array.
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).


No, Laura is right, a new array is created with a size which is equal to the
previous size + 1 (size is an int) whenever there is an overflow.
Consider the following sample:

List<byte> testList = new List<byte>();
for (int i = 0; i < 8; i++)
    testList.Add((byte)i);

Here the List starts with an array of size 0; at the first insert a new
array gets created with a size of 1 (int), which means that we can add 4
bytes before the array overflows; when that happens a new array gets created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 arrays :-(

Willy.
 
"Lee Crabtree" <[email protected]> ha scritto nel
messaggio
The major reason I use a List<T> in a place where an Array would
normally
be more common is that the line length has to be extended or shortened
at
certain times, and recreating the array then copying elements across
seems
unnecessarily ridiculous.
Well, that's what you are going to get with List<T> anyway, copying and
more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, the it will
create
a new array, copy the elements, add the new item and then destroying the
old
array.
The unfortunate thing with List<T> is that the capacity does grow up one
by
one. I'd like to see something like "expand it by 10%", so that the next
Add
would not incur reallocations.
It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).

No Laura is right, a new array is created with a size which is equal to the
previous size + 1 (size is an int) whenever there is an overflow.
Consider following sample:

List<byte> testList = new List<byte>();
for(int i = 0; i < 8; i++)
testList.Add((byte)i);
Here the List starts with an array of size 0, at the first insert a new
array gets created with a size of 1 (int), that means that we can add 4
bytes before the array overflow, when that happens a new array gets created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 array's :-(

Willy

If I put Console.Write(testList.Capacity) after every create/add I get:
0
4
4
4
4
8
8
8
8

The list doubles.... Reflect over the Add and EnsureCapacity methods. What
happens is the current Count (size) is compared to the capacity
(items.Length). EnsureCapacity is then called on size+1. HOWEVER,
EnsureCapacity DOUBLES the list size.
 
Willy Denoyette said:
Doug Semler said:
"Lee Crabtree" <[email protected]> wrote in message:
The major reason I use a List<T> in a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<T> anyway, copying and more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, it will create
a new array, copy the elements, add the new item and then destroy the old
array.
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).

No, Laura is right, a new array is created with a size which is equal to the
previous size + 1 (size is an int) whenever there is an overflow.
Consider the following sample:

List<byte> testList = new List<byte>();
for (int i = 0; i < 8; i++)
    testList.Add((byte)i);

Here the List starts with an array of size 0; at the first insert a new
array gets created with a size of 1 (int), which means that we can add 4
bytes before the array overflows; when that happens a new array gets created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 arrays :-(

Willy.


I see I'm not (completely) right, of course; the point is that the default size
is zero, and is doubled when there is an overflow.

So in the previous sample, the sequence is:
array size 0
add 1 byte -> create array with size 1 (int), say n
add byte
add byte
add byte
add byte -> create array with size n*2; from now on the new array is
doubled in size.
....

Sorry for the confusion.
Willy.
 
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

Nope, it doubles the capacity each time, as Doug said. Here's code to show that:

using System;
using System.Collections.Generic;

public class ConvertFile
{
    public static void Main(string[] args)
    {
        List<string> list = new List<string>();

        int previousCapacity = -1;
        for (int i = 0; i < 100000; i++)
        {
            list.Add("");
            if (list.Capacity != previousCapacity)
            {
                Console.WriteLine(list.Capacity);
                previousCapacity = list.Capacity;
            }
        }
    }
}

Output:

4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072

It would perhaps be nice to be able to specify the scaling factor, but 2 is reasonable for most situations.
 
Doug Semler said:
"Lee Crabtree" <[email protected]> wrote in message
news:[email protected]...
The major reason I use a List<T> in a place where an Array would normally
be more common is that the line length has to be extended or shortened at
certain times, and recreating the array then copying elements across seems
unnecessarily ridiculous.

Well, that's what you are going to get with List<T> anyway, copying and more
(ridiculous) copying.
As Vadym said, the List<T> internally manages the list as an Array,[], so
every time you add one element over the List<T>.Capacity, it will create
a new array, copy the elements, add the new item and then destroy the old
array.
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

It *doubles* the internal array size whenever you add an item that
would grow the array, not one by one. Starting at whatever your
initial capacity was (4 is default).

No, Laura is right, a new array is created with a size which is equal to the
previous size + 1 (size is an int) whenever there is an overflow.
Consider the following sample:

List<byte> testList = new List<byte>();
for (int i = 0; i < 8; i++)
    testList.Add((byte)i);

Here the List starts with an array of size 0; at the first insert a new
array gets created with a size of 1 (int), which means that we can add 4
bytes before the array overflows; when that happens a new array gets created
with a size of 2, and again 4 bytes can be added.
For the above sample that means creating 3 arrays :-(

Willy

If I put Console.Write(testList.Capacity) after every create/add I get:
0
4
4
4
4
8
8
8
8

The list doubles.... Reflect over the Add and EnsureCapacity methods. What
happens is the current Count (size) is compared to the capacity
(items.Length). EnsureCapacity is then called on size+1. HOWEVER,
EnsureCapacity DOUBLES the list size.


That's right, a new array is created with a size which is twice the previous
array size.
Actually, my point was that the default was not 4 but 0 (call it
nitpicking). That is:
List<byte> testList = new List<byte>();
starts with an empty array, and at the very first insert, a new array gets
created with a Capacity = 4 * element size, which for a byte type is an
array of 1 int. From then on, a new array with twice the size is created when
the current array gets filled completely.

Willy.
 
Jon Skeet said:
The unfortunate thing with List<T> is that the capacity does grow up one by
one. I'd like to see something like "expand it by 10%", so that the next Add
would not incur reallocations.

Nope, it doubles the capacity each time, as Doug said. Here's code to show
that:

using System;
using System.Collections.Generic;

public class ConvertFile
{
    public static void Main(string[] args)
    {
        List<string> list = new List<string>();

        int previousCapacity = -1;
        for (int i = 0; i < 100000; i++)
        {
            list.Add("");
            if (list.Capacity != previousCapacity)
            {
                Console.WriteLine(list.Capacity);
                previousCapacity = list.Capacity;
            }
        }
    }
}

Output:

4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072

It would perhaps be nice to be able to specify the scaling factor, but 2
is reasonable for most situations.

I heard this scaling factor would change, or the algorithm would be adapted,
such that at a certain threshold the scaling factor would be reduced.
Actually, there are some issues with this on 64-bit: remember the object size
limit is still 2GB even on 64-bit, so once the list reaches 1GB, the next
doubling will fail with an OOM exception. Starting with a capacity which is
large enough is not always possible or even wanted.

Willy.
 