How memory stacks up.

  • Thread starter Thread starter Frank Rizzo
  • Start date Start date
F

Frank Rizzo

Hello, doing some profiling work and trying to resolve the memory usage
balooning through the roof. Just want to make sure that my reasoning is
correct.

My question is as follows: Let's say I have 1000 ArrayLists and each of
them has quite a bit of data in them. Does .NET framework load 1000
implementations of ArrayList in memory or does it load just one and uses
that to operate on data in the lists?

Thanks.
 
Frank Rizzo said:
Hello, doing some profiling work and trying to resolve the memory usage
balooning through the roof. Just want to make sure that my reasoning is
correct.

My question is as follows: Let's say I have 1000 ArrayLists and each of
them has quite a bit of data in them. Does .NET framework load 1000
implementations of ArrayList in memory or does it load just one and uses
that to operate on data in the lists?

It only loads one implementation - the size of the JITted code is
usually much smaller than the size of the actual data used.
 
Jon said:
It only loads one implementation - the size of the JITted code is
usually much smaller than the size of the actual data used.

However, by default, ArrayList instantiation, it allocates space for 16
entries. So I'd have 16*4*1000 bytes memory usage right out of the box,
correct? The 4 is 4 bytes for an empty object reference.
 
You can, of course, use

new ArrayList(1)

to initialize an arraylist with an initial capacity of 1 item.
 
Bruce said:
You can, of course, use

new ArrayList(1)

to initialize an arraylist with an initial capacity of 1 item.

Yes, but I first want to analyze the existing memory pattern before
making changes to the code base.
 
Frank Rizzo said:
However, by default, ArrayList instantiation, it allocates space for 16
entries. So I'd have 16*4*1000 bytes memory usage right out of the box,
correct? The 4 is 4 bytes for an empty object reference.

No the size of an Arralist is:
24 bytes fixed for the ArrayList + 32 bytes for the initial Object[] wrapped
by the ArrayList.
The array starts with 4 entries (16 bytes) and doubles it's size at each
extent.

So 1000 ArrayList's take 1000 * (32 + 24) = 56000 bytes.

Willy.
 
Frank Rizzo said:
Yes, but I first want to analyze the existing memory pattern before making
changes to the code base.

Well this will account for a total (24 + 20) * 1000 = 44000 bytes.

Willy.
 
Frank Rizzo said:
However, by default, ArrayList instantiation, it allocates space for 16
entries. So I'd have 16*4*1000 bytes memory usage right out of the box,
correct? The 4 is 4 bytes for an empty object reference.

Aside from extra overhead, yes. But as Willy said, that's only a very
small amount of data. You'll be using much more memory up on your
*realy* data.
 
No the size of an Arralist is:
24 bytes fixed for the ArrayList + 32 bytes for the initial Object[] wrapped
by the ArrayList.
The array starts with 4 entries (16 bytes) and doubles it's size at each
extent.

It starts with 16 entries, not 4.
ms-help://MS.VSCC.2003/MS.MSDNQTR.2004JAN.1033/cpref/html/frlrfsystemcollectionsarraylistclassctortopic3.htm
 
Gentlemen,

Just so that I understand... In terms of memory used by ArrayList
instantiations only, the following 2 structures will use up the same
amount of RAM? Forget the data, just empty instantiations.

Structure 1.

ArrayList singleList = new ArrayList(1000000);


Structure 2.
ArrayList [] multipleLists = new ArrayList [1000000];
for (int x=0;x<1000000;x++)
multipleLists[x] = new ArrayList(1);

Thanks
 
Frank Rizzo said:
Just so that I understand... In terms of memory used by ArrayList
instantiations only, the following 2 structures will use up the same
amount of RAM? Forget the data, just empty instantiations.

Structure 1.

ArrayList singleList = new ArrayList(1000000);


Structure 2.
ArrayList [] multipleLists = new ArrayList [1000000];
for (int x=0;x<1000000;x++)
multipleLists[x] = new ArrayList(1);

Not at all.

The first has the overhead of 1 ArrayList object, and an array with
1,000,000 null references. That array will take 4MB + the overhead of
the array object itself (about 12 bytes IIRC).

The second has an array of references to start with (4MB+tiny
overhead), and then 1,000,000 ArrayList objects, each of which has its
own overhead (about 24 bytes, IIRC), and its own array (overhead + 4
bytes for the single null reference).

So, the difference is between just over 4MB, and about 40MB.
 
Right, I was looking at v2.0 , on v1.x it defaults to 16 entries. (ArrayList
size = 24 + 16 + 64 = 104)
On v2.0 , the default is ... 0 entries, that is the object[] is created with
0 elements.
total size = ArrayList size 24 + Object[] size 16 = 40
, however the space reserved on the heap for the array is 32 bytes (4
entries), so while the object[] size is only 16 bytes, the real physical
size is 32. Whenever you add the first element to the list, the object[]
size becomes the real physical size (32).

Willy.

Frank Rizzo said:
No the size of an Arralist is:
24 bytes fixed for the ArrayList + 32 bytes for the initial Object[]
wrapped by the ArrayList.
The array starts with 4 entries (16 bytes) and doubles it's size at each
extent.

It starts with 16 entries, not 4.
ms-help://MS.VSCC.2003/MS.MSDNQTR.2004JAN.1033/cpref/html/frlrfsystemcollectionsarraylistclassctortopic3.htm

So 1000 ArrayList's take 1000 * (32 + 24) = 56000 bytes.

Willy.
 
Jon said:
Frank Rizzo said:
Just so that I understand... In terms of memory used by ArrayList
instantiations only, the following 2 structures will use up the same
amount of RAM? Forget the data, just empty instantiations.

Structure 1.

ArrayList singleList = new ArrayList(1000000);


Structure 2.
ArrayList [] multipleLists = new ArrayList [1000000];
for (int x=0;x<1000000;x++)
multipleLists[x] = new ArrayList(1);


Not at all.

The first has the overhead of 1 ArrayList object, and an array with
1,000,000 null references. That array will take 4MB + the overhead of
the array object itself (about 12 bytes IIRC).

The second has an array of references to start with (4MB+tiny
overhead), and then 1,000,000 ArrayList objects, each of which has its
own overhead (about 24 bytes, IIRC), and its own array (overhead + 4
bytes for the single null reference).

So, the difference is between just over 4MB, and about 40MB.

Ok, now I am confused. In the original answer to my question you stated
that .NET just loads one implementation of ArrayList, the rest is just
data. So according to you, the size of 2 structures should be about the
same. Because the first one is just a single ArrayList with allocated
memory for 1 million objects (4 bytes * 1 million). The second scenario
is an array of 1 million ArrayList objects (4mb + tiny overhead) + space
that each ArrayList object allocated (4 bytes) - so that's another 4MB.
And since you mentioned that .NET just loads one implementation of
ArrayList, that should be pretty much it.

Unless I misundertood your original answer.
Btw, you are absolutely correct, it is 4mb vs 40 mb (i just measured it).

Regards
 
Thanks, this is really helpful. My last question is how exactly did
figure out that the implementation of ArrayList is 24 bytes?

I am looking at ArrayList source code and it has the following local
variables which adds up to 39 bytes.

public virtual int Capacity { get; set; }
public virtual int Count { get; }
public virtual bool IsFixedSize { get; }
public virtual bool IsReadOnly { get; }
public virtual bool IsSynchronized { get; }
public virtual object this[int index] { get; set; }
public virtual object SyncRoot { get; }

// Fields
private const int _defaultCapacity = 0x10;
private object[] _items;
private int _size;
[NonSerialized]
private object _syncRoot;
private int _version;


Right, I was looking at v2.0 , on v1.x it defaults to 16 entries. (ArrayList
size = 24 + 16 + 64 = 104)
On v2.0 , the default is ... 0 entries, that is the object[] is created with
0 elements.
total size = ArrayList size 24 + Object[] size 16 = 40
, however the space reserved on the heap for the array is 32 bytes (4
entries), so while the object[] size is only 16 bytes, the real physical
size is 32. Whenever you add the first element to the list, the object[]
size becomes the real physical size (32).

Willy.

No the size of an Arralist is:
24 bytes fixed for the ArrayList + 32 bytes for the initial Object[]
wrapped by the ArrayList.
The array starts with 4 entries (16 bytes) and doubles it's size at each
extent.

It starts with 16 entries, not 4.
ms-help://MS.VSCC.2003/MS.MSDNQTR.2004JAN.1033/cpref/html/frlrfsystemcollectionsarraylistclassctortopic3.htm


So 1000 ArrayList's take 1000 * (32 + 24) = 56000 bytes.

Willy.
 
Jon Skeet said:
Frank Rizzo said:
Just so that I understand... In terms of memory used by ArrayList
instantiations only, the following 2 structures will use up the same
amount of RAM? Forget the data, just empty instantiations.

Structure 1.

ArrayList singleList = new ArrayList(1000000);


Structure 2.
ArrayList [] multipleLists = new ArrayList [1000000];
for (int x=0;x<1000000;x++)
multipleLists[x] = new ArrayList(1);

Not at all.

The first has the overhead of 1 ArrayList object, and an array with
1,000,000 null references. That array will take 4MB + the overhead of
the array object itself (about 12 bytes IIRC).

The second has an array of references to start with (4MB+tiny
overhead), and then 1,000,000 ArrayList objects, each of which has its
own overhead (about 24 bytes, IIRC), and its own array (overhead + 4
bytes for the single null reference).

So, the difference is between just over 4MB, and about 40MB.

To be exect for v1.1 of the framework.

The ArrayList overhead is 24 bytes,
the Array overhead is 16 bytes.
So for 1 the total size is: 24 + 16 + (1.000.000 * 4) = 4.000.040 bytes.

For case 2:
An array of 1000000 entries of type ArrayList:
An array of 1.000.000 entries = 16 + (1.000.000 * 4)
plus..
1.000.000 ArrayLists with 1 entry
(24 + 20) * 1000000 = 44.000.000.
gives:
16 + (1.000.000 * 4) + (24 + 16 + 4) * 1000000 = 48.000.016

Willy.
 
Frank Rizzo said:
Thanks, this is really helpful. My last question is how exactly did
figure out that the implementation of ArrayList is 24 bytes?

I am looking at ArrayList source code and it has the following local
variables which adds up to 39 bytes.

public virtual int Capacity { get; set; }
public virtual int Count { get; }
public virtual bool IsFixedSize { get; }
public virtual bool IsReadOnly { get; }
public virtual bool IsSynchronized { get; }
public virtual object this[int index] { get; set; }
public virtual object SyncRoot { get; }

// Fields
private const int _defaultCapacity = 0x10;
private object[] _items;
private int _size;
[NonSerialized]
private object _syncRoot;
private int _version;

Wait a minute you are confusing implementation (behavior, that is the code)
and object state.
Code is loaded only once per type (not in the GC heap) , state is created
per instance of that type, right?

When talking about the object state, that is what is allocated on the GC
heap, we have per ArrayList:

- a fixed overhead of 8 bytes (2 words of 32 bits on X86) for each object
type, plus
- the object state consisting of: _items, _size, _syncRoot, _version; or 4
* 4 = 16 bytes
Makes 24 bytes.
BUT old on, the ArrayList wraps a Object[] _items as part of it's state, and
this array has:
16 bytes overhead + n * 4 bytes per Array element (4 bytes per element as
they are references), where n is 16 by default for a ArrayList().
Note that this is for empty arraylist only, add to that the real objects
added to the list .....


Willy.
 
Frank Rizzo said:
Ok, now I am confused. In the original answer to my question you stated
that .NET just loads one implementation of ArrayList, the rest is just
data.
Yes.

So according to you, the size of 2 structures should be about the
same.

No, not at all. There's all the difference in the world between a
million arrays of length 1 (each of which is an object) and one array
of length a million.
Because the first one is just a single ArrayList with allocated
memory for 1 million objects (4 bytes * 1 million).

That's the space required just for the cost of the array of ArrayList
references, not for the ArrayLists themselves.
The second scenario
is an array of 1 million ArrayList objects (4mb + tiny overhead) + space
that each ArrayList object allocated (4 bytes) - so that's another 4MB.

The cost of each ArrayList object is more than 4 bytes - it's about 24
bytes: 8 bytes overhead for being an object, 4 bytes for the size, 4
bytes for the internal versioning, 4 bytes for the reference to the
array, and 4 bytes for the sync root reference. Then there's the
overhead of the array itself - 8 bytes for being an object, plus 4
bytes for the size, plus 4 bytes for the single reference (in the case
of a length 1 array). There may be other overheads in there.
And since you mentioned that .NET just loads one implementation of
ArrayList, that should be pretty much it.

Unless I misundertood your original answer.
Btw, you are absolutely correct, it is 4mb vs 40 mb (i just measured it).

Do you understand now? I'm not sure that I've listed anything above
that I didn't list in my previous post...
 
Jon said:
No, not at all. There's all the difference in the world between a
million arrays of length 1 (each of which is an object) and one array
of length a million.




That's the space required just for the cost of the array of ArrayList
references, not for the ArrayLists themselves.




The cost of each ArrayList object is more than 4 bytes - it's about 24
bytes: 8 bytes overhead for being an object, 4 bytes for the size, 4
bytes for the internal versioning, 4 bytes for the reference to the
array, and 4 bytes for the sync root reference. Then there's the
overhead of the array itself - 8 bytes for being an object, plus 4
bytes for the size, plus 4 bytes for the single reference (in the case
of a length 1 array). There may be other overheads in there.




Do you understand now? I'm not sure that I've listed anything above
that I didn't list in my previous post...

Yes, thanks. I get it now. Something Willy said finally clicked in my
head. Code is loaded only once per type (not in the GC heap), state is
created per instance of that type. I grok it now. I have a object
model that has a lot of collections (array lists, hashtables) of other
collections. So I am having an effect where many thousand collection
objects get created each obviously consuming memory even though these
collections contain nothing. I will move those collections out as much
as possible to a higher-level object to alleviate the problem.

Regards
 
Back
Top