Why are arrays objects (on heap) instead of structs (on stack)?

G

Gomaw Beoyr

Hello

Is there any explanation why Microsoft chose to implement
arrays as objects allocated on the heap instead of structs
allocated on the stack?

For "mathematical stuff", one normally wishes to avoid
unnecessary garbage collection, but that is exactly what's
going to happen if a lot of arrays, matrixes etc are allocated
on the heap.


Consider e.g. this line at the beginning of a function:

double[,] a = new double[n, n];

If this function is called 10000 times with n = 100, that
yields around 800 MB to garbage-collect.


Ok, I realize that the case when arrays on the heap are obviously
better than arrays on the stack, is when the array is "reallocated"
to a bigger array, e.g.:

double[] a = new double[1000];
...
a = new double[2000];

This would obviously "break" the original array. So is this the
most significant reason for the decision to put all arrays on the
heap? (There are some alternative solutions to this particular
problem, but they are all somewhat "kludgy".)

/Gomaw
 
J

Justin Rogers

Couple of small things. First, the stack is only but so large. I'd think
you would run out of stack space if you allocated a sufficiently large
array. Second, arrays are objects because they don't have a fixed layout in
memory necessarily. They are made so that a single array can span multiple
chunks of memory (though I don't know if this is actually put to practice).

For mathematical stuff you may need to wait until generics are fully
implemented and the Array is rewritten a bit. Or you may just want to
allocate a fixed size array that you use over and over again if you keep
calling it from the same function. If the function can get called multiple
times on multiple threads, then use some OO programming and put everything
into an object pool that each stores an array instance and you can re-use
the objects from the pool as things complete.
Ok, I realize that the case when arrays on the heap are obviously
better than arrays on the stack, is when the array is "reallocated"
to a bigger array, e.g.:

double[] a = new double[1000];
...
a = new double[2000];

That does not reallocate a bigger array. You now have two double arrays,
one that is available for GC and is 1000 elements large and a second that is
2000 elements large and is now assigned to the local variable a. There is
no way to *grow* an array. They are always of fixed size. If you have a
logical array (let's denote it by the local variable a), and you need to
*grow* it, you have to allocate a new array, copy over elements from the
previous instance and finally assign the new array back to the local
variable. Some programmers consider this *growing* the array, but in fact
you are physically creating new arrays and doing quite a bit more work than
what would happen with a realloc statement in C.
 
G

Gomaw Beoyr

Couple of small things. First, the stack is only but so large. I'd think
you would run out of stack space if you allocated a sufficiently large
array. Second, arrays are objects because they don't have a fixed layout in
memory necessarily. They are made so that a single array can span multiple
chunks of memory (though I don't know if this is actually put to practice).

But in C and C++, aren't arrays by default on the stack? (I don't
know, though, exactly how memory is managed. Is the heap "bigger"
than the stack?)

For mathematical stuff you may need to wait until generics are fully
implemented and the Array is rewritten a bit.

Er? Will Array be rewritten when generics heve been implemented,
in a way that might make math stuff go faster? How?

Or you may just want to
allocate a fixed size array that you use over and over again if you keep
calling it from the same function. If the function can get called multiple
times on multiple threads, then use some OO programming and put everything
into an object pool that each stores an array instance and you can re-use
the objects from the pool as things complete.

Yes, I need to do something along these lines.

(At the moment, I just declare all "temporary" arrays as instance or
even class variables, i.e. they're only allocated once. And since I
normally only have one instance of the object, memory and
thread-safety is no big deal. However, the final code needs to be
more robust, of course.)

/Gomaw
 
M

Michael Mayer

Gomaw Beoyr said:
practice).

I'd be EXTREMELY surprised to ever see an array split up in memory.
Afterall, array's are fixed size (as Justin pointed out). When you create
an array (e.g. new Int32[100]) the runtime knows to allocate 400 bytes (plus
a little overhead) on the heap in one contiguous chunk. If that much room
isn't available, time for a GC. This means that indexing into the array
will only take the time for a quick bounds check (index >=0, index < length,
etc) and then take the base address and add the index to find the spot in
memory. To change that would be a rather poor choice.
But in C and C++, aren't arrays by default on the stack? (I don't
know, though, exactly how memory is managed. Is the heap "bigger"
than the stack?)

That depends on if you use the new keyword or not in C++

Stack based array:
int array1[100000];

Heap based array:
int* array1 = new int[100000];

I would tend to agree with Justin that creating large arrays on the stack
could create a stack overflow. (I got one earlier today in C++, but it was
after declaring three arrays of 1 million ints each, and then only after
calling printf, so I don't think it's an issue very often, but I'm a bit out
of touch with C++).
Er? Will Array be rewritten when generics heve been implemented,
in a way that might make math stuff go faster? How?

This is the first I've heard that Array would be rewritten for generics.
Isn't Array already strongly typecast:
(back to C# code:)
int[] arrayOfInts = new int[100000];

arrayOfInts[0] = new Object(); // should throw an exception...

I would have thought generics would be only to assist with collections like
ArrayList, Hashtable, etc. But I could very well be mistaken.
Yes, I need to do something along these lines.

(At the moment, I just declare all "temporary" arrays as instance or
even class variables, i.e. they're only allocated once. And since I
normally only have one instance of the object, memory and
thread-safety is no big deal. However, the final code needs to be
more robust, of course.)

Sounds like a good solution to me. For example, if you have a class to
perform an FFT on an input array of 1024 numbers AND you plan on calling
this class often and frequently, it probably makes sense to let the FFT
class be constructed with that size (1024), have the constructor allocate
any internal arrays that are needed, and keep them in memory all the time
(until the class as a whole is GC'ed). Another choice, balancing speed of
allocating / GCing with memory consumption would be to use a "weak
reference". Look that up in msdn or google.
 
W

Willy Denoyette [MVP]

Michael Mayer wrote:
|| This is the first I've heard that Array would be rewritten for
|| generics. Isn't Array already strongly typecast:
|| (back to C# code:)
|| int[] arrayOfInts = new int[100000];
||
|| arrayOfInts[0] = new Object(); // should throw an exception...
||
|| I would have thought generics would be only to assist with
|| collections like ArrayList, Hashtable, etc. But I could very well
|| be mistaken.
||

You are correct, Array isn't/shouldn't be rewritten to accomodate generics.

Willy.
 
G

Gomaw Beoyr

Another choice, balancing speed of
allocating / GCing with memory consumption would be to use a "weak
reference". Look that up in msdn or google.

Yes, I did, and they will probably help much. Thanks!

/Gomaw
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top