Garbage collectable pinned arrays!

Atmapuri

Hi!

It would be great if pinned arrays were garbage
collectable. This would greatly reduce the amount of copying
back and forth between managed and unmanaged memory.
I can't think of a single reason why the GC should not allow
this.

It would be also great, if the memory could already be allocated
as pinned:

double[] array= new pinned uninitialized double [500];

where uninitialized would mean that it is not set to zero, thus
saving processing power where GC pressure is high.

Thanks!
Atmapuri
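
For readers on current runtimes, here is a minimal sketch of something close to this request. It assumes .NET 5 or later, which postdates this thread; the class and method names are made up for illustration. GC.AllocateUninitializedArray and GC.AllocateArray both accept a pinned flag, and an array allocated this way keeps a fixed address for its lifetime while still being collected once nothing references it.

```csharp
using System;

public static class PinnedAllocationSketch
{
    // Hypothetical helper: returns an array that the GC will never move.
    // The contents are not guaranteed to be zeroed, so write before reading.
    public static double[] AllocatePinnedUninitialized(int length)
    {
        return GC.AllocateUninitializedArray<double>(length, pinned: true);
    }

    // Hypothetical helper: same fixed placement, but zero-initialized.
    public static double[] AllocatePinnedZeroed(int length)
    {
        return GC.AllocateArray<double>(length, pinned: true);
    }
}
```

A call such as PinnedAllocationSketch.AllocatePinnedUninitialized(500) would stand in for the "new pinned uninitialized double [500]" syntax proposed above.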
 
Lasse Vågsæther Karlsen

Atmapuri said:
Hi!

It would be great if pinned arrays were garbage
collectable. This would greatly reduce the amount of copying
back and forth between managed and unmanaged memory.
I can't think of a single reason why the GC should not allow
this.

Wouldn't that mean you would need two flags for each such allocation:

1. fixed address
2. available for garbage collection

AFAIK, the pin flag is to be used when you pass a buffer or similar to
unmanaged code (or unsafe code using pointers) and rely on the garbage
collector not moving it about and perhaps not even being able to
determine when it has run out of scope.

For instance, the following two problems are solved by the dual function
of the pin flag:

You create a buffer, pass it to an unmanaged function, and the function
takes note of the address and returns. Other unmanaged functions are not
given the address for each call but rely on the original address that
was stored. In this scenario, the garbage collector should not move the
memory around.

You create a buffer, fill it with data and call an unmanaged function.
This function spawns a thread to process the data and returns. Your
managed code does not use the buffer any more, but will need to have it
pinned for the duration of the background processing to avoid it being
yanked away beneath the thread.

Or perhaps I'm misunderstanding something and mixing the pin flag with
something else?
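
The two scenarios above can be sketched with a small wrapper that holds a pinned GCHandle for as long as unmanaged code may use the buffer. This is only an illustration: the PinnedBuffer class and its members are hypothetical names, not something from the framework.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: keeps a byte[] alive and at a fixed address
// until Dispose is called.
public sealed class PinnedBuffer : IDisposable
{
    private GCHandle _handle;

    public byte[] Data { get; }
    public IntPtr Address { get; }

    public PinnedBuffer(int size)
    {
        Data = new byte[size];
        // A pinned handle is a strong reference, so the array is neither
        // moved nor collected while the handle is allocated.
        _handle = GCHandle.Alloc(Data, GCHandleType.Pinned);
        Address = _handle.AddrOfPinnedObject();
    }

    public void Dispose()
    {
        // After Free, the GC may move or collect the array again.
        if (_handle.IsAllocated)
        {
            _handle.Free();
        }
    }
}
```

The address would be handed to the unmanaged function once, and Dispose would be called only after the unmanaged side has signalled that it is done with the buffer.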
 
Jesse McGrew

Hi!

It would be great if pinned arrays were garbage
collectable. This would greatly reduce the amount of copying
back and forth between managed and unmanaged memory.
I can't think of a single reason why the GC should not allow
this.

It would be also great, if the memory could already be allocated
as pinned:

double[] array= new pinned uninitialized double [500];

where uninitialized would mean that it is not set to zero, thus
saving processing power where GC pressure is high.

As I understand it, pinning is an attribute of the *reference*, not
the object itself. The object is pinned when there's a pinning
reference to it anywhere on the call stack.

Thus, it doesn't make sense for a pinned object to be garbage
collectible. The object can't be considered garbage anyway while
you're still holding a reference to it, and once you let go of the
last reference, it's no longer pinned.

I don't see why you'd even want it to be collectible, actually. The
point of pinning is to let unmanaged code access the object without
worrying that the GC will move it. But collecting the object and
letting its memory be used for something else is just as dangerous as
moving it!

Jesse
 
Atmapuri

Hi!
As I understand it, pinning is an attribute of the *reference*, not
the object itself. The object is pinned when there's a pinning
reference to it anywhere on the call stack.

Thus, it doesn't make sense for a pinned object to be garbage
collectible. The object can't be considered garbage anyway while
you're still holding a reference to it, and once you let go of the
last reference, it's no longer pinned.

I don't see why you'd even want it to be collectible, actually. The
point of pinning is to let unmanaged code access the object without
worrying that the GC will move it. But collecting the object and
letting its memory be used for something else is just as dangerous as
moving it!

You are mixing two points:

- reference to unmanaged memory, where the word pinned also
means that it won't be collected.
- location of the array in GC (Heap or else).

When the array is pinned it is copied to heap. All I would like
to see is an option to allocate the array on the heap initially.

The array is automatically allocated on the heap once it exceeds
a certain size and thus becomes "pinned", because the heap is
never compacted and all addresses are absolute.

Therefore, the pinned keyword during memory allocation would
only instruct the GC where to put the array. From there on it
can work as usual.

The reason to put the array on the heap is to avoid the
need to pin it, if you have to pass it to unmanaged code
multiple times. Thus in turn improving performance in
such cases.

So, I would like to specify that the memory is allocated
on the heap, and before passing it to the external code,
I can still call GC.KeepAlive(), to prevent the array from
being collected if needed. That is still less code than
calling GCHandle.Alloc and multiple times faster than
current implementation which copies the array.

The GC can still track which arrays have
been declared as pinned and make sure that they
are not collected while unmanaged code is executing.
Same as now.

Regards!
Atmapuri
 
Jesse McGrew

Hi!




You are mixing two points:

- reference to unmanaged memory, where the word pinned also
means that it won't be collected.
- location of the array in GC (Heap or else).

When the array is pinned it is copied to heap. All I would like
to see is an option to allocate the array on the heap initially.

The array is automatically allocated on the heap once it exceeds
a certain size and thus becomes "pinned", because the heap is
never compacted and all addresses are absolute.

Really? I thought arrays were always allocated on the heap, like any
other reference type. And why do you say the heap is never compacted?

Jesse
 
Atmapuri

Hi!
You create a buffer, fill it with data and call an unmanaged function.
This function spawns a thread to process the data and returns. Your
managed code does not use the buffer any more, but will need to have it
pinned for the duration of the background processing to avoid it being
yanked away beneath the thread.

Or perhaps I'm misunderstanding something and mixing the pin flag with
something else?

You understand well :) In my case the word "pinned" refers only to the
"fixed address" and not to "available for the garbage collection",
(if there are no other references of course).

The idea is to reduce the overhead which comes from the fact that
array is not allocated on the "fixed address".

Currently, when you need to have the array at a fixed address, you have
to copy it first to the heap. (GCHandle.Alloc does that for you.) If you
pass that same array multiple times to external unmanaged code, that
results in considerable performance penalties, because that same
array is copied multiple times, if you want to keep your code parts
independent.

Thanks!
Atmapuri
 
Jon Skeet [C# MVP]

Atmapuri said:
You are mixing two points:

- reference to unmanaged memory, where the word pinned also
means that it won't be collected.
- location of the array in GC (Heap or else).

When the array is pinned it is copied to heap. All I would like
to see is an option to allocate the array on the heap initially.

Unless you use stackalloc or fixed-size buffers, arrays are always
allocated on the heap.
The array is automatically allocated on the heap once it exceeds
a certain size and thus becomes "pinned", because the heap is
never compacted and all addresses are absolute.

The heap absolutely *is* compacted. Only the large object heap isn't
compacted as I recall - is that what you're talking about?
Therefore, the pinned keyword during memory allocation would
only instruct the GC where to put the array. From there on it
can work as usual.

The reason to put the array on the heap is to avoid the
need to pin it, if you have to pass it to unmanaged code
multiple times. Thus in turn improving performance in
such cases.

So, I would like to specify that the memory is allocated
on the heap, and before passing it to the external code,
I can still call GC.KeepAlive(), to prevent the array from
being collected if needed. That is still less code than
calling GCHandle.Alloc and multiple times faster than
current implementation which copies the array.

If you do that without pinning, your code will break when the GC
compacts the generation containing the array.
 
Lasse Vågsæther Karlsen

Jesse said:
Really? I thought arrays were always allocated on the heap, like any
other reference type. And why do you say the heap is never compacted?

Jesse

There are two heaps: The normal heap and the "large object heap". I
don't know if they have different names or whatnot, and I think the size
threshold is around 85kb, but that is probably an implementation detail.

Objects above that size are allocated on "that other heap", which is not
garbage collected the same way as "the main heap". I'm sure Jon or
someone else can correct my use(?) of names here, or details in general
for that matter :)

If I understand the OP correctly, objects put on the large object heap
are "pinned automatically" because the main reason they're put on this
heap is that they're so big that moving them around is considered a
performance problem.

For some reason I have this nagging feeling that I'm missing something
here... On the other hand, it could just be a fever...
 
Willy Denoyette [MVP]

Atmapuri said:
Hi!


You are mixing two points:
- reference to unmanaged memory, where the word pinned also
means that it won't be collected.
- location of the array in GC (Heap or else).

When the array is pinned it is copied to heap. All I would like
to see is an option to allocate the array on the heap initially.

Arrays are reference types, so they are always stored on the heap!
The array is automatically allocated on the heap once it exceeds
a certain size and thus becomes "pinned", because the heap is
never compacted and all addresses are absolute.

Arrays that exceed a certain size (currently about 85 KB) are allocated on the
Large Object Heap, but they aren't pinned by this. The LOH is not compacted,
but that doesn't mean that the objects cannot get collected.
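
One rough way to see which heap a new array lands on is to ask the GC for its generation right after allocation. This is a sketch only; the class and method names are hypothetical, and the exact size threshold is an implementation detail of the runtime.

```csharp
using System;

public static class HeapPlacementSketch
{
    // Hypothetical helper: allocates a byte[] of the given size and reports
    // the generation the GC assigns to it immediately afterwards.
    public static int GenerationOfNewArray(int byteCount)
    {
        byte[] buffer = new byte[byteCount];
        return GC.GetGeneration(buffer);
    }
}
```

On the Microsoft CLR, a small array such as GenerationOfNewArray(1000) typically reports generation 0, while GenerationOfNewArray(100000) typically reports generation 2, because large objects are accounted for together with the oldest generation. Either array is still collected once it becomes unreachable.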

Therefore, the pinned keyword during memory allocation would
only instruct the GC where to put the array. From there on it
can work as usual.

The reason to put the array on the heap is to avoid the
need to pin it, if you have to pass it to unmanaged code
multiple times. Thus in turn improving performance in
such cases.

Arrays are always on the heap. When passed to unmanaged land, all
heap-allocated objects must be pinned in order to avoid relocation by the
*compacting* GC.
So, I would like to specify that the memory is allocated
on the heap, and before passing it to the external code,

What memory are you talking about? Reference types are always on the heap,
arrays are reference types so.....
I can still call GC.KeepAlive(), to prevent the array from
being collected if needed. That is still less code than
calling GCHandle.Alloc and multiple times faster than
current implementation which copies the array.

Not sure what you mean here, mind to elaborate?
The GC can still track which arrays have
been declared as pinned and make sure that they
are not collected while unmanaged code is executing.
Same as now.

Not sure what you mean here, mind to elaborate?

Willy.
 
Willy Denoyette [MVP]

Atmapuri said:
Hi!


You understand well :) In my case the word "pinned" refers only to the
"fixed address" and not to "available for the garbage collection",
(if there are no other references of course).

The idea is to reduce the overhead which comes from the fact that
array is not allocated on the "fixed address".

Currently, when you need to have the array at a fixed address, you have
to copy it first to the heap. (GCHandle.Alloc does that for you.) If you
pass that same array multiple times to external unmanaged code, that
results in considerable performance penalties, because that same
array is copied multiple times, if you want to keep your code parts
independent.
Thanks!
Atmapuri

Pinning is used to prevent object collection when passing a reference to it
to unmanaged code, the GC doesn't know what happens in unmanaged code and
considers the object as garbage (assuming there's no other live reference to
the object).

Don't know why you keep talking about copying when passing arrays to
unmanaged code. In general, no copying is done, but it all depends on the
context and the version of the JIT/CLR. The 64-bit JIT may copy arrays to
an internal buffer when passing an array to unmanaged code; this is done to
avoid pinning for the duration of the call. Pinning negatively impacts overall
performance, which is why MS has opted for a copy operation instead of a
pinning operation.

Willy.
 
Atmapuri

Hi!
If I understand the OP correctly, objects put on the large object heap
are "pinned automatically" because the main reason they're put on this
heap is that they're so big that moving them around is considered a
performance problem.

There is also another reason why objects are put on this heap. When you
pin them :)
For some reason I have this nagging feeling that I'm missing something
here... On the other hand, it could just be a fever...

Well, you understand and agree with everything but...? If you
have a sequence:

function1();
function2();
function3();
...

And each function passes the same array to unmanaged memory.
Do you realize that performance is lost, because of pinning?

That the primary reason for performance loss is the copy operation
because of pinning?

That if the arrays were already allocated on the large object
heap, this performance loss would not occur, because the need
for copying would be gone?

Thanks!
Atmapuri
 
Atmapuri

Hi!
Arrays are reference types, so they are always stored on the heap!

The heap as I understand is not compactable, which in this
case means the "large object heap". This is what I meant
under "heap".
Arrays that exceed a certain size (currently about 85 KB) are allocated on the
Large Object Heap, but they aren't pinned by this. The LOH is not compacted,
but that doesn't mean that the objects cannot get collected.

Of course.

I want an option:

- for the arrays to have a fixed address. (!!)
- still be collectable (!!)

If the arrays are not pinned they will still not be collected, if
all references to the array also have not been invalidated.

If the array has a fixed address and its only reference is passed
to unmanaged code, you can ensure that it will not be collected
by putting GC.KeepAlive(array) after the unmanaged code call.

By removing the need to pin the array for every call to unmanaged
code, you gain speed.

Thanks!
Atmapuri
 
Jon Skeet [C# MVP]

Well, you understand and agree with everything but...? If you
have a sequence:

function1();
function2();
function3();
..

And each function passes the same array to unmanaged memory.
Do you realize that performance is lost, because of pinning?

That the primary reason for performance loss is the copy operation
because of pinning?

Where is that copying taking place though? Pinning doesn't inherently
involve copying.
 
Atmapuri

Hi!
Where is that copying taking place though? Pinning doesn't inherently
involve copying.

Yes. It does. All arrays shorter than the limit are allocated inside
the "compactable" part of the GC heap and thus have to be copied
out of the "compactable" heap before they can have a "fixed" address.

Thanks!
Atmapuri
 
Atmapuri

Hi!
Pinning is used to prevent object collection when passing a reference to
it to unmanaged code, the GC doesn't know what happens in unmanaged code
and considers the object as garbage (assuming there's no other live
reference to the object).

Don't know why you keep talking about copying when passing arrays to
unmanaged code. In general, no copying is done, but it all depends on the
context and the version of the JIT/CLR. The 64-bit JIT may copy arrays to
an internal buffer when passing an array to unmanaged code; this is done to
avoid pinning for the duration of the call. Pinning negatively impacts
overall performance, which is why MS has opted for a copy operation instead
of a pinning operation.

OK. Allow me to rephrase my wish:

double[] a = new fixed double[20];

It should be evident from your explanation that if the array had a
fixed address, the copy operation which Microsoft opted for would
not be necessary.

So please: Add a keyword that would allow allocation of the array with
a fixed address (!).

Thanks!
Atmapuri
 
Jon Skeet [C# MVP]

Yes. It does. All arrays shorter than the limit are allocated inside
the "compactable" part of the GC heap and thus have to be copied
out of the "compactable" heap before they can have a "fixed" address.

I was under the impression that pinned objects live where they are,
but potentially cause heap fragmentation:
http://msdn.microsoft.com/msdnmag/issues/06/11/CLRInsideOut/default.aspx

That's still a performance hit, but it's not copying. Do you have
evidence that pinning involves copying?

Jon
 
Atmapuri

Hi!
That's still a performance hit, but it's not copying. Do you have
evidence that pinning involves copying?

It does copying and it does cause "fragmentation" and a performance hit
due to fragmentation. Both at once. (But that fragmentation issue is
actually the same fragmentation issue as with all unmanaged code apps,
including Windows.)

However, the copying is from small object heap to the large object heap,
And the fragmentation is in the large object heap, because the small object
heap which is compactable of course can not be fragmented.

You can check that arrays are copied by allocating ever
larger arrays and pinning them down and measuring the time it takes
to do that for each array size.

You will see that beyond a certain array size, the pinning cost becomes
zero. The timings must be normalized by the array length, however;
otherwise it is harder to see. When the pinning cost becomes zero, this
means that the array is so large that it is now allocated on the large
object heap from the start and no longer needs to be copied there.
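
A measurement along these lines could be sketched roughly as follows. The class and method names are hypothetical, and the sketch pins one pre-allocated array repeatedly rather than allocating a fresh array per measurement, which is an assumption about how the test would be set up.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class PinTimingSketch
{
    // Hypothetical helper: average time, in nanoseconds, of one
    // pin/unpin pair for a double[] of the given length.
    public static double AveragePinNanoseconds(int elementCount, int iterations)
    {
        double[] array = new double[elementCount];
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
            handle.Free();
        }
        watch.Stop();
        // Keep the array reachable until the loop has finished.
        GC.KeepAlive(array);
        return watch.Elapsed.TotalMilliseconds * 1000000.0 / iterations;
    }
}
```

Calling it for a range of lengths, for example 100, 1000, 10000 and 100000 elements, gives one number per size to compare. A timing like this only shows how long the pin/unpin pair takes; on its own it says nothing about what the runtime spends that time on.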

If however, there would be a language feature which would allow you
to specify that the array could be allocated on the large object
heap regardless of its size.... you could save yourself a lot of
copy operations when interfacing unmanaged code.

Currently the GC decides where the array goes:
- in the small object heap which is compactable
- large object heap which is not compactable

Please give us a C# language feature where the programmer
decides where the arrays go, because only the programmer
knows how they will be used.

Thanks!
Atmapuri
 
Jon Skeet [C# MVP]

Atmapuri said:
It does copying and it does cause "fragmentation" and a performance hit
due to fragmentation. Both at once. (But that fragmentation issue is
actually the same fragmentation issue as with all unmanaged code apps,
including Windows.)

However, the copying is from small object heap to the large object heap,
And the fragmentation is in the large object heap, because the small object
heap which is compactable of course can not be fragmented.

That's certainly not what I've been reading.

See

http://blogs.msdn.com/maoni/archive/2005/10/03/so-what-s-new-in-the-clr-2-0-gc.aspx

and

http://blogs.msdn.com/maoni/archive/2004/12/19/327149.aspx

In particular, from the latter:

<quote>
Pinning for a short time is cheap.

How short is "a short time"? Well, if there's no GC happening, pinning
simply sets a bit in the object header and unpinning simply clears it.
</quote>

How does that fit in with your claim that the data is being copied?
You can check that arrays are copied by allocating ever
larger arrays and pinning them down and measuring the time it takes
to do that for each array size.

No, that's not checking that arrays are being copied. That's checking
that *something* is taking time. You're *assuming* that's because of
copying, unless you've got more evidence to back your assertion that it
really is copying.

Please give us a C# language feature where the programmer
decides where the arrays go, because only the programmer
knows how they will be used.

Frankly, when you require that much control over how memory is used,
I'd consider writing unmanaged code instead.
 
Lasse Vågsæther Karlsen

Atmapuri said:
Hi!


Yes. It does. All arrays shorter than the limit are allocated inside
the "compactable" part of the GC heap and thus have to be copied
out of the "compactable" heap before they can have a "fixed" address.

Thanks!
Atmapuri

I'd very much like to know where you have this from.

I tried the following code:

Object o = 10;
Debug.WriteLine(o);
GCHandle h = GCHandle.Alloc(o, GCHandleType.Pinned);
Debug.WriteLine(o);
Text = ((Int32)h.AddrOfPinnedObject()).ToString("X8");
h.Free();

// Just for security
GC.KeepAlive(o);
GC.KeepAlive(h);

I then ran this code through the CLR Debugger, and the following code is
the result:

Object o = 10;
00000030 mov ecx,79102290h
00000035 call FFA31CE4
0000003a mov esi,eax
0000003c mov dword ptr [esi+4],0Ah
00000043 mov edi,esi
Debug.WriteLine(o);
00000045 mov ecx,edi
00000047 call 795EAAD8
0000004c nop
GCHandle h = GCHandle.Alloc(o, GCHandleType.Pinned);
0000004d mov ecx,edi
0000004f mov edx,3
00000054 call 78450ABC
00000059 mov esi,eax
0000005b mov dword ptr [ebp-48h],esi
Debug.WriteLine(o);
0000005e mov ecx,edi
00000060 call 795EAAD8
00000065 nop
Text = ((Int32)h.AddrOfPinnedObject()).ToString("X8");
00000066 mov esi,dword ptr [ebp-3Ch]
00000069 lea ecx,[ebp-48h]
0000006c call 7847B8F8
00000071 mov ebx,eax
00000073 mov ecx,ebx
00000075 call 7844DBB8
0000007a mov ebx,eax
0000007c mov dword ptr [ebp-4Ch],ebx
0000007f lea ecx,[ebp-4Ch]
00000082 mov edx,dword ptr ds:[02313048h]
00000088 call 784D4C70
0000008d mov ebx,eax
0000008f mov edx,ebx
00000091 mov ecx,esi
00000093 mov eax,dword ptr [ecx]
00000095 call dword ptr [eax+00000168h]
0000009b nop
h.Free();
0000009c lea ecx,[ebp-48h]
0000009f call 7847BAA8
000000a4 nop

// Just for security
GC.KeepAlive(o);
000000a5 mov ecx,edi
000000a7 call 78FE30D8
000000ac nop
GC.KeepAlive(h);
000000ad mov ecx,79107774h
000000b2 call FFA31CE4
000000b7 mov esi,eax
000000b9 mov eax,dword ptr [ebp-48h]
000000bc mov dword ptr [esi+4],eax
000000bf mov ecx,esi
000000c1 call 78FE30D8
000000c6 nop

As you can see, the magic two lines of code that calls Debug.WriteLine
are the ones at address 45/47, and 5e/60.

When executing the code, from your statement, the value in ecx in the
above two executions of the same code should change, but it doesn't.
It's the same address passed to .WriteLine in both cases.

Additionally, the "Text = " line puts the address as reported by the
GCHandle object into the caption of the form I put this code in, and the
following is what I get during one execution:
ecx : 013ACCEC
text: 013ACCF0

The difference is 4 bytes, which amounts to the VMT pointer or whatever
they call it in .NET (or perhaps something else... or perhaps I'm just
wrong).

As such, it doesn't appear to involve any kind of copying at all, as I
would gather from your statement that the address would then change more.

Can you please elaborate on why you think there is copying involved?

Or perhaps I'm doing something horribly wrong here...
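
A complementary check that needs no debugger is to write through the pinned address and then read the managed array. This is a sketch with hypothetical names; it shows that the address returned by the handle is the array's own storage rather than a separate copy, but it does not show where the array lived before it was pinned.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinWriteThroughSketch
{
    // Hypothetical helper: returns true when a write through the pinned
    // address is visible in the managed array.
    public static bool PinnedAddressAliasesArray()
    {
        int[] data = { 1, 2, 3, 4 };
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // For an array, this is the address of the first element.
            IntPtr address = handle.AddrOfPinnedObject();

            // Write through the raw address, as unmanaged code would.
            Marshal.WriteInt32(address, 0, 42);

            // If the handle pointed at a detached copy, data[0] would still be 1.
            return data[0] == 42;
        }
        finally
        {
            handle.Free();
        }
    }
}
```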
 
Willy Denoyette [MVP]

Atmapuri said:
Hi!
Pinning is used to prevent object collection when passing a reference to
it to unmanaged code, the GC doesn't know what happens in unmanaged code
and considers the object as garbage (assuming there's no other live
reference to the object).

Don't know why you keep talking about copying when passing arrays to
unmanaged code. In general, no copying is done, but it all depends on the
context and the version of the JIT/CLR. The 64-bit JIT may copy arrays to
an internal buffer when passing an array to unmanaged code; this is done to
avoid pinning for the duration of the call. Pinning negatively impacts
overall performance, which is why MS has opted for a copy operation instead
of a pinning operation.

OK. Allow me to rephrase my wish:

double[] a = new fixed double[20];

It should be evident from your explanation that if the array had a
fixed address, the copy operation which Microsoft opted for would
not be necessary.
No, but the GC must keep track of the object and prevent it from moving while
unmanaged code is running (assuming you have passed the address to
unmanaged code). Also, the PInvoke layer will not copy a double[] of that size
when passing it to unmanaged code; it will pin it. Copying is only done when
running 64-bit code and only in certain scenarios. Believe me, the MS
performance folks know what they are doing; don't try to outsmart the
system.
So please: Add a keyword that would allow allocation of the array with
a fixed address (!).
What makes you think I can do this?
 
