Garbage collectable pinned arrays!

Atmapuri

Hi!
And for interop buffers with a long lifetime, the cost of pinning is very
high, yet the fragmentation impact wouldn't be a problem at all.

For example, I have an application which uses glVertexPointer. I need to
interop that buffer every single frame for the life of my program; there's
no reason it shouldn't sit in the LOH and avoid the overhead of pinning
(both the direct cost and the extra fragmentation of the Gen0 heap:
because the buffer is pinned it can never be promoted to Gen2).
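A minimal sketch of the pattern being described: a buffer pinned once for its whole lifetime so its address stays valid across frames. The `VertexBuffer` wrapper name is made up for illustration; a real program would hand `Address` to glVertexPointer.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: pins a managed array for its whole lifetime so
// native code (e.g. OpenGL) can keep reading through a stable address.
class VertexBuffer : IDisposable
{
    private readonly float[] data;
    private GCHandle handle;   // pinned handle: the GC may neither collect nor move data

    public VertexBuffer(int elementCount)
    {
        data = new float[elementCount];
        handle = GCHandle.Alloc(data, GCHandleType.Pinned);
    }

    public float[] Data { get { return data; } }

    // Stable address of the first element, valid until Dispose().
    public IntPtr Address { get { return handle.AddrOfPinnedObject(); } }

    public void Dispose()
    {
        if (handle.IsAllocated) handle.Free();
    }
}

class Program
{
    static void Main()
    {
        using (VertexBuffer vb = new VertexBuffer(1024))
        {
            vb.Data[0] = 1.0f;
            Console.WriteLine("pinned at 0x{0:X}", vb.Address.ToInt64());
        }
    }
}
```

This is exactly the long-term pinning the thread argues about: the handle is held for the object's whole life rather than per call.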

God bless you :) Finally a real man (!)
Atmapuri
 
Lasse Vågsæther Karlsen

Lasse said:
Atmapuri said:
Hi!

I'd very much like to know where you have this from.

Common knowledge? <g>

Since we're several people here arguing that pinning does not, to our
knowledge, copy data, I would say that it doesn't sound to me like
it's common knowledge.
I tried the following code:

Object o = 10;
Debug.WriteLine(o);
GCHandle h = GCHandle.Alloc(o, GCHandleType.Pinned);
Debug.WriteLine(o);
Text = ((Int32)h.AddrOfPinnedObject()).ToString("X8");
h.Free();

// Just for security
GC.KeepAlive(o);
GC.KeepAlive(h);

I am using arrays larger than 80 bytes, up to 100 KB.
Pinning 4-byte objects and large arrays is handled vastly
differently by the GC. Time something like this:

for (int k = 0; k < GCIterCount; k++)
{
    testArray = new double[testArrayLength];
    testArray[2] = 2;
}

Such that GCIterCount*testArrayLength is a constant value.

I don't exactly understand what you're timing here. It looks to me as
if you're timing things up to the point where GC kicks in. If you keep
testArrayLength constant, you allocate that sized array "GCIterCount"
times.

Please write complete code so that you give us something I can just
copy and paste into a compiler and try.
When testArrayLength reaches 1024 elements (80kBytes) you will see a
big jump in the cost of the allocation. All timings must be normalized
with (GCIterCount*testArrayLength). That will give the allocation cost
per element as a function of the array length.

I tried this:

for (Int32 index = 100; index < 1000000; index += 1000)
{
    DateTime dt1 = DateTime.Now;
    Int32[] values = new Int32[index];
    DateTime dt2 = DateTime.Now;
    for (Int32 j = 0; j < 1000000; j++)
    {
        GCHandle h = GCHandle.Alloc(values);
        h.Free();
    }
    DateTime dt3 = DateTime.Now;
    Console.WriteLine(String.Format("{0}: {1}", index, (dt3 - dt1).TotalSeconds));
}

This gave me fairly constant times for all the pinning involved,
regardless of the size of the array.

Please poke holes in the code and tell me what I'm doing wrong.

I realize that the date calculation here should be dt3 - dt2, not - dt1,
however changing this does not change the end result.

Here's a snippet of my results (with dt3-dt2):

122100: 0,297616
123100: 0,31328
124100: 0,297616
125100: 0,297616
126100: 0,297616
127100: 0,281952
128100: 0,31328
129100: 0,297616
130100: 0,31328
131100: 0,297616
132100: 0,31328
133100: 0,297616
134100: 0,297616
135100: 0,31328
136100: 0,297616
137100: 0,297616
138100: 0,297616
139100: 0,328944
140100: 0,297616
141100: 0,297616
142100: 0,31328
143100: 0,31328
144100: 0,297616
145100: 0,31328

The values are around 0.3 regardless of how large or small the array is,
so clearly (to me) pinning seems to have a reasonably constant cost.
I was also notified by atmapuri that I had forgotten to use
GCHandleType.Pinned. Altering the example to the following code does not
change my results.

for (Int32 index = 100; index < 1000000; index += 1000)
{
    DateTime dt1 = DateTime.Now;
    Int32[] values = new Int32[index];
    DateTime dt2 = DateTime.Now;
    for (Int32 j = 0; j < 1000000; j++)
    {
        GCHandle h = GCHandle.Alloc(values, GCHandleType.Pinned);
        h.Free();
    }
    DateTime dt3 = DateTime.Now;
    Console.WriteLine(String.Format("{0}: {1}", index, (dt3 - dt2).TotalSeconds));
}

The same type of values still appear for all values attempted.
 
Ben Voigt [C++ MVP]

Willy said:
Why do you think the costs of pinning are larger when passing buffers
with a long lifetime? And how did you measure the cost of pinning?
What GC mode are you using?

Because, as you said, pinning isn't designed to be done for an extended
length of time. Yet these buffers can't be allowed to move because OpenGL
is using them.
Are you talking about PInvoke interop here?

Yes, but the buffers are not pinned by p/invoke, because the pointer is
passed to glVertexPointer but the buffer address needs to remain valid until
glDrawElements, so I have to use a pinning GCHandle.

I will probably rewrite all of it in C++/CLI when I have some time, I
inherited this code from another developer and have been incrementally
improving it (adding display lists, vectorizing, etc).

Look at the mess that using OVERLAPPED structures inside the BCL goes
through... it requires special help from the runtime via
System.Threading.OverlappedData.AllocateNativeOverlapped() to avoid
long-term pinning in the Gen0 heap and horrible fragmentation. Why can't we
get that same help for our own buffers!?!
 
Atmapuri

Hi!
What you are measuring here is the cost of allocating a pinned object
handle, which is quite expensive. However, this is not what happens
when you pass an argument to unmanaged code using PInvoke: the
arguments passed are only pinned when the GC kicks in; the GC (and the
JIT) cooperate with the interop layer for this. So you only pay
for this when the GC runs, and the cost of pinning is low compared to the
actual GC cost.

This is not a valid example, because I also need to pass subarrays from
index "i" onward, for example. In that case, again, a copy operation would
be required to create a new array. Pinning thus cannot in reality be
avoided.
Again, write a small program that calls a C function that takes an array,
and look at the native code when calling that function, you won't see any
"pinning" at all.
The array will only get pinned (by the CLR's interop layer) when you force
a GC run on another thread while you are actually executing your function.
But again, watch out for the context, the semantics of the call and the GC
mode and the CLR and JIT version, the interop layer may do things other
than pinning. Like I said the JIT64 may copy the array to an internal
buffer before passing to unmanaged code.
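Not part of the thread, but the "does pinning copy the data?" dispute can be tested directly: compare the address reported by a pinned GCHandle with the address of the first element obtained via `fixed`. If pinning copied the array elsewhere, the two addresses would differ. A sketch (requires compiling with /unsafe):

```csharp
using System;
using System.Runtime.InteropServices;

class PinningCheck
{
    // Returns true when the pinned handle's address is the address of
    // the array's first element, i.e. no copy was made by pinning.
    public static unsafe bool PinnedAddressMatchesFirstElement()
    {
        int[] values = new int[1000];
        GCHandle h = GCHandle.Alloc(values, GCHandleType.Pinned);
        try
        {
            fixed (int* p = values)   // &values[0]
            {
                return (IntPtr)p == h.AddrOfPinnedObject();
            }
        }
        finally
        {
            h.Free();
        }
    }

    static unsafe void Main()
    {
        Console.WriteLine(PinningCheck.PinnedAddressMatchesFirstElement()); // True
    }
}
```

AddrOfPinnedObject is documented to return the address of the array's data, so both pointers refer to the same storage; nothing is copied.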

Well, the cost of the pinning per element is a linear function of the
element count.
Not exactly, fragmentation occurs when the GC leaves some gaps because it
cannot compact regions of the GC heap (the gen0, 1 and 2 heaps). The order
of deallocation is not determined by the order of allocation, some objects
may live longer than others, besides the order of de-allocation is
non-deterministic.

OK. I read the doc on CLR 2.0 improvements and I see your point
(still does not make my code run fast, though).
Willy said:
The PInvoke layer doesn't make any distinction between LO's and non-LO's
when pinning (this is what we are talking about isn't it?), "pinning" must
be done to:
1) prevent the object from moving when the GC comes along, and
2) prevent premature GC, whether the object is on the LOH (fixed address)
is a non issue.
Besides, it's not because the LOH is not currently compacted that MS
cannot decide to implement this in a later version, or they could decide
to change the threshold to a higher value, for instance.

Ok. So, as I stated before, I am only interested in ensuring that the
memory is not moved and that there is no secondary cost associated with
this.

When and how it is collectable can be ensured with GC.KeepAlive().
You're taking a *managed* environment and saying "Actually, I want to
manage this myself."

No need to feel too frantic about anything that does not work <g>
Especially if it does not break existing code and provides extra
performance.

Thanks!
Atmapuri
 
Ben Voigt [C++ MVP]

This is not what pinned means in this context. The objects on the LOH
(all versions of the CLR) are at a fixed address for their lifetime,
but that doesn't mean they are pinned. Pinning is an explicit action

Ok, but then they don't require pinning. In any case they inherently have
the fixed address that normal objects require pinning to obtain.
It would be useful to request that a particular buffer not be
subject to relocation by the GC. Probably the easiest way to do
this would be to place it in the LOH. The OLE task allocator or
HGlobal allocator, both of which are already exposed by the Marshal
class in a typeless way, would be other options. It could be as
simple as adding a T[] Marshal.AllocCoTaskMem<T>(int elementCount)
override.

But now you are allocating from the unmanaged heap (COM heap or CRT
heap or whatever). So now you will incur the costs of copying back
and forth, again this depends on the semantics, but might be a
solution when you need to pass large data chunks to unmanaged land.

Why? .NET could create a proper array descriptor storing the metadata
alongside and access it directly.
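What's available today without the proposed generic overload is roughly this: allocate an unmanaged block (which the GC never moves) and copy data in and out explicitly with Marshal.Copy. This round-trip copy is the cost the reply above refers to. A sketch (names `UnmanagedBufferSketch`/`RoundTripFirstElement` are illustrative):

```csharp
using System;
using System.Runtime.InteropServices;

class UnmanagedBufferSketch
{
    // Copies a managed array into unmanaged memory and back, returning
    // the first element to show the data survived the round trip.
    public static int RoundTripFirstElement()
    {
        int[] managed = { 42, 1, 2, 3 };
        IntPtr native = Marshal.AllocHGlobal(managed.Length * sizeof(int));
        try
        {
            Marshal.Copy(managed, 0, native, managed.Length); // managed -> native
            int[] back = new int[managed.Length];
            Marshal.Copy(native, back, 0, back.Length);       // native -> managed
            return back[0];
        }
        finally
        {
            Marshal.FreeHGlobal(native);   // unmanaged memory is not GC-tracked
        }
    }

    static void Main()
    {
        Console.WriteLine(UnmanagedBufferSketch.RoundTripFirstElement()); // 42
    }
}
```

Ben's suggestion amounts to skipping those two Marshal.Copy calls by letting the runtime treat the unmanaged block as a real `T[]`.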
 
Ben Voigt [C++ MVP]

122100: 0,297616
123100: 0,31328
124100: 0,297616
125100: 0,297616
126100: 0,297616
127100: 0,281952
128100: 0,31328
129100: 0,297616
130100: 0,31328
131100: 0,297616
132100: 0,31328
133100: 0,297616
134100: 0,297616
135100: 0,31328
136100: 0,297616
137100: 0,297616
138100: 0,297616
139100: 0,328944
140100: 0,297616
141100: 0,297616
142100: 0,31328
143100: 0,31328
144100: 0,297616
145100: 0,31328

The values are around 0.3 regardless of how large or small the array
is, so clearly (to me) pinning seems to have a reasonably constant
cost.

From the first column, it looks like index is plenty big to put the arrays
in the LOH.
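Ben's observation can be checked at run time: objects on the LOH are reported as generation 2 immediately after allocation, while small arrays start in generation 0. A quick sketch (the ~85,000-byte LOH threshold is an implementation detail of the CLRs discussed here):

```csharp
using System;

class LohCheck
{
    // Returns the GC generation a freshly allocated byte[] lands in.
    public static int GenerationOf(int elementCount)
    {
        return GC.GetGeneration(new byte[elementCount]);
    }

    static void Main()
    {
        Console.WriteLine(LohCheck.GenerationOf(100));     // typically 0 (small object heap)
        Console.WriteLine(LohCheck.GenerationOf(200000));  // 2 (large object heap)
    }
}
```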
 
Jon Skeet [C# MVP]

Atmapuri said:
Well, the cost of the pinning per element is a linear function of the
element count.

So you keep claiming.

Here is a genuinely short but complete example which appears to run
counter to your claim:

using System;
using System.Diagnostics;

public class Test
{
    const int Iterations = 100000000;

    static void Main()
    {
        unsafe
        {
            for (int index = 100; index < 1000000; index += 1000)
            {
                byte[] values = new byte[index];
                byte x = 0;
                Stopwatch sw = Stopwatch.StartNew();
                for (int j = 0; j < Iterations; j++)
                {
                    fixed (byte* b = values)
                    {
                        x += *b;
                    }
                }
                sw.Stop();
                Console.WriteLine("{0}: {1}", index, (int)sw.ElapsedMilliseconds);
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
    }
}

On my laptop the results are consistently between 300 and 500ms for 100
million iterations. There is no linear progression here. Note that if
it genuinely *were* copying, that would be pretty impressive - at 84K,
for instance, it would be copying over 15TB per second. That's a lot of
copying, even for fairly fast memory.

What happens if you run the above code on your box? How do you explain
my results?
 
Atmapuri

Hi!

The next evolution of your code would be this:


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            for (Int32 index = 2; index < 1000000; index *= 2)
            {
                GCHandle h;
                DateTime dt2 = DateTime.Now;
                Int32 Iters = (100000000 / index); // allocate always the same count of elements
                for (Int32 j = 0; j < Iters; j++)
                {
                    Int32[] values = new Int32[index];
                    // h = GCHandle.Alloc(values, GCHandleType.Pinned); // comment this in
                    values[1] = 2;
                    // h.Free(); // and this for comparison
                }
                DateTime dt3 = DateTime.Now;
                Console.WriteLine(String.Format("{0}: {1}", index, (dt3 - dt2).TotalSeconds));
            }
        }
    }
}

Here are the problems fixed in your code:
- The GC and the C# compiler are "VERY" smart :)
- When you call the constructor on the GCHandle the second
time it sees that it is the same object and does nothing.
- the iter count has been normalized so that now you measure
allocation of the same amount of memory.

Int32 not pinned:

2: 0.856
4: 0.428
8: 0.254
16: 0.153
32: 0.1
64: 0.074
128: 0.062
256: 0.057
512: 0.055
1024: 0.053
2048: 0.053
4096: 0.051
8192: 0.051
16384: 0.051
32768: 0.139
65536: 0.127
131072: 0.138
262144: 0.081
524288: 0.071

Int32, pinned:

2: 9.807
4: 4.916
8: 2.477
16: 1.275
32: 0.662
64: 0.358
128: 0.203
256: 0.125
512: 0.088
1024: 0.069
2048: 0.06
4096: 0.052
8192: 0.051
16384: 0.048
32768: 0.14
65536: 0.127
131072: 0.137
262144: 0.081
524288: 0.071

As you see the cost of pinning for arrays shorter than 1024
elements is "substantial".

Now imagine you have an array typically 512 elements long
and you have to pin it each time before you enter unmanaged code?

Arrays as a whole cannot always be passed; many times you need
subarrays, and thus the fine optimization by the GC, which may only
flag an array, does not help.
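For what it's worth, a subarray can be passed without copying: either take a fixed pointer to an interior element, or pin the whole array once and offset the pinned address. A sketch (requires /unsafe; the `SumFrom*` names are illustrative, not from the thread):

```csharp
using System;
using System.Runtime.InteropServices;

class SubarraySketch
{
    // Interior pointer via fixed: no copy, no new array allocated.
    public static unsafe int SumFrom(int[] data, int start)
    {
        fixed (int* p = &data[start])
        {
            int sum = 0;
            for (int i = 0; i < data.Length - start; i++) sum += p[i];
            return sum;
        }
    }

    // Same idea with a pinned GCHandle: pin the whole array, then
    // offset the pinned base address by start elements.
    public static int SumFromPinned(int[] data, int start)
    {
        GCHandle h = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr p = new IntPtr(h.AddrOfPinnedObject().ToInt64() + start * sizeof(int));
            int sum = 0;
            for (int i = 0; i < data.Length - start; i++)
                sum += Marshal.ReadInt32(p, i * sizeof(int));
            return sum;
        }
        finally { h.Free(); }
    }

    static unsafe void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(SubarraySketch.SumFrom(data, 2));        // 7
        Console.WriteLine(SubarraySketch.SumFromPinned(data, 2));  // 7
    }
}
```

In a real interop scenario the offset pointer would be passed to the native function instead of being read back with Marshal.ReadInt32.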

Regards!
Atmapuri
 
Willy Denoyette [MVP]

Atmapuri said:
Hi!
No, you don't, because somehow you must pin the object to prevent
premature collection. It's not because an object is not pinned that it
is not referenced by the unmanaged code.

Consider the following sample:

int[] ia = new int[200000]; // large object stored on the LOH at a fixed location (not moveable)
UnmanagedFunctionThatTakesAnArrayOfInts(ia);

If the interop layer did not pin the array, the GC would be free to
collect the object and as such invalidate the pointer you have passed to
unmanaged code. The reason for this is that the JIT doesn't know about
unmanaged code; as a result it *marks* the reference "ia" as a candidate
for collection (signals the GC that the object ia refers to may be
collected at the moment of the call).

That is not the case if you put

GC.KeepAlive(ia);

at the end of your code, which costs "nothing" (!!!)

Oh, and you forget one of these KeepAlives and hell breaks loose. And
again, what would you need this for? The arguments passed through PInvoke
are protected when needed for the duration of the call.


Willy.
 
Willy Denoyette [MVP]

Ben Voigt said:
Because, as you said, pinning isn't designed to be done for an extended
length of time. Yet these buffers can't be allowed to move because OpenGL
is using them.

Well, you are talking about the side effects of pinning, not the actual
"pinning" action, which is what we are discussing here.
Yes, but the buffers are not pinned by p/invoke, because the pointer is
passed to glVertexPointer but the buffer address needs to remain valid
until glDrawElements, so I have to use a pinning GCHandle.

Interop does all that's needed to pin the buffer when the GC kicks off;
this way the buffer is protected for the duration of the call. But that
doesn't mean it is pinned during the whole call - it doesn't have to be,
it only needs to get pinned when the GC runs!

I will probably rewrite all of it in C++/CLI when I have some time, I
inherited this code from another developer and have been incrementally
improving it (adding display lists, vectorizing, etc).

This won't change a bit; interop between managed and unmanaged is the same
whether you use C++/CLI or another managed language. All you have is
somewhat greater control (and responsibility) when using C++/CLI, but
whenever you need to pass managed "buffers" to unmanaged code you need to
watch out for the GC.

Look at the mess that using OVERLAPPED structures inside the BCL goes
through... it requires special help from the runtime via
System.Threading.OverlappedData.AllocateNativeOverlapped() to avoid
long-term pinning in the Gen0 heap and horrible fragmentation. Why can't
we get that same help for our own buffers!?!

Horrible fragmentation? Ever looked at the native heap fragmentation when
using OVERLAPPED in unmanaged code?

Willy.
 
Jon Skeet [C# MVP]

<snip>
Here are the problems fixed in your code:
- The GC and the C# compiler are "VERY" smart :)
- When you call the constructor on the GCHandle the second
time it sees that it is the same object and does nothing.

Hmm... but during the time when it's not pinned, the garbage collector
*could* have moved the object.
- the iter count has been normalized so that now you measure
allocation of the same amount of memory.

No, *that's* a problem. You're not only allocating the same amount of
memory that way, you're performing fewer iterations - fewer pinning
operations. In other words, you're skewing all your figures.
As you see the cost of pinning for arrays shorter than 1024
elements is "substantial".

Well, pinning isn't free - but your stats are inappropriate IMO.

Here's an alternative version which still takes into account the
allocation time, and still allocates a new array each time - but
performs the same *number of pins* for each size.

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Test
{
    const int Iterations = 1000000;

    static void Main(string[] args)
    {
        for (int index = 2; index < 100000; index *= 2)
        {
            int allocationOnly = TimeAllocation(index);
            int allocAndPinning = TimeAllocAndPin(index);

            Console.WriteLine("{0}: {1}", index, allocAndPinning - allocationOnly);
        }
    }

    static int TimeAllocation(int index)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int j = 0; j < Iterations; j++)
        {
            byte[] values = new byte[index];
            values[1] = 2;
        }
        sw.Stop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return (int) sw.ElapsedMilliseconds;
    }

    static int TimeAllocAndPin(int index)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int j = 0; j < Iterations; j++)
        {
            byte[] values = new byte[index];
            GCHandle h = GCHandle.Alloc(values, GCHandleType.Pinned);
            values[1] = 2;
            h.Free();
        }
        sw.Stop();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        return (int) sw.ElapsedMilliseconds;
    }
}

I've used byte arrays instead of int arrays to make it clearer what the
size involved is. Here are my results - each figure is the number of
milliseconds difference between "just allocate" and "allocate and pin".

2: 457
4: 474
8: 394
16: 359
32: 441
64: 571
128: 592
256: 576
512: 254
1024: 369
2048: 190
4096: 232
8192: -264
16384: -1129
32768: -2814
65536: -3277

Now, I suspect that by the end the garbage collector is affecting the
results significantly, hence the odd nature - but the important thing
is that there doesn't appear to be the escalation you expect. If
copying were involved, the numbers would go up linearly with the size,
with a certain constant involved as well. Indeed, if you add in a loop
to manually set the values of the byte array in the pinning version
(reproducing copying) then you get that effect.

I still don't believe there's any copying involved, and I don't think
you've produced any evidence that there is. You certainly haven't
produced any documentation backing up your assertion. Wouldn't you
expect there to be some mention of the copying involved *somewhere*?
 
Lasse Vågsæther Karlsen

Ben said:
From the first column, it looks like index is plenty big to put the arrays
in the LOH.

The index didn't start on that value, that was just a small subset, but
even at 100 elements the timing was the same type of value, close to 0.3.
 
Jon Skeet [C# MVP]

I've used byte arrays instead of int arrays to make it clearer what the
size involved is. Here are my results - each figure is the number of
milliseconds difference between "just allocate" and "allocate and pin".

2: 457
4: 474
8: 394
16: 359
32: 441
64: 571
128: 592
256: 576
512: 254
1024: 369
2048: 190
4096: 232
8192: -264
16384: -1129
32768: -2814
65536: -3277

I've just tried an interesting experiment - instead of using GCHandle,
I've used the "fixed" statement for the purposes of pinning.

The results are *much* faster.

Do you really need a GCHandle, or would using a C# pointer be good
enough?
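Jon's suggestion, sketched: when the pointer is only needed for the duration of a call, `fixed` pins the array just for the scope of the block, with much less overhead than GCHandle.Alloc. (Requires /unsafe; the method name is illustrative.)

```csharp
using System;

class FixedSketch
{
    // The array is pinned only while control is inside the fixed block.
    public static unsafe int FirstElementViaPointer(int[] values)
    {
        fixed (int* p = values)
        {
            // A real program would pass p to native code here.
            return *p;
        }
    }

    static unsafe void Main()
    {
        Console.WriteLine(FixedSketch.FirstElementViaPointer(new int[] { 5, 6 })); // 5
    }
}
```

The catch, as discussed later in the thread, is that `fixed` cannot help when the address must remain valid after the method returns (the glVertexPointer/glDrawElements case).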
 
Lasse Vågsæther Karlsen

Atmapuri said:
Hi!

The next evolution of your code would be this:


namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            for (Int32 index = 2; index < 1000000; index *= 2)
            {
                GCHandle h;
                DateTime dt2 = DateTime.Now;
                Int32 Iters = (100000000 / index); // allocate always the same count of elements
                for (Int32 j = 0; j < Iters; j++)
                {
                    Int32[] values = new Int32[index];
                    // h = GCHandle.Alloc(values, GCHandleType.Pinned); // comment this in
                    values[1] = 2;
                    // h.Free(); // and this for comparison
                }
                DateTime dt3 = DateTime.Now;
                Console.WriteLine(String.Format("{0}: {1}", index, (dt3 - dt2).TotalSeconds));
            }
        }
    }
}

What you're timing is skewed by the overhead. Nobody said pinning was
free, but I still don't feel I have seen proof that pinning moves anything.

Assuming your heap has enough space for two arrays vastly different in
size, if pinning these two copied the contents to another heap, then
pinning those two would take a measurable difference in time.

My code, unless you manage to poke a hole in it that I'll accept, shows
that this doesn't happen.

I'm pretty sure you can invent something that shows something, like you
have done, but I don't understand your code.

It looks like you're trying to, for each index value, allocate the same
number of bytes in total by either allocating many small, or few large,
and then pinning those. Well, surprise, but when you allocate many small
the overhead will be more than when you allocate and pin few large.

When index is 16, you're allocating 16 integers 6250000 times and
pinning them. Pinning the memory that many times does indeed have
overhead, even just calling a method that many times will most likely
produce a measurable overhead.

But again, I still don't feel like you've shown that pinning moves anything.
As you see the cost of pinning for arrays shorter than 1024
elements is "substantial".

The cost of allocating anything less than 1024 elements that many times
is substantial. There are method calls involved in GCHandle.Alloc, and
even sub-objects being created, putting a larger pressure on the garbage
collector.
Now imagine you have an array typically 512 elements long
and you have to pin it each time before you enter unmanaged code?

Pinning isn't free, nobody has said so.

Arrays as a whole can not always be passed, many times you
need subarrays, and thus the fine optimization by GC which may
only flag an array does not help.

So use unsafe code and deal with pointers, or even write the whole thing
in unmanaged code. Nobody has said C# and .NET and managed code can do
everything better than every other language and runtime.
 
Willy Denoyette [MVP]

Jon Skeet said:
Atmapuri said:
Well, the cost of the pinning per element is a linear function of the
element count.

So you keep claiming.

Here is a genuinely short but complete example which appears to run
counter to your claim:

using System;
using System.Diagnostics;

public class Test
{
    const int Iterations = 100000000;

    static void Main()
    {
        unsafe
        {
            for (int index = 100; index < 1000000; index += 1000)
            {
                byte[] values = new byte[index];
                byte x = 0;
                Stopwatch sw = Stopwatch.StartNew();
                for (int j = 0; j < Iterations; j++)
                {
                    fixed (byte* b = values)
                    {
                        x += *b;
                    }
                }
                sw.Stop();
                Console.WriteLine("{0}: {1}", index, (int)sw.ElapsedMilliseconds);
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
    }
}

On my laptop the results are consistently between 300 and 500ms for 100
million iterations. There is no linear progression here. Note that if
it genuinley *were* copying, that would be pretty impressive - at 84K,
for instance, it would be copying over 15TB per second. That's a lot of
copying, even for fairly fast memory.

What happens if you run the above code on your box? How do you explain
my results?



Jon, what you are doing here is not directly comparable with what the OP is
doing, but it illustrates the overhead taken by "protecting" an object in
memory.
The OP is measuring the overhead of a GCHandle.Alloc, which is not what
happens when passing an object during a call into unmanaged and not what
happens when interop 'pins' the object during a GC run.

By using a "fixed" and "unsafe" code block, you are effectively creating a
pointer, the OP is creating "pinned object handles".
Basically what the JIT does is nothing more than taking the address of the
first element of an array, put the address of the object (the array) in the
"Pointer Table" ( which is part of the GCInfo table) together with some
tracking info. The array is not really "pinned", but the GC knows how to
deal with this, that is, he inspects the pointer table before he starts a
scan, so he will not move or collect the object as long as the current
instruction falls in the range of the tracking info of this pointer.
I'm not sure whether the Interop layer will not "effectively" pin the object
when the GC comes along when passing the address to unmanaged code in an
interop scenario, I guess not.

Willy.
 
Willy Denoyette [MVP]

Jon Skeet said:
I've just tried an interesting experiment - instead of using GCHandle,
I've used the "fixed" statement for the purposes of pinning.

The results are *much* faster.

Do you really need a GCHandle, or would using a C# pointer be good
enough?


Here are my results, running your code (optimized build).

2: 331
4: 337
8: 334
16: 335
32: 342
64: 342
128: 346
256: 349
512: 351
1024: 354
2048: 358
4096: 363
8192: 369
16384: 386
32768: 412
65536: 597

You see the difference is quite constant but considerable (as expected).

As I said in another reply, there is quite some difference between
"GCHandle.Alloc" and the pinning done by PInvoke. Measuring GCHandle.Alloc
overhead tells you nothing about PInvoke interop pinning (again, which only
occurs when the GC starts running).
There is no copying involved at all when passing a buffer (an array) to
unmanaged code; this can simply be verified by checking the address passed
to unmanaged code (1): it will be the address of the first element of the
array (or the "fixed" pointer in the case of *unsafe fixed* code).

Willy.

1)
// compile with: cl /EHsc /O2 /LD test.cpp
#include <cstdio>
typedef unsigned char byte;   // 'byte' is not a built-in C++ type

extern "C" void __declspec(dllexport) __stdcall PassSimpleByteArray(byte d[])
{
    printf_s("address(d) = %0x aligned = %s\n", &d[0],
             (bool)((int)&d[0] % 4) ? "False" : "True");
}

usage (C#):

[DllImport("test"), SuppressUnmanagedCodeSecurity]
extern unsafe static void PassSimpleByteArray(byte* da);

or:

[DllImport("test"), SuppressUnmanagedCodeSecurity]
extern unsafe static void PassSimpleByteArray(byte[] da);
....


 
Jon Skeet [C# MVP]

Jon, what you are doing here is not directly comparable with what the OP is
doing, but it illustrates the overhead taken by "protecting" an object in
memory.
The OP is measuring the overhead of a GCHandle.Alloc, which is not what
happens when passing an object during a call into unmanaged and not what
happens when interop 'pins' the object during a GC run.

By using a "fixed" and "unsafe" code block, you are effectively creating a
pointer, the OP is creating "pinned object handles".

Do we know that the OP can't actually use pointers? I probably missed a
post somewhere.
Basically what the JIT does is nothing more than taking the address of the
first element of an array, put the address of the object (the array) in the
"Pointer Table" ( which is part of the GCInfo table) together with some
tracking info. The array is not really "pinned", but the GC knows how to
deal with this, that is, he inspects the pointer table before he starts a
scan, so he will not move or collect the object as long as the current
instruction falls in the range of the tracking info of this pointer.

Making sure that an object isn't moved or collected sounds like pinning
it to me. It's pinned as far as the IL goes, too - the variable is
explicitly marked with the "pinned" flag.

Now, that may not be doing everything the OP is talking
about - but I don't see how it doesn't count as being pinned.
 
Willy Denoyette [MVP]

Jon Skeet said:
Do we know that the OP can't actually use pointers? I probably missed a
post somewhere.

I don't know either; this thread is simply derailed as it is. As far as I
understand, the OP thinks that pinning is exactly what is done by
GCHandle.Alloc with GCHandleType.Pinned; that's why he is measuring this
method's overhead. But interop doesn't pin implicitly, it pins the
arguments when needed. It's easy to watch this when running some code in
an unmanaged debugger, provided you know how to use your tools ;-).
It's also possible to inspect what happens when a GC comes along, the
problem is that it takes some time to find out where to set the
break-points.

Making sure that an object isn't moved or collected sounds like pinning
it to me. It's pinned as far as the IL goes, too - the variable is
explicitly marked with the "pinned" flag.

Yes, but what's produced by the JIT is not the same as when you "really"
pin an object at run-time, say by using GCHandle.Alloc with
GCHandleType.Pinned, and it's also not exactly the same as what's done by
the interop layer. Fixed code blocks and pointers are the cheapest way to
handle "pinned" buffers; this is quite normal as it can be done at compile
time, which is not the case when using interop. Also, note that fixed was
introduced after the initial release of the framework; the pointer table
did not exist in the first versions of the CLR.


Now, that may not be doing everything the OP is talking
about - but I don't see how it doesn't count as being pinned.

What I mean is that it's not "pinned" like it's done when calling
GCHandle.Alloc; that doesn't mean the object's address isn't fixed for the
duration of the scope of the declaring block.
Actually, there is no "pinned reference" in the GCInfo table (the pinned
object table) when using "fixed", nor is there a bit set in the object
header.
Not sure however what kind of "pinning" is used by the interop layer, but
I'm sure it isn't the same as done with unsafe/fixed. I know that in V1.1
it was done by setting a bit in the object's header (for both PInvoke and
COM interop), but possibly this has changed since more bits are used for
other purposes.

Willy.
 
Willy Denoyette [MVP]

Ben Voigt said:
Ok, but then they don't require pinning. In any case they inherently have
the fixed address which requires pinning to obtain for normal objects.

As I said before, pinning is an action performed by the interop layer when
the GC initiates a scan, and it's not done by means of a call to
GCHandle.Alloc. You only need to pin "explicitly" when you are passing an
object to unmanaged code and you need to keep the object alive and at a
fixed location after the PInvoke call returns; during the call the PInvoke
layer takes care of eventual pinning.
Also, you keep ignoring my remark that the fact that addresses of *large*
objects are fixed is a convenience of the current version of the CLR;
nothing stops MS from changing this.

It would be useful to request that a particular buffer not be
subject to relocation by the GC. Probably the easiest way to do
this would be to place it in the LOH. The OLE task allocator or
HGlobal allocator, both of which are already exposed by the Marshal
class in a typeless way, would be other options. It could be as
simple as adding a T[] Marshal.AllocCoTaskMem<T>(int elementCount)
override.

But now you are allocating from the unmanaged heap (COM heap or CRT
heap or whatever). So now you will incur the costs of copying back
and forth, again this depends on the semantics, but might be a
solution when you need to pass large data chunks to unmanaged land.

Why? .NET could create a proper array descriptor storing the metadata
alongside and access it directly.

Where? Outside of the GC heap? Who's going to "manage" these objects then?
As you may know, there are other cheap means to pass fixed buffers to
unmanaged code.

Willy.
 
Jon Skeet [C# MVP]

Willy Denoyette said:
I don't know either; this thread is simply derailed as it is. As far as I
understand, the OP thinks that pinning is exactly what is done by
GCHandle.Alloc with GCHandleType.Pinned; that's why he is measuring this
method's overhead. But interop doesn't pin implicitly, it pins the
arguments when needed. It's easy to watch this when running some code in
an unmanaged debugger, provided you know how to use your tools ;-).
:)

It's also possible to inspect what happens when a GC comes along, the
problem is that it takes some time to find out where to set the
break-points.
Right.


Yes, but what's produced by the JIT is not the same as when you "really"
pin an object at run-time, say by using GCHandle.Alloc with
GCHandleType.Pinned, and it's also not exactly the same as what's done by
the interop layer. Fixed code blocks and pointers are the cheapest way to
handle "pinned" buffers; this is quite normal as it can be done at compile
time, which is not the case when using interop. Also, note that fixed was
introduced after the initial release of the framework; the pointer table
did not exist in the first versions of the CLR.

"fixed" wasn't in the original framework? Not in C# 1.0? Was it
introduced in the (bizarrely named) C# 1.2?
What I mean is that it's not "pinned" like it's done when calling
GCHandle.Alloc; that doesn't mean the object's address isn't fixed for the
duration of the scope of the declaring block.

Whereas the latter is precisely what I mean when I use the word
"pinned" - partly because that's how the CLI spec uses it.
Actually, there is no "pinned reference" in the GCInfo table (the pinned
object table) when using "fixed", nor is there a bit set in the object
header.
Not sure however what kind of "pinning" is used by the interop layer, but
I'm sure it isn't the same as done with unsafe/fixed. I know that in V1.1
it was done by setting a bit in the object's header (for both PInvoke and
COM interop), but possibly this has changed since more bits are used for
other purposes.

Fair enough. I readily admit it's all a bit of a blur to me. I tend to
go only by what's in the specs - which is of course a lot less than is
available in reality.
 
