Heap question

Adam Benson

Hi,

I have an app throwing lots of byte arrays around over remoting links; sizes vary and could be from about 64K to 2MB. A colleague says it's easier on the heap if you split the data up into 5K packets.

So according to what he said, instead of passing byte[2 * 1024 * 1024], for example, I ought to be passing byte[512][4096] across the link.
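
(In actual C# syntax that split would be a jagged array - something like this, with the sizes taken from the example above:)

// One contiguous 2MB buffer:
byte[] whole = new byte[2 * 1024 * 1024];

// The split version: 512 separate 4K chunks.
byte[][] chunks = new byte[512][];
for (int i = 0; i < chunks.Length; i++)
    chunks[i] = new byte[4096];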

Is he talking sense?

Thanks,

Adam.

==============================
(e-mail address removed)
 

Scott M.

Adam Benson said:
Hi,

I have an app throwing lots of byte arrays around over remoting links; sizes vary and could be from about 64K to 2MB. A colleague says it's easier on the heap if you split the data up into 5K packets.

So according to what he said, instead of passing byte[2 * 1024 * 1024], for example, I ought to be passing byte[512][4096] across the link.

Is he talking sense?


I've never heard of such a thing. I can understand splitting data up for
more efficient storage on a hard drive or for more efficient network
transfer, but not for heap allocation.

-Scott
 

Peter Duniho

Adam said:
Hi,

I have an app throwing lots of byte arrays around over remoting links; sizes vary and could be from about 64K to 2MB. A colleague says it's easier on the heap if you split the data up into 5K packets.

So according to what he said, instead of passing byte[2 * 1024 * 1024], for example, I ought to be passing byte[512][4096] across the link.

Is he talking sense?

Maybe. Maybe not. It depends on what problem you're trying to solve.

Objects of 85,000 bytes or more wind up in the large object heap (LOH), which has some drawbacks and some advantages (there's a quick demo of the threshold after the list):

-- The main drawback is that the LOH is never compacted, so certain
allocation patterns can fragment the heap and cause future allocation
failures.

-- On the other hand, the main advantage is that the LOH is never
compacted, so large objects don't have to be copied around when the main
heap has to be compacted.
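
A quick way to see the threshold in action: freshly allocated LOH objects are reported as generation 2, while ordinary allocations start out in generation 0. (The cutoff is 85,000 bytes including the array's object header, so the sizes below are approximate.)

using System;

class LohDemo
{
    static void Main()
    {
        byte[] small = new byte[80 * 1024]; // ~82K: below the threshold
        byte[] large = new byte[85 * 1024]; // ~87K: lands on the LOH

        // LOH objects report as generation 2 even when brand new.
        Console.WriteLine(GC.GetGeneration(small)); // prints 0
        Console.WriteLine(GC.GetGeneration(large)); // prints 2
    }
}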

If you don't have a specific performance issue that you're trying to
solve, I don't see the point in meddling with the code. I'm not sure I
would use remoting to move large amounts of data like that, due to the
overhead involved in remoting. But that's just a general sense of the
issue, not something I've really explored carefully. The overhead can
either be magnified as the amount of data gets larger, or amortized,
depending on the exact treatment of the data (and note that at best,
breaking the data into smaller pieces will reduce any amortization,
potentially increasing network overhead).
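
To make that trade-off concrete, here's a hypothetical remoting interface (illustration only; none of these names come from the actual app):

// Hypothetical interface, for illustration only.
public interface IScheduleTransfer
{
    // One call, one contiguous buffer: the fixed per-call cost
    // is amortized over the whole 2MB payload.
    void SendWhole(byte[] data);

    // One call, jagged array: still a single round trip, but
    // the serializer writes per-array metadata 512 times over.
    void SendSplit(byte[][] chunks);

    // Many calls, one chunk each: the fixed round-trip cost is
    // paid again on every 4K piece.
    void SendChunk(byte[] chunk);
}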

If the current implementation is working for your needs, provides the
kind of throughput you consider acceptable given the hardware you're
using (PC, network, etc.), and provides a simpler, more maintainable
implementation, it seems to me that the current implementation is the
way to go.

Pete
 

Keith Ruralls (gentlerobbinATgmail) - webdesigner-to-be

I have an app throwing lots of byte arrays around over remoting links; sizes vary and could be from about 64K to 2MB. A colleague says it's easier on the heap if you split the data up into 5K packets.
So according to what he said, instead of passing byte[2 * 1024 * 1024], for example, I ought to be passing byte[512][4096] across the link.
Is he talking sense?

I've never heard of such a thing.  I can understand splitting data up for
more efficient storage on a hard drive or for more efficient network
transfer, but not for heap allocation.

-Scott

No, splitting data can increase performance too, look at the paging,
objects are swapped in and out on the heap at different levels of
memory management. You can also be sure that the memory manager is
adroit enough to get each storage frame to be in full of fit size.
What hole do you think you can fit a 5GB of objects into ? Definitely
it should be 5x8=40GB of length of memory and since each address is
dispatched throughout the address space but the manager will keep it
fit with a special adjustment of the socalled allocator to keep the
fragments all away down the hiearchy. So to speak heap allocation is
fine with data splicing at all.

Hope that helps
 

Adam Benson

Thanks for the responses, all.

Much appreciated.

A bit of background info - we run automated TV channels and customers sometimes load a hell of a lot in advance (3 or 4 days' worth of scheduling) - a daft way to do it in my view, but hey, they're the customers. For the most part we don't throw around the whole schedule, but there are some occasions where it's unavoidable, and it takes ages.

I tried writing my own serializer to see if matters could be improved, and it improved them quite convincingly. I think from what you've said the main issue is that the LOH is never compacted - but these large arrays are only alive for 2-3 seconds at most; once the objects have been re-hydrated, the arrays are ditched. Unless the LOH never merges free space back together again, I can't see it being an issue. I suppose it could be more of an issue server-side, but again the arrays are only alive until the stuff's been sent down the wire.
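
For what it's worth, the chunking idea in my serializer boils down to something like this (a simplified sketch, not the real code - the class name and buffer size here are just illustrative):

using System;
using System.IO;

static class ChunkedCopy
{
    // 80,000 bytes stays under the 85,000-byte LOH threshold,
    // so this buffer lives on the normal (compacted) heap.
    const int ChunkSize = 80000;

    public static void Copy(Stream source, Stream destination)
    {
        byte[] buffer = new byte[ChunkSize];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }
    }
}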

Regards,

Adam.
 

Peter Duniho

Keith said:
No, splitting data can increase performance too, look at the paging,
objects are swapped in and out on the heap at different levels of
memory management.

Huh? That doesn't make any sense.

The component that does swapping is the OS virtual memory manager. And
it doesn't care at all about allocation units (i.e. how much memory the
application allocated at once) when it does that. It swaps out on a
page-by-page basis and can just as easily swap out a 4K portion of a 1GB
data structure as a 4K portion of a 16K data structure.

You can also be sure that the memory manager is adroit enough to get each storage frame to be in full of fit size.

What do you mean by "storage frame"? What does "to be in full of fit size" mean (since that's not a grammatically correct phrase)?

What hole do you think you can fit a 5GB of objects into ? Definitely it should be 5x8=40GB of length of memory

What should be? "5GB of objects"? Why would you need "40GB of memory"
to fit "5GB of objects"?

and since each address is dispatched

"Address is dispatched"? What's that supposed to mean? "Dispatch" is a
messaging/invocation term, not a memory management term.

throughout the address space but the manager will keep it fit

What does "keep it fit" mean? How do you keep an address "fit"?

with a special adjustment of the socalled allocator

What "special adjustment"? What's a "socalled [sic] allocator"?

to keep the fragments all away down the hiearchy.

Fragmentation is good to avoid if possible/practical. But that's more
of an allocation failure/success issue, not a performance/swapping issue.

If anything, avoiding fragmentation by breaking up data structures can
be a trade-off with performance, as it can reduce data locality,
reducing performance rather than increasing it.
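
For instance, compare a straight pass over one contiguous buffer with the same pass over the split-up version (a rough sketch, not a rigorous benchmark - results will vary by machine):

using System;
using System.Diagnostics;

class LocalityDemo
{
    static void Main()
    {
        byte[] flat = new byte[2 * 1024 * 1024];

        // The same 2MB split into 512 chunks of 4K; the chunks
        // are separate objects and need not be contiguous.
        byte[][] jagged = new byte[512][];
        for (int i = 0; i < jagged.Length; i++)
            jagged[i] = new byte[4096];

        Stopwatch sw = Stopwatch.StartNew();
        long sum = 0;
        for (int i = 0; i < flat.Length; i++)
            sum += flat[i];
        Console.WriteLine("flat:   " + sw.ElapsedTicks);

        sw = Stopwatch.StartNew();
        sum = 0;
        for (int i = 0; i < jagged.Length; i++)
            for (int j = 0; j < jagged[i].Length; j++)
                sum += jagged[i][j];
        Console.WriteLine("jagged: " + sw.ElapsedTicks);
    }
}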

So to speak heap allocation is fine with data splicing at all.

I'm sorry... as you can see, most of that paragraph made no sense at all.
Care to clarify?

Pete
 

Scott M.

"Keith Ruralls (gentlerobbinATgmail) - webdesigner-to-be"
I have an app throwing lots of byte arrays around over remoting links; sizes vary and could be from about 64K to 2MB. A colleague says it's easier on the heap if you split the data up into 5K packets.
So according to what he said, instead of passing byte[2 * 1024 * 1024], for example, I ought to be passing byte[512][4096] across the link.
Is he talking sense?

I've never heard of such a thing. I can understand splitting data up for
more efficient storage on a hard drive or for more efficient network
transfer, but not for heap allocation.

-Scott

No, splitting data can increase performance too, look at the paging,
objects are swapped in and out on the heap at different levels of
memory management.

What does that mean? How does that support your statement that splitting
data increases performance?

But should I have to split the data up myself ahead of time, or let the system do its job and pack the data into the allocation units it has to work with? If I break up the data myself, I'm basically bloating my application to do what the Framework/OS is already programmed to do for me. What do I gain by splitting the data up myself?

I'm not sure I follow that, but again, the original question wasn't whether or not storing data in small pieces is efficient; the question was whether the programmer should be splitting the data up ahead of time.

So to speak heap allocation is fine with data splicing at all.

And does "fine" mean better?

No, it doesn't.
 
