Best practice for using large objects in foreach

Benny

I just wanted to open a discussion on what people consider best
practice for using large objects in a foreach loop. For example,
suppose you are reusing an Image object in a loop like this (the
letters above the snippets are for reference):

A
foreach ( string s in myList )
{
Image img = Image.FromFile( s );
// operate on img
}

Would it run more efficiently like this?

B
Image img = null;
foreach ( string s in myList )
{
img = Image.FromFile( s );
// operate on img
}

or

C
foreach ( string s in myList )
{
using ( Image img = Image.FromFile( s ) )
{
// operate on img
}
}

The list goes on...

Let me know what your thoughts are.
 
In none of the cases are you re-using the object; you are just re-using
the variable, which is just a reference (effectively a pointer-sized
value), so it has no real size. In fact, the nesting is likely to be
normalized by the compiler, so there is no real difference between A
and B. Even if there were a method such as img.LoadFrom (i.e. loading
into the same object), it would likely allocate a new memory block
internally, so you wouldn't re-use the "big" portion of memory.
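A minimal sketch of the point above, using a hypothetical FakeImage class (not System.Drawing) so it runs without any image files: in option B the single variable ends up referring to a different object on every pass.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Drawing.Image (not a real API): every
// instance gets a unique Id so distinct objects can be told apart.
public sealed class FakeImage
{
    private static int nextId;

    public int Id { get; } = ++nextId;
    public string Path { get; }

    private FakeImage(string path) => Path = path;

    // Mimics the shape of Image.FromFile: each call returns a new object.
    public static FakeImage FromFile(string path) => new FakeImage(path);
}

public static class VariableReuseDemo
{
    // Option B: one variable declared outside the loop, assigned on each pass.
    public static List<FakeImage> LoadAll(IEnumerable<string> paths)
    {
        var loaded = new List<FakeImage>();
        FakeImage img = null;
        foreach (string s in paths)
        {
            img = FakeImage.FromFile(s); // a brand-new object every iteration
            loaded.Add(img);
        }
        return loaded;
    }

    public static void Main()
    {
        List<FakeImage> images = LoadAll(new[] { "a.png", "b.png", "c.png" });
        foreach (FakeImage image in images)
            Console.WriteLine($"{image.Path} -> object #{image.Id}");

        // Same variable, different objects:
        Console.WriteLine(ReferenceEquals(images[0], images[1])); // False
    }
}
```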

Disposing a disposable object is a good thing for loops like this, so C
is good. Do this ;-p
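To make option C concrete without needing image files, the sketch below swaps in a hypothetical LoggedImage class (an assumption, not System.Drawing.Image) that records when each object is opened and disposed. The log shows each object being disposed before the next one is created.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable stand-in for System.Drawing.Image that records
// when it is created and when it is disposed.
public sealed class LoggedImage : IDisposable
{
    private readonly List<string> log;
    public string Path { get; }

    public LoggedImage(string path, List<string> log)
    {
        Path = path;
        this.log = log;
        log.Add("open " + path);
    }

    public void Dispose() => log.Add("dispose " + Path);
}

public static class OptionCDemo
{
    public static List<string> Run(IEnumerable<string> paths)
    {
        var log = new List<string>();
        foreach (string s in paths)
        {
            // Option C: the image is disposed as soon as this iteration's
            // block ends, even if the body throws.
            using (LoggedImage img = new LoggedImage(s, log))
            {
                log.Add("process " + img.Path); // operate on img
            }
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run(new[] { "a.png", "b.png" }))
            Console.WriteLine(line);
    }
}
```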

Marc
 
So by using A and B, nothing is disposed automatically by the GC once
it hits the end of the loop?
 
I don't see any object reuse, at most variable reuse. The compiler
doesn't care much about it, and nor does the CLR.
From the performance point of view, A and B are the same. C is slower
because of using().
 
No; you need to (in your mind) separate *object* lifetime, and
*variable* lifetime. img is the variable; it isn't img that you are
trying to dispose, but the object that it points to (at that time).

Consider:

Image img = Image.FromFile(whatever);
img = Image.FromFile(somethingElse);
img.Dispose();

Here, I have one variable but two objects, only one of which is
disposed. When I perform the second "img = " assignment, the first
image stays in memory as an orphan on the managed heap. The garbage
collector (GC) will probably spot it after a while and destroy it (I'm
assuming it has a finaliser or similar if it uses unmanaged image
handles).

The second object will also be orphaned when our code completes, making
it eligible for collection by the GC, but in this case the finalizer
will probably have been suppressed by the Dispose() call (typically via
GC.SuppressFinalize, after freeing the resources). This has two
advantages: firstly, the resources are freed much sooner (i.e. when you
are done with them, rather than when the GC spots them), and secondly
the GC can reclaim the object in one pass rather than two (if an object
needs finalizing, the first collection that finds it only queues its
finalizer, and the memory is reclaimed on a later pass, after the
finalizer has run).
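A sketch of the two-object example and the Dispose/finalizer interplay described above. HandleImage is a hypothetical stand-in for an image that owns an unmanaged handle, and it assumes the common pattern where Dispose() frees the resource and calls GC.SuppressFinalize(this) so the object no longer needs a finalizer pass.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical wrapper standing in for an image that owns an unmanaged handle.
public sealed class HandleImage : IDisposable
{
    public static int Finalized; // instances cleaned up by the finalizer
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        if (IsDisposed) return;
        IsDisposed = true;
        // ... release the (hypothetical) unmanaged handle here ...
        GC.SuppressFinalize(this); // no finalizer pass needed any more
    }

    ~HandleImage()
    {
        // Only reached for instances nobody disposed.
        Interlocked.Increment(ref Finalized);
    }
}

public static class OrphanDemo
{
    // One variable, two objects: only the second one is disposed.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static HandleImage LoadTwice()
    {
        HandleImage img = new HandleImage(); // first object
        img = new HandleImage();             // first object is now orphaned
        img.Dispose();                       // only the second is disposed
        return img;
    }

    public static void Main()
    {
        HandleImage second = LoadTwice();
        Console.WriteLine($"second disposed: {second.IsDisposed}");

        // Forcing a collection only makes the orphan's finalizer observable
        // in a demo; normally the GC decides when this happens.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine($"finalized so far: {HandleImage.Finalized}");
        GC.KeepAlive(second);
    }
}
```

How many finalizers have run by the time the second line prints depends on the runtime; the disposed object is never finalized because its finalizer was suppressed.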

C is the good code. Stick with this. You could add disposal to A or B,
but why bother? You'd still want to dispose *each image* as you are
done with it (not the single variable). Since img in B is declared
outside the loop, you can't declare it in a using statement; you'd need
either using (img) { ... } after each assignment or a try / finally...
either way it makes the code more complex (to do properly).
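For comparison, a sketch of what doing it properly in option B might look like, again with a hypothetical TrackedImage in place of Image. Each object gets its own try/finally because the variable outlives every object it points to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable stand-in for System.Drawing.Image.
public sealed class TrackedImage : IDisposable
{
    public string Path { get; }
    public bool IsDisposed { get; private set; }
    public TrackedImage(string path) { Path = path; }
    public void Dispose() => IsDisposed = true;
}

public static class OptionBDisposal
{
    // Option B with explicit disposal: one variable, but every object
    // still needs its own try/finally.
    public static List<TrackedImage> Run(IEnumerable<string> paths)
    {
        var all = new List<TrackedImage>();
        TrackedImage img = null;
        foreach (string s in paths)
        {
            img = new TrackedImage(s);
            try
            {
                all.Add(img); // operate on img
            }
            finally
            {
                img.Dispose(); // dispose this object, not "the variable"
            }
        }
        // img is still in scope here, pointing at an already-disposed object.
        return all;
    }

    public static void Main()
    {
        foreach (TrackedImage image in Run(new[] { "a.png", "b.png" }))
            Console.WriteLine($"{image.Path}: disposed = {image.IsDisposed}");
    }
}
```

Compared with option C, this adds a try/finally and leaves img in scope after the loop, referring to a disposed object.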

Marc
 
No; C is /vastly/ more efficient overall, as the memory footprint is
minimised during processing due to early disposal. In A & B this
cleanup is left to the GC, so (for a large set) you will see the memory
usage ramp up for a while, then processing grind to a halt as the GC
tears down the improperly discarded images, and then the memory start
ramping up again. I would expect C to stay fairly constant in terms of
both memory footprint and processing rate, and it will put much less
stress on the rest of the system, which is important in a server
environment.
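A rough way to see the footprint difference: the hypothetical CountedImage below tracks, through an equally hypothetical ResourceCounter, how many instances are still holding their (simulated) unmanaged resource at the same time. It only counts objects; it does not measure real memory or processing rate.

```csharp
using System;
using System.Threading;

// Hypothetical counter of simulated unmanaged resources held at once.
public sealed class ResourceCounter
{
    private int live;
    public int Live => Volatile.Read(ref live);
    public int Peak { get; private set; }

    // Called only from the loop thread, so updating Peak here is safe.
    public void Acquire()
    {
        int now = Interlocked.Increment(ref live);
        if (now > Peak) Peak = now;
    }

    // May be called from the finalizer thread.
    public void Release() => Interlocked.Decrement(ref live);
}

// Hypothetical image stand-in: Dispose releases the resource immediately;
// otherwise only the finalizer releases it, whenever the GC gets round to it.
public sealed class CountedImage : IDisposable
{
    private readonly ResourceCounter counter;
    private int released; // 0 = still held, 1 = released

    public CountedImage(ResourceCounter counter)
    {
        this.counter = counter;
        counter.Acquire();
    }

    private void Release()
    {
        if (Interlocked.Exchange(ref released, 1) == 0) counter.Release();
    }

    public void Dispose() { Release(); GC.SuppressFinalize(this); }
    ~CountedImage() => Release();
}

public static class FootprintDemo
{
    // Options A/B: each object is simply abandoned.
    public static ResourceCounter RunOptionA(int count)
    {
        var counter = new ResourceCounter();
        for (int i = 0; i < count; i++)
        {
            CountedImage img = new CountedImage(counter);
            // operate on img, then abandon it
        }
        return counter;
    }

    // Option C: each object is disposed at the end of its iteration.
    public static ResourceCounter RunOptionC(int count)
    {
        var counter = new ResourceCounter();
        for (int i = 0; i < count; i++)
        {
            using (CountedImage img = new CountedImage(counter))
            {
                // operate on img
            }
        }
        return counter;
    }

    public static void Main()
    {
        ResourceCounter a = RunOptionA(1000);
        ResourceCounter c = RunOptionC(1000);
        Console.WriteLine($"A/B: peak {a.Peak}, still held after loop {a.Live}");
        Console.WriteLine($"C:   peak {c.Peak}, still held after loop {c.Live}");
    }
}
```

With option C the peak is always 1. With A/B it depends on when the GC happens to run and finalize the abandoned objects; for a short loop like this it will often be close to the full count.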

Marc
 
No; C is /vastly/ more efficient overall...

(this was in response to Laura's post; sorry if that was unclear)
 
So by using A and B, nothing is disposed automatically by the GC once
it hits the end of the loop?

A further clarification:
a: the GC does not dispose; it finalises (if required)
b: the GC doesn't care about "the end of the loop"; it runs when the
runtime decides (typically in response to allocation pressure), and
hunts for abandoned objects in its own sweet time. The CLR doesn't use
reference counting (as COM does), which would allow it to destroy
things as soon as the last reference goes away, because reference
counting can leak isolated islands of objects that refer to each other
in a cycle (memory leaks). Rather, it periodically traces from the
roots (live locals, static fields and so on) to find *everything* that
can still be reached; anything it can't reach is garbage.
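A small sketch of point (a), using a hypothetical Probe class that counts calls to Dispose() and to its finalizer. The abandoned objects are only ever finalized; the GC never calls Dispose() on them.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical class that records whether Dispose or the finalizer ran.
public sealed class Probe : IDisposable
{
    public static int DisposeCalls;
    public static int FinalizerCalls;

    public void Dispose()
    {
        Interlocked.Increment(ref DisposeCalls);
        GC.SuppressFinalize(this);
    }

    ~Probe() => Interlocked.Increment(ref FinalizerCalls);
}

public static class GcDemo
{
    // Allocate and abandon the objects in a separate method so they are
    // unreachable once it returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Abandon(int count)
    {
        for (int i = 0; i < count; i++) new Probe();
    }

    public static void Main()
    {
        Abandon(10);
        // Forced purely for the demonstration; normally the runtime decides.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        Console.WriteLine($"Dispose calls:   {Probe.DisposeCalls}"); // 0: the GC never calls Dispose
        Console.WriteLine($"Finalizer calls: {Probe.FinalizerCalls}");
    }
}
```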

Marc
 
Benny said:
I just wanted to throw the discussion out there on what the best
practice people feel is for using large objects in a foreach loop. For
example if you are reusing an Image object in a loop like this (letters
above snippets are for reference purposes):

There are two issues involved here:

1) The scope of locals

2) Disposal of objects which implement IDisposable

The basic rules of thumb, correspondingly, are:

1) The scope should be as small as possible, but large enough so the
variable is visible everywhere it's needed in the function. Preferably,
locals shouldn't be declared until there's a value to initialize them
with - this isn't always possible, though.

2) Locals which implement IDisposable should be used inside a 'using' if
possible; fields which implement IDisposable should be disposed of in
the Dispose(bool) method, and the class should implement the Dispose
pattern.

C
foreach ( string s in myList )
{
using ( Image img = Image.FromFile( s ) )
{
// operate on img
}
}

So, the above is the best option, IMHO.
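Barry's second rule also covers fields. Below is a minimal sketch of the Dispose(bool) pattern for a class that owns a disposable field; ThumbnailCache is hypothetical, and a MemoryStream stands in for an Image or any other IDisposable member.

```csharp
using System;
using System.IO;

// Hypothetical class owning a disposable field, following the
// Dispose(bool) pattern.
public class ThumbnailCache : IDisposable
{
    private readonly MemoryStream buffer = new MemoryStream();
    private bool disposed;

    public bool IsDisposed => disposed;

    public void Write(byte[] data)
    {
        if (disposed) throw new ObjectDisposedException(nameof(ThumbnailCache));
        buffer.Write(data, 0, data.Length);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // harmless here; matters if a subclass adds a finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return; // calling Dispose twice is a no-op
        if (disposing)
        {
            // Called from Dispose(): safe to dispose other managed objects.
            buffer.Dispose();
        }
        // Unmanaged resources owned directly by this class would be freed here.
        disposed = true;
    }
}

public static class DisposePatternDemo
{
    public static void Main()
    {
        using (var cache = new ThumbnailCache())
        {
            cache.Write(new byte[] { 1, 2, 3 });
        } // cache.Dispose() runs here and disposes the MemoryStream field
    }
}
```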

About local scoping: the scope information of the variable is lost after
compiling. The JIT compiler in the CLR works out the actual lifetime of
the local by analysing when and where it is used. If you try compiling
two programs, one which uses your option B and one which uses C, you'll
find that they compile to the same IL.

-- Barry
 
one which uses your option B and one which uses C, you'll
find that they compile to the same IL

I think you mean A and B, plus there's the trivial "=null" assignment
to watch for (although this might not actually create IL, since it is
essentially zero; I can't remember...)

Marc
 
Yes. The C version is the correct way.
I don't think there would be a halt, though, because the objects remain
in gen1. And on the server side, GC is concurrent.
But yes, it's (almost) always imperative to Dispose() as soon as you can.
 
