How does "new" work in a loop?

Tony Sinclair

I'm just learning C#. I'm writing a program (using Visual C# 2005 on
WinXP) to combine several files into one (HKSplit is a popular
freeware program that does this, but it requires all input and output
to be within one directory, and I want to be able to combine files
from different directories into another directory of my choice).

My program seems to work fine, but I'm wondering about this loop:


for (int i = 0; i < numFiles; i++)
{
    // read next input file

    FileStream fs = new FileStream(fileNames[i],
        FileMode.Open, FileAccess.Read, FileShare.Read);
    Byte[] inputBuffer = new Byte[fs.Length];

    fs.Read(inputBuffer, 0, (int)fs.Length);
    fs.Close();

    // append to output stream previously opened as fsOut

    fsOut.Write(inputBuffer, 0, inputBuffer.Length);
    progBar.Value++;
} // for int i

As you can see, the objects fs and inputBuffer are both created as
"new" each time through the loop, which could be many times. I didn't
think this would work; I just tried it to see what kind of error
message I would get, and I was surprised when it ran. Every test run
has produced perfect results.

So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?

I can see that fs might go away after fs.Close(), but I don't
understand why I'm allowed to recreate the byte array over and over,
without ever disposing of it. I have verified with the debugger that
the array has a different size each time the input file size changes,
so it really is being reallocated each time through the loop, rather
than just being reused. I've tried to find explanations of how "new"
works in a loop, but I haven't been able to so far. Any help,
including pointers to the VS docs or a popular book on C#, would be
appreciated.
 
Peter Rilling

Well, "new" always creates a new instance of the object. The object that was
assigned to the variable during the previous cycle will be flagged so the
garbage collector can release memory. Now, since the GC does not clear
memory immediately, there is a period of time when the old object stays in
memory, so there is a chance that you can use too much memory before the GC
is invoked, but not a great chance. It is not something that I would worry
about unless you plan to join files where each part is a gig in size. :)
 
John J. Hughes II

The memory is allocated during each loop and in concept released when the
loop ends. In reality the memory is marked to be released by GC (garbage
collection) at some future point in time. I have found that if you were to
do this loop only a few times and the program was idle the memory would be
freed but if this loop involves a lot of files and very little idle time is
returned to the system you will run out of memory.

That being said I would at the very least place fs=null and inputbuffer =
null at the end of the loop.

A better solution for fs would be the using statement which would force GC
and return the memory.

using (FileStream fs = new FileStream(fileNames[i], FileMode.Open,
    FileAccess.Read, FileShare.Read))
{
    Byte[] inputBuffer = new Byte[fs.Length];

    // do something with fs

    inputBuffer = null;
}

Regards,
John
 
Matt

Tony said:
So what is happening here? Is the memory being reused, or am I piling
up objects on the heap that will only go away when my program ends, or
am I creating a huge memory leak?


Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.

Matt
 
Jon Skeet [C# MVP]

John J. Hughes II said:
The memory is allocated during each loop and in concept released when the
loop ends. In reality the memory is marked to be released by GC (garbage
collection) at some future point in time. I have found that if you were to
do this loop only a few times and the program was idle the memory would be
freed but if this loop involves a lot of files and very little idle time is
returned to the system you will run out of memory.

That being said I would at the very least place fs=null and inputbuffer =
null at the end of the loop.

Why? It serves no purpose - and code which serves no purpose is just
distracting, IMO.

A better solution for fs would be the using statement which would force GC
and return the memory.

It wouldn't return the memory - but it *would* close/dispose the stream
in all situations, whether or not there's an exception.
Closing/disposing the stream doesn't return any memory, but it releases
the handle on the file.
 
Jon Skeet [C# MVP]

Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.

The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.
 
Jon Skeet [C# MVP]

Peter Rilling said:
Well, "new" always creates a new instance of the object. The object that was
assigned to the variable during the previous cycle will be flagged so the
garbage collector can release memory.

Not quite - the "old" object isn't marked in any way (which would
basically be like reference counting). Instead, every time the GC kicks
in, all the "live" objects in the system are marked, and after that's
finished anything which *isn't* marked can be destroyed (or finalized,
if it has a finalizer).
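
The mark-then-sweep idea described above can be sketched with a toy model. This is only an illustration, not how the CLR is actually implemented: the ToyObject and ToyGc names, the explicit "heap" list and the explicit "roots" list are all hypothetical, and the real collector is generational and far more involved.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: a hypothetical object graph, not the real CLR heap.
public sealed class ToyObject
{
    public readonly string Name;
    public readonly List<ToyObject> References = new List<ToyObject>();
    public bool Marked;

    public ToyObject(string name) { Name = name; }
}

public static class ToyGc
{
    // Mark phase: flag everything reachable from a live reference.
    private static void Mark(ToyObject obj)
    {
        if (obj == null || obj.Marked) return;
        obj.Marked = true;
        foreach (ToyObject child in obj.References) Mark(child);
    }

    // Marks from the roots, then removes every unmarked object from the heap.
    // Returns the names of the reclaimed objects in heap order.
    public static List<string> Collect(List<ToyObject> heap, List<ToyObject> roots)
    {
        foreach (ToyObject o in heap) o.Marked = false;
        foreach (ToyObject r in roots) Mark(r);

        List<string> reclaimed = new List<string>();
        for (int i = heap.Count - 1; i >= 0; i--)
        {
            if (!heap[i].Marked)
            {
                reclaimed.Add(heap[i].Name);
                heap.RemoveAt(i);
            }
        }
        reclaimed.Reverse(); // we walked the heap backwards; restore heap order
        return reclaimed;
    }
}
```

In this model nothing is ever done to the "old" buffer when the variable is reassigned. It simply stops being reachable from any root, so the next Collect call fails to mark it and sweeps it away.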
 
Barry Kelly

Unlikely that you are creating a memory leak. C# uses garbage
collection. When the object goes out of scope (in your case, the }
marked // for int i) the object is destroyed. The next time through
the loop, a new one is created.

It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.

The other half of the story is resources. The GC isn't a resource
manager, so you shouldn't rely on it to manage things like file handles,
since it won't release them in a timely manner. That's why it's
important to use "using" with FileStream and other classes which
implement IDisposable.
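
As a rough illustration of that point, the sketch below uses a hypothetical TrackedResource class as a stand-in for something like a file handle (it is not a real framework type). It is meant to show that "using" calls Dispose() as soon as the block is left, even when an exception is thrown, without waiting for the GC.

```csharp
using System;

// Hypothetical stand-in for a resource such as a file handle.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed;

    public void Dispose() { IsDisposed = true; }
}

public static class UsingDemo
{
    // Throws inside a using block, then returns the resource so the
    // caller can check whether Dispose() ran.
    public static TrackedResource UseAndFail()
    {
        TrackedResource captured = null;
        try
        {
            using (TrackedResource r = new TrackedResource())
            {
                captured = r;
                throw new InvalidOperationException("simulated I/O failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Dispose() has already run by the time control reaches here.
        }
        return captured;
    }
}
```

Calling UsingDemo.UseAndFail() should hand back an object whose IsDisposed is already true, although no garbage collection needs to have happened. The object's memory is still reclaimed later by the GC, as usual.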

-- Barry
 
Göran Andersson

Jon said:
Why? It serves no purpose - and code which serves no purpose is just
distracting, IMO.

Well, it actually serves one purpose, at least occasionally.

If the first generation of the heap is full when the buffer is going to
be allocated, a garbage collection kicks in to free some memory. If you
have removed the reference to the previous buffer, it can be collected,
otherwise not.

For that purpose, you can just as well set the reference to null before
you create the new buffer:

inputBuffer = null;
inputBuffer = new Byte[fs.Length];


I wouldn't allocate a new buffer for every file, though. I would use a
reasonably sized buffer to read chunks from the files:

while ((len = fs.Read(inputBuffer, 0, 4096)) > 0) {
    fsOut.Write(inputBuffer, 0, len);
}
fs.Close();
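
Fleshed out a little, the chunked approach might look like the sketch below. The FileJoiner class, the Join method and its parameters are names made up for this example, and the 4096-byte buffer size is just the figure from the snippet above, not a tuned value.

```csharp
using System;
using System.IO;

public static class FileJoiner
{
    // Appends each input file, in order, to outputPath using one reusable
    // buffer. Returns the total number of bytes written.
    public static long Join(string[] inputPaths, string outputPath)
    {
        byte[] buffer = new byte[4096]; // allocated once, reused for every file
        long total = 0;

        using (FileStream fsOut = new FileStream(outputPath,
            FileMode.Create, FileAccess.Write))
        {
            foreach (string path in inputPaths)
            {
                using (FileStream fs = new FileStream(path,
                    FileMode.Open, FileAccess.Read, FileShare.Read))
                {
                    int len;
                    // Read returns how many bytes it actually read;
                    // 0 means the end of the file has been reached.
                    while ((len = fs.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        fsOut.Write(buffer, 0, len);
                        total += len;
                    }
                }
            }
        }
        return total;
    }
}
```

Because only the bytes that Read reports are written, this also copes with Read returning fewer bytes than requested, which a single Read of the whole file does not guard against.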
 
Göran Andersson

Jon said:
The object is *not* destroyed when it reaches the end of the scope.
.NET does not have deterministic garbage collection. Instead, the
object's memory will be released *at some point* after it is last used.

In fact, this could be before the end of the scope - the GC could kick
in before progBar.Value++ and free both fs and inputBuffer.

But at that point there are still references to those objects.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.
 
Barry Kelly

Göran Andersson said:
But at that point there are still references to those objects.

Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.

Ditto.

-- Barry
 
Göran Andersson

Barry said:
Not if the variables are in registers and have been overwritten. The JIT
compiler can detect the variable's lifetime, it doesn't necessarily last
out the whole lexical scope.

So you mean that the scope of a variable only lasts one single iteration
of the loop?
 
Barry Kelly

Göran Andersson said:
So you mean that the scope of a variable only lasts one single iteration
of the loop?

I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.

-- Barry
 
Jon Skeet [C# MVP]

Göran Andersson said:
But at that point there are still references to those objects.

Actually, the buffer won't be collectable until after the next buffer
has been created in the next iteration of the loop, when the reference
is replaced by the reference to the new buffer.

Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.

Jon
 
John J. Hughes II

Well Jon, you can cite what is supposed to happen but I have to deal with
what really happens. I write services that run constantly and in some
cases don't return much idle time back to the system for days. I have
found that <var>=null on non-disposable values and using(<statement>) allows
my program to maintain an even memory allocation and stops the memory creep.
I will grant you in my code I am probably using them to excess but having my
customers tell me of memory errors after running my program for X+/- days
depending on load can be really hard to track down, this stopped after
adding the set-to-null statements and using statements.

Note in forms applications I normally don't use them as much being as the
system is normally idle.

As you say "IMO" ;>

Regards,
John
 
Matt

Barry said:
It isn't freed immediately on every loop, or when it goes out of scope,
unless it's a value type (struct in C#). The GC kicks in on an as-needed
basis: memory pressure, repeated allocations, etc. That's one thing, and
is easy.

Yes, I know this. The OP was a C++ programmer, I was giving it to him
in C++ context. GC is deterministic, it will kick in when it makes
sense to kick in.

The other half of the story is resources. The GC isn't a resource
manager, so you shouldn't rely on it to manage things like file handles,
since it won't release them in a timely manner. That's why it's
important to use "using" with FileStream and other classes which
implement IDisposable.

True and a good point. I was just explaining why it wasn't a memory
leak, but your explanation is better for this purpose.

Thanks
Matt
 
Göran Andersson

Jon said:
Nope. In release mode, the JIT is smart enough to work out when a
variable can no longer be read, and will not count that variable as a
live root.

Jon

Does that mean that the reference is removed from the variable?
Otherwise the garbage collector will still see the reference and can't
collect the object.
 
Göran Andersson

Barry said:
I mean that the "FileStream fs" in the OP may be GCd before
progBar.Value++, like Jon says. The variable's lifetime may be smaller
than its scope. Scope is a lexical concept that exists only at compile
time.

-- Barry

Does that mean that the compiler adds code to remove the reference from
the fs variable? As long as the reference is there, the garbage
collector won't collect the object.
 
