High Memory Usage Garbage Collection Question


Ian Taite

Hello,

I'm exploring why one of my C# .NET apps has "high" memory usage, and
whether I can reduce it.

I have an app that periodically wakes up and processes text files into
a database. The app reads the contents of a text file line by line into
an ArrayList. Each element of the ArrayList is a string representing a
record from the file. The ArrayList is then processed, and then it goes
out of scope.

As the file is read and the ArrayList grows, memory usage goes from
about 6 MB up to about 200 MB in some cases, depending on the size of
the file.

What is peculiar is that after the file has been read and closed, and
the ArrayList has gone out of scope, the memory usage does not reduce
even after waiting several minutes.

My expectation was that most of the 200 MB would be garbage collected
soon after the ArrayList goes out of scope, and released back to the
system after several minutes, but this does not happen.

Another peculiar behaviour is that when the app wakes to process a
second file, memory usage remains at about 200 MB. After the second
file is processed, memory is still 200 MB. Note that while the second
file is processed, memory does not jump from 200 MB to 400 MB and back
to 200 MB.

I have tried using ArrayList's Clear() method to empty the ArrayList
before it goes out of scope, but that does not make any difference.

To try and make the problem reproducible, I wrote a test app that
simply reads a file into an ArrayList, calls ArrayList.Clear(), then
allows the ArrayList to fall out of scope, and sure enough the problem
is still there. Memory usage climbs in rough proportion to the file
size and remains there until the app ends, even if the file is read
several times, and even when the ArrayList is out of scope, i.e. not
referenced.

Explicitly calling GC.Collect() made hardly any difference, although
calling GC.GetTotalMemory(true) shortly after processing each file did
reduce the amount of memory shown by Task Manager for the process.

I've re-written my app not to read the whole file into memory before
processing it (not a particularly good idea in the first place), and
the maximum memory usage has dropped from 200 MB to 35 MB.
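For reference, the rewritten "read-a-line, process-a-line" shape might look like the sketch below; ProcessRecord is a placeholder for the real database work, which isn't shown in the thread:

```csharp
using System;
using System.IO;

class StreamingReader
{
    // Placeholder for the real per-record work (e.g. a database insert).
    static void ProcessRecord( string record )
    {
        // ... insert the record into the database, etc.
    }

    // Read and process one line at a time, so only the current line is
    // ever held in managed memory, regardless of file size.
    public static int ProcessFile( TextReader reader )
    {
        int recordCounter = 0;
        string aLine;

        while ( ( aLine = reader.ReadLine() ) != null )
        {
            if ( aLine.Length > 0 )
            {
                ProcessRecord( aLine );
                ++recordCounter;
            }
        }

        return recordCounter;
    }

    static void Main()
    {
        using ( StreamReader sr = new StreamReader(
            @"C:\SimpleFileReader\SLEXTRACTFILE050701.DAT" ) )
        {
            Console.WriteLine( ProcessFile( sr ) + " record(s) processed." );
        }
    }
}
```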

So what is really happening here?

In C# there's no way to explicitly free managed memory, so I do not
have direct control over releasing it.

Assuming that the objects in the arraylist and the arraylist itself are
no longer referenced they should be candidates for garbage collection,
and should eventually be garbage collected, at which point the memory
they used should be released back to the system.

Perhaps these objects are no longer referenced, and the GC has decided
that as there's insufficient contention for memory on the server, it
won't bother garbage collecting them, resulting in apparently high
memory usage by the application.

I don't really know how to measure what's happening, so any suggestions
would be gratefully received. I've included some test code below in the
form of a console app; you'll have to make the code process a suitably
large text file of your own, though.
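One cheap way to measure whether such a list really becomes unreachable is a WeakReference probe (an editorial sketch, not code from the thread): a WeakReference tracks an object without keeping it alive, so its IsAlive property reveals whether a collection has reclaimed the target.

```csharp
using System;
using System.Collections;

class WeakRefProbe
{
    // Build a large ArrayList of strings, then return only a weak
    // reference to it, so no strong reference survives this method.
    static WeakReference Fill()
    {
        ArrayList al = new ArrayList();
        for ( int i = 0; i < 100000; i++ )
            al.Add( new string( 'x', 40 ) );

        return new WeakReference( al );
    }

    static void Main()
    {
        WeakReference probe = Fill();

        // Force a full collection; with no strong references left,
        // the list becomes garbage and IsAlive drops to false.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        Console.WriteLine( "ArrayList still alive: " + probe.IsAlive );
    }
}
```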

Regards,
Ian.

<!-- Code Starts here -->

using System;
using System.IO;
using System.Collections;

namespace SimpleFileReader
{
    class SimpleFileReader
    {
        public static string readLineResponse;

        [STAThread]
        static void Main( string[] args )
        {
            Console.WriteLine( "Press return to create an app object, and run the app." );
            readLineResponse = Console.ReadLine();

            SimpleFileReader app = new SimpleFileReader();
            app.Execute();

            Console.WriteLine( "Type quit to terminate the app." );

            readLineResponse = "continue";

            while ( true )
            {
                Console.WriteLine(
                    "\nHit return to continue, \n" +
                    " or type 'gc' to garbage collect, \n" +
                    " or type 'gtm' to call GC.GetTotalMemory(true), \n" +
                    " or type 'quit' and hit return.\n\n" );

                readLineResponse = Console.ReadLine();

                if ( "gc" == readLineResponse ) GC.Collect();
                if ( "gtm" == readLineResponse ) GC.GetTotalMemory( true );
                if ( "quit" == readLineResponse ) break;
            }
        }

        public void Execute()
        {
            GetRecords( @"C:\SimpleFileReader\SLEXTRACTFILE050701.DAT" );
        }

        public void GetRecords( string fileName )
        {
            int recordCounter = 0;

            ArrayList al = new ArrayList();

            try
            {
                FileStream fs = File.OpenRead( fileName );
                StreamReader sr = new StreamReader( fs );

                string aLine = string.Empty;

                Console.WriteLine( "Press return to read the file." );
                readLineResponse = Console.ReadLine();

                while ( ( aLine = sr.ReadLine() ) != null )
                {
                    if ( aLine.Length > 0 )
                    {
                        al.Add( aLine );
                        ++recordCounter;
                    }
                }

                Console.WriteLine( recordCounter.ToString() + " record(s) read from " + fileName );
                Console.WriteLine( "File read. Press return to continue." );
                readLineResponse = Console.ReadLine();

                sr.Close();
                fs.Close();
            }
            catch ( System.Exception ex )
            {
                string extractFailedMessage =
                    "An exception occurred whilst extracting the contents of " + fileName;

                Console.WriteLine( extractFailedMessage + "\n" + ex.ToString() );
            }

            Console.WriteLine( "Press return to clear the arraylist" );
            readLineResponse = Console.ReadLine();
            al.Clear();
            Console.WriteLine( "al.Clear() called." );

            Console.WriteLine( "Press return to call GC.GetTotalMemory" );
            readLineResponse = Console.ReadLine();
            Console.WriteLine( "GC.GetTotalMemory just returned: " +
                GC.GetTotalMemory( true ).ToString() );
        }
    }
}


<!-- Code Ends here -->
 

Willy Denoyette [MVP]

Inline

Willy.

Ian Taite said:
Hello,

I'm exploring why one of my C# .NET apps has "high" memory usage, and
whether I can reduce the memory usage.

I have an app that wakes up and processes text files into a database
periodically. What happens, is that the app reads the contents of a
text file line by line into an ArrayList. Each element of the ArrayList
is a string representing a record from the file. The ArrayList is then
processed, and the arraylist goes out of scope.

As the file is read, and the ArrayList is extended, memory usage goes
from about 6Mb up to about 200Mb in some cases, depending on the size
of the file.

What is peculiar is that after the file has been read and closed, and
the ArrayList has gone out of scope, the memory usage does not not
reduce even after waiting several minutes.

My expectation is that most of the 200Mb would be garbage collected
soon after the ArrayList goes out of scope, and released back to the
system after several minutes, but this does not happen.
After the function ends with 200 MB of string data in the Gen2 heap
(yes, all your string objects are aged objects), the program returns to
the main loop and waits for a command (quit, gc, ...). But there is no
reason for the GC to run and clean up the garbage; it's quite happy
with all the memory free in the system ;-). Things would be different
if you didn't have that much memory available: the system would prompt
the GC to clean up once the ArrayList became eligible for collection.
But wait: if you didn't have that memory free, the system would start
paging, because you need that memory badly thanks to your design, which
is somewhat flawed.
It's flawed because I see no valid reason to store a complete text file
in an ArrayList (or an array, that's the same), one line per entry.
Each entry has a size of 16 + (line length in characters * 2) bytes (16
is the object overhead and, I suppose, the file is ANSI, while strings
are Unicode in .NET). That's a reason not to store the whole file in
memory, but only read small chunks at a time and process those. The
small chunks (let's say 1000 lines at a time) could be stored in a
fixed-length array, after which you can start processing the lines
stored in the array. Once this is done you refill the array with the
next chunk until the whole file is processed.
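The chunked approach described above can be sketched as follows; ProcessChunk is a hypothetical stand-in for the per-batch work, and the 1000-line chunk size is the figure suggested in the reply:

```csharp
using System;
using System.IO;

class ChunkedReader
{
    // Placeholder for real per-chunk work (e.g. batched database inserts).
    static void ProcessChunk( string[] chunk, int count )
    {
        // process chunk[0] .. chunk[count - 1]
    }

    // Read the file in fixed-size chunks, reusing one array so memory
    // stays bounded by chunkSize lines regardless of file size.
    public static int ProcessFile( TextReader reader, int chunkSize )
    {
        string[] chunk = new string[chunkSize];
        int count = 0;
        int total = 0;
        string line;

        while ( ( line = reader.ReadLine() ) != null )
        {
            chunk[count++] = line;
            ++total;

            if ( count == chunkSize )
            {
                ProcessChunk( chunk, count );
                count = 0; // refill the same array with the next chunk
            }
        }

        if ( count > 0 )
            ProcessChunk( chunk, count ); // final partial chunk

        return total;
    }
}
```

Because the same array is refilled each time, the old strings become garbage as soon as their slots are overwritten.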


 

Ian Taite

Reading the whole file.

Sure, in the original design I thought the files would only be small, a
few 1000 records at most, but in fact 180,000 is more likely. I did
rewrite the app to "read-a-line - process-a-line" with much less memory
usage and hardly any difference in run time. Task Manager shows a
memory usage (is that "working set"?) of about 35Mb once the first file
has been processed, and it remains at 35mb when any more files are
processed, and it doesn't seem to matter whether the additional files
are say 5,000 or 180,000 records long.
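As an aside, Task Manager's "Mem Usage" column is essentially the process working set, and it can be compared against the managed heap in-process; this little check is an editorial sketch (Process.WorkingSet64 needs .NET 2.0 or later):

```csharp
using System;
using System.Diagnostics;

class WorkingSetCheck
{
    static void Main()
    {
        // The working set is what Task Manager reports; the managed
        // heap figure from GC.GetTotalMemory is usually much smaller.
        Process p = Process.GetCurrentProcess();

        Console.WriteLine( "Working set:  " + ( p.WorkingSet64 / ( 1024 * 1024 ) ) + " MB" );
        Console.WriteLine( "Managed heap: " + ( GC.GetTotalMemory( false ) / 1024 ) + " KB" );
    }
}
```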
Garbage collector

So you reckon that the garbage collector has decided not to clean up
because it has decided there's no contention for memory? If another
process started that used lots of memory, would that cause the GC in my
app to collect and free up those memory pages that were no longer in
use? I might see the working set trimmed right back and the memory
metric in Task Manager reduce in such a case. I suppose this is
something I could test with a second memory-hogging test app.

Ian.
 

Willy Denoyette [MVP]


Yes, if there is memory contention the OS will ask the processes to
trim their working sets and the CLR will initiate a full collect; note
that XP and W2K3 have other means to signal memory pressure. But don't
worry, the GC will clean up earlier: when your code continues to
process another file, the GC will kick in and do a full collect. That's
why you saw a steady 200 MB memory consumption in your runs.
If you need to know exactly when the GC kicks in, you'll have to watch
the CLR performance counters using perfmon; especially watch the Gen2
collection counter and the Gen1 and Gen2 sizes under the ".NET CLR
Memory" counters.
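On .NET 2.0 and later, similar figures are also available in-process via GC.CollectionCount, which can be handier than perfmon for a quick check (an editorial addition, not part of the original reply):

```csharp
using System;

class GcStats
{
    static void Main()
    {
        // GC.CollectionCount requires .NET 2.0 or later; a rising
        // Gen2 count shows when full collections are happening.
        Console.WriteLine( "Gen0 collections: " + GC.CollectionCount( 0 ) );
        Console.WriteLine( "Gen1 collections: " + GC.CollectionCount( 1 ) );
        Console.WriteLine( "Gen2 collections: " + GC.CollectionCount( 2 ) );
        Console.WriteLine( "Managed heap (approx): " + GC.GetTotalMemory( false ) + " bytes" );
    }
}
```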

Willy.
 
