Kernel memory leak with files

  • Thread starter: Guest
I am using Windows XP Professional with SP2.

I am writing an application that writes millions of small .txt files to the
filesystem. As files are opened and closed, there seems to be a memory leak
in the Windows OS, because after a little over a million files have been
created, my disk I/O speed significantly decreases. Additionally,
applications such as Task Manager cannot be executed. An error message pops
up indicating that the paged pool has been depleted. If I reboot the
machine, I can start creating new files again.

When I write the same amount of data to a single file (without opening or
closing new files), my application can run forever. This leads me to believe
that the leak occurs when opening and closing files.

The main loop of a small test application that I wrote to confirm that it is
not a problem with my code looks like this (where newfilename is unique
every time through the loop):
while (true)
{
    handle = CreateFile(newfilename, ....);
    CloseHandle(handle);
}
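
For reference, a fuller, compilable version of that test loop might look like the
sketch below. The CreateFile parameters (GENERIC_WRITE, CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL) and the file-naming scheme are assumptions for
illustration only; they are not necessarily the arguments the original program used.

// Minimal repro sketch (assumed parameters, not the exact original code).
// Creates and immediately closes a long series of uniquely named files.
#include <windows.h>
#include <cstdio>

int main()
{
    char newfilename[MAX_PATH];

    for (unsigned long i = 0; ; ++i)
    {
        // Assumed naming scheme: file000000000.txt, file000000001.txt, ...
        std::sprintf(newfilename, "file%09lu.txt", i);

        HANDLE handle = CreateFileA(newfilename,
                                    GENERIC_WRITE,          // assumed access mode
                                    0,                      // no sharing
                                    NULL,                   // default security
                                    CREATE_ALWAYS,          // always create a new file
                                    FILE_ATTRIBUTE_NORMAL,  // no special attributes
                                    NULL);                  // no template file
        if (handle == INVALID_HANDLE_VALUE)
            break;                       // stop on the first failure

        CloseHandle(handle);             // close immediately; nothing is written
    }
    return 0;
}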

Executing just those two lines over and over reproduces the depleted paged
pool symptoms. I have run poolmon.exe while this application is running, and I
can see the $Mft and Mdl tags growing rapidly in the non-paged pool, while the
MmSt tag is the biggest consumer in the paged pool.
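
As a cross-check on what poolmon reports, the kernel pool totals can also be
watched programmatically. Below is a minimal sketch using GetPerformanceInfo
from psapi (available on XP with SP1 or later); the one-second polling interval
is an arbitrary choice.

// Sketch: periodically print kernel paged/non-paged pool usage.
// Uses GetPerformanceInfo from psapi.h; link with psapi.lib.
#include <windows.h>
#include <psapi.h>
#include <cstdio>

int main()
{
    for (;;)
    {
        PERFORMANCE_INFORMATION pi = { 0 };
        pi.cb = sizeof(pi);

        if (GetPerformanceInfo(&pi, sizeof(pi)))
        {
            // KernelPaged / KernelNonpaged are reported in pages.
            std::printf("paged pool: %lu KB, non-paged pool: %lu KB\n",
                        (unsigned long)(pi.KernelPaged * pi.PageSize / 1024),
                        (unsigned long)(pi.KernelNonpaged * pi.PageSize / 1024));
        }
        Sleep(1000);   // poll once per second
    }
    return 0;
}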

Does anyone have any ideas about what is causing this, and how I might
correct it?

Thanks,
Jake
 
Jake Chung said:
I am using Windows XP Professional with SP2. I am writing an application
that writes millions of small .txt files to the filesystem... [snip]

You are trying to save entirely too many files in a single folder. The
sequential IO required to find a single file (or even merely the next file)
is slowing your system down.
So, your program design is faulty.

Jim
 
Jake Chung said:
I am using Windows XP Professional with SP2. I am writing an application
that writes millions of small .txt files to the filesystem... [snip]

I'm assuming that you're using C++.
So how do you know that it's not your program that's at fault? It could just
be that something like CloseHandle() in C++ is sloppy with its cleanup.
Have you tried a for loop instead of a while loop to see if that makes any
difference (it probably won't, but have you tried it)? Something like
for (x = 0; continueWrite == true; x++). I know this adds x as a counter that
isn't used for anything, but it wouldn't hurt to rule it out.

Although I only have a very basic understanding of C++, as I only work with
C#, I do know that running a loop millions of times, creating, writing, and
closing files on every iteration, will cause severe system resource and
program degradation. So you might want to try another approach.

But you might be able to try this with your program:
New C++ Classes for Better Resource Management in Windows:
http://msdn.microsoft.com/msdnmag/issues/0400/win32/
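
For context, the core idea in that article is wrapping raw HANDLEs in objects
that release them automatically, so CloseHandle is never missed. The class below
is a generic sketch of that pattern (pre-C++11 style), not the actual classes
from the article:

// Sketch: a tiny RAII wrapper that guarantees CloseHandle is called
// when the object goes out of scope, even if the loop body exits early.
#include <windows.h>

class ScopedHandle
{
public:
    explicit ScopedHandle(HANDLE h) : m_handle(h) {}
    ~ScopedHandle()
    {
        if (m_handle != INVALID_HANDLE_VALUE)
            CloseHandle(m_handle);
    }
    HANDLE get() const { return m_handle; }

private:
    HANDLE m_handle;
    // Copying is not supported in this sketch.
    ScopedHandle(const ScopedHandle&);
    ScopedHandle& operator=(const ScopedHandle&);
};

// Usage inside the test loop:
//     ScopedHandle file(CreateFileA(newfilename, ....));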

Or you might want to try some other method of reclaiming
resources/performing cleanup within your loop.

Also, you might want to see if someone in a C++ newsgroup can help you, as
this is more directly related to programming.
Here are a few that I found:
alt.comp.lang.learn.c-c++
alt.lang.c++
comp.lang.c++
comp.std.c++

-Dan
 
Jim said:
You are trying to save entirely too many files in a single folder... [snip]

Jim,
You are very right, and I did not include the directory logic in the example
code because I wanted to keep it simple. In reality, every 40,000 files that
I write out, I create a new directory and start writing to the new directory
instead. Do you have any other ideas?
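
A rough sketch of that kind of bucketing scheme, assuming a counter-based
layout with 40,000 files per directory and an illustrative base path (not the
actual code), might look like this:

// Sketch: spread output files across subdirectories, 40,000 per directory.
// The base path and naming scheme are assumptions; C:\output must exist.
#include <windows.h>
#include <cstdio>

const unsigned long FILES_PER_DIR = 40000;

HANDLE CreateBucketedFile(unsigned long fileIndex)
{
    char dirname[MAX_PATH];
    char filename[MAX_PATH];

    // One directory per block of 40,000 files: bucket00000, bucket00001, ...
    std::sprintf(dirname, "C:\\output\\bucket%05lu", fileIndex / FILES_PER_DIR);
    CreateDirectoryA(dirname, NULL);   // harmless if the directory already exists

    std::sprintf(filename, "%s\\file%09lu.txt", dirname, fileIndex);
    return CreateFileA(filename, GENERIC_WRITE, 0, NULL,
                       CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
}

int main()
{
    for (unsigned long i = 0; i < 100000; ++i)   // sample run: 100,000 files
    {
        HANDLE h = CreateBucketedFile(i);
        if (h == INVALID_HANDLE_VALUE)
            return 1;
        CloseHandle(h);
    }
    return 0;
}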

Thanks,
Jake
 
Jake Chung said:
Jim,
You are very right, and I did not include the directory logic in the example
code because I wanted to keep it simple... [snip]

I believe that 40,000 files per directory is still too many...
Jim
 
Isn't anyone else curious as to WHY all these files are being written? What
possible purpose would this activity be in aid of? One of the first things I
would have tried was to WRITE the files out until I started slowing down, then
DELETE all the files written and see if that improved system conditions.

I'm with Jim: I also think that 40,000 files in a single directory is too
many. Remember that even if the data contained is minimal (say, 500 bytes)
the actual file size on disk would be the size of an allocation unit.
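
To put a number on that, the allocation unit (cluster) size of a volume can be
checked with GetDiskFreeSpace; a minimal sketch is below (the C: drive is an
assumed example):

// Sketch: report the cluster size of a volume, i.e. the typical minimum
// on-disk footprint of a small file. "C:\\" is an assumed drive.
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD sectorsPerCluster = 0, bytesPerSector = 0;
    DWORD freeClusters = 0, totalClusters = 0;

    if (GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                          &freeClusters, &totalClusters))
    {
        // A small file generally occupies at least one full cluster on disk.
        std::printf("cluster size: %lu bytes\n",
                    sectorsPerCluster * bytesPerSector);
    }
    return 0;
}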

GP
 
Grand_Poohbah said:
Isn't anyone else curious as to WHY all these files are being written?... [snip]

Who knows why people do what they do. I couldn't think of any real-world
situations where something like this would be applicable, other than writing
as many small files to the disk as possible to see how the system holds up.

If he is trying to store data in this situation then he should be using
either an external database or a few data files.

It has me a little intrigued as to why he would need to do this, but I
didn't really care enough to ask.

-Dan
 
Thanks, Dan, for the tips.
I've tried the for loop, and my system still ended up hanging.

Also, I am sure that handles are not being leaked, because I am monitoring
the handle count using Task Manager, and the handle count does not go beyond
30 or so. I understand that the maximum number of file handles open at one
time is around 2048, and I am nowhere near that limit.
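
As an extra sanity check beyond Task Manager, the test program could report its
own handle count directly; a minimal sketch using GetProcessHandleCount
(available on XP with SP1 or later) is below:

// Sketch: print the current process's open handle count,
// as a cross-check against Task Manager's "Handles" column.
#include <windows.h>
#include <cstdio>

int main()
{
    DWORD handleCount = 0;
    if (GetProcessHandleCount(GetCurrentProcess(), &handleCount))
        std::printf("open handles in this process: %lu\n", handleCount);
    return 0;
}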
 
Grand_Poohbah said:
Isn't anyone else curious as to WHY all these files are being written?... [snip]

GP,

I have done a test where I was writing to the files between opening and
closing them. I have also done a test where I appended the contents of the
millions of files to a handful of flat files.

In the case of opening, writing, and closing over and over, I still got the
system freeze.

In the case of just writing all my data to the handful of flat files, I
could let it run forever until the HDD was full.

The situation where I am simply opening and closing files is the result of
several days of trying different things to eliminate other variables. With
all else removed, I was able to reproduce the problem by simply opening and
closing files. I think this stripped-down reproduction would be more useful
to a Microsoft developer than if I included all the other details.

Lastly, my example program may be outrageous, and it is probably something
that no one would ever think of doing. However, I did think of doing it, and
so has someone else in another forum
(http://www.microsoft.com/windowsxp/...-us-ms-winxp&lang=en&cr=US&sloc=en-us&m=1&p=1).
Therefore we have uncovered something in the Windows OS that seems to be a
bug. Whether or not there is a good reason for doing what I am doing, I would
like to publish my problem so that someone at Microsoft might be able to
address this issue.
 
He's right; even 40,000 files will take a while to search through. A better idea, if possible, is to use a single large file,
especially if there's a predetermined structure to the contents of the files.
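
To illustrate that suggestion, below is a minimal sketch that packs fixed-size
records into one large file instead of creating one file per record; the
512-byte record size, path, and record count are assumptions, not anything
from this thread:

// Sketch: append fixed-size records to one large file instead of
// creating millions of small files. Record size and path are assumed.
#include <windows.h>
#include <cstdio>
#include <cstring>

const DWORD RECORD_SIZE = 512;   // assumed fixed record size

int main()
{
    HANDLE file = CreateFileA("C:\\output\\records.dat",
                              GENERIC_WRITE, 0, NULL,
                              OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    SetFilePointer(file, 0, NULL, FILE_END);   // append to the end

    char record[RECORD_SIZE];
    for (int i = 0; i < 1000; ++i)             // write 1000 sample records
    {
        std::memset(record, 0, sizeof(record));
        std::sprintf(record, "record %d", i);  // fixed-size records allow offset-based lookup later

        DWORD written = 0;
        WriteFile(file, record, RECORD_SIZE, &written, NULL);
    }

    CloseHandle(file);
    return 0;
}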
 
Jake Chung said:
Thanks, Dan, for the tips. I've tried the for loop, and my system still ended
up hanging... [snip]

Well, this makes it seem that Windows I/O is the problem, as you originally
thought. Maybe someone in a programming newsgroup would be able to shed some
more light on this for you.

As this appears to be a problem within Windows itself, I think you'll have to
work around it. Another issue is that it is probably compounded by the
bottleneck of transferring data between RAM, the hard drive, and the CPU.
Running a computer with a faster system bus would probably improve
performance, but there's no way to know whether the improvement would be
noticeable short of actually running the program on a Windows machine with
something like a 1 GHz bus.

Running a 64-bit system with a 64-bit program might also improve performance,
as it allows more throughput.

-Dan
 