OutOfMemoryException

Guest

I hope someone can help.

I have an application whose purpose is to suck files into memory and read
through them. These files can't be read sequentially. This has worked like
a charm for over a year... file sizes are anywhere from a few hundred bytes
to 50 MB.

Well, now I have to suck in files that could be larger than 300 MB. I'm
getting out-of-memory exceptions on a server with 4 GB of RAM. When I watch
Task Manager, the machine isn't even using half of its RAM.

Last week I had to deal with a 130 MB file and was able to get it to work by
forcing garbage collection (GC.Collect()), but now I'm stuck.

Any suggestions? Currently, I'm using a FileStream and a StreamReader to read
the file into a StringBuilder via the StreamReader's ReadToEnd method.

Any help is greatly appreciated.

Thanks
Corey
 
Jon Skeet [C# MVP]

cmies said:
> I hope someone can help.
>
> I have an application whose purpose is to suck files into memory and read
> through them. These files can't be read sequentially. This has worked like
> a charm for over a year... file sizes are anywhere from a few hundred bytes
> to 50 MB.
>
> Well, now I have to suck in files that could be larger than 300 MB. I'm
> getting out-of-memory exceptions on a server with 4 GB of RAM. When I watch
> Task Manager, the machine isn't even using half of its RAM.
>
> Last week I had to deal with a 130 MB file and was able to get it to work by
> forcing garbage collection (GC.Collect()), but now I'm stuck.
>
> Any suggestions? Currently, I'm using a FileStream and a StreamReader to read
> the file into a StringBuilder via the StreamReader's ReadToEnd method.

Hmm... is the file genuinely text? Does it contain any non-ASCII
characters? Could you seek into the appropriate place in the file
rather than actually having the whole thing in memory the whole time?

My guess is that using ReadToEnd will create lots of temporary objects,
which will end up getting very large. You *might* be able to improve
things by giving the StreamReader the size of your file as its buffer
size - that way it won't need to keep building up bigger buffers.
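
Something along these lines might help (a rough sketch, untested; the file
path and the ASCII encoding are assumptions for the example):

using System.IO;
using System.Text;

FileInfo fi = new FileInfo("test.txt");
using (FileStream fs = new FileStream(fi.FullName, FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, false, (int)fi.Length))
{
    // The reader is told up front how big its buffer should be,
    // so it doesn't have to keep growing it while reading.
    string contents = sr.ReadToEnd();
}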
 
Guest

Yes, it is a text file.

I used to read files into a string, but that wasn't working as well as the
StringBuilder when dealing with a 130 MB file. And the file needs to be in
memory for two reasons... I need to make some sweeping edits, and then I need
to extract specific pieces of data from it, and the data extracted is
determined "dynamically" based on info in the file... I may need to go back
to the beginning of the file to get more info.

I've not used Seek (I don't think)... I don't know how it could possibly be
useful for my application since these aren't flat files, but I'll see if it
offers any hope.

Thanks
Corey
 
Willy Denoyette [MVP]

cmies said:
> Yes, it is a text file.
>
> I used to read files into a string, but that wasn't working as well as the
> StringBuilder when dealing with a 130 MB file. And the file needs to be in
> memory for two reasons... I need to make some sweeping edits, and then I need
> to extract specific pieces of data from it, and the data extracted is
> determined "dynamically" based on info in the file... I may need to go back
> to the beginning of the file to get more info.
>
> I've not used Seek (I don't think)... I don't know how it could possibly be
> useful for my application since these aren't flat files, but I'll see if it
> offers any hope.
>
> Thanks
> Corey


Please post your code, or at least the part that reads the file.
Don't forget that reading an ASCII text file always implies doubling the size
in the (string) buffer returned by the StreamReader, because .NET strings are
UTF-16 (two bytes per character); that means that reading a 130 MB ASCII file
results in a >260 MB buffer.

Worse, if you aren't careful and use ReadToEnd(), you will consume more than
4 times the file size.
Consider the following (BAD) sample:

StringBuilder sb = null;
string path = "test.txt";
using (FileStream fs = new FileStream(path, FileMode.Open))
{
    using (StreamReader sr = new StreamReader(fs))
    {
        sb = new StringBuilder(sr.ReadToEnd());
    }
}

Say the ASCII file test.txt has a size of 20 MB.
The memory used by this part of the code after ReadToEnd() is
67,108,882 bytes for the string returned by ReadToEnd(). Note that the string
buffer is a char[] which doubles its size, starting at 256, on each overflow,
so 20 M characters end up in a 33,554,432-char buffer. The result is
(33,554,432 chars * 2 bytes) + 18 bytes of object overhead = 67,108,882 bytes.
But there is more: the string is then copied into a StringBuilder, and another
67,108,882 bytes are taken for the StringBuilder's string buffer. The net
result is ~128 MB taken from the LOH for a file of only 20 MB in size.

You can easily calculate what's needed for a 300 MB file if you are using the
same "technique" as illustrated above.

Willy.
 
Jon Skeet [C# MVP]

cmies said:
> Yes, it is a text file.

But is it only ASCII text? There's a big difference. For one thing, if
it's all in ASCII, you can accurately seek to any character position by
seeking to the same byte position.
> I used to read files into a string, but that wasn't working as well as the
> StringBuilder when dealing with a 130 MB file. And the file needs to be in
> memory for two reasons... I need to make some sweeping edits, and then I need
> to extract specific pieces of data from it, and the data extracted is
> determined "dynamically" based on info in the file... I may need to go back
> to the beginning of the file to get more info.

That's fine - you then just seek to the beginning.
> I've not used Seek (I don't think)... I don't know how it could possibly be
> useful for my application since these aren't flat files, but I'll see if it
> offers any hope.

It doesn't matter whether they're flat files or not - if you just need
to be able to get to specific bits of the file (you gave an example of
going back to the beginning) then you can do that easily enough without
having it all in memory.

Put it this way - imagine the disk as a very big version of memory.
You're already seeking in some ways, by processing parts of the string
at a time. Just think about seeking to the right place in the stream
and reading a chunk of text in the same way. (If you use StreamReader,
don't forget to call DiscardBufferedData after seeking though.)
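
A minimal sketch of that idea (the file name is illustrative):

using System;
using System.IO;

using (FileStream fs = new FileStream("claims.txt", FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
{
    string first = sr.ReadLine();      // read some of the file...
    fs.Seek(0, SeekOrigin.Begin);      // ...then jump back to the start
    sr.DiscardBufferedData();          // required after seeking the stream
    string again = sr.ReadLine();      // re-reads the first line
    Console.WriteLine(again == first); // True
}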
 
AlexS

I would dump the file into a database and process it using SQL,
considering the size.

It might be a bit slower than working from memory, but certainly more reliable.
What will you do when you get a file >4 GB?

HTH
Alex
 
Guest

Forgive me, but I'm not seeing how "seek" will help. I don't see a means by
which I can seek to a specific piece of text (the start of a logical record),
for example, "~XYZ^". I hope I'm wrong, because it would be wonderful if I
could do exactly that. There will likely be many instances of "~XYZ^" in a
file and I'll need to find them all and then work with the fields in the
record.

When I'm done looking for the "~XYZ^" record, I'll probably need to look for
the "~ABC^" records, too.
 
Willy Denoyette [MVP]

cmies said:
> Forgive me, but I'm not seeing how "seek" will help. I don't see a means by
> which I can seek to a specific piece of text (the start of a logical record),
> for example, "~XYZ^". I hope I'm wrong, because it would be wonderful if I
> could do exactly that. There will likely be many instances of "~XYZ^" in a
> file and I'll need to find them all and then work with the fields in the
> record.
>
> When I'm done looking for the "~XYZ^" record, I'll probably need to look for
> the "~ABC^" records, too.

Well, if you have "logical records", it probably means you have line-delimited
(CR/LF) files, right?
So you can read the file line by line and store the lines (call them
"records") which contain "~XYZ^" in a container (an ArrayList, or whatever
fits your needs) and skip all the others for now. When done processing these
stored "records", rescan the file for "~ABC^" and repeat the process until all
record types are done.
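
A minimal sketch of that scan (assuming CR/LF-delimited lines; the file name
is illustrative):

using System.Collections;
using System.IO;

ArrayList records = new ArrayList();
using (StreamReader sr = new StreamReader("claims.txt"))
{
    string line;
    while ((line = sr.ReadLine()) != null)
    {
        // Keep only the "~XYZ^" records for this pass; skip the rest.
        if (line.IndexOf("~XYZ^") >= 0)
            records.Add(line);
    }
}
// ...process records, then rescan the same way for "~ABC^"...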

Willy.
 
Willy Denoyette [MVP]

OK then, are your "records" fixed length? They must have a well-defined
layout, don't they?
The point is that you can't take the whole file into memory (see my other
reply), so you have to find a way to scan the file piece by piece and build a
collection of related "records" you can process in one go. Once done with one
"record type", rescan the file and restart the process for the next type,
until all types are done.
If your records have a fixed length, you could use ReadBlock() to read a
block of "records" of reasonable size, scan the block, extract the wanted
records, and store them in a collection (your choice). If they don't have a
fixed length, things are just a little more complicated, as the end of a
block isn't necessarily the end of a record.
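
A rough sketch of the fixed-length case (the record length, block size, and
file name are all assumptions for the example):

using System.IO;

const int RecLen = 80;                  // assumed fixed record length
char[] block = new char[RecLen * 1000]; // 1000 records per block
using (StreamReader sr = new StreamReader("claims.txt"))
{
    int read;
    while ((read = sr.ReadBlock(block, 0, block.Length)) > 0)
    {
        // Because the block size is a multiple of RecLen, records
        // never straddle a block boundary here.
        for (int i = 0; i + RecLen <= read; i += RecLen)
        {
            string record = new string(block, i, RecLen);
            if (record.StartsWith("~XYZ^"))
            {
                // extract the wanted fields from this record
            }
        }
    }
}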

Willy.
 
Guest

I've pretty much decided to do something similar to what you've described.

The files are HIPAA claim files; they are one continuous string of data, and
the length of a record depends on the record type and usage, and on the
content of each field. Thankfully, I don't care about most of the records,
but I do need to glean certain pieces of info from the file.

For smaller files, I've been reading the entire file into a string and can
very quickly get the info that I need from the file by using simple string
commands.

c
 
Willy Denoyette [MVP]

OK, that's great: you have structured data, a type from which you can deduce
the record length, and fields, so you can easily build a collection of
objects from it.
You just need to find a good balance between the required memory space (the
block size used for ReadBlock()) and speed of execution.

Willy.
 
