StreamReader Alternative

Matt Bailey
I am writing an ASP page with a C# backend. This page uses a
StreamReader to read in a file and pull out a specific section of the
text based on markers inside it. While the code I have works, I get a
404 error on files larger than 10 MB, which will be a common
occurrence. I understand this might be a question better suited for an
ASP forum, but I was wondering if anyone out there knows of an
alternative to StreamReader which will help me avoid this timeout
issue.

Thanks.
 
Matt Bailey said:
I am writing an ASP page with a C# backend. This page uses a
StreamReader to read in a file and pull out a specific section of the
text based on markers inside it. While the code I have works, I get a
404 error on files larger than 10 MB, which will be a common
occurrence. I understand this might be a question better suited for an
ASP forum, but I was wondering if anyone out there knows of an
alternative to StreamReader which will help me avoid this timeout
issue.

I doubt that the issue is really with StreamReader - that's pretty
fast.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
 
StreamReader itself should be quite capable of handling the file. How are
you parsing the file? Reading line by line? Reading the whole thing at
once? How are you then "separating" out the section you need?

Also, just to clarify - you say you are getting a 404 error. That is a
"Page Not Found" error; I don't believe a timeout would throw this...
 
Here is a snippet of the code...

private Hashtable readPDFile(string pdFile)
{
    StreamReader reader = new StreamReader(File.OpenRead(pdFile));
    Hashtable pdList = new Hashtable();

    // Skip ahead to the start of the "NCE" chapter.
    while (reader.ReadLine().Trim() != "NCE") {}

    ArrayList entries = new ArrayList();
    string entry;

    entry = reader.ReadLine();

    // Collect entries until the end-of-chapter marker.
    while (entry != "ZZZZ")
    {
        if (!entry.StartsWith("&"))
            entries.Add(entry.Substring(0, 8));

        entry = reader.ReadLine().Trim();
    }

    entries.Sort();

    // Count occurrences of each entry, ignoring the final character.
    foreach (object o in entries)
    {
        string key = o.ToString().Substring(0, 7);

        if (pdList.ContainsKey(key))
            pdList[key] = Convert.ToInt32(pdList[key]) + 1;
        else
            pdList.Add(key, 1);
    }

    reader.Close();

    return pdList;
}

Essentially, I am reading through a file which is separated into
"chapters", each denoted by a header. The section I want starts with
the header "NCE", so I read through the lines until I hit that marker,
and then read from the file until I hit the end of the chapter, marked
by "ZZZZ". From there, I go through the list and count the number of
entries for each value, ignoring the final character. In the file that
doesn't work, that is about 20,000 records, which this program can
handle. The problem arises when the file contains more data further on
in additional chapters, which pads the file size out to 10 MB. This
code works with smaller files that have more records to read in
(approx. 30,000), so I know the additional chapters are the culprit.
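
To illustrate, a file might look roughly like this (a hypothetical
sample; the real data obviously differs):

SOMECHAPTER
...
NCE
ABCD123X
ABCD123Y
&COMMENT LINE, SKIPPED
EFGH456Z
ZZZZ
NEXTCHAPTER
...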

And you are right, I am not getting a 404 error, but a "The page cannot
be displayed. The page you are looking for is currently unavailable.
The Web site might be experiencing technical difficulties, or you may
need to adjust your browser settings." error.

Hope this makes things clearer.

Thanks!
 
Matt Bailey said:
Here is a snippet of the code...

Rather than a snippet, have you tried producing a short but complete
program which demonstrates the problem? What happens if you try to call
the same method from a console app which just passes in the filename
from the command line?

A few pointers on the code, btw:

1) You're not using either a "using" statement or try/finally with the
StreamReader, so if anything goes wrong the file will stay open until
the GC kicks in and finalizes the underlying stream.

2) Your ToString call in the foreach is unnecessary - just declare o as
a string instead of as an object.

3) It's not clear why you're building a list in the first place - why
not just add them straight to the hashtable? (See the sketch at the
end of this post.)

Finally, if you can't reproduce the problem in a console app, add some
logging to the method to see how far it's actually getting and how long
it's taking to get there.
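
Putting those pointers together, here's a rough sketch (making the
same assumptions about the file format as your original, and counting
straight into the Hashtable):

private Hashtable readPDFile(string pdFile)
{
    Hashtable pdList = new Hashtable();

    // "using" guarantees the file is closed even if an exception is
    // thrown part way through.
    using (StreamReader reader = new StreamReader(File.OpenRead(pdFile)))
    {
        // Skip ahead to the start of the "NCE" chapter.
        while (reader.ReadLine().Trim() != "NCE") {}

        string entry = reader.ReadLine();

        while (entry != "ZZZZ")
        {
            if (!entry.StartsWith("&"))
            {
                // Count directly; no intermediate list needed.
                string key = entry.Substring(0, 7);

                if (pdList.ContainsKey(key))
                    pdList[key] = (int) pdList[key] + 1;
                else
                    pdList.Add(key, 1);
            }

            entry = reader.ReadLine().Trim();
        }
    }

    return pdList;
}

(If you need the entries in sorted order for something else, that part
would need rethinking, of course.)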
 
You might want to try turning off page buffering in your ASP.NET page
to see if that helps. It's turned on by default, and a 10 MB file is
likely to be over the buffering threshold.
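
If you want to try that, buffering can be switched off either in the
page directive or from code - a minimal sketch:

<%@ Page Buffer="false" %>

// or from code-behind, before anything is written to the response:
Response.Buffer = false;

(If it turns out to be the server-side execution timeout instead, that
is controlled by the executionTimeout attribute on the httpRuntime
element in web.config.)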

Jono

Matt said:
Here is a snippet of the code...

[snip]
 
Hi Jon, thanks for the replies.

The reason I am going to a list is so that I can sort the entries. The
Hashtable doesn't have that functionality as far as I know. As for the
try/catch, it's there, but not in the snippet. The code works in a
console app, but not over the web, which is where my problem lies.
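
As an aside, System.Collections.SortedList does keep its keys in
sorted order, so a hypothetical alternative could sort and count in a
single structure. A minimal sketch of just the counting loop, reusing
the entries list from the earlier snippet:

SortedList counts = new SortedList();

foreach (string entry in entries)
{
    string key = entry.Substring(0, 7);

    if (counts.ContainsKey(key))
        counts[key] = (int) counts[key] + 1;
    else
        counts.Add(key, 1);
}

// Enumerating counts now visits the keys in ascending order,
// with no separate Sort() call needed.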
 
I also neglected to mention in the previous post that I have tried to
do some debugging using both logging and breakpoints in the code, and
neither ever fires...
 
And in response to Jono, I have tried setting the Page.Response.Buffer
property to false, assuming that's the buffer you are referring to.
That was to no avail.
 
Matt said:
I also neglected to mention in the previous post that I have tried to
do some debugging using both logging and breakpoints in the code, and
neither ever fires...

That's the first thing to fix then. You say it's working for small
files - so you should be able to get logging etc working on small
files. You can then try with the large file.

My guess is that you'll find your problem isn't in the area you think
it is.
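
A minimal sketch of the kind of logging I mean (the log file path is
just an example):

int lineCount = 0;
string line;

while ((line = reader.ReadLine()) != null)
{
    lineCount++;

    // Record progress every 1000 lines so you can see how far it gets.
    if (lineCount % 1000 == 0)
    {
        using (StreamWriter log = File.AppendText(@"c:\temp\readpd.log"))
        {
            log.WriteLine("{0}: {1} lines read", DateTime.Now, lineCount);
        }
    }
}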

Jon
 