Reading a very large .TXT file with StreamReader

mloichate

I must read a very large plain text file (usually with a .txt
extension) and replace a given character with another given character
throughout the file. My application was working pretty well
with the code below (placed in a button click event handler, after
selecting the file in a normal OpenFileDialog):

---------------------------------
System.IO.StreamReader sr = new System.IO.StreamReader(@"C:\1.txt");
string all = sr.ReadToEnd();
sr.Close();
all = all.Replace(",",".");
System.IO.StreamWriter sw = new System.IO.StreamWriter(@"C:\1.txt",
false);
sw.Write(all);
sw.Flush();
sw.Close();
MessageBox.Show("Done!");
---------------------------------

The problem is that the user now needs to process .txt files of over
500 MB. Incredible and hellish, but true!

The user now runs the application and, after selecting the .txt file with
the OpenFileDialog, the application starts processing but immediately crashes
or hangs; I don't remember which.

I need an efficient way to read such large .txt files. Would reading
the file in several steps do it? I mean, reading it into byte arrays
or similar, with a MemoryStream object...

Any help will be greatly appreciated.
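A minimal sketch of the block-by-block reading asked about above, assuming a single-character replacement as in the original code. Rather than raw bytes and a MemoryStream, it reads fixed-size char blocks with StreamReader.Read, so decoding the text is still left to the reader. The ChunkedReplace and ReplaceChar names and the 64K buffer size are illustrative choices, not anything from this thread. Because it never looks for line breaks, it also copes with a file that has no newlines at all, and it leaves the line endings untouched.

```csharp
using System.IO;

static class ChunkedReplace
{
    // Copies sourcePath to targetPath, replacing every occurrence of the
    // single character `from` with `to`. Only one buffer of characters is
    // held in memory at a time, whatever the size of the file.
    public static void ReplaceChar(string sourcePath, string targetPath, char from, char to)
    {
        char[] buffer = new char[64 * 1024]; // 64K chars per read; the size is arbitrary

        using (StreamReader reader = new StreamReader(sourcePath))
        {
            using (StreamWriter writer = new StreamWriter(targetPath))
            {
                int count;
                // Read returns the number of chars actually read, 0 at end of file.
                while ((count = reader.Read(buffer, 0, buffer.Length)) > 0)
                {
                    for (int i = 0; i < count; i++)
                    {
                        if (buffer[i] == from)
                        {
                            buffer[i] = to;
                        }
                    }
                    writer.Write(buffer, 0, count);
                }
            }
        }
    }
}
```

This only suits single characters (or, more generally, search strings that cannot straddle two blocks); a multi-character string could be split across a block boundary and missed.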
 
Paul E Collins

> [replacing characters in a 500 MB text file]
> [...]
> string all = sr.ReadToEnd();

This means you're reading the entire file into memory at once.
Instead, try having a StreamReader and a StreamWriter open at the same
time. Read a line from one file and write it to the other file, then
delete the first file when done and replace it with the second one.
You will only use the memory taken up by one line, rather than
slurping up 500 MB out of nowhere.
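On .NET 4.0 and later (newer than this thread), the same line-at-a-time idea can be written with File.ReadLines, which returns a lazily evaluated sequence instead of loading the whole file, feeding File.WriteAllLines into a temporary file. A sketch, where the LineByLineReplace class and its Replace method are hypothetical names:

```csharp
using System.IO;
using System.Linq;

static class LineByLineReplace
{
    // Requires .NET 4.0 or later. File.ReadLines yields one line at a time and
    // File.WriteAllLines consumes the sequence as it goes, so only the current
    // line is held in memory.
    public static void Replace(string sourcePath, string oldValue, string newValue)
    {
        string tempPath = Path.GetTempFileName();

        File.WriteAllLines(tempPath,
            File.ReadLines(sourcePath).Select(line => line.Replace(oldValue, newValue)));

        // The source file has been closed once WriteAllLines finished enumerating it.
        File.Delete(sourcePath);
        File.Move(tempPath, sourcePath);
    }
}
```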

P.
 
Ignacio Machin ( .NET/ C# MVP )

Hi,


Below is a better solution:

string sourceFile = @"C:\1.txt";
string tempTarget = System.IO.Path.GetTempFileName();

StreamReader reader = new System.IO.StreamReader( sourceFile );
StreamWriter writer = new StreamWriter( tempTarget );

string line;
while ( (line = reader.ReadLine() )
    writer.WriteLine( line.Replace( ",",".") );
reader.Close();
writer.Close();
File.Delete( sourceFile );
File.Copy( tempTarget, sourceFile );


cheers,
 
mloichate

Thanks guys, especially Ignacio.

I'll try it over the next few days, but it really seems to be a good solution. If I have
any more problems, I'll come back to this thread.

Thanks again.
 
mloichate

Hi again, the "while" statement does not compile, because the while
condition must evaluate to a bool and we are writing an assignment.
ReadLine() returns a string, not a bool. Besides, I cannot use
while (reader.ReadLine() != String.Empty), because the file can contain
blank lines and the loop would then exit too early.
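The point about blank lines is right, and it is also why the end-of-file test should be against null rather than String.Empty: ReadLine returns "" for a blank line and null only when there is nothing left to read. In C# an assignment is itself an expression whose value is the assigned string, so it can be compared with null inside the while condition. A small sketch (ReadLineDemo is a made-up name, and a StringReader stands in for the file):

```csharp
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo
{
    // Collects every line from a TextReader, blank lines included.
    public static List<string> AllLines(TextReader reader)
    {
        List<string> lines = new List<string>();
        string line;
        // (line = reader.ReadLine()) evaluates to the string just read;
        // only null (end of input) stops the loop, "" does not.
        while ((line = reader.ReadLine()) != null)
        {
            lines.Add(line);
        }
        return lines;
    }
}
```

For example, AllLines(new StringReader("first\n\nthird")) returns three lines, the middle one empty.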

Any other idea?

Thanks.
 
Jon Skeet [C# MVP]

It should be:
while ((line = reader.ReadLine()) != null)

Another suggestion, however - rather than having manual calls to Close,
you should use "using" statements round both the reader and the writer.
That way they automatically get closed whether an exception occurs or
not.
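To illustrate the point about exceptions, a sketch built around a hypothetical UsingDemo class: the first method writes through a StreamWriter inside a using statement and then throws, and the second shows, approximately, the try/finally the compiler generates for it. In both cases Dispose runs while the exception propagates, so the buffered text is flushed and the file is closed.

```csharp
using System;
using System.IO;

static class UsingDemo
{
    // Writes text inside a using statement, then throws before the block ends.
    // Dispose (flush + close) still runs on the way out.
    public static void WriteThenFail(string path, string text)
    {
        using (StreamWriter writer = new StreamWriter(path))
        {
            writer.Write(text);
            throw new InvalidOperationException("simulated failure");
        }
    }

    // Approximately what the compiler generates for the using statement above.
    public static void WriteThenFailExpanded(string path, string text)
    {
        StreamWriter writer = new StreamWriter(path);
        try
        {
            writer.Write(text);
            throw new InvalidOperationException("simulated failure");
        }
        finally
        {
            if (writer != null)
            {
                ((IDisposable)writer).Dispose();
            }
        }
    }
}
```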

Jon
 
mloichate

Hi guys,

Here is the corrected code. It works perfectly, although I haven't tried it with 500
MB .txt files yet:

-----------------------------------------------------------------------
string sourceFile = textBox1.Text;
string tempTarget = System.IO.Path.GetTempFileName();
StreamReader reader = new System.IO.StreamReader(sourceFile);
StreamWriter writer = new StreamWriter(tempTarget);

string line = String.Empty;
while ( (line = reader.ReadLine()) != null)
{
    writer.WriteLine(line.Replace(txt1.Text,txt2.Text));
}
reader.Close();
writer.Close();
File.Delete(sourceFile);
File.Move(tempTarget,sourceFile);
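One behaviour of this ReadLine/WriteLine loop worth knowing about: ReadLine strips the line terminator ("\n", "\r" or "\r\n"), and WriteLine appends the writer's NewLine, which defaults to Environment.NewLine. So the output's line endings are normalized, and a final newline is added even when the original file ended without one. The sketch below runs the same loop on in-memory strings (LineEndingDemo is a hypothetical name) to make that visible:

```csharp
using System.IO;

static class LineEndingDemo
{
    // Same ReadLine/WriteLine loop as the post, over strings instead of files.
    public static string ReplaceLines(string input, string oldValue, string newValue)
    {
        using (StringReader reader = new StringReader(input))
        {
            using (StringWriter writer = new StringWriter())
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // ReadLine dropped the terminator; WriteLine appends writer.NewLine.
                    writer.WriteLine(line.Replace(oldValue, newValue));
                }
                return writer.ToString();
            }
        }
    }
}
```

On Windows, ReplaceLines("1,5\n2,5", ",", ".") returns "1.5\r\n2.5\r\n". If the exact line endings must be preserved, reading fixed-size character blocks with StreamReader.Read instead of ReadLine avoids this.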
 
Jon Skeet [C# MVP]

You should be able to use using statements instead:

using (StreamReader reader = new StreamReader (sourceFile))
{
    using (StreamWriter writer = new StreamWriter (tempTarget))
    {
        string line;
        while ( (line=reader.ReadLine()) != null)
        {
            writer.WriteLine (line.Replace(txt1.Text, txt2.Text));
        }
    }
}
File.Delete(sourceFile);
File.Move(tempTarget,sourceFile);
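A caveat for any of these versions: new StreamWriter(path) writes UTF-8, and new StreamReader(path) assumes UTF-8 unless the file starts with a byte order mark. If the original file uses another encoding (a Windows ANSI code page, say), non-ASCII characters may be mangled or the file's encoding silently changed. One option, sketched here under the assumption that the caller knows the file's encoding, is to pass the same Encoding to both the reader and the writer. EncodingAwareReplace is a hypothetical name; the loop is the one from Jon's post.

```csharp
using System.IO;
using System.Text;

static class EncodingAwareReplace
{
    // Jon's using-based loop, with the caller choosing the encoding for both
    // the reader and the writer so the rewritten file keeps that encoding.
    public static void Replace(string sourceFile, string oldValue, string newValue, Encoding encoding)
    {
        string tempTarget = Path.GetTempFileName();

        using (StreamReader reader = new StreamReader(sourceFile, encoding))
        {
            using (StreamWriter writer = new StreamWriter(tempTarget, false, encoding))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(line.Replace(oldValue, newValue));
                }
            }
        }

        File.Delete(sourceFile);
        File.Move(tempTarget, sourceFile);
    }
}
```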
 
mloichate

Congratulations on this simple but really good snippet of code, Jon.

The nested "using" structure is very convenient and, as you said,
you don't have to worry about closing either the reader or the
writer.

Thanks very much.

Regards,
 
William Stacey [MVP]

Regarding Jon's nested using blocks: for what it's worth, I have been using a slightly modified form:

using(StreamReader sr = ...)
using(StreamWriter sw = ...)
{
// use the streams.
}

Same results; it just saves a couple of lines and looks slightly cleaner.
 
Lonifasiko

Sorry to disappoint you, but my programming style leans more towards
Jon's. The code looks much more structured and clearer to me.
I think in your example you are only "saving" two brackets, aren't you?

One of my own programming rules: "It's better to have clear and
well-structured code (brackets and so on...) even if it means
a few more lines of code".

Thanks anyway. I didn't know you could nest two "using" statements
without any brackets between them.

Regards.
 
Jon Skeet [C# MVP]

Lonifasiko said:
> Sorry to disappoint you, but my programming style leans more towards
> Jon's. The code looks much more structured and clearer to me.
> I think in your example you are only "saving" two brackets, aren't you?

It depends. You can easily get really heavily indented code with no
benefit in some situations - it's quite easy to end up with four nested
brackets without much in between.

I very rarely use the version with no extra brackets, but occasionally
it does improve readability. After all, you're treating the whole
contents as a single block.
 
William Stacey [MVP]

> Sorry to disappoint you, but my programming style leans more towards
> Jon's. The code looks much more structured and clearer to me.
> I think in your example you are only "saving" two brackets, aren't you?

That's OK, no disappointment. Whatever makes you more productive.

> One of my own programming rules: "It's better to have clear and
> well-structured code (brackets and so on...) even if it means
> a few more lines of code".

I would disagree with that. It is clearer, IMO, to do it the other way,
as you see right away where they come into scope and leave scope at the same
place. For two streams it probably makes no difference, but for 3 or 4 (or
more), using the method I showed can really make it cleaner IMO. You're not
just saving the brackets; you're eliminating all the unnecessary nesting
and indentation, which can get distracting. And if you're stacking streams in a
pipeline (e.g. file to compression to encryption to NetworkStream), it
really looks good. The end result is the same, but naturally everyone has
different tastes.
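As an illustration of the pipeline case, a sketch that stacks a FileStream, a GZipStream and a StreamWriter in the no-extra-braces using style. Encryption and networking are left out to keep it short, and StackedPipeline and its methods are hypothetical names. The resources are disposed in reverse order of declaration when the block ends, so the writer is flushed before the compressed stream and the file are closed.

```csharp
using System.IO;
using System.IO.Compression;

static class StackedPipeline
{
    // file -> gzip -> text writer; each using wraps the stream declared above it.
    public static void WriteCompressed(string path, string text)
    {
        using (FileStream file = File.Create(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Compress))
        using (StreamWriter writer = new StreamWriter(gzip))
        {
            writer.Write(text);
        }
    }

    // The reverse pipeline: file -> gunzip -> text reader.
    public static string ReadCompressed(string path)
    {
        using (FileStream file = File.OpenRead(path))
        using (GZipStream gzip = new GZipStream(file, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip))
        {
            return reader.ReadToEnd();
        }
    }
}
```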
 
