Strip Unwanted Characters from a text file

  • Thread starter: David Beck

David Beck

I download some files for processing every day that have unwanted
characters in them. In VB6 I use InputB to read in the text and
StrConv to convert it.

vLinesFromFile = StrConv(InputB(LOF(nFileNumGENERIC), nFileNumGENERIC), vbUnicode)

If the string has any unwanted characters (e.g. Chr(26)), I use
Replace to remove them and save the file.

Now the size of some of these files has grown to several megabytes.
Processing them in VB6 is now slower than a slug in salt. Can someone
give me a C# program stub that can help a VB guy check for unwanted
characters and eliminate them? I'm thinking it will be much faster.

David A. Beck
 
David,

Are you writing these back to the main file, or to a new file? Either
way, you should open two streams (one to read, one to write) and wrap
them in a StreamReader and a StreamWriter respectively. As you cycle
through the characters in the stream (you can read in chunks, and you
can decide which chunk size works best), check for the unwanted
character. If you need it replaced, replace it before writing to the
output stream.
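
A minimal sketch of that chunked approach, in case it helps. The class and method names (CharStripper, Strip, StripFile), the file paths, the 64K chunk size and the choice of encoding are all assumptions for illustration, not something from the original post; pick the encoding the files are actually in.

```csharp
using System;
using System.IO;
using System.Text;

static class CharStripper
{
    // Copies text from input to output in fixed-size chunks,
    // dropping every character that appears in 'unwanted'.
    public static void Strip(TextReader input, TextWriter output,
                             char[] unwanted, int chunkSize)
    {
        char[] inBuf = new char[chunkSize];
        char[] outBuf = new char[chunkSize];
        int read;
        while ((read = input.Read(inBuf, 0, inBuf.Length)) > 0)
        {
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = inBuf[i];
                if (Array.IndexOf(unwanted, c) < 0)
                {
                    outBuf[kept++] = c;   // keep anything not on the list
                }
            }
            output.Write(outBuf, 0, kept);
        }
    }

    // Convenience wrapper for files. Writes to a NEW file rather than
    // overwriting the input, so a failed run does not lose data.
    public static void StripFile(string inputPath, string outputPath,
                                 Encoding encoding, char[] unwanted)
    {
        using (StreamReader reader = new StreamReader(inputPath, encoding))
        using (StreamWriter writer = new StreamWriter(outputPath, false, encoding))
        {
            Strip(reader, writer, unwanted, 64 * 1024);
        }
    }
}
```

Usage would be something like CharStripper.StripFile("input.txt", "output.txt", Encoding.Default, new char[] { '\0', '\u001A' }), where both paths are placeholders. Only one chunk is held in memory at a time, so the size of the file should not matter much.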

Hope this helps.
 
The simplest way would probably be to read the file line by line with a
StreamReader (using whatever encoding the file is in), use
String.Replace (or possibly a regular expression) to remove the
characters, then write the line back to the new file.
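
For the per-line clean-up itself, here is a small sketch of both options. The class name LineCleaner and the particular characters removed (Chr(0) and Chr(26)) are assumptions based on the examples mentioned in this thread; adjust the list to whatever the files really contain.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineCleaner
{
    // Option 1: chained String.Replace calls, one per unwanted character.
    public static string RemoveWithReplace(string line)
    {
        return line.Replace("\0", "").Replace("\u001A", "");
    }

    // Option 2: one regular expression with a character class listing
    // the unwanted characters (\x00 = Chr(0), \x1A = Chr(26)).
    private static readonly Regex Unwanted = new Regex("[\\x00\\x1A]");

    public static string RemoveWithRegex(string line)
    {
        return Unwanted.Replace(line, "");
    }
}
```

With only a couple of characters to remove, the Replace version is probably the easier one to read; the regex version may be more convenient if the list of unwanted characters grows.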
 
Jon:

Actually all I want to do is read line by line. However, some files had
Chr(0) embedded in them, and that gets treated as EOF.

I set up a test opening a file for append and processing chunks of 1K,
10K, and 100K on a 101MB file. The total time is not much different from
VB6 loading the whole thing at once. However, it does not kill the whole
machine in the process of doing it.

Are you saying I can use file streams in/out in VB6? Can you give me a
code stub?
 
David Beck said:
Actually all I want to do is read line by line. However, some files had
Chr(0) embedded in them, and that gets treated as EOF.

It shouldn't do when read by StreamReader, IIRC.

I set up a test opening a file for append and processing chunks of 1K,
10K, and 100K on a 101MB file. The total time is not much different from
VB6 loading the whole thing at once. However, it does not kill the whole
machine in the process of doing it.

Are you saying I can use file streams in/out in VB6? Can you give me a
code stub?

Well, something like:

using (StreamReader input = new StreamReader (...))
{
    using (StreamWriter output = new StreamWriter (...))
    {
        string line;
        // ReadLine returns null once the end of the input is reached
        while ( (line = input.ReadLine()) != null)
        {
            // Munge the line however you want to

            output.WriteLine (line);
        }
    }
}
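
As a quick check of the point about embedded Chr(0), here is a sketch that runs the same loop over an in-memory stream instead of a real file, with a Replace call filled in as the "munge" step. The class name NulCheck, the method CleanAll and the use of an ASCII test buffer are assumptions made for the example. StreamReader.ReadLine only splits on carriage return and line feed, so Chr(0) and Chr(26) come through as ordinary characters.

```csharp
using System;
using System.IO;
using System.Text;

static class NulCheck
{
    // Reads 'fileBytes' line by line, removes Chr(0) and Chr(26) from
    // each line, and returns the cleaned lines, each terminated by "\n".
    public static string CleanAll(byte[] fileBytes, Encoding encoding)
    {
        StringBuilder result = new StringBuilder();
        using (StreamReader input = new StreamReader(new MemoryStream(fileBytes), encoding))
        using (StringWriter output = new StringWriter(result))
        {
            output.NewLine = "\n";   // fixed terminator so the result is predictable
            string line;
            while ((line = input.ReadLine()) != null)
            {
                // The "munge" step: strip the unwanted characters.
                line = line.Replace("\0", "").Replace("\u001A", "");
                output.WriteLine(line);
            }
        }
        return result.ToString();
    }
}
```

Feeding it the bytes of "one\0two\r\nthree\u001Afour\r\n" should give back two complete lines, which shows the reader carrying on past both characters rather than stopping at them.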
 