Using detectEncodingFromByteOrderMarks while copying a text file

Claire

After copying a text file line by line and comparing the result with the
original, I noticed that the original has several bytes at the beginning
that denote its encoding. How do I carry that over to my copy?
My original code, shown below, didn't produce a perfect copy, so I switched
to the StreamReader constructor that takes detectEncodingFromByteOrderMarks.
But I also need to pass the encoding to the StreamWriter constructor, so I
need some way to work out which encoding was detected. How, please?

string InputPath = Path.GetDirectoryName(Application.ExecutablePath) + @"\intext.txt";
string OutputPath = Path.GetDirectoryName(Application.ExecutablePath) + @"\outtext.txt";
string In;
string Out;

using (StreamReader Input = new StreamReader(InputPath))
// using (StreamReader Input = new StreamReader(InputPath, true)) << constructor with detection
{
    using (StreamWriter Output = new StreamWriter(OutputPath))
    {
        while ((In = Input.ReadLine()) != null)
        {
            Out = DoSomethingTo(In);
            Output.WriteLine(Out);
        }
    }
}
 
Marc Gravell

I'm guessing - tell the writer about it?

using (StreamWriter Output = new StreamWriter(OutputPath, false, Input.CurrentEncoding)) {...}

Marc
 
Marc Gravell

Correction - the CurrentEncoding is not valid until it has read some
data; perhaps something like below; note that it also can't detect every
encoding possible...

Marc

using (StreamReader reader = new StreamReader(path1, true))
{
    // Read the first line before creating the writer, so that
    // CurrentEncoding reflects any byte order mark that was found.
    string line = reader.ReadLine();
    using (StreamWriter writer = new StreamWriter(path2, false, reader.CurrentEncoding))
    {
        Console.WriteLine("Reading {0} with {1}", path1, reader.CurrentEncoding.EncodingName);
        Console.WriteLine("Writing {0} with {1}", path2, writer.Encoding.EncodingName);

        while (line != null)
        {
            string t = Transform(line);
            Console.WriteLine(t);
            writer.WriteLine(t);
            line = reader.ReadLine();
        }
    }
}
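
To see why the first read has to happen before the writer is created, here is a minimal sketch that reports CurrentEncoding before and after the first ReadLine. The Probe helper, the class name and the sample file name are made up for illustration and are not part of the code above.

```csharp
using System;
using System.IO;
using System.Text;

static class CurrentEncodingDemo
{
    // Hypothetical helper: returns the reader's encoding name
    // before and after the first read.
    public static string[] Probe(string path)
    {
        using (StreamReader reader = new StreamReader(path, true))
        {
            // Nothing has been read yet, so this is the constructor default.
            string before = reader.CurrentEncoding.WebName;

            // The first read inspects the leading bytes of the file.
            reader.ReadLine();

            // Now the value reflects the byte order mark, if one was found.
            string after = reader.CurrentEncoding.WebName;

            return new[] { before, after };
        }
    }

    static void Main()
    {
        // Sample file written as UTF-16 with a byte order mark
        // (the file name is an assumption for this sketch).
        string path = Path.Combine(Path.GetTempPath(), "bom-demo.txt");
        File.WriteAllText(path, "first line\r\nsecond line\r\n", Encoding.Unicode);

        string[] names = Probe(path);
        Console.WriteLine("Before first read: {0}", names[0]);
        Console.WriteLine("After first read:  {0}", names[1]);
    }
}
```

With a UTF-16 sample like this one, the two names should differ. For a file that has no byte order mark at all, the reader stays on its UTF-8 default, and as far as I know a writer given that encoding will then write a UTF-8 byte order mark that the original did not have, so the copy may still differ in its first few bytes.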
 
Claire

Marc Gravell said:
Correction - the CurrentEncoding is not valid until it has read some data;
perhaps something like below; note that it also can't detect every
encoding possible...

That's great! thank you :)
 
Mihai N.

Unless you process the text somehow, it is not worth the trouble to
copy a text file as a text file (with encoding detection, line endings,
and so on).
Just copy it as binary. The routine can then be reused for any type
of file, and there is no risk of data corruption if you "guess" the
encoding wrong.
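
A minimal sketch of the binary approach, assuming nothing needs to be changed in the text. The CopyExact helper, the class name and the sample file contents are made up for illustration; File.Copy does the actual work.

```csharp
using System;
using System.IO;
using System.Text;

static class BinaryCopyDemo
{
    // Hypothetical helper: copies the file byte for byte.
    // No decoding takes place, so any byte order mark is carried over as-is.
    public static void CopyExact(string sourcePath, string destinationPath)
    {
        File.Copy(sourcePath, destinationPath, true); // true = overwrite an existing file
    }

    static void Main()
    {
        // File names borrowed from the question; the temp folder is an assumption.
        string dir = Path.GetTempPath();
        string source = Path.Combine(dir, "intext.txt");
        string dest = Path.Combine(dir, "outtext.txt");

        File.WriteAllText(source, "sample line\r\n", Encoding.Unicode);
        CopyExact(source, dest);

        Console.WriteLine("Source length: {0}", new FileInfo(source).Length);
        Console.WriteLine("Copy length:   {0}", new FileInfo(dest).Length);
    }
}
```

This only fits when the content is left untouched; as soon as each line is transformed, the text has to be decoded and re-encoded.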
 
Marc Gravell

I very nearly said the same thing - but if you look carefully, there is
a transform hidden in the code:

Out = DoSomethingTo(In);
Output.WriteLine(Out);

Marc
 
Mihai N.

Marc Gravell said:
I very nearly said the same thing - but if you look carefully, there is
a transform hidden in the code:

Right, I missed that one. Got fooled by the subject :)
 
