System.Text.Encoding

qm

I'm really confused and frustrated. Please bear with this
explanation.

I have a text file. I'm opening the text file and
removing any carriage returns and line feeds from the
file. This is usually simple.

HOWEVER...

I have one file where the delimiters are not
exactly "normal". One delimiter is a Hex '5E' and the
other a Hex 'BA'.

So I read the file in with this.....
====================================================
fs = New FileStream(sFile, System.IO.FileMode.Open, _
    System.IO.FileAccess.Read)

sr = New StreamReader(fs, System.Text.Encoding.UTF7)
====================================================

I then read the file into memory with sr.ReadToEnd(), and
do my replacing of CR and LF.

then I write the file out with this.....
====================================================
fs = New FileStream(File2, System.IO.FileMode.Create, _
    System.IO.FileAccess.Write, IO.FileShare.Write)

sw = New IO.StreamWriter(fs) ' <== could put encoding here

sw.Write(CleanedUpTextString)

=====================================================

Here's the problem.

The delimiters are getting replaced with other characters.
In one instance, 'BA' gets replaced with a question mark
('3F'). In another instance, my two delimiters (they are
right next to each other in the original file) are turning
into three odd hex bytes ('5E', 'C2', 'BA'). What
I've been changing is the encoding combinations for both
reading and writing (what is written above is my original
code), and I haven't found anything that preserves the
original data.

I really don't understand text encoding, so that isn't
helping my situation.

Any help would be greatly appreciated.

qm.
 
Jon Skeet [C# MVP]

qm said:
So I read the file in with this.....
====================================================
fs = New FileStream(sFile, System.IO.FileMode.Open, _
    System.IO.FileAccess.Read)

sr = New StreamReader(fs, System.Text.Encoding.UTF7)
====================================================

Is the file definitely encoded in UTF-7? That's quite unusual, and if
it's not, that could well be your problem. In fact, it sounds very
likely that it *is* your problem, if your file has a byte of 0xba,
given that UTF-7 data is meant to contain only ASCII bytes (i.e.
0x00-0x7f).
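
One way to keep the delimiter bytes intact can be sketched as follows. This is a minimal sketch, not a confirmed fix: it assumes the file uses a single-byte encoding, and it picks ISO-8859-1 (code page 28591) because that encoding maps every byte 0x00-0xFF to the character with the same number, so 0x5E and 0xBA should come back out unchanged as long as the same encoding is used for reading and writing. The names LineBreakStripper, StripLineBreaks, inputPath and outputPath are made up for the illustration.

```vbnet
Imports System.IO
Imports System.Text

Module LineBreakStripper
    ' Assumption: the file is in a single-byte encoding. ISO-8859-1
    ' (code page 28591) maps each byte to the character with the same
    ' number, so every byte value survives a read/write round trip.
    Private ReadOnly Latin1 As Encoding = Encoding.GetEncoding(28591)

    ' Hypothetical helper: copies inputPath to outputPath with all
    ' CR and LF characters removed.
    Sub StripLineBreaks(inputPath As String, outputPath As String)
        Dim text As String

        ' Decode with the same encoding that will be used for writing.
        Using reader As New StreamReader(inputPath, Latin1)
            text = reader.ReadToEnd()
        End Using

        text = text.Replace(vbCr, "").Replace(vbLf, "")

        ' False = overwrite rather than append.
        Using writer As New StreamWriter(outputPath, False, Latin1)
            writer.Write(text)
        End Using
    End Sub
End Module
```

If the file turns out to be in some other encoding (a Windows code page, say), the same pattern applies with that encoding passed to both the reader and the writer.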

I really don't understand text encoding, so that isn't
helping my situation.

Have a look at http://www.pobox.com/~skeet/csharp/unicode.html
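
The two symptoms described in the question are consistent with the delimiter 0xBA having been decoded to the character U+00BA and then written back out with a different encoding. The small sketch below only shows the encoding half of that, using a hand-built two-character string; the module name EncodingSymptoms and the helper ToHex are made up for the illustration.

```vbnet
Imports System
Imports System.Text

Module EncodingSymptoms
    ' Hypothetical helper: formats bytes as hex pairs separated by hyphens.
    Function ToHex(bytes As Byte()) As String
        Return BitConverter.ToString(bytes)
    End Function

    Sub Main()
        ' The two delimiters as characters: U+005E followed by U+00BA.
        Dim delimiters As String = "^" & ChrW(&HBA)

        ' UTF-8 needs two bytes for U+00BA.
        Console.WriteLine(ToHex(Encoding.UTF8.GetBytes(delimiters)))   ' 5E-C2-BA

        ' ASCII cannot represent U+00BA and substitutes "?".
        Console.WriteLine(ToHex(Encoding.ASCII.GetBytes(delimiters)))  ' 5E-3F

        ' ISO-8859-1 writes U+00BA as the single byte 0xBA.
        Console.WriteLine(ToHex(Encoding.GetEncoding(28591).GetBytes(delimiters))) ' 5E-BA
    End Sub
End Module
```

A StreamWriter created without an explicit encoding writes UTF-8, which would account for the extra 0xC2 byte in the output file.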
 
