Text Encoding...

quincy

I need help. Please bear with this.

I have a program. It takes in files that are delimited.
The delimiters are declared in the file by looking at
fixed positions in the file (If you work with ANSI x12
files, you know what I mean). This normally isn't a
problem, but I'm getting a file that is using some odd
characters as delimiters.

Specifically, a Hex 'BA' is declared as a delimiter. I
read the file into memory using this...


=================================================
fs = New FileStream(InFile, System.IO.FileMode.Open,
System.IO.FileAccess.Read)
sr = New StreamReader(fs, System.Text.Encoding.UTF7)

InputString = sr.ReadToEnd
=================================================


At this point I close the file. In the string, I remove
any carriage return and line feed characters.

Then I write the string to a new file with this.


=================================================
fs2 = New FileStream(OutFile, System.IO.FileMode.Create,
System.IO.FileAccess.Write, IO.FileShare.Write)
sw = New StreamWriter(fs2, System.Text.Encoding.ASCII)

sw.Write(sANSIString)
=================================================

NOTE: Initially, I don't think my stream writer specified
an encoding.

Anyway here is the problem....

The resulting file ends up with a different value in the
places where the hex 'BA' used to be. I've played with
various combinations of encoding, for both reading and
writing, and I'm not able to maintain the character. I
need to maintain this!

In one case, the single-byte hex 'BA' is actually
replaced with two bytes, but everything else in the file
is as it should be. In another case, the character is
a "?". I don't remember what happens in other
situations, but in no case is the hex 'BA' maintained.

I don't really understand encoding, so that is only
compounding my frustration and confusion.

Any help is greatly appreciated. I could supply more
details, if necessary.

QM.
 
Bryan Martin

Not sure here, but here we go anyway...

Stop removing the CRs and LFs from the stream; the A in BA is actually an LF.
When you remove it, you're losing your ability to parse at that position.

cr = Carriage Return
lf = Line Feed

Bryan Martin
 
Bryan Martin

Oh and BTW it seems your BA is represented as....

Hex B = Dec 11 which corresponds to vertical tab mostly added by word
processors.
Hex A = Dec 10 which corresponds to line feed.

Bryan Martin
 
Guest

I wondered about doing that, and I did run the read/write
without removing the crlf.... same problems.
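
If the CR/LF removal is not what changes the delimiter, one way to take encodings out of the picture is to strip those two byte values from the raw bytes and never decode the file at all. This is only a sketch: the module name, the RemoveCrLf helper, and the use of command-line arguments for the paths are made up for illustration, and it assumes the file is small enough to hold in memory.

```vbnet
Imports System.Collections.Generic
Imports System.IO

Module StripLineBreaks
    ' Returns a copy of data without CR (13) and LF (10) bytes.
    ' Every other byte, including 0xBA, is copied through unchanged.
    Function RemoveCrLf(ByVal data As Byte()) As Byte()
        Dim kept As New List(Of Byte)(data.Length)
        For Each b As Byte In data
            If b <> 13 AndAlso b <> 10 Then kept.Add(b)
        Next
        Return kept.ToArray()
    End Function

    Sub Main(ByVal args As String())
        ' args(0) = input path, args(1) = output path (hypothetical usage)
        Dim input As Byte() = File.ReadAllBytes(args(0))
        File.WriteAllBytes(args(1), RemoveCrLf(input))
    End Sub
End Module
```

This only covers the clean-up step. Parsing the segments still needs a string, so a decoding step with a suitable encoding would be needed somewhere.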
 
Carl

Try setting the code page for the encoder, for example:

Encoding enc = Encoding.GetEncoding(1252);
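
In VB.NET, the same idea might look like the sketch below, which uses one single-byte encoding for both the reader and the writer so that the byte 0xBA decodes to a character and encodes back to the same byte. The module name and the command-line paths are assumptions for illustration. It also assumes the .NET Framework; on newer runtimes code page 1252 may need an encoding provider to be registered first. Encoding.GetEncoding(28591) (ISO-8859-1) is an alternative that maps all 256 byte values one-to-one.

```vbnet
Imports Microsoft.VisualBasic
Imports System.IO
Imports System.Text

Module RoundTrip
    Sub Main(ByVal args As String())
        ' args(0) = input path, args(1) = output path (hypothetical usage)
        ' Use the SAME encoding for reading and writing.
        Dim enc As Encoding = Encoding.GetEncoding(1252)

        Dim text As String
        Using sr As New StreamReader(args(0), enc)
            text = sr.ReadToEnd()
        End Using

        ' Remove carriage returns and line feeds, as in the original program.
        text = text.Replace(vbCr, "").Replace(vbLf, "")

        ' False = overwrite rather than append.
        Using sw As New StreamWriter(args(1), False, enc)
            sw.Write(text)
        End Using
    End Sub
End Module
```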
 
Jon Skeet [C# MVP]

quincy said:
I need help. Please bear with this.

I have a program. It takes in files that are delimited.
The delimiters are declared in the file by looking at
fixed positions in the file (If you work with ANSI x12
files, you know what I mean). This normally isn't a
problem, but I'm getting a file that is using some odd
characters as delimiters.

Specifically, a Hex 'BA' is declared as a delimiter. I
read the file into memory using this...


=================================================
fs = New FileStream(InFile, System.IO.FileMode.Open,
System.IO.FileAccess.Read)
sr = New StreamReader(fs, System.Text.Encoding.UTF7)

InputString = sr.ReadToEnd
=================================================

As I said before, if you use UTF7, any bytes which were 0xba won't be
decoded properly, because UTF7 doesn't have any character which is
encoded as the byte 0xba.

Please read the full response I posted when you asked the same question
(without the encoding on the writing side) 5 days ago.
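
For reference, the two symptoms described in the question (two bytes in one case, a "?" in another) are consistent with the delimiter ending up in the string as the character U+00BA and then being written out as UTF-8 (the StreamWriter default when no encoding is given) or as ASCII. The sketch below shows what each encoder produces for that one character. The module and helper names are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        ' U+00BA is the character a Latin-1 or Windows-1252 decoder
        ' produces for the single byte 0xBA.
        Dim s As String = ChrW(&HBA)

        Show("ASCII", Encoding.ASCII.GetBytes(s))                  ' ASCII: 3F ("?")
        Show("UTF-8", Encoding.UTF8.GetBytes(s))                   ' UTF-8: C2-BA (two bytes)
        Show("Latin-1", Encoding.GetEncoding(28591).GetBytes(s))   ' Latin-1: BA
    End Sub

    ' Prints the encoding name followed by the bytes in hex.
    Sub Show(ByVal label As String, ByVal bytes As Byte())
        Console.WriteLine(label & ": " & BitConverter.ToString(bytes))
    End Sub
End Module
```

Only the single-byte encoding gives back the original byte, which is why the reader and the writer need to agree on one.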
 
