<Paul Bradshaw> wrote:
> I have a weird situation...
>
> I have a unicode text file that contains Latin-1 characters. This
> code displays fine in unicode text editors, and when converted to
> ANSI by a tool like EditPad, it looks correct as well. But when I use
> the code below to load it into a Text box, all the extended
> characters are stripped ... either the accents are missing, or the
> character is just plain gone.
>
> using (StreamReader text = File.OpenText(logFilePath))
> {
> string log = text.ReadToEnd();
> LogFileDisplay.Text = log;
> }
>
> I'm not sure what I'm missing here. Somewhere, something is
> transcoding the unicode text... and I want it to just leave it alone,
> and display it as-is. Any pointers?

Files are binary. Strings are text. When you convert between them, you
will *always* end up using an encoding somewhere, even if you think
it's just a "null" encoding.
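
To illustrate the point, here is a minimal sketch that decodes the same four bytes with two different encodings. The byte values and the class name EncodingDemo are made up for the example; they are not taken from the original file.

```csharp
using System;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        // "café" as Latin-1 bytes: the é is the single byte 0xE9.
        byte[] bytes = { 0x63, 0x61, 0x66, 0xE9 };

        // Same bytes, two different decodings.
        string asLatin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        string asUtf8 = Encoding.UTF8.GetString(bytes);

        Console.WriteLine(asLatin1 == "caf\u00E9"); // True
        Console.WriteLine(asUtf8 == "caf\u00E9");   // False: a lone 0xE9 is not valid UTF-8
    }
}
```

The bytes never change; only the encoding used to interpret them does, and that is where accented characters can go missing.
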

In the case of StreamReader, the default encoding is UTF-8. If you want
to use a different encoding, you need to specify that. It's a shame
that File.OpenText neither documents what encoding it will use, nor
allows you to specify it.

Use one of the StreamReader constructors instead, either passing in the
filename or opening a stream and then passing that in - along with the
appropriate encoding.
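
Something along these lines should do it. This is only a sketch: the helper name LogLoader.ReadAllText is hypothetical, and the choice of ISO-8859-1 in Main is an assumption for the demo. Use whichever encoding the file was actually written in (for example Encoding.Unicode if it really is UTF-16).

```csharp
using System;
using System.IO;
using System.Text;

class LogLoader
{
    // Reads the whole file, decoding its bytes with the encoding supplied
    // by the caller instead of the StreamReader default (UTF-8).
    public static string ReadAllText(string path, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(path, encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Assumption for the demo: the file holds Latin-1 (ISO-8859-1) text.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Write a small Latin-1 file to a temporary path, then read it back.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, latin1.GetBytes("caf\u00E9"));

        Console.WriteLine(ReadAllText(path, latin1) == "caf\u00E9"); // True

        File.Delete(path);
    }
}
```

In the original code that would become something like
LogFileDisplay.Text = LogLoader.ReadAllText(logFilePath, theRightEncoding);
where theRightEncoding is a placeholder for the encoding the log file really uses.
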
--
Jon Skeet - <(E-Mail Removed)>
http://www.pobox.com/~skeet
Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too