Marc Scheuner
Folks,
I have a text file which contains some XML. In its XML header, it
claims to be UTF-8 encoded - however, it really isn't; it's in an ANSI
/ Windows-1252 / ISO-8859-1 encoding.
Trouble is: when I deserialize objects from that file, all the German
umlauts and other special characters get dropped, some even cause
deserialization errors.
When I open the file in a text editor and save it as a REAL UTF-8
file, everything works just fine, as expected.
I then tried to make sure I open the text file with a StreamReader,
telling it to determine the encoding automatically, and I intended to
then store it as real UTF-8 in case it wasn't really in that encoding.
Trouble is: no matter what encoding the file is in, when I tell
StreamReader to auto-detect the encoding, it *ALWAYS* comes back with
UTF-8, and then my deserialization might fail...
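For reference, the StreamReader attempt described above looks roughly like
the sketch below. The class and method names are made up for illustration.
As far as the documented behavior goes, the detection flag only inspects a
byte order mark at the start of the stream, so a file without a BOM keeps
whatever encoding was passed in (UTF-8 here), regardless of its real content.

```csharp
using System;
using System.IO;
using System.Text;

static class DetectDemo
{
    // Hypothetical helper: returns the name of the encoding StreamReader
    // reports after it has looked at the start of the file.
    public static string ReportedEncoding(string path)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8,
                                             detectEncodingFromByteOrderMarks: true))
        {
            // Detection only runs once the reader has actually read some bytes.
            reader.Peek();

            // Without a BOM this stays at the encoding passed in above.
            return reader.CurrentEncoding.WebName;
        }
    }
}
```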
I even tried to use the Platform SDK function "IsTextUnicode" on the
first 256 bytes I read from the file using a FileStream - no luck
either, IsTextUnicode always returns false...
How on earth can I *reliably* detect the encoding of a text file in a
C# app?
Thanks for any hints, pointers, and most notably, CODE SAMPLES !! ;-)
Marc
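
One heuristic that fits this situation, as a minimal sketch rather than a
reliable detector: decode the bytes as strict UTF-8 and fall back to a
single-byte encoding only if that fails. The helper names are hypothetical,
and ISO-8859-1 as the fallback is an assumption; Windows-1252 could be
swapped in (on .NET Core that requires registering the code pages provider).

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: try strict UTF-8 first, fall back to ISO-8859-1.
    public static string DecodeGuessingEncoding(byte[] bytes)
    {
        // Skip a UTF-8 byte order mark if one is present.
        int offset = (bytes.Length >= 3 && bytes[0] == 0xEF &&
                      bytes[1] == 0xBB && bytes[2] == 0xBF) ? 3 : 0;

        // throwOnInvalidBytes: true makes invalid sequences raise an
        // exception instead of being silently replaced.
        var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false,
                                          throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(bytes, offset, bytes.Length - offset);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume a single-byte Western encoding.
            return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
        }
    }

    // Reads a file and returns its text using the guess above.
    public static string ReadTextGuessingEncoding(string path)
    {
        return DecodeGuessingEncoding(File.ReadAllBytes(path));
    }
}
```

The returned string can then be written back out as genuine UTF-8, for
example with File.WriteAllText(path, text, new UTF8Encoding(false)). This
works because an umlaut stored as a single byte followed by an ordinary
letter is not a valid UTF-8 sequence, but it remains a guess: it cannot
tell the various single-byte encodings apart.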