Jon Skeet [C# MVP] wrote:
> <(E-Mail Removed)> wrote:
>> Because I used the same format all the
>> way through the code and it's umlauted OK, but when it's writing (using
>> the default ctors) it's garbled. I wiped the file, changed it to
>> construct the SR with Encoding.Default, and it's saving the umlaut
>> charset now. How come the usual ctor with FileStream doesn't save umlaut
>> chars, then, as nowhere else did I specify any form of encoding until
>> this change to fix it?
>
> It *does* save umlaut characters, it's just that what you're using to
> read the file isn't recognising that it's UTF-8. You later say:
The bytes in the raw file carry no UTF-8 marker when you use Default. I
was bitten by the same thing: I had to state Encoding.Unicode explicitly.
According to the docs, Encoding.Default should work, but it didn't. It did
save things like Scandinavian characters to the file, but it couldn't read
them back correctly, even when I declared UTF-8 (or whatever) as the
encoding in the XML header. So I think he's right.
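A minimal sketch of that mismatch, in Python rather than .NET so it runs
anywhere. The assumptions are mine: cp1252 stands in for what
Encoding.Default resolves to on a Western European Windows machine, and the
sample string is made up. It shows that bytes written with an ANSI code
page round-trip only through the same code page, and that labelling them
UTF-8 afterwards does not help.

```python
# Hypothetical illustration: cp1252 is a stand-in for Encoding.Default on a
# Western European Windows machine; this is not the .NET API itself.
text = "æøå ü"

ansi_bytes = text.encode("cp1252")  # one byte per character
utf8_bytes = text.encode("utf-8")   # two bytes per accented character

# Reading back with the code page that wrote the bytes round-trips.
same_codec = ansi_bytes.decode("cp1252")

# Reading the ANSI bytes as UTF-8 fails: 0xE6 announces a multi-byte
# sequence that the following bytes do not complete.
try:
    ansi_bytes.decode("utf-8")
    utf8_read_ok = True
except UnicodeDecodeError:
    utf8_read_ok = False

# Reading UTF-8 bytes as ANSI raises no error; it silently garbles.
garbled = utf8_bytes.decode("cp1252")
```

The data is in the file in both cases; what differs is whether the reader
uses the encoding the writer used.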
>> Opening the text file in Notepad and selecting Save As shows it's ANSI,
>> not UTF-8
>
> That's just Notepad being confused.
> UTF-8 works fine, the framework works fine - but some of your tools may
> not be doing what you want them to.
If you specify Encoding.Unicode it will work; if you specify
Encoding.Default it will not in some cases. In both cases, the files do
NOT have an XML header declaring the encoding. The actual encoding is
marked in the bytes of the file itself (and probably in a meta-data
property in NTFS). That marker is not written, or not read back correctly,
when you use Default. I think that's the reason for his complaint, and I
have to admit he's right: I had exactly the same thing.
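For what it's worth, the in-file marker can be checked directly. A rough
sketch, again in Python as a stand-in for the .NET encodings: leading_bom
is a hypothetical helper, not a framework call, and it assumes the marker
in question is the byte order mark. As far as I know, Encoding.Unicode
emits FF FE as its preamble while an ANSI code page emits nothing, which
would match the behaviour described above.

```python
import codecs

def leading_bom(data: bytes):
    """Return the name of the byte order mark at the start of data, or None.

    Hypothetical helper; UTF-32 marks are ignored for brevity.
    """
    known = [
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    ]
    for bom, name in known:
        if data.startswith(bom):
            return name
    return None

text = "æøå"  # made-up sample data

# UTF-16 LE with its mark written explicitly (bytes FF FE first).
with_utf16_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# "utf-8-sig" is Python's name for UTF-8 with a leading EF BB BF.
with_utf8_bom = text.encode("utf-8-sig")
# Neither plain UTF-8 nor an ANSI code page writes any mark.
plain_utf8 = text.encode("utf-8")
plain_ansi = text.encode("cp1252")

results = [leading_bom(b) for b in
           (with_utf16_bom, with_utf8_bom, plain_utf8, plain_ansi)]
```

Only the first two byte strings identify themselves; for the last two, a
reader has to be told the encoding or guess it.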
Frans
--
Get LLBLGen Pro, the new O/R mapper for .NET:
http://www.llblgen.com