BUG in StreamWriter

Jon Skeet [C# MVP]

It's not an XML file. The XML file is only used as the input string; the actual
output that's being corrupted is a normal text file.

The program had NO reference to encoding (thereby using the default .NET
mechanism) and that was corrupting the output using StreamWriter with
FileStream. The solution to this was to construct the StreamWriter with
Encoding.Default, yet this was my actual issue: why is this called the default
when in fact it's not?

It's not the default for StreamWriter, it's the default encoding for
the Windows box you're running it on.

It was confusing to me, and why can't the default .NET
mechanism (not specifying encoding) handle umlaut chars correctly (if it's
UTF8 as you say)?

It can. You just can't read it properly.
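
To make the distinction concrete, here is a minimal sketch that prints the encoding a StreamWriter picks when none is given next to Encoding.Default. The class and method names are made up for illustration. Note the assumption about runtimes: on the .NET Framework, Encoding.Default is the machine's ANSI code page, while on .NET Core and later it always reports UTF-8, so the second line of output depends on where this is run.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original thread.
public static class DefaultEncodingDemo
{
    // Returns the name of the encoding a StreamWriter uses when the
    // caller does not specify one.
    public static string WriterDefaultName()
    {
        using (var sw = new StreamWriter(new MemoryStream()))
        {
            return sw.Encoding.WebName;
        }
    }

    public static void Main()
    {
        // StreamWriter's own default: UTF-8.
        Console.WriteLine("StreamWriter default: " + WriterDefaultName());

        // The system default: platform and runtime dependent.
        Console.WriteLine("Encoding.Default:     " + Encoding.Default.WebName);
    }
}
```

If the two lines differ, any tool that assumes the system code page will misread a file written with the StreamWriter default, even though the file itself is valid.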
 
Guest

Ok, Notepad shows it ok, and so does the VS editor.

Wintail and INTERNET EXPLORER (which is surprising) do not.
 
Guest

Internet Explorer displays it as Ã¤Ã¥Ã¶

 
Jon Skeet [C# MVP]

Actually, when specifying Encoding.Default, Wintail and Notepad correctly show
these characters.

Yes, because they're assuming the Windows default encoding.

It's the actual C# save that doesn't.

<sigh>

How many times do I need to explain it? C# is working fine - it's just
that your tools don't understand UTF-8. Find a text editor which lets
you pick a UTF-8 encoding, and load the file - you'll see the
characters just fine.
 
Guest

Nothing like plugging your own site.

<sigh> If you get tired of explaining, nobody forces you to reply to each and
every post out there; no need to step on others to make your ego bigger. I've
seen you post before, you do the same every time.


Internet Explorer displays it as Ã¤Ã¥Ã¶

Internet Explorer is probably assuming the Windows default encoding as
well.

How well do you actually understand encodings? You might like to read
http://www.pobox.com/~skeet/csharp/unicode.html for more information.
 
Jon Skeet [C# MVP]

Nothing like plugging your own site.

I wrote that page (and various others) to save me from explaining
things in detail repeatedly. It's not like I get money from them or
anything - they're just meant to be helpful.

<sigh> If you get tired of explaining, nobody forces you to reply to each and
every post out there; no need to step on others to make your ego bigger. I've
seen you post before, you do the same every time.

I don't reply to each and every post out there, but it *is*
disconcerting when people clearly don't really read answers. This
thread is pointless - no-one's really saying what .NET is supposedly
doing wrong except in terms of what Notepad/Wintail etc can cope with.
I've explained what's going on numerous times now...
 
Marco Martin

Discussion,

if all you want to do is complain about the people who are here to help you,
perhaps you would feel more at home at this forum instead:
alt.complainers.bitch-n-moan

Others here use this forum for help, and it is invaluable for their jobs. I for
one find you quite distasteful, to say the least, and am asking nicely for you
to be, at the very least, respectful to the people who take the time to
answer your questions.

Marco.
 
Frans Bouma

What do you mean by this, exactly?

that I had the same XML data in the file, one written away with
Encoding.Default and the other with Encoding.Unicode. Both looked the same
in Notepad, and I had NO encoding specification. However, one couldn't be loaded
due to an 'ae' character, while the other one could be loaded (or better: be
serialized back). I found this very odd, because there was NO encoding
specifier in the XML, so the encoding has to be stored somewhere else.

I really don't think so - please provide a complete example stating
*exactly* what you expected, and what you got.

write:
XmlTextWriter writer = new XmlTextWriter(
    Path.Combine(Application.StartupPath, ApplicationConstants.PreferencesFilename),
    System.Text.Encoding.Unicode);

try
{
    writer.WriteStartElement("Preferences");
    writer.WriteStartElement("preferedProjectFolder");
    writer.WriteAttributeString("value", _preferences.PreferedProjectFolder);
    writer.WriteEndElement();
    // etc.


THIS works. (the Unicode encoding).
However when I change that to Default, it doesn't. I even added UTF-8
encoding specification to the XML file, no luck. Now, the docs state that
the codepage of the local system is used with 'default'. I did set the
codepage of my system to all kinds of wicked pages, but also no luck.
Unicode solved it (obviously). However, 'Default' THUS doesn't work for
characters other than plain ASCII.

read:
XmlTextReader reader = new XmlTextReader(
    Path.Combine(Application.StartupPath, ApplicationConstants.PreferencesFilename));

try
{
    // Read the nodes and store the values as they are found in the
    // preferences object.
    while(reader.Read())
    {
        switch(reader.Name)
        {
            case "preferedProjectFolder":
                // Crash on the next line: character could not be loaded.
                // Character was a Scandinavian character 'ae' (combined to 1 char).
                _preferences.PreferedProjectFolder = reader.GetAttribute("value");
                break;
            // etc..

Notepad isn't going to look at the XML header anyway, of course. I
don't see what the XML header has to do with anything, here, to be
honest. What relevance do you think it has to how a file is opened in
notepad?

I wasn't talking about notepad :) I write an XML file and read it
back the next time the app starts. It crashed then (it didn't while saving
the XML). However because it is XML, I thought an encoding specification
would be better in the XML header. But if you add that (UTF-8) and you've
saved with 'Default' the file can't be opened with the XmlTextReader
because of some byte encoding issue. (IIRC).

The encoding isn't "in" the bytes of the file - it's perfectly possible
to have a file which means two different things when considered as
being in two different encodings. How would it be in the meta-data
anyway? As far as the file system is concerned, it's just a stream of
bytes.

that's what I was thinking too, however the errors I had made me
draw that conclusion. However I can be wrong, what I DO know is that
characters in extended ascii can't be handled with Encoding.Default.

FB
 
Cor

Hi Jon,

A question to you.
I was seeing (not following) your impossible struggle to do this right.
I completely agree with the message from Marco Martin.

But this whole thread is cross-posted.
And although you did your best, and maybe it was useful, because of the
reactions it became trash.

Maybe next time you can delete the newsgroups you are not answering
from when these are the kind of answers you get.

:))

Cor
 
Jon Skeet [C# MVP]

Frans Bouma said:
that I had the same XML data in the file, one written away with
Encoding.Default and the other with Encoding.Unicode. Both looked the same
in Notepad, and I had NO encoding specification. However, one couldn't be loaded
due to an 'ae' character, while the other one could be loaded (or better: be
serialized back). I found this very odd, because there was NO encoding
specifier in the XML, so the encoding has to be stored somewhere else.

It's not odd at all - it would have been preferable to have the
encoding specifier in the XML, but Notepad wouldn't have used it
anyway.

In fact, it seems that Notepad on XP *does* read UTF-8 files. If you use
the following code:

using System;
using System.IO;
using System.Text;

public class Test
{
    static void Main()
    {
        using (StreamWriter sw = new StreamWriter ("test.txt"))
        {
            sw.WriteLine ("\u00e9");
        }
    }
}

to generate a file test.txt, which has contents 0xc3 0xa9 0x0d 0x0a,
then if you open it in Notepad with encoding UTF-8, it correctly
displays an e-acute. If you open it in Notepad with encoding ANSI, it
displays Ã© (again, correctly).
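
The same effect can be reproduced without Notepad. The sketch below encodes an e-acute as UTF-8 and then decodes those bytes with a single-byte encoding. It uses ISO-8859-1 as a stand-in for the Windows ANSI code page (an assumption made so the example runs on any runtime; the two agree on the bytes involved here). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original thread.
public static class MojibakeDemo
{
    // Encodes text as UTF-8 and formats the bytes as hex.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Encodes text as UTF-8, then decodes the bytes as ISO-8859-1,
    // which is what a tool assuming a single-byte code page would do.
    public static string MisreadAsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8Hex("\u00e9"));                 // C3-A9
        Console.WriteLine(MisreadAsLatin1("\u00e9").Length);  // 2
    }
}
```

The misread string is two characters, U+00C3 followed by U+00A9, which is exactly the "Ã©" that a tool shows when it treats the UTF-8 bytes as a single-byte code page. The bytes on disk are fine; only the interpretation is wrong.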

Now, if your XML didn't include an encoding specifier, the XML parser
should have assumed UTF-8. If you used Encoding.Default (instead of
UTF-8) then you would indeed get an error if the file was not a valid
UTF-8 file. From the XML specification:

<quote>
In the absence of information provided by an external transport
protocol (e.g. HTTP or MIME), it is an error for an entity including an
encoding declaration to be presented to the XML processor in an
encoding other than that named in the declaration, or for an entity
which begins with neither a Byte Order Mark nor an encoding declaration
to use an encoding other than UTF-8.
</quote>

When you used the Unicode encoding, I suspect you got a byte-order mark
which allowed the parser to tell that it was using that encoding.
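
As a rough check of that suspicion, the sketch below prints the byte-order mark (the "preamble") that a few encodings would put at the start of a file. ISO-8859-1 again stands in for an ANSI code page, which is an assumption for portability; the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of the original thread.
public static class PreambleDemo
{
    // Formats the byte-order mark an encoding emits, or "(none)".
    public static string Bom(Encoding encoding)
    {
        byte[] preamble = encoding.GetPreamble();
        return preamble.Length == 0 ? "(none)" : BitConverter.ToString(preamble);
    }

    public static void Main()
    {
        // UTF-16 little-endian announces itself with FF FE.
        Console.WriteLine("Encoding.Unicode:  " + Bom(Encoding.Unicode));
        // UTF-8 created without a BOM has no marker.
        Console.WriteLine("UTF-8 without BOM: " + Bom(new UTF8Encoding(false)));
        // Single-byte code pages have no marker either.
        Console.WriteLine("ISO-8859-1:        " + Bom(Encoding.GetEncoding("iso-8859-1")));
    }
}
```

A file with no marker and no encoding declaration gives the XML parser nothing to go on, so it falls back to UTF-8 as the quoted part of the specification requires.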
write:
XmlTextWriter writer = new XmlTextWriter(
    Path.Combine(Application.StartupPath, ApplicationConstants.PreferencesFilename),
    System.Text.Encoding.Unicode);

try
{
    writer.WriteStartElement("Preferences");
    writer.WriteStartElement("preferedProjectFolder");
    writer.WriteAttributeString("value", _preferences.PreferedProjectFolder);
    writer.WriteEndElement();
    // etc.


THIS works. (the Unicode encoding).
However when I change that to Default, it doesn't. I even added UTF-8
encoding specification to the XML file, no luck.

No, it wouldn't - for the reasons given above.

Now, the docs state that
the codepage of the local system is used with 'default'. I did set the
codepage of my system to all kinds of wicked pages, but also no luck.
Unicode solved it (obviously). However, 'Default' THUS doesn't work for
characters other than plain ASCII.

It does, but not when you've told the XML parser to expect UTF-8 and
then don't give it UTF-8!

I wasn't talking about notepad :) I write an XML file and read it
back the next time the app starts. It crashed then (it didn't while saving
the XML). However because it is XML, I thought an encoding specification
would be better in the XML header. But if you add that (UTF-8) and you've
saved with 'Default' the file can't be opened with the XmlTextReader
because of some byte encoding issue. (IIRC).

Yup, that makes perfect sense, in the same way that if you tell someone
that you're going to talk English and then you talk French they may
well get confused. You've got to actually use the encoding you specify
in the XML header.

that's what I was thinking too, however the errors I had made me
draw that conclusion. However I can be wrong, what I DO know is that
characters in extended ascii can't be handled with Encoding.Default.

a) There's no such thing as "extended ASCII". There are various
encodings which are 8-bit extensions to ASCII, but they are all
different, and there's no one true "extended ASCII".
b) Characters within an ANSI code-page *can* be used if you correctly
specify the character encoding in the XML header. I suspect that an
encoding of "windows-1252" would have worked. I haven't tried it
and I wouldn't recommend it though - I'd just stick to UTF-8.
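
For anyone who wants to try point (b), here is a minimal sketch of a round trip where the declared encoding and the actual bytes agree. It uses XmlWriter and XmlReader rather than the XmlTextWriter and XmlTextReader from the posted code, and ISO-8859-1 rather than "windows-1252" so that it runs without extra code-page setup; both are substitutions made for this example. The class name and the "folder" element name are made up.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, not part of the original thread.
public static class XmlEncodingDemo
{
    // Writes a small document in the given encoding. The XML
    // declaration names the same encoding the bytes are written in.
    public static byte[] Write(string value, Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                writer.WriteStartDocument();
                writer.WriteStartElement("Preferences");
                writer.WriteStartElement("folder");
                writer.WriteAttributeString("value", value);
                writer.WriteEndElement();
                writer.WriteEndElement();
                writer.WriteEndDocument();
            }
            return ms.ToArray();
        }
    }

    // Reads the attribute back; the reader picks the encoding up
    // from the declaration (or BOM) at the start of the data.
    public static string Read(byte[] xml)
    {
        using (XmlReader reader = XmlReader.Create(new MemoryStream(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "folder")
                {
                    return reader.GetAttribute("value");
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] xml = Write("C:\\projects\\\u00e6", latin1);
        Console.WriteLine(Read(xml) == "C:\\projects\\\u00e6");
    }
}
```

The point of the sketch is only that writer and header stay in step because both come from the same Encoding object. Hand-editing the header to say UTF-8 while the bytes are in a code page recreates the failure described earlier in the thread.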
 
Frans Bouma

Ok Thanks Jon, for clearing that up. :)

Frans

 