BUG in StreamWriter

Guest

Hi,

When constructing a StreamWriter like this:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f);

and then writing out the letters åäö, they come out as garbage.

BUT

if we construct the StreamWriter like this:
FileStream f = new FileStream(..);
StreamWriter s = new StreamWriter(f, System.Text.Encoding.Default);

it's OK. So why is the default not the actual DEFAULT, as it says on the ctor?

It seems to me that either the ctor is wrong or the name .Default is misleading.

Thanks.
 
Jon Skeet [C# MVP]

.Default is *slightly* misleading, although all the information is in
the documentation. The docs for new StreamWriter(Stream) say:

<quote>
This constructor creates a StreamWriter with UTF-8 encoding whose
GetPreamble method returns an empty byte array. The BaseStream property
is initialized using the stream parameter.
</quote>

However, the brief summary saying that it uses "the default" encoding
is misleading (I'll mail MS about it).

.Default means the default *platform* encoding - but pretty much
everything in .NET itself uses UTF-8 by default.
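
To make the quoted documentation concrete, here is a minimal sketch that compares the bytes produced by the two constructors. The class and method names (PreambleDemo, Write) are made up for illustration, and a MemoryStream stands in for the FileStream from the original post. The letters are written as \u escapes so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Writes text through a StreamWriter and returns the raw bytes produced.
    // Passing null selects the StreamWriter(Stream) constructor,
    // i.e. the "no encoding specified" case discussed in this thread.
    public static byte[] Write(string text, Encoding encoding)
    {
        MemoryStream stream = new MemoryStream();
        StreamWriter writer = encoding == null
            ? new StreamWriter(stream)
            : new StreamWriter(stream, encoding);
        writer.Write(text);
        writer.Flush();
        return stream.ToArray();
    }

    public static void Main()
    {
        string text = "\u00E5\u00E4\u00F6"; // the letters åäö

        // No encoding given: UTF-8 with an empty preamble.
        Console.WriteLine(BitConverter.ToString(Write(text, null)));
        // prints C3-A5-C3-A4-C3-B6

        // Encoding.UTF8 given explicitly: the same bytes after a 3-byte preamble.
        Console.WriteLine(BitConverter.ToString(Write(text, Encoding.UTF8)));
        // prints EF-BB-BF-C3-A5-C3-A4-C3-B6
    }
}
```

In both cases the letters are stored as valid UTF-8. The only difference is the preamble (byte order mark), which some tools rely on to recognise the encoding.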
 
Jon Skeet [C# MVP]

> So UTF8 cant handle umlaut characters it seems then

Yes it can. It's just that whatever you were using to read the file
presumably wasn't aware that it was encoded in UTF-8.
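
The effect can be reproduced without any file or viewer. The sketch below encodes the letters as UTF-8 and then decodes the bytes the way a tool assuming a single-byte code page would. ISO-8859-1 is used here as a stand-in for the platform's ANSI code page, which is an assumption; MojibakeDemo and ReadUtf8As are illustrative names only.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes those bytes with another encoding,
    // as a viewer that guesses the wrong encoding would.
    public static string ReadUtf8As(string text, Encoding wrongEncoding)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return wrongEncoding.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string original = "\u00E5\u00E4\u00F6"; // the letters åäö
        // Assumption: ISO-8859-1 stands in for the platform's ANSI code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Each letter takes two bytes in UTF-8, so six characters come back.
        string garbled = ReadUtf8As(original, latin1);
        Console.WriteLine(garbled.Length); // prints 6

        // Decoding the same bytes as UTF-8 gives the original text back.
        string restored = ReadUtf8As(original, Encoding.UTF8);
        Console.WriteLine(restored == original); // prints True
    }
}
```

The garbled string is "Ã¥Ã¤Ã¶": the data is intact, and only the interpretation of the bytes is wrong.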
 
Guest

So UTF-8 can't handle umlaut characters, it seems, then.

 
Guest

According to the Windows file system, it's ASCII :D

I thought that was standard enough :D I used the same format all the
way through the code and the umlauts are OK, but when it's written out
(using the default ctors) it's garbled. I wiped the file and changed the
code to construct the StreamWriter with Encoding.Default, and it's saving
the umlaut characters now. How come the usual ctor with a FileStream
doesn't save umlaut chars, then? Nowhere else did I specify any form of
encoding until this change to fix it.
 
Guest

<?xml version="1.0" encoding="utf-8"?>

was even defined in the XML file that I got the string from. It's even
stored in the String type correctly; it only goes wrong when writing to
the file.

Normal calls made WITHOUT encoding parameters did NOT save the umlaut
chars.
 
Guest

Opening the text file in Notepad and selecting Save As shows it's ANSI, not
UTF-8. How come the file created when appending isn't stored as UTF-8,
then, if that's supposed to be the default, as you state?

That would cause the mismatch, if the file is created as ANSI and all the
methods default to UTF-8.

 
Jon Skeet [C# MVP]

> According to windows file system it says ASCII :D

What do you mean by "according to the Windows file system"?

> I thought that was standard enough :D

ASCII doesn't have any characters with accents.

> Because I used the same format all the
> way thru the code and its umlauted ok but when its writing (using the
> default ctors) its garbled. I wiped the file, changed it to construct the
> SR with Encoding.Default and its saving the umlaut charset now, how come the
> usual ctor with FileStream doesnt save umlaut chars then as nowhere else
> did I specify any form of encoding until this change to fix it.

It *does* save umlaut characters, it's just that what you're using to
read the file isn't recognising that it's UTF-8. You later say:

> Opening the text file in notepad and selecting save as shows its ANSI,
> not UTF8

That's just notepad being confused.

UTF-8 works fine, the framework works fine - but some of your tools may
not be doing what you want them to.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information about encodings.
 
Guest

You're right, because Notepad isn't standard at all for reading text files.
Nobody in their right mind uses it or WinTail etc. to view logs. No no, not
at all :D

It's fine when I specify Encoding.Default on the StreamWriter, yet it's NOT
when I don't specify ANY encoding anywhere in the app.
 
Frans Bouma

Jon Skeet said:
> It *does* save umlaut characters, it's just that what you're using to
> read the file isn't recognising that it's UTF-8. You later say:

The byte specification in the actual raw data is missing the UTF-8
specification when you use Default. I was bitten by the same thing. I had
to explicitly state Encoding.Unicode. When I used Encoding.Default, it
should have worked according to the docs, but it didn't. It did save things
like Scandinavian characters to the file, but it couldn't read them back
correctly, even if I stated UTF-8 as the encoding or whatever in the XML
header. So I think he's right.

> That's just notepad being confused.
> UTF-8 works fine, the framework works fine - but some of your tools may
> not be doing what you want them to.

If you specify Encoding.Unicode, it will work; if you specify
Encoding.Default, it will not in some cases. In both cases, the files do
NOT have an XML heading explaining the encoding. The actual encoding is in
the bytes in the file (and probably in a meta-data property in NTFS). That
specification is not read back or written correctly when you use
Default. I think that's the reason for his complaint and I have to admit,
he's right, I had exactly the same thing.

Frans
 
Jon Skeet [C# MVP]

> You're right, because notepad isnt standard at all for reading text files.
> Nobody in their right mind uses it or Wintail etc to view logs. No no not
> at all :D

That doesn't mean that notepad will automatically detect UTF-8 encoded
files. (I don't know whether or not it can cope with UTF-8 at all.)

> Its fine when i specify Encoding.Default on StreamWriter yet its NOT when I
> dont specify ANY encoding anywhere in the app.

Yes, as you keep saying. That's because Encoding.Default is the default
ANSI encoding for the platform, but the default if you don't specify
any encoding is UTF-8, as I keep saying.

We seem to be going round and round here - which part are you not
understanding?
 
Jon Skeet [C# MVP]

Frans Bouma said:
> The byte specification in the actual raw data misses UTF-8
> specification when you use Default.

What do you mean by this, exactly?

> I was bitten by the same thing. I had
> to explicitly state Encoding.Unicode. When I used Encoding.Default, it
> should work according to the docs, but it didn't. It did save stuff like
> scandinavian characters away in the file, but it couldn't read it back
> correctly, even if I stated UTF-8 as encoding or whatever in the xml
> header. So I think he's right.

I really don't think so - please provide a complete example stating
*exactly* what you expected, and what you got.

> If you specify Encoding.Unicode, it will work, if you specify
> Encoding.Default it will not in some cases.

That's because notepad can cope with UCS-2 (Unicode) encoding but not
UTF-8.

> In both cases, the files do
> NOT have an XML heading explaining the encoding.

Notepad isn't going to look at the XML header anyway, of course. I
don't see what the XML header has to do with anything, here, to be
honest. What relevance do you think it has to how a file is opened in
notepad?

> The actual encoding is in
> the bytes in the file (and probably in a meta-data property in NTFS).

The encoding isn't "in" the bytes of the file - it's perfectly possible
to have a file which means two different things when considered as
being in two different encodings. How would it be in the meta-data
anyway? As far as the file system is concerned, it's just a stream of
bytes.

> That specification is not read back / or written correctly when you use
> Default. I think that's the reason for his complaint and I have to admit,
> he's right, I had exactly the same thing.

I don't think he's right at all. When you say "Default" do you mean
"the default encoding if you don't specify one" or "Encoding.Default"?
I believe both work exactly as intended - but I suspect you're missing
something about the intention.
 
Jon Skeet [C# MVP]

Anders Borum said:
> I've experienced similar problems too using the default encoding.

What problems, exactly? People are being very woolly about what they're
seeing and how they're testing it.

To recap:

o If you don't specify an encoding, you'll get UTF-8
o If you specify Encoding.Default, you'll get the platform's default
ANSI encoding (eg Cp1252 on a Western European Windows system)
o Notepad doesn't understand UTF-8 files, so if you open a UTF-8 file
in it you'll see garbage. This doesn't mean it's not a perfectly
valid UTF-8 file, it just means Notepad is pretty poor.

Now, given the above, what exactly do you think is wrong?
 
Guest

Originally I did NOT specify any encoding anywhere, and the umlaut chars
åäö were OK everywhere except on the file save.

When I specify Encoding.Default on the StreamWriter with a fresh file,
everything is OK. If .NET defaults to UTF-8 when I specify NO encoding,
how come it can't save the chars, then?
 
Guest

It affects WinTail also, www.wintail.com

Jon Skeet said:
> That doesn't mean that notepad will automatically detect UTF-8 encoded
> files. (I don't know whether or not it can cope with UTF-8 at all.)
 
Guest

It's not an XML file. The XML file is only used as the source of the input
string; the actual output that's being corrupted is a normal text file.

The program had NO reference to encoding (thereby using the default .NET
mechanism) and that was corrupting the output when using StreamWriter with
FileStream. The solution was to construct the StreamWriter with
Encoding.Default, yet this was my actual issue: why is this called the
default when in fact it's not? It was confusing to me. And why can't the
default .NET mechanism (not specifying an encoding) handle umlaut chars
correctly, if it's UTF-8 as you say?
 
Jon Skeet [C# MVP]

> originally I did NOT specify any encoding anywhere and the umlaut åäö chars
> where ok everywhere except on the file save.
>
> When I specify Encoding.Default on the StreamWriter with a fresh file ,
> everything is ok. If .net defaults to UTF8 if i specify NO encoding, how
> come it cant save the chars then?

It can. It's just that the tool you're using to check for them can't
read them.
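
One way to check this without relying on any viewer is to read the file back in code with a reader that is told to use UTF-8. The following is a rough sketch under stated assumptions: RoundTripDemo and WriteThenRead are hypothetical names, and the temporary file is only a scratch location, not anything from the original program.

```csharp
using System;
using System.IO;
using System.Text;

public static class RoundTripDemo
{
    // Writes text with the StreamWriter(Stream) constructor (no encoding given),
    // then reads the file back using the encoding supplied by the caller.
    public static string WriteThenRead(string path, string text, Encoding readEncoding)
    {
        using (FileStream output = new FileStream(path, FileMode.Create))
        using (StreamWriter writer = new StreamWriter(output))
        {
            writer.Write(text);
        }

        using (FileStream input = new FileStream(path, FileMode.Open))
        using (StreamReader reader = new StreamReader(input, readEncoding))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        string path = Path.GetTempFileName(); // scratch file, for illustration only
        string text = "\u00E5\u00E4\u00F6";   // the letters åäö

        // Reader and writer agree on UTF-8, so the text survives the round trip.
        Console.WriteLine(WriteThenRead(path, text, Encoding.UTF8) == text); // prints True

        File.Delete(path);
    }
}
```

If the writer and the reader agree on the encoding, the characters survive. The garbage only appears when the reading side assumes a different encoding from the one used to write.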
 
