System.IO.StreamWriter uses two bytes for ASCII characters with UT

Guest

I am creating a text file with a StreamWriter set to UTF8 encoding like in
the following example:

Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
    writer.Write("Hello World")
End Using

It appears that ASCII characters are written using two bytes. From what I
have read, UTF-8 is a variable-length character encoding in which standard
ASCII characters are written using only one byte.

Why is it writing two bytes? Is there a way to change this behavior?

Thanks,
Joe
 
Carl Daniel [VC++ MVP]

Joe said:
| I am creating a text file with a StreamWriter set to UTF8 encoding like in
| the following example:
|
| Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
|     writer.Write("Hello World")
| End Using
|
| It appears that ASCII characters are written using two bytes. From what I
| have read UTF-8 is a variable length character encoding format and standard
| ASCII characters are written using only one byte.
|
| Why is it writing two bytes? Is there a way to change this behavior?

How are you determining that it's writing two bytes? It shouldn't, and it
doesn't when I just tried it.

Note that the file will have a BOM (byte order mark) prepended: 0xEF, 0xBB,
0xBF. After that, all ASCII characters are encoded as a single byte each,
since their code points are all below 0x80.
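
For anyone who wants to check this without relying on an editor's hex view, here is a minimal C# sketch. It only asks the encoding itself what bytes it produces; the class name `Utf8Check` and the helper `Hex` are illustrative names, not anything from the original posts.

```csharp
using System;
using System.Text;

static class Utf8Check
{
    // Illustrative helper: formats bytes as space-separated hex, e.g. "EF BB BF".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        // The preamble (BOM) that Encoding.UTF8 reports.
        Console.WriteLine(Hex(Encoding.UTF8.GetPreamble()));
        // EF BB BF

        // Each ASCII character is encoded as exactly one byte.
        byte[] body = Encoding.UTF8.GetBytes("Hello World");
        Console.WriteLine(Hex(body));
        // 48 65 6C 6C 6F 20 57 6F 72 6C 64

        Console.WriteLine(body.Length); // 11, one byte per character
    }
}
```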

-cd
 
William Stacey [MVP]

I don't see that behavior. It does add a UTF-8 preamble, but the ASCII is
ASCII.
using (StreamWriter sw = new StreamWriter(Console.OpenStandardOutput(), Encoding.UTF8))
{
    sw.Write("Hello World");
}



Output:

Hello World


--
William Stacey [MVP]

| I am creating a text file with a StreamWriter set to UTF8 encoding like in
| the following example:
|
| Using writer As New IO.StreamWriter("C:\temp\HelloWorld.txt", False, System.Text.Encoding.UTF8)
|     writer.Write("Hello World")
| End Using
|
| It appears that ASCII characters are written using two bytes. From what I
| have read UTF-8 is a variable length character encoding format and standard
| ASCII characters are written using only one byte.
|
| Why is it writing two bytes? Is there a way to change this behavior?
|
| Thanks,
| Joe
 
Guest

I am using UltraEdit to view the HEX version of the text file. If I execute:

Using writer As New IO.StreamWriter("C:\temp\HelloWorldUtf8.txt", False, System.Text.Encoding.UTF8)
    writer.Write("Hello World")
End Using

Then I get the output:
FF FE FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00

If I execute this command:

Using writer As New IO.StreamWriter("C:\temp\HelloWorldASCII.txt", False, System.Text.Encoding.ASCII)
    writer.Write("Hello World")
End Using

Then I get this output:

48 65 6C 6C 6F 20 57 6F 72 6C 64

What tool are you using to verify the actual bytes written?
 
Guest

Ah Ha. You got me curious, so I downloaded TextPad and used that to examine
the binary. You are correct, it is writing it properly. This must be a
quirk with UltraEdit. Other than the BOM, the ASCII and UTF8 files are
exactly the same.
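
If the BOM itself is ever unwanted, one option is to pass a `UTF8Encoding` constructed with `False`/`false`, which has an empty preamble. The following is a rough C# sketch rather than a recommendation; it writes to a `MemoryStream` instead of a file so the byte counts can be compared directly, and `BomDemo` and `Write` are illustrative names.

```csharp
using System;
using System.IO;
using System.Text;

static class BomDemo
{
    // Illustrative helper: writes text through a StreamWriter and returns the raw bytes.
    public static byte[] Write(string text, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            }
            // ToArray still works after the writer has closed the stream.
            return stream.ToArray();
        }
    }

    static void Main()
    {
        // Encoding.UTF8 emits the 3-byte BOM followed by 11 single-byte characters.
        Console.WriteLine(Write("Hello World", Encoding.UTF8).Length);            // 14

        // UTF8Encoding(false) has no preamble, so only the 11 text bytes are written.
        Console.WriteLine(Write("Hello World", new UTF8Encoding(false)).Length);  // 11

        // ASCII has no preamble either, and produces the same 11 bytes.
        Console.WriteLine(Write("Hello World", Encoding.ASCII).Length);           // 11
    }
}
```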

Thanks for helping me with my silly mistake!!!

-Joe
 
Carl Daniel [VC++ MVP]

Joe said:
| I am using UltraEdit to view the HEX version of the text file. If I
| execute:
|
| Using writer As New IO.StreamWriter("C:\temp\HelloWorldUtf8.txt", False, System.Text.Encoding.UTF8)
|     writer.Write("Hello World")
| End Using
|
| Then I get the output:
| FF FE FF FE 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C
| 00 64 00
|
| If I execute this command:
|
| Using writer As New IO.StreamWriter("C:\temp\HelloWorldASCII.txt", False, System.Text.Encoding.ASCII)
|     writer.Write("Hello World")
| End Using
|
| Then I get this output:
|
| 48 65 6C 6C 6F 20 57 6F 72 6C 64
|
| What tool are you using to verify the actual bytes written?

I see you figured it out. Odd that UltraEdit would do that - apparently it
was converting the file to UCS-2 and then showing you the binary version of
that.
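
The `48 00 65 00 ...` pattern in that dump is what little-endian UTF-16 looks like, which fits the UCS-2 explanation (for these characters the two are byte-for-byte identical). A small C# sketch to illustrate, using .NET's `Encoding.Unicode` (UTF-16 LE); `Utf16View` and `Hex` are illustrative names, and this does not attempt to reproduce what UltraEdit does internally.

```csharp
using System;
using System.Text;

static class Utf16View
{
    // Illustrative helper: formats bytes as space-separated hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    static void Main()
    {
        // The UTF-16 little-endian byte order mark.
        Console.WriteLine(Hex(Encoding.Unicode.GetPreamble()));
        // FF FE

        // Every ASCII character becomes its code followed by a zero byte.
        Console.WriteLine(Hex(Encoding.Unicode.GetBytes("Hello World")));
        // 48 00 65 00 6C 00 6C 00 6F 00 20 00 57 00 6F 00 72 00 6C 00 64 00
    }
}
```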

Incidentally, I was using Visual Studio to examine the file as binary.
(Click the little drop-down arrow on the Open button when opening a file in
VS and choose "Binary Editor" to open a text file as binary).

-cd
 
