UTF-8 encoding in AJAX web application.

Steven Cheng[MSFT]

Hello Allan,

Did the explanation in my last reply help? As I mentioned there, one
important point is that when you load any text stream into .NET Framework
code (string or char), .NET automatically handles it in memory as a stream
of two-byte wide characters (UTF-16 encoding).

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


This posting is provided "AS IS" with no warranties, and confers no rights.
 
Jon Skeet [C# MVP]

Allan Ebdrup said:
I don't get why you can't say a System.String is UTF-8 encoded, if the bytes
in the string have to be read with a UTF-8 encoding to make sense?

The string is logically composed of characters, not bytes.

When you read your data originally, it's converting it from the binary
form (UTF-8 encoded data) into the text form (which happens to be
stored UTF-16 encoded, but that's irrelevant).

Allan Ebdrup said:
Where does my UTF-8 encoded XML get translated to UTF-16?

When you first load the XML document.

Jon
 
Allan Ebdrup

Jon Skeet said:
The string is logically composed of characters, not bytes.

Yes, and if you print the string it would be printed incorrectly because
you would be assuming a UTF-16 encoding when the encoding is in fact UTF-8.
Granted, it would not be what you want and .Length would probably return the
wrong result. It would clearly be an error, but the string I pass to my
web method is UTF-8 encoded, so the string I pass to the XmlDocument should
have encoding problems, right?

Jon Skeet said:
When you read your data originally, it's converting it from the binary
form (UTF-8 encoded data) into the text form (which happens to be
stored UTF-16 encoded, but that's irrelevant).

Jon Skeet said:
When you first load the XML document.

How can this be, when I don't specify the encoding in the XML string I pass
to the XmlDocument.LoadXml() method? How does the XmlDocument know that the
string I pass is UTF-8 when it loads it?

Never mind, I give up. It works now.

Kind Regards,
Allan Ebdrup
 
Jon Skeet [C# MVP]

Allan Ebdrup said:
Yes, and if you print the string it would be printed incorrectly because
you would be assuming a UTF-16 encoding when the encoding is in fact UTF-8.

If that's the case, you've incorrectly read the string in in the first
place.
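
A minimal sketch of what "incorrectly read" looks like in practice. It is
written in Python rather than .NET purely for illustration, and the sample
text "æøå" is an assumption, not data from this thread. The bytes are the
same in both cases; only the encoding chosen when turning them into text
differs.

```python
# Hypothetical sample text; any non-ASCII characters would do.
original = "æøå"

# Text -> bytes: this is the step where an encoding is chosen.
utf8_bytes = original.encode("utf-8")

# Bytes -> text with the matching encoding recovers the same characters.
decoded_ok = utf8_bytes.decode("utf-8")

# Bytes -> text with the wrong encoding produces different characters,
# and therefore a different length as well.
decoded_wrong = utf8_bytes.decode("latin-1")

print(len(original), len(utf8_bytes), len(decoded_ok), len(decoded_wrong))
# prints: 3 6 3 6
print(decoded_ok == original, decoded_wrong == original)
# prints: True False
```

Once the wrong decoding has happened, the resulting string is simply a
different sequence of characters; nothing in the string itself records
which encoding was used to produce it.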

Allan Ebdrup said:
Granted, it would not be what you want and .Length would probably return the
wrong result. It would clearly be an error, but the string I pass to my
web method is UTF-8 encoded, so the string I pass to the XmlDocument should
have encoding problems, right?

As I keep saying, there's no such thing as a UTF-8 encoded string. By
the time it's a string, there is no encoding involved logically.

Allan Ebdrup said:
How can this be, when I don't specify the encoding in the XML string I pass
to the XmlDocument.LoadXml() method? How does the XmlDocument know that the
string I pass is UTF-8 when it loads it?

If you're passing in a string, then there's no encoding required in
the first place.
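
The same point can be sketched with an XML parser. This uses Python's
standard xml.etree.ElementTree as a stand-in, not XmlDocument, so treat it
as an analogy only; the element name and content are made up. Parsing from
text needs no encoding at all, while parsing from bytes relies on the
parser's default (UTF-8) or on the encoding named in the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, already held as text (characters, not bytes).
xml_text = "<name>Søren</name>"

# Parsing from text: no encoding is involved.
from_text = ET.fromstring(xml_text)

# Parsing from bytes: without a declaration the parser assumes UTF-8.
from_utf8 = ET.fromstring(xml_text.encode("utf-8"))

# Parsing from bytes in another encoding: the declaration tells the
# parser how to turn the bytes back into characters.
declared = '<?xml version="1.0" encoding="iso-8859-1"?>' + xml_text
from_latin1 = ET.fromstring(declared.encode("iso-8859-1"))

print(from_text.text == from_utf8.text == from_latin1.text)
# prints: True
```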

Allan Ebdrup said:
Never mind, I give up. It works now.

Did you read the link I posted earlier? I strongly recommend that you
do.

Encodings are only involved when converting text data to binary data
or vice versa.

When you first load a string from a file (or some other binary data
store) you need to do that conversion. It's required again if you
write it to a file (etc). In the meantime, there's no encoding
involved, it's just a sequence of Unicode characters.
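
A rough sketch of that file boundary, again in Python for illustration
only. The file name and sample text are assumptions. The same text is
written once as UTF-8 and once as UTF-16; the bytes on disk differ, but
reading each file back with the encoding it was written in gives the same
sequence of characters.

```python
import os
import tempfile

# Hypothetical sample text.
text = "Blåbærgrød"

with tempfile.TemporaryDirectory() as tmp:
    results = {}
    for encoding in ("utf-8", "utf-16"):
        path = os.path.join(tmp, "sample-" + encoding + ".txt")

        # Text -> bytes happens here, on the way out to the file.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)

        # The raw bytes on disk depend on the encoding chosen above.
        with open(path, "rb") as f:
            raw = f.read()

        # Bytes -> text happens here, on the way back in.
        with open(path, "r", encoding=encoding) as f:
            round_tripped = f.read()

        results[encoding] = (raw, round_tripped)

# Different bytes on disk, identical text in memory.
print(results["utf-8"][0] != results["utf-16"][0])
print(results["utf-8"][1] == results["utf-16"][1] == text)
```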

Jon
 
