UTF-8 encoding in AJAX web application.

  • Thread starter: Allan Ebdrup
Hello Allan,

Does the explanation in my last reply also help? As I mentioned there,
one important point is that when you load any text stream into .NET
Framework code (as a string or char), .NET automatically handles it in
memory as a sequence of two-byte wide characters (UTF-16 encoding).

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


This posting is provided "AS IS" with no warranties, and confers no rights.
 
I don't get why you can't say a System.String is UTF-8 encoded, if the bytes
in the string have to be read with a UTF-8 encoding to make sense?

The string is logically composed of characters, not bytes.

When you read your data originally, it's converting it from the binary
form (UTF-8 encoded data) into the text form (which happens to be
stored UTF-16 encoded, but that's irrelevant).
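As a rough illustration of that conversion step, here is a minimal sketch in Python rather than .NET (the payload and variable names are made up for the example). The .NET counterpart of the decode call would be something like Encoding.UTF8.GetString. The point is only that the encoding is consumed at the moment the bytes become text, and that the byte count and the character count are different things.

```python
# Hypothetical payload: the bytes a UTF-8 encoded XML fragment would contain.
raw = b"<name>S\xc3\xb8ren</name>"

# Decoding is the one place the encoding matters: bytes in, characters out.
text = raw.decode("utf-8")

byte_count = len(raw)    # counts bytes; the letter "ø" takes two of them
char_count = len(text)   # counts characters; "ø" is a single character

print(text, byte_count, char_count)
```

After the decode, nothing about `text` records that the data used to be UTF-8.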
Where does my UTF-8 encoded XML get translated to UTF-16?

When you first load the XML document.

Jon
 
Jon Skeet said:
The string is logically composed of characters, not bytes.

Yes, and if you print the string it would be printed incorrectly because
you would be assuming a UTF-16 encoding when the encoding is in fact UTF-8.
Granted, it would not be what you want, and .Length would probably return the
wrong result. It would clearly be an error, but the string I pass to my
web method is UTF-8 encoded, so the string I pass to the XmlDocument should
have encoding problems, right?
When you read your data originally, it's converting it from the binary
form (UTF-8 encoded data) into the text form (which happens to be
stored UTF-16 encoded, but that's irrelevant).


When you first load the XML document.

How can this be when I don't specify the encoding in the XML string I pass
to the XmlDocument.LoadXml() method? How does the XmlDocument know that the
string I pass is UTF-8?

Never mind, I give up. It works now.

Kind Regards,
Allan Ebdrup
 
Yes, and if you print the string it would be printed incorrectly because
you would be assuming a UTF-16 encoding when the encoding is in fact UTF-8.

If that's the case, you've incorrectly read the string in the first
place.
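A small sketch of what "incorrectly read" looks like in practice, again in Python as a stand-in for .NET and with a made-up sample value. The garbling happens when the bytes are decoded with the wrong encoding, not because the resulting string "is" in some encoding.

```python
original = "Søren"

# What travels over the wire or sits in a file: bytes, here UTF-8 encoded.
wire_bytes = original.encode("utf-8")

# Reading those bytes with the wrong encoding produces the wrong characters.
wrong = wire_bytes.decode("latin-1")

# Reading them with the encoding they were written in recovers the text.
right = wire_bytes.decode("utf-8")

print(repr(wrong), repr(right))
```

Once `wrong` exists, it is simply a string containing the wrong characters; the fix belongs at the point where the bytes were decoded.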
Granted, it would not be what you want, and .Length would probably return the
wrong result. It would clearly be an error, but the string I pass to my
web method is UTF-8 encoded, so the string I pass to the XmlDocument should
have encoding problems, right?

As I keep saying, there's no such thing as a UTF-8 encoded string. By
the time it's a string, there is no encoding involved logically.
How can this be when I don't specify the encoding in the XML string I pass
to the XmlDocument.LoadXml() method? How does the XmlDocument know that the
string I pass is UTF-8?

If you're passing in a string, then there's no encoding required in
the first place.
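The same distinction can be sketched with Python's standard XML parser, used here only as an analogy for XmlDocument (LoadXml takes a string, while Load can take a stream of bytes). The sample document is hypothetical. Parsing from a string needs no encoding information; parsing from bytes does, and UTF-8 is the XML default when no declaration is present.

```python
import xml.etree.ElementTree as ET

# Already text: the parser just reads characters, no encoding is involved.
xml_text = "<name>Søren</name>"
from_text = ET.fromstring(xml_text)

# Binary form: the parser has to decode, and falls back to the XML
# default (UTF-8) because there is no encoding declaration.
xml_bytes = xml_text.encode("utf-8")
from_bytes = ET.fromstring(xml_bytes)

print(from_text.text, from_bytes.text)
```

Both routes end up with the same element text.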
Never mind, I give up. It works now.

Did you read the link I posted earlier? I strongly recommend that you
do.

Encodings are only involved when converting text data to binary data
or vice versa.

When you first load a string from a file (or some other binary data
store) you need to do that conversion. It's required again if you
write it to a file (etc). In the meantime, there's no encoding
involved, it's just a sequence of Unicode characters.
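To make that concrete, a minimal sketch (Python again, not .NET; in .NET the equivalent calls would be along the lines of Encoding.UTF8.GetBytes and Encoding.Unicode.GetBytes). The same string can be written out in different encodings, giving different bytes, and reading either form back with the matching encoding yields the identical string.

```python
text = "Søren"

# Text -> binary: this is where an encoding has to be chosen.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Binary -> text: the matching encoding has to be used again.
back_from_utf8 = utf8_bytes.decode("utf-8")
back_from_utf16 = utf16_bytes.decode("utf-16-le")

print(len(utf8_bytes), len(utf16_bytes), back_from_utf8 == back_from_utf16)
```

The byte sequences differ in both content and length, yet the decoded strings compare equal.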

Jon
 