XmlDocument and utf-8

MaxMax

A question: all the XML files I've seen use this declaration:

<?xml version="1.0" encoding="UTF-8"?>

BUT files created using XmlDocument have:

<?xml version="1.0" encoding="utf-8"?>

(you see? lowercase UTF)

I've even tried "manually correcting" the utf to UTF with notepad and then
opening with XmlDocument and saving.

Are they equivalent? Am I doing something wrong?

--- bye
 
Steven Cheng[MSFT]

Hi Max,

Yes, both "UTF-8" and "utf-8" are OK for the charset in the XML declaration.
The .NET Framework's XmlDocument simply always converts the charset value
to lowercase for consistency.

In addition, the charset value in the <?xml ...?> declaration is only a
suggestion for some XML processing programs; the actual charset/encoding
of an XML document or file still depends on how you write the document out
(through the file I/O API). In other words, the actual charset/encoding of
an XML file may differ from the charset named in the <?xml ...?>
declaration.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead

 
MaxMax

Steven Cheng wrote:
> Yes, both "UTF-8" and "utf-8" are OK for the charset in the XML
> declaration. The .NET Framework's XmlDocument simply always converts the
> charset value to lowercase for consistency.

Just checked the XML standard (fourth edition) (just to be sure... 99% of
the internet can't be wrong... or can it?)
"XML processors SHOULD match character encoding names in a case-insensitive
way "

The "official" name of UTF-* is UTF-* uppercase, but the parser should parse
it in a case insensitive way.

--- bye
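
For anyone who wants to check the case-insensitivity point, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree parser rather than .NET's XmlDocument, purely as an illustration; the variable names and the sample document are made up for this example. It parses the same UTF-8 bytes twice, once with each spelling of the encoding name.

```python
import xml.etree.ElementTree as ET

# The same UTF-8 encoded body ("café"), declared with two spellings of the name.
body = "<root>caf\u00e9</root>".encode("utf-8")
upper_doc = b'<?xml version="1.0" encoding="UTF-8"?>' + body
lower_doc = b'<?xml version="1.0" encoding="utf-8"?>' + body

# Both declarations should be accepted and should decode to the same text.
text_upper = ET.fromstring(upper_doc).text
text_lower = ET.fromstring(lower_doc).text

print(text_upper == text_lower)
```

If the parser matches encoding names case-insensitively, as the XML recommendation says it should, the two results are identical.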
 
Jon Skeet [C# MVP]

Steven Cheng wrote:
> In addition, the charset value in the <?xml ...?> declaration is only a
> suggestion for some XML processing programs; the actual charset/encoding
> of an XML document or file still depends on how you write the document
> out (through the file I/O API). In other words, the actual
> charset/encoding of an XML file may differ from the charset named in the
> <?xml ...?> declaration.

It's not really a "suggestion" - it's the encoding which should be
used to parse the rest of the document. If you claim (in the
declaration) to use UTF-8 and actually use some other encoding, XML
parsers are almost certainly going to fail to understand the data in
the way you expect.

Jon
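
The mismatch described above can be reproduced with a small sketch. Again this is Python's standard xml.etree.ElementTree parser, not XmlDocument, and the helper name try_parse and the sample data are hypothetical, chosen only for illustration. The same ISO-8859-1 bytes are labelled once incorrectly as UTF-8 and once correctly.

```python
import xml.etree.ElementTree as ET

# "café" written out as ISO-8859-1: the é is the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
body = "<root>caf\u00e9</root>".encode("iso-8859-1")

mislabelled = b'<?xml version="1.0" encoding="UTF-8"?>' + body
labelled = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body


def try_parse(data):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


print(try_parse(mislabelled))  # declaration does not match the actual bytes
print(try_parse(labelled))     # declaration matches the actual bytes
```

With this parser the mislabelled document is rejected outright, while the correctly labelled one decodes as expected. Other parsers may fail differently, for example by silently producing wrong characters.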
 