Using XmlTextReader to read unicode characters

Jordan · Nov 9, 2005

I have a Unicode XML file that I am trying to read using the .NET
XmlTextReader in C#. How do I read the Unicode file? If I try to
use the XmlTextReader.Read() method, it throws an exception.

The exception reads:
The '€' character, hexadecimal value 0x80, cannot begin with a name.
Line 1, position 2.

Any suggestions? I read on Microsoft's website about writing surrogate
pairs, but I can't find any documentation that confirms the
XmlTextReader can handle surrogate pairs.

Martin Honnen · Nov 9, 2005

Jordan said:
I have a Unicode XML file that I am trying to read using the .NET
XmlTextReader in C#. How do I read the Unicode file? If I try to
use the XmlTextReader.Read() method, it throws an exception.

What Unicode encoding does that XML file have (e.g. UTF-8 or UTF-16)?
How do you know it is Unicode?
Is there an XML declaration (e.g. <?xml version="1.0" encoding="UTF-8"?>)
at the beginning? Is there a BOM (byte order mark)?
How do you create the XmlTextReader, simply with
new XmlTextReader("file.xml")?
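
If the reader is created from the file name alone, the parser has to work out the encoding from the BOM or the XML declaration. As a rough sketch of the alternative, the following decodes the file with a StreamReader first and hands the decoded text to XmlTextReader. The class name ExplicitEncodingReader and the helper CountElements are made up for illustration, "file.xml" is the placeholder name used above, and UTF-8 is only an assumption about the file.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public class ExplicitEncodingReader
{
    // Hypothetical helper (not part of .NET): counts the element nodes in the
    // file at 'path', decoding its bytes with the supplied encoding.
    public static int CountElements(string path, Encoding encoding)
    {
        int elements = 0;

        // The third argument asks StreamReader to honour a BOM if one is
        // present, falling back to 'encoding' otherwise.
        using (StreamReader text = new StreamReader(path, encoding, true))
        {
            // XmlTextReader receives already-decoded characters here, so it
            // does not decode the bytes itself.
            XmlTextReader reader = new XmlTextReader(text);
            try
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        elements++;
                    }
                }
            }
            finally
            {
                reader.Close();
            }
        }
        return elements;
    }

    public static void Main(string[] args)
    {
        // "file.xml" is a placeholder; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "file.xml";
        try
        {
            // Assumption: the file is UTF-8. Try Encoding.Unicode (UTF-16 LE)
            // or Encoding.BigEndianUnicode if it is really UTF-16.
            Console.WriteLine("Elements read: " + CountElements(path, Encoding.UTF8));
        }
        catch (XmlException e)
        {
            Console.WriteLine("Parse error at line " + e.LineNumber +
                              ", position " + e.LinePosition + ": " + e.Message);
        }
    }
}
```

If the same exception still appears with every plausible encoding, the file content itself is probably not well-formed XML.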

The exception reads:
The '€' character, hexadecimal value 0x80, cannot begin with a name.
Line 1, position 2.

Maybe the XML is not properly encoded? How do the first lines of the XML
file look?
What happens when you load the file with the IE browser? Does that give
a parse error too?
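
To answer the questions about the BOM and the first lines without relying on an editor, one option is to dump the leading bytes of the file. This is only a sketch: the class name DumpLeadingBytes and the helper DescribeBom are invented for the example, and only the UTF-8 and UTF-16 byte order marks are checked.

```csharp
using System;
using System.IO;

public class DumpLeadingBytes
{
    // Hypothetical helper: names the byte order mark found in the first 'n'
    // bytes of 'b'. Only UTF-8 and UTF-16 marks are recognized; a UTF-32
    // little-endian BOM (FF FE 00 00) would be reported as UTF-16 here.
    public static string DescribeBom(byte[] b, int n)
    {
        if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        {
            return "UTF-8 BOM";
        }
        if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        {
            return "UTF-16 little-endian BOM";
        }
        if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        {
            return "UTF-16 big-endian BOM";
        }
        return "no BOM recognized";
    }

    public static void Main(string[] args)
    {
        // "file.xml" is a placeholder; pass the real path as the first argument.
        string path = args.Length > 0 ? args[0] : "file.xml";

        byte[] head = new byte[16];
        int count;
        using (FileStream stream = File.OpenRead(path))
        {
            // Read up to 16 bytes; 'count' is how many were actually read.
            count = stream.Read(head, 0, head.Length);
        }

        // Prints the bytes as hyphen-separated hex pairs.
        Console.WriteLine(BitConverter.ToString(head, 0, count));
        Console.WriteLine(DescribeBom(head, count));
    }
}
```

A well-formed UTF-8 file without a BOM would normally start with 3C 3F 78 6D 6C ("<?xml") or with 3C followed by the first byte of the root element's name. A 0x80 byte right after the opening "<" would be consistent with the reported error at line 1, position 2.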
