Parsing unicode (UTF-16) XML file

Moistly

I am having difficulty parsing a Unicode (UTF-16) XML file that has
been generated by a third-party piece of software.

Ideally I would like to use XmlDocument, though I would settle for
XmlTextReader.

Thanks
 
Martin Honnen

Moistly said:
I am having difficulty parsing a unicode (UTF-16) XML file that has
been generated by a 3rd party piece of software.

Well, what difficulties exactly do you have? The .NET Framework supports
UTF-16, and its XML parsers certainly do too. It sounds more like the file
is not properly UTF-16 encoded.
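
One way to check whether a file really is UTF-16 is to look at its first few bytes. The following is a minimal sketch, not a full implementation of XML encoding detection; the class name EncodingProbe and its methods are hypothetical, and the default file name form.xml is only a placeholder.

```csharp
using System;
using System.IO;

static class EncodingProbe
{
    // Heuristic guess at an XML file's encoding from its leading bytes.
    // Assumption: the file starts with "<?xml" or with a BOM.
    // UTF-32 is deliberately not handled in this sketch.
    public static string Describe(byte[] head)
    {
        int n = head.Length;
        if (n >= 2 && head[0] == 0xFF && head[1] == 0xFE) return "UTF-16 LE with BOM";
        if (n >= 2 && head[0] == 0xFE && head[1] == 0xFF) return "UTF-16 BE with BOM";
        if (n >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF) return "UTF-8 with BOM";

        // "<?" is 3C 00 3F 00 in UTF-16 LE and 00 3C 00 3F in UTF-16 BE.
        if (n >= 4 && head[0] == 0x3C && head[1] == 0x00 && head[2] == 0x3F && head[3] == 0x00)
            return "UTF-16 LE without BOM";
        if (n >= 4 && head[0] == 0x00 && head[1] == 0x3C && head[2] == 0x00 && head[3] == 0x3F)
            return "UTF-16 BE without BOM";

        // 3C 3F with no zero bytes in between: one byte per character.
        if (n >= 2 && head[0] == 0x3C && head[1] == 0x3F)
            return "single-byte (ASCII/UTF-8) without BOM";

        return "unknown";
    }

    public static string DescribeFile(string path)
    {
        byte[] head = new byte[4];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(head, 0, head.Length);
            Array.Resize(ref head, read); // the file may be shorter than 4 bytes
        }
        return Describe(head);
    }

    static void Main(string[] args)
    {
        string path = args.Length > 0 ? args[0] : "form.xml";
        Console.WriteLine(DescribeFile(path));
    }
}
```

A file that begins with 3C 3F and contains no zero bytes is single-byte encoded even if its declaration says utf-16. Reading such a file as UTF-16 pairs every two ASCII bytes into one unrelated character.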
 
Arne Vajhøj

Moistly said:
I am having difficulty parsing a unicode (UTF-16) XML file that has
been generated by a 3rd party piece of software.

I ideally would like to use XmlDocument though would settle with using
XmlTextReader

What does the XML look like, what does the parsing code look like, and
what error do you get?

Arne
 
Moistly

Arne Vajhøj said:
What does the XML look like, what does the parsing code look like, and
what error do you get?

Arne

The XML file looks like this (in Notepad):

<?xml version="1.0" encoding="utf-16"?><form Id="1111"></form>

In VS 2008 it looks like this:

㼼浸⁬敶獲潩㵮ㄢ〮•湥潣楤杮∽瑵ⵦ㘱㼢㰾潦浲䤠㵤ㄢㄱ∱㰾是牯㹭

I have tried parsing like

string filePath = "form.xml";

XmlDocument doc = new XmlDocument();

doc.Load(filePath);
XmlNode formNode = doc.FirstChild;

The error I get is "There is no Unicode byte order mark. Cannot switch to
Unicode."

I have also tried

StreamReader reader = new StreamReader(filePath, Encoding.Unicode, true);

XmlTextReader textReader = new XmlTextReader(reader);

textReader.Read();

The error I get now is "Data at the root level is invalid. Line 1, position
1."

Any ideas? Thanks
 
Arne Vajhøj

Moistly said:
The XML file looks like this (in Notepad):

<?xml version="1.0" encoding="utf-16"?><form Id="1111"></form>

In VS 2008 it looks like this:

㼼浸⁬敶獲潩㵮ㄢ〮•湥潣楤杮∽瑵ⵦ㘱㼢㰾潦浲䤠㵤ㄢㄱ∱㰾是牯㹭

I have tried parsing like

string filePath = "form.xml";

XmlDocument doc = new XmlDocument();

doc.Load(filePath);
XmlNode formNode = doc.FirstChild;

The error I get is "There is no Unicode byte order mark. Cannot switch to
Unicode."

I have also tried

StreamReader reader = new StreamReader(filePath, Encoding.Unicode, true);

XmlTextReader textReader = new XmlTextReader(reader);

textReader.Read();

The error I get now is "Data at the root level is invalid. Line 1, position
1."

Any ideas?

Have you tried the obvious: putting a BOM at the start of the file?

Arne
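
A rough sketch of what adding a BOM could look like, assuming the file really is little-endian UTF-16 and only lacks the two BOM bytes FF FE. The class name BomFixer and the output name form-with-bom.xml are hypothetical. If the file is actually single-byte encoded, prepending a BOM will not make it parse.

```csharp
using System;
using System.IO;

static class BomFixer
{
    // Returns the content prefixed with the UTF-16 LE BOM (FF FE),
    // or the content unchanged if a UTF-16 BOM is already present.
    public static byte[] WithUtf16LeBom(byte[] content)
    {
        bool hasBom = content.Length >= 2 &&
            ((content[0] == 0xFF && content[1] == 0xFE) ||
             (content[0] == 0xFE && content[1] == 0xFF));
        if (hasBom) return content;

        byte[] result = new byte[content.Length + 2];
        result[0] = 0xFF;
        result[1] = 0xFE;
        Buffer.BlockCopy(content, 0, result, 2, content.Length);
        return result;
    }

    static void Main()
    {
        // Write to a new file so the original stays untouched.
        byte[] original = File.ReadAllBytes("form.xml");
        File.WriteAllBytes("form-with-bom.xml", WithUtf16LeBom(original));
    }
}
```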
 
Mark Tolonen

Arne Vajhøj said:
Have you tried the obvious: putting a BOM at the start of the file?

Arne

This works with a little-endian UTF-16 file with no BOM:

// Encoding.Unicode is UTF-16 little-endian; it is used when the file has no BOM.
using (StreamReader reader = new StreamReader("form-le.xml", Encoding.Unicode))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
    XmlNode formNode = doc.FirstChild;
}

-Mark
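
To try the approach above end to end, the sketch below first writes a little-endian UTF-16 file without a BOM and then loads it through a StreamReader. The class name BomlessUtf16Demo, the helper ReadFormId, and the temp-file location are assumptions made for illustration; the XML is the sample from this thread.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class BomlessUtf16Demo
{
    // Loads a BOM-less UTF-16 LE XML file and returns the root element's Id attribute.
    public static string ReadFormId(string path)
    {
        // The explicit encoding tells the reader how to decode when no BOM is found.
        using (StreamReader reader = new StreamReader(path, Encoding.Unicode))
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(reader);
            return doc.DocumentElement.GetAttribute("Id");
        }
    }

    static void Main()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"utf-16\"?><form Id=\"1111\"></form>";
        string path = Path.Combine(Path.GetTempPath(), "form-le.xml");

        // UnicodeEncoding(bigEndian: false, byteOrderMark: false) writes UTF-16 LE with no BOM.
        File.WriteAllText(path, xml, new UnicodeEncoding(false, false));

        Console.WriteLine(ReadFormId(path));
    }
}
```

The sketch uses doc.DocumentElement instead of doc.FirstChild because, when the file starts with an XML declaration, FirstChild is the declaration node and not the form element.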
 
