XmlDocument.LoadXml(): problem with UTF-8 XML


Guest

Hi,
I'm using WebClient to download an XML file from a remote server. I then save
the XML into a string.
The problem is that when I call XmlDocument.LoadXml() on that string, I get the
following exception:
System.Xml.XmlException was unhandled
Message="Data at the root level is invalid. Line 1, position 1."

If I save the XML file (on the remote server) as an ASCII file, there is no
problem. For some reason, LoadXml() cannot handle the UTF-8 file format!
Of course, my XML is declared with encoding="utf-8".
This is my code:

using System.Net;
using System.Xml;

string link = "http://myServer/Test.xml";
WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8;
string test = client.DownloadString(link);
client.Dispose();
XmlDocument testXML = new XmlDocument();
testXML.LoadXml(test); // <-- Here I get the exception

I'd appreciate your help.
Thanks
 

Jon Skeet [C# MVP]

barbutz said:
> I'm using WebClient to download an XML file from a remote server. I then save
> the XML into a string.
> The problem is that when I call XmlDocument.LoadXml() on that string, I get the
> following exception:
> System.Xml.XmlException was unhandled
> Message="Data at the root level is invalid. Line 1, position 1."
>
> If I save the XML file (on the remote server) as an ASCII file, there is no
> problem. For some reason, LoadXml() cannot handle the UTF-8 file
> format! Of course, my XML is declared with encoding="utf-8".

It may be *declared* as UTF-8, but is it *actually* UTF-8?

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

You shouldn't need any network loading involved, by the way: just a
piece of code which uses a string literal should be sufficient. My
guess is that while trying to reproduce it, you'll find that the string
you've got from the WebClient isn't what you think it is.
 

Truong Hong Thi

Did you try writing out or logging the contents of "test" to see what it
looked like before passing it to LoadXml? That should show you why you
got the XmlException.
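To follow that suggestion, it helps to print the first few characters as Unicode code points rather than as text, because a byte order mark (U+FEFF) is invisible when the string is printed normally. A minimal sketch; `CharDump.DumpCodePoints` is a hypothetical helper name, and the string literal stands in for the text returned by DownloadString.

```csharp
using System;
using System.Text;

static class CharDump
{
    // Returns the first 'count' UTF-16 code units of s as "U+XXXX" tokens,
    // so invisible characters such as a byte order mark become visible.
    public static string DumpCodePoints(string s, int count)
    {
        StringBuilder sb = new StringBuilder();
        int n = Math.Min(count, s.Length);
        for (int i = 0; i < n; i++)
        {
            if (i > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)s[i]).ToString("X4"));
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Stand-in for the string returned by WebClient.DownloadString.
        string test = "\ufeff<?xml version='1.0'?><hello/>";
        Console.WriteLine(CharDump.DumpCodePoints(test, 3)); // U+FEFF U+003C U+003F
    }
}
```

If the first token is U+FEFF rather than U+003C ('<'), the string starts with a BOM character, which matches the "Line 1, position 1" error.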

Thi
 

Guest

OK, I debugged my program and realized the following:
The XML file is saved in UTF-8 format, which means that its first 3 bytes are
the UTF-8 byte order mark (BOM): EF BB BF.
When I read it with WebClient into a string, the string contains the whole
XML data INCLUDING the BOM (as the character U+FEFF). When I load this string
with LoadXml, I get an exception because of it. If I remove those bytes from the
file with a hex editor, there is no problem, but this is not a good solution
because it turns my file into plain ASCII.
How can I solve this issue without touching the XML file?
 

Truong Hong Thi

I suggest you try the following:

using System.Net;
using System.Text;
using System.Xml;

string link = "http://myServer/Test.xml";
WebClient client = new WebClient();
// Download the raw bytes instead of letting WebClient decode them.
byte[] theBytes = client.DownloadData(link);
string test = Encoding.UTF8.GetString(theBytes);
client.Dispose();
XmlDocument testXML = new XmlDocument();
testXML.LoadXml(test);

I haven't tested it yet, but I hope it helps,
Thi
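Decoding the whole array with Encoding.UTF8.GetString keeps a leading BOM as the character U+FEFF, so a variant of the suggestion above checks for the UTF-8 preamble (EF BB BF) and decodes only the bytes after it. A minimal sketch; `Utf8Bytes.DecodeUtf8WithoutBom` is a hypothetical helper, and the byte array built in Main stands in for the result of client.DownloadData(link).

```csharp
using System;
using System.Text;
using System.Xml;

static class Utf8Bytes
{
    // Decodes UTF-8 bytes, skipping a leading byte order mark (EF BB BF) if present.
    public static string DecodeUtf8WithoutBom(byte[] data)
    {
        byte[] bom = Encoding.UTF8.GetPreamble(); // EF BB BF
        bool hasBom = data.Length >= bom.Length
            && data[0] == bom[0] && data[1] == bom[1] && data[2] == bom[2];
        int offset = hasBom ? bom.Length : 0;
        return Encoding.UTF8.GetString(data, offset, data.Length - offset);
    }
}

class Program
{
    static void Main()
    {
        // Stand-in for client.DownloadData(link): a BOM followed by the XML text.
        byte[] preamble = Encoding.UTF8.GetPreamble();
        byte[] body = Encoding.UTF8.GetBytes("<?xml version='1.0'?><hello/>");
        byte[] theBytes = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, theBytes, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, theBytes, preamble.Length, body.Length);

        string test = Utf8Bytes.DecodeUtf8WithoutBom(theBytes);
        XmlDocument testXML = new XmlDocument();
        testXML.LoadXml(test); // the string now starts with '<'
        Console.WriteLine(testXML.DocumentElement.Name); // hello
    }
}
```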
 

Joerg Jooss

barbutz said:
> Hi,
> I'm using WebClient to download an XML file from a remote server. I
> then save the XML into a string.
> The problem is that when I call XmlDocument.LoadXml() on that string, I get
> the following exception:
> System.Xml.XmlException was unhandled
> Message="Data at the root level is invalid. Line 1, position 1."
>
> If I save the XML file (on the remote server) as an ASCII file, then
> there is no problem. For some reason, LoadXml() cannot
> handle the UTF-8 file format! Of course, my XML is declared with
> encoding="utf-8". This is my code:
>
> string link = "http://myServer/Test.xml";
> WebClient client = new WebClient();
> client.Encoding = System.Text.Encoding.UTF8;
> string test = client.DownloadString(link);
> client.Dispose();
> XmlDocument testXML = new XmlDocument();
> testXML.LoadXml(test); // <-- Here I get the exception

What happens is that Encoding.UTF8.GetString() doesn't strip away the
BOM if one exists. I'm not sure whether that's by design; to me it's
rather a bug.

You have two choices: strip away the BOM yourself, or use
XmlDocument.Load() to read the XML content directly from a URL.
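For the first option, when the text is already a string (for example from DownloadString), removing a single leading U+FEFF before calling LoadXml is enough. A minimal sketch; `XmlText.StripBom` is a hypothetical helper name, and the string literal stands in for the downloaded text.

```csharp
using System;
using System.Xml;

static class XmlText
{
    // Removes one leading byte order mark (U+FEFF), if present.
    public static string StripBom(string s)
    {
        if (!string.IsNullOrEmpty(s) && s[0] == '\uFEFF')
        {
            return s.Substring(1);
        }
        return s;
    }
}

class Program
{
    static void Main()
    {
        // Stand-in for client.DownloadString(link).
        string test = "\ufeff<?xml version='1.0'?><hello/>";

        XmlDocument testXML = new XmlDocument();
        testXML.LoadXml(XmlText.StripBom(test));
        Console.WriteLine(testXML.DocumentElement.Name); // hello
    }
}
```

Only one BOM character is removed on purpose: anything else at the start of the string is left for the parser to report.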

Cheers,
 

Jon Skeet [C# MVP]

Joerg Jooss said:
> What happens is that Encoding.UTF8.GetString() doesn't strip away the
> BOM if one exists. I'm not sure whether that's by design; to me it's
> rather a bug.
>
> You have two choices: strip away the BOM yourself, or use
> XmlDocument.Load() to read the XML content directly from a URL.

Hmm. It feels as much like a bug in XmlDocument.LoadXml() as
anywhere else. Certainly if this were presented as *binary* data it
should be okay; the XML specification mentions BOMs explicitly.

For anyone who's interested, here's a short but complete program
demonstrating it:

using System;
using System.Xml;

class Test
{
    static void Main()
    {
        try
        {
            string x = "\ufeff<?xml version='1.0'?><hello/>";

            XmlDocument doc = new XmlDocument();
            doc.LoadXml(x);
        }
        catch (Exception e)
        {
            Console.WriteLine (e);
        }
    }
}
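The point about binary data can be checked with the same document: if the UTF-8 bytes (BOM included) are handed to XmlDocument.Load(Stream) instead of being decoded to a string first, the XML reader uses the BOM for encoding detection rather than treating it as content. A sketch, assuming a MemoryStream in place of the downloaded bytes; with WebClient the bytes would come from DownloadData, or a stream could come from OpenRead. `BomXml.LoadFromBytes` is a hypothetical helper name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class BomXml
{
    // Parses XML from raw bytes; the XML reader handles a leading BOM itself.
    public static XmlDocument LoadFromBytes(byte[] data)
    {
        XmlDocument doc = new XmlDocument();
        using (MemoryStream stream = new MemoryStream(data))
        {
            doc.Load(stream);
        }
        return doc;
    }
}

class Program
{
    static void Main()
    {
        // Same document as above, encoded as UTF-8: U+FEFF becomes EF BB BF.
        byte[] bytes = Encoding.UTF8.GetBytes("\ufeff<?xml version='1.0'?><hello/>");
        XmlDocument doc = BomXml.LoadFromBytes(bytes);
        Console.WriteLine(doc.DocumentElement.Name); // hello
    }
}
```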
 

Guest

Thanks for all of your replies.
The second option sounds good, but how can I read the XML file using
XmlDocument.Load()? The XML file is sitting on a remote web server, not locally;
that's why I used WebClient in the first place. Is there a way to use
XmlDocument.Load() to load an XML file that is located on a remote HTTP
server?

Thanks!
 

Joerg Jooss

barbutz said:
> Thanks for all of your replies.
> The second option sounds good, but how can I read the XML file using
> XmlDocument.Load()? The XML file is sitting on a remote web server,
> not locally; that's why I used WebClient in the first place. Is there
> a way to use XmlDocument.Load() to load an XML file that is located
> on a remote HTTP server?

Yes. Actually, the string parameter in Load(fileName) is documented as
follows:

"URL for the file containing the XML document to load."

That means you should be able to pass any valid URL. If it uses the
http:// scheme, the file is downloaded from the Web. Unless you need
control over the HTTP communication, that's the easiest way to load an
XML file.
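A sketch of that call. In the original scenario the argument would be the URL from the first post, "http://myServer/Test.xml"; to keep the example runnable without a server, it instead writes a temporary local file with a UTF-8 BOM and passes its path, since Load(string) accepts a local file as well as an HTTP URL. `RemoteXml.LoadFrom` is a hypothetical wrapper name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class RemoteXml
{
    // XmlDocument.Load(string) takes a local path or an HTTP URL such as
    // "http://myServer/Test.xml"; the reader detects a UTF-8 BOM itself.
    public static XmlDocument LoadFrom(string location)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(location);
        return doc;
    }
}

class Program
{
    static void Main()
    {
        // Offline stand-in for the remote file: Encoding.UTF8 writes EF BB BF first.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<?xml version='1.0' encoding='utf-8'?><hello/>", Encoding.UTF8);
        try
        {
            XmlDocument doc = RemoteXml.LoadFrom(path);
            Console.WriteLine(doc.DocumentElement.Name); // hello
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```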

Cheers,
 
