Invalid characters before xml header


Nadav

Hello,
When I create an XML header using this code:

XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);

XmlElement rootElement = doc.DocumentElement;

doc.InsertBefore(header, rootElement);


It adds some invalid characters before the header itself, only viewable with
a text editor (IE opens the XML ok). This causes some perl code I got, which
reads from the XML, to fail.

This is the header:
ï»¿<?xml version="1.0" encoding="UTF-8"?>

How can I fix this?
Thanks.
Nadav.
 
Nadav said:
When I create an XML header using this code:

XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);

XmlElement rootElement = doc.DocumentElement;

doc.InsertBefore(header, rootElement);


It adds some invalid characters before the header itself, only viewable with
a text editor (IE opens the XML ok). This causes some perl code I got, which
reads from the XML, to fail.

This is the header:
ï»¿<?xml version="1.0" encoding="UTF-8"?>

They're not invalid characters. That's the byte order mark. It's
perfectly valid for it to be there - it sounds like the Perl code is
broken. You may well not be able to fix that though, so I guess we need
to sort out how to suppress the BOM from the written file.

You haven't shown how you're writing out the document - could you do
so?
 
Sure...

XmlDocument doc = new XmlDocument();

XmlNode root = doc.CreateElement("XXXX");

doc.AppendChild (root);

and so on....

and then, last of all, I run the code shown before to add the declaration.

The reason I thought they were invalid chars is that I have the same
software creating the XML in Java as well (I rewrote it in C#), and when I
checked, the output XML files were identical (text and structure). But
the Java-created XML still worked with the Perl code and the C# one didn't.
The reason is probably these chars, which don't exist in the Java XML.

Thanks, Nadav.




 
I seem to have emailed you instead of posting before, but here it is:

This is the byte order mark (BOM), and it confused me too at first.

You might think that you've already specified the encoding as "UTF-8", but if
you think about it, the reader needs to know the encoding before it can read
the string "UTF-8". Hence the BOM: a magic Unicode value (U+FEFF, which UTF-8
encodes as the three bytes EF BB BF) usually put at the start of the file.

Off the top of my head, this comes down to an interaction between
XmlTextWriter, the stream you are writing to, and the encoding used for that
stream.
It IS all documented (just not very clearly), and you definitely can suppress
the BOM (at least for UTF-8).
The UTF8Encoding constructor takes a parameter that specifies whether to emit
the BOM.
Just to confuse things, Encoding.UTF8 and new UTF8Encoding() are different:
the former emits a BOM and the latter does not.
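
To see the difference for yourself, here is a minimal sketch that prints the preamble (BOM) bytes of a few encodings. The class and method names (PreambleDemo, PreambleHex) are made up for illustration; Encoding.GetPreamble() is the framework call that reports what an encoding writes at the start of a stream.

```csharp
using System;
using System.Text;

public static class PreambleDemo
{
    // Returns the preamble (BOM) bytes of an encoding as hex, e.g. "EF-BB-BF".
    // An encoding that emits no BOM yields an empty string.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        // Encoding.UTF8 is configured to emit a BOM.
        Console.WriteLine("Encoding.UTF8:           [" + PreambleHex(Encoding.UTF8) + "]");
        // The parameterless constructor does not emit one.
        Console.WriteLine("new UTF8Encoding():      [" + PreambleHex(new UTF8Encoding()) + "]");
        // The bool parameter makes the choice explicit.
        Console.WriteLine("new UTF8Encoding(true):  [" + PreambleHex(new UTF8Encoding(true)) + "]");
        Console.WriteLine("new UTF8Encoding(false): [" + PreambleHex(new UTF8Encoding(false)) + "]");
        // UTF-16 little endian, which .NET calls "Unicode".
        Console.WriteLine("Encoding.Unicode:        [" + PreambleHex(Encoding.Unicode) + "]");
    }
}
```

Empty brackets mean that a writer built on that encoding will not put a BOM at the start of the file.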

I seem to remember that when I had this problem it was because I was writing
to a MemoryStream which defaulted to Unicode whereas I think file streams
default to UTF-8 for compatibility reasons.

Be careful - it is entirely possible to have the XML say "UTF-8" and the BOM
say something else - this will cause a self-explanatory error when you try
to load the document.

P.S. Notepad can read and write UTF-8 and Unicode, big or little endian.

see also
http://www.unicode.org/faq/utf_bom.html
http://en.wikipedia.org/wiki/Byte_Order_Mark
 
Nadav said:
Sure...

XmlDocument doc = new XmlDocument();

XmlNode root = doc.CreateElement("XXXX");

doc.AppendChild (root);

and so on....

and then, last of all, I run the code shown before to add the declaration.

But none of that is what actually writes out the document - what saves
it to disk. That's what I'm interested in, as that's when the BOM is
created.
The reason I thought they were invalid chars is that I have the same
software creating the XML in Java as well (I rewrote it in C#), and when I
checked, the output XML files were identical (text and structure). But
the Java-created XML still worked with the Perl code and the C# one didn't.
The reason is probably these chars, which don't exist in the Java XML.

Well, they probably do depending on how you tell Java to write it out
(and which XML library you use - there are loads of Java XML
libraries).

Jon
 
Nadav said:
Oh, sorry about that :)
It's:

doc.Save(fileDialog.FileName);

Okay. The simplest way round it is to create a StreamWriter using an
encoding (matching the one the document uses) which doesn't use a BOM
(you can create an instance of UTF8Encoding which doesn't use a BOM):

// Requires: using System.IO; using System.Text;
// UTF8Encoding(false) writes UTF-8 without emitting a byte order mark.
using (StreamWriter writer = new StreamWriter(fileDialog.FileName,
    false, new UTF8Encoding(false)))
{
    doc.Save(writer);
}

That should work...
Jon
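
For anyone who wants to confirm the result without opening a hex editor, here is a rough sketch that saves a small document through a BOM-less StreamWriter and then inspects the first three bytes of the file. The class and method names (BomCheck, SaveWithoutBom, StartsWithUtf8Bom) and the temp-file path are assumptions for illustration; the element name "XXXX" is borrowed from the code earlier in the thread.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomCheck
{
    // Saves the document as UTF-8 through a writer that emits no BOM.
    public static void SaveWithoutBom(XmlDocument doc, string path)
    {
        using (StreamWriter writer = new StreamWriter(path, false, new UTF8Encoding(false)))
        {
            doc.Save(writer);
        }
    }

    // Returns true if the file begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    public static void Main()
    {
        // Build a tiny document the same way as in the question.
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateElement("XXXX"));
        XmlDeclaration header = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
        doc.InsertBefore(header, doc.DocumentElement);

        // Hypothetical output location; any writable path would do.
        string path = Path.GetTempFileName();
        SaveWithoutBom(doc, path);
        Console.WriteLine("BOM present: " + StartsWithUtf8Bom(path));
        File.Delete(path);
    }
}
```

The same StartsWithUtf8Bom check can be pointed at a file written with doc.Save(fileName) to compare the two outputs.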
 
Thanks a lot! Works great!

 
