Find the encoding of an XML byte array (or stream) stored in memory

Martin Z

Hi,

I have an application that involves sending a lot of XML data to
various places. The problem is that once in a while, I just want the
XML document as a string (for example, when sending it to a typed DataSet
TableAdapter). In that case, what encoding do I use to convert the
byte[] (or MemoryStream) into a string? The encoding is declared in
the document, of course, like in any XML file, but how do I find out
which one it is? I tried using an XmlTextReader, which is supposed to
have an Encoding property, but all I did basically was

myEncoding = new XmlTextReader(myMemoryStream).Encoding;

and in that case it's null. Is there something I have to do with the
XmlTextReader to get it to populate its own Encoding property from the
given text? The null suggests I'm doing something really weird, since
the documentation for XmlTextReader.Encoding says:

Property Value
The encoding value. If no encoding attribute exists, and there is no
byte-order mark, this defaults to UTF-8.
 
Martin Honnen

Martin said:

myEncoding = new XmlTextReader(myMemoryStream).Encoding;

and in that case it's null. Is there something I have to do with the
XmlTextReader to get it to populate its own Encoding property from the
given text?

Yes, the reader has to read the beginning of the stream to find a byte
order mark or look at the XML declaration. If you do e.g.

XmlTextReader reader = new XmlTextReader(myMemoryStream);
reader.MoveToContent();

then you should be able to get the encoding:

Encoding myEncoding = reader.Encoding;
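To tie the two steps together, here is a minimal sketch that detects the encoding this way and then decodes the raw bytes with it. It assumes the XML is available as a byte[]; the class and method names (XmlBytes, DetectEncoding, DecodeXml) are hypothetical. A fresh MemoryStream is created for each pass because closing an XmlTextReader that was built on a stream also closes that stream.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class XmlBytes
{
    // The reader only knows the encoding after it has consumed the byte order
    // mark and/or the XML declaration, so MoveToContent() is called first.
    public static Encoding DetectEncoding(byte[] bytes)
    {
        XmlTextReader reader = new XmlTextReader(new MemoryStream(bytes));
        try
        {
            reader.MoveToContent();
            return reader.Encoding;
        }
        finally
        {
            reader.Close(); // also closes the temporary MemoryStream
        }
    }

    // Decodes the bytes with the detected encoding. StreamReader is used rather
    // than Encoding.GetString because it skips a leading byte order mark instead
    // of returning it as a U+FEFF character at the start of the string.
    public static string DecodeXml(byte[] bytes)
    {
        Encoding encoding = DetectEncoding(bytes);
        using (StreamReader sr = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return sr.ReadToEnd();
        }
    }

    public static void Main()
    {
        // UTF-16 with a byte order mark ("\uFEFF" encodes to FF FE) and a declaration.
        string text = "<?xml version=\"1.0\" encoding=\"utf-16\"?><root>\u00e9</root>";
        byte[] utf16 = Encoding.Unicode.GetBytes("\uFEFF" + text);
        Console.WriteLine(DetectEncoding(utf16).WebName); // utf-16
        Console.WriteLine(DecodeXml(utf16) == text);       // True

        // No byte order mark and no encoding declaration: the reader falls back to UTF-8.
        byte[] utf8 = Encoding.UTF8.GetBytes("<root>\u00e9</root>");
        Console.WriteLine(DetectEncoding(utf8).WebName);   // utf-8
    }
}
```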

Note that in .NET 2.0 there is also a method ReadOuterXml, so instead of
trying to find out the encoding and decode the bytes in the memory
stream to a string, it might suffice to do e.g.

XmlTextReader reader = new XmlTextReader(myMemoryStream);
reader.MoveToContent();
string xml = reader.ReadOuterXml();

That will, however, strip out anything before the root element, such as
the XML declaration, comments or processing instructions. If you want
the complete XML, then you might need to call Read and ReadOuterXml for
all top-level nodes instead of using MoveToContent.
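A minimal sketch of the ReadOuterXml variant, assuming the XML arrives as a Stream; the class and method names (OuterXmlExample, RootElementXml) are hypothetical. The sample input puts a comment before the root element to show that it does not appear in the result.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class OuterXmlExample
{
    // Returns the markup of the root element only. MoveToContent() skips the
    // XML declaration, comments and processing instructions in the prolog.
    public static string RootElementXml(Stream stream)
    {
        XmlTextReader reader = new XmlTextReader(stream);
        try
        {
            reader.MoveToContent();       // now positioned on the root element
            return reader.ReadOuterXml(); // root element plus everything inside it
        }
        finally
        {
            reader.Close();
        }
    }

    public static void Main()
    {
        string doc = "<?xml version=\"1.0\" encoding=\"utf-8\"?><!-- note --><root><child>text</child></root>";
        using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(doc)))
        {
            Console.WriteLine(RootElementXml(ms)); // <root><child>text</child></root>
        }
    }
}
```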

And most APIs that take XML input have overloads that read from a
stream directly, so double-check that your API does not accept a stream
before you go to the effort of decoding your stream into a string of XML.
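For example, XmlDocument.Load has an overload that takes a Stream and works out the encoding itself (DataSet.ReadXml and XDocument.Load also accept a Stream), so the bytes never need to pass through a string. This is an illustrative sketch only; the class name LoadFromStream is hypothetical, and whether a particular API (such as a generated TableAdapter method) accepts a stream has to be checked case by case.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class LoadFromStream
{
    // XmlDocument.Load(Stream) reads the byte order mark and XML declaration
    // itself, so no manual encoding detection or decoding is needed.
    public static XmlDocument Load(byte[] bytes)
    {
        XmlDocument doc = new XmlDocument();
        using (MemoryStream ms = new MemoryStream(bytes))
        {
            doc.Load(ms);
        }
        return doc;
    }

    public static void Main()
    {
        byte[] bytes = Encoding.Unicode.GetBytes(
            "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?><root>\u00e9</root>");
        XmlDocument doc = Load(bytes);
        Console.WriteLine(doc.DocumentElement.Name);                   // root
        Console.WriteLine(doc.DocumentElement.InnerText == "\u00e9");  // True
    }
}
```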
 
Martin Z

Thank you, you solved my problem perfectly. And as I said, I was using
a typed DataSet TableAdapter on SQL Server 2000; those use strings for
any text field, so I needed it as a string.

