Odd character returned when using Encoding.UTF8.GetString(MemoryStream)

Chris Lacey

Does anyone know why the following code (intended to write XML into a
memory-based XmlTextWriter and return the complete document as a string)
produces badly formed XML? The resulting string always starts with a
question mark.

string xmlRequestString;
MemoryStream memoryStream = new MemoryStream();
XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);

xmlTextWriter.WriteStartDocument();
xmlTextWriter.WriteStartElement("element");
xmlTextWriter.WriteStartElement("subelement");
xmlTextWriter.WriteAttributeString("attribute", "attributeValue");
xmlTextWriter.WriteString("string");
xmlTextWriter.WriteEndElement();
xmlTextWriter.WriteEndElement();
xmlTextWriter.WriteEndDocument();
xmlTextWriter.Close();

xmlRequestString = Encoding.UTF8.GetString(memoryStream.ToArray());
Console.WriteLine(xmlRequestString);


This returns the following (note the preceding question mark):

?<?xml version="1.0" encoding="utf-8"?><element><subelement attribute="attributeValue">string</subelement></element>

Any ideas or assistance very gratefully received!!

Many thanks,

Chris.
 
Jon Skeet [C# MVP]

Chris Lacey said:
Is anyone aware why the following code (intended to write XML into a
memory-based XmlTextWriter, and return the complete document as a string)
produces badly formed XML due to the resultant string always commencing with
a question mark

<snip>

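It's a byte order mark (BOM). Encoding.UTF8 puts the preamble bytes
EF BB BF at the start of the stream, and Encoding.UTF8.GetString decodes
them as the character U+FEFF, which is being displayed as a question mark.
If you read the stream with a StreamReader instead of calling
Encoding.UTF8.GetString, you'll see it goes away.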
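
For illustration, here is a minimal sketch of the difference between the two ways of decoding. The class and method names (BomDemo, WriteXmlBytes and so on) are made up for this example, and the document is shortened to a single element:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, named only for this example.
public static class BomDemo
{
    // Writes a small document the same way as the code in the question
    // and returns the raw bytes, including the UTF-8 preamble.
    public static byte[] WriteXmlBytes()
    {
        MemoryStream memoryStream = new MemoryStream();
        XmlTextWriter xmlTextWriter = new XmlTextWriter(memoryStream, Encoding.UTF8);
        xmlTextWriter.WriteStartDocument();
        xmlTextWriter.WriteStartElement("element");
        xmlTextWriter.WriteString("string");
        xmlTextWriter.WriteEndElement();
        xmlTextWriter.WriteEndDocument();
        xmlTextWriter.Close();
        // ToArray can still be called after the stream has been closed.
        return memoryStream.ToArray();
    }

    // Decodes every byte, so the preamble becomes a leading U+FEFF character.
    public static string DecodeWithGetString(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // StreamReader recognises the preamble and skips it.
    public static string DecodeWithStreamReader(byte[] bytes)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With this sketch, the string from DecodeWithGetString starts with U+FEFF, while the one from DecodeWithStreamReader starts directly with the XML declaration.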

If you really only want the characters, I suggest switching to a
StringWriter instead of a MemoryStream. A plain StringWriter reports
UTF-16 as its encoding, and that is what ends up in the XML declaration,
so you may wish to derive a class from StringWriter that lets you
specify the encoding, e.g.

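using System.IO;
using System.Text;

// StringWriter whose reported Encoding can be chosen by the caller.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    // XmlTextWriter reads this property to fill in the encoding
    // attribute of the XML declaration.
    public override Encoding Encoding
    {
        get { return encoding; }
    }
}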
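
As a rough sketch of how this class might be plugged into the code from the question (StringWriterDemo and BuildXml are hypothetical names, and the class is repeated only so the snippet compiles on its own):

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Repeated here so that the example is self-contained.
public class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding(Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

// Hypothetical wrapper around the code from the question.
public static class StringWriterDemo
{
    public static string BuildXml()
    {
        StringWriterWithEncoding stringWriter = new StringWriterWithEncoding(Encoding.UTF8);
        XmlTextWriter xmlTextWriter = new XmlTextWriter(stringWriter);

        xmlTextWriter.WriteStartDocument();
        xmlTextWriter.WriteStartElement("element");
        xmlTextWriter.WriteStartElement("subelement");
        xmlTextWriter.WriteAttributeString("attribute", "attributeValue");
        xmlTextWriter.WriteString("string");
        xmlTextWriter.WriteEndElement();
        xmlTextWriter.WriteEndElement();
        xmlTextWriter.WriteEndDocument();
        xmlTextWriter.Close();

        // No bytes are ever produced, so there is no preamble to decode.
        return stringWriter.ToString();
    }
}
```

Because the characters go straight into the underlying StringBuilder, nothing is encoded to bytes and back, and the declaration still reports utf-8.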
 
Oleg Tkachenko

Chris said:
Is anyone aware why the following code (intended to write XML into a
memory-based XmlTextWriter, and return the complete document as a string)
produces badly formed XML due to the resultant string always commencing with
a question mark

Most likely it's not a question mark but a BOM (byte order mark).
Btw, why don't you write directly to a string using StringWriter?
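
A minimal sketch of that approach, assuming a plain StringWriter is acceptable (PlainStringWriterDemo and BuildXml are hypothetical names used only here). Note that with a plain StringWriter the declaration reports utf-16, since that is the encoding StringWriter advertises:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical wrapper, named only for this example.
public static class PlainStringWriterDemo
{
    public static string BuildXml()
    {
        StringWriter stringWriter = new StringWriter();
        XmlTextWriter xmlTextWriter = new XmlTextWriter(stringWriter);

        xmlTextWriter.WriteStartDocument();
        xmlTextWriter.WriteStartElement("element");
        xmlTextWriter.WriteString("string");
        xmlTextWriter.WriteEndElement();
        xmlTextWriter.WriteEndDocument();
        xmlTextWriter.Close();

        // The text is written as characters, so no BOM appears in the string.
        return stringWriter.ToString();
    }
}
```

If the declaration has to say utf-8, the StringWriterWithEncoding class posted earlier in this thread is one way to get that.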
 