Strange problem with sharpziplib, xml and utf-8 encoding

macap.usenet · Dec 12, 2007

Hello,

I am having a very strange problem with SharpZipLib and XML files.

In my application I upload a ZIP file that contains an XML file
via HttpPostedFile and read it like this:

private void getProject(HttpPostedFile file)
{
    String projectName = "";
    double projectVersion = 0.0;

    ZipFile zFile = new ZipFile(file.InputStream);

    ZipEntry zEntry = zFile.GetEntry("version.xml");
    if (zEntry != null)
    {
        using (Stream zis = zFile.GetInputStream(zEntry))
        {
            byte[] data = new byte[2048];
            while (true)
            {
                int size = zis.Read(data, 0, data.Length);
                if (size > 0)
                {
                    XmlDocument doc = new XmlDocument();

                    String s = new UTF8Encoding().GetString(data, 0, size); // Problem!!!
                    doc.LoadXml(s);

                    XmlNodeList nl = doc.SelectNodes("project");

                    //...

The problem is my line with "String s = ...".
In the debugger view, this string contains the following XML:

< ? xml version="1.0">
^^^^
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>

As you can see, there is a blank before the question mark!
This seems to be why the next line (doc.LoadXml(s)) throws
an XmlException ("Ungültige Daten auf Stammebene, Zeile 1, Position 1";
in English: "Data at the root level is invalid. Line 1, position 1.").
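A minimal sketch of what the decoding line does. It assumes version.xml was saved as "UTF-8 with BOM", so its first three bytes are EF BB BF, which is the byte order mark Jon points to later in the thread. `UTF8Encoding.GetString` does not strip the BOM. The three bytes become the single invisible character U+FEFF at index 0, so the string does not start with `<`. `BomDemo`, `Sample` and `DecodeChunk` are hypothetical names used only for illustration.

```csharp
using System.Text;

public static class BomDemo
{
    // Hypothetical start of version.xml: the UTF-8 BOM (EF BB BF)
    // followed by the ASCII characters "<?xml".
    public static readonly byte[] Sample =
        { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'?', (byte)'x', (byte)'m', (byte)'l' };

    // Decodes a chunk exactly as the problem line in the question does.
    public static string DecodeChunk(byte[] data, int size)
    {
        // GetString keeps the BOM: the three bytes decode to the single
        // character U+FEFF (ZERO WIDTH NO-BREAK SPACE) at index 0.
        return new UTF8Encoding().GetString(data, 0, size);
    }
}
```

Decoding the eight sample bytes yields six characters: U+FEFF followed by "<?xml".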

But the strangest part of this problem is this:
when I copy the XML string out of the debugger and paste it into
my news client, everything seems to be fine:

<?xml version="1.0">
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>

How could this happen???

I also tried String s = new ASCIIEncoding().GetString(data, 0, size);
but the result is:

???<?xml version="1.0">
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>

As you can see, ASCIIEncoding produces three question marks
(???) at the beginning of my document, which also makes the
XML invalid.
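The three question marks fit the same explanation. The UTF-8 BOM is three bytes (EF BB BF), and each is above 0x7F, so none of them is valid ASCII. ASCIIEncoding's default replacement fallback turns each invalid byte into one '?'. Below is a minimal sketch; `AsciiBomDemo` and `DecodeAsAscii` are hypothetical names.

```csharp
using System.Text;

public static class AsciiBomDemo
{
    // The UTF-8 byte order mark: three bytes, each outside the ASCII range.
    public static readonly byte[] Bom = { 0xEF, 0xBB, 0xBF };

    public static string DecodeAsAscii(byte[] data)
    {
        // ASCIIEncoding cannot map bytes above 0x7F; its default decoder
        // fallback replaces each such byte with a single '?'.
        return new ASCIIEncoding().GetString(data, 0, data.Length);
    }
}
```

Decoding the BOM this way gives "???". The same happens at the start of the real file.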

I am using Visual Studio 2005 Professional. Is this a bug, or...?
It's very confusing, and I have no other idea where to start
debugging.

Regards,

Martin

macap.usenet · Dec 12, 2007

By the way...
I also tried:
String s = System.Text.UTF8Encoding.UTF8.GetString(data);

Which leads to the same strange behaviour.

Jon Skeet [C# MVP] · Dec 12, 2007

> By the way...
> I also tried:
> String s = System.Text.UTF8Encoding.UTF8.GetString(data);
>
> Which leads to the same strange behaviour.

You could either detect and then discard the Unicode byte order mark
(BOM), or you could just pass the stream into XmlDocument.Load and let
it deal with it.
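The first option, detecting and discarding the BOM, can be sketched as follows. The helper checks whether the buffer starts with the UTF-8 preamble and decodes only the bytes after it. `BomStripper` and `DecodeUtf8SkippingBom` are hypothetical names. Like the original loop, this assumes the whole document arrives in a single Read call. For larger entries you would need to collect all the bytes before decoding.

```csharp
using System.Text;

public static class BomStripper
{
    // Decodes data[0..size) as UTF-8, skipping a leading UTF-8 BOM if present.
    public static string DecodeUtf8SkippingBom(byte[] data, int size)
    {
        // Encoding.UTF8 is configured to emit a BOM, so GetPreamble()
        // returns the three bytes EF BB BF.
        byte[] bom = Encoding.UTF8.GetPreamble();
        int offset = 0;
        if (size >= bom.Length)
        {
            bool matches = true;
            for (int i = 0; i < bom.Length; i++)
            {
                if (data[i] != bom[i]) { matches = false; break; }
            }
            if (matches) offset = bom.Length;
        }
        // Decode only the bytes after the BOM (or all of them if none).
        return Encoding.UTF8.GetString(data, offset, size - offset);
    }
}
```

The result can then go to doc.LoadXml(s) as before, with no U+FEFF at position 1.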

Jon

macap.usenet · Dec 12, 2007

Hi Jon,

> just pass the stream into XmlDocument.Load and let it deal with it.

Thank you. This works fine. Sometimes the solution is so near and
clear ;-)
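A minimal sketch of the version that works. XmlDocument.Load(Stream) inspects the leading bytes itself, so the BOM is recognized and consumed rather than ending up in the text. It also reads the whole stream, so the 2048-byte loop is no longer needed. In the method from the question, the stream would be `zFile.GetInputStream(zEntry)` inside the existing using block. The test substitutes a MemoryStream so it runs without SharpZipLib. `VersionXmlReader`, `LoadFromStream` and `ReadProjectName` are hypothetical names, and the `/project/name` path assumes the sample layout shown in the question.

```csharp
using System.IO;
using System.Xml;

public static class VersionXmlReader
{
    // Parses the XML straight from the stream; the reader detects the
    // encoding (including a UTF-8 BOM) from the first bytes.
    public static XmlDocument LoadFromStream(Stream input)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(input);
        return doc;
    }

    // Returns the text of <project><name>, or null if the element is missing.
    public static string ReadProjectName(XmlDocument doc)
    {
        XmlNode node = doc.SelectSingleNode("/project/name");
        return node == null ? null : node.InnerText;
    }
}
```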

But all in all, I really do not understand why the debugger showed
"< ?xml ..."
while copying and pasting the string out of the debugger gives me the correct
"<?xml ...".

Regards,
Martin

Jon Skeet [C# MVP] · Dec 12, 2007

> Thank you. This works fine. Sometimes the solution is so near and
> clear ;-)

Goodo.

> But all in all, I really do not understand why the debugger showed
> "< ?xml ..."
> while copying and pasting the string out of the debugger gives me the correct
> "<?xml ...".

I suspect it's that the byte order mark wasn't being copied to the
clipboard - and I'd expect it to be *before* the < by the way, not
after it.
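One way to check where the BOM really sits, without trusting how the debugger or the clipboard renders an invisible character, is to dump the first few characters as code points. Below is a minimal sketch; `CodePointDump.DescribeStart` is a hypothetical helper. If Jon is right, the string decoded in the question would start with U+FEFF U+003C U+003F, which puts the BOM before the '<'.

```csharp
using System.Collections.Generic;

public static class CodePointDump
{
    // Lists the first `count` UTF-16 code units of s as U+XXXX values,
    // so invisible characters such as U+FEFF show up explicitly.
    public static string DescribeStart(string s, int count)
    {
        var parts = new List<string>();
        for (int i = 0; i < s.Length && i < count; i++)
        {
            parts.Add("U+" + ((int)s[i]).ToString("X4"));
        }
        return string.Join(" ", parts);
    }
}
```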

Jon
