Strange problem with SharpZipLib, XML and UTF-8 encoding

M

macap.usenet

Hello,

I am having a very strange problem with SharpZipLib and XML files.

In my application I am uploading a zip file which contains an XML file
via HttpPostedFile:

private void getProject(HttpPostedFile file)
{
    String projectName = "";
    double projectVersion = 0.0;

    ZipFile zFile = new ZipFile(file.InputStream);

    ZipEntry zEntry = zFile.GetEntry("version.xml");
    if (zEntry != null)
    {
        using (Stream zis = zFile.GetInputStream(zEntry))
        {
            byte[] data = new byte[2048];
            while (true)
            {
                int size = zis.Read(data, 0, data.Length);
                if (size > 0)
                {
                    XmlDocument doc = new XmlDocument();

                    String s = new UTF8Encoding().GetString(data, 0, size); // Problem!!!
                    doc.LoadXml(s);

                    XmlNodeList nl = doc.SelectNodes("project");

                    //...

The problem is the line with "String s = ....".
In the debugger view, this string contains the following XML:

< ? xml version="1.0">
^^^^
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>

As you can see, there is a blank before the question mark!
This seems to be the reason why the next line (doc.LoadXml(s)) throws
an XmlException ("Ungültige Daten auf Stammebene, Zeile 1, Position 1";
in English: "Data at the root level is invalid. Line 1, position 1.").


But the strangest part of this problem is this:
when I copy the XML string out of the debugger and paste it into
my news client, everything seems to be fine:


<?xml version="1.0">
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>


How could this happen???

I also tried String s = new ASCIIEncoding().GetString(data, 0, size);
But the result of this is:

???<?xml version="1.0">
<project>
<name>MyProjekt</name>
<version>1.0</version>
</project>

As you can see, the result of ASCIIEncoding is three question marks
(???) at the beginning of my document, which also leads to an invalid
XML file.


I am using Visual Studio 2005 Professional. Is this maybe a bug or...?
It's very confusing and I do not have any other idea where to start
debugging.



Regards,

Martin
 
M

macap.usenet

By the way...
I also tried:
String s = System.Text.UTF8Encoding.UTF8.GetString(data);

Which leads to the same strange behaviour.
 
J

Jon Skeet [C# MVP]

> By the way...
> I also tried:
> String s = System.Text.UTF8Encoding.UTF8.GetString(data);
>
> Which leads to the same strange behaviour.

You could either detect and then discard the Unicode BOM, or you could
just pass the stream into XmlDocument.Load and let it deal with it.
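
As a rough sketch of both suggestions (not code from this thread): the class name `BomDemo`, the helper names `LoadFromStream` and `StripBom`, and the in-memory sample bytes are assumptions made for illustration. A `MemoryStream` stands in for the stream returned by `zFile.GetInputStream(zEntry)`, and the sample uses a well-formed declaration ending in `?>`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Option 1: hand the stream to XmlDocument.Load, which detects the
    // encoding itself and copes with a leading byte order mark.
    public static XmlDocument LoadFromStream(Stream input)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(input);
        return doc;
    }

    // Option 2: decode the bytes yourself, then discard a leading U+FEFF
    // (the decoded byte order mark) before calling LoadXml.
    public static string StripBom(string text)
    {
        if (text.Length > 0 && text[0] == '\uFEFF')
        {
            return text.Substring(1);
        }
        return text;
    }

    public static void Main()
    {
        // Hypothetical file content: UTF-8 BOM (EF BB BF) followed by the XML.
        byte[] bom = { 0xEF, 0xBB, 0xBF };
        byte[] body = Encoding.UTF8.GetBytes(
            "<?xml version=\"1.0\"?><project><name>MyProjekt</name><version>1.0</version></project>");
        byte[] data = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, data, 0, bom.Length);
        Buffer.BlockCopy(body, 0, data, bom.Length, body.Length);

        // Option 1: load straight from the stream.
        XmlDocument fromStream = LoadFromStream(new MemoryStream(data));
        Console.WriteLine(fromStream.SelectSingleNode("/project/name").InnerText);

        // Option 2: decode, strip the BOM character, then LoadXml.
        string s = StripBom(new UTF8Encoding().GetString(data, 0, data.Length));
        XmlDocument fromString = new XmlDocument();
        fromString.LoadXml(s);
        Console.WriteLine(fromString.SelectSingleNode("/project/version").InnerText);
    }
}
```

Loading from the stream also avoids decoding the file in fixed 2048-byte chunks, so it keeps working when the XML is larger than the buffer.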

Jon
 
M

macap.usenet

Hi Jon,

> just pass the stream into XmlDocument.Load and let it deal with it.

Thank you. This works fine. Sometimes the solution is so near and
clear ;-)

But all in all, I really do not understand why the debugger showed a
"< ?xml ..."
while copying and pasting the text out of the debugger gives me the correct
"<?xml ...".


Regards,
Martin
 
J

Jon Skeet [C# MVP]

> Thank you. This works fine. Sometimes the solution is so near and
> clear ;-)

Goodo.

> But all in all, I really do not understand why the debugger showed a
> "< ?xml ..."
> while copying and pasting the text out of the debugger gives me the correct
> "<?xml ...".

I suspect it's that the byte order mark wasn't being copied to the
clipboard - and I'd expect it to be *before* the < by the way, not
after it.
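
To make the byte order mark visible, here is a small sketch, assuming the file starts with the three UTF-8 BOM bytes EF BB BF (the class name `BomBytes` and the sample bytes are made up for illustration). It decodes the same bytes with both encodings tried earlier in the thread.

```csharp
using System;
using System.Text;

public static class BomBytes
{
    public static void Main()
    {
        // Assumed start of the file: UTF-8 BOM followed by "<?".
        byte[] data = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'?' };

        // UTF-8 decoding turns the three BOM bytes into one invisible
        // character, U+FEFF, which sits in front of the '<'.
        string utf8 = new UTF8Encoding().GetString(data);
        Console.WriteLine(utf8.Length);               // 3
        Console.WriteLine((int)utf8[0] == 0xFEFF);    // True

        // ASCII cannot represent bytes above 0x7F, so each BOM byte
        // becomes a '?' replacement character.
        string ascii = new ASCIIEncoding().GetString(data);
        Console.WriteLine(ascii);                     // ???<?
    }
}
```

This would account for the three question marks seen with ASCIIEncoding, and for LoadXml rejecting the UTF-8 string at line 1, position 1.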

Jon
 
