ReadXML error

tshad

I have the following code:

string myXMLfile = strFile;
DataSet ds1 = new DataSet();

// Create new FileStream with which to read the schema.
System.IO.FileStream fsReadXml =
    new System.IO.FileStream(myXMLfile, System.IO.FileMode.Open);
try
{
    ds1.ReadXml(fsReadXml);
}
....

I am getting an error reading the file:

"Invalid character in the given encoding. Line 3, position 1."

The start of the XML file is:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">

Looks fine to me.

What would cause this error?

Excel reads it perfectly.

Thanks,

Tom
 
Family Tree Mike

I don't think I changed anything significant...
With the following xml file and command line app, I don't get the error.

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
</DocumentProperties>
</Workbook>


static void Main(string[] args)
{
    string myXMLfile = "XMLFile1.xml";
    DataSet ds1 = new DataSet();

    // Create new FileStream with which to read the schema.
    System.IO.FileStream fsReadXml =
        new System.IO.FileStream(myXMLfile, System.IO.FileMode.Open);
    try
    {
        ds1.ReadXml(fsReadXml);
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message + "\n" + ex.StackTrace);
    }
    Console.WriteLine("Done...");
    Console.ReadKey();
}
 
tshad

I finally found out what was causing the file to give me the error.

The file has 0xA0 bytes where it would normally have spaces.

Not sure why, but this is what is causing the problem.

Excel has no problem reading it, but ReadXml doesn't like it.

Why is that, and how do I get around this?

These files are generated by a program that gets downloaded and that we have
to handle.

Thanks,

Tom


 
Family Tree Mike

Can you provide a small XML file that exhibits the problem? It's not
obvious where the 'A0' is.
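
One way to locate such bytes without a hex editor is to scan the raw file. The following is only a minimal sketch: the class and method names (`ByteScan`, `FindByte`) are hypothetical and not part of any library. Note that in a file that really is UTF-8, 0xA0 can legitimately appear as the second byte of a multi-byte sequence (for example C2 A0), so a hit is a hint rather than proof of a problem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ByteScan
{
    // Returns the zero-based offsets of every byte equal to target.
    // (Hypothetical helper, written for illustration only.)
    public static List<int> FindByte(byte[] data, byte target)
    {
        var offsets = new List<int>();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == target)
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }
}
```

It could be called as `ByteScan.FindByte(File.ReadAllBytes(path), 0xA0)`, where `path` is whatever file is being checked.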
 
tshad

Family Tree Mike said:
Can you provide a small xml file that exhibits the problem? Its not
obvious where the 'A0' is.

You can't see it unless you look at the binary. 0xA0 is 0x20 with the high
bit set, and it displays as a blank.

I found that I could just replace it with a blank and it seemed to work
fine. Not sure why it is there, but it was in a bunch of places in the file.
The XML file was apparently an Excel 2007 file, based on what was in it.
I just don't know where the A0s came from.

You also have to be sure you read the file with the correct encoding, or you
will end up with extra bytes in the data. Apparently, the default encoding
replaces illegal characters with other characters, so the A0 won't be
there.

Tom
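
The workaround described above (decode with an explicit encoding, then swap the 0xA0 characters for ordinary spaces) might look roughly like the sketch below. It assumes the file is really ISO-8859-1, which the thread does not confirm, and the names `XmlLoader` and `LoadLatin1` are made up for illustration. The likely reason the original call failed is that an XML file with no encoding declaration and no byte order mark is read as UTF-8, where a lone 0xA0 byte is not a valid sequence.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

static class XmlLoader
{
    // Decodes the raw bytes as ISO-8859-1 (an assumption about the source
    // encoding), replaces U+00A0 with an ordinary space, and loads the
    // resulting text into a DataSet via the TextReader overload of ReadXml.
    public static DataSet LoadLatin1(byte[] raw)
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string text = latin1.GetString(raw).Replace('\u00A0', ' ');

        var ds = new DataSet();
        using (var reader = new StringReader(text))
        {
            ds.ReadXml(reader);
        }
        return ds;
    }
}
```

Usage would be something like `DataSet ds1 = XmlLoader.LoadLatin1(File.ReadAllBytes(myXMLfile));`. Replacing the characters changes the data, so this only makes sense if the 0xA0 bytes are known to stand in for plain spaces.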
 
Family Tree Mike

You are right, the encoding should be set when reading. It can, however,
also be declared in the file itself, as described here:
http://www.w3schools.com/XML/xml_encoding.asp.

Perhaps the provider can be persuaded to include the encoding in the header.
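
To illustrate the idea of declaring the encoding in the header, here is a small sketch that builds the same document with and without an `encoding` attribute and loads it from a stream. The choice of ISO-8859-1 is an assumption, and `DeclaredEncodingDemo`, `BuildSample` and `Load` are hypothetical names. With the declaration present the 0xA0 byte should decode as a non-breaking space; without it, the parser falls back to UTF-8 and would be expected to fail in the way reported at the start of the thread.

```csharp
using System;
using System.Data;
using System.IO;
using System.Text;

static class DeclaredEncodingDemo
{
    // Builds a tiny ISO-8859-1 document whose text contains one 0xA0 byte,
    // optionally declaring that encoding in the XML header.
    public static byte[] BuildSample(bool declareEncoding)
    {
        string decl = declareEncoding
            ? "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
            : "<?xml version=\"1.0\"?>";
        string xml = decl + "<Root><Item><Name>a\u00A0b</Name></Item></Root>";
        return Encoding.GetEncoding("iso-8859-1").GetBytes(xml);
    }

    // Loads the bytes through a stream, so the parser has to work out the
    // encoding from the document itself.
    public static DataSet Load(byte[] raw)
    {
        var ds = new DataSet();
        using (var stream = new MemoryStream(raw))
        {
            ds.ReadXml(stream);
        }
        return ds;
    }
}
```

Calling `DeclaredEncodingDemo.Load(DeclaredEncodingDemo.BuildSample(true))` exercises the declared case; passing `false` reproduces the undeclared one.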
 
