Validating an XML file

Jack White

Hi there,

I've created a strongly-typed "DataSet" using VS. If I save the data via
"DataSet.WriteXml()" and later prompt my users for the name of the file in
order to read it back in again (using "DataSet.ReadXml()"), how do I
validate that the file they enter is valid? That is, while
"DataSet.ReadXml()" throws a "System.Xml.XmlException" if they enter a
non-XML file, a well-formed XML file results in no exception even though its
schema may be completely wrong (if they pick some random XML file on
the system, that is). Is there a clean way of detecting this situation?
Thank you.
 
ForrestPhoto

Jack White said:
how do I validate that the file they enter is valid? That is, while
"DataSet.ReadXml()" throws a "System.Xml.XmlException" if they enter a
non-XML file, a well-formed XML file results in no exception even though its
schema may be completely wrong (if they pick some random XML file on
the system, that is). Is there a clean way of detecting this situation?

Well, you could read in the xml data inside a try block ( which would
catch the XmlException, and others ) plus check to make sure the tables
you expect are there after you ReadXml().
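
A minimal sketch of that idea in C#, under a few assumptions: "DataSetFileCheck" and "LooksLikeExpectedData" are hypothetical names, and the file is loaded into a plain untyped DataSet so that its tables are inferred from the file itself (a wizard-generated typed DataSet creates its tables in the constructor, so they would always be present). ReadXml() may throw other exception types in practice, so the catch list here is illustrative only.

```csharp
using System;
using System.Data;
using System.IO;
using System.Xml;

static class DataSetFileCheck
{
    // Hypothetical helper: returns true when the file is well-formed XML
    // and loading it produces every table named in expectedTables.
    public static bool LooksLikeExpectedData(string path, params string[] expectedTables)
    {
        DataSet ds = new DataSet();
        try
        {
            // No schema is supplied, so the DataSet infers one from the file.
            ds.ReadXml(path);
        }
        catch (XmlException)
        {
            return false; // not well-formed XML
        }
        catch (IOException)
        {
            return false; // missing or unreadable file
        }

        foreach (string tableName in expectedTables)
        {
            if (!ds.Tables.Contains(tableName))
            {
                return false; // some other XML file, most likely
            }
        }
        return true;
    }
}
```

This only tells you the expected table names appeared; it does not check columns or data types, which is what schema validation is for.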
 
Bob Jones

You might want to look into XML DTDs (document type definitions). Here's
what www.w3schools.com has to say about them: "Your application can use a
standard DTD to verify that the data you receive from the outside world
is valid.

You can also use a DTD to verify your own data."
 
Guest

try
{
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xmlString); // or xmlDoc.Load(xmlFilePathHere);
    // Well-formed XML (this does not check it against a schema)
}
catch (XmlException)
{
    // Not well-formed XML
}

I hope this will help.
Best Regards,
Rizwan aka RizwanSharp
 
Glenn

Hi

Check out the WriteXmlSchema, ReadXmlSchema and WriteXml and ReadXml
overloads of the DataSet type. I haven't actually used these before, and I
can't check that they will do exactly what you want (I'm at a PC with no
.NET), but it's worth a look.

Glenn
 
Arne Vajhøj

Jack said:
I've created a strongly-typed "DataSet" using VS. If I save the data via
"DataSet.WriteXml()" and later prompt my users for the name of the file in
order to read it back in again (using "DataSet.ReadXml()"), how do I
validate that the file they enter is valid? That is, while
"DataSet.ReadXml()" throws a "System.Xml.XmlException" if they enter a
non-XML file, a well-formed XML file results in no exception even though its
schema may be completely wrong (if they pick some random XML file on
the system, that is). Is there a clean way of detecting this situation?

Try something like:

DataSet ds = new DataSet("TestDS");
XmlReaderSettings xrs = new XmlReaderSettings();
xrs.ValidationType = ValidationType.Schema;
// The handler passed to XmlSchema.Read reports problems in the schema file itself.
xrs.Schemas.Add(XmlSchema.Read(new StreamReader(@"C:\ds.xsd"), ValidationEventHandler));
XmlReader xr = XmlReader.Create(@"C:\ds.xml", xrs);
ds.ReadXml(xr);

Arne
 
Jack White

Arne Vajhøj said:
Try something like:

DataSet ds = new DataSet("TestDS");
XmlReaderSettings xrs = new XmlReaderSettings();
xrs.ValidationType = ValidationType.Schema;
xrs.Schemas.Add(XmlSchema.Read(new StreamReader(@"C:\ds.xsd"),
ValidationEventHandler));
XmlReader xr = XmlTextReader.Create(@"C:\ds.xml", xrs);
ds.ReadXml(xr);

Thanks very much. That does appear to be the correct solution but it still
doesn't trap an invalid ".xml" file for some reason. I've tried different
variations including similar solutions on the web (which all basically boil
down to your own) but it never invokes the handler. Note that my experience
with XML is very limited so I'm not sure what's wrong (I'm a very
experienced developer however). I'll have to research it further but I'm
basically doing this:

1) Create a strongly-typed (wizard-generated) dataset using VS
(wizard-generated constructor creates all tables, constraints, etc.)
2) Populate it with data and write to file using "DataSet.WriteXml("ds.xml",
XmlWriteMode.IgnoreSchema)". Note that my ".xml" file actually uses another
extension but I assume that's not an issue.
3) Read it back in using your code above, passing the wizard-generated
".xsd" file from step 1. Note however that "ds" from your example will
actually be the wizard-generated "DataSet" derivative whose constructor
creates all tables, etc. I'm assuming this makes no difference (?).

If I now pass in an arbitrary (invalid) ".xml" file in step 3, the handler
is never called. Any idea what's wrong? Thanks again.
 
Arne Vajhøj

Jack said:
Thanks very much. That does appear to be the correct solution but it still
doesn't trap an invalid ".xml" file for some reason.

If I add either non-well-formed XML to the file or
well-formed XML that does not comply with the schema, then
I get an exception.

Arne
 
Jack White

Arne Vajhøj said:
If I add either non-well-formed XML to the file or
well-formed XML that does not comply with the schema, then
I get an exception.

Strange. The ".xml" file I'm passing doesn't conform with the ".xsd" file.
The handler isn't called however nor is any exception thrown. After the call
to "ReadXml()", everything created in the constructor remains intact as if
nothing happened. After several hours mucking with this I'm at a loss to
explain it. Anyway, I'll just have to keep probing. Thanks again though
(appreciated).
 
Glenn

Using ReadXmlSchema and ReadXml should throw an exception if the XML doesn't
match the schema, although it'll probably be something vague like a
constraint failure.
 
Jack White

Glenn said:
Using ReadXmlSchema and ReadXml should throw an exception if the XML
doesn't match the schema, although it'll probably be something vague
like a constraint failure.

Thanks for your feedback. I was going to respond to your first post in fact
but was working on resolving the issue which I just did moments ago (with
the help of an XML MVP elsewhere though I'm still testing things). I had to
turn on the "XmlSchemaValidationFlags.ReportValidationWarnings" in
"XmlReaderSettings" and then Arne's example works (well, I changed it
slightly). Note that not even MSFT's examples touch this flag however so I
don't understand this (it makes me a little nervous in fact). Problems are
also reported as warnings and not errors which I find counter-intuitive. In
any case, perhaps one of the "ReadXml()" overloads you suggested does throw
an exception but not the version I've been using all along (it doesn't). If
one of them does however then it's not documented and therefore unreliable.
I think Arne's way is really the "official" way in 2.0 anyway (there was
another technique in 1.X that's now obsolete) so I'm probably safer relying
on it. I don't like having to send my ".xsd" file out just for validation
however (it's an internal detail) but I'm hoping it can be avoided somehow.
I'm still looking into it but I'm fairly new to XML and so it's a learning
process. Any advice you can offer (on having to ship my ".xsd" file) would
be welcome however. Thanks again.
 
Glenn

Responses inline...

Jack White said:
Thanks for your feedback. I was going to respond to your first post in
fact but was working on resolving the issue which I just did moments ago
(with the help of an XML MVP elsewhere though I'm still testing things). I
had to turn on the "XmlSchemaValidationFlags.ReportValidationWarnings" in
"XmlReaderSettings" and then Arne's example works (well, I changed it
slightly). Note that not even MSFT's examples touch this flag however so I
don't understand this (it makes me a little nervous in fact).

Interesting, I can't remember ever having to do the
"XmlSchemaValidationFlags.ReportValidationWarnings" thing. The method I
use, which is almost definitely slower given that it doesn't use a reader
(or, if it does, it's internal), is XmlDocument.Validate().

Jack White said:
Problems are also reported as warnings and not errors which I find
counter-intuitive.

Whenever a problem gets reported you can check the
ValidationEventArgs.Severity property to determine what action to take, if
any.

Jack White said:
In any case, perhaps one of the "ReadXml()" overloads you suggested does throw
an exception but not the version I've been using all along (it doesn't).
If one of them does however then it's not documented and therefore
unreliable.

Was the schema inlined with the XML? If not, it'll infer the schema from
the XML and won't throw an exception.

http://msdn2.microsoft.com/en-us/library/360dye2a.aspx

Jack White said:
I think Arne's way is really the "official" way in 2.0 anyway (there was
another technique in 1.X that's now obsolete) so I'm probably safer
relying on it.

That involved using XmlValidatingReader, which is indeed obsolete.

Jack White said:
I don't like having to send my ".xsd" file out just for validation however
(it's an internal detail) but I'm hoping it can be avoided somehow. I'm
still looking into it but I'm fairly new to XML and so it's a learning
process. Any advice you can offer (on having to ship my ".xsd" file) would
be welcome however. Thanks again.

I've distributed schemas with application code before now as a plain .xsd
file, although this was to a well constrained user population. If you're
worried about people tampering with it, you could store it in a .resx file
in your application.

Glenn
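
A small sketch that may help explain the "warnings, not errors" behaviour discussed above. As far as I can tell, when the root element of the file is not declared in the schema at all (the "random XML file" case), the validator has nothing to validate against and raises only a warning, and warnings reach the handler only when ReportValidationWarnings is set. When the element is declared but its content is wrong, the handler sees an error. The names "SeverityDemo" and "Collect" and the one-element schema are made up for illustration; they are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SeverityDemo
{
    // Hypothetical schema: a single root element <root> holding an integer.
    public const string Xsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='root' type='xs:int'/>" +
        "</xs:schema>";

    // Validates the given XML text and returns the severity of each
    // validation event the handler received.
    public static List<XmlSeverityType> Collect(string xml)
    {
        List<XmlSeverityType> found = new List<XmlSeverityType>();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            found.Add(e.Severity);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Walk the whole document so every node gets validated.
            while (reader.Read()) { }
        }
        return found;
    }
}
```

Calling Collect("<root>abc</root>") should report an error (the value is not an integer), while Collect("<other/>") should report only a warning, because "other" has no declaration in the schema. If that matches what you see, treating any warning as a rejection is a reasonable policy for files your own application wrote.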
 
Jack White

Glenn said:
Interesting, I can't remember ever having to do the
"XmlSchemaValidationFlags.ReportValidationWarnings" thing. The method I
use, which is almost definitely slower given it doesn't use a reader, or
if it does it's internal, is XmlDocument.Validate().

Since none of the examples I've seen turn this flag on, it leads me to
believe I have a problem somewhere. I shouldn't have to turn it on, IOW, if
nobody else has to.

Glenn said:
Whenever a problem gets reported you can check
ValidationEventArgs.Severity property to determine what action to take, if
any.

It always reports it as a warning. I originally thought it would be reported
as an error but apparently not. The problem is that you can't distinguish
between "acceptable" warnings generated while reading a conforming ".xml"
file (warnings I can safely ignore), and those that really need to be
treated as errors (normally because you're dealing with a non-conforming
".xml" file). My testing shows that a conforming ".xml" file generates no
warnings however so I'll have to assume that any warning is really an error
and treat it that way. That may not be true however so I may actually reject
a conforming ".xml" file which will be a problem. I can't seem to resolve
the issue any other way however.

Glenn said:
Was the schema inlined with the XML? If not, it'll infer the schema from
the XML and won't throw an exception.

http://msdn2.microsoft.com/en-us/library/360dye2a.aspx

Even when I pass "XmlWriteMode.WriteSchema" to "WriteXml()" and later read
it back in, no errors are generated.

Glenn said:
I've distributed schemas with application code before now as a plain .xsd
file, although this was to a well constrained user population. If you're
worried about people tampering with it, you could store it in a .resx file
in your application.

Tampering isn't an issue in my case but it's really an implementation detail
so I wanted to avoid having to install it merely for this purpose. I'm not
sure what the accepted protocol is, however. To validate an ".xml" file, do
you normally install its ".xsd" file for this purpose, assuming you don't
need it for anything else? In any case, my overall experience with this
situation has been very frustrating. I just want to validate an ".xml" file
but apparently I have to become an XML expert to do it. Thanks again for
your help though.
 
Glenn

Jack White said:
Since none of the examples I've seen turn this flag on it leads me to
believe I have a problem somewhere. I shouldn't have to turn it on IOW if
nobody else has to.


It always reports it as a warning. I originally thought it would be
reported as an error but apparently not. The problem is that you can't
distinguish between "acceptable" warnings generated while reading a
conforming ".xml" file (warnings I can safely ignore), and those that
really need to be treated as errors (normally because you're dealing with
a non-conforming ".xml" file). My testing shows that a conforming ".xml"
file generates no warnings however so I'll have to assume that any warning
is really an error and treat it that way. That may not be true however so
I may actually reject a conforming ".xml" file which will be a problem. I
can't seem to resolve the issue any other way however.


Even when I pass "XmlWriteMode.WriteSchema" to "WriteXml()" and later read
it back in, no errors are generated.


Tampering isn't an issue in my case but it's really an implementation
detail so I wanted to avoid having to install it merely for this purpose.
I'm not sure what the accepted protocol is, however. To validate an ".xml"
file, do you normally install its ".xsd" file for this purpose, assuming
you don't need it for anything else? In any case, my overall experience
with this situation has been very frustrating. I just want to validate an
".xml" file but apparently I have to become an XML expert to do it. Thanks
again for your help though.

Don't worry about distributing an XSD with your application, it's just
another file. And it's there for the purpose of making your application
inputs more robust.

Anyway, back to your validation problem...

Before calling myDataSet.ReadXml( reader ), you need to call
while ( reader.Read() ); Doing so will cause the reader to walk over the
document and pick up any problems.

I promise you, this version works, honest ;-)

using System;
using System.Data;
using System.Xml;
using System.Xml.Schema;

// MyDataSet is the wizard-generated typed DataSet (not shown here).
class Program
{
    static void Main()
    {
        try
        {
            //WriteData();

            //Mangle the data by hand...

            ReadData();
        }
        catch ( Exception exception )
        {
            Console.WriteLine( exception.Message );
        }
        finally
        {
            Console.WriteLine( "\r\nFinished" );
            Console.ReadLine();
        }
    }

    // Writes a sample file without an inline schema.
    static void WriteData()
    {
        MyDataSet testDataSet = new MyDataSet();

        testDataSet.MyTable.AddMyTableRow( "Abraham Lincoln" );

        testDataSet.WriteXml( "MyFile.xml", XmlWriteMode.IgnoreSchema );
    }

    /// <summary>
    /// Validates MyFile.xml against MyDataSet.xsd, then loads it.
    /// </summary>
    static void ReadData()
    {
        MyDataSet myDataSet = new MyDataSet();

        XmlReaderSettings settings = new XmlReaderSettings();

        settings.Schemas.Add( null, "..\\..\\MyDataSet.xsd" );
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += new ValidationEventHandler(
            delegate( object sender, ValidationEventArgs args )
            {
                Console.WriteLine( "Severity : {0}\r\nMessage :{1}",
                    args.Severity.ToString(), args.Message );
            } );

        //this line is important!!!!
        //Walk the whole document so the handler sees every problem.
        using ( XmlReader reader = XmlReader.Create( "MyFile.xml", settings ) )
        {
            while ( reader.Read() ) ;
        }

        //The validating reader is now at end of file, so load from a fresh one.
        using ( XmlReader dataReader = XmlReader.Create( "MyFile.xml" ) )
        {
            myDataSet.ReadXml( dataReader );
        }
    }
}
 
Glenn

Firstly, add a while ( reader.Read() ); after you create the XmlReader. That
will cause the reader to walk over the document.

And, if you haven't already done so, add some way of capturing and displaying
the results of the validation from the ValidationEventArgs of the
ValidationEventHandler.

I did a test, for instance changing the data type of the Key element's
InnerText, which resulted in an XmlSeverityType.Error and produced a message
indicating the exact problem.

HTH

Glenn
 
Jack White

I appreciate your on-going assistance ...

Glenn said:
Don't worry about distributing an XSD with your application, it's just
another file. And it's there for the purpose of making your application
inputs more robust.

Ok (thanks)

Glenn said:
Anyway, back to your validation problem...

Before calling myDataSet.ReadXml( reader ), you need to call
while ( reader.Read() ); Doing so will cause the reader to walk over the
document and pick up any problems.

Ok, I'll look into it but why is "reader.Read()" required? I assume (maybe
naively) that any action on the reader will be validated. And in fact, I've
conducted a number of tests now (not exhaustive but so far so good) and the
handler seems to be trapping everything. That is, the call to
"myDataSet.ReadXml(reader)" attempts to read the entire ".xml" file into
"myDataSet" and so validation occurs no differently than if you read it one
node at a time by calling "reader.Read()" first. Doing that would double the
processing time in fact so it doesn't seem to make any sense (on the surface
anyway). Are you sure it's really required? In any case, I do have two
related questions that maybe you know something about:

1) I noticed that if I add a new field, column, etc. to my "DataSet", VS
updates the ".xsd" file of course but I can still read previous ".xml" files
error/warning-free (i.e., older ".xml" files that don't have these new
elements). I'll experiment with your "reader.Read()" scenario to see what it
does (I'm guessing there won't be any difference) but since the schema of
the older ".xml" file isn't an exact match for the updated ".xsd" file, I'm
not sure why it passes validation (probably because it's still a valid
subset, I'm guessing).

2) The first arg to "settings.Schemas.Add" (i.e., the targetNamespace) is
null. I'm not too familiar with this argument yet (working on it) but is
this safe? Passing null apparently causes the function to pull
"targetNamespace" from the ".xsd" file itself, which I assume is typically
the correct way to go (since I'd presumably pass the same value anyway if I
were to explicitly pass the first arg).

Thanks again.
 
glenn

Jack White said:
I appreciate your on-going assistance ...


Ok (thanks)


Ok, I'll look into it but why is "reader.Read()" required? I assume (maybe
naively) that any action on the reader will be validated. And in fact, I've
conducted a number of tests now (not exhaustive but so far so good) and the
handler seems to be trapping everything. That is, the call to
"myDataSet.ReadXml(reader)" attempts to read the entire ".xml" file into
"myDataSet" and so validation occurs no differently than if you read it one
node at a time by calling "reader.Read()" first. Doing that would double the
processing time in fact so it doesn't seem to make any sense (on the surface
anyway). Are you sure it's really required? In any case, I do have two
related questions that maybe you know something about:

Thinking about it, you're right, why call it twice? The reader.Read()
should get called by the ReadXml(). The strange thing is, when I didn't
call reader.Read() explicitly I didn't get the validation messages. Could
ReadXml() suppress validation warnings? Could it only care that the XML can
create a "valid" DataSet and to hell with everything else?

Sorry, don't know the answer to this one. Where's an MVP when you need one?

Jack White said:
1) I noticed that if I add a new field, column, etc. to my "DataSet", VS
updates the ".xsd" file of course but I can still read previous ".xml" files
error/warning-free (i.e., older ".xml" files that don't have these new
elements). I'll experiment with your "reader.Read()" scenario to see what it
does (I'm guessing there won't be any difference) but since the schema of
the older ".xml" file isn't an exact match for the updated ".xsd" file, I'm
not sure why it passes validation (probably because it's still a valid
subset, I'm guessing).

If you're adding columns, the behaviour is absolutely correct; it's forward
compatibility, or maybe backward compatibility. One of the two.

Jack White said:
2) The first arg to "settings.Schemas.Add" (i.e., the targetNamespace) is
null. I'm not too familiar with this argument yet (working on it) but is
this safe? Passing null apparently causes the function to pull
"targetNamespace" from the ".xsd" file itself, which I assume is typically
the correct way to go (since I'd presumably pass the same value anyway if I
were to explicitly pass the first arg).

Thanks again.

In this instance it's quite safe.

Glenn
 
Jack White

Ok, thanks for all your help (appreciated). Everything seems to be working now
but I'm sure I'll be revisiting it once I gain more experience.
 
Thomas T. Veldhouse

In microsoft.public.dotnet.languages.csharp Arne Vajhøj said:
Try something like:

DataSet ds = new DataSet("TestDS");
XmlReaderSettings xrs = new XmlReaderSettings();
xrs.ValidationType = ValidationType.Schema;
xrs.Schemas.Add(XmlSchema.Read(new
StreamReader(@"C:\ds.xsd"), ValidationEventHandler));
XmlReader xr = XmlTextReader.Create(@"C:\ds.xml", xrs);
ds.ReadXml(xr);

Also, I find that it is worthwhile to embed the XSD as a resource in the
assembly. It can then be retrieved and used without worrying about filesystem
access or other issues [like an assembly being GAC'd or building a custom
directory structure].
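
A rough sketch of how that could look, assuming the ".xsd" file has its Build Action set to Embedded Resource. The class name "EmbeddedSchema", the method "CreateSettings" and the resource name used in the usage note are hypothetical; the actual resource name depends on your project's default namespace and folder layout.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml;
using System.Xml.Schema;

static class EmbeddedSchema
{
    // Hypothetical helper: builds validating reader settings from an XSD
    // stored as an embedded resource in the given assembly.
    public static XmlReaderSettings CreateSettings(Assembly assembly, string resourceName)
    {
        Stream stream = assembly.GetManifestResourceStream(resourceName);
        if (stream == null)
        {
            // GetManifestResourceStream returns null when the name is wrong.
            throw new FileNotFoundException("Embedded schema not found: " + resourceName);
        }

        using (stream)
        {
            // A null handler means schema errors are thrown as exceptions.
            XmlSchema schema = XmlSchema.Read(stream, null);

            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ValidationType = ValidationType.Schema;
            settings.Schemas.Add(schema);
            return settings;
        }
    }
}
```

Usage would be something like CreateSettings(typeof(Program).Assembly, "MyApp.MyDataSet.xsd"), followed by XmlReader.Create(path, settings) as in the earlier posts. Assembly.GetManifestResourceNames() lists the names actually present if the lookup fails.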
 
Guest

I wish I could say it worked for me.

This still isn't working for me.

Here's my XML file ("details.xml"):
<?xml version="1.0" encoding="UTF-8" ?>
<details/>

Here's my XSD file ("Test.xsd")
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="Test" targetNamespace="http://tempuri.org/Test.xsd"
elementFormDefault="qualified" xmlns="http://tempuri.org/Test.xsd"
xmlns:mstns="http://tempuri.org/Test.xsd"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="details" type="xs:string" />
</xs:schema>

Here's my code:
System.Xml.XmlReaderSettings settings = new System.Xml.XmlReaderSettings();
settings.Schemas.Add(null, "Test.xsd");
settings.Schemas.Compile();
settings.ValidationType = System.Xml.ValidationType.Schema;
settings.ValidationFlags |=
    System.Xml.Schema.XmlSchemaValidationFlags.ReportValidationWarnings;

settings.ValidationEventHandler += new System.Xml.Schema.ValidationEventHandler(
    delegate(object s, System.Xml.Schema.ValidationEventArgs args)
    {
        Console.WriteLine("Severity : {0}\r\nMessage :{1}",
            args.Severity.ToString(), args.Message);
    });

System.Xml.XmlReader reader = System.Xml.XmlReader.Create("details.xml", settings);

while (reader.Read()) ;

The XML is compliant, but it's still reporting an error:
Could not find schema information for the element 'details'. I've been
fighting with this for hours. This is ridiculous. Any ideas?
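
One possible explanation, offered as a guess rather than a confirmed diagnosis: the schema declares "details" in the namespace "http://tempuri.org/Test.xsd" (its targetNamespace), while the "<details/>" element in the XML file is in no namespace, so the validator may simply find no declaration for it. The sketch below reuses the schema from this post as an in-memory string and compares the two spellings of the document; the class and method names ("NamespaceCheck", "CountProblems") are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class NamespaceCheck
{
    // The schema from the post, inlined so the sketch is self-contained.
    const string Xsd =
        "<xs:schema id='Test' targetNamespace='http://tempuri.org/Test.xsd' " +
        "elementFormDefault='qualified' xmlns='http://tempuri.org/Test.xsd' " +
        "xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='details' type='xs:string' />" +
        "</xs:schema>";

    // Returns how many validation events (warnings or errors) were raised.
    public static int CountProblems(string xml)
    {
        int count = 0;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            count++;
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return count;
    }
}
```

If the guess is right, CountProblems("<details/>") reports a problem while CountProblems("<details xmlns='http://tempuri.org/Test.xsd'/>") reports none. The alternative fix would be to remove the targetNamespace from the schema so that it describes elements in no namespace.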
 
