XML - compare

Ve

Hi,
I have two XmlDocument objects. They actually hold the same XML, but one was
loaded from the internet and the other from a database.

I would like to compare these two documents and check whether they are in
fact the same.

However, the documents have some differences, for example:
- empty tags can be written in two forms, <tag></tag> or <tag />,
- the documents may have been created from different strings, where line
endings are written as "\n" or "\r\n",
- tags can contain HTML, so the characters '<' and '>' may be written as
'&lt;' and '&gt;',
- or the characters '<' and '>' may be inside a <![CDATA[ something ]]> block.

I know that this is in fact the same XML, with the same content.
I have had this problem for a long time and have not been able to resolve it.
Thanks for your help.
 
Sergiu DUDNIC

Hello,

In your case, because you must compare the same things, I see 2 ways:
1) Bring both documents to the same format and then compare them. In
particular, you can
a) remove all the "redundant formatting" (like \n, &lt;, " or ') from both
documents and compare the contents only, or
b) bring the "redundant formatting" of both documents to the same "standard", or
c) bring one of the documents to the format of the other and compare them.

You can also try the XML Diff & Patch GUI Tool.

/serhio
 
Ve

Sergiu DUDNIC wrote:
Hello,

In your case, because you must compare the same things, I see 2 ways:
1) Bring both documents to the same format and then compare them. In
particular, you can
a) remove all the "redundant formatting" (like \n, &lt;, " or ') from
both documents and compare the contents only, or
b) bring the "redundant formatting" of both documents to the same "standard", or
c) bring one of the documents to the format of the other and compare them.

You can also try the XML Diff & Patch GUI Tool.
[...]

But how can I make XmlDocument return the same standard form for
<tag></tag> and <tag />?

I can remove all "&lt;" etc., but what should I do in the case of
<![CDATA[ something ]]>?
 
Martin Honnen

Ve said:
But how can I make XmlDocument return the same standard form for
<tag></tag> and <tag />?

I don't see the problem. Whether the markup in the XML document is
<foo></foo> or <foo/> or <foo />, in the DOM object model or in the
XPath data model you will have an element node with the name foo and no
child nodes, so you can simply compare the node type, the node name and,
if needed, the child nodes.

Ve said:
I can remove all "&lt;" etc., but what should I do in the case of
<![CDATA[ something ]]>?

Well, you could use the XPath data model, which only knows text nodes and
does not distinguish between text nodes and CDATA section nodes. Or your
comparison code would need to ensure that text nodes and CDATA section
nodes are compared based on the node value. Whether you have e.g.
<foo>a &amp; b</foo>
or
<foo><![CDATA[a & b]]></foo>
does not matter, you will still have the same node value.
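
A comparison along these lines could be sketched roughly as follows. This is only a minimal illustration, not a complete XML comparer: the class name XmlCompare and its methods are hypothetical, the sketch assumes that comments and processing instructions can be ignored, that attribute order does not matter, and that PreserveWhitespace is left at its default of false.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class XmlCompare
{
    // Compares two documents starting at their root elements.
    public static bool AreEquivalent(XmlDocument a, XmlDocument b)
    {
        return ElementsEqual(a.DocumentElement, b.DocumentElement);
    }

    private static bool ElementsEqual(XmlElement a, XmlElement b)
    {
        if (a == null || b == null) return a == b;
        if (a.Name != b.Name) return false;
        if (!AttributesEqual(a, b)) return false;

        List<object> ca = Flatten(a), cb = Flatten(b);
        if (ca.Count != cb.Count) return false;
        for (int i = 0; i < ca.Count; i++)
        {
            XmlElement ea = ca[i] as XmlElement, eb = cb[i] as XmlElement;
            if (ea != null || eb != null)
            {
                // One side is an element: the other must be an equal element.
                if (ea == null || eb == null || !ElementsEqual(ea, eb)) return false;
            }
            else if ((string)ca[i] != (string)cb[i])
            {
                return false; // text content differs
            }
        }
        return true;
    }

    // Same attribute names and values, in any order.
    private static bool AttributesEqual(XmlElement a, XmlElement b)
    {
        if (a.Attributes.Count != b.Attributes.Count) return false;
        foreach (XmlAttribute attr in a.Attributes)
        {
            XmlAttribute other = b.GetAttributeNode(attr.Name);
            if (other == null || other.Value != attr.Value) return false;
        }
        return true;
    }

    // Returns the child elements of e, with each run of adjacent text and
    // CDATA nodes merged into one string, so that <foo>a &amp; b</foo> and
    // <foo><![CDATA[a & b]]></foo> give the same list.
    private static List<object> Flatten(XmlElement e)
    {
        var items = new List<object>();
        var text = new StringBuilder();
        foreach (XmlNode child in e.ChildNodes)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    text.Append(child.Value);
                    break;
                case XmlNodeType.Element:
                    if (text.Length > 0) { items.Add(text.ToString()); text.Length = 0; }
                    items.Add(child);
                    break;
                // Assumption: comments and processing instructions are skipped.
            }
        }
        if (text.Length > 0) items.Add(text.ToString());
        return items;
    }
}
```

With this sketch, two documents loaded from <r><foo>a &amp; b</foo><e></e></r> and <r><foo><![CDATA[a & b]]></foo><e /></r> should compare as equivalent, because an empty element has no child nodes in either notation and the text and CDATA nodes carry the same value.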

As for line endings, with .NET 2.0 and later, if you use
XmlReader.Create to parse your XML, line endings are normalized
anyway, as the XML specification requires. So if you work with
System.Xml.XmlDocument, use e.g.
XmlDocument doc1 = new XmlDocument();
doc1.Load(XmlReader.Create("file1.xml"));
instead of
XmlDocument doc1 = new XmlDocument();
doc1.Load("file1.xml");
Then line endings are normalized and your comparison code does not have
to worry about them.
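
Since the original poster has the XML as strings rather than files, the same idea can be applied to a string. The following is only a sketch under that assumption; NormalizedLoader and LoadNormalized are hypothetical names, while XmlReader.Create(TextReader) and XmlDocument.Load(XmlReader) are the standard System.Xml calls.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of System.Xml.
public static class NormalizedLoader
{
    // Parses an XML string through XmlReader.Create so that "\r\n" and
    // "\n" line endings in the content both end up as "\n".
    public static XmlDocument LoadNormalized(string xml)
    {
        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            doc.Load(reader);
        }
        return doc;
    }
}
```

Loading "<foo>line1\nline2</foo>" and "<foo>line1\r\nline2</foo>" this way should give two documents whose DocumentElement.InnerText values are equal.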
 
