compare XML serialized file and a normal XML file

Tony Johansson

Hi!

Below I have two blocks of data. The first block is from 3 Movie objects that
have been XML serialized.
The second block of data is just these three movie objects in an XML file.
When I look at these two blocks of data they look almost
identical. There are some minor differences.
So my question is simply: is an XML serialized file the same as a normal XML
file?

<?xml version="1.0"?>
<ArrayOfMovie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Movie>
<Title>My Sister's Keeper</Title>
<RunningLength>109</RunningLength>
<ProductionYear>2009</ProductionYear>
<IsbnNumber>9100120820</IsbnNumber>
<Url>http://images.filmtipset.se/posters/86432038.jpg</Url>
</Movie>
<Movie>
<Title>Fight Club</Title>
<RunningLength>139</RunningLength>
<ProductionYear>1999</ProductionYear>
<IsbnNumber>9100123064</IsbnNumber>
<Url>http://images.filmtipset.se/posters/93394736.jpg</Url>
</Movie>
<Movie>
<Title>Star Trek</Title>
<RunningLength>127</RunningLength>
<ProductionYear>2009</ProductionYear>
<IsbnNumber>9172321385</IsbnNumber>
<Url>http://images.filmtipset.se/posters/83532544.jpg</Url>
</Movie>
</ArrayOfMovie>


//This is a normal XML file consisting of three Movie objects.
<?xml version="1.0" encoding="utf-8" ?>
- <movies>
- <movie>
<title>My Sister's Keeper</title>
<runningLength>109</runningLength>
<productionYear>2009</productionYear>
<isbn>9100120820</isbn>
<url>http://images.filmtipset.se/posters/86432038.jpg</url>
</movie>
- <movie>
<title>Fight Club</title>
<runningLength>139</runningLength>
<productionYear>1999</productionYear>
<isbn>9100123064</isbn>
<url>http://images.filmtipset.se/posters/93394736.jpg</url>
</movie>
- <movie>
<title>Star Trek</title>
<runningLength>127</runningLength>
<productionYear>2009</productionYear>
<isbn>9172321385</isbn>
<url>http://images.filmtipset.se/posters/83532544.jpg</url>
</movie>
</movies>
 
Martin Honnen

Tony said:
Below I have two blocks of data. The first block is from 3 Movie objects that
have been XML serialized.
The second block of data is just these three movie objects in an XML file.
When I look at these two blocks of data they look almost
identical. There are some minor differences.
So my question is simply: is an XML serialized file the same as a normal XML
file?

Define "normal XML file".

Hopefully any XML serialization creates a well-formed XML document if
that is what you consider "normal".

With .NET's XmlSerializer you can apply certain attributes to your class
and member definitions to specify, for instance, the element or attribute
name a class or member is mapped to; see
http://msdn.microsoft.com/en-us/library/83y7df3e(v=VS.90).aspx
That way you can get, for instance, the names you had in the sample you
called normal.
 
Tony Johansson

Martin Honnen said:
Define "normal XML file".

Hopefully any XML serialization creates a well-formed XML document if that
is what you consider "normal".

With .NET's XmlSerializer you can apply certain attributes to your class and
member definitions to specify, for instance, the element or attribute name a
class or member is mapped to; see
http://msdn.microsoft.com/en-us/library/83y7df3e(v=VS.90).aspx
That way you can get, for instance, the names you had in the sample you
called normal.

By a normal XML file I mean one created by hand, for example if I use VS to
create an XML file and enter the data into it myself.
//Tony
 
Martin Honnen

Tony said:
By a normal XML file I mean one created by hand, for example if I use VS to
create an XML file and enter the data into it myself.

The main difference between your samples is the case of letters in
element names and the exact term used in element names I think. I don't
see why e.g.
<isbn>9172321385</isbn>
is "normal" and e.g.
<IsbnNumber>9172321385</IsbnNumber>
is not "normal".
And the XmlSerializer (by default) always emits two namespace
declarations on the root element, in case the namespaces might be used
deeper in the document with attributes like xsi:type.
 
Tony Johansson

Martin Honnen said:
The main difference between your samples is the case of letters in element
names and the exact term used in element names I think. I don't see why
e.g.
<isbn>9172321385</isbn>
is "normal" and e.g.
<IsbnNumber>9172321385</IsbnNumber>
is not "normal".
And the XmlSerializer (by default) always emits two namespace declarations
on the root element, in case the namespaces might be used deeper in the
document with attributes like xsi:type.

Sorry, that was a typo; it should be IsbnNumber!

//tony
 
Alan Meyer

Tony said:
Hi!

Below I have two blocks of data. The first block is from 3 Movie objects that
have been XML serialized.
The second block of data is just these three movie objects in an XML file.

I'm not sure what you mean by "normal". Both of your XML objects
are serialized XML. "Serialized" just means that the XML is in
the form of a text stream, not a collection of nodes in some
internal format like DOM, ElementTree, or whatever.

Serialized XML is standardized. It delimits tags with angle
brackets, and it has a specific way of representing attributes,
namespaces, comments, processing instructions, text, etc.

Other formats, such as a DOM tree, do not have any standard
representation. A DOM tree has a standard interface, but the
data is represented however the implementer wants to represent it
internally. It's almost certainly different for every DOM
implementation.

Tony said:
When I look at these two blocks of data they look almost
identical. There are some minor differences.
So my question is simply: is an XML serialized file the same as a normal XML
file?

Answer = Yes - if by "normal" you mean "serialized" :^)

Less sarcastically, if by "normal" you mean what you showed us,
the answer is still, almost, Yes. But see the comment on
hyphens below.

Tony said:
//This is a normal XML file consisting of three Movie objects.
<?xml version="1.0" encoding="utf-8" ?>
-<movies>
-<movie>
....

Actually, this is serialized XML with an illegal character in it,
namely the first hyphen. It looks like something you copied from
an Internet Explorer screen, not something that came straight
from a text file of legal XML. The other hyphens are not
necessarily illegal, but I bet you don't really intend for them
to be part of the XML. I bet that they're IE artifacts.
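
One way to check this is to hand both variants to a parser. The following is only a sketch using Python's standard xml.etree.ElementTree (an assumption on my part, since the thread is about .NET); the shortened sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hyphen before the root element: character data outside the root is not allowed.
LEADING_HYPHEN = """<?xml version="1.0" encoding="utf-8" ?>
- <movies>
- <movie><title>Fight Club</title></movie>
</movies>"""

# Hyphen only inside <movies>: well-formed, but the hyphen becomes text content.
INNER_HYPHEN = """<?xml version="1.0" encoding="utf-8" ?>
<movies>
- <movie><title>Fight Club</title></movie>
</movies>"""


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False
```

With these strings, is_well_formed(LEADING_HYPHEN) is False and is_well_formed(INNER_HYPHEN) is True. In the second case the hyphen is kept as text inside the movies element, which is probably not what was intended.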

Also, this XML document does not contain three "Movie" objects.
It contains three "movie" objects. Case is significant.
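
To see that case matters to an XML processor, here is a small sketch, again assuming Python's xml.etree.ElementTree and a made-up one-movie document:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<movies><movie><title>Star Trek</title></movie></movies>")

lower = doc.findall("movie")  # matches the element name as written
upper = doc.findall("Movie")  # a different name as far as XML is concerned

print(len(lower), len(upper))  # prints "1 0"
```

The same applies to the serializer output: an element named Movie will not be found by code that looks for movie.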

What you need to do is find a good beginner reference book or
tutorial on XML. There are some on the Internet.

Now having said that, I will also say that your original problem
of how to compare two XML documents is not trivial to solve. A
text comparison using a tool like "diff" or "fc" doesn't work
because two documents that are identical from an XML point of
view may differ from a text point of view due to line breaks,
spaces, character entities, single vs. double quotes, etc.

I know of two approaches. One is to get a specialized XML
comparator such as Altova's "DiffDog". There are some open
source ones, but I don't know which ones actually work and/or
are currently maintained.

A second approach is to pass each of the two documents through an
indent formatter that eliminates the non-significant differences
between the docs, then pass the output of that to a textual diff
program. That works but can be harder to use.
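
A rough sketch of this second approach, assuming Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available. The function names normalized_lines and xml_diff are my own, and canonicalization stands in here for the "indent formatter" step:

```python
import difflib
import xml.etree.ElementTree as ET


def normalized_lines(xml_text):
    """Canonicalize xml_text (C14N 2.0) and put each tag boundary on its own line."""
    # strip_text=True drops leading/trailing whitespace around text,
    # so indentation and line breaks between elements disappear.
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return canonical.replace("><", ">\n<").splitlines()


def xml_diff(a, b):
    """Return a unified diff of the two documents after normalization."""
    return list(
        difflib.unified_diff(normalized_lines(a), normalized_lines(b), lineterm="")
    )


# Same content, different quoting, attribute order and whitespace.
doc_a = "<movie id='1' year=\"1999\"><title>Fight Club</title></movie>"
doc_b = '<movie year="1999" id="1">\n  <title>Fight Club</title>\n</movie>'
# Real difference in the text content.
doc_c = "<movie id='1' year='1999'><title>Fight club</title></movie>"
```

Here xml_diff(doc_a, doc_b) comes back empty, while xml_diff(doc_a, doc_c) reports the changed title line. Note that this only removes textual noise: documents that use different element names, such as Movie versus movie, will still show up as different.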

A third approach, of course, is to write your own program. But
it looks like you're not ready for that yet.

Alan
 
