A few confusing things about XmlSerializer and XML serialization


klem s

1) If, when an object is serialized using BinaryFormatter, the runtime is
able to create the object graph (which documents how a set of objects
refer to each other), then why don’t we also let the runtime figure that
out when persisting object state to XML? Instead, XmlSerializer
requires us to manually specify the dependencies between related objects
(by specifying the type information that represents each subelement
nested within the root).

2) I assume an object can be reconstructed only if the environment
receiving and deserializing the object graph also contains the assemblies
that define the types specified in the object graph (thus, if an object is
serialized in a .NET environment, then it can only be reconstructed in a
.NET environment)?

3) Isn’t the very definition of the term serialization that we persist
object state in a manner that enables us to reconstruct the object
being serialized? Since XmlSerializer only persists data, but doesn’t
enable us to reconstruct an object, why then do we claim that it
serializes the state of an object?

4) One of the arguments for why XML serialization doesn’t also persist each
type’s fully qualified name and the name of its defining assembly is that
this way the state can be used by any OS, application framework or
programming language. But can’t object state persisted via SoapFormatter
also be used by any OS, app framework or programming language?

I’m asking because SoapFormatter does persist the name of the assembly
through the use of namespaces; that way the data can be used both by .NET
(which would deserialize the object) and by other environments. So why
doesn’t XmlSerializer also persist the assembly name through the use of
namespaces?!



5) I can understand how serialization may be useful when persisting
via BinaryFormatter, which enables us to reconstruct the object. But I
fail to see the importance of persisting an object to XML format and
transferring it to a system (say, Java) that knows nothing about .NET
data types and thus isn’t able to reconstruct the object.

Thus, why would persisting object state inside XML be any more useful
than persisting its state inside an SQL database file, which would then
be sent (instead of an XML file) to the remote computer?
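Regarding question 1, a small sketch may help show what XmlSerializer actually needs to be told. XmlSerializer builds its mapping up front from the declared types of public fields and properties, so it finds a member's declared type (e.g. a field of type D) on its own; extra types are only needed when a member holds an instance of a type other than the one it is declared as. The classes Animal, Dog, Zoo and Demo are hypothetical, and the snippet assumes C# 8 or later (for `using var`).

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types: Zoo declares its field as Animal but stores a Dog in it.
public class Animal { public string Name = "Rex"; }
public class Dog : Animal { public int Barks = 3; }
public class Zoo { public Animal Resident = new Dog(); }

public static class Demo
{
    static string Serialize(XmlSerializer serializer, object value)
    {
        using var writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }

    // Zoo's mapping is built from declared types only, so Dog is unknown
    // and Serialize fails with an InvalidOperationException.
    public static bool FailsWithoutExtraTypes()
    {
        try
        {
            Serialize(new XmlSerializer(typeof(Zoo)), new Zoo());
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Listing Dog as an extra type lets the serializer write the field,
    // tagging the element with xsi:type="Dog".
    public static string WithExtraTypes() =>
        Serialize(new XmlSerializer(typeof(Zoo), new[] { typeof(Dog) }), new Zoo());
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Demo.FailsWithoutExtraTypes()); // True
        Console.WriteLine(Demo.WithExtraTypes());
    }
}
```

Putting [XmlInclude(typeof(Dog))] on Animal is the attribute-based way of declaring the same extra type.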
 

klem s

In your post you quite often use the term "reconstructing object
graph" (in my answers I interpret that as if you actually meant
"reconstructing an object"; perhaps my interpretation is off).
Anyway, here is a related question:

From MSDN:

“BinaryFormatter.Deserialize - Deserializes the specified stream into
an object graph.
Return value - The top (root) of the object graph.”

a) Isn’t the term object graph used incorrectly here? To my
understanding, an object graph is the information that describes the
dependencies between serialized objects and the values these objects
hold, and I’d argue this information isn’t the object itself.

b) Thus, saying it returns an object graph would suggest it returns to
the caller the information on how to build this object, when in fact
it returns the reconstructed object?

c) I assume “the top (root) of the object graph” refers to type Object?


Different API, different features and requirements.

Could you elaborate on this? I understand XmlSerializer meets
different requirements, but how would letting the runtime figure out
dependencies (instead of us manually specifying them) prevent
XmlSerializer from fulfilling those requirements?

It depends on your requirements. If you are asking whether you can get
exactly the same object implementation deserializing objects, then
yes…obviously you must be able to execute the code that went along with
that object in the first place.

But there's nothing preventing some non-.NET environment from
deserializing data that a .NET program serialized, and providing its own
implementation for the class supporting that object type.

I assume that by "providing its own implementation" you’re talking about
defining some class C, which would have field members able to hold the
deserialized field values, which we would then manually assign to the
members of C?
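A rough sketch of that kind of "own implementation", assuming all the consumer has is the XML text: the hypothetical class C is written by the consumer, and the values are looked up by element name and assigned by hand with LINQ to XML. No type from the producing program is involved, but the consumer has to know in advance which elements exist and what types they hold.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical consumer-side class; it knows nothing about the producer's types.
public class C
{
    public int BaseInt;
    public int OtherI;
}

public static class Demo
{
    // Sample input in the shape XmlSerializer produces for a Derived object
    // with a baseInt field and a nested "other" element.
    public const string Xml =
        "<Derived><baseInt>31</baseInt><other><i>17</i></other></Derived>";

    // Parse the XML and copy each value into C by hand.
    public static C Load(string xml)
    {
        XElement root = XElement.Parse(xml);
        return new C
        {
            // The consumer must already know these elements hold integers.
            BaseInt = (int)root.Element("baseInt")!,
            OtherI = (int)root.Element("other")!.Element("i")!,
        };
    }
}

class Program
{
    static void Main()
    {
        C c = Demo.Load(Demo.Xml);
        Console.WriteLine($"{c.BaseInt} {c.OtherI}"); // 31 17
    }
}
```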

None of the serialization APIs in .NET persist anything except data.
And as you already noted, it is in fact possible to reconstruct object
graphs (for example) deserializing XML-serialized objects, with
additional effort.
* Uhm, I don’t recall claiming that it’s possible to reconstruct
object graphs (if by reconstructing object graphs you really mean
reconstructing actual objects) by deserializing XML-serialized
objects. I understand we are able to extract the field data of an
XML-serialized object, but I don’t know how we would be able to
reconstruct the actual object (see further down in my post, where I
discuss the meaning of the term “reconstruct”)?!
with additional effort.

* I know I’m being repetitive, since I ask the following quite a few
times in this post: by “additional effort”, do you mean defining a class C
whose field members are able to hold the deserialized field values,
and then manually assigning the deserialized values to these field
members?

You don't even need to be deserializing using .NET code to reconstruct
the basic data structures.

But we must know in advance the types of the fields the serialized object
holds, and thus define a class with members able to hold the values of the
deserialized fields?

Different API, different features and requirements.

So when should we use SOAP and when XML?

If it were true that a Java system could not reconstruct an object
serialized in .NET (whatever the serialization method used) then perhaps
one could say that if interoperability with Java were important, one
should not use .NET serialization.

I’m not sure we’re using the same terminology here. Doesn’t the
term reconstruction mean re-creating the serialized object? As you’ve
said, an object serialized in .NET can’t be re-created in Java.

But I suspect that when you talk about reconstructing an object graph
(serialized in .NET) in Java, you’re talking about defining some Java
class C, which would have field members able to hold the deserialized
field values? Thus, in Java we would deserialize the field values and
MANUALLY assign them to the members of C?

cheers
 

klem s

... By definition, if you have some objects referring
to other objects, the entire collection of objects can be referred to as
a "graph" and it's obviously the _graph_ that's of interest.  It's
generally not enough to just recreate each object; you need to fix up
all the references stored in each object that are what define the graph.
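To see reference fix-up in a serializer that still runs on current .NET (BinaryFormatter is obsolete there), here is a sketch using DataContractSerializer with PreserveObjectReferences enabled. DataContractSerializer is not otherwise discussed in this thread, and the Node and Demo classes are hypothetical. Two nodes that refer to each other form a cycle; after the round trip the copy has the same shape, not just the same field values. Without that setting, DataContractSerializer rejects a cyclic graph like this one.

```csharp
#nullable enable
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical node type; two nodes pointing at each other form a cycle.
[DataContract]
public class Node
{
    [DataMember] public string Name = "";
    [DataMember] public Node? Next;
}

public static class Demo
{
    public static Node RoundTrip(Node root)
    {
        var settings = new DataContractSerializerSettings { PreserveObjectReferences = true };
        var serializer = new DataContractSerializer(typeof(Node), settings);
        using var stream = new MemoryStream();
        // Each object is written once; later occurrences are written as references to it.
        serializer.WriteObject(stream, root);
        stream.Position = 0;
        // On the way back in, those references are resolved to the same instances.
        return (Node)serializer.ReadObject(stream)!;
    }

    // a -> b -> a
    public static Node MakeCycle()
    {
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b", Next = a };
        a.Next = b;
        return a;
    }
}

class Program
{
    static void Main()
    {
        Node copy = Demo.RoundTrip(Demo.MakeCycle());
        // The cycle survives: copy -> b -> (the same) copy.
        Console.WriteLine(ReferenceEquals(copy.Next!.Next, copy)); // True
    }
}
```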

I didn’t mean to imply that reconstructing an object doesn’t also
include restoring all the references held by each object.

The quote is stating the situation correctly.  The _graph_ is the
collection of objects along with the relationships between the objects.
That's what BinaryFormatter.Deserialize() is reconstructing.

Then we could claim that the following method also returns an object
graph, since the returned object of type B is basically a collection of
objects with some type of relationship established between them:

object some_Method()
{
    return new B();
}

class A { }
class B : A
{
    public C c = new C();
}
class C { }

BTW, are there situations where the term “reconstructing an object
graph” also includes the reconstruction of the object’s behaviour (I
realize that this makes sense only if the object’s methods are also
persisted, or if we serialize object O via BinaryFormatter and then
also deserialize O via BinaryFormatter)?

Field data (or in many cases more properly, property data) is the only
thing you have in serialized data.  

I don’t completely agree with your claim, since (at least with
BinaryFormatter) there’s additional data stored in the object graph that
also describes the parent-child relationships of a persisted object; I
wouldn’t consider this information field/property data?!

Sometimes those fields are
references to other objects.  As long as your serialization method can
allow for storing the data and relationships for all of the objects
involved, the entire graph can be reconstructed.

And all of the serialization methods discussed here allow that.

That’s not strictly true of XmlSerializer, since it doesn’t describe
the parent-child relationship?
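A sketch of where XmlSerializer differs here, with hypothetical Item, Pair and Demo classes: it writes each reference it meets as a nested element, so an object reachable through two fields is written twice and comes back as two separate objects. (A genuine cycle makes Serialize throw an InvalidOperationException reporting a circular reference.)

```csharp
#nullable enable
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types: a Pair whose two fields may point at the same Item.
public class Item { public int Value; }
public class Pair
{
    public Item? First;
    public Item? Second;
}

public static class Demo
{
    // Serialize to XML text and read it straight back.
    public static Pair RoundTrip(Pair pair)
    {
        var serializer = new XmlSerializer(typeof(Pair));
        using var writer = new StringWriter();
        serializer.Serialize(writer, pair); // a shared Item is written once per field
        using var reader = new StringReader(writer.ToString());
        return (Pair)serializer.Deserialize(reader)!;
    }
}

class Program
{
    static void Main()
    {
        var shared = new Item { Value = 42 };
        Pair copy = Demo.RoundTrip(new Pair { First = shared, Second = shared });
        // Same data, but two distinct Item objects after the round trip.
        Console.WriteLine(ReferenceEquals(copy.First, copy.Second)); // False
        Console.WriteLine(copy.First!.Value + " " + copy.Second!.Value); // 42 42
    }
}
```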

That depends entirely on the serialization method.  But yes, usually
it's expected that the data types are implicit and that the code
deserializing the data knows what types to expect.

What do you mean by “data types being implicit”?

That's not to rule out other serialization methods in which the type
information is embedded.  In fact, it's my recollection that both
BinaryFormatter and SoapFormatter include enough type information in the
serialized data that you could in fact reconstruct a _container_ type in
which all of the data could be stored upon deserialization.

I’m aware that both BinaryFormatter and SoapFormatter also serialize
assembly and type names, which enables a .NET application to reconstruct
an instance of the same type as the one that was serialized. Is that what
you meant by reconstructing a _container_ type?

…include enough type information in the serialized data…

Doesn't the above excerpt somewhat contradict your statement
that the only information persisted is field/property data:

"Field data (or in many cases more properly, property data) is the
only thing you have in serialized data."

Note, of course, that the data in an object does not fully define the
object.  There's all the code that goes with an object as well.  Some
simple types are just data containers, and those are easy enough to
recreate from scratch based on type information embedded in the
serialized data.  But most interesting types have a wealth of
implementation detail, none of which is part of the serialized data.

Is there a way to also serialize the behaviour of an object (i.e. its
methods), which could then be deserialized in some non-.NET
environment?

You should use SOAP if you're serializing data to be consumed by some
third-party that requires the use of SOAP.  I would personally not use
it for internal-only serialization,

By internal-only serialization, are you referring to data that gets
serialized and deserialized only within the same application (thus,
data that won’t get deserialized by an external app)?


BTW, my book claims that, strictly speaking, XmlSerializer does not
persist state using an object graph. XmlSerializer persists field/property
data, and it also indirectly persists relationships between objects
(via subelements), so why would the book make such a claim?

thank you
 

klem s

Why not? The graph has to be stored in fields _somewhere_. Most
commonly, this is in the form of object references within the objects
that make up the graph themselves. In that case, _obviously_ even the
graph information being persisted is coming from field or property data.

I assume you mean that with each object in the object graph, an additional
field (or fields) is serialized that describes the object’s dependencies/
references? Thus, if each object in the object graph is assigned a unique
numerical value, the information in these additional fields somehow
describes how an object is related to other objects in the graph?

And thus, if the app deserializing this object graph knows how to
interpret these additional fields, it can correctly reconstruct the
object graph?
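The scheme described in the question can be sketched like this. It is a toy format with hypothetical names (Node, NodeRecord, ToyGraphFormat, Sample), not BinaryFormatter's actual wire format: each distinct object gets a numeric ID, every reference is stored as the ID of its target, and deserialization runs in two passes, first creating the objects and then fixing up the references.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical object type with one reference field.
public class Node
{
    public string Name = "";
    public Node? Next;
}

// One flattened record per object: its data plus the ID of the object it refers to.
public record NodeRecord(int Id, string Name, int? NextId);

public static class ToyGraphFormat
{
    // Walk the graph, giving each distinct object a numeric ID.
    public static List<NodeRecord> Flatten(Node root)
    {
        // Node does not override Equals, so this dictionary compares by reference.
        var ids = new Dictionary<Node, int>();
        var order = new List<Node>();
        for (Node? n = root; n != null && !ids.ContainsKey(n); n = n.Next)
        {
            ids[n] = order.Count;
            order.Add(n);
        }
        var records = new List<NodeRecord>();
        foreach (Node n in order)
            records.Add(new NodeRecord(ids[n], n.Name, n.Next is null ? (int?)null : ids[n.Next]));
        return records;
    }

    public static Node Rebuild(List<NodeRecord> records)
    {
        // Pass 1: create every object with its plain data.
        var byId = new Dictionary<int, Node>();
        foreach (NodeRecord r in records)
            byId[r.Id] = new Node { Name = r.Name };
        // Pass 2: fix up references using the stored IDs.
        foreach (NodeRecord r in records)
            if (r.NextId is int next)
                byId[r.Id].Next = byId[next];
        return byId[0]; // ID 0 is the root
    }
}

public static class Sample
{
    // a -> b -> a: a two-object cycle.
    public static Node MakeCycle()
    {
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b", Next = a };
        a.Next = b;
        return a;
    }
}

class Program
{
    static void Main()
    {
        List<NodeRecord> records = ToyGraphFormat.Flatten(Sample.MakeCycle());
        foreach (NodeRecord r in records)
            Console.WriteLine(r);
        Node copy = ToyGraphFormat.Rebuild(records);
        Console.WriteLine(ReferenceEquals(copy.Next!.Next, copy)); // True: the cycle is restored
    }
}
```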

Of course it does. It requires more work, but nested XML elements are
commonly used with XmlSerializer to describe parent/child relationships.

I understand your point that with some effort we can also serialize
relationships between objects into XML, but I assume that in the next
example XmlSerializer doesn’t store the parent-child relationship, since
no subelement is created for type A:

XmlSerializer xmlFormat = new XmlSerializer(typeof(B),
    new Type[] { typeof(A), typeof(D) });

public class A { }
public class B : A { D d = new D(); }
public class D { }

I mean that you cannot tell by inspecting the serialized data what the
types are. For example, if I write the text "17", can you tell me
whether that is supposed to be deserialized as a byte, a short, an int,
a float, a double, a decimal, or a string? No. The data types are
implied by — that is, they are implicit — the serialization method that
was used.

But if I include some meta-data that goes along with the data itself,
and which describes the format of the data, then the data types are
stored _explicitly_ in the serialized data.
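A small illustration of implicit versus explicit types using XmlSerializer itself (the IntHolder, Holder and Demo classes are hypothetical): when the field is declared as int, the XML is just the text 17 and the type is implied by the class; when the field is declared as object, the serializer attaches an xsi:type attribute so the value can be read back as an int.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical holders: same value, different declared field type.
public class IntHolder { public int Value = 17; }
public class Holder { public object Value = 17; }

public static class Demo
{
    public static string ToXml<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using var writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }
}

class Program
{
    static void Main()
    {
        // Declared int: the element is just <Value>17</Value>; the type is implicit.
        Console.WriteLine(Demo.ToXml(new IntHolder()));
        // Declared object: the element also carries an xsi:type attribute (xsd:int),
        // so the type is stored explicitly alongside the data.
        Console.WriteLine(Demo.ToXml(new Holder()));
    }
}
```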

So BinaryFormatter and SoapFormatter also somehow serialize
information that describes the types of the serialized fields. But do the
two include type information only for primitive types?

No. By "reconstructing a container type", I mean that if you've got
(for example) a class A that looks like this:

class A
{
    public int IntValue { get; set; }
    public byte ByteValue { get; set; }
    public float FloatValue { get; set; }
}

…then one could write code to deserialize that data that actually built
a whole new type at run-time to hold that data.

Slightly off topic, but are you saying that .NET (and perhaps some
other frameworks as well) enables us to define a new class at run time?
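For reference, .NET does support this through System.Reflection.Emit. A minimal sketch, assuming the field names and types have already been read from type metadata somewhere: the DynamicContainer helper is hypothetical, and it emits public fields rather than the properties of class A above, since properties would also need getter and setter IL.

```csharp
#nullable enable
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class DynamicContainer
{
    // Hypothetical helper: emits a new public class with one public field per (name, type) pair.
    public static Type Build(string typeName, params (string Name, Type FieldType)[] fields)
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicContainers"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("DynamicContainers");
        TypeBuilder type = module.DefineType(typeName, TypeAttributes.Public | TypeAttributes.Class);
        foreach (var (name, fieldType) in fields)
            type.DefineField(name, fieldType, FieldAttributes.Public);
        type.DefineDefaultConstructor(MethodAttributes.Public);
        return type.CreateType()!;
    }
}

class Program
{
    static void Main()
    {
        // In a real deserializer these names and types would come from metadata in the stream.
        Type t = DynamicContainer.Build("A",
            ("IntValue", typeof(int)), ("ByteValue", typeof(byte)), ("FloatValue", typeof(float)));
        object instance = Activator.CreateInstance(t)!;
        t.GetField("IntValue")!.SetValue(instance, 17);
        Console.WriteLine(t.GetField("IntValue")!.GetValue(instance)); // 17
    }
}
```

Code using such a type can only reach its members through reflection (or dynamic), since nothing was compiled against it.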

thank you
 

klem s

OK, I've already posted two replies, but for some reason Google Groups
doesn't display them. I will try yet again, and hopefully this time it
will work.

I don't think you are quite getting my precise point.  

I should have been more specific in my post, but I was actually asking
more about how dependencies are stored when serializing via
BinaryFormatter. With that in mind, is my assumption about how
dependencies are serialized via BinaryFormatter more or less correct?


I understand your point that with some effort we can also serialize
relationships between objects into XML, but I assume that in the next
example XmlSerializer doesn’t store the parent-child relationship, since
no subelement is created for type A:
XmlSerializer xmlFormat = new XmlSerializer(typeof(B),
                new Type[] { typeof(A), typeof(D) });
public class A { }
public class B : A { D d = new D(); }
public class D { }

You assume incorrectly.

First, you need to understand that the class A is not a child of the
class B.  Thus, there is no parent/child relationship to emit.

Second, if in your example the class B _did_ have a public member
referencing a child object, XmlSerializer would in fact emit that child
object as part of the serialized data, by representing it as a nested
element within the XML.

For example:

using System;
using System.IO;
using System.Xml.Serialization;

namespace TestXmlSerializer
{
     public class Base
     {
         public int baseInt = 31;
     }

     public class Derived : Base
     {
         public Other other = new Other();
     }

     public class Other
     {
         public int i = 17;
     }

     class Program
     {
         static void Main(string[] args)
         {
             XmlSerializer serializer = new XmlSerializer(typeof(Derived));

             using (StringWriter writer = new StringWriter())
             {
                 serializer.Serialize(writer, new Derived());
                 Console.WriteLine(writer.ToString());
             }

             Console.ReadLine();
         }
     }

}

That code emits the following XML:

   <?xml version="1.0" encoding="utf-16"?>
   <Derived xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
     <baseInt>31</baseInt>
     <other>
       <i>17</i>
     </other>
   </Derived>

Note that the instance of the Other class, in the field "other", is
included as a nested element of the Derived object.  Note also that the
public member in the base class Base is emitted, not as a child of the
"Derived" element, but simply as a normal member of the "Derived"
element, just as it is in the language object model.

I already knew all of that :). What I was pointing out is that the Derived
class is a child of the Base class, but this parent/child relationship
isn’t recorded in the XML. Thus, just by looking at the resulting XML, one
would never figure out that Derived is a child of Base.
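For what it's worth, the inheritance relationship does surface in XmlSerializer output in one case: when an object is serialized through a serializer for its base type. A sketch with simplified versions of the Base and Derived classes from the example above (the Demo class is hypothetical): with a serializer for Derived, the XML never mentions Base; with a serializer for Base that is told about Derived, the root element is named Base and carries xsi:type="Derived".

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Simplified versions of the Base and Derived classes from the example above.
public class Base { public int baseInt = 31; }
public class Derived : Base { public int derivedInt = 5; }

public static class Demo
{
    static string ToXml(XmlSerializer serializer, object value)
    {
        using var writer = new StringWriter();
        serializer.Serialize(writer, value);
        return writer.ToString();
    }

    // Serializer for the exact type: nothing in the XML says Derived inherits from Base.
    public static string AsDerived() =>
        ToXml(new XmlSerializer(typeof(Derived)), new Derived());

    // Serializer for the base type that also knows about Derived: the root element is
    // named Base and carries xsi:type="Derived", the only hint of the inheritance.
    public static string AsBase() =>
        ToXml(new XmlSerializer(typeof(Base), new[] { typeof(Derived) }), new Derived());
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Demo.AsDerived());
        Console.WriteLine(Demo.AsBase());
    }
}
```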

BTW, I realize I was wrong when suggesting we should create
subelement A to indicate the parent/child relationship, since subelements
indicate members of Derived and not parents/children. I had one of my
brain fart episodes and wasn’t thinking clearly.

thank you
 

klem s

Uh, I “missed” (or somehow forgot about) the following statement from
your previous post, otherwise I’d have already addressed the issue in my
previous post:
First, you need to understand that the class A is not a child of the
class B. Thus, there is no parent/child relationship to emit.
[...]
I already knew all of that :).

Define "that".  Because this statement:
What I was pointing out is that Derived
class is a child of Base class,

…is definitely not true.  The Derived class is a _sub-class_ of the Base
class.  It has no parent/child relationship.  It's simply an instance of
Base, by virtue of having inherited that class.

It was only in your last post that I noticed that you use the term
parent/child to describe a different type of relationship (namely, the
relationship between a class and its members), but if I’m not
mistaken, the term parent/child also applies to the base/derived class
relationship?!

Namely, the book I’m reading often uses this term to describe the
relationship between the base class and the derived class. I’ve also seen
quite a few articles on the net using the term parent/child when talking
about base/derived classes.

There's no need for the XML, or any serialization format, to preserve
that information.

Why wouldn't it be important to also persist the base/derived class
relationship?

You seem to be changing your definition of "parent/child relationship".

Not really. I have always associated the term with the base/derived class
relationship. In this thread, the only time I associated the term with a
different kind of relationship (and even then only implicitly) was with
the statement “I already knew all of that”, which was a response to your
explanation that all public members get persisted by XmlSerializer (and
you used the term to describe the class/member relationship).

In general, we use "parent" and "child" to refer to completely
different objects that have a specific referencing relationship.
Specifically, the parent is an object that references the child in some
type of hierarchy (typically a tree data structure, which is a form of
graph).

So parent/child should be used to describe the class/member relationship?

So throughout the entire thread, when we talked about parent/child
relationships, I was thinking of the base/derived class relationship and
you of the class/member relationship?!

thank you
 
