BinaryFormatter Serialization format? Is it documented?

googlenewsgroups · May 1, 2007

I have a custom set of objects whose object tree serializes and
deserializes correctly. However, in some cases the serialized output
is larger than I would expect. Is there any documented way to pull
apart the stream to determine the sizes of the objects embedded within
the file? For example, if I open the file in a hex editor, the
'containing' format looks reasonably simple to decode, but I would
rather be coding business logic than figuring out a custom file
format! Is the format documented (or accessible via an API)?

What I am looking for is a way to see the names of all the serialized
object types within a stream, with a count of the instances of each
and the number of bytes taken up by their serialization.

Is there a mechanism or tool to do this?

Many thanks,

Gareth

Nicholas Paldino [.NET/C# MVP] · May 1, 2007

Gareth,

I doubt there is, as the document isn't formatted.

Because it isn't formatted, I have to wonder how you are formulating the
idea that the serialized instance is larger than you expect. There is
nothing to base that idea on.

How big are the files, and what is the size that you are expecting them
to be? Also, is the file size really causing that much of a problem?

googlenewsgroups · May 1, 2007

Firstly thanks for the reply!

The size is bigger than I expected because (I am assuming) a rogue
object is keeping a reference to a contained object that it shouldn't.
If I serialize the same object tree twice, with significant processing
in between, the second serialization is larger than the first. So the
serialization is pulling in too much information (hence the larger
size). It is not a serialization bug, but an implementation bug in one
of the objects keeping a rogue pointer (or so I suspect).
Unfortunately, because objects can be dynamically added to the graph
via a plugin architecture (and the objects persist their own state),
it is hard to determine where the extra data is coming from.
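
The before/after comparison described above can be reproduced in memory without writing files. The following is only a minimal sketch, assuming .NET Framework (where BinaryFormatter is available); the names SerializedSizeProbe and Measure are made up for illustration, and the list stands in for the real object tree.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical helper: not part of the framework.
public static class SerializedSizeProbe
{
    // Serializes the graph into memory and returns the number of bytes written.
    public static long Measure(object graph)
    {
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.Length;
        }
    }

    public static void Main()
    {
        // Stand-in for the real object tree.
        var tree = new List<string> { "initial state" };
        long before = Measure(tree);

        // Stand-in for the "significant processing" between the two saves.
        tree.Add("something picked up during processing");
        long after = Measure(tree);

        Console.WriteLine("before={0} after={1} delta={2}", before, after, after - before);
    }
}
```

Calling Measure on individual sub-objects (each plugin's root object, for instance) narrows down which part of the graph accounts for the growth, although shared references mean the per-part figures will not simply add up to the total.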

The best 'documentation' I've found is located at
http://primates.ximian.com/~lluis/dist/binary_serialization_format.htm
. It appears, however, that the sscli20 has the implementation code for
the formatters, so I could probably work it out from there - but that
is not something on my fun-to-do list!

The format is 'formatted', but just not in a human readable manner.
The deserialization knows how to read it, and reconstitute the
encapsulated objects. I'm only looking for the size of the objects :-)

So unless someone has written a 'summarization' reader for
BinaryFormatter output that determines the sizes and graphs of the
data, or there is an API to get the size of the individual data
elements within the stream, I'll have to either write one or poke
around with notepad to see what is being persisted and shouldn't be!!

Nicholas Paldino [.NET/C# MVP] · May 1, 2007

Well, if you are not in control of the plug ins, then there is little
you can do. If the plug ins are under your control, then the same
techniques you apply to the main object being serialized can be applied to
the objects attached to the graph by the plug ins.

The very first thing I would look at is to make sure you are not
serializing any delegate chains to the graph. Delegates will end up
serializing every object that they hold a reference to (assuming that you
are holding a delegate to an instance method), and that could easily add the
overhead you are seeing if you have objects subscribing to events that the
serialized object is exposing.

The easy way to get around this is to mark the events with the
NonSerialized attribute, using the field target, like so:

[field:NonSerialized]
public event EventHandler MyEvent;

That should keep your events from being serialized (assuming this is
the case in the first place).
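
To see the effect of the attribute on the serialized size, here is a rough sketch, assuming .NET Framework with BinaryFormatter. All class names (Subscriber, LeakyPublisher, QuietPublisher, EventSizeDemo) are hypothetical, and the 10,000-byte payload exists only to make the subscriber visible in the byte count.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical subscriber carrying a large payload.
[Serializable]
public class Subscriber
{
    public byte[] Payload = new byte[10000];
    public void OnChanged(object sender, EventArgs e) { }
}

[Serializable]
public class LeakyPublisher
{
    // The hidden delegate field behind this event is serialized with its targets.
    public event EventHandler Changed;
    public void Raise() { if (Changed != null) Changed(this, EventArgs.Empty); }
}

[Serializable]
public class QuietPublisher
{
    // "field:" puts the attribute on the hidden delegate field, not on the event.
    [field: NonSerialized]
    public event EventHandler Changed;
    public void Raise() { if (Changed != null) Changed(this, EventArgs.Empty); }
}

public static class EventSizeDemo
{
    // Serializes the graph into memory and returns the number of bytes written.
    public static long Measure(object graph)
    {
        using (var stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.Length;
        }
    }

    public static void Main()
    {
        var subscriber = new Subscriber();

        var leaky = new LeakyPublisher();
        leaky.Changed += subscriber.OnChanged;

        var quiet = new QuietPublisher();
        quiet.Changed += subscriber.OnChanged;

        // The leaky publisher drags the subscriber (and its payload) into the stream.
        Console.WriteLine("leaky: {0} bytes", Measure(leaky));
        Console.WriteLine("quiet: {0} bytes", Measure(quiet));
    }
}
```

Note that after deserialization the event on QuietPublisher has no subscribers, so any handlers have to be attached again.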

If it is not, to be honest, I would just look at the object tree in the
debugger at the point before you serialize. You should be able to easily
determine (with enough digging) which objects are necessary and which are
not.
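
If digging through the tree by hand in the debugger is too slow, the same inspection can be approximated in code. The sketch below, assuming .NET Framework, counts the distinct instances of each type reachable through serializable fields, using FormatterServices and ObjectIDGenerator. GraphCensus and Node are hypothetical names. It only approximates what BinaryFormatter writes: it ignores ISerializable customisations and surrogates, and it reports instance counts, not bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical sample type used by the demo below.
[Serializable]
public class Node
{
    public string Name;
    public Node Next;
}

public static class GraphCensus
{
    // Counts distinct reachable instances per type, starting from root.
    public static Dictionary<Type, int> Count(object root)
    {
        var counts = new Dictionary<Type, int>();
        var seen = new ObjectIDGenerator();   // tracks objects by reference identity
        var pending = new Stack<object>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            object current = pending.Pop();
            if (current == null) continue;

            Type type = current.GetType();
            if (type.IsPrimitive || type.IsEnum) continue;   // skip boxed scalars

            bool firstTime;
            seen.GetId(current, out firstTime);
            if (!firstTime) continue;                         // already visited

            int n;
            counts.TryGetValue(type, out n);
            counts[type] = n + 1;

            if (current is string) continue;

            Delegate handler = current as Delegate;
            if (handler != null)
            {
                // Follow each subscriber instead of the delegate's internals.
                foreach (Delegate single in handler.GetInvocationList())
                    pending.Push(single.Target);
                continue;
            }

            Array array = current as Array;
            if (array != null)
            {
                if (!type.GetElementType().IsPrimitive)
                    foreach (object element in array) pending.Push(element);
                continue;
            }

            if (!type.IsSerializable) continue;   // the formatter would reject these

            MemberInfo[] members = FormatterServices.GetSerializableMembers(type);
            foreach (object value in FormatterServices.GetObjectData(current, members))
                pending.Push(value);
        }
        return counts;
    }

    public static void Main()
    {
        var tail = new Node { Name = "tail" };
        var head = new Node { Name = "head", Next = tail };
        foreach (KeyValuePair<Type, int> pair in Count(head))
            Console.WriteLine("{0}: {1}", pair.Key.FullName, pair.Value);
    }
}
```

Running the census before and after the processing step and comparing the two dictionaries should show which types gained instances.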

--
- Nicholas Paldino [.NET/C# MVP]

googlenewsgroups · May 1, 2007

Thanks,

I'll have a look at the object tree just prior to serialization. For
now we are in control of the objects (just not me personally), and
I'll have a look at the in-memory tree to see if it yields any
pointers (thanks for that suggestion).

I'm still a little surprised that there isn't a way to 'dump' the
serialized file into an object tree showing sizes, as that seems the
most expeditious route if the tool existed. But oh well! Perhaps when
I get time I'll write one (assuming no one subsequently points me to a
pre-existing tool).

Thanks,

Samuel R. Neff · May 1, 2007

Temporarily switch to use the SoapFormatter instead of
BinaryFormatter. Then the serialized form will be easily readable.
Once you know where the rogue objects are you can switch back to
binary formatting.
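
A minimal sketch of that temporary switch, assuming .NET Framework and a project reference to System.Runtime.Serialization.Formatters.Soap.dll. SoapDump, Settings and the file name are hypothetical. One caveat: SoapFormatter cannot serialize generic types, so a graph containing List<T> or Dictionary<K,V> will throw a SerializationException.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Runtime.Serialization.Formatters.Soap;

// Hypothetical sample type; it deliberately uses no generics.
[Serializable]
public class Settings
{
    public string Name = "example";
    public ArrayList Values = new ArrayList();
}

public static class SoapDump
{
    // Writes the graph as SOAP XML so it can be read in a text editor.
    public static void Write(object graph, string path)
    {
        using (FileStream stream = File.Create(path))
        {
            new SoapFormatter().Serialize(stream, graph);
        }
    }

    public static void Main()
    {
        var settings = new Settings();
        settings.Values.Add("first");
        settings.Values.Add("second");

        string path = Path.Combine(Path.GetTempPath(), "graph-dump.xml");
        Write(settings, path);

        // Each serialized object appears as its own XML element.
        Console.WriteLine(File.ReadAllText(path));
    }
}
```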

HTH,

Sam
