BinaryFormatter Serialization format? Is it documented?

  • Thread starter: googlenewsgroups

I have a custom set of objects that is correctly serializing &
deserializing the object tree. However in some cases the size of the
resultant serialization is larger than I would expect. Is there any
documented way to pull apart the stream to determine the sizes of the
objects embedded within the file? For example if I open the file up in
a Hex editor the 'containing' format looks reasonably simple to
decode, but I would rather be coding business logic than
figuring out a custom file format! Is the format documented (or
accessible via an API)?

What I am looking for is a way to see the names of all the
serialized objects within a stream, with a count of the instances and
the number of bytes each takes up in the serialization.

Is there a mechanism or tool to do this?

Many thanks,

Gareth
 
Gareth,

I doubt there is, as the document isn't formatted.

Because it isn't formatted, I have to wonder how you are formulating the
idea that the serialized instance is larger than you expect. There is
nothing to base that idea on.

How big are the files, and what is the size that you are expecting them
to be? Also, is the file size really causing that much of a problem?
 
Firstly thanks for the reply!

The size is bigger than I expected because there is (I am
assuming) a rogue object that is keeping a reference to a contained
object that it shouldn't: if I serialize the same object tree twice,
with significant processing in between, the second serialization is
larger than the first. The serialization is therefore pulling in too
much information (hence the larger size). That is not a serialization
bug, but an implementation bug in one of the objects keeping a rogue
pointer (or so I suspect). Unfortunately, because the objects can be
dynamically added to the graph via a plugin architecture (and the
objects persist their own state), it is hard to determine where the
extra data is coming from.

The best 'documentation' I've found is located at
http://primates.ximian.com/~lluis/dist/binary_serialization_format.htm
. It appears, however, that the sscli20 has the implementation code for
the formatters, so I could probably work it out from there, but that
is not something on my fun-to-do list!

The format is 'formatted', but just not in a human readable manner.
The deserialization knows how to read it, and reconstitute the
encapsulated objects. I'm only looking for the size of the objects :-)

So unless someone has written a 'summarization' reader for
BinaryFormatter output that reports the sizes and graphs of the data,
or there is an API to get the size of the individual data elements
within the stream, I'll have to either write one or poke around with
Notepad to see what is being persisted and shouldn't be!
 
Well, if you are not in control of the plug ins, then there is little
you can do. If the plug ins are under your control, then the same
techniques you apply to the main object being serialized can be applied to
the objects attached to the graph by the plug ins.

The very first thing I would look at is to make sure you are not
serializing any delegate chains to the graph. Delegates will end up
serializing every object that they hold a reference to (assuming that you
are holding a delegate to an instance method), and that could easily add the
overhead you are seeing if you have objects subscribing to events that the
serialized object is exposing.

The easy way to get around this is to mark the events with the
NonSerialized attribute, using the field: target, like so:

[field:NonSerialized]
public event EventHandler MyEvent;

That should keep your events from being serialized (assuming this is
the case in the first place).
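
To illustrate the effect, here is a minimal sketch, assuming the .NET Framework (where BinaryFormatter is available). The Publisher, Subscriber and Demo types and the SerializedSize helper are made up for this example and are not from the original poster's code.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Publisher
{
    // The field: target applies the attribute to the compiler-generated
    // delegate field behind the event, so subscribers are not serialized.
    [field: NonSerialized]
    public event EventHandler MyEvent;

    public int Value;

    public void Raise()
    {
        EventHandler handler = MyEvent;
        if (handler != null) handler(this, EventArgs.Empty);
    }
}

[Serializable]
public class Subscriber
{
    // Hypothetical large state that should not end up in the stream.
    public byte[] Payload = new byte[100000];

    public void OnMyEvent(object sender, EventArgs e) { }
}

public static class Demo
{
    // Serializes the graph into memory and returns the byte count.
    public static long SerializedSize(object graph)
    {
        using (MemoryStream stream = new MemoryStream())
        {
            new BinaryFormatter().Serialize(stream, graph);
            return stream.Length;
        }
    }

    public static void Main()
    {
        Publisher publisher = new Publisher();
        long before = SerializedSize(publisher);

        Subscriber subscriber = new Subscriber();
        publisher.MyEvent += subscriber.OnMyEvent;
        long after = SerializedSize(publisher);

        Console.WriteLine("before: {0} bytes, after: {1} bytes", before, after);
    }
}
```

With the attribute in place the two sizes should match. Removing it should make the second figure grow by roughly the size of the subscriber's payload, which is the kind of jump described earlier in the thread.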

If it is not, to be honest, I would just look at the object tree in the
debugger at the point before you serialize. You should be able to easily
determine (with enough digging) which objects are necessary and which are
not.

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)


 
Thanks,

I'll have a look at the object tree just prior to serialization. For
now we are in control of the objects (just not me personally), and
I'll have a look at the in-memory tree to see if it yields any
pointers (thanks for that suggestion).

I'm still a little surprised that there isn't a way to 'dump' the
serialized file into an object tree showing sizes, as that would be
the most expeditious route if the tool existed. But oh well! Perhaps
when I get time I'll write one (assuming no one subsequently points
me to a pre-existing tool).

Thanks,
 
Temporarily switch to use the SoapFormatter instead of
BinaryFormatter. Then the serialized form will be easily readable.
Once you know where the rogue objects are you can switch back to
binary formatting.
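
A rough sketch of that swap, assuming the .NET Framework with a reference to System.Runtime.Serialization.Formatters.Soap.dll. The Sample type, the DumpToFile helper and the output file name are invented for the example. Note that SoapFormatter does not support generic types, so this only works if the graph contains none.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Soap;

[Serializable]
public class Sample
{
    // Stand-in for the real object tree (hypothetical).
    public string Name = "root";
    public int[] Values = new int[] { 1, 2, 3 };
}

public static class SoapDump
{
    // Writes the graph as SOAP XML so it can be read in a text editor.
    public static void DumpToFile(object graph, string path)
    {
        using (FileStream stream = File.Create(path))
        {
            new SoapFormatter().Serialize(stream, graph);
        }
    }

    public static void Main()
    {
        DumpToFile(new Sample(), "graph-dump.xml");
        Console.WriteLine("Wrote graph-dump.xml");
    }
}
```

Each serialized object appears as its own XML element named after its type, so an unexpected type in the dump points at the reference that pulled it in.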

HTH,

Sam
 
