Fast binary serialization/deserialization of objects


schoenfeld1

I've implemented IPC between two applications using named pipes and
binary serialization, but have noticed that the binary formatter is
rather slow.

It seems that the binary formatter reflects over the entire type every time
it is invoked to serialize/deserialize an object of that type.

Is there a way to prepare the binary formatter with a pre-defined type,
such that it only reflects once but can be re-used to
serialize/deserialize objects of that type without performing
reflection again?

Here is the relevant code:

static public byte[] Serialize(Message message) {
    #region Pre-conditions
    Debug.Assert(message != null);
    #endregion
    MemoryStream memStream = new MemoryStream(1024);
    BinaryFormatter binFormatter = new BinaryFormatter();
    binFormatter.Serialize(memStream, message);
    return memStream.GetBuffer();
}


static public Message Deserialize(byte[] data) {
    #region Pre-conditions
    Debug.Assert(data != null);
    #endregion
    BinaryFormatter binFormatter = new BinaryFormatter();
    MemoryStream memStream = new MemoryStream(data);
    object obj = binFormatter.Deserialize(memStream);
    Debug.Assert(obj is Message);
    return obj as Message;
}

Any assistance is appreciated.
 

Steve Walker

In message, schoenfeld1 wrote:
I've implemented IPC between two applications using named pipes and
binary serialization, but have noticed that the binary formatter is
rather slow.

It seems that the binary formatter reflects over the entire type every time
it is invoked to serialize/deserialize an object of that type.

Is there a way to prepare the binary formatter with a pre-defined type,
such that it only reflects once but can be re-used to
serialize/deserialize objects of that type without performing
reflection again?

Implement ISerializable and write your own serialization code. If you're
passing anything substantial you can get a significant performance
improvement. The downside is the potential maintenance costs, since you
will probably have to touch your serialization code whenever members are
added to your class. I wouldn't do it unless I had to, and I think I've
only ever done it once in production code.
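For illustration, here is a minimal sketch of what that looks like. The `Message` fields here (`id`, `body`) are assumptions for the example, not taken from the thread:

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Message : ISerializable
{
    private int id;       // hypothetical fields; the real Message
    private string body;  // class's members are not shown in the thread

    public Message(int id, string body)
    {
        this.id = id;
        this.body = body;
    }

    // Special constructor the formatter calls during deserialization.
    protected Message(SerializationInfo info, StreamingContext context)
    {
        id = info.GetInt32("id");
        body = info.GetString("body");
    }

    // Hand-written field list; the formatter no longer has to
    // reflect over the type to discover what to serialize.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("id", id);
        info.AddValue("body", body);
    }
}
```

Note the reads in the deserialization constructor must use the same names passed to `AddValue`.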
 

schoenfeld1

Steve said:
Implement ISerializable and write your own serialization code. If you're
passing anything substantial you can get a significant performance
improvement. The downside is the potential maintenance costs, since you
will probably have to touch your serialization code whenever members are
added to your class. I wouldn't do it unless I had to, and I think I've
only ever done it once in production code.

This is not a problem at all. My objects are not complex; they are
just frequently serialized/deserialized. Do you think custom
serialization would provide a significant performance boost in this
case, or should I just implement my own object serialization?
 

Steve Walker

In message, schoenfeld1 wrote:

This is not a problem at all. My objects are not complex at all. They
are just frequently serialized/deserialized. Do you think custom
serialization would provide significant performance boost in this case,
or should I just implement my own object serialization?

I think you'll probably have to benchmark it to find out.

If I remember rightly, the project I used it in was serializing a large
collection of Person objects, each of which contained subobjects. We
flattened each person and subobjects into a struct and stuffed an array
of those into SerializationInfo. I've no idea whether what we did was
optimal or not, there wasn't time to experiment further, but it did get
us a massive improvement in performance over the default mechanism.

If you aren't asking too much of the default mechanism (as we were) to
begin with, the gain may be less significant.
 

schoenfeld1

Steve said:
I think you'll probably have to benchmark it to find out.

If I remember rightly, the project I used it in was serializing a large
collection of Person objects, each of which contained subobjects. We
flattened each person and subobjects into a struct and stuffed an array
of those into SerializationInfo. I've no idea whether what we did was
optimal or not, there wasn't time to experiment further, but it did get
us a massive improvement in performance over the default mechanism.

If you aren't asking too much of the default mechanism (as we were) to
begin with, the gain may be less significant.

Thanks a lot for your help. I've seen a sufficient performance
increase with custom serialization, rendering this a non-issue.
 

schoenfeld1


For future lurkers, I've corrected the bugs in the previous code
snippet: the streams are now closed when done, the MemoryStream
capacity is no longer pre-allocated when serializing, and GetBuffer()
has been replaced with ToArray(), since GetBuffer() can return extra
unused bytes from the stream's internal buffer.

static public byte[] Serialize(Message message) {
    #region Pre-conditions
    Debug.Assert(message != null);
    #endregion
    MemoryStream memStream = new MemoryStream();
    BinaryFormatter binFormatter = new BinaryFormatter();
    binFormatter.Serialize(memStream, message);
    // ToArray() copies exactly the bytes written; GetBuffer() would
    // return the whole internal buffer, including unused capacity.
    byte[] data = memStream.ToArray();
    memStream.Close();
    return data;
}


static public Message Deserialize(byte[] data) {
    #region Pre-conditions
    Debug.Assert(data != null);
    #endregion
    BinaryFormatter binFormatter = new BinaryFormatter();
    MemoryStream memStream = new MemoryStream(data);
    object obj = binFormatter.Deserialize(memStream);
    memStream.Close();
    Debug.Assert(obj is Message);
    return obj as Message;
}
 

Michael S

schoenfeld1 said:
This is not a problem at all. My objects are not complex at all. They
are just frequently serialized/deserialized. Do you think custom
serialization would provide significant performance boost in this case,
or should I just implement my own object serialization?

If it's fairly simple objects I would implement my own serialization.
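A hand-rolled sketch of that, assuming a hypothetical Message holding just an int and a string (the real class's members aren't shown in the thread), would use BinaryWriter/BinaryReader directly and skip the formatter, and with it all reflection, entirely:

```csharp
using System;
using System.IO;

// Hypothetical payload; an int and a string stand in for the
// real Message members, which are not shown in the thread.
public class Message
{
    public int Id;
    public string Body;
}

public static class MessageCodec
{
    public static byte[] Serialize(Message message)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(message.Id);    // 4 bytes, fixed size
            writer.Write(message.Body);  // length-prefixed string
            writer.Flush();
            return stream.ToArray();     // exactly the bytes written
        }
    }

    public static Message Deserialize(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            Message message = new Message();
            // Reads must mirror the writes, in the same order.
            message.Id = reader.ReadInt32();
            message.Body = reader.ReadString();
            return message;
        }
    }
}
```

The trade-off is the same one Steve mentioned: the read/write code must be kept in sync by hand whenever the class gains a member.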

Also, if the objects are constantly being created, loaded, used, saved
and forgotten, I would consider using an object pool for unused objects
and re-using instances rather than creating new ones.

Something like this:

MyObj myObj = MyObjPool.Pop();
if (myObj == null)
{
    myObj = new MyObj();  // pool was empty; create a fresh instance
}
myObj.Load(someId);
myObj.Use();
myObj.Save(someId);
MyObjPool.Push(myObj);  // return the instance to the pool for re-use

Happy Pooling
- Michael S
 

Steve Walker

In message, schoenfeld1 wrote:

Thanks a lot for your help. I've noticed a sufficient performance
increase with custom serialization rendering this a non-issue.

Welcome.

Just had a play with this. Looks like a collection of:

class Foo
{
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    string g;
    string h;
    string i;
    string j;
    string k;
    string l;
}

serializes a lot faster than a collection of:

class Bar
{
    private Ints ints;
    string g;
    string h;
    string i;
    string j;
    string k;
    string l;

    private class Ints
    {
        int a;
        int b;
        int c;
        int d;
        int e;
        int f;
    }
}

and so performance-wise it's worthwhile implementing ISerializable in
the collection:

Assuming we've defined (Foo) and (Bar) as operators to convert between
the two:

public BarCollection(SerializationInfo si, StreamingContext context)
{
    Foo[] foos = (Foo[])si.GetValue("stuff", typeof(Foo[]));

    foreach(Foo item in foos)
    {
        this.Add((Bar)item);
    }
}

public void GetObjectData(SerializationInfo info,
                          StreamingContext context)
{
    Foo[] foos = new Foo[this.Count];

    for(int i = 0; i < this.Count; i++)
    {
        foos[i] = (Foo)this[i];
    }

    info.AddValue("stuff", foos);
}



I still wouldn't do it unless something was really too slow, but
interesting all the same.

So, anybody know enough about how the default binary serialization
mechanism works under the hood to explain the difference?
 
