Getting serialized data size

John J. Hughes II

Using the code below I am sending multiple serialized objects across an IP port.
This works fine if only one object is received at a time, but with packing
there is sometimes more than one object, or half an object, in the received
data. If I place the data in a memory stream on the receiving side, is there
a way to determine where one object ends and the next one starts?

Since deserializing seems to move the stream position, I am trying to look
at the next couple of bytes to determine whether there is more data in the
stream. It seems that serialized data starts with 0, 1, but I cannot confirm
this. Is there perhaps a rule?

I have also noticed that the number of bytes received is always larger than
the number of bytes deserialized; any clue why?

CODE 1

IFormatter f = new BinaryFormatter();
MemoryStream ms = new MemoryStream();
f.Serialize(ms, cmd);
// Note: GetBuffer() returns the stream's whole internal buffer (its
// capacity), not just the ms.Length bytes actually written, so this can
// transmit trailing padding; ToArray() would send exactly the payload.
byte[] b = ms.GetBuffer();
this.socket.BeginSend(b, 0, b.Length, SocketFlags.None,
    new AsyncCallback(Send_Callback), this.socket);

CODE 2

IFormatter i = new BinaryFormatter();
MemoryStream ms = new MemoryStream(o, 0, len);
ms.Position = 0;
while (ms.Position < len)
{
    long position = ms.Position;
    if (!(ms.ReadByte() == 0 && ms.ReadByte() == 1))
        break;
    ms.Position = position;

    object obj = i.Deserialize(ms);
    // .....
}

Regards,
John
 
Sahil Malik

John,

How are you getting the value of "len" in CODE 2? Did you try looking at
ms.Length after serialization? Maybe you could send that in some sort of
delimited fashion over the wire.
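
For example, a minimal sketch of that idea, assuming a 4-byte length prefix
(SendFrame, ReadFrame, and the framing itself are illustrative, not from
John's code):

using System;
using System.IO;
using System.Net.Sockets;
using System.Runtime.Serialization.Formatters.Binary;

static class Framing
{
    // Serialize the object, then send its exact length ahead of the payload.
    public static void SendFrame(Socket socket, object graph)
    {
        MemoryStream ms = new MemoryStream();
        new BinaryFormatter().Serialize(ms, graph);
        byte[] payload = ms.ToArray();                 // exactly ms.Length bytes
        socket.Send(BitConverter.GetBytes(payload.Length));
        socket.Send(payload);
    }

    // Read the 4-byte length, then exactly that many payload bytes.
    public static object ReadFrame(Socket socket)
    {
        int length = BitConverter.ToInt32(ReadExactly(socket, 4), 0);
        byte[] payload = ReadExactly(socket, length);
        return new BinaryFormatter().Deserialize(new MemoryStream(payload));
    }

    // TCP delivers a byte stream, so a single Receive may return less than
    // asked for; loop until the requested count has arrived.
    static byte[] ReadExactly(Socket socket, int count)
    {
        byte[] buffer = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = socket.Receive(buffer, read, count - read, SocketFlags.None);
            if (n == 0)
                throw new IOException("connection closed mid-frame");
            read += n;
        }
        return buffer;
    }
}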

- SM
 
Helge Jensen

John said:
Using the code below I am sending multiple serialized objects across an IP port.
This works fine if only one object is received at a time, but with packing
there is sometimes more than one object, or half an object, in the received
data. If I place the data in a memory stream on the receiving side, is there
a way to determine where one object ends and the next one starts?

Through _painstaking_ tracing and hours of debugging, thinking that a
network card driver was buggy, I have found that:

- SoapFormatter.Deserialize (sometimes) reads more data than
SoapFormatter.Serialize generates (due to buffered reads).

- BinaryFormatter.Deserialize doesn't always read all the data
generated by BinaryFormatter.Serialize, and will Close() the stream
when done deserializing!

So basically, you cannot send multiple serialized objects down the same
stream without being really lucky, especially after .NET SP1. The
SoapFormatter used to work most of the time before .NET SP1.

My solution for sending multiple serialized objects, in a streaming way,
inside the same transport Stream is to implement a Stream that does
chunk-length encoding/decoding of the amount of data written.

So I have a BlockSubStreamWriter, which writes a 16-bit length followed by
the data for each write, and writes a 16-bit zero length and flushes to
mark Close(); and a BlockSubStreamReader, which reads a 16-bit length and
then the corresponding data, reporting EOF when the length is 0, and
"eating" the rest of the stream until a zero length is received if the
BlockSubStreamReader is Close()d or Dispose()d without having read a
zero length.

So now I can say

new BinaryFormatter().Serialize(new BlockSubStreamWriter(stream), graph);
graph = new BinaryFormatter().Deserialize(new BlockSubStreamReader(stream));
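
A minimal sketch of what such framing streams might look like (the class
names follow Helge's description; everything else, including the
little-endian ushort headers and the chunk splitting, is an assumption, not
the actual ksio code):

using System;
using System.IO;

// Frames each write as [ushort length][bytes]; Close() emits a zero length
// and flushes, leaving the underlying transport open for the next object.
public class BlockSubStreamWriter : Stream
{
    readonly Stream inner;
    public BlockSubStreamWriter(Stream inner) { this.inner = inner; }

    public override void Write(byte[] buf, int off, int count)
    {
        while (count > 0)                       // split writes > 65535 bytes
        {
            int chunk = Math.Min(count, ushort.MaxValue);
            inner.Write(BitConverter.GetBytes((ushort)chunk), 0, 2);
            inner.Write(buf, off, chunk);
            off += chunk; count -= chunk;
        }
    }

    public override void Close()
    {
        inner.Write(BitConverter.GetBytes((ushort)0), 0, 2);   // end marker
        inner.Flush();
    }

    public override void Flush() { inner.Flush(); }
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position { get { throw new NotSupportedException(); } set { throw new NotSupportedException(); } }
    public override int Read(byte[] b, int o, int c) { throw new NotSupportedException(); }
    public override long Seek(long o, SeekOrigin g) { throw new NotSupportedException(); }
    public override void SetLength(long v) { throw new NotSupportedException(); }
}

// Reads chunks until a zero length (EOF); Close() "eats" any unread chunks
// so the next BlockSubStreamReader starts at a frame boundary.
public class BlockSubStreamReader : Stream
{
    readonly Stream inner;
    int left;       // unread bytes in the current chunk
    bool eof;
    public BlockSubStreamReader(Stream inner) { this.inner = inner; }

    public override int Read(byte[] buf, int off, int count)
    {
        if (count == 0) return 0;
        if (left == 0)
        {
            if (eof) return 0;
            left = BitConverter.ToUInt16(ReadFully(2), 0);
            if (left == 0) { eof = true; return 0; }
        }
        int n = inner.Read(buf, off, Math.Min(count, left));
        if (n == 0) throw new IOException("transport closed mid-chunk");
        left -= n;
        return n;
    }

    byte[] ReadFully(int count)
    {
        byte[] buf = new byte[count];
        for (int got = 0; got < count; )
        {
            int n = inner.Read(buf, got, count - got);
            if (n == 0) throw new IOException("transport closed mid-header");
            got += n;
        }
        return buf;
    }

    public override void Close()
    {
        byte[] junk = new byte[4096];
        while (Read(junk, 0, junk.Length) > 0) { }  // skip to the zero length
    }

    public override void Flush() { }
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position { get { throw new NotSupportedException(); } set { throw new NotSupportedException(); } }
    public override void Write(byte[] b, int o, int c) { throw new NotSupportedException(); }
    public override long Seek(long o, SeekOrigin g) { throw new NotSupportedException(); }
    public override void SetLength(long v) { throw new NotSupportedException(); }
}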

On top of this I have a 4-byte magic ID, so I can do
serialize/deserialize using multiple protocols from a simple interface:

public interface IObjectChannel : IDisposable
{
    /// <summary>
    /// Send an object down the channel.
    /// </summary>
    void SendObject(Object o);

    /// <summary>
    /// Read an object from the channel.
    /// </summary>
    /// <exception cref="ObjectChannels.Closed">ObjectChannels.Closed is thrown
    /// if the transport is closed before any data is received.</exception>
    Object ReceiveObject();
}

I have a (rather pretty, if I may say so myself) implementation (and
some rather nifty handshaking and such) making up a transparent way of
shipping objects from one .NET instance to another, including extensive
unit and integration tests.

Clients can now do (/actual code from test/)

using (IObjectChannel c = ksio.ChannelPool.Global.Connect(host, port))
{
    c.SendObject(command);
    Object reply = c.ReceiveObject();
}

This will reuse an existing connection if one is available, and GC
connections that have been idle for more than a specific time (default: 1 minute).

Servers simply do (/actual code from test - implements an "echo service"/)

Socket s = ...;
try
{
    using (NetworkStream stream = new NetworkStream(s, true))
        while (true)
            using (IObjectChannel c = ObjectChannels.Auto.Global.Create(stream))
                c.SendObject(c.ReceiveObject());
}
catch (ObjectChannels.MagicIdObjectChannel.Closed)
{
    // That's the expected way to close
}
catch (Exception e)
{
    Console.WriteLine("UNEXPECTED END\n{0}", e);
}

The entire thing (including a version that does gzip compression on the
data, thanks to ICSharpCode.SharpZipLib) is about 200kb worth of code,
.csproj and .sln files, which I would be happy to publish if someone
wants to check it out.
 
John J. Hughes II

Sahil Malik

Len comes from the socket when I receive the packet... below is the code
that calls the code you saw before. I have another system where I send
compressed data that is larger than the buffer size, so I send the size as
a header and then wait for all the data to show up. This system was intended
to more or less just monitor the stream and act based on the data, so I was
trying to keep the overhead down. Guess I have no choice but to do it the
long way around.

Thanks for the feedback.

/// DisplayData calls a delegate which in turn raises an event that
/// handles the data.
private void AcceptData(IAsyncResult result)
{
    try
    {
        StateObject so = (StateObject)result.AsyncState;
        if (so.workSocket == null || !so.workSocket.Connected)
            return;

        int bytes = so.workSocket.EndReceive(result);

        if (bytes > 0)
        {
            this.displayData(so.buffer, bytes);
        }
        else
        {
            this.stop = true;
        }
        recDone.Set();
    }
    catch (Exception ex)
    {
        System.Diagnostics.Debug.WriteLine(ex.Message);
    }
}
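
Since EndReceive can deliver several frames, or part of one, in a single
callback, the receiving side has to buffer the bytes and hand only complete,
length-prefixed frames to the formatter. A minimal sketch of that reassembly
(FrameBuffer and the 4-byte length prefix are illustrative assumptions, not
John's actual protocol):

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Accumulates raw socket bytes and yields complete [int length][payload] frames.
public class FrameBuffer
{
    private readonly MemoryStream pending = new MemoryStream();

    // Append whatever the socket delivered, then extract every complete frame.
    public List<object> Add(byte[] data, int count)
    {
        pending.Write(data, 0, count);
        List<object> objects = new List<object>();
        BinaryFormatter f = new BinaryFormatter();

        while (true)
        {
            byte[] buf = pending.GetBuffer();
            int have = (int)pending.Length;
            if (have < 4) break;                        // no complete header yet
            int len = BitConverter.ToInt32(buf, 0);
            if (have < 4 + len) break;                  // frame not complete yet

            objects.Add(f.Deserialize(new MemoryStream(buf, 4, len)));

            // Shift the remainder to the front of the buffer.
            int rest = have - (4 + len);
            byte[] tail = new byte[rest];
            Array.Copy(buf, 4 + len, tail, 0, rest);
            pending.SetLength(0);
            pending.Write(tail, 0, rest);
        }
        return objects;
    }
}

AcceptData would then pass so.buffer and bytes to Add() and dispatch each
returned object, instead of handing the raw bytes straight to displayData.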


Regards,
John
 
John J. Hughes II

Helge Jensen

Well, I am not trying to send the data down the same stream. I am just
sending it to a TCP socket which is connected. The problem seems to be that
if the receiving function does not take the data from the stream fast
enough, I end up with more than one data group in a packet. Now the problem
is taking the data apart. I have another program that uses headers when
sending the data; I was just trying to avoid that here.

Thanks for the detailed answer!

Regards,
John
 
Helge Jensen

John said:
Helge Jensen

Well, I am not trying to send the data down the same stream. I am just
sending it to a TCP socket which is connected.

Aren't you serializing objects, sending multiple objects through the
same connection, and deserializing them when they get to their endpoint?
That's what the code reads like to me; otherwise you could simply pass a
new NetworkStream(socket) to Deserialize().

You do NOT want to put the data in a memory stream, *trust* me. It's
because of the .NET garbage collector: if the MemoryStream contains less
than 85kb of data it will be moved for (probably) every garbage collection,
which means copying a large chunk. If the MemoryStream contains more
than 85kb it will be allocated on the large object heap, and .NET is
very reluctant to free that memory.

Try writing a test program that sends a few hundred 1Mb objects through
and watch your machine crumble :)

Kamstrup (my workplace) used to serialize/deserialize to memory when I
got there, sending roughly 1Mb worth of serialized objects. But that
would make Win32 come to a halt, .NET expending >400Mb of memory even
though only 1Mb was sent at a time.

Now, with my new solution, .NET stays below 5Mb of memory usage.

I have also noticed that the number of bytes received is always
larger than the number of bytes deserialized, any clue why?

As I said, BinaryFormatter's Deserialize() doesn't read all the data
that Serialize() writes, AND it Close()s the stream afterwards.

Now the problem is taking the data apart. I have another program that
uses headers when sending the data; I was just trying to avoid that here.

If you *do* write to MemoryStreams, then you can easily attach
data-length headers to the data sent and work around your problem, as I
suspect you are saying here?

My method is just a streaming way of attaching the lengths, instead of
using a MemoryStream, plus some added bells & whistles from my
previous experience with protocols :)

There is *no* way around knowing the lengths generated by Serialize if
you wish to read multiple objects from the same connection. I have found
no simple way to predict how many bytes Deserialize doesn't read.

Thanks for the detailed answer!

No problem. I wrestled this beast for about 25 hours of development time;
I don't want others to go through the same experience.
 
Helge Jensen

The code is now available at http://slog.dk/~jensen/download/ksio.2705.zip,
straight out of Kamstrup's Subversion repository.

You are free to do as you please with the code, and welcome to patch it,
and even more welcome to contribute a patch back :)

To make it compile, remove the prebuild steps (they insert
Subversion-specific information not available in the .zip file) and
remove the svn.cs files from the projects (they are generated in the
prebuild steps).
 
Sahil Malik

John,

My gut feeling is that you shouldn't be doing BinaryFormatter's work; i.e.,
you shouldn't have to figure out Length to do a deserialize reliably.

Also, unless you splice out individual streams per object, your multi-object
stream cannot be reliable. You would be much better off wrapping the objects
into another object that acts as a collection of these various objects,
i.e. BinaryFormatter must have something atomic to work upon.
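
For instance, a minimal sketch of that wrapping idea (the string commands
stand in for whatever serializable objects are actually queued):

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class BatchDemo
{
    static void Main()
    {
        // Wrap all pending objects in one array so the formatter
        // serializes and deserializes a single atomic graph.
        object[] batch = new object[] { "cmd1", "cmd2", "cmd3" };
        MemoryStream ms = new MemoryStream();
        new BinaryFormatter().Serialize(ms, batch);

        ms.Position = 0;
        object[] received = (object[])new BinaryFormatter().Deserialize(ms);
        Console.WriteLine("{0} objects round-tripped", received.Length);
    }
}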

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Sahil Malik

You do NOT want to put the data in a memory stream, *trust* me. It's
because of the .NET garbage collector: if the MemoryStream contains less
than 85kb of data it will be moved for (probably) every garbage collection,
which means copying a large chunk. If the MemoryStream contains more
than 85kb it will be allocated on the large object heap, and .NET is very
reluctant to free that memory.


<-- I was trying to validate that paragraph. Look at the code below.

static void Main(string[] args)
{
    byte[] dummydata = System.Text.Encoding.UTF8.GetBytes(
        "This is some dummydata, who cares what's in it as long as it is " +
        "some decent length. I think we've reached a decent length now.");
    for (int i = 0; i <= 200000; i++)
    {
        Console.WriteLine("Iteration : " + i.ToString());
        MemoryStream ms = new MemoryStream();
        ms.Write(dummydata, 0, dummydata.Length);
    }
    Console.Read();
}

My mem usage didn't go crazy, and really, why should it... MemoryStream is a
reference type.

What am I missing? :)

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik
 
Helge Jensen

Sahil Malik wrote:

[ Sahil Malik trying to reproduce large-object-heap problems ]
What am I missing? :)

You are just writing a small amount to one MemoryStream at a time, which
has O(n) mem-usage; it's not an amortization problem (like the one in ...).

Below is the test program that I originally used to check my assumption.
BTW: that was before .NET SP1, so things may have changed.
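
(The listing itself seems to be missing from the archived post; the following
is a sketch consistent with the description further down: a ~10Mb byte array
bts, serialized to a MemoryStream and deserialized into bts2, repeatedly.)

using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

class LohTest
{
    static void Main()
    {
        byte[] bts = new byte[10 * 1024 * 1024];    // ~10Mb payload
        BinaryFormatter f = new BinaryFormatter();
        for (int i = 0; ; ++i)
        {
            MemoryStream stream = new MemoryStream();
            f.Serialize(stream, bts);               // serialized size ~ bts.Length
            stream.Position = 0;
            byte[] bts2 = (byte[])f.Deserialize(stream);
            Console.WriteLine("{0}: {1} bytes", i, bts2.Length);
        }
    }
}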

I have tried running it again today on SP1, and it doesn't seem to be as
bad anymore. The specific problems I had involved a server that had the
SoapFormatter leak as well, so it may have manifested worse on that machine.

The .NET GC fan crowd is going to come down on me like a ton of bricks,
saying that the memory profile of this is just a feature of the GC: not
collecting that memory, reusing it, and giving it back to the OS if
needed. But my observation at the time was that .NET processes using
the large object heap could grind the machine to a halt, and even cause
out-of-memory situations by not releasing memory no longer referenced by
any object.

The program should use roughly 3*bts.Length[1] of memory, ~30Mb (bts,
stream, bts2), but it ate up all available RAM on a 256Mb machine,
making all other programs swap out, and the machine was unable to
recover from that state; since the Task Manager couldn't start, I was
unable to kill the program.

1. It is checked that the serialization is roughly the size of bts.Length.
 
John J. Hughes II

Helge Jensen

Yes, I am serializing them through the same connection, and if I implement
handshaking, which I have done in the past with a header giving the expected
size, it works fine. Normal protocol stuff: send a header with the size, read
data until the size is reached, ack the data, send the next header/data, etc...

I do try to avoid using a memory stream when possible, if for no other reason
than that it has more overhead than a byte array, but as far as I am aware the
only way to deserialize the data is to create a memory stream. I try to
compensate for the memory issues by setting values to null when done and also
manually disposing of objects when possible. Let's face it, .NET likes to eat
memory and is not always very quick about giving it back.

As I said, I have another application that does handshaking with large data
chunks, which I compress before sending, and I have not really noticed a large
problem with memory usage. I have allocated up to 50 meg chunks, compressed
them down to a couple of megs, and sent them; normally the program does not go
over 150 meg usage, and that is with debugging turned on.

But yes, I agree I am going to have to use some kind of protocol, and I will
not put the data in a memory stream to decode later. I was just hoping when I
started this thread that MS had built something in that I was not seeing;
based on this thread, it does not exist.

Regards,
John
 
John J. Hughes II

Helge Jensen

Wow, thanks for the code... it will take a while to digest, but what I have
seen looks good :)

I put the link and e-mail address in the project, so if I find anything I
will update you.

Regards,
John
 
