How do I inflate compressed data from an asynchronous socket?

P

Pat B

Hi, I'm writing my own implementation of the Gnutella P2P protocol
using C#. I have implemented it using BeginReceive and EndReceive
calls so as not to block when waiting for data from the supernode.
Everything I have written works fine sending and receiving
uncompressed data. But now I want to implement compression using the
deflate algorithm as the Gnutella protocol accepts:
Accept-Encoding: deflate
Content-Encoding: deflate
in the HTTP connection headers.

At this stage I can't get the receive to work properly. I need this
first before I can tell if the send is working or not.

First I attempted to use the existing DeflateStream(new
NetworkStream(socket)) and call BeginRead and EndRead, but I kept
getting errors. (Can't remember what they were, this was a few days
ago now).

So I then tried using the SharpZipLib with a little bit of success. I
initially used the InflaterInputStream(new NetworkStream(socket)) but
it was causing too many exceptions when the supernode cut the
connection during the receive. The problem with this is, because I am
using asynchronous callbacks, the exceptions are not caught in my
code.

So I implemented it using the existing socket calls and passed the
received data to an Inflater and called SetInput and Inflate. With a
large receive buffer size I can sometimes uncompress some data from
the supernode. However when I shrink the buffer to a more manageable
size, I start getting "broken uncompressed block" exceptions.

I believe the problem is that I am trying to uncompress an incomplete
packet of data. But I am not sure how else to do it, because I have to
maintain a permanent connection to the supernode, so the data never
stops coming, and there is no way to tell how big individual messages
will be until I have them and have processed them.

I don't think a MemoryStream is the right way to go either, because if
I buffer all data from the session, due to the permanent connection,
eventually I will eat up all my memory.

Does anyone know how to inflate segmented chunks of data without
knowledge of the packet size?

Thanks,
Pat.
 
P

Peter Duniho

Keeping in mind that I don't have first-hand knowledge of your specific
problem...

[...]
First I attempted to use the existing DeflateStream(new
NetworkStream(socket)) and call BeginRead and EndRead, but I kept
getting errors. (Can't remember what they were, this was a few days
ago now).

I'm not clear on why you would call BeginRead, or any kind of read, using
this mechanism. Once you've successfully wrapped the socket in a
NetworkStream, and passed that NetworkStream to DeflateStream, I would
expect that DeflateStream would at that point take over the network i/o by
accessing the NetworkStream directly. Of course, you would want this to
happen in its own thread so as to not block your GUI.

What is it that you expect to be able to do when your receive callback
passed to BeginRead is called?
So I then tried using the SharpZipLib with a little bit of success. I
initially used the InflaterInputStream(new NetworkStream(socket)) but
it was causing too many exceptions when the supernode cut the
connection during the receive. The problem with this is, because I am
using asynchronous callbacks, the exceptions are not caught in my
code.

You can broaden the chances of getting useful information by defining
terms not specific to C# or .NET Framework (the only real common ground
you are assured of here). For example, what's a "supernode" and how is it
relevant to your question? I assume it has something to do with the
Gnutella protocol, but that's not something I've spent time studying.

As far as the exceptions go, I have the same question as above: if you are
passing the NetworkStream to another component, why would you expect to
handle the receive yourself at all?
So I implemented it using the existing socket calls and passed the
received data to an Inflater and called SetInput and Inflate. With a
large receive buffer size I can sometimes uncompress some data from
the supernode. However when I shrink the buffer to a more manageable
size, I start getting "broken uncompressed block" exceptions.

I can't speak specifically about the library you're using for
decompression, but it may assume that you are passing a complete
compressed block of data to it in each operation. So if you don't
accumulate all of your data first before passing it to the decompressor, I
certainly could see why you'd see it complain about incomplete (or
"broken") blocks.
I believe the problem is that I am trying to uncompress an incomplete
packet of data. But I am not sure how else to do it, because I have to
maintain a permanent connection to the supernode, so the data never
stops coming, and there is no way to tell how big individual messages
will be until I have them and have processed them.

I don't really know. If the Gnutella protocol doesn't provide a way to
tell you in advance how long a single compressed block of data will be,
nor does it have a mechanism for delimiting a single compressed block of
data, _and_ the decompressor you're using does not accept partial blocks
of data, nor does it have a mechanism for reading a stream of data and
periodically emitting decompressed blocks as they are completed, I don't
really see a solution to your problem.

If you find a way to eliminate just one of the above factors, then I
believe that would be your solution.

It seems to me that, based on a couple of the methods you mentioned
(specifically, the two tht involve passing a NetworkStream to something),
I would think that you should not have any problem as long as the library
can deal with somehow telling you when it's finished a block. It would do
this either by continuing to run, but somehow informing you of a completed
block (for example, raising a .NET event), or it could only read the
stream up to the end of the block and then complete, at which point you
would create a new decompressor instance using the same NetworkStream
object so that it can start reading from where the last one left off.

But if your decompressors get confused when you provide a stream that's
too large, I don't have any ideas.

Finally, one thing to check on: you talking about "message" and "packet
size", but assuming you're dealing with TCP there's no such thing at the
network protocol level. If you are trying to receive the data manually
and don't take into account the fact that when you read from a TCP socket,
you can wind up getting back anywhere between 1 byte and the total number
of bytes sent since the previous byte most recently received, then it is
possible and quite likely to wind up with corrupted data.

Make sure you have this last point addressed before you move on to the
more specific stuff. :)

Pete
 
P

Pat B

Thanks Pete.

When I was receiving the raw data I was not using NetworkStream, I was
just using the socket itself. I was calling BeginReceive and
EndReceive because I don't want the read to block. It is much simpler
to make use of the asynchronous calls than to implement threads or
thread pools.

Example in pseudo code with fictional functions:

OnConnect
{
socket.BeginReceive(tempbuffer[256], OnReceive)
}

OnReceive
{
read = socket.EndReceive()
readData += CopyLeftBytes(tempbuffer, read)
ProcessData(readData)
socket.BeginReceive(tempbuffer[256], OnReceive)
}

ProcessData
{
while (true)
{
msgLength = ReadMessageLengthFrom(readData)
if (msgLength == 0 || msgLength > readData.Length)
{
// wait for more data
return
}

message = CopyLeftBytes(readData, msgLength)
RemoveLeftBytes(readData, msgLength)
// process single message
}
}

This works without compression. The size of the messages coming from
the supernode (server) is specified within the data itself. The
problem with adding compression is that you first have to decompress
the data before you can read out the message length.

First I tried:

OnConnect
{
readStream = new SharpZipLib.InflaterInputStream(new
NetworkStream(socket))
readStream.BeginRead(tempbuffer[256], OnReceive)
}

OnReceive
{
read = readStream.EndRead()
readData += CopyLeftBytes(tempbuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)
}

ProcessData
{
...
}

This works... when everything works. The problem is when the supernode
(server) terminates the connection while you are waiting for data, or
if some invalid data is sent by the supernode, the exception is thrown
on a different thread (initiated by windows?) and so I cannot catch it
an clean up. This is because the Begin and End are asynchronous and
the exception is being thrown in the middle.

Using the dotNET DeflateStream in the place of
SharpZipLib.InflaterInputStream blew up with the error: "Block length
does not match with its complement."

So then I tried this:

OnConnect
{
inflater = new SharpZipLib.Inflater()
socket.BeginReceive(tempbuffer[256], OnReceive)
}

OnReceive
{
read = readStream.EndRead()
inflater.SetInput(CopyLeftBytes(tempbuffer, read))
read = inflater.Inflate(inflateBuffer[1024])
readData += CopyLeftBytes(inflateBuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)
}

ProcessData
{
...
}

This works only when tempBuffer is large enough to receive the entire
packet before passing it into the inflater. If not it blows up on the
inflater.Inflate(...) line with the "broken uncompressed block" error.

- So in each case I am catering for different size blocks of data
coming from the socket.
- I have no way of knowing how much data to expect until after I have
decompressed it.
- I don't want to have to implement using thread pools and blocking
calls.
- Using streams may be the solution, only if there is a good way of
catching the exceptions from the asynchronous reads.

Maybe I am supposed to expect the error from the inflater and take
that as an indication to wait for more data before trying to
decompress again?

Thanks,
Pat.
 
P

Pat B

Maybe I am supposed to expect the error from the inflater and take
that as an indication to wait for more data before trying to
decompress again?

Nope. Doesn't work. Same error.
 
P

Peter Duniho

Again, keeping mind my lack of first-hand knowledge with the exact
situation...

When I was receiving the raw data I was not using NetworkStream, I was
just using the socket itself. I was calling BeginReceive and
EndReceive because I don't want the read to block. It is much simpler
to make use of the asynchronous calls than to implement threads or
thread pools.

Yes, I agree that the asynchronous methods of the Socket class are simple
and convenient. Do keep in mind that you are still using the thread pool
though; you just do it implicitly in this case, rather than explicitly.
[...]
First I tried:

OnConnect
{
readStream = new SharpZipLib.InflaterInputStream(new
NetworkStream(socket))
readStream.BeginRead(tempbuffer[256], OnReceive)
}

OnReceive
{
read = readStream.EndRead()
readData += CopyLeftBytes(tempbuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)
}

ProcessData
{
...
}

This works... when everything works. The problem is when the supernode
(server) terminates the connection while you are waiting for data, or
if some invalid data is sent by the supernode, the exception is thrown
on a different thread (initiated by windows?) and so I cannot catch it
an clean up. This is because the Begin and End are asynchronous and
the exception is being thrown in the middle.

I see. So the decompressor (SharpZipLib) exposes a stream from which you
can read. IMHO, it seems to me that SharpZipLib ought to provide a
mechanism for cleanly dealing with errors on the NetworkStream you pass to
it. It could do this in a variety of ways, but I can't comment on whether
or how it does, never having used it.

If it does in fact provide no mechanism, even though it should, then one
thing you might consider is writing your own intermediate Stream class to
handle the errors. You would have this class simply forward all the basic
calls to the NetworkStream it contains, so that the SharpZipLib is calling
your class rather than NetworkStream directly. That way if and when an
exception or other error occurs, you have a chance to deal with it before
propogating it back to the SharpZipLib class.
Using the dotNET DeflateStream in the place of
SharpZipLib.InflaterInputStream blew up with the error: "Block length
does not match with its complement."

So then I tried this:

OnConnect
{
inflater = new SharpZipLib.Inflater()
socket.BeginReceive(tempbuffer[256], OnReceive)
}

OnReceive
{
read = readStream.EndRead()
inflater.SetInput(CopyLeftBytes(tempbuffer, read))
read = inflater.Inflate(inflateBuffer[1024])
readData += CopyLeftBytes(inflateBuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)
}

ProcessData
{
...
}

This works only when tempBuffer is large enough to receive the entire
packet before passing it into the inflater. If not it blows up on the
inflater.Inflate(...) line with the "broken uncompressed block" error.

This sounds to me like the same error you get using the "dotNET
DeflateStream" mechanism: essentially, you've tried to decompress data in
one shot without having the complete data available to decompress. No
surprise then that doesn't work.
- So in each case I am catering for different size blocks of data
coming from the socket.

Yes, it seems to me that you're correctly handing the basic network
receiving correctly.
- I have no way of knowing how much data to expect until after I have
decompressed it.

This seems to me to be a fundamental mistake in the Gnutella protocol,
which requires you to make at least some extra effort beyond that which
should be necessary. I suspect you can get SharpZipLib to do what you
want, but in the worst case if you couldn't, the Gnutella protocol would
put you in a position where you'd have to do all the decompressing
yourself in your own code so that you can work around not knowing how much
data you're expecting.

Whether this limitation really exists in Gnutella, I can't say. It may be
that it does, or it may be that you're just overlooking something in the
protocol that would address this.
- I don't want to have to implement using thread pools and blocking
calls.

Well, as I said, you are already using the thread pool implicitly. Still,
it doesn't seem to me that you would wind up in a position where you have
to implicitly do some threading, given the apparent design of the
libraries you're trying touse.
- Using streams may be the solution, only if there is a good way of
catching the exceptions from the asynchronous reads.

See above: I suspect that if you simply write your own Stream-derived
wrapper for the NetworkStream, you can intercept the exceptions before
they get to the SharpZipLib class, allowing you to handle the exceptions
however you see fit.
Maybe I am supposed to expect the error from the inflater and take
that as an indication to wait for more data before trying to
decompress again?

You could do that as well. I don't know why it wouldn't, even though you
say it doesn't. When you tried this, are you sure that you retained the
data at the beginning of your array for future attempts? It seems to me
that if you keep trying to decompress the same data, except that new data
is appended between each attempt, eventually you would have enough data to
decompress and it would succeed. It doesn't seem like a very efficient
approach to the problem though, so I think you should explore the
Stream-derived wrapper solution first.

Pete
 
P

Pat B

Ok, thanks for all your suggestions Pete. I'll give some of them a go
and see what I can come up with.

Thanks,
Pat.
 
R

Rakesh Soni

Thanks Pete.

When I was receiving the raw data I was not using NetworkStream, I was
just using the socket itself. I was calling BeginReceive and
EndReceive because I don't want the read to block. It is much simpler
to make use of the asynchronous calls than to implement threads or
thread pools.

Example in pseudo code with fictional functions:

OnConnect
{
socket.BeginReceive(tempbuffer[256], OnReceive)

}

OnReceive
{
read = socket.EndReceive()
readData += CopyLeftBytes(tempbuffer, read)
ProcessData(readData)
socket.BeginReceive(tempbuffer[256], OnReceive)

}

ProcessData
{
while (true)
{
msgLength = ReadMessageLengthFrom(readData)
if (msgLength == 0 || msgLength > readData.Length)
{
// wait for more data
return
}

message = CopyLeftBytes(readData, msgLength)
RemoveLeftBytes(readData, msgLength)
// process single message
}

}

This works without compression. The size of the messages coming from
the supernode (server) is specified within the data itself. The
problem with adding compression is that you first have to decompress
the data before you can read out the message length.

First I tried:

OnConnect
{
readStream = new SharpZipLib.InflaterInputStream(new
NetworkStream(socket))
readStream.BeginRead(tempbuffer[256], OnReceive)

}

OnReceive
{
read = readStream.EndRead()
readData += CopyLeftBytes(tempbuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)

}

ProcessData
{
...

}

This works... when everything works. The problem is when the supernode
(server) terminates the connection while you are waiting for data, or
if some invalid data is sent by the supernode, the exception is thrown
on a different thread (initiated by windows?) and so I cannot catch it
an clean up. This is because the Begin and End are asynchronous and
the exception is being thrown in the middle.

Using the dotNET DeflateStream in the place of
SharpZipLib.InflaterInputStream blew up with the error: "Block length
does not match with its complement."

So then I tried this:

OnConnect
{
inflater = new SharpZipLib.Inflater()
socket.BeginReceive(tempbuffer[256], OnReceive)

}

OnReceive
{
read = readStream.EndRead()
inflater.SetInput(CopyLeftBytes(tempbuffer, read))
read = inflater.Inflate(inflateBuffer[1024])
readData += CopyLeftBytes(inflateBuffer, read)
ProcessData(readData)
readStream.BeginRead(tempbuffer[256], OnReceive)

}

ProcessData
{
...

}

This works only when tempBuffer is large enough to receive the entire
packet before passing it into the inflater. If not it blows up on the
inflater.Inflate(...) line with the "broken uncompressed block" error.

- So in each case I am catering for different size blocks of data
coming from the socket.
- I have no way of knowing how much data to expect until after I have
decompressed it.
- I don't want to have to implement using thread pools and blocking
calls.
- Using streams may be the solution, only if there is a good way of
catching the exceptions from the asynchronous reads.

Maybe I am supposed to expect the error from the inflater and take
that as an indication to wait for more data before trying to
decompress again?

Thanks,
Pat.

=========================================

Hi ,

I also want to inflate compressed data from an asynchronous socket but
I would like to inflate ones data is received by BeginReceive and
EndReceive. Is that feasible ? or I have to replace this socket
calls with InflaterInputStream - > BeginRead and EndRead ?

Thanks
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top