Reading TCP data stream and finding an End of line

Tom · Jul 20, 2007

This may be more theory than code.

I am currently using a 3rd party TCP client socket tool to read in
data from a connection. The tool supports an End of Line (EOL)
character. The problem is that the data that is coming in does not
have an EOL character or set of characters. The first 4 bytes of the
data contain a set of defined characters "ABCD". The next 4 bytes
give the size of the whole packet, "0012". After that, there is a
random length of data.

Of course, the TCP tool's DataIn function fires for partial messages,
which I have to assemble in a byte[] buffer, while waiting for the
rest of the message. Well, I only have the length of the message to go
on. If I use the starting characters "ABCD" as the EOL, the last
message in will not get processed until another arrives. This is to be
expected. I could count the number of characters until the message
length, copy out the message, but would then have to shift the entire
buffer to the left in case part of another message was in the previous
transmission. Or is there a natural break so that one event cannot
have two separate partial TCP packets? I know that this is probably a
3rd party question.

What is a good way to deal with something like this?

Thanks,
Tom

Peter Duniho · Jul 20, 2007

> [...]
> Of course, the TCP tool's DataIn function fires for partial messages,
> which I have to assemble in a byte[] buffer, while waiting for the
> rest of the message. Well, I only have the length of the message to go
> on.

Then you need to use that.

> If I use the starting characters "ABCD" as the EOL, the last
> message in will not get processed until another arrives. This is to be
> expected. I could count the number of characters until the message
> length, copy out the message, but would then have to shift the entire
> buffer to the left in case part of another message was in the previous
> transmission.

Yes, shifting the data is a waste of time, and there are better ways to
deal with it. For example, use multiple buffers, put into a queue. Keep
track of the next byte offset to be read from the current buffer, updating
that each time you copy out any data. Discard a buffer once you've read
all of the bytes from it.
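
The queue-of-buffers idea above can be sketched roughly as follows. This is
a minimal illustration in Python rather than .NET, chosen only for brevity;
the class name ChunkQueue and its methods are hypothetical, not part of any
library. Each received buffer is kept as-is, a single offset tracks the next
unread byte in the oldest buffer, and a buffer is dropped once fully read,
so nothing is ever shifted.

```python
from collections import deque


class ChunkQueue:
    """Queue of received buffers plus a read offset into the oldest one.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        self._chunks = deque()  # received buffers, in arrival order
        self._offset = 0        # next unread byte within self._chunks[0]
        self._available = 0     # total unread bytes across all buffers

    def __len__(self):
        return self._available

    def push(self, data):
        """Append one received buffer; empty buffers are ignored."""
        if data:
            self._chunks.append(bytes(data))
            self._available += len(data)

    def read(self, n):
        """Remove and return exactly n bytes; raise if fewer are queued."""
        if n > self._available:
            raise ValueError("not enough data queued")
        parts = []
        remaining = n
        while remaining > 0:
            head = self._chunks[0]
            take = min(remaining, len(head) - self._offset)
            parts.append(head[self._offset:self._offset + take])
            self._offset += take
            remaining -= take
            if self._offset == len(head):
                # Every byte of this buffer has been read: discard it.
                self._chunks.popleft()
                self._offset = 0
        self._available -= n
        return b"".join(parts)
```

A caller would push() each buffer as it arrives, check len() against the
number of bytes it needs next (8 for the header, then the message body), and
only then call read().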

> Or is there a natural break so that one event cannot
> have two separate partial TCP packets?

There's no such thing as a "TCP packet", and there is no "natural break"
in the stream of TCP data. With TCP, the data may be grouped in any
arbitrary grouping. The only guarantee is that the bytes will be received
in the same order in which they are sent, assuming they are received at
all.

> What is a good way to deal with something like this?

The first step is to stop thinking of data you receive over TCP as
"packets". Even if the application protocol defines "packets" or
"messages" or whatever you want to call it, TCP doesn't know anything
about that. IMHO, if you can conceptually (that is, in your own mind)
separate the TCP communications _completely_ from your application
protocol, this leads to better solutions.

Every time you write or think something like "partial TCP packets", you
are leading your own mental concept in the wrong direction, whether you
realize it or not. IMHO, this makes it harder to discover the correct
solution.

IMHO, the second step is probably to stop using the third party "TCP
client socket tool". .NET provides a very nice Socket class, as well as a
TcpClient class that encapsulates some of the more basic things you'd want
to do with a TCP connection. Part of your problem here is that because
the third party tool offers a feature (supporting the idea of an "end of
line" marker) you are getting bogged down thinking that you somehow need
to use that feature.

As you've noted, your data does not have an "end of line" marker. You
can't use any concept of "end of line" to handle your data. Even if you
did have an "end of line" marker, it's not hard to handle this explicitly
rather than using some third party library to do it for you.

Especially since the third party tool does not appear to be taking over
any of the work to actually build up application-level messages anyway,
it's hard for me to see what value it offers. It doesn't seem like it
could be offering much.

As far as the specific solution goes...

There are a variety of ways to address this. The simplest is to keep
track of how many bytes you are capable of processing, and only attempt to
receive that many at once. So, in your example, you would start out by
receiving 8 bytes, which would be the signature (4 bytes) and the message
length (4 bytes). Then, once you know how long the message is, receive
only the number of bytes that will compose that message. You control the
number of bytes you receive via the length of the buffer you pass to the
receive method, of course. This is VERY inefficient, but if your
communications do not involve a lot of data, that should not be a problem.
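
As a rough sketch of this simple approach (in Python rather than .NET, for
brevity): the helper names recv_exact and read_message are hypothetical, and
the code assumes the 4-byte length field is ASCII decimal and counts the
whole message including the 8-byte header, as the "0012" example suggests.
Note that even when asking for exactly n bytes, a receive call may return
fewer, so the loop is still needed.

```python
import socket

SIGNATURE = b"ABCD"   # 4-byte signature from the original post
HEADER_SIZE = 8       # signature (4 bytes) + ASCII length field (4 bytes)


def recv_exact(sock, n):
    """Receive exactly n bytes, looping because recv may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))  # never ask for more than we need
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return bytes(buf)


def read_message(sock):
    """Read one message and return its payload (the bytes after the header).

    Assumption: the length field is ASCII decimal and includes the header.
    """
    header = recv_exact(sock, HEADER_SIZE)
    if header[:4] != SIGNATURE:
        raise ValueError("bad signature")
    length_field = header[4:8]
    if not length_field.isdigit():
        raise ValueError("length field is not ASCII decimal")
    total = int(length_field.decode("ascii"))
    if total < HEADER_SIZE:
        raise ValueError("length smaller than header")
    return recv_exact(sock, total - HEADER_SIZE)
```

Because each call asks only for the bytes of the current message, no data
from the next message is ever pulled into the buffer, at the cost of at
least two receive calls per message.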

A better way is to disconnect the network i/o from the data handling
altogether. This is where not thinking about the TCP stream as having
"packets" is useful. You need a layer between your application and your
network i/o that handles "packetizing" the TCP stream. There are a
variety of ways to do this, and the "best" way depends on your own
application to some degree, but the basic idea is to design the network
i/o to be efficient relative to how network i/o works, and to design the
application data handling to be efficient relative to how the application
works. The layer in between maps the efficient network i/o to the
efficient application i/o.
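
One possible shape for such an in-between layer, sketched in Python rather
than .NET for brevity. The class name MessageFramer is hypothetical, and the
sketch assumes the length field is ASCII decimal and counts the whole
message including the 8-byte header. The network side hands over whatever
bytes arrived, in whatever grouping; the application side gets back only
complete messages.

```python
SIGNATURE = b"ABCD"   # 4-byte signature from the original post
HEADER_SIZE = 8       # signature (4 bytes) + ASCII length field (4 bytes)


class MessageFramer:
    """Turns an arbitrary sequence of byte chunks into complete messages.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        self._buf = bytearray()  # bytes received but not yet consumed

    def feed(self, chunk):
        """Add received bytes; return a list of completed payloads."""
        self._buf += chunk
        messages = []
        pos = 0  # start of the next unparsed message within self._buf
        while len(self._buf) - pos >= HEADER_SIZE:
            header = bytes(self._buf[pos:pos + HEADER_SIZE])
            if header[:4] != SIGNATURE:
                raise ValueError("bad signature")
            length_field = header[4:8]
            if not length_field.isdigit():
                raise ValueError("length field is not ASCII decimal")
            total = int(length_field.decode("ascii"))
            if total < HEADER_SIZE:
                raise ValueError("length smaller than header")
            if len(self._buf) - pos < total:
                break  # message not complete yet; wait for more data
            messages.append(bytes(self._buf[pos + HEADER_SIZE:pos + total]))
            pos += total
        # Drop consumed bytes once per call rather than once per message.
        del self._buf[:pos]
        return messages
```

The same stream produces the same messages however it is split: feeding
b"ABCD0012wxyzAB" and then b"CD0010hi" yields the payload b"wxyz" from the
first call and b"hi" from the second. This version trims a single buffer
once per call for simplicity; a queue of buffers with a read offset would
avoid even that copy.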

IMHO, the goal here would be to design something that conceptually makes
sense to you, without worrying too much about the efficiency. Obviously,
you should worry a little bit...otherwise, it'd be better to keep it
simple and do the inefficient thing I mentioned above. But other than
that, don't waste time worrying too much about efficiency when the first
goal should be to get it to work.

Pete

Tom · Jul 23, 2007

Thank you Pete,

That was very helpful and informative. I am going to try and separate
the TCP from the app, by trying to use a queue. I will just push the
data from the TCP connection onto a queue. The queue will handle the
parsing of the beginning and end, and will then handle the
distribution to the client app. I will see if I can get that working.

Thanks,
Tom
