Truncated data using .NET 3.5 sockets VS 2008

Kenny D · Mar 24, 2008

I am having trouble sending streams of data from one socket to another
using the following code.

When the transmitter is running on the same machine as the receiver,
the data transmits perfectly. When the transmitter runs on another
machine, the data gets truncated at semi random places.

The data being transmitted is a unicode XML string. There is nothing
special about it other than being unicode.

The message stream is terminated with "</mos>\r\n". I read the stream
1 byte at a time in the code below. I have tried various other "block
read" methods with the same results.

This used to work when I was using .Net 2.0 under VS 2005. Since
upgrading to VS 2008, this no longer works correctly.

Is there anything I am doing wrong based on the code below or is there
something I have overlooked vis-a-vis VS 2008? Is there something
wrong with my network set up that would cause this - a service pack
release or something? I have tested this with the Windows Firewall on
and off - makes no difference.

Any help appreciated.

// The code that writes to the socket...
public void Write(NetworkStream OutputStream, string Msg)
{
Msg += "\r\n";
while (Msg.Contains("\r\n\r\n"))
Msg = Msg.Replace("\r\n\r\n", "\r\n");

byte[] OutputBuffer = Encoding.BigEndianUnicode.GetBytes(Msg);
OutputStream.Write(OutputBuffer, 0, OutputBuffer.GetLength(0));
OutputStream.Flush();
}

// The code that reads the socket...
public string Read(NetworkStream InputStream, string EOM)
{
StreamReader Reader = new StreamReader(InputStream,
Encoding.BigEndianUnicode);
StringBuilder MsgText = new StringBuilder();

Int32 InChar = -1;

while (true)
{

// If the next character == -1, we have been cut off and
disconnected or there was an error
// in the message being sent to the reader. Either way, stop
reading.
if ((InChar = Reader.Peek()) == -1)
{

if (!MsgText.ToString().EndsWith(EOM))
{
string ErrorMessage = string.Format("Error: Incomplete
Message\r\n\r\n{0}\r\n[PREMATURE END OF MESSAGE]",
MsgText.ToString());

MOSNetTraceSource.TraceData(TraceEventType.Error,
ErrorMessage);
LogWriter.WriteError(ErrorMessage);
}

return (null);
}

InChar = Reader.Read();

MsgText.Append((char)InChar);

if (MsgText.Length > 10 && (char)InChar == '\n')
{
if (MsgText.ToString().EndsWith(EOM))
{
break;
}
}

return(MsgText.ToString());
}

Jon Skeet [C# MVP] · Mar 24, 2008

Kenny D said:
I am having trouble sending streams of data from one socket to another
using the following code.

When the transmitter is running on the same machine as the receiver,
the data transmits perfectly. When the transmitter runs on another
machine, the data gets truncated at semi random places.

The data being transmitted is a unicode XML string. There is nothing
special about it other than being unicode.

The message stream is terminated with "</mos>\r\n". I read the stream
1 byte at a time in the code below. I have tried various other "block
read" methods with the same results.

This used to work when I was using .Net 2.0 under VS 2005. Since
upgrading to VS 2008, this no longer works correctly.

Is there anything I am doing wrong based on the code below or is there
something I have overlooked vis-a-vis VS 2008? Is there something
wrong with my network set up that would cause this - a service pack
release or something? I have tested this with the Windows Firewall on
and off - makes no difference.

I'd say the first thing to find out is what's actually going on the
wire. Try using WireShark (Google for the site) to sniff the data,
preferably on both sides.

It's not a hugely efficient way of reading, but I don't think it should
actually be causing any issues.

Kenny D · Mar 24, 2008

Thanks, Jon.

I will check out the utility you mentioned.

I know that one character at a time is inefficient, it's just happened
to be the way I did it when I decided to seek help here. Not shown in
the code is the blcok-read method which results in the same errors.

I am pulling what's left of my hair out over this. All this stuff
worked fine a couple of weeks ago.

Jon Skeet [C# MVP] · Mar 24, 2008

I admit, I don't use StreamReader.Peek() and thus don't have first-hand
experience with respect to what it really does.
Ditto.

But the documentation
certainly implies to me that you're using it wrong. And if that
implication is correct, it would absolutely explain the behavior you're
seeing.

The documentation seems pretty unclear to me. I'd read it as a blocking
call, but on a reread I think you're probably right. The docs should
definitely be improved...

I'd keep using ReadLine until that returned null, or Read/BeginRead
until they returned 0.

Kenny D · Mar 24, 2008

Read(), as I have used it in the original example is a blocking call.
Peek is a blocking call as well.

I am no network expert and this code worked like a charm before I
upgraded to VS2008 and .NET 3.5. I am not emphatically stating that
they are the culprits but the coincidence is questionable at least.

The problem with using the...
BytesRead = InputStream.Read(Buffer, 0, ReadBufferSize);
.... method is that the results are the same - truncated data.

Additionally, I used WireShark (WS) and it indicates a bad CRC on some
of the packets. The interesting thing is that it looks like the data
is getting to the target machine ok. I can look at the data in
WireShark and see all the proper tags in the XML, including the
message end tag. I don't see any strange data embedded in the XML in
WS.

Any other ideas? All help appreciated.

Thanks, so far.

Jon Skeet [C# MVP] · Mar 24, 2008

Kenny D said:
Read(), as I have used it in the original example is a blocking call.
Peek is a blocking call as well.

Do you have concrete evidence of this? A look in Reflector suggests
that it's not really blocking - that it will return -1 if the last call
to fill the internal buffer didn't read as many bytes as it asked for.

This sort of question occurs every so often - I think I'm going to
write a stream class which one can "program" to behave in particular
ways (e.g. waiting a while before returning data, returning a certain
amount of data at a time etc). That probably won't happen tonight
though.

I am no network expert and this code worked like a charm before I
upgraded to VS2008 and .NET 3.5. I am not emphatically stating that
they are the culprits but the coincidence is questionable at least.

It's quite possible that undocumented behaviour has changed - that you
were relying on something that you shouldn't have been, but that has
changed. Of course, it could be a real bug - or a change somewhere
else.

Sorry, that isn't terribly helpful really, is it?

The problem with using the...
BytesRead = InputStream.Read(Buffer, 0, ReadBufferSize);
... method is that the results are the same - truncated data.

Could you post that code as well, so we can check the logic?

Additionally, I used WireShark (WS) and it indicates a bad CRC on some
of the packets. The interesting thing is that it looks like the data
is getting to the target machine ok. I can look at the data in
WireShark and see all the proper tags in the XML, including the
message end tag. I don't see any strange data embedded in the XML in
WS.

Any other ideas? All help appreciated.

It would be helpful to write a pair of really simple console apps to
demonstrate it - one which just listened for a connection and then spat
out the data, and one which connected and then read it.

Kenny D · Mar 24, 2008

Blocking Peek(): I can place a break point after the call to peek.
The break point never gets hit until data is in the pipe waiting to be
read.

I will code up the console applications. They won't be ready for a
couple of hours as I have to step out. This will help out a lot
because I have not had anyone else try this on their machines.

Please keep an eye on this thread. I really need to get to the bottom
of this.

Thanks,

K

Code for the other method...

Int32 BytesRead = 0;
do
{
byte[] Buffer = new byte[ReadBufferSize];
BytesRead = InputStream.Read(Buffer, 0, ReadBufferSize);

if (BytesRead > 0)
{
MsgText.Append(Encoding.BigEndianUnicode.GetString(Buffer));
}

string Text = MsgText.ToString();
Int32 Pos = Text.IndexOf(EOM);

if (Pos > 0)
{
Pos += EOM.Length;
MsgText.Length = Pos;

break;
}

} while (InputStream.DataAvailable);

Kenny D · Mar 24, 2008

Also, the other code snippet was thrown together hastily just to test
for the truncation. It has not been thoroughly tested. Neither is it
very elegant.

Jon Skeet [C# MVP] · Mar 24, 2008

Kenny D said:
Blocking Peek(): I can place a break point after the call to peek.
The break point never gets hit until data is in the pipe waiting to be
read.

It may wait for the *first* time, but possibly not afterwards. I'll try
to prove it either way though.

I will code up the console applications. They won't be ready for a
couple of hours as I have to step out. This will help out a lot
because I have not had anyone else try this on their machines.

Happy to help

Please keep an eye on this thread. I really need to get to the bottom
of this.

} while (InputStream.DataAvailable);

That one's a problem to start with - it will only keep going while
there is data available *now*, rather than until the stream is closed
at the server end.

You should really just read until the call to Read returns 0.

I assume the server *is* closing its end of the stream, by the way?

Jeroen Mostert · Mar 24, 2008

Jon said:
Do you have concrete evidence of this? A look in Reflector suggests
that it's not really blocking - that it will return -1 if the last call
to fill the internal buffer didn't read as many bytes as it asked for.

This sort of question occurs every so often - I think I'm going to
write a stream class which one can "program" to behave in particular
ways (e.g. waiting a while before returning data, returning a certain
amount of data at a time etc). That probably won't happen tonight
though.

There's not as much use for that as you'd expect. The network stack in
particular will make sure at some point that the conditions your stream is
trying to guarantee can't be met, so you'll either need a way to communicate
how many bytes were actually read (and we already have that) or it's
exception-throwing time (and that's really only useful if you've got a
strict-length protocol).

It is true that NetworkStream's behavior tends to be rather unintuitive,
especially when you're used to other streams. It's why I usually don't
bother with the abstraction it tries to provide and use Socket directly
(with my own higher-level abstraction on top of it where relevant).

I think most of the problems could be solved if NetworkStream's behavior was
tightened up and documented better. At present it feels like you have to
know both how sockets work and how NetworkStream abstracts from them, and
that really doesn't make it much more convenient. NetworkStream wraps a
Socket for if you absolutely, positively must pass a Stream instance to someone.

Jon Skeet [C# MVP] · Mar 24, 2008

Jeroen Mostert said:
There's not as much use for that as you'd expect. The network stack in
particular will make sure at some point that the conditions your stream is
trying to guarantee can't be met, so you'll either need a way to communicate
how many bytes were actually read (and we already have that)

We already have that in Read - it's a case of creating a
MemoryStream-like type which lets you state how the stream should
respond in various cases.

I should stress that this class wouldn't be useful as a *production*
class - it would be a class to help *test* code. Basically a stub, or
at least somewhere between a stub and a mock.

or it's exception-throwing time (and that's really only useful if
you've got a strict-length protocol).

Well, it's handy to be able to check whether a program behaves
correctly in the face of IO errors.

It is true that NetworkStream's behavior tends to be rather
unintuitive, especially when you're used to other streams. It's why I
usually don't bother with the abstraction it tries to provide and use
Socket directly (with my own higher-level abstraction on top of it
where relevant).

I think most of the problems could be solved if NetworkStream's
behavior was tightened up and documented better. At present it feels
like you have to know both how sockets work and how NetworkStream
abstracts from them, and that really doesn't make it much more
convenient. NetworkStream wraps a Socket for if you absolutely,
positively must pass a Stream instance to someone.

I don't know - I'm quite happy with its behaviour for most of the time.
I always assume that *any* stream may return less bytes than have been
requested, even if there's more data on its way.

What other problems do you have with NetworkStream?

Jeroen Mostert · Mar 24, 2008

Jon said:
I don't know - I'm quite happy with its behaviour for most of the time.
I always assume that *any* stream may return less bytes than have been
requested, even if there's more data on its way.

And this is indeed the proper way to use any stream, but unfortunately
there's plenty of code out there assuming that all the world's basically a
file and that pulling in some more data will always work.

What other problems do you have with NetworkStream?

OK, let me rephrase it slightly -- NetworkStream isn't a problem if you know
it's a NetworkStream. When passing it to clients who use a Stream, it can be
quite a problem not to know when NetworkStream is going to block (possibly
indefinitely, until you close the socket, at which point you'll get an
IOException and anything goes).

Wrapping a NetworkStream in other streams, buffers or readers (which some
clients will do) is almost sure to cause even more unexpected blocking,
because a lot of those wrappers don't even have proper asynchronous
operations. The blocking wouldn't be such a problem if it didn't have the
potential to be *indefinite* -- if you're expecting this, it's fine, but a
lot of clients are not expecting this and it's hard to make them work well
with this scenario.

Then there's timeouts. NetworkStream doesn't document that setting the read
or write timeouts and getting an exception on timeout means that the stream
is now useless and should be closed (as should the underlying socket,
regardless of whether the stream owns the socket). This is documented in
setsockopt(), but you'd have to know that that's being used under the
covers. True, you ideally only get bitten by this once, but it's a pretty
leaky abstraction.

Now, most of this isn't actually NetworkStream's fault, but it does mean you
have to take care when passing a NetworkStream around. For streams with a
clear beginning and end (like HTTP traffic) it works out quite well; for
other streams, not so much.

Lasse Vågsæther Karlsen · Mar 24, 2008

Jeroen said:
And this is indeed the proper way to use any stream, but unfortunately
there's plenty of code out there assuming that all the world's basically
a file and that pulling in some more data will always work.

OK, let me rephrase it slightly -- NetworkStream isn't a problem if you
know it's a NetworkStream. When passing it to clients who use a Stream,
it can be quite a problem not to know when NetworkStream is going to
block (possibly indefinitely, until you close the socket, at which point
you'll get an IOException and anything goes).

Wrapping a NetworkStream in other streams, buffers or readers (which
some clients will do) is almost sure to cause even more unexpected
blocking, because a lot of those wrappers don't even have proper
asynchronous operations. The blocking wouldn't be such a problem if it
didn't have the potential to be *indefinite* -- if you're expecting
this, it's fine, but a lot of clients are not expecting this and it's
hard to make them work well with this scenario.

I think this is what Joel Spolsky would call a leaky abstraction.

You can abstract away all the details and try to hide the fact that
you're talking to something that may, or may not, get more data, at some
uncertain point in the future, but at some point the code that talks to
this abstracted class needs to know what can happen.

If everything's a stream, there has to be some common low-level ground
that they can all share, and if reading from the stream always blocks,
then it always blocks, otherwise it always completes as soon as possible.

If what happens depends on the underlying class, it's not really all
that abstract and the calling code would definitely have to been written
to handle this case.

Then there's timeouts. NetworkStream doesn't document that setting the
read or write timeouts and getting an exception on timeout means that
the stream is now useless and should be closed (as should the underlying
socket, regardless of whether the stream owns the socket). This is
documented in setsockopt(), but you'd have to know that that's being
used under the covers. True, you ideally only get bitten by this once,
but it's a pretty leaky abstraction.

And here you say that yourself. Teaches me to read the whole post I guess.

Well, now that I put so much work and energy into writing it, I'll just
go ahead and post it anyway

But just to reiterate my point, I don't think it's possible to treat all
streams as streams, as the documentation for how a "Stream" should
behave is lose enough that most, if not anything, goes, and thus the
code using the stream must probably be aware of what kind of stream it
is, or rather, limit itself to handling streams it know how to handle.

Or... you could build a layer that behaves like a file stream, warts and
all, around the networked stream, since most code that deals with
streams assumes there is a file involved.

Jeroen Mostert · Mar 24, 2008

Lasse said:
Or... you could build a layer that behaves like a file stream, warts and
all, around the networked stream, since most code that deals with
streams assumes there is a file involved.

But that's just the point -- you can't do that. A NetworkStream just *isn't*
a file and all the wrapping in the world isn't going to make it one. You
can't magic away things like indefinite blocking and partial reads. Clients
just have to handle this properly, or else you have to store *all* the data
in an actual file before processing it (or less conservatively a
MemoryStream). Anything inbetween won't solve the problem.

Now, if all clients treated a Stream as if it *could* be a NetworkStream,
things would have a better chance of working... since file streams can
certainly be seen as extraordinarily predictable, finite network streams.
Unfortunately a lot of them aren't as conservative as they should/could be.

Kenny D · Mar 25, 2008

Pete,

Thanks for the reply and, no, it is not my intention to have everyone
on the internet at my beck and call. You guys have been very helpful
and I appreciate it.

As to the questions:

Peek will return -1 if the transmitting application terminates the
connection for whatever reason. So will Read(); If the connection
gets broken, I want the code on the receiver to clean itself up,
dispose of the socket and associated resources and wait for the
transmitter to open another connection. This all worked at one point.

The messaging protocol we are using calls for an acknowledgement to be
sent by the receiver before the transmitter is supposed to send
another message. So, in effect, the business rules call for only one
message at a time to be "in the pipe." There is a timeout on the
transmitting side that waits a configurable amount of time to receive
the ACK message. If the ACK is not received in that time, it will
disconnect, open a new connection and try again.

It is my desire to hold the connection open for the length of the
communication between transmitter and receiver. If I close the
NetworkStream on the transmitter after sending the message, doesn't
that close the socket and terminate the connection? If so, this is
not the behavior I desire.

Before I finish coding the console applications, I am going to give
Pete's suggestions a try. I have also removed the call to Peek() and
replaced it with Read() and a couple of minor logic changes to make
sure that I get the behavior I am looking for. Once I can determine
the cause of the truncation, I will fix up the code for efficiency.

Thanks again. I will keep you posted.

Everyone's hep is VERY MUCH APPRECIATED.

Cheers,

K

Kenny D · Mar 25, 2008

Gentlemen,

THANK YOU, THANK YOU, THANK YOU!!!

Did I say THANK YOU?

It was the call to Peek() that was causing the problem! I replaced
the original call to Peek() with a call to Read() and a minor change
to the logic to just append the character read to the string.

This was a real PITA as I don't think it's clear from the
documentation that Peek() behaves as you all described it. Add to
this that I have used this code in .NET 2.0 and it worked perfectly on
every system I tested on.

Both my original code and Pete's code work fine now. Pete's code is,
of course, more elegent than my one character at a time method, so
that is what I will stick with. Thanks, Pete!

The only changes I had to make to Pete's code to get it to work
correctly was to place the "\r\n" back at the end of each string read
in. This is an interesting problem because while the communication
protocol does not require CRLF pairs be after closing XML tags, the
system I must interface to includes them, and recommends that our
system support it. It does make for XML that is easier to read and
debug.

My goal is to preserve whatever formatting was present in the original
message. Our system will send nicely formatted, human readable XML in
case a human has to look at it.

Jon, I have been to your web site and retrieved several articles from
there for my teammates to read.

Pete thanks for taking the time to rewrite the code!

Thanks, guys! This is much appreciated! It's great to know that
there are resources such as you folks on the internet!

Cheers,

K

Kenny D · Mar 25, 2008

I admit, I'm not really clear on why you desire that behavior. If all

you're going to do is create another NetworkStream to start reading from
the Socket again, why not just reuse the same NetworkStream rather than
closing it?

Actually, I am not creating a new NetworkStream to read from the
socket again. The messaging works roughly like this:

====================================

start:

Transmitter opens a connection to the receiver.

while(true)
{
Transmitter sends an XML message to the receiver and waits for an
ACK message for "nn" seconds.
if (ACK received before timeout)
continue
else
// connection broken
break;
}

goto start;

====================================

In the case of successful transmissions and successful ACK messages,
neither the NS or the socket on either side is closed. The connection
remains a two-way communication over one socket. Only in the case of
a failure (for whatever reason) is the connection dropped and opened
again.

I think that the goal of preserving formatting is a good one. However,
given that that's a goal, I think you might be better off using
StreamReader(char[], int, int). It doesn't sound like you really need
line-by-line access to the data; the main benefit of the StreamReader for
your code is the text encoding. So just read the characters in chunks
until you reach the end of the stream.

Great point. I do not need line by line access and I am aware of the
"\n" possibility. In fact, when first ingesting data from the present
system, that was the first thing I checked for when looking at the raw
data coming in.

In order to reach the end of the stream on the receiver, must I close
the NetworkStream on the transmitting side (without closing the
socket)? I am a little hazy on that.

Thanks again,

K

Kenny D · Mar 25, 2008

Ok, I think we got mixed up in the nature of brief written
communication.

The only time I want to reestablish the connection is if the network
fails for some reason. At that time, the transmitter will get an
error either when attempting to transmit a message or when listening
on the same socket for an acknowledgement. That's it. At no time
will either side purposefully disconnect. The NetworkStream and
Socket would continue to be used until some error caused us to
reconnect.

The listener, upon a network error would dispose of the socket and all
associated resources and wait for another connection attempt.

In the above case, reading to the end of the stream then becomes
problematic as (according to your last post) there would be no end of
stream per se. I would only read until the end-of-message tag ("</mos>
\r\n").

Make sense now?

If there is something inherently wrong with this, please feel free to
enlighten me. The help is much appreciated.

Thanks again!

Cheers,

K

.NET Remoting/Serialization way too slow vs C++ binary/tcp	11	Mar 15, 2005
Socket.Connect fails after several successful attemps	12	Sep 23, 2004
The problem with sockets using Visual C++ windows form application	1	Oct 21, 2008

Truncated data using .NET 3.5 sockets VS 2008

Kenny D

Jon Skeet [C# MVP]

Kenny D

Jon Skeet [C# MVP]

Kenny D

Jon Skeet [C# MVP]

Kenny D

Kenny D

Jon Skeet [C# MVP]

Jeroen Mostert

Jon Skeet [C# MVP]

Jeroen Mostert

Lasse Vågsæther Karlsen

Jeroen Mostert

Kenny D

Kenny D

Kenny D

Kenny D

Ask a Question

Similar Threads