BinaryReader / BinaryWriter possible bug

Klaus Petersen · May 27, 2004

Hi.

I'm trying to figure out how the BinaryWriter class stores strings - the
BinaryWriter is attached to a MemoryStream.

BinaryWriter stores a value just before the actual string to indicate the
length of the string that follows.

If the length of the string is less than 128 chars, its length is stored as
a single byte.

If the string is longer than 127 chars, the length is stored as 2 bytes -
and the length can be restored from the 2 bytes in the following way:

length = byte [1] * 128 + byte [0] - 128

This way of storing the length should be good enough to allow any content
in the string (e.g. chars from 0 to 255).

However, BinaryWriter still converts all chars of value 128 into 63, which
makes a reader unable to tell them apart.

If you force a char in the string to be of value 128, the BinaryReader
converts chars of value 128 to 172, which makes the code using the output of
the BinaryReader unable to tell these two chars apart.

I'm using default encoding on the BinaryReader and the BinaryWriter as well.

Can someone explain why the BinaryReader/BinaryWriter has this "feature" or
suggest a solution?

Regards
Klaus

Jon Skeet [C# MVP] · May 27, 2004

Klaus Petersen said:
I'm trying to figure out how the BinaryWriter class stores strings - the
BinaryWriter is attached to a MemoryStream.

BinaryWriter stores a value just before the actual string to indicate the
length of the string that follows.

If the length of the string is less than 128 chars, its length is stored as
a single byte.

If the string is longer than 127 chars, the length is stored as 2 bytes -
and the length can be restored from the 2 bytes in the following way:

length = byte [1] * 128 + byte [0] - 128

Etc - it can take more than 2 bytes if the string is long enough.
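
As a rough illustration of the length prefix discussed above, here is a minimal sketch that decodes it. Each prefix byte carries 7 bits of the length, lowest group first, and a set high bit means another byte follows. `LengthPrefix`, `Decode` and `Written` are hypothetical names made up for this example, not framework APIs. Note that the prefix counts encoded bytes, not chars.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthPrefix
{
    // Hypothetical helper: decodes the variable-length prefix at the start of data.
    public static int Decode(byte[] data)
    {
        int length = 0;
        int shift = 0;
        foreach (byte b in data)
        {
            length |= (b & 0x7F) << shift;  // low 7 bits carry part of the length
            if ((b & 0x80) == 0)            // high bit clear: this was the last prefix byte
                return length;
            shift += 7;
        }
        throw new FormatException("Length prefix is truncated.");
    }

    // Hypothetical helper: returns everything BinaryWriter emits for one string.
    public static byte[] Written(string s)
    {
        var ms = new MemoryStream();
        var writer = new BinaryWriter(ms, Encoding.ASCII);
        writer.Write(s);
        writer.Flush();
        return ms.ToArray();
    }

    static void Main()
    {
        byte[] bytes = Written(new string('a', 300));
        Console.WriteLine("{0:X2} {1:X2}", bytes[0], bytes[1]);  // AC 02
        Console.WriteLine(Decode(bytes));                        // 300
        Console.WriteLine(bytes[1] * 128 + bytes[0] - 128);      // 300, the two-byte formula
    }
}
```

For a two-byte prefix this agrees with the formula quoted above, since subtracting 128 from the first byte is the same as clearing its high bit.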

This way of storing the length should be good enough to allow any content
in the string (e.g. chars from 0 to 255).

Chars don't go from 0-255, they go from 0-65535.

Length storage is pretty much orthogonal to character storage though.

However, BinaryWriter still converts all chars of value 128 into 63, which
makes a reader unable to tell them apart.

That suggests you're using the wrong Encoding, usually.
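
To make the encoding point concrete, here is a small sketch that writes a string containing char 128 and reads it back, once with an encoding that cannot represent the char and once with one that can. `EncodingRoundTrip` and `RoundTrip` are hypothetical names for this example, and the choice of Encoding.ASCII and Encoding.UTF8 is an assumption; the thread does not say which encoding was actually in use.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingRoundTrip
{
    // Hypothetical helper: writes s with the given encoding, then reads it back.
    public static string RoundTrip(string s, Encoding encoding)
    {
        var ms = new MemoryStream();
        var writer = new BinaryWriter(ms, encoding);
        writer.Write(s);
        writer.Flush();

        ms.Position = 0;  // rewind so the reader starts at the length prefix
        var reader = new BinaryReader(ms, encoding);
        return reader.ReadString();
    }

    static void Main()
    {
        string original = "\u0080";  // a single char with value 128

        // ASCII cannot represent char 128, so it is replaced by '?' (63).
        Console.WriteLine((int)RoundTrip(original, Encoding.ASCII)[0]);  // 63

        // UTF-8 can represent every char, so the value survives the round trip.
        Console.WriteLine((int)RoundTrip(original, Encoding.UTF8)[0]);   // 128
    }
}
```

The 172 reported in the question might have a similar cause. If the encoding in use was Windows-1252 (an assumption), byte 128 decodes to the euro sign U+20AC, whose low byte is 0xAC, i.e. 172.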

If you force a char in the string to be of value 128, the BinaryReader
converts chars of value 128 to 172, which makes the code using the output of
the BinaryReader unable to tell these two chars apart.

What do you mean by "force a char in the string"?

I'm using default encoding on the BinaryReader and the BinaryWriter as well.

What exactly do you mean by "default encoding" here? Encoding.Default,
or not specifying an encoding?

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
