BinaryReader / BinaryWriter possible bug

Klaus Petersen

Hi.

I'm trying to figure out how the BinaryWriter class stores strings - the
BinaryWriter is attached to a MemoryStream.

BinaryWriter stores a value just before the actual string to indicate
the length of the string that follows.

If the length of the string is less than 128 chars, its length is stored as
a single byte.

If the string is longer than 127 chars, the length is stored as 2 bytes -
and the length can be restored from the 2 bytes in the following way:

length = byte [1] * 128 + byte [0] - 128

This way of storing the length should be good enough to allow any content
in the string (e.g. chars from 0 to 255).

However, BinaryWriter still converts all chars of value 128 into 63, which
makes a reader unable to tell them apart.

If you force a char in the string to be of value 128, the BinaryReader
converts chars of value 128 to 172, which makes the code using the output of
the BinaryReader unable to tell these two chars apart.

I'm using default encoding on the BinaryReader and the BinaryWriter as well.

Can someone explain why the BinaryReader/BinaryWriter has this "feature" or
suggest a solution?

Regards
Klaus
 
Jon Skeet [C# MVP]

Klaus Petersen said:
> I'm trying to figure out how the BinaryWriter class stores strings - the
> BinaryWriter is attached to a MemoryStream.
>
> BinaryWriter stores a value just before the actual string to indicate
> the length of the string that follows.
>
> If the length of the string is less than 128 chars, its length is stored as
> a single byte.
>
> If the string is longer than 127 chars, the length is stored as 2 bytes -
> and the length can be restored from the 2 bytes in the following way:
>
> length = byte [1] * 128 + byte [0] - 128

Etc - it can take more than 2 bytes if the string is long enough.
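
For illustration only, here is a rough Python sketch of that kind of length prefix (7 bits per byte, low bits first, high bit set while more bytes follow). It is not the .NET API; the names encode_length and decode_length are made up for this example. As far as I know the prefix counts encoded bytes rather than chars, so the threshold of 128 applies to the encoded size.

```python
def encode_length(n):
    """Encode a non-negative length 7 bits at a time, low bits first."""
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)  # last byte has the high bit clear
    return bytes(out)


def decode_length(data):
    """Return (length, number of prefix bytes consumed)."""
    value = 0
    shift = 0
    for i, b in enumerate(data):
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return value, i + 1
        shift += 7
    raise ValueError("truncated length prefix")


prefix = encode_length(300)
# The two-byte formula from the question gives the same answer.
formula = prefix[1] * 128 + prefix[0] - 128
```

With two bytes the formula in the question agrees with decode_length; from 16384 upwards a third byte appears and the formula no longer applies.
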

> This way of storing the length should be good enough to allow any content
> in the string (e.g. chars from 0 to 255).

Chars don't go from 0-255, they go from 0-65535.

Length storage is pretty much orthogonal to character storage though.

> However, BinaryWriter still converts all chars of value 128 into 63, which
> makes a reader unable to tell them apart.

That suggests you're using the wrong Encoding, usually.
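
A small Python sketch of the effect, for illustration: 63 is '?', which is what an ASCII-style encoder typically substitutes for a char it cannot represent. Python's errors="replace" is used here as a stand-in for that replacement behaviour; it is an assumption that this is what happened in the original program.

```python
text = "A\u0080B"  # the middle char has value 128

# An encoding that only covers 0-127 has to substitute something for 128.
ascii_bytes = text.encode("ascii", errors="replace")

# An encoding that covers every char round-trips without loss.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")

print(list(ascii_bytes))      # byte values written with the lossy encoding
print(list(utf8_bytes))       # byte values written with UTF-8
print(round_tripped == text)  # whether the original chars survived
```

Once the char has been written as 63, no reader can recover it, whatever encoding the reader uses.
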

> If you force a char in the string to be of value 128, the BinaryReader
> converts chars of value 128 to 172, which makes the code using the output of
> the BinaryReader unable to tell these two chars apart.

What do you mean by "force a char in the string"?

> I'm using default encoding on the BinaryReader and the BinaryWriter as well.

What exactly do you mean by "default encoding" here? Encoding.Default,
or not specifying an encoding?
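
The distinction matters. As one possible explanation of the 172 (an assumption, not something the post confirms): if the reader's encoding is Windows-1252, byte 128 decodes to the euro sign U+20AC, and the low byte of 0x20AC is 0xAC, which is 172. A quick Python check of that arithmetic:

```python
raw = bytes([128])

# Windows-1252 maps byte 0x80 to the euro sign, U+20AC.
decoded = raw.decode("cp1252")
code_point = ord(decoded)

# Truncating the 16-bit char to a single byte keeps only the low 8 bits.
low_byte = code_point & 0xFF

print(hex(code_point), low_byte)
```

If that is what is happening, the char coming out of the reader is really 8364, and the 172 only appears when it is cast down to a byte.
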

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
 
