What's happened to ANSI chars 135 and 130 - ToString/GetBytes

gizmo

Hi,

Here's a little hack I put together to try to get to the bottom of a
problem I'm having with trying to base64 encode a hash value. The hash
value contains character codes 135 and 130 amongst others.

This snippet will set up a string of chars 190, 135, 130, 73, 242, 243,
10. It puts them into a byte array.

string encodedData;
byte[] andBackAgainBytes;
char a1 = (char)190;
char a2 = (char)135;
char a3 = (char)130;
char a4 = (char)73;
char a5 = (char)242;
char a6 = (char)243;
char a7 = (char)10;
string data = a1.ToString() + a2.ToString() + a3.ToString() +
a4.ToString() + a5.ToString() + a6.ToString() + a7.ToString();
byte[] encData_byte = new byte[data.Length];
encData_byte = System.Text.Encoding.Default.GetBytes(data);

However, when debugging I look at the resulting byte array and see the
following character codes:

190, 63, 63, 73, 242, 243, 10

Note: the ANSI character "single baseline quote" is char 130 and
"dagger (double)" is char 135.

Any idea what happened to chars 130 and 135? I know a workaround but
I'm curious as to why this is the case.

Thanks
Gizmo
 
dmm

I think the problem is that you are mixing your codepages. Casting an
integer value to a char treats it as a Unicode code point (.NET chars are
2-byte Unicode chars). You are then using the Windows 1252 codepage to
try to get back to the original value.
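
To make the mismatch concrete, here is a rough sketch. It assumes Windows-1252 is the code page in play, and the class and method names (CodePageDemo, DecodeByte) are made up for illustration:

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Decode a single byte through Windows-1252 and return the Unicode code point.
    public static int DecodeByte(byte b)
    {
        // On .NET Core / .NET 5+ the legacy code pages must be registered first;
        // on .NET Framework this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(1252);
        return cp1252.GetString(new[] { b })[0];
    }

    static void Main()
    {
        // Casting an integer to char picks the Unicode code point with that number.
        Console.WriteLine("(char)135 -> U+{0:X4}", (int)(char)135);
        // Decoding the byte 135 through code page 1252 gives a different character.
        Console.WriteLine("byte 135  -> U+{0:X4}", DecodeByte(135));
    }
}
```

The cast yields U+0087, while decoding the byte through code page 1252 yields U+2021 (the double dagger), so the two routes do not land on the same character.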
 
Larry Lard

gizmo said:
> Hi,
>
> Here's a little hack I put together to try to get to the bottom of a
> problem I'm having with trying to base64 encode a hash value.

The quick answer is to use Convert.ToBase64String on a byte array, but
we will continue for the purposes of enlightenment :)

> The hash value contains character codes 135 and 130 amongst others.

Contains *bytes* 135 and 130. One of the key things about thinking in C#
rather than C/C++ is knowing that "bytes is bytes and characters is
characters, and only via an Encoding shall the twain meet".

> This snippet will set up a string of chars 190, 135, 130, 73, 242, 243,
> 10.

We'll see what it does in just a moment.

> It puts them into a byte array.
>
> string encodedData;
> byte[] andBackAgainBytes;
> char a1 = (char)190;
> char a2 = (char)135;
> char a3 = (char)130;
> char a4 = (char)73;
> char a5 = (char)242;
> char a6 = (char)243;
> char a7 = (char)10;

OK so far, though semantically dodgy...

> string data = a1.ToString() + a2.ToString() + a3.ToString() +
> a4.ToString() + a5.ToString() + a6.ToString() + a7.ToString();

Still fine, chars can become strings just fine...

> byte[] encData_byte = new byte[data.Length];
> encData_byte = System.Text.Encoding.Default.GetBytes(data);

And here's the problem. Your IP is (like mine) in the UK, so I'm
going to guess that your default encoding is the same as mine: good
old Windows Codepage 1252 (which is like, but not the same as,
iso-8859-1). Now, there's a funny thing about this encoding. Take a look
at <http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_table> and
you'll see a big gap in the middle marked 'unused'. Further down, we
are told "Code values 00-1F, 7F, and 80-9F are not assigned to
characters by ISO/IEC 8859-1". This is our problem. In Unicode,
(char)130 (U+0082) and (char)135 (U+0087) fall in that gap, where they
are control characters. Codepage 1252 does use the bytes 0x82 and 0x87,
but for the low quote (U+201A) and the double dagger (U+2021), which are
different characters. So the characters (char)130 and (char)135 simply
*are not present* in Encoding.Default, and cannot be converted by it
into bytes.

> However, when debugging I look at the resulting byte array and see the
> following character codes:
>
> 190, 63, 63, 73, 242, 243, 10

So it helpfully gives us ?s instead - which is where the 63s come from.

> Note: the ANSI character "single baseline quote" is char 130 and
> "dagger (double)" is char 135.
>
> Any idea what happened to chars 130 and 135? I know a workaround but
> I'm curious as to why this is the case.
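
To see the substitution for yourself, something along these lines should do. It is only a sketch: it assumes code page 1252, asks for the '?' replacement explicitly rather than relying on whatever Encoding.Default does, and the names FallbackDemo and EncodeWith1252 are invented for the example:

```csharp
using System;
using System.Text;

static class FallbackDemo
{
    // Encode text with Windows-1252; characters it cannot map come out as '?' (63).
    public static byte[] EncodeWith1252(string text)
    {
        // Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding cp1252 = Encoding.GetEncoding(
            1252,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        return cp1252.GetBytes(text);
    }

    static void Main()
    {
        // U+0087 and U+0082 are control characters with no slot in code page 1252.
        Console.WriteLine(string.Join(", ", EncodeWith1252("\u0087\u0082")));
        // The double dagger and the low quote live at U+2021 and U+201A instead.
        Console.WriteLine(string.Join(", ", EncodeWith1252("\u2021\u201A")));
    }
}
```

The first line should show the two 63s from the question, and the second the 135 and 130 that were expected. Swapping EncoderReplacementFallback for EncoderExceptionFallback would make the unmappable characters throw instead of silently turning into ?s.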

Hopefully I have slightly cleared up the why, but the really important
thing is to get the concept of the separation between characters and
bytes completely embedded in the way you think C#.
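
For completeness, the quick answer from the top of this reply might look like the sketch below. The byte values are just the seven from the question standing in for a real hash, and Base64Demo and Encode are hypothetical names:

```csharp
using System;

static class Base64Demo
{
    // Base64-encode raw bytes directly; no string or Encoding is involved.
    public static string Encode(byte[] hash)
    {
        return Convert.ToBase64String(hash);
    }

    static void Main()
    {
        // Stand-in for the hash: the values stay bytes from start to finish.
        byte[] hash = { 190, 135, 130, 73, 242, 243, 10 };

        string encoded = Encode(hash);
        byte[] decoded = Convert.FromBase64String(encoded);

        Console.WriteLine(encoded);
        // Decoding gives back the same seven values, 135 and 130 included.
        Console.WriteLine(string.Join(", ", decoded));
    }
}
```

Because the data never passes through a char or a string on the way in, there is no code page that can swallow bytes 135 and 130.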
 
