Odd string encoding behaviour

Miki Watts

I'm having a problem with encoding a string... here's my code:

byte[] s = System.Text.Encoding.ASCII.GetBytes(FieldContent);

Now, this works fine as long as there are no characters above 127. For
example, a 0x99 character comes out as a 0x3f byte ('?').
I know that ASCII is just 7 bits, but I tried the other encodings
and they didn't give me what I needed: UTF7 did the same thing as ASCII,
UTF8 gave 0xC2 0x99 for each 0x99 character, and Unicode gave the right
values, but as two bytes per character.

What am I doing wrong?

Miki
 
Miki Watts

Well, I managed to find a solution of some sort:

System.Text.Encoding e = System.Text.Encoding.GetEncoding("iso-8859-1");
output = BitConverter.ToString(e.GetBytes(FieldContent)).Replace("-"," ");

Is there something equivalent to the iso-8859-1 codepage?

Miki
 
Mihai N.

Miki Watts said:
Is there something equivalent to the iso-8859-1 codepage?

Code page 1252 is the MS equivalent (it is in fact iso-8859-1 with some extras).
 
Jon Skeet [C# MVP]

Miki Watts said:
I'm having a problem with encoding a string... here's my code:

byte[] s = System.Text.Encoding.ASCII.GetBytes(FieldContent);

Now, this works fine as long as there are no characters above 127. For
example, a 0x99 character comes out as a 0x3f byte ('?').
I know that ASCII is just 7 bits, but I tried the other encodings
and they didn't give me what I needed: UTF7 did the same thing as ASCII,
UTF8 gave 0xC2 0x99 for each 0x99 character, and Unicode gave the right
values, but as two bytes per character.

What am I doing wrong?

Nothing. What do you think it's doing wrong? It's doing exactly what it
should be - it's encoding your text in the various different ways,
depending on the encoding type used.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
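
To make the difference concrete, here is a minimal sketch that encodes the single character U+0099 with several of the built-in encodings. The class name EncodingDemo and the helper Hex are made up for this illustration; the encodings themselves are the standard System.Text ones.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: encodes text and returns the bytes as hex, e.g. "C2-99".
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    static void Main()
    {
        string text = "\u0099"; // a single char whose code point is 0x99

        Console.WriteLine(Hex(Encoding.ASCII, text));   // 3F: no ASCII mapping, so '?'
        Console.WriteLine(Hex(Encoding.UTF8, text));    // C2-99: two-byte UTF-8 sequence
        Console.WriteLine(Hex(Encoding.Unicode, text)); // 99-00: UTF-16, little-endian
        Console.WriteLine(Hex(Encoding.GetEncoding("iso-8859-1"), text)); // 99
    }
}
```

Each result is a correct encoding of the same character; only the byte representation differs.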
 
Jon Skeet [C# MVP]

Mihai N. said:
1252 is the MS equivalent (it is in fact iso-8859-1 with some extras)

Sort of. Using the 8859-1 code page, you'll actually end up with bytes
effectively being "passed through", even if they shouldn't really be.
(I'm talking about characters 128-159, i.e. 0x80-0x9F.) Code page 1252 has
entirely different characters in that range (the extras you mean).

If the OP wants 8859-1, he can just use the form he's already shown, or
ask for code page number 28591. It's not a good idea though, if he's
basically using it to treat a string as a sequence of bytes instead of
chars.
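
As a rough sketch of the "passed through" behaviour: the code below asks for the encoding by name and by number, then checks whether all 256 byte values survive a bytes -> string -> bytes round trip. Latin1RoundTrip and RoundTrips are names invented for this example. It deliberately leaves out code page 1252, which on newer .NET runtimes is only available after registering an extra encoding provider.

```csharp
using System;
using System.Text;

static class Latin1RoundTrip
{
    // Hypothetical helper: true if every byte value 0..255 survives
    // bytes -> string -> bytes with the given encoding.
    public static bool RoundTrips(Encoding encoding)
    {
        byte[] original = new byte[256];
        for (int i = 0; i < original.Length; i++)
        {
            original[i] = (byte)i;
        }

        string asText = encoding.GetString(original);
        byte[] back = encoding.GetBytes(asText);

        if (back.Length != original.Length) return false;
        for (int i = 0; i < original.Length; i++)
        {
            if (back[i] != original[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        Encoding byName = Encoding.GetEncoding("iso-8859-1");
        Encoding byNumber = Encoding.GetEncoding(28591);

        Console.WriteLine(byName.CodePage);            // 28591
        Console.WriteLine(RoundTrips(byNumber));       // True
        Console.WriteLine(RoundTrips(Encoding.ASCII)); // False: bytes above 127 are lost
    }
}
```

The round trip working is what makes 8859-1 tempting as a byte container, but the data is still being pushed through a character type it was never meant for.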
 
Miki Watts

Jon Skeet [C# MVP] said:
If the OP wants 8859-1, he can just use the form he's already shown, or
ask for code page number 28591. It's not a good idea though, if he's
basically using it to treat a string as a sequence of bytes instead of
chars.

Well, yes, basically, that is what I want to do. I want a string (i.e.
something that resizes dynamically) that contains the exact bytes that I
want, without interpretation or encoding. I haven't found any other
construct that can do this for me, though. (byte[] should be what I need,
but it's not dynamic.)
 
Jon Skeet [C# MVP]

Miki Watts said:
Well, yes, basically, that is what I want to do. I want a string (i.e.
something that resizes dynamically) that contains the exact bytes that I want

Strings don't contain bytes. They contain characters. You shouldn't use
them for binary data - that's not what they're designed for.

Miki Watts said:
without interpretation or encoding. I haven't found any other construct
that can do this for me, though. (byte[] should be what I need, but it's
not dynamic.)

String itself isn't dynamic either - once created, a string is fixed.
It just has methods to make it easy to create a new string with (say)
the value of two strings concatenated.

I suspect that MemoryStream might be helpful to you though.
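
A minimal sketch of that idea, assuming the goal is simply to accumulate raw bytes and get a byte[] back at the end. ByteBufferDemo and Concat are hypothetical names for this example; MemoryStream.Write and MemoryStream.ToArray are the standard System.IO calls.

```csharp
using System;
using System.IO;

static class ByteBufferDemo
{
    // Hypothetical helper: appends each chunk to a growable in-memory
    // buffer and returns the combined bytes, untouched by any encoding.
    public static byte[] Concat(params byte[][] chunks)
    {
        using (MemoryStream buffer = new MemoryStream())
        {
            foreach (byte[] chunk in chunks)
            {
                buffer.Write(chunk, 0, chunk.Length);
            }
            return buffer.ToArray();
        }
    }

    static void Main()
    {
        byte[] result = Concat(new byte[] { 0x41, 0x99 }, new byte[] { 0xFF });
        Console.WriteLine(BitConverter.ToString(result).Replace("-", " ")); // 41 99 FF
    }
}
```

The stream grows as you write to it, so there is no need to know the final size up front, and no encoding is involved at any point.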
 
Miki Watts

Jon Skeet [C# MVP] said:
I suspect that MemoryStream might be helpful to you though.

OK, thanks. I'll check it out.
 
