UTF8 encoding - Problem

  • Thread starter: Frank Esser

Frank Esser

Hello!

On a PC with German codepage settings, I want to get the UTF-8 bytes of a
string in my application.

I use this function:

Byte[] array = Encoding.UTF8.GetBytes("à");

When I look at the Unicode tables, this character is in the Latin table
and has a hex value of 0x00E0.

But when I look at my byte array, I see 0xC3A0.

What's wrong ???

Thanks!
 
Hi Frank,

What you are seeing is correct. You can verify that the value is stored correctly by decoding it back with:

Encoding.UTF8.GetString(array)
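
To make that check concrete, here is a minimal sketch of the round trip. The class and method names (Utf8RoundTrip, RoundTrip) are made up for illustration, and the character is written as the escape "\u00E0" so that the encoding of the source file itself cannot affect the result.

```csharp
using System;
using System.Text;

public static class Utf8RoundTrip
{
    // Hypothetical helper: encodes the text as UTF-8 and decodes it again.
    public static string RoundTrip(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        // 'à' (U+00E0), written as an escape so the source file encoding does not matter.
        string original = "\u00E0";

        byte[] bytes = Encoding.UTF8.GetBytes(original);
        Console.WriteLine(BitConverter.ToString(bytes));    // C3-A0
        Console.WriteLine(RoundTrip(original) == original); // True
    }
}
```

If the decoded string equals the original, the two bytes 0xC3 0xA0 are the correct UTF-8 form of the character.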

You can't assume that the UTF-8 bytes of a character will match its Unicode code point for anything above the ASCII range (U+007F), and 'à' (U+00E0) lies above that range, so UTF-8 encodes it as the two bytes 0xC3 0xA0. For an explanation of how a UTF-8 encoding is calculated, take a look at this page:

http://en.wikipedia.org/wiki/UTF8
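
As a rough illustration of the calculation, the sketch below encodes a code point by hand using the two-byte UTF-8 pattern (110xxxxx 10yyyyyy), which covers U+0080 through U+07FF, and compares the result with what Encoding.UTF8 produces. EncodeTwoByte and the class name are hypothetical helpers written for this example, not part of the framework, and the sketch deliberately ignores the one-, three- and four-byte forms.

```csharp
using System;
using System.Text;

public static class Utf8TwoByteSketch
{
    // Hypothetical helper: encodes a code point in U+0080..U+07FF as two UTF-8 bytes.
    // Layout: 110xxxxx 10yyyyyy, where xxxxx are the upper 5 bits of the
    // code point and yyyyyy are its lower 6 bits.
    public static byte[] EncodeTwoByte(int codePoint)
    {
        if (codePoint < 0x80 || codePoint > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        byte lead = (byte)(0xC0 | (codePoint >> 6));    // 0xE0 >> 6 = 0x03, so 0xC3
        byte trail = (byte)(0x80 | (codePoint & 0x3F)); // 0xE0 & 0x3F = 0x20, so 0xA0
        return new[] { lead, trail };
    }

    public static void Main()
    {
        byte[] manual = EncodeTwoByte(0x00E0);
        byte[] library = Encoding.UTF8.GetBytes("\u00E0");

        Console.WriteLine(BitConverter.ToString(manual));  // C3-A0
        Console.WriteLine(BitConverter.ToString(library)); // C3-A0
    }
}
```

Both lines print the same two bytes, which is why the array holds 0xC3 0xA0 rather than 0x00 0xE0.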

