UTF8 encoding - Problem

Frank Esser

Hello!

On a PC with German code page settings I want to get the UTF-8 bytes of a
string in my application.

I use this function:

Byte[] array = Encoding.UTF8.GetBytes("à");

When I look at the Unicode tables then this character is in the Latin table
and has a hex value of 0x00E0.

But when I look at my byte array then I see 0xC3A0.

What's wrong ???

Thanks!
 
Morten Wennevik

Hi Frank,

What you are seeing is correct. You can verify that the value is stored correctly by translating it back using

Encoding.UTF8.GetString(array)

You can't assume that the UTF-8 bytes of a character will match its Unicode code point once you go above the ASCII range. 'à' is U+00E0, which is outside ASCII (0x00 to 0x7F), so UTF-8 encodes it as two bytes: 0xC3 0xA0. For an explanation of how a UTF-8 sequence is calculated, take a look at this page.

http://en.wikipedia.org/wiki/UTF8
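
To make the calculation concrete, here is a minimal sketch of the two-byte UTF-8 rule (110xxxxx 10xxxxxx) applied to U+00E0, compared against what Encoding.UTF8 returns. The class Utf8Demo and the helper EncodeTwoByte are hypothetical names made up for this illustration, and the helper only handles code points from U+0080 to U+07FF; it is not how the framework implements the encoding.

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical helper (not part of .NET): encodes a code point in the
    // range U+0080..U+07FF as two UTF-8 bytes of the form 110xxxxx 10xxxxxx.
    public static byte[] EncodeTwoByte(int codePoint)
    {
        if (codePoint < 0x80 || codePoint > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        byte lead = (byte)(0xC0 | (codePoint >> 6));     // upper 5 bits of the code point
        byte trail = (byte)(0x80 | (codePoint & 0x3F));  // lower 6 bits of the code point
        return new[] { lead, trail };
    }

    public static void Main()
    {
        byte[] manual = EncodeTwoByte(0x00E0);                 // 'à'
        byte[] framework = Encoding.UTF8.GetBytes("\u00E0");

        Console.WriteLine(BitConverter.ToString(manual));      // C3-A0
        Console.WriteLine(BitConverter.ToString(framework));   // C3-A0

        // Round trip: decoding the bytes gives the original character back.
        Console.WriteLine(Encoding.UTF8.GetString(framework) == "\u00E0"); // True
    }
}
```

Writing the character as "\u00E0" instead of a literal à keeps the result independent of the encoding the source file was saved in.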


