Encoding question

Guest

Hello.

Can someone help me with the difference between UTF-8 and Unicode encoding?
I know both use 8 bits, and both can use more than 2 bytes (?)

Thanks.
 
UTF-8 is a Unicode encoding format that can use up to 32 bits (4 bytes) per
character when the code point requires it, but standard ASCII characters take
only 8 bits (1 byte).
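
If you have Python handy, a quick sketch of that variable width in action (the characters below are just arbitrary examples I picked):

# Byte length of a few sample characters when encoded as UTF-8
for ch in ["A", "é", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch!r} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# A is 1 byte, é is 2, € is 3, 😀 is 4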
There is really a lot of confusion here.
Short story: Unicode is a code page (a collection of characters, each character
with an associated value). Unicode uses the term "code point" because
what Unicode encodes are not only "characters".
A valid code point is in the range 0x0000-0x10FFFF.
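
For example (a small Python sketch, just to make "code point" concrete; the characters are arbitrary):

import sys

# A code point is just a number; ord()/chr() convert between characters and numbers
print(hex(ord("A")))        # 0x41
print(hex(ord("€")))        # 0x20ac
print(chr(0x1F600))         # the emoji at U+1F600, near the top of the range
print(hex(sys.maxunicode))  # 0x10ffff -- the largest valid code point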

UTF-8, UTF-16, and UTF-32 are different ways of representing Unicode.
One Unicode code point takes between 1 and 4 bytes in UTF-8, 2 or 4 bytes in
UTF-16, and always 4 bytes in UTF-32.
It's the same idea as binary, octal, hex, decimal, and BCD being different ways
of representing numbers.
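
To illustrate (a Python sketch; the euro sign is just an arbitrary example, and the little-endian codecs are used so no byte-order mark shows up in the output):

# One code point (U+20AC, the euro sign), stored in each encoding form
ch = "€"
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    data = ch.encode(enc)
    print(f"{enc:9}: {len(data)} bytes -> {data.hex(' ')}")
# utf-8    : 3 bytes -> e2 82 ac
# utf-16-le: 2 bytes -> ac 20
# utf-32-le: 4 bytes -> ac 20 00 00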
 