Jerry Pisk <(E-Mail Removed)> wrote:
> You're right, it is a bug, but the correct answer is not what you think it
> is.
I think that depends on how you read the documentation.
> In UTF-8 a character can be up to 6 bytes, see
> http://www.ietf.org/rfc/rfc2279.txt, chapter 2. As for the framework's
> internal representation - it uses UCS-2, where each character is expressed
> as 2 bytes with the exception of characters larger than 0xFFFF, which are
> expressed as a sequence of two characters, called a surrogate pair. So each
> character in UCS-2 takes up two bytes but some Unicode characters have to be
> expressed in pairs.
That's exactly what I thought. I believe GetMaxByteCount is meant to
return the maximum number of bytes for a sequence of 16-bit characters
though, where the two characters forming a surrogate pair count as two
characters in the input. That way the maximum number of bytes required
to encode a string, for instance, is GetMaxByteCount(theString.Length).
Given that pretty much the whole of the framework works on the
assumption that a character is 16 bits and that surrogate pairs *are*
two characters, this seems more useful. It would be better if it were
more explicitly documented either way, however.
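
To make the counting concrete, here is a minimal sketch in Python rather
than .NET. The helper names (utf16_units, utf8_bytes, max_utf8_bytes) are
made up for illustration, and the 3-bytes-per-unit bound is my reading of
the arithmetic, not a statement of what GetMaxByteCount actually returns.
It assumes well-formed text with no unpaired surrogates. Anything a
surrogate pair can represent (up to U+10FFFF) needs at most 4 bytes in
UTF-8, which is 2 bytes per 16-bit unit, so the worst case per unit is a
3-byte character in the range U+0800 to U+FFFF.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def utf8_bytes(text: str) -> int:
    """Count the bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def max_utf8_bytes(unit_count: int) -> int:
    """Hypothetical upper bound: 3 UTF-8 bytes per 16-bit code unit."""
    return 3 * unit_count


# One sample from each UTF-8 length class.
samples = {
    "U+0041 (ASCII)": "A",
    "U+00E9": "\u00e9",
    "U+20AC": "\u20ac",
    "U+1D11E (surrogate pair in UTF-16)": "\U0001d11e",
}

for label, text in samples.items():
    units = utf16_units(text)
    size = utf8_bytes(text)
    bound = max_utf8_bytes(units)
    print(f"{label}: {units} unit(s), {size} byte(s), bound {bound}")
    assert size <= bound
```

The last sample is the interesting one: it counts as two units in the
input and still only needs 4 bytes, so counting a surrogate pair as two
characters never makes the bound too small.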
--
Jon Skeet - <(E-Mail Removed)>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too