surrogate characters and chars

  • Thread starter: Guest

Guest

If a string contains surrogate chars (i.e. Unicode characters that consist
of more than one char), do functions that use an index or a length into
the string, e.g. Mid and Len, work correctly?

guy
 
All strings in .NET are Unicode. Indexing is per Char (one 16-bit unit), not per byte.
I assume you are coming from a C/C++ background? :)))
 
From help:

The LenB function in earlier versions of Visual Basic returns the number
of bytes in a string rather than characters. It is used primarily for
converting strings in double-byte character set (DBCS) applications. All
Visual Basic .NET strings are in Unicode, and LenB is no longer supported.

So the short answer is yes: Len, Mid, etc. work fine. If you try it in VB6,
for example, you would (normally) get Len("A") = 1 but LenB("A") = 2,
thereby demonstrating that strings are handled as Unicode.
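To make the Len/LenB distinction concrete outside VB6, here is a rough analogue in Java, chosen only because Java strings, like .NET strings, are sequences of 16-bit UTF-16 chars. The class and method names are made up for illustration: String.length() plays the role of Len, and the UTF-16LE byte count stands in for what LenB reported.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes {
    // Number of 16-bit chars, the analogue of VB.NET Len / String.Length.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes when encoded as UTF-16LE (no byte-order mark),
    // roughly what VB6's LenB reported for its internal strings.
    static int utf16ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        System.out.println(charCount("A"));      // 1
        System.out.println(utf16ByteCount("A")); // 2
    }
}
```

Each char occupies two bytes, so the byte count is always twice the char count for UTF-16LE.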
 

What is correct to you? They work the same way the String class does,
i.e. Len (just like String.Length) returns the number of 16-bit Chars
in the string. If you want to treat a surrogate pair as a single
element, you should have a look at the StringInfo class in .NET 2.0.


Mattias
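A small sketch of the difference Mattias describes, again using Java as a stand-in because its String is also a sequence of 16-bit chars. U+20000 is just an example of a character outside the Basic Multilingual Plane. Counting code points is roughly the "surrogate pair counts as one" view, though it is narrower than StringInfo, which also groups combining marks with their base character.

```java
public class SurrogateCount {
    // U+20000, a CJK Extension B ideograph outside the BMP,
    // stored in UTF-16 as the surrogate pair D840 DC00.
    static final String EXT_B = "\uD840\uDC00";

    // Counts 16-bit chars, like Len / String.Length.
    static int utf16Units(String s) {
        return s.length();
    }

    // Counts code points, so a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(utf16Units(EXT_B)); // 2
        System.out.println(codePoints(EXT_B)); // 1
        System.out.println(Character.isHighSurrogate(EXT_B.charAt(0))); // true
    }
}
```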
 
Sorry for not making myself clear. I am interested in the impact that surrogate
char/char pairs have; these are used to extend Unicode and are in effect
32-bit chars.
What I need to know is this: if I have a string that consists of Chinese text,
say 3 graphical symbols stored in the string as a char, then a surrogate pair
(a surrogate char followed by a char), then a char, I am using 4 .NET chars to
represent 3 graphical characters.
So would Len(myString) return 3 or 4?
Would Left(myString, 2) give me the first 2 graphical characters (3 chars), or
a string consisting of 1 graphical char and a char which is the first half of
the second (surrogate) char?

BTW, I am not a C person :-) I have been using BASIC since 1976!
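The scenario in the question can be reproduced directly. The Java sketch below uses hypothetical sample text and hypothetical helper names (leftChars, leftCodePoints): it builds a 3-symbol string stored in 4 chars and compares a char-based Left, which behaves like VB.NET Left / String.Substring, with one that counts code points instead.

```java
public class LeftDemo {
    // Hypothetical sample: 3 symbols stored as char, surrogate pair, char (4 chars).
    static final String TEXT = "\u4E2D" + "\uD840\uDC00" + "\u6587";

    // Char-based Left: may cut a surrogate pair in half.
    static String leftChars(String s, int n) {
        return s.substring(0, n);
    }

    // Code-point-based Left: never splits a surrogate pair.
    static String leftCodePoints(String s, int n) {
        // Like VB's Left, return the whole string when n exceeds its length.
        if (n >= s.codePointCount(0, s.length())) {
            return s;
        }
        return s.substring(0, s.offsetByCodePoints(0, n));
    }

    public static void main(String[] args) {
        System.out.println(TEXT.length()); // 4
        String naive = leftChars(TEXT, 2);
        // The second char of the naive result is an unpaired high surrogate.
        System.out.println(Character.isHighSurrogate(naive.charAt(1))); // true
        System.out.println(leftCodePoints(TEXT, 2).length()); // 3
    }
}
```

So a length of 4 and a Left that returns half a surrogate pair are what a char-based API gives you; offsetByCodePoints is one way to step over whole pairs.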
 
Looks like the normal VB and String functions don't differentiate surrogate
pairs; I will need to use StringInfo.ParseCombiningCharacters.
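StringInfo.ParseCombiningCharacters returns the starting index of each text element. The Java sketch below approximates the same idea with the character instance of java.text.BreakIterator; it is an analogue under that assumption, not the .NET implementation, and elementStarts / leftElements are hypothetical helper names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class TextElements {
    // Start index (in chars) of each user-perceived character, similar in
    // spirit to the array StringInfo.ParseCombiningCharacters returns.
    static List<Integer> elementStarts(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        List<Integer> starts = new ArrayList<>();
        for (int start = it.first(), end = it.next();
             end != BreakIterator.DONE;
             start = end, end = it.next()) {
            starts.add(start);
        }
        return starts;
    }

    // Left(s, n) measured in text elements instead of chars.
    static String leftElements(String s, int n) {
        List<Integer> starts = elementStarts(s);
        if (n >= starts.size()) {
            return s;
        }
        return s.substring(0, starts.get(n));
    }

    public static void main(String[] args) {
        // Sample: char, surrogate pair, char.
        String text = "\u4E2D\uD840\uDC00\u6587";
        System.out.println(elementStarts(text));             // [0, 1, 3]
        System.out.println(leftElements(text, 2).length());  // 3
        // 'e' followed by a combining acute accent is one element.
        System.out.println(elementStarts("e\u0301"));        // [0]
    }
}
```

Unlike the code-point approach, this also keeps a base character together with any combining marks that follow it.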
 