How to validate a string containing Chinese?

  • Thread starter: Kevin

Kevin

Hi All,

I want to validate a string and check whether it contains any Chinese
characters (Simplified or Traditional). I have tried using RegExp and
Encoding, but without success.
Can someone point me in the right direction?

Kind regards,
Kevin
 
Hi Kevin,

I am not sure why you would need the Encoding class. All strings in .NET
are internally Unicode (UTF-16); encodings only matter when you are dealing
with _byte_ arrays. What I think you need first is to determine which
range(s) of character codes cover the Chinese characters you care about.
Then define a RegExp pattern that matches any character in those ranges.
For example, if there are just two ranges:

[\uXXXX-\uYYYY]|[\uZZZZ-\uWWWW]

(Of course, XXXX must be numerically less than YYYY, and ZZZZ less than
WWWW.)
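
As a rough illustration of the pattern above, here is a minimal C# sketch. The two ranges are my assumption, not something stated in this thread: U+3400-U+4DBF (CJK Unified Ideographs Extension A) and U+4E00-U+9FFF (CJK Unified Ideographs). These blocks are shared with Japanese kanji and Korean hanja, so a match means "contains a Han character" rather than strictly "contains Chinese", and characters outside the Basic Multilingual Plane are not covered. The names ChineseRegexCheck and ContainsChinese are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class ChineseRegexCheck
{
    // Assumed ranges (not from the original post):
    //   U+3400-U+4DBF  CJK Unified Ideographs Extension A
    //   U+4E00-U+9FFF  CJK Unified Ideographs
    private static readonly Regex HanPattern =
        new Regex(@"[\u3400-\u4DBF]|[\u4E00-\u9FFF]");

    // Returns true if the input contains at least one character
    // that falls inside either of the ranges above.
    public static bool ContainsChinese(string input)
    {
        return HanPattern.IsMatch(input);
    }

    static void Main()
    {
        // "\u4E2D\u6587" is the two-character word for "Chinese (language)".
        Console.WriteLine(ContainsChinese("Hello \u4E2D\u6587")); // True
        Console.WriteLine(ContainsChinese("Hello world"));        // False
    }
}
```

Since both ranges are simple alternatives, they could also be merged into a single character class, [\u3400-\u4DBF\u4E00-\u9FFF], with the same result.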
 
Kevin said:
I want to validate a string and check whether it contains any Chinese
characters (Simplified or Traditional). I have tried using RegExp and
Encoding, but without success.
Can someone point me in the right direction?

Dmytro gave a regular expression solution - personally, I'd just hard
code it. Iterate through each character in the string and check whether
it's in one of the ranges you're interested in. I think that's a bit more
readable than the regular expression solution, although if there are lots
of ranges to consider, a regular expression formatted on multiple lines,
with a range and a comment on each line, might be better than a hard-coded
solution. (The hard-coded solution is likely to be faster too, but I
wouldn't worry about that until you've determined that it's actually a
performance bottleneck.)
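
A minimal sketch of the hard-coded approach in C# might look like the following. The ranges are an assumption on my part, since the thread never names them: U+3400-U+4DBF (CJK Unified Ideographs Extension A) and U+4E00-U+9FFF (CJK Unified Ideographs). They also include characters used in Japanese and Korean, and they skip anything outside the Basic Multilingual Plane (those arrive as surrogate pairs when iterating over char). ChineseCharCheck, IsHan and ContainsChinese are hypothetical names for illustration.

```csharp
using System;

static class ChineseCharCheck
{
    // Assumed ranges (not from the original posts); adjust as needed.
    private static bool IsHan(char c)
    {
        return (c >= '\u3400' && c <= '\u4DBF')   // CJK Unified Ideographs Extension A
            || (c >= '\u4E00' && c <= '\u9FFF');  // CJK Unified Ideographs
    }

    // Walks the string one UTF-16 code unit at a time and stops
    // at the first character inside one of the ranges.
    public static bool ContainsChinese(string input)
    {
        foreach (char c in input)
        {
            if (IsHan(c))
            {
                return true;
            }
        }
        return false;
    }

    static void Main()
    {
        // "\u4E2D\u6587" is the two-character word for "Chinese (language)".
        Console.WriteLine(ContainsChinese("Hello \u4E2D\u6587")); // True
        Console.WriteLine(ContainsChinese("Hello world"));        // False
    }
}
```

Adding another range is one more line in IsHan with its own comment, which keeps the readability benefit described above.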
 