Encoding/Decoding Chinese characters

Long Pham

Hi Group,

I have some data stored in my DB in the Big5 and GB2312 character sets.
I need to be able to decode that to Unicode (UTF-8) so that I can display
those characters.

I have tried the following but the output does not seem correct according to
some reference web sites I have seen.

string inputBig5 = "¨¬²y";
Encoding big5Encoding = Encoding.GetEncoding("big5");

byte[] bytes = big5Encoding.GetBytes(inputBig5);

Encoding utf8Encoding = Encoding.UTF8;
string output = utf8Encoding.GetString(bytes);

TIA!
 
Jon Skeet [C# MVP]

Long Pham said:
I have some data stored in my DB in the Big5 and GB2312 character sets.
I need to be able to decode that to Unicode (UTF-8) so that I can display
those characters.

I have tried the following but the output does not seem correct according to
some reference web sites I have seen.

string inputBig5 = "¨¬²y";
Encoding big5Encoding = Encoding.GetEncoding("big5");

byte[] bytes = big5Encoding.GetBytes(inputBig5);

This is the problem - your "inputBig5" string isn't in Big5, it's a
series of Unicode code points by virtue of it being a .NET string in
the first place.

Your database should probably be doing all this for you, however - are
you absolutely sure it's not? It should be giving you a Unicode string
with the appropriate characters in. If not, can you get it to give you
the bytes directly? If so, that's where you should use Encoding: give
it the bytes and call GetString.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
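
As a minimal sketch of the "give it the bytes and call GetString" approach: the byte values below are an assumption, taken from the Latin-1 values of the four characters in the posted string, and the class name is purely illustrative. On .NET Core / .NET 5+ the Big5 code page may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); on the .NET Framework it should be available by default.

```csharp
using System;
using System.Text;

class Big5DecodeSketch
{
    static void Main()
    {
        // Assumption: these are the raw bytes as stored in the database,
        // fetched as a byte[] rather than as an already-decoded string.
        byte[] rawBytes = { 0xA8, 0xAC, 0xB2, 0x79 };

        // Decode: bytes in a known encoding -> .NET (Unicode) string.
        Encoding big5 = Encoding.GetEncoding("big5");
        string text = big5.GetString(rawBytes);

        // Print each UTF-16 code unit so the result can be checked
        // against a Unicode chart instead of trusting the console font.
        foreach (char c in text)
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }
    }
}
```

The direction is the point here: GetBytes goes from a string to bytes, GetString goes from bytes to a string, so decoding stored Big5 data only ever needs GetString.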
 
Long Pham

Jon,

Yes, that makes sense.

If the DB (Oracle 9i) is giving me the Unicode string, how do I get it to give
me the actual Chinese characters (as this is essentially what I need)?

Jon Skeet [C# MVP]

Long Pham said:
Yes, that makes sense.

If the DB (Oracle 9i) is giving me the Unicode string, how do I get it to give
me the actual Chinese characters (as this is essentially what I need)?

How sure are you that it's not doing so already? How are you trying to
verify that?

The best way I know is to get it to print out the Unicode value of each
character. For instance:

foreach (char x in dodgyString)
{
    Console.WriteLine((int)x);
}
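
A slightly extended sketch of the same check, printing each value in hex and noting whether it falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF). The Dump helper and class name are hypothetical, and the range test is only a rough heuristic: it ignores the extension blocks and punctuation.

```csharp
using System;

class CodePointDump
{
    // Hypothetical helper: prints each UTF-16 code unit of s in hex
    // and flags those inside the basic CJK Unified Ideographs block.
    static void Dump(string s)
    {
        foreach (char c in s)
        {
            bool isCjk = c >= '\u4E00' && c <= '\u9FFF';
            Console.WriteLine("U+{0:X4} {1}", (int)c, isCjk ? "CJK ideograph" : "other");
        }
    }

    static void Main()
    {
        // The string from the original post, written with escapes so the
        // source file's own encoding cannot change it.
        Dump("\u00A8\u00AC\u00B2y");
    }
}
```

If a string that should hold Chinese text reports every character as "other" with values below U+0100, as this one does, it has most likely been decoded with the wrong encoding somewhere before it reached the code.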
 
