Chinese character sets

Steven Nagy

Hi all, I have to do a website in Chinese!

Basically I just need to know how to output Chinese characters. I am
assuming it's very easy, but I have never done it before. I can, however, do
simple things like changing the formats of currency and calendars and
so on.

I am guessing the answer is quite simple, given that I assume Unicode
supports all the Chinese characters, right? Ideally I'd like them to be
able to enter their own content through a WYSIWYG editor or something similar
in their native language, so that I don't have to worry about any
translation. Knowing me, I'd intend to write "give your dog a bone" but
end up writing "I'd like to bone your dog".

Cheers
Steve
 
Jon Skeet [C# MVP]

It's really just a matter of choosing your encoding, as far as writing
the file is concerned. You might choose Big5, or UTF-8. The latter is
somewhat handier in various ways, and can cover all Unicode strings.
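To make the Big5 vs UTF-8 trade-off concrete, here is a small Python sketch (an illustration only; in .NET the counterpart is System.Text.Encoding). It encodes the same text both ways and reports the byte count, or None when the encoding cannot represent the text at all. The sample strings are arbitrary; the emoji simply stands in for any character outside Big5's fixed repertoire.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in Big5 and UTF-8 (None if it cannot be encoded)."""
    sizes = {}
    for name in ("big5", "utf-8"):
        try:
            sizes[name] = len(text.encode(name))
        except UnicodeEncodeError:
            # Big5 covers a fixed set of (mainly Traditional Chinese) characters;
            # anything outside that set simply cannot be written.
            sizes[name] = None
    return sizes

print(encoded_sizes("漢字"))      # {'big5': 4, 'utf-8': 6}
print(encoded_sizes("漢字 😀"))   # {'big5': None, 'utf-8': 11}
```

Big5 is more compact for Traditional Chinese text, but UTF-8 never hits the "cannot encode" branch, which is what makes it the handier default.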

Strings in .NET are Unicode by default, so you shouldn't have many
problems.
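One detail worth knowing: a .NET string is a sequence of UTF-16 code units, and String.Length counts those units, not characters. Common Chinese characters take one unit each, but rarer ones outside the Basic Multilingual Plane (for example U+20000, in CJK Extension B) need a surrogate pair. The Python sketch below mimics that count; the helper name is made up for illustration.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, which is what .NET's String.Length reports."""
    return len(text.encode("utf-16-le")) // 2  # each code unit is 2 bytes

print(utf16_length("漢字"))          # 2: one code unit per character
print(utf16_length("\U00020000"))   # 2: a single character stored as a surrogate pair
```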

See http://www.pobox.com/~skeet/csharp/unicode.html for a bit more
information.
 
Steven Nagy

Cheers Jon.

In my research I have discovered that lots of people seem to have
problems storing Chinese characters in an English SQL Server 2000
instance.
Has anyone experienced this? Any advice to give me before I design the
schema?
Essentially the plan is to create a Chinese website and provide an admin
page that lets them edit all their own content through a WYSIWYG editor that
supports Chinese characters. The Chinese text then gets saved in the
SQL database and retrieved whenever the corresponding content page is
requested.
Does anyone see any fundamental flaws, anything that may not be possible
with Chinese characters?
Also, is it usual to let the user choose between Traditional and
Simplified Chinese? I don't know anything about the language and am not
sure if Chinese sites normally support this feature. Perhaps they are
all in one or the other as a general rule?

Thanks,
Steve
 
Jon Skeet [C# MVP]

Steven Nagy said:
> In my research I have discovered that lots of people seem to have
> problems storing Chinese characters in an English SQL Server 2000
> instance.
> Has anyone experienced this? Any advice to give me before I design the
> schema?

My experience with this is that there are issues with collation,
particularly with Japanese (not sure about Chinese) and that often it
depends on how the *instance* was created to start with, unless the
database itself specifies the collation on creation.

I'm not a SQL Server expert, but I'd definitely try some
experimentation, and make sure that all the appropriate fields are
Unicode text fields.
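Why the field type matters can be simulated in a few lines of Python. The sketch assumes an English instance whose default collation uses code page 1252 (as SQL_Latin1_General_CP1_CI_AS does); a non-Unicode (VARCHAR-style) column then keeps only what that code page can represent and turns the rest into "?", while a Unicode (NVARCHAR-style) column keeps the text intact. Both helpers are hypothetical stand-ins, not real database calls.

```python
def store_in_codepage_column(text: str, codepage: str = "cp1252") -> str:
    """Simulate a round trip through a non-Unicode (VARCHAR-style) column."""
    # Characters the code page cannot represent are replaced with '?'.
    return text.encode(codepage, errors="replace").decode(codepage)

def store_in_unicode_column(text: str) -> str:
    """Simulate a round trip through a Unicode (NVARCHAR-style, UTF-16) column."""
    return text.encode("utf-16-le").decode("utf-16-le")

print(store_in_codepage_column("Hello 漢字"))  # Hello ??
print(store_in_unicode_column("Hello 漢字"))   # Hello 漢字
```

The data loss happens silently at write time, so it is worth testing a round trip like this against the real schema before going live.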

> Essentially the plan is to create a Chinese website and provide an admin
> page that lets them edit all their own content through a WYSIWYG editor that
> supports Chinese characters. The Chinese text then gets saved in the
> SQL database and retrieved whenever the corresponding content page is
> requested.
> Does anyone see any fundamental flaws, anything that may not be possible
> with Chinese characters?

No, that should be fine.

> Also, is it usual to let the user choose between Traditional and
> Simplified Chinese? I don't know anything about the language and am not
> sure if Chinese sites normally support this feature. Perhaps they are
> all in one or the other as a general rule?

I don't *think* you can easily just switch text between the two - but
you could save whatever the user enters, whichever character set they
use.
 
Lau Lei Cheong

If you use the data objects the .NET Framework provides rather than
third-party ones, and remember to pass all your parameters to the SQL
command as SqlParameter objects, you should be quite safe; so far I haven't
experienced any problems. (Of course, you have to configure your database
to use Unicode first.)
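The parameter-passing advice can be sketched with Python's built-in sqlite3 module standing in for SQL Server (an assumption for illustration; in ADO.NET the equivalent is a SqlCommand with SqlParameter objects, typed as SqlDbType.NVarChar for Chinese text). The value travels separately from the SQL text, so quotes and Chinese characters need no manual escaping. The table and function names are hypothetical.

```python
import sqlite3

def save_and_load(content: str) -> str:
    """Store `content` with a bound parameter and read it back."""
    conn = sqlite3.connect(":memory:")  # throwaway database; SQLite TEXT is Unicode
    try:
        conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, body TEXT)")
        # Placeholders (?) keep the value out of the SQL string entirely.
        conn.execute("INSERT INTO page (id, body) VALUES (?, ?)", (1, content))
        row = conn.execute("SELECT body FROM page WHERE id = ?", (1,)).fetchone()
        return row[0]
    finally:
        conn.close()

print(save_and_load('他說："你好"'))  # 他說："你好"
```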

For other cases there is more work to do. For example, in MySQL you
can use "CHAR(hex values separated by commas)" to store a Big5 string without
problems even if the database is set to an ASCII-only charset. Check your
database manual for the vendor-specific SQL syntax you can use.
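A rough Python sketch of building such a MySQL CHAR() expression from a string's Big5 bytes. Two assumptions to flag: it emits decimal byte values rather than the hex values mentioned above (CHAR() interprets each argument as an integer, so decimal is the unambiguous form), and it appends MySQL's optional USING clause to name the character set. The helper name is hypothetical.

```python
def mysql_char_expr(text: str, charset: str = "big5") -> str:
    """Build a MySQL CHAR(...) expression that rebuilds `text` from its raw bytes."""
    raw = text.encode(charset)              # the Big5 byte sequence for the text
    codes = ", ".join(str(b) for b in raw)  # one integer argument per byte
    return f"CHAR({codes} USING {charset})"

print(mysql_char_expr("漢字"))  # four byte values, since each character is 2 bytes in Big5
```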

Note that String.Replace("\"", "\"\"")-style escaping won't help here, as .NET
treats a double-byte character as a single char, so that replacement won't work.
 
Lau Lei Cheong

If you need conversion, I'd recommend storing everything in
Traditional Chinese in order to avoid loss of information during
translation.

You know, the CHS/CHT mapping is many-to-one: several Traditional characters
can share a single Simplified form, so converting CHS to CHT means picking the
most probable candidate. Normally the translation routines do a good job of
finding it. But for something like a user name, that doesn't help much, and I
can tell you many users will be upset if you display the name they entered
wrongly.

Storing everything in CHT from the beginning will save you from a lot of
problems later.
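The many-to-one point can be shown with a deliberately tiny, hypothetical conversion table in Python: 發 (to send) and 髮 (hair) both simplify to 发, and 後 (after) and 后 (queen) both end up as 后. Going Traditional to Simplified is a plain lookup; going back yields several candidates, which is where a converter has to guess. Real converters use tables with thousands of entries plus phrase lists; this sketch only illustrates the shape of the problem.

```python
# Toy Traditional -> Simplified table (hypothetical, four entries only).
TO_SIMPLIFIED = {
    "發": "发",  # to send, to emit
    "髮": "发",  # hair
    "後": "后",  # after
    "后": "后",  # queen (same form in both scripts)
}

def to_simplified(text: str) -> str:
    """Convert character by character; unknown characters pass through unchanged."""
    return "".join(TO_SIMPLIFIED.get(ch, ch) for ch in text)

def traditional_candidates(ch: str) -> list:
    """Every Traditional character in the table that maps to Simplified `ch`."""
    return sorted(t for t, s in TO_SIMPLIFIED.items() if s == ch)

print(to_simplified("頭髮"))         # 頭发  (頭 is not in the toy table)
print(traditional_candidates("发"))  # ['發', '髮']
```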
 
Mihai N.

> In my research I have discovered that lots of people seem to have
> problems storing Chinese characters in an English SQL Server 2000
> instance.

Go Unicode all the way.
UTF-8 for the web pages and forms, NCHAR & NVARCHAR for the database.
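"UTF-8 for the web pages and forms" covers both directions: declare UTF-8 on the page (Content-Type header and meta tag, plus accept-charset on the form) and decode posted form data as UTF-8. A minimal Python WSGI sketch of that round trip follows; it is an illustration under those assumptions, not ASP.NET code, and the field name "name" is hypothetical.

```python
import html
import io
from urllib.parse import parse_qs, quote

def app(environ, start_response):
    """Minimal WSGI app: read a UTF-8 form post and echo it back in a UTF-8 page."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    # urlencoded form bodies are ASCII; the percent-escapes carry the UTF-8 bytes.
    body = environ["wsgi.input"].read(size).decode("ascii")
    name = parse_qs(body, encoding="utf-8").get("name", [""])[0]
    page = (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        '<form method="post" accept-charset="utf-8">'
        '<input name="name"><button>Save</button></form>'
        f"<p>{html.escape(name)}</p></body></html>"
    )
    data = page.encode("utf-8")
    # Declare the charset in the HTTP header as well as in the <meta> tag.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]

# Simulate a browser posting name=漢字 (percent-encoded as UTF-8).
posted = ("name=" + quote("漢字")).encode("ascii")
captured = {}
response = app({"CONTENT_LENGTH": str(len(posted)), "wsgi.input": io.BytesIO(posted)},
               lambda status, headers: captured.update(status=status, headers=headers))
print(b"".join(response).decode("utf-8"))
```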

> Also, is it usual to let the user choose between Traditional and
> Simplified Chinese?

You should think of them as different languages.
So if you allow switching between French, German and English, then you should
allow switching between Simplified and Traditional Chinese.

> Perhaps they are
> all in one or the other as a general rule?

No.

> The StrConv function seems to support translation back and forth
> between simplified and traditional.

Useless. Even if you convert between the Simplified Chinese encoding (GB2312) and
the Traditional Chinese encoding (Big5), there are still linguistic differences
(more than between US, Australian, UK and Indian English).
Yes, the user might understand something, but it will be obvious that it is not
the real thing.

> You know, CHS to CHT conversion is a many to one conversion. ....
> Store everything in CHT in the beginning will save you from a lot of
> problem later.

Please don't. The two are different languages; there is no conversion.
It is more like translation. Compare again with English.
One might come up with a list of search-and-replace pairs (color/colour and so on),
but then you have whole expressions (money for jam <=> easy money, fitted
carpet <=> wall-to-wall carpeting).

Best advice: keep them separate, consider them different languages.
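"Keep them separate" translates naturally into a content store keyed by page and language, with Simplified and Traditional Chinese as two independent entries (here using the BCP 47 tags zh-Hans and zh-Hant). The Python sketch below is hypothetical throughout: the store, page names and fallback rule are assumptions. Note that it deliberately has no conversion step; a missing zh-Hant page falls back to the default language rather than to machine-converted zh-Hans text.

```python
# Hypothetical content store: every (page, language) pair is authored separately.
CONTENT = {
    ("home", "en"): "Welcome",
    ("home", "zh-Hans"): "欢迎",
    ("home", "zh-Hant"): "歡迎",
    ("about", "en"): "About us",
}

def get_page(page: str, lang: str, fallback: str = "en") -> str:
    """Return the page in `lang`, or the fallback language's version if it is missing."""
    text = CONTENT.get((page, lang))
    if text is None:
        # No Simplified <-> Traditional conversion: use the fallback language instead.
        text = CONTENT[(page, fallback)]
    return text

print(get_page("home", "zh-Hant"))   # 歡迎
print(get_page("about", "zh-Hant"))  # About us
```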
 
Steven Nagy

Ok, thanks for that.
Which of the two do you recommend? Traditional or Simplified?
 
Mihai N.

> Ok, thanks for that.
> Which of the two do you recommend? Traditional or Simplified?

There is no direct answer for this. What customers do you have?
It is like asking "Which of the two do you recommend: German or French?"
 
Steven Nagy

The customers are Chinese!
So you can't use a French/German comparison, because those are two
different countries.
Germans don't have two different writing scripts.

I guess what I am asking is:
Which is used more predominantly on Chinese websites, Traditional or
Simplified?
I have no idea which markets the two separate scripts would be targeting,
because I have no idea who uses which script.
 
Lau Lei Cheong

I'd say half and half.

CHT is commonly used in Taiwan, Hong Kong and Macau, while CHS is used in
mainland China, Singapore and most other places.

So the decision depends on which audience your website is intended for. If
your audience includes both groups, you had better support both.
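If the site does serve both groups, a common approach is to pick the default script from the browser's Accept-Language header and still let users switch manually. A simplified Python sketch follows. The region list (TW, HK, MO) is an assumption based on where Traditional characters are standard (Singapore officially uses Simplified), q-values are ignored on the assumption that browsers list preferred languages first, and the zh-Hans default is hypothetical.

```python
# Regions where Traditional characters are the norm (assumption; adjust for your audience).
TRADITIONAL_REGIONS = {"TW", "HK", "MO"}

def chinese_variant(accept_language: str) -> str:
    """Pick 'zh-Hant' or 'zh-Hans' from an Accept-Language header value."""
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip()  # drop any ";q=..." weight
        parts = tag.split("-")
        if parts[0].lower() != "zh":
            continue  # not a Chinese tag; look at the next preference
        subtags = [p.lower() for p in parts[1:]]
        if "hant" in subtags:  # explicit script subtag wins
            return "zh-Hant"
        if "hans" in subtags:
            return "zh-Hans"
        if any(p.upper() in TRADITIONAL_REGIONS for p in parts[1:]):
            return "zh-Hant"
        return "zh-Hans"
    return "zh-Hans"  # hypothetical default when no Chinese tag is present

print(chinese_variant("zh-TW,zh;q=0.9,en;q=0.8"))  # zh-Hant
print(chinese_variant("zh-CN,zh;q=0.9"))           # zh-Hans
```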
 
