Multilanguage

  • Thread starter: Zhangming Su

Zhangming Su

My database contains text in multiple languages, such as English, French, Chinese,
Japanese and so on (16 languages in all). Can anyone tell me how I can programmatically
detect which language the text in a column is written in, without storing an additional flag?
 
Zhangming,

There is really no reliable way to do that. Assuming that you are storing
everything in Unicode columns, you can try to scan every character and see
whether they all fall within a range of Unicode that corresponds to a
particular script (which hints at a language), but at best that is a heuristic.
You really need to store the language along with the information.
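As a rough illustration of that heuristic, here is a minimal Python sketch. It assumes the text has already been decoded into Unicode strings; the block ranges in `SCRIPT_RANGES` are an illustrative subset, not a complete table, and the names `script_counts` and `guess_language` are hypothetical. Note what it can and cannot do: kana points to Japanese and Hangul to Korean, but English and French share the Latin script and cannot be told apart this way, and Japanese written only in kanji looks like Chinese.

```python
from collections import Counter

# Illustrative subset of Unicode blocks (assumption: far from complete).
SCRIPT_RANGES = [
    ("Hiragana", 0x3040, 0x309F),
    ("Katakana", 0x30A0, 0x30FF),
    ("Han", 0x4E00, 0x9FFF),      # CJK Unified Ideographs
    ("Hangul", 0xAC00, 0xD7AF),   # Hangul Syllables
    ("Cyrillic", 0x0400, 0x04FF),
    ("Greek", 0x0370, 0x03FF),
    ("Arabic", 0x0600, 0x06FF),
    ("Latin", 0x0041, 0x005A),    # A-Z
    ("Latin", 0x0061, 0x007A),    # a-z
    ("Latin", 0x00C0, 0x024F),    # accented Latin letters (e.g. French)
]


def script_counts(text: str) -> Counter:
    """Count letters per script; digits, spaces and punctuation are ignored."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        cp = ord(ch)
        for name, lo, hi in SCRIPT_RANGES:
            if lo <= cp <= hi:
                counts[name] += 1
                break
        else:
            counts["Other"] += 1  # letter outside the ranges listed above
    return counts


def guess_language(text: str) -> str:
    """Heuristic guess based only on which scripts appear in the text."""
    c = script_counts(text)
    if not c:
        return "unknown"
    if c["Hiragana"] or c["Katakana"]:
        return "Japanese"  # kana is specific to Japanese
    if c["Hangul"]:
        return "Korean"
    if c["Han"]:
        return "Chinese (or Japanese written only in kanji)"
    top = c.most_common(1)[0][0]
    if top == "Latin":
        # English, French, German, ... all share this script.
        return "Latin script (language undetermined)"
    return f"{top} script"
```

For example, `guess_language("こんにちは世界")` gives `"Japanese"`, while `"Hello world"` and `"Bonjour le monde"` both come back as the same undetermined Latin-script result, which is why this is only a heuristic.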
 
You need an NLP language identification tool or algorithm.
Check this:
http://www.let.rug.nl/~vannoord/TextCat/competitors.html

I'm not an expert myself; I saw that months ago but haven't gotten my hands
wet yet.
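TextCat, linked above, implements the character n-gram "out-of-place" method of Cavnar and Trenkle: build a ranked list of the most frequent character n-grams for each language, do the same for the input, and pick the language whose ranking is closest. Below is a toy Python sketch of that idea, not TextCat's own code or interface. The two training sentences in `SAMPLES` are made up and far too short for real use, n-grams are limited to lengths 1 to 3 for brevity, and the names `ngram_profile`, `out_of_place` and `identify` are hypothetical.

```python
from collections import Counter

# Toy training text (hypothetical); real profiles need far more text per language.
SAMPLES = {
    "english": "the cat is on the table and the dog is in the garden with the children",
    "french": "le chat est sur la table et le chien est dans le jardin avec les enfants",
}


def ngram_profile(text: str, max_n: int = 3, top_k: int = 300) -> list[str]:
    """Return character n-grams (lengths 1..max_n) ranked by frequency."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return [gram for gram, _ in counts.most_common(top_k)]


def out_of_place(doc: list[str], lang: list[str]) -> int:
    """Sum of rank differences; n-grams missing from lang get the maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang)}
    penalty = len(lang)
    return sum(abs(i - rank[g]) if g in rank else penalty
               for i, g in enumerate(doc))


PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}


def identify(text: str, profiles: dict[str, list[str]] = PROFILES) -> str:
    """Pick the language whose profile is closest to the input's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

Calling `identify("le chien et les enfants")` scores the input against each profile. With toy samples like these, results on short inputs are unreliable; the method needs realistic training text to be useful.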
 
Certainly an interesting solution... however, I wonder whether there are
some limitations in terms of
a) Accuracy -- how well will the tool handle input of just a few words that
could belong to more than one language?
b) Performance -- how long does the tool take to evaluate each input?

I wonder whether it would be simpler to have a column that stores the
language...
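A minimal sketch of that alternative, using Python's built-in `sqlite3` module only so the example runs on its own. The table `messages`, its columns and the helper `bodies_in` are hypothetical, and the `lang` values are assumed to be ISO 639-1 codes such as `en`, `fr`, `zh` and `ja`. The language is recorded once, when the data is written, so reading it back is a plain indexed lookup with no detection step and no accuracy question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        id   INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        lang TEXT NOT NULL  -- language code, e.g. 'en', 'fr', 'zh', 'ja'
    )
""")
# Index so that filtering by language stays cheap as the table grows.
conn.execute("CREATE INDEX idx_messages_lang ON messages (lang)")

rows = [
    ("Hello world", "en"),
    ("Bonjour le monde", "fr"),
    ("你好世界", "zh"),
    ("こんにちは世界", "ja"),
]
conn.executemany("INSERT INTO messages (body, lang) VALUES (?, ?)", rows)


def bodies_in(conn: sqlite3.Connection, lang: str) -> list[str]:
    """Return all message bodies stored with the given language code."""
    cur = conn.execute(
        "SELECT body FROM messages WHERE lang = ? ORDER BY id", (lang,)
    )
    return [row[0] for row in cur]
```

For instance, `bodies_in(conn, "fr")` returns only the French row, without inspecting the text itself.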