Determine encoding


Ryu

Is there a way to determine whether a text is ASCII or Unicode in C#? I
have looked at the Encoding classes, but they don't seem to let me pass a
text to the encoding object. In addition, is there a way to determine the
text's language?
 

There's no way to determine it absolutely reliably. However, if you
look at the bytes and find that every other byte is 0 (as happens with
mostly Latin text stored as UTF-16), chances are you should be using
Encoding.Unicode.
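
A rough sketch of that byte inspection, for illustration only. The class
and method names (EncodingGuess, Guess) are made up here, the byte order
mark check is an extra step not mentioned above, and the "more than half
of the pairs" threshold is an arbitrary assumption. It ignores UTF-32 and
returns null when it cannot tell, so treat the result as a guess.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: returns a best guess, or null if nothing matches.
    public static Encoding Guess(byte[] data)
    {
        // A byte order mark, when present, is the strongest hint.
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16 little-endian
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16 big-endian

        // Count zero bytes at even and odd offsets, and note any non-ASCII byte.
        int zerosEven = 0, zerosOdd = 0;
        bool allAscii = true;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == 0)
            {
                if (i % 2 == 0) zerosEven++; else zerosOdd++;
            }
            if (data[i] > 0x7F) allAscii = false;
        }

        // "Every other byte is 0": assumed threshold of more than half the pairs.
        int pairs = data.Length / 2;
        if (zerosOdd * 2 > pairs && zerosOdd > zerosEven)
            return Encoding.Unicode;            // low byte first, high byte is 0
        if (zerosEven * 2 > pairs && zerosEven > zerosOdd)
            return Encoding.BigEndianUnicode;   // high byte (0) comes first

        // Only 7-bit values: consistent with ASCII (and with many other encodings).
        if (allAscii) return Encoding.ASCII;

        return null; // cannot tell from these checks
    }
}
```

Calling EncodingGuess.Guess(File.ReadAllBytes(path)) on a file would give
a starting point, but an all-ASCII result only means no byte above 0x7F
was seen; the same bytes are also valid UTF-8 and most ANSI code pages.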
 
Hi Ryu,

No, the encoding isn't stored anywhere in plain text. The text is simply
an array of bytes that may use one byte per character, or two, or
something else. You need to know the encoding in advance to be able to
decode the text properly, or you can make an educated guess.
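
To illustrate why the encoding has to be known, here is a small sketch
that decodes the same bytes two ways. DecodeDemo and its methods are
made-up names for this example; Encoding.GetBytes and Encoding.GetString
are the standard .NET calls.

```csharp
using System;
using System.Text;

static class DecodeDemo
{
    // Decode the same byte array with whatever encoding the caller assumes.
    public static string Decode(byte[] data, Encoding assumed)
    {
        return assumed.GetString(data);
    }

    // "Hi" as UTF-16 little-endian is the four bytes 48 00 69 00.
    public static byte[] SampleBytes()
    {
        return Encoding.Unicode.GetBytes("Hi");
    }
}
```

Decoding SampleBytes() with Encoding.Unicode gives back "Hi". Decoding
the same four bytes with Encoding.ASCII gives a four-character string
with a NUL character after each letter, because each byte is treated as
its own character.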

And no, you can't determine the language of a text with certainty. You
could try to recognize certain words in the text and guess the language
that way, by comparing the words against lists of common words in
various languages.
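
A minimal sketch of that word-list comparison. Everything here is an
assumption for illustration: the LanguageGuess class, the three
languages, and the tiny word lists, which are far too small for real
use. It counts how many words of the text appear in each list and
returns the language with the most hits, or null if nothing matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LanguageGuess
{
    // Tiny illustrative word lists; a real detector would need far larger ones.
    static readonly Dictionary<string, HashSet<string>> CommonWords =
        new Dictionary<string, HashSet<string>>
        {
            ["English"] = new HashSet<string> { "the", "and", "is", "of", "to", "in" },
            ["German"]  = new HashSet<string> { "der", "die", "und", "ist", "nicht", "das" },
            ["French"]  = new HashSet<string> { "le", "la", "et", "est", "les", "des" },
        };

    static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Returns the language whose word list matches the most words, or null.
    public static string Guess(string text)
    {
        string[] words = text.ToLowerInvariant()
            .Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        string best = null;
        int bestScore = 0;
        foreach (var entry in CommonWords)
        {
            int score = words.Count(w => entry.Value.Contains(w));
            if (score > bestScore)
            {
                best = entry.Key;
                bestScore = score;
            }
        }
        return best;
    }
}
```

Short texts, or texts made up mostly of names and numbers, will often
match nothing, so the null case needs handling by the caller.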
 