Determine encoding

Ryu

Is there a way to determine whether a text is ASCII or Unicode in C#? I have
looked at the Encoding classes, but I have found that they don't allow me to
pass a text to the encoding object. In addition, is there a way to determine
the text's language?
 
Jon Skeet [C# MVP]

Ryu said:
Is there a way to determine whether a text is ASCII or Unicode in C#? I have
looked at the Encoding classes, but I have found that they don't allow me to
pass a text to the encoding object. In addition, is there a way to determine
the text's language?

There's no way to determine it absolutely reliably. However, if you
look at the bytes and find that every other byte is 0, chances are you
should be using Encoding.Unicode (UTF-16, little-endian).
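
A minimal sketch of that "every other byte is 0" check might look like the following. The class name, the method name, and the 90% threshold are all assumptions made for illustration, not part of the .NET framework, and the heuristic only works for text that is mostly in the ASCII range.

```csharp
using System;
using System.Text;

static class EncodingGuess
{
    // Hypothetical helper: returns true when nearly every second byte
    // (the high byte of each little-endian 16-bit unit) is zero, which is
    // what UTF-16 LE looks like for mostly ASCII-range text.
    public static bool LooksLikeUtf16LittleEndian(byte[] data)
    {
        // UTF-16 data always has an even, non-zero length.
        if (data.Length < 2 || data.Length % 2 != 0)
            return false;

        int pairs = data.Length / 2;
        int zeroHighBytes = 0;
        for (int i = 1; i < data.Length; i += 2)
        {
            if (data[i] == 0)
                zeroHighBytes++;
        }

        // The 90% threshold is an arbitrary choice for this sketch.
        return zeroHighBytes >= pairs * 0.9;
    }
}
```

If the check returns true, decoding with Encoding.Unicode.GetString(data) is a reasonable first attempt; it is still only a guess, so the result should be treated with suspicion.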
 
Morten Wennevik

Hi Ryu,

No, the encoding isn't stored anywhere in pure text. The text is simply
an array of bytes that may use one byte per character, or two (or
something else). You need to know the encoding in advance to decode the
text properly, or you can make an educated guess.

And no, you can't reliably determine the language of a text. Well, you
could try to recognize certain words in the text and determine the
language that way, which involves comparing the words against lists of
known words in various languages.
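
The word-list idea could be sketched roughly as below. Everything here is hypothetical: the class name, the three languages, and the tiny word lists are placeholders chosen for illustration, and a usable version would need far larger lists and some handling of ties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LanguageGuess
{
    // Tiny illustrative word lists (assumed for this sketch only).
    static readonly Dictionary<string, HashSet<string>> CommonWords =
        new Dictionary<string, HashSet<string>>
        {
            ["English"] = new HashSet<string> { "the", "and", "is", "of", "to" },
            ["German"]  = new HashSet<string> { "der", "die", "und", "ist", "nicht" },
            ["French"]  = new HashSet<string> { "le", "la", "et", "est", "les" },
        };

    // Returns the language whose list matches the most words in the text,
    // or null when no word matches any list.
    public static string Guess(string text)
    {
        string[] words = text.ToLowerInvariant().Split(
            new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' },
            StringSplitOptions.RemoveEmptyEntries);

        string best = null;
        int bestHits = 0;
        foreach (var entry in CommonWords)
        {
            // Count how many words of the text appear in this language's list.
            int hits = words.Count(w => entry.Value.Contains(w));
            if (hits > bestHits)
            {
                best = entry.Key;
                bestHits = hits;
            }
        }
        return best;
    }
}
```

Short texts, or texts that mix languages, will easily fool a counter like this, which is why the result is best treated as a hint rather than an answer.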
 
