Detect character set of a text file

Jaroslav Jakes

Hi,

I can't find out how to detect the character set of a text file. The orders I
receive by mail have the same structure but are created on different
systems, such as DOS, Windows and Linux, so the character set may differ.
After importing these files, the results may differ because
characters like ä, à... in a DOS-created file are not encoded the same way
as in a Windows-created file.

Do you know a way to detect the character set of a text file so that I can
convert the file correctly?

Thanks and regards

Jari
 
Morten Wennevik

Hi Jari,

For a simple text file, as far as I know, there is no information about the character set used.
You have to either guess the character set or find out from whoever or whatever created the text file.
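
To illustrate why guessing is needed, here is a minimal Python sketch. The byte values are made up for illustration, and cp850 and cp1252 are only assumed to be the DOS and Windows code pages involved. The same bytes decode without error under both, but to different characters:

```python
# The same bytes are valid in both code pages, so nothing in the file
# itself says which one the sender meant.
raw = bytes([0x84, 0x85])  # two bytes as they might appear in an order file

as_dos = raw.decode("cp850")       # assumed DOS code page: gives "äà"
as_windows = raw.decode("cp1252")  # assumed Windows code page: gives "„…"

# Neither call raises an error; only a human (or a heuristic) can tell
# which result is the intended text.
same_text = as_dos == as_windows   # False
```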
 
Jaroslav Jakes

Hi Morten,

I could map, e.g., the sender of a text file to the character set used by that
sender. I would prefer a solution that detects the character set: a sender could
change his PC or OS, and that would mean the mapping would have to be changed as
well.

Thanks - Jari
 
Jon Skeet [C# MVP]

Jaroslav Jakes said:
I could map e.g. sender of text file to characterset used by this
sender. Would prefer a solution with detection of characterset:
sender could change his PC, OS and this would mean, that definition
would have to be changed as well.

The problem is that two users could send two identical files, but mean
entirely different things by them. There are some character encodings
which can be detected as the *likely* encoding for a file, but it'll
never be 100% accurate.
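
A rough sketch of such a "likely encoding" guess, in Python. This is only a heuristic under stated assumptions: the function name `guess_encoding`, the candidate code pages (cp1252, cp850) and the set of expected letters are all hypothetical choices for Western European order files, not anything from this thread. It checks for a byte order mark, then tries strict UTF-8, then scores the legacy code pages by how many plausible letters each one produces.

```python
import codecs

# Candidate legacy code pages, assumed for this sketch.
LEGACY_CANDIDATES = ("cp1252", "cp850")

# Letters one might expect in Western European order text (an assumption).
EXPECTED = set("äöüÄÖÜßàáâèéêëìíîòóôùúûçñ")


def guess_encoding(data: bytes) -> str:
    """Return a *likely* encoding name for data; never a certainty."""
    # 1. A byte order mark is the only explicit hint a plain text file
    #    can carry (UTF-32 is ignored in this sketch).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # 2. Legacy 8-bit text with accented letters rarely forms valid UTF-8
    #    by accident, so a clean strict decode is a strong signal.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Otherwise score each legacy code page by how many expected
    #    letters it yields; ties go to the first candidate.
    def score(encoding: str) -> int:
        text = data.decode(encoding, errors="replace")
        return sum(ch in EXPECTED for ch in text)

    return max(LEGACY_CANDIDATES, key=score)


sample = "Käse à la carte"
guess_dos = guess_encoding(sample.encode("cp850"))       # "cp850"
guess_windows = guess_encoding(sample.encode("cp1252"))  # "cp1252"
```

When the scores tie, or the file contains no accented letters at all, the guess carries no real information, so a per-sender default is still a sensible fallback.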
 
