Dealing with textfiles with multiple encodings

Gustaf

I'm planning to write an app that will extract certain messages in
mailboxes stored in Eudora and Thunderbird formats. These are both
"plain text" formats, but the character encoding varies greatly from one
message to the other. I wonder how to deal with that, so that the
messages come out in the right encoding.

I suppose I need to decode each message according to its MIME headers to
get it right, but now I wonder how to store the messages in memory
before decoding them. Would it be right to treat the text as binary data?

Gustaf
 
Mihai N.

> I suppose I need to decode each message according to its MIME headers to
> get it right, but now I wonder how to store the messages in memory
> before decoding them. Would it be right to treat the text as binary data?

Take a look at System.Net.Mime.
 
Göran Andersson

Yes, you can treat it as binary data, i.e. a byte array. You can use the
ASCII encoding to read enough of the message to determine the encoding,
and then convert it with the correct encoding.
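
A minimal sketch of that approach, written in Python for brevity (in .NET, Encoding.ASCII and Encoding.GetEncoding(name) would play the same roles). The function name decode_message is hypothetical, and the sketch assumes a single-part message whose body is stored as-is (7bit or 8bit); base64 or quoted-printable bodies and multipart messages would need an extra step that is not shown here.

```python
import re


def decode_message(raw: bytes) -> str:
    """Decode the body of one raw message using the charset named in its headers."""
    # Keep the message as bytes until the charset is known.
    # The header block ends at the first blank line (CRLF or bare LF).
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    head = parts[0]
    body = parts[1] if len(parts) > 1 else b""

    # Headers are expected to be ASCII, so ASCII is enough to read them.
    # Any stray non-ASCII byte is replaced instead of raising an error.
    headers = head.decode("ascii", errors="replace")

    # Look for charset=... (quoted or not) in the Content-Type header.
    match = re.search(r'charset\s*=\s*"?([^\s";]+)', headers, re.IGNORECASE)
    charset = match.group(1) if match else "us-ascii"

    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Assumption: fall back to Latin-1 when the charset label is unknown.
        return body.decode("latin-1")


if __name__ == "__main__":
    sample = b"Content-Type: text/plain; charset=ISO-8859-1\r\n\r\nG\xf6ran"
    print(decode_message(sample))
```

Nothing is converted to text until the headers have been read, so each message in the mailbox can use a different charset.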
 
