Dealing with text files with multiple encodings

Gustaf

I'm planning to write an app that will extract certain messages in
mailboxes stored in Eudora and Thunderbird formats. These are both
"plain text" formats, but the character encoding varies greatly from one
message to the other. I wonder how to deal with that, so that the
messages come out in the right encoding.

I suppose I need to decode each message according to its MIME headers to
get it right, but now I wonder how to store the messages in memory
before decoding them. Would it be right to treat the text as binary data?

Gustaf
 
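Take a look at the System.Net.Mime namespace; its ContentType class can
parse a Content-Type header value, including the charset parameter.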
 
Yes, you can treat it as binary data, i.e. a byte array. You can use the
ASCII encoding to read the headers, which is enough to determine the
charset, and then convert the body with the correct encoding.
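
The approach described in the replies could be sketched roughly as follows.
This is only an illustration under some assumptions: the message has already
been cut out of the mailbox file into its own byte array, it is a single-part
text message, and no quoted-printable or base64 transfer encoding is in play
(that would need a further decoding step). RawMessage, FindCharset and
DecodeBody are made-up names, not part of .NET.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
static class RawMessage
{
    // Returns the charset named in the Content-Type header, or null if none is found.
    public static string FindCharset(string headers)
    {
        string[] lines = headers.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i].TrimEnd('\r');
            if (!line.StartsWith("Content-Type:", StringComparison.OrdinalIgnoreCase))
                continue;

            // Unfold continuation lines, which start with a space or a tab.
            var value = new StringBuilder(line.Substring("Content-Type:".Length));
            while (i + 1 < lines.Length && lines[i + 1].Length > 0
                   && (lines[i + 1][0] == ' ' || lines[i + 1][0] == '\t'))
            {
                value.Append(' ').Append(lines[++i].Trim());
            }

            try
            {
                // ContentType parses "type/subtype; name=value" and exposes CharSet.
                return new ContentType(value.ToString().Trim()).CharSet;
            }
            catch (FormatException) { return null; }   // malformed header value
            catch (ArgumentException) { return null; } // empty header value
        }
        return null;
    }

    // Decodes the body of one raw message using the charset declared in its headers.
    public static string DecodeBody(byte[] raw)
    {
        // ASCII decoding yields one char per byte, so string offsets equal byte offsets.
        string ascii = Encoding.ASCII.GetString(raw);

        // Headers end at the first blank line (CRLF or bare LF line endings).
        int crlf = ascii.IndexOf("\r\n\r\n", StringComparison.Ordinal);
        int lf = ascii.IndexOf("\n\n", StringComparison.Ordinal);
        int headerEnd, bodyStart;
        if (crlf >= 0 && (lf < 0 || crlf < lf)) { headerEnd = crlf; bodyStart = crlf + 4; }
        else if (lf >= 0) { headerEnd = lf; bodyStart = lf + 2; }
        else { headerEnd = raw.Length; bodyStart = raw.Length; }

        string charset = FindCharset(ascii.Substring(0, headerEnd));

        Encoding encoding;
        try
        {
            // Assumption: fall back to ASCII (the MIME default) when no charset is declared.
            encoding = charset == null ? Encoding.ASCII : Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            encoding = Encoding.ASCII; // unknown or unregistered charset name
        }
        return encoding.GetString(raw, bodyStart, raw.Length - bodyStart);
    }
}
```

Calling RawMessage.DecodeBody(bytes) on one message's bytes returns its body
as a string. Note that on .NET Core and later, legacy code pages such as
windows-1252 are only found by Encoding.GetEncoding after
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called;
without it, the sketch silently falls back to ASCII for those messages.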
 