Dealing with textfiles with multiple encodings

Gustaf · Oct 21, 2006

I'm planning to write an app that will extract certain messages from
mailboxes stored in Eudora and Thunderbird formats. These are both
"plain text" formats, but the character encoding varies greatly from one
message to the next. I wonder how to deal with that, so that the
messages come out in the right encoding.

I suppose I need to decode each message according to its MIME headers to
get it right, but now I wonder how to store the messages in memory
before decoding them. Would it be right to treat the text as binary data?

Gustaf

Mihai N. · Oct 21, 2006

> I suppose I need to decode each message according to its MIME headers to
> get it right, but now I wonder how to store the messages in memory
> before decoding them. Would it be right to treat the text as binary data?

Take a look at System.Net.Mime

Göran Andersson · Oct 21, 2006

Yes, you can treat it as binary data, i.e. a byte array. You can use the
ASCII encoding to read enough of the message to determine the encoding,
and then convert it with the correct encoding.
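
To make that concrete, here is a minimal sketch of the idea in Python rather than .NET (in .NET the same steps would go through Encoding.ASCII and Encoding.GetEncoding). The function name decode_message and the fallback to us-ascii when no charset is declared are my own assumptions, not something from the mailbox formats. It only handles a simple single-part message; it does not undo quoted-printable or base64 transfer encoding and does not walk multipart messages.

```python
import re


def decode_message(raw: bytes) -> str:
    """Decode the body of one raw message using the charset named in its headers.

    Hypothetical helper for illustration: single-part messages only,
    no Content-Transfer-Encoding handling.
    """
    # Keep the message as bytes and split headers from body at the first blank line.
    split = re.search(rb"\r?\n\r?\n", raw)
    if split:
        header_bytes, body = raw[:split.start()], raw[split.end():]
    else:
        header_bytes, body = raw, b""

    # Read only the headers as ASCII; stray non-ASCII bytes become U+FFFD.
    headers = header_bytes.decode("ascii", errors="replace")

    # Pick up the charset parameter, e.g. Content-Type: text/plain; charset=ISO-8859-1
    found = re.search(r'charset\s*=\s*"?([^";\s]+)', headers, re.IGNORECASE)
    charset = found.group(1) if found else "us-ascii"  # assumed default

    # Now convert the body bytes with the encoding the message declared.
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Unknown charset name: fall back to Latin-1, which accepts every byte value.
        return body.decode("latin-1")
```

For example, decode_message(b"Content-Type: text/plain; charset=ISO-8859-1\r\n\r\nG\xf6ran") returns "Göran". For real mailboxes a full MIME parser is likely the better route (Python ships one in its email package), since it also deals with transfer encodings and multipart bodies.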
