Outlook MSG file reading

  • Thread starter: Dmitry Akselrod

Dmitry Akselrod

Hello everyone,

I am attempting to extract some header information from typical Microsoft
Outlook MSG files in VB.NET. I am not after a complete message or
attachments that may be enclosed. I am particularly interested in the
Message ID field. I have examined MSG files in notepad and hex editors. I
can see that the Internet Headers are there and present. I can do a search
for Message-ID and locate it without any problems in notepad. The only
display issue I have seen so far is that each letter is separated by hex
character 00. Thus the Message-ID string would actually be, M e s s a g e -
I D.
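
For what it's worth, a 00 byte after every letter is what UTF-16 little-endian text looks like when it is viewed one byte at a time, so the string can be searched for and decoded directly. A minimal sketch in Python rather than VB.NET; the sample bytes and the helper name find_utf16_marker are made up for illustration and are not taken from a real MSG file:

```python
def find_utf16_marker(data: bytes, marker: str = "Message-ID") -> int:
    """Return the byte offset of marker encoded as UTF-16LE, or -1 if absent."""
    return data.find(marker.encode("utf-16-le"))


# Hypothetical sample: a few binary bytes followed by a UTF-16LE header line.
sample = b"\x01\x02\x03\x04" + "Message-ID: <abc@example.com>\r\n".encode("utf-16-le")

offset = find_utf16_marker(sample)
# Decode from the hit to the end of the sample to get readable text back.
text = sample[offset:].decode("utf-16-le")
print(offset, repr(text))
```

Decoding the whole file this way would not work, because most of the file is not text; the sketch only decodes a slice that is known to start at a UTF-16LE string.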

I don't want to use Outlook automation. I have found it to be cumbersome
and slow. I also don't want to be reliant on an installation of Office.

Since the file is binary, I have attempted to use the System.IO.FileStream
object to read the file. However, I have
not been able to successfully walk through the file and obtain any readable
text. I have played around with various encodings, such as ASCII and
Unicode. I think that MSG files are Base64/MIME encoded though. Perhaps
that could be part of my trouble.

I have downloaded several example applications that mimic Notepad. However,
none of them have been able to read the encoding of MSG files. I have
gained a new level of appreciation for Notepad :). I wonder what it is that
notepad uses to detect the file encoding and display it in such a readable
way.

Does anyone have any experience with reading Outlook data? Again, I am not
after pretty formatting, I just want to extract certain text fragments from
these binary files. Can someone point me in the right direction? I would
think that I just need to be able to read a byte stream from the file with the
correct encoding and convert it to ASCII text. I have been totally
unsuccessful so far.

Thanks,
Dmitry
 
Outlook can be automated, just like Word, Excel etc. It's a bit cranky, but
I have done it. Have you tried adding a reference to it?
 
Hi,

That's my whole point: I don't want to automate Outlook. It's very
clunky. I need to be able to process millions of MSG files and Office
products (e.g. Access) suck with that many files.

Thank you though.

dmitry
 
In that case I'd start searching for third party tools. I assume that MSFT
aren't offering to divulge the details of the format.
 
No, MS is definitely not documenting their MSG format. I did find this
article:

http://www.msusenet.com/archive/topic.php/t-288764.html

A gentleman named Eduardo A. Morcillo has developed some .NET classes that
wrap the Office OLE storage. They are pretty good so far. The classes are
here:

http://www.mvps.org/emorcillo/en/code/grl/storage.shtml

I have been able to take a couple of MSG files and obtain a list of streams
(properties) and their values. However, I am still missing the Internet
Headers. They must lie somewhere else in the file. All of this is quite
annoying, thanks to Microsoft.

The only known working API I have seen so far (used by many forensic
applications) is from Fookes Software. These guys are great and their tools
are phenomenal, but the API is a little outside my price range.

Being able to obtain the Sender, Recipient, Subject, etc. is definitely a
plus, but I need the Message ID. I guess it's back to more research.

Dmitry


Basically, the MSG file format is a series of binary streams.
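
To make "a series of binary streams" a bit more concrete: an MSG file is an OLE structured storage (compound file) container, the same kind of container the storage classes mentioned above wrap. Such files begin with the eight-byte signature D0 CF 11 E0 A1 B1 1A E1, so a cheap sanity check before trying to open one is to compare the first eight bytes. A minimal sketch in Python; the helper name looks_like_compound_file and the temporary demo files are illustrative only:

```python
import os
import tempfile

# Signature found at offset 0 of every OLE compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_compound_file(path: str) -> bool:
    """Return True if the file starts with the compound file signature."""
    with open(path, "rb") as fh:
        return fh.read(8) == CFB_SIGNATURE


def _write_temp(data: bytes) -> str:
    """Write data to a temporary file and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Demo with two made-up files: one with the signature, one plain text.
good = _write_temp(CFB_SIGNATURE + b"\x00" * 504)
bad = _write_temp(b"From: someone@example.com\r\n")
good_result = looks_like_compound_file(good)
bad_result = looks_like_compound_file(bad)
print(good_result, bad_result)
os.remove(good)
os.remove(bad)
```

The check only says the container type is right; it does not prove the file is a valid message.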
 
Actually, never mind on the Internet Headers, they are there. They happen
to be in the stream __substg1.0_007D001F. I just had some issues with data
formatting and conversion. I think that my problem is solved, thanks to Mr.
Morcillo.
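
For anyone who finds this later, the stream name itself explains the formatting issue. After the __substg1.0_ prefix come four hex digits of property ID (007D, the transport message headers) and four hex digits of property type (001F, a Unicode string stored as UTF-16LE; 001E would be an 8-bit string). A minimal sketch in Python of turning such a stream into a Message-ID. It assumes the raw bytes have already been read out of the file with a structured storage reader; the function names and the sample header text are made up, and Latin-1 is only an assumed fallback for 8-bit strings:

```python
from email.parser import HeaderParser
from typing import Optional, Tuple

PREFIX = "__substg1.0_"


def parse_stream_name(name: str) -> Tuple[int, int]:
    """Split '__substg1.0_PPPPTTTT' into (property id, property type)."""
    if not name.startswith(PREFIX) or len(name) < len(PREFIX) + 8:
        raise ValueError(f"not a property stream name: {name!r}")
    hex_part = name[len(PREFIX):len(PREFIX) + 8]
    return int(hex_part[:4], 16), int(hex_part[4:], 16)


def message_id_from_stream(name: str, raw: bytes) -> Optional[str]:
    """Decode a header stream by its type code and return the Message-ID."""
    _prop_id, prop_type = parse_stream_name(name)
    # 0x001F means UTF-16LE text; anything else is treated as Latin-1 here,
    # which is an assumption, not something the file guarantees.
    encoding = "utf-16-le" if prop_type == 0x001F else "latin-1"
    headers = HeaderParser().parsestr(raw.decode(encoding))
    return headers["Message-ID"]


# Hypothetical stream content standing in for real Internet Headers.
sample_headers = (
    "Received: from mail.example.com\r\n"
    "Message-ID: <abc@example.com>\r\n"
    "Subject: test\r\n\r\n"
)
raw_stream = sample_headers.encode("utf-16-le")

ids = parse_stream_name("__substg1.0_007D001F")
message_id = message_id_from_stream("__substg1.0_007D001F", raw_stream)
print(ids, message_id)
```

Using a header parser instead of a plain text search means folded (multi-line) headers and different capitalisations of Message-ID are handled as well.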

dmitry
 
Difficulties in reading Outlook MSG files arise when Outlook is not installed on the system or the MSG files are corrupted/unsupported. Opening them directly is not possible for every user. A simple solution to this problem is to use a third-party tool like GainTools MSG Converter. Bulk processing, clean interface and accurate email properties preservation make it a reliable option for Outlook MSG reading. This suite allows you to open, preview and convert MSG files to multiple formats (EML, PST, MBOX, PDF, HTML) without Outlook.
 
