Outlook MSG file reading

Dmitry Akselrod

Hello everyone,

I am attempting to extract some header information from typical Microsoft
Outlook MSG files in VB.NET. I am not after a complete message or
attachments that may be enclosed. I am particularly interested in the
Message ID field. I have examined MSG files in Notepad and hex editors. I
can see that the Internet Headers are present, and I can search for
Message-ID and locate it without any problems in Notepad. The only
display issue I have seen so far is that each letter is separated by the hex
byte 00. Thus the Message-ID string actually appears as M e s s a g e -
I D.

I don't want to use Outlook automation. I have found it to be cumbersome
and slow. I also don't want to be reliant on an installation of Office.

Since the file is binary, I have attempted to use the System.IO.FileStream
object to read the file. However, I have not been able to successfully walk
through the file and obtain any readable text. I have played around with
various encodings, such as ASCII and Unicode. I think that MSG files are
BASE64/MIME encoded though. Perhaps that could be part of my trouble.

I have downloaded several example applications that mimic Notepad. However,
none of them have been able to read the encoding of MSG files. I have
gained a new level of appreciation for Notepad :). I wonder what it is that
Notepad uses to detect the file encoding and display it in such a readable
way.

Does anyone have any experience with reading Outlook data? Again, I am not
after pretty formatting, I just want to extract certain text fragments from
these binary files. Can someone point me in the right direction? I would
think that I just need to be able to read a byte stream from the file with the
correct encoding and convert it to ASCII text. I have been totally
unsuccessful so far.

Thanks,
Dmitry
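
A 00 byte after every letter, as described above, is what ASCII text looks like when it is stored as UTF-16LE (the encoding .NET exposes as System.Text.Encoding.Unicode). Below is a minimal sketch in Python of searching raw bytes for a UTF-16LE string and decoding the line that follows. The sample bytes and the function name are made up for illustration. Scanning a raw file this way can also miss text that the container splits across non-contiguous sectors, so treat it as a diagnostic rather than a parser.

```python
from typing import Optional

# Hypothetical sample: junk bytes around a header line stored as UTF-16LE,
# i.e. every ASCII letter is followed by a 0x00 byte.
sample = (
    b"\x01\x02junk"
    + "Message-ID: <abc123@example.com>\r\n".encode("utf-16-le")
    + b"\xff\xfe"
)

def extract_utf16_line(data: bytes, needle: str) -> Optional[str]:
    """Find `needle` encoded as UTF-16LE and return the text up to the next CRLF."""
    start = data.find(needle.encode("utf-16-le"))
    if start == -1:
        return None
    end = data.find("\r\n".encode("utf-16-le"), start)
    if end == -1:
        end = len(data)
    # errors="replace" keeps the sketch from raising on a truncated tail
    return data[start:end].decode("utf-16-le", errors="replace")

line = extract_utf16_line(sample, "Message-ID")
print(line)
```

Note that a real Message-ID header may be folded across several lines, which this sketch does not handle.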
 
Homer J Simpson

> Does anyone have any experience with reading Outlook data? Again, I am
> not after pretty formatting, I just want to extract certain text fragments
> from these binary files. Can someone point me in the right direction? I
> would think that I just need to be able to read a byte stream from the file
> with the correct encoding and convert it to ASCII text. I have been
> totally unsuccessful so far.

Outlook can be automated, just like Word, Excel etc. It's a bit cranky, but
I have done it. Have you tried adding a reference to it?
 
Dmitry Akselrod

Hi,

That's my whole point: I don't want to automate Outlook. It's very
clunky. I need to be able to process millions of MSG files and Office
products (e.g. Access) suck with that many files.

Thank you though.

dmitry
 
Homer J Simpson

> That's my whole point: I don't want to automate Outlook. It's very
> clunky. I need to be able to process millions of MSG files and Office
> products (e.g. Access) suck with that many files.

In that case I'd start searching for third party tools. I assume that MSFT
aren't offering to divulge the details of the format.
 
Dmitry Akselrod

No, MS is definitely not documenting their MSG format. I did find this
article:

http://www.msusenet.com/archive/topic.php/t-288764.html

A gentleman named Eduardo A. Morcillo has developed some .NET classes that
wrap the Office OLE storage. They are pretty good so far. The classes are
here:

http://www.mvps.org/emorcillo/en/code/grl/storage.shtml

I have been able to take a couple of MSG files and obtain a list of streams
(properties) and their values. However, I am still missing the Internet
Headers. They must lie somewhere else in the file. All of this is quite
annoying, thanks to Microsoft.

The only known working API I have seen so far (used by many forensic
applications) is from Fookes Software. These guys are great and their tools
are phenomenal, but the API is a little outside my price range.

Being able to obtain the Sender, Recipient, Subject, etc. is definitely a
plus, but I need the Message ID. I guess it's back to more research.

Dmitry


Basically, the MSG file format is a series of binary streams.
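
Those streams sit inside an OLE compound file (structured storage) container, which is why a generic storage wrapper can list them. As a rough sketch in Python, a file can be checked for the 8-byte compound file signature before trying to open it as storage. The function name and the sample headers are made up for illustration.

```python
# The 8-byte signature found at the start of every OLE compound file.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")

def looks_like_compound_file(header: bytes) -> bool:
    """Return True if the first 8 bytes match the compound file signature."""
    return header[:8] == CFB_SIGNATURE

# Hypothetical inputs: a fake 512-byte header and a plain-text message.
fake_msg_header = CFB_SIGNATURE + b"\x00" * 504
plain_text = b"From: someone@example.com\r\n"

print(looks_like_compound_file(fake_msg_header))
print(looks_like_compound_file(plain_text))
```

When processing a large batch, a check like this is a cheap way to skip files that carry an .msg extension but are not actually in this container format.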
 
Dmitry Akselrod

Actually, never mind on the Internet Headers, they are there. They are in
the stream named __substg1.0_007D001F. I just had some issues with data
formatting and conversion. I think that my problem is solved, thanks to Mr.
Morcillo.

dmitry
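
For anyone who lands here later: the last eight hex digits of the stream name are a property tag. 007D is the property ID for the transport message headers, and 001F is the property type for a Unicode (UTF-16LE) string, which accounts for the 00 bytes between letters. Here is a minimal sketch in Python of splitting the name and pulling the Message-ID out of the stream bytes once a storage library has returned them. The function names and the sample header bytes are made up for illustration, and the Latin-1 fallback for type 001E is an assumption, since the real code page depends on the message.

```python
from email.parser import HeaderParser

PREFIX = "__substg1.0_"

def parse_substg_name(name: str):
    """Split '__substg1.0_PPPPTTTT' into (property ID, property type)."""
    if not name.startswith(PREFIX) or len(name) != len(PREFIX) + 8:
        raise ValueError(f"not a property stream name: {name!r}")
    tag = name[len(PREFIX):]
    return int(tag[:4], 16), int(tag[4:], 16)

def message_id_from_stream(raw: bytes, prop_type: int):
    """Decode the header stream and return the Message-ID header, or None."""
    if prop_type == 0x001F:      # Unicode string, stored as UTF-16LE
        text = raw.decode("utf-16-le")
    elif prop_type == 0x001E:    # 8-bit string; Latin-1 is an assumption here
        text = raw.decode("latin-1")
    else:
        raise ValueError(f"unexpected property type: {prop_type:#06x}")
    # Drop any trailing NUL padding, then parse as RFC 822 style headers
    return HeaderParser().parsestr(text.rstrip("\x00"))["Message-ID"]

# Hypothetical stream content, standing in for bytes read from the MSG file.
raw_headers = (
    "Received: from mail.example.com\r\n"
    "Message-ID: <abc123@example.com>\r\n"
    "Subject: Test\r\n\r\n"
).encode("utf-16-le")

prop_id, prop_type = parse_substg_name("__substg1.0_007D001F")
message_id = message_id_from_stream(raw_headers, prop_type)
print(hex(prop_id), hex(prop_type), message_id)
```

Using a header parser instead of a plain string search means folded headers and differences in header-name case are handled by the library.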
 
