Danny Tuppeny said:
Hi Peter,
[snip]
Interesting response. What about performance though? If the user opens a
folder that has 1,000 messages, either I have to load them all *very*
quickly (I need to display Sender, Subject, Date, etc.), or I fetch them
as the user scrolls (which could be pretty unresponsive if the user is
dragging the scrollbar).
I suspect it will load much faster than you think.
Assuming in the index you store Sender, subject, date, message ID, offset in
main file, and a few other header items, I suspect you're looking at an
average of roughly 100-200 bytes per message. Let's say 200 bytes, but
that's probably on the high side. That works out to only 200K per thousand
messages or 5000 messages per megabyte. That will load into memory pretty
quickly.
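To make the arithmetic above concrete, here is a rough sketch in Python (not the .NET you'd actually be writing). The record layout and field widths are purely my assumption, picked so that one record comes out at the 200-byte figure; a real index would likely use different fields and sizes.

```python
import struct

# Hypothetical fixed-width index record (all widths are assumptions):
#   sender 64 bytes, subject 96 bytes, date as 8-byte Unix timestamp,
#   message ID 24 bytes, offset into the data file 8 bytes.
# "<" means little-endian with no padding, so the sizes simply add up.
RECORD = struct.Struct("<64s96sq24sq")


def index_size_bytes(message_count, record_size=RECORD.size):
    """Total index size for a folder holding message_count messages."""
    return message_count * record_size


def messages_per_megabyte(record_size=RECORD.size):
    """How many records fit in one (decimal) megabyte."""
    return 1_000_000 // record_size


if __name__ == "__main__":
    print(RECORD.size)              # bytes per message
    print(index_size_bytes(1000))   # bytes per thousand messages
    print(messages_per_megabyte())  # messages per megabyte
```

Fixed-width records also mean entry N sits at byte N * record size, so you can seek straight to it if you ever do want to read only part of the index.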
What would you store in the index file? The user will be able to change
the sort order in the display, so unless I maintain a few indexes, it'd be
difficult to get a list in order. The message list will show the Sender,
Date, Subject etc., and so if I have to scan through the data file for
thousands of these things, surely it'll take an age? I've never done this
kind of processing before, so I've no idea of how it would perform. I
don't want to build it and find it's unacceptable, so any experiences
anyone can share would be much appreciated!
Well, if they're going to be able to sort them, then it makes sense to load
it all into memory, assuming that's feasible. Given the figures above, that
should be doable on most modern computers, assuming you're just loading
messages from a single group at a time. Load the messages into memory and
then sort them. Leave them sorted in the files however you want. It won't
make much difference.
I don't expect it to be lightning fast, but I think it will be much faster
than you think. If you implement the IComparer interface, sorting should be a
piece of cake and the built-in sort algorithm is quick sort, I believe.
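As a rough illustration of "load it all, then sort on whatever column the user clicks", here is a Python sketch. The IndexEntry fields and the sort_entries helper are hypothetical names of mine; in .NET the key function would be replaced by an IComparer implementation per column.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class IndexEntry:
    # Hypothetical in-memory form of one index record.
    sender: str
    subject: str
    date: int    # Unix timestamp (assumption)
    offset: int  # position of the message in the data file


def sort_entries(entries, column, descending=False):
    """Return a new list sorted by the named column.

    The on-disk order is irrelevant: the whole index is already in
    memory, so each change of sort order is just another in-memory sort.
    """
    return sorted(entries, key=attrgetter(column), reverse=descending)


if __name__ == "__main__":
    entries = [
        IndexEntry("carol", "Re: index files", 300, 0),
        IndexEntry("alice", "Compression?", 100, 512),
        IndexEntry("bob", "Sorting", 200, 1024),
    ]
    for entry in sort_entries(entries, "date", descending=True):
        print(entry.date, entry.sender, entry.subject)
```

Because the sort works on the in-memory list, you don't need to maintain a separate index per sort order.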
As for compression - again, without testing it, I wouldn't know - but
although compression would save tons of disk space, wouldn't the overhead
of the compression make it slower than reading more uncompressed data? I
assume compression would be variable, so it'd be difficult to seek within
a compressed stream. Any ideas?
Compressing data is slow. Decompressing is generally quite fast. I suspect
it'll be faster to read due to the large amount of saved space, particularly
if data is located on a network drive.
Remember, 2 files: Index file and Data File. Leave the index file
uncompressed. Don't compress the entire data file, just compress the
individual messages. That way you have an offset to each compressed message
and just begin decompression at the beginning of the message. Again, look at
the message I posted earlier where I use a simple index file and store a
bunch of thumbnails in a single file. It easily loads 500 thumbnails (and
that includes JPEG decoding of the data) in a matter of maybe 2 seconds.
Without the JPEG decoding, it would be less than half a second, I'm sure.
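The two-file layout can be sketched roughly like this, in Python with zlib standing in for whatever compression library you end up using. The function names are hypothetical, and storing the compressed length next to the offset is my own assumption to keep reads simple; an in-memory BytesIO stands in for the data file.

```python
import io
import zlib


def append_message(data_file, text):
    """Compress one message on its own and append it to the data file.

    Returns (offset, length), which would be stored in the uncompressed
    index record for this message.
    """
    data_file.seek(0, io.SEEK_END)
    offset = data_file.tell()
    blob = zlib.compress(text.encode("utf-8"))
    data_file.write(blob)
    return offset, len(blob)


def read_message(data_file, offset, length):
    """Seek to one message and decompress only that message."""
    data_file.seek(offset)
    return zlib.decompress(data_file.read(length)).decode("utf-8")


if __name__ == "__main__":
    data_file = io.BytesIO()  # stand-in for the real data file
    messages = ["first message body", "second message body " * 20, "third"]
    index = [append_message(data_file, m) for m in messages]

    # Random access: fetch the second message without touching the others.
    offset, length = index[1]
    print(read_message(data_file, offset, length) == messages[1])
```

Since each message is compressed separately, seeking is never a problem: the index gives you the start of a compressed block, and you decompress from there.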