Measuring the ACTUAL overhead of NTFS compression

Andrew Mayo

I am astonished that, search as I might, all I can find about the
overhead of NTFS compression are informal comments where the authors
surmise, without a shred of evidence, that the overhead is
'substantial'.

Now, I remember that when DoubleSpace was first released for DOS 6,
Microsoft said at the time that the overhead of decompressing the
data was approximately 5% of the processing power of a 486/66.

I conclude from this that, unless NTFS compression is incredibly
inefficient, the overhead of decompressing on the fly must likewise
be vanishingly small. This implies that in most cases compression
would actually benefit, not hinder, performance, because hard drive
access times have not improved much since the days of the 486 but
processor speed certainly has!

I assume that NTFS compression must operate at the block level, since
random access to a compressed file would otherwise require that the
entire file be decompressed first, and given the kind of performance
I have experienced with it, I do not believe that is what happens.
There will be some write overhead, and I do understand that the
algorithm used is asymmetrically tilted in favour of decompression
speed - a wise decision.
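(You can actually observe this block-granular behaviour from user
code: a seek deep into a compressed file comes back immediately,
rather than after a delay proportional to the file's size. A quick
sketch - the path and offset are hypothetical, and my understanding
is that NTFS compresses in 64K 'compression units', so only the unit
covering the requested offset needs decompressing:)

    import time

    PATH = r"C:\data\bigfile.dat"  # hypothetical NTFS-compressed file

    # Read 4K from deep inside the file; if NTFS had to decompress
    # the whole file first, this would take time proportional to the
    # file's size rather than returning almost instantly.
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        f.seek(500 * 1024 * 1024)   # jump 500M in (file must be larger)
        chunk = f.read(4096)
    elapsed = time.perf_counter() - start
    print(f"read {len(chunk)} bytes in {elapsed * 1000:.1f} ms")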

I am astonished that no-one seems to have ever scientifically measured
the overhead of NTFS compression. Given that it tends to reduce the
size of a SQL Server database by a factor of 10, you would suppose
that the gain in effective disk bandwidth would considerably outweigh
the compression overhead. Indeed, on my laptop at home, I have NTFS
compression enabled on all folders and have never noticed any overhead
at all, even running SQL Server.
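(If anyone wants to verify that ratio on their own files: the Win32
GetCompressedFileSizeW call reports a file's physical size, which for
an NTFS-compressed file is the compressed on-disk size. A quick
sketch via Python's ctypes - the path is hypothetical:)

    import ctypes
    import os

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCompressedFileSizeW.restype = ctypes.c_ulong

    def on_disk_size(path):
        # GetCompressedFileSizeW returns the low 32 bits of the
        # physical size and writes the high 32 bits through a pointer.
        high = ctypes.c_ulong(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) | low

    path = r"C:\data\example.mdf"   # hypothetical file in a compressed folder
    logical = os.path.getsize(path)
    physical = on_disk_size(path)
    print(f"{logical} bytes logical, {physical} bytes on disk "
          f"({logical / physical:.1f}:1)")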

Does anyone know of any tests that have been made to actually measure
the overhead?
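
In the meantime, here is roughly how I would script such a measurement
myself - a sketch only, assuming Windows with Python and the stock
compact.exe tool; the paths are hypothetical, and you would need to
defeat the file cache (use a test file much larger than RAM, or reboot
between runs) for the timings to mean anything:

    import shutil
    import subprocess
    import time

    SRC = r"C:\bench\testdata.dat"    # hypothetical large test file
    PLAIN = r"C:\bench\plain.dat"
    PACKED = r"C:\bench\packed.dat"

    def timed_read(path, bufsize=1 << 20):
        # Sequentially read the whole file; return elapsed seconds.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(bufsize):
                pass
        return time.perf_counter() - start

    shutil.copyfile(SRC, PLAIN)
    shutil.copyfile(SRC, PACKED)
    # compact.exe /C enables NTFS compression on an existing file.
    subprocess.run(["compact", "/C", PACKED], check=True)

    print("uncompressed read:", timed_read(PLAIN), "s")
    print("compressed read:  ", timed_read(PACKED), "s")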
 
Gary Davis

I think the main reason you don't see anything on this is that disk
space is so cheap today. There is really no good reason to compress
files to save space.
 
Andrew Mayo

Gary Davis said:
I think the main reason you don't see anything on this is that disk
space is so cheap today. There is really no good reason to compress
files to save space.
Ah, but wait a minute! I agree entirely, but that wasn't quite the
point. With today's blindingly fast CPUs, there may be a number of
applications where compressing the data would actually IMPROVE
performance, possibly dramatically. Take SQL Server, for instance.

SQL Server databases compress around 10:1, based on experience with
NTFS file compression. Now suppose I want to do a table scan.
Uncompressed, say the table occupies 10M; compressed, it occupies
only 1M.

Suppose I have a disk transfer speed of 10Mbytes/sec with a latency
of, say, 5ms. Assume the disk is already 'on track', so ignoring seek
time and rotational latency, which we have no control over, once the
data starts streaming we get 10M per second. Reading the uncompressed
table therefore takes 1 second. Compressed, we only need to read 1M,
which takes 0.1 second. If we can decompress that 1M back into the
original 10M in less than the remaining 0.9 second, we're ahead on
performance.
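
Putting numbers on that break-even point, with decompression
throughput as the unknown:

    disk_mb_per_s = 10    # sequential transfer rate from the example
    table_mb = 10         # uncompressed table size
    ratio = 10            # assumed 10:1 compression

    plain_read_s = table_mb / disk_mb_per_s             # 1.0 s
    packed_read_s = (table_mb / ratio) / disk_mb_per_s  # 0.1 s
    budget_s = plain_read_s - packed_read_s             # 0.9 s to spend

    # The decompressor must emit the full 10M within the 0.9 s budget:
    breakeven = table_mb / budget_s
    print(f"decompression must sustain > {breakeven:.1f} MB/s output to win")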


Therefore I wonder if the 'folk wisdom' that compression is going to
degrade system performance is actually wrong.

Back when disk was expensive and CPUs were expensive, and slow, I
recall that file compression was the norm rather than the exception.
Usually a simple, low-cost (i.e. cheap to compress and decompress)
run-length encoding algorithm was used.
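(For illustration only - a toy run-length encoder in Python, just to
show how little work per byte such a scheme does; this is not any
particular product's actual algorithm:)

    def rle_encode(data: bytes) -> bytes:
        # Each run of identical bytes becomes a (count, byte) pair,
        # with the count capped at 255 so it fits in one byte.
        out = bytearray()
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and j - i < 255 and data[j] == data[i]:
                j += 1
            out += bytes((j - i, data[i]))
            i = j
        return bytes(out)

    def rle_decode(data: bytes) -> bytes:
        # Decoding is even cheaper: expand each (count, byte) pair.
        out = bytearray()
        for k in range(0, len(data), 2):
            out += bytes([data[k + 1]]) * data[k]
        return bytes(out)

    assert rle_decode(rle_encode(b"aaaabbbcc")) == b"aaaabbbcc"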

Disk drive speed hasn't really improved significantly in the last
decade, albeit capacity has increased dramatically, but CPU
performance has probably increased by thousands of percent (486/66 ->
3GHz Pentium).
 
