Database fragmentation and performance

Tom Ellison

I have 6 very large and horribly fragmented database files (MDB and MDF).
They range from 1.59 GB to 2.81 GB in size (SQL Express on the larger ones).
There are up to 19 thousand fragments in them. I've just learned that, even
though the HD is only 79% full, I cannot defrag these. I would have to
somehow get 2.81 GB of CONTIGUOUS free space to do the largest. Looking at
the map in defrag, I doubt there is a single contiguous area of free space
larger than a half GB.

Before going to the expense and effort of buying another HD, installing it,
and copying these files, I want to ask your opinion and experience with the
performance issues involved. Is such a horribly fragmented MDB as this,
scattered back and forth across a 40 GB drive, likely to see much
performance improvement if defragged?

I ran into a client who wanted performance. I reduced the storage size of
most of the data columns, shrinking the database by 70% or more, then added
an index. I got about a 10:1 performance improvement in queries. Now I want
to ask the client to defrag what is a very large database (he's already
reused most of the 70% savings by adding more records). I had hoped to
defrag and test here, but that's being held up because I can't defrag.

Do you see much performance improvement with a defrag on a database that is
in such bad condition? Any idea how much? The index still leaves a table
scan of perhaps a million rows (rather than the total database of 10 million
rows, hence the 10:1 improvement).
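
In rough terms, that index is nothing fancier than a single CREATE INDEX on
the search column. A sketch, with made-up table, column, and file names,
shown via Python/pyodbc purely as shorthand:

    import pyodbc  # assumes the Access ODBC driver is installed

    # Made-up path, table, and column names; the real ones differ.
    conn = pyodbc.connect(
        r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
        r"Dbq=C:\data\BigImport.mdb;"
    )
    cur = conn.cursor()

    # A plain, non-unique index on the numeric column the queries filter on.
    cur.execute("CREATE INDEX idx_SearchValue ON tblImport (SearchValue)")
    conn.commit()
    conn.close()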

I know, this is a crazy thing to do with Jet. I'm trying to get it switched
to SQL Express, but if I can illustrate an even greater improvement still
using Jet, my credibility goes up. That will then tend to help move things
toward the ultimate solution.

Tom Ellison
 
Guest

It would certainly seem that the less a hard drive has to work, the faster
the database; however, it's probably not going to make a big improvement. I'm
assuming that you have already done a compact and repair.

Is there a DVD burner on that computer? If so, move the databases to a DVD;
then defrag the hard drive; finally move the databases back. You could do the
same thing with an external hard drive. There's no guarantee that Windows
will put it all in the same area of the hard drive, plus it might not take
long for things to fragment again.

Which brings up another subject - how do they back up the database now?
Hopefully backups are being done very frequently.
 
Arvin Meyer [MVP]

Hi Tom,

My experience with MDBs is that you will get a 5 to 10% performance increase
after the database is compacted. Compaction is not the same as disk
defragmentation, however, and I suspect that defragmenting will give you a
similar improvement. With 19,000 fragments, you cannot help but improve the
system considerably. Indexing, as you already know, will give substantial
improvements in performance.

You might suggest burning everything to a DVD if the client has one. I've
been buying 80GB USB drives for use in backups for as little as $100 (and even
$50 when there's a big sale).
--
Arvin Meyer, MCP, MVP
Microsoft Access
Free Access downloads
http://www.datastrat.com
http://www.mvps.org/access
 
Pat Hartman\(MVP\)

How do you determine how many fragments a particular file is in?

Won't the HD defrag incrementally? You might have to do it a couple of
times to get enough free space, but won't that eventually happen?
 
Tom Ellison

Hi, Pat,

Sorry we haven't yet met. I'll try to remedy that. Hope to see you at the
next Summit.

When I attempt the defrag on WinXP Pro, and it doesn't defrag some of the
files, it gives statistics on them, including the number of fragments.
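
If you want a per-file count without waiting on the full analysis report,
Sysinternals' Contig has an analyze-only mode that reports how many fragments
each file is in. A small sketch, assuming contig.exe is on the PATH and using
placeholder paths:

    import subprocess

    # Placeholder paths; assumes Sysinternals contig.exe is on the PATH.
    files = [r"C:\data\Big1.mdb", r"C:\data\Big2.mdf"]

    for f in files:
        # "contig -a <file>" analyzes fragmentation only; it moves nothing.
        result = subprocess.run(["contig", "-a", f],
                                capture_output=True, text=True)
        print(result.stdout)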

Tom Ellison
 
Tom Ellison

Hey, Arvin,

It's always a fine thing to have contact with you!

There's no DVD drive on this system. The one on my other computer is
temporarily unavailable as the HD there crashed completely. Have to rebuild
a system AGAIN! Sigh!

I think I'll compress the directories where these files are. Maybe then
I'll be able to defrag one. If I get one, then maybe I'll be able to defrag
another, and another.

I had no idea the ability of Windows to defrag was so poor. Theoretically,
a single file filling only 0.03% of the hard drive, and being the only file
on the hard drive, could be spread out across the drive and you'd be unable to
defrag! The rule is, there must be a single contiguous free space large
enough to hold the file or it won't defrag. A single 10 megabyte file on my
40 gigabyte drive could be too much for the defrag utility, if it were
spread out in uniformly spaced chunks that are less than 10 megabytes apart.
What a mess. I wrote my own defrag for unix systems about 20 years ago and
it worked better than this one does! Really nuts!

Tom Ellison
 
Tom Ellison

Thanks, Jerry:

No DVD and the CD burner is too small for these files, unless maybe they
compress well. I may yet try that.

"They" is ME. This is my development copy. I have backups of the
programming, and I have T2 bandwidth access to copies of the client's
backups that are kept on their HDs.

I suspect one client may have a fragmented HD and could really get some
benefit. Unfortunately, defrag seems to be bunk.

And, yah, I hope they're backing up their stuff, too. However, this is an
unusual database, made up of hundreds of imports from Excel. There is no
Access interface to insert, delete, or update anything. Just to search it.
Their search was originally about 20 hours daily. I got that down to 2-1/2
hours, and hope defrag will improve it even more. I'm also thinking having
it in a compressed folder would be a good thing for the table scan that
still occurs.

Just talking with all of you really helps me think about things I would have
missed otherwise.

Tom Ellison
 
Tom Ellison

Arvin,

Another thing, if you have a minute. Do you or any of the others have any
experience with the speed of table scans in a huge Jet database if the file
is both defragged and compressed? It would seem to me compression would be
a good thing, too.

The searches being done in this database are on numeric columns, and there's
no hint of uniqueness in any of them. The database really is a
spreadsheet, and there's no apology for that. It's supposed to be a huge
spreadsheet, imported daily from Excel (and, yes, that's got to make the
fragmentation all the worse.) I was just thinking of using compression to
get the file size down so I might have a shot at defragging it, and realized
that this would be a good thing for a table scan, too. I have eliminated
80-90% of the table scan by indexing the most beneficial column of data, but
there's still a lot of table scan needed. Compression sounds like a very
good thing then, doesn't it?

I guess we'll see!

Tom Ellison
 
david epsom dot com dot au

Compression is a bad idea.

The file has to be uncompressed on the fly every time you
access it.

Also, compression takes place in the background, and is
a pain in the butt, because you can't move or delete
files while the compression is still going on, which
may take all night as a background task.

Also, compression tends to make fragmentation even worse,
and defragmenting even more difficult.

Compact the MDB files, then zip them into zipped folders
(not into compressed folders), then delete the unzipped file.
If you have room, do the same to the SQL Databases. If not,
just zip and delete them. Do the zip/delete one file at
a time, so that the fragmentation does not just get worse.
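
In script form, that one-file-at-a-time zip-and-delete sequence looks roughly
like this (placeholder paths, with Python's zipfile standing in for whatever
zip tool you use):

    import os
    import zipfile

    # Placeholder paths. Do these strictly one at a time, so each file's
    # space is freed before the next archive is written.
    files = [r"C:\data\Big1.mdb", r"C:\data\Big2.mdb"]

    for f in files:
        with zipfile.ZipFile(f + ".zip", "w", zipfile.ZIP_DEFLATED) as z:
            z.write(f, arcname=os.path.basename(f))
        os.remove(f)  # delete the unzipped original before moving on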

A good third party defragmenter will defragment your
directory structure, which will make far more difference
than defragmenting your files ever will.

For these big files, you will get better speed if you
reformat with big clusters (but you won't be able to
defrag at all, so you shouldn't do this where you are
updating the data as the client is).
In fact, you will get better speed if you format your
40GB drive as FAT32, with very large clusters.

For the MDB files, connect using Exclusive. Also, try
connecting Read-only: I can't remember if that made
any difference.

And, consider re-indexing between searches. For some
kinds of searches, it is worth adding an index before
the search.
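
A sketch of that add-an-index-just-for-the-search idea, with made-up table,
column, and path names (Python/pyodbc used only as shorthand):

    import pyodbc

    # Made-up names throughout; the point is only the create/query/drop order.
    conn = pyodbc.connect(
        r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};"
        r"Dbq=C:\data\Big1.mdb;"
    )
    cur = conn.cursor()

    cur.execute("CREATE INDEX idx_Temp ON tblImport (OtherValue)")
    count = cur.execute(
        "SELECT COUNT(*) FROM tblImport WHERE OtherValue BETWEEN ? AND ?",
        100, 200,
    ).fetchone()[0]
    cur.execute("DROP INDEX idx_Temp ON tblImport")  # Jet SQL drop syntax
    conn.commit()
    conn.close()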

(david)
 
Graham Mandeno

Hi Tom

There are many files that WinXP defrag simply will not touch.

However, I can strongly recommend Raxco's PerfectDisk
(http://www.raxco.com/). It does a very good job, even defragging system
files, MFT and pagefiles.
 
Tom Ellison

Dear David:

Thanks for your thoughts.

Yes, the file must be decompressed every time you read from it, and when you
write to the file, it must be compressed. This is a "processor" function.

However, the total amount of reading and writing the hard drive has to do,
and therefore the access time, would be decreased.

In my experience, the processor can compress and decompress in virtually no
time at all, measured in microseconds, while the time to access a hard drive
is measured in milliseconds. The processor can do this 1000 times as fast
as the hard drive. For that reason, I expect there to be a significant
improvement.

Now, this improvement would depend somewhat on the type of access to the
data that is being performed.

In my case, the access would be done using a Clustered Index, meaning the
data to be scanned will be a contiguous portion of data determined through an
initial access using the criterion associated with that index. If the data
is compressed 5:1 then the speed of this search would likely be about 4:1
faster.
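
A rough way to put a number on that trade-off is to time a full sequential
read of the raw file against reading and decompressing a compressed copy.
The sketch below uses gzip as a stand-in for NTFS compression, so it only
illustrates the less-I/O-versus-more-CPU principle, not the real thing; the
path is a placeholder:

    import gzip
    import shutil
    import time

    raw = r"C:\data\Big1.mdb"   # placeholder path
    packed = raw + ".gz"

    # Make a compressed copy once.
    with open(raw, "rb") as src, gzip.open(packed, "wb") as dst:
        shutil.copyfileobj(src, dst)

    def timed_read(opener, path):
        start = time.perf_counter()
        with opener(path, "rb") as f:
            while f.read(1024 * 1024):   # full scan in 1 MB chunks
                pass
        return time.perf_counter() - start

    # Reboot or read other large files between runs so the OS file cache
    # doesn't hide the disk time.
    print("raw read:          %.1f s" % timed_read(open, raw))
    print("gzip + decompress: %.1f s" % timed_read(gzip.open, packed))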

I have now increased the free space on the drive to 41%, and still it will
not defrag a half dozen files. The poor quality of the defrag software is
being illustrated.

Is the "third party" software to which you refer the Symantec/Norton
product? I've used that with great success, but I don't have a version of
it I can use on XP.

Your suggestions on alternatives for optimal performance are interesting.
It seems very likely that different formatting of a separate drive just for
database files, using large clusters, would be a positive approach. It
makes me wish I had seen some information about experiments on this. If
there aren't any, I'd like to publish that myself.

Tom Ellison
 
Tom Ellison

Dear Graham:

That sounds great! Thanks for the info!

Tom Ellison


 
Guest

If I had any real knowledge* I would have said so :~)

When I move a file from or to a compressed folder, it
takes much longer. When moving from one, I can see the
difference immediately. When moving to one, the file
appears to move, but the source file can't be deleted
until much later. These are large (15 GB) heavily
compressed (3 GB) Windows backup files.

(david)
*http://home.znet.com/schester/tchester/index.html
 
Tom Ellison

Dear David:

Until I have seen some extensive testing of this, I would not say that any
of us has "any real knowledge."

What I am suggesting is based on theory.

Now, when you move a sizable file from compressed to non-compressed, or the
other way around, it has to copy the whole file as well as perform the
compress or de-compress. Likely, it doesn't take much time to compress or
decompress. The time taken would be the time to read/write. That's my
theory.

A truer test would be to copy a compressed file to another compressed drive
and compare that to copying the same file, but uncompressed, to another
uncompressed drive. Wouldn't it be much faster for the compressed version?
So, this is the basis for my theory about performing a query on a
compressed database. There would be far less reading from the hard drive if
the file were compressed, and that's the thing that takes so much time,
waiting for rotational latency and for track-to-track access (and at that,
I'm assuming the file is defragged, or it all gets much worse!)

I'm more and more determined to make a study of this. I'll report back.

Files of the size you mention are exactly what I want to be testing, but as
open databases.

Tom Ellison
 
Arvin Meyer [MVP]

Compaction improves the performance of MDBs on the data side, and slows the
first use of every query on the front end (while it recompiles). I reserve
compaction (I'm assuming that you mean disk compaction) for files I archive.
The performance hit is significant enough that I've never even considered
trying it on database files. Defragging a disk improves the performance of
everything. I can't give you specifics on any of these, since my experience
is anecdotal, not tested in a controlled environment.
--
Arvin Meyer, MCP, MVP
Microsoft Access
Free Access downloads
http://www.datastrat.com
http://www.mvps.org/access
 
Tom Ellison

Dear Arvin:

The term I should use is "Compression", which is what it is called in
Windows Explorer, as in "a compressed folder."

In your response, you said it "slows the first use of every query on the
front end." Are you saying that compressing the back end slows this, or
that compressing the front end does? Your context seems to be saying you do
not compress (compact) the back end except to archive it. But then it seems
you are saying you've never even tried it on "database files" (back end?).
Sorry, but I really value your opinion, and would very much like to
understand just what you've said.

In my case, the client was spending 20 hours a day performing table scans.
A single, simple, carefully chosen index on a non-unique numeric column sped
up the process six-fold, to perhaps 3 hours. Now, most of this is still table
scan. Further testing indicates it could be done in 1-1/2 hours (running
400-500 queries) by compressing the database. On this scale, the "compile
time" is much less than the table scan time, which is, of course, not the
normal mix of these factors. I got a 5:3 increase in performance with
compression.

Defragging is another issue. I don't think anyone is going to advise
against this.

Now, I'm remembering a situation in which I had considerable control (I
wrote the database engine). Here, I would pre-allocate tables and indexes
to what their eventual size might become, so the physical file sizes never
changed, and no fragmentation was possible (except for deleted rows within
the tables).

Now, I'm considering testing a setup with a separate hard drive for database
files, with optimal "large file size" parameters (large segments). Now, if
we have only one file per partition on this drive, fragmentation should
disappear, right? For long term maintenance, this would simplify things, as
long as the file never outgrows the partition. Given that the database
cannot grow beyond 2 GB anyway (for MDB) this isn't any big problem.

I'm also thinking how it might be useful to have an option that a database
NOT decrease in size during Compact and Repair. If the database could be
preallocated in size, and defragmented at that size, it would be useful for
it to just continue to occupy all that space on the drive, expanding within
itself as needed. Make sense?
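
Stripped of anything Jet-specific (since Jet has no such option today), the
pre-allocation idea is just to grow the file to its expected ceiling in one
step, so later growth reuses space that was set aside up front. A sketch on
an ordinary file, with a made-up path:

    import os

    path = r"C:\data\Preallocated.dat"   # placeholder; assumes the file exists
    target_size = 2 * 1024 ** 3          # 2 GB, the Jet file-size ceiling

    if os.path.getsize(path) < target_size:
        with open(path, "r+b") as f:
            # Extend to full size in one step. Whether the extension lands
            # contiguously still depends on the free space available then.
            f.truncate(target_size)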

Thanks for considering my ideas, and for your thoughtful responses.

Tom Ellison
 
david epsom dot com dot au

Tom Ellison said:
A truer test would be to copy a compressed file to another compressed
drive and compare that to copying the same file, but uncompressed,

Doesn't work that way in Windows. Whenever you
move a file, it uncompresses. Obviously, not very
efficient, but it probably makes some other things
much simpler.

(david)
 
