D
David W. Fenton
The cutover for my SQL Server upsizing project is this weekend.
Things have gone well in terms of the app design, but one thing was
driven home to me by this upsizing project that I think it's good to
be reminded of:
Clean data is important not just to accuracy, but because clean data
makes your database function better.
For example:
I had a table with over 500K records that had a foreign key from a
5K table. There was no relationship enforced, and so many of the
records had no value there. But I needed to display the data from
both tables. That meant that I needed an outer join.
The cost of this in SQL Server was HUGE.
So, after speaking with the client, it was determined that there
really was no reason except for sloppy data entry that this field
should ever be Null. Thus, I was able to populate the field, make it
required (i.e., NULL not allowed) and change the join in views to
INNER JOIN. The performance improvement was drastic -- from over a
minute to pull up a filtered result on the view that was doing the
joining to instantaneous results (i.e., much less than a second).
I started looking around for other outer joins that could be
eliminated by cleaning up the data, and found several places where
bad data entry and failure to enforce relationships had led to null
values in foreign keys. After cleaning this up, a lot of performance
improvements resulted.
The main point:
Avoid Null foreign keys whenever you can. In Jet, that means making
your foreign keys required.
Obviously, the N:0 relationship requires allowing a Null FK, but if
you don't need that, it's best to not allow it. In the database I've
been working on, there were plenty of places where the only reason
the Null was allowed was to accomodate legacy data that didn't have
any values in the FK fields. Fortunately, there were obvious choices
for the parent records in all cases, so populating the FK fields was
very desirable.
So, performance can be a matter of schema design, and avoiding outer
joins is A GOOD THING WHENEVER POSSIBLE.
Things have gone well in terms of the app design, but one thing was
driven home to me by this upsizing project that I think it's good to
be reminded of:
Clean data is important not just to accuracy, but because clean data
makes your database function better.
For example:
I had a table with over 500K records that had a foreign key from a
5K table. There was no relationship enforced, and so many of the
records had no value there. But I needed to display the data from
both tables. That meant that I needed an outer join.
The cost of this in SQL Server was HUGE.
So, after speaking with the client, it was determined that there
really was no reason except for sloppy data entry that this field
should ever be Null. Thus, I was able to populate the field, make it
required (i.e., NULL not allowed) and change the join in views to
INNER JOIN. The performance improvement was drastic -- from over a
minute to pull up a filtered result on the view that was doing the
joining to instantaneous results (i.e., much less than a second).
I started looking around for other outer joins that could be
eliminated by cleaning up the data, and found several places where
bad data entry and failure to enforce relationships had led to null
values in foreign keys. After cleaning this up, a lot of performance
improvements resulted.
The main point:
Avoid Null foreign keys whenever you can. In Jet, that means making
your foreign keys required.
Obviously, the N:0 relationship requires allowing a Null FK, but if
you don't need that, it's best to not allow it. In the database I've
been working on, there were plenty of places where the only reason
the Null was allowed was to accomodate legacy data that didn't have
any values in the FK fields. Fortunately, there were obvious choices
for the parent records in all cases, so populating the FK fields was
very desirable.
So, performance can be a matter of schema design, and avoiding outer
joins is A GOOD THING WHENEVER POSSIBLE.