Historical Data

D

Darrell

I have a recurring question which I can't seem to locate any discussion
of. It relates to keeping historical data in a relational database. This
may seem basic, but I've never come across an answer which satisfies my
curiosity.
Obviously, when you are joining from one table to related table through
a key field, data accessed via this key is what is currently in the
fields of the related table. Whenever data in the related table changes,
these changes will be reflected in any joins. This, is, in fact, part of
the whole point of a RDBMS. It backfires, however, whenever you want to
collect and preserve HISTORICAL information.
The classic example is when a vendor address changes. Purchases are made
from a vendor (Vendor A), and all that is recorded for the purchases is
the key for Vendor A. Then Vendor A's address changes. All purchases
ever made from Vendor A will now show Vendor A's current current -
clearly not true.
I have two possible solutions to this problem, the first of which I have
used. The first one is to duplicate all the fields of Vendor table in
the Purchases table and write the values from Vendors into Purchases
whenever Vendor A is selected. This seems a horrible redundancy of data.
The second one is to create multiple records for the Vendors table with
date ranges as part of the key which must be referenced in joins to that
table. This is not quite so redundant but still seems clunky, involving
the complications of including the date range in ALL joins.
Are there any other options????? This presents itself in virtually every
database I've ever designed, and I would certainly like some more
enlightened approaches.

Thanks, as always, for all suggestions,

Darrell
 
B

Bill Mosca, MS Access MVP

Historical data really is a bear, but I don't see address history as
important. Maybe it's just my lack of forethought, but wouldn't you need the
current address only? Who cares where a vendor *used to live*?

Discounts, on the other hand, or things like pay rates for employees *do*
need to be tracked. That's where effective dates come into play.
 
D

Darrell

Bill said:
Historical data really is a bear, but I don't see address history as
important. Maybe it's just my lack of forethought, but wouldn't you need the
current address only? Who cares where a vendor *used to live*?

Discounts, on the other hand, or things like pay rates for employees *do*
need to be tracked. That's where effective dates come into play.
Bill, thanks for the feedback. It certainly helps my self-confidence
level if nothing else. I guess I can check with the users, but I've
imagined a vendor claiming they never received a check and us having no
way to prove what address we had on file when the check was sent. Maybe
I'm just paranoid, but those thoughts go through my mind.

Darrell
 
B

Bill Mosca, MS Access MVP

I worked fo ra bank for awhile. The proof is in the cancelled check. Ask any
banker.
 
D

Darrell

Bill said:
I worked fo ra bank for awhile. The proof is in the cancelled check. Ask any
banker.
Okay. I see I've gotten myself into a discussion about the relative
importance of particular pieces of data specific to the illustration I
chose. Thanks for the feedback. Seriously. I will remember this for
future reference. However, this is a bit beside the original point, and
I think I've gotten the main thrust: don't worry about historical data
unless it's deemed to be crucial data.

Given that, any feedback on which of my 2 shared approaches is regarded
as best practice? Your first post implied the effective dates would be
the best solutions for discounts, etc.

Thanks again.
 
A

Allen Browne

Darrell, I'm interested to see what answers you get to this question as
well.

Both the approaches you suggest make good sense:

a) The "duplicate-the-address-into-the-order" approach makes good sense for
customer orders, since a customer might order something that is not really
his address at all, such as a hotel room or a gift to a friend.

b) The "related-addresses-limited-to-a-date-range" is very flexible,
allowing for multiple current and past addresses of different types and with
different priorities.

With (b), it's not too difficult to use a subquery to get the address in the
range, but a better approach might be to include a foreign key in your
purchase order, so you know for sure which of the address fields was
actually used for that p.o.:

- Advantage: It's the best solution if you allow multiple current addresses.

- Disadvantage: Strictly, it's not normalized, because the address belongs
to a vendor, so the VendorID is redundant. For purchase orders, you would
probably drop the VendorID field. For a customer sales table you probably
can't do that, because you can't make the address field required, so you
might rationalize it on the basis that one customer could order stuff to be
sent to someone else's address.

Good question. Hope you get more discussion on this.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top