B
Brian Selzer
Sylvain Lafontaine said:To that, I would add that the increased simplicity of using a surrogate
(or artificial or autonumber) key as the primary key in place of a
composite key is only half their advantage.
The biggest problem that I have with composite keys is that they share the
same fundamental problem as natural keys: using them as the primary key is
allowing the fact that a primary key can change its value over time.
IMHO, a primary key should never be allowed to change its value once it
has been created; a assumption which will forbid the use of a composite
key in many cases. (Of course, if you don't mind to see a primary key
changing its value after its creation then you are not concerned by this
argument.).
This argument has an inherent fallacy in it. Just because a key is composed
from multiple columns doesn't necessarily mean that its values can be
different in different database states. For example, in an Inventory table
that has the key, {ItemKey, WarehouseKey}, with references to an Item table
and a Warehouse table respectively, the combination values that comprise
each key value can never change from database state to database state. A
particular combination of values identifies a particular individual in the
Universe of Discourse in /every/ database state in which it appears. It can
/never/ identify any other individual. Therefore, it should be obvious that
adding an additional autonumber primary key in this instance would be
superfluous, since each {ItemKey, WarehouseKey} combination already rigidly
designates a distinct individual in the Universe of Discourse.
The same can be said for many natural keys. For example, suppose you have a
table, Queue, that has an integer key, {Position}. Each value for Position
rigidly designates a distinct individual in the Universe of Discourse (3
always means "third in line" in any database state in which there are 3 or
more elements), so therefore there is no need for an additional autonumber
primary key.
This is not only a theoritical argument as many interfaces - like Access -
won't like to see a primary key that could change it value. But even if
you take out such interfaces out of the equation, the use of a surrogate
key for all tables reveals itself to be advantageous in many database
problems. For example, if you want to add a log of all changes to a
table, it's much more easier to design it if the table use a surrogate key
for its primary key than a natural key or a composite key.
Personally, I stopped using natural keys and composite keys many years ago
and probably that something like half of my problems with the design of
databases have vanished with them. On these occasions when I was called
to work on a problematic database, chances was much higher to see that the
problems were associated with the use of natural keys and/or composite
keys than with the use of a surrogate keys and the solutions were usually
much more complicated to solve in the first case than in the second case.
Also, I've remember some peoples who have done like me and have stopped
using natural and composite keys in favor of the exclusive use of
surrogate keys but I don't remember anyone doing the opposite; ie. going
from the use of surrogate keys to the use of natural and composite keys.
--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: sylvain aei ca (fill the blanks, no spam please)
Whenever I have multiple key fields, natural or not, I create an
AutoNumber PK for pragmatic reasons. The main reason is that it makes
it easier to create the joins. The theorists are champions at joining
tables and don't have to be concerned with the complexity of the SQL
they write.
Word to the wise: 'theorists' hate SQL.
Queries involving many to many relationships often add
additional tables later and highlight the need to keep joins as simple
as possible.
I think I'm with Bob Badour (gulp!) on the issue of complexity,
though: if you think more columns in the ON clause makes a SQL join
more 'complex' then I think you could be looking at things wrong.
Having more characters to type increases the risk of typos? More
columns mean you may omit one in error? The SQL engine may be twice as
slow in handling two columns rather than one? Is it more 'complex' to
split a post address into 'subatomic' columns (address lines from
postal code/zip etc)?
Surely the issue you allude to (I think) is the one that Access
Relationships (as distinct from Jet foreign keys) were invented to
solve? i.e. you pre-define the join columns and 'join type' (inner
join, left outer join or right outer join) and the join clause gets
written as SQL for you when you drop the tables into the Query Builder
thing. I would have thought the 'theorists' would love the fact that
you also create foreign keys in the same Relationships dialog i.e. you
end up with a natural join (not having to explicitly specify the
columns yourself) because one table references the other.
[I tend to be dismissive of tools that write SQL code for me but I
think I should perhaps review my stance e.g. I still write all my SQL
Server procs by hand whereas I have tasked myself to investigate CRUD
generators. But, for the time being, ...] As a SQL coder myself, I
find it more annoying that I have to create multiple joins to get the
'natural key' values, having to discover what the 'artificial key'
columns are in the first place.
Lately, I've increased the amount of normalization in one
of my databases and the joins got even more complicated, adding about a
line or so in the SQL view in Access for every new query using those
tables.
Bad luck: I think you might have got way with "reduced the amount of
denormalization" ;-) In this thread I've already broken my personal
rule (!!) about not mentioning normalization [formulated because the
average 'replier' around here thinks "fully normalized" is BCNF, which
they think is 3NF anyhow, and doesn't pay much attention to anomalies
that normalization doesn't address, unless the 'asker' mentions
storing calculations...]
I keep Jamie's advice in the back of my mind, about how enforcing
constraints at the table level is better than enforcing them through
code
..and best to do it in both places! Bear in mind that it's a rule of
thumb i.e. "strict rules modified in practise." Checking something in
the in front end allows you to give timely user feedback and could
save them some keying, not to mention a database roundtrip. Checking
in the database catches anything neglected in the front end by
omission of validation or introduction of bugs. In practice, some
things are better done in one place but not the other: contrast the
validation of the basic pattern of an email address with the
verification that an addressable entity can be contacted at that email
address; I don't think it would be sensible to put the latter test
into a table constraint, even if it were possible.
Jamie.
--
What part of simpler don't you understand . Only one expression in
the ON is simpler. Needing less indexes is simpler. Not having to
look for your multi-key fields is easier, although your point that
Relationships can handle that is valid. If the AutoNumber key has a
one-to-one relationship with the multi-key fields then it's fine to
use it. There's no down side that I can see. I also like to rely on
coding to detect inconsistent data rather than on error trapping, so I
have to check the multi-key values anyway before adding a new record.
I think that your idea about enforcing constraints at both the table
level and in code is an excellent idea. The OP wanted to know what
people did and why. I still don't see any reason put forward for me
to change to a multi-field key. Are totals queries easier when multi-
field keys are used? BTW, "reduced the amount of denormalization"
works just as well. Real databases experience denormalizing
influences.
James A. Fortune
(e-mail address removed)