Separate PK in Jxn Tbl?

B

Brian Selzer

Sylvain Lafontaine said:
To that, I would add that the increased simplicity of using a surrogate
(or artificial or autonumber) key as the primary key in place of a
composite key is only half their advantage.

The biggest problem that I have with composite keys is that they share the
same fundamental problem as natural keys: using them as the primary key is
allowing the fact that a primary key can change its value over time.
IMHO, a primary key should never be allowed to change its value once it
has been created; a assumption which will forbid the use of a composite
key in many cases. (Of course, if you don't mind to see a primary key
changing its value after its creation then you are not concerned by this
argument.).

This argument has an inherent fallacy in it. Just because a key is composed
from multiple columns doesn't necessarily mean that its values can be
different in different database states. For example, in an Inventory table
that has the key, {ItemKey, WarehouseKey}, with references to an Item table
and a Warehouse table respectively, the combination values that comprise
each key value can never change from database state to database state. A
particular combination of values identifies a particular individual in the
Universe of Discourse in /every/ database state in which it appears. It can
/never/ identify any other individual. Therefore, it should be obvious that
adding an additional autonumber primary key in this instance would be
superfluous, since each {ItemKey, WarehouseKey} combination already rigidly
designates a distinct individual in the Universe of Discourse.

The same can be said for many natural keys. For example, suppose you have a
table, Queue, that has an integer key, {Position}. Each value for Position
rigidly designates a distinct individual in the Universe of Discourse (3
always means "third in line" in any database state in which there are 3 or
more elements), so therefore there is no need for an additional autonumber
primary key.
This is not only a theoritical argument as many interfaces - like Access -
won't like to see a primary key that could change it value. But even if
you take out such interfaces out of the equation, the use of a surrogate
key for all tables reveals itself to be advantageous in many database
problems. For example, if you want to add a log of all changes to a
table, it's much more easier to design it if the table use a surrogate key
for its primary key than a natural key or a composite key.

Personally, I stopped using natural keys and composite keys many years ago
and probably that something like half of my problems with the design of
databases have vanished with them. On these occasions when I was called
to work on a problematic database, chances was much higher to see that the
problems were associated with the use of natural keys and/or composite
keys than with the use of a surrogate keys and the solutions were usually
much more complicated to solve in the first case than in the second case.

Also, I've remember some peoples who have done like me and have stopped
using natural and composite keys in favor of the exclusive use of
surrogate keys but I don't remember anyone doing the opposite; ie. going
from the use of surrogate keys to the use of natural and composite keys.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: sylvain aei ca (fill the blanks, no spam please)


Whenever I have multiple key fields, natural or not, I create an
AutoNumber PK for pragmatic reasons. The main reason is that it makes
it easier to create the joins. The theorists are champions at joining
tables and don't have to be concerned with the complexity of the SQL
they write.

Word to the wise: 'theorists' hate SQL.
Queries involving many to many relationships often add
additional tables later and highlight the need to keep joins as simple
as possible.

I think I'm with Bob Badour (gulp!) on the issue of complexity,
though: if you think more columns in the ON clause makes a SQL join
more 'complex' then I think you could be looking at things wrong.
Having more characters to type increases the risk of typos? More
columns mean you may omit one in error? The SQL engine may be twice as
slow in handling two columns rather than one? Is it more 'complex' to
split a post address into 'subatomic' columns (address lines from
postal code/zip etc)?

Surely the issue you allude to (I think) is the one that Access
Relationships (as distinct from Jet foreign keys) were invented to
solve? i.e. you pre-define the join columns and 'join type' (inner
join, left outer join or right outer join) and the join clause gets
written as SQL for you when you drop the tables into the Query Builder
thing. I would have thought the 'theorists' would love the fact that
you also create foreign keys in the same Relationships dialog i.e. you
end up with a natural join (not having to explicitly specify the
columns yourself) because one table references the other.

[I tend to be dismissive of tools that write SQL code for me but I
think I should perhaps review my stance e.g. I still write all my SQL
Server procs by hand whereas I have tasked myself to investigate CRUD
generators. But, for the time being, ...] As a SQL coder myself, I
find it more annoying that I have to create multiple joins to get the
'natural key' values, having to discover what the 'artificial key'
columns are in the first place.
Lately, I've increased the amount of normalization in one
of my databases and the joins got even more complicated, adding about a
line or so in the SQL view in Access for every new query using those
tables.

Bad luck: I think you might have got way with "reduced the amount of
denormalization" ;-) In this thread I've already broken my personal
rule (!!) about not mentioning normalization [formulated because the
average 'replier' around here thinks "fully normalized" is BCNF, which
they think is 3NF anyhow, and doesn't pay much attention to anomalies
that normalization doesn't address, unless the 'asker' mentions
storing calculations...]
I keep Jamie's advice in the back of my mind, about how enforcing
constraints at the table level is better than enforcing them through
code

..and best to do it in both places! Bear in mind that it's a rule of
thumb i.e. "strict rules modified in practise." Checking something in
the in front end allows you to give timely user feedback and could
save them some keying, not to mention a database roundtrip. Checking
in the database catches anything neglected in the front end by
omission of validation or introduction of bugs. In practice, some
things are better done in one place but not the other: contrast the
validation of the basic pattern of an email address with the
verification that an addressable entity can be contacted at that email
address; I don't think it would be sensible to put the latter test
into a table constraint, even if it were possible.

Jamie.

--

What part of simpler don't you understand :). Only one expression in
the ON is simpler. Needing less indexes is simpler. Not having to
look for your multi-key fields is easier, although your point that
Relationships can handle that is valid. If the AutoNumber key has a
one-to-one relationship with the multi-key fields then it's fine to
use it. There's no down side that I can see. I also like to rely on
coding to detect inconsistent data rather than on error trapping, so I
have to check the multi-key values anyway before adding a new record.
I think that your idea about enforcing constraints at both the table
level and in code is an excellent idea. The OP wanted to know what
people did and why. I still don't see any reason put forward for me
to change to a multi-field key. Are totals queries easier when multi-
field keys are used? BTW, "reduced the amount of denormalization"
works just as well. Real databases experience denormalizing
influences.

James A. Fortune
(e-mail address removed)
 
W

Wayne-I-M

For example, in an Inventory table
that has the key, {ItemKey, WarehouseKey}, with references to an Item table
and a Warehouse table respectively, the combination values that comprise
each key value can never change from database state to database state.

What happens if something happens to the warehouse that makes it un-usable.
Not enough to affect the items "in" the warehouse. Would this not mean the
items are moved to another warhouse. Why not just use the item key as a
stand alone or (as Sylvain suggested) have the Item Key as an autonumber.

I'm not arguing either way - I am trying to learn better but it seems that
createing a muliple layer primary field is just asking for problems when
there is no need to do this as all DB's can cope perfectly well with just an
autonumber. I was always told that the primary field was "not" there for an
other purpose than to indetify that specific recordset.

If you use multiple layered key fields are you not assigning another value
to the primary (that of a - in your example - a product/item locator).

As I said I'm not standing on either side I'm just wanting to me knowledge
increase.

Thank you
--
Wayne
Manchester, England.



Brian Selzer said:
Sylvain Lafontaine said:
To that, I would add that the increased simplicity of using a surrogate
(or artificial or autonumber) key as the primary key in place of a
composite key is only half their advantage.

The biggest problem that I have with composite keys is that they share the
same fundamental problem as natural keys: using them as the primary key is
allowing the fact that a primary key can change its value over time.
IMHO, a primary key should never be allowed to change its value once it
has been created; a assumption which will forbid the use of a composite
key in many cases. (Of course, if you don't mind to see a primary key
changing its value after its creation then you are not concerned by this
argument.).

This argument has an inherent fallacy in it. Just because a key is composed
from multiple columns doesn't necessarily mean that its values can be
different in different database states. For example, in an Inventory table
that has the key, {ItemKey, WarehouseKey}, with references to an Item table
and a Warehouse table respectively, the combination values that comprise
each key value can never change from database state to database state. A
particular combination of values identifies a particular individual in the
Universe of Discourse in /every/ database state in which it appears. It can
/never/ identify any other individual. Therefore, it should be obvious that
adding an additional autonumber primary key in this instance would be
superfluous, since each {ItemKey, WarehouseKey} combination already rigidly
designates a distinct individual in the Universe of Discourse.

The same can be said for many natural keys. For example, suppose you have a
table, Queue, that has an integer key, {Position}. Each value for Position
rigidly designates a distinct individual in the Universe of Discourse (3
always means "third in line" in any database state in which there are 3 or
more elements), so therefore there is no need for an additional autonumber
primary key.
This is not only a theoritical argument as many interfaces - like Access -
won't like to see a primary key that could change it value. But even if
you take out such interfaces out of the equation, the use of a surrogate
key for all tables reveals itself to be advantageous in many database
problems. For example, if you want to add a log of all changes to a
table, it's much more easier to design it if the table use a surrogate key
for its primary key than a natural key or a composite key.

Personally, I stopped using natural keys and composite keys many years ago
and probably that something like half of my problems with the design of
databases have vanished with them. On these occasions when I was called
to work on a problematic database, chances was much higher to see that the
problems were associated with the use of natural keys and/or composite
keys than with the use of a surrogate keys and the solutions were usually
much more complicated to solve in the first case than in the second case.

Also, I've remember some peoples who have done like me and have stopped
using natural and composite keys in favor of the exclusive use of
surrogate keys but I don't remember anyone doing the opposite; ie. going
from the use of surrogate keys to the use of natural and composite keys.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: sylvain aei ca (fill the blanks, no spam please)


On Jan 24, 11:00 pm, "James A. Fortune" <[email protected]>
wrote:

Whenever I have multiple key fields, natural or not, I create an
AutoNumber PK for pragmatic reasons. The main reason is that it makes
it easier to create the joins. The theorists are champions at joining
tables and don't have to be concerned with the complexity of the SQL
they write.

Word to the wise: 'theorists' hate SQL.

Queries involving many to many relationships often add
additional tables later and highlight the need to keep joins as simple
as possible.

I think I'm with Bob Badour (gulp!) on the issue of complexity,
though: if you think more columns in the ON clause makes a SQL join
more 'complex' then I think you could be looking at things wrong.
Having more characters to type increases the risk of typos? More
columns mean you may omit one in error? The SQL engine may be twice as
slow in handling two columns rather than one? Is it more 'complex' to
split a post address into 'subatomic' columns (address lines from
postal code/zip etc)?

Surely the issue you allude to (I think) is the one that Access
Relationships (as distinct from Jet foreign keys) were invented to
solve? i.e. you pre-define the join columns and 'join type' (inner
join, left outer join or right outer join) and the join clause gets
written as SQL for you when you drop the tables into the Query Builder
thing. I would have thought the 'theorists' would love the fact that
you also create foreign keys in the same Relationships dialog i.e. you
end up with a natural join (not having to explicitly specify the
columns yourself) because one table references the other.

[I tend to be dismissive of tools that write SQL code for me but I
think I should perhaps review my stance e.g. I still write all my SQL
Server procs by hand whereas I have tasked myself to investigate CRUD
generators. But, for the time being, ...] As a SQL coder myself, I
find it more annoying that I have to create multiple joins to get the
'natural key' values, having to discover what the 'artificial key'
columns are in the first place.

Lately, I've increased the amount of normalization in one
of my databases and the joins got even more complicated, adding about a
line or so in the SQL view in Access for every new query using those
tables.

Bad luck: I think you might have got way with "reduced the amount of
denormalization" ;-) In this thread I've already broken my personal
rule (!!) about not mentioning normalization [formulated because the
average 'replier' around here thinks "fully normalized" is BCNF, which
they think is 3NF anyhow, and doesn't pay much attention to anomalies
that normalization doesn't address, unless the 'asker' mentions
storing calculations...]

I keep Jamie's advice in the back of my mind, about how enforcing
constraints at the table level is better than enforcing them through
code

..and best to do it in both places! Bear in mind that it's a rule of
thumb i.e. "strict rules modified in practise." Checking something in
the in front end allows you to give timely user feedback and could
save them some keying, not to mention a database roundtrip. Checking
in the database catches anything neglected in the front end by
omission of validation or introduction of bugs. In practice, some
things are better done in one place but not the other: contrast the
validation of the basic pattern of an email address with the
verification that an addressable entity can be contacted at that email
address; I don't think it would be sensible to put the latter test
into a table constraint, even if it were possible.

Jamie.

--

What part of simpler don't you understand :). Only one expression in
the ON is simpler. Needing less indexes is simpler. Not having to
look for your multi-key fields is easier, although your point that
Relationships can handle that is valid. If the AutoNumber key has a
one-to-one relationship with the multi-key fields then it's fine to
use it. There's no down side that I can see. I also like to rely on
coding to detect inconsistent data rather than on error trapping, so I
have to check the multi-key values anyway before adding a new record.
I think that your idea about enforcing constraints at both the table
level and in code is an excellent idea. The OP wanted to know what
people did and why. I still don't see any reason put forward for me
to change to a multi-field key. Are totals queries easier when multi-
field keys are used? BTW, "reduced the amount of denormalization"
works just as well. Real databases experience denormalizing
influences.

James A. Fortune
(e-mail address removed)
 
R

Roy Hann

Sylvain Lafontaine said:
To that, I would add that the increased simplicity of using a surrogate
(or artificial or autonumber) key as the primary key in place of a
composite key is only half their advantage.

The biggest problem that I have with composite keys is that they share the
same fundamental problem as natural keys: using them as the primary key is
allowing the fact that a primary key can change its value over time.
IMHO, a primary key should never be allowed to change its value once it
has been created; a assumption which will forbid the use of a composite
key in many cases. (Of course, if you don't mind to see a primary key
changing its value after its creation then you are not concerned by this
argument.).

I have decided not to respond to this post in detail because there isn't a
single point it makes that I agree with (as stated). I really don't know
where to start. One specific comment I will make is that my progression has
been the opposite of yours. I once used synthetic keys everywhere, but now
try to limit my use of them, with wholly beneficial effects. That may be
why I write about this with the fervor of a born-again convert.

I would have a lot more sympathy for these kinds of claims if the people
making them would give any hint that they know what the alternatives are,
and why their solutions make sense within application development tools.
For example, I don't think I've ever seen anyone enthusing about the liberal
use of synthetic keys who also noted that they are aware of the possibility
of declaring a foreign key constraint WITH ON UPDATE CASCADE. If you can
persuade me you've looked at it and had to reject for reasons X, Y, and Z, I
can respect that. Or tell me you know about it but your particular product
doesn't support it and I can respect that. Or tell me that you understand
that the DBMS handles the problem almost trivially but the application
development tools make you write extra code and I can respect that.

And what is all this tripe about composite keys making the SQL more complex?
If I bodge up my tables with a spurious third synthetic key (skey) so that
instead of writing:

select i.description, b.cause_of_damage
from orderitems i left join breakages b
using (ordernr,itemnr)

I can instead write:

select i.description, b.cause_of_damage
from orderitems i left join breakages b
on i.skey = b.skey

How much easier is THAT?? And at what cost?

If you don't convince me that you properly understand the problems, and
crucially, where the problems *really* lie, then you aren't going to
convince me that your solutions are anything but cut-and-paste rote-learned
hackery that seems elegant/sensible only to those with limited knowledge of
very limited products. You have to establish that you are credible.
Merely claiming years and years of experience (as others have) could just
mean they've been successfully getting away with being incompetent--and
goodness knows, that happens, so they won't get the benefit of the doubt.

Roy
 
N

Neil

Bob,

Here's a programming tip for you: prozac.


Bob Badour said:
Your hope is futile in the face of your ignorance and laziness.



I'll try.




Ignorance is bliss.


invincibly ignorant.

Precious few pleasant ways exist to express unpleasant truths. However,
intellectual honesty requires the expression of truths regardless of
appeal to oneself or to others.
 
N

Neil

Jamie Collins said:
Why do you say "needing the PK"? JohnV no doubt makes a good case when
considering bound controls in Access but I would point out that he
also said the PK is "never necessary, but it can be handy".


OK, I'll rephrase:

"That makes sense, about the PK coming in handy to refer spearately to the
junction table, if that situation exists."

Better? :)
 
D

David Cressey

(quote)
What part of simpler don't you understand :). Only one expression in
the ON is simpler. Needing less indexes is simpler. Not having to
look for your multi-key fields is easier, although your point that
Relationships can handle that is valid. If the AutoNumber key has a
one-to-one relationship with the multi-key fields then it's fine to
use it. There's no down side that I can see. I also like to rely on
coding to detect inconsistent data rather than on error trapping, so I
have to check the multi-key values anyway before adding a new record.
I think that your idea about enforcing constraints at both the table
level and in code is an excellent idea. The OP wanted to know what
people did and why. I still don't see any reason put forward for me
to change to a multi-field key. Are totals queries easier when multi-
field keys are used? BTW, "reduced the amount of denormalization"
works just as well. Real databases experience denormalizing
influences.

(end quote)

Simplicity is in the eye of the beholder.

I think it's simpler to rely on constraints enforced by the DBMS to prevent
duplicate entries
than it is to write code to accomplish the same thing.

When you want to delete an entry form a junction table, you almost always
know the two FKs that uniquely determine the entry to be deleted. You
almost never know the value of the superflous surrogate key. So it's simple
to use the two FK's as the criterion for deletion than it is to look up the
ID field, and then use that as the basis for deletion.

The above comment depends entirely on how you go about organizing you code.
I like to keep my code simple. At least "simple" in my own eyes.

The use of multi-key fields in star schemas doesn't make individual queries
any simpler. But it expands by orders of magnitude the number of different
combinations that can be used as selection criteria when computing totals or
other aggregates. This makes the entire system simpler, on a lerger scale.

I realize that star schema discussions may be out of place in MS access
newsgroups. The same is true in c.d.t. There is almost nothing of
theoretical interest in star schemas.
 
B

Bob Badour

Roy said:
I have decided not to respond to this post in detail because there isn't a
single point it makes that I agree with (as stated). I really don't know
where to start.

That, in a nutshell, is Date's _Principle of Incoherence_.


One specific comment I will make is that my progression has
been the opposite of yours. I once used synthetic keys everywhere, but now
try to limit my use of them, with wholly beneficial effects. That may be
why I write about this with the fervor of a born-again convert.

I would have a lot more sympathy for these kinds of claims if the people
making them would give any hint that they know what the alternatives are,
and why their solutions make sense within application development tools.
For example, I don't think I've ever seen anyone enthusing about the liberal
use of synthetic keys who also noted that they are aware of the possibility
of declaring a foreign key constraint WITH ON UPDATE CASCADE. If you can
persuade me you've looked at it and had to reject for reasons X, Y, and Z, I
can respect that. Or tell me you know about it but your particular product
doesn't support it and I can respect that. Or tell me that you understand
that the DBMS handles the problem almost trivially but the application
development tools make you write extra code and I can respect that.

And what is all this tripe about composite keys making the SQL more complex?
If I bodge up my tables with a spurious third synthetic key (skey) so that
instead of writing:

select i.description, b.cause_of_damage
from orderitems i left join breakages b
using (ordernr,itemnr)

I can instead write:

select i.description, b.cause_of_damage
from orderitems i left join breakages b
on i.skey = b.skey

How much easier is THAT?? And at what cost?

If you don't convince me that you properly understand the problems, and
crucially, where the problems *really* lie, then you aren't going to
convince me that your solutions are anything but cut-and-paste rote-learned
hackery that seems elegant/sensible only to those with limited knowledge of
very limited products. You have to establish that you are credible.
Merely claiming years and years of experience (as others have)

I have found some people can work for 10 years and get a year's
experience 10 times.


could just
 
B

Bob Badour

Hi Sylvain,

First, let me thank you for being so kind as to volunteer the
information that you are a Most Vociferous Person (MVP). It does a fair
service to the world when the self-aggrandizing ignorants self-declare
that information.

Sylvain said:
To that, I would add that the increased simplicity of using a surrogate (or
artificial or autonumber) key as the primary key in place of a composite key
is only half their advantage.

At this point, a prudent man would Plonk! you while mentally citing
Date's _Principle of Incoherence_. Never the prudent man, instead, I
observe the absurdity of your suggestion that adding features,
structures or attributes increases simplicity. What nonsense!

The biggest problem that I have with composite keys is that they share the
same fundamental problem as natural keys: using them as the primary key is
allowing the fact that a primary key can change its value over time. IMHO,
a primary key should never be allowed to change its value once it has been
created; a assumption which will forbid the use of a composite key in many
cases.

I find your absolutism foolish suggesting ignorance and/or stupidity.

The design criteria for keys are: uniqueness, irreducibility,
simplicity, stability and familiarity (in no particular order). If any
criterion is absolute, it is uniqueness not stability.


(Of course, if you don't mind to see a primary key changing its
value after its creation then you are not concerned by this argument.).

This is not only a theoritical argument as many interfaces - like Access -
won't like to see a primary key that could change it value.

It is not a theoretical argument at all. You simply regurgitate
ignorance and stupidity.

[remaining nonsense snipped]

Plonk!
 
B

Bob Badour

David said:
(quote)
What part of simpler don't you understand :). Only one expression in
the ON is simpler. Needing less indexes is simpler. Not having to
look for your multi-key fields is easier, although your point that
Relationships can handle that is valid. If the AutoNumber key has a
one-to-one relationship with the multi-key fields then it's fine to
use it. There's no down side that I can see. I also like to rely on
coding to detect inconsistent data rather than on error trapping, so I
have to check the multi-key values anyway before adding a new record.
I think that your idea about enforcing constraints at both the table
level and in code is an excellent idea. The OP wanted to know what
people did and why. I still don't see any reason put forward for me
to change to a multi-field key. Are totals queries easier when multi-
field keys are used? BTW, "reduced the amount of denormalization"
works just as well. Real databases experience denormalizing
influences.

(end quote)

Simplicity is in the eye of the beholder.

I tend to disagree. I suspect one can quantify simplicity and complexity.

I think it's simpler to rely on constraints enforced by the DBMS to prevent
duplicate entries
than it is to write code to accomplish the same thing.

Using the dbms uses fewer tools, fewer concepts, fewer computational
models, fewer structures, fewer machines. I suggest the observed
simplicity is more than a matter of perspective or opinion.

[further demonstrations of simplicity snipped]
 
S

Sylvain Lafontaine

Your argument about the use of a DRI WITH ON UPDATE CASCADE is an
interesting argument and one that come back often; however it's not a silver
buller. First of all, it's another level of complexity that you must add to
the design of your database; ie, you must make sure that they are all there
and no one is missing. Second, this DRI cannot be used with cyclic
relationship with SQL-Server but with Oracle, you can. (From your example,
I believe that you are working with Oracle). On SQL-Server, you must use
triggers to implement such a feature when there is a cyclic relationship.
Of course, when you are dealing with tens and hundreds of relationships,
this can quickly translate into a nightmare. There is also the qestion of
the diminution of performance and of general design: when you have to update
multiples records on multiple tables for what should be the change of a
single value in a single table make it hard to believe that this is a proper
normalized database design and this situation quickly worsen if you have to
take into account the correspondance with backups, reports and linked
databases; all systems for which there is no automatic DRI.

But why make it simpler when you can make it harder?

Finally, I don't understand your example at all. You are introducing us to
the NATURAL JOIN and USING statement that have been introduced by Oracle in
its 9i version (also in MySQL and Postgres, I believe) but I fail to see
what this has to do with the subject of this thread; the use of a separate
PK in a junction table and its highly related topic, ie. the use of natural
keys versus the use of surrogate keys. There is no relationship at all
between a NATURAL JOIN and a natural key and the Natural Join can be used as
easily with a surrogate key than with a natural key. The only thing that is
important with the Natural Join is the name of the key. (BTW, if you were
to ask me what I'm thinking about this little monstruosity, I would tell you
that this is a perfect example of a Pandora box.).

And finally, a for your request of asking me to convince you that I'm
properly understand the problem here of to etablish that I'm credible: I can
tell you that I have absolutely no intention of doing it and that I have
absolutely no interest at all about what you are thinking of me. The only
things that are of interest to me are the arguments that I'm seeing posted
here - whatever the people who might write them - but for someone who has
just make a confusion between a natural key and the NATURAL JOIN, asking for
such a thing make it looks very strange.
 
J

James A. Fortune

Marshall said:
Yes, exactly.

One of the greatest benefits, and one of the fundamental
differences between how SQL treats data and how
(most) conventional programming languages treat data
is that in SQL we specify data by its value, instead of by
location. I often observe that superfluous keys in the field
are an attempt to make SQL data have an address, to
make it behave the way the programmer's mental model
(perhaps influenced by years of using pointers) does.


Marshall

Personally, I don't take the natural keys out either, so they can still
be used for the deletion. The thought of giving the SQL data an address
and following a programmer's mental model did not enter into my thinking
at all. I am not trying to give the data an order either. You've been
listening to Celko too much. Because of his overall manner, which I
find quite offensive, I don't even want to listen to him when he's right
:). Is the AutoNumber primary key a denormalization of the schema?
Yes. Is it added for a reason? Yes again. I'm still waiting for a
cogent reason for me to go to using natural keys.

James A. Fortune
(e-mail address removed)
 
J

James A. Fortune

JOG said:
* Artificial keys allow you to enter the exact same statement of fact
twice. This would simply be nonsense.
* Artificial keys allows a tuple at t1 and a tuple at t2 to be
corresponded to each other, even if they don't have a _single_
attribute from the real world in common. This would also simply be
nonsense.

Access programmers use forms to interact with the data. If I follow
Jamie's advice and constrain the data at both the table level and in
code, then your points make more sense. Right now, they're just
arguments for me not to constrain the data at the table level because
the reasons you gave might make natural keys preferable in that
situation :).

* Referencing an artificial key in a child table can complicates
queries - and not just with a longer restrict clause, but with a whole
extra join that may well have been unrequired if a natural key had
been used.

I don't agree with that point. The child table can contain the
AutoNumber primary key from the main table as a foreign key if desired.
I don't see how using the natural key fields requires less joins than
that. Maybe an example would help me understand what you mean.
So not one, but three cogent reasons of the top of my head. I wouldn't
say there are never cases when an artificial key is useful, but they
certainly shouldn't be hidden, and adding them blindly to every
relation is surely just a bit silly. Regards, J.

Did I imply that that's what I do?

James A. Fortune
(e-mail address removed)
 
B

Bob Badour

James said:
Personally, I don't take the natural keys out either, so they can still
be used for the deletion. The thought of giving the SQL data an address
and following a programmer's mental model did not enter into my thinking
at all. I am not trying to give the data an order either. You've been
listening to Celko too much. Because of his overall manner, which I
find quite offensive, I don't even want to listen to him when he's right
:).

So, you object to Celko's style but consider him right?!? That's a new
one. ::rolls eyes::


Is the AutoNumber primary key a denormalization of the schema?

You are an ignoramus. You don't even have a clue what normalization is.
The addition of an attribute to act as a simple, stable key does not
affect the normal form in any way shape or manner.


Is it added for a reason? Yes again. I'm still waiting for a
cogent reason for me to go to using natural keys.

It's the familiarity, stupid. The design criteria for keys are (and I
repeat): uniqueness, irreducibility, stability, simplicity and
familiarity (in no particular order.)
 
B

Brian Selzer

JOG said:
* Artificial keys allow you to enter the exact same statement of fact
twice. This would simply be nonsense.

Not exactly. Artificial key values are simply names assigned to individuals
in the Universe of Discourse. I would think that it should be possible to
have many different names for the same thing: considering the fact that
there are a great many different languages, there must therefore be a great
many words for each thing.
* Artificial keys allows a tuple at t1 and a tuple at t2 to be
corresponded to each other, even if they don't have a _single_
attribute from the real world in common. This would also simply be
nonsense.

I wouldn't call it nonsense. It is not necessary that every property that
an individual exemplifies be represented in the database--only those
properties that are relevant to the problem at hand need be included. In
that event, if a particular individual is assigned a name at t1, and then if
the values for all of the properties that are relevant to the problem at
hand at t1 are compared to those from the individual with the same name at
t2, it is possible for all of those properties to be different. That isn't
nonsense, it just is, given the inherent incompleteness of the information
in the database.
* Referencing an artificial key in a child table can complicates
queries - and not just with a longer restrict clause, but with a whole
extra join that may well have been unrequired if a natural key had
been used.

You left one out. In a table that has multiple natural keys, when an
artificial key is added, which key values are its values surrogates for?
 
B

Brian Selzer

James A. Fortune said:
Access programmers use forms to interact with the data. If I follow
Jamie's advice and constrain the data at both the table level and in code,
then your points make more sense. Right now, they're just arguments for
me not to constrain the data at the table level because the reasons you
gave might make natural keys preferable in that situation :).

Well, that's just dumb. Checks in code can reduce database round-trips, and
therefore can improve performance, but are not and cannot be a substitute
for constraints on the tables. It is the constraints on the tables that
keeps garbage out of the database.
I don't agree with that point. The child table can contain the AutoNumber
primary key from the main table as a foreign key if desired. I don't see
how using the natural key fields requires less joins than that. Maybe an
example would help me understand what you mean.

An extra join may be needed if the natural key from the parent table is used
in a restrict clause. If all you have is the artificial key from the parent
table, then you have to join in order to access the natural key columns.
With natural keys, the natural key values from the parent table also appear
in the child table, so there isn't any need to join. Bottom line: joins of
artificial keys are typically faster than joins of natural keys due to the
size of the comparands, but with natural keys, fewer joins may be needed..
 
B

Brian Selzer

Well, that's just dumb. Checks in code can reduce database round-trips,
and
therefore can improve performance, but are not and cannot be a substitute
for constraints on the tables. It is the constraints on the tables that
keeps garbage out of the database.
If the users only access the tables through forms, conforming to best
practices in Access, how are they going to get garbage into the
tables? Now if you're trying to keep Jamie and his Excel SQL out of
your database, that's another story :).
<<<<<

There can be several forms that access the same table, so you would have to
duplicate the code behind each form that accesses a table, or you can get
garbage into the database.
An extra join may be needed if the natural key from the parent table is
used
in a restrict clause. If all you have is the artificial key from the
parent
table, then you have to join in order to access the natural key columns.
With natural keys, the natural key values from the parent table also
appear
in the child table, so there isn't any need to join. Bottom line: joins of
artificial keys are typically faster than joins of natural keys due to the
size of the comparands, but with natural keys, fewer joins may be needed..

If you're planning on using a natural key column in the child table as
part of a join then doesn't it make sense to include that field in the
child table?

Still waiting...


A typical schema with artificial keys:

Customer {CustomerKey, CustomerNo, ...}
Key {CustomerKey}, Key {CustomerNo}

Item {ItemKey, ItemNo, ...}
Key {ItemKey}, Key {ItemNo}

CI {CustomerItemKey, CustomerKey, ItemKey, CustomerItemNo}
Key {CustomerItemKey}, Key {CustomerKey, ItemKey}
CI[ItemKey] IN Item[ItemKey]
CI[CustomerKey] IN Customer[CustomerKey]

SOLine {SOLineKey, SOKey, SOLineNo, CustomerItemKey, Quantity, Price}
Key {SOLineKey}, Key {SOKey, SOLineNo}
SOLine[CustomerItemKey] IN CI[CustomerItemKey]


A typical schema with natural keys

Customer {CustomerNo, ...}
Key {CustomerNo}

Item {ItemNo, ...}
Key {ItemNo}

CI {CustomerNo, ItemNo, CustomerItemNo}
KEY {CustomerNo, ItemNo}
CI[CustomerNo] IN Customer[CustomerNo]
CI[ItemNo] IN Item[ItemNo]

SOLine {SO#, SOLineNo, CustomerNo, ItemNo, Quantity, Price}
SOLine[CustomerNo, ItemNo] IN CI[CustomerNo, ItemNo]


Now write a query that returns how many of item '12345' were sold to
customer '4321'

It should be obvious that with the natural keys, no joins are
necessary--it's just a simple select from SOLine since all of the
information is actually /in/ SOLine; whereas, with the artifical keys,
several joins are required because in order to query by item number and
customer number, SOLine must be joined to CI which must then be joined to
Customer and Item.
 
R

Roy Hann

Sylvain Lafontaine said:
Your argument about the use of a DRI WITH ON UPDATE CASCADE is an
interesting argument and one that come back often;

Really? I can put my hand on my heart and say I've never seen anyone
suggest it when they cross-post to comp.databases.theory. I don't doubt it
has come up, but I do doubt it is often.
however it's not a silver buller.

I implicitly allowed that it is not a silver bullet by actually suggesting
the kinds of reasons one might exclude it.
First of all, it's another level of complexity that you must add to the
design of your database; ie, you must make sure that they are all there
and no one is missing.

Of course one must make sure "they are all there"; you are absolutely right.
But it absurd to suggest doing that is "another level of complexity". It is
trivial to do it, and trivial to check that you've done it by querying the
DB catalogs.
Second, this DRI cannot be used with cyclic relationship with SQL-Server
but with Oracle, you can. (From your example, I believe that you are
working with Oracle).

I'm not. But the fact that you are distinguishing the behaviour of
particular products gets close to the real problem. The real problem is
that the products we use are all more or less defective, but instead of
clamouring to have them fixed (by establishing suitable standards and
following them) we promote workarounds as if they are actually desirable. I
have no problem at all with people describing workarounds for defects but I
have a major problem when it is implied that the workaround is some kind of
best-practice or even desirable.
On SQL-Server, you must use triggers to implement such a feature when
there is a cyclic relationship. Of course, when you are dealing with tens
and hundreds of relationships, this can quickly translate into a
nightmare. There is also the qestion of the diminution of performance and
of general design: when you have to update multiples records on multiple
tables for what should be the change of a single value in a single table
make it hard to believe that this is a proper normalized database design

First of all, updates to keys should be very rare. A fundamental property
of a well-chosen key is that it is reasonably stable. Imposing the little
self-discipline required to make sure you choose stable keys is
inconsequential when compared to the overall database design effort.

Secondly, even having to update hundreds of tables to amend a key is only
about the same effort required to insert all those rows in the first place.
Against the background of work the system does all the time, that will be
inconsequential. (Of course, if you unwisely choose a key that is not
stable, your argument would be more nearly correct. But that is why the
long-standing advice has been to avoid keys that are not stable.)
and this situation quickly worsen if you have to take into account the
correspondance with backups, reports and linked databases; all systems for
which there is no automatic DRI.

I don't entirely agree with all these reasons, but as I said in my earlier
post, there often *are* good reasons why one might not be able to use ON
UPDATE CASCADE in a particular product and I will take your word for it that
these reasons apply with the product you use. My challenge to you was to
signal you know that, and you have now done so.
But why make it simpler when you can make it harder?
Hm.

Finally, I don't understand your example at all. You are introducing us
to the NATURAL JOIN and USING statement that have been introduced by
Oracle in its 9i version (also in MySQL and Postgres, I believe) but I
fail to see what this has to do with the subject of this thread; the use
of a separate PK in a junction table and its highly related topic, ie. the
use of natural keys versus the use of surrogate keys. There is no
relationship at all between a NATURAL JOIN and a natural key and the
Natural Join can be used as easily with a surrogate key than with a
natural key.

I am confused about your argument here. I was giving counter-example to
disprove the claim that composite keys make the SQL code more complex, which
was being presented as an argument to introduce yet more, spurious,
synthetic/surrogate keys. In fact if you read my example carefully, you
will have seen that I talked explicitly about *three* synthetic keys because
I aware I was already using two (order number and item number).
The only thing that is important with the Natural Join is the name of the
key. (BTW, if you were to ask me what I'm thinking about this little
monstruosity, I would tell you that this is a perfect example of a Pandora
box.).

Well, I have to admit that I'm not over-fond of relying on names to imply
that two columns represent the same thing, so in fact I never use that
syntax. I do prefer to assert all the conditions on all the key columns
explicitly and I just don't notice the few extra keystrokes when it's a
composite key. But on the other hand, I find it monstrous when I see two or
more distinct names for colunms that do represent the same thing.
And finally, a for your request of asking me to convince you that I'm
properly understand the problem here of to etablish that I'm credible: I
can tell you that I have absolutely no intention of doing it

And yet you have greatly increased your credibility with this post. I still
disagree with what you've said, but I can see you know more about what
you're talking about than it seemed before. Before, I thought you were
ignorant and uncurious. Now I see you are merely wrong. :)
and that I have absolutely no interest at all about what you are thinking
of me.

That's very healthy. You shouldn't.
The only things that are of interest to me are the arguments that I'm
seeing posted here - whatever the people who might write them - but for
someone who has just make a confusion between a natural key and the NATURAL
JOIN, asking for such a thing make it looks very strange.

I make no such confusion, and a quick glance at my earlier post will confirm
it.

Roy
 
B

Bob Badour

Roy said:
"Sylvain Lafontaine" <sylvain aei ca (fill the blanks, no spam please)>
wrote in message
[snip]
And finally, a for your request of asking me to convince you that I'm
properly understand the problem here of to etablish that I'm credible: I
can tell you that I have absolutely no intention of doing it

And yet you have greatly increased your credibility with this post.

I disagree. You give him too much credit.


I still
disagree with what you've said, but I can see you know more about what
you're talking about than it seemed before. Before, I thought you were
ignorant and uncurious. Now I see you are merely wrong. :)

I suspect that is wishful thinking on your part.
 
D

David Cressey

an attempt to make SQL data have an address

Exactly. Literally hundreds of attempts to sell snake oil in c.d.t. can be
reduced to precisely this statement. We've all said variations of the
above, but I've never seen it put so succintly.

As far as the MS Access newsgroups that this discussion is posted to, I
can't speak to how well your summary extends to their mental model. But
many of them seem to write as if
contents as determined by address were the fundamental paradigm of data.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top