Separate PK in Jxn Tbl?

B

Brian Selzer

Sylvain Lafontaine said:
This remind me of the discussion twenty five years ago between relational
databases (like SQL-Server) and non-relational databases (like ISAM
databases, dBase, etc.). It has always been right to say that for simple
queries, non-relational databases are faster than relational databases.
However, nowadays, non-relational databases have (practically) vanished
because of their inherent slowness when the queries become more and more
complicated.

This is exactly the same situation with the possibility of accelerating a
query by using a natural key: you are accelerating simple queries that are
already light and fast but on the opposite side, you are slowing down
complexe queries that are already big and slow. Not sure if going this
way is really advantageous.

I don't think it's exactly the same situation. As long as the data resides
in the same table, it is possible to create several indexes over several
columns in order to improve query performance--including join performance.
It's a lot harder to do that when the data is spread out over several
tables, as can be seen from the artificial key example. (You could use
indexed views, perhaps, if you know ahead of time which queries will be
run.)
--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: sylvain aei ca (fill the blanks, no spam please)


Brian Selzer said:
"James A. Fortune" <[email protected]> wrote in messagenews:%
Access programmers use forms to interact with the data. If I follow
Jamie's advice and constrain the data at both the table level and in
code,
then your points make more sense. Right now, they're just arguments
for
me not to constrain the data at the table level because the reasons
you
gave might make natural keys preferable in that situation :).

Well, that's just dumb. Checks in code can reduce database round-trips,
and
therefore can improve performance, but are not and cannot be a
substitute
for constraints on the tables. It is the constraints on the tables that
keeps garbage out of the database.
If the users only access the tables through forms, conforming to best
practices in Access, how are they going to get garbage into the
tables? Now if you're trying to keep Jamie and his Excel SQL out of
your database, that's another story :).
<<<<<

There can be several forms that access the same table, so you would have
to duplicate the code behind each form that accesses a table, or you can
get garbage into the database.
* Referencing an artificial key in a child table can complicates
queries - and not just with a longer restrict clause, but with a
whole
extra join that may well have been unrequired if a natural key had
been used.

I don't agree with that point. The child table can contain the
AutoNumber
primary key from the main table as a foreign key if desired. I don't
see
how using the natural key fields requires less joins than that. Maybe
an
example would help me understand what you mean.

An extra join may be needed if the natural key from the parent table is
used
in a restrict clause. If all you have is the artificial key from the
parent
table, then you have to join in order to access the natural key columns.
With natural keys, the natural key values from the parent table also
appear
in the child table, so there isn't any need to join. Bottom line: joins
of
artificial keys are typically faster than joins of natural keys due to
the
size of the comparands, but with natural keys, fewer joins may be
needed..

If you're planning on using a natural key column in the child table as
part of a join then doesn't it make sense to include that field in the
child table?

Still waiting...


A typical schema with artificial keys:

Customer {CustomerKey, CustomerNo, ...}
Key {CustomerKey}, Key {CustomerNo}

Item {ItemKey, ItemNo, ...}
Key {ItemKey}, Key {ItemNo}

CI {CustomerItemKey, CustomerKey, ItemKey, CustomerItemNo}
Key {CustomerItemKey}, Key {CustomerKey, ItemKey}
CI[ItemKey] IN Item[ItemKey]
CI[CustomerKey] IN Customer[CustomerKey]

SOLine {SOLineKey, SOKey, SOLineNo, CustomerItemKey, Quantity, Price}
Key {SOLineKey}, Key {SOKey, SOLineNo}
SOLine[CustomerItemKey] IN CI[CustomerItemKey]


A typical schema with natural keys

Customer {CustomerNo, ...}
Key {CustomerNo}

Item {ItemNo, ...}
Key {ItemNo}

CI {CustomerNo, ItemNo, CustomerItemNo}
KEY {CustomerNo, ItemNo}
CI[CustomerNo] IN Customer[CustomerNo]
CI[ItemNo] IN Item[ItemNo]

SOLine {SO#, SOLineNo, CustomerNo, ItemNo, Quantity, Price}
SOLine[CustomerNo, ItemNo] IN CI[CustomerNo, ItemNo]


Now write a query that returns how many of item '12345' were sold to
customer '4321'

It should be obvious that with the natural keys, no joins are
necessary--it's just a simple select from SOLine since all of the
information is actually /in/ SOLine; whereas, with the artifical keys,
several joins are required because in order to query by item number and
customer number, SOLine must be joined to CI which must then be joined to
Customer and Item.
 
T

Tony Toews [MVP]

Jamie Collins said:

<chuckle> You almost got me there. However I was ridiculuing other people who
weren't part of the conversation. Not much of an excuse.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

Brian Selzer said:
So now they're good reasons? In your earlier post, you said they weren't
good reasons. Can't you make up your mind? You also haven't stated your
reasons. How can I like them or not like them? I don't know them!

They are good reasons for me and, in my opinion, for many or all other users of
Access. I flippantly stated "no particular good reason" as I was thinking about
some of the regulars in the comp.databases.theor newsgroup and what I assumed would
be there viewpoint.
I was not speaking of corruption due to disk failures; I was instead
referring to permitting garbage into the database due to the misuse of
auto-number primary keys.

Ok, not quite sure what you mean by this. Presumably because a unique index wasn't
declared on other fields such as, for a parts table, a part number. Which can be
duplicated by multiple manufacturers so even there that's not quite a good example.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

-CELKO- said:
I always use an autonumber PK and a unique index set on the two FK fields [sic]. Why? No particular good reason. One of my database rules is that all tables have an autonumber primary key [sic]. <<

You add redundancy to a schema and never thought about it?

I have and I'm quite comfortable with it.
If you had
gone thru the Normalization process, this would stick out as a
fundamental design error immediately. You have not been writing SQL;
you are faking a sequential file system in SQL and even say "field"
instead of "column" -- huge conceptual and implementation differences!
Ok.
Now if I was to have a child table [sic: that terms comes from Network DBs; did you used to program in IMS?] from the junction table [sic: did you mean a table that models a relationship among many entities? Or a multi-way pointer structure as in an Network DB?] then I would absolutely use a autonumber primary key [sic: it cannot be a key by definition] for ease of use when designing queries, forms and reports. <<

Never woked in IMS or Network DBs. These are common terminology in the Access world
so that's what I use.
I hope that you are not designing forms and reports in the database.

You've never used Access then?

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

rkc said:
Yes. Because Access developers have no need to separate data access
from user interface from business rules. Just bind a form to a
recordsource, sprinkle some VBA code in a few events and run with it.

I sure would like to have that middle tier with business rules and also have the
power of Access and RAD.

Tony

--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
T

Tony Toews [MVP]

James A. Fortune said:
I am only speaking for myself. I may be the only Access programmer on
the planet who validates input the way I do in code.

Not sure exactly what mean mean by that statement but I do a lot of validating on
input as well. And I very much try to keep one form updating one table so as to
not have to duplicate such data.

I'd very much like to see a business rules layer implemented that worked well within
Access. However I rather much doubt we'll ever see it given that Microsoft is
focusing their attentions on the office information worker rather than the developer.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
J

Jon Heggland

Quoth David W. Fenton:
Any database engine can have the schema defined in a way that will
allow duplicates.

Nonsense. Any /SQL-based/ database engine, perhaps.
 
E

Ed Murphy

Brian said:
Now write a query that returns how many of item '12345' were sold to
customer '4321'

It should be obvious that with the natural keys, no joins are
necessary--it's just a simple select from SOLine since all of the
information is actually /in/ SOLine; whereas, with the artifical keys,
several joins are required because in order to query by item number and
customer number, SOLine must be joined to CI which must then be joined to
Customer and Item.

I'm sure you already know this, but:

1) Assuming proper indexes, the extra time required to process the extra
joins should be negligible.

2) If the natural keys are changeable, then synthetic keys are a Big
Win. (Example: One of my higher-end clients has lots of items that
are essentially the same except for color, and routinely discontinues
slow-moving options - including, for various reasons, prepending 'Z'
to the item number.)
 
E

Ed Murphy

rkc said:
Yes. Because Access developers have no need to separate data access
from user interface from business rules. Just bind a form to a
recordsource, sprinkle some VBA code in a few events and run with it.

The few Access projects that I've taken over have precisely followed
the "just bind a form" approach. Which is presumably why the original
developers were sacked and I was brought in... :)
 
D

David Cressey

This is the big problem with natural keys. When the theory was first
elaborated, the point was that a natural key never change its value. That
was the point that was making the choice of a natural key on par with the
use of a surrogate key as the primary key of a table. However, like anyone
have discovered with experience, a natural key can change its value under a
set of various circonstances. One could argue that if a key can change its
value, than it's not a natural key but as you know, this argument bring
nothing in regard to help you choosing a natural key.

This is a specific example of a syndrome that I described more generally:
the mismanagement of natural keys.

If a key is "natural", it is managed, if at all, beyond the scope of the
DBMS. It might be managed at the application layer, or it might be managed
by people, or it might indeed be unmanaged data, like sunspots.

When it is managed by other people, it is subject to mismanagement.
Changing values that ought to be immutable is one of many ways that natural
keys can be mismanaged.

As I said before, I prefer to use natural keys where ever possible. If
that's not possible due to mismanagement of the natural keys, I'll use
synthetic keys.
 
D

David Cressey

(quote)
I don't claim to know more about db theory than the cdt regulars, or
to have more experience than the many practioners who have contributed
to this thread.

I do know that the debate over relationship representations has been
going on for years, neither side giving any ground. Some aspects of it
are never addressed. For example, making the two foreign keys in a
junction table a composite PRIMARY key prohibits NULL values from
either key, but declaring a separate surrogate primary key together
with a UNIQUE constraint on the two-foreign-key-composite-key does
allow NULL values for either (or both!). This can be useful for
representing unrelated entity tuples, either childless parents or
orphans.

(unquote)

In the case of a junction table, this point is moot.

If both FK values are NULL, the entire row of the junction table can be
omitted with no loss of information.

Childless parents will exist in the table that contains parents, where the
id of the parent is not an FK, but a PK. Same pattern for orphans. The two
tables can even be the same table.
 
D

David Cressey

David W. Fenton said:
(e-mail address removed) wrote in
m:


What if there's more than one application built on top of the
database?

I believe this is a moot point when it comes to MS Access. The app and the
database are all stored together in Access. There is, by definition, only
one Access.

Access regulars, feel free to correct this if it's wrong.
 
D

David Cressey

to see if there is a different number Ah! <<
In the US, you get a UPC/EAN code and then a batch number with booze.
The closest thing to a serial number is your cash register receipt.

This is the "cat food problem" in a nutshell. (Or, in this discussion, the
"gin bottle" problem.)
The problem has been known as the "cat food" problem for years and years.

Items on a cash register tape represent entities with no identity at the
individual level, only at the batch or product level. You therefore can't
tell whether two bottles of gin were checked out, or whether one bottle was
scanned twice.
 
R

Roy Hann

David Cressey said:
When it is managed by other people, it is subject to mismanagement.
Changing values that ought to be immutable is one of many ways that
natural
keys can be mismanaged.

As I said before, I prefer to use natural keys where ever possible. If
that's not possible due to mismanagement of the natural keys, I'll use
synthetic keys.

I think you may be asking too much of a natural key. I always took "stable"
to mean *almost* never changes ratherer than meaning absolutely immutable
(which I take to mean "intrinsically incapable of changing"). But whatever
one thinks it should mean, one always has to accommodate the possibility
that a key value will be mis-typed during manual entry by a user, and
therefore that it will have to be corrected. Mismanagement is, as you
rightly say, a fact of life.

But that's not usually a sufficiently good reason to introduce a synthetic
key. I have found it possible to tolerate quite a lot of updates to
supposedly stable keys using ON UPDATE CASCADE, with imperceptible
response-time costs. (One can of course measure the costs, but that's not
what counts.) A key would have to be really unstable to justify using a
synthetic key IMO.

Roy
 
B

Brendan Reynolds

David Cressey said:
I believe this is a moot point when it comes to MS Access. The app and
the
database are all stored together in Access. There is, by definition, only
one Access.

Access regulars, feel free to correct this if it's wrong.


The app and the data *can* be stored in one MDB, but do not *have* to be
stored in one MDB (linked tables). It is entirely possible - and in my
experience not unusual - to have more than one application reading and or
updating data in an MDB (or ACCDB).
 
B

Bob Badour

David said:
I believe this is a moot point when it comes to MS Access. The app and the
database are all stored together in Access. There is, by definition, only
one Access.

Access regulars, feel free to correct this if it's wrong.

While awkward, slow and kludgy, it is possible to attach an Access app
to an external database.
 
D

David Cressey

Roy Hann said:
I think you may be asking too much of a natural key. I always took "stable"
to mean *almost* never changes ratherer than meaning absolutely immutable
(which I take to mean "intrinsically incapable of changing"). But whatever
one thinks it should mean, one always has to accommodate the possibility
that a key value will be mis-typed during manual entry by a user, and
therefore that it will have to be corrected. Mismanagement is, as you
rightly say, a fact of life.

You are right. Whether or not mismanagement is a sufficient reason for
distrust of a natural key depends on the degree of mismanagement. I should
have made that explicit in my earlier post.

But that's not usually a sufficiently good reason to introduce a synthetic
key. I have found it possible to tolerate quite a lot of updates to
supposedly stable keys using ON UPDATE CASCADE, with imperceptible
response-time costs. (One can of course measure the costs, but that's not
what counts.) A key would have to be really unstable to justify using a
synthetic key IMO.

One problem with cascading updates occurs when you want to relate data still
in the database to data that was extracted from the database at an earlier
point in time. This includes, but is not limited to, data in the log file.
 
R

Roy Hann

David Cressey said:
One problem with cascading updates occurs when you want to relate data
still
in the database to data that was extracted from the database at an earlier
point in time. This includes, but is not limited to, data in the log
file.

Good points, and I will always concede that these are the kinds of external
factors that in practice force one into using more synthetic keys than we'd
like. Happily in my case the problem isn't too bad because we have a very
nice journal analyzer that quickly reveals any key updates.

I hope I have also been pretty careful to warn people that creating
redundant copies of data outside the database will cause all the problems
one always associates with redundant copies of data. We tend to do a lot of
refreshes instead of hoping our snapshots haven't mutated too badly. (In a
way it's been a chicken and egg solution: we never allowed the egg, so we
don't have to deal with the chicken! :) We also make use of replication
in several systems, and to our replicator an updated key is just another
update.

But to repeat, I do understand synthetic keys will sometimes be forced on
me. I don't insist they can always be avoided. They are my just last
resort, not my first.

Roy
 
B

Brian Selzer

David Cressey said:
This is a specific example of a syndrome that I described more generally:
the mismanagement of natural keys.

If a key is "natural", it is managed, if at all, beyond the scope of the
DBMS. It might be managed at the application layer, or it might be
managed
by people, or it might indeed be unmanaged data, like sunspots.

When it is managed by other people, it is subject to mismanagement.
Changing values that ought to be immutable is one of many ways that
natural
keys can be mismanaged.

As I said before, I prefer to use natural keys where ever possible. If
that's not possible due to mismanagement of the natural keys, I'll use
synthetic keys.

I don't think it's productive to blame it on the users. The problem you're
referring to is due to the nature of keys, and misunderstandings on the part
of the database designer as to what constitutes a key. All that is required
for a key to be a key is that in every possible database instance, a
projection over the attributes in the key for a relation has the same
cardinality as the relation. This does /NOT/ mean that a particular
combination of values /always/ identifies the same individual in the
Universe of Discourse, but only in the picture of the Universe that is a
database instance. In other words, a particular combination of values may
not /necessarily/ identify an individual, but rather may only /contingently/
identify an individual. This is the nature of keys: either the values for a
key are permanent identifiers, or they're not. It has nothing to do with
how well keys are managed. The values for a key may be managed perfectly,
yet still not be permanent identifiers--the position of something in a list
of things comes to mind.
 
S

Sylvain Lafontaine

Bof, if you live in a perfect world, one with infinite budget and infinite
time to do any project, then I understand your concerns.

However, in my case and probably in the case of Tony Toews - but I cannot
vouch for sur for him - I don't live in such a world. First of all, if I
were to live in a perfect world, I wouldn't have to work to earn a living in
the first place. Second, you cannot codifying everything for a variety of
reasons: budget, system already in place and working well, impossibility to
anticipate everything, more art than a science, etc., etc.

I have a client who has such a system at the moment as one critical part of
their business process, this system is working well at the moment and has
done so for many years; it would probably make your teeth gnashing but it's
not on their radar at this moment to change it. Why would they pay to
change something that had worked well for them for many years and at the
risk of finding themselves at the front of something new that might not work
as well as the one system? To give pleasure to people like Celko? If I
were to tell them that Celko would like to see them changing their system,
they would probably tell that if Celko was to bring them a check to pay for
the change, then maybe they would give it a try.

In this message, you can replace the name of Celko with the name of a lot of
persons around here but not with mine.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: sylvain aei ca (fill the blanks, no spam please)


« But many of them seem to write as if contents as determined by
ddress ».

The content is not determined by the address and in fact, for those who
are
using surrogate keys, the exact value of an address inside the database
has
zero importance.

For you maybe.

In a recent thread on this subject, Tony Toews Access MVP qualified
that he liked using incremental autonumbers (rather than random)
because they where easier to type (WHERE ID = -2001736589 may
encourage typos) and easier to drop into conversation ("Hello Tony?
I'm seeing a problem with the record where the ID is -2001736589...").
Did I mention that I sincerely appreciate Tony's honesty?

Also consider the amount of posts we see in the Access groups asking
to reseeding autonumbers, gaps in sequences, etc. Wrong mental model,
perhaps, but the mentality certainly exists.

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top