Data Primary key vs. Artificial (Autonumber) primary key

F

Fred

Can someone tell me what this mere mortal is missing regarding how to look at
these long threads? The summary page says 40 posts (which looks correct)
but a look at the posts (2 pages with only one "more" between them) only
shows 30, and some that I know were there before are gone.

Fred
 
F

Fred

That's about 325 miles away. Although the shocks will travel farther in
Midwest geology, that's still like another country to us.

:)

Fred
 
D

Dennis

And memo fields
I don't know what you mean but memo fields can be indexed and can be
used in WHERE and HAVING clauses; you can cast them using CSTR() in
SQL code to prevent them from being truncated in certain
circumstances.

Then you're a better man that I, because I could never get them to work for
me in that regard. But that wasn't the first time I've said that, and it
CERTAINLY won't be the last - heh....

Dennis
 
D

Dale Fye

M.

I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

This seems like a lot of extra work to me, so I avoid it where I can.

It also occurs to me that if you want to use the composite key, then if you
want to use this stuff in a list or combo box, and want to be able to select
the specific record, then you need to include all of the fields necessary for
the PK (which increases the load on your network).

Just my 2 cents!

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.
 
P

Pat Hartman

I'm not sure why I'm jumping in here and I'm not sure if someone has already
said what I am going to say so I apologize in advance. I also have over 30
years of database design experience and in my mainframe days, all primary
keys were "natural" and many were compound. It took me a few years of using
Access to become converted to the "autonumber as PK camp" but I have. I
find autonumbers extremely easy to work with. Access does have an issue
with autonumbers that are not defined as primary keys so I would NEVER use a
"natural" PK in a table that also had an autonumber. the autonumber would
ALWAYS be the PK and would be used for all relationships. When I have
business rules to satisfy that revolve around uniqueness of "natural" keys,
I use unique indexes.
The benefits of using an autonumber PK are:
1. there is never a need to cascade PK updates since the PK is an autonumber
and by definition cannot be changed.
2. Business rules regarding uniqueness can be implemented with unique
indexes.
3. In the cases where one of the "natural" keys may validly be blank, I
don't have an issue with a unique index whereas I would have an issue if I
defined a compound primary key since PKs may not contain null values. SSN
is a good example here (aside from the fact that there are laws prohibiting
its use as a PK) since not everyone will have one (h1b visa holders for one)
and there are only limited cases where a person actually has to provide one
to the requestor.
4. Combo/List boxes require code behind the scenes to work if the unique
identifier is compound.
5. As the hierarchy gets deeper, the compound key grows and joins become
more unwieldy. It is way too easy to omit one field out of 5 in a join with
disastrous results if the error isn't discovered quickly.

The only pro for using autonumber PKs is that a lot of your tables will end
up with lots of numeric columns that you would prefer to view as text when
you open the table. The solution is NOT table level lookups, it is creating
queries that join the tables so you can see the identifying "natural" key
data from the parent table. You only have to create one of these views for
each table so it's not like it is a great deal of work. Then for yourself
(because ONLY you should be viewing tables or queries directly), just
remember to open the query rather than the table when you want to see the
natural key values.

That's my opinion and I'm stickin' to it:)
 
M

M.

Dear Dale,

You could still use your preference for the (unique) autonumber field in the
WHERE statement (as suggested for defining relations in my example), even if
it's not the primary key. The same goes for the lookup box example. I'm in
the mean time convinced that it's a matter of taste (and maybe of performance
for the Jet engine) to decide between an artificial primary key and a real
data primary key.

What still interests me, however, is how pepole prevent entry of duplicate
records when they use a autonumber field as primary index. This choice
doesn't offer you any guarantee that the real data in your record is
accidentally entered a second time in a later record. In my opinion you would
still need a unique single (e.g. ISBN number) or multifield index (e.g. book
title + author name) to prevent duplicate entry. In the end, it's then a
matter of taste whether you name the autonumber field index primary or the
composite field(s) index primary.

If you disagree, please let me know,

regards,

M.

Dale Fye said:
M.

I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

This seems like a lot of extra work to me, so I avoid it where I can.

It also occurs to me that if you want to use the composite key, then if you
want to use this stuff in a list or combo box, and want to be able to select
the specific record, then you need to include all of the fields necessary for
the PK (which increases the load on your network).

Just my 2 cents!

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



M. said:
Dear all,

Although many Microsoft Access books advise to set an autonumber field as
primary index (a so called pseudo primary key),
http://www.blueclaw-db.com/database_link_tables.htm advises to use real data
to define a (composite) primary key. In summary, this results in the
following two designs:

Employee_ID (autonumber, primary key)
SSN (social security number, composite index key1)
Employee_Name (full employee name, composite index key2)
composite index SSN + Employee_Name = unique

Employee_ID (autonumber, unique index)
SSN (social security number, (composite) primary key1)
Employee_Name (full employee name, (composite) primary key2)

In both approaches, Employee_ID would be used as a foreign key in other
tables to define the relationship with the Employee table.

Are there any negative aspects associated with the BlueClaw approach?

Pros of BlueClaw approach
*Display of table is meaningful, because it's sorted on primary index
*No cascaded update necessary of linked relationship fields in other tables,
because autonumber is only used for linking tables and therefore will never
change.
*Prevention of duplicates is improved, since data fields are used to check
for duplicates, instead of an (always unique) autonumber field > this can
also be achieved with the composite unique index as shown above in the Access
books example.

Cons of BlueClaw approach
*???

I would appreciate your comments / opinion on the BlueClaw approach, because
I currently have the feeling that I'm missing something that explains why so
many people use autonumber fields as primary (artificial) key. If the
BlueClaw approach is the best one, I'm considering to use it as a standard in
new database design questions.

Best regards,

M.
 
T

Tony Toews [MVP]

Dale Fye said:
I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

Consider if you have a parent and subform based a composite PK?
That's a lot of extra work there figuring out what fields go where.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
D

Dale Fye

I am certain that there are some performance issues associated with either
decision.

I generally use the Autonumber field as my PK, but use multi-field unique
indexes, and application level business logic, to prevent "duplicate
entries". As long as you restrict your users access to the raw data, it is
frequently easier to implement the business logic in the application rather
then through table constraints.

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



M. said:
Dear Dale,

You could still use your preference for the (unique) autonumber field in the
WHERE statement (as suggested for defining relations in my example), even if
it's not the primary key. The same goes for the lookup box example. I'm in
the mean time convinced that it's a matter of taste (and maybe of performance
for the Jet engine) to decide between an artificial primary key and a real
data primary key.

What still interests me, however, is how pepole prevent entry of duplicate
records when they use a autonumber field as primary index. This choice
doesn't offer you any guarantee that the real data in your record is
accidentally entered a second time in a later record. In my opinion you would
still need a unique single (e.g. ISBN number) or multifield index (e.g. book
title + author name) to prevent duplicate entry. In the end, it's then a
matter of taste whether you name the autonumber field index primary or the
composite field(s) index primary.

If you disagree, please let me know,

regards,

M.

Dale Fye said:
M.

I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

This seems like a lot of extra work to me, so I avoid it where I can.

It also occurs to me that if you want to use the composite key, then if you
want to use this stuff in a list or combo box, and want to be able to select
the specific record, then you need to include all of the fields necessary for
the PK (which increases the load on your network).

Just my 2 cents!

Dale
--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.



M. said:
Dear all,

Although many Microsoft Access books advise to set an autonumber field as
primary index (a so called pseudo primary key),
http://www.blueclaw-db.com/database_link_tables.htm advises to use real data
to define a (composite) primary key. In summary, this results in the
following two designs:

Microsoft Acces books setup for Employee table
Employee_ID (autonumber, primary key)
SSN (social security number, composite index key1)
Employee_Name (full employee name, composite index key2)
composite index SSN + Employee_Name = unique

BlueClaw setup for Employee table
Employee_ID (autonumber, unique index)
SSN (social security number, (composite) primary key1)
Employee_Name (full employee name, (composite) primary key2)

In both approaches, Employee_ID would be used as a foreign key in other
tables to define the relationship with the Employee table.

Are there any negative aspects associated with the BlueClaw approach?

Pros of BlueClaw approach
*Display of table is meaningful, because it's sorted on primary index
*No cascaded update necessary of linked relationship fields in other tables,
because autonumber is only used for linking tables and therefore will never
change.
*Prevention of duplicates is improved, since data fields are used to check
for duplicates, instead of an (always unique) autonumber field > this can
also be achieved with the composite unique index as shown above in the Access
books example.

Cons of BlueClaw approach
*???

I would appreciate your comments / opinion on the BlueClaw approach, because
I currently have the feeling that I'm missing something that explains why so
many people use autonumber fields as primary (artificial) key. If the
BlueClaw approach is the best one, I'm considering to use it as a standard in
new database design questions.

Best regards,

M.
 
D

Dale Fye

Amen, brother!

--
Don''t forget to rate the post if it was helpful!

email address is invalid
Please reply to newsgroup only.
 
P

Pat Hartman

BTW, I meant con not pro.

Pat Hartman said:
I'm not sure why I'm jumping in here and I'm not sure if someone has
already said what I am going to say so I apologize in advance. I also
have over 30 years of database design experience and in my mainframe days,
all primary keys were "natural" and many were compound. It took me a few
years of using Access to become converted to the "autonumber as PK camp"
but I have. I find autonumbers extremely easy to work with. Access does
have an issue with autonumbers that are not defined as primary keys so I
would NEVER use a "natural" PK in a table that also had an autonumber.
the autonumber would ALWAYS be the PK and would be used for all
relationships. When I have business rules to satisfy that revolve around
uniqueness of "natural" keys, I use unique indexes.
The benefits of using an autonumber PK are:
1. there is never a need to cascade PK updates since the PK is an
autonumber and by definition cannot be changed.
2. Business rules regarding uniqueness can be implemented with unique
indexes.
3. In the cases where one of the "natural" keys may validly be blank, I
don't have an issue with a unique index whereas I would have an issue if
I defined a compound primary key since PKs may not contain null values.
SSN is a good example here (aside from the fact that there are laws
prohibiting its use as a PK) since not everyone will have one (h1b visa
holders for one) and there are only limited cases where a person actually
has to provide one to the requestor.
4. Combo/List boxes require code behind the scenes to work if the unique
identifier is compound.
5. As the hierarchy gets deeper, the compound key grows and joins become
more unwieldy. It is way too easy to omit one field out of 5 in a join
with disastrous results if the error isn't discovered quickly.

The only pro for using autonumber PKs is that a lot of your tables will
end up with lots of numeric columns that you would prefer to view as text
when you open the table. The solution is NOT table level lookups, it is
creating queries that join the tables so you can see the identifying
"natural" key data from the parent table. You only have to create one of
these views for each table so it's not like it is a great deal of work.
Then for yourself (because ONLY you should be viewing tables or queries
directly), just remember to open the query rather than the table when you
want to see the natural key values.

That's my opinion and I'm stickin' to it:)
 
D

David W. Fenton

What still interests me, however, is how pepole prevent entry of
duplicate records when they use a autonumber field as primary
index. This choice doesn't offer you any guarantee that the real
data in your record is accidentally entered a second time in a
later record. In my opinion you would still need a unique single
(e.g. ISBN number) or multifield index (e.g. book title + author
name) to prevent duplicate entry. In the end, it's then a matter
of taste whether you name the autonumber field index primary or
the composite field(s) index primary.

Well, if you're using a surrogate key, that doesn't mean you leave
out the unique indexes on fields and combinations of fields that
should also be unique. However, in many cases, composite keys can't
be unique because you have to allow storage of Nulls. In that case,
your app has to do duplicate checking of some sort. Given that with
things like names, you can't rely on a unique index even when Nulls
are disallowed (because Bob and Robert could still be the same
person), you often end up putting some form of duplicate checking
into your app even when you already have unique indexes on the
fields you're checking.

So, to me, having unique indexes is a question that is answered
field by field (or composite key by composite key), and is wholly
independent of which unique index you happen to use for your primary
key. The unique indexes are a function of the data, while the
primary key is a function of your database structure. One field or
collection of fields may serve both functions, but they don't have
to.
 
J

Jamie Collins

Then you're a better man that I, because I could never get them to work for
me in that regard. But that wasn't the first time I've said that, and it
CERTAINLY won't be the last - heh....

Try this:

CREATE TABLE Test1 (memo_col MEMO NOT NULL)
;
CREATE INDEX idx__test1__memo_col ON Test1 (memo_col)
;
CREATE TABLE Test2 (memo_col MEMO NOT NULL PRIMARY KEY)
;
CREATE TABLE Test3 (memo_col MEMO NOT NULL UNIQUE)
;

All the above work in Jet. The real issue is that Jet only tests the
first 255 characters of a MEMO column when testing a unique constraint
and in other contexts e.g. GROUP BY, UNION (but not UNION ALL), etc.
For example:

INSERT INTO Test2 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345x')
;
INSERT INTO Test2 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345z')
;

The above represent different values when considering all characters
but the engine considers them non-unique when checking the PK because
only the first 255 characters are used in the check. Off the top of my
head (so probably not the best), here's one way of achieving a unique
constraint with a MEMO column (ANSI-92 Query Mode syntax):

CREATE TABLE Test4 (
memo_col MEMO NOT NULL,
CONSTRAINT Test4__memo_col__unique
CHECK (1 = (
SELECT COUNT(*)
FROM Test4 AS T4
WHERE CSTR(Test4.memo_col) = CSTR(T4.memo_col))))
;
INSERT INTO Test4 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345x')
;
INSERT INTO Test4 (memo_col) VALUES
('123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a123456789a12345z')
;

Jamie.

--
 
J

Jamie Collins

I always love these discussions, because you have some purists, some
"religious fanatics", and some simpletons; call me a simpleton. The reason I
use "meaningless" primary keys is that it saves me work!!!

When I want to update a record, or select a record with an autonumber PK, I
type:

WHERE ID = 123

If I use a composite PK, I have to type:

WHERE [somefield] = somevalue
AND [someotherfield] = somevalue
AND [athirdfield] = somevalue
AND ....

This seems like a lot of extra work to me, so I avoid it where I can.

Adding an artificial key to each table introduces data elements not
present in the reality being modelled. This seems like a lot of extra
work to me, so I avoid it where I can. The obvious example is a single-
column lookup table.

I've just got back from a project where an entity's four-column
'natural' key was replaced with a single-column meaningless key. Note,
however, the existing model was flawed because the real world key
would require six columns (not that columns exist in the real world of
course <g>) and the required data is not currently being captured. So
I guess I'm not a purist because, although I am inclined towards a
natural key e.g. a single column lookup table) I can open my mind up
to an artificial key when appropriate.

If you imagine I spent my time on this project 'simplifying' SQL
queries by trimming down WHERE clauses and reducing the SELECT clause
by three columns then you'd be dead wrong. Those columns involved in
the 'natural' key are still required by the application in most
scenarios so a lot of the donkey work has been creating JOINs back to
the entity tables to find out the real key values, sometimes six
tables deep. I've yet to do the metrics but it seems to me that the
additional table joins have had a noticeable negative impact on
performance.

Another minor issue I have is that the tables are less readable. For
example, looking at the InvoicePayments table I now only see the
meaningless invoice_ID whereas I used to see customer_ number; to view
that I now have to create four JOINs, and typing those ON clauses
wastes any effort I may have saved in the now simplified WHERE clause.

Jamie.

--
 
J

Jamie Collins

The only [con] for using autonumber PKs is that a lot of your tables will end
up with lots of numeric columns that you would prefer to view as text when
you open the table.

I've suggested another in this thread: that you have no influence over
physical ordering on disk in Jet other than by using PRIMARY KEY and
there is virtually no case in favour of clustering on a incremental
Autonumber column, being the Autonumber algorithm of choice because it
generates low value positive integers that are easy to type -- see
Dale Fye's 2 cents in this thread. In other words, putting the PK on
an Autonumber at best wastes an opportunity for better optimization
and at worst causes poor performance.
Access does have an issue
with autonumbers that are not defined as primary keys

What is the issue you are alluding too? TIA.

Jamie.

--
 
J

Jamie Collins

the autonumber would
ALWAYS be the PK and would be used for all relationships. When I have
business rules to satisfy that revolve around uniqueness of "natural" keys,
I use unique indexes.

The main issue I have with the 'Autonumber PK' movement is that the
message about having table constraints on the candidate keys (and
other business rules) doesn't always come across. You've said the
right things (IMO) about so-called surrogates in this thread but I
would wager than most users in the 'Autonumber PK' camp put a PK on
the Autonumber and think, "Job done" as regards table constraints.

Another significant issue is when Autonumber values are exposed to
users. Most informed people agree this should be avoided but again the
message isn't always cleat and so it does happen. In an ideal world,
surrogates would be implemented under the hood and not even be exposed
to DBA, developers, etc. I encourage people to use the random flavour
of Autonumber to discourage this practice (and persistent offenders
should be forced to use GUIDs <g>).

Jamie.

--
 
J

Jamie Collins

Consider if you have a parent and subform based a composite PK?
That's a lot of extra work there figuring out what fields go where.

This implies you design your database schema (tables etc) for the
convenience of your front end (forms and subforms) and/or for your own
convenience?! My approach is to design the database schema entirely
independent of the front ends (which is done by another person
anyhow).

Jamie.

--
 
P

Pat Hartman

I agree with you that way too many people do not understand the need for
unique indexes to support business rules when using autonumbers as primary
keys. I don't believe I said that autonumbers should be exposed.
Presumably, if there are candidate keys available, they are what the user
would see and use.

Clustered indexes are a double-edged sword and we won't discuss how they
work in "real" databases. In a Jet database, each table is reordered in
sequence by its primary key whenever the database is compacted. That is
what fools people into thinking queries without sort orders will always
return rows in a predictable order. The reordering acts sort of like a
clustered index. However, unlike "real" databases, all new records and some
changed records (if the record size increases) are stored outside of the
clustered sequence and must be reordered when the database is compacted.
Clustered indexes only provide benefit when large numbers of rows are
regularly retrieved in key sequence. When records are retrieved randomly
(one at a time) or based on other attributes, the clustering has no impact
whatsoever and alternate indexes are much more important for optimizing
query performance.

The most useful clustered index that I have ever used segregated a 9 million
row transaction table into months (days actually). Almost all reporting
access was to the most recent three months so the index was extremely
useful. And since the transaction date didn't change, there wasn't an issue
with outliers.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top