Data Primary key vs. Artificial (Autonumber) primary key

M

M.

Dear all,

Although many Microsoft Access books advise to set an autonumber field as
primary index (a so called pseudo primary key),
http://www.blueclaw-db.com/database_link_tables.htm advises to use real data
to define a (composite) primary key. In summary, this results in the
following two designs:
Microsoft Acces books setup for Employee table
Employee_ID (autonumber, primary key)
SSN (social security number, composite index key1)
Employee_Name (full employee name, composite index key2)
composite index SSN + Employee_Name = unique
BlueClaw setup for Employee table
Employee_ID (autonumber, unique index)
SSN (social security number, (composite) primary key1)
Employee_Name (full employee name, (composite) primary key2)

In both approaches, Employee_ID would be used as a foreign key in other
tables to define the relationship with the Employee table.

Are there any negative aspects associated with the BlueClaw approach?

Pros of BlueClaw approach
*Display of table is meaningful, because it's sorted on primary index
*No cascaded update necessary of linked relationship fields in other tables,
because autonumber is only used for linking tables and therefore will never
change.
*Prevention of duplicates is improved, since data fields are used to check
for duplicates, instead of an (always unique) autonumber field > this can
also be achieved with the composite unique index as shown above in the Access
books example.

Cons of BlueClaw approach
*???

I would appreciate your comments / opinion on the BlueClaw approach, because
I currently have the feeling that I'm missing something that explains why so
many people use autonumber fields as primary (artificial) key. If the
BlueClaw approach is the best one, I'm considering to use it as a standard in
new database design questions.

Best regards,

M.
 
J

Jeff Boyce

You may have just inflamed a long-running religious war about proper primary
keys.<g>

See comments in-line below...

M. said:
Dear all,

Although many Microsoft Access books advise to set an autonumber field as
primary index (a so called pseudo primary key),

"psuedo" implies "not real" -- a primary key is a unique identifier, no
matter where it comes from. It isn't a question of "real".
http://www.blueclaw-db.com/database_link_tables.htm advises to use real
data
to define a (composite) primary key. In summary, this results in the
following two designs:

Employee_ID (autonumber, primary key)
SSN (social security number, composite index key1)

you will want to be very careful about capturing/displaying SSNs. Moreover,
not every "person" has (or cares to share) one. How will you handle a Null
SSN?
Employee_Name (full employee name, composite index key2)

No, no no! If you combine more than one fact in a single field, you have to
work extra hard to do simple things, like, say, sort by LAST NAME! Use
FirstName and LastName fields, then use a query to concatenate them when
needed.
composite index SSN + Employee_Name = unique

can you say "identity theft"? what makes you think that SSN + Employee_Name
will be unique?
Employee_ID (autonumber, unique index)

Why? By definition, an Autonumber is supposed to already be unique, so you
wouldn't gain anything by indexing it.
SSN (social security number, (composite) primary key1)

(see above)
Employee_Name (full employee name, (composite) primary key2)

(see above)
In both approaches, Employee_ID would be used as a foreign key in other
tables to define the relationship with the Employee table.

If you go to the effort of creating a composite primary key, then why would
you not also go to the effort to "migrate" that key (i.e., all fields) to
the "child" tables?
Are there any negative aspects associated with the BlueClaw approach?

Pros of BlueClaw approach
*Display of table is meaningful, because it's sorted on primary index

No, NO, NO!! Access tables store data, Access forms (and reports) display
it. Using Access tables to display data is asking for trouble! (can you
tell I have some strong feelings on this topic?<g> - check this newsgroup
for others' ideas about using tables to display data. From experience, I
don't want inexperienced users mucking about directly in my tables.
Instead, I'll guide their use of the data via forms. This is a major
difference between, say, Word {everyone knows how to move words around} and
Access, a relational database {how many normal people understand relational
database design?})

And "meaningful"?! To whom? Just because a set of data is sorted in one
order doesn't mean that EVERYONE wants to see it in that order. I, for one,
prefer to see a list of employees sorted by last name when I'm considering
Human Resources activities, but by firstname when I'm looking for their
phone numbers.
*No cascaded update necessary of linked relationship fields in other
tables,
because autonumber is only used for linking tables and therefore will
never
change.

The implication is that the SSN and Employee_Name MAY change. So what?
There's next to zero effort required to set Cascading Updates when you set
the relationships among tables. And while an Autonumber may not change, you
can re-record a row of data and get a NEW autonumber, then delete the old
record. Where's your foreign key now?!
*Prevention of duplicates is improved, since data fields are used to check
for duplicates, instead of an (always unique) autonumber field > this can
also be achieved with the composite unique index as shown above in the
Access
books example.

Ahem! ?"Duplicates"? Are the following employees the same person:
John Doe
J. J. Doe
And what about John and his son John, who both work for your company, both
live at the same address, and both have the same last name. ?Duplicates?!
Cons of BlueClaw approach
*???

I would appreciate your comments / opinion on the BlueClaw approach,
because
I currently have the feeling that I'm missing something that explains why
so
many people use autonumber fields as primary (artificial) key. If the
BlueClaw approach is the best one, I'm considering to use it as a standard
in
new database design questions.

JOPO (just one person's opinion) Any approach to this that claims to be the
one and only appropriate way to do this is probably wrong! Use what works
for you.

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
M

M.

Thanks for your reply, but unfortunately this doesn't answer my question. I'm
sorry to distract you with minor issues, like the SSN and name fields. These
were copied from the Blue Claw example on their website. Of course I'm aware
of full name issues, but that's not my question.

My main question is: are there negative aspects associated with using a
primary key based on data fields versus using a primary key based on an
artificial primary key as generated with an autonumber field? In both cases
the autonumber field would be used for defining relations between tables.

Until now my answer would be: there are no negative aspects associated with
the data fields approach.

Best regards,

M.
 
D

Dennis

ANY data can be used as a Primary Key AS LONG AS you are 100% sure of no
duplication and/or have code in place to prevent such an occurrance. I've
been a database designer for over 30 years, and until I started working with
MS Access, there was NO SUCH THING as an "autonumber key." EVERY key/primary
key was based on data. And several of my projects written in Access use
data-based keys/primaries.
 
D

Douglas J. Steele

M. said:
My main question is: are there negative aspects associated with using a
primary key based on data fields versus using a primary key based on an
artificial primary key as generated with an autonumber field? In both
cases
the autonumber field would be used for defining relations between tables.

There's no reason to have an Autonumber field AND a "natural" primary key.

If you're going to create relationships, they will always be based on the
primary key. You cannot change that.

As Jeff said, this really is a religious war, so I won't say any more. <g>
 
B

Boyd Trimmell aka HiTechCoach via AccessMonster.co

A primary key is a SINGLE field used as the record identifier. It really
should have no meaning to the user and in most cases never seam by users.

In all the databases that have worked with the have multiple fields for a
primary key have made it a lot harder to do anything, even things that should
be simple. I would never create one that way. I probably am glad the there
are people that recommend it (like Blue Claw) or actually do it so that I
will get lots of work fixing the issues it causes.

--
Boyd Trimmell
aka HiTechCoach
http://www.hitechcoach.com
http://www.officeprogramming.com

Message posted via AccessMonster.com
 
M

Michael Gramelspacher

There's no reason to have an Autonumber field AND a "natural" primary key.

If you're going to create relationships, they will always be based on the
primary key. You cannot change that.

As Jeff said, this really is a religious war, so I won't say any more. <g>

This works for me. It seems to go against what you are saying.

Sub CreateTest()
With CurrentProject.Connection

..Execute _
"CREATE TABLE Cities" & _
" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (city_name));"

..Execute _
"CREATE TABLE States" & _
" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
",state_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (state_name));"

..Execute _
" CREATE TABLE Locations" & _
" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_id INTEGER NOT NULL" & _
" REFERENCES Cities (city_id)" & _
",state_id INTEGER NOT NULL" & _
" REFERENCES States (state_id)" & _
",PRIMARY KEY (city_id, state_id));"

End With
End Sub
 
T

Tony Toews [MVP]

BlueClaw are full of cr*p. The middle sentence in the following is
exceedingly arrogant.

"You may look at this design and say you have always seen the
Employee_ID set as the table primary key. No matter what your teacher
or books say - this would be incorrect. See why you almost never use
an artificially generated numeric ID as a primary key."

As far as thier "Table Design Example - Detail Table" goes it's based
on a very faulty assumption. That the employee only does one task
throughout the day. And that's seldom the case.

And they don't explain themselves very well either.

As Jeff states this can become a religious war.

In my opinion use autonumber primary keys on every table and unique
and duplicate indexes as appropriate.

Tony
--
Tony Toews, Microsoft Access MVP
Please respond only in the newsgroups so that others can
read the entire thread of messages.
Microsoft Access Links, Hints, Tips & Accounting Systems at
http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
 
D

Douglas J. Steele

A primary key does NOT have to be a single field.

A primary key is simply a special unique index, and therefore can contain up
to ten separate fields.

Perhaps you're thinking of an Autonumber field when you say "It really
should have no meaning to the user and in most cases never seam by users."
 
J

John W. Vinson

This works for me. It seems to go against what you are saying.

Sub CreateTest()
With CurrentProject.Connection

.Execute _
"CREATE TABLE Cities" & _
" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (city_name));"

.Execute _
"CREATE TABLE States" & _
" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
",state_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (state_name));"

.Execute _
" CREATE TABLE Locations" & _
" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_id INTEGER NOT NULL" & _
" REFERENCES Cities (city_id)" & _
",state_id INTEGER NOT NULL" & _
" REFERENCES States (state_id)" & _
",PRIMARY KEY (city_id, state_id));"

End With
End Sub

Well, sure, it works. You're creating autonumber primary keys. It *works*,
everyone agrees with that; the question would be is it the *only* way that
works, or even the *best* way? Many would disagree.

In one particular case I'd really have to disagree with you. The US and
Canadian postal services have defined unique, stable (well *almost* stable),
short two-letter state codes. My States table would be more like

..Execute _
"CREATE TABLE States" & _
" (state_code CHAR(2) NOT NULL UNIQUE" & _
",state_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (state_code));"


As for making the city_name or state_name the primary key of its table...
state_name you can get away with but certainly not city_name; there are many
cities named Springfield, and New Mexico even has two towns named Los Alamos
(the famous one with the rad lab, and a tiny village north of Las Vegas - no,
not the one in Nevada).

John W. Vinson [MVP]
 
J

Jamie Collins

Well, sure, it works. You're creating autonumber primary keys.

No, he isn't.
It *works*,
everyone agrees with that

What exactly is everyone agreeing with, in your opinion?

Jeff Boyce said: "If you're going to create relationships, they will
always be based on the primary key. You cannot change that."

Michael Gramelspacher said (paraphrasing): "I can create a FOREIGN KEY
REFERENCE (a.k.a. Access Relationship with referential integrity
enforced) based on a UNIQUE constraint, which goes against what Jeff
Boyce said about relationships always being based on the primary key."

Jamie says: If by 'relationships' Jeff Boyce meant 'Access
Relationships' (i.e. entities created in the Relationships window in
the Access user interface) then there is no requirement for them to be
based on either PKs or even UNIQUEs; heck, the columns in the
respective tables don't even need to be the same data type!

Jamie.

--
 
J

Jeff Boyce

Jamie

I'm old and forgetful, so help me remember...

I don't recall making the statement you quote me as making (or maybe you
know another Jeff Boyce):


Jamie Collins said:
Jeff Boyce said: "If you're going to create relationships, they will
always be based on the primary key. You cannot change that."

I'm pretty sure I did not say that in the current thread. Is there some
thread out of my past in which I said this? Could you provide the context
in which I said this? (Again, I don't recall ...)

Regards

Jeff Boyce
 
J

Jamie Collins

I don't recall making the statement you quote me as making (or maybe you
know another Jeff Boyce):

Check for the quote upthread. You were calling yourself 'Douglas J.
Steele' at the time but you didn't fool me.

Seriously, apologies for attributing the quote to the wrong Access
MVP. You guys all look the same to me <g>.

Jamie.

--
 
F

Fred

I've been both creating and using databases heavily for 17 years in business,
personal and organizational environments. And I've seen both ways done.
I'd like to weigh in on the side of primary keys not containing real world
data. If it's real world data, there will inevitably be an expectation that
it be current / correct. And that means at least occasionally correcting or
changing it. Even if you have the extremely rare case where the real world
data key is so stable it wil never never never change, (not even
state/province abbreviations are that clean/stable e.g. Quebec due to their
political problems) you still have the problem of making corrections to that
data for wrong entries.

I've also seen plans that try to blend the two. That generate a primary
key from actual data (e.g. a company name) but then never ever change it.
This presents a dilemma. You can't have you cake and eat it too: first,
does it show real world data or not? If the answer is yes, then the data is
wrong as soon as there is a change or a correction. If the answer is no,
then what are you putting real world info into the key.

If the key is abstract, then it is controlled by the database....there are
no hooks that could allow outsides forces to mess with it.

Fred
 
J

Jamie Collins

Calling all Jeff Boyces...
Why? By definition, an Autonumber is supposed to already be unique, so

An Autonumber is supposed to auto-generate a value according to an
algorithm; the choices are increment, random or GUID. Show me the
section of the Jet specification which says an Autonumber is supposed
to be unique.

If you like I can post some code to demonstrate the fact that Jet can
auto-generate duplicate Autonumber values (hint: you change increment
value from the default value of one to a value very close to the
maximum for Long Integer).
you
wouldn't gain anything by indexing it.

Can you say "data integrity"? what makes you think that values in an
Autonumber column will be unique unless you put a unique index (or
constraint) on it? Again, I can post code to explicitly insert
duplicate values into an Autonumber column if you like.

Jamie.

--
 
J

Jason Lepack

This works for me. It seems to go against what you are saying.

Sub CreateTest()
With CurrentProject.Connection

.Execute _
"CREATE TABLE Cities" & _
" (city_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (city_name));"

.Execute _
"CREATE TABLE States" & _
" (state_id IDENTITY(1,1) NOT NULL UNIQUE" & _
",state_name VARCHAR (30) NOT NULL" & _
",PRIMARY KEY (state_name));"

.Execute _
" CREATE TABLE Locations" & _
" (location_id IDENTITY (1,1) NOT NULL UNIQUE" & _
",city_id INTEGER NOT NULL" & _
" REFERENCES Cities (city_id)" & _
",state_id INTEGER NOT NULL" & _
" REFERENCES States (state_id)" & _
",PRIMARY KEY (city_id, state_id));"

End With
End Sub

Man, the weather's nice in Miami, North Dakota
 
J

Jamie Collins

My main question is: are there negative aspects associated with using a
primary key based on data fields versus using a primary key based on an
artificial primary key as generated with an autonumber field? In both cases
the autonumber field would be used for defining relations between tables.

As a guess: for reasons of optimization because PRIMARY KEY determines
physical ordering on disk (clustering). See:

Microsoft Jet 3.5 Performance Overview and Optimization Techniques
http://msdn.microsoft.com/archive/d...MicrosoftAccessMicrosoftJetDatabaseEngine.asp

"From a performance perspective, there are many reasons to frequently
compact a database. One reason is that compacting will create a new
database that stores all table rows in a contiguous order. If a
primary key or unique index is defined, the rows will be sorted in
order of the primary key or unique index. This allows Microsoft Jet to
take full advantage of its read-ahead cache and also reduces disk I/O
when doing sequential scans of a table."

Put the other way around, assigning the table's PRIMARY KEY
designation to the Autonumber column could have a negative impact on
the performance of queries which use a sequential scan on a table e.g.
using the BETWEEN keyword in SQL code.

Jamie.

--
 
J

Jamie Collins

Calling all Jeff Boyces...
Why? By definition, an Autonumber is supposed to already be unique, so

An Autonumber is supposed to auto-generate a value according to an
algorithm; the choices are increment, random or GUID. Show me the
section of the Jet specification which says Autonumber is supposed to
be unique. If you like I can post some code to demonstrate the fact
that Jet can auto-generate duplicate Autonumber values (hint: you
change increment value from the default value of one to a value very
close to the maximum for Long Integer).
you
wouldn't gain anything by indexing it.

Can you say "data integrity"? what makes you think that values in an
Autonumber column will be unique unless you put a unique index (or
constraint) on it? Again, I can post code to explicitly insert
duplicate values into an Autonumber column if you like.

Jamie.

--
 
J

Jamie Collins

ANY data can be used as a Primary Key AS LONG AS you are 100% sure of no
duplication and/or have code in place to prevent such an occurrance.

IIRC a column of type OLEOBJECT cannot be PRIMARY KEY.

Perhaps you meant to say "as long as the engine can be 100% sure of no
duplication"? While I can be sure that I am putting unique values into
a MEMO column, the engine only considers the first 255 characters when
checking a PRIMARY KEY constraint. Also I'd be vary wary of
approximate numeric types (e.g. REAL and FLOAT).

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top