Data Primary key vs. Artificial (Autonumber) primary key

J

Jeff Boyce

Oh, you are in so much trouble now!

Just because Douglas and I (Jeff) have traditionally male surnames doesn't
mean we're males... (oh wait, we are, never mind<g>)

Regards

Jeff
 
J

Jamie Collins

Just because Douglas and I (Jeff) have traditionally male surnames doesn't
mean we're males... (oh wait, we are, never mind<g>)

I've also seen the 'traditionally male' beards in the MVP mug
shots ;-)

Jamie.

--
 
J

Jeff Boyce

Facial hair is not restricted to males...

(hence the facination of the "Bearded Lady" in carnivals!)

You are STILL in trouble!

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
D

Dennis

I was speaking from a pure theory perspective, not from an Access-specific one.

OLE objects aren't "data" per se; they're, well, OBJECTS. And memo fields
cannot be used as indexes or searched on, so as a key they'd be impossible to
use. In the "old-timey world" one needed to keep one's keys as short as
possible while being meaningful. A key-length over about 15 characters almost
guaranteed terrible application performance. But that was then....
 
J

John W. Vinson

"If you're going to create relationships, they will
always be based on the primary key. You cannot change that."

Michael Gramelspacher said (paraphrasing): "I can create a FOREIGN KEY
REFERENCE (a.k.a. Access Relationship with referential integrity
enforced) based on a UNIQUE constraint, which goes against what Jeff
Boyce said about relationships always being based on the primary key."

You're right (about the meaning if not the attribution), Jamie - I
misinterpreted Michael's point.

John W. Vinson [MVP]
 
J

John W. Vinson

And that means at least occasionally correcting or
changing it.

well... there *is* Cascade Updates to cover that possibility.

But I agree, it's ideal if the primary key is rock solid stable. If it's only
San Andreas faultline rock solid stable, you need to depend on cascade
updates, which do work but have their own problems!

John W. Vinson [MVP]
 
M

M.

Thanks for this helpful answer.

Is the following summary correct:

The (unique) index that would be used most frequently for sorting or
filtering the table records, should be defined as primary index from a
performance point of view, because the primary index determines the
sequential physical order on harddisk ?

M.
 
F

Fred

I'm from Chicago, we don't know what fault lines are. Could you use a
tornado analogy so that I can understand?

:)


Fred
 
J

Jeff Boyce

(psst, Fred, can you say New Madrid?)

Jeff
Fred said:
I'm from Chicago, we don't know what fault lines are. Could you use a
tornado analogy so that I can understand?

:)


Fred
 
D

David W. Fenton

Why? By definition, an Autonumber is supposed to already be
unique, so you wouldn't gain anything by indexing it.

Eh? The unique index has an effect on the type of relationship. If
both sides of the join lack a unique index, it's an undefined
relationship type. If one side does, it's a many-to-one. If both
sides do, it's a 1:1.
 
D

David W. Fenton

My main question is: are there negative aspects associated with
using a primary key based on data fields versus using a primary
key based on an artificial primary key as generated with an
autonumber field? In both cases the autonumber field would be used
for defining relations between tables.

Until now my answer would be: there are no negative aspects
associated with the data fields approach.

Have you tried it? For instance, try creating a join between two
tables where the key they are joined on has 5 fields in it.

My philosophy:

Single-field natural PKs are OK, though they should be for data that
is never updated once entered.

Multi-column keys are a horrid amount of work, and duplicate a
helluva lot of data.

It also makes your joins less efficient.

Last of all, very few of the tables in the apps I'm asked by clients
to build can have no Nulls, so natural keys are only usable if you
assign a default value. If you use a zero-length string, then that
means you can't query on Null, and it means that you can't propagate
Nulls in concatenation operations without using an intermediary
function to convert ZLS's to Null.
 
D

David W. Fenton

The (unique) index that would be used most frequently for sorting
or filtering the table records, should be defined as primary index
from a performance point of view, because the primary index
determines the sequential physical order on harddisk ?

No, unless you're limiting the discussion to Jet.

In SQL Server, for instance, you can cluster on any unique index
(i.e., write the table in that index's order). It doesn't have to be
the PK.
 
J

Jamie Collins

I'm from Chicago, we don't know what fault lines are. Could you use a
tornado analogy so that I can understand?

I'm from the UK, where both weather and geology is relatively benign.
So that we all may understand the point, can you please use an example
(in place of an analogy) of an industry standard key with a trusted
source which manages changes. As a suggestion, consider ISBN which
recently changed from 10 to 13 characters (which Amazon and many
public libraries seem to have survived <g>) and for which duplication
occasionally occurs.

Jamie.

--
 
J

Jamie Collins

Thanks for this helpful answer.

Is the following summary correct:

The (unique) index that would be used most frequently for sorting or
filtering the table records, should be defined as primary index from a
performance point of view, because the primary index determines the
sequential physical order on harddisk ?

Not exactly. To repeat: "This allows Microsoft Jet to take full
advantage of its read-ahead cache and also reduces disk I/O when doing
sequential scans of a table." The best example I can think of for
sequential scans of a table is using BETWEEN in SQL because the data
would be on contiguous pages. Tthink of a paper copy telephone
directory ordered by last name then first names and I asked you to get
me all the telephone numbers of people whose last names begin with the
letter 'C' and how easy it would be to rip out the appropriate pages,
then think about how difficult it would be to do the same if the task
was to get all the telephone numbers which began with the numeral '5'.
The next best example would be GROUP BY.

I suggest you do some research on clustered indexes *generally*; you
won't find much in the Jet world because most users disregard or are
unaware of (or even are in denial of <g>) the effects of clustering in
Jet. David W. Fenton's suggestion of looking at SQL Server (where you
have an explicit choice of clustered index) seems a good one. Which
reminds me, David came up with the brilliant suggestion of making a
random Autonumber column the sole PK to favour concurrency (because
values created close together in transaction time have more chance of
appearing on *different* pages).

Jamie.

--
 
J

Jamie Collins

=?Utf-8?B?TS4=?= <[email protected]> wrote innews:[email protected]:
Have you tried it? For instance, try creating a join between two
tables where the key they are joined on has 5 fields in it.

Multi-column keys are a horrid amount of work, and duplicate a
helluva lot of data.

Isn't this the problem Access Relationships were invented to solve?
i.e. defining the Relationship on those five columns is a one-time
task, after which you simply add the tables to the query builder tool
thing and it types out the JOIN syntax for you. Me, I can type fast
(and the parser does spell chekcing for me <g>) so I don't really mind
the 'hard' work. And if you are concerned about disk space, port to a
back end without a 2GB (or whatever) limit and buy a bigger drive
(relatively inexpensive these days).

Using meaningless keys in your tables results in many table joins to
get basic key data so you have to balance these things out. I'm wary
of people who use the same hammer to bash every kind of fixing said:
Last of all, very few of the tables in the apps I'm asked by clients
to build can have no Nulls, so natural keys are only usable if you
assign a default value.

I also have only a few nullable columns (subatomic elements, usually
of type DATETIME) and absolutely zero columns that allow a zero length
string but still your point alludes me. I suspect the answer is that
you are have to denormalized tables but could you please clarify with
an example? TIA.

Jamie.

--
 
J

Jamie Collins

I was speaking from a pure theory perspective, not from an Access-specific one.

I get into trouble myself for doing that said:
OLE objects aren't "data" per se; they're, well, OBJECTS.

LOL! Did I interpret the word 'data' too literally <g>? Is OLEOBJECT a
*data* type? Can columns of type OLEOBJECT be found in a Jet
'database'?
And memo fields
cannot be used as indexes or searched on

I don't know what you mean but memo fields can be indexed and can be
used in WHERE and HAVING clauses; you can cast them using CSTR() in
SQL code to prevent them from being truncated in certain
circumstances.

Jamie.

--
 
R

Rod Plastow

Jamie,

You forgot to mention the penultimate phone number change that meant phone
numbers would never have to change again - that is until the next phone
number change about four years later. :)

Let me nail my colours to the mast; I belong to the religious sect of a
'meaningless' pk. I believe the job of a pk is to identify the row and
nothing more and I've the dog collar to demonstrate my faith - well that is
until I come across a situation where to include a meaningless key simply
doubles the effort such as an accounting period table that I key with yyyymm
- surely there will never be another Pope Gregory!

As for 'Rock Solid:' I'm in the Philippines and we experienced an earthquake
last week so geological analogies do not work here.

Regards,

Rod
 
J

Jamie Collins

Let me nail my colours to the mast; I belong to the religious sect of a
'meaningless' pk. I believe the job of a pk is to identify the row and
nothing more and I've the dog collar to demonstrate my faith

I'm borderline 'atheist' because I don't use the PRIMARY KEY on every
table, and I have a relationally sound basis for doing so because a
key is a key is a key; calling one 'primary' is arbitrary.

But really it has nothing to do with 'religion': PRIMARY KEY behaves
the same in Jet regardless of your beliefs. So let's stick to the
facts.

What does PRIMARY KEY do for you that other keys ('non-primary keys'?)
do not? Doesn't allow the NULL value but I've got NOT NULL at the
column level for that. Let's you omit the PK columns in the RFERENCES
clause in SQL DDL but I prefer to write things out longhand. Puts the
columns in bold text in the Access UI but I don't look at the
Relationships picture (I find SQL code easier to interpret). So
clustering is the only aspect of PRIMARY KEY which I find appealing
for Jet. But try telling someone to use NOT NULL UNIQUE for their
existing PKs and leveraging the clustering nature of PRIMARY KEY as a
'physical' index and in most cases it's too much of a paradigm shift
for them to comprehend.

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top