Multi-Field Primary Key

G

Guest

I have 2 fields: [PHONE] and [AUTONUMBER]

I would like to define the Primary Key as a multi-field primary key as the
concatenation of [PHONE]+[AUTONUMBER]

Now, [PHONE] may be null, but [AUTONUMBER] will always contain a unique
autonumber.

The problem is that Access won't allow null in [PHONE] even though the
multi-field concatenation of [PHONE]+[AUTONUMBER] would be unique. Why not?
Is this a bug? I thought that is the whole purpose of doing multi-fields is
to achieve guaranteed uniqueness?
 
A

Allen Browne

Not sure I follow the logic here. AutoNumbers are normally unique on their
own? So why would you want to use an autonumber plus another field?

If you really want to do it anyway, presumably Phone is Text field (not a
number field), so you can set these properties:
Allow Zero Length Yes
Required Yes
Default Value ""

The field will now contain a zero-length string (ZLS) instead of a Null
where this is no phone number, and so you can use it as part of the primary
key. You will also need to code so that the field is set to a ZLS when a
phone number is deleted.
 
G

Guest

I use the [PHONE] portion only to link in one-to-many to another table that
contains records of activity on that [PHONE]. I am assuming that the primary
key designation causes a faster lookup during an innerjoin than just an
indexed field? (Is that assumption correct?) I.E., the [PHONE]+[AUTONUMBER]
would be in the same position in the primary key index as [PHONE] alone, but
the [PHONE]+[AUTONUMBER] achieves uniqueness whereas [PHONE] alone isn't.
Actually the problem is that a lot of records have a null [PHONE] (because we
don't have a phone], so then we need the [AUTONUMBER] portion to achieve
uniqueness. So with your ZLS the index would look like
""1
""154
""894
""954
et cetera

whereas the non-null records would have 10-digit [PHONE] + [AUTONUMBER]:
2125551212955
2135551212956
2145551212957
et cetera

Maybe my logic is just totally flawed here. I always learned in db class
that the primary key is faster than just indexing alone, so you always want
to try to create a primary key to maximize the speed of lookups and joins.



Allen Browne said:
Not sure I follow the logic here. AutoNumbers are normally unique on their
own? So why would you want to use an autonumber plus another field?

If you really want to do it anyway, presumably Phone is Text field (not a
number field), so you can set these properties:
Allow Zero Length Yes
Required Yes
Default Value ""

The field will now contain a zero-length string (ZLS) instead of a Null
where this is no phone number, and so you can use it as part of the primary
key. You will also need to code so that the field is set to a ZLS when a
phone number is deleted.

--
Allen Browne - Microsoft MVP. Perth, Western Australia

Reply to group, rather than allenbrowne at mvps dot org.

nickh said:
I have 2 fields: [PHONE] and [AUTONUMBER]

I would like to define the Primary Key as a multi-field primary key as the
concatenation of [PHONE]+[AUTONUMBER]

Now, [PHONE] may be null, but [AUTONUMBER] will always contain a unique
autonumber.

The problem is that Access won't allow null in [PHONE] even though the
multi-field concatenation of [PHONE]+[AUTONUMBER] would be unique. Why
not?
Is this a bug? I thought that is the whole purpose of doing multi-fields
is
to achieve guaranteed uniqueness?
 
J

John Vinson

I use the [PHONE] portion only to link in one-to-many to another table that
contains records of activity on that [PHONE]. I am assuming that the primary
key designation causes a faster lookup during an innerjoin than just an
indexed field? (Is that assumption correct?)

No, it is not.

Put an Index (unique or not as appropriate) on the Phone field,
distinct from the Primary Key. The Autonumber alone should be the PK;
the index on Phone will make your joins work just fine (better than a
composite index would do in fact).

John W. Vinson[MVP]
 
J

Jamie Collins

I have 2 fields: [PHONE] and [AUTONUMBER]
I would like to define the Primary Key as a multi-field primary key as the
concatenation of [PHONE]+[AUTONUMBER]
Now, [PHONE] may be null, but [AUTONUMBER] will always contain a unique
autonumber.
The problem is that Access won't allow null in [PHONE] even though the
multi-field concatenation of [PHONE]+[AUTONUMBER] would be unique. Why
not?
Is this a bug?

Such a key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key and
'primary' has no special meaning. The UNIQUE constraint can still be
used in a FOREIGN KEY constraint (see OT note below).
Perhaps the better approach is to ask: what does PRIMARY KEY give me
that UNIQUE does not, aside from requiring that all values be not null?
(FWIW columns in the PRIMARY KEY designation can be nullable, it's just
you can't actually put the null value in them)...

The key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key.
The UNIQUE constraint can still be used in a FOREIGN KEY.
AutoNumbers are normally unique on their
own? So why would you want to use an autonumber plus another field?

A valid answer to your question is clustering or physical ordering on
disk.

The better 'phone book' example would be the desire to cluster on the
subscribers' names. The phone number (using a natural key) or
autonumber would be required to break duplicates because subscribers'
names may not be unique.

Care is required in declaring the columns in the correct order i.e.
PRIMARY KEY (name, phone_number) i.e. the columns should be in
left-to-right order of significance to the clustering.

Clustering on name would favour BETWEEN (e.g. grab all names beginning
with 'B') and GROUP BY on the name because the rows would be physically
contiguous (and on the same page) to begin with and would be faster to
fetch. Think how 'useful' a paper phone book physically order on
telephone number would be to the average user.

Clustering on phone number or *random* autonumber on the other hand
would favour concurrency but a sequential autonumber PRIMARY KEY may be
the worst choice in any case.

[OT ]

FWIW nulls in foreign key constraints is one of the few areas where the
Jet implementation is IMO better than SQL Server's (and, Allen, I know
you are a fan of the 'CASCADE to null' feature in Jet 4.0 <g>).

SQL Server will disregard all values in the referencing columns if one
of then contains a null and also will not cascade them e.g. from a null
value to a not null value. When you see how Jet handles it, it makes
the SQL Server approach seem very odd e.g.

CREATE TABLE Table1 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER, UNIQUE (key_col1, key_col2)
)
;
CREATE TABLE Table2 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER,
FOREIGN KEY (key_col1, key_col2)
REFERENCES Table1 (key_col1, key_col2)
ON DELETE CASCADE
ON UPDATE CASCADE
)
;

INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (3, NULL)
;

That last insert would fail in Jet but in SQL Server it would succeed
on the logic, I guess, that Table1.col2 = Table2.col2 returns unknown
therefore should not fail; Jet's apparent logic 'NULL must equal NULL'
is harder to defend 'relationally'. However, what seems strange IMO is
the SQL Server CASCADE behaviour:

UPDATE Table1
SET key_col2 = 1
WHERE key_col1 = 1
;

The above change does not cascade to the referencing table (Table2) in
SQL Server. While the inability to compare null values between tables
may be sound 'relationally' it would seem to be unintuitive and not the
desired effect. Similarly, when deleting:

DELETE
FROM Table1
WHERE key_col1 = 2

the row is removed from the referencing table in Jet but not in SQL
Server.

I think the Jet behaviour has more utility. Such 'practical' decisions
are hardly unprecedented, even my the famously strict ANSI SQL
standards committees. For example, if one null value cannot be compared
as being equal to another null value then why do they GROUP together?
e.g.

DELETE FROM Table1
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
SELECT key_col2, COUNT(*)
FROM Table1
GROUP BY key_col2
;

The answer is, at least in part, because most people would expect it to
work this way and such an approach has the greater practical utility.

Jamie.

--
 
V

Van T. Dinh

It is probably more important for the Inner Join to put an Index on the
ForeignKey [frg_PHONE] in the "Many" Table in this case.

Something troubles me: Do you have Activities that don't involve [PHONE]?
 
G

Guest

So, Jamie, if I am understanding your points, you are saying that there
actually is a benefit to my idea in terms of the physical fragmentation on
the disk. I.E. the worst arrangement is to have the autonumber alone be the
primary key because it is completely random and irrelevant and there will be
the most disk head movement as I try to fetch phone#'s from the table
(assuming I am fetching them in sequential phone# order). My idea to have
PHONE+AUTONUMBER would be good because at least the phone #'s will be
sequentially next to each other on the physical disk. This database is quite
large already at 1.2 million records, and growing by 220,000 records per
month.

I really have 2 different types of records mixed together in 1 table, which
may be the root problem here (that I should redesign the schema into 2
tables!) I.E., I am storing records with phones, and also records without
phones (by the way these are sales leads purchased from InfoUSA; some have
phones which we telemarket to, and others are name/address only no phone
which we mail to). We are mixing the 2 records together in the same table
since they are both "leads". Those records with phones the obvious lookup
key is the phone (appended with an autonumber to break ties because we do
receive dupe phones occassionally in the daily feed). Those records that are
name/address only no phone, then I would rely on the autonumber to be the key
because the phone would be null or zerolengthstring

Jamie Collins said:
I have 2 fields: [PHONE] and [AUTONUMBER]
I would like to define the Primary Key as a multi-field primary key as the
concatenation of [PHONE]+[AUTONUMBER]
Now, [PHONE] may be null, but [AUTONUMBER] will always contain a unique
autonumber.
The problem is that Access won't allow null in [PHONE] even though the
multi-field concatenation of [PHONE]+[AUTONUMBER] would be unique. Why
not?
Is this a bug?

Such a key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key and
'primary' has no special meaning. The UNIQUE constraint can still be
used in a FOREIGN KEY constraint (see OT note below).
Perhaps the better approach is to ask: what does PRIMARY KEY give me
that UNIQUE does not, aside from requiring that all values be not null?
(FWIW columns in the PRIMARY KEY designation can be nullable, it's just
you can't actually put the null value in them)...

The key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key.
The UNIQUE constraint can still be used in a FOREIGN KEY.
AutoNumbers are normally unique on their
own? So why would you want to use an autonumber plus another field?

A valid answer to your question is clustering or physical ordering on
disk.

The better 'phone book' example would be the desire to cluster on the
subscribers' names. The phone number (using a natural key) or
autonumber would be required to break duplicates because subscribers'
names may not be unique.

Care is required in declaring the columns in the correct order i.e.
PRIMARY KEY (name, phone_number) i.e. the columns should be in
left-to-right order of significance to the clustering.

Clustering on name would favour BETWEEN (e.g. grab all names beginning
with 'B') and GROUP BY on the name because the rows would be physically
contiguous (and on the same page) to begin with and would be faster to
fetch. Think how 'useful' a paper phone book physically order on
telephone number would be to the average user.

Clustering on phone number or *random* autonumber on the other hand
would favour concurrency but a sequential autonumber PRIMARY KEY may be
the worst choice in any case.

[OT ]

FWIW nulls in foreign key constraints is one of the few areas where the
Jet implementation is IMO better than SQL Server's (and, Allen, I know
you are a fan of the 'CASCADE to null' feature in Jet 4.0 <g>).

SQL Server will disregard all values in the referencing columns if one
of then contains a null and also will not cascade them e.g. from a null
value to a not null value. When you see how Jet handles it, it makes
the SQL Server approach seem very odd e.g.

CREATE TABLE Table1 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER, UNIQUE (key_col1, key_col2)
)
;
CREATE TABLE Table2 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER,
FOREIGN KEY (key_col1, key_col2)
REFERENCES Table1 (key_col1, key_col2)
ON DELETE CASCADE
ON UPDATE CASCADE
)
;

INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (3, NULL)
;

That last insert would fail in Jet but in SQL Server it would succeed
on the logic, I guess, that Table1.col2 = Table2.col2 returns unknown
therefore should not fail; Jet's apparent logic 'NULL must equal NULL'
is harder to defend 'relationally'. However, what seems strange IMO is
the SQL Server CASCADE behaviour:

UPDATE Table1
SET key_col2 = 1
WHERE key_col1 = 1
;

The above change does not cascade to the referencing table (Table2) in
SQL Server. While the inability to compare null values between tables
may be sound 'relationally' it would seem to be unintuitive and not the
desired effect. Similarly, when deleting:

DELETE
FROM Table1
WHERE key_col1 = 2

the row is removed from the referencing table in Jet but not in SQL
Server.

I think the Jet behaviour has more utility. Such 'practical' decisions
are hardly unprecedented, even my the famously strict ANSI SQL
standards committees. For example, if one null value cannot be compared
as being equal to another null value then why do they GROUP together?
e.g.

DELETE FROM Table1
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
SELECT key_col2, COUNT(*)
FROM Table1
GROUP BY key_col2
;

The answer is, at least in part, because most people would expect it to
work this way and such an approach has the greater practical utility.

Jamie.
 
J

Jamie Collins

nickh said:
So, Jamie, if I am understanding your points, you are saying that there
actually is a benefit to my idea in terms of the physical fragmentation on
the disk. I.E. the worst arrangement is to have the autonumber alone be the
primary key because it is completely random and irrelevant and there will be
the most disk head movement as I try to fetch phone#'s from the table
(assuming I am fetching them in sequential phone# order). My idea to have
PHONE+AUTONUMBER would be good because at least the phone #'s will be
sequentially next to each other on the physical disk. This database is quite
large already at 1.2 million records, and growing by 220,000 records per
month.

Yes, that's the theory. However, like anything relating to
'performance', you will need to test to determine what works best in
your particular circumstances e.g. something that benefits one query
may penalise another so you have to make a judgement on that which is
critical path. A physical in index may be 'better' than a logical one
but you will use both flavours (knowingly or otherwise). Be wary of
anyone (including me <g>) who suggests they may be able to determine
what is best for you based on the little information you have posted
here.

Note the mdb must be compacted for the 'clustered' index to be rebuilt.
See the following Microsoft articles:

ACC2000: Defragment and Compact Database to Improve Performance

http://support.microsoft.com/default.aspx?scid=kb;en-us;209769

"If a primary key exists in the table, compacting re-stores table
records into their Primary Key order. This provides the equivalent of
Non-maintained Clustered Indexes, and makes the read-ahead capabilities
of the Microsoft Jet database engine much more efficient."

New Features in Microsoft Jet Version 3.0

http://support.microsoft.com/default.aspx?scid=kb;en-us;137039

"Compacting the database now results in the indices being stored in a
clustered-index format. While the clustered index isn't maintained
until the next compact, performance is still improved."
I really have 2 different types of records mixed together in 1 table, which
may be the root problem here (that I should redesign the schema into 2
tables!) I.E., I am storing records with phones, and also records without
phones (by the way these are sales leads purchased from InfoUSA; some have
phones which we telemarket to, and others are name/address only no phone
which we mail to). We are mixing the 2 records together in the same table
since they are both "leads".

Consider 'subclassing', where the attributes common to all 'leads' are
found in a 'superclass' table. Do a google search on the exact phrase
"CREATE TABLE Vehicles".

Jamie.

--
 
B

BruceM

I'll just mention that a many people, including Access MVPs, consider
autonumber to be a perfectly acceptable and efficient choice for PK.
However, even if physical ordering on the disk has enough of an effect on
performance that multi-field PKs are worth the extra effort, it may be worth
asking yourself if the order of phone numbers is less "random" than the
order of autonumbers. It's true that phone numbers have area codes and
exchanges so they can be grouped by state, etc., but does that help you?

nickh said:
So, Jamie, if I am understanding your points, you are saying that there
actually is a benefit to my idea in terms of the physical fragmentation on
the disk. I.E. the worst arrangement is to have the autonumber alone be
the
primary key because it is completely random and irrelevant and there will
be
the most disk head movement as I try to fetch phone#'s from the table
(assuming I am fetching them in sequential phone# order). My idea to have
PHONE+AUTONUMBER would be good because at least the phone #'s will be
sequentially next to each other on the physical disk. This database is
quite
large already at 1.2 million records, and growing by 220,000 records per
month.

I really have 2 different types of records mixed together in 1 table,
which
may be the root problem here (that I should redesign the schema into 2
tables!) I.E., I am storing records with phones, and also records without
phones (by the way these are sales leads purchased from InfoUSA; some have
phones which we telemarket to, and others are name/address only no phone
which we mail to). We are mixing the 2 records together in the same table
since they are both "leads". Those records with phones the obvious lookup
key is the phone (appended with an autonumber to break ties because we do
receive dupe phones occassionally in the daily feed). Those records that
are
name/address only no phone, then I would rely on the autonumber to be the
key
because the phone would be null or zerolengthstring

Jamie Collins said:
I have 2 fields: [PHONE] and [AUTONUMBER]

I would like to define the Primary Key as a multi-field primary key
as the
concatenation of [PHONE]+[AUTONUMBER]

Now, [PHONE] may be null, but [AUTONUMBER] will always contain a
unique
autonumber.

The problem is that Access won't allow null in [PHONE] even though
the
multi-field concatenation of [PHONE]+[AUTONUMBER] would be unique.
Why
not?
Is this a bug?

Such a key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key and
'primary' has no special meaning. The UNIQUE constraint can still be
used in a FOREIGN KEY constraint (see OT note below).
Perhaps the better approach is to ask: what does PRIMARY KEY give me
that UNIQUE does not, aside from requiring that all values be not null?
(FWIW columns in the PRIMARY KEY designation can be nullable, it's just
you can't actually put the null value in them)...

The key could implemented using a UNIQUE constraint i.e.
UNIQUE([PHONE], [AUTONUMBER]). In relational terms, a key is a key.
The UNIQUE constraint can still be used in a FOREIGN KEY.
AutoNumbers are normally unique on their
own? So why would you want to use an autonumber plus another field?

A valid answer to your question is clustering or physical ordering on
disk.

The better 'phone book' example would be the desire to cluster on the
subscribers' names. The phone number (using a natural key) or
autonumber would be required to break duplicates because subscribers'
names may not be unique.

Care is required in declaring the columns in the correct order i.e.
PRIMARY KEY (name, phone_number) i.e. the columns should be in
left-to-right order of significance to the clustering.

Clustering on name would favour BETWEEN (e.g. grab all names beginning
with 'B') and GROUP BY on the name because the rows would be physically
contiguous (and on the same page) to begin with and would be faster to
fetch. Think how 'useful' a paper phone book physically order on
telephone number would be to the average user.

Clustering on phone number or *random* autonumber on the other hand
would favour concurrency but a sequential autonumber PRIMARY KEY may be
the worst choice in any case.

[OT ]

FWIW nulls in foreign key constraints is one of the few areas where the
Jet implementation is IMO better than SQL Server's (and, Allen, I know
you are a fan of the 'CASCADE to null' feature in Jet 4.0 <g>).

SQL Server will disregard all values in the referencing columns if one
of then contains a null and also will not cascade them e.g. from a null
value to a not null value. When you see how Jet handles it, it makes
the SQL Server approach seem very odd e.g.

CREATE TABLE Table1 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER, UNIQUE (key_col1, key_col2)
)
;
CREATE TABLE Table2 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER,
FOREIGN KEY (key_col1, key_col2)
REFERENCES Table1 (key_col1, key_col2)
ON DELETE CASCADE
ON UPDATE CASCADE
)
;

INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table2 (key_col1, key_col2) VALUES (3, NULL)
;

That last insert would fail in Jet but in SQL Server it would succeed
on the logic, I guess, that Table1.col2 = Table2.col2 returns unknown
therefore should not fail; Jet's apparent logic 'NULL must equal NULL'
is harder to defend 'relationally'. However, what seems strange IMO is
the SQL Server CASCADE behaviour:

UPDATE Table1
SET key_col2 = 1
WHERE key_col1 = 1
;

The above change does not cascade to the referencing table (Table2) in
SQL Server. While the inability to compare null values between tables
may be sound 'relationally' it would seem to be unintuitive and not the
desired effect. Similarly, when deleting:

DELETE
FROM Table1
WHERE key_col1 = 2

the row is removed from the referencing table in Jet but not in SQL
Server.

I think the Jet behaviour has more utility. Such 'practical' decisions
are hardly unprecedented, even my the famously strict ANSI SQL
standards committees. For example, if one null value cannot be compared
as being equal to another null value then why do they GROUP together?
e.g.

DELETE FROM Table1
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (1, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (2, NULL)
;
INSERT INTO Table1 (key_col1, key_col2) VALUES (3, 3)
;
SELECT key_col2, COUNT(*)
FROM Table1
GROUP BY key_col2
;

The answer is, at least in part, because most people would expect it to
work this way and such an approach has the greater practical utility.

Jamie.
 
J

Jamie Collins

BruceM said:
I'll just mention that a many people, including Access MVPs, consider
autonumber to be a perfectly acceptable and efficient choice for PK.

I'll just mention that many people, including the Emperor's most
trusted advisors, considered his New Clothes to be a perfectly
acceptable choice for a state occasion <g>.

Researching a natural key can involve time and effort (though usually a
google search will quickly reveal an industry-standard key, ISO
standard, etc). Designing your own key can be hard work. Is there
anything simpler than incrementing an integer? Sure, an integer that is
incremented for you!

It is mere coincidence in this case that 'the easy way out' is the not
'the road less travelled'?

Jamie.

--
 
B

BruceM

So I'm the chump who's being duped by the slick-talking autonumber heretics,
is that it?
I can find all sorts of arguments pro and con on any topic, so the existence
of advocacy for one position or the other does nothing by itself to sway my
views. Autonumber works fine for my purposes, but my point, which you
ignored, was that even if I was to accept your position on "natural" keys,
what is "natural" about a phone number? Unless you are sorting by phone
number to find gaps or something like that the ordering of a phone number
field is surely as arbitrary as an autonumber field.
 
R

Rick Brandt

BruceM said:
So I'm the chump who's being duped by the slick-talking autonumber
heretics, is that it?
I can find all sorts of arguments pro and con on any topic, so the
existence of advocacy for one position or the other does nothing by
itself to sway my views. Autonumber works fine for my purposes, but
my point, which you ignored, was that even if I was to accept your
position on "natural" keys, what is "natural" about a phone number? Unless
you are sorting by phone number to find gaps or something like
that the ordering of a phone number field is surely as arbitrary as
an autonumber field.

Bruce there are two different philosophies at issue here.

Surrogate numeric keys (AutoNumber or otherwise) do solve a lot of technical
problems for database and database application developers and many seasoned
professionals who know what they are talking about use them and recommend
them. However; they are not part of proper Relational Database Design
Theory.

If you are discussing the construction of "proper" relational databases then
surrogates will be argued against by the experts almost without exception.
If you are discussing the practical pitfalls and solutions of building
working databases then you will find a lot of people who make their living
doing such things arguing for the use of surrogate keys.

These really are two different discussions, but often the boundaries get
blurred during exchanges in these groups. I for one have no problem with
the judicious use of surrogate keys and often use them myself. However; I
avoid making statements like "All Primary Keys should be numeric surrogates"
because their use is simply not warranted in every case and my preferences
are not applicable in every case. I will point out their advantages when
discussing databases with other developers, but I would not fault the
decision of another developer not to use them.
 
B

BruceM

Rick, thanks for the explanation. I have done some reading on the subject,
but usually get bogged down when the discussion gets into clustered indexes,
physical ordering on the disk, and the like.
I was responding in part because the OP wrote (in response to Jamie): "the
worst arrangement is to have the autonumber alone be the primary key because
it is completely random and irrelevant and there will be the most disk head
movement as I try to fetch phone#'s from the table (assuming I am fetching
them in sequential phone# order)."
I wondered if adding another essentially arbitrary number (phone #) to the
key would somehow improve this situation, assuming that the part about
physical location on the disk is essentially accurate in the first place. I
would think that it would make more sense, in a case involving a surrogate
key as part of a multi-field PK, to have the natural part of the key be
something such as LastName that would actually be used for sorting and
filtering. If one is going to combine something with autonumber, from the
uniqueness point of view it doesn't much matter what that is.
Anyhow, I appreciate your adding some comments I can actually begin to
understand. By the way, who decided what is "proper"? I'm not disputing
the point so much as wondering how it came to be.
 
R

Rick Brandt

BruceM said:
Rick, thanks for the explanation. I have done some reading on the
subject, but usually get bogged down when the discussion gets into
clustered indexes, physical ordering on the disk, and the like.
I was responding in part because the OP wrote (in response to Jamie):
"the worst arrangement is to have the autonumber alone be the primary
key because it is completely random and irrelevant and there will be
the most disk head movement as I try to fetch phone#'s from the table
(assuming I am fetching them in sequential phone# order)."
I wondered if adding another essentially arbitrary number (phone #)
to the key would somehow improve this situation, assuming that the
part about physical location on the disk is essentially accurate in
the first place. I would think that it would make more sense, in a
case involving a surrogate key as part of a multi-field PK, to have
the natural part of the key be something such as LastName that would
actually be used for sorting and filtering. If one is going to
combine something with autonumber, from the uniqueness point of view
it doesn't much matter what that is. Anyhow, I appreciate your adding some
comments I can actually begin to
understand. By the way, who decided what is "proper"? I'm not
disputing the point so much as wondering how it came to be.

The theory purists will mostly quote <genuflect> Codd and Date </genuflect>.
They pretty much "wrote the books" on relational set theory. With that
foundation to build on many "real world practioners" found that adhering too
strictly to theory can be really difficult and less productive, so for the
sake of problem solving and productivity they deviated from pure theory.

Some of these "devious" ;-) practices have become so common-place that many
now conflate them to theory in their own right. While they are common and
useful one should be aware that they are deviations from relational theory,
not part of it.
 
D

David W. Fenton

The theory purists will mostly quote <genuflect> Codd and Date
</genuflect>. They pretty much "wrote the books" on relational set
theory. With that foundation to build on many "real world
practioners" found that adhering too strictly to theory can be
really difficult and less productive, so for the sake of problem
solving and productivity they deviated from pure theory.

If you're choosing your primary keys based on ordering of the
storage, you're way outside the theoretical realm. That's an
implementation issue and is going to be answered differently for
each different database engine (e.g., Jet will be different from
MySQL).

I can't imagine a telephone number table that would *need* a PK that
didn't involve the parent foreign key.

In any event, you'd have to have hundreds of thousands of records in
the telephone table for it to make a noticeable difference.
 
J

Jamie Collins

BruceM said:
what is "natural" about a phone number?

A natural key can be verified in the reality being modelled (hint: pick
up the phone, press the buttons). If you are lucky, the natural key
will have a trusted source who will handle the rare (hopefully)
occurrences of duplicate entities (<flippant> and if you are unlucky,
the trusted source will be a phone company; Bill Bryson's three laws of
nature: you can't re-live the past, the waiter can't see you until he's
ready and you can't beat the phone company </flippant>).

In case your question was, "Is phone number better than autonumber as
regards clustering?" my answer would be, I suspect not but it will
depend on the primary usage of the table. From my experience with
(paper) telephone directories, physical ordering on subscriber name
(last name, first names) suits my primary usage better.
I can find all sorts of arguments pro and con on any topic, so the existence
of advocacy for one position or the other does nothing by itself to sway my
views.

That was *my* point. The fact you have found all sorts of people who
believe an autonumber PK makes a fine PK does not make it true.
So I'm the chump who's being duped by the slick-talking autonumber heretics,
is that it?

You get to choose your own role. In the versions of the tale I've
heard, there has been no such role as "person pointing out that there
are people who think the Emperor is fully clothed" because, I guess,
such a role would be of no relevance to the plot. Me, I'm the small
child who, at the risk of public ridicule, questions popular belief
when it appears to be based on fallacy.

Jamie.

--
 
B

BruceM

The OP, explaining his understanding of your comments, wrote "the worst
arrangement is to have the autonumber alone be the primary key because it is
completely random and irrelevant". I questioned whether a phone number is
less random in terms of ordering. I understand that it is natural in the
sense that it corresponds to a real-world application, but it is unlikely to
be used for ordering except maybe in the case of a call list. That being
the case, combining autonumber with phone number offers dubious advantage at
best over autonumber alone.
I understand a case may be made for natural keys, but I tend to regard
invariable rules with some suspicion.
 
J

Jamie Collins

Rick said:
there are two different philosophies at issue here.

Surrogate numeric keys (AutoNumber or otherwise) do solve a lot of technical
problems for database and database application developers and many seasoned
professionals who know what they are talking about use them and recommend
them. However; they are not part of proper Relational Database Design
Theory.

If you are discussing the construction of "proper" relational databases then
surrogates will be argued against by the experts almost without exception.
If you are discussing the practical pitfalls and solutions of building
working databases then you will find a lot of people who make their living
doing such things arguing for the use of surrogate keys.

These really are two different discussions, but often the boundaries get
blurred during exchanges in these groups. I for one have no problem with
the judicious use of surrogate keys and often use them myself. However; I
avoid making statements like "All Primary Keys should be numeric surrogates"
because their use is simply not warranted in every case and my preferences
are not applicable in every case. I will point out their advantages when
discussing databases with other developers, but I would not fault the
decision of another developer not to use them.

A key uniquely identifies data and the unit of work in SQL is a row
(record). 'Primary keys' are a subset of 'keys'. Disregard the
'primary' distinction for a moment.

I recognise *three* uses for autonumber.

1) As a so-called surrogate, where the natural key is being constrained
and the autonumber is used to 'associate' related tables. The main
advantage is that a single column integer is efficient and easier to
work with. IMO most people take this approach out of habit rather than
to tackle a real life problem. FWIW the 'theorists' are split: Codd
suggests a true surrogate should not be visible, à la an index; Date
seems to be less strict e.g. might be visible to a DBA. I'm not a
theorist and I don't really have a problem with such usage. The problem
really is the physical implementation: from a SQL point of view, there
is no requirement for a referencing (child) table to physically store
the referenced key value (in some SQL products there they are
physically one and the same) but in Jet physically repeated values
between tables are a reality and sometimes physical considerations
(e.g. performance) must be addressed. Natural key + autonumber is a
fair compromise because you have a 'relational' key (the natural key)
plus a physical connector (autonumber).

2) As a so-called artificial key where no natural key exists i.e. the
composite of all the attributes being modelled could be duplicated for
distinct entities and modelling more attributes is not an option. You
will have to accept that you will never be able to associate entities
(e.g. if you want to collect all posts by me - and why wouldn't you
<g>? - then you're our of luck because lots of people post via google,
handles can be duplicated, email addresses can be reassigned, sig lines
can be faked, personal style can be mimicked, etc). Your database
becomes the trusted source. The problem here is that the autonumber
must be exposed to users and logically autonumbers aren't the best for
real word situations: people hate typing GUIDs (if they can remember
one in the first place <g>); random integers are easily mistyped (hint:
a check digit, à la ISBN, helps here); with sequential integers people
get bothered by missing values (e.g. auditors). Autonumbers also have
physical problems: a sequence must be generated on the same machine,
that's why your incrementing autonumber are changed to when using
replication, but even random numbers can produce clashes. And consider
that autonumber values can be assigned explicitly i.e. using INSERT
INTO syntax. On the other hand, designing a key and maintaining its
values takes time and effort and Access is all about RAD so I can see
the appeal (but I maintain my right to level charges of unimaginative
design and contempt for users).

3) As a so-called uniquifier. Designed to ensnare the clueless
(subjective, I know, but I can't think of a better way of putting it).
You know the message: "Although a primary key isn't required, it's
highly recommended. A table must have a primary key for you to define a
relationship between this table and other tables in the database. Do
you want [Access] to create a primary key now?" Lies, all lies: a)
logically, a table *does* require a key, otherwise it's a 'heap'; b) a
table does not need a primary key or other key to define a relationship
(it needs a key to 'enforce referential integrity' but it doesn't need
to be the primary key), rather it merely needs a compatible data type
in anther table; c) if you choose 'Yes', Access doesn't create a key,
it merely creates an autonumber to 'uniquify' you data. A key should
prevent duplicates. Think about it: an autonumber doesn't *prevent*
duplicates, rather it *enables* duplicates. Try it: create a single
column of type 'number' and create several rows with the same value.
say, 1. Now, are those 1s the same or are they all different? The
autonumber is there to tell you they are all different but do you
believe it? In case it's not clear, it's this third flavour of
autonumber I have a problem with.

Back to 'primary key'. The term can be traced back to an early mistake
made by Ted Codd. You have number of 'candidate key' and you pick one
to be 'primary key'. He later realised that all keys are equal but by
then it was too late. Relational theory has moved on but SQL took
PRIMARY KEY (SQL keywords in uppercase) and now has to deal with it as
a legacy issue.

First, you don't *need* PRIMARY KEY. If you make your existing primary
key columns NOT NULL and constrained them with UNIQUE you would suffer
no loss of data integrity because a NOT NULL UNIQUE key is a key and
all keys are equal. The NOT NULL property is required for equivalence
because a PRIMARY KEY cannot comprise a NULL value.

Second, while the idea of PRIMARY KEY is the particular SQL
implementation - each SQL product e.g. Jet - would give special meaning
to the PRIMARY KEY. This is why everyone says that a SQL table should
have a primary key: if it didn't have one at the very worst you would
be missing out on the special benefits that PRIMARY KEY has to offer.

David W Fenton (downthread) is correct: these special benefits vary
from product to product. For Access/Jet, they include:

1) Clustering (physical ordering) on disk on compact;
2) Appearing as bold text in the Relationships diagram;
3) In absence of explicit definition, it will be used when the table is
referenced in a relationship or FOREIGN KEY (the letter behaviour is
defined in the SQL-92 standard);
4) Prevents you from being nagged, "Although a primary key isn't
required..." <g>.

Jamie.

--
 
J

Jamie Collins

BruceM said:
The OP, explaining his understanding of your comments, wrote "the worst
arrangement is to have the autonumber alone be the primary key because it is
completely random and irrelevant".

The OP also wrote, "My idea to have PHONE+AUTONUMBER would be good
because at least the phone #'s will be sequentially next to each other
on the physical disk." I conclude the OP has understood the issues and
physically contiguous PHONE values is good for them. I'm satisifed
'duty of care' has been served.

Jamie.

--
 
J

Jamie Collins

Jamie said:
First, you don't *need* PRIMARY KEY. If you make your existing primary
key columns NOT NULL and constrained them with UNIQUE you would suffer
no loss of data integrity because a NOT NULL UNIQUE key is a key and
all keys are equal. The NOT NULL property is required for equivalence
because a PRIMARY KEY cannot comprise a NULL value.

Second, while the idea of PRIMARY KEY is the particular SQL
implementation - each SQL product e.g. Jet - would give special meaning
to the PRIMARY KEY. This is why everyone says that a SQL table should
have a primary key: if it didn't have one at the very worst you would
be missing out on the special benefits that PRIMARY KEY has to offer.

There are many articles pointing out that the choice of which candidate
key should be primary is *arbitrary* but here's one that seems to make
the point well:

http://www.aisintl.com/case/relational_keys.html

Relational Keys

"The primary key of any table is any candidate key of that table which
the database designer arbitrarily designates as "primary". The primary
key may be selected for convenience, comprehension, performance, or any
other reasons. It is entirely proper (albeit often inconvenient) to
change the selection of primary key to another candidate key.
[...] there is no property of a primary key which is not shared by all
other candidate keys in of the table except this arbitrary designation.
Unfortunately E-R methodologies and RDBMS products have come to rely on
the primary key almost to the exclusion of the concept of candidate
keys, which are rarely supported by software."

Note, Access/Jet supports candidate keys via NOT NULL UNIQUE.

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top