Primary Keys

O

onedaywhen

Lyle said:
From

http://msdn2.microsoft.com/en-us/library/wd9d69b1.aspx

THE CLUSTERED PROPERTY IS IGNORED FOR DATABASES THAT USE THE MICROSOFT
JET DATABASE ENGINE BECAUSE THE JET DATABASE ENGINE DOES NOT SUPPORT
CLUSTERED INDEXES.

No need to shout.

Try reading more widely:

New features in Jet Version 3.0:
http://support.microsoft.com/default.aspx?id=137039

Quote: "Compacting the database now results in the indices being stored

in a clustered-index format. While the clustered index isn't maintained

until the next compact, performance is still improved ... The new
clustered-key compact method is based on the primary key of the table.
New data entered will be in time order."

ACC2000: Defragment and Compact Database to Improve Performance
http://support.microsoft.com/default.aspx?scid=kb;en-us;209769

Quote: "A disk defragmenter will place all files, including the
database file into contiguous clusters on a hard disk ... If a primary
key exists in the table, compacting re-stores table records into their
Primary Key order. This provides the equivalent of Non-maintained
Clustered Indexes"

I think the phrase 'not supported' is used to convey the fact that in
Jet you cannot specify the clustered index independent of the PRIMARY
KEY as you can in, say, SQL Server. It may just mean that there is no
syntax for CLUSTERED INDEX.

Regardless of what 'not suuported' means, clustered indexes definitely
exist for Jet and PRIMARY KEY is the way to leverage them.

Jamie.

--
 
D

David W. Fenton

If you've understood the above you should come to the conclusion
that a sole autonumber column will never make a good PRIMARY KEY
in Access/Jet, because a random/incrementing integer/GUID does not
make a good clustered index. I'd suggest that anyone who uses
their autonumber column in a BETWEEN or GROUP BY construct has got
something wrong in design and/or queries. I'd further suggest that
anyone who uses BETWEEN or GROUP BY constructs which do not
include columns that comprise their PKs are likely to have made a
poor choice of PK.

A random PK would result in the placement of records on as many data
pages as possible, thus improving concurrency.
 
J

Jamie Collins

David said:
A random PK would result in the placement of records on as many data
pages as possible, thus improving concurrency.

Good point, I was thinking too narrowly.

If my table had columns for surname, initials and telephone number and
my queries predominantly use BETWEEN on the surname column, having the
table physically ordered on telephone number may make my queries
perform worse than if the physical order was on surname (can you
imagine trying to use a paper copy telephone directory ordered on
telephone number <g> ?!)

As I said, the choice of PK should be determined by the SQL DML e.g.
you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Jamie.

--
 
J

Jamie Collins

Jamie said:
If my table had columns for surname, initials and telephone number and
my queries predominantly use BETWEEN on the surname column, having the
table physically ordered on telephone number may make my queries
perform worse than if the physical order was on surname (can you
imagine trying to use a paper copy telephone directory ordered on
telephone number <g> ?!)

As I said, the choice of PK should be determined by the SQL DML e.g.
you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Oops! I meant to add:

In other words I want to fetch rows on the same page and contiguous
pages; you want to maximise the chances of the rows each user will be
interested in are on different pages (am I correct?) I think in my
simple contacts example physically ordering on surname would provide
good concurrency as well. Whatever, it's clear we are both thinking
about the Jet implementation (i.e. contiguous storage on disk) when
considering PKs. Can everyone else say the same?

Jamie.

--
 
L

Lyle Fairfield

Regardless of what 'not suuported' means, clustered indexes definitely
exist for Jet and PRIMARY KEY is the way to leverage them.

One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.
This is a far cry from the convenience and power or a clustered index.
 
J

Jamie Collins

Lyle said:
One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.

We seem to be in agreement here i.e. what to consider when choosing a
PK.
This is a far cry from the convenience and power [of] a clustered index.

The gap doesn't appear that wide to me but I'm willing to learn
otherwise. Details?

Thanks,
Jamie.

--
 
L

Lyle Fairfield

Jamie said:
Lyle said:
One can get some of the advantages of clustered indexes by choosing a
meaningful primary key, by compacting ... and, perhaps, by defragging.

We seem to be in agreement here i.e. what to consider when choosing a
PK.
This is a far cry from the convenience and power [of] a clustered index.

The gap doesn't appear that wide to me but I'm willing to learn
otherwise. Details?

Thanks,
Jamie.

It appears wide to me. A clustered index in SQL-Server is maintained.
No compacting or defragging is required.
In Jet, if the table/file is small, the performance advantages will,
probably, not be noticed.
If the table/file is large, there will be a penalty (time/resources) in
compacting.
Even if we compact, there is, TTBOMK, no guarantee that the compacting
will use contiguous sectors, although we might hope for that if the
disk is new. Pages containing consecutive (from the pimary key point
of view) might be distant from each other (from the disk's point of
view). Defragging is likely to cure this. But defragging is slow. And
defragging may result in the space after the MDB/E being used by
another file. So that, as soon as we update or insert another record it
may again be remoteness (from the disk's point of view) from records
with "adjacent" primary keys. And to correct this remoteness we may
have to compact and defrag.
In a maintained clustered index all of this (we hope) is planned and
managed by the database engine.
This seems to me to be wide gap.
I understand that with a static database, the gap would be considerably
less.
In general, I agree with you that if one were doing a lot of sql work
based on >= 'Mainwaring; and <= 'Milne' it might, depending on other
needs of the db, it might be efficient to use some primary key that
helped with identifying the records wanted or processed.
I think I have said sufficient about this ...if you reply you shall
have the field to yourself.
 
D

David W. Fenton

Good point, I was thinking too narrowly.

If my table had columns for surname, initials and telephone number
and my queries predominantly use BETWEEN on the surname column,
having the table physically ordered on telephone number may make
my queries perform worse than if the physical order was on surname
(can you imagine trying to use a paper copy telephone directory
ordered on telephone number <g> ?!)

As I said, the choice of PK should be determined by the SQL DML
e.g. you are interested in page locks for updates in a multi-user
environment, I'm interested in query performance, etc.

Given that with Jet you can only have the one clustered index, I
really think this is a pretty irrelevant consideration. I never use
the BETWEEN operator on anything but date fields, which in the vast
majority of tables could not possibly ever be a candidate for PK
(and very seldom even a candidate for inclusion in a compound
natural key, which wouldn't give you the clustered index benefit,
anyway, unless the date was the first field of the compound key).

So, I just don't see any practical benefit in your pointing out the
performance advantage of the clustered index.

Have you tested SEEKs on non-PK indexes?
 
D

David W. Fenton

Oops! I meant to add:

In other words I want to fetch rows on the same page and
contiguous pages; you want to maximise the chances of the rows
each user will be interested in are on different pages (am I
correct?) . . .

No. I want to minimize the chance that two users will be editing
data on the same data page.
. . . I think in my
simple contacts example physically ordering on surname would
provide good concurrency as well. Whatever, it's clear we are both
thinking about the Jet implementation (i.e. contiguous storage on
disk) when considering PKs. Can everyone else say the same?

Well, either way, it's irrelevant for newly added records before the
database is compacted, since those are all going to land in their
own data page that is not written back in PK order.

As long as you've got records added after the last compact, there's
always going to be some data pages (and index data pages) that are
not in the final order (whether your natural key order or my random
Autonumber order), and thus the performance gain can never be fully
realized in an actively used database.
 
J

Jamie Collins

David said:
As long as you've got records added after the last compact, there's
always going to be some data pages (and index data pages) that are
not in the final order (whether your natural key order or my random
Autonumber order), and thus the performance gain can never be fully
realized in an actively used database.

The table may not be updated frequently e.g. how often does your
telephone company send you a new paper copy directory, how frequent is
your subscription for a refreshed data set? I don't mean to argue
pointlessly, merely point out that you have to base your PKs (on a
table by table basis) on SQL DML which may or may not include frequent
SQL UPDATE statements.

Jamie.

--
 
J

Jamie Collins

David said:
with Jet you can only have the one clustered index

Erm, think it is fundamental that a table can have only one clustered
index, regardless of SQL implementation. Surely a table with two
phyical orders is in fact tow tables!
I never use
the BETWEEN operator on anything but date fields, which in the vast
majority of tables could not possibly ever be a candidate for PK
(and very seldom even a candidate for inclusion in a compound
natural key, which wouldn't give you the clustered index benefit,
anyway, unless the date was the first field of the compound key)

You seem to have talked yourself into seeing my point i.e. make the PK
compound with first your date column followed by a candidate key. This
will favour your BETWEEN constructs.

Jamie.

--
 
A

Amy Blankenship

If it's that important to you, Access is probably the wrong database to be
using. If, on the other hand, you use your PK's to establish relationships,
then physical location on disk is unimportant. With cheap fast processors,
labor time of the developer saved by using PK's that make development
efficient is far more cost-effective than any small speed gain realized by
trying to physically order records next to each other. Access is not the
tool of choice for DBA's obsessed with application speed ;-).
 
D

David W. Fenton

Erm, think it is fundamental that a table can have only one
clustered index, regardless of SQL implementation. Surely a table
with two phyical orders is in fact tow tables!

Yes. What I should have said was that Jet allows only the PK to be
clustered, not the indexed field of your choice.
You seem to have talked yourself into seeing my point i.e. make
the PK compound with first your date column followed by a
candidate key. This will favour your BETWEEN constructs.

But it's a nonsensical way to pick PKs. Dates are very seldom going
to be part of a natural key, at least not for very many types of
entities.

I think your suggestion is bloody stupid, as it optimizes something
that very seldom needs further optimization in the first place.
 
J

Jamie Collins

Amy said:
If it's that important to you, Access is probably the wrong database to be
using. If, on the other hand, you use your PK's to establish relationships,
then physical location on disk is unimportant.

Did you read the OP's post? They asked (paraphrasing), if I can use NOT
NULL UNIQUE to define my relationships, what do I need PRIMARY KEY for?
I certainly get you point but I'm trying to address another issue i.e.
what's the difference between NOT NULL UNIQUE and PRIMARY KEY in
Access/Jet to which the answer is the clustered index.

We need to differentiate between data integrity and indexing. I agree
that data integrity is vital, whereas indexing is less significant but
still important.

You should think of PRIMARY KEY as your most powerful index for a
table. If you are ignoring this aspect of PK then you are at best
missing out on some potential gain e.g. better performance, improved
concurrency, etc.

It sounds like you have no need for indexes: that's fine, that's your
choice and if your databases are small you will probably not notice any
difference. However, If you do use regular indexes yet don't consider
the indexing aspect of PRIMARY KEY then maybe it's time to reassess you
indexing strategy.

Jamie.

--
 
J

Jamie Collins

David said:
Dates are very seldom going
to be part of a natural key, at least not for very many types of
entities.

It seems I was wrong then and you haven't had the epiphany yet.

Stop thinking in terms of PRIMARY KEY as being your primary key,
candidate key, natural key, etc because you can use NOT NULL UNIQUE for
those purposes. For Jet you must think in terms of PRIMARY KEY meaning
clustered index and nothing else, then choose whatever columns makes
sense in that context.
I think your suggestion is bloody stupid, as it optimizes something
that very seldom needs further optimization in the first place.

The important word there is 'seldom'. If 'optimization' is the *only*
thing that differentiates PRIMARY KEY from NOT NULL UNIQUE then why use
PK for any other purpose?

Jamie.

--
 
A

Amy Blankenship

I think YOU need to reread it. The poster asked the
advantages/disadvantages of using natural keys vs. autonumber. The words
not null unique did NOT appear.

In your own words, indexing is less significant than data integrity (and,
presumably, developer time). Therefore, the things that are more
significant should be considered first.

If my Access databases were large enough that indexing were an issue, they
wouldn't be in Access.

OK, I'll leave the church of the not null unique vs. primary key to you.
How many times do you kneel facing Redmond each day,anyway ;-)?

-Amy
 
P

polite person

If my Access databases were large enough that indexing were an issue, they
wouldn't be in Access.
<snip>
I don't want to intrude in other people's point scoring but newbies might read this.
Indexing is essential in most Access databases. I think you probably mean "the efficiency of the
indexing."
Also the effectiveness of Access as against other dbs depends on other things besides size, as a
matter of fact Access can be used for pretty big databases.
 
L

Lyle Fairfield

Amy said:
If my Access databases were large enough that indexing were an issue, they
wouldn't be in Access.

You have Access Databases without tables or records? Cool!
 
T

Tim Marshall

Amy said:
If my Access databases were large enough that indexing were an issue, they
wouldn't be in Access.

OK, I'll leave the church of the not null unique vs. primary key to you.
How many times do you kneel facing Redmond each day,anyway ;-)?

It is Edgar Codd's legacy to which most of us pay homage, the platform
on which database development takes place is utterly irrelevant.

I haven't followed this thread, but I have seen this post. No offence,
but you've shown a complete and, if you are anything close to a
"professional" developer, shocking lack of understanding of relational
database design and of "Access" by:

1) Indicating indexing is not an issue; and

2) by referring to "Access databases".

As I said, if you consider yourself a database developer, all I can say
is...

wow.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top