Unique fields in table design

Guest · Jan 22, 2007

Hi,

I have a payroll database which is currently in use, in Access 2002.

One of the tables called TimeSheet consists of the following fields:
TimeSheetID
EmployeeID (references the employee table)
PayPeriod (references the Pay period table).

The problems that is occuring is that some staff members are filling out
more than one time card for a particular pay period.

Is there a way to stop this from happening? I see that I can set unique on
table fields, but I need to do it on a combination of fields, e.g. the
combination of employeeiD and Payperiod should not be duplicated.

Can I do this in the table design? Or in the form design?

Any help would be much appreciated,
thanks,
Alex.

Jeff Boyce · Jan 22, 2007

Table design view, add an index comprised of the two fields, make it "no
duplicates".

Regards

Jeff Boyce
Microsoft Office/Access MVP

Jamie Collins · Jan 23, 2007

I have a payroll database which is currently in use, in Access 2002.

One of the tables called TimeSheet consists of the following fields:
TimeSheetID
EmployeeID (references the employee table)
PayPeriod (references the Pay period table).

The problems that is occuring is that some staff members are filling out
more than one time card for a particular pay period.

Is there a way to stop this from happening?

I'd wager a substantial sum that TimeSheetID does not exist in the
reality being modelled and is an autonumber column designated as
primary key. That would be a classic example of how an autonumber PK
not only fails to prevent duplicates but actually *facilitates* the
creation duplicates.

Enforce the *compound* natural key (EmployeeID, PayPeriod) with a
UNIQUE constraint e.g.

ALTER TABLE TimeSheet ADD
CONSTRAINT TimeSheet__natural_key
UNIQUE (EmployeeID, PayPeriod);

One done, seriously consider dropping TimeSheetID completely.

Jamie.

--

Jeff Boyce · Jan 23, 2007

Jamie

If the rows in TimeSheet are not the "parents" to any other "child" tables,
I, too, see no advantage to having a totally arbitrary Autonumber ID field.

However, if TimeSheet rows can have "many" related rows in other tables, it
may prove easier to use an Autonumber-based ID to provide a single-field
foreign key in those tables. Otherwise, the multi-column primary key would
need to have all columns migrated to the "child" tables.

While this could speed up querying, wouldn't it make managing table
relationships and forms/reports more complicated?

Regards

Jeff Boyce
Microsoft Office/Access MVP

Jeff Boyce · Jan 24, 2007

Jamie

Thanks for the example.

Jeff

Jamie Collins said:
Well, I think anyone would agree that, all other things being equal, a
single-field foreign key is better than a composite foreign key. I guess
the
real question is, when a single field foreign key doesn't exist in the
data
model, is there merit in inventing one? The so-called 'surrogate key'
approach.

I say 'so-called' because Ted Codd defined the term and theorists and
practitioners alike have disagreed about its interpretation ever since.
Some
(e.g. Celko) say that it should be hidden from all users à la an index
whereas others (e.g. Date, I think) say it can be visible to the DBA;
almost
everyone says it should be hidden from end users.

From a logical point of view, I think there is very little merit in
inventing something where something already exists in the data model. On
rare
occasions an otherwise-reliable natural key can be duplicated e.g. two
books
with the same ISNB number but I think you'd need to operate on a scale
somewhere close to Amazon's for this to be an operational issue. Some have
indicated that there may be practical implications associated with privacy
issues of using natural keys as foreign keys e.g. a SSN for
employee-related
tables would mean you would not always be able to expose the SSN yet users
would have to know of its existence to be able to JOIN related tables; I
can
see that a lot of VIEWs would need to be provided and maintained! I did
once
encounter a natural key (a business key maintained by a trusted source)
which
was defined as 'String' (don't you just hate XML schemas <g>?!) and had
values that exceeded VARCHAR(255) in practice (i.e. ridiculously
over-engineered) which in composite could seriously threaten the upper
limit
for a CONSTRAINT/index.

It remains the case that if a fabricated 'business key' is required there
are much more intelligent and user-friendly algorithms than those used by
Jet's autonumber. Incrementing integer has issues with missing sequence
values and concurrency (e.g. when used as PRIMARY KEY); sequences are
useful
(e.g. cheque/check book numbers, invoice numbers, etc where you need to
account for every item) but autonumber is not a reliable way of generating
them. GUID (replication ID) is a nightmare for end users e.g. typing them
in,
remembering them, etc. Random integer is probably the best available:
clashes
are going to be as rare as duplicate ISBNs but the algorithm can be
improved
upon e.g. a check digit, reseeding to guarantee no clashes, algorithm to
guarantee consecutively-generated values are far apart, etc. Let's now put
exposed surrogates aside.

From a physical point of view there may be some merit to introducing a
surrogate. Consider that in the Jet implementation (FWIW it's the same for
SQL Server), foreign key values are repeated in ("migrated to") the
referencing ("child") table, i.e. the physical mirrors the logical, but it
doesn't have to be this way. I'm told that in some SQL DBMS products the
values are physically stored once, meaning, for example, a CASCADE
referential action will have zero cost. I tend to take the view that if
the
physical implementation is not up to scratch you either wait for the
vendor
to change the product (and with Access you could be in for a long wait
<g>)
or move to another product. I am very wary** of changing my logical design
to
work around a vendor's bad design choice or physical limitation (**read:
it
makes me very grumpy <g>). Note I often find it useful to have the natural
key values repeated in the table because, unlike a surrogate, they have
actual meaning; I'll post an example below.

IMO one should first ensure a physical problem actually exists before
deviating from a logical design. I get impression that most people who
prefer
a surrogate use them systematically e.g. they 'grew up' with a user
interface
that encourages them to add one so they always do. I suspect most people
view
a surrogate the way I view an index e.g. I think the chances are it will
do
more good than harm so I'll create one (or more) when I create the table
then
promptly forget about it and take it for granted it's doing its thing.
Really
I'm from the 'If it ain't broke...' school of thought e.g. in VBA code I
only
set my object variables to Nothing when there's a good reason for doing so
and I've *never* declared a variable as Public.

Of course, the biggest problem I have with surrogates is that some people,
respected members of these Access groups included, don't seem to get the
fact
that you still need to enforce your candidate keys. Consider the literal
(dictionary) meaning of the word: "taking the place of somebody or
something
else". So the debates it not "natural vs surrogate" as is often said,
rather
it is "natural vs natural+surrogate". Surrogate on its own makes no sense
and
leads to duplicates, evidenced by daily posts asking how to
identify/remove
duplicates; however, the posts recommending an autonumber primary key with
no
mention of candidate (or similar) keys are even more frequent

Is a composite foreign key hard to manage? Personally, I don't think so.
There's no increased complexity in table joins: if you prefer SQL code
then
there's more characters to type, no big deal; if you prefer the Query
Designer I understand you only have to define the join once (in the
Relationships window, where I assume dragging-and-dropping more than one
column is no big deal) and when you include both tables in a the Query
Builder they will appear pre-joined. If you write SQL DDL you don't need
to
explicitly include all the columns because of the assumption in Jet SQL
(and
the SQL-92 standard) that a table will be referenced via its PRIMARY KEY
e.g.

CREATE TABLE Table1 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER NOT NULL,
PRIMARY KEY (key_col1, key_col2)
)
;
CREATE TABLE Table2 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER NOT NULL,
FOREIGN KEY (key_col1, key_col2)
REFERENCES Table1
)
;

Personally, I prefer to write it longhand anyhow to explicitly document
the
default behaviour e.g. this is equivalent to the above:

CREATE TABLE Table2 (
key_col1 INTEGER NOT NULL,
key_col2 INTEGER NOT NULL,
FOREIGN KEY (key_col1, key_col2)
REFERENCES Table1 (key_col1, key_col2)
ON DELETE NO ACTION
ON UPDATE NO ACTION
)
;

In summary, I have no fear of composite keys and use them all the time
without hesitation; I'll stop just short of saying I've never designed a
surrogate myself (though I strongly suspect I haven't). Sure, I've used
artificial keys and have even used autonumber for that purposes when
spending
someone else's money; rolling your own key is hard work which ever way you
do
it and autonumber is available out-of-the-box). But adding a surrogate,
even
an autonumber, when a composite key is already there strikes me as extra
work, rather than being the other way around.

PS that example I promised. Consider the following tables, stripped down
the
basics for simplicity (many missing columns and constraints, choice of
data
types etc):

CREATE TABLE Employees (
employee_nbr INTEGER NOT NULL PRIMARY KEY
)
;
CREATE TABLE Payroll (
employee_nbr INTEGER NOT NULL
REFERENCES Employees,
salary_start_date DATETIME NOT NULL,
salary_end_date DATETIME,
salary_amount DECIMAL(10,2) NOT NULL
)
;

It is worth noting that the candidate keys for the Payroll table include
(employee_nbr, salary_start_date), (employee_nbr, salary_end_date),
(employee_nbr, salary_start_date, salary_end_date). Not only are the
candidate keys involving the nullable end_date column (where a null
end_date
signifies the current period) prevented from being considered for the
PRIAMRY
KEY designation, none of the candidate keys can prevent duplicates e.g.
consider these rows:

ALTER TABLE Payroll ADD
UNIQUE (employee_nbr, salary_start_date)
;
ALTER TABLE Payroll ADD
UNIQUE (employee_nbr, salary_end_date)
;
ALTER TABLE Payroll ADD
UNIQUE (employee_nbr, salary_start_date, salary_end_date).
;
INSERT INTO Payroll (employee_nbr, salary_start_date, salary_end_date,
salary_amount)
VALUES (1, #2001-01-01 00:00:00#, #2001-12-21 23:59:59#, 16000.00)
;
INSERT INTO Payroll (employee_nbr, salary_start_date, salary_end_date,
salary_amount)
VALUES (1, #2001-12-01 00:00:00#, NULL, 18000.00)
;

Using UNIQUE or PRIMARY KEY constraints (SQL keywords in uppercase) for
the
candidate keys will fail still allow the above INSERTs to succeed.
However,
the result would be that for the period December 2001 two different salary
amounts would apply

SELECT employee_nbr, salary_amount
FROM Payroll
WHERE #2001-12-15 00:00:00#
BETWEEN salary_start_date AND
IIF(salary_end_date IS NULL, NOW(), salary_end_date);

This is clearly something which should be prevented from happening. The
requirement here is for what the SQL literature terms a 'sequenced primary
key', which involves constraints to ensure no overlapping periods,
resulting
in the interesting situation where a table's primary key cannot be
implemented using PRIMARY KEY but instead one must use table-level CHECK
constraints (e.g. in Jet), triggers (e.g. in SQL Server), etc.

Now consider a table of salary reviews. Obviously, it would reference the
Payroll table i.e. you can't have your salary reviewed if you are not
receiving one. Also, the salary review date must bear relation to the
salary
dates: let's say can't have your salary reviewed before you started
receiving
it. In Jet SQL and especially so the Access user interface to be able
implement business rules with DRI (Relationships) and field- and
record-level
Validation Rules than table-level CHECK constraints e.g.

CREATE TABLE SalaryReviews (
employee_nbr INTEGER NOT NULL,
salary_start_date DATETIME NOT NULL,
salary_end_date DATETIME,
FOREIGN KEY (employee_nbr, salary_start_date, salary_end_date)
REFERENCES Payroll (employee_nbr, salary_start_date, salary_end_date),
salary_review_date DATETIME NOT NULL
);

The primary key would depend on whether this was a current state or
valid-time state ('history') table. Let's take the easier example of a
current state table i.e. only the current salary can be reviewed. With the
dates at hand, the validation rule (record-level) is easy to define:

salary_review_date > salary_start_date
AND salary_end_date IS NULL

I don't fancy typing out all those date literals after all <g> so what
about
a procedure to help 'migrate' the repeated data e.g.

CREATE PROCEDURE AddSalaryReview (
arg_employee_number INTEGER,
arg_salary_review_date DATETIME = DATE()
)
AS
INSERT INTO SalaryReviews (employee_nbr, salary_start_date,
salary_end_date, salary_review_date)
SELECT employee_nbr, salary_start_date,
salary_end_date, arg_salary_review_date
FROM Payroll
WHERE employee_nbr = arg_employee_number
AND salary_end_date IS NULL
;
EXECUTE AddSalaryReview 1
;

On the other hand, if a surrogate was used to reference the Payroll table,
the date would have to be looked up when checking the validation rule e.g.
resigned table to reference via the surrogate Payroll.ID

DROP TABLE SalaryReviews
;
DELETE FROM Payroll
;
ALTER TABLE Payroll
ADD COLUMN
ID INTEGER DEFAULT GenUniqueID() NOT NULL
PRIMARY KEY
;
CREATE TABLE SalaryReviews (
employee_nbr INTEGER NOT NULL PRIMARY KEY,
payroll_ID INTEGER NOT NULL UNIQUE,
FOREIGN KEY (payroll_ID)
REFERENCES Payroll,
salary_review_date DATETIME NOT NULL
)
;

Sure, this table is simpler because it has once less column (but the same
number of columns across the whole schema <g>) and of type INTEGER rather
than DATETIME, but take a look at our revised validation rule
(table-level):

salary_review_date > (
SELECT Payroll.salary_start_date
FROM Payroll AS P1
WHERE SalaryReviews.payroll_ID = Payroll.ID
)
AND EXISTS (
SELECT *
FROM Payroll
WHERE SalaryReviews.payroll_ID = Payroll.ID
AND Payroll.salary_end_date IS NULL
)

Obviously, this would need another table-level CHECK constraint. The
revised
helper proc would require a repetition of this logic as a correlated
subquery
to retrieve the payroll_ID of the current salary.

So in this example (contrived to fit the point, of course), the surrogate
provides less utility than the composite key and, arguably, complicates
the
design more than simplifies.

Jamie.

Table Design	4	Jul 24, 2008
Timesheet Banked hours	4	Sep 30, 2009
Table Design	2	Jun 15, 2005
Linking two tables - a bit different	4	Jun 4, 2010
Not In (sort of) query	1	Jun 12, 2008
Source data for many fields in a table from one field in another t	1	Jul 11, 2008
Table design	7	Jul 16, 2005
Database Design/Change Help Requested	2	Aug 21, 2008

Unique fields in table design

Guest

Jeff Boyce

Jamie Collins

Jeff Boyce

Jeff Boyce

Ask a Question

Similar Threads