Append query syntax requested

Jack Sheet · Sep 25, 2006

Hi all

I am a real newbie to Access (I don't even know how to insert a calculated
date into a record during an action query) and John Spencer gave me a lot of
help in the gettingstarted newsgroup on a similar problem. Perhaps this
question should be addressed there but I feel it is getting a bit more
complicated, and it is query-specific so I put it here. Apologies if in
error. (John, if you read this you will recognise the previous problem.)

I have a database with 2 tables: T_Clients and T_Tasks.

T_Clients has 3 fields:
ID_Clients = autonumber, primary key
YEM = number, integer, validation permits range >0 and <13 (representing a
month)
YED = number, integer, validation permits range >0 and <32 (representing a
day in a month)
(the user is currently trusted not to enter impossible values within those
ranges, ie 31 June or 30 February)

T_Tasks has 3 fields:
ID_Tasks = autonumber, primary key
ID_Clients = number, long integer
PeriodTo = date

There is a one (T_Clients.ID_Clients) to many (T_Tasks.ID_Clients)
relationship with enforced referencial integrity.

In my sample data, T_Clients has 3 records

ID_Clients YEM YED
20 4 30
21 5 31
22 6 30

T_Tasks has 3 records
ID_Tasks ID_Clients PeriodTo
11 20 2004-04-30
12 21 2005-05-31
13 22 2006-06-30

I am needing an Append query, that creates records in T_Tasks. I don't care
how many intermediate Select Queries I need to create to get to the final
destination.

User inputs a date (call it "RollDate") and for this example say the user
input date 2006-06-30 (I am flexible about where this needs to be stored, or
whether it is input on a prompt when the append query is run).

The query then adds records to T_Tasks, by considering each instance of
ID_Clients in T_Clients and, for each such entry it adds a new period in
T_Tasks if an appropriate period does not already exist in T_Tasks for that
ID_Clients value. An appropriate period is one that ends on the latest date
on or before the RollDate but bearing the month and day-of-the-month
particular to that value of ID_Clients as specified in T_Clients.YEM and
T_Clients.YED.

So, for the sample data given:

It should ignore ID_Clients = 22, because the year end for that client is
set to 30 June, the latest such date on or before rolldate (30 June 2006)
is, coincidentally, 30 June 2006 and there already exists a record with that
date in the PeriodTo field.

It should create one record for ID_Clients = 21, and populate the PeriodTo
field with 2006-05-31.

IDEALLY it should create two records for ID_Clients = 20, one for period
ended 30 April 2005 and one for period ended 30 April 2006. However, that
sounds very complicated, and I would be content (for now) with one that just
creates one field to 30 April 2006 (unless the more general solution is more
trivial than I expect).

Any help or pointers would be gratefully received.

Jack Sheet · Sep 25, 2006

Further to my earlier message I have made an attempt at this, which doesn't
yet work. At the moment I an concentrating on creating two Select queries.

One (the simpler problem) is to create a query that identifies all records
that require no updating.
I call this query Q_RecExists, and the query states

SELECT T_Tasks.ID_Tasks, T_Tasks.ID_Clients, T_Tasks.PeriodTo
FROM T_Clients INNER JOIN T_Tasks ON T_Clients.ID_Clients =
T_Tasks.ID_Clients
WHERE (((T_Tasks.PeriodTo)>#6/30/2005#));

For the time being I have hard-coded 30 June 2005, being one year prior to
the Rolldate, but that will eventually have to be replaced by a calculated
amount based on user input. Anyway, for the time being this query at least
works, on the existing data and for a proposed RollDate 30 June 2006.

The second query is intended to identify The records that require to be
appended, and this is where I run into problems. I call this query
Q_RecNotExists, and currently the query is saved as:

SELECT Q_RecExists.ID_Clients
FROM T_Clients AS Q_RecExists LEFT JOIN
[SELECT ID_Clients FROM T_Tasks WHERE [T_Tasks].[PeriodTo] =
#2005-06-30#].
AS Q_RecNotExists ON Q_RecExists.ID_Clients=Q_RecNotExists.ID_Clients
WHERE Q_RecNotExists.ID_Clients is NULL;

This does not work, and nor would I expect it to. It compiles, and runs,
but lists all records in T_Tasks.
The reason why I would not EXPECT it to work is that where I currently have
"= #2005-06-30#"
I thought that it should read "> #2005-06-30#"

But if I try to change the "=" to ">" then it will not save the query, and
tells me that there is a syntax error in the FROM clause.
So now I am stuck!

Jamie Collins · Sep 26, 2006

Jack said:
for the sample data given:

It should ignore ID_Clients = 22, because the year end for that client is
set to 30 June, the latest such date on or before rolldate (30 June 2006)
is, coincidentally, 30 June 2006 and there already exists a record with that
date in the PeriodTo field.

It should create one record for ID_Clients = 21, and populate the PeriodTo
field with 2006-05-31.

IDEALLY it should create two records for ID_Clients = 20, one for period
ended 30 April 2005 and one for period ended 30 April 2006. However, that
sounds very complicated, and I would be content (for now) with one that just
creates one field to 30 April 2006 (unless the more general solution is more
trivial than I expect).

Good detail with the spec.

Unlike you, I do not trust users to always enter good data <g> so I'll
include a test for valid combinations of YEM and YED (the below SQL
requires ANSI-92 query mode in Access or an ADO connection):

CREATE TABLE T_Clients (
ID_Clients INTEGER IDENTITY(1, 1) NOT NULL primary key,
YEM INTEGER DEFAULT 4 NOT NULL,
CHECK(YEM BETWEEN 1 AND 12),
YED INTEGER DEFAULT 5 NOT NULL,
CHECK(SWITCH(
YEM IN (1, 3, 5, 7, 8, 10, 12), YED BETWEEN 1 AND 31,
YEM IN (9, 4, 6, 11), YED BETWEEN 1 AND 30,
YEM = 2, YED BETWEEN 1 AND 29
))
)
;
INSERT INTO T_Clients (ID_Clients, YEM, YED) VALUES
(20, 4, 30)
;
INSERT INTO T_Clients (ID_Clients, YEM, YED) VALUES
(21, 5, 31)
;
INSERT INTO T_Clients (ID_Clients, YEM, YED) VALUES
(22, 6, 30)
;
CREATE TABLE T_Tasks (
ID_Tasks INTEGER IDENTITY(1,1) NOT NULL PRIMARY KEY,
ID_Clients INTEGER NOT NULL REFERENCES T_Clients (ID_Clients),
PeriodTo dateTIME NOT NULL,
UNIQUE (ID_Clients, PeriodTo)
)
;

Of course, the above table should have validation to ensure the
PeriodTo date matches the YEM and YED values in the referenced table,
handling 29th February:

ALTER TABLE T_Tasks ADD
CONSTRAINT task_periodto__client_YE__interaction
CHECK(EXISTS (
SELECT *
FROM T_Clients AS T1
WHERE T1.ID_Clients = T_Tasks.ID_Clients
AND T1.YEM = MONTH(T_Tasks.PeriodTo)
AND IIF(T1.YEM = 2 AND T1.YED = 29 AND
DAY(DATESERIAL(YEAR(T_Tasks.PeriodTo), T1.YEM, T1.YED)) = 1, 28,
T1.YED) = DAY(T_Tasks.PeriodTo)
))
;
INSERT INTO T_Tasks (ID_Tasks, ID_Clients, PeriodTo)
VALUES (11, 20, #2004-04-30#)
;
INSERT INTO T_Tasks (ID_Tasks, ID_Clients, PeriodTo)
VALUES (12, 21, #2005-05-31#)
;
INSERT INTO T_Tasks (ID_Tasks, ID_Clients, PeriodTo)
VALUES (13, 22, #2006-06-30#)
;

Business needs such as 'find the latest date' should suggest a Calendar
auxiliary table. Here's one for years 2004 to 2030 inclusive:

CREATE TABLE Calendar (
dt DATETIME PRIMARY KEY
)
;
INSERT INTO Calendar VALUES (#2004-01-01#)
;
INSERT INTO Calendar (dt) SELECT CDATE(Units.nbr + Tens.nbr +
Hundreds.nbr + Thousands.nbr + TenThousands.nbr) AS dt FROM (SELECT nbr
FROM (SELECT 0 AS nbr FROM Calendar UNION ALL SELECT 1 FROM Calendar
UNION ALL SELECT 2 FROM Calendar UNION ALL SELECT 3 FROM Calendar UNION
ALL SELECT 4 FROM Calendar UNION ALL SELECT 5 FROM Calendar UNION ALL
SELECT 6 FROM Calendar UNION ALL SELECT 7 FROM Calendar UNION ALL
SELECT 8 FROM Calendar UNION ALL SELECT 9 FROM Calendar) AS Digits) AS
Units, (SELECT nbr * 10 AS nbr FROM (SELECT 0 AS nbr FROM Calendar
UNION ALL SELECT 1 FROM Calendar UNION ALL SELECT 2 FROM Calendar UNION
ALL SELECT 3 FROM Calendar UNION ALL SELECT 4 FROM Calendar UNION ALL
SELECT 5 FROM Calendar UNION ALL SELECT 6 FROM Calendar UNION ALL
SELECT 7 FROM Calendar UNION ALL SELECT 8 FROM Calendar UNION ALL
SELECT 9 FROM Calendar) AS Digits) AS Tens, (SELECT nbr * 100 AS nbr
FROM (SELECT 0 AS nbr FROM Calendar UNION ALL SELECT 1 FROM Calendar
UNION ALL SELECT 2 FROM Calendar UNION ALL SELECT 3 FROM Calendar UNION
ALL SELECT 4 FROM Calendar UNION ALL SELECT 5 FROM Calendar UNION ALL
SELECT 6 FROM Calendar UNION ALL SELECT 7 FROM Calendar UNION ALL
SELECT 8 FROM Calendar UNION ALL SELECT 9 FROM Calendar) AS Digits ) AS
Hundreds, (SELECT nbr * 1000 AS nbr FROM (SELECT 0 AS nbr FROM Calendar
UNION ALL SELECT 1 FROM Calendar UNION ALL SELECT 2 FROM Calendar UNION
ALL SELECT 3 FROM Calendar UNION ALL SELECT 4 FROM Calendar UNION ALL
SELECT 5 FROM Calendar UNION ALL SELECT 6 FROM Calendar UNION ALL
SELECT 7 FROM Calendar UNION ALL SELECT 8 FROM Calendar UNION ALL
SELECT 9 FROM Calendar ) AS Digits) AS Thousands, (SELECT nbr * 10000
AS nbr FROM (SELECT 0 AS nbr FROM Calendar UNION ALL SELECT 1 FROM
Calendar UNION ALL SELECT 2 FROM Calendar UNION ALL SELECT 3 FROM
Calendar UNION ALL SELECT 4 FROM Calendar UNION ALL SELECT 5 FROM
Calendar UNION ALL SELECT 6 FROM Calendar UNION ALL SELECT 7 FROM
Calendar UNION ALL SELECT 8 FROM Calendar UNION ALL SELECT 9 FROM
Calendar) AS Digits) AS TenThousands WHERE Units.nbr + Tens.nbr +
Hundreds.nbr + Thousands.nbr + TenThousands.nbr BETWEEN
CLNG(#2004-01-02#) AND CLNG(#2030-12-31#)
;

To find the missing dates, we can 'reuse' the 29th February code
(demonstrating it pays to write effective constraints in the first
place <g>). I couldn't dudce which dates were/weren't required so
here's a query that returns them all, to get you started:

CREATE PROCEDURE GetTasksMissingPeriods (
roll_date DATETIME = DATE()
) AS
SELECT T1.ID_Clients, C1.dt AS PeriodTo
FROM T_Clients AS T1, Calendar AS C1
WHERE C1.dt <= roll_date
AND T1.YEM = MONTH(C1.dt)
AND IIF(T1.YEM = 2 AND T1.YED = 29 AND DAY(DATESERIAL(YEAR(C1.dt),
T1.YEM, T1.YED)) = 1, 28, T1.YED) = DAY(C1.dt)
AND NOT EXISTS (
SELECT *
FROM T_Tasks AS K1
WHERE T1.ID_Clients = K1.ID_Clients
AND C1.dt = K1.PeriodTo
)
ORDER BY T1.ID_Clients, C1.dt
;

If you only want the most recent, use a GROUP BY e.g.

SELECT T1.ID_Clients, MAX(C1.dt) AS PeriodTo
....
GROUP BY T1.ID_Clients

Now that I've proposed a solution, I feel I can point out a flaw in
your design: a period is modelled in SQL using start and end date pairs
and you are missing the PeriodFrom column.

For an example on how to create effective constraints (e.g. to prevent
overlapping periods) on a time-valid state ('history') table, see:

http://groups.google.com/group/microsoft.public.access.forms/msg/04c3f233ba3336cc

Jamie.

--

Jack Sheet · Sep 30, 2006

Jamie Collins said:
<Snip>

Now that I've proposed a solution, I feel I can point out a flaw in
your design: a period is modelled in SQL using start and end date pairs
and you are missing the PeriodFrom column.

Hi Jamie
I posted a thank you message a few days ago, but it did not propogate my
newsgroup reader, so I hope you got it.

As regards your comment about the absence of a PeriodFrom column, whilst I
have in fact inserted one I wonder whether its omission would really be a
flaw?

I am repeatedly advised not to hard-code data into fields where the data can
be calculated by reference to data already within the database. That is
what queries are for. The "PeriodFrom" date will be one day later than the
latest previous PeriodTo date for the same Client/Task. I can query for
that, so I don't need to hard code it. Or am I missing something?

Jamie Collins · Oct 2, 2006

Jack said:
I posted a thank you message a few days ago, but it did not propogate my
newsgroup reader, so I hope you got it.

It's not in the google group Usenet archive so I haven't seen it

I am repeatedly advised not to hard-code data into fields where the data can
be calculated by reference to data already within the database.

Yes, for some reason in the Access groups the rule of thumb appears to
be, 'Never use a calculated column.' Elsewhere in the SQL world there
is less of a knee-jerk reaction and a calculated column may be
acceptable 'when the cost of the calculation is higher than the cost of
a simple read'. See

Calculated Columns
by Joe Celko
http://www.dbazine.com/ofinterest/oi-articles/celko4/view?searchterm=celko
"You are not supposed to put a calculated column in a table in a pure
SQL database. And as the guardian of pure SQL, I should oppose this
practice. Too bad the real world is not as nice as the theoretical
world..."

As regards your comment about the absence of a PeriodFrom column, whilst I
have in fact inserted one I wonder whether its omission would really be a
flaw?

The "PeriodFrom" date will be one day later than the
latest previous PeriodTo date for the same Client/Task. I can query for
that, so I don't need to hard code it. Or am I missing something?

I don't think yours is a calculated column scenario. Rather, it's basic
modelling.

As you are probably aware, Access/Jet's DATETIME type is an instant and
is the only temporal type provided. To model a period, you send to use
two instants, one for the start and one for the end. There are various
representations: my aforementioned example uses the closed-closed
representation; yours I will examine later...

Straight away, your model has a flaw. If a table (Payroll) represent a
collection of entities (periods of time) then each row should represent
one entity. However, in your model the end date appear in another row
and, as you admit, you must use a query to retrieve it. Any time you
need a query to re-establish a basic fact, this should raise suspicions
that you have committed the flaw of 'attribute splitting'.

This flaw has practical consequences, beyond the overhead of the
reconstructive query. Consider this example:

CREATE TABLE Payroll (
employee_nbr INTEGER NOT NULL,
start_date DATETIME NOT NULL,
annual_salary_amount INTEGER NOT NULL,
PRIMARY KEY (employee_nbr, start_date)
)
;
INSERT INTO Payroll VALUES (1, #2001-01-01#, 11111)
;
INSERT INTO Payroll VALUES (1, #2002-01-01#, 22222)
;
INSERT INTO Payroll VALUES (1, #2003-01-01#, 33333)
;

To get the salary amount in effect on 2002-06-01, something like (based
on your description):

SELECT DT1.employee_nbr, DT1.annual_salary_amount,
DT1.start_date,
DATEVALUE(DT1.end_date_closed) AS end_date,
#2002-06-01# AS report_date
FROM (
SELECT T1.employee_nbr, T1.start_date, (
SELECT IIF(MIN(T2.start_date) IS NULL,
#9999-12-31 23:59:59#, DATEADD('s', -1, MIN(T2.start_date)))
FROM Payroll AS T2
WHERE employee_nbr = T2.employee_nbr
AND T1.start_date < T2.start_date
) AS end_date_closed,
T1.annual_salary_amount
FROM Payroll AS T1
) AS DT1
WHERE #2002-06-01# BETWEEN DT1.start_date AND DT1.end_date_closed

This returns the row (1, #2002-01-01#).

Now consider an application bug caused this to happen:

DELETE FROM Payroll
WHERE start_date = #2002-01-01#
;

We should now have no data for the period [#2002-01-01#] to
(#2002-01-01#). However, the same query happily returns some data
including a spurious salary amount. This is because the query, like the
model, is based on an assumption. Think about it: how would you define
a database constraint that would prevent the anomaly from arising? You
can't, of course. You simply do not have enough data. And before you
think, 'REVOKE DELETE ON Payroll FROM PUBLIC', consider that INSERT and
UPDATE anomalies are similarly unpreventable.

Rather than assuming, the end date should be explicit. In my earlier
example, the delete would fail because the constraint which disallows
non-contiguous periods would bite. In a properly designed valid-time
state ('history') table, INSERT, UPDATE and DELETE operations in the
general case ('in the past') require multiple SQL statements and would
be harder to create anomalies with a single typo.

When I say *my* example, I should point out it is based on the work of
Snodgrass in 'Developing Time-Oriented Database Applications in SQL',
which you can download for free
(http://www.cs.arizona.edu/people/rts/tdbbook.pdf). All I ask is that
you read up on periods in SQL in the book, ideally chapters 4,5 and 7
but even a few pages should do, and see if you still think that
modelling the end date in another row is acceptable

The "PeriodFrom" date will be one day later than the
latest previous PeriodTo date for the same Client/Task. I can query for
that, so I don't need to hard code it.

Is this the open-open representation? Care is needed here to avoid the
situation where you find an end date with a time element (e.g.
#2002-12-31 09:00:00#) falls in the 'no man's land' between periods.

Jamie.

--

Jack Sheet · Oct 2, 2006

Thank you very much, once again Jamie
I actually understood that!

Append query syntax requested

Jack Sheet

Jack Sheet

Jamie Collins

Jack Sheet

Jamie Collins

Jack Sheet