Help writing a high performance query

B

Bill Jaeck

I am looking for a high performance solution that joins tables and returns
some cost information for employees whose rates change over time. Surely
this problem has been solved many times before, but I have yet to come up
with a solution that I am happy with.



I have the following tables:



Table: Employees - contains employee names and their EmployeeIDs.

EmployeeID
Name

1
Jack Jones

2
John Doe

Etc.
Etc.




Table: Hours - For each day that each Employee works, a row is written to
this table which specifies who worked when and for how many hours.

EmployeeID
Date
Hours

1
1/1/2003
2

1
2/2/2003
3

1
3/3/2003
4

2



3



Etc.






Table: Rates - Each Employee works for a different hourly rate, and the
rates change over time. This table contains the rates for each employee over
time. The effective date shows the first day that the rate is in effect. The
rate is in effect for the employee until a new rate with a more recent
EffectiveDate for that employee is entered into the table. So employee 1
worked for $1.00 per hour in January, $2.00 per hour in February, and for
$3.00 ever since March of 2003.

EmployeeID
EffectiveDate
Rate

1
1/1/2003
$1.00

1
2/1/2003
$2.00

1
3/1/2003
$3.00

2



3



Etc.






Here is the problem. I want to create a "Costs" query which combines the
rows in the Hours table with the effective rate for the employee on the date
the hours were worked so that a net cost can be calculated for the work done
each day by the employee, e.g.

Query: Costs

EmployeeID
Date
Hours
Rate
Cost

1
1/1/2003
2
$1.00
$2.00

1
2/1/2003
3
$2.00
$6.00

1
3/1/2003
4
$3.00
$12.00

2





3





etc








My Hours table is BIG. Obviously, the Cost is the Hours X Rate. But looking
up the correct cost for the date in the Rates table is what I am having a
hard time with. I realize that I could write a VBA function to figure out
the rate (e.g. a function for the Rate column called
RateForEmployee(argEmployeeID as Long, argDate as Date) as Currency). But
that function has to build an SQL string and then open a recordset for each
row in my Hours table, and that takes a long time because there are tens of
thousands of rows in my Hours table.



Does anyone know how I could build this Costs query without having to write
a function in VBA to obtain the Rate to use on the date the hours were
worked? I am looking for a high performance solution.
 
J

Jamie Collins

Bill said:
I am looking for a high performance solution that joins tables and returns
some cost information for employees whose rates change over time. Surely
this problem has been solved many times before, but I have yet to come up
with a solution that I am happy with.

Do you persist rates daily, as your scant sample data suggests? e.g.
(without obvious validation rules, PRIMARY KEY syntax avoided, etc):

CREATE TABLE Employees (
EmployeeID INTEGER NOT NULL UNIQUE,
EmployeeName VARCHAR(35) NOT NULL UNIQUE)
;
CREATE TABLE Rates (
EmployeeID INTEGER NOT NULL
REFERENCES Employees (EmployeeID),
EffectiveDate DATETIME NOT NULL,
UNIQUE (EmployeeID, EffectiveDate),
rate_amount DECIMAL(10, 4) NOT NULL)
;
CREATE TABLE Hours (
EmployeeID INTEGER NOT NULL,
EffectiveDate DATETIME NOT NULL,
UNIQUE (EmployeeID, EffectiveDate),
FOREIGN KEY (EmployeeID, EffectiveDate)
REFERENCES Rates (EmployeeID, EffectiveDate),
hours_amount DECIMAL(4, 2) NOT NULL)
;

I tested a 260 workday year with a daily rate amount and hour amount
for each employee:

SELECT E1.EmployeeID, E1.EmployeeName,
R1.rate_amount, H1.EffectiveDate,
H1.hours_amount, R1.rate_amount * H1.hours_amount AS income
FROM (Rates AS R1 INNER JOIN Hours AS H1
ON R1.EmployeeID = H1.EmployeeID
AND R1.EffectiveDate = H1.EffectiveDate)
INNER JOIN Employees AS E1
ON H1.EmployeeID = E1.EmployeeID;

For 50 employees (13K rows in Hours and Rates each) the query takes
about 0.2 seconds, for 250 employees (65K rows in Hours and Rates each)
the query takes about 1.2 seconds. Is this acceptable performance?

It's an obvious one but if you have extraneous data (e.g. going back
years and you are only usually interested in the current year) consider
moving it to archive tables.

Maybe you have a 'history' table for rates, where you have used a
single EffectiveDate column rather than the conventional start_date and
end_date pairs on each row. This design flaw adds overhead to your
queries in finding the 'next' effective date to determine when the
'current' period ends. Correcting:

CREATE TABLE RatesHistory (
EmployeeID INTEGER NOT NULL
REFERENCES Employees (EmployeeID),
rate_start_date DATETIME NOT NULL,
rate_end_date DATETIME DEFAULT '9999-12-31 23:59:59' NOT NULL,
UNIQUE (EmployeeID, rate_start_date),
UNIQUE (EmployeeID, rate_end_date),
UNIQUE (EmployeeID, rate_start_date, rate_end_date),
rate_amount DECIMAL(10, 4) NOT NULL);

The maximum date value is being used to model the current period.
Again, obvious validation rules (e.g. start _date < end_date) are
omitted. Most crucially, the above table lacks a key because although
the UNIQUE constraints represent candidate keys, they still don't
prevent overlapping periods, including the current period. The only way
of implementing a primary key in a history table in Access/Jet is to
use table-level CHECK constraints and they can be a bit complex - this
is why I've omitted them! For a more detailed example, see:

http://groups.google.com/group/microsoft.public.access.forms/msg/04c3f233ba3336cc

To get daily rates using the history table, you can use the standard
trick of joining to a Calendar table e.g.

SELECT C1.dt, R1.rate_amount
FROM RatesHistory AS R1, Calendar AS C1
WHERE C1.dt
BETWEEN R1.rate_start_date AND R1.rate_end_date;

Moving on to the corrected Hours table, how do you ensure there is a
rate in effect? More table-level CHECK constraints is one way but
denormalization to enable use of simple referential integrity and
validation rules could be a legitimate choice:

CREATE TABLE Hours (
EmployeeID INTEGER NOT NULL,
rate_start_date DATETIME NOT NULL,
rate_end_date DATETIME NOT NULL,
FOREIGN KEY (EmployeeID, rate_start_date, rate_end_date)
REFERENCES RatesHistory (EmployeeID, rate_start_date, rate_end_date),
effective_date DATETIME NOT NULL,
CHECK (effective_date BETWEEN rate_start_date AND rate_end_date),
UNIQUE (EmployeeID, effective_date),
hours_amount DECIMAL(4, 2) NOT NULL);

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top