Using select query results as filter for append query

Ray S.

I have made a table that will hold large amounts of financial data obtained
from a mainframe. I saved the make table query, although I only needed to use
it to make the initial table. One of the fields is a date_posted. Then, I
made a select query on the main table initially created containing only one
field and row: the max date_posted. Now, what I want to do is not remake the
main table every time I want to update the data, but rather append to it only
the records that have a later date_posted than the last run. It seems to me
that I could use the results of the last max date_posted query as a filter on
an append query for the main table, but I'm not exactly sure how to do this.
I'd like this all to run smoothly in code or from a macro in one step without
having to copy and paste the select query results into the append query.
 
Tom van Stiphout

On Thu, 6 Nov 2008 04:50:00 -0800, Ray S. wrote:

Yes you can, by using a subquery. Here is the main idea:
select * from Table1
where date_posted > (select Max(date_posted) from Table2)

-Tom.
Microsoft Access MVP
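Tom's pattern can be sketched outside Access with SQLite from Python; the
table and column names here are illustrative, not Ray's actual ones:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# main holds rows already loaded; staging holds the fresh mainframe extract
cur.execute("CREATE TABLE main (date_posted TEXT, amount REAL)")
cur.execute("CREATE TABLE staging (date_posted TEXT, amount REAL)")
cur.executemany("INSERT INTO main VALUES (?, ?)",
                [("2008-10-01", 100.0), ("2008-10-15", 250.0)])
cur.executemany("INSERT INTO staging VALUES (?, ?)",
                [("2008-10-15", 250.0),   # already present -> filtered out
                 ("2008-11-01", 300.0),   # new -> appended
                 ("2008-11-06", 425.0)])  # new -> appended

# Append only rows later than the latest date already in main
cur.execute("""
    INSERT INTO main
    SELECT * FROM staging
    WHERE date_posted > (SELECT MAX(date_posted) FROM main)
""")
con.commit()
print(cur.execute("SELECT COUNT(*) FROM main").fetchone()[0])  # -> 4
```

Because the subquery is re-evaluated on every run, the append stays
incremental with no saved "last date" anywhere.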
 
Ken Sheridan

You can use a subquery to restrict the outer query to those rows whose
date_posted value is greater than the MAX(date_posted) value in the main
table. This is not the same as greater than the last 'run', as when the
query was last executed the latest date could have been earlier than the date
of execution, but that should not be relevant. The query would be along
these lines:

INSERT INTO [Main Table]
SELECT *
FROM [Imported Table]
WHERE [date_posted] >
(SELECT NZ(MAX([date_posted]),0)
FROM [Main Table]);

This assumes all columns are being inserted. If not use field lists in both
the INSERT and the SELECT clause of the outer query.

Calling the NZ function should not be necessary in your case, in fact, as
it's only there to cater for the case where Main Table is empty, but it does
no harm to leave it in.
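The empty-table case can be illustrated in SQLite from Python, where COALESCE
plays the role of Access's NZ (all names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE main_table (date_posted TEXT)")
cur.execute("CREATE TABLE imported_table (date_posted TEXT)")
cur.executemany("INSERT INTO imported_table VALUES (?)",
                [("2008-11-01",), ("2008-11-06",)])

# MAX over an empty table yields NULL, and any comparison with NULL is not
# true, so without a fallback nothing would ever be appended on the very
# first run.  COALESCE(..., '') supplies a floor value, much as NZ(..., 0)
# does in Access.
cur.execute("""
    INSERT INTO main_table
    SELECT * FROM imported_table
    WHERE date_posted > (SELECT COALESCE(MAX(date_posted), '') FROM main_table)
""")
print(cur.execute("SELECT COUNT(*) FROM main_table").fetchone()[0])  # -> 2
```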

Ken Sheridan
Stafford, England
 
Ray S.

I tried this, but there's a problem. The select query that pulls the max
date posted pulls it out of the same table I'm appending to. Let's say
Main_Table is the one I'm appending the new data to. What I want to do is
select the last posting date of the data I have just pulled in with the
append query. I then want to incorporate that last date (right now I'm using
a make table query to store that max date; let's call that table
Max_date_Table) into my append query as the filter for date_posted in the
newly appended rows. What I get when I use your approach is a dialog box
asking me for the max_post_date.

 
Ken Sheridan

I can't see a problem, nor why you are creating the Max_date_Table at all; by
doing so you are merely introducing redundancy and the consequent risk of
inconsistent data. The latest date of rows so far appended is available from
the table into which you are appending rows with the INSERT statement. The
subquery returns this date and restricts the rows returned by the outer query
to those with values later than that date. Consequently only rows later than
that date are appended, which is what you said you wanted in your original
post.

If you insist on using Max_date_Table, then all you have to do is use that in
the subquery rather than the main table; you won't need to call the MAX
function, of course. But you'll have to update Max_date_Table every time so
that its value represents the latest date of rows already appended, whereas
the approach Tom and I described dynamically identifies the current latest
date every time the query is executed.

In fact, if you do use Max_date_Table you don't need a subquery at all; you
can join the tables:

INSERT INTO [Main Table]
SELECT [Imported Table].*
FROM [Imported Table], [Max_date_Table]
WHERE [Imported Table].[date_posted] > [Max_date_Table].[max_post_date];
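This join variant can likewise be sketched in SQLite from Python, with a
one-row date table standing in for Max_date_Table (all names illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE main_table (date_posted TEXT, amount REAL)")
cur.execute("CREATE TABLE imported_table (date_posted TEXT, amount REAL)")
cur.execute("CREATE TABLE max_date_table (max_post_date TEXT)")
cur.execute("INSERT INTO max_date_table VALUES ('2008-10-15')")
cur.executemany("INSERT INTO imported_table VALUES (?, ?)",
                [("2008-10-15", 250.0), ("2008-11-01", 300.0)])

# Cross join against the single-row date table: no subquery and no MAX,
# but max_date_table must be kept up to date between runs
cur.execute("""
    INSERT INTO main_table
    SELECT imported_table.*
    FROM imported_table, max_date_table
    WHERE imported_table.date_posted > max_date_table.max_post_date
""")
print(cur.execute("SELECT COUNT(*) FROM main_table").fetchone()[0])  # -> 1
```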

Ken Sheridan
Stafford, England

 
Ray S.

I really don't want to have to create a second table, but then I don't
clearly understand your instructions. I have a table that was created to hold
the updated data. One field in that table is date_posted. I would now like to
have just one query run each time I update the data. I don't want to
re-create all the data because it is quite large. So I just want to use the
last date_posted value from the data already collected in the table (before
the update) to filter the data from the mainframe that will update my local
table. I know that I can determine what the last date_posted was. I know that
I could take that date and use it to filter the data from the mainframe so I
avoid re-creating the entire table. I'm just not clear on how to "automate"
what I am doing. As I mentioned, I tried to follow your instructions, but all
I got was a dialog asking me to put in the max_date_posted.

 
Ken Sheridan

That's exactly what the query Tom and I gave you should do. All you need to
do is run it at any time and it will append just those rows which have dates
later than the last one you appended.

The dialogue you are getting is a parameter prompt which means that Access
doesn't recognize the reference to the max_date_posted column. This is
presumably because that column only exists in the table you created to store
the latest date. That table isn't necessary at all to achieve what you want.

Let's take the query step by step:

INSERT INTO [Main Table]

In this line, [Main Table] is the table to which you are appending the rows.

SELECT [Imported Table].*

In this line, [Imported Table] is the table from which you are getting the
rows to be imported into [Main Table].

WHERE [date_posted] >

This restricts the rows inserted to those in Imported Table with a date in a
column date_posted later than:

(SELECT NZ(MAX([date_posted]),0)
FROM [Main Table])

This is a subquery which returns the latest value in the column date_posted
among rows *already in* the table into which you are inserting the new rows.

So the end result is that only those rows with dates later than that
returned by the subquery are inserted, i.e. rows with dates *not yet in* Main
Table, which from my reading of your original post is what you want. If I've
misunderstood, and this is not what's required, perhaps you could explain it
in words of one syllable which even an Irish bogtrotter like me can
understand.

Ken Sheridan
Stafford, England

 
Ray S.

OK, as I follow your explanation line by line, I note your reference to the
"Imported Table."

Now, nothing is as easy as it seems. I'm actually selecting my data from a
join of two tables: one is what you're calling the Imported_Table, and the
other is a table with some important dimension structure points; let's call
it the Dimension_Table. The Imported_Table part of the query also has other
criteria: currency_code Like "USD", dimension_id Like "cc_by_mgmt_org", and
amt_type_class In ('ACCRUAL','CASH','GAAP','STATUTORY').

When I add the condition that date_posted (that is, of the Imported_Table) is
> Nz(Max(date_posted),0) (that is, of the Main_Table),

Access returns an error telling me I can't have an aggregate function in my
WHERE clause. I take that to mean that I can't use the Max function in this
way.
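SQLite enforces the same rule, which makes the contrast easy to demonstrate
from Python: a bare aggregate in the WHERE clause is rejected, while the same
MAX wrapped in a subquery is legal (all names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE main_table (date_posted TEXT)")
cur.execute("CREATE TABLE imported_table (date_posted TEXT)")
cur.execute("INSERT INTO main_table VALUES ('2008-10-15')")
cur.execute("INSERT INTO imported_table VALUES ('2008-11-01')")

# SQLite, like Access, rejects a bare aggregate in a WHERE clause
try:
    cur.execute("""SELECT * FROM imported_table
                   WHERE date_posted > MAX(date_posted)""")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Wrapped in a subquery, the same MAX is perfectly legal
rows = cur.execute("""
    SELECT * FROM imported_table
    WHERE date_posted > (SELECT MAX(date_posted) FROM main_table)
""").fetchall()
print(rows)  # -> [('2008-11-01',)]
```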

 
Ray S.

Here's the actual query with all its long actual table names and fields, in
case it helps:

SELECT DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LINE_LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_DESC,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.EFF_DATE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PERIOD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.ACCOUNT_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PRIMARY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.BOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TYPE_CODE_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_1,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_2,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_3,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_YR,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_PD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_AMT,
DM_LEDGER_DIMHIER_COST_CENTER_ID.LEVEL18_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED
FROM tbl_CY_PJL_CC_FILTER INNER JOIN
(DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE INNER JOIN
DM_LEDGER_DIMHIER_COST_CENTER_ID ON
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID =
DM_LEDGER_DIMHIER_COST_CENTER_ID.HIERARCHY_ID) ON
tbl_CY_PJL_CC_FILTER.COST_CENTER_ID =
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID
WHERE (((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE) In
('ACCRUAL','CASH','GAAP','STATUTORY')) AND
((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED)>Max([tbl_CY_PJL_CC_FILTER]![DATE_POSTED]))
AND ((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_CURR_CODE)='USD') AND
((DM_LEDGER_DIMHIER_COST_CENTER_ID.DIMENSION_ID)='Cost_Centers_by_Management_Org'));

 
K

Ken Sheridan

The fact that it’s a query is immaterial; the result set of a query is a
table and can be used just as if it were a base table.

As far as the error is concerned, it sounds like you are not using a subquery
at all but are just calling the MAX function in a WHERE clause; the MAX
function is called in the subquery's SELECT clause, and it's the subquery as
a whole which goes in the outer query's WHERE clause. If all columns returned
by the query are to be inserted into the target table, then really all you
should need to do is change the table names in the SQL statement I gave you
for the 'append' query to (a) the name of the query and (b) the name of the
table into which the rows are being inserted. Let's assume the query is
called SourceQuery and the table is called TargetTable, in which case the
'append' query is:

INSERT INTO [TargetTable]
SELECT *
FROM [SourceQuery]
WHERE [date_posted] >
(SELECT NZ(MAX([date_posted]),0)
FROM [TargetTable]);

If only a subset of the query's columns is to be inserted then you'd need to
specify these both in the INSERT clause (after the table name) and in the
outer query's SELECT clause (in place of the asterisk). In the former the
column list is enclosed within parentheses; in the latter it's just a
comma-separated list. Both lists must match column for column, but the names
of the columns don't necessarily have to be the same in each.
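
A minimal sketch of that column-list form, again in SQLite through Python's
sqlite3 rather than Access (the names TargetTable, SourceQuery, posted,
amount, date_posted and trans_amt are hypothetical): the INSERT clause lists
the target columns in parentheses, the SELECT clause gives a matching
comma-separated list, and the names differ between the two.

```python
# Sketch of an INSERT ... SELECT with explicit, position-matched column
# lists, in SQLite via sqlite3. All names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE TargetTable (posted TEXT, amount REAL)")
cur.execute(
    "CREATE TABLE SourceQuery (date_posted TEXT, trans_amt REAL, extra TEXT)")
cur.execute("INSERT INTO SourceQuery VALUES ('2008-11-06', 100.0, 'ignored')")

# Only two of the source's three columns are inserted; the two lists
# match position for position, and the names need not be the same.
cur.execute("""
    INSERT INTO TargetTable (posted, amount)
    SELECT date_posted, trans_amt
    FROM SourceQuery
""")
print(cur.execute("SELECT posted, amount FROM TargetTable").fetchone())
# ('2008-11-06', 100.0)
```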

Ken Sheridan
Stafford, England

Ray S. said:
OK, as I follow your explanation line by line, I note your reference to the
"Imported Table."

Now, nothing is as easy as it seems. I'm actually selecting my data from a
join of two tables - one is what you're calling the Imported_Table, and the
other is a table with some important dimension structure points - let's call
it the Dimension_Table. The actual Imported Table query part has other
criteria...there is a where currency_code is like "USD" and a dimension_id is
like "cc_by_mgmt_org" and amt_type_class is
in('ACCRUAL','CASH','GAAP','STATUTORY')

when I add the where date_posted (that is, of the Imported_Table) is >
Nz(max(date_posted),0) (that is, of the Main_Table)

Access returns an error telling me I can't have an aggregate function in my
where clause. I take that to mean that I can't use the max function in this
way.

Here's the actual query with all its long actual table names and fields, if
it helps:

SELECT DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LINE_LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_DESC,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.EFF_DATE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PERIOD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.ACCOUNT_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PRIMARY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.BOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TYPE_CODE_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_1,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_2,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_3,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_YR,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_PD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_AMT,
DM_LEDGER_DIMHIER_COST_CENTER_ID.LEVEL18_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED
FROM tbl_CY_PJL_CC_FILTER INNER JOIN
(DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE INNER JOIN
DM_LEDGER_DIMHIER_COST_CENTER_ID ON
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID =
DM_LEDGER_DIMHIER_COST_CENTER_ID.HIERARCHY_ID) ON
tbl_CY_PJL_CC_FILTER.COST_CENTER_ID =
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID
WHERE (((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE) In
('ACCRUAL','CASH','GAAP','STATUTORY')) AND
((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED)>Max([tbl_CY_PJL_CC_FILTER]![DATE_POSTED]))
AND ((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_CURR_CODE)='USD') AND
((DM_LEDGER_DIMHIER_COST_CENTER_ID.DIMENSION_ID)='Cost_Centers_by_Management_Org'));

Ken Sheridan said:
That's exactly what the query Tom and I gave you should do. All you need to
do is run it at any time and it will append just those rows which have dates
later than the last one you appended.

The dialogue you are getting is a parameter prompt which means that Access
doesn't recognize the reference to the max_date_posted column. This is
presumably because that column only exists in the table you created to store
the latest date. That table isn't necessary at all to achieve what you want.

Let's take the query step by step:

INSERT INTO [Main Table]

In this line, Main Table is the table to which you are appending the rows.

SELECT [Imported Table].*

In this line, Imported Table is the table from which you are getting the
rows to be imported into Main Table.

WHERE [date_posted] >

This restricts the rows inserted to those in Imported Table with a date in a
column date_posted later than:

(SELECT NZ(MAX([date_posted]),0)
FROM [Main Table])

which is a subquery which returns the latest value in a column date_posted
in rows *already in* the table in which you are inserting the new rows.

So the end result is that only those rows with dates later than that
returned by the subquery are inserted, i.e. rows with dates *not yet in* Main
Table, which from my reading of your original post is what you want. If I've
misunderstood, and this is not what's required, perhaps you could explain it
in words of one syllable which even an Irish bogtrotter like me can
understand.
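
The subquery pattern walked through above can be demonstrated outside Access
as well; the sketch below uses SQLite via Python's sqlite3 with hypothetical
names (main_table, imported_table, date_posted). SQLite has no NZ(), so
COALESCE stands in for the empty-table case.

```python
# The subquery-filtered append pattern, in SQLite via sqlite3.
# COALESCE plays the role of Access's NZ(); names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE main_table (id INTEGER, date_posted TEXT)")
cur.execute("CREATE TABLE imported_table (id INTEGER, date_posted TEXT)")
cur.executemany("INSERT INTO imported_table VALUES (?, ?)",
                [(1, "2008-10-01"), (2, "2008-10-20")])

append_sql = """
    INSERT INTO main_table
    SELECT * FROM imported_table
    WHERE date_posted >
        (SELECT COALESCE(MAX(date_posted), '') FROM main_table)
"""
# First run: main_table is empty, COALESCE yields '', all rows qualify.
cur.execute(append_sql)
# Second run: the subquery now returns '2008-10-20', so no imported row
# qualifies and nothing is duplicated.
cur.execute(append_sql)
print(cur.execute("SELECT COUNT(*) FROM main_table").fetchone()[0])  # 2
```

Running the same append twice leaves the table unchanged the second time,
which is exactly the "only rows with dates not yet in Main Table" behaviour
described above.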

Ken Sheridan
Stafford, England

Ray S. said:
I really don't want to have to create a second table, but I don't clearly
understand your instructions, then. I have a table that was created to hold
the updated data. One field in that file is the date_posted. I would like to
now have just one query run each time I update the data. I don't want to
re-create all the data because it is quite large. So, I just want to use the
last date_posted field from the collected data in the table (before update)
to filter the data from the mainframe that will update my local file. I know
that I can determine what the last date_posted was. I know that I could take
that date and use it to filter the data from the mainframe so I avoid
re-creating the entire file. I'm just not clear on how to "automate" what I
am doing. As I mentioned, I tried to follow your instructions, but all I got
was a dialog asking me to put in the max_date_posted.

 
R

Ray S.

OK, I tried this.

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM SELECT_QRY_MAX_DP
WHERE (((SELECT_QRY_MAX_DP.DATE_POSTED)>(SELECT NZ(MAX([DATE_POSTED]),0)
FROM [tbl_CY_PJL_CC_FILTER])));

It seems to me that I'm following your instructions. There's a table into
which the updated data will go. That's "tbl_CY_PJL_CC_FILTER." I created a
select query that pulls all the fields I want from the mainframe source. It's
"SELECT_QRY_MAX_DP." Now the append query form I'm using here follows your
instructions to the letter. The problem is that this query is now running for
over an hour and has returned nothing. Eventually my connection to the
mainframe will be killed.


 
K

Ken Sheridan

The SQL statement in principle should be efficient as there is no correlation
between the outer and inner queries.

You originally said that you were using a 'make table' query, which I assume
SELECT_QRY_MAX_DP reflects in terms of the rows and columns returned, i.e. it
returns all rows, including those from before the last import. You've not
mentioned any problems with that operation. The current performance problem
must therefore arise from the restriction on SELECT_QRY_MAX_DP's DATE_POSTED
column.

You may have to break it down into stages as follows:

1. Append all the rows from your mainframe source into an Access table with:

INSERT INTO tblTemp
SELECT *
FROM SELECT_QRY_MAX_DP;

2. Append the latest rows from tblTemp with:

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM tblTemp
WHERE date_posted >
(SELECT NZ(MAX([date_posted]),0)
FROM tbl_CY_PJL_CC_FILTER);

3. Empty tblTemp with:

DELETE *
FROM tblTemp;

All this can easily be automated as a single operation by executing the
three queries successively in code or a macro. Before doing so, however,
you will need to create tblTemp, which can very easily be done by copying
tbl_CY_PJL_CC_FILTER in the database window and then pasting its 'structure
only' in as tblTemp. Make sure the date_posted column in each is indexed.
The index will presumably be 'duplicates OK', as I assume the values in this
column are not distinct.

By breaking it down like this you are only connecting with the mainframe for
the first step. The second and third steps are entirely local and should not
suffer from the fact that data from the mainframe is the basis of the
restriction which governs the rows inserted, which seems to be the root of
the current poor performance.
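
The three-stage approach can be sketched end to end in SQLite via Python's
sqlite3 (names such as tbl_filter and tbl_temp are hypothetical stand-ins,
and a local list of rows stands in for the mainframe pull):

```python
# Sketch of the three-stage import: (1) bulk-copy the remote pull into a
# local staging table, (2) append only the later-dated rows locally,
# (3) empty the staging table. Names are hypothetical stand-ins.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tbl_filter (id INTEGER, date_posted TEXT)")
cur.execute("CREATE TABLE tbl_temp (id INTEGER, date_posted TEXT)")
# An index on the filter column keeps step 2 cheap, as advised above.
cur.execute("CREATE INDEX ix_temp_dp ON tbl_temp (date_posted)")
cur.execute("INSERT INTO tbl_filter VALUES (1, '2008-10-01')")

# Step 1: stand-in for pulling everything from the mainframe source.
source_rows = [(1, "2008-10-01"), (2, "2008-10-20"), (3, "2008-10-31")]
cur.executemany("INSERT INTO tbl_temp VALUES (?, ?)", source_rows)

# Step 2: purely local append of the later-dated rows only.
cur.execute("""
    INSERT INTO tbl_filter
    SELECT * FROM tbl_temp
    WHERE date_posted >
        (SELECT COALESCE(MAX(date_posted), '') FROM tbl_filter)
""")

# Step 3: empty the staging table for the next run.
cur.execute("DELETE FROM tbl_temp")
con.commit()

print(cur.execute("SELECT COUNT(*) FROM tbl_filter").fetchone()[0])  # 3
print(cur.execute("SELECT COUNT(*) FROM tbl_temp").fetchone()[0])    # 0
```

The date comparison in step 2 touches only local tables, which is the point
of the staging table: the remote connection is needed for step 1 alone.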

However, this solution is predicated on:

1. the need to edit the data after it's pulled from the mainframe. Or

2. the data on the mainframe might be edited or deleted after you've
imported it, but you want the local Access table to keep the values from when
the data was imported. Or

3. a combination of 1 and 2.

If none of these is the case and the imported data will always be the same
as that currently on the mainframe, then there is no need to identify when
the data was last imported and you might as well simply empty
tbl_CY_PJL_CC_FILTER with:

DELETE *
FROM tbl_CY_PJL_CC_FILTER;

and then refill it with:

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM SELECT_QRY_MAX_DP;

Just to be absolutely clear here: SELECT_QRY_MAX_DP is a straightforward
'SELECT' equivalent of your original make table query and returns all rows
from the mainframe source tables or files.

Ken Sheridan
Stafford, England

Ray S. said:
OK, I tried this.

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM SELECT_QRY_MAX_DP
WHERE (((SELECT_QRY_MAX_DP.DATE_POSTED)>(SELECT NZ(MAX([DATE_POSTED]),0)
FROM [tbl_CY_PJL_CC_FILTER])));

It seems to me that I'm following your instructions. There's a table into
which the updated data will go. That's "tbl_CY_PJL_CC_FILTER." I created a
select query that pulls all the fields I want from the mainframe source. It's
"SELECT_QRY_MAX_DP." Now the append query form I'm using here follows your
instructions to the letter. The problem is that this query is now running for
over an hour and has returned nothing. Eventually my connection to the
mainframe will be killed.


Ken Sheridan said:
The fact that it’s a query is immaterial; the result set of a query is a
table and can be used just as if it were a base table.

As far as the error is concerned it sounds like you are not using subquery
at all but are just calling MAX function in a WHERE clause; the MAX function
is called in the subquery's SELECT clause and it’s the subquery as a whole
which goes in the outer query's WHERE clause. If all columns returned by the
query are to be inserted into the target table the really all you should need
to do is change the table names in the SQL statement I gave you for the
'append' query to (a) the name of the query and (b) the name of the table
into which the rows are being inserted. Lets assume the query is called
SourceQuery and the table is called TargetTable, in which case the 'append'
query is:

INSERT INTO [TargetTable]
SELECT *
FROM [SourceQuery]
WHERE [date_posted] >
(SELECT NZ(MAX([date_posted]),0)
FROM [TargetTable]);

If its only a subset of the query's columns are to be inserted then you'd
need to specify these both in the INSERT clause (after the table name) and in
the outer query's SELECT clause (in place of the asterisk). In the former
the column list is enclosed within parentheses, in the latter its just a
comma separated list. Both lists must match column for column, but the names
of the columns don't necessarily have to be the same in each.

Ken Sheridan
Stafford, England

Ray S. said:
OK, as I follow your explanation line by line, I note your reference to the
"Imported Table."

Now, nothing is as easy as it seems. I'm actually selecting my data from a
join of two tables - one is what you're calling the Imported_Table, and the
other is a table with some important dimension structure points - let's call
it the Dimension_Table. The actual Imported Table query part has other
criteria...there is a where currency_code is like "USD" and a dimension_id is
like "cc_by_mgmt_org" and amt_type_class is
in('ACCRUAL','CASH','GAAP','STATUTORY')

when I add the where date_posted (that is, of the Imported_Table) is >
Nz(max(date_posted),0) (that is, of the Main_Table)

Access returns an error telling me I can't have an aggregate function in my
where clause. I take that to mean that I can't use the max function in this
way.

Here's the actual query with all it's long actual table names and fields, if
it helps:

SELECT DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LINE_LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_DESC,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.EFF_DATE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PERIOD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.ACCOUNT_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PRIMARY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.BOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TYPE_CODE_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_1,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_2,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_3,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_YR,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_PD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_AMT,
DM_LEDGER_DIMHIER_COST_CENTER_ID.LEVEL18_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED
FROM tbl_CY_PJL_CC_FILTER INNER JOIN
(DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE INNER JOIN
DM_LEDGER_DIMHIER_COST_CENTER_ID ON
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID =
DM_LEDGER_DIMHIER_COST_CENTER_ID.HIERARCHY_ID) ON
tbl_CY_PJL_CC_FILTER.COST_CENTER_ID =
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID
WHERE (((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE) In
('ACCRUAL','CASH','GAAP','STATUTORY')) AND
((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED)>Max([tbl_CY_PJL_CC_FILTER]![DATE_POSTED]))
AND ((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_CURR_CODE)='USD') AND
((DM_LEDGER_DIMHIER_COST_CENTER_ID.DIMENSION_ID)='Cost_Centers_by_Management_Org'));

:

That's exactly what the query Tom and I gave you should do. All you need to
do is run it at any time and it will append just those rows which have dates
later than the last one you appended.

The dialogue you are getting is a parameter prompt which means that Access
doesn't recognize the reference to the max_date_posted column. This is
presumably because that column only exists in the table you created to store
the latest date. That table isn't necessary at all to achieve what you want.

Lets take the query step by step:

INSERT INTO [Main Table]

In this line Main table is the table to which you are appending the rows.

SELECT [Imported Table].*

In this line Imported table is the table from which you are getting the rows
to be imported into Main Table.

WHERE [date_posted] >

This restricts the rows inserted to those in Imported Table with a date in a
column date_posted later than:

(SELECT NZ(MAX([date_posted]),0)
FROM [Main Table])

which is a subquery which returns the latest value in a column date_posted
in rows *already in* the table in which you are inserting the new rows.

So the end result is that only those rows with dates later than that
returned by the subquery are inserted, i.e. rows with dates *not yet in* Main
Table, which from my reading of your original post is what you want. If I've
misunderstood, and this is not what's required, perhaps you could explain it
in words of one syllable which even an Irish bogtrotter like me can
understand.

Ken Sheridan
Stafford, England

:

I really don't want to have to create a second table, but I don't clearly
understand your instructions, then. I have a table that was created to hold
the updated data. One field in that file is the date_posted. I would like to
now have just one query run each time I update the data. I don't want to
re-create all the data because it is quite large. So, I just want to use the
last date_posted field from the collected data in the table (before update)
to filter the data from the mainframe that will update my local file. I know
that I can determine what the last date_posted was. I know that I could take
that date and use it to filter the data from the mainframe so I avoid
re-creating the entire file. I'm just not clear on how to "automate" what I
am doing. As I mentioned, I tried to follow your instructions, but all I got
was a dialog asking me to put in the max_date_posted.

:

I can't see a problem, nor why you are creating the Max_date_Table at all; by
doing so you are merely introducing redundancy and the consequent risk of
inconsistent data. The latest date of rows so far appended is available from
the table into which you are appending rows with the INSERT statement. The
subquery returns this date and restricts the rows returned by the outer query
to those with values later than that date. Consequently only rows later than
that date are appended, which is what you said you wanted in your original
post.

If you insist on using the Max_date_Table then all you have to do is use
that in the subquery rather than the main table. You won't need to call the
MAX operator of course. But you'll have to update the Max_date_Table every
time so that the value represents the latest date of rows already appended,
whereas the approach described by Tom and myself dynamically identifies the
current latest date every time the query is executed.

In fact, if you do use the Max_date_Table you don't need a subquery; you can
join the tables:

INSERT INTO [Main Table]
SELECT [Imported Table].*
FROM [Imported Table], [Max_date_Table]
WHERE [Imported Table].[date_posted] > [Max_date_Table].[max_post_date];

Ken Sheridan
Stafford, England
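The join-based variant can be sketched the same way in sqlite3 (hypothetical names again); the comma join against the one-row date table acts as a scalar filter, so no MAX() call is needed, but Max_date_Table must be kept current by hand:

```python
import sqlite3

# sqlite3 sketch of filtering via a join to a one-row max-date table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE main_table     (id INTEGER, date_posted TEXT);
    CREATE TABLE imported_table (id INTEGER, date_posted TEXT);
    CREATE TABLE max_date_table (max_post_date TEXT);
""")
# The single stored "last appended" date.
con.execute("INSERT INTO max_date_table VALUES ('2008-11-05')")
con.executemany("INSERT INTO imported_table VALUES (?, ?)",
                [(2, "2008-11-05"), (3, "2008-11-06")])

# Every imported row is compared against the single stored date.
con.execute("""
    INSERT INTO main_table
    SELECT imported_table.*
    FROM imported_table, max_date_table
    WHERE imported_table.date_posted > max_date_table.max_post_date
""")
print(con.execute("SELECT id FROM main_table").fetchall())   # -> [(3,)]
```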

:

I tried this line, but there's a problem. The select query that pulls the max
date posted is pulling it out of the same table I'm appending to. Let's say
Main_Table is the one I'm appending the new data to. What I want to do is
select the last posting date of the data I have just pulled in with the
append query. Now, I want to incorporate that last date (right now I'm using
a make table query to create that max date, let's call that table
Max_date_Table, into my append query as the filter for the date_posted in the
new appended table. What I get when I use your approach is a dialog box
asking me for the max_post_date.

 
R

Ray S.

Thanks for your patience. Using the SELECT_QRY_MAX_DP to fill a Temp table is
exactly what I'm trying to avoid. You see, the amount of data is so enormous
that it takes well over an hour to get it from the mainframe. What I want to
do is reduce the amount of data that I have to get from the mainframe
precisely by using a max date posted filter. That way, each time I get data
from the mainframe it is only that portion which was posted since the last
post date. If I do this every day, I should be able to substantially reduce
the download time. The data on the mainframe, once loaded, is never directly
changed. It is financial ledger information. Any change would have to be done
by a later post with the appropriate credit or debit to the corresponding
accounts. I just need a local source of certain business portions of the
ledger so I can do financial analysis.

Quoting: "Just to be absolutely clear here: SELECT_QRY_MAX_DP is a
straightforward 'SELECT' equivalent of your original make table query and
returns all rows from the mainframe source tables or files."

This is absolutely NOT useful. If I simply download all the rows from the
mainframe without filters I would be downloading a file that contains several
years of financial data (totally unnecessary to me). I have to select out
only the current year data and I have to select out only the necessary
business structural points (there are several companies I don't need). That's
why I sent you the exact query - so you could see that I have to join tables
that define the financial structural points I need. Even with all that, as I
said, the download always takes about an hour. If I run the update every
morning, I'm reloading all the data that I already have up to yesterday's
posting plus the new day's posting. I'm trying to figure out how to only
download the data that has been posted since my last update.


Ken Sheridan said:
The SQL statement in principle should be efficient as there is no correlation
between the outer and inner queries.

You originally said that you were using a 'make table' query, which I assume
the SELECT_QRY_MAX_DP reflects in terms of the rows and columns returned,
i.e. it returns all rows including those before the last import. You've not
mentioned any problems with that operation. The current performance problem
therefore must arise from the restriction on SELECT_QRY_MAX_DP's DATE_POSTED
column.

You may have to break it down into stages as follows:

1. Append all the rows from your mainframe source into an Access table with:

INSERT INTO tblTemp
SELECT *
FROM SELECT_QRY_MAX_DP;

2. Append the latest rows from tblTemp with:

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM tblTemp
WHERE date_posted >
(SELECT NZ(MAX([date_posted]),0)
FROM tbl_CY_PJL_CC_FILTER);

3. Empty tblTemp with:

DELETE *
FROM tblTemp;

All this can be easily automated as a single operation by executing the
three queries successively in code or a macro. Before doing so, however,
you will need to create tblTemp, which can very easily be done by copying
tbl_CY_PJL_CC_FILTER in the database window and then pasting its 'structure
only' in as tblTemp. Make sure the date_posted column in each is indexed.
The index will presumably be 'duplicates OK', as I assume the values in this
column are not distinct.

By breaking it down like this you are only connecting with the mainframe for
the first step. The second and third steps are entirely local and should not
suffer from the fact that data from the mainframe is the basis of the
restriction which governs the rows inserted, which seems to be the root of
the current poor performance.

However, this solution is predicated on:

1. the need to edit the data after it's pulled from the mainframe. Or

2. the data on the mainframe might be edited or deleted after you've
imported it, but you want the local Access table to keep the values from when
the data was imported. Or

3. a combination of 1 and 2.

If none of these are the case and the imported data will always be the same
as that currently on the mainframe then there is no need to identify when the
data was last imported and you might as well simply empty
tbl_CY_PJL_CC_FILTER with:

DELETE *
FROM tbl_CY_PJL_CC_FILTER;

and then refill it with:

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM SELECT_QRY_MAX_DP;

Just to be absolutely clear here: SELECT_QRY_MAX_DP is a straightforward
'SELECT' equivalent of your original make table query and returns all rows
from the mainframe source tables or files.

Ken Sheridan
Stafford, England
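Ken's three staged queries can be run in sequence as one operation; here is a sqlite3 sketch with hypothetical sample data (in Access the equivalent would be three saved queries fired from a macro or VBA, and Access's `DELETE * FROM` becomes plain `DELETE FROM` in SQLite):

```python
import sqlite3

# sqlite3 sketch of the three-stage pull / filter-append / empty workflow.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tbl_CY_PJL_CC_FILTER (id INTEGER, date_posted TEXT);
    CREATE TABLE tblTemp              (id INTEGER, date_posted TEXT);
    CREATE TABLE source_query         (id INTEGER, date_posted TEXT);
""")
con.execute("INSERT INTO tbl_CY_PJL_CC_FILTER VALUES (1, '2008-11-04')")
con.executemany("INSERT INTO source_query VALUES (?, ?)",
                [(1, "2008-11-04"), (2, "2008-11-05")])

steps = [
    # 1. pull everything from the (mainframe) source into a local temp table
    "INSERT INTO tblTemp SELECT * FROM source_query",
    # 2. append only rows newer than anything already held locally
    """INSERT INTO tbl_CY_PJL_CC_FILTER
       SELECT * FROM tblTemp
       WHERE date_posted >
             (SELECT IFNULL(MAX(date_posted), '') FROM tbl_CY_PJL_CC_FILTER)""",
    # 3. empty the temp table, ready for the next run
    "DELETE FROM tblTemp",
]
for sql in steps:
    con.execute(sql)

print(con.execute("SELECT id FROM tbl_CY_PJL_CC_FILTER ORDER BY id").fetchall())
# -> [(1,), (2,)]
```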

Ray S. said:
OK, I tried this.

INSERT INTO tbl_CY_PJL_CC_FILTER
SELECT *
FROM SELECT_QRY_MAX_DP
WHERE (((SELECT_QRY_MAX_DP.DATE_POSTED)>(SELECT NZ(MAX([DATE_POSTED]),0)
FROM [tbl_CY_PJL_CC_FILTER])));

It seems to me that I'm following your instructions. There's a table into
which the updated data will go. That's "tbl_CY_PJL_CC_FILTER." I created a
select query that pulls all the fields I want from the mainframe source. It's
"SELECT_QRY_MAX_DP." Now the append query form I'm using here follows your
instructions to the letter. The problem is that this query is now running for
over an hour and has returned nothing. Eventually my connection to the
mainframe will be killed.


Ken Sheridan said:
The fact that it’s a query is immaterial; the result set of a query is a
table and can be used just as if it were a base table.

As far as the error is concerned, it sounds like you are not using a subquery
at all but are just calling the MAX function in a WHERE clause; the MAX
function is called in the subquery's SELECT clause, and it's the subquery as
a whole which goes in the outer query's WHERE clause. If all columns returned
by the query are to be inserted into the target table, then really all you
should need to do is change the table names in the SQL statement I gave you
for the 'append' query to (a) the name of the query and (b) the name of the
table into which the rows are being inserted. Let's assume the query is
called SourceQuery and the table is called TargetTable, in which case the
'append' query is:

INSERT INTO [TargetTable]
SELECT *
FROM [SourceQuery]
WHERE [date_posted] >
(SELECT NZ(MAX([date_posted]),0)
FROM [TargetTable]);

If only a subset of the query's columns is to be inserted then you'd need to
specify these both in the INSERT clause (after the table name) and in the
outer query's SELECT clause (in place of the asterisk). In the former the
column list is enclosed within parentheses; in the latter it's just a
comma-separated list. Both lists must match column for column, but the names
of the columns don't necessarily have to be the same in each.

Ken Sheridan
Stafford, England
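The explicit-column-list form Ken describes looks like this in a small sqlite3 sketch (hypothetical table and column names); the two lists line up position for position even though the names differ:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TargetTable (posted_on TEXT, amount REAL);
    CREATE TABLE SourceQuery (date_posted TEXT, trans_amt REAL);
""")
con.execute("INSERT INTO SourceQuery VALUES ('2008-11-06', 125.50)")

# The INSERT list (in parentheses) and the SELECT list match column for
# column; the column names need not be the same in each.
con.execute("""
    INSERT INTO TargetTable (posted_on, amount)
    SELECT date_posted, trans_amt
    FROM SourceQuery
    WHERE date_posted >
          (SELECT IFNULL(MAX(posted_on), '') FROM TargetTable)
""")
print(con.execute("SELECT * FROM TargetTable").fetchall())
# -> [('2008-11-06', 125.5)]
```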

:

OK, as I follow your explanation line by line, I note your reference to the
"Imported Table."

Now, nothing is as easy as it seems. I'm actually selecting my data from a
join of two tables - one is what you're calling the Imported_Table, and the
other is a table with some important dimension structure points - let's call
it the Dimension_Table. The actual Imported Table query part has other
criteria...there is a where currency_code is like "USD" and a dimension_id is
like "cc_by_mgmt_org" and amt_type_class is
in('ACCRUAL','CASH','GAAP','STATUTORY')

when I add the where date_posted (that is, of the Imported_Table) is >
Nz(max(date_posted),0) (that is, of the Main_Table)

Access returns an error telling me I can't have an aggregate function in my
where clause. I take that to mean that I can't use the max function in this
way.

Here's the actual query with all its long actual table names and fields, if
it helps:

SELECT DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LINE_LDR_ENTITY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_DESC,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.EFF_DATE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PERIOD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.ACCOUNT_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.PRIMARY_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.LOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.BOB_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TYPE_CODE_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_1,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_2,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.JRNL_USER_ALPHA_FLD_3,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_YR,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.POSTING_PD,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_AMT,
DM_LEDGER_DIMHIER_COST_CENTER_ID.LEVEL18_ID,
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED
FROM tbl_CY_PJL_CC_FILTER INNER JOIN
(DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE INNER JOIN
DM_LEDGER_DIMHIER_COST_CENTER_ID ON
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID =
DM_LEDGER_DIMHIER_COST_CENTER_ID.HIERARCHY_ID) ON
tbl_CY_PJL_CC_FILTER.COST_CENTER_ID =
DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.COST_CENTER_ID
WHERE (((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.AMT_CLASS_1_TYPE) In
('ACCRUAL','CASH','GAAP','STATUTORY')) AND
((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.DATE_POSTED)>Max([tbl_CY_PJL_CC_FILTER]![DATE_POSTED]))
AND ((DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE.TRANS_CURR_CODE)='USD') AND
((DM_LEDGER_DIMHIER_COST_CENTER_ID.DIMENSION_ID)='Cost_Centers_by_Management_Org'));

 
K

Ken Sheridan

It seems to be Catch-22. Selecting all the rows and filling an empty table
with a normal SELECT query equivalent to your original 'make-table' query is
impractical, but restricting the rows to those with date_posted values later
than the MAX date_posted in the local table and appending them is also
impractical using the standard SQL solution of a subquery. Without a more
efficiently queryable source of the data I don't see how you can have a
solution which will improve performance.

Ken Sheridan
Stafford, England

 
J

John Spencer

I don't know if this will be any faster or will work any better, but you can
try the following. I think the problem could be that Access is downloading
the entire table from the source in order to decide which records it needs.
That could occur since you are joining on tables in the source and in Access.
I attempted to get around that by dropping the join and inserting the
criteria into the where clause. That may or may not make any difference.

Good luck.

INSERT INTO tbl_CY_PJL_CC_FILTER
(LDR_ENTITY_ID,
LINE_LDR_ENTITY_ID,
JRNL_ID,
JRNL_DESC,
EFF_DATE,
PERIOD,
ACCOUNT_ID,
COST_CENTER_ID,
PRIMARY_ID,
LOB_ID,
BOB_ID,
TYPE_CODE_ID,
JRNL_USER_ALPHA_FLD_1,
JRNL_USER_ALPHA_FLD_2,
JRNL_USER_ALPHA_FLD_3,
POSTING_YR,
POSTING_PD,
AMT_CLASS_1_TYPE,
TRANS_AMT,
LEVEL18_ID,
DATE_POSTED)

SELECT JL.LDR_ENTITY_ID,
JL.LINE_LDR_ENTITY_ID,
JL.JRNL_ID,
JL.JRNL_DESC,
JL.EFF_DATE,
JL.PERIOD,
JL.ACCOUNT_ID,
JL.COST_CENTER_ID,
JL.PRIMARY_ID,
JL.LOB_ID,
JL.BOB_ID,
JL.TYPE_CODE_ID,
JL.JRNL_USER_ALPHA_FLD_1,
JL.JRNL_USER_ALPHA_FLD_2,
JL.JRNL_USER_ALPHA_FLD_3,
JL.POSTING_YR,
JL.POSTING_PD,
JL.AMT_CLASS_1_TYPE,
JL.TRANS_AMT,
DM_LEDGER_DIMHIER_COST_CENTER_ID.LEVEL18_ID,
JL.DATE_POSTED
FROM DM_LEDGER_CURRENT_YEAR_POSTED_JRNL_LINE AS JL
INNER JOIN DM_LEDGER_DIMHIER_COST_CENTER_ID
ON JL.COST_CENTER_ID = DM_LEDGER_DIMHIER_COST_CENTER_ID.HIERARCHY_ID
WHERE JL.AMT_CLASS_1_TYPE In ('ACCRUAL','CASH','GAAP','STATUTORY')
AND JL.DATE_POSTED > (SELECT Max([DATE_POSTED])
FROM tbl_CY_PJL_CC_FILTER AS F)
AND JL.COST_CENTER_ID IN
(SELECT DISTINCT COST_CENTER_ID
FROM tbl_CY_PJL_CC_FILTER AS F2)
AND JL.TRANS_CURR_CODE='USD'
AND DM_LEDGER_DIMHIER_COST_CENTER_ID.DIMENSION_ID='Cost_Centers_by_Management_Org';


John Spencer
Access MVP 2002-2005, 2007-2008
The Hilltop Institute
University of Maryland Baltimore County
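John's idea of replacing the heterogeneous join with plain WHERE criteria can be sketched in sqlite3 (hypothetical, much-shortened table names): the cost-centre restriction becomes an IN subquery alongside the scalar date subquery, so no join between local and remote tables is needed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE journal_line (cost_center_id TEXT, date_posted TEXT,
                               trans_amt REAL);
    CREATE TABLE local_filter (cost_center_id TEXT, date_posted TEXT,
                               trans_amt REAL);
""")
# The local table already holds one row, which also defines the wanted centre.
con.execute("INSERT INTO local_filter VALUES ('CC1', '2008-11-04', 10.0)")
con.executemany("INSERT INTO journal_line VALUES (?, ?, ?)",
                [("CC1", "2008-11-05", 20.0),   # newer, right cost centre
                 ("CC2", "2008-11-05", 30.0),   # wrong cost centre
                 ("CC1", "2008-11-03", 40.0)])  # older than the local max

# Both restrictions are WHERE criteria: a scalar subquery for the date
# and an IN subquery in place of the join on cost_center_id.
con.execute("""
    INSERT INTO local_filter
    SELECT * FROM journal_line
    WHERE date_posted >
          (SELECT IFNULL(MAX(date_posted), '') FROM local_filter)
      AND cost_center_id IN
          (SELECT DISTINCT cost_center_id FROM local_filter)
""")
print(con.execute("SELECT cost_center_id, date_posted FROM local_filter "
                  "ORDER BY date_posted").fetchall())
# -> [('CC1', '2008-11-04'), ('CC1', '2008-11-05')]
```

Only the newer CC1 row is appended; the CC2 row is excluded by the IN subquery and the older CC1 row by the date subquery.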
 
