Problem with Dual subselect

  • Thread starter robert d via AccessMonster.com
  • Start date
R

robert d via AccessMonster.com

I've been struggling with a subselect query. Here's the query:

------------------------------------------------------------------------------
--
SELECT SUM(COST) As SumCosts, SUM(SAVINGS) As SumSavings
FROM [TABLEA] As A
WHERE (A.STATUS_DATE IN (SELECT MAX(B.STATUS_DATE) FROM [TABLEB] As B WHERE B.
[PROJECT] = A.PROJECT))
OR (A.PROJECT IN (SELECT B.PROJECT FROM [TABLEB] As B WHERE B.S_IND = 'Y'))
------------------------------------------------------------------------------
--


This query is used when a user selects "(All)" in the project combobox. It's
executed as an SQL statement using DAO.

HOWEVER, All is not what you think it might mean. Because these tables
contain projects at different statuses, to include every record would double
count projects that have more than one status.

"(All)" means:

1) Select the project/status date where the Status_Date is the most recent
2) If a Project/Status_Date in TableB has the S_IND = 'Y', then select that
Project/Status Date regardless of any other criteria.

I just can't seem to get the correct Projects/Status Dates combinations
returned. I'm not getting errors, though.

Any help is appreciated.
 
V

Vincent Johns

(This would have been much easier to analyze had you included an example
of what data you were using, and what you wanted the results to look
like. I don't know what you consider to be "correct Projects/Status
Dates combinations". What follows is my guess at what might be going on.)

Suppose you have the following data in [TABLEA] and [TABLEB].

[TABLEA] Table Datasheet View:

COST SAVINGS PROJECT STATUS_DATE
------- ------- ------- -----------
$150.00 $1.00 Daffy 12/1/2005
$77.00 $2.00 Daffy 12/3/2005
$50.00 $16.00 Bugs 12/5/2005
$100.00 $4.00 Bugs 12/15/2005
$350.00 $8.00 Daisy 12/15/2005

[TABLEB] Table Datasheet View:

PROJECT S_IND STATUS_DATE
------- ----- -----------
Bugs Q 12/15/2005
Daisy B 12/7/2005
Elmer Y 12/25/2005
Daffy Y 12/2/2005
Daffy W 12/5/2005
Bugs Z 12/1/2005

The SQL I used is pretty close to yours...

[Q_Old] SQL:

SELECT Sum([COST]) AS SumCosts,
Sum([SAVINGS]) AS SumSavings
FROM TABLEA AS A
WHERE (((A.STATUS_DATE) In
(SELECT MAX(B.STATUS_DATE)
FROM [TABLEB] As B
WHERE B.[PROJECT] = A.PROJECT)))
OR (((A.PROJECT) In
(SELECT B.PROJECT FROM [TABLEB] As B
WHERE B.S_IND = 'Y')));

.... and I got the following results.

[Q_Old] Query Datasheet View:

SumCosts SumSavings
-------- ----------
$327.00 $7.00

The summed records include the two "Daffy" records and the second "Bugs"
record (and no others). How is this different from what you wanted?
You said...

"1) Select the project/status date where the Status_Date is the most recent"

Of the "Bugs" records, the more recent was the only one included. No
"Elmer" record was included, even though it had a later date in [TABLEB]
because there was no "Elmer" in [TABLEA]. No "Daisy" record was
included because no "Daisy" date in [TABLEA] matched the maximum "Daisy"
date in [TABLEB].

"2) If a Project/Status_Date in TableB has the S_IND = 'Y', then select
that Project/Status Date regardless of any other criteria. "

Since "Elmer" and "Daffy" projects had [TABLEB].[S_IND] = 'Y', all
records in [TABLEA] from both projects were included, regardless of
other criteria.

It looks to me as if your verbal description is fairly close to what
your SQL (or my version of it) does, and if you don't like the results
you're getting, perhaps the criteria need to be revised.

Also, as I suggested, an example would help. Specifically, given the
[TABLEA] and [TABLEB] that I used, what would you want the results to be?

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

I've been struggling with a subselect query. Here's the query:

------------------------------------------------------------------------------
--
SELECT SUM(COST) As SumCosts, SUM(SAVINGS) As SumSavings
FROM [TABLEA] As A
WHERE (A.STATUS_DATE IN (SELECT MAX(B.STATUS_DATE) FROM [TABLEB] As B WHERE B.
[PROJECT] = A.PROJECT))
OR (A.PROJECT IN (SELECT B.PROJECT FROM [TABLEB] As B WHERE B.S_IND = 'Y'))
------------------------------------------------------------------------------
--


This query is used when a user selects "(All)" in the project combobox. It's
executed as an SQL statement using DAO.

HOWEVER, All is not what you think it might mean. Because these tables
contain projects at different statuses, to include every record would double
count projects that have more than one status.

"(All)" means:

1) Select the project/status date where the Status_Date is the most recent
2) If a Project/Status_Date in TableB has the S_IND = 'Y', then select that
Project/Status Date regardless of any other criteria.

I just can't seem to get the correct Projects/Status Dates combinations
returned. I'm not getting errors, though.

Any help is appreciated.
 
R

robert d via AccessMonster.com

Thank you, Vincent, for taking the time to consider my problem.

You are quite right, I should have included data. Below, I've included data.
Note that the Project and Status Date in TableA must exist in TableB. TableB
is where the project, status and status date are created. TableA contains
additional details on the particular project/status date (in a one [TableB)
to many (TableA) relationship]:

TABLEA
Project Status Date Cost Savings
A 6/27/2005 1,000,000 50,000
B 2/11/2004 2,000,000 100,000
B 11/23/2003 2,250,000 90,000
A 12/11/2005 3,000,000 150,000
C 1/25/2003 4,000,000 200,000
D 10/31/2004 5,000,000 250,000


TABLEB
Project Status Date S_IND
A 4/29/2004
A 6/27/2005
A 12/11/2005
B 2/11/2004
B 11/23/2003 Y
C 1/25/2003
D 10/31/2004
E 7/25/2005

Of course each table has many more fields and TABLEA generally has many
records for each project/status date (that's why SUM is used).

The following records in the TABLEA shown would be included in the select:
TABLEA
Project Status Date Cost Savings
A 12/11/2005 3,000,000 150,000
B 2/11/2004 2,000,000 100,000
B 11/23/2003 2,250,000 90,000
C 1/25/2003 4,000,000 200,000
D 10/31/2004 5,000,000 250,000


Project = B with Status Date = 11/23/2003 would be selected because S_IND in
Table B = 'Y'. Note also that Project E in TABLEB is not to be returned
because there are no corresponding records in TableA. The other
Project/Status Date combinations are selected because the Status Dates are
the maximum for that Project.

Thanks.


Vincent said:
(This would have been much easier to analyze had you included an example
of what data you were using, and what you wanted the results to look
like. I don't know what you consider to be "correct Projects/Status
Dates combinations". What follows is my guess at what might be going on.)

Suppose you have the following data in [TABLEA] and [TABLEB].

[TABLEA] Table Datasheet View:

COST SAVINGS PROJECT STATUS_DATE
------- ------- ------- -----------
$150.00 $1.00 Daffy 12/1/2005
$77.00 $2.00 Daffy 12/3/2005
$50.00 $16.00 Bugs 12/5/2005
$100.00 $4.00 Bugs 12/15/2005
$350.00 $8.00 Daisy 12/15/2005

[TABLEB] Table Datasheet View:

PROJECT S_IND STATUS_DATE
------- ----- -----------
Bugs Q 12/15/2005
Daisy B 12/7/2005
Elmer Y 12/25/2005
Daffy Y 12/2/2005
Daffy W 12/5/2005
Bugs Z 12/1/2005

The SQL I used is pretty close to yours...

[Q_Old] SQL:

SELECT Sum([COST]) AS SumCosts,
Sum([SAVINGS]) AS SumSavings
FROM TABLEA AS A
WHERE (((A.STATUS_DATE) In
(SELECT MAX(B.STATUS_DATE)
FROM [TABLEB] As B
WHERE B.[PROJECT] = A.PROJECT)))
OR (((A.PROJECT) In
(SELECT B.PROJECT FROM [TABLEB] As B
WHERE B.S_IND = 'Y')));

... and I got the following results.

[Q_Old] Query Datasheet View:

SumCosts SumSavings
-------- ----------
$327.00 $7.00

The summed records include the two "Daffy" records and the second "Bugs"
record (and no others). How is this different from what you wanted?
You said...

"1) Select the project/status date where the Status_Date is the most recent"

Of the "Bugs" records, the more recent was the only one included. No
"Elmer" record was included, even though it had a later date in [TABLEB]
because there was no "Elmer" in [TABLEA]. No "Daisy" record was
included because no "Daisy" date in [TABLEA] matched the maximum "Daisy"
date in [TABLEB].

"2) If a Project/Status_Date in TableB has the S_IND = 'Y', then select
that Project/Status Date regardless of any other criteria. "

Since "Elmer" and "Daffy" projects had [TABLEB].[S_IND] = 'Y', all
records in [TABLEA] from both projects were included, regardless of
other criteria.

It looks to me as if your verbal description is fairly close to what
your SQL (or my version of it) does, and if you don't like the results
you're getting, perhaps the criteria need to be revised.

Also, as I suggested, an example would help. Specifically, given the
[TABLEA] and [TABLEB] that I used, what would you want the results to be?

I've been struggling with a subselect query. Here's the query:
[quoted text clipped - 25 lines]
Any help is appreciated.
 
V

Vincent Johns

Well, when I put your data into my Tables and ran the Query, I got what
you got. So I don't know what you want to see that's different. I have
to run now, but I can post the details of my versions of your Query later.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.
Thank you, Vincent, for taking the time to consider my problem.

You are quite right, I should have included data. Below, I've included data.
Note that the Project and Status Date in TableA must exist in TableB. TableB
is where the project, status and status date are created. TableA contains
additional details on the particular project/status date (in a one [TableB)
to many (TableA) relationship]:

TABLEA
Project Status Date Cost Savings
A 6/27/2005 1,000,000 50,000
B 2/11/2004 2,000,000 100,000
B 11/23/2003 2,250,000 90,000
A 12/11/2005 3,000,000 150,000
C 1/25/2003 4,000,000 200,000
D 10/31/2004 5,000,000 250,000


TABLEB
Project Status Date S_IND
A 4/29/2004
A 6/27/2005
A 12/11/2005
B 2/11/2004
B 11/23/2003 Y
C 1/25/2003
D 10/31/2004
E 7/25/2005

Of course each table has many more fields and TABLEA generally has many
records for each project/status date (that's why SUM is used).

The following records in the TABLEA shown would be included in the select:
TABLEA
Project Status Date Cost Savings
A 12/11/2005 3,000,000 150,000
B 2/11/2004 2,000,000 100,000
B 11/23/2003 2,250,000 90,000
C 1/25/2003 4,000,000 200,000
D 10/31/2004 5,000,000 250,000


Project = B with Status Date = 11/23/2003 would be selected because S_IND in
Table B = 'Y'. Note also that Project E in TABLEB is not to be returned
because there are no corresponding records in TableA. The other
Project/Status Date combinations are selected because the Status Dates are
the maximum for that Project.

Thanks.


Vincent said:
(This would have been much easier to analyze had you included an example
of what data you were using, and what you wanted the results to look
like. I don't know what you consider to be "correct Projects/Status
Dates combinations". What follows is my guess at what might be going on.)

Suppose you have the following data in [TABLEA] and [TABLEB].

[TABLEA] Table Datasheet View:

COST SAVINGS PROJECT STATUS_DATE
------- ------- ------- -----------
$150.00 $1.00 Daffy 12/1/2005
$77.00 $2.00 Daffy 12/3/2005
$50.00 $16.00 Bugs 12/5/2005
$100.00 $4.00 Bugs 12/15/2005
$350.00 $8.00 Daisy 12/15/2005

[TABLEB] Table Datasheet View:

PROJECT S_IND STATUS_DATE
------- ----- -----------
Bugs Q 12/15/2005
Daisy B 12/7/2005
Elmer Y 12/25/2005
Daffy Y 12/2/2005
Daffy W 12/5/2005
Bugs Z 12/1/2005

The SQL I used is pretty close to yours...

[Q_Old] SQL:

SELECT Sum([COST]) AS SumCosts,
Sum([SAVINGS]) AS SumSavings
FROM TABLEA AS A
WHERE (((A.STATUS_DATE) In
(SELECT MAX(B.STATUS_DATE)
FROM [TABLEB] As B
WHERE B.[PROJECT] = A.PROJECT)))
OR (((A.PROJECT) In
(SELECT B.PROJECT FROM [TABLEB] As B
WHERE B.S_IND = 'Y')));

... and I got the following results.

[Q_Old] Query Datasheet View:

SumCosts SumSavings
-------- ----------
$327.00 $7.00

The summed records include the two "Daffy" records and the second "Bugs"
record (and no others). How is this different from what you wanted?
You said...

"1) Select the project/status date where the Status_Date is the most recent"

Of the "Bugs" records, the more recent was the only one included. No
"Elmer" record was included, even though it had a later date in [TABLEB]
because there was no "Elmer" in [TABLEA]. No "Daisy" record was
included because no "Daisy" date in [TABLEA] matched the maximum "Daisy"
date in [TABLEB].

"2) If a Project/Status_Date in TableB has the S_IND = 'Y', then select
that Project/Status Date regardless of any other criteria. "

Since "Elmer" and "Daffy" projects had [TABLEB].[S_IND] = 'Y', all
records in [TABLEA] from both projects were included, regardless of
other criteria.

It looks to me as if your verbal description is fairly close to what
your SQL (or my version of it) does, and if you don't like the results
you're getting, perhaps the criteria need to be revised.

Also, as I suggested, an example would help. Specifically, given the
[TABLEA] and [TABLEB] that I used, what would you want the results to be?

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

I've been struggling with a subselect query. Here's the query:

[quoted text clipped - 25 lines]
Any help is appreciated.
 
R

robert d via AccessMonster.com

You are right, that set of pseudo data does work correctly (I used my actual
data; I abbreviated a lot of things in what I posted).

However, add the following record to TABLEA
PROJECT = B
STATUS_DATE = 1/31/2004
COST = 2250000
SAVINGS = 100000

To TABLEB Add:
PROJECT = B
STATUS_DATE = 1/31/2004

Leave S_Ind blank in TABLEB for this new record.

Now I get three records are returned for Project B, when I'm still wanting
only the two posted earlier. This is also what I see in my actual data.

Thanks.

Vincent said:
Well, when I put your data into my Tables and ran the Query, I got what
you got. So I don't know what you want to see that's different. I have
to run now, but I can post the details of my versions of your Query later.

Thank you, Vincent, for taking the time to consider my problem.
[quoted text clipped - 129 lines]
 
V

Vincent Johns

Robert,

I revised your Tables a bit, adding a primary key (Autonumber data type)
to each one to simplify the Queries slightly. It's not necessary to do
that, but my suggestion is to at least look at what I did so you can
decide if it makes sense to do the same thing in your own Tables.

Adding the primary keys is simple; just define a field with Autonumber
type in each Table.

Adding the foreign key to [TABLEA] is slightly trickier; it's used to
link each record in [TABLEA] to its corresponding record in [TABLEB].
To add the [TableB_ID] foreign-key field to [TABLEA], open [TABLEA] in
Table Design View and add a field called [TableB_ID] with Data Type =
Number (long integer) and no index. (It will look like an Autonumber
but will behave differently -- it's not unique, and the values will be
copied from [TABLEB] instead of being generated automatically.) At this
point, it will have no values; to set the values, run the following
Update Query:

[QU_TableB_ID_Setup] SQL:
UPDATE TABLEA INNER JOIN TABLEB
ON (TABLEA.Project = TABLEB.Project)
AND (TABLEA.[Status Date] = TABLEB.[Status Date])
SET TABLEA.TableB_ID = [TABLEB]![TableB_ID];

Then, in the Relationships window, delete the existing relationship
between [TABLEB] and [TABLEA] and set a new one linking
[TABLEB].[TableB_ID] and [TABLEA].[TableB_ID], setting Referential
Integrity on this relationship.

Having done this, and no longer needing the duplicate fields
[TABLEA].[Project] and [TABLEA].[Status Date], I deleted them from
[TABLEA]. At this point, the Tables contained the following records:

[TABLEB] Table Datasheet View:
TableB_ID Project Status Date S_IND
----------- -------- ------------- -----
-1869411747 A 4/29/2004
-1679588807 B 11/23/2003 Y
-1670788141 A 12/11/2005
-1585945064 B 2/11/2004
-1499531505 D 10/31/2004
-727128156 E 7/25/2005
-12772858 C 1/25/2003
191389786 A 6/27/2005
1136072244 B 1/31/2004

[TABLEA] Table Datasheet View:

TableA_ID TableB_ID Cost Savings
----------- ------------ ------------- ------------
-2008314815 -1679588807 $2,250,000.00 $90,000.00
-1244319657 -12772858 $4,000,000.00 $200,000.00
-205746725 1136072244 $2,250,000.00 $100,000.00
581413196 -1499531505 $5,000,000.00 $250,000.00
1283352731 191389786 $1,000,000.00 $50,000.00
1619660142 -1670788141 $3,000,000.00 $150,000.00
1676911936 -1585945064 $2,000,000.00 $100,000.00

If you wish, you can then set a Lookup Property on the new
[TABLEA].[TableB_ID] foreign key, so that it will have a meaningful
appearance. (Raw key values are often, such as in my example here, not
meaningful to human beings. A Lookup property on the foreign-key field
allows you to read the records more easily, and I normally set it for
each foreign key I use, but some people prefer not to do so. In case
you wish to use a Lookup, I'll describe what I did to set it up. In
case you DON'T wish to use it, just skip to the bottom -- the Queries
will work just as well without the Lookup.)

For the Lookup property, I defined a Query to summarize the data in each
record in [TABLEB], including ">" if S_IND = "Y", the project name, and
the status date. There's nothing magic about these choices, but the
result should be meaningful to you, unique within your Table, and fairly
short. I sorted them by [S_IND], project name, and date.

[QL_TableB_Projects] SQL:
SELECT TABLEB.TableB_ID,
IIf([TABLEB]![S_IND]="Y",">"," ")
& [TABLEB]![Project] & " "
& [TABLEB]![Status Date] AS Proj
FROM TABLEB
ORDER BY TABLEB.S_IND DESC ,
TABLEB.Project, TABLEB.[Status Date] DESC;

[QL_TableB_Projects] Query Datasheet View:
TableB_ID Proj
-------------- --------------
-1679588807 >B 11/23/2003
-1670788141 A 12/11/2005
191389786 A 6/27/2005
-1869411747 A 4/29/2004
-1585945064 B 2/11/2004
1136072244 B 1/31/2004
-12772858 C 1/25/2003
-1499531505 D 10/31/2004
-727128156 E 7/25/2005

To apply this, I opened [TABLEA] in Table Design View, selected the
[TableB_ID] field, opened its Lookup tab, and set the properties as follows:

Display Control = List Box (instead of Text Box)
Row Source = QL_TableB_Projects
Column Count = 2
Column Widths = 0;1

I then opened [TABLEA] in Table Datasheet View, selected the [TableA_ID]
column, and used Format --> Hide Columns to hide it, since I had no need
to see those values. Actually, as it turns out, I had no need for the
entire [TABLEA].[TableA_ID] field, since no other Table makes reference
to it, so I could have deleted it, but leaving it in there does little
harm. I just don't want to have to look at it.

The result of setting Lookup property on the foreign key [TableB_ID] and
hiding the primary key [TableA_ID] was a prettier, more legible [TABLEA].

[TABLEA] Table Datasheet View (with Lookup):
TableB_ID Cost Savings
-------------- --------------- ------------
A 6/27/2005 $1,000,000.00 $50,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00
B 11/23/2003 $2,250,000.00 $90,000.00

Please bear in mind that the values stored in the [TableB_ID] field here
are still those key values you saw earlier, but they are displayed here
in a more meaningful format. (However, some people find this confusing
and recommend against using Lookup properties largely for that reason.)

If you use the Form Wizard to generate a Form for this Table, the Lookup
property will be copied to the List Box or Combo Box on that Form that
matches this field.

===

OK. Now we set up a revised Query (using the revised [TABLEB]
and[TABLEA], but I expect you can easily edit it to use your original
Table structures).

When I ran (my version of) your original Query on your Tables, I got the
following results, and I assume this is what you saw, too (extra record
for Project B):

[Q_Old_Orig_Select] Query Datasheet View:
Project Status Date Cost Savings
------- ----------- ------------- -----------
A 12/11/2005 $3,000,000.00 $150,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

The Query I defined -- well, actually, I set it up in two steps to make
it easier to debug -- looks like this. First step lists the records,
second step lists the totals.

[Q_New_Select] SQL:
SELECT B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
FROM (TABLEB AS B INNER JOIN TABLEB AS B2
ON B.Project = B2.Project)
INNER JOIN TABLEA AS A
ON B.TableB_ID = A.TableB_ID
GROUP BY B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
HAVING (((B.[Status Date])=Max([B2].[Status Date])))
OR (((B.S_IND)="Y"))
ORDER BY B.Project, B.[Status Date] DESC;

[Q_New_Select] Query Datasheet View:

Project Status Date Cost Savings S_IND
------- ----------- ------------- ------------ -----
B 2/11/2004 $2,000,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00 Y
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00

The last step is pretty simple -- just add them up.

[Q_New_Select_Sums] SQL:

SELECT Sum(Q_New_Select.Cost) AS SumCosts,
Sum(Q_New_Select.Savings) AS SumSavings
FROM Q_New_Select;

[Q_New_Select_Sums] Query Datasheet View:

SumCosts SumSavings
-------------- -----------
$16,250,000.00 $790,000.00


-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

You are right, that set of pseudo data does work correctly (I used my actual
data; I abbreviated a lot of things in what I posted).

However, add the following record to TABLEA
PROJECT = B
STATUS_DATE = 1/31/2004
COST = 2250000
SAVINGS = 100000

To TABLEB Add:
PROJECT = B
STATUS_DATE = 1/31/2004

Leave S_Ind blank in TABLEB for this new record.

Now I get three records are returned for Project B, when I'm still wanting
only the two posted earlier. This is also what I see in my actual data.

Thanks.

Vincent said:
Well, when I put your data into my Tables and ran the Query, I got what
you got. So I don't know what you want to see that's different. I have
to run now, but I can post the details of my versions of your Query later.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

Thank you, Vincent, for taking the time to consider my problem.

[quoted text clipped - 129 lines]
Any help is appreciated.
 
R

robert d via AccessMonster.com

Vincent, Thanks for all the work on this problem.

I saw your SQL statement and I'm just not up to that level of creating SQL
statements. I realized that one of the problems with my statement was that I
have to select both PROJECT and STATUS_DATE when I want to return the one
record where B.S_IND = 'Y". So I've constructed a UNION QUERY as follows.
I've included Project and Status_Date in the select to assist with viewing
the results.

SELECT PROJECT, STATUS_DATE, Sum([COST]) AS SumCosts, Sum([SAVINGS]) AS
SumSavings
FROM TABLEA AS A
WHERE EXISTS
(SELECT B.PROJECT, B.STATUS_DATE FROM [TABLEB] As B
WHERE A.PROJECT = B.PROJECT
AND A.STATUS_DATE = B.STATUS_DATE
AND B.S_IND = 'Y')
GROUP BY A.PROJECT, A.STATUS_DATE
UNION SELECT PROJECT, STATUS_DATE, Sum([COST]) AS SumCosts, Sum([SAVINGS]) AS
SumSavings
FROM TABLEA AS A
WHERE (((A.STATUS_DATE) In
(SELECT MAX(B.STATUS_DATE)
FROM [TABLEB] As B
WHERE B.[PROJECT] = A.PROJECT)))
GROUP BY A.PROJECT, A.STATUS_DATE;

This statement returns the correct results, but needs more testing.

Thanks for all of your help and any comments you have on my revised statement
are greatly appreciated.

Vincent said:
Robert,

I revised your Tables a bit, adding a primary key (Autonumber data type)
to each one to simplify the Queries slightly. It's not necessary to do
that, but my suggestion is to at least look at what I did so you can
decide if it makes sense to do the same thing in your own Tables.

Adding the primary keys is simple; just define a field with Autonumber
type in each Table.

Adding the foreign key to [TABLEA] is slightly trickier; it's used to
link each record in [TABLEA] to its corresponding record in [TABLEB].
To add the [TableB_ID] foreign-key field to [TABLEA], open [TABLEA] in
Table Design View and add a field called [TableB_ID] with Data Type =
Number (long integer) and no index. (It will look like an Autonumber
but will behave differently -- it's not unique, and the values will be
copied from [TABLEB] instead of being generated automatically.) At this
point, it will have no values; to set the values, run the following
Update Query:

[QU_TableB_ID_Setup] SQL:
UPDATE TABLEA INNER JOIN TABLEB
ON (TABLEA.Project = TABLEB.Project)
AND (TABLEA.[Status Date] = TABLEB.[Status Date])
SET TABLEA.TableB_ID = [TABLEB]![TableB_ID];

Then, in the Relationships window, delete the existing relationship
between [TABLEB] and [TABLEA] and set a new one linking
[TABLEB].[TableB_ID] and [TABLEA].[TableB_ID], setting Referential
Integrity on this relationship.

Having done this, and no longer needing the duplicate fields
[TABLEA].[Project] and [TABLEA].[Status Date], I deleted them from
[TABLEA]. At this point, the Tables contained the following records:

[TABLEB] Table Datasheet View:
TableB_ID Project Status Date S_IND
----------- -------- ------------- -----
-1869411747 A 4/29/2004
-1679588807 B 11/23/2003 Y
-1670788141 A 12/11/2005
-1585945064 B 2/11/2004
-1499531505 D 10/31/2004
-727128156 E 7/25/2005
-12772858 C 1/25/2003
191389786 A 6/27/2005
1136072244 B 1/31/2004

[TABLEA] Table Datasheet View:

TableA_ID TableB_ID Cost Savings
----------- ------------ ------------- ------------
-2008314815 -1679588807 $2,250,000.00 $90,000.00
-1244319657 -12772858 $4,000,000.00 $200,000.00
-205746725 1136072244 $2,250,000.00 $100,000.00
581413196 -1499531505 $5,000,000.00 $250,000.00
1283352731 191389786 $1,000,000.00 $50,000.00
1619660142 -1670788141 $3,000,000.00 $150,000.00
1676911936 -1585945064 $2,000,000.00 $100,000.00

If you wish, you can then set a Lookup Property on the new
[TABLEA].[TableB_ID] foreign key, so that it will have a meaningful
appearance. (Raw key values are often, such as in my example here, not
meaningful to human beings. A Lookup property on the foreign-key field
allows you to read the records more easily, and I normally set it for
each foreign key I use, but some people prefer not to do so. In case
you wish to use a Lookup, I'll describe what I did to set it up. In
case you DON'T wish to use it, just skip to the bottom -- the Queries
will work just as well without the Lookup.)

For the Lookup property, I defined a Query to summarize the data in each
record in [TABLEB], including ">" if S_IND = "Y", the project name, and
the status date. There's nothing magic about these choices, but the
result should be meaningful to you, unique within your Table, and fairly
short. I sorted them by [S_IND], project name, and date.

[QL_TableB_Projects] SQL:
SELECT TABLEB.TableB_ID,
IIf([TABLEB]![S_IND]="Y",">"," ")
& [TABLEB]![Project] & " "
& [TABLEB]![Status Date] AS Proj
FROM TABLEB
ORDER BY TABLEB.S_IND DESC ,
TABLEB.Project, TABLEB.[Status Date] DESC;

[QL_TableB_Projects] Query Datasheet View:
TableB_ID Proj
-------------- --------------
-1679588807 >B 11/23/2003
-1670788141 A 12/11/2005
191389786 A 6/27/2005
-1869411747 A 4/29/2004
-1585945064 B 2/11/2004
1136072244 B 1/31/2004
-12772858 C 1/25/2003
-1499531505 D 10/31/2004
-727128156 E 7/25/2005

To apply this, I opened [TABLEA] in Table Design View, selected the
[TableB_ID] field, opened its Lookup tab, and set the properties as follows:

Display Control = List Box (instead of Text Box)
Row Source = QL_TableB_Projects
Column Count = 2
Column Widths = 0;1

I then opened [TABLEA] in Table Datasheet View, selected the [TableA_ID]
column, and used Format --> Hide Columns to hide it, since I had no need
to see those values. Actually, as it turns out, I had no need for the
entire [TABLEA].[TableA_ID] field, since no other Table makes reference
to it, so I could have deleted it, but leaving it in there does little
harm. I just don't want to have to look at it.

The result of setting Lookup property on the foreign key [TableB_ID] and
hiding the primary key [TableA_ID] was a prettier, more legible [TABLEA].

[TABLEA] Table Datasheet View (with Lookup):
TableB_ID Cost Savings
-------------- --------------- ------------
A 6/27/2005 $1,000,000.00 $50,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00
B 11/23/2003 $2,250,000.00 $90,000.00

Please bear in mind that the values stored in the [TableB_ID] field here
are still those key values you saw earlier, but they are displayed here
in a more meaningful format. (However, some people find this confusing
and recommend against using Lookup properties largely for that reason.)

If you use the Form Wizard to generate a Form for this Table, the Lookup
property will be copied to the List Box or Combo Box on that Form that
matches this field.

===

OK. Now we set up a revised Query (using the revised [TABLEB]
and[TABLEA], but I expect you can easily edit it to use your original
Table structures).

When I ran (my version of) your original Query on your Tables, I got the
following results, and I assume this is what you saw, too (extra record
for Project B):

[Q_Old_Orig_Select] Query Datasheet View:
Project Status Date Cost Savings
------- ----------- ------------- -----------
A 12/11/2005 $3,000,000.00 $150,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

The Query I defined -- well, actually, I set it up in two steps to make
it easier to debug -- looks like this. First step lists the records,
second step lists the totals.

[Q_New_Select] SQL:
SELECT B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
FROM (TABLEB AS B INNER JOIN TABLEB AS B2
ON B.Project = B2.Project)
INNER JOIN TABLEA AS A
ON B.TableB_ID = A.TableB_ID
GROUP BY B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
HAVING (((B.[Status Date])=Max([B2].[Status Date])))
OR (((B.S_IND)="Y"))
ORDER BY B.Project, B.[Status Date] DESC;

[Q_New_Select] Query Datasheet View:

Project Status Date Cost Savings S_IND
------- ----------- ------------- ------------ -----
B 2/11/2004 $2,000,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00 Y
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00

The last step is pretty simple -- just add them up.

[Q_New_Select_Sums] SQL:

SELECT Sum(Q_New_Select.Cost) AS SumCosts,
Sum(Q_New_Select.Savings) AS SumSavings
FROM Q_New_Select;

[Q_New_Select_Sums] Query Datasheet View:

SumCosts SumSavings
-------------- -----------
$16,250,000.00 $790,000.00

You are right, that set of pseudo data does work correctly (I used my actual
data; I abbreviated a lot of things in what I posted).
[quoted text clipped - 28 lines]
 
V

Vincent Johns

Robert,

First please let me apologize for something that may have caused some
bewilderment -- I made a mistake in my version of [TABLEA], attaching
one of the records to Project E instead of Project A. Here's what it
should have looked like.

[TABLEA] Table Datasheet View (with correction in record for
A 12/11/2005 that I'd erroneously labeled as E 7/25/2005; totals were
correct, but not the project names):

TableB_ID Cost Savings
------------- ------------- -----------
A 6/27/2005 $1,000,000.00 $50,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00
A 12/11/2005 $3,000,000.00 $150,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

So my [Q_New_Select] Query should have looked like this (same Query, but
based on different data in [TABLEA]):

[Q_New_Select] Query Datasheet View:

Project Status Date Cost Savings S_IND
------- ----------- ------------- ----------- -----
A 12/11/2005 $3,000,000.00 $150,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00 Y
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

===

Now, looking at your Union Query, I think it should work OK for you.
(There's more than one way to skin a cat.) But I re-wrote it slightly,
breaking it into 3 Queries to make each part easier to maintain.

Your original Tables I had renamed to [TABLEB_Orig] and [TABLEA_Orig],
and the split-up SQL corresponding to your Union Query follows. I
called the first SELECT statement [QU_Part1] and the other one [QU_Part2].

[QU_Part1] SQL:

SELECT [PROJECT], [STATUS DATE],
Sum([COST]) AS SumCosts,
Sum([SAVINGS]) AS SumSavings
FROM TABLEA_Orig AS A
WHERE EXISTS
(SELECT B.PROJECT,
B.[STATUS DATE] FROM [TABLEB] As B
WHERE A.PROJECT = B.PROJECT
AND A.[STATUS DATE] = B.[STATUS DATE]
AND B.S_IND = 'Y')
GROUP BY A.PROJECT, A.[STATUS DATE];

[QU_Part1] Query Datasheet View:

PROJECT STATUS DATE SumCosts SumSavings
------- ---------- ------------- ----------
B 11/23/2003 $2,250,000.00 $90,000.00

[QU_Part2] SQL:

SELECT [PROJECT], [STATUS DATE],
Sum([COST]) AS SumCosts,
Sum([SAVINGS]) AS SumSavings
FROM TABLEA_Orig AS A
WHERE (((A.[STATUS DATE]) In
(SELECT MAX(B.[STATUS DATE])
FROM [TABLEB_Orig] As B
WHERE B.[PROJECT] = A.PROJECT)))
GROUP BY A.PROJECT, A.[STATUS DATE];

[QU_Part2] Query Datasheet View:

PROJECT STATUS DATE SumCosts SumSavings
------- ----------- ------------- -----------
A 12/11/2005 $3,000,000.00 $150,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

I combined them into the following Union Query:

[QU_Both] SQL:

SELECT PROJECT, [STATUS DATE],
SumCosts, SumSavings
FROM QU_Part1
UNION SELECT PROJECT, [STATUS DATE],
SumCosts, SumSavings
FROM QU_Part2
ORDER BY PROJECT, [STATUS DATE];

[QU_Both] Query Datasheet View:

PROJECT STATUS DATE SumCosts SumSavings
------- ----------- -------- -----------
A 12/11/2005 $3,000,000.00 $150,000.00
B 11/23/2003 $2,250,000.00 $90,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00


Although the Union Query will probably do everything you need for now,
at some point you may wish to look at the other ones to be aware of what
they do and how they work.

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

Vincent, Thanks for all the work on this problem.

I saw your SQL statement and I'm just not up to that level of creating SQL
statements. I realized that one of the problems with my statement was that I
have to select both PROJECT and STATUS_DATE when I want to return the one
record where B.S_IND = 'Y". So I've constructed a UNION QUERY as follows.
I've included Project and Status_Date in the select to assist with viewing
the results.

SELECT PROJECT, STATUS_DATE, Sum([COST]) AS SumCosts, Sum([SAVINGS]) AS
SumSavings
FROM TABLEA AS A
WHERE EXISTS
(SELECT B.PROJECT, B.STATUS_DATE FROM [TABLEB] As B
WHERE A.PROJECT = B.PROJECT
AND A.STATUS_DATE = B.STATUS_DATE
AND B.S_IND = 'Y')
GROUP BY A.PROJECT, A.STATUS_DATE
UNION SELECT PROJECT, STATUS_DATE, Sum([COST]) AS SumCosts, Sum([SAVINGS]) AS
SumSavings
FROM TABLEA AS A
WHERE (((A.STATUS_DATE) In
(SELECT MAX(B.STATUS_DATE)
FROM [TABLEB] As B
WHERE B.[PROJECT] = A.PROJECT)))
GROUP BY A.PROJECT, A.STATUS_DATE;

This statement returns the correct results, but needs more testing.

Thanks for all of your help and any comments you have on my revised statement
are greatly appreciated.

Vincent said:
Robert,

I revised your Tables a bit, adding a primary key (Autonumber data type)
to each one to simplify the Queries slightly. It's not necessary to do
that, but my suggestion is to at least look at what I did so you can
decide if it makes sense to do the same thing in your own Tables.

Adding the primary keys is simple; just define a field with Autonumber
type in each Table.

Adding the foreign key to [TABLEA] is slightly trickier; it's used to
link each record in [TABLEA] to its corresponding record in [TABLEB].
To add the [TableB_ID] foreign-key field to [TABLEA], open [TABLEA] in
Table Design View and add a field called [TableB_ID] with Data Type =
Number (long integer) and no index. (It will look like an Autonumber
but will behave differently -- it's not unique, and the values will be
copied from [TABLEB] instead of being generated automatically.) At this
point, it will have no values; to set the values, run the following
Update Query:

[QU_TableB_ID_Setup] SQL:
UPDATE TABLEA INNER JOIN TABLEB
ON (TABLEA.Project = TABLEB.Project)
AND (TABLEA.[Status Date] = TABLEB.[Status Date])
SET TABLEA.TableB_ID = [TABLEB]![TableB_ID];

Then, in the Relationships window, delete the existing relationship
between [TABLEB] and [TABLEA] and set a new one linking
[TABLEB].[TableB_ID] and [TABLEA].[TableB_ID], setting Referential
Integrity on this relationship.

Having done this, and no longer needing the duplicate fields
[TABLEA].[Project] and [TABLEA].[Status Date], I deleted them from
[TABLEA]. At this point, the Tables contained the following records:

[TABLEB] Table Datasheet View:
TableB_ID Project Status Date S_IND
----------- -------- ------------- -----
-1869411747 A 4/29/2004
-1679588807 B 11/23/2003 Y
-1670788141 A 12/11/2005
-1585945064 B 2/11/2004
-1499531505 D 10/31/2004
-727128156 E 7/25/2005
-12772858 C 1/25/2003
191389786 A 6/27/2005
1136072244 B 1/31/2004

[TABLEA] Table Datasheet View:

TableA_ID TableB_ID Cost Savings
----------- ------------ ------------- ------------
-2008314815 -1679588807 $2,250,000.00 $90,000.00
-1244319657 -12772858 $4,000,000.00 $200,000.00
-205746725 1136072244 $2,250,000.00 $100,000.00
581413196 -1499531505 $5,000,000.00 $250,000.00
1283352731 191389786 $1,000,000.00 $50,000.00
1619660142 -1670788141 $3,000,000.00 $150,000.00
1676911936 -1585945064 $2,000,000.00 $100,000.00

If you wish, you can then set a Lookup Property on the new
[TABLEA].[TableB_ID] foreign key, so that it will have a meaningful
appearance. (Raw key values are often, such as in my example here, not
meaningful to human beings. A Lookup property on the foreign-key field
allows you to read the records more easily, and I normally set it for
each foreign key I use, but some people prefer not to do so. In case
you wish to use a Lookup, I'll describe what I did to set it up. In
case you DON'T wish to use it, just skip to the bottom -- the Queries
will work just as well without the Lookup.)

For the Lookup property, I defined a Query to summarize the data in each
record in [TABLEB], including ">" if S_IND = "Y", the project name, and
the status date. There's nothing magic about these choices, but the
result should be meaningful to you, unique within your Table, and fairly
short. I sorted them by [S_IND], project name, and date.

[QL_TableB_Projects] SQL:
SELECT TABLEB.TableB_ID,
IIf([TABLEB]![S_IND]="Y",">"," ")
& [TABLEB]![Project] & " "
& [TABLEB]![Status Date] AS Proj
FROM TABLEB
ORDER BY TABLEB.S_IND DESC ,
TABLEB.Project, TABLEB.[Status Date] DESC;

[QL_TableB_Projects] Query Datasheet View:
TableB_ID Proj
-------------- --------------
-1679588807 >B 11/23/2003
-1670788141 A 12/11/2005
191389786 A 6/27/2005
-1869411747 A 4/29/2004
-1585945064 B 2/11/2004
1136072244 B 1/31/2004
-12772858 C 1/25/2003
-1499531505 D 10/31/2004
-727128156 E 7/25/2005

To apply this, I opened [TABLEA] in Table Design View, selected the
[TableB_ID] field, opened its Lookup tab, and set the properties as follows:

Display Control = List Box (instead of Text Box)
Row Source = QL_TableB_Projects
Column Count = 2
Column Widths = 0;1

I then opened [TABLEA] in Table Datasheet View, selected the [TableA_ID]
column, and used Format --> Hide Columns to hide it, since I had no need
to see those values. Actually, as it turns out, I had no need for the
entire [TABLEA].[TableA_ID] field, since no other Table makes reference
to it, so I could have deleted it, but leaving it in there does little
harm. I just don't want to have to look at it.

The result of setting Lookup property on the foreign key [TableB_ID] and
hiding the primary key [TableA_ID] was a prettier, more legible [TABLEA].

[TABLEA] Table Datasheet View (with Lookup):
TableB_ID Cost Savings
-------------- --------------- ------------
A 6/27/2005 $1,000,000.00 $50,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00
B 11/23/2003 $2,250,000.00 $90,000.00

Please bear in mind that the values stored in the [TableB_ID] field here
are still those key values you saw earlier, but they are displayed here
in a more meaningful format. (However, some people find this confusing
and recommend against using Lookup properties largely for that reason.)

If you use the Form Wizard to generate a Form for this Table, the Lookup
property will be copied to the List Box or Combo Box on that Form that
matches this field.

===

OK. Now we set up a revised Query (using the revised [TABLEB]
and[TABLEA], but I expect you can easily edit it to use your original
Table structures).

When I ran (my version of) your original Query on your Tables, I got the
following results, and I assume this is what you saw, too (extra record
for Project B):

[Q_Old_Orig_Select] Query Datasheet View:
Project Status Date Cost Savings
------- ----------- ------------- -----------
A 12/11/2005 $3,000,000.00 $150,000.00
B 2/11/2004 $2,000,000.00 $100,000.00
B 1/31/2004 $2,250,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00

The Query I defined -- well, actually, I set it up in two steps to make
it easier to debug -- looks like this. First step lists the records,
second step lists the totals.

[Q_New_Select] SQL:
SELECT B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
FROM (TABLEB AS B INNER JOIN TABLEB AS B2
ON B.Project = B2.Project)
INNER JOIN TABLEA AS A
ON B.TableB_ID = A.TableB_ID
GROUP BY B.Project, B.[Status Date],
A.Cost, A.Savings, B.S_IND
HAVING (((B.[Status Date])=Max([B2].[Status Date])))
OR (((B.S_IND)="Y"))
ORDER BY B.Project, B.[Status Date] DESC;

[Q_New_Select] Query Datasheet View:

Project Status Date Cost Savings S_IND
------- ----------- ------------- ------------ -----
B 2/11/2004 $2,000,000.00 $100,000.00
B 11/23/2003 $2,250,000.00 $90,000.00 Y
C 1/25/2003 $4,000,000.00 $200,000.00
D 10/31/2004 $5,000,000.00 $250,000.00
E 7/25/2005 $3,000,000.00 $150,000.00

The last step is pretty simple -- just add them up.

[Q_New_Select_Sums] SQL:

SELECT Sum(Q_New_Select.Cost) AS SumCosts,
Sum(Q_New_Select.Savings) AS SumSavings
FROM Q_New_Select;

[Q_New_Select_Sums] Query Datasheet View:

SumCosts SumSavings
-------------- -----------
$16,250,000.00 $790,000.00

-- Vincent Johns <[email protected]>
Please feel free to quote anything I say here.

You are right, that set of pseudo data does work correctly (I used my actual
data; I abbreviated a lot of things in what I posted).

[quoted text clipped - 28 lines]
Any help is appreciated.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top