Comparing 2 recordsets to get Missing or Different

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I presently have a DAO routine that creates recordsets so I can scroll
through and find different or missing values between the recordsets.
I felt I could have the most control with this setup.

However, after being recently exposed to some powerful SQL (see More help
with A Not so simple - MAX function - answer by Ofer)

I would much rather get rid of the code, and construct an SQL query.

I have a table called tbl_AIPProfileVariables

The fields are:
ProfileVariable
PF_Value
Base Environment
Base Env Date
Modified Environment
Mod Env Date

Though it is just one table, the comparison I will be doing is between
Environments of Different Env Date

For example: I would like to compare Environment "A" extracted on 12/15/2004
8:30 AM with Environment "A" extracted on 12/03/2004 3:30 PM
A comparison could also be one between A and B, but for the purposes of this
example, I want to illustrate that it could be between same name environments
with just different Env Dates

and I'd like the result to look something like this: (date has been
shortened for this example)

ProfileVariable PF_Value Status ChangedFrom Base Base Mod
Mod

Env Dt Env dt
========= ====== ==== ========= === === ==== ===
Send_PT 3 Different 2 A
12/15 A 12/3
Appt_Count Count all Missing - A
12/15 A 12/3
TimeOfDay Morning Different Evening A 12/15
A 12/3


I'm presently walking through the recordsets using DAO...
but I feel this could be handled more efficiently with SQL?
 
Dear Jon:

If you have a large number or rows to look at, it would certainly be easier
to let some SQL code do the walking.

You seem to indicate you want to compare all rows from within your table
having the same value in one column, Environment.

There are, as you show, 3 different possible actions: a value of
Environment is found only once, a value is found in two rows and both are
identical in other particulars, or a value is found in two rows and are not
identical. Unless the value of Environment is found more than twice, you
could also find multiple matches. There could conceivably be 3 values with
"A" for Environment in one table and 2 in the other. This would result in
there being 6 compared combinations between the two tables, with each being
identical or not identical.

To keep it simple, consider there to be only one other column to compare for
identical values. This can be very easily extended to compare additional
columns. I'm assuming the Dt (date?) column is not of interest for
comparison. There's not point in whether the dates are the same between the
two sets, right?

SELECT *
FROM tbl_AIPProfileVariables T1,
tblAIPProfileVariables T2

In this first step, we set up a comparison between every pair of rows in the
table. If there were 100 rows, this "cross product" would have 10,000 rows,
every rows being compared with every other row.

We can reduce this set by eliminating those rows where the comparison is
between two different values of Environment, as follows:

SELECT *
FROM tbl_AIPProfileVariables T1,
tblAIPProfileVariables T2
WHERE T1.Environment = T2.Environment

Also, assuming Environment/Dt is a unique key taken together, we can
eliminate comparisons of any row with itself:

SELECT *
FROM tbl_AIPProfileVariables T1,
tblAIPProfileVariables T2
WHERE T1.Environment = T2.Environment
AND T1.Dt <> T2.Dt

Now, this result set can be reduced to show either those that do match or
those that don't match with respect to the other columns. I'm not sure what
your other columns are named, or how many you have. The example you gave
wraps and is difficult. I'll just call them C1, C2, and C3. You should be
able to adapt this to your actual column names:

SELECT *
FROM tbl_AIPProfileVariables T1,
tblAIPProfileVariables T2
WHERE T1.Environment = T2.Environment
AND T1.Dt <> T2.Dt
AND T1.C1 = T2.C1
AND T1.C2 = T2.C2
AND T1.C3 = T2.C3

This provides any pairs that do match. To see those that don't, negate the
last 3 rows:

SELECT *
FROM tbl_AIPProfileVariables T1,
tblAIPProfileVariables T2
WHERE T1.Environment = T2.Environment
AND T1.Dt <> T2.Dt
AND NOT (T1.C1 = T2.C1
AND T1.C2 = T2.C2
AND T1.C3 = T2.C3)

To find rows without a match:

SELECT Environment
FROM tblAIPProfileVariables
GROUP BY Environment
HAVING COUNT(*) = 1

If you want the format of results I think you show in your example, we'll
need to add a column describing "Missing", "Different", and "Identical" and
then create a union, filling in the missing columns from this last set.
Let's cross that bridge later, OK?

Tom Ellison
 
Dear Tom:

I really like where you're going with this so let me take the time to
clarify more of the particulars.

What I have is a database that tracks the changes made in multiple
environments.
We have created an interface that enables us to do comparisons against
different environments, or the SAME environments at different moments in time.

Environment names are like REGNHIM, PRODHIM, STSTHI1, etc...

in my particular example, I am giving the example of comparing profile
variables from PRODHIM extracted on 12/15/2005 against PRODHIM extracted on
12/03/2005

What I really care about is the "Delta" that has occured between these
extractions.
so, what I'm looking for are the Different Profile values (most likely, the
PF_Value and not necessarilly the ProfileVariable - "a label") and the
"Missing" profile variables.

I totally get where you're going with the descriptive column and UNION
query, so you probably don't even need to cross that bridge...

But if I could have the basic missing and different SQL statements in place
for the MANY comparisons that I am going to have to do, it will be great,
knowing that if there are any errors in the results, --it's probably from an
incorrect extract!

I will try to implement what you have suggested so far, but just in case my
clarification has proved a different solution is in order, please show me
those examples... again thank you for taking the time to walk me through it
in this manner.
 
Dear Jon?:

I'm glad if this has been some help. So, we're agreed to create separate
solutions for MISSING, IDENTICAL, and DIFFERENT. You will be able to UNION
these later as needed.

By MISSING I believe you mean there is only one row meeting the criteria.
That is, there is no comparable row to use with it.

It really sounds like your original communication has me on the right track.
I don't have any suggestions currently for any change in my original
response. The work of verifying my understanding is really in your hands.

Tom Ellison
 
Ok, so here's where I'm getting stuck.

I took your example and tried to make it fit my interface:
My interface looks something like this:


cmbModEnv ModEnvDate
(This lets the user choose an environment) (Identifies the extraction)

cmbBaseEnvi BaseEnvDate
(This lets the user choose an environment) (Identifies the extraction)

cmdRunComparison
(Button to take the parameters provided by the user and apply it in VBA
using SQL)


So it would seem like T1 could be the "Mod Environment"
and T2 could be the "Base Environment"

Before I plug these parameters in, I tried modifying the SQL with HARD
values, just to see if I get something - But for the example below, I put how
I plan to use it in my code: (This structure is what keeps erroring out when
I try plugging in real values)

SELECT *
FROM tbl_AIP_ProfileVariables Where Environment = cmbModEnv AND
ActualDateTime = #ModExtractDate# AS T1,

FROM tbl_AIP_ProfileVariables Where Environment = cmbBaseEnv AND
ActualDateTime = #BaseExtractDate# AS T2,

HAVING (((T1.EnvironmentName)=[T2].[EnvironmentName]) AND
((T1.ActualDateTime)<>[T2].[ActualDateTime]) AND
(([T1].[ProfileVariable]=[T2].[ProfileVariable] And
[T1].[PF_Value]=[T2].[PF_Value])=False));
 
Dear Jon:

Limited time at the moment.

Do not repeat FROM. Just one.

I prefer not to filter in a HAVING, expecially as you are not grouping. Use
WHERE.

I'm thinking what you want may be:

SELECT *
FROM tbl_AIP_ProfileVariablesT1, tbl_AIP_ProfileVariables T2
WHERE T1.Environment = [Forms].[FormName].cmbBaseEnv
AND T1.ActualDateTime = "#" & [Forms].[FormName].[BaseExtractDate] & "#"
AND T2.Environment = [Forms].[FormName].[cmbModEnv]
AND T2.ActualDateTime = "#" & [Forms].[FormName].[ModExtractDate] & "#"
AND ((T1.ActualDateTime) <> [T2].[ActualDateTime])
AND ([T1].[ProfileVariable] <> [T2].[ProfileVariable]
OR [T1].[PF_Value] <> [T2].[PF_Value])

One of the above is unnecessary. Since you limit T1.ActualDateTime to
BaseExtractDate and T2.ActualTimeDate to ModExtractDate they will be equal
only when the selected filters are equal. If you don't want to allow them
to be equal, then don't allow these selections to be equal. Makes sense to
me, anyway. So now I have:

SELECT *
FROM tbl_AIP_ProfileVariablesT1, tbl_AIP_ProfileVariables T2
WHERE T1.Environment = [Forms].[FormName].cmbBaseEnv
AND T1.ActualDateTime = "#" & [Forms].[FormName].[BaseExtractDate] & "#"
AND T2.Environment = [Forms].[FormName].[cmbModEnv]
AND T2.ActualDateTime = "#" & [Forms].[FormName].[ModExtractDate] & "#"
AND ([T1].[ProfileVariable] <> [T2].[ProfileVariable]
OR [T1].[PF_Value] <> [T2].[PF_Value])

Notice how I handled the dates. The # is a literal and must be in quotes.
Append it to the values from the controls. This is an important detail you
may not have understood.

So, does this get closer?

Tom Ellison


jonefer said:
Ok, so here's where I'm getting stuck.

I took your example and tried to make it fit my interface:
My interface looks something like this:


cmbModEnv ModEnvDate
(This lets the user choose an environment) (Identifies the
extraction)

cmbBaseEnvi BaseEnvDate
(This lets the user choose an environment) (Identifies the
extraction)

cmdRunComparison
(Button to take the parameters provided by the user and apply it in VBA
using SQL)


So it would seem like T1 could be the "Mod Environment"
and T2 could be the "Base Environment"

Before I plug these parameters in, I tried modifying the SQL with HARD
values, just to see if I get something - But for the example below, I put
how
I plan to use it in my code: (This structure is what keeps erroring out
when
I try plugging in real values)

SELECT *
FROM tbl_AIP_ProfileVariables Where Environment = cmbModEnv AND
ActualDateTime = #ModExtractDate# AS T1,

FROM tbl_AIP_ProfileVariables Where Environment = cmbBaseEnv AND
ActualDateTime = #BaseExtractDate# AS T2,

HAVING (((T1.EnvironmentName)=[T2].[EnvironmentName]) AND
((T1.ActualDateTime)<>[T2].[ActualDateTime]) AND
(([T1].[ProfileVariable]=[T2].[ProfileVariable] And
[T1].[PF_Value]=[T2].[PF_Value])=False));










Tom Ellison said:
Dear Jon?:

I'm glad if this has been some help. So, we're agreed to create separate
solutions for MISSING, IDENTICAL, and DIFFERENT. You will be able to
UNION
these later as needed.

By MISSING I believe you mean there is only one row meeting the criteria.
That is, there is no comparable row to use with it.

It really sounds like your original communication has me on the right
track.
I don't have any suggestions currently for any change in my original
response. The work of verifying my understanding is really in your
hands.

Tom Ellison
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top