Inconsistent query results

cpnet · May 4, 2005

I've discovered a serious problem in my application. My app uses an MS
Access ('97) format database to process some data for reports. I have a lot
of queries of queries of queries etc. My report database actually links to
tables in a 'real' Access database so that I don't change the 'real'
database for the sake of some reports.

Anyway, I have a query called, "baseQuery".

I've also run the following SQL:

SELECT * INTO baseTable FROM baseQuery

As a result of running this command baseQuery and baseTable both have the
exact same data.

I also have finalQuery, which is a query that looks at baseData (baseData is
expected to be a table/view in the exact format of baseQuery and baseTable).
By renaming baseQuery, or baseTable to "baseData", I can run finalQuery on
either the table or query version of the baseQuery data - without changing
finalQuery. The results of finalQuery should be identical in either case.
However, I'm finding that when finalQuery is working on baseQuery,
finalQuery returns 26 rows. When finalQuery runs on baseTable, I get 400
rows!!!

I'm getting no warnings or errors or any other indication that there's a
problem. I've tired this on a fully patched XP Pro machine, and 2 fully
patched Win98 SE machines.

This also seems dependent on the complexity of baseQuery. When baseQuery
involves fewer joins, and I persist the data to baseTable, I see that
finalQuery will return the same result set whether working with equivalent
source data from a table or query.

JohnFol · May 5, 2005

So what is common to the 374 rows that conditionally appear?

cpnet · May 5, 2005

There is no apparent pattern. But, I think this is irrelevant anyway. If
you have a table and a query each of which provide the _exact_ same
resulting dataset when you view them, you would expect a 2nd query that
looks at the source table or query to produce the exact same results.

To try to be a little more concrete:

Query1 : A complex query with inner and outer joins of queries and tables.

Table1 : This is produced by running "SELECT * INTO [Table1] FROM [Query1]"

If I run "SELECT * FROM [Query1]" or 'SELECT * FROM [Table1]", I get the
_exact_ same result set. So far everything is fine. But here's where it
gets interesting:

Query2a :

SELECT
q1a.ID, q1a.someDate,
Max(IIF(qa2.ID > qa1.ID, q1b.someDate, Null)) AS NextDate
FROM
[Query1] q1a
LEFT JOIN [Query1] q1b ON
((q1a.ID <> q1b.ID) AND (q1a.CLNTID = q1b.CLNTID))
GROUP BY
q1a.ID, q1a.someDate

Query2b :

SELECT
q1a.ID, q1a.someDate,
Max(IIF(qa2.ID > qa1.ID, q1b.someDate, Null)) AS NextDate
FROM
[Table1] q1a
LEFT JOIN [Table1] q1b ON
((q1a.ID <> q1b.ID) AND (q1a.CLNTID = q1b.CLNTID))
GROUP BY
q1a.ID, q1a.someDate

Notice that Query2a and Query2b are identical, except that one uses Query1
for it's source data, and the other uses Table1. (But remember Query1 and
Table1 return the _exact_ same result set).

Query2a will return about 26 records, and Query2b will return about 400.
Query2a and Query2b should be returning the same result set. It's like
Access is trying to do some optimization when querying a query, but it's
getting it wrong. If Query2a is too complex (because it involves queries of
queries of queries), then I should get a "too complex..." error.

cpnet · May 5, 2005

oops.. should be

Query2a :

SELECT
q1a.ID, q1a.someDate,
--> Max(IIF(q1a.ID > q1b.ID, q1b.someDate, Null)) AS NextDate
FROM
[Query1] q1a
LEFT JOIN [Query1] q1b ON
((q1a.ID <> q1b.ID) AND (q1a.CLNTID = q1b.CLNTID))
GROUP BY
q1a.ID, q1a.someDate

Query2b :

SELECT
q1a.ID, q1a.someDate,
--> Max(IIF(q1a.ID > q1b.ID, q1b.someDate, Null)) AS NextDate
FROM
[Table1] q1a
LEFT JOIN [Table1] q1b ON
((q1a.ID <> q1b.ID) AND (q1a.CLNTID = q1b.CLNTID))
GROUP BY
q1a.ID, q1a.someDate

I actually wrote these by hand as a simplified example of the situation. In
my testing of the problem, I just used cut and paste to copy Query2a to
Query2b, then edited [Query1] to read [Table1] (of vice versa) to compare
what happened when using a table or query producing the same result set as
the source for Query2a/b.

cpnet · May 5, 2005

As explained in my other response to your question, I can't see any pattern
(I looked for patterns of particular values or nulls in columns, but I can't
see anything). Also, I still think this is irrelevant. I have [Query2]
that examines some data. Whether that source data is coming from another
query or a table, as long as the source query or table contain the exact
same result set (and they do), [Query2] should give me the exact same
result, but it doesn't.

I have made an interesting discovery though...

[Query2] references [Query1] and [Query2] is not returning a lot of records
that it should (I've already established this).

[Query1] includes the 'DISTINCT' clause so it contains no duplicate records.
[Query1] references [QueryX]. Now, [QueryX] does not contain a 'DISTINCT'
clause and does have some duplicates. When I add the 'DISTINCT' clause to
[QueryX], the duplicates are dropped. However, the output of [Query1]
remains the same since the 'DISTINCT' clause in [Query1] was already dealing
with any duplicates from [QueryX]. So to recap, [Query1] has the same
output (no duplicates) regardless of whether or not [QueryX] has duplicates.

But here's the funny part. [Query2] does produce the correct output when
[QueryX] includes the 'DISTINCT' clause. This is strange, since [Query2]
has no direct knowledge of [QueryX]. [Query2] only references [Query1].
It's [Query1] that in turn references [QueryX]. As I've already explained,
the result set returned by [Query1] is identical, regardless of whether or
not [QueryX] returns duplicates (becuase [Query1] always has a 'DISTINCT'
clause to remove duplicates).

It really seems that when you run a query of a query of a query etc., Access
doesn't run each one in series, but rather tries to do some
behind-the-scenes optimization. Unfortunately, it doesn't seem to be doing
it correctly in all cases. If it can't correctly optimize a long chain of
queries, then it should give me an error, rather than a result set that's
just missing some records.

cpnet · May 6, 2005

The database in question is actually an export from an application I've
written. Different users (with separate copies of my app) can all export to
an Access '97 format database. I have another tool which allows them to
aggregate all of these databases into a single database of the same format.

I have looked at databases from 2 other customers, and determined that the
problem I'm seeing only occurs in one of the 3 databases. I can't see any
peculiarity in the data of the 'broken' database. However, when I aggregate
the 3 databases, the aggregated database shows the broken behaviour.

These facts together led me to believe that maybe the problem was a corrupt
database. So, I took the 'broken' database, and exported all tables to XML
(data and schema). I then created a brand new database, and imported the
tables from my XML. I figured that this would either get rid of any
corruption in the tables, or throw an error because the XML/Access
conversion would find any corruption. The XML exported and imported with no
error, but the new database now exhibits the same problem behaviour!!!

Corruption seemed to be the only possible explanation, but it would seem to
me that the export/import to a new DB using XML would rule out corruption as
the problem.

The fact that the queries work fine on 2 out of 3 databases, would suggest
maybe it's an issue with the data in the one database (but I can't find any
problem).

The impact of the 'DISTINCT' clause in indirect queries on the final result
on the 'problem' database (described in the prior post) suggests that Access
is actually doing something wrong.

So what's going on here?!? Is Access doing something wrong, but only in
rare cases?

Please help!

Inconsistent query results

cpnet

JohnFol

cpnet

cpnet

cpnet

cpnet