Excluding related records and a subquery NOT IN clause

S

Scott Millhisler

I have a question regarding the NOT IN clause in subqueries. Is the NOT IN
clause known to be slow? I am accessing linked Access databases using DAO
under Access 2000.

I've been trying to work around using NOT IN but have had no success so far.

What I have, for the sake of illustration, is two tables. One is a child of
the other with a logical one-to-many relationship.

Tables: Master, Child

Both tables have a long integer connector field named IDNum. IDNum is
unique in the table Master.

The table Child also has a long integer field for storing attribute codes
called AttrIDNum. Each record in the master table can have 0 or more
related entries in the child table.

What I want to do is filter out certain attributes using a query, recordset,
or whatever.

I have built a third table (Exclude) that contains a list of the AttrIDNums
that I want filtered out. Therefore, the result set I want should be a list
of IDNums from Master that have no related records in Child where the
AttrIDNum is in the table Exclude.

A sample SQL techniques I have tried is:

SELECT * FROM Master WHERE IDNum NOT IN (SELECT IDNum FROM Child WHERE
AttrIDNum = ANY (SELECT AttrIDNum FROM Exclude));

There are approximately 6,000 records in Master and 12,000 in Child. It
takes several minutes for this query to run. Is there a better way to
handle this type of query?

I have tried doing tests where I hard code the AttrIDNums in the Child query
to eliminate the query against the Exclude table but it doesn't help much at
all.

In my actual application I filter more than this, such as records that must
have all of a series of attributes and records that must have one or more of
a series of attributes. However, these queries run at a very acceptable
speed using subqueries and the = ANY clause. It is when I do the
exclusionary query that I see the very slow performance.

Any thoughts or ideas?

Sincerely,

Scott
 
B

Brian Camire

Yes, NOT IN can be slow. You might try a query using NOT IN in a different
way something like this:

SELECT
Master.*
FROM
Master
INNER JOIN
Exclude
ON Master.IDNum=Exclude.AttrIDNum
WHERE
Master.IDNum Not In (SELECT IDNum FROM Child)

but you might get better performance out of something like this:

SELECT
Master.*
FROM
(Master
INNER JOIN
Exclude
ON
Master.IDNum = Exclude.AttrIDNum)
LEFT JOIN
Child
ON Master.IDNum = Child.IDNum
WHERE
Child.IDNum Is Null
 
S

Scott Millhisler

Thanks for the reply but I think you misunderstood the table structure, or
I'm misunderstanding your solution.

You have "INNER JOIN Exclude ON Master.IDNum=Exclude.AttrIDNum..." yet IDNum
and AttrIDNum are in no way related. IDNum is a unique key to records in
the Master table and AttrIDNum is a reference to an attribute type found in
the Child table. Linking these two fields would make no sense.

Scott Millhisler
SJM Computer Consulting
 
B

Brian Camire

In that case you might try:

SELECT
Master.*
FROM
Master
LEFT JOIN
(SELECT
Child.IDNum
FROM
Child
INNER JOIN
Exclude
ON
Child.AttrIDNum = Exclude.AttrIDNum) AS A
ON
Master.IDNum = A.IDNum
WHERE
A.IDNum Is Null
 
S

Scott Millhisler

Thanks, I never thought to subquery one of the join parameters. I believe
it will work just fine. It did a test run on the dataset and it returned
the full results in about 2 seconds. A far cry from what using NOT IN was
doing.

Thanks again,

Scott Millhisler
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top