Best way to find people on ALL lists.

  • Thread starter Thread starter Anne Savage
  • Start date Start date
A

Anne Savage

Hi -

I need to ask the question "Which people are on ALL of
these Lists". There's a people table, a list table and a
peoplelist xref table.

I know at least 4 ways to do this, two are quite fast and
two quite slow in my limited testing. The two fast ones
are below as Way1 & Way2 and the slower ones as Way3 &
Way4.

Does any one know any reason that Way1 would be better
than Way2 or vice versa? Or is there another, better way?

Ultimately this solution will be coded behind a form with
a multiple select list box, and an AND/OR radio button.
Obviously, this is the AND solution.

Any thoughts? Thanks for your help.


Way 1: Use Group By & Having Count =
(pseudo code)
SELECT PeopleID FROM PeopleList
WHERE (ListID In 1,2,3)
GROUP BY PeopleID
HAVING Count(PeopleListID=3);

Way 2: Use subquery for each list and inner join
(pseudo code)
SELECT DISTINCT list1.PeopleID
FROM ((Select PeopleID from PeopleList where ListID=1) as
List1
INNER JOIN (Select PeopleID from PeopleList where
ListID=2) as List2 ON
List1.PeopleID = List1.PeopleID)
INNER JOIN (Select PeopleID from
PeopleList where ListID=2) as List3 ON List2.PeopleID =
List3.PeopleID

Way3: Exists and correlated subqueries (I've found this is
a lot slower than Way1 & Way2)
(pseudo code)
SELECT PeopleID
FROM People
WHERE Exists (Select PeopleID from ListPeople where
People.PeopleID =
ListPeople.PeopleID and ListID = 1)
AND Exists (Select PeopleID from ListPeople where
People.PeopleID =
ListPeople.PeopleID and ListID = 2)
AND Exists (Select PeopleID from ListPeople where
People.PeopleID =
ListPeople.PeopleID and ListID = 3)

Way4 In & Subqueries (Again, much slower than Way1 & Way2)
SELECT PeopleID
FROM People
WHERE PeopleID In
(Select PeopleID from ListPeople where ListID = 1)
AND PeopleID In
(Select PeopleID from ListPeople where ListID = 2)
AND PeopleID In
(Select PeopleID from ListPeople where ListID = 3)
 
Hi,


See each SELECT as a for-loop. If the test occurring inside the for loop
depends on a "something" from outside, that makes it an imbricate loop, like

for i = 1 to n
for j = 1 to m
test imply i and j
next j
next i


If the test does not imply something external to the loop, if it can be
executed as a stand alone query, that is just a disjoint loop, like:

for i = 1 to n
...
next i
' keep result form that loop


for j= 1 to m
...
next j


In the first case, there is n*m operations (up to n*m) and in the second,
there is just n+m operations. That is the main difference. We generally
speak of correlated and not-correlated query, respectively, rather than
"inside" loop or "disjoint" loop, since looping is just but ONE way to solve
the query.


So, 3 should be the worse of all, with its multiple correlated query, 1
should be the best, having no extra query at all.


Now, why 4 is worse than 2 itself worse than 1. Each time we reach a table,
we have to deal with it. As example, the following:

SELECT f1, f2 FROM myTable
UNION
SELECT f1, f2 FROM myTable

is about twice slower than

SELECT DISTINCT f1, f2 FROM myTable


because we run through the same table, twice (lucky if it is in buffer in
memory, else, we pump it twice from the hard disk, very slow). Note that
both queries result is the same.


That may explain why 2 and 4 (and I disregard 3, already doomed) are slower
than 1. Now, why 2 is better than 4 is due to two things: an IN clause
cannot make use of the indexes, since the IN-list has no built in index...
and a JOIN may easily use indexes, to solve the ON criteria. The JOIN is
solved before the WHERE clause, and if records are "eliminated" from the
JOIN, through the use of index, the whole row does NOT have to be pumped
(from hard disk, and, in the case of Jet, via the network cable, eventually)
for evaluation in the WHERE clause, occurring after. So, 2 making the job in
a join, using indexes, is generally faster than 4, making the job in a WHERE
clause, mainly if many records will be rejected, they will have been pumped
uselessly, and has to scan the list(s), since no index exist on these ( that
is why it is preferable, for a IN list made of constants, to start with the
most highly probable list hit value, so the Field IN ( h1, h2, ...) will
stop at h1, no need to continue to scan the list, if Field=h1 very often) .


Well, you can probably come with special cases where, mainly with very small
tables, the order will not be 1-2-4-3 (from most preferred to doom), but in
general:

Avoid multiple table opening, if possible ( UNION versus DISTINCT; query4
versus query1, query2 versus query1)

Avoid correlated subquery. (query 3 is really doomed)

Use join rather than correlated sub-query (no example here, and query1 show
a solution without correlated subquery), in general.

Use JOIN rather than WHERE clause, if the condition implies two of the
joined (joinable) tables.

Avoid nullifying the use of index ( IN list, or computed expression, like:
WHERE Field+1 = 1, very bad, better do the math: WHERE Field1=0. Indeed,
there is no index on the expression "Field+1" (or on more complex
expression), plain and simple.

There are cases where ONE Exists can be at par, or better, than a ONE join
(mainly with MS SQL Server), but Jet gave up in NOT EXISTS ... really poor,
avoid NOT EXIST, with Jet, or NOT IN.

With Jet, try to make all the computations in the SELECT, rather that in
VBA; with MS SQL Server, it is generally seen preferable to make the
computation outside the SQL statement.


<rant>
ADO.Net books, dealing with their golumly "precious" disconnected mode,
promote the use of simple SQL statements with no JOIN. That is horrible, in
my humble opinion, but sure, it simplifies their miserable life since they
have to provide SQL statement to update/delete and append that correspond to
the SELECT... so simpler is their SELECT, simpler is their job. If you ever
read to avoid joins and complex queries, it may be from that context, of
their XML deformed body, which is far from the "tradition", and totally
uselessly complex on a desktop application.
<\rant>

Hoping it may help,
Vanderghast, Access MVP
 
Hi Anne,

Don't forget the power of a "pair of
multiselect listboxes" as demonstrated by
Helen Feddema in "Access Archon Column #72"
http://www.helenfeddema.com/access.htm
(72. Paired Multi-Select Listboxes accarch72.zip)

On your form you would show 2 listboxes side-by-side,
lstAvailable and lstSelected, bound to 2 tables, tblAvailable
and tblSelected.

When you open the form, you clear records from
tblSelected, and fill tblAvailable with all ListID's.

Between the listboxes you have 2 command buttons,
one with "left arrow", the other with "right arrow," whose
routines move selected values in listboxes from one to the
other.

When the user is ready to run the process that needs the
query, there in the lstSelected (and thus, in tblSelected)
are all the "selected" -- it doesn't get any clearer (nor cleaner)
than that when it comes to "interface."

But the main point is that in your query you simply need to
join to tblSelected.... no "IN's" or "EXIST's"....

In terms of interface and query speed, it is a win-win
solution.

Good luck,

Gary Walter
 
Hi there -

Just wanted to say thansk for your very complete answer.
I had already decided to go with #1, which also has the
added benefit of allowing me to use the same solution to
create the sql as I do for the or solution, then I just
wrap in the group by/having and I'm all set.

Thanks again.
 
Back
Top