Michel Walsh said:
We can expect the join to be faster than the correlated sub-query
since the join can immediate use indexes and walk down simultaneously,
side
by side, the two indexes implied in the ON clause. The correlated
sub-query
may miss that possibility in how to use efficiently the two indexes, and
thus, turn out to be slower. Sure, that assumes the fields are indexed,
and
that table are larges (else, indexes won't be loaded).
My usual reply to an assertion such as yours is to ask, 'Did you
test?'
So, did you?
I appreciate the detail of your reply (snipped) and thank you for it.
May I ask, is it based on how JOINs and subqueries *should* work in
theory or is it based on how Jet' *does* work in actual
implementation? My practical experience with Jet seems to differ from
your description. I say 'seem' because I have no real evidence from
testing to offer, merely anecdotal evidence that Jet doesn't seem to
use indexes very efficiently in JOINS, ORDER BY and GROUP BY. I
suspect procedural code under the covers as being the culprit.
Back to practical testing. You asserted that indexing may give an
advantage to the JOIN in the group by approach which may not be
afforded to the subquery. In my original test I used no indexing;
again, I thought the absence of indexing would make for a better test
because it means more work for the SQL engine.
I just now created an index on MyDataCol, compacted the database to
ensure the index was rebuilt and re-ran the test. You may be surprised
by the results:
subquery: 0mins 17secs
group by: 5min 48secs
Again, group by results did not come out in rank order, which
presumably is a requirement for a row ID.
I think you may have missed the fact the JOIN is completed before the
GROUP BY clause is applied and a self JOIN such as yours returns a
relatively large row set. The most efficient way (I think) of
demonstrating this is to use COUNT(*) because the SQL engine is
optimized for this construct, so:
SELECT COUNT(*)
FROM 10K_row_table As a
INNER JOIN 10K_row_table As b
ON a.MyDataCol >= b.MyDataCol
This returns a single value in 2min 01secs (remember the subquery
approach completed and returned a 10K ordered rowset in just 17secs).
Will you now do your own testing of your assertions and post your
findings? Will you at least acknowledge the validity (or otherwise) of
my testing? If you don't, anyone reading this will be unsure whether
I've given your group by approach a fair test.
[a] BIG advantage of the join approach (that uses a GROUP BY), is that it
[can be represented] entirely GRAPHICALLY... well, when the join is an
equi
join
From my perspective, there is zero advantage to being able to
represent a query graphically (whatever that means). And when the
trade-off is taking 200% to 2000% more time to run, I'm not sure
having a 'picture' of a query would be preferable to anyone.
BTW for ranking purposes, my subquery would be better written as:
SELECT COUNT(*) + 1 FROM 10K_row_table WHERE MyDataCol <
T1.MyDataCol
because it would award the lower number (higher rank) to duplicates of
the same rank.
Jamie.
--