GROUP BY vs DISTINCT

D

David W. Fenton

Were you the one who said that life is too short??

The test demonstrated exactly what he had forecast, that DISTINCT is
the logical way to get DISTINCT values, and that GROUP BY is for a
different purpose (and would thus very likely not be as fast).

He only did the test because you stop you from continuing to natter
on about it.
 
D

David W. Fenton

and here is the query plan, again, the same, for both statements
(excuse the HTML format, I suspect that without it, it turns out
un-readable):

|--Sort(DISTINCT ORDER BY:([pubs].[dbo].[authors].[au_fname]
|ASC))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind]))

Er, that's a SQL Server query plan, not a Jet one.

Who said we were using SQL Server as the back end?
 
D

David W. Fenton

and here is the query plan, again, the same, for both statements
(excuse the HTML format, I suspect that without it, it turns out
un-readable):

|--Sort(DISTINCT ORDER BY:([pubs].[dbo].[authors].[au_fname]
|ASC))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind]))

And if you run it in Jet, the SHOWPLAN gives this (Jet 3.5 or 4.0,
run both wiht LastName indexed and indexed (non-unique)):

--- Query1 ---
[SELECT DISTINCT tblPerson.LastName
FROM tblPerson;]

- Inputs to Query -
Table 'tblPerson'
- End inputs to Query -

store result in temporary table


--- Query2 ---
[SELECT tblPerson.LastName
FROM tblPerson
GROUP BY tblPerson.LastName;]

- Inputs to Query -
Table 'tblPerson'
- End inputs to Query -

01) Group table 'tblPerson'

This shows that the answer to the question depends on the database
engine being used, and, as expected, Jet handles the two completely
differently.
 
W

Warrio

it's not that I want to waste my time saying that you are right or write
saying the only way is testing...

but the good thing in computing is that everything has a reason, somewhere
hidden or invisible, but it does exist! and saying that testing will give
the right answer is like gambling! how many parameters can influence your
testing????

even if you make it on a new machine, without any connection and only access
running! you can't have the same result.

come on, computing is one of the rare exact science, don't make something
esle!
 
B

Baz

If you are so convinced that Michel is right, on the basis of no evidence
whatsoever, I suggest that YOU better try it.

If someone here told you that the moon is made of green cheese I guess you
would believe that too.
 
B

Baz

My God what a pair of idiots (I mean Walsh and Warrio, not you David).

Probably better to let these two fools massage each other's egos in French
while we go and do something more worthwhile.

David W. Fenton said:
and here is the query plan, again, the same, for both statements
(excuse the HTML format, I suspect that without it, it turns out
un-readable):

|--Sort(DISTINCT ORDER BY:([pubs].[dbo].[authors].[au_fname]
|ASC))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind]))

Er, that's a SQL Server query plan, not a Jet one.

Who said we were using SQL Server as the back end?
 
B

Baz

If you really believe that garbage then you have no chance of getting an
answer to your question. Oh, but I'm forgetting, you have already accepted
Michel's answer, even though he hasn't a clue what he's talking about.

Warrio said:
it's not that I want to waste my time saying that you are right or write
saying the only way is testing...

but the good thing in computing is that everything has a reason, somewhere
hidden or invisible, but it does exist! and saying that testing will give
the right answer is like gambling! how many parameters can influence your
testing????

even if you make it on a new machine, without any connection and only access
running! you can't have the same result.

come on, computing is one of the rare exact science, don't make something
esle!


"David W. Fenton" <[email protected]> a écrit dans le message de
 
B

Baz

It's a good thing he doesn't because speculation and misinformation are no
use to anyone.
 
M

Michel Walsh

Indeed, I assumed that Jet would solve these simple basic queries the same
way MS SQL Server would do, but it seems Jet optimizer fails to optimize the
second query.

In MS SQL Server, there is no difference between the queries plan, even if
there are multiple fields (indexed, or not) between the DISTINCT and the
logical equivalent GROUP BY.


Vanderghast, Access MVP

David W. Fenton said:
and here is the query plan, again, the same, for both statements
(excuse the HTML format, I suspect that without it, it turns out
un-readable):

|--Sort(DISTINCT ORDER BY:([pubs].[dbo].[authors].[au_fname]
|ASC))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind]))

And if you run it in Jet, the SHOWPLAN gives this (Jet 3.5 or 4.0,
run both wiht LastName indexed and indexed (non-unique)):

--- Query1 ---
[SELECT DISTINCT tblPerson.LastName
FROM tblPerson;]

- Inputs to Query -
Table 'tblPerson'
- End inputs to Query -

store result in temporary table


--- Query2 ---
[SELECT tblPerson.LastName
FROM tblPerson
GROUP BY tblPerson.LastName;]

- Inputs to Query -
Table 'tblPerson'
- End inputs to Query -

01) Group table 'tblPerson'

This shows that the answer to the question depends on the database
engine being used, and, as expected, Jet handles the two completely
differently.
 
M

Michel Walsh

Half of the pair will humbly tell you that if it just happen that Jet is ...
crippled... in the particular case, that is surely not 'by design', since
clearly, other SQL optimizer achieve to find the SAME plan of execution for
a DISTINCT and the logically equivalent GROUP BY.


There is nothing 'built-in' the concepts of DISTINCT or of GROUP BY that may
leads you to think they logically 'differ', exception made for some syntax
particularities like DISTINCT which can be used with *, while GROUP BY
cannot (due to extra validations carried over when GROUP BY is explicitly
used).



Vanderghast, Access MVP.


Baz said:
My God what a pair of idiots (I mean Walsh and Warrio, not you David).

Probably better to let these two fools massage each other's egos in French
while we go and do something more worthwhile.

David W. Fenton said:
and here is the query plan, again, the same, for both statements
(excuse the HTML format, I suspect that without it, it turns out
un-readable):

|--Sort(DISTINCT ORDER BY:([pubs].[dbo].[authors].[au_fname]
|ASC))

|--Index Scan(OBJECT:([pubs].[dbo].[authors].[aunmind]))

Er, that's a SQL Server query plan, not a Jet one.

Who said we were using SQL Server as the back end?
 
K

Keith Wilby

Baz said:
If someone here told you that the moon is made of green cheese I guess you
would believe that too.

Ridiculous. Everyone knows it's cheddar, right Gromit?
 
M

Michel Walsh

The test he does did NOT prove the second part of your (or is it his)
affirmation (DISTINCT and GROUP BY being for different 'purposes') since
other alternatives exist to explain the difference, such as that Jet not
doing as a good job as it should, in a specific case. Generalization from a
single case is unwise.


Vanderghast, Access MVP
 
M

Michel Walsh

And to further clarify the spirit of my last message, I will like to recall
to your attention that SQL is not an imperative language, but a descriptive
one, a language where you specify what you want, not HOW TO get it. In that
'spirit' of the SQL language, DISTINCT and GROUP BY are NOT for different
purposes.

*If* someone, unaware, just realized that a DISTINCT can be replace by a
GROUP BY to get the same 'description' of what is wanted, well, that is
great. Newsgroup are to educate, too.

Vanderghast, Access MVP.

Michel Walsh said:
The test he does did NOT prove the second part of your (or is it his)
affirmation (DISTINCT and GROUP BY being for different 'purposes') since
other alternatives exist to explain the difference, such as that Jet not
doing as a good job as it should, in a specific case. Generalization from
a single case is unwise.


Vanderghast, Access MVP
 
M

Michel Walsh

Can you repeat that with my name explicitly written in it?

Vanderghast, Access MVP
 
D

David W. Fenton

Half of the pair will humbly tell you that if it just happen that
Jet is ... crippled...

Jet is not in any way crippled.
in the particular case, that is surely not 'by design', since
clearly, other SQL optimizer achieve to find the SAME plan of
execution for a DISTINCT and the logically equivalent GROUP BY.

There is nothing 'built-in' the concepts of DISTINCT or of GROUP
BY that may leads you to think they logically 'differ', exception
made for some syntax particularities like DISTINCT which can be
used with *, while GROUP BY cannot (due to extra validations
carried over when GROUP BY is explicitly used).

They seem obviously different to me, as one is eliminating
duplicates, while the other is going group on like values for the
purpose of summarizing other field(s). It makes perfect sense to me
for them to not be optmized the same, though I guess a query
optimizer that recognizes there's nothing to summarize could then
use the same methods as with DISTINCT.

But that looks to me more like a fall-back to fix user error. I see
no logical reason why someone should expect them to be treated
identically. There are any number of methods in SQL that can produce
results that appear identical, but that doesn't mean that all
methods for doing so should be optimized exactly the same way.
 
D

David W. Fenton

And to further clarify the spirit of my last message, I will like
to recall to your attention that SQL is not an imperative
language, but a descriptive one, a language where you specify what
you want, not HOW TO get it. In that 'spirit' of the SQL language,
DISTINCT and GROUP BY are NOT for different purposes.

Yes, different database engines treat them differently.
*If* someone, unaware, just realized that a DISTINCT can be
replace by a GROUP BY to get the same 'description' of what is
wanted, well, that is great. Newsgroup are to educate, too.

They aren't the same just because they can be used to produce the
same results. That's a major logical fallacy.

The SQL Server optimization is a fix for USER ERROR.
 
D

DAVID

Warrio said:
Hello!

Which of GROUP BY and DISTINCT is the quickest? and why?

Thanks for any relevant answer!

PS: Please don't tell me that there is no difference because it's a matter
of one millionth second.
"When a SELECT statement is executed, the JET engine always performs an
implicit sort if a DISTINCT modifier is present; an ORDER BY clause
causes another sort to be done, which occurs after all other sorts (if
any)."

Both "group by" and "distinct" are normally limited by the speed of the
sort operation.

(david)
 
M

Michel Walsh

The purpose of an optimizer is to find the best method. Jet optimizer fails
to find the best method in that specific case, so Jet optimizer didn't
optimize as it should. I hardly see how can someone can say that this is
fine and acceptable state. So, indeed, Jet is crippled, in that specific
case, as it is in the NOT IN() construction.

If a better method is called a 'fix-up' in your dictionary, then, no problem
with me, as long as NO ONE think there is no need to bring these 'fix-ups'
into Jet.

Finally, again, SQL is descriptive, and logically equivalent results should
be optimized the same way, indecently of the exact syntax... in particular
if the query is a basic elementary one.


Vanderghast, Access MVP
 
D

David W. Fenton

The purpose of an optimizer is to find the best method.

Yes. Given the commands given.
Jet optimizer fails
to find the best method in that specific case, so Jet optimizer
didn't optimize as it should.

Because it was given the wrong commands.
I hardly see how can someone can say that this is
fine and acceptable state. So, indeed, Jet is crippled, in that
specific case, as it is in the NOT IN() construction.

All SQL variants have certain things that aren't well-implemented.
If a better method is called a 'fix-up' in your dictionary, then,
no problem with me, as long as NO ONE think there is no need to
bring these 'fix-ups' into Jet.

It would be nice if Jet could fix the user error, as long as it
didn't break performance in other cases.
Finally, again, SQL is descriptive, and logically equivalent
results should be optimized the same way, indecently of the exact
syntax... in particular if the query is a basic elementary one.

But they aren't logically equivalent -- they are only equivalent in
the *results* displayed.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top