Select Distinct Max

cicciotom2004 · Apr 6, 2006

Hi all

I need to filter a table in this way:
for each distinct value U in column A,
I want to select only the row having, in column B, the maximum value
between those rows having U in column A.

Example:

Field A | Field B
800000111111 | 27000
800000111111 | 4000
800000222222 | 5000
800000222222 | 8000
800000222222 | 15000
800000222222 | 2000
800000222222 | 15000
800000222222 | 9000
800000222222 | 9000
800000333333 | 4500
800000333333 | 2700
800000333333 | 9800

My result table will be

Field A | Field B
800000111111 | 27000
800000222222 | 15000
800000333333 | 9800

Thank you very much

Tom Ellison · Apr 6, 2006

Dear Tom:

Could it be this simple?

SELECT [Field A], MAX([Field B]) AS FieldB
FROM YourTableNameGoesHere
GROUP BY [Field A]
ORDER BY [Field A]

Tom Ellison

cicciotom2004 · Apr 6, 2006

Thanks Tom,

Sorry but I forgot to tell I actually have some other columns, of
different type.
I can't see how to select them also, without using an aggregate
function (that would not make much sense)

I.E.: I have

Field A | Field B | Field C | Field D
800000111111 | 27000 | .... | ....
800000111111 | 4000 | .... | .....
800000222222 | 5000 | ... | .....
800000222222 | 8000 | .... | ....

It must return

Field A | Field B | Field C | Field D
800000111111 | 27000 | .... | ....
800000222222 | 15000 | .... | ...

Thanks

Tom Ellison · Apr 6, 2006

Dear Tom:

That's more complicated.

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both. However, Field C and Field D could
definitely be different for those two rows. There's really no choice in
this, unless you give definite rules that will eliminate or even reduce such
duplicity.

Tom Ellison

Ciccio Tom · Apr 6, 2006

Tom:

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both.

Unfortunately, this is the case.
With a
SELECT DISTINCT [list af all columns] ...
to be executed BEFORE your query,
wouldn't I get a way to eliminate the chance of such duplicates?

However, Field C and Field D could
definitely be different for those two rows.

Yes they could.

Also, your query is very time consuming (I guess it has to run a query
for each row - there are 240000 rows)
Any way to speed it up?

Thanks a lot

Tom Ellison · Apr 7, 2006

Dear Tom:

Off the top of my head, an index on [Field A] and [Field B], in that order,
may be just the ticket to speed this up. However, if this is run on an
intervening query, and not a table, I'm not so sure. An optimal solution
could take a while to prepare.

A more thorough way to speed it up would be to put the intermediate results
from the former table into a temporary table, then add the index on [Field
A] and [Field B], then query that. I don't know how long "very time
consuming" means, and a fair measure of that might help me guess at the rest
of the situation just a bit better.

Be sure to add the right square bracket at the end of the first line, which
I omitted.

Tom Ellison

Ciccio Tom said:
Tom:

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as
well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both.

Click to expand...

Unfortunately, this is the case.
With a
SELECT DISTINCT [list af all columns] ...
to be executed BEFORE your query,
wouldn't I get a way to eliminate the chance of such duplicates?

However, Field C and Field D could
definitely be different for those two rows.

Click to expand...

Yes they could.

Also, your query is very time consuming (I guess it has to run a query
for each row - there are 240000 rows)
Any way to speed it up?

Thanks a lot

Ciccio Tom · Apr 7, 2006

Tom said:
I don't know how long "very time
consuming" means,

It means I run the query last night, and it still did not end.

Anyway, I stopped it, added an index on the table.
Index is FieldA AND FieldB AND AnotherVariableField

Now the query works, in some seconds it returns almost 52000 rows.
If I run a SELECT DISTINCT FieldA FROM MyTable I get 47000 rows.
That means I have 5000 rows having same FieldA of at least another row,
but differing in some other field. Right?

Thanks a lot

Tom Ellison · Apr 7, 2006

Dear Tom:

DISTINCT works across all the columns specified in the query. The 5000 rows
eliminated are duplicates in all 4 columns, not only FieldA. To find the
statistics on FieldA alone, drop the other 3 columns from the query.

Tom Ellison

Select Distinct Max

cicciotom2004

Tom Ellison

cicciotom2004

Tom Ellison

Ciccio Tom

Tom Ellison

Ciccio Tom

Tom Ellison