Select Distinct Max

C

cicciotom2004

Hi all

I need to filter a table in this way:
for each distinct value U in column A,
I want to select only the row having, in column B, the maximum value
between those rows having U in column A.

Example:

Field A | Field B
800000111111 | 27000
800000111111 | 4000
800000222222 | 5000
800000222222 | 8000
800000222222 | 15000
800000222222 | 2000
800000222222 | 15000
800000222222 | 9000
800000222222 | 9000
800000333333 | 4500
800000333333 | 2700
800000333333 | 9800

My result table will be

Field A | Field B
800000111111 | 27000
800000222222 | 15000
800000333333 | 9800


Thank you very much
 
T

Tom Ellison

Dear Tom:

Could it be this simple?

SELECT [Field A], MAX([Field B]) AS FieldB
FROM YourTableNameGoesHere
GROUP BY [Field A]
ORDER BY [Field A]

Tom Ellison
 
C

cicciotom2004

Thanks Tom,

Sorry but I forgot to tell I actually have some other columns, of
different type.
I can't see how to select them also, without using an aggregate
function (that would not make much sense)

I.E.: I have

Field A | Field B | Field C | Field D
800000111111 | 27000 | .... | ....
800000111111 | 4000 | .... | .....
800000222222 | 5000 | ... | .....
800000222222 | 8000 | .... | ....

It must return

Field A | Field B | Field C | Field D
800000111111 | 27000 | .... | ....
800000222222 | 15000 | .... | ...


Thanks
 
T

Tom Ellison

Dear Tom:

That's more complicated.

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both. However, Field C and Field D could
definitely be different for those two rows. There's really no choice in
this, unless you give definite rules that will eliminate or even reduce such
duplicity.

Tom Ellison
 
C

Ciccio Tom

Tom:

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both.


Unfortunately, this is the case.
With a
SELECT DISTINCT [list af all columns] ...
to be executed BEFORE your query,
wouldn't I get a way to eliminate the chance of such duplicates?

However, Field C and Field D could
definitely be different for those two rows.

Yes they could.

Also, your query is very time consuming (I guess it has to run a query
for each row - there are 240000 rows)
Any way to speed it up?

Thanks a lot
 
T

Tom Ellison

Dear Tom:

Off the top of my head, an index on [Field A] and [Field B], in that order,
may be just the ticket to speed this up. However, if this is run on an
intervening query, and not a table, I'm not so sure. An optimal solution
could take a while to prepare.

A more thorough way to speed it up would be to put the intermediate results
from the former table into a temporary table, then add the index on [Field
A] and [Field B], then query that. I don't know how long "very time
consuming" means, and a fair measure of that might help me guess at the rest
of the situation just a bit better.

Be sure to add the right square bracket at the end of the first line, which
I omitted.

Tom Ellison


Ciccio Tom said:
Tom:

SELECT [Field A], [Field B], [Field C], [Field D
FROM YourTableNameGoesHere T
WHERE [Field B] =
(SELECT MAX([Field B])
FROM YourTableNameGoesHere T1
WHERE T1.[Field A] = T.[Field A])
ORDER BY [Field A]

If two rows for the same Field A value have the same Field B value as
well,
and it is the maximum value of Field B for all rows of that Field A value,
then you're going to get both.


Unfortunately, this is the case.
With a
SELECT DISTINCT [list af all columns] ...
to be executed BEFORE your query,
wouldn't I get a way to eliminate the chance of such duplicates?

However, Field C and Field D could
definitely be different for those two rows.

Yes they could.

Also, your query is very time consuming (I guess it has to run a query
for each row - there are 240000 rows)
Any way to speed it up?

Thanks a lot
 
C

Ciccio Tom

Tom said:
I don't know how long "very time
consuming" means,

It means I run the query last night, and it still did not end.


Anyway, I stopped it, added an index on the table.
Index is FieldA AND FieldB AND AnotherVariableField

Now the query works, in some seconds it returns almost 52000 rows.
If I run a SELECT DISTINCT FieldA FROM MyTable I get 47000 rows.
That means I have 5000 rows having same FieldA of at least another row,
but differing in some other field. Right?


Thanks a lot
 
T

Tom Ellison

Dear Tom:

DISTINCT works across all the columns specified in the query. The 5000 rows
eliminated are duplicates in all 4 columns, not only FieldA. To find the
statistics on FieldA alone, drop the other 3 columns from the query.

Tom Ellison
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top