JET Implementation of "Select Top xx Percent"

S

Smartin

Not so much a question but an observation.

I have been analyzing somewhat large chunks of data (on the order of
25000 rows) using SELECT xx PERCENT. I noticed that the returned number
of rows generally differs from the expected number by tens or even a
couple thousand.

The explanation seems to be that SELECT xx PERCENT returns all the rows
that favor xx on the column indicated, not a percentage of source rows
to be returned.

For example, if I query
SELECT TOP 95% MYTABLE.* FROM MYTABLE ORDER BY SOMEVALUE ASC;
Access seems to calculate 95% of the max of SOMEVALUE, then returns the
result of
SELECT TOP 95% MYTABLE.* FROM MYTABLE
WHERE SOMEVALUE <= [max(somevalue*.95)]
ORDER BY SOMEVALUE ASC;

Since there might be many rows that match <= [max(somevalue*.95)] the
result set might differ considerably from the expected. I suspect
rounding factors into this as well, as I have seen in other posts.
 
M

Michael Gramelspacher

smartin108 said:
Not so much a question but an observation.

I have been analyzing somewhat large chunks of data (on the order of
25000 rows) using SELECT xx PERCENT. I noticed that the returned number
of rows generally differs from the expected number by tens or even a
couple thousand.

The explanation seems to be that SELECT xx PERCENT returns all the rows
that favor xx on the column indicated, not a percentage of source rows
to be returned.

For example, if I query
SELECT TOP 95% MYTABLE.* FROM MYTABLE ORDER BY SOMEVALUE ASC;
Access seems to calculate 95% of the max of SOMEVALUE, then returns the
result of
SELECT TOP 95% MYTABLE.* FROM MYTABLE
WHERE SOMEVALUE <= [max(somevalue*.95)]
ORDER BY SOMEVALUE ASC;

Since there might be many rows that match <= [max(somevalue*.95)] the
result set might differ considerably from the expected. I suspect
rounding factors into this as well, as I have seen in other posts.
Just an example:

Employees
-------------
employee_nbr
Salary
Employee_name

Query: Top_X_Percent_of_Salaries
--------------------------------
PARAMETERS [Enter a percent as a decimal:] Value;
SELECT E.Employee_nbr, E.Salary, E.Employee_name
FROM Employees AS E
WHERE ((((SELECT COUNT(*) * [Enter a percent as a decimal:]
FROM Employees AS E2))>=(Select Count(*) FROM Employees AS E1
WHERE E1.Salary >E.salary OR (E1.salary = E.salary AND
E1.employee_nbr <= E.employee_nbr))));

Results for .25
 
D

Dirk Goldgar

Smartin said:
Not so much a question but an observation.

I have been analyzing somewhat large chunks of data (on the order of
25000 rows) using SELECT xx PERCENT. I noticed that the returned
number of rows generally differs from the expected number by tens or
even a couple thousand.

The explanation seems to be that SELECT xx PERCENT returns all the
rows that favor xx on the column indicated, not a percentage of
source rows to be returned.

For example, if I query
SELECT TOP 95% MYTABLE.* FROM MYTABLE ORDER BY SOMEVALUE ASC;
Access seems to calculate 95% of the max of SOMEVALUE, then returns
the result of
SELECT TOP 95% MYTABLE.* FROM MYTABLE
WHERE SOMEVALUE <= [max(somevalue*.95)]
ORDER BY SOMEVALUE ASC;

Since there might be many rows that match <= [max(somevalue*.95)] the
result set might differ considerably from the expected. I suspect
rounding factors into this as well, as I have seen in other posts.

I think you're mistaken. But remember that, as the Jet SQL help file
says, "The TOP predicate does not choose between equal values." So if
you ask for the top 95% of the ordered records, and there are 100
records in the ordered recordset, but records 95 through 100 have the
same value in the sort key fields, then you're going to get all 100
records.
 
S

Smartin

Dirk said:
Smartin said:
Not so much a question but an observation.

I have been analyzing somewhat large chunks of data (on the order of
25000 rows) using SELECT xx PERCENT. I noticed that the returned
number of rows generally differs from the expected number by tens or
even a couple thousand.

The explanation seems to be that SELECT xx PERCENT returns all the
rows that favor xx on the column indicated, not a percentage of
source rows to be returned.

For example, if I query
SELECT TOP 95% MYTABLE.* FROM MYTABLE ORDER BY SOMEVALUE ASC;
Access seems to calculate 95% of the max of SOMEVALUE, then returns
the result of
SELECT TOP 95% MYTABLE.* FROM MYTABLE
WHERE SOMEVALUE <= [max(somevalue*.95)]
ORDER BY SOMEVALUE ASC;

Since there might be many rows that match <= [max(somevalue*.95)] the
result set might differ considerably from the expected. I suspect
rounding factors into this as well, as I have seen in other posts.

I think you're mistaken. But remember that, as the Jet SQL help file
says, "The TOP predicate does not choose between equal values." So if
you ask for the top 95% of the ordered records, and there are 100
records in the ordered recordset, but records 95 through 100 have the
same value in the sort key fields, then you're going to get all 100
records.

Fair enough. Were I able to /find/ the help entry on the top predicate...

Thanks for the explanation!
 
D

Dirk Goldgar

Smartin said:
Fair enough. Were I able to /find/ the help entry on the top
predicate...

Finding things in the help file can be a problem, all right. For SQL
questions, it's usually easiest to open the help contents, locate the
entry for the "Microsoft Jet SQL Reference" not far from the bottom of
the list, and start from there.
 
S

Smartin

Dirk said:
Finding things in the help file can be a problem, all right. For SQL
questions, it's usually easiest to open the help contents, locate the
entry for the "Microsoft Jet SQL Reference" not far from the bottom of
the list, and start from there.

Thanks for the tip.

I swear the ease of using help, and its efficacy, are both inversely
proportional to the version. Ah well, roll with the punches.

Regards,
 
D

Dirk Goldgar

Smartin said:
I swear the ease of using help, and its efficacy, are both inversely
proportional to the version. Ah well, roll with the punches.

It's all been downhill since Access 97. They *say* that although Access
2007 will ship with minimal help, it will download "better" help as it
becomes available. We'll see.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top