JET Implementation of "Select Top xx Percent"

Smartin · Sep 3, 2006

Not so much a question but an observation.

I have been analyzing fairly large chunks of data (on the order of
25000 rows) using SELECT TOP xx PERCENT. I noticed that the number of
rows returned generally differs from the expected number by tens or
even a couple of thousand.

The explanation seems to be that SELECT TOP xx PERCENT returns all the
rows whose value in the sort column falls within xx percent of that
column's maximum, rather than xx percent of the source rows.

For example, if I query
SELECT TOP 95 PERCENT MYTABLE.* FROM MYTABLE ORDER BY SOMEVALUE ASC;
Access seems to calculate 95% of the max of SOMEVALUE, then return the
result of
SELECT TOP 95 PERCENT MYTABLE.* FROM MYTABLE
WHERE SOMEVALUE <= (SELECT MAX(SOMEVALUE) * 0.95 FROM MYTABLE)
ORDER BY SOMEVALUE ASC;

Since many rows might satisfy that WHERE condition, the result set
might differ considerably from what is expected. I suspect rounding
factors into this as well, as I have seen in other posts.

Michael Gramelspacher · Sep 4, 2006

Just an example:

Employees
-------------
employee_nbr
Salary
Employee_name

Query: Top_X_Percent_of_Salaries
--------------------------------
PARAMETERS [Enter a percent as a decimal:] Value;
SELECT E.Employee_nbr, E.Salary, E.Employee_name
FROM Employees AS E
WHERE (SELECT COUNT(*) * [Enter a percent as a decimal:]
       FROM Employees AS E2)
   >= (SELECT COUNT(*)
       FROM Employees AS E1
       WHERE E1.Salary > E.Salary
          OR (E1.Salary = E.Salary
              AND E1.Employee_nbr <= E.Employee_nbr));

Results for .25
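
The counting logic of the query above can be sketched outside Access. This is a minimal Python illustration, not Jet itself: each row gets a rank equal to the number of rows with a higher salary, plus rows with the same salary and an employee number no greater than its own, and a row is kept when that rank does not exceed COUNT(*) times the fraction. The function name top_fraction_by_salary and the sample rows are hypothetical and are not the poster's data.

```python
def top_fraction_by_salary(employees, fraction):
    """Keep rows whose rank (salary descending, ties broken by
    employee number ascending) is at most len(employees) * fraction.

    employees: list of (employee_nbr, salary, name) tuples.
    """
    limit = len(employees) * fraction  # COUNT(*) * parameter
    kept = []
    for nbr, salary, name in employees:
        # Correlated subquery: rows that sort at or before this one.
        rank = sum(
            1
            for nbr2, salary2, _ in employees
            if salary2 > salary or (salary2 == salary and nbr2 <= nbr)
        )
        if rank <= limit:
            kept.append((nbr, salary, name))
    return kept


# Hypothetical data: three employees tie for the highest salary.
sample = [
    (1, 50000, "A"), (2, 70000, "B"), (3, 70000, "C"), (4, 70000, "D"),
    (5, 40000, "E"), (6, 30000, "F"), (7, 20000, "G"), (8, 10000, "H"),
]
top = top_fraction_by_salary(sample, 0.25)
print([nbr for nbr, _, _ in top])  # prints [2, 3]
```

Because the employee number breaks ties, the three-way tie at the top does not inflate the result: 8 rows at .25 give exactly 2 rows. The cost is that the choice among tied rows is arbitrary with respect to salary.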

Dirk Goldgar · Sep 4, 2006

I think you're mistaken. But remember that, as the Jet SQL help file
says, "The TOP predicate does not choose between equal values." So if
you ask for the top 95% of the ordered records, and there are 100
records in the ordered recordset, but records 95 through 100 have the
same value in the sort key fields, then you're going to get all 100
records.
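
To make the tie rule concrete, here is a minimal Python sketch (a simulation, not Jet itself) of a TOP n PERCENT query sorted ascending that keeps every row tied with the row at the cutoff. The function name top_percent_with_ties and the sample values are hypothetical, and rounding the cutoff position up is an assumption; the help file quote above does not say how Jet rounds.

```python
import math


def top_percent_with_ties(values, percent):
    """Simulate TOP <percent> PERCENT ... ORDER BY value ASC,
    keeping all rows equal to the value at the cutoff position."""
    ordered = sorted(values)
    if not ordered or percent <= 0:
        return []
    # Assumed rounding rule: round the cutoff row count up.
    cutoff = min(len(ordered), math.ceil(len(ordered) * percent / 100))
    boundary = ordered[cutoff - 1]  # sort-key value of the last "top" row
    # TOP does not choose between equal values, so ties come along too.
    return [v for v in ordered if v <= boundary]


# 100 rows where rows 95 through 100 share the same sort-key value.
tied = list(range(1, 95)) + [500] * 6
print(len(top_percent_with_ties(tied, 95)))  # prints 100

# 100 rows with no duplicates: the row count matches the percentage.
distinct = list(range(1, 101))
print(len(top_percent_with_ties(distinct, 95)))  # prints 95
```

With heavily duplicated sort keys in a 25000-row table, the same effect could plausibly account for results that overshoot the expected count by a large margin.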

Smartin · Sep 5, 2006

Fair enough. Were I able to /find/ the help entry on the top predicate...

Thanks for the explanation!

Dirk Goldgar · Sep 5, 2006

Smartin said:
Fair enough. Were I able to /find/ the help entry on the top
predicate...

Finding things in the help file can be a problem, all right. For SQL
questions, it's usually easiest to open the help contents, locate the
entry for the "Microsoft Jet SQL Reference" not far from the bottom of
the list, and start from there.

Smartin · Sep 5, 2006

Dirk said:
Finding things in the help file can be a problem, all right. For SQL
questions, it's usually easiest to open the help contents, locate the
entry for the "Microsoft Jet SQL Reference" not far from the bottom of
the list, and start from there.

Thanks for the tip.

I swear the ease of using help, and its efficacy, are both inversely
proportional to the version. Ah well, roll with the punches.

Regards,

Dirk Goldgar · Sep 6, 2006

Smartin said:
I swear the ease of using help, and its efficacy, are both inversely
proportional to the version. Ah well, roll with the punches.

It's all been downhill since Access 97. They *say* that although Access
2007 will ship with minimal help, it will download "better" help as it
becomes available. We'll see.
