Calculate Average without outliers

G

Guillermo_Lopez

Hello all,

I am trying to calculate the average sales for each SKU while removing
the x% of the sales as outliers.

So from 30k rows in the sales table with about 2,000 SKUs, this is the
query that I currently have. It works correctly; however, it takes too
long to run. What I want is a query or several queries that run
faster.

SELECT Q1.PRODUCT, Q1.Invoice, Q1.UNITS INTO ProductInvoiceNoOutliers
FROM Q204_1_Sales_Detail AS Q1
WHERE (((Q1.Invoice) Not In (SELECT top 5 PERCENT Q2.Invoice FROM
Q204_1_Sales_Detail AS Q2 WHERE Q2.PRODUCT=Q1.PRODUCT ORDER BY
Q2.UNITS DESC;)));

Then: SELECT Product, Avg(Units) as AVGUNITS FROM Query GROUP BY
Product

Like I was saying, this query works but not as fast as I want it too.
For 30k rows, it takes more than 10 hours to run. I need to drop it
down to minutes or less if possible. Is there any other way?

VBA code is acceptable.

Thanks in Advanced

- GL
 
J

John W. Vinson

Like I was saying, this query works but not as fast as I want it too.
For 30k rows, it takes more than 10 hours to run. I need to drop it
down to minutes or less if possible. Is there any other way?

Access can be pretty inefficient at handling NOT IN() clauses, especially if
they have a large number of ID's. Try using a "frustrated outer join" query.
It can be done as a Subquery but I'm not sure I could get the SQL right in one
go. Instead try using two queries: first save your current subquery as InTop5:

SELECT top 5 PERCENT Q2.Invoice FROM
Q204_1_Sales_Detail AS Q2 WHERE Q2.PRODUCT=Q1.PRODUCT ORDER BY
Q2.UNITS DESC;

Then use an unmatched query:

SELECT Q1.PRODUCT, Q1.Invoice
FROM Q204_1_Sales_Detail AS Q1 LEFT JOIN InTop5
ON Q1.Invoice = InTop5.Invoice
WHERE InTop5.Invoice IS NULL;

to select only those records which are not in the top 5 query.
 
G

Guillermo_Lopez

Access can be pretty inefficient at handling NOT IN() clauses, especially if
they have a large number of ID's. Try using a "frustrated outer join" query.
It can be done as a Subquery but I'm not sure I could get the SQL right inone
go. Instead try using two queries: first save your current subquery as InTop5:

SELECT top 5 PERCENT Q2.Invoice FROM
Q204_1_Sales_Detail AS Q2 WHERE Q2.PRODUCT=Q1.PRODUCT ORDER BY
Q2.UNITS DESC;

Then use an unmatched query:

SELECT Q1.PRODUCT, Q1.Invoice
FROM Q204_1_Sales_Detail AS Q1 LEFT JOIN InTop5
ON Q1.Invoice = InTop5.Invoice
WHERE InTop5.Invoice IS NULL;

to select only those records which are not in the top 5 query.

Thanks John,

I thought of having the unmatched query, however I need to remove 5%
from every Sku hence the "Q2.PRODUCT=Q1.PRODUCT".

However, trying out some stuff. When I implemented my original query
into my application, the query was much faster than it was when i ran
it originally.

My guess is that on the application I'm working on, the query pulls
the data directly from a table, and the original pulls it from a
query. This being the case, the query runs fast enough to be used in
the application. (It takes around 20 seconds now).

- GL
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top