Top Percentage Problem

G

Green Biro

I have data with the fields: Student, Subject, Score.

I want to get lists of students something like the following:
Top 3% in any one subject.
Top 10% in at least two subjects.
Top 30% in at least three subjects.

Can anyone give me a pointer on how to approach this problem. I can use
Access or Excel including VBA but my preference is for a solution that's as
automated as possible so that I can get others to run it for me as and when
we get new data sets.

Thanks in advance for any help.


GB

PS I hope that there is no objection to me also posting this in an Access
group as, like I said, I can use either for the solution.
 
A

Arvin Meyer [MVP]

I'd run a series of queries in Access using the TOP predicate to get the
data you want, then do a union query to put the subqueries together, then
another query for 2 subjects from the union and yet another for the 3
subjects from the union. it's a bit kludgy, but it will take only a very
short time to do.
 
G

Green Biro

Hmmm....

This certainly is a it kludgy.

And whenever a new subject was introduced, I would have to create a new
query and add it to the union.
I'll keep thinking and maybe I or someone else will come up with something
slicker.

Thanks for the reply in any case.

GB
 
D

Douglas J. Steele

No, you shouldn't need to create new queries for each subject.

You should be able to create a query that returns, say, the top 3 percent
for each subject that's in the database: something like:

Subject1 99%
Subject1 98%
Subject1 94%
Subject2 97%
Subject2 96%
Subject2 95%
Subject3 97%
Subject3 91%
Subject3 81%

As new subjects are added, they just automatically appear in there. (See
what Allen Browne has at http://www.allenbrowne.com/subquery-01.html for how
to create a query like this)

You can then jon your table to the results of that query for each student.
That should give you n rows for each student (where n will be the number of
subjects where their mark is one of the ones selected for the subject). You
can then do a Count on how many rows there are for each student, and only
report those students for whom the count is greater or equal to the desired
number.

--
Doug Steele, Microsoft Access MVP

(no e-mails, please!)
 
D

Darwin Phish

I solved a similar problem using the following method. This solution
means you will be storing derived values in your table, a big no-no
for some. So, if you are changing test scores frequently or this is
an assignment for a database class, you probably will not want to use
it.

Add a numeric field to your table called Percentile. This field will
tell you what percentage of total test takers a given student scored
better than (or the same as) on the test. That is, the student with
the top score did better than everyone else; the student with the
lowest score did better than nobody).

You can populate this field using the following query (I assume your
table is named test_scores and that Subject is a text field):

UPDATE test_scores
SET Percentile = ( DCOUNT("*","test_scores","[Subject] = '" &
[Subject] & "' and [Score] = " & [Score]) + (DCOUNT
("*","test_scores","[Subject] = '" & [Subject] & "' and [Score] = " &
[Score]) - 1)/2 )/ DCOUNT("*","test_scores","[Subject] = '" &
[Subject] & "'");

I know this is ugly, but you only need to run this once on your
current data. You can add a where clause later when you only need to
calculate the field for new rest results. You may need to fine tune
the formula based on how you want to handle students with the same
score.

To get the list of students who where in the top X% in Y subjects, use
something like the following:

SELECT Student, sum(1) as Subjects
WHERE Percentile > (1-X%)
GROUP BY Student
HAVING ((sum(1)) >= Y);

You will have to use sub-queries to determine the actual subjects.

I hope this helps.

Noah
 
G

Green Biro

You should be able to create a query that returns, say, the top 3 percent
for each subject that's in the database: something like:

Subject1 99%
Subject1 98%
Subject1 94%
Subject2 97%
Subject2 96%
Subject2 95%
Subject3 97%
Subject3 91%
Subject3 81%

I'm sorry but this is the bit I don't understabd how to get. How do I
produce this (and also get the student id alongside these top results) when
I don't know in advance what my subjects could be. I'd like to persist with
this method if I could get my head around it. For the moment I've run a
crosstab using first result (it will in fact be the only result) for each
student / subject combination then I was going to do some Excel trickery.

Please let me know what SQL I should run to get your output. I'm reading
and rereading A.Brown's article but I can't see a parallell example to what
I need.


Thanks in anticipation.

GB
 
G

Green Biro

Got it!

First of all I appended my data to an empty temporary table that had an
autonumber primary key then I was able to use the subquery method A.Browne
uses to get customers with the longest time since ordering. I was then able
to feed this query into count and join queries to get the output to look
exactly the way I wanted it to.
I think the crux of the matter was that in order to use subqueries I needed
a unique id field.

Thanks to all who contributed to discussion.

GB
 
R

Rudolf Lamour

I solved a similar problem using the following method. This solution
means you will be storing derived values in your table, a big no-no
for some. So, if you are changing test scores frequently or this is
an assignment for a database class, you probably will not want to use
it.

Add a numeric field to your table called Percentile. This field will
tell you what percentage of total test takers a given student scored
better than (or the same as) on the test. That is, the student with
the top score did better than everyone else; the student with the
lowest score did better than nobody).

You can populate this field using the following query (I assume your
table is named test_scores and that Subject is a text field):

UPDATE test_scores
SET Percentile = ( DCOUNT("*","test_scores","[Subject] = '" &
[Subject] & "' and [Score] = " & [Score]) + (DCOUNT
("*","test_scores","[Subject] = '" & [Subject] & "' and [Score] = " &
[Score]) - 1)/2 )/ DCOUNT("*","test_scores","[Subject] = '" &
[Subject] & "'");

I know this is ugly, but you only need to run this once on your
current data. You can add a where clause later when you only need to
calculate the field for new rest results. You may need to fine tune
the formula based on how you want to handle students with the same
score.

To get the list of students who where in the top X% in Y subjects, use
something like the following:

SELECT Student, sum(1) as Subjects
WHERE Percentile > (1-X%)
GROUP BY Student
HAVING ((sum(1)) >= Y);

You will have to use sub-queries to determine the actual subjects.

I hope this helps.

Noah
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top