# Using STDEV to evaluate a subset of a population

J

#### jdunnisher

I have a list of 500 items that I have ranked across 10 separate categories
using the STDEV function for each individual category. I then SUM the
standard deviations of the 10 categories to come up with a "total value" for
each item. I'm trying to rank the items 1 to 500 based on their total values.

Instead of evaluating each of the items against the average of the entire
population, I'd like to have Excel determine the top 100 items based on
"total value" and rank all 500 items against the averages of that subset of
the top 100.

In other words, I'd like for Excel to cherry pick the top 100 items and then
evaluate the entire population of 500 against that subset - instead of
against the entire population. Is this possible? Thanks.

B

#### Bob Bridges

What defines topness? Is it just the greatest "total value"? If so, that
doesn't sound like cherry picking; all you need to do is sort on the
total-value column, once you've calculated each value. That's too obvious,
so how do you calculate "total value" and why do you speak of Excel "cherry
picking" based on it? I'm probably missing something fundamental in your
description.

J

#### jdunnisher

Bob,

I apologize for any confusion my inital post may have caused. What you
described is exactly what I have been doing. I sort by the "total value"
column to get the top 100, then take the means and standard deviations from
those 100 items for each of the 10 categories and manually plug them into
cells that are setup to have Excel re-evaluate the entire set of 500 items
against those values. The reason I refer to it as "cherry picking" is that
it no longer represents a true standard deviation.

What I'd like to do (if possible), is have Excel determine which items have
the top 100 "total value" across the 10 categories and automatically
re-calculate the means and standard deviations, so that I don't have to do it
manually each time. Again, I'm not sure it's even possible; I just figured
I'd throw the question out to the forum.

B

#### Bob Bridges

I'm not getting it yet. Let's describe the process in detail: You have 500
rows; you say you have ranked the rows "across 10 separate categories". Does
that mean each of the 500 rows falls into a different category (French,
Canadian, Spanish, Chinese, Russian etc) and you treat each category
separately? Or do you mean you have 10 categories of value for each row
(cost, attractiveness, popularity, intelligence etc) and your "total value"
is a composite of these ten values? I'm guessing the latter.

Then you sort the 500 rows on total value and set aside all but the top 100.
Of the new population of 100 rows you calculate the average and standard
deviation in each of the ten categories, and with those 20 numbers you...do
what, exactly?

J

#### jdunnisher

Based on your posts, you seem to be following along just fine.

Let's say I have 500 salespeople that I rank across 10 performance
categories to determine the Top 100 overall. Above each column, I have two
cells (one for the mean and one for the std deviation) - for a total of 20
cells. After determining the Top 100, I overwrite the values in those 20
cells with the means and standard deviations for just the Top 100. I then
plug the entire data set of 500 salespeople back into the table below.

On a separate worksheet, I have another table setup to determine the number
of standard deviations above or below the mean each salesperson is for each
category. But my goal is to have Excel automatically evaluate the entire
group of 500 employees against the Top 100 without having to manually
overwrite the values in those 20 cells each time. I hope this helps to
clarify.

B

#### Bob Bridges

Yeah, I think it does. Last review, then: Sheet1 (let's say) has employee
name in column A, plus maybe a few more demographic columns - department,
employee number, phone, whatever, I don't care about that - and then in say
columns D through M there are ten performance categories; in each of the 500
employee rows, that employee receives a numerical rating in each of the 10
categories. Between the 500 employee rows and the column header(s) at the
top you've saved room for AVERAGE(D4503) in D2:M2, and in each column of
D3:M3 you have the standard deviation. And out in column N, perhaps, you
calculate for each employee a "total value" based on that employee's
performance ratings, weighted to your satisfation.

Now comes a manual process: You sort the 500 employees on column N, the
best numbers at the top. Then you have to have the AVERAGE and STDEV
calculations in rows 2 and 3 reflect not all 500 rows but just the first 100;
you figure that out and probably PasteSpecial the values (not the formulae)
into rows 2 and 3, so that during the following steps those values will be
preserved. After that you're free to sort all 500 rows back into whatever
order you want them in; rows 2 and 3 now represent mean and stdev for just
the top 100 employees.

Back to automatic processes: On (let's say) Sheet2, D4:M503 display each
employee's rating in each category in plus-or-minus sigs intead of a raw
number, based on the new values in rows 2 and 3 of Sheet1. Sheet2 is what
you actually look at for analysis.

This process works, but you're hoping to minimize the manual work, or even
eliminate if possible. Right?

Well, I haven't yet asked how you're plugging in the second set of mean and
stdev formulae - for the top 100 employees, I mean. I hope you have them
permanently prepared, so all you have to do is a PasteSpecial one time. But
at any rate, I've tried the following and think it would work: Instead of
pasting the new top-100 means and stdev in rows 2 and 3 of Sheet1, put them
in rows 2 and 3 of Sheet2, though still calculating on the first 100 rows in
Sheet1. (Eg "=AVERAGE(Sheet1!D\$4\$103)".) I've tested this and the
formulae in Sheet2 seem to keep looking at rows 4 through 103 even after I
sort the rows in Sheet1, with the result that the Sheet2 means and stdevs
change as you'd expect. Then your sig calculations in Sheet2!D4:M503 will, I
believe, show the results you want, without ever having to do manual pasting
and recalculation; just a sort will do the trick.

Is that clear?

J

#### jdunnisher

I appreciate the suggestion, Bob. Unfortunately, I won't have the chance to
test it out for a day or two since I'm traveling early tomorrow, but the way
you've described it looks like it should accomplish what I'm looking to do.
Thanks for bearing with me.