Weighted Stdev

L

Lee Harris

If I have a list of items, where each item has a name, a number (frequency)
and an average value, I can work out the STDEV easily on Column C on the
average values

however, how can I work out a weighted stdev such that the averages appear
in the ratio given by their frequency

in other words

if I have


Apple, 8, 1
Banana, 3, 2
Pear, 1, 5


STDEV(C:C) = STDEV (1,2,5) = 2.08

when what I actually want is for the value for apple to be counted 8 times,
for banana 3 times and pear 1 times

so I want

STDEV(1,1,1,1,1,1,1,1,2,2,5) = 1.16

Is there any way I can do this?


I'm doing the same with average, but whereas it's easy to do a weighted
average by just summing the frequencies and the values, I don't know how to
do this with stdev
 
L

Lee Harris

Lee Harris said:
If I have a list of items, where each item has a name, a number (frequency)
and an average value, I can work out the STDEV easily on Column C on the
average values

however, how can I work out a weighted stdev such that the averages appear
in the ratio given by their frequency

in other words

if I have


Apple, 8, 1
Banana, 3, 2
Pear, 1, 5


STDEV(C:C) = STDEV (1,2,5) = 2.08

when what I actually want is for the value for apple to be counted 8 times,
for banana 3 times and pear 1 times

so I want

STDEV(1,1,1,1,1,1,1,1,2,2,5) = 1.16

Is there any way I can do this?


I'm doing the same with average, but whereas it's easy to do a weighted
average by just summing the frequencies and the values, I don't know how to
do this with stdev

OK, I downloaded an add-in from here


and that seems to include a grouped STDEV which is what I need

on the test above, it worked, giving 1.16, but I have a large data set that
doesnt give the right answer by the look of it


It has about 150 rows, with the top 50 rows having much higher frequencies.
These rows have values in the range 3 to 4 almost exclusively. The normal
STDEV is 5.2, but when I use grouped STDEV it goes *UP* to 6.75 even though
by my reckoning, the weighting of those top rows should make the stdev go
down, not up

am I doing something wrong?
 
H

Harlan Grove

Lee Harris said:
If I have a list of items, where each item has a name, a number (frequency)
and an average value, I can work out the STDEV easily on Column C on the
average values

however, how can I work out a weighted stdev such that the averages appear
in the ratio given by their frequency

in other words

if I have


Apple, 8, 1
Banana, 3, 2
Pear, 1, 5


STDEV(C:C) = STDEV (1,2,5) = 2.08

when what I actually want is for the value for apple to be counted 8 times,
for banana 3 times and pear 1 times

so I want

STDEV(1,1,1,1,1,1,1,1,2,2,5) = 1.16

Is there any way I can do this?


I'm doing the same with average, but whereas it's easy to do a weighted
average by just summing the frequencies and the values, I don't know how to
do this with stdev

You could use the same approach for weighted standard deviations as you do
for weighted means.

wt'd mean:
=SUMPRODUCT(C:C,B:B)/SUM(B:B)

wt'd variance:
=SUMPRODUCT((C:C-SUMPRODUCT(C:C,B:B)/SUM(B:B))^2,B:B)

wt'd standard deviation:
=SQRT(SUMPRODUCT((C:C-SUMPRODUCT(C:C,B:B)/SUM(B:B))^2,B:B))
 
H

Harlan Grove

Harlan Grove said:
You could use the same approach for weighted standard deviations as you do
for weighted means.

SUMPRODUCT can't handle entire column ranges, so none of the formulas I
posted earlier work as written. If your data is in rows 2 through 1001, use
these intead.

wt'd mean:
=SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001)

wt'd variance:
=SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001)

wt'd standard deviation:
=SQRT(SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001))
 
L

Lee Harris

Harlan Grove said:
SUMPRODUCT can't handle entire column ranges, so none of the formulas I
posted earlier work as written. If your data is in rows 2 through 1001, use
these intead.

wt'd mean:
=SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001)

wt'd variance:
=SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001)

wt'd standard deviation:
=SQRT(SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001))

Sorry but that doesn't seem to give the correct answer

Column C contains values from 3 to 5, and column B contains frequencies from
10 up to about 300

I don't see how the STDEV can be 113 as your formula reports. It has to be
something similar to the 3-5 range of the data column.

When i tried your formula on

8,1
3,2
5,1

I got the answer 2.6 and not the correct value 1.16

any ideas?
 
L

Lee Harris

Harlan Grove said:
SUMPRODUCT can't handle entire column ranges, so none of the formulas I
posted earlier work as written. If your data is in rows 2 through 1001, use
these intead.

wt'd mean:
=SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001)

wt'd variance:
=SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001)

wt'd standard deviation:
=SQRT(SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)
/SUM(B2:B1001))^2,B2:B1001))

Here is some sample data
By plotting as a normal and taking the 67 and 95% populations I worked out
that the population SD should be between 0.6 and 1.0, which is what I would
expect
your formula gives the answer 113 (I tried swapping the Bs and Cs in your
formula but ended up with an answer in the thousands)
hope someone can figure out where it's gone wrong

col B is freq
col C is data

Num Avg
1 -10
2 -9
1 -7
1 -6
1 -5
3 -3.7
4 -2.3
1 -2
2 -1
7 -0.3
22 0.8
10 1
12 1.2
3 1.3
18 1.4
3 1.7
8 1.8
13 1.9
24 2
23 2.1
86 2.2
78 2.4
46 2.5
318 2.6
36 2.9
100 3
484 3.2
394 3.3
641 3.4
467 3.5
432 3.6
153 3.7
329 3.8
534 3.9
309 4
160 4.1
25 4.2
258 4.3
372 4.4
59 4.6
4 4.8
12 4.9
23 5
125 5.1
144 5.2
7 5.3
229 5.4
209 5.5
7 5.7
4 5.8
70 6.1
5 6.2
72 6.4
8 6.5
31 6.9
29 7
7 7.1
3 7.7
25 7.8
1 8
5 8.6
3 8.7
1 9
3 10
10 10.3
13 10.8
1 12
3 14.3
1 15
5 16.4
1 17
2 18
2 19
1 22
1 23
1 33
 
M

Myrna Larson

I think Harlan forgot to divide by the sum of the frequencies before taking
the square root. Try it like this:

=SQRT(SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001))^2,B2:B1001)/(SUM(B2:B1001,-1)))

That gives me 1.405348.

The way you can build up this sort of formula is to do it in pieces:

a) in one cell, say E2, put the formula for the weighted average, i.e.

=SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001,-1)

b) in E3, the formula for the sum of squared deviations from the mean, also
weighted:

=SUMPRODUCT((C2:C1001-$E$2)^2,B2:B1001)

c) in E4, the formula for the standard deviation,

=SQRT($E$3/(SUM(B2:B1001,-1))

To get to a formula in a single cell, do this:

Select E3. In the formula bar, select all of the formula EXCEPT for the
initial equal sign and press CTRL+C to copy it. Press ESC to get out of edit
mode.

Then select cell E4. In the formula bar. double-click on the reference to $E$3
to select it, then CTRL+V to paste. This replaces the reference to E3 with the
formula in E3. Now your formula refers to E2, so copy the formula in E2 and
paste it over the reference to E2.
 
L

Lee Harris

Myrna Larson said:
I think Harlan forgot to divide by the sum of the frequencies before taking
the square root. Try it like this:

=SQRT(SUMPRODUCT((C2:C1001-SUMPRODUCT(C2:C1001,B2:B1001)/SUM(B2:B1001))^2,B2
:B1001)/(SUM(B2:B1001,-1)))

That gives me 1.405348.


many, many thanks Myrna. That really has made my life considerably easier,
and thanks also to Harlan for his efforts, despite the minor glitch.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top