Weighted Standard Deviation

M

Mike Lechner

I've been using the STDEVP function to get the standard deviation of
"an entire population" of values. Now I would like to calculate the
standard deviation for a population and have it weighted by another
variable. In my case, the samples represent a list of chemical
analyses for a drill hole. The samples are not of uniform length so I
would like to have my statistics weighted by the length of the
samples. How do I go about this?

Another question that I have for the group is should I be using the
STDEVP or STDEV function? It is my understanding that the STDEV
function is used for a subset or "sampling" of a much larger
population.

TIA

Mike Lechner
 
T

Tom Ogilvy

What about using a separate column to weight the original values, then use
the STDDEV on that column.

STDEV is for a sample from a population (subset), and STDEVP is for a census
(entire population).

Regards,
Tom Ogilvy
 
M

Mike Lechner

Tom,

I've tried that (maybe incorrectly) and did not get the correct
answer. What I did was multiplied the length times the value for each
record and stored that in a separate column. I then took the STDEVP
of that column and divided the result by the sum of the weights
(length). This did not give me the correct answer.

Any comments?

Mike
 
D

David J. Braden

Mike,
You are welcome. To map from a population variance to the standard
definition of a "sample variance", you need to know the size of your
original sample. It isn't enough to know the weights alone, unless they
haven't been scaled by the number of observations.

IOW, if your weights are raw data like "I observed X1 2 times, X2 6
times,..." then add up 2 + 6 + ... to get the number of observations.
Call this "n". To get the so-called "sampe variance", multiply your
earlier calculation by (n/(n-1)). The sample stdev is just the square
root of this.

As to other questions: I'm no statistician either. I'm more into
gardening as a hobby. But folks use that adjusted variance/stdev to
satisfy certain desiderata for estimation purposes, desiderata, btw,
that are hardly set in stone, and rarely understood. Maybe you *do* want
the population statistics. If you have all the data, use what I gave you
initially. If you don't have the entire population's data, and *if* your
sample satisfies certain conditions, then you *might* want to use the
adjusted statistic to give you an extimate of the population's variance,
an estimate that, as I said, satisfies some desiderata that I won't go
into here. FWIW, there are other ways to scale the sample variance that
make it a "better" estimator of the population variance---"better" under
different desiderata.

When would it be better to use a sample? Comes down to cost and quality
control. More data *does not* make you better off, if it is used
blindly, and presumably it does come at a cost.

My suggestion? Now that you are in the ballpark with how to get Excel to
give answers that seem reasonable to you, go with them and make the
decisions that prompted the work in the first place.

HTH
David Braden

Thanks David for your answer. Using David's formulae enabled me to
duplicate the weighted standard deviation that I have been getting
from several different software packages. I have been comparing the
results from various programs with another software package that gives
slightly different values for variance and standard deviation. It
looks like most of the packages are giving the equivalent of Excel's
VARP and STDEVP. The other software system is giving the equivalent
of VAR and STDEV results.

I think that the VARP and STDEVP results are more correct given the
type of data that I work with (earth science stuff), but I'll be the
first to admit that I'm no statistician. Why wouldn't one want to
calculate the varaince and standard deviation of an entire population?
When would it be more appropriate to calculate those stats based on a
"sample" of the population? I'm assuming that the more data that I
have should provide a better answer than some subset of the entire
data set.

The calculations that David showed me replicated Excel's VARP and
STDEVP. What calculations would I use to replicate the VAR and STDEV
calculation (again weighting those stats)?

TIA

Mike Lechner (novice at statistics)

Myrna Larson said:
IOW the weighting is applied to the deviations [the (x - mean)^2 values],
not to the original
[x] values, right?

Sounds like a sample standard deviation is most appropriate for
presentations to some audiences, though to me it really doesn't matter.

If you have a set of values X and weights W, then first normalize the
weights: In an adjacent column (say), assuming the weights are in
B1:B100, enter into C1 the formula =B1/SUM($B$1:$B$100), and drag-copy
that down to C100.

Assuming the "X" values are in A1:A100, then enter into D1 the formula
=SUMPRODUCT(A1:A100,C1:C100)
to get the average.
And into D2 ctrl-shift-enter
=SUMPRODUCT((A1:A100-$D$1)^2,C1:C100)
to get the variance. Its square root is the (population) standard
deviation.

You can adjust these for the sample size to get various sorts of
estimators for the "true" standard deviation. See Excel's help file on
STDEV, and any into stats book.

HTH
Dave Braden

Tom,

I've tried that (maybe incorrectly) and did not get the correct
answer. What I did was multiplied the length times the value for each
record and stored that in a separate column. I then took the STDEVP
of that column and divided the result by the sum of the weights
(length). This did not give me the correct answer.

Any comments?

Mike

What about using a separate column to weight the original values, then
use
the STDDEV on that column.

STDEV is for a sample from a population (subset), and STDEVP is for a
census
(entire population).

Regards,
Tom Ogilvy

I've been using the STDEVP function to get the standard deviation of
"an entire population" of values. Now I would like to calculate the
standard deviation for a population and have it weighted by another
variable. In my case, the samples represent a list of chemical
analyses for a drill hole. The samples are not of uniform length so
I
would like to have my statistics weighted by the length of the
samples. How do I go about this?

Another question that I have for the group is should I be using the
STDEVP or STDEV function? It is my understanding that the STDEV
function is used for a subset or "sampling" of a much larger
population.

TIA

Mike Lechner
 
M

Myrna Larson

"They" say you should use the sample method if you have fewer than 30 values -- the result is
closer to the true population values.
 
D

David J. Braden

Myrna,
"They" are the unwashed masses. <g>. Indeed, dividng by (n+1) instead of
(n-1) gives a "better" estimator under some desiderata; this was due, as
far as I remember, to LaGrange. What I don't recall, nor can look up
right now (my books are still in storage <groan>), is the set of
assumptions that led to the popular acceptance of scaling the variance
by n/(n-1). In particualr, I wonder if a bootstrapped calculation gives,
asymptotically, the popular formula. We might be able to derive that one
analytically. I just wish Excel would correctly implement VAR and VARP,
and its related functions.
Dave
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top