Average w/o random high numbers

Guest · Aug 14, 2004

I'm keeping track of inventory of about 100 items. Every month is a different
special, which means a much higher movement. I need to order 1.5 months in
advance, go no lower than 1 months supply and no more than 3-4 months on
hand. In order to do this I need an accurate NORMAL average...as in exclude
the abnormally high volume months (I'll just have to figure that order out
manually...)
Currently I have this =AVERAGE(L4,R4,X4,AD4,AJ4,AP4,AV4)
Then based on that average I have a formula to help me know when and what to
order.

Trevor Shuttleworth · Aug 14, 2004

this should point you in the right direction, though not using the specific
cells you mention.

If, for example, your values were all in column A and were, maybe, 1, 2, 34,
4, 6, 10, then:

=COUNTA(A:A) would give you 6, the number of non-blank cells in column A
=SUM(A:A) would give 57, the total of the cells in column A
=MAX(A:A) would give 34, the maximum value in column A, which we can assume
is the promotion value (?)
=AVERAGE(A:A) would give 9.5, the average of the cells in column A
=(SUM(A:A)-MAX(A:A))/(COUNTA(A:A)-1) would give 4.6, the average of the
values excluding the promotion value

I guess this last is what you are looking for ?

So *your* formula would look something like this:

=(SUM(L4,R4,X4,AD4,AJ4,AP4,AV4)-MAX(L4,R4,X4,AD4,AJ4,AP4,AV4))/6

Regards

Trevor

Guest · Aug 15, 2004

Ok, that sounds good, but what about eliminating multiple max occurances,
like 1,2,30,2,3,10,3,34 - eliminating the 30 & 34?

Trevor Shuttleworth · Aug 15, 2004

In my example:

=(SUM(A:A)-SUM(LARGE(A:A,{1,2})))/(COUNTA(A:A)-2)

would work for two "large" numbers.

Think your formula needs to look like:

=(SUM(L4,R4,X4,AD4,AJ4,AP4,AV4)-SUM(LARGE((L4,R4,X4,AD4,AJ4,AP4,AV4),{1,2}))
)/5

Regards

Trevor

Myrna Larson · Aug 15, 2004

Look at the TRIMMEAN function. It allows you to exclude a certain percentage
of values from the top and bottom.

From Help for this function: "Returns the mean of the interior of a data set.
TRIMMEAN calculates the mean taken by excluding a percentage of data points
from the top and bottom tails of a data set. You can use this function when
you wish to exclude outlying data from your analysis."

Harlan Grove · Aug 15, 2004

Trevor Shuttleworth said:
=(SUM(A:A)-SUM(LARGE(A:A,{1,2})))/(COUNTA(A:A)-2)

....

Safer to use COUNT, not COUNTA.

Harlan Grove · Aug 15, 2004

Myrna Larson said:
Look at the TRIMMEAN function. It allows you to exclude a certain
percentage of values from the top and bottom.

....

And how would it help when the OP wants to exclude only high values?

Ron Rosenfeld · Aug 15, 2004

I'm keeping track of inventory of about 100 items. Every month is a different
special, which means a much higher movement. I need to order 1.5 months in
advance, go no lower than 1 months supply and no more than 3-4 months on
hand. In order to do this I need an accurate NORMAL average...as in exclude
the abnormally high volume months (I'll just have to figure that order out
manually...)
Currently I have this =AVERAGE(L4,R4,X4,AD4,AJ4,AP4,AV4)
Then based on that average I have a formula to help me know when and what to
order.

It sounds as if what you want to do is throw out the "outliers", and you know
that the outliers will be at the high end of the spectrum of inventory change.

Depending on the distribution of your numbers, it might be possible to do this
automatically. For example, if your sales numbers are normally distributed,
you might be able to use a formula which would exclude those values that are
more than one (or two) standard deviations higher than the mean.

For example:

=SUMIF(rng,"<"&AVERAGE(rng)+STDEVP(rng))/COUNTIF(rng,"<"&AVERAGE(rng)+STDEVP(rng))

Whether this would work any better than manual inspection, or throwing out an
arbitrary number of high values, is something you can only tell by examining
the actual data.

To allow for sales growth, you also might want to arbitrarily restring rng to a
certain time period.

--ron

Harlan Grove · Aug 15, 2004

Ron Rosenfeld said:
It sounds as if what you want to do is throw out the "outliers", and
you know that the outliers will be at the high end of the spectrum of
inventory change.

....

Actually, the OP has already explained that the outliers correspond to
inventory of items 'on special' in certain months. Maybe better to add an
indicator column showing whether the item were 'on special' in a given month
or not, then use AVERAGE(IF(...)) to average only those items not 'on
special'.

duane · Aug 15, 2004

why not use stddev to identify high outliers

Harlan Grove · Aug 16, 2004

duane > said:
why not use stddev to identify high outliers?

Standard deviations are wonderful things. They're always *less* than the
difference between the highest value and the mean. If you have no true
outliers, you're always going to exclude some observations if you use a
1-times-SD test. If you have extremely low variance, e.g.,

1000 1005 1000 1005 1000 1005

you'd wind up excluding all values above the mean.

If you want to identify outliers, then compare the largest value to the mean
plus n standard deviations of the remaining values. If there are multiple
largest values, exclude only one and include the rest in the moments.

Ron Rosenfeld · Aug 16, 2004

Actually, the OP has already explained that the outliers correspond to
inventory of items 'on special' in certain months. Maybe better to add an
indicator column showing whether the item were 'on special' in a given month
or not, then use AVERAGE(IF(...)) to average only those items not 'on
special'.

Sure. That's another approach. The downside is it requires a bit more work
(having to ensure the indicator column is correctly filled out) than a
formulaic approach.

The problem with the formulaic approach is that one has to be sure that the
formula works in a desired way. When dealing with outliers, statistical tests
can tell you where to look; but they can't tell you in and of themselves
whether that data is wrong. Other factors are required for that judgement.

--ron

Harlan Grove · Aug 16, 2004

Ron Rosenfeld said:
. . . Other factors are required for that judgement.

Such as using indicator variables when they're the obvious way to
distinguish between nonstochastic states such as 'on special' and not?

Ron Rosenfeld · Aug 16, 2004

Such as using indicator variables when they're the obvious way to
distinguish between nonstochastic states such as 'on special' and not?

Yes, or experience and observation showing a definite correlation between some
statistical test (such as SD or 2*SD from the median) and the existence of an
outlier.
--ron

Harlan Grove · Aug 17, 2004

We're defining 'outlier' differently. If 'outlier' values correlate nearly
perfectly with a nonstochastic indicator variable (e.g., 'on special' or
not), to me they ain't outliers - they're simply not relevant observations.
If there were no such state indicator and there were a *few* unusually high
or low values, then those could bias sample moments as estimators of the
underlying population moments. To me, only the latter are outliers.

Ron Rosenfeld · Aug 17, 2004

We're defining 'outlier' differently. If 'outlier' values correlate nearly
perfectly with a nonstochastic indicator variable (e.g., 'on special' or
not), to me they ain't outliers - they're simply not relevant observations.
If there were no such state indicator and there were a *few* unusually high
or low values, then those could bias sample moments as estimators of the
underlying population moments. To me, only the latter are outliers.

Fair enough. I was using a somewhat different definition that was inclusive of
"stragglers" for the sake of this conversation.

"Extreme values are defined as observations in a sample, so far separated in
value from the remainder as to suggest that they may be from a different
population, or the result of an error in measurement. Extreme values can also
be subdivided into stragglers, extreme values detected between the 95% and 99%
confidence levels; and outliers, extreme values at >99% confidence level". (ISO
3534. Statistics — Vocabulary and Symbols. Part 1: Probability and general
statistical terms, section 2.64. Geneva 1993.)

And I agree with you (I think) that while the suspect values can be shown to be
"unusual", they cannot be shown to be "wrong" without other input.

While that input could certainly be a manually entered marker in another
column, it may not be required depending on the nature of the data.

--ron

Average w/o random high numbers

Guest

Trevor Shuttleworth

Guest

Trevor Shuttleworth

Myrna Larson

Harlan Grove

Harlan Grove

Ron Rosenfeld

Harlan Grove

duane

Harlan Grove

Ron Rosenfeld

Harlan Grove

Ron Rosenfeld

Harlan Grove

Ron Rosenfeld