Summarize YTD statistics

G

Guest

I have 13 separate worksheets with certain data; one worksheet for each month
of the year, and one to summarize the monthly result of the average, the
standard deviation, max and min of the data. For example, B25:M25 lists the
average, B26:M26 the standard deviation, B27:M27 the Max, and B28:M28 the
Min. How can I accurately summarize the monthly results from AVERAGE, STDEV,
MAX and MIN? Should I use the AVERAGE formulae or simply sum up the 12
figures and then divide them by 12?

Thank you for your help.

Orlando
 
B

Barb Reinhardt

If it were me, I'd probably do some statistical analysis of the monthly
values using SPC. That's outside the scope of this newsgroup. You might
want to look at ISIXSIGMA.COM if you are interested in more info.
 
J

joeu2004

OrlandoFreeman said:
How can I accurately summarize the monthly results from AVERAGE, STDEV,
MAX and MIN? Should I use the AVERAGE formulae or simply sum up the 12
figures and then divide them by 12?

I do not think the result would be any different. I tend to use the
appropriate function for the task whenever possible, if only for
clarity. Moreover, AVERAGE would be more consistent (symmetrical),
since I presume you would definitely use the STDEV, MAX and MIN
functions.

By the way, you might actually want STDEVP, not STDEV. It depends on
whether you consider your monthly data to be merely a representative
sample (use STDEV in that case), or whether you want the true std dev
of that "population" data (use STDEVP in that case).

Finally, AVERAGE and STDEV[P] might not be the correct functions to use
at all. It depends on the nature of the monthly data. For example, if
the monthly data represent a "time series" of data -- such as stock
prices or returns on investment -- you might want the geometric avg and
std dev. And for stock price "returns" (return rate), you might want
the (arithmetic) avg and std dev of the log returns (LOG), optionally
converted back to their "antilog" values (EXP).
 
G

Guest

Thank yo for your thoughts. I think that I didn't explain enough. I have 12
spreadsheets of certain data, each spreadsheet for every month of the year
(i.e. Jan-Dec). For each month of the year I have already calculated the
following:
SUM
MIN
MAX
AVERAGE
STDEV
Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))

So, I now have a 13th spreadsheet that lists the results for every monthly
spreadsheet. For example:

C1:N1 row SUM
C2:N2 row MIN
C3:N3 row MAX
C4:N4 row AVERAGE
C5:N5 row STDEV
C6:N6 row Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
C7:N7 row Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))

My question is how I can summarize in, say, Column M1:M7 the results I got
for each formulae and for each month. For example, SUM is pretty
straighfoward as I just need to =SUM(C1:N1). But, my problem is how to
sumarize the other results.

Thank you for your help.

Orlando

OrlandoFreeman said:
How can I accurately summarize the monthly results from AVERAGE, STDEV,
MAX and MIN? Should I use the AVERAGE formulae or simply sum up the 12
figures and then divide them by 12?

I do not think the result would be any different. I tend to use the
appropriate function for the task whenever possible, if only for
clarity. Moreover, AVERAGE would be more consistent (symmetrical),
since I presume you would definitely use the STDEV, MAX and MIN
functions.

By the way, you might actually want STDEVP, not STDEV. It depends on
whether you consider your monthly data to be merely a representative
sample (use STDEV in that case), or whether you want the true std dev
of that "population" data (use STDEVP in that case).

Finally, AVERAGE and STDEV[P] might not be the correct functions to use
at all. It depends on the nature of the monthly data. For example, if
the monthly data represent a "time series" of data -- such as stock
prices or returns on investment -- you might want the geometric avg and
std dev. And for stock price "returns" (return rate), you might want
the (arithmetic) avg and std dev of the log returns (LOG), optionally
converted back to their "antilog" values (EXP).
 
J

joeu2004

OrlandoFreeman said:
My question is how I can summarize in, say, Column M1:M7 the results I got
for each formulae and for each month.

Unfortunately, the answer is: it depends. Specifically, it depends on
__what__ you want to represent in your summary. You need to decide
that before we can talk about "how". It is unclear whether you are
looking for help on "how" so much as you might be asking for
suggestions on "what".

In your original posting, you asked about AVERAGE v. summing and
dividing by 12. I noted that the results should be the same. But your
leading question implied to me that you want to represent "typical"
statistics "per month"; for example, the "typical" minimum, maximum,
average and standard deviation.

That is not an unreasonable way to summarize your results, although you
need to be careful about it. For example, while you might talk
separately about the "typical monthly average" and the "typical monthly
minimum", it would be wrong to talk about the "typical monthly average
__with__ a typical minimum". In other words, it would be wrong to
assume that the "typical" statistics coincide or work in tandem.

But now you write:
For example, SUM is pretty straighfoward as I just need to
=SUM(C1:N1). But, my problem is how to sumarize the other results.

That suggests to me that perhaps you want to represent statistics for
the total year, combining the monthly data. For example, if the
monthly statistics represent the daily average or the average per month
of 60 categories, perhaps you want to represent the daily average or
the average per year of 60 categories.

In that case, you could compute MIN(C2:N2) and MAX(C3:N3), just as you
computed SUM(C1:N1). However, in general AVERAGE(C4:N4) is not the
same as the average of all the data, unless the divisor (number of days
or number of categories, for example) is the same for each month.
Moreover, there is no meaningful way to compute an annual STDEV from
the 12 monthly STDEV values (C5:N5). (And I reiterate: you might want
STDEVP, not STDEV.)

A more reliable way to compute the annual average (i.e. it does not
rely on the same monthly divisor) would be to carry forward N for each
month (C8:N8) and compute SUM(C8:N8) in M8. N is the divisor used to
compute each monthly average (e.g. number of days or number of
categories). Then the annual average in M4 can be computed by either
of the following:

=SUMPRODUCT(C4:N4,C8:N8) / M8

=M1 / M8

I would opt for the latter, since you already carried forward the
monthly sums, and you will need them for another purpose, as you will
see below.

To compute the annual std dev, I would also carry forward SUMX2 for
each month (C9:N9) and compute SUM(C9:N9) in M9. SUMX2 is the sum of
the square of each of the data for each month, for example
=SUMPRODUCT(W5:W64,W5:W64). Then the annual STDEV (sample std dev) can
be computed in M5 by:

=SQRT( ( M9 - (M1^2 / M8) ) / (M8 - 1) )

Alternatively, the annual STDEVP (population std dev) can be computed
in M5 by:

=SQRT(M9/M8 - M4^2)
C6:N6 row Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
C7:N7 row Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))

I see no way to summarize these statistics based on the summary data in
C6:N6 and C7:N7. For "data above the average", you might be able to
construct an array formula, I believe using INDIRECT() to construct
references to the per-month worksheets, if they are cleverly named.
Search these newsgroups for other people's solutions to similar
problems (computations across multiple worksheets). Similarly for
"data below the average"; but the formula that you have does not make
sense to me, if the average is in W68 in each per-month worksheet.


----- complete previous posting -----
Thank yo for your thoughts. I think that I didn't explain enough. I have 12
spreadsheets of certain data, each spreadsheet for every month of the year
(i.e. Jan-Dec). For each month of the year I have already calculated the
following:
SUM
MIN
MAX
AVERAGE
STDEV
Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))

So, I now have a 13th spreadsheet that lists the results for every monthly
spreadsheet. For example:

C1:N1 row SUM
C2:N2 row MIN
C3:N3 row MAX
C4:N4 row AVERAGE
C5:N5 row STDEV
C6:N6 row Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
C7:N7 row Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))

My question is how I can summarize in, say, Column M1:M7 the results I got
for each formulae and for each month. For example, SUM is pretty
straighfoward as I just need to =SUM(C1:N1). But, my problem is how to
sumarize the other results.

Thank you for your help.

Orlando

OrlandoFreeman said:
How can I accurately summarize the monthly results from AVERAGE, STDEV,
MAX and MIN? Should I use the AVERAGE formulae or simply sum up the 12
figures and then divide them by 12?

I do not think the result would be any different. I tend to use the
appropriate function for the task whenever possible, if only for
clarity. Moreover, AVERAGE would be more consistent (symmetrical),
since I presume you would definitely use the STDEV, MAX and MIN
functions.

By the way, you might actually want STDEVP, not STDEV. It depends on
whether you consider your monthly data to be merely a representative
sample (use STDEV in that case), or whether you want the true std dev
of that "population" data (use STDEVP in that case).

Finally, AVERAGE and STDEV[P] might not be the correct functions to use
at all. It depends on the nature of the monthly data. For example, if
the monthly data represent a "time series" of data -- such as stock
prices or returns on investment -- you might want the geometric avg and
std dev. And for stock price "returns" (return rate), you might want
the (arithmetic) avg and std dev of the log returns (LOG), optionally
converted back to their "antilog" values (EXP).
 
J

joeu2004

Errata....
OrlandoFreeman said:
C6:N6 row Data above the AVERAGE (e.g. =COUNTIF(W5:W64,">"&W68))
C7:N7 row Data below the AVERAGE (e.g.
=ABS(COUNTIF(W5:W64,">"&W68)-COUNTIF(W5:W64,">0"))
[....]
[Regarding "data below the average",] the formula that you have does not make
sense to me, if the average is in W68 in each per-month worksheet.

Aha! I had ass-u-me-d that you had some typos; it is only because of
that assumption that the formula seemed confusing. If the formula had
been written as follows, I might have interpreted it more readily (but
to each his own):

=countif(W5:W64,">0") - countif(W5:W64,">"&W68)

That is, the data below the average is all the data above zero less all
the data above the average. Strictly speaking, that is a count of the
data below __or_equal_to__ the average. Why not simply
countif(W5:W64,"<"&W68)? (Rhetorical. And use "<=" if that is what
you truly want.) You are assuming that all the "interesting" data are
greater than zero, or you want to exclude the data that are not.
Perhaps that makes sense in your case.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top