Geomean function

B

BobA5835

Hello All:

We use Excel spreadsheets to gather and calculate data for determining the
Reference Time (Geomean) for use in the formula: PT (patient’s time in
seconds)/Reference Time (in seconds) raised to the power of the (reagent’s)
ISI. This formula determines the Patient’s INR which is used by the
patient’s Physician to determine dosing of the drug coumadin in a coagulation
assay called the Prothrombin Time.

Normally, we obtain 20 to 30 data points with the order of magnitude of
approximately 11.0 to 15.0 seconds, always two digits with one decimal place.
Excel calculates the Geomean easily with this number of data points.

We have run into a situation where an institution has multiple sites and the
data we gather can approach 400 data points, i.e., 20 data points by 20
different sites. This gives us approximately 300 to 350 usable datapoints.
The problem is that Excel will not calculate the Geomean using two digits and
a single decimal place beyond an ‘n’ of 274. And yes, we have observed that
the ‘n’ is dependant on the order of magnitude of the numbers used. Single
digits will calculate to a higher ‘n’ them say double or triple digits.

It appears that this phenomenon is referred to as “overflow†and the Result
in the cell is #NUM!

I have two questions:

1. Is there a fix for this issue that will allow us to use Geomean to
calculate to a higher ‘n’ based on the numerical values that we must use?

And

2. And this might be beyond this newsgroup, but I don’t know how to contact
any real mathematicians, if we by groups obtain the Geomeans of an ‘n’ of
20-30, 20 times and then obtain the Geomean of those, is the final Geomean
obtained legit?

Thank you for any assistance,

Bob Allen
 
S

Stephen

BobA5835 said:
Hello All:

We use Excel spreadsheets to gather and calculate data for determining the
Reference Time (Geomean) for use in the formula: PT (patient's time in
seconds)/Reference Time (in seconds) raised to the power of the (reagent's)
ISI. This formula determines the Patient's INR which is used by the
patient's Physician to determine dosing of the drug coumadin in a
coagulation
assay called the Prothrombin Time.

Normally, we obtain 20 to 30 data points with the order of magnitude of
approximately 11.0 to 15.0 seconds, always two digits with one decimal
place.
Excel calculates the Geomean easily with this number of data points.

We have run into a situation where an institution has multiple sites and
the
data we gather can approach 400 data points, i.e., 20 data points by 20
different sites. This gives us approximately 300 to 350 usable
datapoints.
The problem is that Excel will not calculate the Geomean using two digits
and
a single decimal place beyond an 'n' of 274. And yes, we have observed
that
the 'n' is dependant on the order of magnitude of the numbers used.
Single
digits will calculate to a higher 'n' them say double or triple digits.

It appears that this phenomenon is referred to as "overflow" and the
Result
in the cell is #NUM!

I have two questions:

1. Is there a fix for this issue that will allow us to use Geomean to
calculate to a higher 'n' based on the numerical values that we must use?

And

2. And this might be beyond this newsgroup, but I don't know how to
contact
any real mathematicians, if we by groups obtain the Geomeans of an 'n' of
20-30, 20 times and then obtain the Geomean of those, is the final Geomean
obtained legit?

Thank you for any assistance,

Bob Allen

I don't know which version of Excel you are using, or indeed if it matters.
I am using 2000. When I type GEOMEAN into Help, it tells me that the
function can take from 1 to 30 arguments (what you called "data points"). So
I suppose any more than that and the result isn't defined. You could see if
your Excel version is the same. BTW, it's always worth typing the function
name into Help to check you are using it correctly.

To try to answer your second question, taking the GEOMEAN of multiple
GEOMEANS is valid ONLY if the number of data points used to calculate each
is the same. So, if you split your data into groups of exactly 20 (say), you
can calculate the GEOMEAN of each group, and then calculate the overall
GEOMEAN by finding the GEOMEAN of these results. Of course, this will only
work if you can split your data into equal-sized groups.

(It's like averaging averages. This gives a correct overall average only if
the individual averages are calculated from groups of the same size.)
 
H

Harlan Grove

Stephen said:
. . . When I type GEOMEAN into Help, it tells me that the
function can take from 1 to 30 arguments (what you called "data
points"). So I suppose any more than that and the result isn't
defined. . . .
....

Arguments aren't the same as data points.

=GEOMEAN(1,2,3,4)

has 4 arguments AND 4 data points, but

=GEOMEAN(B5:B1004)

has ONE argument but could span 1000 data points.

The OP's problem is driven by having 300-odd values of around 10, and
that means the result of multiplying all the values together would be
around 10^300, which is the high end of Excel's numeric capabilities.
In other words, GEOMEAN is suffering from numeric overflow, which is
due to Excel's rather stupid implementation, equivalent to

GEOMEAN(x) == PRODUCT(x) ^ (1 / COUNT(x))

The expedient solution to this is using something other than GEOMEAN.
If all values in the range in question were positive numbers, then the
easiest is

=EXP(AVERAGE(LN(data_points)))

If there were missing values, then an array formula is necessary

=EXP(AVERAGE(IF(ISNUMBER(data_points),LN(data_points))))
 
J

joeu2004

The problem is that Excel will not calculate the Geomean using two digits
and a single decimal place beyond an 'n' of 274.
[....]
It appears that this phenomenon is referred to as "overflow" and the Result
in the cell is #NUM!
[....]
1. Is there a fix for this issue that will allow us to use Geomean to
calculate to a higher 'n' based on the numerical values that we must use?

I presume you mean that you are taking the geomean of a single range,
or no more than 30 ranges. So the function limit of 30 arguments is
not a factor.

Instead, you are probably hitting a computational limit of the IEEE
754 floating-point arithmentic -- approximatey 10^308 -- because it
appears that GEOMEAN uses an algorithm that matches its formula,
namely: (y1*y2*...*yN)^(1/N). Thus, y1*y2*...*yN is probably
exceeding the maximum floating-point value.

The workaround is to note that (y1*y2*...*yN)^(1/N) =
10^((log(y1)+log(y2)+...+log(yN))/N).

So in Excel, enter the following array formula (commit by pressing ctrl
+shift+Enter, not just Enter):

=10^(sum(log(A1:A400))/400)

2. And this might be beyond this newsgroup, but I don't know how to contact
any real mathematicians, if we by groups obtain the Geomeans of an 'n' of
20-30, 20 times and then obtain the Geomean of those, is the final Geomean
obtained legit?

You don't need to be a mathematician. Simply experiment with smaller
groups of simple numbers, e.g. 2 through 11. Try splitting into 2
groups of 5. Then try splitting into groups of 4 and 6.

You should discover that the geomean of several geomeans equals the
single geomean only when all data sets are the same size. (I could
explain the mathematics. But why bother?)

But also note that if you went this approach, then you might run into
the Excel limit of 30 arguments to GEOMEAN, unless you put the
geomeans of the smaller data sets into "helper" cells in no more than
30 contiguous ranges.
 
J

joeu2004

Errata....

You don't need to be a mathematician. Simply experiment with smaller
groups of simple numbers, e.g. 2 through 11. Try splitting into 2
groups of 5. Then try splitting into groups of 4 and 6.

You should discover that the geomean of several geomeans equals the
single geomean only when all data sets are the same size.

What I should have said is: you should discover that the geomean of
geomeans does not equal the single geomean when the data sets have
different sizes.

The conclusion that it works for same-sized data sets does require an
inductive proof. It might work for 2 equal groups by accident. But
you might get pretty convincing evidence (or not) by experimenting
with equal groups of 2, 3, and 5, for example.
 
B

BobA5835

Thank you Stephen, Harlan and joeu2004 for you helpful info and prompt replies.

Stephen: BTW I did check out Help, but I'm a biologist. And an old one at
that. Our young programmer told me this as a further explanation

"The #NUM error would be return if any of three conditions were met:

1. Any of the values was equal to zero (None are)
2. Any of the values was negative (None are)
3. Limits of Excel were exceeded (Bingo)

Since GEOMEAN finds the product of all numbers (multiplies them all by each
other) and then takes the nth root, the product may easily be too large for
Excel. The largest positive number in Excel is 9.99999999999999 * 10^307 (in
scientific notation this is written as 9.99999999999999E+307) If the product
gets larger than this number you will get a #NUM error for the function."

Which matches what joeu2004 said.

Yes, the equal sized Groups do produce a numerical Value equal to the Value
of the Geomean in an 'n' of 273 data points. Hence my second question to a
mathmatician: Are these actually the same Value?

Harlan: I believe your explanation is fairly close to others I've seen
regarding 'Excel limitations' and Thank You for a workable formula, I'll run
it by my programmer.

joeu2004: Thank You for a second workable formula ditto on showing it to my
programmer.

"You don't need to be a mathematician." Well, actually, you do in this
case. I'm asking for an expert opinion. We are in a regulated industry.
I'm a biologist, not a mathematican.

Warm regards,

Bob
 
J

joeu2004

Our young programmer told me this as a further explanation
[....]
The largest positive number in Excel is 9.99999999999999 * 10^307

That is the largest positive constant that you can enter. The largest
postive value as a result of computation is a little more than
1.7976931348623157E+308, which is displayed as 1.79769313486232E+308.
Yes, the equal sized Groups do produce a numerical Value equal to the Value
of the Geomean in an 'n' of 273 data points. Hence my second question to a
mathmatician: Are these actually the same Value?

First, considering the constraints of your data (up to a value of 15,
if I understand your first posting correctly), I think you would have
to settle for groups of 262, not 273. 15^262 is about 1.367E+308, but
15^263 overflows. You might know that your average data point less
than 15; you said you computed the geomean of 273 data successfully.
But I think it would be risky to depend on that. It's a judgment
call.

As for the mathematical question, it has already been answer: yes,
the geomean of k geomeans of n data each is equal to the geomean of
k*n data [1]. But remember that n must be the same for all "sub-
geomeans". If you choose 262 for n, the total number of data must be
a multiple of 262(!!).
joeu2004: Thank You for a second workable formula ditto on showing it to my
programmer.

Harlan's formula, using AVERAGE, is equivalent and arguably cleaner.
You can use exp/ln or 10/log. Frankly, that's the form that I always
use. I don't know why I chose not to use AVERAGE in my formula.
Brain fart!
"You don't need to be a mathematician." Well, actually, you do in this
case. I'm asking for an expert opinion. We are in a regulated industry.
I'm a biologist, not a mathematican.

I would not qualify as an "expert mathematical opinion" in a court of
law.

HTH.


-----

Endnotes:

[1] Proof. For simplicity, G(...) means GEOMEAN(...).

G( G(y[1,1],...,y[1,n]), ... ,G(y[k,1],...,y[k,n]) )

=( G(y[1,1],...,y[1,n]) *...* G(y[k,1],...,y[k,n]) ) ^ (1/k)

=( (y[1,1]*...*y[1,n])^(1/n) * ... * (y[k,1]*...*y[k,n])^(1/n) ) ^ (1/
k)

= ( y[1,1]*...*y[1,n] *...* y[k,1]*...*y[k,n] ) ^ (1/n) ^ (1/k)

= ( y[1,1]*...*y[1,n] *...* y[k,1]*...*y[k,n] ) ^ (1/n*k)

=G(y[1,1],...y[1,n], ... ,y[k,1],...,y[k,n])
 
B

BobA5835

Hello Joeu2004:

Thank you for all of this additional information and detailed explanations.

Interestingly enough, your formula gave me a closer value to the target test
value than Harlans. My programmer says it may have to do with the log
function in some way.

You may not consider yourself an expert in the legal sense, but I'm pretty
sure your explanations and formulas would hold up anyplace. Thank you so
much for your time and efforts!

Warm regards,

Bob Allen

joeu2004 said:
Our young programmer told me this as a further explanation
[....]
The largest positive number in Excel is 9.99999999999999 * 10^307

That is the largest positive constant that you can enter. The largest
postive value as a result of computation is a little more than
1.7976931348623157E+308, which is displayed as 1.79769313486232E+308.
Yes, the equal sized Groups do produce a numerical Value equal to the Value
of the Geomean in an 'n' of 273 data points. Hence my second question to a
mathmatician: Are these actually the same Value?

First, considering the constraints of your data (up to a value of 15,
if I understand your first posting correctly), I think you would have
to settle for groups of 262, not 273. 15^262 is about 1.367E+308, but
15^263 overflows. You might know that your average data point less
than 15; you said you computed the geomean of 273 data successfully.
But I think it would be risky to depend on that. It's a judgment
call.

As for the mathematical question, it has already been answer: yes,
the geomean of k geomeans of n data each is equal to the geomean of
k*n data [1]. But remember that n must be the same for all "sub-
geomeans". If you choose 262 for n, the total number of data must be
a multiple of 262(!!).
joeu2004: Thank You for a second workable formula ditto on showing it to my
programmer.

Harlan's formula, using AVERAGE, is equivalent and arguably cleaner.
You can use exp/ln or 10/log. Frankly, that's the form that I always
use. I don't know why I chose not to use AVERAGE in my formula.
Brain fart!
"You don't need to be a mathematician." Well, actually, you do in this
case. I'm asking for an expert opinion. We are in a regulated industry.
I'm a biologist, not a mathematican.

I would not qualify as an "expert mathematical opinion" in a court of
law.

HTH.


-----

Endnotes:

[1] Proof. For simplicity, G(...) means GEOMEAN(...).

G( G(y[1,1],...,y[1,n]), ... ,G(y[k,1],...,y[k,n]) )

=( G(y[1,1],...,y[1,n]) *...* G(y[k,1],...,y[k,n]) ) ^ (1/k)

=( (y[1,1]*...*y[1,n])^(1/n) * ... * (y[k,1]*...*y[k,n])^(1/n) ) ^ (1/
k)

= ( y[1,1]*...*y[1,n] *...* y[k,1]*...*y[k,n] ) ^ (1/n) ^ (1/k)

= ( y[1,1]*...*y[1,n] *...* y[k,1]*...*y[k,n] ) ^ (1/n*k)

=G(y[1,1],...y[1,n], ... ,y[k,1],...,y[k,n])
 
J

joeu2004

Interestingly enough, your formula gave me a closer value to the target test
value than Harlans. My programmer says it may have to do with the log
function in some way.

I suspect the difference that you saw was because you followed
Harlan's instruction literally. For his first exp/ln formula, Harlan
neglected to mention that it is an array formula, to be committed with
ctrl+shift+Enter, just as you did with my formula presumably.

If you press just Enter instead, both formulas might operate on only
one cell, not the entire range. Specifically, they might simply
return the value of the one data item (because exp(ln(x)) = x and
10^log(x) = x). Oddly, they return the value of the 2nd cell in the
range, not the first. In my case, I suspect that it is because the
exp/ln formula is in C2 and the data are in B1:B262. In other words,
when the non-array formula is in a parallel column, it appears that
the corresponding cell from the parallel range is used. In fact, if I
move the non-array formula to a non-parallel location, it fails with
#VALUE.

In any case, I would not be surprised if there were some difference
between using 10/log and exp/ln. But I would expect the difference to
be very small.

Indeed, when I compare all 3 approaches (including GEOMEAN) using 56
sets of 262 random real numbers between 11.0 and 15.0, the difference
between the unrounded results of either logarithm approach and GEOMEAN
is on the order of 1E-14 or 1E-15. There is no difference among the
approaches if I round their results to one decimal place, as input
data is.

Comparing the unrounded results, the exp/ln formula is closer to the
GEOMEAN result 48%, compared to 38% for the 10/log formula. I suspect
those percentages would be about the same if I did the experiment on a
much large number of sets -- similar to tossing a coin. They are
equally close to GEOMEAN 8% of the time. Each is exactly equal to
GEOMEAN (within the precision of 64-bit floating-point arithmetic) 13%
of the time.

But simply being closer does not make it more correct. It is
difficult to compare the numerical errors inherent in the computation
for the different approaches. Who's to say that the GEOMEAN result is
the most correct one?
 
H

Harlan Grove

BobA5835 said:
Interestingly enough, your formula gave me a closer value to the
target test value than Harlans. My programmer says it may have to
do with the log function in some way.
....

It might. If this were so, you might achieve even higher precision
using a larger base for your logarithms, e.g.,

=128^AVERAGE(LOG(x,128))
 
B

BobA5835

Hello Joeu2004 and Harlan:

Bravo Gentlemen.

I followed Joeu2004's instructions regarding the ctrl+shift+Enter with both
of Harlan's formulas and the Results are identical to my target Value of
13.42!

Proving you are both Geniuses and entitled to my undying, eternal Thanks and
Gratitude!

I can't wait to run these new calcs past my programmer, even he will be
impressed!

Warm regards,

Bob
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top