Looking for efficient solution to formulas

R Gostautas

Greetings everyone,

I have put together the formulas below. For small data
sets they have worked fine, but I have now hit a data set
with over 21,000 rows, and recalculating these formulas
brings the entire system to a halt (it hangs).

I am hoping someone can suggest a more efficient set of
formulas.

Mathematical equation (1)
HI = (N/(N-K)) * [Sum(i = K+1 to N) Soi / Sum(i = 1 to N) Soi]

Formula
=K11/(K11-L11)*(SUM(INDEX($I$11:$I$261,L11+1,1):I11)/J11)
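For what it's worth, equation (1) can be kept cheap by carrying two running sums, so each new row costs constant work instead of re-summing both ranges. A minimal sketch in Python rather than a worksheet formula (the function name is mine, and I'm assuming K is a fixed constant; in the worksheet, N sits in column K, K in column L, the strengths in column I, and the cumulative sum in column J):

```python
def historic_index(strengths, K):
    """HI = (N/(N-K)) * Sum(i = K+1 to N) Soi / Sum(i = 1 to N) Soi.

    strengths: signal strengths Soi in event order.
    Returns one HI value per event (None while N <= K, where HI
    is undefined).
    """
    total = 0.0   # running Sum(i = 1 to N) Soi
    tail = 0.0    # running Sum(i = K+1 to N) Soi
    out = []
    for N, s in enumerate(strengths, start=1):
        total += s
        if N > K:
            tail += s
        out.append((N / (N - K)) * tail / total if N > K else None)
    return out
```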


Mathematical equation (2)
S = (1/J) * Sum(m = 1 to J) Som

Severity is defined as the average signal strength for a
given number of events (J) having the largest value of
signal strength.

Formula
=SUM(LARGE($I$11:$I11,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20}))/N11

(So once I hit the 21st event, it calculates a value for
the largest 20 of 21 events; for the 22nd event, the
largest 20 of 22; and so on. By the 21,000th event it is
still searching the whole range for the top 20 every time.
I think you can see how intensive this gets, but I have
not been able to find a more efficient formula.)
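One way around the repeated top-20 search is to keep a small min-heap of the J largest values seen so far, so each new event costs only O(log J) work instead of a fresh sort of the whole range. A sketch in Python rather than a worksheet formula (assuming J = 20 and that only the running severity values are wanted):

```python
import heapq

def running_severity(strengths, J=20):
    """Running average of the J largest signal strengths seen so far.

    heap holds the current top J as a min-heap, so heap[0] is the
    smallest of them; top_sum tracks their sum.
    """
    heap = []
    top_sum = 0.0
    out = []
    for s in strengths:
        if len(heap) < J:
            heapq.heappush(heap, s)
            top_sum += s
        else:
            # Push s and pop the smallest of the top J. If s is below
            # the current minimum, heappushpop returns s itself and
            # top_sum is unchanged.
            dropped = heapq.heappushpop(heap, s)
            top_sum += s - dropped
        out.append(top_sum / len(heap))
    return out
```

With fewer than J events so far, this averages everything seen, matching the behavior of the worksheet formula in the first 20 rows.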

N ~ number of hits up to and including time t
Soi ~ signal strength of the ith event
J ~ empirically derived constant
Som ~ signal strength of the mth hit, where the hits are
ordered by magnitude of signal strength

If anyone has some insight, it would be greatly
appreciated.

Richard
 

Harlan Grove

R Gostautas said:
Mathematical equation (2)
S = (1/J) * Sum(m = 1 to J) Som

Severity is defined as the average signal strength for a
given number of events (J) having the largest value of
signal strength.

Formula
=(SUM(LARGE($I$11:$I11,
{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20})))/N11

So column N on each row contains the number of observations in the average
of the column I values ending in the same row? Not that the following is
much of an efficiency, but

=AVERAGE($I$11:$I11)

in the topmost cell then filled down into the next 19 cells would be more
readable.
(So once I hit the 21st event, it calculates a value for
the largest 20 of 21 events; for the 22nd event, the
largest 20 of 22; and so on. By the 21,000th event it is
still searching the whole range for the top 20 every time.
I think you can see how intensive this gets, but I have
not been able to find a more efficient formula.)

The problem is the implicit sorting needed by LARGE. It looks like you can't
just sort a copy of column I because at each row N you need to average the
20 largest signal strengths from row 11 to row N based on data in the
original column I order.

Classic need for speed. Standard approach is to use more memory.

If your formula 2 results were in column X with the $I$11:$I11 result in
cell X11, then use the formula =I11 in X11, =AVERAGE($I$11:$I12) in X12,
and fill X12 down into X13:X30. If columns Y through AR were empty, enter
the following array formula in the range Y30:AR30.

=SMALL(I11:I30,COLUMN(INDIRECT("A:T")))

From row 31 on you're only averaging the 20 largest values from row 11
through the current row. Enter the formula =AVERAGE(Y31:AR31) in X31, and
enter the formula =MEDIAN($I31,Y30:Z30) in Y31, then fill Y31 right into
cells Z31:AQ31, and enter the formula =MAX(I31,AR30) in cell AR31. Select
X31:AR31 and fill down as far as needed. The resulting formulas will recalc
a lot faster, but they'll eat up a lot more memory. Up to you to decide if
the trade-off is worth it.
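The MEDIAN/MAX cascade above is, in effect, inserting the new value into an ascending row of the 20 largest values and discarding the smallest. A small Python sketch of that equivalence (function names are mine; a row shorter than 20 is used only to keep the check compact):

```python
from bisect import insort

def update_sorted(row, x):
    """Insert x into the ascending row, then drop the smallest value."""
    merged = list(row)
    insort(merged, x)
    return merged[1:]

def update_median_max(row, x):
    """The worksheet cascade: cell j = MEDIAN(x, row[j], row[j+1])
    for all but the last cell, and the last cell = MAX(x, row[-1])."""
    new = [sorted([x, row[j], row[j + 1]])[1] for j in range(len(row) - 1)]
    new.append(max(x, row[-1]))
    return new
```

For any ascending row the two agree: inserting 2.5 into [1, 2, 3] gives [2, 2.5, 3] either way, and a value below the current minimum leaves the row unchanged.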
 
