Improving Nested Formula Calculation Efficiency

K

kittronald

I'm having difficulty with trying to improve the efficiency of a
nested formula.

I'm using Excel 2007 and a third party COM add-in that provides
additional functions written in C++ (not a free product).

The .XLSB workbook contains two worksheets.

Sheet1 has about 1,000,000 cells, each containing the nested
formula.

Sheet2 is the source of the names (On_Setting, Primary,
Secondary and Tertiary) using cell references, not formulas.

Cust_Func is a third party function and can return a number or a
text based #N/A value - not an Excel #N/A.

Below is an example of the formula.



=IF(On_Setting=FALSE,"",IF(IFERROR(PRODUCT(Cust_Func($A1,Primary),1),
0)>0,Cust_Func($A1,Primary),IF(IFERROR(PRODUCT(Cust_Func($A1,Secondary),
1),0)>0,Cust_Func($A1,Secondary),IF(IFERROR(PRODUCT(Cust_Func($A1,J
$2,Tertiary),1),0)>0,Cust_Func($A1,Tertiary),"Error"))))

=IF(On_Setting=FALSE is linked to an ON/OFF radio button that is
used to disable calculation in case of cells that need to be edited
beforehand.

IFERROR(PRODUCT(Cust_Func($A1,Primary),1),0) is used to test
whether a zero or an #N/A value is returned by Cust_Func($A1,Primary).

If Cust_Func($A1,Primary) evaluates to zero or #N/A, the IF
statement moves to the next IF statement.

If Cust_Func($A1,Primary) evaluates to > 0, then
Cust_Func($A1,Primary) is returned.

The only variables that change with each nested IF statement are
the names Primary, Secondary and Tertiary.


*** ISSUES***

1) It takes almost 40 minutes for these calculations to complete.
The option of building an Intel octo-core, hyper-threaded and
overclocked desktop isn't immediately possible.

2) It has been speculated that all IF statements in a nested
formula are evaluated, regardless if the second IF statement evaluates
TRUE. Is this correct ?

3) Is it possible to write a UDF that calls the Cust_Func function
(supported) and iterates through the names until a value greater than
zero is returned and then stops iterating to prevent unnecessary
calculations ?

Easy, right ?



- Ronald K.
 
M

Martin Brown

I'm having difficulty with trying to improve the efficiency of a
nested formula.

I'm using Excel 2007 and a third party COM add-in that provides
additional functions written in C++ (not a free product).

The .XLSB workbook contains two worksheets.

Sheet1 has about 1,000,000 cells, each containing the nested
formula.

Sheet2 is the source of the names (On_Setting, Primary,
Secondary and Tertiary) using cell references, not formulas.

Cust_Func is a third party function and can return a number or a
text based #N/A value - not an Excel #N/A.

Below is an example of the formula.



=IF(On_Setting=FALSE,"",IF(IFERROR(PRODUCT(Cust_Func($A1,Primary),1),
0)>0,Cust_Func($A1,Primary),IF(IFERROR(PRODUCT(Cust_Func($A1,Secondary),
1),0)>0,Cust_Func($A1,Secondary),IF(IFERROR(PRODUCT(Cust_Func($A1,J
$2,Tertiary),1),0)>0,Cust_Func($A1,Tertiary),"Error"))))

=IF(On_Setting=FALSE is linked to an ON/OFF radio button that is
used to disable calculation in case of cells that need to be edited
beforehand.

IFERROR(PRODUCT(Cust_Func($A1,Primary),1),0) is used to test
whether a zero or an #N/A value is returned by Cust_Func($A1,Primary).

It singularly fails to do that since PRODUCT(0,1)==0 is a valid number.
(unless Cust_Function returns a list of numbers and if it does you
should multiply them together on the other side of the data bridge)
If Cust_Func($A1,Primary) evaluates to zero or #N/A, the IF
statement moves to the next IF statement.

If Cust_Func($A1,Primary) evaluates to> 0, then
Cust_Func($A1,Primary) is returned.

The only variables that change with each nested IF statement are
the names Primary, Secondary and Tertiary.

It would be better to offload the hunt through Primary, Secondary and
Tertiary onto the data server - and up to 6x faster.

You can easily double the speed by adding hidden columns B,C,D

containing Cust_Function($A1, Primary), Cust_Function($A1, Sec...)

Then simplifying the conditional to use the preloaded values.

=IF(On_Setting=FALSE,"",IF(ISNUMBER(B1), B1, IF(ISNUMBER(C1),C1,
IF(ISNUMBER(D1),D1,"Error"))))

It is possible you mean IF (AND(ISNUMBER(B1), B1>0)

Based on the spec. I can't see the point of the call to PRODUCT(x, 1) is
it is the identity and the extra error handling when it fails is an
additional and unnecessary performance hit.

*** ISSUES***

1) It takes almost 40 minutes for these calculations to complete.
The option of building an Intel octo-core, hyper-threaded and
overclocked desktop isn't immediately possible.

Suggested changes above will only fetch data once for each line instead
of twice. It should be at least 2x faster.
2) It has been speculated that all IF statements in a nested
formula are evaluated, regardless if the second IF statement evaluates
TRUE. Is this correct ?

No that is incorrect even for the lamentably slow XL2007 code. The
evaluation of IF statements is lazy only the ones down the true path are
executed. To test define a UDF in a module and set a breakpoint.

eg.

Function toy(a)
toy = a + 1
End Function

With a breakpoint on the addition then use it in the worksheet

=IF(toy(1), toy(2), toy(3))

BTW You might find that XL2003 runs 20-30% faster. XL2007 is slower.
3) Is it possible to write a UDF that calls the Cust_Func function
(supported) and iterates through the names until a value greater than
zero is returned and then stops iterating to prevent unnecessary
calculations ?

Easy, right ?

The main hit at present is from having to move the data from the server
to XL twice for each IF(,,) you execute. You need to put the choice
between 3 options on the other side so that only one item is transferred
back to XL. That will run somewhere between 2x and 6x faster depending
on the proportion of searches that go to full depth.

You could do a local XL UDF that is slightly better along the lines of

Function MyUDF(a)
on error resume next
b = Cust_Function($A1, Primary)
if b>0 then
MyUDF = b
else
c = Cust_Function($A1, Secondary)
if c>0 then
MyUDF =c
else
d = Cust_Function($A1, Tertiary)
if d>0 then
MyUDF = d
else
MyUDF = "Error"
endif
endif
endif
end function

There may be some syntactic sugar needed around your externally supplied
UDF Cust_Function to make it work - see the manual.

This only moves data from your server when it has to.

Is there no way to recast the problem so that a range of queries could
be passed to the server and it passes back the answers in a similar
sized range? This would have considerably better performance.

Regards,
Martin Brown
 
K

kittronald

Martin,

Thank you for the time you put into your response.

I should have given more detail on the environment of the situation.

1) Using Excel 2007 SP2 and a third party add-in that provides custom
functions to Excel

2) The third party add-in connects to a local COM application that
fetches data from a remote server that is not within user control. For
example, think of downloading employment statistics from a government FTP
site - SQL type commands and server side processing aren't available. The
returned data is cached in a local, flat file database.

3) When one of its custom functions is used, the add-in:

a) Makes a call to a local COM application and determines if the
requested data exists in the local database. If it doesn't, the data is
requested from a remote server.

b) The COM application returns the requested data cached in the
local database to the add-in.

Regarding the use of PRODUCT, I should have extracted the following:

IFERROR(PRODUCT(Cust_Func($A1,Primary),1),0)>0

IFERROR is used to deal with a text formatted #N/A result since
PRODUCT('#N/A,1)=#VALUE. The add-in doesn't return a standard Excel #N/A
error code. IFERROR also saves from having to calculate another PRODUCT
formula if an error is not returned. And >0 is used to deal with a result of
zero.

Additionally, the entire formula is filled into approximately 1,000,000
cells on Sheet1, so preloading the PRIMARY, etc. values wouldn't be
feasible.

It's interesting you say Excel 2003 is faster than 2007. Have you found
Excel 2010 to be any better ? I was planning to hold off upgrading until
Excel 15.

I'll try out the UDF, but since the formula will be filled right and
down, how could the UDF account for that since $A1 will only be correct for
one cell ?



- Ronald K.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top