Weighted Averages

C

Chad Fields

Is there a way to calculated weighted averages in Excel?
Lotus has a function called WEIGHTAVG, this is what I am
looking for in Excel.

Thanks
 
G

GB

David J. Braden said:
Hi Mike,
Possible speed considerations aside, which I will look into shortly on
my funky system, I wonder why you (and others) suggest SUMPRODUCT as
opposed to an array-entered SUM(x*y).

You know I mean no disrespect... my question isn't rhtorical, but to
be taken simply as posed.

Regards,
Dave Braden

I don't know what Mike's answer would be Dave, but I can tell you that I
prefer code that's easy to understand - either for other people or for me in
say 6 months time when I've forgotten all about the project. Maybe Mike
finds SumProduct more obvious - I know I do!

Geoff
 
H

Harlan Grove

David J. Braden said:
Possible speed considerations aside, which I will look into shortly on
my funky system, I wonder why you (and others) suggest SUMPRODUCT as
opposed to an array-entered SUM(x*y).
....

Mostly because most of the time SUMPRODUCT formulas don't have to be
array-entered. One less thing to have to remember. One less thing to explain
to OPs. For me, that's about it.
 
D

David J. Braden

Geoff,
Thanks for your thoughts. I'm afraid that it is so natural for me to
work with vectors that I might have lost touch with that
"self-documenting" feature you allude to. Given that we resort to array
formulas for all sorts of stuff, such as worksheet-based numerical
integration, staying with such only strengthens, for me,
self-documentation. The advent of SUMPRODUCT stymies me, especially
given extant technical problems that abound within Excel.

OK, and plz don't laugh, but I am running a Mac for this next test. Are
you, or perhaps Harlan, up to running a test to see if there is a
comparative advantage from using one call versus the other?

I'm thinking along the lines of looping through "n" iterations where two
arrays are multiplied; to help trigger the loop's recalc, we need only
have a single =RAND(), or its equivalent, in the array.

Likely this will seem petty to you, but this consideration looms large
in certain numerical procedures.

I'll get going on it tonight on a platform that few care about, to glean
what I can. Would appreciate any and all insight you guys can provide.

All the best,
Dave Braden
(MVP - Mac)
 
A

Aladin Akyurek

Excel 10, as implemented for the Mac's OS-X, is about 32% slower when
doing something like SUM(rng1*rng2), where the formula is "array
entered", than using SUMPRODUCT(rng1,rng2). This frankly surprises me,

I'm also surprised: Would you check whether this difference in speed is due
to the *native* comma syntax of SumProduct?

Thus:

{=SUM(rng1*rng2)}

Vs

=SUMPRODUCT(rng1*rng2)

I expect these two to share the same set of procudures/routines to
accomplish the intended result, so they should take all things being equal
the same amount time.

I've never seen the weighted average formula built with SumProduct other
than with the _comma_ sysntax, although we see lots of SumProduct formulas,
unnecessarily using a _star_ syntax like in:

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2)*rng3)

instead of the kosher

=SUMPRODUCT(--(rng1 RelOp cond1),--(rng2 RelOp cond2),rng3). or even

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2),rng3)

One advantage of the comma syntax is that it tolerates text values in the
vectors that are multiplied:

=SUMPRODUCT(A1:A4,B1:B4)/SUM(A1:A4)

for example will not flounder by the existence of a formula-blank ("") in
B1, due to the formula, say, =IF(G1,1,"").
 
D

David J. Braden

Aladin,
Sure, I'll check it shortly. I want you to do the same on a Windows
platform, though. OK? And let us know the version of Excel you are
running. I am at this moment in the house of a Microsoft exec that is
blissfully *filled* with Macs; not a single Windows box, so I cannot
check this out on what is obviously the dominant platform for Excel.

Can you please enlighten me as to why anyone would use
=SUMPRODUCT(rng1*rng2) as opposed to
=SUMPRODUCT(rng1,rng2) ?

Regardless, I'll run through some timings; hoping to get some from you
as well (as I recall, QueryPerformance is the system routine for you to
call when doing this sort of stuff).

Oh, what I did was to set up rng1 with =RAND() in A1, and static values
in A2:A1000, rng2 has static values in B1:B1000. I'm trying to minimize
artifacts of Excel's RNG, yet get the volatility I need for the test.

If you have an idea on how to get the test better focused on the
funtions' implementations, I'm all ears.

Regards,
Dave Braden

Excel 10, as implemented for the Mac's OS-X, is about 32% slower when
doing something like SUM(rng1*rng2), where the formula is "array
entered", than using SUMPRODUCT(rng1,rng2). This frankly surprises me,

I'm also surprised: Would you check whether this difference in speed is due
to the *native* comma syntax of SumProduct?

Thus:

{=SUM(rng1*rng2)}

Vs

=SUMPRODUCT(rng1*rng2)

I expect these two to share the same set of procudures/routines to
accomplish the intended result, so they should take all things being equal
the same amount time.

I've never seen the weighted average formula built with SumProduct other
than with the _comma_ sysntax, although we see lots of SumProduct formulas,
unnecessarily using a _star_ syntax like in:

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2)*rng3)

instead of the kosher

=SUMPRODUCT(--(rng1 RelOp cond1),--(rng2 RelOp cond2),rng3). or even

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2),rng3)

One advantage of the comma syntax is that it tolerates text values in the
vectors that are multiplied:

=SUMPRODUCT(A1:A4,B1:B4)/SUM(A1:A4)

for example will not flounder by the existence of a formula-blank ("") in
B1, due to the formula, say, =IF(G1,1,"").

David J. Braden said:
Followup...
Excel 10, as implemented for the Mac's OS-X, is about 32% slower when
doing something like SUM(rng1*rng2), where the formula is "array
entered", than using SUMPRODUCT(rng1,rng2). This frankly surprises me,
but I am happy to know that Microsoft plods ahead. I would appreciate
someone letting me know how things stand on a recent Windows version of
Excel. So, from my perspective, not only does Mike's initial suggestion
make sense, as Geoff puts it, from a self-documenting perspective, but
has *very* strong grounding from hard-core performance perspecitves.
Sheesh, I'll bet Mike didn't even realize that (kidding, Mike!)

Again, any and all countervailing evidence to this would be greatly
appreciated. But I've a strong hunch that the observation will stand.

Regards,
DaveB
[/QUOTE]
 
D

David J. Braden

OK, I have the stats, which I have verified across a number of Macs. I
count on my buddy Dana, and you, to better inform me.

The fastest worksheet call I see is something like SUMPRODUCT(rng1,
rng2). I also tested it as an array-entered formula (hey, I look for
all possible tweaks and idiosyncracities within Excel), but no
difference there that I could detect across my set of platforms.

A formula like SUMPRODUCT(rng1*rng2), however entered, as tested across
a broad range of Macs, is about 20% slower. OK, OK, the average was
21.875, but wth.

Finally, using an array-entered SUM(rng1*rng2), which of course works
back through lots of previous versions of Excel, I took an average hit
of about 46%.

Bottom line, for me, is to use SUMPRODUCT(x,y) wherever possible, not
array-entered, unless someone can come up with a faster alternative.

I await your Windows tests.
Regards,
Dave Braden

Excel 10, as implemented for the Mac's OS-X, is about 32% slower when
doing something like SUM(rng1*rng2), where the formula is "array
entered", than using SUMPRODUCT(rng1,rng2). This frankly surprises me,

I'm also surprised: Would you check whether this difference in speed is due
to the *native* comma syntax of SumProduct?

Thus:

{=SUM(rng1*rng2)}

Vs

=SUMPRODUCT(rng1*rng2)

I expect these two to share the same set of procudures/routines to
accomplish the intended result, so they should take all things being equal
the same amount time.

I've never seen the weighted average formula built with SumProduct other
than with the _comma_ sysntax, although we see lots of SumProduct formulas,
unnecessarily using a _star_ syntax like in:

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2)*rng3)

instead of the kosher

=SUMPRODUCT(--(rng1 RelOp cond1),--(rng2 RelOp cond2),rng3). or even

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2),rng3)

One advantage of the comma syntax is that it tolerates text values in the
vectors that are multiplied:

=SUMPRODUCT(A1:A4,B1:B4)/SUM(A1:A4)

for example will not flounder by the existence of a formula-blank ("") in
B1, due to the formula, say, =IF(G1,1,"").

David J. Braden said:
Followup...
Excel 10, as implemented for the Mac's OS-X, is about 32% slower when
doing something like SUM(rng1*rng2), where the formula is "array
entered", than using SUMPRODUCT(rng1,rng2). This frankly surprises me,
but I am happy to know that Microsoft plods ahead. I would appreciate
someone letting me know how things stand on a recent Windows version of
Excel. So, from my perspective, not only does Mike's initial suggestion
make sense, as Geoff puts it, from a self-documenting perspective, but
has *very* strong grounding from hard-core performance perspecitves.
Sheesh, I'll bet Mike didn't even realize that (kidding, Mike!)

Again, any and all countervailing evidence to this would be greatly
appreciated. But I've a strong hunch that the observation will stand.

Regards,
DaveB
[/QUOTE]
 
A

Aladin Akyurek

David J. Braden said:
OK, I have the stats, which I have verified across a number of Macs. I
count on my buddy Dana, and you, to better inform me.

The fastest worksheet call I see is something like SUMPRODUCT(rng1,
rng2). I also tested it as an array-entered formula (hey, I look for
all possible tweaks and idiosyncracities within Excel), but no
difference there that I could detect across my set of platforms.

Well, not every formula that operates on (computed) arrays requires
confirming with control+shift+enter (CSE). The SumProduct formulas as a
prominent type generally does not need CSE. I recall a SumProduct formula by
Harlan, which appeared to require CSE while Sum could not be substituted for
SumProduct.

I tested {=SUM(rng1*rng2)} against

=SUMPRODUCT(rng1,rng2)

on XL 2000, Win 2000/1Ghz, using FastExcel.

Test specs: rng1= A1:A15000 and rng2= B1:B15000, figures generated using
RAND() & transformed into constant values, except the one in A1. The results
below are an average of 5 runs, expressed in millisecs.

SumProduct: ReCalc Time --- 21.4; Full Calc Time --- 31.4
Sum: ReCalc Time --- 23.9; Full Calc Time --- 34.4
A formula like SUMPRODUCT(rng1*rng2), however entered, as tested across
a broad range of Macs, is about 20% slower. OK, OK, the average was
21.875, but wth.

This result also appears to hold on Windows. It's strange though. Both Sum
and SumProduct would use, one would expect, the same set of underlying procs
when multiplying vectors.
Finally, using an array-entered SUM(rng1*rng2), which of course works
back through lots of previous versions of Excel, I took an average hit
of about 46%.

Bottom line, for me, is to use SUMPRODUCT(x,y) wherever possible, not
array-entered, unless someone can come up with a faster alternative.

SumProduct set up with the kosher, that is, with its native comma syntax is
a winner.
I await your Windows tests.
Regards,
Dave Braden



I'm also surprised: Would you check whether this difference in speed is due
to the *native* comma syntax of SumProduct?

Thus:

{=SUM(rng1*rng2)}

Vs

=SUMPRODUCT(rng1*rng2)

I expect these two to share the same set of procudures/routines to
accomplish the intended result, so they should take all things being equal
the same amount time.

I've never seen the weighted average formula built with SumProduct other
than with the _comma_ sysntax, although we see lots of SumProduct formulas,
unnecessarily using a _star_ syntax like in:

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2)*rng3)

instead of the kosher

=SUMPRODUCT(--(rng1 RelOp cond1),--(rng2 RelOp cond2),rng3). or even

=SUMPRODUCT((rng1 RelOp cond1)*(rng2 RelOp cond2),rng3)

One advantage of the comma syntax is that it tolerates text values in the
vectors that are multiplied:

=SUMPRODUCT(A1:A4,B1:B4)/SUM(A1:A4)

for example will not flounder by the existence of a formula-blank ("") in
B1, due to the formula, say, =IF(G1,1,"").

shortly
on but
to that
I for
me in Lotus
has a Excel.
<
[/QUOTE]
 
D

David J. Braden

Aladin,
Thanks for checking the stuff out for me on the Windows boxes.
I've no idea what "FastExcel" is, but will assume you mean a recent
version of Excel (10? 11?) running on Windows.

Slightly off topic, perhaps, but you write
" It's strange though. Both Sum and SumProduct would use, one would
expect, the same set of underlying procs when multiplying vectors."

Doesn't surprise me a bit, after observing the goofiness with many of
the stat functions (the variance-related ones are a good case in point,
among many other examples). That is the very reason I thought of
testing SUMPRODUCT in both a normal mode and CSE mode (who would know
what we might turn up?). I imagine the code for the beast to be
undocumented hell to wade through, and appreciate the efforts of folks
who tread within to fix its more glaring problems, of which there are
many (IMO). Excel has *many* examples of code that you would have
thought would have been shared, but aren't;. Result? Plenty of internal
inconsistancy.

Again, thanks for those test runs.

Dave Braden
MVP
 
J

J.E. McGimpsey

David J. Braden said:
Do you see similar results on your Windows platforms (and of course, Mac
platforms) that Aladin and I are getting? I ask because this particular
operation can make a *real* difference in how quickly intensive
simulations are resolved.

I'm not running a Windows box currently - mine's a dual boot that
I'm doing some long term testing with Linux for a client. I can take
it down for a couple hours at a time and boot into Windows if I need
to, but not very often...

I'm running VPC5 (should be VPC6, soon) for testing, and have a good
friend/client running a mixed Windows/Mac lab that lets me do
hard-core testing when necessary.

With XLv.X 10.1.5 and Mac OS 10.2.6, I see similar results to yours
- about 18-25% faster for SUMPRODUCT(,) over {SUM(*)}, depending on
what other processes are running.
 
D

David J. Braden

Harlan,
Thanks for the ideas. I cannot detect, on my machine of the moment (Mac,
running XL 10) any difference in array-entry versus not, other things
equal. What did give me pause is how much faster SUMPRODUCT is compared
to what you and I have been doing for many years, namely using Excel's
nifty array features within SUM. Come to think of it, it's a great
feature that MS has seemingly not communicated very well. Perhaps trying
to sells us on the elegance of the ATP stat functions distracted them.

In any case, I cannot construct a realistic example where
SUMPRODUCT(a,b) isn't meaningfully faster than SUM(a*b), array-entered.
As you well know, that particular function can be at the heart of
intensive simulations that might last hours, hence my interest.

Regards, you nut,
Dave B
 
H

Harlan Grove

David J. Braden said:
Thanks for the ideas. I cannot detect, on my machine of the moment (Mac,
running XL 10) any difference in array-entry versus not, other things
equal. What did give me pause is how much faster SUMPRODUCT is compared
to what you and I have been doing for many years, namely using Excel's
nifty array features within SUM. . . .

Speak for yourself. I've been avoiding SUM(x*y) for years precisely because
SUMPRODUCT is better suited to that task. Doesn't mean you may not find a
few postings from me using SUM(x*y), but I'm not proud of some of my
postings.

It seems logical to me that SUMPRODUCT when there are multiplications
involved would be faster because multiplication and addition would both be
in SUMPRODUCT while SUM(x*y) would necessarily involve multiplication in one
place, then addition *only* within SUM. The thing I find odd is that
SUMPRODUCT(x*y) is faster than SUM(x*y). In my cynical mind, I can only come
up with the guess that SUM was clumsily implemented in the original Excel
version for 512K Macs back in the mid 1980s, and Microsoft, true to form,
hasn't touched the code since. SUMPRODUCT was simply better written at a
later date. You've been around long enough - just how unrealistic is the
cynical perspective?
In any case, I cannot construct a realistic example where
SUMPRODUCT(a,b) isn't meaningfully faster than SUM(a*b), array-entered.
As you well know, that particular function can be at the heart of
intensive simulations that might last hours, hence my interest.

Which is why I've used SUMPRODUCT in one or two of my posts. Still, if I
needed to do anything involving millions or billions of floating point
operations, it wouldn't be in Excel. Spreadsheets, any of 'en, not just
Excel, are never going to be FAST compared to the alternatives.
 
A

Aladin Akyurek

Harlan Grove said:
...

And all I wanted to do was avoid array entry.

Also for me a main reason for adopting SumProduct, although gradually other
reasons started to turn up: a better execution time in particular with the
comma syntax and the expressive power.
Still, good to know SUMPRODUCT
with commas is faster than SUMPRODUCT with a single (long) arithmetic
expression. Note that if you have a number of boolean arrays as criteria,
SUMPRODUCT forces you to convert them to numeric because it shares Excel's
oddness of SUM(TRUE,TRUE) = SUM(FALSE,FALSE) = 0.

SumProduct could be indeed tweaked to execute implicit coercion wrt its
conditional arguments for more efficiency.
The next question in this
exercise would be the relativing timing of

SUMPRODUCT(--(A=B),C)

vs

SUMPRODUCT((A=B)*C)

If this is what Aladin has already checked, then fine. As I read it, it was
unclear.

What I checked out, like Dave, is =SUMPRODUCT(rng1,rng2) Vs
=SUMPRODUCT(rng1*rng2), where none of rng1 and rng2 were a conditional that
required coercion.

The structure you depict above can better be done with SumIf, unless I'm
misinterpreting the =B bit.

=SUMPRODUCT(--(A1:A15000>=D1),B1:B15000) Vs
=SUMPRODUCT((A1:A15000>=D1)*B1:B15000)

ReCalc: 0.6; 0.5
Full Calc: 51.3; 52.4
Volatility: 1.1%; 0.9%
MicroSecs/Formula: 51,343.7; 52,380.5

=SUMPRODUCT(--(A1:A15000>=E1),--(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1)*C1:C15000) Vs
{=SUM(IF($A$1:$A$15000>=E1,IF($B$1:$B$15000<=F1,$C$1:$C$15000,0),0))} [ Cond
Sum Wizard ]

ReCalc: 0.2; 0.2; 0.2; 0.2
Full Calc: 86.9; 85.7; 90.0; 89.3
Volatility: 0.2%; 0.2%; 0.2%; 0.3%
MicroSecs/Formula: 86,944.2; 85,683.2; 90,022.5; 89,347.6

All figures are averages of 5 runs.
 
D

David J. Braden

Hi Aladin,
Since I am running off of a set of Macs now, I cannot check out Charles
Wilson's intriquing program that you (and John McGimpsey) alerted me to.

Can you plz help me understand what the FullCalc, Recalc and Volatility
metrics are about?

Also, since you seem to be as much of a nut about computational
efficiency as I am, do you know of a list of Excel functions that has
version of introduction? I'm not concerned about when different Excel
functions were introduced prior to Version 7. I *am* concerned about
which version was the first to feature a function, such as SUMPRODUCT.
Of course some of the functions' implementations change through the
versions (like those related to the Normal distribution), but the
question I have is how to know the *first* version that any particular
worksheet function appeared.

TIA
Dave Braden

Aladin Akyurek said:
Harlan Grove said:
...

And all I wanted to do was avoid array entry.

Also for me a main reason for adopting SumProduct, although gradually other
reasons started to turn up: a better execution time in particular with the
comma syntax and the expressive power.
Still, good to know SUMPRODUCT
with commas is faster than SUMPRODUCT with a single (long) arithmetic
expression. Note that if you have a number of boolean arrays as criteria,
SUMPRODUCT forces you to convert them to numeric because it shares Excel's
oddness of SUM(TRUE,TRUE) = SUM(FALSE,FALSE) = 0.

SumProduct could be indeed tweaked to execute implicit coercion wrt its
conditional arguments for more efficiency.
The next question in this
exercise would be the relativing timing of

SUMPRODUCT(--(A=B),C)

vs

SUMPRODUCT((A=B)*C)

If this is what Aladin has already checked, then fine. As I read it, it was
unclear.

What I checked out, like Dave, is =SUMPRODUCT(rng1,rng2) Vs
=SUMPRODUCT(rng1*rng2), where none of rng1 and rng2 were a conditional that
required coercion.

The structure you depict above can better be done with SumIf, unless I'm
misinterpreting the =B bit.

=SUMPRODUCT(--(A1:A15000>=D1),B1:B15000) Vs
=SUMPRODUCT((A1:A15000>=D1)*B1:B15000)

ReCalc: 0.6; 0.5
Full Calc: 51.3; 52.4
Volatility: 1.1%; 0.9%
MicroSecs/Formula: 51,343.7; 52,380.5

=SUMPRODUCT(--(A1:A15000>=E1),--(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1)*C1:C15000) Vs
{=SUM(IF($A$1:$A$15000>=E1,IF($B$1:$B$15000<=F1,$C$1:$C$15000,0),0))} [ Cond
Sum Wizard ]

ReCalc: 0.2; 0.2; 0.2; 0.2
Full Calc: 86.9; 85.7; 90.0; 89.3
Volatility: 0.2%; 0.2%; 0.2%; 0.3%
MicroSecs/Formula: 86,944.2; 85,683.2; 90,022.5; 89,347.6

All figures are averages of 5 runs.
To put in my $0.02 about the relative execution times, I suspect (i.e., a
SWAG) that array-entry is in part the culprit. That is, array entry imposes
additional formula interpretation overhead that straight entry doesn't. The
way to test this would be iterating data ranges of 10, 100, 1000, and 10000
values to see whether the relative execution time remains constant. If the
differential narrows as the array size increases, it'd tend to support the
additional overhead hypothesis.
 
D

David J. Braden

John,
Forgive my ignorance, but if I don't ask, I don't have a chance to
ameliorate it: What the heck is VPCx?

Also, why would you, as a seriously savvy user, use Linux as opposed to
Mac's OS? No baiting here, just curious and having a blast learning new
things.

Regards,
DaveB
(the *other* Mac/Excel MVP <vbg>)
 
H

Harlan Grove

David J. Braden said:
I too have used SUMPRODUCT in some of my posts and certainly a number of
client-specific tasks, but am sensitive to the users who don't (for what
in my opinion are reasonable objections) upgrade to the "latest,
greatest" version of Excel. I don't yet know when SUMPRODUCT was
introduced (v 8?), but now I am alerted to its great usefulness.
....

FWIW, Lotus 123 introduced @SUMPRODUCT in Releases 2.2 and 3.0 in Summer
1989, so I'd be willing to bet a dinner that SUMPRODUCT made it's way into
Excel in version 3 (for Windows 3.0 - the one that crashed if anyone on the
same floor sneezed). Cutting to the point: I'd be surprised if you have any
clients running any version of Excel that doesn't provide SUMPRODUCT.
 
J

J.E. McGimpsey

David J. Braden said:
Forgive my ignorance, but if I don't ask, I don't have a chance to
ameliorate it: What the heck is VPCx?

Virtual PC - formerly of Connectix, now part of MS.
Also, why would you, as a seriously savvy user, use Linux as opposed to
Mac's OS? No baiting here, just curious and having a blast learning new
things.

Because I'm getting paid to...<g>
 
A

Aladin Akyurek

Dave,
Can you plz help me understand what the FullCalc, Recalc and Volatility
metrics are about?

What follows is a quote from the Help file of FastExcel:

QUOTE
· Recalc: milliseconds to recalculate (F9) the workbook immediately after a
full calculation.
· FullCalc: milliseconds to fully calculate the workbook (Ctrl/Alt/F9).

% Volatile is Recalculate Time as a percentage of full calc time: a high
value means one or more of:

2. High proportions of the formulae are Volatile
3. Evaluating the dependency trees takes a long time.
4. The workbook is too complex for Excel to build a dependency tree (status
bar always shows Calculate and there are no circular references).
UNQUOTE

Alas, I do not have any specific information on when some functions are
appeared in Excel. As Harlan intimated, the development of Lotus may allow
you to infer about their order of appearance.

Efficiency regarding formulas operating on (computed) arrays is often
gained though by restricting them to the relevant subranges within the
ranges of interest. A recent experience of mine is:

http://makeashorterlink.com/?W1BC219F5

Aladin

David J. Braden said:
Hi Aladin,
Since I am running off of a set of Macs now, I cannot check out Charles
Wilson's intriquing program that you (and John McGimpsey) alerted me to.

Can you plz help me understand what the FullCalc, Recalc and Volatility
metrics are about?

Also, since you seem to be as much of a nut about computational
efficiency as I am, do you know of a list of Excel functions that has
version of introduction? I'm not concerned about when different Excel
functions were introduced prior to Version 7. I *am* concerned about
which version was the first to feature a function, such as SUMPRODUCT.
Of course some of the functions' implementations change through the
versions (like those related to the Normal distribution), but the
question I have is how to know the *first* version that any particular
worksheet function appeared.

TIA
Dave Braden

Aladin Akyurek said:
Harlan Grove said:
...
Bottom line, for me, is to use SUMPRODUCT(x,y) wherever possible, not
array-entered, unless someone can come up with a faster alternative.
...

And all I wanted to do was avoid array entry.

Also for me a main reason for adopting SumProduct, although gradually other
reasons started to turn up: a better execution time in particular with the
comma syntax and the expressive power.
Still, good to know SUMPRODUCT
with commas is faster than SUMPRODUCT with a single (long) arithmetic
expression. Note that if you have a number of boolean arrays as criteria,
SUMPRODUCT forces you to convert them to numeric because it shares Excel's
oddness of SUM(TRUE,TRUE) = SUM(FALSE,FALSE) = 0.

SumProduct could be indeed tweaked to execute implicit coercion wrt its
conditional arguments for more efficiency.
The next question in this
exercise would be the relativing timing of

SUMPRODUCT(--(A=B),C)

vs

SUMPRODUCT((A=B)*C)

If this is what Aladin has already checked, then fine. As I read it,
it
was

What I checked out, like Dave, is =SUMPRODUCT(rng1,rng2) Vs
=SUMPRODUCT(rng1*rng2), where none of rng1 and rng2 were a conditional that
required coercion.

The structure you depict above can better be done with SumIf, unless I'm
misinterpreting the =B bit.

=SUMPRODUCT(--(A1:A15000>=D1),B1:B15000) Vs
=SUMPRODUCT((A1:A15000>=D1)*B1:B15000)

ReCalc: 0.6; 0.5
Full Calc: 51.3; 52.4
Volatility: 1.1%; 0.9%
MicroSecs/Formula: 51,343.7; 52,380.5

=SUMPRODUCT(--(A1:A15000>=E1),--(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1),C1:C15000) Vs
=SUMPRODUCT((A1:A15000>=E1)*(B1:B15000<=F1)*C1:C15000) Vs
{=SUM(IF($A$1:$A$15000>=E1,IF($B$1:$B$15000<=F1,$C$1:$C$15000,0),0))} [ Cond
Sum Wizard ]

ReCalc: 0.2; 0.2; 0.2; 0.2
Full Calc: 86.9; 85.7; 90.0; 89.3
Volatility: 0.2%; 0.2%; 0.2%; 0.3%
MicroSecs/Formula: 86,944.2; 85,683.2; 90,022.5; 89,347.6

All figures are averages of 5 runs.
To put in my $0.02 about the relative execution times, I suspect (i.e., a
SWAG) that array-entry is in part the culprit. That is, array entry imposes
additional formula interpretation overhead that straight entry
doesn't.
The
way to test this would be iterating data ranges of 10, 100, 1000, and 10000
values to see whether the relative execution time remains constant. If the
differential narrows as the array size increases, it'd tend to support the
additional overhead hypothesis.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top