spreadsheet addiction

patrick · Jan 3, 2005

There is a new page on the Burns Statistics website

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

that looks at spreadsheets from a quality assurance perspective.
It looks at several specific problems with Excel, including the
behavior of unary minus and the quality of statistical functionality.

Bob Phillips · Jan 3, 2005

Patrick,

I received this news with interest.

However, without making any overall comment on the page, as I haven't read
and digested it all yet, a couple of immediate observations.

On the section on Unary Minus. I just don't get the results that you get. If
I put 3 in A1, and =-A1^2 in B1 I get 9. If I put -3^2 in C1 I get 9 (Excel
2000, XP Pro). Both of these are consistent with what MS seems to predict.

On the graphing section, I think you make a very weak point. I agree with
you that Excel graphs are extremely amateurish (rubbish springs to mind),
and I very rarely use them. But your example on the pie chart, the negative
value excepted, is a fault with the chart choice not the graphing
capability. This would be true of whatever package implemented a 3D pie
chart. It may be a poor idea to have such a chart, but having that chart
type does not make the graphics capability poor (other things do!). Thanks
for the pointer to R though, I will check that out.

I would add that these statements are made by a non-statistician, but a
frequent Excel user. Excel has many weaknesses, but the problems that are
usually created could be created in any spreadsheet tool, because they are
made in the design and the programming (development).

--

HTH

RP
(remove nothere from the email address if mailing direct)

Harlan Grove · Jan 3, 2005

Bob Phillips said:
On the section on Unary Minus. I just don't get the results that you get.
If I put 3 in A1, and =-A1^2 in B1 I get 9. If I put -3^2 in C1 I get 9
(Excel 2000, XP Pro). Both of these are consistent with what MS seems to
predict.

....

This point has been argued to death before. Excel's operator precedence for
unary minus vs exponentiation is inconsistent with standard mathematical
operation precedence *if* one takes the token '-' always to be the negation
operator. Most though not all programming languages that provide both unary
minus and exponentiation operators give exponentiation the higher
precedence. So, e.g., in VBA -3^2 returns -9.

FWLIW, PostgreSQL and COBOL share Excel's operator precedence, but BASIC,
FORTRAN, C-derived languages with exponentiation operators, and all other
spreadsheets not trying to provide strict Excel compatability give
exponentiation higher precedence than unary minus. In other words, the way
VBA works is much more common than the way Excel works.

Given the fact that much of Excel's calculation functionality was initially
implemented with what can accurately be described as numerical obtuseness,
it's very likely Excel's original implementors were ignorant of standard
mathematical conventions and believed -3^2 should be interpretted as (-3)^2,
so -A5^2 should also be construed as (-A5)^2.

With regard to programming languages, and Excel formulas are a type of
programming language, operator precedence is up to the implementor. Once a
particular operator precedence has been established it cannot be changed
without screwing up existing code. Therefore, in terms of consistency with
other calculation software (and VBA!), Excel's implementors screwed up
Excel's operator precedence, but we all have to live with it as it is.

Bob Phillips · Jan 3, 2005

Harlan Grove said:
...

This point has been argued to death before. Excel's operator precedence for
unary minus vs exponentiation is inconsistent with standard mathematical
operation precedence *if* one takes the token '-' always to be the negation
operator. Most though not all programming languages that provide both unary
minus and exponentiation operators give exponentiation the higher
precedence. So, e.g., in VBA -3^2 returns -9.

Yes, but the point Patrick's paper made, and which I didn't replicate, was
the inconsistency. Right or wrong, it is consistent as far as I can test.
Your point is a valid brick to throw at Excel, but Patrick was using
another, which doesn't seem so valid to me. And it was talking spreadsheets,
thereby discounting VBA.

Harlan Grove · Jan 3, 2005

There is a new page on the Burns Statistics website

http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html

that looks at spreadsheets from a quality assurance perspective.

....

Two general points. First, spreadsheets are often the only programmable
system with GUI interface most business users are given (most are given
Windows Scrioting with VBScript and VJScript bundled with Windows, but few
are aware of having it, and it provides no GUI IDE). Second, I agree that
spreadsheet models that take more than a few minutes at most to recalc are
too complex to remain as spreadsheet models, *but* it's usually not the
developer or maintainer of such systems that decide to leave them as
spreadsheet models.

The first bears immediately on your technology acceptance model. If XL is
believed to be the only tool some person has to develop any computationally
intensive models, then that person will use XL because by it's exclusive
availability it automatically satisfies your technology acceptance model.
It's the lack of alternatives for most people who've been given develoment
tasks that explains the overuse of spreadsheets.

Details

Spreadsheet Computation

- DATA CHECKING
This is debatable. There are other languages that automatically treat
strings as numeric zeros in numeric calculations (most scripting languages).
This is a design trade-off between convenience and type checking rigor.
Given the explosive growth of scripting languages (as well as spreadsheets)
compared to strongly typed languages over the last two decades, it seems
very clear that developers have placed a higher value on convenience than
rigorous type checking. Why should spreadsheets differ?

Granted there are risks involved with this, but shouldn't adult developers
be allowed to decide whether the advantages more than offset the risks?

Also, Excel 2003's SmartTags provide some data checking functionality.

- DATA STRUCTURE
Your claim that "Complex data [...] demand a convention for placement of the
components." is to vague and too general. There may be structures that allow
for more efficient access during calculations, but it just doesn't matter
whether one variable in a given calculation is placed to the left or right,
above of below another in a worksheet. Related data should be stored close
together, and data layout used should allow formulas to be as simple and
efficient as possible, but that's all.

- COMMAND HISTORY
If you mean recording every action, then such histories could grow much
larger than the workbook itself. That's a nonstarter. If you mean there
should be facilities to record formulas, formatting, VBA code and forms in
plain text that could be used in revision control systems as is standard
software engineering practice with more traditional programming languages,
then you've got a point. However, that's not going to happen until companies
decide to approach spreadsheet development the way they approach other
software development - as an endevor requiring disciplined procedures.

The Treatment Center (Alternatives)
- COMPUTATION
I like R, and I use it regularly, but you're crazy if you believe it could
be used efficiently by most current Excel users. First off, it uses a
command line interface. That actually gives it greater power than Excel, but
at the cost of considerable complexity. I've worked with people who believe
the same thing about APL. They're right that spreadsheets are inefficient,
but wrong in believing that spreadsheets are crippled. Also, undisciplined
use of R will produce as many problems as undisciplined use of spreadsheets.
Unless companies are willing to take *ALL* application development out of
the hand of people with no formal training (school coursework or in-house)
in software development, companies are stuck with spreadsheets as the least
of many evils.

Specific Problems with Excel
Excel's current dominance owes as much to what Lotus Development Corp didn't
do as to what Microsoft did do. Lotus was so immersed in the look & feel
lawsuits of the late 1980s that they misapprehended the importance of
Windows. What killed 123 wasn't that it was inferior to Excel (IMO, 123
Release 5 with only the classic macro facility was still better than Excel
5/95 with VBA, but a close contest), it was that so-called productivity
applications became bundled in suites, and Word was MUCH better than AmiPro.
[In retrospect, Lotus's single biggest mistake was in failing to pursue
acquisition of WordPerfect and settling for AmiPro instead. Of course, no
major upgrades to 123 between 1986 and 1989 didn't help either.]

- WRITING ASCII FILES
Agreed, but your criticism is too narrow. Excel also mangles some types of
user inputs, being far too ready to convert entries into dates. There should
be a way to turn off Excel's 'helpful' features.

- PROPAGATION OF BLANKS
Excel has always skipped blanks (and cells containing text and booleans)
when they were accessed within ranges. So if A1 contained 1 and A3 contained
5 while A2 was blank and A4 contained "xyz", =SUM(A1:A4) has always returned
6, and =AVERAGE(A1:A4) has always returned 3. Excel also has never provided
a mechanism to propagate blanks. Using A1:A4 as above, Excel must return
something for the formula =A2, and that something must be either a number, a
text string, a boolean value or an error value. Excel provides no value that
equals blank (though returning Empty from udfs does return a value Excel
will treat as blank, Excel itself has no means of generating such a value as
a formula result). Excel's convention, reasonable for a program intended
primarily for calculation, is to return 0 for =A2.

This limitation isn't unique to Excel. 123 and QuattroPro do the same thing.

- RANGE CHANGES
No big deal. Anyone with real spreadsheet development experience knows to
include blank cells bordering data ranges in order to prevent this. For
example, if I want to use data in C3:H22 that could later vary in the number
of rows or columns, I'll refer to B2:I23, keeping columns B and I and rows 2
and 23 blank or containing FALSE with number format ";;;""-""" and Fill
horizontal alignment. Then when I insert or delete rows or columns inside or
bordering C3:H22, the containment range B2:I23 will adapt as needed.
Needless to say, I disable range extension. Avoiding this comes with
experience, just as other common bugs in other languages are a sign of
inexperience.

- TYPING MISTAKES
If you want to trap 'em, use Data > Validation. Excel defaults to free form
input, so the contents of any cell need not be the same type as the values
in any adjacent cell. If you want to impose such restrictions, Excel
provides you the tools necessary to change its default behavior. Only
unintended date interpretation is a problem, and accidentally entering
slashes can be nearly eliminated by entering numbers using the numeric
keypad. USE THE RIGHT TOOLS FOR THE TASK, which in this case means USE THE
RIGHT SET OF NUMBER KEYS.

- UNARY MINUS
As already pointed out, with 3 in A1, the formula =-A1^2 returns 9 as does
the formula =-3^2 (and as does =-(3)^2). Your claim that the two formulas
return different results is wrong. This is due to nonstandard operator
precedence, but once established (and extant for over two decades now) it
can't be changed without causing errors in existing workbooks.

You need to rewrite this entire section to correct all the errors. [Very
poorly done job here. Did you really check any of the formulas in Excel?]

- BAD GRAPHICS
Not unique to Excel. No sensible person uses Excel charts for anything other
than presentations, which are a refined form of lying.

- POOR STATISTICS
As you mention yourself, you shouldn't use Excel for real statistics work.
While the continuous distributions and regression functions have been
improved considerably in Excel 2003, the discreet distributions are still
pretty poor.

- UNHELPFUL HELP
DEFINITELY! Microsoft has invested next to nothing on correcting long
standing errors in Excel's online help, why should they invest anything on
added help topics or expanded examples?

- NO DATABASE OF BUGS
What do you expect from a company that requires users to pay initially for
the privilege of reporting bugs in their software?

- LIMITED DIMENSIONS
Of the major commercial spreadsheets, only QuattroPro provides >256 columns
and >65536 rows, and it's far & away the buggiest of them. If you want more
dimensions, use Xess (a Unix spreadsheet that has a Windows version) or wait
until Gnumeric 1.4's Windows port is more stable, then tweak its sources and
rebuild to get more columns and/or rows.

patrick · Jan 4, 2005

Bob said:
Yes, but the point Patrick's paper made, and which I didn't replicate, was
the inconsistency. Right or wrong, it is consistent as far as I can test.
Your point is a valid brick to throw at Excel, but Patrick was using
another, which doesn't seem so valid to me. And it was talking spreadsheets,
thereby discounting VBA.

Correct, the main issue is consistency within the spreadsheet.

I have rewritten the section because (today) I am getting 9 for
=-A1^2 and =-3^2 and =-(3)^2 (where A1 has value 3) when using
Excel 2000. In Microsoft Works version 7.0 I get -9, 9, -9,
respectively.

I am quite sure that I was getting the same behavior in Excel 2000
as I am in Microsoft Works. This means that either I'm senile
(which is entirely possible), or Excel is senile (which I also
hold to be possible).

Harlan Grove · Jan 4, 2005

I have rewritten the section because (today) I am getting 9 for
=-A1^2 and =-3^2 and =-(3)^2 (where A1 has value 3) when using
Excel 2000. In Microsoft Works version 7.0 I get -9, 9, -9,
respectively.

....

Excel and Works Spreadsheet are not the same thing, even though Word and
Works Word Processor are. Works Spreadsheet's operator precedence is the
same as that in most other spreadsheets, so -(3)^2 = -9, but -3^2 = 9
departs from most other spreadsheets. That does appear to be a parsing
idiosyncrasy, but it's well defined if the numeric constant beginning with
minus sign appears at the beginning of the expression or immediately follows
an operator.

Jerry W. Lewis · Jan 4, 2005

I have rewritten the section because (today) I am getting 9 for
=-A1^2 and =-3^2 and =-(3)^2 (where A1 has value 3) when using
Excel 2000. In Microsoft Works version 7.0 I get -9, 9, -9,
respectively.

I am quite sure that I was getting the same behavior in Excel 2000
as I am in Microsoft Works. This means that either I'm senile
(which is entirely possible), or Excel is senile (which I also
hold to be possible).

Sorry, but the senility appears to be yours. 2000, XP, & 2003 (and I
suspect all other versions) all evaluate identically in this regard. If
you wish, I could try it on Excel 4.

Jerry

Bob Phillips · Jan 4, 2005

Correct, the main issue is consistency within the spreadsheet.

I have rewritten the section because (today) I am getting 9 for
=-A1^2 and =-3^2 and =-(3)^2 (where A1 has value 3) when using
Excel 2000. In Microsoft Works version 7.0 I get -9, 9, -9,
respectively.

I wan't arguing for your case!

I maintain that Excel is consistent (maybe wrong, but consistently so). As
Harlan states, comparing Excel with Works is not a relevant argument to use
against Excel's consistency. This whole section, even re-written, seems
misleading and irrelevant to me.

Jerry W. Lewis · Jan 4, 2005

Harlan Grove wrote:
....

- COMMAND HISTORY
If you mean recording every action, then such histories could grow much
larger than the workbook itself. That's a nonstarter. If you mean there
should be facilities to record formulas, formatting, VBA code and forms in
plain text that could be used in revision control systems as is standard
software engineering practice with more traditional programming languages,
then you've got a point. However, that's not going to happen until companies
decide to approach spreadsheet development the way they approach other
software development - as an endevor requiring disciplined procedures.

There is Tools|Options|View|Formulas, and a more comprehensive form of
documentation could be generated by a VBA application. I vaguely recall
seeing workbook documentors being offered by 3rd parties.

....

- POOR STATISTICS
As you mention yourself, you shouldn't use Excel for real statistics work.
While the continuous distributions and regression functions have been
improved considerably in Excel 2003, the discreet distributions are still
pretty poor.

Your distribution function comments are potentially misleading. The
discrete distributions have a limited working range (which can be
extremely frustrating), but tend to be very accurate when they return
values. The continuous distribution cdf's (I don't think that any other
than the Normal distribution have been improved in 2003) also have
limited working ranges, and (except for 2003 Normal, which is much
improved) are much less accurate (when they return values) than the
discrete distributions. The major 2003 improvement in this area was to
the inverse cdf functions; the algorithm was improved to make it a much
better inverse of the corresponding ...DIST function, but the accuracy
of ...INV functions is still limited by the (lack of) accuracy in
....DIST functions.

Accuracy is better than printed tables for typical simple hypothesis
testing applications, but as functions with the potential to be used in
many other ways, the continuous distributions still need a lot of work.

Patrick, your information is dated with regard to 2003 RAND() returning
negative numbers. This was fixed nearly a year ago, along with problems
with the treatment of blank cells in otherwise improved bivariate
statistics calculations.
http://office.microsoft.com/en-us/assistance/ha011525601033.aspx

Jerry

spreadsheet addiction

patrick

Bob Phillips

Harlan Grove

Bob Phillips

Harlan Grove

patrick

Harlan Grove

Jerry W. Lewis

Bob Phillips

Jerry W. Lewis