Harlan Grove
RE:
If you mean reports, then I don't disagree that databases could be more
efficient in some instances. However, there's a question of how many
calculations (not the simple database sort of accumulations - SUM,
COUNT, AVERAGE, etc. - but rather more complex ones like the a-th
percentile of a statistical distribution with parameters based on
sample data) are needed to produce the results of interest.
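For instance, assuming Sample is a defined name covering the sample range,
the 95th percentile of a normal distribution with parameters estimated from
that sample is just
=NORMINV(0.95,AVERAGE(Sample),STDEV(Sample))
in Excel - one cell's worth of work.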
This type of person should be locked up in jail for being egocentric.
SQL Server has the ability to make user defined functions.
No, really?!
Most useful software possesses the attributes of modularization,
programmable extension and the ad hoc ability to run other software.
That said, while one could write a UDF for SQL RDBMSs to calculate, say,
estimators and standard errors for linear models, it'd be unpleasant to do so.
That sort of thing is built into modern spreadsheets but not databases.
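For instance, assuming Ys and Xs are defined names for the response and
predictor ranges, the single Excel array formula
=LINEST(Ys,Xs,TRUE,TRUE)
returns the regression coefficients together with their standard errors and
the usual summary statistics. A SQL UDF doing the same job would have to
hand-roll the matrix algebra.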
Now one could use a database as the storage backend for a stats package, but
it'd be the stats package that does the real work, and only after it loads
the data into its own data structures. I agree that databases are very good
storage and retrieval subsystems, but I remain unconvinced that the forms
and reporting tools provided by most databases are vastly and uniformly
superior to the alternative provided by spreadsheets (when properly
designed). But calculations using UDFs lead to UI/front-end vs
processing/back-end interface issues.
Consider something as simple as the currency conversion rate table discussed
before. If there were no conversion fees reflected in the conversion rates,
then one would expect, e.g., the conversion rate from US dollars to UK
pounds to be the reciprocal of UK pounds to US dollars. So if the foreign
exchange markets were perfectly efficient, the matrix of conversion rates
should have each entry below its main diagonal equal to the reciprocal of
the corresponding entry above the diagonal. In Excel
terms,
=AND(ABS(TRANSPOSE(ConvRates)-1/ConvRates)<1E-12)
where ConvRates is the N-by-N matrix of conversion rates for N currencies,
with unit entries along the main diagonal.
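(Entered as an array formula, i.e. with Ctrl+Shift+Enter, it returns TRUE only
when every below-diagonal rate is, within 1E-12, the reciprocal of its mirror
image above the diagonal.)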
This can be done with an inner join of the table with itself, swapping the
from and to country fields in the second reference to the table. Something like
SELECT (Count(*) = 0) AS AllReciprocal
FROM CurrTbl AS CT1 INNER JOIN CurrTbl AS CT2
ON (CT1.ToCountry = CT2.FromCountry) AND (CT1.FromCountry = CT2.ToCountry)
WHERE (Abs(CT1.ConvRate - 1/CT2.ConvRate) >= 1E-12);
Does the latter really look clearer to you?
Anything that you can do in Excel; I can do in either MDX or SQL. And I'll
do it once; and make a couple of DTS packages; and I am done-- I don't need
to come back and type stuff into a spreadsheet in order to make a new
report.
Further to the example above, I could parametrize the table name in Excel,
so the formula would become
=AND(ABS(TRANSPOSE(INDIRECT(CTN))-1/INDIRECT(CTN))<1E-12)
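(CTN here being a defined name referring to a cell that holds the text name of
whichever rate range the check should run against - ConvRates, ConvRates2005,
whatever - so repointing the whole formula is just a matter of retyping that
one cell.)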
How do you do that in SQL without resorting to some metalanguage or creating
temporary tables using common, reserved names?
I tell you this-- if spreadsheets were all-powerful, you know those
supercomputers that they have?? They would be running Excel.
Who wastes money running databases on supercomputers or networked clusters?
No one in their right mind. They run hand-crafted FORTRAN or C code if they
want to get anywhere past single-digit gigaflops. Do you believe such
programs rely on realtime database feeds? No way. They use cached,
multiplexed data pipelines. And they don't waste runtime writing results to
databases but to many synchronized output streams. Possibly databases
populate the inputs and store the outputs eventually, but this is just
another example of databases being the backend storage subsystem. A useful
supporting role to be sure, but hardly center stage.
If Excel were really the best solution-- people would have clusters of
spreadsheets. Excel 2005 Cluster Edition...
No more than they'd have clusters of DBMSs. Neither is the right tool for
the task. And no one uses supercomputers or clusters for generating reports.
Time for your next straw man.
explicit and reproducible.
They're reproducible because you CUT AND PASTE THE FORMULAS INTO 1,000
DIFFERENT CELLS.
Yup. No one said audit trails are storage efficient or free from unintended
screw-ups. Rather, it's easier to locate such screw-ups.
I CAN DO THE SAME THING ON THE DATABASE SIDE--
No, you can't. The source tables and the definitions of the views or queries
could have been modified. If a user calls the same named stored procedure in March
and April, that user has no guarantee other than the word of their database
admin that nothing has changed other than the addition of data from the
month of April.
Historically this has been addressed in mainframe reporting by including
checksums, tape volume IDs, record counts and other stats derived from
inputs along with full JCL and key procedure listings in printouts. To
repeat, no one said audit trails were storage efficient.
....BUT I CHOOSE TO KEEP MY BUSINESS LOGIC IN ONE PLACE-- SO THAT IF I NEED TO
CHANGE A REPORT: I CHANGE IT IN ONE PLACE.
It's the need to prove there have been no changes in that business logic
between report runs that's the nasty problem. It's actually pretty easy to
show the formulas in two different workbooks are identical or substantially
similar. As long as printouts (which could be text files) of business logic
are included in the master copies of reports, there's a true audit trail.
As for change in one place, that is a definite advantage of databases.
However, using standard templates as the basis for spreadsheet reports also
provides centralized change management.
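(For example, if every month's report is spun off a shared MonthlyReport.xlt
template on the network - an illustrative name - a change to the template's
formulas propagates to every report generated after the change; the template
is the one place to edit.)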
In spreadsheets the key is to separate storage of user inputs from
calculations. [IMO, this is much easier in Lotus 123 and Quattro Pro than
Excel.] Then the only thing that would need to be stored would be the user
inputs. All the 'business logic' (dare I call it formulas and macros?) would
remain in a separate, centrally stored and maintained workbook.
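For instance (made-up filenames, just to illustrate the split), the centrally
maintained logic workbook could pull each month's inputs with external
references along the lines of
=SUM('\\server\reports\[Inputs_2005_04.xls]Inputs'!$B$2:$B$50)
so the only thing users ever save is the inputs file; every formula stays in
the one shared workbook.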
IT IS PHYSICALLY AND PRACTICALLY IMPOSSIBLE TO GET DATA OUT OF
SPREADSHEETS.
If you don't know how to do something, then it'll seem impossible.
Getting information out of spreadsheets does require knowing the workbook's
filename and the worksheet ranges in it where that information is located.
If no proper documentation of filenames or worksheet/range addresses exists,
it can take considerable effort to locate the information. But if we're
talking about standard reports, it's easy as long as a consistent layout has
been maintained (or better still, common defined names have been used to
identify the information of interest).
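For instance, assuming a report workbook named, say, Sales_2005_03.xls with its
summary figures on a consistently laid out Summary sheet, any other workbook
(or anything that can drive Excel) can pull a figure straight out of it with an
external reference like
='C:\Reports\[Sales_2005_03.xls]Summary'!$B$10
and defined names make it less fragile still, since the reference then survives
rows being inserted or the layout shifting.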
I SHOULD KNOW. I HAVE WRITTEN DOZENS OF APPLICATIONS THAT CATALOG
SPREADSHEETS-- BECAUSE WINDOWS INDEXING SERVICE SUX-- I BUILD A DATABASE
WHERE I DO A FULLTEXT INDEX ON HUNDREDS, IF NOT THOUSANDS OF SPREADSHEETS.
Windows indexing is an OS feature, no? Microsoft is still to blame, but not
the Excel or Office developers.
You'd be better off using Perl to create such indices. Of course that
assumes you have the capability of learning anything other than SQL. A hash
of hashes - an associative array each entry of which is a reference to another
associative array - is a very powerful data structure for generalized text
search and retrieval.
AS IT IS; IT IS A TOTAL WASTE OF TIME THAT 80% OF YOU RECREATE THE SAME
REPORT BY HAND EVERY MONTH.
Maybe you and/or people in the company for which you work do so, but that
doesn't mean everyone does.
MDX IS GOING TO EAT YOU ALIVE, KIDS
I'm shaking in my shoes, soiling my pants, won't sleep ever again.