PC Review


Reply
Thread Tools Rate Thread

Multiple (as in many) regressions

 
 
lk
Guest
Posts: n/a
 
      22nd Dec 2011
Hi all

Does anyone have a quick routine for running multiple regressions over a
range of files in Excel?

I have a whole load of .xls files. Each contain data in columns D and E.
D is always the independent (x axis) and E is always the dependent (y axis)
but each file will have a different number of rows. I've never been too
fond of .xlDown in VBA and a clumsy routine to count until the first blank
row would take a while with over 5,000 files!

I need a routine to open up each file, feed the ranges in D and E into the
Regression routine in the Data Analysis toolpak, create the outputs but
(crucially) dump the output somewhere. With 5,000 files, this means 5,000
regressions and therefore 5,000 worksheets in the results workbook. I'm
not sure what the limit is nowadays - used to be 256 sheets per book when I
last got anywhere near it.

What I really need is a table showing (for each regression) what the
"important" results were i.e. the intercept, the coefficient and the
associated goodness of fit.

If anyone can help, that would be great.

Thx
lk

 
Reply With Quote
 
 
 
 
GS
Guest
Posts: n/a
 
      22nd Dec 2011
lk has brought this to us :
> Hi all
>
> Does anyone have a quick routine for running multiple regressions over a
> range of files in Excel?
>
> I have a whole load of .xls files. Each contain data in columns D and E. D
> is always the independent (x axis) and E is always the dependent (y axis) but
> each file will have a different number of rows. I've never been too fond of
> .xlDown in VBA and a clumsy routine to count until the first blank row would
> take a while with over 5,000 files!
>
> I need a routine to open up each file, feed the ranges in D and E into the
> Regression routine in the Data Analysis toolpak, create the outputs but
> (crucially) dump the output somewhere. With 5,000 files, this means 5,000
> regressions and therefore 5,000 worksheets in the results workbook. I'm not
> sure what the limit is nowadays - used to be 256 sheets per book when I last
> got anywhere near it.
>
> What I really need is a table showing (for each regression) what the
> "important" results were i.e. the intercept, the coefficient and the
> associated goodness of fit.
>
> If anyone can help, that would be great.
>
> Thx
> lk


Firstly, to find the last row of data you should start at the last cell
of a column and use .End(xlUp).Row to find the last entry in the column
regardless if the data is contiguous.

Secondly, I don't recommend you open over 5000 Excel workbooks to just
grab the contents of Cols D:E. You might want to consider using ADODB
to grab the data (from closed workbooks) into a recordset that you can
manipulate however you like. Examples of how to do this can be
downloaded here...

http://www.appspro.com/conference/Da...rogramming.zip

I'm not sure why anyone would store output data in an Excel file as a
storage container. Normally, outputs are written to DAT, TXT, or CSV
files so the data can be utilized by any program via its normal file
I/O functions. Excel files take up way more storage space than plain
text files and so is not an efficient format for any program to output
raw data to. Not saying exporting to Excel is a bad practice for
analysis purposes, just that it's not an efficient way to store data
for use by other software using that data. So.., if the files actually
are CSVs then the specific data you need to pass to Analysis Toolpak
functions can be manipulated fairly easily using VB's normal file I/O
functions to load the data into an array where it can also be
easily/quickly parsed as needed for further use. Since all of this
happens in memory (as apposed to physically opening each file) it's
blazingly faster than working with Excel workbooks and using ADODB to
grab data from closed workbooks.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


 
Reply With Quote
 
 
 
 
lk
Guest
Posts: n/a
 
      22nd Dec 2011


"GS" wrote in message news:jcvfij$7ui$(E-Mail Removed)...

lk has brought this to us :
> Hi all
>
> Does anyone have a quick routine for running multiple regressions over a
> range of files in Excel?
>
> I have a whole load of .xls files. Each contain data in columns D and E.
> D is always the independent (x axis) and E is always the dependent (y
> axis) but each file will have a different number of rows. I've never
> been too fond of .xlDown in VBA and a clumsy routine to count until the
> first blank row would take a while with over 5,000 files!
>
> I need a routine to open up each file, feed the ranges in D and E into the
> Regression routine in the Data Analysis toolpak, create the outputs but
> (crucially) dump the output somewhere. With 5,000 files, this means
> 5,000 regressions and therefore 5,000 worksheets in the results workbook.
> I'm not sure what the limit is nowadays - used to be 256 sheets per book
> when I last got anywhere near it.
>
> What I really need is a table showing (for each regression) what the
> "important" results were i.e. the intercept, the coefficient and the
> associated goodness of fit.
>
> If anyone can help, that would be great.
>
> Thx
> lk


Firstly, to find the last row of data you should start at the last cell
of a column and use .End(xlUp).Row to find the last entry in the column
regardless if the data is contiguous.

Secondly, I don't recommend you open over 5000 Excel workbooks to just
grab the contents of Cols D:E. You might want to consider using ADODB
to grab the data (from closed workbooks) into a recordset that you can
manipulate however you like. Examples of how to do this can be
downloaded here...

http://www.appspro.com/conference/Da...rogramming.zip

I'm not sure why anyone would store output data in an Excel file as a
storage container. Normally, outputs are written to DAT, TXT, or CSV
files so the data can be utilized by any program via its normal file
I/O functions. Excel files take up way more storage space than plain
text files and so is not an efficient format for any program to output
raw data to. Not saying exporting to Excel is a bad practice for
analysis purposes, just that it's not an efficient way to store data
for use by other software using that data. So.., if the files actually
are CSVs then the specific data you need to pass to Analysis Toolpak
functions can be manipulated fairly easily using VB's normal file I/O
functions to load the data into an array where it can also be
easily/quickly parsed as needed for further use. Since all of this
happens in memory (as apposed to physically opening each file) it's
blazingly faster than working with Excel workbooks and using ADODB to
grab data from closed workbooks.

--
Garry

Free usenet access at http://www.eternal-september.org
ClassicVB Users Regroup! comp.lang.basic.visual.misc


If it's any help - I have the whole dataset in a 15,000,000 row csv file.
I'm quite happy to manipulate if from there
but couldn't see how to get Excel to read chunks of it, analyse that chunk
and then move on to the next chunk.

 
Reply With Quote
 
Xt
Guest
Posts: n/a
 
      26th Dec 2011
On Dec 23, 2:58*am, "lk" <gofys...@wrong.address.com> wrote:
> Hi all
>
> Does anyone have a quick routine for running multiple regressions over a
> range of files in Excel?
>
> I have a whole load of .xls files. * Each contain data in columns D andE.
> D is always the independent (x axis) and E is always the dependent (y axis)
> but each file will have a different number of rows. * I've never been too
> fond of .xlDown in VBA and a clumsy routine to count until the first blank
> row would take a while with over 5,000 files!
>
> I need a routine to open up each file, feed the ranges in D and E into the
> Regression routine in the Data Analysis toolpak, create the outputs but
> (crucially) dump the output somewhere. * With 5,000 files, this means 5,000
> regressions and therefore 5,000 worksheets in the results workbook. * I'm
> not sure what the limit is nowadays - used to be 256 sheets per book whenI
> last got anywhere near it.
>
> What I really need is a table showing (for each regression) what the
> "important" results were i.e. the intercept, the coefficient and the
> associated goodness of fit.
>
> If anyone can help, that would be great.
>
> Thx
> lk


If it makes things easier, you don't need to use the Data Analysis
Toolpack. You can use the inbuilt Excel functions SLOPE and INTERCEPT
over whole columns.

In a spare spot you can put something like
Range("Z1") = "=SLOPE(E:E,D)"
Range("Z2") = "=INTERCEPT(E:E,D)" in your VBA. The when the
file is opened, it copies the formulas into Z1:Z2 and the slope and
intercept appear, ready to be copied off somewhere else. It's
probably quicker too.

xt


 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
I used Lotus Help for regressions in MS Excel 99. Now what? =?Utf-8?B?SGF3a2V5ZQ==?= Microsoft Excel Worksheet Functions 1 1st Feb 2006 10:25 PM
Confidence of Regressions elizabeth Microsoft Excel Discussion 2 9th Mar 2005 04:20 PM
how can i fill a table with values from repeated regressions =?Utf-8?B?YnVyYWdvdGNo?= Microsoft Excel Worksheet Functions 4 1st Jan 2005 02:25 PM
Multiple simple regressions procedure? Mikko H Microsoft Excel Misc 0 24th Mar 2004 08:52 AM
Creating regressions with more than 17 data points... =?Utf-8?B?S2V2aW4=?= Microsoft Excel Programming 4 23rd Dec 2003 04:20 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 05:59 PM.