If you want to try using countif > 0 which will get the non-blank dates it
requires only one line change. This method doesn't look at bad data, just
cares if there is data or is'nt data.
chage from:
corelval = WorksheetFunction.Correl(corelarray,
colarray)
change to:
corelval = WorksheetFunction.Countif(corelarray,
">0")
If ArrayCount(colarray) > ArrayCount(corelarray) * 0.1
Then
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
corelval = WorksheetFunction.Correl(corelarray,
colarray
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
If Abs(corelval) > coreltest Then
colave = WorksheetFunction.Average(colarray)
colsd = WorksheetFunction.StDev(colarray)
rsltcount = rsltcount + 1
rsltarray(rsltcount, 1) = Cells(1, c + 1)
rsltarray(rsltcount, 2) = Cells(2, c + 1)
rsltarray(rsltcount, 3) = colave
rsltarray(rsltcount, 4) = colsd
rsltarray(rsltcount, 5) = colsd / colave * 100
rsltarray(rsltcount, 6) = corelval
rsltarray(rsltcount, 7) = corelval * corelval
rsltarray(rsltcount, 8) =
WorksheetFunction.Count(colarray)
End If
End If
Next c
"Geoff" wrote:
> Hello Joel
>
> I do not want to put in a value in the empty cells. I don't want to
> force a correlation I want to ignore columns that have correlations
> based on only a few values. I have 20 sheets each with 200 columns and
> 8700 rows. I want to find out which columns have a real correlation
> with a particular columns of interest.
>
> Here is the macro that I am currently using. The tagname is allows me
> to choose from the data the column that I want to find out what
> correlates so that I can then have acloser look at the data to see
> what is the value of the correlation. Some columns will have 6000+
> data points and others will ahve 100. But if only 10 of the data
> points overlap with the 6000 then any correlation is not significant
> as there is insufficiant data for my purposes. To force the
> correlation to happen will not help me identify which items to look at
> more closely.
>
> I want to identify which columns contain more than say 100 common data
> points.
>
> Thank you for your help.
>
> Sub CorrelSearch()
> Dim startrow As Integer, endrow As Integer, startcol As Integer,
> endcol As Integer
> Dim r As Long, c As Long
> Dim StartTime As Double
> Dim myArray() As Variant, rsltarray() As Variant
> Dim myRange As Range
> Dim clcell As Long
> Dim aSheet As Object
> Dim dupval As Variant, rsltcount As Variant
> Dim colarray() As Variant, corelarray() As Variant
> Dim colave As Double, colsd As Double
> Dim testval As Double
> Dim tagname As Variant
> Dim srchrange As Variant
> Dim corelval As Double
> Dim coreltest As Double
> Dim rsltrange As Range
>
>
> StartTime = Timer
>
> ' find from lookup table tag name
> tagname = InputBox(prompt:="Enter the tagname",
> Title:="Correlation selection")
> coreltest = InputBox(prompt:="Enter the correlation value")
> ' copy selected tag data to an array
> For Each aSheet In ActiveWorkbook.Sheets
> Worksheets(aSheet.Name).Select
> Set srchrange = Range(Cells(1, 1), Cells(1, 256))
> ' srchrange = Range(Selection)
> If WorksheetFunction.CountIf(srchrange, tagname) = 1 Then
> Cells.Find(What:=tagname, LookIn:=xlFormulas,
> LookAt:=xlPart).Activate
> corelarray = Range(Cells(7, ActiveCell.Column),
> Cells(ActiveSheet.Cells.SpecialCells(xlLastCell).Row,
> ActiveCell.Column))
> End If
>
> Next aSheet
>
> ' cycle through all pages in workbook and check correlation
> rsltcount = 0
> ReDim rsltarray(2000, 8)
> For Each aSheet In ActiveWorkbook.Sheets
> Application.ScreenUpdating = True
> Worksheets(aSheet.Name).Select
> Application.ScreenUpdating = False
> Set myRange = Range("B7", ActiveCell.SpecialCells(xlLastCell))
> startcol = myRange.Column
> endcol = myRange.Columns(myRange.Columns.Count).Column
> startrow = myRange.Row
> endrow = myRange.Rows(myRange.Rows.Count).Row
> ReDim myArray(startrow To endrow, startcol To endcol)
> myArray() = myRange
> For c = 1 To endcol - startcol + 1
> Application.StatusBar = ActiveSheet.Name & ", column " & c
> colarray = Application.Index(myArray, 0, c)
> On Error Resume Next
> If ArrayCount(colarray) > ArrayCount(corelarray) * 0.1
> Then
> corelval = WorksheetFunction.Correl(corelarray,
> colarray)
> If Abs(corelval) > coreltest Then
> colave = WorksheetFunction.Average(colarray)
> colsd = WorksheetFunction.StDev(colarray)
> rsltcount = rsltcount + 1
> rsltarray(rsltcount, 1) = Cells(1, c + 1)
> rsltarray(rsltcount, 2) = Cells(2, c + 1)
> rsltarray(rsltcount, 3) = colave
> rsltarray(rsltcount, 4) = colsd
> rsltarray(rsltcount, 5) = colsd / colave * 100
> rsltarray(rsltcount, 6) = corelval
> rsltarray(rsltcount, 7) = corelval * corelval
> rsltarray(rsltcount, 8) =
> WorksheetFunction.Count(colarray)
> End If
> End If
> Next c
>
>
> Next aSheet
>
> Sheets.Add after:=ActiveSheet
> Set rsltrange = Range(Cells(1, 1), Cells(rsltcount, 8))
> rsltrange.Value = rsltarray
> ActiveSheet.Name = tagname
> ' Display elapsed time
> Application.ScreenUpdating = True
> MsgBox Format(Timer - StartTime, "00.00") & " seconds"
> MsgBox rsltcount
>
> End Sub
>
>
> On Aug 23, 1:00 pm, Joel <J...@discussions.microsoft.com> wrote:
> > The Correlation function will give you a 1 when the two columns are exacttly
> > the same. When they don't match, the results closestt to one (absolute) is
> > the best results.
> >
> > Correlattion function in Excel ignores empty cells. If you first put a
> > value like 0 into the empty cells then you would get better results.
> >
> > I think the best result would be to use correlation but when a cell is empty
> > place the same value in both columns. This way correlation will count these
> > cells
> >
> > You should go to the excel worksheet and from Inset Menu - Function -
> > Correl. the select help (lowerr left corner of window). This will show you
> > the calculation that is made for Correl.
> >
> > The formula basical is calculatting the difference between the cells in the
> > two columns, and then dividing the product of the squares. It ignores blank
> > cells in the calculation. Putting the same value in both columns will not
> > chbange the results of the Numerator of the function, but does change the
> > value of the denominator.
> >
> > I don't think it will take a macro very long to automatically put the same
> > value in both columns. iff you tell me how your columns are aranged I will
> > write the macro. It take me only 5 minutes.
> >
> > Are your columns A & B, the D & E leavving one empty column between pairs,
> > or some other arrangement.
> >
> > "Geoff" wrote:
> > > I am using the intenal correlation function within excel. I am not
> > > looking for an incredible accuracy. What I need to know is how many
> > > common elements are in the two arrays that I am producing the
> > > correlation for. I think it is a Pearson function but that is entirely
> > > speculative :-)
> >
> > > Tha data is froma real world data collection system for a paper
> > > machine. There are large gaps in the data after removing text and
> > > erroneous data. For the data that is left I need some idea of what
> > > correlates with key parameters and am now realising that I need to
> > > know if I can believe the correlation.
> >
> > > I thought a simple way of doing this would be to check that there is
> > > at least 100 or more values that are common to the two arrays. 100 may
> > > not be the absolute number but it can be a starting point.
> >
> > > I am hoping there is a simple way to do this. I could just go through
> > > the two arrays and check if there is a value for each row in the array
> > > and once it reaches 100 check the correlation. This would be very slow
> > > for a macro that has such a large body of data to check.
> >
> > > Regards, Geoff
> >
> > > On Aug 21, 4:40 pm, Joel <J...@discussions.microsoft.com> wrote:
> > > > What math function(s) are you using to get your correlation. How many bit
> > > > accuracy are you look at.
>
>
>