Regression through select points on a scatter plot.

T

Tom

Is it possible to select just particular points in a x-y scatter plot
(say 2-3 points on a graph of 500 data points) and run a linear
regression through them and show the trendline and output the x-
intercept? Another concept that would work for me is to hilight/box-
in a group or some groups of data points on the scatter plot and add a
trendline that could ignore these selected points without me having to
back out of the plot into the "Source Data" and break up all my data
into different "Series." Does anyone know if this can be done, and if
so, how I could go about doing it?

Thanks,
Tom
 
J

JLGWhiz

Have you tried selecting the plot points, right clicking the chart and
selecting add trend line from the drop down menu?
 
T

Tom

Yeah but the regression that I am getting is not matching my data
correctly becuase it has a couple of data points that are much larger
in the x- and y-coordinates than the rest of the data and it is very
unreliable. I'm trying to run a macro in which I make a plot of
pseudo-time and pseudo-pressure and then make a break in the macro or
include an "Event" in my macro so that I can select just a few data
points on the plot to create more of a "visual" trendline to determine
the y-intercept, with which i can quickly insert back into my macro to
iterate until it converges. Creating another data series to select
just the three or four points out of over a thousand x- and y-
coordinates to run a trendline through would just be inefficient
because I have 460 of these to do. I have heard of a function or some
way to box in the data on an Excel plot that you want to use to run a
regression through, but I can't seem to dig anything up on it.
 
P

PBezucha

Tom,

I am able to bid a luxury modification of a standard LinReg. It returns
immediately the coefficients without the need of indexing or array
arrangement. You can then use scalar product for evaluation of each Ycalc.
Which is most useful, however, it performs the validation of individual
points so that by means of change of index values you can drop them out of
the evaluation, class them into groups etc. In your case (several points out
of hundreds) you would do the best, if you would use an arbitrary index
IdGroup (simpliest 1) and mark with the same index (1) the relevant data in
the WArr column. The change is immediately reflected on the chart, when you
calculate aside the point values for a regression curve.

Petr

Function DLinReg(YArr As Range, XArr As Range, WArr As Range, _
Outp As Variant, Optional IdGroup) As Double

'Returns a linear regression parameter, according to the value
'of Output keyword:
'A) for a simple lin.regression
' Outp = 0 or 'intercept' ... intercept
' Outp = 1 or 'slope' ... slope
' Outp = 100 or 's0' ... stand. dev. of intercept
' Outp = 101 or 's1' ... stand. dev. of slope
' Outp = 6 or 'sy' ... F-statistics
' Outp = 7 or 'F' ... stand. dev. of Y
' Outp = 8 or 'correl' or 'R2' ... corr. coef.^2
' Outp = 9 or 'R' ... corr. coef.
'B) for an expansion of n-th order of ordinary variable X:
' Outp = nm (1<=n<=7,m<=n) ... coeff. with the m-th term of X^Outp
' Outp = 1nm ... stand. dev. of the nm-term
'C) for the regression of n independent variables:
' Outp = 9m (m<=n) ... coefficient at the Xm-variable
' Outp = 1nm ... stand. error of m-coefficient estim.
'D) for preceeding correlations B) and C) but without a common term
' (equiv. to b=False in the ws function LinRegression):
' numeric keyword Outp with the value after B) or C)
' but with minus sign (e.i. -21, -22)

'YArr a XArr are the input column ranges with the dependent variable
' and independent variables.
'WArr is an column index range of the same dimension as the range YArr.
'According to the actual index value in the row of ranges
'only those values are evaluated
'- in the case of index IdGroup missing or equal zero,
' which have the index cell empty or equal zero,
'- in the case of index IdGroup declared,
' which have the value of the index cell identical with IdGroup.

Dim y(), X(), IsGroup As Boolean, Valid As Boolean, _
NonZeroIntercept As Boolean, N As Integer, I As Integer, J As Integer, _
K As Integer, NX As Integer, NY As Integer, NC As Integer, FN As Integer, _
Out As Long, AOut As Integer, NValid As Integer, Id1 As Integer, _
Id2 As Integer, MB As Integer, ErrMsg As String, Title As String, _
CallCell As String, W As Variant, NW As Integer, Dev As Long
Static IErr As Long
Const MaxErr As Long = 2
NonZeroIntercept = True
NX = XArr.Count
N = YArr.Count 'N = NY row count of Y-range (number of measurements)
Title = "Linear regression y(x): " & ActiveSheet.Name & "!" & _
XArr(1).Address & ":" & XArr(N).Address
IsGroup = Not IsMissing(IdGroup)
If Not IsMissing(WArr) Then NW = WArr.Count Else NW = 0
If NX Mod N <> 0 Or NW > 0 And NW <> N Then _
ErrMsg$ = "arrays": GoTo ErrExit
NC = NX / N 'NC count of columns in XArr
If IsNumeric(Outp) Then 'Outp is numeric?
Out = CLng(Outp)
AOut = Abs(Out) 'AOut absolute value of Outp
Dev = AOut \ 100 'Dev=1 starts the evaluation of the deviations
AOut = AOut - 100 * Dev
Out = AOut Mod 10

FN = AOut \ 10
If FN = 9 Then
If Out > FN Then ErrMsg = "keyword Outp " & CStr(Out): GoTo ErrExit
ElseIf FN > 1 Then
If FN < 6 And NC <> 1 Then _
ErrMsg = "keyword Outp for a single variable": GoTo ErrExit
If FN < 6 Then NC = FN
End If
If Outp < 0 Then NonZeroIntercept = False
Else 'Outp must be translated into the numeric code
If NC <> 1 Then ErrMsg = "keyword Outp with multidim X-range": _
GoTo ErrExit
FN = 0
Select Case Outp
Case "intercept": Out = 0
Case "slope": Out = 1
Case "sy", "sY": Out = 6
Case "F": Out = 7
Case "R2": Out = 8
Case "R": Out = 9
Case Else: ErrMsg = "keyword Outp " & Outp: GoTo ErrExit
End Select
End If

NValid = 0: I = 0
Do
I = I + 1
GoSub Validation
If Valid Then NValid = NValid + 1
Loop Until I = N
If NValid = 0 Then
If IsEmpty(IdGroup) Then Exit Function
End If
ReDim y(1 To NValid), X(1 To NC, 1 To NValid)
I = 0: J = 0
For I = 1 To N
GoSub Validation
If Valid Then
J = J + 1
y(J) = YArr(I)
For K = 1 To NC
Select Case FN
Case 0, 1, 9
X(K, J) = XArr(I, K)
Case 2 To NC
X(K, J) = XArr(I, 1) ^ K
End Select
Next K
End If
Next I
If Out > NC And Out < 6 Then ErrMsg = "keyword Outp > X range": _
GoTo ErrExit
If Out < 6 Then
Id2 = NC + 1 - Out
If Dev = 0 Then Id1 = 1 Else Id1 = 2
ElseIf Out = 6 Then Id1 = 3: Id2 = 2
ElseIf Out = 7 Then Id1 = 4: Id2 = 1
ElseIf Out = 8 Or Out = 9 Then Id1 = 3: Id2 = 1
End If
DLinReg = Application.WorksheetFunction. _
Index(Application.WorksheetFunction.LinEst(y(), X(), NonZeroIntercept, _
True), Id1, Id2)
If Out = 9 Then DLinReg = Sqr(DLinReg)
IErr = 0
Exit Function
Validation:
If NW = 0 Then Valid = True: Return
Valid = False
If IsEmpty(YArr(I)) Or IsEmpty(XArr(I)) Or WorksheetFunction.IsNA(YArr(I)) _
Then Return
If IsEmpty(WArr(I)) Or WArr(I) = "" Then W = 0 Else W = WArr(I)
If Not IsNumeric(W) Then Return
If IsGroup Then
If W = IdGroup Then Valid = True
ElseIf W = 0 Then
Valid = True
End If
Return
ErrExit:
DLinReg = 0
IErr = IErr + 1
If IErr > MaxErr Then
ErrMsg = "Wrong input of " & ErrMsg
MB = MsgBox(ErrMsg, vbOKOnly, Title)
End If
End Function
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top