Automating Linear Interpolation

S

smurray444

Hi there - thanks for looking,

I have a large dataset with various columns of data for global climate
(meteorological) stations, however the dataset is not complete - it has
gaps for some years where data has not been collected/the equipment
failed.

I am hoping to be able to interpolate the data in order to fill these
gaps... via a simple linear method. For example, if there are three
cells as shown below, then the gap in cell B can be filled by taking
the average of cells A and C to equal 50 etc etc.

A = 40
B = [GAP] (=50 when interpolated/averaged)
C = 60

There are also incidents where there are a series of gaps in
succession, eg. as follows:

A = 10
B = GAP
C = GAP
D = GAP
E = 50

I understand that it might be difficult to fill the gaps whereby the
values increment in turn (ie. = 20, 30, 40 respectively in the example
above) - but would it be possible to design an automated solution
whereby the missing data is filled by doing, say, (50+10)/2 = 30... and
'30' is used to fill the gaps from B-D (there are not necessary 3 blanks
in each case, this ranges considerably).

The final piece of the jigsaw is in averaging the values correctly when
the climate station changes and there are gaps for both stations in
succession, as demonstrated in the simplified example below:

Station 1: A = 10
Station 1: B = 15
Station 1: C = 20
Station 1: D = GAP (=25)
Station 1: E = 30
Station 1: F = GAP (=35)
Station 1: G = GAP (=40)
Station 2: A = GAP (=4)
Station 2: B = GAP (=5)
Station 2: C = 6
Station 2: D = 7
Station 2: E = GAP (=8)
Station 2: F = 9
Station 2: G = 10

In the case of Station 1: F-G and Station 2: A-B, an average is not
required of the values above and below the gap present (ie. not taking
the average of 30 and 6 and filling each of the gaps with 13), but is
required only for the data in that particular station. So would there
be a way of informing the to be wary of such changes in station
reference? (Station reference number is in column A in my dataset)

Is there a way of designing (a macro in Visual Basic?) to automate
these procedures in Excel 2002? - as doing it manually would be
unfeasible due to the nature of the dataset size. Incidently, data
which requires interpolation is in columns F to M (inclusive), and is
represented as a blank cell. Interpolation is required vertically down
the columns, and not between rows.

Thanks very much for looking at this - I would be very grateful if
someone manages to manufacture a solution.

Thanks once more,
Steve Murray
 
B

Bernie Deitrick

Steve,

Assuming that your step in the Time dimension is always consistent, your data is sorted based on the
identifier (station 1, etc.), and that your identifier is in column A - try the macro below, on a
copy of your data.

HTH,
Bernie
MS Excel MVP


Sub FillInBlanks()
Dim myCell As Range
Dim i As Integer
Dim myRange As Range

Set myRange = Range("A:A")

For i = 6 To 13

' Interpolation First
For Each myCell In Columns(i).SpecialCells(xlCellTypeBlanks)
With myRange
If (.Cells(myCell.Row).Value = _
.Cells(myCell.End(xlUp).Row).Value) And _
(.Cells(myCell.Row).Value = _
.Cells(myCell.End(xlDown).Row).Value) Then
myCell.Value = myCell.End(xlDown).Value + _
(myCell.Row - myCell.End(xlDown).Row) _
* (myCell.End(xlUp).Value - myCell.End(xlDown).Value) / _
(myCell.End(xlUp).Row - myCell.End(xlDown).Row)
End If
End With
Next myCell

'Now Extrapolation
For Each myCell In Columns(i).SpecialCells(xlCellTypeBlanks)
'Interpolate if a cell above has been filled in
With myRange
If (.Cells(myCell.Row).Value = _
.Cells(myCell.End(xlUp).Row).Value) And _
(.Cells(myCell.Row).Value = _
.Cells(myCell.End(xlDown).Row).Value) Then
myCell.Value = myCell.End(xlDown).Value + _
(myCell.Row - myCell.End(xlDown).Row) _
* (myCell.End(xlUp).Value - myCell.End(xlDown).Value) / _
(myCell.End(xlUp).Row - myCell.End(xlDown).Row)
ElseIf (.Cells(myCell.Row).Value <> _
.Cells(myCell.End(xlUp).Row).Value) Then
myCell.Value = myCell.End(xlDown).Value + _
(myCell.Row - myCell.End(xlDown).Row) _
* (myCell.End(xlDown)(2).Value - myCell.End(xlDown).Value) / _
(myCell.End(xlDown)(2).Row - myCell.End(xlDown).Row)
Else
myCell.Value = myCell.End(xlUp).Value + _
(myCell.Row - myCell.End(xlUp).Row) _
* (myCell.End(xlUp)(0).Value - myCell.End(xlUp).Value) / _
(myCell.End(xlUp)(0).Row - myCell.End(xlUp).Row)
End If
End With
Next myCell

Next i

End Sub





smurray444 said:
Hi there - thanks for looking,

I have a large dataset with various columns of data for global climate
(meteorological) stations, however the dataset is not complete - it has
gaps for some years where data has not been collected/the equipment
failed.

I am hoping to be able to interpolate the data in order to fill these
gaps... via a simple linear method. For example, if there are three
cells as shown below, then the gap in cell B can be filled by taking
the average of cells A and C to equal 50 etc etc.

A = 40
B = [GAP] (=50 when interpolated/averaged)
C = 60

There are also incidents where there are a series of gaps in
succession, eg. as follows:

A = 10
B = GAP
C = GAP
D = GAP
E = 50

I understand that it might be difficult to fill the gaps whereby the
values increment in turn (ie. = 20, 30, 40 respectively in the example
above) - but would it be possible to design an automated solution
whereby the missing data is filled by doing, say, (50+10)/2 = 30... and
'30' is used to fill the gaps from B-D (there are not necessary 3 blanks
in each case, this ranges considerably).

The final piece of the jigsaw is in averaging the values correctly when
the climate station changes and there are gaps for both stations in
succession, as demonstrated in the simplified example below:

Station 1: A = 10
Station 1: B = 15
Station 1: C = 20
Station 1: D = GAP (=25)
Station 1: E = 30
Station 1: F = GAP (=35)
Station 1: G = GAP (=40)
Station 2: A = GAP (=4)
Station 2: B = GAP (=5)
Station 2: C = 6
Station 2: D = 7
Station 2: E = GAP (=8)
Station 2: F = 9
Station 2: G = 10

In the case of Station 1: F-G and Station 2: A-B, an average is not
required of the values above and below the gap present (ie. not taking
the average of 30 and 6 and filling each of the gaps with 13), but is
required only for the data in that particular station. So would there
be a way of informing the to be wary of such changes in station
reference? (Station reference number is in column A in my dataset)

Is there a way of designing (a macro in Visual Basic?) to automate
these procedures in Excel 2002? - as doing it manually would be
unfeasible due to the nature of the dataset size. Incidently, data
which requires interpolation is in columns F to M (inclusive), and is
represented as a blank cell. Interpolation is required vertically down
the columns, and not between rows.

Thanks very much for looking at this - I would be very grateful if
someone manages to manufacture a solution.

Thanks once more,
Steve Murray
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top