Loop through date range, but skip down to next year

J

Jason

Okay so the title is stupid, but I don't know how to explain in just a few
words my issue.

We have a range of dates in Column A
Jan 1973
Feb 1973
....
Apr 2009

I am trying to pull numbers from a website that correspond to totals for
each month, but the website (
http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm) only lists the years,
and then each line below it corresponds to a total for each month in that
year. Here is an example of the source for year 1973:

<tr>
<td class='B4'> 1973</td>
<td class='B3'>60</td>
<td class='B3'>384</td>
<td class='B3'>1,167</td>
<td class='B3'>931</td>
<td class='B3'>1,670</td>
<td class='B3'>1,598</td>
<td class='B3'>1,758</td>
<td class='B3'>1,829</td>
<td class='B3'>1,022</td>
<td class='B3'>1,465</td>
<td class='B3'>1,483</td>
<td class='B3'>1,456</td>
</tr>

I used the following code to test getting the January total for each year,
but how can I set my loop to skip down 12 cells to the next year? Currently
it is putting the January value in each cell for each year

Set rng = Range("A2").CurrentRegion
Set rng = rng.Resize(rng.Rows.Count, 1)

For Each c In rng
sURLdate = Format(c.Value2, "yyyy")
c.Offset(0, 1).Value = RegexMid(myStr, sURLdate, "class=b3")
 
R

ryguy7272

I can't tell if this is what you want or not, but the code takes all the data
from that site and sweeps it into two columns. Try it and you will see:
Sub settupp()
Set s1 = Sheets("Sheet1")
Set s2 = Sheets("Sheet2")
s1.Activate
n = Cells(Rows.Count, 1).End(xlUp).Row
k = 1
For i = 1 To n
namee = Cells(i, 1).Value
For j = 1 To 13
numberr = Cells(i, j).Value
s2.Cells(k, 1) = namee
s2.Cells(k, 2) = numberr
k = k + 1
Next
Next
End Sub

Also, just for fun, here is a recorded macro that imports the data from that
site:
Sub Macro1()
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm",
Destination:=Range _
("A1"))
.Name = "n9132cn2m"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "10"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
End Sub

You probably know that already.............

HTH,
Ryan---
 
D

Dave Peterson

Maybe...

Dim FirstRow as long
dim LastRow as long
dim iRow as long
dim wks as worksheet

set wks = activesheet

with wks
firstrow = 2 'A2
lastrow = .cells(.rows.count,"A").end(xlup).row

for irow = firstrow to lastrow step 12
'sURLdate = Format(c.Value2, "yyyy")
sURLdate = Format(.cells(irow,"A").Value2, "yyyy")
'I would have used (if I could use a real number
sURLdate = year(.cells(irow,"A").value)
....

next irow
end with
 
R

Rick Rothstein

It isn't entirely clear from your code snippet how the text is placed in the
myStr variable nor whether it contains the entire html table from the
website or if it contains a line-by-line "read ins" of that table. The
following code (not RegEx) assumes the **entire** HTML table has been saved
to a file on the hard drive; it then reads that entire file in, parses the
data and place it on the active worksheet with Column A containing the years
and Columns B through M containing January through December's values for
that year appearing in Column A. I have marked the code that reads the
entire file into the myStr variable in case you are using a different method
to fill the myStr variable with the **entire** HTML table of data and need
to change that part of my code. If you keep my code as posted, you will need
to change the example location of the data file and filename that I used in
the Open statement.

Sub DistributeYearData()
Dim Rng As Range
Dim X As Long
Dim Z As Long
Dim FileNum As Long
Dim myStr As String
Dim MonthData() As String
Dim YearParts() As String
' Read in entire file all at once
' {change file location in line below}
FileNum = FreeFile
Open "d:\temp\WebYearData.txt" For Binary As #FileNum
myStr = Space(LOF(FileNum))
Get #FileNum, , myStr
Close #FileNum
' Entire file now resides in the myStr variable
Set Rng = Range("A2")
YearParts = Split(myStr, "<td class='B4'>&nbsp;&nbsp;")
For X = 0 To UBound(YearParts) - 1
Rng.Offset(X).Value = Val(YearParts(X + 1))
MonthData = Split(YearParts(X + 1), "<td class='B3'>")
For Z = 1 To 12
Rng.Offset(X, Z).Value = Val(Replace(MonthData(Z), ",", ""))
Next
Next
End Sub
 
E

EricG

Why not do a web query? Data/Import External Data/New Web Query...

You can pull in that entire table, and then rearrange it any way you want.

HTH,

Eric
 
J

Jason

I am pulling the data from the HTML source code off the website.
Here is my full code below to help. To replicate the problem I am seeing,
column A with Jan 2000 to say Dec 2004 incrementing by month

Jan 2000
Feb 2000
.....

Running the code will put the value for January total for 2000 in every
month with the year 2000, Janurary total for 2001 in every month for 2001 and
so on. We want to use a program (that works correctly) similar to the one
below, so we can add on later for other countries, etc.


Option Explicit

Sub getExportCanada()
Dim c As Range, rng As Range
Dim myStr As String
Dim test As Variant
Dim Val As Variant
Dim IE As Object

Const sURL1 As String = "http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm"
Dim sURLdate As String

Application.Cursor = xlWait
Set IE = CreateObject("InternetExplorer.Application")

IE.Navigate sURL1
While IE.ReadyState <> 4
DoEvents
Wend
myStr = IE.Document.body.innerhtml

Dim StartRow As Long
Set rng = Range("A2").CurrentRegion
'look for last filled in row
Set c = rng.SpecialCells(xlCellTypeBlanks).Areas(1)
Set c = c.Resize(1, 1)

StartRow = c.Row - rng.Row

Set rng = rng.Offset(rowoffset:=StartRow).Resize( _
rowsize:=rng.Rows.Count - StartRow, columnsize:=1)

For Each c In rng
sURLdate = Format(c.Value2, "yyyy")
c.Offset(0, 1).Value = RegexMid(myStr, sURLdate, "class=b3")
Next c

IE.Quit
Set IE = Nothing
Application.Cursor = xlDefault
End Sub

Private Function RegexMid(s As String, sYear As String, sMonthTotal) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & "nbsp;" & sYear & "[\s\S]+?" & sMonthTotal & "\D+(\d+)"

' Yes I know the syntax for the line above is wrong, I am working on that as
well

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).SubMatches(0)
End If
Set re = Nothing
End Function
 
E

EricG

In VBA, it looks like this:

With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm",
Destination:=Range _
("A1"))
.Name = "n9132cn2m"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = True
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "10"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
 
C

Colbert Zhou [MSFT]

Hi Jason,

Your codes only get the first matched class="B3", so it will always return
the first month data for a specified year. We need to modify the regular
expression to make it work. The regular expression should include 12
months' sub matches. And we also need to modify the RegexMid function to
make it has another parameter month. So this function can return the
desired month data depending on the passed in parameter. The followings are
codes work on my side now,

-----------------------------------------------------------------------

Sub getExportCanada()
Dim c As Range, rng As Range
Dim myStr As String
Dim test As Variant
Dim Val As Variant
Dim IE As Object

Const sURL1 As String =
"http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm"
Dim sURLdate As String
Dim m As String
Application.Cursor = xlWait
Set IE = CreateObject("InternetExplorer.Application")

IE.Navigate sURL1
While IE.ReadyState <> 4
DoEvents
Wend
myStr = IE.Document.body.innerhtml

Dim StartRow As Long
Set rng = Range("A2").CurrentRegion


For Each c In rng
sURLdate = Format(c.Value2, "yyyy")
m = Format(c.Value2, "m")
c.Offset(0, 1).Value = RegexMid(myStr, sURLdate, CInt(m), "class=b3")
Next c

IE.Quit
Set IE = Nothing
Application.Cursor = xlDefault
End Sub

Private Function RegexMid(s As String, sYear As String, iMonth As Integer,
sMonthTotal) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.IgnoreCase = True
re.MultiLine = True
re.Global = True
re.Pattern = "\b" & "nbsp;" & sYear & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)" & _
"[\s\S]+?" & sMonthTotal & "\D+(\d\D?\d+)"

' Yes I know the syntax for the line above is wrong, I am working on that
as well

If re.test(s) = True Then
Set mc = re.Execute(s)
RegexMid = mc(0).SubMatches(iMonth - 1)
End If
Set re = Nothing
End Function
------------------------------------------------------------------

Please note, to make the codes work correctly, we must pass valid month
parsed from the Column A. That is to say, we need to fill Date in ColumnA
using 1/1/1973 instead of Jan 1973 as you used.


Best regards,
Colbert Zhou
Microsoft Newsgroup Support Team
 
J

Jason

Thanks! My only problem with web queries, and I am sure there is an easy way
to get around this, is usually I need to the data in a different format than
the way it is after the web query import.

I am new to the programming side of Excel, so I am not sure if there is a
way to run the web query, then re-arrange it.
 
J

Jason

Colbert

Thanks! That was exactly what I was looking for!

You even fixed my syntax problem in regular expression. However, I noticed
that for some reason, if the total for the month is greater than 10,000 it
only grabs the numbers before the comma. Or if the number is just a single
digit, it messes that up as well...

If you scroll down the months for the totals after Dec 1976, you'll see the
single digit problem.

And after say, year 2001, you can see the issue with the numbers over
10,000...

You can compare to the totals on the website
http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm

I know its just a syntax issue in the Pattern

Thanks again for your work so far.
 
R

Rick Rothstein

If it helps you any, here is the macro I posted earlier modified to read the
table of values into myStr using the method you posted in your code. Again,
it places the data on the active worksheet with Column A containing the
years and Columns B through M containing January through December's values
for that year appearing in Column A starting on Row 2. This leaves room for
headers on Row 1 (if the layout I used is what you want, the header
placements can be added to the code quite easily).

Sub DistributeYearData()
Dim IE As Object
Dim Rng As Range
Dim X As Long
Dim Z As Long
Dim FileNum As Long
Dim myStr As String
Dim sURLdate As String
Dim MonthData() As String
Dim YearParts() As String
Const sURL1 As String =
"http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm"
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate sURL1
While IE.ReadyState <> 4
DoEvents
Wend
myStr = IE.Document.body.innerhtml
Set Rng = Range("A2")
YearParts = Split(myStr, "<td class=B4>&nbsp;&nbsp;", , vbTextCompare)
For X = 0 To UBound(YearParts) - 1
Rng.Offset(X).Value = Val(YearParts(X + 1))
MonthData = Split(YearParts(X + 1), "<td class=B3>", , vbTextCompare)
For Z = 1 To 12
Rng.Offset(X, Z).Value = Val(Replace(MonthData(Z), ",", ""))
Next
Next
End Sub
 
J

Jason

Thanks...that gets the info we need, but we want to keep it with the dates in
column A and the totals for each month in column B (it's a legacy
spreadsheet that we have charts based off)

The Sub that Colbert posted does this and I added in some stuff to allow
headers, but the syntax is off a little in the pattern search due to the
numbers having commas

Thanks again though! I am new to this and so I always go through code
people post to try and learn what it is doing.
 
R

Rick Rothstein

Does the following do what you want? Note that you do not have to put your dates in Column A first... the code handles that for you.

Sub DistributeYearData()
Dim IE As Object
Dim rng As Range
Dim X As Long
Dim Z As Long
Dim Yr As Long
Dim FileNum As Long
Dim myStr As String
Dim sURLdate As String
Dim MonthData() As String
Dim YearParts() As String
Const sURL1 As String = "http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm"
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate sURL1
While IE.ReadyState <> 4
DoEvents
Wend
myStr = IE.Document.body.innerhtml
Set rng = Range("A2")
YearParts = Split(myStr, "<td class=B4>&nbsp;&nbsp;", , vbTextCompare)
For X = 0 To UBound(YearParts) - 1
Yr = Val(YearParts(X + 1))
MonthData = Split(YearParts(X + 1), "<td class=B3>", , vbTextCompare)
For Z = 1 To 12
rng.Offset(Z + 12 * X - 1).Value = DateSerial(Yr, Z, 1)
rng.Offset(Z + 12 * X - 1).NumberFormat = "mmm yyyy"
rng.Offset(Z + 12 * X - 1, 1).Value = Val(Replace(MonthData(Z), ",", ""))
Next
Next
End Sub
 
J

Jason

Works great!

My only issue it actually works better than I need. I would prefer it to
read the dates from the column...that way I can enter the dates in and run
variations of this routine to fill in similar stats in the other columns
 
E

EricG

Here's one way to do it. Add a temporary sheet, import the data from the web
site, the bring the data into memory using array variables. Once you have it
in memory, you can do whatever you want with it, including adding it to
another worksheet in a different order.

In the example I assume that there are always 12 months, so I don't bother
to store the months in memory. Stick the example in a general VBA module and
run it. I left off the part of what you do with the data after you get it in
memory.

Option Explicit
Option Base 1
'
' This routine adds a temporary sheet to
' grab data from a URL table, then once the
' data are in memory, deletes the sheet.
'
Sub Grab_and_Reorder()
Dim i As Long, j As Long
Dim nRows As Long
Dim nYears As Long
Dim wsh As Worksheet
Dim theYears() As Integer, theData() As Double
'
Set wsh = ActiveWorkbook.Sheets.Add
'
' First grab the table of data and put in on
' the active worksheet.
'
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm", _
Destination:=Range("A1"))
.Name = "n9132cn2m"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = True
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlSpecifiedTables
.WebFormatting = xlWebFormattingNone
.WebTables = "10"
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
'
' Delete any blank rows (those whose first cell is blank)
'
Range("A:A").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
'
' Next, determine how much data you just grabbed.
'
ActiveSheet.Cells(1, 1).Select
nYears = ActiveCell.CurrentRegion.Rows.Count - 1 ' Ignore header row
'
' Grab the data and store in memory
'
ReDim theYears(nYears)
ReDim theData(nYears * 12)
'
For i = 1 To nYears
theYears(i) = ActiveSheet.Cells(i + 1, 1)
For j = 1 To 12
theData((i - 1) * 12 + j) = ActiveSheet.Cells(i + 1, j + 1)
Next j
Next i
'
wsh.Delete
Set wsh = Nothing
'
' Now reorder the data as you please...
'
End Sub
 
R

Rick Rothstein

Works too good, eh?<g> Okay, how about this "not too good" code then?<bg>

' Place the following Dim statement in the (General)(Declarations)
' section at the top of whatever module you put the macro in.
Dim TableValues() As String

Sub DistributeYearData()
Dim IE As Object
Dim X As Long
Dim Z As Long
Dim Yr As Long
Dim CellVal As Variant
Dim myStr As String
Dim MonthData() As String
Dim YearParts() As String
Const sURL1 As String =
"http://tonto.eia.doe.gov/dnav/ng/hist/n9132cn2m.htm"
On Error Resume Next
X = UBound(TableValues)
If Err.Number Then
On Error GoTo 0
Err.Clear
Set IE = CreateObject("InternetExplorer.Application")
IE.Navigate sURL1
While IE.ReadyState <> 4
DoEvents
Wend
myStr = IE.Document.body.innerhtml
IE.Quit
Set IE = Nothing
YearParts = Split(myStr, "<td class=B4>&nbsp;&nbsp;", , vbTextCompare)
ReDim TableValues(12 * (1 + (Val(YearParts(UBound( _
YearParts))) - Val(YearParts(1)))))
For X = 0 To UBound(YearParts) - 1
Yr = Val(YearParts(X + 1))
MonthData = Split(YearParts(X + 1), "<td class=B3>", , vbTextCompare)
For Z = 1 To 12
TableValues(12 * X + Z - 1) = Yr & Format$(Z, "00") & "01-" & _
Val(Replace(MonthData(Z), ",", ""))
Next
Next
End If
For X = 2 To Cells(Rows.Count, "A").End(xlUp).Row
CellVal = Cells(X, "A").Value
If CellVal <> "" Then
Cells(X, "B").Value = Mid(Filter(TableValues, Format$( _
CellVal, "yyyymm01-"))(0), 10)
End If
Next
End Sub

NOTE: Since you said you were going to run multiple scenarios, I figured
there was no reason to go out to the website and download and reprocess the
same data over and over again; so I made the TableValues array "global" by
placing it in the (General)(Declarations) section (outside of any
procedures) of the module you place the code in. Doing this will make your
second and all other scenario runs execute at a blindly fast pace (well,
blindingly fast as compared to the first run which has to open the website,
read the data and then process the data to create the TableValues array).
There is one *possible* downside to doing this, however... if you leave the
worksheet module that the array and code are in open, and the data changes
on the website for some reason, the TableValues array will not have the
updated values in it. Here is a routine you can place in the same module as
the above code...

Sub ClearTableValuesArray()
Erase TableValues
End Sub

Run it and it will clear the TableValues array so that the next time you run
my DistributeYearData macro, it will be forced to go out and refresh the
TableValues array with the latest data. Of course, shutting Excel down will
also clear the TableValues arrays as well; so if you close the workbook
daily, that will accomplish the same thing.
 
J

Jason

Rick,

Thanks! I'll have to study up on the array stuff, but I definitely like it!

I was trying to use it on some similar stats from other areas of this same
site, and ran into some minor glitches.

1. How can we insert a zero into the cell if the entry for that month is
blank?

Example site:
http://tonto.eia.doe.gov/dnav/ng/hist/n9103id2m.htm

2. If you notice on this country's data, the history doesn't start until
1997. I am just manually changing the starting point in each country's sub
to whatever row corresponds to the starting date, since there are not too
many countries. However, if you know of a simple way to have it automate
this that would be great, but not too important.

Thanks again!

Jason
 
R

Rick Rothstein

1. On my system, the code I posted does put a 0 in for blank values... are
you saying it doesn't do this on your system? What version of Excel are you
using and what operating system version are you running it on?

2. My originally posted code did that, but you told me it gave you too much
information and you had me modify it to retrieve the data for just the dates
you have listed in Column A. I'm not sure how you would want to automate
this. How would the code fill in the first date for you and yet only return
values for the dates you have listed? I see these as mutually exclusive
situations; or do you have some method of proceeding in mind that would
permit both scenarios to apply?
 
J

Jason

Rick,

My apologies...it does do all that stuff already. The data for that one
country I was testing was bad (should've checked that first). The data shows
1997 then skips to Oct 2000. I set it to start at Oct 2000 and it went
through fine.

My last request is kinda different and probably should go on a new post but
thought I would ask you while I had you here, since you've helped already.

-After I get the data read into the say "Sheet5" column "D", how can I make
the formulas in columns "E" & "F" autofill down to the last row of data in
column "D"?

Thanks!
 
Top