Importing several web pages with macro/VBA?

K

KG

I am attempting to import a large amount of webpages containing historical
odds on NBA games (about 4 years of data). The data is available on webpages
similar to this example:

http://archive.scoresandodds.com/grid_20070418.html

As you can see, the date is used in the URL, meaning it should be possible
to make a macro to automatically import each page from then until now into
Excel.
The problem is, however, complicated further due to these factors:

1) NBA odds aren't posted daily meaning some dates/URLs wont have any data
to import (for instance the next two pages/days in the sequence, that is
20070419.html and 20070420.html doesn’t contain NBA data, while
20070421.html once again has the NBA table)

2) Each page also contains tables with data for others sports such as
baseball, but I only need data from the NBA table to be imported.

What is the easiest way to achieve this goal? Is it possible to make a
macro/VBA that loops through all dates since april 2007 and imports only the
NBA table and discards pages where no NBA table is present?

I am new to Excel and would appreciate any help and solutions you can
provide.

Thank you!
 
D

Don Guillett

I am attempting to import a large amount of webpages containing historical
odds on NBA games (about 4 years of data). The data is available on webpages
similar to this example:

http://archive.scoresandodds.com/grid_20070418.html

As you can see, the date is used in the URL, meaning it should be possible
to make a macro to automatically import each page from then until now into
Excel.
The problem is, however, complicated further due to these factors:

1) NBA odds aren't posted daily meaning some dates/URLs wont have any data
to import (for instance the next two pages/days in the sequence, that is
20070419.html and 20070420.html doesn’t contain NBA data, while
20070421.html once again has the NBA table)

2) Each page also contains tables with data for others sports such as
baseball, but I only need data from the NBA table to be imported.

What is the easiest way to achieve this goal? Is it possible to make a
macro/VBA that loops through all dates since april 2007 and imports only the
NBA table and discards pages where no NBA table is present?

I am new to Excel and would appreciate any help and solutions you can
provide.

Thank you!

This is the sort of thing I do for clients. What is this for? Contact
me privately dguillett1 @gmail.com

NBA05/22/2009 · Live betting trends on all these games at sports.com
·
Team Open Line Movements Current Moneyline Scores Notes
8:35 PM EDT
507 orlando magic 188o08 188 / 188.5 / 189 188.5 460 95 Over 188.5
508 cleveland cavaliers -18 -0.105263158 -14 -620 96 final
EASTERN CONFERENCE FINALS - Gm#2 [TNT]ORL. MAGIC LEAD SERIES 1-0SERIES
PRICE: CLE -280 / ORL +220CLE-F-James-Probable
 
R

ron

Yes, this can be done in a relatively straightforward manner. I don't think opening IE and scraping the data is the best way to go. For so many web pages such an approach would take a long time. A better way might be to use the following construction, it will take the source code behind the web page and assign it to a variable. You can then use the "instr" and "mid" functions to extract the data you are interested in and place it in your workbook.

my_url = "your_url"
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

This code would placed inside a loop where you increment the start date by 1
each time through and then use that date to build the next url. The lop could also contain an if/then/else construction to see if the word "NBA" appears on the current webpage and react accordingly...Ron
 
D

Don Guillett

Yes, this can be done in a relatively straightforward manner.  I don't think opening IE and scraping the data is the best way to go.  For so many web pages such an approach would take a long time.  A better way might be to use the following construction, it will take the source code behind the web page and assign it to a variable.  You can then use the "instr" and "mid" functions to extract the data you are interested in and place it inyour workbook.

    my_url = "your_url"
    Set my_obj = CreateObject("MSXML2.XMLHTTP")
    my_obj.Open "GET", my_url, False
    my_obj.send
    my_var = my_obj.responsetext
    Set my_obj = Nothing

This code would placed inside a loop where you increment the start date by 1
each time through and then use that date to build the next url.  The lop could also contain an if/then/else construction to see if the word "NBA" appears on the current webpage and react accordingly...Ron
==================
Why don't you provide a complete sub or send a file to me at
dguillett1 @gmail.com.
Using the query only takes about 1 second for each date and results
in:

Team Open Line Movements Current Moneyline Scores
NBA 04/21/2007
501 new jersey nets 191 193 / 193.5 / 194 194.5 165 96 Under 194.5
502 toronto raptors -4.5 -14 -4.5 -190 91 final
503 miami heat 183 183 / 182.5 / 183 183.5 175 91 Over 183.5
504 chicago bulls -4.5 -1 -5 -210 96 final
505 orlando magic 180.5 181 / 180.5 / 181 181.5 475 92 Over 181.5
506 detroit pistons -8 -24.65789474 -10 -650 100 final
507 utah jazz 184 188.5 / 189 / 189.5 189 240 75 Under 189
508 houston rockets -6 -20.91666667 -6.5 -280 84 final
NBA 04/22/2007
511 los angeles lakers 214.5 212.5 / 212 / 211.5 212 550 87 Under
212
512 phoenix suns -10.5 -0.090909091 -15.5 -750 95 final
513 denver nuggets 196 196.5 / 197 / 197.5 197 320 95 Under 197
514 san antonio spurs -8 0.941176471 -8 -400 89 final
515 golden state warriors 214.5 214.5 / 215 / 214.5 215 500 97 Under
215
516 dallas mavericks -10 -15.15 -10 -700 85 final
NBA 04/23/2007
701 orlando magic 182.5 182.5 / 183 / 183.5 184 400 90 Over 184
702 detroit pistons -8.5 1.0625 -23 -500 98 final
703 utah jazz 185.5 185 / 184.5 / 185 185.5 190 90 Over 185.5
704 houston rockets -5.5 -0.181818182 -10 -230 98 final
NBA 04/24/2007
705 new jersey nets 194 194 / 194.5 / 194 193.5 200 83 Under 193.5
706 toronto raptors -5 -5.545454545 -20.5 -240 89 final
707 miami heat 184.5 184.5 / 185 / 184.5 184 135 89 Over 184
708 chicago bulls -4.5 -9.408163265 -3 -155 107 final
709 los angeles lakers 208 208 / 207.5 / 208 208.5 380 98 Over 208.5
710 phoenix suns -9.5 -9 -05 / -9 EVEN / -8.5 -14 -480 126 final
NBA 04/25/2007
711 denver nuggets 195 194.5 / 194 / 194.5 194 330 88 Under 194
712 san antonio spurs -8.5 -8.5 -13.5 -420 97 final
715 golden state warriors 210 210.5 / 210 / 209.5 210 425 99 Over
210
716 dallas mavericks -9.5 -14.66620499 -9.5 EVEN -550 112 final
NBA 04/26/2007
501 detroit pistons -2 -20.2 -22.5 -155 93 Under 183
502 orlando magic 183 183 / 183.5 183 135 77 final
503 houston rockets 186 186 / 186.5 / 187.5 188 170 67 Under 188
504 utah jazz -4 -0.222222222 -4.5 -200 81 final
505 phoenix suns -4 -10.25 -24 -190 89 Under 211
506 los angeles lakers 210 210 / 211.5 211 165 95 final
NBA 04/27/2007
507 toronto raptors 192 192 / 191.5 191 165 89 Push 191
508 new jersey nets -4.5 -4.5 -4 -190 102 final
509 chicago bulls 185 185 / 184.5 / 185 185.5 200 104 Over 185.5
510 miami heat -5.5 -5.5 -10.5 -240 96 final
511 dallas mavericks -4.5 -13.875 -4 -180 91 Under 212
512 golden state warriors 211 211 / 211.5 212 160 109
 
R

ron

Don...I was comparing the "MSXML2.XMLHTTP" method of capturing web information via assigning the source code to a variable to the method that actually opens IE

Set ie = CreateObject("InternetExplorer.Application")

With ie
.Visible = True
.Navigate "http://www.your_url"
.Top = 50
.Left = 530
.Height = 400
.Width = 400

' Loop until the page is fully loaded
Do Until Not ie.Busy And ie.ReadyState = 4
DoEvents
Loop

The former method is much faster then the latter. I haven't used the web query approach so I did not comment on it. Here is a link to an earlier post in this NG providing a bit more detail around the "MSXML2.XMLHTTP" method.

https://groups.google.com/forum/#!s...ic.excel.programming/pJryKM1aCh4/VBbkT8YCUdoJ

....Ron
 
K

KG

Thank you for the suggestion. I have tried to use this solution but have ran
into trouble when trying to extract the info I need and put it into Excel
since the HTML code is a bit more complex than in the example you linked to.

I actually found a better site to extract from since it only contains the
NBA data and in a clear way.

http://www.oddsportal.com/basketball/usa/nba-2008-2009/results/page/2/

The problem is to extract the exact strings i need from the source code as I
haven't worked with VB much. Basically, I will need these 5 pieces of data
placed into their own column in Excel:

Teamname1
Teamname2
Match result
Odds1
Odds2

I have discovered that each row in the table on the webpage contains the
word "name table-participant" in the source code, so I tried to use this to
extract the strings and get the data I want, but it requires more coding to
get the 5 strings above than I'm capable of.

Do you have any suggestion to how I can get these 5 pieces of information
extracted from the source code to the page above? I find it difficult to
narrow it down so I just get a string containing each of the above.
 
R

ron

I think this will do most of what you want. You'll need to add a loop to move from web page to web page...Ron

Sub NBA()
' Set the url for the desired web page
my_url = "http://www.oddsportal.com/basketball/usa/nba-2008-2009/results/page/2/"

' Get the source code behind the desired web page
Set my_obj = CreateObject("MSXML2.XMLHTTP")
my_obj.Open "GET", my_url, False
my_obj.send
my_var = my_obj.responsetext
Set my_obj = Nothing

' Find the two team names, final score and 2 odds
pos_0 = 1

Do Until InStr(pos_0, my_var, "name table-participant") = 0
pos_1 = InStr(pos_0, my_var, "name table-participant")
pos_2 = InStr(pos_0, my_var, "deactivate")
If InStr(Mid(my_var, pos_2, 25), "span class") > 0 Then ' first team is in boldface
pos_3 = InStr(pos_2, my_var, "span class")
pos_4 = InStr(pos_3, my_var, ">")
pos_5 = InStr(pos_4, my_var, "<")
teamname_1 = Mid(my_var, 1 + pos_4, pos_5 - (1 + pos_4))

pos_6 = InStr(pos_5, my_var, "-")
pos_7 = InStr(pos_6, my_var, "<")
teamname_2 = Trim(Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6)))
Else ' the second team is in boldface
pos_4 = InStr(pos_2, my_var, ">")
pos_5 = InStr(pos_4, my_var, "-")
teamname_1 = Trim(Mid(my_var, 1 + pos_4, pos_5 - (1 + pos_4)))

pos_6 = InStr(pos_5, my_var, ">")
pos_7 = InStr(pos_6, my_var, "<")
teamname_2 = Mid(my_var, 1 + pos_6, pos_7 - (1 + pos_6))
End If

pos_7 = InStr(pos_6, my_var, "table-odds")
pos_8 = InStr(pos_7, my_var, ">")
pos_9 = InStr(pos_8, my_var, "<")
score = Mid(my_var, 1 + pos_8, pos_9 - (1 + pos_8))

pos_10 = InStr(pos_9, my_var, "addMatch")
pos_11 = InStr(pos_10, my_var, ">")
pos_12 = InStr(pos_11, my_var, "<")
odds_1 = Mid(my_var, 1 + pos_11, pos_12 - (1 + pos_11))

pos_13 = InStr(pos_12, my_var, "addMatch")
pos_14 = InStr(pos_13, my_var, ">")
pos_15 = InStr(pos_14, my_var, "<")
odds_2 = Mid(my_var, 1 + pos_14, pos_15 - (1 + pos_14))

pos_0 = pos_15

' Post the data to the worksheet
ActiveCell = teamname_1
ActiveCell.Offset(0, 1) = teamname_2
ActiveCell.Offset(0, 2) = score
ActiveCell.Offset(0, 3) = odds_1
ActiveCell.Offset(0, 4) = odds_2

ActiveCell.Offset(1, 0).Select
Loop

' Format the columns
Columns("A:E").Select
Selection.Columns.AutoFit
Columns("C:C").Select
With Selection
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
.WrapText = False
.Orientation = 0
.AddIndent = False
.IndentLevel = 0
.ShrinkToFit = False
.ReadingOrder = xlContext
.MergeCells = False
End With
Range("A1").Select
End Sub
 
K

KG

Thanks a lot! I added a loop with a static loop count and it's working just
fine. I was struggling with the fact that both teams could be bold in the
HTML code and that the odds could be both three and four digits, but you
clearly solved that in a great way.

I get an "1004 - Application-Defined or Object-Defined Error" error though,
which seem to be related to the snip that formats the columns. If I remove
that part the error disappears. Any ideas on that? The formatting isn't
really necessary for me, but just thought I would mention it anyway :)
 
K

KG

I can't seem to recreate the error now for some reason. The formatting works
fine now, so I must have done something wrong.
Thanks again for the help, it is much appreciated!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top