Guest
I am creating a screen scraping app that will extract data from a website.
The screen scraping is pretty straightforward using .NET 2.0, but stripping
out all extraneous characters is proving to be more difficult. I am
basically trying to extract the team, quarter, score for the quarter, and
score for the entire game from this HTML. (This HTML is a subset of the
entire page.)
<table border="0" width="100%"><tr><td width="40%">Team</td><td width="10%"
align="center">1</td> <td width="10%" align="center">2</td><td width="10%"
align="center">3</td> <td width="10%" align="center">4</td><td width="20%"
align="center">Score</td></tr><tr><td width="40%"><A
href="/default.asp?c=sportsnetwork&page=nfl/teams/078.htm">New
Orleans</A></td><td width="10%" align="center" >0</td><td width="10%"
align="center" >10</td><td width="10%" align="center" >0</td><td width="10%"
align="center" >0</td><td width="20%" align="center" >10</td></tr><tr><td
width="40%"><A
href="/default.asp?c=sportsnetwork&page=nfl/teams/071.htm">Indianapolis</A></td><td
width="10%" align="center" >7</td><td width="10%" align="center" >3</td><td
width="10%" align="center" >14</td><td width="10%" align="center" >17</td><td
width="20%" align="center" >41</td></tr></table>
In essence, I want to put the names and scores into an array so I can add
them to a database. From what I have read, regular expressions should be
able to do this, but I am a complete beginner with regex. Could someone
help me get started? Many thanks.
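To make the goal concrete, here is a minimal sketch of the kind of extraction being asked about. It is written in Python for brevity rather than .NET, and it is only one possible approach, not a definitive one. The names `SAMPLE_HTML`, `parse_scores`, `ROW_RE`, `CELL_RE` and `TAG_RE` are hypothetical, and the sketch assumes the first table row is always the header and that every other cell after the team name holds an integer. Regular expressions are brittle against markup changes, so a real HTML parser may be the safer choice for the full page.

```python
import re

# The table fragment from the post, with its original line wrapping.
SAMPLE_HTML = '''<table border="0" width="100%"><tr><td width="40%">Team</td><td width="10%"
align="center">1</td> <td width="10%" align="center">2</td><td width="10%"
align="center">3</td> <td width="10%" align="center">4</td><td width="20%"
align="center">Score</td></tr><tr><td width="40%"><A
href="/default.asp?c=sportsnetwork&page=nfl/teams/078.htm">New
Orleans</A></td><td width="10%" align="center" >0</td><td width="10%"
align="center" >10</td><td width="10%" align="center" >0</td><td width="10%"
align="center" >0</td><td width="20%" align="center" >10</td></tr><tr><td
width="40%"><A
href="/default.asp?c=sportsnetwork&page=nfl/teams/071.htm">Indianapolis</A></td><td
width="10%" align="center" >7</td><td width="10%" align="center" >3</td><td
width="10%" align="center" >14</td><td width="10%" align="center" >17</td><td
width="20%" align="center" >41</td></tr></table>'''

# Lazy matches so each pattern stops at the first closing tag.
# DOTALL lets "." cross the line breaks inside the markup.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")  # any remaining tag, e.g. the <A ...> link


def parse_scores(html):
    """Return one dict per team row: team name, quarter scores, game total."""
    rows = []
    for row_html in ROW_RE.findall(html):
        cells = []
        for cell_html in CELL_RE.findall(row_html):
            text = TAG_RE.sub("", cell_html)    # strip nested tags
            cells.append(" ".join(text.split()))  # collapse whitespace
        rows.append(cells)

    results = []
    # Assumption: the first row is the header (Team, 1, 2, 3, 4, Score).
    for cells in rows[1:]:
        results.append({
            "team": cells[0],
            "quarters": [int(c) for c in cells[1:-1]],
            "total": int(cells[-1]),
        })
    return results


if __name__ == "__main__":
    for game in parse_scores(SAMPLE_HTML):
        print(game)
```

The same three patterns should carry over to the `Regex` class in `System.Text.RegularExpressions` with the `IgnoreCase` and `Singleline` options, since they use only constructs common to both regex flavours.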