Find HTML tags using RegEx.

Thief_

I have the following data in a web page:

<tr height="25">
<td nowrap class="odd" align="center"><img
src="/forums/images/icon_topic_new.gif" width=14 height=14 alt='New Topic'
border=0></td>

<td nowrap class="odd" align="center">&nbsp;</td>

<td nowrap class="odd" align="center">&nbsp;</td>
<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">jmcilhinney</a></font></td>
<td width="15%" class="even" valign="middle" align="left"><font
class="new-row"><a href="profile.asp?action=view&id=USMC93"
onmouseover="window.status='Show the authors profile'; return true;"
onmouseout="window.status=''; return true;">USMC93</a></font></td>
<td nowrap class="odd" valign="middle" align="center"><font
class="new-row">6</font></td>
<td nowrap class="odd" valign="middle" align="left">
<font class="new-row">7/24/2005<br>
<font class="sub-row">8:42:37 PM</font></font></td>
</tr>

It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files"
* "in .NET"
* ".NET Newbies"
* "Started 7/24/2005"
* "pages" and "1"
* "last posted by"
* "jmcilhinney"
* "USMC93"
* "7/24/2005"
* "8:42:37 PM"

If someone can show me the Regex to capture say the first two items, I'll
try to figger out the rest.

Thanks.

--
|
+-- Thief_
|

VB.Net
 
Ken Tucker [MVP]

Hi,

Maybe this will help.

http://www.regexlib.com/REDetails.aspx?regexp_id=984

Ken
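For the first two items in the list (the topic URL and the topic title), a pattern along the following lines might be a starting point. This is only a sketch: it is written against Python's `re` module so it can be run as-is, and the names `row` and `TOPIC_LINK` are made up for the illustration. It uses only constructs (`\s`, `\d`, lazy quantifiers, numbered groups) that .NET's `System.Text.RegularExpressions.Regex` also supports, so the pattern string should carry over to VB.Net, but it has not been tested there. It also assumes the markup always looks like the sample row above.

```python
import re

# Part of the sample row from the question, including the line breaks
# that fall inside the tags.
row = '''<td width="85%" class="even" align="left"><font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row">in .NET&nbsp;/&nbsp;.NET Newbies</font><font
class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by'''

# Group 1: the href value. Group 2: the link text without surrounding whitespace.
# Anchoring on <font class="new-row"> skips the "pages" link, which points to
# the same URL but sits inside a "sub-row" font tag.
TOPIC_LINK = re.compile(
    r'<font\s+class="new-row">\s*<a\s+href="(topic\.asp\?tid=\d+)"\s*>'
    r'\s*([^<]+?)\s*</a>',
    re.IGNORECASE,
)

matches = TOPIC_LINK.findall(row)
print(matches)
# [('topic.asp?tid=106898', 'Working with Sequential Text Files')]
```

The `\s+` and `\s*` pieces are there because the page breaks lines inside tags (for example between `<a` and `href`). Any change to attribute order or quoting in the real page would break the match, which is the usual weakness of this approach.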
 
Cor Ligthert [MVP]

Thief,

Although the tool meant for capturing HTML tags is MSHTML (which is a terrible
class to use), you could perhaps also use the method I showed you in the next
question you posted.

Cor
 
Larry Lard

Thief_ said:
I have the following data in a web page:
[snip]
It's repeated over and over but with different data and is amongst other
unrelated data. I need to capture the following data:

* "topic.asp?tid=106898"
* "Working with Sequential Text Files" [etc]

If someone can show me the Regex to capture say the first two items, I'll
try to figger out the rest.

Rather than a RegEx (which you'll agree will be pretty hideous), might
I recommend HtmlAgilityPack for all your HTML parsing needs?

<http://smourier.blogspot.com/2005/05/net-html-agility-pack-how-to-use.html>

It makes dealing with HTML a breeze :)
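
To give a rough idea of what the parser route looks like, here is a minimal sketch. It does not use HtmlAgilityPack; as a stand-in it uses `html.parser` from Python's standard library, and the class name `LinkCollector` and the variable `row` are invented for the example. The point is only the general shape: let a parser walk the tags and collect each link's `href` and text, rather than matching the raw markup with a pattern.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects an (href, text) pair for every <a> element it sees."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text chunks seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Part of the sample row from the question.
row = '''<font class="new-row"><a
href="topic.asp?tid=106898">
Working with Sequential Text Files</a>&nbsp;</font>
<font class="sub-row"><br>Started 7/24/2005 - pages <a
href="topic.asp?tid=106898">1</a> - last posted by <a
href="profile.asp?action=view&id=jmcilhinney">jmcilhinney</a></font>'''

collector = LinkCollector()
collector.feed(row)
collector.close()

for href, text in collector.links:
    print(href, "|", text)
```

Line breaks inside tags and unquoted attributes are handled by the parser, so the code does not depend on the exact layout of the page the way a pattern does.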
 
