Extract data from HTML with RegEx


Ori

Hi,

I have some HTML text which I need to parse in order to extract data from
it.

My HTML contains a table with a few rows and two columns. I want to
extract the data from the 2nd column in the most efficient way, using a
regular expression rather than String's "indexOf" function.

Thanks,

Ori.

Here is the HTML table:

<table BORDER="1" CELLSPACING="0" CELLPADDING="1">
<tr>
<td>Licensee Name</td>
<td BGCOLOR="#ffffcc">JOHN Doo</td>
</tr>
<tr>
<td><a HREF=>Primary Status</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>License Number</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td><a >License Type</a></td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Header</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>Address</td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
<tr>
<td>City State State Zip </td>
<td BGCOLOR="#ffffcc">Data_To_Be_Extracted</td>
</tr>
</table>
 

Matthias Kwiedor

Hi!

Try this:

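// Needs: using System.Text.RegularExpressions;
// First split the HTML into table rows
string[] arrLines = Regex.Split(strContent, @"<tr.*?>",
    RegexOptions.IgnoreCase);

// Go through each row
foreach (string strLine in arrLines)
{
    // Split the row into columns
    string[] strCol = Regex.Split(strLine, @"<td.*?>",
        RegexOptions.IgnoreCase);
    // strCol[0] is the text before the first <td>, so a two-column row
    // gives three pieces; skip anything else (e.g. the <table> part)
    if (strCol.Length < 3)
        continue;
    // Second column: remove leftover HTML tags (</td>, </tr>, ...)
    string value = Regex.Replace(strCol[2], @"<[^>]*>", "").Trim();
    MessageBox.Show(value);
}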


Hope that's what you want!



Greetings



Matthias

(e-mail address removed) (Ori) wrote in @posting.google.com:
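If you would rather not split twice, a single Regex.Matches call can pull the second cell straight out of each row. This is a minimal sketch, assuming the simple, well-formed table posted above (no nested tables, each row holding two adjacent <td> cells); the class and method names are made up for illustration, and the "12345" in Main is a placeholder value. For HTML that is less regular than this, a real HTML parser would likely be the safer choice.

```csharp
using System;
using System.Text.RegularExpressions;

class SecondColumnDemo
{
    // Matches one <tr> row and captures the contents of its second <td>.
    // (?:(?!</td>).)* means "any characters up to the next </td>", so a match
    // cannot run past the end of a cell into the next row.
    // Assumption: simple, well-formed markup like the table in the question.
    static readonly Regex RowPattern = new Regex(
        @"<tr[^>]*>\s*<td[^>]*>(?:(?!</td>).)*</td>\s*<td[^>]*>(?<value>(?:(?!</td>).)*)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    // Removes any tags left inside a cell (e.g. <a>...</a>).
    static readonly Regex TagPattern = new Regex(@"<[^>]*>");

    public static string[] ExtractSecondColumn(string html)
    {
        MatchCollection matches = RowPattern.Matches(html);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            string raw = matches[i].Groups["value"].Value;
            values[i] = TagPattern.Replace(raw, "").Trim();
        }
        return values;
    }

    static void Main()
    {
        string html =
            "<table BORDER=\"1\">\n" +
            "<tr>\n<td>Licensee Name</td>\n<td BGCOLOR=\"#ffffcc\">JOHN Doo</td>\n</tr>\n" +
            "<tr>\n<td>License Number</td>\n<td BGCOLOR=\"#ffffcc\">12345</td>\n</tr>\n" +
            "</table>";

        // Prints "JOHN Doo" and then "12345", one per line
        foreach (string value in ExtractSecondColumn(html))
            Console.WriteLine(value);
    }
}
```

Compiling the pattern once (RegexOptions.Compiled, stored in a static field) avoids re-parsing it for every page, which matters if many pages are processed.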
 
