regular expression help needed

henrik

Hi

I have a regex question. I want to match the entire content of a
<td class="someclass"> element. This means the expression should capture
everything, including any other tags, between <td class="someclass"> and </td>.

Please help

Regards

Henrik
 
toupeira23

Hi Henrik

I guess something like /^<[^>]+>(.*)<[^>]+>$/ should do the trick,
although I'm not really sure about that greedy matching... (i.e. will
$1 include *everything* after the first tag, or only up to the closing
tag?)


hth,
Markus
 
Ludovic SOEUR

It depends on what you are really looking for.

If your document looks like this: <tr><td class="myClass">Some text</td></tr>
then regex=new Regex(@"<td\s+class=""myClass""\s*>(.*?)</td>");
gives you the result: "Some text"

If your document looks like this: <tr><td class="myClass"> <table><tr><td class="anotherClass">Other text</td></tr></table> </td></tr>
then the same regex=new Regex(@"<td\s+class=""myClass""\s*>(.*?)</td>");
gives you the result: " <table><tr><td class="anotherClass">Other text"
which is not what you are looking for, because the lazy (.*?) stops at the first </td>.

If you remove the question mark from the regex (which makes the match greedy),
it works for the two previous examples but not for the following one.
If your document looks like this: <tr><td class="myClass">Some text</td></tr><tr><td class="anotherClass">Other text</td></tr>
the greedy version gives you "Some text</td></tr><tr><td class="anotherClass">Other text"
because (.*) runs on to the last </td> in the document.
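
A minimal console sketch of the lazy and greedy behaviour on the three documents above. The class and method names (LazyVsGreedyDemo, FirstContent) are made up for illustration, and the pattern is written with "" for the quotes and \s* before > so that it compiles and matches the samples.

```csharp
using System;
using System.Text.RegularExpressions;

static class LazyVsGreedyDemo
{
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    public static string FirstContent(string pattern, string html)
    {
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Same pattern twice; only the quantifier inside the group differs.
    public const string Lazy = @"<td\s+class=""myClass""\s*>(.*?)</td>";
    public const string Greedy = @"<td\s+class=""myClass""\s*>(.*)</td>";

    static void Main()
    {
        string simple = @"<tr><td class=""myClass"">Some text</td></tr>";
        string nested = @"<tr><td class=""myClass""> <table><tr><td class=""anotherClass"">Other text</td></tr></table> </td></tr>";
        string siblings = @"<tr><td class=""myClass"">Some text</td></tr><tr><td class=""anotherClass"">Other text</td></tr>";

        foreach (string doc in new[] { simple, nested, siblings })
        {
            // Lazy stops at the first </td>, greedy at the last one.
            Console.WriteLine("lazy:   [" + FirstContent(Lazy, doc) + "]");
            Console.WriteLine("greedy: [" + FirstContent(Greedy, doc) + "]");
        }
    }
}
```

Neither quantifier handles all three documents, which is why nested cells need something that counts opening and closing tags.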

My question is: do you have nested <td> tags? In that case, you have to
use balanced matching (balancing groups).
Could you give us the most complicated example of the files you have?

Ludovic SOEUR.
 
Ludovic SOEUR

I worked out the case where you can have nested tags, using balanced matching:

public void showTDContent(string content) {
    // Each opening <td> is pushed on the "Open" stack and each </td> pops one;
    // (?(Open)(?!)) fails the match if a tag is still open.
    Regex regex=new Regex(
        @"<td class=\""someclass\"">(?<tdcontent>.*?((?=<td)|(?=</td))"
        + @"(((?<Open><td.*?>).*?((?=<td)|(?=</td)))+"
        + @"((?<Close-Open></td>).*?((?=<td)|(?=</td)))+)*"
        + @"(?(Open)(?!)))</td>");
    MatchCollection matches=regex.Matches(content);
    foreach(Match match in matches) {
        string sMatch=match.Groups["tdcontent"].Value;
        MessageBox.Show(sMatch);
        showTDContent(sMatch); //Try to find nested tags
    }
}

Here is an example:
showTDContent(@"<table><tr><td class=""someclass""><table><tr><td>someText</td><td class=""someclass"">otherText</td></tr></table></td></tr><tr><td class=""someclass"">thirdText</td></tr></table>");

It returns three strings:
1) <table><tr><td>someText</td><td class="someclass">otherText</td></tr></table>
2) otherText
3) thirdText

If you don't have nested tags like in the example above, you can keep the SAME
regular expression, but you don't need the recursion:
public void showTDContent(string content) {
    Regex regex=new Regex(
        @"<td class=\""someclass\"">(?<tdcontent>.*?((?=<td)|(?=</td))"
        + @"(((?<Open><td.*?>).*?((?=<td)|(?=</td)))+"
        + @"((?<Close-Open></td>).*?((?=<td)|(?=</td)))+)*"
        + @"(?(Open)(?!)))</td>");
    MatchCollection matches=regex.Matches(content);
    foreach(Match match in matches) {
        MessageBox.Show(match.Groups["tdcontent"].Value);
    }
}

To understand the regular expression, have a look at
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx. It explains
how balanced matching works:
<
[^<>]*
(
(
(?<Open><)
[^<>]*
)+
(
(?<Close-Open>>)
[^<>]*
)+
)*
(?(Open)(?!))


My regex is nearly the same:
<td\s+class=\"someclass\">(?<tdcontent>
.*?((?=<td)|(?=</td))
(
(
(?<Open><td.*?>)
.*?((?=<td)|(?=</td))
)+
(
(?<Close-Open></td>)
.*?((?=<td)|(?=</td))
)+
)*
(?(Open)(?!))
)</td>

In fact,
[^<>]* is replaced by .*?((?=<td)|(?=</td)), which lazily consumes text up to
the next opening or closing TD tag,
the single < and > are replaced by <td.*?> and </td>,
and everything else is exactly the same.
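
To see the balancing mechanism in isolation, here is a small sketch that only checks whether angle brackets are balanced. It assumes an anchored variant of the bracket pattern (^ and $ added so the whole input must be consumed), and the names BalancedBracketsDemo and IsBalanced are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedBracketsDemo
{
    // Each '<' pushes a capture on the "Open" stack, each '>' pops one through
    // (?<Close-Open>...), and (?(Open)(?!)) rejects the input if anything is left open.
    static readonly Regex Balanced = new Regex(
        @"^[^<>]*(((?<Open><)[^<>]*)+((?<Close-Open>>)[^<>]*)+)*(?(Open)(?!))$");

    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }

    static void Main()
    {
        Console.WriteLine(IsBalanced("<abc><mno<xyz>>")); // True: three opens, three closes
        Console.WriteLine(IsBalanced("<abc><mno<xyz>"));  // False: one '<' is never closed
    }
}
```

The TD version works the same way, with whole tags taking the place of the single bracket characters.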


Hope everything helps,

Ludovic SOEUR.
 
henrik

Hi guys

Thank you for your help!

I solved it with <td[\ \s]class="myClass">([\s\S]*?)</td>. I do not have
nested tables, so this did the trick. Very close to some of your
suggestions.
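
For anyone who wants to try this, a minimal sketch of that pattern in C#. It assumes the capturing group is meant to be ([\s\S]*?), shortens [\ \s] to \s since \s already covers a space, and uses made-up names (SimpleTdDemo, CellContents). Because [\s\S] matches any character including newlines, no RegexOptions.Singleline is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SimpleTdDemo
{
    // Lazy [\s\S]*? stops at the first </td>, so this is only safe without nested cells.
    static readonly Regex Cell = new Regex(@"<td\s+class=""myClass"">([\s\S]*?)</td>");

    // Hypothetical helper: collects the content of every matching cell.
    public static List<string> CellContents(string html)
    {
        var contents = new List<string>();
        foreach (Match m in Cell.Matches(html))
        {
            contents.Add(m.Groups[1].Value);
        }
        return contents;
    }

    static void Main()
    {
        string html = "<tr><td class=\"myClass\">line one\nline two</td>"
                    + "<td class=\"myClass\"><b>bold</b></td></tr>";
        foreach (string content in CellContents(html))
        {
            Console.WriteLine("[" + content + "]");
        }
    }
}
```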

:)

Regards,

Henrik
 
