A question about a failing regular expression

Anthony P.

Hello Everyone,

My application needs to parse some HTML. As is usual in HTML parsing,
I just need the data between two HTML tags. So here is my regular
expression:

Dim myRegex2 = New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                         "((.|\n)*?)<sup>", _
                         RegexOptions.IgnoreCase)

Now, this is supposed to get the text between the <td headers ...> tag
and the <sup> tag. But instead, it returns the entire tag, including all
of the attributes. What am I doing wrong?

Thanks!
 
Anthony Papillion

Sorry, I forgot to add that after creating the expression I also call
myMatch = myRegex2.Match(sContent), which performs the match against
the string sContent.
 
eBob.com

The expression is returning what you have asked for. Maybe not what you are
interested in, but what you have asked for.

You need to look at what the author of my favorite reference (Balena) calls
"zero width positive/negative look-ahead/behind assertions". These are
"grouping constructs". Maybe you could use a "noncapturing group" - I don't
think I've used that construct.

(I'd like to be more specific but I am at the wrong computer at the moment.)
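
For readers following along, here is a rough sketch of what the look-behind/look-ahead idea mentioned above might look like. It is only an illustration, not tested against the original page: the sample string and the names LookaroundSketch, pattern and rx are made up for the example, and RegexOptions.Singleline is swapped in for the (.|\n) construct so that "." also matches newlines.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookaroundSketch
    Sub Main()
        ' Hypothetical sample input; the real sContent comes from the page being parsed.
        Dim sContent As String = _
            "<td headers=""re2 e1"" align=""right"" valign=""bottom"">12.5<sup>a</sup></td>"

        ' (?<=...) is a zero-width look-behind and (?=...) a zero-width look-ahead.
        ' Both must match, but neither becomes part of the matched text.
        Dim pattern As String = _
            "(?<=<td headers=""re2 e1"" align=""right"" valign=""bottom"">)" & _
            ".*?" & _
            "(?=<sup>)"

        ' Singleline lets "." match newlines as well (an assumption standing in for (.|\n)).
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        Dim m As Match = rx.Match(sContent)
        If m.Success Then
            ' m.Value holds only the text between the two tags.
            Console.WriteLine(m.Value)
        End If
    End Sub
End Module
```

With this approach the whole match is already the wanted text, so no group lookup is needed. Whether that reads better than a plain capturing group is a matter of taste.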

ALSO ... do yourself a favor and get a FREE product named Expresso from
Ultrapico. It is WONDERFUL for developing regular expressions.

Regular expressions are very useful but not very intuitive. Ask if you have
further questions.

Good Luck, Bob
 
Branco

Anthony P. wrote:
My application needs to parse some HTML. As is usual in HTML parsing,
I just need the data between two HTML tags. So here is my regular
expression:

Dim myRegex2 = New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                         "((.|\n)*?)<sup>", _
                         RegexOptions.IgnoreCase)

Now, this is supposed to get the text between the <td headers ...> tag
and the <sup> tag. But instead, it returns the entire tag, including all
of the attributes. What am I doing wrong?
<snip>

You have probably figured it out by now, but it seems you need to
retrieve the grouped text from the Match's Groups property. The Groups
collection is 0-based, but item 0 is the full matched text, so you
need to retrieve Groups(1):

<example>
Dim M As Match = MyRegex2.Match(sContent)
Do While M.Success
    ' Groups(1) holds the text captured by the first parenthesized group
    Dim Text As String = M.Groups(1).Value
    '...
    'Do something with Text
    '...
    M = M.NextMatch()
Loop
</example>
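
To make the difference between group 0 and group 1 concrete, here is a small self-contained sketch that can be pasted into a console project. The sample HTML and the names GroupsSketch and rx are invented for illustration; only the pattern itself comes from the original post.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupsSketch
    Sub Main()
        ' Hypothetical sample input containing two matching cells.
        Dim sContent As String = _
            "<td headers=""re2 e1"" align=""right"" valign=""bottom"">12.5<sup>a</sup></td> " & _
            "<td headers=""re2 e1"" align=""right"" valign=""bottom"">7<sup>b</sup></td>"

        ' Same pattern as in the original post.
        Dim rx As New Regex("<td headers=""re2 e1"" align=""right"" valign=""bottom"">" & _
                            "((.|\n)*?)<sup>", RegexOptions.IgnoreCase)

        For Each m As Match In rx.Matches(sContent)
            ' Groups(0) is the whole match, opening tag and <sup> included.
            Console.WriteLine("Groups(0): " & m.Groups(0).Value)
            ' Groups(1) is only the text captured by the outer parentheses.
            Console.WriteLine("Groups(1): " & m.Groups(1).Value)
        Next
    End Sub
End Module
```

Note that the inner (.|\n) is itself a capturing group, so Groups(2) also exists and holds the last single character it matched. Writing it as (?:.|\n) would keep it out of the Groups collection.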

HTH

Regards,

Branco
 
Anthony Papillion

You have probably figured it out by now, but it seems you need to
retrieve the grouped text from the Match's Groups property. The Groups
collection is 0-based, but item 0 is the full matched text, so you
need to retrieve Groups(1):
<snip>

Hi Branco,

No, I hadn't figured it out yet, and I thank you for your help. I saw
something about the match's groups the other day, but it didn't click
that that was what I needed. Thank you, sir!

Anthony
 
