regex problem

  • Thread starter: japi

japi

Hi,

As a regex beginner, I am having a little trouble here.

Suppose I want to parse the following HTML fragment:

<li>
a
</li>
<li>
b
</li>


I would like a regular expression that matches each <li> start tag and
its corresponding end tag, including the inner text.

When I use the following regex (in SingleLine mode), it matches the
first <li> with the last </li> tag, which returns only one match
instead of two.

so

<li>(?<itemcontents>.*)</li>

seems to match the following as a whole (which is not my intention):

<li>
a
</li>
<li>
b
</li>

I hope my problem is clear, and somebody here can help me :)

Thanks
Jaap
 
japi said:
When I use the following regex (in SingleLine mode), it matches the
first <li> with the last </li> tag, which returns only one match
instead of two.

so

<li>(?<itemcontents>.*)</li>

Use .*? instead of .*

The ? makes the quantifier lazy (without it, the quantifier is greedy).
A lazy .*? matches as few characters as possible, so it stops at the
_next_ </li>. A greedy .* matches as many characters as possible, so it
runs up to the _last_ </li>.
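
To see the difference side by side, here is a rough sketch in Python rather than the regex flavor used in the original post. Two things are assumptions of this sketch: Python spells named groups as (?P<name>...) instead of (?<name>...), and re.DOTALL stands in for SingleLine mode. The variable names are made up for illustration.

```python
import re

# The HTML fragment from the question, one tag or letter per line.
html = "<li>\na\n</li>\n<li>\nb\n</li>\n"

# re.DOTALL lets "." match newlines, like SingleLine mode.
greedy = re.compile(r"<li>(?P<itemcontents>.*)</li>", re.DOTALL)
lazy = re.compile(r"<li>(?P<itemcontents>.*?)</li>", re.DOTALL)

# Greedy: one match that runs from the first <li> to the last </li>.
greedy_items = [m.group("itemcontents") for m in greedy.finditer(html)]

# Lazy: one match per <li>...</li> pair; strip() drops the surrounding newlines.
lazy_items = [m.group("itemcontents").strip() for m in lazy.finditer(html)]

print(len(greedy_items))  # 1
print(lazy_items)         # ['a', 'b']
```

The only change between the two patterns is the extra ? after .*, which is enough to go from one match to two.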

hth,
Max
 