regex problem

japi

Hi,

As a regex beginner I am having a little trouble here.

Suppose I want to parse the following HTML fragment:

<li>
a
</li>
<li>
b
</li>


I would like a regular expression that matches each li start tag and
its corresponding end tag, including its inner text.

When I use the following regex (in SingleLine mode), it matches from
the first <li> to the last </li> tag, which returns only one match
instead of two.

So

<li>(?<itemcontents>.*)</li>

seems to match the following as a whole (which is not my intention):

<li>
a
</li>
<li>
b
</li>

I hope my problem is clear, and somebody here can help me :)

Thanks
Jaap
 
Markus Stoeger

japi said:
When i use the following regex (in SingleLine mode) it matches the
first <li> with the last </li> tag, which results in returning only one
match instead of 2.

so

<li>(?<itemcontents>.*)</li>

Use .*? instead of .*

The ? makes the quantifier lazy (without it, it is greedy). A lazy .*?
matches as few characters as possible, so it stops at the _next_ </li>.
A greedy .* matches as many characters as possible, so it runs up to
the _last_ </li>.
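
To see the difference side by side, here is a minimal sketch in Python rather than the .NET-style syntax used in the question. Two things are assumptions on my part: re.DOTALL stands in for SingleLine mode (so that . also matches newlines), and the named group is written (?P<itemcontents>...) because that is how Python spells it. The variable names are made up for illustration.

```python
import re

# The HTML fragment from the question, as one string with newlines.
html = "<li>\na\n</li>\n<li>\nb\n</li>\n"

# Greedy: .* grabs as much as possible, so it runs to the last </li>.
greedy = re.compile(r"<li>(?P<itemcontents>.*)</li>", re.DOTALL)

# Lazy: .*? grabs as little as possible, so it stops at the next </li>.
lazy = re.compile(r"<li>(?P<itemcontents>.*?)</li>", re.DOTALL)

greedy_items = [m.group("itemcontents") for m in greedy.finditer(html)]
lazy_items = [m.group("itemcontents") for m in lazy.finditer(html)]

print(len(greedy_items))                 # 1
print([s.strip() for s in lazy_items])   # ['a', 'b']
```

The captured text still contains the surrounding newlines, hence the strip() when printing.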

hth,
Max
 
