Regex match backwards?

Brent

I'm hoping I can get some pointers on this regular expression problem:

1) Text to look in:

<tr><td>Hello, salut</td><td><a href="http://URL1.com">URL1</a></td><td
align=center><small>Goodbye, a bientot</small></td><td align=center><a
href=URL2.html>URL2</a></td>

2) Possible (but wrong) regex:

a.href\=(?<listenURL>.*?)\>(?=URL2)

The problem is that <listenURL> returns everything from the "href"
before "URL1" up until "URL2". I only want it to return "URL2.html". Is
there any way to match backwards? Or any other solutions?

Thanks for your help.

--Brent
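
For anyone who wants to reproduce the behaviour described above, here is a minimal sketch in Python (an assumption on my part; the patterns in this thread use .NET syntax, so the named group is respelled as (?P<listenURL>...)). It suggests why the lazy .*? does not help: the engine still starts at the leftmost place where a.href= can match, and laziness only shortens the right-hand end of the capture. The sample text is joined onto one line, assuming the line breaks in the post are just wrapping.

```python
import re

# Sample text from the post, joined onto a single line (assumed to be wrapped).
html = (
    '<tr><td>Hello, salut</td><td><a href="http://URL1.com">URL1</a></td><td '
    'align=center><small>Goodbye, a bientot</small></td><td align=center><a '
    'href=URL2.html>URL2</a></td>'
)

# The pattern from the post; only the named-group spelling is changed for Python.
original = re.compile(r'a.href=(?P<listenURL>.*?)>(?=URL2)')

match = original.search(html)
captured = match.group('listenURL')

# The match begins at the FIRST "a href=", so the capture runs from the URL1
# link all the way to "href=URL2.html".
print(captured)
```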
 
Mark Harris

Try this to find all URLs in the HTML document:
<a\shref="?(?<listenURL>.+?)"?>

This only works on <a href="bleh">, not on <a style="stuff" href="bleh">.

If you want to match both forms, you'd need something more like:
<a\s[^>]*?href="?(?<listenURL>[^"\s>]+)"?[^>]*>

- Mark H
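
A rough sketch of the "find every URL" approach in Python, for illustration only. The named groups are respelled as (?P<listenURL>...) because Python does not accept the .NET form. The second pattern is a hypothetical variant of my own that uses negated character classes so the match cannot run past the end of the tag, and the extra <a style="stuff" href="bleh"> link is added to the sample just to exercise it.

```python
import re

# Sample text from the question, joined onto one line, plus one extra link
# (hypothetical) that has an attribute in front of href.
html = (
    '<tr><td>Hello, salut</td><td><a href="http://URL1.com">URL1</a></td><td '
    'align=center><small>Goodbye, a bientot</small></td><td align=center><a '
    'href=URL2.html>URL2</a></td>'
    '<a style="stuff" href="bleh">x</a>'
)

# Simple form: href must be the first attribute of the tag.
simple = re.compile(r'<a\shref="?(?P<listenURL>.+?)"?>')

# Hypothetical tolerant form: [^>] keeps every part of the match inside one tag.
tolerant = re.compile(r'<a\s[^>]*?href="?(?P<listenURL>[^"\s>]+)"?[^>]*>')

urls_simple = [m.group('listenURL') for m in simple.finditer(html)]
urls_tolerant = [m.group('listenURL') for m in tolerant.finditer(html)]

print(urls_simple)    # the link with style= in front of href is skipped
print(urls_tolerant)  # all three links
```

Note that this collects every link, so picking out the one whose text is "URL2" would still need a further check on the link text.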
 
xicheng

Brent said:
I'm hoping I can get some pointers on this regular expression problem:

1) Text to look in:

<tr><td>Hello, salut</td><td><a href="http://URL1.com">URL1</a></td><td
align=center><small>Goodbye, a bientot</small></td><td align=center><a
href=URL2.html>URL2</a></td>

2) Possible (but wrong) regex:

a.href\=(?<listenURL>.*?)\>(?=URL2)

How about this:

href=(")?(?<listenURL>(?:(?!href).)*)(?(1)")(?=>URL2)

The (?:(?!href).)* part refuses to step over another "href", so the captured URL cannot start at an earlier link. (Untested.)

Xicheng
 
Xicheng Jia

how about this:
=> href=(")?(?<listenURL>(?:(?!href).)*)(?(1)")(?=>URL2)

Err, better to change this to the non-greedy form:

href=(")?(?<listenURL>(?:(?!href).)*?)(?(1)")(?=>URL2)

This will not work properly if there are other attributes between the "href" attribute
and the closing '>'.

Xicheng:)
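
Here is a small Python sketch of the non-greedy pattern, offered as an illustration rather than a tested .NET solution. Python needs (?P<listenURL>...) for the named group but, as far as I know, accepts the (?(1)") conditional unchanged. The second sample string is hypothetical and only there to show the limitation with extra attributes.

```python
import re

# Sample text from the question, joined onto one line (assumed to be wrapped).
html = (
    '<tr><td>Hello, salut</td><td><a href="http://URL1.com">URL1</a></td><td '
    'align=center><small>Goodbye, a bientot</small></td><td align=center><a '
    'href=URL2.html>URL2</a></td>'
)

# (?:(?!href).)*? consumes one character at a time, but never a character
# that starts another "href", so a match cannot span two links.
# (?(1)") requires a closing quote only if group 1 matched an opening quote.
pattern = re.compile(
    r'href=(")?(?P<listenURL>(?:(?!href).)*?)(?(1)")(?=>URL2)'
)

m = pattern.search(html)
print(m.group('listenURL'))

# Hypothetical input with an attribute after href: the pattern still matches,
# but the capture now includes that trailing attribute as well.
with_extra = '<a href=URL2.html class=x>URL2</a>'
extra = pattern.search(with_extra)
print(extra.group('listenURL'))
```

If other attributes can follow the href, one option is to replace the tempered dot with a class such as [^"\s>] so the capture stops at the first quote, space or '>'.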
 
