Luhar
After much scouring of information on Regular Expressions from books and the
web, I've come up with this handy little Regex to parse links from HTML:
<a\s+href(?:\s+)?=(?:\s+)?[""']+(.?[^'""]+)['""]+(?:\s+)?>(.*?)</a>
It works quite well at extracting the url and title of a link from an anchor
tag, with one major problem--if the anchor tag includes other attributes
after the HREF= attribute, such as TITLE= or TARGET=, it doesn't consider it
a match. Here are some examples:
This one matches:
<a href="/">Home</a>
Group 1: "/"
Group 2: "Home"
This one doesn't:
<a href="/" target="_blank">Home</a>
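For anyone who wants to reproduce the two cases above, here is a minimal sketch in Python (chosen only for illustration; the names LINK_RE and parse_link are hypothetical and not part of my actual code). The pattern is copied verbatim, including the doubled quotes, which are redundant but harmless inside a character class:

```python
import re

# Pattern copied verbatim from the post. The triple-quoted raw string lets
# both quote characters appear without escaping.
LINK_RE = re.compile(
    r'''<a\s+href(?:\s+)?=(?:\s+)?[""']+(.?[^'""]+)['""]+(?:\s+)?>(.*?)</a>'''
)


def parse_link(html):
    """Return (url, text) for the first anchor the pattern matches, else None."""
    match = LINK_RE.search(html)
    return match.groups() if match else None


# Only href is present, so the pattern matches.
print(parse_link('<a href="/">Home</a>'))                  # ('/', 'Home')

# An attribute follows href, so the pattern finds no match.
print(parse_link('<a href="/" target="_blank">Home</a>'))  # None
```

The second call fails because, after the closing quote of the href value, the pattern only allows optional whitespace before the ">".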
I can't figure out how to match just the href attribute and just the link
text. Any help would be appreciated.
Thanks.