.NET Regex href

Ryan Moore

I am trying to write a regular expression that extracts all href links
from an HTML page. I'm currently using the following:

href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))

but it has a problem with hrefs enclosed in single quotes, such as:

<a href='anotherpage.htm'>

I'm not a regex guru, can anyone point me in the right direction?

Thanks!
 
Robby

You could add a character class that accepts both " and ', like the following:

href\s*=\s*(?:["'](?<1>[^"']*)["']|(?<1>\S+))

You will need to add the escapes for C# strings yourself. I use VB. :)

Robby
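
For C# readers, here is a minimal sketch of how the pattern above might be used, assuming a verbatim string literal (@"...") so the backslashes stay single and each double quote is written as "". The class name HrefDemo and the sample HTML are made up for illustration; they are not from the original posts.

```csharp
using System;
using System.Text.RegularExpressions;

class HrefDemo
{
    static void Main()
    {
        // Hypothetical sample input: one double-quoted and one single-quoted href.
        string html = "<a href=\"first.htm\">one</a> <a href='anotherpage.htm'>two</a>";

        // Verbatim string: regex backslashes are not doubled,
        // and "" stands for a single double-quote character.
        string pattern = @"href\s*=\s*(?:[""'](?<1>[^""']*)[""']|(?<1>\S+))";

        // Group 1 holds the link target, whichever alternative matched.
        foreach (Match m in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            Console.WriteLine(m.Groups[1].Value);
        }
        // prints:
        // first.htm
        // anotherpage.htm
    }
}
```

Two caveats with this pattern: the character class ["'] does not require the closing quote to be the same kind as the opening one, and the unquoted alternative \S+ also captures a trailing > in markup such as <a href=page.htm>. Changing that alternative to [^\s>]+ is one possible way to avoid the second issue.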
 
