Mick Walker
Hi,
I am using the following function to match any URLs within a string
containing the HTML of a webpage:
public List<string> DumpHrefs(String inputString)
{
    Regex r;
    Match m;
    List<string> LstURLs = new List<string>();
    r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (m = r.Match(inputString); m.Success; m = m.NextMatch())
    {
        LstURLs.Add(m.Groups[1].ToString());
    }
    return LstURLs;
}
However, the problem is that this returns every link on the page, and
I only want to return fully qualified links such as
http://www.domain.com/page.html, not relative links.
Does anyone know how I can modify my regex to do so?
Regards
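
For reference, one possible way to narrow the pattern is sketched below. It is not a definitive answer: it assumes "fully qualified" means the href value starts with http:// or https://, and the names HrefDumper, DumpAbsoluteHrefs and the group name url are made up for illustration. The idea is simply to require the scheme at the start of the captured value in each alternative of the original pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefDumper
{
    // Assumption: a "fully qualified" link begins with http:// or https://.
    // Three alternatives: double-quoted, single-quoted, and unquoted values.
    private static readonly Regex AbsoluteHref = new Regex(
        @"href\s*=\s*(?:""(?<url>https?://[^""]*)""|'(?<url>https?://[^']*)'|(?<url>https?://[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical variant of DumpHrefs that skips relative links.
    public static List<string> DumpAbsoluteHrefs(string inputString)
    {
        var urls = new List<string>();
        for (Match m = AbsoluteHref.Match(inputString); m.Success; m = m.NextMatch())
        {
            // .NET lets the same group name appear in several alternatives;
            // whichever alternative matched supplies the value.
            urls.Add(m.Groups["url"].Value);
        }
        return urls;
    }
}
```

With this pattern an attribute such as href="/page.html" is skipped because the value does not start with a scheme. Other schemes (ftp://, mailto:) and protocol-relative links (//host/path) are also skipped under this assumption, so the scheme part would need widening if those should count.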