programmatically retrieve links from web page

  • Thread starter: Loane Sharp

Loane Sharp

Hi there

I am using the Microsoft XML v6.0 library to retrieve a web page from the
Internet, as follows:

Dim oHttp As Object
Dim content As String
Set oHttp = CreateObject("MSXML2.XMLHTTP")
' Third argument False makes the request synchronous
oHttp.Open "GET", "http://www.microsoft.com/default.aspx", False
oHttp.Send
content = oHttp.responseText

Once the page is downloaded, I want to search it for all URLs that link to
other web pages (i.e. the href values of <a> </a> tags). The problem is
that, given the huge diversity of link formats (relative and absolute
references, URL-encoding, etc.), I'm struggling to handle every case in
code.

Is there an easier way to retrieve the contents of a specific element in a
web page or, even better, to iterate over collections of elements? I've
tried a proper XML parser (MSXML2.DOMDocument40), but it doesn't seem to
cope with the loose structure of HTML pages.
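
To illustrate the kind of element-collection access I mean, here is a
minimal sketch. It assumes the late-bound "htmlfile" (MSHTML) object is
available on the machine, and ListLinks is a hypothetical name of my own.
It is untested against real pages and is only meant to show the shape of
the loop I'm after, not a definitive solution.

```vb
' Hypothetical helper: prints the href of every <a> element in the
' HTML text passed in (e.g. the content string downloaded above).
Sub ListLinks(ByVal content As String)
    Dim doc As Object
    Dim anchor As Object
    Dim href As Variant

    ' Assumption: the MSHTML parser is registered under the "htmlfile" ProgID
    Set doc = CreateObject("htmlfile")
    doc.Write content      ' let the HTML parser deal with the loose markup
    doc.Close

    For Each anchor In doc.getElementsByTagName("a")
        ' Assumption (legacy MSHTML behaviour): flag 2 returns the
        ' attribute as written in the page rather than a resolved URL
        href = anchor.getAttribute("href", 2)
        If Not IsNull(href) Then Debug.Print href
    Next anchor
End Sub
```

Calling ListLinks content after the download would list the raw href
values. Relative references would still need to be resolved against the
page's own address, since the parsed document does not know where the
text came from.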

Best regards
Loane
 