programmatically retrieve links from web page

Loane Sharp

Hi there

I am using the Microsoft XML v6.0 library to retrieve a web page from the
Internet, as follows:

Dim oHttp As Object
Dim content As String
Set oHttp = CreateObject("MSXML2.XMLHTTP")
' Third argument False = synchronous: Send blocks until the response arrives
oHttp.Open "GET", "http://www.microsoft.com/default.aspx", False
oHttp.Send
content = oHttp.responseText

Once the page is downloaded, I want to search it for all URLs that link
through to other web pages (i.e. those contained within <a> </a> tags). The
problem is that, given the huge diversity of link formats (relative and
absolute references, URL-encoding, etc.), I'm struggling to cover all the
possibilities in code.

Is there an easier way to retrieve the contents of a specific element in a
web page or, even better, to loop through collections of elements? I've
tried XML proper (MSXML2.DOMDocument40), but this doesn't seem to work with
the loose structure of HTML pages.

Best regards
Loane
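
One way to avoid hand-writing every link format is to let a tolerant HTML parser find the <a> tags and a URL library resolve relative references. Below is a minimal sketch of that idea. It is in Python rather than VB, purely to illustrate the technique, and it is not a drop-in answer for the MSXML code above. The names LinkCollector and extract_links are made up for this example, and the sketch assumes the page has no <base href> tag, which it ignores.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL.

    Hypothetical helper for illustration; <base href> is not handled.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin leaves absolute URLs alone and resolves relative ones.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all link targets found in html, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p>Unclosed paragraph <a href="/en/about.aspx">About</a>'
    for link in extract_links(page, "http://www.microsoft.com/default.aspx"):
        print(link)
```

The parser does not require well-formed XML, so unclosed tags and mixed-case markup are tolerated. On the VB side, I believe the MSHTML library's document object exposes a comparable links collection, but I have not verified that against this setup.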
 
