How to extract all links/URLs from a web page?


learnyourabc

For a webcrawler, you need to extract all links from the web page.
Normal HTML anchor tags, or the src and href attributes on any tag,
can be easily extracted using IHTMLDocument.
But what about links inside a JavaScript function, like the one below?

<HEAD>
<SCRIPT language="JavaScript">
<!--hide

function newwindow()
{
window.open('jex5.htm','jav','width=300,height=200,resizable=yes');
}
//-->
</SCRIPT>

<A HREF="javascript:newwindow()" >Click Here!</A>

or a JavaScript function like the following:
function newwindow()
{
......
window.location('http://www.google.com')
}

<input type=button onclick="javascript:newwindow()" >Click Here!

How can I extract the links from these JavaScript functions?

Any help would be much appreciated. Can a crawler extract such links,
and if so, how?
 

vincent.apesa

Regular expressions are the best way to go. Store the entire HTML
contents in a string and search it for pattern matches. You can find
a ton of RegEx tutorials online.
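
A minimal sketch of the regular-expression approach, written in JavaScript for Node.js (an assumption; the thread does not say what language the crawler uses). The function name extractScriptLinks and the pattern itself are illustrative only. It looks for quoted string literals passed to window.open or window.location, or assigned to location / location.href, so it will miss any URL that is not written out as a literal.

```javascript
// Sketch: pull URL string literals out of inline JavaScript with a regex.
// The pattern is an illustrative assumption, not an exhaustive list of
// the ways a script can navigate.
function extractScriptLinks(html) {
  // Matches window.open('...'), window.location('...'), and
  // location = '...' / location.href = '...' (single or double quotes).
  const pattern =
    /(?:window\.open\s*\(|(?:window\.)?location(?:\.href)?\s*(?:=|\())\s*(['"])(.*?)\1/gi;
  const links = [];
  for (const match of html.matchAll(pattern)) {
    links.push(match[2]); // group 2 is the text between the quotes
  }
  return links;
}

// Usage with the two functions from the question.
const page = `
<SCRIPT language="JavaScript">
function newwindow()
{
window.open('jex5.htm','jav','width=300,height=200,resizable=yes');
}
function go()
{
window.location('http://www.google.com')
}
</SCRIPT>
<A HREF="javascript:newwindow()">Click Here!</A>`;

console.log(extractScriptLinks(page));
// [ 'jex5.htm', 'http://www.google.com' ]
```

Relative results such as jex5.htm still have to be resolved against the URL of the page they came from before the crawler can follow them.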
 

learnyourabc

Regular expressions can only extract a link if it appears inside the
JavaScript as clear text. How can I automatically extract all links
that are formed inside JavaScript, say by combining several variables
to build the link? Do I have to execute the script for the onclick
handler, etc., to get the link? Does anyone have any suggestions?
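
One possible direction, sketched under assumptions rather than offered as a definitive answer: execute the script in a sandbox where window.open and window.location are replaced by stubs that record whatever URL they are given. The sketch below uses Node.js and its built-in vm module instead of IHTMLDocument. The name captureLinks, the example.com URLs, and the sample script are all hypothetical.

```javascript
const vm = require("vm");

// Sketch: run page script in a sandbox whose window.open and
// window.location are stubs that record every URL passed to them.
// captureLinks is a hypothetical helper, not part of any library.
function captureLinks(scriptSource, handlerCall) {
  const captured = [];
  const record = (url) => { captured.push(String(url)); };

  // window.location is modelled as callable (the question calls it like a
  // function) and also records assignments to location.href.
  const location = function (url) { record(url); };
  Object.defineProperty(location, "href", { set: record, get: () => "" });

  const windowStub = { open: record };
  Object.defineProperty(windowStub, "location", {
    get: () => location,
    set: record, // handles window.location = '...'
  });

  const sandbox = { window: windowStub };
  vm.createContext(sandbox);
  // Define the page's functions first, then fire the handler code.
  vm.runInContext(scriptSource, sandbox, { timeout: 1000 });
  vm.runInContext(handlerCall, sandbox, { timeout: 1000 });
  return captured;
}

// Usage: the link is assembled from variables, so a regex would miss it.
const script = `
var base = 'http://www.example.com/';   // hypothetical site
var page = 'item' + 42 + '.htm';
function newwindow() {
  window.open(base + page, 'jav', 'width=300,height=200');
}
function jump() {
  window.location.href = base + 'next.htm';
}`;

// Strip the "javascript:" prefix from onclick/href values before passing them.
console.log(captureLinks(script, "newwindow(); jump();"));
// logs the two assembled URLs, ending in item42.htm and next.htm
```

The crawler would call this once for every onclick or javascript: handler it finds. Scripts that touch document or other browser objects will throw in this bare sandbox, and the vm module is not a security boundary, so untrusted pages call for a real browser engine or an isolated process.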
 
