Steve Ocsic
Hi,
I've coded a basic crawler whereby you enter a URL and it then
crawls that URL. What I would like to do now is take it one
step further and do the following:
1. Pick up the URLs I would like to crawl from a database and pass
them to the crawler. Once the crawler has crawled the website, I would
then like to put a flag against it so that the URL is not processed
again for a certain period of time (a rough sketch of what I have in
mind is below).
2. Create a pool of crawlers that can individually be invoked to
process a given URL, each running on a separate thread (again, see the
sketch after the first one).
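For point 1, this is roughly the sort of thing I'm imagining, in Python.
The table name (crawl_queue), its columns (url, last_crawled), and the
cooldown period are all just placeholders I've made up:

    import sqlite3
    import time

    COOLDOWN_SECONDS = 24 * 60 * 60  # made-up cooldown: skip URLs crawled in the last 24h

    def fetch_due_urls(conn):
        """Return URLs never crawled, or whose cooldown has expired."""
        cutoff = time.time() - COOLDOWN_SECONDS
        rows = conn.execute(
            "SELECT url FROM crawl_queue "
            "WHERE last_crawled IS NULL OR last_crawled < ?",
            (cutoff,),
        ).fetchall()
        return [row[0] for row in rows]

    def mark_crawled(conn, url):
        """Flag the URL with the current timestamp so it is skipped until the cooldown expires."""
        conn.execute(
            "UPDATE crawl_queue SET last_crawled = ? WHERE url = ?",
            (time.time(), url),
        )
        conn.commit()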
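And for point 2, I was thinking of a thread pool along these lines, where
crawl() just stands in for my existing single-URL crawler:

    from concurrent.futures import ThreadPoolExecutor

    def crawl(url):
        """Placeholder for my existing single-URL crawler."""
        print(f"crawling {url}")

    def crawl_all(urls, max_workers=5):
        # Each URL is handed to one worker thread from the pool.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            pool.map(crawl, urls)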
Also, what is the best way to make sure that the crawler is not
hammering a website?
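My naive idea so far is just to enforce a minimum delay between requests
to the same host, something like the sketch below (the one-second delay
is an arbitrary guess, and I suspect this dict needs a lock once the
thread pool from point 2 is involved):

    import time
    from urllib.parse import urlparse

    last_hit = {}    # host -> timestamp of the last request to that host
    MIN_DELAY = 1.0  # guessed "polite" per-host delay, in seconds

    def wait_for_host(url):
        """Sleep if the same host was requested less than MIN_DELAY seconds ago."""
        host = urlparse(url).netloc
        elapsed = time.time() - last_hit.get(host, 0)
        if elapsed < MIN_DELAY:
            time.sleep(MIN_DELAY - elapsed)
        last_hit[host] = time.time()

Is that the right general approach, or should I be doing something
smarter (robots.txt, crawl-delay, etc.)?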
Sorry about all of the questions; I'm a newbie. If anyone can point
me in the right direction, it would be greatly appreciated.
Steve