Steve Ocsic
Hi,
I've coded a basic crawler whereby you enter a URL and it then
crawls that URL. What I would like to do now is take it one
step further and do the following:
1. Pick up the URLs I would like to crawl from a database and pass
them to the crawler. Once the crawler has crawled the website, I would
then like to put a flag against it so that the URL is not processed
again for a certain period of time (a rough sketch of what I have in
mind is below).
2. Create a pool of crawlers that can individually be invoked to
process a given URL, each running on a separate thread (again, see the
sketch after the first one).
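For point 1, this is roughly the sort of thing I'm imagining, in Python.
The table name (crawl_queue), its columns (url, last_crawled), and the
cooldown period are all just placeholders I've made up:

    import sqlite3
    import time

    COOLDOWN_SECONDS = 24 * 60 * 60  # made-up cooldown: skip URLs crawled in the last 24h

    def fetch_due_urls(conn):
        """Return URLs never crawled, or whose cooldown has expired."""
        cutoff = time.time() - COOLDOWN_SECONDS
        rows = conn.execute(
            "SELECT url FROM crawl_queue "
            "WHERE last_crawled IS NULL OR last_crawled < ?",
            (cutoff,),
        ).fetchall()
        return [row[0] for row in rows]

    def mark_crawled(conn, url):
        """Flag the URL with the current timestamp so it is skipped until the cooldown expires."""
        conn.execute(
            "UPDATE crawl_queue SET last_crawled = ? WHERE url = ?",
            (time.time(), url),
        )
        conn.commit()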
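And for point 2, I was thinking of a thread pool along these lines, where
crawl() just stands in for my existing single-URL crawler:

    from concurrent.futures import ThreadPoolExecutor

    def crawl(url):
        """Placeholder for my existing single-URL crawler."""
        print(f"crawling {url}")

    def crawl_all(urls, max_workers=5):
        # Each URL is handed to one worker thread from the pool.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            pool.map(crawl, urls)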
Also, what is the best way to make sure that the crawler is not
hammering a website?
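My naive idea so far is just to enforce a minimum delay between requests
to the same host, something like the sketch below (the one-second delay
is an arbitrary guess, and I suspect this dict needs a lock once the
thread pool from point 2 is involved):

    import time
    from urllib.parse import urlparse

    last_hit = {}    # host -> timestamp of the last request to that host
    MIN_DELAY = 1.0  # guessed "polite" per-host delay, in seconds

    def wait_for_host(url):
        """Sleep if the same host was requested less than MIN_DELAY seconds ago."""
        host = urlparse(url).netloc
        elapsed = time.time() - last_hit.get(host, 0)
        if elapsed < MIN_DELAY:
            time.sleep(MIN_DELAY - elapsed)
        last_hit[host] = time.time()

Is that the right general approach, or should I be doing something
smarter (robots.txt, crawl-delay, etc.)?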
Sorry about all of the questions; I'm a newbie. If anyone can point
me in the right direction, it would be greatly appreciated.
Steve