Downloading files

Thread starter: ssduli
I am writing a web crawler to download all HTML and JPEG files to a local
directory. Since DNS lookups are a possible bottleneck, I have decided to
incorporate a DNS cache in the form of a hashtable. The URLs are resolved
before they are passed on to the crawler for crawling.
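
For illustration, here is a rough sketch of the kind of cache I have in mind (the class and member names are placeholders, not my actual code):

    using System.Collections.Generic;
    using System.Net;

    // Rough sketch of a hashtable-style DNS cache keyed by host name.
    class DnsCache
    {
        private readonly Dictionary<string, IPAddress> cache =
            new Dictionary<string, IPAddress>();

        public IPAddress Resolve(string host)
        {
            IPAddress address;
            if (!cache.TryGetValue(host, out address))
            {
                // Only consult the external name servers on a cache miss.
                address = Dns.GetHostAddresses(host)[0];
                cache[host] = address;
            }
            return address;
        }
    }

(If the crawler is multithreaded, access to the dictionary would of course need to be synchronized.)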


My problem is that the methods in the WebClient class take a string
address, presumably the website's DNS name, rather than the IP address
of the website. I assume this means that the address will still have to
be resolved using external name servers, which would render my
local DNS cache useless.

Is there a way of accessing a website by providing an IP address rather
than a host name? Or do I have to resort to using sockets?
 
Simply replace the domain name in the URL with the IP address.

Example:

http://google.com
http://64.233.167.99
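
In code, that substitution might look something like this (a sketch only: DownloadByIp is an illustrative name, and the IP address would come from your own cache):

    using System;
    using System.Net;

    class Example
    {
        // Downloads a page via a pre-resolved IP address. The address
        // is assumed to come from your DNS cache.
        static string DownloadByIp(string url, IPAddress ip)
        {
            UriBuilder builder = new UriBuilder(url);
            builder.Host = ip.ToString(); // e.g. google.com -> 64.233.167.99

            using (WebClient client = new WebClient())
            {
                return client.DownloadString(builder.Uri);
            }
        }
    }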

Some websites, however, change their behavior based upon the URL you use to
reach them: servers that host several sites on one IP address (name-based
virtual hosting) pick the site from the HTTP Host header, so a request made
by IP address may return a different site or an error.

Your DNS lookups should be cached by the OS anyway, by the way. To view the
current DNS cache, run ipconfig /displaydns from a command prompt.

Cheers,

Greg
 
