Downloading files


ssduli

I am writing a web crawler to download all HTML and JPG files to a local
directory. Since DNS lookups are a possible bottleneck, I have decided to
incorporate a DNS cache in the form of a hashtable. The URLs are resolved
before they are passed on to the crawler for crawling.
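
In outline, the cache I have in mind looks something like this (a minimal
sketch only; the class name and members are illustrative, and it ignores
DNS TTLs):

using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical sketch of the hashtable-based DNS cache described above.
class DnsCache
{
    private readonly Dictionary<string, IPAddress> cache =
        new Dictionary<string, IPAddress>(StringComparer.OrdinalIgnoreCase);

    public IPAddress Resolve(string host)
    {
        IPAddress address;
        if (cache.TryGetValue(host, out address))
            return address;                 // cache hit: no external lookup

        // Cache miss: one lookup against the name servers, then remember it.
        // Real DNS records carry TTLs, which this sketch does not honor.
        address = Dns.GetHostAddresses(host)[0];
        cache[host] = address;
        return address;
    }
}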


My problem is that the methods in the WebClient class take a string
address, presumably the website's DNS name, rather than the IP address
of the website. I assume this means the address will still have to be
resolved using external name servers, which would render my local DNS
cache useless.

Is there a way of accessing a website by providing an IP address rather
than the URL? Or do I have to resort to using sockets?
 
Simply remove the domain in the URL and replace it with the IP address.

Example:

http://google.com
http://64.233.167.99

Some websites, however, change their behavior based on the URL you use to
reach them: name-based virtual hosting relies on the Host header, so a
request made by IP alone may not return the site you expect.
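
If you need those sites to work too, one option is to send the request to
the IP but keep the original name in the Host header. A minimal sketch,
assuming .NET Framework 4 or later (where HttpWebRequest.Host is settable;
the host name and URL below are just placeholders):

using System;
using System.IO;
using System.Net;

class IpFetch
{
    static void Main()
    {
        // Illustrative values; substitute the name and the address
        // from your own cache.
        string host = "example.com";
        IPAddress ip = Dns.GetHostAddresses(host)[0];

        // Address the request to the IP, but send the original host
        // name in the Host header so virtually hosted sites still
        // serve the right content.
        var request = (HttpWebRequest)WebRequest.Create("http://" + ip + "/");
        request.Host = host;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd().Length + " bytes downloaded");
        }
    }
}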

By the way, the OS already caches DNS lookups for you. To view the current
DNS cache on Windows, run ipconfig /displaydns from a command prompt.

Cheers,

Greg
 