Downloading files

Thread starter: ssduli
I am writing a web crawler to download all HTML and JPEG files to a local
directory. Since DNS lookups are a possible bottleneck, I have decided to
incorporate a DNS cache in the form of a hashtable. The URLs are resolved
before they are passed on to the crawler for crawling.
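
For illustration, here is a rough sketch of the kind of cache I have in mind (the class and member names are placeholders, not my actual code):

    using System.Collections.Generic;
    using System.Net;

    // Rough sketch of a hashtable-style DNS cache keyed by host name.
    class DnsCache
    {
        private readonly Dictionary<string, IPAddress> cache =
            new Dictionary<string, IPAddress>();

        public IPAddress Resolve(string host)
        {
            IPAddress address;
            if (!cache.TryGetValue(host, out address))
            {
                // Only consult the external name servers on a cache miss.
                address = Dns.GetHostAddresses(host)[0];
                cache[host] = address;
            }
            return address;
        }
    }

(If the crawler is multithreaded, access to the dictionary would of course need to be synchronized.)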


My problem is that the methods in the WebClient class take a string
address, presumably the website's DNS name, rather than the IP address
of the website. I assume this means that the address will still have to
be resolved using external name servers, which would render my
local DNS cache useless.

Is there a way of accessing a website by providing an IP address rather
than a host name? Or do I have to resort to using sockets?
 
Simply replace the domain name in the URL with the IP address.

Example:

http://google.com
http://64.233.167.99
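
In code, that substitution might look something like this (a sketch only: DownloadByIp is an illustrative name, and the IP address would come from your own cache):

    using System;
    using System.Net;

    class Example
    {
        // Downloads a page via a pre-resolved IP address. The address
        // is assumed to come from your DNS cache.
        static string DownloadByIp(string url, IPAddress ip)
        {
            UriBuilder builder = new UriBuilder(url);
            builder.Host = ip.ToString(); // e.g. google.com -> 64.233.167.99

            using (WebClient client = new WebClient())
            {
                return client.DownloadString(builder.Uri);
            }
        }
    }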

Some websites, however, change their behavior based upon the URL you use to
reach them: servers that host several sites on one IP address (name-based
virtual hosting) pick the site from the HTTP Host header, so a request made
by IP address may return a different site or an error.

Your DNS lookups should be cached by the OS anyway, by the way. To view the
current DNS cache, run ipconfig /displaydns from a command prompt.

Cheers,

Greg
 
