Howto? Download multiple (increasing) webpages?

  • Thread starter lennart

lennart

Hi everyone,

On the net, I've found an old book that I need for my studies (yes,
it's copyright-free; it's from the 17th century). You can view the
pages with URLs like this: .../book/page001.html,
.../book/page002.html, etc. Some pages are missing
(.../book/page093.html does not exist, but 92 and 94 do).

What kind of grabber do you recommend? It has to work with filters.
 
lennart said:
Hi everyone,

On the net, I've found an old book that I need for my studies (yes,
it's copyright-free; it's from the 17th century). You can view the
pages with URLs like this: .../book/page001.html,
.../book/page002.html, etc. Some pages are missing
(.../book/page093.html does not exist, but 92 and 94 do).

What kind of grabber do you recommend? It has to work with filters.

I've used HTTrack (httrack.com) on and off for about a year. Very
useful. One thing: please take the time to read through the
configuration setup. Otherwise it can open more connections to the
target site than is considered "polite."

hth,
-Craig
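
For a strictly numbered sequence like the one described, a short script is a possible alternative to a full mirroring tool. The following is only a minimal sketch using the Python standard library. The base URL `https://example.org/book/` is a hypothetical placeholder (the real address is elided in the question), and the three-digit zero padding is assumed from the `page001.html` pattern. Missing pages such as page 93 are skipped when the server answers 404, and a pause between requests keeps the load on the site low.

```python
import time
import urllib.error
import urllib.request


def page_urls(base, first, last, width=3):
    """Build URLs like <base>page001.html for first..last (inclusive)."""
    return [f"{base}page{n:0{width}d}.html" for n in range(first, last + 1)]


def fetch_existing(urls, opener=urllib.request.urlopen, delay=1.0):
    """Download each URL in turn; skip pages that return HTTP 404.

    `opener` is injectable so the function can be tried without network
    access. `delay` is the pause in seconds after every request.
    """
    pages = {}
    for url in urls:
        try:
            with opener(url) as resp:
                pages[url] = resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # anything other than "missing page" is a real error
        time.sleep(delay)
    return pages


if __name__ == "__main__":
    # Hypothetical base URL; replace it with the real book address.
    for url in page_urls("https://example.org/book/", 1, 3):
        print(url)
```

Calling `fetch_existing(page_urls(base, 1, 200))` would then return a dictionary of the pages that exist, keyed by URL.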
 
Craig said:
I've used HTTrack (httrack.com) on and off for about a year. Very
useful.

Me too. I use the Firefox extension SpiderZilla, which integrates
HTTrack into Firefox. Very handy.

One thing: please take the time to read through the configuration
setup. Otherwise it can open more connections to the target site than
is considered "polite."

Can you please explain what "impolite" means to you? I often have the
impression that my filters aren't targeted enough. When I want to
download only the PDF files of a website, I also get all the useless
rest.

- Frank
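
To illustrate what a tighter filter does in principle, here is a small sketch of include/exclude matching on URL paths. It is generic Python using shell-style wildcards and is not HTTrack's own filter syntax; the function name `wanted` and the example URLs are hypothetical.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse


def wanted(url, include=("*.pdf",), exclude=()):
    """Return True if the URL's path matches an include pattern
    and no exclude pattern. Patterns are expected in lowercase."""
    # Only the path is compared, so query strings are ignored.
    path = urlparse(url).path.lower()
    if any(fnmatch(path, pat) for pat in exclude):
        return False
    return any(fnmatch(path, pat) for pat in include)


if __name__ == "__main__":
    links = [
        "https://example.org/docs/report.pdf",
        "https://example.org/index.html",
        "https://example.org/img/logo.png",
    ]
    print([u for u in links if wanted(u)])
```

Applying such a test before each download means only the PDF links are fetched, and the rest of the site is never requested.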
 
FTR said:
Can you please explain what "impolite" means to you? I often have the
impression that my filters aren't targeted enough. When I want to
download only the PDF files of a website, I also get all the useless
rest.

- Frank

Hey Frank,

A very good description of "impolite" HTTrack behavior is written up
here: http://www.httrack.com/html/abuse.html. In a nutshell, everything
described is "common sense" as long as the user makes the effort to
understand the implications of the technologies being used.

Of paramount concern (to me at least) are two points:
1) Honoring copyrighted material (this includes mirroring).
2) Not swamping the target site with too many connections or
unfiltered requests, and not doing it during business hours.

hth,
-Craig
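
The second point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Throttle` class and `outside_business_hours` helper are hypothetical names, the 9-to-17 window is an assumed definition of business hours (in the target site's local time), and the requests are assumed to go out one at a time over a single connection.

```python
import time


class Throttle:
    """Allow at most one request per `min_interval` seconds."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep
        self._last = None        # time of the previous request, if any

    def wait(self):
        """Block until enough time has passed since the last request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def outside_business_hours(hour, start=9, end=17):
    """True if `hour` (0-23) falls outside the assumed start..end window."""
    return not (start <= hour < end)


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)
    if outside_business_hours(time.localtime().tm_hour):
        for n in range(3):
            throttle.wait()  # call this before every single request
            print("request", n)
```

Calling `throttle.wait()` before each download keeps the request rate bounded no matter how quickly the pages themselves come back.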
 