james.dixon
Hi
I have been struggling with what should be a simple thing. I want to
crawl the web for specific links, and save any HTML pages and files that
meet my criteria to my hard drive.
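Here is a minimal sketch of the link-selection step. It is written in Python for illustration, not in .NET. It collects the href of every <a> tag on a fetched page, resolves relative links against the page's URL, and keeps only the links that meet a criterion. The "ends in .pdf" test and the names `LinkCollector` and `matching_links` are hypothetical placeholders for whatever criteria actually apply.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute href targets from <a> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def matching_links(html, base_url, suffixes=(".pdf",)):
    """Return links ending in one of the (lowercase) suffixes.

    The suffix test is a hypothetical stand-in for real crawl criteria.
    """
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return [link for link in collector.links
            if link.lower().endswith(suffixes)]
```

For example, for a page at `http://www.example.com/reports/` containing `<a href="docs/report.pdf">`, this returns `http://www.example.com/reports/docs/report.pdf`. Those URLs would then be the input to whatever download step is used.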
I thought that WebRequest/WebResponse, going through a proxy, would be
the best way to go. However, the page I get back is just a
proxy-generated page, not the source of the page I requested. Is there
any way around this?
As a workaround, I used the AxWebBrowser control and an mshtml document.
This works well, except when it comes to downloading files. As stated
above, the easy options don't work: WebRequest/WebResponse return a
404 (File Not Found) error (which you would expect, given that the
request never gets through to the HTML), WebClient doesn't work behind
a proxy, and, after some searching, downloading linked files
programmatically with AxWebBrowser is proving difficult.
So, my questions to the smart people out there are:
a) Is there any way to get the WebRequest/WebResponse objects working
behind a proxy when only proxy-generated source (and not the file's
source code) is returned to the WebRequest object? Could this have
anything to do with proxy authorisation?
b) Is there a way to programmatically download PDF files using
AxWebBrowser?
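If question a) does come down to proxy authorisation, the usual fix is to give the HTTP client the proxy's address and credentials explicitly. The sketch below shows that idea using Python's standard `urllib.request`, not .NET. The proxy address and credentials are hypothetical. It also only covers proxies that accept Basic authentication, so proxies that require NTLM or another Windows-integrated scheme would need a different handler. In .NET, the corresponding step would be to create a `WebProxy`, set its `Credentials`, and assign it to the request's `Proxy` property.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url, user=None, password=None):
    """Build an opener that sends http/https traffic through proxy_url.

    If credentials are given, they are offered to the proxy via Basic
    auth when it answers 407 Proxy Authentication Required.
    """
    handlers = [urllib.request.ProxyHandler({"http": proxy_url,
                                             "https": proxy_url})]
    if user is not None:
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        # realm=None means "use these credentials for any realm" at this proxy.
        password_mgr.add_password(None, proxy_url, user, password)
        handlers.append(urllib.request.ProxyBasicAuthHandler(password_mgr))
    return urllib.request.build_opener(*handlers)


def download(opener, url, path):
    """Stream the response body for url into a local file at path."""
    with opener.open(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

For example, `download(build_proxy_opener("http://proxy.example.com:8080", "user", "secret"), pdf_url, "report.pdf")` would save one file found by the crawler. Whether the proxy then returns the real file rather than its own page depends on how it is configured.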
Grateful for any advice.
Cheers
James