WebRequest from behind a proxy

james.dixon

Hi

I have been struggling with what should be a simple thing. I want to
crawl the web for specific links, and save any html and files that meet
my criteria to my hard drive.

I thought that WebRequest/WebResponse would be the best way to go,
using a proxy. The page I get back is just a proxy-generated file,
and not the page's source code. Is there any way around this?

As a workaround, I used the AxWebBrowser control and an mshtml
document. This works well, except when I come to downloading files. As
stated above, the easy options don't work - WebRequest/WebResponse
return a "404: File Not Found" error (which you would expect, given
that the request doesn't get to the HTML), WebClient doesn't work
behind a proxy, and after doing some searching, it is proving difficult
to programmatically download linked files using AxWebBrowser.

So, my questions to the smart people out there include:

a) is there any way to get the WebRequest/WebResponse objects working
behind a proxy when only proxy-generated source is returned to the
WebRequest object (and not the file's source code)? Could this be
anything to do with proxy authorisation? (roughly what I mean is
sketched below);

b) is there a way to programmatically download PDF files using
AxWebBrowser?
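
For context, the sketch I mentioned under (a) - roughly the sort of
thing I have been trying - looks like this; the proxy address,
credentials and URL are placeholders rather than my real setup:

using System;
using System.IO;
using System.Net;

// Hypothetical proxy and credentials - substitute real values.
WebProxy proxy = new WebProxy("http://myproxy:8080", true);
proxy.Credentials = new NetworkCredential("username", "password", "DOMAIN");

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com/page.htm");
request.Proxy = proxy;

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    // If the proxy settings are right, this should be the page's own HTML.
    Console.WriteLine(reader.ReadToEnd());
}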

Grateful for any advice.

Cheers

James
 
Michael C

> b) is there a way to programmatically download PDF files using
> AxWebBrowser?
>
> Grateful for any advice.

Can't you just use the TcpClient class to get the page? The syntax is very
simple and you get the full text of the page back unprocessed, so you can do
what you like with it. All you need to do is send:

GET /pagename.htm HTTP/1.1
HOST: nameofhost.com

then two CRLFs to end the request.

Michael
 
james.dixon

Thanks Yosh - yes, I had seen that and tried WebRequest etc., but
couldn't get through.

Michael - could you provide more information (or links to more
information) on how to set up a simple TcpClient - I haven't done it
before.

Thanks

James
 
Michael C

> Thanks Yosh - yes, I had seen that and tried WebRequest etc., but
> couldn't get through.
>
> Michael - could you provide more information (or links to more
> information) on how to set up a simple TcpClient - I haven't done it
> before.

Note that the slash after the GET is the page you're requesting - in
this case just "/", the default page for MS.

using System;
using System.Net.Sockets;
using System.Text;

TcpClient client = new TcpClient();
client.Connect("www.microsoft.com", 80);
NetworkStream stream = client.GetStream();

// Connection: close tells the server to close the socket when it has
// finished sending, so the read loop below knows when to stop.
byte[] data = Encoding.ASCII.GetBytes(
    "GET / HTTP/1.1\r\nHost: www.microsoft.com\r\nConnection: close\r\n\r\n");
stream.Write(data, 0, data.Length);

// Read until the server closes the connection (Read returns 0).
byte[] buffer = new byte[256];
int len;
while ((len = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    Console.Write(Encoding.ASCII.GetString(buffer, 0, len));
}
stream.Close();
client.Close();
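
If you have to go out through a proxy, I'd expect the same approach to
work by connecting to the proxy instead and putting the full URL in the
request line - something like this rough sketch (the proxy name and
port are made up):

using System;
using System.Net.Sockets;
using System.Text;

// Connect to the proxy rather than to the web server itself.
TcpClient client = new TcpClient();
client.Connect("myproxy", 8080);
NetworkStream stream = client.GetStream();

// With a proxy, the request line carries the absolute URL of the page.
// An authenticating proxy would also want a Proxy-Authorization header.
byte[] data = Encoding.ASCII.GetBytes(
    "GET http://www.microsoft.com/ HTTP/1.1\r\n" +
    "Host: www.microsoft.com\r\n" +
    "Connection: close\r\n\r\n");
stream.Write(data, 0, data.Length);

// Read the response exactly as before.
byte[] buffer = new byte[256];
int len;
while ((len = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    Console.Write(Encoding.ASCII.GetString(buffer, 0, len));
}
stream.Close();
client.Close();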

Michael
 
james.dixon

Thanks Michael - that code worked beautifully. I have also found
another proxy server I can use which allows me to use WebRequest and
WebResponse, so case closed ...
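
In case it helps anyone else later, the download step now looks roughly
like this (the proxy address, URL and file path are placeholders, not
my real setup):

using System.IO;
using System.Net;

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.example.com/report.pdf");
request.Proxy = new WebProxy("http://workingproxy:8080", true);

using (WebResponse response = request.GetResponse())
using (Stream source = response.GetResponseStream())
using (FileStream target = File.Create(@"C:\downloads\report.pdf"))
{
    // Copy the response to disk in chunks - works for HTML and binary files alike.
    byte[] buffer = new byte[4096];
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        target.Write(buffer, 0, read);
    }
}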
 
