HTTPWebRequest not working with Wikipedia

Mugunth · Mar 10, 2008

I'm trying to use the HttpWebRequest class to retrieve a web page. The
code is below.

string google = "http://www.google.com.sg/search?hl=en&btnI=I'm+feeling+Lucky&q=";
string wikipedia = "http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search=";

string website = wikipedia; // wikipedia does not work, google works

string query = textBoxUserQuery.Text;

// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(website + query);

// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

// we will read data via the response stream
Stream resStream = response.GetResponseStream();

Somehow, when I use Google I get a response, whereas with Wikipedia
I get an HTTP error stating:
The remote server returned an error: (403) Forbidden.

The status is "System.Net.WebExceptionStatus.ProtocolError".

However, I'm able to request a page like http://en.wikipedia.org/wiki/Main_Page,
but cannot access the search page.

Am I missing something? Please help.

Mugunth

Jon Skeet [C# MVP] · Mar 10, 2008

Mugunth said:
I'm trying to use a HTTPWebRequest class to retrieve a webpage. Below
is the following code....

Could you produce a short but complete (preferably console)
application which demonstrates the problem? See http://pobox.com/~skeet/csharp/complete.html
for what I mean by that.

Jon

Mugunth · Mar 10, 2008

I've posted the complete code in my prev post.
It's a console app.

string google = "http://www.google.com.sg/search?hl=en&btnI=I'm+feeling+Lucky&q=";
string wikipedia = "http://en.wikipedia.org/wiki/Special:Search?fulltext=Search&search=";

string website = wikipedia; // wikipedia does not work, google works

string query = "Microsoft";

// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(website + query);

// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

// we will read data via the response stream
Stream resStream = response.GetResponseStream();

The request.GetResponse() call throws an exception when I use the
Wikipedia search, but runs fine and returns an HTML page when I use
Google.

Any Help is appreciated,
Mugunth

Jon Skeet [C# MVP] · Mar 10, 2008

Mugunth said:
I've posted the complete code in my prev post.
It's a console app.

Your previous post contained a reference to "textBoxUserQuery.Text"
which doesn't sound like a console app.

See http://pobox.com/~skeet/csharp/incomplete.html

If it doesn't start with using directives and a class declaration, it's
unlikely to be complete.

Try cutting and pasting what you've posted into a brand new text file
and compile it. It won't work.

Marc Gravell · Mar 10, 2008

I can reproduce the 403... at the end of the day, if they want to
prevent this type of access that is their prerogative?

You could probably go to town trying to spoof a standard request, but
I suspect you might be in violation of their policies (I haven't
checked).

Alternatively, host it in a WebBrowser control (which wraps shdocvw), or
search for a *supported* search API / web service.

Marc
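
For anyone who wants to confirm what sits behind the ProtocolError, here is a rough sketch (not code from the thread). The StatusProbe class and its method names are made up for illustration; the only framework pieces it relies on are WebException.Status and WebException.Response.

```csharp
using System;
using System.Net;

// Hypothetical helper, not from the original posts.
static class StatusProbe
{
    // Summarises a WebException: the transport-level status, plus the HTTP
    // status code when the server actually sent a response back.
    public static string Describe(WebException ex)
    {
        var response = ex.Response as HttpWebResponse;
        if (response == null)
        {
            return ex.Status.ToString();
        }
        return string.Format("{0} ({1} {2})",
            ex.Status, (int)response.StatusCode, response.StatusCode);
    }

    // Issues a GET and returns the numeric status code on success,
    // or the failure summary when GetResponse throws.
    public static string Probe(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return ((int)response.StatusCode).ToString();
            }
        }
        catch (WebException ex)
        {
            return Describe(ex);
        }
    }
}
```

Calling Console.WriteLine(StatusProbe.Probe(website + query)) from a console Main should show whether the server answered with a 403 or the request failed before any response arrived.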

Nicholas Paldino [.NET/C# MVP] · Mar 10, 2008

This action is disallowed by Wikipedia. If you check the robots.txt
file:

http://en.wikipedia.org/robots.txt

You will see this in it:

User-agent: *
Disallow: /wiki/Special:Search

So your response of 403 - Forbidden is expected. They don't want you
doing this.
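
A client could check for such a rule before sending the request. The sketch below is a deliberately simplified illustration, not a full robots.txt parser: the RobotsCheck class is hypothetical, and it only does prefix matching of Disallow lines in the "User-agent: *" group, ignoring Allow rules, wildcards, inline comments, and agent-specific groups.

```csharp
using System;

// Hypothetical helper, not from the original posts.
static class RobotsCheck
{
    // Returns true when a "Disallow:" rule in the "User-agent: *" group
    // is a prefix of the given path.
    public static bool IsDisallowedForAll(string robotsTxt, string path)
    {
        bool inWildcardGroup = false;
        foreach (string rawLine in robotsTxt.Split('\n'))
        {
            string line = rawLine.Trim();
            if (line.StartsWith("User-agent:", StringComparison.OrdinalIgnoreCase))
            {
                // Track whether the current group applies to every agent.
                inWildcardGroup = line.Substring("User-agent:".Length).Trim() == "*";
            }
            else if (inWildcardGroup &&
                     line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
            {
                string rule = line.Substring("Disallow:".Length).Trim();
                // An empty Disallow value means nothing is blocked.
                if (rule.Length > 0 && path.StartsWith(rule, StringComparison.Ordinal))
                {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the two lines quoted above as input, the path /wiki/Special:Search matches the rule while /wiki/Main_Page does not, which is consistent with the behaviour described in the original post.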

Peter Bromberg [C# MVP] · Mar 10, 2008

Wikipedia exposes its content for search via several APIs, including a few
that have been written and are managed by third parties. There is an XML
version that returns the MediaWiki markup for a result page inside an XML
element. You would still have to convert the wiki markup to formatted HTML, a
process which is not trivial. As Nicholas indicated, Wikipedia doesn't want
people "faking" their search box and redisplaying the scraped content.
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
Short Urls & more: http://ittyurl.net
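
As a rough illustration of the supported-API route, the sketch below builds a search URL for the MediaWiki Action API. None of this comes from the posts above: the endpoint (/w/api.php), the parameters (action=query, list=search, srsearch, format=xml), and the WikiSearch class name are assumptions, so check them against the current API documentation before relying on them.

```csharp
using System;

// Hypothetical helper, not from the original posts.
static class WikiSearch
{
    // Assumed endpoint of the MediaWiki Action API for English Wikipedia.
    const string Endpoint = "https://en.wikipedia.org/w/api.php";

    // Builds a full-text search URL, escaping the user's query so that
    // characters such as '#', '&' and spaces cannot break the query string.
    public static Uri BuildSearchUri(string query)
    {
        return new Uri(Endpoint
            + "?action=query&list=search&format=xml&srsearch="
            + Uri.EscapeDataString(query));
    }
}
```

The resulting Uri can be passed to WebRequest.Create in place of the Special:Search URL. It is probably also worth setting request.UserAgent to a string that identifies your application, since HttpWebRequest sends no User-Agent header by default and some servers reject requests without one.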
