[OT?] download Wikipedia....

Lloyd Dupont

I would like to use some of the data from Wikipedia in one of my (.NET!)
programs.
I'm still at the stage of trying to figure out how to download the data.

Any tips on:
- how to download Wikipedia data?
- how to use the data once downloaded?
 
Alex Li

Do you mean scraping the Wikipedia web pages? If that is the case,
then you want to take a look at the System.Net namespace, in particular
the WebClient class or the HttpWebRequest class, for downloading the
content; then use a parser to extract the data from the web page content.
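
As a rough sketch of that idea (not a recommended way to harvest Wikipedia in bulk), the snippet below downloads one page with WebClient and pulls out its <title> text. The class name PageScraper, the User-Agent string and the regular expression are assumptions made up for illustration; a real HTML parser would be more robust than a regex.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class PageScraper
{
    // Downloads the raw HTML of a page as a string.
    public static string Download(string url)
    {
        using (WebClient client = new WebClient())
        {
            client.Encoding = Encoding.UTF8;
            // Hypothetical agent string; some sites refuse requests without one.
            client.Headers.Add("user-agent", "MyWikiTool/0.1 (example)");
            return client.DownloadString(url);
        }
    }

    // Crude extraction of the <title> element's text; returns null if absent.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : null;
    }
}
```

Calling PageScraper.ExtractTitle(PageScraper.Download(someUrl)) would then give the page title, where someUrl is whatever page you want to fetch.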

Does that help?
Alex
 
Lloyd Dupont

Nono...
I found the URL; you can download Wikipedia's books at:
http://download.wikimedia.org/

Now I am at the stage of trying to figure out what to do with this 136MB
XML file.
Obviously, basic XML tools which simply load it into memory are
inappropriate....
 
Gaurav Vaish (EduJini.IN)

Hi Lloyd,
> Now I am at the stage of trying to figure out what to do with this 136MB
> XML file.
> Obviously, basic XML tools which simply load it into memory are
> inappropriate....

What you are trying to achieve will determine a lot of things.
Maybe you need to upgrade to a 2GB machine to load all the XML into memory.
Maybe you only need serial access (XmlTextReader) and can make do with
512MB of RAM.
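
To illustrate the serial-access option, here is a minimal sketch that streams through the XML and yields the text of every <title> element without ever holding the whole document in memory. The class name DumpTitles is made up, and the assumption that the dump stores each page title in a <title> element should be checked against the actual file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DumpTitles
{
    // Lazily yields the text content of each <title> element in the input.
    public static IEnumerable<string> ReadTitles(TextReader input)
    {
        using (XmlTextReader xml = new XmlTextReader(input))
        {
            while (!xml.EOF)
            {
                if (xml.NodeType == XmlNodeType.Element && xml.LocalName == "title")
                {
                    // Reads the text and leaves the reader on the node after
                    // </title>, so no extra Read() call is needed here.
                    yield return xml.ReadElementContentAsString();
                }
                else
                {
                    xml.Read();
                }
            }
        }
    }
}
```

For the real file you would pass something like new StreamReader("theBigFile.xml") and iterate with foreach; only the current node is kept in memory at any time.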


--
Cheers,
Gaurav Vaish
http://www.mastergaurav.org
http://www.edujini.in
-------------------
 
Lloyd Dupont

>> Now I am the stage, trying to figure out what to do with this 136MB long
> What is that you are trying to achieve will define a lot of things.
> May be you need to upgrade to 2GB machine to load all XML in memory
> May be you need serial access (XMLTextReader) and can do with only
> 512MB of RAM.

I tried something very simple with XmlTextReader:
    XmlTextReader xml = new XmlTextReader("theBigFile.xml");
    while (!xml.EOF)
        xml.Skip();

It took ages.....
so I am kind of doubtful I could use it for anything useful....

But that's kind of surprising, as I found some other Wikipedia tools which
didn't seem to have any trouble.. mhh....
 
Lloyd Dupont

Hu.. that doesn't demonstrate much to me.
Anyway, interestingly, this:
===
XmlTextReader xml = new XmlTextReader("theBigFile.xml");
xml.ReadStartElement();   // <== new
while (!xml.EOF)
    xml.Skip();
===
works much better!...
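
A small variation on the loop above, as a sketch only: after stepping inside the root element, each Skip() jumps over one whole child subtree, so counting the element nodes seen along the way gives the number of top-level entries. The class name TopLevelCounter is made up for illustration.

```csharp
using System.IO;
using System.Xml;

public static class TopLevelCounter
{
    // Counts the direct child elements of the root element without
    // building an in-memory tree.
    public static int CountChildren(TextReader input)
    {
        int count = 0;
        using (XmlTextReader xml = new XmlTextReader(input))
        {
            xml.ReadStartElement();   // step inside the root element
            while (!xml.EOF)
            {
                if (xml.NodeType == XmlNodeType.Element)
                    count++;
                xml.Skip();           // jump over the whole child subtree
            }
        }
        return count;
    }
}
```

For the big file, TopLevelCounter.CountChildren(new StreamReader("theBigFile.xml")) would report how many top-level entries it contains.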
 
Gaurav Vaish (EduJini.IN)

Ha ha ha ha.
That tells me that we should be given access to the source code of the
application to check and report the code that results in these issues.

Let me also try it out.. should be interesting to work with :D
 
