Advice on webpage scraper

  • Thread starter: Veign

Veign

I have a friend looking for a webpage scraper application to extract
addresses from a page containing several hundred company names and
addresses.

Any help would be appreciated.
 
I have a friend looking for a webpage scraper application to extract
addresses from a page containing several hundred company names and
addresses.
Any help would be appreciated.
Chris Hanscom

http://www.lencom.com I bought the thing eventually,
but I think they have a basic version. The paid version
will even SPIDER its way through various domains.

Ned
 
I have a friend looking for a webpage scraper application to extract
addresses from a page containing several hundred company names and
addresses.

Since that's pretty specific to the page it's scraping, I doubt that
you'll find anything off the shelf.
 
I had come across one several years ago that did this, but I totally see what
you are saying. Since addresses can be very complex and inconsistent, as can
page coding styles, I can see that it may be a tall order...
 
Veign said:
I have a friend looking for a webpage scraper application to extract
addresses from a page containing several hundred company names and
addresses.

Any help would be appreciated.

Write a Visual FoxPro program to do it. :)

 
I had come across one several years ago that did this but I totally see what
you are saying. Since addresses can be very complex and inconsistent along
with page coding styles I can see that it may be a tall order...

Not so much inconsistent within a page (if that's the case, the program
probably can't be written; I tell my customers that there's a
second-order relationship between inconsistency and price: twice as
inconsistent causes 4 times the work, so it costs 4 times as much)
but inconsistent from one site to another. The program has to be
written for that site.

That said, writing a program to parse a web site is fairly easy if you
know any programming language. A language like Perl makes it trivial,
but even VB6 can go to the page, download it and save a file of names,
addresses and phone numbers from the data it gets.
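
As a rough illustration of the "go to the page, download it and save a
file" approach, here is a minimal sketch in Python (rather than Perl or
VB6). Everything about the markup is an assumption: it supposes each
company appears as <span class="name"> followed by <span class="address">,
which a real site almost certainly won't match exactly. The class and
function names (CompanyParser, parse_companies, scrape_to_csv) are
hypothetical and would have to be adapted to the site in question.

```python
import csv
import urllib.request
from html.parser import HTMLParser


class CompanyParser(HTMLParser):
    """Collects (name, address) pairs from the ASSUMED markup:
    <span class="name">...</span> followed by <span class="address">...</span>."""

    def __init__(self):
        super().__init__()
        self.records = []    # finished (name, address) tuples
        self._field = None   # field we are currently inside, if any
        self._current = {}   # text gathered for the record in progress

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "address"):
            self._field = cls
            self._current[cls] = ""

    def handle_data(self, data):
        if self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._field:
            finished = self._field
            self._field = None
            # Assumption: the address span closes a record, name comes first.
            if finished == "address":
                self.records.append((
                    self._current.get("name", "").strip(),
                    self._current["address"].strip(),
                ))
                self._current = {}


def parse_companies(html):
    """Return a list of (name, address) tuples found in an HTML string."""
    parser = CompanyParser()
    parser.feed(html)
    parser.close()
    return parser.records


def scrape_to_csv(url, out_path):
    """Download the page at `url` and save the records as a CSV file."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "address"])
        writer.writerows(parse_companies(html))
```

The parsing is kept separate from the download so it can be tried on a
saved copy of the page first; only the two tag checks would need
rewriting for a different site.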
 
I have written an application that parses web pages, but it doesn't handle
addresses. I was trying to avoid modifying the application, as the cost of
an existing program would be cheaper than me doing it.

Thanks for your help, but I will probably modify my app to handle the site's
content...

If you're interested in the application, it's free; it's at
http://www.veign.com/download_app.asp?app=108 . It should be going through
a major overhaul shortly that will clean up the results.
 
I have a friend looking for a webpage scraper application to extract

Do you really need a dedicated application for the job?

How about you just save the web page to a local copy and then use tools
to parse the source code? If the info is well delimited (eg with
quotes, semicolons or commas) then an editor that supports regular
expressions will strip them all out nicely (eg Regex Power!
http://www.ware4u.de/regexpower/download.html).
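
The same regular-expression idea can be tried outside an editor. Below is
a small Python sketch, assuming the saved data has fields wrapped in
double quotes and one company per line; that layout, and the function
name extract_quoted_fields, are made up for illustration. Note that
quoted HTML attributes would match too, so this only works once the
data lines have been isolated from the markup.

```python
import re

# Matches the text between a pair of double quotes (no escaped quotes handled).
QUOTED = re.compile(r'"([^"]*)"')


def extract_quoted_fields(text):
    """Return one list of quoted fields per line that contains any."""
    rows = []
    for line in text.splitlines():
        fields = QUOTED.findall(line)
        if fields:
            rows.append(fields)
    return rows
```

Because the pattern keys on the quotes rather than the commas, a comma
inside an address does not split the field.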

If the data is not delimited nicely but there is some other pattern, eg
CRLF around the names or addresses, or some tags on the fields (eg
"NAME:") then you could try an editor that supports macros (eg a macro
that deletes all but the names, then restart and use a macro that
deletes all but the first address line... etc) like Notepad++
(http://notepad-plus.sourceforge.net/uk/site.htm).
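
If a macro editor isn't handy, the tagged-field case can also be sketched
in a few lines of Python. This assumes each field sits on its own line
behind a label; "NAME:" is the tag mentioned above, while "ADDRESS:" and
the function name extract_labelled are hypothetical stand-ins for
whatever the real page uses.

```python
import re

# One labelled field per line; tolerates CRLF line endings.
LABELLED = re.compile(
    r'^[ \t]*(NAME|ADDRESS):[ \t]*(.+?)[ \t\r]*$',
    re.MULTILINE,
)


def extract_labelled(text):
    """Group labelled lines into records; each NAME starts a new record."""
    records = []
    current = {}
    for label, value in LABELLED.findall(text):
        if label == "NAME" and current:
            records.append(current)
            current = {}
        current[label.lower()] = value
    if current:
        records.append(current)
    return records
```

Lines without a recognised label are simply skipped, which is roughly
what the "delete all but the names" macro would do by hand.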

W4tch3r
 