Screen Scraping From Online Service's Web Site?

PeteCresswell

Got a mini-project coming up: user is logging in to an online
service, supplying some parms, hitting "Go" on the service's web
screen, and then manually capturing information from the screen into a
spreadsheet.

They want to automate this process as much as possible, substituting a
little MS Access app for the spreadsheet and having it pull the
information from the screen automagically.

I have never done anything like this before, but my sense is that
screen scraping per se is mostly obsolete: instead, I would be looking
for the site to expose one or more services that will supply the info.

Two questions:
------------------------------------------------------------------------------------------------
1) How does one poll a site to discover what services are exposed?
A colleague who does .NET says "no problem" with .NET, but
how about MS Access?

2) Lacking a service, would I be on the right track by somehow
capturing the raw HTML of the screen, parsing it,
and extracting the information?
------------------------------------------------------------------------------------------------
 
Tom van Stiphout

On Tue, 3 Nov 2009 07:42:22 -0800 (PST), PeteCresswell

Re #1: Ask the site. They probably have a document on how to integrate
with their services. There is typically also a file available that
describes what the service offers; for a SOAP web service that would
be the WSDL file. In .NET you can point at a URL and tell Visual Studio
to pull in that description and expose it as a set of classes and methods.
In VBA I would use this tool:
http://msdn.microsoft.com/en-us/library/aa140260(office.10).aspx
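
To make "what's offered by the service" concrete, here is a minimal sketch (in Python rather than VBA, purely for illustration) that lists the operations declared in a WSDL 1.1 document. The sample WSDL, the service name, and the operation names are all invented; a real site's WSDL will be much larger and would be fetched from whatever URL the provider documents.

```python
import xml.etree.ElementTree as ET

# Namespace used by WSDL 1.1 documents.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, heavily trimmed WSDL for illustration only.
SAMPLE_WSDL = """<definitions name="QuoteService"
             xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="QuotePortType">
    <operation name="GetQuote"/>
    <operation name="GetHistory"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text):
    """Return {portType name: [operation names]} for a WSDL 1.1 string."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for port_type in root.findall(f"{{{WSDL_NS}}}portType"):
        operations = [
            op.get("name")
            for op in port_type.findall(f"{{{WSDL_NS}}}operation")
        ]
        result[port_type.get("name")] = operations
    return result


if __name__ == "__main__":
    for port, ops in list_operations(SAMPLE_WSDL).items():
        print(port, ops)
```

The point is only that the service description is machine-readable: once the provider gives you the WSDL location, a tool (or a few lines of code) can enumerate what can be called.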

Re #2: You can, but it may be better to use the DOM (Document Object
Model) so you can work with page elements much as JavaScript does.
To get started, use a WebBrowser control and its
myWebBrowserControl.Document object, and set a reference to
"Microsoft HTML Object Library".
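
As a rough illustration of why addressing elements beats slicing raw text, the sketch below (Python, standard library only, not the WebBrowser control itself) pulls values out of a page by element id. The HTML fragment and the ids "balance" and "asof" are made up; a real page may not carry ids at all, in which case you would key on tag names or table position instead.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the ids are invented for illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Account summary</h1>
  <table>
    <tr><td>Balance</td><td id="balance">1,234.56</td></tr>
    <tr><td>As of</td><td id="asof">2009-11-03</td></tr>
  </table>
</body></html>
"""


class IdTextCollector(HTMLParser):
    """Collect the text inside elements whose id is in `wanted`.

    Simplification: it does not handle a wanted element that contains
    a nested element with the same tag name.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.values = {}
        self._current_id = None   # id of the element being captured
        self._current_tag = None  # tag name that will close the capture

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if self._current_id is None and element_id in self.wanted:
            self._current_id = element_id
            self._current_tag = tag
            self.values[element_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.values[self._current_id] += data

    def handle_endtag(self, tag):
        if self._current_id is not None and tag == self._current_tag:
            self.values[self._current_id] = self.values[self._current_id].strip()
            self._current_id = None
            self._current_tag = None


def extract_by_id(html_text, ids):
    """Return {id: text} for each requested id found in the HTML."""
    parser = IdTextCollector(ids)
    parser.feed(html_text)
    parser.close()
    return parser.values


if __name__ == "__main__":
    print(extract_by_id(SAMPLE_HTML, ["balance", "asof"]))
```

In Access the equivalent would go through the Document object (for example looking an element up by id and reading its text), but the idea is the same: locate the element, then read its content, rather than counting characters in the page source.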

-Tom.
Microsoft Access MVP
 
Larry Linson

Tom is correct: you need to contact the site and see if they have
instructions for how to access their data. See the following:

Once upon a time, a long time ago, in a land not so far away, there was a
journeyman Access person who was contracted to capture data using
then-current versions of Access from online Internet screens from a supplier
of financial information, who did not have a procedure for accessing their
data. The conditions of the contract were "quick and simple", meaning "quick
and dirty is OK", so the Access person set up a system where a user would
display the data, highlight the values and copy them into a text file, then
the user would execute the Access person's VBA code to extract the pertinent
data into an Access table.

This worked astonishingly well, but only for about one week, during which
time a webmaster at the supplier of financial information decided that the
screen would look better with a few small changes. So the Access person was
called in again, modified the VBA code, and it worked astonishingly well
again, for a comparable period of time. Same situation; same solution. After
a few repetitions, the client decided to take the Access person's original
advice, and called the provider of financial information.

Because this client was the largest volume user of the particular services
from the provider of financial information, the contact was happy to arrange
for a procedure for downloading the information to be created. The Access
person obtained the specifications, wrote a few VBA procedures to access it
and stash the data in the already-created tables, and, lo, there was never
another re-write required as long as the client and their provider were
doing business together!

I can vouch that the preceding is neither fiction nor a "fairy tale".

Larry Linson
Microsoft Office Access MVP
 
