Scraping/listing document URLs on a server when the documents have no web pages or existing links?

Keith R

We have a server-side database that includes BLOBs of MS Word documents.
Some of those documents have always been available via URL hyperlinks on an
intranet web page. We've asked the administrator to expose a new group of
documents from that database, although the links to those newly exposed
documents won't be built into the existing web interface (they should still
be available via individual intranet URLs).

As the first step in a new project, I'd like to scrape all of those document
URLs. I assume it would be something like a recursive tree search, since
there is a hierarchical order of the documents in the existing web
interface, for example:
/WPSAC/1-1000/356/356.doc
/WPSAC/1-1000/781/781.doc
/WPSAC/1001-2000/2294/2294.doc
/WPSAC/1001-2000/2770-2790/specials/2776.doc
/WPSAC/Revised/single_entry/B438.doc
etc.
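If (and only if) the web server publishes an HTML index page for each folder, the recursive tree search described above could look roughly like the sketch below. It is a minimal sketch under that assumption, which may not hold if the documents are served out of a database rather than a folder tree, and it ignores authentication, which an intranet server may require. The host name `intranet`, the "trailing slash means sub-folder" convention and the `.doc` filter are all assumptions. `fetch` is a parameter so the crawl can be tried against fake pages before pointing it at a real server.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url: str) -> str:
    """Default fetcher: download a page and decode it as text (no authentication)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_docs(start_url: str, fetch: Callable[[str], str] = fetch_url) -> list[str]:
    """Depth-first walk of index pages under start_url, returning .doc URLs in order found."""
    root = urlparse(start_url)
    seen: set[str] = set()
    found: list[str] = []

    def visit(url: str) -> None:
        if url in seen:
            return
        seen.add(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            target = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            parts = urlparse(target)
            # Stay on the same host and below the starting folder.
            if parts.netloc != root.netloc or not parts.path.startswith(root.path):
                continue
            # Skip links with a query string (index pages often use these for sorting).
            if parts.query:
                continue
            if parts.path.lower().endswith(".doc"):
                if target not in found:
                    found.append(target)
            elif parts.path.endswith("/"):
                visit(target)  # assumption: a trailing slash marks a sub-folder page

    visit(start_url)
    return found
```

Against a real server this would be called as `crawl_docs("http://intranet/WPSAC/")` (hypothetical URL); if the server returns no folder listings, the crawl will simply find nothing.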

I haven't worked with web stuff at all (although I'm decent with VBA). Any
pointers on where to start, to build a list in Excel where each sequential
cell contains a link to the next Word document?
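Once the URLs are collected, one possible way to get them into Excel is to write a CSV file in which each row holds the document name and an Excel `HYPERLINK` formula; Excel evaluates cells that start with `=` when it opens a CSV file. The function names and the two-column layout below are illustrative only. (From VBA, `Worksheet.Hyperlinks.Add` would be the equivalent way to build the links directly in a sheet.)

```python
import csv
import io
from pathlib import PurePosixPath
from urllib.parse import urlparse


def excel_text(value):
    """Escape a value for use inside an Excel string literal (double any quotes)."""
    return value.replace('"', '""')


def hyperlink_rows(urls):
    """Yield one row per URL: the document name and a HYPERLINK formula for it."""
    for url in urls:
        name = PurePosixPath(urlparse(url).path).name  # e.g. "356.doc"
        formula = f'=HYPERLINK("{excel_text(url)}","{excel_text(name)}")'
        yield [name, formula]


def write_link_sheet(urls, out):
    """Write a header row plus one hyperlink row per URL to a text stream."""
    writer = csv.writer(out)
    writer.writerow(["Document", "Link"])
    writer.writerows(hyperlink_rows(urls))


def link_sheet_text(urls):
    """Convenience wrapper: return the CSV content as a string."""
    buf = io.StringIO()
    write_link_sheet(urls, buf)
    return buf.getvalue()
```

To produce a file, something like `with open("links.csv", "w", newline="") as f: write_link_sheet(urls, f)` would do. Excel's handling of CSV files depends on the regional list separator, so in some locales a semicolon delimiter may be needed instead.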

My next step on the project will be to build a user interface so a user can
select a document number and have it automatically load the link, but I need
to get the links themselves first.
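For the later user-interface step, the lookup from a typed document number to its link could be a plain dictionary keyed on the file name without its extension. This sketch assumes document numbers are unique across folders and should match case-insensitively; both are assumptions, since `B438.doc` under `Revised` hints that the real numbering may be more involved.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def build_index(urls):
    """Map each document number (file name without extension, upper-cased) to its URL."""
    index = {}
    for url in urls:
        number = PurePosixPath(urlparse(url).path).stem.upper()
        index.setdefault(number, url)  # keep the first URL if a number repeats
    return index


def lookup(index, number):
    """Return the URL for a user-entered document number, or None if it is unknown."""
    return index.get(str(number).strip().upper())
```

`setdefault` keeps the first URL seen when two folders contain the same number; a real tool might prefer to show the user every match instead.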

Thanks!
Keith

Tim Williams

Keith,

From your description it sounds as though the docs are kept in a database
(which flavour?) and not on the web server filesystem. If this is the case
then you won't be able to "scrape" the URLs: there is no "Dir()" equivalent
in this case.

If web links aren't to be created for the new docs, it's not clear how
they're being exposed to you. If they're in a database, then potentially you
could use something like ADO to search and index the docs from Excel. Even
then, creating a clickable hyperlink in XL would require some type of
scripting set up on the server to deliver the requested file from the DB
table.
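To make the "search and index" and "deliver the requested file from the DB table" ideas concrete, here is a minimal sketch that swaps in Python's built-in `sqlite3` module for ADO and the unknown production database, and uses a WSGI function as the server-side script. The `documents` table, its `doc_id` and `content` columns, and the `?id=` query parameter are all hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical schema: the real database "flavour", table and column names are unknown.
SCHEMA = "CREATE TABLE documents (doc_id TEXT PRIMARY KEY, content BLOB)"


def list_documents(conn):
    """Index step: every document id stored in the table, in sorted order."""
    return [row[0] for row in conn.execute("SELECT doc_id FROM documents ORDER BY doc_id")]


def make_app(conn):
    """Return a WSGI app that serves ?id=<doc_id> straight from the BLOB column."""

    def app(environ, start_response):
        doc_id = parse_qs(environ.get("QUERY_STRING", "")).get("id", [""])[0]
        row = conn.execute(
            "SELECT content FROM documents WHERE doc_id = ?", (doc_id,)
        ).fetchone()
        if row is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"No such document"]
        body = bytes(row[0])
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/msword"),
                ("Content-Disposition", f'attachment; filename="{doc_id}.doc"'),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app
```

For a quick local trial, `wsgiref.simple_server.make_server("", 8000, make_app(conn)).serve_forever()` would serve the app, and an Excel hyperlink could then point at a hypothetical URL such as http://yourserver:8000/?id=356.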

Tim