Excel VBA to acess currently open web page (existing IE6 instance)

K

ker_01

Thanks to Ron and ryguy for responses to an earlier thread- I asked my
question poorly and the thread died, so here goes another try.

I have a series of web pages that for the sake of convenience, I really
can't access programatically with Excel (I know it is probably possible, but
there are several pages of data entry to get to the target pages, and it
isn't practical)

So, I'm perfectly willing to access the web page directly in IE6. However, I
want to scrape the target page's source and pull a bunch of data into Excel.
This web page design prevents copy/paste, so my only way to scrape the data
is from the page source (or hand-retype all the data, which is what I'm
trying to avoid).

I haven't found any information on how to get Excel to interact with an
existing IE6 instance and grab the page source on the fly (when a macro
runs). Can anyone point me in the right direction?

Also, I'm currently on IE6, but if a newer version of IE would make this
easier to program, I'd consider upgrading.

Many thanks,
Keith
 
J

JLGWhiz

If you figure out how to do this, be sure to post back to the group. I am
sure a lot of us would want to know how to do that.
 
D

Don Guillett

There may be a way but witout seeing all the info, it's hard to say. Give
urls and the desired. OR,
If desired, send your file to my address below along with this msg and
a clear explanation of what you want and before/after examples.
 
K

ker_01

Don- thank you for your generous offer. Unfortunately, the URL is on a secure
site, and I have no way to give access to others; the good news is that I can
right-click and I see that the page source contains the information I want to
scrape with RegEx, so really any web page will do.

Here is a test case:
* Open IE
* Navigate to www.google.com (or an alternate address of your preference)
* Click View/Source - this will bring up the page source in notepad

The part I'm missing is how to get this source automatically from an
"already open" IE window into Excel as a string. With Google it would be easy
to open the URL directly from Excel VBA, but my challenge is to capture that
text string directly from an already open browser window.

I've been abusing code snippets from all sorts of sources (non-VBA sites,
etc) but haven't gotten anywhere. I expect it will be something like:

Dim IE As SHDocVw.InternetExplorer
Do
MsgBox IE.document '.body.innerHTML?
Loop

I appreciate your time and expertise, and any advice you might be able to
offer.

Thank you!
Keith
 
K

ker_01

Thanks Steve and Tim- this definitely has me on the right track.

Turns out that the content is not within the innerHTML, and I've been
googling and checking MSDN for syntax to return all, but so far no luck. If
you happen to come back to this thread, I'd welcome any syntax hints on how
to return the full page source, which I can then parse with RegEx.

Many thanks,
Keith
 
T

Tim Williams

document.body.innerHTML

should do it. Without seeing the page you're trying to capture it's
hard to know what else to suggest.

Tim
 
T

Tim Williams

Agreed: any page which generates content on the client-side once the page
has loaded (eg. via javascript/AJAX) will show differences between
view-source (the HTML as delivered from the server) vs. getting the
innerHTML of the final page. However, the latter should reflect the final
content, and so it's not clear why innerHTML isn't giving the correct thing.

To test, I'd suggest using a bookmarklet script to view the final HTML:
http://sniptools.com/vault/view-source-with-a-bookmarklet

If that doesn't show what you expect then you'll have to post a URL or give
some description of what's on the page.

Tim
 
K

ker_01

Steve, Tim, JLGWhiz, and Don-

Thank you all for your help. I now have it working (details below).
Unfortunately I haven't done any web programming, so while I originally
thought I needed the innerHTML, I didn't know what was client side, etc, so I
was just fumbling to get everything so I could parse out the little bits I
need.

Turns out I was working on the wrong line of code; I was trying to edit
Set rng = objIE.Document.Body.CreateTextRange
to things like ...body.all
but when I left that line alone and went to the subsequent line
strPageHTML = rng.htmlText
and changed it to
strPageHTML = rng.text

Now the resulting text string includes the content I need to parse.

Thank you all- I would not have figured this all out (pulling an active IE
instance, etc) without the help of this newsgroup- it is much appreciated!

Keith
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top