Can I import raw HTML source code into Excel?

G

Guest

I want to run a web query from Excel that retreives the raw HTML code, not
the parsed HTML page.

Specifically, instead of retreiving text from the page, I want to return the
page's HTML title (read: the text that falls between the <title> tags).

Unfortunately, the web query interface only lets you pick page content. I
recorded the web query process and tried hacking the VBA by hand, but it
doesn't appear that there are any objects/parameters/commands that allow me
to import the page as pure source code.

Does anyone know how to trick the web query into loading the source code
into worksheet? If so, I can then write some VBA to strip the <title> value
out for me.

Any gurus out there have an answer?

Thanks in advance-
Jon
 
J

Jake Marx

Hi Jon,

Here's a function that should return the page title of an HTML document (or
an empty string if URL is invalid or can't be found):

Public Function gsGetURLTitle(rsURL As String) As String
Dim ie As Object

Set ie = CreateObject("InternetExplorer.Application")

With ie
.Navigate rsURL
Do While .Busy And Not .ReadyState = 4
DoEvents
Loop

gsGetURLTitle = IIf(StrComp(.Document.Title, _
"Cannot find server", vbTextCompare) = 0, _
vbNullString, .Document.Title)
.Quit
End With

Set ie = Nothing
End Function

--
Regards,

Jake Marx
MS MVP - Excel
www.longhead.com

[please keep replies in the newsgroup - email address unmonitored]
 
G

Guest

THANKS JAKE! I can't believe you did that in 4 minutes.

-Jon


Jake Marx said:
Hi Jon,

Here's a function that should return the page title of an HTML document (or
an empty string if URL is invalid or can't be found):

Public Function gsGetURLTitle(rsURL As String) As String
Dim ie As Object

Set ie = CreateObject("InternetExplorer.Application")

With ie
.Navigate rsURL
Do While .Busy And Not .ReadyState = 4
DoEvents
Loop

gsGetURLTitle = IIf(StrComp(.Document.Title, _
"Cannot find server", vbTextCompare) = 0, _
vbNullString, .Document.Title)
.Quit
End With

Set ie = Nothing
End Function

--
Regards,

Jake Marx
MS MVP - Excel
www.longhead.com

[please keep replies in the newsgroup - email address unmonitored]


J said:
I want to run a web query from Excel that retreives the raw HTML
code, not the parsed HTML page.

Specifically, instead of retreiving text from the page, I want to
return the page's HTML title (read: the text that falls between the
<title> tags).

Unfortunately, the web query interface only lets you pick page
content. I recorded the web query process and tried hacking the VBA
by hand, but it doesn't appear that there are any
objects/parameters/commands that allow me to import the page as pure
source code.

Does anyone know how to trick the web query into loading the source
code into worksheet? If so, I can then write some VBA to strip the
<title> value out for me.

Any gurus out there have an answer?

Thanks in advance-
Jon
 
J

Jake Marx

Jon said:
THANKS JAKE! I can't believe you did that in 4 minutes.

No problem - glad to help! And actually, it was more like 15 minutes. But
who's counting? <g>

--
Regards,

Jake Marx
MS MVP - Excel
www.longhead.com

[please keep replies in the newsgroup - email address unmonitored]
 
G

got mike?

okay, that works for the title, but i'm trying to parse the actual html to
pick out data in the middle of the page. how would you alter this to search
to just get the raw html code on the page?

Jake Marx said:
Hi Jon,

Here's a function that should return the page title of an HTML document (or
an empty string if URL is invalid or can't be found):

Public Function gsGetURLTitle(rsURL As String) As String
Dim ie As Object

Set ie = CreateObject("InternetExplorer.Application")

With ie
.Navigate rsURL
Do While .Busy And Not .ReadyState = 4
DoEvents
Loop

gsGetURLTitle = IIf(StrComp(.Document.Title, _
"Cannot find server", vbTextCompare) = 0, _
vbNullString, .Document.Title)
.Quit
End With

Set ie = Nothing
End Function

--
Regards,

Jake Marx
MS MVP - Excel
www.longhead.com

[please keep replies in the newsgroup - email address unmonitored]


J said:
I want to run a web query from Excel that retreives the raw HTML
code, not the parsed HTML page.

Specifically, instead of retreiving text from the page, I want to
return the page's HTML title (read: the text that falls between the
<title> tags).

Unfortunately, the web query interface only lets you pick page
content. I recorded the web query process and tried hacking the VBA
by hand, but it doesn't appear that there are any
objects/parameters/commands that allow me to import the page as pure
source code.

Does anyone know how to trick the web query into loading the source
code into worksheet? If so, I can then write some VBA to strip the
<title> value out for me.

Any gurus out there have an answer?

Thanks in advance-
Jon
 
T

Tim Williams

..Document.Body.outerHTML

Tim


got mike? said:
okay, that works for the title, but i'm trying to parse the actual html to
pick out data in the middle of the page. how would you alter this to
search
to just get the raw html code on the page?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top