VBA Regular Expressions & URL Extraction

R

Ron Rosenfeld

So do you think there's a difference between:
Dim re as Object
Set re = CreateObject("vbscript.regexp")
... and ...
Dim re as RegExp
Set re = New RegExp
?

I'll test it but I wondered if you have some insight into this. The
rest of my code (the actual regexp pattern test) is the same as yours.

Thanks!

No, there should not be any difference so long as you have a reference set to
Microsoft VBScript Regular Expressions 5.5. (That is not required with my
code). Also, if you have the reference set, I would DIM mc as match collection
rather than as object.

For trouble shooting, I would suggest you first try my code exactly as written,
all by itself (i.e. not embedded within your code).

In other words, enter the code I posted all by itself in a module.

Then --
A1: (please copy/paste from here)

<p id="height10" class="price-strike2">$2,599.00/EA Each</p>

B1:
=GetStrike2($A$1)

And see if that works. (returns $2,599.00)

If that works, and you want to use early-binding, set the Reference and use
this code instead:

====================================
Option Explicit
Function GetStrike2(sStr As String)
Dim re As RegExp, mc As MatchCollection
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = """price-strike2"">(\$[0-9]*,?[0-9]*\.?[0-9]*)"
If re.test(sStr) = True Then
Set mc = re.Execute(sStr)
GetStrike2 = mc(0).submatches(0)
End If
Set re = Nothing
End Function
===============================

Again, use it "stand-alone"

If both work, then your problem is elsewhere.
--ron
 
A

Akrobrat

Ron and all,

I used Ron's function on a "stand-alone" basis and used a subroutine
(TestGetStrike) to pass the HTML source (as string), but it didn't
return anything. No errors, but no result.

I haven't got a clue as to why this doesn't work. Do you think I have
VB references set that are clashing? I've currently got:
- Visual Basic for Applications
- Microsoft Excel 11.0 Object Library
- Microsoft Internet Controls
- Microsoft VBScript Regular Expressions 5.5
- OLE Automation
- Microsoft Office 11.0 Object Library
- Microsoft Forms 2.0 Object Library

I moved the RegExp and Internet Controls up in priority too. I also
don't have any items in Excel that have been disabled (listed under
Help > About MS Excel). Any thoughts?


====================================

Option Explicit

Sub TestGetStrike()

Dim IE As InternetExplorer

Set IE = New InternetExplorer
IE.Visible = True
IE.Navigate "http://www.homedepot.com/webapp/wcs/stores/servlet/
ProductDisplay?
jspStoreDir=hdus&catalogId=10053&productId=100401483&navFlow=3&keyword=jvm1790sk"

'Load the page completely
Do Until IE.ReadyState = READYSTATE_COMPLETE And IE.Busy = False
Loop

Range("A1").Value = GetStrike2(IE.Document.body.innerHTML)

End Sub


Function GetStrike2(sStr As String)
Dim re As RegExp, mc As MatchCollection
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = """price-strike2"">(\$[0-9]*,?[0-9]*\.?[0-9]*)"
If re.Test(sStr) = True Then
Set mc = re.Execute(sStr)
GetStrike2 = mc(0).SubMatches(0)
End If
Set re = Nothing
End Function

====================================
 
A

Akrobrat

Ron,

Sorry, in my frustration with this whole thing, I forgot to note that
your code as a Worksheet function worked perfectly, returning the
correct $ result. Therefore, this problem MUST be somewhere else, and
that is what I'm trying to figure out. Thanks, Ron!

- Dilan
 
R

Ron Rosenfeld

Ron,

Sorry, in my frustration with this whole thing, I forgot to note that
your code as a Worksheet function worked perfectly, returning the
correct $ result. Therefore, this problem MUST be somewhere else, and
that is what I'm trying to figure out. Thanks, Ron!

- Dilan

We're making progress.

Where I would suggest you check next is to ensure that the HTML derived string
being passed to the VBA function is, in fact, what it should be.

One way to do that would be to use the code I provided as a function. So you
might have something like:

===========================
Sub Dilan1()
....get the HTML string and assign it to strPageContent

strURL = getStrike2(strPageContent)

debug.print "result = " & strURL

end sub
====================

If the immediate window only prints result = , then it is likely the case
that strPageContent does not contain the relevant HTML substring.

If the immediate window prints the correct result, then your problem is
downstream.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top