Regex Capture problem

Guest

I am using the VBScript_RegExp_55 library in an Excel-based VBA project to
build a text-file processor that uses rules stored in an Excel spreadsheet.
I have used Regex utilities before, so I understand the concepts of text
capture and re-use in a Regex expression.

But when I try to capture text with (*), for example, and then try to re-use
it with \1, I get either no changes made, or an object-generated error 5018.
I have not been able to isolate when these two results occur.

I believe my program code is basically correct, because I can do search and
replace type stuff just fine, including in the same "run" as the capture
operations that fail. Here are my options settings:

objRgx.IgnoreCase = False ' Set case sensitive.
objRgx.Global = True ' Set replace all.
objRgx.MultiLine = True ' Set multiline mode.

Thanks for any help you can provide.
 
Ron Rosenfeld

How about giving an example of your program code, input, and desired output?
--ron
 
Guest

Here are two regex search and replace strings. The first works, but the
second does not:

Search for = "<TITLE>Page-1</TITLE>"
Replace with = "<TITLE>Staffing Process Model</TITLE>"


This did not do any capture or replace:
Search for = "<CENTER>\n([^\n]*)_DPM Model - (.*)HREF="([^\n]*).html"
Replace with = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"

Here are lines in the html- first 3 show where I want to pick-up the
extended file name which is "STAFFING_VISION-REF_2007-03-31_DPM_Model_-_"

<CENTER>
<IMG SRC="STAFFING_VISION-REF_2007-03-31_DPM_Model_-_Talent_acquired.jpg"
ALT="" BORDER="0" USEMAP="#image_map">
</CENTER>

Here is where I want to insert the extended filename in front of the "Talent
Need defined.html". The stuff after ".html" will be processed and removed in
later steps:

<AREA SHAPE="CIRCLE" COORDS="103,395,44" HREF="Talent Need defined.html,
1UP: Toplevel">

Here are the variables as sent to the function:
strRgxSearchPattern = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"
strRgxInput = "<CENTER>\n([^\n]*)_DPM Model - (.*)HREF="([^\n]*).html"


Here is the function:
Private Function RgxReplaceText(strRgxSearchPattern, strRgxInput, strRgxOutput) As String
    Dim objRgx As New RegExp
    objRgx.IgnoreCase = False ' Set case sensitive.
    objRgx.Global = True ' Set replace all.
    objRgx.MultiLine = True ' Set multiline mode.
    objRgx.Pattern = strRgxSearchPattern
    RgxReplaceText = objRgx.Replace(strRgxInput, strRgxOutput)
End Function
 
Ron Rosenfeld

Replace with = "<CENTER>\n\1_DPM Model - \2HREF= "\1_DPM Model - \3.html"

I didn't test it, but I believe that in the "Replace with" string you need to
use the tokens $1, $2 and $3 rather than \1, \2 and \3.
--ron
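
As an illustration of the point above, here is a minimal sketch of how replacement tokens behave with the VBScript RegExp object. The procedure name DemoDollarTokens and the sample text are made up for this example and do not come from the thread; late binding via CreateObject is used only so that the sketch runs without a project reference.

```vba
' Sketch only: DemoDollarTokens and the sample text are hypothetical.
Sub DemoDollarTokens()
    Dim objRgx As Object
    ' Late binding, so no reference to the RegExp library is required.
    Set objRgx = CreateObject("VBScript.RegExp")
    objRgx.Global = True
    objRgx.Pattern = "(\w+)@(\w+)"

    ' $1 and $2 in the replacement text stand for the captured groups.
    Debug.Print objRgx.Replace("user@example", "$2 at $1")   ' prints: example at user

    ' Backslash tokens are only meaningful inside the pattern itself,
    ' so this replacement text should come through unexpanded.
    Debug.Print objRgx.Replace("user@example", "\2 at \1")
End Sub
```

Running it from the Immediate window makes it easy to compare the two replacement styles side by side.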
 
Ron Rosenfeld

Unfortunately, I'm still not following your example.

But, so far as your question about capturing text with (*), and then re-using
it, note the following:

Given the UDF:

===================================
Function RESub(str As String, SrchFor As String, ReplWith As String) As String
    Dim objRegExp As RegExp

    Set objRegExp = New RegExp
    objRegExp.Pattern = SrchFor
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.MultiLine = True

    RESub = objRegExp.Replace(str, ReplWith)

End Function
================================

With these simple parameters:

str = "Now is the time"
SrchFor = "Now is (.*) time"
ReplWith = "$1"

Function returns "the" as would be expected.


--ron
 
Guest

Thanks, Ron - as is so often the case, the problem was the nut holding the
wheel. I "learned" my regex using a freeware utility that had slightly
different syntax, so I had to convert many lines of expressions. Along the way
I somehow forgot that * is a quantifier, not a character placeholder - so I
was trying to capture (*) instead of (.*). As soon as I saw your fragment I
realized what the problem was.

Initially I ALSO had the parameters backwards in the method invocation
(having the search in a property, and the input and replacement strings in
the call, is just hard for me to grok, it seems), but when I fixed that, I
then made the * error, and the message I got was the same, 5018, with no
helpful text.
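
For anyone hitting the same error, one way to check a pattern before using it in a rule is to trap the error and report its number. This is only a sketch: TryPattern is a hypothetical helper name, and it assumes that an invalid pattern raises a trappable run-time error no later than the first time the pattern is used.

```vba
' Sketch only: TryPattern is a hypothetical helper, not part of any library.
Function TryPattern(strPattern As String) As String
    Dim objRgx As Object
    Dim lngErr As Long

    Set objRgx = CreateObject("VBScript.RegExp")

    On Error Resume Next
    objRgx.Pattern = strPattern
    ' Use the pattern once so that any compile error is raised here.
    objRgx.Test ""
    lngErr = Err.Number
    On Error GoTo 0

    If lngErr <> 0 Then
        TryPattern = "Pattern rejected, error " & lngErr
    Else
        TryPattern = "OK"
    End If
End Function
```

Calling TryPattern("(*)") and TryPattern("(.*)") from the Immediate window shows whether a bare quantifier inside a group is what triggers the error.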

Okay - thanks again for your help - I will probably be asking something
again sooner than I imagine.
 
Ron Rosenfeld

Well, I'm glad my efforts helped point you in a useful direction. There are
certainly some aspects of Regular Expressions as used in Perl, for example,
that are not replicated in VB.

Best wishes.
--ron
 
