"String" manipulation for a Case clause

E

EagleOne

Ron,

I recalled this information from a few years ago. I present it only for your ease.

Is there anything you would like to add/amplify re the current subject?

TIA EagleOne

*****************************************************************************************
'From: Ron Rosenfeld <[email protected]>
'Subject: Re: How in to parse constants in formula to cells
'Date: Thu, 30 Nov 2006 07:48:11 -0500
'Newsgroups: microsoft.public.Excel.programming
'
'"Well, the simplest way (for me) to do that sort of replacement is by using
'Regular Expressions. The routines work quicker if you set a reference to
'Microsoft VBScrip REgular Expressions 5.5 (See Tools/References), but you can
'also use the CreateObject method to set this within the script.
'"Like"
'
''Set up Regex
'
' Dim objRegExp As RegExp
' Dim objMatch As match
' Dim colMatches As MatchCollection
'
''Create a regular expression object.
' Set objRegExp = New RegExp
''Set Case Insensitivity.
' objRegExp.IgnoreCase = True
''Set global applicability.
' objRegExp.Global = True
'
'The "key" as to what is going to be replaced is in how you define "Pattern".
'As set up, the "\b" parameter refers to any word boundary, which could be an
'operator, comma, or any character that is not alphanumeric, or the beginning or
'end of the string.
'
'I don't know how you are planning to input those variables, so I just
'put them in as Constants for testing purposes. You can Dim them and then set
'them to whatever, however you wish.
'
'With the Const statement, the type declaration is optional. But if the values
'were always going to be integers, I would use the Long type; if they might be
'decimal numbers, I would use Double or String.
'
'The "\b" token picks up a word boundary. A word boundary is defined as a
'change from a word character [A-Za-z0-9_] and a few other characters to a
'non-word character.
'
'Since the operators are not word characters, the pattern \b-64596792\b will not
'pick up the "-" as there is no "\b" prior to the "-". By definition, there
'cannot be. The first word boundary in the string "-64596792" is between the
' "-" and the first digit "6".
'
'This only becomes a problem when you specify the "-" as part of a signed
'number, rather than as an operator.
'
'Depending on precise requirements, you could just make the "-" optional, and
'construct a pattern like:
'
' -?\b64596792\b
'
'If that doesn't work, and you must test for ONLY operators, then change two
'lines: (Note that below the "+" sign is to the right of "-" which for this
'is an arguement to a command not an operator.)
'
'Pattern = "([-=+/*])" & NumToReplace & "\b" -or-
'Pattern = "(-?)\b" & NumToReplace & "\b"
' (Note the parentheses around the -? to "capture" it.)
'
'FormulaText = objRegExp.Replace(FormulaText, "$1" & ReplWith)"
' (The "$1" represents the first captured item.)
'
'But I don't understand why you can't just ignore the "-". In other words, why
'can't you treat the "-" as an operator rather than as denoting the sign of the
'constant?
'
'If the reason has to do with how you generate "Pattern", like from your Input
'Box, you could just strip it off by using the ABS function:
'
'=======================
'Do While NumbToReplace = 0
' NumbToReplace = ABS(InputBox("Enter number to _
' replace", "Replace Entry Box"))
' If NumbToReplace = 0 Then
' MsgBox "No Number Entered ... Start Over"
' End If
' Loop
' Pattern = "\b" & NumbToReplace & "\b"
' objRegExp.Pattern = Pattern
'==============================

'If you need to specify that NumbToReplace must follow an operator or an "="
'sign or a comma (for example), then:
'
'Pattern = "([-*/+=,])" & NumbToReplace & "\b"
'
'Note that the Character Class containing the operators and other characters is
'enclosed in parentheses. That will be "captured" and can be referenced in the
'Replace statement, so the same operator is also replaced:
'
'e.g.:
'
'FormulaText = objRegExp.Replace(FormulaText, "$1" & ReplaceWith)
**************************************************************************************************
 
R

Ron Rosenfeld

re.Pattern = "(^|[-+/*^=])\b\d*\.?\d+\b"

Here is a formal explanation:

======================================
Extract Constants

(^|[-+/*^=])\b\d*\.?\d+\b

Match the regular expression below and capture its match into backreference
number 1 «(^|[-+/*^=])»
Match either the regular expression below (attempting the next alternative
only if this one fails) «^»
Assert position at the beginning of the string «^»
Or match regular expression number 2 below (the entire group fails if this
one fails to match) «[-+/*^=]»
Match a single character present in the list “-+/*^=” «[-+/*^=]»
Assert position at a word boundary «\b»
Match a single digit 0..9 «\d*»
Between zero and unlimited times, as many times as possible, giving back as
needed (greedy) «*»
Match the character “.” literally «\.?»
Between zero and one times, as many times as possible, giving back as needed
(greedy) «?»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as
needed (greedy) «+»
Assert position at a word boundary «\b»


Created with RegexBuddy
======================================

We are not using the backreferences. The parentheses are used for grouping
purposes.

A "word boundary" is a position that is between a "word character" and a
"non-word character". Word charactes include [A-Za-z0-9_]

Some of these references may be helpful (some may not still be valid, though):

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/
--ron
 
R

Ron Rosenfeld

Ron,

I recalled this information from a few years ago. I present it only for your ease.

Is there anything you would like to add/amplify re the current subject?

TIA EagleOne

Not really.

I think the current routine works better for extracting constants.

Every so often I work on a routine to parse formulas correctly, but it is still
a work in progress.
--ron
 
E

EagleOne

Your comment "A work in progress" is extremely interesting to me.

Silly me, I thought I was close that goal - with 100% your help even!

What are some examples/situations your are struggling with?

In short, I do not know what I do not know. I thought I was near heaven.

EagleOne
 
E

EagleOne

Excellent! Thanks

Ron Rosenfeld said:
re.Pattern = "(^|[-+/*^=])\b\d*\.?\d+\b"

Here is a formal explanation:

======================================
Extract Constants

(^|[-+/*^=])\b\d*\.?\d+\b

Match the regular expression below and capture its match into backreference
number 1 «(^|[-+/*^=])»
Match either the regular expression below (attempting the next alternative
only if this one fails) «^»
Assert position at the beginning of the string «^»
Or match regular expression number 2 below (the entire group fails if this
one fails to match) «[-+/*^=]»
Match a single character present in the list “-+/*^=” «[-+/*^=]»
Assert position at a word boundary «\b»
Match a single digit 0..9 «\d*»
Between zero and unlimited times, as many times as possible, giving back as
needed (greedy) «*»
Match the character “.” literally «\.?»
Between zero and one times, as many times as possible, giving back as needed
(greedy) «?»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as
needed (greedy) «+»
Assert position at a word boundary «\b»


Created with RegexBuddy
======================================

We are not using the backreferences. The parentheses are used for grouping
purposes.

A "word boundary" is a position that is between a "word character" and a
"non-word character". Word charactes include [A-Za-z0-9_]

Some of these references may be helpful (some may not still be valid, though):

Regular Expressions
http://www.regular-expressions.info/reference.html
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
http://msdn2.microsoft.com/en-us/library/ms974619.aspx
http://www.regex-guru.info/
--ron
 
R

Ron Rosenfeld

Your comment "A work in progress" is extremely interesting to me.

Silly me, I thought I was close that goal - with 100% your help even!

What are some examples/situations your are struggling with?

In short, I do not know what I do not know. I thought I was near heaven.

EagleOne

Well, in addition to parsing out constants and operators, you would also need
to parse out Functions; Names; Function arguments; cell addresses and ranges in
a variety of formats; handle arrays properly; and so forth.
--ron
 
E

EagleOne

My needs "might" be a bit more limited. I do realize that Excel, VBA, UDF can manipulate values.

That said I need to be able to find undocumented (almost always, unlabeled additions, subtractions
and/or * or / of same) in formula strings. The user then needs to document those constants.

Obviously my game plan can be made difficult by those sophisticated accountants who may manipulate
values in hidden cells, hidden formulas, Interior.ColorIndex = Font.ColorIndex, ranges, sheets or
functions.

I have proceedures to catch all the above except for functions.

Below are some notes from your prior work that I collected; with current comments:

'From: Ron Rosenfeld <[email protected]>
'Subject: Re: How in to parse constants in formula to cells
'Date: Mon, 27 Nov 2006 09:05:29 -0500
'Newsgroups: microsoft.public.Excel.programming
'With regard to some of the issues:
' it returns the negative signed values
' it does NOT return "within string" constants
Here I think you mean the formula string Yes/No?
or do you mean "123456" within a Function?
i.e. VLookup
' it returns 3% as 3
' it returns all constants within a function
This is OK as surprises can occur as a result

Realizing that I am now to be a dead man resulting from my limited perspective, what have I missed?
Gulp!

EagleOne
 
R

Ron Rosenfeld

'From: Ron Rosenfeld <[email protected]>
'Subject: Re: How in to parse constants in formula to cells
'Date: Mon, 27 Nov 2006 09:05:29 -0500
'Newsgroups: microsoft.public.Excel.programming
'With regard to some of the issues:
' it returns the negative signed values
' it does NOT return "within string" constants
Here I think you mean the formula string Yes/No?
or do you mean "123456" within a Function?
i.e. VLookup

It's been a long time since I've worked on this routine. But, off the top of
my head, this routine will not return anything from a formula of the type:

="1/2/09"-"4/6/08"

It also needs to be able to handle NAME'd constants.
--ron
 
E

EagleOne

That is what I thought and that is fine for my purposes.

Again, thank you so much for sharing your wealth re Regex + other thoughts.

EagleOne
 
E

EagleOne

Ron,

For the heck of it:

Starting with: Pattern = (^|[-+/*^=])\b\d*\.?\d+\b

Is there a modification to the pattern which would eliminate all Functions from consideration?

EagleOne
 
R

Ron Rosenfeld

Ron,

For the heck of it:

Starting with: Pattern = (^|[-+/*^=])\b\d*\.?\d+\b

Is there a modification to the pattern which would eliminate all Functions from consideration?

EagleOne

I can't think of one off-hand. Probably you'd need a list of functions to
start with, and then look for paired parentheses. Then remove those substrings
from the original. You'll have to deal with nesting also.
--ron
 
E

EagleOne

Well it was thought. Heck it could have been ~{/ !

Ron Rosenfeld said:
Ron,

For the heck of it:

Starting with: Pattern = (^|[-+/*^=])\b\d*\.?\d+\b

Is there a modification to the pattern which would eliminate all Functions from consideration?

EagleOne

I can't think of one off-hand. Probably you'd need a list of functions to
start with, and then look for paired parentheses. Then remove those substrings
from the original. You'll have to deal with nesting also.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top