The 7-digit fix worked great. Deleting "=" and everything after is a good
change too...thanks.
Can I extract the "/" character also? Some people use it in
abbreviations…such as "w/out" for "without." It will never be used for
division in my data.
You just remove it from this part of the pattern where it is listed in the
"character class": [[^-\d+*/^.] --> [^-\d+*^.]
It would be great if you could explain what each step of this formula is
doing. I hate to keep asking you for help in tweaking it.
I don't mind helping you along, but hopefully the references below will help
you get a grasp on this.
The function is based on Regular Expressions.
For information on how to implement this in VBA, see:
http://support.microsoft.com/default.aspx?scid=kb;en-us;818802&Product=vbb
http://msdn2.microsoft.com/en-us/library/6wzad2b2.aspx
For basic Regular Expression syntax, see:
http://www.regular-expressions.info/reference.html
Here are some examples, although the variety of actual data is too much to
list:
0.045 Fabrication hours per system * 100 Long Machines + 0.045 * 2 Short
Machines
6 pages (one table & one Illustration) * 3.9hrs per page = 24 hrs
The 6 hours represent actual labor expenditure for I/O 5024549. LOE for Jan
09 is 3 hrs total effort. (this one creates a problem with the "09"; I can
replace the "." with a "+" and get 9 hrs)
I don't expect to be able to resolve everything, but if I can automatically
check calculations on a majority of the data, it can cut hours & hours of
work per project. What you've given me so far is getting about 75%
results...which is great!
Well, looking at all the examples you have provided, this Pattern seems to do
the job. I have split it onto several lines so that when you copy it into the
module, you will not have problems with extra spaces or line breaks, but it
could be written all on one line as a single expression:
re.Pattern = "=[\s\S]*|\d{7}|^[^-+*]*$|\.(?!\d)|" & _
"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)" & _
"\D*\d{1,2}|[^-+*\d.]"
In brief, this expression looks for, and deletes (in order):
1. Any "=" sign and everything following it in the cell
2. Any series of exactly seven digits
3. Any complete line which does not contain any arithmetic operators.
4. Any "." that is not immediately followed by a digit.
5. Any sequence that starts with a month name and ends with one or two digits.
(note that if you could have a sequence such as Jan 2009, you might
want to change the sequence \d{1,2} to \d{1,4} )
6. Anything else that is not a digit or a minus, or a plus, or an asterisk, or
a decimal (".")
The different groupings in the pattern are pipe-delimited.
Here is my current version of this UDF, with some annotations.
===============================
Option Explicit
Function foo(sStr) As Double
Dim sTemp As String
'set up the regular expression object and
'specify the Global and Pattern properties
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "=[\s\S]*|\d{7}|^[^-+*]*$|\.(?!\d)|" & _
"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)" & _
"\D*\d{1,2}|[^-+*\d.]"
'Make sTemp equal to the original function argument with the
'substitutions carried out
sTemp = re.Replace(sStr, "")
'Use the Evaluate method to make foo equal to the desired result
foo = Evaluate(re.Replace(sTemp, ""))
End Function
================================
Without the annotations, it's pretty short, for all it does:
=====================================
Option Explicit
Function foo(sStr) As Double
Dim sTemp As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "=[\s\S]*|\d{7}|^[^-+*]*$|\.(?!\d)|" & _
"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)" & _
"\D*\d{1,2}|[^-+*\d.]"
sTemp = re.Replace(sStr, "")
foo = Evaluate(re.Replace(sTemp, ""))
End Function
==================================
Here is a formal explanation of the Regular Expression. It probably won't make
much sense until you've had a chance to review the above references:
Extract
=[\s\S]*|\d{7}|^[^-+*]*$|\.(?!\d)|(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\D*\d{1,2}|[^-+*\d.]
Match either the regular expression below (attempting the next alternative only
if this one fails) «=[\s\S]*»
Match the character “=†literally «=»
Match a single character present in the list below «[\s\S]*»
Between zero and unlimited times, as many times as possible, giving back
as needed (greedy) «*»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
Any character that is NOT a whitespace character «\S»
Or match regular expression number 2 below (attempting the next alternative
only if this one fails) «\d{7}»
Match a single digit 0..9 «\d{7}»
Exactly 7 times «{7}»
Or match regular expression number 3 below (attempting the next alternative
only if this one fails) «^[^-+*]*$»
Assert position at the beginning of the string «^»
Match a single character NOT present in the list “-+*†«[^-+*]*»
Between zero and unlimited times, as many times as possible, giving back
as needed (greedy) «*»
Assert position at the end of the string (or before the line break at the
end of the string, if any) «$»
Or match regular expression number 4 below (attempting the next alternative
only if this one fails) «\.(?!\d)»
Match the character “.†literally «\.»
Assert that it is impossible to match the regex below starting at this
position (negative lookahead) «(?!\d)»
Match a single digit 0..9 «\d»
Or match regular expression number 5 below (attempting the next alternative
only if this one fails)
«(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\D*\d{1,2}»
Match the regular expression below and capture its match into backreference
number 1 «(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)»
Match either the regular expression below (attempting the next
alternative only if this one fails) «Jan»
Match the characters “Jan†literally «Jan»
Or match regular expression number 2 below (attempting the next
alternative only if this one fails) «Feb»
Match the characters “Feb†literally «Feb»
Or match regular expression number 3 below (attempting the next
alternative only if this one fails) «Mar»
Match the characters “Mar†literally «Mar»
Or match regular expression number 4 below (attempting the next
alternative only if this one fails) «Apr»
Match the characters “Apr†literally «Apr»
Or match regular expression number 5 below (attempting the next
alternative only if this one fails) «May»
Match the characters “May†literally «May»
Or match regular expression number 6 below (attempting the next
alternative only if this one fails) «Jun»
Match the characters “Jun†literally «Jun»
Or match regular expression number 7 below (attempting the next
alternative only if this one fails) «Jul»
Match the characters “Jul†literally «Jul»
Or match regular expression number 8 below (attempting the next
alternative only if this one fails) «Aug»
Match the characters “Aug†literally «Aug»
Or match regular expression number 9 below (attempting the next
alternative only if this one fails) «Sep»
Match the characters “Sep†literally «Sep»
Or match regular expression number 10 below (attempting the next
alternative only if this one fails) «Oct»
Match the characters “Oct†literally «Oct»
Or match regular expression number 11 below (attempting the next
alternative only if this one fails) «Nov»
Match the characters “Nov†literally «Nov»
Or match regular expression number 12 below (the entire group fails if
this one fails to match) «Dec»
Match the characters “Dec†literally «Dec»
Match a single character that is not a digit 0..9 «\D*»
Between zero and unlimited times, as many times as possible, giving back
as needed (greedy) «*»
Match a single digit 0..9 «\d{1,2}»
Between one and 2 times, as many times as possible, giving back as needed
(greedy) «{1,2}»
Or match regular expression number 6 below (the entire match attempt fails if
this one fails to match) «[^-+*\d.]»
Match a single character NOT present in the list below «[^-+*\d.]»
One of the characters “-+*†«-+*»
A single digit 0..9 «\d»
The character “.†«.»
Created with RegexBuddy
--ron