Need formula to extract a numeric value from a free-format text

L

Lori

After further investigations, i think i've got a simplified version:

=LOOKUP(1E8,--(MID(A4&".",COLUMN(A:IV),7)&" 0/1"))

which returns the last 7 digit integer and errors if none is found.
[&" 0/1" ensures only integers are returned; &"e0" would return decimals too]
 
R

Ron Rosenfeld

After further investigations, i think i've got a simplified version:

=LOOKUP(1E8,--(MID(A4&".",COLUMN(A:IV),7)&" 0/1"))

which returns the last 7 digit integer and errors if none is found.
[&" 0/1" ensures only integers are returned; &"e0" would return decimals too]

It also returns seven digit numbers that are part of other strings, including
longer strings of digits.
--ron
 
L

Lori

I'm not sure what you mean by "numbers that are part of other strings".
I thought that's what was wanted but i'm probably missing something.

As Rick states the requirements are not clear enough on this.

Ron Rosenfeld said:
After further investigations, i think i've got a simplified version:

=LOOKUP(1E8,--(MID(A4&".",COLUMN(A:IV),7)&" 0/1"))

which returns the last 7 digit integer and errors if none is found.
[&" 0/1" ensures only integers are returned; &"e0" would return decimals too]

It also returns seven digit numbers that are part of other strings, including
longer strings of digits.
--ron
 
R

Ron Rosenfeld

I'm not sure what you mean by "numbers that are part of other strings".
I thought that's what was wanted but i'm probably missing something.

Sorry. That statement was not clear.

What I meant is that your routine seems to return seven digits from substrings
that are longer.

NH1234567890 --> 4567890
45678901234 --> 8901234

I interpreted the OP's requirements to indicate that he wanted to extract the
first seven digit WORD; whereas your routines extract the first (or last) seven
consecutive digits, even if they are part of another word.

In the regex I presented, WORD is defined as a seven digit string of digits
surrounded by a non-word character. A non-word character is anything except a
digit, letter or underscore.

So, mine is somewhat flawed in that it would return seven digits if they were
prepended by, let us say, an asterisk or ampersand; but it would return
**MISSING** in the above instances.

But given &1234567*, mine would also return the seven digits. This could be
taken care of, if necessary, by defining word a bit differently.
--ron
 
R

Rick Rothstein

I know you asked for a formula, but would a UDF (user defined function) be
acceptable (it would require allowing macros to run)? If so, press Alt+F11
to go into the VB editor and, once there, add a Module (Insert/Module from
its menu bar). Next copy/paste the following into the code window that
opened up...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

Now, go back to your worksheet and use this formula in whatever cell you
want (changing the A1 reference to the cell address containing your text)...

=First7DigitNumber(A1)

This UDF finds the first "isolated" 7 digit number (that is, a 7 digit
number at the beginning or end of the text or, if interior to the text, with
non-digit characters in front and behind it).
 
R

Ron Rosenfeld

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567 Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since there are
no seven digit words.

(Lori's formulas return 1074317)

--ron
 
R

Rick Rothstein

I am still not sure if the OP wanted the 7-digit number to stand alone (as a
"word") or not, so I just went for the first isolated 7 digits in a row...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


Ron Rosenfeld said:
Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567
Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since there
are
no seven digit words.

(Lori's formulas return 1074317)

--ron
 
R

Rick Rothstein

My last message had the wrong opening sentence (it was meant for another
message I was working on). This is what I meant to post...

The function I posted does not work as I had indicated... it finds the last
7-digit number, not the first. Here is the corrected code (plus I added the
missing *MISSING* indicator)...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


Rick Rothstein said:
I am still not sure if the OP wanted the 7-digit number to stand alone (as
a "word") or not, so I just went for the first isolated 7 digits in a
row...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


Ron Rosenfeld said:
Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567
Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since
there are
no seven digit words.

(Lori's formulas return 1074317)

--ron
 
R

Rick Rothstein

I am still not sure if the OP wanted the 7-digit number to stand alone (as a
"word") or not, so I just went for the first isolated 7 digits in a row
whether imbedded in other text or not. The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at the
end of a sentence.

--
Rick (MVP - Excel)


Rick Rothstein said:
I know you asked for a formula, but would a UDF (user defined function) be
acceptable (it would require allowing macros to run)? If so, press Alt+F11
to go into the VB editor and, once there, add a Module (Insert/Module from
its menu bar). Next copy/paste the following into the code window that
opened up...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

Now, go back to your worksheet and use this formula in whatever cell you
want (changing the A1 reference to the cell address containing your
text)...

=First7DigitNumber(A1)

This UDF finds the first "isolated" 7 digit number (that is, a 7 digit
number at the beginning or end of the text or, if interior to the text,
with non-digit characters in front and behind it).

--
Rick (MVP - Excel)


Eric_NY said:
I have a column containing text values like "User requested authority of
emergency ID for reason NHUSER23 Restore of object to library LEVEL2
under remedy 1074317. 06/04/09 17:46 QPGMR".

Some of the cells contain a 7-digit number and others don't. The 7-digit
number does not start at a fixed location in the text.

Is it possible to write a formula which looks for the 7-digit number, and
returns the number if present, and another value (such as 0 or
"*MISSING*")
if not?

I'm using Excel 2003.

Thanks.
 
R

Rick Rothstein

I am still not sure if the OP wanted the 7-digit number to stand alone (as a
"word") or not, so I just went for the first isolated 7 digits in a row
whether imbedded in other text or not. The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at the
end of a sentence.

With that said, I made a mistake in my original function and left out the
*MISSING* indicator. I just posted a corrected function against my original
message for the function.

--
Rick (MVP - Excel)


--
Rick (MVP - Excel)


Ron Rosenfeld said:
Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567
Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since there
are
no seven digit words.

(Lori's formulas return 1074317)

--ron
 
R

Rick Rothstein

Ignore this message... it was mis-sent.

--
Rick (MVP - Excel)


Rick Rothstein said:
I am still not sure if the OP wanted the 7-digit number to stand alone (as
a "word") or not, so I just went for the first isolated 7 digits in a row
whether imbedded in other text or not. The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at
the end of a sentence.

--
Rick (MVP - Excel)


Rick Rothstein said:
I know you asked for a formula, but would a UDF (user defined function) be
acceptable (it would require allowing macros to run)? If so, press Alt+F11
to go into the VB editor and, once there, add a Module (Insert/Module from
its menu bar). Next copy/paste the following into the code window that
opened up...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

Now, go back to your worksheet and use this formula in whatever cell you
want (changing the A1 reference to the cell address containing your
text)...

=First7DigitNumber(A1)

This UDF finds the first "isolated" 7 digit number (that is, a 7 digit
number at the beginning or end of the text or, if interior to the text,
with non-digit characters in front and behind it).

--
Rick (MVP - Excel)


Eric_NY said:
I have a column containing text values like "User requested authority of
emergency ID for reason NHUSER23 Restore of object to library LEVEL2
under remedy 1074317. 06/04/09 17:46 QPGMR".

Some of the cells contain a 7-digit number and others don't. The 7-digit
number does not start at a fixed location in the text.

Is it possible to write a formula which looks for the 7-digit number,
and
returns the number if present, and another value (such as 0 or
"*MISSING*")
if not?

I'm using Excel 2003.

Thanks.
 
R

Rick Rothstein

I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"

Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


Ron Rosenfeld said:
Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567
Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since there
are
no seven digit words.

(Lori's formulas return 1074317)

--ron
 
R

Ron Rosenfeld

The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at the
end of a sentence.

Yes, I did.

I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"

Not quite what I understand it to be (but close)./

The definitions I've seen indicate that a word boundary "Matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string if the first and/or last characters in the string are word
characters."

And, at least in VBScript, a word character is a digit, letter or underscore
e.g: [A-Za-z0-9_]

Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

It comes pretty close. Just change this line to include the underscores:

If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then

--ron
 
R

Rick Rothstein

I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it
consider the boundary). When I finally found the one I cited, I figured it
was a universal definition. Now I'm guessing there might be version
differences between the various RegExp engines. Yes, I fix to account for
the underbar is as you have shown it. For the archives, here is the UDF with
the change you indicated...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


Ron Rosenfeld said:
The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at
the
end of a sentence.

Yes, I did.

I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"

Not quite what I understand it to be (but close)./

The definitions I've seen indicate that a word boundary "Matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or
end
of the string if the first and/or last characters in the string are word
characters."

And, at least in VBScript, a word character is a digit, letter or
underscore
e.g: [A-Za-z0-9_]

Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

It comes pretty close. Just change this line to include the underscores:

If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then

--ron
 
L

Lori

Good points although I'm pretty sure this is academic from the parenthesised
reply to JoeU above. From the context given, if it's known there's no more
than one seven digit number it seems unlikely there are any longer numbers
either.

Given the high risk of typos in freeform text fields it's possible a udf may
be more reliable but there are just too many unknowns. A thorough check will
need to take place whatever the method.
 
R

Ron Rosenfeld

I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it
consider the boundary). When I finally found the one I cited, I figured it
was a universal definition. Now I'm guessing there might be version
differences between the various RegExp engines. Yes, I fix to account for
the underbar is as you have shown it. For the archives, here is the UDF with
the change you indicated...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

And, to keep them together, here is the Regex version:

===================
Option Explicit
Function Seven(s As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\b\d{7}\b"
If re.test(s) = True Then
Set mc = re.Execute(s)
Seven = mc(0)
Else
Seven = "*MISSING*"
End If
End Function
====================

Here are some definitions with regard to word boundaries and some of the
differences:

There are three different positions that qualify as word boundaries:

1. Before the first character in the string, if the first character is
a word character.
2. After the last character in the string, if the last character is a
word character.
3. Between two characters in the string, where one is a word character
and the other is not a word character.

In all flavors, the characters [a-zA-Z0-9_] are word characters. And that is
the case for VBScript. However, some of the other flavors will recognize
characters from other languages, and/or unicode characters, as word characters.
And I believe there is one flavor (?Python) where you can even set flags to
change the definition of a word character.
--ron
 
E

Eric_NY

I just read your message from last Friday.

The text is free format. Users can enter it in whatever format they want.
The 7-digit number is somewhere within the text. I've glanced through it and
in the samples I've seen, there's no consistency in what appears before or
after the 7-digit number.

I used the regex solution that Ron Rosenfeld suggested, and adjusted the
regular expression by removing the "\b" before and after the "\d{7}".
 
E

Eric_NY

The text is user-entered and free-format. It can contain anything. Except in
case of input error, it always contains the 7-digit number, which is what I'm
trying to extract.
 
R

Rick Rothstein

In case you want to consider it, here is a non-RegEx array-entered** formula
that will do what you want...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))*
ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND("/",MID(F5,ROW
(1:30),7))),ROW(1:30))),7)

Note though, that this formula is dependent on what your default date
separator is. Mine is the slash character (/) and that is what I used in the
FIND function call... if your default date separator is a different symbol,
then just replace my slash with that character.
 
R

Rick Rothstein

I forgot to include the note regarding array-entered formulas. Here is my
message again, but with the note...

In case you want to consider it, here is a non-RegEx array-entered** formula
that will do what you want...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))*
ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND("/",MID(F5,ROW
(1:30),7))),ROW(1:30))),7)

**Commit this formula using Ctrl+Shift+Enter, not just Enter by itself

Note though, that this formula is dependent on what your default date
separator is. Mine is the slash character (/) and that is what I used in the
FIND function call... if your default date separator is a different symbol,
then just replace my slash with that character.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top