Need formula to extract a numeric value from a free-format text

E

Eric_NY

I have a column containing text values like "User requested authority of
emergency ID for reason NHUSER23 Restore of object to library LEVEL2
under remedy 1074317. 06/04/09 17:46 QPGMR".

Some of the cells contain a 7-digit number and others don't. The 7-digit
number does not start at a fixed location in the text.

Is it possible to write a formula which looks for the 7-digit number, and
returns the number if present, and another value (such as 0 or "*MISSING*")
if not?

I'm using Excel 2003.

Thanks.
 
B

Bob Phillips

Is there any hard and fast rule that governs where the number might appear,
such as before the first full stop?
 
J

JoeU2004

If the word "remedy" and a single space always precede the number, and if
the number is truly always 7 digits, you could use the following (A1
contains the original text):

=IF(ISNUMBER(FIND("remedy",A1)), MID(A1,FIND("remedy",A1)+7,7), "missing")

Otherwise, it probably will be easier to write a user-defined function in
VB. However, the rules for finding the desired number must be precise. For
example, is there always only one 7-digit number preceded by a space? Or is
there always specific set of words ("remedy" and others) preceding the
number?

Basically, how would a human distinguish and find the number if he had to
scan the text left to right or right to left character by character or word
by word, where a "word" is a sequence of non-space characters.


----- original message -----
 
E

Eric_NY

No, none that I can see. This is free-format input by the user, and as much
as we'd like to insist that the user use a prescribed format, we can't be
sure that they always will. I just need a way to look for a 7-digit number
anywhere within the text.

Thanks for any suggestions you can provide.
 
E

Eric_NY

Unfortunately there's no precise format. I just need to look for a 7-digit
number, which can be anywhere in the string. (I could restrict it to looking
for the first 7-digit number, since I know that there's never more than one.)

Maybe it's time for me finally to force myself to learn VB. I'm a former
programmer, but never learned VB.

Thanks for your suggestions.
 
R

Ron Rosenfeld

I have a column containing text values like "User requested authority of
emergency ID for reason NHUSER23 Restore of object to library LEVEL2
under remedy 1074317. 06/04/09 17:46 QPGMR".

Some of the cells contain a 7-digit number and others don't. The 7-digit
number does not start at a fixed location in the text.

Is it possible to write a formula which looks for the 7-digit number, and
returns the number if present, and another value (such as 0 or "*MISSING*")
if not?

I'm using Excel 2003.

Thanks.

Easy to do using Regular Expressions.

The Regular Expression (Regex) would be "\b\d{7}\b" which translates into "find
a seven digit expression that is surrounded by word boundaries. You need to
assert the word boundary to avoid picking up 7 digits of a longer number that
might be there.

This can be implemented in several ways.

1. Download and install Longre's free morefunc.xll add-in (use Google to find a
functioning download site). Then use the formula:

=IF(REGEX.MID(A1,"\b\d{7}\b")="","*MISSING*",REGEX.MID(A1,"\b\d{7}\b"))

2. Write a short UDF using VBA.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=Seven(A1)

in some cell.

================================
Option Explicit
Function Seven(s As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\b\d{7}\b"
If re.test(s) = True Then
Set mc = re.Execute(s)
Seven = mc(0)
Else
Seven = "*MISSING*"
End If
End Function
=================================
--ron
 
E

Eric_NY

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by email,
will the formulas still work? Or does the recipient also have to install the
add-in separately on his own machine?

Thanks for your help.
 
R

Ron Rosenfeld

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by email,
will the formulas still work? Or does the recipient also have to install the
add-in separately on his own machine?

Thanks for your help.

Ordinarily no. They would have to install it themselves. However, with the
morefunc add-in there is an option, I believe it is a menu option, to install
the add-in as part of the workbook. If you do that, it will then be usable by
the recipient with no particular effort on his part.

One caution concerning the add-in -- it will not work on strings that are
longer than 255 characters. This, apparently is an .xll limitation, and there
is no good way around it within the add-in.

By the way, if you should use the UDF approach, the UDF should be embedded
within the workbook, so its use should be transparent to your user. But there
are a lot of other useful functions in morefunc.
--ron
 
L

Lori

For a formula approach maybe try this (for text in A1):

=MIN(MMULT(10^{6,5,4,3,2,1,0},--TEXT(MID(A1,COLUMN(A:IV)+{0;1;2;3;4;5;6},1),"0;;0;\1\e\9")))
 
L

Lori

I should have added to ignore any answer greater than 7 digits, this could be
done by wrapping the formula below in =TEXT( ... ,"[<1e9]0;""MISSING""")
 
T

T. Valko

there is an option, I believe it is a menu option,
to install the add-in as part of the workbook.

This option is currently not available for Excel 2007. In other versions of
Excel, when the add-in is installed it should create a new item in the Tools
menu, Tools>Morefunc>Embed Morefunc in the workbook. Note that embeding
Morefunc will add some size to the file. According to Morefunc help, about
500kb. 350kb for the functions and 150kb for the help file.
 
R

Rick Rothstein

Is the number always precede by a space when the number is interior to the
text? What about that "dot" after it... is there always a dot following it?
If the dot might not always be there, is there always a space after the
number when it is interior to the text?
 
E

Eric_NY

"However, with the morefunc add-in there is an option, I believe it is a menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?
 
R

Ron Rosenfeld

"However, with the morefunc add-in there is an option, I believe it is a menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?

Which version of Excel are you using? If you are using 2007+, that option is
no longer available. If you are using an earlier version of Excel, I believe
that, when you install morefunc, you need to make a selection to enable this;
and then, if I recall correctly, it shows up on the Tools menu.

If you are using Excel 2007+, and don't want to require your users to install
morefunc, I would suggest using the UDF regex approach I previously posted.
--ron
 
R

Rick Rothstein

I would still be interested in the answer to my questions...

"Is the number always precede by a space when the number
is interior to the text? What about that "dot" after it... is there
always a dot following it? If the dot might not always be
there, is there always a space after the number when it is
interior to the text?"
 
J

JoeU2004

Rick Rothstein said:
I would still be interested in the answer to my questions...

In response to my questions, Eric wrote:

"Unfortunately there's no precise format. I just need to look for a 7-digit
number, which can be anywhere in the string. (I could restrict it to looking
for the first 7-digit number, since I know that there's never more than
one.)"


----- original message -----
 
R

Rick Rothstein

I saw that, but was wondering if that 7-digit number could ever be encased
in text like this... abc1234567def... hence my question about a leading
space (assuming the number were not at the beginning of the text) and a
following "dot" (as shown in his example) or possibly a following space to
offset it from other text. I've been playing around with an array-entered**
formula concept like this...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),
ROW(1:30),7)),ROW(1:30))),7)

**Commit this formula using Ctrl+Shift+Enter, not just Enter by itself

I'm not really concerned about the leading space any more (the SUBSTITUTE
function has taken care of that), but I am interested in what trails the
7-digit number. Of course there are still problems with the above formula;
for example, things constructed like this...17Jul09... and like this...
1.23e45... will register as "7-digit numbers". Unfortunate, the first 7
digits of a number longer than 7 digits will also register as a "hit". I
just figured it would be nice to know if there were *any* formatting type
things to lock onto before attempting to patch the above formula any
further. As a matter-of-fact, if the OP is reading this... I would also like
to know if the text could contain 8-digit or longer numbers, date-looking
text like 17Jul09 and floating point power of ten numbers like 1.23e45?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top