Extract number from text string based on number's format?

M

MeatLightning

Hey all -
I'm trying to extract a number from a text string. The text string
varies in length and contents. The only thing that uniquely identifies the
data I need to extract is its format - specifically: The number is always 4
digits separated by a dash followed by 4 more digits. For example: 1234-1234.

The trick is that there are other numbers in there that come close to the
same format (ex: 12-1234).

Any suggestions?

Thanks in advance!
-meat
 
J

JLatham

A user defined function (UDF) like the one below might work for you - it's
not complete or foolproof, but is a good basis for one.

To use it, first put it into your workbook:
Open the workbook. Press [Alt]+[F11] to enter the VB Editor and in the VBE,
choose Insert | Module and copy and paste the code below into the empty code
module presented to you. Close the VB Editor.

To use it in a worksheet, enter a formula such as
=GetNumGroup(A5)
where A5 is the cell containing the text you want to find the number group
in. So, if
A5 has "hello 1234-9876" in it, whatever cell you enter the =GetNumGroup(A5)
should display "1234-9876".
If no 4-4 group is found, the cell with the formula will remain blank.

Hope this helps a bit.

The code:

Function GetNumGroup(whatCell) As String
'only works properly if there's just one
'group of possible digits.
'Examples:
' hello 1234-5678 goodbye
'would be ok and found, but
' hello 1234-5678 goodbye 4ever
'would fail because the result would be
'1234-56784
'
Const charList = "-0123456789"
Dim workingString As String
Dim resultString As String
Dim LC As Integer

workingString = whatCell
If Len(workingString) > 8 Then
'has to be at least 9 long to contain a 1234-4321 entry!
For LC = 1 To Len(workingString)
If InStr(charList, Mid(workingString, LC, 1)) Then
resultString = resultString & Mid(workingString, LC, 1)
End If
Next
End If
If Len(resultString) = 9 And Mid(resultString, 5, 1) = "-" Then
GetNumGroup = resultString
Else
GetNumGroup = ""
End If
End Function
 
M

MeatLightning

Hmmm... first, the text string can be any length... your formula seems to
depend on it being 9 characters total?... second, I'd need both sets of 4
from either side of the "-".

ex: "texty mr textington in text town getting texted for no texting reason
other than to pass the text. sometimes 1234-1234 is texterrific. text."

I need a formula that returns "1234-1234"
 
M

MeatLightning

Thanks! If it's not possible to tackle this with a formula, I'll give your
suggestions a whirl.

JLatham said:
A user defined function (UDF) like the one below might work for you - it's
not complete or foolproof, but is a good basis for one.

To use it, first put it into your workbook:
Open the workbook. Press [Alt]+[F11] to enter the VB Editor and in the VBE,
choose Insert | Module and copy and paste the code below into the empty code
module presented to you. Close the VB Editor.

To use it in a worksheet, enter a formula such as
=GetNumGroup(A5)
where A5 is the cell containing the text you want to find the number group
in. So, if
A5 has "hello 1234-9876" in it, whatever cell you enter the =GetNumGroup(A5)
should display "1234-9876".
If no 4-4 group is found, the cell with the formula will remain blank.

Hope this helps a bit.

The code:

Function GetNumGroup(whatCell) As String
'only works properly if there's just one
'group of possible digits.
'Examples:
' hello 1234-5678 goodbye
'would be ok and found, but
' hello 1234-5678 goodbye 4ever
'would fail because the result would be
'1234-56784
'
Const charList = "-0123456789"
Dim workingString As String
Dim resultString As String
Dim LC As Integer

workingString = whatCell
If Len(workingString) > 8 Then
'has to be at least 9 long to contain a 1234-4321 entry!
For LC = 1 To Len(workingString)
If InStr(charList, Mid(workingString, LC, 1)) Then
resultString = resultString & Mid(workingString, LC, 1)
End If
Next
End If
If Len(resultString) = 9 And Mid(resultString, 5, 1) = "-" Then
GetNumGroup = resultString
Else
GetNumGroup = ""
End If
End Function



MeatLightning said:
Hey all -
I'm trying to extract a number from a text string. The text string
varies in length and contents. The only thing that uniquely identifies the
data I need to extract is its format - specifically: The number is always 4
digits separated by a dash followed by 4 more digits. For example: 1234-1234.

The trick is that there are other numbers in there that come close to the
same format (ex: 12-1234).

Any suggestions?

Thanks in advance!
-meat
 
M

MeatLightning

Or... even more better:

ex: "texty mr textington in text town getting texted for no texting reason
other than to pass the text. sometimes 1234-1234 is texterrific. but 12-1234
is pretty much never textastical. text."

I need a formula that returns "1234-1234
 
T

T. Valko

Is there *always* a space before and after the number string?

This works on the sample you posted:

=MID(A1,SEARCH(" ????-???? ",A1)+1,9)
 
M

MeatLightning

Ah! That's really close... No, there is not always a space before and after
the number.
 
T

T. Valko

You'll need to post several representative samples so we can see what we're
dealing with.
 
M

MeatLightning

OK here goes:

#1234-1234.
1234-1234
1234-1234
text1234-1234
text1234-1234text

Does that help?

If there was a way to search for a number in that format, that'd nail it.
For example, instead of "????-????", you'd use "####-####"
 
R

Ron Rosenfeld

Hey all -
I'm trying to extract a number from a text string. The text string
varies in length and contents. The only thing that uniquely identifies the
data I need to extract is its format - specifically: The number is always 4
digits separated by a dash followed by 4 more digits. For example: 1234-1234.

The trick is that there are other numbers in there that come close to the
same format (ex: 12-1234).

Any suggestions?

Thanks in advance!
-meat

Would something like 12345-1234 meet your specification? In other words, do
the characters preceding and following the nnnn-nnnn string have to be
non-numeric?

You could download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr/morefunc/english/index.htm#Download

and, if the preceding and following characters have to be non-numeric, use
this:

=REGEX.MID(A1,"(?<=\D|^)\d{4}-\d{4}(?=\D|$)")

If they could be numeric, then use this:

=REGEX.MID(A1,"\d{4}-\d{4}")
--ron
 
M

MeatLightning

Thanks that looks cool... but!... the download link isn't working for me (php
errors galore)
 
J

JLatham

If you can come up with something using regular expressions like Ron
Rosenfeld has shown, it would probably be an optimum solution. Although the
solution that T. Valko is working with is another potential - problem being
(as with the code I put up earlier) that multiple groups of numbers or any
extra numbers in the string will potentially cause problems.

One question that really needs answering is also this one:
Can there be TWO groups of 1234-4321 type numbers in one text string?
Probably everyone's solution is going to be made to find only one in a text
string, not two or more.
 
R

Ron Rosenfeld

Thanks that looks cool... but!... the download link isn't working for me (php
errors galore)

Unfortunately, the download links have been flakey lately. But I thought it
was working today. Guess not.

Well, you can still use a UDF.

But you have not answered one of my questions which was whether characters that
might follow or precede your string can be numbers. If they can; in other
words, given:

123456-123456

you can return 3456-1234 as valid, then you could use this simple UDF:

==========================
Option Explicit
Function NumPat(str As String) As String
Dim i As Long
For i = 1 To Len(str) - 8
If Mid(str, i, 9) Like "####-####" Then
NumPat = Mid(str, i, 9)
Exit Function
End If
Next i
End Function
========================

To enter the UDF, <alt-F11> opens the VBEditor. Ensure your project is
highlighted in the project explorer window, then Insert/Module and paste the
code above into the window that opens.

To use it, you would simply enter =NumPat(cell_ref) into some cell, and it
would return your string.

-------------------------------------------------

If your digits MUST be surrounded by non-digits, then try this UDF:

====================================
Function NumPat2(str As String) As String
Dim re As Object, mc As Object
Dim i As Long

Set re = CreateObject("vbscript.regexp")
re.Pattern = "(^|\D)(\d{4}-\d{4})(\D|$)"

If re.Test(str) = True Then
Set mc = re.Execute(str)
NumPat2 = mc(0).submatches(1)
End If
End Function
=================================
--ron
 
M

MeatLightning

Wow that rules! Never knew I could make my own functions... awesome! Is there
any doc on it? I could see this coming in handy in the future.

Anyway, thanks a ton Ron!

And also thanks to JLatham, T. Valko for the help!
 
R

Ron Rosenfeld

Wow that rules! Never knew I could make my own functions... awesome! Is there
any doc on it? I could see this coming in handy in the future.

Anyway, thanks a ton Ron!

You're welcome. Glad to help.

I'm not sure of the best place to read about using VBA with Excel. Certainly
there are books available at book stores or on line. But you can also work
through HELP for the VBA editor, as well as hanging out in the excel groups and
reading.

Others will have better references.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top