# Extract 5 digit number from string

D

#### directos2

I am using the following code to extract a 5 digit number from a
string. It does not work. If there is a longer than 5-dgit number in
front of the 5-digit number it gives the first 5 digits of this longer
number (while it only should give the 5-digit number, not a part of a
longer number if the text string contains it).

Text sting Result
365485 12345 36548 >> Wrong, this one should
give 12345

++++++++++++++++++++++++++++++++++++++++++++
Function Extract(S As String) As String
Sub test()
Dim bArr() As Byte

Dim vIn

vIn = Application.InputBox("Enter 10 digit number")
bArr = StrConv(vIn, vbFromUnicode)
For i = 0 To UBound(bArr)
Select Case bArr(i)
Case 48 To 57
Case Else
bArr(i) = 32
End Select
Next

vIn = StrConv(bArr, vbUnicode)

' Replace n/a in xl 97, use Application.Substitute
vIn = Replace(vIn, " ", "")

MsgBox vIn & vbCr & _
IIf(Len(vIn) = 10, "OK", "Bad input person")

End Sub

End Sub

End Function

+++++++++++++++++++++++++++++++++++++++++++++++++
Text sting Result
36548 dfg hdh 515748 36548
fgj 26547 152475 12-11-2005 26547
12345 12345
dfgdg 21212 .21 dfgdg . - dfgdfg 21212
blablabla
365485 12345 36548 >> Wrong, this one should
give 12345
12254 12254
1,2547 blabla -12457 12457
1.2547 blabla 12

I'm out of ideas. Does someone know a solution?

Chris

G

#### Guest

It looks to me like it extracts the first 5 digits of the string. Is that
what you want.

36548 dfg hdh 515748 36548 - first 5
fgj 26547 152475 12-11-2005 26547 first 5
12345 12345 first 5
dfgdg 21212 .21 dfgdg . - dfgdfg 21212 first 5
blablabla
365485 12345 36548 >> Wrong, this one should
give 12345 Last 5. How does it know?
12254 12254 first 5
1,2547 blabla -12457 12457 first 5. Or do you think it's pulling the last 5
1.2547 blabla 12

How do you determine which one you want to extract. Without that
information, I'm not sure someone can assist.

D

#### directos2

123456 blaat 78910 >> this one extract give 12345, while it should
extract 78910
I only want the part to be extract which have 5 digits after each
other

Some examples:
123456 blaat 78910 >> should give 78910
123456 blaat 78910blaat >> should give 78910
78910 blaat 123456 >> should give 78910
78910blaat blaat 123456 >> should give 78910
1.23456 blaat 78910 >> should give 78910
1,23456 blaat 78910blaat >> should give 78910
78910-blaat 123456 >> should give 78910
78910**blaat blaat 123456 >> should give 78910

I am not really an expert with VBA. I was already happy to make is so
far

D

#### Dave Peterson

Your test data was a little ambiguous.
1,2547 blabla -12457 12457
Is the 12457 returned from the last string or did you ignore the comma in the
first 5 digits?

Anyway, this may get you closer:

Option Explicit
Function ExtractFirst5DigitNumber(myInStr As String) As String

Dim myOutStr As String
Dim iCtr As Long
Dim myChar As String
Dim Found5Chars As Boolean

myOutStr = ""
Found5Chars = False
'Uncomment this line if you want to ignore commas in the digits
'myInStr = Replace(myInStr, ",", "")

'pad with a trailing space to make it easier to check "6th" digit
myInStr = myInStr & " "

'but avoid that trailing space when looping t
For iCtr = 1 To Len(myInStr) - 1
myChar = Mid(myInStr, iCtr, 1)
If IsNumeric(myChar) Then
myOutStr = myOutStr & myChar
If Len(myOutStr) = 5 Then
If IsNumeric(Mid(myInStr, iCtr + 1, 1)) Then
'more than 5 digits
'keep looking
Else
Found5Chars = True
Exit For
End If
End If
Else
myOutStr = ""
End If
Next iCtr

If Found5Chars = True Then
ExtractFirst5DigitNumber = myOutStr
Else
ExtractFirst5DigitNumber = "Invalid String"
End If

End Function

N

#### Norman Jones

Hi Director,

Try:

'=============>>
Public Function Last5Digits(sStr) As Variant
Dim i As Long
Dim j As Long
Dim sStr2 As String
Dim sOut As String

For i = Len(sStr) To 1 Step -1
sStr2 = Mid(sStr, i, 1)
If IsNumeric(sStr2) _
And Not sStr2 = vbNullString Then
sOut = sStr2 & sOut
j = j + 1
If j = 5 Then Exit For
End If
Next i

If sOut <> vbNullString Then
Last5Digits = CLng(sOut)
Else
Last5Digits = CVErr(xlErrNA)
End If
End Function
'<<=============

D

#### Dave Peterson

I still have questions about what happens with strings with dots and commas.

Should the commas and periods be ignored so that it would essentially look like:

1.2345 blaat 78910
1,2345 blaat 78910blaat
1.23456 blaat 78910

(ignored, they'd look like)
12345 blaat 78910 returns 12345
12345 blaat 78910blaat returns 12345
123456 blaat 78910 returns 78910

Or should they be treated like they're spaces:

1.23456 blaat 78910
1,23456 blaat 78910blaat
1.2345 blaat 78910

(treated like spaces)
1 23456 blaat 78910 returns 23456
1 23456 blaat 78910blaat returns 23456
1 2345 blaat 78910 returns 78910

G

#### Guest

Simple

chane from : For i = 0 To UBound(bArr)
to : For i = 0 To 4

You may really want this
if UBound(bArr) < 5
Upperbound = UBound(bArr)
else
Upperbound = 5
end if

For i = 0 To Upperbound

D

#### Dave Peterson

ps. You could treat dots one way and commas a different way, too.

G

#### Guest

I only want the part to be extract which have 5 digits after each
other

You have this

123456 blaat 78910

The result could be 12345, 23456 or 78910 based upon your statement. My
question was WHICH 5? The first 5, the last 5, something else?

D

#### Dana DeLouis

Hi. Would you be interested in a Regular Expression function?
I make the assumption that we are only looking for 5-digit numbers only.
No other characters. We then take the last one in case there is more than
one.
I'm not very good at this. Because we want exactly 5 digits, and the
surounding area are non-digits, I think I have to make the actual 5-digits a
SubMatch.
I moved Re to the top of the module in case the function is repeated in a
Macro. No sense in resetting it often.

Option Explicit
Private Re As RegExp

Function Grab_Last_5_Digits(Str As String)
'// = = = = = = = = = =
'// Set Vba Library Reference to:
'// Microsoft VbScript Regular Expressions 5.5
'// = = = = = = = = = =

If Re Is Nothing Then Set Re = New RegExp
' 5-Digit Number
Const Ptn As String = "(?:^|\D+)(\d{5})(?:\$|\D+)"
Dim M As MatchCollection

Re.Pattern = Ptn
Re.IgnoreCase = True
Re.Global = True

If Not Re.Test(Str) Then
Grab_Last_5_Digits = "None"
Exit Function
End If
Set M = Re.Execute(Str)
Grab_Last_5_Digits = CDbl(M.Item(M.Count - 1).SubMatches(0))
End Function

Sub TestIt()
Debug.Print Grab_Last_5_Digits("987654 junk 12345 abc")
Debug.Print Grab_Last_5_Digits("987654 xx 78910xx")
Debug.Print Grab_Last_5_Digits("987654 x 12345x9876")
Debug.Print Grab_Last_5_Digits("987654 bla 78910bla")
End Sub

Returns:
12345
78910
12345
78910

D

#### directos2

I will show you some results I would like to have in the list. I did
not get the right results alreayd with your help, so I am probably not
so clear.This is the output I would like to have from the following
list:

36548 dfg hdh 515748 returns 36548
fgj 26547 152475 12-11-2005 returns 26547
12345 returns 12345
dfgdg 21212 .21 dfgdg . - dfgdfg returns 21212
blablabla returns none
365485 12345 returns 12345
12254 returns 12254
1,2589 blabla -12345 returns 12345
1.2589 blabla -12345 returns 12345
1.2547 blabla 12 returns none
123456 blaat 78910 returns 78910
123456 blaat 78910blaat returns 78910
78910 blaat 123456 returns 78910
78910blaat blaat 123456 returns 78910
1.23456 blaat 78910 returns 78910
1,23456 blaat 78910blaat returns 78910
78910-blaat 123456 returns 78910
78910**blaat blaat 123456 returns 78910

D

#### David Portwood

If your numbers contain only digits (no decimal points, commas, or negative
signs) then this isn't too difficult:

Sub ExtractNum(ByVal stInput as String, ByRef lPos as Long, ByRef stNum as
String)

Dim lLen as Long

stNum = ""

'skip preceding nonnumeric characters
lLen = Len(stInput)
do while lPos <= lLen
if not isNumeric(Mid(stInput, lPos, 1) then
lPos = lPos + 1
end if
loop

'extract embedded number
do while lPos<=lLen
if isNumeric(Mid(stInput, lPos, 1) then
stNum = stNum & Mid(stInput, lPos, 1)
lPos = lPos + 1
end if
loop

end Sub

In your calling code you can now write:

Call ExtractNum(MyString, lPos, stNum)
do while stNum <> ""
if len(stNum) = 5 then
exit do
else
call ExtractNum(MyString, lPos, stNum)
end if
loop

if stNum = "" then
msgbox("There is no embedded five digit number")
else
lNum = CLng(stNum) 'found it
end if

The subroutine ExtractNum() skips preceding nondigits until it finds a first
digit character. It then appends each digit character to a number string
until it hits a character which is not a digit or comes to the end of the
input string.

The input variable lPos is passed by reference so you can call ExtractNum
repeatedly and each time you get the next embedded number in the string.

I've just typed the above from memory. I actually use exactly that sub in
some code of my own at work. However, I don't guarantee I remembered it
exactly correctly. But if there is an error, it should be easily fixable.
The strategy should be clear.

D

#### Dana DeLouis

Hi. I just realized that the pattern won't work with a string like
"12345x56789".
I think it is too hard when you are using the or function (|) and testing
the beginning, middle, and end.
I started testing each, and realized it was too complicated.
This just extracts all 5 or more digits, and returns the first one that is
exactly 5 digits.
I can't think of anything that is more efficient. Anyone?

Option Explicit
Private Re As RegExp

Function Fx(Str As String)
'// = = = = = = = = = =
'// Set Vba Library Reference to:
'// Microsoft VbScript Regular Expressions 5.5
'// = = = = = = = = = =

If Re Is Nothing Then Set Re = New RegExp
Dim M As MatchCollection
Dim J As Long

Re.IgnoreCase = True
Re.Global = True

' 5 or more Digits
Re.Pattern = "\d{5,}"

If Not Re.Test(Str) Then
Fx = "None"
Exit Function
End If

'// The first 5-digit number starting from
'// the end is returned
'// Array is 0-Indexed

Set M = Re.Execute(Str)
For J = M.Count To 1 Step -1
If Len(M(J - 1)) = 5 Then
Fx = CDbl(M(J - 1))
Exit Function
End If
Next J

'// If we are here, there was no 5-digit solution
Fx = "None"
End Function

<snip>

J

#### JE McGimpsey

This isn't optimized at all, but it returns the values you want:

Public Function Extract5(sInput As String) As String
Dim sTemp As String

Extract5 = vbNullString
sInput = Trim(sInput)
If sInput Like "*#####*" Then
Do While Len(sInput) > 0
Debug.Print sInput, sTemp
If Left(sInput, 1) Like "[0-9,.]" Then
sTemp = sTemp & Left(sInput, 1)
ElseIf sTemp Like "#####" Then
Exit Do
Else
sTemp = vbNullString
End If
sInput = Mid(sInput, 2)
Loop
If Len(sTemp) = 5 Then Extract5 = sTemp
End If
End Function

R

#### Ron Rosenfeld

123456 blaat 78910 >> this one extract give 12345, while it should
extract 78910
I only want the part to be extract which have 5 digits after each
other

Some examples:
123456 blaat 78910 >> should give 78910
123456 blaat 78910blaat >> should give 78910
78910 blaat 123456 >> should give 78910
78910blaat blaat 123456 >> should give 78910
1.23456 blaat 78910 >> should give 78910
1,23456 blaat 78910blaat >> should give 78910
78910-blaat 123456 >> should give 78910
78910**blaat blaat 123456 >> should give 78910

I am not really an expert with VBA. I was already happy to make is so
far

The following VBA UDF gives the desired result in all of your examples. It
makes use of "Regular Expressions"

===============================================
Option Explicit
Function Extr5D(str As String) As String
Dim oRegExp As Object
Dim colMatches As Object
Const sPattern As String = "(^|[^0-9,.])(\d{5})(\D|\$)"

Set oRegExp = CreateObject("VBScript.RegExp")

With oRegExp
.IgnoreCase = True
.Global = True
.Pattern = sPattern
If .Test(str) = True Then
Set colMatches = .Execute(str)
Extr5D = colMatches(0).submatches(1)
End If
End With

End Function
=============================================

sPattern does the work. It looks for a pattern which consists of the
following:

(^|[^0-9,.])

Start of the string OR a character that is not in the set of 0-9 comma dot

followed by

(\d{5})

5 digits

followed by

(\D|\$)

something which is either NOT a digit or is the end of the string.

Those three patterns are extracted into separate submatches. We extract the
second of the submatches to get our 5 digit number.
--ron

R

#### Ron Rosenfeld

I will show you some results I would like to have in the list. I did
not get the right results alreayd with your help, so I am probably not
so clear.This is the output I would like to have from the following
list:

36548 dfg hdh 515748 returns 36548
fgj 26547 152475 12-11-2005 returns 26547
12345 returns 12345
dfgdg 21212 .21 dfgdg . - dfgdfg returns 21212
blablabla returns none
365485 12345 returns 12345
12254 returns 12254
1,2589 blabla -12345 returns 12345
1.2589 blabla -12345 returns 12345
1.2547 blabla 12 returns none
123456 blaat 78910 returns 78910
123456 blaat 78910blaat returns 78910
78910 blaat 123456 returns 78910
78910blaat blaat 123456 returns 78910
1.23456 blaat 78910 returns 78910
1,23456 blaat 78910blaat returns 78910
78910-blaat 123456 returns 78910
78910**blaat blaat 123456 returns 78910

I now note you wish to have "none" returned if there is nothing matching the
pattern.

Simplest way would be to change my previously recommended formula to something
like:

=IF(LEN(Extr5D(A1))=0,"none",Extr5D(A1))

--ron

D

#### Dave Peterson

What should be returned with:

12345 abc 23456 abc 34567

If it's 12345, then try this:

Option Explicit
Function ExtractFirst5DigitNumber(myInStr As String) As String

Dim myArr As Variant
Dim iCtr As Long
Dim myOutStr As String
Dim myStr As String

Dim bCtr As Long

myOutStr = ""

'remove leading, trailing, duplicate internal spaces
myInStr = Application.Trim(myInStr)

'replace every non-digit, non-comma, non-dot with a space
For iCtr = 1 To Len(myInStr)
Select Case Mid(myInStr, iCtr, 1)
Case "0" To "9", ",", "."
'do nothing
Case Else
Mid(myInStr, iCtr, 1) = " "
End Select
Next iCtr

'remove leading, trailing, duplicate internal spaces
'just keeping a single space separator, digits or dot or comma
myInStr = Application.Trim(myInStr)

myArr = Split(myInStr, " ")
For iCtr = LBound(myArr) To UBound(myArr)
myStr = myArr(iCtr)
If Len(myStr) = 5 Then
Exit For
End If
Next bCtr
'look at next element in myArr
Else
myOutStr = myStr
Exit For
End If
End If
Next iCtr

If myOutStr = "" Then
ExtractFirst5DigitNumber = "None"
Else
ExtractFirst5DigitNumber = myOutStr
End If

End Function

D

#### Dave Peterson

We have a winner!

JE said:
This isn't optimized at all, but it returns the values you want:

Public Function Extract5(sInput As String) As String
Dim sTemp As String

Extract5 = vbNullString
sInput = Trim(sInput)
If sInput Like "*#####*" Then
Do While Len(sInput) > 0
Debug.Print sInput, sTemp
If Left(sInput, 1) Like "[0-9,.]" Then
sTemp = sTemp & Left(sInput, 1)
ElseIf sTemp Like "#####" Then
Exit Do
Else
sTemp = vbNullString
End If
sInput = Mid(sInput, 2)
Loop
If Len(sTemp) = 5 Then Extract5 = sTemp
End If
End Function

I will show you some results I would like to have in the list. I did
not get the right results alreayd with your help, so I am probably not
so clear.This is the output I would like to have from the following
list:

36548 dfg hdh 515748 returns 36548
fgj 26547 152475 12-11-2005 returns 26547
12345 returns 12345
dfgdg 21212 .21 dfgdg . - dfgdfg returns 21212
blablabla returns none
365485 12345 returns 12345
12254 returns 12254
1,2589 blabla -12345 returns 12345
1.2589 blabla -12345 returns 12345
1.2547 blabla 12 returns none
123456 blaat 78910 returns 78910
123456 blaat 78910blaat returns 78910
78910 blaat 123456 returns 78910
78910blaat blaat 123456 returns 78910
1.23456 blaat 78910 returns 78910
1,23456 blaat 78910blaat returns 78910
78910-blaat 123456 returns 78910
78910**blaat blaat 123456 returns 78910

D

#### directos2

Thanks Ron! This one works exactly the way I wanted!
Thank you all for your help!

Chris

D

#### Dana DeLouis

Well, for my own education, I've learned that a string like
"12345A56789" will not work because the logic sees the
"12345A" test first, and ignores the other test of "A56789"
Not the best solution, but I've added the following general idea to my
"library" on Regular Expressions for future reference. There are no
programming loops.

Function Last5(Str As String)
Dim Re As RegExp
Dim M As MatchCollection
Dim S As String

Const NoSol As String = "None" 'No Solution

Select Case Len(Str)

Case Is < 5
Last5 = NoSol

Case 5
If Str Like "#####" Then
Last5 = CDbl(Str)
Else
Last5 = NoSol
End If

Case Is > 5
If Str Like "*[!0-9]#####" Then
Last5 = CDbl(Right\$(Str, 5))
Else
Set Re = New RegExp
Re.IgnoreCase = True
Re.Global = True
Re.Pattern = "\D+"
S = "xx" & Re.Replace(Str, "xx")

Re.Pattern = "\D(\d{5})\D"
If Re.Test(S) Then
Set M = Re.Execute(S)
Last5 = M.Item(M.Count - 1).SubMatches(0)
Else
Last5 = NoSol
End If
End If
End Select
End Function