finding a value in a string

I

icestationzbra

i am looking for some help in finding a particular value in a string.

i have strings that look like:

ABC_DEF123_IJK_V(1).xls

or

ABC_DEF123_IJK_V2.xls

or

ABC_DEF123_IJK_V3_A.xls

or

ABC_DEF123_IJK_V4_NA.xls

or

ABC_DEF123_IJK_V5.0.xls

or

ABC_DEF123_IJK_6_0.xls

or

ABC DEF123 IJK V7.xls

these are the filenames of various files that users would have create
and made an entry into this one master sheet along a column. i have a
many nomenclatures as i have users.

i need to extract the (1, 2, 3, 4, 5, 6, 7) from the strings. the
denote a certain value that is of importance to other calculations.

now, i have created upto 5 columns with RIGHT, MID, LEFT and WHAT HAV
YOU and WHAT NOT to extricate the values from the strings. i have ove
3000 rows from which to extract data. i have even combined upto 3 IF'
to get the desired output in some cases. only problem is, i have mor
variety than the number of IFs allowed (7) incorporating ISNUMBER
VALUE and ISTEXT functions. actually, after the 3rd IF, the formula ha
more number of braces than characters!!! i am not able to create
single formula that would take care of all the variety i have. over 5-
columns, and even after that, a little bit of manual tweaking, allow
me to get the right value. however, it is quite cumbersome.

would anyone be able to give me one formula that could pick that numbe
for me from the string? there is one commonality in those strings. th
number that i require to be picked up resides within 5 places to th
left of the ".xls".

thanks in advance,

mac
 
R

Ron Rosenfeld

would anyone be able to give me one formula that could pick that number
for me from the string?

It's not clear from your posting what kind of output you want.

But here is a UDF (User defined function) that extracts all of the digits as a
number.

If you have a specific output in mind, please post it.

=================================
Function GetValue(str)
Dim N As Integer, i As String
i = ""
For N = 1 To Len(str)
If IsNumeric(Mid(str, N, 1)) Then
i = i & Mid(str, N, 1)
End If
Next
If i = "" Then
GetValue = i
Exit Function
End If
GetValue = CDbl(i)
End Function
========================

To enter this, <alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer, then Insert/Module
and paste the code into the window that opens.

Use it in your worksheet like any function. e.g. =getvalue(A1)


--ron
 
F

Frank Kabel

Hi
try the following UDF:
Public Function Get_Numerics(rng As Range) As Double
Dim i As Integer
Dim res
Dim sValue
sValue = Mid(rng.Value, Len(rng.Value) - 9, 5)
For i = 1 To Len(sValue)
If IsNumeric(Mid(sValue, i, 1)) Then
res = res & Mid(sValue, i, 1)
End If
Next i
res = CDbl(res)
Get_Numerics = res
End Function

and use it like
=GET_NUMERICS(A1)
 
I

icestationzbra

sorry if i was not clear in my earlier posting.

if you see the filenames that i have mentioned, there is a numbe
within the last 5-6 characters. thats the only number that keep
changing (1, 2, 3, 4, 5, 6, 7) within all the filenames that i hav
given. thats a sort of version number and thats what i am looking t
extract.

for example, in "ABC DEF123 IJK V7.xls", '7' is what i am looking t
extract. in "ABC_DEF123_IJK_V5.0.xls", '5' is what i am looking t
extract.

if the UDF extracts everything from the string, i might still have t
do some drudgery. some filenames are as long as 100 characters wit
several numbers in them.

however, there is one condition that almost 95% of the strings woul
adhere to. to the left of ".xls", if the first numeric character tha
is greater than zero could be picked, that would be the number i a
looking for.

please help
 
F

Frank Kabel

Hi
have you tried the UDF?. It should return only this version number. At
least it worked for your example data?
If not what problems did you encounter (please post the example
filename in these cases)
 
I

icestationzbra

frank,

i had tried your udf.

here are two strings that i tested it for, wherein it failed.

ABC_DEF_IJK_MNO070_UTHI05_1.1_V1.xls

the udf returned 11, the expected output is 1.

ABC_DEF_IJK_MNO120_QTLY19BREP_1.xls

the udf returned 0, the expected output is 1.

there are other cases wherein it gave the correct output.

thank you for your response and help.

mac
 
F

Frank Kabel

Hi
in the UDF change the line
sValue = Mid(rng.Value, Len(rng.Value) - 9, 5)

to
sValue = Mid(rng.Value, Len(rng.Value) - 8, 5)

This solves your second example. The first example is difficult as you
said that the version number is within the last 5 characters before
'.xls'. But in this example there's also a part of a previous number
within this range. To solve also this problem try the following UDF:
Public Function Get_Numerics(rng As Range) As Double
Dim i As Integer
Dim res
Dim sValue
Dim version
version = False
sValue = Mid(rng.Value, Len(rng.Value) - 8, 5)
For i = Len(sValue) To 1 Step -1
If IsNumeric(Mid(sValue, i, 1)) Then
version = i
res = Mid(sValue, version, 1)
While IsNumeric("0" & Mid(sValue, version - 1, 1) & res)
version = version - 1
res = Mid(sValue, version, 1) & res
Wend
Exit For
End If
Next i
res = CDbl(res)
Get_Numerics = res
End Function

Not fully tested but give it a try
 
H

hgrove

Frank Kabel wrote...
have you tried the UDF?. It should return only this version
number. At least it worked for your example data?
...

*YOU* didn't test your UDF on the filenames

ABC_DEF123_IJK_V2.xls

or

ABC DEF123 IJK V7.xls

for which your UDF returns 0. Why? Classic off-by-one indexing error
The statement

sValue = Mid(rng.Value, Len(rng.Value) - 9, 5)

would need to be replaced with

sValue = Mid(rng.Value, Len(rng.Value) - 8, 5)

in order to pick up version numbers immediately preceding '.xls', bu
that exposes another failing of your UDF. If the change above wer
made, then for the filenames

ABC_DEF123_IJK_V5.0.xls

and

ABC_DEF123_IJK_6_0.xls

it returns 50 and 60, respectively, because you're using a character a
a time algorithm that's clueless about '.' being a valid numeri
character as long as it appears once with at least one adjacen
numeral. Even then it'd choke on '6_0', which appears to be a vali
alternative to 6.0.

Anyway, a UDF that copes with all the OP's variations given so fa
needs to locate the rightmost numeric substring, and that requires a
least two loops - first to find the end of the rightmost numeri
substring, then to find it's beginning. Here's an alternative UD
that's more permissive with its argument and returns a string rathe
than a number, so strings without such numeric substrings return "
rather than #VALUE!.


Public Function vn(ByVal v As Variant) As String
Dim i As Long, n As Long, s As String, t As String

n = Len(v) - 4

If n <= 0 Then Exit Function 'v is too short


For i = 0 To 4

If Mid(v, n - i, 1) Like "[0-9]" Then
v = Mid(v, n - 4, 5 - i)
n = 5 - i
Exit For
End If

Next i


If i > 4 Then Exit Function


For i = n - 1 To 0 Step -1
s = Right(v, n - i)
t = Application.WorksheetFunction.Substitute(s, "_", ".")

If IsNumeric(s) Then
vn = s

ElseIf IsNumeric(t) Then
vn = t

Else
Exit For

End If

Next i


End Functio
 
H

hgrove

icestationzbra wrote...
...
i have strings that look like:

ABC_DEF123_IJK_V(1).xls ...
...
ABC DEF123 IJK V7.xls

these are the filenames of various files that users would have
created and made an entry into this one master sheet along a
column. i have as many nomenclatures as i have users.

i need to extract the (1, 2, 3, 4, 5, 6, 7) from the strings. they
denote a certain value that is of importance to other
calculations. ...
would anyone be able to give me one formula that could pick
that number for me from the string? there is one commonality in
those strings. the number that i require to be picked up resides
within 5 places to the left of the ".xls".

If it weren't for the filename

ABC_DEF123_IJK_6_0.xls

you could use the formula

=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(A1,9),5),
6-INT((ROW(INDIRECT("1:25"))-1)/5),
1+MOD(ROW(INDIRECT("1:25"))-1,5))))

to pull the version number from the filename in cell A1. I don't thin
there's any compact way to handle underscores between numerals a
decimal points, but I could be wrong
 
F

Frank Kabel

Frank Kabel wrote...
..

*YOU* didn't test your UDF on the filenames

Ack, got me :)

[....]
Anyway, a UDF that copes with all the OP's variations given so far
needs to locate the rightmost numeric substring, and that requires at
least two loops - first to find the end of the rightmost numeric
substring, then to find it's beginning. Here's an alternative UDF
that's more permissive with its argument and returns a string rather
than a number, so strings without such numeric substrings return ""
rather than #VALUE!.

like your approach. And your UDF also covers 5_5 (what my seond UDF
does not)


Frank
 
F

Frank Kabel

[....]
If it weren't for the filename

ABC_DEF123_IJK_6_0.xls

you could use the formula

=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(A1,9),5),
6-INT((ROW(INDIRECT("1:25"))-1)/5),
1+MOD(ROW(INDIRECT("1:25"))-1,5))))

to pull the version number from the filename in cell A1. I don't think
there's any compact way to handle underscores between numerals as
decimal points, but I could be wrong.

Hi Harlan
using two formulas (to avoid the maximum of 7 nested functions) One
could use:
B1:
=SUBSTITUTE(A1,"_",".",MAX(LEN(A1)-LEN(SUBSTITUTE(A1,"_","")),1))

C1:
=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(B1,9),5),6-INT((ROW(INDIRECT("1:25"))
-1)/5),1+MOD(ROW(INDIRECT("1:25"))-1,5))))

This works correctly for the following test data:
ABC_DEF123_IJK_V(1).xls
ABC_DEF123_IJK_V2.xls
ABC_DEF123_IJK_V3_A.xls
ABC_DEF123_IJK_V4_NA.xls
ABC_DEF123_IJK_V5,1.xls
ABC_DEF123_IJK6_2.xls
ABC DEF123 IJK V7.xls
ABC,DEF123,IJK6,2.xls
ABC_DEF123_IJK_6_2.xls
ABC_DEF123_IJK_6_0.xls

Frank
 
H

hgrove

hgrove wrote...
...
If it weren't for the filename

ABC_DEF123_IJK_6_0.xls

you could use the formula

=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(A1,9),5),
6-INT((ROW(INDIRECT("1:25"))-1)/5),
1+MOD(ROW(INDIRECT("1:25"))-1,5))))

to pull the version number from the filename in cell A1. I don't
think there's any compact way to handle underscores between
numerals as decimal points, but I could be wrong.

I was wrong. The following *nonarray* formula handles underscores a
decimal points as long as there are at most 2 of them in the 5 chars t
the left of '.xls'.

=LOOKUP(99999,
ABS(-MID(SUBSTITUTE(LEFT(RIGHT(F10,9),5),"_",".",
1+MOD(ROW(INDIRECT("1:50")),2)),
6-MOD(INT((ROW(INDIRECT("1:50"))-1)/5),5),
1+MOD(ROW(INDIRECT("1:50"))-1,5)))
 
F

Frank Kabel

Frank said:
[....]

Hi Harlan
using two formulas (to avoid the maximum of 7 nested functions) One
could use:
B1:
=SUBSTITUTE(A1,"_",".",MAX(LEN(A1)-LEN(SUBSTITUTE(A1,"_","")),1))

C1:
=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(B1,9),5),6-INT((ROW(INDIRECT("1:25"))
-1)/5),1+MOD(ROW(INDIRECT("1:25"))-1,5))))


And to create one single formula for this one may use:
=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(SUBSTITUTE(A1,"_",",",LEN("_"&A1)-LEN
(SUBSTITUTE("_"&A1,"_",""))),9),5),6-INT(seq/5),1+MOD(seq,5))))

with seq as a defined name for
=ROW(INDIRECT("1:25"))-1

Now one can just argue if this is a 'compact' formula :)

Frank
 
F

Frank Kabel

hgrove wrote...
..
I was wrong. The following *nonarray* formula handles underscores as
decimal points as long as there are at most 2 of them in the 5 chars
to the left of '.xls'.

=LOOKUP(99999,
ABS(-MID(SUBSTITUTE(LEFT(RIGHT(F10,9),5),"_",".",
1+MOD(ROW(INDIRECT("1:50")),2)),
6-MOD(INT((ROW(INDIRECT("1:50"))-1)/5),5),
1+MOD(ROW(INDIRECT("1:50"))-1,5))))

Hi Harlan
very impressive. Though this won't handle:
ABC_DEF123_IJK_6_2.xls

returns 38023.

Frank
 
F

Frank Kabel

Frank said:
Frank said:

And to create one single formula for this one may use:
=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(SUBSTITUTE(A1,"_",",",LEN("_"&A1)-LEN
(SUBSTITUTE("_"&A1,"_",""))),9),5),6-INT(seq/5),1+MOD(seq,5))))

Make this
=LOOKUP(99999,ABS(-MID(LEFT(RIGHT(SUBSTITUTE(A1,"_",".",LEN("_"&A1)-LEN
(SUBSTITUTE("_"&A1,"_",""))),9),5),6-INT(seq/5),1+MOD(seq,5))))

for the dot as decimal point (and not the coma)
 
H

hgrove

Frank Kabel wrote...
...
using two formulas (to avoid the maximum of 7 nested
functions) One could use:
B1:
=SUBSTITUTE(A1,"_",".",MAX(LEN(A1)-LEN(SUBSTITUTE(A1,
"_","")),1))
...

Two formulas not needed. You seem to have posted this before I sent m
latest prior follow-up.
This works correctly for the following test data: ...
ABC,DEF123,IJK6,2.xls
...

Testing is good, but it should be realistic. These are filenames, and
believe it's uncommon for Mac files to carry extensions, so these ar
likely Windows filenames. If so, commas aren't valid characters i
filenames
 
I

icestationzbra

harlan:

incognito with 'hgrove'. what gives?

frank:

thanks for the perseverence.

i will try both the udfs and the formulae. please bear with me till
test it out and get back to you.

thanks,

mac
 
F

Frank Kabel

Frank Kabel wrote...
..
Two formulas not needed. You seem to have posted this before I sent my
latest prior follow-up.

Correct. But I also posted a combined formula :) Though in the last
one I forgot to add and "_" & part. But as this is now mainly fun (at
least for me) this is probably not that important.
Also saw your formula and added a comment regarding version numbers
like _6_2.xls.

Testing is good, but it should be realistic. These are filenames, and
I believe it's uncommon for Mac files to carry extensions, so these
are likely Windows filenames. If so, commas aren't valid characters in
filenames.

Agreed (not a realistic one. Was just one to check for strings without
'_'. But definelty not a valid filename

Frank
 
H

hgrove

Frank Kabel wrote...
...
. . . Though this won't handle:
ABC_DEF123_IJK_6_2.xls

returns 38023.

Looks like you're testing my formula on a machine with non-US defaul
date formats since 38023 is 6-Feb-2004 under the 1900 date system. M
formula returns 6.2 for this on my machine with standard US settings.

Unless the OP surprises me, I'm not going to worry much about thi
failing internationalization. For that matter, does this imply tha
Excel is stupid enough to convert "6.2" to a date rather than 6 + 2/1
 
F

Frank Kabel

Frank Kabel wrote...
..

Looks like you're testing my formula on a machine with non-US default
date formats since 38023 is 6-Feb-2004 under the 1900 date system. My
formula returns 6.2 for this on my machine with standard US settings.

Unless the OP surprises me, I'm not going to worry much about this
failing internationalization. For that matter, does this imply that
Excel is stupid enough to convert "6.2" to a date rather than 6 + 2/10
?

Hi Harlan
correct guess. changing the regional settings for the decimal delimiter
corrects this (and I agree one should not worry about this).

Also correct that Excel converts '6.2' to a date

Frank
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top