Extracting numbers from a text string

Guest

How would I extract a number (or series of numbers) using a single cell
formula from an alphanumeric text string that also contains spaces? One key
property of the number(s) in question is that they have a space on either
side. See the example text string below:

"A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA"

In this example, I am trying to extract the number "8". However, there is
also the number "5", which appears first, and that I want to avoid
extracting. Additionally, the formula should be able to extract any number,
not specific to "8" or "5". This formula should always skip the first single
number surrounded by spaces ("5" in this example) and extract the second
number(s) ("8" in this example).
 
Guest

If you always have a space and 6 characters after the number you seek, then...

=LEFT(RIGHT(A1,9),1)*1
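Chuck's formula is purely positional: RIGHT(A1,9) keeps the last nine characters, LEFT(...,1) keeps the first of those, and multiplying by 1 turns that text into a number. A minimal Python sketch of the same steps (the function name is hypothetical) makes the assumption visible: the target must be a single digit sitting exactly nine characters from the end of the string.

```python
def left_right_9(text: str) -> int:
    """Sketch of =LEFT(RIGHT(A1,9),1)*1 (hypothetical helper name)."""
    last_nine = text[-9:]        # RIGHT(A1, 9)
    first_char = last_nine[:1]   # LEFT(..., 1)
    # *1 coerces the text to a number; int() raises ValueError where Excel
    # would return an error value.
    return int(first_char)


row = "A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA"
print(left_right_9(row))  # 8
```

On a row ending in "... 10 Aaa AAA Y 7/1/2016", the last nine characters are " 7/1/2016", so the first character is a space and the conversion fails.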

Vaya con Dios,
Chuck, CABGx3
 
Guest

Based solely on the criteria you provided, this should work:

=MID(MID(A1,SEARCH(" ? ",A1)+1,LEN(A1)),SEARCH(" ? ",MID(A1,SEARCH(" ? ",A1)+1,LEN(A1)))+1,1)
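In SEARCH, the ? wildcard matches any single character, so " ? " finds the first one-character token with a space on each side. The formula finds that token ("5"), searches again in the text that follows it, and returns the one character after the second match. A rough Python equivalent, using the regular expression " . " in place of the wildcard (the helper name is hypothetical); like the formula, it only ever finds single-character tokens, so a two-digit value such as "10" is not picked up this way.

```python
import re


def second_single_char_token(text: str) -> str:
    """Rough equivalent of the nested MID/SEARCH formula (hypothetical name)."""
    # SEARCH(" ? ", A1): leftmost space + any one character + space.
    first = re.search(r" . ", text)
    # MID(A1, pos + 1, LEN(A1)): text starting at the character after that space.
    rest = text[first.start() + 1:]
    # Second SEARCH, run on the remainder (which starts with the first token).
    second = re.search(r" . ", rest)
    # MID(rest, pos + 1, 1): the single character after the space.
    return rest[second.start() + 1]


row = "A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA"
print(second_single_char_token(row))  # 8
```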

HTH,
Elkar
 
Guest

The text string is always different (i.e., it does not always have a space and
6 characters after the number).

Any other suggestions?
 
Guest

Thanks, that worked great! One caveat, though: what if the number in question
can come with or without a "-" (negative sign), as in "-8", and/or the number
I am trying not to extract can have up to 2 decimal places, as in "5.75"?
 
Pete_UK

I think you might find it easier to use Data | Text-to-columns and
split the data at each space. Then you would get (in different
columns):

A
544646-BA
CALIF
UN1MM+
5
7/1/2017
FSA
108.579
3.90
3.82
8
Aaa
AAA

and then you can decide more easily which of these you want to take as
your number.
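Splitting at each space is what Text-to-columns does with a space delimiter. A small Python sketch of the same split on the original example, just to show which column the target lands in (numbering is 1-based to match the list above):

```python
row = "A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA"

# Split at each single space, like Text-to-columns with a space delimiter.
columns = row.split(" ")

for number, value in enumerate(columns, start=1):
    print(number, value)

# In this particular row the wanted value is the 11th column (index 10).
print(columns[10])  # 8
```

With split(" "), two consecutive spaces produce an empty column; Text-to-columns has a "treat consecutive delimiters as one" option, which behaves more like Python's argument-less str.split().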

Hope this helps.

Pete
 
Guest

You ask too much. From your first statement of the problem the target number
is the second of 2 bracketed by spaces. Now you completely change the
criteria. Moreover, there are at least TWO numbers with decimals in the
example you provided, and you would want to skip them both, right?

You apparently want an infinitely flexible solution.
 
Guest

Hmm... if the number could contain decimal places, then based on your
example, 3.90 would be returned (or perhaps even 108.579 if the number can be
greater than 10). I think we would need to find a better set of criteria
than "a number surrounded by spaces". Are there any other commonalities
amongst your data that we can work with (e.g., total number of spaces, first
number from the right, etc.)?

Or, perhaps if you posted some more examples of your data, we might be able
to see a useful pattern.
 
Guest

Okay, let me make this a bit clearer: The number I am looking for ("8" in
the example) can be more than one digit (up to 3), but it will always be a
number without decimals and a space on either side. The number that comes
first in the example ("5"), which I want to avoid, can have up to 2 decimal
places, or no decimals as in the example. The formula should not pull any
single letters that have spaces on either side. Here are 3 more example
strings:

1/4/2007 CA 544646-BH LOS ANGELES CALIF UN 1000000 5 7/1/2022 FGIC 108.151 3.96 3.86 10 Aaa AAA Y 7/1/2016

....here I am looking for "10"

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA 115.907 3.74 3.67 7 Aaa AAA N 7/1/2016

....here for "7"

1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA 115.907 3.74 3.67 17 Aaa AAA N 7/1/2016

....and here for "17"

Your help is greatly appreciated.
 
Guest

It would appear that way from my examples, but the answer would have to be no.

Thanks for giving this another crack...
 
Guest

Ok, give this one a try:

=MID(A1,FIND("~",SUBSTITUTE(A1," ","~",14))+1,FIND("~",SUBSTITUTE(A1," ","~",15))-FIND("~",SUBSTITUTE(A1," ","~",14))-1)

I've made two assumptions here which may or may not be correct. First off,
the number you're looking for is always between the 14th and 15th spaces.
And, you don't have the ~ symbol in any of your data. If either assumption
is incorrect, then I'll try something else.
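The trick here is that SUBSTITUTE(A1," ","~",n) replaces only the n-th space, so FIND("~",...) returns that space's position. A Python sketch of the same idea (helper names are hypothetical) locates the n-th space directly instead of through a marker character, so the "no ~ in the data" assumption is not needed; the fixed-position assumption still is, as the rows the poster reports as failing show.

```python
def nth_space_index(text: str, n: int) -> int:
    """0-based index of the n-th space (n counts from 1).

    Plays the role of FIND("~",SUBSTITUTE(A1," ","~",n)), minus 1 for 0-based indexing.
    """
    pos = -1
    for _ in range(n):
        # str.index raises ValueError if there are fewer than n spaces,
        # much as FIND returns #VALUE! when the marker is missing.
        pos = text.index(" ", pos + 1)
    return pos


def token_after_nth_space(text: str, n: int) -> str:
    """Text between the n-th and (n+1)-th spaces, like the MID formula."""
    start = nth_space_index(text, n) + 1
    end = nth_space_index(text, n + 1)
    return text[start:end]


row = "1/4/2007 CA 544646-BH LOS ANGELES CALIF UN 1000000 5 7/1/2022 FGIC 108.151 3.96 3.86 10 Aaa AAA Y 7/1/2016"
print(token_after_nth_space(row, 14))  # 10
```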

HTH,
Elkar
 
Guest

OK, these lines represent bond issues, right?
Is it fair to say that these would parse into (rough guess)

Date (purchase?)
State of Obligation
CUSIP
Description
Par
Coupon
Maturity
Guarantor
Price
YTM
YTC
?
Call Date

And you want the ? value

What does it represent, and is there a better way of grabbing it?
 
Guest

We are getting closer. This one would not work because the numbers are not
always going to be between the 14th and 15th spaces.

Here are some examples of where it would not work:

12/15/2006 CA 544644-L6 LOS ANGELES CALIF UN1MM+ 5 7/1/2018 AMBAC 108.748 3.79 3.71 8 Y Aaa AAA Y 7/1/2015

....here it pulls Y

2/6/2007 CA 544646-BA LOS ANGELES CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA Y 7/1/2016

....here it pulls "Aaa"
 
Guest

You are correct. I am looking for the spread: the difference between what you
call YTM and YTC, which are actually the benchmark yield and the bond's yield.
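Since the spread is described as the difference between the two yields that precede it, it can in principle be checked against them. In the sample rows from this thread the spread column happens to equal that difference in hundredths of a percent (3.90 - 3.82 gives 8); that is an observation from the samples, not something the poster confirmed, and the "17" row does not fit it. A minimal sketch, with a hypothetical function name, using Decimal so that values like 0.08 are exact:

```python
from decimal import Decimal


def spread_hundredths(first_yield: str, second_yield: str) -> int:
    """Difference between two yields quoted in percent, in hundredths (basis points).

    Assumes the yields are given as text in the order they appear in the row.
    """
    diff = Decimal(first_yield) - Decimal(second_yield)
    return int(diff * 100)


print(spread_hundredths("3.90", "3.82"))  # 8
```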
 
Guest

After the second date value, will you ALWAYS have separate numbers
representing price, benchmark yield, and bond yield, and frequently but not
always have a guarantor name, too? Any chance the guarantor name will have
spaces?
 
Guest

This will get you a string that starts with the guarantor, or with the price
where there is no guarantor:

=RIGHT(A1,LEN(A1)-5-SEARCH("\",SUBSTITUTE(A1,"/","\",4)))
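The formula relies on the fourth "/" in the string being the second slash of the second date: SUBSTITUTE(A1,"/","\",4) marks that slash, SEARCH finds it, and the extra 5 skips the four-digit year plus the space after it. A Python sketch of the same arithmetic, assuming (as the formula does) a four-digit year and exactly two slashes in the first date; the function name is hypothetical:

```python
def after_second_date(text: str) -> str:
    """Sketch of =RIGHT(A1,LEN(A1)-5-SEARCH("\\",SUBSTITUTE(A1,"/","\\",4)))."""
    idx = -1
    for _ in range(4):
        # Position of the next "/"; raises ValueError if there are fewer than four.
        idx = text.index("/", idx + 1)
    # Skip the slash itself, the 4-digit year, and one space: 1 + 4 + 1 = 6.
    return text[idx + 6:]


row = "2/6/2007 CA 544646-BA LOS ANGELES CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 8 Aaa AAA Y 7/1/2016"
print(after_second_date(row))  # FSA 108.579 3.90 3.82 8 Aaa AAA Y 7/1/2016
```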

More to follow, based on your response to the earlier question
 
Guest

Yes, price, benchmark yield, and bond yield will always come before the spread,
and there will occasionally be a guarantor name, which will not contain a space.
 
Ron Rosenfeld

Looking at your examples and trying to read a bit between the lines, it appears
as if the integer number you are trying to extract can also be described as the
last integer surrounded by spaces in the string.

That being the case, here is one solution:

Download and install Longre's free and easily distributable morefunc.xll add-in
from http://xcell05.free.fr/

Then use this "regular expression" formula:

=REGEX.MID(A1,"(?<=\s)\d+(?=\s)",-1)
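For readers without the add-in, the same pattern can be tried with Python's re module; this is a swapped-in regex engine used only to show what the expression matches, not a replacement for REGEX.MID. Going by the description "the last integer surrounded by spaces", the -1 argument picks the last match, which is imitated here by taking the final item from findall (the function name is hypothetical).

```python
import re

# Same pattern as in the REGEX.MID call: one or more digits with whitespace
# immediately before (lookbehind) and immediately after (lookahead).
INTEGER_BETWEEN_SPACES = re.compile(r"(?<=\s)\d+(?=\s)")


def last_integer_between_spaces(text: str):
    """Return the last match as text, or None if there is no match."""
    matches = INTEGER_BETWEEN_SPACES.findall(text)
    return matches[-1] if matches else None


row = "1/8/2007 CA 544644-NC LOS ANGELES CALIF UN 1000000 5.75 7/1/2016 MBIA 115.907 3.74 3.67 7 Aaa AAA N 7/1/2016"
print(INTEGER_BETWEEN_SPACES.findall(row))  # ['1000000', '7']
print(last_integer_between_spaces(row))     # 7
```

The par amount (1000000) also matches, which is why taking the last match rather than the first matters.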


--ron
 
Ron Rosenfeld

Well, I just noted that the number to be extracted can be a negative number, so
use this instead:

=REGEX.MID(A1,"(?<=\s)-?\d+(?=\s)",-1)
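The only change is the optional -? before \d+. A quick check of the revised pattern with Python's re module, using a hypothetical row (the original example with the spread changed to -8). It also shows why a decimal such as 5.75 is never matched: the digits before the point are followed by "." rather than whitespace. The names below are illustrative only.

```python
import re

# Optional minus sign, then digits, with whitespace on both sides.
SIGNED_INTEGER = re.compile(r"(?<=\s)-?\d+(?=\s)")


def last_signed_integer_between_spaces(text: str):
    """Return the last signed integer surrounded by whitespace, or None."""
    matches = SIGNED_INTEGER.findall(text)
    return matches[-1] if matches else None


# Hypothetical row: the thread's first example with the spread set to -8.
row = "A 544646-BA CALIF UN1MM+ 5 7/1/2017 FSA 108.579 3.90 3.82 -8 Aaa AAA"
print(SIGNED_INTEGER.findall(row))              # ['5', '-8']
print(last_signed_integer_between_spaces(row))  # -8
print(SIGNED_INTEGER.findall("x 5.75 y"))       # []
```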


--ron
 
