Extract text within ( )

M

M

Formula to enter to extract the text within ( ) in a text string
Example: "My Text String (MTS)" and extract or return MTS
Currently the "(XXX)" is always at the end of the text string
Using MS Excel 2002
 
P

Peo Sjoblom

Are those the only parenthesis?

=SUBSTITUTE(MID(A1,FIND("(",A1)+1,255),")","")

replace A1 with your cell



--


Regards,


Peo Sjoblom
 
M

M

Thank you ever so much Peo -
WORKED GREAT!!!!!
On going through a long list it turned out that there were cases
with additional parenthesis -
But the prenethesis which I want to extract the text from is the
one on the furthest right.
For example I found one that looked like:
"My Text String (XPYZ (MTS)" in addition
"My Text String (MTS)"
and I still want to return MTS from both examples
 
P

Peo Sjoblom

If there always will be a space before the parenthesis
with the string you want to extract

=SUBSTITUTE(MID(A1,FIND("^^",SUBSTITUTE(A1,"
","^^",LEN(A1)-LEN(SUBSTITUTE(A1," ",""))))+2,255),")","")



--


Regards,


Peo Sjoblom
 
M

M

Thanks Peo
Worked Great!!!!
But in addition to "My Text String (MTS)" I also found
"My Text String (ADT (MTS)"
For which I would still like to return "MTS"
 
R

Ron Rosenfeld

Formula to enter to extract the text within ( ) in a text string
Example: "My Text String (MTS)" and extract or return MTS
Currently the "(XXX)" is always at the end of the text string
Using MS Excel 2002

You could download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr and use this Regular Expression formula to extract the
alphanumeric text string within the last set of "(...)"

Easy to modify if that is not precisely what you want. And fairly easy to
write a UDF to mimic it.

Also, this will not work on strings longer than 255 characters -- if that might
be the case, post back and I will put up an equivalent UDF.


=REGEX.SUBSTITUTE(A1,".*\((\w+)\).*$","[1]")


--ron
 
H

Harlan Grove

Ron Rosenfeld said:
You could download and install Longre's free morefunc.xll add-in from >http://xcell05.free.fr and use this Regular Expression formula to extract the
alphanumeric text string within the last set of "(...)" ....
=REGEX.SUBSTITUTE(A1,".*\((\w+)\).*$","[1]")

Or use assertions.

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))")

Also begs the question whether to be as inclusive as possible within
the parentheses. If so, [^()]* matches more substrings than \w+.
 
R

Ron Rosenfeld

Ron Rosenfeld said:
You could download and install Longre's free morefunc.xll add-in from >http://xcell05.free.fr and use this Regular Expression formula to extract the
alphanumeric text string within the last set of "(...)" ...
=REGEX.SUBSTITUTE(A1,".*\((\w+)\).*$","[1]")

Or use assertions.

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))")


If you are going to use assertions that way, per the OP's requirements, you
need to ensure you return the last match.

Given, for example:

My Text String (XPYZ) (MTS)) in addition

your REGEX.MID returns (XPYZ),

So, I would use:

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))",-1) or
=REGEX.MID(A1,"(?<=\()[^()]+(?=\)[^(]*$)")

But I've been purposely trying to avoid using lookbehind assertions since they
are not supported in VBScript. And often enough, it has happened that I've had
to switch to VBA.

Also begs the question whether to be as inclusive as possible within
the parentheses. If so, [^()]* matches more substrings than \w+.

Absolutely correct, and a good point to make explicitly. (That is why, in my
description, I wrote "alphanumeric text string".)

Now, having written all that, and thought about it some more, it appears as if
the look-behind assertion may not even be necessary!

For example:

[^()]+(?=\)[^(]*$)

or even:

\w+(?=\)[^(]*$)

or

[A-Z]+(?=\)[^(]*$)

All seem to work, and would work in either PCRE or VBScript.

Thanks.
--ron
 
M

M

Thank You "ALL", Peo, Ron & Harlen
ALL super helpful: I am learnin a lot -
Back to Peo's original suggestion - I finally figured out what
everything in the formula =SUBSTITUTE(MID(A1,FIND("(",A1)+1,255),")","") and
will use this and those suggestions form Ron and Harlen in the future - BUT
what does the 255 refer to in the above example. I also just found another
useful information site: http://www.techonthenet.com/excel/formulas/

AGAIN THANKS ALL!!!!! REALLY I sometimes spend hours trying to figure this
stuff out on my own and you all are a super HELP!!!!
M

Ron Rosenfeld said:
Ron Rosenfeld said:
You could download and install Longre's free morefunc.xll add-in from >http://xcell05.free.fr and use this Regular Expression formula to extract the
alphanumeric text string within the last set of "(...)" ...
=REGEX.SUBSTITUTE(A1,".*\((\w+)\).*$","[1]")

Or use assertions.

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))")


If you are going to use assertions that way, per the OP's requirements, you
need to ensure you return the last match.

Given, for example:

My Text String (XPYZ) (MTS)) in addition

your REGEX.MID returns (XPYZ),

So, I would use:

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))",-1) or
=REGEX.MID(A1,"(?<=\()[^()]+(?=\)[^(]*$)")

But I've been purposely trying to avoid using lookbehind assertions since they
are not supported in VBScript. And often enough, it has happened that I've had
to switch to VBA.

Also begs the question whether to be as inclusive as possible within
the parentheses. If so, [^()]* matches more substrings than \w+.

Absolutely correct, and a good point to make explicitly. (That is why, in my
description, I wrote "alphanumeric text string".)

Now, having written all that, and thought about it some more, it appears as if
the look-behind assertion may not even be necessary!

For example:

[^()]+(?=\)[^(]*$)

or even:

\w+(?=\)[^(]*$)

or

[A-Z]+(?=\)[^(]*$)

All seem to work, and would work in either PCRE or VBScript.

Thanks.
--ron
 
H

Harlan Grove

Ron Rosenfeld said:
If you are going to use assertions that way, per the OP's requirements, you
need to ensure you return the last match.
....

Fair point.
=REGEX.MID(A1,"(?<=\()[^()]*(?=\))",-1)  or
Best.

=REGEX.MID(A1,"(?<=\()[^()]+(?=\)[^(]*$)")

Less efficient.
But I've been purposely trying to avoid using lookbehind assertions since they
are not supported in VBScript.  And often enough, it has happened that I've had
to switch to VBA.
....

For long string support? You could handle them with VBScript by using
a regex to check the match pattern itself. If you find a look behind
assertion, convert it to a separate expression, e.g.,

(?<=\()[^()]*(?=\)) -> \([^()]*(?=\))

use the modified regex to get the desired substring from the source
string, then trim off the bit matching the look behind assertion. It'd
make the wrapper function more complicated, but you could add support
for look behind assertions this way.
Now, having written all that, and thought about it some more, it appears as if
the look-behind assertion may not even be necessary!

For example:

[^()]+(?=\)[^(]*$)

or even:

\w+(?=\)[^(]*$)

or

[A-Z]+(?=\)[^(]*$)

All seem to work, and would work in either PCRE or VBScript.

All would fail if there were an unmatched left parenthesis to the
right of all other parentheses. Not necessarily an idle consideration
since your example string

My Text String (XPYZ) (MTS)) in addition

has unbalanced parentheses. All may fail if there were only a right
parenthesis with no left parentheses if the OP wanted "" returned in
such cases.
 
P

Peo Sjoblom

It tells you how many characters to return after the last space, I didn't
think you would have a
string parenthesis included that would be more than that :) So it is just an
arbitrary number
and very rarely would a last string (since you told us it was always the
last word) exceed
that. I am sure putting 20 would be enough.

--


Regards,


Peo Sjoblom

M said:
Thank You "ALL", Peo, Ron & Harlen
ALL super helpful: I am learnin a lot -
Back to Peo's original suggestion - I finally figured out what
everything in the formula =SUBSTITUTE(MID(A1,FIND("(",A1)+1,255),")","")
and
will use this and those suggestions form Ron and Harlen in the future -
BUT
what does the 255 refer to in the above example. I also just found another
useful information site: http://www.techonthenet.com/excel/formulas/

AGAIN THANKS ALL!!!!! REALLY I sometimes spend hours trying to figure this
stuff out on my own and you all are a super HELP!!!!
M

Ron Rosenfeld said:
...
You could download and install Longre's free morefunc.xll add-in from
http://xcell05.free.fr and use this Regular Expression formula to
extract the
alphanumeric text string within the last set of "(...)"
...
=REGEX.SUBSTITUTE(A1,".*\((\w+)\).*$","[1]")

Or use assertions.

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))")


If you are going to use assertions that way, per the OP's requirements,
you
need to ensure you return the last match.

Given, for example:

My Text String (XPYZ) (MTS)) in addition

your REGEX.MID returns (XPYZ),

So, I would use:

=REGEX.MID(A1,"(?<=\()[^()]*(?=\))",-1) or
=REGEX.MID(A1,"(?<=\()[^()]+(?=\)[^(]*$)")

But I've been purposely trying to avoid using lookbehind assertions since
they
are not supported in VBScript. And often enough, it has happened that
I've had
to switch to VBA.

Also begs the question whether to be as inclusive as possible within
the parentheses. If so, [^()]* matches more substrings than \w+.

Absolutely correct, and a good point to make explicitly. (That is why,
in my
description, I wrote "alphanumeric text string".)

Now, having written all that, and thought about it some more, it appears
as if
the look-behind assertion may not even be necessary!

For example:

[^()]+(?=\)[^(]*$)

or even:

\w+(?=\)[^(]*$)

or

[A-Z]+(?=\)[^(]*$)

All seem to work, and would work in either PCRE or VBScript.

Thanks.
--ron
 
R

Ron Rosenfeld

...

For long string support?

Yes. That and also dealing with capturing groups (.submatches) in certain
instances.
You could handle them with VBScript by using
a regex to check the match pattern itself. If you find a look behind
assertion, convert it to a separate expression, e.g.,

(?<=\()[^()]*(?=\)) -> \([^()]*(?=\))

use the modified regex to get the desired substring from the source
string, then trim off the bit matching the look behind assertion. It'd
make the wrapper function more complicated, but you could add support
for look behind assertions this way.

Interesting approach. I like the flexibility. I'll have to think about
implementing that one of these days.
Now, having written all that, and thought about it some more, it appears as if
the look-behind assertion may not even be necessary!

For example:

[^()]+(?=\)[^(]*$)

or even:

\w+(?=\)[^(]*$)

or

[A-Z]+(?=\)[^(]*$)

All seem to work, and would work in either PCRE or VBScript.

All would fail if there were an unmatched left parenthesis to the
right of all other parentheses. Not necessarily an idle consideration
since your example string

My Text String (XPYZ) (MTS)) in addition

has unbalanced parentheses. All may fail if there were only a right
parenthesis with no left parentheses if the OP wanted "" returned in
such cases.

Good point. I had not considered something like:

"My Text String XPYZ (MTS) ( in addition"

--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top