Using a regular expression to retrieve the text between two parentheses

Mark Rae · Jan 15, 2007

Hi,

Supposing I had a string made up of a person's name followed by their
profession in parentheses e.g.

string strText = "Tiger Woods (golfer)";

and I wanted to extract the portion of the string between the parentheses
i.e. "golfer"

Would a regular expression be the most efficient way of doing this...?

I'm trying to do something like this:

string strProfession = String.Empty;
Regex objRegEx = new Regex("(((.|\n)*?))", RegexOptions.IgnoreCase);
foreach (Match objMatch in objRegEx.Matches(strText))
{
strProfession = objMatch.ToString();
}

but that is returning an empty string, no doubt because I haven't defined
the regular expression correctly.

Also, is it even necessary to have a foreach loop here, as in this
particular scenario there can only ever be one match...?

Any assistance gratefully received.

Mark

Arne Vajhøj · Jan 15, 2007

Mark said:
Supposing I had a string made up of a person's name followed by their
profession in parentheses e.g.

string strText = "Tiger Woods (golfer)";

and I wanted to extract the portion of the string between the parentheses
i.e. "golfer"

Would a regular expression be the most efficient way of doing this...?

I'm trying to do something like this:

string strProfession = String.Empty;
Regex objRegEx = new Regex("(((.|\n)*?))", RegexOptions.IgnoreCase);
foreach (Match objMatch in objRegEx.Matches(strText))
{
strProfession = objMatch.ToString();
}

but that is returning an empty string, no doubt because I haven't defined
the regular expression correctly.

Also, is it even necessary to have a foreach loop here, as in this
particular scenario there can only ever be one match...?

string s = "Tiger Woods (golfer)";
Regex re = new Regex(@"(\()([^\)]*)(\))");
string prof = re.Match(s).Groups[2].Value;

seems to work.

No, a regex will typically not be the most efficient way of coding it,
but it is simple code with a well-documented syntax.

Some spaghetti with IndexOf will be faster, but it would
also be much easier to introduce bugs when modifying the code.
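
For comparison, the IndexOf route might look like the sketch below. It assumes the first "(" and the next ")" after it are the pair you want; the class and method names are made up for illustration.

```csharp
using System;

static class ParenText
{
    // Copies the text between the first '(' and the next ')' after it
    // into 'inner'. Returns false (and an empty 'inner') when the input
    // does not contain such a pair.
    public static bool TryBetween(string text, out string inner)
    {
        inner = string.Empty;

        int open = text.IndexOf('(');
        if (open < 0) return false;

        int close = text.IndexOf(')', open + 1);
        if (close < 0) return false;

        inner = text.Substring(open + 1, close - open - 1);
        return true;
    }

    static void Main()
    {
        string profession;
        if (TryBetween("Tiger Woods (golfer)", out profession))
            Console.WriteLine(profession); // golfer
    }
}
```

The off-by-one arithmetic in the Substring call is exactly the sort of thing that is easy to break later.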

Arne

Jon Shemitz · Jan 15, 2007

Regex re = new Regex(@"(\()([^\)]*)(\))");

@"\( ([^\)]+) \)", RegexOptions.IgnorePatternWhitespace

is probably a bit simpler and faster - there's no real need to capture
the parens.

No, a regex will typically not be the most efficient way of coding it,
but it is simple code with a well-documented syntax.

You might be surprised. I compared a regex to find all tokens between
% signs (@"% (\w+) %") with a hand-coded state machine. Not only did
the hand-coded version take about 180 times as long to write (i.e.
fifteen minutes vs. five seconds), it also ran slower. As soon as the
task gets at all complex, a regex can save both programmer time and
run time.
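
As a rough idea of what the regex side of that comparison looks like (the state machine and the timings are not reproduced here), this sketch collects every token between % signs. The sample sentence and the names PercentTokens and Find are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PercentTokens
{
    // IgnorePatternWhitespace makes the spaces in the pattern insignificant,
    // so this is equivalent to %(\w+)% and does not require literal spaces.
    static readonly Regex Token =
        new Regex(@"% (\w+) %", RegexOptions.IgnorePatternWhitespace);

    // Returns the text of capture group 1 for each match, in order.
    public static List<string> Find(string text)
    {
        List<string> tokens = new List<string>();
        foreach (Match m in Token.Matches(text))
            tokens.Add(m.Groups[1].Value);
        return tokens;
    }

    static void Main()
    {
        List<string> found = Find("Dear %name%, order %id% has shipped.");
        Console.WriteLine(string.Join(", ", found.ToArray())); // name, id
    }
}
```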

Some spaghetti with IndexOf will be faster, but it would
also be much easier to introduce bugs when modifying the code.

Yes, and the regex is easier to read and maintain - part of one line,
instead of three statements and some comments.

Gary Stephenson · Jan 15, 2007

Hi,

string strText = "Tiger Woods (golfer)";

Others have supplied working regexes - so I won't repeat.

But perhaps you should be made aware of the limitations implicit in
regexes - the main one being commonly rendered as "regexes can't count". As
long as you are not having to deal with recursive structures, nested
delimiters and so on, regexes will often work well. But they can't be used
to "find the balancing brace", verify correct nesting or suchlike.

In theory, you can explicitly construct a regex to cope with a given maximum
number of pairs of balancing delimiters, but even to cope with a single
extra level of nesting requires a regex pattern so complex that it's
clearer and simpler to just code the match algorithm explicitly.
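
To give a feel for the fixed-depth approach, here is one possible pattern that tolerates a single extra level of nesting and nothing deeper. It is only a sketch; the names FixedDepth and OneLevel and the sample strings are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class FixedDepth
{
    // An opening paren, then any mix of non-paren characters and
    // complete inner "(...)" groups that contain no parens themselves,
    // then the closing paren. Depth is hard-wired: one inner level only.
    public static readonly Regex OneLevel = new Regex(
        @"\( (?: [^()] | \( [^()]* \) )* \)",
        RegexOptions.IgnorePatternWhitespace);

    static void Main()
    {
        // One level of nesting: the outer group is matched as a whole.
        Console.WriteLine(OneLevel.Match("f(a (b) c) d").Value);   // (a (b) c)

        // Two levels: the pattern cannot match from the outermost paren,
        // so it silently settles for an inner group instead.
        Console.WriteLine(OneLevel.Match("x(a (b (c)) d)").Value); // (b (c))
    }
}
```

Each further level means pasting the inner alternative into itself once more, which is where the pattern stops being readable.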

And of course there is the school of thought that regexes are only rarely
the best solution:

"Some people, when confronted with a problem, think 'I know, I'll use
regular expressions.' Now they have two problems." - Jamie Zawinski [1]

cheers,

gary

http://www.oxide.net.au

[1] For an entertaining discussion of the origins of this quote, see
Jeffrey Friedl's blog at
http://regex.info/blog/2006-09-15/247

Mark Rae · Jan 15, 2007

string s = "Tiger Woods (golfer)";
Regex re = new Regex(@"(\()([^\)]*)(\))");
string prof = re.Match(s).Groups[2].Value;

seems to work.

Yes indeed - thanks very much.

No, a regex will typically not be the most efficient way of coding it,
but it is simple code with a well-documented syntax.

OK.

Some spaghetti with IndexOf will be faster, but it would
also be much easier to introduce bugs when modifying the code.

I guess so...

Mark Rae · Jan 15, 2007

Regex re = new Regex(@"(\()([^\)]*)(\))");

@"\( ([^\)]+) \)", RegexOptions.IgnorePatternWhitespace

is probably a bit simpler and faster - there's no real need to capture
the parens.

That returns an empty string...

You might be surprised. I compared a regex to find all tokens between
% signs (@"% (\w+) %") with a hand-coded state machine. Not only did
the hand-coded version take about 180 times as long to write (ie,
fifteen minutes vs five seconds) it also ran slower. As soon as the
task gets at all complex, a regex can save both programmer time and
run time.

I have a real "blind-spot" with regular expressions... After over 20 years
of programming in all sorts of languages, I *still* can't do them in my
head, or look at them and know intuitively what they're doing... :-)

Mark Rae · Jan 15, 2007

Others have supplied working regexes - so I won't repeat.

OK - thanks...

Arne Vajhøj · Jan 15, 2007

Mark said:
I have a real "blind-spot" with regular expressions... After over 20 years
of programming in all sorts of languages, I *still* can't do them in my
head, or look at them and know intuitively what they're doing...

The syntax is horrible.

But it is well documented. And there are a ton of supporting
tools out there.

Arne

Mark Rae · Jan 15, 2007

The syntax is horrible.

That's for sure!

But it is well documented. And there are a ton of supporting
tools out there.

Can you recommend one? I've looked at several over the years, but almost all
of them seem to be designed to show the effect of a regular expression on a
string, rather than "build me a regular expression which will..."

If you could have found one which would have built me the "find all the text
between the opening and closing parentheses" expression, I wouldn't have
troubled the newsgroup...

Arne Vajhøj · Jan 15, 2007

Mark said:
Can you recommend one? I've looked at several over the years, but almost all
of them seem to be designed to show the effect of a regular expression on a
string, rather than "build me a regular expression which will..."

If you could have found one which would have built me the "find all the text
between the opening and closing parentheses" expression, I wouldn't have
troubled the newsgroup...

I am not aware of any English-to-regex translator, but an
interactive tool that shows what you get out of a regex
is useful as well, because it helps you
build up the regex incrementally.

Arne

Jon Shemitz · Jan 15, 2007

Mark said:
@"\( ([^\)]+) \)", RegexOptions.IgnorePatternWhitespace

is probably a bit simpler and faster - there's no real need to capture
the parens.

That returns an empty string...

No, it doesn't. The Match.Value is the parenthesized expression;
Match.Groups[1].Value is the string in the parens.
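
Spelled out against the string from the original post, the difference looks like this. A single Match call with a Success check is enough here, so no foreach loop is needed. The class and method names are invented for the sketch.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    static readonly Regex Parens =
        new Regex(@"\( ([^\)]+) \)", RegexOptions.IgnorePatternWhitespace);

    // Returns the first match of the pattern in 'text'.
    public static Match FirstParens(string text)
    {
        return Parens.Match(text);
    }

    static void Main()
    {
        Match m = FirstParens("Tiger Woods (golfer)");
        if (m.Success)
        {
            Console.WriteLine(m.Value);           // (golfer)
            Console.WriteLine(m.Groups[1].Value); // golfer
        }
    }
}
```

If there is no match at all, Success is false and both values are empty strings.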

Jon Shemitz · Jan 15, 2007

Gary said:
But perhaps you should be made aware of the limitations implicit in
regexes - the main one being commonly rendered as "regexes can't count". As
long as you are not having to deal with recursive structures, nested
delimiters and so on, regexes will often work well. But they can't be used
to "find the balancing brace", verify correct nesting or suchlike.

Not true with .NET regexes, btw.

Ethan Strauss · Jan 15, 2007

I haven't been following this whole thread, so I may be somewhat off topic,
but it seems relevant...

Don't forget that you can name groups in a Regex.
For example you could take the expression below and add a name as follows
@"\( (?<StuffIWantToActuallySee>[^\)]+) \)",
RegexOptions.IgnorePatternWhitespace

You can then get at it from the Group name.
Match.Groups["StuffIWantToActuallySee"].Value

I find that much easier to deal with than trying to get the groups by
number.
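
Put together as a complete snippet, it might look like this. The group name "profession" and the class and method names are just picked for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupDemo
{
    // Same pattern as above, with the capture group given a name.
    static readonly Regex Parens = new Regex(
        @"\( (?<profession>[^\)]+) \)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the named capture, or an empty string when nothing matches.
    public static string Profession(string text)
    {
        return Parens.Match(text).Groups["profession"].Value;
    }

    static void Main()
    {
        Console.WriteLine(Profession("Tiger Woods (golfer)")); // golfer
    }
}
```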

Ethan

Jon Shemitz said:
Mark said:

@"\( ([^\)]+) \)", RegexOptions.IgnorePatternWhitespace

is probably a bit simpler and faster - there's no real need to capture
the parens.

That returns an empty string...

No, it doesn't. The Match.Value is the parenthesized expression;
Match.Groups[1].Value is the string in the parens.

Mark Rae · Jan 15, 2007

No, it doesn't. The Match.Value is the parenthesized expression;
Match.Groups[1].Value is the string in the parens.

Apologies - my mistake.

Gary Stephenson · Jan 16, 2007

Not true with .NET regexes, btw.

Really? How so? They must then be fundamentally different to all regex
implementations I have seen or heard about. A quick scan of the
documentation doesn't reveal anything significantly different about .NET
regexes ... hmmm ...

As I understand it, in order to solve matching-brace type problems, a
push-down automaton is required, as opposed to a finite-state automaton. Do
.NET regexes somehow provide that?

Please explain,

respectfully,

gary

Jon Shemitz · Jan 16, 2007

As I understand it, in order to solve matching-brace type problems, a
push-down automaton is required, as opposed to a finite-state automaton. Do
.NET regexes somehow provide that?

Yes.

.NET capture groups capture all matching expressions, not just the
last one. The "balancing group definition" grouping construct
(?<-name>expr) pops the most recent capture if expr matches; the
(?(name)a|b) alternation construct lets you force the match to fail if
a stack is not empty.

See pgs 289-290 in
<http://www.midnightbeach.com/.net/ShemitzBook.Chapter11.pdf> for more
details.
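
As a small illustration of those two constructs (a common idiom, not the example from the book), the sketch below checks that the parentheses in a string are correctly nested. The group name "open" and the class and method names are arbitrary.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedParens
{
    // (?<open>\()   pushes a capture onto the "open" stack for each '('.
    // (?<-open>\))  pops one for each ')' and fails if the stack is empty.
    // (?(open)(?!)) fails the whole match if anything is left on the stack.
    static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<open>\()|(?<-open>\)))*(?(open)(?!))$");

    public static bool IsBalanced(string text)
    {
        return Balanced.IsMatch(text);
    }

    static void Main()
    {
        Console.WriteLine(IsBalanced("a(b(c)d)e")); // True
        Console.WriteLine(IsBalanced("a(b"));       // False
        Console.WriteLine(IsBalanced("a)b("));      // False
    }
}
```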

Gary Stephenson · Jan 16, 2007

Hi Jon

.NET capture groups capture all matching expressions, not just the
last one. The "balancing group definition" grouping construct
(?<-name>expr) pops the most recent capture if expr matches; the
(?(name)a|b) alternation construct lets you force the match to fail if
a stack is not empty.

See pgs 289-290 in
<http://www.midnightbeach.com/.net/ShemitzBook.Chapter11.pdf> for more
details.

Cool as! Thanks for that - very interesting indeed. Apologies to all for
misrepresenting (and underestimating) .NET regexes.

gary
