function to match parenthesis '(' and ')'

S

Steve Kirsch

I need a simple function that can match the number of beginning and ending
parenthesis in an expression. Here's a sample expression:

( ( "john" ) and ( "jane" ) and ( "joe" ) )

Does .NET have something built-in that can accomplish this, or do I have to
write my own parser? I don't want to reinvent the wheel if possible.
 
P

Paul E Collins

Steve Kirsch said:
I need a simple function that can match the number
of beginning and ending parenthesis in an expression.
Here's a sample expression:
( ( "john" ) and ( "jane" ) and ( "joe" ) )

There's nothing in the Framework to count the number of occurrences of
a character in a string, but you can find any one occurrence with
String.IndexOf, and do this in a loop to count them all. (Each time
you find one, increment the start position and loop back.)

P.
 
A

ALI RAZA

Salam

The simple solution to this problem is to implement a stack, when u
encounter '(' push it on the stack, and when u encounter ')', pop it from
the stack.

In the End, check if the stach is empty, if it is empty then the parenthesis
are match else they are not.

--
ALI RAZA SHAIKH
MCAD.net

www.programmersparadise.cjb.net
alirazashaikh.blogspot.com
 
A

ALI RAZA

Salam

The simple solution to this problem is to implement a stack, when u
encounter '(' push it on the stack, and when u encounter ')', pop it from
the stack.

In the End, check if the stach is empty, if it is empty then the parenthesis
are match else they are not.


--
ALI RAZA SHAIKH
MCAD.net

www.programmersparadise.cjb.net
alirazashaikh.blogspot.com
 
M

Marcus Andrén

I need a simple function that can match the number of beginning and ending
parenthesis in an expression. Here's a sample expression:

( ( "john" ) and ( "jane" ) and ( "joe" ) )

Does .NET have something built-in that can accomplish this, or do I have to
write my own parser? I don't want to reinvent the wheel if possible.

private bool IsValidString(string expression)
{
int balance = 0;
foreach(char c in expression)
switch(c)
{
case '(':
balance++;
break;
case ')':
if(--balance < 0)
return false;
break;
}
return balance == 0;
}
 
K

Kevin Spencer

How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[\(])|(?<closing>[\)])

Using the Regex Class, you can count the number of each group separately.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.
..
 
B

Bill Butler

ALI RAZA said:
Salam

The simple solution to this problem is to implement a stack, when u encounter '(' push it on the
stack, and when u encounter ')', pop it from the stack.

In the End, check if the stach is empty, if it is empty then the parenthesis are match else they
are not.

--

All of this assumes that ALL parens should be pushed on the stack.
For instance:
Console.WriteLine("(");
would not work.

In a situation like this you might need to make sure the paren is not part of a string.

All of this depends on the use to which the function is being applied.

Bill
 
J

Jon Skeet [C# MVP]

Kevin Spencer said:
How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[\(])|(?<closing>[\)])

Using the Regex Class, you can count the number of each group separately.

That will count them, but not make sure that they match - it would say
that:

)(

is valid, for instance.
 
K

Kevin Spencer

True, Jon, but that was not the question asked. A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.

Jon Skeet said:
Kevin Spencer said:
How about a Regular Expression?

The following Regular Expression will capture all opening and closing
parentheses. It puts opening parentheses into Group 1 ("opening"), and
closing parenthese into Group 2 ("closing"):

(?<opening>[\(])|(?<closing>[\)])

Using the Regex Class, you can count the number of each group separately.

That will count them, but not make sure that they match - it would say
that:

)(

is valid, for instance.
 
J

Jon Skeet [C# MVP]

Kevin Spencer said:
True, Jon, but that was not the question asked.

Well, it asked about "matching" parentheses. I think it's reasonable to
assume that the OP doesn't actually want ")(" to match - especially
given the sample.
A regular expression could
also be developed that would ensure they match. If it were requested, I
would be glad to take the time to do so.

Hmm... I seem to remember something in CS classes saying that this is
something that regular expressions actually can't do. You can develop
them that will match down to a certain level of nesting (i.e. for any
given number of levels of nesting, you can build a regular expression
which will match it), but nothing to match *any* level of nesting. I
could easily be wrong though...
 
K

Kevin Spencer

He did say "match," although what he meant by it was somewhat ambiguous. If
I have the time, I'll give it a shot.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.
 
S

Steve Kirsch

Hi folks,
I am the person who started this thread, and appreciate all the assistance
and code samples. To clarify, I was looking for a simple way to validate an
expression by counting the number of opening and closing parenthesis. If the
expression is determined to be invalid based on that criteria, I show a
message box to the user saying please check your input.

Marcus Andren's code was simple and easy enough to do the job, so I'm using
that for now. I'm not knowledgable about regular expressions at the moment,
and don't have the time to learn them right now (this project is due
tomorrow!). So, I'll stick with the simple validation accomplished by
Marcus's code (above), and hopefully I'll learn some regular expressions in
the future to come enhance this in a future version. If one of you would
like to suggest an alternative to Marcus' code using regular expressions,
please post the code. Thanks again, and best regards to all.

=Steve

Kevin Spencer said:
He did say "match," although what he meant by it was somewhat ambiguous.
If I have the time, I'll give it a shot.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
Ambiguity has a certain quality to it.
 
J

Jon Skeet [C# MVP]

Steve Kirsch said:
Marcus Andren's code was simple and easy enough to do the job, so I'm using
that for now. I'm not knowledgable about regular expressions at the moment,
and don't have the time to learn them right now (this project is due
tomorrow!). So, I'll stick with the simple validation accomplished by
Marcus's code (above), and hopefully I'll learn some regular expressions in
the future to come enhance this in a future version. If one of you would
like to suggest an alternative to Marcus' code using regular expressions,
please post the code. Thanks again, and best regards to all.

It's worth learning regular expressions just because they're very
useful in certain situations. However, any regular expression solution
to this problem is likely to be significantly more complicated than the
code Marcus provided, and probably slower too.

It's very easy to see by inspection that Marcus's code works.
(Admittedly I'd decrement balance in one statement and check it in
another, but that's a minor nit-pick.) Checking that a matching regular
expression - especially one that copes with arbitrary levels of nesting
- is correct would be very, very hard.

It's always worth thinking about the right tool for the job - in this
case, regular expressions aren't the right tool, IMO.
 
L

Ludovic SOEUR

Steve, you can use the following regex :
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");
if (!regex.IsMatch(theStringIWantToCheck)) MessageBox.Show("Parenthesis
do not match !");

Another example to know all the subexpressions (not the inner subexpression
but you can find them with recursivity) :
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");
Match match=regex.Match(theStringIWantToCheck);
if (match.Success) {
foreach(Capture c in match.Groups[1].Captures)
MessageBox.Show("SubExpression="+c.Value);
//You can have the number of subexpressions with
match.Groups[1].Captures.Count
} else {
MessageBox.Show("Parenthesis do not match !");
}

If you want to understand how it works, there is a good article from Ryan
Byington that helped me to make this regexe :
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx

You can also use RegexOptions.Compiled in Regex, and also use Static Regex
class if you want a faster Regex.

Marcus Code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.
You can do what you want without modifying your source code. Everything is
stored in Capures field in one shot, so you can navigate very easily. If you
wanted to do the same with Marcus Code, I guess it would not be 4 times
quicker but maybe not more than 2 times quicker.)

Hope it helps,

Ludovic Soeur.
 
M

Michael S

Jon Skeet said:
Checking that a matching regular
expression - especially one that copes with arbitrary levels of nesting
- is correct would be very, very hard.

Ludovic SOEUR said:
Steve, you can use the following regex :
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$


Hehe. Guess Ludovic just proved Jon's point. =)

I think if you are evil enough to write such a regexp, you must never ever
supply a comment telling what the regexp does. Heck, the maintainance guys
need to think for themselves! =)

Happy Coding
- Michael S
 
J

Jon Skeet [C# MVP]

Ludovic said:
Steve, you can use the following regex :
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");

That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

Marcus Code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.

Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon
 
L

Ludovic SOEUR

I did not think that my post would start a debate. So as it has started, I
have to reply.

I put this sample to prove that it's possible with regex and I would like to
do a time comparison too. I still stay on my point of view when I say I
prefer RegEx version if you do not need to do this check 100 000 times.

First of all, I would like to answer to Michael's reply. No, I'm not evil
enough to write such a RegEx. I used the Regex from the Ryan Byington's one
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
that is very well explained. The only improvment was to remove all 'names'
and to capture ONLY valid strings (the ones Steve is looking for). I ALWAYS
explain my regular expressions and this one is a very usefull one that is
known by most of who works with regex.
Maintenance for this case is useless. It's like providing a function for the
System namespace. You dont have to do maintenance for "substring" method of
string class. When a regular expression is known to be correct and used by
many developpers, you don't have to change anything. Only put a link to the
original explanation and add comment in your code if you want to be more
precise. On the other hand, Marcus code must be commented, and i would say,
more than the regex one...it should be surprising for most people who are
not used to balancing in regex.

Second, to answer to Jon, it took me 2 minutes to understand the original
regex from Ryan Byington. Firstly because it's well commented and also
because I am used to writting balanced regex. To verify it was correct was
very quick. Encapsulate Marcus' code in a "ParenthesisMatcher" is logical
and I agree with you. Doing that and then store all subparts to have a
direct access takes time. I said it was 4 times quicker (in fact it's 3.5)
and I guess with en encapulated version, Marcus's one can reach 2 times. So
why taking time to write an entire class doing the check, giving a direct
access to the subelements, comment all this methods of course, write many
lines of code ? It is alredy done with this regex which is known. It does
all the works. Why allways rewriting what is done and working ?

I totally disagree when you say that regular expression is not the job for
this work. With one line that you do not mind if it's understable, you can
do everything you need with parenthesis balancing. It's the right tool if
you not need to do a loop over thousands of checks, you don't have to write
your own classes or method, everything is already done. It's like using a
dll, libray, in four words, reusing what was done.

Ludovic SOEUR.



Jon Skeet said:
Ludovic said:
Steve, you can use the following regex :
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(? said:
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(? said:
))\)[^()]*)*$");

That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

Marcus Code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object"
like.

Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon
 
K

Kevin Spencer

Hi Jon,

That's not too long for a Regular Expression. They are quite cryptic,
though, if you're not used to the syntax. Unlike C, there is no ability to
separate sections of instruction, even across lines. Like C, though, the
more you practice, the better you get at it. It also helps to have a good
IDE (or 2) for developing and testing Regular Expressions in.

Regular Expressions are quite powerful in terms of how they do their
parsing. It is essentially procedural, and character-based.

Still, in this case, I can see how Marcus' code would be a more elegant
solution. Just wanted to pique your interest in the power of Regular
Expressions, which are invaluable in certain situations, such as parsing
complex patterns in large strings and/or documents, replacing patterns
rather than identical blocks of text, etc.

--
HTH,

Kevin Spencer
Microsoft MVP
..Net Developer
Ambiguity has a certain quality to it.

Jon Skeet said:
Ludovic said:
Steve, you can use the following regex :
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)))\)[^()
]*)*$

Here a simple example to know if parenthesis match :
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(?<-2>\))[^()]*)+)*(?(2)(?!)
))\)[^()]*)*$");

That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?

Marcus Code is obviously quicker than this one with Regex (on my
computer,
it's 4 times quicker), but I would prefer Regex. It's more "object" like.

Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.

With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.

Jon
 
J

Jon Skeet [C# MVP]

Kevin said:
That's not too long for a Regular Expression.

Maybe not for a regular expression, but compared with the simple C#
code, it's much too long and cryptic.
They are quite cryptic, though, if you're not used to the syntax.

I would argue that they're *relatively* cryptic even if you are used to
the syntax. I'm not exactly a total newbie to regular expressions, but
it would have taken me a fair amount of time to work out what that one
was doing and verify it.
Unlike C, there is no ability to
separate sections of instruction, even across lines. Like C, though, the
more you practice, the better you get at it. It also helps to have a good
IDE (or 2) for developing and testing Regular Expressions in.

Does that mean you agree that it's pretty difficult to understand them
by just inspection?
Regular Expressions are quite powerful in terms of how they do their
parsing. It is essentially procedural, and character-based.

Oh, they're incredibly powerful and very useful - *in the right
situations*.
Still, in this case, I can see how Marcus' code would be a more elegant
solution. Just wanted to pique your interest in the power of Regular
Expressions, which are invaluable in certain situations, such as parsing
complex patterns in large strings and/or documents, replacing patterns
rather than identical blocks of text, etc.

I certainly appreciate that regexes are useful. I just don't think this
is a place where they're suitable. I would guess that at least half of
the requests for regular expressions I see on the newsgroups are
inappropriate places to apply regular expressions. A lot of people seem
to approach any textual problem with the immediate idea of "use a
regular expression!" without considering whether or not it's the
best/simplest way of solving it.

Jon
 
J

Jon Skeet [C# MVP]

Ludovic said:
Maintenance for this case is useless. It's like providing a function for the
System namespace. You dont have to do maintenance for "substring" method of
string class. When a regular expression is known to be correct and used by
many developpers, you don't have to change anything.

So you're assuming that it will never, ever have to change it's
function. It won't have to, say, change to balance < and > as well as
or instead of ( and ). I don't like making that kind of assumption.

I don't like having code which is hard to read - in particular, harder
to read that an alternative, which is the case here. In many cases,
regular expressions prevent you from having to write very complicated
parsers. In those cases, the REs are the simpler code. Here, they're
just not.
Second, to answer to Jon, it took me 2 minutes to understand the original
regex from Ryan Byington. Firstly because it's well commented and also
because I am used to writting balanced regex. To verify it was correct was
very quick.

Do you think every maintenance engineer would only take two minutes?
How long did it take you to understand the code Marcus produced? It
certainly took me less than two minutes...
Encapsulate Marcus' code in a "ParenthesisMatcher" is logical
and I agree with you. Doing that and then store all subparts to have a
direct access takes time. I said it was 4 times quicker (in fact it's 3.5)
and I guess with en encapulated version, Marcus's one can reach 2 times. So
why taking time to write an entire class doing the check, giving a direct
access to the subelements, comment all this methods of course, write many
lines of code ? It is alredy done with this regex which is known. It does
all the works. Why allways rewriting what is done and working ?

Either you encapsulate this in another method (or specific type) or you
don't. If you don't, you have to have the bare regular expression
itself available, which is pretty unpleasant IMO. If you do, it doesn't
matter (from a usage point of view) whether you use a regular
expression or the code from Marcus.
I totally disagree when you say that regular expression is not the job for
this work. With one line that you do not mind if it's understable, you can
do everything you need with parenthesis balancing.

You may not mind whether or not all your code is as readable as it can
be. I certainly do.

You might be interested to know that I ran the alternatives of "use a
regular expression" or "use some simple logic" past a couple of
colleagues - they were pretty much dumbfounded that anyone would use a
regular expression in this situation, where the logic is so
straightforward.

Jon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top