I did not expect my post to start a debate, but since it has, I have to
reply.
I posted this sample to prove that it is possible with a regex, and I would
like to do a timing comparison too. I still stand by my point of view: I
prefer the RegEx version as long as you do not need to run this check
100,000 times.
First of all, I would like to answer Michael's reply. No, I'm not evil
enough to write such a RegEx myself. I used the one from Ryan Byington's post,
http://blogs.msdn.com/bclteam/archive/2005/03/15/396452.aspx
which is very well explained. The only improvement I made was to remove all
the 'names' and to capture ONLY valid strings (the ones Steve is looking
for). I ALWAYS explain my regular expressions, and this one is a very useful
one that is known to most people who work with regex.
No maintenance is needed in this case. It's like a function provided by the
System namespace: you don't have to maintain the "substring" method of the
string class. When a regular expression is known to be correct and is used
by many developers, you don't have to change anything. Just put a link to
the original explanation and add a comment in your code if you want to be
more precise. On the other hand, Marcus's code must be commented too and, I
would say, more than the regex one... which may be surprising for most
people who are not used to balancing in regex.
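To illustrate what "a known regex plus a link and a comment" could look like, here is a minimal sketch. It is NOT the regex quoted in this thread (that one is truncated in the quote below). It assumes the balancing-group pattern from the .NET regular expression documentation, which is written there for angle brackets and adapted here to parentheses. The names BalancedParens and IsBalanced are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class BalancedParens
{
    // Assumption: pattern adapted from the balancing-group example in the
    // .NET regular expression documentation ('<' and '>' swapped for '('
    // and ')'). It is not the expression posted earlier in this thread.
    private static readonly Regex Balanced = new Regex(
        @"^[^()]*" +                     // leading text without parentheses
        @"(" +
        @"((?<Open>\()[^()]*)+" +        // each '(' pushes a capture onto Open
        @"((?<Close-Open>\))[^()]*)+" +  // each ')' pops one capture from Open
        @")*" +
        @"(?(Open)(?!))$");              // fail if any '(' is left unmatched

    // Returns true when every '(' in the input has a matching ')'.
    public static bool IsBalanced(string input) => Balanced.IsMatch(input);
}
```

With this sketch, BalancedParens.IsBalanced("a(b(c)d)e") should return true, while "(()" and "())" should both be rejected.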
Second, to answer Jon: it took me 2 minutes to understand the original
regex from Ryan Byington, firstly because it is well commented and also
because I am used to writing balanced regexes. Verifying that it was correct
was very quick. Encapsulating Marcus's code in a "ParenthesisMatcher" is
logical, and I agree with you. But doing that and then storing all the
subparts for direct access takes time. I said Marcus's code was 4 times
quicker (in fact it's 3.5), and I guess that with an encapsulated version
its advantage would drop to about 2 times. So why take the time to write an
entire class that does the check and gives direct access to the subelements,
comment all its methods of course, and write many lines of code? It is
already done with this regex, which is well known. It does all the work.
Why always rewrite what is already done and working?
I totally disagree when you say that a regular expression is not the right
tool for this job. With one line, whose readability you need not worry
about, you can do everything you need for parenthesis balancing. It's the
right tool as long as you do not need to loop over thousands of checks: you
don't have to write your own classes or methods, because everything is
already done. It's like using a DLL or a library; in four words, reusing
what was done.
Ludovic SOEUR.
Jon Skeet said:
Ludovic said:
Steve, you can use the following regex:
^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(? said:
]*)*$
Here is a simple example to check whether parentheses match:
Regex regex=new
Regex(@"^[^()]*(?:\(([^()]*(?:(?:(\()[^()]*)+(?:(? said:
That's simple, is it? Out of interest, how long do you think it would
take to understand that, if presented with it with no prior knowledge?
How long do you think you'd take to verify that it's correct?
Marcus's code is obviously quicker than this one with Regex (on my computer,
it's 4 times quicker), but I would prefer Regex. It's more "object"
like.
Well, the code which does it explicitly is faster, and *much* easier to
understand. I don't see where the real benefit comes in, just from it
being more "object" like. You could always encapsulate Marcus's code in
a "ParenthesisMatcher" class or something similar.
With the explicit code, it's easy to modify, easy to verify by
inspection, easy to understand, and fast. The regular expression is
none of these. It's just the wrong tool for the job, IMO.
Jon
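For reference, the explicit approach Jon describes could be sketched roughly as follows. Marcus's actual code is not shown in this thread, so this is only an assumed, minimal depth-counter version. The class name ParenthesisMatcher is taken from Jon's suggestion, and the method name IsBalanced is hypothetical.

```csharp
public static class ParenthesisMatcher
{
    // Hypothetical sketch, not Marcus's code: walks the string once and
    // tracks how many '(' are currently open.
    public static bool IsBalanced(string input)
    {
        int depth = 0;
        foreach (char c in input)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')')
            {
                // A ')' with nothing open can never be matched later.
                if (depth == 0)
                {
                    return false;
                }
                depth--;
            }
        }
        // Balanced only if every '(' that was opened has been closed.
        return depth == 0;
    }
}
```

This version only answers yes or no. Giving direct access to the matched subparts, as discussed above, would need extra bookkeeping (for example a stack of opening positions), which is where the additional code and time would go.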