string routines and code libraries

  • Thread starter: zoro
How do you think regular string operations and regular expressions
compare in performance? Can we use regular expressions without worrying
about performance (that is, is the slowdown very small)?
 
It entirely depends what you're doing. In some situations, compiled
regular expressions will be faster than the same kind of operations
done just with String methods - at least without significant work.

In most cases, however, regular expressions are slower, sometimes quite
significantly.
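
One way to check this for a particular case is to time both approaches. The following is a minimal sketch in Python rather than .NET (the thread has no code of its own); the sample text, the needle and the function names are made up for illustration, and the timings will differ between machines, runtimes and regex engines.

```python
import re
import timeit

# Hypothetical sample data: a long haystack with the needle near the end.
TEXT = "lorem ipsum " * 1000 + "needle" + " dolor sit amet"
NEEDLE = "needle"

# Compile once, outside the timed code, so only matching is measured.
NEEDLE_RE = re.compile(re.escape(NEEDLE))


def find_with_string(text):
    """Return the index of NEEDLE using a plain string method (-1 if absent)."""
    return text.find(NEEDLE)


def find_with_regex(text):
    """Return the index of NEEDLE using the precompiled regex (-1 if absent)."""
    match = NEEDLE_RE.search(text)
    return match.start() if match else -1


if __name__ == "__main__":
    for func in (find_with_string, find_with_regex):
        seconds = timeit.timeit(lambda: func(TEXT), number=10_000)
        print(f"{func.__name__}: {seconds:.4f}s for 10,000 calls")
```

Both functions return the same index, so any difference in the printed times comes from the search mechanism alone. A result for one input says little about other inputs or other platforms.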
 
It's a well-known technique, and doesn't play well with threading
because the reference count has to be updated atomically.
 
Jon said:
Do you use regular expressions every time you need to do more than one
operation on a string then?

Mostly, yes, I do. I've been using regular expressions for a long time
and it's easier for me to read and verify one regular expression than
to understand multiple calls to index and substring. Also, that sort
of string manipulation is very easy to get wrong.

I certainly don't. I'd rather see a few simple operations than one
regular expression which could take a while to understand or even to
write properly in the first place.

With practice, you'll find that regular expressions are easy to
understand.
Regular expressions are great when they take the place of *complicated*
string processing, but when you've just got a few operations to
perform, I'll take the simplicity of straight string operations any
day.

As soon as you get to 'a few' operations, it's no longer simple. Such
code is quite prone to off-by-one errors, index out of range
exceptions, invalid argument exceptions, etc. It also tends to be
slower than a single regular expression match.
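
To make the disagreement concrete, here is a hedged side-by-side sketch in Python (the thread itself contains no code, and the `name=value;` input format and the function names are invented for illustration). Both functions extract the same value; which one is easier to read and verify is exactly what is being argued.

```python
import re

# Hypothetical pattern: "name=" followed by a captured value up to the next ";".
NAME_RE = re.compile(r"name=([^;]*);")


def value_with_string_ops(line):
    """Extract the value after 'name=' using index arithmetic; None if absent."""
    start = line.find("name=")
    if start == -1:
        return None
    start += len("name=")          # skip past the key itself
    end = line.find(";", start)    # look for the terminator after the key
    if end == -1:
        return None
    return line[start:end]


def value_with_regex(line):
    """Extract the same value with a single regular expression match."""
    match = NAME_RE.search(line)
    return match.group(1) if match else None
```

The string version spells out each boundary case (missing key, missing terminator) and each index adjustment, while the regex version folds them into the pattern.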
 
kevin cline said:
Mostly, yes, I do. I've been using regular expressions for a long time
and it's easier for me to read and verify one regular expression than
to understand multiple calls to index and substring.

Have all the other engineers who might read your code also been using
regular expressions for that long?
Also, that sort of string manipulation is very easy to get wrong.

Whereas no-one ever gets regular expressions wrong, I suppose? ;)
With practice, you'll find that regular expressions are easy to
understand.

Without practice, simple string calls are easy to understand, IME. Why
should anyone who has to read my code also have to have years of
experience with regular expressions?
As soon as you get to 'a few' operations, it's no longer simple.

If it genuinely is "a few" (as opposed to several including a couple of
loops), it can still be very simple IMO.
Such code is quite prone to off-by-one errors, index out of range
exceptions, invalid argument exceptions, etc.

Likewise regular expressions are prone to forgetting to escape certain
characters, forgetting just which bits need matching, etc. They're also
prone to assumptions in terms of portability - not all regular
expression environments are the same, so you either have to limit
yourself to a basic core, or learn the extensions in each and remember
which platform you're dealing with. Of course, not all string-handling
libraries are the same either - but I've got the compiler and
intellisense to help me there.
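
The escaping pitfall is easy to reproduce. A minimal sketch in Python (an assumption, since the thread gives no code for this point, and the sample strings are made up): an unescaped `.` matches any character, so a pattern meant as literal text can match input it should not.

```python
import re

# Intended as the literal text "1.5", but "." is a regex metacharacter
# that matches any single character (except a newline by default).
unescaped = re.compile("1.5")

# re.escape() backslash-escapes metacharacters so the text is matched literally.
escaped = re.compile(re.escape("1.5"))

print(bool(unescaped.search("version 125")))  # True: "." matched the "2"
print(bool(escaped.search("version 125")))    # False
print(bool(escaped.search("version 1.5")))    # True
print("1.5" in "version 125")                 # False: plain substring test
```

The plain substring test has no metacharacters to forget, which is the trade-off being described.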
It also tends to be slower than a single regular expression match.

That's not my experience in the benchmarks I've done on various
operations over the years (in response to newsgroup questions). It
depends what exactly is being done, but often "hard-coded" string
operations are significantly faster. That makes sense, as they're
(each) less generalised.
 
Jon Skeet said:
Likewise regular expressions are prone to forgetting to escape certain
characters, forgetting just which bits need matching, etc. They're also
prone to assumptions in terms of portability - not all regular
expression environments are the same, so you either have to limit
yourself to a basic core, or learn the extensions in each and remember
which platform you're dealing with. Of course, not all string-handling
libraries are the same either - but I've got the compiler and
intellisense to help me there.


"IntelliSense" is available only on the .NET platform.

In my opinion, someone with a little knowledge of regular expressions and
software engineering can sense where to use regular string operations and
where to use regular expressions... If you ask me, I'll choose expressing
rather than doing the work. Doing the work is always more error-prone.
 
"IntelliSense" is available only on the .NET platform.

Call it what you like, many IDEs have the same sort of auto-completion
and prompting with documentation that VS.NET has. Eclipse's version is
actually rather better than VS.NET 2003's, in fact.

I believe that most developers on most platforms use an IDE which can
help them with basic string handling.

I believe that very few developers use an IDE which can help them
(without having to go to a different view/window/whatever) get regular
expressions right first time.
In my opinion, someone with a little knowledge of regular expressions and
software engineering can sense where to use regular string operations and
where to use regular expressions... If you ask me, I'll choose expressing
rather than doing the work. Doing the work is always more error-prone.

Of course, everyone in this thread probably thinks they can sense where
to use regular string operations and where to use regular expressions -
but come out with completely different answers.

And if you think that using a regular expression means you aren't doing
work, you're kidding yourself. There's a reason I see more questions
about regular expressions on the newsgroups than string operations -
and that reason is that regular expressions are relatively complex to
both read and write.
 
Jon said:
Have all the other engineers who might read your code also been using
regular expressions for that long?

Whereas no-one ever gets regular expressions wrong, I suppose? ;)

It's easier to get regular expressions right because they are usually
closer to the requirement. All I know is that I've seen a lot of buggy
string manipulation functions that could be easily performed with a
single regular expression.
Without practice, simple string calls are easy to understand, IME.

Individually, they are trivial to understand. But it's not so easy to
understand the purpose of five or six of them in a row, and usually not
at all easy to verify that the code is doing what it is supposed to do.

Why should anyone who has to read my code also have to have years of
experience with regular expressions?

I generally assume the other programmers on my team are competent
enough to read the documentation of library functions. It's not rocket
science, just basic computer science. An hour of study will save you
hundreds of hours of programming and debugging in the future.
 
kevin cline said:
It's easier to get regular expressions right because they are usually
closer to the requirement. All I know is that I've seen a lot of buggy
string manipulation functions that could be easily performed with a
single regular expression.

And I've seen people going out of their way to use regular expressions
(often needing to ask for help because they can't get it right on their
own) when the code can be significantly simpler with just a few string
operations.
Individually, they are trivial to understand. But it's not so easy to
understand the purpose of five or six of them in a row, and usually not
at all easy to verify that the code is doing what it is supposed to do.

I see it's gone up from "more than one" to "five or six"...

Verification is necessary with either technique, and should involve
enough test cases to give confidence. I'd be a lot happier
I generally assume the other programmers on my team are competent
enough to read the documentation of library functions.

I think it's far more likely that people will know the *basic* library
functions (including string manipulations) than that they'll know the
details of the regular expression dialect used on every platform they
happen to come across.

Even when you know regular expressions, when they become even slightly
non-trivial they take a while to understand, IMO.
It's not rocket science, just basic computer science. An hour of
study will save you hundreds of hours of programming and debugging in
the future.

I think we'll have to agree to disagree. Regular expressions certainly
have their place, but for me the bar for their use is much higher than
it is for you. I believe it's much easier to make a mistake -
particularly when changing the behaviour of a working regular
expression in a way which appears trivial at first sight, but where you
need to be careful about escaping, grouping etc.
 
This debate is all rather silly. The usefulness of Regular Expressions lies
in the purpose for which they were created. That is, pattern-matching. There
is quite a bit of difference between a string and a pattern. A string (in
the purely non-oop sense of the word) is a literal array of char. Each
character in it is a specific character, having a specific value. A pattern,
on the other hand, is a non-specific set of rules for determining whether or
not a given string (or substring of a string) satisfies the rules laid out
by the pattern.

When parsing a string for a string (as in the problem which sparked this
discussion), obviously you are not looking for a pattern. You are looking
for a string. A Regular Expression carries overhead with it which makes it
less optimal for this sort of use. Why use a sledge hammer to hammer a nail?

On the other hand, when parsing a string for one or more patterns, the
Regular Expression is the optimal tool for this sort of use. Regular
Expressions are designed using the most efficient algorithm for
pattern-matching. While one could certainly write the same algorithm in C#,
why build a sledgehammer when you already have one in your toolbox?
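
The distinction between a string and a pattern can be sketched as follows, in Python for illustration only (the date format and the sample text are assumptions, not taken from the thread). A fixed string needs only a substring test, while "any date of the form YYYY-MM-DD" is a set of rules and so suits a regular expression.

```python
import re

# Hypothetical input line.
log_line = "2004-11-03 backup finished; next run 2004-11-10"

# Looking for a specific string: a plain substring test is enough.
has_backup = "backup" in log_line

# Looking for a pattern: four digits, dash, two digits, dash, two digits.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
dates = DATE_RE.findall(log_line)

print(has_backup)  # True
print(dates)       # ['2004-11-03', '2004-11-10']
```

Writing the date search with index and substring calls would mean checking each character position by hand, which is the case where the regular expression earns its overhead.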

So, how about we shake hands and make up here, and move on to more important
matters? :-D

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
 
