Regex Novice needs help

Zach

I'm writing an app which is going to rely extremely heavily on the
usage of regular expressions. I'm reading the docs but having trouble
wrapping my head around some of this since it's all fairly new to me.
I have two questions, I'm hoping I can get answers to at least one :)
Any help is better than no help:

1) I have many cases where I am checking whether a particular string
matches a particular regular expression. However, if the match happens
"inside" the string I don't consider it a match. I need the entire
string to constitute a match. How can I force this check on the
RegEx engine?

2) Performance is going to be a big factor for this particular app. I
have about 300 pre-determined hardcoded regular expressions, and in
peak scenarios I will be matching incoming strings at a rate of about
10-15 per second. Is there a list of "guidelines" somewhere for
writing performance-aware regular expressions?

Thanks
Zach
 
Zach said:
1) I have many cases where I am checking whether a particular string
matches a particular regular expression. However, if the match happens
"inside" the string I don't consider it a match. I need the entire
string to constitute a match. How can I force this check on the
RegEx engine?

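Use ^ and $ to anchor the pattern to the start and end of the string,
e.g. ^abc$ rather than abc. Note that in .NET, $ also matches just
before a trailing newline, and with RegexOptions.Multiline both anchors
match at line boundaries; \A and \z always mean the very start and the
very end of the string. If the pattern contains alternation, wrap it in
a group first: ^(?:foo|bar)$.
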
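
A minimal sketch of the anchoring idea in C#, assuming the patterns are plain strings that can be wrapped at construction time. The class and method names (FullMatch, Anchor) are made up for illustration and are not part of any library.

```csharp
using System.Text.RegularExpressions;

public static class FullMatch
{
    // Hypothetical helper: wraps a pattern so that it must span the whole input.
    // \A and \z are used rather than ^ and $ because they are unaffected by
    // RegexOptions.Multiline, and \z does not tolerate a trailing newline.
    // The non-capturing group keeps alternation (a|b) inside the anchors.
    public static Regex Anchor(string pattern, RegexOptions options = RegexOptions.None)
    {
        return new Regex(@"\A(?:" + pattern + @")\z", options);
    }
}
```

With this, FullMatch.Anchor(@"\d+").IsMatch("123") is true, while "a123b" and "123\n" are both rejected. A plain ^\d+$ would accept "123\n", because $ is allowed to match before a final newline.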
2) Performance is going to be a big factor for this particular app. I
have about 300 pre-determined hardcoded regular expressions, and in
peak scenarios I will be matching incoming strings at a rate of about
10-15 per second. Is there a list of "guidelines" somewhere for
writing performance-aware regular expressions?

Do you mean you'd be running 300 regular expressions on each of 10-15
strings per second? I wouldn't like to say for *sure* without testing
it (with examples of the actual regular expressions and sample data)
but I wouldn't have thought that would be a problem.

One important thing is to make sure you build the regular expressions
ahead of time and re-use them rather than creating new ones each time.
Also, consider RegexOptions.Compiled, which trades slower construction
for faster matching. I'm sure others will be able to help
further - but the best thing to do to start with is to work out your
regular expressions and create a good sample data set. Then measure,
measure, measure - whenever you change something, run the test data set
through again and record the change to performance. Make sure you keep
that record - don't just do it on a scrap of paper. If possible, keep
the test results in the same source control system as the source, so
you can work out *exactly* which set of test results came from which
version of the code.
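
A rough sketch of the "build once, re-use, then measure" advice, assuming a small named table of patterns. The names (RegexBench, Prebuilt, "digits", "word") and the two patterns are placeholders, not anything from the actual app, and a Stopwatch loop like this is only a starting point for real measurements.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexBench
{
    // Constructed once when the class is first used, then shared.
    // RegexOptions.Compiled costs more up front but is paid only here.
    private static readonly Dictionary<string, Regex> Prebuilt = new Dictionary<string, Regex>
    {
        { "digits", new Regex(@"\A\d+\z", RegexOptions.Compiled) },
        { "word",   new Regex(@"\A\w+\z", RegexOptions.Compiled) }
    };

    // Counts how many inputs match the named, prebuilt expression.
    public static int CountMatches(string name, IEnumerable<string> inputs)
    {
        Regex regex = Prebuilt[name];
        int count = 0;
        foreach (string input in inputs)
        {
            if (regex.IsMatch(input)) count++;
        }
        return count;
    }

    // Times repeated passes over the sample data for one expression.
    public static TimeSpan Time(string name, IList<string> inputs, int repetitions)
    {
        CountMatches(name, inputs); // warm-up pass so one-time costs are excluded
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            CountMatches(name, inputs);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling RegexBench.Time("digits", samples, 10000) after each change and logging the result next to the commit gives the kind of record described above.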
 
What I meant regarding the 300 and the 10-15 numbers is that my entire
set of regular expressions consists of about 300ish. Sometimes I will
have around 10-15 input strings per second to check against these
regular expressions. However, each input string will never be checked
against more than 3-4 regular expressions out of those 300. So a true
worst case is like (10-15)*(3-4) = 30-60 -> 45ish matches per second or
so.
 
Right - that shouldn't be a problem at all. As ever though, it's worth
measuring. Of course, if the regexes are incredibly complicated, it
could take a long time.
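
One way to guard against a regex that takes a long time is a match timeout. The sketch below assumes a .NET version that has the Regex constructor taking a TimeSpan (.NET Framework 4.5 or later); GuardedMatch and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuardedMatch
{
    // Builds the regex once, with an upper bound on how long one match may run.
    public static Regex Create(string pattern, TimeSpan timeout)
    {
        return new Regex(pattern, RegexOptions.None, timeout);
    }

    // Returns true or false for a completed match, or null if the timeout was hit.
    public static bool? TryIsMatch(Regex regex, string input)
    {
        try
        {
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            return null; // the caller decides how to treat an abandoned match
        }
    }
}
```

Nested quantifiers such as (a+)+$ run against a long string of a's followed by some other character are the usual illustration of the backtracking blow-up that a timeout is meant to contain.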
 
