regex - long operations

AlexS

Hi

Is there a way to interrupt a long-running regex parsing operation, or to
run it in a way that lets me specify a maximum parsing time?

I use regexes to check the basic format of some messages. In some cases, when
the format is wrong, the regex takes 100% CPU for a very long time - in excess
of minutes. I would like to be able to control this process: for example, when
a regex can't be evaluated within 0.5 seconds, it should be aborted. Then
I can use a more specialized parsing expression.

I can't seem to find an elegant way to solve this issue.

Any pointers available?

Thanks!
 
Jerry III

This is generally the result of nested patterns; combined with greedy
matching, the regexp can take a loooong time to run. I don't think you can
abort a running regexp match...
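
To make the nesting problem concrete, here is a minimal sketch in Java. That choice is an assumption: the thread contains no code, and Java is used only because it comes up later in the discussion. The pattern `^(a+)+$`, the class `BacktrackDemo` and the method `readsFor` are hypothetical illustrations, not anything from the original poster's application. The input is wrapped in a `CharSequence` that counts how often the engine reads a character, so the extra work done on a near-miss input becomes visible. How quickly the count grows depends on the engine and its version, so no expected numbers are given.

```java
import java.util.regex.Pattern;

class BacktrackDemo {
    /** CharSequence wrapper that counts every character read made by the regex engine. */
    static final class CountingSequence implements CharSequence {
        private final CharSequence inner;
        private final long[] counter; // shared, so sub-sequences add to the same total

        CountingSequence(CharSequence inner, long[] counter) {
            this.inner = inner;
            this.counter = counter;
        }

        @Override public char charAt(int index) {
            counter[0]++;
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new CountingSequence(inner.subSequence(start, end), counter);
        }
        @Override public String toString() { return inner.toString(); }
    }

    /** Returns how many character reads the engine needed to search the input. */
    static long readsFor(String regex, String input) {
        long[] counter = {0};
        Pattern.compile(regex).matcher(new CountingSequence(input, counter)).find();
        return counter[0];
    }

    public static void main(String[] args) {
        // Hypothetical example of a quantifier nested inside another quantifier.
        String nested = "^(a+)+$";
        for (int n = 5; n <= 20; n += 5) {
            // Input that almost matches and only fails on its last character.
            String nearMiss = "a".repeat(n) + "b"; // String.repeat needs Java 11+
            System.out.println(n + " a's -> " + readsFor(nested, nearMiss) + " reads");
        }
    }
}
```

Comparing the read counts for well-formed and malformed inputs of the same length is one way to spot expressions that are at risk before they reach production data.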

Jerry
 
thumper

You cannot abort a running regexp, but you can abort a
thread. Kick your regexp off in a thread, and if it
doesn't return within your max time, abort that thread
and move on.
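
A rough sketch of this idea, written in Java rather than .NET (an assumption, since the thread gives no code). Instead of forcibly aborting the thread, it cancels cooperatively: the input is wrapped in a `CharSequence` that checks the worker's interrupt flag on every character read and throws to unwind the match. The names `TimedRegex`, `matchWithin`, `InterruptibleSequence` and `MatchCancelledException` are hypothetical, and the approach relies on the engine reading its input through `charAt`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

class TimedRegex {
    enum Result { MATCH, NO_MATCH, TIMED_OUT }

    /** Unchecked exception used to unwind the regex engine once the worker is interrupted. */
    static final class MatchCancelledException extends RuntimeException { }

    /** View of the input that checks the interrupt flag on every character read. */
    static final class InterruptibleSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleSequence(CharSequence inner) { this.inner = inner; }

        @Override public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new MatchCancelledException();
            }
            return inner.charAt(index);
        }
        @Override public int length() { return inner.length(); }
        @Override public CharSequence subSequence(int start, int end) {
            return new InterruptibleSequence(inner.subSequence(start, end));
        }
        @Override public String toString() { return inner.toString(); }
    }

    /** Runs a full match on a worker thread and waits at most timeoutMillis for it. */
    static Result matchWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleSequence(input)).matches();
        Future<Boolean> pending = worker.submit(task);
        try {
            return pending.get(timeoutMillis, TimeUnit.MILLISECONDS)
                    ? Result.MATCH : Result.NO_MATCH;
        } catch (TimeoutException e) {
            return Result.TIMED_OUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return Result.TIMED_OUT;
        } catch (ExecutionException e) {
            throw new IllegalStateException("regex evaluation failed", e.getCause());
        } finally {
            pending.cancel(true);  // interrupts the worker if it is still matching
            worker.shutdownNow();  // lets the worker thread exit afterwards
        }
    }

    public static void main(String[] args) {
        Pattern simple = Pattern.compile("[0-9]+-[0-9]+");
        System.out.println(matchWithin(simple, "123-456", 500));
    }
}
```

On `TIMED_OUT` the caller can fall back to a more specialized check. Creating an executor per call keeps the sketch short; a shared pool would likely be preferable for many messages.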
 
Jerry III

In the .NET Framework you cannot kill a thread predictably. There's absolutely
no guarantee that Abort (I hope you're talking about Thread.Abort) will
actually stop the thread. Read the manual.

Jerry
 
AlexS

It doesn't make sense to post a specific regex. Currently I have around 20
different ones. Strings vary from 128 bytes to 10 MB. Users will add some
more. They might also add something stupid, which will run for a long time.

So, it looks like I have a wish. Anyway, my analysis shows that this can
happen if the source string is improperly formatted (meaning not what the
regex is expecting) and the expression contains recursion or collects all
repetitions of a specific pattern. I mean 100% CPU for a long time.

So, either users won't be allowed to write their own regexes (most of them
don't know what "greedy" means except in the context of personal behavior),
or MS has to do something in this respect.

How does Java's regex engine behave in such situations - anybody?

Thanks for confirming my worst expectations...

Alex
 
Jerry III

Java's regexps will behave pretty much the same; this is a problem with
the regexp algorithms, not with a particular implementation. And personally,
I don't think it's a good idea to let users enter their own expressions,
especially when you acknowledge that most of them will have no idea what they
are doing. It would create some nice DoS issues in your app.

Jerry
 
