Bug in Regex?

N

Nils Gösche

Hi!

I think I stumbled upon a bug in the Regex class (well, a customer did,
actually):

using System;
using System.Text.RegularExpressions;

namespace ConsoleTest
{
class Program
{
static void Main(string[] args)
{
Regex regex = new Regex(@".*(\W.*)*Belastungs.*(\W.*)*");
string testInput = "Datum motivation lAItardsr "
+ "JKoopsrations- fErfesoung !bera(t3cha 1";
Console.WriteLine("Result is: '{0}'", regex.Match(testInput));
}
}
}

If I compile this code as a C# program in Visual Studio 2005 and run it,
the program hangs forever in regex.Match. I haven't tried VS 2008 yet.

Is this a known bug?

Regards,
 
H

Hans Kesting

Nils Gösche presented the following explanation :
Hi!

I think I stumbled upon a bug in the Regex class (well, a customer did,
actually):

using System;
using System.Text.RegularExpressions;

namespace ConsoleTest
{
class Program
{
static void Main(string[] args)
{
Regex regex = new Regex(@".*(\W.*)*Belastungs.*(\W.*)*");
string testInput = "Datum motivation lAItardsr "
+ "JKoopsrations- fErfesoung !bera(t3cha 1";
Console.WriteLine("Result is: '{0}'", regex.Match(testInput));
}
}
}

If I compile this code as a C# program in Visual Studio 2005 and run it,
the program hangs forever in regex.Match. I haven't tried VS 2008 yet.

Is this a known bug?

Regards,

At least in SnippetCompiler 2008 (which uses the C#3 compiler against
framework 3.5) this also hangs.

The regex is a bit weird (but that's no reason to hang), because you
effectively just search for a substring "Belastungs" surrounded by
"anything".

Hans Kesting
 
N

Nils Gösche

Hans Kesting said:
At least in SnippetCompiler 2008 (which uses the C#3 compiler against
framework 3.5) this also hangs.

Good to know, thanks.
The regex is a bit weird (but that's no reason to hang), because you
effectively just search for a substring "Belastungs" surrounded by
"anything".

It was the customer who wrote that regex. I am innocent! :)

Regards,
 
N

Nils Gösche

maciek kanski said:
Hangs on 3.5SP1 too. But regexp works well on JS (IE7, FF3), with egrep
on BSD also so probably it's a bug.

A (compiled) regex is a finite state machine, with single characters as
input. The input is consumed character by character, with a possible
state change at each step, until the input string is over. This should
never take as much as a millisecond, independent of the particular
regex.

Regards,
 
T

Tim Roberts

A (compiled) regex is a finite state machine, with single characters as
input. The input is consumed character by character, with a possible
state change at each step, until the input string is over. This should
never take as much as a millisecond, independent of the particular
regex.

That's not true. Many regexes require backtracking. It's not all that
hard to make a complicated regex that takes a very long time to run.
 
N

Nils Gösche

Tim Roberts said:
That's not true. Many regexes require backtracking. It's not all that
hard to make a complicated regex that takes a very long time to run.

That depends on your regex implementation. The typical way of
implementing regular expression matchers takes a regular expression,
which represents a non-deterministic state machine, and »compile« it by
transforming the non-deterministic state machine into a deterministic
one. In that case, matching is always fast, as the input is simply
consumed character by character until EOI, as I described.

A backtracking implementation is necessary only for advanced regex
features like backreferences (like \1). But even those should not take
a long time to run for an input string of only 81 characters.

Regards,
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top