Regex issue

  • Thread starter: John B

John B

I am trying to do a pretty simple pattern match using regex.

The pattern is ^(?:(?<Item>.*?)@:@)*$.

This should return a match for test123@:@ but does not. Instead, the call
to Regex.Match never returns, and I have to kill the thread.

The code is below. The Regulator (some will know it) returns the correct
results for the same pattern, and I cannot work out why my code does not.

RegexOptions options = RegexOptions.IgnoreCase |
RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline;
Regex matcher = new Regex(@"^(?:(?<Item>.*?)@:@)*$", options);
Match match = matcher.Match(@"test123@:@"); //Never returns from here.

I have tried this on two machines; it just doesn't work.

TIA

JB
 
Hello John,

^(?:(?<Item>.*?)@:@)*$

I'd guess that the problem lies in the last *, but I'm unsure why it would
hang the parser. I've seen this kind of problem before in patterns where a
lot of backtracking is possible.
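
If runaway backtracking is the suspicion, one way to confirm it without
killing the thread is to give the regex a time budget. This is only a sketch:
it assumes .NET Framework 4.5 or later, where the Regex constructor accepts a
match timeout, and the names SafeMatch and TryMatch are made up for
illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeMatch
{
    // Hypothetical helper: returns the match, or null if the engine
    // exceeded the time budget.
    public static Match TryMatch(string pattern, string input, TimeSpan budget)
    {
        // The three-argument constructor sets a match timeout for this regex.
        var regex = new Regex(pattern, RegexOptions.Multiline, budget);
        try
        {
            return regex.Match(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Matching ran past the budget, which usually points at
            // excessive backtracking rather than a true hang.
            return null;
        }
    }
}
```

A null result from SafeMatch.TryMatch(pattern, input, TimeSpan.FromSeconds(2))
would suggest that the time is being spent inside the regex engine itself.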

A better pattern would be: ^(?<item>.*)@:@$

The extra ? after the first .* isn't really needed, unless there is more
than one @:@ sequence in your text at the end of a line (which I doubt). But
as you've not sent us the text you're trying to extract your pattern from,
I have to guess.

Another pattern that would probably work would be:

^(?<item>[^@]*)@:@$ but this assumes that no @ appears anywhere before the
@:@.

I hope this helps. If not, please add some sample input.

Another possible problem might be that your input is simply too large. The
engine isn't very good at reading multiple megabytes of text at once. As
you're looking at the lines one by one, you could use that to your advantage
and load a couple of lines, try to match and load the next couple of lines.
A StringReader would be ideal for that purpose.
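
The line-by-line idea could look roughly like the following. It is a sketch
under assumptions: it reads one line at a time instead of a couple, it uses
the simpler pattern suggested above, and the names LineByLine and
ItemsPerLine are invented for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class LineByLine
{
    // The simpler pattern from above, applied to one line at a time.
    private static readonly Regex LinePattern = new Regex(@"^(?<item>.*)@:@$");

    // Hypothetical helper: returns the captured item of every line that
    // ends in @:@ and skips lines that do not match.
    public static List<string> ItemsPerLine(string text)
    {
        var items = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine returns null once the input is exhausted.
            while ((line = reader.ReadLine()) != null)
            {
                Match m = LinePattern.Match(line);
                if (m.Success)
                {
                    items.Add(m.Groups["item"].Value);
                }
            }
        }
        return items;
    }
}
```

Because each call to Match only ever sees a single line, the amount of text
the engine can backtrack over stays small.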
 
Jesse said:
> Hello John,
> ^(?:(?<Item>.*?)@:@)*$
>
> I'd guess that the problem lies in the last *, but I'm unsure why it
> would hang the parser. I've seen these kinds of problems before where a
> lot of backtracking is possible.
>
> A better pattern would be: ^(?<item>.*)@:@$

Hi Jesse, thanks for the reply.
I need the repetition of the group (with the named group inside it) because
I am trying to match

a@:@b@:@c@:@d@:@e@:@f@:@g@:@
1@:@2@:@3@:@4@:@

etc. (a, b, c are not the real values; the real ones have more characters,
but the effect should be the same).

> The extra ? after the first .* isn't really needed, unless there are
> more than one @:@ signs in your text at the end of a line (which I
> doubt). But as you've not sent us the text you're trying to extract
> your pattern from I'd have to guess that.

Hmm, it is necessary in The Regulator, and by my understanding it should
be, to make the match non-greedy.

> Another pattern that would probably work would be:
>
> ^(?<item>[^@]*)@:@$ but in there I assume that @ does not precede the
> @:@ at all.
>
> I hope this helps. If not, please add some sample input.

See above.

> Another possible problem might be that your input is simply too large.
> The engine isn't very good at reading multiple megabytes of text at
> once. As you're looking at the lines one by one, you could use that to
> your advantage and load a couple of lines, try to match and load the
> next couple of lines. A StringReader would be ideal for that purpose.

Yep, I thought of that too. I have tried it with a single line, half a
line, and a single possible match; all hang the parser.

A funny thing though: originally, when I was experimenting with less than
a line, it would work for a bit and then stop at a certain point. Now it
won't even match the one candidate.
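
For input of this shape, .NET keeps every repetition of a named group in
Group.Captures, so one match per line can yield all the items. The sketch
below makes assumptions not tested in this thread: it swaps the lazy .*? for
[^@]* (so items must not contain @), and the names RepeatedGroup and Items
are illustrative only.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedGroup
{
    // Assumed variant of the original pattern: [^@]* replaces .*? so the
    // inner group cannot run across a delimiter.
    private static readonly Regex LinePattern =
        new Regex(@"^(?:(?<Item>[^@]*)@:@)*$");

    // Hypothetical helper: returns every value captured by the Item group,
    // or an empty list when the line does not match at all.
    public static List<string> Items(string line)
    {
        var items = new List<string>();
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return items;
        }
        // Captures holds one entry per successful repetition of the group.
        foreach (Capture c in m.Groups["Item"].Captures)
        {
            items.Add(c.Value);
        }
        return items;
    }
}
```

With this variant, RepeatedGroup.Items("1@:@2@:@3@:@4@:@") should return the
four values 1, 2, 3 and 4.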

TIA

JB
 
Hello John,

Given your input, try the following:

(?<item>[^@]+)@:@

Or even

(?<item>(?:(?!@:@).)*)@:@

No ^ or $ anywhere, just this pattern. It will match a single occurrence each
time.

You should be able to loop through the results with

Match m = matcher.Match(text);
while (m.Success)
{
    // Do stuff
    m = m.NextMatch();
}

You can use each match's start and end position to verify that the input is
correct, instead of relying on the pattern to match the whole string. This is
probably much faster, and because each application of the pattern only has to
match a short part of the string, it shouldn't make the engine hang.
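
A rough sketch of that position check, assuming the first pattern above and
treating any gap between matches (or leftover text at the end) as invalid
input. The names ItemScanner and Scan are made up for the example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ItemScanner
{
    private static readonly Regex ItemPattern = new Regex(@"(?<item>[^@]+)@:@");

    // Hypothetical helper: returns the items, or null when the matches do
    // not cover the input from start to end without gaps.
    public static List<string> Scan(string text)
    {
        var items = new List<string>();
        int expectedStart = 0;
        Match m = ItemPattern.Match(text);
        while (m.Success)
        {
            // A match that starts later than expected means unmatched text
            // was skipped over.
            if (m.Index != expectedStart)
            {
                return null;
            }
            items.Add(m.Groups["item"].Value);
            expectedStart = m.Index + m.Length;
            m = m.NextMatch();
        }
        // Anything left after the last match is also unmatched text.
        return expectedStart == text.Length ? items : null;
    }
}
```

Here ItemScanner.Scan("a@:@b@:@c@:@") returns a, b and c, while an input
such as "a@:@junk" returns null because the trailing text is never matched.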

Another option, which I'd guess is even easier to use, would be Regex.Split
on "@:@". That should result in an array of the other values. But then again,
be careful with the length of the input.
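
As a sketch of the split approach: because the sample input ends with the
delimiter, Regex.Split returns one extra empty string at the end. The
second method shows string.Split with RemoveEmptyEntries as a plain
alternative. The names DelimiterSplit, SplitWithRegex and SplitPlain are
illustrative, not from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterSplit
{
    // Regex.Split keeps the empty string that follows a trailing delimiter.
    public static string[] SplitWithRegex(string text)
    {
        return Regex.Split(text, "@:@");
    }

    // string.Split with RemoveEmptyEntries drops empty entries. Note that
    // this also drops empty items in the middle of the input.
    public static string[] SplitPlain(string text)
    {
        return text.Split(new[] { "@:@" }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For "a@:@b@:@c@:@", SplitWithRegex gives four entries (a, b, c and an empty
string), whereas SplitPlain gives just a, b and c.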
 
Jesse Houwing wrote:
> Given your input, try the following:
>
> (?<item>[^@]+)@:@
>
> Or even
>
> (?<item>(?:(?!@:@).)*)@:@
>
> No ^ or $ anywhere, just this pattern. It will match a single occurrence
> each time.
> <...>

Thanks Jesse,

I realize that there are plenty of other ways to parse this input, but I
cannot figure out why it is failing when it works fine in The Regulator.

Cheers,
JB
 
> This should return a match for test123@:@ but does not, instead it never
> returns when I call Regex.Match, I have to kill the thread.

It sounds as if you are running your code under the debugger. I had what
sounds like the same problem a few weeks ago, involving regular expressions
that worked fine in a regex test tool (in my case Expresso). I noticed
that if I set the breakpoint on the line after the match, the breakpoint
was reached. Eventually, I discovered that if the "Autos" window wasn't
open, Visual Studio was able to step through the code, including the Match
line.

I suspect that Visual Studio has problems displaying some of the objects
involved in a Match or Regex. This might be because the VS debugger
actually calls an object's ToString(format, formatProvider) to display
values. If the object's ToString() routine throws an exception, it must be
very difficult for the debugger to handle. Any window that shows an
object's value might have the same problem.

Hope this helps.

Mike
 