Regular Expression Hangs


shawnmkramer

Has anyone ever heard of the Regex.IsMatch and Regex.Match methods just
hanging and eventually producing the message "Requested Service not
found"?

I have the following pattern:

^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$

trying to match the following data:

ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)

The following code simply hangs for a long time and then writes the
message "Requested Service not found" to the debug console:

using System.Text.RegularExpressions;

// Expected to return false quickly; instead the call runs for a very long time.
Regex myRegex = new Regex(@"^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$", RegexOptions.Compiled);

myRegex.IsMatch("ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)");
 

Kevin Spencer

Well, your regular expression is a mess for starters. I would suggest an
alternative, but you haven't given us any rules regarding the pattern(s)
you're trying to match. What you said was:
trying to match the following data:

ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)

That is obviously not true. You wouldn't need a regular expression to match
a single fixed string. A regular expression searches for patterns in
strings. Those patterns are defined by rules that are expressed in the
regular expression. And the regular expression you posted, besides being a
mess (I will get to that), couldn't possibly match that string, since it
contains the literal ", City of, " - which is nowhere to be found in your
posted string.

The reason it's a mess is that you have many more Groups than you probably
know. You have 3 named Groups ("OrgCity," "OrgState," and "OrgCountry"), but
you also have FIVE unnamed Groups, and you're using backreferencing, so I'm
not sure where the compiler is throwing up on you.
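The group count can be checked directly rather than by eye. A minimal sketch, assuming a C# 9+ console program with top-level statements: Regex.GetGroupNames() lists every capturing group (including group 0, the whole match), and RegexOptions.ExplicitCapture turns the plain parentheses into non-capturing ones. The variable names are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question, joined onto one line.
const string pattern =
    @"^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$";

// Every capturing group, including group 0 (the whole match).
// .NET numbers the unnamed groups first and the named groups after them.
string[] allGroups = new Regex(pattern).GetGroupNames();
Console.WriteLine($"{allGroups.Length} groups: {string.Join(", ", allGroups)}");

// ExplicitCapture makes plain (...) non-capturing; only group 0 and the named groups remain.
string[] namedOnly = new Regex(pattern, RegexOptions.ExplicitCapture).GetGroupNames();
Console.WriteLine($"{namedOnly.Length} groups: {string.Join(", ", namedOnly)}");
```

For this pattern the first call reports nine groups (0, the five unnamed ones, and the three named ones) and the second reports four. Writing an individual group as (?:...) has the same effect on just that group.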

Because I don't know the rules, I can't really give you a full answer.
However, I can tell you this much:

"^(?<OrgCity>[\w\s]+),"

will capture the following:

ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT,

Everything but the comma will be in the Group "OrgCity"
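A quick way to confirm what the OrgCity fragment captures on the sample string from the thread. This is a sketch (C# 9+ top-level statements); the variable names are my own.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";

// Everything from the start of the string up to the first comma.
Match cityMatch = Regex.Match(input, @"^(?<OrgCity>[\w\s]+),");
string orgCity = cityMatch.Groups["OrgCity"].Value;

Console.WriteLine(cityMatch.Value); // the whole match includes the trailing comma
Console.WriteLine(orgCity);         // the named group does not
```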

"\((?<OrgCountry>[\w\s]{2,})\)?$"

will capture the following:

(US)

Everything but the parentheses will be in the group "OrgCountry"
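The same check for the OrgCountry fragment, again as a C# 9+ top-level sketch with illustrative names. It also shows a side effect of the ? after \): the closing parenthesis is optional, so an unclosed "(US" at the end of a string matches too.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";
const string countryPattern = @"\((?<OrgCountry>[\w\s]{2,})\)?$";

// No ^ anchor, so Regex.Match scans forward until it finds a "(" that leads to a match at the end.
Match countryMatch = Regex.Match(input, countryPattern);
Console.WriteLine(countryMatch.Value);                      // the parentheses are part of the match
Console.WriteLine(countryMatch.Groups["OrgCountry"].Value); // but not part of the group

// Because \) is optional, a missing closing parenthesis is accepted as well.
Console.WriteLine(Regex.IsMatch("SOME AGENCY (US", countryPattern));
```

Dropping the ? (that is, \)$) would make the closing parenthesis mandatory.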

As for your third Group, I simplified the regular expression to the
following, which has the same rules:

(?<OrgState>[A-Z]{2}|[A-Z][a-z]+\.)

Briefly, it captures 1 of 2 possible patterns:
2 Capital letters
-or-
1 Capital letter followed by 1 or more lower-case letters, followed by a
period

That's the best I can do!

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net


shawnmkramer


I think you missed the point. My post was not asking for help on how to
match some pattern. It's about why the regex library has this
unpredictable behavior.

I actually intended for the pattern NOT to match that string. I see
your point about the unnecessary capturing groups, but they're not a
problem for what I'm trying to capture.
 

Kevin Spencer

I wish you had told us that you weren't looking for a solution before I
tried to solve your problem! :p

Oh well. The grouping was probably the cause, as it created a large
number of capturing groups (8), some of them nested inside others. I'm
thinking that the combination of nested groups and unintentional
self-backreferences caused some sort of recursion overflow, but that's
just a guess.

--
HTH,

Kevin Spencer
Microsoft MVP


Jesse Houwing

Replying to the original question (posted 18-5-2007 16:44):
This regex caused my Vista installation to bluescreen for the very first
time since I installed it in November. Congratulations on that :).

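The cause:

(?<OrgCity>([A-Z][\w ]+)+)

This allows for an enormous amount of backtracking. Because [\w ] also
matches capital letters, the inner and outer repetitions can split the
same run of text in an exponential number of ways, and the engine tries
them one after another before it can conclude that ", City of, " is not
there. A slightly improved variant:

(?<OrgCity>([A-Z][\w]+ )+)

actually gives speedy and stable results. (Notice how I removed the
space from the character class in the inner repetition.)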
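Jesse's variant can be tried against the thread's input and a matching city name. The sketch below (C# 9+ top-level statements; sample strings and variable names are my own) also tries a flat alternative that is not from the thread: because [\w ] already includes the capital letters, ([A-Z][\w ]+)+ accepts exactly the same strings as [A-Z][\w ]+, so the nesting can be dropped without changing what OrgCity matches.

```csharp
using System;
using System.Text.RegularExpressions;

// Everything after the OrgCity group, copied unchanged from the question's pattern.
const string tail =
    @", City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$";

// Jesse's variant: each repetition is one word followed by exactly one space.
var jesse = new Regex(@"^(?<OrgCity>([A-Z][\w]+ )+)" + tail);

// Hypothetical alternative: a single repetition, no nesting.
var flat = new Regex(@"^(?<OrgCity>[A-Z][\w ]+)" + tail);

string threadInput = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";
string cityInput = "Salt Lake City, City of, UT"; // hypothetical value that should match

Console.WriteLine($"jesse: {jesse.IsMatch(threadInput)} / {jesse.IsMatch(cityInput)}");
Console.WriteLine($"flat:  {flat.IsMatch(threadInput)} / {flat.IsMatch(cityInput)}");
```

Both rewrites fail fast on the thread's input. The difference shows up on a matching name: Jesse's variant needs a space after every word, including the last one, so "Salt Lake City, City of, UT" only matches when written as "Salt Lake City , City of, UT", while the flat form keeps the original pattern's behavior.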

Of course, a bluescreen should never have happened.
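Later versions of .NET (starting with .NET Framework 4.5, well after this thread) let you put an upper bound on how long a match may run, via a Regex constructor overload that takes a TimeSpan; when the limit is exceeded a RegexMatchTimeoutException is thrown. A sketch with a hypothetical TryIsMatch helper (C# 9+ top-level statements) that applies this to the original pattern:

```csharp
using System;
using System.Text.RegularExpressions;

// The question's original pattern, joined onto one line.
const string pattern =
    @"^(?<OrgCity>([A-Z][\w ]+)+), City of, (?<OrgState>(([A-Z][A-Z])|([A-Z][a-z]+\.)))( \((?<OrgCountry>[\w ]{2,})\))?$";
const string input = "ENVIRONMENTAL RESTORATION AND WASTE MANAGEMENT, DOE (US)";

// Hypothetical helper: returns true/false, or null if the engine gave up.
bool? TryIsMatch(Regex regex, string text)
{
    try
    {
        return regex.IsMatch(text);
    }
    catch (RegexMatchTimeoutException ex)
    {
        Console.WriteLine($"Gave up after {ex.MatchTimeout.TotalMilliseconds} ms");
        return null;
    }
}

// The third constructor argument limits how long a single match operation may run.
var guarded = new Regex(pattern, RegexOptions.None, TimeSpan.FromMilliseconds(500));
bool? result = TryIsMatch(guarded, input);
Console.WriteLine(result?.ToString() ?? "timed out");
```

The 500 ms limit is arbitrary. A timeout only contains the damage; rewriting the pattern so it cannot backtrack exponentially is still the real fix.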

Jesse
 

Jesse Houwing

I submitted a bug to the .NET Framework bug tracker on Connect.

Please vote for it here:
https://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID=277745

Jesse
 
