Need help with Regex

Danny Ni

Hi,

The following code snippet causes the CPU to max out on my local machine and
on production servers. It looks fine in Expresso, though.

using System.Text.RegularExpressions;

Regex rgxVideo = new Regex(@"<embed(\s+[a-z]+\s*=\s*(""[^""]*""|'[^']*'|[^\s]*))*\s+src=\s*(""|')?http://www.g4tv.com/i?sv3?/(?<videokey>\d+)(""|')?(\s+[a-z]+\s*=\s*(""[^""]*""|'[^']*'|[^\s]*))*\s*(/\s*>|>\s*</embed>)",
    RegexOptions.IgnoreCase);
string strBody =
    "<embed name=\"VideoPlayer\" src=\"http://localhost/lv3/26757\" width=\"480\" height=\"418\" scale=\"ShowAll\" loop=\"loop\" menu=\"menu\" wmode=\"Window\" quality=\"1\" type=\"application/x-shockwave-flash\"></embed>" +
    "<embed name=\"VideoPlayer\" src=\"http://localhost/lv3/19251\" width=\"480\" height=\"418\" scale=\"ShowAll\" loop=\"loop\" menu=\"menu\" wmode=\"Window\" quality=\"1\" type=\"application/x-shockwave-flash\"></embed>" +
    "<embed name=\"VideoPlayer\" src=\"http://localhost/lv3/20202\" width=\"480\" height=\"418\" scale=\"ShowAll\" loop=\"loop\" menu=\"menu\" wmode=\"Window\" quality=\"1\" type=\"application/x-shockwave-flash\"></embed>" +
    "<embed name=\"VideoPlayer\" src=\"http://localhost/lv3/16549\" width=\"480\" height=\"418\" scale=\"ShowAll\" loop=\"loop\" menu=\"menu\" wmode=\"Window\" quality=\"1\" type=\"application/x-shockwave-flash\"></embed>";
foreach (Match objMatch in rgxVideo.Matches(strBody)) // loops indefinitely here
{
}

TIA
 
Kottekoe

Danny,

I tried this in Expresso, and it predicts the same behavior you should see in
code: the execution time of your regex grows exponentially with the size of
the input string. I'm guessing that when you tested it in Expresso, you used a
shorter input string or one that quickly found a match, and therefore it
terminated quickly. The example in your code does not have a match (the src
URLs all point to localhost, so "g4tv" will never match). When a match fails,
the regex engine backtracks and tries every possible way of fitting the parts
of your regex to the input before it gives up. The number of ways grows
exponentially with the size of the string, so your application hangs while it
keeps trying new combinations. There are dangerous things in your regex design
that cause this. The main suspect is the repeated attribute group
(\s+[a-z]+\s*=\s*(""[^""]*""|'[^']*'|[^\s]*))*: a quoted value such as "480"
can be matched either by the quoted alternative or by [^\s]*, so each
attribute multiplies the number of paths to try, and because [^\s]* can also
run past the closing > into the next tag, the group can keep going across all
four tags. Be very careful with nested quantifiers, especially when applied to
wildcards, like (.*)*. Things like this can cause the execution time to double
every time a single character is added to the input text. It may work fine for
100 characters, but add 10 more and the execution time goes up by a factor of
about 1000 (2^10 = 1024); add 20 characters (a 20% increase in length) and it
goes up by a factor of about one million (2^20).
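As a sketch of how the advice above could be applied (not code from the original thread), the hypothetical VideoKeyExtractor class below rewrites Danny's pattern so that each attribute can be matched in only one way. It wraps each attribute iteration in an atomic group (?>...), narrows the unquoted-value alternative to [^\s"'>]+ so it can no longer swallow quotes or run past the end of a tag, and adds a match timeout through the Regex constructor overload that takes a TimeSpan (available since .NET Framework 4.5). The class name, method name and the one-second timeout are illustrative assumptions, and the pattern assumes attributes shaped like those in the sample rather than arbitrary HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original thread.
public static class VideoKeyExtractor
{
    // One attribute: whitespace, name, '=', then a double-quoted, single-quoted,
    // or unquoted value. The atomic group (?>...) stops the engine from retrying
    // other ways to match an attribute it has already matched; the surrounding *
    // can still give back whole attributes one at a time, which is how the src
    // attribute gets found. [^\s"'>]+ cannot match quotes or '>', so an unquoted
    // value cannot run past the end of the tag into the next one.
    private const string Attribute =
        @"(?>\s+[a-z]+\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'>]+))";

    private static readonly Regex VideoRegex = new Regex(
        @"<embed" + Attribute + @"*" +
        @"\s+src=\s*[""']?http://www\.g4tv\.com/i?sv3?/(?<videokey>\d+)[""']?" +
        Attribute + @"*\s*(?:/\s*>|>\s*</embed>)",
        RegexOptions.IgnoreCase,
        TimeSpan.FromSeconds(1)); // assumption: 1 s is an arbitrary safety net

    // Returns the numeric video keys from embed tags whose src points at g4tv.com.
    // Throws RegexMatchTimeoutException if a single match attempt exceeds the timeout.
    public static List<string> ExtractVideoKeys(string html)
    {
        var keys = new List<string>();
        foreach (Match m in VideoRegex.Matches(html))
        {
            keys.Add(m.Groups["videokey"].Value);
        }
        return keys;
    }
}
```

Called on Danny's sample input, ExtractVideoKeys(strBody) should simply return an empty list, because none of the src URLs point at g4tv.com. If some pathological input still trips the engine, the timeout turns the hang into a RegexMatchTimeoutException that the caller can catch.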

JWT

P.S. I don't know what the Colorado Kid is talking about. Expresso is
specifically designed to work with .NET regex.