Regular Expression 100% cpu - please help

  • Thread starter Gareth James via .NET 247

Gareth James via .NET 247

I have an expression that, when run, uses 100% CPU for over 1 minute.
I can change the expression so this does not happen, but could someone explain why this happens so that I don't do it again?


expression -->


Departing:</td>.*?</span>(?<departingAirportName>.*?)\(
(?<airportCode>\w+)\).*?<li>
(?<departingCity>[\w\s]+),\s*
(?<departingCountry>[\w\s]+).*?
(?<departingTimeHours>\d+):
(?<departingTimeMins>\d+).*?Arriving:.*?</span>
(?<arrivalAirportName>.*?)\(
(?<arrivelAirportCode>\w+)\).*?<li>
(?<arrivalCity>[\w\s]+),\s*
(?<arrivalCountry>[\w\s]+).*?
(?<arrivalTimeHours>\d+):
(?<arrivalTimeMins>\d+).*?href=".*?\(
(?<linkURL>.*?)\).*?>
(?<carrier>[\w\s]+)\(
(?<flightNumber>.*?)\).*?


text to search -->


l10 bb2"><span class="textBold">Wed 16 March 05</span>, 1stop(s)</td>
</tr>
<tr class="h32 dotsbottom canvas">
<td class="text l10">Duration:</td>
<td class="textBold l10">14h00</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Departing:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
</span>
Newcastle Int'l (NCL),</li><li>Newcastle, United Kingdom
</li>
<li>
<span class="bold">12:05</span> Wed
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Arriving:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Heathrow (LHR),</li><li>London, United Kingdom
</li>
<li>
<span class="bold">13:20</span> Wed
</li>
</ul>
</td>
</tr>
<tr>
<td class="textBold l10 t10 b10 vtop"><img src='/images/en/FE/BE/Tailfin/smBA.gif' alt="tailfin" width="30" height="25" alt="" /></td>
<td class="text l10 t10 b10">
<ul class="list">
<li>
Non-stop
</li>
<li>
<a href="javascript:popupWithNoReturn('/otpbvpl/Jsp/opodo/FlifoInfoServlet?BV_SessionID=@@@@1016786011.1110564120@@@@&BV_EngineID=ccdeaddediigiijcefecenhdhhldfnk.0&locale=en_GB&FLIGHT_NUMBER=1327&AIRLINE_CODE=BA&B_DATE=200503161205', 'opodo', 700, 450)" class="link">British Airways (BA 1327) ></a>
</li>
<li>
Airplane type - 320
</li>
<li>
Economy restricted
</li>
<li>
<script type="text/javascript" language="JavaScript">
// work around for netscape 7.0.1/2 encoded characters in link
var eticketURL ='http://www.opodo.co.uk:80/otpbvpl/G...rod_lvl1=1&p_prod_lvl2=30&sURLType=RightNow';
document.write('<a href="javascript:popupWithNoReturn(eticketURL,\\'faq\\',750,600)" class="link">e-ticket available ></a>');
</script>
</li>
<li>
</li>
</ul>
</td>
</tr>
<tr class="h32">
<td class="textBold l10 beigeBG dotstop bb2" colspan="2">Connection:</td>
</tr>
<tr height="152">
<td class="text l10 t10 b10 beigeBG dotsbottom vtop">
<ul class="list">
<li>

</li>
<li>

</li>
<li>

</li>
<li>

</li>
</ul>
</td>
<td class="text l10 t10 b10 beigeBG dotsbottom vtop" width="100%">
<ul class="list">
<li><span class="bold">13:20</span> Wed - <span class="bold">14:35</span> Wed</li>
<li>
Change plane
</li>
<li class="pt10">
Stop-over duration: 1h15
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Departing:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Heathrow (LHR),</li><li>London, United Kingdom
</li>
<li>
<span class="bold">14:35</span> Wed
</li>
</ul>
</td>
</tr>
<tr height="60">
<td class="text l10 t15 vtop">Arriving:</td>
<td class="text l10 t15 vtop">
<ul class="list">
<li>
<span class="bold">
Terminal 1,
</span>
Narita (NRT),</li><li>Tokyo, Japan
</li>
<li>
<span class="bold">11:05</span> Thu
</li>
</ul>
</td>
</tr>

<tr>
<td class="textBold l10 t10 b10 vtop"><img src='/images/en/FE/BE/Tailfin/smBA.gif' alt="tailfin" width="30" height="25" alt="" /></td>
<td class="text l10 t10 b10">
<ul class="list">
<li>
Non-stop
</li>
<li>
<a href="javascript:popupWithNoReturn('/otpbvpl/Jsp/opodo/FlifoInfoServlet?BV_SessionID=@@@@1016786011.1110564120@@@@&BV_EngineID=ccdeaddediigiijcefecenhdhhldfnk.0&locale=en_GB&FLIGHT_NUMBER=7&AIRLINE_CODE=BA&B_DATE=200503161435', 'opodo', 700, 450)" class="link">British Airways (BA 7) ></a>
</li>
<li>
Airplane type - 744

</li>
<li>
World Traveller Plus
</li>
<li>
<script type="text/javascript" language="JavaScript">
// work around for netscape 7.0.1/2 encoded characters in link
var eticketURL ='http://www.opodo.co.uk:80/otpbvpl/G...rod_lvl1=1&p_prod_lvl2=30&sURLType=RightNow';
document.write('<a href="javascript:popupWithNoReturn(eticketURL,\\'faq\\',750,600)" class="link">e-ticket available ></a>');
</script>
</li>
<li>
</li>
</ul>
</td>
</tr>
 

Niki Estner

Gareth James via .NET 247 said:
I have an expression that when run uses 100% cpu for over 1 minute.
I can change the expression so this does not happen, but could someone
explain why this happens so that I don't do it again


I just entered your expression and your sample string into Expresso, and it
took less than a second to run. I guess it takes forever on *other* input
strings, am I right? E.g. when the expression can't find a match?

I'm not 100% certain, but I think things like "(...[\w\s]+).*?" will
make it take forever: the [\w\s]+ part will first attempt to match all
word/space characters that follow (like "United Kingdom"), then ".*?" will
eat up any number of characters until the next subexpression matches. Now,
if any subexpression after this one doesn't match, the regex engine has to
backtrack: it'll match "United Kingdo" for [\w\s]+, try all the
subexpressions after this one again, and so on. If you have more than
one expression of that kind, every possible combination will be tried, which
will take some time...
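
To make the backtracking cost concrete, here is a rough sketch in Python (not .NET, and not how any real regex engine is implemented). The helper `count_steps` and the token tuples are made up for this illustration: it is a toy anchored backtracking matcher that counts how many positions it tries for a pattern shaped like `[\w\s]+.*?:` repeated three times. The sample text is invented too.

```python
def count_steps(tokens, text):
    """Toy backtracking matcher, anchored at position 0.

    tokens are ("lit", string) or ("run", accepts, min_len, greedy).
    Returns (matched, number of match attempts made).
    """
    steps = 0

    def match(ti, pos):
        nonlocal steps
        steps += 1
        if ti == len(tokens):
            return True
        tok = tokens[ti]
        if tok[0] == "lit":
            lit = tok[1]
            return text.startswith(lit, pos) and match(ti + 1, pos + len(lit))
        _, accepts, min_len, greedy = tok
        # Find the longest run of accepted characters starting at pos.
        end = pos
        while end < len(text) and accepts(text[end]):
            end += 1
        lengths = range(min_len, end - pos + 1)
        # Greedy tries the longest run first, lazy the shortest; either way
        # every other length is tried again whenever a later token fails.
        order = reversed(lengths) if greedy else lengths
        return any(match(ti + 1, pos + n) for n in order)

    return match(0, 0), steps


WORDS = ("run", lambda c: c.isalnum() or c.isspace() or c == "_", 1, True)  # [\w\s]+
ANY = ("run", lambda c: c != "\n", 0, False)                                # .*?
COLON = ("lit", ":")

group = [WORDS, ANY, COLON]
text = "aaaa:aaaa:aaaa:"

# Same three groups; the second pattern ends in a literal that never occurs.
ok, ok_steps = count_steps(group * 3, text)
fail, fail_steps = count_steps(group * 3 + [("lit", "!")], text)

print("match:   ", ok, ok_steps)
print("no match:", fail, fail_steps)
```

The successful match walks straight through the text, while the failing one has to retry every split between the `[\w\s]+` and `.*?` parts of every group before it can give up. That is the pattern I suspect behind the 100% CPU on inputs that don't match.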

If you can, don't use ".*" or ".*?" at all, because they often allow many
more possible combinations than you really want. Use a more specific
character class if there is one. Also, you can (I think) forbid backtracking
into many of those subexpressions: use an atomic group (a nonbacktracking or
"greedy" subexpression) like
(?>[\w\s]+)
or
(?>Departing:</td>.*?</span>(?<departingAirportName>.*?)\()
This way the engine won't go back into these subexpressions once it has found a
match for them.
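
As a sketch of the "more specific character class" idea, here it is in Python's `re` module rather than .NET (so named groups are spelled `(?P<name>...)` instead of `(?<name>...)`). The pattern `AIRPORT` and the shortened sample are made up for illustration and cover only one airport block, not the whole expression. Negated classes such as `[^(<]` and `[^<]` cannot run past the next tag, so a failed match has far fewer alternatives to retry.

```python
import re

# Every wildcard is replaced by a class that stops at '(' , ',' or '<'.
AIRPORT = re.compile(
    r"</span>\s*(?P<name>[^(<]+?)\s*\((?P<code>\w+)\),</li><li>"
    r"(?P<city>[^,<]+),\s*(?P<country>[^<]+?)\s*</li>"
)

sample = (
    '<span class="bold">\n'
    "Terminal 1,\n"
    "</span>\n"
    "Heathrow (LHR),</li><li>London, United Kingdom\n"
    "</li>"
)

m = AIRPORT.search(sample)
print(m.group("name"), m.group("code"), m.group("city"), m.group("country"))

# A damaged input (closing parenthesis missing) is rejected without the
# engine being able to wander across the rest of the document.
broken = sample.replace("(LHR)", "(LHR")
print(AIRPORT.search(broken))
```

The atomic group `(?>...)` itself is not used here because older Python versions do not support it; in .NET it is available as written above.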

Niki
 
