RegEx hangs - Please help


sasifiqbal

Hi All,

I have the following text template that needs to be parsed using a
regular expression:

-- Test Template
This is a Test Template for <#OrderNumber/>

Sender Message is <#SENDERMESSAGE/>

<#ORDERS>
This is Order number <#OrderNumber/>

Order Lines are

<#ORDERLINES>
Product Desc: <#PRODUCTDESCRIPTION/>

Product Information: <#ProductInformation/>

Supplier Name: <#SUPPLIERNAME/>

Supplier Contact: <#SUPPLIERCONTACTNO/>
</#ORDERLINES>
</#ORDERS>

Your voucher Reference number is <#VREFNO/>

-- End of Test Template

I need to extract the string between <#ORDERS> and </#ORDERS>
(whatever it contains). The regular expression I am using is
(<#ORDERS>(\s|.)*?</#ORDERS>), and it works fine. However, if the test
template changes slightly, as in the following, the regular expression
hangs.

--
<#ORDERS>
This is Order number <#OrderNumber/>

Order Lines are

<#ORDERLINES>
Product Desc: <#PRODUCTDESCRIPTION/>

Product Information: <#ProductInformation/>

Supplier Name: <#SUPPLIERNAME/>

Supplier Contact: <#SUPPLIERCONTACTNO/>
</#ORDERLINES>
</#ORDERS
--

Can someone please help and suggest a way to do the same thing? I am
fairly new to regular expressions.

Thanks,
Asif
 
Hi Asif,

This looks for all the world to me like an XML file. Correct me if I'm
mistaken. If you're parsing an XML file, it would be much better to use the
XmlDocument and related classes to parse it. It certainly could be done
using Regular Expressions, but I'm not sure from your description how the
document is actually arranged, and it can be tricky. Using XML classes, it
can be much easier, and you will not need to use a whole set of Regular
Expressions if you want to parse more than just that one tag.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
A watched clock never boils.
 
I have to agree with the previous replier that you should probably be
using XmlDocument instead of regular expressions.

I do have a few comments to make anyway. First of all, the reason the
regular expression doesn't match the second template is that the
final tag is missing its closing '>' character.

The reason your program hangs is that the regular expression is
badly constructed: in (\s|.)*? a space or tab can be matched by either
\s or ., so when no closing tag is found the engine backtracks through
a huge number of equivalent combinations. It should be
@"<#ORDERS>.*?</#ORDERS>", and you should use the
RegexOptions.Singleline option when executing the regular expression,
so that . also matches newlines.
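
To illustrate the suggestion above, here is a minimal sketch. The class name OrdersBlockDemo, the helper TryExtractOrders, and the shortened template are made up for the example; only the pattern and the Singleline option come from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrdersBlockDemo
{
    // Singleline lets '.' match '\n' too, so the (\s|.) alternation is not needed.
    static readonly Regex OrdersBlock =
        new Regex(@"<#ORDERS>.*?</#ORDERS>", RegexOptions.Singleline);

    // Hypothetical helper: returns true and the first <#ORDERS>...</#ORDERS> block if found.
    public static bool TryExtractOrders(string template, out string block)
    {
        Match m = OrdersBlock.Match(template);
        block = m.Value;          // empty string when there is no match
        return m.Success;
    }

    public static void Main()
    {
        // Shortened version of the template from the question.
        string template =
            "Sender Message is <#SENDERMESSAGE/>\n" +
            "<#ORDERS>\n" +
            "This is Order number <#OrderNumber/>\n" +
            "</#ORDERS>\n" +
            "Your voucher Reference number is <#VREFNO/>\n";

        if (TryExtractOrders(template, out string block))
            Console.WriteLine(block);

        // Same text with the final '>' missing: no match, and no hang.
        string broken = template.Replace("</#ORDERS>", "</#ORDERS");
        Console.WriteLine(TryExtractOrders(broken, out _));   // False
    }
}
```

With the malformed template the match simply fails, so the calling code can report the missing tag to the user instead of appearing to freeze.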
 
Hi Kevin,

Well, this is not an XML file. It is an MS Word RTF-formatted
document. It is actually a template that users will define/create on
their own using a set of template codes, and the system will parse the
document and replace all template codes (yes, <#ORDERS> is a template
code) with appropriate values.

I hope my question is a bit clearer now.

Thanks for your reply anyway.
Asif
 
Hi Marcus,

Thanks for the reply. Your regular expression works and solves the
problem.
However, I have an issue now.

The regular expression I was using before also handles template codes
that are nested within themselves, that is:
<#ORDERS>
<Do some thing>
<#ORDERS>
<Do some thing more>
</#ORDERS>
<Do some final stuff>
</#ORDERS>

but the regular expression you suggested does not return the top-level
tag; instead it returns something like this:
<#ORDERS><Do some thing><#ORDERS> <Do some thing more></#ORDERS>

Any advice on this issue?

Thanks anyway for the help.

Regards,
Asif
 
Use XmlDocument if possible, or write a simple balancing parser?
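
As a rough idea of what such a balancing parser could look like, here is a minimal sketch that counts opening and closing tags with String.IndexOf. The names BalancedBlockFinder and TryFindOuterBlock are hypothetical, and the sketch assumes the opening tag never occurs inside the closing tag (true for <#ORDERS> and </#ORDERS>).

```csharp
using System;

public static class BalancedBlockFinder
{
    // Finds the outermost open...close block that starts at the first 'open' tag.
    // Returns false if there is no opening tag or the tags are unbalanced.
    public static bool TryFindOuterBlock(string text, string open, string close, out string block)
    {
        block = "";
        int start = text.IndexOf(open, StringComparison.Ordinal);
        if (start < 0) return false;

        int depth = 0;
        int pos = start;
        while (pos < text.Length)
        {
            int nextOpen = text.IndexOf(open, pos, StringComparison.Ordinal);
            int nextClose = text.IndexOf(close, pos, StringComparison.Ordinal);
            if (nextClose < 0) return false;        // no closing tag left: unbalanced

            if (nextOpen >= 0 && nextOpen < nextClose)
            {
                depth++;                            // entering a (possibly nested) block
                pos = nextOpen + open.Length;
            }
            else
            {
                depth--;                            // leaving a block
                pos = nextClose + close.Length;
                if (depth == 0)
                {
                    block = text.Substring(start, pos - start);
                    return true;
                }
            }
        }
        return false;
    }

    public static void Main()
    {
        string nested =
            "<#ORDERS>\n<Do some thing>\n<#ORDERS>\n<Do some thing more>\n" +
            "</#ORDERS>\n<Do some final stuff>\n</#ORDERS>";

        bool found = TryFindOuterBlock(nested, "<#ORDERS>", "</#ORDERS>", out string block);
        Console.WriteLine(found && block == nested);   // True
    }
}
```

Because it only moves forward through the text, this approach cannot hang on a template with a missing or malformed closing tag; it just returns false.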

Oh well, if you really want a regular expression, this should work
using the .NET-specific balancing group construct
(apply RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline):

string pattern =
@"
<\#ORDERS>
(
    <\#ORDERS> (?<DEPTH>)              # Consume Start, increase depth
  |
    </\#ORDERS> (?<-DEPTH>)            # Consume End, decrease depth
  |
    (?!<\#ORDERS>) (?!</\#ORDERS>) .   # Consume anything else
)*?

(?(DEPTH)(?!))                         # Check that Start==End

</\#ORDERS>
";
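
For completeness, here is a small sketch of how the pattern above might be applied to the nested example. The class name NestedOrdersDemo and the helper TryMatchOuter are invented for illustration; the pattern and the two RegexOptions are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NestedOrdersDemo
{
    // Same balancing-group pattern as above; '#' is escaped because
    // IgnorePatternWhitespace treats an unescaped '#' as a comment start.
    const string Pattern = @"
        <\#ORDERS>
        (
            <\#ORDERS> (?<DEPTH>)              # nested start tag: push onto DEPTH
          |
            </\#ORDERS> (?<-DEPTH>)            # nested end tag: pop from DEPTH
          |
            (?!<\#ORDERS>) (?!</\#ORDERS>) .   # any other character
        )*?
        (?(DEPTH)(?!))                         # fail while DEPTH is not empty
        </\#ORDERS>
    ";

    static readonly Regex OuterOrders = new Regex(
        Pattern,
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Hypothetical helper: returns true and the outermost block if one is found.
    public static bool TryMatchOuter(string input, out string block)
    {
        Match m = OuterOrders.Match(input);
        block = m.Value;          // empty string when there is no match
        return m.Success;
    }

    public static void Main()
    {
        string nested =
            "<#ORDERS>\n<Do some thing>\n<#ORDERS>\n<Do some thing more>\n" +
            "</#ORDERS>\n<Do some final stuff>\n</#ORDERS>";

        if (TryMatchOuter(nested, out string block))
            Console.WriteLine(block);   // the whole outer block, inner block included
    }
}
```

Note that each comment inside the pattern has to stay on a single line: with IgnorePatternWhitespace a comment ends at the line break, so a wrapped comment would turn its second half into part of the pattern.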
 