RegEx, attempting to select some markup

  • Thread starter R. K. Wijayaratne

R. K. Wijayaratne

Hello everyone,

I have markup similar to the sample below, and I am trying to select just
the Block 1 markup/text (<li>...<ul>...</ul></li>) with the following RegEx:
@"<li>(.|\n)*<ul>(.|\n)*</ul>\n*</li>". However, it selects all 3
blocks. Any ideas how to get around this?

[C#]
string section = Regex.Match(text, @"(<li>(.|\n)*<ul>(.|\n)*</ul>\n*</li>)").Value;

HTML:
<li>Block 1, some text
<ul>
<li>Link 1</li>
<li>Link 2</li>
<li>Link 3</li>
</ul></li>

<li>Block 2, some text
<ul>
<li>Link 4</li>
<li>Link 5</li>
<li>Link 6</li>
</ul></li>

<li>Block 3, some text
<ul>
<li>Link 7</li>
<li>Link 8</li>
<li>Link 9</li>
</ul></li>
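For reference, here is why the pattern above grabs all three blocks: `*` is greedy, so `(.|\n)*` first runs to the end of the input and then backtracks only as far as the *last* `</ul></li>`. Making each `*` lazy (`*?`) stops it at the *first* one instead. The sketch below is a minimal illustration assuming .NET's `System.Text.RegularExpressions`; the class name `GreedyDemo` and the shortened sample text are hypothetical, not from the thread. No `RegexOptions` are needed here because `(.|\n)` already matches newlines.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the sample is a shortened copy of the markup above.
public static class GreedyDemo
{
    public const string Sample =
        "<li>Block 1, some text\n<ul>\n<li>Link 1</li>\n</ul></li>\n\n" +
        "<li>Block 2, some text\n<ul>\n<li>Link 4</li>\n</ul></li>\n\n" +
        "<li>Block 3, some text\n<ul>\n<li>Link 7</li>\n</ul></li>";

    // The pattern from the question: greedy (.|\n)* backtracks only to the LAST </ul></li>.
    public const string Greedy = @"<li>(.|\n)*<ul>(.|\n)*</ul>\n*</li>";

    // Same pattern with lazy (.|\n)*?: each group stops as soon as the rest can match,
    // i.e. at the FIRST <ul> and the FIRST </ul></li>.
    public const string Lazy = @"<li>(.|\n)*?<ul>(.|\n)*?</ul>\n*</li>";

    // Returns the text of the first match of the given pattern against Sample.
    public static string FirstMatch(string pattern) => Regex.Match(Sample, pattern).Value;
}
```

With the greedy pattern the match is the entire sample; with the lazy one it ends right after Block 1's `</ul></li>`.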
 

firstfather

R.K., try this:

[Test]
public void TestRegex()
{
string toTest = @"<li>Block 1, some text
<ul>
<li>Link 1</li>
<li>Link 2</li>
<li>Link 3</li>
</ul></li>

<li>Block 2, some text
<ul>
<li>Link 4</li>
<li>Link 5</li>
<li>Link 6</li>
</ul></li>

<li>Block 3, some text
<ul>
<li>Link 7</li>
<li>Link 8</li>
<li>Link 9</li>
</ul></li>";
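Match match = null;
try
{
// Regex, Match and RegexOptions live in System.Text.RegularExpressions
// (add "using System.Text.RegularExpressions;" at the top of the file).
// The pattern must stay on one line: a regular string literal cannot span lines.
match = Regex.Match(toTest, "<li>.*?<ul>.*?</ul></li>", RegexOptions.Singleline);
}
catch (ArgumentException)
{
// Syntax error in the regular expression
}
// Guard against a null match (bad pattern) or no match at all.
if (match != null && match.Success)
{
Console.WriteLine(match.Value);
}
}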

What makes this work is the question mark after the dot-star (.*?
instead of .*). It makes the quantifier lazy (non-greedy): it matches as
few characters as possible, so the match ends at the first </ul></li>
instead of the regex engine devouring the whole string and backtracking
to the last one. The RegexOptions.Singleline option makes . match every
character, including the newline \n, so you don't have to scan for
newlines explicitly.
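Both halves of Tom's explanation can be checked in isolation. The sketch below assumes .NET's `System.Text.RegularExpressions`; the class `LazySinglelineDemo` and the two-block sample are hypothetical. It runs Tom's pattern with and without `RegexOptions.Singleline`, plus a greedy `.*` variant, against the same input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the sample is a shortened version of the thread's markup.
public static class LazySinglelineDemo
{
    public const string Block1 = "<li>Block 1, some text\n<ul>\n<li>Link 1</li>\n</ul></li>";
    public const string Sample = Block1 + "\n\n<li>Block 2, some text\n<ul>\n<li>Link 4</li>\n</ul></li>";

    // Lazy .*? plus Singleline: '.' may cross newlines, but each .*? stops as early as it can.
    public static Match LazySingleline() =>
        Regex.Match(Sample, "<li>.*?<ul>.*?</ul></li>", RegexOptions.Singleline);

    // Greedy .* plus Singleline: runs to the last </ul></li>, so both blocks are returned.
    public static Match GreedySingleline() =>
        Regex.Match(Sample, "<li>.*<ul>.*</ul></li>", RegexOptions.Singleline);

    // Lazy .*? without Singleline: '.' cannot match '\n', and in this sample
    // no <li> shares a line with a <ul>, so there is no match at all.
    public static Match LazyNoSingleline() =>
        Regex.Match(Sample, "<li>.*?<ul>.*?</ul></li>");
}
```

So both ingredients are needed here: Singleline lets `.` reach across lines, and the lazy `?` keeps the match from swallowing the later blocks.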

Tom
 

R. K. Wijayaratne

Amazing! Thank you, Tom... :)


Wingot

-----Original Message-----
From: (e-mail address removed)
Posted At: Sunday, 9 December 2007 2:03 AM
Posted To: microsoft.public.dotnet.languages.csharp
Conversation: RegEx, attempting to select some markup
Subject: Re: RegEx, attempting to select some markup *snip*
What makes this work is the question mark after the dotstar (.*?
instead .*). It forces a non-greedy scan, which keeps the regex
engine from devouring the whole string. The RegexOptions.Singleline
switch treats the newline characters as part of the string so you
don't have to scan for them explicitly.

Actually, the question mark on its own means optionality (i.e., the
character or search term before it can occur 0 or 1 times). It does,
however, make the quantifier non-greedy (lazy) when applied to the * or +
operators.
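To illustrate the two roles of `?` described above, here is a minimal sketch assuming .NET's `System.Text.RegularExpressions`; the class `QuestionMarkDemo`, its method names, and the inputs are hypothetical. After a plain character, `?` makes it optional; after `+`, it makes the quantifier lazy.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo; inputs are made up for illustration.
public static class QuestionMarkDemo
{
    // '?' after a character: that character is optional (0 or 1 occurrences),
    // so this accepts both "color" and "colour".
    public static bool MatchesColour(string s) => Regex.IsMatch(s, "^colou?r$");

    // Plain '+' is greedy: it takes as much as possible, up to the last '>'.
    public static string GreedyAngle(string s) => Regex.Match(s, "<.+>").Value;

    // '?' after '+': the quantifier becomes lazy and stops at the first '>'.
    public static string LazyAngle(string s) => Regex.Match(s, "<.+?>").Value;
}
```

For the input `<b>x</b>`, `GreedyAngle` returns the whole string while `LazyAngle` returns just `<b>`.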
 
