RegEx, attempting to select some markup

R. K. Wijayaratne

Hello everyone,

I have markup similar to the HTML below, and I am trying to select just
the Block 1 markup/text (<li>...<ul>...</ul></li>) with the following RegEx:
@"<li>(.|\n)*<ul>(.|\n)*</ul>\n*</li>". However, it selects all 3
blocks. Any ideas how to get around this?

[C#]
string section = Regex.Match(text, @"(<li>(.|\n)*<ul>(.|\n)*</ul>\n*</li>)").Value;

HTML:
<li>Block 1, some text
<ul>
<li>Link 1</li>
<li>Link 2</li>
<li>Link 3</li>
</ul></li>

<li>Block 2, some text
<ul>
<li>Link 4</li>
<li>Link 5</li>
<li>Link 6</li>
</ul></li>

<li>Block 3, some text
<ul>
<li>Link 7</li>
<li>Link 8</li>
<li>Link 9</li>
</ul></li>
 
R.K., try this:

[Test]
public void TestRegex()
{
string toTest = @"<li>Block 1, some text
<ul>
<li>Link 1</li>
<li>Link 2</li>
<li>Link 3</li>
</ul></li>

<li>Block 2, some text
<ul>
<li>Link 4</li>
<li>Link 5</li>
<li>Link 6</li>
</ul></li>

<li>Block 3, some text
<ul>
<li>Link 7</li>
<li>Link 8</li>
<li>Link 9</li>
</ul></li>";
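Match match = null;
try
{
    // .*? is lazy, so the match ends at the first </ul></li>.
    // Singleline lets '.' match newline characters as well.
    match = Regex.Match(toTest, "<li>.*?<ul>.*?</ul></li>", RegexOptions.Singleline);
}
catch (ArgumentException)
{
    // Syntax error in the regular expression
}
// match is still null if the pattern was rejected, so guard before printing.
if (match != null)
{
    Console.WriteLine(match.Value);
}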
}

What makes this work is the question mark after the dotstar (.*?
instead of .*). It makes the quantifier non-greedy (lazy), so it matches
as little as possible and stops at the first closing </ul></li> rather
than the last one. The RegexOptions.Singleline option makes the dot
match newline characters too, so you don't have to match them explicitly.
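
To make the difference concrete, here is a minimal sketch comparing the greedy and lazy forms, with and without RegexOptions.Singleline. The class name, helper names, and the shortened two-block sample string are made up for illustration; only Regex.Match and RegexOptions.Singleline come from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Hypothetical two-block sample, shortened from the markup in the question.
    public const string Html =
        "<li>Block 1\n<ul>\n<li>Link 1</li>\n</ul></li>\n\n" +
        "<li>Block 2\n<ul>\n<li>Link 2</li>\n</ul></li>";

    // Greedy: .* runs to the end of the input, then backtracks to the LAST </ul></li>.
    public static Match Greedy(string s) =>
        Regex.Match(s, "<li>.*<ul>.*</ul></li>", RegexOptions.Singleline);

    // Lazy: .*? grows one character at a time and stops at the FIRST </ul></li>.
    public static Match Lazy(string s) =>
        Regex.Match(s, "<li>.*?<ul>.*?</ul></li>", RegexOptions.Singleline);

    // Without Singleline, '.' does not match '\n', so nothing can span the line breaks.
    public static Match LazyNoSingleline(string s) =>
        Regex.Match(s, "<li>.*?<ul>.*?</ul></li>");

    public static void Main()
    {
        Console.WriteLine(Greedy(Html).Value.Contains("Block 2"));  // True
        Console.WriteLine(Lazy(Html).Value.Contains("Block 2"));    // False
        Console.WriteLine(LazyNoSingleline(Html).Success);          // False
    }
}
```

The greedy match swallows both blocks, the lazy match returns only Block 1, and dropping Singleline produces no match at all on this input because every candidate would have to cross a newline.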

Tom
 
Amazing! Thanking you Tom... :)

 
-----Original Message-----
From: (e-mail address removed)
Posted At: Sunday, 9 December 2007 2:03 AM
Posted To: microsoft.public.dotnet.languages.csharp
Conversation: RegEx, attempting to select some markup
Subject: Re: RegEx, attempting to select some markup

*snip*
What makes this work is the question mark after the dotstar (.*?
instead .*). It forces a non-greedy scan, which keeps the regex
engine from devouring the whole string. The RegexOptions.Singleline
switch treats the newline characters as part of the string so you
don't have to scan for them explicitly.

Actually, the question mark on its own means optionality (i.e. the
character/search term before it can occur 0 or 1 times). It does,
however, result in non-greedy matching when applied to the * or +
operators.
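
Both roles of the question mark can be seen side by side in a small sketch. The class name, method names, and sample strings below are hypothetical and chosen only to illustrate the point.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuestionMarkRoles
{
    // After a literal (or group), ? means "zero or one occurrence" of it.
    public static bool IsColorWord(string s) => Regex.IsMatch(s, "^colou?r$");

    // Greedy +: takes as much as possible, up to the last '>'.
    public static string GreedyTag(string s) => Regex.Match(s, "<.+>").Value;

    // After a quantifier, ? makes that quantifier lazy: stop at the first '>'.
    public static string LazyTag(string s) => Regex.Match(s, "<.+?>").Value;

    public static void Main()
    {
        Console.WriteLine(IsColorWord("color"));      // True
        Console.WriteLine(IsColorWord("colour"));     // True
        Console.WriteLine(GreedyTag("<b>bold</b>"));  // <b>bold</b>
        Console.WriteLine(LazyTag("<b>bold</b>"));    // <b>
    }
}
```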
 