Problem with named groups using Regex

  • Thread starter: Edgardo Rossetto

Edgardo Rossetto

Hi, I have something like:

string result = null;

try
{
    string exp = @"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)";
    Regex r = new Regex(exp);
    result = r.Match(html).Groups["GroupName"].Value;
    return result.Trim();
}
catch
{
    return result;
}

The problem is that, in theory, GroupName should contain the result of the
expression (the numbers). Instead it returns the entire match, including the
<title> and </title> tags. However, r.Match(html).Groups[1].Value
returns the proper content (the numbers).

What am I doing wrong?

Note: I tried both 'GroupName' and <GroupName>, which are both valid ways to
write the group name.
 
Hi, Edgardo! If you try using this expression

string exp = @"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)";

you will get the numbers you want.

The reason, I think, is that your GroupName refers to the "outer" group,
which is enclosed by the first ( and last ). So it is not so strange that you
got the whole string including <title> and </title>.
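
To make the point above concrete, here is a minimal sketch of the extraction with the name attached to the digits group only. The class name, the ExtractYear helper and the sample title are made up for illustration. It also checks Match.Success instead of wrapping the call in try/catch, since Match does not throw when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

static class TitleYearDemo
{
    // Hypothetical helper: returns the four-digit number found in parentheses
    // inside a single-line <title> element, or an empty string if there is none.
    public static string ExtractYear(string html)
    {
        // The name sits on the digits group, so Groups["GroupName"] holds only the digits.
        Match m = Regex.Match(html, @"<title>.*\((?'GroupName'[0-9]{4})\).*</title>");
        return m.Success ? m.Groups["GroupName"].Value : string.Empty;
    }

    static void Main()
    {
        // Made-up sample input.
        Console.WriteLine(ExtractYear("<title>Some Movie (1999) - Example Site</title>")); // 1999
    }
}
```

The outer parentheses from the pattern above are left out here because nothing reads that group.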

About the group number, I guess the counting order is not "from left
to right" but "from inside to outside", which means that group number 1
in your expression is ([0-9]{4}), and the group with the highest number is the
whole string.

Hope this explanation helps.

Ryan
 
Well, I ran some tests and the results proved that I was wrong about the
order in which groups are counted. Groups really are counted "left to
right".
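
One .NET detail explains why Groups[1] held the digits in the original pattern: unnamed groups are numbered first, left to right, and named groups get their numbers after them. The sketch below checks this with Regex.GroupNumberFromName; the class name, the NumberOf helper and the sample title are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupNumberingDemo
{
    // Hypothetical helper: the number .NET assigns to a named group in a pattern.
    public static int NumberOf(string pattern, string name)
    {
        return new Regex(pattern).GroupNumberFromName(name);
    }

    static void Main()
    {
        // The pattern from the original question.
        string original = @"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)";
        Regex r = new Regex(original);
        Match m = r.Match("<title>Some Movie (1999)</title>"); // made-up input

        Console.WriteLine(r.GroupNumberFromName("GroupName")); // 2
        Console.WriteLine(m.Groups[1].Value);                  // 1999
        Console.WriteLine(m.Groups[2].Value);                  // <title>Some Movie (1999)</title>
    }
}
```

So the unnamed ([0-9]{4}) is group 1 even though its parenthesis opens later, and GroupName becomes group 2.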
 
Ryan said:
Hi, Edgardo! If you try using this expression

string exp = @"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)";

you will get the numbers you want.

The reason, I think, is that your GroupName refers to the "outer" group,
which is enclosed by the first ( and last ). So it is not so strange that you
got the whole string including <title> and </title>.

Yes, that's it!

I don't fully understand why, though. I have never used named groups before,
but what's the point of creating another backreference just for naming?

Thanks for your reply.
 
The point is that a group name always refers to the group whose '(' is
immediately to the left of the name.

So, in the pattern

@"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)"

GroupName refers to the group ([0-9]{4}), and in this pattern

@"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)"

GroupName refers to the group

(<title>.*\(([0-9]{4})\).*</title>).

If there is any question about my explanation, please don't hesitate to tell
me.
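
On the question of an extra backreference: the (?'GroupName'...) construct is itself the capturing group, so naming does not add a second one. A small sketch that counts the groups in each variant, using Regex.GetGroupNumbers and Regex.GetGroupNames; the class name and the CaptureGroupCount helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class NamedGroupCountDemo
{
    // Hypothetical helper: number of capturing groups in a pattern,
    // not counting group 0 (the whole match).
    public static int CaptureGroupCount(string pattern)
    {
        return new Regex(pattern).GetGroupNumbers().Length - 1;
    }

    static void Main()
    {
        // Name only, no outer parentheses: one capturing group.
        Console.WriteLine(CaptureGroupCount(@"<title>.*\((?'GroupName'[0-9]{4})\).*</title>")); // 1

        // Outer unnamed group plus the named group: two capturing groups.
        Console.WriteLine(CaptureGroupCount(@"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)")); // 2

        // The angle-bracket syntax declares the same kind of named group.
        Regex angle = new Regex(@"\((?<GroupName>[0-9]{4})\)");
        Console.WriteLine(string.Join(", ", angle.GetGroupNames())); // 0, GroupName
    }
}
```

The second group in the middle pattern comes from the outer parentheses, not from the name.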
 