Problem with named groups using Regex

Edgardo Rossetto

Hi, I have something like:

string result = null;

try
{
    string exp = @"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)";
    Regex r = new Regex(exp);
    result = r.Match(html).Groups["GroupName"].Value;
    return result.Trim();
}
catch
{
    return result;
}

The problem is that, in theory, GroupName should contain the result of the
expression (the numbers), but instead it returns the entire match, including
the <title> and </title> tags. However, r.Match(html).Groups[1].Value
returns the proper content (the numbers).

What am I doing wrong?

Note that I tried both 'GroupName' and <GroupName>, which are both valid
ways to specify the group name.
 
Guest

Hi, Edgardo! If you try using this expression

string exp = @"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)";

you will get the numbers you want.

The reason, I think, is that your GroupName refers to the "outer" group,
which is enclosed by the first ( and last ). So it is not so strange that you
got the whole string including <title> and </title>.
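
As a minimal sketch of the suggestion above, the following wraps the corrected pattern in a small helper. The class name TitleYear, the method name Extract, and the sample HTML are made up for illustration; the helper returns an empty string when nothing matches, which is an assumption about the desired behavior rather than something stated in the original post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleYear
{
    // Hypothetical helper: returns the four digits captured by GroupName,
    // or an empty string when the input contains no matching <title> element.
    public static string Extract(string html)
    {
        // The name is attached to the inner group, so only the digits are captured under it.
        Match m = Regex.Match(html, @"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)");
        return m.Success ? m.Groups["GroupName"].Value : string.Empty;
    }

    public static void Main()
    {
        // Made-up sample input.
        string html = "<html><title>Blade Runner (1982) - Example</title></html>";
        Console.WriteLine(Extract(html)); // prints "1982"
    }
}
```

Checking m.Success makes the "no match" case explicit instead of relying on a try/catch around the lookup.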

About the group number, I guess the counting order is not "from left
to right" but "from inside to outside", which would mean that group number 1
in your expression is ([0-9]{4}), and the group with the highest number is
the whole string.

I hope this explanation helps.

Ryan
 
Guest

Well, I ran some tests, and the results showed that I was wrong about the
order in which groups are counted. Groups really are numbered from left to
right.
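
One detail worth adding, going by the .NET documentation rather than anything stated in this thread: unnamed groups are numbered left to right first, and named groups receive their numbers only after all unnamed groups. That would explain why Groups[1] held the digits in the original pattern even though the named group opens first. A small sketch to check this, where the class name GroupNumbering, the method name NumberOf, and the sample input are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumbering
{
    // Hypothetical helper: returns the number .NET assigns to a named group.
    public static int NumberOf(string pattern, string name)
    {
        return new Regex(pattern).GroupNumberFromName(name);
    }

    public static void Main()
    {
        // The pattern from the original question: the outer group is named, the inner one is not.
        string original = @"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)";
        string html = "<title>Blade Runner (1982)</title>"; // made-up sample input

        Match m = Regex.Match(html, original);
        Console.WriteLine(NumberOf(original, "GroupName")); // prints 2: the named group is numbered last
        Console.WriteLine(m.Groups[1].Value);               // prints "1982": the unnamed inner group
        Console.WriteLine(m.Groups[2].Value);               // prints the whole <title>...</title> element
    }
}
```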
 
Edgardo Rossetto

Ryan said:
Hi, Edgardo! If you try using this expression

string exp = @"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)";

you will get the numbers you want.

The reason, I think, is that your GroupName refers to the "outer" group,
which is enclosed by the first ( and last ). So it is not so strange that you
got the whole string including <title> and </title>.

Yes, that's it!

I don't fully understand why, though. I've never used named groups before,
but what's the point of creating another backreference just for naming?

Thanks for your reply.
 
Guest

The point is that a group name always refers to the group whose '(' is
immediately to the left of the name.

So, in the pattern

@"(<title>.*\((?'GroupName'[0-9]{4})\).*</title>)"

GroupName refers to the group ([0-9]{4}), and in this pattern

@"(?'GroupName'<title>.*\(([0-9]{4})\).*</title>)"

GroupName refers to the group

(<title>.*\(([0-9]{4})\).*</title>).

If there is any question about my explanation, please don't hesitate to tell
me.
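
To illustrate the rule above: because the name belongs to the group it opens, no extra group is needed just for naming, and the outer parentheses can be dropped entirely. The sketch below does that and also shows that the 'GroupName' and <GroupName> spellings behave the same. The class name NamedGroupSyntax, the method name Year, and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSyntax
{
    // Hypothetical helper: returns the value captured by GroupName, or "" when there is no match.
    public static string Year(string pattern, string html)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups["GroupName"].Value : string.Empty;
    }

    public static void Main()
    {
        string html = "<title>Blade Runner (1982)</title>"; // made-up sample input

        // Same pattern written with both naming syntaxes and without the outer group.
        string quoted = @"<title>.*\((?'GroupName'[0-9]{4})\).*</title>";
        string angled = @"<title>.*\((?<GroupName>[0-9]{4})\).*</title>";

        Console.WriteLine(Year(quoted, html)); // prints "1982"
        Console.WriteLine(Year(angled, html)); // prints "1982"
    }
}
```

With no unnamed groups left in the pattern, GroupName is the only capturing group besides the whole match.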
 
