Regular Expression Bug?

Guest

While working with regular expressions, I found an inconsistency between the
Regex.Escape and Regex.Match(es) methods.

Does anyone have an explanation for this behaviour, or is this a bug?

Here is a description of the problem:

'Escape' DOES NOT recognize ']' as a metacharacter.
'Match(es)' DOES recognize ']' as a metacharacter.
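
The two statements above can be checked directly. The following is only a small sketch; the class name EscapeSurvey and the helper IsEscaped are made up for illustration and are not part of the framework. It prints what Regex.Escape does with each bracket character and checks how the matcher treats a lone ']'.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeSurvey
{
    // Hypothetical helper: true when Regex.Escape changes the string,
    // i.e. it considers at least one character to need a backslash.
    public static bool IsEscaped(string s)
    {
        return Regex.Escape(s) != s;
    }

    public static void Main()
    {
        // Show the escaped form of every bracket character.
        foreach (string s in new[] { "[", "]", "(", ")", "{", "}" })
        {
            Console.WriteLine("{0} -> {1}", s, Regex.Escape(s));
        }

        // A lone ']' in a pattern is taken literally by the matcher.
        Console.WriteLine(Regex.IsMatch("a]b", "]"));
    }
}
```

On the framework versions I am aware of, '[', '(', ')' and '{' come back with a backslash, while ']' and '}' come back unchanged.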

void Wanted_but_not_working()
{
    string OpenMask = "[";
    string CloseMask = "]";
    OpenMask = Regex.Escape(OpenMask);
    OpenMask = Regex.Escape(OpenMask);
    //Expected Result: OpenMask="\\\[";
    CloseMask = Regex.Escape(CloseMask);
    CloseMask = Regex.Escape(CloseMask);
    //Expected Result: CloseMask="\\\]";
    //Why is ']' not a metacharacter?
    //It has to be. Otherwise the Matches call below should work.
    //Either the Escape or the Matches operation does not work properly.

    string actualDefinition = "[KG]*[IND]*[VHW]*[KZ]";
    actualDefinition = Regex.Escape(actualDefinition);
    //Expected Result: "\[KG\]\*\[IND\]\*\[VHW\]\*\[KZ\]"

    string pattern =
        String.Concat(OpenMask, "[^", OpenMask, CloseMask, "]+", CloseMask);
    //Expected Result: "\\\[[^\\\[\\\]]+\\\]"
    MatchCollection ma = Regex.Matches(actualDefinition, pattern,
        RegexOptions.Compiled | RegexOptions.CultureInvariant |
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    //Expected Result: 4 matches... Actual Result: NOTHING
}

void Manual_and_working_but_not_wanted()
{
    string OpenMask = @"\\\[";
    string CloseMask = @"\\\]";

    string actualDefinition = @"\[KG\]\*\[IND\]\*\[VHW\]\*\[KZ\]";
    string pattern = String.Concat(OpenMask, "[^", OpenMask, CloseMask, "]+",
        CloseMask);
    MatchCollection ma = Regex.Matches(actualDefinition, pattern,
        RegexOptions.Compiled | RegexOptions.CultureInvariant |
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    //Expected Result: 4 matches ... OK
}

void Crazy_Workaround()
{
    string OpenMask = "[";
    string CloseMask = @"\]";
    OpenMask = Regex.Escape(OpenMask);
    OpenMask = Regex.Escape(OpenMask);
    //Expected Result: OpenMask="\\\[";

    string actualDefinition = "[KG]*[IND]*[VHW]*[KZ]";
    actualDefinition = Regex.Escape(actualDefinition);
    string pattern = String.Concat(OpenMask, "[^", OpenMask, CloseMask, "]+",
        CloseMask);
    MatchCollection ma = Regex.Matches(actualDefinition, pattern,
        RegexOptions.Compiled | RegexOptions.CultureInvariant |
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
    //Expected Result: 4 matches... OK
}
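
Printing the intermediate strings from Wanted_but_not_working() shows where it goes wrong. This is just a diagnostic sketch; the names PatternTrace and BuildPattern are invented here, and the RegexOptions are left out because they should not change the outcome for this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternTrace
{
    // Rebuilds the pattern from Wanted_but_not_working() step by step.
    public static string BuildPattern()
    {
        string openMask = Regex.Escape(Regex.Escape("["));   // \\\[
        string closeMask = Regex.Escape(Regex.Escape("]"));  // ]  (unchanged)
        return string.Concat(openMask, "[^", openMask, closeMask, "]+", closeMask);
    }

    public static void Main()
    {
        string text = Regex.Escape("[KG]*[IND]*[VHW]*[KZ]");
        string pattern = BuildPattern();

        Console.WriteLine(text);     // \[KG]\*\[IND]\*\[VHW]\*\[KZ]
        Console.WriteLine(pattern);  // \\\[[^\\\[]]+]
        Console.WriteLine(Regex.Matches(text, pattern).Count);  // 0
    }
}
```

Because CloseMask is still a bare ']', the character class closes early as [^\\\[], and the remaining ]+] then asks for at least two literal ']' in a row, which the text never contains.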
 
Carl Daniel [VC++ MVP]

Ringwraith said:
While working with regular expressions, I found an inconsistency between
the Regex.Escape and Regex.Match(es) methods.

Does anyone have an explanation for this behaviour, or is this a bug?

I'd call it an oversight in Regex.Escape. The issue is that ] is
technically not a metacharacter; nor is }. (Regex.Escape does escape ),
which the parser rejects when it is unmatched.) The ] and } characters only
have special meaning when they follow an unmatched, unescaped instance of the
corresponding "open" character: they're contextual metacharacters, if you
will.
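
If you want the behaviour you expected, one option is a small wrapper that escapes the two closing characters as well. This is a sketch only: EscapeWithClosers is a hypothetical helper, not a framework method, and it assumes every ']' or '}' in the output of Regex.Escape is still unescaped (which holds because Regex.Escape doubles every backslash in the input).

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexEscapeHelper
{
    // Hypothetical helper: runs Regex.Escape, then also backslash-escapes
    // ']' and '}', which Regex.Escape leaves alone.
    public static string EscapeWithClosers(string input)
    {
        return Regex.Escape(input).Replace("]", @"\]").Replace("}", @"\}");
    }

    public static void Main()
    {
        // Same construction as Wanted_but_not_working(), using the helper.
        string open = EscapeWithClosers(EscapeWithClosers("["));    // \\\[
        string close = EscapeWithClosers(EscapeWithClosers("]"));   // \\\]
        string text = EscapeWithClosers("[KG]*[IND]*[VHW]*[KZ]");
        string pattern = string.Concat(open, "[^", open, close, "]+", close);

        Console.WriteLine(pattern);                              // \\\[[^\\\[\\\]]+\\\]
        Console.WriteLine(Regex.Matches(text, pattern).Count);   // 4
    }
}
```

With the helper, the double-escaped masks come out the same as the hand-written ones in Manual_and_working_but_not_wanted().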

You can submit a bug report on
http://connect.microsoft.com/feedback/default.aspx?SiteID=210

I'd expect that it will be closed as by-design though, since Regex.Escape is
working exactly as documented:

http://msdn2.microsoft.com/en-gb/library/system.text.regularexpressions.regex.escape.aspx

-cd
 
