Regex Q (Another one)

  • Thread starter: John B

John B

Could anyone tell me why the following does not match?

^ABC_\d{6}_\d{6}$

Should match (twice):

ABC_123456_789123
ABC_123456_789123

It matches:
ABC_123456_789123

But not the double one.

Regex options set are:

RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace |
RegexOptions.Multiline

TIA

JB
 
If I read your description correctly, it sounds more like
a problem with the code retrieving the matches than with the regex.

Try posting the code.

Arne
 
Hi John,

I may be wrong, but from my experience, if it was to match your double item,
it should be (^ABC_\d{6}_\d{6}$)+

meaning that it would allow that whole pattern one or more times as opposed
to just once.

- Lucas
 
Lucas said:
Hi John,

I may be wrong, but from my experience, if it was to match your double
item, it should be (^ABC_\d{6}_\d{6}$)+

meaning that it would allow that whole pattern one or more times as
opposed to just once.

Hmm, I was under the impression that it would match _any_ instances
(only the * and + operators being greedy and eating the other matches
would interfere).

I am by no means an expert though, and I tried the grouped one-or-more
and it still didn't work.

I eventually used (?:^|\r|\n)(?<asn>abc_\d{6}_\d{6})(?:$|\r|\n) which
worked; see my reply to Arne for more.

Thanks for your help.

John
 
Arne said:
If I read your description correctly, it sounds more like
a problem with the code retrieving the matches than with the regex.

Try posting the code.

Hi Arne,

Thanks for the reply.

The code is:
[Test]
public void Test1()
{
    Regex matcher = new Regex(@"^ABC_\d{6}_\d{6}$",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace |
        RegexOptions.Multiline);

    // Single line, no line break: works
    MatchCollection matches = matcher.Matches("ABC_123456_789123");
    Assert.AreEqual(1, matches.Count);
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }

    // Two lines separated by \r\n: doesn't work
    matches = matcher.Matches("ABC_123456_789123\r\nABC_123456_789123");
    Assert.AreEqual(2, matches.Count);
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }

    // Single line followed by \r\n: doesn't work
    matches = matcher.Matches("ABC_123456_789123\r\n");
    Assert.AreEqual(1, matches.Count);
    foreach (Match match in matches)
    {
        Console.WriteLine(match.Value);
    }
}



The pattern (?:^|\r|\n)(?<cap>abc_\d{6}_\d{6})(?:$|\r|\n) does what I
want, so no big deal.

I just don't understand why the original doesn't match with the Multiline flag.

Cheers,

John
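
For readers who want to try the workaround pattern above, here is a minimal sketch. The class and method names (WorkaroundDemo, ExtractIds) are made up for illustration and are not from John's code. Because the pattern consumes the \r or \n next to each ID, match.Value can include a line-break character, so the sketch reads the named group "cap" instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WorkaroundDemo
{
    // John's workaround: accept an explicit \r or \n on either side of the ID,
    // in addition to the ^ and $ anchors.
    private static readonly Regex Matcher = new Regex(
        @"(?:^|\r|\n)(?<cap>abc_\d{6}_\d{6})(?:$|\r|\n)",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace |
        RegexOptions.Multiline);

    // Hypothetical helper: returns the captured IDs without any
    // surrounding line-break characters.
    public static string[] ExtractIds(string input)
    {
        MatchCollection matches = Matcher.Matches(input);
        string[] ids = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            ids[i] = matches[i].Groups["cap"].Value;
        }
        return ids;
    }

    public static void Main()
    {
        foreach (string id in ExtractIds("ABC_123456_789123\r\nABC_123456_789123"))
        {
            Console.WriteLine(id);
        }
    }
}
```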
 
John said:
matches =
matcher.Matches("ABC_123456_789123\r\nABC_123456_789123"); //doesnt work
Assert.AreEqual(2, matches.Count);
foreach (Match match in matches)
{
Console.WriteLine(match.Value);
}

Actually I can see why you are puzzled.

A bit of experimentation shows that:

matcher.Matches("ABC_123456_789123\nABC_123456_789123")

will find 2 matches.

But it is not obvious to me why \n is the proper line
delimiter and not \r\n.

Maybe someone with more regex knowledge than me can
explain it.

Arne
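
The experiment above can be reproduced with a small sketch like the following. The names LineEndingDemo and CountMatches are invented for illustration. It assumes the usual .NET behaviour that, with RegexOptions.Multiline, $ matches only at the end of the string or immediately before \n, so a \r sitting in front of the \n blocks the match on that line.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    private const string Pattern = @"^ABC_\d{6}_\d{6}$";

    // Hypothetical helper: counts the matches of the thread's pattern in the input.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, Pattern,
            RegexOptions.IgnoreCase | RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        // LF only: both lines match.
        Console.WriteLine(CountMatches("ABC_123456_789123\nABC_123456_789123"));   // 2
        // CRLF: only the last line matches, because it ends at the end of the string.
        Console.WriteLine(CountMatches("ABC_123456_789123\r\nABC_123456_789123")); // 1
        // Trailing CRLF: the \r before \n prevents $ from matching.
        Console.WriteLine(CountMatches("ABC_123456_789123\r\n"));                  // 0
    }
}
```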
 
Arne said:
Actually I can see why you are puzzled.

A bit of experimentation shows that:

matcher.Matches("ABC_123456_789123\nABC_123456_789123")

will find 2 matches.

But it is not obvious to me why \n is the proper line
delimiter and not \r\n.

Maybe someone with more regex knowledge than me can
explain it.

Ahh, I see.
Hmm, that is kind of odd. I would have thought the system "New Line"
variable would be used (\r\n for Windows, obviously).

Thanks for that.

John
 
John said:
Ahh, I see.
Hmm, that is kind of odd, I would have thought the system "New Line"
variable would be used (\r\n for windows obviously)

Me too.

But ...

Arne
 
John said:
Hmm, I was under the impression that it would match _any_ instances
(only the * and + operators being greedy and eating the other matches
would interfere).

Yes, + and * are greedy, but you can make them lazy (non-greedy) by
appending a ?, like this:

new Regex("#.+?#") matched against "#hello # world#" would give
"#hello #",
while new Regex(".+") would return the match "#hello # world#".

hth,
-- henon
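
A small sketch of the greedy versus lazy difference described above. The helper name FirstMatch is invented for illustration, and the "#.+#" pattern is added here only as the direct greedy counterpart of "#.+?#".

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyLazyDemo
{
    // Hypothetical helper: returns the text of the first match, or "" if none.
    public static string FirstMatch(string pattern, string input)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string input = "#hello # world#";

        // Lazy: .+? stops at the first # that lets the pattern succeed.
        Console.WriteLine(FirstMatch("#.+?#", input)); // #hello #

        // Greedy: .+ runs to the end and backtracks only as far as the last #.
        Console.WriteLine(FirstMatch("#.+#", input));  // #hello # world#
    }
}
```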
 
Hello Arne,
Arne said:
Actually I can see why you are puzzled.

A bit of experimentation shows that:

matcher.Matches("ABC_123456_789123\nABC_123456_789123")

will find 2 matches.

But it is not obvious to me why \n is the proper line delimiter and
not \r\n.

Maybe someone with more regex knowledge than me can explain it.

Arne

I've raised this as a bug with Microsoft multiple times, but I always get a
"could not replicate" back as the result.

I've become accustomed to writing \r?$ at all times instead of just $.

But that shouldn't be needed.

It didn't use to be so in .NET 1.0. It changed somewhere after that. I
haven't experimented enough with the different versions of the platform,
but my guess is that it even changed within 1.1, between the original version
and the latest SP.

One other thing I've been trying is to use multiple locales to see if that
has anything to do with it, but as far as I can tell that has no impact.
Another thing to try is to set the OS defaults for newlines etc., but I
haven't experimented with those yet either.
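
A rough sketch of the \r?$ habit mentioned above, with invented names (OptionalCrDemo, MatchValues). For what it is worth, the .NET documentation on anchors describes $ in multiline mode as matching before \n and suggests \r?$ for CR/LF input, so the behaviour appears to be by design. One caveat the sketch illustrates: with \r?$ the \r becomes part of the matched text, so a lookahead variant, (?=\r?$), is shown as one possible way to keep it out of match.Value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalCrDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static string[] MatchValues(string pattern, string input)
    {
        MatchCollection matches = Regex.Matches(input, pattern,
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        string[] values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }

    public static void Main()
    {
        string input = "ABC_123456_789123\r\nABC_123456_789123";

        // \r?$ finds both lines, but the first value carries a trailing \r.
        string[] consumed = MatchValues(@"^ABC_\d{6}_\d{6}\r?$", input);
        Console.WriteLine(consumed.Length);             // 2
        Console.WriteLine(consumed[0].EndsWith("\r"));  // True

        // The lookahead checks for \r?$ without consuming the \r.
        string[] clean = MatchValues(@"^ABC_\d{6}_\d{6}(?=\r?$)", input);
        Console.WriteLine(clean.Length);                // 2
        Console.WriteLine(clean[0].EndsWith("\r"));     // False
    }
}
```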
 
