Regex. How can this match?


Ethan Strauss

Hi,

I have written a regular expression which is supposed to pull a
direction (forward or reverse) designation from a file name.

Unfortunately, the direction designation can either be the whole word
("Forward" or "Reverse") or just a single letter ("F" or "R"), and the rest
of the name is not as consistent as I would like. For example:
"P1|1_G10_Forward_primer.ab1" or "K8_I1_A01_F.ab1".

At the time I am processing the file names, I have already stripped off
the extension.

I have written the regular expression:
public static Regex DirectionFromFIleName =
    new Regex("_(?<Direction>[Forward_|Reverse_|R$|F$])");

This looks for the underscore, followed by "Forward" or "Reverse" or an "F"
as the last character in the string or an "R" as the last character in the
string, or so I thought.

In fact, when Designation = "P1|1_G10_Forward_primer",
RegexLibrary.DirectionFromFIleName.Match(Designation).Groups["Direction"].Value
returns "F"!



How can it pick up that F when it is not the last character? I assume it has
something to do with putting the $ inside the square brackets, but I can't
figure out exactly what it is.



I can think of a bunch of different workarounds for this, but I would
like to understand what the regular expression is doing for the future.

Thanks!
Ethan
 
Ethan,

For this, I'm not sure I would use that logic. I would split the
filename on its separator characters and then look for the word or letter
among the resulting parts. Basically, you would use the regular expression
pattern "[\W_]" (non-word characters plus the underscore, which "\W" alone
does not match because '_' counts as a word character) and then call the
Split method on the regular expression, passing your string.

In the array of strings that is returned, look for Forward, Reverse, R
or F.
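
A minimal sketch of that split-based approach, in case it helps. The class and method names (DirectionBySplit, Find) are made up for illustration, and it assumes the extension has already been stripped, as in the original post:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DirectionBySplit
{
    // \W alone would not split on '_' (it is a word character),
    // so the underscore is added to the character class explicitly.
    private static readonly Regex Separators = new Regex(@"[\W_]+");

    // The designations we are looking for (hypothetical list, taken from the question).
    private static readonly string[] Known = { "Forward", "Reverse", "F", "R" };

    // Returns the first part of the name that is a known direction,
    // or null if none of the parts is one.
    public static string Find(string designation)
    {
        return Separators.Split(designation)
                         .FirstOrDefault(part => Array.IndexOf(Known, part) >= 0);
    }
}
```

With the two example names, Find("P1|1_G10_Forward_primer") should give "Forward" and Find("K8_I1_A01_F") should give "F". One caveat: a stray single-letter part such as "R" earlier in the name would be picked up first, so this only works if the naming scheme rules that out.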
 
* Ethan Strauss wrote, On 5-7-2007 18:15:
I have written the Regular Expression
public static Regex DirectionFromFIleName = new
Regex("_(?<Direction>[Forward_|Reverse_|R$|F$])");

This looks for the underscore, followed by "Forward" or "Reverse" or an "F"
as the last character in the string or an "R" as the last character in the
string, or so I thought.

How can it pick up that F when it is not the last character? I assume it has
something to do with putting the $ inside the square brackets, but I can't
figure out exactly what it is.

I can figure out a bunch of different work arounds for this, but I would
like to understand what the regular expression is doing for the future.

There are a few errors in your regex. Let me try to explain:

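_                         : Find a '_'
[Forward_|Reverse_|R$|F$] : followed by any ONE character from the
                            set 'F', 'o', 'r', 'w', 'a', 'd', '_', '|',
                            'R', 'e', 'v', 's', '$'
(?<Direction>)            : Capture that character in a named group
                            called Direction

Inside [], the '|' and '$' are literal characters, not "or" and
"end of string". In "P1|1_G10_Forward_primer" the first '_' is followed
by 'G', which is not in the set, so the match is the second '_' plus the
'F' of "Forward".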

Of course this isn't what you wanted ;)
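
To see this for yourself, here is a small sketch. The class name CharacterClassDemo and the Equivalent pattern are mine, added only for illustration; Equivalent is the same set of characters with the repeats removed:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // The pattern from the question: everything inside [] is a set of
    // single characters, so the words and the '|' and '$' lose their meaning.
    public static readonly Regex Original =
        new Regex("_(?<Direction>[Forward_|Reverse_|R$|F$])");

    // The same set written once per character (illustrative rewrite).
    public static readonly Regex Equivalent =
        new Regex("_(?<Direction>[FRadeorsvw_|$])");

    // Returns the captured Direction group, or "" when nothing matches.
    public static string Direction(Regex regex, string input)
    {
        return regex.Match(input).Groups["Direction"].Value;
    }
}
```

Both patterns should capture "F" from "P1|1_G10_Forward_primer", and both will happily capture a literal "$" or "|" that follows an underscore, for example from "A_$".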

This should work better:

_(?<Direction>Forward_|Reverse_|R$|F$)

Just removing the [] would make things work. Now it reads:

_                       : Find a '_'
Forward_|Reverse_|R$|F$ : Find either "Forward_", "Reverse_",
                          "R" at the end of the string, or
                          "F" at the end of the string
(?<Direction>)          : Capture whichever matched in a named group
                          called Direction
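
A quick way to check the alternation version against both example names. This is only a sketch; AlternationDemo and its Direction helper are names I made up:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Without the [], '|' separates alternatives and '$' is an anchor again.
    public static readonly Regex Fixed =
        new Regex("_(?<Direction>Forward_|Reverse_|R$|F$)");

    // Returns the captured designation, or null when there is no match.
    public static string Direction(string input)
    {
        Match m = Fixed.Match(input);
        return m.Success ? m.Groups["Direction"].Value : null;
    }
}
```

Direction("P1|1_G10_Forward_primer") should return "Forward_" (the trailing underscore is part of the capture) and Direction("K8_I1_A01_F") should return "F". A name like "K8_I1_A01_F_extra" gives null, because that "F" is not at the end.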

I teach a course on regular expressions and you've fallen into the
trap many of my students have before you. Be very sure what each kind of
brace, bracket, etc. means in which context.

() Group, capture, set options
[] Character set: any one of the characters inside matches
{} Quantifier
<> Naming of groups, lookbehind

Jesse
 
Thanks Jesse,
I had actually figured it out and was about to post the answer, but you
beat me to it.
The Regex which ended up working as I wanted is
"_(?<Direction>Forward|Reverse|R$|F$)"

This is slightly different from what is below, but only because I
changed my mind about what characters to capture...
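
For anyone finding this later, a sketch of how that final pattern could be wrapped so both spellings come back as a single letter. DirectionParser and Parse are hypothetical names, not part of my actual RegexLibrary:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DirectionParser
{
    // The final pattern from this post: no trailing '_' in the capture.
    private static readonly Regex DirectionPattern =
        new Regex("_(?<Direction>Forward|Reverse|R$|F$)");

    // Returns 'F' or 'R', or null when no direction designation is found.
    public static char? Parse(string designation)
    {
        Match m = DirectionPattern.Match(designation);
        if (!m.Success) return null;

        // "Forward" and "F" both start with 'F'; "Reverse" and "R" with 'R'.
        return m.Groups["Direction"].Value[0];
    }
}
```

Parse("P1|1_G10_Forward_primer") should give 'F' and Parse("K8_I1_A01_R") should give 'R'. Note that dropping the trailing underscore means a name part that merely starts with "Forward" or "Reverse" would match too.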
Ethan


Jesse Houwing said:
This should work better:

_(?<Direction>Forward_|Reverse_|R$|F$)