regular expression help

  • Thread starter: Trevor Braun

Trevor Braun

Hi, I'm not sure that this is the right forum for this, but I've been having
a very tough time completing this expression, and I was hoping someone might
have some suggestions for me.
I am trying to read measurements out of a text description, and I have a
working expression, but it captures a pile of empty matches. I obviously am
not interested in them, but I screw up my functionality when I try to get
rid of them.

My expression is:
(?:(?:(?<Feet>[0-9]*)\'){0,1}(?:(?:(?<WholeInches>[0-9]*(?![/\w])){0,1}(?:[ ,\-]){0,1}(?<Fraction>[0-9]*\/[0-9]*){0,1}(?<Decimal>\d*\.\d*){0,1}\")){0,1})

Some test strings are:
1/4" x 2" Flat 44W x 20'
1 1/4" x 2" Flat 44W x 20'
1/4" x 2.5" Flat 44W x 20'
1/4" x 2" Flat 44W x 20' 3"
1/4" x 2" Flat 44W x 20' 3.5"
1/4" x 2" Flat 44W x 20' 1/2"
1/8" x 4" C-1018 flat x 14' 5-1/4"

I really could use some help on this. I've been working on this on and off
for several months now, and just can't seem to get it right.
 
Sorry, it's been a hectic day... I didn't finish my post, but somehow
managed to send it anyway....

In the strings, there are always random numbers, and I want them ignored.
I only want matches on the measurements which can be written about a million
different ways. This is for pulling data out of a legacy inventory
application.

Any thoughts or suggestions would be very, very much appreciated. Right
now, my app uses this expression, and removes matches to the empty groups,
but this is just not how it should work.

Thanks,
Trevor_B
 
Trevor said:
[snipped]

shoot me an email and i'll work with you on these. there's no need to
flood a C# newsgroup with a bunch of back and forth messages about
regular expressions, when they're just between you and me.

send me a long list of the test strings and i'll see what i can do for
you. i've never written a regular expression this complicated and i
would love to give it a try.

jeremiah
 
Hey Trevor,

It may be easier to write several small regex strings than one large regex
string that handles every situation. There is always going to be
a legacy string that will fail your regex. So instead, keep a set of
regex strings that you loop through and try to match. If no match
is found, then you know you need to create a new regex.

It's like a bunch of security checkpoints. If it fails one, then it
goes through another checkpoint. Having one large centralized
checkpoint can cause a lot of complications.

Give it a whirl because sometimes it's easier to have a bunch of little
tasks than one large complicated task.
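
A minimal sketch of that checkpoint loop, assuming each measurement has already been isolated into its own token (for example by splitting the description on " x "). The class name, the pattern set, and the group names are all hypothetical, not something from Trevor's application:

```csharp
using System;
using System.Text.RegularExpressions;

static class MeasurementCheckpoints
{
    // Hypothetical pattern set, ordered from most to least specific.
    // In a verbatim string, "" stands for one double quote (the inch mark).
    static readonly Regex[] Patterns =
    {
        new Regex(@"^(?<Feet>\d+)'\s*(?<Inches>\d+(?:\.\d+)?)""$"),  // 20' 3" or 20' 3.5"
        new Regex(@"^(?<Feet>\d+)'$"),                                // 20'
        new Regex(@"^(?<Whole>\d+)[ -](?<Num>\d+)/(?<Den>\d+)""$"),   // 1 1/4" or 5-1/4"
        new Regex(@"^(?<Num>\d+)/(?<Den>\d+)""$"),                    // 1/4"
        new Regex(@"^(?<Inches>\d+(?:\.\d+)?)""$"),                   // 2" or 2.5"
    };

    // Tries each pattern in turn and returns the first successful match,
    // or null when no checkpoint accepts the token.
    public static Match FirstMatch(string token)
    {
        foreach (Regex pattern in Patterns)
        {
            Match m = pattern.Match(token);
            if (m.Success)
            {
                return m;
            }
        }
        return null;
    }
}
```

With these patterns, FirstMatch("5-1/4\"") succeeds on the third pattern, while FirstMatch("44W") returns null, which is the signal that a new regex may be needed (or that the token is not a measurement at all).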

josh
 
: Hi, I'm not sure that this is the right forum for this, but I've been
: having a very tough time completing this expression, and I was hoping
: someone might have some suggestions for me.
: I am trying to read measurements out of a text description, and I have
: a working expression, but it captures a pile of empty matches. I
: obviously am not interested in them, but I screw up my functionality
: when I try to get rid of them.
:
: My expression is:
: [snipped]
:
: Some test strings are:
: 1/4" x 2" Flat 44W x 20'
: 1 1/4" x 2" Flat 44W x 20'
: 1/4" x 2.5" Flat 44W x 20'
: 1/4" x 2" Flat 44W x 20' 3"
: 1/4" x 2" Flat 44W x 20' 3.5"
: 1/4" x 2" Flat 44W x 20' 1/2"
: 1/8" x 4" C-1018 flat x 14' 5-1/4"
:
: I really could use some help on this. I've been working on this on and
: off for several months now, and just can't seem to get it right.

One easy suggestion is that you can write "{0,1}" more succinctly as
"?", e.g., "a{0,1}" and "a?" are equivalent.

If you want to insist that one of the groups matches, then say what
you mean. Remember that the ? and * quantifiers *always* succeed
because they can match nothing.
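
To see both points in isolation, here is a small sketch. The helper class and method names are made up for illustration; only Regex.Matches and the Match properties come from the framework:

```csharp
using System;
using System.Text.RegularExpressions;

static class EmptyMatchDemo
{
    // Number of zero-length matches the pattern produces on the input.
    public static int CountEmpty(string pattern, string input)
    {
        int empty = 0;
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Length == 0)
            {
                empty++;
            }
        }
        return empty;
    }

    // True when both patterns produce the same values at the same positions.
    public static bool SameMatches(string first, string second, string input)
    {
        MatchCollection a = Regex.Matches(input, first);
        MatchCollection b = Regex.Matches(input, second);
        if (a.Count != b.Count)
        {
            return false;
        }
        for (int i = 0; i < a.Count; i++)
        {
            if (a[i].Index != b[i].Index || a[i].Value != b[i].Value)
            {
                return false;
            }
        }
        return true;
    }
}
```

On the input "44W x 20'", the all-optional pattern (\d*')? reports empty matches, while \d+' reports none because it must consume at least one digit and the foot mark. SameMatches(@"\d{0,1}", @"\d?", "a1b22") returns true, since the two spellings mean the same thing.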

For complex patterns, I like to use RegexOptions.IgnorePatternWhitespace,
which lets you spread a pattern over several lines and comment it.

Your subpatterns are inconsistent, e.g., some include the unit and
some don't, and even with your followup, I may not be clear on what
you're trying to capture.

Take a look at the code below. Note how the pattern requires one of
the alternatives to match non-empty strings.

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        // Every alternative needs at least one digit, so no match can be empty.
        // In a verbatim string, "" stands for one double quote (the inch mark).
        Regex measurements = new Regex(
            @"
            (?<Fraction>    (\d+\s+)?\d+/\d+"" ) |   # optional whole part, fraction, inch mark
            (?<Decimal>     \d+\.\d+""         ) |   # decimal inches with inch mark
            (?<Feet>        \d+'               ) |   # feet with foot mark
            (?<WholeInches> \d+(?![/\w])       )     # bare integer; no inch mark required
            ",
            RegexOptions.IgnorePatternWhitespace |
            RegexOptions.ExplicitCapture);

        string[] inputs = {
            "1/4\" x 2\" Flat 44W x 20'",
            "1 1/4\" x 2\" Flat 44W x 20'",
            "1/4\" x 2.5\" Flat 44W x 20'",
            "1/4\" x 2\" Flat 44W x 20' 3\"",
            "1/4\" x 2\" Flat 44W x 20' 3.5\"",
            "1/4\" x 2\" Flat 44W x 20' 1/2\"",
            "1/8\" x 4\" C-1018 flat x 14' 5-1/4\"",
        };

        string[] groups = {
            "Feet", "WholeInches", "Fraction", "Decimal",
        };

        foreach (string input in inputs)
        {
            Console.WriteLine("[" + input + "]:");

            int count = 1;
            foreach (Match m in measurements.Matches(input))
            {
                Console.WriteLine(" - {0}:", count++);

                // Only the group for the alternative that matched is non-empty.
                foreach (string name in groups)
                    Console.WriteLine(" - {0}: [{1}]",
                        name, m.Groups[name].Value);
            }
        }
    }
}

Is it at least a start in the right direction? Should an input such
as [20' 3"] produce one match or two (one for the feet component and
one for the inches component)? What else needs fixing?
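
If the answer turns out to be one match, a sketch along these lines might work. It is only an assumption about what is wanted: the class name, the group names, and the rule that every inch value must carry its inch mark are mine, not taken from the original expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LengthExtractor
{
    // Inch forms: 1/4", 1 1/4", 5-1/4", 2.5", 2". The inch mark is always required,
    // so bare numbers such as 1018 or 44W are skipped.
    const string Inches = @"(?:(?:\d+[ -])?\d+/\d+|\d+\.\d+|\d+)""";

    // Feet with optional inches, or inches alone. Each branch must consume
    // a digit and a unit mark, so no match can be empty.
    static readonly Regex Length = new Regex(
        @"(?<Feet>\d+')(?:\s*(?<Inches>" + Inches + @"))?|(?<Inches>" + Inches + ")");

    // Returns the text of every measurement found in the description.
    public static List<string> Extract(string description)
    {
        var found = new List<string>();
        foreach (Match m in Length.Matches(description))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the input [1/4" x 2" Flat 44W x 20' 3"], Extract returns three items: [1/4"], [2"], and [20' 3"], with the feet and inches kept together in the last one.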

I agree with Mark Noon: regular expressions are fun, so I look forward
to hearing back from you.

Hope this helps,
Greg
 