Regex Question

  • Thread starter: JM

JM

Hi,

I am not sure whether this is the right place to post a regex question. If
not, please tell me where I can post it.

My question:
Given a string like "Boston. MA. Holidays", I need to define a regular
expression (and no other approach) that finds each dot, the word before it
and the word after it. That is, I want to obtain:

first match: "Boston" "." "MA"
second match: "MA" "." "Holidays"

The problem is that if I use something like ([^\s\.]+)(\.)(\s*[^\s\.]+), I
"consume" MA in the first match, so I do not get a second match. That is, I
obtain:

first match: "Boston" "." "MA"
and no more matches, since MA is consumed in the previous match.

Any help??

Thanks,
jaime
 
JM said:
I am not sure whether this is the right place to post a regex question. If
not, please tell me where I can post it.

It's best to stick with one group, or if you can't do that, then
cross-post rather than posting separate messages to multiple groups - I
saw this message in .framework too.
Given a string like "Boston. MA. Holidays", I need to define a regular
expression (and no other approach) that finds each dot, the word before it
and the word after it.

Regular expressions don't work that way - once characters have been
consumed by a match, they aren't available to the next match, which by
default starts where the previous one ended. You can use a regular
expression to get part of the way, and then use C# etc. to get the rest.
So, write a regex that captures what you want:

---8<---
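using System;
using System.Text.RegularExpressions;

// Group 1: word before the dot, group 2: the dot, group 3: word after it.
Regex re = new Regex(@"([^\s.]+)(\.)\s*([^\s.]+)");
string data = "Boston. MA. Holidays";

Match match = re.Match(data);
while (match.Success)
{
    Console.WriteLine("{0}, {1}, {2}", match.Groups[1].Value,
        match.Groups[2].Value, match.Groups[3].Value);
    // Restart the search at the dot, so the word after it can be
    // matched again as the "previous word" of the next match.
    match = re.Match(data, match.Groups[2].Index);
}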
--->8---

Results in:

---8<---
Boston, ., MA
MA, ., Holidays
--->8---

The key is to start each new match at the '.' of the previous one, so the
word after the dot is still available to be matched again (you could
probably start matching at Groups[3].Index instead, I haven't thought
about it too much).

-- Barry
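
For anyone who does want this in a single expression: one possible alternative (not from the posts above) is to capture the word after the dot inside a lookahead. A lookahead checks the text ahead without consuming it, so "MA" stays available as the previous word of the next match. This is a minimal sketch, assuming .NET's Regex keeps captures made inside a positive lookahead; the class name DotNeighbors and the method Find are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DotNeighbors
{
    // Hypothetical helper, not from the thread.
    // Group 1: word before the dot, group 2: the dot.
    // Group 3 sits inside the lookahead (?=...), so the word after the
    // dot is captured but not consumed by the match.
    private static readonly Regex Pattern =
        new Regex(@"([^\s.]+)(\.)(?=\s*([^\s.]+))");

    public static List<string[]> Find(string data)
    {
        var results = new List<string[]>();
        foreach (Match m in Pattern.Matches(data))
        {
            results.Add(new[]
            {
                m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value
            });
        }
        return results;
    }

    public static void Main()
    {
        foreach (string[] r in Find("Boston. MA. Holidays"))
        {
            Console.WriteLine("{0}, {1}, {2}", r[0], r[1], r[2]);
        }
    }
}
```

For "Boston. MA. Holidays" this should print the same two lines as the loop above ("Boston, ., MA" and "MA, ., Holidays"), without restarting the match by hand. The trade-off is that Match.Value only covers the previous word and the dot, so the word after has to be read from Groups[3].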
 