regex: multiple matches after initial text


Chance Hopkins

I'm trying to match a set of matches after some initial text:

mytext: "something" "somethingelse" "another thing"
"maybe another"
(?:mytext: )(?<mymatch>["]{1,1}[^"]+["]{1,1}[\s| ]+)+

I only get the last one "maybe another". I want to get all the values with
quotes as a group, hence the + after the ()'s.

Is this possible?
 
Chance said:
I'm trying to match a set of matches after some initial text:

mytext: "something" "somethingelse" "another thing"
"maybe another"
(?:mytext: )(?<mymatch>["]{1,1}[^"]+["]{1,1}[\s| ]+)+

I only get the last one "maybe another". I want to get all the values with
quotes as a group, hence the + after the ()'s.

Is this possible?
Regex regex = new Regex(
    @"((?<mymatch>(?<="")\w+\s*\w+(?="")|[^\s""]+))",
    RegexOptions.ExplicitCapture);

Test input:
"something" "somethingelse" "another thing" "maybe another"

Test output:
mymatch =»something«=
mymatch =»somethingelse«=
mymatch =»another thing«=
mymatch =»maybe another«=
 
Ken Arway said:
Regex regex = new Regex(
    @"((?<mymatch>(?<="")\w+\s*\w+(?="")|[^\s""]+))",
    RegexOptions.ExplicitCapture);

Test input:
"something" "somethingelse" "another thing" "maybe another"

Test output:
mymatch =»something«=
mymatch =»somethingelse«=
mymatch =»another thing«=
mymatch =»maybe another«=

Thanks for the help.

Is it possible to target this match after an initial match (so that I only
match ""'s after the text "mytext: "), without splitting the string first?

I'm sure I could use SubString with IndexOf "mytext: ", but I'd prefer to do
this all with RegEx if possible.

I'm working on a PPC with a 300 MHz processor and already have to do a large
number of these inside a case statement. There's a thread running a loop, and
I really need to keep this as minimal as possible.

Thanks again.
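
For readers following along, one way the "only after the initial text" requirement could be sketched is with the \G anchor, which in .NET ties each match to the point where the previous match ended. This is not the approach the thread settles on, and it has not been tried on the Compact Framework. The class name, method name and sample input below are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AfterPrefixDemo
{
    // Hypothetical pattern: the first match has to begin at "mytext:";
    // each later match has to begin exactly where the previous one ended (\G).
    // (?!\A) keeps \G from also succeeding at the very start of the input.
    private static readonly Regex QuotedAfterPrefix = new Regex(
        @"(?:mytext:|(?!\A)\G)\s*""(?<mymatch>[^""]*)""");

    public static List<string> Extract(string input)
    {
        List<string> values = new List<string>();
        foreach (Match m in QuotedAfterPrefix.Matches(input))
            values.Add(m.Groups["mymatch"].Value);
        return values;
    }

    public static void Main()
    {
        // Made-up document: quoted text before and after the "mytext:" run.
        string input =
            "intro: \"skip me\"\n" +
            "mytext: \"something\" \"somethingelse\" \"another thing\"\n" +
            "\"maybe another\"\n" +
            "other: \"not this\"";

        // Prints the four values that follow "mytext:", one per line.
        foreach (string value in Extract(input))
            Console.WriteLine(value);
    }
}
```

The chain of matches stops at the first thing after "mytext:" that is not whitespace followed by a quoted string, so quoted values under later labels are left alone.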
 
Hello Chance,
mytext: "something" "somethingelse" "another thing"
"maybe another"

Maybe you want a more generic approach:
Regex regex = new Regex(@"(?<="")(\s*\w+\s*)+(?="")");
foreach (Match match in regex.Matches(inputText.Text))
    outputText.AppendText(match.Value + "\r\n");

That way you also get phrases like "another third token", etc.


ciao Frank
 
Frank Dzaebel said:
Hello Chance,


Maybe you want a more generic approach:
Regex regex = new Regex(@"(?<="")(\s*\w+\s*)+(?="")");
foreach (Match match in regex.Matches(inputText.Text))
    outputText.AppendText(match.Value + "\r\n");

That way you also get phrases like "another third token", etc.


ciao Frank

Hi Frank, thanks for the help. I don't think I was clear in my first post.

Is it possible to target this match after an initial match (so that I only
match ""'s after the text "mytext: "), without splitting the string first?

I have other text in the document, prior to this text. I'm not sure whether
I can put those repeating patterns into a group object. Using my expression,
I only seem to get one match with two groups: group 0 is the whole string and
group 1 is the last "" pair with text.

I'm sure I could use SubString with IndexOf "mytext: ", but I'd prefer to do
this all with RegEx if possible.

*see other reply for "why" if interested.
 
Chance said:
Is it possible to target this match after an initial match (so that I only
match ""'s after the text "mytext: "), without splitting the string first?

I think that's different from what I understood from your first post.
To clarify, please provide a test input string and the output you want.
 
: Is it possible to target this match after an initial match (so that I
: only match ""'s after the text "mytext: "), without splitting the
: string first?
: [...]

Use something similar to the following:

string input =
    "mytext: \"something\" \"somethingelse\" " +
    "\"another thing\" \"maybe another\"";

// The pattern is written with ' in place of " for readability; swapped back below.
string pat = @"^mytext:(\s*'(?<strings>[^\\']*(\\.[^\\']*)*)')+\s*$";
Regex regex = new Regex(pat.Replace('\'', '"'));

Match m = regex.Match(input);
if (m.Success)
    // Captures holds every repetition of the group, not only the last one.
    foreach (Capture c in m.Groups["strings"].Captures)
        Console.WriteLine(c);

The regular expression for catching the individual strings is based
on code from Damian Conway's Text::Balanced module:

http://search.cpan.org/~dconway/Text-Balanced-1.95/lib/Text/Balanced.pm

Hope this helps,
Greg
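
To make the difference concrete: in .NET, a group inside a repeated construct reports only its last repetition through Value, while its Captures collection keeps all of them. The sketch below uses a simplified stand-in pattern (not either of the patterns posted in this thread), and the class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CapturesDemo
{
    // Simplified stand-in pattern: the label, then one or more quoted strings.
    public static Match MatchQuoted(string input)
    {
        return Regex.Match(input, @"mytext:(?:\s*""(?<mymatch>[^""]*)"")+");
    }

    public static void Main()
    {
        string input =
            "mytext: \"something\" \"somethingelse\" \"another thing\"\n" +
            "\"maybe another\"";

        Match m = MatchQuoted(input);
        Group g = m.Groups["mymatch"];

        // Group.Value is the text of the LAST repetition only.
        Console.WriteLine(g.Value);            // maybe another

        // Group.Captures holds one entry per repetition, in order.
        Console.WriteLine(g.Captures.Count);   // 4
        foreach (Capture c in g.Captures)
            Console.WriteLine(c.Value);
    }
}
```

This is why a single match with "group 1 = the last quoted string" is what shows up when only the Groups collection is inspected.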
 
Sorry for the delay in responding, my week is a bit off from everyone else.


You're right, I did a really poor job of explaining this. Let me try again.
I'm trying to pull multiple matches after an initial match, for instance:


onevalue: "asdf" "1234"
twovalue: "9999" "8888"
"7777" "6666"
threevalue: "4444" "3333"
"1111"


with something like this:

(?:threevalue: )(?<mymatch>["]{1,1}[^"]+["]{1,1}[\s| ]+)+


I want to do a non capturing match (?:threevalue: ) and then do a group
capture after that of all values inside the "'s. Right now, I just get the
final one "1111". If I remove my initial match in the expression I match all
quoted values in the entire document and can retrieve them with a group
object.

I'm aware I can fix this by doing a substring and then a regex on that, but
I'm on a PPC and trying to save every bit of processor. I have to do this 16
or so times inside a loop that fires sometimes 1000 times or more. I'd
really like to do it all in Regex to save on all the substring calls, but I
don't even know if it's possible.


Thanks for the feedback.
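
As an illustration of the layout described in this post, here is a minimal sketch that pulls the quoted values for a single label out of the whole document in one Regex.Match call, with no Substring step. It assumes each label starts a line, that values may wrap onto following lines, and that the stray 3333" in the sample was meant to be "3333". The helper name ValuesFor is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LabelledValues
{
    // Hypothetical helper: returns every quoted value that follows "<label>:".
    public static List<string> ValuesFor(string label, string document)
    {
        // ^ with RegexOptions.Multiline anchors the label at the start of a line;
        // \s* lets the run of quoted values continue across line breaks.
        string pattern =
            "^" + Regex.Escape(label) + @":(?:\s*""(?<mymatch>[^""]*)"")+";
        Match m = Regex.Match(document, pattern, RegexOptions.Multiline);

        List<string> values = new List<string>();
        if (m.Success)
            foreach (Capture c in m.Groups["mymatch"].Captures)
                values.Add(c.Value);
        return values;
    }

    public static void Main()
    {
        string document =
            "onevalue: \"asdf\" \"1234\"\n" +
            "twovalue: \"9999\" \"8888\"\n" +
            "\"7777\" \"6666\"\n" +
            "threevalue: \"4444\" \"3333\"\n" +
            "\"1111\"";

        // Prints 4444, 3333 and 1111, one per line.
        foreach (string value in ValuesFor("threevalue", document))
            Console.WriteLine(value);
    }
}
```

The run of captures ends at the next label because a label is not whitespace followed by a quote, so values belonging to other labels are not picked up. In a tight loop, building the Regex once per label and reusing it would avoid rebuilding the pattern string on every call.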
 
: [...]
: I'm aware I can fix this by doing a substring and then a regex on
: that, but I'm on a PPC and trying to save every bit of processor. I
: have to do this 16 or so times inside a loop that fires sometimes 1000
: times or more. I'd really like to do it all in Regex to save on all
: the substring calls, but I don't even know if it's possible.

Did you see my followup[*]?

[*] Message-ID: <[email protected]>

Perhaps the more important question is whether you're engaging in
premature optimization.

Greg
 
Greg Bacon said:

Did you see my followup[*]?

[*] Message-ID: <[email protected]>


Actually I didn't (for some reason it maybe didn't propagate over to the MS
server), and THANK YOU for speaking up. That got it going.

It was the group's Captures collection I had to loop through. Everything is
WONDERFUL now :-)

Thanks to you other guys for the help also.
 
: Actually I didn't (for some reason it maybe didn't propagate over
: to the MS server), and THANK YOU for speaking up. That got
: it going.

Good deal. Glad to help.

Greg
 