A nagging Regex question

Guest

Given some text that looks like:
random Text Key 1.00
More random Text
Item more random text = "Wanted1", "More random Text"
even more random Text
EndItem
Item more random text = "Wanted2", "Quoted Random Text" more Random Text
even more random Text
even more random Text
EndItem
even more random Text
even more random Text
(where the random Text does not contain "Item", "=", or a double-quote character)
I would like to, in one Regex, capture such that
Group(1) = 1.00
Group(2) = Wanted1
Group(3) = Wanted2

I have been reduced to using two Regexes:
Key (\d\.\d+) -- to capture the 1.00, and
Item[^=]+=\s?"(\w+)" -- to capture Wanted1 and Wanted2

All attempts to combine the two Regexes result in (at best):
Group(1) = 1.00
Group(2) = Wanted2

I understand that the greedy match leads me to this result, but it seems to
me that this should be doable in 1 Regex.
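
For reference, the two-pattern approach described above can be sketched as follows. This is only an illustration in Python (an assumption; the thread does not show which language is in use, though these particular patterns should behave the same way in the .NET engine). The variable names and the sample string are hypothetical.

```python
import re

# Sample input modelled on the text in the question.
text = '''random Text Key 1.00
More random Text
Item more random text = "Wanted1", "More random Text"
even more random Text
EndItem
Item more random text = "Wanted2", "Quoted Random Text" more Random Text
even more random Text
even more random Text
EndItem
even more random Text
'''

# First pattern: the single Key value.
key_match = re.search(r'Key (\d\.\d+)', text)
key = key_match.group(1) if key_match else None

# Second pattern: the first quoted word after each "Item ... =".
items = re.findall(r'Item[^=]+=\s?"(\w+)"', text)

print(key)    # 1.00
print(items)  # ['Wanted1', 'Wanted2']
```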
 
Kevin Spencer

I would like to, in one Regex, capture such that
Group(1) = 1.00
Group(2) = Wanted1
Group(3) = Wanted2

Regular Expressions match patterns. A pattern is a set of rules regarding
the text to match. What you have posted is not a set of rules. Therefore, it
is not possible to determine what your Regular Expression should be.

For example, do you really want to capture the literal string "1.00"? If so,
you can use "1.00". But your Regular Expression indicates that the rules for
this match are:

(\d\.\d+) - Exactly 1 digit character, followed by exactly 1 dot, followed by
at least 1 digit character.

So, that Regular Expression would match:
2.1
1.0
3.987654321

But would *not* fully match:
5
100
67.9 (it would capture "7.9")
-9.5 (it would capture "9.5")

Could you please define the rules exactly for both patterns?
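
The behaviour listed above can be checked with a short script. This is a sketch in Python, chosen only for illustration (an assumption; the thread does not name a language). The `samples` list simply repeats the values from the two lists above.

```python
import re

pattern = re.compile(r'(\d\.\d+)')

samples = ['2.1', '1.0', '3.987654321', '5', '100', '67.9', '-9.5']

# Search each sample and record what group 1 captures (None if no match).
results = {}
for s in samples:
    m = pattern.search(s)
    results[s] = m.group(1) if m else None
    print(repr(s), '->', results[s])
```

The first three samples are captured whole, "5" and "100" produce no match, and "67.9" and "-9.5" yield the partial captures "7.9" and "9.5".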

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer

Presuming that God is "only an idea" -
Ideas exist.
Therefore, God exists.

 
Guest

Indeed, the (\d\.\d+) is capturing exactly what I want. If the data is
3.987654321, then that is what I want to capture.

By the same token, I wish to capture the first quoted value following each
literal occurrence of "Item" that is not a part of "EndItem" (though the
pattern I am using doesn't exclude "EndItem", I know).

The meat of the question is that "Item" .....quoted text ..... "EndItem"
will occur 1 or more times, and I wish to capture ALL of the quoted text
occurrences, not just the first or last. The pattern Item[^=]+=\s?"(\w+)",
used alone will do just that. However and for example, the pattern:

(?:Key (\d\.\d+).*){1,1}.?(?:(?:Item[^=]+=\s?"(\w+)"))+?

will only capture "1.00" and "Wanted2" because of the greedy match behavior
of the engine. I am looking for a pattern that will capture ALL instances
of:
first quoted text occurring
After "Item" followed by NOT "=" followed by "="
and prior to "EndItem".
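
The greedy behaviour described above can be reproduced with a small script. This is a sketch in Python rather than .NET (an assumption made for illustration), and it assumes the pattern is run with the dot matching newlines (`re.DOTALL` here, the counterpart of `RegexOptions.Singleline`); without that option the `.*` cannot cross the line breaks in this sample. The sample string is hypothetical, modelled on the original question.

```python
import re

text = '''random Text Key 1.00
More random Text
Item more random text = "Wanted1", "More random Text"
even more random Text
EndItem
Item more random text = "Wanted2", "Quoted Random Text" more Random Text
even more random Text
even more random Text
EndItem
even more random Text
'''

# The combined pattern from the post, unchanged.
combined = re.compile(
    r'(?:Key (\d\.\d+).*){1,1}.?(?:(?:Item[^=]+=\s?"(\w+)"))+?',
    re.DOTALL,  # assumption: dot matches newlines
)

m = combined.search(text)
groups = m.groups() if m else None

# The greedy .* runs to the end of the input and backtracks only as far
# as the LAST place where the Item part can match, so Wanted1 is skipped.
print(groups)  # ('1.00', 'Wanted2')
```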

This is actually a recurring requirement: A need to capture some info at the
beginning of a string followed by recurring groups of identically formatted
information. For example, segments of XML may follow this pattern.

Thanks, Jim Parsells
--
Jim Parsells


 
Kevin Spencer

Hi Jim,

Forgive me, but I am very careful about making sure that I know what I'm
discussing before I say anything about it, so I tend to ask more questions!

Now that you've confirmed what the rules are, I think the solution is fairly
easy:

Key (\d\.\d+)|Item[^=]+=\s?"(\w+)"

What this does is combine the 2 Regular Expressions using the "|" (or)
operator, and group the results as Group 1 (Key value) and Group 2 (Item
value).
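
A minimal sketch of how the alternation might be used, written in Python for illustration (an assumption; in .NET the same idea would mean iterating over the results of Regex.Matches). Each individual match fills only one of the two groups, so the values are collected by looping over every match rather than by reading three groups from a single match. The sample string and variable names are hypothetical.

```python
import re

text = '''random Text Key 1.00
More random Text
Item more random text = "Wanted1", "More random Text"
even more random Text
EndItem
Item more random text = "Wanted2", "Quoted Random Text" more Random Text
even more random Text
even more random Text
EndItem
even more random Text
'''

combined = re.compile(r'Key (\d\.\d+)|Item[^=]+=\s?"(\w+)"')

key = None
items = []
for m in combined.finditer(text):
    if m.group(1) is not None:      # the "Key" alternative matched
        key = m.group(1)
    elif m.group(2) is not None:    # the "Item" alternative matched
        items.append(m.group(2))

print(key)    # 1.00
print(items)  # ['Wanted1', 'Wanted2']
```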


 
Guest

I am not 100% sure of what you are trying to do with your Regex, so I cannot
work out the string and be sure it will be of help to you, but I can give you
a way of setting it up.

Here is what I think you are trying to do (although I have some questions
from the way your post is written): Find a line where there is an equal sign
that has either a) a word value or b) a decimal value.

Take it to the next level and write it as pseudocode:

(Find Equal sign) with ((word) or (decimal value))

If you need something before the equal sign (the area I am fuzzy on looking
at your post), you can set the condition(s) before the find on the equal sign.
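
One possible reading of the pseudocode above, sketched in Python for illustration only. The pattern, the assumption that the word value is quoted, and the sample lines are all hypothetical; they are not taken from the original post.

```python
import re

# "(Find Equal sign) with ((word) or (decimal value))":
# group 1 = quoted word, group 2 = decimal value.
pattern = re.compile(r'=\s*(?:"(\w+)"|(\d+\.\d+))')

lines = [
    'Item some text = "Wanted1", "Other"',
    'version = 1.00',
    'no equal sign here',
]

found = []
for line in lines:
    m = pattern.search(line)
    found.append((m.group(1), m.group(2)) if m else None)

print(found)  # [('Wanted1', None), (None, '1.00'), None]
```

A condition on the text before the equal sign would go in front of the `=` in the pattern.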

Great site for playing:
http://www.regular-expressions.info/tutorial.html

--
Gregory A. Beamer
MVP; MCP: +I, SE, SD, DBA

***************************
Think Outside the Box!
***************************
 
Guest

Thanks Kevin. I must have some mental block about "|". That is exactly what
I was looking for.
--
Jim Parsells


 
Guest

Thanks for the response. However, the question has been answered.
See my elaboration to Kevin and his reply. Problem solved.
Key (\d\.\d+)|Item[^=]+=\s?"(\w+)"

I was looking for a more complex solution when the simple did the job.
--
Jim Parsells


 
Kevin Spencer

I must have some mental block about "|".

Don't be hard on yourself, Jim. Writing efficient Regular Expressions is a
challenging art. I am constantly on the lookout for more efficient solutions
than I have thought of, and constantly finding them!


 
