Regex replace where Search Value not between specific delimiters

Rory Becker

Hi all.

I have managed over the last few years to get by with relatively little regex
knowledge.

However I now have what seems like a simple problem which I simply cannot
find a solution for.

As stated, regular expressions are not my strong point.

What I need is a way to replace a set string within a source but only where
that set string is not surrounded by brackets.

Thus I would like to (for instance) change "AB(A)BABA" into "CB(A)BCBC"

Must I match all the A's and then loop to find those that are not surrounded
or is there a better way?

Thanks in advance
 
Walter Wang [MSFT]

Hi Rory,

You can achieve this with "Negative Lookahead":

Regex RegexObj = new Regex("(?!\\()A(?!\\))");
Debug.Assert(RegexObj.Replace("AB(A)BABA", "C")=="CB(A)BCBC");


<quote>
#Grouping Constructs
http://msdn2.microsoft.com/en-us/library/bs2twtah.aspx

(?! subexpression)

(Zero-width negative lookahead assertion.) Continues match only if the
subexpression does not match at this position on the right. For example,
\b(?!un)\w+\b matches words that do not begin with un.
</quote>
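
A note on the pattern above: the leading (?!\() is also a lookahead, so it inspects the character at the current position, which is the A itself, and therefore always succeeds. Only the trailing (?!\)) does any filtering. The sketch below contrasts the posted pattern with a variant that uses a negative lookbehind, (?<!\(), for the left side. The class and method names are made up for illustration, and "(AB)" is an assumed extra test string, not one from the question. Both versions only look at the characters directly next to the A, so neither protects text deeper inside a bracketed group.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketDemo
{
    // Pattern as posted: both assertions are lookaheads.
    public static string ReplaceLookaheadOnly(string input) =>
        Regex.Replace(input, @"(?!\()A(?!\))", "C");

    // Variant: negative lookbehind on the left, negative lookahead on the right.
    public static string ReplaceWithLookbehind(string input) =>
        Regex.Replace(input, @"(?<!\()A(?!\))", "C");

    static void Main()
    {
        Console.WriteLine(ReplaceLookaheadOnly("AB(A)BABA"));  // CB(A)BCBC
        Console.WriteLine(ReplaceWithLookbehind("AB(A)BABA")); // CB(A)BCBC
        Console.WriteLine(ReplaceLookaheadOnly("(AB)"));       // (CB)
        Console.WriteLine(ReplaceWithLookbehind("(AB)"));      // (AB)
    }
}
```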


Hope this helps.


Regards,
Walter Wang ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
 
Rory Becker

This is the definitive example of why "Community Rocks".

I could have spent hours looking for this :) (but I wouldn't have :p. Sadly
my iterative process would have had to do.)

Not only do I now have the answer I was looking for, but I have the name
of the feature I was describing and a great reference back to the original
docs.

Cheers Walter, your answer is about as perfect as I could have wished for :)

Great job man
 
Rory Becker

Ok so I have now integrated this into my code and it works very well.

However I have found a problem which exists due to the greedy nature of the
modified regex I am using.

Time to elaborate more on the problem....

I am trying to place parentheses around key phrases within a text but only
if they are not already contained within parentheses.

My code...
-------------------------------------------------------------
Dim Pattern As String = String.Format("(?!\(.*){0}(?!.*\))", Regex.Escape(SearchPhrase))
-------------------------------------------------------------

However I am trying to do this iteratively with several phrases.

If I try to do this with "Hello World" and "today" in the following phrase....
 
Rory Becker

Ok so I have now integrated this into my code and it works very well.

However I have found a problem which exists due to the greedy nature of the
modified regex I am using.

Time to elaborate more on the problem....

I am trying to place parentheses around key phrases within a text but only
if they are not already contained within parentheses.

However I am trying to do this iteratively with several phrases.

Thus applying first...
-------------------------------------------------------------
(?!\(.*)Hello World(?!.*\))
-------------------------------------------------------------
.... and then...
-------------------------------------------------------------
(?!\(.*)Hello(?!.*\))
-------------------------------------------------------------

If I try to do this with "Hello World" and "Hello" in the following phrase....
-------------------------------------------------------------
"Hello World. Hello. Hello World"
-------------------------------------------------------------
....I would like to get...
-------------------------------------------------------------
"(Hello World). (Hello). (Hello World)"
-------------------------------------------------------------
....but I wind up with...
-------------------------------------------------------------
"(Hello World). Hello. (Hello World)"
-------------------------------------------------------------

This appears to be because the regex subsystem views the word "Hello" as
already being surrounded by parentheses.
I admit that I have changed the original regex to include references to .*
but without this I would have got...
-------------------------------------------------------------
"((Hello) World). (Hello). ((Hello) World)"
-------------------------------------------------------------
....as the Hellos from "Hello World" were found after already having been
parenthesised.

I think I need a non-greedy .* which, from what I have researched, appears to
be .*? but this doesn't seem to change anything.

Any ideas...
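
A possible explanation, sketched in code: a lookahead only answers a yes/no question, so making the .* inside it lazy changes how the engine searches but not whether a ) can be found somewhere further along the line. After the first pass, every remaining "Hello" has a ) somewhere to its right, so (?!.*\)) and (?!.*?\)) reject exactly the same positions. The class and helper names below are hypothetical; the patterns and the input are the ones from this post.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyVsLazyDemo
{
    // Wraps every match of the pattern in parentheses ($0 is the whole match).
    public static string Wrap(string input, string pattern) =>
        Regex.Replace(input, pattern, "($0)");

    static void Main()
    {
        string text = "Hello World. Hello. Hello World";

        // First pass: the input has no ')' yet, so both occurrences match.
        string pass1 = Wrap(text, @"(?!\(.*)Hello World(?!.*\))");
        Console.WriteLine(pass1); // (Hello World). Hello. (Hello World)

        // Second pass, greedy and lazy: every "Hello" is followed by a ')'
        // somewhere later on the line, so neither pattern matches anything.
        Console.WriteLine(Wrap(pass1, @"(?!\(.*)Hello(?!.*\))"));   // unchanged
        Console.WriteLine(Wrap(pass1, @"(?!\(.*?)Hello(?!.*?\))")); // unchanged
    }
}
```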
 
Rory Becker

Wouldn't you know it....

Found an answer... well one that will do for now.

All phrases are Alphanumeric + underscore so instead of
 
Walter Wang [MSFT]

Hi Rory,

I understand that the new problem is caused by the fact that one phrase is
a substring of another phrase. I did some research and this does indeed seem
difficult to solve with a few clear rules. It will probably require some
additional conditions, as you have found out. Please feel free to let me
know if there's anything I can help with. Thanks.


 
Ben Voigt [C++ MVP]

Rory Becker said:
I think I need a non-greedy .* which, from what I have researched, appears to
be .*?

How about '[^)]*?' which means a sequence of things except a closing
parenthesis... then the match must end at the first subsequent closing
parenthesis, not extend from the first open to the ultimate close.
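
A minimal sketch along these lines, with one adjustment that is my own assumption rather than part of the suggestion: the class also excludes the opening parenthesis, giving (?![^()]*\)). With only ) excluded, the scan can still run across a later ( and reach its closing partner, so the standalone "Hello" in the sample would still be rejected. The sketch assumes parentheses are balanced and never nested, drops the leading lookahead, and uses hypothetical class and method names.

```csharp
using System;
using System.Text.RegularExpressions;

static class NegatedClassDemo
{
    // Wraps each occurrence of phrase in parentheses unless a ')' follows it
    // with no '(' in between, i.e. unless it sits inside an existing group.
    public static string WrapOutsideParens(string input, string phrase)
    {
        string pattern = Regex.Escape(phrase) + @"(?![^()]*\))";
        return Regex.Replace(input, pattern, "($0)");
    }

    static void Main()
    {
        string text = "Hello World. Hello. Hello World";
        text = WrapOutsideParens(text, "Hello World");
        text = WrapOutsideParens(text, "Hello");
        Console.WriteLine(text); // (Hello World). (Hello). (Hello World)
    }
}
```

The phrases still have to be applied longest first; running "Hello" before "Hello World" would leave "(Hello) World", which the second pattern no longer matches.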
 
Walter Wang [MSFT]

I think Rory's requirement is to first replace "Hello World" with
"(Hello World)" and then replace "Hello" with "(Hello)"; however, since
"Hello" is a substring of "Hello World", this will result in "((Hello)
World)". Unless we can do the replace in one pass, I think it's difficult
to overcome.
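
For what it's worth, a single pass looks feasible with an alternation. The following is a rough sketch under some assumptions, not a tested production solution: phrases are tried longest first so that "Hello World" wins over "Hello", existing parenthesised groups are consumed by a separate alternative and put back unchanged, and parentheses are assumed to be balanced and not nested. SinglePassDemo, WrapAll and the group name "phrase" are made-up names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class SinglePassDemo
{
    public static string WrapAll(string input, params string[] phrases)
    {
        // Longer phrases first, so "Hello World" is tried before "Hello".
        string alternation = string.Join("|",
            phrases.OrderByDescending(p => p.Length)
                   .Select(p => Regex.Escape(p)));

        // The first alternative swallows an existing "(...)" group whole;
        // the named group only participates when a bare phrase is found.
        string pattern = @"\([^()]*\)|(?<phrase>" + alternation + ")";

        return Regex.Replace(input, pattern,
            m => m.Groups["phrase"].Success ? "(" + m.Value + ")" : m.Value);
    }

    static void Main()
    {
        Console.WriteLine(WrapAll("Hello World. Hello. Hello World",
                                  "Hello", "Hello World"));
        // (Hello World). (Hello). (Hello World)

        Console.WriteLine(WrapAll("(Hello) World Hello", "Hello"));
        // (Hello) World (Hello)
    }
}
```

Because each character is consumed by at most one match, text that has just been wrapped is never rescanned, which avoids the "((Hello) World)" result.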


 
