Help with Regex.Replace

maheshvd

Hi Group,

I have an HTML document with all sorts of HTML tags, and I need to provide
a search-and-replace feature for the text in such documents. The user can
enter any phrase to search for and any phrase to replace it with. While
searching, I strip all HTML tags from the document and search the plain text.
The user can then select the document(s) in which s/he wants to replace the
text.
While replacing, I have an issue: how do I replace the string with the new
one?
For example, the HTML document may contain:

<li>This is a test document</li> All the <b>articles</b> here are
written for general public. <strong>Tip: <strong>If you do not find
desired articles, please mail <SPAN id="test" style="FONT-WEIGHT:
bold; COLOR: #ff0000">[email protected]</SPAN >

The user may want to find
"All the articles here"
and replace with
"all the documents here".

The resulting document should be:
<li>This is a test document</li> All the documents here are written
for general public. <strong>Tip: <strong>If you do not find desired
articles, please mail <SPAN id="test" style="FONT-WEIGHT: bold; COLOR:
#ff0000">[email protected]</SPAN >

So, while replacing the string, can I somehow ignore the HTML tags and
still perform the replacement? The rest of the HTML tags must be retained
in the document.
Any thoughts will be appreciated.

Regards,
dev
 
Alexey Smirnov

string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

string searchPattern = searchTxt.replace(" ","(.*?)");
string replaceString = replaceTxt;

int i = 0;

while (replaceString.indexOf(" ") > -1) {
i+=1;
replaceString = Regex.Replace(" ", "$" + i.toString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchTxt, replaceString);
 
Alexey Smirnov

A silly typo, sorry:

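// Needs: using System; using System.Text.RegularExpressions;

string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

// Escape each word so characters like '.' or '?' are matched literally,
// then let a lazy (.*?) group stand in for every gap between words.
string searchPattern = string.Join("(.*?)",
    Array.ConvertAll(searchTxt.Split(' '), Regex.Escape));
string replaceString = replaceTxt;

int i = 0;

// Turn each space in the replacement into ${1}, ${2}, ... so the text
// captured in each gap (spaces, tags) is written back.
// "$$" yields a literal '$' in a .NET replacement string.
Regex r = new Regex(" ");
while (replaceString.IndexOf(" ") > -1)
{
    i += 1;
    replaceString = r.Replace(replaceString, "$${" + i.ToString() + "}", 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchPattern, replaceString);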
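For anyone who wants to try the corrected code above end to end, here is a minimal, self-contained sketch in modern C# (top-level statements). The function name ReplaceKeepingGaps is hypothetical, and a few details are my additions rather than part of the post: each search word goes through Regex.Escape, a literal '$' in the replacement is doubled, and group references are written as ${n} so that a replacement word starting with a digit is not read as part of the group number.

```csharp
using System;
using System.Text.RegularExpressions;

// Each space in the search phrase becomes a lazy (.*?) group; each space in
// the replacement becomes a reference to the matching group, so whatever sat
// between the original words (spaces, tags) is written back.
static string ReplaceKeepingGaps(string source, string search, string replace)
{
    // Escape each word so characters such as '.', '?' or '(' match literally.
    string[] words = Array.ConvertAll(search.Split(' '), Regex.Escape);
    string pattern = string.Join("(.*?)", words);

    // "$$" is a literal '$' in a .NET replacement string.
    string[] parts = replace.Replace("$", "$$").Split(' ');
    string replacement = parts[0];
    for (int i = 1; i < parts.Length; i++)
    {
        // ${i} rather than $i, so a following digit is not read as part of the group number.
        replacement += "${" + i + "}" + parts[i];
    }
    return Regex.Replace(source, pattern, replacement);
}

string html = "All the <b>articles</b> here are written for general public.";
Console.WriteLine(ReplaceKeepingGaps(html, "All the articles here", "all the documents here"));
// prints: all the <b>documents</b> here are written for general public.
```

It keeps both limitations discussed later in the thread: a replacement with more words than the search phrase produces ${n} references to groups that do not exist (as far as I know .NET then leaves them in the output as literal text), and (.*?) lets a gap span any text on the same line, not just tags.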
 
maheshvd

Hey Alexey,
Thanks a ton. That's a great solution.
There is a small hitch, though: if the replacement string has more words
than the searched string, the result contains leftover $3, $4 references.
I'm counting the words in both strings, and whatever remains goes into the
last replacement.
I hope this is the right way.
Regards,
Mahesh
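One way the word-counting fix could look, as a sketch under assumptions: the search phrase had gapCount + 1 words, so the pattern has groups ${1} to ${gapCount}, and BuildReplacement is a hypothetical helper name. Replacement words beyond the available groups are joined with plain spaces after the last group reference. If the replacement has fewer words than the search phrase, the unused gaps, and any tags captured in them, are simply dropped, which may or may not be acceptable.

```csharp
using System;
using System.Text;

// Builds a .NET replacement string for a pattern with gapCount capture groups
// (one per gap between search words). Surplus replacement words follow the
// last group reference, separated by plain spaces, instead of producing
// references to groups that don't exist.
static string BuildReplacement(int gapCount, string replace)
{
    // "$$" is a literal '$' in a .NET replacement string.
    string[] words = replace.Replace("$", "$$").Split(' ');
    var sb = new StringBuilder(words[0]);
    for (int i = 1; i < words.Length; i++)
    {
        // Use the captured gap while groups remain, a plain space afterwards.
        sb.Append(i <= gapCount ? "${" + i + "}" : " ");
        sb.Append(words[i]);
    }
    return sb.ToString();
}

Console.WriteLine(BuildReplacement(1, "test sentences here"));
// prints: test${1}sentences here
```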
 
Alexey Smirnov

Yup, it could be a problem. Maybe we have to look for a better
approach.
 
maheshvd

Moreover, (.*?) will not only skip HTML tags, it may skip whole
sentences. For example, if I have something like
"This is a test where we need to replace words. Also test words"
and I search for "test words" and try to replace it with "test
sentences", it will replace in two places, because in the first sentence
"test" and "words" are separated by many other words, which (.*?) skips
as well. Is there any way to say: replace only if what lies between the
words is an HTML tag?
Thanks for all the help. I desperately need a solution to this.
Mahesh
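A quick way to reproduce the problem described above, as a small sketch using Regex.Matches: with the lazy (.*?) gap, "test" and "words" may be separated by any text on the same line, so the first sentence matches too.

```csharp
using System;
using System.Text.RegularExpressions;

// The lazy (.*?) gap lets "test" and "words" match across arbitrary text,
// not just across tags and spaces.
string text = "This is a test where we need to replace words. Also test words";
MatchCollection hits = Regex.Matches(text, "test(.*?)words");

foreach (Match m in hits)
{
    Console.WriteLine($"matched: \"{m.Value}\"");
}
// prints:
// matched: "test where we need to replace words"
// matched: "test words"
```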
 
Alexey Smirnov

Sure, there is a way to do that.

Use this pattern:

test(((<[^>]*>)|\s)*?)words

It skips only HTML tags and whitespace between the words, so "test" and
"words" separated by other text will not match.
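The same idea generalised to an arbitrary phrase, as a minimal sketch (modern C#, top-level statements). ReplaceIgnoringTags is a hypothetical name. Two deliberate departures from the pattern as posted: the inner groups are non-capturing (?:...), so gap k is simply ${k} in the replacement, and surplus replacement words are joined with plain spaces, following Mahesh's word-counting idea. It assumes tags never contain a literal '>' inside attribute values, and like the original pattern it allows an empty gap (so "testwords" would also match); +? instead of *? would require at least one space or tag.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static string ReplaceIgnoringTags(string html, string search, string replace)
{
    // Gap between words: any run of HTML tags and whitespace, captured as one group.
    // The inner group is non-capturing, so gap k is exactly ${k} in the replacement.
    const string gap = @"((?:<[^>]*>|\s)*?)";

    string[] searchWords = Regex.Split(search.Trim(), @"\s+");
    string pattern = string.Join(gap, Array.ConvertAll(searchWords, Regex.Escape));
    int gapCount = searchWords.Length - 1;

    // "$$" is a literal '$' in a .NET replacement string.
    string[] replaceWords = Regex.Split(replace.Replace("$", "$$").Trim(), @"\s+");
    var replacement = new StringBuilder(replaceWords[0]);
    for (int i = 1; i < replaceWords.Length; i++)
    {
        // Reuse the original gap (tags included) while groups remain;
        // surplus replacement words are separated by a plain space.
        replacement.Append(i <= gapCount ? "${" + i + "}" : " ").Append(replaceWords[i]);
    }
    return Regex.Replace(html, pattern, replacement.ToString());
}

string sample = "<li>This is a test document</li> All the <b>articles</b> here are";
Console.WriteLine(ReplaceIgnoringTags(sample, "All the articles here", "all the documents here"));
// prints: <li>This is a test document</li> all the <b>documents</b> here are
```

Note that on the sample from the first post this keeps the bold tags around the new word, since the captured gaps " <b>" and "</b> " are written back unchanged.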
 
maheshvd

Yes, that's exactly what I was looking for. I tested it with a few
strings and it's working fine. I'll test it thoroughly.
Thanks a ton.
 
