Syntax for regular expression to highlight text in HTML string

Guest

I'm relatively new to regular expressions and was looking for some help on a
problem that I need to solve. Basically, given an HTML string, I need to
highlight certain words within the text of the string. I had it working
somewhat, but ran into problems when one of the highlighted words could also
be part of an HTML tag (such as 'Table' or 'Border'). What I need is a regex
that finds the word but ignores any occurrence that falls inside an HTML tag.

Here's a snippet of code that I have used to highlight the text, but I need
it to skip any words that appear inside HTML tags.

public string AnswerXMLSearchHighlight(string Question, string AnswerXML,
    int FAQId, string HighlightColor)
{
    // Strip unneeded characters from strQuestion
    string strQuestion = Regex.Replace(Question, @"[^\w\.@-]", " ");

    string highlightedAnswer = AnswerXML;
    string[] searchWords = strQuestion.Split(null);
    foreach (string word in searchWords)
    {
        // the empty string was getting put into the array
        // Don't replace it.
        // also, let's exclude any word of 1 character
        if (word != "" & word.Trim().Length > 1)
        {
            string pattern = "( " + word + ")";
            highlightedAnswer = Regex.Replace(highlightedAnswer, pattern,
                "<B style='color:black;background-color:" + HighlightColor +
                "'>$1</B>", RegexOptions.IgnoreCase);
        }
    }
    return highlightedAnswer;
}

Thanks for any assistance,
Dan
 
Guest

Dan Schumm said:
[...] What I need is the regex to find the word, but ignore any words that
fall between an HTML tag. [...]

Dan,

Here's a modification of your regex that highlights what you want. Note that
the "if" expression was simplified. On bool operands, & is a logical AND that
always evaluates both sides, while && short-circuits and is the usual choice
in a condition. Either way, checking "word" against the empty string wasn't
needed, since its length would be zero:

public string AnswerXMLSearchHighlight(string Question, string AnswerXML,
    int FAQId, string HighlightColor)
{
    // Strip unneeded characters from strQuestion
    string strQuestion = Regex.Replace(Question, @"[^\w\.@-]", " ");

    string highlightedAnswer = AnswerXML;
    string[] searchWords = strQuestion.Split(null);
    foreach (string word in searchWords)
    {
        // Skip empty entries and one-character words.
        if (word.Trim().Length > 1)
        {
            // Regex.Escape keeps a '.' in the word from matching any character.
            // The lookahead (?=[^>]*<) accepts a match only if a '<' can be
            // reached without crossing a '>', i.e. the word is not inside a tag.
            string pattern = "(" + Regex.Escape(word) + ")(?=[^>]*<)";
            highlightedAnswer = Regex.Replace(highlightedAnswer, pattern,
                "<B style='color:black;background-color:" + HighlightColor +
                "'>$1</B>", RegexOptions.IgnoreCase);
        }
    }

    return highlightedAnswer;
}
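
If you want to see what the lookahead is doing, here is a small stand-alone sketch. The class name, the sample HTML and the HighlightWord helper are made up for illustration and are not part of your code. One caveat: with (?=[^>]*<), text that comes after the last tag has no '<' ahead of it, so it is skipped as well. The negative form (?![^<>]*>) in the second call is one possible way around that; treat it as a suggestion to test against your own data rather than a drop-in fix.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Hypothetical helper: wraps each match of 'word' that satisfies
    // the given lookahead in <B>...</B>.
    public static string HighlightWord(string html, string word, string lookahead)
    {
        string pattern = "(" + Regex.Escape(word) + ")" + lookahead;
        return Regex.Replace(html, pattern, "<B>$1</B>", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html =
            "<table border=1><tr><td>The table has a border</td></tr></table> table";

        // Positive lookahead from the post: a '<' must be reachable
        // without crossing a '>'. The trailing "table" is left alone.
        Console.WriteLine(HighlightWord(html, "table", "(?=[^>]*<)"));
        // <table border=1><tr><td>The <B>table</B> has a border</td></tr></table> table

        // Negative variant (an assumption, not from the original post):
        // reject a match only when a '>' comes before the next '<'.
        Console.WriteLine(HighlightWord(html, "table", "(?![^<>]*>)"));
        // <table border=1><tr><td>The <B>table</B> has a border</td></tr></table> <B>table</B>
    }
}
```

Both patterns assume reasonably well-formed HTML, with no stray '<' or '>' inside attribute values or text.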

Hope this helps.
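
P.S. On the & versus && point: as far as I know, C#'s & applied to two bool operands is still a logical AND, it just always evaluates both sides, whereas && stops as soon as the left side is false. A minimal sketch of the difference (LongEnough and the call counter are made-up names, only there to make the evaluation visible):

```csharp
using System;

public static class AndOperatorDemo
{
    static int calls = 0;

    // Hypothetical helper that records how often it was evaluated.
    static bool LongEnough(string word)
    {
        calls++;
        return word.Trim().Length > 1;
    }

    public static void Main()
    {
        string word = "";

        calls = 0;
        bool a = word != "" & LongEnough(word);   // & evaluates both operands
        Console.WriteLine(a + " " + calls);       // False 1

        calls = 0;
        bool b = word != "" && LongEnough(word);  // && stops at the first false
        Console.WriteLine(b + " " + calls);       // False 0
    }
}
```

In your original "if" both forms give the same result, so the change is about convention and avoiding needless work, not about fixing wrong output.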
 
Guest

Works great! Thanks

Chris R. Timmons said:
Here's a modification of your regex that highlights what you want. [...]
 
