Regex to strip evil HTML tags

Daniel M. Hendricks

I'm looking for a function/regex in C# to strip unwanted HTML tags from
comments posted to my web site. The site was previously written in PHP,
where I used this function to strip unwanted tags:

function removeEvilTags($source)
{
    $allowedTags = '<b><i><blockquote><ul><ol><li><br><a>';
    $source = strip_tags($source, $allowedTags);
    return preg_replace('/<(.*?)>/ie',
        "'<'.removeEvilAttributes('\\1').'>'", $source);
}

I'd like to do the same in C# - strip all tags from the submission
except the following: <b>, <i>, <blockquote>, <ul>, <ol>, <li>, <br>,
<a>.

Can someone give an example of how to do this?

Thanks,
Daniel
http://www.danhendricks.com
 
Hi Daniel,

Just a reminder that if you're looking to prevent dangerous markup, ASP.NET
offers the ValidateRequest attribute in the @ Page directive:

"Indicates whether request validation should occur. If true, request
validation checks all input data against a hard-coded list of potentially
dangerous values. If a match occurs, an HttpRequestValidationException Class
is thrown. The default is true.
This feature is enabled in the machine configuration file (Machine.config).
You can disable it in your application configuration file (Web.config) or on
the page by setting this attribute to false."


http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconPage.asp

Ken
Microsoft MVP [ASP.NET]
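
For reference, the two places the quoted documentation mentions would look roughly like the fragment below. This is only a sketch of the usual Web Forms syntax, and the Language attribute is just an assumption about the page. Keep in mind that once validation is off, every posted value has to be encoded or filtered by your own code before it is written back to a page.

```xml
<!-- Page level: the directive on the first line of the .aspx file -->
<%@ Page Language="C#" ValidateRequest="false" %>

<!-- Application level: Web.config -->
<configuration>
  <system.web>
    <pages validateRequest="false" />
  </system.web>
</configuration>
```

On ASP.NET 4.0 and later, the page-level setting is only honoured if Web.config also contains `<httpRuntime requestValidationMode="2.0" />` under `system.web`.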
 
You might look at Server.HtmlEncode.
Mike
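
One way to build on the HtmlEncode suggestion is to encode the whole comment first and then turn the encoded form of the allowed tags back into real markup. The sketch below is not a port of the PHP function: disallowed tags end up displayed as literal text rather than stripped. The names `CommentSanitizer` and `Sanitize` are made up for illustration, and it uses `System.Net.WebUtility.HtmlEncode` (available from .NET 4.0) so that it runs outside a web request; inside a page, `Server.HtmlEncode` could take its place. It also assumes that only plain `<a href="http(s)://...">` links should survive, so any anchor carrying other attributes stays encoded.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class CommentSanitizer
{
    // Encoded form of <a href="http://..."> with no other attributes.
    // The URL may contain "&amp;" but no other entity, so quotes and
    // angle brackets can never be smuggled back into the attribute.
    private static readonly Regex EncodedAnchor = new Regex(
        @"&lt;a\s+href=&quot;(https?://(?:[^\s&]|&amp;)+)&quot;\s*&gt;",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Encoded form of the attribute-free tags from the whitelist in the
    // original post, opening or closing (also accepts <br />).
    private static readonly Regex EncodedSimpleTag = new Regex(
        @"&lt;(/?(?:b|i|blockquote|ul|ol|li|br|a))\s*/?&gt;",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static string Sanitize(string comment)
    {
        if (string.IsNullOrEmpty(comment))
        {
            return string.Empty;
        }

        // Step 1: neutralise all markup.
        string encoded = WebUtility.HtmlEncode(comment);

        // Step 2: restore plain http/https links.
        encoded = EncodedAnchor.Replace(encoded, "<a href=\"$1\">");

        // Step 3: restore the whitelisted tags without attributes.
        return EncodedSimpleTag.Replace(encoded, "<$1>");
    }
}

public static class Program
{
    public static void Main()
    {
        string input = "<b>Hello</b> <script>alert(1)</script> "
                     + "<a href=\"http://www.danhendricks.com\">my site</a>";
        Console.WriteLine(CommentSanitizer.Sanitize(input));
    }
}
```

Because everything is encoded before anything is restored, a tag the patterns do not recognise simply stays as harmless text. The trade-off is that the output can still contain unbalanced tags (an opening `<b>` with no closing one, for example), so this is a starting point rather than a complete sanitizer.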
 