Regex to remove some HTML tags


Spondishy

Hi,

Does anyone have a good regex to remove some HTML tags that would be
efficient in .NET? Basically I want to keep anchors, bolds and a few
others, so an expression that says "remove all tags except these few"
would be best.

Thanks.
 
Just whipped this together, so it may need some work.

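using System.Text.RegularExpressions;

// Matches every tag except <a>, <b>, <i> and <u> (with or without
// attributes) and their closing tags; <br>, <body> etc. are removed.
string expres = @"<(?![!/]?[ABIU][>\s])[^>]*>";

// inputStr holds the HTML; IgnoreCase lets [ABIU] match lowercase tags.
string output = Regex.Replace(inputStr, expres, "", RegexOptions.IgnoreCase | RegexOptions.Multiline);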
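If you want to change which tags survive without hand-editing the character class, the same negative-lookahead idea can be driven from a list of tag names. This is a sketch under assumptions: `HtmlTagFilter` and `BuildStripper` are hypothetical names, and `\b` after the tag name is used instead of `[>\s]` so that multi-letter names such as `strong` and forms like `<b/>` are treated as the named tag. Building the `Regex` once and reusing it avoids re-parsing the pattern on every call, which helps if efficiency matters.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class HtmlTagFilter
{
    // Builds a Regex that matches every tag whose name is NOT in allowedTags.
    // Hypothetical helper: a sketch, not a full HTML sanitizer.
    public static Regex BuildStripper(params string[] allowedTags)
    {
        if (allowedTags == null || allowedTags.Length == 0)
        {
            // Nothing is allowed, so match every tag.
            return new Regex(@"<[^>]*>");
        }

        // Escape each name so it is matched literally, then join as alternatives.
        string names = string.Join("|", allowedTags.Select(t => Regex.Escape(t)));

        // <                   the opening bracket of any tag
        // (?!/?(?:names)\b)   ...unless it starts an allowed opening or closing tag
        // [^>]*>              the rest of the tag up to the first '>'
        string pattern = @"<(?!/?(?:" + names + @")\b)[^>]*>";

        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

For example, `HtmlTagFilter.BuildStripper("a", "b", "i", "u").Replace(inputStr, "")` keeps roughly the same tags as the pattern above; store the returned `Regex` in a `static readonly` field if it is used repeatedly. Like any regex approach, it does not understand comments, CDATA sections or a `>` inside an attribute value.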
 
It was tested using this page:

<html>
<body>
<a name="top">
<b>My Website</b><br><br>
Here is the text for my website.
<table border="0" cellpadding="0">
<tr>
<td>Cell 1</td>
</tr>
<tr>
<td>Cell 2</td>
</tr>
</body>
</html>
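To see what the pattern leaves behind, here is a minimal sketch that runs it over the test page. The class name `StripDemo` is hypothetical, and the page is built with explicit `\n` separators (an assumption; text read from a Windows file would contain `\r\n` instead). The `Regex` is created once as a `static readonly` field so the pattern is parsed a single time.

```csharp
using System.Text.RegularExpressions;

public static class StripDemo
{
    // The pattern from the reply: drop every tag except a, b, i and u.
    private static readonly Regex Stripper =
        new Regex(@"<(?![!/]?[ABIU][>\s])[^>]*>", RegexOptions.IgnoreCase);

    public static string Strip(string html) => Stripper.Replace(html, "");

    // The test page from the post, joined with "\n" line breaks.
    public static readonly string TestPage = string.Join("\n", new[]
    {
        "<html>",
        "<body>",
        "<a name=\"top\">",
        "<b>My Website</b><br><br>",
        "Here is the text for my website.",
        "<table border=\"0\" cellpadding=\"0\">",
        "<tr>",
        "<td>Cell 1</td>",
        "</tr>",
        "<tr>",
        "<td>Cell 2</td>",
        "</tr>",
        "</body>",
        "</html>",
    });
}
```

Calling `StripDemo.Strip(StripDemo.TestPage)` keeps `<a name="top">`, `<b>` and `</b>`, and removes `<br>`, `<table>`, `<tr>`, `<td>` and the page wrapper tags. The lines that held only removed tags are still there as empty lines.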

You will still have to go through and replace the \r\n line breaks that are left behind.
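One way to do that second pass is another regex replace that collapses each run of line breaks (and the spaces or tabs around them) into a single separator. This is a sketch under assumptions: `LineBreakCleanup` and `CollapseLineBreaks` are hypothetical names, and whether the separator should be a space, `Environment.NewLine` or `<br>` depends on how the result will be displayed.

```csharp
using System.Text.RegularExpressions;

public static class LineBreakCleanup
{
    // A line break (CRLF, CR or LF) with any spaces/tabs before it and any
    // whitespace after it, so consecutive blank lines are swallowed too.
    private static readonly Regex BreakRun =
        new Regex(@"[ \t]*(?:\r\n|\r|\n)\s*");

    // Replaces every run of line breaks with `separator`, then trims
    // whitespace from both ends of the result.
    public static string CollapseLineBreaks(string text, string separator) =>
        BreakRun.Replace(text, separator).Trim();
}
```

For example, `LineBreakCleanup.CollapseLineBreaks(output, " ")` turns the stripped page into a single line of text, while passing `"\n"` keeps one line per block of content.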
 