About a class I wrote to filter bad html input


Owen Wong

Please take a look at a class I have just written. It is meant to filter
suspicious HTML input coming from an online HTML editor. I need help with
two things:
1. Does it need to filter more? I assume it does, but I don't know what
else to cover.
2. As you can see, I try to filter every link. If the target address does
not start with "http://" or "mailto:", the href is replaced with an empty
string. I think this part could be rewritten to perform better, but how?
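Imports System.Text.RegularExpressions

Public Class strOp
    Public Function filterHtml(ByVal s As String) As String
        ' Remove script, iframe and server-side include tags.
        s = Regex.Replace(s, "<script>|</script>|<iframe.*?>|<!--#include.*?>", "", _
            RegexOptions.IgnoreCase)
        ' Remove any tag that carries one of these event-handler attributes.
        s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", _
            RegexOptions.IgnoreCase)
        ' Capture the href value of every opening <a> tag in group 1.
        Dim re As New Regex("<a .*?href\s*=\s*[""]?([^"" ]*)[""]?.*?>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match
        Dim target As String
        For Each m In re.Matches(s)
            target = m.Groups(1).Value.ToLower()
            ' Blank out every link that is not http:// or mailto:.
            If Not (target.StartsWith("mailto:") OrElse target.StartsWith("http://")) Then
                ' Replace the tag exactly as it appears in the input.
                s = s.Replace(m.Value, "<a href=''>")
            End If
        Next
        Return s
    End Function
End Class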
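
On the performance question, one possible approach is to let a single Regex.Replace call with a MatchEvaluator decide about each link as it is found, instead of collecting all matches first and then running String.Replace over the whole input once per match. What follows is only a sketch under assumptions: the module name LinkFilterSketch, the names FilterLinks and CheckAnchor, and the slightly tightened pattern (it also accepts single quotes and stops at ">") are made up for illustration and are not part of the class above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names here are illustrative only.
Public Module LinkFilterSketch
    ' Built once and reused. Group 1 captures the href value.
    Private ReadOnly AnchorPattern As New Regex( _
        "<a\s[^>]*?href\s*=\s*[""']?([^""'\s>]*)[""']?[^>]*>", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' One pass over the input: each opening <a> tag is kept or blanked.
    Public Function FilterLinks(ByVal html As String) As String
        Return AnchorPattern.Replace(html, AddressOf CheckAnchor)
    End Function

    ' Called once per matched tag; returns the text that replaces it.
    Private Function CheckAnchor(ByVal m As Match) As String
        Dim target As String = m.Groups(1).Value
        If target.StartsWith("http://", StringComparison.OrdinalIgnoreCase) OrElse _
           target.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) Then
            Return m.Value   ' allowed scheme: keep the tag unchanged
        End If
        Return "<a href=''>" ' anything else: blank the link
    End Function
End Module
```

With this sketch, FilterLinks("<a href=""javascript:alert(1)"">x</a>") should give <a href=''>x</a>, while http:// and mailto: links pass through unchanged. Like the rule in the class above, it also blanks https:// and relative links, which may or may not be what is wanted.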

Owen Wong

A small update:
I changed the second Regex.Replace call from
s = Regex.Replace(s, "<.*? (?:onload|onclick|ondblclick)[ ]?=[ ]?.*?>", "", RegexOptions.IgnoreCase)
to
s = Regex.Replace(s, "<.*?\s*(?:on)[a-z]*\s*=\s*.*?>", "", RegexOptions.IgnoreCase)
so that it matches all DHTML events.
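
A pattern that starts with "<.*?" removes the whole tag, and it can also start at an earlier "<" and run across several tags before it reaches an "on...=" sequence. One alternative, sketched here under assumptions, is to visit each tag separately and strip only the event-handler attributes inside it. The names EventAttributeSketch, StripEventAttributes and CleanTag are hypothetical, and the sketch assumes that attribute values do not contain ">".

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper module; the names here are illustrative only.
Public Module EventAttributeSketch
    ' Matches one tag at a time (assumes no ">" inside attribute values).
    Private ReadOnly TagPattern As New Regex("<[^>]+>", RegexOptions.Compiled)

    ' Matches an on... attribute with a double-quoted, single-quoted
    ' or unquoted value, including the whitespace in front of it.
    Private ReadOnly EventAttr As New Regex( _
        "\s+on[a-z]+\s*=\s*(?:""[^""]*""|'[^']*'|[^\s>]+)", _
        RegexOptions.IgnoreCase Or RegexOptions.Compiled)

    ' Text outside tags is never touched; only tags are rewritten.
    Public Function StripEventAttributes(ByVal html As String) As String
        Return TagPattern.Replace(html, AddressOf CleanTag)
    End Function

    ' Called once per tag; drops every event-handler attribute in it.
    Private Function CleanTag(ByVal m As Match) As String
        Return EventAttr.Replace(m.Value, "")
    End Function
End Module
```

For example, StripEventAttributes("<img src=""a.png"" onload=""x()"">") should give <img src="a.png">, and plain text such as "one = two" outside a tag is left alone.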
