Stripping HTML

  • Thread starter: David Sawyer

David Sawyer

I am trying to read in an HTML file and strip out the HTML
code so that all I have left is the text of the body.

Does anyone have any suggestions for doing this?
Any HTML stripping routines or objects that perform the
function?
 
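Public Function StripHTML(ByVal HTML As String) As String
    Dim strContent As String, mString As String
    Dim mStartPos As Integer, mEndPos As Integer

    ' Turn paragraph ends into line breaks before the tags are removed.
    strContent = HTML.Replace("</P>", vbCrLf)
    strContent = strContent.Replace("</p>", vbCrLf)

    ' Remove each tag, from "<" to the next ">" that follows it.
    mStartPos = InStr(strContent, "<")
    Do While mStartPos <> 0
        ' Search from the "<" so a stray ">" earlier in the text is ignored.
        mEndPos = InStr(mStartPos, strContent, ">")
        If mEndPos = 0 Then Exit Do
        mString = Mid(strContent, mStartPos, mEndPos - mStartPos + 1)
        strContent = Replace(strContent, mString, "")
        mStartPos = InStr(strContent, "<")
    Loop

    ' Decode common entities. "&amp;" goes last so that text such as
    ' "&amp;lt;" becomes "&lt;" rather than "<".
    strContent = Replace(strContent, "&nbsp;", " ")
    strContent = Replace(strContent, "&quot;", """")
    ' Numeric entities are not decoded; this only drops the leading "&".
    strContent = Replace(strContent, "&#", "#")
    strContent = Replace(strContent, "&lt;", "<")
    strContent = Replace(strContent, "&gt;", ">")
    strContent = Replace(strContent, "&amp;", "&")
    ' "%20" is URL encoding, not an HTML entity; it is replaced here as well.
    strContent = Replace(strContent, "%20", " ")

    strContent = Trim(strContent)
    ' Trim only removes spaces, so drop leading line breaks separately.
    Do While Left(strContent, 1) = Chr(13) Or Left(strContent, 1) = Chr(10)
        strContent = Mid(strContent, 2)
    Loop
    ' Line breaks come back as "<br>"; return strContent itself for plain text.
    Return strContent.Replace(vbCrLf, "<br>")
End Function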
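
For comparison, here is a shorter sketch of the same idea using regular expressions. It is not from the routine above: the module name HtmlTextSketch and the function name StripHtmlRegex are made up for illustration, and it assumes a .NET version that includes System.Net.WebUtility (4.0 or later). Like the loop above, it treats anything between "<" and ">" as a tag, so it will not cope with script blocks, comments, or a ">" inside an attribute value. A real HTML parser is the safer choice for messy pages.

```vb
Imports Microsoft.VisualBasic
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlTextSketch
    ' Hypothetical helper: returns plain text with CR/LF line breaks.
    Public Function StripHtmlRegex(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return String.Empty

        ' Turn paragraph ends and <br> tags into line breaks first.
        Dim text As String = Regex.Replace(html, "</p\s*>|<br\s*/?>", vbCrLf, RegexOptions.IgnoreCase)

        ' Remove everything that looks like a tag.
        text = Regex.Replace(text, "<[^>]*>", String.Empty)

        ' Decode named and numeric entities in one pass.
        text = WebUtility.HtmlDecode(text)

        ' String.Trim removes leading and trailing whitespace, including line breaks.
        Return text.Trim()
    End Function

    Sub Main()
        Console.WriteLine(StripHtmlRegex("<p>Tom &amp; Jerry</p><p>1 &lt; 2</p>"))
    End Sub
End Module
```

Unlike StripHTML, this version returns plain line breaks instead of "<br>", which is closer to what the original question asks for.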
 
