parsing HTML programmatically

  • Thread starter: unklevo

unklevo

Is there an easy way to convert HTML that comes from a database as a
string into plain text and display it on a WinForm?

Thanks.
 
A year ago I wrote a spider that crawls around http://www.wtng.info and
extracts all the country codes, country names, and telephone numbering
information. I haven't looked at it since then, but I think the utility
function at the end of this post is the one I made to strip all HTML tags
from a string, leaving only the text.

Use it like this:

' ...get your data first
Dim htmlText As String = "<p>Hello <b>world</b></p>" ' sample value; use wherever you got your HTML text

' This will keep extra empty lines
Dim textWithLines As String = stripHTML(htmlText, True)

' This will remove [most] extra empty lines
Dim textCompact As String = stripHTML(htmlText, False)

For this function to compile, you must import
System.Text.RegularExpressions (put Imports System.Text.RegularExpressions
at the top of the file).

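Private Function stripHTML( _
        ByVal html As String, _
        ByVal keepCRLF As Boolean _
    ) As String

    ' Removes anything that looks like an HTML tag (<...>) and returns
    ' the remaining text with leading/trailing whitespace trimmed.

    ' rxDrop also swallows any line breaks that directly follow a tag.
    Dim rxDrop As New Regex("<[^>]+>(\r\n)*")
    ' rxKeep removes the tags only and leaves all line breaks in place.
    Dim rxKeep As New Regex("<[^>]+>")

    If keepCRLF Then
        Return rxKeep.Replace(html, "").Trim()
    Else
        Return rxDrop.Replace(html, "").Trim()
    End If

End Function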
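
If you want to try both modes side by side, here is a minimal console sketch of the same approach. The module name StripHtmlDemo and the sample HTML string are made up for illustration; they stand in for your own code and for whatever comes out of your database. It only removes tags, so anything else in the markup (entities such as &amp;, script contents, and so on) is left as-is.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripHtmlDemo

    ' Same idea as the function above: delete anything that looks like <...>.
    ' When keepCRLF is False, line breaks directly after a tag are deleted too.
    Function StripHtml(ByVal html As String, ByVal keepCRLF As Boolean) As String
        Dim pattern As String
        If keepCRLF Then
            pattern = "<[^>]+>"
        Else
            pattern = "<[^>]+>(\r\n)*"
        End If
        Return Regex.Replace(html, pattern, "").Trim()
    End Function

    Sub Main()
        ' Hypothetical sample input standing in for the database value.
        Dim html As String = "<p>First line</p>" & vbCrLf & vbCrLf & _
                             "<p>Second <b>line</b></p>"

        ' Tags removed, line breaks kept:
        ' prints "First line", an empty line, then "Second line"
        Console.WriteLine(StripHtml(html, True))

        Console.WriteLine("---")

        ' Tags and the line breaks that follow them removed:
        ' prints "First lineSecond line"
        Console.WriteLine(StripHtml(html, False))
    End Sub

End Module
```

Note that with keepCRLF set to False, text from neighbouring paragraphs can end up glued together, because the line breaks that separated them are removed along with the tags. If you are putting the result into a multi-line TextBox, passing True is probably the better starting point.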
 
