parsing HTML programmatically

unklevo · Aug 15, 2005

Is there an easy way to convert HTML that comes from a database as a
string into plain text and display it on a WinForm?

Thanks.

Mike Labosh · Aug 15, 2005

A year ago I wrote a spider that crawls around http://www.wtng.info and
extracts all the country codes, country names, and telephone numbering
information. I haven't looked at it since then, but I think this is the
utility function I made that strips all HTML tags from a string, leaving only
the text (the function itself is at the end of this post).

Use it like this:

' ...get your data first
Dim htmlText As String = "" ' replace "" with wherever you got your HTML text

' This will keep extra empty lines
Dim textWithLines As String = stripHTML(htmlText, True)

' This will remove [most] extra empty lines
Dim textOnly As String = stripHTML(htmlText, False)

For this function to compile, you must put
Imports System.Text.RegularExpressions
at the top of the file.

Private Function stripHTML( _
    ByVal html As String, _
    ByVal keepCRLF As Boolean _
) As String

    ' Removes every <...> tag, leaving only the text between the tags.
    ' rxDrop also swallows any CR/LF pairs that directly follow a tag;
    ' rxKeep leaves those line breaks in place.

    Dim rxDrop As New Regex("(\<[^\>]+)\>(\r\n)*")
    Dim rxKeep As New Regex("(\<[^\>]+)\>")

    If keepCRLF Then
        Return rxKeep.Replace(html, "").Trim()
    Else
        Return rxDrop.Replace(html, "").Trim()
    End If

End Function
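
Note that stripping tags leaves HTML entities such as &amp; or &lt; in the result, so the text may still look odd on a form. Below is a minimal sketch of one way to handle that, not part of the original utility. The name HtmlToPlainText is made up for illustration, and it assumes .NET Framework 4.0 or later, where System.Net.WebUtility.HtmlDecode is available (on older frameworks, System.Web.HttpUtility.HtmlDecode does the same job but needs a reference to System.Web).

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module HtmlTextSketch

    ' Hypothetical helper (not from the original post): strips tags with the
    ' same kind of pattern stripHTML uses, then decodes HTML entities.
    Function HtmlToPlainText(ByVal html As String) As String
        If String.IsNullOrEmpty(html) Then Return String.Empty

        ' Remove anything that looks like a tag.
        Dim withoutTags As String = Regex.Replace(html, "<[^>]+>", "")

        ' Turn entities such as &amp; and &lt; back into real characters.
        Return WebUtility.HtmlDecode(withoutTags).Trim()
    End Function

    Sub Main()
        Dim sample As String = "<p>Fish &amp; chips</p>"
        Console.WriteLine(HtmlToPlainText(sample)) ' prints: Fish & chips
    End Sub

End Module
```

The tags are removed before decoding on purpose: decoding first would turn an escaped &lt;b&gt; that was meant as literal text into a real tag, which the regex would then strip. Like stripHTML, this is only a regex pass, so it will not cope with things like script blocks or a ">" inside an attribute value.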
