How to remove HTML special characters and tags from RSS feed data.

squishyalt

Here is an example of RSS feed data from Google....

"<table border="0" cellpadding="2" cellspacing="7"
style="vertical-align:top;"><tr><td width="80" align="center"
valign="top"><font style="font-size:85%;font-family:arial,sans-serif"><a
href="http://news.google.com/news/url?fd=...EAeC_g&usg=AFQjCNHOZ9w9BRN3suaUKX5NtYyvXwu1Hg"><img
src="http://nt0.ggpht.com/news/tbn/JId-xOOMV8PujM/6.jpg" alt="" border="1"
width="80" height="80" /><br /><font size="-2">AFP</font></a></font></td><td
valign="top" class="j"><font
style="font-size:85%;font-family:arial,sans-serif"><br /><div
style="padding-top:0.8em;"><img alt="" height="1" width="1" /></div><div
class="lh"><a
href="http://news.google.com/news/url?fd=...&pos=6&usg=AFQjCNFpZEDNQOe3IGg-ywaHRs4QbhY0-g"><b>Mitsubishi
UFJ May Announce Share Sale This Week</b></a><br /><font size="-1"><b><font
color="#6f6f6f">Bloomberg</font></b></font><br /><font size="-1">Nov. 16
(Bloomberg) -- Mitsubishi UFJ Financial Group Inc. may announce Japan's
biggest secondary share sale this week as it prepares for stricter global
capital rules, according to a survey of analysts. The nation's largest bank
by <b>...</b></font><br /><font size="-1"><a
href="http://news.google.com/news/url?fd=...091116&usg=AFQjCNEmza8jguldmVwYZHQbCgrNWeEr9g">MUFG
shares fall nearly 5 pct on share issue plan</a><font size="-1"
color="#6f6f6f"><nobr>Reuters</nobr></font></font><br /><font size="-1"><a
href="http://news.google.com/news/url?fd=...ws_wsj&usg=AFQjCNHnBOh1rt7oohVtBHRwwRmTYEr4Qg">Mitsubishi
UFJ Weighs Raising $11 Billion</a><font size="-1" color="#6f6f6f"><nobr>Wall
Street Journal</nobr></font></font><br /><font size="-1"><a
href="http://news.google.com/news/url?fd=...OGLEFI&usg=AFQjCNGhb0g_PzOJYIRcpSDeA6Xv7hBu8w">Mitsubishi
UFJ Ponders Stock Sale</a><font size="-1"
color="#6f6f6f"><nobr>TheStreet.com</nobr></font></font><br /><font size="-1"
class="p"><a
href="http://news.google.com/news/url?fd=...-11-14&usg=AFQjCNHCJpoOUHe_gaVZIV1gk0j42MkDPg"><nobr>MarketWatch</nobr></a> -<a
href="http://news.google.com/news/url?fd=...shares&usg=AFQjCNESYgvq1nwyiLyos9-_bdhvmHn18w"><nobr>TopNews
United States</nobr></a> -<a
href="http://news.google.com/news/url?fd=...DATE-2&usg=AFQjCNFXGg95rPlOBIA2bKc0jd2Q3yFk2g"><nobr>Forexyard</nobr></a></font><br
/><font class="p" size="-1"><a class="p"
href="http://news.google.com/news/more?ned=us&topic=b&ncl=dPy1T0LK7URGP1MyRL2FQL4se2MkM"><nobr><b>all
69 news articles »</b></nobr></a></font></div></font></td></tr></table>"

I want to make this human readable. To do so, I need to (1) replace the
HTML special characters (like "&nbsp;") with their text equivalents and
(2) remove all HTML tags (like <b> or </b>).
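
For illustration, the two steps above can be sketched as follows. This is a minimal Python sketch of the idea, not VB.NET code: the function name `to_plain_text` and the sample string are hypothetical, and the tag pattern is a naive regular expression that assumes no ">" appears inside attribute values. It strips tags before decoding entities, so that a decoded "<" or ">" in the text is not mistaken for a tag. In .NET the same approach would presumably map onto `System.Web.HttpUtility.HtmlDecode` and `System.Text.RegularExpressions.Regex.Replace`.

```python
import html
import re

# Naive patterns: assumes tags never contain ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")   # anything shaped like <...>
WS_RE = re.compile(r"\s+")        # runs of whitespace, including non-breaking spaces


def to_plain_text(markup: str) -> str:
    """Remove tags, decode character entities, then collapse whitespace."""
    # Replace each tag with a space so adjacent words do not run together.
    without_tags = TAG_RE.sub(" ", markup)
    # Decode entities such as &nbsp;, &quot; and &amp; into plain characters.
    decoded = html.unescape(without_tags)
    return WS_RE.sub(" ", decoded).strip()


# Hypothetical snippet, loosely modeled on the feed data above.
sample = (
    '<font size="-1"><b>Mitsubishi UFJ</b>&nbsp;May Announce '
    "&quot;Share Sale&quot;</font>"
)
print(to_plain_text(sample))
```

A regular expression is enough for quick cleanup of feed summaries like this one, but a real HTML parser would likely be more robust against malformed markup.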

Judging by the list of possible HTML character entities at
http://www.degraeve.com/reference/specialcharacters.php, this is no simple
task to code. (I could do it, but I am pressed for time and am looking for
anything that would save me coding time on this project.)

Is there any way to do this already buried deep in the namespaces of VB.net
2008 that I have missed?

How about a class or tool that makes this possible?

I can't believe that no such thing exists...I must just not be able to find
it on my own.

Any help would be greatly appreciated.
 
