How to read the rendered text of an HTML file

  • Thread starter: Robin

Robin

Hello
I need to read the rendered text of an HTML file. Does anybody know how to do
that?
Thanks
 
Robin,
Your post is unclear. When you say you need to "read" the rendered text, do
you mean you need to capture the contents of an HTML page without the HTML
Tags?

If so, you can make a WebRequest for the page and strip out just the "text"
with a Regular Expression. Search for "HTML to TEXT" and you should find
some good resources.
Peter
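
A minimal sketch of the fetch-and-strip approach Peter describes. The thread is about .NET (WebRequest plus a regular expression), but the idea is the same in any language, so this uses Python's standard library for illustration. The function names strip_tags and fetch_text are made up for this example, and the patterns are deliberately simple: they handle ordinary markup but are not a full HTML parser.

```python
import html
import re
import urllib.request


def strip_tags(markup: str) -> str:
    """Return the text of an HTML string with tags removed (regex-based sketch)."""
    # Remove <script> and <style> elements together with their contents,
    # since that content is never shown as page text.
    markup = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", markup)
    # Remove HTML comments.
    markup = re.sub(r"(?s)<!--.*?-->", " ", markup)
    # Replace every remaining tag with a space so adjacent words stay apart.
    markup = re.sub(r"(?s)<[^>]+>", " ", markup)
    # Decode entities such as &amp; and &lt; into the characters they stand for.
    text = html.unescape(markup)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def fetch_text(url: str) -> str:
    """Download a page and strip its tags (hypothetical helper, needs network)."""
    with urllib.request.urlopen(url) as resp:
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return strip_tags(resp.read().decode(charset, errors="replace"))
```

Calling strip_tags on a string such as "<p>Hello &amp; <b>welcome</b></p>" should yield "Hello & welcome". Markup that a browser tolerates but these patterns do not expect (for example a ">" inside an attribute value) can still trip this up.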
 
An HTML file *is* text. Are you referring to the text that appears in the
browser? To get that, you would have to parse the HTML, which would be quite
difficult. There are a number of .NET HTML parsing libraries and articles on
how to do this out there, but none that I would trust. HTML is extremely
complex: the rules themselves are complicated, and there are quite a few HTML
documents out there that break them in various ways. Since you only want the
displayed text, your job would be a bit less difficult. You could concentrate
on that aspect of the HTML alone, without having to worry about the rest.
Still, it is likely to be time-consuming and frustrating to do.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
You can lead a fish to a bicycle,
but it takes a very long time,
and the bicycle has to *want* to change.
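
To illustrate the narrower job Kevin describes (collecting only the displayed text rather than interpreting the whole document), here is a hedged sketch built on a tolerant event-based parser. It uses Python's standard html.parser module, not a .NET library; the class name TextExtractor, the helper visible_text, and the choice of which tags count as block-level are assumptions made for this example. It ignores CSS entirely, so text hidden by a stylesheet would still be returned.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, skipping script/style content (illustrative only)."""

    SKIP = {"script", "style"}  # elements whose content is never displayed
    # Assumed set of tags that a browser would render with a line break.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        # convert_charrefs=True decodes entities like &lt; before handle_data.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Join the pieces as-is, then collapse all whitespace to single spaces.
        return " ".join("".join(self._chunks).split())


def visible_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Because the parser reports tags as events instead of requiring a well-formed tree, unclosed tags such as a missing </p> do not stop it. That covers only part of the rule-breaking documents mentioned above, so treat the output as approximate.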
 
