Find text within HTML file

Piotrekk

Hi

Given a keyword, I need to search an HTML file for it, ignoring all
the tags and checking only the plain text.
Is there an easy way to do this in C#?

Thanks
PK
 
Piotrekk

This will not do what I asked for.
This method only opens the file and reads its text. I need to find text
between the HTML tags: the text visible to a user opening the page.
 
Alex Meleta

Hi Alex,

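Hmm, yeah, sorry. The simplest way is to match a Regex like "search_string(?=[^>]*<)".
The lookahead only accepts a match when the next angle bracket after it is "<",
i.e. when the match is not inside a tag. Anything beyond that depends on the
properties of the HTML (whether it is valid, which tags should be ignored,
and so on).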
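
A minimal sketch of that lookahead idea, for illustration only. The class and method names (VisibleTextRegex, ContainsOutsideTags) are made up for this example, and the keyword is passed through Regex.Escape so it is matched literally. Note two assumptions: the matched text must be followed by some tag (true for a page that ends with a closing tag), and text inside script or style blocks is not excluded.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VisibleTextRegex
{
    // Hypothetical helper: returns true when keyword occurs outside of any <...> tag.
    public static bool ContainsOutsideTags(string html, string keyword)
    {
        // Escape the keyword so characters such as '.' or '+' are matched literally.
        // The lookahead (?=[^>]*<) requires that the next angle bracket after the
        // match is '<', which means the match is not inside a tag.
        string pattern = Regex.Escape(keyword) + "(?=[^>]*<)";
        return Regex.IsMatch(html, pattern, RegexOptions.IgnoreCase);
    }
}
```

With this sketch, ContainsOutsideTags("<p>Hello world</p>", "world") should report a match, while a keyword that appears only inside an attribute value, as in <a href="world.html">link</a>, should not.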

Regards, Alex
[TechBlog] http://devkids.blogspot.com


Hi Piotrekk,

For example,
System.IO.File.ReadAllText(@"C:\text.txt").Contains("something")

Regards, Alex
[TechBlog] http://devkids.blogspot.com
 
Nicholas Paldino [.NET/C# MVP]

Piotrekk,

I would use the MSHTML.HTMLDocument class through COM interop to load the
file (you can navigate to the file on disk). Once you have the document,
get the IHTMLElement implementation for the body element through the
document's body property. From that element, read the innerText property
to get the text of the document (without tags).
 
Guest

You could use a Regex.Replace statement with a suitable regular expression to
"clean" all the HTML tags from the text string of the HTML page, but that
might not even be necessary, since it is unlikely your keyword will be found
in HTML tag names or attributes.
Have you tried just:
int foundPosition = myHtmlString.IndexOf(keyWord) ... ?
This returns the first position of the keyword, or -1 if it is not found.
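
If the keyword can also turn up in tag names or attributes, the two suggestions above can be combined. This is a rough sketch under stated assumptions, not a full HTML parser: HtmlTextSearch, StripTags and IndexOfVisibleText are hypothetical names, the regular expressions are simplistic, and the returned position refers to the stripped text rather than the original HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextSearch
{
    // Hypothetical helper: removes tags so that only the text content remains.
    public static string StripTags(string html)
    {
        // Drop script and style blocks first; their contents are not shown to the user.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        // Replace every remaining tag with a space so adjacent words do not merge.
        text = Regex.Replace(text, @"<[^>]+>", " ");
        // Decode entities such as &amp; so they compare as plain characters.
        return WebUtility.HtmlDecode(text);
    }

    // Returns the position of keyword in the stripped text, or -1 if it is not found.
    public static int IndexOfVisibleText(string html, string keyword)
    {
        return StripTags(html).IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
    }
}
```

Replacing each tag with a space is a design choice: it keeps words in neighbouring elements apart, but a keyword split by an inline tag (for example "key<b>word</b>") will not be found.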
-- Peter
Recursion: see Recursion
site: http://www.eggheadcafe.com
unBlog: http://petesbloggerama.blogspot.com
BlogMetaFinder: http://www.blogmetafinder.com
 
joachim

Note: if you also need to search for keywords in ALT text, or in the
title (which is outside the body tag), make sure you adapt your
Regex/search strategy accordingly.
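
One possible way to pick up those extra places, sketched with regular expressions. HtmlExtraText and TitleAndAltText are hypothetical names, and the patterns assume quoted alt values and a single title element.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlExtraText
{
    // Hypothetical helper: collects the <title> text and the values of quoted alt attributes.
    public static List<string> TitleAndAltText(string html)
    {
        var found = new List<string>();

        // The title lives in the head, so a body-only search would miss it.
        Match title = Regex.Match(html, @"<title\b[^>]*>(.*?)</title\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (title.Success)
            found.Add(WebUtility.HtmlDecode(title.Groups[1].Value.Trim()));

        // Capture the value of every alt="..." or alt='...' attribute inside a tag.
        foreach (Match alt in Regex.Matches(html, @"<[^>]*\salt\s*=\s*([""'])(.*?)\1[^>]*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline))
            found.Add(WebUtility.HtmlDecode(alt.Groups[2].Value));

        return found;
    }
}
```

The returned strings can then be searched with IndexOf in the same way as the body text.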
 
