OK, where'd you hide my freekin' Regex html -> text extractor?

  • Thread starter: _BNC

_BNC

I've been looking for a couple of weeks for a regex that will
extract text from HTML in a form that looks like IE screen output.
I'm sure one of you guys hid it somewhere as a joke, but it's not funny
any more. Give it up, OK?

Seriously, I've checked regexlib.com and regular-expressions.info and done
lots of Google searches. I find tag extractors and other related stuff but
haven't come up with a 'visible text' extractor.

The other way I could do this is to bring up a website (à la scraper) and
somehow do the programmatic equivalent of 'mouse drag capturing' the
visible text into the clipboard. Not sure if that's easy to do though.
 
_BNC said:
The other way I could do this is to bring up a website (à la scraper) and
somehow do the programmatic equivalent of 'mouse drag capturing' the
visible text into the clipboard. Not sure if that's easy to do though.

There is actually an easier way: use a tool like SgmlReader to convert
the HTML into XHTML, and you can then query for the data you want
using XPath expressions.

SgmlReader: http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC
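
To give a rough idea of the "convert to XHTML, then query" approach, here is a minimal sketch in Python rather than .NET. It assumes the page has already been converted to well-formed XHTML (the step SgmlReader would handle), and uses the standard library's ElementTree with its limited XPath support. The sample markup and the names visible_text and extract_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: well-formed XHTML, as a converter would produce it.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title><style>p { color: red; }</style></head>
  <body>
    <h1>Hello</h1>
    <p>Visible <b>text</b> here.</p>
    <script>var hidden = 1;</script>
  </body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}
SKIP = {"script", "style"}  # elements whose text a browser does not render


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def visible_text(elem):
    """Yield text chunks under elem in document order, skipping SKIP elements."""
    if local_name(elem.tag) in SKIP:
        return
    if elem.text:
        yield elem.text
    for child in elem:
        yield from visible_text(child)
        # Text following a child belongs to the parent, so it is kept
        # even when the child itself was skipped.
        if child.tail:
            yield child.tail


def extract_text(xhtml):
    """Return the body text of an XHTML string with whitespace collapsed."""
    root = ET.fromstring(xhtml)
    body = root.find("x:body", NS)  # limited XPath: direct child named body
    chunks = visible_text(body) if body is not None else []
    # Collapse runs of whitespace, roughly as a browser does.
    return " ".join("".join(chunks).split())


print(extract_text(XHTML))
```

This only approximates what a browser shows: it inserts no line breaks for block elements, ignores CSS that hides content, and will fail on input that is not well-formed XML, which is why the conversion step comes first.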

HTH...

Chris
 