Need help with a regular expression


Lucky

Hi guys,

I'm looking for a regex that can find strings like these (HTML anchor tags) in a
bunch of HTML lines. If anyone can help me here, it would be appreciated.

<a
href="/url?sa=p&pref=ig&pval=2&q=http://www.google.co.in/ig?hl=en"
onmousedown="return rwt(this,'pro','hppphou:def','')">Personalized
Home</a>
<a
href="https://www.google.com/accounts/Login?continue=http://www.google.co.in/&hl=en">Sign
in</a>
<a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return
qs(this);">Images</a>
<a id=2a class=q href="http://groups.google.co.in/grphp?hl=en&tab=wg"
onClick="return qs(this);">Groups</a>
<a id=4a class=q href="http://news.google.co.in/nwshp?hl=en&tab=wn"
onClick="return qs(this);">News</a>
<a href="/intl/en/options/" class=q>more&nbsp;&raquo;</a>
<a href=/advanced_search?hl=en>Advanced Search</a>
<a href=/preferences?hl=en>Preferences</a>
<a href=/language_tools?hl=en>Language Tools</a>
<a href="http://www.google.co.in/hi">Hindi</a>
<a href="http://www.google.co.in/bn">Bengali</a>
<a href="http://www.google.co.in/te">Telugu</a>
<a href="http://www.google.co.in/mr">Marathi</a>
<a href="http://www.google.co.in/ta">Tamil</a>
<a href="/ads/">Advertising&nbsp;Programs</a>
<a href=/intl/en/about.html>About Google</a>
<a href=/jobs/positions-in.html onmousedown="return
rwt(this,'pro','hppwebjob:en_in','')">We're Hiring</a>
<a href=http://www.google.com/ncr>Go to Google.com</a>

Guys, please help me. I'm in big trouble.
 

Cor Ligthert [MVP]


Lucky

Okay,
in the end I wrote a regex myself. Here it is for anyone who needs the
same thing:

<a [a-zA-Z0-9 ="'.:;?]*href=*[a-zA-Z0-9 ="'.:;>?]*[^>]*>([a-zA-Z0-9 ="'.:;>?]*[^<]*<)\s*/a\s*>


Copy all the links from my last post into one string and run this
expression. It will show you the output that I wanted.
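A minimal Python sketch of that usage, assuming Python's re module as the regex engine (the thread does not name a language). It runs the posted pattern unchanged, with the line wrap removed, over three of the sample links and collects capture group 1. Note that the captured text keeps the closing "<", because the "<" sits inside the parentheses. The second pattern, VARIANT, is a hypothetical alternative that is not from the thread; it captures the href value and the link text separately.

```python
import re

# The pattern as posted, joined onto one line.
POSTED = r"""<a [a-zA-Z0-9 ="'.:;?]*href=*[a-zA-Z0-9 ="'.:;>?]*[^>]*>([a-zA-Z0-9 ="'.:;>?]*[^<]*<)\s*/a\s*>"""

# A few of the sample links, joined into one string as suggested.
SAMPLE = """<a href="/ads/">Advertising&nbsp;Programs</a>
<a href=/preferences?hl=en>Preferences</a>
<a id=2a class=q href="http://groups.google.co.in/grphp?hl=en&tab=wg"
onClick="return qs(this);">Groups</a>"""

# findall returns the text of capture group 1 for each match.
link_texts = re.findall(POSTED, SAMPLE)
print(link_texts)
# ['Advertising&nbsp;Programs<', 'Preferences<', 'Groups<']

# Hypothetical variant (not from the thread): capture (href, text) pairs,
# with quoted or unquoted href values, case-insensitively, across line breaks.
VARIANT = r"""<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)["']?[^>]*>(.*?)</a\s*>"""
pairs = re.findall(VARIANT, SAMPLE, re.IGNORECASE | re.DOTALL)
for href, text in pairs:
    print(href, "->", text)
```

Like any regex over HTML, the variant returns nested markup inside the link text unchanged and knows nothing about comments or scripts, so it is only a sketch for simple pages like this one.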
 

m.posseth

I would avoid it on a web page. A webpage can be changed in a
minute by the author.

Huh??

If you want to strip out the text between anchor tags, what can the
author change? (If the text is no longer in anchor tags, your stripping
requirement changes too, so what is the problem?)

By the way, I used a regex to strip all HTML tags from a web page and then
harvested all the data into a database. I do not see how a website owner can
change his HTML so that I can't harvest it anymore, other than by just not
displaying it on the web ;-)
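Michel does not post his expression, so this is only a minimal sketch of regex tag stripping, assuming the simple pattern <[^>]+> and Python's re.sub; the helper name strip_tags is hypothetical. It does not handle comments, scripts, or a ">" inside an attribute value.

```python
import html
import re

# Hypothetical tag pattern: everything from '<' up to the next '>'.
TAG = re.compile(r"<[^>]+>")

def strip_tags(markup: str) -> str:
    """Remove tags, decode entities such as &nbsp;, and collapse whitespace."""
    text = TAG.sub(" ", markup)        # replace each tag with a space
    text = html.unescape(text)         # &nbsp; -> '\xa0', &raquo; -> '»', ...
    return " ".join(text.split())      # split() also treats '\xa0' as whitespace

print(strip_tags('<a href="/ads/">Advertising&nbsp;Programs</a>'))
# Advertising Programs
```

Because [^>] also matches newlines, a tag that wraps across lines, such as the "We're Hiring" link above, is still removed as a whole.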

Regards

Michel Posseth
 

Cor Ligthert [MVP]

Michel,

You are right about anchor tags; I was speaking in general.

However, I prefer MSHTML
(admittedly, it is not very readable either).
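MSHTML is a Windows component, so as a stand-in this sketch uses Python's standard-library html.parser.HTMLParser to show the same idea: let a parser deal with the markup and read each link's href attribute and text from its events, instead of matching tags with a regex. The class name LinkCollector is hypothetical, and this is not how MSHTML itself is called.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &nbsp; arrives as '\xa0'
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if the anchor has no href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside an open link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None

parser = LinkCollector()
parser.feed('<a href=/preferences?hl=en>Preferences</a>\n'
            '<a href="/ads/">Advertising&nbsp;Programs</a>\n'
            '<a href=/jobs/positions-in.html onmousedown="return\n'
            "rwt(this,'pro','hppwebjob:en_in','')\">We're Hiring</a>")
parser.close()
print(parser.links)
```

The parser copes with unquoted attribute values and tags that wrap across lines without any extra pattern work, which is the main argument for a parser over a hand-written regex.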

:)

Cor