Regex Question

Joerg Battermann

Hello there,

I am a little confused by a regex I need to run on some strings.
Basically, I want to match all "file:// .... .xls" occurrences, but
only the ones that do NOT start or end with a " (quote). The reason
is that I want to find plain-text occurrences of a file:// link
within an HTML file, and not the ones inside a tag such as
<a href="file:\\....xls">abc</a>.


Does anyone know off the top of their head what the correct
regex would be in this situation?


Cheers and thanks,
-Jörg
 
Dathan

I think something like [^"](file://.+\.xls)[^"] should do the trick.
Beware, though -- XML and XHTML (and maybe HTML?) allow the use of
single-quoted attributes, too. So you might have to change the regex
to [^'"](file://.+\.xls)[^'"] or something similar.
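Here is a minimal Python sketch of that pattern, assuming Python's re module (the thread does not say which regex engine is in use). The sample HTML string and the server/file names in it are made up for illustration.

```python
import re

# Hypothetical sample: one link inside an href attribute, one bare link in the text.
html = '<a href="file://server/a.xls">abc</a> see file://server/b.xls for details'

# A non-quote character, then the link (captured in group 1), then a non-quote character.
pattern = re.compile(r"""[^'"](file://.+\.xls)[^'"]""")

links = [m.group(1) for m in pattern.finditer(html)]
print(links)  # ['file://server/b.xls']

# Caveat: each [^'"] has to consume a real character, so a link sitting at the
# very start or very end of the string is not matched.
at_edges = pattern.search("file://server/c.xls")
print(at_edges)  # None
```

The capturing group is what keeps the surrounding non-quote characters out of the result, so read group 1 rather than the whole match.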
 
Dathan

You may need to use .+? instead of .+, because .+ does greedy matching and
.+? does not. With .+, if you have multiple occurrences of
"file://.......xls" on a single line, the match runs from the first file://
to the last .xls and includes everything in between as a single match.

That would change the regex to [^'"](file://.+?\.xls)[^'"] to turn off
greedy matching. (I think that's the correct syntax.)

~Dathan
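To make the greedy versus non-greedy difference concrete, here is a small Python sketch, assuming Python's re module. The sample line and its paths are hypothetical.

```python
import re

# Hypothetical line containing two bare links.
line = "see file://srv/a.xls and file://srv/b.xls today"

# Greedy .+ runs to the last ".xls" on the line.
greedy = re.findall(r"""[^'"](file://.+\.xls)[^'"]""", line)
print(greedy)  # ['file://srv/a.xls and file://srv/b.xls']

# Non-greedy .+? stops at the first ".xls" it can.
lazy = re.findall(r"""[^'"](file://.+?\.xls)[^'"]""", line)
print(lazy)  # ['file://srv/a.xls', 'file://srv/b.xls']
```

The .+? syntax shown here is the one Python accepts; most Perl-style engines use the same notation, but it is worth checking the documentation for whichever engine you are on.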
 
Jeff Johnson

You should probably also use a backreference so that whatever type of
quotation mark you match the first time, you match the second time. Don't
ask me for the syntax, I don't remember; I just know it exists.
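For reference, in Python's re module a backreference to the first group is written \1. The sketch below uses it to find the quoted links, where the closing quote must be the same character as the opening one. The sample string is hypothetical, and other engines may spell backreferences differently.

```python
import re

# Hypothetical sample: a double-quoted link, a single-quoted link, and a bare link.
html = """<a href="file://srv/a.xls">a</a> <a href='file://srv/b.xls'>b</a> file://srv/c.xls"""

# Group 1 captures the opening quote; \1 requires the same character to close it.
quoted = re.compile(r"""(["'])(file://.+?\.xls)\1""")

quoted_links = [m.group(2) for m in quoted.finditer(html)]
print(quoted_links)  # ['file://srv/a.xls', 'file://srv/b.xls']

# Mismatched quotes (" to open, ' to close) do not match.
mismatched = quoted.search('"file://srv/d.xls\'')
print(mismatched)  # None
```

This finds the links you want to skip rather than the ones you want to keep, so it is mostly useful for filtering them out in a second pass.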

The other thing that came to my mind when I saw this question was maybe it
will require lookahead/lookbehind. But maybe that's overkill in this
situation.
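A rough Python sketch of the lookbehind/lookahead idea, assuming Python's re module and a made-up sample string. The lookarounds check the neighbouring characters without consuming them, which matters when a link sits at the end of the string or two links are separated by a single character.

```python
import re

# Hypothetical sample: a quoted link, then two bare links separated by one space,
# the last one at the very end of the string.
html = '<a href="file://srv/a.xls">a</a> file://srv/b.xls file://srv/c.xls'

# (?<!["']) : not preceded by a quote; (?!["']) : not followed by a quote.
bare = re.compile(r"""(?<!["'])file://.+?\.xls(?!["'])""")

bare_links = bare.findall(html)
print(bare_links)  # ['file://srv/b.xls', 'file://srv/c.xls']

# For comparison, the character-class version consumes the space after b.xls
# and needs a character after c.xls, so it misses the second bare link here.
consuming = re.findall(r"""[^'"](file://.+?\.xls)[^'"]""", html)
print(consuming)  # ['file://srv/b.xls']
```

Whether that is overkill depends on the input: if links can be adjacent or sit at the start or end of the text, the lookaround form avoids the missed matches.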
 
