Regex Question

Joerg Battermann · Dec 10, 2008

Hello there,

I am a little confused by a regex I need to run on some strings.
Basically, I want to match all "file:// .... .xls" occurrences, but
only the ones that do NOT start or end with a " (quote). The reason
is that I want to find plain-text occurrences of a file:// link
within an HTML file, and not the ones that appear as
<a href="file:\\....xls">abc</a>.

Does anyone happen to know off the top of their head what the correct
regex would be in this situation?

Cheers and thanks,
-Jörg

Dathan · Dec 10, 2008

I think something like [^"](file://.+\.xls)[^"] should do the trick.
Beware, though -- XML and XHTML (and maybe HTML?) allow the use of
single-quoted attributes, too. So you might have to change the regex
to [^'"](file://.+\.xls)[^'"] or something similar.
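A minimal sketch of how the suggested pattern behaves, written in Python purely for illustration (the thread does not say which regex engine is in use). The sample strings and the helper name find_links are made up for this example. Note that [^'"] has to consume a real character, so a link at the very start or end of the input is not matched.

```python
import re

# The pattern suggested above, excluding both quote styles around the link.
PATTERN = re.compile(r"""[^'"](file://.+\.xls)[^'"]""")

def find_links(text):
    """Return the captured file:// links (group 1) found in text."""
    return PATTERN.findall(text)

# Plain-text link surrounded by spaces: matched.
plain = find_links("Open file://srv/a.xls now")

# Link inside a quoted href attribute: skipped.
quoted = find_links('<a href="file://srv/a.xls">abc</a>')

# Link with no character before or after it: also skipped,
# because each [^'"] needs one character to consume.
at_edges = find_links("file://srv/a.xls")

print(plain, quoted, at_edges)
```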

Dathan · Dec 10, 2008

May need to use .+? instead of .+, as .+ does greedy matching and .+?
does not. With .+, if you have multiple occurrences of
"file://.......xls" on a single line, it'll include the first file://
and the last .xls and everything between as a single match. That would
make the regex [^'"](file://.+?\.xls)[^'"] (I think that's the correct
syntax).

~Dathan
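The greedy versus non-greedy difference can be checked with a short Python sketch (Python is an assumption here; the .+? syntax is shared by most regex engines). The sample line is invented for the example.

```python
import re

# Two plain-text links on one line (made-up sample data).
line = "x file://a/one.xls and file://b/two.xls y"

# Greedy: .+ runs to the last ".xls" on the line.
greedy = re.findall(r"""[^'"](file://.+\.xls)[^'"]""", line)

# Non-greedy: .+? stops at the first ".xls" it can.
lazy = re.findall(r"""[^'"](file://.+?\.xls)[^'"]""", line)

print(greedy)  # one match spanning both links
print(lazy)    # two separate matches
```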

Andrew Morton · Dec 11, 2008

Joerg said:
I am a little bit confused by a regex I need to run on some strings...
basically I want to match it all "file:// .... .xls" occurrences, but

Um, that could be file:/// (as in three slashes) in some circumstances. Or
even four:

http://en.wikipedia.org/wiki/File_URI_scheme

Andrew
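One way to allow for the extra slashes is a bounded quantifier on the slash. The sketch below uses Python and made-up sample URIs; the {2,4} range is an assumption based on the two-to-four slashes mentioned in this thread, not a rule taken from a specification.

```python
import re

# Accept two, three, or four slashes after "file:".
FILE_XLS = re.compile(r"file:/{2,4}.+?\.xls")

samples = [
    "file://host/a.xls",          # two slashes
    "file:///C:/a.xls",           # three slashes
    "file:////host/share/a.xls",  # four slashes
]

# Each sample should be matched in full.
matched = [FILE_XLS.search(s).group(0) for s in samples]
print(matched)
```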

Jeff Johnson · Dec 11, 2008

You should probably also use a backreference so that whatever type of
quotation mark you match the first time, you match the second time. Don't
ask me for the syntax, I don't remember; I just know it exists.
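For reference, a sketch of the backreference syntax in Python, where \1 repeats whatever the first group captured. A backreference pairs an opening quote with the same closing quote, so this version finds the quoted links (the ones to be excluded) rather than the plain-text ones; it cannot be combined directly with the negated [^'"] classes. The sample strings are invented.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote to close.
QUOTED_LINK = re.compile(r"""(['"])(file://.+?\.xls)\1""")

double = QUOTED_LINK.search('<a href="file://s/a.xls">abc</a>')
single = QUOTED_LINK.search("<a href='file://s/a.xls'>abc</a>")

# Opening and closing quotes differ, so this does not match.
mismatched = QUOTED_LINK.search("""'file://s/a.xls\"""")

print(double.group(2), single.group(2), mismatched)
```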

The other thing that came to my mind when I saw this question was maybe it
will require lookahead/lookbehind. But maybe that's overkill in this
situation.
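A sketch of the lookbehind/lookahead idea, again in Python as an assumption about the engine. Negative lookarounds check the neighbouring character without consuming it, so a link at the very start or end of the text is still found, which the [^'"] version cannot do. The sample HTML is made up.

```python
import re

# (?<!['"]) : the link must not be preceded by a quote.
# (?!['"])  : the link must not be followed by a quote.
PLAIN_LINK = re.compile(r"""(?<!['"])file://.+?\.xls(?!['"])""")

html = (
    "file://srv/a.xls and "
    '<a href="file://srv/b.xls">abc</a>'
    " and file://srv/c.xls"
)

links = PLAIN_LINK.findall(html)
print(links)  # the href link (b.xls) is skipped
```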
