stuck on a REGEX (\S[^\s/>]*)

darrel · Jul 12, 2004

I'm trying to find the opening < and the text of a tag (without the
attributes or closing tags)

This is what I'm using:

(\S[^\s/>]*)

Which, I think, reads as:

(any number of non-whitespace characters [up to a space, /, or >])

Is that correct? I can't get it to work.

If my text is:

<tag

then it returns "<tag" which is what I want.

However, if I have:

<tag/ or <tag>

it instead matches "/" or ">" respectively.

Why?

mikeb · Jul 12, 2004

In my brief testing, when run against "<tag/" it first matches "<tag",
then the next match is "/". The second match is "/" because "/" is a
non-whitespace character, so it satisfies the leading \S on its own.
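
To see this concretely, here is a small sketch. It uses Python's re module, which is an assumption on my part since the regex engine in use has not been stated; for this particular pattern \S and the negated class should behave the same way in most engines. The names PATTERN and all_matches are only illustrative.

```python
import re

# The pattern from the original post, without the capturing parentheses:
# one non-whitespace character, then zero or more characters that are
# not whitespace, "/" or ">".
PATTERN = re.compile(r"\S[^\s/>]*")

def all_matches(text):
    """Return every non-overlapping match, in left-to-right order."""
    return PATTERN.findall(text)

for sample in ("<tag", "<tag/", "<tag>"):
    print(repr(sample), "->", all_matches(sample))

# '<tag'  -> ['<tag']
# '<tag/' -> ['<tag', '/']
# '<tag>' -> ['<tag', '>']
```

The negated class [^\s/>] only restricts the characters after the first one. Once "<tag" has been consumed, the engine starts a fresh match at "/" or ">", and the leading \S accepts either of them.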

Post some examples of how you want the regex to behave, and maybe
someone can help put one together.

darrel · Jul 12, 2004

mikeb said:
In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.

But shouldn't this: [^/] stop it from doing that?

Here's how I want the regex to behave:

I want to find the first 'word' in the string. This would be any number of
characters in a row up to (but not including) a space, a new line, a /, or a >.
So in this:

"hello there, how are you"

it should match 'hello'

in this:

"<blockquote>hello there, how are you"

it should match '<blockquote'

Thanks!

-Darrel

darrel · Jul 12, 2004

But shouldn't this: [^/] stop it from doing that?

Aha. Mike, you are correct!

Here's what's happening. If this is my text:

<blockquote>monkey</blockquote>

and this is my Regex:

\S[^>]*

It returns these matches:

<blockquote

monkey</blockquote

So, it's returning the last match, I suppose. This is where I get lost. How
do I get it to ONLY return the first match?

darrel · Jul 12, 2004

Got it!

The problem was the very next group I was using.

I had this:

(\S[^\s/>]*)

but had to add another group:

((\s|\n[^\S>]*)|(>))

which checks for whitespace/new lines OR the closing > of the tag.

-Darrel

Guest · Jul 13, 2004

Use the Match method of the Regex object:
Dim m As Match = yourRegEx.Match(inputText)
m will hold only the first match.
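
For anyone who wants to try the idea outside .NET, here is a rough equivalent in Python rather than VB.NET. Python's re.search plays about the same role as Regex.Match, since both return only the first match. The helper name first_word is made up for illustration, and the sample strings are the ones darrel posted earlier.

```python
import re

# Same pattern as in the thread: one non-whitespace character, then
# everything up to (but not including) whitespace, "/" or ">".
FIRST_WORD = re.compile(r"\S[^\s/>]*")

def first_word(text):
    """Return only the first match, or None if nothing matches."""
    m = FIRST_WORD.search(text)  # search() stops at the first match
    return m.group(0) if m else None

print(first_word("hello there, how are you"))              # hello
print(first_word("<blockquote>hello there, how are you"))  # <blockquote
print(first_word("<blockquote>monkey</blockquote>"))       # <blockquote
```

Asking for a single match, instead of iterating over all of them, avoids the later matches entirely, so no extra group is needed just to filter them out.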
