stuck on a REGEX (\S[^\s/>]*)

  • Thread starter: darrel

darrel

I'm trying to find the opening < and the text of a tag (without the
attributes or closing tags)

This is what I'm using:

(\S[^\s/>]*)

Which, I think, reads as:

(any number of non-whitespace characters [up to a space, /, or >])

Is that correct? I can't get it to work.

If my text is:

<tag

then it returns "<tag" which is what I want.

However, if I have:

<tag/ or <tag>

it instead matches "/" or ">" respectively.

Why?
 
In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.
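
To make that concrete, here is a small sketch in Python. Using Python is an assumption on my part (the thread appears to be about .NET), but for this particular pattern the two engines should behave the same way. It lists every match the pattern produces, so you can see the leading \S start a fresh match on the "/" or ">" once the first match has ended.

```python
import re

# The pattern from the original post: one non-whitespace character,
# then any run of characters that are not whitespace, "/" or ">".
PATTERN = re.compile(r"(\S[^\s/>]*)")

def all_matches(text):
    """Return every non-overlapping match, scanning left to right."""
    return [m.group(1) for m in PATTERN.finditer(text)]

for sample in ("<tag", "<tag/", "<tag>"):
    print(repr(sample), "->", all_matches(sample))
```

The exclusions in [^\s/>] only apply from the second character onward. The first character is matched by \S, which accepts "/" and ">" because neither is whitespace.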

Post some examples of how you want the regex to behave, and maybe
someone can help put one together.
 
In my brief testing, when run against "<tag/" it first matches "<tag" -
then the next match is "/". The second match matches "/" because it
matches the \S character class.

But shouldn't this: [^/] stop it from doing that?

Here's how I want the regex to behave:

I want to find the first 'word' in the string. This would be any number of
characters in a row up to (but not including) a space, a new line, a /, or a >.
So in this:

"hello there, how are you"

it should match 'hello'

in this:

"<blockquote>hello there, how are you"

it should match '<blockquote'

Thanks!

-Darrel
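
One way to get the behaviour described above is to drop the leading \S and ask only for the first run of allowed characters. The sketch below is in Python rather than .NET (an assumption made for illustration), and first_word is a hypothetical helper name, not something from the thread.

```python
import re

# A run of one or more characters that are not whitespace, "/" or ">".
WORD = re.compile(r"[^\s/>]+")

def first_word(text):
    """Return the first run of allowed characters, or None if there is none."""
    m = WORD.search(text)  # search() stops at the first match it finds
    return m.group(0) if m else None

print(first_word("hello there, how are you"))
print(first_word("<blockquote>hello there, how are you"))
```

Because every character of the match has to come from the same negated class, a "/" or ">" can never be part of the result, and asking for a single match avoids the later ones altogether.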
 
But shouldn't this: [^/] stop it from doing that?

Aha. Mike, you are correct!

Here's what's happening. If this is my text:

<blockquote>monkey</blockquote>

and this is my Regex:

\S[^>]*

It returns these matches:

<blockquote
monkey</blockquote

So, it's returning the last match, I suppose. This is where I get lost. How
do I get it to ONLY return the first match?
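
For anyone wanting to reproduce this, here is a minimal sketch in Python (an assumption; the thread itself appears to use .NET) that collects every match of that pattern against the same text. In Python the second match begins with the ">" that ended the first one, and there is a third match for the final ">", so the list looks a little different from the one quoted above.

```python
import re

TEXT = "<blockquote>monkey</blockquote>"

# One non-whitespace character, then everything up to (not including) the next ">".
PATTERN = re.compile(r"\S[^>]*")

matches = PATTERN.findall(TEXT)
print(matches)  # ['<blockquote', '>monkey</blockquote', '>']
```

The engine reports every non-overlapping match from left to right, so the first element of the list is the one wanted here.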
 
Got it!

The problem was the very next group I was using.

I had this:

(\S[^\s/>]*)

but had to add another group after it:

((\s|\n[^\S>]*)|(>))

which checks for whitespace/new lines OR the closing > of the tag.

-Darrel
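
Here is a rough sketch of how the two groups might work together. It assumes they are simply placed one after the other and that the second group has balanced parentheses, neither of which the post confirms, and it uses Python rather than .NET for illustration. tag_name is a hypothetical helper name.

```python
import re

# Group 1: the "<tag" part. Group 2: what must follow it,
# either whitespace or the closing ">".
COMBINED = re.compile(r"(\S[^\s/>]*)((\s|\n[^\S>]*)|(>))")

def tag_name(text):
    """Return group 1 of the first match, or None if nothing matches."""
    m = COMBINED.search(text)
    return m.group(1) if m else None

for sample in ("<tag>", "<tag attr='x'>", "<blockquote>monkey</blockquote>"):
    print(repr(sample), "->", tag_name(sample))
```

Requiring whitespace or ">" straight after the first group means a stray "/" or ">" on its own can no longer start a match that counts. One side effect under these assumptions: a string that is just "<tag/" does not match at all, since "/" is not accepted by the second group.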
 
Use the Match method of the Regex object, which returns a Match:
Dim m As Match = yourRegEx.Match(yourString)
m will hold only the first match
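
The same idea sketched in Python, in case it helps to compare. This is an analogy rather than the .NET API: Python's re.search also stops at the first match, and first_match is a hypothetical helper name.

```python
import re

PATTERN = re.compile(r"\S[^>]*")

def first_match(text):
    """Return only the first match as a string, or None if there is no match."""
    m = PATTERN.search(text)  # returns the first match, not a collection
    return m.group(0) if m else None

print(first_match("<blockquote>monkey</blockquote>"))
```

Asking for a single match instead of the whole collection means the later matches are never produced, so there is nothing to filter out afterwards.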

 