Regex Help Required

  • Thread starter: Matthias S.

Matthias S.

Hello,

I've got the following regular expression:

"<span .*bbc_underline.*>(.*)</span>"

and the following input string:

"this <span class="bbc_underline">is underlined <span class="bbc_strikethrough">and striked through</span>text.</span>"

When I do a replace with this expression, I get the following result:

"this text."
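A minimal Python sketch reproduces this result. Python's re module is an assumption (the thread never names a language), and Python writes the backreference in a replacement as \1 rather than $1. The greedy ".*>" does not stop at the end of the opening tag; it runs on to the last ">" that still leaves room for "(.*)</span>", which is the ">" of the inner </span>, so the capture group ends up holding only "text.":

```python
import re

# Input from the question, joined onto one line (the break after "<span"
# is assumed to be newsgroup line wrapping).
text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')

# The original, fully greedy pattern.
greedy = re.compile(r'<span .*bbc_underline.*>(.*)</span>')

m = greedy.search(text)
# The match starts at the outer "<span" and runs to the end of the string,
# but ".*>" swallowed everything up to the inner "</span>".
print(m.group(1))               # text.
print(greedy.sub(r'\1', text))  # this text.
```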

I'm quite new to regular expressions. The goal of the expression above is to
yield the following result:

"this is underlined and striked throughtext."

But somehow, the expression eats up more than it should. By the way, I don't
know in advance whether anything is nested, or what. I just have a couple of
spans with given classes that will "turn into" square brackets. Can somebody
please help? A ton of thanks in advance.

Matthias
 
Regex quantifiers are "greedy" by default: each one grabs the largest chunk it
can while still letting the rest of the pattern match. To make one "lazy"
(take the smallest chunk it can), use "*?" instead of "*".
So, I think your Regex would be
"<span .*?bbc_underline.*?>(.*?)</span>"
Each ".*?" will match the minimum number of characters rather than the
maximum.
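On the question's input, a short Python sketch of the lazy pattern (language assumed; Python uses \1 where the thread uses $1) shows what it does: "(.*?)" stops at the first </span> it meets, and that tag closes the inner strikethrough span, not the underline span.

```python
import re

# Input from the question, joined onto one line.
text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')

# Every ".*" from the original pattern made lazy with ".*?".
lazy = re.compile(r'<span .*?bbc_underline.*?>(.*?)</span>')

m = lazy.search(text)
# The capture group ends right before the FIRST "</span>".
print(m.group(1))
# is underlined <span class="bbc_strikethrough">and striked through
print(lazy.sub(r'\1', text))
# this is underlined <span class="bbc_strikethrough">and striked throughtext.</span>
```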
Give it a try.
Ethan
 
Hey Ethan,

thanks for your help, but it does not work correctly either, since it
matches the first occurrence of </span> instead of the last.

In case I did not explain it clearly: what I want to achieve is to match
the first occurrence of <span SOMETHING "bbc_underline"> with the last
possible occurrence of </span> and replace that whole match with $1.
I don't know how many levels of nesting there are. The span classes I use
come from a predefined list (bbc_underline, bbc_italic, bbc_strikethrough
and the like).
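A hedged Python sketch of exactly this request (language assumed; the [^>]* change is a swapped-in technique, not something from the thread, and it assumes no ">" appears inside attribute values). The runaway part of the original pattern is ".*>", not the greedy "(.*)": confining the attribute part to [^>]* keeps it inside the opening tag, and the greedy group then does reach the last </span>. The second input shows why "last </span>" is still the wrong rule once spans sit side by side:

```python
import re

# Input from the question, joined onto one line.
text = ('this <span class="bbc_underline">is underlined <span '
        'class="bbc_strimmed"'
        if False else
        'this <span class="bbc_underline">is underlined <span '
        'class="bbc_strikethrough">and striked through</span>text.</span>')

# "[^>]*" cannot run past the end of the opening tag, while the greedy
# "(.*)" still extends to the LAST "</span>" in the string.
pattern = re.compile(r'<span [^>]*bbc_underline[^>]*>(.*)</span>')

print(pattern.sub(r'\1', text))
# this is underlined <span class="bbc_strikethrough">and striked through</span>text.

# The catch: the last "</span>" is not necessarily the matching one.
siblings = ('<span class="bbc_underline">a</span> and '
            '<span class="bbc_italic">b</span>')
print(pattern.sub(r'\1', siblings))
# a</span> and <span class="bbc_italic">b
```

So pairing the first opening tag with the last closing tag only works when the underline span is the outermost span and has no siblings.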

I would greatly appreciate further help.

Matthias
 
The only way of replacing nested tags is to start from the innermost tag.

Make a pattern that matches your tag, but only if there is not another
such tag inside it. Then you can successfully match the innermost tag
and replace it, then repeat the process until there are no more matches.
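A minimal Python sketch of this innermost-first loop. The language, the bracket letters in BRACKET_TAGS and the function names are all hypothetical, and it assumes every span in the input uses one of the listed classes (an unlisted inner span would block its parent from ever matching). The body "(?:(?!<span\b).)*?" consumes characters lazily but refuses to step over the start of another span, so a match can only be an innermost span, and the first </span> it reaches is that span's own closing tag. subn reports how many replacements a pass made, and the loop stops when a pass makes none:

```python
import re

# Hypothetical mapping from the predefined span classes to bracket tag names.
BRACKET_TAGS = {"bbc_underline": "u", "bbc_italic": "i", "bbc_strikethrough": "s"}

# Group 1: the class name. Group 2: the span body, which may not contain "<span".
INNERMOST = re.compile(
    r"<span\b[^>]*\b(" + "|".join(map(re.escape, BRACKET_TAGS)) + r")\b[^>]*>"
    r"((?:(?!<span\b).)*?)</span>",
    re.DOTALL,  # let "." cross line breaks inside a span body
)


def replace_innermost(html, repl):
    """Replace innermost known spans, pass after pass, until nothing matches."""
    while True:
        html, count = INNERMOST.subn(repl, html)
        if count == 0:
            return html


def strip_spans(html):
    """Keep only the text inside each span (what the $1 replacement aimed for)."""
    return replace_innermost(html, r"\2")


def to_brackets(html):
    """Turn each span into a bracket tag such as [u]...[/u]."""
    def repl(m):
        tag = BRACKET_TAGS[m.group(1)]
        return f"[{tag}]{m.group(2)}[/{tag}]"
    return replace_innermost(html, repl)


if __name__ == "__main__":
    text = ('this <span class="bbc_underline">is underlined <span '
            'class="bbc_strikethrough">and striked through</span>text.</span>')
    print(strip_spans(text))
    print(to_brackets(text))
```

On the question's input, the strikethrough span is replaced on the first pass and the underline span on the second, so strip_spans returns "this is underlined and striked throughtext." and to_brackets returns "this [u]is underlined [s]and striked through[/s]text.[/u]". Sibling spans are handled too, since each one is innermost on its own.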