writebrent
I think I need to do a negative lookahead with a regular expression,
but I'm a bit confused how to make it all work. Take these example
texts:
Need to match these two:
=========================
Item 4.01 Regulation and other items
<b>Item 4. Regulation</b>
=========================
Need to avoid matching these two:
=========================
....then he looked at Item 12.06 for more information...
<a href="000">Item 6. Other</a> below
=========================
In other words, I need to match a string as follows:
0. Beginning of line or ">"
1. The word "Item",
2. A single space,
3. A number of one or more digits followed by a period,
4. Some more text,
5. Ending at either the end of the line or "<",
6. Except when it ends in "</a" (i.e., exclude hyperlink text).
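A minimal sketch of one way to meet the requirements listed above, assuming Python's `re` module (the regex flavor isn't stated; Python spells named groups `(?P<name>...)` rather than `(?<name>...)`). The key change is replacing the lazy `.*?` with `[^<\n]*`, so the item text can never run past the first "<" or the end of the line, and the negative lookahead `(?!</a)` is tested exactly where the text stops. `ITEM_RE` and `find_items` are hypothetical names.

```python
import re

# Item text is [^<\n]*: it cannot cross a "<" or a line break, so the only
# place a match can end is right before "<" or at the end of a line.
ITEM_RE = re.compile(
    r"(?:^|>)"                      # 0. start of line or ">"
    r"(?P<item>Item \d+\.[^<\n]*)"  # 1-4. "Item", one space, digits + period, text
    r"(?!</a)"                      # 6. the text must not be followed by "</a"
    r"(?:<|$)",                     # 5. ends at "<" or end of line
    re.MULTILINE,                   # let ^ and $ match at every line
)


def find_items(text):
    """Return the captured 'item' text for every qualifying match."""
    return [m.group("item") for m in ITEM_RE.finditer(text)]


samples = """Item 4.01 Regulation and other items
<b>Item 4. Regulation</b>
....then he looked at Item 12.06 for more information...
<a href="000">Item 6. Other</a> below"""

for item in find_items(samples):
    print(item)
```

This prints "Item 4.01 Regulation and other items" and "Item 4. Regulation". The "Item 12.06" line is skipped because nothing before "Item" is a line start or ">", and the hyperlink is skipped because the only place its text can stop is directly before "</a".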
My proposed (nonworking) solution is this:
(?:^|>)(?<item>Item\s\d+\..*?)(?!</a>)(?:<|$)
The problem is that it still matches all hyperlinks.
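To see why, here is the original pattern transcribed to Python's `re` syntax (an assumption; only the named-group spelling changes). Because `.*?` is allowed to consume "<", the lookahead rejects the position just before "</a>", but the engine then keeps extending the lazy match one character at a time until `$` succeeds at the end of the line:

```python
import re

# The original pattern; only (?<item>...) became Python's (?P<item>...).
ORIGINAL = re.compile(r"(?:^|>)(?P<item>Item\s\d+\..*?)(?!</a>)(?:<|$)", re.MULTILINE)

line = '<a href="000">Item 6. Other</a> below'
m = ORIGINAL.search(line)
# The capture runs past "</a>" to the end of the line.
print(repr(m.group("item")))  # 'Item 6. Other</a> below'
```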
I'd sure appreciate any help you might have.
Thanks.
--Brent