Need RegEx To Find Corrupted Hyperlink

HowdeeDoodee

Using Word's Regular Expression system, I need to find corrupted
hyperlinks like the one shown below.

<U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=1Pe
1:17">1Pe 1:17">1Pe 1:17">1Pe 1:17</a></U>

The part: 1Pe can be any combination of letters and numbers, with the
numbers always being 1, 2, 3, or 4. The numbers always come first, i.e.
1Co, 2Jn, 3Jn. Other examples without a number prefix would be Gen, Exo,
Lev, Num and so on.

The number following the three-character prefix could be any number
from 1 to 200.

The number following the colon could be any number from 1 to 200, or
it could be a range of numbers from 1 to
200 separated by a dash. Example: 19-24
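
The rules above can be sketched outside Word as a small Python check. This is only an illustration under stated assumptions: Python's `re` syntax differs from Word's wildcard syntax, and the names `REFERENCE` and `is_valid_reference` are hypothetical, not part of the thread. The pattern alone only limits each number to three digits, so the 1 to 200 limit is checked separately.

```python
import re

# Book code: either a digit 1-4 followed by two letters (1Pe, 2Jn),
# or three letters (Gen, Lev). Then chapter, a colon, a verse, and an
# optional "-verse" range. The three numbers are captured for range checks.
REFERENCE = re.compile(
    r"(?:[1-4][A-Z][a-z]|[A-Z][a-z]{2}) (\d{1,3}):(\d{1,3})(?:-(\d{1,3}))?"
)

def is_valid_reference(text):
    """Return True if text is a reference such as '1Pe 1:17' or '2Jn 2:2-8'."""
    match = REFERENCE.fullmatch(text)
    if match is None:
        return False
    # Skip the optional range end when it is absent.
    numbers = [int(n) for n in match.groups() if n is not None]
    return all(1 <= n <= 200 for n in numbers)

for sample in ["1Pe 1:17", "2Jn 2:2-8", "Pro 200:100", "Pro 201:1", "Gen 1"]:
    print(sample, is_valid_reference(sample))
```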


Other examples:

<U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=2Jn
2:2-8">2Jn 2:2-8">2Jn 2:2-8">2Jn 2:2-8</a></U>

<U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=Pro
200:100">Pro 200:100">Pro 200:100">Pro 200:100</a></U>

<U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=Lev
12:10">Lev 12:10">Lev 12:10">Lev 12:10</a></U>

The link appears in text like the text shown below. The problem I keep
having is the semicolon separating the two hyperlinks, because using a
wildcard ignores the semicolon.

Here is a sample of the text where the corrupted hyperlink appears.

of God and a good conscience; and this is its ordinary acceptation in
Scripture. <U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=Act
10:34">Act 10:34</a></U> ; <U><a
href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=1Pe
1:17">1Pe 1:17">1Pe 1:17">1Pe 1:17</a></U> But piety, zeal, holiness,
and other similar graces, were the principal

Here is one of the regular expressions I have tried, but this expression
ignores the semicolon and both hyperlinks above are selected.

Thank you in advance for your replies.
 
macropod

Hi HowdeeDoodee,

Assuming your faulty expressions are split by a paragraph break as per your post, you could make your 'Find' expression:
=[0-9][A-Z][a-z]^13[0-9]{1,3}:[0-9]{1,3}
for strings like 1Pe, 1Co, 2Jn, 3Jn, etc, and:
=[A-Z][a-z][a-z]^13[0-9]{1,3}:[0-9]{1,3}
for strings like Gen, Exo, Lev, Num, etc.

To turn this into a Find/Replace scenario, where the paragraph break is replaced by a space, you could make your 'Find' expressions:
=([0-9][A-Z][a-z])^13([0-9]{1,3}:[0-9]{1,3})
and:
=([A-Z][a-z][a-z])^13([0-9]{1,3}:[0-9]{1,3})
and the 'Replace' expression:
=\1 \2

The dash and second verse number that follow some references do not affect the match, since these expressions stop after the first verse number.
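
For comparison, here is a rough Python analogue of that Find/Replace, as a sketch only. It assumes the text has been exported from Word to plain text, so the paragraph break (`^13` in Word) appears as `\n` or `\r\n`; the names `SPLIT_REF` and `join_split_reference` and the sample string are hypothetical. Because the `=` is part of the match, the replacement writes it back so that it is not lost.

```python
import re

# Group 1: three-character book code, with or without a leading digit.
# Group 2: chapter and verse. The line break between them is what gets replaced.
SPLIT_REF = re.compile(
    r"=([0-9][A-Z][a-z]|[A-Z][a-z][a-z])\r?\n([0-9]{1,3}:[0-9]{1,3})"
)

def join_split_reference(text):
    """Replace the line break inside 'SeeAlso=<book><break><chapter:verse>' with a space."""
    return SPLIT_REF.sub(r"=\1 \2", text)

# Hypothetical sample modelled on the links in the question.
broken = 'SeeAlso=1Pe\n1:17">1Pe 1:17'
print(join_split_reference(broken))
```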
 
HowdeeDoodee

macropod said:
Hi HowdeeDoodee,

Assuming your faulty expressions are split by a paragraph break as per
your post, you could make your 'Find' expression:...

Hi Macropod, I think I know you from another world :)

OK, my original post stinks because I could not get the right and left
arrows in the url to post on this forum. I put a Word attachment to
this post so you could see the issue in a Word file.

If you cannot open the Word file attachment, here is the corrupted
hyperlink with appropriate words in brackets substituted for the
non-displaying right and left arrows. The following url is preceded by
a good url. A semicolon separates the two urls. I suspect I have bad
urls like the following in various files in this document conversion
project.

[UnderlineTag][LeftArrow]a
href="http://www.findthepower.net/CP/CommentaryProject/PostNewABC2_R.php?CAL=ON&SeeAlso=1Pe
1:17"[RightArrow]1Pe 1:17"[RightArrow]1Pe 1:17"[RightArrow]1Pe
1:17[LeftArrow]/a[RightArrow][/EndUnderlineTag]

I tried using =[0-9][A-Z][a-z] [0-9]{1,3}:[0-9]{1,3} in Word with a wildcard
search. This expression works, but it does not match enough. I need to find the
whole URL, from the leftmost underline tag to the rightmost
underline tag, only for _broken_ urls like the url shown above,
not for good urls.

Thank you for your help and time.

+-------------------------------------------------------------------+
|Filename: Text.doc |
|Download: http://www.wordbanter.com/attachment.php?attachmentid=74 |
+-------------------------------------------------------------------+
 
macropod

Hi HowdeeDoodee,

HowdeeDoodee said:
Hi Macropod, I think I know you from another world :)

Yup!

OK, so how about these for the Find expressions:
\;*([0-9][A-Z][a-z] [0-9]{1,3}\:[0-9]{1,3}*\"\>){3,}*\</U\>
and
\;*([A-Z][a-z][a-z] [0-9]{1,3}\:[0-9]{1,3}*\"\>){3,}*\</U\>
Both assume the faulty link is preceded by a semi-colon and ends in </U>. These Find expressions will retrieve the complete
hyperlink, which you can load into a string variable for further manipulation.
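
The same idea can be sketched in Python for text that has been saved out of Word with its literal angle brackets. This is a hedged illustration, not a translation of the wildcard expressions: the names `REF`, `CORRUPTED` and `find_corrupted_links` are hypothetical, and unlike the Find expressions above the pattern does not rely on a preceding semicolon. It treats a link as corrupted when one or more stray copies of the reference followed by `">` appear between the href and the real link text.

```python
import re

# A reference such as "1Pe 1:17" or "2Jn 2:2-8"; \s also allows a line break.
REF = r"(?:[0-9][A-Z][a-z]|[A-Z][a-z]{2})\s[0-9]{1,3}:[0-9]{1,3}(?:-[0-9]{1,3})?"

CORRUPTED = re.compile(
    rf'<U><a\s+href="[^"]*?{REF}">'  # opening tags and a URL ending in the reference
    rf'(?:{REF}">)+'                 # one or more stray copies: reference, quote, bracket
    rf'{REF}</a></U>'                # the real link text and the closing tags
)

def find_corrupted_links(text):
    """Return every complete corrupted hyperlink found in text."""
    return [match.group(0) for match in CORRUPTED.finditer(text)]

# Sample modelled on the question: a good link, a semicolon, then a corrupted link.
text = (
    'Scripture. <U><a\n'
    'href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=Act\n'
    '10:34">Act 10:34</a></U> ; <U><a\n'
    'href="http://www.findthesite.net/CP/Comment/PostNewABC2_R.php?BRN=ON&SeeAlso=1Pe\n'
    '1:17">1Pe 1:17">1Pe 1:17">1Pe 1:17</a></U> But piety'
)

found = find_corrupted_links(text)
print(len(found))
print(found[0])
```

Because `[^"]*?` cannot cross a quotation mark, a match that starts at the good link fails as soon as its link text is followed by `</a>` rather than `">`, so only the broken link is returned.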

--
Cheers
macropod
[MVP - Microsoft Word]


 
