Bizarre wildcard replace

Guest

Hi, I'm trying to write a basic Word -> HTML conversion (since I can't find
any tool that actually does a clean job of it). For example, I want to find
all instances of italics and replace them with <i>(original text)</i>.
I've tried doing this in the search-and-replace dialog:

Turned on "Use Wildcards", selected Format in the 'find' with Font of
"Italic", do a search for '*'.

In the replace with, I've set formatting to "Not Italic," and made the
replace string <i>^&</i> .

I've also tried it with (*) in the find field and <i>\1</i> in the replace
field.

My problem is that this seems to only be matching one character at a time --
I end up with <i> and </i> around every individual character, instead of
around an entire word, part of a word, sentence or paragraph.

Any help would be much appreciated.
 
Jezebel

Word's implementation of regular expressions (which is what you get with
'Use Wildcards' checked) uses minimal matching, as opposed to Unix tools,
which use maximal (greedy) matching by default. In other words, it looks for
the *smallest* sequence of characters that matches the Find expression -- in
your case, one character at a time. Which is a damned nuisance.

However, there's an easy fix, at least for the example you give: if you
don't check the 'Use Wildcards' checkbox, you can leave the Find box blank.
Word will then match the full sequence of text with the formatting you
specify. In the Replace box, ^& stands for the 'Find what' text. So:

Find: (blank), Format = italic
Replace: <i>^&</i>, Format = not italic
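The idea behind that recipe (treat each unbroken stretch of italic text as one match, then wrap it) can be sketched in Python. This is only an illustration under assumptions: the list of `(text, is_italic)` pairs is a hypothetical stand-in for a formatted document, not anything Word exposes, and `runs_to_html` is a made-up name.

```python
from html import escape
from itertools import groupby

def runs_to_html(runs):
    """Convert (text, is_italic) pairs to HTML, one <i> pair per italic stretch.

    `runs` is a hypothetical model of formatted text, not Word's object model.
    """
    parts = []
    # groupby merges adjacent runs that share the same italic flag, which
    # mirrors matching "the full sequence with the formatting you specify".
    for is_italic, group in groupby(runs, key=lambda run: run[1]):
        text = escape("".join(chunk for chunk, _ in group))
        parts.append(f"<i>{text}</i>" if is_italic else text)
    return "".join(parts)

runs = [("This is ", False), ("very", True), (" important", True), (".", False)]
print(runs_to_html(runs))  # This is <i>very important</i>.
```

The two adjacent italic runs come out inside a single pair of tags rather than one pair each.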
 
John McGhie [MVP - Word and Word Macintosh]

Had you thought of trying a tool named "Microsoft Word 2003"?

It does a *perfect* job of converting a Word document to HTML.

That's a lie :) It does a perfect job of converting a Word document into
X-HTML. HTML is not capable of describing a Word document. Word writes XML
inline with the HTML to describe the components of the document that HTML
can not.

If you are working with Word 2003 Enterprise Edition, you will have a tool
named InfoPath available. InfoPath enables you to write an XML Transform
that would remove the various components you do not want from the XML that
Word writes.

Otherwise, you will find any number of tools available that will do more or
less of what you are trying to do. FrontPage does an excellent job if you
simply "paste" the text of the Word document into the FrontPage editor.
DreamWeaver does a great job of filtering Word's HTML on import. I think
making your own is very much a case of re-inventing a wheel that is already
trundling down the highway under dozens of cars :)

Cheers

--

Please reply to the newsgroup to maintain the thread. Please do not email
me unless I ask you to.

John McGhie
Microsoft MVP, Word and Word for Macintosh. Consultant Technical Writer
Sydney, Australia +61 (0) 4 1209 1410
 
