Word docs to rtf or html without Word markup

Guest

I'm not sure this is the correct group, but I'll give it a shot. I copyedit
Word docs for a journal that is now going to make its content available
online. There is a limited budget for what the web host can do when he
receives the files, so I need to get them cleaned up so the Word-specific
markup doesn't mess up the files. For the first online issue, he just pasted
the docs into Wordpad (to get them into rtf) and then into the html editor he
uses, but the final results were not uniform even though the final docs (that
I edited) all had the same font, sizes, etc. Also, the notes didn't function
properly. They don't convert properly from Word, so they aren't anchored
(which is what I understand the terminology/problem to be). Hence, they
appear as a link in the text of the article but don't jump to the actual note
text at the end of the article. So what I need to know is the following:

Is there a way to create a doc in Word that is "clean," so that it will
convert uniformly (font, font size, formatting as in bold, ital, etc.)?

Is there a way to make the notes "anchor" so that they link properly?

Is there a better/different text editor that will produce rtf files that
will convert easily to html?

Thanks for any help offered.
 
Robert M. Franz (RMF)

Hi DOH
DOH said:
Is there a better/different text editor that will produce rtf files that
will convert easily to html?

A text editor is probably even less suited to generate "WYSIWYG HTML
output" (whatever that would be :)).

Have you tried the direct Save as: HTML (filtered) from Word?

HTH
Robert
 
Guest

Thanks for the response. No, I haven't tried that. But do you mean in the
"Save As Type" drop-down box in the "Save As" screen, or the "Save As Web
Page" option on the File menu? Btw, I still use Word 2K.
 
Robert M. Franz (RMF)

DOH said:
Thanks for the response. No I haven't tried that. But do you mean in the
"Save As Type" drop down box in the "Save As" screen, or the "Save As Web
Page" option on the File menu? Btw, I still use Word 2K.

That's, well, not ideal! :)

Word 2000 was written when the Internet and Web pages were still pretty
low on MSFT's (or at least the Word team's) agenda. There's no
"filtered" option there. The resulting XHTML code (both options you
describe seem to be the same export, BTW), while not really looking
"nice", contains the relevant formatting information (including some
CSS-like definitions at the start) needed either to read the file back
into Word or to display it in a browser.
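
Since the original question was about notes that appear as links but don't jump anywhere, here is a rough way to check an exported file for that problem. It is only a sketch using Python's standard `html.parser`: it collects every `href="#..."` link and every `id=` (or `name=` on an `<a>` tag) and reports the links that have no matching target. The function name `dangling_links` and the `_ftn1`-style names in the usage note are made up for illustration; the names your export actually uses may differ.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect in-page link fragments and the anchors they could point to."""

    def __init__(self):
        super().__init__()
        self.targets = set()  # values of id=, and of name= on <a> tags
        self.links = []       # fragment part of every href="#..." link

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if not value:
                continue
            if key == "id" or (key == "name" and tag == "a"):
                self.targets.add(value)
            elif key == "href" and value.startswith("#") and len(value) > 1:
                self.links.append(value[1:])


def dangling_links(html):
    """Return the link fragments that have no matching anchor in the page."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return [frag for frag in collector.links if frag not in collector.targets]
```

For example, `dangling_links('<a href="#_ftn1">[1]</a>')` reports `_ftn1` as long as nothing in the page carries `name="_ftn1"` or `id="_ftn1"`. Running it on the file before and after it goes through WordPad or the HTML editor would show at which step the anchors get lost.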

Now, whether a browser lays it out exactly as Word does: this most likely
will never happen 100%, but that has more to do with the basic difference
between a word processor and a browser.

I'd try the direct export from Word as HTML. If your target application
doesn't like the code, you can try running HTML Tidy from the W3C on the
HTML code (this will make the code "nicer", but you might lose some
layout information ...).
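
To give an idea of what such a cleanup step does, here is a minimal Python sketch that strips some of the markup Word typically adds to exported HTML. This is not how HTML Tidy works internally, and the patterns are assumptions about what the export contains (conditional comments, `<o:p>`-style tags, `Mso*` classes, inline `style` attributes); the function name `strip_word_markup` is made up. As with Tidy, you lose layout information: all inline styles are dropped, while bold and italic written as tags survive.

```python
import re


def strip_word_markup(html):
    """Remove some common Word-specific markup (illustrative, not exhaustive)."""
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    html = re.sub(r"<!--\[if.*?<!\[endif\]-->", "", html,
                  flags=re.DOTALL | re.IGNORECASE)
    # Office namespace tags such as <o:p> and </o:p>; the text inside is kept
    html = re.sub(r"</?[ovw]:[^>]*>", "", html, flags=re.IGNORECASE)
    # class attributes that refer to Word's Mso* styles (quoted or unquoted)
    html = re.sub(r"""\s+class=(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""", "",
                  html, flags=re.IGNORECASE)
    # Quoted inline style attributes; this is where layout details are lost
    html = re.sub(r"""\s+style=(?:"[^"]*"|'[^']*')""", "", html,
                  flags=re.IGNORECASE)
    return html
```

Regular expressions are a blunt tool for HTML, so treat this as a way to see what gets removed rather than as a replacement for a real cleanup tool, and check the result in a browser afterwards.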

Basically, it's not a good idea to use Word for web publishing while at
the same time expecting to get the exact layout in HTML. Better get your
CSS definitions right and don't bother too much about layout in Word. If
you can't live with that, produce PDF documents from your Word files and
put these on your website ...

HTH
Robert
 
