Transferring Word docs to FP

Terry Pinnell

Quite a lot of my client's input is in the form of nicely-formatted
Word documents, with tables etc. I had the devil's job with the first
of these I worked on a few months ago, after first using Word's
File|Save As|HTM. The resultant code had a lot of very strange stuff
in it. This time I reckon I'll save as text and then piece it together
in FP via the clipboard.

What approach do more experienced FP users take please?
 
Jim Buyens

I usually copy from Word, paste to Notepad, copy from
Notepad, and paste to FrontPage.

You might also want to try pasting directly from Word,
and then click the Paste Options button and choose either
Use Destination Styles or Keep Text Only. (The Paste
Options button is the little doohickey that appears near
the lower right corner of something you've just pasted.)

FP2003 has some special features for removing extraneous
HTML that Word puts in. If you can upgrade, that's worth
a try.

You might also want to try the SuperRemoveFormatting
macro downloadable from
http://www.interlacken.com/fp2002/newgizmos/default.htm.

Jim Buyens
Microsoft FrontPage MVP
http://www.interlacken.com
Author of:
*========----------
|\=========------------
|| Microsoft Office FrontPage 2003 Inside Out
|| Microsoft FrontPage Version 2002 Inside Out
|| Web Database Development Step by Step .NET Edition
|| Troubleshooting Microsoft FrontPage 2002
|| Faster Smarter Beginning Programming
|| (All from Microsoft Press)
|/=========------------
*========----------
 
ai429

This is a great question. I just posted a similar problem
moments ago. In my situation, not only did Word create
alien code in my Web page, it restructured the directories
in my FrontPage site by creating a folder called
[site]_files, added XML files, and then it deleted my
images folder and all its contents. That's right!

Unless someone has a better solution to your and my
problem, I might suggest you have Word save the file as
html and then use Macromedia DreamweaverMX to clean up the
mess.

Regards
 
JL Amerson

I'm not a "more experienced" user but I'd have saved the file as a .pdf and
posted that. Two steps and it's a done deal.
 
Terry Pinnell

Jim Buyens said:
I usually copy from Word, paste to Notepad, copy from
Notepad, and paste to FrontPage.

Thanks a lot, appreciate the help Jim. That's what I'm trying to do
right now. Please see my post (minutes before seeing this) asking why
it doesn't work the way I expect. IOW, why do paragraph marks show up
in FP 2000 as 'soft returns', not paragraph breaks? Makes a BIG
difference to the degree of editing needed...

Jim Buyens said:
You might also want to try pasting directly from Word,
and then click the Paste Options button and choose either
Use Destination Styles or Keep Text Only. (The Paste
Options button is the little doohickey that appears near
the lower right corner of something you've just pasted.)

That's a revelation - but I can't find anything like that! Can you
spell it out please? Here, I copy a piece of text in Word 2000, go to
FP 2000, position my cursor, Ctrl+v to paste it. And I just get the
text (complete with wrong sort of para marks). No sign of any 'paste
options' doohickey. The only options I get are if, instead of using the
plain Paste command, I use Paste Special, which brings up a window
'Convert Text' with 5 options. But none of those get me the para marks.

Jim Buyens said:
FP2003 has some special features for removing extraneous
HTML that Word puts in. If you can upgrade, that's worth
a try.

You might also want to try the SuperRemoveFormatting
macro downloadable from
http://www.interlacken.com/fp2002/newgizmos/default.htm.

Will take a look. But for now would like to get a grip on copy/paste.
 
Terry Pinnell

JL Amerson said:
I'm not a "more experienced" user but I'd have saved the file as a .pdf and
posted that. Two steps and it's a done deal.

PDF? What version of Word allows that? No PDF in Save As drop-down
here in Word 2000 (from Office Professional Suite 2000). News to me
it's now possible at all - I thought Adobe had that wrapped up tight!
 
Terry Pinnell

ai429 said:
This is a great question. I just posted a similar problem
moments ago. In my situation, not only did Word create
alien code in my Web page, it restructured the directories
in my FrontPage site by creating a folder called
[site]_files, added XML files, and then it deleted my
images folder and all its contents. That's right!

Unless someone has a better solution to your and my
problem, I might suggest you have Word save the file as
html and then use Macromedia DreamweaverMX to clean up the
mess.

Thanks. Don't have Macromedia DreamweaverMX. But did use a clean up
utility last time, with limited success. So immediate goals are
finding ways to:

1. Paste text so para marks stay as para marks, not those soft return
things you get with Shift+Enter.

2. Paste cell contents from Word table to FP table effectively, i.e.
not one cell at a time.
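
For anyone who would rather script these two goals than fight the clipboard, here is a rough Python sketch (not something FrontPage or Word provides). It assumes the Word text has already gone through Notepad, so that each paragraph is one line and, for a table, each row is one line with cells separated by tabs. The function names paragraphs_to_html and tsv_to_table are made up for illustration.

```python
from html import escape


def paragraphs_to_html(text: str) -> str:
    """Wrap each non-empty line of plain text in its own <p> element,
    so paragraph marks become real paragraphs instead of <br> soft returns."""
    lines = [line.strip() for line in text.splitlines()]
    return "\n".join(f"<p>{escape(line)}</p>" for line in lines if line)


def tsv_to_table(text: str) -> str:
    """Turn tab-separated rows into an HTML table.
    Assumption: one table row per line, cells separated by tabs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        cells = "".join(
            f"<td>{escape(cell.strip())}</td>" for cell in line.split("\t")
        )
        rows.append(f"  <tr>{cells}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(paragraphs_to_html("First paragraph\n\nSecond paragraph"))
    print(tsv_to_table("Task\tOwner\nDraft spec\tSimon"))
```

The output can be pasted into FrontPage's HTML view. Cells that contain more than one paragraph would break the one-row-per-line assumption and need hand editing.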
 
Stefan B Rusynko

There are many PDF generators that work with Office, such as Win2PDF.

|
| >I'm not a "more experienced" user but I'd have saved the file as a .pdf and
| >posted that. Two steps and it's a done deal.
|
| PDF? What version of Word allows that? No PDF in Save As drop-down
| here in Word 2000 (from Office Professional Suite 2000). News to me
| it's now possible at all - I thought Adobe had that wrapped up tight!
|
| --
| Terry, West Sussex, UK
|
 
Brett...

Terry said:
PDF? What version of Word allows that? No PDF in Save As drop-down
here in Word 2000 (from Office Professional Suite 2000). News to me
it's now possible at all - I thought Adobe had that wrapped up tight!

There are lots of PDF add-ins for Word, Publisher et al.
You don't usually "save as" but "print to" PDF.
 
Jack Brewster

That Office filter will only install if you have Office 2000 installed.
That really stinks because I used to use and recommend this tool, too.

What really stinks is, if you install Office 2000, then install the tool,
then install an Office upgrade, like XP, the tool will work just fine. The
problem is in the installer.
 
Jack Brewster

Terry,

There is nothing built-in to Office to allow PDF export. You would need to
track down a third party app such as:
- Acrobat
  De facto standard, expensive!
- pdfFactory
  http://www.pdffactory.com/products/pdffactory/index.html
  I've used both their FinePrint and pdfFactory products and I think both
  are great products.
  There are two versions, but I think the $50 'non-Pro' version will take
  care of you.

There are many others, heck you may even find a free one if you look hard
enough.

Good luck!
 
JL Amerson

It's not part of Word. I use Acrobat but from reading this group I've
learned there are free programs that will allow you to create a .pdf file.
 
Terry Pinnell

Jack Brewster said:
Terry,

There is nothing built-in to Office to allow PDF export. You would need to
track down a third party app such as:
- Acrobat
  De facto standard, expensive!
- pdfFactory
  http://www.pdffactory.com/products/pdffactory/index.html
  I've used both their FinePrint and pdfFactory products and I think both
  are great products.
  There are two versions, but I think the $50 'non-Pro' version will take
  care of you.

There are many others, heck you may even find a free one if you look hard
enough.

Good luck!

PDF approach
------------
Thanks Jack. I remembered that I do have such a PDF converter
installed, called PDF995 (free version) so I thought I'd first try
that to satisfy my curiosity. But is this perhaps a misunderstanding
of some kind?

I found at first that it didn't work (Word 2000 didn't ask for a file
name after using Print|PDF995). But after re-installing it was OK. I
saved my DOC file as a PDF and read it OK in Adobe Reader. So far so
good.

But on importing that file into FP 2000, I took the option to 'Copy
here as Web format'. That just gave me a page that contained 2 lines
of gobbledegook.

Have you or anyone else used this method to get *HTML* into FP?
Obviously, I can import the PDF file itself. But that's just a larger
file than the original Word DOC. And both would require the link
opening Adobe or Word respectively.

MSOffice HTML Filter 2.0 route
------------------------------
I already had this utility too, but again it didn't work until I
reinstalled. On importing into FP and displaying the page it looks
promising.

However, one strange major flaw is that some of the text, and all of
the tables are abruptly cut off on the left. Looking very briefly at
the HTML before I call it a night, it seems to be due to a kluge of
stuff like this:
p.MsoBodyText, p.MsoBodyTextIndent, etc

Stefan's '2-stage' copy/paste method
 
Brett...

Terry said:
But on importing that file into FP 2000, I took the option to 'Copy
here as Web format'. That just gave me a page that contained 2 lines
of gobbledegook.

Have you or anyone else used this method to get *HTML* into FP?
Obviously, I can import the PDF file itself. But that's just a larger
file than the original Word DOC. And both would require the link
opening Adobe or Word respectively.

I can't work out what you are doing here.
Just treat the .pdf file as you would any other type of file...
Upload it to your web and create a link to it as you would to an HTML page.
There is no concept of conversion.
 
JL Amerson

True - but the Acrobat Reader is *free* and contrary to what you might have
heard, not everyone has Word. Plus you don't have to worry about which
version of Word the viewer has or what fonts they have.


(snip)
 
Jack Brewster

I think the point, which has been lost in the thread, is that Terry _wants_
the contents of this Word file in HTML format, not in PDF format. My
apologies for not catching that one earlier, Terry.

As for the Office Filter, I can't say. When I used it I had really great
results. You may need to play with the options a bit. Maybe you haven't
set it to remove all the nasty bits yet?
 
Terry Pinnell

Jack Brewster said:
I think the point, which has been lost in the thread, is that Terry _wants_
the contents of this Word file in HTML format, not in PDF format.

Correct! I don't want to *link* to such files - I want to use their
content within the website, on a 'standard' page. With consistent colour
scheme, font, header styles, etc.

Jack Brewster said:
My apologies for not catching that one earlier, Terry.

No problem, I *thought* there had to be some misunderstanding!

Jack Brewster said:
As for the Office Filter, I can't say. When I used it I had really great
results. You may need to play with the options a bit. Maybe you haven't
set it to remove all the nasty bits yet?

You're right, many thanks! I've just had another crack at it, this
time enabling *all* options, not just the top set, and it worked OK.

For others who end up googling here with a similar problem, the filter
has one set of options at the top enabled by default, then the
following set which are initially disabled:

Use VML for displaying graphics
Remove standard CSS
Remove all STYLE elements
Remove standard @rule constructs

By trial/error I established that the critical option to enable (which
maybe should have been obvious to me) was 'Remove standard CSS'. In
this case, that gave me a 42 KB file from the original 174 KB HTM
file. Enabling all 4 of that lower set gave the same size.
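
To make the idea behind 'Remove standard CSS' concrete, here is a minimal Python sketch of that kind of clean-up. It is not how the Office HTML Filter itself works; it only assumes that the unwanted layout comes from <style> blocks and from class attributes beginning with Mso (such as the p.MsoBodyText rules seen earlier). The function name strip_word_css and the sample markup are invented for illustration, and regular expressions are a crude tool for HTML, so treat it as a starting point only.

```python
import re

# Whole <style>...</style> blocks, including any attributes on the tag.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)

# class attributes whose value starts with "Mso": quoted or unquoted.
MSO_CLASS = re.compile(
    r"""\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso[^\s>]*)""",
    re.IGNORECASE,
)


def strip_word_css(html: str) -> str:
    """Remove <style> blocks and Mso* class attributes from Word-saved HTML."""
    html = STYLE_BLOCK.sub("", html)
    return MSO_CLASS.sub("", html)


if __name__ == "__main__":
    # Invented sample, loosely modelled on Word 2000 output.
    sample = (
        "<html><head><style>p.MsoBodyText {margin-left:-36pt}</style></head>"
        "<body><p class=MsoBodyText>Hello</p></body></html>"
    )
    print(strip_word_css(sample))
```

Other attributes on the same tag are left alone, so alignment and similar settings survive; only the style sheet and the hooks into it are dropped.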

For background (and possibly to assist with ongoing queries) here are
more details:

This temporary (unfinished) page
http://www.cupod-mentoring.com/swprojectdoc_copy.htm
shows the page content I tediously created the way I described
earlier, e.g. pasting into each cell of table from original Word
table, etc. I have yet to continue that process. Need to master it, as
in many ways it's the preferred way I'd like to work, as it lets me
add content to an existing formatted page. Next time will try your and
Stefan's recommendations about the copy/paste procedure. Anyway, that
page gives you an idea of what result I want.

The original Word doc is here:
http://www.cupod-mentoring.com/simonoriginal.doc

I used File|Save As in Word 2000 to convert that to an HTML file,
which I've copied to the site for the time being:
http://www.cupod-mentoring.com/simonoriginal-save1.htm
(I changed its name, as the Office Filter overwrites same name later.)
Opening that shows the flawed result I described, i.e. left part of
text cut off in places.

BTW, I note that opening the HTML file creates a subfolder
'simonoriginal-save1_files', containing 2 files: filelist.xml and
header.htm. Not entirely sure what if any impact this has on FP - for
the time being I've cheerfully ignored it!

I now run Office HTML Filter 2.0 and use Add to select
simonoriginal.htm (identical to simonoriginal-save1.htm), then Apply,
and finally Close. BTW, I see there's no indication when it's
finished? However, if you watch the folder, it's signaled by the
appearance of filename.bak. Happily, that filtered result now looks
fairly good. Some work needed on headers, but the left cut-off is
cured.
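
Since the filter gives no completion signal, watching for the backup file can be scripted. A small Python sketch follows, assuming the backup appears next to the HTML file as either name.bak or name.htm.bak (the exact naming is a guess). wait_for_backup is a hypothetical helper, not part of the filter.

```python
import time
from pathlib import Path


def wait_for_backup(html_path, timeout=60.0, poll=0.5):
    """Return True once a .bak file for html_path appears, False on timeout.

    Assumption: the backup is written beside the original, named either
    <stem>.bak or <full name>.bak.
    """
    html_path = Path(html_path)
    candidates = [
        html_path.with_suffix(".bak"),      # e.g. simonoriginal.bak
        Path(str(html_path) + ".bak"),      # e.g. simonoriginal.htm.bak
    ]
    deadline = time.monotonic() + timeout
    while True:
        if any(p.exists() for p in candidates):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)


if __name__ == "__main__":
    # Start this just before clicking Apply in the filter.
    done = wait_for_backup("simonoriginal.htm", timeout=120)
    print("Filter finished" if done else "No backup file seen yet")
```

A backup left over from an earlier run would trigger it immediately, so delete or rename old .bak files first.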

Of course, that gives me a plain, separate page. So next step is to
work out how to get content on a standard page, like my manually
prepared example.

Thanks for sticking with me on this!
 
