Converting PDF to DOC

Graham Mayor

Not easily - and the successful method depends on the tools you have
available and whether protection options have been set in the pdf.

Most expensive: If you have Acrobat and the password for the document (if
there is one) you can save the document as a Word document.
You could use OCR software - FineReader from version 5 on will read PDF and
convert to Word (if unprotected). Recent versions of OmniPage will do so
also.
If you own a scanner (or a digital camera), you could print the document and
scan (or photograph) the pages and use OCR to convert the images.
You could extract the text with SnagIt.
There are links to some of these applications from the favourite page of my
web site.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
Graham Mayor - Word MVP
E-mail (e-mail address removed)
Web site www.gmayor.com
Word MVP web site www.mvps.org/word
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
 
Graham Mayor

Not if you use the Acrobat or SnagIt options, but the former is expensive
and reliant on you having any password that protects the document and the
latter will only extract sections of text as unformatted text. OCR is
probably the best option.

 
Guest

You can cut and paste texts and pictures separately from PDF into WORD but
if you want the document in Word look exactly like its PDF version, you have
to format it manually
 
Guest

In my case, I just need the text from the PDF file. I cannot copy and paste because that cannot be done programmatically. How can I sort this out?
 
Graham Mayor

PDF is a graphical format. The text is only accessible via Acrobat or OCR
software, as I believe we have already told you.

 
AA

> PDF is a graphical format. The text is only accessible via Acrobat or OCR
> software, as I believe we have already told you.

I've been able to copy text from a PDF file to the clipboard
 
AA

> Not if it is protected.

So that's the easy way to find out if a PDF file is protected: see if
you can mark and copy text?

I assumed that there were different kinds of PDF files, ones that had
copyable text, and others that were just graphics, pictures of the
text and therefore not copyable. Wrong assumption I guess.
 
Jean-Guy Marcil

AA said:
> So that's the easy way to find out if a PDF file is protected, see if
> you can mark and copy text?
>
> I assumed that there were different kinds of PDF files, ones that had
> copyable text, and others that were just graphics, pictures of the
> text and therefore not copyable. Wrong assumption I guess.

Not quite... if you scan a page and do not run it through OCR, you will end
up with a picture of a page (picture of the text...). For example, people
who scan whole books do not usually bother going through OCR for every
single page.

The protection is not designed to prevent you from only grabbing the text...
you cannot grab anything.

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
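
For anyone who wants to tell these cases apart programmatically, here is a rough Python sketch using only the standard library. The helper name `pdf_quick_check` is made up for illustration. It relies on two facts about the file format: an encrypted (protected) PDF carries an `/Encrypt` entry in its trailer, and selectable text needs font resources, while a scanned page is an image object. The checks are byte-level heuristics, not a real PDF parser; newer PDFs can keep font and image dictionaries inside compressed object streams, where this scan will not see them, and a scan that was run through OCR will usually show both fonts and images.

```python
import re


def pdf_quick_check(data: bytes) -> dict:
    """Byte-level heuristics only (hypothetical helper, not a PDF parser)."""
    return {
        # A PDF normally has a "%PDF-" header near the top of the file.
        "is_pdf": b"%PDF-" in data[:1024],
        # Encrypted (protected) files carry an /Encrypt entry in the trailer.
        "encrypted": re.search(rb"/Encrypt\b", data) is not None,
        # Font resources suggest real, selectable text.
        "has_fonts": re.search(rb"/Font\b", data) is not None,
        # Image objects without any fonts suggest scanned pages.
        "has_images": re.search(rb"/Subtype\s*/Image\b", data) is not None,
    }


# Hand-written fragment standing in for a scanned, unprotected file.
scanned = b"%PDF-1.3\n1 0 obj\n<< /Type /XObject /Subtype /Image >>\nendobj\n"
print(pdf_quick_check(scanned))
```

For a real file, read it in binary mode and pass the bytes to the function. A result with images but no fonts points to the "picture of the text" case, where OCR is the only way to get the words out.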
 
AA

> The protection is not designed to prevent you from only grabbing
> the text... you cannot grab anything.

Ok, I understand now. Merci Jean-Guy. I guess I haven't tried to
copy from a protected PDF file very much, or I don't receive many
protected files.

> For example, people who scan whole books do not usually bother
> going through OCR for every single page.

So in this case, assuming the file is not protected, what I would be
copying would be graphics, and would be useless to paste into a text
editor or word processor for editing?


Andy
 
Jean-Guy Marcil

Hi Andy,

AA said:
> Ok, I understand now. Merci Jean-Guy. I guess I haven't tried to
> copy from a protected PDF file very much, or I don't receive many
> protected files.

If you have Acrobat Writer, you can play around with that protection
feature... Go to File > Document Properties... or something like that...
writing from memory here...

> So in this case, assuming the file is not protected, what I would be
> copying would be graphics, and would be useless to paste into a text
> editor or word processor for editing?

That's right, but you could run the "graphics" through OCR if your OCR
software can handle the graphic format it is in; then you could generate
text.

 
Suzanne S. Barnhill

Actually, if a PDF is not protected, you can copy text as text. There are
specific selection buttons for text and graphics.

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
Jean-Guy Marcil

Hi Suzanne,

Normally yes, but here we were writing about PDF documents containing text
pages that were obtained from scanning a book, but not run through OCR, so
each page is a big single graphic. Even from within Acrobat you cannot
select the text. The text tool will not work, but the object tool does, and
if you use it, you end up moving the whole page around.

In those cases, you have to run the graphical page through OCR software
to re-extract the text, since that step was not done when the original
scanning took place.
I believe it could be done... I have never tried OCR from a PDF page, but I
know you can export a PDF page as a graphic, and OCR software usually accepts
graphics from a scanner to extract the text... So it should work from a PDF
graphic, as long as the graphic is saved in a format recognized by the OCR.

Cheers!
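
The "export the page as a graphic" step can sometimes be done without Acrobat. When a scanned page is stored as JPEG data (filter `/DCTDecode` in PDF terms), the stream body is the JPEG file as-is, so it can be copied out and handed to OCR software. The Python sketch below assumes exactly that case; the function name `embedded_jpegs` is hypothetical, the regular expression is a crude stand-in for a real parser, and scans stored with other filters (fax-style or Flate compression, for example) are not covered. It will not help with a protected file, where the streams are encrypted.

```python
import re

# One indirect object whose dictionary is followed by a stream body.
# (?!endobj) keeps the dictionary part from running into the next object.
_STREAM_OBJ = re.compile(
    rb"obj\s*<<(?P<dict>(?:(?!endobj).)*?)>>\s*stream\r?\n"
    rb"(?P<body>.*?)\r?\nendstream",
    re.DOTALL,
)


def embedded_jpegs(data: bytes) -> list:
    """Return the raw bytes of every stream whose dictionary names /DCTDecode."""
    return [
        m.group("body")
        for m in _STREAM_OBJ.finditer(data)
        if b"/DCTDecode" in m.group("dict")
    ]


# Tiny hand-built example: one fake JPEG stream and one page content stream.
fake_jpeg = b"\xff\xd8\xff\xe0not-a-real-picture\xff\xd9"
sample = (
    b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Subtype /Image /Filter /DCTDecode >>\nstream\n"
    + fake_jpeg
    + b"\nendstream\nendobj\n"
    b"3 0 obj\n<< /Length 5 >>\nstream\nBT ET\nendstream\nendobj\n"
)
print(len(embedded_jpegs(sample)))
```

Each returned item can be written to a `.jpg` file in binary mode and opened by any OCR package that accepts JPEG input.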

 
AA

> That's right, but you could run the "graphics" through OCR if your OCR
> software can handle the graphic format it is in; then you could generate
> text.

Got it. Thanks Jean-Guy

Andy
 
AA

> Actually, if a PDF is not protected, you can copy text as text. There are
> specific selection buttons for text and graphics.

But what about the example Jean-Guy gave, where the text is actually a
picture of the text?
 
Suzanne S. Barnhill

Ah, okay. I hadn't been following the thread that closely.

 
Sparky191

Jean-Guy Marcil said:
> Hi Suzanne,
>
> Normally yes, but here we were writing about PDF documents containing
> text pages that were obtained from scanning a book, but not run through
> OCR,

I seem to have missed where this thread mentions that?

PDFs are not a graphical format. They are simply a file format. They
can be made up of text and/or graphics. Using PDFs to hold graphics
of text is like taking screenshots of Word documents then pasting them
back into Word as graphics. Yes, you can do it, but it's not the best use
of the format.
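
To illustrate the point about text versus graphics: in a text-based PDF, a page's content stream holds the words as strings followed by a "show text" operator (`Tj`), whereas a scanned page only draws an image. Here is a minimal Python sketch of pulling such strings out of an uncompressed content stream. The function name `show_text_strings` and both sample streams are made up for the example. Real files usually compress their content streams and may use font encodings that do not map directly to readable characters, so treat this as an illustration of the idea and not as a working extractor.

```python
import re

# A PDF literal string "( ... )" followed by the Tj (show text) operator.
# Backslash escapes such as \( and \) are allowed inside the string.
_TJ = re.compile(rb"\((?P<s>(?:\\.|[^\\()])*)\)\s*Tj")


def show_text_strings(content: bytes) -> list:
    """Return the strings shown with Tj in an uncompressed content stream."""
    found = []
    for m in _TJ.finditer(content):
        # Undo the simple escapes \( \) and \\ ; other escapes are left alone.
        raw = re.sub(rb"\\([()\\])", rb"\1", m.group("s"))
        found.append(raw.decode("latin-1"))
    return found


# A text page: sets a font, positions the cursor, then shows a string.
text_page = b"BT /F1 12 Tf 72 720 Td (Hello, PDF \\(as text\\)) Tj ET"
# A scanned page: only scales and paints an image named /Im1.
scanned_page = b"q 612 0 0 792 0 0 cm /Im1 Do Q"

print(show_text_strings(text_page))     # ['Hello, PDF (as text)']
print(show_text_strings(scanned_page))  # []
```

The empty result for the second stream is the "picture of the text" situation described earlier in the thread: there are no strings to extract, so OCR is required.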
 
