Converting PDF to DOC

  • Thread starter: Sasa
Not easily - and the successful method depends on the tools you have
available and whether protection options have been set in the pdf.

Most expensive: If you have Acrobat and the password for the document (if
there is one) you can save the document as a Word document.
You could use OCR software - FineReader from version 5 on will read PDF and
convert to Word (if unprotected). Recent versions of OmniPage will do so
also.
If you own a scanner (or a digital camera) you could print the document and
scan (or photograph) the pages and use OCR to convert the images.
You could extract the text with SnagIt.
There are links to some of these applications from the favourite page of my
web site.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
Graham Mayor - Word MVP
E-mail (e-mail address removed)
Web site www.gmayor.com
Word MVP web site www.mvps.org/word
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
 
Not if you use the Acrobat or SnagIt options, but the former is expensive
and reliant on you having any password that protects the document and the
latter will only extract sections of text as unformatted text. OCR is
probably the best option.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
Graham Mayor - Word MVP
E-mail (e-mail address removed)
Web site www.gmayor.com
Word MVP web site www.mvps.org/word
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
 
You can cut and paste text and pictures separately from a PDF into Word, but
if you want the document in Word to look exactly like its PDF version, you have
to format it manually.
 
In my case, I just need the text from the PDF file. I cannot copy and paste because that cannot be done programmatically. How can I sort this out?
 
PDF is a graphical format. The text is only accessible via Acrobat or OCR
software, as I believe we have already told you.

--
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
Graham Mayor - Word MVP

Web site www.gmayor.com
Word MVP web site www.mvps.org/word
<>>< ><<> ><<> <>>< ><<> <>>< <>>< ><<>
 
PDF is a graphical format. The text is only accessible via Acrobat or OCR
software, as I believe we have already told you.

I've been able to copy text from a PDF file to the clipboard
 
Not if it is protected.
So that's the easy way to find out if a PDF file is protected, see if
you can mark and copy text?

I assumed that there were different kinds of PDF files, ones that had
copyable text, and others that were just graphics, pictures of the
text and therefore not copyable. Wrong assumption I guess.
 
AA said:
So that's the easy way to find out if a PDF file is protected, see if
you can mark and copy text?

I assumed that there were different kinds of PDF files, ones that had
copyable text, and others that were just graphics, pictures of the
text and therefore not copyable. Wrong assumption I guess.
Not quite... if you scan a page and do not run it through OCR, you will end
up with a picture of a page (picture of the text...). For example, people
who scan whole books do not usually bother going through OCR for every
single page.

The protection is not designed to prevent you from only grabbing the text...
you cannot grab anything.

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
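
A rough way to check for protection by program, rather than by trying to select text: an encrypted PDF normally references an /Encrypt dictionary from its trailer. The Python sketch below only scans the raw bytes for that key. The function name is made up for illustration, and the check is a heuristic under that assumption: it does not parse the file, and it does not say which permissions (copying, printing) are actually restricted.

```python
import re

# Match the /Encrypt key itself, but not longer names such as /EncryptMetadata.
ENCRYPT_KEY_RE = re.compile(rb"/Encrypt(?![A-Za-z])")


def is_probably_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw file mentions an /Encrypt dictionary."""
    return ENCRYPT_KEY_RE.search(pdf_bytes) is not None


# Typical use (the file name is a placeholder):
#   with open("document.pdf", "rb") as handle:
#       print(is_probably_encrypted(handle.read()))
```

A file that passes this check can still refuse text selection for the other reason discussed in this thread, namely that its pages are scanned images.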
 
The protection is not designed to prevent you from only grabbing
the text... you cannot grab anything.

Ok, I understand now. Merci Jean-Guy. I guess I haven't tried to
copy from a protected PDF file very much, or I don't receive many
protected files.
For example, people who scan whole books do not usually bother
going through OCR for every single page.

So in this case, assuming the file is not protected, what I would be
copying would be graphics, and would be useless to paste into a text
editor or word processor for editing?


Andy
 
Hi Andy,

AA said:
Ok, I understand now. Merci Jean-Guy. I guess I haven't tried to
copy from a protected PDF file very much, or I don't receive many
protected files.

If you have Acrobat Writer, you can play around with that protection
feature... Go to File > Document Properties... or something like that...
writing from memory here...
So in this case, assuming the file is not protected, what I would be
copying would be graphics, and would be useless to paste into a text
editor or word processor for editing?

That's right, but you could run the "graphics" through OCR if your OCR
software can handle the graphic format it is in; then you could generate
text.

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
 
Actually, if a PDF is not protected, you can copy text as text. There are
specific selection buttons for text and graphics.

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
Hi Suzanne,

Normally yes, but here we were writing about PDF documents containing text
pages that were obtained from scanning a book, but not run through OCR, so
each page is a big single graphic. Even from within Acrobat you cannot
select the text. The text tool will not work, but the object tool does, and
if you use it, you end up moving the whole page around.

In those cases, you have to run the graphical page through OCR software
to re-extract the text since that step was not done when the original
scanning took place.
I believe it could be done... I have never tried OCR from a PDF page, but I
know you can export a PDF page as a graphic, and OCR software usually accepts
graphics from a scanner to extract the text... So it should work from a PDF
graphic, as long as the graphic is saved in a format recognized by the OCR.

Cheers!

--
Salut!
_______________________________________
Jean-Guy Marcil - Word MVP
(e-mail address removed)
Word MVP site: http://www.word.mvps.org
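
The scanned-book case described above can often be spotted by program before reaching for OCR. This is a minimal Python sketch, assuming that a page with real text needs at least one font object and that a scan shows up as an image XObject. The function names are invented for illustration, the file is searched rather than parsed, and only Flate-compressed streams are inflated, so treat the answer as a hint rather than a verdict.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
FONT_RE = re.compile(rb"/Type\s*/Font\b")
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def inflated_streams(pdf_bytes: bytes):
    """Yield the contents of every stream that inflates as Flate data."""
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            yield zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image); skip it


def looks_like_scanned_pages(pdf_bytes: bytes) -> bool:
    """Heuristic: images are present but no font object can be found."""
    # Font dictionaries may sit inside compressed object streams,
    # so search both the raw file and every inflated stream.
    chunks = [pdf_bytes, *inflated_streams(pdf_bytes)]
    has_font = any(FONT_RE.search(chunk) for chunk in chunks)
    # Image dictionaries belong to stream objects and stay uncompressed.
    has_image = IMAGE_RE.search(pdf_bytes) is not None
    return has_image and not has_font
```

If this returns True, copying "text" will only ever give you the page picture, and exporting the page as a graphic for OCR is the route to take.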
 
That's right, but you could run the "graphics" through OCR if your OCR
software can handle the graphic format it is in; then you could generate
text

Got it. Thanks Jean-Guy

Andy
 
Actually, if a PDF is not protected, you can copy text as text. There are
specific selection buttons for text and graphics.

But what about the example Jean-Guy gave, where the text is actually a
picture of the text?
 
Ah, okay. I hadn't been following the thread that closely.

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.
 
Jean-Guy Marcil said:
*Hi Suzanne,

Normally yes, but here we were writing about PDF documents containing text
pages that were obtained from scanning a book, but not run through OCR, *

I seem to have missed where this thread mentions that.

PDFs are not a graphical format. They are simply a file format. They
can be made up of text and/or graphics. Using PDFs to hold graphics
of text is like taking screenshots of Word documents then pasting them
back into Word as graphics. Yes, you can do it, but it's not the best use
of the format.
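
For the earlier question about getting the text out by program: when a PDF does carry real text, that text sits in the page content streams as string operands of the Tj and TJ operators. Below is a deliberately naive Python sketch of pulling those strings out with the standard library only. The function names are invented for illustration. It assumes an unprotected file, streams that are either uncompressed or Flate-compressed, and fonts whose character codes are plain Latin-1. Files with other filters, custom font encodings, hex strings or nested parentheses will give partial or garbled results, so a dedicated PDF library or OCR remains the safer choice.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
STRING_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# Either "(string) Tj" or "[(str) -20 (ing)] TJ".
SHOW_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj|\[([^\]]*)\]\s*TJ", re.DOTALL)
ESCAPE_RE = re.compile(rb"\\([0-7]{1,3}|.)", re.DOTALL)
SIMPLE_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
                  b"f": b"\f", b"\n": b""}


def _unescape(raw: bytes) -> str:
    """Resolve backslash escapes in a PDF literal string."""
    def replace(match):
        escape = match.group(1)
        if escape[:1] in b"01234567":           # octal character code
            return bytes([int(escape, 8) & 0xFF])
        return SIMPLE_ESCAPES.get(escape, escape)
    return ESCAPE_RE.sub(replace, raw).decode("latin-1")


def extract_text_naive(pdf_bytes: bytes) -> list[str]:
    """Return the strings shown by Tj/TJ operators, in file order."""
    pieces = []
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not Flate data; try the stream as it is
        if b"BT" not in body:                    # no text object in here
            continue
        for show in SHOW_RE.finditer(body):
            single, array = show.group(1), show.group(2)
            strings = [single] if single is not None else STRING_RE.findall(array)
            pieces.append("".join(_unescape(s) for s in strings))
    return pieces
```

An empty result usually means one of the two situations covered in this thread: the file is protected (the streams are encrypted), or the pages are pictures of text and need OCR.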
 