Is there a program to convert .pdf to .txt?

  • Thread starter: Chris
wald said:
XPdf has a "pdftotext" tool that does just that, besides
"pdfimages", "pdftops" and "pdffonts". It's open source, does the
job perfectly.

http://www.foolabs.com/xpdf/download.html

Unfortunately, it works no better than copying text in Acrobat Reader
then pasting it into Wordpad, or simply doing "File", "Save as Text..."
in that it loses spaces at the beginning of a line as well as double
hard returns.
 
Is there a program to convert .pdf to .txt?

Easy PDF to Text Converter

Easy PDF to Text Converter is freeware. It works well on Windows
98/ME/2000/NT/XP Platform.

Features of Easy PDF to Text Converter

* Supports PDF to Text file conversion
* Convert batches of PDF files to Text files at one time
* Processes the conversion with very high speed
* Does NOT need Adobe Acrobat software
* Keeps the original page layout when converting PDF to text
* Supports drag and drop of files
* Supports the PDF 1.5 protocol (formerly only supported by Acrobat 6.0)
* Works well on Win98/ME/NT/2000/XP platforms
* User-friendly interface and easy to use!

http://www.pdf-to-html-word.com/pdf-to-text/
 
John Corliss said:
Unfortunately, it works no better than copying text in Acrobat
Reader then pasting it into Wordpad, or simply doing "File",
"Save as Text..." in that it loses spaces at the beginning of a
line as well as double hard returns.

Well, in case you want to preserve formatting as much as possible,
it's probably better to use something like pdftohtml
(http://pdftohtml.sourceforge.net/), which converts PDF to... euh,
well, HTML :-)

Regards,
Wald
 
wald said:
Well, in case you want to preserve formatting as much as possible,
it's probably better to use something like pdftohtml
(http://pdftohtml.sourceforge.net/), which converts PDF to... euh,
well, HTML :-)

Thanks, but I've tried that. What you do is convert the .pdf file to
.html, then open the page in your browser. Next you save the web page as
a text file (be sure to manually change the extension to ".txt" or it
won't work.)

Unfortunately, the same limitations that I listed above are present.
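
For anyone who would rather script the browser round trip described
above, here is a rough Python sketch that pulls the text out of an HTML
page using only the standard library. It assumes the converter marks line
breaks with <br> tags and indentation with &nbsp; entities, which may not
hold for every file; the TextExtractor class and html_to_text function are
names made up for this example, not part of pdftohtml.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, turning <br> and block endings into line breaks."""

    BLOCK_TAGS = {"p", "div", "tr", "li", "h1", "h2", "h3"}
    SKIPPED_TAGS = {"script", "style", "title"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag == "br":
            # Each <br> becomes one newline, so <br><br> keeps a blank line.
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with non-breaking spaces as plain spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of the input
    return "".join(parser.parts).replace("\xa0", " ")
```

Whether this keeps more of the layout than "Save as Text" depends
entirely on how the HTML was generated, so treat it as a starting point
rather than a fix.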
 
CharlieDontSurf said:
Easy PDF to Text Converter

Easy PDF to Text Converter is freeware. It works well on Windows
98/ME/2000/NT/XP Platform.

Features of Easy PDF to Text Converter

* Supports PDF to Text file conversion
* Convert batches of PDF files to Text files at one time
* Processes the conversion with very high speed
* Does NOT need Adobe Acrobat software
* Keeps the original page layout when converting PDF to text
* Supports drag and drop of files
* Supports the PDF 1.5 protocol (formerly only supported by Acrobat 6.0)
* Works well on Win98/ME/NT/2000/XP platforms
* User-friendly interface and easy to use!

http://www.pdf-to-html-word.com/pdf-to-text/

CharlieDontSurf,
I downloaded and installed this program. It's nice and the install is
fairly clean, but when I converted one .pdf document it lost a lot of
spaces between words. It didn't do this in all files that I converted
however.
Also, in any multipage .pdf file that I tried to convert to text, it
saved each page as a separate text file. Appending those pages to each
other is a real pain. Still, it comes closest to anything I've seen in
this thread to keeping original page layout.
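
Appending the per-page files by hand can be avoided with a short script.
The Python sketch below joins them in page order. The file naming is an
assumption: the thread does not say what the converter calls its output
files, so the glob pattern is left to the caller and the page number is
taken to be the last run of digits in each name. UTF-8 is also assumed;
older Windows tools may write a different encoding.

```python
import re
from pathlib import Path


def page_number(path):
    """Return the last run of digits in the file name, or 0 if there is none."""
    digits = re.findall(r"\d+", path.stem)
    return int(digits[-1]) if digits else 0


def merge_pages(folder, pattern, output_name):
    """Append every matching page file, in page order, into one text file."""
    folder = Path(folder)
    output = folder / output_name
    # Sort numerically so that page 10 comes after page 9, not after page 1.
    pages = sorted(
        (p for p in folder.glob(pattern) if p != output),
        key=page_number,
    )
    with output.open("w", encoding="utf-8") as merged:
        for page in pages:
            text = page.read_text(encoding="utf-8", errors="replace")
            merged.write(text)
            if not text.endswith("\n"):
                merged.write("\n")  # keep pages from running together
    return output
```

For example, merge_pages("C:/converted", "report_page*.txt", "report.txt")
would combine files named like report_page1.txt, report_page2.txt and so
on; those names are hypothetical.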
 
John Corliss said:
Unfortunately, it works no better than copying text in Acrobat
Reader then pasting it into Wordpad, or simply doing "File",
"Save as Text..." in that it loses spaces at the beginning of a
line as well as double hard returns.

Just to be sure... have you looked at the available options
(pdftotext -h)? The -layout option looks like it might improve the
results for you, although I haven't tested it.

Regards,
wald
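
A minimal sketch of the suggestion above, for anyone who wants to run
pdftotext from a script instead of typing the command each time. It
assumes pdftotext is installed and on the PATH; the function names and
the choice to write the .txt file next to the PDF are made up for this
example. The only option used is -layout, which the tool's help lists as
keeping the original physical layout.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Return the argument list for one pdftotext run."""
    command = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout,
        # which is where leading spaces and blank lines come from.
        command.append("-layout")
    command += [str(pdf_path), str(txt_path)]
    return command


def convert(pdf_path, keep_layout=True):
    """Convert one PDF to a .txt file next to it and return the new path."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext was not found on PATH")
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(
        build_pdftotext_command(pdf_path, txt_path, keep_layout),
        check=True,  # raise if pdftotext reports a failure
    )
    return txt_path
```

Calling convert("manual.pdf") would be the same as running
"pdftotext -layout manual.pdf manual.txt" by hand (the file name is only
an illustration). As wald says, whether -layout actually restores the
lost spacing has not been tested here.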
 
John Corliss wrote, twice: ...
I don't know if you've noticed, but your latest postings appear twice.
 
wald said:
Just to be sure... have you looked at the available options
(pdftotext -h)? The -layout option looks like it might improve the
results for you, although I haven't tested it.

Ack! And I did RTFM. Don't know why I didn't try that. Well, I'll give
it another go.
 
Michael said:
John Corliss wrote, twice: ...
I don't know if you've noticed, but your latest postings appear twice.

Yes, I did notice that. My hand slipped and pressed the send button
twice. I canceled it, but not all newsfeeds honor cancels.
 
CharlieDontSurf,
I downloaded and installed this program. It's nice and the install is
fairly clean, but when I converted one .pdf document it lost a lot of
spaces between words. It didn't do this in all files that I converted
however.
Also, in any multipage .pdf file that I tried to convert to text, it
saved each page as a separate text file. Appending those pages to each
other is a real pain. Still, it comes closest to anything I've seen in
this thread to keeping original page layout.

Thanks for the report, John. I haven't used it myself, just stumbled
upon it recently and thought I'd pass it along. Sounds like it might be
worth checking out.
 