Convert PDF to HTML?

  • Thread starter: rth

Is there any freeware that converts PDF to HTML? Especially for PDFs with
images and tables / columns in them?
 
rth said:
Is there any freeware that converts PDF to HTML? Especially for PDFs with
images and tables / columns in them?

Yesterday I looked and looked. Found one named "Clickcat PDF-to-HTML"
which you should avoid because the free version has this limitation:

"In the generated HTML code some letters are exchanged."

Also, the download is 43 MB!

I did, however, find PDF2HTML at Snapfiles:

http://www.snapfiles.com/get/pdf2html.html

Note that it's a command-line tool, though. Homepage is here:

http://sourceforge.net/projects/pdftohtml/
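
Since it's command-line only, here is a rough sketch of what a run might look like when wrapped in a small Python script. The file names (`manual.pdf`, `manual`) and the helper names are made up for illustration. The `-c` and `-noframes` switches are the ones pdftohtml documents, but check `pdftohtml -h` on your own copy before relying on them.

```python
import shutil
import subprocess


def build_pdftohtml_cmd(pdf_path, out_base, complex_layout=True):
    """Return the argument list for a pdftohtml run (nothing is executed here)."""
    cmd = ["pdftohtml"]
    if complex_layout:
        cmd.append("-c")      # "complex" mode: tries to preserve the page layout
    cmd.append("-noframes")   # write a single HTML file instead of a frameset
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def convert(pdf_path, out_base):
    """Run pdftohtml if it is installed; raise a clear error otherwise."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml is not on PATH")
    subprocess.run(build_pdftohtml_cmd(pdf_path, out_base), check=True)


if __name__ == "__main__":
    # Hypothetical file names; this only prints the command it would run.
    print(" ".join(build_pdftohtml_cmd("manual.pdf", "manual")))
```

Dropping `-c` gives simpler HTML that is easier to clean up by hand, at the cost of losing most of the layout.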
 
Thanks John. I got the whole bit, plus the GUI and ghost, but it didn't do a
very good job... I'd as soon settle for something that'd pull the text out
as plain text, and find some other way of sucking out the pictures. If I
could get that stuff separately, then I could re-author the doc as HTML by
hand.
 
rth said:
Thanks John. I got the whole bit, plus the GUI and ghost, but it didn't do a
very good job...

Sorry that didn't work for you.

rth said:
I'd as soon settle for something that'd pull the text out
as plain text, and find some other way of sucking out the pictures. If I
could get that stuff separately, then I could re-author the doc as HTML by
hand.

You can pull the text out by selecting and copying it right inside
Acrobat Reader (I use version 5.0). It even retains some of the
formatting if you paste into WordPad instead of Notepad. As for the
images, that takes pressing the Print Screen key and then pasting the
clipboard into something like PhotoFiltre. Next, either crop each
screenshot to remove the rest of the PDF page, or select just the
images you want from the paste, then copy, paste and save *those* as
new images in the image editor you're using. You can then combine the
images and the text in your favorite HTML editor to make the page.

Note, though, that copying images this way may reduce image quality,
since you only capture them at screen resolution.
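
If the screenshot route loses too much quality, another option might be the `pdftotext` and `pdfimages` command-line utilities that ship with Xpdf and Poppler, which extract the text and the embedded images directly. The sketch below only builds the two commands and runs them if the tools are found. The helper names, `report.pdf`, and the `out` directory are made up for illustration, and this assumes the tools are installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_cmds(pdf_path, out_dir):
    """Return (text_cmd, image_cmd) argument lists; nothing is executed here."""
    pdf = Path(pdf_path)
    out = Path(out_dir)
    # -layout asks pdftotext to keep the physical layout (helps with columns).
    text_cmd = ["pdftotext", "-layout", str(pdf), str(out / (pdf.stem + ".txt"))]
    # -j saves JPEG-encoded images as .jpg; "img" is the output file-name prefix.
    image_cmd = ["pdfimages", "-j", str(pdf), str(out / "img")]
    return text_cmd, image_cmd


def extract(pdf_path, out_dir):
    """Run both tools, assuming they are installed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_extract_cmds(pdf_path, out_dir):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(cmd[0] + " is not on PATH")
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input; this only prints the commands it would run.
    for cmd in build_extract_cmds("report.pdf", "out"):
        print(" ".join(cmd))
```

This only works for PDFs that contain real text and embedded images. A PDF made of scanned pages would still need OCR.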

HTH
 
Actually, what I finally wound up doing was to make a color print of the PDF
and then scan it with OmniPage 14 to get the text, then scan in the
images via Corel. Then, with all the pieces, I used Nvu to put it together
into HTML... took the long way around <g>.
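
For anyone who ends up with the same pieces (a plain-text file plus a handful of image files), the assembly step can also be roughed out with a short script before polishing in an HTML editor. This is only a minimal sketch: `build_page` and the file names are hypothetical, paragraphs are assumed to be separated by blank lines, and the images are simply appended after the text rather than placed where they appeared in the PDF.

```python
from html import escape


def build_page(title, text, image_files):
    """Build a bare-bones HTML page from plain text and a list of image paths."""
    # Treat blank lines as paragraph breaks and collapse other whitespace.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = [f"<h1>{escape(title)}</h1>"]
    body += [f"<p>{escape(' '.join(p.split()))}</p>" for p in paragraphs]
    body += [f'<img src="{escape(name, quote=True)}" alt="">' for name in image_files]
    head = f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>'
    return "\n".join(
        ["<!DOCTYPE html>", "<html>", head, "<body>", *body, "</body>", "</html>"]
    )


if __name__ == "__main__":
    # Hypothetical content standing in for the OCR'd text and scanned images.
    page = build_page("My document", "First paragraph.\n\nSecond one.", ["scan1.jpg"])
    print(page)
```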
 