Tutorial: print a web page to PDF with embedded clickable links

Anthony Susa

Is there a tutorial for printing web pages to PDF with clickable links?

When we download programs from the Internet on the Windows XP PC, we often
archive the landing page that describes the software's download and feature
set, so as to have a reference point for each download in the future.

We could archive that software home page by saving the HTML and then
converting the HTML to PDF somehow, but that generally creates a multi-file
mess in our downloads hierarchy. Printing to a single PDF with embedded
top-level clickable links seems so much more straightforward, if it can be
done.

We can add the missing top-level links manually afterwards using Adobe
Acrobat Standard software; but how does one automatically embed top-level
hypertext links when printing a web page to PDF on Windows XP?

Googling for web page to PDF printing on WinXP, I find the following
potential software solutions:
- Adobe Acrobat http://www.adobe.com/products/postscript/pdf.html
- PrimoPDF version 2.0 http://www.primopdf.com
- CutePDF version 2.5 http://www.cutepdf.com
- eDocPrinter PDF Pro version 6.18 http://www.iteksoft.com
- VirtualPDF Printer version 1.01 http://www.go2pdf.com/products.html
- PDF995 version 7.55s http://www.pdf995.com
- PDF Online version http://www.gohtm.com
- ActivePDF version 4.0 http://www.activepdf.com
- FinePrint pdfFactory version 2.46 http://www.fineprint.com
etc.

Is there a simple tutorial out there that you know of which shows how to
use a PDF printing program to print a web landing page to a single pdf file
containing embedded top-level clickable hypertext links?

Tony
 
Aandi Inston

Anthony Susa said:
Is there a tutorial for printing web pages to PDF with clickable links?

This isn't completely impossible, but is fairly unlikely. The "print
to PDF" method won't pass the information about links to the
"printer", so it can't make links.

There are plenty of other methods. For instance, the free tool
htmldoc. But since you already seem to have Acrobat, why not use
Create PDF > From Web Page?
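
A minimal sketch of the htmldoc route, assuming the `htmldoc` command-line
tool is installed and on the PATH. It only wraps the documented `--webpage`,
`-t pdf` and `-f` options; the function names and the file names in the usage
note are placeholders chosen for illustration, not anything from this thread.

```python
import subprocess


def build_htmldoc_command(source, output_pdf):
    """Return the argument list for converting one page to one PDF."""
    return [
        "htmldoc",
        "--webpage",       # treat the input as a plain web page, not a book
        "-t", "pdf",       # ask for PDF output
        "-f", output_pdf,  # write the result to this single file
        source,            # local HTML file or URL to convert
    ]


def convert(source, output_pdf):
    """Run htmldoc; raises CalledProcessError if the tool reports failure."""
    subprocess.run(build_htmldoc_command(source, output_pdf), check=True)
```

Calling `convert("landing.html", "landing.pdf")` would then produce one PDF
next to the saved page. Whether every link survives depends on the page and
the htmldoc version, so it is worth clicking a few links in the result.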
 
Anthony Susa

Aandi Inston said:
Why not use Create PDF > From Web Page?

Hi Aandi,

Why not? Because I didn't know about the Adobe Acrobat Standard command:
- File > Create PDF > From Web Page

Thank you for pointing that wonderful command out. Indeed, it does create a
PDF of a web page or even the entire web site, with clickable links in the
PDF just as we asked! This is great! I found the tutorial for this at:
- http://www.webdevelopersjournal.com/software/acrobat_4_part2.html

I also found, while searching, other options, including the Microsoft
Internet Explorer menu to do the same thing:
- View > Explorer Bar > Adobe PDF >
- Convert Current Web Page to an Adobe PDF File
The tutorial for the IE Web Capture to clickable PDF is at:
- http://www.acrotips.com/pdf/tips/6024.pdf

Both seem to have some problems with javascript (or maybe it's just my
browser setup) but otherwise the links embedded in the PDF are clickable.

Thanks also for the additional suggestion of using htmldoc freeware
- http://www.pcworld.com/downloads/file_description/0,fid,5114,00.asp
But users should note only the version 1.5 is freeware; later versions are
no longer freeware apparently.

Another option is to use HTML2EXE to create a single file of a web site:
- http://www.softpicks.net/software/HTML2EXE-11403.htm

These are the only methods I know of so far to create a single-file PDF (or,
in the case of HTML2EXE, a single-file exe) of an entire web site which also
keeps all the links clickable by the user.

If you know of other methods to print an entire web site to PDF containing
clickable links, please post so we all gain knowledge together.

Tony Susa
 
Aandi Inston

Anthony Susa said:
Why not? Because I didn't know about the Adobe Acrobat Standard command:
- File > Create PDF > From Web Page ....
Both seem to have some problems with javascript (or maybe it's just my
browser setup) but otherwise the links embedded in the PDF are clickable.

This conversion does not use your browser at all. This means that
JavaScript and other things simply cannot work; it converts HTML and
graphics only.

Anthony Susa said:
Thanks also for the additional suggestion of using htmldoc freeware
- http://www.pcworld.com/downloads/file_description/0,fid,5114,00.asp
But users should note only the version 1.5 is freeware; later versions are
no longer freeware apparently.

Thanks for the update.
 
Anthony Susa


I had previously installed pre-compiled HTMLdoc version 1.6 based on the
URLs in the PC World web site:
http://www.pcworld.com/downloads/file_description/0,fid,5114,00.asp

But presumably your pointer to HTMLdoc version 1.8.24 is better. So here's
the quick step-by-step tutorial I was asking for. It archives the PDF of the
downloads landing page (with links), tests the basic integrity of that PDF
(using those links), and downloads the installation files themselves, using
that PDF.

Anyone can follow along simply by repeating these steps:
0. Mkdir C:\Install\PDF\HTMLdoc
1. Start > Run > Acrobat
2. File > Create PDF > From Web Page
3. URL: http://users.tpg.com.au/naffall/htmldoc.html
4. Press the "Create" button to create the desired PDF
5. File > Save As > C:\Install\PDF\HTMLdoc\htmldoc.pdf
6. On the PDF, press the download link "HTMLDoc1.8.24.zip"
7. Select "Open Web Link in Browser" (why does this bring up IE?)
8. Press the IE "File Download" button "Save"
9. Save to C:\Install\PDF\HTMLdoc\HTMLDoc1.8.24.zip & press "Close"
10. Voila! You saved the installer & the web-clickable landing page!

Advice:
- Always save all installers in a known archive location
- Always save the landing page of the installer with the installer
- The archived-installer hierarchy mirrors the installation hierarchy
- That way it's easy when you perform the inevitable system rebuild
- You install from the archive to the same location in your programdir
- The archived web page provides necessary context of the installer
- If you need a later version, just use the archived web page links!
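
The archive layout described above can be sketched as a small helper. This is
only an illustration under stated assumptions: the `C:\Install` root and the
`PDF\HTMLdoc` folder come from the steps above, while the function name
`archive_paths` and the rule of naming the landing-page PDF after the program
are hypothetical.

```python
from pathlib import PureWindowsPath

# Root of the installer archive, as used in the steps above.
ARCHIVE_ROOT = PureWindowsPath(r"C:\Install")


def archive_paths(category, program, installer_name):
    """Return where an installer and its landing-page PDF would be stored."""
    folder = ARCHIVE_ROOT / category / program
    return {
        "folder": folder,
        "installer": folder / installer_name,
        # Assumption: the landing page PDF is named after the program.
        "landing_page": folder / (program.lower() + ".pdf"),
    }
```

For the HTMLdoc example, `archive_paths("PDF", "HTMLdoc", "HTMLDoc1.8.24.zip")`
yields the same folder and file names as steps 0, 5 and 9.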

Thank you all for taking the time & effort to help me and others,
Tony Susa
 
Anthony Susa

1. Start > Run > Acrobat
2. File > Create PDF > From Web Page
3. URL: http://users.tpg.com.au/naffall/htmldoc.html
4. Press the "Create" button to create the desired PDF
5. File > Save As > C:\Install\PDF\HTMLdoc\htmldoc.pdf
6. On the PDF, press the download link "HTMLDoc1.8.24.zip"
7. Select "Open Web Link in Browser" (why does this bring up IE?)

Why does the Adobe "Open Web Link in Browser" command not bring up my
default browser (Firefox 1.5) - but instead, Acrobat brings up Internet
Explorer?

I never use Internet Explorer (it only exists on my PC because it came with
Windows XP).

Is there an Adobe Acrobat 6.0.3 setting that disobeys the Windows XP
default browser?

How can we set Adobe Acrobat to respect the Windows XP default browser
setting?
 
Aandi Inston

Anthony Susa said:
How can we set Adobe Acrobat to respect the Windows XP default browser
setting?

One thought: double check that Firefox really is the default browser,
from Windows' point of view, by doing START > Run and typing a URL
(including http://). See what browser is run.
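
The same check can be sketched in script form. This assumes that Windows
records the program for http:// URLs under the registry key
`HKEY_CLASSES_ROOT\http\shell\open\command`, which is separate from the HTM
and HTML file-type entries. The function names are hypothetical, and the
registry read only works on Windows.

```python
def executable_from_command(command):
    """Extract the program path from a registry 'open' command string."""
    command = command.strip()
    if command.startswith('"'):
        # Quoted path: take everything up to the closing quote.
        end = command.find('"', 1)
        return command[1:end] if end != -1 else command[1:]
    # Unquoted path: take the first whitespace-delimited token
    # (this misreads unquoted paths that contain spaces).
    return command.split()[0] if command else ""


def http_handler_command():
    """Read the command registered for http:// URLs (Windows only)."""
    import winreg  # imported here so the parser above works on any platform

    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT,
                        r"http\shell\open\command") as key:
        value, _value_type = winreg.QueryValueEx(key, "")  # default value
    return value
```

On the Windows PC, `executable_from_command(http_handler_command())` shows
which program Windows would start for a typed URL. It does not by itself say
what Acrobat does with the link.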
 
Anthony Susa

Aandi Inston said:
Double check that Firefox really is the default browser,
from Windows' point of view, by doing START > Run and typing a URL
(including http://). See what browser is run.

Hmmmm ...

Start > Run > http://maps.google.com

pops up an error dialog box saying:
X This file does not have a program associated with it
for performing this action. Create an association in
the Folder Options control panel.

So I went to the WinXP Start > Settings > Control Panel
Of the ten options, none were "Folder Options".
So I pressed "Switch to Classic View" on the left side.
Then I saw "Folder Options" as an applet.
I clicked on the "Folder Options" "File Types" tab.
This showed "Registered File Types"
Typing "H" got me to "HTM" & "HTML" as "HTML File" types.
Both were set to open with "Microsoft Office 2003 component".
I pressed the "Change" button for HTM & HTML file types.
I selected c:\proggies\browsers\firefox\firefox.exe
I pressed "Close".

To test:
Start > Run > http://groups.google.com
Bummer! It still pops up an error dialog box saying:
X This file does not have a program associated with it
for performing this action. Create an association in
the Folder Options control panel.
 
Anthony Susa

Aandi Inston said:
If you have Set Program Access And Defaults on your Start menu see if
you can use that. Or see if there is any other advice on getting
FireFox to be the real default browser.

I'm not saying this will fix Reader; I don't know, but it's worth a
try.

Thanks Aandi Inston,

I appreciate the advice. I think, for now, I'll give up: if there were a
good way, an expert such as you would have suggested it by this point.

For now, we have our coveted tutorial (see prior posts), which is the main
topic of this thread. We can now easily print either a landing page, any
number of levels deep, or an entire web site to PDF with the click of a
button.

For example, I printed the entire www.craigslist.com web site to a PDF with
nary a glitch (it was a rather large PDF but it was just a test). All the
links were clickable. To check, I sent the web site PDF to a friend, and
she confirmed the links worked for her even though all she had was the
Adobe PDF Reader program.
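
A rough way to check for links without clicking each one is to scan the PDF
file for `/URI (...)` entries, which is how link targets are written in a PDF.
This is only a sketch: it assumes the link annotations are stored
uncompressed, it ignores escaped parentheses inside a target, and the name
`find_link_targets` is made up for this example.

```python
import re

# A link target appears in the file as: /URI (http://...)
URI_PATTERN = re.compile(rb"/URI\s*\(([^)]*)\)")


def find_link_targets(pdf_bytes):
    """Return the distinct URI targets found in uncompressed PDF data."""
    seen = []
    for match in URI_PATTERN.finditer(pdf_bytes):
        target = match.group(1).decode("latin-1")
        if target not in seen:  # keep first-seen order, drop repeats
            seen.append(target)
    return seen
```

Reading the saved file with `open(path, "rb").read()` and passing the bytes
to `find_link_targets` lists the targets it can see. An empty list does not
prove the PDF has no links, because they may be stored in compressed form.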

The only caveat is that we can now print any page to PDF except those which
require a login and a password. Even with that one flaw in the process,
that leaves all the other web sites easily printed to clickable PDF.

Thank you all for the help - the definitive summary of your efforts is:
- We can all print web pages to clickable PDF now
- But we can't print them if the web page is password protected
- Even if we know the login and password

All in all, not a bad compromise for such a wonderful feature!
Tony Susa
 
Stan Brown


Unfortunately the HTMLDOC author's Web site says it doesn't support style
sheets. Anyone know of an alternative that does?

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You:
http://diveintomark.org/archives/2003/05/05/why_we_wont_help_you
 
