| When webpages and pdf files don't permit selecting and copying text,
| is there a way around that? Not counting print screen, which I use
| sometimes, but won't help when the place I want to copy to only allows
| text.
|
Those are two separate issues. A webpage can do things
like block the right-click menu, but only if you have
script enabled in your browser.
A webmaster can also do things like using an image
of text. I recently saw a page where the author had
gone to great lengths to block image copying by
loading the images in a Flash program. (Without script
and Flash enabled there are no pictures on the page!)
If there's actual text on the page you should be able
to get it by disabling script. You can also view the
source code to get at the text. And in most browsers you
can view the page with no style (CSS turned off), which
in some cases makes it easier to select just the text
you want.
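If you'd rather pull the text out of a saved copy of the
source than pick through it by hand, something like the
following rough Python sketch might do. It uses only the
standard html.parser module. The names TextExtractor and
page_text are my own, made up for illustration, and it
assumes the page holds real text rather than images of text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text from HTML, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a script or style element
        self.chunks = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Return the text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Since this reads the source directly, any script on the
page that blocks selection or the right-click menu never
runs at all.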
PDFs are different. Adobe designed the PDF format to
allow for a number of restrictions. Text copying can be
blocked. A password can be required to open the file. Etc.
The usage restrictions (copying, printing, editing) are
actually just "flags" in the PDF file. There's not really
any kind of lock behind them. (A password needed to open
the file is another matter. That one does encrypt the
content.) But most software respects the flags. So, if you
have a PDF with a text-copying restriction the only option
is to get software that will bypass it. I think there is
such software, but I don't know of any that's free.
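To show what "just flags" means in practice: as I read the
PDF spec, the permissions sit in one signed 32-bit integer,
the /P entry of the file's encryption dictionary, and each
permission is a single bit in it. Here is a small Python
sketch that decodes such a value. The function name
decode_permissions and the sample value in the note below
are my own, for illustration only. It decodes flags and does
not remove or bypass anything.

```python
# Permission bits of the PDF standard security handler.
# The spec numbers bits from 1, so bit n has the value 1 << (n - 1).
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations",
    9: "fill in form fields",
    10: "extract for accessibility",
    11: "assemble document",
    12: "print at high quality",
}


def decode_permissions(p):
    """Map each permission name to True (allowed) or False (restricted).

    p is the /P value, which is stored as a signed 32-bit integer.
    """
    unsigned = p & 0xFFFFFFFF  # view the signed value as 32 raw bits
    return {
        name: bool(unsigned & (1 << (bit - 1)))
        for bit, name in PERMISSION_BITS.items()
    }
```

For example, decode_permissions(-3904) reports every
permission as restricted, and adding 16 to that value (bit 5)
turns on the copy permission alone. A viewer that honors
restrictions simply reads these bits and disables the
matching menu items.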
It's an odd issue. Since you have a right to the file you
have a right to access the text, but Adobe has tried to
mimic white-collar office procedure in order to impart a
sense of solidity to digital files. In doing that they've
done their best to render a PDF as an immutable file that
mimics a printed page. It's really designed just to get
business docs transported via PC and printer rather than
via postal mail.
Unfortunately, people often restrict PDFs for no good
reason. (I once downloaded a state auto accident report
form that I had to file in triplicate, and the editing function
was blocked!)
In some cases a PDF is actually a collection of scanned
book pages. In that case there isn't any text in the file,
only pictures of pages. Your only option is to run it
through OCR software. Fortunately, OCR software is quite
good these days, and usually comes free with a scanner.
There's a command line PDF extractor named XPDF.
I wrote a convenient wrapper for it here:
http://www.jsware.net/jsware/pdfconv.php5
With that you can extract text and images. As I
note on that page, Sumatra PDF can also extract
text and does a better job. XPDF is outdated.
But Sumatra doesn't extract images.
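For anyone who wants to script the XPDF tools directly, the
two programs involved are pdftotext and pdfimages. Below is
a rough Python sketch of the calls. It is not how my wrapper
is written. It assumes both programs are on your PATH, and
the function names are hypothetical ones of my own.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the pdftotext command line as a list of arguments."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # try to keep the physical page layout
    cmd += [pdf_path, txt_path]
    return cmd


def pdfimages_command(pdf_path, image_root):
    """Build the pdfimages command line; image_root prefixes output names."""
    # -j saves JPEG-encoded images as .jpg files instead of converting them
    return ["pdfimages", "-j", pdf_path, image_root]


def extract(pdf_path, txt_path, image_root):
    """Run both tools; raises CalledProcessError if either one fails."""
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    subprocess.run(pdfimages_command(pdf_path, image_root), check=True)
```

A call would look like extract("report.pdf", "report.txt",
"report-img"). The stock tools respect the restriction
flags, so on a copy-restricted file I'd expect pdftotext
to refuse.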
Both XPDF and Sumatra can be recompiled to
ignore restriction flags with a very small code edit.
They're both open source. But both authors have chosen
to respect the restriction flags in the builds they
distribute.