You're discussing optical character recognition as an option
for handling content. Here are a few examples of sending a
newspaper article to a friend.
*******
Scan ------> bitmap -----> bitmap stored in .PDF file
In that case, the resulting newspaper.PDF is neatly and accurately
preserved. It looks almost as good as the original piece of
newspaper. The recipient's eyeballs will do an excellent job of
looking at any marginally captured printed letters and figuring
out what the article says. (Humans are better at that than any
computer program. OCR sucks.) In addition, if the newspaper
article includes pictures, they'll be preserved. This requires
that the recipient own a PDF reader program.
The intermediate case works like this.
Scan ------> bitmap ------+----------+---> bitmap plus words, stored in .PDF file
                          |          ^
                          |          | Text
                          +-- OCR ---+
Some optical character recognition software (it may come with the
scanner, or be purchased separately) can find letters in the image
and lay a text string over the top of the scanned bitmap image. Any
text that is not recognized is not converted. In some cases, this
gives an enhanced appearance to the text. But recognition is at best
perhaps 99% accurate, so there will be errors. And the wrongly
recognized letters will hide the part of the bitmap underneath them,
covering up the originally captured image.
In that case, you send newspaper.PDF to your friend, and the OCR
may make a few distracting mistakes.
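To put that 99% figure in perspective, here's a rough Python estimate. It assumes the figure means per-character accuracy, and that characters are misread independently. The article length of 5,000 characters is a hypothetical number, not a measurement.

```python
def expected_ocr_errors(num_chars: int, accuracy: float) -> int:
    """Rough count of misrecognized characters, assuming each character
    is recognized independently with the given accuracy (0.0 to 1.0)."""
    return round(num_chars * (1.0 - accuracy))


# Hypothetical article length; a short newspaper piece might run a few
# thousand characters.
article_chars = 5000
for acc in (0.95, 0.99, 0.999):
    errors = expected_ocr_errors(article_chars, acc)
    print(f"{acc:.1%} accurate -> about {errors} wrong characters")
```

The only point here is that a small error rate still adds up over a whole article, which is why proofreading the OCR output matters.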
Now, in the third case, the operation looks like this. Again, OCR
is used to convert what looks like text in the bitmap scan into
letters. The letters are then stored in a text file, with poor
formatting. If the original newspaper article was three columns
of text, I can't guarantee all the words will be in neat columns.
It may take significant text editing after the OCR step to make
a presentable document.
Scan ------> bitmap
             |
             |
             +-- OCR ---> words only, stored in a text file
That third option, using a text file, means that more tools can
be used at the recipient's end. You can even copy the text right
into the email and not bother making an attachment from it. But
any transcription errors in the text have to be fixed at your
end, to avoid annoying the recipient with a multitude of errors.
I think the first option, scanning as a bitmap, saving as PDF
and sending that, is the best compromise. It requires no labor
on your end, and there is no possibility of OCR errors (because
you're not using OCR). The only disadvantage is that the
recipient must have a PDF reader program.
If you could find a small enough PDF reader program, you could
send that by email as well. At least one free, non-Adobe reader
(Foxit Reader) is near the end of the list. Perhaps it is
smaller than the multi-megabyte Adobe download.
http://en.wikipedia.org/wiki/PDF_reader
There is a fourth conversion option. It would look like this.
Scan ------> bitmap -----> bitmap stored in JPEG picture file
In that case, the recipient needs an image viewer program,
and the computer may already have one of those, even if
it doesn't have a PDF viewer. Choose an image format that
you know the recipient's OS can handle. JPEG isn't the only
choice; there are other formats as well.
For the highest compression, I might select CCITT compression in
a TIFF file, with thresholding used to convert the scan into
black and white. By adjusting the threshold, you can send a page
of text in about 50 KB as an image file, which is easier for
someone on dialup to download. That is the computer equivalent
of using a FAX machine, since the CCITT compression algorithm
is the same one used for faxes.
http://en.wikipedia.org/wiki/Fax
"The transferred image formats are called ITU-T
(formerly CCITT) fax group 3 or 4."
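Here's a back-of-the-envelope look at why the black-and-white version is so much smaller before any compression is even applied. The page size (8.5 x 11 inches), the 150 DPI scan, and the dialup throughput of about 4,000 bytes per second are all assumptions for illustration. The 50 KB entry is just the estimate from the paragraph above, not a measurement.

```python
import math


def raw_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a scan in bytes, with each row padded to a whole byte."""
    width_px = math.ceil(width_in * dpi)
    height_px = math.ceil(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


def download_seconds(num_bytes: int, bytes_per_second: float) -> float:
    """Time to transfer a file at a steady rate."""
    return num_bytes / bytes_per_second


# Assumed: US letter page scanned at 150 DPI.
color = raw_bytes(8.5, 11, 150, 24)  # 24-bit color, no compression
bw = raw_bytes(8.5, 11, 150, 1)      # 1-bit black and white, no compression
fax_like = 50_000                    # the ~50 KB CCITT estimate from the text

# Assumed: a 56k modem that really delivers about 4,000 bytes per second.
DIALUP_BPS = 4_000
for label, size in (("24-bit color, raw", color),
                    ("1-bit, raw", bw),
                    ("1-bit, CCITT (est.)", fax_like)):
    secs = download_seconds(size, DIALUP_BPS)
    print(f"{label:20} {size:>10,} bytes  ~{secs:,.0f} s on dialup")
```

Going from 24 bits per pixel to 1 bit is a 24:1 cut by itself. The fax-style compression then squeezes the black and white runs further. Real compressed sizes depend on the page content.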
The TIFF image format has an option for that kind of
compression. But on your end, you need to use an
image editing program to reduce the image to black
and white from the original color scan. "Setting the
threshold" determines whether a dot is black or white.
You adjust the threshold until you can read the
newspaper print in the image.
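For the curious, here's a minimal Python sketch of what "setting the threshold" does. It assumes the scan has already been turned into grayscale values from 0 (black) to 255 (white). The tiny pixel patch and the threshold levels are made up for illustration. An image editor does the same thing across the whole image before the TIFF writer applies the CCITT compression.

```python
def threshold(gray_rows, level):
    """Convert grayscale rows (0 = black, 255 = white) to 1-bit rows.

    A pixel darker than `level` becomes black (1); anything else becomes
    white (0). Raising `level` turns more of the gray pixels black.
    """
    return [[1 if px < level else 0 for px in row] for row in gray_rows]


def show(bits):
    """Print a 1-bit image with '#' for black and '.' for white."""
    for row in bits:
        print("".join("#" if b else "." for b in row))


# A made-up 3x6 patch of a scan: dark ink on grayish newsprint.
patch = [
    [200, 190, 60, 70, 185, 210],
    [195, 90, 40, 45, 120, 200],
    [205, 180, 75, 65, 190, 215],
]

for level in (80, 128, 200):
    print(f"threshold {level}:")
    show(threshold(patch, level))
```

Too low a threshold loses the lighter strokes of the letters. Too high a threshold turns the gray newsprint into black blotches. That's why you nudge it until the print reads cleanly.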
And you'd only do the extra work if you knew the
recipient was on dialup. If your friend is on broadband,
just send the JPEG image file as an attachment to
the email.
The best options don't require excessive work on your part.
With PDF or JPEG, you could be done in five minutes.
I'm a big fan of:
1) Knowing the recipient's skills and the capabilities of
their computer.
2) Only sending something they can open.
I consider it rude to send something they don't have a
chance of opening. (Like when someone sends me a Christmas
morning movie they shot, in a format I can't even identify.)
But that's just me.
*******
When you're using the scanner, you don't need to scan at
extremely high dots per inch. Newspapers and fine art magazines
are printed with "dots" of ink. To scan them, select the
"de-screen" option in your scanner software, and tell the
software what the screen print frequency is. For example, a
fine art magazine might have about 150 dots of ink per inch,
and you tell the scanner what that screen frequency is. You
can scan at somewhere between 1x and 2x the screen frequency
for adequate results. If the newspaper uses 75 dots of ink per
inch, then scanning at no more than 150 dots per inch, with
de-screen turned on, should suffice. (That's just going from
memory. I haven't used my scanner in years. It's all dusty.)
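The 1x to 2x rule of thumb above (which, again, is from memory) works out like this in a small Python sketch, using the two examples from the text: a 75-per-inch newspaper and a 150-per-inch art magazine.

```python
def scan_dpi_range(screen_freq: int) -> tuple[int, int]:
    """Suggested scan resolution in DPI: between 1x and 2x the printed
    screen frequency, with de-screen turned on (rule of thumb only)."""
    return screen_freq, 2 * screen_freq


# The two examples from the text.
for name, freq in (("newspaper", 75), ("art magazine", 150)):
    low, high = scan_dpi_range(freq)
    print(f"{name}: screen {freq} per inch -> scan at {low} to {high} DPI")
```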
You can do a practice scan at high resolution (1000 DPI) on a
tiny section of newspaper first. Then open the image in your
image editor and count how many dots of ink there are per inch
(at 1000 DPI, every 1000 pixels is one inch). On your final
scan, you can use the information gathered to set up the
de-screen.
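If you'd rather not count dots by eye, here's a rough Python sketch of the counting step. It assumes you've already thresholded one row of the 1000 DPI practice scan to 1-bit (1 = ink), and that the row runs straight along a line of halftone dots. That is an idealization, since halftone screens are often printed at an angle. The row below is synthetic, with dots spaced every 13 pixels, so it is not real scan data.

```python
def dots_per_inch(row_bits, scan_dpi):
    """Estimate ink dots per inch from one row of a thresholded scan.

    Counts separate runs of black pixels (1s), treating each run as one
    dot of ink crossed by the row, then divides by the row length in inches.
    """
    dots = 0
    previous = 0
    for bit in row_bits:
        if bit == 1 and previous == 0:  # a white-to-black step starts a new dot
            dots += 1
        previous = bit
    inches = len(row_bits) / scan_dpi
    return dots / inches


# Synthetic row: 1000 pixels (one inch at 1000 DPI), with a dot about
# 5 pixels wide starting every 13 pixels. Invented numbers for illustration.
SCAN_DPI = 1000
row = [1 if (i % 13) < 5 else 0 for i in range(SCAN_DPI)]
print(f"about {dots_per_inch(row, SCAN_DPI):.0f} dots per inch")
```

A result in the mid-70s like this one would suggest setting the de-screen for a newspaper screen of about 75 per inch.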
http://en.wikipedia.org/wiki/Color_printing
"Screens with a "frequency" of 60 to 120 lines per inch
(lpi) reproduce color photographs in newspapers. The coarser
the screen (lower frequency), the lower the quality of
the printed image. Highly absorbent newsprint requires a
lower screen frequency than less-absorbent coated paper
stock used in magazines and books, where screen frequencies
of 133 to 200 lpi and higher are used."
This article gives more practical advice.
"Why we use descreening"
http://www.scanhelp.com/288int/scontent/descreen.html
HTH,
Paul