Scanning text

bm

I need advice on the correct way to scan a text document in order to add it
to an email.
I have an Epson scanner which allows me to select the type from JPEG,
BITMAP, TIFF, PDF etc.
I chose PDF, selected Text and scanned the newspaper cutting. I saved it
to my scanning file and it transpires that it is now an Adobe Acrobat
Document. This I can open, but can my recipients open it if they do not have
the Adobe Reader?
Alternatively, how should I send it to ensure that my friends can read it?
Blair
 
pjp

Send it in an image file format rather than a document file format. JPG,
BMP and TIFF are all image files (i.e. pictures) that IE or even Paint can
open. JPG keeps the size down and BMP is the uncompressed original; both
should open without problems on any working PC, while TIFF may be a
problem. PDF, by contrast, is a document file format, so the recipient
needs Adobe Reader (or a similar viewer) installed to view it.
 
Paul

You're discussing optical character recognition as an option
for handling content. Here are a few examples of sending a
newspaper article to a friend.

*******

Scan ------> bitmap -----> bitmap stored in .PDF file

In that case, the resulting newspaper.PDF is neatly and accurately preserved.
It looks almost as good as the original piece of newspaper. The recipient's
eyeballs will do an excellent job of looking at any marginally
captured printed letters and figuring out what the article says.
(Humans are better at that than any computer program. OCR sucks.)
In addition, if the newspaper article includes pictures, they'll be
preserved. This requires that the recipient have a PDF reader program.

The intermediate case works like this.

Scan ------> bitmap ------+----------+------ bitmap plus words, stored in .PDF file
                          |          ^
                          |          | Text
                          +-- OCR ---+

Some optical character recognition software (it may come with the scanner,
or be purchased separately) can find letters in the image and lay a text
string over the top of the scanned bitmap image. Any text which is not
recognized is not converted. In some cases, this gives an enhanced
appearance to the text. But the recognition is at best perhaps 99%, so
there will be errors. And the wrongly recognized letters will hide the
part of the bitmap underneath, which held the originally captured image.

In that case, you send newspaper.PDF to your friend, and the OCR
may make a few distracting mistakes.
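
To put the "perhaps 99%" figure in perspective, here is a rough Python
sketch of how many wrong characters that implies. The 3000-character
article length is an assumed figure, not measured from a real cutting,
and the function name is made up for illustration.

```python
def expected_ocr_errors(char_count, accuracy):
    """Return the expected number of misread characters.

    accuracy is the fraction of characters recognized correctly (0.0 to 1.0).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return char_count * (1.0 - accuracy)


# Assumed example: a 3000-character cutting at 99% accuracy.
errors = expected_ocr_errors(3000, 0.99)
print(f"About {errors:.0f} characters would need fixing")
```

So even a "good" OCR pass on a short article can leave a few dozen
characters to correct by hand.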

Now, in the third case, the operation looks like this. Again, OCR
is used to convert what looks like text in the bitmap scan into
letters. The letters are then stored in a text file, with poor
formatting. If the original newspaper article was three columns
of text, I can't guarantee all the words will be in neat columns. It
may take significant text editing after the OCR step to make
a presentable document.

Scan ------> bitmap
               |
               |
               +-- OCR ---> words only, stored in a text file

That third option, of using a text file, means that more tools
at the recipient's end can be used. You can even copy the text
right into the email, and not bother making an attachment from it.
But any transcription errors in the text have to be fixed at your
end, to avoid annoying the recipient with a multitude of errors.

I think the first option (scanning as a bitmap, saving as PDF
and sending that) is the best compromise. It requires no labor
on your end, and there is no possibility of OCR errors (because
you're not using OCR). The only disadvantage is that the recipient
must have a PDF reader program.

If you could find a small enough PDF reader program, you could
send that by email as well. At least one free non-Adobe reader
(Foxit Reader) is near the end of the list below. Perhaps that is
smaller than the multi-megabyte Adobe download.

http://en.wikipedia.org/wiki/PDF_reader

There is a fourth conversion option. It would look like this.

Scan ------> bitmap -----> bitmap stored in JPEG picture file

In that case, the recipient needs an image viewer program.
And the computer may already have one of those, even if
it doesn't have a PDF viewer. Choose an image format which
you know the recipient's OS can handle. JPEG is only one
example; there are other formats.

For highest compression, I might select CCITT compression in a
TIFF format, with thresholding used to convert the scan into
black and white. By adjusting the threshold, you can send a page
of text in about 50 KB as an image file. And that is easier
for someone who is on dialup to download. That is the computer
equivalent of using a FAX machine, since the CCITT compression
algorithm is the same one that is used for faxes.

http://en.wikipedia.org/wiki/Fax

"The transferred image formats are called ITU-T
(formerly CCITT) fax group 3 or 4."
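
The reason a thresholded page compresses so well is that each scan line
becomes long runs of white broken by short runs of black. The fax schemes
build on run lengths; the Python sketch below shows only that run-length
idea, not the actual Group 3 or Group 4 coding, and the function name is
invented for illustration.

```python
from itertools import groupby


def run_lengths(row):
    """Collapse a row of 0/1 pixels into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# One made-up scan line: mostly white (0) with a short black (1) stroke.
row = [0] * 40 + [1] * 3 + [0] * 57
print(run_lengths(row))
```

A hundred pixels collapse into three pairs here, which is why a clean
black and white page of text can end up so small.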

The TIFF image format has an option for that kind of
compression. But on your end, you need to use an
image editing program to reduce the image to black
and white from the original color scan. "Setting the
threshold" determines whether a dot is black or white.
You adjust the threshold until you can read the
newspaper print in the image.
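
As a rough illustration of what "setting the threshold" does, here is a
minimal Python sketch working on plain lists of gray values (0 is black,
255 is white). The pixel values are made up, and using 1 for black ink and
0 for white paper is just a convention chosen for this example; a real
image editor does the same thing on the whole scan.

```python
def threshold(gray_rows, cutoff):
    """Map 0-255 gray values to 1 (black ink) or 0 (white paper).

    Pixels darker than cutoff become black; the rest become white.
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray_rows]


# Tiny made-up patch: dark ink around 40, paper around 210, smudges between.
patch = [[40, 120, 210, 205],
         [35, 150, 215, 90]]
for cutoff in (100, 160):
    print(cutoff, threshold(patch, cutoff))
```

Raising the cutoff turns more of the in-between grays black, so thin
strokes get fatter; lowering it thins them out until letters break up.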

And you'd only do the extra work if you knew the
recipient was on dialup. If your friend is on broadband,
then just send the JPEG image file as an attachment to
the email.

The best options don't require excessive work on your part.
PDF or JPEG, and you could be done in five minutes.
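
For anyone who would rather script the attaching step than use a mail
program, here is a minimal sketch using Python's standard email package.
The addresses, subject and file name are placeholders, the helper name is
made up, and actually sending the message (for example with smtplib) is
left out.

```python
import mimetypes
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, filename, data):
    """Build an email with one attachment, typed from the file extension."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    # Guess the MIME type from the name; fall back to a generic binary type.
    ctype, _ = mimetypes.guess_type(filename)
    if ctype is None:
        ctype = "application/octet-stream"
    maintype, subtype = ctype.split("/", 1)
    msg.add_attachment(data, maintype=maintype, subtype=subtype,
                       filename=filename)
    return msg


# Placeholder bytes stand in for the real scanned file.
msg = build_message("blair@example.com", "friend@example.com",
                    "Newspaper cutting", "Scan attached.",
                    "cutting.pdf", b"%PDF-1.4 placeholder")
print(msg["Subject"], [p.get_filename() for p in msg.iter_attachments()])
```

Labelling the attachment with the right type is what lets the recipient's
mail program offer a suitable viewer.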

I'm a big fan of

1) Knowing the recipient's skills and the capabilities of
their computer.

2) Only sending something they can open.

I consider it rude to send something they don't have a
chance of opening. (Like when someone sends me a Christmas
morning movie they shot, in a format I can't even
identify.) But, that's just me :)
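
If you do end up with a file whose format you can't identify, the first
few bytes usually give it away. This is a small Python sketch covering only
the formats mentioned in this thread; the function names are made up, and a
two-byte check like "BM" can of course be fooled by other files that happen
to start the same way.

```python
def sniff_format(data):
    """Guess a file type from its first bytes; return None if unknown."""
    signatures = [
        (b"%PDF", "PDF"),
        (b"\xff\xd8\xff", "JPEG"),
        (b"BM", "BMP"),
        (b"II*\x00", "TIFF"),   # little-endian TIFF
        (b"MM\x00*", "TIFF"),   # big-endian TIFF
    ]
    for magic, name in signatures:
        if data.startswith(magic):
            return name
    return None


def sniff_file(path):
    """Read the first few bytes of a file and guess its type."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))


print(sniff_format(b"%PDF-1.4 ..."))
```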

*******

When you're using the scanner, you don't need to scan at
extremely high dots per inch. Newspapers and fine art magazines
are printed with "dots" of ink. To scan them, select the
"de-screen" option in your scanner software, and tell the
software what the screen print frequency is. For example,
in a fine art magazine, there might be about 150 dots of
ink per inch. You can scan at somewhere between 1x and 2x
the screen frequency for adequate results. If the newspaper
uses 75 dots of ink per inch, then scanning at no more
than 150 dots per inch, with de-screen turned on, should
suffice. (That's just going from memory. I haven't used
my scanner in years :) It's all dusty. )
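
The 1x to 2x rule of thumb above is simple arithmetic; as a sketch in
Python (the function name is made up, and the rule itself is only the
from-memory guideline given above):

```python
def scan_dpi_range(screen_lpi):
    """Return (low, high) scan resolutions for a halftone screen frequency."""
    if screen_lpi <= 0:
        raise ValueError("screen frequency must be positive")
    return (screen_lpi, 2 * screen_lpi)


for name, lpi in [("newspaper", 75), ("fine art magazine", 150)]:
    low, high = scan_dpi_range(lpi)
    print(f"{name}: scan at {low} to {high} DPI with de-screen on")
```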

You can do a practice scan at high resolution (1000 DPI)
on a tiny section of newspaper first. Then open the image
in your image editor, and count how many dots of ink there
are per inch. On your final scan, you can use that
information to set up the de-screen.
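
Counting a whole inch of dots is tedious, so you can count along a shorter
line and scale up. A small Python sketch of that conversion follows; the
numbers in the example are assumed, not taken from a real scan, and the
function name is made up.

```python
def screen_frequency(dot_count, span_pixels, scan_dpi):
    """Estimate ink dots per inch from a count along a straight line.

    dot_count   -- dots counted along the line
    span_pixels -- length of that line in pixels
    scan_dpi    -- resolution of the practice scan
    """
    if span_pixels <= 0:
        raise ValueError("span_pixels must be positive")
    return dot_count * scan_dpi / span_pixels


# Assumed example: 15 dots counted across 200 pixels of a 1000 DPI scan.
print(screen_frequency(15, 200, 1000))
```

The result is the figure to enter as the screen frequency in the de-screen
settings.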

http://en.wikipedia.org/wiki/Color_printing

"Screens with a "frequency" of 60 to 120 lines per inch
(lpi) reproduce color photographs in newspapers. The coarser
the screen (lower frequency), the lower the quality of
the printed image. Highly absorbent newsprint requires a
lower screen frequency than less-absorbent coated paper
stock used in magazines and books, where screen frequencies
of 133 to 200 lpi and higher are used."

This article gives more practical advice.

"Why we use descreening"

http://www.scanhelp.com/288int/scontent/descreen.html

HTH,
Paul
 
choro

But, sir, Adobe Reader is a ubiquitous program that is absolutely free
and freely downloadable. The whole idea of .pdf files is that (once the
absolutely free Adobe Reader is downloaded and installed) they can be
opened on any computer. And there is no excuse why such a useful free
program should not be on any computer especially when some versions of
.pdf files are also searchable. Surely this facility makes it far more
desirable to send a .pdf file as opposed to .jpg or some suchlike image
files.

choro
*****
 
Harden Thicke

It is very doubtful that there is anyone in the world that doesn't have a
PDF reader on their computer. PDF is an extremely common format and it is
used all the time.
 
Tim Meddick

You asked for the "best" way to scan, save and send your text document....

The best way is to use OCR (Optical Character Recognition), if you have
Microsoft Office installed or your Epson scanner came with OCR software.

If you have M$ Office, you can use OCR with Microsoft Office Document
Imaging under Office Tools in the Office Start Menu.

The point being that converting your text document with OCR software into
a digital text file, such as a plain [.TXT] or Microsoft [.DOC] file, is
going to save on size dramatically - the result being in the order of 1%
of the size of an image file.

Your PDF solution seems a good idea, and it is about the best of the
imaging choices.

I say imaging because, although the PDF format can use a mixture of
compressed imaging *and* formatted text in the way it stores its
presentations, if you choose not to employ OCR, the PDF will contain just a
compressed image - measurably larger than a text-based PDF file. There's a
big difference in size between a PDF document that has used OCR and
converted the actual text from the scanned document, and one that has
simply made a compressed bitmap out of the scanned image.

By using Microsoft Office Document Imaging, or OCR software bundled with
your Epson scanner, you can save your file (once converted to text) as a
character-based file like [.TXT], [.RTF], [.WRI] or [.DOC], which are by
far the most popular text formats.

Take a typical A4 letter that has been scanned and left as an image, or
converted to .PDF [NOT USING CHARACTER RECOGNITION]. It will turn out at
around:

200 KB for a PDF without OCR (.pdf)
200 KB for a JPEG compressed image (.jpg)
1.5 MB for an uncompressed bitmap (.bmp)

However, a scanned document converted to text using OCR software will
turn out at around:

13 KB for a PDF *using* OCR (.pdf)
3 KB for an M$ Word document (.doc)
1 KB for a plain text file (.txt)

So you can see that getting your letters into text (character) format
will be immensely beneficial to the end size of the file you are trying to
attach and email.
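
To see what those sizes mean for a recipient on dialup, here is a rough
Python estimate. The 56 kbit/s link speed is an assumed nominal figure (real
connections are slower), 1.5 MB is rounded to 1500 KB, and the extra bulk
that email encoding adds to attachments is ignored.

```python
def download_seconds(size_kb, link_kbit_per_s=56):
    """Estimate transfer time for a file of size_kb kilobytes (1 KB = 1024 bytes)."""
    bits = size_kb * 1024 * 8
    return bits / (link_kbit_per_s * 1000)


# Approximate sizes quoted above, in KB.
sizes_kb = {
    "PDF without OCR": 200,
    "JPEG": 200,
    "BMP": 1500,
    "PDF with OCR": 13,
    "Word .doc": 3,
    "plain text": 1,
}
for name, kb in sizes_kb.items():
    print(f"{name:16s} {kb:5d} KB  ~{download_seconds(kb):6.1f} s")
```

On those assumptions the image-based files take roughly half a minute or
more each, while the text-based ones arrive in a couple of seconds at most.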

==

Cheers, Tim Meddick, Peckham, London. :)
 
Don Phillipson

If you doubt the recipients can read Adobe PDF files (see
other replies) you must ask what formats they can read,
or at least what Operating Systems their PCs use. We
know WinXP PCs are configured to display standard
graphic formats (JPG, BMP, TIF etc.)
 
bm

I am very grateful to you and others who have replied. I have learned a lot
which will stand me in good stead in future.
It looks as though my decision to use a PDF file was the best one. I sent the
first trial to a friend and he tells me he read it OK, but the image was
very small and he couldn't increase the size.
Blair
 
Paul

The Acrobat Reader has a zoom function. Depending on the version,
it can probably magnify at least 16 times. So that could be used
to fix it at the recipient's end. (Try looking in the "View" menu.)

You should review your documents in your own copy of Acrobat
Reader before sending them. That will allow you to anticipate
problems before they happen.

I realize the latest version of Acrobat Reader is not very
friendly. One of the reasons I haven't upgraded is that Acrobat
Reader 9 has a dreadful interface. I continue to use
version 6, as it is easier to use, and I'm more productive.
(The search function works better.) I don't know "what Adobe
was smoking" when they wrote version 9. It's a step backwards.
Sometimes I'm forced to use version 9, because the document
I get won't open in 6. And I really hate having to use
another computing environment to do that.

I'm surprised your dedicated scanning program, with PDF output,
didn't do a better job for you. There isn't much point in
such a program unless it is easy to use and produces
perfect results. I've converted lots of documents using
half-baked free tools, and it can take many tries to
get all the scale, DPI, and other issues sorted out.
When you pay for a program to do it, it's supposed to work :)
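
One common way a scan ends up "very small" is a mismatch between the pixel
count and the DPI recorded with it. The Python sketch below assumes the
page size in the PDF is derived from pixels divided by DPI, which may or may
not be what your scanning program does; the numbers and function names are
made up for illustration.

```python
def page_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches implied by pixel size and DPI."""
    return (width_px / dpi, height_px / dpi)


def zoom_to_fit(width_px, dpi, target_width_in):
    """Return the zoom percentage needed to show the image target_width_in wide."""
    natural_width = width_px / dpi
    return 100.0 * target_width_in / natural_width


# Assumed example: a 600 x 900 pixel scan tagged as 300 DPI.
print(page_size_inches(600, 900, 300))
print(zoom_to_fit(600, 300, 6.0))
```

In that example the page would show only two inches wide at 100%, and the
recipient would need to zoom to about 300% to read it comfortably.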

Paul
 
Tim Meddick

Again, the best way to create an Acrobat PDF file (if you haven't got
£700 to spend on the full Acrobat Suite) is to create your document in Word
and then use one of the third-party PDF Virtual Printers that are currently
being distributed as freeware (Cute PDF is one).

That way, you can get a very clear idea of what your PDF file is going
to look like as a finished article - pictures and text.

Otherwise, you may find that you have your pictures in too small a
resolution to be able to gain any benefit from Acrobat Reader's zoom
facility.

The "Virtual Printer" is installed like any software and appears as an
installed printer. When you select the Virtual Printer instead of your
"real" one and print, a "Save As..." dialog appears for you to choose the
file to save the PDF document to.

This is not the same as the "Print to file" check-box option, which
does not produce any coherent results.


Cute PDF can be obtained from :
http://www.cutepdf.com/download/CuteWriter.exe

Cute PDF Writer also requires the installation of Ghostscript :
http://www.cutepdf.com/download/converter.exe

==

Cheers, Tim Meddick, Peckham, London. :)
 
choro

Or download and install the free OpenOffice.Org where you can open any
MS Office user file and save it as a PDF file. Simple and Free....
Freeee....!
 
wilby

Many good suggestions have been offered.
I would like to offer a different way to decide what format to use.

Over several years I have sent many e-mail messages to a group of about
500 people. I have tried many different formats and have received
several complaints. My experience tells me the following:

TEXT: Everyone can read text.
RTF: Few complaints.
MS Word: A few complaints.
MS Excel: Quite a few more complaints.
PDF: Very few complaints.
JPG: Several more complaints than PDF.
TIF: A big mistake to send it this way.
PUB: Huge problems, it was a mistake on my part to use this.

Wilby
 
