PC Review



Beginner's questions on scanning

 
 
Larry
 
      10th Jul 2004



I have a large job to do, turning several hundreds of pages of 8.5 x 11
xeroxed pages into a single .pdf document. It is straight text. The
computer I'm using has the Lexmark 1100 all-in-one with a flat bed
scanner. I've never done a large scan job and have never created a pdf
document.

What is the best and fastest way to do this? Should I scan into Word,
and then change the Word documents to .pdf? The problem with that is
that when I pick Word as the application for the scanning, after a
single page is scanned, Word opens with that document in it. If I then
scan a second page, a _second_ instance of Word opens with that second
page in it, instead of adding the second page to the first page in a
single document. I don't see how to create a multiple-page document if
I'm using Word as the application.

Alternatively, I can scan into OCR and then change the OCR to pdf, and
that allows the creation of a multiple-page document, which I can then
save as a pdf. That seems the way to go. However, if I do scan into
OCR, the application I must use by default is the Lexmark Photo Editor,
and that doesn't sound like the right kind of application for pages that
are all text.

Another question. Let's say I scan 50 pages into the OCR and then turn
that into a pdf file. Then I go away and come back to do more pages
which get turned into a second pdf file. How do I combine the two pdf
files into a single file, keeping the correct order of the pages?

BTW, the Help files for the Lexmark are really poor. Everything is
broken into separate little steps; nothing gives you an overview of how
to proceed.

I appreciate any tips on this. Thanks much.

Larry



 
lostinspace
 
      10th Jul 2004
----- Original Message -----
From: "Larry" <>
Newsgroups: comp.periphs.scanners
Sent: Saturday, July 10, 2004 12:34 AM
Subject: Beginner's questions on scanning




Hello Larry,
Is your all-in-one scanner a flatbed or a fax-like sheet feeder?
I've never used the fax-like kind, except as a photocopier.

You haven't said much about the end use or purpose of the entire package.
Does it need to be searchable?
Or will it just be printed?

Nor do you specify the quality of the photostat copies you're working with,
or even the source the photocopies were taken from.

When OCRing text, each scanning job calls for its own method. The quality of
the paper, ink, and printing in the original source ALL affect your method and
capabilities.

FAST?
Skip OCR!
Set your scanner to "line-art" at 150 DPI and scan directly into Acrobat.

The end result will be what you want, FAST (and the scan files will be
compact); however, what you can use a line-art scan for is very limited (NO
search option). The print quality of a line-art scanned PDF is nowhere near
that of an OCR'd job with real fonts; however, in most instances line-art
pages are perfectly readable.
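To put rough numbers on "compact": a line-art scan stores 1 bit per pixel, while a grayscale scan stores 8. The sketch below computes the uncompressed size of one 8.5 x 11 page. The 300 dpi grayscale comparison is an assumption added for contrast, and a real PDF compresses both kinds of image considerably, so treat the figures as an upper bound rather than what Acrobat will actually write.

```python
# Uncompressed size of one scanned 8.5 x 11 inch page.
# Assumption: no compression. PDF files normally compress scanned images
# (e.g. CCITT Group 4 for 1-bit line-art), so real files are much smaller;
# the ratio between the two modes is the point here.

def raw_page_bytes(dpi: int, bits_per_pixel: int,
                   width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Return the uncompressed size in bytes of one page at the given settings."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Pad each row of pixels up to a whole byte, as 1-bit raster formats usually do.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    line_art = raw_page_bytes(150, 1)  # line-art: 1 bit per pixel (black or white)
    gray = raw_page_bytes(300, 8)      # 8-bit grayscale, for comparison
    print(f"line-art, 150 dpi:  {line_art / 1024:,.0f} KiB per page")
    print(f"grayscale, 300 dpi: {gray / 1024:,.0f} KiB per page")
    print(f"ratio: about {gray / line_art:.0f}x")
```

Multiply by several hundred pages and the difference between the two modes becomes obvious.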

If you require searchable pages and quality print?
There is NOT any FAST method.
You OCR the pages individually and make the corrections.

There are ways to improve your OCR work.
Most scanners, when used for OCRing text, are nearly as sensitive to dust and
haze as what I've seen discussed in this forum for scanning slides.
My current scanner had to be cleaned when it was NEW and FRESH out of the box
because the manufacturer had left some kind of haze on the glass. IMO that was
not a sign of effective quality control; however, the bottom line is that with
regular cleaning the scanner has worked superbly for my purposes.

I do far more scanning than the average, or even above-average, person. In
most instances, I'm working with magazines ranging from 1940 to the present,
though I've also scanned some books from as early as 1903.
When scanning the magazines, I scan the text and images separately. The text
is OCR'd and saved as an RTF (or Word 6.0) document. In the rare instances
where the printed font is very small, I either scan as line-art or OCR; as
mentioned earlier, it depends on my intended end use.

Word does NOT open Acrobat PDFs.
Word does NOT edit Acrobat PDFs.
The two are entirely separate programs. Acrobat Reader may open from within
Word; however, the switch from one program to the other should be obvious to
the user.

In Acrobat (the full version, NOT the free Reader), you're able to insert
and/or rearrange pages both before and after the current page.

When adding a new page to any type of scanned document, your active cursor
should be at the position where you want the scan inserted.

Sounds to me like you just need to read up on SCANNING.
Here's an excellent source; every SECOND you spend reading these pages
will save you days and weeks later.
http://www.scantips.com/



 
 
-
 
      10th Jul 2004
One thing to add to what "lostinspace" said: for the full version of Acrobat
5.x you can download a module that adds OCR capability to Acrobat (it may be
standard in the newer 6.x versions). The OCR is not perfect, but as OCR goes
it is fairly decent. If you plan to use OCR in Acrobat, I would definitely
scan at 225-300 dpi so that it does a better job of resolving the text.
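A rough way to see why a higher resolution helps OCR: the sketch below estimates how many pixels tall one em of type is at 150, 225, and 300 dpi, assuming 1 pt = 1/72 inch. The 10 pt size is only an illustrative assumption, and lowercase letters are roughly half the em, so at 150 dpi the OCR engine has comparatively few pixels per character to work with.

```python
# Estimate how many pixels tall one em of type is at a given scan resolution.
# Assumptions: 1 pt = 1/72 inch, and 10 pt body text as an illustrative size.

POINTS_PER_INCH = 72


def em_height_px(point_size: float, dpi: int) -> float:
    """Pixels spanned by one em of type when scanned at `dpi`."""
    return point_size / POINTS_PER_INCH * dpi


if __name__ == "__main__":
    for dpi in (150, 225, 300):
        em = em_height_px(10, dpi)
        # Lowercase letters are roughly half the em, so halve it for a rough x-height.
        print(f"10 pt at {dpi} dpi: {em:.1f} px per em, ~{em / 2:.0f} px lowercase")
```

Doubling the resolution doubles the pixels per character, at the cost of larger scan files.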

Doug
--
Doug's "MF Film Holder" for batch scanning "strips" of 120/220 medium format
film:
http://home.earthlink.net/~dougfishe...mainintro.html


 
 
Larry
 
      10th Jul 2004



As I said, it is a flatbed scanner. The originals are good-quality
xeroxes copied from books and articles, on 8.5 x 11 sheets. It is a
collection of articles and excerpts from books on politics and political
philosophy. The end use will be a pdf document posted at a web site
that people can access.

Why do you capitalize FAST? Is that an acronym?

I will check out the link you provided.

Thanks,
Larry




 
 
lostinspace
 
      10th Jul 2004
> Why do you capitalize FAST? Is that an acronym?

It was merely for emphasis; quotes or asterisks would have served just as
well.
Sorry if I confused you.


 
 
Steve Bukosky
 
      10th Jul 2004
On Sat, 10 Jul 2004 04:34:27 GMT, "Larry" <(E-Mail Removed)> wrote:

>I have a large job to do, turning several hundreds of pages of 8.5 x 11
>xeroxed pages into a single .pdf document. It is straight text. The
>computer I'm using has the Lexmark 1100 all-in-one with a flat bed
>scanner. I've never done a large scan job and have never created a pdf
>document.


I've been scanning documents for several years, both at home and at work.
Several hundred pages is a daunting task! I'd definitely look into a
sheet-feed scanner for that! Yesterday I had to scan about thirty
pages of an old manual on an HP flatbed scanner for a customer, and it
took more time than I care to spend.

Software you should consider is PaperPort 9.0. You can scan
pages here and there, convert them to (or create them as) PDF files,
stack separate batches into one single file, and email it or
whatever. The text is searchable when pages are scanned as text and run
through PaperPort's OCR, which does a great job if the text is clear and the
scan resolution is set to 300 dpi.
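On Larry's earlier question about combining batches in the correct order: whatever program does the stacking, it helps if the page files from each sitting are named so that a character-by-character alphabetical sort matches page order. A minimal sketch, assuming a hypothetical one-file-per-page naming scheme (`page_0001.tif` and so on); this is a filing convention, not a PaperPort or Acrobat feature.

```python
# Name each scanned page with a zero-padded number so that a plain alphabetical
# sort (as many programs use) matches page order.
# The "page_0001.tif" scheme is a hypothetical convention, not a program feature.

def page_filename(page_number: int, width: int = 4) -> str:
    """Return a file name such as 'page_0007.tif' for the given page."""
    return f"page_{page_number:0{width}d}.tif"


# First sitting: pages 1-50. Second sitting, another day: pages 51-120.
batch_one = [page_filename(n) for n in range(1, 51)]
batch_two = [page_filename(n) for n in range(51, 121)]

# Sorting the combined list by name keeps every page in its proper place,
# regardless of which batch was added first.
combined = sorted(batch_two + batch_one)
print(combined[0], combined[49], combined[50], combined[-1])

# Without padding, a name sort puts pages 10 and 11 ahead of page 2.
unpadded = sorted(f"page_{n}.tif" for n in range(1, 12))
print(unpadded[:4])
```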

 
 
lostinspace
 
      10th Jul 2004
Larry,
I've had a sinus migraine for two days, which was quite evident in
my last brief reply.

If all you want from the end result is reading and printing, then line-art
scans will do OK.

However, your initial inquiry said: "a large job to do, turning several
hundreds of pages of 8.5 x 11 xeroxed pages into a single .pdf document."
(end of quote)

There was no mention there of website use, as there is in your reply.
Have you seen the server-side logs for a website when visitors begin viewing
multiple-page PDFs online?

The entire PDF is downloaded again and again as the visitor merely goes
from page to page. Let's say you have a 100-page PDF that a visitor has opened
online. When they view page one, the full file loads; when they change to page
two, the full file loads again. You could have as many as 50-100 loads of your
entire large file if the visitor keeps browsing to the end. (I find such
visitors annoying and short-sighted.) Add to that the possibility of numerous
bots spidering your PDF, both good and bad bots, and your bandwidth usage could
increase unnecessarily, and very fast.
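To put numbers on that, here is a back-of-the-envelope sketch that takes the full-reload-per-page-turn behaviour described above at face value. The 5 MB file size is a hypothetical example, not a measurement; viewers that fetch only the pages they need (for example from a PDF saved with Acrobat's Fast Web View option on a server that supports it) would typically transfer less.

```python
# Worst-case transfer per visitor if every page turn re-downloads the whole PDF,
# as described above. FILE_MB is a hypothetical example size, not a measurement.

FILE_MB = 5.0  # hypothetical size of a 100-page scanned PDF


def full_reload_transfer_mb(file_mb: float, page_turns: int) -> float:
    """Megabytes served when each page turn fetches the entire file again."""
    return file_mb * page_turns


if __name__ == "__main__":
    for turns in (50, 100):  # the 50-100 full loads mentioned above
        total = full_reload_transfer_mb(FILE_MB, turns)
        print(f"{turns} page turns: {total:.0f} MB for a single visitor")
```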

My suggestion is to use some caution in choosing the file's location
(putting it in a folder disallowed in your robots.txt solves the problem for
well-behaved bots).
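A minimal robots.txt for that approach, checked with Python's standard-library `urllib.robotparser`. The `/pdf/` folder and the `example.com` domain are hypothetical placeholders. Keep in mind that robots.txt is only a request: well-behaved crawlers honour it, but bad bots can simply ignore it.

```python
# Check a robots.txt rule with Python's standard-library parser.
# "/pdf/" and "example.com" are hypothetical; use your own folder and domain.
# Only well-behaved crawlers honour robots.txt; it does not block bad bots.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/pdf/readings.pdf", "http://example.com/index.html"):
    allowed = parser.can_fetch("*", url)
    print(f"{url}: {'allowed' if allowed else 'disallowed'}")
```

The same two-line robots.txt goes in the root folder of the web site.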

However, I would also consider breaking up the large document into smaller
chapters or modules. Six to eight pages per file online are plenty from a
server-side point of view.
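A sketch of that split, assuming a hypothetical 100-page, 5 MB document whose size is spread evenly across its pages, and keeping the same full-reload-per-page-turn assumption described earlier in this post so the worst-case totals can be compared.

```python
# Split a long document into parts of at most eight pages and compare the
# worst-case transfer with one big file. Assumptions (hypothetical): 100 pages,
# 5 MB total spread evenly across pages, and every page turn reloads the
# whole file being viewed.

TOTAL_PAGES = 100
MB_PER_PAGE = 5.0 / TOTAL_PAGES


def page_ranges(total_pages: int, pages_per_part: int = 8) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first, last) page ranges for each part."""
    return [
        (start, min(start + pages_per_part - 1, total_pages))
        for start in range(1, total_pages + 1, pages_per_part)
    ]


def reload_transfer_mb(pages_in_file: int) -> float:
    """Worst case for reading one file to the end: each page view reloads it."""
    return pages_in_file * MB_PER_PAGE * pages_in_file


if __name__ == "__main__":
    parts = page_ranges(TOTAL_PAGES)
    print(f"{len(parts)} parts: {parts[0]}, {parts[1]}, ..., {parts[-1]}")

    split_total = sum(reload_transfer_mb(last - first + 1) for first, last in parts)
    single_total = reload_transfer_mb(TOTAL_PAGES)
    print(f"one 100-page file: {single_total:.0f} MB; split: {split_total:.1f} MB")
```

Because the worst case grows with the square of the pages per file, small files cut it sharply.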



 
 
Larry
 
      12th Jul 2004
Unfortunately, your replies are hard to follow. For one thing, instead
of replying to my questions you go into a lot of other side issues.
Even when you are addressing my questions it's hard to understand what
you're saying. It would help if you would read over your posts
before you post them and ask yourself, "Is the other person going to be
able to understand anything useful out of this?"

I'm not saying this to put you down. I'm telling you something
important for your and other people's benefit. The whole point of these
groups is to exchange useful information, and while I appreciate your
replies, you need to try to write your replies in a way so that the
other person will be able to make some sense out of them. Thank you.

Larry


 
 
lostinspace
 
      12th Jul 2004
----- Original Message -----
From: "Larry" <>
Newsgroups: comp.periphs.scanners
Sent: Monday, July 12, 2004 5:43 AM
Subject: Re: Beginner's questions on scanning






My replies do not make sense to you because you have not done any extensive
scanning, which is obvious today.

Did you view the link I provided in the closing line of my previous reply?
I really don't need to ask; I can affirm without your answer that you DID
NOT!

Hopefully someone more adept at communication than myself will come
along and assist you.






 
 
Larry
 
      12th Jul 2004
The web site you referred me to was enormously technical, way beyond
anything I need to know. I just need some basic practical help on how
to scan pages and create a usable pdf file.

Larry


 