What scanner do you recommend here?

Random Person

Hello all. I have a Mustek 1200 UB Plus (600 x 1200 dpi) scanner. When
I scan documents, even at the highest resolution, and try to print them,
the edges of the text come out all fuzzy (sort of like when you use
a .jpg file instead of a .bmp file to store text). The bitmap files the
scanner outputs are also huge (a few megabytes per page), despite
producing a blurry print in the end.

Basically, I have some really old texts that are falling apart that I'd
like to digitise. I've seen libraries and archives do it - they can
scan old, out of print, documents and still produce a small .pdf file
(usually <1MB) with crisp text. How do they do it?

So basically I'm looking for a scanner that can produce clearly defined
text (maybe some pictures/graphs in the documents), with a fast
scanning rate - no more than several seconds a page. Lastly, before I
sound like I'm asking for the moon, any chance the scanner can be
inexpensive?

Thanks :)
 
Pen

If they are producing small files, are they text files?
If so, they are using OCR software. Frankly, your problem
sounds more like a problem with the source material
than with the scanner. That said, here's a review from PCMag:
http://www.pcmag.com/article2/0,1759,1812831,00.asp
and one from PCWorld:
http://www.pcworld.com/reviews/article/0,aid,119276,00.asp
 
Synapse Syndrome

Maybe you should actually try using a lower resolution and then using some
OCR software to recognise the text and turn the scans into small PDF files.

There are free trials:
http://www.abbyy.com/finereader_ocr/

ss.
 
kony

Hello all. I have a Mustek 1200 UB Plus (600 x 1200 dpi) scanner. When
I scan documents even on the highest resolution, and try to print it,
the borders of the text comes out all fuzzy (sort of like when you use
a .jpg file instead of a .bmp file to store text). The bitmap files the
scanner outputs are also huge (a few megabytes per page), despite
producing a blurry print in the end.

Look closely at those BMP files with an image editor (zoomed
in). Determine for certain whether the text in the BMP is
already fuzzy or only becomes fuzzy when printed.

Many scanners have manual settings (software) for contrast.
This can reduce fuzzy edges. If you didn't mind editing the
documents you could also use a sharpen filter. If your
image editor has a batch-processing mode, you might even be
able to have it process a folder full of files instead of
manually doing each.
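
As a rough illustration of what a sharpen filter does to fuzzy text edges, here is a minimal sketch in plain Python. The function name `sharpen` and the nested-list image layout are assumptions made for this example, and the 3x3 kernel is one common choice, not necessarily what any particular image editor uses.

```python
def sharpen(pixels):
    """Apply a 3x3 sharpening kernel to a grayscale image.

    `pixels` is a list of rows, each a list of ints in 0..255
    (0 = black, 255 = white). Border pixels are copied unchanged.
    """
    kernel = [[0, -1, 0],
              [-1, 5, -1],
              [0, -1, 0]]
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    total += kernel[ky][kx] * pixels[y + ky - 1][x + kx - 1]
            # Clamp the result back into the valid 0..255 range.
            out[y][x] = max(0, min(255, total))
    return out
```

Flat areas come out unchanged, while pixels on either side of an edge are pushed further apart in brightness. A batch mode would simply call the same function on every file in a folder.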


Basically, I have some really old texts that are falling apart that I'd
like to digitise. I've seen libraries and archives do it - they can
scan old, out of print, documents and still produce a small .pdf file
(usually <1MB) with crisp text. How do they do it?


OCR (optical character recognition) software. It may have
come with your scanner, but such bundled software is often
only a "lite" version. OCR software that does a good job,
especially on poor documents, typically has to be bought,
and documents with odd / fancy text may not OCR very
well regardless.

So basically I'm looking for a scanner that can produce clearly defined
text (maybe some pictures/graphs in the documents), with a fast
scanning rate - no more than several seconds a page. Lastly, before I
sound like I'm asking for the moon, any chance the scanner can be
inexpensive?

Thanks :)

Sometimes it helps to put a dark (black) page behind the
page of scanned text, when the page is thin enough that the
back shows through. This maximizes contrast of the lighter
background and may result in less text bleed-through from
the other side of the page.
 
Random Person

Hi. The output files they're producing are Adobe .pdf files. I'm pretty
sure they aren't using OCR software. I know the old documents have been
scanned in because telltale signs (such as the book's spine and page
edges) are visible.
 
Random Person

Hi Kony. I think part of the problem is that my printer for some reason
prints out a grey "haze" as the white background of the page, i.e. even
parts which are supposed to be all white have some ink pixels on them
after scanning/printing.

I'm pretty sure no OCR program was used on it (see my response to Pen).


The scanned documents in bmp format from my Mustek 1200 UB Plus (600 x
1200 dpi) scanner look crisp. Of course when I zoom in, the edges go
blurry. I am not sure if that is supposed to happen, or if I am
supposed to get 100% black-white contrast at the edges regardless of
zoom.

Unfortunately I can't do any fresh testing other than with the old
scanned documents I have because the scanner no longer works.

Another question: supposing I get to sort out the text edge sharpness
problem, how do I get the scanner to scan in a picture and not treat it
like text? (e.g. fine colour for a picture, line art for the text).

Thanks!
 
kony

Hi. The output files they're producing are Adobe .pdf files. I'm pretty
sure they aren't using OCR software. I know the old documents have been
scanned in because tell tale signs (such as the book's spine and page
edges) are visible.


It doesn't matter what the output format is.
To turn colored pixels (scanned data) into text, it needs to be
OCR'd. You can of course use a PDF to just display a
graphic image instead, but that makes a relatively HUGE file, which
takes away most of the benefits of the PDF format, apart from
helping unknowing users print something at the
correct size.
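
To put rough numbers on "huge", here is a small sketch that computes the uncompressed size of one scanned page. The function name `raw_scan_bytes` and the US Letter page size (8.5 x 11 inches) are assumptions for illustration, and the figures ignore file headers and BMP row padding.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Return the uncompressed pixel-data size of a scan, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each row is rounded up to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page: US Letter, 8.5 x 11 inches.
print(raw_scan_bytes(8.5, 11, 300, 24))  # 24-bit colour at 300 dpi
print(raw_scan_bytes(8.5, 11, 300, 8))   # 8-bit greyscale at 300 dpi
print(raw_scan_bytes(8.5, 11, 300, 1))   # 1-bit line art at 300 dpi
```

Under these assumptions a 300 dpi page is about 25 MB in 24-bit colour, about 8.4 MB in greyscale and about 1 MB as 1-bit line art, all before any compression.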

Acrobat (full, not the reader) allows printing from *any*
application to PDF format, as well as other ways to convert
to or create PDF (from OCR output as well as other formats).
If you want PDF and are particular about how the result
looks, you'll need to become proficient at Acrobat (which it
would appear is not the case yet).
 
kony

Hi Kony. I think part of the problem is that my printer for some reason
prints out a grey "haze" as the white background of the page, i.e. even
parts which are supposed to be all white have some ink pixels on them
after scanning/printing.

Scanners register a pixel as either true white or something less. What I mean
is that even if a page looks relatively white, if it's not
"true" white in the scanner's "mind", you would indeed get
very light grey pixels instead of white in the BMP. This is
where adjusting contrast comes into play, to boost the
lightest values up to a true-white level. Further, some
imaging programs allow adjusting the midtone; this will either
lighten or darken (your choice) that mid-value where the
dark text blurs against the light background.
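
The contrast and midtone adjustment described above can be sketched as a per-pixel "levels" function. This is a minimal illustration, not how any specific scanner driver or image editor implements it; the name `adjust_levels` and its parameters are made up for the example.

```python
def adjust_levels(value, black=0, white=255, gamma=1.0):
    """Remap one grayscale value (0..255).

    Values at or below `black` become 0, values at or above `white`
    become 255, and `gamma` shifts the midtones (above 1.0 lightens,
    below 1.0 darkens).
    """
    if white <= black:
        raise ValueError("white point must be above black point")
    scaled = (value - black) / (white - black)
    scaled = max(0.0, min(1.0, scaled))  # clip to the 0..1 range
    return round(255 * scaled ** (1.0 / gamma))


# A light grey "haze" of 230 becomes true white once the white point
# is pulled down to 230.
print(adjust_levels(230, white=230))
```

Applied to every pixel, a lower white point turns a faint grey background into plain white, which is what stops the printer from laying down ink where the page should be blank.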


I'm pretty sure no OCR program was used on it (see my response to Pen).

If a full page of easily-legible text is a PDF file of less
than a few hundred KB (one with no other picture, and this
"few hundred KB" is a rough guesstimation) then it would
have to be OCR'd.

If you have the full version of Acrobat, maybe even the
reader, you can try to select and copy text. If the text
characters are copied rather than a bitmap (to the
clipboard) then it HAD to be OCR'd.


The scanned documents in bmp format from my Mustek 1200 UB Plus (600 x
1200 dpi) scanner look crisp. Of course when I zoom in, the edges go
blurry. I am not sure if that is supposed to happen, or if I am
supposed to get 100% black-white contrast at the edges regardless of
zoom.

Unfortunately I can't do any fresh testing other than with the old
scanned documents I have because the scanner no longer works.

Oh.
Yeah I guess you need another scanner then.
Prices vary wildly on scanners, I suggest you first look
into the pricing on OCR software then determine how well it
fits into the entire scanner/etc budget. Good OCR software
plus an average scanner will be better at this task than
poor OCR and a great scanner. However, the issue of
adjusting contrast via software will usually still apply.

Another question: supposing I get to sort out the text edge sharpness
problem, how do I get the scanner to scan in a picture and not treat it
like text? (e.g. fine colour for a picture, line art for the text).

The scanner doesn't "treat" anything. It ONLY creates a
bitmap. There is no problem with using fine color for text
instead of line-art, as either way the result is a bitmap and will not
differ much when compared with actual text characters. Using actual
text characters is necessary to achieve good filesizes with
good clarity.
 
Synapse Syndrome

Random Person said:
Hi. The output files they're producing are Adobe .pdf files. I'm pretty
sure they aren't using OCR software. I know the old documents have been
scanned in because tell tale signs (such as the book's spine and page
edges) are visible.

Sounds like they are just highly compressed JPEGs embedded in the PDF files.
This means that you won't be able to search or select text in the documents.

ss.
 
kony

Sounds like they are just highly compressed JPEGs embedded in the PDF files.
This means that you won't be able to search or select text in the documents.

ss.

Maybe JPEG, though often I find text remains more readable
at similar filesizes with PNG or GIF.
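
One way to see why a lossless format can do well on text: PNG uses Deflate compression, which Python exposes through the standard `zlib` module. The sketch below compresses a synthetic 1-bit "page" of evenly spaced stripes. The page layout and the name `synthetic_text_page` are assumptions for illustration, and a real scanned page will compress far less than this idealised one.

```python
import zlib


def synthetic_text_page(width_px=2550, height_px=3300):
    """Build a fake 1-bit page: bands of striped 'text' on blank paper."""
    row_bytes = (width_px + 7) // 8
    blank = bytes(row_bytes)                       # an empty row
    inked = bytes([0b00111100]) * row_bytes        # a repeating stripe row
    rows = []
    for y in range(height_px):
        in_text_line = (y % 50) < 20               # 20 inked rows per 50
        rows.append(inked if in_text_line else blank)
    return b"".join(rows)


page = synthetic_text_page()
packed = zlib.compress(page, 9)
print(len(page), len(packed))
```

The compressed data is lossless, so decompressing it gives back every pixel exactly, with none of the fuzzy halo that JPEG tends to put around sharp edges.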
 
John

Someone asked a similar question at a deal website. I scan rebate
receipts all the time. I save them as jpeg or pdf, whatever it's set to
when I scan. I'm using the Canon 3000ex scanner, which is sold in the
US for 49 bucks regular price. They have a wide variety of scanners
below 99 bucks which seem sort of similar. See if you can find an
example of a scan from it. I don't see any fuzziness unless it was
there to begin with. A recent receipt I got for a wireless phone was
printed out blurry so obviously it's still blurry, but the others are
fine.
 
