OCR software and ambiguity

Big Blue

I am in the market for a scanner so I have been reading posts here for the
past week. What I find frustrating are the responses I have read regarding
questions about OCR software. Since I will be using this feature primarily
and picture and slide transfer secondarily, I have been looking for
information on this subject. I soon discovered that the vast number of
posts deal with 35 mm, film, and pictures, not document transfer. When I
finally did come across a few posts on the OCR subject, one post said that
the best OCR programs are true/accurate 99% of the time. What the poster
failed to indicate, however, was what the name was for that software.
Another post referred to the "good" software without any mention of the
manufacturers' names. Since I am planning on spending no more than $200, I
am particularly interested in bundled OCR programs and the scanners they
come with. Thus far, I have had a very tough time finding out what is good and
what is not in terms of scanner and accompanying software. It appears that
some scanners are good mechanically but are bundled with mediocre software
while other scanners are not as structurally sound, but possess topnotch
software. Is there a scanner that is both well made and is loaded with solid
OCR software? If not, what would be a good compromise in my price category
and projected purposes? I am not an avid photographer but would like to put
some of my old slides and pictures onto a CD as well as use the scanner for
document transfer to Word and Adobe PDF.
Thanks
 
mark.grebner

My strong recommendation would be Abbyy. I've used it for two years,
running OCR on about 30,000 pages, and my impression is that it has the
highest accuracy in the marketplace. I think it's about twice as
costly as the $200 limit you mention.
 
Dances With Crows

What I find frustrating are the responses I have read
regarding questions about OCR software. Since I will be using this
feature primarily and picture and slide transfer secondarily, I have
been looking for information on this subject.

I hear they've invented this thing called Google....

the vast number of posts deal with 35 mm, film, and pictures, not
document transfer.

Yep. Well, what can you do? Most of the more vocal folks deal with
picture scanning, since scanning text, OCRing it, and getting decent
results is simpler from an end-user perspective than scanning pictures
and getting good results.

one post said that the best OCR programs are true/accurate 99% of the
time. What the poster failed to indicate, however, was what the name
was for that software.

99% accuracy seems awfully high. It's certainly possible if you have
high-contrast text that isn't mangled/broken, but OCR engines are
software, and All Software Sucks. Don't take my opinion as The Truth,
though; I work extensively with images that have broken/mangled/curled
type on them and there's no OCR engine in existence that will grok that.

Since I am planning on spending no more than $200, I am particularly
interested in bundled OCR programs

"Bundled" typically equals "crap". Another poster mentioned Abbyy
Finereader, which is actually a very good engine. It's expensive, but
may be worth it. You could also take a look at Omnipage and TypeReader;
IME TypeReader is better than Omnipage at recognizing text but slightly
worse than Finereader.

I have had a very tough time finding out what is good and what is not
in terms of scanner and accompanying software. It appears that some
scanners are good mechanically but are bundled with mediocre software
while other scanners are not as structurally sound, but possess
topnotch software. Is there a scanner that is both well made and is
loaded with solid OCR software?

You typically get what you pay for when it comes to hardware. In many
fields, the best software is Free, but OCR is not one of those fields
IME. If I were you, I'd get the best hardware I could for my money, but
then I don't care much about commercial software since I don't run
Windows at home.

would like to put some of my old slides and pictures onto a CD as well
as use the scanner for document transfer to Word and Adobe PDF.

Putting JPEGs or TIFFs on CD is orthogonal to scanning them in. Any
good CD-burning program like k3b or XCDRoast will do that. If you meant
"create a Kodak Photo CD", that's a little different but still doable.
Word is a horrible format for keeping text in; use something that can be
edited anywhere with anything, like simple HTML. PDF is fine for
distributing documents to others, but it's a complete and utter bitch to
edit a PDF. Keep the master document in something else (HTML, text,
XML, LaTeX, troff, whatever) and convert it to PDF using print to
file->ps2pdf (or OpenOffice's "save as PDF") when you need to. HTH,
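The "print to file, then ps2pdf" step can be scripted. The following is a minimal sketch, assuming Ghostscript's ps2pdf command is installed and on the PATH; the function names and the file name in the usage note are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def ps2pdf_command(ps_file: str, pdf_file: Optional[str] = None) -> List[str]:
    """Build the ps2pdf command line.

    If no output name is given, reuse the input name with a .pdf suffix.
    """
    src = Path(ps_file)
    dst = Path(pdf_file) if pdf_file else src.with_suffix(".pdf")
    return ["ps2pdf", str(src), str(dst)]


def convert(ps_file: str, pdf_file: Optional[str] = None) -> None:
    """Run ps2pdf on a PostScript file produced by "print to file"."""
    if shutil.which("ps2pdf") is None:
        # ps2pdf ships with Ghostscript; fail clearly if it is missing.
        raise RuntimeError("ps2pdf not found on PATH")
    subprocess.run(ps2pdf_command(ps_file, pdf_file), check=True)
```

Calling convert("report.ps") would then write report.pdf next to the PostScript file, while the master document stays in whatever editable format you prefer.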
 
Martin Trautmann

99% accuracy seems awfully high. It's certainly possible if you have
high-contrast text that isn't mangled/broken, but OCR engines are
software, and All Software Sucks.

99 % is awfully low ;-)

It depends on how you define OCR and accuracy:

A single page may hold about 3000 characters. Thus 99 % means that 30
characters on a single page might be incorrect - which means that up to 30
words are incorrect (or even 30 sentences, not to say 100 % of the
text).
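The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 3000-character page is the figure used in this post, while the average word length of 5 characters and the idea that errors are independent and evenly spread are assumptions made for the example, not measured values.

```python
def expected_char_errors(chars_per_page: int, char_accuracy: float) -> int:
    """Expected number of misread characters on one page."""
    return round(chars_per_page * (1.0 - char_accuracy))


def word_accuracy(char_accuracy: float, avg_word_len: int = 5) -> float:
    """Chance that every character of a word is read correctly.

    Assumes character errors are independent of each other, which is a
    simplification; real OCR errors tend to cluster in damaged areas.
    """
    return char_accuracy ** avg_word_len


if __name__ == "__main__":
    errors = expected_char_errors(3000, 0.99)
    print(f"{errors} misread characters per page")
    print(f"{word_accuracy(0.99):.1%} of 5-letter words fully correct")
```

Under these assumptions, 99 % character accuracy leaves roughly one word in twenty with an error somewhere in it, which is why the same figure can sound either high or low.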

In fact I've used Abbyy FineReader recently - bundled into the Microtek
software. Although I did not have any choice how to fine tune the
application, the result was pretty good and much better than myself
typing it again.
 
Dances With Crows

99 % is awfully low ;-) It depends on how you define OCR and accuracy:
A single page may hold about 3000 characters. Thus 99 % means that 30
characters on a single page might be incorrect - which means that up
to 30 words are incorrect (or even 30 sentences, not to say 100 % of
the text).

YMMV on this as on everything. My experience has been almost entirely
with OCRing images that are of fairly low quality; microfilm that's been
sitting in a vault for N years tends to lose detail. Some of the
originals were printed with low-contrast ink, photographed ineptly, or
damaged by water/sunlight/rabid weasels. IME, the best OCR comes from
good-quality original images, but different engines may provide
different results, and you *always* have to proofread and correct
things.

In fact I've used Abbyy FineReader recently - bundled into the
Microtek software. Although I did not have any choice how to fine tune
the application, the result was pretty good and much better than
myself typing it again.

If the Finereader you used was bundled and didn't have any options, it
was a neutered version. If you can't retype a page of text and get
better accuracy than what an OCR engine gives you, you may need to
practice your typing/proofreading.

OCR is useful because it reduces the time and tedium necessary to
convert images to text, not because it does a better job than a human
would. OCR does a *worse* job than human eyeballs, but you can load up
100 images and tell the engine to batch-locate and batch-recognize and
spit out an RTF. Then you can go do something else while the engine's
doing the tedious work.
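The batch workflow looks roughly like the sketch below. The recognize callable is a hypothetical stand-in for whatever OCR engine you use; no real engine's API is implied, and this version writes plain text rather than RTF to keep the example short.

```python
from pathlib import Path
from typing import Callable, Iterable


def batch_ocr(images: Iterable[str],
              recognize: Callable[[str], str],
              out_file: str) -> int:
    """Run `recognize` over every image and collect the text in one file.

    `recognize` takes an image path and returns the recognized text.
    Returns the number of images processed.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for image in sorted(images):
            # Mark each page so errors can be traced back to the scan.
            out.write(f"--- {Path(image).name} ---\n")
            out.write(recognize(image).rstrip() + "\n\n")
            count += 1
    return count
```

The page markers are there for the proofreading pass: when something looks wrong in the output, you know which scan to pull up.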
 
Mendel Leisk

Another vote for Abbyy FineReader, for reliable accurate ocr, though
their prices have gone thru the roof. Ver. 5.0 was around $99. Later
versions have AT LEAST doubled. 6.0 works fine for me, a little more
accurate than 5.0. Tried 7.0, something was bugging me about it, can't
recall if it was a DECREASE in accuracy, or something else buggy.
 
Hecate

If not, what would be a good compromise in my price category
and projected purposes? I am not an avid photographer but would like to put
some of my old slides and pictures onto a CD as well as use the scanner for
document transfer to Word and Adobe PDF.
Thanks

Abbyy FineReader is, IMHO, the best. Followed by Omnipage. Whatever you
do, avoid Iris.
 
Hecate

Another vote for Abbyy FineReader, for reliable accurate ocr, though
their prices have gone thru the roof. Ver. 5.0 was around $99. Later
versions have AT LEAST doubled. 6.0 works fine for me, a little more
accurate than 5.0. Tried 7.0, something was bugging me about it, can't
recall if it was a DECREASE in accuracy, or something else buggy.

Using v7 with no problems at all. And it's the most accurate so far.
 
Mendel Leisk

Also, while I mainly use it for bare-bones text records, Finereader can
output very nice Word documents, with columns, multiple fonts, JPEG
pictures, etcetera.
 
