Can I use Access (2007) for sorting scanned documents?

martin gifford

Hi,
I've decided to create a paperless office.
I've scanned about 2000 documents to jpeg files.
Now I want to sort them and cross reference them according to topics.
Is Access 2007 a good tool for this purpose?
If so, are there any good online guides available that show how it can be
done?
Any quick tips? (I've used Access a tiny bit a long time ago, but I could
get up to speed quickly.)
Thanks,
Martin Gifford.
 
M Skabialka

That gives me an idea for something I'd like to try, but with thousands of
scanned documents in multiple sub-folders I'm not sure how I would find a
document that I was looking for, with names like 112345.jpg - which is a
scanned memorandum. Is there a way to tag a document, then have Access
retrieve this information? Using Access 2007 and Win2K.
Mich
 
John W. Vinson

owwwwwww....

JPG is a good storage medium for photographs and images.
It is a TERRIBLE storage medium for documents!

jpg files do not contain text, are not searchable, are not editable, cannot be
indexed, are not crossreferenced by content.

You will at the very least want to run some character recognition software to
convert the text in the images into computer-readable text. Is that part of
the plan (I hope!)?
 
David W. Fenton

John W. Vinson said:
owwwwwww....

JPG is a good storage medium for photographs and images.
It is a TERRIBLE storage medium for documents!

jpg files do not contain text, are not searchable, are not
editable, cannot be indexed, are not crossreferenced by content.

You will at the very least want to run some character recognition
software to convert the text in the images into computer-readable
text. Is that part of the plan (I hope!)?

It's also not a good format for black and white documents, either.
PNG would be much better.
 
Arvin Meyer [MVP]

M Skabialka

Actually I haven't scanned the documents yet, one of our departments has
these paper documents and wants me to create a database so they can scan
them and track them and go paperless. They want to be able to find the
documents in the database, then click on a hyperlink and open them. I
hadn't thought about the format of these, since I usually scan into jpeg, so
storing as a .pdf or .png is probably a much better idea. Some of the users
have the full Acrobat and others have the Reader.
But I was thinking that the documents would still probably have some fairly
generic names, but somehow users would need to be able to find a particular
one. e.g. there are hundreds of memorandum for record files, all associated
with other documents or products. I don't think Access would be able to
launch a search through the text of these files so identifying the content
remains a mystery at this point.
Any ideas?
 
John W. Vinson

M Skabialka said:
Actually I haven't scanned the documents yet, one of our departments has
these paper documents and wants me to create a database so they can scan
them and track them and go paperless. They want to be able to find the
documents in the database, then click on a hyperlink and open them. I
hadn't thought about the format of these, since I usually scan into jpeg, so
storing as a .pdf or .png is probably a much better idea. Some of the users
have the full Acrobat and others have the Reader.
But I was thinking that the documents would still probably have some fairly
generic names, but somehow users would need to be able to find a particular
one. e.g. there are hundreds of memorandum for record files, all associated
with other documents or products. I don't think Access would be able to
launch a search through the text of these files so identifying the content
remains a mystery at this point.

You'll need to use the old reliable USB interface to get this done.

Not the Universal Serial Bus... but the much older and more powerful Using
Someone's Brain.

Full text searching of documents is a complex (some would say arcane) art in
its own right, and you're correct, Access would not be the tool of choice. I
worked tangentially on a "textbase" database twenty years ago, and the field
has evolved a lot since then. But now, as then, proper indexing (manually
inspecting the document, making an intellectually informed choice of which
search terms should apply, and entering them) was by FAR the most expensive
and difficult part of the task.

If the department assumes that "oh, once the documents are scanned, we can
find anything in an instant at the click of a mouse" they are cruising for a
big disappointment.
 
M Skabialka

I think I may have to create a table of 'Keywords' and link this to the
document table. Someone will have to go into this table on a subform for
each document and create keywords or tags to describe the document. Then
later they can look for all documents with that tag.
Or maybe a TreeView of the Explorer path from Windows..? I haven't tried
this before, and the folder structure could get pretty deep...
 
John W. Vinson

M Skabialka said:
I think I may have to create a table of 'Keywords' and link this to the
document table. Someone will have to go into this table on a subform for
each document and create keywords or tags to describe the document. Then
later they can look for all documents with that tag.
Or maybe a TreeView of the Explorer path from Windows..? I haven't tried
this before, and the folder structure could get pretty deep...

You might want to consider storing a Hyperlink field pointing to the document
itself. A three table structure might be appropriate:

Documents
  DocID
  DocLocation <hyperlink>
  Title
  <other fields specific to the document>

Keywords
  Keyword <Text, Primary Key>

DocKeywords
  DocID
  Keyword

You could fill in keywords using a subform with a combo box to select existing
keywords, and use the combo's Not In List event to add new keywords as needed.
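
For readers who want to try the three-table layout before building it in Access, here is a minimal sketch in Python using SQLite as a stand-in. It is not Access code: the function names (create_schema, add_document, tag_document, find_by_keyword) are made up for illustration, DocLocation is stored as plain text rather than an Access Hyperlink field, and the "insert the keyword if it is new" step only approximates what a Not In List handler would do.

```python
import sqlite3

def create_schema(conn):
    """Create the Documents / Keywords / DocKeywords tables (SQLite stand-in)."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript("""
        CREATE TABLE Documents (
            DocID       INTEGER PRIMARY KEY,
            DocLocation TEXT NOT NULL,   -- path or URL; a Hyperlink field in Access
            Title       TEXT
        );
        CREATE TABLE Keywords (
            Keyword TEXT PRIMARY KEY
        );
        CREATE TABLE DocKeywords (
            DocID   INTEGER NOT NULL REFERENCES Documents(DocID),
            Keyword TEXT    NOT NULL REFERENCES Keywords(Keyword),
            PRIMARY KEY (DocID, Keyword)  -- one row per document/keyword pair
        );
    """)

def add_document(conn, location, title):
    """Insert a document row and return its new DocID."""
    cur = conn.execute(
        "INSERT INTO Documents (DocLocation, Title) VALUES (?, ?)",
        (location, title),
    )
    return cur.lastrowid

def tag_document(conn, doc_id, keyword):
    """Attach a keyword to a document, creating the keyword if it is new."""
    conn.execute("INSERT OR IGNORE INTO Keywords (Keyword) VALUES (?)", (keyword,))
    conn.execute(
        "INSERT OR IGNORE INTO DocKeywords (DocID, Keyword) VALUES (?, ?)",
        (doc_id, keyword),
    )

def find_by_keyword(conn, keyword):
    """Return the locations of all documents carrying the given keyword."""
    rows = conn.execute(
        """SELECT d.DocLocation
           FROM Documents AS d
           JOIN DocKeywords AS k ON k.DocID = d.DocID
           WHERE k.Keyword = ?
           ORDER BY d.DocID""",
        (keyword,),
    )
    return [row[0] for row in rows]
```

The composite primary key on DocKeywords keeps the same tag from being attached to a document twice, which is probably what you would want in the Access version as well.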
 
David W. Fenton

M Skabialka said:
Actually I haven't scanned the documents yet, one of our
departments has these paper documents and wants me to create a
database so they can scan them and track them and go paperless.

In my experience, this is a fool's errand. I was involved in a
project of this type more than ten years ago using a commercial
document management program and a hardware budget of tens of
thousands (for one PC and its scanner and supporting hardware!), and
it was abandoned after they found out how much damned work it was to
both scan and categorize documents in a way that would allow them to
be usefully retrieved.

They paid a huge restocking fee when they returned all the hardware.
 
David W. Fenton

As others have mentioned, you need something that's searchable.
PDFs, at least some of them, are searchable. Access doesn't read
directly from a PDF, but other applications do and some of them
*might* be accessible through automation. There is a PDF to Word
converter:

http://www.nuance.com/pdfconverter/

that might do the job, but I'd call and ask them first.

But if it's just a scanned image that hasn't been OCR'd, it won't be
text searchable.

And if you have to OCR it, it has to be proofed. Obviously the accuracy
of OCR is much higher than it once was (if it weren't, Google Books
couldn't exist), but it's still not perfect.
 
M Skabialka

What Uncle Sam wants, Uncle Sam gets!

David W. Fenton said:
In my experience, this is a fool's errand. I was involved in a
project of this type more than ten years ago using a commercial
document management program and a hardware budget of 10s of
thousands (for one PC and its scanner and supporting hardware!), and
it was abandoned after they found out how much damned work it was to
both scan and categorize documents in a way that would allow them to
be usefully retrieved.

They paid a huge restocking fee when they returned all the hardware.
 
M Skabialka

I had planned to use a hyperlink. The path to the document plus its name
should give a fairly accurate description of what is in the document.
These are documents supporting products, and the folders they are stored
under will categorize the documents by product. The users don't
want to open Explorer and wend their way through multiple folders
to find a document if they can select a keyword or two, find the document,
link to it and open it.
I have a database which does this with digitized technical drawings, but it
can import a pre-generated text file describing what is in the folder, so the
user doesn't need to generate any keywords. For example, they select Drawing
No 123-456 and it lists all the sheets of the drawing and the revision number.
They make a selection and click on a hyperlink to the actual drawing, which
opens in the appropriate application. Its location is completely
transparent to the user.
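
If the folder names really do categorize the documents, starter keywords could be derived from each file's path before anyone types a tag by hand. The following is only a rough sketch of that idea in Python, not something from the database described above: the function name keywords_from_path, the S:\Scans root and the example folder names are all hypothetical.

```python
from pathlib import PureWindowsPath

def keywords_from_path(path, root):
    """Derive starter keywords from the folders between root and the file.

    Each folder name below the root becomes one keyword, followed by the
    file name without its extension. Everything is lower-cased so that
    later lookups do not depend on how the folders were capitalized.
    """
    relative = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    parts = list(relative.parts[:-1])  # folder names only
    parts.append(relative.stem)        # file name minus extension
    return [part.lower() for part in parts]

# Hypothetical example path; the real folder layout will differ.
example = keywords_from_path(r"S:\Scans\WidgetA\Memos\112345.jpg", r"S:\Scans")
```

With the made-up path above, the result would contain the product folder, the document-type folder and the file number, which could then be loaded into a keyword table as a first pass and refined by hand.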
 
Arvin Meyer [MVP]

David W. Fenton said:
But if it's just a scanned image that hasn't been OCR'd, it won't be
text searchable.

And if you have to OCR it, it has to be proofed. Obviously the accuracy
of OCR is much higher than it once was (if it weren't, Google Books
couldn't exist), but it's still not perfect.

Not all PDFs are scanned images. PDFs made from text documents or made by
printing to a PDF printer can be searched just like the original text, at
least with the Foxit PDF reader that I use:

http://www.foxitsoftware.com/pdf/reader_2/down_reader.htm
 
David W. Fenton

Arvin Meyer [MVP] said:
Not all PDFs are scanned images.

Well, of course not. But I thought the OP said it was all scanned
images. Hence my remark "if it's just a scanned image that hasn't
been OCR'd...".
 
