PDF to Access DB

FordsAngel

Is there a way to load tables in MSAccess with information from a PDF file? I
have been told that it is possible with pdf pro, but have not yet figured out
the process...

Thank you!
 
Jeff Boyce

Not sure why you posted twice ...

A PDF file is an image. To get the "data" out of it, you'd need to convert
it to something other than an image.

If you have a "pro" version of something like Adobe, you can try the OCR
(optical character recognition) feature to re-build the underlying data ...
but be aware that OCR is less than 100% accurate. Plan on having some of
your data 'lost in translation'.

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
a a r o n _ k e m p f

SQL Server can search through PDF files using FullText Search.
 
Jeff Boyce

Did you also notice that some SQL-Server gurus point out that modifying
SQL-Server's FullText search to add the ability to search PDF files also
increases the security risk?

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
a a r o n _ k e m p f

uh, I don't believe everything I hear.. especially from so-called
'mvps'

I'd take any MCP over any MVP any day of the week.. it's about
demonstrable knowledge, not 'who you know'.. you know?

-Aaron
 
Paul Shapiro

A PDF file is not necessarily an image. It can contain text if it was
created from a text-based source. But it's still an "uncomfortable" medium
for extracting data since that's not its intended purpose. If you open a
PDF in Adobe Acrobat, you can try the Save As menu to see what options you
have. Plain text is one of them, but it will still be tough to count on
extracting the data cleanly and reliably. If you have any option to get data
in a data-centric format, you'll have a much easier time.
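
To make the "tough to extract cleanly" point concrete, here is a minimal Python sketch of turning a plain-text export into CSV that Access's text import could read. It assumes the export has one record per line with columns separated by two or more spaces; that layout, the sample lines, and the names `text_to_rows` and `rows_to_csv` are made up for illustration and will not match every PDF.

```python
import csv
import io
import re

# Assumption: columns in the exported text are separated by 2+ whitespace chars.
COLUMN_GAP = re.compile(r"\s{2,}")


def text_to_rows(text, expected_columns):
    """Split exported text into rows; return (good_rows, rejected_lines)."""
    good, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = COLUMN_GAP.split(line)
        if len(fields) == expected_columns:
            good.append(fields)
        else:
            rejected.append(line)  # page footers, wrapped cells, etc.
    return good, rejected


def rows_to_csv(rows):
    """Serialize rows as CSV text, quoting fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical sample of what a plain-text export might look like.
sample = """\
Part No   Description        Qty
A-100     Left bracket       12
A-101     Right bracket, long   7
Page 1 of 3
"""

rows, rejected = text_to_rows(sample, expected_columns=3)
csv_text = rows_to_csv(rows)
```

Lines that do not split into the expected number of columns land in `rejected` rather than being loaded silently, which is where page footers and wrapped cells tend to show up and need a manual look.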
 
James A. Fortune

Jeff said:
Not sure why you posted twice ...

A PDF file is an image. To get the "data" out of it, you'd need to convert
it to something other than an image.

If you have a "pro" version of something like Adobe, you can try the OCR
(optical character recognition) feature to re-build the underlying data ...
but be aware that OCR is less than 100% accurate. Plan on having some of
your data 'lost in translation'.

Regards

Jeff Boyce
Microsoft Office/Access MVP

A PDF file can contain images, but to claim that "a PDF file is an
image" seems shockingly simplistic, IMO, unless you are only considering
the output to your screen. For example, the PDF 1.7 Reference
describing the PDF format contains about 1310 pages. See the discussion
in the following thread:

http://groups.google.com/group/microsoft.public.access/browse_frm/thread/d34aa27e14854f45

Basically, extracting text and images from a PDF file with 100% accuracy
ranges from fairly easy to very difficult depending on things like the
scope and method of compression used, the number of edits made and
whether or not PDF Linearization optimization was employed by the
program used to create the PDF file. For anything past "somewhat easy"
I recommend not using Access to perform the extraction from the data
streams even though Access theoretically has enough capability to
perform the task. I agree that image and text data can be extracted
from a screen capture (or try a simple copy/paste for text data), but I
consider those methods, especially the "lossy" OCR, to be last resorts.
I think I remember seeing a free software tool that can split a PDF
file into individual one page PDF files. Googling... Perhaps it was:

http://www.pdfhacks.com/pdftk/

Using something like that could possibly break a complex problem down to
smaller pieces that may be more amenable to data extraction. If all
else fails, there are likely many commercial software packages that can
extract data from PDF files and that cost under $100.00.
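
As a rough illustration of the "fairly easy" end of that range, here is a minimal Python sketch that pulls text out of Flate-compressed content streams using only the standard library. It assumes the simplest case: streams compressed with FlateDecode (zlib), text shown with literal `(...) Tj` operators, and no escaped parentheses, custom font encodings, or object streams. The function name `extract_simple_text` and the sample fragment are made up for the example; real files often break these assumptions.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of a PDF stream object.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Literal strings shown with the Tj operator, e.g. "(Hello) Tj".
# Assumption: no escaped or nested parentheses inside the string.
SHOW_TEXT = re.compile(rb"\((.*?)\)\s*Tj")


def extract_simple_text(pdf_bytes):
    """Return literal strings shown with Tj in Flate or uncompressed streams."""
    found = []
    for match in STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            content = raw  # not zlib data; treat it as uncompressed
        for text in SHOW_TEXT.findall(content):
            found.append(text.decode("latin-1"))
    return found


# Hypothetical fragment built here so the sketch runs without a real file.
page_ops = b"BT /F1 12 Tf 72 720 Td (Invoice 1001) Tj ET"
compressed = zlib.compress(page_ops)
fragment = (
    b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream"
)

texts = extract_simple_text(fragment)
```

Anything beyond this (other filters, hex strings, positioned glyph arrays, text spread across edited revisions) is where the difficulty climbs quickly, which is why a dedicated tool is usually the better choice.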

James A. Fortune
(e-mail address removed)
 
Larry Linson

a a r o n _ k e m p f said:
I'd take any MCP over any MVP any day of the week..
it's about demonstrable knowledge, not 'who you know'..
you know?

It would be interesting to have the psychic ability to know just how many
people are rolling on the floor, laughing at the idea of trusting Mr. Kempf
(who claims, but hasn't provided a link to prove, some flavor of MCP) over
any of the MVPs who post in this forum.

I'm not "calling for a vote", but I think I can hear the peals of laughter
from here.

Larry
 
Jeff Boyce

Thanks for the clarifications ... that may just satisfy my self-imposed
"learn one new thing each day" requirement!

Jeff B.
 
