PDF to Access DB

FordsAngel · Mar 31, 2009

Is there a way to load tables in MSAccess with information from a PDF file? I
have been told that it is possible with pdf pro, but have not yet figured out
the process...

Thank you!

Jeff Boyce · Mar 31, 2009

Not sure why you posted twice ...

A PDF file is an image. To get the "data" out of it, you'd need to convert
it to something other than an image.

If you have a "pro" version of something like Adobe, you can try the OCR
(optical character recognition) feature to re-build the underlying data ...
but be aware that OCR is less than 100% accurate. Plan on having some of
your data 'lost in translation'.

Regards

Jeff Boyce
Microsoft Office/Access MVP

a a r o n _ k e m p f · Mar 31, 2009

SQL Server can search through PDF files using FullText Search.

a a r o n _ k e m p f · Mar 31, 2009

I believe that you just need to register / install something that
implements the Acrobat IFilter interface

http://dineshasanka.spaces.live.com/Blog/cns!22A79FCE82651673!248.entry

Jeff Boyce · Apr 1, 2009

Did you also notice that some SQL-Server gurus point out that modifying
SQL-Server's FullText search to add the ability to search PDF files also
increases the security risk?

Regards

Jeff Boyce
Microsoft Office/Access MVP

a a r o n _ k e m p f · Apr 1, 2009

uh, I don't believe everything I hear.. especially from so-called
'mvps'

I'd take any MCP over any MVP any day of the week.. it's about
demonstrable knowledge, not 'who you know'.. you know?

-Aaron

Paul Shapiro · Apr 1, 2009

A PDF file is not necessarily an image. It can contain text if it was
created from a text-based source. But it's still an "uncomfortable" medium
for extracting data since that's not its intended purpose. If you open a
PDF in Adobe Acrobat, you can try the Save As menu to see what options you
have. Plain text is one of them, but it will still be tough to count on
extracting the data cleanly and reliably. If you have any option to get data
in a data-centric format, you'll have a much easier time.
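
To illustrate the plain-text route, here is a minimal sketch in Python, not a definitive method. It assumes the PDF has already been saved as plain text and that columns in that text are separated by two or more spaces or a tab, which will not hold for every PDF. The function names, column count, and sample data are hypothetical. Lines that don't split into the expected number of columns are set aside for manual review rather than guessed at, and the result is CSV text, which Access can import.

```python
import csv
import io
import re

# Assumption: two or more whitespace characters, or a single tab, mark a column break.
COLUMN_BREAK = re.compile(r"\s{2,}|\t")


def text_to_rows(text, expected_columns):
    """Split exported text into rows; lines with the wrong column count are rejected."""
    rows, rejected = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = COLUMN_BREAK.split(line)
        if len(fields) == expected_columns:
            rows.append(fields)
        else:
            rejected.append(line)  # page footers, wrapped text, etc.
    return rows, rejected


def rows_to_csv(rows, header):
    """Return the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample standing in for text saved from a PDF.
    sample = (
        "A-100   Left bracket   12\n"
        "Page 1 of 3\n"
        "A-101   Right bracket  7\n"
    )
    rows, rejected = text_to_rows(sample, expected_columns=3)
    print(rows_to_csv(rows, ["PartNo", "Description", "Qty"]))
    print("Needs manual review:", rejected)
```

Checking the rejected list after each run is the point of the design: it shows how much of the file did not extract cleanly.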

James A. Fortune · Apr 1, 2009

Jeff said:
Not sure why you posted twice ...

A PDF file is an image. To get the "data" out of it, you'd need to convert
it to something other than an image.

If you have a "pro" version of something like Adobe, you can try the OCR
(optical character recognition) feature to re-build the underlying data ...
but be aware that OCR is less than 100% accurate. Plan on having some of
your data 'lost in translation'.

Regards

Jeff Boyce
Microsoft Office/Access MVP

A PDF file can contain images, but to claim that "a PDF file is an
image" seems shockingly simplistic, IMO, unless you are only considering
the output to your screen. For example, the PDF 1.7 Reference
describing the PDF format contains about 1310 pages. See the discussion
in the following thread:

http://groups.google.com/group/microsoft.public.access/browse_frm/thread/d34aa27e14854f45

Basically, extracting text and images from a PDF file with 100% accuracy
ranges from fairly easy to very difficult depending on things like the
scope and method of compression used, the number of edits made and
whether or not PDF Linearization optimization was employed by the
program used to create the PDF file. For anything past "somewhat easy"
I recommend not using Access to perform the extraction from the data
streams even though Access theoretically has enough capability to
perform the task. I agree that image and text data can be extracted
from a screen capture (or try a simple copy/paste for text data), but I
consider those methods, especially the "lossy" OCR, to be last resorts.
I think I remember seeing a free software tool that can split a PDF
file into individual one page PDF files. Googling... Perhaps it was:

http://www.pdfhacks.com/pdftk/

Using something like that could possibly break a complex problem down to
smaller pieces that may be more amenable to data extraction. If all
else fails, there are likely many commercial software packages that can
extract data from PDF files and that cost under $100.00.
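
As a rough sketch of the splitting idea, the Python below calls pdftk's "burst" operation to write one PDF per page. It assumes pdftk is installed and on the PATH. The file name, output pattern, and function names are hypothetical, and it only splits the file; it does not extract any data.

```python
import shutil
import subprocess


def build_burst_command(pdf_path, output_pattern="page_%03d.pdf"):
    """Build the pdftk argument list that splits a PDF into one file per page."""
    # "burst" writes each page to its own file; the printf-style pattern names them.
    return ["pdftk", pdf_path, "burst", "output", output_pattern]


def split_pdf(pdf_path, output_pattern="page_%03d.pdf"):
    """Run pdftk burst; raises if pdftk is missing or exits with an error."""
    if shutil.which("pdftk") is None:
        raise FileNotFoundError("pdftk was not found on PATH")
    subprocess.run(build_burst_command(pdf_path, output_pattern), check=True)


if __name__ == "__main__":
    # Hypothetical input file; prints the command instead of running it.
    print(" ".join(build_burst_command("report.pdf")))
```

Calling split_pdf("report.pdf") would then leave page_001.pdf, page_002.pdf and so on in the current directory, each small enough to inspect on its own.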

James A. Fortune
(e-mail address removed)

Larry Linson · Apr 1, 2009

a a r o n _ k e m p f said:
I'd take any MCP over any MVP any day of the week..
it's about demonstrable knowledge, not 'who you know'..
you know?

It would be interesting to have the psychic ability to know just how many
people are rolling on the floor, laughing at the idea of trusting Mr. Kempf
(who claims, but hasn't provided a link to prove, some flavor of MCP) over
any of the MVPs who post in this forum.

I'm not "calling for a vote", but I think I can hear the peals of laughter
from here.

Larry

Jeff Boyce · Apr 1, 2009

Thanks for the clarifications ... that may just satisfy my self-imposed
"learn one new thing each day" requirement!

Jeff B.
