Getting text from PDF

Klaus Jensen · Nov 30, 2005

Hi!

I need to extract all text from PDF files for full-text indexing purposes.
How do I do that?

I have looked at several PDF components, but none of them can read the
text in a PDF; they can only create PDFs.

Using an application (or an indexing service) to search the PDF is not what
I need. I need to extract the text and store it in a database.

Any pointers and help will be greatly appreciated.

Thanks in advance

Klaus Jensen
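For the question as asked (pull the text out of each PDF and keep only that), here is a minimal Python sketch that uses only the standard library. It is not a method any poster in this thread describes. It assumes simple PDFs whose page content streams are either uncompressed or FlateDecode-compressed, and it just collects the literal strings that appear between the BT and ET text operators. It ignores font encodings (Latin-1 is a stand-in), hex strings, object streams and encryption, and it may pick up noise from non-content streams, so treat it as an illustration only. The names `extract_text`, `unescape` and the regex constants are hypothetical.

```python
import re
import zlib

# Naive: finds every "stream ... endstream" body anywhere in the file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
# Text objects in a content stream are delimited by the BT and ET operators.
TEXT_OBJECT_RE = re.compile(rb"\bBT\b(.*?)\bET\b", re.DOTALL)
# PDF literal strings: (...) with backslash escapes; nested parentheses are not handled.
LITERAL_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)", re.DOTALL)
# Single-character escapes; backslash + newline is a line continuation and is dropped.
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"\n": b"", b"\r": b""}


def unescape(raw: bytes) -> bytes:
    # Resolve escapes inside a literal string, including \ddd octal codes.
    def repl(m):
        esc = m.group(1)
        if esc[0] in b"01234567":
            return bytes([int(esc, 8) & 0xFF])
        return ESCAPES.get(esc, esc)  # \( -> (, \) -> ), \\ -> \
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, raw, flags=re.DOTALL)


def extract_text(pdf_bytes: bytes) -> str:
    # Return the literal strings found inside BT/ET text objects, space-separated.
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj keeps trailing bytes in unused_data instead of failing.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not FlateDecode-compressed; scan the bytes as-is
        for text_object in TEXT_OBJECT_RE.findall(data):
            for literal in LITERAL_RE.findall(text_object):
                # Latin-1 is a stand-in; real PDFs map bytes through the font's encoding.
                chunks.append(unescape(literal).decode("latin-1"))
    return " ".join(chunks)
```

Strings from a TJ array are joined with spaces, so kerned words can come out split (`[(Fo) -20 (o)] TJ` yields `Fo o`). Dedicated PDF libraries typically rebuild words from glyph positions instead, which matters for indexing real-world files.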

Brian Henry · Nov 30, 2005

If you are using SQL Server, all you need to do is install the Adobe PDF
IFilter, and SQL Server will full-text index the PDFs for you automatically.

Ken Tucker [MVP] · Dec 1, 2005

Hi,

http://www.codeproject.com/showcase/TallComponents.asp

Ken

Klaus Jensen · Dec 1, 2005

Brian Henry said:
If you are using SQL Server, all you need to do is install the Adobe PDF
IFilter, and SQL Server will full-text index the PDFs for you automatically.

Hi Brian

Thanks for your response!

Unfortunately, that would mean storing the PDFs in SQL Server, and I am
talking about 1 GB of data a day, so I'm afraid that is not an option.

- Klaus
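Klaus's constraint is that only the extracted text, not the PDF itself, should go into the database. Here is a minimal sketch of that idea under stated assumptions: SQLite (Python's built-in `sqlite3`) stands in for SQL Server, the `documents` table and the `open_index`, `store_text` and `search` helpers are hypothetical, and search is a plain `LIKE` scan. A real system would instead put a full-text index on the text column, for example SQLite FTS5 or a SQL Server full-text index.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per PDF, holding only its extracted text.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (path TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def store_text(conn: sqlite3.Connection, path: str, text: str) -> None:
    # INSERT OR REPLACE lets a re-processed PDF overwrite its earlier text;
    # "with conn" commits the transaction.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO documents (path, body) VALUES (?, ?)", (path, text)
        )


def search(conn: sqlite3.Connection, term: str) -> list:
    # Escape LIKE wildcards so "%" and "_" in the term match literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT path FROM documents WHERE body LIKE ? ESCAPE '\\' ORDER BY path",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

SQLite's `LIKE` is case-insensitive only for ASCII letters by default, and a `LIKE '%term%'` query scans every row. That is fine for a sketch but not for 1 GB of new documents a day, which is where a real full-text index earns its keep.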

Klaus Jensen · Dec 1, 2005

Ken Tucker said:
http://www.codeproject.com/showcase/TallComponents.asp

Hi Ken

Thanks for your reply, I'll look into it.

- Klaus

Similar Threads

Extract text from EPS-file? (1 reply, Nov 28, 2005)
How to extract highlighted text from a PDF file (1 reply, Mar 4, 2009)
I need to extract the highlighted text from PDF (0 replies, Jun 29, 2009)
Extracting highlighted text from PDF file... Need help..!!! (2 replies, Mar 7, 2009)
Code to Extract Text from PDF (3 replies, Jul 2, 2008)
Creating custom stamps in .pdf Adobe documents (1 reply, Jul 21, 2016)
Copying Text from a PDF file??? (7 replies, May 2, 2014)
Extract Image From PDF (5 replies, Aug 22, 2008)
