in-memory pdf2text?

  • Thread starter: Allan Ebdrup

Allan Ebdrup

I am looking for a .NET class library that takes a PDF file held in memory
as a byte array and converts it to a string (that is, extracts the text
from the file).

A method like this in C#:

public string Pdf2Text(byte[] pdfFile){ ... }

Is this something you can provide, or do you know where I can buy a class
library with this kind of functionality?

I don't want to have to save the PDF to a file first.

Kind Regards,

Allan Ebdrup,

Software Architect,

OFiR
 
Hi Allan,

I found a good open-source project on the CodeProject website. It is a
solution written in C#. See the following link for more information.

http://www.codeproject.com/useritems/PDFToText.asp

Hope this helps.

Note: Since this is a third-party solution, Microsoft isn't responsible for
its stability or security.

Sincerely,
Linda Liu
Microsoft Online Community Support

 
Linda Liu said:
Hi Allan,

I found a good open-source project on the CodeProject website. It is a
solution written in C#. See the following link for more information.

http://www.codeproject.com/useritems/PDFToText.asp

I downloaded it and modified it so that it works on in-memory PDF files.
But I just discovered that what I really need is pdf2html. Is there also an
open-source class library that can do the PDF -> HTML conversion?
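
For anyone reading later, the in-memory change mentioned above might look
roughly like the sketch below. This is not the actual modification made to
the CodeProject code. The class name PdfTextHelper and the
extractFromStream delegate are hypothetical stand-ins for whatever parser
routine reads the PDF, assuming that routine can be made to accept a Stream
instead of a file path.

```csharp
using System;
using System.IO;

public static class PdfTextHelper
{
    // Hypothetical adapter: wraps the byte array in a read-only MemoryStream
    // and hands it to a stream-based extraction routine, so nothing is
    // written to disk. extractFromStream is an assumption, not a real API.
    public static string Pdf2Text(byte[] pdfFile, Func<Stream, string> extractFromStream)
    {
        if (pdfFile == null) throw new ArgumentNullException(nameof(pdfFile));
        if (extractFromStream == null) throw new ArgumentNullException(nameof(extractFromStream));

        using (var stream = new MemoryStream(pdfFile, writable: false))
        {
            return extractFromStream(stream);
        }
    }
}
```

The point of the design is that a parser written against Stream rather
than FileStream works unchanged for files, byte arrays and network
responses.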

Kind Regards,
Allan Ebdrup
 
Hi Allan,

I am sorry to say that I couldn't find any open-source class library that
converts PDF to HTML.

The source of an HTML file is made up of HTML tags, such as <html>, <head>,
<body> and so on. Since you already know how to extract the text from a PDF
file, you could write that text into an HTML file yourself, wrapped in the
appropriate HTML tags.
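
As a minimal sketch of that idea, the code below wraps already-extracted
text in a bare HTML skeleton. The class and method names (HtmlWrapper,
TextToHtml) are made up for illustration, and it assumes that paragraphs in
the extracted text are separated by blank lines. It uses
System.Net.WebUtility.HtmlEncode, which requires .NET 4.0 or later; on
older frameworks System.Web.HttpUtility.HtmlEncode plays the same role.
Note that this keeps only the text: layout, fonts and images from the PDF
are not reproduced.

```csharp
using System;
using System.Net;
using System.Text;

public static class HtmlWrapper
{
    // Hypothetical helper: turns plain text into a minimal HTML document.
    public static string TextToHtml(string text, string title)
    {
        var sb = new StringBuilder();
        sb.Append("<html><head><title>")
          .Append(WebUtility.HtmlEncode(title))
          .Append("</title></head><body>\n");

        // Normalize line endings, then treat blank lines as paragraph breaks.
        string normalized = text.Replace("\r\n", "\n");
        string[] paragraphs = normalized.Split(
            new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string paragraph in paragraphs)
        {
            // Encode the text so characters like < and & do not break the markup.
            sb.Append("<p>")
              .Append(WebUtility.HtmlEncode(paragraph.Trim()))
              .Append("</p>\n");
        }

        sb.Append("</body></html>");
        return sb.ToString();
    }
}
```

Calling HtmlWrapper.TextToHtml(extractedText, "My document") returns the
HTML as a string, so it can stay in memory just like the PDF input.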

Hope this helps.


Sincerely,
Linda Liu
Microsoft Online Community Support
 