Extracting/Exporting HTML Tables or PDF Tables into Excel

  • Thread starter master.investor.2005

master.investor.2005

Hi All,

I have a quick programming/general Excel question about my current
dilemma. Essentially, I am trying to extract financial tables from SEC
filings (published as either PDF or HTML). Ideally, I would like to be
able to search an SEC filing for a specific table (e.g. a "Consolidated
Income Statement") and then have a macro that exports that table into
Excel without losing the formatting. If you have any idea how to go
about this and could provide some starter code, I would greatly
appreciate it.

Thanks

Mohammed

P.S. I know next to nothing about VB, so it would be very helpful if
you could explain which parameters I may need and what the code is
doing.
 

NickHK

Mohammed,
Reading the HTML files can be done with a web query. Look into
Data > Get External Data > New Web Query and select the table to import.
I have not looked into coding the PDF side, but getting data out of a
PDF and into Excel can be done manually:
Open the PDF in Acrobat, NOT the Reader.
Use the Select Table tool.
Right-click and export to or open in Excel, depending on your version of
Acrobat.
Alternatively, save the PDF as HTML and run a web query on that.
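
The web query described above can also be created from a macro. The following is only a minimal sketch, assuming a placeholder URL and assuming the table of interest is the first one on the page; neither comes from an actual filing. The WebFormatting setting is the part that asks Excel to keep the page's formatting.

```vb
Sub ImportTableByWebQuery()
    ' Assumptions for illustration: the URL is a placeholder and the
    ' table index "1" is a guess; adjust both for the actual filing.
    Dim qt As QueryTable

    Set qt = ActiveSheet.QueryTables.Add( _
        Connection:="URL;http://www.example.com/filing.htm", _
        Destination:=ActiveSheet.Range("A1"))

    With qt
        .WebSelectionType = xlSpecifiedTables   ' import chosen tables only
        .WebTables = "1"                        ' first table on the page
        .WebFormatting = xlWebFormattingAll     ' keep the page's formatting
        .Refresh BackgroundQuery:=False         ' wait until the data arrives
    End With
End Sub
```

Recording a macro while running the web query by hand is an easy way to find out which table index a particular page needs.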

NickHK
 

lam.charlton

For HTML to Excel, you might consider using the following script
extract:
---------------------------------------------------------------
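Sub ImportHtmlTables()
    ' Late-bound Internet Explorer automation; no library reference needed.
    Dim objIE As Object, theTables As Object
    Dim tbl As Object, tblRow As Object, tblCell As Object
    Dim ws As Worksheet
    Dim sURL As String
    Dim RowNum As Long, ColNum As Long

    sURL = "http://www.ibm.com"
    Set ws = ActiveSheet            ' sheet that receives the table text
    On Error GoTo error_handler

    Set objIE = CreateObject("InternetExplorer.Application")
    With objIE
        .Navigate sURL
        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE).
        Do While .Busy Or .ReadyState <> 4: DoEvents: Loop

        RowNum = 1
        Set theTables = .Document.all.tags("table")
        For Each tbl In theTables
            For Each tblRow In tbl.Rows
                ColNum = 1          ' restart at column A for every table row
                For Each tblCell In tblRow.Cells
                    ws.Cells(RowNum, ColNum).Value = tblCell.innerText
                    ColNum = ColNum + 1
                Next tblCell
                RowNum = RowNum + 1
            Next tblRow
        Next tbl
        .Quit                       ' close the hidden browser instance
    End With
    Set objIE = Nothing
    Exit Sub

error_handler:
    MsgBox "Import failed: " & Err.Description
    If Not objIE Is Nothing Then objIE.Quit
    Set objIE = Nothing
End Sub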
---------------------------------------------------------------
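
The script above copies every table on the page. To pick out just one named table, as the original question asks, a helper along these lines might work. It is only a sketch: FindTableByText is a hypothetical name, and it assumes the title text (e.g. "Consolidated Income Statement") appears inside the table itself, which may not be true for every filing. If tables are nested, the outermost match is returned first.

```vb
Function FindTableByText(doc As Object, searchText As String) As Object
    ' Hypothetical helper, not from the thread. Returns the first <table>
    ' whose text contains searchText (case-insensitive), or Nothing.
    Dim tbl As Object

    For Each tbl In doc.getElementsByTagName("table")
        If InStr(1, tbl.innerText, searchText, vbTextCompare) > 0 Then
            Set FindTableByText = tbl
            Exit Function
        End If
    Next tbl

    Set FindTableByText = Nothing
End Function
```

It would be called once the page has loaded, for example Set tbl = FindTableByText(objIE.Document, "Consolidated Income Statement"), after which only that table's Rows and Cells need to be looped over. Note that innerText carries the cell text only, so formatting is not preserved this way.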
For PDF to Excel, I could not find a direct tool, but you might
try PDF -> HTML -> Excel.

For PDF to HTML, you can use pdf2html, freely available on
sourceforge.net
 
