HTML files into Excel

  • Thread starter Thread starter alfred
  • Start date Start date
A

alfred

Hi,

I have >1,000 html fairly short html files, which I want to extract
into a spreadsheet. The html were originally generated from a
template, but there are about 3 different types of template within the
1000 files. Is it possible to write a script that will extract all
the data from within the html files, and put everything between the
tags into a cell. I am not concerned about retaining the html tags,
but need to extract all the info into excel, so i can manipulate, and
regenerate the html....any hepl appreciated!

Thanks
 
HTML files are text files. You would read them like any textt files and then
apply any filtering you need. Here is a genral format of VBA code that reads
and writtes text data.

You can see the HTML text by opening the HTML with Notepad or in Internet
explorer sectect View Menu - Source. It is basically string manipulations
using Left(),MID,and Right() functions. I have used VBA to modify XML files.
 
Have you tried just opening one of the html files in Excel using File=>Open.

It should put the data in various cells.

If you want to remove any formatting, you can select all the cells, do
edit=>copy, go to a new sheet and do Edit=>Paste special and select values.

If this works, then you could write a macro to do it on all the files and
accumulate the results.
 
Tom: With Hyper-text files some tagged items are formatting and some are
actual text. Alfred is looking for data which I believe is the text and not
the formating. The task would be to identify each tagged field and only
extractt the tags that pertain to the text strings.
 
If I agreed, I wouldn't have posted.

Opening the files in excel should do exactly what I feel the OP wants to do.
Nonetheless, it is offered for his consideration to accept or reject.

I suspect he is still waiting for your promise:
Here is a genral format of VBA code that reads and writtes text data.

I hope it wasn't the simple paragraph that followed. Based on that, I
would have expected a generalized html parser that strips off all tags.
(which is essentially what I offered).
 
forgot to paste sample code. Tom just notified me of my mistake.

Sub ConvertCSV()

Const Sourcefile = "c:\temp\Origin.csv"
Const Destfile = "c:\temp\Destination.csv"
Const ForReading = 1
Const ForWriting = 2
Const ForAppending = 3

Set OriginCSV = _
CreateObject("Scripting.FileSystemObject")
Set FOrigin = _
OriginCSV.GetFile(Sourcefile)
Set FSOrigin = _
FOrigin.OpenAsTextStream _
(ForReading)


Set DestinationCSV = _
CreateObject("Scripting.FileSystemObject")
DestinationCSV.CreateTextFile Destfile
Set DestinationCSV = DestinationCSV. _
GetFile(Destfile)
Set FSDestination = DestinationCSV. _
OpenAsTextStream _
(ForWriting)



Do While FSOrigin.ATENDOFSTREAM = False


InputString = FSOrigin.readline


Else
'Loop until no more characters in line
First = True
Do While Len(InputString) > 0

'enter your code here

Loop

End If
Loop

FSOrigin.Close
FSDestination.Close

End Sub
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top