Convert HTML table to CSV?

  • Thread starter: Steve Ericson

Steve Ericson

Perhaps a silly question and I don't know if it can be done. Is there
freeware to convert HTML tables to CSV (or spreadsheet) files?

Steve Ericson
 
Steve said:
Perhaps a silly question and I don't know if it can be done. Is there
freeware to convert HTML tables to CSV (or spreadsheet) files?

http://www.pricelesswarehome.org/acf/P_TEXT.php#2.03Convert:HTMLToTextAndTables

HTML2Table; HTMStrip

Program: HTML2Table
Author: (Stefan Pettersson)
Ware: (Donationware)
http://www.stefan-pettersson.nu/

Program: HTMStrip
Author: Wayne Software (Bruce Guthrie)
Ware: (Freeware)
http://users.erols.com/waynesof/bruce.htm


Susan
--
Posted to alt.comp.freeware
Search alt.comp.freeware (or read it online):
http://www.google.com/advanced_group_search?q=+group:alt.comp.freeware
Pricelessware & ACF: http://www.pricelesswarehome.org
Pricelessware: http://www.pricelessware.org (not maintained)
 
Steve said:
Doesn't run on my system, gets stuck during the attempted conversion.

Can't get it to do what I want, that is producing a table which I can
import into a simple database (CSV) or spreadsheet.

Thanks for the feedback. You might try:

Program: HTML2TXT
Author: BobSoft.com (Yang Bo)
Ware: (Liteware)
http://www.bobsoft.com/h2t/

Susan
 
Yeah. I tried it, too. Sux.

Steve Ericson said:
Thanks, Susan. I tried it, but the output file was html as well.
Besides, the interface is hopeless, with a disappearing menu.
 
Steve Ericson said:
Thanks, Susan. I tried it, but the output file was html as well.
Besides, the interface is hopeless, with a disappearing menu.

Steve

If you have Cygwin for Windows (it's free), you can write a shell script
that will do what you want. Here's a first shot at such a script. It does a
pretty good job. I prefer tab-delimited files over comma-delimited, so
that's what this does.

It runs pretty slowly, but WTF?

When cutting and pasting the code, take care NOT to create any blanks at
the end of any line of code (especially after the \ character), or it won't
work.

--- cut here ---
#!/bin/ksh

# table2text
#
# usage: table2text file-name
#
# Eliminates (almost) all HTML garbage from a file,
# keeping only the text contents of a <table>.
#
# It's not (yet) smart enough to eliminate the text
# of embedded style-sheets.

case $# in
1) file="$1";;
*) exit 1;;
esac

cat "$file" | \
sed -e "s,\~,,g" \
-e "s,<[Tt][Aa][Bb][Ll][Ee][^>]*>,\~TABLE\~,g" \
-e "s,</[Tt][Aa][Bb][Ll][Ee][^>]*>,\~/TABLE\~,g" \
-e "s,<[Tt][Rr][^>]*>,\~TR\~,g" \
-e "s,</[Tt][Rr][^>]*>,\~/TR\~,g" \
-e "s,<[Tt][Dd][^>]*>,\~TD\~,g" \
-e "s,</[Tt][Dd][^>]*>,\~/TD\~,g" \
-e "s,</*[^>]*>,,g" \
-e "s,\&nbsp;, ,g" \
-e "s, *$,," \
-e "s,^ *,," \
-e "s,[ ][ ]*, ,g" | \
grep -v "^$" | \
while read instrg
do
outstrg=`echo "$instrg" | \
sed -e "s,\~/TABLE\~,,g" \
-e "s,\~/TR\~,\~,g" \
-e "s,\~/TD\~,$(printf '\t'),g" \
-e "s,\~TABLE\~,\~,g" \
-e "s,\~TR\~,,g" \
-e "s,\~TD\~,,g"`
printf '%s' "$outstrg"
done | \
tr "~" "\012"

exit 0
--- cut here ---
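
Since the original question was about CSV rather than tab-delimited output, here is a rough sketch of a filter that could be piped after the script above. The name tsv2csv is made up for illustration and is not part of the posted script. It assumes each row arrives as one line of tab-separated cells, and that a trailing tab (which the script above leaves after the last cell of a row) should be dropped. Cells containing a comma or a double quote are wrapped in double quotes, with embedded quotes doubled, which is the convention most spreadsheets accept.

```shell
#!/bin/sh
# tsv2csv (hypothetical helper): read tab-delimited text on stdin,
# write comma-separated text on stdout.
tsv2csv() {
    awk -F '\t' '
    {
        # Drop one trailing tab so the row does not end in an empty cell.
        sub(/\t$/, "")
        line = ""
        for (i = 1; i <= NF; i++) {
            field = $i
            # Quote only cells that contain a comma or a double quote,
            # doubling any embedded double quotes.
            if (field ~ /[",]/) {
                gsub(/"/, "\"\"", field)
                field = "\"" field "\""
            }
            line = line (i > 1 ? "," : "") field
        }
        print line
    }'
}
```

With the function defined in the same shell, something like `table2text page.html | tsv2csv > page.csv` should give a file a spreadsheet can import (page.html and page.csv are placeholder names). Cells that span several lines in the HTML are not handled specially.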
 
Select table in IE (maybe other browsers too) and in
Open Office Calc 2.0 (maybe earlier versions too) click
on upper left cell and use Edit | Paste Special |
HTML Format | OK to paste it into the spreadsheet.
Now save it as a CSV file.

(In Excel it's even easier since you can paste it into
Excel using just ctrl-V).
 
In the previous instructions I neglected to explicitly
say that you need to copy it to the clipboard although
it was probably obvious. Just in case here are
revised instructions:

Select table in IE (maybe other browsers too) and press
ctrl-C to copy the selection to the Windows clipboard.
In Open Office Calc 2.0 (maybe earlier versions too)
click on upper left cell and use Edit | Paste Special |
HTML Format | OK to paste clipboard into the spreadsheet.
Now save it as a CSV file.

(In Excel it's even easier since you can paste it into
Excel using just ctrl-V).
 
Select table in IE (maybe other browsers too) and press
ctrl-C to copy the selection to the Windows clipboard.
In Open Office Calc 2.0 (maybe earlier versions too)
click on upper left cell and use Edit | Paste Special |
HTML Format | OK to paste clipboard into the spreadsheet.
Now save it as a CSV file.

That works with OO 1.4 also. Thanks.

Steve
 