Taking "strip HTML" to the next level

Steve127

I'm really glad I found this - and glad you took the time to write the
code. Not being a 'coder', I was wondering if my request would be
possible to add into the script -

I have various spreadsheets totaling tens of thousands of cells that
contain HTML tags that I need to remove. This script
definitely does that and it helps me quite a bit...

My request is the following:

What I envision is when I double click on a cell with HTML tags the
script executes just like it does, but instead of having to manually
copy the text from within the user form, then click the 'command'
button, and CTRL-V back into the original cell....it would be cool if
those steps were automated.

In other words, the user double-clicks the cell and "bingo" the HTML
markup cell contents are replaced with the non-HTML content.

If you could select an entire column, put all that code in a
"For...Next" loop (I'm sure For...Next isn't correct), and have the
script run all the way down the column with one double-click - that
would *really* be cool...but if somebody could show me how to do the
simplest automation I'd greatly appreciate it....

Making my way through about 15,000 rows and 3 columns is going to be a
lot of double-clicks, CTRL-C, click, CTRL-V, Arrow
Down....repeat.....don't get me wrong though...I'm very appreciative
to have what you've already provided!
 
Ron Rosenfeld

Steve,

Since this is for readability, is it correct to assume that you'd want the
document "collapsed" after having been processed? Or do you just want to leave
blank lines?

For example,

==============================
Option Explicit
Sub StripHTML()
    Dim c As Range
    Dim re As Object

    'late-bound VBScript regular expression object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True    'replace every tag, not just the first
    'matches opening and closing tags such as <p>, </b>, <a href="...">
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    'not sure how you want to set the range to act on
    ' but this is quick and easy

    For Each c In Selection
        c.Value = re.Replace(c.Value, "")
    Next c
End Sub
=================================

This removes all the HTML tags in the Selection, except for comments and
document type (DOCTYPE) declarations.
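
The pattern above leaves HTML comments and DOCTYPE declarations in place. If those should go too, one possible extension is to put two extra alternatives in front of the tag pattern. This is only a sketch and has not been run against the data in question. The function name StripTags is made up for illustration. The lazy [\s\S]*? is there so that a comment spanning several lines inside one cell is matched only up to its first closing -->.

```vba
Option Explicit

' Hypothetical helper: returns s with comments, DOCTYPE declarations
' and ordinary tags removed. Text such as "a < b" is left alone because
' a tag must start with a letter after "<" or "</".
Function StripTags(ByVal s As String) As String
    Static re As Object   ' created once, reused on later calls

    If re Is Nothing Then
        Set re = CreateObject("vbscript.regexp")
        re.IgnoreCase = True
        re.Global = True
        ' 1) <!-- ... -->   2) <!DOCTYPE ...>   3) <tag ...> or </tag>
        re.Pattern = "<!--[\s\S]*?-->|<!doctype[^>]*>|</?[a-z][a-z0-9]*[^<>]*>"
    End If

    StripTags = re.Replace(s, "")
End Function
```

Inside the loop it would be called as c.Value = StripTags(c.Value). If the function lives in a standard module it can also be used as a worksheet formula, for example =StripTags(D3).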


If you also want to remove the blank lines, then perhaps:

=======================================
Option Explicit
Sub StripHTML()
    Dim c As Range
    Dim i As Long, lFirstRow As Long, lLastRow As Long, lColumn As Long
    Dim re As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Application.ScreenUpdating = False

    'not sure how you want to set the range to act on
    ' but using Selection is quick and easy
    For Each c In Selection
        c.Value = re.Replace(c.Value, "")
    Next c

    lFirstRow = Selection.Row
    lLastRow = Selection.Rows.Count + lFirstRow - 1
    lColumn = Selection.Column

    'work from the bottom up so deleting a cell does not shift
    ' the rows still to be checked
    'only the first column of Selection is checked and collapsed
    For i = lLastRow To lFirstRow Step -1
        If Application.WorksheetFunction.Trim(Cells(i, lColumn).Value) = "" Then
            Cells(i, lColumn).Delete shift:=xlUp
        End If
    Next i
    Application.ScreenUpdating = True
End Sub
=====================================
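
With tens of thousands of cells, reading and writing one cell at a time as in the loops above can be slow. A common alternative, sketched here under a few assumptions, is to pull each selected block into a Variant array, clean the strings in memory, and write the block back in one step. The name StripHTMLFast is made up for illustration. The sketch assumes the selection holds plain values, because writing the array back replaces any formulas with their current values, and it does not delete blank cells afterwards.

```vba
Option Explicit

Sub StripHTMLFast()
    Dim re As Object
    Dim target As Range, area As Range
    Dim data As Variant
    Dim r As Long, k As Long

    If TypeName(Selection) <> "Range" Then Exit Sub

    ' Limit the work to cells actually in use, so selecting a whole
    ' column does not load every empty row on the sheet.
    Set target = Intersect(Selection, ActiveSheet.UsedRange)
    If target Is Nothing Then Exit Sub

    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"

    Application.ScreenUpdating = False
    For Each area In target.Areas
        If area.Rows.Count = 1 And area.Columns.Count = 1 Then
            ' A single cell returns a scalar, not an array.
            If VarType(area.Value) = vbString Then
                area.Value = re.Replace(area.Value, "")
            End If
        Else
            data = area.Value               ' 2-D, 1-based Variant array
            For r = 1 To UBound(data, 1)
                For k = 1 To UBound(data, 2)
                    ' Only touch text; numbers, dates and errors pass through.
                    If VarType(data(r, k)) = vbString Then
                        data(r, k) = re.Replace(data(r, k), "")
                    End If
                Next k
            Next r
            area.Value = data               ' one write per block
        End If
    Next area
    Application.ScreenUpdating = True
End Sub
```

To try it, select the cells or whole columns to clean on a copy of the data and run StripHTMLFast from the Macros dialog.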

--ron
 
Steve127

Thank you Ron - I'll give both a try and let you know.

To clarify:

I'm working on an export from a MySQL table. The database is part of
a shopping cart system. I inherited the database from person(s) who
input the product data with a lot of deprecated and non-validating
HTML. I am trying to remove all those tags.

As an example:

Suppose column D cells contain 'product_desc' data which are the cells
that have the bad HTML. Using the script from the original poster,
you double click the cell (say D3). In the popup text box you see the
text that is in D3, except the HTML tags are gone. What I do then is
CTRL-A, then CTRL-C, click the command button, and paste back into
D3. That gives me what I'm looking for - same product description
without HTML tags and database/table integrity.

One table alone has over 15,000 rows and 3 fields (or columns) with
bad HTML so you can imagine the routine will take me a very long time
to finish.

There might be a way to do this same thing inside MySQL, but I'm less
proficient at it than I am at Excel! :) I can write basic data
queries, but writing something to remove HTML tags would be way over
my head.

Anyway, hope that gives some insight into my problem (roadblock
really).

BTW...I messed around with the original script and managed to get it
to auto-paste the 'good' text into the cell after clicking the command
button, but I still have to do CTRL-A & CTRL-C. I tried automating both
of those but kept running into runtime errors and so forth, and it
quickly got past my skill level.

Thank you
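
For the double-click automation asked about in this thread, a minimal sketch could use the worksheet's BeforeDoubleClick event so that the cell is rewritten in place with no user form, copy, or paste. It reuses the pattern from the StripHTML macros above. The helper name CleanHtml is made up for illustration, and skipping formula and non-text cells is an assumption about what is wanted.

```vba
' Sketch only: paste into the code module of the worksheet that holds
' the data (right-click the sheet tab, View Code), not a standard module.
Option Explicit

' Hypothetical helper: returns s with HTML tags removed.
Private Function CleanHtml(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("vbscript.regexp")
    re.IgnoreCase = True
    re.Global = True
    re.Pattern = "</?[a-z][a-z0-9]*[^<>]*>"
    CleanHtml = re.Replace(s, "")
End Function

Private Sub Worksheet_BeforeDoubleClick(ByVal Target As Range, Cancel As Boolean)
    ' Assumption: only single, non-formula text cells should be rewritten.
    If Target.Cells.Count <> 1 Then Exit Sub
    If Target.HasFormula Then Exit Sub
    If VarType(Target.Value) <> vbString Then Exit Sub

    Target.Value = CleanHtml(Target.Value)
    Cancel = True   ' skip Excel's normal in-cell edit mode
End Sub
```

With this in place, double-clicking D3 would replace its contents with the tag-free text. Because the change is made by a macro, Excel's Undo will not bring the original back, so it is worth trying on a copy of the export first.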
 
