Strip or Convert HTML

  • Thread starter Thread starter Djmask
  • Start date Start date
D

Djmask

I need to convert or remove html tags in excel fields. I found a
suggestion here to copy the data into notepad and save as .html. This
works great however is not practical as I numerous files. Could this
possible done with VBA? Thanks for any help. I have been searching the
internet for days now trying to figure this out.

scott
 
This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
Hi Scott,
after running Bob's macro you might also want to run
(or incorporate macro into Bob's) the TRIMALL macro
http://www.mvps.org/dmcritchie/excel/join.htm#trimall

and also include replacements for &amp as &
and &nbsp; as a space

Curious how you ended up with HTML code within Excel.


Bob Phillips said:
This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
Hi Scott,
after running Bob's macro you might also want to run
(or incorporate macro into Bob's) the TRIMALL macro
http://www.mvps.org/dmcritchie/excel/join.htm#trimall

to see if I missed something but

actually you might want to include this within Bob's
remember VBA is case sensitive
Selection.Replace What:=Chr(160), Replacement:=Chr(32), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False Selection.Replace What:="&nbsp;"), Replacement:=Chr(32), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Selection.Replace What:="&amp;"), Replacement:="&", _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False
Selection.Replace What:="<br>", Replacement:=Chr(10), _
LookAt:=xlPart, SearchOrder:=xlByRows, MatchCase:=False


Curious how you ended up with HTML code within Excel.


Bob Phillips said:
This might help.

Run it a few times to pick up any strays

Dim cell As Range
Dim ipos As Long
Dim iEnd As Long

For Each cell In Selection
ipos = InStr(1, cell.Value, "<")
If ipos > 0 Then
For iEnd = ipos + 1 To Len(cell.Value)
If Mid(cell.Value, iEnd, 1) = ">" Then
Exit For
End If
Next iEnd
If ipos > 1 Then
cell.Value = Left(cell.Value, ipos - 1) & Right(cell.Value,
Len(cell.Value) - iEnd)
Else
cell.Value = Right(cell.Value, Len(cell.Value) - iEnd)
End If
End If
Next cell
 
Thanks for the help guys. That worked unfortunelty it didn't get me the result I needed. We are uploading an exported product list from another website. The text they gave us has HTML Tags in it however the text field we are uploading to on our website doesn't support HTML tags. Its a big pain and the people giving us the file aren't to helpful nor are the people who manage our webiste. Thanks for all the help though. I learned some stuff about VBA! Looks like i just have to find replace in notepad. The biggest problem is that there are bullets in the HTML code and no punctuation for those sentences. So when I remove the tags the text becomes unreadable.


Scott
 
Hi Scott,
Perhaps you could post a small sample of say 5 rows x 25 characters
of what you started with in the way of source HTML. What you see in
Excel and what you wanted in Excel.

If you were dealing strictly with source HTML and not Excel the following
would fail if you tried to strip everything between < > on a line by line
basis.
see <a href="#abc" title="def ghi
jkl mno"> fails since < and > are not matched on same line, you get bad results.

In the original question wouldn't you save as text.

If you want to extract some wording from HTML document from a
browser, You could use PureText to remove all formatting and just
have text -- same result of pasting to notepad, and then copying from
notepad.

PureText Home Page, strips all formatting out of the clipboard so you can copy from and HTML display and paste plain text. Invoke
with Window+V shortcut or with [PT] on taskbar next to clock.
http://stevemiller.net/puretext/

If you want do edit the HTML source HTML-Kit with Tidy provides a means of
stripping out unwanted HTML if you get into a bit more than I have. .
 
Thanks for your help david. Unfortunelty I am not dealing with the original HTML. I'm dealing with an export from an asp table that has HTML in it. So I recieved an excel doument with HTML and non HTML text in different fields.
Ohhword.jpg


"<ul>Sleek, ultra lightweight construction is coupled with spirited athletic styling to create the Quark Rocket. Inside you will find the Quark Duo-Density footbed which provides support at the arch of the foot where you need it most, especially when you find yourself on your feet for extended periods of time. The footbed is also made of EVA material which actually molds to your foot for an exceptional fit and overall comfort. The inner sock lining is moisture-wicking Dryz™ material to keep your feet dry and free from uncomfortable dampness, no matter the activity or the how high the temperature. <br><ul><li>Approximately 1 inch</ul>"

The Html is all pretty redundant. So I actually just modified the VBA code you and Bob gave to search and replace certain html tags. I would have originally just done that with find and replace but there are so many records and characters that F/R would error out. I've tried lots of different HTML strippers and converters. The problem is none of them seem to be friendly with excel so once I convert the text the way I want it it screws up my fields. Puretext looks pretty cool though I will keep that utility for future use. But I'm all set now. I learned a bunch about VBA today. And I'm truly grateful for your help.

Scott.
 
Last edited:
Back
Top