With a Function, how can I get rid of formating codes like <div> ?

A

AFSSkier

I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)
 
R

Ron Rosenfeld

I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that might come
in with your import. But the following User Defined Function should strip out
most of the formatting codes if they are similar to those above. It works by
removing the <>'s and everything in between. If you might have <>'s within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
J

Jacob Skaria

Try this..

=MID(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"
",),">",">|"),">|<",""),FIND("|",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"
",),">",">|"),">|<",""))+1,FIND("<",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"
",),">",">|"),">|<",""),2)-FIND("|",SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"
",),">",">|"),">|<",""))-1)

If this post helps click Yes
 
R

Rick Rothstein

As a follow up to Ron's posting, here is a non-Regular Expression UDF for
you to consider...

Function StripFormat(S As String) As String
Dim X As Long
Dim Parts() As String
Parts = Split(S, "<")
For X = 1 To UBound(Parts)
If Not Parts(X) Like "*>" Then
StripFormat = Split(Parts(X), ">")(1)
Exit For
End If
Next
End Function
 
R

Ron Rosenfeld

I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)
^^^

Is that semi-colon a typo? I assumed it was in my response but, if not, what
are the rules for its insertion?
--ron
 
R

Rick Rothstein

I assumed the same thing in my response... it looks like the OP traded the
last space from two consecutive spaces for that semi-colon, which I assumed
was a typo.
 
R

Rick Rothstein

And I also assumed the spaces between some of the formats (between the > and
<) at the end were also typos (my code depends on this).
 
R

Rick Rothstein

The code I post in my earlier response assumes the spaces between the > and
< symbols that you show in parts of your sample text are typos. If they are
not typos, then you will need to use this code instead...

Function StripFormat(ByVal S As String) As String
Dim X As Long
Dim Parts() As String
Do While InStr(S, "> ")
S = Replace(S, "> ", "> ")
Loop
Parts = Split(Replace(S, "> <", "><"), "<")
For X = 1 To UBound(Parts)
If Not Parts(X) Like "*>" Then
StripFormat = Split(Parts(X), ">")(1)
Exit For
End If
Next
End Function

--
Rick (MVP - Excel)


Rick Rothstein said:
As a follow up to Ron's posting, here is a non-Regular Expression UDF for
you to consider...

Function StripFormat(S As String) As String
Dim X As Long
Dim Parts() As String
Parts = Split(S, "<")
For X = 1 To UBound(Parts)
If Not Parts(X) Like "*>" Then
StripFormat = Split(Parts(X), ">")(1)
Exit For
End If
Next
End Function
 
A

AFSSkier

Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

--
Thanks, Kevin


Ron Rosenfeld said:
I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that might come
in with your import. But the following User Defined Function should strip out
most of the formatting codes if they are similar to those above. It works by
removing the <>'s and everything in between. If you might have <>'s within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
A

AFSSkier

Jacob,
Your function works great, however there are still additional formatting
codes like (hard return), and &. Also, where the field is blank or has no
formatting codes, an error #VALUE! is the result.
 
A

AFSSkier

The additional codes are "(hard return), and &". These did not show on my
last request.
--
Thanks, Kevin


Ron Rosenfeld said:
I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that might come
in with your import. But the following User Defined Function should strip out
most of the formatting codes if they are similar to those above. It works by
removing the <>'s and everything in between. If you might have <>'s within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
A

AFSSkier

I'll try this one more time. The codes are and &.

" and &"
--
Thanks, Kevin


Ron Rosenfeld said:
I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that might come
in with your import. But the following User Defined Function should strip out
most of the formatting codes if they are similar to those above. It works by
removing the <>'s and everything in between. If you might have <>'s within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
A

AFSSkier

Sorry, this is frustrating to me too.

The codes must be html, are & n b s p; and & a m p; (without spaces).

"& n b s p ; and & a m p ;"
--
Thanks, Kevin


Ron Rosenfeld said:
I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that might come
in with your import. But the following User Defined Function should strip out
most of the formatting codes if they are similar to those above. It works by
removing the <>'s and everything in between. If you might have <>'s within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
R

Rick Rothstein

The hard return can be handled easily enough (at least in the function I
posted), but can you give an example of the & in use so that we can figure
out how to account for it?

--
Rick (MVP - Excel)


AFSSkier said:
I'll try this one more time. The codes are and &.

" and &"
--
Thanks, Kevin


Ron Rosenfeld said:
I have a refreshable spreadsheet with data imported from Access. When
the
data imports, I get Access formatting codes like <div>, &nbsp and font
codes
(see below). I doing want the end user to have to do a search & replace
a
blank, even with a macro. It only happens with memo/comment fields
coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell
to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)

I'm not sure of all the possible variations of formatting codes that
might come
in with your import. But the following User Defined Function should
strip out
most of the formatting codes if they are similar to those above. It
works by
removing the <>'s and everything in between. If you might have <>'s
within the
text portion, then a more complex algorithm would be required.

To enter this User Defined Function (UDF), <alt-F11> opens the Visual
Basic
Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this User Defined Function (UDF), enter a formula like

=StripFormat(A1)

in some cell.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>"
StripFormat = re.Replace(S, "")
End Function
================================

--ron
 
A

AFSSkier

The semi-colon is a typo from a html code not showing up here. The are & n b
s p; and & a m p; (without spaces).

"& n b s p ; and & a m p ;"
 
R

Rick Rothstein

Can you show a real example of these & items in use so I can see what
locations they have to be accounted for at.
 
G

Glenn

AFSSkier said:
I have a refreshable spreadsheet with data imported from Access. When the
data imports, I get Access formatting codes like <div>, &nbsp and font codes
(see below). I doing want the end user to have to do a search & replace a
blank, even with a macro. It only happens with memo/comment fields coming
from Access.

<div><font face=Arial size=2 color="#006666">PBA data. (ALL VERSIONS
INCLUDING MONTANA)</font></div> <div> </div>

In an adjacent cell I want to use a function(s) to format the new cell to
look like this => PBA data. ;(ALL VERSIONS INCLUDING MONTANA)


Try this:

=TRIM(RIGHT(SUBSTITUTE(LEFT(A1,FIND("</",A1)-1),">",REPT(" ",9999)),9999))

Makes the assumption that your actual data is all in one chunk and will always
start after a ">" and end before the first "</". If that's not always true, or
if there is ">" or "</" within your data, then you will need to pursue more
complex solutions.
 
A

AFSSkier

Glen,

It's not always same. They are random formating codes from random
comments/memos.

Here's an example of the most complexed, (my comments).

<div><font face=Arial size=2>KRAFT</div> (hard return)
<div>Report: Item Performance</div> (hard return)
<div>& n b s p ;(<=no spaces)</div> (hard return)
<div>rewrite table & a m p ;(<=no spaces) queries.</div>
 
R

Ron Rosenfeld

Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

That's just a matter of adding those codes to the Pattern.

A code like nbsp has to be added as the hexadecimal ascii code for that which
is A0.

\r and \n are the codes for <CR> and <LF>

and the & stands alone -- but I don't know how to tell the difference between
an ampersand used as part of a text string, and one being used as a formatting
code.

But try this for the pattern line in the UDF.

======================
re.Pattern = "<[^<>]+>|[&\xA0\r\n]"
======================


Or, all together:

==============================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>|[&\xA0\r\n]"
StripFormat = re.Replace(S, "")
End Function
=================================
--ron
 
R

Ron Rosenfeld

Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

Sorry, a little bit of research and I realized about the ampersand issue.

Try this instead:
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top