With a Function, how can I get rid of formating codes like <div> ?

R

Ron Rosenfeld

Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

Hit send too fast.

A bit of research reveals that your codes are designated by being between an
"&" and a ";". So we just need to remove those substrings. Again, easily done
with a slight change to the pattern:

re.Pattern = "<[^<>]+>|[\r\n]|&[^;]+;"

--ron
 
A

AFSSkier

Ron,

The amp and nbsp are HTML amper codes. How do you remove them?

Original Data (my comments):
<div><font face=Arial size=2 color="#006666">THE (5/31)6-3 FOR VERSION 2 & 3
ARE READY FOR ON & amp ;(<=no spaces after amper) OFF REPORTS </font></div>
<div> & nbsp ;(<=no spaces after amper) </div>

UDF Result (it's only stripping the none printable characters):
THE (5/31)6-3 FOR VERSION 2 amp; 3 ARE READY FOR ON amp; OFF REPORTS nbsp;

Result needed:
THE (5/31)6-3 FOR VERSION 2 and 3 ARE READY FOR ON and OFF REPORTS.

--
Thanks, Kevin


Ron Rosenfeld said:
Ron,
Your UDF code works great, however there are still additional formatting
codes like (hard return), and &

That's just a matter of adding those codes to the Pattern.

A code like nbsp has to be added as the hexadecimal ascii code for that which
is A0.

\r and \n are the codes for <CR> and <LF>

and the & stands alone -- but I don't know how to tell the difference between
an ampersand used as part of a text string, and one being used as a formatting
code.

But try this for the pattern line in the UDF.

======================
re.Pattern = "<[^<>]+>|[&\xA0\r\n]"
======================


Or, all together:

==============================
Option Explicit
Function StripFormat(S As String) As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>|[&\xA0\r\n]"
StripFormat = re.Replace(S, "")
End Function
=================================
--ron
 
R

Ron Rosenfeld

Ron,

The amp and nbsp are HTML amper codes. How do you remove them?

Original Data (my comments):
<div><font face=Arial size=2 color="#006666">THE (5/31)6-3 FOR VERSION 2 & 3
ARE READY FOR ON & amp ;(<=no spaces after amper) OFF REPORTS </font></div>
<div> & nbsp ;(<=no spaces after amper) </div>

UDF Result (it's only stripping the none printable characters):
THE (5/31)6-3 FOR VERSION 2 amp; 3 ARE READY FOR ON amp; OFF REPORTS nbsp;

Result needed:
THE (5/31)6-3 FOR VERSION 2 and 3 ARE READY FOR ON and OFF REPORTS.

Slight change to my last Pattern after seeing this example and your results:

re.Pattern = "<[^<>]+>|[\r\n]|&[^&;]+;"

--ron
 
A

AFSSkier

Ron,

You're getting closer. However, the & HTML codes need to be replaced, not
stripped.

&(amper)nbsp = " " (space)
&(amper)amp = & (and)

--
Thanks, Kevin


Ron Rosenfeld said:
Ron,

The amp and nbsp are HTML amper codes. How do you remove them?

Original Data (my comments):
<div><font face=Arial size=2 color="#006666">THE (5/31)6-3 FOR VERSION 2 & 3
ARE READY FOR ON & amp ;(<=no spaces after amper) OFF REPORTS </font></div>
<div> & nbsp ;(<=no spaces after amper) </div>

UDF Result (it's only stripping the none printable characters):
THE (5/31)6-3 FOR VERSION 2 amp; 3 ARE READY FOR ON amp; OFF REPORTS nbsp;

Result needed:
THE (5/31)6-3 FOR VERSION 2 and 3 ARE READY FOR ON and OFF REPORTS.

Slight change to my last Pattern after seeing this example and your results:

re.Pattern = "<[^<>]+>|[\r\n]|&[^&;]+;"

--ron
 
R

Ron Rosenfeld

Ron,

You're getting closer. However, the & HTML codes need to be replaced, not
stripped.

&(amper)nbsp = " " (space)
&(amper)amp = & (and)

Kevin,

That can be done. I hope you have a typo in your information, though, because
you have substituted the word "and" for a "text" ampersand in your original
data. In certain types of coding, the "text" ampersand would be an incorrect
usage.

Also, your comments interspersed with the text string to be processed makes
things a bit more confusing to me than might ordinarily have been the case, so
if this doesn't work, you may need to provide more accurate input and desired
outputs, with comments outside of the string to be processed.

In any event, one way is to first process the string to get rid of the <html
codes> and all of the & HTML codes except for &amp and &nbsp.

Then we do replacements on those two codes.

So:

===========================
Option Explicit
Function StripFormat(S As String) As String
Dim sTemp As String
Dim re As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>|[\r\n]|(&\s*(amp|nbsp)\s*;)|&[^;&]+;"
sTemp = re.Replace(S, "$1")
re.Pattern = "&\s*amp\s*;"
sTemp = re.Replace(sTemp, "&")
re.Pattern = "&\s*nbsp\s*;"
StripFormat = re.Replace(sTemp, " ")
End Function
===========================

In your examples, you had <space>'s included within the ampersand HTML coding.
If you can be certain that there will never be any spaces in those locations,
the UDF could be made a bit more efficient, by using the VBA replace method
instead of the Regular Expression replace method.
--ron
 
J

Jacob Skaria

Dear "AFSSkier"

If you are looking for a UDF try the below which do not use any patterns...
I have tried with the examples you posted and is working...Try and feedback.

Function GetData(varRange)
Dim intTemp As Integer
If varRange = "" Then GetData = "": Exit Function
Do
intTemp = InStr(intTemp + 1, varRange, ">")
If Mid(varRange, intTemp, 2) <> "><" And _
Mid(varRange, intTemp, 3) <> "> <" Then
GetData = Mid(varRange, intTemp + 1, _
InStr(intTemp, varRange, "</") - intTemp - 1)
End If
Loop Until GetData <> ""
End Function


If this post helps click Yes
 
R

Ron Rosenfeld

Ron,

You're getting closer. However, the & HTML codes need to be replaced, not
stripped.

&(amper)nbsp = " " (space)
&(amper)amp = & (and)

Kevin,

Also, there are, according to one source, 53 ampersand codes. If you want to
be able to replace all of them, we will need to set up an array, or lookup
table, to do so. Not difficult, just tedious.
--ron
 
R

Ron Rosenfeld

Kevin,

Also, there are, according to one source, 53 ampersand codes. If you want to
be able to replace all of them, we will need to set up an array, or lookup
table, to do so. Not difficult, just tedious.
--ron

Here's one method that first replaces all of the ampersand codes with their
ASCII equivalents, and then strips out the <...> codes.

It assumes that the ampersand codes do NOT have any included spaces. (If they
do, different coding will be required).

If you copy this, be aware that the lines making up the array should all be on
one single line. Your newsreader (or mine) may insert inadvertent line breaks
that you will need to remove to make it work correctly.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim AmpCodes, AmpChars
Dim i As Long
Dim sTemp As String
Dim re As Object

AmpCodes = Array("&quot;", "&amp;", "&lt;", "&gt;", "&nbsp;", "&iexcl;",
"&cent;", "&pound;", "&curren;", "&yen;", "&brvbar;", "&sect;", "&uml;",
"&copy;", "&ordf;", "&laquo;", "&not;", "&shy;", "&reg;", "&macr;", "&deg;",
"&plusmn;", "&sup2", "&sup3;", "&acute;", "&micro;", "&para;", "&middot;",
"&cedil;", "&sup1;", "&ordm;", "&raquo;", "&frac14;", "&frac12;", "&frac34;",
"&iquest;", "&times;", "&divide;", "&ETH;", "&eth;", "&THORN;", "&thorn;",
"&AElig;", "&aelig;", "&OElig;", "&oelig;", "&Aring;", "&Oslash;", "&Ccedil;",
"&ccedil;", "&szlig;", "&Ntilde;", "&ntilde;")

AmpChars = Array("""", "&", "<", ">", " ", "¡", "¢", "£", "¤", "¥", "¦", "§",
"¨", "©", "ª", "«", "¬", "­", "®", "¯", "°", "±", "²", "³", "´", "µ", "¶", "·",
"¸", "¹", "º", "»", "¼", "½", "¾", "¿", "×", "÷", "Ð", "ð", "Þ", "þ", "Æ", "æ",
"Œ", "œ", "Å", "Ø", "Ç", "ç", "ß", "Ñ", "ñ")

'Replace HTML Ampersand Codes
sTemp = S
For i = 0 To UBound(AmpCodes)
sTemp = Replace(sTemp, AmpCodes(i), AmpChars(i))
Next i

'Strip out <> codes; CR and LF
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>|[\r\n]"
StripFormat = re.Replace(sTemp, "")
End Function
=====================================

--ron
 
A

AFSSkier

Ron,

From your latest UDF works awesome! Thank you very much for your time a
efforts.

Kevin


Ron Rosenfeld said:
Kevin,

Also, there are, according to one source, 53 ampersand codes. If you want to
be able to replace all of them, we will need to set up an array, or lookup
table, to do so. Not difficult, just tedious.
--ron

Here's one method that first replaces all of the ampersand codes with their
ASCII equivalents, and then strips out the <...> codes.

It assumes that the ampersand codes do NOT have any included spaces. (If they
do, different coding will be required).

If you copy this, be aware that the lines making up the array should all be on
one single line. Your newsreader (or mine) may insert inadvertent line breaks
that you will need to remove to make it work correctly.

=================================
Option Explicit
Function StripFormat(S As String) As String
Dim AmpCodes, AmpChars
Dim i As Long
Dim sTemp As String
Dim re As Object

AmpCodes = Array(""", "&", "<", ">", " ", "¡",
"¢", "£", "¤", "¥", "¦", "§", "¨",
"©", "ª", "«", "¬", "­", "®", "¯", "°",
"±", "&sup2", "³", "´", "µ", "¶", "·",
"¸", "¹", "º", "»", "¼", "½", "¾",
"¿", "×", "÷", "Ã", "ð", "Þ", "þ",
"Æ", "æ", "Œ", "œ", "Å", "Ø", "Ç",
"ç", "ß", "Ñ", "ñ")

AmpChars = Array("""", "&", "<", ">", " ", "¡", "¢", "£", "¤", "¥", "¦", "§",
"¨", "©", "ª", "«", "¬", "­", "®", "¯", "°", "±", "²", "³", "´", "µ", "¶", "·",
"¸", "¹", "º", "»", "¼", "½", "¾", "¿", "×", "÷", "Ã", "ð", "Þ", "þ", "Æ", "æ",
"Œ", "œ", "Å", "Ø", "Ç", "ç", "ß", "Ñ", "ñ")

'Replace HTML Ampersand Codes
sTemp = S
For i = 0 To UBound(AmpCodes)
sTemp = Replace(sTemp, AmpCodes(i), AmpChars(i))
Next i

'Strip out <> codes; CR and LF
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.Pattern = "<[^<>]+>|[\r\n]"
StripFormat = re.Replace(sTemp, "")
End Function
=====================================

--ron
 
A

AFSSkier

Jacob,

You're getting closer. However, the & HTML codes need to be replaced, not
stripped.

&(amper)nbsp = " " (space)
&(amper)amp = & (and)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top