Strip HTML

Guest

Hi there.

I am currently working on a database/program that will take a certain HTML
page and store the strings on the page in a table, which will then be used
in reports/queries.

I have started by stripping the HTML tags off the page. The function
works very well.
But my problem is that I am not sure how to prevent the function
from removing "<br>".

Here is the function:

Function stripHTML(strHTML)
    'Ensure that strHTML contains something
    If Len(strHTML) = 0 Then
        stripHTML = strHTML
        Exit Function
    End If

    Dim arysplit, i, j, strOutput

    arysplit = Split(strHTML, "<")

    'Assuming strHTML is nonempty, we want to start iterating
    'from the 2nd array position
    If Len(arysplit(0)) > 0 Then j = 1 Else j = 0

    'Loop through each instance of the array
    For i = j To UBound(arysplit)
        'Do we find a matching > sign?
        If InStr(arysplit(i), ">") Then
            'If so, snip out all the text between the start of the string
            'and the > sign
            'IF statement to NOT remove <br> tags.

            arysplit(i) = Mid(arysplit(i), InStr(arysplit(i), ">") + 1)
        Else
            'Ah, the < was nonmatching
            arysplit(i) = "<" & arysplit(i)
        End If
    Next

    'Rejoin the array into a single string
    strOutput = Join(arysplit, "")

    'Snip out the first <
    strOutput = Mid(strOutput, 2 - j)

    'Convert < and > to < and >
    strOutput = Replace(strOutput, ">", ">")
    strOutput = Replace(strOutput, "<", "<")
    strOutput = Replace(strOutput, "–", "<")
    stripHTML = strOutput
End Function


Thanks.
-State
 
Douglas J Steele

If all you want to do is remove the <BR>, you can use

strOutput = Replace(strOutput, "<BR>", " ")

BTW, why are you using Replace to change < to < and > to >? That does
absolutely nothing, other than taking machine cycles!
 
Guest

Hey.
I want to remove everything BUT <br>.

And I was replacing < with < just to test out ".< and .>"

Thanks.
 
Douglas J Steele

Ah. Sorry, I misread.

One approach is to change the <BR> to something that you know isn't going to
occur naturally (say, xxyyxx), do the rest of your changes, then change
xxyyxx back to <BR>.
 
Guest

No problem.
Since this will all be an automatic process, there can't be any manual
replacing.

But I did get it to work.

I changed:
strOutput = Join(arysplit, "")

to...
strOutput = Join(arysplit, vbCrLf)

Thanks.
-State
 
Douglas J Steele

I didn't say anything about manual replacing.

Change your line of code:

arysplit = Split(strHTML, "<")

to

arysplit = Split(Replace(strHTML, "<BR>", "XXYYXX"), "<")

Change your line of code:

stripHTML = strOutput

to

stripHTML = Replace(strOutput, "XXYYXX", "<BR>")
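
Putting the two changes together, the placeholder approach might look something like the sketch below. It is only an illustration: the names StripHTMLKeepBR and BR_TOKEN are made up for this example and are not from the posts above, and the loop is simplified to always start at element 1, which should give the same result as the j logic in the original. It also assumes the text never contains XXYYXX naturally. Passing vbTextCompare to Replace is an addition so that <br>, <BR> and <Br> are all protected; as far as I know, Replace does a binary (case-sensitive) comparison when that argument is omitted.

```vb
Option Explicit

' Hypothetical placeholder; assumed never to occur in the page text
Private Const BR_TOKEN As String = "XXYYXX"

' Sketch: strips every tag except <br>, using the swap-and-restore idea
Public Function StripHTMLKeepBR(ByVal strHTML As String) As String
    Dim arySplit() As String
    Dim i As Long
    Dim lngPos As Long
    Dim strWork As String

    ' Nothing to do for an empty string (function returns "")
    If Len(strHTML) = 0 Then Exit Function

    ' Protect the line breaks; vbTextCompare ignores case
    strWork = Replace(strHTML, "<br>", BR_TOKEN, , , vbTextCompare)

    arySplit = Split(strWork, "<")

    ' Element 0 is the text before the first "<", so leave it alone
    For i = 1 To UBound(arySplit)
        lngPos = InStr(arySplit(i), ">")
        If lngPos > 0 Then
            ' Drop the tag: keep only what follows the closing ">"
            arySplit(i) = Mid$(arySplit(i), lngPos + 1)
        Else
            ' Unmatched "<": put it back as literal text
            arySplit(i) = "<" & arySplit(i)
        End If
    Next i

    ' Rejoin, then turn the placeholder back into a real tag
    StripHTMLKeepBR = Replace(Join(arySplit, ""), BR_TOKEN, "<BR>")
End Function
```

For example, Debug.Print StripHTMLKeepBR("<p>Hello<br>World</p>") prints Hello<BR>World in the Immediate window. Note that every break comes back as upper-case <BR>, and variants such as <br /> are not protected by this sketch; they would be stripped like any other tag.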
 
