Guest
Hi there.
I have written a function that takes an HTML file and removes all the tags
from it, leaving me with the text I want. Eventually I want to store these
strings in a table.
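The post doesn't show the function or say which language it is written in. Here is a rough sketch, assuming Python and its standard-library `html.parser` module. The tag stripper collects the text between tags and emits a newline for each `<br>` and for the closing tag of a few block elements. The class `TextExtractor`, the helper `strip_tags`, and the set of line-breaking tags are hypothetical choices, not the author's code.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, turning <br> and some closing tags into newlines."""

    # Hypothetical choice: closing tags that should end a line of output.
    LINE_BREAK_TAGS = {"p", "div", "h1", "h2", "h3"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br> (and <br/>, which also triggers this method) becomes a newline.
        if tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.LINE_BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(html):
    """Hypothetical helper: return the text of `html` with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Like the function described in the post, this sketch leaves blank lines wherever markup ended a line, so a separate clean-up step is still needed.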
Say this is the HTML from the webpage:
<BODY BGCOLOR="#FFF500" TEXT="#000000" style="background-color: #FFF500;
margin: 10px;">
<h3 class="red">LAST FIVE SONGS</h3>
<p style="color: #000000;">
CASSIE – ME & YOU<br>MARY J BLIGE – BE WITHOUT YOU<br>TARAS – I WILL LOVE
AGAIN (MAPL)<br>NICKELBACK – SAVIN ME (MAPL)<br>NATASHA BEDINGFIELD –
UNWRITTEN<br></p>
So far, it turns out like this:
CASSIE - ME AND YOU
MARY J BLIGE - BE WITHOUT YOU
TARAS - I WILL LOVE AGAIN (MAPL)
NICKELBACK - SAVIN ME (MAPL)
NATASHA BEDINGFIELD - UNWRITTEN
As you can see, for every line of HTML that is removed, a blank line is
inserted (so that the text stays separated, as above).
I now need to remove ALL blank lines.
Leaving me with...
CASSIE - ME AND YOU
MARY J BLIGE - BE WITHOUT YOU
TARAS - I WILL LOVE AGAIN (MAPL)
NICKELBACK - SAVIN ME (MAPL)
NATASHA BEDINGFIELD - UNWRITTEN
and no blank lines.
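One way to drop the blank lines while keeping the result as a single string is a multiline regular expression. Below is a minimal sketch, again assuming Python. `remove_blank_lines` is a hypothetical name, and the pattern treats a line holding only spaces or tabs as blank.

```python
import re

# Matches a whole line that is empty or holds only spaces/tabs, together
# with its line ending (\n or \r\n). re.MULTILINE makes ^ match at the
# start of every line, not just at the start of the string.
BLANK_LINE = re.compile(r"^[ \t]*\r?\n", re.MULTILINE)


def remove_blank_lines(text):
    """Return `text` with every empty or whitespace-only line removed."""
    return BLANK_LINE.sub("", text)
```

For example, `remove_blank_lines("A\n\n  \nB\n")` gives `"A\nB\n"`.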
If removing the blank lines is too time-consuming, I could just put each line
into an array and then store only the elements that have text in them.
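The array idea works just as well and suits storing one song per table row. Below is a minimal sketch in Python. `non_blank_lines` is a hypothetical helper, and stripping surrounding whitespace from each kept line is an added assumption.

```python
def non_blank_lines(text):
    """Split `text` into lines and keep only the ones that contain text.

    Each kept line is stripped of surrounding whitespace, so every element
    is ready to be stored as its own row.
    """
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:  # skip empty and whitespace-only lines
            lines.append(stripped)
    return lines
```

Each element of the returned list can then be inserted as its own row. How that insert is done depends on the database, which the post doesn't specify.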
Thanks.
-State