REGEX Parsing Problem

David Elliott

I am doing some pattern matching on HTML and am having a
problem with one case. The pattern stops working when it
encounters a "||". You can see this in the output below:
the text starting at "||" ends up in the data variable.
I would like that text to be part of the tag instead.

I am splitting the HTML into pairs of the form <tag>data

Example:

input:
<table><tr><td>hello</td></tr></table>

output:
tag = <table>
data =
tag = <tr>
data =
tag = <td>
data = hello
tag = </td>
data =
tag = </tr>
data =
tag = </table>
data =

Any help would be appreciated.
Dave


============================================

string htmlRegEx = @"(<[^\>|^\<]*[>]*)([^\<]*)";

rx = new Regex(htmlRegEx);
mc = rx.Matches(text);

foreach (Match m in mc)
{
    tag = m.Groups[1].Value.Trim();
    data = m.Groups[2].Value.Trim();
}

============================================

<SCRIPT language=JavaScript>
<!--
function MM_reloadPage(init) { //reloads the window if Nav4 resized
if (init==true) with (navigator) {if ((appName=="Netscape")&&(parseInt(appVersion)==4)) {
document.MM_pgW=innerWidth; document.MM_pgH=innerHeight; onresize=MM_reloadPage; }}
else if (innerWidth!=document.MM_pgW || innerHeight!=document.MM_pgH) location.reload();
}
MM_reloadPage(true);


// -->
</SCRIPT>
============================================


tag = <SCRIPT language=JavaScript>
data =

tag = <!-- function MM_reloadPage(init) { //reloads the
window if Nav4 resized if (init==true) with (navigator)
{if ((appName=="Netscape")&&(parseInt(appVersion)==4))
{document.MM_pgW=innerWidth; document.MM_pgH=innerHeight;
onresize=MM_reloadPage; }} else if (innerWidth!=document.MM_pgW

data = || innerHeight!=document.MM_pgH) location.reload();}
MM_reloadPage(true);// -->

tag = </SCRIPT>
data =
 
Brian Davis

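The problem is in your character class "[^\>|^\<]". A character class [...]
matches any single character listed between the brackets or, when it starts
with "^", any single character not listed. While the pipe character usually
signifies alternation in a Regex, it represents a literal pipe character
when used inside a character class, so your class refuses to match "|" and
the tag group stops there. The second "^" is likewise a literal caret; only
a "^" immediately after the opening bracket negates the class. Also, you
don't need to escape the ">" or "<". Your character class should read
"[^><]" instead, making the Regex look like this: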

(<[^><]*[>]*)([^<]*)
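
To see the difference, here is a minimal sketch that runs both patterns over the same text and collects the tag/data pairs the way the loop in the question does. The class name TagSplitter, the Split helper, and the shortened sample string are made up for illustration; only the two patterns come from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagSplitter
{
    // Pattern from the question: inside [...] the '|' and the second '^'
    // are literals, so the class also refuses to match '|' and '^'.
    public const string OriginalPattern = @"(<[^\>|^\<]*[>]*)([^\<]*)";

    // Pattern from the reply: the class excludes only '>' and '<'.
    public const string CorrectedPattern = @"(<[^><]*[>]*)([^<]*)";

    // Hypothetical helper: returns one (tag, data) pair per match,
    // trimmed the same way as in the question's foreach loop.
    public static List<KeyValuePair<string, string>> Split(string pattern, string text)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value.Trim(),
                m.Groups[2].Value.Trim()));
        }
        return pairs;
    }

    public static void Main()
    {
        // Shortened stand-in for the script block in the question.
        string text = "<!-- if (a!=b || c!=d) reload(); --></SCRIPT>";

        foreach (string pattern in new[] { OriginalPattern, CorrectedPattern })
        {
            Console.WriteLine("pattern: " + pattern);
            foreach (var pair in Split(pattern, text))
            {
                Console.WriteLine("tag = " + pair.Key);
                Console.WriteLine("data = " + pair.Value);
            }
        }
    }
}
```

With the original pattern the first tag ends just before "||" and the rest lands in data; with the corrected pattern the whole comment stays in the tag. Note that the corrected pattern still ends the tag at any "<" or ">" inside the script, for example a comparison such as a < b, so it only helps for input like the sample above.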


Brian Davis
http://www.knowdotnet.com



