Help: About Regex

Guest

Here is a string
xxxxx<free:news keyword="gaconel/name" length="30"/><free:info
keyword="title" length="20" recordset="10">
I want to use a regex to extract the two substrings of the form <free: ... />,
and I wrote a regex to express the idea.
This is my regex:
@"<free:\w[\w\d]*\s*(!!!Not \/>!!!)+\/>"
Please note that (!!!Not \/>!!!) is only a placeholder: I mean that the middle
of the match must not include the two-character combination "/>".
Obviously [^\/>]+ is not correct, because a character class matches the string
one character at a time, so it rejects every "/" and every ">" on its own.
Can anyone tell me what I should do?

Thanks very much!
Yours Gaconel
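
One common way to express "any characters, but never the sequence />" is a negative lookahead checked before each character. The following is only a sketch under assumptions: the class and method names are made up for illustration, \w[\w\d]* is shortened to \w+ (in .NET, \w already covers digits), and the sample input assumes both tags end in "/>".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class FreeTagSketch
{
    // (?:(?!/>).)* consumes one character at a time, but only while the
    // next two characters are not "/>". Singleline lets '.' match newlines.
    public static readonly Regex SelfClosingFreeTag = new Regex(
        @"<free:\w+\s*(?:(?!/>).)*/>",
        RegexOptions.Singleline);

    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in SelfClosingFreeTag.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Assumed input: both tags are written as self-closing tags.
        string input =
            "xxxxx<free:news keyword=\"gaconel/name\" length=\"30\"/>" +
            "<free:info keyword=\"title\" length=\"20\" recordset=\"10\"/>";

        foreach (string tag in Find(input))
        {
            Console.WriteLine(tag);
        }
    }
}
```

A lone "/" or ">" inside the tag (as in keyword="gaconel/name") does not end the match; only the pair "/>" does.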
 
Dennis Myrén

I am not sure I understand.
Do you want to extract the values of the attributes, or find the entire tag?
Please explain a little further.
If you want to find the tag, try:
<[^>]*>

If this is actually well-formed XML that you are working with,
please consider loading it into a System.Xml.XmlDocument,
which makes this a very simple task.
 
Guest

Thank you Dennis!
I want to extract two strings from the example string:
1:<free:news keyword="gaconel/name" length="30"/>
2:<free:info keyword="title" length="20" recordset="10">
The string varies, so nobody knows in advance what it will contain,
but each match must look like <free: ... />.
The middle can include "/" or ">" or "/...>", but not "/>".
Does that make sense?
 
Dennis Myrén

OK, so if I understand you correctly,
<free:news keyword="gaconel/name" length="30"/>
should not give a match, but
<free:info keyword="title" length="20" recordset="10">
and
</free:info>
should give a match.

Then, try this expression:
<[^>]*[^\/]>
 
Dennis Myrén

If you also want to match only those tags that use
the free: namespace prefix,
then the expression would be:
<\/?free:[^>]*[^\/]>
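
To see what this expression accepts, it can be run against the three tags discussed above. This is a small sketch with a made-up class name, not a definitive test: it should report False for the self-closing free:news tag and True for the opening and closing free:info tags, because [^\/] refuses a "/" directly before the final ">".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper used only to try the expression out.
public static class PrefixTagCheck
{
    // The expression from the post, written as a C# verbatim string.
    public static readonly Regex OpenOrCloseTag =
        new Regex(@"<\/?free:[^>]*[^\/]>");

    public static bool Matches(string tag)
    {
        return OpenOrCloseTag.IsMatch(tag);
    }

    public static void Main()
    {
        string[] samples =
        {
            "<free:news keyword=\"gaconel/name\" length=\"30\"/>",
            "<free:info keyword=\"title\" length=\"20\" recordset=\"10\">",
            "</free:info>"
        };

        foreach (string s in samples)
        {
            Console.WriteLine("{0} -> {1}", s, Matches(s));
        }
    }
}
```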


--
Regards,
Dennis JD Myrén
Oslo Kodebureau
 
Guest

My document may look like this:
<html>
<head>
<title>good</title>
</head>
<body>
aabbbbbbbkljfdllkj<free:news keyword="gaconel/name"
length="20"><free:word recordset="10"/>klfjdsljf dsalkfd
salkjfdsalkjf dsal
....
<free:blush:ther class="..."/>
<!--
There may be other tags that begin with "<free" and end with "/>".
I want to find every string that begins with "<free" and ends with "/>",
where the middle of the string does not include "/>".
What should I do?
@"<free:\w[\w\d]*\s*(!!!Not \/>!!!)+\/>"
In other words, how should I amend the "(!!!Not \/>!!!)+" part of the regex?
-->
</body>
</html>
 
Dennis Myrén

I assume you want to match any tags using the free: namespace prefix.

Try this:
<snippet>
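using System;
using System.Text.RegularExpressions;

// Matches opening, closing and self-closing tags that use the free: prefix.
// RegexOptions.Multiline only changes ^ and $, so it has no effect on this pattern.
Regex rx = new Regex("<\\/?free:[^>]*[^\\/]\\/?>", RegexOptions.Multiline);
MatchCollection matches = rx.Matches(/* Your string */ xml);
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);
}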

</snippet>
 
Guest

My string can look like this: <free:news keyword="<http://www.free.com>"/>
If I use "<\/?free:[^>]*[^\/]>", then
the whole tag is not matched.
The key point is that the middle of the string must not include the "/>"
combination.
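
The effect of a ">" inside an attribute value can be checked directly. In the sketch below (the helper name is made up, and the second pattern is one possible alternative rather than anything proposed earlier in the thread), the [^>]* expression stops at the first ">" and returns only part of the tag, while a lookahead-based pattern that forbids only the pair "/>" returns the whole tag.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing two patterns on the same input.
public static class AttributeValueCheck
{
    // Returns the first match of pattern in input, or null if there is none.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.Singleline);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string tag = "<free:news keyword=\"<http://www.free.com>\"/>";

        // Stops at the '>' inside the attribute value:
        // prints <free:news keyword="<http://www.free.com>
        Console.WriteLine(FirstMatch(@"<\/?free:[^>]*[^\/]>", tag));

        // Forbids only the two-character sequence "/>":
        // prints the complete tag.
        Console.WriteLine(FirstMatch(@"<free:(?:(?!/>).)*/>", tag));
    }
}
```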


 
Dennis Myrén

If the HTML document is XHTML (meaning it is well-formed XML),
then I strongly suggest using an XML DOM parser to extract the data.
The HTML you posted was nearly well-formed;
only the free:news element had no end tag.
You will also have to declare a namespace URI for the free: prefix.
To do that, add an attribute to the root node:
<html xmlns:free="http://tempuri.org/">
(You may change http://tempuri.org/ to something else.)

<snippet>
using System;
using System.Xml;

XmlDocument doc = new XmlDocument();
doc.Load( /* Name of the file */ filename );
XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
// The URI must match the namespace URI declared in the document.
nsmgr.AddNamespace("free", "http://tempuri.org/");
XmlNodeList nl = doc.SelectNodes("//free:*", nsmgr);
foreach (XmlNode node in nl)
{
    Console.WriteLine(node.OuterXml);
}
</snippet>

This is by far the most stable solution, I would say.
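
For trying the DOM approach without a file, the same calls can be pointed at an in-memory string. This is a sketch under assumptions: the class name and the tiny sample document are invented for illustration, and the sample is already well-formed with the namespace declared. A name with two colons, such as free:blush:ther from the posted HTML, is not a valid qualified name and would most likely be rejected by the parser.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original reply.
public static class FreeElementSketch
{
    // Returns the qualified names of all elements in the free: namespace.
    public static List<string> FreeElementNames(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        // Must match the URI declared on the root element.
        nsmgr.AddNamespace("free", "http://tempuri.org/");

        var names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//free:*", nsmgr))
        {
            names.Add(node.Name);
        }
        return names;
    }

    public static void Main()
    {
        // Assumed sample: a reduced, well-formed version of the posted page.
        string xml =
            "<html xmlns:free=\"http://tempuri.org/\"><body>text" +
            "<free:news keyword=\"gaconel/name\" length=\"20\"/>" +
            "<free:word recordset=\"10\"/>" +
            "</body></html>";

        foreach (string name in FreeElementNames(xml))
        {
            Console.WriteLine(name);
        }
    }
}
```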

 
Guest

I'm sorry, Dennis.
Can you amend the regex based on mine?
@"<free:\w[\w\d]*\s*(!!!Not \/>!!!)+\/>"
You only need to tell me how to amend the "(!!!Not \/>!!!)+" part.
The final result I want could be described as
"<free:(.)*/>",
where (.)* can be any run of characters that does not contain "/>".
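
Read literally, "<free:(.)*/>" with the rule that the middle never contains "/>" is what a lazy quantifier gives: .*? stops at the first "/>" it can reach. The sketch below uses a made-up class name and a shortened copy of the sample document, so treat it as an illustration rather than a finished solution. Note that under this rule the unclosed free:news tag and the following free:word tag come back as a single match, because the first "/>" after "<free:news" belongs to free:word.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original post.
public static class LazyFreeTagSketch
{
    // .*? expands as little as possible, so each match ends at the first "/>".
    // Singleline lets '.' cross the line break inside the first tag.
    public static readonly Regex LazyTag =
        new Regex(@"<free:.*?/>", RegexOptions.Singleline);

    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in LazyTag.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        // Shortened copy of the sample document body.
        string input =
            "aab<free:news keyword=\"gaconel/name\"\n" +
            "length=\"20\"><free:word recordset=\"10\"/>klf\n" +
            "<free:blush:ther class=\"...\"/>";

        foreach (string tag in Find(input))
        {
            Console.WriteLine("[" + tag + "]");
        }
    }
}
```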


 
Dennis Myrén

Please evaluate my most recent reply regarding an XML DOM approach before
asking for further help.

 
Guest

Thank you very much, Dennis.

 
