regular expression xml help please

adidev

hello,

does anyone know how to match any quotes used inside a text node?

for example:

<node1>This is my text node, my quote "quote" is
here</node1>

what i want to do is escape those " with &quot; through the use of
regular expressions

the string i will be operating on is an xml document represented as a
string, so there will be multiple nodes per line, etc. when replacing
" with &quot; i do not want to replace the " around attribute values,
e.g. <node1 myattrb="blah"> (i want to leave these alone)
 
If your elements don't mix text nodes and child elements (no mixed
content), you could do something simple:

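// Needs: using System; using System.Text.RegularExpressions;
// Match a " that follows a '>' with no '<' in between (so we are outside a tag)
// and that is followed by a '<' before any '>'.
Regex regex = new Regex("(?<=>[^<]*)(\")(?=[^>]*<)");
Console.WriteLine(regex.Replace(
    "<foo attr=\"fudder dudder\">bar \" baz</foo>", "&quot;"));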

There are going to be cases where this fails, as I've pointed out. If you
have a specific data layout you are looking to escape in this manner, then
knowing it would help in determining the right expression for you.
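
For documents where the simple pattern is too fragile, a different approach (not the one in the snippet above, just a sketch) is to let the regex consume whole tags and whole text runs, and only touch the text runs in a MatchEvaluator. The class name TagAwareEscaper and the method name Escape are made up for illustration. This still assumes attribute values never contain a literal '>' and that there are no comments or CDATA sections holding quotes you want to keep.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagAwareEscaper
{
    // Either a whole tag (<...>) or a run of text up to the next '<'.
    // Assumption: no '>' inside attribute values, no comments or CDATA.
    private static readonly Regex TagOrText = new Regex(@"<[^>]*>|[^<]+");

    public static string Escape(string xml)
    {
        // Tags are returned unchanged; quotes are replaced only in text runs.
        return TagOrText.Replace(xml, m =>
            m.Value[0] == '<' ? m.Value : m.Value.Replace("\"", "&quot;"));
    }

    public static void Main()
    {
        string xml = "<a x=\"1\">say \"hi\" <b y=\"2\">now</b> \"bye\"</a>";
        Console.WriteLine(Escape(xml));
        // prints: <a x="1">say &quot;hi&quot; <b y="2">now</b> &quot;bye&quot;</a>
    }
}
```

Because every character is consumed either as part of a tag or as part of a text run, this version does not depend on look-behind support and copes with text that sits next to child elements.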

The one thing you don't want to match is anything between < and >, which is why
the look-ahead/behind assertions are written the way they are: the look-behind
requires a > with no < after it, and the look-ahead requires a < before any >.
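
To see the assertions at work on the example from the question, here is a small self-contained sketch. The class name QuoteEscaper and the method name EscapeTextQuotes are made up for illustration; the pattern is the same one as above, written as a verbatim string without the capturing group. It relies on .NET allowing variable-length look-behind, which many other regex engines do not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteEscaper
{
    // In a verbatim string, "" is one literal quote character.
    // Look-behind: a '>' followed by no '<' up to the quote (outside a tag).
    // Look-ahead: a '<' is reached before any '>'.
    private static readonly Regex TextQuote =
        new Regex(@"(?<=>[^<]*)""(?=[^>]*<)");

    public static string EscapeTextQuotes(string xml)
    {
        return TextQuote.Replace(xml, "&quot;");
    }

    public static void Main()
    {
        string xml =
            "<node1 myattrb=\"blah\">my quote \"quote\" is here</node1>";
        Console.WriteLine(EscapeTextQuotes(xml));
        // prints: <node1 myattrb="blah">my quote &quot;quote&quot; is here</node1>
    }
}
```

The attribute quotes are skipped because there is no > before them that is not followed by a <, so the look-behind fails there.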

--
Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

