Saving < and > in XML


Guest

I'm using XmlTextWriter to write XML to a string and XmlDocument to read it
back.

Everything works OK except when my data contains a < or >. In that case they
get saved as &lt; and &gt; respectively. When I come to read them back they
appear as the text &lt; and &gt; rather than as < and >. What do I have to do
to prevent this happening?

Similarly, if I try to save the data <abc> I get an exception saying
potentially dangerous data detected.

Any ideas how I should handle these cases? Is there some setting that I
need to make when saving the data?
 
< and > are reserved characters in XML. You'll want to wrap these in a
CDATA block or encode them to their escaped values (i.e. &lt;abc&gt;) before
sticking them into XML.

Robert
 
Use the special entity reference.

This applies for & as well.

So:

&amp;
&gt;
&lt;
 
Does this mean I should run any data I'm about to put into XML through some
function that converts to &lt; etc. first? If so, what is this function?
 
Tony said:
Are these the only three reserved characters?

There are five reserved characters (although when they must
be escaped varies, as I'll explain momentarily):

< &lt;
> &gt;
& &amp;
" &quot;
' &apos;

You must replace '<' with "&lt;" when it appears in text child nodes
of an element or in attribute values. If you fail to do so in text, the XML
parser will expect the start of a nested child element (which isn't what you
want). OTOH, '>' you can usually leave alone; the one exception is the
sequence "]]>" in text, where it must be written as "]]&gt;".

You must replace '&' with "&amp;" everywhere outside of a CDATA section.
If you fail to do so, the XML parser will interpret it as the start of an
escape for another character (these are called "entity references").

Within attribute values you must replace either ''' with "&apos;" OR '"'
with "&quot;" (you do not need to do both). The character that must be
escaped is the one you're using to delimit the attribute value. That is,
if the value is delimited by single quotes, you must escape ' wherever it
appears in the value (e.g., name='O'Reilly' must be written as
name='O&apos;Reilly'). If the value is delimited by double quotes, you
must escape " instead (e.g., if you had written name="O'Reilly" you
wouldn't have to escape anything, because the XML parser is not going
to be confused).

To recap, always replace &. Replace <. Replace the quote character
used to delimit attribute values inside of attribute values.

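Inside of a CDATA section nothing is escaped at all: entity references such
as "&lt;" are not recognized there. The only sequence that cannot appear is
"]]>", because it ends the section. If your data contains it, split the data
across two CDATA sections, ending the first after "]]" and starting the
second with ">".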
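
A CDATA section ends at the first "]]>", so data containing that sequence is usually split across two sections. Here is a minimal sketch of that, assuming you are writing with XmlTextWriter; the class and method names (CDataExample, WriteAsCData) are made up for illustration. As far as I know, XmlTextWriter.WriteCData refuses text that contains "]]>", which is why the helper splits the value first.

```csharp
using System;
using System.IO;
using System.Xml;

static class CDataExample
{
    // Hypothetical helper: writes <elementName> with the value held in CDATA,
    // splitting at every "]]>" so no single section contains the terminator.
    public static string WriteAsCData(string elementName, string value)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteStartElement(elementName);

        int start = 0;
        int hit;
        while ((hit = value.IndexOf("]]>", start, StringComparison.Ordinal)) >= 0)
        {
            // End this section after "]]" so the ">" opens the next one.
            writer.WriteCData(value.Substring(start, hit + 2 - start));
            start = hit + 2;
        }
        writer.WriteCData(value.Substring(start));

        writer.WriteEndElement();
        writer.Close();
        return sw.ToString();
    }
}
```

For the value a]]>b this should produce two sections, one holding a]] and one holding >b. A parser reading the element back joins them into the original text again.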


Derek Harmon
 
Tony said:
Does this mean I should run any data I'm about to put into XML through some
function that converts to &lt; etc. first? If so, what is this function?

The function is called String.Replace( ). If you're writing a string as a
text value, you can do this,

strEscapedValue = strOriginalValue.Replace( "&", "&amp;").Replace( "<", "&lt;");

If you're writing an attribute value (where you delimit attribute values
using double quotes), you can do this,

strEscapedAttrVal = strOriginalAttrVal.Replace( "&", "&amp;").Replace( "<", "&lt;").Replace( "\"", "&quot;");

In both cases "&" must be replaced first, otherwise the "&" in the
entities you have just inserted gets escaped again.
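
If you would rather not chain the Replace calls yourself, the framework already has a helper that covers all five characters: System.Security.SecurityElement.Escape. A minimal sketch follows; the wrapper name EscapeForXml is only for illustration and is not part of the framework.

```csharp
using System.Security;

static class EscapeExample
{
    // Hypothetical wrapper around SecurityElement.Escape, which replaces
    // <, >, &, " and ' with their entities. The result can be placed in
    // element text or in an attribute value with either quote style.
    public static string EscapeForXml(string value)
    {
        return SecurityElement.Escape(value);
    }
}
```

Only use this when you are building the XML string by hand. If the value is going through XmlTextWriter.WriteString( ) or similar, escaping it first would escape it twice.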

As for the "potentially dangerous" exception: this is because ASP.NET in
.NET Framework 1.1 adds security checks (request validation) to
HTTP requests to detect the possible presence of scripts that
may be dangerous.

If you replace all of the '<' with "&lt;" before the data is posted, you
can avoid this, as it ensures the request contains no script (there's also
a setting, validateRequest, that you can make in web.config to turn this
check off, I believe, although it isn't recommended).


Derek Harmon
 
Are you using .WriteRaw()? If so, don't. Use .WriteElementString(),
.WriteString(), .WriteAttributeString() and the like instead. These
automatically convert these characters into their entities.
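
To make that concrete, here is a rough sketch of the round trip from the original question: XmlTextWriter writing to a string, XmlDocument reading it back. The element and attribute names ("item", "label") and the class name are invented for the example.

```csharp
using System.IO;
using System.Xml;

static class RoundTripExample
{
    // Writes one element with an attribute and a text value. The writer
    // does the escaping, so the raw values are passed in untouched.
    public static string Write(string text, string label)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);
        writer.WriteStartElement("item");
        writer.WriteAttributeString("label", label);
        writer.WriteString(text);
        writer.WriteEndElement();
        writer.Close();
        return sw.ToString();
    }

    // Reads the text value back; the parser turns entities into characters.
    public static string ReadText(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    // Reads the attribute value back, again already unescaped.
    public static string ReadLabel(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("label");
    }
}
```

The string returned by Write contains &lt;abc&gt; rather than <abc>, which is correct for the serialized form. Read the value through InnerText or GetAttribute (not by looking at the raw string or OuterXml) and you get the original < and > back.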
 