Incomplete Escaping Functionality??

Arthur Dent

Hello All...

I am working on an app that needs to write out an XML document for transmittal to an
outside organization. All well and good... I create the XmlDocument object,
append all my nodes and values, etc., and it all works.

Now I go to save the file... I tried two methods...
MyXmlDocument.Save(filename) and
My.Computer.FileSystem.WriteAllText(filename, MyXmlDoc.OuterXml, False)

The problem comes in with XmlDocument.OuterXml. According to XML, there are
5 characters which need to be escaped... Ampersand, LessThan, GreaterThan,
Apostrophe and DoubleQuote. XmlDocument.OuterXml escapes only three of
these ( & , < and > ). Apostrophe and DoubleQuote do not get escaped. This
is a problem, because the third party we need to deal with *must* have them
escaped, even inside of a node's InnerText.
So I figured okay, I'll just escape them myself, but when I try to do that,
it winds up escaping my Ampersand (for example in "&quot;" ), so that it
winds up saving "&amp;quot;".

How in the world can I tell it that it needs to escape ALL FIVE CHARACTERS?
Thanks in advance,
- Arthur Dent.
 
John Saunders

Exactly what are you trying to escape? Do you have these characters within
text nodes? If so, you need to escape them when you create the text node.

John
 
Oleg Tkachenko [MVP]

Arthur said:
XmlDocument.OuterXml escapes only three of these ( & , < and > ).
Apostrophe and DoubleQuote do not get escaped. This is a problem, because
the third party we need to deal with *must* have them escaped, even
inside of a node's InnerText.

XML spec says this:

"The ampersand character (&) and the left angle bracket (<) MUST NOT
appear in their literal form, except when used as markup delimiters, or
within a comment, a processing instruction, or a CDATA section. If they
are needed elsewhere, they MUST be escaped using either numeric
character references or the strings "&amp;" and "&lt;" respectively. The
right angle bracket (>) may be represented using the string "&gt;", and
MUST, for compatibility, be escaped using either "&gt;" or a character
reference when it appears in the string "]]>" in content, when that
string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the
apostrophe or single-quote character (') may be represented as "&apos;",
and the double-quote character (") as "&quot;"."

So & and < MUST always be escaped, while >, ' and " must be escaped only
under certain circumstances; otherwise they MAY be escaped.

But actually you shouldn't have to care about XML syntax; the XML API takes care of it.
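
To illustrate the point from the spec: a conformant parser reads a literal double quote and "&quot;" in element content as the same character, so a serializer is free to leave quotes unescaped in text. Below is a minimal sketch using Python's standard library as a stand-in (it is not the .NET XmlDocument from the question, and the element name "note" is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Two spellings of the same element content: one uses entity references
# for the quotes, the other uses the literal characters.
escaped = ET.fromstring("<note>say &quot;hi&quot; &amp; it&apos;s fine</note>")
literal = ET.fromstring("<note>say \"hi\" &amp; it's fine</note>")

# Both parse to the identical text value.
same_text = escaped.text == literal.text

# On output, the serializer escapes & (and < and >) in text, but leaves
# quote characters alone, because the spec does not require more.
serialized = ET.tostring(literal, encoding="unicode")

# Inside an attribute value delimited by double quotes, the double
# quote does get escaped.
attr_elem = ET.Element("note", {"title": 'say "hi"'})
serialized_attr = ET.tostring(attr_elem, encoding="unicode")

print(same_text)
print(serialized)
print(serialized_attr)
```

If the receiving software treats the two spellings differently in element content, it is presumably not using a conformant XML parser, and that is worth raising with the third party.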
 
Arthur Dent

I have an XmlNode whose InnerText property contains a DoubleQuote.
This causes a problem with the 3rd party, because their software cannot handle
the DoubleQuote in the InnerText.
When I tried to manually escape it using "&quot;", the XmlDocument escaped my
"&" on me, and saved it to the file as "&amp;quot;"... effectively making it
impossible for me to manually escape the DoubleQuote.
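
The double escaping described above is what any XML API does when it is handed text that was already escaped: it treats the "&" as an ordinary character and escapes it again. Here is a small sketch of the effect, using Python's standard library as a stand-in for XmlDocument (the element name "field" and the sample value are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

elem = ET.Element("field")  # hypothetical element name

# Pre-escaping by hand: the API sees six ordinary characters starting
# with "&", so it escapes that ampersand again when serializing.
elem.text = "6&quot; pipe"
double_escaped = ET.tostring(elem, encoding="unicode")

# Passing the raw character instead: the API stores it as-is and
# writes it back unescaped, which is still well-formed XML.
elem.text = '6" pipe'
raw = ET.tostring(elem, encoding="unicode")

print(double_escaped)  # <field>6&amp;quot; pipe</field>
print(raw)             # <field>6" pipe</field>
```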

Ultimately, I wound up adding the text inside of a CDATA section. This
worked for the 3rd party.
From looking around online, though, it seems that CDATA is a holdover and
not the recommended way of doing things.
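
For comparison, here is a rough sketch of the two workarounds: the CDATA approach described above, and escaping all five characters by hand while writing that element's markup yourself. It uses Python's standard library rather than .NET, and the element name "field" and the sample value are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom
from xml.sax.saxutils import escape

value = 'He said "it\'s 5 < 6" & left'

# Option 1: wrap the value in a CDATA section, so nothing inside it
# is escaped at all. The value must not contain "]]>".
doc = minidom.Document()
field = doc.createElement("field")  # hypothetical element name
field.appendChild(doc.createCDATASection(value))
doc.appendChild(field)
cdata_xml = field.toxml()

# Option 2: escape all five characters by hand and build the element
# markup yourself, bypassing the DOM serializer for that value.
# escape() always handles &, < and >; the dict adds the two quotes.
five = escape(value, {'"': "&quot;", "'": "&apos;"})
manual_xml = "<field>" + five + "</field>"

# A conformant parser recovers the original value from either form.
roundtrip_cdata = ET.fromstring(cdata_xml).text
roundtrip_manual = ET.fromstring(manual_xml).text

print(cdata_xml)
print(manual_xml)
```

Both outputs are well-formed and carry the same value, so the choice mostly comes down to which form the receiving software accepts.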




