XmlNode.InnerXml and Xml Readers / XmlDocument

Guest

Hi,

I am currently working on XML files and I am trying to ensure that my code
handles any encoded chars (like < > & " ' stored as &lt; &gt; &amp; &quot; &apos;).

I am currently using XmlValidatingReader.ReadInnerXml(), but I have noticed
the same behavior with XmlDocument too.

Assume that I am trying to read a node which looks like this:

<exlObjectFields>
<it:DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</it:DSPLY_NAME>
<it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
<it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>

The DSPLY_NAME field that is read looks like this:
<it:DSPLY_NAME>&lt;test"invalid"&gt;</it:DSPLY_NAME>

As you can see, &lt; and &gt; are not decoded, but &quot; is being decoded back
to the " char. Is there any way I could force it to read without automatically
decoding certain chars? Is it a bug?

TIA
 
Steven Cheng[MSFT]

Hi Hermit,

As for the XML character escaping issue, I'm wondering how you load
the XML document. Is it originally stored in a file, and do you use
XmlDocument to load it into memory?

Based on my understanding, the following XML document is invalid
because the '<' and '>' haven't been escaped, so loading it through the
XmlDocument class will throw an exception (the namespace prefix "it"
also needs to be declared):

==============
<exlObjectFields>
<it:DSPLY_NAME><test"invalid"></it:DSPLY_NAME>
<it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
<it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
=============


I have performed some tests which store the escaped XML in a file (as below):

==============
<exlObjectFields xmlns:it="xxxx" >
<it:DSPLY_NAME>&lt;test"invalid"&gt;</it:DSPLY_NAME>
<it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
<it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
===============

After loading it into XmlDocument as follows, it still keeps the escaped
format (&lt; and &gt;):

==============
private void btnTest2_Click(object sender, EventArgs e)
{
    XmlDocument doc = new XmlDocument();

    doc.Load("output.xml");

    MessageBox.Show(doc.OuterXml);
}
==============

Are you using similar code logic? Please feel free to let me know
if there is anything I missed.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



==================================================

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.



Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.

==================================================



This posting is provided "AS IS" with no warranties, and confers no rights.
 
Guest

Steven,

The XML I copied was part of a bigger XML document that does contain the
declaration for "it". However, that is not the point.

What I meant was that the following characters need to be escaped, as they
are XML reserved chars:
< &lt;
" &quot;
' &apos;

Consider the node
<DSPLY_NAME>&lt;test&gt;</DSPLY_NAME>

Using XmlTextReader or XmlValidatingReader.ReadInnerXml(), I correctly receive
the value &lt;test&gt;

However, if I were to use
<DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</DSPLY_NAME>
the XmlTextReader or XmlValidatingReader's ReadInnerXml() returns
&lt;test"invalid"&gt;

The same applies to any use of &apos;.

Is there any way I can stop the XML reader / document objects from
decoding the encoded characters?

Regards,

Hermit

--
Regards,

Hermit Dave
http://www.invokeit.co.uk
 
Steven Cheng[MSFT]

Thanks for your reply Hermit,

I'm wondering how you load the XML document. Have you tried saving it to a
file and loading it from the file? Also, have you set CheckCharacters on the
XmlReader's XmlReaderSettings?

Based on my test, the following XML fragment will definitely raise an
exception when parsed through XmlReader (since it is an invalid XML
document). Here is my test code to load it:

============================
XmlDocument doc = new XmlDocument();

string filepath = @"baddata.xml";

XmlReaderSettings settings = new XmlReaderSettings();
settings.CheckCharacters = true;


XmlReader xtr = XmlReader.Create(filepath, settings);

doc.Load(xtr);

xtr.Close();

MessageBox.Show(doc.OuterXml);
=============================

=====baddata.xml===========
<?xml version="1.0" encoding="utf-8" ?>
<exlObjectFields xmlns:it="http://schemas.it.org">
<it:DSPLY_NAME>
<test"invalid">
</it:DSPLY_NAME>
<it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
<it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
===========================

If possible, would you provide your test code logic so that I can also have
a look and test it on my side?

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
Guest

Steven,

I am copying the relevant code. If you look at the contents of the fieldsXml
variable after it is read, you will see the problem:

&lt; and &gt; are read as is
&apos; or &quot; are converted to ' or "

Xml file being opened with
----------------------------------------------------------------
XmlTextReader textReader = new XmlTextReader( _sPrivateFilePath );
textReader.WhitespaceHandling = WhitespaceHandling.None;
_oMyReader = new XmlValidatingReader( textReader );
// XML validation is done separately
_oMyReader.ValidationType = ValidationType.None;

// Set the validation event handler
_oMyReader.ValidationEventHandler += new ValidationEventHandler( ValidationCallBack );
----------------------------------------------------------------

Xml being read.
----------------------------------------------------------------
if (_oMyReader.ReadState == ReadState.Interactive )
{
    do
    {
        if (( _oMyReader.Name == Const.XmlElementName_ExlObject ) &&
            ( _oMyReader.IsStartElement() ))
        {
            string fieldsXml = _oMyReader.ReadInnerXml();
            string sTag = Const.XmlStartTag_With_Namespace_ExlObjectFields;
            string eTag = Const.XmlEndTag_ExlObjectFields;

            // use the data read
            break;
        }
    }
    while ( _oMyReader.Read() );
}
----------------------------------------------------------------

Xml File
----------------------------------------------------------------
<?xml version="1.0" encoding="utf-8" ?>

<exl xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="ExlSchema"
xmlns:it="DbIssueTypeSchema" xsi:schemaLocation="DbIssueTypeSchema
TEST_SEC.xsd ExlSchema ExlSchema.xsd">
<name>AnEXL</name>
<headends><headend>MLS01-1</headend></headends>
<version>1.0</version>
<date>07-MAR-2007</date>
<description>Raaar</description>


<exlHeader>
<it:EXCHANGE>BA</it:EXCHANGE>
<it:ISSUTYPE>TEST_SEC</it:ISSUTYPE>

<exlHeaderFields>
<it:RECORDTYPE>113</it:RECORDTYPE>
<it:TEMP_VERS>202</it:TEMP_VERS>
</exlHeaderFields>
</exlHeader>

<exlObject>
<it:SYMBOL>TFMS000000</it:SYMBOL>
<it:RIC>TFMR000000.WA</it:RIC>
<exlObjectFields>
<it:DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</it:DSPLY_NAME>
<it:OFFCL_CODE>TFMS000000</it:OFFCL_CODE>
<it:BCKGRNDPAG>TFMS000000</it:BCKGRNDPAG>
</exlObjectFields>
</exlObject>
</exl>
----------------------------------------------------------------
--
Regards,

Hermit Dave
http://www.invokeit.co.uk
 
Steven Cheng[MSFT]

Thanks for your reply Hermit,

I'll have a look, test it locally, and let you know the result.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
Steven Cheng[MSFT]

Hi Hermit,

In the code you provided, there are still many undefined variables that
may affect the test logic. Would you send me a simplified project that
demonstrates it? You can reach me through the email in my signature (remove
"online").

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
Steven Cheng[MSFT]

Hi Hermit,

I've received the test package you sent and run some tests. Of the
escaped special character entities you put in the input string, some are not
expanded (such as &lt; and &gt;) while others are (such as &quot;).

I have checked the MSDN documentation on entity reference reading and
expansion in the .NET XML components, and it seems XmlReader will always
expand character entities. That's why the quotes are expanded. As for &lt;
and &gt;, since a literal < is illegal in XML document content (and > is
conventionally escaped alongside it), they have to stay escaped in the markup
that is returned. For other general entities, XmlTextReader has the
"EntityHandling" property to control whether entity references are preserved
or not:

#EntityReference Reading and Expansion
http://msdn2.microsoft.com/en-us/library/a4f0e433(vs.71).aspx
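
To make the behavior concrete, here is a minimal sketch that reproduces it with an in-memory string instead of a file. The class and method names (InnerXmlDemo, ReadRootInnerXml) are made up for illustration, and it uses XmlReader.Create rather than the XmlValidatingReader from the original code, so treat it as an approximation of the setup discussed here rather than the exact test project.

```csharp
using System;
using System.IO;
using System.Xml;

public static class InnerXmlDemo
{
    // Hypothetical helper: returns the inner XML of the root element
    // of the given XML string.
    public static string ReadRootInnerXml(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();        // position on the root element
            return reader.ReadInnerXml();  // re-serializes the element's children
        }
    }

    public static void Main()
    {
        // All five predefined entities are legal input; only some come back escaped.
        string xml = "<DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</DSPLY_NAME>";
        Console.WriteLine(ReadRootInnerXml(xml));
    }
}
```

Going by the behavior reported in this thread, the printed value should keep &lt; and &gt; escaped while the quotes come back as literal " characters.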

In addition, if the source XML document is originally illegal (contains
invalid characters, such as < or >, in content), you need to manually replace
them (through an IO reader) before the XML component parses them:

#How to locate and replace special characters in an XML file with Visual C# .NET
http://support.microsoft.com/kb/316063

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead



 
Guest

Thanks for the detailed reply and pointers to MSDN docs :)

The EntityHandling enum should have had a third option for doing nothing :)
rather than resolving char and entity references.

Well, if it's the only behavior, there is very little I can do.

The XML data files do ensure that node inner XML is correctly encoded.
However, only the < > and & chars need to be encoded. My initial guess, based
on an MSDN doc, was to have " and ' encoded too, but I will revert those to
plain chars as they were before.
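
As a quick sanity check on that decision, the sketch below compares element text written with &quot; against the same text written with literal quotes. The names (QuoteEquivalenceDemo, TextOf) and the element name "n" are placeholders, not part of the real data files; the point is only that both spellings parse to the same decoded value.

```csharp
using System;
using System.Xml;

public static class QuoteEquivalenceDemo
{
    // Hypothetical helper: loads the XML string and returns the decoded
    // text content of its root element.
    public static string TextOf(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }

    public static void Main()
    {
        string escapedQuotes = "<n>&lt;test&quot;invalid&quot;&gt;</n>";
        string literalQuotes = "<n>&lt;test\"invalid\"&gt;</n>";

        // Both forms decode to the same string, so escaping quotes in
        // element content is optional.
        Console.WriteLine(TextOf(escapedQuotes) == TextOf(literalQuotes));
    }
}
```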

Thanks for your help Steven,
 
Steven Cheng[MSFT]

Thanks for your reply Hermit,

Yes, it is a pity that the EntityHandling setting only supports the full or
character-expanding modes so far. That makes it necessary for us to take care
of those particular character entities in the XML document when processing.
Anyway, I think it is a good idea to submit a request or comment on this to
the product team:

http://connect.microsoft.com/feedback/default.aspx?SiteID=210
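
If the fully escaped form is really needed downstream, one possible way of taking care of it by hand is to read the decoded text and re-escape it explicitly. This is only a sketch under the assumption that the element contains plain text and no child elements; EscapeAll and ReadEscapedText are hypothetical helpers, not framework APIs.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReEscapeDemo
{
    // Hypothetical helper: escapes all five predefined XML entities.
    public static string EscapeAll(string text)
    {
        return text.Replace("&", "&amp;")   // must run first to avoid double-escaping
                   .Replace("<", "&lt;")
                   .Replace(">", "&gt;")
                   .Replace("\"", "&quot;")
                   .Replace("'", "&apos;");
    }

    // Reads the decoded text of the root element, then re-escapes it.
    // Assumes the root element holds text only (no child elements).
    public static string ReadEscapedText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            string decoded = reader.ReadElementContentAsString();
            return EscapeAll(decoded);
        }
    }

    public static void Main()
    {
        string xml = "<DSPLY_NAME>&lt;test&quot;invalid&quot;&gt;</DSPLY_NAME>";
        Console.WriteLine(ReadEscapedText(xml));
    }
}
```

The trade-off is that this works on decoded text rather than markup, so it does not apply to elements with nested child elements, where re-escaping quotes blindly would also hit attribute delimiters.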

As always, if there is anything further we can help with, please feel free to
post in the newsgroup.

Sincerely,

Steven Cheng

Microsoft MSDN Online Support Lead


 
