XmlDocument problem

Ayende Rahien

Serious problem

I'm using Chris Lovett's SgmlReader class


SgmlReader sr = new SgmlReader();
XmlDocument xdoc = new XmlDocument();
sr.DocType = "HTML";
sr.InputStream = new System.IO.StringReader(node.InnerText);
xdoc.Load(sr);
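// Every alternative in the XPath union must be separated by "|".
foreach (XmlNode PotentiallyMalicous in xdoc.SelectNodes(
    "//script | //embed | //object | //frameset | //frame | //iframe | " +
    "//meta | //link | //style | //@style"))
{
    // Attributes (matched by //@style) have no ParentNode; remove them via their owner element.
    XmlAttribute attr = PotentiallyMalicous as XmlAttribute;
    if (attr != null)
        attr.OwnerElement.Attributes.Remove(attr);
    else if (PotentiallyMalicous.ParentNode != null)
        PotentiallyMalicous.ParentNode.RemoveChild(PotentiallyMalicous);
}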
item.desc = xdoc.InnerText;


Unfortunately, I'm getting an exception on xdoc.Load(sr), saying:

System.InvalidOperationException: The specified node cannot be inserted as the valid child of this node, because the specified node is the wrong type.
   at System.Xml.XmlDocument.AppendChildForLoad(XmlNode newChild, XmlDocument doc)
   at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
   at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
   at System.Xml.XmlDocument.Load(XmlReader reader)
   at Roar.RssFeed.UpdateItem(XmlNode ItemNode, XmlNamespaceManager nsMgr, ArrayList& NewItems, Boolean Update) in c:\documents and settings\ayende\my documents\visual studio projects\medar\rssfeed.cs:line 233
   at Roar.RssFeed.UpdateFeed(XmlDocument feed, ArrayList& NewItems, Boolean Update)

I've no idea what is causing this.
node.InnerText equals:
<'a href="http://www.ncl.com/fleet/dawn/index.htm"><'img src="http://monster2.scripting.com/z/images/archiveScriptingCom/2003/12/01/dawn.jpg" width="125" height="59" border="0" align="right" hspace="15" vspace="5" alt="A picture named dawn.jpg"><'/a>Two articles, both from the NY Times, by coincidence happened to show up one after the other in my aggregator, a stark contrast of how two kinds of Americans live. The first <'a href="http://www.nytimes.com/2003/12/01/nyregion/01SHIP.html?ex=1385614800&en=f12a99f582744ee2&ei=5007&partner=USERLAND">article<'/a> details the luxurious cruise ship Tom DeLay is bringing to the Republican National Convention in NYC in August, where George Bush will, presumably, be nominated for a second term as President. It's a very beautiful ship, very nice. The second <'a href="http://www.nytimes.com/2003/11/27/international/worldspecial/27LIST.html?ex=1385355600&en=1fa37d9cc6c8ca0f&ei=5007&partner=USERLAND">article<'/a> is the daily report of US soldiers killed in Iraq. Yesterday only one soldier died, David Goldberg, 20, an engineer in the Army reserve, based in Layton, Utah. Needless to say he won't be going to the Republican National Convention or riding on any cruise ships. "

Any idea what could cause it? Or how to fix it?
 
Zürcher See

I also use SgmlReader and have never had a problem with it.

I think your problem is that you need a "root" node: an XmlDocument must
have exactly one top-level element, like:

<html>
<head>
...
</head>
<body>
...
</body>
</html>

or

<doc>
<chapter name="1">
...
</chapter>
<chapter name="2">
...
</chapter>
</doc>
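
To make the single-root rule concrete, here is a minimal sketch that uses only System.Xml (no SgmlReader). The class name SingleRootSketch and the method Loads are made up for illustration. It simply reports whether XmlDocument.LoadXml accepts a piece of markup.

```csharp
using System;
using System.Xml;

public static class SingleRootSketch
{
    // Returns true if XmlDocument.LoadXml accepts the markup,
    // false if the parser rejects it with an XmlException.
    public static bool Loads(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

With this helper, Loads("<doc><chapter name=\"1\"/><chapter name=\"2\"/></doc>") should succeed, while Loads("<chapter name=\"1\"/><chapter name=\"2\"/>") should fail because there are two top-level elements.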

The structure of your document is:
<a ..><img .../> </a>
<#text>
<a ..></a>
<#text>
<a ..></a>
<#text>

If so, try the following:

new System.IO.StringReader("<p>"+node.InnerText+"</p>");
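
As a rough illustration of why the wrapper helps, the sketch below reproduces the same kind of InvalidOperationException without SgmlReader, by appending a text node directly to an XmlDocument, and then shows that mixed content loads once it sits inside a single <p> element. The names RootWrapSketch, CanAppendTextAtTopLevel and LoadWrapped are hypothetical, and LoadXml is used here only because the sample fragment is already well-formed XML; real HTML would still need SgmlReader as in the code above.

```csharp
using System;
using System.Xml;

public static class RootWrapSketch
{
    // Tries to add a text node as a direct child of the document.
    // XmlDocument does not allow text at the top level, so this is
    // expected to throw InvalidOperationException and return false.
    public static bool CanAppendTextAtTopLevel()
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            doc.AppendChild(doc.CreateTextNode("Two articles, both from the NY Times"));
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    // Wraps a fragment of elements and text in a single <p> root
    // so that the document has exactly one top-level element.
    public static XmlDocument LoadWrapped(string fragment)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<p>" + fragment + "</p>");
        return doc;
    }
}
```

For example, LoadWrapped("<a href=\"x\">one</a> text <a href=\"y\">two</a> more") gives a document whose DocumentElement is the p element, with the links and text as its children.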
 
Ayende Rahien

Zürcher See said:
I also use SgmlReader and have never had a problem with it.
Try
new System.IO.StringReader("<p>"+node.InnerText+"</p>");

Dude!
It works!
Thanks a Lot
 
