Search all nodes using Linq

graphicsxp

Hi,

I've converted an HTML page to a valid XML XDocument. Now I want to
search this XDocument using LINQ and find all the nodes that
have a 'class' attribute equal to 'mxb'.

I've tried the following, but it doesn't work:

StringReader reader = new StringReader(xml);
xDoc = XDocument.Load(reader, LoadOptions.None);

var he = from link in xDoc.Elements()
         where link.Attribute("class").Value == "mxb"
         select link;

Nothing is returned. What is the correct syntax?
 
Martin Honnen

You want to look at the descendant elements:

var he = from link in xDoc.Descendants()
         where link.Attribute("class").Value == "mxb"
         select link;
 
Anthony Jones

The Elements() method returns only child elements, and in the case of a
document there will be only one child element, the root node.

You need Descendants() instead.

If you are looking for link elements (anchors) in well-formed XHTML, then
consider Descendants("a").
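
To make the difference concrete, here is a minimal sketch. The sample markup, the class name and the helper method names are invented for illustration and are not from the original poster's page.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ElementsVersusDescendants
{
    // Hypothetical sample markup, invented for this sketch.
    public const string SampleXml =
        "<html><body><div class=\"mxb\"><a class=\"mxb\" href=\"#\">x</a></div><p>y</p></body></html>";

    // Elements() on a document yields only its direct child element (the root).
    public static int CountChildren(XDocument doc)
    {
        return doc.Elements().Count();
    }

    // Descendants() walks the whole tree below the document node.
    public static int CountDescendants(XDocument doc)
    {
        return doc.Descendants().Count();
    }

    // Descendants(name) keeps only the elements with that name.
    public static int CountAnchors(XDocument doc)
    {
        return doc.Descendants("a").Count();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        Console.WriteLine(CountChildren(doc));    // 1: just <html>
        Console.WriteLine(CountDescendants(doc)); // 5: html, body, div, a, p
        Console.WriteLine(CountAnchors(doc));     // 1: the single <a>
    }
}
```

Note that this sample has no namespace. If the converted page declares the XHTML namespace on its root element, a plain "a" will not match, and the name would have to be qualified with an XNamespace.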
 
Nicholas Paldino [.NET/C# MVP]

If you want to get all the elements, you will have to recurse through
all the elements:

using System.Collections.Generic;
using System.Xml.Linq;

public static class XElementExtensions
{
    public static IEnumerable<XElement> GetSelfAndAllDescendants(
        this XElement container)
    {
        // First, return the container.
        yield return container;

        // Cycle through the direct children of the container.
        foreach (XElement element in container.Elements())
        {
            // Recurse to get the child and all of its descendants,
            // and return those.
            foreach (XElement child in element.GetSelfAndAllDescendants())
            {
                yield return child;
            }
        }
    }
}

Once you have that, it's a simple matter of performing your query:

var he = from link in xDoc.Root.GetSelfAndAllDescendants()
         where link.Attribute("class").Value == "mxb"
         select link;

Note that this is a recursive operation, so for very large datasets, it
might give you some problems (in which case, you might want to switch to an
XmlReader of some sort to enumerate the nodes).
 
Nicholas Paldino [.NET/C# MVP]

Ok, my method was overkill; I didn't realize that Descendants traversed
the entire hierarchy.

However, you need to be aware that the Descendants method will not return
the root element, so if the possibility exists that the attribute you are
looking for exists on the root element, you have to account for that.
 
Martin Honnen

If you have an XDocument object doc, then doc.Descendants() will of
course include the root element, i.e.
doc.Descendants().Contains(doc.Root)
is true.

Why do you think that the root is not a descendant element of the
XDocument instance?
 
Nicholas Paldino [.NET/C# MVP]

Martin,

It's a matter of what the documentation states for the Descendants
method:

"Note that this method will not return itself in the resulting
IEnumerable<T>. See DescendantsAndSelf if you need to include the
current XElement in the results."

The OP would have to use DescendantsAndSelf if he wants to include the
element in the returned IEnumerable<XElement>.

If it does return itself when you call the Descendants method, then that
is a bug.
 
Martin Honnen

You need to distinguish between Descendants() called on an XDocument
node and on an XElement node. The root element is a descendant element
of the XDocument node and is thus returned if you call Descendants() on
the XDocument node. An element, however, is not a descendant of itself,
so if you call Descendants() on an XElement node, the element
itself is not returned. I think that makes sense and is not a bug. Maybe
the documentation needs some clarification.
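
A small sketch of that distinction follows. The document content, the class name and the helper method names are made up for illustration; the root deliberately carries the attribute so that the difference shows up in the counts.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class DescendantsScope
{
    // Hypothetical sample: the root element itself has class="mxb".
    public const string SampleXml =
        "<root class=\"mxb\"><item class=\"mxb\"/><item/></root>";

    // Counts matches among the descendants of the document node (root included).
    public static int CountFromDocument(XDocument doc)
    {
        return doc.Descendants()
                  .Count(e => (string)e.Attribute("class") == "mxb");
    }

    // Counts matches among the descendants of the root element (root excluded).
    public static int CountFromRoot(XDocument doc)
    {
        return doc.Root.Descendants()
                  .Count(e => (string)e.Attribute("class") == "mxb");
    }

    // Counts matches among the root element and its descendants.
    public static int CountFromRootAndSelf(XDocument doc)
    {
        return doc.Root.DescendantsAndSelf()
                  .Count(e => (string)e.Attribute("class") == "mxb");
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        Console.WriteLine(doc.Descendants().Contains(doc.Root));      // True
        Console.WriteLine(doc.Root.Descendants().Contains(doc.Root)); // False
        Console.WriteLine(CountFromDocument(doc));    // 2
        Console.WriteLine(CountFromRoot(doc));        // 1
        Console.WriteLine(CountFromRootAndSelf(doc)); // 2
    }
}
```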
 
Nicholas Paldino [.NET/C# MVP]

Martin,

Thanks for clarifying. I agree that the documentation needs some
clarification.
 
graphicsxp

Well, thanks for all the answers, that certainly helps!

Martin, I think you gave me the right code, except that I needed to add
a condition in the where clause, as the 'class' attribute might not
always exist:

var he = from link in xDoc.Descendants()
         where ((link.Attribute("class") != null) &&
                (link.Attribute("class").Value == "mxb"))
         select link;
 
Martin Honnen

You can shorten that to

var he = from link in xDoc.Descendants()
         where (string)link.Attribute("class") == "mxb"
         select link;
 
graphicsxp

Are you sure about that? When I do that I get 'Object reference not
set to an instance of an object'. I think I need to check for a null
attribute, don't you think?
 
Anthony Jones

No. (string)null is a valid operation, and so is null == "mxb". I'm not sure
what would be giving you that error.

It works fine in a small test:

var doc = new XDocument(new XElement("root",
    new XElement("item", new XAttribute("attr", "1")),
    new XElement("item")));

int x = (from item in doc.Descendants()
         where (string)item.Attribute("attr") == "1"
         select item).Count();

Console.WriteLine(x);
 
graphicsxp

My LINQ code is actually the following:

element = (from link in pDoc.Descendants()
           where ((link.Attribute("id") != null)
                  && (link.Attribute("id").Value == _id))
           select link).First().ToString();

And if I remove the null test, I definitely get the exception. Could
it be because I use .First()?
 
Martin Honnen

Well, my suggestion was to replace

where ((link.Attribute("id") != null)
       && (link.Attribute("id").Value == _id))

with

where (string)link.Attribute("id") == _id

which does not throw an exception if the attribute does not exist.

If you simply had

where link.Attribute("id").Value == _id

then you would get an exception if the attribute does not exist, but my
suggestion avoids that.

First().ToString() could also give an exception if no element is found,
but that is a different issue.
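
As a rough sketch of both points: the cast avoids the exception for a missing attribute, and FirstOrDefault() (swapped in here in place of First()) avoids the exception when nothing matches. The class name, the method names and the sample document are assumptions made for this example, not the original poster's code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SafeAttributeLookup
{
    // Hypothetical sample: the root and the second item have no id attribute.
    public const string SampleXml =
        "<root><item id=\"a1\">x</item><item/></root>";

    // Returns the markup of the first element whose id matches, or null if none does.
    public static string FindById(XDocument doc, string id)
    {
        // The explicit XAttribute-to-string conversion yields null for a
        // missing attribute, so no separate null check is needed.
        XElement match = doc.Descendants()
            .FirstOrDefault(e => (string)e.Attribute("id") == id);

        // FirstOrDefault returns null when there is no match; First would throw.
        return match == null ? null : match.ToString();
    }

    // Shows that reading .Value on a missing attribute throws.
    public static bool ValueAccessThrows(XDocument doc, string id)
    {
        try
        {
            doc.Descendants().Count(e => e.Attribute("id").Value == id);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        Console.WriteLine(FindById(doc, "a1") ?? "(not found)");
        Console.WriteLine(FindById(doc, "zz") ?? "(not found)");
        Console.WriteLine(ValueAccessThrows(doc, "a1")); // True: <root> has no id
    }
}
```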
 
graphicsxp

Yes, that's what I did, that is, using the cast to a string, and that
gave the exception. Maybe it's the First().ToString() as you
pointed out, but I doubt it because there's always an element found.
 
graphicsxp

Ohhhhh sorry, my mistake! I didn't see you had removed the .Value!
OK, makes sense now.

Thanks
 
