Finding a specific element using LINQ

graphicsxp

Hi,

I've converted an HTML page to a valid XML document and now I want to
find a specific node in the hierarchy.

All I know is the id of one of its parents and the path from the parent
to the node I'm looking for:

<div id="core">
  <div class="colFormat4">
    <div class="colFormat1 articleType">
      <div class="blocE1B-1r">
        <h5>
          <a href="/societe">
            <text>hello</text>
          </a>
          <text>some text</text>
        </h5>
        <h1>
          Node I'm looking for

So I need to find the node with id="core" and then follow the path
from it, which I know is div/div/div/h1. What would be the LINQ
code for that?

Thanks
 
graphicsxp

I've found a way of doing it :

element = (from link in pDoc.Descendants()
           where ((link.Attribute("id") != null) &&
                  (link.Attribute("id").Value == _parentId))
           select link).First();

String ns = "{http://www.w3.org/1999/xhtml}";

XElement el2 = element.Element(ns + "div").Element(ns + "div")
                      .Element(ns + "div").Element(ns + "h1");

But as you can see it only works if I specify the namespace. Why
can't I just do :

XElement el2 = element.Element("div").Element("div").Element("div").Element("h1");

That returns null. Is there a way to ignore the namespace and still
be able to navigate through XElements ?

Thanks
 
Anthony Jones

I've found a way of doing it :

element = (from link in pDoc.Descendants()
where ((link.Attribute("id") != null) &&
(link.Attribute("id").Value == _parentId))
select link).First();
You're not using Martin's simpler:-

(string)link.Attribute("id") == _parentId

because ???
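
For reference, a minimal sketch of why the cast form is simpler: the explicit conversion from XAttribute to string returns null when the attribute is missing, so no separate null check is needed. The sample markup and the FindById helper are made up for illustration and are not from the original code.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class IdLookup
{
    // Hypothetical helper: first descendant whose id attribute equals the given value.
    // (string)attr yields null when attr is null, so elements without an id are skipped.
    public static XElement FindById(XContainer root, string id)
    {
        return root.Descendants()
                   .FirstOrDefault(e => (string)e.Attribute("id") == id);
    }

    static void Main()
    {
        // Made-up sample document, not the original page.
        XDocument doc = XDocument.Parse(
            "<html><body><div>no id</div><div id='core'><p>x</p></div></body></html>");

        XElement core = FindById(doc, "core");
        Console.WriteLine(core != null ? core.Name.LocalName : "not found");
    }
}
```

FirstOrDefault is used here instead of First so that a missing id gives null rather than an exception; that is a design choice for the sketch, not something the thread prescribes.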


<<<<<<<<<
String ns = "{http://www.w3.org/1999/xhtml}";

XElement el2 = element.Element(ns + "div").Element(ns +
"div").Element(ns + "div").Element(ns + "h1");

But as you can see it only works if I specify the namespace. Why
can't I just do :

XElement el2 = element.Element("div").Element("div").Element
("div").Element("h1");

That returns null. Is there a way to ignore the namespace and still
be able to navigate through XElements ?
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

XML element tag names belong to a namespace. The
xmlns="http://www.w3.org/1999/xhtml" attribute on the html element specifies
the default namespace for the tag names.

The Element method takes an XName parameter. The XName type represents the
fully qualified tag name. A fully qualified name is one that specifies both
the namespace and the local name (e.g. "div").

XName has an implicit conversion from string, so when you call
Element("div") you actually create an instance of XName which has a local
name of "div" and the None namespace (XNamespace.None).

A div in the None namespace is not the same name as a div in the
"http://www.w3.org/1999/xhtml" namespace hence it fails to match.

There are two solutions. A pragmatic one is to remove the xmlns attribute,
which you probably don't need (you will need to remove it before loading
into XDocument).

Another solution would be:-

XName div = "{http://www.w3.org/1999/xhtml}div";
XName h1 = "{http://www.w3.org/1999/xhtml}h1";

XElement el2 = element.Element(div).Element(div).Element(div).Element(h1);

Note that the XName implicit conversion parses the content of { } in the
string as the namespace.
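
A short sketch of both points, assuming a made-up XHTML fragment rather than the original page: an XNamespace plus a local name builds the same kind of XName as the "{...}div" string form, and matching on Name.LocalName is one way to ignore namespaces entirely. ChildByLocalName is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class NamespaceDemo
{
    // Hypothetical helper: first child element with the given local name, in any namespace.
    public static XElement ChildByLocalName(XElement parent, string localName)
    {
        return parent.Elements().FirstOrDefault(e => e.Name.LocalName == localName);
    }

    static void Main()
    {
        // Made-up fragment with the XHTML default namespace declared on the root.
        XElement core = XElement.Parse(
            "<div xmlns='http://www.w3.org/1999/xhtml' id='core'>" +
            "<div><div><div><h1>target</h1></div></div></div></div>");

        XNamespace xhtml = "http://www.w3.org/1999/xhtml";
        XName div = xhtml + "div";                      // namespace + local name
        XName h1 = "{http://www.w3.org/1999/xhtml}h1";  // expanded-name string form

        // An unqualified "div" lives in XNamespace.None, so it does not match.
        Console.WriteLine(core.Element("div") == null);

        // Qualified names match.
        XElement viaNames = core.Element(div).Element(div).Element(div).Element(h1);
        Console.WriteLine(viaNames.Value);

        // Namespace-agnostic walk along div/div/div/h1 using local names only.
        XElement node = core;
        foreach (string step in new[] { "div", "div", "div", "h1" })
            node = ChildByLocalName(node, step);
        Console.WriteLine(node.Value);
    }
}
```

The local-name walk throws a NullReferenceException if a step is missing, just like the chained Element calls, so real code would want a null check per step.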
 
graphicsxp

You're not using Martin's simpler:-
(string)link.Attribute("id") == _parentId

because ???


Because this was old code I posted. Sorry about that, I'm using his
code now, it's great.


Thanks a lot for the explanations. It makes sense. I think I'll just
get rid of the namespace; like you said, I don't need it.
 
