How to selectively get elements from an XML file?

niberhate

I thought this was an easy problem, but I have had a hard time figuring
out how to achieve it. For example, I have the following simple
piece of XML. I'd like to process it and remove every <SomeGrouping>
start tag and </SomeGrouping> end tag while keeping the content in
between, so that I get the XML shown below the input XML. I know I can
easily achieve this through Regex or string replacement, but I have
been wondering whether there is an XML API that would let me do this.

I came up with the following method by studying the XPath examples
given at MSDN, but it doesn't work as expected.

public static IEnumerable<XElement> XlinqXpathTest()
{
    var xmlReader = XmlReader.Create(new StreamReader("TheXmlFile.xml"));
    var root = XElement.Load(xmlReader);
    var elements = root.XPathSelectElements(".//MyElement");
    return elements;
}

Any suggestions? Thanks.


<!-- The input XML -->

<MyElement Url="some/folder/index.html">
  <SomeGrouping>
    <MyElement Url="some/other/folder/page1.html" />
    <MyElement Url="some/other/folder/page2.html" />
    <MyElement Url="some/other/folder/page3.html" />
    <MyElement Url="some/other/folder/page4.html" />
    <MyElement Url="some/other/folder/page5.html">
      <SomeGrouping>
        <MyElement Url="yet/some/other/folder/page6.html" />
      </SomeGrouping>
    </MyElement>
    <MyElement Url="other/folder/page1.html">
      <SomeGrouping>
        <MyElement Url="other/folder/folder1/page1.html" />
        <MyElement Url="other/folder/folder1/page2.html" />
        <MyElement Url="other/folder/folder1/page3.html" />
        <MyElement Url="other/folder/folder1/page4.html" />
        <MyElement Url="other/folder/folder1/page5.html" />
      </SomeGrouping>
    </MyElement>
    <MyElement Url="other/folder1/page1.html" />
    <MyElement Url="other/folder1/page2.html" />
    <MyElement Url="other/folder1/page3.html" />
    <MyElement Url="other/folder1/page4.html" />
    <MyElement Url="other/folder1/page5.html" />
  </SomeGrouping>
</MyElement>

<!-- Below is the output I want -->

<MyElement Url="some/folder/index.html">
  <MyElement Url="some/other/folder/page1.html" />
  <MyElement Url="some/other/folder/page2.html" />
  <MyElement Url="some/other/folder/page3.html" />
  <MyElement Url="some/other/folder/page4.html" />
  <MyElement Url="some/other/folder/page5.html">
    <MyElement Url="yet/some/other/folder/page6.html" />
  </MyElement>
  <MyElement Url="other/folder/page1.html">
    <MyElement Url="other/folder/folder1/page1.html" />
    <MyElement Url="other/folder/folder1/page2.html" />
    <MyElement Url="other/folder/folder1/page3.html" />
    <MyElement Url="other/folder/folder1/page4.html" />
    <MyElement Url="other/folder/folder1/page5.html" />
  </MyElement>
  <MyElement Url="other/folder1/page1.html" />
  <MyElement Url="other/folder1/page2.html" />
  <MyElement Url="other/folder1/page3.html" />
  <MyElement Url="other/folder1/page4.html" />
  <MyElement Url="other/folder1/page5.html" />
</MyElement>
 
Martin Honnen

For a recursive data structure it is best to use a recursive method, so
with LINQ to XML and C# you can write a method as follows (it doesn't
have to be static, of course; that is done here to keep testing simple):

static IEnumerable<XNode> Filter(XElement element, XName toStrip)
{
    if (element.Name == toStrip)
    {
        return element.Elements().SelectMany(child => Filter(child, toStrip));
    }
    else
    {
        return new XNode[] {new XElement(element.Name,
            element.Attributes(),
            from child in element.Elements() select Filter(child, toStrip))};
    }
}

and then use it like this, for example:


XDocument input = XDocument.Load("file.xml");

XDocument output = new XDocument(
    Filter(input.Root, "SomeGrouping")
);

output.Save("result.xml");
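
One thing to be aware of: Filter only walks Elements(), so text nodes and comments in the input are not copied to the result. That does not matter for the sample in the question. If it does matter for your data, a variant along the following lines might work. This is only a sketch; FilterNodes and the class name are hypothetical names chosen for illustration, and the inline XML in Main is made up for the demo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class StripGroupingKeepingNodes
{
    // Hypothetical variant of Filter: walks Nodes() instead of Elements(),
    // so text, comments and other non-element nodes are carried over too.
    public static IEnumerable<XNode> FilterNodes(XNode node, XName toStrip)
    {
        var element = node as XElement;
        if (element == null)
        {
            // Not an element: pass it through unchanged. LINQ to XML clones a
            // node that already has a parent when it is added to another tree.
            return new[] { node };
        }

        // Recursively filtered content of this element (evaluated lazily).
        var children = element.Nodes().SelectMany(child => FilterNodes(child, toStrip));

        if (element.Name == toStrip)
        {
            // Drop the wrapper element itself, keep what was inside it.
            return children;
        }

        // Rebuild the element with its attributes and filtered content.
        return new XNode[] { new XElement(element.Name, element.Attributes(), children) };
    }

    static void Main()
    {
        // Made-up sample input containing a comment and a text node.
        var input = XElement.Parse(
            "<MyElement Url=\"a\"><SomeGrouping><!-- note -->" +
            "<MyElement Url=\"b\">text</MyElement></SomeGrouping></MyElement>");

        var output = new XDocument(FilterNodes(input, "SomeGrouping"));
        Console.WriteLine(output);
    }
}
```

Like the original method, this builds a new tree and leaves the input untouched.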


XSLT is also a good choice for this kind of task. The following
stylesheet (which you can run with System.Xml.Xsl.XslCompiledTransform)
copies everything unchanged and simply strips all "SomeGrouping"
elements while keeping their content:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="SomeGrouping">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
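
To run the stylesheet from C#, something like the following sketch should do. It assumes the stylesheet is embedded as a string so the example is self-contained (loading it from a file works just as well). The class name, the Transform helper and the inline sample XML are hypothetical and only there for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

static class StripGroupingWithXslt
{
    // The stylesheet from above, embedded so the sketch needs no extra files.
    const string Stylesheet = @"<xsl:stylesheet version=""1.0""
    xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:template match=""@* | node()"">
    <xsl:copy>
      <xsl:apply-templates select=""@* | node()""/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""SomeGrouping"">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>";

    // Hypothetical helper: applies the stylesheet to an XML string
    // and returns the transformed XML as a string.
    public static string Transform(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var stylesheetReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(stylesheetReader);
        }

        var output = new StringWriter();
        using (var inputReader = XmlReader.Create(new StringReader(inputXml)))
        using (var writer = XmlWriter.Create(output, transform.OutputSettings))
        {
            transform.Transform(inputReader, writer);
        }
        return output.ToString();
    }

    static void Main()
    {
        // Made-up sample input with one wrapper element.
        Console.WriteLine(Transform(
            "<MyElement Url=\"a\"><SomeGrouping><MyElement Url=\"b\" /></SomeGrouping></MyElement>"));
    }
}
```

For files on disk, XslCompiledTransform also has Load and Transform overloads that take file names directly, which would shorten this further.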
 
Mr. Arnold

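If you can use LINQ to XML, maybe you can query the XML for the
SomeGrouping elements, replace each one with its own child nodes
(calling Remove on the element alone would delete the children as
well), and then save the document.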

<http://blogs.microsoft.co.il/blogs/...to-xml-adding-updating-and-deleting-data.aspx>
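
A minimal sketch of that in-place idea, assuming the wrapper elements should simply be replaced by their content. Unwrap and the class name are hypothetical names for illustration, and the inline XML is made up for the demo.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class StripGroupingInPlace
{
    // Replaces every descendant element named toStrip with its own child
    // nodes, modifying the tree under root in place.
    public static void Unwrap(XElement root, XName toStrip)
    {
        // Descendants() yields matches in document order, so Reverse() puts
        // nested wrappers before the wrappers that contain them. That order
        // matters: ReplaceWith inserts copies of the child nodes, so inner
        // wrappers have to be unwrapped before their ancestors are copied.
        // ToList() takes a snapshot so the tree can be changed in the loop.
        foreach (var wrapper in root.Descendants(toStrip).Reverse().ToList())
        {
            wrapper.ReplaceWith(wrapper.Nodes());
        }
    }

    static void Main()
    {
        // Made-up sample input with two nested wrapper levels.
        var root = XElement.Parse(
            "<MyElement Url=\"a\"><SomeGrouping><MyElement Url=\"b\">" +
            "<SomeGrouping><MyElement Url=\"c\" /></SomeGrouping>" +
            "</MyElement></SomeGrouping></MyElement>");

        Unwrap(root, "SomeGrouping");
        Console.WriteLine(root);
    }
}
```

Unlike an approach that builds a new tree, this changes the loaded document itself, so load it with XDocument.Load or XElement.Load, call Unwrap, and then call Save.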
 
niberhate

Thank you very much. It worked perfectly. I need more practice
with the XML APIs.
 
