XML question on elements, children, and children of children...

SharpieNewbie

Hi all, I have a newbie question that I can't find a good reference for
online or in the books I've seen.

If I have an XML file that I've created, which looks something like
this:

<Movies>
<Movie>
<Title>Lord Of The Rings</Title>
<Cast>
<Actor>Orlando Bloom</Actor>
<Actor>Sean Bean</Actor>
</Cast>
<Genres>
<Genre>Fantasy</Genre>
<Genre>Action</Genre>
</Genres>
</Movie>
<Movie>
<Title>Harry Potter</Title>
<Cast>
<Actor>Daniel Radcliffe</Actor>
</Cast>
<Genres>
<Genre>Fantasy</Genre>
<Genre>Action</Genre>
</Genres>
</Movie>
</Movies>

I have written some code that can query and display each
movie's actors, but I'm confused about how to move "back" up a node
and then display the genres of the movie.

I think I might need to use XmlNodeList or
XmlDocument.GetElementsByTagName together with InnerText, but there
might be an easier way that some of the kind and experienced folks here
can point me to?
Thanks
 
Marc Gravell

i'm confused as to how to move "back" up a node
and go into displaying the genres of the movie?

There are always 2 ways to skin a cat with XmlDocument - you can walk
the nodes manually (ParentNode, NextSibling, etc), or you can use XPath.

An example is given below; if that doesn't help, can you please clarify
what you are trying to do? Perhaps post some sample code...

Marc

XmlDocument doc = new XmlDocument();
doc.LoadXml(xml);
foreach (XmlElement el in doc.SelectNodes("/Movies/Movie/Genres/Genre"))
{
    // Walk up manually: Genre -> Genres -> Movie, then down to Title.
    string title = el.ParentNode.ParentNode.SelectSingleNode("Title").InnerText;
    // The same lookup with XPath: ".." selects the parent node.
    string altTitle = el.SelectSingleNode("../../Title").InnerText;
    Console.WriteLine("{0}: {1}/{2}", el.InnerText, title, altTitle);
}
 
SharpieNewbie

Marc,

Thanks for the reply, and thanks for helping a newbie..

Here's my prior working code:
using System;
using System.Xml;
using System.Text;
using System.IO;

namespace TestProg
{
    class Program
    {
        private static void DisplayHelp()
        {
            Console.Clear();
            Console.WriteLine("Usage: The app requires a Xmlfile that contains the metadata for each movie.");
            Console.WriteLine("");
            Console.WriteLine("Press any key to exit.");
            Console.ReadLine();
        }

        static void Main(string[] args)
        {
            XmlDocument doc = new XmlDocument();
            doc.Load("XMLFile2.xml");

            XmlNodeList movieList = doc.GetElementsByTagName("Movie");

            foreach (XmlNode node in movieList)
            {
                XmlElement movieElement = (XmlElement)node;

                string title = movieElement.GetElementsByTagName("Title")[0].InnerText;
                string actor = movieElement.GetElementsByTagName("Actor")[0].InnerText;
                string genre = movieElement.GetElementsByTagName("Genre")[0].InnerText;

                Console.WriteLine("{0} is performed by {1} and is in the genre of {2}\n", title, actor, genre);
            }

            Console.ReadLine();
        }
    }
}

And the XML that works with it is
<?xml version="1.0" encoding="utf-8" ?>
<Movies>
<Movie>
<Title>Lord Of The Rings</Title>
<Actor>Orlando Bloom</Actor>
<Genre>Fantasy</Genre>
</Movie>
<Movie>
<Title>Harry Potter</Title>
<Actor>Daniel Radcliffe</Actor>
<Genre>Fantasy</Genre>
</Movie>
</Movies>

Even though it works, I suspect that the repeated GetElementsByTagName
calls make it an expensive way of doing it. If there's a more
elegant way, I don't mind using it instead.

I find that if I start to add additional actors or genres to the Movies
XML, like the original XML example (shown again below), my attempts
to code for that go haywire.

<?xml version="1.0" encoding="utf-8" ?>
<Movies>
<Movie>
<Title>Lord Of The Rings</Title>
<Cast>
<Actor>Orlando Bloom</Actor>
<Actor>Sean Bean</Actor>
</Cast>
<Genres>
<Genre>Fantasy</Genre>
</Genres>
</Movie>
<Movie>
<Title>Harry Potter</Title>
<Cast>
<Actor>Daniel Radcliffe</Actor>
</Cast>
<Genres>
<Genre>Fantasy</Genre>
<Genre>Magic</Genre>
</Genres>
</Movie>
</Movies>


This code produces my expected output, but I think it can be
refactored into something more flexible. There are two obviously
near-identical blocks of code below that parse the actors and the
genres. I think I need to turn them into a more generic method that can
handle any future element tags I might add to the XML.

static void Main(string[] args)
{
    XmlDocument doc = new XmlDocument();
    doc.Load("XMLFile2.xml");

    XmlNodeList movieList = doc.GetElementsByTagName("Movie");

    foreach (XmlNode node in movieList)
    {
        XmlElement movieElement = (XmlElement)node;

        string title = movieElement.GetElementsByTagName("Title")[0].InnerText;
        string actor = "";
        string genre = "";
        int actorCount = movieElement.GetElementsByTagName("Actor").Count;

        //section to look for the actors in the movie
        if (movieElement.GetElementsByTagName("Actor").Count != 1)
        {
            for (int i = 0; i < actorCount; i++)
            {
                actor = movieElement.GetElementsByTagName("Actor")[i].InnerText + ", " + actor;
            }
            //need a way to take away the last comma too
            //newbie way..
            actor = actor.Substring(0, actor.Length - 2);
        }
        else
        {
            actor = movieElement.GetElementsByTagName("Actor")[0].InnerText;
        }

        //section for the genres
        int genreCount = movieElement.GetElementsByTagName("Genre").Count;

        //section to look for the genres in the movie
        if (movieElement.GetElementsByTagName("Genre").Count != 1)
        {
            for (int i = 0; i < genreCount; i++)
            {
                genre = movieElement.GetElementsByTagName("Genre")[i].InnerText + ", " + genre;
            }
            //need a way to take away the last comma too
            //newbie way..
            genre = genre.Substring(0, genre.Length - 2);
        }
        else
        {
            genre = movieElement.GetElementsByTagName("Genre")[0].InnerText;
        }

        Console.WriteLine("{0} is performed by {1} and is in the genre of {2}\n", title, actor, genre);
    }
    Console.ReadLine();
}

Any comments will be much appreciated..

thanks!
 
Stephany Young

For your first example, another way of 'skinning the cat' could be:

XmlDocument doc = new XmlDocument();

doc.Load("XMLFile2.xml");

// FirstChild and NextSibling are properties, not methods.
XmlNode movie = doc.DocumentElement.FirstChild;

while (movie != null)
{
    // Declared once per movie and initialised so they are definitely assigned.
    string title = "", actor = "", genre = "";

    XmlNode info = movie.FirstChild;
    while (info != null)
    {
        // Dispatch on the element name of the current child.
        switch (info.Name)
        {
            case "Title":
                title = info.InnerText;
                break;
            case "Actor":
                actor = info.InnerText;
                break;
            case "Genre":
                genre = info.InnerText;
                break;
            default:
                break;
        }
        info = info.NextSibling;
    }
    Console.WriteLine("{0} is performed by {1} and is in the genre of {2}\n", title, actor, genre);
    movie = movie.NextSibling;
}

For your second example you can use the same technique to deal with those
elements that can have multiple child elements:

// Needs: using System.Collections.Generic; (for List<string>)
XmlDocument doc = new XmlDocument();

doc.Load("XMLFile2.xml");

XmlNode movie = doc.DocumentElement.FirstChild;

while (movie != null)
{
    string title = movie.SelectSingleNode("Title").InnerText;
    List<string> actors = new List<string>();
    List<string> genres = new List<string>();

    // Collect every child of <Cast>.
    XmlNode node = movie.SelectSingleNode("Cast").FirstChild;
    while (node != null)
    {
        actors.Add(node.InnerText);
        node = node.NextSibling;
    }

    // Collect every child of <Genres>.
    node = movie.SelectSingleNode("Genres").FirstChild;
    while (node != null)
    {
        genres.Add(node.InnerText);
        node = node.NextSibling;
    }

    Console.WriteLine("{0} is performed by {1} and is in the genre of {2}\n",
        title, string.Join(", ", actors.ToArray()), string.Join(", ", genres.ToArray()));
    movie = movie.NextSibling;
}

Note that this demonstrates that there is no real need for index counters,
nor is there any need to manually build a comma-delimited string and then
trim extraneous delimiters.

Note also the use of List<string> to avoid any need for 'pre-sizing' arrays.

Pick the fish-bones out of that and adapt it for your needs.


 
Marc Gravell

You should really be using xpath expressions rather than
GetElementsByTagName, and reuse results where possible. Additionally,
string concatenation in a loop should ideally use StringBuilder (I can
explain why if you like).
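
As a rough illustration of the StringBuilder point: .NET strings are immutable, so each += inside a loop allocates a new string and copies everything accumulated so far, whereas StringBuilder appends into a growable buffer. The class and method names below (JoinDemo, JoinWithConcat, JoinWithBuilder) are made up for this sketch and do not come from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class JoinDemo
{
    // Naive version: every += creates a new string and copies the old
    // contents, so the total copying grows with the square of the input.
    public static string JoinWithConcat(IEnumerable<string> items, string delimiter)
    {
        string result = "";
        foreach (string item in items)
        {
            if (result.Length > 0) result += delimiter;
            result += item;
        }
        return result;
    }

    // StringBuilder version: appends into one buffer and creates the
    // final string once, in ToString().
    public static string JoinWithBuilder(IEnumerable<string> items, string delimiter)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string item in items)
        {
            if (sb.Length > 0) sb.Append(delimiter);
            sb.Append(item);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string[] actors = { "Orlando Bloom", "Sean Bean" };
        // Both calls produce the same text; only the allocation pattern differs.
        Console.WriteLine(JoinWithConcat(actors, ", "));
        Console.WriteLine(JoinWithBuilder(actors, ", "));
    }
}
```

For two or three actors the difference is negligible; it only starts to matter when the loop runs many times.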

One advantage of xpath here is that it avoids conflicts if you later
decide to add, say:

<Extras>
<Actor>Foo</Actor>
<Actor>Bar</Actor>
</Extras>

With xpath, you can easily disambiguate between Cast/Actor and
Extras/Actor, while GetElementsByTagName("Actor") will return
both; perhaps a bigger issue is that GetElementsByTagName gets
tricky if you start using xml-namespaces. The other advantage is
that you can perform richer queries, e.g. if the nodes were:

<Cast>
<Actor Name="Orlando Bloom" Role="Legolas"/>
<Actor Name="Sean Bean" Role="Boromir"/>
</Cast>

(of course, you might want to think about how

Then you can query this with xpath "Cast/Actor/@Name" - but to
do the same with GetElementsByTagName needs extra work. You can
also use xpath to perform filters (both simple and complex),
unions, etc - which you'd have to do manually with
GetElementsByTagName - e.g.

string credits = ConcatenateInnerText(movie, "@Producer | @Director | Cast/Actor", ", ");
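
A small sketch of the queries described above, run against a made-up document. The XML string, the Director attribute and the XPathDemo/LoadMovie names are assumptions for illustration only; the XmlDocument calls themselves (SelectNodes, SelectSingleNode, GetElementsByTagName) are the standard System.Xml ones.

```csharp
using System;
using System.Xml;

public static class XPathDemo
{
    // Hypothetical sample data combining the attribute-style Actor
    // elements with an Extras section.
    public const string Xml = @"<Movies>
  <Movie Director='Peter Jackson'>
    <Title>Lord Of The Rings</Title>
    <Cast>
      <Actor Name='Orlando Bloom' Role='Legolas'/>
      <Actor Name='Sean Bean' Role='Boromir'/>
    </Cast>
    <Extras>
      <Actor Name='Foo'/>
    </Extras>
  </Movie>
</Movies>";

    public static XmlElement LoadMovie()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        return (XmlElement)doc.SelectSingleNode("/Movies/Movie");
    }

    public static void Main()
    {
        XmlElement movie = LoadMovie();

        // Tag-name lookup finds every Actor below the movie, cast and extras alike.
        Console.WriteLine(movie.GetElementsByTagName("Actor").Count);

        // A relative path only matches Actor elements directly under Cast.
        Console.WriteLine(movie.SelectNodes("Cast/Actor").Count);

        // Attribute step: selects the Name attribute nodes themselves.
        foreach (XmlNode name in movie.SelectNodes("Cast/Actor/@Name"))
        {
            Console.WriteLine(name.Value);
        }

        // Predicate filter: the actor whose Role attribute is 'Boromir'.
        XmlNode boromir = movie.SelectSingleNode("Cast/Actor[@Role='Boromir']/@Name");
        Console.WriteLine(boromir.Value);

        // Union: the movie's Director attribute plus the cast names.
        Console.WriteLine(movie.SelectNodes("@Director | Cast/Actor/@Name").Count);
    }
}
```

With this sample, the tag-name lookup counts three Actor elements while "Cast/Actor" counts two, which is the ambiguity the path expression avoids.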

One other observation: it is by no means mandatory, but you could
consider using attributes for singleton values (as I did already
for Actor/@Name and Actor/@Role):

<Movie Title="Lord Of The Rings">
<... the rest as before ...>
</Movie>

This enforces the [0,1] cardinality (an element cannot carry the same
attribute twice), and also makes querying the value slightly easier
[this is not the main point] - so with the example code below, this
would be:

string title = movie.GetAttribute("Title");

Of course, while this is probably valid for the Title (excepting those
cases where it gets renamed by locale - "California Man", for example),
you might want to think about whether @Role is truly a singleton - I'm
thinking of some (terrible) Eddie Murphy films...
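
To make the attribute idea concrete, here is a minimal sketch. The class and method names (TitleAttributeDemo, ReadTitle, IsWellFormed) are invented for the example. It relies on two documented behaviours: XmlElement.GetAttribute returns an empty string when the attribute is missing, and an element that repeats an attribute name is not well-formed XML, so LoadXml rejects it.

```csharp
using System;
using System.Xml;

public static class TitleAttributeDemo
{
    // Reads the Title attribute of the root element.
    // GetAttribute returns "" (not null) when the attribute is absent.
    public static string ReadTitle(string movieXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(movieXml);
        return doc.DocumentElement.GetAttribute("Title");
    }

    // Returns false when the parser rejects the text, for example
    // because an element carries the same attribute twice.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadTitle("<Movie Title=\"Lord Of The Rings\" />"));
        Console.WriteLine(ReadTitle("<Movie />").Length);
        Console.WriteLine(IsWellFormed("<Movie Title=\"A\" Title=\"B\" />"));
    }
}
```

Child elements carry no such restriction, so nothing stops a file from containing two Title elements under one Movie unless a schema says otherwise.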

A final point; it sounds petty, but the "actor" and "genre" variables
can clearly represent multiple values; as such, I would name them
in the plural. It won't change anything except how you think about
them - but this might be enough to prevent a silly bug (by assuming
that they are singular). This is still enough to warrant an
extra keypress.

(the example below uses your original xml)

Marc

### Example ###

// Needs: using System; using System.Text; using System.Xml;
static void Main()
{
    XmlDocument doc = new XmlDocument();
    doc.LoadXml(YOUR_XML);
    foreach (XmlElement movie in doc.SelectNodes("Movies/Movie"))
    {
        string title = movie.SelectSingleNode("Title").InnerText;
        string actors = ConcatenateInnerText(movie, "Cast/Actor", ", ");
        string genres = ConcatenateInnerText(movie, "Genres/Genre", ", ");
        Console.WriteLine("{0} is performed by {1} and is in the genre of {2}",
            title, actors, genres);
    }
}

// Selects nodes relative to "node" with the given xpath and joins their text.
static string ConcatenateInnerText(XmlNode node, string xpath, string delimiter)
{
    if (node == null) throw new ArgumentNullException("node");
    return ConcatenateInnerText(node.SelectNodes(xpath), delimiter);
}

// Joins the non-empty InnerText of each node, separated by "delimiter".
static string ConcatenateInnerText(XmlNodeList nodes, string delimiter)
{
    if (nodes == null) throw new ArgumentNullException("nodes");
    if (delimiter == null) throw new ArgumentNullException("delimiter");
    if (nodes.Count == 0) return "";
    StringBuilder sb = new StringBuilder();
    foreach (XmlNode node in nodes)
    {
        string text = node.InnerText;
        if (string.IsNullOrEmpty(text)) continue;
        // Only add the delimiter between items, never before the first one.
        if (sb.Length > 0) { sb.Append(delimiter); }
        sb.Append(text);
    }
    return sb.ToString();
}
 
SharpieNewbie

Thanks Stephany and Marc.

I have been reading up on the methods that both of you have provided,
and I'm learning and testing them out so I can build it in the ways
you've recommended. I need to understand how these examples work. I
appreciate the samples and the replies!

Cheers
