xml parser and working with prefix, name and attributes.

J

jleslie48

Hello,

so I"m given an xml file of an excel spreadsheet that has 3 worksheets
in it on line three I open the element "Workbook" and then:


***********************************************************
<?xml version="1.0"?>
<?mso-application progID="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:blush:ffice:spreadsheet"
xmlns:blush:="urn:schemas-microsoft-com:blush:ffice:blush:ffice"
xmlns:x="urn:schemas-microsoft-com:blush:ffice:excel"
xmlns:ss="urn:schemas-microsoft-com:blush:ffice:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:blush:ffice:blush:ffice">
<LastAuthor>Ron Thorson</LastAuthor>
<LastPrinted>2010-03-15T17:43:03Z</LastPrinted>
<Created>2010-03-15T11:45:27Z</Created>
<LastSaved>2010-03-15T13:39:16Z</LastSaved>
<Version>1.0</Version>
</DocumentProperties>
<OfficeDocumentSettings xmlns="urn:schemas-microsoft-
com:blush:ffice:blush:ffice">
<DownloadComponents/>
<LocationOfComponents HRef="file:///E:\"/>
</OfficeDocumentSettings>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:blush:ffice:excel">
<WindowHeight>5280</WindowHeight>
<WindowWidth>9000</WindowWidth>

.... a whole lot of stuff and then ....

</Styles>
<Worksheet ss:Name="Messages">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="150"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="150"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="200"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="280"/>


.... a whole punch of stuff for the worksheet (messages)...

<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="UL-0">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>


..... now a bunch of stuff for worksheet (UL-0)...

<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
<Worksheet ss:Name="DL-0">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="200"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="150"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="100"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Row ss:StyleID="s33" ss:AutoFitHeight="0" ss:Height="14.25">
<Cell ss:StyleID="s33"><Data ss:Type="String">Date</Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Time</Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Type</Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Sub Type</Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Transaction</Data></
Cell>

.... now I'm looking at the data for worksheet "DL-0" this is the one I
want...


<Cell ss:StyleID="s85"><Data ss:Type="Number">0</Data></Cell>
<Cell ss:StyleID="s108"><Data ss:Type="Number">0.0</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">High</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">Primary</Data></Cell>
<Cell ss:StyleID="s85"><Data ss:Type="Number">6</Data></Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:blush:ffice:excel">
<PageSetup>
<Layout x:Orientation="Landscape" x:CenterHorizontal="1"/>
<Header x:Margin="0.19"/>
<Footer x:Data="&amp;R&amp;D &amp;T"/>
<PageMargins x:Left="0.24" x:Right="0.16" x:Top="0.38"/>
</PageSetup>
<Selected/>
<DoNotDisplayGridlines/>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>

************************************************************


that is the end of the file.

I want to process the data in the worksheet "DL-0" only. I want to
make an answer set of the rows of data of that worksheet pulling out
specific columns, say the Date column (that is the first one) and the
time (the second column) and specific ones of the 3-12 (although I'd
be happy with a 2D array of all the rows and columns. )

Can someone show me and explain the naming sequence for either
GetElementsByTagName, or
foreach (XmlElement xxx in doc.SelectNodes(???) or
some other methodology?

I see there is something with the AT sign (@) that I think makes a
selected answer set, sorry, I'm new at this.
 
J

jleslie48

jleslie48 said:
[...]
Can someone show me and explain the naming sequence for either
GetElementsByTagName, or
foreach (XmlElement xxx in doc.SelectNodes(???)   or
some other methodology?

Your question is very vague.  Also, you posted way too much sample data
and way too little sample code.

What "naming sequence" are you talking about?  Are you asking about
XPath?  If so, you might find the specification useful:http://www.w3.org/TR/xpath/

If you're asking about something else, you should try to find a way to
ask your question in a more clear way:

   • You should include a concise-but-complete code example that shows
what you're trying to do, what you have working now, and what isn't
working for you.

   • You should provide only enough sample data to show precisely the
specific problem you're having.  Don't expect people to read through a
couple of pages of XML to try to figure out what part of that XML it is
you're having trouble with.

Pete

Sorry, I'll try and be more precise.

the XML file has the following:


<Workbook xmlns="urn:schemas-microsoft-com:blush:ffice:spreadsheet"
xmlns:blush:="urn:schemas-microsoft-com:blush:ffice:blush:ffice"
xmlns:x="urn:schemas-microsoft-com:blush:ffice:excel"
xmlns:ss="urn:schemas-microsoft-
com:blush:ffice:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Worksheet ss:Name="Messages">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Row ss:StyleID="s33" ss:AutoFitHeight="0" ss:Height="14.25">
<Cell ss:StyleID="s33"><Data ss:Type="String">Start Time</
Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Stop Time</
Data></Cell>
...
</Row>
</Table>
</Worksheet>

<Worksheet ss:Name="UL-0">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
...
<Row ss:StyleID="s33" ss:AutoFitHeight="0" ss:Height="14.25">
<Cell ss:StyleID="s33"><Data ss:Type="String">Date</
Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Time</
Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Type</
Data></Cell>
...
</Row>
<Row ss:AutoFitHeight="1" ss:StyleID="Default">
<Cell ss:StyleID="s32"><Data
ss:Type="String">2010.04.29</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">04:56:34</
Data></Cell>
...
</Row>
<Row ss:AutoFitHeight="1" ss:StyleID="Default">
<Cell ss:StyleID="s32"><Data
ss:Type="String">2010.04.29</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">04:56:34</
Data></Cell>
...
</Row>
... (lots more rows) ...
</Table>
</Worksheet>

<Worksheet ss:Name="DL-0">
<Table ss:StyleID="s22" ss:DefaultColumnWidth="42"
ss:DefaultRowHeight="11.25">
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
<Column ss:StyleID="s22" ss:AutoFitWidth="0" ss:Width="65"/>
...
<Row ss:StyleID="s33" ss:AutoFitHeight="0" ss:Height="14.25">
<Cell ss:StyleID="s33"><Data ss:Type="String">Date</
Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Time</
Data></Cell>
<Cell ss:StyleID="s33"><Data ss:Type="String">Type</
Data></Cell>
...
</Row>
<Row ss:AutoFitHeight="1" ss:StyleID="Default">
<Cell ss:StyleID="s32"><Data
ss:Type="String">2010.04.29</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">04:56:34</
Data></Cell>
...
</Row>
<Row ss:AutoFitHeight="1" ss:StyleID="Default">
<Cell ss:StyleID="s32"><Data
ss:Type="String">2010.04.29</Data></Cell>
<Cell ss:StyleID="s32"><Data ss:Type="String">04:56:34</
Data></Cell>
...
</Row>
... (lots more rows) ...

</Table>
</Worksheet>
</Workbook>


I want all to make a 2d array of rows and columns (cells) for all the
rows of the <worksheet labeled: ss:Name="DL-0"

I think that involves XPath, but I'm not familar with xml enough to
know. My boss said it was something like, "/worksheet@ss:Name="DL-0"/
Cell/Data" to make an answer set, but then he did a lot of hand waving
and said to go look it up.

I looked at that link you sent, and its a good reference for the xml
language, it even has the grammar, but its not helping with the
semantics of xml.

I was looking in C# for something like (pseudo code:)

ans_set = XmlParser.findall("/worksheet@ss:Name="DL-0"/Cell/Data" )

where all the contents of the data element are listed by there row:

printf(" row 3, column 6 of the worksheet dl-0 is:%s\n", ans_set[2]
[5].ToString);

again, sorry for the pseudo code.
 
J

jleslie48

jleslie48 said:
[...]
I want all to make a 2d array of rows and columns (cells) for all the
rows of the <worksheet labeled: ss:Name="DL-0"
I think that involves XPath, but I'm not familar with xml enough to
know.  My boss said it was something like, "/worksheet@ss:Name="DL-0"/
Cell/Data" to make an answer set, but then he did a lot of hand waving
and said to go look it up.

Sounds like your boss doesn't know either.

You probably want something like "/worksheet[@ss:Name='DL-0']", which is
an XPath expression that refers to all worksheet elements with the
attribute "ss:Name" equal to "DL-0".
I looked at that link you sent, and its a good reference for the xml
language, it even has the grammar, but its not helping with the
semantics of xml.

You should have been able to come up with the right XPath expression
based on reading that documentation.
I was looking in C# for something like (pseudo code:)
ans_set = XmlParser.findall("/worksheet@ss:Name="DL-0"/Cell/Data" )

What's "XmlParser"?

Both the System.Xml and System.Xml.Linq XML-handling APIs support XPath,
so you can use syntax like the above to refer to specific data within
the XML document.  The exact means of accessing that data is different
between the .NET namespaces.

Also, because your XML has namespace-qualified attribute names, you'll
have to pay close attention to how to provide fully-qualified names in
the API you're using (see XName and XNamespace for System.Xml.Linq…if
you're using the XmlDocument or XmlReader API, the XmlNamespaceManager
class can help you define namespaces used in the XPath expressions).

It's helpful to keep in mind that even if you see the letters "ss" in
your XML, namespace declarations within the XML cause that "ss" to
expand to the actual namespace URI.  It's that URI that's meaningful in
the context of naming the XML elements and attributes, not the "ss"
itself (but you can use XName/XNamespace or the XmlNamespaceManager to
handle the translation for you, as appropriate).
where all the contents of the data element are listed by there row:
printf(" row 3, column 6 of the worksheet dl-0 is:%s\n", ans_set[2]
[5].ToString);
again, sorry for the pseudo code.

Here are some articles you may find helpful:

http://www.yoda.arachsys.com/csharp...10/08/29/writing-the-perf...http://sscce.org/

Pete

thanks pete, and yes my boss doesn't know either, that's what the hand
waiving is. I'll keep digging.
 
J

jleslie48

jleslie48 said:
[...]
I want all to make a 2d array of rows and columns (cells) for all the
rows of the <worksheet labeled: ss:Name="DL-0"
I think that involves XPath, but I'm not familar with xml enough to
know.  My boss said it was something like, "/worksheet@ss:Name="DL-0"/
Cell/Data" to make an answer set, but then he did a lot of hand waving
and said to go look it up.

Sounds like your boss doesn't know either.

You probably want something like "/worksheet[@ss:Name='DL-0']", which is
an XPath expression that refers to all worksheet elements with the
attribute "ss:Name" equal to "DL-0".
I looked at that link you sent, and its a good reference for the xml
language, it even has the grammar, but its not helping with the
semantics of xml.

You should have been able to come up with the right XPath expression
based on reading that documentation.
I was looking in C# for something like (pseudo code:)
ans_set = XmlParser.findall("/worksheet@ss:Name="DL-0"/Cell/Data" )

What's "XmlParser"?

Both the System.Xml and System.Xml.Linq XML-handling APIs support XPath,
so you can use syntax like the above to refer to specific data within
the XML document.  The exact means of accessing that data is different
between the .NET namespaces.

Also, because your XML has namespace-qualified attribute names, you'll
have to pay close attention to how to provide fully-qualified names in
the API you're using (see XName and XNamespace for System.Xml.Linq…if
you're using the XmlDocument or XmlReader API, the XmlNamespaceManager
class can help you define namespaces used in the XPath expressions).

It's helpful to keep in mind that even if you see the letters "ss" in
your XML, namespace declarations within the XML cause that "ss" to
expand to the actual namespace URI.  It's that URI that's meaningful in
the context of naming the XML elements and attributes, not the "ss"
itself (but you can use XName/XNamespace or the XmlNamespaceManager to
handle the translation for you, as appropriate).
where all the contents of the data element are listed by there row:
printf(" row 3, column 6 of the worksheet dl-0 is:%s\n", ans_set[2]
[5].ToString);
again, sorry for the pseudo code.

Here are some articles you may find helpful:

http://www.yoda.arachsys.com/csharp...10/08/29/writing-the-perf...http://sscce.org/

Pete

Ok, thanks gain.

I've got something working:

/**********************************
try
{
XmlDocument xDoc = new XmlDocument();
//xDoc.Load("sampleXML.xml"); //File is located in the "\bin
\Debug" folder
xDoc.Load("050335.xml"); //File is located in the "\bin\Debug"
folder


XmlNode rootNode = xDoc.DocumentElement;

string ssNamespacePrefix = "ss";
string ssNamespaceURI =
rootNode.GetNamespaceOfPrefix(ssNamespacePrefix);
int row_count;

XPathNavigator xpathNav = xDoc.CreateNavigator();

XmlNamespaceManager namespaceManager = new
XmlNamespaceManager(xDoc.NameTable);
namespaceManager.AddNamespace(ssNamespacePrefix, ssNamespaceURI);

string workSheetXPath = string.Format(
"//{0}:Worksheet[@{0}:Name='{1}']/{0}:Table/
{0}:Row/{0}:Cell/{0}:Data"
, ssNamespacePrefix, "DL-0");

Console.WriteLine(workSheetXPath);

XPathNodeIterator itr = xpathNav.Select(workSheetXPath,
namespaceManager);

row_count = 0;
while (itr.MoveNext()) {
row_count++;
Console.WriteLine(row_count + ":" + itr.Current.Value);
}//while


*********************************/


the variable workSheetXPath

gets set to:
//ss:Worksheet[@ss:Name='DL-0']/ss:Table/ss:Row/ss:Cell/ss:Data

and I get 480 lines of the data fields.

if I change workSheetXPath to:
//ss:Worksheet[@ss:Name='DL-0']/ss:Table/ss:Row

I get 40 lines where each line is the entire row:
1:DateTimeTypeSub TypeTransactionRSLBERTower-RSLTower-
BERBandTowerTimeSlot
2:2010.04.2905:03:35PAGING COMMAND (CCh) T=000.000.0HighTertiary0
3:2010.04.2905:03:35PAGING COMMAND (CCh) T=000.000.0HighSecondary0

now I can determine that each row has 12 columns (480/40 == 12) but
that doesn't seem
like a very efficient way to do that. How would I go about
determining the number of columns?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top