Trying to get just part of an XML file?

E

Ed from AZ

This is posted to both the Word and Excel VBA NG because so far that's
the only way I can figure out to do this. But wait - I throw in FSO
to make it even messier! Using Word and Excel 2007.

We have a proprietary program that saves its working files as XML.
When a user needs to add a list of data, he's presented with either a
grid format that's a pain to work with, or a single-form-per-item page
that's easier to use but wearying after 35+ items. What I want to do
is get the part of the XML to be revised into an Excel worksheet so
the user can work easier. Because this program is proprietary, the
schema is not available to me, and since I'm not familiar with XML
except as text separated by tags, I wouldn't know how to use it.

So my bright idea so far is to open the file for reading in FSO, then
paste that into a new Word doc. This gives me a text file with XML
tags, but without Word's interpretation of the XML. Direct import
into Excel causes Excel to interpret the tags and buries the
information I want to get. Parsing the text in Word, I can set ranges
using the different tags and work my way down to the info. In my
current plan, I'm thinking of driving this from Excel.

The data is set up like so:
<MAIN_CATEGORY>
<DATA_PAGE num="0001">
-- page info --
<GRID_LIST>
<LINE num="0001">
-- line info --
<LINE num="0002">
-- line info --
etc.

The data I want is between the <LINE> tags.

I could get all the data I need by looping through and reading each
data point from <LINE> to </LINE> into an array, then writing that
back into Excel. Or I could simply capture each <GRID_LIST> into a
range, save it out as a separate temp XML file, then import that into
Excel.

Is this really the best way to do this?? Or is there a much easier
way? Any recommendations?
Ed
 
P

Peter Jamieson

I would probably try a different approach to this, although I cannot
provide you with useful detail or code:
a. use Excel VBA and MSXML to extract the nodes and data from the XML.
You don't need a schema to do that. That should save you having to do
your own parsing, and you won't need to involve Word.
b. optionally, if it makes sense to do so, construct an XSLT transform
to transform your XML structure into something simpler to deal with, and
use MSXML to perform the transform. The impression I get is that your
XML is simple enough that you would not need that, which would be handy
because XSLT can be hard work if you are starting from scratch.

Then if I couldn't make that work I'd probably fall back to an approach
that was either similar to yours, except that right now I don't see why
it would be easier to parse the XML in a Word document rather than doing
it directly with Excel VBA.


Peter Jamieson

http://tips.pjmsn.me.uk
 
C

Chip Pearson

You can read the XML file directly using the MSXML library. In VBA,
go to the Tools menu, choose References, and scroll down to "Microsoft
XML, v6.0" and check that item. The code uses some relatively simple
XPath to select the nodes, so I assume that you (1) already know
XPath, or (2) can figure it out, or (3) can ask for assistance with
XPath.

Without know the actual layout of your XML, I just guessed and made up
the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<MAIN_CATEGORY>
<DATA_PAGE num="001">
<GRID_LIST>
<LINE num="001">
<name>Chip Pearson</name>
<city>Kansas City</city>
</LINE>
<LINE num="002">
<name>Jen Filson</name>
<city>Mission Hills</city>
</LINE>
</GRID_LIST>
</DATA_PAGE>
<DATA_PAGE num="002">
<GRID_LIST>
<LINE num="003">
<name>Janice Downey</name>
<city>Westwood Hills</city>
</LINE>
<LINE num="04">
<name>Mike Stitt</name>
<city>Charlotte</city>
</LINE>
</GRID_LIST>
</DATA_PAGE>
</MAIN_CATEGORY>

Then, you can use code like


Sub AAA()
' Required reference:
' Name: MSXML2
' Description: Microsoft XML, v6.0
' Typical location: C:\Windows\System32\msxml6.dll
' GUID: {F5078F18-C551-11D3-89B9-0000F81FE221}
' Major: 6 Minor: 0

Dim DOM As MSXML2.DOMDocument60
Dim FName As Variant
Dim DataPageList As MSXML2.IXMLDOMNodeList
Dim DataPage As MSXML2.IXMLDOMNodeList
Dim GridList As MSXML2.IXMLDOMNodeList
Dim Grid As MSXML2.IXMLDOMNodeList
Dim LinesList As MSXML2.IXMLDOMNodeList
Dim LineX As MSXML2.IXMLDOMNode
Dim Arr() As String
Dim N As Long
Dim M As Long
Dim P As Long
Dim K As Long
Dim R As Range
Dim J As Long

Set R = Range("C3") '<<<<< OUTPUT ON WORKSHEET
FName = Application.GetOpenFilename("XML Files (*.xml),*.xml")
If FName = False Then
Exit Sub
End If
Set DOM = New MSXML2.DOMDocument60
DOM.Load CStr(FName)
DOM.setProperty "SelectionLanguage", "XPath"

' get all data pages
Set DataPageList = DOM.SelectNodes("/MAIN_CATEGORY/DATA_PAGE")
For N = 0 To DataPageList.Length - 1
Set GridList = DataPageList.Item(N).SelectNodes("./GRID_LIST")
For M = 0 To GridList.Length - 1
Set LinesList = GridList.Item(M).SelectNodes("./LINE")
ReDim Arr(0 To LinesList.Length - 1)
K = 0
For P = 0 To LinesList.Length - 1
Set LineX = LinesList.Item(P)
Arr(K) = LineX.ChildNodes(0).Text & vbCrLf & _
LineX.ChildNodes(1).Text
K = K + 1
Next P
'<< write to worksheet
For J = 0 To UBound(Arr)
R.Offset(0, J).Value = Arr(J)
Next J
Set R = R.Offset(1, 0)
Next M
Next N
End Sub

This lists the contents between the <LINE> and </LINE> tags out to the
worksheet. You'll have to modfy the code a bit to conform with your
actual data structure, but the code will be very muich like that shown
above.

Cordially,
Chip Pearson
Microsoft MVP 1998 - 2010
Pearson Software Consulting, LLC
www.cpearson.com
[email on web site]
 
E

Ed from AZ

Wow!! I didn't even know this existed! Thank you!!

I'll holler back if things start smoking!
Ed


You can read the XML file directly using the MSXML library.  In VBA,
go to the Tools menu, choose References, and scroll down to "Microsoft
XML, v6.0" and check that item.  The code uses some relatively simple
XPath to select the nodes, so I assume that you (1) already know
XPath, or (2) can figure it out, or (3) can ask for assistance with
XPath.

Without know the actual layout of your XML, I just guessed and made up
the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<MAIN_CATEGORY>
        <DATA_PAGE num="001">
                <GRID_LIST>
                        <LINE num="001">
                                <name>Chip Pearson</name>
                                <city>Kansas City</city>
                        </LINE>
                        <LINE num="002">
                                <name>JenFilson</name>
                                <city>Mission Hills</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
        <DATA_PAGE num="002">
                <GRID_LIST>
                        <LINE num="003">
                                <name>Janice Downey</name>
                                <city>Westwood Hills</city>
                        </LINE>
                        <LINE num="04">
                                <name>Mike Stitt</name>
                                <city>Charlotte</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
</MAIN_CATEGORY>

Then, you can use code like

Sub AAA()
' Required reference:
'   Name: MSXML2
'   Description: Microsoft XML, v6.0
'   Typical location: C:\Windows\System32\msxml6.dll
'   GUID: {F5078F18-C551-11D3-89B9-0000F81FE221}
'   Major: 6    Minor: 0

Dim DOM As MSXML2.DOMDocument60
Dim FName As Variant
Dim DataPageList As MSXML2.IXMLDOMNodeList
Dim DataPage As MSXML2.IXMLDOMNodeList
Dim GridList As MSXML2.IXMLDOMNodeList
Dim Grid As MSXML2.IXMLDOMNodeList
Dim LinesList As MSXML2.IXMLDOMNodeList
Dim LineX As MSXML2.IXMLDOMNode
Dim Arr() As String
Dim N As Long
Dim M As Long
Dim P As Long
Dim K As Long
Dim R As Range
Dim J As Long

Set R = Range("C3") '<<<<< OUTPUT ON WORKSHEET
FName = Application.GetOpenFilename("XML Files (*.xml),*.xml")
If FName = False Then
    Exit Sub
End If
Set DOM = New MSXML2.DOMDocument60
DOM.Load CStr(FName)
DOM.setProperty "SelectionLanguage", "XPath"

' get all data pages
Set DataPageList = DOM.SelectNodes("/MAIN_CATEGORY/DATA_PAGE")
For N = 0 To DataPageList.Length - 1
    Set GridList = DataPageList.Item(N).SelectNodes("./GRID_LIST")
    For M = 0 To GridList.Length - 1
        Set LinesList = GridList.Item(M).SelectNodes("./LINE")
        ReDim Arr(0 To LinesList.Length - 1)
        K = 0
        For P = 0 To LinesList.Length - 1
            Set LineX = LinesList.Item(P)
            Arr(K) = LineX.ChildNodes(0).Text & vbCrLf & _
                LineX.ChildNodes(1).Text
            K = K + 1
        Next P
        '<< write to worksheet
        For J = 0 To UBound(Arr)
            R.Offset(0, J).Value = Arr(J)
        Next J
        Set R = R.Offset(1, 0)
    Next M
Next N
End Sub

This lists the contents between the <LINE> and </LINE> tags out to the
worksheet.  You'll have to modfy the code a bit to conform with your
actual data structure, but the code will be very muich like that shown
above.

Cordially,
Chip Pearson
Microsoft MVP 1998 - 2010
Pearson Software Consulting, LLCwww.cpearson.com
[email on web site]

This is posted to both the Word and Excel VBA NG because so far that's
the only way I can figure out to do this.  But wait - I throw in FSO
to make it even messier!  Using Word and Excel 2007.
We have a proprietary program that saves its working files as XML.
When a user needs to add a list of data, he's presented with either a
grid format that's a pain to work with, or a single-form-per-item page
that's easier to use but wearying after 35+ items.  What I want to do
is get the part of the XML to be revised into an Excel worksheet so
the user can work easier.  Because this program is proprietary, the
schema is not available to me, and since I'm not familiar with XML
except as text separated by tags, I wouldn't know how to use it.
So my bright idea so far is to open the file for reading in FSO, then
paste that into a new Word doc.  This gives me a text file with XML
tags, but without Word's interpretation of the XML.  Direct import
into Excel causes Excel to interpret the tags and buries the
information I want to get.  Parsing the text in Word, I can set ranges
using the different tags and work my way down to the info.  In my
current plan, I'm thinking of driving this from Excel.
The data is set up like so:
<MAIN_CATEGORY>
<DATA_PAGE num="0001">
-- page info --
<GRID_LIST>
<LINE num="0001">
-- line info --
<LINE num="0002">
-- line info --
etc.
The data I want is between the <LINE> tags.
I could get all the data I need by looping through and reading each
data point from <LINE> to </LINE> into an array, then writing that
back into Excel.  Or I could simply capture each <GRID_LIST> into a
range, save it out as a separate temp XML file, then import that into
Excel.
Is this really the best way to do this??  Or is there a much easier
way?  Any recommendations?
Ed- Hide quoted text -

- Show quoted text -
 
E

Ed from AZ

I just realized one of the major reasons I included Word in this mix.
I am pulling the data into an Excel sheet so I can add to it. I then
need to write the modified data back into the XML file.

Not being familiar with XML or XPath, my thought train was:
-- Read XML file via FSO and paste into Word
-- Find the major groups and sub-groups and set bookmarks
-- Pull the data into Excel by parsing the tags and text
-- Add new data and modify existing data as required
-- Create an array of each sub-group and group, wrapping the data in
the proper tags
-- Replace the contents of the associated bookmark range with the new
data
-- Write the whole kaboodle back into the XML file

So the hiccup in my thinking was how to get the data from Excel back
into the XML file. All I could think of was using the bookmarks and
ranges in Word.

Is there a better way?
Ed


You can read the XML file directly using the MSXML library.  In VBA,
go to the Tools menu, choose References, and scroll down to "Microsoft
XML, v6.0" and check that item.  The code uses some relatively simple
XPath to select the nodes, so I assume that you (1) already know
XPath, or (2) can figure it out, or (3) can ask for assistance with
XPath.

Without know the actual layout of your XML, I just guessed and made up
the following XML file:

<?xml version="1.0" encoding="UTF-8"?>
<MAIN_CATEGORY>
        <DATA_PAGE num="001">
                <GRID_LIST>
                        <LINE num="001">
                                <name>Chip Pearson</name>
                                <city>Kansas City</city>
                        </LINE>
                        <LINE num="002">
                                <name>JenFilson</name>
                                <city>Mission Hills</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
        <DATA_PAGE num="002">
                <GRID_LIST>
                        <LINE num="003">
                                <name>Janice Downey</name>
                                <city>Westwood Hills</city>
                        </LINE>
                        <LINE num="04">
                                <name>Mike Stitt</name>
                                <city>Charlotte</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
</MAIN_CATEGORY>

Then, you can use code like

Sub AAA()
' Required reference:
'   Name: MSXML2
'   Description: Microsoft XML, v6.0
'   Typical location: C:\Windows\System32\msxml6.dll
'   GUID: {F5078F18-C551-11D3-89B9-0000F81FE221}
'   Major: 6    Minor: 0

Dim DOM As MSXML2.DOMDocument60
Dim FName As Variant
Dim DataPageList As MSXML2.IXMLDOMNodeList
Dim DataPage As MSXML2.IXMLDOMNodeList
Dim GridList As MSXML2.IXMLDOMNodeList
Dim Grid As MSXML2.IXMLDOMNodeList
Dim LinesList As MSXML2.IXMLDOMNodeList
Dim LineX As MSXML2.IXMLDOMNode
Dim Arr() As String
Dim N As Long
Dim M As Long
Dim P As Long
Dim K As Long
Dim R As Range
Dim J As Long

Set R = Range("C3") '<<<<< OUTPUT ON WORKSHEET
FName = Application.GetOpenFilename("XML Files (*.xml),*.xml")
If FName = False Then
    Exit Sub
End If
Set DOM = New MSXML2.DOMDocument60
DOM.Load CStr(FName)
DOM.setProperty "SelectionLanguage", "XPath"

' get all data pages
Set DataPageList = DOM.SelectNodes("/MAIN_CATEGORY/DATA_PAGE")
For N = 0 To DataPageList.Length - 1
    Set GridList = DataPageList.Item(N).SelectNodes("./GRID_LIST")
    For M = 0 To GridList.Length - 1
        Set LinesList = GridList.Item(M).SelectNodes("./LINE")
        ReDim Arr(0 To LinesList.Length - 1)
        K = 0
        For P = 0 To LinesList.Length - 1
            Set LineX = LinesList.Item(P)
            Arr(K) = LineX.ChildNodes(0).Text & vbCrLf & _
                LineX.ChildNodes(1).Text
            K = K + 1
        Next P
        '<< write to worksheet
        For J = 0 To UBound(Arr)
            R.Offset(0, J).Value = Arr(J)
        Next J
        Set R = R.Offset(1, 0)
    Next M
Next N
End Sub

This lists the contents between the <LINE> and </LINE> tags out to the
worksheet.  You'll have to modfy the code a bit to conform with your
actual data structure, but the code will be very muich like that shown
above.

Cordially,
Chip Pearson
Microsoft MVP 1998 - 2010
Pearson Software Consulting, LLCwww.cpearson.com
[email on web site]

This is posted to both the Word and Excel VBA NG because so far that's
the only way I can figure out to do this.  But wait - I throw in FSO
to make it even messier!  Using Word and Excel 2007.
We have a proprietary program that saves its working files as XML.
When a user needs to add a list of data, he's presented with either a
grid format that's a pain to work with, or a single-form-per-item page
that's easier to use but wearying after 35+ items.  What I want to do
is get the part of the XML to be revised into an Excel worksheet so
the user can work easier.  Because this program is proprietary, the
schema is not available to me, and since I'm not familiar with XML
except as text separated by tags, I wouldn't know how to use it.
So my bright idea so far is to open the file for reading in FSO, then
paste that into a new Word doc.  This gives me a text file with XML
tags, but without Word's interpretation of the XML.  Direct import
into Excel causes Excel to interpret the tags and buries the
information I want to get.  Parsing the text in Word, I can set ranges
using the different tags and work my way down to the info.  In my
current plan, I'm thinking of driving this from Excel.
The data is set up like so:
<MAIN_CATEGORY>
<DATA_PAGE num="0001">
-- page info --
<GRID_LIST>
<LINE num="0001">
-- line info --
<LINE num="0002">
-- line info --
etc.
The data I want is between the <LINE> tags.
I could get all the data I need by looping through and reading each
data point from <LINE> to </LINE> into an array, then writing that
back into Excel.  Or I could simply capture each <GRID_LIST> into a
range, save it out as a separate temp XML file, then import that into
Excel.
Is this really the best way to do this??  Or is there a much easier
way?  Any recommendations?
Ed-
 
E

Ed from AZ

Well, now I'm thinking I can do it like this:
-- Begin reading the XML node by node, saving everything into a string
until it hits the node that begins the data I want to pull into Excel.
-- Continue reading, pulling into Excel, until it hits the node at the
end of the data.
-- Finish reading the file, saving everything into a second string.

When I finish in Excel, I can wrap all that in the proper tags, then
assemble strStart & strData & strEnd and write that back into the XML
file.

Leaves Word out of the loop (although I may leave it in for debugging
so I write out what I capture - it's probably too much for the
Debug.Print window!) and makes things a lot smoother.

If it starts smoking, I'll yell for help!
Ed


I just realized one of the major reasons I included Word in this mix.
I am pulling the data into an Excel sheet so I can add to it.  I then
need to write the modified data back into the XML file.

Not being familiar with XML or XPath, my thought train was:
-- Read XML file via FSO and paste into Word
-- Find the major groups and sub-groups and set bookmarks
-- Pull the data into Excel by parsing the tags and text
-- Add new data and modify existing data as required
-- Create an array of each sub-group and group, wrapping the data in
the proper tags
-- Replace the contents of the associated bookmark range with the new
data
-- Write the whole kaboodle back into the XML file

So the hiccup in my thinking was how to get the data from Excel back
into the XML file.  All I could think of was using the bookmarks and
ranges in Word.

Is there a better way?
Ed

You can read the XML file directly using the MSXML library.  In VBA,
go to the Tools menu, choose References, and scroll down to "Microsoft
XML, v6.0" and check that item.  The code uses some relatively simple
XPath to select the nodes, so I assume that you (1) already know
XPath, or (2) can figure it out, or (3) can ask for assistance with
XPath.
Without know the actual layout of your XML, I just guessed and made up
the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<MAIN_CATEGORY>
        <DATA_PAGE num="001">
                <GRID_LIST>
                        <LINE num="001">
                                <name>Chip Pearson</name>
                                <city>Kansas City</city>
                        </LINE>
                        <LINE num="002">
                                <name>Jen Filson</name>
                                <city>Mission Hills</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
        <DATA_PAGE num="002">
                <GRID_LIST>
                        <LINE num="003">
                                <name>Janice Downey</name>
                                <city>Westwood Hills</city>
                        </LINE>
                        <LINE num="04">
                                <name>Mike Stitt</name>
                                <city>Charlotte</city>
                        </LINE>
                </GRID_LIST>
        </DATA_PAGE>
</MAIN_CATEGORY>
Then, you can use code like
Sub AAA()
' Required reference:
'   Name: MSXML2
'   Description: Microsoft XML, v6.0
'   Typical location: C:\Windows\System32\msxml6.dll
'   GUID: {F5078F18-C551-11D3-89B9-0000F81FE221}
'   Major: 6    Minor: 0
Dim DOM As MSXML2.DOMDocument60
Dim FName As Variant
Dim DataPageList As MSXML2.IXMLDOMNodeList
Dim DataPage As MSXML2.IXMLDOMNodeList
Dim GridList As MSXML2.IXMLDOMNodeList
Dim Grid As MSXML2.IXMLDOMNodeList
Dim LinesList As MSXML2.IXMLDOMNodeList
Dim LineX As MSXML2.IXMLDOMNode
Dim Arr() As String
Dim N As Long
Dim M As Long
Dim P As Long
Dim K As Long
Dim R As Range
Dim J As Long
Set R = Range("C3") '<<<<< OUTPUT ON WORKSHEET
FName = Application.GetOpenFilename("XML Files (*.xml),*.xml")
If FName = False Then
    Exit Sub
End If
Set DOM = New MSXML2.DOMDocument60
DOM.Load CStr(FName)
DOM.setProperty "SelectionLanguage", "XPath"
' get all data pages
Set DataPageList = DOM.SelectNodes("/MAIN_CATEGORY/DATA_PAGE")
For N = 0 To DataPageList.Length - 1
    Set GridList = DataPageList.Item(N).SelectNodes("./GRID_LIST")
    For M = 0 To GridList.Length - 1
        Set LinesList = GridList.Item(M).SelectNodes("./LINE")
        ReDim Arr(0 To LinesList.Length - 1)
        K = 0
        For P = 0 To LinesList.Length - 1
            Set LineX = LinesList.Item(P)
            Arr(K) = LineX.ChildNodes(0).Text & vbCrLf & _
                LineX.ChildNodes(1).Text
            K = K + 1
        Next P
        '<< write to worksheet
        For J = 0 To UBound(Arr)
            R.Offset(0, J).Value = Arr(J)
        Next J
        Set R = R.Offset(1, 0)
    Next M
Next N
End Sub
This lists the contents between the <LINE> and </LINE> tags out to the
worksheet.  You'll have to modfy the code a bit to conform with your
actual data structure, but the code will be very muich like that shown
above.
Cordially,
Chip Pearson
Microsoft MVP 1998 - 2010
Pearson Software Consulting, LLCwww.cpearson.com
[email on web site]
This is posted to both the Word and Excel VBA NG because so far that's
the only way I can figure out to do this.  But wait - I throw in FSO
to make it even messier!  Using Word and Excel 2007.
We have a proprietary program that saves its working files as XML.
When a user needs to add a list of data, he's presented with either a
grid format that's a pain to work with, or a single-form-per-item page
that's easier to use but wearying after 35+ items.  What I want to do
is get the part of the XML to be revised into an Excel worksheet so
the user can work easier.  Because this program is proprietary, the
schema is not available to me, and since I'm not familiar with XML
except as text separated by tags, I wouldn't know how to use it.
So my bright idea so far is to open the file for reading in FSO, then
paste that into a new Word doc.  This gives me a text file with XML
tags, but without Word's interpretation of the XML.  Direct import
into Excel causes Excel to interpret the tags and buries the
information I want to get.  Parsing the text in Word, I can set ranges
using the different tags and work my way down to the info.  In my
current plan, I'm thinking of driving this from Excel.
The data is set up like so:
<MAIN_CATEGORY>
<DATA_PAGE num="0001">
-- page info --
<GRID_LIST>
<LINE num="0001">
-- line info --
<LINE num="0002">
-- line info --
etc.
The data I want is between the <LINE> tags.
I could get all the data I need by looping through and reading each
data point from <LINE> to </LINE> into an array, then writing that
back into Excel.  Or I could simply capture each <GRID_LIST> into a
range, save it out as a separate temp XML file, then import that into
Excel.
Is this really the best way to do this??  Or is there a much easier
way?  Any recommendations?
Ed--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top