Looking for Ideas: Translating Document into Commands in XML

Charles Law

Hi guys

A bit of a curve ball here ... I have a document (Word) that contains a series
of instructions in sections and subsections (and sub-subsections). There are
350 pages of them.

I need to translate these instructions into something that can be processed
automatically, so I have used the Command pattern to set up a set of
commands that correspond to the various instructions in the document.

I have started to enter the instructions into an xml file, which I can
deserialise into my command hierarchy. However, transcribing 350 pages into
an xml document is tedious, time-consuming and error prone. Because I have
sections and subsections, my xml file is quite wide, as well as very long. I
use XMLSpy to edit the file, but I am forever scrolling backwards and
forwards, up and down, cutting and pasting, and losing my place.
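
For readers following along, the arrangement described above (an XML file deserialised into a hierarchy of commands) might look roughly like the sketch below. It is in Python rather than VB.NET for brevity, and the `<section>` and `<command>` element names, the class names and the sample data are all assumptions made for illustration; the real file format is not shown in this thread.

```python
import xml.etree.ElementTree as ET


class Command:
    """Leaf command: a single instruction from the document."""

    def __init__(self, text):
        self.text = text

    def execute(self, log):
        # A real command would do some work; this one only records itself.
        log.append(self.text)


class Section(Command):
    """Composite command: runs its children in document order."""

    def __init__(self, title, children):
        super().__init__(title)
        self.children = children

    def execute(self, log):
        for child in self.children:
            child.execute(log)


def build(element):
    """Turn a <section> or <command> element into a command object."""
    if element.tag == "section":
        return Section(element.get("title", ""), [build(e) for e in element])
    return Command((element.text or "").strip())


# Hypothetical sample in the assumed format.
SAMPLE = """
<section title="Start Here">
  <section title="First Group">
    <command>Do something</command>
    <command>Do something else</command>
  </section>
  <command>Do this other thing</command>
</section>
"""


def run(xml_text):
    """Deserialise the XML and execute every command, returning the log."""
    log = []
    build(ET.fromstring(xml_text)).execute(log)
    return log


print(run(SAMPLE))
```

Nesting sections as composite commands is what makes the file wide as well as long; a flat list of commands with a section-number attribute would be an alternative layout.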

Does anyone have any thoughts on how I might improve the situation, make my
file more maintainable, and perhaps automate the process somehow?

My first thought is to write a simple program to maintain the xml file, but
that could take just as long as entering the data.

Any thoughts very welcome.

TIA

Charles
 
Cor Ligthert

Charles,

This was supplied to this newsgroup some weeks ago:

\\\Eval by Nigel Amstrong
1. Create a file called: DynamicMath.js
2. Add this code to it:

class DynamicMath
{
    // Evaluates an arithmetic expression supplied as a string.
    static function Eval(MathExpression : String) : double
    {
        return eval(MathExpression);
    }
}

3. Compile it with the command-line jsc compiler:
   jsc /t:library DynamicMath.js
4. Add a reference to DynamicMath.dll to your project (and to
   Microsoft.JScript.dll as well)
5. Use from your favourite .NET language:
   Dim d As Double = DynamicMath.Eval("2 + 3 + 4")
   MessageBox.Show(d.ToString())
6. That's it..
///
Cor
 
Charles Law

Hi Cor

Thanks for the reply. I don't think it is going to help unfortunately,
unless I have missed something.

The problem is not so much how to implement the commands nor evaluate them.
Rather, I am trying to find some reliable means of translating a document
into a 'command script'. For example, the document might contain something
like this

1 Start Here
1.1 First Group
1.1.1 Instructions
Do something
Do something else
1.1.2 More Instructions
Do this other thing
Do first thing again
1.2 Second Group
...
1.3 More of the Same

I need to translate this into something that can be deserialised into a
hierarchy of commands, in the style of the Command pattern, so that each
Command can be executed, in sequence. The result of the operation will be

Do something
Do something else
Do this other thing
Do first thing again
...
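
A minimal sketch of that translation, assuming the document has already been exported as plain text with one heading or instruction per line. It is in Python rather than VB.NET, and the function names and the heading pattern are illustrative assumptions. It treats any line that starts with a dotted section number as a heading, so an instruction that happens to begin with a number would be misread and would need a stricter rule.

```python
import re

# A heading is a dotted section number followed by a title, e.g. "1.1.2 More Instructions".
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.*)$")

SAMPLE = """\
1 Start Here
1.1 First Group
1.1.1 Instructions
Do something
Do something else
1.1.2 More Instructions
Do this other thing
Do first thing again
1.2 Second Group
"""


def parse_outline(lines):
    """Return (section_number, title, [instructions]) records in document order."""
    sections = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1), match.group(2), []))
        elif sections:
            # Body text belongs to the most recent heading.
            sections[-1][2].append(line)
    return sections


def flatten(sections):
    """List every instruction in the order it would be executed."""
    return [step for _, _, steps in sections for step in steps]


sections = parse_outline(SAMPLE.splitlines())
for number, title, steps in sections:
    depth = number.count(".") + 1  # nesting depth implied by the section number
    print("  " * (depth - 1) + f"{number} {title}: {len(steps)} step(s)")
print(flatten(sections))
```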

Charles
 
Cor Ligthert

Charles,

If it were my problem and the document were well done, then I would look at
what Word automation could do for me, and first look at the paragraph settings.

However, I have not done this for a long time, but you asked for ideas.

Cor
 
Charles Law

Hi Cor

You are right. I did ask for ideas, and the Word automation is a distinct
possibility.

Unfortunately, the document is not well done, that is to say it is not
consistent in its terminology, and so it would be difficult to create rules
around the written text; but nevertheless possible.

I think you have succinctly cut to the heart of the matter. I was hoping for
a panacea, when all the time fearing that there is no easy option: I either
have to create a sophisticated programme to process the document, or bite
the bullet and get typing.

[A special prize will be awarded to the person who correctly identifies all
the metaphors, and can accurately report the number used]

Charles
 
Jay B. Harlow [MVP - Outlook]

Charles,
Which version of Word?

Later versions of Word (XP, 2003, not sure about 2000) support saving as an
XML file.

I would then consider passing Word's XML file to an XSLT transform to
"simplify" the document, then read this "simplified" XML in my program...

Looking at the help for Word 2003, you might be able to define an XML Schema
that you could attach to your Word document and then replace parts of the Word
document with XML tags. I would think with some effort you might be able to
automate replacing parts of the document with tags, which may eliminate the
need for the XSLT transform.

Note: I've used Xml in Word very minimally.

Hope this helps
Jay
 
Thug Passion

The example below ... is it laid out in a table? Do you have any kind
of formatting or anything else that makes the number (1.1.2) stand out
from the heading text (More Instructions) and from the stuff below it?

If you can tell the stuff apart, and there's a solid pattern, you
could write a VBA script to loop through the entire document,
line-by-line, and figure out where in the XML each line goes.
 
Charles Law

Hi Jay

I noticed the Save As XML so tried it (I have just moved from Word XP to
2003). The resultant file, with no transform, was 9 Mb. I then tried to load
it into XMLSpy and after about 10 minutes of a blank window it GPF'ed on me
:-(

I think you have probably hit on something though, but I don't know XSLT
well enough to know how to start with transforming the file. From what I
could make of the file after loading it in Notepad, it contains a tremendous
amount of bloat. For example, formatting and layout information that I just
don't need. I really only want the structure, after the first pass anyway.
Then I could set about translating the text into something more formal.
Also, this translation process will be a one-off, or at most occasional when
the document changes. It will be the cut down, formal xml file that my
program will read at start-up.
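
One way to get at just the structure without loading the whole 9 MB file into an editor is to stream it and keep only each paragraph's style name and text. The sketch below is in Python rather than XSLT or VB.NET. It assumes the Word 2003 WordprocessingML namespace and the `w:p`, `w:pPr`, `w:pStyle` and `w:t` element names, which should be checked against a real saved file; the sample data is made up.

```python
import io
import xml.etree.ElementTree as ET

# Assumed Word 2003 WordprocessingML namespace; verify against an actual file.
W = "http://schemas.microsoft.com/office/word/2003/wordml"
NS = {"w": W}

# Made-up fragment in the assumed format.
SAMPLE = b"""<w:wordDocument xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body>
<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Start Here</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Do </w:t></w:r><w:r><w:t>something</w:t></w:r></w:p>
</w:body>
</w:wordDocument>"""


def paragraphs(source):
    """Yield (style_name, text) for each paragraph, ignoring all formatting."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != f"{{{W}}}p":
            continue
        style = elem.find("w:pPr/w:pStyle", NS)
        name = style.get(f"{{{W}}}val") if style is not None else None
        # A paragraph's text is split across runs; join the pieces back together.
        text = "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))
        yield name, text
        elem.clear()  # release the paragraph's children to keep memory use low


for style_name, text in paragraphs(io.BytesIO(SAMPLE)):
    print(style_name, "|", text)
```

`paragraphs` also accepts a file name, so the same loop can be pointed at the saved document directly.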

Thanks for the suggestion. I will look into it further.

Cheers.

Charles
 
Charles Law

Hi (do I call you Thug?)

On the whole, tables are not used, although in places they are [sadly the
document structure is not as consistent as it might be]. And in another
document that I will have to process tables are used extensively. However,
in this one heading styles are used for each section heading, so I could
glean the hierarchy from those. If I have to resort to a program to do the
processing, I guess this will have to be the way to do it.
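
Gleaning the hierarchy from heading levels can be sketched as follows, in Python rather than VB.NET. It assumes each paragraph has already been reduced to a (level, text) pair, with level 0 meaning body text and 1 to n meaning the heading level; the `<document>`, `<section>` and `<command>` element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up input: (heading level, text), where level 0 marks an instruction.
ITEMS = [
    (1, "Start Here"),
    (2, "First Group"),
    (3, "Instructions"),
    (0, "Do something"),
    (0, "Do something else"),
    (3, "More Instructions"),
    (0, "Do this other thing"),
    (2, "Second Group"),
]


def build_tree(items):
    """Nest sections by heading level and hang instructions off the current section."""
    root = ET.Element("document")
    stack = [(0, root)]  # (heading level, element) from the root to the open section
    for level, text in items:
        if level == 0:
            ET.SubElement(stack[-1][1], "command").text = text
            continue
        # Close every open section at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        section = ET.SubElement(stack[-1][1], "section", title=text)
        stack.append((level, section))
    return root


print(ET.tostring(build_tree(ITEMS), encoding="unicode"))
```

The stack copes with skipped levels (a level 3 heading directly under a level 1 heading simply nests one step down), which matters when the source document is not consistent.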

Thanks for the suggestion.

Charles
 
Jay B. Harlow [MVP - Outlook]

Charles,
"The resultant file, with no transform, was 9 Mb."

That is where what Thug and I suggested comes in: use a VBA script to automate
cleaning up the document first, getting it closer to a "nicer" XML format.
Then save it, then possibly apply an XSLT, then process it...

Is this document a one time thing or is it going to be ongoing?

If it's ongoing I would seriously consider defining a template in Word that
helps enforce the format required.

Hope this helps
Jay
 
Charles Law

Hi Jay

It is a create once, update occasionally file. Unfortunately, the creation
and maintenance of the document is outside my control, and at 650 pages
(last count) it is unlikely that the client will change it now to fit some
template that I might define.

I am currently looking at creating a VB.NET program to iterate through the
document extracting the bits I need, and perhaps changing them to be more
consistent. You mention VBA script: is that for a specific reason (as
opposed to VB.NET) or does it not matter especially?

Charles
 
Jay B. Harlow [MVP - Outlook]

Charles,
The VBA script runs within Word; VB.NET would drive Word.

If the VBA script is going to be doing a lot, then it may run faster than
VB.NET will, as VBA runs in-process, while VB.NET (normally) drives Word
out-of-process through COM Interop.

If the script is only going to be one or two routines, I find doing it
directly in Word is easier than creating a VB.NET program to do it,
especially if the routine is only going to be used once.

If the problem looks like it could benefit from OO then I start with VB.NET
to leverage OO. If the problem looks like it will simply be one or two
routines & a couple of loops, I leave it as VBA.

Of course, if the routine needs to be used often in that it is tied to a
specific VB.NET program, then it's generally easier to make it part of the
VB.NET program, even though it's only one or two routines...

Using "Tools - Upgrade Visual Basic 6 Code" I've converted VBA code to
VB.NET code.

Hope this helps
Jay
 
Charles Law

Jay

Thanks for the clarification.

Charles


 
