parsing text from one column and writing results into 6 others

G

Guest

I'm modifying a table created by someone else. System is a library database,
with the need for multiple "subject" entries for each book. Current database
has a single column, with up to 6 subjects for one book, all in one field.

Subjects are preceded with "1. "; "2. ", "3. " etc. Thus I could extract
each subject into it's own column such as subj1, subj2, subj3 for that
book... if I knew how!

My goal is to index on ALL subject fields to enhance user search.

I have done extensive coding under FoxPro for DOS in the past, and have
actually written a parsing routine that does the job, but it's getting to be
a nuisance going back to DOS to do the job.
 
A

Albert D. Kallal

Subjects are preceded with "1. "; "2. ", "3. " etc. Thus I could extract
each subject into it's own column such as subj1, subj2, subj3 for that
book... if I knew how!

My goal is to index on ALL subject fields to enhance user search.

I have done extensive coding under FoxPro for DOS in the past, and have
actually written a parsing routine that does the job, but it's getting to
be
a nuisance going back to DOS to do the job.

I assume you also plan to strip out those extra chranaters (the " ", and the
".").

Anyway, you bild a little public function:

However, you have:

"1. "; "2. ", "3. "

the first delimter is a ";", and then you have a ","...which is it? I going
to ASSUME the above is supposed to be a ";"

So, the given string is:

"1. "; "2. "; "3. "

The code would look like:

Public Sub ProcessToChild()

Dim rst As DAO.Recordset
Dim rstChild As DAO.Recordset

Dim strSql As String

Debug.Print "running.."
Set rstChild = CurrentDb.OpenRecordset("tblChildTable")

strSql = "select id, ChapterData from MyTable " & _
"where ChapterData is not null"

Set rst = CurrentDb.OpenRecordset(strSql)
Do While rst.EOF = False
Call StripOut(rstChild, rst!ID, rst!ChapterData)
rst.MoveNext
Loop
rst.Close

MsgBox "done"

End Sub


Public Sub StripOut(rst As DAO.Recordset, lngID As Long, strChapterData As
String)

Dim vBuf As Variant
Dim i As Integer
Dim str As String

vBuf = Split(strChapterData, ";")
For i = 0 To UBound(vBuf)
'
str = Trim(Split(vBuf(i), Chr$(34))(1))
rst.AddNew
rst!Parent_id = lngID
rst!Chapter = str
rst.Update
Next i

rst.Update

End Sub


So, just paste both into a standard code module...you can then from the
command line prompt just run the sub ProcessToChild
(either just place your cursor in the routine and hit f5 to run, or simply
type in the name of the sub from the command line prompt).

If your sample data was wrong, then obviously the above code would need
modifying. The sample code assumes a ";" as a delimiter, and that the data
inside each set is surrounded by actual " (quotes).
 
G

Guest

Many thx.. just what I needed to know. Now I've got to learn how to create a
code module and to execute it for each book.

BTW.. the data in the current SUBJECT field appears as :

1. Bible 2. Family 3. Parents 4. Death

Other than the numbers, spaces and periods, there are no real delimiters.
Some have only one. Some have more.

Thx again

Richard
 
A

Albert D. Kallal

RP said:
Many thx.. just what I needed to know. Now I've got to learn how to create
a
code module and to execute it for each book.

The reason why I have two separate routines, is that the first routine is
your main
processing loop that would run for each book. that is exactly how/why this
was
setup this way.

The 2nd routine takes that data for ONE record, and spits it out to a
"child" table.

It is assumed that this child table is going to be related back to the main
table. You have to spit out this data to anther table to index each value.
BTW.. the data in the current SUBJECT field appears as :
1. Bible 2. Family 3. Parents 4. Death

Well, that is a bit of different problem now. Give that you understand
software, it should be no surprise that changing our delimiter from a ";" to
a space makes this problem some more complex. What happens if we have:

1. Bible 2. Family 3. Lost Parents 4. Death

We can't use the space as a delimiter to pull out the data. If EVERY SINGE
chapter entry NEVER EVER EVER HAD extra spaces, then we could assume that
EACH chapter is two tokens:

1. Bible 2. Family 3. Parents 4. Death

To parse the above, we could say
take each two words delimited by a space..and those two words become
chapter and text

eg:
1. Bible
2. Family
3. Parents
4. Death

However, if we have:

1. Bible 2. Family 3. Lost Parents 4. Death


Note that our software rule we just made breaks down. Each two words
separated by a space does not represent our results we need anymore. So,
depending on the consistency of this data, the parsing problem is either
easy, or VERY VERY complex!!!

Can we assume that each chapter ALWAYS has a number followed by a "."
(period) ? If we can, then lets use the period as our delimiter!!!!

If you can't use the period, and some chapter text will have MORE then one
word..then, we just inherit a difficult parsing problem (one that would
likely be even worse in FoxPro..since we have a bit better parsing then
those old FoxPro programs).

Other than the numbers, spaces and periods, there are no real delimiters.
Some have only one. Some have more.

Lets assume ALWAYS a number then period.
Question: does ANY of the text have a period in it:

Like

1. Bible 2. Family 3. Inc. Parents 4. Death

And, worse, does any text have:

1. Bible 2. Family 3. Inc. Only 5 ancestors 4. Death

Note the above has a number in the middle of the text.

So, before I waste some more time here, you have to lay out the rules. the
most easy approach is to use the dot (period), but that will only work if
the text NEVER has a period in it..only right after the numbers.

So, given that you written and understand software, the ONLY WAY we can
write this code is to come up with a set of rules that work for you case.

So, can we assume that the data has no "." in it, *EXCEPT* after a number?
Will this rule hold out?
 
G

Guest

Thx Albert.... you have a firm grasp on the issues. The FoxPro routine I
wrote uses " 1." etc as delimiter, and it works enough of the time to be
worthwhile.

The numeral needs to be removed from the entry, of course.

An example of the last results of my routine may be seen at

http://home.earthlink.net/~tbilbk1/tbilib/STBIHIST.HTM where S1=subject

Albert... how do I execute this routine in Access??

Richard
 
A

Albert D. Kallal

Albert... how do I execute this routine in Access??


It is very similar process to fox. You simple crate a code module (allow it
be defaulted to name of module1, or you could perhaps give the module name
something describe...like:

MyIndexStuff

note that the code module name must ALWAYS be different then the code you
place inside of this module. And, note that you can often place many subs
and functions in that module.

And, then, just like in fox, you can type in the name of the sub in the
command window..and it runs....

So, you likely want to get some familiarly with the command line prompt in
ms-access (just hit ctrl-g while in the forms side..and you are flipped into
the code + IDE side).

However, unless you have some VBA coding skills here, this is going to be
hard going for you...

Anyway, my example code assumed a ";" , and that seems to be of no use....

I noticed in your link that there was all kinds of periods in the text. so,
again this will make parsing more difficult, and using a dot as the *only*
delimiter don't look that good anymore...

I would still use the split function ( split() ) to move the data into an
array delimited by a "dot". With a trim command, you could test each element
of the array of "tokens" to check for number data ONLY...and, that could
then be a starting point for the data....

check out in the help the isnumeric() function.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top