Searching for Key Words in a String of Text

Guest

I am trying to build a database in Access 2003 that will filter approximately
65,000 records of bank transactions and flag those that contain the names of
high-risk countries (and other key words). I have a table with a field that
aggregates the text fields of each transaction, and another table with a list
of countries and their risk scores.

I would like to have a query that assigns a risk score to each transaction
based on which country names appear in the text. One way to do this is to
have a calculated field for each country and write an IIF statement that
returns the country's risk score if the name of the country appears in the
text. It pulls the risk score from the country table using a DLookup. I'm
finding that these queries run very slowly.
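
For example, each of these calculated fields looks roughly like this
(the country, table and field names are simplified placeholders):

ScoreCountryA: IIf([AggregateText] Like "*CountryA*",
    DLookup("RiskScore", "tblCountries", "Country = 'CountryA'"), 0)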

Is there a more efficient way of searching the entire country table at once,
without creating a separate field for each country? It seems an InStr
function would work, but then I would lose the trail of specifically which
countries appear in the text.

I would be grateful for any ideas as to how I can run this search more
efficiently.

Thanks!
 
John Nurick

Hi Andrew,

I'd be tempted to do this by exporting the data to a text file and
running a Perl script that uses a cunning regex substitution to delete
everything except the country names/key words, replacing them with the
corresponding risk scores concatenated by plus signs. Pass the result
to eval and Bob's your uncle.

For an all-Access solution, how about this:

1) Normalise the data. Create a new table to implement an M:M
relationship between countries and transactions. Populate it by
writing a VBA procedure along these lines (pseudocode):

open a recordset on the Countries table
for each Country in the recordset,
    construct and execute an append query that adds a record
    to tblCountriesTransactions for each Transaction that
    mentions that country (using LIKE '*' & CountryName & '*'
    as the criterion)

2) Once you've done this, an ordinary totals query will tot up the
scores for you (both steps are sketched below).
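
For one country, the append query generated in step 1 might look
something like this, and step 2 is then a simple totals query. The
table and field names below are only guesses at what's in your
database:

INSERT INTO tblCountriesTransactions ( TransID, Country )
SELECT T.TransID, "Ruritania" AS Country
FROM Transactions AS T
WHERE T.TransText Like "*Ruritania*";

SELECT CT.TransID, Sum(C.RiskScore) AS TotalRisk
FROM tblCountriesTransactions AS CT INNER JOIN Countries AS C
ON CT.Country = C.Country
GROUP BY CT.TransID;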

Or I can think of something more complicated involving a Dictionary and
a Regexp...<g>
 
Marshall Barton

You might want to try starting out with something like:

SELECT T.transID, C.Country, C.Score
FROM Transactions As T LEFT JOIN Countries As C
ON T.TransText Like "*" & C.Country & "*"

You will not be able to do that in the query design grid, so
use SQL view to create the query.
 
Guest

Thanks for your help Marshall. I rewrote the query in SQL view using the
sample code you provided, and the query now runs much faster (initially).
The only problem is when I try to sort the records in the query, it still
runs incredibly slow and sometimes freezes up my machine. I tried running
the query on a list of 10 "key words" as practice, and there are
approximately 57,000 records in the database. That doesn't seem like all
that much data to sift through, so I'm not sure why it's running so slowly.

John: Thanks also for your post, but I don't think I'm tech-savvy
enough to write the code to implement your solution.

Andrew
 
Marshall Barton

What type of field is the transaction description? If it's
a memo field, it might be the cause of at least part of the
speed issue.

Sorting can be a slow process in some cases, especially if there are
lots of fields involved, and it shouldn't even be attempted on a long
memo field.

Unless your query is more complex than the simple outline I
posted, I don't see where indexes might help improve the
performance.

About the only idea I have at this point is that you should
not sort the query. Instead use the query to make another
table and perform further operations on that.
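
For example, a make-table version of the query I posted earlier
(tblTransactionScores is just an example name) would let you sort and
total a small result table instead of the transactions themselves:

SELECT T.transID, C.Country, C.Score INTO tblTransactionScores
FROM Transactions As T LEFT JOIN Countries As C
ON T.TransText Like "*" & C.Country & "*";

The new table has no memo field in it, so sorting it should be much
quicker.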
 
Guest

I've made most of my primary queries into make-table queries so that I only
run them once and dump the results into a table. Then I can run additional
queries off the query-generated tables, and this seems to speed things up
quite a bit.

Thanks!

 
Guest

Marsh,

I've been using the SQL statement you gave me for about a month now and it
seems to be working well enough, but the catch is, it only seems to work when
I pull the transaction text from a query. What I would like to do is pull
this text from the memo field of a table, but the problem is that my country
names are captured in a text field, so Access won't let me establish a
relationship between a text field and a memo field (the error message says
that I can't set up any relationships involving memo fields).

The SQL statement currently used in the query is as follows:

INSERT INTO tblGeoHits ( TxnID, SearchString, GeoID )
SELECT qryAggregateText.TxnID,
       tblGeoSearchStrings.SearchString,
       tblGeoSearchStrings.GeoID
FROM qryAggregateText LEFT JOIN tblGeoSearchStrings
ON qryAggregateText.AggregateText Like "*" & tblGeoSearchStrings.SearchString & "*";

As you can see, I am building a table to store the results of the query,
which searches for country names (tblGeoSearchStrings.SearchString) in each
transaction message (qryAggregateText.AggregateText). The AggregateText
field is a calculated field created by combining several text fields. It
often contains more than 255 characters.

What I would like to do is store the contents of AggregateText in a
memo field in a table and run the query against that memo field. How
can I do this?

Thanks again for your help,
Andrew




Marshall Barton said:
Is this a repetitive operation or a one-time thing? If you're going
to be doing this on a regular basis, then you may want to use a
temporary database to hold the "temporary" tables
that you are creating. Take a look at:
http://www.granite.ab.ca/access/temptables.htm
--
Marsh
MVP [MS Access]

 
Marshall Barton

I can't see the entire thread anymore. Did you explain what
qryAggregateText did beyond just concatenating text fields?
If it's that simple, you might try checking the speed of
dropping that query and doing it all in this one:

INSERT INTO tblGeoHits ( TxnID, SearchString, GeoID )
SELECT TX.TxnID,
       GS.SearchString,
       GS.GeoID
FROM thetransactiontable As TX
LEFT JOIN tblGeoSearchStrings As GS
ON TX.text1 Like "*" & GS.SearchString & "*"
   OR TX.text2 Like "*" & GS.SearchString & "*"
   OR TX.text3 Like "*" & GS.SearchString & "*"

If that's too slow or your calculations are not as simple as
I thought, then you've reached the limit of my ideas. I
don't see why the query you posted works any better than
using your temp table, but if it does the job, stick with
it.
--
Marsh
MVP [MS Access]

 
Guest

I've actually been able to solve the problem by using a slightly different
query:

INSERT INTO tblGeoHits ( TxnID, SearchString, GeoID )
SELECT tblAggregateText.TxnID,
       tblGeoSearchStrings.SearchString,
       tblGeoSearchStrings.GeoID
FROM tblAggregateText, tblGeoSearchStrings
WHERE tblAggregateText.AggregateText Like "*" & tblGeoSearchStrings.SearchString & "*";

This seems to be effectively creating an inner join between the memo field
(tblAggregateText.AggregateText) and the text field
(tblGeoSearchStrings.SearchString).

The query still runs slowly, but is a bit faster than when I pull data from
the query qryAggregateText.

Thanks for your help!


 
Marshall Barton

Yes, that is the definition of an inner join, but you were using an
outer join before, so the results will be different: transactions that
match no search string will no longer get a row in tblGeoHits.
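
If you also need rows in tblGeoHits for transactions that match no
search string, something along these lines might do it (untested, and
it assumes tblGeoHits allows Nulls in SearchString and GeoID). Run it
after your append query:

INSERT INTO tblGeoHits ( TxnID, SearchString, GeoID )
SELECT A.TxnID, Null AS SearchString, Null AS GeoID
FROM tblAggregateText AS A
WHERE NOT EXISTS
  (SELECT * FROM tblGeoHits AS H WHERE H.TxnID = A.TxnID);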
--
Marsh
MVP [MS Access]
