Query for data with different spellings

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I am developing a database to analyze drug use patterns. The data comes from
different sources such that the field "drug name" can have different
spellings from different sources. An example would be "acetaminophen
w/codeine", "acetaminophen-codeine", "acetaminophen w-codeine",
"acetaminophen/codeine" and others. I need to know the utilization of the
drug, regardless of how it is spelled.

I can build a separate table containing each of these different spellings,
but I don't know how to build a query to check the incoming data against this
"reference table" so that the result would be that all "aceptaminophen with
codeine" entries would end up counted as the number of uses of that drug,
regardless of how it is spelled.

Can someone suggest how I could build a query like this? It won't be
feasible to build a query to search for each drug individually because there
are thousands of drugs where this situation can occur so I need a methodology
that can be applied to all the drugs in the dataset. Thanks for any help.
 
Your DrugNames table should have two columns.
Field: StandardDrugName
Field: VariantName

Then you can link the your incoming data drug names to the VariantName field
of the DrugNames table, but use the StandardDrugName field in all your
queries and analysis.

You would have to be careful and make sure to add new variations to your
DrugNames table. You could use an unmatched query to detect the "missing"
variations and then enter the new versions into your DrugNames table.
 
Thanks for the suggestion. How do I write the query to compare the drug name
in the incoming data table to both "StandardDrugName" and "VariantName"?
 
Add both tables to the query grid
Click on the DrugNames.VariantName and drag it to the drug name in the
incoming table.

Add all the fields except the drug name from the incoming table
Add DrugNames.StandardDrugName to the fields to display.
 
That will work great if there is only one variant name. However, I'm going to
be faced with multiple variant names (possibly 3 or 4). How do I make the
query compare the incoming drug name across all the possible variants? I
really am appreciating your help on this!
 
You should have one record for each combination of variants. For example,
with the sample data you proposed.

Select acetaminophen/codeine as your standard name. Then you would have
records as follows where the colon indicates the break between fields

acetaminophen/codeine : acetaminophen/codeine
acetaminophen/codeine : acetaminophen w/codeine
acetaminophen/codeine : acetaminophen w-codeine
acetaminophen/codeine : acetaminophen-codeine

NOTE that the first record above has identical values in both the standard
and variant names. That is a requirement. You will have to have one record
for every drug you are looking for that has the standard and variant name
equal.
 
What I ultimately need to get to is that I would get a count of how many
acetaminophen/codeine, regardless of how it is spelled. If I understand
correctly, what I would then do is just total by the standard name in my
query and it will actually give me the total for all the
acetaminophen/codeine as well as all the other drugs with similar problems in
the dataset. If I have understood correctly- thank you so much!
 
Yes, your understanding seems to be correct.

You would end up with a query that grouped by the standard name and counted the
standard name. Something like

SELECT StandardDrug, Count(StandardDrug) as CountUses
FROM IncomingTable INNER JOIN StandardNames
ON IncomingTable.DrugName = StandardNames.Variant
GROUP BY StandardDrug

Replace the fake names with the actual ones.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top