How to extract records matching a list of strings?

M

Magritte

I've decided to take the plunge and learn how to use Access 2007. I've set up
a database but am having difficulty with one task.

I have a master table with a unique integer ID and a text name which is
usually unique but technically may not be.

I have a separate list of names. I want to know which names in the second
list are or are not present in the master table, and for those that are
present, what are the unique IDs?

The second list of strings may match exactly the string in the master table,
or may match exactly but have a different case, or may be a subset of the
string inthe master table. I'd like to find all 3 cases. Where the match is a
substring, I'd like to find all strings in the master table that contain that
substring.

My first attempt was just to create a new table with a single field being a
text field, and each record holding one of the strings in the second list. I
then created a query containing the ID and name in the master table, and the
string in the second table, and creating a join between the string in the
second table and the string in the master table, with an arrow pointing
towards the master table.

And the result: no matches.

I then modified a few of the strings in the second table so that they
matcched exactly (including case) some record in the master table. But still
no matches.

So I'm at a loss as to how to get the result I'm looking for. Am I going
about this completely the wrong way or is there some little thing I'm not
doing?

Thanks for your help!
 
D

Dirk Goldgar

Magritte said:
I've decided to take the plunge and learn how to use Access 2007. I've set
up
a database but am having difficulty with one task.

I have a master table with a unique integer ID and a text name which is
usually unique but technically may not be.

I have a separate list of names. I want to know which names in the second
list are or are not present in the master table, and for those that are
present, what are the unique IDs?

The second list of strings may match exactly the string in the master
table,
or may match exactly but have a different case, or may be a subset of the
string inthe master table. I'd like to find all 3 cases. Where the match
is a
substring, I'd like to find all strings in the master table that contain
that
substring.

My first attempt was just to create a new table with a single field being
a
text field, and each record holding one of the strings in the second list.
I
then created a query containing the ID and name in the master table, and
the
string in the second table, and creating a join between the string in the
second table and the string in the master table, with an arrow pointing
towards the master table.

And the result: no matches.

I then modified a few of the strings in the second table so that they
matcched exactly (including case) some record in the master table. But
still
no matches.

So I'm at a loss as to how to get the result I'm looking for. Am I going
about this completely the wrong way or is there some little thing I'm not
doing?


What you're doing sounds plausible, and a join on the text fields ought to
find matches regardless of case. Could you post the SQL view of the query
your created, and give some examples of the data in the tables?
 
T

T Magritte

What you're doing sounds plausible, and a join on the text fields ought to
find matches regardless of case.  Could you post the SQL view of the query
your created, and give some examples of the data in the tables?

Hi,

Here's the SQL query:
SELECT [gene symbols].[Transcript ID], [gene symbols].symbol,
[shopping list test].[gene name]
FROM [shopping list test] LEFT JOIN [gene symbols] ON [shopping list
test].[gene name] = [gene symbols].symbol;

Here's the whole "shoping list test" table: (ID in this case was an
autogenerated ID column, "gene name" is a text column.

ID gene name
1 arg1
2 ARG2
3 mapk
4 cox1

Here's a few sample rows fromthe "gene symbols" table ("Transcript ID"
is the key):

Transcript ID accession symbol
6772815 NM_007482 Arg1
6796331 NM_009705 Arg2
6980271 BC046971 Arglu1
6879070 BC006592 Arhgap1
6839714 NM_001038663 Mapk1
6940534 NM_009158 Mapk10
6940535 NM_009158 Mapk10
6837785 ENSMUST00000088823 Mapk11
6985818 ENSMUST00000034285 Cotl1
6788993 ENSMUST00000049091 Cox10
6783500 NM_199008 Cox11
6873235 NM_144874 Cox15

So with the exception of differing case, the first and second row of
the test table should find perfect matches in the queried table. The
third and fourth row are substrings of strings present in the queried
table but not exact matches so I wouldn't necessarily expect those to
find matches without adding an additional constraint expression.

However this is the result of the query when run:

Transcript ID symbol gene name
arg1
ARG2
mapk
cox1

so it didn't find anything.

Thanks!
 
D

Dirk Goldgar

T Magritte said:
Here's the SQL query:
SELECT [gene symbols].[Transcript ID], [gene symbols].symbol,
[shopping list test].[gene name]
FROM [shopping list test] LEFT JOIN [gene symbols] ON [shopping list
test].[gene name] = [gene symbols].symbol;

Here's the whole "shoping list test" table: (ID in this case was an
autogenerated ID column, "gene name" is a text column.

ID gene name
1 arg1
2 ARG2
3 mapk
4 cox1

Here's a few sample rows fromthe "gene symbols" table ("Transcript ID"
is the key):

Transcript ID accession symbol
6772815 NM_007482 Arg1
6796331 NM_009705 Arg2
6980271 BC046971 Arglu1
6879070 BC006592 Arhgap1
6839714 NM_001038663 Mapk1
6940534 NM_009158 Mapk10
6940535 NM_009158 Mapk10
6837785 ENSMUST00000088823 Mapk11
6985818 ENSMUST00000034285 Cotl1
6788993 ENSMUST00000049091 Cox10
6783500 NM_199008 Cox11
6873235 NM_144874 Cox15

So with the exception of differing case, the first and second row of
the test table should find perfect matches in the queried table. The
third and fourth row are substrings of strings present in the queried
table but not exact matches so I wouldn't necessarily expect those to
find matches without adding an additional constraint expression.

However this is the result of the query when run:

Transcript ID symbol gene name
arg1
ARG2
mapk
cox1

so it didn't find anything.


When I try your query with the data you posted, I find the matches yuo
expected. Check for leading and/or trailing spaces in the [gene name] and
[symbol] fields in your tables. You can remove them with update queries:

UPDATE [gene symbols] SET symbol = Trim(symbol);

UPDATE [shopping list test] SET [gene name] = Trim([gene name]);

If, after running those queries, you find that your original query finds
matches, then we can proceed to adjust your query so that it returns
"substring" matches.
 
T

T Magritte

Here's the SQL query:
SELECT [gene symbols].[Transcript ID], [gene symbols].symbol,
[shopping list test].[gene name]
FROM [shopping list test] LEFT JOIN [gene symbols] ON [shopping list
test].[gene name] = [gene symbols].symbol;
Here's the whole "shoping list test" table: (ID in this case was an
autogenerated ID column, "gene name" is a text column.
ID gene name
1 arg1
2 ARG2
3 mapk
4 cox1
Here's a few sample rows fromthe "gene symbols" table ("Transcript ID"
is the key):
Transcript ID accession symbol
6772815 NM_007482 Arg1
6796331 NM_009705 Arg2
6980271 BC046971 Arglu1
6879070 BC006592 Arhgap1
6839714 NM_001038663 Mapk1
6940534 NM_009158 Mapk10
6940535 NM_009158 Mapk10
6837785 ENSMUST00000088823 Mapk11
6985818 ENSMUST00000034285 Cotl1
6788993 ENSMUST00000049091 Cox10
6783500 NM_199008 Cox11
6873235 NM_144874 Cox15
So with the exception of differing case, the first and second row of
the test table should find perfect matches in the queried table. The
third and fourth row are substrings of strings present in the queried
table but not exact matches so I wouldn't necessarily expect those to
find matches without adding an additional constraint expression.
However this is the result of the query when run:
Transcript ID symbol gene name
arg1
ARG2
mapk
cox1
so it didn't find anything.

When I try your query with the data you posted, I find the matches yuo
expected.  Check for leading and/or trailing spaces in the [gene name] and
[symbol] fields in your tables.  You can remove them with update queries:

    UPDATE [gene symbols] SET symbol = Trim(symbol);

    UPDATE [shopping list test] SET [gene name] = Trim([gene name]);

If, after running those queries, you find that your original query finds
matches, then we can proceed to adjust your query so that it returns
"substring" matches.

Awesome! After running your suggested update queries I did get the
result I expected. It took some work figuring out how to run the
update (I had to add my database location to the trusted folder then
restart Access before it allowed me to run it). I had thought to check
for extra whitespace but didn't find any during a manual search. It
seems as though Access only shows spaces to the left of the text in a
cell but not to the right?

So if you can fill me in on how to do the substring matching I'd
appreciate it!

Thanks!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top