Comparing two fields with similar (non-identical) content

G

Guest

I have two large tables (Table 1, Table2) that hold information about
Scientific Publications - author, year published, and title. With a query, I
would like to find out which articles are in Table 1, but that are not in
Table 2.

I need only the author and publication date information to identify a single
article. The publicaiton date field shouldn't be a problem, since the dates
will always be formatted identically (e.g. 1959). But there is one problem:
In Table 2, the authors are listed in the following format:

Smith, B.A

....while in Table 1, the authors are listed only by their last name (e.g.
Smith). Is there any way to compare these two tables, and find the missing
articles from Table 2, but while querying for information from these
non-identically-formatted "Author" fields?

To clarify, I want to match Author/Date (Table 1) against Author/Date (Table
2), but am troubled because the Author fields don't contain exactly the same
format. To me, it seems like it could be possible, since both fields will
contain (at least) the last name of the author. But, what do I know? :)

Thanks in advance for your input.

-Simple
 
C

CompGeek78

Why is your information in two different tables?

Generally, it's advisable not to store the same data in more than one
table. Typically, when you break that rule, you run into exactly the
sort of problem you have now. If possible, I would advise reevaluating
the structure of your database to eliminate duplicated data.

Keven
 
G

Guest

Two databases from two different companies. Also, one can be forced into
having more than one table to store the same data (ie. break your rule),
since the size limit for Access 2003 databases is 2 GB (these are much bigger
than that).

Anyway, since they are that big, I can't exactly re-evaluate the structure
of two huge databases, one of which I didn't even make.

In other words, I know the rule, which why I'm asking if there's away around
having to do it all manually (one table has over 90,000 entries).

Thanks.
 
W

Wolfgang Kais

Hello Simple.

Simple said:
I have two large tables (Table 1, Table2) that hold information
about Scientific Publications - author, year published, and title.
With a query, I would like to find out which articles are in Table 1,
but that are not in Table 2.

I need only the author and publication date information to identify
a single article. The publicaiton date field shouldn't be a problem,
since the dates will always be formatted identically (e.g. 1959).

The value of a field does not at all depend of it's format.
But there is one problem:
In Table 2, the authors are listed in the following format:

Smith, B.A

...while in Table 1, the authors are listed only by their last name
(e.g. Smith). Is there any way to compare these two tables, and find
the missing articles from Table 2, but while querying for information
from these non-identically-formatted "Author" fields?

To clarify, I want to match Author/Date (Table 1) against Author/Date
(Table 2), but am troubled because the Author fields don't contain
exactly the same format. To me, it seems like it could be possible,
since both fields will contain (at least) the last name of the author.
But, what do I know? :)

Well, I know that you have some work to do.
You keep mixing up "format" and "value". The authors are represented
by different values (not different formats, which suggests that the
values could be equal).
Of course, you could automatically extract "Smith" from "Smith, B.A",
but there is a chance that there are two authors that have the same
lastname, and that's why matching them by lastname is FORBIDDEN!
You will need to create a new table "authors" that has a primary key
(maybe an "auto value"). You can use that primary key in both other
tables to specify the author in those tables. After that you also
can use this field (authorID) for your initial problem:
Select Table1.* From Table1 Left Join Table2 On Table1.authorID =
Table2.authorID Where Table2.authorID is null.
 
P

Pat Wood

I am only an amateur, but this Query like this worked for me:

SELECT Table1.PKFieldName, Table1.Author, Table2.Author,
Table2.PKFieldName
FROM Table2 INNER JOIN Table1 ON Table2.PKFieldName=Table1.PKFieldName
ORDER BY Table2.Author;

In my query, Table2's PKFieldName was the Parent PKField and Table1 had
the corresponding Child Field.

"ORDER BY Table2.Author" should place your empty fields at the top of
your query datasheet.

You could check your Date Fields like this:

SELECT Table1.PKFieldName, Table1.Date, Table2.Date, Table2.PKFieldName
FROM Table2 INNER JOIN Table1 ON Table2.PKFieldName=Table1.PKFieldName
ORDER BY Table2.Date;

Or, to make it simpler:

SELECT Table1.Author, Table2.Author, Table2.PKFieldName
FROM Table2 INNER JOIN Table1 ON Table2.PKFieldName=Table1.PKFieldName
ORDER BY Table2.Author;

Hope it helps.

Pat
 
P

Pat Wood

I came up with 2 more queries you can try:

SELECT Table2.PKFieldName, Table1.PKFieldName, Table2.JoinField,
Table1.JoinField, Table2.Author, Table1.Author, Table1.Date,
Table2.Date
FROM Table1 RIGHT JOIN Table2 ON Table1.JoinField = Table2.JoinField
ORDER BY Table2.PKFieldName

The "JoinField" is used to establish the "Right Join" and is a
FieldName that both Tables share in common. It does not have to be a
PK.

You can also try:

SELECT Table2.PKFieldName, Table1.PKFieldName, Table2.Author,
Table1.Author, Table1.Date, Table2.Date
FROM Table1 RIGHT JOIN Table2 ON Table1.JoinField = Table2.JoinField
ORDER BY Table2.PKFieldName

I am sure someone else could do it better, but I thought I would try to
help.

I am fairly new at this and may not be using the proper terminology so
if you do not understand my fieldnames I would be glad to explain them.

Pat
 
P

Pat Wood

Oops. I missed this post earlier. Do you want to make the information
in both DBs to match? This many records calls for a different strategy
and Queries.

I have read that in a case like this, it is best to develop the Select
Query that gives you the results you like, then convert it to an update
query. Of course, it would be far better to practice this on a "copy"
of the DB instead of the DB actually being used! Just the thought of
loosing over 90,000 records is too painful to ponder!

Pat
 
P

Pat Wood

Here is a query that should find the Author fields which are not the
same.

SELECT Table2.PKFieldName, Table1.PKFieldName, Table1.Author,
Table2.Author
FROM Table2 LEFT JOIN Table1 ON Table2.JoinField = Table2.JoinField
WHERE Table2.Author Not Like [Table1.Author];

You should be able to use this query to find all the Author fields
where there is no entry:

SELECT Table2.PKFieldName, Table1.PKFieldName, Table1.Author,
Table2.Author
FROM Table2 LEFT JOIN Table1 ON Table2.JoinField = Table2.JoinField
WHERE Table2.Author Is Null;

You may be able to convert the first query into one or even several
different Update Queries. But I am not experienced enough to advise
you on how to update so many records. Perhaps someone else could point
you in the right direction, recommend a book or online article, etc.

Good luck,

Pat
 
G

Guest

Thank you, Pat and Wolfgang, for taking the time out to help me, and for
being patient with an Access beginner. I will attempt to use these
techniques you've provided for me(I've been studying them for a while now).
You have been very helpful, and I am making some progress.

If I have any more questions along the way, I'll be sure to let you know. :)

Thanks again, and take care.

Simple
 
J

John Vinson

I have two large tables (Table 1, Table2) that hold information about
Scientific Publications - author, year published, and title. With a query, I
would like to find out which articles are in Table 1, but that are not in
Table 2.

I need only the author and publication date information to identify a single
article. The publicaiton date field shouldn't be a problem, since the dates
will always be formatted identically (e.g. 1959). But there is one problem:
In Table 2, the authors are listed in the following format:

Smith, B.A

...while in Table 1, the authors are listed only by their last name (e.g.
Smith). Is there any way to compare these two tables, and find the missing
articles from Table 2, but while querying for information from these
non-identically-formatted "Author" fields?

SELECT * FROM Table1
WHERE NOT EXISTS
(SELECT Author FROM Table2 WHERE Table2.Author
LIKE Table1.Author & "*"
AND Table2.PublicationDate = Table1.PublicationDate);

Don't expect this to run rapidly - and of course if there are authors
SMITH, B.A.; SMITH, J.C.; SMITH, M.B.; SMITHERSON, G.W.; ... they'll
all be excluded, probably inappropriately. Given the terms of your
request it's impossible to do this *reliably*, since your Table1
author field doesn't contain enough information to identify the
author.


John W. Vinson[MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top