Consolidate duplicate entries

T

Tom

Hi

Am rebuilding a book db which lists approx 40k of books, on looking through
a table listing the authors - with fields AuthSName, AuthFName, BookID,
AuthorID - we are noticing numerous duplications of the same authors.
AuthSName and AuthFName are text fields while BookID and AuthorID are
numeric fields.

What I would like to do is consolidate duplicated Authors - AuthSName and
AuthFName - so that they have the same AuthorID.

What is the best way to automate this process?

Any advice/suggestions very much appreciated.

TIA

Tom
 
T

Tom van Stiphout

You do indeed have bad database design, because the Authors table has
a BookID. Rather the Books table should have an AuthorID (if you only
want to record the primary author) or there should be a new table with
BookID and AuthorID (PK over both fields) if you want to record the
(perhaps several) authors for a book.
Enforce all relationships.
Make these changes first; then this "consolidation" is automatically
taken care of.

-Tom.
Microsoft Access MVP
 
K

KARL DEWEY

In case that Tom is not correct in his assumption or that you have corrected
your tables and still need to use a single AuthorID where you have many, try
this.
Create a FixIt table --
AuthSName
AuthFName
OldAuthorID
NewAuthorID

Run a totals append query to populate AuthSName, AuthFName, and OldAuthorID.
Using a select query sorted on AuthSName and AuthFName decide which
OldAuthorID to use for NewAuthorID and edit accordingly. Then you can join
the FixIt table to your table on OldAuthorID to AuthorID and update AuthorID
with NewAuthorID.
 
T

Tom

Hi Karl & Tom

I do very much agree agree with Toms comment about a badly designed db - but
this is originated by a different developer and we have been tasked with its
correction.

I think I have explained myself badly re: the authors table we are looking
to rebuild. The original book table had no AuthorID field but for each book
the AuthSName and AuthFName. There was no AuthID field in the original
table. We have run a make table query to build a Author table -
tblAuthors -using the AuthSName, AuthFName and BookID fields from the
original table. To that table we have added the numeric field AuthID which
is currently unpopulated. Whilst the AuthID can be populated by using
autonumber, on indexing the new Author table on AuthSName we have noticed
that we have many instances of where the same author is the author of
numerous books and therefore this method would produce a different AuthID
for each entry of that authors name.

Therefore we we need to be able to do is populate the AuthID field but where
there are identical entries of AuthSName and AuthFName to assign the same
AuthID number to those identical entries.

The new table - using the AuthID and BookID fields - will then be used to
link a book to its author(s)


Tom
 
J

John W. Vinson

I think I have explained myself badly re: the authors table we are looking
to rebuild. The original book table had no AuthorID field but for each book
the AuthSName and AuthFName. There was no AuthID field in the original
table. We have run a make table query to build a Author table -
tblAuthors -using the AuthSName, AuthFName and BookID fields from the
original table. To that table we have added the numeric field AuthID which
is currently unpopulated. Whilst the AuthID can be populated by using
autonumber, on indexing the new Author table on AuthSName we have noticed
that we have many instances of where the same author is the author of
numerous books and therefore this method would produce a different AuthID
for each entry of that authors name.

Therefore we we need to be able to do is populate the AuthID field but where
there are identical entries of AuthSName and AuthFName to assign the same
AuthID number to those identical entries.

The new table - using the AuthID and BookID fields - will then be used to
link a book to its author(s)

I'd suggest doing this in two steps. You'll be going out on a bit of a limb -
you *could* have books by two different authors who just happen to have the
same name, but that's going to be a problem any way you do it.

First create an Authors table with an autonumber AuthorID field, FirstName,
LastName, any other desired biographical fields to be filled later. Create an
Append query to append JUST ONE first/lastname combination. DON'T include the
BookID. It will be something like

INSERT INTO Authors([FirstName], [LastName])
SELECT DISTINCT AuthFName, AuthLName
FROM yourtable;

Then you'll need a second Authorship table to resolve the many (authors) to
many (books) relationship. This will have fields BookID and AuthorID (Long
Integer if these ID's are autonumbers).

You'ld create an append query joining your table to the newly created Authors
table, joining by AuthFName and AuthLName:

INSERT INTO Authorship(BookID, AuthorID)
SELECT yourtable.BookID, Authors.AuthorID
FROM yourtable INNER JOIN Authors
ON yourtable.AuthFName = Authors.FirstName
AND yourtable.AuthLName = Authors.LName;
 
T

Tom

Thanks John

Will try your suggestions and will post back here if there is any problem.

Tom
John W. Vinson said:
I think I have explained myself badly re: the authors table we are looking
to rebuild. The original book table had no AuthorID field but for each
book
the AuthSName and AuthFName. There was no AuthID field in the original
table. We have run a make table query to build a Author table -
tblAuthors -using the AuthSName, AuthFName and BookID fields from the
original table. To that table we have added the numeric field AuthID which
is currently unpopulated. Whilst the AuthID can be populated by using
autonumber, on indexing the new Author table on AuthSName we have noticed
that we have many instances of where the same author is the author of
numerous books and therefore this method would produce a different AuthID
for each entry of that authors name.

Therefore we we need to be able to do is populate the AuthID field but
where
there are identical entries of AuthSName and AuthFName to assign the same
AuthID number to those identical entries.

The new table - using the AuthID and BookID fields - will then be used to
link a book to its author(s)

I'd suggest doing this in two steps. You'll be going out on a bit of a
limb -
you *could* have books by two different authors who just happen to have
the
same name, but that's going to be a problem any way you do it.

First create an Authors table with an autonumber AuthorID field,
FirstName,
LastName, any other desired biographical fields to be filled later. Create
an
Append query to append JUST ONE first/lastname combination. DON'T include
the
BookID. It will be something like

INSERT INTO Authors([FirstName], [LastName])
SELECT DISTINCT AuthFName, AuthLName
FROM yourtable;

Then you'll need a second Authorship table to resolve the many (authors)
to
many (books) relationship. This will have fields BookID and AuthorID (Long
Integer if these ID's are autonumbers).

You'ld create an append query joining your table to the newly created
Authors
table, joining by AuthFName and AuthLName:

INSERT INTO Authorship(BookID, AuthorID)
SELECT yourtable.BookID, Authors.AuthorID
FROM yourtable INNER JOIN Authors
ON yourtable.AuthFName = Authors.FirstName
AND yourtable.AuthLName = Authors.LName;
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top