Deleting duplicate parents and grouping children to single parent

J

Jonathan

So here is my question. I'm building a rather simple db consisting of a
parent child and child table in a one-to-many relationship (based on the
Parent's ID and the Child's "ParentID" field). For the intended workflow,data
will be regularly entered by importing or cutting and pasting records from an
Excel worksheet. The worksheet rows will contain both parent data and child
data and I've set up a query so this can proceed smoothly. But the result is
that much of the parent data is duplicated. Is there a way for me to automate
the process whereby, after import, duplicate parent records are identified
(say, by one or two matching fields, perhaps more) and the children of the
duplicate parent records can have their ParentID field reassigned to the
first of the duplicate parent records?

This is a problem I've never dealt with before and intrigues me. My users
all have very little experience so it needs to be simple (one or two clicks
perhaps?)

Let me know if I need to explain further.

AND very importantly, an unfortunately, this has to be implemented in Access
2000, though in 6 months I might be upgrading.

Thanks much,

Jonathan
 
A

Allen Browne

It's not exactly simple, Jonathan.

Firstly, you need a way to identify "duplicate" parents. If the data is
identical, you can use the Find Duplicates query wizard to find those
records for you. But chances are that the data is not identical, e.g.:
title could be Ms or Mrs
Address could be slightly different:
7 Fourth St
7 Fourth Street
7 Forth St
7 Forth Way
And so on. Typically only a human eye can make the decision.

Once you have identified duplicates, the next task it to decide which one of
the 2 to keep, to reassign the children of the other to this one, and then
delete the other. To automate that will require some coding experience.
Essentially you will execute an Update query to change the ParentID of the
affected children, and then exeute a Delete query to kill off the
undesirable parent record. Ideally you will exeucte that in a transaction
(particularly if there are other related tables for the parents.)
 
J

Jonathan

Allen,

Thanks for the response. I figured the building would be complex; by simple
I meant I want to keep the steps and decisions for the user as clear as
possible. And while I think I'm capable of the encoding (at least by using
macros and calculations and so forth, VB is beyond me) all the necessary
steps weren't clear to me.

So, after import, the user clicks a button which opens the find duplicates
query (displayed in a form), if they agree that everything but the first in
each series of duplicates can be trashed, a click of another button can run
first the update query and then the delete query. I think I have to presume
that the older parent record, i.e. the first one in the find duplicate query,
will always be the one kept. None of this would involve programmatically
building additional table or queries (unless I suppose I wanted to save the
deleted parents for safety)? For the user then, this is two steps, while for
me it's three steps or so (at least, three independent queries).

Two things I'm uncertain of, but can probably work through, is the exact
expressions for reassigning the ParentID. I think I can group by the
duplicate fields and then use a MIN function to identify the first ParentID
in the group and reassign all the group to that. Then in the delete query I
can select for all parents that lack child records.

Am I understanding this all correctly? One of the issues is that, while I
have a fair amount of experience with access, I have never had to build in
action queries for regular use by the user.

Thanks,

Jonathan
 
A

Allen Browne

You will need VBA code to do this. You will need to build the action query
as a SQL statement to execute.

To reassign 24 to 18, and then kill 24, this is the basic code:
Dim db As DAO.Database
strSql = "UPDATE tblChild SET ParentID = 18 WHERE ParentID = 24;"
db.Execute strSql, dbFailOnError
strSql = "DELETE FROM tblParent WHERE ParentID = 24;"
db.Execute strSql, dbFailOnError

In practice, you will concatenate the values into the string, perhaps from
text boxes, so the line will be like this:
strSql = "UPDATE tblChild SET ParentID = " & [TextNewNum] & " WHERE ParentID
= " & [TextOldNum] & ";"

If you are automating this to keep the lowest ID value from among
duplicates, you will need to use a subquery. Here's a start on that:
http://allenbrowne.com/subquery-01.html
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top