Help, trying to merge and purge

G

Guest

Hello

I've been learning Access for my job and I have been doing quite well on my
own, although I have run into a problem.

I have to update our database with new records, and the way my boss wants to
do this is take our new records (which are already in .mdb format of course)
and merge them with the record we currently have. Then he wants me to purge
all of the duplicate records, so that our new database as our old records and
our new records without any duplicates.

Would anyone a good way to go about this?

Thank you.
 
A

Arvin Meyer [MVP]

The problem is simple if there's only 1 table, but get complex with more
than 1. If just 1 table, merge the records and run the find duplicates
query. Then delete 1 of the duplicate records.

If more than 1 table, you need to use separate tables and write a query to
identify the *possible* duplicates. Then you need to add a column to 1 table
to mark those that are duplicates. Then run a query to identify and delete
the related records. Then delete the marked records.

Now you can add another empty column to the table you are merging, naming
that column to the existing key and renaming your key on the table to merge
to something else. Now append the records and use the old key to add a new
foreign key to the many-side records.

Make several copies of everything before you start because you will
undoubtedly need them.
 
G

Guest

Hi Reno

It all depends on if you are talking about a few records or lots.

If you are only talking about a few you could use a form to check if the
records are already on file and if they have a look at them just to make sure
and they are not then run an append query to add them.

If you're talking about lots of records you could still use an apend but i
would not use a form to check (could take forever). Just run the append with
some validation (like checking on post codes/zip code, 1st name + surname).

Or best of all (but unlikely) if they (both tables) have autonumber ID just
disallow duplicates (this will never happen in real life though - just access
courses :) )

Either way you need to check if the records are already in the DB and if
they are not then append them.

It's quite simply to do once you decide on which approach to take.
 
G

Guest

Thanks for all of the replies:

To answer a few questions about some details;

I'm in marketing, we sell business leads and such, and our database is
filled with records for each state. So we have a table for each state of
course, labeled like AK Ct CA etc etc. Each of those tables have about 1.5mil
records or more.

Now, we have new new records already put into files named after the state
they are located in, so what I'll need to do is that our new state records
(which like I said are about 1.5mil or more per table) and merge them with
our current state tables (Which are about the same amount of records as I
said.)

How complex would that be? And since these are leads they are basically all
names and addresses. Also, do you think it would take a good amount of time
to purge all of the dupe records?

Again thank you for all the replies this is really helping me out a lot
 
G

Guest

Hi

1st of all MAKE A COPY of your database (just in case it all goes wrong).
Work on the COPY until you're happy with the results.

Next - you need to work out what makes a record "unique".
As an example if you had records showing each room in your hous then the
name of the room would be unique (bedroom 1, bedroom 2, Basement, etc) so if
you want to merge another table with this one you would need to disallow new
records with say Bedroom 1.

In your case you need to know (before you start) what you "don't want. As
an example - you have records for lots of people in New York state. OK then
you don't want to get rid of everyone in New Yorks State. In this state you
have a company called Reno's Databases Inc. If Reno's Databases Inc is also
on the other database you "may" not want it in the primary table for New York
State - BUT make sure that there is not a Reno's Databases Inc in another
state's table that you "do" want.

So in this case you will need to create a query (append) and bring in the
New York State table from your primary database (the one you want to keep)
and also a "COPY" of the other table containing data from New York State from
the other database. So you now have 2 tables showing in the design.

Oh - forgot to say, there are other methods to do this but it may be an idea
just to copy the table from the outside database and paste it into you
primary database (call it something like tblNewYorkState2)

Next drag the name field (the field containing Reno's Databases Inc) from
tblNewYorkState2 into the query (top row) and in the Append To row select the
name field of your primary database. In the crieria row (I like to keep
things simple) just put
Not Like
and then then insert the name field from your primary database.

Of course you will need to append all the fields from the external database
into the primary so you will need to include these into the query.

Give it a try on a COPY DB and if you are not happy with the result delete
the copy, make a new one and do it again.

Hope this helps
 
A

Arvin Meyer [MVP]

Now, we have new new records already put into files named after the state
they are located in, so what I'll need to do is that our new state records
(which like I said are about 1.5mil or more per table) and merge them with
our current state tables (Which are about the same amount of records as I
said.)

How complex would that be? And since these are leads they are basically
all
names and addresses. Also, do you think it would take a good amount of
time
to purge all of the dupe records?

By my calculation each table would use approximately 400 MB of space. You
effectively want to double that with each new table. Access/JET cannot
handle that many records, neither can SQL-Server Express or any of the free
versions of SQL-Server (MSDE, etc.)

With Jet, you may be able to get around the limitation by creating a
separate back-end database for each state, but you still have a year or two
at best before you strain the system and corrupt your data. In addition,
this type of design is a kludge, designed to temporarily save a small amount
of money, which is inefficient and extremely poor database design.

I've done work on subsets of large amounts of this type of data for economic
development commissions and we wound up using a full version of SQL-Server
because of the amount of data we had to consume. Considering that you supply
these types of clients, your system demands or even greater. Access is still
an excellent front-end, but you'll need a professionally designed database,
done by someone who builds databases, not just a programmer who happens to
write code for databases. Because you'll need a server based back-end, you
may also need an administrator. A good DBA should be able to accomplish the
tasks to have in mind.
 
G

Guest

Hi Arvin

Yes - you're right. I just worked it out. That would be 156million records
- assuming there no duplicates. Even if there were only 5% duplicates in
each table the DB would at any time be working with upwards of 78,075,000
records -
way above my level :) as I would put meself firmly in the category of
a programmer who happens to write code for databases

--
Wayne
Manchester, England.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top