Find Duplicate records

G

Guest

Hi,

I am trying to write a query which will spot duplicates. There are times
when the shipping department gets a duplicate sales receipt. The data source
is a csv file that UPS Worldship creates for processing packages. There is a
limitation to the data. What it gives me is ReferenceNumber which we use to
place a sales receipt number, CustomerName, Address, City, State, Zip and
NumberOfPackages. The Customer name is selected from a drop down and it
fills in the rest of the address info. The number of packages is numeric and
usually 1 but sometimes 2 or 3. The problem is that when there are two
packages being sent to someone, it will list the record twice. So the data
for this record would look like this:

ReferenceNumber CustomerName Address City State Zip
NumberPgk
232323 Joe Smith 111 Main Street Atlanta GA
30303 2
232323 Joe Smith 111 Main Street Atlanta GA
30303 2

for all of the shipments that only have one package, there is one record
with number of packages being 1. If there had been a duplicate with
shipments of one package then I could run a cross tab and spot records that
appear twice but because the shipments that have 2 or more packages have
duplicate (or triplicate) info, the cross tab would pick this up as a false
duplicate. If the above record to Joe Smith had been a duplicate, there
would be four records rather than two.

Can someone help?

Thanks,
 
A

Allen Browne

One approach would be to create a query, and set its Unique Values property
to Yes in the Properties box. (You may need to omit the NumberPgk field.)

Another approach would be to GROUP BY the first 6 fields. Depress the Total
icon on the toolbar and you get the total row.

Not sure if you have control over where this data comes from, but the ideal
solution would be to omit the duplicate records in the first place, and
using a technique like this to generate them when needed:
Print a Quantity of a Label
at:
http://allenbrowne.com/ser-39.html
 
G

Guest

Alan,

Thank you for your help. The reason why there are duplicates is because our
customer service department sometimes will print an order twice to our
shipping department. This department gets 100 orders a day and they sometimes
don't identify I am actually trying to identify the duplicates rather than
filter them out. I think I need the number of packages column. If a record
appears twice but the number of packages is 1 in each record than this is a
duplicate. If a record appears twice but the number of packages is 2 then it
is not a duplicate. If a record appears four times and the number of
packages is 2 for each then it is a duplicate. Some I am trying to factor
this in to identify what is a duplicate and what is not.

Any thoughts?

Thanks,
 
A

Allen Browne

Okay: include the NumberPkg field.

Use DISCTINCT (the Unique Values property) or GROUP BY.

If you use GROUP BY, you can Count the primary key field.
Then in the Criteria under this, enter:
This limits it to records that have a duplicate.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top