Hi Mike,
In my initial response, I told you how to find the dupes. I did fail to go into much detail regarding how you would select the ones to delete. That's because I was tired and because there are so many potential constraints to deal with that it's mind-boggling. The most important advice: make backups frequently (before every major alteration) and don't overwrite them - build up a collection, like a janitor's key ring. Label and sort them so you can find the one you need, because it's a near certainty that you will mess up at least once and need to restore something. Keep in mind that you may not realize right away that you've messed up - you could be 3 or 4 steps down the line before you have your epiphany.
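To make the "key ring" of backups concrete, here is a minimal sketch in Python. It assumes your database (or a dump of it) is a single file you can copy while nothing is writing to it; the function name `backup_file` and the label text are made up for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(src, backup_dir, label):
    """Copy src into backup_dir under a timestamped, labelled name.

    Never overwrites an existing backup: if the name is taken, a numeric
    suffix is added instead.
    """
    src = Path(src)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # The timestamp sorts chronologically; the label says why the backup was made.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{src.stem}_{stamp}_{label}{src.suffix}"

    counter = 1
    while dest.exists():  # two backups in the same second
        dest = backup_dir / f"{src.stem}_{stamp}_{label}_{counter}{src.suffix}"
        counter += 1

    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Something like `backup_file("clients.db", "backups", "before-dedupe")` before each major alteration leaves a sorted trail you can walk back through when you discover a mistake several steps later.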
For one thing, if you have related transaction tables (and you probably do), then along with the deletions you have to figure out what to do about the related records. How are you using this data, and how might it be used down the road? More than one person has discarded 'old data' or 'dupe data' only to regret it later - they didn't need the information at the time, but six months or a year later - uh oh.
Then there is the dilemma of the phone and email contact info. The two records may reflect alternate contact numbers or emails, and whichever way you go, if you keep one and dump the other, you will be wrong close to 50% of the time. Murphy's Law ensures that the ones you get wrong will be the ones that matter.
Where you have dupe clients, you may have purchase histories and transactions split across both records. You can almost guarantee that, no matter which one you delete, 50% of the time customer service will need to access the 'orphaned' transaction (for warranty or support). They will search for transactions under that client and have to tell the client, "I can't find your purchase here - do you have your invoice number?" Not fun for the service rep or the client.
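One way to avoid orphaning those transactions is to re-point them at the record you keep before you delete the dupe. The sketch below uses Python's built-in sqlite3 module purely for illustration; the `clients` and `invoices` tables, their columns, and the `merge_client` helper are all hypothetical, so substitute your own schema and database driver.

```python
import sqlite3


def merge_client(conn, keep_id, dupe_id, child_tables):
    """Re-point child rows from dupe_id to keep_id, then delete the dupe.

    child_tables is a list of (table, fk_column) pairs. These names are
    interpolated into the SQL, so they must come from your own schema,
    never from user input.
    """
    with conn:  # one transaction: commit on success, roll back on any error
        for table, fk_column in child_tables:
            conn.execute(
                f"UPDATE {table} SET {fk_column} = ? WHERE {fk_column} = ?",
                (keep_id, dupe_id),
            )
        conn.execute("DELETE FROM clients WHERE id = ?", (dupe_id,))


# Hypothetical demo data: two records for the same client, one invoice each.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, client_id INTEGER, total REAL);
    INSERT INTO clients VALUES (1, 'Pat Jones'), (2, 'Pat Jones');
    INSERT INTO invoices VALUES (10, 1, 50.0), (11, 2, 75.0);
    """
)

merge_client(conn, keep_id=1, dupe_id=2, child_tables=[("invoices", "client_id")])
rows = conn.execute("SELECT id, client_id FROM invoices ORDER BY id").fetchall()
print(rows)
```

After the merge both invoices belong to client 1, so a service rep searching under the surviving record finds the whole purchase history. This does nothing for the phone and email question; you still have to decide what contact info the surviving record carries.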
If these are sales leads, then the dupe might be the result of the prospect being in contact with the company on two separate occasions, maybe talking to two different sales reps, regarding two different products or services. What is the value of this transaction history? If you have a professional sales force, it can be substantial.
If prospect Jones calls back, asking (again) about the same item or a new one, then the level of interest should be considered significantly greater than that of a first contact. The sales rep should make every possible attempt to negotiate a close - the client has a sustained interest in doing business with your company; it's time to find out whatever the hidden objection is and deal with it.
The sales rep could go to an upsell, explore the feasibility of bundling the products or services together with this one - maybe discounting the bundle to sweeten the deal while maintaining a comfortable transaction payout overall. The point being, these bits of extra insight/knowledge are like money in the bank if you have a skilled sales force. Just mentioning that previous contact (in context of the presentation) makes the customer feel that you take him seriously, that he is important to you and that you are a professional - he perceives that you have your 'stuff' together, you know what you are doing. This is a rather huge edge for the sales team.
You have some value judgements to make and research to do before you hit the switch on this one. I'd make a list of every table in the application (and in any applications that link to these tables) that uses the primary key of this table as a foreign key.
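If your database declares its foreign keys, you can get a head start on that list from the schema itself. Here is a rough sketch assuming SQLite, which reports declared foreign keys through `PRAGMA foreign_key_list`; the table names and the `referencing_columns` helper are hypothetical, and other engines expose the same information through their own catalog views.

```python
import sqlite3


def referencing_columns(conn, parent_table):
    """Return sorted (child_table, child_column) pairs that declare a
    foreign key pointing at parent_table."""
    found = []
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    for table in tables:
        # Each row is (id, seq, referenced_table, from_column, to_column, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2].lower() == parent_table.lower():
                found.append((table, fk[3]))
    return sorted(found)


# Hypothetical schema for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES clients(id)
    );
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES clients(id),
        body TEXT
    );
    """
)

refs = referencing_columns(conn, "clients")
print(refs)
```

Treat the result as a starting point only. It finds declared foreign keys, and plenty of applications store the client id in a column with no declared constraint at all, so you still have to read through the schema and the application code by hand.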
When you are done with this, you still won't be done. You will have some dupes where the address is almost identical but not quite: Ste vs PMB vs #, or one address says apt, the next says #, and the third skips the apt number altogether. (That is, unless your entries are being address-corrected in real time - and if they were, you would probably have been catching dupes on entry, right?) On the names side, you may have a middle initial on one entry, none on the second, and the whole middle name on the third. Checking for dupe emails and dupe phones is the best way to nail 98% of the 'rest of the dupes'.
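A minimal sketch of that email/phone check in Python follows. The record layout and helper names are made up for illustration, and the phone normalization assumes 10-digit national numbers, which may not fit your data.

```python
import re
from collections import defaultdict


def normalize_email(email):
    """Trim whitespace and lowercase so trivial variations compare equal."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep digits only, then the last 10 (assumption: 10-digit national
    numbers, so a leading country code such as 1 is dropped)."""
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]


def find_contact_dupes(records):
    """Group record ids that share a normalized email or phone.

    Returns {(kind, value): [ids]} for groups with more than one id.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec.get("email"):
            groups[("email", normalize_email(rec["email"]))].append(rec["id"])
        if rec.get("phone"):
            phone = normalize_phone(rec["phone"])
            if phone:  # skip values that contained no digits at all
                groups[("phone", phone)].append(rec["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records: three spellings of one person plus an unrelated one.
records = [
    {"id": 1, "name": "Pat Q Jones", "email": "PJones@example.com", "phone": "(555) 010-2000"},
    {"id": 2, "name": "Pat Jones", "email": "pjones@example.com ", "phone": "555-010-3000"},
    {"id": 3, "name": "Pat Quincy Jones", "email": "", "phone": "1 555 010 2000"},
    {"id": 4, "name": "Lee Smith", "email": "lee@example.com", "phone": "555.010.4000"},
]

dupes = find_contact_dupes(records)
print(dupes)
```

Here records 1 and 2 are linked by email and records 1 and 3 by phone, even though none of the three names match exactly. The groups are candidates for a human to review, not a delete list: a shared phone can just as easily mean two people in the same household or office.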
Hope this helps,
Gordon