Eliminating duplicates

S

Stapes

Hi

I tried running this query where TM_Customers table might have more
than one match on it, but I only want one record output, so I set URN1
to indexed - no duplicates, this being unique to each record on the
input.

INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd;

The query returned no records - just an error message:

3022 The changes you requested to the table were not successful
because they would create duplicate values in the index...

How can I get it to just select the first TM_Customer record that
matches each TTemp_TextImport record?

Stapes
 
S

Stapes

Depends what you mean by "first". The notion that there is a "first" record
implies that they can be arranged in a meaningful sequence. What is it?




I tried running this query where TM_Customers table might have more
than one match on it, but I only want one record output, so I set URN1
to indexed - no duplicates, this being unique to each record on the
input.
INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd;
The query returned no records - just an error message:
3022 The changes you requested to the table were not successful
because they would create duplicate values in the index...
How can I get it to just select the first TM_Customer record that
matches each TTemp_TextImport record?
Stapes- Hide quoted text -

- Show quoted text -

It doesn't have to be in any sequence - I just want one record
returned per input record and no more.
 
B

Baz

Depends what you mean by "first". The notion that there is a "first" record
implies that they can be arranged in a meaningful sequence. What is it?
 
B

Baz

Stapes said:
Depends what you mean by "first". The notion that there is a "first" record
implies that they can be arranged in a meaningful sequence. What is it?
INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd;
The query returned no records - just an error message:
3022 The changes you requested to the table were not successful
because they would create duplicate values in the index...
How can I get it to just select the first TM_Customer record that
matches each TTemp_TextImport record?
Stapes- Hide quoted text -

- Show quoted text -

It doesn't have to be in any sequence - I just want one record
returned per input record and no more.

Yes, but on what basis? Each unique TTemp_TextImport.Postcode can
potentially match with multiple TM_Customers records, and we can't simply
eliminate duplicate results because there aren't any: you have guaranteed
that by including the primary key from TM_Customers in the SELECT list.

So you are going to have to lose some of those TM_Customers records, but on
what basis? And what on earth will the results mean? You will finish up
with only a subset of your TM_Customers details, chosen on a presumably
pseudo-random basis.

It seems to me that you need to describe the business problem in order to
get help with this.
 
S

Stapes

Depends what you mean by "first". The notion that there is a "first" record
implies that they can be arranged in a meaningful sequence. What is it?
<snip>






INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd;
The query returned no records - just an error message:
3022 The changes you requested to the table were not successful
because they would create duplicate values in the index...
How can I get it to just select the first TM_Customer record that
matches each TTemp_TextImport record?
Stapes- Hide quoted text -
- Show quoted text -
It doesn't have to be in any sequence - I just want one record
returned per input record and no more.

Yes, but on what basis? Each unique TTemp_TextImport.Postcode can
potentially match with multiple TM_Customers records, and we can't simply
eliminate duplicate results because there aren't any: you have guaranteed
that by including the primary key from TM_Customers in the SELECT list.

So you are going to have to lose some of those TM_Customers records, but on
what basis? And what on earth will the results mean? You will finish up
with only a subset of your TM_Customers details, chosen on a presumably
pseudo-random basis.

It seems to me that you need to describe the business problem in order to
get help with this.- Hide quoted text -

- Show quoted text -

OK.

I have a table of Customers and an import file of customers and
contacts to add. The imported records are compared against the
existing data to see if the records are already on the database. This
is done by matching School Name, Address Line 1, Town & Postcode.
Because there are so many different spellings and versions of
individual Names & addresses, often all that matches is the Postcode.
These records are then displayed to the user, allowing them to OK them
if they match. Occasionally one record on the import file matches with
2 or more records from the database. The user has gone & ticked 2 or
more of them as being correct matches - and was then surprise to find
2 of everything on his database. I only want the user given more than
1 match. The remaining unmatched records will be presented in the next
screen.
I propose to build a seperate linked table of alternative names &
addresses to eliminate this problem.

Stapes
 
B

Baz

Stapes said:
INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd;
The query returned no records - just an error message:
3022 The changes you requested to the table were not successful
because they would create duplicate values in the index...
How can I get it to just select the first TM_Customer record that
matches each TTemp_TextImport record?

OK.

I have a table of Customers and an import file of customers and
contacts to add. The imported records are compared against the
existing data to see if the records are already on the database. This
is done by matching School Name, Address Line 1, Town & Postcode.
Because there are so many different spellings and versions of
individual Names & addresses, often all that matches is the Postcode.
These records are then displayed to the user, allowing them to OK them
if they match. Occasionally one record on the import file matches with
2 or more records from the database. The user has gone & ticked 2 or
more of them as being correct matches - and was then surprise to find
2 of everything on his database. I only want the user given more than
1 match. The remaining unmatched records will be presented in the next
screen.
I propose to build a seperate linked table of alternative names &
addresses to eliminate this problem.

Stapes

I use this for deduplicating names and addresses:

http://www.dedupeit.co.uk/

It's not great in usability terms (there's a trick or two you need to know
about, which I would be happy to pass on to you) but the actual
deduplication engine does a great job.

As for your immediate problem, what I would do is to import the new records
to a table, and build a form which showed each new record in turn as a
master record in form view, and listed the matching existing customer
records in a subform in datasheet (or continuous forms) view. The subform
would obviously be linked to the main form on postcode.

On the main form would be a couple of command buttons: one of them would be
"No Match", which would update the master record to indicate that it was
new. The other would be "Match", which would update a field on the master
record with the primary key of the currently-selected subform record. Since
there's only one field for this purpose, each imported record can only be
matched with one existing record.

Even more immediately, you could make your query work like this:

INSERT INTO TTEmp_AllForPostCodeCheck ( PK_Comp, URN1, [School Name],
Co, [Address 1], Add1, [Address 2], Add2, Pocd, Postcode, TOG_OK )
SELECT TM_Customers.PK_Comp, TTemp_TextImport.[School URN],
TTemp_TextImport.[School Name], TM_Customers.Co, TTemp_TextImport.
[Address 1], TM_Customers.Add1,TTemp_TextImport.[Address 2],
TM_Customers.Add2, TM_Customers.Pocd, TTemp_TextImport.Postcode, 0
FROM TTemp_TextImport INNER JOIN TM_Customers ON
TTemp_TextImport.Postcode = TM_Customers.Pocd WHERE
TM_Customers.PK_Comp = (SELECT Max(C2.PK_Comp) FROM
TM_Customers AS C2 WHERE C2.Pocd = TM_Customers.Pocd);

I hope you haven't got too many records though, because correlated
subqueries like this are sloooooow.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top