Duplicate records in query

C Tate · Sep 15, 2004

Our database keeps records on homeless people. If they loose
contact with a service they are 'closed'. If they come back they are
're-opened'. This re-opening process basically seems to involve an append
query to create a duplicate record in the 'bio' table. In addition to this
table is another called 'outcomes'.

When these two tables are used together in a query, the re-opening process
seems to cause duplicate entries to appear in the query results - making it
appear that there are two outcomes for a client instead of one, or eight
instead of four. However, no duplicate outcomes appear in the 'outcomes'
table.

I am totally mystified as to why this should happen but it evidently has
something to do with the 're-opening' process and the append query.

[I posted part of this on an earlier thread which seemed to disappear so
apologies if you have read this before]

Tom Ellison · Sep 15, 2004

Dear C Tate:

There would be some kind of key to the bio table on which this join is
built. Apparently the index built on this key is not 'unique'. This
is at the root of the problem. If you eliminate all duplicates in the
table and then make this key unique, the problem will disappear.
Easier said than done!

A significant problem in working with your existing data is this. If
you are joining to the bio table then there is presumably some
information in that table you want to access. If there is more than
one bio in that table for a subject, then that information may not
always be identical for the multiple records of that person. Whether
you're using the data so joined or working to eliminate the
duplication problem, a decision would have to be made as to which of
the conflicting version of that person's bio you wish to use, and
which to discard.

Because your current construction is seemingly creating ambiguity
regarding these referenced rows in the bio table, the database will
not arbitrarily pick one referenced row and ignore the others. We
should be grateful this is the case. It is showing you ALL the
possible information that fits the request the query is making, not
knowing which one it is you want.

If there are two bios for the same person in which the referenced
columns are all identical, it will still show this twice. Making the
query DISTINCT would eliminate unnecessary duplications of this type.
Likely this is the minority of the instances you have of this problem.

The real cause of the problem is in the design and execution of the
fundamental database. We all strive to eliminate such weaknesses
before they become problems that are much more expensive to repair.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

C Tate · Sep 16, 2004

Many thanks. I am sure that you are correct - it sounds exactly what is
happening. I think I can understand why the information in the bio table
will show up but am having difficulty making the mental leap to understand
why the outcomes all appear as duplicates. However, making the query
distinct as you suggest might overcome this problem as I can do nothing
about the design of the database. How does one make a query distinct? Is
this simply using the grouping option?

Tom Ellison said:
Dear C Tate:

There would be some kind of key to the bio table on which this join is
built. Apparently the index built on this key is not 'unique'. This
is at the root of the problem. If you eliminate all duplicates in the
table and then make this key unique, the problem will disappear.
Easier said than done!

A significant problem in working with your existing data is this. If
you are joining to the bio table then there is presumably some
information in that table you want to access. If there is more than
one bio in that table for a subject, then that information may not
always be identical for the multiple records of that person. Whether
you're using the data so joined or working to eliminate the
duplication problem, a decision would have to be made as to which of
the conflicting version of that person's bio you wish to use, and
which to discard.

Because your current construction is seemingly creating ambiguity
regarding these referenced rows in the bio table, the database will
not arbitrarily pick one referenced row and ignore the others. We
should be grateful this is the case. It is showing you ALL the
possible information that fits the request the query is making, not
knowing which one it is you want.

If there are two bios for the same person in which the referenced
columns are all identical, it will still show this twice. Making the
query DISTINCT would eliminate unnecessary duplications of this type.
Likely this is the minority of the instances you have of this problem.

The real cause of the problem is in the design and execution of the
fundamental database. We all strive to eliminate such weaknesses
before they become problems that are much more expensive to repair.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

Our database keeps records on homeless people. If they loose
contact with a service they are 'closed'. If they come back they are
're-opened'. This re-opening process basically seems to involve an append
query to create a duplicate record in the 'bio' table. In addition to this
table is another called 'outcomes'.

When these two tables are used together in a query, the re-opening process
seems to cause duplicate entries to appear in the query results - making it
appear that there are two outcomes for a client instead of one, or eight
instead of four. However, no duplicate outcomes appear in the 'outcomes'
table.

I am totally mystified as to why this should happen but it evidently has
something to do with the 're-opening' process and the append query.

[I posted part of this on an earlier thread which seemed to disappear so
apologies if you have read this before]

Click to expand...

Tom Ellison · Sep 16, 2004

Dear C Tate:

In the SQL View of the query, you can just add the word DISTINCT right
after the SELECT. In the Design View (Design Grid) you can change
query property "Unique Values" to Yes.

This can be done with GROUP BY if you list all the columns. I would
not recommend this, simply because it isn't as direct or a readable.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

Tom Ellison said:
Many thanks. I am sure that you are correct - it sounds exactly what is
happening. I think I can understand why the information in the bio table
will show up but am having difficulty making the mental leap to understand
why the outcomes all appear as duplicates. However, making the query
distinct as you suggest might overcome this problem as I can do nothing
about the design of the database. How does one make a query distinct? Is
this simply using the grouping option?

Tom Ellison said:

Dear C Tate:

There would be some kind of key to the bio table on which this join is
built. Apparently the index built on this key is not 'unique'. This
is at the root of the problem. If you eliminate all duplicates in the
table and then make this key unique, the problem will disappear.
Easier said than done!

A significant problem in working with your existing data is this. If
you are joining to the bio table then there is presumably some
information in that table you want to access. If there is more than
one bio in that table for a subject, then that information may not
always be identical for the multiple records of that person. Whether
you're using the data so joined or working to eliminate the
duplication problem, a decision would have to be made as to which of
the conflicting version of that person's bio you wish to use, and
which to discard.

Because your current construction is seemingly creating ambiguity
regarding these referenced rows in the bio table, the database will
not arbitrarily pick one referenced row and ignore the others. We
should be grateful this is the case. It is showing you ALL the
possible information that fits the request the query is making, not
knowing which one it is you want.

If there are two bios for the same person in which the referenced
columns are all identical, it will still show this twice. Making the
query DISTINCT would eliminate unnecessary duplications of this type.
Likely this is the minority of the instances you have of this problem.

The real cause of the problem is in the design and execution of the
fundamental database. We all strive to eliminate such weaknesses
before they become problems that are much more expensive to repair.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts

Our database keeps records on homeless people. If they loose
contact with a service they are 'closed'. If they come back they are
're-opened'. This re-opening process basically seems to involve an append
query to create a duplicate record in the 'bio' table. In addition to this
table is another called 'outcomes'.

When these two tables are used together in a query, the re-opening process
seems to cause duplicate entries to appear in the query results - making it
appear that there are two outcomes for a client instead of one, or eight
instead of four. However, no duplicate outcomes appear in the 'outcomes'
table.

I am totally mystified as to why this should happen but it evidently has
something to do with the 're-opening' process and the append query.

[I posted part of this on an earlier thread which seemed to disappear so
apologies if you have read this before]

Click to expand...

Click to expand...