how to get unique resultset

D

djc

table1:
MainID
Field1
Field2

table2:
MainID
Field3
Field4

Lets say I join them on MainID... my result set will only contain the fields
from table1 but my criteria will be checking values in table2. From my query
I can join the tables and check my criteria in the WHERE clause (say, on
Field3) to return a result 'close' to what I need. That result set can have
duplicate MainID's because table2's MainID is not the primary key alone.
table2's primary key is a combo of MainID and Field3. So there are several
records in table2 that have the same MainID. So what I need is only one
occurence of each record from table1 based on MainID returned from this
query...

this should make it more clear. Here is a sample result of my current inner
join query with WHERE clause testing field3 in table2.

table1.MainID, table1.Field1, table1.Field2, table2.Field3
100, some data, some more data, matching data
100, some data, some more data, different matching data that also met
criteria
100, some data, some more data, yet another different match for the criteria

my final result set for this match needs to have only one occurence of the
table1 information like this:
100, some data, some more data

anyone? I rushed this so I apologize for what is probably a very unclear
post. If I don't figure it out or get some input I'll repost with better
attention and detail :) Actually, as I just reread this I think I know the
answer but I will post to see what others might say anyway... I should be
able to use the DISTINCT keyword here right? Originally I thought I couldn't
since it applies to all fields but since I can use the criteria fields in
the WHERE clause without them being in the SELECT part of the statement then
DISTINCT should work.. I'll go try..

any input is appreciated. Thanks.
 
T

Tom Ellison

Dear djc:

If you do not wish to see more than one row in the results for each
row in table1, then don't join to table2. If you do join to table2,
and there is more than on row in table2 for the key on which you JOIN,
the query MUST show more than one row of results. Using your
information, if there are 2 rows in table2 for one of the rows in
table1 and you require it to display Field3, then there are two rows
from table2 from which you are displaying Field3. It takes two rows
to show these two different values.

You have the alternative to aggregate the columns from table2 or to
write a subquery to choose one of the rows from table2 to display and
then not to display the other row.

If all the rows from table2 for a given MainID are identical, then
DISTINCT would fix this. If there is more than one row in table2 for
a value of MainID then all the distinct values from table2 will
display in separate rows.

After all, how could it possibly do this differently and still perform
the query you told it to perform? If you can't explain how this is
possible, give an example.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
D

djc

My post was unclear... In my sample data below I only showed the Field3 to
illustrate 'why' I would get multiple rows with the same MainID.. I don't
actually need to return that column.. which is why using DISTINCT is working
for me (I think). Here is an example.

SELECT DISTINCT tblMain.MainID, tblMain.Title, tblMain.ShortDesc,
tblMain.CreatedOn, tblMain.ModifiedOn
FROM tblMain INNER JOIN tblKeyphrases ON tblMain.MainID =
tblKeyphrases.MainID
WHERE tblKeyphrases Like '*someCriteria*'

1) this seems to be returning what I want... do you see any issue with it?
would you reccomend a different way perhaps?

2) in your reply you first said "If you do not wish to see more than one row
in the results for each row in table1, then don't join to table2". I am
confused by that. How can I use fields in the second table (tblKeyphrases)
as criteria for a WHERE clause if I don't join them?

I am in and out of things that require me to construct sql queries so I
guess you could say I don't have a real solid grip on some of the
fundementals. I usually just try to tackle problems as they come up...
usually with the help of people such as yourself!

thanks for the reply!
 
T

Tom Ellison

Dear djc:

If you use any of the columns in tblKeyphrases other than MainID
(which you could have used from tblMain anyway) and there is more than
one row in tblKeyphrases for that value of MainID (which you indicated
there is) then the only way the query can return the information from
multiple rows of tblKeyphrases is to return multiple rows for the
value of MainID.

I see you are only using tbkKeyphrases to filter out those rows from
tblMain where there does not exist at least one row from tblKeyphrases
with the same MainID value matching your criteria. Doing this with an
INNER JOIN and using DISTINCT is one way of doing that. It was not
until you posted this that I realized you were not SELECTing any
columns from "table2".

Another way, which I prefer, would be like this:

SELECT MainID, Title, ShortDesc, CreatedOn, ModifiedOn
FROM tblMain
WHERE EXISTS (SELECT * FROM tblKeyphrases
WHERE tblKeyphrases.MainID = tblMain.MainID
AND tblKeyphrases.SomeColumn Like '*someCriteria*')

This expresses the thought more directly (in my opinion) and is less
likely to be confused later on when you don't remember everything
about the original. It avoids the DISTINCT by simply not creating the
duplications, which generally seems more efficient than creating
duplication then removing it.

It seems unusual to have a column named tblKeyphrases in a table named
tblKeyphrases, as your code indicates. That is, unless the column
tblKeyphrases is in tblMain instead, but that would be the whole point
of needing to join to the table tblKeyphrases. Well, whatever.

I hope this helped.

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
D

djc

1) Thank you. I will try your preferred method. I have never used the EXISTS
keyword... along with probably 70% of the sql language, access specific or
other. hehe. But with every task I try to do I usually learn a new one.
Thanks again for that.

2) the tblKeyphrases was a typo.. its just missing the .columnname after the
table name. whoops.

Thanks again!
 
D

djc

Hey Tom... I have very delayed question: see inline.

Tom Ellison said:
Dear djc:

If you use any of the columns in tblKeyphrases other than MainID
(which you could have used from tblMain anyway) and there is more than
one row in tblKeyphrases for that value of MainID (which you indicated
there is) then the only way the query can return the information from
multiple rows of tblKeyphrases is to return multiple rows for the
value of MainID.

I see you are only using tbkKeyphrases to filter out those rows from
tblMain where there does not exist at least one row from tblKeyphrases
with the same MainID value matching your criteria. Doing this with an
INNER JOIN and using DISTINCT is one way of doing that. It was not
until you posted this that I realized you were not SELECTing any
columns from "table2".

this seems right on...
Another way, which I prefer, would be like this:

SELECT MainID, Title, ShortDesc, CreatedOn, ModifiedOn
FROM tblMain
WHERE EXISTS (SELECT * FROM tblKeyphrases
WHERE tblKeyphrases.MainID = tblMain.MainID
AND tblKeyphrases.SomeColumn Like '*someCriteria*')
I would like to do it this way so I checked out the EXISTS syntax for
subqueries in the Jet sql help file.
This expresses the thought more directly (in my opinion) and is less
likely to be confused later on when you don't remember everything
about the original. It avoids the DISTINCT by simply not creating the
duplications, which generally seems more efficient than creating
duplication then removing it.

this is where I'm confused and I didn't get an answer from the EXISTS syntax
in the jet sql help file. Since the criteria, '*someCriteria*' in your
example, uses wildcards at each end (which is what I want) it seems to me
that it could still return the same type of duplication. For instance if the
criteria were "*comp*" and keyphrases such as "computer science" and
"computer technology" existed then wouldn't I still get the 2 duplicate
MainIDs returned from the subquery?
 
D

djc

yes. Thanks Tom! I understand now.

Tom Ellison said:
Dear djc:

If you have more than one row in tblMain that meets the criteria then
it would show more than one row in the results. This is not what I
thought was the cause of your problem.

If, however, you have more than one row in the subquery from
tblKeyphrases that meets the criteria, it will NOT duplicate in the
query results using the query I gave you.

This syntax will simply test each row from tblMain to see if one or
more associated rows exist in the subquery, and display that row once
no matter how many of these associated rows there are. That is in
contrast to the INNER JOIN, which prepares the query to be able to
display columns from the tblKeyphrases and therefore must "duplicate"
the rows from tblMain in order to display the potentially different
values from tblKeyphrases.

Because of this difference you cannot display the columns from
tblKeyphrases in the syntax I gave you, but that is exactly the point.
By taking away that ability, you take away the duplication. These two
are facilities are inseparable. If you allow the query to display
from the "many" side of the relationship, then you must show the
values from the table on the "one" side of the relationship repeatedly
to give the query a place to display the potentially varying values
from the table on the "many" side. By taking away that ability, you
eliminate the duplication necessarily caused by that ability.

So, does this now match what you want?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 
T

Tom Ellison

Dear djc:

If you have more than one row in tblMain that meets the criteria then
it would show more than one row in the results. This is not what I
thought was the cause of your problem.

If, however, you have more than one row in the subquery from
tblKeyphrases that meets the criteria, it will NOT duplicate in the
query results using the query I gave you.

This syntax will simply test each row from tblMain to see if one or
more associated rows exist in the subquery, and display that row once
no matter how many of these associated rows there are. That is in
contrast to the INNER JOIN, which prepares the query to be able to
display columns from the tblKeyphrases and therefore must "duplicate"
the rows from tblMain in order to display the potentially different
values from tblKeyphrases.

Because of this difference you cannot display the columns from
tblKeyphrases in the syntax I gave you, but that is exactly the point.
By taking away that ability, you take away the duplication. These two
are facilities are inseparable. If you allow the query to display
from the "many" side of the relationship, then you must show the
values from the table on the "one" side of the relationship repeatedly
to give the query a place to display the potentially varying values
from the table on the "many" side. By taking away that ability, you
eliminate the duplication necessarily caused by that ability.

So, does this now match what you want?

Tom Ellison
Microsoft Access MVP
Ellison Enterprises - Your One Stop IT Experts
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top