Access Query

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I have a database of over 2 million lines of data. Each of these lines has
multiple fields (columns) of information associated with it. I am looking for
a way to query/filter so the final result will show duplicates in column A,
B, C. Additionally, I need it to select where Column D does not match.

For example where column A is last name, column B is first name, column C is
address, and column D contains a periodical the individual subscribed to. I
would like to see only instances where the same person, at the same address
has subscribed to different periodicals. Thank you for the help.
 
I have a database of over 2 million lines of data. Each of these lines has
multiple fields (columns) of information associated with it. I am looking for
a way to query/filter so the final result will show duplicates in column A,
B, C. Additionally, I need it to select where Column D does not match.

For example where column A is last name, column B is first name, column C is
address, and column D contains a periodical the individual subscribed to. I
would like to see only instances where the same person, at the same address
has subscribed to different periodicals. Thank you for the help.

A "Self Join" query will do this. Given the size of the database, BACK
IT UP, and then (if you have not already done so) create an Index on
columns A, B and C; it will almost surely not let you create a unique
index but that's not a problem.

Now create a query by adding the table to the query window *twice*.
Join the two instances, joining A to A, B to B, and C to C; select A,
B and C from one of the tables, and D from both of them. As a
criterion on Table_1.D put


so that you'll find only different periodicals (without this criterion
each record will match the other table's instance of the same record).
The > rather than not-equals <> will ensure that you don't find the
pair {Literatunya Gazeta - Harpers} along with {Harpers - Literatunya
Gazeta}.

Note that two people at the same address might be father and son and
have the same name; if you have a Jr. or other suffix include it in
the join as well.

John W. Vinson[MVP]
 
by putting

into the criterion for Table_1.D, I was left with no records. I am wondering
if I was not clear enough in my first question. What I have would look
something like this
A B C D
Jones William 123 Fake St. Harpers
Smith Mathew 133 Fake St. Vogue
Jones William 123 Fake St. Literatunya Gazeta

Additionally I have this information for three years...so if Mr. Jones
signed up for Harpers three years in a row, I have three entries for him.
What I would like the end result to look like would be

A B C D
D_1
Jones William 123 Fake St. Harpers
Literatunya Gazeta

This would leave out Mathew Smith because he only signed up for one magazine
and cut down the number of lines that Mr. Jones shows up on. I'm new at
Access so I hope I've explained myself properly. Thanks again for the help.

J.Scott

John Vinson said:
I have a database of over 2 million lines of data. Each of these lines has
multiple fields (columns) of information associated with it. I am looking for
a way to query/filter so the final result will show duplicates in column A,
B, C. Additionally, I need it to select where Column D does not match.

For example where column A is last name, column B is first name, column C is
address, and column D contains a periodical the individual subscribed to. I
would like to see only instances where the same person, at the same address
has subscribed to different periodicals. Thank you for the help.

A "Self Join" query will do this. Given the size of the database, BACK
IT UP, and then (if you have not already done so) create an Index on
columns A, B and C; it will almost surely not let you create a unique
index but that's not a problem.

Now create a query by adding the table to the query window *twice*.
Join the two instances, joining A to A, B to B, and C to C; select A,
B and C from one of the tables, and D from both of them. As a
criterion on Table_1.D put


so that you'll find only different periodicals (without this criterion
each record will match the other table's instance of the same record).
The > rather than not-equals <> will ensure that you don't find the
pair {Literatunya Gazeta - Harpers} along with {Harpers - Literatunya
Gazeta}.

Note that two people at the same address might be father and son and
have the same name; if you have a Jr. or other suffix include it in
the join as well.

John W. Vinson[MVP]
 
by putting

into the criterion for Table_1.D, I was left with no records. I am wondering
if I was not clear enough in my first question. What I have would look
something like this
A B C D
Jones William 123 Fake St. Harpers
Smith Mathew 133 Fake St. Vogue
Jones William 123 Fake St. Literatunya Gazeta

Additionally I have this information for three years...so if Mr. Jones
signed up for Harpers three years in a row, I have three entries for him.
What I would like the end result to look like would be

A B C D
D_1
Jones William 123 Fake St. Harpers
Literatunya Gazeta


Hrm. That's exactly what I would have expected. Please open your Query
in SQL view and post the SQL here (with real table and fieldnames it
may be easier to figure out what's wrong).

John W. Vinson[MVP]
 
Ok, here it is. Thanks

SELECT Bump.LastName, Bump.FirstName, Bump.Addr2, Bump.Program, Bump_1.Program
FROM Bump AS Bump_1 INNER JOIN Bump ON (Bump_1.Addr2 = Bump.Addr2) AND
(Bump_1.MI = Bump.MI) AND (Bump_1.FirstName = Bump.FirstName) AND
(Bump_1.LastName = Bump.LastName)
WHERE (((Bump_1.Program)>[Bump].[Program]));

John Vinson said:
by putting

into the criterion for Table_1.D, I was left with no records. I am wondering
if I was not clear enough in my first question. What I have would look
something like this
A B C D
Jones William 123 Fake St. Harpers
Smith Mathew 133 Fake St. Vogue
Jones William 123 Fake St. Literatunya Gazeta

Additionally I have this information for three years...so if Mr. Jones
signed up for Harpers three years in a row, I have three entries for him.
What I would like the end result to look like would be

A B C D
D_1
Jones William 123 Fake St. Harpers
Literatunya Gazeta


Hrm. That's exactly what I would have expected. Please open your Query
in SQL view and post the SQL here (with real table and fieldnames it
may be easier to figure out what's wrong).

John W. Vinson[MVP]
 
Ok, here it is. Thanks

SELECT Bump.LastName, Bump.FirstName, Bump.Addr2, Bump.Program, Bump_1.Program
FROM Bump AS Bump_1 INNER JOIN Bump ON (Bump_1.Addr2 = Bump.Addr2) AND
(Bump_1.MI = Bump.MI) AND (Bump_1.FirstName = Bump.FirstName) AND
(Bump_1.LastName = Bump.LastName)
WHERE (((Bump_1.Program)>[Bump].[Program]));

Very odd. This returns no records? Try it without the criterion: do
you at least see each record joined to itself? Can you verify that
there are in fact multiple records in the table with matching
FirstName, LastName, MI and Addr2 fields, but different Programs?
Maybe you could manually duplicate an existing record and change the
Program to verify... maybe there just are no hits!

John W. Vinson[MVP]
 
FYI, the problem involved quotation marks around my criterion. I removed the
quotation marks and found duplicate records. What it did though was show me
record 1and 2 on line one, 1 and 3 on line 2, 1 and 4 on line 3. This ended
up working out ok, I just have to weed through the info a little bit but
still I was left with 2400 lines of data which is far less than I started
with.

John Vinson said:
Ok, here it is. Thanks

SELECT Bump.LastName, Bump.FirstName, Bump.Addr2, Bump.Program, Bump_1.Program
FROM Bump AS Bump_1 INNER JOIN Bump ON (Bump_1.Addr2 = Bump.Addr2) AND
(Bump_1.MI = Bump.MI) AND (Bump_1.FirstName = Bump.FirstName) AND
(Bump_1.LastName = Bump.LastName)
WHERE (((Bump_1.Program)>[Bump].[Program]));

Very odd. This returns no records? Try it without the criterion: do
you at least see each record joined to itself? Can you verify that
there are in fact multiple records in the table with matching
FirstName, LastName, MI and Addr2 fields, but different Programs?
Maybe you could manually duplicate an existing record and change the
Program to verify... maybe there just are no hits!

John W. Vinson[MVP]
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top