Joining 10K rows to 10K rows

C

Commish

I have a need to join 2 tables that are 10K rows each. The purpose is
to verify that the information that is represented in the 2 tables is
consistent. I am comparing input data to extract data that has gone
through an ETL process.

Now, there are some quirks in the data, and I have created a query for
each of the 2 tables to do things like convert null to "NULL" (if both
values are null, they don't join). And I am now joining the queries.

However. by the time that I've added 3 or 4 pairs of fields to the
join, Access has trouble completing the join and presenting it on the
screen. I can do the joins one at a time, but that's not exactly going
to be repeatable with a new round of data. I've compacted and repaired
the database and that doesn't solve the issue.

Can I join 2 queries - each representing about 10K rows each - in
Access on 10 or more fields? Or have I hit Access's practical limits?
 
J

John W. Vinson

Can I join 2 queries - each representing about 10K rows each - in
Access on 10 or more fields? Or have I hit Access's practical limits?

Should be ok. What limits are you experiencing? Do you have indexes on any of
the fields? Are you joining calculated expressions to calculated expressions
(which may well mess things up)? Could you join on a smaller number of indexed
fields (or just primary key to primary key) and use a WHERE condition to
compare the fields?
 
C

Commish

Should be ok. What limits are you experiencing? Do you have indexes on any of
the fields? Are you joining calculated expressions to calculated expressions
(which may well mess things up)? Could you join on a smaller number of indexed
fields (or just primary key to primary key) and use a WHERE condition to
compare the fields?

What limit I am experiencing is that the query hangs and never
completes - eventually Access falls into the Not Responding status.
Or, the query displays the first screen on results, but cannot be
scrolled without timing out - it does not show the count of records
returned in the query.

Unfortunately, much of the information is text - and yes, I am joining
calculated fields to regular fields. In general, I do nt have to join
a calculation to a calculation.

I hadn't considered the indexes (indices?). Let me cut those off and
see what effect it has.
 
J

Jeff Boyce

To join tables, they need to share at least one common field. Do they?

If you are trying to see if records show up in one that aren't in the other,
take a look at the query wizard ... it offers an "unmatched" query.

Regards

Jeff Boyce
Microsoft Office/Access MVP
 
C

Commish

To join tables, they need to share at least one common field.  Do they?

If you are trying to see if records show up in one that aren't in the other,
take a look at the query wizard ... it offers an "unmatched" query.

Regards

Jeff Boyce
Microsoft Office/Access MVP

Jeff, thanks for the tips.

Yes, the tables each have a unique identifier that serves as a primary
key and uniquely identifies each record. I am analyzing data to make
sure that the ETL process is properly coded, so, all 10K records in
the input should have a matching record in the output.

So, the first join works fast. The second one works, but by the time I
have joined on 4 or more fields, the join will no longer complete.

I'm removing the indexes now, and hope that helps.
 
K

KARL DEWEY

Have you considered joining one field and then using all other fields as
criteria? This will give you those that match.
Follow that with a query to display those not pulled in the match.
qryTableMatch --
SELECT Table1.*
FROM Table1 INNER JOIN Table2 ON Table1.FieldX = Table2.FieldX
WHERE Table1.FieldY = Table2.FieldY AND Table1.FieldZ = Table2.FieldZ AND
etc.;

SELECT Table1.*
FROM Table1 LEFT JOIN Table2 ON Table1.FieldX = Table2.FieldX
WHERE Table2.FieldX Is Null;
 
C

Commish

Have you considered joining one field and then using all other fields as
criteria?  This will give you those that match.
Follow that with a query to display those not pulled in the match.
   qryTableMatch --
SELECT Table1.*
FROM Table1 INNER JOIN Table2 ON Table1.FieldX = Table2.FieldX
WHERE Table1.FieldY = Table2.FieldY AND Table1.FieldZ = Table2.FieldZAND
etc.;

SELECT Table1.*
FROM Table1 LEFT JOIN Table2 ON Table1.FieldX = Table2.FieldX
WHERE Table2.FieldX Is Null;

Thanks for all the help everyone.

Removing the indices seems to have worked. The queries are big and
take 10-20 seconds, but I can see the progress bar working and I am
getting the count of records in the results.
 
P

Pieter Wijnen

Also if still no luck, dump the data into temp tables 2 avoid the function
probs.
remember 2 compact when u'r done

hth

pieter

"Commish" <[email protected]> skrev i melding
Should be ok. What limits are you experiencing? Do you have indexes on any
of
the fields? Are you joining calculated expressions to calculated
expressions
(which may well mess things up)? Could you join on a smaller number of
indexed
fields (or just primary key to primary key) and use a WHERE condition to
compare the fields?

What limit I am experiencing is that the query hangs and never
completes - eventually Access falls into the Not Responding status.
Or, the query displays the first screen on results, but cannot be
scrolled without timing out - it does not show the count of records
returned in the query.

Unfortunately, much of the information is text - and yes, I am joining
calculated fields to regular fields. In general, I do nt have to join
a calculation to a calculation.

I hadn't considered the indexes (indices?). Let me cut those off and
see what effect it has.
 
D

David W. Fenton

:
I'm removing the indexes now, and hope that helps.

Removing indexes is going to make things orders of magnitude
*worse*. You need to be ADDING indexes to the fields that you need
to join on, not removing them.
 
D

David W. Fenton

Have you considered joining one field and then using all other
fields as criteria?

Jet/ACE optimizes an explicit JOIN and the equivalent WHERE clauses
exactly the same, so this will gain nothing.
 
D

David W. Fenton

m:
Removing the indices seems to have worked.

This makes absolutely no sense. Indexes speed up joins. Removing
them requires a full table scan. If you join on expressions, those
don't use indexes and also require a full table scan, so your report
here makes absolutely no sense at all.

I'd like to see the SQL.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top