Inner join with some null values

J

Jim

Hi all,

Here's a mystery I need to find a work-around for. Maybe someone has a
simple solution.

As part of a de-duping process, I'm finding duplicates through a simple
aggreate (GROUP BY) query. For instance, I'm grouping a mail list (MAIL) on
last name, first name, and zip, and returning the max of an id field (pkey).
I them make a temporary table out (TEMP) of the results (where the count of
the group is more than 1).

Now, I want to find *all* the records in each duplicate set so I can process
them further. So, I join TEMP back onto MAIL using last name, first name,
and zip (not, of course, the ID)

Say there are 5 last names, first names, and zips identical, and so we have
a group of five, with one of them having a max id (say 200). The max id
record becomes a row in the TEMP table. Now if I inner join TEMP back onto
to MAIL on the name and zip fields of id#200, I should get back 5 records,
including 200. This is what I want and it indeed works unless...

Unless one of the fields contains a null. Say in the original aggregate
query, the group of five are grouped on a null zip along with first name and
last name. Although Access will group by null, it won't include the null
values in the inner join.

So, I have a bunch of sets of duplicates, represented by the max id of the
group, but can't find the other records in the group if a field is null
because Access won't join on it. (It's as if Access is willing to identify
"a" as null and "b" as null, but it refuses to say "a=b".)

Any ideas?

Thanks much!
 
A

Allen Browne

In your conclusion, you say:
It's as if Access is willing to identify "a" as null and "b" as null,
but it refuses to say "a=b".

That's exactly what's going on. Two nulls are not equal. For an explanation
of why, see:
Common errors with Null
at:
http://allenbrowne.com/casu-12.html
Error #5 is the one you are talking about.

Provided you are willing to switch your query to SQL View, you can address
this issue. You will see a SQL statement like this:
SELECT ...
FROM temp INNER JOIN mail
ON temp.LastName = mail.LastName
AND temp.FirstName = mail.FirstName
AND temp.Zip = mail.Zip
WHERE ...
ORDER BY ...

You can modify the JOIN so that it explicitly handles nulls the way you
want, e.g.:
ON (temp.LastName = mail.LastName OR
(temp.LastName Is Null AND mail.LastName Is Null))
AND (temp.FirstName = mail.FirstName OR
(temp.FirstName Is Null AND mail.FirstName Is Null))
AND (temp.Zip = mail.Zip OR
(temp.Zip Is Null AND mail.Zip Is Null))

Notes:
=====
1. Watch the brackets when mixing ANDs and Ors.

2. Access will not be able to show you the query in design view after you
make this change.

3. Using Is Null will be better than Nz() for this reason:
http://allenbrowne.com/QueryPerfIssue.html#IsNull

4. Not sure this is exactly what you intend but it should give you a path to
get there. For example, the above doesn't match records if *one* of the
tables has a null zip code and the other one has one: changing the innermost
AND to OR could solve that, but may result in too many false positives.
 
J

Jim

Allen,

Thanks so much, and interesting comments from AccessMonster too.

Question, I got the query to work as you suggested, placing the IIF in the
join clause. Also got it to work in a where condition as well. Do you have a
suggestion as to which is preferable? I am making the assumption that Access
reduces both methods to the same sequence of actual operations anyway.

Peace,
Jim
 
A

Allen Browne

Right: VBA knows how to treat the values, as variants have a subtype. JET
cannot know that: since a VBA function can return a variant of different
subtype each time it's called, and JET can have only one data type for the
entire column, the TEXT type is the only safe choice JET can make.

As you say, sometimes it is able to infer what you want and implicitly
typecast so it still works. But you will find numerous scenarios where this
doesn't work reliably: in the worst cases, you just get wrong results
without any error. If you strive for reliable software, you're not going to
be content with that.

--
Allen Browne - Microsoft MVP. Perth, Western Australia

Reply to group, rather than allenbrowne at mvps dot org.

June7 via AccessMonster.com said:
Something else to consider regarding Nz as I have used in queries.
Calculations in query field involve addition of several Nz function
results
with the '+' operator. So even if the query considers the function result
a
string, the values are still summing. Now whether the final sum is a
string
or number is unclear. Depending on the query, the results are either
output
in report textbox or pulled by a graph RowSource query. Again, everything
is
working.
Interesting. I use Nz in several long and complex queries, 3 are Unions,
without issues; also in numerous textboxes and VBA code. So I read Allen
Browne's article. He states "The Nz() function replaces Null with another
value (usually a zero for numbers, or a zero-length string for text.) The
new
value is a Variant data type, and VBA tags it with a subtype: String,
Long,
Double, Date, or whatever.

This is great in VBA: a function can return different subtypes at
different
times. But in a query, a column can be only be ONE data type. JET
therefore
treats Variants as Text, since anything (numbers, dates, characters, ...)
is
valid in a Text column.

The visual clue that JET is treating the column as Text is the way it
left-
aligns. Numbers and dates display right-aligned."

So I tested the Nz function in the Access VBA immediate window and got the
following:
x=null
?Nz(x,0)
0
?Nz(x,"0")
0
x=1
?Nz(x,0)
1
x="1"
?Nz(x,"0")
1

So yes confirms that in Access VBA is okay. I will accept his authority on
what is happening with the results in queries. However, mine are working
(and
these uses involve calculations on value returned by Nz function) so not
going to mess with but good to be aware of.

The only issue I encountered with Nz was when I wanted Excel to connect to
Access project and pull data from query. Wouldn't work with Nz anywhere in
the project. When I replaced all instances with in-line-if it worked. Also
tested Nz in Excel VBA and would not work.
In your conclusion, you say:
It's as if Access is willing to identify "a" as null and "b" as null,
[quoted text clipped - 45 lines]
 
D

David W. Fenton

There's way to many factors to give you an answer that applies to
all situations, but - in general - JET is more efficient with a
JOIN than a WHERE clause.

That's the first bullet point under:
http://allenbrowne.com/QueryPerfIssue.html#Other

I find this an interesting article, as whenever I've used SHOWPLAN
to evaluate explicit JOINs vs. implicit joins (i.e., the equivalent
WHERE clause), Jet optimized them identicallly. Did you do this with
your test cases? If so, I'd be interested in exactly which
circumstances Jet optimizes the implicit join differently than the
explicit.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top