Need If-Then-Else or Similar Construct in Test for Null Value



Dana F. Brewer

I have been out of the game for some time. I am importing a text file into a
database that then needs to be fixed up. I need all the records whether they
have addresses or not. Some will have P.O. Boxes instead of street
addresses. Others will have only phone numbers. I want to fix up the data
by combining fields, moving the pobox data to a new field, and finally
creating a new table with the fixed up data.

This message includes current table fields, desired table fields and tests
that need to be performed. At the very end I've listed as far as I got with
the SQL. Please help!

Freshly imported table has fields:

LastName
FirstName
Number (just the address number)
Direction (like S for South, N for North, etc.)
Street (actual street name)
Address2 (contains apartment number, suite number, P.O. Box, etc.)
City
State
Zip
AreaCode
Phone

The new table has the following fields:

Name (firstname + lastname)
Address (number + direction (only if not null) + street OR P.O. Box # moved here)
Address2 (contains apartment number, suite number, - do not show P.O. Box
here)
City
State
Zip
Phone (areacode + phone)


SELECT [FirstName] & " " & [LastName] AS Name, [Number] & " " & [Direction]
& " " & [Street] AS Address
FROM TestImports
WHERE [Direction] Is Not Null

Forever Indebted,

....Dana :)
 

Allen Browne

Hi Dana

You can use Is Null with IIf() to test for null, e.g.:
IIf([Number] Is Null AND [Direction] Is Null, "A", "B")

You can exploit a slight difference between the 2 concatenation operators,
e.g.:
"A" & Null => "A"
"A" + Null => Null
Therefore you can drop the separating space if LastName is null like this:
[FirstName] & (" " + [LastName])
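Putting that &/+ trick to work against the fields you listed, a sketch only
and untested (FullName and StreetAddress are just illustrative aliases):

```sql
SELECT [FirstName] & (" " + [LastName]) AS FullName,
       [Number] & (" " + [Direction]) & (" " + [Street]) AS StreetAddress
FROM TestImports;
```

Because + propagates Null, each parenthesised (" " + field) term vanishes
entirely when that field is Null, so you never get doubled spaces.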

If at all possible, you need to avoid reserved words (such as Name and
Number) as field names. Here's a list of the many names to avoid:
http://allenbrowne.com/AppIssueBadWord.html

The process of normalizing address data is not a simple one, but you might
like to consider these aspects of the issue:
a) One of the basic rules of normalization is to keep each field atomic
(i.e. don't put 2 things in one field.) Therefore a combined name field
might not be a good idea.

b) If one person could have multiple addresses (e.g. street address, PO box,
and possibly others such as a work address), you might need a related table
(one to many relationship.)
 

ken

Dana:

In addition to the points Allen has made, you also have some
redundancy in your
new table by including both City and State columns. Knowing the city
means you know the state, so the latter is redundant in this table.
This is not merely a case of inefficiency; more importantly it leaves
the door open to inconsistent data as the same city could be entered
in different states in separate rows.

What you should have is a numeric CityID column in the main addresses
table. City names can be duplicated, so you need to be able to
distinguish them. You should then have a Cities table with columns
CityID, City and StateID, and a States table with columns StateID and
State. The primary key CityID and StateID columns in Cities and
States can conveniently be autonumbers, but the corresponding foreign
key CityID and StateID in the main address table and Cities must be
straightforward long integer number data types.
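In Access DDL the tables described above might be sketched like this (the
main table is called Addresses here purely for illustration; each CREATE
TABLE must be run as a separate query, and the inline REFERENCES clauses
assume the ANSI-92 query syntax):

```sql
CREATE TABLE States (
    StateID COUNTER CONSTRAINT PK_States PRIMARY KEY,
    State   TEXT(50)
);

CREATE TABLE Cities (
    CityID  COUNTER CONSTRAINT PK_Cities PRIMARY KEY,
    City    TEXT(50),
    StateID LONG REFERENCES States (StateID)
);

CREATE TABLE Addresses (
    AddressID COUNTER CONSTRAINT PK_Addresses PRIMARY KEY,
    Address   TEXT(100),
    CityID    LONG REFERENCES Cities (CityID)
);
```

The LONG columns are the 'straightforward long integer' foreign keys,
matching the COUNTER (autonumber) primary keys they reference.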

Of course the same reasoning in fact means that CityID itself is
redundant in the main table as strictly it's determined by Street (or
rather by a numeric StreetID, as street names are also duplicated).
Most people would probably not normalize a table to that degree
however.

Area and zip codes are, I think, a different matter. Certainly here
in the UK the former and the first (outgoing) part of the latter do
not determine City as they can span city and even county boundaries (a
full post code is very specific on the other hand, and it's common here
for databases to identify a complete address solely by the post code
and house number/name). I imagine that it would be possible to create
tables to model the many-to-many relationships between area codes and
cities, and between zip codes and cities, but I don't know if this is
often done.

Another point to consider is whether you might need to record somebody
by State but not by City. This is unlikely, I'd have thought, but it
illustrates a principle which applies when creating such things as an
international database where each country might not have a regional
structure (e.g. Monaco, Liechtenstein etc) so the 'region' link
between Cities and Countries is broken. What's done is to include
rows in Cities with a value such as 'N/A', one row per country. In
your case there'd be one 'N/A' or 'Unknown' row per state in Cities.
This enables the chain between the main addresses table and the States
table to be maintained even if the city is unknown but the state is not.
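That one-placeholder-row-per-state step can be done with a single append
query (a sketch, assuming Cities(CityID, City, StateID) and
States(StateID, State) as described above):

```sql
INSERT INTO Cities (City, StateID)
SELECT 'Unknown', StateID
FROM States;
```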

You might find the following helpful. It’s a set of queries and
explanatory notes taken from a demo file I produced some time ago to
show how imported data can be decomposed into normalized tables. The
demo uses a simple international contacts list, so there are
countries, regions and cities involved in the addresses. Also each
contact can have more than one employer. You'll see that the basic
principle is first to populate the 'referenced' tables (those on the
one side of a one-to-many relationship) and then the 'referencing'
tables (those on the many side). The latter can include tables which
model many-to-many relationships (sometimes called 'junction' tables
or similar).


1. The query below inserts rows into the Countries table with unique
values of the Country column imported from Excel into the table
MasterTable. This is necessary before rows can be inserted into the
Regions table as that table references the Countries table in a many-
to-one relationship.

INSERT INTO Countries (Country)
SELECT DISTINCT Country
FROM MasterTable;

2. Having inserted rows into the Countries table, rows can now be
inserted into the Regions table with the query below. This joins the
MasterTable to the newly filled Countries table on the Country columns
and inserts unique values from the Region column of the MasterTable
and the CountryID column of the Countries table into the Regions
table.

INSERT INTO Regions (Region, CountryID)
SELECT DISTINCT Region, CountryID
FROM MasterTable INNER JOIN Countries
ON MasterTable.Country=Countries.Country;

3. Having inserted rows into the Regions table, rows can now be
inserted into the Cities table with the query below. This joins the
MasterTable to the newly filled Regions table on the Region columns.
The Countries table is joined to the MasterTable on the Country
columns and to the Regions table on the CountryID columns, thus taking
account of any regions of the same name in different countries. The
query inserts unique values from the City column of the MasterTable
and the RegionID column of the Regions table into the Cities table.

INSERT INTO Cities (City, RegionID)
SELECT DISTINCT MasterTable.City, Regions.RegionID
FROM Countries INNER JOIN (MasterTable INNER JOIN Regions
ON MasterTable.Region=Regions.Region)
ON (MasterTable.Country=Countries.Country)
AND (Countries.CountryID=Regions.CountryID);

4. The previous queries inserted rows into the Countries, Regions and
Cities tables. Following the insertion of data into the last of these,
Cities, it is now possible to insert rows into the Contacts table as
this only needs to reference the Cities table, the relevant Region and
Country being referenced via the relationships between these three
tables. The query below does this by joining the MasterTable to both
the Cities table, on the City columns, and to the Regions table, on
the Region columns. The Cities table is also joined to Regions on
RegionID, and the Countries table is joined to the MasterTable on
Country and to the Regions table on CountryID. This is to take account
of the possibility of two cities having the same name, but being in
different regions, which themselves could theoretically have the
same name but be in different countries, so that the correct CityID
value is inserted into Contacts.

5. For simplicity it is assumed that contacts at the same address
have unique names. This might not always be the case, particularly
with commercial premises (the developer of this demo once worked with
two Maggie Taylors in the same building!). In such cases, however,
there is likely to be some distinguishing value such as Job Title or
Department which could be used.

INSERT INTO Contacts ( FirstName, LastName, Address, CityID )
SELECT DISTINCT MasterTable.FirstName, MasterTable.LastName,
MasterTable.Address, Cities.CityID
FROM Countries INNER JOIN ((MasterTable INNER JOIN Cities ON
MasterTable.City = Cities.City)
INNER JOIN Regions ON (Regions.RegionID = Cities.RegionID)
AND (MasterTable.Region = Regions.Region))
ON (Countries.CountryID = Regions.CountryID)
AND (Countries.Country = MasterTable.Country);

6. The query below inserts rows into the Employers table with unique
values of the Employer column imported from Excel into the table
MasterTable. This is necessary before rows can be inserted into the
ContactEmployers table as that table references the Employers table.

INSERT INTO Employers (Employer)
SELECT DISTINCT Employer
FROM MasterTable;

7. Having inserted rows into the Contacts and Employers tables, it is
now possible to insert rows into the ContactEmployers table, which
models the many-to-many relationship between Contacts and Employers.
The query below does this by joining the MasterTable to Contacts on
the Address, LastName and Firstname columns and to the Employers table
on the Employer columns. The ContactID values from Contacts and
EmployerID values from Employers are inserted into the two columns of
ContactEmployers.

INSERT INTO ContactEmployers (ContactID, EmployerID)
SELECT Contacts.ContactID, Employers.EmployerID
FROM (Contacts INNER JOIN MasterTable
ON (Contacts.Address=MasterTable.Address)
AND (Contacts.LastName=MasterTable.LastName)
AND (Contacts.FirstName=MasterTable.FirstName))
INNER JOIN Employers ON MasterTable.Employer=Employers.Employer;

The above does assume that the imported data is completely consistent,
which might not be the case. If you'd like a copy of the demo itself,
which is made up of an Excel file from which the data is imported and
the Access file which decomposes it using the above queries, mail me
at:

kenwsheridan<at>yahoo<dot>co<dot>uk

Ken Sheridan
Stafford, England

Joey Dee


I've been following this thread with interest, as I'm involved in a similar
undertaking. My impression from the terminology in the OP is that he's based
in the US (Area Code, P.O. Box, Zip Code) and although it is possible his
database might include some foreign addresses, he doesn't seem to be
concerned with that.

My question, however, is: what do you mean by "Knowing the city means you know
the state, so the latter is redundant in this table." Are you suggesting
storing only the city and somehow relying on that datum to indicate the
state?

I must be missing something, because I don't see where this would work,
since, as an example, just about every state in the US has a city, town or
township named "Clinton" and there are thousands of other examples.

What am I misunderstanding?
 

John Spencer MVP

I think Ken was over-normalizing the data for the situation. My opinion.

In part, I believe he was saying you could have a table of Cities and that
table would contain
A primary key (possibly an autonumber)
A city name
A state name

Then you could use the primary key to assign the city and state to an address
by linking the Cities table to the Address using the value of the primary key.

His structure would likely reduce data errors since you could not assign an
address to a City State combination that is invalid. For instance there is
(as far as I know) no Omaha, New York. There is an Omaha, Nebraska and an
Omaha, Illinois and those would be two choices in the Cities table.

John Spencer
Access MVP 2002-2005, 2007-2009
The Hilltop Institute
University of Maryland Baltimore County
 

Joey Dee

AH, thanks for the clarification. In fact, I've been lusting over a zip-code
based table that would allow a user to enter a zip code and have city and
state fields populate automatically.

I guess the fact that zip codes change from time to time would make
maintenance of such a table a nightmare, unless the USPS were willing to
offer updates.
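For what it's worth, if you did get hold of such a table (say a hypothetical
ZipCodes table with Zip, City and State columns), the auto-fill could be a
plain update query against the address table, called Addresses here for
illustration:

```sql
UPDATE Addresses INNER JOIN ZipCodes
    ON Addresses.Zip = ZipCodes.Zip
SET Addresses.City  = ZipCodes.City,
    Addresses.State = ZipCodes.State;
```

On a form you'd more likely do the same lookup per record with DLookup in
the Zip control's AfterUpdate event.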

Thanks again.
 

Joey Dee

They do... last I checked it was $500 for the database. See
http://www.usps.com/ncsc/addressinfo/addressinfomenu.htm?from=zclsearch&page=ais&WT.z_zip4link=AIS

Just looked at the sample for the CityStateZip product; it's a formatted text
file and it's UUUUGGGGGLLLLLYYYYY....

LOL... I had looked at that and nearly passed out trying to parse the data.
Dreamer that I am, I'm just looking for a zip+4 that includes city and state
but that also includes carrier routes (for my neighborhood association's
membership database) but even the $500 for that madness you so rightly call
UUUGGGGLLLLYYY is out of our budget.

There are services that provide that, but again it's the cost.

Thanks, though.
 

Dana F. Brewer

Thanks to all who replied. I don't mean this in a bad way, but the first post
didn't really answer my question, and the subsequent posts were updates to
the first reply which took an entirely different direction.

Could somebody, anybody just help me write a query to do the tasks I asked
for help with? I understand data normalization, field naming convention,
etc. I just typed a quick example so that I could get some help. This data,
once imported, will not be manipulated except to mark the records active or
not active. The data will then be exported for use in another application
where maps are being produced.

The IIF sounds like a good idea, but could you just use the sample SQL I sent
and expand it for me? I am comfortable with testing it and updating it from
there but I'm stuck as to how to construct the statements I need.

Thanks!


John W. Vinson

What you originally posted (if I tracked it back correctly) was:

SELECT [FirstName] & " " & [LastName] AS Name, [Number] & " " & [Direction]
& " " & [Street] AS Address
FROM TestImports
WHERE [Direction] Is Not Null

One question I would have is the appearance of the PO box field in your
import. If your address data is anything typical, you'll have a mix of "PO Box
31", "Box 29", "POB 50", "Bx 20", "P. O. Box 516", "P.O. Box 109", and who
knows how many other variants! Are you lucky enough to have CONSISTENT values
in the field?

One cute trick you can use to handle Directions - incorporating them into the
concatenation if they're non-null, without an extra blank if they aren't - is
to use the + operator in addition to the & operator to concatenate strings:

[Number] & (" " + [Direction]) & " " & [Street]

The + operator returns NULL if either argument is NULL so this will give you
"31140 Circle" or "29275 E. Main" as appropriate.

IF - and as I say, it's a *really big if* - your Address2 field will
consistently contain the string "PO" if it's a PO Box (or your choice of
invariant and reliable identifier), you could set Address to:

IIF(Left([Address2], 2) = "PO", [Address2],
    [Number] & (" " + [Direction]) & " " & [Street])

and Address2 to

IIF(Left([Address2], 2) = "PO", Null, [Address2])

I presume that a record with a PO number in Address2 will not ALSO have an
apartment or unit number.

Parsing addresses is notoriously difficult to automate... and you're seeing
plenty of examples why this is so!
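Pulling those pieces together into one make-table query over your fields
might look like this sketch (TestImportsFixed is an invented name, and the
whole thing still hinges on that "PO" prefix being reliable):

```sql
SELECT [FirstName] & (" " + [LastName]) AS FullName,
       IIf(Left([Address2],2)="PO",
           [Address2],
           [Number] & (" " + [Direction]) & (" " + [Street])) AS Address,
       IIf(Left([Address2],2)="PO", Null, [Address2]) AS Addr2,
       City, State, Zip,
       [AreaCode] & [Phone] AS Telephone
INTO TestImportsFixed
FROM TestImports;
```

Note that Left(Null, 2) is Null, so a record with an empty Address2 falls
through to the street-address branch, which is what you want.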
 

Dana F. Brewer

Thanks. Before I had a chance to see your suggestions I kept plugging away
and this is what I found to work:

SELECT StrConv([iFirstName],3) & " " & StrConv([iLastName],3) AS FullName,
    IIf([iDirection] Is Null,
        [iStreetNumber] & " " & StrConv([iStreet],3),
        [iStreetNumber] & " " & [iDirection] & " " & StrConv([iStreet],3)) AS Address,
    iAddress2 AS Address2,
    IIf([Address2] Is Null, Address, Address & ", " & Address2) AS FullAddress,
    StrConv([iCity],3) AS City,
    iZip AS Zip,
    [iAreaCode] & "" & [iPhone] AS Telephone
INTO tblImportFixedUp
FROM tblRawDataImport;

NOW, if I can just figure out how to compare the records in the
tblImportFixedUp with tblWorkingData and only import records into
tblWorkingData where address+address2 is null.
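That last step is the classic "find unmatched" pattern: LEFT JOIN the
fixed-up table to the working table and keep only the rows where the join
comes up empty. A sketch, assuming tblWorkingData uses the same column
names and that FullAddress is a safe matching key:

```sql
INSERT INTO tblWorkingData (FullName, Address, Address2, FullAddress,
                            City, Zip, Telephone)
SELECT F.FullName, F.Address, F.Address2, F.FullAddress,
       F.City, F.Zip, F.Telephone
FROM tblImportFixedUp AS F LEFT JOIN tblWorkingData AS W
    ON F.FullAddress = W.FullAddress
WHERE W.FullAddress Is Null;
```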

....Dana