Charles:
It might help if we go back to first principles and quickly outline how
relational databases model the part of the real world they are concerned with
and how this applies in your case.
The database relational model was originally developed around 1970 by Codd
and has become the dominant model for business databases.. Over the years the
model has been developed by a vast amount of published research, and the SQL
language has similarly undergone ongoing development. The important thing,
however, is that the model is based on firm scientific principles, so when
the 'rules' of the model are respected applications can be relied upon to
work correctly.
A well designed relational database identifies entity types in the real
world and represents each of these as tables. The attributes of an entity
type are represented by columns in the tables. Its important that there is
no redundancy (or at least as little as is possible without losing
information) in the tables and this is ensured by means of an analytical
processknown as normalization. Tables are said to be in 'normal forms' which
range from First Normal Form (1NF) to Fifth Normal Form (5NF). Other normal
forms beyond this have been identified, but these are rather bizarre in
nature and we don't need to concern ourselves with them here. In fact for
the momemnt we need only concern ourselves with the first three.
Normalization is defined in terms of the functional dependencies within a
table (and in the case of 5NF across tables). A column in a table is sid to
be functionally dependent on another column (or columns) if the value in the
column is determined by the value in the other column. If I had CustomerID
42 in your database then that determines my FirstName value as 'Ken' and
nothing else and my LastName value as 'Sheridan' and nothing else because the
FirstName and LastName columns in a Cutomers table are functionally dependent
on the CustomerID column. To be in Third Normal Form all non-key field in
the table must be functionally dependent on the whole of the primary key and
nothing but the primary key. So a Customers table with these three columns
is fine as the two non-key columns FirstName and LastName are dependent on
the primary key CustomerID and nothing else.
Say I go to some shows and you have a ShowsAttended table with columns
ShowName, CustomerID, FirstName and LastName. This table is not correctly
normalized because FirstName and LastName are not functionally dependent on
the whole of the table's primary key (The combination of Showname and
CutsomerID in this case), but on part of it (CustomerID). It would be
perfectly possible to have CustomerID 42 and completely different FirstName
and LastName values to mine in the same row.
If we take out the FirstName and LastName columns from this shows table then
the CustomerID maps back to my row in Customers so my names are always
correct. But the table is still not good enough. Its theoretically possible
there might be two different shows with the same name (this certainly happens
in the cinema), so we don't know which one I actually attended. Also it
would be very easy for the Showname to be entered inconsistently in different
rows (I once found three versions of myself amongst the authors of technical
articles in a database I worked on!). What we need is a Shows table with
ShowID (a unique number) and ShowName columns. In ShowsAttended we the have
CustomerID and ShowID columns, plus columns such as DateAttended. This
ShowsAttended table is actually modelling a special kind of entity type
because it is a relationship type between Customers and Shows modelling the
many-to-many relationship between those entity types.
When it comes to types of shows, this is another entity type so we need a
Categories table with columns CategoryID, Category. A show might be
categorised as more than one type, however, Jazz and Blues say, so we need a
table, ShowCategories, to model the many-to-many relationship between Shows
and Categories, with columns ShowID and CategoryID.
With his set of tables we are able to extract whatever information we need
from the data because it is a solid model of the part of the real world we
are interested in. To find out which customers attended shows categorised as
Blues or Country say we'd join the Customers, ShowsAttended, Shows,
ShowCategories and Categories in a query and restrict that query's result set
to rows where the Category was Blues or Country.
Customer interests regardless of whether they've attended any shows of those
categories is another entity type, so is modelled by another table
CustomerInterests with columns CustomerID and CategoryID. So if Jazz is
category 6 and Blues is category 9 and I' have interests in each there would
be two rows in this table for me with values 42;6 and 42:9. This table can
then also be incorporated in the query so it returns not only customers
who’ve attended shows of particular categories, but also who've expressed an
interest in them.
I hope this has given you an idea of how the sort of real world scenario you
are dealing with can be robustly modelled in a relational database. The way
of modelling it by means of a table with many different Boolean
(Yes/No)columns to represent customer interests is not only fundamentally
wrong in a relational database, but would make it very difficult to extract
useful information. The beauty of a solidly designed logical model is that,
because it is a scientifically sound representation of the underlying
realities, it enables you to extract whatever information those realities can
provide. The tools for doing this, particularly the SQL language, provide a
very powerful and flexible armoury for doing so.
If I were in your shoes and faced with the task of producing a solid
database which will not only fulfil your immediate requirements but provide a
firm basis for adding other functionality, the first thing I would do would
be to design the logical model in terms of the entity types and relationships
between them. I'd do this, not on my machine, but with a good supply of
pencils and paper, drawing out the model diagrammatically, and testing it by
asking whether it can not only give me the right answers, but also whether
there is anything in the design which might lead it to give me the wrong
answers, which is what results from a badly normalized model. Only then
would I build the 'back end' database as a set of empty tables and the
relationships between them along with the 'front end' interface for entering,
editing and querying the database. Finally I would transfer the data from its
present sources using various 'append' and 'update' queries. The exact
nature of these would depend on the final design of the logical model and the
nature of the existing source data, but the SQL statements I posted in my
earlier post show the general principles involved in the first stage,
producing the Customers table itself. Inserting the Shows and Categories
data should not be difficult. The trickier work is in populating the other
tables which represent the relationships between these main entity types.
The process tends to be tedious rather than difficult, however.
Ken Sheridan
Stafford, England
CharlesZ said:
Ken,
I am not sure if I really understood your answer. I am truly a beginner
with Access. Let me clarify my situation a little.
I am not the Marketing Director at my theatre but I think he is pretty
satisfied with the layout of the master list. We have about 100 shows every
year which all fall under about 7 different categories. It would be annoying
to have to do a query for 30 different shows over a period of 4 years if we
were just looking for all of our customers who were interested in bluegrass.
Currently our mail goes to customers who have expressed interest in a certain
type of show. I would like to add to that list people who have bought
tickets to a certain type of show as well as new columns for the best
attended shows. That information is contained in the "Choice List". The
mailing list that we use is the "Mainlist".
I guess I was hoping for a step by step procedure for implementing a sql
code that would merge the duplicates as I said originally:
When [Last] & [Address] is the same
Choose [Phone] with number over [Phone] with no number
And choose yes over no for [Film] [Cabaret] [Blues] or [Theatre]
If [First] has different values then keep [First] with phone number
I recognize that this may be difficult, so I got to thinking about another
way that I could do this. First I would create a find duplicates query which
would show one yes/no column (FILM for instance). Then I would append code
to the original query to find duplicates which have different Yes/No values
for that column. Since one would be yes and one would be no and I would
prefer a yes value over a no value I would simply copy and paste all the yes
values on to the no values and then do a similar query for the next column.
That way all of my data would be uniform across the board and I could start
deleting duplicate values without worrying about lost data.
Hopefully you could help me out with that?
Thank you,
Charles