Design & Normalization question

Ben · Sep 27, 2003

Hi group-
I am working on re-designing a database structure. My goal is to normalize
the structure to more easily allow for future development. Additionally, I
have to do this for Access, SQL and Oracle.
Anyway, here's what I'm wondering - what are best practices for handling
similar entities? For example, my database has people and companies. Both
can have many similar attributes in one to many or many to many
relationships, e.g. they both can have multiple phone numbers and addresses,
two people could have the same phone number and address, an address could be
for a person and a home business etc. At the same time, there are enough
different attributes that each needs it's own table.

So, my current notion is to this: the person and company primary keys are
identity fields. If I increment them each by 2 but start with 1 and 2 the PK
for people would be odd and the PK for company would be even (also w/ a
constraint just for safety). Now the PK values are unique across both
tables. It seems that the joins would work out very well when querying
against one or both tables. Are there any gotcha's with this type of design?
(I am fairly confident that the number of rows in either tables will not
exeed half an integer value).

TIA for any thoughts
Ben

GPO · Sep 27, 2003

I personally think that the primary key should serve one purpose and one
purpose ONLY. That is to uniquely identify a record. Your proposal seems to
be trying to solve a number of problems with PK. I also think you should
have one concept and ONLY one concept per table. To this end you would have
a companies table, a persons table, and a persons/companies junction table.
You may need the junction table because one company could have more than one
"person" and one "person" might work for more than one company (not
uncommon in highly specialised settings). One person might have many phone
numbers so you might also have a phone table with PersonID, PhoneType
(lookup from say fax, cell, switch, pager, direct, home), and PhoneNumber.

This raises an observation of mine. I often encounter situations where
almost any field that we might take for granted at one point as only ever
having one value, at a later point may need more than one value. A
ContactPerson might be called Yolanda Punch from 1/9/2000 to 1/11/2002.
After that she might be Yolanda Plane. The "easy" fix might be just to
change the last name, but you lose information (that may later be useful) on
when the change took place. The complicated solution is to have a names
table that links back to the ContactPerson table, so that you know what
names they used when. This might even apply to gender. Jo might have been
Joe from 1/7/2000 to 12/12/2002 and Jo thereafter. You may have a genuine
need to retain the start and end dates of all these values.

In order to track changes over time, almost any "field", it would seem,
could require it's own table to be built around it. The hard part is making
all these tables work together efficiently.

GPO

TC · Sep 28, 2003

Here is an example of the kind of thought process that I go through when
thinking about design issues.

two people could have the same ... address...

Is it essential for the system to *know for sure* when two people have the
same address?

* If NO, put the address fields in the person table. Who cares if Tom's
address is entered as "1 blah street", but Dick's is entered as "1 blah st"?
(note the slight difference)

* But if the answer is YES, then, you can not afford to put address fields
in the person table. In the above example, the system would not know that
Tom and Dick are at the same address. You would need a seperate address
table, where the address in question was stored *once* (regardless of how it
was spelled), and Tom & Dick's person records each pointed to that (one)
address.

If I increment them each by 2 but start with 1 and 2 the PK
for people would be odd and the PK for company would be even (also w/ a
constraint just for safety).

Er, this is (ahem) "not a good idea at all"!! Can you clarify what you are
trying to achive by that? There has to be a different way...

HTH,
TC

Ben · Sep 28, 2003

Thanks for your thoughts. I think that the more I explain the situation the
closer I get to a good anser.

Er, this is (ahem) "not a good idea at all"!! Can you clarify what you are
trying to achive by that? There has to be a different way...

OK, let me see if I can clarify the goals a bit. What I'm getting at above
is the desire for a key that is unique across two tables (and I'd prefer not
to use GUIDs). Here is an example of why I think I want this. Users need to
be able to search for say a phone number, and get results whether there are
people or company records or both. Now if both the people and company table
have the same values for thier primary key I need either two resolution
tables to phone numbers, one for people one for company, and design a union
query to return all the matches, or I would need some way of identifying in
one resolution tables whether the People/Company ID refers to a person or a
company. The latter route doesn't seem to make any sense in terms of
writting effecient joins, and the former route seems like a lot of extra
work for a lazy programmer and the SQL engine. *BUT* if the PK is unquie
across the people and company tables then one phone number resolution table
will suffice and queries will simply have 2 inner joins, one for the people
and one for the company tables. I sort of came to this idea after looking at
clustering and partitioning tables horizontally and constraining the PK.

My other solution is to have an "entity" table that basically provides the
PK and nothing else. This PK value becomes the PK value in either the people
or company table, keeping the values unique across both tables. This seems
like a bit of extra work too becuase I would need to define triggers on the
people and company tables, add a row in the entity table, get the identity
value and complete the insert on people or company. Since Access does not
have triggers, I would have quite a work around to be able to support an
Access database. This is why I thought letting the DB do the work might be a
better solutions.

Thanks for your thoughts,
Ben

Adrian Jansen · Sep 29, 2003

I presume you have considered having an Address table, with a foreign key
field linked to a Persons table, and a different foreign key field linked
to Companies. Then you can still search the table for addresses, and pick
up whether they are persons or companies, depending on which key field is
present ?
Same would apply to phone numbers.

I too looked at your idea of having a single field table, just supplying
unique IDs to a table holding all the other attributes as records, rather
than fields, but it gets pretty unmanageable for display and data entry -
you need something like a crosstab, but editable to present the data in the
'usual' format.

--
Regards,

Adrian Jansen
J & K MicroSystems
Microcomputer solutions for industrial control

TC · Sep 29, 2003

Ben, I'll read this today & get back to you tomorrow.
TC

Ben · Sep 30, 2003

Thanks for your thoughts,
The reason I'm so interested in getting this design aspect right is becuase
the concept would be used across a number of attributes that are common to
people and companies, and have one to many or many to many relationships.
Having a resolution table with one field for company and one for people
along with the phone/address/etc. seems like another very viable solution.
Have you seen any performance issues with this design? I'm thinking you'd
need two indexes, along with a separate identity key, or have defaults on
the fields so you could do a compound key across all three.

Thanks,
Ben

TC · Sep 30, 2003

Hi Ben

I understand what you say.

This is all that I can think of, to add.

Let's take phone #s as an example.

If a person can have many #s, and an organization can have many #s, and
persons are stored in a different table to organizations, I still ask the
question: is it essential that the system *knows for sure* that a certain
person & organization have precisely the same #?

= If the answer is *yes*, then, the #s must all be stored in the same
table, & you have the problem that you described.

In that case, I'd be included to make the primary key of the # table, the
person or organization's primary key, plus "P" for person or "O" for
orgaization (or whatever, as appropriate). I really would not bother about
the extra SQL coding or work to do it that way. Jet can handle very
heavy-duty queries, as long as each table is primary-keyed correctly.

= But if the answer is *no*, then, there is no real reason not to have a
table of #s for persons, and a sepertate table of #s for organizations. Then
the problem that you describe, does not exist (since it is irrelevent
whether people & organizations have the same primary keys, or not).

However, now:
(a) The "same" # might be entered differently for a person & an
organization; eg. "123 456" for the person, & "123-456" for the
organization. Now the system does *not* "know for sure" that the person &
organization have the same #.
(b) You'd have to use a union technique for queries.

= As a final alternative, you might consider a table with:

tblEntity
entity_ID (PK) (eg. autonumber)
entity_name
entity_type (P=person, O=organization)

Now the #s could be stored in a seperate table, linked by entity_id; and the
extra attributes for a *person* entity could be stored in one table, & those
for an *organization* entity in another table. This way you agin do not have
problems with "clashing" PK values in the #s table.

Hope that all makes sense!

TC

Adrian Jansen · Sep 30, 2003

Companies and people and addresses certainly cause more than enough
headaches, and its very common. There must be a definitive solution !

I used what I said, a field each for companies and people, and it seems to
work ok. Separate primary key for addresses in the address table. Yes you
could use a composite key, but I dont see much point. I suppose you save
one index, and a long integer of storage, but I dont think you gain any
significant performance. Of course both the foreign key fields are indexed.
One advantage, purely at the GUI level, is that you can use the same subform
for addresses on both Company and Person entry and editing forms.

While it looks like you have some data duplication, in that you can have the
same address data for both a person and a company, having two fields allows
you to differentiate between the following conditions:

Person key null
Company key null
Combined, this should never happen, so you can delete any records. I guess
a composite key would never allow this, but it seems a small advantage.

Person key not null
Company key null
So we have a person at an address. Deleteing the person record can delete
the address record.
Moving the person does nothing to a company, even though that person might
'belong' to a company at the same address. I have a separate junction table
to allow M-M relations between people and companies.

Person key null
Company key not null
Company has an address. Similar to the above

Person key not null
Company key not null
Person and company belong to the same address - handles the case of a
single-owner company, where if the person moves, the company address follows
too ( and vice-versa )

You may also need an address type field, so can separate out
postal/street/delivery/accounts etc addresses.

I also thought about putting the person and company IDs in the same field,
and another field to separate out the types, but that gives problems when
you want to have both person and company at the same address. Also you have
to be very careful with deletes. Same would apply to your idea of even and
odd numbering. Sounds like it goes against the rule of having one 'type' of
thing in one field.

--
Regards,

Adrian Jansen
J & K MicroSystems
Microcomputer solutions for industrial control

Ben · Sep 30, 2003

Thanks to everyone for thier thoughts on this topic.
It looks as though most of us have run into this
situation and have come up with a handful of possible
solutions. Its kind of a surprise not to have seen this
issue addressed in books on RDB design.

I'm still not convinced that there is any reason not to
have a key that is unique across more than one table, the
key still uniquely identifies a record, but that key
would still uniquely identify a record even if you
unioned the people & company tables. The example of using
an entities table that holds the unique key appears to be
another route to the same goal - the keys are unique
across both tables, but via a PK table rather than
constraining the keys. In an earlier post I posed this
scheme, but without triggers, inserts on people or
company require a bit of extra work.

So, I'm going to play with both and see which works the
best.

Thanks again for all of your thoughs.
Ben

Tim Ferguson · Sep 30, 2003

Having a resolution table with one field for company and one for
people along with the phone/address/etc. seems like another very
viable solution.

I am with TC on this: I feel you are making life very much more complicated
than it needs to be. As for the approach described here, it is not even
mathematically appropriate.

To expand on TC's post a bit, you have to make up your mind about what you
are going to use these addresses for. If you are going to run queries like,
"show me everyone who lives at this address", or "give me a list of
addresses where more than one person or company is based", then you do
indeed need a single table of 'things-that-live-in-an-address', which will
have subclasses of Person and Company. For more information on subclassing,
google on this group for Rebecca Riordan's posts.

I strongly suspect, however, that you are rarely going to be asking such a
question, so that PersonalAddresses and CorporateAddresses are functionally
entirely entities. Remember that normalisation is about semantics, and
precisely the semantics of the particular business scene that you are
modelling. There are immediate reasons why PersonalAddresses and
CorporateAddresses would be different -- for example, PersonalAddresses
rarely have multiple DirectDialNumbers, and almost never
CarParkChargeAmount.

The question then, is not "how do I model addresses", but "how do I model
_these_ addresses in _this_ context". And for that, of course, you need to
know bags about the context.

Hope that helps

Tim F

Adrian Jansen · Oct 1, 2003

Good summary Tim.

One point I thought of, is that the idea of having even and odd numbers for
Companies and People is essentially an 'intelligent key', and in general
they are a bad idea.

--
Regards,

Adrian Jansen
J & K MicroSystems
Microcomputer solutions for industrial control

TC · Oct 1, 2003

(snip)

To expand on TC's post a bit, you have to make up your mind about what you
are going to use these addresses for. If you are going to run queries like,
"show me everyone who lives at this address", or "give me a list of
addresses where more than one person or company is based", then you do
indeed need a single table of 'things-that-live-in-an-address', which will
have subclasses of Person and Company.

Yes, that is what I meant - but did not say very clearly!

TC

TC · Oct 1, 2003

Good luck with it!

Post-back your results in due course. I'd like to hear your final results &
conclusions.

Cheers,
TC

Tim Ferguson · Oct 1, 2003

One point I thought of, is that the idea of having even and odd
numbers for Companies and People is essentially an 'intelligent key',
and in general they are a bad idea.

I agree completely. And you can't enforce ref integrity with a rule that
says, "this FK should be in this table or that one.."

All the best

Tim F

Ben · Oct 3, 2003

Once again, thank you all for your insights.
Adrian coined a term in a recent post - 'intelligent key' and Tim pointed
out that referential integrity could not be maintained. I would disagree
that referential integirty cannot be maintained, using sequences in Oracle
and identity columns in SQL along with check contraints, I think integrity
could be maintained. I'm also wondering why an 'intelligent key' is that bad
of an idea. For example, if you enable merge replication on a SQL 2K DB,
each table gets an GUID column, this column is an 'intelligent key' for the
replication agent to uniquely identify any row in any table in a given
database, likewise, with Oracle, any row in an entire database (which is
server in Oracle-speak) is uniquely identified by a
database.schema.object.rowid value. These seem to me to be similar
conceptually to what I'm proposing, I want to be able to identify one thing
across the scope of two tables rather than one. Using the Oracle system, it
would be a bit like storing the object.rowid value rather than just the
rowid.

At the same time, I think maybe my whole idea was an ill-conceived attempt
to 'short-circuit' subclassing people and companies, since the only
attributes they really share (for my purposes) are those in one to many or
many to many relationships.
I think I'll play with it, though with much more caution and a more watchful
eye after this discussion, to see if maybe my particular situation could
benefit, and to learn more about why it might be a bad plan.
Thanks again to all for your time and thoughts,
Ben

Adrian Jansen · Oct 3, 2003

I didnt coin that term, its been around as long as relational databases ( at
least ! ). If you do a search on google in this newsgroup using
'intelligent key' as a keyword, you will get lots of discussion history.

--
Regards,

Adrian Jansen
J & K MicroSystems
Microcomputer solutions for industrial control

Ben said:
Once again, thank you all for your insights.
Adrian coined a term in a recent post - 'intelligent key' and Tim pointed
out that referential integrity could not be maintained. I would disagree

.... snip ........

Ben · Oct 4, 2003

Huh, good to know, I just haven't come across it in my reading.
Thanks again for your time and thoughts.
Ben

Design & Normalization question

Ben

GPO

TC

Ben

Adrian Jansen

TC

Ben

TC

Adrian Jansen

Ben

Tim Ferguson

Adrian Jansen

TC

TC

Tim Ferguson

Ben

Adrian Jansen

Ben