Normalizing - HELP!

gweasel · Feb 12, 2007

I cannot for the life of me wrap my brain around this. Maybe I don't
need to do anything for this particular table (and that is what's
making it so hard for me to grasp), but I'm pretty sure I do as a lot
of information repeats.

As I understand it, normalizing table design means you break a table
down into multiple tables so that you don't have any fields in a
single table that repeat. Right?

To help me see this (I've got a ton of information to enter, I just
need to figure out how to start), can someone tell me what I need to
do for this table (using only these selected samples).

Table Name: PaperInventory

Field Names: Info In Field:
PaperName paper weight.papersize.paperclass.papercolor (ex.
20.L.B.W)
PaperWeight weight (20lb, 60lb, etc...)
PaperSize size (ex. letter. tabloid. etc...)
PaperClass (ex. cover, bond, text)
PaperColor color of the paper

Here's a sample of some of the information I've got.

20lb Letter Bond White
20lb Letter Bond Blue
20lb Tabloid Bond White
60lb Letter Bond White
60lb Tabloid Bond White
80lb Tabloid Text White
80# Tabloid Cover White
80# Letter Cover White

As you can see, there are a lot of them that have things in common.
How would I normalize this, or is it necessary to do so?

Jason Lepack · Feb 12, 2007

tbl_PaperWeight:
PaperWeight_id - autonumber - indexed (no dupes)
PaperWeight_name - text - PK

tbl_PaperSize:
PaperSize_id - autonumber - indexed (no dupes)
PaperSize_name - text - PK

tbl_PaperClass:
PaperClass_id - autonumber - indexed (no dupes)
PaperClass_name - text - PK

tbl_PaperColor:
PaperColor_id - autonumber - indexed (no dupes)
PaperColor_name - PK

tbl_PaperInventory:
Paper_id - autoNumber - indexed (no dupes)
PaperWeight_id - number - FK - PK
PaperSize_id - number - FK - PK
PaperClass_id - number - FK - PK
PaperColor_id - number - FK - PK

Join all of the matching fields in relationships to enforce
referential integrity.

To collect your PaperName you can use a query whenever you wish to
collect it.

Cheers,
Jason Lepack

Guest · Feb 12, 2007

It depends on how you are going to use the information.

Your field PaperName contains all the data in the other fields and if you
are only going to sort by this then it is all you need. BUT if you will be
sorting/counting by paper size without caring to color OR sorting/counting by
color and not concerned with size then you need the separate fields and not
the PaperName.

Tim Ferguson · Feb 13, 2007

don't have any fields in a
single table that repeat. Right?

Wrong.

"Repeating fields" is a phrase often used to describe (one aspect of)
non-normal models, but it's not very precise or accurate. Specifically,
this is usually used about tables that have Owner1Name, Owner1Address,
Owner1Name, Owner2Address and so on.

Normalisation is all about making sure that all the fields in the table
relate to the (primary)* key. If you know the VehicleRegistationNumber,
then there is only one value of Make, of Model, of Colour, of
DateOfManufacture, etc. On the other hand, the NumberOfSeats or
MaxKerbWeight are functions of the Make and Model. All Ford Escorts will
have 5 seats and weigh 1022Kg, regardless of the actual vehicle it is.

This kind of cuts out some repetition: you don't need to store Seats and
Weight etc for every car, when you could keep another list of Seats and
Weight for every variety of Make and Model you are interested in. But
it's not about saving disk space: R databases will often be bigger than
their flat-file equivalents, and they will often be slower too.

The difference is in accuracy, otherwise known as integrity. It's no good
having one record of a Ford Escort having six seats and 1499Kg and being
different from all the others. Which one do you believe? For many of us,
having a database that is correct is more important than having one that
is fast. What happens when you remove the last Ford Escort from the
table? You will lose all information about the number of seats and kerb
weight for that make and model - this may or may not be an important
problem. It would be a big problem if a Doctor ceased to exist because he
went away on holiday and all his Appointments were handled by a locum.
Think of an Employees table, with details of ProjectName, ProjectStart,
etc etc: when there is a change in personnel, if the old ones are deleted
before the new people are inserted, then there won't be a Project to add
them to.

This is why non-R models are said to suffer from Insert, Update and
Delete anomalies, and why R models can be _guaranteed_ to provide the
information that they are said to provide. No, it's not always easy, and
that is why good systems analysts get paid the big bucks. It's certainly
not just a system of applying rules like "no repeating fields", I'm
afraid.

Hope that helps

Tim F

[* note for Jamie: I know that it applies to all candidate keys, but I'm
trying to keep it simple.. :-)

]

Jamie Collins · Feb 14, 2007

"Repeating fields" is a phrase often used to describe (one aspect of)
non-normal models, but it's not very precise or accurate. Specifically,
this is usually used about tables that have Owner1Name, Owner1Address,
Owner1Name, Owner2Address and so on.

The difference is in accuracy, otherwise known as integrity.

Assuming your example is indeed non-normal, do you think there is a
case for 'denormalizing for integrity' in Access/Jet?

Employing your example, consider the rule, 'Must have two owners'.
Modelling as two attributes translates to two columns in the same row,
therefore making both columns NOT NULL and adding a record-level
Validation Rule to ensure Owner1 <> Owner2 is a simple and effective
way of enforcing the rule. In fact, considering the many limitations
of Access/Jet (e.g. cannot defer a table-level CHECK constraint while
each row is inserted*) the alternatives I can think of aren't
completely satisfactory either.

* I can INSERT multiple rows in one operation but Jet cannot, it would
seem e.g.

CREATE TABLE Owners
(OwnerID INTEGER NOT NULL UNIQUE
)
;
CREATE TABLE Chattels (
ChattelsID INTEGER NOT NULL,
OwnerID INTEGER NOT NULL
REFERENCES Owners (OwnerID),
UNIQUE (ChattelsID, OwnerID)
)
;
INSERT INTO Owners VALUES (1)
;
INSERT INTO Owners VALUES (2)
;
INSERT INTO Chattels (ChattelsID, OwnerID)
SELECT 1 AS ChattelsID, OwnerID
FROM Owners
WHERE OwnerID IN (1, 2)
;

Now add the rule:

ALTER TABLE Chattels ADD
CONSTRAINT chattels__must_have_two_owners
CHECK
(
NOT EXISTS
(
SELECT C2.ChattelsID, COUNT(*)
FROM Chattels AS C2
GROUP BY C2.ChattelsID
HAVING COUNT(*) <> 2
)
)
;

and test it based on the current data:

UPDATE Chattels
SET ChattelsID = ChattelsID
;

OK so the rule is satisfied. Try removing then re-adding the data
while the rule is in place:

DELETE FROM Chattels;

Oops! The CHECK bites already. The SQL statement should empty the
table and an empty table should not fail the rule. This should
indicate there is a fundamental problem but let's persevere;

ALTER TABLE Chattels DROP
CONSTRAINT chattels__must_have_two_owners
;
DELETE
FROM Chattels
;
ALTER TABLE Chattels ADD
CONSTRAINT chattels__must_have_two_owners
CHECK
(
NOT EXISTS
(
SELECT C2.ChattelsID, COUNT(*)
FROM Chattels AS C2
GROUP BY C2.ChattelsID
HAVING COUNT(*) <> 2
)
)
;
INSERT INTO Chattels (ChattelsID, OwnerID)
SELECT 1 AS ChattelsID, OwnerID
FROM Owners
WHERE OwnerID IN (1, 2)
;

And the CHECK bites once more.

Considering the required approach of DROP CONSTRAINT -> INSERT/DELETE -

ADD CONSTRAINT within a transaction that would lock the table, I can

see why the Owner1, Owner2 attribute approach is tempting.

Normalisation is all about making sure that all the fields in the table
relate to the (primary)* key.

[* note for Jamie: I know that it applies to all candidate keys, but I'm
trying to keep it simple.. ]

LOL!

Jamie.

--

gweasel · Feb 14, 2007

Thanks for the info, and thanks for the detail Jason.

However, when I set the tables up as you suggested, I cannot create
the relationships to enforce referential integrity.
What simple thing am I overlooking now?

Jason Lepack · Feb 14, 2007

From the main database window click "Tools"->"Relationships" Add in
your tables and link the fields that should be linked. Make sure you
click the "Enforce Referential Integrity" check box for each
relationship.

Cheers,
Jason Lepack

Tim Ferguson · Feb 14, 2007

Uuugggh- it had to be you! :-)

I am a bit worried that we are going to
sink the OP, but here goes...

Assuming your example is indeed non-normal, do you think there is a
case for 'denormalizing for integrity' in Access/Jet?

I don't recognise the concept, but that is mainly because I trust
normalisation to provide integrity. That is not to say that all business
environments can adequately be represented in set theory, and there are
some that can but very messily.

--SKIP-- worked example which I understand but is not quite on the point
of what I was trying to explain above. Which was the Name/Address sets; I
think this is what people mean by removing repeating fields. I was also
thinking where there is equivalence between Owner1 and Owner2 (etc up
OwnerX), which obviously should be a 1:many relationship instead.

A simpler example would be two fields OldOwner and NewOwner, where the
relationships are clearly non-equivalent.

I don't know the best way to model the situation you suggested. I have a
guess that it's what dbInconsistent recordsets are for, but I also agree
with what you are about to say about embedding business rules in the user

interface said:
[* note for Jamie: I know that it applies to all candidate keys, but
I'm trying to keep it simple.. ]

Click to expand...

LOL!

Should have known that was tempting fate!

All the best

Tim F

gweasel · Feb 14, 2007

Jason,

That would be a simple thing, but I did manage that. The issue is
that when I try to link them I get error messages saying either:

"Can't create this relationship and enforce referential
integrity"..."Data in the 'PaperInventory table violates referential
integrity rules".

or

"No unique index found for the reference field of the primary table".

I do have them set up exactly as you specified (as "Autonumber" -
Index - no duplicates or "Text" or "Number" where necessary).

Thanks for the help, I do appreciate it. If it helps, the final use
of this table is to display on a workorder where there will be 4 combo
boxes (so the user selects "Weight" first, which will then filter the
second combo box "Color" based on what colors are available for the
selected weight...etc...). A report can then be printed which will
display the 4 values in a single line (ie... 20lb White Bond Letter)

-RJB

Jason Lepack · Feb 14, 2007

Every item that exists in PaperInventory must already exist in the
other tables bevore you can enforce referential integrity.

Referential Integrity ensures that you don't create an item that you
don't have. Therefore if you don't have the color "Blue" in your
PaperColor table and you try to add "Blue" to your PaperInventory
table it won't let you.

If you still have issues then you can email me your database so I can
see exactly what's going on.

Cheers,
Jason

gweasel · Feb 14, 2007

Thanks for the help.

I think I figured it out though...my field names were different. I
had spaces between "PaperClass" and "ID" in one table, and it was
"PaperClassID" in the other.

Relationships are set up now and enforcing. If (when) I need more
help, I'll let you know.

Thanks again!

Normalizing - HELP!

gweasel

Jason Lepack

Guest

Tim Ferguson

Jamie Collins

gweasel

Jason Lepack

Tim Ferguson

gweasel

Jason Lepack

gweasel