need guidance on normalizing a table

rpw · Feb 29, 2008

Hi,

I've imported some data from a spreadsheet and the table looks like this:

tblZipTerritory
ztID (PK)
ZipCode
Territory (this is the non-normalized bit)

I now have a territory table with territoryID (PK) and territoryName (text
field) in it.

How can I replace the Territory text field in tblZipTerritory with the
territoryID (without entering it one by one)?

rpw · Feb 29, 2008

One more bit of information that might help; the data in
tblZipTerritory/Territory field/column has exact matches in
tblTerritory/territoryname field/column

Steve Schapel · Feb 29, 2008

RPW,

Well, to be honest, what you are proposing is not going to make your
system any more normalized than it is now. You are replacing a text
field in the tblZipTerritory table with a numerical field, which some
may feel might make it more efficient. But that isn't related to
normalisation. And then you have to include a join to the lookup table
every time you need the data. Personally I probably wouldn't bother.

Anyway, to answer your question... You can use an Update Query for this.
First of all add a TerritoryID field to the tblZipTerritory table.
Then make a Query that includes both tables, joined on the fields with
existing matching data, i.e. tblZipTerritory.Territory <=>
Territory.TerritoryName
Add the tblZipTerritory.TerritoryID field to the query design grid.
Make it an Update Query (select Update from the Query menu).
In the Update To row in the grid, enter like this:
[Territory].[TerritoryID]
Run the query (click the toolbar button with the red [!] icon).
Then delete the Territory field from the tblZipTerritory table.

There. that should do it... but make sure you have a backup just in case!

rpw · Feb 29, 2008

Ahh, thank you Steve!

That last bit about what to put in the Update To did the trick- it was
exactly what I couldn't figure out on my own. Probably simple for some, but
not for me - I was thinking that I'd have to do a different criteria filter
and update for each of the 84 different territory names.

Thanks for the comments on normalizing this. I'm trying to do this 'by the
book' and my understanding was that any time you had repeating data you
should consider it a different table/topic. In this case, one territory is
related to many zip codes and it is also related one-to-many to a few other
topics (contracts and counties to name two). Plus, the name of any given
territory might change in the future (so I guess that's where the efficiency
comes in?). Does any of this change your opinion or am I still 'off' on the
normalizing concept?

Anyway, thanks again for the help!
--
rpw

Steve Schapel said:
RPW,

Well, to be honest, what you are proposing is not going to make your
system any more normalized than it is now. You are replacing a text
field in the tblZipTerritory table with a numerical field, which some
may feel might make it more efficient. But that isn't related to
normalisation. And then you have to include a join to the lookup table
every time you need the data. Personally I probably wouldn't bother.

Anyway, to answer your question... You can use an Update Query for this.
First of all add a TerritoryID field to the tblZipTerritory table.
Then make a Query that includes both tables, joined on the fields with
existing matching data, i.e. tblZipTerritory.Territory <=>
Territory.TerritoryName
Add the tblZipTerritory.TerritoryID field to the query design grid.
Make it an Update Query (select Update from the Query menu).
In the Update To row in the grid, enter like this:
[Territory].[TerritoryID]
Run the query (click the toolbar button with the red [!] icon).
Then delete the Territory field from the tblZipTerritory table.

There. that should do it... but make sure you have a backup just in case!

--
Steve Schapel, Microsoft Access MVP

Hi,

I've imported some data from a spreadsheet and the table looks like this:

tblZipTerritory
ztID (PK)
ZipCode
Territory (this is the non-normalized bit)

I now have a territory table with territoryID (PK) and territoryName (text
field) in it.

How can I replace the Territory text field in tblZipTerritory with the
territoryID (without entering it one by one)?

Click to expand...

Steve Schapel · Feb 29, 2008

RPW,

Glad to hear that it's worked out for you.

It isn't repeating data that indicates a normalisation problem. It is
repeating *combinations* of data. Repeating data as such is very common
and perfectly proper. Just because the data in the Territory field
contains duplicates is absolutely not an indication that it is
non-normalised. If there was another field in there as well, like for
example a Sales Rep, and the Sales Reps were assigned to territories,
well then you would have repeating combinations... every time you saw
territory A in the Territory field, you would always see person X in the
Sales Rep field. Now that *would* be non-normalised. But when you
think about it, what you have done has not achieved any reduction in
repeating data. It is just that in one table you have replaced the text
version of the data with a numerical reference to it, but the amount of
repeating data is exactly the same. We could easily here jump off into
one of those perennial debates about natural or surrogate keys - but
that's totally another topic really. From the point of view of
normalisation, I confirm what I said before, and I don't regard what you
have done as "by the book" at all. On the other hand, please understand
that I'm not saying it's wrong - just that it's not normalising
anything. I mentioned efficiency because that is one of the reasons
sometimes given for taking this type of approach, i.e. sorting and
selecting and joining on number fields is quicker than on text fields.
If you have a very large number of records, or a very slow old computer,
this may be a consideration.

rpw · Feb 29, 2008

Thank you for clarifying (but I'm not sure that I'm straight on it).

So, if the fields are only Zip and Territory, then it is OK to have the
repeating text because each combination is unique. Perennial debates aside,
it is simply faster (or more efficient) for the processor to deal with a
number rather than text.

On the other hand, what if the fields are State, County, Zip and Territory?
The State and County fields would have many duplicate combinations of text
and therefore would be better in a separate table, right?

And then if the new table were only State and County it would be OK. But if
there is State, State Abbreviation and County, then it is better for State
and State Abbreviation to be in a separate table, right?

If the above is 'correct' then I think I might be 'getting it', I'm just
using the wrong terminolgy, or rather not being really clear on describing
the problem I was having.

Thank you again for your input and helping me out on this.

Steve Schapel · Feb 29, 2008

RPW

So, if the fields are only Zip and Territory, then it is OK to have the
repeating text because each combination is unique.

Correct. Or another way of looking at it is to say that the purpose of
the table is to define the Zips, and the Territory is information
*about* the Zip. Just like if you had a table to define people, and you
have a field for Sex, this is information *about* the person, and we
would expect that the values for male and female would be repeated
throughout the table.

Perennial debates aside,
it is simply faster (or more efficient) for the processor to deal with a
number rather than text.

Correct. But if the consequence (as it is in your example) of using a
referenced number instead of text, is that you need an additional table
join, or use of a calculated expression, in order to retrieve the text
value (which is actually the one you normally want to see), then I would
expect that this somewhat offsets the processor speed "advantage".

On the other hand, what if the fields are State, County, Zip and Territory?
The State and County fields would have many duplicate combinations of text
and therefore would be better in a separate table, right?
Correct.

And then if the new table were only State and County it would be OK. But if
there is State, State Abbreviation and County, then it is better for State
and State Abbreviation to be in a separate table, right?

From the point of view of normalisation, that is also correct.

If the above is 'correct' then I think I might be 'getting it', I'm just
using the wrong terminolgy, or rather not being really clear on describing
the problem I was having.

I think you have been very clear in describing the situation. And the
way you have gone about setting this up is the same way many developers
would do it in practice. I wouldn't myself. So yes, I think it is a
terminology thing... that was all I was focussing on. Just clarifying
that having the Territory as a text field in the tblZipTerritory table
does not indicate it is non-normalised. And your reasons for replacing
it with a numerical TerritoryID are whatever they are, but they do not
include normalisation.

Thank you again for your input and helping me out on this.

You're welcome. :-)