I was told "Fields are expensive, records are cheap"

D

Dennis

Hi,

I just have a self education question.

In response to one of my questions I stated that I added a lot of fields (a
lot was 30 fields) to my table. The initial implementation was so
successful, the user requested quite a few more enhancements resulting in the
addition of 30 additional fields.

One MVP who responded stated "Fields are expensive, records are cheap". I'm
currious about his statement. I'm new to Access (less than a year) but I
have over 30 years experience with relational databased on multiple
platforms. I've always been taught the exact opposite - that "Fields are
cheap, records are expensive" since going to disk is so slow versus accessing
data in memory.

Is there something different about Access where the statement "Fields are
expense, records are cheap" is true. I'm using Access on local machine where
the front and backs end reside on the same machine as well as having multiple
front ends on each client's machine tied into the back end which resides on
a file server. We have a hardwired ethernet cat5 cable network.

Dennis
 
R

Rick Brandt

Dennis said:
Hi,

I just have a self education question.

In response to one of my questions I stated that I added a lot of fields
(a
lot was 30 fields) to my table. The initial implementation was so
successful, the user requested quite a few more enhancements resulting in
the addition of 30 additional fields.

One MVP who responded stated "Fields are expensive, records are cheap".
I'm
currious about his statement. I'm new to Access (less than a year) but I
have over 30 years experience with relational databased on multiple
platforms. I've always been taught the exact opposite - that "Fields are
cheap, records are expensive" since going to disk is so slow versus
accessing data in memory.

So you think rows are on the disk but fields are in RAM? Data has to be
retrieved from and written to disk regardless of this difference.
Is there something different about Access where the statement "Fields are
expense, records are cheap" is true. I'm using Access on local machine
where the front and backs end reside on the same machine as well as having
multiple
front ends on each client's machine tied into the back end which resides
on
a file server. We have a hardwired ethernet cat5 cable network.

This is not specific to Access. It is a "best practice" in all relational
databases to have a normalized data design. This generally results in
taller, narrower tables as opposed to wider, shallower tables.

When table modifications call for lots of new fields it often means that a
one-to-many relationship that ought to be set up in multiple tables is being
shoe-horned into a single table.
 
J

Jeff Boyce

Rick already responded to your questions ... but I'll add some additional
consideration ...

If your application involved only one table, and one query, and one form,
and one report, and one procedure, when you changed that table from 30
fields to 60, didn't you also have to modify (i.e., maintain) your query,
your form, your report, and your procedure?

Imagine how much maintenance there'd be if you needed to change it again?!
.... or if you had more than one of each.

On the other hand, if what you are using multiple fields to store (hint:
this is a spreadsheetly way to organize the data) were "repeating values"
( a prime example is mmm-yy ... as a field name, to store numbers for a
given month and year), then you could modify your table structure to hold:

tblTable
TableID
YourNumber
YourDateTime

With this design, any time you need to add a new month/year, you don't add
another field, you add another record. This is where the "tall not wide"
comes from...

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
D

Dennis

Rick,

Thanks for you responses, either I did not explain myself well or you missed
some obvious points in my statements.

Your comment: “It is a "best practice" in all relational databases to have
a normalized data design.

Response: This is true. I figured you would have picked up on that when I
stated I’ve been designing relational databases for over 30 years, which is
why I included that statement. But obviously I was wrong. Yes, relational
databases have been around for well over 30 years.


Your comment: “This generally results in taller, narrower tables as opposed
to wider, shallower tables.â€

Response: This is true when you compare normalized tables to flat files.
However, I was not. I was comparing relational to relational. Therefore,
this statement has nothing to do with the question.

As for narrow tables, that depends on the applications. Most accounting
applications have pretty narrow tables. However, most personal lines and
commercial lines insurance policy and claim tables can get very wide.


Your comment: “So you think rows are on the disk but fields are in RAM?â€

Response: Maybe you learned different computer science that I did, but
while this is a gross over simplification, your statement is ABSOLUTELY
correct. Rows are on disk and fields are in RAM. I know how other relation
database work, but I don’t know how Access works, which is what I am trying
to find out.

It is absolutely true that a program can only works with fields that are in
RAM. An application program can not work with a field that is on disk,
fields are only accessible while they are in RAM. The row has to be retrieve
from the disk and placed in memory in order for the VBA code to have access
to the row’s fields. Most database system provides the data to the program a
record or row at a time (excluding all of the look ahead and optimization
techniques.)

A program reads and writes rows (not fields) to the disk. I don’t believe
that VBA code can work on the entire row, unless there is a way to treat the
entire row as a string. I know that other databases have this capability, but
I don’t know if Access does or not.

When the file system write the row, except for blob and other such specialty
fields, the read and write functions within the database engine do not “knowâ€
where one field ends and the next one starts. The simply read a record from
the table and place it in memory or they write a record / row to the table.

Given, the above I am truly puzzled by your statement. You make it sound as
if it this is not true when in fact that is exactly the way it works, even if
it is extremely over simplified.

Just out of curiosity, do you know of any application software that works
directly on the disk? Granted, the OS and OS utilities work directly on the
hard drive, but I’m not aware of any application software what does not.


Your comment: “Data has to be retrieved from and written to disk regardless
of this difference. “

Response: This statement is the point of my question! But one again, we
disagree. The difference in the disk access time it takes to retrieve one
record from the disk over a network versus retrieving multiple records. To
me that statement “Fields are expensive, records are cheap;†implies it is
better to have smaller row and multiple linked tables than it is to have one
row that contains all of the normalized data.

To me, this is ludicrous – unless there is something I don’t know about
Access. It seems to be that the only things that are achieved by breaking up
a large normalized row into multiple smaller rows in multiple tables in a
significant increase in disk access time and significant higher network
traffic. But then maybe Access is very inefficient in dealing with a lot of
fields and it is truly better to deal with slower disk access time than
Access’s overhead. I don’t know. That is what I am trying to find out.

Even given the increase in network and disk access time, CPU processing
speed and memory access time are still many, many times faster than access
the disks. Also, in most client-server systems I’ve had experience with; the
network traffic (data access) has been the bottle neck.


Your comment: “When table modifications call for lots of new fields it
often means that a one-to-many relationship that ought to be set up in
multiple tables is being shoe-horned into a single table.â€

Response: Obviously you did not read my statement. This was something I
learned over 30 years ago. Third normal form tables do not contain repeating
fields unless they are a multi-valued field, in which case they are quite
useful in exploding parts for a completed assembly such as a car door, engine
or other such things. However, since Access does not support multi-valued
fields at this time, this is not an issue with respect to Access.

As I stated, the initial implementation was SO successful, the user
requested quire a few new enhancement that were beyond the scope of the
original project that we had to add 30 new normalized data fields. (I guess
I should have included that word).


Dennis
 
D

Dennis

Jeff,

Your comment: If your application involved only one table, and one query,
and one form, and one report, and one procedure, when you changed that table
from 30 fields to 60, didn't you also have to modify (i.e., maintain) your
query, your form, your report, and your procedure?

Imagine how much maintenance there'd be if you needed to change it again?!
..... or if you had more than one of each.

Response: What you say is true, but I’ve re-read my question / statements
and no where did I say I had one table, one query, and one form. So I don’t
understand what this has to do with my question.

The original db contained 4 different tables, 20 queries, 6 forms, and 10
reports. After the enhancements, I have about 7 different tables, 45
queries, 15 forms and sub-forms, and about 30 reports.

On all of the system on which I have worked, whenever the user request
significant enhancements and expansion beyond the original scope, there is
significant maintenance. In this case, the enhancement did not affect the
existing reports or half of the existing forms so there was not maintenance
there.

Let’s say that you were asked to write a personal auto policy processing
system, which you did. The client was so happy that they then said the
wanted to expand the system to also write home owners insurance. Would that
not be a significant enhancement that might cause you to added quite a few
additional fields to existing tables as well as many new tables?

In the 30 years I’ve been in the relational database field, I’ve learned to
design systems so that “A good database structure finds the right balance
between the rigidity that prevents bad data (relational integrity, suitable
data types) with flexibility that copes with unforeseen possibilities.â€
(Allen Browne)

As a result, the unforeseen changes on the user part did not alter the
database design. I simply added new fields to existing tables and new tables
where needed. Adding new fields to a table or query does not require any
changes to existing forms or reports that utilize that query but do not
utilize the new fields. So there was no maintenance to those forms or
reports. There was only maintenance issues I had to address were on those
forms, queries, and reports that I had to change to enable the new
enhancements.



Your comment: On the other hand, if what you are using multiple fields to
store (hint: this is a spreadsheetly way to organize the data) were
"repeating values" ( a prime example is mmm-yy ... as a field name, to store
numbers for a given month and year), then you could modify your table
structure to hold:

Response: Again, I wish you had read my introduction where I state I have
30 years of relational database design experience. As I stated in my
response to Rick, this was something I learned NOT to do over 30 years ago.

The really sad part of all this is, neither one of you addressed my original
question whether “Fields are expensive, records are cheapâ€.

Dennis
 
D

Duane Hookom

Dennis,
I don't think you have ever described your table structure(s). It very well
may be that your tables are normalized. How would we know other than to
guess that you know what you are doing?

I think the "Fields are expensive" was described in that each field
generally requires a certain amount of maintenance. Let's say you have a
table with 10 fields. Generally each of these fields requires controls on
forms and reports, columns in queries, and perhaps other time (expense) to
create. Adding another field would generally add 5-10% to the expense of
either creating or maintaining some of these objects. This is a very basic
concept that someone with 30 years of experience creating relational
databases should understand.

I took over a survey application at a hospital from a programmer with many
more years of programming than I had at the time. She had created a single
field for each question in the survey. This required tons of work (expense)
in creating and maintaining the application. My efforts resulted in the
basis for At Your Survey
http://www.rogersaccesslibrary.com/forum/forum_posts.asp?TID=3&SID=7288a967da6c99f76zb9ff36f3577dec
which provides great functionality at much less expense yet more records.

This is why John Vinson often suggests "Fields are expensive, records are
cheap". I'm surprised the basics have eluded you. I expect you took offense
to the statement when you should not have. You should always feel free to
post your structure to support your efforts.

Duane Hookom
MS Access MVP
 
D

Dennis

Duane,

Your comment: I don't think you have ever described your table
structure(s). It very well may be that your tables are normalized. How would
we know other than to guess that you know what you are doing?

Response: You are correct is that I did not describe my table structure.
However, if I read someone had 30 years in relational db design experience, I
would have thought that he or she knew about designing to 3rd normal form and
would not have questioned them.


Your comment: I think the "Fields are expensive" was described in that each
field generally requires a certain amount of maintenance. Let's say you have
a table with 10 fields. Generally each of these fields requires controls on
forms and reports, columns in queries, and perhaps other time (expense) to
create. Adding another field would generally add 5-10% to the expense of
either creating or maintaining some of these objects. This is a very basic
concept that someone with 30 years of experience creating relational
databases should understand.

Response: You are absolutely correct. Personally, I think the maintenance
number is higher. However, I did not interpret his comment this way for the
following reason. If you have to add a data field so you can produce a
report, what do you tell the user – I can’t do it because that field will add
5-10% to the cost and increases maintenance cost? What I have learned in my
30 years is the cost issue is NOT for the developer to decide, it is for the
user to decide. It is incumbent upon us to inform the user of the cost, to
develop a cost effective design, and maybe do a cost benefit analysis if the
numbers are big enough. However , if the user decides it is worth the cost,
then it is not our place to argue.


Your comment: This is why John Vinson often suggests "Fields are expensive,
records are cheap".

Response: As I said, I may have misinterpreted what John had to say. As I
stated in my above response, I though it had something do with the internal
workings of Access. I failed Mind Reading 101 when I was in college and have
not gotten any better over the year. All I received from John was the
statement “"Fields are expensive, records are cheap" with no explanation and
no background information regarding that that statement. As a matter of
fact, he included that statement in the same paragraph where he was asking if
the additional fields were repetitive fields. Given that he was referring to
repetitive fields, how was I supposed to make to giant leap that he was
referring to the development and maintenance cost of the fields and not to
the repetitiveness of the fields themselves?

Your comment: I took over a survey application at a hospital from a
programmer with many more years of programming than I had at the time.

My response: I don’t understand what this comment has to do with the
subject. I’ve come behind other programmer and rewrote the code so it work
better, faster, more flexible, and most importantly – more understandable and
easier to maintain. I had one customer who designed a series of program that
ran in a daily cycle. The only problem with the software is it took 30 hours
to run a daily cycle. When I was done, we were able to run the cycle in two
hours. But so what, that had nothing to do with the above question. I’m
sure other programmers will come behind what I’ve done and reworked my code
so that it is better. And I’m sure other have come behind you and reworked
your code. So I don’t understand the point of this comment.


Your comment: I'm surprised the basics have eluded you.

My Response: I’m surprised that the basics have eluded you also. I believe
you when you say that John uses that phrase when he is referring to the
development and maintenance cost of each field. In that respect I agree with
him. However, from a developer’s stand point, so what? The cost is not the
developer’s decision to make. I firmly believe that one of the basics in
this business is it is developer’s job to provide an efficient and cost
effect solution to the issue along with a realistic development estimate.
However, it is the user’s and only the user’s decision to determine if the
project is worth the development cost. It is their money and they have a
right to decide how to spend it, even if I disagree with them, which
frequently I do.


Your comment: I expect you took offense to the statement when you should
not have. You should always feel free to post your structure to support your
efforts.

My response: I don’t think offense is the right term. I think frustrated
because they did not read the entire statement that I took quite a bit of
time trying to phrase it correctly. Rather than go into all of the details,
table design, and background, I figure it I stated that I had the experience,
people would not be bringing up the issues that you would be with someone who
has just two weeks of relational databases. But, I was wrong. Maybe I just
think differently that other people. Personally, if a doctor tells me they
have been a doctor for 30 years, I don’t normally ask them if they can read a
thermometer. Would a doctor take offence to a question like that? Hmm,
probably.

Tell me something, if you as an MVP were to ask a similar question (which
you would not since you are an MVP) and someone ask if your data was
repeating or if you table was not normalized, would you not just shake your
head?

You have come to my aid quite a few times. I am very grateful for all of
you help. You also deserver a lot of respect for being an Access MVP.
However we are going to have to agree to disagree on the issues above,
especially who decides if a data field is worth the cost.



Dennis
 
D

Dennis

Duane,

In another posting John clarified what he meant by "Fields are expensive,
records are cheap".


"What I consider "expensive" is that an improperly normalized table structure
will require far more work in getarounds, contorted queries, form and report
maintenance, and code than would a properly normalized design. "


John W. Vinson [MVP]


Dennis
 
J

Jeff Boyce

Dennis

Sorry I wasn't more explicit... see responses in-line below

Dennis said:
Jeff,

Your comment: If your application involved only one table, and one query,
and one form, and one report, and one procedure, when you changed that
table
from 30 fields to 60, didn't you also have to modify (i.e., maintain) your
query, your form, your report, and your procedure?

Imagine how much maintenance there'd be if you needed to change it again?!
.... or if you had more than one of each.

Response: What you say is true, but I've re-read my question / statements
and no where did I say I had one table, one query, and one form. So I don't
understand what this has to do with my question.

Please re-read my response. It starts with the word "If". I was describing
the workload/maintenance of an overly-simplified design, and pointing out
that adding fields is expensive, in terms of the maintenance it requires to
all affected objects.
The original db contained 4 different tables, 20 queries, 6 forms, and 10
reports. After the enhancements, I have about 7 different tables, 45
queries, 15 forms and sub-forms, and about 30 reports.

On all of the system on which I have worked, whenever the user request
significant enhancements and expansion beyond the original scope, there is
significant maintenance. In this case, the enhancement did not affect the
existing reports or half of the existing forms so there was not
maintenance
there.

Let's say that you were asked to write a personal auto policy processing
system, which you did. The client was so happy that they then said the
wanted to expand the system to also write home owners insurance. Would
that
not be a significant enhancement that might cause you to added quite a few
additional fields to existing tables as well as many new tables?

That would depend on what data elements the user & I agreed were needed.
In the 30 years I've been in the relational database field, I've learned
to
design systems so that "A good database structure finds the right balance
between the rigidity that prevents bad data (relational integrity,
suitable
data types) with flexibility that copes with unforeseen possibilities."
(Allen Browne)

That sounds like how I try to design systems.
As a result, the unforeseen changes on the user part did not alter the
database design. I simply added new fields to existing tables and new
tables
where needed.

If you are looking for other folks ideas, to compare them with yours and
decide what "balanced" approach would work best for you, let us know. You
asked for an assessment of John V.'s statement.
Adding new fields to a table or query does not require any
changes to existing forms or reports that utilize that query but do not
utilize the new fields. So there was no maintenance to those forms or
reports. There was only maintenance issues I had to address were on those
forms, queries, and reports that I had to change to enable the new
enhancements.

Maintenance is maintenance, whether on one object or several. My comments
were intended to offer the option of a design that would require NO
additional maintenance, since the table would grow longer, not wider.
Your comment: On the other hand, if what you are using multiple fields to
store (hint: this is a spreadsheetly way to organize the data) were
"repeating values" ( a prime example is mmm-yy ... as a field name, to
store
numbers for a given month and year), then you could modify your table
structure to hold:

Response: Again, I wish you had read my introduction where I state I have
30 years of relational database design experience. As I stated in my
response to Rick, this was something I learned NOT to do over 30 years
ago.

That you've learned not to do this was not evident in your original post.
Responding as you have could be interpreted as 'baiting' folks, offering an
incomplete description and then criticizing folks for not inferring or
knowing your detailed situation.

If you want detailed suggestions/ideas/approaches, provide detailed
descriptions.
The really sad part of all this is, neither one of you addressed my
original
question whether "Fields are expensive, records are cheap".

Define "expensive".

Good luck!

Regards

Jeff Boyce
Microsoft Access MVP

--
Disclaimer: This author may have received products and services mentioned
in this post. Mention and/or description of a product or service herein
does not constitute endorsement thereof.

Any code or pseudocode included in this post is offered "as is", with no
guarantee as to suitability.

You can thank the FTC of the USA for making this disclaimer
possible/necessary.
 
D

David W. Fenton

When table modifications call for lots of new fields it often
means that a one-to-many relationship that ought to be set up in
multiple tables is being shoe-horned into a single table.

It also means that you have to alter your front-end forms and
reports by adding all the fields, whereas if you're adding records
instead, whatever continuous or datasheet form you're using to
display those records will just have more records in it, so there is
no need to alter then user interface objects.

To *me*, that's what the phrase means, that adding fields is more
complicated in terms of UI, which, to be frank, is where 90% of our
time is spent in the development process (rather than in schema
design/changes).
 
D

David W. Fenton

However, most personal lines and
commercial lines insurance policy and claim tables can get very
wide.

That suggests they are probably denormalized so some degree.

I know a number of popular applications (ACT springs to mind) use
very wide tables, and this means their application has to be more
complicated to work with the data. It also is one of the main causes
of the performance problems that ACT has in terms of filtering large
datasets -- it's harder to efficiently use indexes when you've got
the data denormalized (and some of the data in ACT (last time I
checked, which was many, many years ago) seems to be stored in
multi-value fields, which is yet another reason for performance
problems, since even if they are indexed, they can't be used
efficiently).

I will agree that a wide table *can* be normalized.

I will also agree that there can be perfectly good reasons to
denormalize.

But I have very few tables in any of my apps that have even close to
50 fields.

A common reason for denormalization that looks properly normalized
is mis-definition of the entities (i.e., a table holds records, each
of which represents an entity, and each field is an attribute of the
entity). In a survey, for instance, it's very common to create a
single record for each survey, with a field for each question. But
this approach is completely unextensible -- any change to the
questionaire requires a change to the table. Or you have to create a
table for each questionaire. Or you have to add fields that are used
in some questionaires and not in others.

I learned this lesson with a medical study database that was tied to
the structure of the UNOS national transplant registry. There was
only one medical study and for it to work the variables being
examined had to be established at the beginning, so it was never
going to change. It seemed logical to use a record per patient. But
before the app was even finished, UNOS changed the data structure of
how they recorded data, and we ended up having to add fields to the
existing tables and then change all the forms and reports to reflect
that. If we'd used a proper structure, i.e., with a record for each
variable, and a table that defined what the variables were, we would
have needed to make absolutely no changes to the *structure* of the
data or the application -- we would have been able to accomodate the
additional variables just by adding *data* to the existing tables.

That's the essence of the problem. Fields are expensive, records are
cheap -- it's easy to change the data, hard to change the data
storage *structure*.
 
D

David W. Fenton

A program reads and writes rows (not fields) to the disk.

I think you're stuck in a very old-fashioned approach to this, one
that reflects the very constricted memory and storage available back
in the 80s and early 90s.

A Jet/ACE data store doesn't store data in records, but in data
pages. It's not like legacy systems at all. And the fields are
variable length, so records are variable length. The efficiency of
interaction with the storage and RAM are pretty moot since the days
of disk caching, when even a write is done to RAM (and then
persisted to disk when convenient). This means there's hardly any
difference between RAM manipulation and reading/writing from/to the
data store, since both are happening in RAM.

Disk caching has been a given on Windows since Win3.11, c. 1992.
This predates Access, in fact, so Access/Jet has never been as bound
to disk space/performance as the databases that seem to form the
model in your head.

And when a server database is involved, the bottleneck is not RAM
vs. disk, but the network connection, which is vastly slower than
not just the RAM, but even slower than disk access.

So, your model seems completely outdated to me.
 
D

David W. Fenton

In another posting John clarified what he meant by "Fields are
expensive, records are cheap".

"What I consider "expensive" is that an improperly normalized
table structure will require far more work in getarounds,
contorted queries, form and report maintenance, and code than
would a properly normalized design. "

There's also the performance issue. If you need to query phone
numbers, in a normalized structure, you will be querying one field,
using one index, retrieving the data pages for a single index across
the wire.

With a denormalized structure, you may need to query multiple
fields, using multiple indexes, and pulling more index pages across
the wire. And if your table is denormalized and you need to add
indexes for repeating fields, you're much more likely run up against
the limitations of number of indexes in a single table, which for
Jet/ACE is 32. If you've got 4 phone number fields in your main
table, that's 4 additional indexes, which means you have only 27
left (since you have to have a PK, of course). If you normalize your
phone numbers, the index is in a completely different table, on a
single field, and it uses up none of the indexes in the main table,
leaving you with 31 available indexes.

Some argue that the UI for normalized data is hard to create, or
that if you use easy methods, you're stuck with subforms, and users
don't find subforms easy to use.

Well, tell me that users will have trouble with this:

http://dfenton.com/DFA/examples/SA/ContactInfo.jpg

That's two instances of a single subform, using an outer join
against the list of possible phone number types for each category of
phone numbers (office vs. home) so that the user just sees a flat
form listing all the phone number fields that can be filled out.

All of the phone numbers are stored in a single table, and records
are created there only when there is a number to be stored (email is
stored in the same table in fact, though it's not included in the
same subform, since it is presented differently because it has
different relevant data fields).

So, I think there are no valid objections to normalization.

That said, I don't always insist on normalized phone numbers,
because in most cases, nobody searches on them. The number of
necesssary numbers is also relatively finite -- while in the last 15
years we've added pagers and mobile phones (and now pagers are out
the window and fax is not going to be relevant 20 years from now),
there's not much else to add. In many of my apps, there's phone,
fax, mobile, email and a memo field for OTHER NUMBERS. This covers
everything, really.

Now, all that said, there's an argument to be made that the dropping
of pagers and the coming abandonment of faxes means that the
denormalized structure is flawed in that not only is it hard to
*add* new fields, it's also hard to remove obsolete ones. I would
say the problem with removing fields vs. adding them is
asymmetrical, since you don't have to touch the underlying data
structures -- you only have to alter the user interface
(forms/reports). That's substantially easier to do than adding
entirely new fields, so it's not nearly as bad a problem once you've
reached a number of repeating fields that covers all the reasonable
possibilities.
 
D

David W. Fenton

You are correct is that I did not describe my table structure.
However, if I read someone had 30 years in relational db design
experience, I would have thought that he or she knew about
designing to 3rd normal form and would not have questioned them.

whether a design is normalized or not depends entirely on how you
define the entity being modelled in the table. If you define the
entity as "this table includes records where each record is a
completed survey" then you'll have a record with a field for each
survey question. It won't be denormalized, because each field is an
attribute of the survey.

But modelling your entity in that fashion is very rigid and
difficult to alter down the road.

If, instead, you consider the questions on the survey to be
variables, each one equal, then the single-record survey structure
is going to be denormalized, since you've got repeating fields
(variable 1, variable 2, variable 3 -- even though they may be named
Age, Sex, Eye Color).

For particular purposes, this may work very well. A Person table
doesn't need to be broken down into repeating variables, since most
people have the same attributes. But there are still plenty of ways
to slice and dice the data, such as whether you treat addresses and
phone numbers as repeating data or not (the clue is where you have
"Work Phone" and "Home Phone" you have repeating data, and "Work
Address" and "Home Address" you have the same thing). What you do
that is up to you -- there is no law that your data has to be
normalized. You choose the level of normalization that is
appropriate to the application you're building.

For surveys, there is no question in my mind that the normalized
structure with each variable as a row in a table instead of as a
field in a record is the only valid way to go, simpley because doing
otherwise makes it very, very difficult to change the survey, or to
store more than one survey.

In other cases, it's not necessarilyl the case that normalized
storage is going to be vastly superior in the long run.

Your particular application may be one where it is appropriate to
add 30 fields to a table (though that seems to indicate to me that
there are either repeating fields or the original design was really
inadequate and not carefully thought through -- not so much by the
database developer as by the client).

But as a general rule, fields are expensive and rows are cheap, and
any time you're contemplating adding a lot of fields to a table, it
should be a red flag that perhaps you should rethink the structure.
That doesn't mean you'll conclude that adding 30 fields to a single
table is wrong, just that you should evaluate the ultimate solution
carefully. Making the wrong decision is a lot harder to rectify
*after* you've implemented it, so taking the time to consider the
issue carefully is going to save loads of time later.
 
D

David W. Fenton

Response: You are absolutely correct. Personally, I think the
maintenance number is higher. However, I did not interpret his
comment this way for the following reason. If you have to add a
data field so you can produce a report, what do you tell the user
ƒ " I canƒ Tt do it because that field will add 5-10% to the cost
and increases maintenance cost? What I have learned in my 30
years is the cost issue is NOT for the developer to decide, it is
for the user to decide. It is incumbent upon us to inform the
user of the cost, to develop a cost effective design, and maybe do
a cost benefit analysis if the numbers are big enough. However ,
if the user decides it is worth the cost, then it is not our place
to argue.

If you model the new fields as records in a table, in many, many
cases, there is no change needed to the app. Thus, the cost of
adding the new field is ZERO. If you model it as a new field in an
existing table, the cost is substantially larger.

What client, understanding the issues, would choose the $100
solution over the $0 solution? But if you as developer only offer
the denormalized structure, they never get that choice.
 
D

David W. Fenton

BruceM gets right to the key point: how you decide to model your
entities determines what is normalized and what is not.

To use your specific example of insurance processing, auto and
home insurance are so different that many developers would use a
different table for the two rather than broaden an existing auto
insurance table to accomodate home insurance. You could argue
that it is all insurance and therefore is an entity that belongs
in a single table.

I would argue that since both are insurance policies, you can use
the survey model, and for each attribute of the policy, you create a
record in a table connected to the main policy record. For the
different types of insurance policy, you'll have a different set of
variables, because they are different types, but by storing each one
as a row in a related table, you've made your structure extensible
so that you can model any type of insurance policy. And if you like,
you can use a single set of user interface objects to work with and
print all of them.
You could similarly argue that
employees and clients are both people, and therefore should be in
one table. Many developers would define "entity" more narrowly,
and therefore use separate tables, in a situation where many
fields are used for one type of entity and not the other, or where
the entities are clearly in different categories.

I long ago switched over to putting all people in the same table,
whether employees or clients. It seems obvious to me that if you
have two tables that have a whole set of fields that are indentical,
that you actually have a single entity stored in two different
tables. This is not to say it is not useful in some circumstances to
set up two separate tables for two instances of what are largely the
same entity. I can see no real benefit from keeping employees and
customers in the same table, though certainly employees might some
day become customers, and could be both at the same time, so there's
a certain justification -- it all depends on the business rules and
how much benefit there is to modelling them as the same or different
entitites.

I go with keeping all real-world entities in the same base table
(think how complicated it would get if you had an employees table
and a customers table and were normalizing your storage of phone
numbers and addresses -- how do you enforce RI if you don't
duplicate the exact same structure for both entity types?), and then
segregating them accordingly. I found this to be the best approach
well over a decade ago and have stuck with it and had no problems.
That doesn't mean others would reach the same conclusion for the
exact same applications, or that the single-entity model is
appropriate to all applications (or even most). It's entirely
dependent on the particular application and the particular
functionality of the app and the business rules in place. However,
given that those things can change drastically over time, I tend to
favor going as normalized as practical on the front end, since that
maximizes flexibility over the long run.
 
D

Duane Hookom

Dennis,
Just a comment regarding a comment regarding a comment ;-)

I made the comment: I took over a survey application at a hospital from a
programmer with many more years of programming than I had at the time.

Your response: I don’t understand what this comment has to do with the
subject.

This was all based on your statement regarding your "over 30 years
experience with relational databased".

Years of experience are not always the same as understanding and knowledge.
In your case, your knowledge may be much greater than 30 years. I'm not sure
how we would know that.

I was simply providing an example suggesting that I have followed developers
who had many years of experience without a basic understanding of properly
structured tables. In no way was I suggesting you don't understand
normalization.
 
D

Duane Hookom

Dennis,
To further my point about experienced developers I would like you to look at
this page on normalization.

http://www.fmsinc.com/tpapers/datanorm/index.html

I find it very wrong yet it is authored by a very respected developer.

It's my opinion the author transforms one non-normalized structure into
another non-normalized structure.

Why not create a more normalized structure that has fields for ID, Year,
AccountCode, and BudgetAmt?
 
D

Dennis

Bruce,

I though I state my point quite clearly in my original message

"I've always been taught the exact opposite - that "Fields are cheap,
records are expensive" since GOING TO DISK IS SO SLOW VERSUS ACCESSING DATA
IN MEMORY. Is there something different about Access where the statement
"Fields are expense, records are cheap" is true." (excuse the caps, but I
had not other way to highlight the text).

The original question has to do with disk access speed versus the speed of
extracting a lot of variable from a long record." and that is all. Everyone
one else is addressing a non-issue. The data structure had NOTHING to do
with the original question, it was simply interesting background information,
that is ALL.

The question was disk access speed over a network versus extracting data
from a long record. That was the whole question and instead of answering the
question, people are worried about 30 fields.

All the 30 year comment was suppose to mean was I don't do stupid mistakes
like repeating fields and that is ALL it meant. Nothing else! However,
everyone, including you, had read way more into that statement.

Your comment of: To use your specific example of insurance processing, auto
and home insurance are so different that many developers would use a
different table for the two rather than broaden an existing auto insurance
table to accommodate home
insurance.

Response: You are so wrong. I know because I’ve been working in the
insurance field for those 30 years. In that time I’ve worked on 12 packages
from 12 different vendors and they all have many things in common. One of
them is they all have a policy master table which contains one record per
policy every issued regardless of what line it was written under. And the
policy master usually has about 100 + fields in it. (A couple had 200 +
fields.) They all also had a line specified policy master where the specific
line information (auto, home, work comp, general liability, etc.) for that
policy was store. (There were a lot of other tables too.) Your comment
about the lines being so different is both correct and incorrect. There are
about 100 fields that are common between each line of business between the
policy information, reinsurance, earned and unearned premiums, statistical
reporting to the insurance bureaus, and other such things. Now, for the
information that is line specific, that information is indeed stored in
different tables.

Actually, if the developer chose to have multiple master tables he would be
creating himself a nightmare when came to policy inquiry, claim processing,
earn and unearned premium account, policy reports, renewal processing,
cancellation and reinstatement processing, and a whole long list of other
issues. But then again, that is where the 30 years of experience comes in.

But it is quite a coincidence that all 12 vendors who had very different
products (from flat files on a mainframe to a windows based client server and
everything in-between) and in some cases where written decades apart in time
all took the same approach.


Your comment: Many developers would define "entity" more narrowly, and
therefore use separate tables, in a situation where many fields are used for
one type of
entity and not the other, or where the entities are clearly in different
categories.

Response: You are absolutely correct. The entity, at least from 12
independent software houses that I know of, is the policy. The policy,
regardless of which line of business, has about 100 to 200 common fields.
(The reason for the huge difference is due to what the different system
capabilities. The more they could do, they more data they stored.). That is
why those fields are in the policy master file / table. The many fields at
are held in common, but are specific to each line of business are held in
line of business policy master files and they are all children to the main
policy master parent.


Dennis
 
D

Dennis

David,

Your comment: I would argue that since both are insurance policies, you can
use the survey model, and for each attribute of the policy, you create a
record in a table connected to the main policy record. For the different
types of insurance policy, you'll have a different set of variables, because
they are different types, but by storing each one as a row in a related
table, you've made your structure extensible so that you can model any type
of insurance policy. And if you like, you can use a single set of user
interface objects to work with and print all of them.

Response: I’ve seen this tried before and it was just too complicated. Not
only do you have the policy table, there is also the coverage master and line
specific coverage table. Then you have the objects that are being insured
tables (houses, warehouses, office buildings, bridges, coin collections,
people - work comp, people’s honesty in bonds, and and on).

Then you have all of the transactions that apply to a policy such as
on-line rating, policy issuance, automatic cancellation for non-pay,
automatic reinstatement for payment received before the cut off date, offer
to renew, automatic renewal, automatic non-renew. Then you have the
sequential endorsement (changes to the policy), that can be done at any level
(policy, coverage, limits, object being insured and the limits on those
specific objects). Then you have out of sequence endorsements where you have
to back off exiting endorsement to apply an endorsement from the past, and
then reapply the backed of endorsement taking into account that the out of
sequence endorsement may have removed the insurable object that a subsequent
endorsement want to change. When you add all of these transactions to all of
the different data, it becomes very very complicated to have a table drive
database.

Maybe there are more recently developed systems that take this approach. I
actually think it is actually a pretty good approach, just very machine
intensive. But with the faster CPU of recent times, it is much more
feasible.

Now that I think about it, one of the systems that I worked on used a
simplified version of what you are suggesting to enable the users to extend
their policy, coverage, and insurable object databases.


Dennis
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top