calculated values in a key field

D

Douglas J. Steele

Realistically, you should never store calculated values. You're trying to
store 2 pieces of information in a single field (which is a violation of
database normalization principles), plus (if you keep the other 2 values)
you're storing information redundantly. In addition, having a key based on
name isn't a good idea (What happens if you have also have Paul Mitchinson
as a customer? What happens if Jane Miller gets married and changes her name
to Jane Smith?

If you're still determined to do it, you'd have to put the calculation in
the form's BeforeInsert event so that it calculates the key before inserting
the record into the table. There's nothing you can do at the table level.
 
B

Bernard Peek

Realistically, you should never store calculated values. You're trying to
store 2 pieces of information in a single field (which is a violation of
database normalization principles), plus (if you keep the other 2 values)
you're storing information redundantly. In addition, having a key based on
name isn't a good idea (What happens if you have also have Paul Mitchinson
as a customer? What happens if Jane Miller gets married and changes her name
to Jane Smith?

If you're still determined to do it, you'd have to put the calculation in
the form's BeforeInsert event so that it calculates the key before inserting
the record into the table. There's nothing you can do at the table level.

You can create a query based on the table and base the form on the query
rather than the table. That makes it possible, but it's definitely not
something I would recommend.
 
B

BruceM

Why would you not recommend it? The calculation would be performed on the
fly, which is exactly what you want. The calculated field would be
available as needed, but wouldn't be stored. Or am I missing just what it
is that you do not recommend?
 
B

Bernard Peek

Storing calculated data is always risk. In fact any deviation from a fully
normalised structure has risks. The main risk is that the parent fields and
the calculated filed might be updated separately, or two differnt sets of
parents could produce the same value in the calculated field.

If you want to do this it may be worth doing some more research on how and
why other people have taken different approaches to the problem. The first
question to ask is what benefit do you get from denormalising the table. If
you can get the same advantages without the risks then the more risky method
isn't worth pursuing.

This sort of calculation is sometimes used in big databases where it can
give performance improvements, but a big database probably has someone whose
job is to find and fix the errors that a poor implementation (or a
well-meaning but naive user) can introduce.

The process of using real-word values like a name to generate a shorter key
is called hashing. There are standard techniques to do it, I'm sure Google
will find quite a lot of hits on hashing algorithms. An important part of a
hashing algorithm is a way of handling situations where different input data
generate the same key value.

If I was implementing a hashing system in a database I would set up a
trigger on the database table so that whenever the record was updated the
hash key was re-calculated.


Why would you not recommend it? The calculation would be performed on the
fly, which is exactly what you want. The calculated field would be
available as needed, but wouldn't be stored. Or am I missing just what it
is that you do not recommend?
 
B

BruceM

You missed the point of my question. You seemed to be saying you would not
recommend basing the form on a query. I asked why not, since a calculated
query field is not a stored value. I *never* advocated *storing* the
calculated value, but there is no reason not to calculate a value for
display purposes.

Bernard Peek said:
Storing calculated data is always risk. In fact any deviation from a fully
normalised structure has risks. The main risk is that the parent fields
and
the calculated filed might be updated separately, or two differnt sets of
parents could produce the same value in the calculated field.

If you want to do this it may be worth doing some more research on how and
why other people have taken different approaches to the problem. The first
question to ask is what benefit do you get from denormalising the table.
If
you can get the same advantages without the risks then the more risky
method
isn't worth pursuing.

This sort of calculation is sometimes used in big databases where it can
give performance improvements, but a big database probably has someone
whose
job is to find and fix the errors that a poor implementation (or a
well-meaning but naive user) can introduce.

The process of using real-word values like a name to generate a shorter
key
is called hashing. There are standard techniques to do it, I'm sure Google
will find quite a lot of hits on hashing algorithms. An important part of
a
hashing algorithm is a way of handling situations where different input
data
generate the same key value.

If I was implementing a hashing system in a database I would set up a
trigger on the database table so that whenever the record was updated the
hash key was re-calculated.
 
J

Jamie Collins

You're trying to
store 2 pieces of information in a single field (which is a violation of
database normalization principles)

I guess you are alluding to update anomalies but something doesn't
sound right as stated. Many data elements comprise multiple pieces of
information: a date, UTM coordinates, telephone number, etc and no one
would suggest splitting them into multiple columns under normal
circumstances.

Jamie.

--
 
B

Bernard Peek

OK, sorry. I understand. Actually I don't have any problem at all with basing
forms on queries, it can be a useful technique. My objection was to using a
hash function to create a key without first doing some research. Using
hashes is a perfectly respectable technique and most database management
systems use it, but it's not easy to do properly.

You missed the point of my question. You seemed to be saying you would not
recommend basing the form on a query. I asked why not, since a calculated
query field is not a stored value. I *never* advocated *storing* the
calculated value, but there is no reason not to calculate a value for
display purposes.
 
B

BruceM

Thanks for clearing that up. I think I have read about hashing, but not
being involved with any big databases I have decided not to pursue it any
further. As it is there is plenty I don't know about subjects of more
immediate concern.

Bernard Peek said:
OK, sorry. I understand. Actually I don't have any problem at all with
basing
forms on queries, it can be a useful technique. My objection was to using
a
hash function to create a key without first doing some research. Using
hashes is a perfectly respectable technique and most database management
systems use it, but it's not easy to do properly.
 
B

Burton Roberts

I would (suggest splitting them into multiple columns under normal
circumstances)
 
P

Peter Mitchell

Hello
Is it possible at all (presumably in a form) to create the value to be
stored in a key field based on another field (or two).
For example, to have the form insert
mitchp for a Customer ID based ona surname of mitchell and a first name
of peter.
I know the calculation - done many many times in a query - and also VB - but
I want the system to actually store the calculated value mitchp.
(Yes most of the real world may use numbers and autonumbers - but
nevertheless ...)

Any suggestions welcome.

Peter
 
J

Jamie Collins

I would (suggest splitting them into multiple columns under normal
circumstances)

I hope no one takes you too seriously. If you truly split dates in to
separate columns for year, month, day, hours, minutes and seconds,
please post the Validation Rule you use to test for legal combinations
e.g. to prevent year=2007 month=2 day=29 and, assuming you need to put
them back together to be able to do so, justify why it's worth
splitting them apart in the first place.

Jamie.

--
 
J

Jamie Collins

I would (suggest splitting them into multiple columns under normal
circumstances)

Perhaps I should have said, "no one would _seriously_ suggest
splitting them into multiple columns under normal circumstances" :(

Exhibit A:

Molecular, Atomic and Sub-atomic Data Elements - Part 1
by Joe Celko
http://www.dbazine.com/ofinterest/oi-articles/celko29

Non-atomic Data Elements:
The most common newbie error is splitting a temporal data element into
(year, month, day) columns or as (year, month) columns, or (year)
columns. The problem with temporal data is that, by its nature, it is
not atomic; it is a continuum. A continuum has no atomic parts; it can
be infinitely subdivided. Thus, the year 2005 is shorthand for the
pair ("2005-01-01 00:00:00," "2005-12-31 23:59:59.999..") by which we
live with the precision that our SQL product has for the open end on
the left. It includes every point in between - every uncountably
infinite one of them...

Atomic Data Elements:
The other mistake one can make is to split an atomic attribute into
columns. As we all know from those 1950s science fiction movies,
nothing good comes from splitting atoms - it could turn your brother
into an atomic werewolf! A phone number in the United States is
displayed as three sections (area code, exchange, and number). Each
part is useless by itself. In fact, you should include the
international prefixes to make it more exact, but usually, context is
enough. You would not split this data element over three columns
because you search and use this value in the order in which it is
presented, and you use it as a whole unit. This is an atom, not a
molecule.
[Unquote]

Jamie.

--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top