Semi OT: To Null or not to Null

J

John B

Semi OT I know but I want other programmers opinions, not DBA's.

When storing class data in a database, do you use null or the default
for the type if the field is empty.

ie.
class Foo
int P1
string P2
Save()
insert into FooTable (p1 value, p2 value)
Load()
get from FooTable p1, p2 into P1, P2

Table FooTable
P1 integer, P2 nvarchar(100)

Personally, I use the default as it then requires no checking on load
for nulls, or checking on save to transform defaults to nulls.

Any thoughts?


Cheers
JB
 
M

Michael C#

How about a programmer/DBA? Strictly speaking from the DBA side, NULLs
suck. They're supposed to be temporary placeholders for "unknown" values
that you might obtain later. As soon as you get the known value, it should
immediately replace the NULL.

From the programming side, using NULLs in place of known "blank" or "empty"
values instead of blanks, zeroes, or whatever can overcomplicate your
programming amd give you false results. Consider a query where you're
searching for people in a database with no middle name. You have to check
for blanks in the middle name column as well as NULLs using SQL's "IS NULL"
or "COALESCE()" syntax. using "IS NULL" in WHERE clauses can also hurt
performance in some instances, as the Query Optimizer may not be able to use
an Index when querying the column.

Additionally, NULLs can give you 'unexpected' results in aggregate functions
because of how aggregate functions handle NULLs (the AVG() function comes
immediately to mind).

So as you can see, from both sides, I just don't like NULLs :) They have
their place, but I avoid them as much as possible for the reasons above, and
for the reasons you mention. After all, what does a NULL look like if you
export a table to a Text file? The caveat is that sometimes NULLs are the
best option - in a column where the DateTime might not be known, you can't
store a blank DateTime - NULLs might be your best option in that instance.
 
B

Bob Grommes

Use database NULL when the value is (1) truly unknown and (2) you need to
distinguish "unknown" from a default value.

For example suppose you have a required text field in an insurance form that
needs to be filled in over time. And suppose that an empty string is a
valid response to that particluar question. In this case DBNull.Value and
String.Empty (on the client programming side) represent two different ideas:
A "none" or "nothing" response and no response at all. This is in theory
what database NULLs are for -- they answer a question with "I don't know" or
"no one has said yet". They tell you whether or not someone has made a
decision to put a value, even an empty or default one, in that field.

Even in my contrived example, it may be feasible or even desireable to put
the string "UNKNOWN" or "N/A" or similar in the field. As long as that
value can have a single purpose, then you are spared the potential pain and
annoyance of using NULL. However, if the UI actually has to show blanks
rather than this "magic" value then you may end up tap dancing almost as
much as you do with NULL. At least NULL has the virtue that it has a
univeral meaning and representation on the server side, and common
conventions for handling it on the client side.

That is more or less the decision process I go through. Try to avoid NULL,
but there are times when you need it.

--Bob
 
C

Chris R. Timmons

At least NULL has the virtue that
it has a univeral meaning and representation on the server side,
and common conventions for handling it on the client side.

Bob,

I've got to argue with that point. NULL has no inherent meaning of
its own. It only indicates the absence of data. It's not even equal
to itself! In the application logic, a NULL can mean whatever the
programmer wants it to mean. That covers a lot of ground - and
introduces a potential mountain of debugging and maintenance
headaches.

Just off the top of my head, taken by itself a NULL value *could*
mean:

- unknown
- not applicable
- always optional
- conditionally optional
- positive infinity
- negative infinity
- error
- required, but can be supplied later

Since a NULL value has no inherent meaning, it has to be given
meaning somewhere else, usually in the application code. Too often
that meaning is expressed in a comment, which may or may not be up-
to-date with the application's code. This introduces tighter
coupling between the database schema and the app's code.

I prefer to use "flag" columns in conjunction with columns that would
otherwise contain NULLs. (e.g. an optional "EndDate" date column has
a companion "IsEndDateSupplied" bit column). That way the database
schema is self-describing, and it's loosely coupled with the
application's code.

Chris.
 
M

Michael C#

Chris R. Timmons said:
Just off the top of my head, taken by itself a NULL value *could*
mean:

- unknown
- not applicable
- always optional
- conditionally optional
- positive infinity
- negative infinity
- error
- required, but can be supplied later

Personally I would never use NULL to represent most of these particular
items. "always optional"? "conditionally optional"? What does that mean?
Has the user filled in all the other columns and failed to populate these
columns? Are they never going to come back and populate these columns? Why
are we using NULLs to indicate this? How about a code indicating "NO
RESPONSE" to an optional question? Or as you mention later, a separate
column with a flag indicating NO RESPONSE? "No Response" is not "unknown".
Neither is "not applicable", or "error".

"positive infinity", "negative infinity"? If you're using NULL like this,
what should the following statement produce:

SELECT SUM(x) FROM table1

Table1
-------
100
200
300
NULL

To be fair, scalar mathematics involving NULL do produce NULL. So 3 + NULL
= NULL. So in this case, is the result Negative Infinity or Positive
Infinity? There's a slight difference... Perhaps if you split attributes
(argh) you could tell the difference. If your result is -Infinity or
+Infinity, it is not "unknown".

"not applicable"? Again, this is something that should be coded and not
left to potential misinterpretation. Let's look at a quick sample of using
NULL to cover these various areas:

Table2
-------
(Last Name, DOB, SSN)
Johnson, 1968-01-01, 000-00-0001
Smith, 1975-10-18, 000-00-1280
Williams, 1961-07-10, NULL

In this instance, Williams has a NULL value in SSN. Which situation
describes why we have a NULL in Williams' SSN column?

1) he has an SSN but we don't know it yet
2) he has no SSN and we know that for a fact
3) Williams refused to supply his SSN
4) it is conditionally optional based on our business rules which say that
we don't need an SSN since we have last name and DOB
5) it is just flat-out optional, period
6) there was an "error" (i.e., we deemed the SSN we received from him as
invalid based on rules set up by the SSA)

Unfortunately, since we used NULL to represent all these different
possibilities we have no idea why we don't have his SSN.

NULL has an inherent meaning, and it is supposed to represent one thing and
one thing only: "unknown/missing" values. Trying to project 7 or 8
different meanings on NULL is not a good practice to get into, and does
little but confuse the design and implementation.
 
C

Chris R. Timmons

in message


Personally I would never use NULL to represent most of these
particular items.

Michael,

Was all of that directed at me? If so, maybe my post wasn't clear.
I *NEVER* allow NULL values in my databases, and that's the approach
I was advocating.

Chris.
 
B

Bob Grommes

Chris,

You are describing precisely what NULL already provides support for. A
NULLable field just has a bit reserved in the record that can be set or
reset to indicate NULLness. For every eight NULLable fields, a byte of
record space is consumed for these flags. It looks to me like you are
re-inventing the wheel. At least "nullility" has been defined as to how it
will be indexed, ordered, aggregated, etc. And one could argue that
NULLable fields are self-documenting. If a field is nullable you do *not*
know that it's optional, or missing -- just that it's unknown, for whatever
reason.

In my experience it's always a mistake to ascribe any significance to NULL
beyond "I don't know". This makes NULL propagation a lot more intuitive. 5
plus 3 plus "i don't know" equals "I don't know". That makes sense. If you
think of NULL as a general purpose placeholder for which you can assign any
meaning, then you might object that 5 plus 3 plus a placeholder equals 8.
In truth we have a placholder value known as "zero" that works fine in those
situations. Zero is not NULL. An empty string is not NULL. Blanks are not
NULL. NULL is just NULL. A sort of zen statement of absence without any
other meaning.

When I said that NULL has a "universal meaning" I was referring to the fact
that it always indicates that a value is not known. You are wanting to
track one or more *reasons* why it might be unknown, which is a separate
issue. Indeed you might want or need to do that in some situations. But
that would be over and above what NULL is intended to convey.

As a last side point I would like to point out the oft-overlooked fact that
ordinary bit fields cannot have indexes on them. They can be searched, but
that search does not benefit from an index. If a WHERE clause includes
other, indexable fields, and those fields greatly reduce the result set on
their own, then the bit field would not be a big drag, performance-wise.
However if the bit field has to carry the primary burden of the search
critera, as in "give me all the few records in a 3 million record table
where the end date is not known" then you have a likely performance issue.
For this reason when I *do* have a flag field of any kind, and I know that
field is likely to be searched on, I tend to define a char(1) and constrain
it to 'Y' or 'N'. It is less elegant, but it is optimizable in all respects
except disk usage, which is cheap. Or, you might use a tinyint and
constrain it to zero and one.

--Bob
 
M

Michael C#

Chris R. Timmons said:
Michael,

Was all of that directed at me? If so, maybe my post wasn't clear.
I *NEVER* allow NULL values in my databases, and that's the approach
I was advocating.

I wasn't necessarily directing "at you", I was pointing out to others that
might trip across this thread why NULL should not be used to represent all
of the various values you presented. I think I mentioned your method of
using another column to represent one of the above conditions in my post.
Basically the point I was trying to get across is that if someone
implemented a table which contained a NULL-able column, and the NULL was
used to represent items from the list you presented, at best there will be
minor inconsistencies - which will grow considerably as you add more data.
At worst, your data will be unusable without a lot of manual processing
trying to figure out which NULL stands for what in each column.
 
J

John B

Thanks for your input everyone, it was good to read everyones thoughts
on the matter and I learnt a bit on the way, such as bit fields cannot
have an index applied.

I agree with pretty much everything everyone has said, which is good,
pretty much the only time I use null's is for foreign key columns where
there is a valid reason not to have a value.

The reason I asked this question is because I made a change to our
database schema here and defined a tinyint column as not null. We run
imports into our tables from excel and someone complained that the
column didnt accept null's.
I then realised I didnt have a good reason not to allow nulls, other
than that they were not a valid value (and not catered for) in the
application, which, to them, just seems like me being lazy and not
wanting to cater for null's.

Cheers

JB
Not null
 
M

Michael C#

John B said:
I then realised I didnt have a good reason not to allow nulls, other than
that they were not a valid value (and not catered for) in the application,
which, to them, just seems like me being lazy and not wanting to cater for
null's.

Nah, it's exactly the opposite :) They were apparently too lazy to populate
all the required values :) If there's a valid reason for allowing NULLs
(the value is truly unknown), then by all means use it. But if not,
consider putting a Default ZLS (zero-length string), numeric 0, or other
value on the column.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top