nulls versus ?

Peter Kirk · May 18, 2006

Hi,

does anyone know of a resource which discusses the merits of using "nulls"
versus a specific "unknown" value? For example, should I use "null" for an
unknown or unset date, or should I use min-value? For a string (eg a name)
should I use null or an empty string? And so on.

(These sorts of issues crop up a lot in the applications I work on which
mostly involve presenting data on a web interface, and reading and writing
to and from a database).

Thanks,
Peter

Tasos Vogiatzoglou · May 18, 2006

I think that these depends on the semantics. But a general rule I like
to follow is this:

Null in the database for unknown.
Default values set by the editor when the fields are not completed.

e.g. if name was not completed I put an empty string to the db.
or in the lookuptables I add a value "Not set" and from the editor I
set this as a default value

Tasos

Paul Cheetham · May 18, 2006

Hi Peter,

I always create my tables in the database so that all fields are
non-nullable and must have a valid value.
For strings I use an empty string, numeric values vary depending on what
it is for, and for dates I will either use a specific date/time or the
same value as another date in the record.

Nullable fields in a MS SQL table degrades performance, but another
advantage of not allowing null values, is you don't have to check for them.

Paul

Ignacio Machin \( .NET/ C# MVP \) · May 18, 2006

HI,

Paul Cheetham said:
Hi Peter,

I always create my tables in the database so that all fields are
non-nullable and must have a valid value.

What you do with a date column? in a escenario where any date accepted by
SQL is an accepted value?
The only way is using null.

and for dates I will either use a specific date/time or the same value as
another date in the record.

So you use incorrect data, only to avoid dealing with null?

Nullable fields in a MS SQL table degrades performance

How ? This is the first time I hear this

Ignacio Machin \( .NET/ C# MVP \) · May 18, 2006

Hi,

Peter Kirk said:
Hi,

does anyone know of a resource which discusses the merits of using "nulls"
versus a specific "unknown" value? For example, should I use "null" for an
unknown or unset date, or should I use min-value? For a string (eg a name)
should I use null or an empty string? And so on.

There is not a "one rule fits all" it all depends of the type of data you
are dealing with, the acceptable values and the version of the framework you
use.

I do use null when possible to signal "unknown/unset" values in the DB. In
the code if I can use a default value (like DateTime.MinValue) I use it as
it does work in 1.1 and 2.0.
In case I cannot use a default value , like an integer value . then I create
a wrapper type to signal a null status. This is not longer needed with 2.0
as now you have nullable types.

james.curran · May 18, 2006

Nullable fields in a MS SQL table degrades performance,

I never heard that either. In fact, I've heard it the other way: The
non-nullable fields degrade preformance, because the db must now check
each value inserted to make sure it's not null.

Nonsense. You've merely changed the value you have to check for, and
the location of the check. The total work done remains the same,
because you've done nothing with the data entering the system.

In any non-trivial system, some values will be unknowns. Whether you
refer to those unknowns as Null, or "" or -1 or 4-Jul-1776 is relevant.
In each case you have a value you must check for.

Guest · May 18, 2006

Nullable fields in a MS SQL table degrades performance,

I never heard that either. In fact, I've heard it the other way: The
non-nullable fields degrade preformance, because the db must now check
each value inserted to make sure it's not null.

Nonsense. You've merely changed the value you have to check for, and
the location of the check. The total work done remains the same,
because you've done nothing with the data entering the system.

In any non-trivial system, some values will be unknowns. Whether you
refer to those unknowns as Null, or "" or -1 or 4-Jul-1776 is relevant.
In each case you have a value you must check for.

Indeed, and if the check for null is implemented as an integer compare
against zero, then it's _much_ quicker than doing, perhaps, a
character-by-character compare between strings.

Null is your friend.

Mark Wilden · May 18, 2006

I never heard that either. In fact, I've heard it the other way: The
non-nullable fields degrade preformance, because the db must now check
each value inserted to make sure it's not null.

The reason nulls can degrade performance is that the database engine has to
check a bit in the record to determine whether the associated column
contains a null. Also, those bits add to storage requirements.

///ark

Guest · May 18, 2006

And storing and checking a bit is _more_costly_ than comparing a larger
structure?

Mark Wilden · May 18, 2006

And storing and checking a bit is _more_costly_ than comparing a larger
structure?

Depends on usage. Typically, data is more often read than written. Non-null
columns need checking on write. Null columns need checking on read. Plus the
storage, which can double in the case of a small row size.

sdbillsfan · May 18, 2006

Huh? Which "database engine" are you talking about? Please qualify this
ridiculous statement with some sort of example/evidence.

james.curran · May 18, 2006

Non-null columns need checking on write.
Nope. I don't see it.
First of all, the Null/Not Null flag must always exist, as even a
not-null column must give some response to a Is-null query.

So, basically, we have two null-related boolean conditions: Can-be-null
(Y/N) and Is-Null (Y/N), with the limitation that "NY" is invalid.
Your contension is that given that limitation, we can, on occasion, not
store the second bit, and derive it's value algorithmically.

But that requires doing lots of work (columns which appear to be a
fixed size can be either of two different sizes), just to save the
storage of one bit. Since there are other boolean flags which need to
be saved, and bits are doled out in packages of 8 (min) or 32 (more
likely), storing the is-null flag for a not-null column is almost
certainly zero cost. (ie, it inflicts a cost only if the number other
flags total an exact multiple of 8)

so, in pseudo-code:
On insert:
if (column nullable)
if (data is null)
field.is_null = t
else
field = data
field.is_null = f
else
if (data is null)
throw exception
else
field = data
field.is_null = f

On Select
response.data = field
response.nullable = column nullable
response.is_null = field.is_null

Given that, I cannot imagine any scenario, where one path would be
significantly faster that another.

Mark Wilden · May 18, 2006

So, basically, we have two null-related boolean conditions: Can-be-null
(Y/N) and Is-Null (Y/N), with the limitation that "NY" is invalid.

Sort of. The first condition is stored with the table's metadata, whereas
the second has to be stored with each row.

Your contention is that given that limitation, we can, on occasion, not
store the second bit, and derive its value algorithmically.

Right. If the metadata says the column can't be null, we don't have to store
nullness for that column in each row.

But that requires doing lots of work (columns which appear to be a
fixed size can be either of two different sizes),

In SQL Server, each row with nullable columns includes a bitmap indicating
whether each nullable column actually contains null. See "Inside SQL Server
2000", p. 235.

just to save the
storage of one bit. Since there are other boolean flags which need to
be saved, and bits are doled out in packages of 8 (min) or 32 (more
likely), storing the is-null flag for a not-null column is almost
certainly zero cost. (ie, it inflicts a cost only if the number other
flags total an exact multiple of 8)

Including 0.

Which is why I said that making a column nullable can double
the size of the row. The null bitmask (probably because it's variable
length) isn't squeezed in with other metadata.

so, in pseudo-code:
On insert:
if (column nullable)
if (data is null)
field.is_null = t
else
field = data
field.is_null = f
else
if (data is null)
throw exception
else
field = data
field.is_null = f

No, field.is_null does not need to be set in the last case. Your example
actually points out something I missed -- that even writing a nullable field
can impose a performance hit.

///ark

Mark Wilden · May 18, 2006

Huh? Which "database engine" are you talking about? Please qualify this
ridiculous statement with some sort of example/evidence.

You're going to have to be more specific about which ridiculous statement
you're talking about - I make so many.

///ark

sdbillsfan · May 18, 2006

Anything in regard to degrading performance by using nullable columns.
Please supply a supporting example with explain plan and statistics.

Mark Wilden · May 19, 2006

Anything in regard to degrading performance by using nullable columns.
Please supply a supporting example with explain plan and statistics.

First let me know if you read and disagree with my response to James (and
why).

nulls versus ?

Peter Kirk

Tasos Vogiatzoglou

Paul Cheetham

Ignacio Machin \( .NET/ C# MVP \)

Ignacio Machin \( .NET/ C# MVP \)

james.curran

Guest

Mark Wilden

Guest

Mark Wilden

sdbillsfan

james.curran

Mark Wilden

Mark Wilden

sdbillsfan

Mark Wilden