Writing a DAL with TDD

Frans Bouma [C# MVP] · Apr 27, 2008

Arne said:
I don't think DAL is more generic than any other layer except
presentation layers. Other layers are supposed to be reusable too.

That's not an explanation how to solve the 'unlimited usage' problem
when it comes to designing a DAL: how are you going to be sure your
tests cover all possible usages, so you know your DAL is correct?

And the difference between "TDD for DAL" and "unit tests for
generic design of DAL" sound more as a difference in terminology
than a real difference in process.

isn't TDD for DAL starting with the test, mocking the DAL out, then
requiring you to write the real code instead of using the mocks, while
the automated unittests for testing edge cases is more for easing the
proving process of the design?

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Arne Vajhøj · Apr 27, 2008

Frans said:
That's not an explanation how to solve the 'unlimited usage' problem
when it comes to designing a DAL: how are you going to be sure your
tests cover all possible usages, so you know your DAL is correct?

That is not possible.

My point is that it is also impossible for other layers.

isn't TDD for DAL starting with the test, mocking the DAL out, then
requiring you to write the real code instead of using the mocks, while
the automated unittests for testing edge cases is more for easing the
proving process of the design?

You can not prove code to be correct with unit tests.

And even "traditional non-TDD unit tests" approach would prefer
to write the unit tests before.

There should be some difference in the role of the tests in the
design process.

But I do not see it as black and white - more like 10 different
shades of gray.

Arne

Ben Voigt [C++ MVP] · Apr 28, 2008

I understand mocks. But at what point are we dedicating our

development time to developing complex mocks? Let me see you mock the
HTTP Request pipeline and unit test some code that overrides
ASP.NET's Application (Global) .Init method. If your targeting IIS7
and the request pipeline (say your code programmatically wires up
HTTP modules to handle the various pipeline events), then your mock
would have to emulate IIS7 AND ASP.NET application startup. That's
one enormous mock. At what point does it lose it's value (especially
if you have to make it emulate significant functionality in a complex
integrated system, like the ASP.NET/IIS7 pipeline).

Here's the difference between a mock and a test environment. You can quite
easily run real IIS or database engine or whatever, but feed in controlled
data.

Note that I didn't say you'd come up with the best solution using TDD to
drive database design. I simply said you can fill a database with sample
data and use that to drive TDD of your interface code, and that it was
theoretically also possible to apply TDD to the table layout itself.

In short, I don't think that TDD is equally valuable for the
development of all types of application components. My OP here was
posted so that I could find out other's opinions on the "TDD-ability"
of a DAL.

It sounds as if you made up your mind already to not test the DAL, and came
looking for affirmation (to strengthen your case with a manager perhaps).
If so, you approached the newsgroup entirely wrong and wasted everyone's
time.

Cramer · Apr 29, 2008

I understand mocks. But at what point are we dedicating our

Here's the difference between a mock and a test environment. You can
quite easily run real IIS or database engine or whatever, but feed in
controlled data.

Note that I didn't say you'd come up with the best solution using TDD to
drive database design.

I know *you* didn't, but the person posting as "mgsram" did promote the
idea.

I simply said you can fill a database with sample data and use that to
drive TDD of your interface code,

You and others mentioned that, and I raised no concerns about using test
data - ever... gotta have test data.

and that it was theoretically also possible to apply TDD to the table
layout itself.

Just because it's theoretically possible doesn't mean that it's a good idea
(to design a relational database based almost exclusively on how it will be
used, which is what TDD would do).

It sounds as if you made up your mind already to not test the DAL,

You are only partially right - I did make up my mind, but only after getting
a bunch of responses to my OP here. At the time I posted the OP, I thought
that maybe I was missing something about TDD if I was having a difficult
time applying it to my DAL. I'm working on a green field project and just
got on board with the idea of TDD - as I stated - and being new to TDD
figured the group might provide some pointers to help me get "unstuck." What
I got was an explanation of *why* it's difficult (perhaps prohibitively
expensive anyway), to develop a DAL via TDD.

The responses that stated that I could apply TDD to the DAL, and even use
TDD to design the relational database immediately lost all credibility. I'm
new to TDD, but not to data modeling and client/server business application
development. Anybody promoting the idea of applying TDD to designing a
database, IMHO, is dangerous to their employer. That's not meant as an
attack on anybody here personally. It's just a fact. I'd even say that it's
*probably never* a good idea to design a relational database based
exclusively or even mostly on how the data will be programmatically
accessed - which is what TDD would inherently do.

and came looking for affirmation (to strengthen your case with a manager
perhaps).

I *am* the manager. Geeze! I'm actually the technical lead and I have
complete control over this green field project - not only the technical
implementation, but also on the development process. The people I work for
have absolutely zero say in how development proceeds. They just want the
final product. They will never hear about "TDD" much less be able to tell me
yes or no about employing it. And, I do see TDD as very powerful and
currently *require* it's use on this project... just not on *every* aspect
of this project - and probably never on a DAL, or HTTP pipeline
configuration code, and other such things were mocking it would be
prohibitively expensive.

BTW: I'm writing this not to defend anything - but simply to point out how
wrong you are about this. It's beyond me why you seem so personally
offended.

If so, you approached the newsgroup entirely wrong and wasted everyone's
time.

Even if you were right, it is not a wast of everyone's time to have a lively
discussion in which beliefs are challenged. As a result of the challenge
you'll either (1) change your beliefs for the better; or (2) affirm your
beliefs. You can't go wrong either way.

-Cramer

mgsram · Apr 29, 2008

Whether he is narrating integration problems or not, I think it's utterly

naive to drive a database design from the point of view of the code that
will use the database - which is what you advocated elsewhere. I have been
in the industry for a very long time and have spent a lot of time cleaning
up databases, many times doing complete rewrites, after the initial
developers before me failed to make the database model the real world, and
instead designed the database to model their current application needs.
Applying TDD to the design of a database itself, by definition, means you
are modeling your application's needs and not the real world of the
business. That's a huge mistake that many systems never recover from (and if
the system is important enough, will eventually necessitate a complete
redesign and reimplementation of the database according to a new data model
that captures the real-world of the business).

I fully agree. Re-reading my response, I feel how stupid it was of me
to say
that (Its not impossible, but its not practical either) . Honestly, I
misread
Ben's comment and got carried away.

As I stated in my OP here, I'm absolutely fascinated by TDD and now consider
myself to be a True Believer after having been a sceptic for a long time.
But trying to apply TDD to develop a DAL seems particularly troublesome -
especially if we are to follow the TDD dogma carefully (i.e., have our unit
tests atomic, completely self-contained, etc).

Is DAL not supposed to be a object model layer abstracting a
persistent storage
and exposing a seamless interface for the upper layers? i.e. a layer
where
the output from a SQL / stored procedure is converted to object model
and
vice versa? Is my understanding in error?

Whatever i mentioned in my original post was based on the above. I
think there
is considerable logic involved in translation of Object Model to SQL
( perhaps
you will have a buch of Command builders etc.). This, i think can
certainly
be TDDed following the dogma. I.e. to start from an example of what
the DAL
should appear as to the upper layers and then write the logic for
translation.

Pls correct me if i this is incorrect.

The benefits of TDD seem readily apparent - to me anyway - when applied to
things like developing an application framework or parsing engine, or
designing some new algorithm for a particular business scenario. But when it
comes to designing a DAL... or ASP.NET Web application, where a lot of code
MUST interact with external systems or be dependent on things like ASP.NET
Application state, Session state, or the HTTP request pipeline (which as of
IIS 7 is now fully integrated with the IIS request pipeline), then we're
forced to make our unit tests (if we can even make them true unit tests)
depend on external things like the HTTP request pipeline or an external
database. I know, you'll say that we "just" mock those things...

No. i wont say that. I have had similar experiences ( although I m not
ASP.NET/
IIS developer). I have been able to deal with them by writing simple
wrappers
(and mocking the wrappers if necessary) or having event mocks and
resorting
to behavior testing if my logic is event driven.

That said, I have always found Integration testing to be necessary at
some point
of time and I do have complex Integration tests, sometimes involving
multiple
threads and device simulators. My integration tests often take time to
execute
as i have to collect certain amount of data or run for certain time to
validate
the behavior and therefore, i execute them only once in a while.
Still, they
are way way faster than interfacing with a real device.

I understand mocks. But at what point are we dedicating our development time
to developing complex mocks? Let me see you mock the HTTP Request pipeline
and unit test some code that overrides ASP.NET's Application (Global) .Init
method. If your targeting IIS7 and the request pipeline (say your code
programmatically wires up HTTP modules to handle the various pipeline
events), then your mock would have to emulate IIS7 AND ASP.NET application
startup. That's one enormous mock. At what point does it lose it's value
(especially if you have to make it emulate significant functionality in a
complex integrated system, like the ASP.NET/IIS7 pipeline).

I think if i am testing the wiring up, then it would be integration
testing. I
would not use mocks in such a scenario. Once again, behavior driven
testing
has helped me in such scenarios. In,one of my projects, I had to use
MSMQ and
i did use some of the MSMQ API in testing my message sender class (a
simple
forwarding wrapper on top of MSMQ) to validate that the message was
indeed delivered in
the queue. For all of the other objects, i mocked my wrapper. In fact,
I thought
it gave me more control as intellisense would only show whats
available in the wrapper.

In short, I don't think that TDD is equally valuable for the development of
all types of application components. My OP here was posted so that I could
find out other's opinions on the "TDD-ability" of a DAL.

Given the responses here, and silence from some of this group's more
talented developers (Skeet, Duniho, etc), coupled with the strongest
advocate's (for TDD-ing a dal) promotion of even designing a data model
based on how the application will interact with the database (thereby losing
a bunch of credibity) - I'm concluding that it's (1) difficult or impossible
to go the TDD route in earnest with a DAL, unless (2) we mock the he!! out
of the database and therefore dedicate absolutely huge amounts of effort the
developing mocks. That's not worth it, IMHO.

Sure. I am not one of the talented developers and not a frequent
visitor
to the newsgroup either. As for mocking the DB - no I would not do
that; perhaps
I would use in-memory DB for my integration testing ( on the same
lines of using
a device simulator).

mgsram · Apr 29, 2008

Whether he is narrating integration problems or not, I think it's utterly

naive to drive a database design from the point of view of the code that
will use the database - which is what you advocated elsewhere. I have been
in the industry for a very long time and have spent a lot of time cleaning
up databases, many times doing complete rewrites, after the initial
developers before me failed to make the database model the real world, and
instead designed the database to model their current application needs.
Applying TDD to the design of a database itself, by definition, means you
are modeling your application's needs and not the real world of the
business. That's a huge mistake that many systems never recover from (and if
the system is important enough, will eventually necessitate a complete
redesign and reimplementation of the database according to a new data model
that captures the real-world of the business).

I fully agree. Re-reading my response, I feel how stupid it was of me
to say
that (Its not impossible, but its not practical either) . Honestly, I
misread
Ben's comment and got carried away.

As I stated in my OP here, I'm absolutely fascinated by TDD and now consider
myself to be a True Believer after having been a sceptic for a long time.
But trying to apply TDD to develop a DAL seems particularly troublesome -
especially if we are to follow the TDD dogma carefully (i.e., have our unit
tests atomic, completely self-contained, etc).

Is DAL not supposed to be a object model layer abstracting a
persistent storage
and exposing a seamless interface for the upper layers? i.e. a layer
where
the output from a SQL / stored procedure is converted to object model
and
vice versa? Is my understanding in error?

Whatever i mentioned in my original post was based on the above. I
think there
is considerable logic involved in translation of Object Model to SQL
( perhaps
you will have a buch of Command builders etc.). This, i think can
certainly
be TDDed following the dogma. I.e. to start from an example of what
the DAL
should appear as to the upper layers and then write the logic for
translation.

Pls correct me if i am fundamentally wrong.

The benefits of TDD seem readily apparent - to me anyway - when applied to
things like developing an application framework or parsing engine, or
designing some new algorithm for a particular business scenario. But when it
comes to designing a DAL... or ASP.NET Web application, where a lot of code
MUST interact with external systems or be dependent on things like ASP.NET
Application state, Session state, or the HTTP request pipeline (which as of
IIS 7 is now fully integrated with the IIS request pipeline), then we're
forced to make our unit tests (if we can even make them true unit tests)
depend on external things like the HTTP request pipeline or an external
database. I know, you'll say that we "just" mock those things...

No. i wont say that. I have had similar experiences ( although I m not
ASP.NET/
IIS developer). I have been able to deal with them by writing simple
wrappers
(and mocking the wrappers if necessary) or having event mocks and
resorting
to behavior testing if my logic is event driven.

That said, I have always found Integration testing to be necessary at
some point
of time and I do have complex Integration tests, sometimes involving
multiple
threads and device simulators. My integration tests often take time to
execute
as i have to collect certain amount of data or run for certain time to
validate
the behavior and therefore, i execute them only once in a while.
Still, they
are way way faster than interfacing with a real device.

I understand mocks. But at what point are we dedicating our development time
to developing complex mocks? Let me see you mock the HTTP Request pipeline
and unit test some code that overrides ASP.NET's Application (Global) .Init
method. If your targeting IIS7 and the request pipeline (say your code
programmatically wires up HTTP modules to handle the various pipeline
events), then your mock would have to emulate IIS7 AND ASP.NET application
startup. That's one enormous mock. At what point does it lose it's value
(especially if you have to make it emulate significant functionality in a
complex integrated system, like the ASP.NET/IIS7 pipeline).

I think if i am testing the wiring up, then it would be integration
testing. I
would not use mocks in such a scenario. Once again, behavior driven
testing
has helped me in such scenarios. In,one of my projects, I had to use
MSMQ and
i did use some of the MSMQ API in testing my message sender class (a
simple
forwarding wrapper on top of MSMQ) to validate that the message was
indeed delivered in
the queue. For all of the other objects, i mocked my wrapper. In fact,
I thought
it gave me more control as intellisense would only show whats
available in the wrapper.

In short, I don't think that TDD is equally valuable for the development of
all types of application components. My OP here was posted so that I could
find out other's opinions on the "TDD-ability" of a DAL.

Given the responses here, and silence from some of this group's more
talented developers (Skeet, Duniho, etc), coupled with the strongest
advocate's (for TDD-ing a dal) promotion of even designing a data model
based on how the application will interact with the database (thereby losing
a bunch of credibity) - I'm concluding that it's (1) difficult or impossible
to go the TDD route in earnest with a DAL, unless (2) we mock the he!! out
of the database and therefore dedicate absolutely huge amounts of effort the
developing mocks. That's not worth it, IMHO.

Sure. I am not one of the talented developers and not a frequent
visitor
to the newsgroup either. As for mocking the DB - no I would not do
that; perhaps
I would use in-memory DB for my integration testing ( on the same
lines of using
a device simulator).

- Sriram

Frans Bouma [C# MVP] · Apr 29, 2008

Ben said:
It sounds as if you made up your mind already to not test the DAL, and came
looking for affirmation (to strengthen your case with a manager perhaps).

Not using TDD to create the DAL doesn't mean 'not testing the DAL'.
There are numerous ways to make sure your software works. As a DAL is a
general purpose layer which has unlimited # of usage scenario's, it's
impossible to test it with just that: a set of usage scenarios. You
therefore need different ways to test if the DAL is correct.

I think that was the point Cramer was trying to make.

If so, you approached the newsgroup entirely wrong and wasted everyone's
time.

I think the thread was very useful, as you seem to forget that most
people visiting a newsgroup (through an NNTP reader or most likely
through google groups) aren't posting at all but just lurking.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Ben Voigt [C++ MVP] · Apr 29, 2008

You are only partially right - I did make up my mind, but only after
getting a bunch of responses to my OP here. At the time I posted the
OP, I thought that maybe I was missing something about TDD if I was
having a difficult time applying it to my DAL. I'm working on a green
field project and just got on board with the idea of TDD - as I
stated - and being new to TDD figured the group might provide some
pointers to help me get "unstuck." What I got was an explanation of
*why* it's difficult (perhaps prohibitively expensive anyway), to
develop a DAL via TDD.
The responses that stated that I could apply TDD to the DAL, and even
use TDD to design the relational database immediately lost all
credibility. I'm new to TDD, but not to data modeling and
client/server business application development. Anybody promoting the
idea of applying TDD to designing a database, IMHO, is dangerous to
their employer. That's not meant as an attack on anybody here
personally. It's just a fact. I'd even say that it's *probably never*
a good idea to design a relational database based exclusively or even
mostly on how the data will be programmatically accessed - which is
what TDD would inherently do.

But you must take programmatic access into account early and often in order
to get good performance, right? I'm not an extensive database user so I
don't know for certain.

I *am* the manager. Geeze! I'm actually the technical lead and I have
complete control over this green field project - not only the
technical implementation, but also on the development process. The
people I work for have absolutely zero say in how development
proceeds. They just want the final product. They will never hear
about "TDD" much less be able to tell me yes or no about employing
it. And, I do see TDD as very powerful and currently *require* it's
use on this project... just not on *every* aspect of this project -
and probably never on a DAL, or HTTP pipeline configuration code, and
other such things were mocking it would be prohibitively expensive.

BTW: I'm writing this not to defend anything - but simply to point
out how wrong you are about this. It's beyond me why you seem so
personally offended.

I didn't mean to express offense, just that some of the language you were
using in the preceeding post suggested that you weren't actually interested
in differing opinions. Of course, that's just the nature of human language,
that some readers infer something that wasn't intended.

Even if you were right, it is not a wast of everyone's time to have a
lively discussion in which beliefs are challenged. As a result of the
challenge you'll either (1) change your beliefs for the better; or
(2) affirm your beliefs. You can't go wrong either way.

Actually we get quite a few posts in here from people who lean toward
(3) hunting for ammunition to support their stance and ignoring anything
contrary

I'm glad you aren't among them, and I'm glad this entire thread has been
helpful to you.

Even hunting for ammunition wouldn't necessarily be a bad thing if it's
expressed up front and if the poster seriously considers dissenting
opinions.

Cramer · Apr 29, 2008

Ben Voigt said:
But you must take programmatic access into account early and often in
order to get good performance, right? I'm not an extensive database user
so I don't know for certain.

By "programmatic access" I'm presuming - for purposes of the following
comments - that you mean client application (or middle tier) code that
ultimately consumes data in a database:

When it comes to designing a {data model and associated the tables in a}
relational database, programmatic access is really not much of a
consideration, and for many reasons. The important ones I can think of
offhand include:

1. The data model should model the real-world of the business (to the extent
that it is relevant to the mission of the system the database exists in
support of).

2. A good relational database design (which by definition models a subset of
the real-world business) is, in practice, almost never directly known by the
client applications. Specifically, the client applications almost never know
(and arguably should not know) about the specific tables in the database.
Instead, the clients interact through abstractions provided by (1) the DAL
in the client application or some middle tier; (2) stored procedures; and
(3) views in the database (which, if you don't know are basically stored SQL
statements that typicaly JOIN many tables and manifest the result set as
"one denormalized table".

This isn't to say that performance of a database isn't of concern. It's
actually of huge concern and specific steps must be taken early and often in
order to ensure optimal database performance. The important steps to
ensuring optimal database performance include (not necessarily in order of
importance):

1. Having a proper normalized (through at least 3rd Normal Form) relational
design to start with (IMHO this simply cannot be achieved if the design is
driven by programmatic access instead of by the real-world business model).

2. Smart denormalization (not stupid denormalization) - typically for
reporting purposes. This denormalization can happen via tables or views...
tables for performance reasons, or views for "fewer moving parts" reasons.

3. Effective use of indexes

4. Exposing clients to data exclusively through stored procedures or views
as opposed to allowing clients to submit dynamic SQL (which could include
SELECT * and other such performance-killing things that a poorly written
client could do beyond the knowledge or control of the DBA).

Given all of the above, it should be pretty clear that "designing a
relational database via TDD" why elsewhere in this thread I pointed out that
trying to design a database via TDD is irresponsible. The specific reason it
is irresponsible and even dangerous is that it would lead to a relational
database design that literally models programmatic access as opposed to a
subset of the real-world business model.

BTW: This is all why MS Access is such a terrible product (beyond my
opinion) and IS departments in many big enterprises outright deny it's use
throughout the enterprise. MS Access lets non technical users bind the UI
directly to the underlying tables. This is the most extreme form of
"designing a database based on how the client application will use it."

I didn't mean to express offense, just that some of the language you were
using in the preceeding post suggested that you weren't actually
interested in differing opinions. Of course, that's just the nature of
human language, that some readers infer something that wasn't intended.

Fair enough. Given the above, you can see why I'd be perticularly sceptical
about the possibility of developing a DAL via TDD. With my enthusiasm about
TDD I "hit a wall" when I got to my DAL... thought if over and wondered what
other developers more experienced with TDD than I do when it comes to the
DAL. Thus the OP here.

Cheers...

-Cramer

Cramer · May 5, 2008

REsponses inline:

It does not make sense to design a database after how it will not
be used.
Obviously a database should be designed after how it will be used.

No - not at all. To the contrary, a database should be designed to model
the "real-world" entities and attributes of the business. The
program/software, DAL and all, then needs to figure out how to interact with
that real-world representation of the business model in the database. The
scope, itself, of what gets included in the data model and subsequently
built in the database, is determined by how the application will be used -
or more accurately the role the application will play in the business.

You are therefore correct to say that it does not make sense to design a
database after how it will not be used. That makes no sense at all and it's
amazing that you would conclude that that is what I was saying after stating
in multiple places in this thread that a database should model the real
world of the business (or a subset thereof).

What I'm saying is "not a good idea" is to build a database based on how it
will be used *by the client software* including the DAL. It's in fact a bad
idea.

Case in point: Say you need to provide a report that lists customers and
their phone numbers. TDD would have us build a single table, Customers, with
three columns: FirstName, LastName, and PhoneNumber. Then as requirements
evolve, we'd need to have two phone numbers. So you'd add a second phone
number to the Customers table. Your reports work, you go from "red to green"
in a matter of minutes, and in the TDD world all is well. But your data
model is suddenly violating 2nd Normal Form (if you don't know why that's a
problem, then you're in trouble... well, actually your employer is, but I
digress). Then you need a 3rd phone number and add 3rd column... you see how
this is going. Well, according to the TDD way of doing things (which, btw, I
really; like - just not for data modeling), we find that we need to refactor
("mercilessly, in fact). Now is when we start seeing why TDD is really; a
problemmatic way to go with designing a database. Because we didn't model
the real-world of the business up front, we're now breaking out our phone
numbers into a separate PhoneNumbers (or ContactNumbers) table. When we do
this, we break all of our queries that were based on one table being in
existance, and probably break all of the code that consumed the result set
of those queries. Yes, we *can* refactor this way, but it's probably the
most costly in terms of development time and effort. Now, contrast that TDD
approach to developing a data model with the approach that says we should
model the real world of the business. With that approach, we would meet with
subject matter experts early and have them identify the ways *they* need to
interact with their data, what data points are of significance to them
(within the mission of this particualar database). Before we write one line
of code or build one table or unit test, we know that we would need to track
multiple phone/contact numbers per customer - and therefore, starting on Day
One, we'd build separate Customers and ContactNumbers tables. No need to
refactor (which is TDD's calling card). From Day One, we have this database
normalized through at least 3rd Normal Form.

In short, yes, we *could* use TDD to design a datababase, but the
refactoring part is super expensive when we have to refactor database
structures... reason being is that when we break out tables (or combind
them) we break the queries that worked with the original table design. This
ripples directly to any stored procedures or application code that worked
with those queries.

The only fact of that is that you have a very narrow definition
of TDD and seem incapable of understanding it when it is explained to you.

Just because I don't agree with something doesn't mean that I'm not
understanding what is being said.

If you believe that a DAL *should* be developed with TDD (and not simply
that it's theoretically possible), please provide some explanation (which
means more than making statements). You can say that pigs fly. But until you
provide some compelling logic, then very few people are going to agree with
you.

If it's so incredibly obvious - as you seem to think - then you should be
able to provide an example... just like I did above (with the Customers and
Contacts scenario). Come on - if it's so incredibly simple, then you should
be able to do this without any hesitation or problem. Just one simple
example of how it's good or better to develop a data model via TDD than it
would be to develop it according to the real-world business model.

Wise men listen more than they talk.

Agreed - and did you take a moment to consider what I was referring to when
I wrote that? Or were you just talking before you listened?

mgsram · May 5, 2008

REsponses inline:

No - not at all. To the contrary, a database should be designed to model
the "real-world" entities and attributes of the business. The
program/software, DAL and all, then needs to figure out how to interact with
that real-world representation of the business model in the database. The
scope, itself, of what gets included in the data model and subsequently
built in the database, is determined by how the application will be used -
or more accurately the role the application will play in the business.

You are therefore correct to say that it does not make sense to design a
database after how it will not be used. That makes no sense at all and it's
amazing that you would conclude that that is what I was saying after stating
in multiple places in this thread that a database should model the real
world of the business (or a subset thereof).

What I'm saying is "not a good idea" is to build a database based on how it
will be used *by the client software* including the DAL. It's in fact a bad
idea.

Absolutely. However when you set out with TDD, the objective is not to
satisfy the *client software*. It is to satisfy the user requirements.
DAL or no DAL, the user cares less. Don't get me wrong here. I m not
saying DAL is not required. All i want to say is you need to design
your Database as well ad DAL in the best possible manner that
completely satisfies user requirements - no less, no more.

Case in point: Say you need to provide a report that lists customers and
their phone numbers. TDD would have us build a single table, Customers, with
three columns: FirstName, LastName, and PhoneNumber. Then as requirements
evolve, we'd need to have two phone numbers. So you'd add a second phone
number to the Customers table. Your reports work, you go from "red to green"
in a matter of minutes, and in the TDD world all is well. But your data
model is suddenly violating 2nd Normal Form (if you don't know why that's a
problem, then you're in trouble... well, actually your employer is, but I
digress). Then you need a 3rd phone number and add 3rd column... you see how
this is going. Well, according to the TDD way of doing things (which, btw,I
really; like - just not for data modeling), we find that we need to refactor
("mercilessly, in fact). Now is when we start seeing why TDD is really; a
problemmatic way to go with designing a database. Because we didn't model
the real-world of the business up front, we're now breaking out our phone
numbers into a separate PhoneNumbers (or ContactNumbers) table. When we do
this, we break all of our queries that were based on one table being in
existance, and probably break all of the code that consumed the result set
of those queries. Yes, we *can* refactor this way, but it's probably the
most costly in terms of development time and effort. Now, contrast that TDD
approach to developing a data model with the approach that says we should
model the real world of the business. With that approach, we would meet with
subject matter experts early and have them identify the ways *they* need to
interact with their data, what data points are of significance to them
(within the mission of this particualar database). Before we write one line
of code or build one table or unit test, we know that we would need to track
multiple phone/contact numbers per customer - and therefore, starting on Day
One, we'd build separate Customers and ContactNumbers tables. No need to
refactor (which is TDD's calling card). From Day One, we have this database
normalized through at least 3rd Normal Form.

What you have narrated above is a possible situation that you could be
in when you have not understood the requirements clearly. It has
nothing to do with TDD. TDD does not prescribe following the above
steps, if you know from the beginning that you need to have 3 phone
numbers. The "Subject Matter Experts" (SME) should be your customers
and they should be able to tell you what they need. If the SME
initially required only one phone number, I see nothing wrong in
having a single phone number column and moving ahead; and yes, if they
come back later and ask me to adding more phone numbers, i would
refactor as necessary ( Note: if none of the 3 phone numbers are
optional, then there is nothing wrong in having 3 columns).

In short, yes, we *could* use TDD to design a datababase, but the
refactoring part is super expensive when we have to refactor database
structures... reason being is that when we break out tables (or combind
them) we break the queries that worked with the original table design. This
ripples directly to any stored procedures or application code that worked
with those queries.

Yes, the refactored changes does ripple into stored procedures,
application code etc. But then your tests should go RED and to get to
GREEN you will have to fix them. This way atleast you are saved from
introducing breaking changes in the software.

Moreover, can you gaurantee that once your DB has been designed, you
will never have to change the table structure ever ? I suppose not.
When you design an application for users and when users use the
system, they would certainly feedback and some of their feedback can
drive changes that may involve DB layer as well. How would you deal
with these changes, if you did not have any tests to cover?

Just because I don't agree with something doesn't mean that I'm not
understanding what is being said.

If you believe that a DAL *should* be developed with TDD (and not simply
that it's theoretically possible), please provide some explanation (which
means more than making statements). You can say that pigs fly. But until you
provide some compelling logic, then very few people are going to agree with
you.

If it's so incredibly obvious - as you seem to think - then you should be
able to provide an example... just like I did above (with the Customers and
Contacts scenario). Come on - if it's so incredibly simple, then you should
be able to do this without any hesitation or problem. Just one simple
example of how it's good or better to develop a data model via TDD than it
would be to develop it according to the real-world business model.

I suggested doing a POC before. I m still open to it. Give me a
problem statement and lets change the requirements and see how it
fares. You have nothing to loose and perhaps there is something to
gain for all of us here.

Arne Vajhøj · May 6, 2008

Cramer said:
No - not at all. To the contrary, a database should be designed to model
the "real-world" entities and attributes of the business. The
program/software, DAL and all, then needs to figure out how to interact with
that real-world representation of the business model in the database. The
scope, itself, of what gets included in the data model and subsequently
built in the database, is determined by how the application will be used -
or more accurately the role the application will play in the business.

You are therefore correct to say that it does not make sense to design a
database after how it will not be used. That makes no sense at all and it's
amazing that you would conclude that that is what I was saying after stating
in multiple places in this thread that a database should model the real
world of the business (or a subset thereof).

What I'm saying is "not a good idea" is to build a database based on how it
will be used *by the client software* including the DAL. It's in fact a bad
idea.

Both databases and object model should be designed to model the real
world. Database is not special here.

It is impossible to model all aspect of the real world. You have to only
model the relevant parts.

Relevant parts are those that will be used.

Used means "what" the client apps will need.

I agree that "how" the client apps need them is a bad idea.

Case in point: Say you need to provide a report that lists customers and
their phone numbers. TDD would have us build a single table, Customers, with
three columns: FirstName, LastName, and PhoneNumber. Then as requirements
evolve, we'd need to have two phone numbers. So you'd add a second phone
number to the Customers table. Your reports work, you go from "red to green"
in a matter of minutes, and in the TDD world all is well. But your data
model is suddenly violating 2nd Normal Form (if you don't know why that's a
problem, then you're in trouble... well, actually your employer is, but I
digress). Then you need a 3rd phone number and add 3rd column... you see how
this is going. Well, according to the TDD way of doing things (which, btw, I
really; like - just not for data modeling), we find that we need to refactor
("mercilessly, in fact). Now is when we start seeing why TDD is really; a
problemmatic way to go with designing a database. Because we didn't model
the real-world of the business up front, we're now breaking out our phone
numbers into a separate PhoneNumbers (or ContactNumbers) table. When we do
this, we break all of our queries that were based on one table being in
existance, and probably break all of the code that consumed the result set
of those queries. Yes, we *can* refactor this way, but it's probably the
most costly in terms of development time and effort. Now, contrast that TDD
approach to developing a data model with the approach that says we should
model the real world of the business. With that approach, we would meet with
subject matter experts early and have them identify the ways *they* need to
interact with their data, what data points are of significance to them
(within the mission of this particualar database). Before we write one line
of code or build one table or unit test, we know that we would need to track
multiple phone/contact numbers per customer - and therefore, starting on Day
One, we'd build separate Customers and ContactNumbers tables. No need to
refactor (which is TDD's calling card). From Day One, we have this database
normalized through at least 3rd Normal Form.

You are using a straw man argument.

Noone will use TDD to design a database that way.

In short, yes, we *could* use TDD to design a datababase, but the
refactoring part is super expensive when we have to refactor database
structures... reason being is that when we break out tables (or combind
them) we break the queries that worked with the original table design. This
ripples directly to any stored procedures or application code that worked
with those queries.

Not really any different from object model. Making incompatible changes
also break code.

Just because I don't agree with something doesn't mean that I'm not
understanding what is being said.

You have been told plenty of times that your definition of TDD is too
narrow. But you just keep arguing based on your definition. If you
understand us, then you being very rude.

If you believe that a DAL *should* be developed with TDD (and not simply
that it's theoretically possible), please provide some explanation (which
means more than making statements). You can say that pigs fly. But until you
provide some compelling logic, then very few people are going to agree with
you.

If it's so incredibly obvious - as you seem to think - then you should be
able to provide an example... just like I did above (with the Customers and
Contacts scenario). Come on - if it's so incredibly simple, then you should
be able to do this without any hesitation or problem. Just one simple
example of how it's good or better to develop a data model via TDD than it
would be to develop it according to the real-world business model.

1) You write the tests.

I would say:
- good insert in customers
- good insert in contacts
- join with one contact for customer
- join with multiple contacts per customer
- insert for default value
- insert for range
- insert for constraints

2) You implement the database structure.

3) You run tests and verify that all tests pass. If not you fix
the structure.

Pretty simple.

Arne

Writing a DAL with TDD

Frans Bouma [C# MVP]

Arne Vajhøj

Ben Voigt [C++ MVP]

Cramer

mgsram

mgsram

Frans Bouma [C# MVP]

Ben Voigt [C++ MVP]

Cramer

Cramer

mgsram

Arne Vajhøj