Datasets vs. OOP

cody · May 10, 2004

I've seen an introduction to ADO.NET and its DataSets on .NET TV, and I am
now wondering how they are used in real-world applications.

I don't believe that one would create a DataSet and add relations to it and
so on for every table in the application; this would be a real mess, and it
has nothing to do with OOP.

Normally, one would create a class Customer, a class Invoices, Positions,
Articles and so on.
But how can this be realized using DataSets? DataSets and encapsulating data
+ hiding implementation in classes seem to be a contradiction to me.

Sorry for my stupidity, but I found no example explaining this.

Peter van der Goes · May 10, 2004

cody said:
I've seen an introduction to ADO.NET and its DataSets on .NET TV, and I am
now wondering how they are used in real-world applications.

I don't believe that one would create a DataSet and add relations to it and
so on for every table in the application; this would be a real mess, and it
has nothing to do with OOP.

Normally, one would create a class Customer, a class Invoices, Positions,
Articles and so on.
But how can this be realized using DataSets? DataSets and encapsulating data
+ hiding implementation in classes seem to be a contradiction to me.

Sorry for my stupidity, but I found no example explaining this.

Actually, as DataSet is a class in the ADO.NET hierarchy, it has everything
to do with OO.
You use one or more DataSet objects in your OO solution. How you choose to
have objects from your own class definitions interact with DataSet objects
is up to you.
For example, if you create an instance of your Customer class, the data members
of your object could be set from a given row in a table within a DataSet
object, if it made sense to do so in your scenario.
By using DataSet objects, you're taking advantage of inheritance and
encapsulation, two of the more prominent features of OO.

Peter [MVP Visual Developer]
Jack of all trades, master of none.

Alfredo · May 10, 2004

I don't believe that one would create a dataset and add relations to it and
so on for every table in the application

Tables are not in the application, they are in the DBMS.

, this would be a real mess and this
has nothing to do with OOP.

Data management has nothing to do with OOP.

Normally, one would create a class Customer, a class Invoices, Positions,
Articles and so on.

But that is a blunder. Customers, Invoices, Articles, etc. should be
tables, not classes. Tables and classes are radically different.

But how can this be realized using DataSets? DataSets and encapsulating data
+ hiding implementation in classes seem to be a contradiction to me.

Datasets are not for that. They are for presenting the tables to the
users.

Regards
Alfredo

Jay B. Harlow [MVP - Outlook] · May 10, 2004

Cody,
Martin Fowler's book "Patterns of Enterprise Application Architecture" from
Addison Wesley discusses when to use "Data Sets" and when to create a Domain
Model. http://www.martinfowler.com/books.html#eaa

Using either typed or untyped DataSets is a viable object-oriented
approach to creating solutions in .NET, in addition to the more
"traditional" Domain Objects (business objects) approach. Martin uses the
term Table Module pattern to refer to using a DataSet.

Martin's book discusses when you should consider one over the other, plus
various patterns to support an OO approach with either (DataSet or Domain
object) approach.

For example I would consider a Table Module & Table Data Gateway approach if
my "Domain objects" did not have any real logic to them. See
http://www.martinfowler.com/eaaCatalog/tableModule.html &
http://www.martinfowler.com/eaaCatalog/tableDataGateway.html patterns

However if my "Domain Objects" had heavy logic to them, then I would use a
Domain Model and Data Mapper approach. See:
http://www.martinfowler.com/eaaCatalog/domainModel.html &
http://www.martinfowler.com/eaaCatalog/dataMapper.html patterns

Note: My Data Mappers would use a DataReader to read a row from the
database to create a new Domain object, passing the information for the row to
the constructor of the Domain object.
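
A minimal sketch of that Data Mapper idea, written in Python with the standard sqlite3 module rather than ADO.NET, so the cursor only stands in for a DataReader. The `Customer` class, its `credit_limit` rule, the `customers` table and the `demo_connection` helper are hypothetical names made up for illustration; they do not come from Fowler's book or from this thread.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical domain object; a real one would carry more business logic."""
    customer_id: int
    name: str
    credit_limit: float

    def can_order(self, amount: float) -> bool:
        # Logic that belongs to the domain object, not to the table
        return amount <= self.credit_limit


class CustomerMapper:
    """Data Mapper: the only place that knows both the table and the class."""

    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def find(self, customer_id: int):
        # Read one row and pass its values to the domain object's constructor
        row = self._conn.execute(
            "SELECT id, name, credit_limit FROM customers WHERE id = ?",
            (customer_id,),
        ).fetchone()
        return Customer(*row) if row is not None else None


def demo_connection() -> sqlite3.Connection:
    # In-memory database with one made-up customer, for illustration only
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, credit_limit REAL)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Alice', 500.0)")
    return conn
```

The point of the split is that `Customer` never sees SQL and the mapper never makes business decisions; whether that separation is worth the extra class depends on how much logic the domain objects really have.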

Hope this helps
Jay

William Ryan eMVP · May 10, 2004

Hi Cody:

cody said:
I've seen an introduction to ADO.NET and its DataSets on .NET TV, and I am
now wondering how they are used in real-world applications.

I don't believe that one would create a DataSet and add relations to it and
so on for every table in the application; this would be a real mess, and it
has nothing to do with OOP.

I totally agree with the mess part. It may or may not have something to do
with the OOP part. An object doesn't really care how its properties get
set, as long as they follow the accessors' rules. I know that's not the point
you were making, but that's why I say it may or may not have something to do
with OOP.

Datasets are composed of datatables and datatables are Two dimensional
objects. With DataRelations that can change quite a bit. However, there
are more than a few structures out there that don't fit very comfortably
into the Relational model.

There's a philosophy known as OR/M (Object Relational Mapping) that tends to
bridge this gap, and if you look at ADO.NET 2.0, there's an object called an
ObjectSpace that addresses your concerns. OR/M tools basically handle the
mapping of your object properties to your database schema (this is an
oversimplification, but it doesn't change the point). Anyway, OR/M tools
sell b/c they do what is a real pain in the butt (see "messy") to do in many
instances.

But the problem is that any downside of DataSets is inherent in relational
databases. I agree that it's not always clean and, depending on your
requirements, you may end up shoving square pegs into round holes, but it's
the best we have.

Normally, one would create a class Customer, a class Invoices, Positions,
Articles and so on.
But how can this be realized using DataSets?

The devil is really in the details, and the problem here is that you may have
many-to-many relationships, which are a nightmare to handle with the current
model. The short answer is: create four tables and link them where you can. I
can think of implementations I've worked on where this would fit right in
with the dataset model, and just as easily think of ones that were ghastly.
It really depends on the implementation of each class, and I know that
sounds more like an excuse than an answer, but IMHO, it's a situation where
you have to take the good with the bad.

DataSets and encapsulating data
+ hiding implementation in classes seem to be a contradiction to me.

It can certainly be, but not necessarily so. Like I mentioned in the
beginning, it's not a contradiction in that the object doesn't care where
its data comes from. A dataset whose structure can mimic the object
structure is a very natural fit. But I'm not going to pretend that there
aren't a bunch of situations where the implementation is awkward.

Regrettably, OOP isn't a perfect model either and not everything can be
modeled by objects (why do I get the feeling I'm opening a can of worms with
this last statement ;-) ). Rather, everything can probably be modelled,
but not elegantly. Same with DataSet.

Sorry for my stupidity, but I found no example explaining this.

I think that the result is that you need to consider the tools you have at
your disposal. For years, joins were really costly in most RDBMS products
(and still are in many situations), but that has nothing to do with
relational theory. On the contrary. However, the pragmatic reality made the
theoretical application a 'contradiction'... should I normalize to normal
form X or not? So many people modelled their tables around performance
issues rather than theoretical correctness, even though the theory was
totally correct.

I suspect this is a similar, albeit non-identical, situation. I know this is a
bit abstract as an answer, but it's really hard to get more specific without
using specific instances. I'll gladly elaborate on specific examples, and I
can be a lot clearer then.

Cheers,

Bill

www.devbuzz.com
www.knowdotnet.com

Cor Ligthert · May 10, 2004

Hi Cody,

In addition to Bill Ryan,

I see this often in these dotNet newsgroups. It seems that OOP is only OOP
when you make your own collections using the object methods.

I think it is something more.

For the rest, I think most has been said in this thread.

Just my thought.

Cor

Frans Bouma [C# MVP] · May 10, 2004

Actually, as DataSet is a class in the ADO.NET hierarchy, it has everything
to do with OO.
You use one or more DataSet objects in your OO solution. How you choose to
have objects from your own class definitions interact with DataSet objects
is up to you.

I hate to spoil the party, Peter, but DataSets lack a lot in the OO area.
For example, the building blocks of datasets are deep down, the datarows and
the datacolumns. There is no public constructor for the datarow. The datarow
can only be created by a datatable. This is because the cells of a datarow
are described by column objects in the datatable. In other words, you can't
use datarow objects on their own, you need a datatable and datacolumns as
well.

You can subclass the datarow, for example to add a property, but this will
create a lot of mess: first you also have to subclass DataTable to produce
your own datarow classes. Now the fun part begins. The embedded functionality
inside the DataTable is not seeing your added code. For example when you bind
to the datatable, the embedded code will completely ignore your property and
it will not show up in the bound grid. Furthermore, serializing the Datatable
will completely skip the new property.

There is also no way you can extend this code, as the methods are made
private (ISerializable is implemented privately).

For example, if you create an instance of your Customer class, the data members
of your object could be set from a given row in a table within a DataSet
object, if it made sense to do so in your scenario.
By using DataSet objects, you're taking advantage of inheritance and
encapsulation, two of the more prominent features of OO.

I fail to see where you take advantage of inheritance when using a DataSet.
All the important technology in the dataset is hidden and can't be extended. As a
matter of fact, you can't even create serialization code in the subclass you
derive from DataTable when you use VB.NET, as you can't implement
ISerializable on the derived class because DataTable implements it privately
and VB.NET doesn't let you implement it again. You can in C#, by explicitly
implementing the interface; however, you can't call DataTable's base class
serialization code, which means that you have to do the serialization
completely yourself.

A datatable is a container/bucket for arrays of object references (datarows)
and it contains per cell index position of these arrays 1 datacolumn object
to describe the contents of all the cells at that index in all the arrays. A
DataSet is a container/bucket for datatables and defines relations between
them via objects. DataSets and friends, although perhaps useful, are a very
bad example for OOP.
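
A toy model of the structure described above may make the point easier to see. This is a sketch in Python under stated assumptions, not how ADO.NET is implemented: `Column`, `Row`, `Table`, `render` and the `Customer*` subclasses are hypothetical names invented here. It shows rows that only make sense through their table's column descriptors, and a property added in a row subclass that generic code walking the columns never sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Describes every cell at one index position, across all rows."""
    name: str
    type_: type


class Row:
    """Just a list of cell values; the meaning of each cell lives in the table."""

    def __init__(self, table, values):
        self._table = table
        self._values = list(values)

    def __getitem__(self, column_name):
        return self._values[self._table.index_of(column_name)]


class Table:
    row_class = Row  # a subclass must override this to get its own row type

    def __init__(self, *columns):
        self.columns = list(columns)
        self.rows = []

    def index_of(self, column_name):
        return [c.name for c in self.columns].index(column_name)

    def add_row(self, *values):
        # Rows are created through the table, which checks them against its columns
        if len(values) != len(self.columns):
            raise ValueError("row does not match the table's columns")
        for column, value in zip(self.columns, values):
            if not isinstance(value, column.type_):
                raise TypeError(f"{column.name} expects {column.type_.__name__}")
        row = self.row_class(self, values)
        self.rows.append(row)
        return row

    def render(self):
        # Generic code (think of grid binding) only walks the column descriptors
        return [{c.name: row[c.name] for c in self.columns} for row in self.rows]


class CustomerRow(Row):
    @property
    def display_name(self):
        # Extra property added by subclassing; not described by any Column
        return self["name"].upper()


class CustomerTable(Table):
    row_class = CustomerRow
```

In this sketch `render()` returns only the `id` and `name` cells for a `CustomerTable`; `display_name` works on the row object itself but is invisible to anything driven by the column list.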

FB

Frans Bouma [C# MVP] · May 10, 2004

William said:
cody said:

I've seen an introduction to ADO.NET and its DataSets on .NET TV, and I am
now wondering how they are used in real-world applications.

I don't believe that one would create a DataSet and add relations to it and
so on for every table in the application; this would be a real mess, and it
has nothing to do with OOP.

[...]
There's a philosophy known as OR/M (Object Relational Mapping) that tends to
bridge this gap, and if you look at ADO.NET 2.0, there's an object called an
ObjectSpace that addresses your concerns.

No, that's been removed from ADO.NET. It will be released separately.

The devil is really in the details and the problem here is that you may have
many-to-many relationships which are a nightmare to handle with the current
model. The short answer is create four tables, link them where you can.

m:n relations are definitions of a higher abstract form, like you find in
NIAM/ORM. In the relational model they do not exist; there you only have
FK constraints. You can semantically define relations between elements,
however you always require 1 or more elements:
1:1 relation can be: a PK - PK relation or a PK - FK(non PK)/UC relation
an m:n relation is always built with: a 1:n and an m:1 relation. Because you
can define an m:n relation and a 1:1 relation with different elements, you
can also simply specify these elements and use them separately. For 1:1
relations this can be a real pain, and it's not hard to specify 1:1 relations
with the same logic as 1:n relations are specified, because they too are
defined between 2 elements (attribute sets, living in separate tables or in
the same table).

m:n relations are different because they require an intermediate element
to which both ends of the m:n relation are related (order <- orderlines ->
product). The problem begins when you want to save 2 elements which are
related via an m:n relation. What to do with the intermediate element? Is it
created automatically, or do you have to add it as well? I find it KEY that the
intermediate element is visible and has to be saved as well. Because if
that's the case, the whole relational model inside a dataset or object model
is much simpler to understand and to work with, and code stays consistent.
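
The order <- orderlines -> product shape can be sketched as follows, using Python's standard sqlite3 module as a stand-in for a real DBMS. The table and column names (`orders`, `products`, `order_lines`, `quantity`) and the sample rows are assumptions made up for this illustration. The intermediate element is an ordinary, visible table that is saved explicitly, and the m:n link is traversed as a 1:n hop followed by an m:1 hop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce the FK constraints
conn.executescript("""
    CREATE TABLE orders   (order_id   INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    -- The intermediate element: two foreign keys plus an attribute of its own
    CREATE TABLE order_lines (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

conn.execute("INSERT INTO orders VALUES (1, 'Alice')")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)", [(10, "Widget"), (11, "Gadget")]
)
# Relating an order to a product means saving the intermediate row explicitly
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)", [(1, 10, 2), (1, 11, 1)]
)


def products_for_order(order_id):
    # order -> order_lines is the 1:n hop, order_lines -> products the m:1 hop
    return conn.execute(
        """SELECT p.name, ol.quantity
           FROM order_lines AS ol
           JOIN products AS p ON p.product_id = ol.product_id
           WHERE ol.order_id = ?
           ORDER BY p.product_id""",
        (order_id,),
    ).fetchall()
```

Because `order_lines` carries its own data (`quantity` here), hiding it behind an automatic m:n mapping would leave nowhere obvious to put that attribute, which is one reading of why the intermediate element should stay visible.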

DataSets and encapsulating data
+ hiding implementation in classes seem to be a contradiction to me.

It can certainly be, but not necessarily so. Like I mentioned in the
beginning, it's not a contradiction in that the object doesn't care where
its data comes from. A dataset whose structure can mimic the object
structure is a very natural fit. But I'm not going to pretend that there
aren't a bunch of situations where the implementation is awkward.

Regrettably, OOP isn't a perfect model either and not everything can be
modeled by objects (why do I get the feeling I'm opening a can of worms with
this last statement ;-) ). Rather, everything can probably be modelled,
but not elegantly. Same with DataSet.

I'm with you here. DataSet objects are good at what they are for: being a
container for other containers in which data is stored which is related
(inside the table ! and between tables) based on information stored inside
the container at runtime. This is very flexible, but doesn't work in OOP very
well, where you have fixed definitions and you extend these through
inheritance and polymorphism.

I think that the result is that you need to consider the tools you have at
your disposal. For years, joins were really costly in most RDBMS products
(and still are in many situations), but that has nothing to do with
relational theory.

Joins are very cheap btw, as they can be very well optimized. Subqueries on
the other hand ARE still very expensive.

FB

Cor Ligthert · May 10, 2004

Hallo Frans,

A lot of text; however, when I read this I have to think of the management
class, for which, in my view, much the same holds as what you write.

When it comes to DataSets, everybody seems to want to serialize them, which
leads to what is, in my view, an older concept of data processing.

My idea is that people want to keep the three-tier model, while I do not
see much need for it anymore (or, better said, it conflicts for me in
some cases).

However, I put this up for discussion, not because I am completely sure of it.

Before you misunderstand me: in the distant past I built applications with
many more than three tiers, because I then separated the datacom, the
database, the application and the security into separate tiers.

So go ahead, shoot.

Cor

Alfredo · May 10, 2004

Regrettably, OOP isn't a perfect model either and not everything can be
modeled by objects

OOP is not a data management model. The known data management models
are The Network Model (with its specialization: The Hierarchical
Model) and The Relational Model.

Unfortunately many OO practitioners use the obsolete Network Model.

On the contrary. However, the pragmatic reality made the
theoretical application a 'contradiction'... should I normalize to normal
form X or not? So many people modelled their tables around performance
issues rather than theoretical correctness, even though the theory was
totally correct.

The dilemma is due to the lack of data independence of the current
DBMS products.

Regards
Alfredo

Alfredo · May 10, 2004

No, that's been removed from ADO.NET. It will be released separately.

Does anyone know why?

Because MS wants to make money with it or because it is too
controversial?

I'm with you here. DataSet objects are good at what they are for: being a
container for other containers in which data is stored which is related
(inside the table ! and between tables) based on information stored inside
the container at runtime.

DataSet objects are intended to map an SQL database.

Joins are very cheap btw, as they can be very well optimized. Subqueries on
the other hand ARE still very expensive.

Subqueries can be as well optimized as joins.

Regards
Alfredo

Frans Bouma [C# MVP] · May 10, 2004

Cor said:
When it comes to DataSets, everybody seems to want to serialize them, which
leads to what is, in my view, an older concept of data processing.

Not necessarily. Serializing a dataset is performed for example when you
return it from a webservice (XmlSerializer) or when you return it from a
remoted method (Soap/Binary formatter or your own).

My idea is that people want to keep the three-tier model, while I do not
see much need for it anymore (or, better said, it conflicts for me in
some cases).

n-tier programming is about semantics. You can create a single assembly and
still have a 3-tier application, as long as you keep logical groups separated.

FB

Cor Ligthert · May 10, 2004

Hi Frans,

I forgot to mention: serializing and deserializing a DataSet is a piece of
cake when you use the StringReader and StringWriter (which are not used in the
documentation as far as I know). This is all.

Serialize
\\\
' Write the DataSet's data (no schema) to an in-memory XML string
Dim sw As New System.IO.StringWriter
ds.WriteXml(sw)
Dim mystring As String = sw.ToString()
///
Deserialize
\\\
' Read the XML back into a fresh DataSet; its schema is inferred from the data
Dim sr As New System.IO.StringReader(mystring)
Dim ds2 As New DataSet
ds2.ReadXml(sr)
///

Cor

Frans Bouma [C# MVP] · May 10, 2004

Alfredo said:
[...] there's an object called an ObjectSpace that addresses your concerns.

Does anyone know why?
Because MS wants to make money with it or because it is too
controversial?

It will be free, but I think they removed it for a couple of reasons:
1) integration of a SQL Server-only tool inside the .NET Framework is perhaps
not a wise thing to do, competition-wise. (Oracle, f.e., couldn't write its own
SQL engine / mapping engine for ObjectSpaces)
2) it is used in the Microsoft Business Framework (and Longhorn's WinFS), which
are both not part of the .NET Framework, so by releasing it separately, you
can focus on aspects required for those two techniques and not necessarily
have to build in stuff you don't need.

Again, my own opinion, no-one knows for sure.

DataSet objects are intended to map an SQL database.

Where did you get this information? Afaik, they are intended to store
resultsets.

Subqueries can be as well optimized as joins.

No, because you have to perform different data-algebra expressions with
subqueries than with joins. Check the execution plans on northwind for:

SELECT DISTINCT Customers.*
FROM Customers
    INNER JOIN Orders
        ON Customers.CustomerID = Orders.CustomerID
    INNER JOIN Employees
        ON Orders.EmployeeID = Employees.EmployeeID
WHERE Employees.Country = 'UK'

and:

SELECT *
FROM Customers
WHERE CustomerID IN
(
    SELECT CustomerID
    FROM Orders
    WHERE EmployeeID IN
    (
        SELECT EmployeeID
        FROM Employees
        WHERE Country = 'UK'
    )
)

Trace says: query 1 has a duration of 15 and 162 reads. Query 2 has a
duration of 16 and 194 reads. It is hard to optimize the outer query based on
the resultset of an inner query, because the statistics for that resultset
are not known at compile time of the query (the join statistics are; these
are kept with the table, how many rows there are f.e.). So the optimizer can
easily optimize a join path by first trying to weed out as many rows as
possible, starting with the join whose operand has the fewest rows.
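
For readers without Northwind at hand, the two query shapes can be compared on a toy schema. This is a sketch using Python's standard sqlite3 module, so it says nothing about SQL Server's optimizer and does not reproduce the durations or read counts quoted above. The three tables keep only the columns the queries touch, the rows are invented, and `run` and `plan` are hypothetical helper names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Country TEXT);
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT    REFERENCES Customers(CustomerID),
        EmployeeID INTEGER REFERENCES Employees(EmployeeID)
    );
    -- Invented sample rows, not real Northwind data
    INSERT INTO Employees VALUES (1, 'UK'), (2, 'USA');
    INSERT INTO Customers VALUES ('ALFKI', 'A'), ('BONAP', 'B'), ('CHOPS', 'C');
    INSERT INTO Orders VALUES
        (1, 'ALFKI', 1), (2, 'ALFKI', 1), (3, 'BONAP', 2), (4, 'CHOPS', 1);
""")

JOIN_QUERY = """
    SELECT DISTINCT Customers.*
    FROM Customers
        INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
        INNER JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
    WHERE Employees.Country = 'UK'
"""

SUBQUERY = """
    SELECT *
    FROM Customers
    WHERE CustomerID IN (
        SELECT CustomerID FROM Orders
        WHERE EmployeeID IN (
            SELECT EmployeeID FROM Employees WHERE Country = 'UK'
        )
    )
"""


def run(sql):
    # Sort so the comparison does not depend on the order rows come back in
    return sorted(conn.execute(sql).fetchall())


def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output is the human-readable step
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    print(run(JOIN_QUERY) == run(SUBQUERY))
    for step in plan(JOIN_QUERY) + plan(SUBQUERY):
        print(step)
```

Both queries return the same customers on this data; what differs, and what the disagreement in this thread is about, is the plan each engine chooses to get there, which `plan()` lets you inspect for SQLite only.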

FB

Peter van der Goes · May 10, 2004

Frans Bouma said:
I hate to spoil the party, Peter, but DataSets lack a lot in the OO area.

<snip>
You didn't "spoil the party".
I detected what I believed to be a question from someone who has some basic
misunderstandings about OO, and I was answering in that vein.

Alfredo · May 10, 2004

Again, my own opinion, no-one knows for sure.

Very informative, thanks.

Where did you get this information? Afaik, they are intended to store
resultsets.

The resultsets returned from an SQL database, to manage them and to
submit the changes to the DBMS. The same as I said, but in other
words.

I know they can be used with non-SQL-based data, but it is clear that
this is the main intention.

No, because you have to perform different data-algebra expressions with
subqueries than with joins. Check the execution plans on northwind for:

I said that they can be optimized as well as joins, not that SQL
Server optimizes them as well as joins ;-)

Trace says: query 1 has a duration of 15 and 162 reads. Query 2 has a
duration of 16 and 194 reads.

Query 2 could be transformed into Query 1 by the optimizer.

Regards
Alfredo

William Ryan eMVP · May 10, 2004

Frans Bouma said:
William said:
cody said:

I've seen an introduction to ADO.NET and its DataSets on .NET TV, and I am
now wondering how they are used in real-world applications.

I don't believe that one would create a DataSet and add relations to it and
so on for every table in the application; this would be a real mess, and it
has nothing to do with OOP.

[...]
There's a philosophy known as OR/M (Object Relational Mapping) that tends to
bridge this gap, and if you look at ADO.NET 2.0, there's an object called an
ObjectSpace that addresses your concerns.

No, that's been removed from ADO.NET. It will be released separately.

Nice to know, but I had quite a bit of info to the contrary, guess I need
to keep more up to date on it.

m:n relations are definitions of a higher abstract form, like you find in
NIAM/ORM. In the relational model they do not exist; there you only have
FK constraints. You can semantically define relations between elements,
however you always require 1 or more elements:
1:1 relation can be: a PK - PK relation or a PK - FK(non PK)/UC relation
an m:n relation is always built with: a 1:n and an m:1 relation. Because you
can define an m:n relation and a 1:1 relation with different elements, you
can also simply specify these elements and use them separately. For 1:1
relations this can be a real pain, and it's not hard to specify 1:1 relations
with the same logic as 1:n relations are specified, because they too are
defined between 2 elements (attribute sets, living in separate tables or in
the same table).

m:n relations are different because they require an intermediate element
to which both ends of the m:n relation are related (order <- orderlines ->
product). The problem begins when you want to save 2 elements which are
related via an m:n relation. What to do with the intermediate element? Is it
created automatically, or do you have to add it as well? I find it KEY that the
intermediate element is visible and has to be saved as well. Because if
that's the case, the whole relational model inside a dataset or object model
is much simpler to understand and to work with, and code stays consistent.

I'm with you here. DataSet objects are good at what they are for: being a
container for other containers in which data is stored which is related
(inside the table ! and between tables) based on information stored inside
the container at runtime. This is very flexible, but doesn't work in OOP very
well, where you have fixed definitions and you extend these through
inheritance and polymorphism.

Joins are very cheap btw, as they can be very well optimized. Subqueries on
the other hand ARE still very expensive.

They can be cheap if done correctly; they can also be 'expensive', although
that's still a relative term. The point is that a weak implementation of a
model doesn't mean the model is weak. For years, while I was too small to
read and before I was born, there was a fair amount of literature and
stereotyping claiming that the relational model was mud just b/c the vendor
implementation didn't do X well, for instance. Since I was sucking my thumb
at the time instead of using Oracle 1.0.1, I have to just look at what was
written, and many themes and misconceptions seem to have been fairly
commonplace. But this didn't have anything to do with the model's failure,
and that was ostensibly the analogy I was trying to make.

Alfredo · May 10, 2004

n-tier programming is about semantics.

N-tier programming is about physical tiers. You still have a logical
client and a logical server.

Regards
Alfredo

William Ryan eMVP · May 10, 2004

Alfredo said:
OOP is not a data management model. The known data management models
are The Network Model (with its specialization: The Hierarchical
Model) and The Relational Model.

Ok, cool. I don't think I was making that point at all, so sorry if it
somehow sounded like I was.

Unfortunately many OO practitioners use the obsolete Network Model.

There are a lot of other misapplications of OOP that are far more complex
than just this issue.

The dilemma is due to the lack of data independence of the current
DBMS products.

I'm not sure I understand what you mean here, but if your point is that
tightly coupling Data with the objects they represent is a problem, then yes
I agree.

Frans Bouma [C# MVP] · May 10, 2004

Cor said:
I forgot to mention: serializing and deserializing a DataSet is a piece of
cake when you use the StringReader and StringWriter (which are not used in the
documentation as far as I know). This is all.

Serialize
\\\
' Write the DataSet's data (no schema) to an in-memory XML string
Dim sw As New System.IO.StringWriter
ds.WriteXml(sw)
Dim mystring As String = sw.ToString()
///
Deserialize
\\\
' Read the XML back into a fresh DataSet; its schema is inferred from the data
Dim sr As New System.IO.StringReader(mystring)
Dim ds2 As New DataSet
ds2.ReadXml(sr)
///

I know, but when you subclass the DataTable class and add a property, it
will not be serialized into the data.

FB
