Is DataSet efficient for manipulating data?

Guest

Hi all,
Our company is developing a large-scale Web application with .NET, and our designers cannot agree on how to implement the business entities. I thought I would post the case here in the hope of getting some opinions from the experts. Our application is a typical n-tier design with an ASP.NET UI, a business logic layer, and a data access layer. The final system will have a web farm of about 10 machines, another ten machines as "application servers", and Oracle database servers. Access from the UI to the business layer goes through .NET Remoting.
Basically, we have agreed to use custom business entity classes instead of pure DataSets, because a business entity contains a lot of business logic, and to use custom collections to hold collections of business entities. However, there is some argument about the physical implementation of the entity class, or more specifically, about how the data is represented in the implementation. One group favours the standard OO approach, where we create classes with attributes to hold the values. Another group wants to use DataSet and DataTable as the underlying data structure to maintain relationships between entities. A business entity class then has properties that get and set values straight from a DataColumn in a DataRow. So at runtime, each entity object holds a reference to a particular DataRow object that is in a DataTable object, which is in a DataSet object. The code looks something like this:
----------
Case 1: Typical OO approach

public class Organisation
{
    public string Name
    {
        get { return name; }
        set { name = value; }
    }

    // other properties
    private string name; // the value lives in the object itself
}

Case 2: DataSet/DataTable approach

using System.Data;

public class Person
{
    public string Name
    {
        get { return (string)row["name"]; }
        set { row["name"] = value; }
    }

    // other properties
    private DataRow row; // the row in a DataTable that backs this entity
}
---------------
Major issues are:
1. Sometimes we need to display data in a DataGrid and allow sorting on different columns. We haven't been able to do the sorting when binding to an ArrayList (if there is a way around this, please advise). But there is an argument that the business layer design should be independent of the UI: we should come up with a mechanism that solves the formatting problem in the UI rather than change the application design to suit the UI.
2. System performance, particularly for database updates, since the use cases require us to hold a major entity, its sub-entities, and their sub-sub-entities all in memory until the user explicitly says "save to database", which could be a very time-consuming operation.
The Microsoft consultant we have contracted suggests that we should optimise database operations. But this is the part that really confuses me. When we discussed the issue with DataSet, he said DataSet is more efficient in terms of database operations (updating etc.), and that DataSet is the "standard Microsoft practice" in application development. My understanding is that DataSet is very convenient for dealing with records and binding to controls, but less efficient. Also, accessing a DataColumn to retrieve and store a value definitely involves many more operations than writing straight to a variable. Considering that we have a lot of business logic dealing with individual entities, there is bound to be a lot of access to the columns, which could be expensive. I haven't found any source that says DataSet is more efficient. I would very much like to hear your opinion on this.

Thanks
 
Christian Boult

Use a typed DataSet.
You would then access data in the DataTable this way:
dsMyTypedDataSet.Employee[0].EmployeeName
DataSets offer very easy-to-use data-to-XML and XML-to-data transformations,
which can be very useful.
You can create your own flavor of typed DataSet if the one provided by MS
doesn't fit your needs. But the underlying data storage mechanism for your
entity should be a DataSet: all of ADO.NET works with DataSets, so you will
not have to convert from your business entity representation to a DataSet
in order to pass it on to a DataAdapter and the like.
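
To make the point about DataAdapter concrete, here is a minimal sketch of an entity backed by a DataRow. It assumes a one-column "Person" table; the names PersonEntity, ChangeTrackingSketch, BuildPeople and CountPendingRows are made up for illustration and are not part of ADO.NET. Writing through the entity's property marks the row as modified, so the DataTable already knows which rows would have to be sent to the database.

```csharp
using System;
using System.Data;

// Hypothetical entity that keeps its state in a DataRow (Case 2 in the question).
public class PersonEntity
{
    private readonly DataRow row;

    public PersonEntity(DataRow row) { this.row = row; }

    public string Name
    {
        get { return (string)row["name"]; }
        set { row["name"] = value; } // flips the row's RowState to Modified
    }
}

public static class ChangeTrackingSketch
{
    // Builds a small in-memory table and marks its rows as unchanged,
    // which is the state they would be in after being loaded from a database.
    public static DataTable BuildPeople()
    {
        DataTable table = new DataTable("Person");
        table.Columns.Add("name", typeof(string));
        table.Rows.Add("Alice");
        table.Rows.Add("Bob");
        table.AcceptChanges();
        return table;
    }

    // Counts the rows that have been added, modified or deleted since the
    // last AcceptChanges. GetChanges returns null when nothing is pending.
    public static int CountPendingRows(DataTable table)
    {
        DataTable changes = table.GetChanges();
        return changes == null ? 0 : changes.Rows.Count;
    }
}
```

After `new PersonEntity(table.Rows[0]).Name = "Alicia";` only that one row is reported as pending, and a DataAdapter's Update method works from the same row states. With plain objects you would have to track that "dirty" state yourself.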

One place you win by using custom business objects is in the serialization
needed to pass data through your tiers. DataSet serialization is always
textual, not binary, even if you use a binary formatter. It is also rather
memory intensive: if your DataSet is quite big you will hit out-of-memory
problems, though there are ways around that.
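
To see what the textual form looks like, here is a small sketch that round-trips a DataSet through XML using WriteXml and ReadXml. The "Company" DataSet, the "Employee" table and the class name DataSetXmlSketch are invented for the example.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetXmlSketch
{
    // Builds a tiny DataSet with one table and one row (sample data only).
    public static DataSet BuildSample()
    {
        DataSet ds = new DataSet("Company");
        DataTable employees = ds.Tables.Add("Employee");
        employees.Columns.Add("EmployeeName", typeof(string));
        employees.Rows.Add("Alice");
        return ds;
    }

    // Data to XML: writes the schema and the rows as text.
    public static string ToXml(DataSet ds)
    {
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.WriteSchema);
        return writer.ToString();
    }

    // XML to data: rebuilds a DataSet from the text produced by ToXml.
    public static DataSet FromXml(string xml)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(xml), XmlReadMode.ReadSchema);
        return ds;
    }
}
```

Printing `ToXml(BuildSample())` shows how much markup surrounds a single value, which gives a feel for the size and memory cost once there are thousands of rows. Later versions of the framework (2.0 and up) added a DataSet.RemotingFormat property that can be set to SerializationFormat.Binary, so it is worth measuring on the version you actually deploy.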

Chris.


 
Yuancai (Charlie) Ye

Hi,
See my comments below; I hope they are useful to you. Overall, I favor your
approach for better performance: use your own class objects and pass
them across machines with custom serialization.

--
Yuancai (Charlie) Ye



CY said:
Hi all,
Our company is developing a large-scale Web application with .NET, and
our designers cannot agree on how to implement the business entities. I
thought I would post the case here in the hope of getting some opinions from
the experts. Our application is a typical n-tier design with an ASP.NET UI,
a business logic layer, and a data access layer. The final system will have
a web farm of about 10 machines, another ten machines as "application
servers", and Oracle database servers. Access from the UI to the business
layer goes through .NET Remoting.

It looks like your project requires high performance and scalability.
Basically, we have agreed to use custom business entity classes instead of
pure DataSets, because a business entity contains a lot of business logic,
and to use custom collections to hold collections of business entities.
However, there is some argument about the physical implementation of the
entity class, or more specifically, about how the data is represented in the
implementation. One group favours the standard OO approach, where we create
classes with attributes to hold the values. Another group wants to use
DataSet and DataTable as the underlying data structure to maintain
relationships between entities. A business entity class then has properties
that get and set values straight from a DataColumn in a DataRow. So at
runtime, each entity object holds a reference to a particular DataRow object
that is in a DataTable object, which is in a DataSet object.

Using ADO.NET objects makes the logic clear. However, I don't like these
objects when performance matters. I would never expect passing a DataSet or
DataTable across machines to be efficient, even with .NET 2 in the
future. You may want to read the two articles at
http://www.udaparts.com/articles/fastsocketpro.htm and
http://www.udaparts.com/articles/sendobjects.htm. They take a very different
approach from .NET Remoting, but you may get some useful information from them.
Using your own objects plus your own serialization will be much faster,
although it takes a bit more work.
Major issues are:
1. Sometimes we need to display data in a DataGrid and allow sorting on
different columns. We haven't been able to do the sorting when binding to an
ArrayList (if there is a way around this, please advise). But there is an
argument that the business layer design should be independent of the UI: we
should come up with a mechanism that solves the formatting problem in the UI
rather than change the application design to suit the UI.

I solved the sorting problem by reconstructing a DataSet object on the IIS
machine. However, creating a DataSet/DataTable object is considerably
expensive. You can try the sample at
http://www.udaparts.com/articles/iisuskt.htm to see how much overhead
reconstructing a DataSet object adds.
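
If rebuilding a DataSet only for sorting turns out to be too costly, another option may be to sort the ArrayList itself with a custom IComparer before binding it. A rough sketch follows; OrgEntity, OrgNameComparer and SortSketch are illustrative names I made up, and you would need one comparer (or one parameterised comparer) per sortable column.

```csharp
using System;
using System.Collections;

// Plain entity in the style of Case 1 (illustrative, not the poster's class).
public class OrgEntity
{
    private string name;

    public OrgEntity(string name) { this.name = name; }

    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}

// Orders OrgEntity objects by Name using ordinal string comparison.
public class OrgNameComparer : IComparer
{
    public int Compare(object x, object y)
    {
        return string.Compare(((OrgEntity)x).Name, ((OrgEntity)y).Name,
                              StringComparison.Ordinal);
    }
}

public static class SortSketch
{
    // Returns a sorted copy so the caller's original list keeps its order.
    public static ArrayList SortedByName(ArrayList entities)
    {
        ArrayList copy = new ArrayList(entities);
        copy.Sort(new OrgNameComparer());
        return copy;
    }
}
```

The sorted copy can then be assigned as the grid's data source in the handler for the grid's sort event, which keeps the sorting concern in the UI layer and leaves the business entities unchanged.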
2. System performance, particularly for database updates, since the use
cases require us to hold a major entity, its sub-entities, and their
sub-sub-entities all in memory until the user explicitly says "save to
database", which could be a very time-consuming operation.
The Microsoft consultant we have contracted suggests that we should
optimise database operations. But this is the part that really confuses me.
When we discussed the issue with DataSet, he said DataSet is more efficient
in terms of database operations (updating etc.), and that DataSet is the
"standard Microsoft practice" in application development. My understanding
is that DataSet is very convenient for dealing with records and binding to
controls, but less efficient. Also, accessing a DataColumn to retrieve and
store a value definitely involves many more operations than writing straight
to a variable. Considering that we have a lot of business logic dealing with
individual entities, there is bound to be a lot of access to the columns,
which could be expensive. I haven't found any source that says DataSet is
more efficient. I would very much like to hear your opinion on this.

If all of your DataTable objects are small, with no more than 100 records,
DataSet and DataTable may be fine. Once there are more than 500 records,
passing a DataSet or DataTable is not efficient at all. And if you end up
writing your own mechanism to serialize a DataSet or DataTable, why not use
your own classes?
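
As an illustration of what "your own classes with your own serialization" could look like, here is a minimal sketch using BinaryWriter and BinaryReader. The OrganisationRecord type, its two fields and the length-prefixed layout are assumptions made for the example; they are not taken from the original poster's system or from the articles linked above.

```csharp
using System;
using System.IO;

// Hypothetical entity with a hand-written compact binary format.
public class OrganisationRecord
{
    public string Name;        // must not be null when written
    public int EmployeeCount;

    public void WriteTo(BinaryWriter writer)
    {
        writer.Write(Name);          // length-prefixed string
        writer.Write(EmployeeCount); // 4 bytes
    }

    public static OrganisationRecord ReadFrom(BinaryReader reader)
    {
        OrganisationRecord record = new OrganisationRecord();
        record.Name = reader.ReadString();
        record.EmployeeCount = reader.ReadInt32();
        return record;
    }
}

public static class CustomSerializationSketch
{
    // Layout: record count, then each record in order.
    public static byte[] Serialize(OrganisationRecord[] records)
    {
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(records.Length);
            foreach (OrganisationRecord record in records)
            {
                record.WriteTo(writer);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static OrganisationRecord[] Deserialize(byte[] data)
    {
        using (MemoryStream stream = new MemoryStream(data))
        using (BinaryReader reader = new BinaryReader(stream))
        {
            OrganisationRecord[] records = new OrganisationRecord[reader.ReadInt32()];
            for (int i = 0; i < records.Length; i++)
            {
                records[i] = OrganisationRecord.ReadFrom(reader);
            }
            return records;
        }
    }
}
```

The resulting byte array carries only the values and a few length prefixes, with no column names or schema per row. The trade-off is the extra work mentioned above: every field has to be written and read by hand, in the same order, and any change to the class means updating both sides.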
 
