PC Review



DataSet Performance Issue

Ashishthaps
      2nd Aug 2005
We've heard reports that the DataSet object in the .NET 1.1 Framework tends
to slow down dramatically after it reaches a certain size, around 70 MB or so,
regardless of the amount of RAM installed on the machine. We haven't
experienced this ourselves yet, but can see the day when our DataSet will
comprise hundreds of megabytes of data.



Questions:



1. Is this a well-known issue?

2. Is this a general performance problem, or just a problem with certain
methods? The Merge method comes to mind, which our code uses heavily,
albeit indirectly.

3. Is there any way to work around this limitation?

4. Does the .NET 2.0 Framework address this issue?


 
Sahil Malik [MVP]
      3rd Aug 2005
If your DataSet will occupy hundreds of MB, seriously, that is... ugghh... it
makes me feel so pukish. Sorry, but just don't do that. A DataSet that big is
blasphemy; it's so awful that I am disgusted even to hear about it. Why do you
need such a big DataSet? That is an AWFUL way to use a DataSet.

In .NET 1.1, at that size, of course everything will croak. Yes, especially
GetChanges and Merge: both will be awful, especially if you have lots of
relations and lots of tables.
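To see why size hurts these two methods in particular, here is a minimal Python model. It is not ADO.NET's implementation, which the thread does not describe. In this model GetChanges has to scan every row for one whose state is not Unchanged, and Merge has to find each incoming row's counterpart by key. The state names mirror .NET's DataRowState values; the `Row` class, `get_changes` and `merge` are hypothetical. With a key index each lookup is cheap; without one (`use_index=False`), every incoming row can mean a full scan, so the work grows with changed rows times table size.

```python
from dataclasses import dataclass

# State names mirror .NET's DataRowState; everything else is a hypothetical model.
UNCHANGED, ADDED, MODIFIED, DELETED = "Unchanged", "Added", "Modified", "Deleted"


@dataclass
class Row:
    key: int
    values: dict
    state: str = UNCHANGED


def get_changes(rows):
    """Return the rows whose state is not Unchanged; this always scans the whole table."""
    return [r for r in rows if r.state != UNCHANGED]


def merge(target, incoming, use_index=True):
    """Fold incoming rows into target by key and return how many rows were examined."""
    examined = 0
    # Building the key index is itself a separate O(n) pass over the target.
    index = {r.key: r for r in target} if use_index else None
    for inc in incoming:
        match = None
        if use_index:
            examined += 1                      # one hash lookup
            match = index.get(inc.key)
        else:
            for r in target:                   # linear search, one row at a time
                examined += 1
                if r.key == inc.key:
                    match = r
                    break
        if match is None:                      # unknown key: append a copy
            new_row = Row(inc.key, dict(inc.values), inc.state)
            target.append(new_row)
            if use_index:
                index[inc.key] = new_row
        else:                                  # known key: take the incoming values and state
            match.values.update(inc.values)
            match.state = inc.state
    return examined
```

For one modified row and one added row against a 1,000-row table, the unindexed merge examines about a thousand rows while the indexed one does two lookups; multiply that by many tables and relations and the slowdown described above follows.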

The workaround is: "A DATASET IS NOT A DATABASE, DON'T ABUSE IT AS ONE."

.NET 2.0 has many enhancements under the hood, in the various underlying
collections, in the GetChanges algorithm and elsewhere, that make using a
DataSet a lot better. But 70 MB? 100 MB in a DataSet (an in-memory,
disconnected cache)? That is SUPER DUPER BUMPER ULTRA AWFUL.

BTW, you in particular need to read Chapters 9 and 10 of my book to
understand, clearly and practically, why I am advocating so strongly against
such misuse/ABUSE of a DataSet.

- Sahil Malik [MVP]
ADO.NET 2.0 book -
http://codebetter.com/blogs/sahil.ma.../13/63199.aspx
----------------------------------------------------------------------------
Adrian Moore
      3rd Aug 2005
Sahil,

There's nothing wrong with having a large in-memory database. In a real-time
environment there are lots of cases where the database fits in memory and
performance rules. The SCADA system I work with has a 400 MB in-memory
database; data is flushed to disk once an hour or on demand.

In-memory databases are also typically used in the embedded market, since
I/O to flash memory is usually slow.

I see DataSets as a good solution to some problems.

I see other open-source databases like Firebird, SharpSQL and SQLite as
good solutions when a small-footprint RDBMS is needed. They are simple to
install and set up.

MSDE / SQL Server Express is still overkill for many database needs, but
does provide a great solution to problems that match its capabilities.

I look forward to reading your book when it's available.
Ad.
Sahil Malik [MVP]
      3rd Aug 2005
Adrian,

Call it personal opinion, but I am not a big fan of Prevayler*.* concepts; I
feel standard RDBMSs will evolve to provide that functionality. But what I am
certainly not a fan of is storing 400 MB in a DataSet: that object is just not
designed for such a large amount of in-memory data.

Cor Ligthert [MVP]
      3rd Aug 2005
Adrian,

I agree with you that there can be many reasons to have a large in-memory
database, for instance a single-user database in situations where speed is
very important.

However, that is not what a DataSet is built for. A DataSet is built for
multipurpose use in multi-user, disconnected situations, and therefore has
all kinds of features.

For instance, in a single-user database it is of no importance whether a row
has changed: you copy the previous version during processing and then dump
the whole database over the data stored offline.

Therefore, in this performance issue, let us not mix up apples with computers.

Cor


 
Nigel Norris
      3rd Aug 2005

"Sahil Malik [MVP]" wrote:

> But what I am certainly not a fan of is storing 400 MB in a DataSet - that
> object is just not designed for such a heavy amount of in memory data.
>

Sahil,

You make that assertion; can you elaborate on why you believe it? I
don't see anything in the documentation, or in the services that DataSets
provide, that would make me believe they are not designed to handle
large amounts of data (given the inherent constraints of memory, garbage
collecting very large heaps, etc.).

Now, it appears that the V1.1 implementation has performance problems in
some areas with large tables, but at least some of these are fixed in V2.0.
So Microsoft is making efforts to ensure that large volumes of data are
supported. See the following article for some information:

http://msdn.microsoft.com/library/de...setenhance.asp

While I would agree that the main use and focus for DataSets is *selective*
caching of data, I can certainly envisage situations where I might want to
hold quite a large extract of a table in a dataset (where connectivity is
not always available, for instance).

---------
Nigel Norris



 
Adrian Moore
      3rd Aug 2005
Sahil,

I expect DataSets will also continue to evolve to meet the feature and
performance needs that developers are demanding.

Ad.
Ashishthaps
      3rd Aug 2005
I have found answers to my queries and would like to share them with you.


Question
=======
“We've heard reports that the DataSet object in the .NET 1.1 Framework tends
to slow down dramatically after it reaches a certain size, around 70 MB or so,
regardless of the amount of RAM installed on the machine. We haven't
experienced this ourselves yet, but can see the day when our DataSet will
comprise hundreds of megabytes of data.”



Please find answers to your queries below
============================

1. Is this a well-known issue?


Yes, for v1.x it’s a well-known issue.



2. Is this a general performance problem, or just a problem with certain
methods? The Merge method comes to mind, which our code uses heavily,
albeit indirectly.



For v1.x, this is a general performance problem. It cuts across all write
actions, i.e. all methods that modify DataRows, including Merge.

The problem is mitigated to some extent for bulk-load operations, which
include Merge, DataAdapter.Fill and DataTable.AcceptChanges(). The impact is
most visible for singleton random inserts, deletes and modifications.
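The answer doesn't say why random singleton writes are the worst case. One plausible model, offered only as an assumption (the thread does not describe the v1.x internal data structures): if an index is kept as a sorted array, every row inserted at a random position forces the entries after it to shift, roughly n²/4 moves for n inserts, whereas a bulk load can append everything and sort once. The two functions below are hypothetical and simply count those moves.

```python
import bisect
import random


def insert_one_at_a_time(keys):
    """Maintain a sorted index row by row; count how many entries shift to make room."""
    index, shifts = [], 0
    for k in keys:
        pos = bisect.bisect_left(index, k)  # finding the slot is cheap: O(log n)
        shifts += len(index) - pos          # ...but every entry after it moves: O(n)
        index.insert(pos, k)
    return index, shifts


def bulk_load(keys):
    """Append everything first, then build the index with a single O(n log n) sort."""
    return sorted(keys)


if __name__ == "__main__":
    rng = random.Random(42)
    keys = rng.sample(range(10_000_000), 20_000)
    index, shifts = insert_one_at_a_time(keys)
    assert index == bulk_load(keys)
    print(f"{len(keys)} random inserts shifted {shifts:,} entries "
          f"(n^2/4 would be {len(keys) ** 2 // 4:,})")
```

Keys that arrive already in key order are appended at the end and shift nothing, which in this model is one reason ordered bulk loads fare better than scattered singleton writes.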



3. Is there any way to work around this limitation?



Ways to work around this limitation:

1. Try to minimize constraints on columns. Constraints include
DataColumn.AllowDBNull, foreign keys, primary keys, unique keys, etc.

2. Try to minimize the use of relations, as they involve building indexes
on the participating columns of the tables.
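Both workarounds come down to doing less index work per row. A related technique is to keep the constraints but defer them: load first, then build the index and validate once. ADO.NET exposes this as DataTable.BeginLoadData/EndLoadData, which its documentation describes as turning off notifications, index maintenance and constraints while loading. The Python below is only a hypothetical model of that trade-off under the same sorted-index assumption; the function names are invented.

```python
import bisect


def load_with_live_constraint(rows, key):
    """Enforce a unique key on every row via a sorted index updated per row."""
    table, index = [], []
    for row in rows:
        k = row[key]
        pos = bisect.bisect_left(index, k)
        if pos < len(index) and index[pos] == k:
            raise ValueError(f"duplicate key {k!r}")  # fails fast, at the offending row
        index.insert(pos, k)  # in this model each insert shifts later entries: O(n)
        table.append(row)
    return table, index


def load_then_constrain(rows, key):
    """Append rows unchecked, then build the index once and validate it."""
    table = list(rows)                          # load phase: no checks at all
    index = sorted(row[key] for row in table)   # one O(n log n) index build
    for a, b in zip(index, index[1:]):          # after sorting, duplicates are adjacent
        if a == b:
            raise ValueError(f"duplicate key {a!r}")  # reported only after the load
    return table, index
```

The trade-off: the live check rejects a bad row immediately, while the deferred check reports duplicates only once the whole load has finished.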



4. Does the .NET 2.0 Framework address this issue?

Yes, out of the box, with no changes required in user code. The benefits
cut across the whole DataSet.



To give you some perspective on the change: 70 MB is at most 500K rows. In
v2.0 you can fill 1 million rows, inserted randomly with a primary key, in
around 30 seconds; inserting the same 1 million rows in v1.x takes 30
minutes.
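The figures in that answer imply the following, assuming decimal megabytes:

```latex
\frac{70\ \text{MB}}{500{,}000\ \text{rows}}
  = \frac{70 \times 10^{6}\ \text{bytes}}{5 \times 10^{5}\ \text{rows}}
  = 140\ \text{bytes per row, at minimum}

\frac{t_{\text{v1.x}}}{t_{\text{v2.0}}}
  = \frac{30\ \text{min}}{30\ \text{s}}
  = \frac{1800\ \text{s}}{30\ \text{s}}
  = 60
```

So the 1 million rows in the v2.0 figure amount to at least 140 MB, and the quoted timings correspond to a 60-fold speedup for that workload.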


Thanks for your support
Sahil Malik [MVP]
      3rd Aug 2005
Managed code, at least in the near future, will not be able to outperform
native unmanaged code. I seriously doubt you will ever substitute a DataSet
for SQL Server. EVER.

Sahil Malik [MVP]
      3rd Aug 2005
Nigel,

It is true that .NET 2.0's DataSet has been greatly improved internally to
handle larger quantities of data, but what drove that is people's desire to
abuse it as a database. It shouldn't be interpreted as a nod from Microsoft
that a DataSet is okay to abuse as a database. (I am not Microsoft, mind
you.)

There are a number of reasons for this.

1. First of all, a DataSet is, guess what, managed code, and (to send all the
.NET lovers into a tailspin) managed code can never be as fast and as
optimized as native code. That is reasonable to expect, and I am certainly
not saying that unmanaged code is better and you should drop managed code for
good. NO WAY: there are a lot of benefits to using managed code. But when it
comes to raw performance, managed code sucks.

2. Secondly, the garbage collector is the animal that makes things very,
very good in 90% of situations, i.e. normal memory usage. But when you start
storing many megabytes, or close to a gigabyte, of information completely in
RAM, it will actually hurt your application's performance. In those
scenarios you don't want an external policeman who doesn't understand the
specific needs of your app; you want fine control over the memory, where you
specify when it gets cleaned up, serialized to disk, etc. You need paging
mechanisms and the like, which are possible to write for a DataSet but are a
royal pain to write, and even then they don't work quite as well as, guess
what, native code (i.e. most of SQL Server).

3. SQL Server, like any database, comes with a "query engine". The
optimizations built into it are the work of many PhDs (or dudes with similar
smarts and specialization). They have written SQL Server's query engine to
take advantage of automatic paging, locking algorithms, spilling over to
disk when needed, query plans, caching those query plans, and so on. Compared
with that, the object model of a DataSet (or any business object, for that
matter) is like a candle next to the sun.

4. The algorithms in a DataSet are rudimentary; they rely on simple
techniques such as string matching and string manipulation, that level of
simplicity. They work on an object structure, so every value they access goes
through a dereferenced segment calculation. SQL Server and any standard
database are optimized well beyond that level (not that I have looked at
their code, but they have to be written that way). Maybe in certain
instances a DataSet is smarter than I make it out to be here, but a DataSet
is an in-memory object; it's that simple. It's just an "object". It's worse
than MS Access when it comes to managing data for you. (Don't get me wrong, I
still love DataSets; I just don't think they replace databases.) Even MS
Access will manage 2 GB, while a DataSet doesn't even have a clearly defined
upper limit. But of course MS Access isn't an in-memory, XML-convertible,
serializable cache that lets you extract and merge changes, so comparing a
DataSet with Access is like comparing apples and oranges (but hey, the whole
argument is that a DataSet is an orange (an in-memory cache), not an apple
(a database)).

5. Let's not forget transactional locks and many other such points; I
blogged about them earlier here:
http://codebetter.com/blogs/sahil.ma.../23/47547.aspx

6. DataSets, or any such object, are an IN-MEMORY disconnected cache of
data. Being completely in memory exposes them to the 2 GB per-process memory
limit of a 32-bit OS. There are ways around that, but I personally see them
as band-aids rather than a true solution. Secondly, being disconnected leaves
you with a WORLD of problems to solve when trying to persist relational data
back into the database. If you think DataAdapter.Update(DataSet) will save
your entire DataSet into the database and also take care of concurrency
issues and transactional deadlocks, you are sorely mistaken. A simple
three-table hierarchy will require pages and pages of code to save properly
into the database in every scenario; it is NOT a trivial task. Then you have
to worry about not sending too much over the wire in web-service-like
environments, hence Merge/GetChanges. And let's not forget keeping your
disconnected cache fresh: how do you resolve deleted rows? .NET 1.1 leaves
you with very few choices, and .NET 2.0 has a new IndexOf method which
doesn't work in every circumstance.
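The ordering part of the three-table problem in point 6 can be sketched in a few lines. This is a hypothetical Python model, not DataAdapter code: deletes must run child table first and inserts/updates parent table first, or the database's foreign keys reject the statements. The table names are invented and the state strings mirror .NET's DataRowState values. It deliberately leaves out the hard parts listed above (concurrency checks, server-generated keys that must be pushed down to child rows, transactions), which is where the pages of code go.

```python
# Tables listed parent first: each table's rows reference rows in the one before it.
HIERARCHY = ["Customers", "Orders", "OrderLines"]  # hypothetical schema


def plan_save(changes, hierarchy=HIERARCHY):
    """Order pending changes so that foreign keys hold after every statement.

    changes: {table_name: [(row_state, row_id), ...]}
    returns: [(table_name, row_state, row_id), ...] in execution order
    """
    plan = []
    # Pass 1: deletes, children before parents, so no child points at a removed parent.
    for table in reversed(hierarchy):
        plan += [(table, s, rid) for s, rid in changes.get(table, []) if s == "Deleted"]
    # Pass 2: inserts and updates, parents before children, so referenced rows exist.
    for table in hierarchy:
        plan += [(table, s, rid) for s, rid in changes.get(table, [])
                 if s in ("Added", "Modified")]
    return plan
```

Even this ordering is only the first step; each planned statement still needs its own command, concurrency check and error handling.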

Now, of course, you could argue that #6 proves you should simply not use a
database at all and only use a disconnected cache, i.e. Prevayler etc. For
that argument, re-read items #1 through #5.

In short, while DataSets will continue to improve, or let's say continue to
try to be as good as a full-fledged database, they will NEVER reach that
point. Even as DataSets improve (who knows, you may even be able to run a SQL
query against them directly), you could always store a tiny database on a
RAM disk, or in a memory-mapped file, in SQL Server, and get pretty much what
you need, including the heavy-duty research that has already gone into
making "a database".
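As a small illustration of that alternative, Python's built-in sqlite3 module can keep a complete database in RAM and still give you a query planner. SQLite is a stand-in chosen for brevity, not the SQL Server on a RAM disk setup described above, and the table, index and data are made up. `EXPLAIN QUERY PLAN` is SQLite's own command; its wording varies between SQLite versions, so the sketch only looks for the index name in the plan text.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # the whole database lives in RAM
db.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, tag TEXT, value REAL)")
db.execute("CREATE INDEX idx_readings_tag ON readings (tag)")
db.executemany(
    "INSERT INTO readings (tag, value) VALUES (?, ?)",
    ((f"sensor{i % 500}", i * 0.1) for i in range(50_000)),
)
db.commit()

query = "SELECT COUNT(*), AVG(value) FROM readings WHERE tag = ?"
count, avg = db.execute(query, ("sensor42",)).fetchone()

# Ask the engine how it intends to run the query; the last column holds the plan text.
plan = " ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + query, ("sensor42",)))
print(count, avg, plan)
```

The point is not SQLite specifically: the engine, not your code, decides to use the index, which is the kind of work a DataSet leaves to you.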

Again, I strongly and vehemently disagree with an architecture that puts 1
GB of data into a DataSet. That is complete stupidity in both .NET 1.1 and 2.0.

Whew... this was a long reply. Gotta go!!
