Discussion: Typed DataSets vs Untyped DataSets

  • Thread starter: Przemo

Hi,

I wonder about the performance comparison between these two
types of DataSets. Are the benefits of typed DataSets enough
to beat the speed of untyped ones?
Or doesn't it matter?
What do you think?

Przemo
 
Przemo,

The typed dataset is an untyped dataset that is encapsulated in another class.
Why would there be a speed difference?

Cor
 
Przemo,
Using a typed dataset will probably be faster than using an untyped dataset.

The reason is that the typed dataset uses a DataColumn when indexing into
the DataRow.Item property, whereas most untyped dataset code uses either an
Integer or the field name when indexing into the DataRow.Item property.
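To make the three overloads concrete, here is a small sketch that reads the same cell through DataRow.Item(Integer), DataRow.Item(String) and DataRow.Item(DataColumn). The table, the "value" column, the stored value and the helper name ReadAllThreeWays are made up for illustration. All three calls return the same value; they differ only in how the column is identified.

```vbnet
Imports System
Imports System.Data

Module IndexingStyles
    ' Reads the same cell three different ways and returns the three results.
    Function ReadAllThreeWays() As Decimal()
        ' Hypothetical one-column table used only for this illustration.
        Dim table As New DataTable("Example")
        Dim valueColumn As DataColumn = table.Columns.Add("value", GetType(Decimal))
        table.Rows.Add(New Object() {10D})
        Dim row As DataRow = table.Rows(0)

        ' 1. By ordinal position: DataRow.Item(Integer)
        Dim byOrdinal As Decimal = CDec(row(0))
        ' 2. By column name: DataRow.Item(String)
        Dim byName As Decimal = CDec(row("value"))
        ' 3. By column object: DataRow.Item(DataColumn), the form typed DataSet code uses
        Dim byColumn As Decimal = CDec(row(valueColumn))

        Return New Decimal() {byOrdinal, byName, byColumn}
    End Function

    Sub Main()
        Dim results As Decimal() = ReadAllThreeWays()
        Console.WriteLine("{0} {1} {2}", results(0), results(1), results(2))
    End Sub
End Module
```

In untyped code, the third form only pays off if the DataColumn reference is obtained once (for example before a loop) and reused, rather than looked up by name on every access.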

David Sceppa's book "Microsoft ADO.NET - Core Reference" from MS Press
explains why & how the typed dataset is faster; he also explains why & how
you can use the same techniques in your own untyped datasets.

I recommend purchasing Sceppa's book as it is a good tutorial on ADO.NET
as well as a good desk reference once you know ADO.NET.

Hope this helps
Jay
 
Jay,
> Using a typed dataset will probably be faster than using an untyped dataset.
Why, when I use exactly the same code that is used in a typed dataset to
access my untyped dataset directly?

I could understand your statement if you said: "a typed dataset is mostly
made with more of an eye for efficiency". However, when it is as in my first
sentence, I don't see it.

Cor
 
Cor,
Have you read Sceppa's book? Do you have Sceppa's book? I would strongly
recommend you purchase & read Sceppa's book!

Read chapter 9 "Working with strongly Typed DataSet objects", specifically
pages 391-392 "Run-Time Benefits". As of the writing of the edition I have,
Sceppa found typed DataSets to be almost twice as fast as untyped DataSets.
What the actual difference is now I'm not sure, as in the edition I have his
statement was based on a beta version of .NET.

As I explained in my post, typed DataSets use DataRow.Item(DataColumn),
which is faster than either DataRow.Item(String) or DataRow.Item(Integer). I
suspect the latter two are implemented in terms of the first one, which would
explain the difference.

Hope this helps
Jay
 
Cor,
Seeing as Sceppa's statement was based on a beta I put together the
following test, hopefully I did it right ;-)


Imports System
Imports System.Data
Imports System.Diagnostics

Public Class DataSetTiming

    ' Every test method has the same signature so RunTest can time any of them.
    Private Delegate Sub Test(ByVal row As DataRow, ByVal column As DataColumn)

    ' Win32 high-resolution timer. Both functions return a BOOL, which is a
    ' 32-bit value, so Integer is the matching return type.
    Declare Function QueryPerformanceCounter Lib "Kernel32" (ByRef X As Long) As Integer
    Declare Function QueryPerformanceFrequency Lib "Kernel32" (ByRef X As Long) As Integer

    Public Shared Sub Main()
        Dim table As DataTable = CreateTable()
        PopulateTable(table)
        For index As Integer = 1 To 10
            Debug.WriteLine(index, "test")
            Debug.Indent()
            RunTest("Integer", AddressOf IntegerIndex, table)
            RunTest("String", AddressOf StringIndex, table)
            RunTest("Column", AddressOf ColumnIndex, table)
            Debug.Unindent()
            Debug.WriteLine(Nothing)
        Next
    End Sub

    ' Builds a three-column table; "id" is a negative auto-increment key.
    Private Shared Function CreateTable() As DataTable
        Dim table As New DataTable("Test")
        table.Columns.Add("id", GetType(Integer))
        table.Columns.Add("name", GetType(String))
        table.Columns.Add("value", GetType(Decimal))
        With table.Columns("id")
            .AutoIncrement = True
            .AutoIncrementSeed = -1
            .AutoIncrementStep = -1
        End With
        Return table
    End Function

    ' Adds 100,001 rows; passing Nothing lets the auto-increment column supply the id.
    Private Shared Sub PopulateTable(ByVal table As DataTable)
        Dim rand As New Random
        For index As Integer = 0 To 100000
            table.Rows.Add(New Object() {Nothing, String.Format("V{0}", index), rand.Next(1, 1000)})
        Next
    End Sub

    ' Times one pass of the given test over every row and writes the elapsed seconds.
    Private Shared Sub RunTest(ByVal category As String, ByVal test As Test, ByVal table As DataTable)
        Dim start, finish, frequency As Long
        QueryPerformanceCounter(start)
        Dim column As DataColumn = table.Columns("value")
        For Each row As DataRow In table.Rows
            test(row, column)
        Next
        QueryPerformanceCounter(finish)
        QueryPerformanceFrequency(frequency)
        Debug.WriteLine((finish - start) / frequency, category)
    End Sub

    ' Index by ordinal position: DataRow.Item(Integer).
    Private Shared Sub IntegerIndex(ByVal row As DataRow, ByVal column As DataColumn)
        row(2) = DirectCast(row(2), Decimal) * 1.1D
    End Sub

    ' Index by column name: DataRow.Item(String).
    Private Shared Sub StringIndex(ByVal row As DataRow, ByVal column As DataColumn)
        row("value") = DirectCast(row("value"), Decimal) * 1.1D
    End Sub

    ' Index by DataColumn object: DataRow.Item(DataColumn).
    Private Shared Sub ColumnIndex(ByVal row As DataRow, ByVal column As DataColumn)
        row(column) = DirectCast(row(column), Decimal) * 1.1D
    End Sub

End Class

Running 10 sets, I get the following (elapsed seconds for one pass over the 100,001 rows):

test: 1
Integer: 1.3314144674812
String: 1.29888407604877
Column: 1.06492529078416

test: 2
Integer: 1.10235853998204
String: 1.32747067015501
Column: 1.0537758290509

test: 3
Integer: 1.0954998470476
String: 1.33955460819741
Column: 1.06967896757828

test: 4
Integer: 1.12571150802686
String: 1.33948337009313
Column: 1.05996264888415

test: 5
Integer: 1.15852098520901
String: 1.32627694301929
Column: 1.066469621139

test: 6
Integer: 1.15292222894251
String: 1.36355626203889
Column: 1.1006843048488

test: 7
Integer: 1.14156352273823
String: 1.33735544601339
Column: 1.08128267698828

test: 8
Integer: 1.14533830416994
String: 1.33989096379568
Column: 1.08176709609741

test: 9
Integer: 1.13144491827872
String: 1.38126968651044
Column: 1.11549261149113

test: 10
Integer: 1.17510465715615
String: 1.35280657178496
Column: 1.10689766436796

This shows that indexing a DataRow by DataColumn is quicker than by Integer,
though not by much. Indexing a DataRow by String is clearly slower...

Hope this helps
Jay
 
Jay,

My point is that everything you create in a typed dataset you can do with an
untyped dataset. If you tell me that this statement of mine is wrong, then I
can agree with you even without a single piece of code.

Another question: is that QueryPerformanceCounter more accurate than
Environment.TickCount? (I could look it up, but if you say so, that is enough
for me in this case.)

It is, by the way, an interesting piece of code you made; I will take some
time today to evaluate it completely :-)

Thanks,

Cor
 
Cor,
Please read my original post again! Read the entire post, specifically the
second paragraph!

<quote>
The reason being is the typed dataset uses a DataColumn when indexing into
the DataRow.Item property. Most untyped dataset code will use either an
Integer or the field name when indexing into the DataRow.Item property.
</quote>

Notice that I state why the typed dataset is faster (which you are free to
use) and why most untyped code is slower (which again you are free not to
use).

Again Sceppa explains how to use the same techniques as the typed dataset!

> Another question: is that QueryPerformanceCounter more accurate than
> Environment.TickCount? (I could look it up, but if you say so, that is enough
> for me in this case.)

http://support.microsoft.com/default.aspx?scid=kb;en-us;306978

Environment.TickCount is "the amount of time in milliseconds that has passed
since the last time the computer was started". My understanding is that
Environment.TickCount calls the Win32 GetTickCount API.

QueryPerformanceCounter is at the same resolution as the Performance Counter
classes, which is hardware dependent but far finer than TickCount's
milliseconds (often well under a microsecond).

There is also DateTime (DateTime.Ticks) which is "the number of
100-nanosecond intervals that have elapsed..."

VS.NET 2005 (aka Whidbey, due out in 2005) will have a Stopwatch class that
appears to use either QueryPerformanceCounter or DateTime.
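For comparison, the QueryPerformanceCounter timing in the test code above could be expressed with System.Diagnostics.Stopwatch once it is available. This is a minimal sketch, assuming a runtime where Stopwatch, the Action delegate and multi-line lambdas are available (VB 2010 or later); TimeSeconds is a hypothetical helper name and the summing loop is only a placeholder workload standing in for the DataRow loop.

```vbnet
Imports System
Imports System.Diagnostics

Module StopwatchTiming
    ' Times an arbitrary piece of work and returns the elapsed seconds.
    Function TimeSeconds(ByVal work As Action) As Double
        Dim watch As Stopwatch = Stopwatch.StartNew()
        work()
        watch.Stop()
        ' ElapsedTicks are counted in units of Stopwatch.Frequency (ticks per
        ' second), not DateTime's 100-nanosecond ticks, so divide by Frequency.
        Return watch.ElapsedTicks / Stopwatch.Frequency
    End Function

    Sub Main()
        ' True when Stopwatch is backed by the high-resolution performance counter.
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution)
        Console.WriteLine("Frequency: {0} ticks/second", Stopwatch.Frequency)

        ' Placeholder workload.
        Dim total As Long = 0
        Dim seconds As Double = TimeSeconds(Sub()
                                                For i As Integer = 1 To 1000000
                                                    total += i
                                                Next
                                            End Sub)
        Console.WriteLine("Elapsed: {0} s (total = {1})", seconds, total)
    End Sub
End Module
```

Checking Stopwatch.IsHighResolution first tells you whether the numbers come from the performance counter or from the lower-resolution DateTime fallback.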

http://lab.msdn.microsoft.com/libra...cpref/html/T_System_Diagnostics_Stopwatch.asp

Hope this helps
Jay
 
Jay,
> Please read my original post again! Read the entire post, specifically the
> second paragraph!

I always read your posts completely when we are discussing something, and
usually more than once. Everything I am asking is based on that second paragraph.

My whole point is that you write "mostly uses......". However, we in this
newsgroup are ourselves the ones who decide what we "mostly use".

So it is not the typed dataset that is faster; it is the way we write it
that makes it faster. As far as I know the DataSet is not inheritable, which
means that we cannot optimise methods in it.

When I (and I assume other people) read your text, it looks as if there
is a technical reason why a typed dataset is faster, and in my opinion there
is not.

I am going deeper into this not because you stated it, but because I
thought I had seen it stated more than once in the ADONET newsgroup, by some
people (not the way you do it) and without any explanation why, that a typed
dataset works faster than an untyped one.

And in my opinion that cannot be true when both use the same technique.

To put it more in your words: when you use the techniques David Sceppa
explains, then in my opinion there can be no difference. So I am looking for
where in the above statements I am making a mistake.

> Environment.TickCount is "the amount of time in milliseconds that has passed
> since the last time the computer was started". My understanding is that
> Environment.TickCount calls the Win32 GetTickCount API.

Because of your words "My understanding" I will probably check it with a
little test this week. (In my opinion, your statement about
Environment.TickCount is true.)

When I do, I will show you the result; this should be easy to test.

Cor
 
Jay,

I tested that timer, and next time I will definitely use the method you used;
in my opinion it is not important here that it is not managed code.

About the column, I evaluated your sample.
http://msdn.microsoft.com/library/d...html/frlrfsystemdatadatarowclassitemtopic.asp

I think that your test is correct as it is.

Probably, with the column, all the information about the object can be given
directly.
With the integer index, the column information first has to be found by
getting DataRow.Table.Columns.Item(index).
With the string it is the same, but first the name probably has to be
translated by looping through the DataRow.Table.Columns collection.

However, the above is just a guess, but I think it is something like that.

Cor
 
Cor,
> My whole point is that you write "mostly uses......". However, we in this
> newsgroup are ourselves the ones who decide what we "mostly use".
When you write DataSet code, do you use a DataColumn to index into the
DataRow? Have you seen any code that does?

Do you see where my assertion of "Most untyped dataset code" is true?

> So it is not the typed dataset that is faster; it is the way we write it
> that makes it faster.
Is the glass half full or half empty? In other words, it's a matter of
perspective.

The typed dataset is faster, as most users do not use DataColumns to index
into DataRows.

> As far as I know the DataSet is not inheritable, which
> means that we cannot optimise methods in it.
DataSet IS inheritable, as are DataTable & DataRow!

How do you think a typed Dataset is made?

Public Class TypedDataSet
Inherits DataSet

End Class
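To illustrate that DataTable and DataRow can be inherited as well, here is a minimal hand-written sketch, loosely modelled on the pattern the typed DataSet generator uses: the table keeps its DataColumn in a field, and the row exposes a typed property that indexes with that column. The class names ProductTable, ProductRow, TypedRowDemo, the AddProductRow helper and the "value" column are hypothetical; real generated code contains considerably more than this.

```vbnet
Imports System
Imports System.Data

' Hypothetical table class; NOT the generated code itself.
Public Class ProductTable
    Inherits DataTable

    ' The DataColumn is looked up once and kept in a field.
    Private ReadOnly _valueColumn As DataColumn

    Public Sub New()
        MyBase.New("Product")
        _valueColumn = Columns.Add("value", GetType(Decimal))
    End Sub

    Friend ReadOnly Property ValueColumn() As DataColumn
        Get
            Return _valueColumn
        End Get
    End Property

    ' Tell DataTable which DataRow subclass its rows should be.
    Protected Overrides Function GetRowType() As Type
        Return GetType(ProductRow)
    End Function

    Protected Overrides Function NewRowFromBuilder(ByVal builder As DataRowBuilder) As DataRow
        Return New ProductRow(builder)
    End Function

    ' Typed helper that creates, fills and adds a row.
    Public Function AddProductRow(ByVal amount As Decimal) As ProductRow
        Dim row As ProductRow = DirectCast(NewRow(), ProductRow)
        row.Value = amount
        Rows.Add(row)
        Return row
    End Function
End Class

Public Class ProductRow
    Inherits DataRow

    Private ReadOnly _table As ProductTable

    Friend Sub New(ByVal builder As DataRowBuilder)
        MyBase.New(builder)
        _table = DirectCast(Table, ProductTable)
    End Sub

    ' Typed property: indexes with the cached DataColumn, i.e.
    ' DataRow.Item(DataColumn), rather than by name or ordinal.
    Public Property Value() As Decimal
        Get
            Return DirectCast(Me(_table.ValueColumn), Decimal)
        End Get
        Set(ByVal amount As Decimal)
            Me(_table.ValueColumn) = amount
        End Set
    End Property
End Class

Module TypedRowDemo
    Sub Main()
        Dim products As New ProductTable()
        Dim row As ProductRow = products.AddProductRow(10D)
        ' Same update as the ColumnIndex test, but through the typed property.
        row.Value = row.Value * 1.1D
        Console.WriteLine(row.Value)
    End Sub
End Module
```

The speed difference discussed in this thread comes from the DataRow.Item(DataColumn) call inside the property, not from the inheritance itself; untyped code that keeps DataColumn references around gets the same benefit.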
> in the ADONET newsgroup that a typed dataset works
> faster than an untyped one.
> And in my opinion that cannot be true when both use the same technique.

Open the source to a typed DataSet, the .vb file itself; it is very
different from how you or I (and most other programmers) would code an
untyped DataSet!

Hope this helps
Jay
 
Jay,
> When you write DataSet code, do you use a DataColumn to index into the
> DataRow? Have you seen any code that does?

All my future code will. I see this as a flaw in the MSDN documentation; in my
opinion it should not be necessary to buy David Sceppa's book to learn
this.

However, now that we have discussed it, we will probably see it more often
in this newsgroup as well.

:-)

"The DataSet is not inheritable" was a stupid statement of mine; just opening
one generated typed dataset shows that.

Cor
 
