large SELECT with ORDER BY very slow (only in ado.net)

dror · Dec 27, 2004

Hello,

I have a problem that actually doesn't even make sense.
I have 4 million rows in my database. I want to get all records into a
DataReader and then read. So if I do it in DAO (either in VB or MFC C++)
it takes around 1 second to get the recordset and then around 10-15
seconds to iterate through the recordset. This speed remains the same even
if I incorporate an ORDER BY clause:
SELECT * FROM myTable ORDER BY customer_id, order_Date;
Without the ORDER BY statement i obtain the same speed in C# using
ado.net. I tested the four possibilities:
1. using OLE DB data provider with Ms Access.
2. using ODBC data provider with Ms Access.
3. using ODBC data provider with Sql Server.
4. using SqlClient data provider with Sql Server.

BUT, when I incorporated the ORDER BY clause each of the first three took
around 3 minutes!!! (the forth was around 30 seconds which is still very
very slow compared to the DAO (which is an older technology and depricated
in vs.net). NOTE: all the extra time is from the method
DbCommand.ExecuteReader() not from iterating through the dataReader.
It's not a question of reducing the size of data i retrieve or indexing
the relevant fields in the database as I want to achieve with ado.net the
same speed I get with DAO. In my experiment I use the same database, same
tables and corresponding objects (as a RecordSet in DAO is the same as a
DataReader in ado.net).
Another thing i noticed is that i use a dynaset in DAO (which no longer
exists in ado.net). Still need to find a solution for this problem.

Thanks,
Dror

Paul Clement · Dec 27, 2004

¤ Hello,
¤
¤ I have a problem that actually doesn't even make sense.
¤ I have 4 million rows in my database. I want to get all records into a
¤ DataReader and then read. So if I do it in DAO (either in VB or MFC C++)
¤ it takes around 1 second to get the recordset and then around 10-15
¤ seconds to iterate through the recordset. This speed remains the same even
¤ if I incorporate an ORDER BY clause:
¤ SELECT * FROM myTable ORDER BY customer_id, order_Date;
¤ Without the ORDER BY statement i obtain the same speed in C# using
¤ ado.net. I tested the four possibilities:
¤ 1. using OLE DB data provider with Ms Access.
¤ 2. using ODBC data provider with Ms Access.
¤ 3. using ODBC data provider with Sql Server.
¤ 4. using SqlClient data provider with Sql Server.
¤
¤ BUT, when I incorporated the ORDER BY clause each of the first three took
¤ around 3 minutes!!! (the forth was around 30 seconds which is still very
¤ very slow compared to the DAO (which is an older technology and depricated
¤ in vs.net). NOTE: all the extra time is from the method
¤ DbCommand.ExecuteReader() not from iterating through the dataReader.
¤ It's not a question of reducing the size of data i retrieve or indexing
¤ the relevant fields in the database as I want to achieve with ado.net the
¤ same speed I get with DAO. In my experiment I use the same database, same
¤ tables and corresponding objects (as a RecordSet in DAO is the same as a
¤ DataReader in ado.net).
¤ Another thing i noticed is that i use a dynaset in DAO (which no longer
¤ exists in ado.net). Still need to find a solution for this problem.

The problem is probably due to the fact that ADO.NET will attempt to return the *data* for all rows
(to the client) upon execution of the query, and then disconnect. A DAO Dynaset Recordset type, on
the other hand, will only return a small subset of the data (and all keys), remain connected and
retrieve additional data as it is requested through the Recordset object.

ADO.NET doesn't support dynamic result sets. The result sets returned are essentially disconnected.
The closest equivalent in DAO to an ADO.NET DataReader would be a Snapshot (forward-only).

Paul ~~~ (e-mail address removed)
Microsoft MVP (Visual Basic)

dror · Dec 27, 2004

Hi,
Thanks for the reply,
I've been experimenting with this for the last few days. I would like to
move to ado.net (DAO has been depricated). I've tried odbc (mssql &
access), oledb (mssql & access) and SqlClient providers and none reach the
performance of DAO's dynaset. I understand the reason for that, as you
explained, but I still need to find a solution as on one hand DAO is
depricated and on the other ado.net's performance is poor (even for
smaller sizes of queries DAO performs better, and not to mention the
performance of filling a DataSet which is even slower). What should I do
in this case? What API should I use? Btw, even legacy ADO performs slower
with dynamic recordset. Why is the idea behind dynaset being depricated,
is microsoft giving up on fast technology? What other API can be used that
runs fast?

Dror

Paul Clement · Dec 28, 2004

¤ Hi,
¤ Thanks for the reply,
¤ I've been experimenting with this for the last few days. I would like to
¤ move to ado.net (DAO has been depricated). I've tried odbc (mssql &
¤ access), oledb (mssql & access) and SqlClient providers and none reach the
¤ performance of DAO's dynaset. I understand the reason for that, as you
¤ explained, but I still need to find a solution as on one hand DAO is
¤ depricated and on the other ado.net's performance is poor (even for
¤ smaller sizes of queries DAO performs better, and not to mention the
¤ performance of filling a DataSet which is even slower). What should I do
¤ in this case? What API should I use? Btw, even legacy ADO performs slower
¤ with dynamic recordset. Why is the idea behind dynaset being depricated,
¤ is microsoft giving up on fast technology? What other API can be used that
¤ runs fast?

My suggestion would be to implement a paging mechanism similar to what you see
in web pages that display long lists of data. You can restrict your queries to
return only a certain number of rows at a time. If you're using SQL Server or
Access, the TOP keyword would do this. Whenever you need to display data beyond
the end of your current dataset, query the database and select all data where
the field you are ordering by is greater than the value of that field for the
last row of the current dataset. This is somewhat similar to how DAO works with
Dynasets, although all the work is performed under the covers.

With respect to your other questions, ADO is slower than DAO, primarily because
DAO talks directly to the Jet database engine while ADO operates through an
additional data access layer called OLEDB.

In addition, Microsoft has slowly moved away from the concept of persistent
connections and server-side cursors. With the movement toward web development
and thin clients the emphasis has been more toward the concept of data
manipulation in a disconnected environment, either at the server application
level or desktop client application level, significantly reducing the usage of
database resources. Connection resources only exist for the time in which an
update or retrieval of data is required. The middle-man, database cursors, have
been more or less rendered obsolete.

I don't think this is necessarily the correct approach as there are applications
that would significantly benefit from the features present in data access
mechanisms such as DAO.

Paul ~~~ (e-mail address removed)
Microsoft MVP (Visual Basic)

large SELECT with ORDER BY very slow (only in ado.net)

dror

Paul Clement

dror

Paul Clement