Removing rows from a DataTable is VERY slow


Markus Hjärne

Thanks for the reference,

but I couldn't see anything there about increased Remove performance. I
don't have any index on the DataTable in my test code, so I'm not sure it
will make any difference with the re-written indexing engine. And also,
another post in this thread mentions that Remove is still slow in .NET 2
beta.

Regards,

Markus


Deepak said:
Hi Markus,

This is the source of my information on DataTable Indexing engine for
ADO.NET 2.0

http://msdn.microsoft.com/data/default.aspx?pull=/library/en-us/dnadonet/html/datasetenhance.asp

Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:
Thanks for your reply,

can you tell me what's your source of information when you say that the
indexing engine has been re-written in ADO.NET 2.0?

Regards,

Markus

Deepak said:
The indexing engine for DataTable in ADO.NET 1.x is not very optimized,
which is why you are facing this issue.
It has been re-written in ADO.NET 2.0, and will do these operations much
faster. I believe there is nothing much you can do about this. Storing your
data in multiple data tables will boost performance, but you will have to
access it based on your requirements.

--
Regards,

Deepak
[I Code, therefore I am]

I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests show that it
takes a very long time to remove the rows. Running the code below on a
Pentium 2.4 GHz under Windows XP reports that it takes 187 ms to add
100,000 rows, but 32 s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong, or an alternative, faster way to
do this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(new DataColumn("Col1", typeof(int)));

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
    table.Rows.Add(new object[] { i });
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + " ms.");

// Create a list of rows to remove (half of those in the table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count / 2);
for (int i = 0; i < table.Rows.Count; i++)
{
    if ((i % 2) == 0)
        rowsToRemove.Add(table.Rows[i]);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
    table.Rows.Remove((DataRow)rowsToRemove[i]);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + " ms.");

Grateful for any help on this,

Markus Hjarne
 

Markus Hjärne

Hi everybody in this thread,

I think I owe you all, especially those of you who think I should load less
data into my DataTables, a somewhat better explanation of my design thoughts
and what I'm trying to do. I also want to say that it would be interesting
to hear what Kevin Yu or someone else at Microsoft has to say about the
problem. I have a feeling you are out there. :)

The problem arises precisely because I try NOT to keep more data in memory
than necessary. Since the database I'm using contains more data than can be
held in memory at one time, the design of the application has to handle that
in some way. The application will also have a rather specialized GUI with
tree views and specialized grids where the user can (and will) navigate
around the objects in many different, and sometimes sophisticated, ways. The
user will also be able to create, modify and delete most of the objects
through the GUI.

So my design idea was that when reading rows from a database table I would
store the rows in a DataTable and also store each DataRow in a business
object which could be used throughout the application. Whenever a request is
made to the database, I need to check if there already exists a
corresponding business object for each returned row, because then I want to
reuse that object to ensure that only one business object exists per row in
the database. This was easily accomplished by using a Hashtable with the
DataRow as the key and the corresponding object as the value.

So far so good, but how should I prevent the memory from being exhausted as
the user moves around in the GUI causing more and more rows to be loaded and
business objects to be created? I could manually unload unchanged
objects/rows explicitly, for example when the user collapses a branch in the
tree view. But that would require that I remembered to unload the objects at
all those places and it also felt a little 'undotnetish'.

Then I got the idea to use a WeakReference for the object in the Hashtable
and simply let the garbage collector handle the deallocation of objects when
they aren't used anywhere in the application any longer. But I still have to
remove the deallocated objects' corresponding rows from the DataTable, so I
have a Purge method that must be called periodically to handle this (or
maybe I'll find some way to handle this in a separate thread).

So the bottom line is that with the GC being as unpredictable as it is,
there comes occasions when there are quite a few rows I need to remove from
the DataTable at the same time, and this is where it is very unsatisfactory
that this operation is so slow.

Thanks for all the responses so far, but I still nourish the hope that
somebody has a smart idea of how to solve this, or at least that Microsoft
gives a hint that the performance problem will be looked into in .NET 2.0.

Regards,

Markus Hjarne
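For concreteness, the Hashtable-of-WeakReferences scheme described above might look something like the sketch below. This is a rough illustration of the idea only, not the actual application code; `BusinessObjectCache`, `GetOrCreate` and `Purge` are names I made up.

```csharp
using System;
using System.Collections;
using System.Data;

// Maps each DataRow to a weakly-held business object, so the GC may
// reclaim objects the rest of the application no longer uses.
class BusinessObjectCache
{
    private Hashtable map = new Hashtable();   // DataRow -> WeakReference

    // Return the existing business object for this row, or create one,
    // guaranteeing at most one live object per database row.
    public object GetOrCreate(DataRow row)
    {
        WeakReference wr = (WeakReference)map[row];
        object obj = (wr != null) ? wr.Target : null;
        if (obj == null)
        {
            obj = new object();   // stand-in for a real business object
            map[row] = new WeakReference(obj);
        }
        return obj;
    }

    // Periodically drop rows whose business objects have been collected.
    // This is where the thread's problem bites: each Rows.Remove call
    // is linear in the table size under .NET 1.x.
    public void Purge(DataTable table)
    {
        ArrayList dead = new ArrayList();
        foreach (DictionaryEntry e in map)
        {
            if (!((WeakReference)e.Value).IsAlive)
                dead.Add((DataRow)e.Key);
        }
        foreach (DataRow row in dead)
        {
            map.Remove(row);
            table.Rows.Remove(row);
        }
    }
}
```

Since DataRow does not override GetHashCode, the Hashtable keys on reference identity, which is exactly what "one business object per row" needs.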
 

Guest

It is mentioned in the article. I will quote from the article itself:


New Indexing Engine
The indexing engine for the DataTable has been completely rewritten in
ADO.NET 2.0 and scales much better for large datasets. This results in
faster basic inserts, updates, and deletes, and therefore faster Fill and
Merge operations.

Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:
Thanks for the reference,

but I couldn't see anything there about increased Remove performance. I
don't have any index on the DataTable in my test code, so I'm not sure it
will make any difference with the re-written indexing engine. And also,
another post in this thread mentions that Remove is still slow in .NET 2
beta.

Regards,

Markus


Deepak said:
Hi Markus,

This is the source of my information on DataTable Indexing engine for
ADO.NET 2.0

http://msdn.microsoft.com/data/default.aspx?pull=/library/en-us/dnadonet/html/datasetenhance.asp

Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:
Thanks for your reply,

can you tell me what's your source of information when you say that the
indexing engine has been re-written in ADO.NET 2.0?

Regards,

Markus

"Deepak" <[email protected]> skrev i meddelandet
Indexing engine for DataTable in ADO.NET 1.x is not very optimized, thus
you
facing this issue.
It has been re-written in ADO.NET 2.0, and will do these operations much
faster. I believe there is nothing much you can do about this. Storing
your
data in multiple data tables will boost performance, but you will have to
access this based on your requirements.

--
Regards,

Deepak
[I Code, therefore I am]

I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests show that it
takes a very long time to remove the rows. Running the code below on a
Pentium 2.4 GHz under Windows XP reports that it takes 187 ms to add
100,000 rows, but 32 s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong, or an alternative, faster way to
do this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(new DataColumn("Col1", typeof(int)));

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
    table.Rows.Add(new object[] { i });
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + " ms.");

// Create a list of rows to remove (half of those in the table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count / 2);
for (int i = 0; i < table.Rows.Count; i++)
{
    if ((i % 2) == 0)
        rowsToRemove.Add(table.Rows[i]);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
    table.Rows.Remove((DataRow)rowsToRemove[i]);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + " ms.");

Grateful for any help on this,

Markus Hjarne

 

Kawarjit Bedi [MS]

This problem will be addressed in .NET 2.0.

Removing rows from large DataTables is slow in .NET 1.x. The next version of
the .NET 2.0 Beta will contain the fix, and performance will be much better:
the Remove operation will be almost as fast as the Insert operation. In
v1.x, Remove performance is linear, i.e. it's a function of the number of
rows in the table, whereas in .NET 2.0 it'll be a logarithmic function.

Note: The existing .NET 2.0 Beta improves row insert/remove performance over
.NET 1.x when indexes/constraints are involved. For scenarios like the case
being described in this thread, where indexes/constraints are not involved,
the performance may not be much different.

Hope it helps.

Kawarjit Bedi
Program Manager - ADO.NET Team
Microsoft Corp.

This posting is provided "AS IS" with no warranties, and confers no rights.
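Until the fix ships, one way around the linear-per-call Remove is to stop removing rows one by one. Removing k rows with Rows.Remove costs roughly O(k·n) in 1.x, while copying the surviving rows into a fresh table of the same schema is a single O(n) pass. A sketch of that idea follows; this is not code from the thread, and `RebuildDemo` and `BuildAndFilter` are names I made up:

```csharp
using System;
using System.Data;

class RebuildDemo
{
    // Build the 100,000-row test table from the original post, then drop
    // the even values by copying the survivors into a new table: one
    // linear pass instead of one linear Remove call per doomed row.
    public static DataTable BuildAndFilter()
    {
        DataTable table = new DataTable();
        table.Columns.Add(new DataColumn("Col1", typeof(int)));
        for (int i = 0; i < 100000; i++)
            table.Rows.Add(new object[] { i });

        DataTable kept = table.Clone();   // same schema, zero rows
        kept.BeginLoadData();             // defer index/constraint work
        foreach (DataRow row in table.Rows)
        {
            if (((int)row["Col1"]) % 2 != 0)   // keep the odd values
                kept.LoadDataRow(row.ItemArray, true);
        }
        kept.EndLoadData();
        return kept;                      // old table becomes garbage
    }

    static void Main()
    {
        Console.WriteLine(BuildAndFilter().Rows.Count);   // prints 50000
    }
}
```

One caveat for the design described earlier in the thread: the copy creates new DataRow instances, so anything keyed on the old rows (such as a Hashtable of business objects) would have to be rebuilt afterwards.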
 

Markus Hjärne

I'm sorry, but I still can't see anything about improved performance of
removal.

Regards,

Markus

Deepak said:
It is mentioned in the article. I will quote from the article itself:


New Indexing Engine
The indexing engine for the DataTable has been completely rewritten in
ADO.NET 2.0 and scales much better for large datasets. This results in
faster basic inserts, updates, and deletes, and therefore faster Fill and
Merge operations.

Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:
Thanks for the reference,

but I couldn't see anything there about increased Remove performance. I
don't have any index on the DataTable in my test code, so I'm not sure it
will make any difference with the re-written indexing engine. And also,
another post in this thread mentions that Remove is still slow in .NET 2
beta.

Regards,

Markus


Deepak said:
Hi Markus,

This is the source of my information on DataTable Indexing engine for
ADO.NET 2.0
http://msdn.microsoft.com/data/default.aspx?pull=/library/en-us/dnadonet/html/datasetenhance.asp
Regards,

Deepak
[I Code, therefore I am]


Markus Hjärne said:

Thanks for your reply,

can you tell me what's your source of information when you say that the
indexing engine has been re-written in ADO.NET 2.0?

Regards,

Markus

"Deepak" <[email protected]> skrev i meddelandet
Indexing engine for DataTable in ADO.NET 1.x is not very
optimized,
thus
you
facing this issue.
It has been re-written in ADO.NET 2.0, and will do these
operations
much
faster. I believe there is nothing much you can do about this. Storing
your
data in multiple data tables will boost performance, but you will
have
to
access this based on your requirements.

--
Regards,

Deepak
[I Code, therefore I am]

I'm using a DataTable to hold a selection of rows (100,000+) from a large
database table (1,000,000+ rows). At times I want to remove rows that aren't
used anymore from the DataTable to save memory. But my tests show that it
takes a very long time to remove the rows. Running the code below on a
Pentium 2.4 GHz under Windows XP reports that it takes 187 ms to add
100,000 rows, but 32 s (seconds, that is!) to remove half of them.

Can anybody tell me what I'm doing wrong, or an alternative, faster way to
do this?

// Create table with one int column
DataTable table = new DataTable();
table.Columns.Add(new DataColumn("Col1", typeof(int)));

// Add a number of rows to the table
int ticks = Environment.TickCount;
for (int i = 0; i < 100000; i++)
{
    table.Rows.Add(new object[] { i });
}
int elapsed = Environment.TickCount - ticks;
MessageBox.Show("Added " + table.Rows.Count + " rows in " + elapsed + " ms.");

// Create a list of rows to remove (half of those in the table)
ArrayList rowsToRemove = new ArrayList(table.Rows.Count / 2);
for (int i = 0; i < table.Rows.Count; i++)
{
    if ((i % 2) == 0)
        rowsToRemove.Add(table.Rows[i]);
}

// Remove the rows from the table
ticks = Environment.TickCount;
for (int i = 0; i < rowsToRemove.Count; i++)
{
    table.Rows.Remove((DataRow)rowsToRemove[i]);
}
elapsed = Environment.TickCount - ticks;
MessageBox.Show("Removed " + rowsToRemove.Count + " rows in " + elapsed + " ms.");

Grateful for any help on this,

Markus Hjarne

 

Markus Hjärne

Thanks!

It really warms my heart to know that we are listened to. It won't help with
my immediate problem, but now I feel that I can keep my design and the
performance problems will eventually be solved. Hopefully I, or some clever
person out there, will come up with an acceptable workaround until then.

For a moment, I actually thought I had come up with a workaround when my
testing using a raw ArrayList showed that RemoveAt was so much faster than
Remove (which wasn't so surprising, since RemoveAt doesn't have to look up
the index). But unfortunately DataRowCollection.RemoveAt wasn't any faster
than Remove, and that surprised me a bit.

Anyway, thanks again!

Regards,

Markus Hjarne
LandFocus
Sweden
 

Nayana

Hi everybody,

I created code to retrieve mails from Outlook using VB.NET.
Initially I imported Outlook.MailItem, by adding the Microsoft Outlook
reference in Add References.
All the properties except "SenderEmailAddress" are being shown in the
properties list.
Can anyone help me out?
Nayana.
 

Imac_Man

Excuse my English please,

but is it necessary that you use the Outlook reference to send emails?
Because you can use Exchange directly.
 

Isabelle Prawitz

Hello !
SenderEmailAddress is a property of Outlook 2003.
If you don't see it, your reference isn't Outlook 2003 (Outlook 11.0).

Bye
Isa
 

Guest

' Body of a routine that sends a closure letter via System.Web.Mail.
' (The strings are French: the message says a technician has closed an
' intervention and asks the recipient to check the attached letter
' before printing and sending it.)
' The Sub name below is a placeholder; only the body was posted.
Private Sub Envoyer_Lettre()
    Dim email As New Mail.MailMessage()
    Dim pj As New Mail.MailAttachment(Chemin_Dossier_Lettre() & _
        "Reponse_Allo_Mairie_" & tableau(0, 5) & ".doc")
    email.To = email_Secretariat()
    email.From = "Technicien_Allo_Mairie"
    email.Body = "Le Technicien vient de cloturer une intervention. " & _
        "Vous trouverez ci-joint la lettre qui doit etre envoyée au " & _
        "contribuable qui avait fait la demande d'intervention. " & _
        Chr(10) & "Avant l'impression et l'envoi vérifier le contenu " & _
        "de la lettre" & _
        Chr(10) & Chr(10) & Chr(10) & "Ce message est envoyé par un " & _
        "robot, ne pas repondre à ce message !"

    email.Subject = "Allô Mairie : Lettre de cloture d'intervention"
    email.Attachments.Add(pj)
    Mail.SmtpMail.SmtpServer = IP_Serveur_Message()
    Mail.SmtpMail.Send(email)
End Sub


"Nayana" a écrit :
 

Markus Hjärne

Thanks,

that explains it.

/Markus

Sahil Malik said:
Markus,

Here is why -

public void RemoveAt(int index)
{
    this.Remove(this[index]);
}

RemoveAt calls Remove in DataRowCollection - which is why no performance
difference.

;-)

- Sahil Malik
http://dotnetjunkies.com/weblog/sahilmalik



Markus Hjärne said:
Thanks!

It really warms my heart to know that we are listened to. It won't help with
my immediate problem, but now I feel that I can keep my design and the
performance problems will eventually be solved. Hopefully I, or some clever
person out there, will come up with an acceptable workaround until then.

For a moment, I actually thought I had come up with a workaround when my
testing using a raw ArrayList showed that RemoveAt was so much faster than
Remove (which wasn't so surprising, since RemoveAt doesn't have to look up
the index). But unfortunately DataRowCollection.RemoveAt wasn't any faster
than Remove, and that surprised me a bit.

Anyway, thanks again!

Regards,

Markus Hjarne
LandFocus
Sweden



 
