PC Review


Reply
Thread Tools Rate Thread

How to delete thousands of SQL records

 
 
ChrisH
Guest
Posts: n/a
 
      17th Dec 2005
Hello,

I need to delete thousands of records from two SQL tables, but the
rows are not contiguous. There is a unique numeric id for each row
(clustered index), and my input is a file containing at least a
hundred thousand of these ids to be deleted. The numbers, however,
are all over the map. On my first attempt, I wrote a service, using
C#, that just looped through the file and issued DELETE commands using
the SqlCommand object and the ExecuteNonQuery method. It beat the
crap out of the server, nearly pegging it for the duration.

On my second attempt, I created a dataset using a BETWEEN clause with
the lowest and highest numbers in the file. In tests with a small
file with the numbers more contiguous than real life, it performed
quite well, simply spiking the CPU a bit when I issued the update.
With an actual test set, however, the service eat up almost 2GB of ram
fairly quickly, then stayed put, maybe grabbing a few KB here and
there. I write a log file during the operation, so I know if the
deletion process has begun, and the log file was never written. I
suspect that SELECT query containing the above-mentioned BETWEEN
clause generated more than a million rows and this is the problem.

That said, I need another approach. Can someone please point me in
the right direction?

Many thanks,

Chris
cholimanatcoxdotnet
 
Reply With Quote
 
 
 
 
Cor Ligthert [MVP]
Guest
Posts: n/a
 
      17th Dec 2005
Chris,

Did you try this building it using the stringbuilder.

"Delete.......................;Delete.....................;Delete..............................

And than one ExecutenonQuery

I hope this helps,

Cor


 
Reply With Quote
 
David Browne
Guest
Posts: n/a
 
      17th Dec 2005

"ChrisH" <cholimanatcoxdotnet> wrote in message
news:(E-Mail Removed)...
> Hello,
>
> I need to delete thousands of records from two SQL tables, but the
> rows are not contiguous. There is a unique numeric id for each row
> (clustered index), and my input is a file containing at least a
> hundred thousand of these ids to be deleted. The numbers, however,
> are all over the map. On my first attempt, I wrote a service, using
> C#, that just looped through the file and issued DELETE commands using
> the SqlCommand object and the ExecuteNonQuery method. It beat the
> crap out of the server, nearly pegging it for the duration.
>
> On my second attempt, I created a dataset using a BETWEEN clause with
> the lowest and highest numbers in the file. In tests with a small
> file with the numbers more contiguous than real life, it performed
> quite well, simply spiking the CPU a bit when I issued the update.
> With an actual test set, however, the service eat up almost 2GB of ram
> fairly quickly, then stayed put, maybe grabbing a few KB here and
> there. I write a log file during the operation, so I know if the
> deletion process has begun, and the log file was never written. I
> suspect that SELECT query containing the above-mentioned BETWEEN
> clause generated more than a million rows and this is the problem.
>
> That said, I need another approach. Can someone please point me in
> the right direction?



Use SQLBulkCopy, BULK INSERT, or plain-old INSERT to get your ID's into a
server-side table, and delete using a JOIN or subquery.

new SqlCommand("create table #RowsToDelete(ID int primary
key)",con).ExecuteNonQuery();

.. . . insert ID's. SQLServer will sort as you go because of the Primary
Key.

new SqlCommand("delete from MyTable where id in (select id from
#RowsToDelete)",con).ExecuteNonQuery();

David


 
Reply With Quote
 
Cowboy \(Gregory A. Beamer\)
Guest
Posts: n/a
 
      17th Dec 2005
Yuck! Hammer that server. :-)

--
Gregory A. Beamer

************************************************
Think outside the box!
************************************************
"Cor Ligthert [MVP]" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Chris,
>
> Did you try this building it using the stringbuilder.
>
> "Delete.......................;Delete.....................;Delete..............................
>
> And than one ExecutenonQuery
>
> I hope this helps,
>
> Cor
>



 
Reply With Quote
 
Cowboy \(Gregory A. Beamer\)
Guest
Posts: n/a
 
      17th Dec 2005
Upload the numbers into a temp table and use a join to delete.

delete from MyTable
where MyId IN (SELECT MyID FROM #TempTable)

If you find this perf is horrible on the delete, you can, instead, upload
the numbers get a group of numbers not in this list and put that in a temp
table. Then move the non-deletes to another temp table, truncate the
original table and move them back.

SELECT MyID INTO #TempTable2 FROM MyTable m
LEFT JOIN #TempTable t
ON m.MyID = t.MyID
WHERE t.MyID IS NULL

SELECT * INTO #TempTable3 FROM MyTable m
JOIN #TempTable2 t
ON m.MyID = t.MyID

TRUNCATE TABLE MyTable

INSERT INTO MyTable
SELECT * FROM #TempTable3

-- You can get rid of the temp tables explicitly now

If you have constraints on this table (it is referred to by a foreign key
relationship elsewhere) this gets trickier as cascading deletes might or
might not be the best option.

--
Gregory A. Beamer

************************************************
Think outside the box!
************************************************
"ChrisH" <cholimanatcoxdotnet> wrote in message
news:(E-Mail Removed)...
> Hello,
>
> I need to delete thousands of records from two SQL tables, but the
> rows are not contiguous. There is a unique numeric id for each row
> (clustered index), and my input is a file containing at least a
> hundred thousand of these ids to be deleted. The numbers, however,
> are all over the map. On my first attempt, I wrote a service, using
> C#, that just looped through the file and issued DELETE commands using
> the SqlCommand object and the ExecuteNonQuery method. It beat the
> crap out of the server, nearly pegging it for the duration.
>
> On my second attempt, I created a dataset using a BETWEEN clause with
> the lowest and highest numbers in the file. In tests with a small
> file with the numbers more contiguous than real life, it performed
> quite well, simply spiking the CPU a bit when I issued the update.
> With an actual test set, however, the service eat up almost 2GB of ram
> fairly quickly, then stayed put, maybe grabbing a few KB here and
> there. I write a log file during the operation, so I know if the
> deletion process has begun, and the log file was never written. I
> suspect that SELECT query containing the above-mentioned BETWEEN
> clause generated more than a million rows and this is the problem.
>
> That said, I need another approach. Can someone please point me in
> the right direction?
>
> Many thanks,
>
> Chris
> cholimanatcoxdotnet



 
Reply With Quote
 
Sylvain Lafontaine
Guest
Posts: n/a
 
      17th Dec 2005
A great part of your problem here is the amount of basic work that must be
performed by SQL-Server (deleting hundred of thousands of records) and has
much less to do with the way you ask it to do so.

So you should see this as a SQL-Server optimisation problem. For example,
if you have a lot of indexes, than all of these indexes must be updated for
each delete operations. Here are a few suggestions:

1- Are they too many indexes (unused indexes) on this table?

2- Are these indexes heavily fragmented? Do you perform rebuilding and/or
defragmenting of theses indexes on a regular basis?

3- Have you considered the possibility of storing this table/clustered index
in its own file and make this file continuous on the physical hard drive (by
allocating in a single shot or defragmenting the hard drive) ?

Finally, you should ask this question in a newsgroup about SQL-Server (as
this is more a problem related to SQL-Server than to .NET) and buy a good
book on optimizing SQL-Server.

--
Sylvain Lafontaine, ing.
MVP - Technologies Virtual-PC
E-mail: http://cerbermail.com/?QugbLEWINF


"ChrisH" <cholimanatcoxdotnet> wrote in message
news:(E-Mail Removed)...
> Hello,
>
> I need to delete thousands of records from two SQL tables, but the
> rows are not contiguous. There is a unique numeric id for each row
> (clustered index), and my input is a file containing at least a
> hundred thousand of these ids to be deleted. The numbers, however,
> are all over the map. On my first attempt, I wrote a service, using
> C#, that just looped through the file and issued DELETE commands using
> the SqlCommand object and the ExecuteNonQuery method. It beat the
> crap out of the server, nearly pegging it for the duration.
>
> On my second attempt, I created a dataset using a BETWEEN clause with
> the lowest and highest numbers in the file. In tests with a small
> file with the numbers more contiguous than real life, it performed
> quite well, simply spiking the CPU a bit when I issued the update.
> With an actual test set, however, the service eat up almost 2GB of ram
> fairly quickly, then stayed put, maybe grabbing a few KB here and
> there. I write a log file during the operation, so I know if the
> deletion process has begun, and the log file was never written. I
> suspect that SELECT query containing the above-mentioned BETWEEN
> clause generated more than a million rows and this is the problem.
>
> That said, I need another approach. Can someone please point me in
> the right direction?
>
> Many thanks,
>
> Chris
> cholimanatcoxdotnet



 
Reply With Quote
 
ChrisH
Guest
Posts: n/a
 
      20th Dec 2005
On Sat, 17 Dec 2005 09:17:13 +0100, "Cor Ligthert [MVP]"
<(E-Mail Removed)> wrote:

>Chris,
>
>Did you try this building it using the stringbuilder.
>
>"Delete.......................;Delete.....................;Delete..............................
>
>And than one ExecutenonQuery
>
>I hope this helps,
>
>Cor
>


Thanks, I'll give that a try.

Chris
 
Reply With Quote
 
 
 
Reply

Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Processing thousands of records Dave Microsoft Access Queries 17 21st Aug 2008 08:09 PM
Queries for thousands of records CD27 Microsoft Access Queries 2 9th Apr 2007 06:41 PM
result from query gives thousands of records =?Utf-8?B?QmxlbnZpZA==?= Microsoft Access Queries 11 21st Oct 2005 02:12 PM
best way to retrieve thousands of records. jaYPee Microsoft VB .NET 12 6th Dec 2004 01:21 AM
best way to retrieve thousands of records. jaYPee Microsoft ADO .NET 12 6th Dec 2004 01:21 AM


Features
 

Advertising
 

Newsgroups
 


All times are GMT +1. The time now is 02:10 PM.