Best Method to Remove Data Rows

  • Thread starter Thread starter Nigel
  • Start date Start date
N

Nigel

I have a large spreadsheet 60 columns by up to 20,000 rows. I wish to
delete all rows where a specific value does not appear in one column.

In the past I have used a loop that reads all the rows from last to first,
deleting the entire row when the value in the column does not meet the
criteria. This works great except for large files - where it is very slow.

I had thought about sorting the file and then deleting those groups of
records where the criteria is not being met but this requires extra code to
identifier each part of the file to be removed.

Another idea was to use autofilters, where the negated criteria selection
allows me to delete all visible rows.

Any ideas as to the best and fastest method, without having to trawl through
all the data, row by row?
 
It's a good tip, but I already have that in my existing code. In my
spreadsheet most of the data is passive (very few formulas), but there are
some overall counters using subtotal that when switched off speed up the
deletions, but only slightly.

--
Cheers
Nigel



Antchi said:
You might want to try turning off automatic calculations.
 
Filter your data for the rows you want to delete and then run the
following code

Sub Remove_Data()

Dim rng As Range
Dim rng2 As Range

With ActiveSheet.AutoFilter.Range
On Error Resume Next
Set rng2 = .Offset(1, 0).Resize(.Rows.Count - 1, 1) _
.SpecialCells(xlCellTypeVisible)
On Error GoTo 0
End With

If rng2 Is Nothing Then
' Do Nothing
Else ' Delete the rows
Set rng = ActiveSheet.AutoFilter.Range
rng.Offset(1, 0).Resize(rng.Rows.Count - 1).Delete
End If

ActiveSheet.AutoFilterMode = False

End Sub
 
Nigel,

Both stepping through the rows and using the autofilter method will be slow with large workbooks, as
Excel needs to move all that data around every time a row or group of rows is deleted. You could
try this, which I have found to be faster: it sorts twice, each time getting rid of the rows above
the desired value, so Excel is only deleting two blocks of rows. Note that you didn't say what your
criteria for saving was: hence the use of a variant. This assumes your data starts in row 2 with a
header in row 1. Give it a try - and I would be interested in hearing if it made much of a
difference in your case.

HTH,
Bernie
MS Excel MVP

Sub Macro1()
Dim myCell As Range
Dim myCol As Integer
Dim WhatToSave As Variant
Dim myCalc As Variant

WhatToSave = "Criteria for saving"
myCol = 3

With Application
.ScreenUpdating = False
myCalc = .Calculation
.Calculation = xlCalculationManual
.EnableEvents = False
End With

Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlAscending, Header:=xlYes
Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole)
If myCell.Row <> 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete

Columns(myCol).Sort Key1:=Cells(2, myCol), Order1:=xlDescending, Header:=xlYes
Set myCell = Columns(myCol).Find(What:=WhatToSave, After:=Cells(1, myCol), LookAt:=xlWhole)
If myCell.Row <> 2 Then Range(Cells(2, myCol), myCell(0, 1)).EntireRow.Delete

With Application
.ScreenUpdating = True
.Calculation = myCalc
.EnableEvents = True
End With

End Sub
 
Thanks for the code, this is similar to the method I developed that uses a
filter selection and deletes the rest. But although faster than the scan
all rows method it is still quite slow with my worksheet..

But it is an improvement!
 
Thanks for the tip, I had come to the same conclusion that using sort was
the best and fastest method. What I had not considered was using two sorts
to create the two blocks of data, neat trick. This works the fastest of the
three methods I have explored.

Thanks for the advice
 
Nigel,

You're welcome.

For any other interested parties, I compared the three methods, using a high precision timing
routine and a workbook with 22,222 rows of intermixed values, with 3,333 values that should be kept
(meaning about 19,000 rows to be deleted)

Double Sorting / deletion took between 0.1 and 0.11 seconds.

Filter / deletion took 10.42 seconds, about 100 times longer.

Stepping up through the rows took 92.85 seconds, about 1000 times longer.

Note that for each macro, I turned off screen updating, events, and set calculation to manual.

It is an interesting side note (at least to me) that it was this exact problem that got me
interested in programming Excel, almost 10 years ago. In Quattro Pro, stepping up and deleting the
rows took almost no time, but with Excel 5, it was go out and get a cup of coffee, eat breakfast,
run to the bank, etc..... obviously requiring a different technique.

HTH,
Bernie
MS Excel MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top