processing time (revised and re-posted)

G

GWC

Is is normal for the same process to take significantly longer each time it is repeated or is there something wrong with my computer?

My spreadsheet has 9 columns and 51,000 rows. It contains no macros nor formulas.

Here's what I'm doing:

I click on the "A" at the top of COL A.

In the FIND and REPLACE box, I type DUPL after 'Find What' and, for Options, I click 'WITHIN SHEET", "SEARCH BY COLUMNS" and "LOOK IN FORMULAS". I then click FIND ALL and it returns 588 cells.

I select the first cell listed and hold down the SHIFT key and select the last cell listed. (That selects all 588 cells listed).

I then click INSERT and INSERT CELLS and SHIFT CELLS RIGHT.

The first time, the "shifting" process" take 8 seconds.
the second time takes 20 seconds
the third time takes 80 seconds.
the fouth time takes 140 seconds
the fifth time takes 200 seconds
the sixth time takes 6 minutes.
the seventh time takes 20 minutes.
 
J

joeu2004

GWC said:
Is is normal for the same process to take significantly
longer each time it is repeated

"Normal?" That cannot be answered in general. The answer is: it depends.

"Is it possible?" Yes. See the details below.


GWC said:
or is there something wrong with my computer?

The fact that the process takes 8 seconds(!) the first time suggests to me
that something is indeed "wrong" with your computer.

That is, one or more of the following conditions are probably true:
1. The computer is memory starved.
2. There is too little disk space available for virtual memory, and/or the
disk is very slow and fragmented.
3. The CPU is very slow.
4. You have software or an environment (e.g. noisy LAN) that is interrupting
the Excel process unduly. That software might be viruses; or it might be
legitimate software running in the background (e.g. Norton).

Also, you neglect to say what version of Excel and Windows you are using.
That could be a factor since some versions have known problems with memory
leaks.


GWC said:
My spreadsheet has 9 columns and 51,000 rows.
It contains no macros nor formulas. Here's what I'm doing:
I click on the "A" at the top of COL A.
In the FIND and REPLACE box, I type DUPL after 'Find What'
and, for Options, I click 'WITHIN SHEET", "SEARCH BY COLUMNS"
and "LOOK IN FORMULAS". I then click FIND ALL and it returns
588 cells.
I select the first cell listed and hold down the SHIFT key
and select the last cell listed. (That selects all 588 cells
listed). I then click INSERT and INSERT CELLS and SHIFT CELLS RIGHT.
The first time, the "shifting" process" take 8 seconds.
the second time takes 20 seconds
the third time takes 80 seconds.
the fouth time takes 140 seconds
the fifth time takes 200 seconds
the sixth time takes 6 minutes.
the seventh time takes 20 minutes.

As I noted above, 8 seconds seems unduly long for the first time.

Nevertheless, I also see an increasing run time when I follow your procedure
as I understand it. See the details below.

The reasons might be two-fold.

First, we expect the procedure to take an increasing amount of time somewhat
if only because the spreadsheet is growing. Theoretically, this might be
exacerbated by the fact that you are always inserting from the originally
selected cells, if my understanding is correct.

I say "theoretically" because in my experiments, that actually does not make
any significant difference(!). I think that demonstrates that the second
reason below is the dominant factor.

Second, we expect the amount of memory for the workbook grows as we insert
and shift right.

But in fact, it appears that the amount of memory grows inordinately, very
much more than it should. This is confirmed by saving the modified file,
closing Excel, then reopening the file. Again, see the details below.

Details....

I am using Excel 2010 with WinXP, both with fairly recent updates. My
computer is a lowly single-CPU single-core 2.13 GHz processor without
hyperthreading with 2 GB RAM.

I have forgotten my disk specs, and I'm too lazy to look them up. But the
disk is 50% full, old (read: probably relatively slow), and it has never
been defrag'd (read: data is probably widely dispersed).

I created a data-only workbook with 9 columns and 51,000 rows of numeric and
text data randomly distributed. Column A contains 588 cells the string
"DUPL" randomly distributed.

I use the macro below to do the FIND/select, then repeated insert with
right-shift.

When I open the Excel file with the initial data, the Excel process uses
about 77 MB of RAM and 43 MB of VM. (Note: WinXP seems to be "capricious"
about its VM usage. Sometimes I see as much as 64 MB of VM for the same
file.)

On my computer, the Find time and the insert-with-right-shift times are the
following:

Find: 0.594 sec
insert # 1: 0.172 sec
insert # 2: 0.250
insert # 3: 0.328
insert # 4: 0.422
insert # 5: 0.531
insert # 6: 0.688
insert # 7: 0.781
insert # 8: 0.969
insert # 9: 1.266
insert #10: 1.375

Obviously, your times might vary. Even my times vary from one trial
experiment to another.

But the key thing to note is: the insert times are indeed increasing each
time.

The reason seems clear when we look at memory usage. After running the
macro (performing 10 inserts), the Excel process uses about 528 MB of RAM
and 492 MB of VM.

Such differences in memory usage are likely to put a strain on the computer.
It might thrash the CPU caches. Concomitantly, it might cause more disk
traffic, which is significantly relatively slow.

In any case, my times are significantly faster than yours, starting with my
0.172 sec compared to your 8 sec.

One explanation could be that your computer has much less memory or slower
CPU and disk.

Another explanation is that your data is very different from my experimental
data.

If you would like me to make an apples-to-apples comparison, you can upload
an example Excel file (devoid of any private data) that demonstrates the
problem to a file-sharing website.

Then post the "shared", "public" or "view-only" link (aka URL; http://...)
in a response here. The following is a list of some free file-sharing
websites; or use your own.

Box.Net: http://www.box.net/files
Windows Live Skydrive: http://skydrive.live.com
MediaFire: http://www.mediafire.com
FileFactory: http://www.filefactory.com
FileSavr: http://www.filesavr.com
RapidShare: http://www.rapidshare.com
 
J

joeu2004

PS.... I said:
I use the macro below to do the FIND/select, then repeated insert with
right-shift.

Oops, I forgot to include the macro. Here it is. (Beware of line wrapping
your news reader.)

-----

Sub testInsert()
Dim c As Range, r As Variant
Dim firstAddress As String
Dim n As Long, i As Long
Dim st As Single, et As Single
Dim s As String
st = Timer
With Columns("a:a")
Set c = .Find(what:="dupl", after:=.Cells(Rows.Count, 1), _
LookIn:=xlFormulas, lookat:=xlPart, searchorder:=xlByColumns, _
searchdirection:=xlNext, MatchCase:=False, matchbyte:=False)
If c Is Nothing Then
Range("c1").Select
MsgBox "nothing"
Exit Sub
End If
firstAddress = c.Address
Set r = c
n = 1
Do
Set c = .FindNext(c)
If c.Address = firstAddress Then Exit Do
n = n + 1
Set r = Application.Union(r, c)
Loop
End With
r.Select
et = Timer
s = "find time: " & Format(et - st, "0.000") & " sec"
For i = 1 To 10
st = Timer
Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
et = Timer
s = s & vbNewLine & "insert #" & i & " time: " & _
Format(et - st, "0.000") & " sec"
Next
Debug.Print s
MsgBox s
End Sub
 
J

joeu2004

PPS.... I said:
But in fact, it appears that the amount of memory grows
inordinately, very much more than it should. This is
confirmed by saving the modified file, closing Excel, then
reopening the file. Again, see the details below. [....]
After running the macro (performing 10 inserts), the Excel
process uses about 528 MB of RAM and 492 MB of VM.

I forgot to mention....

After saving, closing Excel, and reopening the file, the Excel process again
used only about 78MB RAM and 44 MB VM for the modified file.

So the growth to 528 MB RAM and 492 MB VM seems to be a __huge__ memory leak
in Excel, or just poor memory management within Excel.

(I did not wait long enough to see if perhaps WinXP reclaimed the unused
application memory much later. But there is still some concern that Excel
grew its memory usage to such an extent seemingly unnecessarily, even if
only temporarily.)
 
K

KenCowen

The find method and the discontinuous range inserting is probably causing your problem; which is pretty definitely a memory problem of some sort. Most decent computers can handle 50k rows and 10 or 20 columns. If all you really need to do is scoot the rows with dupl in column A 11 rows to the right you can do it pretty efficiently, quickly, and without the memory issue with the following code:

Sub test()

Dim r As Range
Dim i As Double

Set r = ActiveSheet.UsedRange

Application.ScreenUpdating = False

For i = 1 To r.Rows.Count

If r.Cells(i, 1).Value = "dupl" Then
r.Cells(i, 1).Resize(1, 11).Cut
r.Cells(i, 12).Select
ActiveSheet.Paste
End If

Next i

Application.ScreenUpdating = True

End Sub

It took about 4 seconds, and there was no appreciable increase in file sizewhen I ran it on a 60k row (3M) file. My data was copied from the first few hundred records on your link, then cloned a bunch of times to get 60k records, with dupl appearing every hundred or so.

I hope this helps.

Ken
 
B

Ben McClave

Here's another take on Ken's suggestion. This routine uses Ken's idea of cutting and pasting the "dupl" cells, but it uses Advanced Filter to show only the cells with "dupl" and then cycles through the visible cells only. This macro took about 1 second on my machine.

Here is the code:

Sub InsertDupl()
Dim l As Long
Dim r As Range

Application.ScreenUpdating = False

'Set up Advanced Filter criteria and column headings
Rows("1:4").Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
Range("A1").FormulaR1C1 = "Column 1"
Range("A2").FormulaR1C1 = "dupl"
With Range("A4:K4")
For l = 1 To 11
.Range("A1").Offset(0, l - 1).Value = "Column " & l
Next l
End With

'Filter to show only the "dupl" cells
Range("A4:K" & ActiveSheet.UsedRange.Rows.Count).AdvancedFilter Action:=xlFilterInPlace, CriteriaRange:= _
Range("A1:A2"), Unique:=False

'Remove Advanced filter criteria and column headers
Rows("1:4").Delete

'Cut and paste all visible rows
For Each r In Range("A1:A" & ActiveSheet.UsedRange.Rows.Count).SpecialCells(xlCellTypeVisible)
r.Resize(1, 11).Cut r.Offset(0, 11)
Next r

'Clear filter and turn on screenupdating
ActiveSheet.ShowAllData
Application.ScreenUpdating = True

End Sub
 
J

joeu2004

GWC said:
I've tried this in Excel 2010 (which is very slow) and in
Excel 2007 (which is horribly slow).

I concur. And Ken and Ben stole my thunder with their suggested
alternatives.

But there is something hinky about your file that deserves some attention.

Ben's algorithm takes about 9 sec, and Ken's algorithm takes about 18 sec.

First, some of the empty-appearing cells are not truly empty. For example,
ISBLANK(A1) returns FALSE. Yet the Formula Bar shows nothing, not even a
null string.

There are a couple very simple explanations for that. This typically
happens when we copy-and-paste-special-value cells that contain the null
string.

But even when I write a macro that should clear the contents of all cells
with LEN()=0, the end of the worksheet (ActiveSheet.UsedRange) is not where
we would expect it.

This is confirmed when I save the worksheet to a CSV file and look at it
with Notepad. A large number of lines at the end have all-empty fields
(",,,,,,,,,,,").

When I delete the lines with all-empty fields, save the CSV file, open the
CSV file in Excel and save as an XLSX file, all the algorithms perform very
much faster, to wit:

Ken's loop-cut/paste (improved): 1.296 sec
find-loop-cut/paste: 1.750 sec
find-repeated-inserts: 7.219 sec
Ben's filter-loop-cut/paste: 7.375 sec

Note that even the find-repeated-inserts is significantly better, albeit
still not as good as the firt two loop-cut/paste algorithms. With your
original file, the find-repeated-inserts took a total of 16 min -- about 12
sec for the first insert.

Of course, those times are on my computer. YMMV. But they should
demonstrate the relative order of performance on your computer.

Bottom line: You should clean up your file.

One way to do that is:
1. Save Sheet1 to a CSV file.
2. Open the CSV file in Notepad (or equivalent).
3. Delete all lines at the end that have all-empty fields (",,,,,,,,,,,").
4. Save to the CSV file.
5. Open the CSV file in Excel
6. Save to an XLSX file.

An easy way to accomplish Step 3 is:
1. Find ",,,," (without quotes). Highlight the entire line.
2. Scroll down to the end. Do not home down.
3. Put the mouse cursor at the end of the last line. Do not click.
4. Press Shift, then left-click.
5. Press Delete.
 
G

GWC

Hi, joeu2004,

I deleted all of the lines at the end that have all-empty fields (",,,,,,,,,,,") and saved the CVS file.

I opened the CSV file in Excel and did the "shifting" process 10 times. Each time look less than 10 seconds for a total of 93 seconds!

Thank you! Thank you!

Gary
 
J

joeu2004

GWC said:
I deleted all of the lines at the end that have all-empty
fields (",,,,,,,,,,,") and saved the CVS file.
I opened the CSV file in Excel and did the "shifting"
process 10 times. Each time look less than 10 seconds for
a total of 93 seconds!

Odd! It took significantly less total time when I did it.

Of course, this does depend on characteristics of the computer system.

But also, I saved as XLSX and reopened the XLSX file before doing the
repeated shifts.

I would not think that makes any difference. But we're dealing with an
inexplicable phenomenon in the first place. So anything could be an adverse
factor.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top