Excel Sort() Algorithm

M

Michael C

Someone raised a question in my computer class. Does anyone know what type
of sort algorithm Excel uses when you highlight a range and hit the sort
button? I suspect it's not a Bubble sort, because of the inefficiency
involved, but beyond that I don't know what type of sort algorithm they
would have implemented.

Thanks,
Michael C.
 
H

hgrove

Michael C wrote...
Someone raised a question in my computer class. Does anyone
know what type of sort algorithm Excel uses when you highlight
a range and hit the sort button? I suspect it's not a Bubble
sort, because of the inefficiency involved, but beyond that I
don't know what type of sort algorithm they would have
implemented.

Students!

What programming language do you suppose Microsoft used to implemen
Excel? Hint: Excel.exe contains a string that identifies the runtim
library for that language. When you've identified the language, read u
on its standard library. You'll find it includes at least one sor
function.

For that matter, why not test the sort times and figure out whether th
algorithm is O(N^2) or O(N log(N))? If the latter, what algorithm woul
it almost certainly be
 
M

Myrna Larson

I don't know that MS has released that information. But the Excel's sort is
what's referred to as 'stable', meaning that if two keys are equal, they are
left in their original order. So that limits the possibilities.
 
A

Amedee Van Gasse

Michael said:
Someone raised a question in my computer class. Does anyone know
what type of sort algorithm Excel uses when you highlight a range and
hit the sort button? I suspect it's not a Bubble sort, because of
the inefficiency involved, but beyond that I don't know what type of
sort algorithm they would have implemented.

Thanks,
Michael C.

I don't know, but John Walkenbach compares a few sorting algorithms in
Excel (version) Power Programming pages 321-323:

Table 11-1
Sorting Times in Seconds for Four Sort Algorithms
Using Randomly Filled Arrays

Array Excel Worksheet VBA Bubble VBA Quick VBA Counting
Elements Sort Sort Sort Sort
100 0.05 0.00 0.05 0.00
500 0.06 0.11 0.05 0.00
1,000 0.11 0.44 0.11 0.00
5,000 0.55 8.89 0.77 0.00
10,000 1.16 31.69 1.75 0.06
50,000 6.98 788.62 10.21 0.22
100,000 N/A N/A 20.60 0.44

He also compares partially sorted arrays. The result is similar.

Based on these numbers, my best bet is that Excel uses Quick Sort,
slightly faster than the VBA implementation because it is precompiled
into a binary form.
 
H

Harlan Grove

Myrna Larson said:
Yeah, a MAJOR variant -- one that turns it into a stable sort <g>.
....

I suppose it's all in how one interprets 'MAJOR'. Given the setup of =RAND()
in all cells in A1:A100,
=CHAR(65+INT((ROW()-1)/7))&CHAR(65+INT((ROW()-1)/3)) in B1:B100,
=TEXT(ROW()-1," 00") in C1:C100, and =B1&C1 in D1 filled down into D2:D100.
Then sort A1:B100 on col A in either order. The resulting col D values will
contain initial 2 letters in random order followed by numerals in original
row order.

The driver sub.

Sub foo()
Dim a As Variant, b As Variant

a = Application.WorksheetFunction.Transpose(Range("D1:D70").Value)
b = a

qsortproc1 a
Range("E1:E70").Value = Application.WorksheetFunction.Transpose(a)

qsortproc2 b
Range("F1:F70").Value = Application.WorksheetFunction.Transpose(b)
End Sub


The 'unstable' quicksort sub.

Sub qsortproc1( _
arr As Variant, _
Optional a As Variant, _
Optional z As Variant _
)
'--------------------------------------------
'Q&D ascending order quick sort -- RECURSIVE!
'--------------------------------------------
Dim i As Long, k As Long, x As Variant

If Not IsArray(arr) Then Exit Sub

If IsMissing(a) Then a = LBound(arr)
If IsMissing(z) Then z = UBound(arr)

If a >= z Then Exit Sub

i = a + Int((z - a + 1) * Rnd)

x = arr(i)
arr(i) = arr(a)
arr(a) = x

k = a

For i = a + 1 To z
If Left(arr(i), 2) < Left(arr(a), 2) Then
k = k + 1

x = arr(i)
arr(i) = arr(k)
arr(k) = x
End If
Next i

x = arr(k)
arr(k) = arr(a)
arr(a) = x

qsortproc1 arr, a, k - 1
qsortproc1 arr, k + 1, z
End Sub


The 'stable' quicksort sub.

Sub qsortproc2( _
arr As Variant, _
Optional oai As Variant, _
Optional a As Variant, _
Optional z As Variant _
)
'--------------------------------------------
'Q&D ascending order quick sort -- RECURSIVE!
'--------------------------------------------
Dim i As Long, k As Long, x As Variant

If Not IsArray(arr) Then Exit Sub

If IsMissing(a) Then a = LBound(arr)
If IsMissing(z) Then z = UBound(arr)

If a >= z Then Exit Sub

If IsMissing(oai) Then
ReDim oai(a To z)

For i = a To z
oai(i) = i
Next i
End If

i = a + Int((z - a + 1) * Rnd)

x = arr(i)
arr(i) = arr(a)
arr(a) = x

x = oai(i)
oai(i) = oai(a)
oai(a) = x

k = a

For i = a + 1 To z
If Left(arr(i), 2) < Left(arr(a), 2) _
Or (Left(arr(i), 2) = Left(arr(a), 2) And oai(i) < oai(a)) Then
k = k + 1

x = arr(i)
arr(i) = arr(k)
arr(k) = x

x = oai(i)
oai(i) = oai(k)
oai(k) = x
End If
Next i

x = arr(k)
arr(k) = arr(a)
arr(a) = x

x = oai(k)
oai(k) = oai(a)
oai(a) = x

qsortproc2 arr, oai, a, k - 1
qsortproc2 arr, oai, k + 1, z
End Sub


Is quicksort2 really a MAJOR VARIANT of quicksort1? It just doesn't seem so
to me.
 
D

Daniel.M

Hi Myrna,
Yeah, a MAJOR variant -- one that turns it into a stable sort <g>.

I don't see that as a major variant, especially if you don't change the core of
the algorithm.
For example, in C, adding a 4th comparison (one more than the 3 sort keys max),
involving the cells linenumber in the compare() function (the one refered to by
the pointer to function in qsort) would be enough.

We know the "Compare()" used in Excel is already a special variation: it always
leaves blanks at the end (a desirable effect IMO) whether you sort ASC or DESC.

Regards,

Daniel M.
 
M

Michael C

Thanks for all the input. Your chart is a little hard to read, but I get
the idea. One more question - does anyone think they might be using a Radix
sort, or a variation thereof?

Thanks,
Michael C.
 
M

Michael C

Sorry, I meant to say that a Ternary QuickSort and a Radix sort are both
stable. Does anyone have any thoughts possibly one way or the other, as to
which might be the Excel internal implementation?

Thanks,
Michael C.

Michael C said:
Thanks for all the input. Your chart is a little hard to read, but I get
the idea. One more question - does anyone think they might be using a Radix
sort, or a variation thereof?

Thanks,
Michael C.
 
H

hgrove

Michael C wrote...
Sorry, I meant to say that a Ternary QuickSort and a Radix sort
are both stable. Does anyone have any thoughts possibly one
way or the other, as to which might be the Excel internal
implementation?
...

Two things you need to realize about Excel in particular and Microsof
in general. Working code tends to be stable code (little or no retur
on investment improving what's already working). Code that's bee
working for a long time is VERY OLD CODE. Very old code reflects th
design constraints in force at the time it was written, and 15 year
ago memory consumption was a much bigger constraint than it is today.

Radix sort is more memory intensive than quicksort variants, and give
the idiosyncrasies of Excel's collation sequence, it's unlikely Excel'
sort makes use of radix sort. This isn't proof, however.

Jon Bentley and Doug McIlroy publiched the ternary quicksort algorith
in the early 1990s. While I don't recall any problems with the sor
facility in Excel 5 and prior (which would all have gone through desig
phase prior to this algoithm's publication), it's not out of th
question that Excel 97 may have adopted it.

Then again, if Microsoft's C standard library's qsort function adopte
the ternary quicksort algorithm, it's not unlikely Excel's own cod
would not have needed to be modified to take advantage of it.

All this is speculation. There's no way to be sure what algorith
Microsoft uses in Excel without reverse engineering Excel's sor
facility or seeing Excel's source code
 
M

Myrna Larson

Hi, Harlan:

As far as "major" goes i meant a major change in the logic. IMO having to
create and compare a 2nd key fits that description. That doesn't imply it's
hard to program.

re your Q&D routine: It doesn't look much like the QuickSort routines I've
written (I did one in ASM language for the Apple II back in the 1970's) or
modified.

As originally described, QS involve two loops, one searching from the top down
for a value <= the target, another searching the array from the bottom up for
a value > than the target, then swapping these 2 values.

You have only 1 loop, in which you move a lower value to a predetermined lower
position (k). There's no loop to search for a higher value with which to swap.

You have to study the code and figure out the logic all over again (well, I
do, anyway)! IMO this makes it a "major" modification. Whose variation is
this?

In my preliminary testing, your routine takes 33% more time to sort a
randomized listed of 10,000 strings constructed with the formula
=TEXT(RANDBETWEEN(0,999),"000") & TEXT(ROW()," 00000") and sorting on the 1st
3 characters than the "standard" routine I show below.

Would you care to make the necessary modifications to "stabilize" this more
'standard' routine?

Sub QuickSortV(List As Variant, ByVal Min As Long, ByVal Max As Long)
Dim MiddleValue As Variant
Dim Hi As Long
Dim Lo As Long
Dim i As Long

If Min >= Max Then Exit Sub

i = Int((Max - Min + 1) * Rnd + Min)
MiddleValue = List(i)
List(i) = List(Min)

Lo = Min
Hi = Max
Do
Do While Left$(List(Hi), 3) >= Left$(MiddleValue, 3)
Hi = Hi - 1
If Hi <= Lo Then
List(Lo) = MiddleValue
GoTo QSDone
End If
Loop

List(Lo) = List(Hi)
Lo = Lo + 1
Do While Left$(List(Lo), 3) < Left$(MiddleValue, 3)
Lo = Lo + 1
If Lo >= Hi Then
Lo = Hi
List(Hi) = MiddleValue
GoTo QSDone
End If
Loop
List(Hi) = List(Lo)
Loop

QSDone:
QuickSortV List, Min, Lo - 1
QuickSortV List, Lo + 1, Max
End Sub 'QuickSort
 
H

Harlan Grove

Myrna Larson said:
As far as "major" goes i meant a major change in the logic. IMO having
to create and compare a 2nd key fits that description. That doesn't
imply it's hard to program.

While I implemented it using a second array, all I did was augment the sort
keys by appending the original order as the lowest order element. Since the
original order necessarily has all distinct values, the augmented key also
has all distinct values, so original order within original keys would be
preserved. This isn't a major change in logic. This is a minor change in the
sort key which could have been fed through the same quicksort algorithm to
give a stable sort in terms of results.
re your Q&D routine: It doesn't look much like the QuickSort routines
I've written . . . ....
As originally described, QS involve two loops, one searching from the
top down for a value <= the target, another searching the array from
the bottom up for a value > than the target, then swapping these 2
values.

You mean 3 loops. You're forgetting your own outer, 'infinite' loop.
You have only 1 loop, in which you move a lower value to a
predetermined lower position (k). There's no loop to search for a
higher value with which to swap.

You have to study the code and figure out the logic all over again
(well, I do, anyway)! IMO this makes it a "major" modification. Whose
variation is this?

Fairly standard in Unix literature.

Bentley's "Programming Pearls" (ISBN 0-201-10331-1) page 112.

Kernighan & Ritchie's "The C Programming Language, 2nd Ed." (ISBN
0-13-110362-8) page 87.
In my preliminary testing, your routine takes 33% more time to sort a
randomized listed of 10,000 strings constructed with the formula
=TEXT(RANDBETWEEN(0,999),"000") & TEXT(ROW()," 00000") and sorting on
the 1st 3 characters than the "standard" routine I show below.

The multiple loop version involves fewer swaps than the single loop version.
For VBA variants containing strings that's a performance killer, but in
lower level languages like C with block memory moves it's not such a big
deal.

All quicksort requires is a partitioning of the elements such that the pivot
value is in its correct location when the partitioning has been completed.
That's what the 1-loop algorithm I used provides and what the 3-loop
algorithm you used provides. They just do it differently.

In a library quality procedure using block memory operations, the 1-loop
algorithm could add an extra array index and defer swapping until it hits a
record not less than the pivot value, then shift the adjacent less-than
records as a single memory block down one record size chunk and copy the
k_th record into the hole opened above it. While the memory move/copy
operations may be nontrivial in themselves, there's much to be said for the
simplicity of the 1-loop approach.

However, it stinks in VBA. That said, I don't find yours looks a lot like a
translation of an assembler version. I'd prefer a more structured version
(granted with some performance tweaks).


Sub qsortproc1d( _
arr As Variant, _
Optional a As Long, _
Optional z As Long _
)
'--------------------------------------------
'Q&D ascending order quick sort -- RECURSIVE!
'Based mostly on Robert Sedgewick's "Algorithms,
'2nd Ed." (ISBN 0-201-06673-4) page 118.
'--------------------------------------------
Dim i As Long, j As Long
Dim kv As Variant, pv As Variant, t As Variant

If a > 0 And a >= z Then Exit Sub

'-- error checking --
If Not IsArray(arr) Then Exit Sub

'-- optional arg default initializations --
If a = 0 Then a = LBound(arr)
If z = 0 Then z = UBound(arr)

'-- avoid recursion and looping if only 2 items --
If z - a = 1 Then
If Left(arr(a), 3) > Left(arr(z), 3) Then
t = arr(a)
arr(a) = arr(z)
arr(z) = t
End If

Exit Sub
End If

'-- select pivot value at random --
i = a + Int((z - a + 1) * Rnd)

'-- nothing to do if i = z --
If i < z Then
pv = arr(i)
arr(i) = arr(z)
arr(z) = pv
End If

'-- cheat: evaluate pivot value's key value once --
kv = Left(pv, 3)

i = a - 1
j = z

Do While i < j
Do
i = i + 1
Loop While i < z And Left(arr(i), 3) < kv

Do
j = j - 1
Loop While j > a And Left(arr(j), 3) > kv

t = arr(i)
arr(i) = arr(j)
arr(j) = t
Loop

arr(j) = arr(i)
arr(i) = pv
arr(z) = t

'-- skip recursive calls for single items --
If i - a > 1 Then qsortproc1d arr, a, i - 1
If z - i > 1 Then qsortproc1d arr, i + 1, z
End Sub


As for making this stable, I'd just translate the algorithm on page 9 of

http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf

but I'm too tired to do it tonight.
 
H

Harlan Grove

Harlan Grove said:
Fairly standard in Unix literature.

Bentley's "Programming Pearls" (ISBN 0-201-10331-1) page 112.

Kernighan & Ritchie's "The C Programming Language, 2nd Ed." (ISBN
0-13-110362-8) page 87.
....

Further to this, while there are performance advantages to the 3-loop
version, the 1-loop version is much harder to screw up. Perhaps it'd be more
accurate to say that the 3-loop version is really easy to screw up - at
least that's been my experience with it this weekend.
As for making this stable, I'd just translate the algorithm on page 9 of

http://www.cs.princeton.edu/~rs/talks/QuicksortIsOptimal.pdf

but I'm too tired to do it tonight.

If only it were stable. It isn't (or it's over my head how to translate the
code therein into VBA). Instead, I'll make my own qsortproc1d stable. I
still don't consider this a major departure from the 3-loop quicksort - it's
just an augmented key.

If a VBA quicksort were rewritten to take comparison and swap procedures as
string arguments to be run using Application.Run, and if those procedures
were in a separate module with a private Variant to hold oai and
constructors and destructors to initialize and erase oai, then both unstable
and stable quicksort could be accomodated with the SAME quicksort procedure
(properly rewritten). The magic is ALL in the comparison and swapping, NOT
in the fundamental algorithm. Don't lose track of the quicksort forest by
paying too much attention to the comparison and swap trees.


Sub qsortproc2d( _
arr As Variant, _
Optional oai As Variant, _
Optional a As Long, _
Optional z As Long _
)
'--------------------------------------------
'Q&D ascending order quick sort -- RECURSIVE!
'Based mostly on Robert Sedgewick's "Algorithms,
'2nd Ed." (ISBN 0-201-06673-4) page 118.
'using key augmentation to make it stable
'--------------------------------------------
Dim i As Long, j As Long, k As Long, pi As Long
Dim kv As Variant, pv As Variant, t As Variant

'-- check array bounds on subsequent calls --
If Not IsMissing(oai) And a >= z Then Exit Sub

'-- error checking --
If Not IsArray(arr) Then Exit Sub

'-- initial call: initialize array bounds --
'-- and initialize original array indices --
If IsMissing(oai) Then
a = LBound(arr)
z = UBound(arr)

ReDim oai(a To z)

For k = a To z
oai(k) = k
Next k
End If

'-- avoid recursion and looping if only 2 items --
If z - a = 1 Then
If Left(arr(a), 3) < Left(arr(z), 3) Or _
(Left(arr(a), 3) = Left(arr(z), 3) And _
oai(a) < oai(z)) Then Exit Sub

t = arr(a)
arr(a) = arr(z)
arr(z) = t

k = oai(a)
oai(a) = oai(z)
oai(z) = k

Exit Sub
End If

'-- select partitioning record at random --
pi = a + Int((z - a + 1) * Rnd)

pv = arr(pi)
arr(pi) = arr(z)
arr(z) = pv

k = oai(pi)
oai(pi) = oai(z)
oai(z) = k

'-- cheat: store original index and --
'-- key value of partitioning record --
pi = k
kv = Left(pv, 3)

i = a - 1
j = z

Do While i < j
Do
i = i + 1
Loop While i < z And (Left(arr(i), 3) < kv Or _
(Left(arr(i), 3) = kv And oai(i) < pi))

Do
j = j - 1
Loop While j > a And (Left(arr(j), 3) > kv Or _
(Left(arr(j), 3) = kv And oai(j) > pi))

If i > j Then Exit Do

t = arr(i)
arr(i) = arr(j)
arr(j) = t

k = oai(i)
oai(i) = oai(j)
oai(j) = k
Loop

arr(z) = arr(i)
arr(i) = pv

oai(z) = oai(i)
oai(i) = pi

'-- skip recursive calls for single items --
If i - a > 1 Then qsortproc2d arr, oai, a, i - 1
If z - i > 1 Then qsortproc2d arr, oai, i + 1, z
End Sub


Note: this is much slower than your QuickSortV or my qsortproc1d, but I
doubt a stable quicksort based on your QuickSortV would be any faster. But I
leave that to you to implement.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top