An inefficiency in Array.Sort

A

AMercer

The class below exhibits an inefficiency in Array.Sort, namely comparing an
array element to itself several times during Array.Sort's execution. To
recreate the problem, paste the code below in a test program, instantiate
TestArraySort, and call TestSort.

I have never seen Array.Sort fail, so all I am claiming is an inefficiency.
I stumbled on this while experimenting with generic interfaces. I suggest
this be forwarded to MS for review.

-------------------

Public Class TestArraySort ' expose an inefficiency in Array.Sort
' Array.Sort performs unnecessary comparisons by comparing an element to
itself.
Implements Collections.Generic.IComparer(Of Integer) ' for Array.Sort with
generic comparer

Public a() As Integer ' the array being sorted, visible from a break in
Compare()

Public Function Compare(ByVal x As Integer, ByVal y As Integer) As Integer _
Implements System.Collections.Generic.IComparer(Of Integer).Compare
Debug.Assert(x <> y) ' fail if keys are equal
Return x - y
End Function

Public Sub TestSort() ' Call this to cause an assertion failure in Compare
Randomize()
Dim n As Integer = 5
ReDim a(n)
For i As Integer = 1 To n
a(i) = CInt(Rnd(1) * 100) ' random data
a(i) = a(i) * 100 + i ' guarantee keys are unequal
Next
Array.Sort(a, 1, n, Me) ' sort using Compare() above
End Sub

End Class
 
A

AMercer

Look near the bottom of the original post for
Array.Sort(a, 1, n, Me) ' sort using Compare() above
 
C

Cor Ligthert[MVP]

Sorry,

However, this is not a good place trying to show something to Microsoft as
that is your intention then use.

Connect.Microsoft.com

And as advice show a little sample of what has to be sort as well with that.
(Or here, now it is a little bit a tantalus job to look at your error)

(Beside that try to avoid the Redim. That has as effect hat everytime a new
array is build.

Cor
 
J

James Hahn

You are assuming that it is inefficient to compare the same elements, which
may not be the case.

It is possible that the sort algorithm loses track of the original array
index. This can happen if it does a bulk copy from the source array to the
destination array. The bulk copy can save many compares if one source
sub-array is shorter than the other. The unnecesary comparisons of the same
items is more than offset by the comparisons thus saved.

You would need to know the algorithm used before being able to assert that
it would be worhtwhile to eliminate the unnecessary comparisons.
 
A

AMercer

Two points. First, MS documentation says they use QuickSort which does not
use a temporary array, and Array.Sort is an in-place sort. Second, my claim
is that when sorting array a(), Array.Sort compares a(i) to a(i), not that
they repeatedly compare a(i) to a(j) (did I understand you correctly?). The
former really makes no sense at all, especially presuming a QuickSort
algorithm.
 
F

Family Tree Mike

AMercer said:
Two points. First, MS documentation says they use QuickSort which does
not
use a temporary array, and Array.Sort is an in-place sort. Second, my
claim
is that when sorting array a(), Array.Sort compares a(i) to a(i), not that
they repeatedly compare a(i) to a(j) (did I understand you correctly?).
The
former really makes no sense at all, especially presuming a QuickSort
algorithm.
It would be logical to assume that the implimentation in this Sort routine
determined to scan the entire list of values versus the pivot rather than
the traditional method of swapping out the pivot element was optimal to
other means of tracking the pivot.
 
F

Family Tree Mike

Family Tree Mike said:
It would be logical to assume that the implimentation in this Sort routine
determined to scan the entire list of values versus the pivot rather than
the traditional method of swapping out the pivot element was optimal to
other means of tracking the pivot.


Sorry, I messed that one up...

I meant to say that it seems simply plowing through the array and comparing
values to the pivot value may actually be more efficient that keeping the
index of the pivot and treating it special.
 
J

James Hahn

Quicksort is an example of an algorithm that will compare an element to
itself. Although it is an in place sort, it divides the array into
subarrays and uses a comparison of values to determine the extents.

Your revised description of the problem ("Array.Sort compares a(i) to a(i)")
begs the question. If the comparison you are seeing in the exception really
is a comparison of a(i) to a(i) then obviously the comparison is redundant:
your statement of the problem assumes your conclusion is correct. But your
experiment does not show that array element i is being compared to array
element i. It simply shows that two values are being compared which happen
to be the same.

If it's an in-place sort and the array elements are all different then this
comparison could occur when an array element is compared to a temporary
value - in this case, the pivot.
 
B

Branco

James Hahn wrote:
Your revised description of the problem ("Array.Sort compares a(i) to a(i)")
begs the question.  If the comparison you are seeing in the exception really
is a comparison of a(i) to a(i) then obviously the comparison is redundant:
your statement of the problem assumes your conclusion is correct.  But your
experiment does not show that array element i is being compared to array
element i.  It simply shows that two values are being compared which happen
to be the same.
<snip>

I reproduced the situation using an array with unique values (1 to
100, which were tried ordered, inverted and shuffled) and Array.Sort
seems to consistently compare each element with itself at least twice
per sort.

Regards,

Branco.
 
A

AMercer

Thank you, Branco. The point of my test, like yours, was to sort an array
where no elements are equal. In this case, the only way the Generic
IComparer can be called with two equal numbers is if Array.Sort compares a(i)
to a(i). See also Family Tree Mike's comment above.
 
A

AMercer

I meant to say that it seems simply plowing through the array and comparing
values to the pivot value may actually be more efficient that keeping the
index of the pivot and treating it special.

Possibly. I ran a sort of 100,000 distinct elements, and about 6% of the
tests were equality compares. That seems like a lot, but that is just an
opinion - your hypothesis may be right.

Another idea is this. QuickSort performs best with random input and poorly
when the input is already sorted or partially sorted. Maybe Array.Sort is
coded with a front end to exploit non-random input, and not finding same, it
proceeds like QuickSort. Maybe all the equality compares come from a
presumably sloppily coded front end.

So, how to know what is going on? Ask MS. Would some kind MVP please earn
his pay and get an answer? I tried submitting it to MS, and I got into a
tangle jumping through all the hoops, so I formally surrender.
Alternatively, does somebody out there have the source code for Array.Sort?
If so, please post it here (it shouldn't be too big).

I'm very curious about this issue, but it won't be the end of the world if
it goes unanswered. After all, Array.Sort doesn't fail, at least in my
experience.
 
J

James Hahn

You cannot know that an array element is being compared to itself. If that
was happening, it would obviously be a redundant comparison. It is more
likely that the comparison is between an array element and a temporary
value - such as the pivot, in which case it may not be a redundant
comparison, but an essential part of the process of determining the
sub-array ranges or sub-sorting completion.

James Hahn wrote:
Your revised description of the problem ("Array.Sort compares a(i) to
a(i)")
begs the question. If the comparison you are seeing in the exception
really
is a comparison of a(i) to a(i) then obviously the comparison is
redundant:
your statement of the problem assumes your conclusion is correct. But your
experiment does not show that array element i is being compared to array
element i. It simply shows that two values are being compared which happen
to be the same.
<snip>

I reproduced the situation using an array with unique values (1 to
100, which were tried ordered, inverted and shuffled) and Array.Sort
seems to consistently compare each element with itself at least twice
per sort.

Regards,

Branco.
 
A

AMercer

You cannot know that an array element is being compared to itself.

And other comments ... I think you are missing the point.

When sorting the array {10,20}, the IComparer compares made by Array.Sort are:
10,20 10,20 10,10 10,20 10,10
Similarly, the array {20,10} produces compares
20,10 10,20 10,10 10,20 10,10

I can sort an array of two elements with one compare. Array.Sort takes 5
compares, and two of the compares are of an element vs itself (the 10,10
cases). For larger arrays with distinct integers, you similarly get a sloppy
comparison stream, and some of them will be comparing a number to itself.
How and why this is happening is unknown absent the details of how Array.Sort
is coded. But I assert that something is wrong with Array.Sort, and it looks
to me like a functionally correct but sloppy implementation. The 2-Sort
example does not look like an artifact of a implementation detail, eg a
quicksort pivot operation over a segment of the array that happens to contain
the pivot element. The 2-Sort example looks like sloppy work.
 
A

AMercer

Or maybe James Hahn and Family Tree Mike are right, or at least partially
right. I looked at a 5 sort, and it seemed like Array.Sort was using using a
pivot element to stop an iteration. But it ended up doing 19 compares, the
last 10 of which looked like it had lost its way. I really don't know what
it is doing and how it is doing it, but it does seem to do a lot of compares
to get the sort done. Excessive compares makes me nervous about using
Array.Sort with a generic IComparer to sort an array of object references
where the comparisons are complex. Exploring this question is what prompted
my OP in the first place.
 
J

James Hahn

You are failing to appreciate the distinction I am trying to make between
comparison of two array elelements and the comparison of two values. The
icomparer compares values, not array elements, and will be used when any
comparison of values is reuired. A comparison of an array element to
itself - comparing a(i) to a(i) as you put it - would clearly be
inefficient, as the routine should know that the i values were equal (a
comparison it can do for itself) and therefore there is no need to use the
icomparer. But if the comparison is not between array elements (for
instance, it's between an array element and the pivot or other temporary, or
between two array elements where i is not equal as per my first example)
then the routine MUST invoke the icomparer to do the comparison, and some of
those comparisons may return equal. The routine cannot know beforehand that
the comapre will return equal - that's the whole point of providing the
icomparer.

These comparisons are common to most sorting routines, either because of the
use of temporaries or because the sort operation itself renders comparisons
of the i value meaningless. For a Quicksort, in addition to determining end
of sort or end or subrange, it is possible there is a routine to determine
the optimal pivot, and there may be other optimizations that involve
comparing values.

If you are going to use a two-element sort as an example, then the
inefficiency you are complaining about is a failure to recognise the trivial
case, which is an entirely different issue.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top