Shocking mistake with comparitor function by Microsoft programmers!

R

RayLopez99

If you think only I make rookie mistakes, check out this howler made
by Microsoft programmers.

When I've done a custom comparison I've always made sure the form was
proper. I even had fits over the "equality" operation; I forget the
issue, but it was critical that a value be returned that is logical.
Below, it seems the value returns depends on chance. I'm surprised it
did not crash, like the author below states.

RL

http://www.robweir.com/blog/2010/02/microsoft-random-browser-ballot.html

In this case Microsoft gave the following comparison function:
function RandomSort (a,b)
{
return (0.5 - Math.random());
}

Since Math.random() should return a random number chosen uniformly
between 0 and 1, the RandomSort() function will return a random value
between -0.5 and 0.5. If you know anything about sorting, you can see
the problem here. Sorting requires a self-consistent definition of
ordering. The following assertions must be true if sorting is to make
any sense at all:
If a<b then b>a
If a>b then b<a
If a=b then b=a
if a<b and b<c then a<c
If a>b and b>c then a>c
If a=b and b=c then a=c

All of these statements are violated by the Microsoft comparison
function. Since the comparison function returns random results, a
sort routine that depends on any of these logical implications would
receive inconsistent information regarding the progress of the sort.
Given that, the fact that the results were non-random is hardly
surprising. Depending on the exact search algorithm used, it may just
do a few exchanges operations and then prematurely stop. Or, it could
be worse. It could lead to an infinite loop.
 
V

vanderghast

Yes and no. Well, I already got a very similar discussion with Pete (and he
will probably jump in too), but ASSUMING the classic quick sort algorithm:

In a given step, only comparisons involving the pivot are made;
The transitivity is not required by virtue of the partitions: those less
than the pivot are in a partition which won't speak to those values that
were greater to the pivot, since those are in another partition, and
elements of different partitions would not be involved between themselves
(won't be a partition, otherwise).

So the transitivity rules

if a<b and b<c then a<c
If a>b and b>c then a>c

are not required given the pivot b makes partitions and we would NEVER have
to compare a with c ever.

The reflexivity rules are not required either since the pivot won't be
compared, a second time (in the classic quick sort algorith).

See http://www.sorting-algorithms.com/random-initial-order, as example, and
http://www.sorting-algorithms.com/quick-sort for a discussion.


Adding 'robustness' to the algorithm, though, may require to redo some
comparisons and in that, then, your conclusion is right, but it holds only
because of the implementation, not because of the 'concept' behind the
algorithm (partitions).

As Pete pointed it out, and contrary to the article you pointed out, a
shuffle is preferable to a random sort
(http://en.wikipedia.org/wiki/Knuth_shuffle), at least, for preformance.
The data in the article you supplied sounds strange (out of 5 positions,
getting position 5 first, more than 50% of the time... !?!) No need to make
a Chi-square test to see that there is something fishy...



Vanderghast, Access MVP
 
A

Arne Vajhøj

Yes and no. Well, I already got a very similar discussion with Pete (and
he will probably jump in too), but ASSUMING the classic quick sort
algorithm:

In a given step, only comparisons involving the pivot are made;
The transitivity is not required by virtue of the partitions: those less
than the pivot are in a partition which won't speak to those values that
were greater to the pivot, since those are in another partition, and
elements of different partitions would not be involved between
themselves (won't be a partition, otherwise).

So the transitivity rules

if a<b and b<c then a<c
If a>b and b>c then a>c

are not required given the pivot b makes partitions and we would NEVER
have to compare a with c ever.

The reflexivity rules are not required either since the pivot won't be
compared, a second time (in the classic quick sort algorith).

See http://www.sorting-algorithms.com/random-initial-order, as example,
and http://www.sorting-algorithms.com/quick-sort for a discussion.


Adding 'robustness' to the algorithm, though, may require to redo some
comparisons and in that, then, your conclusion is right, but it holds
only because of the implementation, not because of the 'concept' behind
the algorithm (partitions).

As Pete pointed it out, and contrary to the article you pointed out, a
shuffle is preferable to a random sort
(http://en.wikipedia.org/wiki/Knuth_shuffle), at least, for preformance.
The data in the article you supplied sounds strange (out of 5 positions,
getting position 5 first, more than 50% of the time... !?!) No need to
make a Chi-square test to see that there is something fishy...

Hm.

Let me see.

The language used is JavaScript.

The JavaScript standard ECMA-262 says:

<quote>
If comparefn is not undefined and is not a consistent comparison
function for the elements of this array (see below), the behaviour of
sort is implementation-defined.
</quote>

and

<quote>
A function comparefn is a consistent comparison function for a set of
values S if all of the requirements below are met for all values a, b,
and c (possibly the same value) in the set S: The notation a <CF b means
comparefn(a,b) < 0; a =CF b means comparefn(a,b) = 0 (of either sign);
and a >CF b means comparefn(a,b) > 0.
Calling comparefn(a,b) always returns the same value v when given a
specific pair of values a and b as its two arguments. Furthermore,
Type(v) is Number, and v is not NaN. Note that this implies that exactly
one of a <CF b, a =CF b, and a >CF b will be true for a given pair of a
and b.
Calling comparefn(a,b) does not modify the this object.
a =CF a (reflexivity)
If a =CF b, then b =CF a (symmetry)
If a =CF b and b =CF c, then a =CF c (transitivity of =CF)
If a <CF b and b <CF c, then a <CF c (transitivity of <CF)
If a >CF b and b >CF c, then a >CF c (transitivity of >CF)
</quote>

So the used code was bad code.

And if you compare the points in the JavaScript standard with
the points in the blog post, then it seems rather obvious that
the blog poster actually knows about JavaScript.

And it is good to know about JavaScript when commenting about
the correctness of JavaScript code.

Arne
 
G

Göran Andersson

Arne said:
And it is good to know about JavaScript when commenting about
the correctness of JavaScript code.

As this is the C# newsgroup, I can understand why Vanderghast made the
quick assumption that it was about C# code...
 
R

RayLopez99

As this is the C# newsgroup, I can understand why Vanderghast made the
quick assumption that it was about C# code...

You mean the compare function was improper in C# but proper in
JavaScript? Learned something new

RL
 
V

vanderghast

I didn't claim it is proper in C# either, I said it is implementation
dependant, and even in C#, there are cases where the memory less
presentation, as expressed in the reference you pointed out, will simply
hang, because (it seems that) the implementation does more than the very
basic quick sort algorithm. In fact, you may experience such 'crash' by
ordering a large number of elements, a large number of time, with the
'memory less' randomly generated operands. My whole discussion is about:


ASSUMING the classic (very basic) quick sort algorithm ( is
involved ).



I don't know how I should have stress that point even more, sorry.

Vanderghast, Access MVP
 
G

Göran Andersson

RayLopez99 said:
You mean the compare function was improper in C# but proper in
JavaScript? Learned something new

RL

No, it's the other way around. In C# where the sorting algorithm is
known, such a comparer can be used, but in Javascript where the sorting
algorithm is not defined, the comparer has to be deterministic.
 
A

Arne Vajhøj

I didn't claim it is proper in C# either, I said it is implementation
dependant, and even in C#, there are cases where the memory less
presentation, as expressed in the reference you pointed out, will simply
hang, because (it seems that) the implementation does more than the very
basic quick sort algorithm. In fact, you may experience such 'crash' by
ordering a large number of elements, a large number of time, with the
'memory less' randomly generated operands. My whole discussion is about:

ASSUMING the classic (very basic) quick sort algorithm ( is involved ).

I don't know how I should have stress that point even more, sorry.

Assuming is something you can do about things unknown.

There are no point in defending some coding with an
assumption that is explicit stated in the language
specification is not fulfilled.

The code was written in JavaScript and used some code
that is explicit stated in the JavaScript standard to
be implementation specific. So it is bad code.

Assuming the JavaScript standard saying something
different is like:

<example>
Assuming C# had a for loop with the same syntax
as Pascal then:

for i := 1 to 10 do begin
end;

would be valid C#.
</example>

Arne
 
A

Arne Vajhøj

No, it's the other way around. In C# where the sorting algorithm is
known, such a comparer can be used, but in Javascript where the sorting
algorithm is not defined, the comparer has to be deterministic.

Actually the sorting algorithm in .NET is not that well-defined.

The docs only state that it is using quick-sort - it does
not specify exactly how the pivot element is picked.

What is even worse is that docs for IComparer does not specify
any contracts that needs to be fulfilled.

But my claim is that this is a bug and IComparer indeed
have similar requirements to JavaScript (and Java - check
http://java.sun.com/javase/6/docs/api/java/util/Comparator.html#compare(T, T)).

Othwerwise I can not see how Array.BinarySearch will
ever work with IComparer.

Obviously you can argue that with no restrictions in the
docs, then anything goes.

Arne
 
A

Arne Vajhøj

Actually the sorting algorithm in .NET is not that well-defined.

The docs only state that it is using quick-sort - it does
not specify exactly how the pivot element is picked.

What is even worse is that docs for IComparer does not specify
any contracts that needs to be fulfilled.

But my claim is that this is a bug and IComparer indeed
have similar requirements to JavaScript (and Java - check
http://java.sun.com/javase/6/docs/api/java/util/Comparator.html#compare(T, T)).


Othwerwise I can not see how Array.BinarySearch will
ever work with IComparer.

Obviously you can argue that with no restrictions in the
docs, then anything goes.

It should also be noted that this is only about the deterministic
behavior of the sort.

Even if it is deterministic then there are no guarantee that
the the results will be evenly distributed.

In fact a standard implementation of QuickSort and a comparer
as the one described will not result in an even distribution.

Someone better than me at math will have to explain why. I
suspect that it has something to do with the fact that the
number of elements below and below the pivot element is
not uniformly distributed but binominal distributed.

Arne
 
A

Arne Vajhøj

Assuming is something you can do about things unknown.

There are no point in defending some coding with an
assumption that is explicit stated in the language
specification is not fulfilled.

The code was written in JavaScript and used some code
that is explicit stated in the JavaScript standard to
be implementation specific. So it is bad code.

Assuming the JavaScript standard saying something
different is like:

<example>
Assuming C# had a for loop with the same syntax
as Pascal then:

for i := 1 to 10 do begin
end;

would be valid C#.
</example>

And even if the sort was deterministic, then the result would
not have have the desired distribution. Meaning that it would
still be bad code. Producing deterministic biased results are
not that much better than producing undeterministic results.

Arne


Arne
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top