Serious Bug System.Collections Sort

W

william.hooper

There is a longer article about this subject here:
http://www.codeproject.com/useritems/SortedList_Bug.asp
See the main article and the reply thread started by Robert Rohde.

Alternatively look at this code:

ArrayList a=new ArrayList();

string s1 = "-0.67:-0.33:0.33";
string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";

a.Add(s1);
a.Add(s2);
a.Add(s3);

a.Sort();
for (int i=0; i<3; i++) Console.WriteLine( a );

Console.WriteLine();

a.Clear();
a.Add(s1);
a.Add(s3);
a.Add(s2);

a.Sort();
for (int i=0; i<3; i++) Console.WriteLine( a );

This code produces the following six lines of output:

-0.67:0.33:-0.33
0.67:-0.33:0.33
-0.67:-0.33:0.33

-0.67:-0.33:0.33
-0.67:0.33:-0.33
0.67:-0.33:0.33

Note that the .Sort produces different outputs depending on the order
the strings are added.

It looks like the Sort algorithm is ignoring the "-" mark.

This is a very serious Bug impacting the System.Collections Array,
SortedList etc.
 
M

Marc Gravell

This appears to relate to the culture-specific comparer, which is presumably
ignoring symbols (one of the CompareOptions flags); perhaps switch to
ordinal comparison, which resolves this. You may be able to use
StringComparer.Ordinal; can't remember if that exists in 1.1, but if not
something like this should do (and use it in the Sort() calls).

public class OrdinalStringComparer : IComparer
{
public readonly static OrdinalStringComparer Singleton = new
OrdinalStringComparer();
private OrdinalStringComparer() { }
public int Compare(object x, object y)
{
return string.CompareOrdinal((string) x, (string) y);
}
}

Marc
 
W

william.hooper

I don't agree....

string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";

Console.WriteLine( String.Compare(s2,s3));
Console.WriteLine( String.Compare(s2,s3));

returns -1 and 1 showing that the Sting.Compare function is working as
expected.

I don't think this is a culture issue and writing an ICompared for
evxery System.Collections string comparison is a nighmare.
 
M

Marc Gravell

Well... it would appear that the problem is a dodgy comparer; try this
(using your previous values for s1, s2, s3):

Console.WriteLine(Comparer.Default.Compare(s1, s2));
Console.WriteLine(Comparer.Default.Compare(s2, s3));
Console.WriteLine(Comparer.Default.Compare(s3, s1));

Yields 1, 1, 1 meaning there is a comparer loop. Oops! Anybody [MS?] want to
comment on whether this is intentional or a fubar? It would appear to
violate the "transitive" rule of comparers, which should be enforced for
non-zero results (0 being transitive as long as it agrees in each
direction).

Actually, it's quite lucky that this returns at all! Perhaps there is a
panic "oops, shouldn't possibly have taken more than n^2 iterations...".

Anyway, the point of my post was that this can be avoided using an ordinal
comparer. And you can re-use the same one each time... no need to write
anything more.

Marc
 
C

Chris Dunaway

I don't agree....

string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";

Console.WriteLine( String.Compare(s2,s3));
Console.WriteLine( String.Compare(s2,s3));

returns -1 and 1 showing that the Sting.Compare function is working as
expected.

I don't think this is a culture issue and writing an ICompared for
evxery System.Collections string comparison is a nighmare.
From the docs, note how it mentions that the hyphen might be given a
low weight so that similar words will sort together:

"The .NET Framework supports word, string, and ordinal sort rules. A
word sort performs a culture-sensitive comparison of strings in which
certain nonalphanumeric Unicode characters might have special weights
assigned to them. For example, the hyphen ("-") might have a very small
weight assigned to it so that "coop" and "co-op" appear next to each
other in a sorted list. A string sort is similar to a word sort, except
that there are no special cases and all nonalphanumeric symbols come
before all alphanumeric Unicode characters. An ordinal sort compares
strings based on the numeric value of each Char in the string. For more
information about word, string, and ordinal sort rules, see the
System.Globalization.CompareOptions topic.

Comparison and search procedures are case-sensitive by default and use
the culture associated with the current thread unless specified
otherwise. By definition, any string, including the empty string (""),
compares greater than a null reference, and two null references compare
equal to each other.

If your application makes security decisions based on the result of a
comparison or case change operation, then the operation should use the
invariant culture to ensure the result is not affected by the value of
the current culture. For more information, see the
CultureInfo.InvariantCulture topic."

I don't know if this is what is causing the behavior you reported, but
at least it's worth looking into.

Chris
 
W

william.hooper

Yes, Marc, sorry you are right.

The code

string s1 = "-0.67:-0.33:0.33";
string s2 = "0.67:-0.33:0.33";
string s3 = "-0.67:0.33:-0.33";

Console.WriteLine( String.Compare(s1,s2));
Console.WriteLine( String.Compare(s2,s3));
Console.WriteLine( String.Compare(s3,s1));

Console.WriteLine();

Console.WriteLine( String.CompareOrdinal(s1, s2));
Console.WriteLine( String.CompareOrdinal(s2, s3));
Console.WriteLine( String.CompareOrdinal(s3, s1));

returns

1
1
1

-3
3
3

Ie String.Compare can not handle the "-" marks but
String.CompareOrdinal can.

Is there not some regional setting you can put once in the code so
String.Compare and ArrayList.Sort all work that way from then on? Would
much prefer this...

----

Note it annoys me that the documentation writes:

"The .NET Framework supports word, string, and ordinal sort
rules....For example, the hyphen ("-") might have a very small weight
assigned to it so that "coop" and "co-op" appear next to each other in
a sorted list...."

But in fact the hypen is being completly ignored rather than given a
low weight.

I think this behaviour by default is very undesirable.
 
M

Marc Gravell

Well, it isn't being ignored. If it was being ignored I would expect the
result to be 0,0,0.
I think this behaviour by default is very undesirable.
I think it is buggy, but it isn't in System.Collections - it is in
String.CompareTo. I can't think of a valid occasion in a well-ordered system
when a < b < c < a or a > b > c > a. It just makes no logical sense unless
you are Maurits Escher.

Now, a = b = c = a = 0 I could live with (i.e. hyhpens completely ignored).

Marc
 
C

Chris Dunaway

And FWIW, I didn't see this issue officially reported on the feedback
site. You may wish to post it there.
 
W

w.hooper

If you play around with examples like this:

s1 = "0.67:0.33:-0.33";
s2 = "0.67:-0.33:0.33";
s3 = "-0.67:0.33:0.33";

Console.WriteLine( String.Compare(s1,s2));
Console.WriteLine( String.Compare(s2,s3));
Console.WriteLine( String.Compare(s3,s1));

It all looks OK. But when you put two hyphens into the lines the
String.Compare gets confused and gives silly results.

OK the String.CompareOrdinal function fixes the problem but this is not
behaviour by design surely?

We need a comment from our lord and master MS....
 
W

w.hooper

Chris said:
And FWIW, I didn't see this issue officially reported on the feedback
site. You may wish to post it there.

Chris, what is the feedback site? I would liek to post it there. It's
really annoying and you never know, they might take an interest.
 
M

Marc Gravell

Still no MS viewpoint?

For ref, I think this is actually quite important, as it could (as
illustrated on the now-deleted CodeProject link) cause a whole range of
sort-critical operations to fail... SortedList etc, or any custom
collections that assume that .Sort() might actually work, and then
trust the results.

Any thoughts?

Marc
 
W

william.hooper

Yes absolutely, it cost me lots of time in the SortedList failing. I
think it's to do with the hyphen algorithm which is designed to sort
words with hyphens in a smart way. Obviously with two hyphens in the
word it fails. Anyone who puts two hyphens in strings is taking a huge
risk - yet using hyphens in strings is common enough. A really
dangerous bug I would say, but I guess Microsoft haven't seen that
yet. You could mention on the article that it causes problems with
SortedList and you think it's a very serious problem. I think we have
discovered a real corker and it deserves to be given a lot of attention.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top