ArrayList.Sort() change in v2.0?

G

Guest

The implementation of the Sort() method on a System.Collections.ArrayList
object seems to have changed in v2.0. In particular, attempting to sort
partially sorted data is very slow. Maybe using QuickSort on already sorted
data will result in worst case performance, but v1.1 ArrayList.Sort() doesn't
display the same level of slowness, nor does List<string>.Sort().

I've included test code (console app) below to demonstrate the problem. It
creates an already sorted string array, adds an extra string to various
locations in the array, then calls Sort(). To run against v1.1, which
doesn't include the generic test, simply comment out the "#define CLR_V2".
I'm seeing this issue on all machines tested. Here is sample output:

*****************
*****************
Compiled with Framework Version: v1.1.4322

Sort Test: No extra string added
ArrayList: 0.0468714(s)

Sort Test: Extra string 'aaa' inserted at beginning
ArrayList: 0.0312476(s)

Sort Test: Extra string 'aaa' inserted at middle
ArrayList: 0.0468714(s)

Sort Test: Extra string 'aaa' added to end
ArrayList: 0.0624952(s)

*****************
Compiled with Framework Version: v2.0.50727

Sort Test: No extra string added
List<string>: 0.0624952(s)
ArrayList: 0.0312476(s)

Sort Test: Extra string 'aaa' inserted at beginning
List<string>: 0.0468714(s)
ArrayList: 0.0468714(s)

Sort Test: Extra string 'aaa' inserted at middle
List<string>: 0.0624952(s)
ArrayList: 5.7183108(s)

Sort Test: Extra string 'aaa' added to end
List<string>: 0.0468714(s)
ArrayList: 23.0138574(s)

*****************
*****************
Code:
(copy\paste over generated class in a Visual Studio Console Application
Project )

#define CLR_V2

using System;
using System.Collections;

class Program
{
enum IncludeExtraString
{
None,
FirstElement,
MiddleElement,
LastElement
}

static void Main(string[] args)
{
string extra = "aaa";

Console.WriteLine("Compiled with Framework Version: {0}",
System.Reflection.Assembly.GetExecutingAssembly().ImageRuntimeVersion);

Console.WriteLine();
Console.WriteLine("Sort Test:\tNo extra string added");
ArrayTest(IncludeExtraString.None, null);

Console.WriteLine();
Console.WriteLine("Sort Test:\tExtra string '{0}' inserted at
beginning", extra);
ArrayTest(IncludeExtraString.FirstElement, extra);

Console.WriteLine();
Console.WriteLine("Sort Test:\tExtra string '{0}' inserted at middle",
extra);
ArrayTest(IncludeExtraString.MiddleElement, extra);

Console.WriteLine();
Console.WriteLine("Sort Test:\tExtra string '{0}' added to end", extra);
ArrayTest(IncludeExtraString.LastElement, extra);
}

static void ArrayTest(IncludeExtraString includeString, string extraString)
{
DateTime start, end;
string[] strArray = CreateArray(includeString, extraString);

#if CLR_V2
System.Collections.Generic.List<string> genericList =
new System.Collections.Generic.List<string>(strArray.Length);
genericList.AddRange(strArray);

start = DateTime.Now;
genericList.Sort();
end = DateTime.Now;
Console.WriteLine("List<string>:\t{0}(s)", ((TimeSpan)(end -
start)).TotalSeconds);
#endif

ArrayList arrayList = new ArrayList(strArray.Length);
arrayList.AddRange(strArray);

start = DateTime.Now;
arrayList.Sort();
end = DateTime.Now;
Console.WriteLine("ArrayList:\t{0}(s)", ((TimeSpan)(end -
start)).TotalSeconds);
}

static string[] CreateArray(IncludeExtraString includeString, string
extraString)
{
ArrayList list = new ArrayList();

char[] charString = new char[3];
for (int firstChar = 0; firstChar < 26; firstChar++)
{
charString[0] = Convert.ToChar('a' + firstChar);
for (int secondChar = 0; secondChar < 26; secondChar++)
{
charString[1] = Convert.ToChar('a' + secondChar);
for (int thirdChar = 0; thirdChar < 26; thirdChar++)
{
charString[2] = Convert.ToChar('a' + thirdChar);
list.Add(new string(charString));
}
}
}

switch (includeString)
{
case IncludeExtraString.FirstElement:
list.Insert(0, extraString);
break;
case IncludeExtraString.MiddleElement:
list.Insert(list.Count / 2, extraString);
break;
case IncludeExtraString.LastElement:
list.Add(extraString);
break;
}
return (string[])list.ToArray(typeof(string));
}
}
 
W

Wiktor Zychla [C# MVP]

shame on .NET 2.0 for that! I confirm your observations.

with another, rather concise example:

using System;
using System.Collections;

class ATest {
public static void Test( int n ) {
Console.Write( "{0}: ", n );
ArrayList list = new ArrayList();
for ( int i=0; i<n; i++ ) list.Add(i);
list.Add(1);
DateTime start = DateTime.Now;
list.Sort();
Console.WriteLine( "Time: {0}", DateTime.Now-start );
}
public static void Main() {
Console.WriteLine("Compiled with Framework Version: {0}",
System.Reflection.Assembly.GetExecutingAssembly().ImageRuntimeVersion);
Test(1000);Test(5000);Test(10000);Test(20000);Test(50000);
}
}

I get:

Compiled with Framework Version: v1.1.4322
1000: Time: 00:00:00
5000: Time: 00:00:00
10000: Time: 00:00:00.0156247
20000: Time: 00:00:00.0156247
50000: Time: 00:00:00.0312494

Compiled with Framework Version: v2.0.50727
1000: Time: 00:00:00.0156247
5000: Time: 00:00:00.6406127
10000: Time: 00:00:02.5780755
20000: Time: 00:00:10.2966773
50000: Time: 00:01:05.1393743

I didn't even try to run the test with 100000 elements.

Any other comments on that?
Regards,
Wiktor Zychla
 
W

Wiktor Zychla [C# MVP]

Any other comments on that?

at least I've found a significant difference.

in both cases ArrayList.Sort calls Array.Sort internally, which in turn
calls Array.SorterObjectArray.QuickSort.

in 1.1, the QuickSort always takes a value exactly from the middle of an
array as a splitter.

however, in 2.0 a small "optimization" was added to the QuickSort method and
now it tries to "be smart" and guesses what value should be used as a
splitter: a first one, a last one or the one from the middle of an array
(GetPivotValue method).

I am not a QuickSort expert but I belive that this "optimization" should
have made the typical case slightly faster while making the worst-case
(presorted array) slightly worse.

However, a simple experiment that consist of filling the ArrayList with
completely random values (which should be an ideal "typical case") reveals
that the "optimized" algorithm is still slower!

here are my results for a "random value test":

Compiled with Framework Version: v1.1.4322
1000: Time: 00:00:00
5000: Time: 00:00:00
10000: Time: 00:00:00.0156247
20000: Time: 00:00:00.0312494
50000: Time: 00:00:00.0624988

Compiled with Framework Version: v2.0.50727
1000: Time: 00:00:00
5000: Time: 00:00:00.0156247
10000: Time: 00:00:00.0156247
20000: Time: 00:00:00.0312494
50000: Time: 00:00:00.0937482

I am not sure then they could motivate that change: a change, let me repeat
that, which makes a typical case slightly slower (!) and completely ruins a
worst case (!!).

Regards,
Wiktor Zychla

// random value test
using System;
using System.Collections;

class ATest {
public static void Test( int n ) {
Console.Write( "{0}: ", n );Random r = new Random();
ArrayList list = new ArrayList();
for ( int i=0; i<n; i++ ) list.Add(r.Next());
//list.Add(1);
DateTime start = DateTime.Now;
list.Sort();
Console.WriteLine( "Time: {0}", DateTime.Now-start );
}
public static void Main() {
Console.WriteLine("Compiled with Framework Version: {0}",
System.Reflection.Assembly.GetExecutingAssembly().ImageRuntimeVersion);
Test(1000);Test(5000);Test(10000);Test(20000);Test(50000);
}
}
 
G

Guest

Hi Wiktor,

I advise you *not* to use DateTime.Now for measuring performance. Instead
use the win32 performance counters for accurate measurement:

public static long QueryPerformanceCounter( )
{
long perfCount;
QueryPerformanceCounter( out perfCount );
return perfCount;
}

public static long QueryPerformanceFrequency( )
{
long freq;
QueryPerformanceFrequency( out freq );
return freq;
}

[ System.Runtime.InteropServices.DllImport( "Kernel32.dll" ) ]
private static extern bool QueryPerformanceCounter( out long perfcount );

[ System.Runtime.InteropServices.DllImport( "Kernel32.dll" ) ]
private static extern bool QueryPerformanceFrequency( out long freq );

Then do you measurement as follows:

public static void DoTest( )
{
long begin = QueryPerformanceCounter( );

// Do sorting

long end = QueryPerformanceCounter( );
long freq = QueryPerformanceFrequency( );
long time_ms = (long) ( ((double) ( end - begin )) / ((double) freq) *
1000.0 );
}

The time_ms gives the number of milliseconds the sorting took.

Kind regards,
 
W

Wiktor Zychla [C# MVP]

I advise you *not* to use DateTime.Now for measuring performance. Instead
use the win32 performance counters for accurate measurement:


Tom,

thank you for your comment. While you are absolutely right about using
perormance counters for accurate measurement, event DateTime.Now is
sufficient to notice the difference between a second and a minute in sorting
preordered list ;)

After analyzing the problem a little deeper, I think that the original
issue with the ArrayList is not a bug "per se", it is rather an
inconsistency. You see, the QuickSort method, no matter how you pick up the
splitter value, beaves poorly on worst-cases. The original 1.1
implementation that picks the splitter from the middle of the list is also
terribly slow but in another case, when values form a "v" (Ia m not sure if
it is a worst case for such splitting strategy but it is surely bad enough):

using System;
using System.Collections;

class ATest {
public static void Test( int n ) {
Console.Write( "{0}: ", n );
ArrayList list = new ArrayList();
for ( int i=0; i<n; i++ )
list.Add( -Math.Abs( i-(n/2) ) );
DateTime start = DateTime.Now;
list.Sort();
Console.WriteLine( "Time: {0}", DateTime.Now-start );
}
public static void Main() {
Console.WriteLine("Compiled with Framework Version: {0}",
System.Reflection.Assembly.GetExecutingAssembly().ImageRuntimeVersion);
Test(1000);Test(5000);Test(10000);Test(20000);Test(50000);
}
}

Compiled with Framework Version: v1.1.4322
1000: Time: 00:00:00.0468741
5000: Time: 00:00:00.2499952
10000: Time: 00:00:01.0468549
20000: Time: 00:00:04.1249208
50000: Time: 00:00:25.9213773

Compiled with Framework Version: v2.0.50727
1000: Time: 00:00:00
5000: Time: 00:00:00.0156247
10000: Time: 00:00:00.0312494
20000: Time: 00:00:00.0781235
50000: Time: 00:00:00.1874964

What has been done between 1.1 and 2.0 then was the changing the strategy
of picking the QuickSort splitter, thus falling from one category of
worst-cases to another (from "vs" to "preordered list with appended minimal
element").

Why it is an incosistency then? Because the List<T>.Sort uses another
implementation of QuickSort that picks the splitter in a "1.1" way - from
the middle of the table. That's why Dan has observed that the List<T> is not
influenced with the ArrayList sorting "bug".

Regards,
Wiktor Zychla
 
J

Jon Skeet [C# MVP]

TT (Tom Tempelaere) <"=?Utf-8?B?VFQgKFRvbSBUZW1wZWxhZXJlKQ==?=" <_|\|_0
$P@|/\|titi____AThotmailD.Tcom|/\|@P$0_|\|_> said:
I advise you *not* to use DateTime.Now for measuring performance. Instead
use the win32 performance counters for accurate measurement:

While this is true if you need high precision, I would argue that most
benchmarks which run long enough to avoid the noise of other processes
etc make the extra precision irrelevant. I don't like running
benchmarks for less than 10 seconds, at which point it doesn't really
matter whether you're 50 milliseconds out.

It's a lot quicker to write tests if you don't have to include PInvoke
etc, too :)
 
W

Willy Denoyette [MVP]

The String class has undergone some changes that may influence Sort
performance on non Generic containers, the changes are documented here:
http://msdn.microsoft.com/netframew...ibrary/en-us/dndotnet/html/StringsinNET20.asp

What you could do to speed up the AL sort is selecting the ordinal string
comparer.

arrayList.Sort(StringComparer.Ordinal);
Note that you won't achieve the perf. level of v1.x, but it should be
comparable in a general case (not a worse case like in your sample).


Willy.




| The implementation of the Sort() method on a System.Collections.ArrayList
| object seems to have changed in v2.0. In particular, attempting to sort
| partially sorted data is very slow. Maybe using QuickSort on already
sorted
| data will result in worst case performance, but v1.1 ArrayList.Sort()
doesn't
| display the same level of slowness, nor does List<string>.Sort().
|
| I've included test code (console app) below to demonstrate the problem.
It
| creates an already sorted string array, adds an extra string to various
| locations in the array, then calls Sort(). To run against v1.1, which
| doesn't include the generic test, simply comment out the "#define CLR_V2".
| I'm seeing this issue on all machines tested. Here is sample output:
|
| *****************
| *****************
| Compiled with Framework Version: v1.1.4322
|
| Sort Test: No extra string added
| ArrayList: 0.0468714(s)
|
| Sort Test: Extra string 'aaa' inserted at beginning
| ArrayList: 0.0312476(s)
|
| Sort Test: Extra string 'aaa' inserted at middle
| ArrayList: 0.0468714(s)
|
| Sort Test: Extra string 'aaa' added to end
| ArrayList: 0.0624952(s)
|
| *****************
| Compiled with Framework Version: v2.0.50727
|
| Sort Test: No extra string added
| List<string>: 0.0624952(s)
| ArrayList: 0.0312476(s)
|
| Sort Test: Extra string 'aaa' inserted at beginning
| List<string>: 0.0468714(s)
| ArrayList: 0.0468714(s)
|
| Sort Test: Extra string 'aaa' inserted at middle
| List<string>: 0.0624952(s)
| ArrayList: 5.7183108(s)
|
| Sort Test: Extra string 'aaa' added to end
| List<string>: 0.0468714(s)
| ArrayList: 23.0138574(s)
|
| *****************
| *****************
| Code:
| (copy\paste over generated class in a Visual Studio Console Application
| Project )
|
| #define CLR_V2
|
| using System;
| using System.Collections;
|
| class Program
| {
| enum IncludeExtraString
| {
| None,
| FirstElement,
| MiddleElement,
| LastElement
| }
|
| static void Main(string[] args)
| {
| string extra = "aaa";
|
| Console.WriteLine("Compiled with Framework Version: {0}",
|
System.Reflection.Assembly.GetExecutingAssembly().ImageRuntimeVersion);
|
| Console.WriteLine();
| Console.WriteLine("Sort Test:\tNo extra string added");
| ArrayTest(IncludeExtraString.None, null);
|
| Console.WriteLine();
| Console.WriteLine("Sort Test:\tExtra string '{0}' inserted at
| beginning", extra);
| ArrayTest(IncludeExtraString.FirstElement, extra);
|
| Console.WriteLine();
| Console.WriteLine("Sort Test:\tExtra string '{0}' inserted at middle",
| extra);
| ArrayTest(IncludeExtraString.MiddleElement, extra);
|
| Console.WriteLine();
| Console.WriteLine("Sort Test:\tExtra string '{0}' added to end",
extra);
| ArrayTest(IncludeExtraString.LastElement, extra);
| }
|
| static void ArrayTest(IncludeExtraString includeString, string
extraString)
| {
| DateTime start, end;
| string[] strArray = CreateArray(includeString, extraString);
|
| #if CLR_V2
| System.Collections.Generic.List<string> genericList =
| new System.Collections.Generic.List<string>(strArray.Length);
| genericList.AddRange(strArray);
|
| start = DateTime.Now;
| genericList.Sort();
| end = DateTime.Now;
| Console.WriteLine("List<string>:\t{0}(s)", ((TimeSpan)(end -
| start)).TotalSeconds);
| #endif
|
| ArrayList arrayList = new ArrayList(strArray.Length);
| arrayList.AddRange(strArray);
|
| start = DateTime.Now;
| arrayList.Sort();
| end = DateTime.Now;
| Console.WriteLine("ArrayList:\t{0}(s)", ((TimeSpan)(end -
| start)).TotalSeconds);
| }
|
| static string[] CreateArray(IncludeExtraString includeString, string
| extraString)
| {
| ArrayList list = new ArrayList();
|
| char[] charString = new char[3];
| for (int firstChar = 0; firstChar < 26; firstChar++)
| {
| charString[0] = Convert.ToChar('a' + firstChar);
| for (int secondChar = 0; secondChar < 26; secondChar++)
| {
| charString[1] = Convert.ToChar('a' + secondChar);
| for (int thirdChar = 0; thirdChar < 26; thirdChar++)
| {
| charString[2] = Convert.ToChar('a' + thirdChar);
| list.Add(new string(charString));
| }
| }
| }
|
| switch (includeString)
| {
| case IncludeExtraString.FirstElement:
| list.Insert(0, extraString);
| break;
| case IncludeExtraString.MiddleElement:
| list.Insert(list.Count / 2, extraString);
| break;
| case IncludeExtraString.LastElement:
| list.Add(extraString);
| break;
| }
| return (string[])list.ToArray(typeof(string));
| }
| }
|
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top