.NET 2.0 performance bug in ArrayList.Sort

G

Guest

I have come across with what appears to be a significant performance bug in
..NET 2.0 ArrayList.Sort method when compared with Array.Sort on the same
data. Same data on the same CPU gets sorted a lot faster with both methods
using .NET 1.1, that's why I am pretty sure its a (rather serious) bug. Below
you can find C# test case that should allow you to reproduce this error, to
run it you will need to put 2 data files into current directory where
executable is or just change filename pathes accordingly, the data files can
be obtained from here:

fast_data.txt: http://majestic12.co.uk/files/other/dotnet2bug/fast_data.txt
slow_data.txt: http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine (AMD x2
3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good in
..NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are MUCH
slower in ArrayList.Sort, and particularly for the "slow_data.txt" file,
Array.Sort times are actually better, so I am not complaining there, but
clearly ArrayList sorts are seriously flawed - this appears to be data
dependent and by that I don't mean size of the data or number of strings: I
have got lots of such data files and about every 10th of them is 10-20 times
slower than the other even though it has got about the same number of lines
and bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with ArrayLists,
however in this case the slowdown is really bad comparing to .NET 1.1 and it
appears to be data dependent - I am getting it on about 10% of my dataset
from which I have selected 2 examples (slow and fast) to demonstrate that its
a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////////
using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance issue in
ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly sized
data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <[email protected]>
Date: 28 Apr 2006
*/


namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take a LOT
more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same place as
the executable
TestFile("slow_data.txt"); // <--- this data file has got 10
times slower
TestFile("fast_data.txt"); // <--- more reasonable 2 times slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Length)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed comparisons
string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines=(string)oLines;

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array is:
{0} msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
}
}
}

//////////////////////////////////////////////////////////////////////////
 
F

Frans Bouma [C# MVP]

Alex said:
I have come across with what appears to be a significant performance
bug in .NET 2.0 ArrayList.Sort method when compared with Array.Sort
on the same data. Same data on the same CPU gets sorted a lot faster
with both methods using .NET 1.1, that's why I am pretty sure its a
(rather serious) bug. Below you can find C# test case that should
allow you to reproduce this error, to run it you will need to put 2
data files into current directory where executable is or just change
filename pathes accordingly, the data files can be obtained from here:

fast_data.txt:
http://majestic12.co.uk/files/other/dotnet2bug/fast_data.txt
slow_data.txt:
http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

The data are strings (URLs) of about similar size in bytes and number.

The following are the console outputs from code on the same machine
(AMD x2 3800, 2 GB RAM, XP Pro SP2):

VS2003 .NET 1.1 (with SP1) run:

-----------------------------------------------------------
Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 234 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 250 msec
Time to sort strings in string[] array is: 250 msec
-----------------------------------------------------------

Note that sorting times are almost exactly the same here, so all good
in .NET 1.1 .

-----------------------------------------------------------
VS2005 .NET 2.0 run:

Loaded 29974 lines from file slow_data.txt
Time to sort strings in ArrayList is: 1531 msec
Time to sort strings in string[] array is: 187 msec
Loaded 31688 lines from file fast_data.txt
Time to sort strings in ArrayList is: 703 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------------------

Notice that on the same machine with the same data sorting times are
MUCH slower in ArrayList.Sort, and particularly for the
"slow_data.txt" file, Array.Sort times are actually better, so I am
not complaining there, but clearly ArrayList sorts are seriously
flawed - this appears to be data dependent and by that I don't mean
size of the data or number of strings: I have got lots of such data
files and about every 10th of them is 10-20 times slower than the
other even though it has got about the same number of lines and
bytesize.

Note: I am aware of boxing/unboxing overheads when dealing with
ArrayLists, however in this case the slowdown is really bad comparing
to .NET 1.1 and it appears to be data dependent - I am getting it on
about 10% of my dataset from which I have selected 2 examples (slow
and fast) to demonstrate that its a very serious performance issue.

Here is the code that should allow you to replicate the problem:

//////////////////////////////////////////////////////////////////////
//// using System;
using System.Collections;
using System.IO;

/*
This is a test case of what appears to be a major performance
issue in ArrayList.Sort method for strings
in .NET 2.0 - it appears to be data dependant as some similarly
sized data files have got a lot less
performance penalty when sorting them using Array.Sort method on
string[] array.

The kicker: both versions run fast in .NET 1.1

Author: Alex Chudnovsky <[email protected]>
Date: 28 Apr 2006
*/


namespace Majestic12
{
/// <summary>
/// Tests sorting performance of Array.Sort of string[] versus
ArrayList.Sort of the same strings
/// It appears that in .NET 2.0 in some cases ArrayList will take
a LOT more time to do the sorting
/// </summary>
class SlowArrayListSortTest
{
static void Main(string[] args)
{
// load strings from file: assumed to be in the same
place as the executable
TestFile("slow_data.txt"); // <--- this data file has got
10 times slower
TestFile("fast_data.txt"); // <--- more reasonable 2
times slower

Console.WriteLine("Press ENTER to exit");
Console.ReadLine();
}

static void TestFile(string sFile)
{
FileStream oFS=File.OpenRead(sFile);

ArrayList oLines=new ArrayList();

StreamReader oSR=new StreamReader(oFS);

while(oSR.BaseStream.Position<oSR.BaseStream.Length)
{
oLines.Add(oSR.ReadLine());
}

oFS.Close();

Console.WriteLine("Loaded {0} lines from file
{1}",oLines.Count,sFile);

// now copy same strings into string array for speed
comparisons string[] sLines=new string[oLines.Count];

for(int i=0;i<sLines.Length;i++)
sLines=(string)oLines;

DateTime oTime=DateTime.Now;
oLines.Sort();
Console.WriteLine("Time to sort strings in ArrayList is:
{0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);

oTime=DateTime.Now;
Array.Sort(sLines);
Console.WriteLine("Time to sort strings in string[] array
is: {0}
msec",(DateTime.Now.Ticks-oTime.Ticks)/TimeSpan.TicksPerMillisecond);
} }
}

//////////////////////////////////////////////////////////////////////
////


Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.

The only difference is that ArrayList.Sort() uses a default comparer,
Array.Sort() doesn't. I think this is where the problem occurs.

I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
G

Guest

Frans Bouma said:
I.o.w.: your sorts aren't exactly the same: the array sort is sorting
an array of strings. The arraylist sort is sorting an array of objects
using a comparer.

I am aware of that - I mentioned boxing/unboxing overheads. The issues are:
a) its a LOT slower than in .NET 1.1 <-- this alone makes it unacceptable
b) its DRAMATICALLY slower with some data
Strange, as ArrayList.Sort uses Array.Sort() :D (use reflector, and
see for yourself). So it's not the sort that's flawed, as it's
precisely the same routine that gets executed.

I can't say for sure what's flawed but _something_ certainly is, its no so
much slower performance of ArrayList.Sort that bothers me, its much slower
performance than same code for .NET 1.1, I added List<string> test and it
runs just fine, the problem appears to be solely in ArrayList - there is
absolutely no reason why sorting should take many times over version that is
present in .NET 1.1, I actually have got some much worse case data scenarios
that take even much much longer, this is an issue since legacy .NET 1.1 won't
use List<string> and thus require a rewrite. If performance hit was moderate
it could have been ignored, but the performance hit appears to be _DATA_
dependent, if anyone from Microsoft is interested to track this further then
I will be happy to assist.

Here is link to the source code that was mangled by this forum:
http://majestic12.co.uk/files/other/dotnet2bug/program.cs

Of course I did workaround it by using List<string>, but I don't like that
"solution" since:
a) it prevents me from being able to do legacy .NET 1.1 builds or involve
conditional defines
b) legacy compiled stuff without source will be hit by unexplained slowdowns

regards,

Alex
 
J

Jon Skeet [C# MVP]

Alex Chudnovsky said:
I am aware of that - I mentioned boxing/unboxing overheads.

Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.

(I haven't read the rest of this thread - just thought I'd mention that
quickly now.)
 
G

Guest

Those won't be relevant with strings - there's no boxing/unboxing
involved with reference types.

Fair enough, I stand corrected. The main issue however is still here: there
should not be any reason why same ArrayList sorts a lot slower in .NET 2.0
build than when it does in .NET 1.1 build with everything else being the same
(machine, data etc). Generic string list and Array.Sort actually sort faster
in .NET 2.0, but ArrayList seems to be rather badly affected, at least for
the data I provided.

It would be good if someone tried the sample code (see link above) and
posted their timings just to see if this is replicated on other machines
(maybe its AMD Athlon x2 that's only affected, something I feel rather
unlikely, but you never know).
 
M

Markus

Alex,
It would be good if someone tried the sample code (see link above)
and posted their timings just to see if this is replicated on other
machines (maybe its AMD Athlon x2 that's only affected, something I
feel rather unlikely, but you never know).


just tried your code, one time with a big file (3MB) and once with a
small file (just several KB):

Result 1, Big File. This is an average result, even though the measured
milliseconds are not always the same (about +/- 40 msec in execution time):

Loaded 70072 lines from file slow_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 406 msec
Loaded 70072 lines from file fast_data.txt
Time to sort strings in ArrayList is: 406 msec
Time to sort strings in Generic Lst is: 406 msec
Time to sort strings in string[] array is: 453 msec
Press ENTER to exit


Result 2, Small file. Here it is really depending, on the fast data, I
also got the result, where the ArrayList had 15msec and the Generic List
got 31 msec... On the slow data, the Generic List seems to be quite
constant, whereas the ArrayList is between 70 and 180 msec.
One example output:

Loaded 5835 lines from file slow_data.txt
Time to sort strings in ArrayList is: 78 msec
Time to sort strings in Generic Lst is: 93 msec
Time to sort strings in string[] array is: 62 msec
Loaded 5835 lines from file fast_data.txt
Time to sort strings in ArrayList is: 31 msec
Time to sort strings in Generic Lst is: 15 msec
Time to sort strings in string[] array is: 31 msec
Press ENTER to exit


Finally, I thought, that there might be problem with the startup and I
switch the line oLines.Sort with the one of oGLines.Sort and it
happened, that now the oGLines.Sort semms to be slower... so I think it
is a matter of Startup (most of the time might arise at a first
initialization of Array.Sort())... maybe some internal static variable
or whatever...

And additionally, I tried some really big file (some 18MB) and it
happened, that all three sorting algorithms merely were the same... so
for me the Sort() method performs well, it must be some matter of startup.

long answer and a lot of assumptions, but I hope to help!
Markus

btw.: I have tested it on a PIV Centrino Laptop.
 
G

Guest

Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.

regards

Alex
 
J

Jon Skeet [C# MVP]

Alex Chudnovsky said:
Thanks for that Marcus - but from what I can see you have not tried it with
my data that I linked to, specifically the following that seems slower than
usual: http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt

I have got thousands of such data files and about 10% of them exhibit the
problem, the rest get sorted in ArrayList pretty fast, however SOME are much
slower, yet if Array.Sort is used then they are all fast, hence my assertion
that its data dependent and that's why I provided links to actual data to
help replicate it, please try it and post here results, thanks!

The issue appears to be really data dependent, which is probably why it may
have escaped scrutiny during normal tests.

I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.
 
G

Guest

I believe I've read before that the ArrayList.Sort pivoting strategy
has changed between versions, which makes some data slower and some
data faster. I think that if it's faster "on average" and the speed-up
of some is comparable to the slow-down of others, that it's a
reasonable change - it sounds like you're just unlucky in your data.

I'd love to see some pointers showing that its the case.

IMO its totally unacceptable to have such an unjustifiable slowdown, as I am
actually getting more than 10 times slower sorts. The only reason I noticed
them because code that run fast under .NET 1.1 actually was like a dog under
..NET 2, and even though 90% of data sets I have get sorted fast, the
remaining 10% were killing performance so much that I had to avoid using
ArrayList. I totally appreciate that some data takes more time to sort,
however the different between sorting calls should not be that dramatic.

The "luck" should not be factor between ArrayList.Sort and List<string>.Sort
and Array.Sort - some overheads - yes, but not THAT much, clearly something
is seriously wrong.

I would really like to see some independent confirmation of timings: please
use my data files as supplied above.

regards,

Alex
 
G

Guest

Ok, I got the real good test for you, this is the actual slowdown that I
experienced in my app, the test before was not as obvious as it is now.

Source code:

Data file (you need to use it, don't use your data files as error seems to
be data dependent!):

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 171 msec
Time to sort strings in ArrayList is: 33343 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------

Notice ArrayList sorting time? Same data but its actually 200 (!) TIMES
slower!

Here is same code (apart from Generic List that is only prevent in .NET 2.0)
run on .NET 1.1:

-----------------------------------------------
Loaded 29972 lines from file slow_data.txt
Time to sort strings in ArrayList is: 187 msec
Time to sort strings in string[] array is: 187 msec
Press ENTER to exit
-----------------------------------------------

Notice that all works fine in .NET 1.1, its same machine, same everything.

So, I would appreciate anyone to try the code above and data file to run on
their PC to verify if you get same much slower performance from ArrayLists
for _that_ data file (ignore all other files you may have).

This is serious stuff - 200 times slowdown for 10% of my real life data
cases meant overall slowdown of around 20 times: this is a killer for legacy
..NET 1.1 apps that may mysteriously start using much more CPU.
 
G

Greg Young

From
http://msdn.microsoft.com/netframew...ibrary/en-us/dndotnet/html/StringsinNET20.asp
try using ArrayList.Sort(StringComparer.Ordinal) ..

with your data ... in above example ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 281 msec
Time to sort strings in ArrayList is: 64281 msec
Time to sort strings in ArrayList with Ordinal is: 140 msec
Time to sort strings in string[] array is: 265 msec
Press ENTER to exit

code added was ..

//bad data is already sorted (worst case) so just resort it again

oTime = DateTime.Now;

oLines.Sort(StringComparer.Ordinal);

Console.WriteLine("Time to sort strings in ArrayList with Ordinal is: {0}
msec", (DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);



Cheers,

Greg Young

Visual C# MVP
 
G

Greg Young

Ah they weren't completely sorted .. still huge gain ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 296 msec
Time to sort strings in ArrayList with Ordinal is: 8593 msec
Time to sort strings in ArrayList is: 1046 msec
Time to sort strings in string[] array is: 281 msec
Press ENTER to exit

Cheers,

Greg
Greg Young said:
From
http://msdn.microsoft.com/netframew...ibrary/en-us/dndotnet/html/StringsinNET20.asp
try using ArrayList.Sort(StringComparer.Ordinal) ..

with your data ... in above example ...

Running under .NET v2.0.50727.42
Loaded 29972 lines from file C:\slow_data.txt
Time to sort strings in Generic List is: 281 msec
Time to sort strings in ArrayList is: 64281 msec
Time to sort strings in ArrayList with Ordinal is: 140 msec
Time to sort strings in string[] array is: 265 msec
Press ENTER to exit

code added was ..

//bad data is already sorted (worst case) so just resort it again

oTime = DateTime.Now;

oLines.Sort(StringComparer.Ordinal);

Console.WriteLine("Time to sort strings in ArrayList with Ordinal is: {0}
msec", (DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);



Cheers,

Greg Young

Visual C# MVP


Alex Chudnovsky said:
My bad forgot to repost links (source and data were updated, so please
download it again):

Source code: http://majestic12.co.uk/files/other/dotnet2bug/program.cs

Data file (you need to use it, don't use your data files as error seems
to
be data dependent!):
http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt
 
G

Guest

But Greg, I know I can change code and make it fast - I just use List
(string) or just string[] and then Array.Sort, the issue is that in .NET 2.0
ArrayList.Sort in some cases (as shown above) will sort many many times
slower than same code in .NET 1.1, of course rewriting code will solve the
problem, but I think its a very serious issue that deserves investigating -
if you do not have source code then some .NET apps that run fine in .NET 1.1
will run very slow under .NET 2.0, I can appreciate some slowdown but 200 (!)
slower is just way too slow.

So, my question is not really how to work around this - I already did, but
to get some attention from Microsoft folk to see exactly _WHY_ in this
situation ArrayList.Sort was sooooooooooooo slow - it is clearly a bug of
some kind that needs to be addressed.
 
G

Greg Young

I just escalated things based upon the following code ...

object[] Items = oLines.ToArray();

object[] Items2 = oLines.ToArray();

DateTime oTime = DateTime.Now;

oTime = DateTime.Now;

Array.Sort(Items, 0, Items.Length, Comparer.Default);

Console.WriteLine("Time to sort with Default Comparer: {0} msec",
(DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

oTime = DateTime.Now;

Array.Sort(Items2, 0, Items.Length, null);

Console.WriteLine("Time to sort with Null Comparer: {0} msec",
(DateTime.Now.Ticks - oTime.Ticks) / TimeSpan.TicksPerMillisecond);

oTime = DateTime.Now;
 
G

Guest

I added your code and here is the output I get:

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 187 msec
Time to sort strings in ArrayList is: 36609 msec
Time to sort strings in string[] array is: 156 msec
Time to sort with Default Comparer: 156 msec
Time to sort with Null Comparer: 156 msec
Press ENTER to exit
-----------------------------------------------

All good apart from ArrayList.Sort - I did some profiling and it seems that
sorting is repeated many many times, sadly I do not have source code to see
what's exactly wrong - I am very curious to find out because there has got to
be some reason for sorting to run _that_ slow - personally I am fine with it
as I am aware of the problem and have workaround, the only reason I posted is
to try to help others who may get gobsmacked by the slow performance of the
code that run just fine under .NET 1.1.
 
G

Guest

Greg Young said:
Where did you add the clones?
After the arraylist sort by chance :)

You got me, I should not work till that late (almost 3am here in the UK) :(

I now put ToArray() before sorting or ArrayList takes place, and here is the
output:

------------------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29972 lines from file slow_data.txt
Time to sort strings in Generic List is: 187 msec
Time to sort strings in ArrayList is: 36562 msec
Time to sort strings in string[] array is: 171 msec
Time to sort with Default Comparer: 36703 msec
Time to sort with Null Comparer: 156 msec
Press ENTER to exit
------------------------------------------------------

So, it seems default comparer has got the issue of some kind. I am very
tempted to split data into chunks and try to find out if its specific strings
that result in slow performance, but too tired to do it now, at the moment I
am satisfied that its not just a weird issue experienced by me alone :)
 
M

Markus

Alex,
Thanks for that Marcus - but from what I can see you have not tried
it with my data that I linked to, specifically the following that
seems slower than usual:
http://majestic12.co.uk/files/other/dotnet2bug/slow_data.txt


with your data, the ArrayList.Sort() is really much slower, even though
mine seems to be not that bad:

-----------------------------------------------
Running under .NET v2.0.50727.42
Loaded 29989 lines from file slow_data.txt
Time to sort strings in Generic List is: 203 msec
Time to sort strings in ArrayList is: 5218 msec
Time to sort strings in string[] array is: 171 msec
Press ENTER to exit
-----------------------------------------------

What I think, the problem lies in the string comparison, as many of your
strings have the same lets say 20-30 letters at the beginning, an the
real comparison between two strings can only be found after comparing
the first letters (being the same). In Framework 2.0 they have done a
lot with globalization and localization etc., maybe deep inside there,
they use another comparer or have rewritten the comparer to work
correctly with more settings and more cultures. Maybe you try to write
an own comparer, if it is slow or fast does not matter in the first
case, but just to make sure, if the ArrayList.Sort or the
Comparer.Compare method is the problem. If the latter, you can implement
an own Comparer, if the first, you might implement your own sorting
algorithm (which is not really fine)... and the last solution could be:
ArrayList.ToString(), the sort the string-array, which is fast on both
framework versions, and convert back to an ArrayList (ugly, but it might
be ok with execution time).


Finally, what came to my mind (not a direct solution to the speed, but):

Can you change your application to sort the text-file once (e.g. at
system startup or at button click of a user), then write the sorted text
to a file, and then you only have to read the file and it is sorted...
it might prevent you from more sleepless night ;-)..

have a good day!
Markus
 
J

Jon Skeet [C# MVP]

So, it seems default comparer has got the issue of some kind. I am very
tempted to split data into chunks and try to find out if its specific strings
that result in slow performance, but too tired to do it now, at the moment I
am satisfied that its not just a weird issue experienced by me alone :)

Just to narrow things slightly (maybe - I'm not sure I've followed
everything) - I don't think it's the actual comparer that's at fault.

I've written a couple of "proxying" classes which just count the number
of comparisons which are done, proxying to a different implementation.
It also lets you proxy from a non-generic comparer to a generic one.
(The code is below.)

If you use the non-generic Array.Sort with Comparer.Default, it takes a
long time and *lots* of comparisons. If you use the Array.Sort<T> with
an IComparer<string> which proxies to Comparer.Default, it's fast and
uses 1/100th of the comparisons.

Not sure how much help all this is, but I would definitely suggest
opening a bug on the MSDN Product Feedback Center.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top