fastest sorted list type?


colin

Hi,
I have an array of structs which contain, amongst other things, a Length of
type float. I wish to make a sorted index array based on the length,
which does have duplicates.

I initially added all the indices to a List<int> and then sorted the list
using a comparer which took the index, looked up the struct in the array, and
compared the lengths.
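
Roughly this sort of thing (a simplified sketch of the idea, not my actual
code; the names are made up):

using System.Collections.Generic;

struct Line
{
    public float Length;
    // ... plus the other precalculated fields
}

static class IndexSortSketch
{
    public static List<int> SortIndicesByLength(Line[] lines)
    {
        // Build the index list 0..n-1.
        List<int> indices = new List<int>(lines.Length);
        for (int i = 0; i < lines.Length; i++)
            indices.Add(i);

        // Sort the indices by the Length of the struct each index refers to.
        indices.Sort(delegate(int a, int b)
        {
            return lines[a].Length.CompareTo(lines[b].Length);
        });
        return indices;
    }
}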

This was quite slow, so I thought I'd write a routine which used binary search
to add indices to the list in order.
While this called the compare routine far fewer times, it did in fact take a
lot longer.

So I tried SortedList, but unfortunately this doesn't allow duplicate keys.

Was I right in my first attempt, or is there a better way?
Why are there two types of comparer functionality,
one of which accepts a static member function,
the other a member function which can't be static yet still
takes two parameters? What's the this pointer used for in such cases?

Sort uses either, but binary search uses the latter; would this make it
slower?

The list contains over 2 million entries, which is why I used an array of
structs, as a list of classes would require lots of allocations
....


thanks
Colin =^.^=
 

Peter Duniho

[...]
Was I right in my first attempt, or is there a better way?

Do you need the data sorted before you're done collecting it? If not,
then your first go at it was better. Maintaining data in sorted order is
generally at least as expensive as sorting it once.

An exception to this would be a binary tree implementation, which if I
recall correctly is what the non-generic SortedList class uses. A binary
tree has a poor worst-case performance (but then so too can some
implementations of quicksort), but for random data it's O(n log n) just
like quicksort.
Why are there two types of comparer functionality,
one of which accepts a static member function,
the other a member function which can't be static yet still
takes two parameters? What's the this pointer used for in such cases?

I'm not really sure which versions you're talking about, but it _sounds_
like you're asking about the difference between the one that takes a
Comparison<T> and the one that takes an IComparer<T>.

The former is simply a delegate...it need not be in any class if you
simply provide an anonymous method, and if it is in a class it can be a
static method or an instance method (though if it's an instance method
then of course you need to provide an instance from which to get the
delegate reference).

The latter is an object that implements a specific interface, IComparer.
IComparer defines a single method, Compare(). This method would be
equivalent to the Comparison<T> delegate method.

In both cases, the comparison method compares the two parameters passed
in. They basically do the same thing, the only difference being how the
method is declared and then passed to the Sort() method.
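
For example, a rough sketch (the Line and LengthComparer names are just made
up for illustration):

using System.Collections.Generic;

struct Line
{
    public float Length;
}

// IComparer<int> form: the instance holds a reference to the data array,
// which is what the "this" reference gives the Compare() method access to.
class LengthComparer : IComparer<int>
{
    private readonly Line[] lines;
    public LengthComparer(Line[] lines) { this.lines = lines; }

    public int Compare(int a, int b)
    {
        return lines[a].Length.CompareTo(lines[b].Length);
    }
}

static class ComparerSketch
{
    public static void Demo(Line[] lines, List<int> indices)
    {
        // Comparison<T> form: just a delegate (an anonymous method here).
        indices.Sort(delegate(int a, int b)
        {
            return lines[a].Length.CompareTo(lines[b].Length);
        });

        // IComparer<T> form: this is the one BinarySearch() accepts.
        int pos = indices.BinarySearch(0, new LengthComparer(lines));
        // pos is the position of an index whose Length matches lines[0].Length,
        // or the bitwise complement of the insertion point if none is found.
    }
}
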
Sort uses either, but binary search uses the latter; would this make it
slower?

Not likely to produce a significant performance difference.
The list contains over 2 million entries, which is why I used an array of
structs, as a list of classes would require lots of allocations

Allocations in .NET are _extremely_ fast. In fact, I would guess that you
could get a decent performance improvement using a class instead of a
struct, especially if the struct is relatively large, because the data
actually being moved during the sort would be smaller. It's difficult to
anticipate exactly where the break-even point would be, since classes
reduce your data locality which would hurt performance. But at some point
they would definitely win out.

All that said, 2 million elements is a lot of data, and sorting that much
data is going to be relatively slow no matter what you do. At least,
until they perfect quantum computer technology and a sort can be
accomplished in constant time. :)

Pete
 

colin

Peter Duniho said:
[...]
Was I right in my first attempt, or is there a better way?

Do you need the data sorted before you're done collecting it? If not,
then your first go at it was better. Maintaining data in sorted order is
generally at least as expensive as sorting it once.

Thanks for the reply. Basically the data set is a list of all the possible
lines between any two 3D points in a list, so the data grows in size quite
quickly as the number of points increases.
I may be able to trim down the number, but I'd still like to know which is the
fastest sorted list.

It's sorted on the length, which is saved in the struct as it's used many
times.

Once the data is put into the container it isn't added to subsequently;
it's discarded after being read about 5 times.
But basically it needs to be able to do the shortest lines first etc ...
An exception to this would be a binary tree implementation, which if I
recall correctly is what the non-generic SortedList class uses. A binary
tree has a poor worst-case performance (but then so too can some
implementations of quicksort), but for random data it's O(n log n) just
like quicksort.


I'm not really sure which versions you're talking about, but it _sounds_
like you're asking about the difference between the one that takes a
Comparison<T> and the one that takes an IComparer<T>.

Yep, that's the ones; sorry if I wasn't explicit.
The former is simply a delegate...it need not be in any class if you
simply provide an anonymous method, and if it is in a class it can be a
static method or an instance method (though if it's an instance method
then of course you need to provide an instance from which to get the
delegate reference).

Yes, it's confusing that BinarySearch needs an instance, compared to
List<T>.Sort; I just have to use a dummy struct is all.
The latter is an object that implements a specific interface, IComparer.
IComparer defines a single method, Compare(). This method would be
equivalent to the Comparison<T> delegate method.

In both cases, the comparison method compares the two parameters passed
in. They basically do the same thing, the only difference being how the
method is declared and then passed to the Sort() method.


Not likely to produce a significant performance difference.

Aha, OK, thanks, but I'm unsure why the binary search and add method is slower
than Sort,
as it calls my comparer far fewer times, yet it seemed to never end with the
bigger lists ...
Maybe it's because the Sort method is self-contained and
so doesn't have the bounds checking etc. for every access?
Or is it doing lots of moving?
I don't know the internals at all.
Allocations in .NET are _extremely_ fast. In fact, I would guess that you
could get a decent performance improvement using a class instead of a
struct, especially if the struct is relatively large, because the data
actually being moved during the sort would be smaller. It's difficult to
anticipate exactly where the break-even point would be, since classes
reduce your data locality which would hurt performance. But at some point
they would definitely win out.

I'm not moving the structs at all, I'm only moving the index,
so it's much like a class reference, only with an extra level of indirection.
The structs also contain a fair amount of data which is precalculated
numbers to speed up computations.
All that said, 2 million elements is a lot of data, and sorting that much
data is going to be relatively slow no matter what you do. At least,
until they perfect quantum computer technology and a sort can be
accomplished in constant time. :)

It takes about 2 minutes to process many lists, only a few of which are of
that size.
I'm considering whether a binary tree would do better, but it's not a trivial thing
to evaluate,
and it's been a while since I went into this sort of thing in depth.

thanks
Colin =^.^=
 

Peter Duniho

[...]
Yes, it's confusing that BinarySearch needs an instance, compared to
List<T>.Sort;
I just have to use a dummy struct is all.

I don't know why BinarySearch() doesn't offer an overload that uses
Comparison<T>. Maybe they figured that since BinarySearch() only makes
sense if you repeatedly use the same comparison, that anyone using it
would already have created a single IComparer implementation that would be
used.

Note that if you were sorting class instances directly, you could just
implement IComparable for that class and use the default BinarySearch()
method, without passing any sort of comparison to it.
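
E.g., a minimal sketch (the Line class name is made up):

using System;

// If the elements being sorted are instances of this class, Sort() and
// BinarySearch() can use the IComparable<T> implementation by default,
// with no separate comparer passed in.
class Line : IComparable<Line>
{
    public float Length;

    public int CompareTo(Line other)
    {
        return Length.CompareTo(other.Length);
    }
}
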
Aha, OK, thanks, but I'm unsure why the binary search and add method is slower
than Sort,
as it calls my comparer far fewer times, yet it seemed to never end with the
bigger lists ...

Why would you say that using BinarySearch() and Add() would call your
comparer "far fewer times"? The order of the algorithm is basically the
same: O(log n) for each element, for an overall order of O(n log n).

The big problem is that not only does using BinarySearch() require doing
just as many comparisons, every time you insert an element into the array
at a position based on the binary search, the portion of the array after
that position needs to be moved. This adds an O(n) operation to your
algorithm for each element you add, which changes the algorithm from O(n
log n) into one that's O(n^2). Which is basically awful. You might as
well be doing an insertion sort with a linked list at that point.
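
In other words, if the insert-in-order loop looks anything like this sketch
(illustrative only), the Insert() call is the expensive part, not the
comparisons:

using System.Collections.Generic;

static class InsertInOrderSketch
{
    // BinarySearch() is O(log n) per element, but Insert() has to shift every
    // element after the insertion point, which is O(n) per element, so building
    // the whole list this way is O(n^2) overall.
    public static void AddInOrder(List<int> indices, int newIndex, IComparer<int> cmp)
    {
        int pos = indices.BinarySearch(newIndex, cmp);
        if (pos < 0)
            pos = ~pos;                  // bitwise complement gives the insertion point
        indices.Insert(pos, newIndex);   // shifts the tail of the list: O(n)
    }
}
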

If you want to maintain the data in sorted order as it's being added, and
you know that the data will _not_ be added to the collection in an order
that's already sorted (that is, the input data is basically random), then
you may prefer to use the non-generic SortedList collection. It uses a
binary tree, which has the search performance of a binary search, but with
a constant-order cost for adding elements instead of O(n).
[...]
I'm not moving the structs at all, I'm only moving the index,
so it's much like a class reference, only with an extra level of
indirection.

I don't understand the point of that. If I understand correctly, you have
an array of ints that are indices into an array of structs? Since an int
is the same size as a reference, I definitely don't see what you hope to
save by doing it that way. If you can allocate your actual data all at
once, then the class instances will all be in the same place in the heap
anyway, just like an array.

What are you hoping to gain by basically reimplementing the reference
logic with integers?
[...]
It takes about 2 minutes to process many lists, only a few of which are of
that size.
I'm considering whether a binary tree would do better, but it's not a trivial
thing to evaluate,
and it's been a while since I went into this sort of thing in depth.

A binary tree isn't going to be faster in terms of providing a final
sorted list. But like I said, if you have a need for accessing the data
in sorted order for some reason as you are adding things to the
collection, the binary tree will be better there.

Implementation is trivial: just use the non-generic SortedList collection
class instead of the generic List<> class.

Pete
 

colin

Peter Duniho said:
[...]
Yes, it's confusing that BinarySearch needs an instance, compared to
List<T>.Sort;
I just have to use a dummy struct is all.

I don't know why BinarySearch() doesn't offer an overload that uses
Comparison<T>. Maybe they figured that since BinarySearch() only makes
sense if you repeatedly use the same comparison, that anyone using it
would already have created a single IComparer implementation that would be
used.

Note that if you were sorting class instances directly, you could just
implement IComparable for that class and use the default BinarySearch()
method, without passing any sort of comparison to it.

Well, it's an int, so I can't do that, as that would just compare the values
directly, not via an index into an array.
Why would you say that using BinarySearch() and Add() would call your
comparer "far fewer times"? The order of the algorithm is basically the
same: O(log n) for each element, for an overall order of O(n log n).

Well, it's strange then:
for a particular run of numerous lists I counted it called compare 23261
times using Sort, and 12885 using binary search with add.
That's a ratio of 1.8:1.
The big problem is that not only does using BinarySearch() require doing
just as many comparisons, every time you insert an element into the array
at a position based on the binary search, the portion of the array after
that position needs to be moved. This adds an O(n) operation to your
algorithm for each element you add, which changes the algorithm from O(n
log n) into one that's O(n^2). Which is basically awful. You might as
well be doing an insertion sort with a linked list at that point.

Ah, I hadn't thought of that.
If you want to maintain the data in sorted order as it's being added, and
you know that the data will _not_ be added to the collection in an order
that's already sorted (that is, the input data is basically random), then
you may prefer to use the non-generic SortedList collection. It uses a
binary tree, which has the search performance of a binary search, but with
a constant-order cost for adding elements instead of O(n).

No, I don't need to read the data till it's all sorted,
as I need to process the shortest first, so I can't do that until the last
one has been sorted.
[...]
I'm not moving the structs at all, I'm only moving the index,
so it's much like a class reference, only with an extra level of
indirection.

I don't understand the point of that. If I understand correctly, you have
an array of ints that are indices into an array of structs? Since an int
is the same size as a reference, I definitely don't see what you hope to
save by doing it that way. If you can allocate your actual data all at
once, then the class instances will all be in the same place in the heap
anyway, just like an array.

What are you hoping to gain by basically reimplementing the reference
logic with integers?

Well, I'm going to change it all to class references now and see;
if I find allocation is taking a lot of time I'll just put the class references
into an array and reuse them ...
I'll keep you posted.
[...]
It takes about 2 minutes to process many lists, only a few of which are of
that size.
I'm considering whether a binary tree would do better, but it's not a trivial
thing to evaluate,
and it's been a while since I went into this sort of thing in depth.

A binary tree isn't going to be faster in terms of providing a final
sorted list. But like I said, if you have a need for accessing the data
in sorted order for some reason as you are adding things to the
collection, the binary tree will be better there.

Implementation is trivial: just use the non-generic SortedList collection
class instead of the generic List<> class.

Thanks for the help; it seems like the add and then sort I had to start with
might be best after all.
A class reference might speed things up a bit by losing the level of
indirection,
so I'll give that a spin.

thanks.
Colin =^.^=
 

colin

Thanks for the help; it seems like the add and then sort I had to start with
might be best after all.
A class reference might speed things up a bit by losing the level of
indirection,
so I'll give that a spin.

It didn't make a great deal of difference
using the index into an array of structs,
or new each time, or a pre-allocated array of classes
which is reused each time.

In a long run of many lists it calls compare about 67 million times,
total time 74 seconds, but of course it does a heck of a lot of other stuff;
this compare function is just called more than any other function.
The difference was about 5 seconds, but it varies by
nearly that much from run to run anyway.

It's still spending 10% of the time it spends in my code in the compare routine,
but it spends 75% of the total time in mscorlib and other libs where I'm not
sure what it's doing.

I still have much larger data sets to work with that I haven't tried yet ...

Interesting that the binary search/add was so slow; I realise why now.
I should have realised before, as I first wrote a binary sorted tree on
an AMD 2900 4-bit slice processor for a MAC address lookup table.


Colin =^.^=
 

Arne Vajhøj

colin said:
I have an array of structs which contain, amongst other things, a Length of
type float. I wish to make a sorted index array based on the length,
which does have duplicates.

I initially added all the indices to a List<int> and then sorted the list
using a comparer which took the index, looked up the struct in the array, and
compared the lengths.

This was quite slow, so I thought I'd write a routine which used binary search
to add indices to the list in order.
While this called the compare routine far fewer times, it did in fact take a
lot longer.

So I tried SortedList, but unfortunately this doesn't allow duplicate keys.

Have you considered SortedList<float,List<X>> ?
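
Something along these lines (just a sketch; the value type here is an int
index, and duplicate lengths all end up in the same inner list):

using System.Collections.Generic;

class LengthIndex
{
    private readonly SortedList<float, List<int>> byLength =
        new SortedList<float, List<int>>();

    public void Add(float length, int index)
    {
        List<int> bucket;
        if (!byLength.TryGetValue(length, out bucket))
        {
            bucket = new List<int>();
            byLength.Add(length, bucket);
        }
        bucket.Add(index);   // duplicates share a bucket
    }
}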

Arne
 

colin

Arne Vajhøj said:
Have you considered SortedList<float,List<X>> ?

Yes, but there are duplicate keys, as the sort term is length; even though it's
a float, some are still identical.
I'm not sure if what you're suggesting gets round this with the List<X>?
Maybe the duplicate entries go into the list?

I did think of having an index and using that for an extra comparison if the
lengths were the same.

But actually I think I'm going to make a huge array (maybe 100k) and just use
the Length as an index.
I know the largest possible length in advance by taking the min and max, so I
can divide down anyway.
Duplicate values will just be listed in the same slot, maybe linked.

This ought to be damn fast!
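
Something like this rough sketch is what I have in mind (names made up, and
the bin count is just illustrative):

using System.Collections.Generic;

struct Line { public float Length; /* plus the other fields */ }

static class BinningSketch
{
    public static List<int>[] BinByLength(Line[] lines, float minLen, float maxLen, int binCount)
    {
        List<int>[] bins = new List<int>[binCount];
        for (int b = 0; b < binCount; b++)
            bins[b] = new List<int>();

        float range = maxLen - minLen;
        for (int i = 0; i < lines.Length; i++)
        {
            // Scale the length down to a bin number; duplicates simply share a slot.
            int bin = (range > 0)
                ? (int)((lines[i].Length - minLen) / range * (binCount - 1))
                : 0;
            bins[bin].Add(i);
        }
        return bins;
    }
}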

I don't even need it to be 100% accurate.

Colin =^.^=
 

Peter Duniho

Well, it's an int, so I can't do that, as that would just compare the values
directly, not via an index into an array.

No. You write the comparer. Note what I wrote: "if you were sorting
class instances directly".

I'm not really sure what you thought I wrote, but whether you're sorting
an array of ints or sorting an array of references to instances of a
class, you have complete control over the comparison. Whatever you meant
when you wrote "I can't do that", I'm sure you're mistaken. You can do
the sort however you like (assuming, of course, your comparison follows
the standard rules of sortability).
Well, it's strange then:
for a particular run of numerous lists I counted it called compare 23261
times using Sort, and 12885 using binary search with add.
That's a ratio of 1.8:1.

That's not "far fewer". :)

Each algorithm is going to be sensitive to the specific nature of the
data, and even this sensitivity varies according to the specific
implementation (well, at least for quicksort...binary search is more of a
"one way to do it" sort of thing, and so is only really sensitive to the
input data, rather than the exact algorithm).

In any case, in the context of algorithm _order_, even if the difference
was a consistent factor of 2, that's not significant enough to warrant
considering the algorithms significantly different (especially considering
the other costs associated with the binary search/add approach). The
concept of "order" as it applies to algorithm cost analysis is very
coarse-grained, and a constant factor differentiating two algorithms that
are otherwise the same cost is irrelevant in that analysis.

That said, if that 2x difference _is_ in fact consistent, you may find a
performance gain using a binary tree. The binary tree should have a
similar number of comparisons to the binary search, but without the
overhead of an array insertion.

In all cases, you need to be careful to not just look at the total number
of calls to your comparison function. There are lots of other things that
can affect performance, and so you'll want to measure actual time cost
when comparing different implementations. And if you choose one
implementation over another based on this measurement, you should make
sure that a) you have used representative data, and b) that the
implementation you choose doesn't have a worst-case time cost that could
happen with the data you're using.

An implementation that is on average twice as fast as another
implementation, but which has a real-world worst-case possibility that
changes the actual order of the algorithm (from O(n log n) to O(n^2), for
example), is _not_ an implementation you want to use. (So, for example,
don't use a binary tree if there's a possibility that the data will be
added to the collection in its sorted order, or at least do something to
detect that scenario so that you avoid resorting data that's already
sorted).
[...]
Thanks for the help; it seems like the add and then sort I had to start with
might be best after all.
A class reference might speed things up a bit by losing the level of
indirection,
so I'll give that a spin.

I see from your other post it didn't make a difference in time cost using
a class versus a struct. So IMHO your choice should be according to code
maintainability, and I think that a class comes out ahead in that case.
Especially since you can then have the class implement IComparable, making
it simpler to do things like sorting.

Pete
 

colin

Peter Duniho said:
No. You write the comparer. Note what I wrote: "if you were sorting
class instances directly".

Ah, sorry, maybe I read what you wrote a bit quick, or I wrote what I wrote a
bit quick;
can't remember now, I was just off to bed ...

Yes, I know; what I meant was it's an int, so you can't add an interface to it,
at least not to my knowledge, although I've been looking through .NET 3.5 and
notice you can add
pseudo member functions (extension methods), so maybe they'll add this to ints too in 5.0 ....

The binary Add was what I had written before, which did exactly what you are
mentioning;
I just had to add an extra parameter for the comparer.
I'm not really sure what you thought I wrote, but whether you're sorting
an array of ints or sorting an array of references to instances of a
class, you have complete control over the comparison. Whatever you meant
when you wrote "I can't do that", I'm sure you're mistaken. You can do
the sort however you like (assuming, of course, your comparison follows
the standard rules of sortability).


That's not "far fewer". :)
lol :)

Well, that was only for a very short run; it takes so long for the longer runs
when I turn on the debug printout. A 50% reduction in my code would be
significant.

However, given that it only spends 25% of the time in my code as opposed to library code,
and only 25% of my code in this function, it's no great difference;
but I have to start looking somewhere first to cut down the time,

although it may very well spend more time in the library due to this
function.
Each algorithm is going to be sensitive to the specific nature of the
data, and even this sensitivity varies according to the specific
implementation (well, at least for quicksort...binary search is more of a
"one way to do it" sort of thing, and so is only really sensitive to the
input data, rather than the exact algorithm).

In any case, in the context of algorithm _order_, even if the difference
was a consistent factor of 2, that's not significant enough to warrant
considering the algorithms significantly different (especially considering
the other costs associated with the binary search/add approach). The
concept of "order" as it applies to algorithm cost analysis is very
coarse-grained, and a constant factor differentiating two algorithms that
are otherwise the same cost is irrelevant in that analysis.

That said, if that 2x difference _is_ in fact consistent, you may find a
performance gain using a binary tree. The binary tree should have a
similar number of comparisons to the binary search, but without the
overhead of an array insertion.

In all cases, you need to be careful to not just look at the total number
of calls to your comparison function. There are lots of other things that
can affect performance, and so you'll want to measure actual time cost
when comparing different implementations. And if you choose one
implementation over another based on this measurement, you should make
sure that a) you have used representative data, and b) that the
implementation you choose doesn't have a worst-case time cost that could
happen with the data you're using.

An implementation that is on average twice as fast as another
implementation, but which has a real-world worst-case possibility that
changes the actual order of the algorithm (from O(n log n) to O(n^2), for
example), is _not_ an implementation you want to use. (So, for example,
don't use a binary tree if there's a possibility that the data will be
added to the collection in its sorted order, or at least do something to
detect that scenario so that you avoid resorting data that's already
sorted).

Yes, it's coming back to me now: a binary tree fed data in sorted order is a nightmare,
lol.

I'm getting to be an old programmer now; I've forgotten so much, it seems.
It was nice to be able to forget about low-level stuff when I moved to C#;
think maybe I forgot a bit too much, lol.
I just didn't know what was under the hood of List<T>.Sort(), so I assumed
it wasn't very good;
seems now I was mistaken ...
I did read somewhere that SortedList<> and SortedDictionary<> were very
fast;
maybe it's all the same as List<T>.

Your replies are actually very clear and helpful,
and a few other people's too; I thank you a lot.
I always seem to hurry mine,
usually because I've spent ages trying to solve a problem and
need a break 'cos my head hurts, but think I'll just ask on
the forum, maybe I'll have an answer when I get back .. :D
(am I lazy or what!)
[...]
Thanks for the help; it seems like the add and then sort I had to start with
might be best after all.
A class reference might speed things up a bit by losing the level of
indirection,
so I'll give that a spin.

I see from your other post it didn't make a difference in time cost using
a class versus a struct. So IMHO your choice should be according to code
maintainability, and I think that a class comes out ahead in that case.
Especially since you can then have the class implement IComparable, making
it simpler to do things like sorting.

Yes I agree :)

I don't know if you saw my other post, but I intend to try and sort the data
very crudely by using the length
as an index; the maximum length can be determined as the data is being
created.
Sorting the data into bins of 1% would make little difference to the outcome,
and of course each 1% bin could then be sorted.
I did have a look on Google, and the wiki lists all the sort algorithms nicely,
but doesn't list this type.

I guess it is just a classic case of dividing the data down into smaller
chunks.

Again, I'll give it a try and keep you posted.

The whole process is to clean up the mess of lines created by intersecting
one polyhedron with many polyhedrons;
this algorithm cleans up one surface at a time.

This particular very big list is the result of a huge 3D maze:
the square floor is intersected by many walls, which adds about 2000 lines.
This algorithm simply adds lines so that the result consists of a set of concave
polygons,
by adding lines to vertices where there is an angle >180°, which amounts to
700 points.

The algorithm works through the list in length order; if it finds a line
that fixes 2 points then it uses this fix
in preference to a line which fixes just one point with 1/4 the line length.
Of course it also has to make sure it doesn't cross any other lines, which accounts
for another 25% of execution time.

It worked well indeed for small models; I was caught out by this large
number though.
There may well be a better way of doing this altogether, but for now I at
least want to get to the bottom
of sorted lists.

many thanks.
Colin =^.^=
 

Arne Vajhøj

colin said:
Yes, but there are duplicate keys, as the sort term is length; even though it's
a float, some are still identical.
I'm not sure if what you're suggesting gets round this with the List<X>?
Maybe the duplicate entries go into the list?

Exactly.

Arne

PS: There is also SortedDictionary, which you can use similarly.
 

colin

colin said:
I don't know if you saw my other post, but I intend to try and sort the data
very crudely by using the length
as an index; the maximum length can be determined as the data is being
created.
Sorting the data into bins of 1% would make little difference to the
outcome,
and of course each 1% bin could then be sorted.
I did have a look on Google, and the wiki lists all the sort algorithms nicely,
but doesn't list this type.

I guess it is just a classic case of dividing the data down into smaller
chunks.

Again, I'll give it a try and keep you posted.

Yea, :D
splitting the data into chunks so each chunk has a non-overlapping range of
lengths,
sorting the chunks, then adding the chunks together
shaved about 10 seconds off the execution time and made the time spent
in the compare routine insignificant.

Still 65 more seconds to optimise elsewhere now though ...

Colin =^.^=
 

Peter Duniho

Ah, sorry, maybe I read what you wrote a bit quick, or I wrote what I wrote a
bit quick;
can't remember now, I was just off to bed ...

Yes, I know; what I meant was it's an int, so you can't add an interface to it,
at least not to my knowledge, although I've been looking through .NET 3.5 and
notice you can add
pseudo member functions (extension methods), so maybe they'll add this to ints too in 5.0 ....

I'm not aware of any scenario you've mentioned in which what you're
sorting is "just an int".

Yes, you have a scenario in which you are sorting a list of ints, but when
you actually compare two entries in that list you aren't comparing the
integers themselves, but rather the data in an object to which the int
refers.
The binary Add was what I had written before, which did exactly what you are
mentioning;
I just had to add an extra parameter for the comparer.

Right. And with that parameter, you are given complete, absolute control
over how the comparison is done. Which makes sense, really. If you're
using an integer index into some array and sorting a list of those
indices, you obviously must be able to override the default "compare two
integers" behavior. And once you're overriding that default behavior, you
can implement whatever behavior your heart desires.
[...]
I'm getting to be an old programmer now; I've forgotten so much, it seems.
It was nice to be able to forget about low-level stuff when I moved to C#;
think maybe I forgot a bit too much, lol.
I just didn't know what was under the hood of List<T>.Sort(), so I assumed
it wasn't very good;
seems now I was mistaken ...
I did read somewhere that SortedList<> and SortedDictionary<> were very
fast; maybe it's all the same as List<T>.

My experience has been that the .NET classes are all generally written
with care and with minimal performance problems. You should start out
with the assumption that a given class is very good, and then investigate
further if for some reason that appears to turn out not to be the case.

Even where a class appears to not be very good, it will usually turn out
that it's simply sensitive to misuse. In the rare situation in which the
class really isn't very good, then you can think about writing your own.
Your replies are actually very clear and helpful,
and a few other people's too; I thank you a lot.

You're welcome.
I always seem to hurry mine,
usually because I've spent ages trying to solve a problem and
need a break 'cos my head hurts, but think I'll just ask on
the forum, maybe I'll have an answer when I get back .. :D
(am I lazy or what!)

No comment. :)
[...]
I don't know if you saw my other post, but I intend to try and sort the data
very crudely by using the length
as an index; the maximum length can be determined as the data is being
created.
Sorting the data into bins of 1% would make little difference to the outcome,
and of course each 1% bin could then be sorted.
I did have a look on Google, and the wiki lists all the sort algorithms nicely,
but doesn't list this type.

I guess it is just a classic case of dividing the data down into smaller
chunks.

I'm not really clear on what you're describing, but it sounds a little
like a merge sort. In case you're looking for a term you can use to
search for additional information on refining the implementation you've
come up with.

Pete
 

colin

Peter Duniho said:
......
I'm not really clear on what you're describing, but it sounds a little
like a merge sort. In case you're looking for a term you can use to
search for additional information on refining the implementation you've
come up with.

Ah, thanks. I looked up merge sort on the wiki and it describes dividing the list
into two sub-lists, but
without doing any comparing, then dividing again and again etc. till each sub-list
has 1 item,
then sorting it out .. seems like a binary tree in nature.

My method involves splitting the list up into many sub-lists,
BUT where all the items in each sub-list are less than the items in the
next.

This requires 1 test per item.
Each sub-list is then sorted and all the pieces are simply added in order.
The idea is that each list is much smaller,
so the exponential rise in comparisons is avoided.

I guess it's sensitive to data being in clumps, which in the extreme might mean
one list has all the points,
but that's a calculated risk I'll take; of course this could be identified after
only 1 test per item.
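
Roughly like this (a sketch only, with made-up names along the lines of the
earlier binning sketch):

using System.Collections.Generic;

struct Line { public float Length; /* plus the other fields */ }

static class ChunkSortSketch
{
    // Sort each chunk with the usual index comparison, then concatenate the
    // chunks in order; every item in one chunk is <= every item in the next
    // chunk by construction, so the concatenation is already fully sorted.
    public static List<int> SortChunked(Line[] lines, List<int>[] chunks)
    {
        List<int> result = new List<int>(lines.Length);
        foreach (List<int> chunk in chunks)
        {
            chunk.Sort(delegate(int a, int b)
            {
                return lines[a].Length.CompareTo(lines[b].Length);
            });
            result.AddRange(chunk);
        }
        return result;
    }
}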

It seems to be very fast indeed on the data set I have which was causing the
problem.

Since then I've managed to redo the graphics so it's a heck of a lot faster,
and I think I've thought of a much faster algorithm to tell if two 3D lines
intersect,
which is now taking up 25% of the time.

Colin =^.^=
 

Peter Duniho

Ah, thanks. I looked up merge sort on the wiki and it describes dividing the list
into two sub-lists, but
without doing any comparing, then dividing again and again etc. till each
sub-list has 1 item,
then sorting it out .. seems like a binary tree in nature.

That example does sound sort of like a binary tree, but not exactly.
Typically, a merge sort is used where you have (or can produce) some
relatively small number of multiple ordered lists of data. A new list is
generated by pulling the lowest-ordered element from the collection of
lists (different techniques can be used to keep track of which list
actually contains the current, lowest-ordered element), until all of the
lists have been exhausted.
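
A minimal sketch of the idea (with the simplest possible bookkeeping, just for
illustration):

using System.Collections.Generic;

static class MergeSketch
{
    // Repeatedly take the smallest head element among several already-sorted
    // lists until every list has been exhausted.
    public static List<float> Merge(List<List<float>> sortedLists)
    {
        List<float> result = new List<float>();
        int[] positions = new int[sortedLists.Count];

        while (true)
        {
            int best = -1;
            for (int i = 0; i < sortedLists.Count; i++)
            {
                if (positions[i] >= sortedLists[i].Count)
                    continue;    // this list is exhausted
                if (best < 0 ||
                    sortedLists[i][positions[i]] < sortedLists[best][positions[best]])
                    best = i;
            }
            if (best < 0)
                break;           // all lists exhausted
            result.Add(sortedLists[best][positions[best]++]);
        }
        return result;
    }
}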

If you degenerate the creation of these initial lists to having just
single elements, then yes it seems the algorithm acts a little like a
binary tree. I haven't bothered to look up the example you're referring
to, but once you've got a bunch of single-element lists you still have the
problem of ordering them. That could be expensive.

So, at least from your description, the example sounds flawed. Either
you've misunderstood, or someone put in a bad example. I suppose either
is possible.
My method involves splitting the list up into many sub-lists,
BUT where all the items in each sub-list are less than the items in the
next.

This sounds like a quicksort.
This requires 1 test per item.
Each sub-list is then sorted and all the pieces are simply added in
order.
The idea is that each list is much smaller,
so the exponential rise in comparisons is avoided.

With quicksort, there's no exponential rise in comparisons. It's O(n log
n), so there's a logarithmic rise in comparisons, which is very low. And
your new sorting algorithm is still basically a quicksort, albeit a more
complicated one.

You haven't changed the overall order of the algorithm (it's still O(n log
n)), though the real cost has been changed from something like "n log n"
to "n (log n/m) + n" where "m" is the number of lists you create. I
suppose this _could_ reduce the actual real-world cost, but you have to
evaluate whether any modest performance benefit, if any, is worth the cost
in maintainability. And of course, you've actually added a new worst-case
scenario that will increase your cost in certain situations (i.e. it will
be _slower_ than just using a plain quicksort).

Personally, it's not a trade-off I'd make but I suppose some people might.

Pete
 

colin

Peter Duniho said:
That example does sound sort of like a binary tree, but not exactly.
Typically, a merge sort is used where you have (or can produce) some
relatively small number of multiple ordered lists of data. A new list is
generated by pulling the lowest-ordered element from the collection of
lists (different techniques can be used to keep track of which list
actually contains the current, lowest-ordered element), until all of the
lists have been exhausted.

If you degenerate the creation of these initial lists to having just
single elements, then yes it seems the algorithm acts a little like a
binary tree. I haven't bothered to look up the example you're referring
to, but once you've got a bunch of single-element lists you still have the
problem of ordering them. That could be expensive.

So, at least from your description, the example sounds flawed. Either
you've misunderstood, or someone put in a bad example. I suppose either
is possible.

Ah yes, your explanation sounds more like it;
maybe my explanation wasn't that good.
This sounds like a quicksort.


With quicksort, there's no exponential rise in comparisons. It's O(n log
n), so there's a logarithmic rise in comparisons, which is very low. And
your new sorting algorithm is still basically a quicksort, albeit a more
complicated one.

You haven't changed the overall order of the algorithm (it's still O(n log
n)), though the real cost has been changed from something like "n log n"
to "n (log n/m) + n" where "m" is the number of lists you create. I
suppose this _could_ reduce the actual real-world cost, but you have to
evaluate whether any modest performance benefit, if any, is worth the cost
in maintainability. And of course, you've actually added a new worst-case
scenario that will increase your cost in certain situations (i.e. it will
be _slower_ than just using a plain quicksort).

Personally, it's not a trade-off I'd make but I suppose some people might.

Theoretically, yes, I'd agree with your figures, but the actual time taken
in my run seems to bear no resemblance to that.

Of course there could be many things affecting it; maybe the size of a list
exceeding the CPU cache is causing a big performance issue.

The actual code to split the list is a very small loop.

Colin =^.^=
 
