Dave Sexton said:
It seems you are suggesting that the scale of the data can affect whether
a required sorting operation will actually reduce the overall performance
of an algorithm to the point where it becomes worse than a linear search.
His point (I think) is that some algorithms only work after some initial
setup, and that setup can be costly. I'm not sure that all sorting and
searching algorithms are a good example of this. For example, depending on
the implementation of a Quicksort, it can perform just as well on small data
sets as large ones. Likewise binary search (though for very small data
sets, the difference between n and log n is very small).
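To put a rough number on that n vs. lg(n) point, here is a quick Python
sketch (standard library bisect only; the names and data are purely
illustrative, not anyone's actual code):

from bisect import bisect_left

def linear_search(items, target):
    # O(n): worst case examines every element
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

def binary_search(sorted_items, target):
    # O(log n): bisect_left does the halving probes for us
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

data = sorted([7, 3, 9, 1, 5, 8, 2, 6])
# n = 8: at most 8 comparisons for the scan vs. about log2(8) = 3 probes;
# at this size constant factors easily swamp the difference.
print(linear_search(data, 6), binary_search(data, 6))

At that size any measured difference comes from overhead, not from the
order of the algorithm.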
But it is certainly true that the order of an algorithm describes only the
asymptotic case, where the data set is effectively infinite. There can be
wide variations in performance, well away from the theoretical behavior,
when the data set is very small. To make matters worse, what constitutes
"very small" or "large enough" varies depending on the algorithm.
I think this is what Bill was saying... I don't think I'm adding anything
new, but hopefully I'm just restating it in a slightly different way, which
might make it clearer to some people (just as it may obfuscate it for
others).
Can this be true even when the data is sorted as it's being entered into
the structure?
Ack! And I mean that in the Bill the Cat way, not the I/O handshaking way.
The analysis becomes very different when you've got an incremental situation
like that. The order analysis for these different algorithms assumes you
start with all the data and run through it once to produce the end results.
What is best in that case may very well not be best for a case where you
want to perform the same operation after each data element is added.
For example, resorting an array would be a silly way to sort data as it's
being added to a data structure. After all, if the data is already sorted,
sorting it again just to add one element is very inefficient. Even worse, as
I noted elsewhere, the worst-case performance for some sorting algorithms
comes when the data starts out sorted. This is a classic example of why
knowing the exact scenario in which the algorithm is to be used is at least
as important as knowing the order of the algorithm.
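As a small, hypothetical Python illustration of the re-sorting point
(standard library bisect only; note that Python's built-in sort is adaptive
and won't literally hit a quadratic worst case here, so this only shows the
redundant work, a sort whose worst case is sorted input would fare worse
still):

from bisect import insort

incoming = [7, 3, 9, 1, 5]

# Option 1: append, then re-sort the entire list after every add.
resorted = []
for value in incoming:
    resorted.append(value)
    resorted.sort()   # re-sorts data that was already sorted except for one element

# Option 2: binary-search for the slot and insert the new element there.
kept_sorted = []
for value in incoming:
    insort(kept_sorted, value)   # O(log n) search plus an O(n) list shift

print(resorted == kept_sorted)   # True: same result, but option 1 keeps redoing work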
I would never use Quicksort to sort data as it's being entered into a data
structure. It's one of the worst choices in that case, and you practically
guarantee worst-case performance, even though the order of the algorithm is
superior on average.
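To make that concrete, here is a sketch (Python, purely illustrative) of a
textbook quicksort with the classic naive first-element pivot, counting
comparisons when it's handed data that is already sorted:

def quicksort(items, counter):
    # Textbook quicksort with the naive pivot choice: the first element.
    if len(items) <= 1:
        return items
    pivot = items[0]
    counter[0] += len(items) - 1          # one comparison per remaining element
    smaller = [x for x in items[1:] if x < pivot]
    larger = [x for x in items[1:] if x >= pivot]
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)

comparisons = [0]
quicksort(list(range(200)), comparisons)  # feed it already-sorted input
print(comparisons[0])                     # 19900 comparisons, i.e. n*(n-1)/2 for n = 200

A shuffled input of the same size averages roughly n*log2(n), on the order
of a couple of thousand comparisons rather than twenty thousand.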
Since I'm not one for micro-optimization either, I'd rather just choose an
algorithm with an average performance of O(lg(n)) if I'm sure that there
may be substantial amounts of data at times, and just swallow the
performance hit on those smaller collections.
As Bill notes, when the data size is small, the order of the algorithm is
usually not relevant since any algorithm will be fast. So yes, swallowing
the performance hit is fine in that case.
However, that's not the same as ignoring the exact scenario being solved.
You need to take into account how the algorithm will be used, and not create
a situation where you choose an algorithm that has a good average-case
order but will always be used where its worst-case order applies.
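For what it's worth, one illustrative example of matching the structure to
the scenario (Python's standard heapq; whether it fits is an assumption
about what the scenario actually needs): a binary heap pays O(log n) per
insertion no matter what order the data arrives in, but it only keeps the
minimum cheap to reach rather than maintaining full sorted order.

import heapq

heap = []
for value in [7, 3, 9, 1, 5]:
    heapq.heappush(heap, value)   # O(log n) per insert, even when items arrive in order
print(heapq.heappop(heap))        # 1 -- but only the minimum is cheap to reach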
I hope that makes sense...I'm typing quickly because I'm hungry and want to
eat lunch.
Pete