2.3 Sorting Terminology and Notation
As defined, the Sorting Problem allows input with two or more records that have the same key value. Certain applications require that input not contain duplicate key values. Typically, sorting algorithms can handle duplicate key values unless noted otherwise. When duplicate key values are allowed, there might be an implicit ordering to the duplicates, typically based on their order of occurrence within the input. It might be desirable to maintain this initial ordering among duplicates. A sorting algorithm is said to be stable if it does not change the relative ordering of records with identical key values. Many, but not all, of the sorting algorithms presented in this chapter are stable, or can be made stable with minor changes.
When comparing two sorting algorithms, the simplest approach would be to program both and measure their running times. This is an example of empirical comparison. However, doing fair empirical comparisons can be tricky because the running time for many sorting algorithms depends on specifics of the input values. The number of records, the size of the keys and the records, the allowable range of the key values, and the amount by which the input records are “out of order” can all greatly affect the relative running times for sorting algorithms.
When analyzing sorting algorithms, it is traditional to measure the cost by counting the number of comparisons made between keys. This measure is usually closely related to the actual running time for the algorithm and has the advantage of being machine and data-type independent. However, in some cases records might be so large that their physical movement might take a significant fraction of the total running time. If so, it might be appropriate to measure the cost by counting the number of swap operations performed by the algorithm. In most applications we can assume that all records and keys are of fixed length, and that a single comparison or a single swap operation requires a constant amount of time regardless of which keys are involved. However, some special situations “change the rules” for comparing sorting algorithms. For example, an application with records or keys having widely varying length (such as sorting a sequence of variable length strings) cannot expect all comparisons to cost roughly the same. Not only do such situations require special measures for analysis, they also will usually benefit from a special-purpose sorting technique.
Other applications require that a small number of records be sorted, but that the sort be performed frequently. An example would be an application that repeatedly sorts groups of five numbers. In such cases, the constants in the runtime equations that usually get ignored in asymptotic analysis now become crucial. Note that recursive sorting algorithms end up sorting lots of small lists as well.
Finally, some situations require that a sorting algorithm use as little memory as possible. We will call attention to sorting algorithms that require significant extra memory beyond the input array.
2.3.1 Review questions: Sorting Introduction
- “Stable” has nothing to do with how fast an algorithm is.
- It refers to maintaining the relative order of records with equal key values.
- In some applications, we require that records with equal key value preserve the relative order of those records.
Which of these will NOT affect the RELATIVE running times for two sorting algorithms?
- If we speed up the CPU by a factor of two, both sorts will go twice as fast.
Which of these is a traditional measure for the cost of a sorting algorithm?
Note: Multiple answers are possible!
- This question does not ask what affects the cost. It asks what is a measure of the cost.
- The number of records affects the cost, but it does not measure the cost.
- The memory size affects the cost, but it does not measure the cost.
- Records being out of order can increase the cost, but not measure it.
In which case might the number of comparisons not be a good representation of the cost for a sorting algorithm?
- CPU speed would affect all comparisons in the same way.
- Number of records or amount of space won’t affect the value of counting comparisons.
- The longer the string, the longer it takes to compare.
Sometimes, the constant factors in an algorithm’s runtime equation are more important thant its growth rate. When the problem is sorting, this can happen in which situation?
- CPU speed would affect all comparisons in the same way.
- Records being sorted or reverse sorted affects the growth rate of different algorithms differently. But not relevant to this question.
- When we sort only a few records, constants matter a lot.