2.4 Insertion Sort

What would you do if you have a stack of phone bills from the past two years and you want to order by date? A fairly natural way to handle this is to look at the first two bills and put them in order. Then take the third bill and put it into the right position with respect to the first two, and so on. As you take each bill, you would add it to the sorted pile that you have already made. This simple approach is the inspiration for our first sorting algorithm, called Insertion Sort.

Insertion Sort iterates through a list of records. For each iteration, the current record is inserted in turn at the correct position within a sorted list composed of those records already processed. Here is an implementation. The input is an array named A that stores nn records.

function insertionSort(A):
    N = A.size()
    for i = 1 to N-1:
        // Move the i'th element to its correct position.
        j = i
        while j > 0 and A[j] < A[j-1]:
            swap(A, j, j-1)
            j = j-1

(Note that to make the explanation for these sorting algorithms as simple as possible, our visualizations will show the array as though it stored simple integers rather than more complex records. But you should realize that in practice, there is rarely any point to sorting an array of simple integers. Nearly always we want to sort more complex records that each have a key value. In such cases we must have a way to associate a key value with a record. The sorting algorithms will simply assume that the records are comparable.)

Here we see the first few iterations of Insertion Sort.

This continues on with each record in turn. Call the current record xx. Insertion Sort will move it to the left so long as its value is less than that of the record immediately preceding it. As soon as a key value less than or equal to xx is encountered, insertionSort is done with that record because all records to its left in the array must have smaller keys.

2.4.1 Insertion Sort Analysis

Here is an explanation of the worst case cost of insertion sort:

And here is an explanation of the cost of the best case:

And here is the average case cost:

While the best case is significantly faster than the average and worst cases, the average and worst cases are usually more reliable indicators of the “typical” running time. However, there are situations where we can expect the input to be in sorted or nearly sorted order. One example is when an already sorted list is slightly disordered by a small number of additions to the list; restoring sorted order using Insertion Sort might be a good idea if we know that the disordering is slight. And even when the input is not perfectly sorted, Insertion Sort’s cost goes up in proportion to the number of inversions. So a “nearly sorted” list will always be cheap to sort with Insertion Sort. Examples of algorithms that take advantage of Insertion Sort’s near-best-case running time are Shellsort and Quicksort.

Counting comparisons or swaps yields similar results. Each time through the inner for loop yields both a comparison and a swap, except the last (i.e., the comparison that fails the inner for loop’s test), which has no swap. Thus, the number of swaps for the entire sort operation is n1n-1 less than the number of comparisons. This is 0 in the best case, and Θ(n2)\Theta(n^2) in the average and worst cases.

Later we will see algorithms whose growth rate is much better than Θ(n2)\Theta(n^2). Thus for larger arrays, Insertion Sort will not be so good a performer as other algorithms. So Insertion Sort is not the best sorting algorithm to use in most situations. But there are special situations where it is ideal. We already know that Insertion Sort works great when the input is sorted or nearly so. Another good time to use Insertion Sort is when the array is very small, since Insertion Sort is so simple. The algorithms that have better asymptotic growth rates tend to be more complicated, which leads to larger constant factors in their running time. That means they typically need fewer comparisons for larger arrays, but they cost more per comparison. This observation might not seem that helpful, since even an algorithm with high cost per comparison will be fast on small input sizes. But there are times when we might need to do many, many sorts on very small arrays. You should spend some time right now trying to think of a situation where you will need to sort many small arrays. Actually, it happens a lot.

See Computational Fairy Tales: Why Tailors Use Insertion Sort for a discussion on how the relative costs of search and insert can affect what is the best sort algorithm to use.

2.4.2 Review questions: Insertion sort

Answer TRUE or FALSE.

Insertion Sort (as the code is written in this module) is a stable sorting algorithm. Recall that a stable sorting algorithm maintains the relative order of records with equal keys.

  • Think of the behavior of every pass through the inner for loop of the insertion sort if keys are equal.

When implementing Insertion Sort, a binary search could be used to locate the position within the first i1i-1 records of the array into which record ii should be inserted. Using binary search will:

  • The position at which to insert could be found in Θ(log(i))\Theta(\log(i)) steps, but shifting the records to make room for the this record will require Θ(i)\Theta(i) time.

When implementing Insertion Sort, a binary search could be used to locate the position within the first i1i-1 records of the array into which record ii should be inserted. In this implementation, the worst case time will be:

  • The position to insert could be found in Θ(logi)\Theta (\log i), but shifting the records to makeroom for the insert will still require Θ(i)\Theta(i) time

We know that the worst case for Insertion Sort is about n2/2n^2/2, while the average case is about n2/4n^2/4. This means that:

  • Growth rates ignore constants.

In which cases are the growth rates the same for Insertion Sort?

  • Insertion Sort is really cheap in the best case.
  • Its average and worst case times differ by a constant factor.

The order of the input records has what impact on the number of comparisons required by Insertion Sort (as presented in this module)?

  • Does Insertion Sort change when it make a comparison according to the order of the array input values?
  • Yes, Insertion Sort might stop early or might look at many records.

What is the {average|worst}-case time for Insertion Sort to sort an array of n records?

  • In the worst case, each record must make its way to the start of the array. This would occur if the record values are initially arranged from highest to lowest, in the reverse of sorted order.
  • In this case, the number of comparisons will be one the first time through the for loop, two the second time, and so on.

What is the best-case time for Insertionsort to sort an array of n records?

  • The best-case cost occurs when the records are already in sorted order from lowest to highest.
  • In this case, every test on the inner for loop will fail immediately, and no records will be moved.

What is the running time of Insertion Sort when the input is an array where all record values are equal?

  • Each test in the inner for loop will fail because the value at position ii is never less than the value at position i1i-1.

If II is the number of inversions in an input array of nn records, then Insertion Sort will run in what time?

  • Insertion sort has to do nn passes where it compares at least once.
  • If the record for this pass has no remaining inversions, then it requires no work.
  • But if it does have inversions, it will need a swap for each such inversion.

What is the running time for Insertion Sort when the input array has values that are in reverse sort order?

  • On each iteration, the iith record will have to move to the start of the array.
  • This is the worst case.

What is the running time of Insertion Sort when the input is an array that has already been sorted?

  • Each test in the inner for loop will fail because the value at position ii is never less than the value at position i1i-1.

When is Insertion Sort a good choice for sorting an array?

  • Remember that insertion sort implementation is made up of two nested for loops.
  • The outer for loop is executed n1n-1 times. While the number of times the inner for loop executes depends on how many keys in positions 0 to i1i-1 have a value less than that of the key in position i.

When is Insertion Sort a good choice for sorting an array?

  • Insertion Sort if fairly simple.
  • Because Insertion Sort is simple, it tends to cost only a little bit per comparison when compared to more complicated sorting algorithms.

In the worst case, the total number of comparisons for Insertion Sort is closest to:

  • Insertion Sort’s implementation is made up of two nested for loops.
  • The outer for loop is executed n1n-1 times.
  • The inner for loop is executed ii times.
  • The total cost is the sum of ii’s for ii goes from 1 to nn.