2.10 Implementing Mergesort

Implementing Mergesort presents a number of technical difficulties. The first decision is how to represent the lists. Mergesort lends itself well to sorting a singly linked list because merging does not require random access to the list elements. Thus, Mergesort is the method of choice when the input is in the form of a linked list. Implementing merge for linked lists is straightforward, because we need only remove items from the front of the input lists and append items to the output list. Breaking the input list into two equal halves presents some difficulty. Ideally we would just break the lists into front and back halves. However, even if we know the length of the list in advance, it would still be necessary to traverse halfway down the linked list to reach the beginning of the second half. A simpler method, which does not rely on knowing the length of the list in advance, assigns elements of the input list alternating between the two sublists. The first element is assigned to the first sublist, the second element to the second sublist, the third to first sublist, the fourth to the second sublist, and so on. This requires one complete pass through the input list to build the sublists.

When the input to Mergesort is an array, splitting input into two subarrays is easy if we know the array bounds. Merging is also easy if we merge the subarrays into a second array. Note that this approach requires twice the amount of space as any of the sorting methods presented so far, which is a serious disadvantage for Mergesort. It is possible to merge the subarrays without using a second array, but this is extremely difficult to do efficiently and is not really practical. Merging the two subarrays into a second array, while simple to implement, presents another difficulty. The merge process ends with the sorted list in the auxiliary array. Consider how the recursive nature of Mergesort breaks the original array into subarrays. Mergesort is recursively called until subarrays of size 1 have been created, requiring logn\log n levels of recursion. These subarrays are merged into subarrays of size 2, which are in turn merged into subarrays of size 4, and so on. We need to avoid having each merge operation require a new array. With some difficulty, an algorithm can be devised that alternates between two arrays. A much simpler approach is to copy the sorted sublists to the auxiliary array first, and then merge them back to the original array.

Here is a complete implementation for mergesort following this approach. The input records are in array A. Array temp is used as a place to temporarily copy records during the merge process. Parameters left and right define the left and right indices, respectively, for the subarray being sorted. The initial call to mergeSort would be mergeSort(array, temparray, 0, n-1).

function mergeSort(A, temp, left, right):
    if left == right:                 // List has one record
        return
    mid = int((left + right) / 2)     // Select midpoint
    mergeSort(A, temp, left, mid)     // Mergesort first half
    mergeSort(A, temp, mid+1, right)  // Mergesort second half
    for i = left to right:            // Copy subarray to temp
        temp[i] = A[i]
    // Do the merge operation back to A
    i1 = left
    i2 = mid + 1
    for curr = left to right:
        if i1 > mid:                  // Left sublist exhausted
            A[curr] = temp[i2]
            i2 = i2 + 1
        else if i2 > right:           // Right sublist exhausted
            A[curr] = temp[i1]
            i1 = i1 + 1
        else if temp[i1] <= temp[i2]: // Get smaller value
            A[curr] = temp[i1]
            i1 = i1 + 1
        else:
            A[curr] = temp[i2]
            i2 = i2 + 1

Here is a visualization for the merge step.

2.10.1 Review questions: Mergesort

Answer TRUE or FALSE.

Mergesort (as the code is written in this module) is a stable sorting algorithm. Recall that a stable sorting algorithm maintains the relative order of records with equal keys.

  • What happens when two equal values are compared during merge?

Answer TRUE or FALSE.

Mergesort is easier to implement when operating on a linked list than on an array.

  • Look at the length of the code given in the modules.
  • There are a lot of details to deal with when implementing Mergesort on an array.

You must merge 2 sorted lists of size mm and nn, respectively. The number of comparisons needed in the worst case by the merge algorithm will be:

  • Each comparison puts one record in the final sorted array.
  • You don’t compare when there is only one record.

What is the most complicated part of the Mergesort algorithm?

  • There is no partition in Mergesort.
  • The two main parts of Mergesort are splitting and merging.
  • Splitting is easy – just cut the list into two halves.
  • Merging is not hard, but it is more complicated than splitting.

Mergesort works by splitting a list of nn numbers in half, then sorting each half recursively, and finally merging the two halves. Which of the following list implementations would allow Mergesort to work in Θ(nlogn)\Theta(n \log n) time?

Multiple choices are possible!

We have an implementation that works on arrays. It is also easy to do this on linked lists. It works fine on both singly and doubly lined lists.

In which cases are the time complexities the same for Mergesort?

  • Does Mergesort’s cost vary according to the order of the array input values?
  • No, it does not matter what order the input values have.

The order of the input records has what impact on the number of comparisons required by Mergesort (as presented in this module)?

  • Does Mergesort change when it make a comparison according to the order of the array input values?
  • No, it does not matter what order the input values have.

What is the worst-case time for Mergesort to sort an array of nn records?

  • Does Mergesort’s number of comparisons depend on the particular order of the input array?
  • No, it does not.

What is the running time of Mergesort when the input is an array where all record values are equal?

  • Does Mergesort’s number of comparisons depend on the particular order of the input array?
  • No, it does not.

When is Mergesort a good choice for sorting an array?

  • Does Mergesort’s number of comparisons depend on the particular order of the input array?
  • No, it does not.
  • This makes it reliable in the worst case.

In the worst case, the total number of comparisons for Mergesort is closest to: - [x] nlognn \log n - [ ] nn - [ ] n2n^2 - [ ] n2/2n^2/2

  • What is the asymptotic cost of Mergesort?
  • Only one of these answers looks like the asymptotic cost.

The array-based implementation for Mergesort uses how many arrays?

There are logn\log n passes. But the implementation is careful to reuse the auxilliary array rather than make a new one on each pass. So Mergesort requires the original array and an auxilliary array.