7.3 Heapsort
Before we get to Heapsort, consider for a moment the practicality of using a Binary Search Tree for sorting. You could insert all of the values to be sorted into the BST one by one, then traverse the completed tree using an inorder traversal. The output would form a sorted list. This approach is conceptually very similar to Quicksort, where an internal node corresponds to the pivot, and the left (right) subtree consists of the partition of values smaller (larger) than the pivot.
However, the approach has a number of drawbacks, including the extra space required by BST pointers and the amount of time required to insert nodes into the tree. Quicksort implements this same concept in a much more efficient way. But there is also the possibility that the BST might be unbalanced, leading to a worst-case running time. And this is the same problem as Quicksort has with chosing a good pivot (see the section “Quicksort Analysis” in the Quicksort module).
Instead, a good sorting algorithm can be devised based on a tree structure more suited to the purpose. In particular, we would like the tree to be balanced, space efficient, and fast. The algorithm should take advantage of the fact that sorting is a special-purpose application in that all of the values to be stored are available at the start. This means that we do not necessarily need to insert one value at a time into the tree structure.
Heapsort
is based on the heap
data structure. Heapsort has all of the advantages just listed. The
complete binary tree is balanced, its array representation is space
efficient, and we can load all values into the tree at once, taking
advantage of the efficient buildHeap
function. The
asymptotic performance of Heapsort when all of the records have unique
key values is
in the best, average, and worst cases. It is not as fast as Quicksort in
the average case (by a constant factor), but Heapsort has special
properties that will make it particularly useful for external
sorting algorithms, used when sorting data sets too large to fit in
main memory.
A complete implementation is as follows.
function heapsort(a):
// Convert the array to a heap:
buildHeap(a)
// Repeatedly find and remove the minimum element
for heapsize = a.size()-1 downto 0:
0], a[heapsize] = a[heapsize], a[0]
a[0)
siftDown(a, heapsize,
function buildHeap(a):
// Go backwards from the first non-leaf, sifting down each one
= a.size()
heapsize for i = heapsize/2-1 downto 0:
siftDown(a, heapsize, i)
function siftDown(a, heapsize, i):
// Standard sift-down method for max heaps
= 2*i + 1
left while left < heapsize:
= left + 1
right
if a[left] > a[i]:
= left
largest else:
= i
largest
if right < heapsize and a[right] > a[largest]:
= right
largest
if largest == i:
return
else:
= a[largest], a[i]
a[i], a[largest] = largest
i
= 2*i + 1 left
Here is a warmup practice exercise for Heapsort.
7.3.1 Heapsort Proficiency Practice
Now test yourself to see how well you understand Heapsort. Can you reproduce its behavior?
7.3.2 Heapsort Analysis
This visualization presents the running time analysis of Heap Sort
While typically slower than Quicksort by a constant factor (because
unloading the heap using removemax
is somewhat slower than
Quicksort’s series of partitions), Heapsort has one special advantage
over the other sorts studied so far. Building the heap is relatively
cheap, requiring
time. Removing the maximum-valued record from the heap requires
time in the worst case. Thus, if we wish to find the
records with the largest key values in an array, we can do so in time
.
If
is small, this is a substantial improvement over the time required to
find the
largest-valued records using one of the other sorting methods described
earlier (many of which would require sorting all of the array first).
One situation where we are able to take advantage of this concept is in
the implementation of Kruskal’s
algorithm for minimal-cost
spanning trees. That algorithm requires that edges be visited in
ascending order (so, use a min-heap), but this process stops as soon as
the MST is complete. Thus, only a relatively small fraction of the edges
need be sorted.
Another special case arises when all of the records being sorted have the same key value. This represents the best case for Heapsort. This is because removing the smallest value requires only constant time, since the value swapped to the top is never pushed down the heap.
7.3.3 Practice questions: Heapsort
Answer TRUE or FALSE.
Heapsort (as the code is written in this module) is a stable sorting algorithm. Recall that a stable sorting algorithm maintains the relative order of records with equal keys.
- What happens when two equal values are in separate branches of the tree?
- Can the one appearing later in the array move up in the tree?
A disadvantage of Heapsort is:
- Heapsort does not need auxilliary storage.
- Heapsort runs in time.
- Equal-valued records might be in different sides of the heap, and can get out of relative order.
In which cases are the time complexities the same for Heapsort?
- Does Heapsort’s cost vary according to the order of the array input values?
- No, it does not matter what order the input values have.
- However, the best case occurs when all the values are same.
(Assuming no duplicate key values) The order of the input records has what impact on the number of comparisons required by Heapsort (as presented in this module)?
- Can Heapsort’s behavior change depending on outcome of a comparison?
- Yes, it changes things a little bit in that it might move things up and down the heap more or less.
- But this does not matter, because removing a value from the heap normally costs .
What is the worst-case time for Heapsort to sort an array of n records that each have unique key values?
- Does Heapsort’s number of comparisons depend on the particular order of the input array?
- Only a little bit, Heapsort still does basically the same work regardless of input data order.
What is the running time of Heapsort when the input is an array where all key values are equal?
- Heapsort has the same asymptotic cost in the best, average, and worst cases, but only if the key values are all unique.
- If you have a heap with all equal key values, what will the siftdown operation do?
- In that case, siftdown will always return immediately, resulting in a constant time operation. Since Heapsort calls siftdown times, the total cost is .
How much auxilliary space or overhead (beyond the array holding the records) is needed by Heapsort?
- Heapsort does not require any auxilliary arrays.