7.2. Binary Search Trees¶
7.2.1. Binary Search Tree Definition¶
A binary search tree (BST) is a binary tree that conforms to the following condition, known as the binary search tree property. All nodes stored in the left subtree of a node whose key value is \(K\) have key values less than or equal to \(K\). All nodes stored in the right subtree of a node whose key value is \(K\) have key values greater than \(K\). Figure 7.3.1 shows two BSTs for a collection of values. One consequence of the binary search tree property is that if the BST nodes are printed using an inorder traversal, then the resulting enumeration will be in sorted order from lowest to highest.
Here is a class declaration for the BST.
Recall that there are various ways to deal with
keys and
comparing records
Three typical approaches are key-value pairs,
a special comparison method such as using the Comparator
class,
and passing in a comparator function.
Our BST implementation will require that records implement the
Comparable
interface.
// A dictionary implemented using a binary search tree.
class BSTMap<K extends Comparable<K>, V> implements Map<K, V> {
Node root = null; // The root of the binary search tree.
int treeSize = 0; // The size of the tree.
// A node in a binary search tree.
class Node {
K key;
V value;
Node left;
Node right;
Node(K key, V value, Node left, Node right) {
this.key = key;
this.value = value;
this.left = left;
this.right = right;
}
}
// Check that the invariant holds.
void checkInvariant() {
int size = checkInvariantHelper(root, null, null);
if (size != treeSize)
throw new AssertionError("wrong tree size");
}
// Recursive helper method for 'check_invariant'.
// Checks that the node is the root of a valid BST, and that
// all keys k satisfy lo < k < hi. The test lo < k is skipped
// if lo is None, and k < hi is skipped if hi is None.
// Also, it returns the size of the tree.
int checkInvariantHelper(Node node, K lo, K hi) {
if (node == null) return 0;
if (lo != null && node.key.compareTo(lo) <= 0)
throw new AssertionError("key too small");
if (hi != null && node.key.compareTo(hi) >= 0)
throw new AssertionError("key too big");
// Keys in the left subtree should be < node.key
// Keys in the right subtree should be > node.key
return 1 +
checkInvariantHelper(node.left, lo, node.key) +
checkInvariantHelper(node.right, node.key, hi);
}
// Return true if there are no keys.
public boolean isEmpty() {
return root == null;
}
// Return the number of keys.
public int size() {
return treeSize;
}
// Return true if the key has an associated value.
public boolean containsKey(K key) {
return get(key) != null;
}
// Look up a key.
public V get(K key) {
return getHelper(root, key);
}
// Recursive helper method for 'get'.
V getHelper(Node node, K key) {
if (node == null)
return null;
if (node.key.compareTo(key) > 0)
return getHelper(node.left, key);
else if (node.key.compareTo(key) < 0)
return getHelper(node.right, key);
else // node.key == key
return node.value;
}
// Add a key-value pair, or update the value associated with an existing key.
// Returns the previous value associated with the key,
// or null if the key wasn't previously present.
public V put(K key, V value) {
root = putHelper(root, key, value);
if (oldValue == null)
treeSize++;
return oldValue;
}
// Recursive helper method for 'put'.
// Stores the previous value in 'oldValue';
Node putHelper(Node node, K key, V value) {
if (node == null) {
oldValue = null;
return new Node(key, value, null, null);
} else if (node.key.compareTo(key) > 0) {
node.left = putHelper(node.left, key, value);
} else if (node.key.compareTo(key) < 0) {
node.right = putHelper(node.right, key, value);
} else { // node.key == key
oldValue = node.value;
node.value = value;
}
return node;
}
// Used by putHelper and removeHelper, in order to return
// the value previously stored in the node.
private V oldValue;
// Delete a key.
// Returns the previous value associated with the key,
// or null if the key wasn't previously present.
public V remove(K key) {
root = removeHelper(root, key);
if (oldValue != null)
treeSize--;
return oldValue;
}
// Recursive helper method for 'remove'.
Node removeHelper(Node node, K key) {
if (node == null) {
oldValue = null;
return null;
} else if (node.key.compareTo(key) > 0) {
node.left = removeHelper(node.left, key);
return node;
} else if (node.key.compareTo(key) < 0) {
node.right = removeHelper(node.right, key);
return node;
} else { // node.key == key
if (node.left == null) {
oldValue = node.value;
return node.right;
} else if (node.right == null) {
oldValue = node.value;
return node.left;
} else {
Node predecessor = largestNode(node.left);
oldValue = node.value;
node.key = predecessor.key;
node.value = predecessor.value;
node.left = removeHelper(node.left, predecessor.key);
return node;
}
}
}
// Find the largest key.
public K lastKey() {
if (root == null)
return null;
else
return largestNode(root).key;
}
// Find the node having the largest key.
Node largestNode(Node node) {
while (node.right != null)
node = node.right;
return node;
}
// Iterate through all keys.
// This is called when the user writes 'for (K key: bst) { ... }.'
public Iterator<K> iterator() {
// The easiest way to solve this is to add all keys to an
// ArrayList, then iterate through that.
ArrayList<K> keys = new ArrayList<>();
iteratorHelper(root, keys);
return keys.iterator();
}
// Recursive helper method for 'iterator'
void iteratorHelper(Node node, ArrayList<K> keys) {
if (node == null) return;
iteratorHelper(node.left, keys);
keys.add(node.key);
iteratorHelper(node.right, keys);
}
}
# Python does not have internal classes, so we have to make the tree node class standalone.
class Node:
"""A node in a binary search tree."""
def __init__(self, key, value, left, right):
self.key = key
self.value = value
self.left = left
self.right = right
class BSTMap(Map):
"""A dictionary implemented using a binary search tree."""
def __init__(self):
self.root = None
self.treeSize = 0
def check_invariant(self):
"""Check that the invariant holds."""
size = self.check_invariant_helper(self.root, None, None)
assert size == self.treeSize, "wrong tree size"
@staticmethod
def check_invariant_helper(node, lo, hi):
"""Recursive helper method for 'check_invariant'.
Checks that the node is the root of a valid BST, and that
all keys k satisfy lo < k < hi. The test lo < k is skipped
if lo is None, and k < hi is skipped if hi is None."""
if node is None: return 0
assert lo is None or node.key > lo, "key too small"
assert hi is None or node.key < hi, "key too big"
# Keys in the left subtree should be < node.key
# Keys in the right subtree should be > node.key
return (1 +
BSTMap.check_invariant_helper(node.left, lo, node.key) +
BSTMap.check_invariant_helper(node.right, node.key, hi))
def isEmpty(self):
"""Return true if there are no keys."""
return self.root is None
def size(self):
"""Return the number of keys."""
return self.treeSize
def containsKey(self, key):
"""Return true if the key has an associated value."""
return self.get(key) is not None
def get(self, key):
"""Look up a key."""
return self.get_helper(self.root, key)
@staticmethod
def get_helper(node, key):
"""Helper method for 'get'."""
if node is None:
return None
elif node.key > key:
return BSTMap.get_helper(node.left, key)
elif node.key < key:
return BSTMap.get_helper(node.right, key)
else:
return node.value
def put(self, key, value):
"""Add a key-value pair, or update the value associated with an existing key.
Returns the value previously associated with the key,
or None if the key was not present."""
self.root, old_value = self.put_helper(self.root, key, value)
if old_value is None:
self.treeSize += 1
return old_value
@staticmethod
def put_helper(node, key, value):
"""Recursive helper method for 'put'.
Returns the updated node, and the value previously associated with the key."""
if node is None:
return Node(key, value, None, None), None
elif node.key > key:
node.left, old_value = BSTMap.put_helper(node.left, key, value)
elif node.key < key:
node.right, old_value = BSTMap.put_helper(node.right, key, value)
else: # node.key == key
old_value = node.value
node.value = value
return node, old_value
def remove(self, key):
"""Delete a key.
Returns the value previously associated with the key,
or None if the key was not present."""
self.root, old_value = self.remove_helper(self.root, key)
if old_value is not None:
self.treeSize -= 1
return old_value
@staticmethod
def remove_helper(node, key):
"""Helper method for 'remove'.
Returns the updated node, and the value previously associated with the key."""
if node is None:
return None, None
elif node.key > key:
node.left, old_value = BSTMap.remove_helper(node.left, key)
return node, old_value
elif node.key < key:
node.right, old_value = BSTMap.remove_helper(node.right, key)
return node, old_value
else: # node.key == key
if node.left is None:
return node.right, node.value
elif node.right is None:
return node.left, node.value
else:
predecessor = BSTMap.largestNode(node.left)
old_value = node.value
node.key = predecessor.key
node.value = predecessor.value
node.left, _ = BSTMap.remove_helper(node.left, predecessor.key)
return node, old_value
def lastKey(self):
"""Find the largest key."""
if self.root is None:
return None
else:
return self.largestNode(self.root).key
@staticmethod
def largestNode(node):
"""Find the node having the largest key."""
while node.right is not None:
node = node.right
return node
def __iter__(self):
"""Iterate through all keys.
This is called when the user writes 'for key in bst: ...'."""
return self.iter_helper(self.root)
@staticmethod
def iter_helper(node):
"""Helper method for '__iter__'."""
# This method is a generator:
# https://docs.python.org/3/howto/functional.html#generators
# Generators are an easy way to make iterators.
if node is None:
return
else:
for key in BSTMap.iter_helper(node.left):
yield key
yield node.key
for key in BSTMap.iter_helper(node.right):
yield key
def __getitem__(self, key):
"""This is called when the user writes 'x = bst[key]'."""
return self.get(key)
def __setitem__(self, key, value):
"""This is called when the user writes 'bst[key] = value'."""
self.put(key, value)
def __contains__(self, key):
"""This is called when the user writes 'key in bst'."""
return self.containsKey(key)
def __delitem__(self, key):
"""This is called when the user writes 'del bst[key]'."""
self.remove(key)
7.2.1.1. BST Search¶
The first operation that we will look at in detail will find the
record that matches a given key.
Notice that in the BST class, public member function
get
calls private member function getHelper
.
Method get
takes the search key as an explicit parameter
and its BST as an implicit parameter, and returns the record that
matches the key.
However, the find operation is most easily implemented as a
recursive function whose parameters are the root of a
subtree and the search key.
Member getHelper
has the desired form for this recursive
subroutine and is implemented as follows.
// Look up a key.
public V get(K key) {
return getHelper(root, key);
}
// Recursive helper method for 'get'.
V getHelper(Node node, K key) {
if (node == null)
return null;
if (node.key.compareTo(key) > 0)
return getHelper(node.left, key);
else if (node.key.compareTo(key) < 0)
return getHelper(node.right, key);
else // node.key == key
return node.value;
}
def get(self, key):
"""Look up a key."""
return self.get_helper(self.root, key)
@staticmethod
def get_helper(node, key):
"""Helper method for 'get'."""
if node is None:
return None
elif node.key > key:
return BSTMap.get_helper(node.left, key)
elif node.key < key:
return BSTMap.get_helper(node.right, key)
else:
return node.value
7.2.2. BST Insert¶
Now we look at how to insert a new node into the BST.
// Add a key-value pair, or update the value associated with an existing key.
// Returns the previous value associated with the key,
// or null if the key wasn't previously present.
public V put(K key, V value) {
root = putHelper(root, key, value);
if (oldValue == null)
treeSize++;
return oldValue;
}
// Recursive helper method for 'put'.
// Stores the previous value in 'oldValue';
Node putHelper(Node node, K key, V value) {
if (node == null) {
oldValue = null;
return new Node(key, value, null, null);
} else if (node.key.compareTo(key) > 0) {
node.left = putHelper(node.left, key, value);
} else if (node.key.compareTo(key) < 0) {
node.right = putHelper(node.right, key, value);
} else { // node.key == key
oldValue = node.value;
node.value = value;
}
return node;
}
// Used by putHelper and removeHelper, in order to return
// the value previously stored in the node.
private V oldValue;
def put(self, key, value):
"""Add a key-value pair, or update the value associated with an existing key.
Returns the value previously associated with the key,
or None if the key was not present."""
self.root, old_value = self.put_helper(self.root, key, value)
if old_value is None:
self.treeSize += 1
return old_value
@staticmethod
def put_helper(node, key, value):
"""Recursive helper method for 'put'.
Returns the updated node, and the value previously associated with the key."""
if node is None:
return Node(key, value, None, None), None
elif node.key > key:
node.left, old_value = BSTMap.put_helper(node.left, key, value)
elif node.key < key:
node.right, old_value = BSTMap.put_helper(node.right, key, value)
else: # node.key == key
old_value = node.value
node.value = value
return node, old_value
Note that, except for the last node in the path, putHelp
will not actually change the child pointer for any of the nodes that
are visited.
In that sense, many of the assignments seem redundant.
However, the cost of these additional assignments is worth paying to
keep the insertion process simple.
The alternative is to check if a given assignment is necessary, which
is probably more expensive than the assignment!
We have to decide what to do when the node that we want to insert has a key value equal to the key of some node already in the tree. If during insert we find a node that duplicates the key value to be inserted, then we have two options. If the application does not allow nodes with equal keys, then this insertion should be treated as an error (or ignored). If duplicate keys are allowed, our convention will be to insert the duplicate in the left subtree.
The shape of a BST depends on the order in which elements are inserted. A new element is added to the BST as a new leaf node, potentially increasing the depth of the tree. Figure 7.3.1 illustrates two BSTs for a collection of values. It is possible for the BST containing \(n\) nodes to be a chain of nodes with height \(n\). This would happen if, for example, all elements were inserted in sorted order. In general, it is preferable for a BST to be as shallow as possible. This keeps the average cost of a BST operation low.
7.2.3. BST Remove¶
Removing a node from a BST is a bit trickier than inserting a node,
but it is not complicated if all of the possible cases are considered
individually.
Before tackling the general node removal process,
we need a useful companion method, largestNode
, which returns a
pointer to the node containing the maximum value in the subtree.
// Find the node having the largest key.
Node largestNode(Node node) {
while (node.right != null)
node = node.right;
return node;
}
@staticmethod
def largestNode(node):
"""Find the node having the largest key."""
while node.right is not None:
node = node.right
return node
Now we are ready for the removeHelper
method.
Removing a node with given key value \(R\) from the BST
requires that we first find \(R\) and then remove it from the
tree.
So, the first part of the remove operation is a search to find
\(R\).
Once \(R\) is found, there are several possibilities.
If \(R\) has no children, then \(R\)’s parent has its
pointer set to NULL.
If \(R\) has one child, then \(R\)’s parent has
its pointer set to \(R\)’s child (similar to deleteMax
).
The problem comes if \(R\) has two children.
One simple approach, though expensive, is to set \(R\)’s parent to
point to one of \(R\)’s subtrees, and then reinsert the remaining
subtree’s nodes one at a time.
A better alternative is to find a value in one of the
subtrees that can replace the value in \(R\).
Thus, the question becomes: Which value can substitute for the one being removed? It cannot be any arbitrary value, because we must preserve the BST property without making major changes to the structure of the tree. Which value is most like the one being removed? The answer is the least key value greater than the one being removed, or else the greatest key value less than (or equal to) the one being removed. If either of these values replace the one being removed, then the BST property is maintained.
When duplicate node values do not appear in the tree, it makes no difference whether the replacement is the greatest value from the left subtree or the least value from the right subtree. If duplicates are stored in the left subtree, then we must select the replacement from the left subtree. 1 To see why, call the least value in the right subtree \(L\). If multiple nodes in the right subtree have value \(L\), selecting \(L\) as the replacement value for the root of the subtree will result in a tree with equal values to the right of the node now containing \(L\). Selecting the greatest value from the left subtree does not have a similar problem, because it does not violate the Binary Search Tree Property if equal values appear in the left subtree.
- 1
Alternatively, if we prefer to store duplicate values in the right subtree, then we must replace a deleted node with the least value from its right subtree.
The code for removal is shown here.
// Delete a key.
// Returns the previous value associated with the key,
// or null if the key wasn't previously present.
public V remove(K key) {
root = removeHelper(root, key);
if (oldValue != null)
treeSize--;
return oldValue;
}
// Recursive helper method for 'remove'.
Node removeHelper(Node node, K key) {
if (node == null) {
oldValue = null;
return null;
} else if (node.key.compareTo(key) > 0) {
node.left = removeHelper(node.left, key);
return node;
} else if (node.key.compareTo(key) < 0) {
node.right = removeHelper(node.right, key);
return node;
} else { // node.key == key
if (node.left == null) {
oldValue = node.value;
return node.right;
} else if (node.right == null) {
oldValue = node.value;
return node.left;
} else {
Node predecessor = largestNode(node.left);
oldValue = node.value;
node.key = predecessor.key;
node.value = predecessor.value;
node.left = removeHelper(node.left, predecessor.key);
return node;
}
}
}
def remove(self, key):
"""Delete a key.
Returns the value previously associated with the key,
or None if the key was not present."""
self.root, old_value = self.remove_helper(self.root, key)
if old_value is not None:
self.treeSize -= 1
return old_value
@staticmethod
def remove_helper(node, key):
"""Helper method for 'remove'.
Returns the updated node, and the value previously associated with the key."""
if node is None:
return None, None
elif node.key > key:
node.left, old_value = BSTMap.remove_helper(node.left, key)
return node, old_value
elif node.key < key:
node.right, old_value = BSTMap.remove_helper(node.right, key)
return node, old_value
else: # node.key == key
if node.left is None:
return node.right, node.value
elif node.right is None:
return node.left, node.value
else:
predecessor = BSTMap.largestNode(node.left)
old_value = node.value
node.key = predecessor.key
node.value = predecessor.value
node.left, _ = BSTMap.remove_helper(node.left, predecessor.key)
return node, old_value
7.2.4. BST Analysis¶
The cost for getHelper
and putHelper
is the depth of
the node found or inserted.
The cost for removeHelper
is the depth of the node being
removed, or in the case when this node has two children,
the depth of the node with smallest value in its right subtree.
Thus, in the worst case, the cost for any one of these operations is
the depth of the deepest node in the tree.
This is why it is desirable to keep BSTs
balanced, that is, with least possible
height.
If a binary tree is balanced, then the height for a tree of \(n\)
nodes is approximately \(\log n\).
However, if the tree is completely unbalanced, for example in the
shape of a linked list, then the height for a tree with \(n\)
nodes can be as great as \(n\).
Thus, a balanced BST will in the average case have operations costing
\(\Theta(\log n)\), while a badly unbalanced BST can have
operations in the worst case costing \(\Theta(n)\).
Consider the situation where we construct a BST of \(n\) nodes
by inserting records one at a time.
If we are fortunate to have them arrive in an order that results in a
balanced tree (a “random” order is likely to be good
enough for this purpose), then each insertion will cost on average
\(\Theta(\log n)\), for a total cost of
\(\Theta(n \log n)\).
However, if the records are inserted in order of increasing value,
then the resulting tree will be a chain of height \(n\).
The cost of insertion in this case will be
\(\sum_{i=1}^{n} i = \Theta(n^2)\).
Traversing a BST costs \(\Theta(n)\) regardless of the shape of the tree. Each node is visited exactly once, and each child pointer is followed exactly once.
Below is an example traversal, named printHelper
.
It performs an inorder traversal on the BST to print the node keys, in
sorted order.
// An example inorder traversal.
// Prints all node keys, in sorted order.
void printHelper(Node node) {
if (node == null) return;
printHelper(node.left);
System.out.println(node.key);
printHelper(node.right);
}
@staticmethod
def print_helper(node):
"""An example inorder traversal.
Prints all node keys, in sorted order."""
if node is None: return
BSTMap.print_helper(node.left)
print(node.key)
BSTMap.print_helper(node.right)
While the BST is simple to implement and efficient when the tree is balanced, the possibility of its being unbalanced is a serious liability. There are techniques for organizing a BST to guarantee good performance. Two examples are the AVL tree and the splay tree. There also exist other types of search trees that are guaranteed to remain balanced, such as the 2-3 Tree.