11.1 Binary search trees
A binary search tree (BST) is a binary tree that conforms to the following condition, known as the binary search tree property:
All nodes stored in the left subtree of a node whose key value is have key values less than or equal to . All nodes stored in the right subtree of a node whose key value is have key values greater than .
Figure 11.1 below shows two BSTs for a collection of values. One consequence of the binary search tree property is that if the BST nodes are printed using an inorder traversal, then the resulting enumeration will be in sorted order from lowest to highest.
Figure 11.1: Two Binary Search Trees for a collection of values. Tree (a) results if values are inserted in the order 37, 24, 42, 120, 32, 7. Tree (b) results if the same values are inserted in the order 7, 37, 42, 32, 120, 24..
The only thing that differentiates a BST from a normal binary tree is the BST property. This property is an invariant, as as explained in Section 2.2, and it is a condition that the BST always must satisfy.
Invariants are not stored in the datatype, so the actual declaration for a BST is exactly the same as a normal binary tree:
datatype BST:
= null
root: BinaryNode size: Int = 0
The BST property is therefore not explicit in the datatype declaration, but instead it is something that all operations must satisfy. Whenever we implement a new operation, we have to make sure that we never break the property.
Searching in a BST
Because of the BST property, we don’t have to search the whole tree if we want to find an element. Instead, we can start at the root node and compare its value with the value we are searching for. If the value is smaller than the root, we know that we can disregard everything in the right subtree. (And conversely, if the value is larger, we can disregard the left subtree.)
Now, assuming that the value was smaller than the root, we can go to the left child and compare again. If the value now is larger than the child, we can continue into the child’s right subtree. We continue this until we have found the value in a node, or until we reach an empty child. If we reach an empty child we know that the value is not in the tree. Here is a possible implementation of a method that returns true if the element is in the tree:
datatype BST:
...contains(elem):
= root
node while node is not null:
if elem == node.elem:
return true
else if elem < node.elem:
= node.left
node else if elem > node.elem:
= node.right
node return false
Inserting into a BST
When we want to insert an element we first have to search for it,
similar to what we did above. If it already exists we don’t have to do
anything, and if it isn’t in the tree we create a new node and attach it
to the right place. But how do we know where to add the new node? The
variable node
becomes null if the element isn’t found, so
we have nothing to attach the node to – instead we want to attach the
new node to the previous tree node that we looked at. The
solution is to add another temporary variable, parent
,
which points to the parent of node
– i.e., its value in the
previous iteration.
Now, after we have completed the search, and the element wasn’t
found, we can simply create a new node and attach it as a child to
parent
. Depending on if the element is smaller or larger
than the parent element, we make it a left or right child.
Alternatively, if the tree is empty then we have to attach the new node
directly to the root.
datatype BST:
...add(elem):
// Search for the parent of the new node.
= null
parent = root
node while node is not null:
= node
parent if elem == node.elem
return // The element is already in the BST, so we do nothing.
else if elem < node.elem:
= node.left
node else if elem > node.elem:
= node.right
node
// Create a new node and attach it to the parent.
= new BinaryNode(elem)
node if parent is null:
= node // The tree is empty, so we update the root.
root else if elem < parent.elem:
= node
parent.left else if elem > parent.elem:
= node
parent.right size = size + 1
We have to decide what to do when the node that we want to insert has a key value equal to the key of some node already in the tree. If during insert we find a node that duplicates the key value to be inserted, then we have two options. If the application does not allow nodes with equal keys, then this insertion should be treated as an error (or ignored). If duplicate keys are allowed, our convention will be to insert the duplicate in the left subtree.
The shape of a BST depends on the order in which elements are inserted. A new element is added to the BST as a new leaf node, potentially increasing the depth of the tree. Figure 11.1 illustrates two BSTs for a collection of values. It is possible for the BST containing nodes to be a chain of nodes with height . This would happen if, for example, all elements were inserted in sorted order. In general, it is preferable for a BST to be as shallow as possible. This keeps the average cost of a BST operation low.
11.1.1 Recursive search and insert
Bot searching and insertion can be implemented as recursive function
too, and it is instructive to see how to to that. And since we showed
the set operations contains
and add
above, we will show how to implement recursive versions of the
map operations get
and put
.
Note that this means that our binary tree nodes will have the instance variables key and value, instead of the single elem that we used earlier.
Getting the value for a key
First we will look at the get
operation for finding the
record that matches a given key. To accomplish this we need a recursive
helper function which takes a node as argument, and we start by calling
this function with the root:
datatype BSTMap implements Map:
...get(key):
return getHelper(root, key)
The function getHelper
performs the same iteration as we
did earlier with contains
, but implicitly as recursive
calls. When the key is found it returns the value of the node, and if
the key doesn’t exist it returns null:
getHelper(node, key):if node is null:
return null
else if key < node.key:
return getHelper(node.left, key)
else if key > node.key:
return getHelper(node.right, key)
else if key == node.key:
return node.value
Here is an interactive explanation of searching in a BST.
Here is an exercise on the BST search algorithm.
Setting the value for a key
Now we look at how to set the value of a key. Yet again we need a recursive helper function which we call with the root of the tree. This function returns a pointer to the updated tree, and we have to make sure to update the root to this updated tree.
datatype BSTMap implements Map:
...put(key, value):
= putHelper(root, key, value) root
We do the same as for getHelper
as for
putHelper
above, but we have to update the value of each
child pointer in the path to be the new updated tree.
putHelper(node, key, value):if node is null:
size = size + 1
return new BSTNode(key, value)
else if key < node.key:
= putHelper(node.left, key, value)
node.left else if key > node.key:
= putHelper(node.right, key, value)
node.right else if key == node.key:
= value
node.value return node
Here is an interactive explanation of recursive insertion.
Note that, except for the last node in the path,
putHelper
will not actually change the child pointer for
any of the nodes that are visited. In that sense, many of the
assignments seem redundant. However, the cost of these additional
assignments is worth paying to keep the insertion process simple. The
alternative is to check if a given assignment is necessary, which is
probably more expensive than the assignment!
Here is an exercise on BST insertion.
11.1.2 Removing from a BST
Removing a node from a BST is a bit trickier than inserting a node,
but it is not complicated if all of the possible cases are considered
individually. Before tackling the general node removal process, we need
a useful companion method, largestNode
, which returns a
pointer to the node containing the maximum value in a subtree.
function largestNode(node):
while node.right is not null:
= node.right
node return node
This time we will show a recursive implementation of
remove
. For this we need a recursive helper function which
we initially call with the root of the tree.
datatype BSTMap:
...remove(key):
= removeHelper(root, key) root
Now we are ready for the removeHelper
method. Removing a
node with given key value
from the BST requires that we first find
and then remove it from the tree. So, the first part of the remove
operation is a search to find
.
Once
is found, there are several possibilities. If
has no children, then
’s
parent has its pointer set to null. If
has one child, then
’s
parent has its pointer set to
’s
child. The problem comes if
has two children. One simple approach, though expensive, is to set
’s
parent to point to one of
’s
subtrees, and then reinsert the remaining subtree’s nodes one at a time.
A better alternative is to find a value in one of the subtrees that can
replace the value in
.
Thus, the question becomes: Which value can substitute for the one being removed? It cannot be any arbitrary value, because we must preserve the BST property without making major changes to the structure of the tree. Which value is most like the one being removed? The answer is the least key value greater than the one being removed, or else the greatest key value less than (or equal to) the one being removed. If either of these values replace the one being removed, then the BST property is maintained.
Here is an interactive explanation of BST deletion.
When duplicate node values do not appear in the tree, it makes no difference whether the replacement is the greatest value from the left subtree or the least value from the right subtree. If duplicates are stored in the left subtree, then we must select the replacement from the left subtree. To see why, call the least value in the right subtree . If multiple nodes in the right subtree have value , selecting as the replacement value for the root of the subtree will result in a tree with equal values to the right of the node now containing . Selecting the greatest value from the left subtree does not have a similar problem, because it does not violate the Binary Search Tree Property if equal values appear in the left subtree.
Note: Alternatively, we can decide to store duplicate values in the right subtree instead of the left. Then we must replace a deleted node with the least value from its right subtree.
The code for removal is shown here. Note that the helper function returns the updated subtree, and we have to make sure to update the child pointers to this updated tree.
removeHelper(node, key):if node is null:
return null
else if key < node.key:
= removeHelper(node.left, key)
node.left return node
else if key > node.key:
= removeHelper(node.right, key)
node.right return node
else if key == node.key:
if node.left is null:
size = size - 1
return node.right
else if node.right is null:
size = size - 1
return node.left
else:
= largestNode(node.left)
predecessor = predecessor.key
node.key = predecessor.value
node.value = removeHelper(node.left, predecessor.key)
node.left return node
datatype BSTMap:
...remove(key):
= null
parent = root
node while node is not null:
= node
parent if key < node.key:
= node.left
node else if key > node.key:
= node.right
node else: // key == node.key
break // we found the key so break out of the loop
if node is null:
return // the key is not in the tree, so do nothing
// node to be deleted has at most one child
if node.left is null or node.right is null:
// newNode will replace the node to be deleted
if node.left is null:
= node.right
newNode else: // node.right is null
= node.left
newNode
if parent is null:
= newNode // the node to be deleted is the root, so update the root
root else if node == parent.left:
= newNode // node is a left child, so make newNode a left child
parent.left else: // node == parent.right
= newNode // node is a right child, so make newNode a right child
parent.right
= None
node
// node to be deleted has two children
else:
= null
predecessorParent = node.left
predecessor while predecessor.right is not null:
= predecessor
predecessorParent = predecessor.right
predecessor
= predecessor.key
node.key = predecessor.value
node.value
if predecessorParent is null:
= predecessor.left
node.left else:
= predecessor.left predecessorParent.right
Here is an exercise on BST deletion.
11.1.3 Analysis
The cost for contains
, get
,
add
, and put
is the depth of the node found or
inserted. The cost for remove
is the depth of the node
being removed, or in the case when this node has two children, the depth
of the node with smallest value in its right subtree. Thus, in the worst
case, the cost for any one of these operations is the depth of the
deepest node in the tree. This is why it is desirable to keep BSTs balanced,
that is, with least possible height. If a binary tree is balanced, then
the height for a tree of
nodes is approximately
.
However, if the tree is completely unbalanced, for example in the shape
of a linked list, then the height for a tree with
nodes can be as great as
.
Thus, a balanced BST will in the average case have operations costing
,
while a badly unbalanced BST can have operations in the worst case
costing
.
Consider the situation where we construct a BST of
nodes by inserting records one at a time. If we are fortunate to have
them arrive in an order that results in a balanced tree (a “random”
order is likely to be good enough for this purpose), then each insertion
will cost on average
,
for a total cost of
.
However, if the records are inserted in order of increasing value, then
the resulting tree will be a chain of height
.
The cost of insertion in this case will be
.
Traversing a BST costs regardless of the shape of the tree. Each node is visited exactly once, and each child pointer is followed exactly once. For example, if we want to print the nodes in ascending order we can perform an inorder traversal (Section 8.4), which then will take time linear in the size of the tree.
While the BST is simple to implement and efficient when the tree is balanced, the possibility of its being unbalanced is a serious liability. There are techniques for organising a BST to guarantee good performance. Two examples are the AVL tree (see Section 11.3) and the splay tree (see Section 11.4). There also exist other types of search trees that are guaranteed to remain balanced, such as the red-black tree or the 2-3 Tree.
11.1.4 Guided information flow
When writing a recursive method to solve a problem that requires traversing a binary tree, we want to make sure that we are visiting the required nodes (no more and no less).
So far, we have seen several tree traversals that visited every node of the tree. We also saw the BST search, insert, and remove routines, that each go down a single path of the tree. Guided traversal refers to a problem that does not require visiting every node in the tree, though it typically requires looking at more than one path through the tree. This means that the recursive function is making some decision at each node that sometimes lets it avoid visiting one or both of its children. The decision is typically based on the value of the current node. Many problems that require information flow on binary search trees are “guided” in this way.
Example: Minimum value in a tree
An extreme example is finding the minimum value in a BST. A bad solution to this problem would visit every node of the tree. However, we can take advantage of the BST property to avoid visiting most nods in the tree. You know that the values greater than the root are always in the right subtree, and those values less than the root are in the left subtree. Thus, at each node we need only visit the left subtree until we reach a leaf node.
Here is a problem that typically needs to visit more than just a
single path, but not all of the nodes.