10.3 Case study: Implementing sets using lists
Both sets and maps can be implemented using lists. For a set, we can use a simple list of elements.
Using a linked list
Recall from Section 6.3 that a linked list consists of nodes pointing to their successors:
datatype SetNode of T:
elem: T // Value for this node
next: SetNode of T // Pointer to next node in listJust as for linked stacks, our set datatype will consist of a pointer
to the head of the list, and its size. We also have to give the
operations for contains, add and
remove.
datatype LinkedSet implements Set:
head: Node = null // Pointer to head of list
size: Int = 0 // Size of listSearching in a linked list is a simple while loop where we start at
the head and continue down the list. If we reach the end of the list we
return false, because the element wasn’t found:
datatype LinkedSet:
...
contains(elem):
node = head
while node is not null:
if elem == node.elem:
return true // We found the element, so return true
node = node.next // Move to the next list node
return false // We reached the end of the listTo add an element we first have to check if it is already there – sets are not allowed to contain the same element twice. If it is not in the list we can just attach a new list node before the head of the list (and increase the size):
datatype LinkedSet:
...
add(elem):
if not contains(elem):
newHead = new Node(elem, head) // Create a new node pointing to the current head
head = newHead // Redirect the head to the new node
size = size + 1Removing an element is slightly trickier. If we want to remove a
certain node from the list, we have to redirect the next
pointer from the previous node. So we have to search for the
node before the node we want to remove. We can do that by
keeping a prev pointer while iterating through the
list.
class LinkedSet:
...
remove(elem):
prev = null
node = head
while node is not null:
if elem == node.elem: // We found the node to remove
if prev is null:
head = node.next
else:
prev.next = node.next
node.next = null // For garbage collection
size = size - 1
return
prev = node
node = node.nextNote that we have a special case: if prev is null it is
the head itself that we want to remove, because there is not previous
node yet.
Linear complexity
It is clear that all the operations are linear in the size of the list, , because in the worst case we have to look at all nodes. In later chapters we will see how to improve the efficiency, by using
- Balanced search trees (Section 11.2), which bring down the complexity of the operations to .
- Hash tables (Chapter 12), which make the operations constant time, .
But some times it is enough to use a simple linked list-based implementation. And in fact, the separate chaining hash table (Section 12.3) requires an underlying simpler implementation – and there a linked list works very fine!
Using a sorted array
There is a way to speed up one of the operations, by using a
sorted array instead of a linked list. If we have a sorted array of
elements, the contains method can use binary
search (Section 1.6.1), which
takes logarithmic time (Section 2.8).
Hence looking up items will be really efficient.
Unfortunately, modifying the data structure is still slow. If we want
to add an item to a sorted array, we have to keep the array
sorted – and that means we need to insert the new item at the
right place in the array, using the insertion algorithm from Insertion
Sort (Section 3.6).
This takes linear time in the worst case. Similarly, to
remove an item without creating a hole in the array, we
need to shift a bunch of elements in the array, and this also takes
linear time.
However, a sorted array is a suitable way to implement a set or a map that never changes, that is where we never need to add or remove items after the array is created. We start by sorting the array, using either Quicksort or Mergesort, and then we can use binary search to find items in it. Sorted arrays also support the sorted set and sorted map operations such as range queries (see Section 10.4), because these can also be implemented using binary search.
Sorted arrays can also be useful in cases where we always add many items in one go. Given a sorted array , and an unsorted list of items , we can add the items in to as follows. First we sort , then we merge and , using the merge algorithm from Mergesort. Note that the merge step takes linear time, and sorting takes a bit more than linear time, so this is a lot faster than adding all the items from one by one (which would take quadratic time).
10.3.1 Implementing maps
If we want to implement a map instead of a set, we have two choices:
- We can have a list or array of key-value pairs.
- Alternatively, we can use two lists or arrays, one for holding the keys and another for holding the values. The two lists must be “kept in sync” so that a key with its value occurs at the same position.
Note that these maps are exactly as slow as the corresponding sets (linked lists and sorted arrays), so they are not useful for working with large collections – for this we refer to Chapters 11 and 12.
Using a linked list
If we want to implement the map using a linked list, the easiest is probably to have list nodes that contain both the keays and the values:
datatype MapNode of K to V:
key: K // Key for this node
value: V // Value for this node
next: MapNode // Pointer to next node in listThe map operations get, put and
remove are minor modifications to the set operations that
we defined earlier. We leave them as an exercise to the reader to
implement.
Using a sorted array
If we want to use a sorted array, one possibility is to use two arrays – one for the keys and another for the values.
datatype SortedArrayMap of K to V:
keys: Array of K
values: Array of V
size: IntThe implementations of the set methods are left as exercises to the reader, but there are some important things to remember:
- The
keysandvaluesmust be kept in sync, so thatkeys[i]is always the key for the value invalues[i] - This means that we have to remember to modify both arrays in the
same way – if we delete the
th
element we have to delete both
keys[i]andvalues[i](and shift the arrays in the same way)