12.4 Implementing sets and maps using separate chaining
In this section we go into more details in how to get a working implementation. First we show how to implement a hash set, an then we discuss how to extend this into a hash map.
12.4.1 Implementing sets
A separate chaining hash set consists of an internal array
table
of sets. We don’t have to specify what kind of sets,
as long as it supports the basic set methods. Usually it’s perfectly
fine to use a very simple linked list and not something more fancy. To
initialise the table, we first create the internal array of some initial
minimum capacity:
datatype SeparateChainingHashSet implements Set:
Array of Sets
table: size: Int
constructor():
initialise(MIN_CAPACITY)
To simplify things we put the initialisation in a method of its own, because we will reuse it later when resizing the table. The initialisation not only creates the internal table, but also populates it with new empty sets.
datatype SeparateChainingHashSet:
...
initialise(capacity):size = 0
= new Array(capacity)
table for i in 0 .. capacity-1:
= new Set() table[i]
Note that we keep the total number of elements in a variable
size
. It is possible to calculate this value by summing the
sizes of all bins, but this takes time so we remember it in a variable
instead. This also means that we have to update the variable whenever
the size of a bin changes.
Searching for an element
To see if the set contains a given element, we look it up in the corresponding bin:
datatype SeparateChainingHashSet:
...contains(elem):
= table[hashIndex(elem)]
bin return bin.contains(elem)
Adding an element
To add an element we add it to the bin where it should belong. If the size of the bin changed, we know that the key wasn’t in the bin previously, and then we know that the number of elements have increased. We also have to check if the load factor becomes too large, and then we resize the internal table.
datatype SeparateChainingHashSet:
...add(elem):
= table[hashIndex(elem)]
bin = bin.size
oldBinSize add(elem)
bin.if bin.size > oldBinSize:
size = size + 1
if loadFactor() > MAX_LOAD_FACTOR:
size * MULTIPLIER) resizeTable(table.
Removing an element
To remove a value, we do the same: find the underlying bin and remove the value. If we actually removed the element (i.e., if it existed in the bin), then we decrease the total size. We also check if the table becomes too sparse, and then decrease the internal table by a factor.
datatype SeparateChainingHashSet:
...remove(elem):
= table[hashIndex(elem)]
bin = bin.size
oldBinSize remove(elem)
bin.if bin.size < oldBinSize:
size = size - 1
if loadFactor() < MIN_LOAD_FACTOR:
size / MULTIPLIER) resizeTable(table.
Load factor and constants
The load factor is simply the total number of elements divided by the number of bins, or :
datatype SeparateChainingHashSet:
...
loadFactor():return size / table.size
The constants for min and max load factors, and the resizing factor, are a bit arbitrary. With the following values, we ensure that the table on average contains between 0.5 and 2 entries per table index. Increasing these values leads to more better memory usage, but also more conflicts (i.e., longer search times). Also, we enlarge by 50%, or reduce by 33%, each time we resize the table. Increasing this value means that resizing will happen less often, but instead the memory usage will increase.
datatype SeparateChainingHashSet:
...= 8
MIN_CAPACITY = 0.5
MIN_LOAD_FACTOR = 2.0
MAX_LOAD_FACTOR = 1.5 MULTIPLIER
Resizing the internal table
When we resize the internal table, it is very important that we do not keep the old hash indices for the keys, because they will not be the same after resizing. Instead we save the current internal table to a temporary variable, and reinitialise the table to the new capacity. Then we iterate through all bins and entries in the old table, and simply insert them again into the new resized table.
datatype SeparateChainingHashSet:
...
resizeTable(newCapacity):if newCapacity >= MIN_CAPACITY:
= table
oldTable
initialise(newCapacity)for each bin in oldTable:
for each elem in bin:
add(elem)
12.4.2 Implementing maps
It is straightforward to modify the implementation above to become a key-value map instead of a set.
datatype SeparateChainingHashMap implements Map:
Array of Maps
table:
...get(key):
// Similar to contains for hash sets, but use Map.get instead:
return bin.get(key)
... put(key, value):
// Similar to add for hash sets, but use Map.put instead:
put(key, value) ...
... bin.remove(key):
// Similar to remove for hash sets, but use Map.remove instead:
remove(key) ...
... bin.
resizeTable(newCapacity):// Similar to resizeTable for hash sets, but use put instead:
for each key, value in bin:
... put(key, value)
12.4.3 Exercise: Separate
chaining
12.4.3 Exercise: Separate chaining