12.6 Implementing sets and maps using open addressing
An open addressing hash set consists of an internal array
table
of elements. Apart from the special value
null
, denoting an empty slot, the table must also allow the
special value DELETED
, denoting a slot that has been
deleted. To initialise the table, we first create the internal array of
some initial minimum capacity:
datatype OpenAddressingHashSet implements Set:
= new Array(MIN_CAPACITY)
table size = 0
As already explained, all main methods (contains
,
add
and remove
) use the same probing function
to get the same probe sequence. This is implemented by the method
hashAndProbe
that was shown in the previous section. In
this way, a record not in its home position can always be recovered.
To search for an element we probe the table for a position and check if the slot contains the element we are looking for:
datatype OpenAddressingHashSet:
...contains(elem):
= hashAndProbe(elem)
i return table[i] == elem
When we want to add an element we first have to search for it. If the element isn’t found, the probing returns the index to an empty slot, which we then can assign the element. Now we have to increase the size of the set, and check if we have exceeded the maximum allowed load factor. If the load factor is too large, then we resize the internal table – which was explained in the previous section.
datatype OpenAddressingHashSet:
...add(elem):
= hashAndProbe(elem)
i if table[i] is null:
= elem
table[i] size = size + 1
if loadFactor() > MAX_LOAD_FACTOR:
size * MULTIPLIER) resizeTable(table.
Deleting from an open addressing hash table was explained briefly in the last section and we will go into more details in the next section.
We use the same constants as for the separate chaining hash table, but the values are different. Most importantly, the max load factor must be smaller than 1, since there can only be one value per array slot.
datatype OpenAddressingHashSet:
...= 8
MIN_CAPACITY = 0.3
MIN_LOAD_FACTOR = 0.7
MAX_LOAD_FACTOR = 1.5 MULTIPLIER
12.6.1 Resizing
When we resize the internal table, it is very important that we do not keep the old hash indices for the keys, because they will not be the same after resizing. This is exactly the same as for separate chaining. Instead we save the current internal table to a temporary variable, and reinitialise the table to the new capacity. Then we iterate through all entries in the old table, and simply insert them again into the new resized table.
datatype SeparateChainingHashSet:
...
resizeTable(newCapacity):= table // Remember the old table.
oldTable = new Array(newCapacity) // Reset the internal table.
table size = 0 // Reset the number of elements.
for each elem in oldTable:
if elem is not null and not DELETED:
add elem
Now we can get rid of all the deleted slots, simply by not adding them to the new table.
12.6.2 Implementing maps
If we want to implement a map instead of a set, we have to store both a key and a value in the same slot. This can be done in two ways: either one array of key-value pairs, or two arrays of the same length – one for the keys and another for the values. Here we will use the second approach.
datatype OpenAddressingHashMap implements Map:
= new Array(MIN_CAPACITY)
keys = new Array(MIN_CAPACITY)
values size = 0
The changes that has to be made are these, compared to how we implemented sets:
- when searching in the hash table, we have to use the
keys
array (which was thetable
array for the hash set) - when getting and setting the value, we use the same index in the
values
array
Searching is straightforward:
datatype OpenAddressingHashSet:
...get(key):
= hashAndProbe(key)
i if keys[i] == key:
return values[i]
Setting the value for a key becomes slightly longer, because we have to check if the key already exists or not. If the key doesn’t exist we have to increase the size and check if we need to resize the tables.
datatype OpenAddressingHashSet:
...put(key, value):
= hashAndProbe(key)
i = value
values[i] if keys[i] is null:
= key
keys[i] size = size + 1
if loadFactor() > MAX_LOAD_FACTOR:
size * MULTIPLIER) resizeTable(keys.
Reisizing is very similar to that for sets, we just have to remember
to reset both internal tables (keys
and
values
).