8.6 Open Addressing
8.6.1 Hash tables without bins
We now turn to the most commonly used form of hashing: open addressing (also called closed hashing) with no bucketing, and a collision resolution policy that can potentially use any slot in the hash table.
Compared to separate chaining, we will now have room for exactly one entry in each table cell. If we want to implement a HashMap (not a HashSet), we then need to be able to put both a key and a value in the same table cell.
This can be done in two ways: either one array of key-value paris, or two arrays of the same length – one for the keys and another for the values. Here we will use the second approach.
Now we can initialise our open addressing hash map. In addition to the keys and values arrays, we need a counter of the size, and the number of deleted cells (which is explained later in this chapter).
class OpenAddressingHashMap implements Map:
OpenAddressingHashMap():this.initialise(minCapacity)
initialise(capacity):this.keys = new Array(capacity)
this.values = new Array(capacity)
this.mapSize = 0
this.numDeleted = 0
We use the same constants as for the separate chaining map, but the
values are different. Most importantly, the maxLoadFactor
must be smaller than 1, since there can only be one value per array
slot.
class OpenAddressingHashMap implements Map:
...= 8
minCapacity = 0.3
minLoadFactor = 0.7
maxLoadFactor = 1.5 capacityMultiplier
Finding a good table index for a key is done in the same way as for separate chaining hash tables:
class OpenAddressingHashMap implements Map:
...hash(key):
= key.hashCode() & 0x7fffffff
h return h % this.keys.size()
8.6.2 Collision Resolution
The goal of collision
resolution is to find a free slot in the hash table when the “home
position” for the record is already occupied. We can view any collision
resolution method as generating a sequence of hash table slots that can
potentially hold the record. The first slot in the sequence will be the
home position for the key. If the home position is occupied, then the
collision resolution policy goes to the next slot in the sequence. If
this is occupied as well, then another slot must be found, and so on.
This sequence of slots is known as the probe
sequence, and it is generated by some probe
function that we will call p (or probe
in the source code). Probing works as follows.
class OpenAddressingHashMap implements Map:
...
hashAndProbe(key):= this.hash(key)
home for i = 0 to this.keys.size()-1:
= (home + this.probe(key, i)) % this.keys.size()
pos = this.keys[pos]
k if k is null or k == key:
return pos
throw error "Hash table full"
probe(key, i):return i // Linear probing
// return i*i // Quadratic probing
The method hashAndProbe
first calculates the home
slot, which is the hash code compressed to an index in the internal
hash array. Then it uses the probe function
to locate a free slot in the table. Function p has two
parameters, the key
and a count
of where in the probe sequence we wish to be. That is, to get the first
position in the probe sequence for key
,
we call
.
For the next slot in the probe sequence, call
,
then
,
etc. If the key is already in the table, hashAndProbe
returns the position of that entry, otherwise it returns the position of
the first unoccupied slot.
Note that the probe function returns an offset from the original home
position, rather than a slot in the hash table. Thus, the
for
loop in hashAndProbe
is computing
positions in the table at each iteration by adding the value returned
from the probe function to the home position. The
th call to p returns the
th offset to be used.
8.6.3 Implementing methods of the hash map
All main methods in the Map interface (put
,
get
and remove
) use the same probing function
p to get the same probe sequence. In this way, a record
not in its home position can be recovered.
An implementation for the get
method is as follows.
class OpenAddressingHashMap implements Map:
...get(key):
= this.hashAndProbe(key)
i if this.keys[i] is null:
return null
else:
return this.values[i]
Searching and inserting both assume that at least one slot on the probe sequence of every key will be empty. Otherwise they will continue in an infinite loop on unsuccessful searches. Thus, the hash system should keep a count of the number of records stored, and make sure to resize the array when it becomes too full.
Setting a value for a key into the hash map works like this.
class OpenAddressingHashMap implements Map:
...put(key, value):
= this.hashAndProbe(key)
i if this.keys[i] is null:
this.keys[i] = key
this.mapSize = this.mapSize + 1
= this.values[i]
old this.values[i] = value
if this.loadFactor() > maxLoadFactor:
this.resizeTable(this.keys.size() * capacityMultiplier)
return old
First we the next available slot for the given key. If the slot is
empty (null
), we insert the key into the keys table, and
increase the map size. Then we update the values table with the new
value. Finally, we resize the table if the load factor becomes too
large.
Deleting from an open addressing hash table is explained later in this chapter.
8.6.4 Linear probing
The simplest approach to collsion resolution is simply to move down the table from the home slot until a free slot is found. This is known as linear probing. The probe function for simple linear probing is . That is, the th offset on the probe sequence is just , meaning that the th step is simply to move down slots in the table. Once the bottom of the table is reached, the probe sequence wraps around to the beginning of the table (since the last step is to mod the result to the table size). Linear probing has the virtue that all slots in the table will be candidates for inserting a new record before the probe sequence returns to the home position.
Can you see any reason why this might not be the best approach to collision resolution?
8.6.5 The Problem with Linear Probing
While linear probing is probably the first idea that comes to mind when considering collision resolution policies, it is not the only one possible. Probe function p allows us many options for how to do collision resolution. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by the next slideshow.
Again, the ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). This tendency of linear probing to cluster items together is known as primary clustering. Small clusters tend to merge into big clusters, making the problem worse.
The problem with primary clustering is that it leads to long probe sequences, which increases execution time. However, linear probing is still a very common probing method, because it is so simple and can be implemented efficiently.