DSABook – Open Addressing

8.6 Open Addressing

8.6.1 Hash tables without bins

We now turn to the most commonly used form of hashing: open addressing (also called closed hashing) with no bucketing, and a collision resolution policy that can potentially use any slot in the hash table.

Compared to separate chaining, we will now have room for exactly one entry in each table cell. If we want to implement a HashMap (not a HashSet), we then need to be able to put both a key and a value in the same table cell.

This can be done in two ways: either one array of key-value paris, or two arrays of the same length – one for the keys and another for the values. Here we will use the second approach.

Now we can initialise our open addressing hash map. In addition to the keys and values arrays, we need a counter of the size, and the number of deleted cells (which is explained later in this chapter).

class OpenAddressingHashMap implements Map:
    OpenAddressingHashMap():
        this.initialise(minCapacity)

    initialise(capacity):
        this.keys = new Array(capacity)
        this.values = new Array(capacity)
        this.mapSize = 0
        this.numDeleted = 0

We use the same constants as for the separate chaining map, but the values are different. Most importantly, the maxLoadFactor must be smaller than 1, since there can only be one value per array slot.

class OpenAddressingHashMap implements Map:
    ...
    minCapacity = 8
    minLoadFactor = 0.3
    maxLoadFactor = 0.7
    capacityMultiplier = 1.5

Finding a good table index for a key is done in the same way as for separate chaining hash tables:

class OpenAddressingHashMap implements Map:
    ...
    hash(key):
        h = key.hashCode() & 0x7fffffff
        return h % this.keys.size()

8.6.2 Collision Resolution

The goal of collision resolution is to find a free slot in the hash table when the “home position” for the record is already occupied. We can view any collision resolution method as generating a sequence of hash table slots that can potentially hold the record. The first slot in the sequence will be the home position for the key. If the home position is occupied, then the collision resolution policy goes to the next slot in the sequence. If this is occupied as well, then another slot must be found, and so on. This sequence of slots is known as the probe sequence, and it is generated by some probe function that we will call p (or probe in the source code). Probing works as follows.

class OpenAddressingHashMap implements Map:
    ...
    hashAndProbe(key):
        home = this.hash(key)
        for i = 0 to this.keys.size()-1:
            pos = (home + this.probe(key, i)) % this.keys.size()
            k = this.keys[pos]
            if k is null or k == key:
                return pos
        throw error "Hash table full"

    probe(key, i):
        return i   // Linear probing
        // return i*i  // Quadratic probing

The method hashAndProbe first calculates the home slot, which is the hash code compressed to an index in the internal hash array. Then it uses the probe function $\textbf{p}(k, i)$ to locate a free slot in the table. Function p has two parameters, the key $k$ and a count $i$ of where in the probe sequence we wish to be. That is, to get the first position in the probe sequence for key $K$ , we call $\textbf{p}(K, 0)$ . For the next slot in the probe sequence, call $\textbf{p}(K, 1)$ , then $\textbf{p}(K, 2)$ , etc. If the key is already in the table, hashAndProbe returns the position of that entry, otherwise it returns the position of the first unoccupied slot.

Note that the probe function returns an offset from the original home position, rather than a slot in the hash table. Thus, the for loop in hashAndProbe is computing positions in the table at each iteration by adding the value returned from the probe function to the home position. The $i$ th call to p returns the $i$ th offset to be used.

8.6.3 Implementing methods of the hash map

All main methods in the Map interface (put, get and remove) use the same probing function p to get the same probe sequence. In this way, a record not in its home position can be recovered.

An implementation for the get method is as follows.

class OpenAddressingHashMap implements Map:
    ...
    get(key):
        i = this.hashAndProbe(key)
        if this.keys[i] is null:
            return null
        else:
            return this.values[i]

Searching and inserting both assume that at least one slot on the probe sequence of every key will be empty. Otherwise they will continue in an infinite loop on unsuccessful searches. Thus, the hash system should keep a count of the number of records stored, and make sure to resize the array when it becomes too full.

Setting a value for a key into the hash map works like this.

class OpenAddressingHashMap implements Map:
    ...
    put(key, value):
        i = this.hashAndProbe(key)
        if this.keys[i] is null:
            this.keys[i] = key
            this.mapSize = this.mapSize + 1
        old = this.values[i]
        this.values[i] = value
        if this.loadFactor() > maxLoadFactor:
            this.resizeTable(this.keys.size() * capacityMultiplier)
        return old

First we the next available slot for the given key. If the slot is empty (null), we insert the key into the keys table, and increase the map size. Then we update the values table with the new value. Finally, we resize the table if the load factor becomes too large.

Deleting from an open addressing hash table is explained later in this chapter.

8.6.4 Linear probing

The simplest approach to collsion resolution is simply to move down the table from the home slot until a free slot is found. This is known as linear probing. The probe function for simple linear probing is $\textbf{p}(K, i) = i$ . That is, the $i$ th offset on the probe sequence is just $i$ , meaning that the $i$ th step is simply to move down $i$ slots in the table. Once the bottom of the table is reached, the probe sequence wraps around to the beginning of the table (since the last step is to mod the result to the table size). Linear probing has the virtue that all slots in the table will be candidates for inserting a new record before the probe sequence returns to the home position.

Can you see any reason why this might not be the best approach to collision resolution?

8.6.5 The Problem with Linear Probing

While linear probing is probably the first idea that comes to mind when considering collision resolution policies, it is not the only one possible. Probe function p allows us many options for how to do collision resolution. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by the next slideshow.

Again, the ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). This tendency of linear probing to cluster items together is known as primary clustering. Small clusters tend to merge into big clusters, making the problem worse.

The problem with primary clustering is that it leads to long probe sequences, which increases execution time. However, linear probing is still a very common probing method, because it is so simple and can be implemented efficiently.