Data Structures and Algorithms

Chapter 9 Hash Tables

«  9.6. Bucket Hashing (optional)   ::   Contents   ::   9.8. Improved Collision Resolution  »

9.7. Open Addressing

9.7.1. Hash tables without bins

We now turn to the most commonly used form of hashing: open addressing (also called closed hashing) with no bucketing, and a collision resolution policy that can potentially use any slot in the hash table.

Compared to separate chaining, we will now have room for exactly one entry in each table cell. If we want to implement a HashMap (not a HashSet), we then need to be able to put both a key and a value in the same table cell. For this we need a class of key-value pairs.

class KVPair<K, V> {
    K key;
    V value;
    KVPair(K key, V value) {
        this.key = key;
        this.value = value;
    }
}
class KVPair:
    def __init__(self, key, value):
        self.key = key
        self.value = value

Now we can initialise our open addressing hash map. In addition to the array of key-value pairs, we need a counter of the size, and a counter of the number of deleted cells (which is explained later in this chapter).

class OpenAddressingHashMap<K, V> implements Map<K, V> {
    KVPair<K,V>[] internalTable;
    int mapSize;
    int numDeleted;

    OpenAddressingHashMap() {
        initialiseTable(MinCapacity);
    }

    @SuppressWarnings("unchecked")
    private void initialiseTable(int capacity) {
        internalTable = (KVPair<K,V>[]) new KVPair[capacity];
        mapSize = 0;
        numDeleted = 0;
    }
class OpenAddressingHashMap(Map):

    def __init__(self):
        self._initialiseTable(self._minCapacity)

    def _initialiseTable(self, capacity):
        self._internalTable = [None] * capacity
        self._mapSize = 0
        self._numDeleted = 0

We use the same constants as for the separate chaining map, but the values are different. Most importantly, the MaxLoadFactor must be smaller than 1, since there can only be one value per array slot.

    static int MinCapacity = 8;
    static double MinLoadFactor = 0.3;
    static double MaxLoadFactor = 0.7;
    static double CapacityMultiplier = 1.5;
    _minCapacity = 8
    _minLoadFactor = 0.3
    _maxLoadFactor = 0.7
    _capacityMultiplier = 1.5

9.7.2. Collision Resolution

The goal of collision resolution is to find a free slot in the hash table when the “home position” for the record is already occupied. We can view any collision resolution method as generating a sequence of hash table slots that can potentially hold the record. The first slot in the sequence will be the home position for the key. If the home position is occupied, then the collision resolution policy goes to the next slot in the sequence. If this is occupied as well, then another slot must be found, and so on. This sequence of slots is known as the probe sequence, and it is generated by some probe function that we will call p (or probe in the source code). Probing works as follows.

    private int hashAndProbe(K key) {
        int home = key.hashCode() & 0x7fffffff;   // Turn the hash code into a positive integer
        for (int i = 0; i < internalTable.length; i++) {
            int pos = (home + probe(key, i)) % internalTable.length;
            KVPair<K,V> elem = internalTable[pos];
            if (elem == null || key.equals(elem.key))
                return pos;
        }
        throw new IllegalStateException("Hash table full");
    }

    private int probe(K key, int i) {
        return i;        // Linear probing
        // return i*i;   // Quadratic probing
    }
    def _hashAndProbe(self, key):
        raise NotImplementedError("Left as exercise for the Python hacker")

The method hashAndProbe first calculates the home slot, which is the hash code compressed to an index in the internal hash array. Then it uses the probe function \(\textbf{p}(k, i)\) to locate a free slot in the table. Function p has two parameters, the key \(k\) and a count \(i\) of where in the probe sequence we wish to be. That is, to get the first position in the probe sequence for key \(K\), we call \(\textbf{p}(K, 0)\). For the next slot in the probe sequence, call \(\textbf{p}(K, 1)\), then \(\textbf{p}(K, 2)\), etc. If the key is already in the table, hashAndProbe returns the position of that entry, otherwise it returns the position of the first unoccupied slot.

Note that the probe function returns an offset from the original home position, rather than a slot in the hash table. Thus, the for loop in hashAndProbe is computing positions in the table at each iteration by adding the value returned from the probe function to the home position. The \(i\) th call to p returns the \(i\) th offset to be used.
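One possible Python solution to the _hashAndProbe exercise above, written as a free function over a plain list so that the sketch is self-contained (the KVPair class is repeated from earlier in this section). It mirrors the Java code: compute the home position, then add successive probe offsets until an empty slot or a matching key is found.

```python
class KVPair:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def probe(key, i):
    return i          # Linear probing
    # return i * i    # Quadratic probing

def hash_and_probe(table, key):
    home = hash(key) & 0x7FFFFFFF   # Turn the hash code into a non-negative integer
    for i in range(len(table)):
        pos = (home + probe(key, i)) % len(table)
        elem = table[pos]
        if elem is None or elem.key == key:
            return pos
    raise RuntimeError("Hash table full")
```

Note that if the key is already stored somewhere on its probe sequence, the function returns that position rather than a new empty slot, exactly as in the Java version.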

9.7.3. Implementing methods of the hash map

All main methods in the Map interface (put, get and remove) use the same probing function p to get the same probe sequence. In this way, a record not in its home position can be recovered.

An implementation for the get method is as follows.

    public V get(K key) {
        int i = hashAndProbe(key);
        KVPair<K,V> elem = internalTable[i];
        return elem == null ? null : elem.value;
    }
    def get(self, key):
        i = self._hashAndProbe(key)
        elem = self._internalTable[i]
        return None if elem is None else elem.value

Searching and inserting both assume that at least one slot on the probe sequence of every key will be empty. Otherwise they will continue in an infinite loop on unsuccessful searches. Thus, the hash system should keep a count of the number of records stored, and make sure to resize the array when it becomes too full.
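When the table does grow, the entries cannot simply be copied over, because every position depends on the table size. A sketch of a resize step, assuming the linear probing scheme used throughout this section, is to allocate a larger array and re-insert every existing entry:

```python
class KVPair:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def resize_table(table, new_capacity):
    # Re-insert every entry into a fresh array: positions change
    # because the modulus (the table size) changes.
    new_table = [None] * new_capacity
    for elem in table:
        if elem is None:
            continue
        home = hash(elem.key) & 0x7FFFFFFF
        for i in range(new_capacity):
            pos = (home + i) % new_capacity   # linear probing
            if new_table[pos] is None:
                new_table[pos] = elem
                break
    return new_table
```
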

Setting a value for a key into the hash map works like this.

    public V put(K key, V value) {
        int i = hashAndProbe(key);
        KVPair<K,V> elem = internalTable[i];
        V old = null;
        if (elem == null) {
            internalTable[i] = new KVPair<>(key, value);
            mapSize++;
        } else {
            old = elem.value;
            elem.value = value;
        }
        if (loadFactor() > MaxLoadFactor)
            resizeTable((int) (internalTable.length * CapacityMultiplier));
        return old;
    }
    def put(self, key, value):
        raise NotImplementedError("Left as exercise for the Python hacker")

First we find the next available slot for the given key. If the slot is empty (null), we create a new KVPair with the key and value, insert it into the table, and increase the map size. Otherwise we update the value of the existing entry, which does not change the size of the map. Finally, we resize the table if the load factor becomes too large.
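One possible Python solution to the put exercise, written as free functions over a plain list so the sketch is self-contained; resizing is omitted here, and the helper repeats the linear probing logic from earlier in this section.

```python
class KVPair:
    def __init__(self, key, value):
        self.key = key
        self.value = value

def hash_and_probe(table, key):
    home = hash(key) & 0x7FFFFFFF
    for i in range(len(table)):
        pos = (home + i) % len(table)   # linear probing
        if table[pos] is None or table[pos].key == key:
            return pos
    raise RuntimeError("Hash table full")

def put(table, key, value):
    i = hash_and_probe(table, key)
    if table[i] is None:
        table[i] = KVPair(key, value)   # new entry
        return None
    old = table[i].value                # existing entry: update in place
    table[i].value = value
    return old
```

As in the Java version, put returns the previous value for the key, or None if the key was not present before.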

Deleting from an open addressing hash table is explained in a later module.

9.7.4. Linear probing

The simplest approach to collision resolution is to move down the table from the home slot until a free slot is found. This is known as linear probing. The probe function for simple linear probing is \(\textbf{p}(K, i) = i\). That is, the \(i\) th offset on the probe sequence is just \(i\), meaning that the \(i\) th step is simply to move down \(i\) slots in the table. Once the bottom of the table is reached, the probe sequence wraps around to the beginning of the table (since the last step is to mod the result to the table size). Linear probing has the virtue that all slots in the table will be candidates for inserting a new record before the probe sequence returns to the home position.
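The wraparound behaviour, and the claim that every slot eventually becomes a candidate, can be checked directly: for any home position, the linear probe sequence visits every table position exactly once. A small sketch (table size 8 and home position 5 are arbitrary choices for the example):

```python
def probe(key, i):
    return i   # linear probing: the i-th offset is just i

def probe_sequence(home, table_size):
    # All positions visited, starting from the home slot and
    # wrapping around via the modulus.
    return [(home + probe(None, i)) % table_size for i in range(table_size)]

print(probe_sequence(5, 8))   # [5, 6, 7, 0, 1, 2, 3, 4]
```
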


Can you see any reason why this might not be the best approach to collision resolution?

9.7.4.1. The Problem with Linear Probing

While linear probing is probably the first idea that comes to mind when considering collision resolution policies, it is not the only one possible. Probe function p allows us many options for how to do collision resolution. In fact, linear probing is one of the worst collision resolution methods. The main problem is illustrated by the next slideshow.

[Interactive slideshow: primary clustering under linear probing]

Again, the ideal behavior for a collision resolution mechanism is that each empty slot in the table will have equal probability of receiving the next record inserted (assuming that every slot in the table has equal probability of being hashed to initially). This tendency of linear probing to cluster items together is known as primary clustering. Small clusters tend to merge into big clusters, making the problem worse.

The problem with primary clustering is that it leads to long probe sequences, which increases execution time. However, linear probing is still a very common probing method, because it is so simple and can be implemented efficiently.
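A small deterministic sketch of the effect: once a run of occupied slots forms, any key whose home position lands anywhere in the run is appended to the run's end, so the cluster keeps growing (the table size and home positions are arbitrary choices for the example).

```python
def insert(table, home):
    # Linear probing insert; stores the home position as a marker.
    for i in range(len(table)):
        pos = (home + i) % len(table)
        if table[pos] is None:
            table[pos] = home
            return pos
    raise RuntimeError("Hash table full")

table = [None] * 10
for home in [3, 4, 3, 5, 4]:   # all homes fall in one small region
    insert(table, home)

occupied = [i for i, x in enumerate(table) if x is not None]
print(occupied)   # [3, 4, 5, 6, 7] -- one contiguous cluster
```

Five insertions with only three distinct home positions produce a single cluster of five slots; any future key hashing to positions 3 through 7 will probe all the way to slot 8, lengthening the cluster further.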
