Data structures and algorithms

Exercise solutions: Hash tables

Here are suggested solutions to the exercises about hash tables.

Core exercises

A1. Advantages and disadvantages of hash tables.

The main advantange with hash tables is that the main operations are all constant time, O(1). BST operations on the other hand are linear, O(n), in the worst case. However, self-balancing BSTs are logarithmic, O(log n) in the worst case, which is almost constant time.

There are two main disadvantages with hash tables:

A2. Suitable invariants.

There are many possible, but here are suggestions (where h(e) is the table index that the hash function points element e to):

A3. Construct an example to show that deletion is tricky.

Recall the lecture on open addressing hash tables.

A4. Implement lazy deletion.

Left as exercise, or see the code in the course book.

A5. Early versions of PHP.

See this PHP discussion.

Yes, but it would cause all keys to hash to the same spot, which would lead to poor performance. We discuss good and bad hash functions in the lectures and the lecture videos.

A7. Implement hashCode and equals for points

See the S&W website, exercise 2.

The given solution could be generalized to three or more dimensions by dividing the 32 bits of each coordinate into three or more parts and mixing them as before. For example, for coordinates x1, …, xn, we can take xor over i = 1, …, n of xi left rotated by ⌊32 / i⌋.

A8–A10. Insert E A S Y Q U T I O N.

It’s not the answer itself that is important, but the practice. So these are left as exercises for you, but the following table is probably useful – it shows the hash value for each letter, for different values of m. I assume that A is the 1st letter (not the 0th).

AEINOQSTUY
i15914151719202125
m = 51044024010
m = 1611731051111273
m = 101594579015
m = 43332131033
m = 83732531473

Bonus exercises

Most of the following bonus exercises are from the Sedgewick & Wayne book. Some of them can be found at the book website.

B1. Birthday paradox

See the S&W website, exercise 8.

With replacement means that the same song can be chosen more than once. Each time we play a new song we will have increased the number of songs that are likely to be a duplicate.

Here you can read about the birthday paradox.

B2 Insert E A S Y Q U T I O N.

See the answer to A8–A10 above.

B3. Hash attack

See the S&W website, exercise 32.

It is easy to verify that “Aa” and “BB” hash to the same hashCode() value (2112). Now, any string of length 2N that is formed by concatenating these two strings together in any order (e.g., BBBBBB, AaAaAa, BBAaBB, AaBBBB) will hash to the same value. Here is a list of 10000 strings with the same hash value.

B4. Bad hash functions

See the S&W website, exercise 33.

This was done in the hopes of computing the hash function more quickly. Indeed, the hash values were computed more quickly, but it became more likely that many strings hashed to the same values. This resulted in a significant degradation in performance on many real-world inputs (e.g., long URLs) which all hashed to the same value, e.g., http://www.cs.princeton.edu/algs4/34hash/*****java.html.

B5. Avoiding lazy deletion

One example of where it goes wrong: Let’s say we have an empty hash table and three objects we want to insert “hearts”, “diamonds” and “clubs. “Diamonds” and “clubs” have the same hash value and “hearts” hash value is “diamonds -1”.

The problem now is that when we go looking for clubs, it will be positioned before where we start looking.

See Sedgewick & Wayne, page 471, or the Wikipedia article about linear probing.

B6. Insert A–G in some order.

Write a program that tries to insert the keys in all possible orders. From the results you can fill in this table:

Possible?Minimal probesMaximal probes
1. E F G A C B D
2. C E B G F D A
3. B D F A C E G
4. C G B A D E F
5. F G B D A C E
6. G E C A D B F

B7. (S&W exercise 3.4.4).

Left as an exercise.