Here are suggested solutions to the exercises about hash tables.
The main advantange with hash tables is that the main operations are all constant time, O(1). BST operations on the other hand are linear, O(n), in the worst case. However, self-balancing BSTs are logarithmic, O(log n) in the worst case, which is almost constant time.
There are two main disadvantages with hash tables:
There are many possible, but here are suggestions (where h(e) is the table index that the hash function points element e to):
Separate chaining: “for every element e in the linked list in table cell k, h(e) = k; also no element in the list occurs twice.”
Linear probing: “for every element e in table cell k, either (1) h(e) = k, or (2) h(e) < k and all cells k ≤ i < h(e) are occupied, or (3) k < h(e) and all cells i ≥ h(e) and all cells i < k are occupied.”
Recall the lecture on open addressing hash tables.
Left as exercise, or see the code in the course book.
See this PHP discussion.
Yes, but it would cause all keys to hash to the same spot, which would lead to poor performance. We discuss good and bad hash functions in the lectures and the lecture videos.
See the S&W website, exercise 2.
The given solution could be generalized to three or more dimensions by dividing the 32 bits of each coordinate into three or more parts and mixing them as before. For example, for coordinates x1, …, xn, we can take xor over i = 1, …, n of xi left rotated by ⌊32 / i⌋.
It’s not the answer itself that is important, but the practice. So these are left as exercises for you, but the following table is probably useful – it shows the hash value for each letter, for different values of m. I assume that A is the 1st letter (not the 0th).
A | E | I | N | O | Q | S | T | U | Y | |
---|---|---|---|---|---|---|---|---|---|---|
i | 1 | 5 | 9 | 14 | 15 | 17 | 19 | 20 | 21 | 25 |
m = 5 | 1 | 0 | 4 | 4 | 0 | 2 | 4 | 0 | 1 | 0 |
m = 16 | 11 | 7 | 3 | 10 | 5 | 11 | 1 | 12 | 7 | 3 |
m = 10 | 1 | 5 | 9 | 4 | 5 | 7 | 9 | 0 | 1 | 5 |
m = 4 | 3 | 3 | 3 | 2 | 1 | 3 | 1 | 0 | 3 | 3 |
m = 8 | 3 | 7 | 3 | 2 | 5 | 3 | 1 | 4 | 7 | 3 |
Most of the following bonus exercises are from the Sedgewick & Wayne book. Some of them can be found at the book website.
See the S&W website, exercise 8.
With replacement means that the same song can be chosen more than once. Each time we play a new song we will have increased the number of songs that are likely to be a duplicate.
Here you can read about the birthday paradox.
See the answer to A8–A10 above.
See the S&W website, exercise 32.
It is easy to verify that “Aa” and “BB” hash to the same hashCode() value (2112). Now, any string of length 2N that is formed by concatenating these two strings together in any order (e.g., BBBBBB, AaAaAa, BBAaBB, AaBBBB) will hash to the same value. Here is a list of 10000 strings with the same hash value.
See the S&W website, exercise 33.
This was done in the hopes of computing the hash function more quickly.
Indeed, the hash values were computed more quickly, but it became more likely that many strings hashed to the same values.
This resulted in a significant degradation in performance on many real-world inputs (e.g., long URLs) which all hashed to the same value, e.g., http://www.cs.princeton.edu/algs4/34hash/*****java.html
.
One example of where it goes wrong: Let’s say we have an empty hash table and three objects we want to insert “hearts”, “diamonds” and “clubs. “Diamonds” and “clubs” have the same hash value and “hearts” hash value is “diamonds -1”.
The problem now is that when we go looking for clubs, it will be positioned before where we start looking.
See Sedgewick & Wayne, page 471, or the Wikipedia article about linear probing.
Write a program that tries to insert the keys in all possible orders. From the results you can fill in this table:
Possible? | Minimal probes | Maximal probes | |
---|---|---|---|
1. E F G A C B D | |||
2. C E B G F D A | |||
3. B D F A C E G | |||
4. C G B A D E F | |||
5. F G B D A C E | |||
6. G E C A D B F |
Left as an exercise.