12.13 Review questions

This final section contains some review questions about the contents of this chapter.

12.13.1 Review questions: Hash functions

Answer TRUE or FALSE.

For the string hash functions, the size of the hash table limits the length of the string that can be hashed.

  • All hash functions should return a legal index in the hash table.
  • The length of the string does not change this.

The simple mod hash function makes use of:

  • Think of taking a three-digit number mod 10. What digit is used?

The binning hash function makes use of:

  • Consider what happens with 10 bins and 3-digit numbers in the range 0 to 999.
  • 0 to 99 goes to the first bin, 100 to 199 to the second, and so on.

The mid-squares method hash function makes use of:

  • Mid-squares method will square the key value before pulling out the middle bits.
  • While it uses only the middle bits of the squared value, all of the bits in the original key will influence the result.

For the simple string hash function that sums the ASCII values for the letters, does the order of the letters in the string affect the result?

  • Addition is commutative.

Which is the best definition for collision in a hash table?

  • Collision occurs when two keys try to go to the same slot.

(Recall that mm refers to the size of a hash table.) A hash function must:

  • A hash function should always give you a legal index into the hash table array.

If you double the size of a hash table, can you keep using the same hash function?

  • Will the old hash function go to an illegal index in the new table?
  • Will the old hash function go to all slots of the new table?

Answer TRUE or FALSE.

A company wants to use a system where they randomly assign each customer a 9-digit ID (so there are 1 billion possible ID numbers). Of course, there is a chance that two customers will be assigned the same ID. But the company figures that risk is OK if the odds are small enough. They expect to assign IDs to 1000 customers. The chance of a collision is therefore one in a million.

  • While it is true that the last customer given an ID has about one chance in a million of sharing it with another customer (actually, the chance is 999/1,000,000,000 if 999 IDs are already assigned)…
  • … this logic does not account for the fact that any of the previous 998 assignments might also have caused a collision.
  • So the chance for a collision is rather higher than one in a million.

Answer TRUE or FALSE.

A company wants to use a system where they randomly assign each customer a 9-digit ID (so there are 1 billion possible ID numbers). Of course, there is a chance that two customers will be assigned the same ID. But the company figures that risk is OK if the odds are only one in a million. They expect to assign IDs to 1000 customers. But the chance of collision is much higher than one in a million.

  • While it is true that the last customer given an ID has about one chance in a million of sharing it with another customer (actually, the chance is 999/1,000,000,000 if 999 IDs are already assigned)…
  • … this logic does not account for the fact that any of the previous 998 assignments might also have caused a collision.
  • So the chance for a collision is rather higher than one in a million.

12.13.2 Review questions: Hash tables

Answer TRUE or FALSE.

Hashing has the advantage that it can answer range queries or questions like what is the largest key in the database.

  • Hashing is good only for exact match queries.

Which of the following is true of open addressing?

  • In open addressing, the records are in the table.

Answer TRUE or FALSE.

Separate chaining works well for disk-based hash systems.

  • Managing a bunch of linked lists is not so good on disk.
  • Otherwise, hashing works fine on disk.

Answer TRUE or FALSE.

Open addressing works well for disk-based hash systems.

  • Managing a bunch of linked lists is not so good on disk.
  • Otherwise, hashing works fine on disk.

What is the expected worst-case cost for a search on a properly tuned hash system that stores nn records?

  • If the system is working right, you should only need to look at one slot most of the time, or occasionally a couple of slots.

What is a disadvantage of linear probing?

  • Clusters pile up into bigger clusters.
  • This is called primary clustering.

Which of the following is true of separate chaining?

  • In separate chaining, where are the records stored?
  • They are stored in linked lists outside the table.

A separate chaining table has an array size of nn. What is the maximum number of records that can be stored in the table?

  • In separate chaining, where are the records stored?
  • They are stored in linked lists outside the table.

Which is the best definition for a probe sequence?

  • Probing is part of collision resolution.
  • The probe sequence tells you where to look next.

What is one disadvantage of quadratic probing as a collision resolution method?

  • Usually, the probe sequence for quadratic probing does not visit all slots of the table.

Hashing is good for which queries?

  • Hashing takes a key value and then looks at a particular slot in the table.

The special value used to signal a position in a hash table from which a record has been deleted is called a:

  • Starts with a T.

Answer TRUE or FALSE.

When deleting a record from a hash table, we can just do a normal search and then remove the record from the slot where we find it.

  • If there is an empty slot along a probe path, records further down the path won’t be found on a search.

Answer TRUE or FALSE.

A problem with using tombstones is that they take up slots in the table that cannot be reused, so that over time the effective table size shrinks.

  • Tombstone slots can be resued.

Answer TRUE or FALSE.

A problem with using tombstones is that they take up slots in the table, so they slightly increase the search and insert time.

  • They do take up slots in the table.
  • So they do need to be searched over.

Which is true about deletion from a hash table using tombstones?

  • The tombstones fill some slots of the table.
  • While tombstone slots can be reused, they do have to be searched through.
  • The fraction of hash table slots taken up by tombstones is normally not large, so while there is an increased cost, is is not large.