10 Sets and maps

Many programming tasks involve finding the right piece of information in a large dataset. That is, we have a collection of items, and we want to quickly retrieve the items matching certain criteria. Here are some examples of information retrieval problems:

These problems can all be addressed using two abstract data types (ADTs): the set and the map. Both provide efficient ways to manage a collection of elements, supporting operations to find, add, and remove elements.

In this section, we’ll explore what sets and maps are, how they work, and how they can be applied to solve the four example problems introduced above.

You may have already used sets and maps in programming, because almost every programming language provides an implementation for them. For example, Java provides the HashSet and HashMap classes, and Python provides sets and dictionaries (another word for maps) as part of their standard libraries.

Sets and maps are useful in a huge variety of computer programs, and are perhaps the most useful of all data structures. But how can we design a class that implements a set or a map, in such a way that adding, removing and searching can be done efficiently? In this book we will see several different ways of implementing sets and maps.

Balanced BSTs and hash tables are the main ways that sets and maps are implemented in practice. Almost every programming language provides sets and maps as a built-in feature, based on one of these technologies. For example, Java’s HashSet, HashMap, TreeSet and TreeMap, and Python’s sets and dictionaries. By the end of this book you will understand how all of these work.