1.4 Mathematical preliminaries

This section presents the mathematical preliminaries assumed to be familiar to the reader. It serves as a review and reference, allowing you to revisit relevant sections when encountering unfamiliar notation or mathematical techniques in later chapters. If you’re comfortable with these preliminaries, you can safely skip ahead to the next section.

1.4.1 Sets

The concept of a set in the mathematical sense has wide application in computer science. The notations and techniques of set theory are commonly used when describing and implementing algorithms because the abstractions associated with sets often help to clarify and simplify algorithm design.

A set is a collection of distinguishable members or elements. The members are typically drawn from some larger population known as the base type. Each member of a set is either a primitive element of the base type or is a set itself. There is no concept of duplication in a set. Each value from the base type is either in the set or not in the set. For example, a set named $\mathbf{P}$ might consist of the three integers 7, 11, and 42. In this case, $\mathbf{P}$'s members are 7, 11, and 42, and the base type is integer.

The following table shows the symbols commonly used to express sets and their relationships.

| Notation | Meaning |
| --- | --- |
| $\{1, 4\}$ | A set composed of the members 1 and 4 |
| $\{x \mid x \text{ is prime}\}$ | A set definition using a set comprehension (here, the set of all primes) |
| $x \in \mathbf{P}$ | $x$ is a member of $\mathbf{P}$ |
| $x \notin \mathbf{P}$ | $x$ is not a member of $\mathbf{P}$ |
| $\emptyset$ | The null or empty set |
| $\lvert \mathbf{P} \rvert$ | Cardinality: the size of $\mathbf{P}$, i.e., its number of members |
| $\mathbf{P} \subseteq \mathbf{Q}$ | $\mathbf{P}$ is included in $\mathbf{Q}$; $\mathbf{P}$ is a subset of $\mathbf{Q}$ |
| $\mathbf{P} \supseteq \mathbf{Q}$ | $\mathbf{Q}$ is included in $\mathbf{P}$; $\mathbf{P}$ is a superset of $\mathbf{Q}$ |
| $\mathbf{P} \cup \mathbf{Q}$ | Union: all elements appearing in $\mathbf{P}$ or $\mathbf{Q}$ |
| $\mathbf{P} \cap \mathbf{Q}$ | Intersection: all elements appearing in both $\mathbf{P}$ and $\mathbf{Q}$ |
| $\mathbf{P} - \mathbf{Q}$ | Difference: all elements of $\mathbf{P}$ not in $\mathbf{Q}$ |
| $\mathbf{P} \times \mathbf{Q}$ | (Cartesian) product: all possible pairs $(x, y)$ with $x \in \mathbf{P}$ and $y \in \mathbf{Q}$ |
| $2^{\mathbf{P}}$ or $\mathcal{P}(\mathbf{P})$ | Powerset: all possible subsets of $\mathbf{P}$ |

Here are some examples of this notation in use. First define two sets, $\mathbf{P}$ and $\mathbf{Q}$.

$$\mathbf{P} = \{2, 3, 5\}, \qquad \mathbf{Q} = \{5, 10\}.$$

$|\mathbf{P}| = 3$ (because $\mathbf{P}$ has three members) and $|\mathbf{Q}| = 2$ (because $\mathbf{Q}$ has two members). Both of these sets are finite. Other sets can be infinite, for example, the set of integers.

The union of $\mathbf{P}$ and $\mathbf{Q}$, written $\mathbf{P} \cup \mathbf{Q}$, is the set of elements in either $\mathbf{P}$ or $\mathbf{Q}$, which is $\{2, 3, 5, 10\}$. The intersection of $\mathbf{P}$ and $\mathbf{Q}$, written $\mathbf{P} \cap \mathbf{Q}$, is the set of elements that appear in both $\mathbf{P}$ and $\mathbf{Q}$, which is $\{5\}$. The set difference of $\mathbf{P}$ and $\mathbf{Q}$, written $\mathbf{P} - \mathbf{Q}$, is the set of elements that occur in $\mathbf{P}$ but not in $\mathbf{Q}$, which is $\{2, 3\}$. Note that $\mathbf{P} \cup \mathbf{Q} = \mathbf{Q} \cup \mathbf{P}$ and that $\mathbf{P} \cap \mathbf{Q} = \mathbf{Q} \cap \mathbf{P}$, but in general $\mathbf{P} - \mathbf{Q} \neq \mathbf{Q} - \mathbf{P}$. In this example, $\mathbf{Q} - \mathbf{P} = \{10\}$. Finally, the set $\{5, 3, 2\}$ is indistinguishable from set $\mathbf{P}$, because sets have no concept of order. Likewise, set $\{2, 3, 2, 5\}$ is also indistinguishable from $\mathbf{P}$, because sets have no concept of duplicate elements.
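These operations map directly onto Python's built-in `set` type; a quick sketch using the example sets:

```python
# The example sets from the text.
P = {2, 3, 5}
Q = {5, 10}

assert P | Q == {2, 3, 5, 10}   # union
assert P & Q == {5}             # intersection
assert P - Q == {2, 3}          # difference
assert Q - P == {10}            # difference is not symmetric
assert {5, 3, 2} == P           # sets are unordered
assert {2, 3, 2, 5} == P        # and ignore duplicates
assert len(P) == 3              # cardinality |P|
```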

The set product or Cartesian product of two sets, written $\mathbf{P} \times \mathbf{Q}$, is a set of ordered pairs. For our example sets, the set product would be:

$$\{(2, 5),\ (2, 10),\ (3, 5),\ (3, 10),\ (5, 5),\ (5, 10)\}.$$

The powerset of a set $\mathbf{P}$ (denoted $2^{\mathbf{P}}$ or $\mathcal{P}(\mathbf{P})$) is the set of all possible subsets of $\mathbf{P}$. For our example set $\mathbf{P}$, the powerset would be:

$$\{\emptyset,\ \{2\},\ \{3\},\ \{5\},\ \{2, 3\},\ \{2, 5\},\ \{3, 5\},\ \{2, 3, 5\}\}.$$
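Both the product and the powerset can be computed with Python's `itertools`; a sketch (the elements are sorted only to make the construction deterministic):

```python
from itertools import chain, combinations, product

P = {2, 3, 5}
Q = {5, 10}

# Cartesian product P x Q: all ordered pairs (x, y).
prod = set(product(sorted(P), sorted(Q)))
assert len(prod) == len(P) * len(Q)          # 3 * 2 = 6 pairs
assert (5, 5) in prod and (5, 10) in prod

# Powerset of P: subsets of every size from 0 to |P|.
powerset = [set(c) for c in chain.from_iterable(
    combinations(sorted(P), r) for r in range(len(P) + 1))]
assert len(powerset) == 2 ** len(P)          # hence the notation 2^P
assert set() in powerset and {2, 3, 5} in powerset
```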

A collection of elements without a specific order, similar to a set, but allowing multiple occurrences of each element, is called a bag. A bag is also known as a multiset and an element that appears more than once is called a duplicate.

A sequence is an ordered collection of elements of the same type that may include duplicates. A sequence can, for example, be implemented as a list, an array, or a vector. We can access elements in a sequence using zero-based indexing, where index 0 refers to the first element, 1 to the second, and so on. We will use square brackets $[\,]$ to enclose the elements of a sequence. For example, $[3, 4, 5, 4]$ is a sequence. Note that sequence $[3, 5, 4, 4]$ is distinct from sequence $[3, 4, 5, 4]$, and both are distinct from sequence $[3, 4, 5]$.

1.4.2 Relations

A binary relation $R$ over set $\mathbf{S}$ is a set of ordered pairs from $\mathbf{S}$, i.e., $R \subseteq \mathbf{S} \times \mathbf{S}$. As an example of a relation, if $\mathbf{S}$ is $\{a, b, c\}$, then

$$\{(a, c),\ (b, c),\ (c, b)\}$$

is a relation, and

$$\{(a, a),\ (a, c),\ (b, b),\ (b, c),\ (c, c)\}$$

is a different relation. If tuple $(x, y)$ is in relation $R$, we may use the infix notation $x\,R\,y$. We often use relations, such as the less-than operator ($<$) on the natural numbers, which includes ordered pairs such as $(1, 3)$ and $(2, 23)$, but not $(3, 2)$ or $(2, 2)$. Rather than writing the relationship in terms of ordered pairs, we typically use an infix notation for such relations, writing $1 < 3$.

The most important properties of relations are as follows. Here $R$ is a binary relation over a set $\mathbf{S}$, and each condition must hold for all $a, b, c \in \mathbf{S}$.

  • $R$ is reflexive if $a\,R\,a$.
  • $R$ is irreflexive if $a\,R\,a$ is never true.
  • $R$ is symmetric if whenever $a\,R\,b$, then $b\,R\,a$.
  • $R$ is antisymmetric if whenever $a\,R\,b$ and $b\,R\,a$, then $a = b$.
  • $R$ is transitive if whenever $a\,R\,b$ and $b\,R\,c$, then $a\,R\,c$.

As examples, for the natural numbers, $<$ is irreflexive (because $a < a$ is never true), antisymmetric (because there is no case where $a < b$ and $b < a$), and transitive. Relation $\leq$ is reflexive, antisymmetric, and transitive. Relation $=$ is reflexive, symmetric (and antisymmetric!), and transitive. For people, the relation "is a sibling of" is symmetric and transitive. If we define a person to be a sibling of themself, then it is reflexive; if we define a person not to be a sibling of themself, then it is irreflexive.
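On a finite set these properties can be checked exhaustively. A small Python sketch (the helper names are ours, not standard library functions):

```python
def is_reflexive(R, S):
    # aRa holds for every a in S
    return all((a, a) in R for a in S)

def is_symmetric(R, S):
    # whenever aRb, also bRa
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R, S):
    # aRb and bRa only when a == b
    return all(not ((a, b) in R and (b, a) in R) or a == b
               for a in S for b in S)

def is_transitive(R, S):
    # whenever aRb and bRc, also aRc
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)

S = {1, 2, 3}
less = {(a, b) for a in S for b in S if a < b}
leq = {(a, b) for a in S for b in S if a <= b}

assert not is_reflexive(less, S) and is_reflexive(leq, S)
assert is_antisymmetric(less, S) and is_antisymmetric(leq, S)
assert is_transitive(less, S) and is_transitive(leq, S)
assert is_symmetric({(1, 2), (2, 1)}, S)
```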

Equivalence relations

$R$ is an equivalence relation on set $\mathbf{S}$ if it is reflexive, symmetric, and transitive. An equivalence relation can be used to partition a set into equivalence classes. If two elements $a$ and $b$ are equivalent to each other, we write $a \equiv b$.

A partition of a set $\mathbf{S}$ is a collection of subsets that are disjoint from each other and whose union is $\mathbf{S}$. There is a close correspondence between equivalence relations and partitions: an equivalence relation on a set $\mathbf{S}$ partitions the set into disjoint subsets, the equivalence classes. The converse is also true: given a partition of a set $\mathbf{S}$, we can define a corresponding equivalence relation by declaring two elements equivalent if and only if they belong to the same subset of the partition.

An obvious example of an equivalence relation is the relation $=$ on the natural numbers. Examples on real-life objects are "has the same colour as" or "is as heavy as". And if we look at strings, "has the same length as" is an equivalence relation too.
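For instance, grouping strings by length turns the "has the same length" relation into its corresponding partition; a Python sketch:

```python
from collections import defaultdict

# "has the same length" is an equivalence relation on strings;
# grouping by length yields the corresponding partition.
words = ["a", "to", "be", "cat", "dog", "x"]
classes = defaultdict(list)
for w in words:
    classes[len(w)].append(w)

partition = list(classes.values())
# The classes are disjoint and their union is the whole set.
assert sum(len(c) for c in partition) == len(words)
assert {w for c in partition for w in c} == set(words)
```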

Partial orders

A binary relation is called a partial order if it is antisymmetric and transitive. If the relation is also reflexive, it is called a non-strict partial order. If the relation is irreflexive, it is called a strict partial order. The set on which the partial order is defined is called a partially ordered set or a poset. Elements $x$ and $y$ of a set are comparable under a given relation $R$ if either $x\,R\,y$ or $y\,R\,x$. If every pair of distinct elements in a partial order is comparable, then the order is called a total order or linear order.

Example: Less-than

For the integers, relations $<$ and $\leq$ define partial orders. Relation $<$ is a total order because, for every pair of integers $x$ and $y$ such that $x \neq y$, either $x < y$ or $y < x$. Likewise, $\leq$ is a total order because, for every pair of integers $x$ and $y$ such that $x \neq y$, either $x \leq y$ or $y \leq x$.

Example: Subsets

For the powerset of the integers, the subset operator defines a partial order (because it is antisymmetric and transitive). For example, $\{1, 2\} \subseteq \{1, 2, 3\}$. However, sets $\{1, 2\}$ and $\{1, 3\}$ are not comparable by the subset operator, because neither is a subset of the other. Therefore, the subset operator does not define a total order on the powerset of the integers.

1.4.3 Miscellaneous notations

We now define several mathematical terms and concepts, providing a reference for future use.

Units of measure:

We will use the following notation for units of measure. "B" will be used as an abbreviation for bytes, "b" for bits, "KB" for kilobytes ($2^{10} = 1024$ bytes), "MB" for megabytes ($2^{20}$ bytes), "GB" for gigabytes ($2^{30}$ bytes), and "ms" for milliseconds (a millisecond is 1/1000 of a second). Spaces are not placed between the number and the unit abbreviation when a power of two is intended. Thus a disk drive of size 25 gigabytes (where a gigabyte is intended as $2^{30}$ bytes) will be written as "25GB". Spaces are used when a decimal value is intended. An amount of 2000 bits would therefore be written "2 Kb", while "2Kb" represents 2048 bits. 2000 milliseconds is written as 2000 ms. Note that in this book large amounts of storage are nearly always measured in powers of two and times in powers of ten.

Permutations:

A permutation of a sequence $\mathbf{S}$ is simply the members of $\mathbf{S}$ arranged in some order. For example, a permutation of the integers 1 through $n$ would be those values arranged in some order. If the sequence contains $n$ distinct members, then there are $n!$ different permutations for the sequence. This is because there are $n$ choices for the first member in the permutation; for each choice of first member there are $n - 1$ choices for the second member, and so on. Sometimes one would like to obtain a random permutation for a sequence, that is, one of the $n!$ possible permutations is selected in such a way that each permutation has equal probability of being selected.
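One standard way to generate a random permutation is the Fisher-Yates shuffle, which is essentially what Python's `random.shuffle` does; a sketch:

```python
import math
import random

def random_permutation(seq):
    """Return a random permutation of seq using the Fisher-Yates
    shuffle: each of the n! orderings is equally likely, assuming
    a uniform underlying random number generator."""
    a = list(seq)
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)   # choose a position in a[0..i]
        a[i], a[j] = a[j], a[i]    # swap the chosen element into place
    return a

p = random_permutation(range(1, 6))
assert sorted(p) == [1, 2, 3, 4, 5]   # same members, possibly reordered

# There are n! permutations; for n = 3 that is 6.
assert math.factorial(3) == 6
```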

Logic Notation:

We will occasionally make use of the notation of symbolic logic.

  • $A \Rightarrow B$ means "if $A$ then $B$", i.e., $A$ implies $B$.
  • $A \Leftrightarrow B$ means "$A$ if and only if $B$", i.e., $A$ and $B$ are equivalent.
  • $A \vee B$ means "$A$ or $B$", the disjunction of $A$ and $B$.
  • $A \wedge B$ means "$A$ and $B$", the conjunction of $A$ and $B$.
  • $\neg A$ means "not $A$", the negation of $A$.

Floor and ceiling:

The floor of $x$ (written $\lfloor x \rfloor$) takes real value $x$ and returns the greatest integer $\leq x$. For example, $\lfloor 3.4 \rfloor = 3$, as does $\lfloor 3.0 \rfloor$, while $\lfloor -3.4 \rfloor = -4$ and $\lfloor -3.0 \rfloor = -3$. The ceiling of $x$ (written $\lceil x \rceil$) takes real value $x$ and returns the least integer $\geq x$. For example, $\lceil 3.4 \rceil = 4$, as does $\lceil 4.0 \rceil$, while $\lceil -3.4 \rceil = \lceil -3.0 \rceil = -3$.
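These definitions can be checked directly with Python's standard `math` module:

```python
import math

# math.floor and math.ceil implement exactly these definitions.
assert math.floor(3.4) == 3 and math.floor(3.0) == 3
assert math.floor(-3.4) == -4 and math.floor(-3.0) == -3
assert math.ceil(3.4) == 4 and math.ceil(4.0) == 4
assert math.ceil(-3.4) == -3 and math.ceil(-3.0) == -3
```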

Modulus function:

The modulus (or mod) function returns the remainder of an integer division. Sometimes written $n \bmod m$ in mathematical expressions, the syntax in many programming languages is n % m. From the definition of remainder, $n \bmod m$ is the integer $r$ such that $n = qm + r$ for $q$ an integer, and $|r| < |m|$. Therefore, the result of $n \bmod m$ must be between 0 and $m - 1$ when $n$ and $m$ are positive integers. For example, $5 \bmod 3 = 2$, $25 \bmod 3 = 1$, $5 \bmod 7 = 5$, and $5 \bmod 5 = 0$.

There is more than one way to assign values to $q$ and $r$, depending on how integer division is interpreted. The most common mathematical definition computes the mod function as $n \bmod m = n - m \lfloor n/m \rfloor$. In this case, $-3 \bmod 5 = 2$. However, Java and C++ compilers typically use the underlying processor's machine instruction for computing integer arithmetic. On many computers this is done by truncating the resulting fraction, meaning $n \bmod m = n - m(\mathrm{trunc}(n/m))$. Under this definition, $-3 \bmod 5 = -3$. Another language might do something different.

Unfortunately, for many applications this is not what the user wants or expects. For example, many hash systems will perform some computation on a record’s key value and then take the result modulo the hash table size. The expectation here would be that the result is a legal index into the hash table, not a negative number. Implementers of hash functions must either ensure that the result of the computation is always positive, or else add the hash table size to the result of the modulo function when that result is negative.
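The two definitions can be compared in Python, whose built-in % operator happens to use the floored definition; a sketch:

```python
import math

def mod_floored(n, m):
    # The usual mathematical definition: n mod m = n - m*floor(n/m).
    return n - m * math.floor(n / m)

def mod_truncated(n, m):
    # What many C/C++/Java implementations compute: truncate the quotient.
    return n - m * math.trunc(n / m)

# Both agree for positive operands...
assert mod_floored(5, 3) == mod_truncated(5, 3) == 2
# ...but differ when n is negative.
assert mod_floored(-3, 5) == 2       # result has the sign of m
assert mod_truncated(-3, 5) == -3    # result has the sign of n
assert (-3) % 5 == 2                 # Python's % uses the floored definition
```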

1.4.4 Logarithms

The logarithm of base $b$ for value $y$ is the power to which $b$ is raised to get $y$. Normally, this is written as $\log_b y = x$. Thus, if $\log_b y = x$ then $b^x = y$, and $b^{\log_b y} = y$.

Logarithms are used frequently by programmers. Here are two typical uses.

Example: Minimum bits

Many programs require an encoding for a collection of objects. What is the minimum number of bits needed to represent $n$ distinct code values? The answer is $\lceil \log_2 n \rceil$ bits. For example, if you have 1000 codes to store, you will require at least $\lceil \log_2 1000 \rceil = 10$ bits to have 1000 different codes (10 bits provide 1024 distinct code values).
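A quick Python check of this formula (`min_bits` is our own helper name):

```python
import math

def min_bits(n):
    # Minimum bits for n distinct code values: ceil(log2 n).
    return math.ceil(math.log2(n))

assert min_bits(1000) == 10   # 2**10 = 1024 >= 1000
assert min_bits(1024) == 10
assert min_bits(1025) == 11

# An exact integer alternative that avoids floating point (for n >= 2):
assert (1000 - 1).bit_length() == min_bits(1000)
```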

Nearly all logarithms we use have a base of two. This is because data structures and algorithms most often divide things in half, or store codes with binary bits. Whenever you see the notation $\log n$, either $\log_2 n$ is meant or else the term is being used asymptotically and the actual base does not matter. For logarithms using any base other than two, we will show the base explicitly.

Logarithms have the following properties, for any positive values of $m$, $n$, and $r$, and any positive integers $a$ and $b$.

  1. $\log(nm) = \log n + \log m$.
  2. $\log(n/m) = \log n - \log m$.
  3. $\log(n^r) = r \log n$.
  4. $\log_a n = \log_b n / \log_b a$.
  5. $b^{\log_b n} = n$.

The first two properties state that the logarithm of two numbers multiplied (or divided) can be found by adding (or subtracting) the logarithms of the two numbers. Property (3) is simply an extension of property (1). Property (4) tells us that, for variable $n$ and any two integer constants $a$ and $b$, $\log_a n$ and $\log_b n$ differ by the constant factor $\log_b a$, regardless of the value of $n$. Most runtime analyses we use are of a type that ignores constant factors in costs. Property (4) says that such analyses need not be concerned with the base of the logarithm, because this can change the total cost only by a constant factor.

When discussing logarithms, exponents often lead to confusion. Property (3) tells us that $\log n^2 = 2 \log n$. How do we indicate the square of the logarithm (as opposed to the logarithm of $n^2$)? This could be written as $(\log n)^2$, but it is traditional to use $\log^2 n$. On the other hand, we might want to take the logarithm of the logarithm of $n$. This is written $\log \log n$.
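All five properties, and the $\log^2 n$ versus $\log n^2$ distinction, can be spot-checked numerically in Python:

```python
import math

n, m, r = 40.0, 5.0, 3.0
a, b = 4, 2

assert math.isclose(math.log(n * m), math.log(n) + math.log(m))       # (1)
assert math.isclose(math.log(n / m), math.log(n) - math.log(m))       # (2)
assert math.isclose(math.log(n ** r), r * math.log(n))                # (3)
assert math.isclose(math.log(n, a), math.log(n, b) / math.log(a, b))  # (4)
assert math.isclose(b ** math.log(n, b), n)                           # (5)

# log^2 n is the square of the logarithm; log n^2 is 2 log n.
assert math.log2(16) ** 2 == 16.0    # (log2 16)^2 = 4^2
assert math.log2(16 ** 2) == 8.0     # log2(16^2) = 2 * 4
```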

1.4.5 Summations

Most programs contain loop constructs. When analysing running time costs for programs with loops, we need to add up the costs for each time the loop is executed. This is an example of a summation. Summations are simply the sum of costs for some function applied to a range of parameter values. Summations are typically written with the following "Sigma" notation:

$$\sum_{i=1}^{n} f(i)$$

This notation indicates that we are summing the value of $f(i)$ over some range of (integer) values. The parameter to the expression and its initial value are indicated below the $\sum$ symbol. Here, the notation $i = 1$ indicates that the parameter is $i$ and that it begins with the value 1. At the top of the $\sum$ symbol is the expression $n$, which indicates the maximum value for the parameter $i$. Thus, this notation means to sum the values of $f(i)$ as $i$ ranges across the integers from 1 through $n$. This can also be written $f(1) + f(2) + \cdots + f(n-1) + f(n)$. Within a sentence, Sigma notation is typeset as $\sum_{i=1}^{n} f(i)$.

Given a summation, you often wish to replace it with an algebraic expression with the same value as the summation. This is known as a closed-form solution, and the process of replacing the summation with its closed-form solution is known as solving the summation. For example, the summation $\sum_{i=1}^{n} 1$ is simply the expression "1" summed $n$ times (remember that $i$ ranges from 1 to $n$). Because the sum of $n$ ones is $n$, the closed-form solution is $n$.

Here is a list of useful summations, along with their closed-form solutions. They appear so often in this book that it will help you later to become comfortable with them.

$$\begin{align*}
\sum_{i=1}^{n} i &= \frac{n(n+1)}{2} \\
\sum_{i=1}^{n} i^2 &= \frac{2n^3 + 3n^2 + n}{6} = \frac{n(2n+1)(n+1)}{6} \\
\sum_{i=1}^{\log n} n &= n \log n \\
\sum_{i=0}^{\infty} a^i &= \frac{1}{1-a} \quad \text{if } 0 < a < 1 \\
\sum_{i=0}^{n} a^i &= \frac{a^{n+1} - 1}{a - 1} \quad \text{if } a \neq 1
\end{align*}$$

As special cases of the last summation, we have the following two:

$$\begin{align*}
\sum_{i=1}^{n} \frac{1}{2^i} &= 1 - \frac{1}{2^n} \\
\sum_{i=0}^{n} 2^i &= 2^{n+1} - 1
\end{align*}$$

As a corollary to the previous summation:

$$\sum_{i=0}^{\log n} 2^i = 2^{\log n + 1} - 1 = 2n - 1$$

Finally:

$$\sum_{i=1}^{n} \frac{i}{2^i} = 2 - \frac{n+2}{2^n}$$
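These closed forms are easy to spot-check numerically; a Python sketch using exact integer and rational arithmetic:

```python
from fractions import Fraction

def check_closed_forms(n):
    # Sum of i, and of i^2, as integers.
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
    assert sum(i * i for i in range(1, n + 1)) == n * (2 * n + 1) * (n + 1) // 6
    # Geometric series: sum of 2^i for i = 0..n.
    assert sum(2 ** i for i in range(n + 1)) == 2 ** (n + 1) - 1
    # Sums of 1/2^i and i/2^i, using exact rationals.
    assert sum(Fraction(1, 2 ** i) for i in range(1, n + 1)) == 1 - Fraction(1, 2 ** n)
    assert sum(Fraction(i, 2 ** i) for i in range(1, n + 1)) == 2 - Fraction(n + 2, 2 ** n)

for n in (1, 2, 3, 10, 50):
    check_closed_forms(n)
```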

1.4.6 Mathematical proof techniques

Solving any problem has two distinct parts: the investigation and the argument. Students are too used to seeing only the argument in their textbooks and lectures. But to be successful in school (and in life after school), one needs to be good at both, and to understand the differences between these two phases of the process. To solve the problem, you must investigate successfully. That means engaging the problem, and working through until you find a solution. Then, to give the answer to your client (whether that β€œclient” be your instructor when writing answers on a homework assignment or exam, or a written report to your boss), you need to be able to make the argument in a way that gets the solution across clearly and succinctly. The argument phase involves good technical writing skills and the ability to make a clear, logical argument.

Being conversant with standard proof techniques can help you in this process. Knowing how to write a good proof helps in many ways. First, it clarifies your thought process, which in turn clarifies your explanations. Second, if you use one of the standard proof structures, such as proof by contradiction or an induction proof, then both you and your reader are working from a shared understanding of that structure. That means less work for your reader, who need not decode the structure of your argument from scratch.

This section briefly introduces three commonly used proof techniques:

  1. deduction, or direct proof,

  2. proof by contradiction, and

  3. proof by mathematical induction.

In general, a direct proof is just a β€œlogical explanation”. A direct proof is sometimes referred to as an argument by deduction. This is simply an argument in terms of logic.

Direct Proof

Many direct proofs are written in English with words such as "if … then". In this case logic notation such as $P \Rightarrow Q$ can often help express the proof. Even if we don't wish to use symbolic logic notation, we can still take advantage of fundamental theorems of logic to structure our arguments. For example, if we want to prove that $P$ and $Q$ are equivalent, we can first prove $P \Rightarrow Q$ and then prove $Q \Rightarrow P$.

In some domains, proofs are essentially a series of state changes from a start state to an end state. Formal predicate logic can be viewed in this way, with the various "rules of logic" used to transform one formula into another, or to combine formulas into a new formula, on the route to the destination. Symbolic manipulations to solve integration problems in introductory calculus classes are similar in spirit, as are high school geometry proofs.

Example: Sum of first nn integers

Here is a direct proof that $\sum_{i=1}^{n} i = (n+1)n/2$. If we take the first and last terms of the series, since they are 1 and $n$, of course they sum to $n+1$. If we take the second term and the next-to-last term, since they are 2 and $n-1$, they also sum to $n+1$. Likewise for the third term and the third-from-the-end term. We can go on and pair up terms like this, such that there are $n/2$ pairs that each sum to $n+1$, for a total sum of $(n+1)n/2$. You can check for yourself that this is true even if $n$ is odd (and so the middle value of the series has no partner).

Proof by Contradiction

The simplest way to disprove a theorem or statement is to find a counter-example to the theorem. Unfortunately, no number of examples supporting a theorem is sufficient to prove that the theorem is correct. However, there is an approach that is vaguely similar to disproving by counter-example, called proof by contradiction. To prove a theorem by contradiction, we first assume that the theorem is false. We then find a logical contradiction stemming from this assumption. If the logic used to find the contradiction is correct, then the only way to resolve the contradiction is to recognise that the assumption that the theorem is false must be incorrect. That is, we conclude that the theorem must be true.

Example: No largest integer

Here is a simple proof by contradiction.

Theorem: There is no largest integer.

Proof by contradiction:

Step 1. Contrary assumption: Assume that there is a largest integer. Call it $B$ (for "biggest").

Step 2. Show this assumption leads to a contradiction: Consider $C = B + 1$. $C$ is an integer because it is the sum of two integers. Also, $C > B$, which means that $B$ is not the largest integer after all. Thus, we have reached a contradiction. The only possible flaw in our reasoning is the initial assumption that the theorem is false. Thus, we conclude that the theorem is correct.

A related proof technique is proving the contrapositive. We can prove that $P \Rightarrow Q$ by proving $(\mathrm{not}\ Q) \Rightarrow (\mathrm{not}\ P)$. This technique works because the two statements have the same truth table.

Proof by Mathematical Induction

Mathematical induction can be used to prove a wide variety of theorems. Induction also provides a useful way to think about algorithm design, because it encourages you to think about solving a problem by building up from simple subproblems. Induction can help to prove that a recursive function produces the correct result. Understanding recursion is a big step toward understanding induction, and vice versa, since they work by essentially the same process.

Within the context of algorithm analysis, one of the most important uses for mathematical induction is as a method to test a hypothesis. When seeking a closed-form solution for a summation or recurrence, we might first guess or otherwise acquire evidence that a particular formula is the correct solution. If the formula is indeed correct, it is often an easy matter to prove that fact with an induction proof.

Let Thrm be a theorem to prove, and express Thrm in terms of a positive integer parameter $n$. Mathematical induction states that Thrm is true for any value of parameter $n$ (for $n \geq c$, where $c$ is some constant) if the following two conditions are true:

  1. Base Case: Thrm holds for $n = c$, and

  2. Induction Step: If Thrm holds for $n - 1$, then Thrm holds for $n$.

Proving the base case is usually easy, typically requiring that some small value such as 1 be substituted for $n$ in the theorem and applying simple algebra or logic as necessary to verify the theorem. Proving the induction step is sometimes easy, and sometimes difficult. Proving the induction step (in conjunction with verifying the base case) yields a satisfactory proof by mathematical induction.

The two conditions that make up the induction proof combine to demonstrate that Thrm holds for $n = 2$ as an extension of the fact that Thrm holds for $n = 1$. This fact, combined again with condition (2), indicates that Thrm also holds for $n = 3$, and so on. Thus, Thrm holds for all values of $n$ (larger than the base cases) once the two conditions have been proved.

What makes mathematical induction so powerful (and so mystifying to most people at first) is that we can take advantage of the assumption that Thrm holds for all values less than $n$ as a tool to help us prove that Thrm holds for $n$. This assumption is known as the induction hypothesis. Having this assumption to work with makes the induction step easier to prove than tackling the original theorem itself. Being able to rely on the induction hypothesis provides extra information that we can bring to bear on the problem.

Recursion and induction have many similarities. Both are anchored on one or more base cases. A recursive function relies on the ability to call itself to get the answer for smaller instances of the problem. Likewise, induction proofs rely on the truth of the induction hypothesis to prove the theorem. The induction hypothesis does not come out of thin air. It is true if and only if the theorem itself is true, and therefore is reliable within the proof context. Using the induction hypothesis to do work is exactly the same as using a recursive call to do work.

Example: Sum of first nn integers

Here is a sample proof by mathematical induction. Call the sum of the first $n$ positive integers $\mathbf{S}(n)$.

Theorem: $\mathbf{S}(n) = n(n+1)/2$.

Proof: The proof is by mathematical induction.

  1. Check the base case. For $n = 1$, verify that $\mathbf{S}(1) = 1(1+1)/2$. $\mathbf{S}(1)$ is simply the sum of the first positive number, which is 1. Because $1(1+1)/2 = 1$, the formula is correct for the base case.

  2. State the induction hypothesis. The induction hypothesis is

    $$\mathbf{S}(n-1) = \sum_{i=1}^{n-1} i = \frac{(n-1)((n-1)+1)}{2} = \frac{(n-1)(n)}{2}.$$

  3. Use the assumption from the induction hypothesis for $n-1$ to show that the result is true for $n$. The induction hypothesis states that $\mathbf{S}(n-1) = (n-1)(n)/2$, and because $\mathbf{S}(n) = \mathbf{S}(n-1) + n$, we can substitute for $\mathbf{S}(n-1)$ to get

    $$\begin{align*}
    \sum_{i=1}^{n} i &= \left(\sum_{i=1}^{n-1} i\right) + n = \frac{(n-1)(n)}{2} + n \\
    &= \frac{n^2 - n + 2n}{2} = \frac{n(n+1)}{2}.
    \end{align*}$$

    Thus, by mathematical induction,

    $$\mathbf{S}(n) = \sum_{i=1}^{n} i = n(n+1)/2.$$

Note carefully what took place in this example. First we cast $\mathbf{S}(n)$ in terms of a smaller occurrence of the problem: $\mathbf{S}(n) = \mathbf{S}(n-1) + n$. This is important because once $\mathbf{S}(n-1)$ comes into the picture, we can use the induction hypothesis to replace it with $(n-1)(n)/2$. From here, it is simple algebra to prove that $\mathbf{S}(n-1) + n$ equals the right-hand side of the original theorem.

Example: Recurrence relation

This example shows how we can use induction to prove that a proposed closed-form solution for a recurrence relation is correct.

Theorem: The following recurrence relation has closed-form solution $T(n) = n - 1$:

$$\begin{align*}
T(n) &= T(n-1) + 1 \\
T(1) &= 0
\end{align*}$$

Proof: To prove the base case, we observe from the definition that $T(2) = T(1) + 1 = 0 + 1 = 1$. From the proposed closed-form solution we get $T(2) = 2 - 1 = 1$, which matches the definition.

The induction hypothesis is that $T(n-1) = n - 2$. Combining the definition of the recurrence with the induction hypothesis, we see immediately that

$$T(n) = T(n-1) + 1 = (n-2) + 1 = n - 1$$

for $n > 1$. Thus, we have proved the theorem correct by mathematical induction.
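The theorem is also easy to check numerically; a Python sketch of the recurrence against its closed form:

```python
def T(n):
    # The recurrence from the text: T(1) = 0, T(n) = T(n-1) + 1.
    return 0 if n == 1 else T(n - 1) + 1

# The proposed closed form T(n) = n - 1 agrees for as many
# values as we care to try.
for n in range(1, 30):
    assert T(n) == n - 1
```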

1.4.7 Estimation

The concept of estimation might be unfamiliar to many readers. Estimation is not a mathematical technique, but rather a general engineering skill. This is sometimes known as β€œback of the napkin” or β€œback of the envelope” calculation. Both nicknames suggest that only a rough estimate is produced. It is very useful to computer scientists doing design work, because any proposed solution whose estimated resource requirements fall well outside the problem’s resource constraints can be discarded immediately, allowing time for greater analysis of more promising solutions.

Estimation techniques are a standard part of engineering curricula but are often neglected in computer science. Estimation is no substitute for rigorous, detailed analysis of a problem, but it can help to decide when a rigorous analysis is warranted: If the initial estimate indicates that the solution is unworkable, then further analysis is probably unnecessary.

Estimation can be formalised by the following three-step process:

  1. Determine the major parameters that affect the problem.
  2. Derive an equation that relates the parameters to the problem.
  3. Select values for the parameters, and apply the equation to yield an estimated solution.

When doing estimations, a good way to reassure yourself that the estimate is reasonable is to do it in two different ways. In general, if you want to know what comes out of a system, you can either try to estimate that directly, or you can estimate what goes into the system (assuming that what goes in must later come out). If both approaches (independently) give similar answers, then this should build confidence in the estimate.

When calculating, be sure that your units match. For example, do not add feet and pounds. Verify that the result is in the correct units. Always keep in mind that the output of a calculation is only as good as its input. The more uncertain your values for the input parameters in Step 3, the more uncertain the output value. However, back-of-the-envelope calculations are often meant only to get an answer within an order of magnitude, or perhaps within a factor of two. Before doing an estimate, you should decide on acceptable error bounds, such as within 25%, within a factor of two, and so forth. Once you are confident that an estimate falls within your error bounds, leave it alone! Do not try to get a more precise estimate than necessary for your purpose.

Example: Bookcases for a million pages

How many bookcases does it take to store books containing one million pages? Let’s estimate that a book of 200 pages requires one cm on the library shelf (it will help to look at the size of any handy book), yielding about 50 metres of shelf space for one million pages. If a shelf is 80 cm wide, then around 60-65 shelves are required. If a bookcase contains 6 shelves, this yields about 10 bookcases.

To reach this conclusion, we estimated the number of pages per cm, the width of a shelf, and the number of shelves in a bookcase. Any of these estimates can be wrong: books have thinner or thicker paper, shelves can be narrower or wider, and bookcases can have more or fewer shelves. Let’s say that the paper thickness can vary by ±50%, that shelves can be 60–100 cm wide, and that there can be 5–8 shelves in a bookcase. This suggests that in the extreme case we might be wrong by a factor of 2–3, but it is much more likely that some of the errors even out, so that our estimate is considerably more accurate.
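The three-step process and the error analysis above can be sketched in a few lines of Python. The figures used here (200 pages per cm, 80 cm shelves, 6 shelves per bookcase, and the stated variation ranges) are the rough estimates from the text, not measured values:

```python
PAGES = 1_000_000

def bookcases(pages_per_cm, shelf_width_cm, shelves_per_case):
    """Step 2: an equation relating the parameters to the problem."""
    shelf_cm = PAGES / pages_per_cm       # total shelf length needed, in cm
    shelves = shelf_cm / shelf_width_cm   # number of shelves required
    return shelves / shelves_per_case     # number of bookcases

# Step 3: the values estimated in the text give about 10 bookcases.
base = bookcases(200, 80, 6)              # ~10.4

# Extremes of the stated ranges bound the error of the estimate.
worst = bookcases(100, 60, 5)             # thick paper, narrow shelves, few shelves
best = bookcases(300, 100, 8)             # thin paper, wide shelves, many shelves
print(f"estimate {base:.1f}, range {best:.1f} to {worst:.1f}")
```

Note that the worst case is roughly a factor of three away from the base estimate, in line with the factor of 2–3 discussed above.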

1.4.8 Random numbers

The success of randomised algorithms depends on having access to a good random number generator. While modern compilers are likely to include a random number generator that is good enough for most purposes, it is helpful to understand how they work, and even to be able to construct your own in case you don’t trust the one provided. This is easy to do.

First, let us consider what a random sequence is. Which of the following appears to be a sequence of “random” numbers?

  • 1, 1, 1, 1, 1, 1, 1, 1, 1, …
  • 1, 2, 3, 4, 5, 6, 7, 8, 9, …
  • 2, 7, 1, 8, 2, 8, 1, 8, 2, …

In fact, all three happen to be the beginning of some sequence in which one could continue the pattern to generate more values (in case you do not recognise it, the third one is the initial digits of the irrational constant $e$). Viewed as a series of digits, ideally every possible sequence has equal probability of being generated (even the three sequences above). In fact, definitions of randomness generally have features such as:

  • One cannot predict the next item better than by guessing; the sequence is unpredictable.
  • The series cannot be described more briefly than simply listing it out; the sequence is incompressible.

There is no such thing as a random number sequence, only “random enough” sequences. A sequence is pseudorandom if no future term can be predicted in polynomial time, given all past terms.

Most computer systems use a deterministic algorithm to select pseudorandom numbers. The most commonly used approach historically is known as the Linear Congruential Method (LCM). The method is quite simple. We begin by picking a seed that we will call $r_1$. Then, we can compute successive terms as follows, where $b$ and $t$ are some constant positive integers.

\begin{align*} r_i &= (b\cdot r_{i-1}) \;\mathrm{mod}\; t \end{align*}

By definition of the $\mathrm{mod}$ function, all generated numbers must be in the range 0 to $t-1$. Now, consider what happens when $r_i = r_j$ for some values $i$ and $j$. Of course then $r_{i+1} = r_{j+1}$, which means that we have a repeating cycle.

Since the values coming out of the random number generator are between 0 and $t-1$, the longest cycle that we can hope for has length $t$. In fact, it cannot even be quite this long: once the value 0 is generated, every subsequent value is also 0, so a useful cycle can contain at most the $t-1$ nonzero values. It turns out that to get a good result, it is crucial to pick good values for both $b$ and $t$. To see why, consider the following example.

Example: Varying the tt value

Given a $t$ value of 13, we can get very different results depending on the $b$ value that we pick, in ways that are hard to predict.

\begin{align*} r_i \;=\; (6\cdot r_{i-1}) \mod 13 &= \ldots, 1, 6, 10, 8, 9, 2, 12, 7, 3, 5, 4, 11, 1, \ldots \\ r_i \;=\; (7\cdot r_{i-1}) \mod 13 &= \ldots, 1, 7, 10, 5, 9, 11, 12, 6, 3, 8, 4, 2, 1, \ldots \\ r_i \;=\; (5\cdot r_{i-1}) \mod 13 &= \ldots, 1, 5, 12, 8, 1, \ldots \end{align*}

In the case of $b=5$, the generator goes through only a short sequence before repeating, with the particular short series depending on the seed value chosen. Clearly, a $b$ value of 5 is far inferior to $b$ values of 6 or 7 in this example.
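The cycle behaviour in this example is easy to check by iterating the formula until a value repeats. A small sketch (the function name `lcm_cycle` is ours, chosen for illustration):

```python
def lcm_cycle(b, t, seed=1):
    """Iterate r = (b * r) mod t from the seed, returning the values
    visited before the first repetition."""
    cycle = []
    r = seed
    while r not in cycle:
        cycle.append(r)
        r = (b * r) % t
    return cycle

print(lcm_cycle(6, 13))  # [1, 6, 10, 8, 9, 2, 12, 7, 3, 5, 4, 11]
print(lcm_cycle(5, 13))  # [1, 5, 12, 8] -- a short cycle, as in the text
```

With $b=6$ or $b=7$ the cycle visits all 12 nonzero values modulo 13, the longest useful cycle possible, while $b=5$ cycles after only four values.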

If you would like to write a simple LCM random number generator of your own, an effective one can be made with the following formula:

\begin{align*} r_i &= (16807 \cdot r_{i-1}) \mod (2^{31} - 1) \end{align*}
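As a sketch, this formula translates directly into a small generator. The constants are those given above ($16807 = 7^5$ as the multiplier, and the Mersenne prime $2^{31}-1$ as the modulus); the name `minstd` is our label for it, and a widely quoted sanity check for this particular generator is that, starting from a seed of 1, the 10,000th value produced is 1043618065:

```python
A = 16807       # the multiplier b from the formula above (7**5)
M = 2**31 - 1   # the modulus t, a Mersenne prime

def minstd(seed=1):
    """Yield pseudorandom integers in the range 1 to M-1."""
    r = seed
    while True:
        r = (A * r) % M
        yield r

gen = minstd(1)
values = [next(gen) for _ in range(10000)]
print(values[-1])  # 1043618065
```

Because $M$ is prime, a nonzero seed can never produce 0, so the generator avoids the degenerate all-zero cycle discussed earlier.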

Another approach is based on using a computer chip that generates random numbers resulting from “thermal noise” in the system. Time will tell whether this approach replaces deterministic approaches.