1.4 Mathematical preliminaries
This section presents the mathematical preliminaries assumed to be familiar to the reader. It serves as a review and reference, allowing you to revisit relevant sections when encountering unfamiliar notation or mathematical techniques in later chapters. If you're comfortable with these preliminaries, you can safely skip ahead to the next section.
1.4.1 Sets
The concept of a set in the mathematical sense has wide application in computer science. The notations and techniques of set theory are commonly used when describing and implementing algorithms because the abstractions associated with sets often help to clarify and simplify algorithm design.
A set is a collection of distinguishable members or elements. The members are typically drawn from some larger population known as the base type. Each member of a set is either a primitive element of the base type or is a set itself. There is no concept of duplication in a set. Each value from the base type is either in the set or not in the set. For example, a set named P might consist of the three integers 7, 11, and 42. In this case, P's members are 7, 11, and 42, and the base type is integer.
The following table shows the symbols commonly used to express sets and their relationships.
{1, 4} | A set composed of the members 1 and 4
{x : P(x)} | A set definition using a set comprehension (e.g., {x : x is prime}, the set of all primes)
x ∈ P | x is a member of set P
x ∉ P | x is not a member of set P
∅ | The null or empty set
|P| | Cardinality: the size of P, or its number of members
P ⊆ Q | P is included in Q, P is a subset of Q
P ⊇ Q | Q is included in P, P is a superset of Q
P ∪ Q | Union: all elements appearing in P or Q
P ∩ Q | Intersection: all elements appearing in both P and Q
P − Q | Difference: all elements of P not in Q
P × Q | (Cartesian) product: all possible pairs (x, y) with x ∈ P and y ∈ Q
2^P or ℘(P) | Powerset: all possible subsets of P
Here are some examples of this notation in use. First define two sets, P and Q:

P = {2, 3, 5}    Q = {5, 10}
|P| = 3 (because P has three members) and |Q| = 2 (because Q has two members). Both of these sets are finite. Other sets can be infinite, for example, the set of integers.
The union of P and Q, written P ∪ Q, is the set of elements in either P or Q, which is {2, 3, 5, 10}. The intersection of P and Q, written P ∩ Q, is the set of elements that appear in both P and Q, which is {5}. The set difference of P and Q, written P − Q, is the set of elements that occur in P but not in Q, which is {2, 3}. Note that P ∪ Q = Q ∪ P and that P ∩ Q = Q ∩ P, but in general P − Q ≠ Q − P. In this example, Q − P = {10}. Finally, the set {5, 3, 2} is indistinguishable from set P, because sets have no concept of order. Likewise, set {2, 3, 2, 5} is also indistinguishable from P, because sets have no concept of duplicate elements.
The set product or Cartesian product of two sets is a set of ordered pairs. For our example sets, the set product would be:

P × Q = {(2, 5), (2, 10), (3, 5), (3, 10), (5, 5), (5, 10)}
The powerset of a set S (denoted 2^S or ℘(S)) is the set of all possible subsets of S. For our example set Q, the powerset would be:

2^Q = {∅, {5}, {10}, {5, 10}}
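If it helps to experiment, these operations map directly onto Python's built-in set type. Here is a sketch using the example sets P = {2, 3, 5} and Q = {5, 10}; the powerset helper is our own, since Python has no built-in one:

```python
from itertools import chain, combinations

P = {2, 3, 5}
Q = {5, 10}

union = P | Q                              # {2, 3, 5, 10}
intersection = P & Q                       # {5}
difference = P - Q                         # {2, 3}
product = {(x, y) for x in P for y in Q}   # Cartesian product P x Q

def powerset(s):
    """Return all subsets of s, as a set of frozensets (i.e., 2^s)."""
    items = list(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))}
```

Note that Python's set type also enforces the "no duplicates, no order" rules from the text: `{2, 3, 2, 5} == {5, 3, 2} == P`.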
A collection of elements without a specific order, similar to a set but allowing multiple occurrences of each element, is called a bag. A bag is also known as a multiset, and an element that appears more than once is called a duplicate.
A sequence is an ordered collection of elements of the same type that may include duplicates. A sequence can for example be implemented as a list, an array, or a vector. We can access elements in a sequence using zero-based indexing, where the index 0 refers to the first element, 1 to the second, and so on. We will use square brackets to enclose the elements of a sequence. For example, [3, 4, 5, 4] is a sequence. Note that sequence [3, 4, 5, 4] is distinct from sequence [3, 5, 4, 4], and both are distinct from sequence [3, 4, 5].
1.4.2 Relations
A binary relation R over set S is a set of ordered pairs from S, i.e., R ⊆ S × S. As an example of a relation, if S is {a, b, c}, then

{⟨a, c⟩, ⟨b, c⟩, ⟨c, b⟩}

is a relation, and

{⟨a, a⟩, ⟨a, c⟩, ⟨b, b⟩, ⟨b, c⟩, ⟨c, c⟩}

is a different relation. If tuple ⟨x, y⟩ is in relation R, we may use the infix notation xRy. We often use relations, such as the less than operator (<), on the natural numbers, which includes ordered pairs such as ⟨1, 3⟩ and ⟨2, 23⟩, but not ⟨3, 2⟩ or ⟨2, 2⟩. Rather than writing the relationship in terms of ordered pairs, we typically use an infix notation for such relations, writing 1 < 3.
The most important properties of relations are as follows:
- R is reflexive if aRa.
- R is irreflexive if aRa is never true.
- R is symmetric if whenever aRb, then bRa.
- R is antisymmetric if whenever aRb and bRa, then a = b.
- R is transitive if whenever aRb and bRc, then aRc.

(Here R is a binary relation over a set S, and the conditions hold for all a, b, c ∈ S.)
As examples, for the natural numbers, < is irreflexive (because a < a is never true), antisymmetric (because there is no case where a < b and b < a), and transitive. Relation ≤ is reflexive, antisymmetric, and transitive. Relation = is reflexive, symmetric (and antisymmetric!), and transitive. For people, the relation "is a sibling of" is symmetric and transitive. If we define a person to be a sibling of themself, then it is reflexive; if we define a person not to be a sibling of themself, then it is irreflexive.
Equivalence relations
R is an equivalence relation on set S if it is reflexive, symmetric, and transitive. An equivalence relation can be used to partition a set into equivalence classes. If two elements a and b are equivalent to each other, we write a ≡ b.
A partition of a set S is a collection of subsets that are disjoint from each other and whose union is S. There's a close correspondence between equivalence relations and partitions: an equivalence relation on a set partitions the set into disjoint subsets (the classes of equivalent elements). But the opposite is also true: given a partition of a set, we can define a corresponding equivalence relation (two elements are equivalent if they belong to the same subset of the partition).
An obvious example of an equivalence relation is the relation = on the natural numbers. Examples on real-life objects are "has the same colour as" or "is as heavy as". And if we look at strings, "has the same length as" is an equivalence relation too.
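The correspondence between equivalence relations and partitions is easy to see in code: grouping elements by a key function partitions them into classes, with x and y equivalent exactly when they get the same key. A sketch using "has the same length as" on strings (the helper name partition_by is our own):

```python
from collections import defaultdict

def partition_by(elements, key):
    """Partition elements into equivalence classes:
    x and y land in the same class iff key(x) == key(y)."""
    classes = defaultdict(list)
    for x in elements:
        classes[key(x)].append(x)
    return list(classes.values())

words = ["a", "to", "be", "cat", "dog", "x"]
classes = partition_by(words, len)   # partition strings by length
```

The resulting classes are disjoint and their union is the original collection, which is exactly the definition of a partition.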
Partial orders
A binary relation is called a partial order if it is antisymmetric and transitive. If the relation is also reflexive, it is called a non-strict partial order. If it is instead irreflexive, it is called a strict partial order. The set on which the partial order is defined is called a partially ordered set or a poset. Elements x and y of a set are comparable under a given relation if either xRy or yRx. If every pair of distinct elements in a partial order are comparable, then the order is called a total order or linear order.
Example: Less-than
For the integers, the relations < and ≤ define partial orders. Operation < is a total order because, for every pair of integers x and y such that x ≠ y, either x < y or y < x. Likewise, ≤ is a total order because, for every pair of integers x and y such that x ≠ y, either x ≤ y or y ≤ x.
Example: Subsets
For the powerset of the integers, the subset operator ⊆ defines a partial order (because it is antisymmetric and transitive). For example, {1, 2} ⊆ {1, 2, 3}. However, sets {1, 2} and {1, 3} are not comparable by the subset operator, because neither is a subset of the other. Therefore, the subset operator does not define a total order on the powerset of the integers.
1.4.3 Miscellaneous notations
We now define several mathematical terms and concepts, providing a reference for future use.
- Units of measure:
We will use the following notation for units of measure. "B" will be used as an abbreviation for bytes, "b" for bits, "KB" for kilobytes (2^10 = 1024 bytes), "MB" for megabytes (2^20 bytes), "GB" for gigabytes (2^30 bytes), and "ms" for milliseconds (a millisecond is 1/1000 of a second). Spaces are not placed between the number and the unit abbreviation when a power of two is intended. Thus a disk drive of size 25 gigabytes (where a gigabyte is intended as 2^30 bytes) will be written as "25GB". Spaces are used when a decimal value is intended. An amount of 2000 bits would therefore be written "2 Kb", while "2Kb" represents 2048 bits. 2000 milliseconds is written as 2000 ms. Note that in this book, large amounts of storage are nearly always measured in powers of two and times in powers of ten.
- Permutations:
A permutation of a sequence S is simply the members of S arranged in some order. For example, a permutation of the integers 1 through n would be those values arranged in some order. If the sequence contains n distinct members, then there are n! different permutations for the sequence. This is because there are n choices for the first member in the permutation; for each choice of first member there are n − 1 choices for the second member, and so on. Sometimes one would like to obtain a random permutation for a sequence, that is, one of the n! possible permutations is selected in such a way that each permutation has equal probability of being selected.
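A random permutation can be generated in linear time with the classic Fisher-Yates shuffle; a sketch using Python's random module as the source of randomness:

```python
import random

def random_permutation(seq):
    """Return a uniformly random permutation of seq (Fisher-Yates).

    Works from the back: each position i swaps with a uniformly
    chosen position j in 0..i, giving each of the n! permutations
    equal probability (assuming a good underlying generator)."""
    result = list(seq)
    for i in range(len(result) - 1, 0, -1):
        j = random.randint(0, i)   # random slot among the first i+1
        result[i], result[j] = result[j], result[i]
    return result

perm = random_permutation(range(1, 11))
```

Python's own random.shuffle implements the same idea; the point of the sketch is that uniformity comes from always choosing j among exactly the i + 1 not-yet-fixed slots.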
- Logic Notation:
We will occasionally make use of the notation of symbolic logic.
- A ⇒ B means "if A then B", i.e., that A implies B.
- A ⇔ B means "A if and only if B", i.e., that A and B are equivalent.
- A ∨ B means "A or B", the disjunction of A and B.
- A ∧ B means "A and B", the conjunction of A and B.
- ¬A means "not A", the negation of A.
- Floor and ceiling:
The floor of x (written ⌊x⌋) takes real value x and returns the greatest integer ≤ x. For example, ⌊3.4⌋ = 3, as does ⌊3.0⌋, while ⌊−3.4⌋ = −4 and ⌊−3.0⌋ = −3. The ceiling of x (written ⌈x⌉) takes real value x and returns the least integer ≥ x. For example, ⌈3.4⌉ = 4, as does ⌈4.0⌉, while ⌈−3.4⌉ = ⌈−3.0⌉ = −3.
- Modulus function:
The modulus (or mod) function returns the remainder of an integer division. Sometimes written n mod m in mathematical expressions, the syntax in many programming languages is n % m. From the definition of remainder, n mod m is the integer r such that n = qm + r for q an integer, and |r| < |m|. Therefore, the result of n mod m must be between 0 and m − 1 when n and m are positive integers. For example, 5 mod 3 = 2, 25 mod 3 = 1, 5 mod 7 = 5, and 5 mod 5 = 0.
There is more than one way to assign values to q and r, depending on how integer division is interpreted. The most common mathematical definition computes the mod function as n mod m = n − m⌊n/m⌋. In this case, −3 mod 5 = 2. However, Java and C++ compilers typically use the underlying processor's machine instruction for computing integer arithmetic. On many computers this is done by truncating the resulting fraction, meaning n mod m = n − m(trunc(n/m)). Under this definition, −3 mod 5 = −3. Another language might do something different.
Unfortunately, for many applications this is not what the user wants or expects. For example, many hash systems will perform some computation on a recordβs key value and then take the result modulo the hash table size. The expectation here would be that the result is a legal index into the hash table, not a negative number. Implementers of hash functions must either ensure that the result of the computation is always positive, or else add the hash table size to the result of the modulo function when that result is negative.
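The difference between the two definitions, and the usual fix, can be sketched as follows (helper names are our own; note that Python's own % operator already follows the mathematical definition, so the truncating behaviour of C, C++, and Java is emulated here):

```python
import math

def trunc_mod(n, m):
    """Remainder as many C, C++, and Java systems compute it:
    the quotient n/m is truncated toward zero."""
    return n - m * math.trunc(n / m)

def positive_mod(n, m):
    """Force the remainder into the range 0..m-1 (for positive m),
    as a hash table index requires: shift a negative result by m."""
    r = trunc_mod(n, m)
    return r + m if r < 0 else r
```

So trunc_mod(-3, 5) yields −3, while positive_mod(-3, 5) yields 2, matching the mathematical definition.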
1.4.4 Logarithms
The logarithm of base b for value y is the power to which b is raised to get y. Normally, this is written as log_b y = x. Thus, if log_b y = x then b^x = y, and b^(log_b y) = y.
Logarithms are used frequently by programmers. Here are two typical uses.
Example: Minimum bits
Many programs require an encoding for a collection of objects. What is the minimum number of bits needed to represent n distinct code values? The answer is ⌈log_2 n⌉ bits. For example, if you have 1000 codes to store, you will require at least ⌈log_2 1000⌉ = 10 bits to have 1000 different codes (10 bits provide 1024 distinct code values).
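The minimum-bits calculation is a one-liner; a sketch assuming Python's math module:

```python
import math

def min_bits(n):
    """Minimum number of bits needed to distinguish n code values."""
    return math.ceil(math.log2(n))
```

For example, min_bits(1000) is 10, while min_bits(1025) jumps to 11 because 10 bits only provide 1024 distinct values.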
Example: Binary search
Consider the binary search algorithm for finding a given value within an array sorted by value from lowest to highest. Binary search first looks at the middle element and determines if the value being searched for is in the upper half or the lower half of the array. The algorithm then continues splitting the appropriate subarray in half until the desired value is found. How many times can an array of size n be split in half until only one element remains in the final subarray? The answer is ⌈log_2 n⌉ times.
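A sketch of the algorithm just described, for an array sorted in increasing order:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent.

    Each iteration halves the remaining range, so at most about
    log2(n) + 1 probes are needed."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1    # target can only be in the upper half
        else:
            high = mid - 1   # target can only be in the lower half
    return -1
```

Binary search is covered in detail later; this sketch is only meant to make the "split in half" counting argument concrete.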
Nearly all logarithms we use have a base of two. This is because data structures and algorithms most often divide things in half, or store codes with binary bits. Whenever you see the notation log n, either log_2 n is meant or else the term is being used asymptotically and the actual base does not matter. For logarithms using any base other than two, we will show the base explicitly.
Logarithms have the following properties, for any positive values of m, n, and r, and any positive integers a and b.

- log(nm) = log n + log m.
- log(n/m) = log n − log m.
- log(n^r) = r log n.
- log_a n = log_b n / log_b a.
- a^(log_b n) = n^(log_b a).
The first two properties state that the logarithm of two numbers multiplied (or divided) can be found by adding (or subtracting) the logarithms of the two numbers. Property (3) is simply an extension of property (1). Property (4) tells us that, for variable n and any two integer constants a and b, log_a n and log_b n differ by the constant factor log_b a, regardless of the value of n. Most runtime analyses we use are of a type that ignores constant factors in costs. Property (4) says that such analyses need not be concerned with the base of the logarithm, because this can change the total cost only by a constant factor.
When discussing logarithms, exponents often lead to confusion. Property (3) tells us that log n^2 = 2 log n. How do we indicate the square of the logarithm (as opposed to the logarithm of n^2)? This could be written as (log n)^2, but it is traditional to use log^2 n. On the other hand, we might want to take the logarithm of the logarithm of n. This is written log log n.
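Property (4) is also what lets code compute a logarithm in any base from whatever log function is at hand; a minimal sketch (the helper name log_b is our own):

```python
import math

def log_b(x, base=2):
    """Logarithm of x in an arbitrary base, computed from the
    natural log via property (4), the change-of-base rule."""
    return math.log(x) / math.log(base)
```

The other identities can then be spot-checked numerically, e.g. log_b(12 * 5) equals log_b(12) + log_b(5) up to floating-point rounding.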
1.4.5 Summations
Most programs contain loop constructs. When analysing running time costs for programs with loops, we need to add up the costs for each time the loop is executed. This is an example of a summation. Summations are simply the sum of costs for some function applied to a range of parameter values. Summations are typically written with the following "Sigma" notation:

∑_{i=1}^{n} f(i)

This notation indicates that we are summing the value of f(i) over some range of (integer) values. The parameter to the expression and its initial value are indicated below the ∑ symbol. Here, the notation i = 1 indicates that the parameter is i and that it begins with the value 1. At the top of the ∑ symbol is the expression n. This indicates the maximum value for the parameter i. Thus, this notation means to sum the values of f(i) as i ranges across the integers from 1 through n. This can also be written f(1) + f(2) + ⋯ + f(n − 1) + f(n). Within a sentence, Sigma notation is typeset as ∑_{i=1}^{n} f(i).
Given a summation, you often wish to replace it with an algebraic expression with the same value as the summation. This is known as a closed-form solution, and the process of replacing the summation with its closed-form solution is known as solving the summation. For example, the summation ∑_{i=1}^{n} 1 is simply the expression "1" summed n times (remember that i ranges from 1 to n). Because the sum of n ones is n, the closed-form solution is n.
The closed-form solutions of the following summations appear so often in this book that it will help you later to get comfortable with them. Here is a list of useful summations, along with their closed-form solutions:

∑_{i=1}^{n} i = n(n+1)/2

∑_{i=1}^{n} i^2 = (2n^3 + 3n^2 + n)/6 = n(n+1)(2n+1)/6

∑_{i=1}^{log n} n = n log n

∑_{i=0}^{n} a^i = (a^{n+1} − 1)/(a − 1), for a ≠ 1
As special cases of the last summation, we have the following two:

∑_{i=1}^{n} 1/2^i = 1 − 1/2^n

∑_{i=0}^{n} 2^i = 2^{n+1} − 1
As a corollary to the previous summation:

∑_{i=0}^{log n} 2^i = 2^{log n + 1} − 1 = 2n − 1
Finally:

∑_{i=1}^{n} i/2^i = 2 − (n + 2)/2^n
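Closed forms like these are easy to sanity-check by brute force; a quick sketch comparing three of them against direct summation (helper names are our own):

```python
def sum_linear(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2

def sum_powers_of_two(n):
    """Closed form for 2^0 + 2^1 + ... + 2^n."""
    return 2 ** (n + 1) - 1

def sum_i_over_2i(n):
    """Closed form for sum of i / 2^i, for i = 1..n."""
    return 2 - (n + 2) / 2 ** n
```

A loop over small n comparing each closed form against the literal sum will catch any transcription error immediately.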
1.4.6 Mathematical proof techniques
Solving any problem has two distinct parts: the investigation and the argument. Students are too used to seeing only the argument in their textbooks and lectures. But to be successful in school (and in life after school), one needs to be good at both, and to understand the differences between these two phases of the process. To solve the problem, you must investigate successfully. That means engaging the problem, and working through it until you find a solution. Then, to give the answer to your client (whether that "client" be your instructor when you write answers on a homework assignment or exam, or your boss when you write a report), you need to be able to make the argument in a way that gets the solution across clearly and succinctly. The argument phase involves good technical writing skills and the ability to make a clear, logical argument.
Being conversant with standard proof techniques can help you in this process. Knowing how to write a good proof helps in many ways. First, it clarifies your thought process, which in turn clarifies your explanations. Second, if you use one of the standard proof structures such as proof by contradiction or an induction proof, then both you and your reader are working from a shared understanding of that structure. That makes it easier for your reader to understand your proof, because the reader need not decode the structure of your argument from scratch.
This section briefly introduces three commonly used proof techniques:

- deduction, or direct proof,
- proof by contradiction, and
- proof by mathematical induction.
Direct Proof

In general, a direct proof is just a "logical explanation". A direct proof is sometimes referred to as an argument by deduction. This is simply an argument in terms of logic.
Many direct proofs are written in English with words such as "if ... then". In this case logic notation such as P ⇒ Q can often help express the proof. Even if we don't wish to use symbolic logic notation, we can still take advantage of fundamental theorems of logic to structure our arguments. For example, if we want to prove that P and Q are equivalent, we can first prove P ⇒ Q and then prove Q ⇒ P.
In some domains, proofs are essentially a series of state changes from a start state to an end state. Formal predicate logic can be viewed in this way, with the various "rules of logic" being used to make the changes from one formula to another, or to combine a couple of formulas to make a new formula, on the route to the destination. Symbolic manipulations to solve integration problems in introductory calculus classes are similar in spirit, as are high school geometry proofs.
Example: Sum of the first n integers
Here is a direct proof that ∑_{i=1}^{n} i = n(n+1)/2. If we take the first and last terms of the series, since they are 1 and n, of course they sum to n + 1. If we take the second term and next-to-last term, since they are 2 and n − 1, they also sum to n + 1. Likewise for the third term and the third-from-the-end term. We can go on and pair up terms like this, such that there are n/2 pairs that each sum to n + 1, for a total sum of n(n+1)/2. You can check for yourself that this is true even if n is odd (and so the middle value of the series has no partner).
Proof by Contradiction
The simplest way to disprove a theorem or statement is to find a counter-example to the theorem. Unfortunately, no number of examples supporting a theorem is sufficient to prove that the theorem is correct. However, there is an approach that is vaguely similar to disproving by counter-example, called proof by contradiction. To prove a theorem by contradiction, we first assume that the theorem is false. We then find a logical contradiction stemming from this assumption. If the logic used to find the contradiction is correct, then the only way to resolve the contradiction is to recognise that the assumption that the theorem is false must be incorrect. That is, we conclude that the theorem must be true.
Example: No largest integer
Here is a simple proof by contradiction.
Theorem: There is no largest integer.
Proof by contradiction:
Step 1. Contrary assumption: Assume that there is a largest integer. Call it B (for "biggest").
Step 2. Show this assumption leads to a contradiction: Consider B + 1. B + 1 is an integer because it is the sum of two integers. Also, B + 1 > B, which means that B is not the largest integer after all. Thus, we have reached a contradiction. The only possible flaw in our reasoning is the initial assumption that the theorem is false. Thus, we conclude that the theorem is correct.
A related proof technique is proving the contrapositive. We can prove that P ⇒ Q by proving ¬Q ⇒ ¬P. This technique works because the truth tables for the two logical statements are the same.
Proof by Mathematical Induction
Mathematical induction can be used to prove a wide variety of theorems. Induction also provides a useful way to think about algorithm design, because it encourages you to think about solving a problem by building up from simple subproblems. Induction can help to prove that a recursive function produces the correct result. Understanding recursion is a big step toward understanding induction, and vice versa, since they work by essentially the same process.
Within the context of algorithm analysis, one of the most important uses for mathematical induction is as a method to test a hypothesis. When seeking a closed-form solution for a summation or recurrence, we might first guess or otherwise acquire evidence that a particular formula is the correct solution. If the formula is indeed correct, it is often an easy matter to prove that fact with an induction proof.
Let Thrm be a theorem to prove, and express Thrm in terms of a positive integer parameter n. Mathematical induction states that Thrm is true for any value of parameter n (for n ≥ c, where c is some constant) if the following two conditions are true:

Base Case: Thrm holds for n = c, and

Induction Step: If Thrm holds for n − 1, then Thrm holds for n.
Proving the base case is usually easy, typically requiring that some small value such as 1 be substituted for n in the theorem, with simple algebra or logic applied as necessary to verify the theorem. Proving the induction step is sometimes easy, and sometimes difficult. Proving the induction step (in conjunction with verifying the base case) yields a satisfactory proof by mathematical induction.
The two conditions that make up the induction proof combine to demonstrate that Thrm holds for n = c + 1 as an extension of the fact that Thrm holds for n = c. This fact, combined again with condition (2), indicates that Thrm also holds for n = c + 2, and so on. Thus, Thrm holds for all values of n (larger than the base cases) once the two conditions have been proved.
What makes mathematical induction so powerful (and so mystifying to most people at first) is that we can take advantage of the assumption that Thrm holds for all values less than n as a tool to help us prove that Thrm holds for n. This is known as the induction hypothesis. Having this assumption to work with makes the induction step easier to prove than tackling the original theorem itself. Being able to rely on the induction hypothesis provides extra information that we can bring to bear on the problem.
Recursion and induction have many similarities. Both are anchored on one or more base cases. A recursive function relies on the ability to call itself to get the answer for smaller instances of the problem. Likewise, induction proofs rely on the truth of the induction hypothesis to prove the theorem. The induction hypothesis does not come out of thin air. It is true if and only if the theorem itself is true, and therefore is reliable within the proof context. Using the induction hypothesis to do work is exactly the same as using a recursive call to do work.
Example: Sum of the first n integers
Here is a sample proof by mathematical induction. Call the sum of the first n positive integers S(n).
Theorem: S(n) = n(n+1)/2.
Proof: The proof is by mathematical induction.
Check the base case. For n = 1, verify that S(1) = 1(1+1)/2. S(1) is simply the sum of the first positive number, which is 1. Because 1(1+1)/2 = 1, the formula is correct for the base case.
State the induction hypothesis. The induction hypothesis is

S(n−1) = ∑_{i=1}^{n−1} i = (n−1)((n−1)+1)/2 = (n−1)n/2.
Use the assumption from the induction hypothesis for n − 1 to show that the result is true for n. The induction hypothesis states that S(n−1) = (n−1)n/2, and because S(n) = S(n−1) + n, we can substitute for S(n−1) to get

S(n) = (n−1)n/2 + n = (n^2 − n + 2n)/2 = n(n+1)/2.
Thus, by mathematical induction, S(n) = n(n+1)/2 for all n ≥ 1.
Note carefully what took place in this example. First we cast S(n) in terms of a smaller occurrence of the problem: S(n) = S(n−1) + n. This is important because once S(n−1) comes into the picture, we can use the induction hypothesis to replace S(n−1) with (n−1)n/2. From here, it is simple algebra to prove that S(n) equals the right-hand side of the original theorem.
Example: Recurrence relation
This example shows how we can use induction to prove that a proposed closed-form solution for a recurrence relation is correct.
Theorem: The following recurrence relation has closed-form solution T(n) = n(n+1)/2:

T(n) = T(n−1) + n, for n > 1;    T(1) = 1.
Proof: To prove the base case, we observe from the definition that T(1) = 1. From the proposed closed-form solution we get T(1) = 1(2)/2 = 1, which matches the definition.
The induction hypothesis is that T(n−1) = (n−1)n/2. Combining the definition of the recurrence with the induction hypothesis, we see immediately that

T(n) = T(n−1) + n = (n−1)n/2 + n = n(n+1)/2

for n > 1. Thus, we have proved the theorem correct by mathematical induction.
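A hypothesis like this can also be cross-checked numerically before (or after) proving it; a sketch comparing a recurrence of the form T(1) = 1, T(n) = T(n−1) + n against the proposed closed form n(n+1)/2 for small n:

```python
def T(n):
    """The recurrence: T(1) = 1, T(n) = T(n-1) + n."""
    return 1 if n == 1 else T(n - 1) + n

def closed_form(n):
    """Proposed closed-form solution n(n+1)/2."""
    return n * (n + 1) // 2
```

Such a check is no substitute for the induction proof, but it catches a wrong guess immediately.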
1.4.7 Estimation
The concept of estimation might be unfamiliar to many readers. Estimation is not a mathematical technique, but rather a general engineering skill. This is sometimes known as "back of the napkin" or "back of the envelope" calculation. Both nicknames suggest that only a rough estimate is produced. It is very useful to computer scientists doing design work, because any proposed solution whose estimated resource requirements fall well outside the problem's resource constraints can be discarded immediately, allowing time for greater analysis of more promising solutions.
Estimation techniques are a standard part of engineering curricula but are often neglected in computer science. Estimation is no substitute for rigorous, detailed analysis of a problem, but it can help to decide when a rigorous analysis is warranted: If the initial estimate indicates that the solution is unworkable, then further analysis is probably unnecessary.
Estimation can be formalised by the following three-step process:
- Determine the major parameters that affect the problem.
- Derive an equation that relates the parameters to the problem.
- Select values for the parameters, and apply the equation to yield an estimated solution.
When doing estimations, a good way to reassure yourself that the estimate is reasonable is to do it in two different ways. In general, if you want to know what comes out of a system, you can either try to estimate that directly, or you can estimate what goes into the system (assuming that what goes in must later come out). If both approaches (independently) give similar answers, then this should build confidence in the estimate.
When calculating, be sure that your units match. For example, do not add feet and pounds. Verify that the result is in the correct units. Always keep in mind that the output of a calculation is only as good as its input. The more uncertain your estimates for the input parameters in Step 3, the more uncertain the output value. However, back of the envelope calculations are often meant only to get an answer within an order of magnitude, or perhaps within a factor of two. Before doing an estimate, you should decide on acceptable error bounds, such as within 25%, within a factor of two, and so forth. Once you are confident that an estimate falls within your error bounds, leave it alone! Do not try to get a more precise estimate than necessary for your purpose.
Example: Bookcases for a million pages
How many bookcases does it take to store books containing one million pages? Let's estimate that a book of 200 pages requires one cm on the library shelf (it will help to look at the size of any handy book), yielding about 50 metres of shelf space for one million pages. If a shelf is 80 cm wide, then around 60–65 shelves are required. If a bookcase contains 6 shelves, this yields about 10 bookcases.
To reach this conclusion, we estimated the number of pages per cm, the width of a shelf, and the number of shelves in a bookcase. Any of these estimates can be wrong: books may have thinner or thicker paper, shelves can be narrower or wider, and bookcases can have more or fewer shelves. Let's say that the paper thickness can vary ±50%, that shelves can be 60–100 cm wide, and that there can be 5–8 shelves in a bookcase. This suggests that in the extreme case we might be wrong by a factor of 2–3, but it's much more likely that some of the errors even out so that our estimate is much more accurate.
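The three-step process translates directly into arithmetic; a sketch of the bookcase estimate, where the parameter values are the guesses from the text:

```python
# Step 1: the major parameters (all rough guesses, per the text).
pages = 1_000_000
pages_per_cm = 200          # a 200-page book is about 1 cm thick
shelf_width_cm = 80
shelves_per_bookcase = 6

# Step 2 and 3: relate the parameters and apply the values.
shelf_cm = pages / pages_per_cm              # 5000 cm = 50 m of shelf
shelves = shelf_cm / shelf_width_cm          # 62.5 shelves
bookcases = shelves / shelves_per_bookcase   # roughly 10 bookcases
```

Changing any guess (say, 100 cm shelves, or 8 shelves per bookcase) and rerunning shows how sensitive the result is to each parameter.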
1.4.8 Random numbers
The success of randomised algorithms depends on having access to a good random number generator. While modern compilers are likely to include a random number generator that is good enough for most purposes, it is helpful to understand how they work, and even to be able to construct your own in case you don't trust the one provided. This is easy to do.
First, let us consider what a random sequence is. From the following list, which appears to be a sequence of "random" numbers?
- 1, 1, 1, 1, 1, 1, 1, 1, 1, β¦
- 1, 2, 3, 4, 5, 6, 7, 8, 9, β¦
- 2, 7, 1, 8, 2, 8, 1, 8, 2, β¦
In fact, all three happen to be the beginning of some sequence in which one could continue the pattern to generate more values (in case you do not recognise it, the third one is the initial digits of the irrational constant e). Viewed as a series of digits, ideally every possible sequence has equal probability of being generated (even the three sequences above). In fact, definitions of randomness generally have features such as:
- One cannot predict the next item better than by guessing.
- The series cannot be described more briefly than simply listing it out. This is the equidistribution property.
There is no such thing as a random number sequence, only "random enough" sequences. A sequence is pseudorandom if no future term can be predicted in polynomial time, given all past terms.
Most computer systems use a deterministic algorithm to select pseudorandom numbers. The most commonly used approach historically is known as the Linear Congruential Method (LCM). The LCM method is quite simple. We begin by picking a seed that we will call r(1). Then, we can compute successive terms as follows, where b and t are some constant positive integers:

r(i) = (r(i−1) × b) mod t.
By definition of the mod function, all generated numbers must be in the range 0 to t − 1. Now, consider what happens when r(i) = r(j) for some values i and j. Of course then r(i+1) = r(j+1), which means that we have a repeating cycle.
Since the values coming out of the random number generator are between 0 and t − 1, the longest cycle that we can hope for has length t. In fact, since 0 maps to itself (once 0 appears, every later value is also 0), a useful cycle cannot even be quite this long. It turns out that to get a good result, it is crucial to pick good values for both b and t. To see why, consider the following example.
Example: Varying the b value
Given a t value of 13, we can get very different results depending on the b value that we pick, in ways that are hard to predict.
In the case of b = 5, the generator goes through only a short sequence before repeating, with the series depending on the seed value chosen. Clearly, a b value of 5 is far inferior to b values of 6 or 7 in this example.
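The effect of b is easy to see by generating the cycles directly; a sketch assuming t = 13 as above (the helper name lcm_sequence is our own):

```python
def lcm_sequence(seed, b, t):
    """Generate r(i) = (r(i-1) * b) mod t starting from seed,
    stopping when the seed recurs (one full cycle)."""
    seq, r = [], seed
    while True:
        r = (r * b) % t
        seq.append(r)
        if r == seed:
            return seq          # cycle closed
        if len(seq) > t:
            return seq          # safety net: seed never recurs
```

With t = 13 and seed 1, b = 5 yields a cycle of only 4 values, while b = 6 and b = 7 each cycle through all 12 nonzero values.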
If you would like to write a simple LCM random number generator of your own, an effective one can be made with the following formula:

r(i) = (r(i−1) × 16807) mod (2^31 − 1).
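These constants (b = 16807, t = 2^31 − 1 = 2147483647) are the well-known "minimal standard" generator of Park and Miller; a minimal sketch (the class name is our own):

```python
class MinimalStandardRNG:
    """Linear congruential generator r(i) = 16807 * r(i-1) mod (2^31 - 1)."""

    MODULUS = 2**31 - 1        # 2147483647, a prime
    MULTIPLIER = 16807

    def __init__(self, seed=1):
        # 0 is a fixed point and MODULUS wraps to 0, so both are excluded.
        if not 0 < seed < self.MODULUS:
            raise ValueError("seed must be in 1..2^31-2")
        self.state = seed

    def next(self):
        """Return the next pseudorandom value in 1..2^31-2."""
        self.state = (self.state * self.MULTIPLIER) % self.MODULUS
        return self.state
```

From seed 1 the sequence begins 16807, 282475249, 1622650073, ..., which is a handy way to verify an implementation.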
Another approach is based on using a computer chip that generates random numbers resulting from βthermal noiseβ in the system. Time will tell if this approach replaces deterministic approaches.