2.7 Algorithm analysis in practice
In this section we show how to analyse complexity in practice, by working through several simple code fragments. As mentioned in the last section, we will only give the upper bound using the big-$O$ notation.
2.7.1 Simplification rules
Once you determine the running-time equation for an algorithm, it is often a simple matter to derive the big-$O$ expression from the equation. You do not need to resort to the formal definitions of asymptotic analysis. Instead, you can use the following rules to determine the simplest form.
Transitivity: If $f(n) \in O(g(n))$ and $g(n) \in O(h(n))$, then $f(n) \in O(h(n))$.
This first rule says that if some function $g(n)$ is an upper bound for your cost function, then any upper bound for $g(n)$ is also an upper bound for your cost function.
Constant factor: If $f(n) \in O(g(n))$, then $k \cdot f(n) \in O(g(n))$, for any constant $k > 0$.
- Special case: $O(k \cdot g(n)) = O(g(n))$, for all constants $k > 0$
The significance of this rule is that you can ignore any multiplicative constants in your equations when using big-$O$ notation.
Addition: If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then $f_1(n) + f_2(n) \in O(g_1(n) + g_2(n))$.
- Special case: if $g_2(n) \in O(g_1(n))$, then $O(g_1(n) + g_2(n)) = O(g_1(n))$
This rule says that given two parts of a program run in sequence (whether two statements or two sections of code), you need consider only the more expensive part.
Multiplication: If $f_1(n) \in O(g_1(n))$ and $f_2(n) \in O(g_2(n))$, then $f_1(n) \cdot f_2(n) \in O(g_1(n) \cdot g_2(n))$.
The final rule is used to analyse simple loops in programs. If some action is repeated some number of times, and each repetition has the same cost, then the total cost is the cost of the action multiplied by the number of times that the action takes place.
Taking the first three rules collectively, you can ignore all constants and all lower-order terms to determine the asymptotic growth rate for any cost function. The advantages and dangers of ignoring constants were discussed in the beginning of the previous section. Ignoring lower-order terms is reasonable when performing an asymptotic analysis. The higher-order terms soon swamp the lower-order terms in their contribution to the total cost as $n$ becomes larger. Thus, if $T(n) = 3n^4 + 5n^2$, then $T(n)$ is in $O(n^4)$. The $5n^2$ term contributes relatively little to the total cost for large $n$.
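As a short worked example (the cost function here is our own illustration, not one from the text), consider $T(n) = 3n^2 + 10n\log n + 42$. Each term is in $O(n^2)$: the first by the constant factor rule, and the other two because $n \log n$ and $1$ grow no faster than $n^2$. By the addition rule, the whole sum is therefore in $O(n^2)$:

$$O(3n^2 + 10n\log n + 42) = O(n^2).$$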
From now on, we will use these simplifying rules when discussing the cost for a program or algorithm.
2.7.2 The complexity hierarchy
We can use the upper bound to define an ordering between complexity classes, where we write $O(f(n)) \leq O(g(n))$ whenever $f(n) \in O(g(n))$. Using this we can infer the following hierarchy of complexity classes:

$$O(1) < O(\log n) < O(n) < O(n \log n) < O(n^2) < O(n^3) < O(2^n)$$

Note that we use strict inequality here, and trust that you intuitively understand the difference between $<$ and $\leq$. One interesting consequence of asymptotic complexity is that the base of a logarithm becomes irrelevant:

$$O(\log_2 n) = O(\log_{10} n) = O(\ln n)$$

So we usually just write $O(\log n)$. The reason why the base is irrelevant is a direct consequence of the logarithm laws.
We leave it as an exercise to the reader to figure out both the definition of $<$ and why the logarithm base is irrelevant. But we will come back to this issue in Chapter 5.
2.7.3 Analysing code fragments
When we want to analyse the complexity of code fragments, the following three rules of thumb will get us very far:
- An atomic operation is always $O(1)$
- A sequence of operations, $P; Q$, has the complexity $O(f(n) + g(n))$, where $P \in O(f(n))$ and $Q \in O(g(n))$
- An iteration over an array $A$, performing some operation $P$ for each element, has the complexity $O(n \cdot f(n))$, where $n$ is the size of $A$ and $P \in O(f(n))$
Atomic operations
Atomic operations are things like assigning a variable, looking up the value of an array index, or printing something. Not all atomic operations take the same time to execute, but we can assume that each of them takes constant time.
If an operation takes constant time $c$, then it is in $O(c \cdot 1) = O(1)$ by the constant factor rule, so we can assume that all atomic operations are $O(1)$.
Sequence of operations
A sequence of atomic operations, such as performing 10 assignments, still takes constant time (a constant multiplied by 10 is still a constant). But what if we have a sequence of non-atomic operations?
E.g., suppose that we have the three operations $P \in O(f(n))$, $Q \in O(g(n))$, and $R \in O(h(n))$. The complexity of the sequence $P; Q; R$ will then be the sum of the parts, i.e.: $O(f(n) + g(n) + h(n))$.
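As a concrete sketch (the Python function below and its parts are our own illustration, not from the text), a program consisting of a constant-time part, a linear part and a quadratic part has total complexity $O(1) + O(n) + O(n^2) = O(n^2)$:

def sequence_example(data):
    # Part P: a couple of atomic operations -- O(1).
    total = 0
    count = 0

    # Part Q: a single loop over the n elements -- O(n).
    for x in data:
        count += 1

    # Part R: a double loop over the n elements -- O(n^2).
    for x in data:
        for y in data:
            total += x * y

    # By the addition rule, the sequence P; Q; R is O(1) + O(n) + O(n^2) = O(n^2).
    return total, count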
Loops and iterations
What if we perform some operation $P$ for each element in an array $A$? It depends on the size of the array: if we assume that the array has $n$ elements, what is the complexity of the following loop?

for x in A:     // Do n times.
    P(x)        // some operation P, taking O(f(n)) time

The loop performs $P$ once for every element in $A$, meaning that $P$ will be executed $n$ times. Therefore the complexity of the loop is:

$$O(n) \cdot O(f(n)) = O(n \cdot f(n))$$

Note that $P$ can be a complex operation, for example a loop itself. If $P$ is a simple loop over $A$, with a constant-time operation in its body, then $P$ is in $O(n)$, and then the outer loop will be in $O(n \cdot n) = O(n^2)$.
2.7.4 Examples of algorithm analysis
Example: Simple assignment
We begin with an analysis of a simple assignment to an integer variable.
a = b
Because the assignment statement takes constant time, it is $O(1)$.
Example: Simple for-loop
Consider a simple for loop.
sum = 0
for i in 0 .. n-1:
    sum = sum + n
The first line is $O(1)$. The for loop is repeated $n$ times. The third line takes constant time so, by the multiplication rule, the total cost for executing the two lines making up the for loop is $O(n)$. By the addition rule, the cost of the entire code fragment is also $O(n)$.
Example: Many for-loops
We now analyse a code fragment with several for loops, some of which are nested.
sum = 0
for j in 0 .. n-1:       // First for loop
    for i in 0 .. j-1:   // is a double loop.
        sum = sum + 1
for k in 0 .. n-1:       // Second for loop.
    arr[k] = k
This code fragment has three separate statements: the first assignment statement and the two for loops. Again the assignment statement takes constant time; call it $c_1$. The second for loop is just like the one in the previous example and takes $c_2 n \in O(n)$ time.

The first for loop is a double loop and requires a special technique. We work from the inside of the loop outward. The expression sum = sum + 1 requires constant time; call it $c_3$. Because the inner for loop is executed $j$ times, by the multiplication rule it has cost $c_3 j$. The outer for loop is executed $n$ times, but each time the cost of the inner loop is different because it costs $c_3 j$ with $j$ changing each time. You should see that for the first execution of the outer loop, the inner loop runs 0 times. For the second execution, it runs once. Each time through the outer loop it runs one more time, until the last time through the loop, when it runs $n - 1$ times. Thus, the total cost of the loop is $c_3$ times the sum of the integers 0 through $n - 1$. We know that

$$\sum_{j=0}^{n-1} j = \frac{n(n-1)}{2},$$

which is $O(n^2)$. By the addition rule, the total cost of the whole fragment, $c_1 + c_2 n + c_3 \frac{n(n-1)}{2}$, is simply $O(n^2)$.
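One way to sanity-check such a hand analysis is to count the executions of the inner statement directly. The Python sketch below (our own check, not part of the example) mirrors the double loop above and compares the count with $n(n-1)/2$:

def count_inner_steps(n):
    # Count how many times `sum = sum + 1` runs in the double loop.
    steps = 0
    for j in range(n):        # outer loop: n iterations
        for i in range(j):    # inner loop: j iterations
            steps += 1
    return steps

for n in (10, 100, 1000):
    assert count_inner_steps(n) == n * (n - 1) // 2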
Example: Comparing nested loops
Compare the asymptotic analysis for the following two code fragments.
sum1 = 0
for i in 0 .. n-1:       // First double loop.
    for j in 0 .. n-1:   // Do n times.
        sum1 = sum1 + 1
sum2 = 0
for i in 0 .. n-1:       // Second double loop.
    for j in 0 .. i-1:   // Do i times.
        sum2 = sum2 + 1
In the first double loop, the inner for loop always executes $n$ times. Because the outer loop executes $n$ times, it should be obvious that the statement sum1 = sum1 + 1 is executed precisely $n^2$ times. The second loop is similar to the one analysed in the previous example, with cost

$$\sum_{i=0}^{n-1} i = \frac{n(n-1)}{2}.$$

This is approximately $\frac{1}{2} n^2$. Thus, both double loops cost $O(n^2)$, though the second requires about half the time of the first.
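The factor of roughly two can be confirmed empirically; the Python sketch below (our own, not part of the text) runs both double loops and prints the two counts:

def compare_double_loops(n):
    sum1 = 0
    for i in range(n):        # first double loop: body runs n * n times
        for j in range(n):
            sum1 += 1

    sum2 = 0
    for i in range(n):        # second double loop: body runs 0 + 1 + ... + (n-1) times
        for j in range(i):
            sum2 += 1

    return sum1, sum2

print(compare_double_loops(1000))   # (1000000, 499500): the second is about half the first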
Example: Non-quadratic nested loops
Not all doubly nested for loops are strictly $O(n^2)$; the following is an example.
sum1 = 0
k = 1
while k <= n:            // Do log n times.
    for j in 0 .. n-1:   // Do n times.
        sum1 = sum1 + 1
    k = k * 2
To make our analysis easier, we will assume that $n$ is a power of two. The code fragment has its outer while loop executed $\log n + 1$ times because on each iteration $k$ is multiplied by two until it reaches $n$. Now, because the inner loop always executes $n$ times, the total cost can be expressed as

$$\sum_{i=0}^{\log n} n = n (\log n + 1).$$

So the cost of this double loop is $O(n \log n)$. Note that the summation variable $i$ is the logarithm of the loop variable $k$, i.e. $k = 2^i$ and $i = \log k$.

Our analysis rules give the same result: the outer loop is logarithmic and the inner loop is linear, so we multiply them to get $O(\log n) \cdot O(n) = O(n \log n)$.
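The iteration count can be checked empirically as well; this Python sketch (our own, assuming $n$ is a power of two) counts the additions and compares them with $n(\log n + 1)$:

import math

def count_steps(n):
    # Count how many times `sum1 = sum1 + 1` runs in the loop above.
    steps = 0
    k = 1
    while k <= n:             # about log n iterations
        for j in range(n):    # n iterations each time
            steps += 1
        k *= 2
    return steps

n = 1024                                # a power of two
print(count_steps(n))                   # 11264
print(n * (int(math.log2(n)) + 1))      # 11264 = n * (log n + 1)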
2.7.5 Advanced algorithm analysis
The rules of thumb above do not always give the tightest possible complexity. In some (rare) cases the simple analysis might give a complexity of, say, $O(n \log n)$, but a more detailed analysis will give a tighter bound, such as $O(n)$. So, is there something wrong with the rules?
No, the rules are correct, and this is possible because the big-$O$ notation gives an upper bound. Recall that every function that is in $O(n)$ is also in $O(n \log n)$, since $O(n) \leq O(n \log n)$ in the complexity hierarchy.
Example: A nested loop with linear complexity
If we take the non-quadratic example above and make just a very small change, we get a completely different complexity.
sum2 = 0
k = 1
while k <= n:            // Do log n times.
    for j in 0 .. k-1:   // Do k times.
        sum2 = sum2 + 1
    k = k * 2
The only difference is that the inner loop runs $k$ times, instead of $n$ times. The rules of thumb give us the same complexity as before, $O(n \log n)$, but a more careful analysis reveals a tighter bound.
The outer loop is executed $\log n + 1$ times, and the inner loop has cost $k$, which doubles each time. This can be expressed as the following summation, where $n$ is assumed to be a power of two and again $k = 2^i$:

$$\sum_{i=0}^{\log n} 2^i.$$

Now, as mentioned in Section 1.4, this summation has the closed form solution $2^{\log n + 1} - 1 = 2n - 1$. So, the complexity of the code fragment above is actually linear, $O(n)$, and not linearithmic, $O(n \log n)$.
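Counting the additions empirically (again our own Python sketch, with $n$ a power of two) confirms the linear behaviour:

def count_steps_linear(n):
    # Count how many times `sum2 = sum2 + 1` runs in the modified loop.
    steps = 0
    k = 1
    while k <= n:
        for j in range(k):    # k iterations, doubling each round
            steps += 1
        k *= 2
    return steps

n = 1024                          # a power of two
print(count_steps_linear(n))      # 2047 = 2 * n - 1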
Note that this is an exception where the simple analysis rules do not give a tight enough bound. But in almost all cases the rules work fine, and when they don't, the result is usually only off by a logarithmic factor. (And as we all know, logarithmic factors are not that big a deal when it comes to computational complexity.)
2.7.6 Other control statements
What about other control statements? While loops are analysed in a manner similar to for loops. The cost of an if statement in the worst case is the greater of the costs for the then and else clauses. This is also true for the average case, assuming that the input size $n$ does not affect the probability of executing one of the clauses (which is usually, but not necessarily, true). For switch statements, the worst-case cost is that of the most expensive branch. For subroutine calls, simply add the cost of executing the subroutine.
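As a small illustration (the function below is our own example, not from the text), consider an if statement whose then clause is constant time and whose else clause does a linear scan; for the worst-case analysis the whole statement counts as $O(n)$:

def find_value(data, index, target):
    if 0 <= index < len(data):
        return data[index] == target   # then clause: O(1)
    else:
        return target in data          # else clause: O(n) linear scan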
There are rare situations in which the probabilities of executing the various branches of an if or switch statement are functions of the input size. For example, for input of size $n$, the then clause of an if statement might be executed with probability $1/n$. An example would be an if statement that executes the then clause only for the smallest of $n$ values. To perform an average-case analysis for such programs, we cannot simply count the cost of the if statement as being the cost of the more expensive branch. In such situations, the technique of amortised analysis can come to the rescue.
2.7.7 Recursive functions
Determining the execution time of a recursive subroutine can be difficult. The running time for a recursive subroutine is typically best expressed by a recurrence relation. For example, the recursive factorial function calls itself with a value one less than its input value. The result of this recursive call is then multiplied by the input value, which takes constant time. Thus, the cost of the factorial function, if we wish to measure cost in terms of the number of multiplication operations, is one more than the number of multiplications made by the recursive call on the smaller input. Because the base case does no multiplications, its cost is constant. Thus, the running time for this function can be expressed as

$$T(n) = T(n-1) + 1 \;\text{ for } n > 1, \qquad T(1) = c \text{ for some constant } c.$$

The closed-form solution for this recurrence relation is $O(n)$. Recurrence relations are discussed further in Section 7.4.
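For concreteness, here is one possible Python version of the recursive factorial function (our own sketch; the details of any particular implementation may differ), with the multiplication that the recurrence counts on the recursive line:

def factorial(n):
    # Base case: no multiplications, constant cost.
    if n <= 1:
        return 1
    # One multiplication plus the cost of the recursive call: T(n) = T(n-1) + 1.
    return n * factorial(n - 1)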