3.5 Asymptotic Analysis and Upper Bounds
Two views of a graph illustrating the growth rates for six equations. The bottom view shows in detail the lower-left portion of the top view. The horizontal axis represents input size. The vertical axis can represent time, space, or any other measure of cost.
Despite the larger constant for the curve labeled in the figure above, crosses it at the relatively small value of . What if we double the value of the constant in front of the linear equation? As shown in the graph, is surpassed by once . The additional factor of two for the linear growth rate does not much matter. It only doubles the -coordinate for the intersection point. In general, changes to a constant factor in either equation only shift where the two curves cross, not whether the two curves cross.
When you buy a faster computer or a faster compiler, the new problem size that can be run in a given amount of time for a given growth rate is larger by the same factor, regardless of the constant on the running-time equation. The time curves for two algorithms with different growth rates still cross, regardless of their running-time equation constants. For these reasons, we usually ignore the constants when we want an estimate of the growth rate for the running time or other resource requirements of an algorithm. This simplifies the analysis and keeps us thinking about the most important aspect: the growth rate. This is called asymptotic algorithm analysis. To be precise, asymptotic analysis refers to the study of an algorithm as the input size “gets big” or reaches a limit (in the calculus sense). However, it has proved to be so useful to ignore all constant factors that asymptotic analysis is used for most algorithm comparisons.
In rare situations, it is not reasonable to ignore the constants. When comparing algorithms meant to run on small values of , the constant can have a large effect. For example, if the problem requires you to sort many collections of exactly five records, then a sorting algorithm designed for sorting thousands of records is probably not appropriate, even if its asymptotic analysis indicates good performance. There are rare cases where the constants for two algorithms under comparison can differ by a factor of 1000 or more, making the one with lower growth rate impractical for typical problem sizes due to its large constant. Asymptotic analysis is a form of “back of the envelope” estimation for algorithm resource consumption. It provides a simplified model of the running time or other resource needs of an algorithm. This simplification usually helps you understand the behavior of your algorithms. Just be aware of the limitations to asymptotic analysis in the rare situation where the constant is important.
3.5.1 Upper Bounds
Several terms are used to describe the running-time equation for an algorithm. These terms – and their associated symbols – indicate precisely what aspect of the algorithm’s behavior is being described. One is the upper bound for the growth of the algorithm’s running time. It indicates the upper or highest growth rate that the algorithm can have.
Because the phrase “has an upper bound to its growth rate of ” is long and often used when discussing algorithms, we adopt a special notation, called big-Oh notation. If the upper bound for an algorithm’s growth rate (for, say, the worst case) is (f(n)), then we would write that this algorithm is “in the set in the worst case” (or just “in in the worst case”). For example, if grows as fast as (the running time of our algorithm) for the worst-case input, we would say the algorithm is “in in the worst case”.
The following is a precise definition for an upper bound. represents the true running time of the algorithm. is some expression for the upper bound.
For a non-negatively valued function, is in set if there exist two positive constants and such that for all .
Constant is the smallest value of for which the claim of an upper bound holds true. Usually is small, such as 1, but does not need to be. You must also be able to pick some constant , but it is irrelevant what the value for actually is. In other words, the definition says that for all inputs of the type in question (such as the worst case for all inputs of size ) that are large enough (i.e., ), the algorithm always executes in less than or equal to steps for some constant .
Example: Sequential search
Consider the sequential search algorithm for finding a specified value in an array of integers. If visiting and examining one value in the array requires steps where is a positive number, and if the value we search for has equal probability of appearing in any position in the array, then in the average case . For all values of , . Therefore, by the definition, is in for and .
Example: Quadratic algorithm
For a particular algorithm, in the average case where and are positive numbers. Then,
for all . So, for , and . Therefore, is in by the second definition.
Example: Accessing an array cell
Assigning the value from a given position of an array to a variable takes constant time regardless of the size of the array. Thus, (for the best, worst, and average cases). We could say in this case that is in . However, it is traditional to say that an algorithm whose running time has a constant upper bound is in .
If someone asked you out of the blue “Who is the best?” your natural reaction should be to reply “Best at what?” In the same way, if you are asked “What is the growth rate of this algorithm”, you would need to ask “When? Best case? Average case? Or worst case?” Some algorithms have the same behavior no matter which input instance of a given size that they receive. An example is finding the maximum in an array of integers. But for many algorithms, it makes a big difference which particular input of a given size is involved, such as when searching an unsorted array for a particular value. So any statement about the upper bound of an algorithm must be in the context of some specific class of inputs of size . We measure this upper bound nearly always on the best-case, average-case, or worst-case inputs. Thus, we cannot say, “this algorithm has an upper bound to its growth rate of ” because that is an incomplete statement. We must say something like, “this algorithm has an upper bound to its growth rate of in the average case”.
Knowing that something is in says only how bad things can be. Perhaps things are not nearly so bad. Because sequential search is in in the worst case, it is also true to say that sequential search is in . But sequential search is practical for large in a way that is not true for some other algorithms in . We always seek to define the running time of an algorithm with the tightest (lowest) possible upper bound. Thus, we prefer to say that sequential search is in . This also explains why the phrase “is in ” or the notation “” is used instead of “is ” or “”. There is no strict equality to the use of big-Oh notation. is in , but is not in .
3.5.2 Simplifying Rules
Once you determine the running-time equation for an algorithm, it really is a simple matter to derive the big-Oh expressions from the equation. You do not need to resort to the formal definitions of asymptotic analysis. Instead, you can use the following rules to determine the simplest form.
- If is in and is in , then is in .
- If is in for any constant , then is in .
- If is in and is in , then is in .
- If is in and is in , then is in .
The first rule says that if some function is an upper bound for your cost function, then any upper bound for is also an upper bound for your cost function.
The significance of rule (2) is that you can ignore any multiplicative constants in your equations when using big-Oh notation.
Rule (3) says that given two parts of a program run in sequence (whether two statements or two sections of code), you need consider only the more expensive part.
Rule (4) is used to analyze simple loops in programs. If some action is repeated some number of times, and each repetition has the same cost, then the total cost is the cost of the action multiplied by the number of times that the action takes place.
Taking the first three rules collectively, you can ignore all constants and all lower-order terms to determine the asymptotic growth rate for any cost function. The advantages and dangers of ignoring constants were discussed near the beginning of this section. Ignoring lower-order terms is reasonable when performing an asymptotic analysis. The higher-order terms soon swamp the lower-order terms in their contribution to the total cost as (n) becomes larger. Thus, if , then is in . The term contributes relatively little to the total cost for large .
From now on, we will use these simplifying rules when discussing the cost for a program or algorithm.
3.5.3 Tight Bounds
The definition for big-Oh allows us to greatly overestimate the cost for an algorithm. But sometimes we know a tight bound – that is, a bound that truly reflects the cost of the algorithm or program with a constant factor. In that case, we can express this more accurate state of our knowledge using the symbol instead of using big-Oh.
While some textbooks and programmers will casually say that an algorithm is “order of” or “big-Oh” of some cost function, it is generally better to use notation rather than big-Oh notation whenever we have sufficient knowledge about an algorithm to be sure that we know the cost to within a constant factor. OpenDSA modules use notation in preference to big-Oh notation whenever our state of knowledge makes that possible.
3.5.4 Summary
3.5.5 Practice questions: Asymptotic analysis
Answer TRUE or FALSE.
The Sequential Search algorithm is in .
- Recall that means that the program has no inputs that cost more than this.
- A proposed upper bound can be much higher than the actual cost of the program, and still be correct.
For sequential search, the best case occurs when:
- It is a serious misunderstanding of algorithm analysis to think that best case is related to input size. The array being small has nothing to do with what is the best case input (of a given size).
- Implementation of an algorithm (as a program) has nothing to do with the algorithm’s best case.
- Since sequential search can be used on an unsorted array, the value for the search key (large or small) might have nothing to do with where the record with that key will be in the array.
What is the smallest integer such that is in ?
- What is the smallest integer greater than 0.5?
What is the smallest integer such that is in ?
- is bigger than .
- So, that means 1 is too small.
- But, is less than .
- Actually is less than even for tiny positive values of .
Which of these is the best upper bound for a growth rate of ?
- The simplifying rules tell us that we can drop constants and lower order terms from a polynomial that defines the growth rate.
Which of these is the best upper bound for a growth rate of ?
- The simplifying rules tell us that we can drop constants and lower order terms from a polynomial that defines the growth rate.
- If you don’t see a tight upper bound, pick the smallest bound that is larger.