3.8 Analyzing Problems

You most often use the techniques of “algorithm” analysis to analyze an algorithm, or the instantiation of an algorithm as a program. But you can also use these same techniques to analyze the cost of a problem itself. The key question that we want to ask is: How hard is a problem? Certainly we should expect that, in some sense, the problem of sorting a list of records is harder than the problem of searching a list of records for a given key value. After all, the algorithms that we know for sorting some records seem to be more expensive than the algorithms that we know for searching those same records.

What we need are useful definitions for the upper bound and lower bound of a problem.

One might start by thinking that the upper bound for a problem is how hard any algorithm for it can be. But we can make algorithms as bad as we want, so that is not useful. Instead, what is useful is to say that a problem is only as hard as what we CAN do. In other words, we should define the upper bound for a problem to be the cost of the best algorithm that we know for the problem. Of course, whenever we talk about bounds, we have to say when they apply. So we really should say something like the best algorithm that we know in the worst case, or the best algorithm that we know in the average case.

But what does it mean to give a lower bound for a problem? The lower bound refers to the least cost that any algorithm for the problem MUST incur. For example, when searching an unsorted list, we MUST look at every record. When sorting a list, we MUST look at every record (to even know whether it is sorted).
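
To make this concrete, here is a minimal sketch of sequential search in Python. The function name and details are our own illustration, not from the text; the point is that on an unsorted list, no algorithm can declare a key absent without examining every record, and in the worst case this function does exactly that.

    def sequential_search(records, key):
        """Return the index of key in records, or -1 if absent.
        In the worst case (key absent, or in the last slot), all n
        records are examined, so the cost is at least cn."""
        for i, record in enumerate(records):
            if record == key:
                return i
        return -1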

It is much easier to show that an algorithm (or program) is in $\Omega(f(n))$ than it is to show that a problem is in $\Omega(f(n))$. For a problem to be in $\Omega(f(n))$ means that every algorithm that solves the problem is in $\Omega(f(n))$, even algorithms that we have not thought of! In other words, EVERY algorithm MUST have at least this cost. So, to prove a lower bound, we need an argument that is true even for algorithms that we don’t know.

So far all of our examples of algorithm analysis give “obvious” results, with big-Oh always matching $\Omega$. To understand how big-Oh, $\Omega$, and $\Theta$ notations are properly used to describe our understanding of a problem or an algorithm, it is best to consider an example where you do not already know a lot about the problem.

Let us look ahead to analyzing the problem of sorting to see how this process works. What is the least possible cost for any sorting algorithm in the worst case? The algorithm must at least look at every element in the input, just to determine that the input is truly sorted. Thus, any sorting algorithm must take at least $cn$ time. For many problems, this observation that each of the $n$ inputs must be looked at leads to an easy $\Omega(n)$ lower bound.
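
Even the verification step in this argument forces a full scan. A minimal Python sketch (the function is our own, for illustration) shows where the $cn$ cost comes from: confirming sortedness requires comparing every adjacent pair.

    def is_sorted(records):
        """Report whether records is in nondecreasing order.
        All n elements are touched via n - 1 adjacent comparisons,
        so even this check costs at least cn time."""
        for i in range(len(records) - 1):
            if records[i] > records[i + 1]:
                return False
        return True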

In your previous study of computer science, you have probably seen an example of a sorting algorithm whose running time is in $O(n^2)$ in the worst case. The simple Bubble Sort and Insertion Sort algorithms typically given as examples in a first-year programming course have worst-case running times in $O(n^2)$. Thus, the problem of sorting can be said to have an upper bound in $O(n^2)$. How do we close the gap between $\Omega(n)$ and $O(n^2)$? Can there be a better sorting algorithm? If you can think of no algorithm whose worst-case growth rate is better than $O(n^2)$, and if you have discovered no analysis technique to show that the least cost for the problem of sorting in the worst case is greater than $\Omega(n)$, then you cannot know for sure whether or not there is a better algorithm.
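
As a reminder of what such an $O(n^2)$ algorithm looks like, here is a standard Insertion Sort sketch in Python (any textbook variant would do). In the worst case, a reverse-ordered input, pass $i$ shifts $i$ elements, for roughly $n^2/2$ operations overall.

    def insertion_sort(records):
        """Sort records in place; worst-case running time O(n^2)."""
        for i in range(1, len(records)):
            value = records[i]
            j = i
            # Shift larger elements right until value's slot is found.
            while j > 0 and records[j - 1] > value:
                records[j] = records[j - 1]
                j -= 1
            records[j] = value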

Many good sorting algorithms have running time that is in $O(n \log n)$ in the worst case. This greatly narrows the gap. With this new knowledge, we now have a lower bound in $\Omega(n)$ and an upper bound in $O(n \log n)$. Should we search for a faster algorithm? Many have tried, without success. Fortunately (or perhaps unfortunately?), we can prove that any sorting algorithm must have running time in $\Omega(n \log n)$ in the worst case. This proof is one of the most important results in the field of algorithm analysis, and it means that no sorting algorithm can possibly run faster than $cn \log n$ for the worst-case input of size $n$. Thus, we can conclude that the problem of sorting is $\Theta(n \log n)$ in the worst case, because the upper and lower bounds have met.
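
Mergesort is one algorithm that achieves this $O(n \log n)$ upper bound. The sketch below is an illustration, not a reference implementation: the input is halved about $\log n$ times, and each level of recursion does $O(n)$ work merging sorted halves.

    def mergesort(records):
        """Return a sorted copy of records; worst case O(n log n)."""
        if len(records) <= 1:
            return records
        mid = len(records) // 2
        left = mergesort(records[:mid])
        right = mergesort(records[mid:])
        # Merge the two sorted halves in O(n) time.
        merged = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])   # One side may have elements left over.
        merged.extend(right[j:])
        return merged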

Knowing the lower bound for a problem does not give you a good algorithm. But it does help you to know when to stop looking. If the lower bound for the problem matches the upper bound given by the best algorithm that we know (within a constant factor), then any better algorithm can improve on ours by at most a constant factor.

So, to summarize: The upper bound for a problem is the best that you CAN do, while the lower bound for a problem is the least work that you MUST do. If those two are the same, then we say that we really understand our problem.

3.8.1 Practice questions: Analyzing problems

Answer TRUE or FALSE.

No algorithm for searching in an unsorted array can be worse than $O(n)$ since any algorithm must look at every value in the array in the worst case.

  • Do you think that sequential search is the only algorithm to search an unsorted array?
  • Can’t someone write a worse algorithm, perhaps by adding unnecessary work?

Answer TRUE or FALSE.

The lower bound in the worst case for the problem of searching an unsorted array is $\Omega(n)$ because this is the worst-case cost of the sequential search algorithm.

  • While it is true that sequential search is $\Omega(n)$ in the worst case, this is not the whole story.
  • Just because the best algorithm that we happen to know has a certain cost, that does not mean that there is no better algorithm.
  • The reason that search in an unsorted array has a lower bound of $\Omega(n)$ is that we can prove that any algorithm MUST look at every element (in some order) in the worst case.

Answer TRUE or FALSE.

The upper bound for a problem is defined as the upper bound cost for the worst algorithm that we know.

  • There is no limit to how bad someone might make an algorithm. So this can’t make sense as a definition.

Answer TRUE or FALSE.

The upper bound for a problem is defined as the upper bound cost for the best algorithm that we know.

  • This comes straight from the definition for the upper bound of a problem.

Answer TRUE or FALSE.

The lower bound for a problem is defined as the cost of the best algorithm that we know.

  • Just because we know some algorithm does not mean that there does not exist some better algorithm.
  • The lower bound for a problem is the best that an algorithm COULD be, not just what we happen to know.

Answer TRUE or FALSE.

The lower bound for a problem is defined as the least cost that any algorithm could reach.

  • This comes straight from the definition for the lower bound of a problem.

Answer TRUE or FALSE.

The worst-case upper bound for sorting an array is $O(n \log n)$ since this is the cost of the best algorithm (in the worst case) that we know about.

  • This comes straight from the definition for problem upper bound.

Answer TRUE or FALSE.

The lower bound of the sorting problem is $\Omega(n \log n)$ because we can prove that this is the best cost that any sorting algorithm could reach.

  • This comes straight from the definition of a problem lower bound.

Answer TRUE or FALSE.

The worst-case lower bound for sorting an array is $\Omega(n \log n)$ since this is the cost of the best algorithm (in the worst case) that we know about.

  • Just because we don’t know of a better algorithm does not mean that there is no better algorithm.
  • While it is true that the lower bound for sorting is $\Omega(n \log n)$, this is not the right reason.
  • The right reason is because we can prove that no algorithm can do better.