Lecture 24

Complexity

As we saw last week, different algorithms that solve the same problem may have different complexities. How can this complexity be measured?

The execution time of an algorithm will be measured in steps. The time taken for each step is not very important; we only have to wait a few years for technology to reduce this time. What is important is the relationship between the size of the input data and the total number of steps required.

If there are n 'pieces' of input data (characters, numbers, strings, etc.) then we can express the number of steps as a function of n, e.g. n^2, 2n log n, 4n^2 + 5n + 100. In a function such as 4n^2 + 5n + 100, the 4n^2 term will dominate the other terms as n gets larger. The dominant term describes the asymptotic behavior of the algorithm, and this behavior is often all that we need to know about the complexity of the algorithm.
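This dominance is easy to check numerically. A minimal sketch (the cost function 4n^2 + 5n + 100 is just the example above; the program itself is illustrative only):

#include <stdio.h>

int main(void) {
    /* As n grows, 4n^2 + 5n + 100 is increasingly well
       approximated by its dominant term 4n^2 alone. */
    long n;
    for (n = 10; n <= 100000; n *= 10) {
        double full = 4.0*n*n + 5.0*n + 100.0;
        double dominant = 4.0*n*n;
        printf("n=%6ld  full=%14.0f  4n^2=%14.0f  ratio=%.5f\n",
               n, full, dominant, full/dominant);
    }
    return 0;
}

The ratio tends to 1 as n grows, so for large n nothing is lost by keeping only the dominant term.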
 
Size of input data (n)   log2 n microseconds   n microseconds   n^2 microseconds   2^n microseconds
10                       .000003 seconds       .00001 seconds   .0001 seconds      .001 seconds
100                      .000007 seconds       .0001 seconds    .01 seconds        10^14 centuries
1000                     .00001 seconds        .001 seconds     1 second           astronomical
10000                    .000013 seconds       .01 seconds      1.7 minutes        astronomical
100000                   .000017 seconds       .1 seconds       2.8 hours          astronomical

Any constant factor in the dominant term can also be ignored, because this is the same as changing the size of a step. After ignoring all the non-dominant terms and constants, we get an approximate measure of complexity. This is described using 'O-notation' (big-oh notation). For example, an algorithm is O(n^2) if it takes a time roughly proportional to n^2 for large n.

Algorithms whose complexity is of the form O(n^C), where C is a constant, are called polynomial algorithms. Problems that have polynomial algorithms usually have practical solutions.

Algorithms whose complexity is of the form O(C^n), where C is a constant, are called exponential algorithms. These algorithms are usually impractical, since the computation soon becomes enormous even for relatively small n.
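The gap between these two growth rates is what the table above shows. A short sketch that regenerates it, assuming one step takes one microsecond as in the table (for large n, 2^n overflows a double and prints as inf, corresponding to the 'astronomical' entries):

#include <stdio.h>
#include <math.h>

int main(void) {
    /* One step = one microsecond; all times printed in seconds. */
    long sizes[] = {10, 100, 1000, 10000, 100000};
    int i;
    for (i = 0; i < 5; i++) {
        double n = (double)sizes[i];
        printf("n=%6ld  log2n=%.6fs  n=%.5fs  n^2=%.1fs  2^n=%gs\n",
               sizes[i], log2(n)/1e6, n/1e6, n*n/1e6, pow(2.0, n)/1e6);
    }
    return 0;
}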

Example: What is the complexity of the bubble sort?

void bubble(int n[], int no) {
    int i, j, t;
    for (i = 0; i < no; i++)                /* one pass per element */
        for (j = 0; j < no - i - 1; j++)    /* compare adjacent pairs */
            if (n[j] > n[j+1]) {            /* swap if out of order */
                t = n[j];
                n[j] = n[j+1];
                n[j+1] = t;
            }
}
Here, we can use 'no' as n. If the if statement in the inner loop represents one step, then the number of steps taken is:

    (n-1) + (n-2) + ... + 2 + 1 = n(n-1)/2

Thus, the complexity of the bubble sort is O(n^2).
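This count can be verified by instrumenting the sort with a step counter (bubble_steps is a hypothetical helper, not part of the lecture code):

#include <stdio.h>

/* bubble sort with a comparison counter added */
long bubble_steps(int n[], int no) {
    long steps = 0;
    int i, j, t;
    for (i = 0; i < no; i++)
        for (j = 0; j < no - i - 1; j++) {
            steps++;                 /* count each 'if' as one step */
            if (n[j] > n[j+1]) {
                t = n[j]; n[j] = n[j+1]; n[j+1] = t;
            }
        }
    return steps;
}

int main(void) {
    enum { N = 1000 };
    int a[N], i;
    for (i = 0; i < N; i++) a[i] = N - i;   /* reverse order */
    printf("steps = %ld, n(n-1)/2 = %d\n", bubble_steps(a, N), N*(N-1)/2);
    return 0;
}

The counter reports n(n-1)/2 steps whatever the input order, because this bubble sort always makes every comparison.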

How about the quicksort:

void quicksort(int n[], int left, int right) {
    int dp;
    if (left < right) {
        dp = partition(n, left, right);    /* dp is the division point */
        quicksort(n, left, dp - 1);        /* sort the left part */
        quicksort(n, dp + 1, right);       /* sort the right part */
    }
}
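The partition function is not shown in the lecture. One common scheme, sketched here as an assumption, uses the leftmost element as the pivot, moves smaller elements to its left, and returns the pivot's final position:

/* a possible partition: n[left] is the pivot; smaller elements are
   moved to its left and the pivot's final index is returned */
int partition(int n[], int left, int right) {
    int pivot = n[left];
    int i = left, j, t;
    for (j = left + 1; j <= right; j++)
        if (n[j] < pivot) {
            i++;
            t = n[i]; n[i] = n[j]; n[j] = t;
        }
    t = n[left]; n[left] = n[i]; n[i] = t;   /* put pivot into place */
    return i;
}

With this choice of pivot, an already sorted array always gives a division point at the leftmost position, which is exactly the worst case analyzed below.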
This algorithm will take different times depending on the input data, so we must do both a best case and a worst case complexity analysis.

Assume that partitioning an array of m elements takes m steps. The worst case for the quicksort occurs when the array is already sorted: the division point is then always the first position, and the quicksort is called on arrays of size (n-1), (n-2), ..., 1. The total number of steps is therefore (n-1) + (n-2) + ... + 1 = n(n-1)/2, so in this case the quicksort has exactly the same complexity as the bubble sort: O(n^2).

The best case occurs when the division point is always halfway through the array. Now the quicksort function will be called twice for arrays of size n/2, four times for arrays of size n/4, ..., n times for arrays of size 1. Each of these sets of calls (2*(n/2), 4*(n/4), 8*(n/8), ..., n*1) takes n steps. There are log2 n sets of calls, and so the best case complexity of the quicksort is O(n log2 n).
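Both cases can be observed by counting partition comparisons on sorted and on scrambled input. A sketch, assuming the leftmost-pivot partition above (partitioning m elements gives m-1 comparisons, close to the m steps assumed; the scrambling formula is arbitrary):

#include <stdio.h>

long steps = 0;                 /* one step per comparison in partition */

int partition(int n[], int left, int right) {
    int pivot = n[left], i = left, j, t;
    for (j = left + 1; j <= right; j++) {
        steps++;
        if (n[j] < pivot) {
            i++;
            t = n[i]; n[i] = n[j]; n[j] = t;
        }
    }
    t = n[left]; n[left] = n[i]; n[i] = t;
    return i;
}

void quicksort(int n[], int left, int right) {
    if (left < right) {
        int dp = partition(n, left, right);
        quicksort(n, left, dp - 1);
        quicksort(n, dp + 1, right);
    }
}

int main(void) {
    enum { N = 1000 };
    int a[N], i;

    for (i = 0; i < N; i++) a[i] = i;               /* already sorted */
    quicksort(a, 0, N - 1);
    printf("sorted:    %ld steps (n(n-1)/2 = %d)\n", steps, N*(N-1)/2);

    steps = 0;
    for (i = 0; i < N; i++) a[i] = (i * 9973) % N;  /* scrambled */
    quicksort(a, 0, N - 1);
    printf("scrambled: %ld steps (n log2 n is about 9966)\n", steps);
    return 0;
}

The sorted input gives exactly n(n-1)/2 steps, while the scrambled input gives a count of the same order as n log2 n.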
 

In practice, the performance of the quicksort for randomly ordered data is nearer the best case. We can stop the quicksort having O(n^2) complexity for ordered data by choosing the pivot at random.
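A sketch of the random-pivot idea, assuming the partition function sketched earlier (random_partition is a hypothetical name; rand() would be seeded once with srand elsewhere):

#include <stdlib.h>

int partition(int n[], int left, int right);   /* as sketched above */

/* swap a randomly chosen element into the leftmost position, then
   partition as before, so already-sorted input no longer forces
   the division point to the first position */
int random_partition(int n[], int left, int right) {
    int r = left + rand() % (right - left + 1);
    int t = n[left]; n[left] = n[r]; n[r] = t;
    return partition(n, left, right);
}

The O(n^2) behavior can then only occur by bad luck rather than by construction, and the expected complexity is O(n log2 n) for any input order.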

Every problem has an inherent complexity, i.e. the complexity of the best algorithm that can solve the problem. For many problems we do not know the inherent complexity; in these cases we only know the complexity of the best algorithm found so far. This is referred to as an upper bound on the complexity.