Lecture 16--Lower bounds for the sorting problem--key comparisons, inversions. 11/3/97.
================================================================================
NOTE: REMEMBER THAT YOUR INTERIM REPORT ON THE PROGRAMMING ASSIGNMENT IS DUE ON FRIDAY, NOVEMBER 7. EACH PROGRAMMING GROUP SHOULD HAND IN ONE PAGE DESCRIBING WHAT HAS BEEN ACCOMPLISHED, WHAT REMAINS TO BE ACCOMPLISHED, AND THE GROUP'S PLAN FOR COMPLETING THE ASSIGNMENT ON TIME.
================================================================================
16.1. Lower bound arguments. Recall that in Lecture 8 we showed that any comparison-based algorithm which correctly searches any n-element array for an item X must do at least |_ log2n _| comparisons for some input, i.e., if C(n) is the number of comparisons done by the algorithm,
C(n) = "capital omega"(log2n).
Since we have also exhibited an algorithm, binary search, which does a maximum of O(log2n) comparisons to search a sorted array (and for which the running time is also "capital theta"(log2n)), we know that binary search is an optimal time algorithm, up to a constant multiple, for the problem of searching a sorted array using comparisons. That is, up to a constant multiple, we have determined the best possible algorithm, among all algorithms using comparisons, for searching a sorted array of n elements. We have also shown, by exhibiting and analyzing the algorithm linear search, that in the case of an unsorted array the number of comparisons, C'(n), satisfies
C'(n) = "capital omega"(log2n),
C'(n) = O(n).
So in the case of an unsorted array there is a gap between the best algorithm we have been able to find and the lower bound we can prove.
In this lecture we will consider lower bound results for the problem of sorting an array of n keys chosen from some ordered set. To simplify notation and make the proofs which follow less complex, we will assume that we are sorting arrays of n distinct integers 0,1,2,3, ... , n-1. We know that there are n! different permutations of these integers.
16.2. Algorithms which use local comparisons. We have looked at two algorithms, insertion sort and modified bubblesort, which are "capital theta"(n^2) algorithms. Both of these algorithms sort by interchanging adjacent array elements only. It turns out that if we use this "local" strategy then these algorithms are asymptotically the fastest we can ever hope to find. To prove this statement we need to look at the structure of a permutation. Let P be a permutation of the integers 0,1,2,...,n-1. We can represent P by giving the sequence of integers into which P transforms the sequence 0 1 2 ... n-1. For example, if P is a permutation of 0 1 2 3 and P sends 2 to position 0, 3 to position 1, 1 to position 2, and 0 to position 3, we can write
P = 2 3 1 0.
We say that a pair (P(i),P(j)) is an inversion if i < j and P(i) > P(j). For example, if P = 3 4 2 1 5, then P has five inversions, (3,2), (3,1), (4,2), (4,1), and (2,1).
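The definition can be checked directly with a few lines of Python. This is only a minimal sketch; the function name inversions is our own choice and is not part of the lecture. It lists every pair that satisfies the definition by brute force.

```python
def inversions(p):
    """Return all pairs (p[i], p[j]) with i < j and p[i] > p[j]."""
    return [(p[i], p[j])
            for i in range(len(p))
            for j in range(i + 1, len(p))
            if p[i] > p[j]]

example = [3, 4, 2, 1, 5]
print(inversions(example))
# [(3, 2), (3, 1), (4, 2), (4, 1), (2, 1)]
```

This brute-force scan examines all n(n-1)/2 pairs, which is fine for checking small examples by hand.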
Lemma 1. There exists a permutation with n(n-1) / 2 inversions.
Proof. The permutation n-1 n-2 . . . 1 0 has (n-1) + (n-2) + ... + 1 = n(n-1) / 2 inversions.
Theorem 1. Any algorithm which sorts by comparisons of keys and removes at most one inversion after each comparison must do at least n(n-1) / 2 comparisons in the worst case.
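Proof. The permutation in Lemma 1 is a legal input to the algorithm and contains n(n-1) / 2 inversions. The algorithm must remove all of them, and since each comparison removes at most one inversion, it must do at least n(n-1) / 2 comparisons on this input.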
Lemma 2. The expected number of inversions in a permutation of n elements (averaged over all n! permutations) is n(n-1) / 4.
Proof. For any permutation P = j0, j1, . . . , jn-1, define its transpose Pt = jn-1, ... , j1, j0, i.e., P written in reverse order. (For example, the transpose of 3 4 2 1 5 is 5 1 2 4 3.) Now for any pair j,k, with j < k, the inversion (k,j) appears either in P or in Pt, but not both, for every permutation P. There are
"SUM"_{0<=j<=n-1} (n-1-j) = "SUM"_{0<=j'<=n-1} j' = n(n-1) / 2
such pairs. So every pair of permutations P,Pt contains a total of n(n-1) / 2 inversions, and so on the average a permutation contains {n(n-1) / 2} * (1/2) = n(n-1) / 4 inversions.
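For small n the lemma can be confirmed by brute force. The sketch below is an illustration only; the names count_inversions and average_inversions are ours. It enumerates all n! permutations, checks that a permutation and its reversal together contain n(n-1)/2 inversions, and checks that the average is n(n-1)/4.

```python
from fractions import Fraction
from itertools import permutations

def count_inversions(p):
    """Number of pairs i < j with p[i] > p[j]."""
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p)) if p[i] > p[j])

def average_inversions(n):
    """Exact average number of inversions over all n! permutations."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_inversions(p) for p in perms), len(perms))

for n in range(1, 7):
    for p in permutations(range(n)):
        # P and its reversal share the n(n-1)/2 possible inversions.
        assert count_inversions(p) + count_inversions(p[::-1]) == n * (n - 1) // 2
    assert average_inversions(n) == Fraction(n * (n - 1), 4)
```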
Theorem 2. Any algorithm which sorts by comparisons of keys and removes at most one inversion after each comparison must do an average of at least n(n-1) / 4 comparisons.
Proof. On the average a permutation contains n(n-1) / 4 inversions, by Lemma 2, and so the algorithm must remove this many.
Corollary to Theorem 1 and Theorem 2. Any algorithm which sorts by comparisons of keys and removes at most one inversion after each comparison must do "capital omega"(n^2) comparisons both in the worst case and in the average case.
These results show that, since insertion sort and (modified) bubble sort remove at most one inversion after each comparison, they are both as fast as possible (up to a constant multiple) if we restrict ourselves to algorithms using such a "local" strategy.
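To see the bound in action, here is a sketch of insertion sort written with adjacent interchanges only, instrumented to count comparisons and swaps. This particular coding of insertion sort and the names insertion_sort_counts and count_inversions are our assumptions, not the version presented in an earlier lecture. Each swap removes exactly one inversion, so the number of swaps equals the number of inversions in the input, and the number of comparisons is at least that large.

```python
from itertools import permutations

def count_inversions(p):
    """Number of pairs i < j with p[i] > p[j]."""
    return sum(1 for i in range(len(p))
                 for j in range(i + 1, len(p)) if p[i] > p[j])

def insertion_sort_counts(keys):
    """Sort a copy of keys; return (sorted list, comparisons, adjacent swaps)."""
    a = list(keys)
    comparisons = swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]  # removes exactly one inversion
                swaps += 1
                j -= 1
            else:
                break
    return a, comparisons, swaps

n = 5
worst = list(range(n - 1, -1, -1))        # the permutation of Lemma 1
print(insertion_sort_counts(worst)[1:])   # (10, 10), i.e. n(n-1)/2 each

for p in permutations(range(n)):
    _, comparisons, swaps = insertion_sort_counts(p)
    assert swaps == count_inversions(p) <= comparisons
```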
16.3. Algorithms which use key comparisons--the general case. In the general case an algorithm may make comparisons between elements widely separated in the array. If these two elements are swapped, several inversions may be removed at once. We have looked at several algorithms which use this type of comparison--quicksort, heapsort, mergesort, and shellsort--and we have seen that these algorithms can achieve faster running times than "local" algorithms such as insertion sort and bubble sort. In fact, in the worst case, both mergesort and heapsort can execute in time "capital theta"(nlog2n). In this section we will show that for comparison-based sorts this is the best time we can achieve. To do so we suppose we are given some sorting algorithm A. We construct a (binary) decision tree representing all possible sequences of instructions executed by algorithm A. Again we suppose we are sorting an array of n distinct integers, L[0], L[1], ... , L[n-1]. We assume that when the algorithm is finished making comparisons (and the array is sorted) the algorithm A outputs the sorted array. We construct the decision tree by defining a node for each output instruction and each compare instruction executed by the algorithm A as follows:
For each output instruction, the tree contains a node labeled with the rearrangement of keys that will be output when that instruction is executed.
For each comparison instruction, where L[i] is compared with L[j], the tree contains a node labeled (i,j). This node has a left child which represents the next compare or output instruction executed if L[i] < L[j] and a right child which represents the next compare or output instruction executed if L[i] > L[j].
The root of the decision tree is the node associated with the first compare instruction the sorting algorithm executes. We can also assume that the tree is "pruned" by removing any comparison nodes with only one child and also by removing any paths in the tree which are never followed. This pruning will only make the tree smaller, and a lower bound based on this smaller tree will therefore also be a lower bound for the algorithm represented by the unpruned tree. After the pruning the tree has n! leaves, one for each possible output, and every internal node has exactly 2 children.
Example: If we sort three keys using the algorithm given in the pseudocode below, we get the decision tree shown:
    if L[0] < L[1] then
        if L[1] < L[2] then
            output L[0], L[1], L[2]          (a)
        else
            if L[0] < L[2] then
                output L[0], L[2], L[1]      (b)
            else
                output L[2], L[0], L[1]      (c)
    else
        if L[0] < L[2] then
            output L[1], L[0], L[2]          (d)
        else
            if L[1] < L[2] then
                output L[1], L[2], L[0]      (e)
            else
                output L[2], L[1], L[0]      (f)
                    0<1
                  /     \
              1<2         0<2
             /   \       /   \
         a012    0<2   d102   1<2
                /   \        /   \
            b021   c201   e120   f210
Now the decision tree will have n! leaves, one for each possible ordering of the input. Executing the sort for a given input corresponds to algorithm A following a path from the root to a leaf representing the sorted output for that input. The number of comparisons done by A to produce a given output is the length of the path A follows for that output, i.e., the number of comparison nodes on that path. The worst case behavior of A is represented by the length of the longest path, which is the height of the tree. The average behavior of A is just the average length of all the paths from the root to a leaf. In the above example the tree has height 3 and so in the worst case A does 3 compares. The tree has 4 paths of length 3 and 2 paths of length 2, so on the average A does (4*3 + 2*2)/6 = 16/6 = 8/3, i.e., 2 and 2/3, compares.
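The counts for the three-key example can be reproduced by running the algorithm on all six inputs. The following Python transcription is a sketch; the function name sort3 and the returned tuple are our own conventions. It records which leaf is reached and how many comparisons were made on the way.

```python
from fractions import Fraction
from itertools import permutations

def sort3(L):
    """Sort three distinct keys; return (output, leaf label, comparisons)."""
    if L[0] < L[1]:
        if L[1] < L[2]:
            return [L[0], L[1], L[2]], "a", 2
        if L[0] < L[2]:
            return [L[0], L[2], L[1]], "b", 3
        return [L[2], L[0], L[1]], "c", 3
    if L[0] < L[2]:
        return [L[1], L[0], L[2]], "d", 2
    if L[1] < L[2]:
        return [L[1], L[2], L[0]], "e", 3
    return [L[2], L[1], L[0]], "f", 3

results = [sort3(p) for p in permutations(range(3))]
worst = max(c for _, _, c in results)
average = Fraction(sum(c for _, _, c in results), len(results))
print(worst, average)   # 3 8/3
```

Each of the six inputs reaches a different leaf, which is why the pruned tree has exactly 3! = 6 leaves.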
Exercise 16.1. Show the decision trees for n = 3 for
a. insertion sort
b. bubble sort
c. heapsort
Exercise 16.2. Give an algorithm to sort four integers which uses only five comparisons in the worst case.
Theorem 3. Any algorithm A which sorts n items by comparison of keys does at least ceiling(log2(n!)) = "capital omega"(nlog2n) comparisons in the worst case.
Proof. We have shown that if a binary tree has l leaves and height h then l <= 2^h. Equivalently, log2l <= h. We know that the binary decision tree constructed above must have n! leaves. So we know that log2(n!) is a lower bound on the height of the decision tree. Since the height of the decision tree equals the number of comparisons done in the worst case, and this number is an integer, ceiling(log2(n!)) is also a lower bound on the number of comparisons. So we need only estimate log2(n!). But we know
log2(n!) = "SUM"_{1<=j<=n} log2j
= "capital theta"("integral"_{1<=x<=n} log2x dx)
= "capital theta"(nlog2n).
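To get a feel for the size of this bound, the sketch below tabulates ceiling(log2(n!)) next to nlog2n for a few values of n. The function name comparison_lower_bound is ours. To avoid floating-point rounding, the code relies on the fact that for an integer k >= 1, ceiling(log2 k) equals the bit length of k-1.

```python
import math

def comparison_lower_bound(n):
    """ceiling(log2(n!)), computed exactly with integer arithmetic."""
    return (math.factorial(n) - 1).bit_length()

for n in (3, 4, 5, 10, 100):
    print(n, comparison_lower_bound(n), round(n * math.log2(n), 1))
```

For n = 3 the bound is 3, which matches the height of the three-key decision tree, and for n = 4 it is 5, the number of comparisons asked for in Exercise 16.2.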
Exercise 16.3. (grad) Determine constants c and d so that for n >= 1
c*("integral"_{1<=x<=n} log2x dx) <= (nlog2n) <= d*("integral"_{1<=x<=n} log2x dx)
Exercise 16.4. Determine an approximation to log2(n!) using Stirling's formula.
Note that Theorem 3 shows that mergesort and heapsort are asymptotically optimal algorithms for sorting n integers if we restrict our attention to comparison-based sorts.
We can also show that the average path length of a binary decision tree constructed as above is "capital omega"(nlog2n), and so mergesort, heapsort, and also quicksort are asymptotically optimal in terms of average behavior.
Recall that we constructed our binary decision tree by pruning so that every internal node had 2 children.
Lemma 3. For a binary decision tree T define H(T) to be the sum of the heights of the leaves of T, where the height of a leaf is the length of the path from the root to that leaf. Let H(m) be the minimum value of H(T) for ALL binary decision trees T with m leaves. Then
H(m) >= mlog2m.
Proof. We prove this by induction on m.
Step 1. If m = 1 then mlog2m = 0 and H(m) = 0, so the statement is true.
Step 2. Now let m be any integer > 1. Assume the statement is true for all binary decision trees with k < m leaves. Let T be any binary decision tree with m leaves. Then T consists of a root with a left subtree TL containing j leaves and a right subtree TR containing m-j leaves, where 1 <= j <= m-1, and we have
H(TL) + H(TR) >= min1<=j<=m-1(jlog2j + (m-j)log2(m-j))
by the induction hypothesis. Now each leaf in TR is on a path in T of length 1 greater than its length in TR. Similarly each leaf in TL is on a path in T of length 1 greater than its length in TL. So we get
H(T) = j + H(TL) + (m-j) + H(TR) = m + H(TL) + H(TR)
>= min1<=j<=m-1(m + jlog2j + (m-j)log2(m-j)).
Since T was an arbitrary binary decision tree with m leaves, it follows that
H(m) >= min1<=j<=m-1(m + jlog2j + (m-j)log2(m-j)),
and by using calculus to determine the minimum value of m + xlog2x + (m-x)log2(m-x) on the interval [1,m-1] we see that this expression takes on its minimum value at x = m / 2. This gives
H(m) >= m + 2(m/2)log2(m/2) = m + mlog2m - m*log2(2) = mlog2m,
and in particular H(m) = "capital omega"(mlog2m).
Step 3. Since we have verified steps 1 and 2, we have proved the required inequality by induction.
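The inequality of Lemma 3 can also be checked numerically for small m. The sketch below assumes, as in the pruned decision tree, that every internal node has exactly two children. Under that assumption the exact minimum satisfies H(1) = 0 and H(m) = m + min over 1 <= j <= m-1 of (H(j) + H(m-j)), which is the recurrence used in the proof with the subtrees chosen optimally. The function name min_leaf_depth_sums is ours.

```python
import math

def min_leaf_depth_sums(max_m):
    """Return a list H with H[m] = minimum sum of leaf depths over all
    binary trees with m leaves in which every internal node has 2 children."""
    H = [0, 0]                      # H[0] is unused padding, H[1] = 0
    for m in range(2, max_m + 1):
        H.append(m + min(H[j] + H[m - j] for j in range(1, m)))
    return H

H = min_leaf_depth_sums(64)
for m in range(1, 65):
    # Lemma 3: H(m) >= m log2 m (small tolerance for floating point)
    assert H[m] >= m * math.log2(m) - 1e-9
print(H[6], H[8])   # 16 24
```

The value H[6] = 16 agrees with the three-key decision tree, whose six leaves have depths summing to 16, and for m a power of 2 the bound mlog2m is attained exactly.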
Exercise 16.5. Verify that the minimum of m + xlog2x + (m-x)log2(m-x) on the interval [1,m-1] occurs at x = m/2.
Theorem 4. Every comparison sort on a list of n elements makes at least "capital omega"(nlog2n) comparisons on the average (when all inputs are assumed to be equally likely).
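Proof. The pruned binary decision tree for this sort has m = n! leaves, so by Lemma 3 the sum of the lengths of its root-to-leaf paths is at least n! log2(n!). Since all n! inputs are equally likely, the expected path length is at least log2(n!) = "capital theta"(nlog2n), and this is also the expected number of comparisons done by the sorting algorithm.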
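As an empirical illustration of Theorem 4, we can count the comparisons made by an actual comparison sort on every permutation of n distinct keys and compare the average with log2(n!). The sketch below uses Python's built-in sorted with a counting comparison function. This choice of sorting routine is ours, purely for illustration, and the name average_comparisons is hypothetical. Any correct comparison sort could be substituted.

```python
import math
from functools import cmp_to_key
from itertools import permutations

def average_comparisons(n):
    """Average number of key comparisons made by sorted() over all n! inputs."""
    count = 0

    def compare(x, y):
        nonlocal count
        count += 1                  # one key comparison
        return (x > y) - (x < y)

    perms = list(permutations(range(n)))
    for p in perms:
        sorted(p, key=cmp_to_key(compare))
    return count / len(perms)

for n in range(1, 7):
    bound = math.log2(math.factorial(n))
    print(n, round(average_comparisons(n), 3), round(bound, 3))
```

In each row the measured average should be at least the bound in the last column, as the theorem requires.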
Exercise 16.6. Consider the binary decision tree for sorting three integers in Section 16.3. Assume that the input to the associated algorithm is already sorted with probability 1/2 and that all other inputs are equally likely. What is the average number of comparisons made by this algorithm in this case?