Lecture 11--quicksort. 10/20/97.

=======================================================================

11.1. The basic quicksort algorithm. In Lecture 10 we saw two examples of simple sorts, insertion sort and bubble sort, both of which required Θ(n^2) comparisons (and actual running time) in the worst case and also on average. Both also sorted the input array "in place", i.e., without having to make use of an auxiliary array. Either is a reasonable choice for sorting small arrays, and modified bubble sort can also work well on large arrays if the input data is in a certain form already.

The quicksort algorithm is very simple to describe and it can also be implemented without use of an auxiliary array. We will see below that its worst case running time is Θ(n^2), but its expected running time is only Θ(n log_2 n), and in practice it performs well. For sorting arrays which are of moderate size or larger, quicksort is probably the most popular algorithm. Implementing quicksort is somewhat trickier than implementing insertion sort or bubble sort, and this definitely should be taken into account when the array to be sorted is not very large.

Quicksort uses a simple divide and conquer strategy. First let us give a general pseudocode description of quicksort(S), for a set S.

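quicksort(S)

    if S has at most one element, return S

    choose pivot in S.

    split S into S1 = {x | x < pivot} and S2 = {x | x >= pivot}, keeping both nonempty (if no element is smaller than the pivot, let S1 contain one copy of the pivot, removed from S2)

    return quicksort(S1) followed by quicksort(S2)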
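
A minimal Python sketch of the pseudocode above, working on lists rather than sets. As a simplifying variant (not the version in the programming assignment), it sets the pivot aside before splitting, so both recursive calls receive strictly shorter lists and the recursion always terminates. Choosing the first element as the pivot is an assumption made here for concreteness.

```python
def quicksort(items):
    """Return a new sorted list; a list-based sketch of quicksort(S)."""
    if len(items) <= 1:
        # Base case: zero or one element is already sorted.
        return list(items)
    # Assumption: the first element serves as the pivot.
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]            # plays the role of S1
    larger_or_equal = [x for x in rest if x >= pivot]   # S2 without the pivot
    return quicksort(smaller) + [pivot] + quicksort(larger_or_equal)


print(quicksort([4, 1, 3, 1, 5]))
```

This version builds new lists at every level, so it is not "in place"; it is only meant to make the divide and conquer structure visible.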

In the programming assignment a deterministic version of quicksort is given, with S an array A[0] , . . . , A[n-1], and pivot the first element of the subarray that is being worked on. Both recursive and nonrecursive versions are given. We can prove that quicksort is correct by induction. We omit the proof here.
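
The assignment's pseudocode is not reproduced in these notes, so the following is only a sketch of what an in-place, first-element-pivot version might look like. It assumes a Hoare-style splitting procedure; the names `split` and `quicksort` and the convention that `split` returns the last index of the left part are assumptions chosen for illustration.

```python
def split(a, first, last):
    """Partition a[first..last] around the pivot value a[first].

    Returns an index p with first <= p < last such that every element of
    a[first..p] is <= the pivot and every element of a[p+1..last] is >= it.
    """
    pivot = a[first]
    i, j = first - 1, last + 1
    while True:
        # Move i right to the next element that is not smaller than the pivot.
        i += 1
        while a[i] < pivot:
            i += 1
        # Move j left to the next element that is not larger than the pivot.
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def quicksort(a, first=0, last=None):
    """Sort a[first..last] in place (the whole list by default)."""
    if last is None:
        last = len(a) - 1
    if first < last:
        p = split(a, first, last)
        quicksort(a, first, p)        # left part, includes index p
        quicksort(a, p + 1, last)     # right part


data = [5, 2, 8, 2, 9, 1]
quicksort(data)
print(data)
```

Because `split` always returns an index strictly less than `last`, both subarrays are nonempty and strictly shorter than the original, which is what the induction proof of correctness relies on.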

In fact, any element can be chosen to be the pivot. Since the array is initially assumed to be unsorted, no simple deterministic rule for choosing the pivot (such as always taking the first element) can guarantee that the sets S1 and S2, at any given level of recursion, will be of about equal size.

Exercise 11.1. Using the pseudocode in the programming assignment (use the deterministic recursive version), show how quicksort sorts the array A if A is a 10-element array initially containing the integers 0-9 in the order 3,7,5,8,2,9,0,4,1,6.

Exercise 11.2. Suppose A is an n-element array containing, in some order, the integers 0, 1, ..., n-1.

a. Give an example of how these integers should be arranged initially so that each split j, using the splitting algorithm in the programming assignment, gives one set of size 1 and one set of size n-j for j = 1,2,...,n-1 if the first element is always chosen as the pivot.

b. Give a "rule of thumb" describing when using quicksort with this pivot choice should be avoided. What simple pivot choice might be used in these cases?

11.2. Worst-case behavior (and best-case behavior). Looking at the pseudocode for quicksort in the programming assignment, we see that we can describe the time it takes to sort an n-element array, T(n), by

T(n) = T(sort subarray of elements < pivot) + T(sort subarray of elements >= pivot) + cn,

where cn represents the time it takes to split the array into two subarrays (there is no need to recombine the two arrays, since the partitioning procedure puts the smaller elements in the beginning array locations and the larger in the later array locations).

Exercise 11.3. Based on the pseudocode in the programming assignment, give a reasonable value for c.

Now there are two interesting cases of this recurrence relation. In the "best case", each subarray would have about n / 2 elements and so we would get

T(n) = 2T( n / 2) + cn, T(1) = d,

and the "guess and check" method (including a proof by induction) would give

T(n) = Θ(n log_2 n).
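
The "guess" can be spot-checked numerically before attempting the induction. The sketch below assumes n is a power of two, n = 2^k, and uses arbitrary illustrative values for the constants c and d; under those assumptions the recurrence has the exact solution T(n) = c n log_2 n + d n.

```python
def best_case_T(n, c, d):
    """Evaluate T(n) = 2 T(n/2) + c n with T(1) = d, for n a power of two."""
    if n == 1:
        return d
    return 2 * best_case_T(n // 2, c, d) + c * n


c, d = 3, 2   # hypothetical constants, chosen only for illustration
for k in range(11):
    n = 2 ** k                       # so log_2 n = k
    closed_form = c * n * k + d * n  # guessed solution c n log_2 n + d n
    assert best_case_T(n, c, d) == closed_form
print("recurrence matches c*n*log2(n) + d*n for n = 1, 2, 4, ..., 1024")
```

A numerical check of this kind is not a proof, but it confirms that the guess is worth verifying by induction.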

But in the worst case we would only split off one element each time and so we would get

T(n) = T(1) + T(n-1) + cn.

Claim: the worst case running time for quicksort is Θ(n^2).

Proof. By induction we can show that

T(n) = T(n - j) + j*T(1) + c( n + n-1 + . . . + n-j+1) for 1 <= j <= n-1.

Exercise 11.4. (grad) Prove this statement.

Taking j = n-1, and using n + (n-1) + . . . + 2 = n(n+1)/2 - 1, we therefore have

T(n) = nT(1) + c(n(n+1)/2 - 1) = Θ(n^2).
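
The unrolled identity used in the proof can be checked numerically for small n. This sketch is not a substitute for the induction; it assumes illustrative values for c and for d = T(1), checks the identity for every j from 1 to n-1, and then looks at how T grows when n doubles.

```python
def worst_case_T(n, c, d):
    """Evaluate T(n) = T(1) + T(n-1) + c n with T(1) = d, iteratively."""
    t = d
    for m in range(2, n + 1):
        t = d + t + c * m
    return t


def unrolled(n, j, c, d):
    """Right-hand side T(n-j) + j T(1) + c (n + (n-1) + ... + (n-j+1))."""
    return worst_case_T(n - j, c, d) + j * d + c * sum(range(n - j + 1, n + 1))


c, d = 3, 2   # hypothetical constants, chosen only for illustration
for n in range(2, 51):
    for j in range(1, n):
        assert worst_case_T(n, c, d) == unrolled(n, j, c, d)

# Doubling n should roughly quadruple T(n) if the growth is quadratic.
ratio = worst_case_T(2000, c, d) / worst_case_T(1000, c, d)
print(ratio)
```

The ratio printed at the end should be close to 4, which is the behavior expected of a quadratic function.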

Note that the running time for quicksort is proportional to the number of comparisons between array elements that we do in the splitting procedure.

11.3. Managing the stack. Recursion. Instead of making two recursive calls, we can rearrange the steps in the quicksort algorithm to do only one call at a time at each level:

quicksort(first,last)

    while first < last do

        split(first,last,pivot)

        quicksort(first,pivot)

        first = pivot + 1

In either version we can replace the recursive calls by explicit stack manipulation.
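
One way the explicit stack might look is sketched below. It assumes the same kind of Hoare-style `split` as a first-element-pivot implementation would use; the function names and the choice to store (first, last) index pairs on a Python list are assumptions for illustration, not the assignment's nonrecursive version.

```python
def split(a, first, last):
    """Partition a[first..last] around a[first]; return p, first <= p < last,
    with a[first..p] <= pivot <= a[p+1..last]."""
    pivot = a[first]
    i, j = first - 1, last + 1
    while True:
        i += 1
        while a[i] < pivot:
            i += 1
        j -= 1
        while a[j] > pivot:
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]


def quicksort_with_stack(a):
    """Sort list a in place, using an explicit stack instead of recursion."""
    stack = [(0, len(a) - 1)]          # subarrays still waiting to be sorted
    while stack:
        first, last = stack.pop()
        if first < last:
            p = split(a, first, last)
            # Push the right part first so the left part is handled next,
            # mirroring the order of the recursive calls.
            stack.append((p + 1, last))
            stack.append((first, p))


data = [6, 3, 9, 3, 0, 7]
quicksort_with_stack(data)
print(data)
```

Each stack entry stands in for one pending recursive call, so the number of entries on the stack at any moment corresponds to the recursion depth of the recursive version.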

Exercise 11.5. What will be the depth of the stack for the "worst case" of quicksort? What will it be for the "best" case?