Lecture 20--topological sort; finding connected components in a graph. 11/12/97

20.1. Topological sort. If G is an acyclic digraph, then a path traced out from any vertex v in G never returns to v. This simple observation allows us to define a (nonunique) partial order on the vertices of G which is often useful in algorithms involving G. (We call G a dag, for directed acyclic graph.)

Recall that a partial ordering on a set S is a binary relation, which we will denote <=, on S which is reflexive (r <= r), transitive (if r <= s and s <= t then r <= t), and antisymmetric (if r <= s and s <= r, then r = s).
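The three properties can be checked by brute force on a small finite set. The following is a minimal sketch in Python; the names is_partial_order, divides, and strictly_less are ours, chosen only for illustration.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Return True if leq is reflexive, antisymmetric, and transitive on elements."""
    elements = list(elements)
    # Reflexive: r <= r for every r.
    reflexive = all(leq(r, r) for r in elements)
    # Antisymmetric: r <= s and s <= r together force r = s.
    antisymmetric = all(
        r == s or not (leq(r, s) and leq(s, r))
        for r, s in product(elements, repeat=2)
    )
    # Transitive: r <= s and s <= t together force r <= t.
    transitive = all(
        leq(r, t) or not (leq(r, s) and leq(s, t))
        for r, s, t in product(elements, repeat=3)
    )
    return reflexive and antisymmetric and transitive

def divides(a, b):
    """a <= b means a divides b."""
    return b % a == 0

def strictly_less(a, b):
    """Strict < fails reflexivity, so it is not a partial order in this sense."""
    return a < b
```

Divisibility on {1, ..., 6} passes all three checks, while strict < fails because it is not reflexive.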

Example. In VLSI design, topological sort is part of an algorithm for PLA folding. A PLA, or programmable logic array, is essentially a "truth table" in hardware; if the PLA has m inputs and n outputs, then there will be m input columns, one for each input, and n output columns, one for each output. Each row gives a particular set of values of the inputs in the first m columns (1, 0, or x for "don't care") and, in the last n columns, whether or not that row is included in each output (1 if it is, 0 if not). "Folding" the PLA refers to allowing some inputs or outputs to share columns, thus reducing the size of the circuit which implements the PLA. For example, suppose we have inputs X0, X1, X2, X3, X4 and outputs Y0, Y1, Y2. Writing V' for not V, VW for V and W, and V + W for V or W, suppose we want

Y0 = X0 X1' + X1 X2

Y1 = X0 X1 + X2 X3

Y2 = X0 X1 X4

Then the associated PLA would be represented as

10xxx 100

11xxx 010

11xx1 001

x11xx 100

xx11x 010

Since X3 and X4 do not both appear in any expression, we can "fold" them onto the same column, with X4 entering at the top and X3 entering at the bottom (where ~ represents a broken wire):

10xx 100

11xx 010

11x1 001

~

x11x 100

xx11 010

In this example we only save one column, but in large examples we might save 1/4 to 1/3 of the columns, and this would result in a significant reduction in circuit size. If we want to fold several columns, then we must decide if there is an order in which to list the rows so that we can do these multiple foldings. One way to do this is to define which rows must go "above" which other rows. This leads to a directed graph representation. If the graph is acyclic, we can do a topological sort on the rows to find a legal ordering which allows some folding to be done.
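The folding test used in the example can be sketched in Python. This is only an illustration under a simplifying assumption of ours: two input columns are treated as candidates for sharing a column when no row uses both of them, and folding one column in from the top and the other from the bottom forces every row using the "top" input to lie above every row using the "bottom" input. The function names are hypothetical.

```python
from itertools import combinations

# Input part of each PLA row from the example (columns X0..X4).
ROWS = ["10xxx", "11xxx", "11xx1", "x11xx", "xx11x"]

def rows_using(rows, col):
    """Set of row indices in which input column col is not a don't-care."""
    return {i for i, row in enumerate(rows) if row[col].lower() != "x"}

def foldable_pairs(rows):
    """Pairs of input columns that are never both used in the same row."""
    n_cols = len(rows[0])
    used = [rows_using(rows, c) for c in range(n_cols)]
    return [(a, b) for a, b in combinations(range(n_cols), 2)
            if not (used[a] & used[b])]

def ordering_constraints(rows, top_col, bottom_col):
    """Edges (r, s) meaning row r must be placed above row s when top_col
    enters from the top and bottom_col enters from the bottom."""
    top_rows = sorted(rows_using(rows, top_col))
    bottom_rows = sorted(rows_using(rows, bottom_col))
    return [(t, b) for t in top_rows for b in bottom_rows]
```

For the example table, foldable_pairs(ROWS) includes the pair (3, 4), and ordering_constraints(ROWS, 4, 3) gives the single edge (2, 4): the row 11xx1 must go above the row xx11x. Collecting such edges for several foldings gives the directed graph on rows described above.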

There are numerous other examples of problems which can be solved by representing them as a dag. For example, a PERT network is a dag which gives a job schedule when a set of jobs must be done and some of these jobs must be completed before others can begin.

Topological sort.

Input: G, an n-vertex dag

Output: A labeling of the vertices of G that defines an ordering <= of the vertices so that for any two vertices i and j, if there is a directed edge from i to j, then i <= j (that is, i receives a smaller label than j).

Assume the vertices are v0, v1, ... , vn-1

Each vertex will be given a distinct label i with 0 <= i <= n-1

Define an array count[n], initialized to 0 (count[j] will hold the number of predecessors of vj which have not yet been labeled)

Define a queue Q to hold vertices, initially empty

//Q will hold vertices with no predecessors; these are candidates for next available label

(a) For i = 0 to n-1
        for each vertex vj such that there is an edge vi ---> vj
            count[j] = count[j] + 1

(b) For i = 0 to n-1
        if count[i] = 0, add vi to Q

    Set label = 0

(c) while Q is not empty
        remove the next vertex v from the queue and assign it label
        label = label + 1
        for each vertex w such that there is an edge from v to w do
            count[w] = count[w] - 1
            if count[w] is now 0, add w to Q

Note that this algorithm is designed to work on the adjacency list (linked list) representation of the dag G.

What is its running time?

Let |V| be the number of vertices in G (|V| = n) and |E| be the number of edges in G. Steps (a) and (c) each visit every vertex and scan every adjacency list once, so each can be completed in time Θ(|V| + |E|), while step (b) can be completed in time Θ(|V|). So the running time of topological sort is Θ(|V| + |E|). (Note that we have accomplished topological sort without ever sorting the vertices.)
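Steps (a), (b), and (c) translate almost line for line into Python. The sketch below assumes the dag is given as a list adj, where adj[i] lists the indices j with an edge vi ---> vj; the function name topological_labels and the final cycle check are our additions and are not part of the algorithm as stated.

```python
from collections import deque

def topological_labels(adj):
    """Return labels[i] for each vertex i so that every edge i -> j
    has labels[i] < labels[j]. adj[i] lists the successors of vertex i."""
    n = len(adj)
    count = [0] * n

    # Step (a): count[j] = number of edges entering vertex j.
    for i in range(n):
        for j in adj[i]:
            count[j] += 1

    # Step (b): vertices with no predecessors are ready to be labeled.
    queue = deque(i for i in range(n) if count[i] == 0)

    labels = [None] * n
    label = 0

    # Step (c): label a ready vertex, then release its successors.
    while queue:
        v = queue.popleft()
        labels[v] = label
        label += 1
        for w in adj[v]:
            count[w] -= 1
            if count[w] == 0:
                queue.append(w)

    # Added check (an assumption of this sketch): if some vertex was never
    # labeled, the input contained a cycle and was not a dag.
    if label != n:
        raise ValueError("graph has a cycle")
    return labels
```

For example, with adj = [[1, 2], [3], [3], []] the labels come out as [0, 1, 2, 3]. Each adjacency list is scanned once in step (a) and once in step (c), which matches the Θ(|V| + |E|) bound.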

Exercise 20.1. Draw the dag defined below and apply the topological sort algorithm to find a legal job schedule:

A factory assembly process consists of ten subjobs, which we will label A,B,C,D,E,F,G,H,I,J. The jobs have the following precedence requirements:

job(s) to be done      can only be done after these are completed

J                      A, B, C, E, D

I, H                   G, F, E

G                      F

F, E                   D

C                      E, D, A

B                      A

Exercise 20.2. Topological sort can also be implemented using dfs and a stack instead of a queue. (See, e.g., Berman and Paul). Compare the running time and space usage of the stack implementation with the running time and space usage of the algorithm given above.
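As a starting point for the comparison, here is one common form of the dfs variant, sketched in Python. It is not taken from Berman and Paul; the function name is ours, the input is assumed to be a dag given as adjacency lists, and no cycle detection is done. Each vertex is pushed on a stack when dfs finishes with it, and reading the stack from the top down gives a legal order.

```python
def topological_order_dfs(adj):
    """Return the vertices of a dag in an order where every edge i -> j
    has i before j. adj[i] lists the successors of vertex i."""
    n = len(adj)
    visited = [False] * n
    finished = []  # stack: a vertex is pushed when dfs is done with it

    def dfs(v):
        visited[v] = True
        for w in adj[v]:
            if not visited[w]:
                dfs(w)
        # All successors of v are already on the stack below v.
        finished.append(v)

    for v in range(n):
        if not visited[v]:
            dfs(v)

    # Popping the stack from the top yields the topological order.
    return finished[::-1]
```

The recursion here uses the Python call stack, so for very deep dags an explicit stack would be needed instead.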

20.2. Finding connected components in a graph. A connected component of an undirected graph G is a maximal connected subgraph of G. The connected components of G can be found in time Θ(|V| + |E|) by a modified version of dfs, if we use the adjacency list structure to represent G. For each vertex v define comp(v) to be an integer representing the component which v is in and next(v) to be the vertex in v's adjacency list which is the next one to be examined.

initialize the comp array to 0
for each v initialize next(v) to the first vertex in the adjacency list for v
initialize component-number to 1

for each v in G
    if comp(v) = 0 then
        dfs(v,component-number)
        component-number = component-number + 1

procedure dfs(v,j)
    comp(v) = j; push v onto the stack
    while stack is not empty do
        top = pop(stack)
        while next(top) is not nil do
            w = vertex pointed to by next(top); advance next(top) to the following entry in top's adjacency list
            if comp(w) = 0 then
                comp(w) = j
                push w onto the stack
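The procedure above can be sketched in Python as follows. We assume the undirected graph is given as a list adj in which adj[v] lists the neighbors of v (so each edge appears in two lists); the function name is ours, and Python's for loop over adj[top] plays the role of the next pointers.

```python
def connected_components(adj):
    """Return comp, where comp[v] is the component number (1, 2, ...) of
    vertex v. adj[v] lists the neighbors of v in an undirected graph."""
    n = len(adj)
    comp = [0] * n          # 0 means "not yet assigned to a component"
    component_number = 1

    for v in range(n):
        if comp[v] == 0:
            # Explore everything reachable from v with an explicit stack.
            comp[v] = component_number
            stack = [v]
            while stack:
                top = stack.pop()
                for w in adj[top]:   # stands in for walking next(top)
                    if comp[w] == 0:
                        comp[w] = component_number
                        stack.append(w)
            # Only advance the number after a new component was found.
            component_number += 1
    return comp
```

For example, with adj = [[1], [0, 2], [1], [4], [3], []] the result is [1, 1, 1, 2, 2, 3]. A vertex is marked before it is pushed, so it is pushed at most once, and each adjacency list is scanned once.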

Exercise 20.3. Calculate the space usage of the connected components algorithm. Explain your calculations.

Exercise 20.4. If a graph (or digraph) is represented as an adjacency matrix, then an algorithm based on computing powers of the matrix can be used to determine the connected components.

a. Describe such an algorithm.

b. Estimate the time and space used by your algorithm for a graph G with n vertices.