Lecture 19. Graphs--review of definitions; dfs, bfs. 11/10/97.
==================================================================================
19.1. Graphs. Some definitions. We now turn our attention to problems whose underlying structure can best be described as some type of graph. As we shall see, this gives us an incredibly large number of problems to study. Unlike arrays or linked lists, graphs have no underlying linear structure. This fact makes graphs applicable in many situations, but it also makes the development of algorithms for these situations more difficult. In Algorithms II, we will look at a class of problems which, so far as anyone knows, cannot even be solved in polynomial time. We will see that graphs are central to the definition of many of these problems. In the next few lectures we will look at some elementary properties of graphs and some elementary graph algorithms.
In our study of sorting techniques we found that the general strategy of divide and conquer was an effective one in many instances. In our study of graphs we will see that another general strategy, the greedy method, is often useful.
We list here some elementary definitions from graph theory. All or most of these definitions should be familiar from data structures.
A graph G consists of a (finite) set V of vertices (singular: vertex) and a set E of pairs chosen from V. These pairs are called edges or arcs. An edge in a graph will be denoted by a line. An edge in a digraph will be denoted by an arrow:
x-------------->y will denote the edge from x to y.
A graph G may be undirected, in which case an edge (x,y) is the same as (y,x). Or the graph may be a directed graph, or digraph, in which case (x,y) represents an arc from x to y and (y,x) <> (x,y). Usually the term graph means an undirected graph. We will use that convention here. We will also use the convention that an edge (x,y) will be written xy. We can denote a graph G by listing its vertex set V and its edge set E, G = (V,E). Note that a graph G has two parameters associated with it, |V| and |E|.
Note that the definition of a graph G implies that there is never an edge from a vertex to itself. The definition also implies that there is at most one edge xy between any two vertices x and y.
If xy is an edge in a graph G, we say that x and y are incident with the edge and we say that y is adjacent to x. For an undirected graph, x is adjacent to y if and only if y is adjacent to x. The degree of a vertex is the number of edges incident with it. (For a digraph we define for each vertex its indegree and its outdegree.)
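As a small illustration of these definitions, the following Python sketch stores an undirected graph as a vertex set and a set of unordered pairs and computes a degree. The names `edge`, `adjacent`, `degree` and the four-vertex sample graph are assumptions made for this example; they do not come from the lecture.

```python
# A minimal sketch: an undirected graph G = (V, E) stored as two sets.
# Using frozenset makes the edge xy identical to the edge yx.

def edge(x, y):
    """Return the undirected edge xy (hypothetical helper)."""
    return frozenset((x, y))

# Sample graph, invented for illustration only.
V = {"a", "b", "c", "d"}
E = {edge("a", "b"), edge("a", "c"), edge("b", "c"), edge("c", "d")}

def adjacent(x, y, edges):
    """True if xy is an edge of the graph."""
    return edge(x, y) in edges

def degree(x, edges):
    """Number of edges incident with x."""
    return sum(1 for e in edges if x in e)

print(len(V), len(E))   # the two parameters |V| and |E|
print(degree("c", E))   # c is incident with ac, bc and cd
```

Because E is a set, listing the same pair twice has no effect, which matches the remark that there is at most one edge between any two vertices.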
A subgraph of a graph G = (V,E) is a graph G' = (V',E') in which V' is a subset of V and E' is a subset of E.
A path in a (directed or undirected) graph is a sequence of vertices x_0, x_1, ... , x_n such that x_i x_(i+1) is an edge in the graph for 0 <= i <= n-1. We say the path has length n. A cycle is a path in which x_0 = x_n.
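The path and cycle definitions translate directly into a check on a vertex sequence. The sketch below follows the definition literally, so repeated vertices are allowed. The function names and the triangle used as sample data are assumptions made for this example.

```python
# A minimal sketch of the path and cycle definitions for an undirected graph.
# Edges are stored as frozensets so that xy and yx are the same edge.

def is_path(seq, edges):
    """True if every consecutive pair of vertices in seq is an edge."""
    return all(frozenset((seq[i], seq[i + 1])) in edges
               for i in range(len(seq) - 1))

def path_length(seq):
    """A path x_0, ..., x_n has length n (its number of edges)."""
    return len(seq) - 1

def is_cycle(seq, edges):
    """A cycle is a path whose first and last vertices coincide."""
    return len(seq) > 1 and seq[0] == seq[-1] and is_path(seq, edges)

# Sample data (hypothetical): a triangle on a, b, c.
triangle = {frozenset("ab"), frozenset("bc"), frozenset("ca")}

print(is_path(["a", "b", "c"], triangle))        # True
print(path_length(["a", "b", "c"]))              # 2
print(is_cycle(["a", "b", "c", "a"], triangle))  # True
```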
A graph G is acyclic if G contains no cycles.
A graph G is connected if there is a path in G between any two vertices x,y in G.
A connected component of a graph G is a maximal connected subgraph of G.
Exercise 19.1. What are the maximum and minimum values possible for | E |
a. if G is a connected graph with n vertices?
b. if G is a connected digraph with n vertices?
An Eulerian path (cycle) is one which visits every EDGE exactly once. A Hamiltonian path (cycle) is one which visits every VERTEX exactly once.
It turns out that if a graph G is connected and has exactly two vertices of odd degree then G has an Eulerian path (and if G is connected and every vertex has even degree then G has an Eulerian cycle).
Exercise 19.2 (grad). Prove this.
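The degree condition in the statement above can be checked mechanically. The sketch below only counts vertices of odd degree; it does not test connectivity and it proves nothing. The function name and the sample edge list are assumptions made for this example.

```python
from collections import Counter

def odd_degree_vertices(edges):
    """Return the sorted list of vertices that have odd degree."""
    deg = Counter()
    for x, y in edges:
        deg[x] += 1
        deg[y] += 1
    return sorted(v for v, d in deg.items() if d % 2 == 1)

# Sample graph (hypothetical): a triangle b, c, d with a pendant vertex a.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")]

print(odd_degree_vertices(edges))   # ['a', 'b']
```

In this sample the sequence a, b, c, d, b uses each of the four edges exactly once, and it starts and ends at the two odd-degree vertices.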
It is also believed that in general there is no algorithm that runs in polynomial time (polynomial in the number of vertices) which can determine if a graph G has a Hamiltonian cycle. The problem is NP-complete, and no polynomial-time algorithm for it is known.
A tree is a connected acyclic graph. For any tree we can choose one vertex and designate it as the root. Once we have done this, we can talk about the parent of a vertex or the children of a vertex, or the depth of a vertex, in relation to the root we have chosen. (Note that a rooted tree has a natural partial ordering of its vertices, and so some algorithm design and analysis techniques which work for arrays may be applicable to a rooted tree.)
Exercise 19.3.
a. What are the minimum and maximum number of edges possible in a tree of n vertices?
b. What if the tree is a directed tree?
The complete graph on n vertices, denoted K_n, is the graph which contains an edge between every pair of vertices x and y.
A bipartite graph is a graph G whose vertex set consists of two disjoint sets, A and B, and whose edge set contains only edges with one vertex in A and one vertex in B. The complete bipartite graph with |A| = m, |B| = n is denoted K_(m,n).
A weighted graph is one for which a real number w(e) is assigned to every edge in E. Often the weights are restricted to be positive integers.
We will introduce other graph terminology as it is needed.
19.2. Representing graphs in a computer. There are two standard ways of representing a graph in a computer. With obvious modifications, either can also be used to represent a digraph.
1. The adjacency matrix. If G is a graph on n vertices (x_0, ... , x_(n-1)), then we define an n x n matrix A in which A(i,j) is 1 if x_i x_j is an edge and 0 otherwise. Note that for a graph the matrix A is symmetric.
The adjacency matrix representation is very efficient for some graph problems. For example, the value of A(i,j) can be accessed in one time unit for any i and j and will tell us whether there is an edge between x_i and x_j. More generally, if we form the m-th power A^m of A (where we replace the usual multiplication and addition operators by logical AND and OR respectively), the entries of A^m tell us whether there is a path of length m between any two given vertices. For any i, the sum of the matrix elements in row i tells us the degree of x_i. The adjacency matrix representation can be generalized to allow the matrix A to contain arbitrary values, representing edge weights, as entries, and powers of A can then be used to compute path "lengths". The matrix representation may also be inefficient, however. For example, if G has very few edges in relation to the number of vertices, most of the entries of A will be 0, yet the matrix still occupies n^2 cells.
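The AND/OR matrix product described above can be sketched in a few lines of Python. The function names `bool_mult` and `bool_power` and the three-vertex sample matrix are assumptions made for this example. Paths here may repeat vertices, as the definition of a path allows.

```python
# A minimal sketch of powers of an adjacency matrix under AND/OR arithmetic.

def bool_mult(A, B):
    """Matrix product with AND in place of * and OR in place of +."""
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)]
            for i in range(n)]

def bool_power(A, m):
    """Return the m-th power of A for m >= 1."""
    P = A
    for _ in range(m - 1):
        P = bool_mult(P, A)
    return P

# Sample graph (hypothetical): the path x_0 - x_1 - x_2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]

A2 = bool_power(A, 2)
print(A2[0][2])    # 1: x_0, x_1, x_2 is a path of length 2
print(A2[0][1])    # 0: no path of length exactly 2 joins x_0 and x_1
print(sum(A[1]))   # 2: the degree of x_1
```

Each product costs on the order of n^3 operations in this naive form, so the sketch is meant to show the idea rather than an efficient method.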
2. The linked list representation. There are several possible variations on this structure. In the most standard one the vertices are placed in an array and for each vertex v a linked list of the vertices adjacent to v is defined. This can be a very space efficient way to represent the graph, but some information is difficult to retrieve because of the sequential nature of the linked list structure. Also, each vertex and edge requires two memory cells (one to represent it and one to represent the link to the next list entry). The linked list representation can be generalized to allow each edge record to hold information about the edge (including, but not limited to, the edge weight).
Example. Suppose we define G as the graph with vertex set {a,b,c,d,e,f,g} and edge set
(a,b), (a,c), (a,g), (b,c), (b,d), (b,e), (b,g), (c,d), (c,g), (d,e), (e,g). Letting a be vertex 0, b vertex 1, etc., we have the two representations of G:
        a  b  c  d  e  f  g
col:    0  1  2  3  4  5  6
-----------------------------------------------------------
row:
0-a     0  1  1  0  0  0  1
1-b     1  0  1  1  1  0  1
2-c     1  1  0  1  0  0  1
3-d     0  1  1  0  1  0  0
4-e     0  1  0  1  0  0  1
5-f     0  0  0  0  0  0  0
6-g     1  1  1  0  1  0  0
=============================================================
vertex adjacency lists
list:
0-a-->b-->c-->g-->nil
1-b-->a-->c-->d-->e-->g-->nil
2-c-->a-->b-->d-->g-->nil
3-d-->b-->c-->e-->nil
4-e-->b-->d-->g-->nil
5-f-->nil
6-g-->a-->b-->c-->e-->nil
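Both representations can be built directly from the edge set. The sketch below does this for the example graph G, using a nested list for the matrix and Python lists in place of linked lists. The variable names `names`, `index`, `matrix` and `adj` are assumptions made for this example.

```python
# A minimal sketch: build both representations of the example graph G
# from its edge set. Python lists stand in for the linked lists.

names = "abcdefg"                      # a is vertex 0, b is vertex 1, etc.
index = {v: i for i, v in enumerate(names)}
edge_list = ["ab", "ac", "ag", "bc", "bd", "be", "bg",
             "cd", "cg", "de", "eg"]

n = len(names)
matrix = [[0] * n for _ in range(n)]   # adjacency matrix, all zeros at first
adj = {v: [] for v in names}           # adjacency lists, all empty at first

for x, y in edge_list:
    i, j = index[x], index[y]
    matrix[i][j] = matrix[j][i] = 1    # undirected, so A is symmetric
    adj[x].append(y)
    adj[y].append(x)

for v in names:
    adj[v].sort()                      # list neighbors in alphabetical order

print(adj["b"])   # ['a', 'c', 'd', 'e', 'g']
print(adj["f"])   # []
```

The matrix always uses 49 cells for this graph, while the lists hold 22 entries in total, two for each of the 11 edges.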
Exercise 19.4. Give both graph representations for the digraph G defined as follows:
The vertices of G are the integers 1,2, ... , 10. There is an edge from x to y if x is a factor of y.
19.3. Traversing a graph. Since a graph has no natural ordering of its vertices, there is no standard order in which to visit all these vertices. But visiting all the vertices is a standard part of many algorithms which deal with problems described in terms of a graph. Two graph traversal techniques, which have natural definitions for the special case of trees, are often useful. Both start from an arbitrary initial vertex v. One, depth first search (dfs), traverses the vertices by seeking to go along "long" paths leading from v. The other, breadth first search (bfs), uses the strategy of visiting v's "closest neighbors" before looking further away. If a graph is not connected and we want to search all its vertices, we must carry out a search procedure on each component of the graph.
Example. Consider the graph G defined in the previous section. Let us restrict our attention to the connected component containing the vertices a,b,c,d,e,g. Choosing a as the initial vertex, we have many possible dfs and bfs orderings of the other vertices. For example, we can define dfs1 by a,b,c,d,e,g; dfs2 by a,g,e,b,c,d; and so on. Similarly, there are many possible bfs orderings. Which particular ordering a search algorithm will choose will depend on how it is implemented and what labeling has been given to the graph being searched.
We give pseudocode for these two procedures. Note that each search procedure is naturally related to a fundamental data structure, dfs to the stack and bfs to the queue.
DFS(v)
Note: v is the vertex from which the search is to begin.
    visit and mark v
    while there is an unmarked vertex w adjacent to v do
        DFS(w)
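The DFS pseudocode can be written in Python roughly as follows. This is a sketch under stated assumptions: the graph is a dictionary of adjacency lists, neighbors are tried in the order in which they appear in each list, and the names `dfs`, `visit` and `adj` are chosen for this example. The recursion plays the role of the stack.

```python
# A minimal recursive sketch of DFS on the example graph G.

adj = {
    "a": ["b", "c", "g"],
    "b": ["a", "c", "d", "e", "g"],
    "c": ["a", "b", "d", "g"],
    "d": ["b", "c", "e"],
    "e": ["b", "d", "g"],
    "f": [],
    "g": ["a", "b", "c", "e"],
}

def dfs(adj, v):
    """Return the vertices reachable from v in the order DFS visits them."""
    marked = set()
    order = []

    def visit(u):
        marked.add(u)            # visit and mark u
        order.append(u)
        for w in adj[u]:         # each vertex adjacent to u
            if w not in marked:  # recurse only on unmarked vertices
                visit(w)

    visit(v)
    return order

print(dfs(adj, "a"))   # ['a', 'b', 'c', 'd', 'e', 'g']
```

With alphabetical adjacency lists the search from a produces the ordering called dfs1 in the example above. The vertex f is never reached because it lies in a different component.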
BFS(v)
Note: v is the vertex from which the search is to begin.
    Let Q be a queue. Initialize Q to be empty.
    visit and mark v; insert v in Q
    while Q is nonempty do
        w = head(Q); remove w from Q
        for each unmarked vertex x adjacent to w do
            visit and mark x; insert x in Q
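The BFS pseudocode can be sketched in Python with `collections.deque` serving as the queue. The graph is again assumed to be a dictionary of adjacency lists. The names `bfs`, `bfs_all` and `adj` are chosen for this example, and `bfs_all` is a hypothetical driver that repeats the search on each component, as required when the graph is not connected.

```python
from collections import deque

# The example graph G as adjacency lists.
adj = {
    "a": ["b", "c", "g"],
    "b": ["a", "c", "d", "e", "g"],
    "c": ["a", "b", "d", "g"],
    "d": ["b", "c", "e"],
    "e": ["b", "d", "g"],
    "f": [],
    "g": ["a", "b", "c", "e"],
}

def bfs(adj, v, marked=None):
    """Return the vertices reachable from v in the order BFS visits them."""
    if marked is None:
        marked = set()
    marked.add(v)                 # visit and mark v
    order = [v]
    q = deque([v])                # insert v in Q
    while q:
        w = q.popleft()           # w = head(Q); remove w from Q
        for x in adj[w]:
            if x not in marked:
                marked.add(x)     # visit and mark x
                order.append(x)
                q.append(x)       # insert x in Q
    return order

def bfs_all(adj):
    """Run BFS once per connected component, sharing one set of marks."""
    marked = set()
    return [bfs(adj, v, marked) for v in adj if v not in marked]

print(bfs(adj, "a"))   # ['a', 'b', 'c', 'g', 'd', 'e']
print(bfs_all(adj))    # [['a', 'b', 'c', 'g', 'd', 'e'], ['f']]
```

All three neighbors of a are visited before d and e, which are two edges away from a.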
Exercise 19.5. Rewrite DFS(v) so that it is nonrecursive.
Exercise 19.6. Define a connected graph G with 6 vertices, labeled 1,2, ... , 6, and 14 edges. (You pick the edges.) Choosing vertex 1 as the initial vertex, show the order in which the vertices of your graph will be visited in a DFS and in a BFS. Assume that the searches are implemented so that, if there is a choice of vertices to visit next, the vertex with the smallest label will be chosen.