Lecture 26--Alternative models of computation (PRAM, quantum computers). 11/26/97.

==================================================================================

NOTE:  Here is the revised algorithm to find biconnected components.  It was just
missing one line (initial definition of LOW[v]) (see example in handout):

Let T be empty and Count be 1. Mark each vertex "unvisited". Also initialize a STACK
of edges, initially empty. Choose an initial v in V arbitrarily. Then call BICONSEARCH(v).

BICONSEARCH(v).
    mark v "visited"
    dfsnum[v] = Count
    ++Count
    LOW[v] = dfsnum[v]
    for each vertex w in adjlist(v) do
        if (v,w) is not on the STACK, push (v,w)
            ( (v,w) is on STACK if either v < w and w already "visited"
              or v > w and w = parent[v] )
        if w is "unvisited" then
            add (v,w) to T; set parent[w] = v; BICONSEARCH(w)
            if LOW[w] >= dfsnum[v] then
                a biconnected component has been found;
                pop the edges up to and including (v,w); these are the component
            LOW[v] = min(LOW[v], LOW[w])
        else if w is not parent[v] then
            LOW[v] = min(LOW[v], dfsnum[w])
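The search above can be sketched in Python as follows. This is an illustrative variant, not a transcription of the handout: it uses the common rule of pushing an edge when it is first traversed as a tree edge or as a back edge to an ancestor, rather than the "is (v,w) on the STACK" test given above. The function name biconnected_components and the dictionary-of-lists graph format are assumptions made for the example, and the outer loop over all vertices is added so that a disconnected graph is also handled.

```python
def biconnected_components(adj):
    """Return the biconnected components of an undirected simple graph.

    adj maps each vertex to a list of its neighbours (assumed format).
    Each component is returned as a list of edges (v, w).
    Recursive, so very deep graphs may exceed Python's recursion limit.
    """
    dfsnum, low, parent = {}, {}, {}
    stack, components = [], []
    count = 1

    def search(v):
        nonlocal count
        dfsnum[v] = low[v] = count      # LOW[v] starts at dfsnum[v]
        count += 1
        for w in adj[v]:
            if w not in dfsnum:         # w is "unvisited": tree edge
                parent[w] = v
                stack.append((v, w))
                search(w)
                if low[w] >= dfsnum[v]:
                    # Pop edges up to and including (v, w).
                    comp = []
                    while True:
                        e = stack.pop()
                        comp.append(e)
                        if e == (v, w):
                            break
                    components.append(comp)
                low[v] = min(low[v], low[w])
            elif w != parent.get(v) and dfsnum[w] < dfsnum[v]:
                # Back edge to an ancestor, seen for the first time.
                stack.append((v, w))
                low[v] = min(low[v], dfsnum[w])

    for v in adj:                       # extension: cover every connected piece
        if v not in dfsnum:
            search(v)
    return components
```

For two triangles joined by a bridge, for example adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}, the call returns three components: the two triangles and the single edge between vertices 3 and 4.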

==================================================================================

26.1. Alternative models of computation.

Up to now we have been designing and analyzing algorithms with reference to the Turing or von Neumann architecture model of computation. That is, we assume our programs will be running on a machine with:

an array of memory cells, each with an address (random access memory)

a fixed set of operations (arithmetic, logical, jumps, etc.), each of which can be completed in one time unit

input and output operations are part of this set

Thus when a program does N operations, it will take N time units. This is the basis of our time analyses of algorithms. Furthermore, each data item will occupy one memory cell. This is the basis of our space analyses of algorithms.

In this lecture we will introduce a theoretical model for parallel computation, the PRAM (pronounced p-ram), or parallel random access machine. This model is a theoretical tool only. It cannot be realized with present day technology. An algorithm that runs on a PRAM in time f(n) for a problem of size n will not run that fast on a real computer. This is because one of the basic assumptions of the PRAM model (see below) is that more than one processor can read or write to a given memory cell at the same time. In practice this is, of course, impossible. The best we could usually hope for is that n processors could read the contents of one memory cell, or combine their outputs (perhaps by some logical function) into one output, in time Θ(log_b n) for some base b.
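The logarithmic figure corresponds to combining the n outputs through a tree in which each node merges at most b inputs, so the number of levels is about log_b n. A minimal sketch that counts those levels, assuming a non-empty input and using a hypothetical helper named tree_combine:

```python
def tree_combine(values, b, op):
    """Combine values through a tree with fan-in b.

    Returns (result, rounds), where rounds is the number of tree levels.
    Assumes values is non-empty and b >= 2.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), b):
            group = level[i:i + b]      # at most b inputs per node
            acc = group[0]
            for x in group[1:]:
                acc = op(acc, x)
            nxt.append(acc)
        level = nxt                     # one level of the tree is done
        rounds += 1
    return level[0], rounds
```

With 8 values and b = 2 the tree has 3 levels; with 27 values and b = 3 it also has 3 levels.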

In the next two lectures we will look at some parallel architectures which can actually be built and see some algorithms for these architectures. In the final lecture we will look at a new model of computation, quantum computing.

The important thing to remember about parallel computing is that it is not magic. Algorithms that require exponential time on a sequential computer will also require exponential time on a parallel machine. For any problem, the size n can be arbitrarily large, but a real parallel machine has only a fixed number of processors; the number of processors cannot grow arbitrarily with the problem size. In addition, even when some speedup seems possible, the overhead of communication time may make the parallel solution unattractive when compared with a sequential algorithm. Finally, the cost of building and maintaining a parallel machine may not be justified if the machine does not speed up computations sufficiently.
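One way to make the "not magic" claim concrete is the following sketch. It assumes that a single processor can simulate p processors with at most a p-fold slowdown; the symbols T_seq (best sequential time), T_par (parallel time on p processors), and the exponent k are introduced here for illustration only.

```latex
T_{\mathrm{seq}}(n) \le p \cdot T_{\mathrm{par}}(n)
\quad\Longrightarrow\quad
T_{\mathrm{par}}(n) \ge \frac{T_{\mathrm{seq}}(n)}{p}.

% Example: T_seq(n) = 2^n and even p = n^k processors
T_{\mathrm{par}}(n) \ge \frac{2^{n}}{n^{k}} = 2^{\,n - k \log_2 n}.
```

Under these assumptions the parallel time is still exponential in n, even when the number of processors is allowed to grow polynomially with the problem size.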

26.2. Definition of a PRAM.

A PRAM consists of p identical processors, P(1), P(2), ... , P(p), all connected to one shared random access memory M. Each processor has a small local memory, but all communication is through the shared memory M. One step in a computation consists of:

a. Each processor may read from a memory cell.

b. Each processor does some computations. The processors are synchronized but may do different operations. The program must specify what operation processor P(i) is to perform, as a function of i.

c. Each processor may write to a memory cell.
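The three phases of a step can be sketched as a small simulation. This is only an illustration: the function pram_step and its parameters are hypothetical, processors and cells are numbered from 0 rather than 1, and when two processors write to the same cell the higher-numbered one simply wins, a conflict rule that is an assumption of this sketch.

```python
def pram_step(M, p, read_addr, compute, write_addr):
    """Simulate one synchronous PRAM step for processors 0..p-1.

    read_addr(i)   -> cell that processor i reads
    compute(i, x)  -> value processor i computes from the value x it read
    write_addr(i)  -> cell that processor i writes
    """
    # (a) Read phase: every processor reads before anyone writes.
    fetched = [M[read_addr(i)] for i in range(p)]
    # (b) Compute phase: the operation may depend on i.
    results = [compute(i, fetched[i]) for i in range(p)]
    # (c) Write phase.
    for i in range(p):
        M[write_addr(i)] = results[i]
    return M
```

For example, if every processor reads its right-hand neighbour's cell and writes the value into its own cell, pram_step([10, 20, 30, 40], 4, lambda i: (i + 1) % 4, lambda i, x: x, lambda i: i) rotates the memory to [20, 30, 40, 10]. No value is lost because all reads complete before any write.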

Example. Finding the largest of n keys in time Θ(log(n)).

Assume we have n processors and n keys, in memory locations M(1), ..., M(n).
The following algorithm will place the largest of the keys in M(1), in time
Θ(log(n)).

Assume each processor has local variables big, step, and temp.

Read M(i) into P(i)'s copy of big; set step = 1.
Write -maxint into M(n+i) (i.e., M(n+1), ... , M(2n)).
for count = 1 to log(n) do
    read M(i+step) into temp
    big = max(big,temp)
    step = 2*step
    write big into M(i)

To see how this works, suppose n = 8 and the keys are, in order, 16,12,1,17,23,19,4,8

       Processor     1      2      3      4      5      6      7      8
step    big,temp
0                    16     12     1      17     23     19     4      8

1                    16,12  12,1   1,17   17,23  23,19  19,4   4,8    8,0
write back to M      16     12     17     23     23     19     8      8

2                    16,17  12,23  17,23  23,19  23,8   19,8   8,0    8,0
write back to M      17     23     23     23     23     19     8      8

4                    17,23  23,19  23,8   23,8   23,0   19,0   8,0    8,0
write back to M      23     23     23     23     23     19     8      8

(A temp value of 0 marks a read of the -maxint sentinel from M(n+1), ..., M(2n).)
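The trace can be reproduced with a short sequential simulation of the n processors. This is a sketch under stated assumptions: n is a power of two (as in the example), float("-inf") stands in for -maxint, and the function name pram_max and the returned trace are introduced for illustration.

```python
NEG_INF = float("-inf")   # stands in for -maxint

def pram_max(keys):
    """Simulate the PRAM maximum algorithm; n = len(keys), a power of two.

    Returns (M(1), trace), where trace holds M(1..n) after each round.
    """
    n = len(keys)
    # M[1..n] hold the keys, M[n+1..2n] hold sentinels; index 0 is unused.
    M = [None] + list(keys) + [NEG_INF] * n
    big = [None] + list(keys)           # big[i] is local to P(i)
    step = 1                            # same value in every processor
    trace = []
    for _ in range(n.bit_length() - 1): # log2(n) rounds
        # Read phase: all processors read before any of them writes.
        temp = [None] + [M[i + step] for i in range(1, n + 1)]
        for i in range(1, n + 1):
            big[i] = max(big[i], temp[i])
        step *= 2
        # Write phase.
        for i in range(1, n + 1):
            M[i] = big[i]
        trace.append(M[1:n + 1])
    return M[1], trace
```

Calling pram_max([16, 12, 1, 17, 23, 19, 4, 8]) returns 23 as the maximum, and the three lists in the trace match the three "write back to M" rows of the table.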