PARABOLIC PARTIAL DIFFERENTIAL EQUATIONS*

18 downloads 0 Views 1MB Size Report
The algorithm is a variant of the multigrid waveform relaxation method where the scalar ordinary differential equations that make up the kernel ofcomputation are.
() 1994 Society for Industrial and Applied Mathematics

SIAM J. ScI. COMPUT. Vol. 16, No. 3, pp. 531-541, May 1995

002

AN ALGORITHM WITH POLYLOG PARALLEL COMPLEXITY FOR SOLVING PARABOLIC PARTIAL DIFFERENTIAL EQUATIONS* G. HORTON S. VANDEWALLE AND P. WORLEY

,

,

Abstract. The standard numerical algorithms for solving parabolic partial differential equations are inherently sequential in the time direction. This paper describes an algorithm for the time-accurate solution of certain classes of parabolic partial differential equations that can be parallelized in both time and space. It has a serial complexity that is proportional to the serial complexities of the best-known algorithms. The algorithm is a variant of the multigrid waveform relaxation method where the scalar ordinary differential equations that make up the kernel of computation are solved using a cyclic reduction-type algorithm. Experimental results obtained on a massively parallel multiprocessor are presented.

Key words, parabolic partial differential equations, massively parallel computation, waveform relaxation, multigrid, cyclic reduction AMS subject classifications, primary 65M, 65W; secondary 65L05

1. Introduction. For many numerical problems in scientific computation, the execution time grows without bound as a function of the problem size, independent of the number of processors and of the algorithm used [36], [38], [40]. In particular, for most linear partial differential equations (PDEs) arising in mathematical physics, the parallel complexity grows as O(log N), where N is a particular measure of the problem size. The proof is based on deriving upper and lower bounds on the execution time of optimal parallel algorithms for multiprocessors with an unlimited number of processors and no interprocessor communication costs, where both upper and lower bounds are proportional to log N. These optimal parallel algorithms can have very large serial complexities, and the tightness of the bounds on the parallel execution time for practical algorithms is not established by this analysis. In the analysis of standard numerical algorithms for linear PDEs, there is a strong dichotomy in the nature of the growth in the parallel execution time between algorithms for the solution of time-dependent and time-independent PDEs [37], a dichotomy that is notpresent in the optimal algorithm analysis. For example, when approximating elliptic PDEs using finite difference or finite element discretizations, the serial complexity is at least (R)(Ns), where Ns is the size of the underlying grid and (R)(x) denotes a positive quantity whose leading order term is proportional to x. This linear serial complexity can often be achieved using a full multigrid V-cycle algorithm, weighted Jacobi, or multicolor Gauss-Seidel smoothing, and local restriction/prolongation operators, which has a parallel complexity of (R)(log2 Ns) on a multiprocessor with (R)(Ns) processors [3], [9], [25]. So, for these problems, there exists an algorithm whose serial complexity is proportional to that of the best serial algorithm and whose parallel complexity is a polylog function of the serial complexity. *Received by the editors August 21,1991; accepted for publication (in revised form) February 22, 1994.

Lehrstuhl fur Rechnerstrukturen (IMMD 3), Universitat Erlangen-Nurnberg, Martensstrasse 3, D-8520 Erlangen, Federal Republic of Germany (graham@ immd3, informat ik. uni-erlangen, de). tDepartment of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven(Heverlee), Belgium (stefanScs. kuleuven, ac. be). Mathematical Sciences Section, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 378316367 (worley@msr. epm. o rnl. gov) This research was supported by the Applied Mathematical Sciences Research Program, Office of Energy Research, U.S. Department of Energy under contract DE-AC05-84OR21400 with Martin Marietta Energy Systems Inc. The following text presents research results of the Belgian Incentive Program "Information Technology"Computer Science of the future, initiated by the Belgian State, Prime Minister’s Service, Science Policy Office. The scientific responsibility is assumed by its authors.

531

532

G. HORTON, S. VANDEWALLE, AND P. WORLEY

Time-stepping methods are commonly used to calculate the time-accurate solution of timedependent PDEs. For a time-accurate solution, the solution is required at a sequence of times 0 ti Nt }, where ti- < ti and ti ti- is small enough to allow accurate interpolation of the solution at all times in between. Standard time-stepping algorithms based on finite difference or finite element discretizations of parabolic PDEs have serial complexities that are linear in the number of space-time locations where the solution function is approximated. Thus, the serial complexity is (R) (Ns. Nt), where Ns is the size of the underlying grid at a fixed time and Nt is the number of time levels. The computations at each time level are usually easily parallelized, but the processing of the time direction in a time-stepping algorithm is inherently sequential. Thus, the parallel complexity is always at least (R)(Nt), i.e., not a polylog function of the serial complexity. Variants of the standard time-stepping algorithms have been proposed that begin the calculation for later time levels before the current time level is finished [10], [28], [35]. These algorithms, however, do not alter the sequential nature of the time direction. Alternatively, the multigrid idea has been extended to parabolic problems discretized on space-time grids in [13] and a time-parallel variant was studied in [1], [4], [5], [16]. This parabolic multigrid algorithm allows computations to proceed concurrently on different time levels, yet lacks robustness w.r.t, the mesh-size parameters and is limited to particular time-discretization schemes and assumptions on the smoothness of the timedependent data (forcing function) [39]. This paper addresses the question of whether good serial algorithms for parabolic PDEs are intrinsically less parallel than good serial algorithms for elliptic PDEs. We pose the question in the following form. For a given parabolic PDE, is there a numerical algorithm for the time-accurate approximation of the solution with the following properties. (i) The serial complexity of the algorithm is proportional to the serial complexity of the best-known serial algorithm for this problem. (ii) The algorithm can be parallelized in the time direction as well as the spatial directions, so that the achievable parallel complexity given an unlimited number ofprocessors is a polylog function of the serial complexity. In 2, we describe a class of algorithms that have properties (i) and (ii) for a large class of linear parabolic PDEs. Not only does this class of algorithms answer the above theoretical question, it also has practical applications on massively parallel multiprocessors. This is illustrated in 3 where timing results obtained on a massively parallel SIMD multiprocessor are given. The algorithm can be adapted and made into an algorithm for solving nonlinear parabolic problems and systems of equations. This is briefly discussed in 4, where we also mention how it can be modified to run efficiently on large MIMD multiprocessors. We finish in 5 with some concluding remarks. Note that the question posed above can also be asked of hyperbolic PDEs. The class of algorithms described here is not effective for hyperbolic problems, but a similar approach can be used to develop algorithms with properties (i) and (ii) for a restricted, but important, class of constant coefficient hyperbolic PDEs. See [41 for details. 2. Multigrid waveform relaxation.

2.1. Waveform relaxation. Waveform relaxation is a technique for solving systems of ordinary differential equations (ODEs) of initial-value-type [23], [27]. It is based on applying standard point and block iterative methods for the solution of linear systems to the solution of a system of ODE:.. For example, let A be an n x n matrix, and consider the problem F, where U and F are vector functions of time. Then the kth step of a d U/dt + AU Jacobi-based iterative method for the solution of this system is

(1)

d uk dt

+ DU

F + (D- A)U (k-l)

A POLYLOG PARALLEL PARABOLIC SOLVER

533

where D is the diagonal of A. Thus, each step of the method involves the solution of n independent scalar ODEs. The decoupling of the system allows different discretizations and timesteps to be used for each of the ODEs, which can lead to significant savings for some applications. To solve parabolic PDEs, the spatial derivatives are discretized to generate a semidiscrete problem and the resulting system of ODEs is solved using waveform relaxation. Miekkala and Nevanlinna have analyzed the convergence of waveform relaxation for linear operators [26]. They showed that, for linear PDEs of the form ut + Lu f where L is an elliptic operator and for standard spatial discretizations, the convergence rates for Jacobi and Gauss-Seidel iterations for the semidiscrete problem are often similar to those for the analogous discrete elliptic problem.

2.2. Multigrid acceleration. Let L h represent the discrete operator generated by discretizing the elliptic operator L on a grid of meshsize h. The correspondence between the convergence rate of waveform relaxation applied to dU/dt + LhU F and the convergence rate of the analogous matrix iterative method applied to L h U F has two immediate implications. First, the convergence rate is too slow for waveform relaxation to be competitive with standard time-stepping algorithms. Second, multigrid techniques may be effective at accelerating convergence of the iteration. Multigrid acceleration has been analyzed by Lubich and Ostermann [24]. Among their results, they showed that multigrid performance, i.e., a convergence factor bounded by a small constant independent of h, can be achieved for the semidiscrete problem if Lh is a symmetric positive definite matrix with constant diagonal entries, and either weighted Jacobi relaxation is used, or Lh has the form

where D is diagonal, and Gauss-Seidel relaxation is used. Note that this latter matrix structure commonly occurs when using a red-black ordering with standard finite difference stencils. They also show that multigrid performance can be achieved for the fully discrete problem if, in addition to the above conditions, all ODEs are discretized by the same method, the time direction is not coarsened, and the ODE solver is an A(c)-stable linear multistep method with suitably large ot or an algebraically stable Runge-Kutta method. Note that all of these conditions are sufficient, but not necessary. This multigrid waveform relaxation algorithm has been tested extensively [31 ]-[33], and has been shown to work well for a variety of parabolic problems, both linear and nonlinear, on both serial and parallel computers. In particular it has been shown that the use of multigrid V-cycles leads in many cases to a rapidly converging iteration with typical convergence factors in the range of 0.06 to 0.2, independent of the size of the spatial mesh and independent of the length of the time-integration interval or the number of time levels. In such cases a nested iteration or full multigrid strategy that employs a small number of V-cycles on each grid level results in an algorithm with linear serial complexity and an algebraic error (norm of the difference between approximation and converged discrete solution) that is smaller than the discretization error (norm of the difference between the converged discrete solution and the solution of the PDE). In addition, it is illustrated in the above references that the serial computation time of the waveform algorithm is similar to (and often smaller than) the serial computation time of a corresponding time-stepping code.

2.3. Multigrid waveform relaxation with cyclic reduction. The waveform relaxation algorithm is normally implemented in a fashion that is still intrinsically sequential in the

534

G. HORTON, S. VANDEWALLE, AND P. WORLEY

time direction. Some time parallelism was introduced in the method described in [34], but a (small) sequential component remained. Here, we introduce parallelism in the time direction by exploiting an idea from the parallel-ODE literature [2], [6], [11 ], 12]. The computation in the time direction only involves solving linear scalar ODEs. If the ODEs are solved using a linear multistep method with a statically determined timestep, then each ODE solution corresponds to the solution of a banded lower triangular matrix equation, or, equivalently, a linear recurrence. Parallelizing linear recurrence equations has been studied extensively [8], [15], [21], [22], [29]. In particular, if a cyclic reduction approach is used to parallelize the linear recurrence, then parallelism is introduced without increasing the order of the serial complexity. For example, if a two-level method is used to discretize the scalar ODE, then a lower bidiagonal matrix equation must be solved. Cyclic reduction combines even-numbered equations with odd-numbered equations to generate a new bidiagonal matrix equation of half the size. If the original matrix equation has the form all a21

a22 a32

(3)

a33 a43

a44 a54

a55 a65

a66 a76

a77 a87

Xl x2 x3 x4 x5 x6 x7 x8

a88

then one step of the cyclic reduction algorithm generates a new matrix equation of the form a22 a43

(4)

a32

x2 a44

a65 a55 a54

X4 X6

a66 a87

a77 a76

a88

x8

f2 a21 -fl a43 f4- a-f3 a65 f6- -fs f8 a87 -77 f7

The solution vector of the smaller system is identical to the even-numbered elements of the solution vector of the original matrix equation. This process is repeated until only one or two equations are left, at which time this small linear system is solved. The solution values for the small system are then used to calculate the unresolved values in the next larger linear system. By repeating this process, the solution to the original matrix equation is calculated. Note that each step of the reduction stage is perfectly parallel in the sense that combining each pair of equations is independent. Similarly, injecting solution values into the next larger system and solving for the unresolved variables can be done independently for each unresolved variable. Thus, cyclic reduction allows us to parallelize the time direction. If a k-level linear multistep algorithm is used to solve the ODE, then the banded lower triangular matrix defining the linear recurrence has a bandwidth of k. The cyclic reduction consecutive algorithm again halves the number of equations at each step, but now k equations are needed to transform the dependencies in a given equation from the k- previous values in the current matrix equation to the k previous values in the new smaller system. is or As before, this process continued until only k fewer equations are left. If k > 2, then solving this small system and injecting the solution back into the next larger system does not decouple the calculation of the unresolved variables. Instead, it produces a new banded lower triangular matrix equation to solve, one whose bandwidth is [k/2]. Repeatedly applying the cyclic reduction algorithm continues to halve the bandwidth until all of the unresolved variables are calculated, at which time they can be injected into the next larger system to reduce its bandwidth.

535

A POLYLOG PARALLEL PARABOLIC SOLVER

The cyclic reduction algorithm is more expensive than the standard serial algorithm, but the complexity is still (R)(Nt) for each ODE solution (if k is independent of Nt). For example, for a two-level method, the serial complexity of the standard algorithm is 3 Nt, while for the cyclic reduction algorithm it is 5Nt or 7Nt, depending on whether certain values are precomputed. For a three-level scheme, the complexity of the standard algorithm is 5Nt, compared with 11Nt or 21Nt for the cyclic reduction algorithm. As long as the complexity of the ODE solver is (R) (Nt), the multigrid waveform relaxation algorithm remains an (R) (Ns. Nt) complexity algorithm. The parallel complexity of the cyclic reduction algorithm is a function of the number of time levels used in the discretization of the ODE. For a k-level scheme, it is (R) (log Nt), where y k/2]. Including this cyclic reduction algorithm in a parallel multigrid waveform relaxation algorithm that uses a weighted Jacobi or red-black Gauss-Seidel smoother leads to a parallel complexity of (R) (log Ns log Nt) for a single V-cycle on a hierarchy of (R) (log Ns) multigrid levels. The total parallel complexity of the full multigrid algorithm with a constant number of V-cycles per grid level is then of the form (R) (loge Ns log y Nt), which is polylog.

2.4. Multigrid waveform relaxation with parallel cyclic reduction. The algorithm described above, multigrid waveform relaxation with cyclic reduction, was motivated by the theoretical question introduced in 1. The following generalization is motivated by practical issues. By imitating Hockney’s parallel cyclic reduction algorithm (PARACR) 15], also called the recursive doubling method, we can lower the parallel complexity of the cyclic reduction algorithm to (R)(log Nt), independent of the length of the recurrence, without increasing the number of processors needed. The idea is to modify all equations at each reduction step. Returning to equation (3), PARACR transforms this.linear system into two decoupled linear systems of half the size: system (4) and all

(5)

a32 a a22 21

Xl a33

a54 a43 a44

x3

a55 .a76

6---65

x5 a77

x7

-

fl f3- a32 f2 f5 a54 f4 f7- a76 f6

Thus, after log 2 Nt steps, there are Nt independent equations whose solutions solve the original problem. While the resulting algorithm has a serial complexity of (R)(Nt log Nt), this is unimportant on a multiprocessor with (R)(Nt) processors. On such multiprocessors, PARACR is substantially faster than the original cyclic reduction algorithm because it avoids the backsubstitution phase. It also eliminates the recursive halving of the bandwidth required in the elimination step when k > 2.

3. Complexity and computational experiment. In this section, we verify the predicted parallel complexity of the multigrid waveform relaxation algorithm. We concentrate in particular on the PARACR or recursive doubling variant, which, within rounding error, provides the same numerical results as the original cyclic-reduction-based algorithm. Not only is the complexity of this variant smaller than that of the basic cyclic-reduction method, its implementation on a massively parallel multiprocessor is also more straightforward. We consider a linear parabolic PDE with variable coefficients on a two-dimensional rectangular domain with Dirichlet boundary conditions. The problem is discretized in space on a regular rectangular grid with an equal number of grid lines in both spatial dimensions, leading to a five point finite difference stencil. Backward Euler (BDF1), a two-level scheme,

536

G. HORTON, S. VANDEWALLE, AND P. WORLEY

the second order backward differentiation formula (BDF2), a three-level scheme, and the Crank-Nicolson method (CN), a two-level scheme, are used to discretize in time. The total number of grid points, not counting the ones with known values on the boundaries and on the initial time level, is given by Nx x Ny x Nt, with

Nx

(6)

Ny

2 P- 1

and

Nt

2Q

for positive integer values P and Q. The algorithm employs a multigrid strategy where the grid hierarchy with P levels is defined by standard spatial coarsening from the fine grid down to a coarse grid with a single unknown per time level. Bilinear prolongation and standard full weighting are used as intergrid transfer operators. The smoother is a Gauss-Seidel waveform relaxation method with a red-black ordering of the grid points. In Table we present the parallel complexities of the multigrid waveform relaxation operators. (One processor per grid point is assumed.) Given are the number of parallel stages--each corresponding to a floating point operation--required in the application of a smoothing half-step (red or black) (Cs), in the defect calculation (Co), in the restriction (CR), in the prolongation (Cp), and in the correction (Cc). Note that Co, CR, Cp, and Cc are completely independent of the size of the grid. For deriving C R we took into account that the defect after a red-black step is zero at half the number of grid points. The constant term in the formula for C s is the complexity of constructing the linear recurrence, whereas the linear term is the complexity of the parallel cyclic reduction algorithm. TABLE Parallel complexity of multigrid waveform relaxation operators.

Method BDF1 BDF2

CN

Cs(Q) 9+3 Q 9+11Q 10+3 Q

Co

CR

12 14 14

6 6 6

Cp 4

Cc

4 4

The complexity of a multigrid V-cycle, denoted by Cv(P, Q), can be calculated from

(7) (8)

Cv(l, Q) Cv(P, Q)

Cs(Q), Cv(P 1, Q)

+v

2Cs(Q) + Co

+ C + Cp + Cc

where v denotes the total number of red-black smoothing steps per grid level. For BDF1, BDF2, and CN we obtain

(9) (10) (11)

Cv(P, Q) Cv(P, Q) Cv(P, Q)

14- 18v, 6v PQ + (23 + 18v)P + (3-6v) Q 16- 18v, 22v PQ + (25 + 18v) P + (11 22v) Q 15 20v, 6v PQ -k- (25 + 20v) P + (3 6v) Q

respectively. From (7) and (8), one readily derives the parallel complexity of the full multigrid waveform relaxation algorithm. Let 6 be the number of V-cycles per grid level (usually or 2). Then P

(12)

CFMG(P, Q)

Ci

+ Cv(1, Q) + _(C + Ce + 6. Cv(i, Q)) i=2

CI is the 6)(1) cost of initialising certain variables before starting the actual multigrid computations on a new grid level. With (9), (10), and (11) this gives

where

(13)

CFMG(P, Q)

6)(p2 Q),

537

A POLYLOG PARALLEL PARABOLIC SOLVER

for BDF1, BDF2, and CN, establishing the predicted polylog parallel complexity. The coefficient of the leading order term is given by 3 for BDF1 and CN, and by 11 v for BDF2. The algorithm analysed above has been implemented on a CM/2 Connection Machine 14], [30], a massively parallel multiprocessor of SIMD (single instruction stream, multiple data stream) type. The results of a timing experiment performed on a machine with 16K processors are reported in Figs. 1 and 2. We present execution times obtained with the implementation of the BDF1 method. The curves for the BDF2 and CN schemes are qualitatively very similar.

0.45

P=5

P

4

0.40 0.35

p

0.30 0.25

P

0.20

2

0.15

0.10

P=I

0.05

a FIG. 1. Multigrid waveform relaxation with parallel cyclic reduction, BDF1 V-cycle times.

are the execution times, in seconds, of multigrid V-cycles with 2 and 1 postsmoothing step, for different values of P and Q. In all cases, P presmoothing steps and Q are such that the total number of grid points on the finest grid is smaller than or equal to the number of processing elements available. The lines are connecting points that correspond to V-cycles on a fixed number of grid levels (P) for different numbers of time levels. The qualitative correctness of formula (9) is easily verified by inspection of the picture. For a fixed value of P the parallel complexity is a linear function of Q with a slope that is linearly dependent on P. A similar rule holds with P and Q interchanged. The latter is verified by looking at the constant difference between the execution times for successive values of P and fixed value of Q, a difference that increases with growing value of Q. In Fig. 2 we present the execution time of the full multigrid algorithm. The (problem dependent) value ; was set to one in this experiment. Also, we did not include the problem dependent cost of initialising the boundary values, initial condition values and the PDE righthand side. The curves clearly illustrate the parabolic dependence of the execution time on the number of grid levels, as predicted by (13).

Given in Fig.

538

G. HORTON, S. VANDEWALLE, AND P. WORLEY

Q=4 Q=6

Q=2

0.6

T 0.5

Q=O

I

1

2

3

4

5

6

7

FIG. 2. Multigrid waveform relaxation with parallel cyclic reduction, BDF1 full multigrid times.

4. Extensions.

4.1. A coarse grain parallel algorithm. By imitating the blocked cyclic reduction al-. gorithm discussed in [18] and [19], the communication cost can be reduced to a manageable size for distributed memory multiprocessors. The algorithm is based on a perfectly parallel substructuring or partitioning technique to reduce the original system to an intermediate system with a single equation per processor. For example, if Pt processors are assigned to the solution of an ODE, the blocked algorithm generates a small intermediate Pt x Pt linear system whose solution by (parallel) cyclic reduction introduces Pt-way parallelism into the rest of the calculation. This coarse grain parallel algorithm is very similar to the space-time concurrent waveform algorithm of [34], where a different method is used to solve the intermediate system.

4.2. Nonlinear problems and systems of parabolic problems. Nonlinear parabolic PDEs can be solved by using a nonlinear or full approximation scheme multigrid waveform method [31 ], [32]. These algorithms are straightforward extensions of the well-known multigrid methods for nonlinear elliptic problems. The nonlinearity is treated in the smoothing steps, where each nonlinear ODE is first linearized (as an ODE in a single unknown) and then solved by a standard ODE method for linear problems. Consequently, the serial and parallel complexities of the algorithms for nonlinear problems are similar to those of the algorithms for linear problems. The algorithms for linear and nonlinear equations are readily extended to algorithms for solving systems of linear or nonlinear equations, by using a block cyclic reduction method, to solve the set of ODEs at each grid point. Although the serial complexity of the block

A POLYLOG PARALLEL PARABOLIC SOLVER

539

cyclic reduction method may be many times larger than that of the standard serial algorithm, it remains an (R) (Nt) method. Also the order of the parallel complexity is unaltered.

5. Concluding remarks. The algorithms described in this paper establish that major classes of parabolic PDEs can be solved in polylog parallel time without giving up linear serial complexity. Moreover, minor variants of these algorithms with log-linear serial complexity can be efficiently implemented on massively parallel SIMD multiprocessors. Using cyclic reduction within the multigrid waveform relaxation algorithm has the desired properties for all problems for which (a) the full multigrid waveform relaxation algorithm has a linear serial complexity when using a relaxation technique that can be efficiently parallelized, like Jacobi or multicolor Gauss-Seidel, and (b) cyclic reduction is a stable algorithm for solving the linear recurrences arising from discretizing the scalar ODEs. Both theory and empirical evidence indicate that (a) is true for a large class of parabolic problems, both linear and nonlinear. While some work on the stability of parallelizing recurrence equations has been done (see [20], [29]) and the numerical examples computed by the authors give no indication of stability problems, more work must be done to establish the stability of this algorithm, in particular when the bandwidth of the recurrence is large and/or the system is not diagonally dominant. This is especially true given the stability problems of the obvious implementation of cyclic reduction for elliptic PDEs [7]. When properties (a) and (b) hold, we have shown the algorithm to have an achievable parallel complexity of (R) (log2 Ns. log Nt), where [k/2q when using cyclic reduction and ?’ 1 when using recursive doubling. This is still higher than that for an elliptic problem with the same number of grid points, and the question arises as to whether it can be decreased to (R)(log 2 N) with N Ns. Nt. The log Nt-factor in the complexity of the multigrid waveform algorithm arises from the parallel complexity of the cyclic reduction algorithm. It could be avoided in two different ways: either by the use of an approximate solver for the lower triangular system with constant parallel complexity, or by the use of a strategy that coarsens the grid in time (as well as space) during the multigrid process. The former approach is the one followed in the parabolic multigrid algorithm with parallel smoothing, by Hackbusch [13], where a Jacobi relaxation step is used to approximate the linear system solution. As mentioned in 1, this approach has some limitations w.r.t, mesh ratios, discretization schemes, and data smoothness assumptions. The latter approach is explored in a paper by Horton and Vandewalle 17]. Their algorithm is also not as generally applicable as the multigrid waveform method but performs satisfactorily in particular for the BDF1 and BDF2 discretizations of the time derivative. Finally, beyond answering the theoretical question of 1, the multigrid waveform algorithm has a clear promise as a practical parallel algorithm, since it is a competitive serial algorithm for many applications. It is realised, however, that adaptive time-step selection and spatial grid adaptivity are crucial to the success of good solvers for the often highly nonlinear parabolic problems that arise in practical applications. Although the waveform relaxation idea allows in principle the use of local step sizes "varying in different regions of the computational domain, and the full multigrid algorithm enables local error estimation based on the availability of solutions on different grid levels, it is not clear how this will affect the actual parallel performance of the algorithm.

,

Acknowledgments. The authors would like to thank Thinking Machines Corporation, Cambridge, MA, for providing access to the Connection Machine Network Server pilot facility, and the Gesellschaft ftir Mathematik und Datenverarbeitung, Sankt-Augustin, Germany, for letting them use the CM/2 machine at the GMD.

540

G. HORTON, S. VANDEWALLE, AND P. WORLEY

REFERENCES E BASTIAN, J. BURMEISTER, AND G. HORTON, Implementation ofa parallel multigrid methodfor parabolic partial differential equations, in Parallel Algorithms for PDEs, Proc. 6th GAMM Seminar Kiel, January 19-2 l, 1990, W. Hackbusch, ed., Vieweg Verlag, Wiesbaden, 1990, pp. 18-27. [2] A. BELLEN AND M. ZENNARO, Parallel algorithms for initial value problems for difference and differential equations, J. Comput. Appl. Math., 25 (1984), pp. 341-353. [3] A. BRANDT, Multigrid solvers on parallel computers, in Elliptic Problem Solvers, M. Schultz, ed., Academic Press, New York, 1981, pp. 39-83. [4] J. BURMEISTER, Paralleles lOsen diskreter parabolischer Probleme mit Mehrgittertechniken, diplomarbeit, Universitit Kiel, 1985. [5] J. BURMEISTER AND G. HORTON, Time-parallel multigrid solution of the Navier-Stokes equations, in Multigrid methods III, Proc. 3rd European Multigrid Conference, Bonn, 1990, W. Hackbusch and U. Trottenberg, eds., Birkhatiser Verlag, Basel, 1991, pp. 155-166. [6] K. BURRAGE, Parallel methods for initial value problems, Appl. Numer. Math., 11 (1993), pp. 5-25. [7] B. BUZBEE, G. GOLUB, AND C. NIELSON, On direct methods for solving Poisson’s equations, SIAM J. Numer. Anal., 7 (1970), pp. 637-657. [8] S. CHEN AND D. KUCK, Time and parallel processor boundsfor linear recurrence systems, IEEE Trans. Comput., C-24 (1975), pp. 701-717. [9] N. DECKER, Note on the parallel efficiency of the Frederickson-McBryan multigrid algorithm, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 208-220. [10] H. EISSFELLER AND S. MOLLER, The triangle method for saving startup time in parallel computers, in Proc. 5th Distributed Memory Computing Conference, Los Alamitos, CA, 1990, D. Walker and Q. Stout, eds., IEEE Computer Society Press, Washington, D.C., pp. 568-572. [11] C. GEAR, Waveform methods for space and time parallelism, J. Comput. Appl. Math., 38 (1991), pp. 137-147. [12] C. GEAR AND X. XUHAI, Parallelism across time in ODEs, Appl. Numer. Math., 11 (1993), pp. 45-68. 13] W. HACKBUSCH, Parabolic multigrid methods, in Computing Methods in Applied Sciences and Engineering VI, R. Glowinski and J.-L. Lions, eds., North Holland, Amsterdam, 1984, pp. 189-197. 14] D. HILLIS, The Connection Machine, MIT Press, Cambridge, MA, 1985. 15] R. HOCKNEY AND C. JESSHOPE, Parallel Computers: Architecture, Programming, and Algorithms, Adam Hilger Ltd., Bristol, UK, 1981. [16] G. HORTON, The time-parallel multigrid method, Comm. Appl. Numer. Meth., 8 (1992), pp. 585-595. [17] G. nORTON AND S. VANDEWALLE, A space-time multigrid method for parabolic P.D.E.S, Tech. report 6/93, IMMD 3, Universitat Erlangen-Ntirnberg, June 1993. [18] S. JOHNSSON, Solving tridiagonal systems on ensemble architectures, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 354-392. 19] S. JOHNSSON, Y. SAAD, AND M. SCnULTZ, Alternating direction methods on multiprocessors, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 686-700 [20] E KOGGE, The numerical stability of parallel algorithms for solving recurrence problems, Tech. report 44, Digital Systems Laboratory, Stanford Electronics Laboratories, Stanford University, Stanford, CA, Sept. 1972. Parallel solution of recurrence problems, IBM J. Res. Develop., 18 (1974), pp. 138-148. [21] [22] E KOGGE AND S. STONE, A parallel algorithmfor the efficient solution ofa general class of recurrence equations, IEEE Trans. Comput., C-22 (1973), pp. 786-793. [23] E. LELARASMEE, A. RUEHLI, AND A. SANGIOVANNI-VINCENTELLI, The waveform relaxation method for timedomain analysis oflarge scale integrated circuits, IEEE Trans. Computer-Aided Design, (1982), pp. 131145.

[24] C. LUBICH AND A. OSTERMANN. Multigrid dynamic iteration for parabolic equations, BIT, 27 (1987), pp. 216234. [25] O. MCBRYAN, P. FREDERICKSON, J. LINDEN, A. SCHOLLER, K. SOLCHENBACH, K. STOBEN, C. THOLE, AND U. TROTTENBERG, Multigrid methods on parallel computers--a survey of recent developments, Impact Comput. Sci. Engrg., 3 (1991), pp. 1-75. [26] U. MIEKKALA AND O. NEVANLINNA, Convergence of dynamic iteration methodsfor initial value problems, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 459-482. [27] A. NEWTON AND A. SANGIOVANNI-VINCENTELLI, Relaxation-based electrical simulation, SIAM J. Sci. Statist. Comput., 4 (1983), pp. 485-524. [28] J. SALTZ, V. NAIK, AND D. NICOL, Reduction of the effects of the communication delays in scientific algorithms on message passing MIMD architectures, SIAM J. Sci. Statist. Comput., 8 (1987), pp. 118-s 138.

A POLYLOG PARALLEL PARABOLIC SOLVER

541

[29] A. SAMEH AND R. BRENT, Solving triangular systems on a parallel computer, SIAM J. Numer. Anal., 14 (1977), pp. 1101-1113. [30] THINKING MACHINES CORPORATION, Connection Machine CM-200 Series, Technical Summary, 245 First Street, Cambridge, MA 02142-1264, June 1991. [31] S. VANDEWALLE, Parallel Multigrid Waveform Relaxation for Parabolic Problems, B.G. Teubner Verlag, Stuttgart, 1993. [32] S. VANDEWALLE AND R. PIESSENS, Numerical experiments with nonlinear multigrid waveform relaxation on a parallel processor, Appl. Numer. Math., 8 (1991), pp. 149-161. [33] Efficient parallel algorithms for solving initial-boundary value and time-periodic parabolic partial differential equations, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1330-1346. [34] S. VANDEWALLE AND E. VAN DE VELDE, Space-time concurrent multigrid waveform relaxation, Ann. Numer. Math., (1994), pp. 335-346. [35] D. WOMBLE, A time-stepping algorithm for parallel computers, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 824-837. [36] P. WORLEV, Information Requirements and the Implications for Parallel Computation, Ph.D. thesis, Stanford University, Stanford, CA, June 1988. [37] , The effect oftime constraints on scaled speedup, SIAM J. Sci. Statist. Comput., 11 (1990), pp. 838-835. Limits on parallelism in the numerical solution of linear PDEs, SIAM J. Sci. Statist. Comput., 12 [38] (1991), pp. 1-35. [39] Parallelizing across time when solving time-dependent partial differential equations, Tech. report ORNL/TM- 11963, Oak Ridge National Laboratory, Oak Ridge, TN, Sept. 1991. The effect of multiprocessor radius on scaling, Parallel Comput., 18 (1992), pp. 361-376. [40] Parallelizing across time when solving time-dependent partial differential equations, in Proc. 5th [41] SIAM Conference on Parallel Processing for Scientific Computing, J. Dongarra, K. Kennedy, P. Messina, D. Sorensen, and R. Voigt, eds., Philadelphia, PA, 1992, Society for Industrial and Applied Mathematics, pp. 246-252.