Systolic Functional Programming

Weichang Du
Department of Mathematics, Statistics, & Computer Science
University of New Brunswick
Saint John, N.B., Canada E2L 4L5
E-mail: [email protected]

Abstract

This paper presents specifications of systolic algorithms in a parallel functional language, SysLucid. The language extends conventional functional languages by allowing the meanings of program elements, such as variables or expressions, to vary in a context space consisting of temporal and spatial dimensions. In specifying systolic algorithms, the spatial dimensions represent the processor coordinates, and the temporal dimension represents the time points for synchronization and pipelining. The temporal and spatial operators specify temporal and spatial relationships among program elements. In SysLucid programs, both pipeline parallelism and spatial parallelism, as well as temporal, pipeline, and spatial communications, can be specified.

This work was supported in part by NSERC of Canada under a research grant.

Proc. ICCI’94, 924-939  1994 Int. Conf. on Computing and Information

1 Introduction

Systolic arrays, initially proposed by H.T. Kung [KL78][Kun82], are high-performance, special-purpose computer systems for solving compute-bound problems in many areas, such as signal/image processing and scientific computing. One of the problems in systolic array design concerns the specification and verification of systolic arrays. For the purposes of implementation and proof, a rigorous notation, rather than informal pictures or "snapshots", is desirable for specifying systolic arrays. The search for programming languages that permit complete specifications of the computation and communication activities on systolic arrays is a very challenging field of programming language research [Kun82][Kun87].

In this paper, we present specifications of systolic arrays in a parallel functional language, SysLucid. SysLucid is an intensional functional language; it extends conventional functional languages by allowing the meanings of program elements to vary in a context space consisting of temporal and spatial dimensions. In specifying systolic arrays, the spatial dimensions represent PE coordinates, and the temporal dimension represents the time points for synchronization and pipelining. The temporal and spatial relations are specified by temporal and spatial operators. In SysLucid programs, both pipeline parallelism and spatial parallelism, as well as temporal, pipeline, and spatial communications, can be specified.

The rest of the paper is organized as follows. Section 2 defines a model of systolic arrays. Section 3 gives an overview of the mLucid programming language. Section 4 describes SysLucid as a specialization of mLucid and shows a general method for specifying systolic arrays in SysLucid. Section 5 discusses related work. Section 6 gives concluding remarks. The appendix shows SysLucid programs that specify a variety of systolic arrays for convolution computation.

2 A model of systolic arrays

We define a model for systolic arrays, based on [Lei83][MR84][PS88], in two levels: general systolic networks and systolic arrays.

A general systolic network (GSN) is defined as a directed multigraph $GSN = (N, E)$, where the nodes $n \in N$ represent processing elements (PEs for short) and the edges $e \in E$ represent communication paths between PEs. Each edge $e$ is a 6-tuple $(p, o, c, m, i, q)$, where

1. $p$ and $q$ are two PEs;
2. the two variable symbols, $o$ attached to the source $p$ and $i$ attached to the sink $q$, represent an output port of the PE at the source and an input port of the PE at the sink, respectively;
3. the nonnegative integer $m$ denotes the communication delay along the path;
4. the constant $c$, in the data domain on which the array works, denotes the initial data on the path.

The meaning of a variable symbol $f$ for an input/output port of a PE varies with a clock, i.e. it is a function $f : T \to D$, where $T$ is the set of time points (natural numbers) and $D$ is the data domain on which the PEs work. For an edge $(p, o, c, m, i, q)$, the value of the input port $i$ is defined by

$$ i(t) = \begin{cases} c & \text{if } t < m \\ o(t - m) & \text{otherwise.} \end{cases} $$

The value of an output port $o$ of a PE at a time point depends on the values of some input ports at the same PE at the same time. The semantics of the GSN is the solution for all the input/output ports at all the PEs in the network.
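The edge semantics above can be illustrated with a minimal Python sketch. It models a port as a function from time points to data values, and the helper name `input_port` and the sample stream `squares` are assumptions made for illustration only; they do not appear in the model itself.

```python
from typing import Callable

# A port value is a function of time, f : T -> D (here D = int).
Stream = Callable[[int], int]

def input_port(o: Stream, c: int, m: int) -> Stream:
    """Value of the input port i on an edge (p, o, c, m, i, q).

    Before the delay m has elapsed the port holds the initial datum c;
    afterwards it sees the output port o as it was m time points earlier.
    """
    def i(t: int) -> int:
        return c if t < m else o(t - m)
    return i

# Hypothetical output stream: the PE at the source emits t*t at time t.
squares: Stream = lambda t: t * t
delayed = input_port(squares, c=0, m=2)

print([delayed(t) for t in range(6)])  # [0, 0, 0, 1, 4, 9]
```

With delay m = 2 the first two values are the initial datum c, after which the source stream appears shifted by two time points.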

A systolic array (SA) is a GSN associated with a function ? mapping nodes of the GSN to an n-dimensional grid. It has the following special properties. Let $(p, o, c, m, i, q)$ and $(q, o, c', m', i, s)$ be two connected edges in the GSN. The following conditions should hold:

1. $?(p) - ?(q) = ?(q) - ?(s)$, where `-' is the vector difference;
2. $m = m'$ and $c = c'$.

These two conditions require the indexing function ? to be chosen so that the PEs along a line that perform the same function applications, for computing the values of their output ports, are placed at a uniform distance and have the same delay on their communication paths.

For example, Figure 1 shows an SA for systolic matrix multiplication, where each node is represented by its index (i,j).

[Figure 1: A systolic array for matrix multiplication. A 4 x 4 grid of nodes labelled 00 to 33; the edges between nodes are labelled (Aout, 0, 1, Ain) and (Bout, 0, 1, Bin), and the node (i, j) carries a self-loop labelled (Cout, 0, 1, Cin).]

In the graph, an edge connecting two nodes u and v has either the form (u,Aout,0,1,Ain,v) or the form (u,Bout,0,1,Bin,v). An edge connecting a node v to itself has the form (v,Cout,0,1,Cin,v). The input ports of each node, Ain, Bin and Cin, are defined by

$$ \mathit{Ain}_{(i,j)}(t) = \begin{cases} 0 & \text{if } t < 1 \\ \mathit{Aout}_{(i,j-1)}(t - 1) & \text{otherwise} \end{cases} $$

$$ \mathit{Bin}_{(i,j)}(t) = \begin{cases} 0 & \text{if } t < 1 \\ \mathit{Bout}_{(i-1,j)}(t - 1) & \text{otherwise} \end{cases} $$

$$ \mathit{Cin}_{(i,j)}(t) = \begin{cases} 0 & \text{if } t < 1 \\ \mathit{Cout}_{(i,j)}(t - 1) & \text{otherwise} \end{cases} $$

where $X_{(i,j)}$ denotes the port X at node (i,j). The output ports of each PE, Aout, Bout and Cout, are defined as

$$ \mathit{Aout}_{(i,j)}(t) = \mathit{Ain}_{(i,j)}(t) $$

$$ \mathit{Bout}_{(i,j)}(t) = \mathit{Bin}_{(i,j)}(t) $$

$$ \mathit{Cout}_{(i,j)}(t) = \mathit{Ain}_{(i,j)}(t) \times \mathit{Bin}_{(i,j)}(t) + \mathit{Cin}_{(i,j)}(t) $$

The output ports of each boundary node that inputs data, $\mathit{Aout}_{(i,0)}$ and $\mathit{Bout}_{(0,j)}$, are defined as

$$ \mathit{Aout}_{(i,0)}(t) = A_i(t) \qquad \mathit{Bout}_{(0,j)}(t) = B_j(t) $$

where $A_i$ and $B_j$ are input data streams of type $T \to Z$.
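The port equations for the matrix multiplication array can be evaluated directly as mutually recursive functions of (i, j, t). The Python sketch below does so under several assumptions that the text does not state: the function name `systolic_matmul` is hypothetical, boundary nodes are taken to read their stream directly (so that Aout = Ain also holds there), the input streams are skewed so that row i of a enters i steps late and column j of b enters j steps late, and the result is read from Cout at time 3n - 3.

```python
from functools import lru_cache

def systolic_matmul(a, b):
    """Evaluate the port equations of an n x n systolic array (sketch)."""
    n = len(a)

    # Assumed skewed input streams A_i(t) and B_j(t); zero outside the data.
    def A(i, t):
        k = t - i
        return a[i][k] if 0 <= k < n else 0

    def B(j, t):
        k = t - j
        return b[k][j] if 0 <= k < n else 0

    @lru_cache(maxsize=None)
    def a_in(i, j, t):
        if j == 0:                       # assumption: boundary reads A_i
            return A(i, t)
        return 0 if t < 1 else a_out(i, j - 1, t - 1)

    def a_out(i, j, t):
        return a_in(i, j, t)

    @lru_cache(maxsize=None)
    def b_in(i, j, t):
        if i == 0:                       # assumption: boundary reads B_j
            return B(j, t)
        return 0 if t < 1 else b_out(i - 1, j, t - 1)

    def b_out(i, j, t):
        return b_in(i, j, t)

    @lru_cache(maxsize=None)
    def c_out(i, j, t):
        return a_in(i, j, t) * b_in(i, j, t) + c_in(i, j, t)

    def c_in(i, j, t):
        return 0 if t < 1 else c_out(i, j, t - 1)

    last = 3 * n - 3                     # last product reaches node (n-1, n-1)
    return [[c_out(i, j, last) for j in range(n)] for i in range(n)]

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Under this skew, node (i, j) multiplies a[i][k] by b[k][j] at time t = i + j + k, so the self-loop on Cout accumulates one term of the inner product per time point.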

3 An overview of mLucid

mLucid enriches the semantics of conventional functional languages by introducing intensional semantics. In mLucid, the value of an expression varies in an arbitrary n-dimensional integer space, which we call the context space. Each dimension in the context space extends infinitely in both the positive and negative directions. The value of an expression in mLucid depends on a context in the context space; that is, it is a function, called an intension, from the context space to the data domain of the base functional language.

Let $D$ be a flat data domain. The intensional semantics of an expression in mLucid is an intension in $ID = Z^\omega \to D$, where $Z$ is the domain of integers and $\omega$ is the first infinite ordinal number. $Z^\omega$ specifies the largest number of possible dimensions that constitute the context space. In practice, however, the value of an expression in a particular mLucid program may vary only along a finite number of dimensions in the context space; along the others it is constant. This number can be determined by analyzing the program [Du91a]. To simplify the presentation, in the following we use the notation mLucid(n) to denote the intensional language that has the same syntax and semantics as mLucid, except that its context space is $Z^n$ instead of $Z^\omega$ for some n.

mLucid enriches the semantics of functions from $f : D^k \to D$ ($k \ge 1$) to $f : ID^k \to ID$ in a pointwise manner. In mLucid, the application of a function to operands whose values are intensions is evaluated pointwise. The result of the application at each context is the result of applying the function to the values of the operands at that same context. Such pointwise functions do not switch context when applied to the operands at a context.

mLucid provides four primitive intensional operators: origin, next, prev (for "previous"), and fby (for "followed by"). The operators switch context from one context to another along a given dimension in the context space. The operator origin(d)(x) switches context from any context along dimension d to the origin (coordinate 0) of the dimension, and returns the value of x at that context. The operators next(d)(x) and prev(d)(x) switch context from a context to its neighboring contexts along dimension d in the positive and negative directions, respectively, and return the value of x at that context. Two generic intensional operators, next(d)(i,x) and prev(d)(i,x), are defined in terms of their primitive counterparts.

next(d)(i,x) = if i = 0 then x else next(d)(i-1, next(d)(x)) fi;
prev(d)(i,x) = if i = 0 then x else prev(d)(i-1, prev(d)(x)) fi;

The operators switch context from a context to the contexts along dimension d that are i points away in the positive and negative directions, respectively, and return the value of x at that context. The operator fby(d)(x,y) switches context according to the position of the current context in the context space. fby behaves as the identity operator at contexts on the non-positive side of dimension d, where it returns the value of x at the current context, and behaves as prev(d)(y) at contexts on the positive side of dimension d. Using the operator fby, a special operator index is defined by

index(d) = fby(d)(0, index(d)+1)

An application of index to a dimension indicator d is the intension whose value at a context p is p's coordinate for dimension d. In mLucid, we define the dimensionality of an expression e as the set of dimensions in which e's value varies; in all other dimensions it is constant. The dimensionalities of all input variables of a program must be declared at the beginning of the program.
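The intensional operators of this section can be mimicked in Python by treating an intension as a function from a context (a tuple of integer coordinates) to a value. The sketch below is only an illustration and not the mLucid implementation: the names `const`, `pointwise`, `next_`, `next_n` and `prev_n` are chosen here, and fby(d)(x,y) is assumed to yield x on the non-positive side of dimension d and the previous value of y on the positive side, which is the reading under which the definition of index produces coordinates.

```python
from typing import Callable, Tuple

Context = Tuple[int, ...]              # one integer coordinate per dimension
Intension = Callable[[Context], int]   # value as a function of the context

def const(v: int) -> Intension:
    return lambda ctx: v

def pointwise(f, *xs: Intension) -> Intension:
    # Apply f to the operand values at the same context (no context switch).
    return lambda ctx: f(*(x(ctx) for x in xs))

def _move(ctx: Context, d: int, k: int) -> Context:
    return ctx[:d] + (ctx[d] + k,) + ctx[d + 1:]

def origin(d: int, x: Intension) -> Intension:
    return lambda ctx: x(ctx[:d] + (0,) + ctx[d + 1:])

def next_(d: int, x: Intension) -> Intension:
    return lambda ctx: x(_move(ctx, d, 1))

def prev(d: int, x: Intension) -> Intension:
    return lambda ctx: x(_move(ctx, d, -1))

def next_n(d: int, i: int, x: Intension) -> Intension:
    # Generic next(d)(i, x), following the recursive definition in the text.
    return x if i == 0 else next_n(d, i - 1, next_(d, x))

def prev_n(d: int, i: int, x: Intension) -> Intension:
    return x if i == 0 else prev_n(d, i - 1, prev(d, x))

def fby(d: int, x: Intension, y: Intension) -> Intension:
    return lambda ctx: x(ctx) if ctx[d] <= 0 else y(_move(ctx, d, -1))

def index(d: int) -> Intension:
    # index(d) = fby(d)(0, index(d) + 1)
    def idx(ctx: Context) -> int:
        return fby(d, const(0), pointwise(lambda v: v + 1, idx))(ctx)
    return idx

ctx = (3, 5)
print(index(0)(ctx), index(1)(ctx))        # 3 5
print(next_n(0, 2, index(0))(ctx))         # 5
print(origin(0, index(0))(ctx))            # 0
```

Note that, under this reading of fby, index(d) evaluates to 0 at contexts with a negative coordinate in dimension d, so the sketch reproduces the coordinate only on the non-negative side.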

4 Systolic programming in SysLucid

4.1 Temporal and spatial subspaces

To specify the space and time behavior of systolic arrays, we first consider that the context space of a program in mLucid(n+1) consists of a temporal subspace and a spatial subspace. The temporal subspace is one-dimensional and consists of dimension 0 of the context space, which we also call the time dimension. The spatial subspace is n-dimensional and consists of dimensions k for $1 \le k \le n$ of the context space, which we also call the space dimensions.

With respect to the time dimension and the space dimensions, the coordinates of a context p in the context space are divided into two vectors, t and s. The vector t, representing a point in the time dimension, is a singleton consisting of p's coordinate for dimension 0. The vector s, representing a point in the spatial subspace, consists of the other n coordinates of p. In this sense, we represent a context p as a pair (t, s).

Given a point t in the time dimension, there are infinitely many contexts in the context space associated with t, namely all the contexts (t, s) with the same t. In other words, each point in the time dimension is associated with a spatial subspace; at different time points the subspaces associated with them may vary. On the other hand, given a point s in the spatial subspace, there are also infinitely many contexts associated with s, namely all the contexts (t, s) with the same s. In other words, each point in the spatial subspace is associated with a time dimension; at different space points the time dimension associated with them may vary.

In the following description, to distinguish intensional operations on the two subspaces, we call intensional operations that switch context in the time dimension temporal operations, and use the suffix notation op_time to represent op(0), where op is an intensional operator. We call intensional operations that switch context in the space dimensions spatial operations.

4.2 SysLucid as a specialization of mLucid

With respect to the model of systolic arrays and the above division of the context space, SysLucid as a specialization of mLucid has the following syntactic restrictions.

1. Variables defined in a SysLucid program are classified into three categories: result, output-port and input-port.
2. A result variable r never appears on the right-hand side of any equation in the program, and only output-port variables can appear in r's defining expression. The program's output (in the form of its defining expression) is the list of all result variables.
3. An output-port variable o is defined by a pointwise expression in which only input-port variables can appear.
4. An input-port variable i is defined by a conditional on the space indices. It has the most general form:

i = cond ... <condition>_k : <exp>_k ... end;

<condition>_k consists of spatial index operators, integer constants and pointwise arithmetic operators. The expression <exp>_k has the form

if index_time < m then c else prev_time(m, switch(o)) fi;

where m is a nonnegative integer constant, c is a constant, and switch(o) is a composition of the spatial generic operators next(d)(l,x) and prev(d)(l,x), where d ≠ 0, the offset argument l may consist of index operators, integer constants and pointwise arithmetic operators, and o is an output-port variable. Notice that the if-then-else construct is a special case of cond with only two cases.
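The shape of an input-port definition can be sketched in Python, again treating intensions as functions of a context (t, s1, ..., sn). The helpers `switch` and `input_port`, the offset-vector encoding (a positive entry standing for next along that space dimension, a negative one for prev), and the sample intension `aout` are assumptions made for this illustration.

```python
from typing import Callable, Tuple

Context = Tuple[int, ...]              # (t, s1, ..., sn); dimension 0 is time
Intension = Callable[[Context], int]

def switch(o: Intension, offsets: Tuple[int, ...]) -> Intension:
    """Compose spatial shifts on o; offsets[d-1] is the move along space dimension d."""
    def shifted(ctx: Context) -> int:
        t, s = ctx[0], ctx[1:]
        moved = tuple(x + k for x, k in zip(s, offsets))
        return o((t,) + moved)
    return shifted

def input_port(m: int, c: int, o: Intension, offsets: Tuple[int, ...]) -> Intension:
    """Sketch of: if index_time < m then c else prev_time(m, switch(o)) fi"""
    src = switch(o, offsets)
    def port(ctx: Context) -> int:
        t = ctx[0]
        if t < m:
            return c
        return src((t - m,) + ctx[1:])   # go back m time points
    return port

# Hypothetical output-port intension that encodes its own context.
def aout(ctx: Context) -> int:
    t, i, j = ctx
    return 100 * t + 10 * i + j

# Input port fed, with delay 1, by the neighbor at (i, j-1).
ain = input_port(m=1, c=0, o=aout, offsets=(0, -1))
print(ain((0, 2, 3)))  # 0
print(ain((4, 2, 3)))  # 322
```

The second call reads aout at context (3, 2, 2), that is, one time point earlier and one step back along the second space dimension.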


4.3 SysLucid programs as systolic array specifications

In the following, we show that a program in SysLucid is semantically equivalent to a GSN. Given a GSN = (V,E) with the sets of input/output ports I and O, a SysLucid program in mLucid(k+1) for some k > 0 can be defined as follows.

• Let I and O be the sets of input-port and output-port variables of the program, respectively.
• Each node v in V corresponds to a point in the k-dimensional spatial subspace. v is represented by the vector $(v_1, v_2, \ldots, v_k)$.
• Let points in the time dimension correspond to the discrete clock T.
• Each input-port variable i is defined by a conditional with $|V|$ cases, each of which corresponds to a node v. The condition of the case corresponding to v, represented by $(v_1, v_2, \ldots, v_k)$, is defined as

index(1) eq v_1 and index(2) eq v_2 and ... and index(k) eq v_k

Let $(u, o, c, m, i, v) \in E$ and $n = u - v$, where $u - v$ is the vector difference of the vector representations of u and v. If u is an interior node, the consequent of the case is defined by

if index_time < m then c else prev_time(m, switch(o)) fi;

where switch(o) is the composition of k generic intensional operators on o, each of which is either next(j)(n_j, x) if $n_j \ge 0$ or prev(j)(n_j, x) otherwise, where $1 \le j \le k$. If u is a host, the consequent is defined by a free variable.

• Let F be the set of functions that define the functionalities of output ports in O. Each output-port variable o is defined by

o = f'_o(I_o);

where $f'_o$ is a pointwise function that has the same semantics as the function $f_o \in F$ associated with o, and $I_o \subseteq I$.

According to the semantics of mLucid, in the program defined above, the value of each input-port or output-port variable at a space point which corresponds to a node in the GSN is the solution of the corresponding input or output port of that node.

A systolic array SA = (V,E,?) is semantically equivalent to a SysLucid program P with the following further syntactic constraint, which corresponds to the constraints in the definition of SA. That is, the defining expression of each input-port variable i consists of a single case

if index_time < m then c else prev_time(m, switch(o)) fi;

and the offset argument l of each of the generic intensional operations next(d)(l,x) and prev(d)(l,x) in switch(o) is constant.

The following is a SysLucid program that specifies the systolic array in Figure 1. In the program, result is a result variable. Ain, Bin, and Cin are input-port variables. Aout, Bout, and Cout are output-port variables.

dimensionality A: {time,1}; B: {time,2}; n: {};
result
where
  result = asa_time(Cout, iseod(Aout));
  Ain = fby_time(0, fby(1)(A, Aout));
  Bin = fby_time(0, fby(2)(B, Bout));
  Cin = fby_time(0, Cout);
  Aout = Ain;
  Bout = Bin;
  Cout = Ain * Bin + Cin;
end
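The correspondence between a GSN and the solution of its port equations can also be sketched outside SysLucid. The following Python fragment is a small evaluator for a GSN given as a list of 6-tuple edges; it is an illustrative sketch, and the function `solve_gsn`, the separate `host_streams` table for hosts, and the two-node example network are all assumptions introduced here.

```python
from functools import lru_cache

def solve_gsn(edges, out_funcs, host_streams):
    """Return value(node, port, t), the solution of the GSN port equations.

    edges:        list of 6-tuples (p, o, c, m, i, q)
    out_funcs:    {(node, out_port): (f, input_port_names)} for interior PEs
    host_streams: {(node, out_port): function of t} for hosts (assumed form)
    """
    # Each input port i at node q is fed by exactly one edge.
    feeds = {(q, i): (p, o, c, m) for (p, o, c, m, i, q) in edges}

    @lru_cache(maxsize=None)
    def value(node, port, t):
        if (node, port) in feeds:                 # input port: delayed edge
            p, o, c, m = feeds[(node, port)]
            return c if t < m else value(p, o, t - m)
        if (node, port) in host_streams:          # host output: data stream
            return host_streams[(node, port)](t)
        f, inputs = out_funcs[(node, port)]       # PE output: same PE, same time
        return f(*(value(node, i, t) for i in inputs))

    return value

# Hypothetical network: host h feeds PE v, and v accumulates a running sum.
edges = [
    ("h", "Xout", 0, 1, "Xin", "v"),
    ("v", "Sout", 0, 1, "Sin", "v"),
]
out_funcs = {("v", "Sout"): (lambda x, s: x + s, ("Xin", "Sin"))}
host_streams = {("h", "Xout"): lambda t: t + 1}

value = solve_gsn(edges, out_funcs, host_streams)
print([value("v", "Sout", t) for t in range(5)])  # [0, 1, 3, 6, 10]
```

The self-loop on Sout plays the same role as the (Cout, 0, 1, Cin) edge in the matrix multiplication array: it carries the internal state of the PE from one time point to the next.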

4.4 Specification of parallelism and communications

Context parallelism and intensional communication in SysLucid programs specify the parallel computation and communication behaviors of the corresponding systolic arrays. Parallel processing in systolic arrays combines two types of parallelism at the architecture level: temporal parallelism and spatial parallelism.

In a network of PEs, temporal parallelism, also called pipeline parallelism, can be stated as follows. An output port of each PE in the network produces a sequence of results, i.e. a data stream. The ith data item produced by an output port $o_1$ at $PE_1$ at time point t is passed to $PE_1$'s neighboring processor $PE_2$. Using the received input, an output port $o_2$ of $PE_2$ produces its ith data item at the next time point t + 1; and at the same time (t + 1) $o_1$ at $PE_1$ produces its (i + 1)th item. For example, in the matrix multiplication systolic array (Figure 1), the PEs in the same horizontal or vertical line have pipeline parallelism.

On the other hand, spatial parallelism means that at a time point t the output ports of two or more PEs in the network can compute their ith items in parallel. For example, in the matrix multiplication systolic array, the PEs in each diagonal have spatial parallelism.

Context parallelism in a SysLucid program specifies both pipeline parallelism and spatial parallelism. The pipeline parallelism is specified by the context parallelism between two contexts (t, s) and (t', s'), where s and s' represent two neighboring PEs and t and t' represent two time points, such that in the context dependency graph (t', s') depends on (t - k, s) for some k > 0. The spatial parallelism in the systolic array is specified by the context parallelism between two contexts s and s' in the spatial subspace, representing PEs at all time points. For example, Figure 2 shows the context dependency graph of the matrix multiplication SysLucid program.

[Figure 2: The context dependency graph of a systolic matrix multiplication program. Recoverable labels: "Pipe Line Parallelism", "Spatial Parallelism", and the marks 0, 1, 2.]

In the graph, there are two types of context parallelism. One is between a node at context (t, v) and other nodes at contexts with time points $t' \le t$ and space points that are v's neighbors. The other is among nodes that do not have the temporal dependency relation, that is, the context parallelism among nodes (t, v) and (t', v') for all t and t'.

In accordance with pipeline parallelism and spatial parallelism, we may also classify intensional communication in SysLucid programs into three categories: temporal communication, pipeline communication, and spatial communication.
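The two kinds of parallelism can be made concrete by tabulating which stream item each PE of the matrix multiplication array works on at a given time. The Python sketch below assumes unit delays on all edges and input streams skewed so that PE (i, j) handles item k at time t = i + j + k; this schedule and the names `item_index` and `wavefront` are assumptions for illustration and are not given in the text.

```python
def item_index(i, j, t):
    """Index k of the item handled by PE (i, j) at time t, or None if still idle.

    Assumes unit delays and skewed inputs, so that k = t - i - j.
    """
    k = t - i - j
    return k if k >= 0 else None

def wavefront(n, t):
    """Group the PEs of an n x n array by the item index they handle at time t."""
    groups = {}
    for i in range(n):
        for j in range(n):
            k = item_index(i, j, t)
            if k is not None:
                groups.setdefault(k, []).append((i, j))
    return groups

print(wavefront(3, 2))
# {2: [(0, 0)], 1: [(0, 1), (1, 0)], 0: [(0, 2), (1, 1), (2, 0)]}
```

PEs listed under the same key lie on one diagonal and handle the same item index at the same time, which is the spatial parallelism described above; neighboring PEs along a row or column appear under consecutive keys, which is the pipeline parallelism.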

Temporal communication in a SysLucid program involves intensional operations that switch context only in the time dimension. Temporal communication specifies the transition between two internal states of a PE in the array, each of which corresponds to a time point. For example, in the systolic matrix multiplication program, there is temporal communication, at the same space point that represents a PE, between the value of variable Cout at time point t and that of variable Cin at time point t + 1.

Spatial communication in a SysLucid program involves intensional operations that switch context only in the space dimensions. Spatial communication specifies the communication between two directly connected PEs in the array without delay.

Pipeline communication in a SysLucid program involves intensional operations that switch context in both the time dimension and some of the space dimensions. It specifies the communication between two directly connected PEs in the array with delay. For example, in the systolic matrix multiplication program, there is pipeline communication between the value of variable Aout at a space point representing a PE at time point t and that of variable Ain at the PE's right neighboring point at time point t + 1.
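The three categories can be summarized by looking at which coordinates of the context an intensional operation changes. The following Python sketch classifies a context displacement accordingly; the function `classify` and its encoding of a displacement as a time offset `dt` plus a tuple of space offsets `ds` are assumptions made for illustration.

```python
def classify(dt, ds):
    """Classify a context switch by the dimensions it moves in.

    dt: offset in the time dimension (dimension 0)
    ds: tuple of offsets in the space dimensions
    """
    moves_time = dt != 0
    moves_space = any(d != 0 for d in ds)
    if moves_time and moves_space:
        return "pipeline"
    if moves_time:
        return "temporal"
    if moves_space:
        return "spatial"
    return "none"   # a pointwise operation: no context switch at all

# Cin at time t + 1 reads Cout at time t on the same PE.
print(classify(-1, (0, 0)))   # temporal
# Ain at time t + 1 reads Aout at time t on the neighboring PE.
print(classify(-1, (0, -1)))  # pipeline
# A read from a neighboring PE at the same time point.
print(classify(0, (1, 0)))    # spatial
```

Which space offset corresponds to the "right neighbor" depends on how the grid indices are laid out, so the (0, -1) displacement above is one possible encoding rather than the paper's.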

5 Related work

The programming language Crystal [CCL91] is another functional language originally designed for systolic array specification. In Crystal, a program consists of a system of recursion equations similar to those of other functional languages, but it is interpreted differently from the traditional functional interpretation. In Crystal, an operational interpretation is given to the equations, and parallelism similar to context parallelism in SysLucid is introduced. Instead of being considered as the actual parameters of a function application such as C(i,j), the pair (i,j) is considered as an index pair for a process in an ensemble of parallel processes. Each pair (i,j) in the set A corresponds to a process, and A is called a process structure.

A major difference between Crystal and SysLucid is their mathematical semantics. The semantics of Crystal is extensional, while that of SysLucid is intensional. In Crystal, since indices of dimensions appear explicitly as parameters in function definitions, functions have extensionality; that is, the value of a function application corresponds to an explicit context. By contrast, in SysLucid, since contexts are implicit, functions have intensionality; that is, the value of a function application, as an intension, depends on the implicit context without referring to indices explicitly.

6 Concluding remarks

Intensional programming languages enrich conventional functional languages with intensionality. SysLucid, as a specialization of intensional languages for systolic array programming, can be used to specify the parallel computations and communications of systolic arrays. In SysLucid programs, context parallelism represents both the pipeline parallelism and the spatial parallelism in the corresponding systolic arrays, and intensional communication specifies the connectivity of the PEs in the arrays and the synchronization among computations of neighboring PEs.

Future work along this direction includes a generalization of systolic programming, namely multidimensional dataflow programming, which allows multiple data streams to flow into and out of a multidimensional dataflow network. In multidimensional dataflow programming, the definitions of variables at a space point specify a dataflow subnetwork; temporal operations in the definitions specify the stream operations in the subnetwork; spatial operations in the definitions specify communication between the subnetwork and others at different space points.

References

[CCL91] M. Chen, Y. Choo, and J. Li. Crystal: Theory and pragmatics of generating efficient parallel code. In Parallel Functional Languages and Compilers, pages 255-308. ACM Press, 1991.
[Du91a] W. Du. Indexical Parallel Programming. PhD thesis, Department of Computer Science, University of Victoria, B.C., Canada, 1991.
[Du91b] W. Du. Two indexical parallel programming techniques. In The 1991 International Symposium on Lucid and Intensional Programming, pages 9-18, April 1991.
[Du93] W. Du. Context parallelism in an indexical programming language. In Proceedings of 5th International Conference on Computing and Information. IEEE CS Press, May 1993.
[KL78] H.T. Kung and C.E. Leiserson. Systolic arrays (for VLSI). In Sparse Matrix Symposium, pages 256-282, 1978.
[Kun82] H.T. Kung. Why systolic architectures? IEEE Computer, 15(1):37-46, Jan. 1982.
[Kun87] S.Y. Kung. VLSI Array Processors. Prentice Hall, 1987.
[Lei83] C.E. Leiserson. Area-Efficient VLSI Computation. Addison Wesley, 1983.
[MR84] R.G. Melhem and W.C. Rheinboldt. A mathematical model for verification of systolic networks. SIAM J. Comput., 13(3):541-565, Aug. 1984.
[PS88] S. Purushothaman and P.A. Subrahmanyam. Reasoning about systolic algorithms. Journal of Parallel and Distributed Computing, 5:669-699, 1988.