Research on Programming Languages for Massively Parallel Processing

Makoto Amamiya (1), Masahiko Satoh (2), Akifumi Makinouchi (1), Ken-ichi Hagiwara (3), Taiichi Yuasa (4), Hitoshi Aida (5), Kazunori Ueda (6), Keijiro Araki (7), Tetsuo Ida (8), Hiroaki Nishikawa (8) and Takanobu Baba (9)

(1) Kyushu University, (2) Tohoku University, (3) Osaka University, (4) Toyohashi University of Technology, (5) University of Tokyo, (6) Waseda University, (7) Nara Institute of Science and Technology, (8) University of Tsukuba, (9) Utsunomiya University

Massively Parallel Language Research Group
"New Information Processing System on Massively Parallel Processing Principle"
Grant-in-Aid for Scientific Research on Priority Areas, Ministry of Education, Science and Culture, Japan

1 Introduction

We are pursuing research on programming languages for massively parallel processing as a research group in the research project "New Information Processing System on Massively Parallel Processing Principle," supported by the Grant-in-Aid for Scientific Research on Priority Areas of the Ministry of Education, Science and Culture, Japan. Following the top-level objective of this Massively Parallel Processing Principle Research Project (abbreviated below as the MPP Project), our research has two objectives:

1. Develop a prototype massively parallel programming language and compiler system that is competitive with commercial language systems such as data parallel C, Fortran D or HPF.

2. Explore a massively parallel computation model, and design an experimental language as an implementation of the newly explored massively parallel computation model.

In the research of the first category, we are developing a practical massively parallel language and its compiler system. This language system will be offered for practical use on the prototype massively parallel machine named JUMP-1, which is under development by another group of the MPP Project. The language, named NCX [1], originates from a SIMD-type parallel language developed by the group at Toyohashi University of Technology, and has been extended to support programming for MIMD machines such as JUMP-1. NCX is designed as an extension of the language C and mainly supports data parallel programming. The features introduced in NCX were examined and discussed in our language research group, and all members of the group contributed to the design of NCX.

In the research of the second category, we investigated various issues concerning the massively parallel computation model, semantics, architecture and language design from the massively parallel processing point of view. In this research, we designed a new MIMD language, the massively parallel language V [2], which has maximally asynchronous computation semantics based on a dataflow computation scheme. The language V is an extended version of the dataflow-based functional language Valid, which has been developed by the group at Kyushu University. The new language V was also designed through discussions within our research group.

2 Language Design

The goal of our research is to develop a language system which incorporates the following two description paradigms:

(a) a flexible data parallel programming paradigm;
(b) a grain-size-free concurrent object-oriented programming paradigm.

However, developing a language that integrates these two paradigms directly is not practical, since the two paradigms are based on quite different computation structures and semantics. Therefore, as the first step, we decided to develop two languages, NCX and V, each of which is more suitable for one of the two programming styles.

2.1 NCX

The design philosophy of NCX is to support the programming of high performance computation in practical applications. The most promising high performance computations are in the area of scientific applications, such as matrix manipulations and the numerical solution of partial differential equations, in which data parallelism is most effective. NCX is designed as an extension of the language C so that the parallel programming features of NCX are suitable for data parallel computations. Several features of NCX are summarized as follows.

(a) Computation Model

The semantics of NCX is developed on the SIMD computation model, since the SIMD computation model is considered more natural as the semantics of data parallel computation. However, for users who consider the MIMD model more natural, the MIMD computation model is also supported, in the sense that the critical synchronization points in the MIMD computation are set equivalent to those of the SIMD computation. For example, in the execution of an if statement or a loop statement, the user may think of the true part and the false part as executed in parallel and synchronized at the exit point of the if statement, or may think of them as executed in serial order; at the language semantics level, both views are equivalent. In the SIMD semantics, although users write programs as if they would run on a virtual SIMD machine, the programs run not only on SIMD machines but also on SPMD and MIMD machines. The NCX compiler generates the most suitable code for the target machine, whether it is SIMD, SPMD or MIMD. The minimum set of barrier synchronization points is extracted at compile time by a synchronization point analysis that takes the type of the target machine into account.
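As an illustration of this equivalence, the following C fragment sketches how a compiler might realize the SIMD view of an if statement on an SPMD target: each processing element executes both branches under a per-element mask, and a single barrier at the exit point is the only critical synchronization. The names (branch_step, barrier) and the masking scheme are illustrative assumptions, not part of the NCX definition.

extern void barrier(void);   /* assumed runtime synchronization primitive */

/* Hypothetical SPMD code for the SIMD-style statement
 *   if (a[i] > 0) b[i] = a[i]; else b[i] = -a[i];
 * Each physical PE owns the elements in [lo, hi). */
void branch_step(double *a, double *b, int lo, int hi)
{
    for (int i = lo; i < hi; i++) {
        int mask = (a[i] > 0.0);        /* condition evaluated per element */
        if (mask)  b[i] =  a[i];        /* "true part"                     */
        else       b[i] = -a[i];        /* "false part"                    */
    }
    barrier();   /* the critical synchronization point at the exit
                    of the if statement, shared by SIMD and MIMD views */
}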

(b) Data Structure and Data Mapping

NCX offers a means of field definition, in which the elements of structured data are operated on in parallel. A field corresponds to a set of virtual processors, which are allocated one to one to the data items that should be operated on in parallel. In the definition of a field, the user can specify a topology of processor connection; mesh, binary tree and hypercube are provided as the basic field topologies. In the course of execution, a field can be switched to another field or have its dimension expanded according to changes of the data structure; such a change of field causes data transfer. Data parallel operations are described locally within a field, and data items in the same field are identified using indices.
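The paper does not specify how the virtual processors of a field are mapped onto physical processors. The following C sketch only illustrates one plausible bookkeeping scheme, in which a field descriptor records the mesh shape and a block distribution decides which physical PE owns a given (i,j) index; the names and the distribution policy are assumptions for illustration.

#include <stdio.h>

/* Illustrative descriptor for a two-dimensional mesh field. */
typedef struct {
    int rows, cols;       /* logical shape of the field                */
    int pe_rows, pe_cols; /* physical PE grid the field is mapped onto */
} field_desc;

/* Block distribution: which physical PE owns virtual processor (i,j)? */
static int owner_pe(const field_desc *f, int i, int j)
{
    int bi = i / ((f->rows + f->pe_rows - 1) / f->pe_rows);   /* block row    */
    int bj = j / ((f->cols + f->pe_cols - 1) / f->pe_cols);   /* block column */
    return bi * f->pe_cols + bj;
}

int main(void)
{
    field_desc matrix = { 8, 8, 2, 2 };   /* an 8x8 field on a 2x2 PE mesh */
    printf("element (5,2) is owned by PE %d\n", owner_pe(&matrix, 5, 2));
    return 0;
}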

(c) Variable Definition and Reference

Variables used in data parallel executions are defined with a field name, which is attached to such variables when they are defined. A variable of the same name is allocated to each virtual processor of the field, and such a variable is referred to using the field index.

(d) View of Parallel Computation

Data parallel computation is programmed by focusing the computation on the current field, that is, the field whose virtual processors are allocated and in the execution phase. All computations on data items of the same field are done synchronously on all virtual processors allocated to the field. Reduction operators, such as summation, product and maximization, are defined as primitives in order to reduce the data items held by the active virtual processors into a single datum in the mono field.
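As a rough analogue of these reduction primitives, the C sketch below sums the values held by the active virtual processors of a field into a single value, with activity represented by a mask array. The sequential formulation and the names are assumptions used only to show the intended semantics, not the NCX implementation.

/* Sum over active virtual processors: a sequential model of a
 * summation reduction into the mono field (a single scalar). */
double reduce_sum(const double *value, const int *active, int n)
{
    double acc = 0.0;
    for (int i = 0; i < n; i++)
        if (active[i])        /* only active virtual processors contribute */
            acc += value[i];
    return acc;               /* single datum in the mono field */
}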

(e) Function Definition and Invocation

Function invocation is done on all active virtual processors; the functions are invoked at the same time, and the number of parameters must be the same on all active virtual processors. Functions are defined in three categories: functions called by a specific field, functions called by any field, and functions called only by a field on a basic field. Field information can be attached to arguments, by which access to field data in the function body is controlled. In order to control parameter passing, a variable descriptor is provided, which describes the binding between formal and actual parameters.

(f) Input and Output

Sequential data I/O, which is done in the mono field, uses the basic I/O facilities offered by the language C. In addition to this conventional sequential I/O, NCX offers parallel I/O facilities, which can be invoked in every field. The parallel I/O interface is supported in two forms: parallel I/O streams and parallel memory mapped I/O. With parallel stream I/O, each virtual processor can directly access its own stream data element, while with parallel memory mapped I/O, virtual processors access data by mapping each I/O file into their own memory space.
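The idea of parallel memory mapped I/O can be pictured with the following POSIX-flavoured C sketch, in which each processing element maps a shared file and works on its own slice. The slice layout and the use of mmap are assumptions for illustration only and say nothing about NCX's actual runtime interface.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the file and return a pointer to the slice owned by PE `pe`
 * (out of `npes`), mirroring the idea of mapping an I/O file into
 * a virtual processor's own memory space. */
double *map_my_slice(const char *path, int pe, int npes, long elems_total)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return NULL;
    double *base = mmap(NULL, elems_total * sizeof(double),
                        PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping remains valid */
    if (base == MAP_FAILED) return NULL;
    return base + pe * (elems_total / npes);   /* this PE's portion */
}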

As an example of NCX, the following program computes an N by N matrix multiplication.

field matrix(N,N) on mesh;
int a, b, c on matrix;

in matrix(i,j) {
    c = 0;
    spawn(k:N)
        c@(i,j) += a@(i,k) * b@(k,j);
}

In this example, the field matrix is defined on the base field mesh, and the variables a, b and c are declared in the field matrix. The result of the multiplication of the N by N matrices a and b is assigned to c. The spawn statement expands the current field from the two-dimensional field matrix to a three-dimensional mesh field. Within the spawn statement, for each virtual processor P_mesh(i,j), a set of virtual processors { P_mesh(i,j,k) | 0 <= k < N } participates in the computation, where the virtual processor P_mesh(i,j,0) is the same as P_mesh(i,j).

The following example defines a general matrix multiplication. This function is invoked from any field by writing, for example, matmul(@arg1, @arg2, @result).

void matmul(a,b,c)
    descriptor int a on field Fa(?I, ?K) on mesh;
    descriptor int b on field Fb( K, ?J) on mesh;
    descriptor int c on field Fc( I,  J) on mesh;
{
    in Fc(i,j) {
        c = 0;
        spawn(k:N)
            c@(i,j) += a@(i,k) * b@(k,j);
    }
}
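For comparison, a plain sequential C version of the same computation is given below; it produces exactly the values that the spawn expression accumulates into c@(i,j). The fixed size N is chosen here only to keep the sketch self-contained.

#define N 4   /* illustrative size; the NCX field is N by N */

/* Sequential meaning of the data parallel NCX examples:
 * c[i][j] = sum over k of a[i][k] * b[k][j].              */
void matmul_seq(int a[N][N], int b[N][N], int c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            c[i][j] = 0;                    /* c = 0 in field matrix(i,j) */
            for (int k = 0; k < N; k++)     /* the spawn(k:N) reduction   */
                c[i][j] += a[i][k] * b[k][j];
        }
}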

2.2 Massively Parallel Language V

The massively parallel language V is an experimental language for a new massively parallel programming paradigm; in a sense, the language V sits at the opposite extreme from NCX. The language V is designed to explore natural MIMD-based programming, grain-size-free parallel object-oriented programming, and a fusion of data parallel and parallel object-oriented programming in a natural way. Our major objective in developing the language V is to prove the feasibility of MIMD-based massively parallel programming through experiments using the language. The research on the language V includes designing a language for massively parallel computation on the basis of dataflow-based MIMD semantics and developing compiler construction techniques for MIMD-based massively parallel programming languages. The major features of the language V are summarized as follows.

(a) Computation Model

The MIMD computation model, on which the semantics of the language V is based, has in principle no notion of memory rewriting or sequential execution control. Users write their programs in terms of the relation between data definition and data reference, and may assume that synchronization and execution scheduling are controlled automatically by the data dependency rule. The compiler of the language V extracts parallel threads and inter-thread synchronization points, and also schedules thread allocation and inter-thread parallel execution, so as to reduce the run time overhead. Although the language V inherits the semantics of a dataflow-based functional language, it introduces mutable data objects, such as array rewriting rules, and offers features for describing history-sensitive objects which have their own local state. Message communication among objects is also integrated into the dataflow computation framework, in which message-driven computation is mechanized as a natural extension of data-driven computation.

(b) Data Parallel Computation

Data parallel computations on array and vector data are described using an apply-all type semi-higher-order function construct; the for-each expression is provided for this data parallel description. A for-each expression is executed by an asynchronous fork-join expansion, and the execution of array and vector data uses I-structure memory. Although all data objects are immutable in the basic semantics, mutable data objects can also be defined and manipulated under the user's responsibility.

(c) Concurrent Object-Oriented Computation

An object which has its own local state and history sensitivity is called an agent. The language V has facilities for defining agents and describing message/data communication between agents. For efficient computation using a set of agents with a regular structure, the language V offers a facility for defining an ensemble of agents and its topology; as a typical ensemble of agents, the notion of an agent array is introduced. The execution of an agent is asynchronous: an agent suspends its execution and waits when it reaches an input point of a message/data item, and on arrival of the message/data the execution resumes automatically according to the message-driven computation rule. In addition to the mechanism of message/data identification and thread triggering by pattern matching, the concept of a port is introduced for efficient message/data communication. Using a port, an agent sends a message/data item directly from its export point to the import point of another agent.

As an example of the language V, the following program defines a function matmul, which computes an n by n matrix multiplication. This program performs a data parallel computation similar to the NCX program.

function matmul(a,b: matrix) return matrix =
    foreach (i,j: integer) in ([1..n],[1..n])
    body sum-of(for k init 1 to n do a[i,k]*b[k,j])


The following example is a concurrent object-oriented program in which an agent uses ports in order to implement efficient message passing.

agent average(val:real)
    {channel in1,in2,in3,in4:real} {export out:real} =
    for (v:real, count:integer, i1,i2,i3,i4:stream)
        init (val,0,in1,in2,in3,in4)
    body
        if count>=M then v
        else {slet put(v,out),
                   next=(head(i1)+head(i2)+head(i3)+head(i4))/4
              in recur(next,count+1,
                       tail(i1),tail(i2),tail(i3),tail(i4))};

field F(average)(val:array_of_real) on torus
do  foreach (i:integer) in ([1..n])
    body foreach (j:integer) in ([1..n])
         body link(ex[1],ch[1]@up,ch[2]@right,ch[3]@down,ch[4]@left);
    foreach (i:integer) in ([1..n])
    body foreach (j:integer) in ([1..n])
         body create(average,val[i,j]);

This program iterates M times the parallel computation of each pixel, which exchanges data with its four neighbors and calculates a new average value of its own. The field definition generates an agent aggregate F which has a torus topology; in the torus structure, the four neighbors are identified by the specifiers up, down, right and left.
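To make the control structure of a single average agent concrete, the following C sketch models it sequentially: the four channels become arrays of length M, put(v,out) on the export port becomes a callback, and the recursion becomes a loop. The names and the sequential framing are assumptions used purely for illustration.

/* Sequential model of one `average` agent iterating M times.
 * in1..in4 hold the values that would arrive on the four channels;
 * emit() stands in for put(v, out) on the export port.            */
double average_agent(double val, int M,
                     const double *in1, const double *in2,
                     const double *in3, const double *in4,
                     void (*emit)(double))
{
    double v = val;
    for (int count = 0; count < M; count++) {
        emit(v);                                   /* put(v, out)          */
        v = (in1[count] + in2[count] +
             in3[count] + in4[count]) / 4.0;       /* next average value   */
    }
    return v;                                      /* value after M steps  */
}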

3 Compilers

The compiler construction of NCX is shown in Figure 1. In the first phase, the SIMD source program of NCX is transformed into virtual machine code. This intermediate code consists of MIMD emulation loops, described in an extended C code for a virtual machine, into which barrier synchronization points are inserted. The next phase translates the intermediate code into the target machine code. In this phase, the following transformations and code generation are performed using target machine information: insertion of synchronization control code; multi-allocation of virtual processors to physical processors; generation of multithreaded code, thread switching control code and communication control code; and code optimization. The generated code is an SPMD program.

The compiler construction of the language V is shown in Figure 2. The virtual machine underlying the intermediate code of the language V is an ensemble of virtual processors. Each virtual processor has an execution unit which supports fine grain split-phase multithreaded computation and data communication control, and a memory mechanism which supports I-structure access. In the second phase, the intermediate code is transformed into target machine code; code generation and code optimization similar to those in the second phase of the NCX compiler are performed in this phase.
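The shape of the NCX intermediate form, an MIMD emulation loop in extended C with inserted barrier synchronization points, might look roughly like the sketch below, where one physical PE hosts several virtual processors. The step functions, the virtual-processor table and barrier() are assumed names; the real intermediate code is not shown in this report.

extern void barrier(void);                 /* assumed runtime primitive */

/* One physical PE emulates several virtual processors: it runs each
 * program step for every virtual processor it hosts, then synchronizes. */
typedef struct { int vp_id; /* per-VP local state would live here */ } vp_state;

void emulation_loop(vp_state *vps, int nvp,
                    void (*const steps[])(vp_state *), int nsteps)
{
    for (int s = 0; s < nsteps; s++) {
        for (int v = 0; v < nvp; v++)      /* multi-allocation of VPs   */
            steps[s](&vps[v]);             /* execute step s on VP v    */
        barrier();                         /* inserted synchronization  */
    }
}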

Figure 1: NCX compiler construction. An NCX program is translated by syntax analysis, synchronization point analysis, emulation loop generation and virtual PE communication generation into virtual machine code (intermediate code) for a virtual MIMD machine (virtual processors, virtual memory); the second phase extracts synchronization points and performs SPMD optimization, PE allocation and PE communication generation for the target machines: an MIMD machine (AP1000), an MIMD machine (JUMP-1) and a SIMD machine.

Figure 2: Compiler construction of the language V. A V program is translated by syntax analysis, dataflow analysis and virtual MIMD code generation (dataflow, multithreaded) into virtual machine code (DVMC) for a virtual machine with multithreading and I-structure virtual memory; the second phase performs object allocation, thread optimization (memory access, communication, synchronization), physical PE allocation and PE communication macro generation for the target machines: an MIMD machine (AP1000) and an MIMD machine (JUMP-1).

4 Conclusions

In this paper, the activities of the research group on massively parallel programming languages have been introduced. In this group, two types of languages are under development. One is the SIMD-type language called NCX, which is oriented toward data parallel execution; the other is the MIMD-type, highly asynchronous language called V, which is oriented toward data parallel and concurrent object-oriented programming. NCX is designed as a data parallel extension of the language C in order to support practical massively parallel computations, especially in the application area of scientific computation. The language V is designed as an experimental language for exploring a new massively parallel programming paradigm.

References

[1] The Massively Parallel C Language NCX: Language Manual (Version 3), Proceedings of the Fourth Symposium on "New Information Processing System on Massively Parallel Processing Principle," Grant-in-Aid for Scientific Research on Priority Areas, Ministry of Education, Science and Culture, Japan, pp. 2.8-2.111, March 14-15, 1994 (in Japanese).

[2] The Massively Parallel V Language, Proceedings of the Fourth Symposium on "New Information Processing System on Massively Parallel Processing Principle," Grant-in-Aid for Scientific Research on Priority Areas, Ministry of Education, Science and Culture, Japan, pp. 2.112-2.190, March 14-15, 1994 (in Japanese).
