The Do! project: distributed programming using Java

Pascale Launay and Jean-Louis Pazat
IRISA, Campus de Beaulieu, F35042 RENNES cedex

http://www.irisa.fr/caps/PROJECTS/Do/

Abstract. The aim of the Do! project is to ease the task of programming distributed applications using Java. In a first step, the programmer writes his application as a shared-memory parallel program, by defining the components of the program; in a second step, the programmer defines the component locations on distinct hosts; in a third step, our preprocessor transforms the shared-memory parallel program into a distributed-memory program. This paper is an overview of the Do! project.

1 Introduction - overview of the Do! project

Writing distributed applications is known to be difficult; many problems have to be taken into account at the same time, such as the location of and access to data, and the details of network protocols. Implementing communications between objects located on distinct hosts is eased in Java by using the Socket class or the Java Remote Method Invocation (RMI) package [8]. However, object locations still have to be taken into account, and access to objects is not transparent with respect to whether they are distant or local. We think that there is a need for higher-level tools to ease the programming of distributed applications in Java.

The Do! environment (figure 1) provides an easy way to write distributed applications in Java; it comprises a parallel framework (1) [9] used to write parallel programs, a distributed framework to express distributed programs, a runtime and a preprocessor (DoT) to transform parallel programs into distributed programs. The frameworks and the runtime are composed of standard Java class libraries, without any extension to the language; our system does not make any byte-code transformation. Therefore Do! parallel and distributed programs can run using any standard Java environment.

[Figure 1: diagram. Labels: User-defined components; Do! environment: Preprocessor (DoT), Parallel framework, Distributed framework, Do! runtime; javac; rmic; JVM; Parallel program; Distributed program]

Fig. 1. The Do! environment

(1) The framework approach is introduced in section 1.2.

1.1 Writing a distributed application

In a first step, the programmer writes an application as a shared-memory parallel program without bothering with the component locations. He/she defines the components of the program: tasks and data objects. The Do! parallel framework manages the cooperation scheme between components: the programmer describes a task execution model and accesses to data by choosing a class in the framework library.

In a second step, the programmer defines the component locations on distinct hosts by choosing appropriate layout managers. Layout managers are objects included in the parallel framework to describe the object distribution policy. This allows us to automatically generate the distributed program from the parallel program. In the distributed framework, layout managers are used to implement the object distribution policy.

In a third step, the DoT preprocessor transforms the shared-memory parallel program into a distributed-memory program. Because a program is composed of framework classes and user-defined classes, the program transformations consist of changing the framework used in the program (the parallel framework is replaced by the distributed framework) and transforming the components (user-defined classes) in order to make object locations transparent (mapping and remote accesses).

1.2 Programming with frameworks

New programming models are usually defined through the extension of existing languages or the definition of new languages. We use the framework approach, which is an alternative way to define programming models without extending a language. Frameworks are especially suitable for object-oriented languages; they can be seen as a model of a program, or a program with "holes", where components are missing and have to be defined by the programmer to fit his application in. In the library approach, the programmer writes the program structure and uses library classes to implement some specific computations. In the framework approach, by contrast, the program structure is defined by the framework, and the programmer provides the components implementing the program's specific computations (figure 2).

[Figure 2: diagram. Labels: framework; user-defined components; user-defined program; library components]

Fig. 2. The framework and the library approaches
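The contrast between the two approaches can be sketched in plain Java. This is only an illustration under our own assumptions: the names FrameworkVsLibrary, Component, runFramework and librarySquare are hypothetical and do not belong to the Do! libraries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/* Hypothetical illustration; none of these names belong to the Do! libraries. */
final class FrameworkVsLibrary {

    /* The "hole": a component that the programmer must supply. */
    interface Component {
        int compute(int input);
    }

    /* Framework approach: the framework owns the program structure
       (here, a loop over the inputs) and calls the user component. */
    static List<Integer> runFramework(List<Integer> inputs, Component userComponent) {
        List<Integer> results = new ArrayList<>();
        for (int input : inputs) {
            results.add(userComponent.compute(input));
        }
        return results;
    }

    /* Library approach: a library component that user code may call. */
    static int librarySquare(int x) {
        return x * x;
    }

    public static void main(String[] args) {
        List<Integer> inputs = Arrays.asList(1, 2, 3);

        /* Library approach: the user writes the loop and calls the library. */
        List<Integer> viaLibrary = new ArrayList<>();
        for (int input : inputs) {
            viaLibrary.add(librarySquare(input));
        }

        /* Framework approach: the user only fills the hole. */
        List<Integer> viaFramework = runFramework(inputs, x -> x * x);

        System.out.println(viaLibrary.equals(viaFramework));
    }
}
```

Both variants compute the same values; what changes is who owns the control structure. Because the framework owns it, the framework can be swapped for another one without touching the user component, which is the property the third step of section 1.1 relies on.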

2 Parallel framework

The Do! parallel framework is used to express parallel programs. In this paper, we call "parallel program" a program consisting of several control flows sharing a single address space (for example, running on a shared-memory multiprocessor). Basic constructs for parallelism already exist in Java at the language level (the Thread class), but we think that there is a need for a more structured parallel programming model in Java. The parallel framework is based on the notions of active objects (tasks) (section 2.1), used to introduce parallelism, and collections (section 2.2), used to structure parallelism.

2.1 Active objects (tasks)

An active object is the integration of an object and a process. An active object has its own activity: in a sequential object-oriented program, the control flow runs successively in distinct objects of the program; using active objects, several control flows run concurrently in distinct objects, leading to inter-object concurrency. When active objects access other objects of the program, they can communicate (through shared objects), leading to intra-object concurrency when concurrent control flows run into a single shared object. A task (active object) activation can be synchronous (the control returns to the "activator" after the task termination) or asynchronous (the control returns when the task is activated). Synchronization with a task can be done by waiting for the task termination or by stopping the task execution.
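The two activation modes can be sketched on top of java.lang.Thread. This is a minimal sketch under our own assumptions, not the Do! Task class: the names ActiveObject, activateSync, activateAsync, awaitTermination and Incrementer are hypothetical, and only synchronization by waiting for termination is shown.

```java
import java.util.concurrent.atomic.AtomicInteger;

/* Hypothetical sketch built on java.lang.Thread; not the Do! Task class. */
final class ActiveObjectSketch {

    /* An active object: an object that carries its own control flow. */
    abstract static class ActiveObject {
        private Thread thread;

        /* The task behavior, supplied by the programmer. */
        protected abstract void run(Object param);

        /* Asynchronous activation: control returns once the task is started. */
        public synchronized void activateAsync(final Object param) {
            thread = new Thread(() -> run(param));
            thread.start();
        }

        /* Synchronization by waiting for the task termination. */
        public void awaitTermination() {
            Thread t;
            synchronized (this) {
                t = thread;
            }
            if (t == null) {
                return; /* never activated */
            }
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /* Synchronous activation: control returns after the task terminates. */
        public void activateSync(Object param) {
            activateAsync(param);
            awaitTermination();
        }
    }

    /* A task that increments a counter shared with other tasks. */
    static class Incrementer extends ActiveObject {
        @Override
        protected void run(Object param) {
            ((AtomicInteger) param).incrementAndGet();
        }
    }

    public static void main(String[] args) {
        AtomicInteger shared = new AtomicInteger();
        Incrementer a = new Incrementer();
        Incrementer b = new Incrementer();
        a.activateAsync(shared); /* both tasks run concurrently ... */
        b.activateAsync(shared); /* ... and meet in one shared object */
        a.awaitTermination();
        b.awaitTermination();
        System.out.println(shared.get());
    }
}
```

The shared AtomicInteger plays the role of the shared object in which concurrent control flows meet; an atomic type is used so that the intra-object concurrency is safe without extra locking.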

2.2 Collections

Collections are structured sets of objects such as lists, arrays and trees. Collections are used in the parallel framework to store both passive and active objects and thus to express structured parallelism. The programmer groups parallel tasks into a "task collection" and data into a "data collection". The parallel framework implements a programming model where the tasks grouped in a collection run in parallel, with corresponding data items of a data collection as arguments.

We have introduced the class Par as a new class to implement a parallel construct without any extension to the Java language. The class Par implements both the parallel activation of the tasks stored in the task collection with the corresponding data items in the data collection and the synchronization with the parallel task terminations. Nested parallelism is introduced by the fact that all instances of the class Par are active objects (tasks), and can be inserted in a task collection and activated in parallel with other tasks.

Using this technique, several parallel programming models can be implemented as framework library classes or user-defined classes. For example, from a data collection and a single task, the DataPar class can be implemented, defining a data-parallel programming model: a copy of the task is activated in parallel for each data item of the data collection. From a task collection and a single data item, the SharedPar class can be implemented, defining a parallel model where all tasks communicate through the single shared data item.
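The behavior described for the class Par can be sketched as follows. This is an assumption-laden illustration and not the Do! implementation: the Task interface, the constructor signature and the helper sumInParallel are hypothetical, and java.util.List stands in for the framework's collection classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/* Hypothetical sketch; the actual Do! Par class may differ. */
final class ParSketch {

    /* A task receives one data item as its argument. */
    interface Task {
        void run(Object param);
    }

    /* Runs task i on data item i, all in parallel, then waits for all of them.
       Par is itself a Task, so it can be stored in a task collection (nesting). */
    static final class Par implements Task {
        private final List<? extends Task> tasks;
        private final List<?> data;

        Par(List<? extends Task> tasks, List<?> data) {
            if (tasks.size() != data.size()) {
                throw new IllegalArgumentException("collections must have the same size");
            }
            this.tasks = tasks;
            this.data = data;
        }

        /* The parameter is unused here: each inner task gets its own data item. */
        @Override
        public void run(Object ignored) {
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < tasks.size(); i++) {
                final Task task = tasks.get(i);
                final Object item = data.get(i);
                Thread t = new Thread(() -> task.run(item));
                threads.add(t);
                t.start(); /* parallel activation */
            }
            for (Thread t : threads) { /* synchronization with the terminations */
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }

    /* Helper: one task per item, each adding its data item to a shared total. */
    static int sumInParallel(List<Integer> items) {
        AtomicInteger total = new AtomicInteger();
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            tasks.add(param -> total.addAndGet((Integer) param));
        }
        new Par(tasks, items).run(null);
        return total.get();
    }

    public static void main(String[] args) {
        System.out.println(sumInParallel(Arrays.asList(1, 2, 3, 4)));
    }
}
```

Since Par implements Task, an instance can be placed in the task collection of another Par, which gives the nested parallelism described above. A DataPar or SharedPar variant would differ only in how the task and data arguments are paired.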

2.3 Layout managers

The parallel framework can be used to write shared-memory parallel programs; but our aim is to generate distributed programs, according to user information about the component locations provided by layout managers. Layout managers are objects included in the parallel framework to describe the collection distribution: the programmer indicates how to distribute the program components (tasks and data) by choosing a layout manager for each collection. Layout managers have no effect in a parallel program, but are used in the distributed framework (section 3.2).
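To make the idea of a distribution policy concrete, one possible shape for a layout manager is a mapping from an element index to a host number. The sketch below is purely hypothetical: this section does not show the Do! layout manager classes, so the interface LayoutManager, the method hostOf and the two policies (block and cyclic) are our assumptions.

```java
/* Hypothetical sketch; the Do! layout manager classes are not shown in this section. */
final class LayoutSketch {

    /* Maps the index of a collection element to the number of the host owning it. */
    interface LayoutManager {
        int hostOf(int index, int collectionSize, int hostCount);
    }

    /* Block policy: contiguous runs of elements are placed on the same host. */
    static final class BlockLayout implements LayoutManager {
        @Override
        public int hostOf(int index, int collectionSize, int hostCount) {
            int blockSize = (collectionSize + hostCount - 1) / hostCount; /* ceiling division */
            return index / blockSize;
        }
    }

    /* Cyclic policy: elements are dealt to the hosts in round-robin order. */
    static final class CyclicLayout implements LayoutManager {
        @Override
        public int hostOf(int index, int collectionSize, int hostCount) {
            return index % hostCount;
        }
    }

    /* Lists the owning host of every element, separated by spaces. */
    static String describe(LayoutManager layout, int size, int hosts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < size; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(layout.hostOf(i, size, hosts));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(new BlockLayout(), 6, 3));  /* prints 0 0 1 1 2 2 */
        System.out.println(describe(new CyclicLayout(), 6, 3)); /* prints 0 1 2 0 1 2 */
    }
}
```

In a shared-memory run such an object would simply be ignored, which matches the statement that layout managers have no effect in a parallel program.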

2.4 Example: a simple parallel program

import do.shared.*; /* parallel framework library */

public class MyTask extends Task {
    public void run (Object param) { /* task behavior definition */
        MyData data = (MyData)param;
        data.select(criterion);
        data.print(out);
        data.add(value);
    }
}

public class MyData {
    /* the task parameter */
    public void add(...){ ... }
    public void remove(...){ ... }
    public void print(...){ ... }
    public void select(...){ ... }
}

public class SimpleParallel {
    public static void main (String argv[ ]) {
        /* task and data collection initialization */
        Array tasks = new Array(N);
        Array data = new Array(N);
        for (int i=0; i