MULTIPROCESSOR SCHEDULING ALGORITHMS BASED ON CELLULAR AUTOMATA TRAINING

FRANCISZEK SEREDYNSKI Wyższa Szkoła Umiejętności, Wesola 52, 25-353 Kielce, Poland

PASCAL BOUVRY University of Luxembourg, Luxembourg

ABSTRACT

This article introduces multiprocessor scheduling algorithms based on cellular automata (CAs). To design the CA corresponding to a given program graph, a generic definition of program graph neighborhood is used, which is transparent to the kind, size and shape of the program graph. The CA-based scheduler works in two modes. In learning mode, a genetic algorithm (GA) is applied to discover CA rules suitable for solving instances of a scheduling problem. In operation mode, the discovered rules are able to find automatically an optimal or suboptimal solution of the scheduling problem for any initial allocation of a program graph in a two-processor system graph. The discovered rules are typically suitable for sequential CAs working as a scheduler. Experimental results concerning scheduling algorithms discovered in the context of the CA-based scheduling system are presented.

INTRODUCTION

Multiprocessor scheduling belongs to a special category of computational problems. On the one hand, it is closely related to the practical performance of current and future computers. On the other hand, even when limited to the simplest case considered in this paper, a two-processor system running an arbitrary parallel program, it is a computationally difficult, unsolved research problem known to be NP-complete [2]. Therefore heuristics, based in particular on genetic algorithms (GA), neural networks and simulated annealing, are used effectively today (see, e.g., [7]) to solve scheduling problems. One problem which still remains is the high computational cost of a scheduler. One source of scheduling overhead is that knowledge about the scheduling problem, which could be gained while solving instances of it, is neglected. The motivation of our work is to develop a framework for designing scheduling algorithms in which knowledge about the scheduling process can be extracted and potentially used for solving new instances of the scheduling problem. For this purpose we propose to use a recently emerged and very promising hybrid technique combining evolutionary computation and computation with cellular automata (CA).

The remainder of the paper is organized as follows. The next section presents the scheduling problem. Section 3 gives an overview of CA. Section 4 presents the concept of multiprocessor scheduling with the use of CA. Section 5 contains experimental results concerning sequential CA applied to scheduling and introduces a coevolutionary GA-based engine for discovering a parallel CA scheduler. The last section contains conclusions and discusses future work.

MULTIPROCESSOR SCHEDULING

Both a multiprocessor system and a parallel program are represented by corresponding graphs. A multiprocessor system is represented by an undirected unweighted graph $G_s=(V_s,E_s)$ called a system graph. $V_s$ is the set of $N_s$ nodes representing processors and $E_s$ is the set of edges representing bidirectional channels between processors. A parallel program is represented by a weighted directed acyclic graph $G_p=(V_p,E_p)$, called a precedence task graph or a program graph. $V_p$ is the set of $N_p$ nodes of the graph, representing elementary tasks. The weight $b_k$ of node $k$ describes the processing time needed to execute task $k$ on any processor of the system. $E_p$ is the set of edges of the precedence task graph, describing the communication patterns between the tasks. The weight $a_{kl}$ of the edge $(k,l)$ describes the communication time between tasks $k$ and $l$ when they are located on neighboring processors.

Figure 1 (upper) shows examples of the program graph and the system graph. The program represented by the graph consists of 4 tasks with $b_0=1$, $b_1=b_3=2$, $b_2=4$ (numbers on the left side of nodes), and $a_{01}=a_{02}=a_{13}=1$ (numbers on the left side of edges). The system graph represents a multiprocessor system consisting of two processors, P0 and P1.

The purpose of scheduling is to distribute the tasks among the processors in such a way that the precedence constraints are preserved and the response time T (the total execution time) is minimized. The response time T depends on the allocation of tasks in the multiprocessor topology and on the scheduling policy applied in individual processors: T = f(allocation, scheduling_policy). We assume that the scheduling policy is a user-defined parameter, the same for all processors, whereas the allocation is subject to change by a scheduling algorithm.

CELLULAR AUTOMATA

A one-dimensional CA [6] is a collection of two-state cells arranged in a lattice of length $N$ that interact locally in discrete time $t$, usually in a parallel and synchronous way. For each cell $i$, called the central cell, a neighborhood of radius $r$ is defined. It is assumed that the state $q_i^{t+1}$ of cell $i$ at time $t+1$ depends only on the states of its neighborhood at time $t$, i.e. $q_i^{t+1} = f(q_i^t, q_{i1}^t, q_{i2}^t, \ldots, q_{in}^t)$. The transition function $f$ defines the rule for updating cell $i$. For a binary uniform CA, the length $L_g$ of a rule, which equals the number of neighborhood states, is $L_g=2^n$, where $n=n_i$ is the number of cells in a given neighborhood, and the number of such rules is $2^{L_g}$. For example, for a CA with $r=2$ the neighborhood contains $n=2r+1=5$ cells, so the rule length is $L_g=2^5=32$ and the number of such rules is $2^{32}$; this number grows very fast with $L_g$.
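The rule-length arithmetic and the synchronous update can be illustrated with a minimal Python sketch. It assumes a neighborhood of $2r+1$ cells and cyclic boundary conditions; the function names `rule_length` and `step` and the majority rule used as an example are illustrative choices, not part of the scheduler described in this paper.

```python
from itertools import product

def rule_length(r: int) -> int:
    """Number of neighborhood states of a binary CA with radius r: 2**(2r+1)."""
    return 2 ** (2 * r + 1)

def step(cells: list[int], rule: list[int], r: int) -> list[int]:
    """One synchronous update of a 1D binary CA (cyclic boundaries are an assumption)."""
    n = len(cells)
    new_cells = []
    for i in range(n):
        # Read the neighborhood from left to right as a binary number;
        # that number is the index into the rule table.
        idx = 0
        for k in range(-r, r + 1):
            idx = (idx << 1) | cells[(i + k) % n]
        new_cells.append(rule[idx])
    return new_cells

r = 2
Lg = rule_length(r)      # 32 entries in the rule table
n_rules = 2 ** Lg        # 4294967296 possible rules

# Example rule (illustrative only): a cell becomes 1 if most of its
# neighborhood is 1. product() enumerates neighborhoods in index order.
majority = [1 if sum(bits) > r else 0 for bits in product((0, 1), repeat=2 * r + 1)]

print(Lg, n_rules)
print(step([0, 1, 1, 1, 0, 0, 0, 0], majority, r))
```

Storing the rule as a flat table of $L_g$ bits is what makes it a natural chromosome for a genetic algorithm: every bit string of length $L_g$ is a valid rule.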


A CONCEPT OF CELLULAR AUTOMATA-BASED SCHEDULER

The idea of a CA-based scheduler is presented in Figure 1. An elementary cell of the CA is associated with each task of the program graph from Figure 1 (upper). The initial state of the CA corresponds to an initial allocation of tasks in the two-processor system (Figure 1 (lower-left)). Next, the CA starts to evolve in time according to some rule. Changing the states of the CA cells corresponds to changing the allocation of tasks in the system graph, which results in changing the response time T. The final state of the CA corresponds to a final allocation of tasks in the system (Figure 1 (lower-right)).

To construct the CA-based scheduler one must solve several problems: (a) what the topological structure of the proposed CA should be: linear, as shown in Figure 1 (lower-left), or nonlinear, related in some way to the topological structure of the program graph; (b) what kind of local neighborhood of a program graph is the most appropriate for designing the corresponding CA; and (c) how to find, in a huge space of CA rules, a rule capable of solving the scheduling problem.

In the approach we adopt (see [5]), the structure of the CA is nonlinear and corresponds to the topology of the program graph. Elementary cells are associated with tasks of the program graph, and a neighborhood of each central task is created. The central cell takes only the values 0 or 1, which restricts the scheduling problem to the two-processor topology: state 0 or 1 of a cell means that the corresponding task is allocated to processor P0 or P1, respectively.

The scheduler operates in two modes: a mode of learning CA rules and a mode of normal operation. The purpose of the learning mode is to discover effective rules for scheduling.
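The correspondence between CA states and task allocations can be made concrete with a small Python sketch that evaluates the response time T for the four-task program graph of Figure 1. Two points are assumptions made only for this illustration, since the paper leaves the scheduling policy as a user-defined parameter: each processor executes its tasks in index order, and communication takes time only between tasks placed on different processors. The name `response_time` is hypothetical.

```python
from itertools import product

# Program graph of Figure 1: processing times b_k and communication times a_kl.
B = [1, 2, 4, 2]
A = {(0, 1): 1, (0, 2): 1, (1, 3): 1}

def response_time(state, b=B, a=A):
    """Response time T of the allocation encoded by a binary CA state.

    state[k] is 0 or 1: task k runs on processor P0 or P1.
    ASSUMED policy (not from the paper): each processor runs its tasks in
    index order, which is a topological order for this graph, and a task
    starts as soon as its processor is free and all its inputs have arrived.
    ASSUMED: a_kl is paid only when tasks k and l sit on different processors.
    """
    proc_free = [0, 0]          # time at which P0 / P1 becomes idle
    finish = [0] * len(b)       # finish time of every task
    for k in range(len(b)):
        p = state[k]
        ready = 0               # earliest time all predecessor data is available
        for (u, v), cost in a.items():
            if v == k:
                comm = cost if state[u] != p else 0
                ready = max(ready, finish[u] + comm)
        start = max(ready, proc_free[p])
        finish[k] = start + b[k]
        proc_free[p] = finish[k]
    return max(finish)

all_on_p0 = response_time([0, 0, 0, 0])   # every task on P0
split = response_time([0, 0, 1, 0])       # task 2 moved to P1
best = min(response_time(s) for s in product((0, 1), repeat=4))
print(all_on_p0, split, best)             # prints: 9 6 6
```

Under these assumptions, flipping the state of a single cell (task 2) reduces T from 9 to 6, which is also the minimum over all 16 states. This is the kind of state change a good CA rule is expected to produce.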
The search for rules is conducted with a GA in the following way:

GA to discover CA rules:

    t = 0
    create an initial population P(t) of size npop of rules
    WHILE termination_condition NOT TRUE
    BEGIN
        create a set of size ntest of test problems
        IF hillclimbing_condition TRUE THEN hillclimbing
        FOR i = 1 TO npop
        BEGIN
            T*_i = 0
            FOR j = 1 TO ntest
                T*_i = T*_i + CA(rule_i, test_j, seq/par, CAsteps)
        END
        sort P(t) according to T*_i
        move E of the best individuals from P(t) to P(t+1)
        FOR k = 1 TO npop - E
            REPEAT
                rule1parent = select()
                rule2parent = select() ≠ rule1parent
                (rule1child, rule2child) = crossover(rule1parent, rule2parent)
                mutation(rule1child, rule2child)
            UNTIL Hamming(rule1child, rules) >= H AND Hamming(rule2child, rules) >= H
        t = t + 1
    END
    Problem_solution = the best rules from P(t).
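The structure of the pseudocode above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the hill-climbing step is omitted, parents are drawn from the elite, the Hamming condition is interpreted as a minimum distance to the rules already placed in the new population, and the REPEAT loop is bounded. The callables `ca_run` and `make_tests` are hypothetical placeholders for running the CA on one test problem (returning the final T) and for generating test problems; the usage lines plug in toy stand-ins only to show that the loop runs.

```python
import random

def hamming(x, y):
    """Number of positions at which two equal-length rules differ."""
    return sum(a != b for a, b in zip(x, y))

def discover_rules(ca_run, make_tests, rule_len, npop=20, n_test=5,
                   elite=4, generations=30, h_min=1, p_mut=0.03, seed=0):
    """Evolve binary CA rules; lower summed response time is better."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(rule_len)] for _ in range(npop)]
    history = []                                  # best fitness per generation
    for _ in range(generations):
        tests = make_tests(rng, n_test)           # new test problems each generation
        # Fitness T*_i: sum of the final T over all test problems.
        fitness = {tuple(r): sum(ca_run(r, t) for t in tests) for r in pop}
        pop.sort(key=lambda r: fitness[tuple(r)])
        history.append(fitness[tuple(pop[0])])
        new_pop = [r[:] for r in pop[:elite]]     # elitism: copy the E best rules
        while len(new_pop) < npop:
            for _ in range(100):                  # bounded REPEAT ... UNTIL
                p1, p2 = rng.sample(pop[:elite], 2)   # two different elite members
                cut = rng.randrange(1, rule_len)      # one-point crossover
                c1 = p1[:cut] + p2[cut:]
                c2 = p2[:cut] + p1[cut:]
                for child in (c1, c2):
                    for g in range(rule_len):
                        if rng.random() < p_mut:
                            child[g] ^= 1             # bit-flip mutation
                # Accept children only if they differ enough from kept rules.
                if all(hamming(c, r) >= h_min for c in (c1, c2) for r in new_pop):
                    break
            new_pop.extend([c1, c2])              # after 100 tries, accept anyway
        pop = new_pop[:npop]
    return pop[:elite], history                   # elite of the last evaluated generation

# Toy stand-ins (NOT a scheduler): "response time" is the number of 1s in the rule.
best_rules, history = discover_rules(
    ca_run=lambda rule, test: sum(rule),
    make_tests=lambda rng, n: [None] * n,
    rule_len=16,
)
print(history[0], history[-1])
```

Because the elite is copied unchanged into the next population, the best fitness cannot get worse from one generation to the next when the test problems are held fixed. With freshly drawn test problems, as in the pseudocode, fitness values from different generations are not directly comparable.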

In the learning mode, an initial random population of rules is created. A set of random initial allocations of the program graph into the system graph is also created. The states of the CA are initialized according to a given allocation of the program graph. The CA, equipped with a rule from the population of rules, then runs for a predefined number of time steps. Changing the states of the CA corresponds to changing the allocation of tasks of the program graph. The response time T for the final allocation is evaluated. This evaluation procedure is repeated for each allocation from the set of initial allocations. Finally, the fitness value T* of the rule is computed as the sum of the final values of T corresponding to each initial allocation of tasks. After all rules from the population have been evaluated in this way, genetic operators are applied. Selection with elitism transfers a fraction of the best rules, called the elite, to the new population, which will be processed in the next generation. The remaining part of the population is created by crossover between members of the elite, and mutation is then applied to the new members (children) of the population. A new set of initial allocations is randomly created at the beginning of each generation. The evolutionary process continues for a predefined number of generations.

In the mode of normal operation, when a program graph is randomly allocated, the CA is initialized and equipped with a rule taken from the set of discovered rules. We expect that in this mode, for any initial allocation of tasks of a given program graph, the CA will be able to find, in a finite number of time steps, an allocation of tasks providing an optimal or suboptimal value of T.

EXPERIMENTS

In the experiments reported in this section it is assumed that the CA works asynchronously, i.e. at a given moment of time only one cell updates its state. A number of experiments with program graphs available in the literature have been conducted (for details, see [5]). Figures 2a, 2b and 3 show the results of an experiment with a program graph called g40, consisting of 40 tasks. A population of rules of size 200 was used in the learning mode in this experiment. Figure 2a shows that CA rules providing an optimal schedule with a response time T=80 were found after 160 generations. Figure 2b shows the performance of the found rules evaluated in the operation mode. One can see that the best rules found in the learning mode provide, on average, near-optimal solutions in the operation mode. Figure 3 shows a run of the CA-based scheduler with the best rule found. One can see that the CA reaches a steady state corresponding to an allocation providing the optimal response time T=80 at step 14.

A parallel version of the scheduler, proposed in [5], uses a coevolutionary GA [3] based on a predator-prey paradigm to discover rules. Results of the experimental study show that the coevolutionary GA is able to find rules for parallel CAs which provide significantly faster scheduling than a sequential CA-based scheduler.

CONCLUSIONS

We have presented a novel approach to designing CA-based scheduling algorithms. We used a GA in the learning mode of the scheduler to discover CA scheduling rules. In this mode, knowledge about solving a given instance of the scheduling problem is extracted and coded into CA rules. A number of questions in this area are still open. The most important of them is how to use the knowledge extracted during the learning process and coded into CA rules. Some results of our study show that rules discovered for different instances of the scheduling problem may be ranked according to their ability to solve automatically, in the operation mode, other instances of the scheduling problem. This leads to the concept of an artificial immune system [1] for the scheduling problem [4], in which discovered rules are reused to solve new instances of the scheduling problem.

FIGURES

[Figure 1 graphic: a four-task program graph (tasks 0 to 3) and a two-processor system graph (P0, P1); below, a CA evolving over time from an initial state (initial random task allocation) to a final state (final task allocation).]

Figure 1. An idea of CA-based scheduler: an example of a program graph and a system graph (upper), corresponding CA performing scheduling (lower).

[Figure 2 graphic. Panel (a), "learning CA rules(250) with GA: g40": response time T (avr T) versus generation. Panel (b), "rules(250,g40) > g40": average of T over 100 tests (init T and final T) versus rules.]

Figure 2. Sequential CA-based scheduler for g40: learning mode (a) and operation mode (b).

[Figure 3 graphic: space-time diagram with axis labels "automat" and T; time steps 0 to 25.]

Figure 3. Space-time diagram of CA-based scheduler for g40

REFERENCES

[1] D. Dasgupta (ed.), Artificial Immune Systems and Their Applications, Springer, 1999.
[2] H. El-Rewini and T. G. Lewis, Scheduling Parallel Program Tasks onto Arbitrary Target Machines, Journal of Parallel and Distributed Computing 9, 1990, pp. 138-153.
[3] J. Paredis, Coevolutionary Life-Time Learning, in H.-M. Voigt et al. (eds.), Parallel Problem Solving from Nature -- PPSN IV, LNCS 1141, Springer, 1996, pp. 72-80.
[4] F. Seredynski and A. Swiecicka, Immune-like System Approach to Cellular Automata based Scheduling, in R. Wyrzykowski et al. (eds.), Parallel Processing and Applied Mathematics, LNCS 2328, Springer, pp. 626-633.
[5] F. Seredynski and A. Zomaya, Sequential and Parallel Cellular Automata-based Scheduling Algorithms, IEEE Trans. on Parallel and Distributed Systems, Vol. 13, No. 10, Oct. 2002.
[6] S. Wolfram, Universality and Complexity in Cellular Automata, Physica D 10, 1984, pp. 1-35.
[7] A. Y. Zomaya, C. Ward, and B. Macey, Genetic Scheduling for Parallel Processor Systems: Comparative Studies and Performance Issues, IEEE Trans. on Parallel and Distributed Systems, Vol. 10, No. 8, 1999, pp. 795-812.