Comparison of Simulated GRID Scheduling Algorithms

Penka Martincová, Michal Zábovský
University of Zilina, Faculty of Management Science and Informatics

Abstract: Grid computing has emerged as the next generation of parallel and distributed computing, aggregating dispersed heterogeneous resources to solve a range of large-scale parallel applications in science, engineering and commerce [1]. Grid scheduling is one of the important problems that grid researchers have to resolve. It is a complex problem that aims to map existing tasks to accessible storage and computational resources so that the resources are used effectively. Since the scheduling problem is NP-hard, it is usually not possible to find an optimal solution, and heuristics are used instead. Simulation is an important tool for testing the performance of proposed heuristic solutions. This paper presents simulation results for several scheduling algorithms.

Introduction

In today’s pervasive world, where information is needed anytime and anywhere, Grid Computing environments have proven so significant that they are often referred to as the world’s single most powerful computing solution. The Grid Computing discipline involves the networking services and connections of a potentially unlimited number of ubiquitous computing devices within a grid. As Foster and Kesselman [1] defined it in 1998, “a computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities”. A Grid can be viewed as a seamless, integrated computational and collaborative environment. Users interact with the Grid resource broker to solve problems; the broker in turn performs resource discovery, scheduling, and the processing of application jobs on the distributed Grid resources. We, for example, use grid architectures to solve large optimization problems in transportation, and because of the problem size we are looking for the best scheduling algorithm [3]. Allocating resources automatically to user jobs is an essential part of a grid architecture. In this paper we describe several scheduling algorithms and present the results of experiments executed in simulation.

Grid Scheduling

Schedulers are responsible for the management of jobs: allocating the resources needed for a specific job, partitioning jobs to schedule parallel execution of tasks, data management, event correlation, and service-level management. Schedulers form a hierarchical structure, with meta-schedulers at the root and lower-level schedulers, which provide specific scheduling capabilities, at the leaves.

SYSTÉMOVÁ INTEGRACE 4/2007


The objective of scheduling is to minimize the completion time of a parallel application by properly allocating the tasks to the processors. In a broad sense, the scheduling problem exists in two forms: static and dynamic. In static scheduling, the characteristics of a parallel program (task processing times, communication, data dependencies, etc.) are known before program execution. In dynamic scheduling, the goal includes not only minimizing the program completion time but also minimizing the scheduling overhead, which constitutes a significant portion of the cost paid for running the scheduler.

We have experimented with four algorithms: hill climbing, simulated annealing, Taboo search and a genetic algorithm. The criteria used for evaluating performance were the time needed to prepare the schedule and the execution time. Descriptions of the algorithms follow.

Hill climbing (HC): The algorithm starts with a random schedule. It sequentially makes small changes to the schedule, each time improving it a little. At some point the algorithm can no longer find any improvement, and it terminates. Ideally, the schedule found at that point is close to optimal, but hill climbing is not guaranteed to ever come close to the optimal solution.

1. Initialization: set step k = 0; initial schedule S0 (randomly generated); best schedule Sbest = S0, bestCost = F(S0).
2. Select the next solution: from the set of allowed transitions, choose the one for which the best current solution is reached, and set Sbest = Sk+1 and bestCost = F(Sk+1); continue with step 3. If there is no such transition, terminate.
3. Termination: if the conditions for termination are fulfilled, terminate; else set k = k + 1 and go to step 2.
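The three hill-climbing steps can be illustrated with a minimal Python sketch. The paper does not specify the schedule encoding, the cost function F or the set of allowed transitions, so everything concrete here is an assumption: a schedule is a list mapping each task to a resource index, F is the makespan (hypothetical `makespan`), and a transition moves one task to another resource (hypothetical `neighbours`).

```python
import random


def makespan(schedule, lengths, speeds):
    """Assumed cost F(S): finishing time of the most heavily loaded resource."""
    load = [0.0] * len(speeds)
    for task, res in enumerate(schedule):
        load[res] += lengths[task] / speeds[res]
    return max(load)


def neighbours(schedule, n_resources):
    """Assumed transitions: every schedule that moves exactly one task."""
    for task, current in enumerate(schedule):
        for res in range(n_resources):
            if res != current:
                yield schedule[:task] + [res] + schedule[task + 1:]


def hill_climb(lengths, speeds, max_steps=1000, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial schedule S0, which is also the best so far
    best = [rng.randrange(len(speeds)) for _ in lengths]
    best_cost = makespan(best, lengths, speeds)
    for _ in range(max_steps):  # Step 3: stop after a fixed number of steps
        # Step 2: take the best transition; stop if none improves the cost
        candidate = min(neighbours(best, len(speeds)),
                        key=lambda s: makespan(s, lengths, speeds),
                        default=None)
        if candidate is None:
            break
        candidate_cost = makespan(candidate, lengths, speeds)
        if candidate_cost >= best_cost:
            break
        best, best_cost = candidate, candidate_cost
    return best, best_cost
```

Because only strictly improving transitions are accepted, the search stops at the first schedule whose neighbours are all no better, which is exactly the local-optimum behaviour described for HC.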

Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. By analogy with the physical process of annealing, each step of the SA algorithm replaces the current solution (schedule) by a random "nearby" solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter t (called the temperature), which is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when t is large, but increasingly "downhill" as t goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local minima.

Local search employs the idea that a given solution S may be improved by making small changes. The solutions obtained by modifying solution S are called neighbors of S. The local search algorithm starts with some initial solution and moves from neighbor to neighbor for as long as it can decrease the value of the objective function. The main problem with this strategy is escaping from local minima, where the search cannot find any neighboring solution that decreases the objective function value. Different strategies have been proposed to solve this problem.
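The temperature-dependent acceptance of "uphill" moves can be sketched in Python as follows. This is a conventional Metropolis-style variant, not necessarily the authors' exact procedure. It uses the transition probability p(S, Snew, t) = e^(-(F(Snew) - F(S)) / t) and the cooling rule t = t / (1 + βt) from the paper's step list, with defaults that mirror the settings reported in the Experiments section. Drawing h uniformly from [0, 1), the makespan cost and the one-task move (`makespan`, `move_one_task`) are assumptions made for illustration.

```python
import math
import random


def acceptance_probability(cost_old, cost_new, t):
    """p(S, Snew, t) = exp(-(F(Snew) - F(S)) / t), capped at 1 for non-worsening moves."""
    if cost_new <= cost_old:
        return 1.0
    return math.exp(-(cost_new - cost_old) / t)


def simulated_annealing(initial, cost, random_neighbour,
                        t_max=95.0, t_min=45.0, beta=0.0025,
                        moves_per_temperature=100, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_max
    while t >= t_min:
        for _ in range(moves_per_temperature):
            candidate = random_neighbour(current, rng)
            candidate_cost = cost(candidate)
            h = rng.random()  # assumption: h is uniform on [0, 1)
            if acceptance_probability(current_cost, candidate_cost, t) >= h:
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t = t / (1.0 + beta * t)  # cooling rule t = t / (1 + beta * t)
    return best, best_cost


def makespan(schedule, lengths, speeds):
    """Assumed cost F(S): finishing time of the most heavily loaded resource."""
    load = [0.0] * len(speeds)
    for task, res in enumerate(schedule):
        load[res] += lengths[task] / speeds[res]
    return max(load)


def move_one_task(schedule, n_resources, rng):
    """Assumed neighbourhood: reassign one random task to a random resource."""
    neighbour = list(schedule)
    neighbour[rng.randrange(len(neighbour))] = rng.randrange(n_resources)
    return neighbour
```

With t between 95 and 45 and cost differences of a few time units, the acceptance probability stays close to 1, so under these assumed costs the search behaves almost like a random walk that remembers the best schedule it has visited.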


1. Initialization: step k = 0; initial schedule S0 (randomly generated); Sbest = S0, bestCost = F(S0); initial temperature t = tmax; foundBetter = false (whether the current best solution was changed); r = 0 (number of transitions already searched).
2. Replace the current solution S with a new one, Snew, from the neighborhood, with the probability of transition p(S, Snew, t) = e^(-(F(Snew) - F(S)) / t).
3. Generate a new random value h with normal probability distribution on the interval […] and compare p(S, Snew, t) with h: if p(S, Snew, t) >= h and F(S) > F(Snew), replace the current solution S = Snew, set bestCost = F(Snew), r = 0 and foundBetter = true; else if p(S, Snew, t) < h, set r = r + 1.
4. If r >= the maximum count of allowed transitions without changing the best solution, go to step 5; else go to step 2.
5. If the current temperature t […]; else if foundBetter = false, terminate; else set t = t / (1 + βt) and r = 0 and go to step 2.

Taboo search (TS) […]

[…] F(Snew), or they are solutions accessible from this point using no prohibited transitions.
3. If the set of new solutions is not empty, select Snew with the minimum value of the objective function. If F(Snew) < F(Sbest), update Sbest and bestCost and set r = 0. Go to step 5. If the set is empty, go to step 4.
4. If r >= the maximum count of uses of the aspiration rule, terminate; else set r = r + 1, select Snew with the minimum value of the objective function and go to step 5.
5. Set S = Snew; if the selected transition is not in the list of prohibited transitions, add it to the list; if the count of prohibitions > prohibmax, remove the oldest prohibition. Go to step 2.
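Since part of the Taboo search description is missing, the following Python sketch shows only the textbook form of the technique, not a reconstruction of the authors' steps. It keeps a bounded list of prohibited transitions (the oldest one drops out when the list is full) and applies an aspiration rule that admits a prohibited transition when it beats the best schedule found so far. The schedule encoding, the makespan cost and the one-task move are assumptions, as are all function and parameter names.

```python
from collections import deque


def makespan(schedule, lengths, speeds):
    """Assumed cost F(S): finishing time of the most heavily loaded resource."""
    load = [0.0] * len(speeds)
    for task, res in enumerate(schedule):
        load[res] += lengths[task] / speeds[res]
    return max(load)


def tabu_search(lengths, speeds, prohib_max=5, cycles=50, initial=None):
    n_res = len(speeds)
    current = list(initial) if initial is not None else [0] * len(lengths)
    best, best_cost = list(current), makespan(current, lengths, speeds)
    # Bounded list of prohibited transitions; deque drops the oldest entry itself
    prohibited = deque(maxlen=prohib_max)
    for _ in range(cycles):
        candidates = []
        for task, old_res in enumerate(current):
            for res in range(n_res):
                if res == old_res:
                    continue
                neighbour = current[:task] + [res] + current[task + 1:]
                cost = makespan(neighbour, lengths, speeds)
                # Allowed if not prohibited, or (aspiration) better than the best known
                if (task, res) not in prohibited or cost < best_cost:
                    candidates.append((cost, task, old_res, neighbour))
        if not candidates:
            break
        cost, task, old_res, neighbour = min(candidates, key=lambda c: c[0])
        current = neighbour  # move even if the cost gets worse
        prohibited.append((task, old_res))  # forbid moving the task straight back
        if cost < best_cost:
            best, best_cost = list(neighbour), cost
    return best, best_cost
```

Unlike hill climbing, the search always moves to the best admissible neighbour, even a worse one, and the prohibition list keeps it from immediately undoing that move; this is how it leaves local minima.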




Genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics.

1. Initialization: convert a randomly generated schedule S to a vector V (of binary values) and add it to the input population Pin. Set Vbest = V, bestCost = F(Vbest). Generate X further binary vectors (schedules) and add them to Pin.
2. Copy Pin to the new population Pnew.
3. Make Y mutations on random members of Pin and add the new members to Pnew.
4. Make Z crossovers, each using 2 members of the population Pin; add the new offspring to Pnew.
5. Select the X+1 members of Pnew with minimum cost and create a new population Pin from them.
6. Find the member of Pnew with the minimum value of the objective function; if its cost < bestCost, update Vbest.
7. If the stop condition is fulfilled, terminate; else go to step 2.

GAs are a particular class of evolutionary algorithms (also known as evolutionary computation) that use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover (also called recombination). The most common type of GA works like this: a population of randomly created individuals is formed. The individuals in the population are then evaluated. The evaluation function is provided by the programmer and gives each individual a score based on how well it performs at the given task. Two individuals are then selected based on their fitness: the higher the fitness, the higher the chance of being selected. These individuals then "reproduce" to create one or more offspring, after which the offspring are mutated randomly. This continues until a suitable solution has been found or a certain number of generations have passed, depending on the needs of the solution.
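The seven GA steps listed in this section can be sketched in Python as below. The paper does not say how a schedule is encoded as a binary vector, which mutation and crossover operators are used, or what the stop condition is, so the choices here are assumptions: mutation flips one random bit, crossover is one-point, and the loop stops after a fixed number of populations. In the usage helper `two_resource_makespan` (hypothetical), bit i simply names which of two resources runs task i.

```python
import random


def genetic_search(cost, n_bits, x=9, y=5, z=5, generations=5, seed=0):
    """Minimise cost(vector) over binary vectors; needs n_bits >= 2 and x >= 1."""
    rng = random.Random(seed)
    # Step 1: input population Pin of X + 1 random binary vectors
    p_in = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(x + 1)]
    v_best, best_cost = list(p_in[0]), cost(p_in[0])
    for _ in range(generations):  # Step 7: assumed stop condition
        p_new = [list(v) for v in p_in]  # Step 2: copy Pin to Pnew
        for _ in range(y):  # Step 3: Y mutations (flip one random bit)
            child = list(rng.choice(p_in))
            child[rng.randrange(n_bits)] ^= 1
            p_new.append(child)
        for _ in range(z):  # Step 4: Z one-point crossovers of two members
            a, b = rng.sample(p_in, 2)
            cut = rng.randrange(1, n_bits)
            p_new.append(a[:cut] + b[cut:])
        p_in = sorted(p_new, key=cost)[:x + 1]  # Step 5: keep the X + 1 cheapest
        if cost(p_in[0]) < best_cost:  # Step 6: update the best vector
            v_best, best_cost = list(p_in[0]), cost(p_in[0])
    return v_best, best_cost


def two_resource_makespan(vector, lengths, speeds):
    """Assumed decoding: bit i selects resource 0 or 1 for task i."""
    load = [0.0, 0.0]
    for task, bit in enumerate(vector):
        load[bit] += lengths[task] / speeds[bit]
    return max(load)
```

Because Pnew always contains a copy of Pin, the selection in step 5 is elitist: the best cost in the population never gets worse from one generation to the next.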

Simulation Model

The grid scheduling algorithms were simulated with a model implemented using the Java package GridSim (http://gridbus.cs.mu.oz.au/gridsim). GridSim [4], [5] is a toolkit for modeling and simulation of Grid resources and application scheduling. It provides a comprehensive facility for the simulation of different classes of heterogeneous resources, users, applications, resource brokers, and schedulers. It has facilities for the modeling and simulation of resources and network connectivity with different capabilities, configurations, and domains. These features can be used to simulate resource brokers or Grid schedulers in order to evaluate the performance of scheduling algorithms or heuristics. In GridSim-based simulations, the broker and user entities extend the GridSim class to inherit the ability to communicate with other entities. Application tasks/jobs are modeled as Gridlet objects that contain all the information related to the job and the execution management details, such as job length in MI (million instructions), disk I/O operations, input and output file sizes, and the job originator. The broker uses GridSim’s job management protocols and services to map a Gridlet to a resource and manage it throughout its lifecycle.
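To make the units concrete, the sketch below relates a job length in MI to a processor rating in MIPS. It is plain Python, not GridSim API: the `Resource` class and `run_time` function are hypothetical names, and the model assumes that a gridlet runs alone on a single processor. The three example resources use the machine counts and ratings listed in the Experiments section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    machines: int
    processors_per_machine: int
    mips_per_processor: float

    @property
    def processors(self) -> int:
        return self.machines * self.processors_per_machine

    @property
    def total_mips(self) -> float:
        return self.processors * self.mips_per_processor


def run_time(length_mi: float, resource: Resource) -> float:
    """Time to run one gridlet alone on one processor: MI divided by MIPS."""
    return length_mi / resource.mips_per_processor


# Resource configuration as described in the Experiments section
RESOURCES = [
    Resource("Resource_0", machines=3, processors_per_machine=2, mips_per_processor=50.0),
    Resource("Resource_1", machines=1, processors_per_machine=3, mips_per_processor=20.0),
    Resource("Resource_2", machines=2, processors_per_machine=2, mips_per_processor=100.0),
]
```

Under this simplified model a 1000 MI gridlet would need 20 time units on a Resource_0 processor, 50 on Resource_1 and 10 on Resource_2, which is why the placement of long gridlets dominates the execution time of a schedule.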


Experiments

The scheduling algorithms described in the Grid Scheduling section were implemented and tested using the simulation model. There were 3 users in the model; each of them requested the execution of 100 gridlets of different lengths (in MI). The grid model included 3 resources:

Resource_0, consisting of 3 machines with 2 processors (50 MIPS) each,
Resource_1, consisting of 1 machine with 3 processors (20 MIPS),
Resource_2, consisting of 2 machines with 2 processors (100 MIPS) each.

The simulation was performed on an AMD Sempron 2500+ at 1.7 GHz. The simulated algorithms were set up as follows:

HC: it searches only one solution per cycle; it was repeated 100 times.
SA: initial temperature t = 95, tmin = 45, count of transitions = 100, β = 0.0025.
TS: size of the list of prohibited transitions = 5, 5 cycles.
GA: it creates a population 5 times.

We create gridlets with an access time and a due-date time and simulate their execution using static and dynamic scheduling. A static schedule is created only once, before the gridlets are executed. During the experiments we observe the average time for schedule creation and the average execution time for all gridlets. Experiments with static scheduling were performed first. Figure 1a shows the average time for schedule creation, and the bar diagram in Figure 1b shows the average execution time for all gridlets.

Figure 1 Static scheduling: a) average time for schedule creation; b) average execution time of scheduled gridlets. [Bar charts for GA, HC, SA and TS. Value labels recovered from chart a): 597.93, 238.41, 37.89, 11.87; from chart b): 10.11, 8.80, 8.82, 7.29.]

The second group of experiments used dynamic scheduling. In this case the gridlets arrive in the system one by one, and a new schedule is created each time. Figure 2a shows the average time for schedule creation, and the bar diagram in Figure 2b shows the average execution time for all gridlets.


Figure 2 Dynamic scheduling: a) average time for schedule creation; b) average execution time of scheduled gridlets. [Bar charts for GA, HC, SA and TS. Value labels recovered from chart a): 327.62, 182.16, 24.48, 5.91; from chart b): 6.75, 6.59, 5.87, 5.25.]

Analysis

The experiments show that in both cases the HC algorithm needs the least time for schedule creation. This result was expected because of the simplicity of the algorithm. The longest schedule creation time is needed by the SA algorithm for static scheduling and by GA for dynamic scheduling. Analysis of the execution times shows that if a suitable algorithm for static scheduling is needed, the best choice is the TS algorithm: its creation time is relatively small and, at the same time, the average execution time of its schedule is the lowest. The features of the simulated scheduling algorithms are even more visible in the second group of experiments, which used dynamic scheduling. The HC algorithm again spends the least time on schedule creation, but the execution time of its schedule is the worst. GA needed the longest time for schedule creation but did not create the best schedule. Again TS is the best choice, because it needs little time for schedule creation and gives the best average execution time for the given set of gridlets.

Conclusion

We have simulated four scheduling algorithms for executing a set of gridlets that represent independent tasks with an access time and a due time. As the experiments show, the TS algorithm is the best choice for this case: it needs a relatively short time for schedule creation and gives the best schedule.

References

1. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure, Second edition, Morgan Kaufmann, 2004.
2. Baker, M., Buyya, R., Laforenza, D.: Grids and Grid technologies for wide-area distributed computing, Software: Practice and Experience, John Wiley & Sons, Ltd., 2002.
3. Zábovská, K.: Creation of Valid Shifts for Public Transportation Drivers, Journal of Information, Control and Management Systems, Vol. 5, No. 2/2, EDIS, 2007.
4. Buyya, R., Murshed, M.: GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for Grid computing, Concurrency and Computation: Practice and Experience, 2002.
5. Glover, F.: Future paths for integer programming and links to artificial intelligence, Computers & Operations Research, Vol. 13, pp. 533-549, 1986.
6. Janáček, J.: Optimalizace na dopravních sítích, EDIS, Zilina, Slovakia.
