PERFORMANCE OF SIMULATED GRID SCHEDULING ALGORITHMS

Journal of Information, Control and Management Systems, Vol. 5, (2007), No. 2


Penka MARTINCOVÁ
University of Žilina, Faculty of Management Science and Informatics, Slovak Republic
e-mail: [email protected]

Abstract

Grid computing has emerged as the next generation of parallel and distributed computing; it aggregates dispersed heterogeneous resources to solve a range of large-scale parallel applications in science, engineering and commerce [1]. Grid scheduling is one of the important problems that grid researchers have to resolve. It is a complex problem that aims to map existing tasks to accessible storage and computational resources so that the resources are used effectively. Since the scheduling problem is NP-hard, it is mostly not possible to find an optimal solution, and heuristics are used instead. Simulation is an important tool for testing the performance of the proposed heuristic solutions. This paper presents simulation results for several scheduling algorithms.

Keywords: grid computing, grid scheduling, simulation, Java

1 INTRODUCTION

Grid computing is intended to offer easy and transparent access to remote resources. Its importance is evident from the attention it has gained in research and from its support in industry. The Grid Computing discipline involves the actual networking services and connections of a potentially unlimited number of ubiquitous computing devices within a grid. A Grid can be viewed as a seamless, integrated computational and collaborative environment. Users interact with the Grid resource broker to solve problems; the broker in turn performs resource discovery, scheduling, and the processing of application jobs on the distributed Grid resources. From the end-user point of view, Grids can be used to provide the following types of services.
• Computational services. As Foster & Kesselman [1] defined in 1998, “a computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities”. Computational services are concerned with providing secure services for executing application jobs on distributed computational resources
individually or collectively. Resource brokers provide the services for collective use of distributed resources. A Grid providing computational services is often called a computational Grid.
• Data services. These are concerned with providing secure access to distributed datasets and their management. To provide scalable storage of and access to the datasets, they may be replicated and catalogued, and different datasets may even be stored in different locations to create an illusion of mass storage [8]. The processing of datasets is carried out using computational Grid services, and such a combination is commonly called a data Grid.
• Application services. These are concerned with application management and with providing transparent access to remote software and libraries. Emerging technologies such as Web services [3] are expected to play a leading role in defining application services. They build on the computational and data services provided by the Grid.

The scheduling task of allocating resources automatically to user jobs is an essential part of grid architecture. In this paper we describe several scheduling algorithms and present the results of experiments executed in simulation. The paper is arranged as follows: the second section is a short introduction to grid scheduling problems; the third section describes the scheduling algorithms; the fourth section describes the simulation model and the tools used to simulate the scheduling algorithms; the fifth section presents the experimental results; and the sixth section analyzes them.

2 GRID SCHEDULING

Grid computing is intended to offer easy and transparent access to remote resources. The scheduling task of allocating these resources automatically to user jobs is an essential part of grid architecture. The objective of scheduling is to minimize the completion time of a parallel application [10] by properly allocating the tasks to the processors. In a broad sense, the scheduling problem exists in two forms: static and dynamic. In static scheduling, the characteristics of a parallel program (such as task processing times, communication, data dependencies, and synchronization requirements) are known before program execution. In dynamic scheduling, few assumptions about the parallel program can be made before execution, and thus scheduling decisions have to be made on the fly. The goal of a dynamic scheduling algorithm includes not only the minimization of the program completion time but also the minimization of the scheduling overhead, which constitutes a significant portion of the cost paid for running the scheduler. Schedulers are responsible for the management of jobs, such as allocating the resources needed for any specific job, partitioning jobs to schedule parallel execution of tasks, data management, event correlation, and service-level management capabilities. These schedulers then form a hierarchical structure, with meta-schedulers
that form the root and other lower-level schedulers, providing specific scheduling capabilities, that form the leaves. These schedulers may be constructed with a local scheduler implementation approach for specific job execution, or with another meta-scheduler or a cluster scheduler for parallel executions.

3 SCHEDULING ALGORITHMS

We have experimented with four algorithms: hill climbing, simulated annealing, tabu search and a genetic algorithm. The criteria used for evaluating performance were the time needed to prepare the schedule and the execution time. The algorithms are described below.

• Hill climbing (HC). The algorithm starts with a random schedule. It sequentially makes small changes to the schedule, each time improving it a little. At some point the algorithm can no longer find any improvement, and it terminates. Ideally, the schedule found at that point is close to optimal, but hill climbing is not guaranteed to come close to the optimal solution.
1. Initialization: set step k = 0; initial schedule S0 (randomly generated); best schedule Sbest = S0, bestCost = F(S0).
2. From the set of allowed transitions, select the next solution Sk+1 that improves on the current best solution and set Sbest = Sk+1 and bestCost = F(Sk+1); continue with step 3. If there is no such transition, terminate.
3. Termination: if the conditions for termination are fulfilled, terminate; else set k = k + 1 and go to step 2.

• Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global optimum of a given function in a large search space. By analogy with the physical process of annealing, each step of the SA algorithm replaces the current solution (schedule) by a random "nearby" solution, chosen with a probability that depends on the difference between the corresponding function values and on a global parameter t (called the temperature), which is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when t is large, but increasingly "downhill" as t goes to zero. The allowance for "uphill" moves saves the method from becoming stuck at local minima. Local search employs the idea that a given solution S may be improved by making small changes. The solutions obtained by modifying solution S are called neighbors of S. The local search algorithm starts with some initial solution and moves from neighbor to neighbor as long as possible while decreasing the value of the objective function. The main problem with this
strategy is to escape from local minima, where the search cannot find any neighboring solution that decreases the objective function value. Different strategies have been proposed to solve this problem.
1. Initialization: step k = 0; initial schedule S0 (randomly generated); Sbest = S0, bestCost = F(S0); initial temperature t = tmax; foundBetter = false (records whether the current best solution was changed); r = 0 (the count of already searched transitions).
2. Choose a new solution Snew from the neighborhood of the current solution S, with the probability of transition p(S, Snew, t) = exp(–(F(Snew) – F(S)) / t).
3. Generate a new random value h with normal probability distribution on a given interval and compare p(S, Snew, t) with h:
- if p(S, Snew, t) >= h and F(S) > F(Snew), replace the current solution S = Snew, set bestCost = F(Snew), r = 0 and foundBetter = true;
- else if p(S, Snew, t) < h, set r = r + 1.
4. If r >= the maximum count of allowed transitions without changing the best solution, go to step 5; else go to step 2.
5. If the current temperature t [...]
- else if foundBetter = false, terminate;
- else set t = t / (1 + β t) and r = 0 and go to step 2.

• Tabu search (TS) [...]
2. [...] F(Snew) or they are solutions accessible from this point using no prohibited transitions.
3. If the set of new solutions is
• not empty: select the Snew with the minimum value of the objective function. If F(Snew) < F(Sbest), update Sbest and bestCost and set r = 0. Go to step 5.
• empty: go to step 4.
4. If r >= the maximum count of uses of the aspiration rule, terminate; else set r = r + 1, select the Snew with the minimum value of the objective function and go to step 5.
5. Set S = Snew; if the selected transition is not in the list of prohibited transitions, add it to the list; if the count of prohibitions > prohibmax, remove the oldest prohibition. Go to step 2.

• Genetic algorithm (GA) is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are a particular class of evolutionary algorithms (also known as evolutionary computation) that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover (also called recombination). The most common type of genetic algorithm works as follows. A population of randomly created individuals is formed. The individuals in the population are then evaluated. The evaluation function is provided by the programmer and gives each individual a score based on how well it performs at the given task. Two individuals are then selected based on their fitness: the higher the fitness, the higher the chance of being selected. These individuals then "reproduce" to create one or more offspring, after which the offspring are mutated randomly. This continues until a suitable solution has been found or a certain number of generations have passed, depending on the needs of the solution.
1. Initialization: convert a randomly generated schedule S to a vector V (of binary values) and add it to the input population Pin.
• Set Vbest = V, bestCost = F(Vbest).
• Generate a further X binary vectors that represent schedules and add them to Pin.
2. Copy Pin to the new population Pnew.
3. Make Y mutations on random members of Pin and add the new members to Pnew.
4. Make Z crossovers, each using 2 members of the population Pin; add the new offspring to Pnew.
5. Select the X+1 members of Pnew with minimum cost and create a new population Pin from them.
6. Find the member of Pnew with the minimum value of the objective function; if its cost < bestCost, update Vbest and bestCost.
7. If the stop condition is fulfilled, terminate; else go to step 2.
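The three local-search methods above share the same ingredients: an objective function F, a neighborhood of small changes, and a rule for accepting a move. The following Python sketch illustrates them under several assumptions that are not taken from the paper: a schedule is a list that maps each task to a resource index, F is the makespan, the annealing loop uses the standard Metropolis acceptance rule and stops once t falls to t_min, and the tabu list stores recently visited schedules instead of transitions and omits the aspiration rule. All function names and the sample data are hypothetical.

```python
import math
import random
from collections import deque

def makespan(schedule, lengths, speeds):
    """Objective F(S): finishing time of the most loaded resource."""
    load = [0.0] * len(speeds)
    for task, res in enumerate(schedule):
        load[res] += lengths[task] / speeds[res]
    return max(load)

def neighbors(schedule, n_res):
    """All schedules that differ from `schedule` in one task's resource."""
    for task, old in enumerate(schedule):
        for res in range(n_res):
            if res != old:
                yield schedule[:task] + [res] + schedule[task + 1:]

def hill_climbing(start, lengths, speeds):
    """Move to the best improving neighbor until none exists."""
    best, best_cost = start, makespan(start, lengths, speeds)
    while True:
        cand = min(neighbors(best, len(speeds)),
                   key=lambda s: makespan(s, lengths, speeds))
        cand_cost = makespan(cand, lengths, speeds)
        if cand_cost >= best_cost:       # no improving transition: terminate
            return best, best_cost
        best, best_cost = cand, cand_cost

def simulated_annealing(start, lengths, speeds, rng,
                        t_max=95.0, t_min=45.0, beta=0.0025, tries=100):
    """Metropolis acceptance with the cooling rule t = t / (1 + beta * t)."""
    cur, cur_cost = start, makespan(start, lengths, speeds)
    best, best_cost = cur, cur_cost
    t = t_max
    while t > t_min:                     # assumed stopping rule
        for _ in range(tries):
            new = cur[:]
            new[rng.randrange(len(new))] = rng.randrange(len(speeds))
            new_cost = makespan(new, lengths, speeds)
            # Downhill moves are always taken, uphill moves with p < 1.
            if (new_cost <= cur_cost or
                    rng.random() < math.exp(-(new_cost - cur_cost) / t)):
                cur, cur_cost = new, new_cost
                if cur_cost < best_cost:
                    best, best_cost = cur, cur_cost
        t = t / (1 + beta * t)
    return best, best_cost

def tabu_search(start, lengths, speeds, tabu_size=5, cycles=5):
    """Take the best non-prohibited move each cycle, even if it is uphill."""
    cur = start
    best, best_cost = start, makespan(start, lengths, speeds)
    tabu = deque([tuple(start)], maxlen=tabu_size)  # oldest entry drops out
    for _ in range(cycles):
        moves = [(makespan(s, lengths, speeds), s)
                 for s in neighbors(cur, len(speeds))
                 if tuple(s) not in tabu]
        if not moves:
            break
        cost, cur = min(moves, key=lambda m: m[0])
        tabu.append(tuple(cur))
        if cost < best_cost:
            best, best_cost = cur, cost
    return best, best_cost

rng = random.Random(0)
lengths = [rng.randint(100, 1000) for _ in range(30)]  # made-up lengths (MI)
speeds = [50.0, 20.0, 100.0]                           # made-up ratings (MIPS)
start = [rng.randrange(len(speeds)) for _ in lengths]
start_cost = makespan(start, lengths, speeds)
hc_best, hc_cost = hill_climbing(start, lengths, speeds)
sa_best, sa_cost = simulated_annealing(start, lengths, speeds, rng)
ts_best, ts_cost = tabu_search(start, lengths, speeds)
```

Each function returns the best schedule it has seen together with its cost, so none of them can return a schedule worse than the random start. The hill climbing variant stops at the first schedule with no improving neighbor, which is the local-minimum behaviour that the other two methods are designed to escape.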


4 SIMULATION MODEL

The grid scheduling algorithms were simulated using a simulation model implemented with the Java package GridSim. GridSim [4], [5] is a toolkit for the modeling and simulation of Grid resources and application scheduling. It provides a comprehensive facility for the simulation of different classes of heterogeneous resources, users, applications, resource brokers, and schedulers. It has facilities for the modeling and simulation of resources and network connectivity with different capabilities, configurations, and domains. It supports primitives for application composition, information services for resource discovery, and interfaces for assigning application tasks to resources and managing their execution. These features can be used to simulate resource brokers or Grid schedulers in order to evaluate the performance of scheduling algorithms or heuristics. The resource modeling facilities of the GridSim toolkit are used to simulate worldwide Grid resources managed under time-shared or space-shared scheduling policies. In GridSim-based simulations, the broker and user entities extend the GridSim class to inherit the ability to communicate with other entities. In GridSim, application tasks/jobs are modeled as Gridlet objects that contain all the information related to the job and the execution management details, such as the job length in MI (million instructions), disk I/O operations, input and output file sizes, and the job originator. The broker uses GridSim’s job management protocols and services to map a Gridlet to a resource and manage it throughout its lifecycle.
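The relation between a job length in MI and a processor rating in MIPS can be illustrated with a short Python sketch. The `Gridlet` class below is a hypothetical stand-in, not the GridSim class of the same name, and the job length of 5000 MI is an invented value. The sketch ignores I/O and the time- or space-shared policy of the resource.

```python
from dataclasses import dataclass

@dataclass
class Gridlet:
    """Hypothetical stand-in for a job; not the GridSim Gridlet class."""
    gridlet_id: int
    length_mi: float   # job length in million instructions (MI)

def execution_time(gridlet, mips):
    """Seconds needed on one dedicated processor rated at `mips` MIPS."""
    return gridlet.length_mi / mips

job = Gridlet(gridlet_id=0, length_mi=5000.0)   # invented job length
for rating in (20, 50, 100):
    print(rating, "MIPS:", execution_time(job, rating), "s")
```

The same job therefore takes five times longer on a 20 MIPS processor than on a 100 MIPS one, which is why the mapping of gridlets to resources affects the overall execution time.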

5 EXPERIMENTS

5.1 Experimental Setup

The scheduling algorithms described in section 3 were implemented and tested using the simulation model. There were 3 users in the model, each of whom requested the execution of 100 gridlets of different lengths (in MI). The grid model included 3 resources:
- Resource_0, which consisted of 3 machines with 2 processors (50 MIPS) each,
- Resource_1, which consisted of 1 machine with 3 processors (20 MIPS),
- Resource_2, which consisted of 2 machines with 2 processors (100 MIPS) each.
The simulation was performed on an AMD Sempron 2500+ at 1.7 GHz. The simulated algorithms were set up as follows:
- HC: it searches for only one solution in one cycle; it was repeated 100 times.
- SA: initial temperature t = 95, tmin = 45, count of transitions = 100, β = 0.0025.
- TS: size of the list of prohibited transitions = 5, 5 cycles.
- GA: the population is created 5 times.
We created simple gridlets and simulated their execution using static and dynamic scheduling.
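The figures in this setup can be turned into a few derived quantities. The Python sketch below rests on two assumptions that the text does not state explicitly: the MIPS value is read as a per-processor rating, and the annealing loop is taken to stop once t is no longer above tmin. The variable and function names are hypothetical.

```python
# (machines, processors per machine, MIPS per processor) from the setup above
resources = {
    "Resource_0": (3, 2, 50),
    "Resource_1": (1, 3, 20),
    "Resource_2": (2, 2, 100),
}

# Number of processors and aggregate rating of each resource.
processors = {name: m * p for name, (m, p, _) in resources.items()}
capacity = {name: m * p * mips for name, (m, p, mips) in resources.items()}

# 3 users, each submitting 100 gridlets.
total_gridlets = 3 * 100

def cooling_steps(t, t_min, beta):
    """Count applications of t = t / (1 + beta * t) until t <= t_min."""
    steps = 0
    while t > t_min:
        t = t / (1 + beta * t)
        steps += 1
    return steps

sa_steps = cooling_steps(95.0, 45.0, 0.0025)

print(processors, capacity, total_gridlets, sa_steps)
```

Under these assumptions the model has 13 processors with an aggregate rating of 760 MIPS, and the cooling rule takes 5 steps to bring the temperature from 95 below 45, because each step adds β to 1/t.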


5.2 Experimental Results

A static schedule is created only once, before the gridlets are executed. During the experiments we observed the average time for schedule creation and the average execution time for all gridlets. The experiments with static scheduling were performed first. Figure 1 shows the average time for schedule creation, and the bar diagram in Figure 2 shows the average execution time for all gridlets.

[Bar chart. Categories: GA, SA, HC, TS. Legend: GA – Genetic Algorithm, HC – Hill Climbing, SA – Simulated Annealing, TS – Tabu Search. Bar labels: 2192.2, 1565.5, 186.933, 69.867.]
Figure 1 Static scheduling – average time for schedule creation

[Bar chart. Categories: GA, HC, SA, TS. Legend: GA – Genetic Algorithm, HC – Hill Climbing, SA – Simulated Annealing, TS – Tabu Search. Bar labels: 1163.919, 1159.118, 1152.448, 1145.518.]
Figure 2 Static scheduling – average execution time of scheduled gridlets


The second group of experiments was done using dynamic scheduling. In this case the gridlets arrived in the system one by one, and a new schedule was created each time. Figure 3 shows the average time for schedule creation, and the bar diagram in Figure 4 shows the average execution time for all gridlets.

[Bar chart. Categories: GA, HC, SA, TS. Legend: GA – Genetic Algorithm, HC – Hill Climbing, SA – Simulated Annealing, TS – Tabu Search. Bar labels: 746.458, 589.769, 81.179, 28.136.]
Figure 3 Dynamic scheduling - average time for schedule creation

[Bar chart. Categories: GA, HC, SA, TS. Legend: GA – Genetic Algorithm, HC – Hill Climbing, SA – Simulated Annealing, TS – Tabu Search. Bar labels: 1162.385, 1160.386, 1152.788, 1146.388.]
Figure 4 Dynamic scheduling - average execution time of scheduled gridlets
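The bar labels of Figures 1 to 4 allow a rough numeric comparison even without assigning each value to an algorithm. The Python sketch below assumes only that all labels within one figure share the same unit, which the figures do not state. The helper names are hypothetical.

```python
# Bar labels read from Figures 1 and 3 (schedule creation time).
creation = {
    "static": [2192.2, 1565.5, 186.933, 69.867],
    "dynamic": [746.458, 589.769, 81.179, 28.136],
}
# Bar labels read from Figures 2 and 4 (average execution time).
execution = {
    "static": [1163.919, 1159.118, 1152.448, 1145.518],
    "dynamic": [1162.385, 1160.386, 1152.788, 1146.388],
}

def max_to_min_ratio(values):
    """How many times larger the largest value is than the smallest."""
    return max(values) / min(values)

def relative_spread(values):
    """Gap between largest and smallest value, relative to the smallest."""
    return (max(values) - min(values)) / min(values)

for mode in ("static", "dynamic"):
    print(mode,
          round(max_to_min_ratio(creation[mode]), 1),
          round(100 * relative_spread(execution[mode]), 2))
```

With these values the schedule creation times of the slowest and fastest algorithm differ by a factor of roughly 31 in the static case and roughly 27 in the dynamic case, while the average execution times differ by less than 2 % in both cases.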


6 ANALYSIS

The experiments show that in both cases the hill climbing algorithm needs the least time for schedule creation. This result was expected because of the simplicity of the algorithm. The longest schedule creation time is needed by simulated annealing in static scheduling and by the genetic algorithm in dynamic scheduling. The analysis of the execution times shows that, if a suitable algorithm for static scheduling is needed, the best choice is tabu search: its creation time is relatively small and, at the same time, the average execution time of its schedule is minimal. The features of the simulated scheduling algorithms are even more visible in the second group of experiments, which used dynamic scheduling. Hill climbing again spends the least time on schedule creation, but the execution time of its schedule is the worst. The genetic algorithm needed the longest time for schedule creation, but it did not create the best schedule. Again we find that tabu search is the best choice, because it needs little time for schedule creation and gives the best average execution time for the given set of gridlets.

7 CONCLUSION

We have simulated four scheduling algorithms for executing a set of simple gridlets, which represent independent tasks without precedence or other dependencies. As the experiments show, the tabu search algorithm is the best choice for this case. It needs a relatively small time for schedule creation and gives the best schedule.

REFERENCES

[1] Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure, Second edition, Morgan Kaufmann, 2004.
[2] Baker, M., Buyya, R., Laforenza, D.: Grids and Grid technologies for wide-area distributed computing, Software – Practice and Experience, 2002, John Wiley & Sons, Ltd.
[3] W3C: Web services activity. http://www.w3.org/2002/ws/
[4] Buyya, R., Murshed, M.: GridSim: A toolkit for the modeling and simulation of distributed resource management and scheduling for Grid computing. Concurrency and Computation: Practice and Experience, May 2002.
[5] http://gridbus.cs.mu.oz.au/gridsim/
[6] Glover, F.: Future paths for integer programming and links to artificial intelligence, Computers and Operations Research, Vol. 13, pp. 533-549, 1986.
[7] Janáček, J.: Optimalizace na dopravních sítích, Vyd. ŽU/Edis – vydavateľstvo ŽU, 2002, ISBN 80-8070-031-1.
[8] Zábovský, M.: Adaptívna alokácia databáz v distribuovanom databázovom systéme. Dizertačná práca, 2002, Žilinská univerzita v Žiline.
[9] Bellák, M.: Plánovanie úloh v gride, diplomová práca, ŽU, FRI, 2007.
[10] Varša, P., Varša, P.: Parallel grid implementation for Laplacean equation computation by Gauss-Seidel method, 2nd International Workshop on Grid Computing for Complex Problems, November 2006, Bratislava.

Acknowledgement

This work has been supported by research project MVTS ČR/SR/ŽU4/07.