
Optimised Scheduling of Grid Resources Using Hybrid Evolutionary Algorithms

Wilfried Jakob, Alexander Quinte, Karl-Uwe Stucky, and Wolfgang Süß

Forschungszentrum Karlsruhe GmbH, Institute for Applied Computer Science
P.O. Box 3640, 76021 Karlsruhe, Germany
{wilfried.jakob, quinte, uwe.stucky, wolfgang.suess}@iai.fzk.de

Abstract. This contribution illustrates the necessity of planning and optimising resource allocation in a grid. Requirements to be met by a resource management system are defined. These requirements are comparable with those placed on planning systems in other fields, e.g. production planning systems, for which various methods of optimised planning have already been developed. Evolutionary Algorithms are suitable methods. Based on an example from the field of production planning, the performance of these methods is demonstrated, and their use in the GORBA resource broker is described.

1 Introduction

With the growing acceptance of grid computing, the number of resources in a grid environment and the number of users increase constantly. For the best possible usage of grid resources and most rapid execution, efficient planning of the grid resources is required. This contribution shall illustrate the use and benefits of modern resource management methods in a grid environment. It will be shown that there still is considerable need for the use of optimisation processes in allocation planning. This gap shall be closed by the global optimising resource broker GORBA (Global Optimising Resource Broker and Allocator) that is currently being developed. The concept of GORBA and the underlying optimisation processes shall be outlined.

2 Resource Management in a Grid Environment

A resource management system in a grid environment is responsible for allocating grid resources to waiting requests for resources. In this context, a modern resource management system has to meet the following requirements [1, 2]:

- Quality of Service (QoS)
  o guaranteed resource usage (advance reservations)
  o negotiation of resource usage
  o deadlines (+ malleability)
  o co-allocation for multi-site jobs and complex workflows
  o service level agreements (SLA)
  o different QoS levels
- Reliability / Fault-Tolerant Scheduling
  o failure detection & recovery
- Grid Economics
  o payment and penalties for resource usage, failures, and violated SLAs
  o load balancing in the grid

These requirements are made by both the user and the supplier of grid services. As a rule, each user attaches importance to cheap and rapid execution of the job with guaranteed response times and to the possibility of using and reserving certain resources (QoS). The supplier tries to reach a homogeneous usage of his resources and to serve all users equally well (grid economics). To meet these partly contradictory requirements, the best possible compromise has to be found. Both users and suppliers of grid services additionally ask for reliability and fault tolerance.

Resource management systems can be divided into queuing systems and planning systems [1, 3]. The two differ in the planned time window and in the size of the set of jobs considered. Queuing systems try to allocate the resources available at a certain time to the currently waiting requests for resources. Resource planning for the future for all waiting requests is not done. In contrast to this, planning systems plan for the present and the future, which results in an assignment of start times to all requests. Today, almost all resource management systems belong to the class of queuing systems. Compared with queuing systems, planning systems require more information, such as the duration of execution, the long-term availability of resources, etc. For this reason, queuing systems are usually much easier to implement. However, a queuing system is efficient only when system usage is low. With increased usage, the queuing system reveals considerable weaknesses with respect to the quality of service, resource usage, and execution time of the individual grid jobs.
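The difference between the two classes can be illustrated with a minimal sketch. It assumes a single resource and job durations that are known in advance; the names `Job`, `queue_dispatch`, and `plan` are hypothetical and are not taken from any of the cited systems.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    duration: int  # estimated run time in time units (assumed to be known)


def queue_dispatch(waiting, resource_free):
    """Queuing system: decide only about the present moment.

    Starts the first waiting job if the resource is free and returns it.
    Nothing is said about when the remaining jobs will run.
    """
    if resource_free and waiting:
        return waiting.pop(0)
    return None


def plan(waiting, now=0):
    """Planning system: assign a start time to every waiting job."""
    schedule = {}
    t = now
    for job in waiting:
        schedule[job.name] = t
        t += job.duration
    return schedule


jobs = [Job("a", 3), Job("b", 2), Job("c", 4)]
started = queue_dispatch(list(jobs), resource_free=True)
start_times = plan(jobs)
print(started.name, start_times)
```

The planning variant needs the duration estimates that the queuing variant can do without, which mirrors the additional information demand of planning systems noted above.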
For waiting grid jobs in a queuing system, for instance, no statement can be made about the expected time of job execution. Presently, resource management systems [4] exist in, e.g., the grid systems Unicore [5], Nimrod/G [6], and Condor-G [7]. Unicore allows for the listing of suitable resources, together with the costs and the expected execution time. Condor-G allows for resource finding according to criteria given by the user. In Nimrod/G, conventional optimisation in terms of costs or time or both is possible. Usually, resource management systems in grid environments are mere "resource finding systems" with manual resource allocation. At best, the current job is scheduled optimally while previous planning is maintained (e.g. Nimrod/G).

Automatic resource allocation should be made such that the above requirements on a resource management system, e.g. best possible usage of resources, guaranteed and short response time, etc., are fulfilled. Comparable problems are dealt with in many industrial resource planning tasks, such as production planning. A typical task of production planning is the allocation of alternative processing stations (i.e. resources) to partial jobs in a given order. The main objectives include:

- execution of all jobs within their due dates,
- execution of all jobs as rapidly as possible,
- preferred treatment of rush orders,
- homogeneous resource usage, and
- efficient and fast replanning due to equipment breakdown, cancellation of jobs, or new orders.

Such planning problems are NP-complete, and as no efficient exact solution methods are known for this class of problems, heuristic methods are applied to find appropriate solutions within an acceptable time. The planning problems in a grid environment are very similar to the planning problems described above. They differ in the variable availability of resources and in the difficulty of predicting the duration of execution of individual jobs, which complicates planning and frequently gives rise to replanning.

Our objective is the development of a resource management system for a grid environment which fulfils the above requirements in the best possible manner. It generates allocation plans for the existing resources and the jobs to be executed, which specify when each job is executed on which resource. For the optimisation of the allocation plans, the optimisation tool HyGLEAM (Hybrid General-purpose Evolutionary Algorithm and Method), developed at our institute and tested for a variety of applications, is used [8, 9]. The concept of our global optimising resource broker GORBA is explained below. A similar approach is described in [10].
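As an illustration of what such an allocation plan looks like, the following sketch builds one with a simple greedy heuristic. It is only an example of a heuristic method and is not the procedure used in GORBA or HyGLEAM. The function `greedy_plan`, the job tuples, and the assumption that a job has the same duration on every admissible resource are hypothetical simplifications.

```python
def greedy_plan(jobs, resources):
    """Build an allocation plan: job name -> (resource, start, end).

    jobs: list of (name, duration, allowed_resources, predecessors),
          given in an order in which predecessors come first.
    Assumption: a job takes the same time on every allowed resource.
    """
    free_at = {r: 0 for r in resources}  # time at which each resource is free
    plan = {}
    for name, duration, allowed, preds in jobs:
        # A job may start only after all its predecessors have finished.
        ready = max((plan[p][2] for p in preds), default=0)
        # Pick the allowed resource offering the earliest start
        # (ties are broken by resource name).
        best = min(allowed, key=lambda r: (max(free_at[r], ready), r))
        start = max(free_at[best], ready)
        end = start + duration
        plan[name] = (best, start, end)
        free_at[best] = end
    return plan


jobs = [
    ("j1", 4, ["r1", "r2"], []),
    ("j2", 3, ["r1", "r2"], []),
    ("j3", 2, ["r1"], ["j1"]),
    ("j4", 1, ["r1", "r2"], ["j2", "j3"]),
]
allocation = greedy_plan(jobs, ["r1", "r2"])
for name, (resource, start, end) in allocation.items():
    print(name, resource, start, end)
```

A greedy pass like this is fast but makes each decision without looking ahead, which is why it can serve only as a starting point for an optimisation procedure.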

3 GORBA – Global Optimising Resource Broker and Allocator

Before focusing on our resource broker GORBA, the concept of resource management in a grid environment, in which GORBA is embedded, shall be explained [11]. Our concept of a heterogeneous grid environment is based on describing the grid task as an instantiated workflow, called an application job. An application job consists of a workflow definition and the corresponding data. Here, any user-defined structure of the workflow is allowed (e.g. parallelism, sequences, splitting, or joining) [12]. Handling of application jobs requires a dedicated grid middleware. This grid middleware receives the application job from the application, analyses it, and distributes parts of it in the grid. For this purpose, the grid middleware divides the application job into single grid jobs as described by the workflow. After processing these single grid jobs, the grid middleware collects the overall result and sends it back to the application. Fig. 1 shows the principle of our grid environment. Defining application jobs by workflows allows for the description of parallelism and the usage of heterogeneous resources. Moreover, it is independent of a specific application.

Resource management is divided into two services, the resource broker and the job manager. Fig. 1 shows the details of our resource management system. The resource broker GORBA receives the application job that consists of a workflow definition and the corresponding data. It analyses the workflow and generates the single grid jobs from the application job. The resource broker acquires the capacities of the work nodes by using the information service and plans the distribution of the single grid jobs. For this task, GORBA is equipped with two planning components that are based on conventional and evolutionary processes, respectively. Conventional planning is always performed. When usage is low, this conventional planning is entirely sufficient. As soon as higher usage results in allocation conflicts or waiting situations, the conventional planning result is taken as the basis of subsequent planning using the Evolutionary Algorithm HyGLEAM. Thus, it is ensured that the planning results of HyGLEAM are at least as good as the conventional planning results.

[Figure 1: block diagram. Labels: Application Job, Resource Management, Resource Broker, Job Manager, Grid Job, Workflow Decomposition, Scheduling Control, Conventional Scheduling, GLEAM / HyGLEAM Optimisation, Schedule (active), Resource List, Information Service, Basic Grid Services, Application Software, Work Node.]

Fig. 1. Resource Broker GORBA embedded in a grid environment
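The workflow decomposition step of the resource broker can be sketched as follows. The example assumes that a workflow is given as a mapping from each grid job to the set of jobs it depends on; the job names and the function `decompose` are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later) to group the grid jobs into stages whose members can run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical application job: grid job -> set of predecessor grid jobs.
workflow = {
    "preprocess": set(),
    "simulate_a": {"preprocess"},
    "simulate_b": {"preprocess"},
    "merge": {"simulate_a", "simulate_b"},
}


def decompose(workflow):
    """Split a workflow into stages of mutually independent grid jobs."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises graphlib.CycleError if the workflow is cyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose predecessors are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = decompose(workflow)
print(stages)
```

Each stage corresponds to a set of single grid jobs that a scheduler could distribute to different work nodes at the same time, subject to resource availability.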

GORBA generates an optimised allocation plan that distributes the grid jobs to the grid resources. The optimisation objectives are:

- Favourable/best allocation of all jobs not yet started to all resources, unless the latter are occupied by jobs already started.
- Individual weighting between costs and execution time per application job.
- Option "rush order": specification of acceptable additional costs per time unit of earlier execution compared to a given time.
- Specification of global secondary objectives, such as homogeneous workloads.

For every objective, a quality function is defined which delivers a normalised quality value; a weighted sum is calculated from these values. Additionally, penalty functions are used for situations like the violation of due dates. Later extensions of GORBA will aim at optimised data storage in terms of low-cost storage locations, reasonable transfer costs, good network performance, etc.

Permanent replanning takes place in case of the following events:

- new application job
- cancellation/termination of an application job
- resource failure
- new resources
- execution outside of planning time

Open problems are the difficulty of comparing various hardware and software platforms with respect to execution time, the frequent lack of information provided by the work nodes, and the inaccuracy of time estimates in the job description given by the user. We think that these problems must be tackled if QoS is to be improved, regardless of which kind of planning method is used, as they are a general problem of the transition from queuing to planning systems.

A schedule generated by GORBA is handed over to the job manager for execution. The job manager therefore has access to the grid jobs and the assigned work nodes. It is responsible for the execution of each grid job on the assigned work node. After a grid job has been completed, the job manager transmits the intermediate result to the next grid job, or reports any error that occurred to the resource broker. GORBA can be embedded in any grid middleware (e.g. Globus [13]) and can use its basic grid services.

3.1 Hybrid Evolutionary Algorithm HyGLEAM

HyGLEAM is a hybrid consisting of application-independent local search algorithms and the Evolutionary Algorithm GLEAM (General Learning Evolutionary Algorithm and Method) [14]. GLEAM is an Evolutionary Algorithm of its own that combines elements from Evolution Strategy and real-coded Genetic Algorithms with data structuring concepts from computer science. Coding is based on chromosomes consisting of problem-configurable gene types. The definition of a gene type comprises its set of real, integer, or Boolean parameters together with their ranges of values, which allows for mutation operators that take explicit restrictions into account. Among others, GLEAM uses mutation operators influenced by the Evolution Strategy insofar as small parameter changes are more likely than greater ones. Mutation can also change the gene order and add or delete genes in the case of dynamic chromosomes. GLEAM uses ranking-based selection and elitist offspring acceptance. A detailed description of the present state of GLEAM can be found in [15].

To keep the hybrid generally applicable, suitable local search algorithms must be derivative-free and able to handle restrictions. Two well-known procedures from the sixties were chosen, since they meet these requirements and are known to be powerful local search procedures: the Rosenbrock algorithm [16] and the Complex method [17]. We use an implementation according to Schwefel, who gives a detailed description of both algorithms together with experimental results [18]. Fig. 2 shows the pseudo code of the hybridisation method of HyGLEAM that we use for scheduling (the memetic algorithm part of HyGLEAM). As this paper focuses on the scheduling of grid jobs and space is limited, HyGLEAM and its basic algorithms have been described only very briefly here; the interested reader is referred to the literature cited.

Fig. 2. Pseudo code of the used hybridisation of HyGLEAM, also called Memetic Algorithm.

Scheduling Example from Chemical Industry

In order to demonstrate that EAs can be applied successfully to scheduling tasks, the results of a scheduling and resource optimisation problem solved by Blume and Gerbe using GLEAM shall be reported [19]. This task from the chemical industry deals with batches that require varying numbers of workers during the different phases of each batch. The objective of scheduling these batches is to reduce production time and the peak number of workers per shift (human resource) as far as possible. Restrictions like due dates of batches, necessary preproducts from other batches, and the availability of shared equipment must also be observed. Allocation conflicts are solved by the sequence of the batches within a chromosome. As that sequence can be overridden by suitable changes of the starting times, however, the combinatorial aspect is limited to solving allocation conflicts.

The concrete planning task reported here consists of 87 batches, for which a manually created schedule served as a standard of comparison. It required 12 workers at maximum and lasted nearly 210 shifts, as shown in Fig. 3a. The number of shifts can be reduced to 123 (59%) if only the upper limit for the human resource has to be adhered to, see Fig. 3b. This was achieved by a significant increase of the portion of labour time spent on work during a shift. If both production time and the number of required workers are to be reduced, the best solution found is a reduction to 148 shifts (70%) and a maximum of 9 workers per shift (75%), as shown in Fig. 3c. This is equivalent to a reduction in man-hours of 52% of the manual solution (Fig. 3a).

Fig. 3. Results of the scheduling task from chemical industry. The maximum number of workers required per shift is shown for a) the manual schedule, b) a time-optimised schedule made by GLEAM, and c) a worker- and time-optimised schedule also made by GLEAM.

Besides this task, others have been performed, including replanning because of new orders and equipment failures. The major result of this is that plans of similar quality were produced in a shorter time compared to the initial planning. As this task has a lot of similarities to the scheduling of job sequences that are described by workflows and compete for resources, we can expect a relevant benefit for optimised resource brokering.
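To make the two objectives of this example concrete, the following sketch evaluates a toy batch schedule for production time and peak number of workers per shift. It combines both into one value in the spirit of the normalised quality functions, weighted sum, and penalty functions described for GORBA in Section 3. Everything in it is an assumption for illustration: the `Batch` fields, the constant number of workers per batch, the normalisation against reference values, the weights, and the multiplicative penalty are not taken from GLEAM or from [19].

```python
from dataclasses import dataclass


@dataclass
class Batch:
    start: int     # first shift of the batch
    duration: int  # number of shifts
    workers: int   # workers needed in every shift (simplifying assumption)
    due: int       # shift by which the batch must be finished


def makespan(batches):
    """Total production time in shifts."""
    return max(b.start + b.duration for b in batches)


def peak_workers(batches):
    """Maximum number of workers required in any single shift."""
    load = [0] * makespan(batches)
    for b in batches:
        for shift in range(b.start, b.start + b.duration):
            load[shift] += b.workers
    return max(load)


def quality(batches, ref_shifts, ref_workers,
            w_time=0.5, w_workers=0.5, penalty=0.5):
    """Weighted sum of two normalised quality values.

    Each value is 0 at (or above) the reference and approaches 1 as the
    objective shrinks. Every late batch multiplies the sum by `penalty`.
    """
    q_time = max(0.0, 1.0 - makespan(batches) / ref_shifts)
    q_workers = max(0.0, 1.0 - peak_workers(batches) / ref_workers)
    score = w_time * q_time + w_workers * q_workers
    late = sum(1 for b in batches if b.start + b.duration > b.due)
    return score * penalty ** late


on_time = [Batch(0, 4, 3, 10), Batch(2, 4, 2, 10), Batch(6, 2, 4, 10)]
one_late = [Batch(0, 4, 3, 10), Batch(2, 4, 2, 10), Batch(6, 2, 4, 7)]
print(makespan(on_time), peak_workers(on_time))
print(quality(on_time, ref_shifts=10, ref_workers=6))
print(quality(one_late, ref_shifts=10, ref_workers=6))
```

Shifting the weights towards `w_workers` would favour flatter worker profiles at the cost of longer schedules, which corresponds to the trade-off between the time-optimised and the worker- and time-optimised schedules reported above.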

4 Conclusion and Future Work

This contribution emphasises the necessity of planning and optimising resource allocation in a grid. With the concept of the global optimising resource broker GORBA, a system was presented which can fulfil these planning tasks. It is based on the Evolutionary Algorithm HyGLEAM, the performance of which has already been demonstrated in a number of applications [8, 9, 15, 19]. As an example, the optimisation of a production planning task with GLEAM was described in this paper. A first prototype of GORBA is currently being implemented; it will then be used to perform reference studies and benchmark tests. The reference studies will be aimed at analysing and evaluating the optimisation processes implemented in GORBA under different load conditions. Depending on the test results, a parallelisation of HyGLEAM may be considered to improve its performance and to allow fast replanning. In the next step, the optimisation options of GORBA are planned to be extended to cover data-related resources as well.

5 References

1. Hovestadt, M., Kao, O., Keller, A., Streit, A.: Scheduling in HPC Resource Management Systems: Queuing vs. Planning. Proceedings of the 9th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP) at GGF8, Seattle, WA, USA, June 24, 2003, LNCS 2862, Springer Verlag, New York (2003) 1-20
2. Foster, I., Roy, A., Sander, V.: A Quality of Service Architecture that Combines Reservation and Application Adaption; Proc. of the 8th Intern. Workshop on Quality of Service, 2000
3. Hamscher, V., Schwiegelshohn, U., Streit, A., Yahyapour, R.: Evaluation of Job-Scheduling Strategies for Grid Computing. Proceedings of 1st IEEE/ACM International Workshop on Grid Computing (Grid 2000) at 7th International Conference on High Performance Computing (HiPC-2000), Bangalore, India, LNCS 1971, Springer Verlag, Berlin Heidelberg, New York (2000) 191-202
4. Moreno, R.A.: Job Scheduling and Resource Management Techniques in Dynamic Grid Environments; 2002
5. Fellows, D.K.: Abstraction of Resource Broker Interface; Deliverable D 2.4a/UoM, University of Manchester, 2002
6. Buyya, R.: Economic-based Distributed Resource Management and Scheduling for Grid Computing; Dissertation, Monash University, Melbourne, Australia, 12. April 2002
7. Roy, A., Livny, M.: Condor and Preemptive Resume Scheduling; published in Grid Resource Management: State of the Art and Future Trends, Fall 2003, pp. 135-144, edited by Nabrzyski, J., Schopf, J.M., Weglarz, J., published by Kluwer Academic Publisher.
8. Jakob, W.: HyGLEAM – An Approach to Generally Applicable Hybridization of Evolutionary Algorithms. In: Merelo, J.J., et al. (eds.): Proceedings of PPSN VII, Lecture Notes in Computer Science, Vol. 2439. Springer-Verlag, Berlin Heidelberg New York (2002) 527–536
9. Jakob, W.: Eine neue Methodik zur Erhöhung der Leistungsfähigkeit Evolutionärer Algorithmen durch die Integration lokaler Suchverfahren. Doctoral thesis, FZKA 6965, University of Karlsruhe (in German) (2004), see also: www.iai.fzk.de/~jakob/HyGLEAM/
10. Abraham, A., Buyya, R., Nath, B.: Nature's Heuristics for Scheduling Jobs on Computational Grids; International Conference on Advanced Computing and Communications, 2000
11. Halstenberg, S., Stucky, K.U., Süß, W.: A grid environment for simulation and optimization and a first implementation of a biomedical application. Proceedings of the OTM 2004 Workshops, Agia Napa, Cyprus, October 25-29, 2004, LNCS 3292, Springer Verlag, Berlin Heidelberg, New York (2004) 59-67
12. Fischer, L. (Ed.): Workflow Handbook 2001, ISBN 0-9703509-0-2
13. http://www.globus.org
14. Blume, C.: GLEAM - A System for Intuitive Learning. In: Schwefel, H.P., Männer, R. (eds): Conf. Proc. of PPSN I. LNCS 496, Springer Verlag, Berlin (1990) 48-54
15. Blume, C., Jakob, W.: GLEAM – An Evolutionary Algorithm for Planning and Control Based on Evolution Strategy. In: Cantú-Paz, E. (ed): GECCO – 2002, Vol. Late Breaking Papers (2002) 31-38
16. Rosenbrock, H.H.: An Automatic Method for Finding the Greatest or Least Value of a Function. The Computer Journal, 3 (1960) 175-184
17. Box, M.J.: A New Method of Constrained Optimization and a Comparison with Other Methods. The Computer Journal, 8 (1965) 42-52
18. Schwefel, H.-P.: Evolution and Optimum Seeking. John Wiley & Sons, New York (1995)
19. Blume, C., Gerbe, M.: Deutliche Senkung der Produktionskosten durch Optimierung des Ressourceneinsatzes. atp 36, 5/94, Oldenbourg Verlag, München (in German) (1994) 25-29