Roadmap for the ARC Grid Middleware

Paula Eerola1, Tord Ekelöf2, Mattias Ellert2, Michael Grønager3, John Renner Hansen4, Sigve Haug5, Josva Kleist3,8, Aleksandr Konstantinov5,6, Balázs Kónya1, Farid Ould-Saada5, Oxana Smirnova1, Ferenc Szalai7, and Anders Wäänänen4

1 Experimental High Energy Physics, Institute of Physics, Lund University, Box 118, SE-22100 Lund, Sweden [email protected]
2 Dept. of Radiation Sciences, Uppsala University, Box 535, SE-75121 Uppsala, Sweden
3 NDGF, NORDUnet A/S, Kastruplundsgade 22-1, DK-2770 Kastrup, Denmark
4 Niels Bohr Institute, Blegdamsvej 17, DK-2100 Copenhagen Ø, Denmark
5 University of Oslo, Dept. of Physics, P. O. Box 1048, Blindern, NO-0316 Oslo, Norway
6 Vilnius University, Institute of Material Science and Applied Research, Saulėtekio al. 9, Vilnius 2040, Lithuania
7 Institute of National Information and Infrastructure Development NIIF/HUNGARNET, Victor Hugo 18-22, H-1132 Budapest, Hungary
8 Dept. of Computer Science, Aalborg University, Fredrik Bajersvej 7E, DK-9220 Aalborg Ø, Denmark

Abstract. The Advanced Resource Connector (ARC), also known as the NorduGrid middleware, is an open source software solution enabling production-quality computational and data Grids, with special emphasis on scalability, stability, reliability and performance. Since its first release in May 2002, the middleware has been deployed and used in production environments. This paper presents the future development directions and plans of the ARC middleware by outlining the software development roadmap.

Keywords: Grid, Globus, middleware, distributed computing, cluster, Linux, scheduling, data management.

1 Introduction

Advanced Resource Connector (ARC) [1] is a general purpose Grid middleware that provides a very reliable implementation of the fundamental Grid services, such as information services, resource discovery and monitoring, job submission and management, brokering and low-level data and resource management. A growing number of research Grid infrastructure projects (Swegrid [2], Swiss ATLAS Grid [3], M-Grid of Finland [4], Nordic Data Grid Facility (NDGF) [5] etc.) are choosing ARC as their middleware. These resources effectively constitute one of the largest production Grids in the world, united by the common middleware base while having different operational modes and policies.


The development of the open source ARC middleware has been coordinated by the NorduGrid Collaboration [6]. This Collaboration has successfully initiated and takes an active part in several international Grid development and infrastructure projects, such as the EU KnowARC project [7] and the Nordic Data Grid Facility. All such projects contribute to the ARC software. Due to these new initiatives and to the active community that has formed around the middleware, substantial development is planned and expected in the coming years. This paper presents a common view on the future of the Advanced Resource Connector, incorporating the input from the major contributing projects and the community. The roadmap contains development plans beyond the stable release version 0.6 of ARC. Relations to other middlewares, emerging standards and interoperability issues are outside the scope of this paper.

2 ARC Overview

ARC middleware was created as a result of a research process started in 2001 by the Nordic High Energy Physics Community. The initial motivation was to investigate possibilities to set up a regional computational Grid infrastructure using the then existing solutions, primarily the EDG [9] prototype and the Globus Toolkit® [10]. Studies and tests conducted by NorduGrid showed that such solutions were not ready at that time to be used in a heterogeneous production environment, characteristic of scientific and academic computing in the Nordic countries. A specific feature of Nordic scientific computing is that it is carried out by a large number of small and medium-sized facilities of different kinds and ownership, rather than by big supercomputing centers.

ARC was designed in 2002 with user requirements and experience with other Grid solutions in mind. An important requirement from the very start of ARC development has been to keep the middleware portable, compact and manageable both on the server and the client side. In the stable ARC release version 0.6, the client package occupies only 14 Megabytes; it is available for most current Linux distributions and can be installed at any available location by a non-privileged user. For a computing service, only three main processes are needed: the specialized GridFTP service, the Grid Manager and the Local Information Service.

The ARC architecture is carefully planned and designed to satisfy the needs of end-users and resource providers. To ensure that the resulting Grid system is stable by design, it was decided to avoid centralized services as much as possible and to identify only three mandatory components (see Figure 1):

1. The Computing Service, implemented as a GridFTP-Grid Manager pair of core services. The Grid Manager (GM) is instantiated at each computing resource's (typically a cluster of computers) front-end as a new service. It serves as a gateway to the computing resource through a GridFTP channel. The GM provides an interface to the local resource management system, facilitates job manipulation and data management, and allows for accounting and other essential functions.


Fig. 1. Components of the ARC architecture. Arrows point from components that initiate communications towards queried servers.

2. The Information System is the basis for the Grid-like infrastructure. It is realized as a hierarchical distributed database: the information is produced and stored locally for each service (computing, storage), while the hierarchically connected Index Services maintain the list of known resources.

3. The Brokering Client is deployed as a client part in as many instances as users need. It is enabled with resource discovery and brokering capabilities, being able to distribute the workload across the Grid. It also provides client functionality for all the Grid services, and yet is required to be lightweight and installable by any user in an arbitrary location in a matter of a few minutes.

In this scheme, the Grid is defined as a set of resources registering to the common information system. Grid users are those who are authorized to use at least one of the Grid resources by means of Grid tools. Services and users must be properly certified by trusted agencies. Users interact with the Grid via their personal clients, which interpret the tasks, query the Information System whenever necessary, discover suitable resources, and forward task requests to the appropriate ones, along with the user's proxy and other necessary data. If the task consists of job execution, it is submitted to a computing resource, where the Grid Manager interprets the job description, prepares the requested environment, stages in the necessary data, and submits the job to the local resource management system. Grid jobs within the ARC system possess a dedicated area on the computing resource, called the session directory, which effectively implements limited sandboxing for each job. The location of each session directory is a valid URL and serves as the unique Job Identifier. In most configurations this guarantees that the entire contents of the session directory are available to the authorized persons during the lifetime of the Grid job.


Job states are reported in the Information System. Users can monitor the job status and manipulate the jobs with the help of the client tools, which fetch the data from the Information System and forward the necessary instructions to the Grid Manager. The Grid Manager takes care of staging out and registering the data (if requested), submitting logging and accounting information, and eventually cleaning up the space used by the job. Client tools can also be used to retrieve job outputs at any time.

Figure 1 shows several optional components, which are not required for an initial Grid setup, but are essential for providing proper Grid services. Most important are the Storage Services and the Data Indexing Services. As long as the users' jobs do not manipulate very large amounts of data, these are not necessary. However, when Terabytes of data are being processed, they have to be reliably stored and properly indexed, allowing further usage. Data are stored at Storage Elements and can be indexed in a variety of third-party indexing databases. Other non-critical components are the Grid Monitor, which provides an easy Web interface to the Information System, the Logging Service, which stores historical system data, and a set of third-party User Databases that serve various Virtual Organisations.
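To make the resource discovery step described above more concrete, the following is a minimal, illustrative sketch of how a client-side tool could query a cluster's local information service over LDAP and estimate its free capacity. This code is not part of ARC; the host names are invented, and the port, base DN and attribute names are assumptions modelled on the NorduGrid LDAP information schema.

```python
# Illustrative sketch only: query an ARC-style local LDAP information service
# and rank clusters by an estimate of free CPUs. Hosts are hypothetical; the
# port (2135), base DN and attribute names are assumptions modelled on the
# NorduGrid information schema, not prescribed by this paper.
from ldap3 import Server, Connection, SUBTREE

def free_cpus(host, port=2135):
    """Return (cluster name, estimated free CPUs) from one local info service."""
    conn = Connection(Server(host, port=port), auto_bind=True)
    conn.search(search_base='Mds-Vo-name=local,o=grid',
                search_filter='(objectClass=nordugrid-cluster)',
                search_scope=SUBTREE,
                attributes=['nordugrid-cluster-name',
                            'nordugrid-cluster-totalcpus',
                            'nordugrid-cluster-usedcpus'])
    if not conn.entries:
        return host, 0
    entry = conn.entries[0]
    total = int(entry['nordugrid-cluster-totalcpus'].value)
    used = int(entry['nordugrid-cluster-usedcpus'].value)
    return str(entry['nordugrid-cluster-name'].value), total - used

# A brokering client would repeat this for every cluster found via the Index
# Services and submit the job description to the best-ranked resource.
candidates = [free_cpus(h) for h in ['grid.example.org', 'cluster.example.net']]
name, free = max(candidates, key=lambda c: c[1])
print('would submit to', name, 'with', free, 'free CPUs')
```

In the real system the candidate list comes from the hierarchically connected Index Services rather than from a hard-coded list, and the broker also matches job requirements such as requested runtime environments and data locations.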

3 The Roadmap

The solution described above has been in production use for many years, and in order to keep up with the growing users' needs and with the Grid technology development, substantial changes have to be introduced while keeping the original fundamental design principles intact. ARC middleware development is driven in equal parts by the global Grid technology requirements and by customers' needs. Independently of the nature of the customer, an end-user or a resource owner, the guiding principles for the design are the following:

1. A Grid system based on ARC should have no single point of failure and no bottlenecks.
2. The system should be self-organizing, with no need for centralized management.
3. The system should be robust and fault-tolerant, capable of providing stable round-the-clock services for years.
4. Grid tools and utilities should be non-intrusive, have a small footprint, should not require special underlying system configuration, and should be easily portable.
5. No extra manpower should be needed to maintain and utilize the Grid layer.
6. Tools and utilities respect local resource owner policies, in particular security-related ones.
7. Middleware development and upgrades must proceed in incremental steps, ensuring compatibility and reasonable co-existence of old and new clients and services during the extended transitional periods.


The long-term goal of the development is to make ARC able to support easy creation of dynamic Grid systems and to seamlessly integrate with tools used by different end-user communities. The main task taken care of by ARC will still be the execution of computational jobs and data management, and the goal is to ease access to these services for potential users while retaining the relatively non-intrusive nature of the current ARC. The architectural vision for future ARC development is quite traditional: it involves a limited number of core components, with computational and data management services on top. The core should be flexible enough to accommodate general kinds of services, ranging from conventional ones for scientific computations to more generic ones, such as shared calendars. The resulting product should be, as before, a complete solution, ready to be used out of the box, and simple to deploy, use and operate.

The post-0.6 major releases of ARC are expected to appear on a yearly basis, with version 1.0 scheduled for May 2007. Between the major releases, frequent incremental releases are planned, so that newly introduced features and components become available to early testers as soon as possible. Version 1.0 will introduce new interfaces for the current core ARC services and will also provide core libraries and a container for developing and hosting additional services. Version 2.0 will add numerous higher level components, such as the self-healing storage system and support for dynamic runtime environments via virtualization. Finally, with version 3.0 ARC will be extended with a scalable accounting service, an enhanced brokering and job supervising service, and many other higher level functionalities.

Most new components will be developed anew, especially those constituting core services. Some existing services will be re-used or seamlessly interfaced, such as the conventional GridFTP server, VOMS, or job flow control tools like Taverna [11]. While ARC versions 1.0 and above will see a complete protocol change, backwards compatibility with pre-0.6 versions will be provided for as long as reasonably needed by keeping old interfaces at the client side and, whenever necessary, at the server side as well.

3.1 ARC Version 1.0

ARC version 1.0 will capitalize on the services and tools existing in version 0.6, and will mark the preparation for the transition to new core components. The following steps are foreseen, leading to ARC 1.0:

– Standards Document: this document will include an extended plan about how ARC will implement essential OGF [12], OASIS [13], W3C [14], IETF [15] and other standards and recommendations.

– Architecture document: it will describe the extended new main components and services of ARC, with functionality including distributed storage, the information system, virtual organization support, core libraries and the container. The architecture will mainly focus on the transformation of the ARC middleware into a Web service based framework and will describe the services implemented in such a new system.


– Web service core framework and container: first of all, the main functions, core libraries and the container will be implemented as described in the architecture document. The core framework will include such functions and methods as a common protocol (HTTP(S), SOAP, etc.) multiplexer, internal service communication channels, some parts of authentication, operating system level resource management, a common configuration system, logging, and a mechanism for saving the internal states of services (a toy illustration of the multiplexing idea is sketched after this list). The main service container will host all the on-site manageable services.

– Specifications for the Runtime Environment (RTE) Description Service and for the RTE Repository Service: the current static and rather basic RTE support can be extended in many ways. The goal of this step is to identify the possibilities of such extensions and to define components and interfaces for the new components. The enhanced RTE system with dynamic and virtualized RTE support will be implemented in two steps.

– Modular building and packaging framework: current building and packaging procedures will be improved and optimized by introducing means for modular builds. Other improvements in the framework are foreseen, including a transition to new version control and build systems.

– Extended back-ends: the reliability and robustness of the ARC middleware is the result of its very robust core back-end components. These back-ends form the layer between the Grid and the resources and have on many occasions proved superior to other resource managers in terms of performance, manageability and stability. This work item aims at further improving the performance, manageability and scalability of the ARC back-ends, adding support for additional batch systems, improving and standardising the batch system interfaces, and offering better manageability and control of a Grid resource for the resource owner.

– Initial Web service basic job management: since job management is one of the most widely used components of current ARC, the first step towards the Web service based system is to transform this component, using current ARC capabilities, into a Web service.

– Web service clients: to accommodate the transition of job management to a Web service, relevant changes will be made in end-user clients and tools.

– Enhanced treatment of RTEs via the Runtime Environment Description Service and RTE Repository: after this step, the current static RTE system will be extended with a full semantic description of RTEs and with services that help to collect and organise RTE packages and their descriptions. Simultaneously, a general framework for dynamic RTEs will become available as well.

– Web service based elementary information index framework: as a first step towards one of the main goals, support for dynamic Grid creation (note also the P2P information system later), ARC will provide a Web service front-end to the current information indexing system.
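As a rough illustration of the protocol multiplexer and service container mentioned in the Web service core framework item above, the sketch below routes incoming HTTP POST requests to registered service handlers by URL path. It is a toy example under stated assumptions, not ARC code: the service names, paths and replies are invented, and a real container would additionally handle HTTPS/TLS, SOAP envelopes, authentication, configuration and state saving.

```python
# Toy sketch of a service container: one listener multiplexes incoming
# HTTP requests to hosted services selected by path prefix.
# Paths, service names and payloads are invented for illustration only.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Registry of hosted services: path prefix -> handler callable.
SERVICES = {
    '/jobmanagement': lambda body: b'<jobs/>',       # hypothetical job service
    '/index':         lambda body: b'<resources/>',  # hypothetical index service
}

class Container(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        body = self.rfile.read(length)
        for prefix, service in SERVICES.items():
            if self.path.startswith(prefix):
                reply = service(body)              # dispatch to the hosted service
                self.send_response(200)
                self.send_header('Content-Type', 'text/xml')
                self.end_headers()
                self.wfile.write(reply)
                return
        self.send_error(404, 'no such service')

if __name__ == '__main__':
    HTTPServer(('', 8080), Container).serve_forever()
```

The point of such a design is that new services can be added to the registry without touching the transport layer, which is what would allow the same container to host job management, index, storage and other front-ends.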

3.2 ARC Version 2.0

ARC version 2.0 will see the creation of new high-level components on top of the developing core services. The main foreseen steps towards this release are:

– Taverna integration: Taverna [11] is a workflow management system used extensively by the bioinformatics community. This step will see ARC and Taverna working together seamlessly.

– flowGuide integration providing a proof of industrial quality: flowGuide [16] is a workflow management solution used in the automotive industry. It will be adapted to be used in a Grid environment provided by ARC.

– ARC – gLite gateway: interoperability between ARC and gLite [17] is a top priority for a large group of traditional ARC customers. This step will enable a gateway-based solution for inter-Grid job submission.

– Self-healing storage, base components: the new storage components of ARC will improve reliability and performance by incorporating automatic replication of data and indices with automatic fail-over, better integration between replica catalogues and storage elements, and handling of collections of data (a toy sketch of the replication idea follows this list).

– On-site low level RTE management via an RTE controller service based on virtualization: this will provide a flexible management system to automate RTE management on the computing resources.

– Policy enforcement and delegation engine implemented as a part of the container: this step will extend the ARC security framework with even more fine-grained security schemes. An enhanced access rights management framework will be developed, relying on the concept of delegation. A delegation language parser and policy engine will be selected and integrated into the ARC middleware.

– MS Windows clients for main components: since the vast majority of end-users prefer Microsoft Windows as the client operating system, this step is necessary to bring the Grid closer to customers, minimizing the acceptance threshold.

– P2P-based information service: the goal of this task is to investigate and propose novel, flexible Grid formation mechanisms which utilise the power of the overlay network construction technologies of P2P networks. This work item aims to create next generation Grid information indices that fully support dynamic Grid formation. The new information backbones will be able to cope with the highly dynamic nature of Grids: nodes unpredictably joining or leaving, heavy load fluctuations and inevitable node failures.

– Improved resource control on the front-end: quotas for data, jobs and VOs will be implemented within the container and core services.

– Support for extended JSDL: ARC will be capable of dealing with complex I/O specifications, dynamic RTEs, authorisation policies, etc.

– Self-healing storage system with clients: the core-level data management services will be extended to significantly reduce the manual user effort by providing a high-level, self-healing data storage service allowing the maintenance of user data (metadata) in a fault-tolerant, flexible way. Based on this service, the metadata maintenance, replica management and synchronisation would be assisted by a Grid storage manager providing a single and uniform interface to the user.

– Client-side developer library (next generation arclib): the existing arclib client-side developer library will be extended to take into account changes in the core services and the addition of new higher level services.
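The following toy sketch illustrates the self-healing behaviour referred to in the storage items above: a periodic pass compares the replica catalogue with the storage elements that still respond, drops registrations pointing at dead storage, and re-replicates files that have fallen below a required replica count. The catalogue layout, the replica count and the copy callback are assumptions made for this example and are not part of the planned ARC design.

```python
# Toy self-healing pass over an in-memory replica catalogue.
# catalogue: {logical_file_name: set of storage element names}
# copy(lfn, src, dst): callback that transfers one replica (stub here).
import random

REQUIRED_REPLICAS = 2  # assumed policy, for illustration only

def heal(catalogue, alive_storage_elements, copy):
    for lfn, replicas in catalogue.items():
        # Keep only replicas registered on storage elements that still answer.
        good = replicas & alive_storage_elements
        while 0 < len(good) < REQUIRED_REPLICAS:
            targets = sorted(alive_storage_elements - good)
            if not targets:
                break                      # nowhere left to replicate to
            src = next(iter(good))
            dst = random.choice(targets)
            copy(lfn, src, dst)            # stage a new copy
            good.add(dst)
        catalogue[lfn] = good              # make the index match reality again

# Example run with a toy catalogue: se3 has disappeared, so run1.root is healed.
cat = {'lfn:/data/run1.root': {'se1', 'se3'}}
heal(cat, alive_storage_elements={'se1', 'se2'},
     copy=lambda f, s, d: print('copy', f, s, '->', d))
print(cat)
```

A production service would of course persist the catalogue, verify checksums and schedule transfers asynchronously; the sketch only shows the reconciliation loop that makes the storage "self-healing".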

3.3 ARC Version 3.0

ARC version 3.0 will introduce new services and extend the existing ones, using the new possibilities that will become available in the course of development. It is too early to discuss the details, but one can already list some important tools and services:

– Job migration service: a set of services and techniques supporting the migration of both queued and running jobs.

– Web service based monitoring: system monitoring tools and utilities making use of Web service technologies.

– Job supervising service: a higher-level service capable of monitoring and eventually re-submitting jobs.

– New brokering algorithms and services: new brokering models will be implemented for dynamic Virtual Organisations, supporting e.g. benchmark-based brokering and push and pull models (a simple illustration of benchmark-based ranking is sketched after this list).

– Accounting service relying on Service Level Agreements: this will implement a multi-level accounting system in a platform-independent way, based on Web service interfaces.

– Wide portability of the server-side components: this will add support for operating systems currently not engaged in Grid structures, such as Solaris or Mac OS.
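As a hedged illustration of what benchmark-based brokering could mean in practice, the sketch below ranks candidate clusters by an advertised benchmark score, discounted by queue depth when no free slots are available. The data structure and the ranking formula are assumptions made for this example, not the algorithm planned for ARC 3.0.

```python
# Toy benchmark-based broker: pick the cluster expected to finish a job first.
# Field names and the ranking heuristic are invented for illustration.
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    benchmark: float   # advertised per-core benchmark score (higher is faster)
    queued_jobs: int   # jobs waiting in the local batch system
    free_slots: int    # currently unused CPU slots

def rank(c: Cluster) -> float:
    if c.free_slots > 0:
        return c.benchmark                    # free slots: expect an immediate start
    return c.benchmark / (1 + c.queued_jobs)  # otherwise discount by queue depth

def choose(clusters):
    """Return the highest-ranked cluster; a push-model broker would submit there."""
    return max(clusters, key=rank)

print(choose([Cluster('fast-but-busy', 10.0, 5, 0),
              Cluster('slower-but-idle', 7.5, 0, 3)]).name)  # -> slower-but-idle
```

In a pull model the same ranking could instead be evaluated by the resources themselves when deciding which queued Grid jobs to fetch.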

4 Conclusion and Outlook

The ARC middleware will see very dynamic development in the coming years thanks to its open source approach and the ever-growing developer community, which enjoys steady support from various funding agencies. The NorduGrid collaboration, which created ARC with the help of Nordunet2 programme funding, will coordinate this development, assisted by several national and international projects. The EU KnowARC project will be one of the major contributors, improving and extending the ARC middleware to become a next-generation Grid middleware conforming to community-based standard interfaces. It will also address interoperability with existing widely deployed middlewares. KnowARC also aims to get ARC included in several standard Linux distributions, contributing to Grid technology and enabling all kinds of users, from industry to education and research, to easily set up and use this standards-based resource-sharing platform. The Nordic Data Grid Facility (NDGF) project will be another major contributor to ARC development.


It aims to create a seamless computing infrastructure for all Nordic researchers, leveraging existing national computational resources and Grid infrastructures; to achieve this, it employs a team of middleware developers, ensuring that user requirements will be met in the ARC middleware. Other projects, such as the Nordunet3 programme, national Grid initiatives, smaller scale cooperation activities and even student projects, are also expected to contribute to future ARC development. With this in sight, and guided by a detailed roadmap, ARC has a perfect opportunity to grow into a popular, widely respected and widely used Grid solution.

References

1. Ellert, M., et al.: Advanced Resource Connector middleware for lightweight computational Grids. Future Generation Computer Systems 23(2), 219–240 (2007)
2. SWEGRID: the Swedish Grid testbed, http://www.swegrid.se
3. Gadomski, S., et al.: The Swiss ATLAS Computing Prototype. Tech. Rep. CERN-ATL-COM-SOFT-2005-007, ATLAS note (2005)
4. Material Sciences National Grid Infrastructure, http://www.csc.fi/proj/mgrid/
5. Nordic Data Grid Facility, http://www.ndgf.org
6. Ould-Saada, F.: The NorduGrid Collaboration. SWITCH journal 1, 23–24 (2004)
7. Grid-enabled Know-how Sharing Technology Based on ARC Services and Open Standards (KnowARC), http://www.knowarc.eu
8. EGEE gLite: gLite – Lightweight Middleware for Grid Computing, http://glite.web.cern.ch/glite/
9. Laure, E., et al.: The EU DataGrid Setting the Basis for Production Grids. Journal of Grid Computing 2(4), 299–400 (2004)
10. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications 11(2), 115–128 (1997)
11. Oinn, T., et al.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20(17), 3045–3054 (2004)
12. Open Grid Forum, http://www.ogf.org
13. Organization for the Advancement of Structured Information Standards, http://www.oasis-open.org
14. The World Wide Web Consortium, http://www.w3.org
15. The Internet Engineering Task Force, http://www.ietf.org
16. flowGuide Workload Management Solution by science + computing ag, http://www.science-computing.com/en/solutions/workflow management.html
17. Grønager, M., et al.: LCG and ARC middleware interoperability. In: Proceedings of Computing in High Energy Physics (CHEP06), Mumbai, India (2006)