Universitat Politècnica de Catalunya
Master in Computer Architecture, Networks and Systems

Master Thesis

Towards Virtualized Service Providers

Introducing Virtual Execution Environments for Application Lifecycle Management and SLA-Driven Resource Distribution within Virtualized Service Providers

by

Íñigo Goiri

Advisor: Jordi Guitart

July 2008

Abstract

Nowadays, users simply want to use a service and forget about any kind of management problem. For this reason, service providers tend to offer complex services that can be used without any effort on the customer's side. However, providing these services implies a high management effort that requires a lot of human interaction. In this thesis, we propose the creation of a Virtualized Service Provider that reduces this effort while maximizing the profit of the service provider.

This thesis presents a solution for managing a service provider using virtualization. With respect to a traditional provider, it adds intelligent resource management that takes into account the SLAs agreed with the customers and the resource usage of the applications, a data management system, and new features such as migration and task pausing. Moreover, the service provider performs all these tasks transparently to the user. This solution permits better resource management among applications by consolidating and isolating them at the same time thanks to virtualization, which allows assigning resources to each task. In addition, we provide a fully customized virtual environment for each task, granting the task full control over its execution environment.

This new kind of environment implies a new way of scheduling tasks with respect to traditional service providers. It introduces new problems that must be considered, such as the overheads of creating the environment, but it also brings advantages, such as live migration of tasks among nodes or easy checkpointing, which can be exploited in the scheduling policies. Additionally, the resource management policies also take economic parameters into account when deciding how to allocate resources to a given application and how to support QoS. These decisions are based on the SLA previously agreed with the customer. Furthermore, the resource management takes into account whether each task is fulfilling its agreed SLA, and dynamically varies its resource assignment according to the task behavior. Hence, the scheduler maximizes the service provider's profit by making optimal use of the resources and reducing the penalties incurred by not meeting the SLA terms.


Acknowledgements

I would like to express my gratitude to all the people who have made this project possible. In particular, I would like to take this opportunity to extend my deepest gratitude to my advisor, Dr. Jordi Guitart, for his guidance, advice and support while writing this master thesis, not only during this project but also throughout all the previous work towards this VSP, which started two years ago. Many thanks to Ferran Julià, who has helped me in the development of this project and has worked closely with me day after day; I really want to thank him for all his help and feedback during the elaboration of this master thesis. Many thanks to Prof. Jordi Torres for giving me the chance to do this master thesis by introducing me to the virtualization world and providing me with a great environment in which to carry it out, the eDragon Research Group. I would like to show my gratitude to all the members of this fantastic group, especially to Ramon Nou for his invaluable discussion and help whenever I needed it. I am also very grateful to all the people who have participated in the revision of this master thesis for their detailed feedback, which has greatly improved this document. Finally, I would like to dedicate it to my family for their support and patience, and especially to my father for being there every day.


Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution

2 Background
  2.1 Virtualization
    2.1.1 History
    2.1.2 Virtualization in the real world
    2.1.3 Virtualization types
    2.1.4 Implementation issues
    2.1.5 Xen
  2.2 Service Level Agreement
  2.3 Agents
  2.4 Previous work

3 Architecture
  3.1 Task life-cycle
  3.2 Virtual Machines management
    3.2.1 Virtual Machine creation and destruction
    3.2.2 Task execution
  3.3 Resource management
    3.3.1 Global resource management
    3.3.2 Interaction with the VtM
    3.3.3 Local resource management
    3.3.4 SLA fulfillment
  3.4 Data management

4 Experimentation
  4.1 Experimental environment
  4.2 Virtualized overhead
    4.2.1 Measurement methodology
    4.2.2 Overhead results
  4.3 VM Creation Performance
  4.4 Data management
  4.5 VM Migration
    4.5.1 Migration cost
    4.5.2 Migrating applications
  4.6 Resource Management
    4.6.1 Local resource management
    4.6.2 Global resource management
    4.6.3 SLA-driven resource management

5 Related Work
  5.1 Creation and management of VMs
  5.2 Local resource management using Virtualization
  5.3 SLA-based resource management
  5.4 Dynamic resource management on clusters
  5.5 Overall approach for resource management using Virtual Machines

6 Conclusions
  6.1 Future work

List of Figures

2.1 Computer systems abstraction layers
2.2 Paravirtualization difference to full virtualization
2.3 Xen
2.4 Nwana's Category of Software Agent
2.5 SERA architecture

3.1 Virtualized Service Provider architecture
3.2 Task life-cycle
3.3 Scheduler architecture
3.4 Interaction between Scheduler and VtM
3.5 Virtualization Manager architecture
3.6 Virtualization Manager resource usage
3.7 Surplus resource distribution
3.8 Virtualization Manager resource levels
3.9 SLA enforcement cycle
3.10 Data management: Distributed shared filesystem
3.11 Stage-in and Stage-out

4.1 Resource usage measuring of a VM containing mencoder
4.2 Resource usage measuring of a VM containing Tomcat
4.3 Resource usage of a mencoder
4.4 Application server execution
4.5 One task lifecycle
4.6 Network and disk usage on a mencoder remote execution
4.7 A mencoder is being migrated during its execution
4.8 Migration of a Tomcat during its execution
4.9 Calculated resources of a mencoder with requirement of 50%
4.10 Calculated resources of a Tomcat with different requirements
4.11 CPU allocation of nodes between VMs
4.12 CPU consumption and allocation of tasks
4.13 CPU management with SLA violations

5.1 SODA architecture
5.2 VMShop
5.3 SoftUDC
5.4 Globus Virtual Workspace
5.5 Management system architecture for heterogeneous workloads
5.6 Two-level autonomic resource management for virtualized data centers

List of Tables

4.1 Testbed machine
4.2 Tomcat performance Virtualized vs Non-Virtualized
4.3 VM creation times
4.4 Hard drive features
4.5 Time to copy a file from a disk to another
4.6 Time to execute mencoder in different environments
4.7 Time to execute mencoder in a migrated VM at 300 seconds
4.8 Submitted tasks to the VSP

Chapter 1

Introduction

1.1 Motivation

Nowadays, users do not want to install and configure the software they need on their computers; what they want is to use it and forget about deployment problems. For this reason, computing is becoming more and more externalized: software is hosted by a provider that makes it available to the user as a service over the Internet. Thanks to this approach, the need to install, maintain and run the application on the customer's own computer has disappeared. These services are typically deployed by a third-party company that can be seen as a service provider.

Service provisioning over the Internet has existed since the network was created; consider, for instance, web hosting or email hosting services. Although in the past hosting (web, email, storage, etc.) was the only kind of service provisioning that was really used, nowadays service providers are starting to offer application service provisioning. This is known as the Service-Oriented Computing (SOC) paradigm [36], which also allows services to be easily composed in order to build distributed applications. Those services are offered by service providers that supply the implementation, the service description and the technical support. From the business point of view, the service provider agrees on the Quality of Service (QoS) with its customers through a Service Level Agreement (SLA). Generally, the SLA is a bilateral contract between the customer and the service provider that states not only the conditions of service, but also the penalties applied when the service is not satisfied, as well as the metrics used to evaluate the level of service.

The kinds of services commonly requested by users are very diverse, ranging from word processing software, like Google Docs, to a web server that must be available for a week. This implies a new challenge: making applications with very different behaviors and requirements coexist. In addition, placing applications of different users on the same computer can lead to security problems and resource imbalance between them. Managing this heterogeneity is one of the biggest problems that has to be solved.

Managing a service provider also implies a huge effort in terms of human interaction: an administrator has to create and set up each system before it can be used by the customer. Moreover, this management requires physical presence in order to start or reboot systems. Nowadays, service provider resource management is still mainly done manually: the administrator decides the amount of resources by hand, and this assignment remains static during the whole task life cycle.

During the last years, more dynamic resource allocation for tasks has appeared [44]; nevertheless, these solutions are partial and only consider one resource level. For instance, some projects try to manage the allocation of a job to a certain node, while others manage the resources shared between applications within the same server.

Consolidating applications of different users in a single server saves a lot of space and cooling; nevertheless, a malicious user can endanger other applications. Allocating each application in a virtual machine brings security to the users by isolating them from each other, and this consolidation can combine applications whose resource usages are complementary over time. In addition, introducing virtualization in service providers solves most of the problems previously mentioned. Virtualization reduces the management effort because a physical machine is exposed as software. For instance, this abstraction allows obtaining a new server just by copying a previously created one, or moving an application from one server to another if needed. Furthermore, this approach makes it possible to provide a customized environment for each user, who can install the applications he needs and configure the environment as he wants.

Isolating applications in virtual machines also allows a more suitable resource management between applications; for instance, a certain amount of CPU can be assigned to a given virtual machine that hosts a web server and a database. However, this approach is not suitable for all applications because of the cost of starting a new environment to execute the task: if a user wants to run a task that takes one minute to execute, starting up the new environment would add several seconds. For this reason, this solution focuses on applications with a long execution time. Nevertheless, it would still be useful for a user who wants to execute several short tasks, since they can be located in the same virtual machine.

One of the biggest problems of using virtualization is the overhead it introduces. Nevertheless, there exists a technique that achieves performance comparable to traditional environments: paravirtualization. This kind of virtualization is based on the idea of presenting a software interface to the VM that is similar, but not identical, to that of the underlying hardware. Thanks to this technique, the overhead introduced to manage the VM is highly reduced. Xen [6] will be used as the paravirtualization technology because it obtains performance close to non-virtualized environments for traditional applications, introducing low overheads [13] and reducing the cost of I/O operations [18] in contrast with other alternatives. In addition, Xen is open source and provides many tools to manage resources between virtual machines.

1.2 Contribution

This project makes use of virtualization to bring a new approach to service providers by introducing the concept of a virtualized service provider. This idea allows better resource management between applications by consolidating and isolating different applications at the same time. Each task is provided with an on-demand, fully customized (i.e., task-specific) and isolated virtual environment, granting the task full control over its execution environment without any risk to the underlying system or the other tasks. These virtual environments are consolidated in order to achieve a better utilization of the provider's resources.

In addition, this project provides a global solution for managing resources by connecting two levels of resource management, instead of addressing only part of the problem. It creates a global resource manager that dynamically allocates resources to a given application in a coarse-grained and a fine-grained way at the same time. The allocation process is enhanced by using challenging technologies such as agents and virtualization.

Our proposal follows the typical service provider architecture, a single scheduler that manages multiple resource managers, with some changes to support virtualization. On the one hand, the typical scheduler must be modified in order to take into account the overhead added by creating and starting the execution environment, and to support new features like migration and pausing. These features also imply introducing new scheduling policies that allow an application to be paused or moved between nodes. On the other hand, the resource manager acts as a machine wrapper that allocates one task per virtual machine and distributes resources among applications. Nevertheless, it needs substantial changes with respect to typical resource managers in order to create virtual machines customized to fulfill the user needs and to manage the whole VM life cycle. This new component will be called the Virtualization Manager (VtM).

As part of this global solution, the system that we present also deals with the management of the data that applications need. It allows an application to access the data it requires, even if the application is migrated between nodes. It also provides efficient access to the data and allows storing it for future access.

The resource management in this project also takes another perspective, introducing economic parameters in the decisions for allocating resources to a given application and supporting QoS. In addition to traditional resource management policies, these decisions are also based on the SLA previously agreed with the customer. Using this information, the scheduler maximizes the service provider profit by making an optimal use of the resources and reducing the penalties incurred by not meeting the SLA terms.

This master thesis is organized as follows: Chapter 2 gives the background of the key concepts used in the rest of the project; Chapter 3 presents the architecture of the whole system and details how it works; Chapter 4 evaluates the introduced approach; Chapter 5 describes work related to this project; and finally, Chapter 6 presents the conclusions and the future work.


Chapter 2

Background

This chapter presents some key concepts that will be used throughout the rest of the thesis. The first and most important concept is virtualization, which allows abstracting machine management into software management. The second one is agent-based programming, which provides a more intuitive architecture and easy communication mechanisms. Business plays an important role in the VSP; for this reason, an overview of QoS and SLAs in service providers is given. Finally, some previous work is presented in order to give a wider view of the main components of this system in different environments.

2.1 Virtualization

As previously stated, one of the most important topics in this project is virtualization [46]. This approach has several features that bring huge improvements to the typical concept of service provisioning through the network. There are many virtualization alternatives, but they share a common goal: hiding technical system details. Virtualization is based on creating an interface that hides implementation issues through encapsulation. In addition, it has introduced multiplexed access to a single machine, opening new lines of research.

This section gives an overview of virtualization technologies: it describes which techniques exist and how they are implemented in real solutions, and it reviews in detail one of these implementations, Xen, the one that will be used in this project. It also looks at how these technologies are being used in real environments, which virtualization alternatives are being utilized and why, and what advantages they offer with respect to typical solutions.

2.1.1 History

Virtualization is not something new: it has been used since the 1960s as a software abstraction layer that partitions a hardware platform into virtual machines which simulate the underlying physical machine, allowing unmodified software to run. This mechanism provides a way to multiplex a machine among users' applications by sharing processor time. Hardware providers started developing hardware that supports virtualized hardware interfaces through the Virtual Machine Monitor, or VMM.

In the early days of computing, the operating system was called the supervisor; with the ability to run operating systems inside other operating systems, the term hypervisor appeared in the 1970s. At that time, virtualization was used both in industry and in academic research. Nevertheless, in the 1980s, modern multitasking operating systems and cheaper hardware allowed users to run their applications on a single machine. This seemed to be the end of virtualization, and hardware providers no longer supported it in their architectures; it became a historical curiosity. However, in the 1990s, Stanford University researchers found themselves with many different architectures running different operating systems, which made it difficult to develop and port applications. Virtualization was the solution: introducing a layer that made different hardware look similar. Furthermore, it brought new benefits such as process isolation, security, mobility and efficient resource management. Nowadays, it is a real alternative and is widely adopted; for instance, hardware providers are once again adding virtualization support to their hardware.

2.1.2 Virtualization in the real world

Nowadays, virtualization is opening new ways in many different computing areas thanks to its advantages, such as savings in power, space, cooling and administration. One of the most important areas where virtualization can introduce big improvements is hosting. In this scenario, servers are often underutilized, and different machines can be consolidated into a physical one. Technologies like operating system-level virtualization and paravirtualization can achieve the desired performance levels in a completely isolated environment. With this solution, fewer machines are needed to serve the same workload, saving hardware (costs and space). In addition, it reduces management and administration requirements thanks to the migration and replication capabilities of this method.

Thanks to the isolation capabilities of some virtualization techniques, virtualization is also a great solution for sandboxing purposes. Virtual machines provide a secure and isolated environment (sandbox) for running foreign or less-trusted applications. The virtualization methods that achieve a robust environment for the underlying machine are full virtualization and paravirtualization. Therefore, virtualization technology can help build a secure computing platform.

Running multiple environments in a single computer is another virtualization feature. Many of the virtualization types support multiple virtual machines; nevertheless, only some of them, namely full virtualization and paravirtualization, achieve a performance level high enough to be really usable in production environments. In addition, the resource management capabilities of virtualization allow resource sharing in a controlled way, taking into account virtual machine requirements and giving QoS capabilities to the system. This usage also allows multiple simultaneous operating systems, making it possible to run applications that are specific to one operating system without having to reboot into it; system-dependent applications thus become available on every operating system and architecture.

Thanks to virtualization, architectures or hardware that have never been built can be tested. Full virtualization and emulators can achieve this objective, providing new instructions or new features for development purposes. They also allow complete profiling, which can introduce a considerable overhead; however, the development benefits far outweigh these difficulties. In addition to architecture virtualization, non-existing hardware can be used; for instance, virtual SCSI drives, virtual Ethernet adapters, and virtual Ethernet switches and hubs.

Software development also takes great benefit from virtualization, and one of the biggest gains is debugging. Having a fully profiled system permits complete software debugging. In addition, it can help to debug complicated software such as an operating system or a device driver by letting the user execute it on an emulated PC with full software control.

Another improvement that can be obtained with virtualization is migration. An application (or the complete operating system) can be migrated to another machine. This is one of the features of application virtualization, full virtualization, paravirtualization and library virtualization: with these techniques, an application can be moved to different hardware without any modification. In addition, some of these methods allow live migration, that is, moving an application to another place while it is being executed. This capability can be taken to a higher level, converting a whole system into a package that can be easily deployed and configured on other machines, providing complete software appliances.

Combining the above capabilities, new scenarios for testing purposes can be easily created. For instance, if a certain test requires a large number of machines that we do not have, virtualization allows deploying several VMs on a single machine, reducing the hardware needed and saving deployment time.

The next subsections present different approaches that are being used in different fields of the real world, such as development, security and service provisioning.

2.1.2.1 Virtual servers

Hosted applications rarely make use of all machine resources. Allocating several of them with complementary loads on the same computer increases server utilization and reduces hardware costs. Nevertheless, putting distinct types of applications in the same environment without any control would let them interfere with each other; therefore, they need to be controlled and isolated with a mechanism like virtualization. With this solution, the number of machines used is reduced and the cost decreases; moreover, management costs are also lower, since migrating and replicating virtual machines is easier than installing a complete operating system or diagnosing why it is failing, which reduces the time and personnel needed to manage the systems. For years, virtualization was not considered efficient enough, but nowadays alternatives like OpenVZ or Xen have minimal overhead and great performance, which makes them a real choice.

Nowadays, some hosting companies offer virtualized servers known as VPS. Some examples are Spry and its division VPSlink, which offers virtual private servers with OpenVZ; linode.com, which uses Xen; and Axarnet, which uses Virtuozzo. An important measure of web hosting quality is uptime, and the migration capabilities of virtualization make it possible to approach 100% server uptime, something unattainable with traditional hosting. This is another great advantage of virtualized servers.

2.1.2.2 Development

In IT development, virtualization can make development tasks easier, and it can be applied in many areas such as software development or security. Working in this type of environment introduces several improvements with respect to traditional environments.

This computing area has benefited greatly from virtualization. It used to involve a lot of time spent deploying, managing and performing other tasks that were not strictly needed for developing; thanks to virtualization, these undesired tasks have been mostly removed, saving a lot of time and making development easier.

Software development. Virtualized environments are used in development and software testing because they allow developers to use multiple virtual machines for their tests, with a basic benefit: hardware cost reduction. In addition, the emulated hardware can be easily adapted to change the system characteristics according to the developer's needs. Another advantage is porting software from the test environment to production by simply migrating the machine: deployment time is eliminated and applications start running instantly. Virtualization is also used as a sandbox for developing critical software. Developing a kernel or a module can crash the machine many times, and introducing a minimal layer that isolates the real system from the one being worked on makes this task easier. Therefore, the developer can work without fear of crashing the whole system and without wasting time rebooting it. For instance, the Linux kernel occupies a single address space, which means that a failure of the kernel or any driver crashes the entire operating system. With virtualization, if one operating system crashes due to a bug, the hypervisor and the other operating systems continue to run. This can make debugging the kernel similar to debugging user-space applications.

Mobility scenarios. Because of the way virtualization is implemented, the whole virtual machine state can be captured. This allows migrating a virtual machine, including its state, to another machine. This feature enables applications that must run for a long period: when the underlying machine needs maintenance, they can be moved to another machine. Furthermore, a virtual machine can be saved periodically and restored immediately, avoiding system failures due to power problems or hardware faults and providing a high degree of availability. Virtualization can also be seen as a middleware that abstracts the underlying system; therefore, software implemented in a virtual machine can be ported without modification to any architecture that supports that virtualization layer.

Security. Thanks to virtualization, a system can be considered a safe environment, protecting the underlying system and the rest of the virtual machines from possible attacks or failures. Virtualization also has great advantages in security projects. For instance, virus profiling can be done in a virtual environment without any risk, while allowing complete system profiling thanks to the VM characteristics. In a local area network, a honeypot implemented on a virtual machine presents a system with typical bugs or security weaknesses in order to attract hackers who try to attack the network and distract them from the really important systems. In addition, this honeypot can be closely monitored to obtain an early detection of possible intrusions. From the local network security point of view, virtual machines are also an easy way to restore infected systems: thanks to virtualization management capabilities, a minimal system installation or system backups can be stored on a server and restored later if necessary.

Having multiple users on a single machine implies a risk; isolating each user in a restricted virtual machine reduces these risks to a minimum. Using a virtualization method, restrictions such as preventing the execution of certain instructions or restricting network traffic can be specified, giving a high security level.

2.1.3 Virtualization types

There is not just one way to achieve a virtualized environment. In fact, there are several ways to achieve it at different levels of abstraction, obtaining different characteristics and advantages. Computer systems are divided into levels of abstraction separated by well-defined interfaces. Levels of abstraction allow implementation details at the lower levels of a design to be ignored or simplified, thereby simplifying the design of components at higher levels. The levels are arranged in a hierarchy, with lower levels implemented in hardware and higher levels in software. Figure 2.1 shows how a typical system is separated into layers, each one providing a different degree of abstraction.

Figure 2.1: Computer systems abstraction layers (applications, system libraries, operating system, hardware)

Virtualization introduces an abstraction layer that presents a different underlying system to the higher layers. Virtualization can be classified according to the system layer interface that it abstracts, although some techniques, such as paravirtualization or partial virtualization, combine several of these for performance reasons. According to virtualization reviews such as [24], the major types are: hardware emulation, full virtualization, partial virtualization, paravirtualization, operating system-level virtualization, library virtualization and application virtualization. All these types, together with some particular implementations, are described in the next subsections.

2.1.3.1 Hardware Emulation

In this virtualization method, the VM simulates complete hardware, allowing an unmodified OS to run. Every instruction is simulated on the underlying hardware, which implies a high performance loss (up to a 100-times slowdown). The VMM, which has to translate guest platform instructions into instructions of the host platform, is called an emulator.

In addition, it can even run multiple virtual machines, each one simulating a different processor. Many techniques are used to implement emulation. Some of the most famous examples of emulators are Bochs and QEMU.

2.1.3.2 Full virtualization

This method, also known as native virtualization, uses a virtual machine that mediates between the guest operating system and the native hardware. It is faster than emulation, but slower than the underlying hardware because of the hypervisor mediation. In this case, the guest operating system does not need to be modified: the virtual machine simulates enough hardware to allow an unmodified operating system to run. Certain machine instructions must be trapped and handled within the hypervisor, because the underlying hardware is not owned by the operating system but shared through the hypervisor. One of the biggest advantages of full virtualization is precisely that the guest OS can run unmodified. There are multiple alternatives implementing this technique, like VirtualBox, VMware, Parallels Desktop and z/VM.

2.1.3.3 Paravirtualization

This technique has some similarities to full virtualization. It uses a hypervisor for shared access to the underlying hardware, but integrates some virtualization-aware parts into the operating system; this approach implies that the guest system needs to be modified for the hypervisor. It was born from the need to improve full virtualization performance, and it explores ways to provide high-performance virtualization of x86 by implementing a virtual machine that differs from the raw hardware. Guest operating systems are ported to run on the resulting virtual machine. To implement this method, the hypervisor offers an API to be used by the guest OS; these calls are known as "hypercalls", and they increase the performance with respect to full virtualization. Figure 2.2 shows how full virtualization offers the VM the same interface as the underlying hardware, whereas paravirtualization offers a modified interface.

Figure 2.2: Paravirtualization difference to full virtualization (full virtualization exposes the hardware interface to the guest through the hypervisor, while paravirtualization exposes a modified interface)

On the one hand, the guest OS needs to be modified, which can be a disadvantage. On the other hand, this approach offers performance close to the unvirtualized system, and it can run multiple different operating systems concurrently. Some of the most famous examples of paravirtualization are Xen and Parallels Workstation.

2.1.3.4 Operating system-level virtualization

This method uses a different technique: it virtualizes servers on top of the operating system itself. It supports a single operating system and simply isolates the independent servers from one another. The guest environments share the same OS as the host system, and applications running in them view it as a stand-alone system. This method requires changes to the operating system kernel, but it brings a huge advantage: native performance. It enables multiple isolated and secure virtualized servers to run on a single physical server, each one with its own superuser, set of users/groups, IP address, processes, files, applications, system libraries, configuration files, etc.

2.1.3.5 Library virtualization

In almost all systems, applications are programmed against a set of APIs exported by user-level libraries. Such libraries are designed to hide operating system details in order to keep things simpler for programmers; however, this interface also gives the virtualization community a new opportunity. The most famous library virtualizer is Wine [5], an open source reimplementation of the Win32 API for UNIX-like systems. It can be viewed as a compatibility layer for running Windows programs without any modification; for example, it allows Windows-native applications to run on Linux.

2.1.3.6 Application Virtualization

This approach runs applications in a small virtual environment that contains the components needed to execute a program, such as registry entries, files, environment variables, user interface elements and global objects. This virtual environment acts as a layer between the application and the operating system, eliminating application-to-application and application-to-OS conflicts. The most popular application virtualization implementation is the Java Virtual Machine provided by Sun. The Java Virtual Machine [3] is a software layer that provides a virtual environment that executes Java bytecode. It abstracts the application from the underlying system: the same code can be executed on an x86 or on a PowerPC architecture.

2.1.4 Implementation issues

Virtual machines execute software in the same manner as the machine for which the software was developed. A virtual machine is implemented as a combination of a real machine and virtualizing software. The implementation issues depend on the virtualization technique; nevertheless, most of them follow more or less the same philosophy. Typically, virtualization is done by a layer that handles guest requests (processor, memory or input/output) and translates them to the underlying hardware (or, in some cases, to the underlying operating system) to make them executable. A typical implementation decision in emulation and fully virtualized environments is to separate the executed code into privileged and non-privileged for performance reasons. This decision is based on the principle that code is executed at different ring levels: virtual machines typically run in the non-privileged level, which demands special handling of privileged instructions.

2.1.4.1 Processor

Emulating the instructions interpreted by the underlying processor is the key feature of the different virtualization implementations. The main task of the emulator is to convert instructions, which can be done, for instance, by interpretation or by binary translation; the converted code is then executed on the underlying machine. Nevertheless, architectures like IA-32 are not efficiently virtualizable because they do not cleanly separate privileged from non-privileged behavior: some sensitive instructions do not trap, implying that every instruction must be identified. Some improvements in the newest processors that avoid this problem are discussed in the next sections. In addition, just as a typical operating system uses a scheduling algorithm to determine which processes are executed on which processor and for how long, in a virtualized environment the virtualization layer must decide which VM is executed.

2.1.4.2 Memory

The operating system distributes memory pages among processes using page tables that map real memory to the processes running on the system, and virtual machine monitors use these host operating system capabilities to map memory to each virtual machine. There are several ways to implement memory sharing between virtual machines, but every method maps the guest memory, including the whole virtual machine memory, into the host address space. Depending on the virtualization method, this mapping can be done mostly in software or can rely on the hardware. Paging requests are converted into disk reads/writes by the guest OS (as they would be on a real machine), and they are translated and executed by the virtualization layer. With this technique, standard memory management and replacement policies remain the same as in a non-virtualized machine.

2.1.4.3 Input/Output

The operating system provides an interface to access I/O devices. These accesses can be seen as services invoked through system calls, which transfer control to the operating system. The operating system relies on a set of software routines that convert generic hardware requests into specific commands for the hardware devices, which is done through device driver calls. Virtualized input and output is typically implemented by capturing the I/O operation, passing it to the underlying system after converting the request into the system-specific format, and then returning the result to the application.

2.1.4.4 Recent hardware support

Originally, the x86 architecture did not support virtualization, which made it difficult to implement a virtualized environment on it. Virtualization software needs to employ sophisticated mechanisms to trap and virtualize some instructions; for example, some instructions do not trap and can return different results depending on the privilege mode. In addition, these mechanisms introduce some overhead. The main chip vendors, Intel and AMD, have introduced extensions to resolve these difficulties: they have independently developed virtualization extensions to the x86 architecture that allow a hypervisor to run an unmodified guest operating system without introducing emulation performance penalties.

These improvements are based on the inclusion of a special mode, VMX, that supports privileged and non-privileged operations, so that any instruction can be executed without having to consider whether it is privileged or not.

2.1.5 Xen

Xen [6] is a free, open source hypervisor, created by XenSource, that allows a high degree of server usage and consolidation. It provides mechanisms to manage resources, including CPU, memory and I/O, and it is currently one of the fastest and safest virtualization infrastructures. Its paravirtualization approach requires introducing some changes into the virtualized operating system, but results in near-native performance. Many vendors, such as Intel, AMD, Dell, Hewlett-Packard, IBM, Novell, Red Hat or Sun Microsystems, use this software. In addition, it has a GPL license and can be downloaded freely.

In a Xen environment, a virtual server is just an operating system instance (called a domain in Xen terminology) whose load is executed on top of the Xen hypervisor. These instances access the devices through the hypervisor, which shares the resources with the other virtualized OSes and applications. Xen was created in 2003 at the Computer Laboratory of the University of Cambridge as the Xen Hypervisor project, led by Ian Pratt; in the following years the present Xen company, XenSource, was created. The key to Xen's success is paravirtualization, which allows obtaining a high performance level: Xen gives the guest operating system an idealized hardware layer. Intel has contributed extensions to Xen to support the VT-x (Vanderpool) architecture; this technology allows running unmodified operating systems, which do not need to be adapted for paravirtualization.

Figure 2.3: Xen (the privileged Dom0 with the Linux drivers and unprivileged DomU guests, each with its own Linux kernel, running on top of the Xen hypervisor and the hardware)

When the base system supports Intel VT or AMD Pacifica, unmodified operating systems like Windows can be run. Combining this new hardware support with paravirtualization allows these unmodified OSes to achieve performance levels similar to those of virtualized Linux. The overhead introduced by the Xen hypervisor is less than 5% in CPU-intensive applications [13]. In addition, thanks to paravirtualization, I/O operations are executed outside the hypervisor and shared between domains following resource sharing policies [18]. Nevertheless, virtualized domains remain fully isolated.

The Xen hypervisor is the lowest and most privileged layer; guest operating systems, or VMs, are allocated above it. Xen has its own notation for virtual machines: it refers to them as domains, and there is always one guest operating system that is booted when the hypervisor boots and has special management privileges. This domain is called "Domain0". Xen also offers tools for live migration, CPU scheduling and memory management which, combined with the advantages of open source software, make Xen a great alternative that gives the administrator full control over the resources and their distribution among VMs. It provides different tools and interfaces for accessing, modifying and monitoring the hypervisor values. One of the most important interfaces in Xen is XenStore, which provides a way for VMs to communicate and for its functionality to be easily extended. For instance, the default memory monitoring does not take into account the real consumption of the guest system and just reports the initially allocated amount; for this reason, and following the philosophy of the VMware tools, we have developed an extension that runs inside the VM and reports its resource usage through XenStore. Taking advantage of these monitoring and management facilities, XenMonitor has been developed. It allows full control of the resources of each VM and monitors a large set of resources such as disk, network, CPU usage and memory. The evaluation chapter gives a glimpse of the possibilities that this approach opens. Finally, Xen is a widely adopted virtualization alternative for increasing server usage and optimizing overall costs, and it is used by application service providers and hosting companies because it provides precise resource management.
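As a hint of the per-domain resource control that Xen exposes, the sketch below caps the CPU share of a domain through the credit scheduler by invoking the standard xm tool from Java (the language of the JADE agents used in this project). It assumes it runs on Domain0 with xm available, and the domain name is hypothetical; this is only an illustration, not the XenMonitor tool described above.

import java.io.IOException;

// Minimal sketch: cap a Xen domain's CPU share using the credit scheduler.
// Assumes it runs on Domain0 with the standard "xm" tool available; it is an
// illustration only, not the XenMonitor tool described in the text.
public class XenCpuCap {

    // Limit the given domain to "capPercent" of one CPU (0 means no cap).
    public static void capDomain(String domainName, int capPercent)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "xm", "sched-credit",
                "-d", domainName,                        // target domain
                "-c", Integer.toString(capPercent));     // CPU cap in percent
        pb.inheritIO();                                  // show xm output
        Process p = pb.start();
        if (p.waitFor() != 0) {
            throw new IOException("xm sched-credit failed for " + domainName);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical domain name: give the VM at most half a CPU.
        capDomain("task-vm-1", 50);
    }
}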

2.2 Service Level Agreement

One of the most important aspects of a service provider is the business factor. Usually, the service provider agrees on the Quality of Service (QoS) with its customers through a bilateral contract between the customer and the service provider called a Service Level Agreement (SLA). It states the conditions of service, specifying the performance metrics that the provided service should achieve, but also the penalties applied when the service is not satisfied.

An SLA is a formally negotiated contract between customers and their service provider, or between service providers. It records the common understanding about services, priorities, responsibilities, guarantees and the level of service. For example, it may specify the levels of availability, serviceability, performance and operation, or other attributes of the service such as billing, and even penalties in case of violation of the SLA. This contract defines and identifies the customer's needs and encourages dialogue in the event of disputes, reducing the areas of conflict between the two parties. In addition, it eliminates unrealistic expectations by introducing a prior negotiation. The issues treated by the contract are: the services to be delivered, the performance that these services should achieve, a way to verify the SLA, the customer's duties and responsibilities, and finally the termination of the whole process.

Service Level Agreements can contain numerous service performance metrics with corresponding service level objectives. Metrics commonly agreed in these cases are the percentage of calls abandoned while waiting to be answered, the average time it takes for a call to be answered, the percentage of calls answered, or the uptime of a service.

There are different definitions for SLAs. We have chosen the XML implementation that uses both WS-Agreement [10] and WSLA [27].

These specifications allow the definition of penalties and rewards. Another capability that we found useful in these specifications is the definition of metrics as average measures in order to give more stability to the system. When measuring SLA metrics, it is important to have stable measures, since otherwise the SLA metrics can depend more on when the measures are taken than on the measures themselves. Next is an XML example that illustrates the usage of this type of definition.
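The listing below is a simplified sketch of such a definition. It keeps the values used in our example (the validity period, the Application_monitor source, the process_cpu_time_serie and process_Cpu_Load metrics, the twoSecSchedule schedule and the threshold of 10), but the element names should be taken only as an approximation of the WSLA schema, not as the exact markup.

<SLA>
  ...
  <!-- Simplified WSLA-style sketch; element names are approximate -->
  <Validity>
    <Start>2005-12-31T00:00:00</Start>
    <End>2008-12-31T00:00:00</End>
  </Validity>
  <Schedule name="twoSecSchedule">
    <Interval><Seconds>2</Seconds></Interval>
  </Schedule>
  <!-- Raw CPU time series reported by the application monitor -->
  <Metric name="process_cpu_time_serie" type="TS">
    <Source>Application_monitor</Source>
  </Metric>
  <!-- Averaged CPU load derived from the series, sampled every two seconds -->
  <Metric name="process_Cpu_Load" type="double">
    <Source>Application_monitor</Source>
    <Schedule>twoSecSchedule</Schedule>
    <Function type="Average">
      <Metric>process_cpu_time_serie</Metric>
    </Function>
  </Metric>
  <!-- Objective on the averaged CPU load, with the threshold from the example -->
  <ServiceLevelObjective name="cpuLoadObjective">
    <Expression>
      <Predicate type="Less">
        <Metric>process_Cpu_Load</Metric>
        <Value>10</Value>
      </Predicate>
    </Expression>
  </ServiceLevelObjective>
  ...
</SLA>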

This example shows how we can define how many resources the application needs and for how long. Using these XML files, we can specify the SLA associated with each application.

2.3 Agents

The concept of agent has been widely discussed, and there is no clear answer as to what is an agent and what is not. We will define it as a software abstraction describing a complex software entity that acts with a certain degree of autonomy in order to perform a task. This concept has been widely studied by Nwana in [33]. The agent abstraction relies on the concept of behaviors, which describe small tasks performed in order to achieve a certain target. When the agent has reached one objective using a behavior, it can finish or move on to executing another behavior to reach another target. Programming with this approach enhances modularity (which reduces complexity), speed (due to parallelism), reliability (due to redundancy), flexibility (new tasks are composed more easily from the more modular organization) and reusability at the knowledge level (hence shareability of resources). The power of agents lies in their capability to cooperate in order to achieve a bigger target. This cooperation is easy to achieve because agents are designed to be interconnected and to interoperate.

Figure 2.4: Nwana's Category of Software Agent (collaborative learning agents, smart agents, collaborative agents and interface agents, characterized by combinations of the cooperate, learn and autonomous properties)

Agents can be classified in several different ways; Figure 2.4 shows the classification that Nwana presented. In the following, we describe some typical kinds of agents:

Intelligent agents: agents that have the ability to sense the environment and reconfigure themselves in response, or that can learn by trial-and-error.

Autonomous agents: agents that are self-contained and capable of making independent decisions, taking actions to satisfy internal goals based upon their perceived environment.

Distributed agents: agents that work together while being very loosely coupled. They are particularly easy to implement in a distributed fashion and should scale well.

Multi-agent systems: several agents that interact form a multi-agent system. Each agent does not have all the data available and must cooperate with other agents; in addition, there may be little or no global control.

Mobile agents: agents that can move themselves, including their execution state, onto another processor and continue their execution there.

The framework used for implementing agents in this project is JADE (Java Agent DEvelopment Framework) [2]. It is a software framework fully implemented in the Java language that simplifies the implementation of multi-agent systems through a middleware that complies with the FIPA specifications. JADE makes it easy to send messages, to specify the communication protocol between agents and to make them interoperate. JADE agents are implemented as a single thread which is scheduled between the agent core and the different behaviors. The framework provides a wide number of default behaviors that carry out common tasks, such as: doing a task once, doing a task after some time, doing a task periodically, receiving a message and sending a message.

In our project, each entity (the Scheduler and the Virtualization Managers) is implemented as an agent. This provides a flexible way for the entities to communicate and lets them interoperate with no extra effort. Each agent can adapt itself if it detects changes and can interact with the others in order to solve any problem. For instance, the VtM periodically monitors the state of its jobs and, if it detects any problem, it can ask the Scheduler agent to move a job to another VtM. Using this approach, each agent has good knowledge of its local environment and cooperates with the others to manage the global service provider in an easy way.
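As a small illustration of this agent-based design, the following sketch shows how a VtM-like JADE agent could periodically check the job running in its VM and ask the Scheduler agent to move it when a problem is detected. The agent name "Scheduler", the message content and the checkJobIsHealthy() helper are illustrative assumptions, not the actual VSP code.

import jade.core.AID;
import jade.core.Agent;
import jade.core.behaviours.TickerBehaviour;
import jade.lang.acl.ACLMessage;

// Minimal sketch of a VtM-style agent: every two seconds it checks the job
// running in its VM and, if something looks wrong, asks the Scheduler agent
// to move the job elsewhere. Names and logic are illustrative only.
public class MonitoringAgent extends Agent {

    @Override
    protected void setup() {
        addBehaviour(new TickerBehaviour(this, 2000) {
            @Override
            protected void onTick() {
                if (!checkJobIsHealthy()) {
                    ACLMessage msg = new ACLMessage(ACLMessage.REQUEST);
                    msg.addReceiver(new AID("Scheduler", AID.ISLOCALNAME));
                    msg.setContent("migrate-job");
                    myAgent.send(msg);
                }
            }
        });
    }

    // Placeholder for the real monitoring logic (e.g. reading XenStore values).
    private boolean checkJobIsHealthy() {
        return true;
    }
}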

2.4 Previous work

The idea of developing a Virtualized Service Provider comes from previous projects. The VSP mainly builds on a prototype designed for the prototyping phase of the BREIN project (Business objective driven REliable and Intelligent grids for real busiNess) [15], a European Commission project whose objective is to bring the e-business concept into recent Grid research. This prototype is called the Semantically-Enhanced Resource Allocator (SERA) [19] (Figure 2.5).

SERA provides a framework for supporting the execution of medium and long running tasks in a service provider, which facilitates the provider's management (thus reducing costs) while fulfilling the QoS agreed with the customers. Resources are assigned depending not only on the requirements of the tasks for fulfilling the SLA terms, but also on the information given by service providers with regard to the level of preference (according to business goals) of their customers. The most important technologies in this approach are agents, semantics and virtualization. In this case, semantics is used to construct descriptions of the tasks and the resources; this enhances the inferences and enables the use of business factors, allowing SERA to dynamically reschedule the tasks in order to meet the requirements of those with higher priority. Tasks are executed in virtual environments managed by the Resource Manager, a previous version of the Virtualization Manager used in the VSP. In addition, the system supports fine-grained dynamic resource distribution among these virtual environments based on SLAs, in order to adapt to the changing resource needs of the tasks and to maximize the service provider's goals. SERA implements an adaptive behavior by means of agents which guarantees each application enough resources to meet the agreed QoS goals.

Figure 2.5: SERA architecture

the virtual environments with supplementary resources, since free resources are also distributed among applications depending on their priority and their resource requirements. The system continuously monitors whether the SLAs of the applications running in the provider are being fulfilled. If any SLA violation is detected, an adaptation process that requests more resources from the provider is started. As mentioned, this prototype makes use of a previous version of the VtM in order to manage the virtual environments where the tasks are executed. This component is described in detail in Section 3.2.1.


Chapter 3

Architecture

This chapter presents the structure and the operation of the Virtualized Service Provider (VSP). The VSP tries to bring all the virtualization advantages previously discussed to traditional service providers by introducing a smart resource management that takes into account the economic parameters resulting from the SLA agreed with the user and an estimation of how many resources the application will need. This SLA can specify different parameters, such as how much the user must pay if the service is completed and the penalties incurred if the SLA is violated. As in a typical service provider, the system is composed of a single Scheduler and multiple nodes that are managed by Virtualization Managers, which are in charge of creating the virtual machines where the tasks are executed (Figure 3.1).

Figure 3.1: Virtualized Service Provider architecture

The client interacts with the Scheduler in order to execute a task inside the VSP. This interaction can be done using a Web-Service interface or using XML-RPC, which enables any customer, from a Windows user to a UNIX one, to use the system. In order to execute a task, the user has to specify the application he wants to execute and the data that the application needs.

In addition, the user can provide extra information, such as an SLA specifying the expected level of service and the penalties incurred if this contract is violated. Using this information, the Scheduler deduces the amount of resources that the task will need. Currently, this inference is done in a very basic way, and extending it is part of our future work. The main advantage of this system in comparison with a traditional provider is that it provides a global solution for executing tasks inside the virtualized service provider. This solution allows creating customized environments to execute tasks on demand, taking into account what the user needs. It also provides a global resource management that allows assigning resources both globally (assigning a task to a given node) and locally (assigning a certain amount of CPU to a given application in a node). Finally, it provides a data management system that allows any application to access the data it needs wherever the task is located. This chapter first explains in detail the path a task follows when it is executed in the VSP, in order to introduce the purpose of each component. Next, it presents how the VtM manages the virtual machines where the tasks are executed. As the main part, it presents the different policies followed to distribute the resources among the different tasks. Once each component has been explained in detail, the global system interaction is presented, taking into account the details previously described for each component. Finally, it explains how the data that a task needs is managed inside the system.
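As an illustration of the XML-RPC path, the following sketch submits a task description to the Scheduler using the Apache XML-RPC client library. The endpoint URL, the remote method name ("Scheduler.submitTask") and the parameter layout are assumptions made for this example; the actual interface exposed by the Scheduler may differ.

import java.net.URL;
import java.util.HashMap;
import java.util.Map;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class SubmitTaskExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the Scheduler's XML-RPC endpoint (hypothetical URL).
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://scheduler.example.org:8080/xmlrpc"));
        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Task description: application to run, data location and SLA hints.
        Map<String, Object> task = new HashMap<String, Object>();
        task.put("application", "mencoder");
        task.put("inputData", "ftp://repository.example.org/user1/video.avi");
        task.put("slaCpuPercent", 100);
        task.put("slaPenaltyEuro", 5.0);

        // "Scheduler.submitTask" is an assumed method name for this sketch.
        Object taskId = client.execute("Scheduler.submitTask", new Object[]{ task });
        System.out.println("Task accepted with id: " + taskId);
    }
}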

3.1 Task life-cycle

This section gives a small overview of how a task is executed inside the VSP and the steps it follows before and after its execution. This is helpful in order to understand how the components interact with each other and gives a global vision before each component is described in detail. First of all, when we talk about a task we do not refer to just a single process, but to a task that can be composed of multiple applications that interact with each other in order to obtain a certain result or offer a service. For instance, a web site deployed in the system can be composed of a web server, an application server and a database. All the needed applications are packed in a virtual machine, and we work with the idea of one task per VM. Hence, a task in this environment is the set of applications needed to carry out a certain work, together with the whole environment needed to execute it, packed inside the VM. Figure 3.2 shows the interaction of the components from the moment the task is submitted until it is executed. Firstly (1), the client sends the task and its description, in terms of requirements and economic parameters, to the Scheduler. The client also has to send the data needed by the task to the data repository. Secondly, the Scheduler decides which VtM will initially contain the VM that will execute the task and sends the task to this VtM (2). Once the VtM has been notified to execute the task and has all the needed information, it creates the virtual machine and a "TaskAgent" (3). This TaskAgent is in charge of executing the task once the VM is available to execute jobs (4). It then periodically informs the Scheduler about the status of the task (5). Once the task has finished or the previously agreed time has been consumed, the Scheduler notifies the VtM that it has to finish the VM that has executed the task (6). When the VtM has destroyed the VM (7), it copies the output data to the data repository so that the user can access it and recover his results (8).

Figure 3.2: Task life-cycle

The next sections present more details of each step and how the resource management for a task is performed during this life-cycle.

3.2 Virtual Machines management

The Virtualization Manager (VtM) is mainly in charge of creating and managing the Virtual Machines where the tasks are executed. In addition, it performs the local resource management presented in Section 3.3.3. As the component that manages VMs, the VtM can be seen as a machine wrapper that allows creating new customized virtual environments for executing tasks, making the data needed by the user available inside the virtual machine, executing tasks on it, monitoring their execution and, finally, extracting the results and destroying the virtual environment. When a new request for executing a task arrives from the Scheduler, the VtM checks whether it is possible to create a new VM with the specified features and answers with the success or failure of the virtual machine creation. The configuration parameters specified by the Scheduler include the minimum resource requirements of the virtual execution environment and the priority of the task that will be placed in the VM. Taking this information into account, the VtM distributes the whole resources of the machine among the Virtual Machines being executed in that node. As explained before, it uses the Xen hypervisor to provide virtualization due to its performance and its resource management capabilities.

3.2.1 Virtual Machine creation and destruction

Once the VtM has been registered into the Scheduler, it waits for requests from the Scheduler to execute a task; if executing the task is possible, it creates a VM to run it and informs the Scheduler of the success. One of the biggest issues in the management of a VM is its disk space. A VM has a home space that is available to the user in order to store the needed information, the base system and the swap space. In addition, it can also contain extra applications needed by the client (e.g. a Tomcat server), which are located in an application image that is automatically mounted in the VM. One of the default applications present in this application image is Globus, which enables executing and monitoring tasks.

Each time the Scheduler requests the creation of a new VM, the following sequence is performed: downloading and creating the main system (a Debian Lenny through debootstrap), copying the extra needed applications or data, creating the home directories and the swap space, setting up the whole environment, packing it in an image and starting it up. From this description, one can see that this process can have two bottlenecks: the network (for downloading the whole system) and the disk (for copying applications and creating system images, approx. 1 GB of data). Creating a cached image that contains all the required files easily solves the first bottleneck. Nevertheless, the creation time can be reduced much more if a default image with no settings is created. Having this default image implies copying it for each new virtual machine, which increases the second bottleneck, the disk. This has been solved by adding a second caching level that periodically copies the default image and the application images to a cache space. Finally, the VtM moves these images (just an i-node change) to their final location when a new VM is created. Thanks to these caching techniques, the downloading time has been reduced to zero and the whole machine creation time has been reduced from up to 40 seconds to an average of 6 seconds. More details about creation times are shown in the evaluation chapter. When the client has finished the task, or the Scheduler decides that the task should be rescheduled or canceled, it is necessary to destroy the VM. At the moment the virtual machine is destroyed, the output data is stored in an external data repository where the user can access it and take the results.
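The caching trick relies on the fact that moving a file within the same filesystem only updates the i-node, so a pre-copied image can be handed to a new VM almost instantly. The sketch below illustrates this idea with java.nio.file; the directory layout and file names are assumptions for the example (the actual VM creation scripts are written in Bash).

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ImageCache {
    // Hypothetical paths; cache and destination must live on the same filesystem.
    private static final Path CACHE_DIR = Paths.get("/var/lib/vsp/cache");
    private static final Path VM_DIR    = Paths.get("/var/lib/vsp/domains");

    // Background refill: copy the default image into the cache (slow, done ahead of time).
    public static void refillCache(Path defaultImage, String cachedName) throws IOException {
        Files.copy(defaultImage, CACHE_DIR.resolve(cachedName),
                   StandardCopyOption.REPLACE_EXISTING);
    }

    // VM creation: hand a cached image to the new VM with a rename (an i-node change only).
    public static Path claimImage(String cachedName, String vmName) throws IOException {
        Path source = CACHE_DIR.resolve(cachedName);
        Path target = VM_DIR.resolve(vmName + ".img");
        return Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        refillCache(Paths.get("/var/lib/vsp/base/debian-lenny.img"), "base-0.img");
        Path image = claimImage("base-0.img", "vm42");
        System.out.println("Image ready at " + image);
    }
}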

3.2.2 Task execution

The VtM executes tasks inside the VM and monitors their state in order to control their life-cycle using the Globus Toolkit [20]. This implies having Globus installed in each VM, as previously mentioned. The VM must generate the certificates needed to execute tasks remotely and start Globus when the VM system boots. Creating certificates each time a VM starts greatly depends on the system clock, because the certificates can be detected as not yet valid or as already expired. For this reason, ntpdate is run in the underlying machine and inside each virtual machine in order to keep the clocks of all the components synchronized. The biggest problem in synchronizing clocks is setting the correct date inside the virtualized environment, because it highly depends on the underlying Xen, and migrating a VM from one machine to another can introduce a delay. This has been solved by decoupling the VM clock from the clock that it detects from the hypervisor. Thanks to this methodology, tasks can be executed and migrated between machines without any problem.

3.3 Resource management

The Virtualized Service Provider has a global resource management distributed into two different levels that are coordinated in order to fulfill the task requirements in a dynamic way, taking into account the behavior of each task during its execution. In the first level, the Scheduler decides the node where each task will be executed. Thanks to virtualization, a task can be paused or moved to another node if needed. Using these features, we improve the typical scheduler concept by allowing the Scheduler to decide where a running task has to be rescheduled or whether it has to be stopped.

In the second level, each Virtualization Manager distributes resources among the tasks contained in a single node. It also monitors the resource consumption of each task and checks whether its SLA is satisfied. Using the virtualization abstraction, it can redistribute these resources to solve problems. These two management levels work individually, but they also work as a whole, allowing tasks to be moved when the local level cannot achieve its targets.

3.3.1 Global resource management

Global resource management mainly consists in scheduling tasks on certain nodes. This scheduling is performed every time a new task arrives or when an already running task needs more resources and this cannot be solved locally. Knowing the real status of the whole system, including local resource distribution changes, is a key issue in order to perform a good task scheduling among nodes. For this reason, each VtM informs the Scheduler about the resources assigned to each virtual machine every time a change in the allocation occurs. In the first case, when a task arrives to the system, it contains a description of itself and a description of the SLA agreed with the user. The initial application requirements are extracted from this information and are used to perform the task scheduling. Once the Scheduler knows the task requirements, it sends the new task to a machine that has enough resources to satisfy them, following different scheduling policies. If this simple scheduling cannot be carried out, it uses the task information and the global system status to calculate the utility of the different possibilities, such as migrating or pausing tasks to execute the new task in their place, and chooses the best one according to different parameters; a simple sketch of this decision logic is shown after Figure 3.3. In the second case, if the Scheduler is informed that a task needs more resources and these needs cannot be solved locally, it reschedules the task performing the same operations as in the new task arrival case. Once we have presented an overview of the Scheduler functionalities, Figure 3.3 shows the architecture of the VSP Scheduler. The Scheduler follows a cyclic behavior that periodically checks the global system state. Nevertheless, it can receive events that immediately force it to recalculate how the currently executing jobs are scheduled among the different nodes when an "SLA violation" occurs. This situation happens when a VtM does not have enough resources in its node to satisfy the task needs. The Scheduler cycle time indicates the start-time granularity of the jobs. Each cycle generates a snapshot of the system with the location of each task, and this is the global system scheduling until the next iteration. The cost of deploying and migrating a virtual machine must be considered when the scheduling cycle occurs, in order to avoid too much overhead in the system. This mechanism guarantees the maximum utility (defined by the policy) for each cycle time slice. As we do not know the incoming workload, the only way to ensure a global utility maximum would be the use of prediction mechanisms, which are planned as future work. The power of this Scheduler architecture relies on its capability of being modified in order to support different scheduling policies. We can change a policy that tends to put all the tasks in the same node in order to maximize its usage for another one that tries to balance the load among all the nodes. These policies can take into account the overhead of migrating a task between different nodes or the economic parameters of the task.

Figure 3.3: Scheduler architecture

In this project we only focus on simple policies in order to test the capabilities of the Virtualization Manager and to provide a strong infrastructure; implementing and testing new policies will be part of our future work.
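As a rough illustration of the decision described above, the sketch below first looks for a node with enough free capacity and, failing that, scores the alternative actions (migrating or pausing a running task) with a utility value. The Node class, the utility weights and the cost estimates are assumptions made for this example and do not correspond to a concrete policy of the VSP.

import java.util.ArrayList;
import java.util.List;

public class SimpleScheduler {

    static class Node {
        final String name;
        final int freeCpuPercent;
        Node(String name, int freeCpuPercent) { this.name = name; this.freeCpuPercent = freeCpuPercent; }
    }

    /** Returns a textual decision for a task needing 'requiredCpu' % of CPU. */
    public static String schedule(List<Node> nodes, int requiredCpu,
                                  double taskRevenue, double migrationCost, double pauseCost) {
        // 1) Simple case: pick the node with the most free CPU that can host the task.
        Node best = null;
        for (Node n : nodes) {
            if (n.freeCpuPercent >= requiredCpu
                    && (best == null || n.freeCpuPercent > best.freeCpuPercent)) {
                best = n;
            }
        }
        if (best != null) {
            return "run on " + best.name;
        }

        // 2) Otherwise compare the utility of the alternatives
        //    (illustrative utility = expected revenue minus the cost of the action).
        double migrateUtility = taskRevenue - migrationCost; // free a slot by migrating a task
        double pauseUtility   = taskRevenue - pauseCost;     // free a slot by pausing a task
        double queueUtility   = 0.0;                         // keep the task waiting

        if (migrateUtility >= pauseUtility && migrateUtility > queueUtility) {
            return "migrate a running task and use its slot";
        } else if (pauseUtility > queueUtility) {
            return "pause a lower-priority task";
        }
        return "keep the task queued";
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<Node>();
        nodes.add(new Node("pctinet", 60));
        nodes.add(new Node("pcperot", 40));
        System.out.println(schedule(nodes, 100, 10.0, 3.0, 6.0));
    }
}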

3.3.2 Interaction with the VtM

This section presents the interaction between the main components of the VSP, the Scheduler and the Virtualization Manager, and how they cooperate in order to distribute the service provider resources among the tasks in a global way (placing a task on a certain node) and in a local way (distributing resources among the tasks of the same node). The components of the VSP are wrapped using the agent abstraction; hence, the interoperation between the components is also done using this abstraction. This allows separating each functionality in an intuitive way and communicating the entities in an easy way. Agents are used to coordinate the different components, and to monitor and react to possible undesired events, such as SLA violations or failures, providing the system with self-adaptive capabilities. Following this methodology, the Scheduler is wrapped in an agent that exchanges messages with the VtMs, which are also wrapped in agents and represent the machines that will contain the VMs. The VtM agent also creates the agent in charge of managing the task execution, the "TaskAgent". Figure 3.4 shows the interaction between the Scheduler and the VtM agents; the figure also shows the different behaviors of the VtM and the TaskAgent in charge of managing the task execution. This interaction starts when a VtM is started: it initially registers itself in the Scheduler in order to be used by it and, once registered, it starts waiting to execute tasks. When the Scheduler receives a petition from the user for executing a task, it has to decide where this task should be executed according to its internal policies.

Figure 3.4: Interaction between Scheduler and VtM

Then it sends a message to the selected VtM, asking it to execute the task inside a virtual machine with the given features. When the VtM receives the request, it creates the VM and waits until the machine is available to execute the task. As soon as the VM is ready, the VtM creates an agent called "TaskAgent", which manages the task execution, while the VtM itself keeps waiting for new tasks and managing the other task executions. The TaskAgent is in charge of launching the task; once the task is running, it monitors its status periodically and notifies the Scheduler whenever the task status changes. Finally, when the task has finished, the TaskAgent notifies the Scheduler, and the Scheduler replies to the VtM that the VM containing the task can be destroyed. The VtM behavior that waits for new tasks also waits for messages from the Scheduler asking to destroy, migrate or pause tasks that are being executed in this VtM. In order to perform the resource management, the VtM periodically sends the machine status to the Scheduler, specifying the resources that each task needs, so that the Scheduler can decide whether to move tasks between nodes or pause them. These messages are also sent when an internal SLA violation is detected and it cannot be solved by assigning more resources to the task; in that case, the Scheduler must re-schedule the task in order to solve the violation.

3.3.3 Local resource management

Once a task has been assigned to a given node managed by a Virtualization Manager, the VtM is in charge of creating and managing the virtual machine where the task will be executed. However, it is also in charge of managing the task execution, which includes monitoring it in order to fulfill its requirements and assigning more resources to it if necessary.

The VtM is responsible for distributing the physical resources among the VMs that are being executed on a single node. Its goal is to maximize the physical resource utilization while respecting the SLAs agreed with the users. Figure 3.5 shows the architecture of the VtM. It is composed of two main parts: the first one is in charge of managing the virtual machines that execute the tasks (previously presented in Section 3.2.1), and the second one manages the resource assignment to each virtual machine. This resource management is based on the monitoring of the tasks and of the resources assigned to each one.

Figure 3.5: Virtualization Manager architecture

The first resource management phase is composed of the "VtM Monitor" and the "Resource Calculator", which are in charge of estimating the resources that each task needs and of monitoring whether the task is fulfilling its SLA. The second phase, composed of the "Resource Assigner", is in charge of assigning resources based on the estimation obtained by the first phase, maximizing machine utilization by assigning the machine resources to the tasks without leaving surplus resources unassigned, and thus maximizing the global system usage.

3.3.3.1 Resource monitoring

The VtM Monitor takes care of monitoring the execution of each task, and the Resource Calculator estimates the amount of resources that the task really needs in order to assign resources to it. Figure 3.6 shows an application running in a VM managed by the VtM, with no other tasks being executed in this node. In the figure, the "current" usage is the resource usage that the application is performing over time, which is recorded by the VtM Monitor using the XenMonitor. This usage is obtained from outside the virtual machine for two reasons: first, monitoring from inside the VM would charge the cost of the monitoring itself (in CPU and memory) to the customer; second, measures taken inside the VM are not real measures, since the only way to obtain real measures is asking the hypervisor. This problem manifests itself, for instance, when there is only one real CPU and two VMs share it while both try to consume 100% of the CPU. In this case, we get two different measures of the CPU consumption depending on where we measure.

Figure 3.6: Virtualization Manager resource usage

Measuring from outside, we would get a measure of 50% of the CPU for each VM, but measuring inside the VM we would get a highly oscillating measure between 20% and 100%; this difference between measuring inside and outside is shown in Section 4.2. Next, the "estimated" usage corresponds to the value that the Scheduler has inferred from the SLA agreed with the user before the execution. Finally, the "calculated" usage is the usage that the "Resource Calculator" estimates the application will really need to be executed; it is initially based on the estimation done by the Scheduler and varies depending on the real usage of the application. In order to make the Scheduler aware of the real requirements of each task, this value is periodically reported to the Scheduler. Algorithm 1 shows how the calculated value is obtained once a considerable number of usage values has been recorded; until then, the estimation provided by the customer is used. This algorithm has three different phases: the high increase phase tries to avoid assigning too many resources to a given application in a short period of time; the normal increase phase provides resources to the application immediately; finally, the decrease phase is applied slowly in order not to subtract too many resources immediately, waiting until the decrease is confirmed. Once it is detected that an application does not need its assigned resources, three different values are used to calculate the estimation and reduce the assignment slowly: the mean of the last 5 values, the 1-minute mean and the total mean.

3.3.3.2 Resource assignment

Once the resource estimation has been performed and the calculated resource usage has been obtained, this value is used by the "Resource Assigner" as the base amount of resources to be assigned to the VM. If the sum of the required resources is lower than the capacity of the node, the system would become inefficient by having surplus resources that are not assigned to any task. For this reason, the surplus resources are redistributed among the VMs according to a dynamic priority. This priority initially corresponds to the priority set by the Scheduler, and it is dynamically modified in order to apply for more resources (by increasing the VM priority) if the internal SLA is violated. The amount of resources assigned to a VM is calculated using the following formula:

Algorithm 1 Calculate resource needs for a VM
 1: history = a list containing the recorded resource usages
 2: calculated = last calculated value
 3: decreasing = 0
 4: if last(history) > 1.1 · calculated then  {high increase}
 5:     calculated = max(mean(history, 5), calculated)
 6:     decreasing = 0
 7: else if last(history) > calculated then  {normal increase}
 8:     calculated = last(history)
 9:     decreasing = 0
10: else if calculated > mean(history, 5) then  {checking the decrease phase}
11:     decreasing += 0.5 + calculated / mean(history, 5)
12: end if
13: if decreasing > 200 then  {decrease}
14:     aux = max(mean(history, 5), mean(history, 60), mean(history))
15:     if required > aux then
16:         aux = mean(required, aux)
17:     end if
18:     calculated = mean(calculated, aux)
19:     decreasing /= 2
20: end if
21: return calculated
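A direct transcription of Algorithm 1 into Java is sketched below, assuming a history of CPU usage samples expressed as percentages and defining mean(history, n) as the average of the last n samples. The class and variable names are chosen for the example and do not mirror the actual Resource Calculator code.

import java.util.List;

public class ResourceCalculator {

    private double calculated;      // last calculated value
    private double decreasing = 0;  // counter driving the slow decrease phase
    private final double required;  // resources initially requested by the customer

    public ResourceCalculator(double initialEstimation, double required) {
        this.calculated = initialEstimation;
        this.required = required;
    }

    /** Updates and returns the calculated resource need from the recorded usage history. */
    public double update(List<Double> history) {
        double last = history.get(history.size() - 1);

        if (last > 1.1 * calculated) {                 // high increase
            calculated = Math.max(mean(history, 5), calculated);
            decreasing = 0;
        } else if (last > calculated) {                // normal increase
            calculated = last;
            decreasing = 0;
        } else if (calculated > mean(history, 5)) {    // candidate decrease
            decreasing += 0.5 + calculated / mean(history, 5);
        }

        if (decreasing > 200) {                        // confirmed decrease: reduce slowly
            double aux = Math.max(mean(history, 5),
                         Math.max(mean(history, 60), mean(history, history.size())));
            if (required > aux) {
                aux = (required + aux) / 2;
            }
            calculated = (calculated + aux) / 2;
            decreasing /= 2;
        }
        return calculated;
    }

    /** Mean of the last n samples (or of all samples if fewer are available). */
    private static double mean(List<Double> history, int n) {
        int from = Math.max(0, history.size() - n);
        double sum = 0;
        for (int i = from; i < history.size(); i++) {
            sum += history.get(i);
        }
        return sum / (history.size() - from);
    }
}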

Figure 3.7: Surplus resource distribution

R_{assigned}(i) = R_{calculated}(i) + \frac{p_i}{\sum_{j=0}^{N} p_j} \cdot R_{surplus}

where p_i is the priority of client i. Every time the usage requirements of a VM change or the dynamic priorities are modified, the VtM recalculates the resource assignment of the VMs and, finally, the "Resource Assigner" binds these resources to each VM (Figure 3.7). The resource binding in a node is a key issue in the local resource management; Xen provides different mechanisms for assigning resources to a VM. CPU management is straightforward using the Xen credit scheduler. This policy allows specifying the maximum amount of CPU assigned to a VM by defining scheduling priorities. For example, in a platform with 4 CPUs (i.e. 400% of CPU capacity) and two VMs, one with a priority of 6 and the other with a priority of 4, the first could take at most 240% of CPU, while the other could take at most the remaining 160%. The scheduling priorities can be set using the XenStat API.
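The priority-weighted redistribution of the surplus can be written down directly from the formula above; the following fragment is a minimal sketch of that computation (the VmShare descriptor and its field names are assumptions for the example). With the 4-CPU scenario just described, it reproduces the 240%/160% split.

import java.util.Arrays;
import java.util.List;

public class SurplusDistributor {

    static class VmShare {
        final String name;
        final double calculated;  // R_calculated(i), in % of CPU
        final double priority;    // p_i, dynamic priority
        double assigned;          // R_assigned(i), filled in below
        VmShare(String name, double calculated, double priority) {
            this.name = name; this.calculated = calculated; this.priority = priority;
        }
    }

    /** Distributes the node capacity: base usage plus a priority-weighted share of the surplus. */
    public static void distribute(List<VmShare> vms, double nodeCapacity) {
        double usedByAll = 0, prioritySum = 0;
        for (VmShare vm : vms) {
            usedByAll += vm.calculated;
            prioritySum += vm.priority;
        }
        double surplus = Math.max(0, nodeCapacity - usedByAll);
        for (VmShare vm : vms) {
            vm.assigned = vm.calculated + (vm.priority / prioritySum) * surplus;
        }
    }

    public static void main(String[] args) {
        List<VmShare> vms = Arrays.asList(new VmShare("vm1", 120, 6), new VmShare("vm2", 80, 4));
        distribute(vms, 400); // 4 CPUs = 400% of CPU capacity
        for (VmShare vm : vms) {
            System.out.println(vm.name + " -> " + vm.assigned + "% CPU");
        }
    }
}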

On the other side, there are some limitations for the dynamic memory management of VMs. In Linux systems, the mapping of physical memory is done at boot time. Once the guest system has booted, if the amount of memory allocated to the VM is reduced, the guest system adapts to this reduction automatically. Nevertheless, when assigning to the VM more memory than the amount initially detected by the guest system, Linux does not make it available to the user; it would be necessary to restart the guest system to make all this memory available. In order to overcome this limitation, the VtM creates all the VMs with the maximum possible amount of memory and then reduces the amount of allocated memory to the value indicated by the Scheduler.
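A hedged sketch of this workaround: the VM is defined with the maximum memory it may ever receive, and the balloon is then deflated to the value decided by the Scheduler by invoking the Xen 3 xm toolstack. The domain name, the target value and the use of ProcessBuilder are assumptions for the example; the real VtM may drive Xen through its own bindings.

import java.io.IOException;

public class MemoryBalloon {

    /**
     * Shrinks the memory allocation of a running domain with "xm mem-set".
     * The domain must have been created with a maxmem high enough to allow
     * growing it again later without a reboot.
     */
    public static void setMemory(String domainName, int megabytes)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("xm", "mem-set", domainName, Integer.toString(megabytes))
                .inheritIO()
                .start();
        if (p.waitFor() != 0) {
            throw new IOException("xm mem-set failed for domain " + domainName);
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: the Scheduler decided this VM only needs 1 GB for now.
        setMemory("vm42", 1024);
    }
}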

3.3.4 SLA fulfillment

This resource management also supports SLA enforcement by extending the common operation of the two-level resource management. It is important to bear in mind that our proposal is supposed to be executed internally in the service provider and, for the moment, it does not include any SLA negotiation procedure: it assumes that the Scheduler and the customer have already negotiated the SLA and that it is available to the Scheduler. The local resource management uses the internal VtM Monitor (Section 3.3.3.1) in order to check whether the SLA is being fulfilled. The SLA used internally by the "SLA Evaluator" is more restrictive than the one agreed with the user, in order to avoid real violations by acting before real penalties actually happen. The level of restriction of the internal SLA depends on the metric and its type; for instance, the CPU level is decreased by 10% and the number of requests per second is increased by 5%. If the dynamic local resource management cannot assign enough resources to avoid a violation, the Scheduler is requested to give more resources to the task. Figure 3.8 shows the different resource usage levels of a VM that shares a physical machine with others. For simplicity, it only shows the resource information of a single task. The figure shows the real SLA and the internal SLA used inside the VtM in order to act before a real violation occurs.
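The internal thresholds can be derived mechanically from the agreed values; the tightening factors below (10% for CPU, 5% for the request rate) are the ones quoted above, while the class and method names are only illustrative.

public class InternalSla {

    /** Internal CPU threshold: act 10% below the agreed CPU level. */
    public static double internalCpuLevel(double agreedCpuPercent) {
        return agreedCpuPercent * 0.90;
    }

    /** Internal throughput threshold: require 5% more requests/s than agreed. */
    public static double internalRequestRate(double agreedRequestsPerSecond) {
        return agreedRequestsPerSecond * 1.05;
    }

    public static void main(String[] args) {
        System.out.println("CPU threshold: " + internalCpuLevel(100) + "%");
        System.out.println("Request-rate threshold: " + internalRequestRate(200) + " req/s");
    }
}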

Figure 3.8: Virtualization Manager resource levels

The graphic shows how the amount of resources assigned to the virtual machine that contains the task varies over time according to its usage (the calculated resource usage) and to the amount of resources that are not being used by any task (the surplus resources).

This situation happens in the first phase of the figure, where surplus resources exist and are assigned to the task according to the formulas previously presented and following the dynamic priority system. If the sum of all the calculated resource usages inside a single node is higher than the capacity of that node (second part of the figure), the VtM informs the Scheduler of the situation, and the Scheduler must re-schedule the task in order to provide the required resources. The figure also shows when the system detects that the task is violating the SLA. The system detects that an application is violating the SLA if its usage is greater than the internal SLA. Nevertheless, the SLA Evaluator must take into account whether the application is actually using the previously assigned resources; otherwise, it is considered a soft SLA violation and there is no need to take any measure. This is because a virtual machine that is executing a task but not using its assigned resources could otherwise be wrongly detected as a violation.

3.3.4.1 SLA enforcement

Initially, the agreed SLA is attached to the task execution request that arrives at the VtM as an XML file that follows the WSLA standard (presented in the Background Section 2.2). This SLA is assigned to an SLA Evaluator that checks it periodically. The SLA enforcement cycle (Figure 3.9) starts with the monitoring of the resource usage, which is performed every second. This forces the minimum granularity of the interval between two consecutive measures for each metric in the SLA to also be 1 second.

Figure 3.9: SLA enforcement cycle

In each cycle, the Direct Measurer component gets the values from the VtM Monitor, controlling the measurement intervals for each metric in the SLA and ensuring that the measures are refreshed at the correct time. When the new values arrive at the Measurer Sched component, it checks whether any metric has updated its value. In this case, the Measurer Sched recalculates the top-level metric defined in the SLA, compares the result with the internal SLA value and decides whether the SLA is being violated. If the SLA is fulfilled, the Measurer Sched waits until the next iteration; otherwise, the SLA violation protocol starts.

The first step in the SLA violation protocol is requesting more resources from the Resource Assigner. If there are unused resources that can be assigned to that VM, the VtM redistributes the resources as described in Section 3.3.3.2 by increasing the dynamic priority, and the SLA cycle starts again. This priority initially corresponds to the priority set by the Scheduler and is dynamically modified in order to apply for more resources. Finally, if all the physical resources are already allocated, the VtM notifies the Scheduler of the situation in order to reschedule the task. Using this adaptation mechanism, the system is able to manage itself, avoiding SLA violations as much as possible. Of course, more intelligence could be applied to this part, for instance by improving the economic policies of the Scheduler in order to obtain a better task allocation in terms of profit, or by taking into account the penalties for violating the SLAs.
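Putting the cycle together, a simplified enforcement loop could look like the sketch below: sample the metric every second, compare it against the internal threshold, first try to obtain resources locally by raising the dynamic priority, and only then escalate to the Scheduler. All the interfaces shown (MetricSource, ResourceAssigner, SchedulerClient) are invented for this example.

public class SlaEnforcementLoop implements Runnable {

    // Illustrative collaborators; the real components are the VtM Monitor,
    // the Resource Assigner and the Scheduler agent.
    interface MetricSource     { double currentCpuUsage(); }
    interface ResourceAssigner { boolean tryRaisePriority(String vm); }
    interface SchedulerClient  { void reportUnsolvableViolation(String vm); }

    private final String vmName;
    private final double internalCpuThreshold;   // more restrictive than the agreed SLA
    private final MetricSource monitor;
    private final ResourceAssigner assigner;
    private final SchedulerClient scheduler;

    public SlaEnforcementLoop(String vmName, double internalCpuThreshold,
                              MetricSource monitor, ResourceAssigner assigner,
                              SchedulerClient scheduler) {
        this.vmName = vmName;
        this.internalCpuThreshold = internalCpuThreshold;
        this.monitor = monitor;
        this.assigner = assigner;
        this.scheduler = scheduler;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            double usage = monitor.currentCpuUsage();
            if (usage > internalCpuThreshold) {
                // Internal SLA violated: try to solve it locally first.
                boolean solvedLocally = assigner.tryRaisePriority(vmName);
                if (!solvedLocally) {
                    // No free resources left on the node: escalate to the Scheduler.
                    scheduler.reportUnsolvableViolation(vmName);
                }
            }
            try {
                Thread.sleep(1000);   // 1-second monitoring granularity
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}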

3.4 Data management

Virtualization provides the capability of moving virtual machines between nodes, that is, migration. It implies creating a snapshot of the memory and the disk and moving it from one node to another. Nevertheless, providing live migration also implies making the disk available on both nodes. Projects like [25] and [42] propose using an NFS server that provides the images to every node. However, our proposal focuses on a distributed view, where every node generates and manages its own images. For this reason, we propose using an NFS system distributed among the nodes; in other words, each node has an NFS server that can be accessed from all the other nodes, as shown in Figure 3.10. Thanks to this technique, the VtM works on a local disk in the creation phase, the images are locally available in most cases, and they are accessed remotely by the other nodes when needed.

Figure 3.10: Data management: distributed shared filesystem

Applications usually need some data as input in order to be executed, and they finally generate an output; for example, a video encoder needs an input video file and generates an encoded video file. This data is stored in a disk image that belongs to each customer and contains the home space of each virtual machine; hence, this data is available in the user's home. Since the tasks in the VSP are executed among the nodes and can be migrated, the home space also needs to be shared between the different nodes. For this reason, a server that contains the user images of each customer is present in the VSP and makes them available to the VMs using NFS. This server also provides a service that is in charge of the data stage-in and, when the task has finished, performs the stage-out. As presented in Section 3.1, Figure 3.11 shows the stage-in of VM 1 and the stage-out process of VM N. First, in the stage-in phase, the user sends the data needed by the application he wants to execute in the VSP to a data repository.

Figure 3.11: Stage-in and stage-out

This transfer can be performed using FTP, SFTP or HTTP. When the virtual machine that will execute the task is being created, the data management service creates a disk image that contains all the data required by the user and puts it in the home space of the machine. Finally, when the task has finished, the service performs the stage-out by extracting the information from the VM image and putting it in the data repository accessible by the user, where the user can recover this information using the same protocols used for the upload. This is just a proof of concept in order to test that the system works; adding more security to the data transfer is part of our future work. Furthermore, this server also stores the image of each user once the task has finished, so that it can be reused if the customer requires it in a new virtual machine, acting as a "home space" repository. Finally, the performance of this solution will be discussed and tested in detail in Chapter 4, in order to evaluate how well it works inside the Virtualized Service Provider.


Chapter 4

Experimentation

The previous chapters have presented a proposal for using virtualization in a service provider and its advantages. This chapter tests the most critical implementation issues. The first part quantifies the cost of introducing virtualization instead of using a traditional system. Next, we perform some experiments in order to measure the time needed to create a new environment for the user. Then, the model introduced to manage data is put under test and, finally, the two-level scheduling is demonstrated experimentally.

4.1 Experimental environment

As previously presented, our proposal provides a new environment for executing applications with different behaviors and needs in a service provider. For this reason, the experiments focus on the execution of two real applications: a batch process and an application server. The first application is a video file encoding with mencoder [31], which requires one CPU, and the amount of resources it is assigned directly determines its execution time. This feature will be useful for testing the scheduling policies. In addition to being a batch application, it performs disk accesses in order to read the original file and write the encoded movie. This disk-intensive application will mainly be used for testing the data management of the VSP. The second application simulates a web server composed of an application server (Apache Tomcat [1]) serving pages through SSL and accessing data on a database server (MySQL [4]). The application is a simulation of eBay, RUBiS (Rice University Bidding System) [38]. As it is composed of different parts, it helps to demonstrate the capacity of the system to deploy complex applications with different requirements. In order to stress the whole system and obtain performance metrics, a modified httperf [22] is used, which simulates several clients accessing the application server with different behaviors from different nodes. Current machines are not easily stressed by a single node making requests; for this reason, multiple nodes are coordinated in order to stress the server. These clients have a variable behavior and generate a variable load over time in order to simulate a typical situation in a commercial web server, with heavy load during the day and low traffic during the night.

Once the applications to be executed and their features have been presented, we describe the hardware used to simulate the service provider. Our experimental testbed consists of a machine that acts as the Scheduler, a Pentium D with two CPUs at 3.2 GHz and 2 GB of RAM, and two 64-bit Intel Xeon machines with 4 processors each that act as Virtualization Managers: one with 3.0 GHz per processor and 10 GB of RAM, and the other at 2.66 GHz with 4 GB of RAM. They run Xen 3.1 and the VtM executes in the Domain-0; a summary of this information can be found in Table 4.1. Finally, different machines are used as clients. All these machines are connected through a Gigabit Ethernet.

Host   Name      Role        Processor type      Processor       Memory
A      pctinet   VtM         64-bit Xeon         4 @ 3.00 GHz    10 GB
B      pcperot   VtM         64-bit Xeon         4 @ 2.66 GHz    4 GB
C      pcsiset   Scheduler   64-bit Pentium D    2 @ 3.20 GHz    2 GB

Table 4.1: Testbed machines

Most of the software is written in Java and runs under a JRE 1.5, except the scripts that manage the creation of the virtual machines, which are written in Bash, and some libraries used for accessing Xen, which are written in C.

4.2 Virtualized overhead

The overhead introduced by Xen has been extensively discussed in works such as [13] or [32]. Nevertheless, we consider that quantifying the overhead introduced for the applications used in the rest of the experiments gives the reader a better idea of how Xen works and introduces some interesting ideas that help to understand the following experiments.

4.2.1 Measurement methodology

Figures 4.1 and 4.2 show the measurement of a mencoder and of a Tomcat execution, respectively, from inside the VM (using sysstat inside the VM) and from outside (using the Xen capabilities from the Domain-0). These figures also show the hypervisor consumption (accounted to the Domain-0), which mainly occurs when I/O operations take place: in the mencoder case, the Domain-0 does not consume much CPU, while in the Tomcat execution it consumes 5% of the CPU used by the VM, due to the I/O scheduling. In addition to this Domain-0 overhead, we can observe how the internal and external measurements differ. In the first case, being jailed in a VM introduces a distortion in the OS vision of the machine, so it cannot give an accurate measurement of the resource usage. In the case of the network-intensive application, the I/O network management and the required routing are accounted to the VM that uses them, hence the big difference between both measurements. For these reasons, we can conclude that the best method to evaluate the resource usage of an application is measuring it from outside the VM using the Xen capabilities.

Figure 4.1: Resource usage measuring of a VM containing mencoder

Figure 4.2: Resource usage measuring of a VM containing Tomcat

4.2.2 Overhead results

Having seen how to measure resource usage in this virtualized environment, we proceed with the measurement of the overhead introduced with respect to a traditional system with no virtualization. In order to evaluate it, the applications are executed in a VM and on the same machine without virtualization. Figure 4.3 shows the execution of a mencoder on the machine with no virtualization and running in a virtual machine created by our system. The metric used to evaluate the mencoder execution is the mean total time. The execution in the traditional environment takes 589.5 seconds, while the virtualized execution takes on average 586.7 ± 3.85 seconds. Taking this standard deviation into account, we can conclude that executing a CPU-intensive task in a paravirtualized environment does not imply a loss of performance.

Figure 4.3: Resource usage of a mencoder

In addition to the overhead introduced by virtualization, Figure 4.3 also shows one of the advantages of using virtualization: easy resource usage monitoring of an application without instrumenting it. For instance, we can monitor the I/O and observe how the mencoder reads the source file from disk and periodically writes the output file. As mentioned, virtualizing I/O introduces a certain penalty; nevertheless, in this case the application bottleneck is the CPU, and the disk accesses are not heavy enough to be noticed as a loss of performance in the task execution time. Figure 4.4 shows the execution of the web server stressed during one hour by several clients. The first execution was performed on the machine without virtualization and the second one inside a virtual machine. Having already discussed the differences in CPU usage of this application in both environments due to the virtualization overhead, we now measure its performance. The main performance metric for this kind of applications is measured in terms of replies per second or finished sessions.

In this sense, both executions show a very similar behavior, with a very similar replies-per-second pattern (as can be seen in the top graphic of each figure).

Figure 4.4: Application server execution, (a) in a traditional environment and (b) in a virtual machine

Table 4.2 shows the performance obtained in three different executions of the application server in a virtualized and in a non-virtualized environment. We can see that the difference in the number of connections, requests and replies is negligible on average. Therefore, the performance obtained by the Tomcat is the same using our approach and a traditional one. Finally, we can conclude that the virtualization technique used in the VSP introduces a very low overhead for applications that only use CPU and memory, but this overhead must be taken into account for I/O-intensive applications, and especially for network applications, which imply routing. Despite this CPU overhead, applications executed inside a virtual machine achieve a similar final performance.

Execution   Connections (VM / No VM)      Requests (VM / No VM)         Replies (VM / No VM)
1           892003 / 890808               1811411 / 1808630             924382 / 922698
2           888186 / 889210               1803185 / 1804658             920221 / 920674
3           891901 / 894164               1810934 / 1815874             924056 / 926494
mean        890696 / 891394 (-0.078%)     1808510 / 1809721 (-0.067%)   922887 / 923289 (-0.044%)

Table 4.2: Tomcat performance, virtualized vs. non-virtualized

4.3 VM Creation Performance

One of the most critical points is the creation of the virtual environment: it implies the reduction of a lot of human effort, although this kind of improvement is hard to quantify. Therefore, this section provides some indicative measurements about the time needed to create a VM and make it usable for a customer's application, and about the benefit of our cache systems for reducing the VM creation time compared with the default approach. As explained in Section 3.2.1, the creation of a system implies downloading and creating a base system with debootstrap. The downloading, packaging and creation of a whole default Debian (first caching level) needs around 150 seconds. This image is created only once and can be reused for every new virtual machine. The second caching level, which consists in pre-copying the default and the software images, takes approximately 1 minute per VM. However, applying the whole caching of each disk image (including the home and swap spaces) needs 13.5 seconds. Using both caching systems, an image can be created in only 2 seconds. The different times needed for the creation of a typical VM are summarized in Table 4.3. Notice that, once the image is ready, it must be loaded, which needs 4 seconds. According to this, the total time needed to have a fully configured system started is less than 7 seconds when taking advantage of the whole caching system.

Action                                        Time
Create and download default system image     152.6"
Create cached base system image               59.1"
Create cached base software image             13.9"
Create image using caching systems             2.3"
Load image                                      4.4"
Total time for running an image                 6.7"

Table 4.3: VM creation times

Nevertheless, this time does not take into account the time needed by the guest operating system to boot and become available to the user. In addition, the time needed for starting the installed software must also be considered. All this time can be appreciated in Figure 4.5, which shows the CPU usage of the system from the VM creation to the VM destruction.

Figure 4.5: One task lifecycle

The job is a mencoder which needs one CPU, but it is allocated 4 of them because there are no other tasks in the system. During phase A, the Domain-0 creates the VM, spending more than one CPU. During phase B, the guest operating system boots (first peak in the CPU usage graph) and then Globus is started, which includes the certificate creation and the deployment of the container (biggest peak in this phase). At this point, the user task can be submitted, and it runs during phase C. Finally, during phase D, the Domain-0 shows the consumption of the VtM when destroying the VM. The results in this figure confirm that the VM creation takes around 6 seconds, while the guest system boot and the Globus start take around 30 seconds. According to this, the full creation of the VM takes around 36 seconds (from the moment the VtM receives the request until the moment the VM is fully functional and the customer can execute any task). Finally, the VM destruction takes 6 seconds. This figure also shows the CPU consumption of the Domain-0 during the whole VM lifetime; notice that it is only remarkable during the creation and destruction of the VM.

4.4 Data management

As previously explained in Section 3.4, the system provides a complete data management that is able to provide an application with all the information it may require, even if the application is moved inside a VM from one node to another. As also presented there, a virtual machine always accesses the local hard disk when it is created, but accesses it remotely after being migrated. For this reason, the most common situation will be using the machine's local disk; nevertheless, executing with a remote disk will also be common, so the loss of performance when accessing the information remotely should be measured and taken into account. The test environment in this section consists of two different hosts, each one acting as an NFS server, connected through a 1 Gigabit network; their hard disk features (extracted with hdparm) are presented in Table 4.4. The first experiment consists in copying a 580 MB file locally and remotely on the two tested systems; Table 4.5 shows the results of the test and how slow the remote case is.

          Cached reads     Buffered disk reads
Host A    3190.15 MB/s     62.01 MB/s
Host B    2848.53 MB/s     58.61 MB/s

Table 4.4: Hard drive features

Observing the results, we can conclude that the hard disk speed is a key issue and that copying a file from a machine to itself is faster than doing it remotely, as was initially expected.

          Disk server: Host A    Disk server: Host B
Host A    4.1"                   21.0"
Host B    11.4"                  11.3"

Table 4.5: Time to copy a file from a disk to another


Sharing the hard disk enables the migration of running applications from one node to another; nevertheless, as we have seen, working remotely implies a certain loss of hard disk performance. In order to evaluate whether working with a remote file system can increase the execution time of an application that uses the disk, a mencoder is executed using its local file system and using a remote NFS server.


Time (seconds)

Figure 4.6: Network and disk usage on a mencoder remote execution Figure 4.6 shows the CPU load and the disk accesses that produces the application (top graphic) and the extra overhead that is generated in order to access to remote data 39

(bottom graphic). We can see how writes are cached and sent periodically to the server while the reads are being made during the whole execution.

Host A Host B

local 583” 640”

remote 584” 640”

Table 4.6: Time to execute mencoder in different environments Comparing execution times (Table 4.6), executing remotely and locally does not mean any difference, both of them take around 10 minutes and 40 seconds in mean. It is because the bottleneck of this application is CPU and working with speeds greater than a fast Ethernet is enough to execute this job.

4.5

VM Migration

As we have previously presented, sharing disks is needed in order to provide live migration of virtual machines between nodes. Once the loss of performance due to the remote disk has been proved to be negligible with our test applications, this section will prove the migration of an application during its execution.

4.5.1

Migration cost

Section 3.3 presents situations when a virtual machine should be migrated. This strategy is performed if the application needs more resources than the available in the initial node or the scheduler considers that is necessary. As it is logical migrating a virtual machine implies a certain overhead, this overhead must be taken into account for deciding if an application should be migrated or not. For this reason, this section will measure how much costs moving an application between different nodes. Figure 4.8 shows the execution of a mencoder that is migrated from Host A to Host B when it has been executed during 300 seconds. The top graphic shows the CPU usage of the VM that contains the application and the Domain-0 of both machines. The bottom graphic shows the network traffic of Host A and B, this graphic has been scaled in order to show the transfer of data during the remote execution and not just the transfer during the migration. In this figure, we can see how in the first phase (when the application is executed locally) there is no network load, nevertheless, when the migration is being done, it transfers the whole memory and sends it to the other node. In this phase, the Domain-0 of both machines works for migrating the machine. Finally, in the last phase, the load corresponds to a remote execution of a mencoder that has been already studied. The most important result of this experiment is the execution time of a migrated execution. Taking into account the results shown in Table 4.6, the execution time obtained in the experiments and the proportion of time in each machine we obtain the Table 4.5.1.

Host A to B Host B to A

Execution time 611” 613”

Expected time 612” 611”

Table 4.7: Time to execute mencoder in a migrated VM at 300 seconds 40

140

Usage at Host B Usage at Host A Usage at migration (Host A) Domain-0 at Host A Domain-0 at Host B

120

CPU (%)

100 80 60 40 20 0 200

Host A TX Host A RX Host B TX Host B RX

KB/s

150

100

50

0

0

100

200

300

400

500

600

Figure 4.7: A mencoder is being migrated during its execution This table shows that the obtained results corresponds to the expected execution time. The small difference in the times can be because of the number of experiments done, and is expected that with a higher number of data, it will be closer. We can conclude, that migrating a CPU intensive application does not imply a loss of performance on its execution. Figure 4.8 also shows that a big amount of data is transferred between the nodes where the VM will be migrated when the migration is being performed. The amount of data is more or less 1200 MB, it corresponds mainly to the migration of the memory (1 gigabyte), the snapshot of the system and the extra information needed to perform the migration.

4.5.2

Migrating applications

As we have already presented, moving applications between nodes open new ways for scheduling purposes. Nevertheless, when we are talking about applications that need to be available from outside, it is needed to evaluate if migrating an application while it is being executed is possible an efficient. Figure 4.8 shows a Tomcat that is being stressed by several clients, the top graphic shows that there is not any stop in the reply rate while the application is being moved. The virtual machine is always available to the clients that try to access to the web page. Although the machine is moved from one physical machine with a MAC address to another with a different one, Xen flushes switch ARP tables in order to change the physical location of the virtual machine. Thanks to this technique we can obtain a 100% uptime in the service. 41

Reply rate (replies/second)

600

Tomcat reply rate

500 400 300 200 100 0

0

200

400

600

800

1000

400

Host A CPU usage VM CPU usage Migrating VM CPU usage

350 300 CPU (%)

1200

250 200 150 100 50 0

0

200

400

600

800

1000

400

Host B CPU usage VM CPU usage

350 300 CPU (%)

1200

250 200 150 100 50 0

0

200

400

600

800

1000

1200

Time (seconds)

Figure 4.8: Migration of a Tomcat during its execution

The two bottom graphics show the CPU usage in the machine that the Tomcat was initially executed (middle graphic) and the bottom one shows the second machine load. The Domain-0 CPU load is mainly the typical of a Tomcat execution, nevertheless, in the moment that the migration is being done (around 30 seconds), the load of the management domain increases. This extra overhead during the migration has to be taken into account by the scheduler as an extra cost in the moment to decide if an application should be migrated. This capability of migrating a running task while it is being executed to another node will be used by the Scheduler in order to provide more resources to a task or consolidate it and it will be tested in the global scheduling section.

4.6 Resource Management

This section evaluates the whole resource management of the VSP. Firstly, it tests how resources are distributed inside a local node using the VtM. Next, it tests how the global scheduling is performed. Finally, resource management based on task SLAs is shown. These experiments focus on the dynamic resource allocation that the VSP performs. Although the system can manage all the resources, these tests focus only on CPU; the same approach can be applied to memory or other resources too. CPU has been chosen because it is more dynamic than memory, and it therefore gives a better view of the resource management and of how the system works.

4.6.1 Local resource management

This section evaluates the management of local resources in a single node, which has been previously presented in Section 3.3.3. All these operations are performed by the Virtualization Manager, which is in charge of monitoring and assigning resources to a job. The key issue in allocating resources to a task inside a machine is estimating the resources that it really needs, which is mainly done by using the calculated resource usage, and redistributing the surplus resources. Firstly, the calculation of the resources that a job needs is tested by executing a single task in a machine and obtaining the amount of resources that it really requires. This calculation is trivial if the application consumes more or less the same amount of resources during its whole execution, which is the case of mencoder. Figure 4.9 shows how the user initially estimated that his application would need 50% of the CPU; nevertheless, the VtM has detected that it really needs one full CPU.


Figure 4.9: Calculated resources of a mencoder with a requirement of 50%

The algorithm for estimating the resources that a task needs becomes fully meaningful when the executed task has a variable resource usage, as is the case of an application server that is stressed by its clients with a load that varies over time. Figure 4.10 shows a Tomcat that is submitted to a VtM with different requirements and executed inside a VM. In the top graphic, where the user does not specify a requirement and leaves all the decisions to the VtM by requesting no resources, the estimation is very close to the real usage of the job. In the middle graphic, the user specifies that his application needs one CPU; the VtM estimates that this job needs roughly 100% of a CPU most of the time, and when the usage grows beyond this value it assigns the extra amount during the required period. Finally, in the bottom graphic, where the user specifies that the task needs two CPUs, the VtM reduces the estimation with respect to the requirement specified by the user. This algorithm tries to give the application as many resources as it needs and reduces them slowly in order to anticipate a possible increase in its needs. According to the experiments, with a monitoring period of 2 seconds, the system needs around 200 seconds to detect a decrease in the resource needs. Finally, it uses several mean values in order to detect increases or decreases, but it always tends to satisfy the task requirements.
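A simplified Python sketch of such an estimator is shown below. It is only illustrative and is not the actual VtM algorithm; the 2-second sampling period and the roughly 200-second window come from the behaviour observed above, and the class and parameter names are assumptions.

from collections import deque

class CpuEstimator:
    """Illustrative usage-based CPU estimator (not the real VtM code)."""

    def __init__(self, requested, window=100):
        # 100 samples taken every 2 seconds cover roughly 200 seconds.
        self.estimate = requested
        self.samples = deque(maxlen=window)

    def update(self, usage):
        # Called every monitoring period with the measured CPU usage (%).
        self.samples.append(usage)
        mean = sum(self.samples) / len(self.samples)
        if usage > self.estimate:
            self.estimate = usage             # follow increases immediately
        else:
            self.estimate = max(mean, usage)  # decrease slowly towards the mean
        return self.estimate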


Figure 4.10: Calculated resources of a Tomcat with different requirements

4.6.2 Global resource management

As stated in Section 3.3.1, the Scheduler can follow different policies in order to schedule a given task to a certain node based on different parameters, such as economics, system utilization or user requirements; nevertheless, designing such policies is out of the scope of this thesis and we just introduce simple policies that test the capabilities of the VSP. In this section, we use a policy that maximizes the utilization of the nodes that were added first, which makes running tasks easy to move and thus allows testing the live migration (a sketch of this placement policy is shown after Table 4.8). As the SLA-based resource distribution will be tested later, this section evaluates tasks which do not have an associated SLA; hence, the resource management is based only on resource requirements and utilization. The test consists of submitting five tasks to a VSP that contains a Scheduler and two Virtualization Managers. Table 4.8 summarizes the features of these tasks, including when they are submitted, how long they run, and the resource requirement inferred by the Scheduler.

Task          1      2      3      4      5
Submit        10”    40”    50”    380”   420”
Duration      100”   –      400”   0”     –
Requirement   300%   150%   100%   100%   100%

Table 4.8: Submitted tasks to the VSP
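The following Python sketch illustrates this fill-the-first-added-node-first placement; it is only a sketch, not the actual Scheduler code, and the node names and the 400% capacity (four CPUs) are assumptions taken from the experimental setup:

def schedule(task_requirement, nodes):
    # 'nodes' is an ordered list (first added node first); the task is placed
    # on the first node where its requirement still fits, so early nodes are
    # kept as full as possible.
    for node in nodes:
        if node["capacity"] - node["reserved"] >= task_requirement:
            node["reserved"] += task_requirement
            return node["name"]
    return None  # no node can host the task right now

# Example with hypothetical figures: Host B is the first added node.
nodes = [{"name": "hostB", "capacity": 400, "reserved": 0},
         {"name": "hostA", "capacity": 400, "reserved": 0}]
print(schedule(300, nodes))   # hostB
print(schedule(150, nodes))   # hostA (only 100% left on hostB)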

Task 1 consumes one CPU during 100 seconds. Task 2 has a variable CPU consumption: during the first hundred seconds it uses one CPU, and after that it starts consuming three CPUs. Task 3 has a very low CPU consumption during 400 seconds and then finishes. Task 4 does not do anything and just implies the creation and destruction of a new virtual environment. Finally, Task 5 consumes as many resources as it has assigned. Figure 4.11 shows the CPU allocation of the two nodes that execute tasks. In addition to the resource allocation, this figure shows the resource consumption of the VtM (running in Domain-0) and the resource usage of each task; the usage will be shown in detail later. Host B is the first added node and, according to the policy, it is the machine that will be filled first. Initially, Task 1 is submitted to the system and it is scheduled on Host B. When Task 2 is submitted, it does not fit in this machine because it needs 150% of CPU and there is only one CPU available in Host B; therefore, it is sent to Host A. At this point, both tasks receive the whole machine they run on.

[Two panels: CPU capacity and usage of each task plus Domain-0 consumption, on Host A (top) and on Host B (bottom).]

Figure 4.11: CPU allocation of nodes between VMs

Around second 30, Task 3 is scheduled on Host B, and the two tasks on this node share the resources according to the requirements of each one, following the resource assignment presented in Section 3.3.3.2. During the creation of these three tasks, the Domain-0 of each node has consumed almost one CPU for a brief period because of the work of the VtM to create them. When Task 1 finishes, Host B has three free CPUs and Task 2 is migrated to this node in order to fill it. A new domain, which we will call Task 2’, contains the task that is being migrated. At that moment, resources are redistributed again according to the task requirements and their usage. Furthermore, the migration process implies an overhead, as shown in Section 4.5.1, and it is reflected in the Domain-0 consumption. At the moment Task 4 is submitted, there are not enough resources in Host B, so it is sent to Host A. As this task does not do anything, its VM is destroyed as soon as it is executed; at this point, a new task is scheduled on that node and the whole capacity of the node is assigned to its VM. The VtM that manages Host A uses one CPU for destroying and creating the virtual machines that run these tasks. Finally, Task 3 finishes and its VM is destroyed. The execution keeps running, as the remaining tasks in the VSP have no end; therefore, the monitoring stops at this point.

4.6.2.1 Resource assignment

Figure 4.12 shows the CPU assignment of the tasks that provide the most valuable information for this experiment: Tasks 1, 2 and 3. Task 1 does not use as many resources as the Scheduler has specified; however, the Resource Calculator needs some time to realize it. When Task 3 is started in the same node there are no surplus resources and the resource assignment is trivial: 300% of the CPU to the first one and 100% to the other.

[Three panels: CPU capacity and usage of Task 1 (top), of Task 2 and Task 2’ on Hosts A and B (middle), and of Task 3 (bottom).]

Figure 4.12: CPU consumption and allocation of tasks

At the moment Task 2 is migrated to that node, there is 150% of surplus CPU, which is equally distributed between the two tasks: 175% is assigned to the third one and 225% to the second one. Around second 300 of the execution, Task 2 starts consuming three CPUs and the Resource Calculator detects that it needs more resources; hence, the VtM assigns more resources to this task until there are no more surplus resources to be assigned. However, around second 470, the Resource Calculator detects that Task 3 does not need one full CPU, so there are surplus resources that can be assigned to Task 2, which does not have enough resources at that moment. Hence, more resources are assigned to it. Finally, when Task 3 finishes, the whole machine is assigned to Task 2. Tasks 4 and 5 each run alone on their node; therefore, the whole machine is also assigned to them.
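As a minimal sketch of the surplus sharing described in Section 3.3.3.2 (and not the actual VtM implementation), the surplus can be seen as the capacity left after covering every estimate, split according to a per-task weight that is equal for all tasks unless it has been raised:

def distribute(capacity, estimates, priorities=None):
    # Every task first gets its estimated need; the leftover capacity is
    # split among the tasks proportionally to their (dynamic) priorities.
    if priorities is None:
        priorities = [1.0] * len(estimates)
    surplus = max(0.0, capacity - sum(estimates))
    total_prio = sum(priorities)
    return [need + surplus * prio / total_prio
            for need, prio in zip(estimates, priorities)]

# Example: 400% of CPU, Task 2 estimated at 150% and Task 3 at 100%
# leaves 150% surplus, split equally: 225% and 175%, as in the experiment.
print(distribute(400, [150, 100]))   # [225.0, 175.0]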

4.6.3 SLA-driven resource management

Once we have observed how the VtM estimates the resources that a task needs in order to be executed, we show how the SLA-driven resource management can solve problems that the simple estimation-based resource management cannot. Thanks to the resource management of the VSP, which is based on estimating how many resources a task needs, the system can overcome SLA violations that involve traditional resources like CPU or memory by assigning as many resources as the task needs. However, for metrics that are hard to evaluate with resource usage monitoring, such as replies per second or execution time, the only component that can detect violations is the SLA evaluator, because it can monitor metrics that the resource management does not control. In order to solve this kind of SLA violation, the SLA evaluator can request more resources from the VtM (Section 3.3.4). Extending the set of supported SLA metrics is part of our future work. SLA-driven resource management can also be useful in other cases that involve economic parameters. Nevertheless, this implies new policies that take into account the benefits and penalties of executing a task or not, and this is also part of our future work. Another important limitation of the resource management based on the Resource Calculator is that, when the system has few surplus resources assigned to a task, the Resource Calculator cannot detect whether an application needs more resources than the calculated ones. This section demonstrates how the SLA evaluator can solve this kind of situation.


Figure 4.13: CPU management with SLA violations


The experiment consists of submitting four tasks that fill the machine by using one CPU each. Task 1 does not use its assigned resources during a large period of time, while the others use one CPU each. Figure 4.13 shows the execution of the four tasks in a node of the VSP. Firstly, the first three tasks are created sequentially and they start executing. However, Task 4 cannot be submitted because the Resource Calculator estimates that Tasks 2 and 3 need more than one CPU each: the applications consume almost one CPU, but sometimes slightly more due to the OS. Hence, there is no room for Task 4. Around second 500, the Resource Calculator detects that Task 1 is not making use of its assigned resources and assigns fewer resources to this task, which means that the resources that were assigned to it become surplus. At this moment, there are enough free resources for executing Task 4 (it requires one CPU). At this point, the surplus (around 45% of a CPU) is equally distributed among the four applications, because all of them have the same priority for sharing surplus resources (Section 3.3.3.2). When Task 1 wants to make use of the CPU that it has requested, it does not have enough resources and it violates its internal SLA (around second 600). The resource management cannot detect that it needs more resources. At this moment, the SLA Evaluator detects this situation and requests more resources by increasing the dynamic priority that controls the surplus resources, so more surplus resources are assigned to this task. Finally, the SLA of Task 1 is fulfilled and it has one CPU assigned. This would not be possible without the SLA evaluator, because the Resource Calculator alone cannot estimate that an application needs more resources than the ones it has assigned.
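A minimal sketch of this feedback loop is shown below; the Virtualization Manager interface (redistribute_surplus), the task attribute priority and the priority step are assumptions made only for illustration:

class SlaEvaluator:
    """Illustrative SLA-driven feedback loop (not the real agent code)."""

    def __init__(self, vtm, step=0.5):
        self.vtm = vtm        # Virtualization Manager interface (assumed)
        self.step = step

    def check(self, task, metric_value, sla_threshold):
        # When the monitored metric violates the internal SLA and the
        # Resource Calculator cannot see the shortage, raise the dynamic
        # priority so the task receives a larger share of the surplus.
        if metric_value < sla_threshold:
            task.priority += self.step
            self.vtm.redistribute_surplus()
        return task.priority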


Chapter 5

Related Work

5.1 Creation and management of VMs

Introducing virtualization to abstract the nodes of a service provider and to allocate tasks inside virtual machines, in order to consolidate and isolate them within the same physical machine, has been widely investigated during the last years. The projects that rely on this idea are presented next in detail. SODA [23] studies the usage of virtual machines for implementing a hosting utility platform that hosts applications providing external services, which have a longer lifetime than a traditional job. This system enables the on-demand creation of application services in a virtual machine allocated on a real hosting platform. The image of the VM is created on demand in the same way the VSP does, but SODA does not introduce any caching system for reducing creation times. The main difference with our approach is that it introduces a switch that is used to direct service clients to the node that provides the requested service.


Figure 5.1: SODA architecture

The In-VIGO project [8] proposes the creation of dynamic pools of virtual resources that can be aggregated on demand for application-specific and user-specific grid computing by adding three layers of virtualization over grid resources. It only provides a definition of the layers without an implementation. The first layer, which deals with creating virtual resources from physical ones, corresponds to what our VtM does. VMShop is a virtual machine management system which provides application execution environments for Grid computing. As a physical machine manager it uses VMPlant [28], which provides automated configuration to meet application needs and the creation of flexible VMs that can be deployed across distributed Grid resources. In the same way as the VSP, in order to provide this automated configuration, it introduces a model for the definition of customized VMs and uses a cache-based deployment for efficiency reasons.


Figure 5.2: VMShop

SoftUDC [25] also presents virtualization as a way to provide utility computing by offering a centrally managed pool of resources to the user. It lets applications share physical resources from different parties, and shows a pool of machines as a single one which can execute VMs. Nevertheless, it introduces an improvement with respect to other alternatives: it addresses the data management issue and shares data between nodes by providing a smart storage virtualization service that allows any machine to access any data in the pool. Our VSP takes this issue into account and also lets every task access its data regardless of whether it resides in the same node or not.


Figure 5.3: SoftUDC

Thanks to this remote storage access (Figure 5.3), SoftUDC allows VM migration between nodes, which implies moving all the data required by the VM from one node to another.

Globus Virtual Workspace [26] provides the user with the capability of creating a custom execution environment and executing it on any machine of the pool. The VM features and its resources can be changed if needed but, in contrast to our approach, this implies restarting the environment. This approach takes advantage of virtualization capabilities by using pausing and resuming to schedule machines. These features bring new efficient resource management possibilities between nodes and open the door to new scheduling techniques; nevertheless, it does not provide local fine-grained resource management. Also within Globus Virtual Workspace, [42] proposes a model for efficiently provisioning Virtual Machine images that contain an on-demand environment. This model takes into account the overhead resulting from instantiating a remote virtual resource and introduces a method to efficiently manage virtual data transfers in order to reduce this overhead. It studies different techniques to schedule the image transfer in order to reduce the delay between the end-user virtual resource request and the moment it is available. The workspace service has a WSRF frontend that allows users to deploy and manage virtual workspaces.


Figure 5.4: Globus Virtual Workspace (the VWS manages a pool of nodes, each running a Xen VMM and the workspace back-end; VM images are stored on a separate image node)

OpenNebula [34] is a platform that is currently being used in the Reservoir project [37]. It provides a flexible virtual infrastructure which dynamically adapts to the changing demands of a service workload. It leverages existing virtualization platforms to create a new virtualization layer between the service and the physical infrastructure. It is highly modular, and one of its major points is being open so that it can be used by anybody and easily extended. It is highly customizable, allows adding new features such as new scheduling policies, and provides centralized management. All the approaches presented so far do not take into account the resource management inside the physical machine; they only execute tasks in a certain node making a static resource allocation. Dynamic resource allocation among the VMs inside a single node has been explored in different projects such as Virtuoso or VioCluster. Virtuoso provides resource management among Virtual Machines in a cluster and allows a user to execute applications while having full control of the resources. VSched [29] is the Virtuoso component that provides the capability of co-scheduling batch and interactive VMs, satisfying constraints on responsiveness and compute rates for each workload, in the same manner as our Virtualization Manager does.

VIOLIN [39] allows overlaying a virtualized environment over more than one physical domain. Having more than one physical location implies larger delays when migrating a VM to a remote node, and this project reduces the amount of sent data by introducing diff files. Based on the VIOLIN multi-domain infrastructure, [40] provides autonomic adaptation of the resources. It proposes monitoring the resource usage of a VM and reallocating resources between the VMs in the same host; if the reallocation cannot be carried out locally, it migrates the VM to another host. This philosophy is also followed in our Virtualized Service Provider. All these projects belong to the research area; nevertheless, there are some commercial alternatives like Amazon Elastic Compute Cloud (EC2) [9]. This is a commercial system which allows customers to rent computing resources to run applications on. It allows scalable deployment of applications by providing a web services interface through which customers can request an arbitrary number of Virtual Machines, i.e. server instances, on which they can load any software of their choice. EC2 gives the customer the capacity to create and manage server instances on demand with different virtual machine specifications. Finally, the users pay for the time they have been using a certain amount of resources.

5.2 Local resource management using Virtualization

The previously presented projects use virtualization for scheduling tasks between nodes and providing services to the users as our approach does; nevertheless, they only offer simple local resource management. On the other hand, using virtualization to enable fine-grained dynamic resource distribution among the VMs of a single node has been widely studied during the last years. For example, [41] proposes a dynamic resource allocation between VMs allocated on the same host. It detects whether an application contained in a VM is over- or under-using its assigned resources and uses dynamic priorities to adjust the resource assignment between VMs in order to optimize global machine performance. Another project that takes advantage of virtualization features is [43]. This project collocates heterogeneous workloads on any server machine, thus reducing the granularity of resource allocation (Figure 5.5). Classical control theory has been used to develop an adaptive resource control system in [35]. It dynamically adjusts the resource shares of VMs, which contain individual components of complex, multi-tier enterprise applications in a shared hosting environment, in order to meet application-level QoS goals. Virtualization implementations have also been modified in order to provide better resource management. This is the case of [21], which introduces some modifications in the default Xen hypervisor in order to make it aware of the behavior of the hosted applications. This project improves the default Xen performance by introducing a new communication-aware CPU scheduling algorithm.

5.3 SLA-based resource management

In addition to local resource management based on application behavior, our proposal introduces economic factors into resource reallocation by taking into account the SLA agreed with the user. This topic has been studied in [14], which combines the use of analytic predictive multiclass queuing network models and combinatorial search techniques to design a controller that determines the required number of servers for each application environment in order to continuously meet the SLA goals under a dynamically varying workload. There also exists a SLA-driven prototype for server farms called Oceano [11]. It enables the dynamic moving of servers across clusters depending on the changing needs of the customers, and the addition or removal of servers from clusters is triggered by SLA violations. In the same manner, [30] presents a middleware for J2EE clusters that optimizes the resource usage to allow application servers to fulfill their SLA without incurring resource over-provisioning costs. This resource allocation is also done by adding or removing nodes from the cluster.

Figure 5.5: Management system architecture for heterogeneous workloads

5.4 Dynamic resource management on clusters

Resource management in a service provider has been widely investigated during the last years. An example is Sharc [44], which manages resources in shared clusters. This approach manages applications by encapsulating them in capsules and assigns resources to them according to their requirements. In this work, the authors study the overhead of managing resources using two layers (a global scheduler and a local resource manager) and demonstrate the scalability of this approach. In [16], the authors propose a self-managed system for shared architectures that allows adapting servers to changing workloads while maintaining the QoS requirements. This is performed by adjusting various resource parameters (primarily the accept queue and the CPU) according to the monitored values. The evaluation is done using an Apache web server that is stressed with different workloads using httperf. Cluster Reserves [12] is a resource management facility for cluster-based web servers that extends single-node resource management mechanisms to a cluster environment. It dynamically adjusts the shares of each server according to the local resource usage, and one of the points it evaluates is the time complexity of allocating resources. Another architecture for resource management in a shared hosting center is Muse [17]. It focuses on energy saving and uses an economic model for dynamic provisioning of resources. This work focuses on maximizing the number of unused servers so that they can be powered down to reduce energy consumption.

5.5 Overall approach for resource management using Virtual Machines

Our work proposes a more general and extensive solution for managing service providers by joining in a single framework the on-demand creation of application-specific virtual execution environments, the global resource allocation among nodes, and the SLA-driven dynamic resource redistribution at node level (based on the redistribution of surplus resources). A two-level autonomic resource management system for virtualized data centers is presented in [45]. It enables automatic and adaptive resource provisioning in accordance with SLAs that specify dynamic tradeoffs between service quality and cost (Figure 5.6). In this project, the characterization of the relationship between application behavior and resource demand is performed using fuzzy logic.


Figure 5.6: Two-level autonomic resource management for virtualized data centers

Finally, there are works that combine some of the previous functionalities, but none of them provide everything we offer in our proposal. For example, [7] proposes a dynamic capacity management framework for virtualized hosting services, which is based on an optimization model that links a cost model based on SLA contracts with an analytical queuing-based performance model. This project does not have a working implementation of the proposed system, only a discrete event simulation. Moreover, these works do not support the creation of VMs on demand either. After studying all the related work, we conclude that there is no complete solution for implementing a Virtualized Service Provider that supports a complete resource management based on resource usage and SLAs.


Chapter 6

Conclusions

This thesis has shown how to bring the advantages of virtualization to service providers by introducing the concept of Virtualized Service Provider. The described solution allows reducing costs and, at the same time, fulfilling the QoS agreed with the customers by supporting fine-grain dynamic resource distribution among the customers’ applications based on SLAs (encoded using a real SLA specification). Virtualization has been used to build on-demand execution environments for the tasks, granting them full control of their execution environment without any risk for the underlying system or the other tasks. The consolidation of VMs allows a better use of the provider’s physical resources. In addition, using virtual environments has greatly reduced the human interaction needed to administrate the system. This approach provides a complete solution with respect to other similar ones. It takes into account both the local node and the global system resource management, considering the agreement reached with the users in a transparent way and with no extra work for them. It behaves in a self-adaptive manner, where each application receives enough resources to achieve the agreed performance goals. Moreover, free resources can be dynamically redistributed among applications when SLA violations are detected. Agents have been used to coordinate the different components, and to monitor and react to possible undesired events, such as SLA violations or failures, providing the system with self-adaptive capabilities. We have evaluated whether encapsulating an application inside a virtual machine decreases its performance with respect to a typical environment. Thanks to paravirtualization, the introduced overhead is almost negligible for CPU-intensive applications, while for I/O-intensive ones it must be considered, although it stays below 5% of the task consumption. Nevertheless, this overhead does not imply a big loss of performance at the end of the application. The overhead of creating a new environment to execute a task is just 6 seconds when the disk bottleneck during the VM creation is avoided, and it takes 30 seconds from the moment the user makes the request until the environment is ready to execute the task. This makes our system a good candidate to execute tasks with medium and long lifetimes. Resource management with SLA support has also been tested, and the experiments show that the system is able to allocate resources following intelligent resource estimations, taking benefit of the whole machine by using all its resources. Furthermore, it adapts the resource allocation under changing conditions while fulfilling the agreed performance metrics, and it solves SLA violations by efficiently rescheduling the resources. One of the keys of this project is the distributed shared file system, which allows creating virtual environments efficiently. Furthermore, this way of accessing data allows migrating running virtual machines regardless of where they are located. In addition, our VSP has demonstrated in the tests that it can provide efficient live migration and pausing capabilities without requiring any change in the tasks. The evaluation has also shown that these features do not imply a significant overhead, and that they make it easy to move a task between nodes if it needs more resources, or to pause an execution in order to give resources to another task. The union of all these features makes the Virtualized Service Provider a complete solution that can be easily improved by introducing new resource sharing policies. It provides a great platform for job scheduling and can open a wide range of possibilities for SOC in the near future.

6.1 Future work

This master thesis presents the infrastructure to create a Virtualized Service Provider; nevertheless, it is just a first usable approach, and it needs to be improved in order to become a real product. All the improvements that are planned to be added to the VSP focus on the idea of implementing a prototype system that can really be used in the real world and, finally, on trying to bring it to current service providers. One of the points that needs to be studied in depth is everything related to the SLA in the stages previous to the execution. This includes the interaction with the user and the negotiation of the SLA. In addition, once the SLA has been agreed, the resource requirements of each task must be inferred from the negotiated SLA. The scheduling policies used in the current version of the VSP will also be improved in order to take into account other factors, such as economics or resource usage, and to take full benefit of the capabilities of this system, such as task migration and pausing. All these new policies should be studied in detail in order to obtain some that would be really usable in a real system. Providing data to an application and retrieving the results is currently done in a simple way, with a simple check of the data and of the task owner. This is the weakest link in the security of our system, and providing more secure access to the data is one of the improvements to be performed in future work. The last point to study before implementing the VSP as a real product is its performance. Stressing the system with a real workload would be a good way to evaluate the chances of this system in the real world. Finally, in a short period of time, the improvements done in the Virtualization Manager will be adapted and added to the BREIN [15] project in order to provide it with better resource management.


Bibliography

[1] Apache Tomcat. http://tomcat.apache.org.
[2] JADE. http://jade.tilab.com/.
[3] JVM. http://java.sun.com.
[4] MySQL. http://www.mysql.com.
[5] Wine. http://www.winehq.org.
[6] Xen. http://www.xensource.com.
[7] B. Abrahao, V. Almeida, J. Almeida, A. Zhang, D. Beyer, and F. Safai. Self-Adaptive SLA-Driven Capacity Management for Internet Services. IEEE/IFIP NOMS Conference, Vancouver, Canada, April 2006.
[8] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu. From virtualized resources to virtual computing grids: the In-VIGO system. Future Gener. Comput. Syst., 21(6):896–909, 2005.
[9] Amazon. Amazon Elastic Compute Cloud. http://www.amazon.com/ec2.
[10] A. Andrieux, K. Czajkowski, A. Dan, K. Keahey, H. Ludwig, J. Pruyne, J. Rofrano, S. Tuecke, and M. Xu. Web Services Agreement Specification (WS-Agreement). Global Grid Forum GRAAP-WG, Draft, August 2004.
[11] K. Appleby, S. Fakhouri, L. Fong, G. Goldszmidt, M. Kalantar, S. Krishnakumar, D. Pazel, J. Pershing, B. Rochwerger, I. Center, et al. Oceano - SLA based management of a computing utility. Integrated Network Management Proceedings, 2001 IEEE/IFIP International Symposium on, pages 855–868, 2001.
[12] M. Aron, P. Druschel, and W. Zwaenepoel. Cluster reserves: a mechanism for resource management in cluster-based network servers. Proceedings of the 2000 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 90–101, 2000.
[13] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield. Xen and the art of virtualization. Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, pages 164–177, 2003.
[14] M. Bennani and D. Menasce. Resource allocation for autonomic data centers using analytic performance models. Proceedings of the IEEE International Conference on Autonomic Computing (ICAC-05), Seattle, WA, 2005.
[15] EU BREIN project. http://www.eu-brein.com.
[16] A. Chandra, P. Pradhan, R. Tewari, S. Sahu, and P. Shenoy. An observation-based approach towards self-managing web servers. Computer Communications, 29(8):1174–1188, 2006.
[17] J. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle. Managing energy and server resources in hosting centers. ACM SIGOPS Operating Systems Review, 35(5):103–116, 2001.
[18] L. Cherkasova and R. Gardner. Measuring CPU overhead for I/O processing in the Xen virtual machine monitor. Proceedings of the USENIX Annual Technical Conference 2005, pages 24–24, 2005.
[19] J. Ejarque, M. de Palol, I. Goiri, F. Julia, J. Guitart, J. Torres, and R. M. Badia. Using semantics for resource allocation in computing service providers. In Proceedings of the IEEE International Conference on Services Computing (SCC 2008), 2008.
[20] Globus Toolkit. http://www.globus.org/toolkit/.


[21] S. Govindan, A. Nath, A. Das, B. Urgaonkar, and A. Sivasubramaniam. Xen and co.: communication-aware CPU scheduling for consolidated Xen-based hosting platforms. Proceedings of the 3rd International Conference on Virtual Execution Environments, pages 126–136, 2007.
[22] Httperf. http://www.hpl.hp.com/research/linux/httperf.
[23] X. Jiang and D. Xu. SODA: a service-on-demand architecture for application service hosting utility platforms. High Performance Distributed Computing, 2003. Proceedings. 12th IEEE International Symposium on, pages 174–183, 2003.
[24] M. T. Jones. Virtual Linux. 2006. http://www-128.ibm.com/developerworks/library/l-linuxvirt/index.html.
[25] M. Kallahalla, M. Uysal, R. Swaminathan, D. Lowell, M. Wray, T. Christian, N. Edwards, C. Dalton, and F. Gittler. SoftUDC: a software-based data center for utility computing. Computer, 37(11):38–46, Nov. 2004.
[26] K. Keahey, I. Foster, T. Freeman, X. Zhang, and D. Galron. Virtual Workspaces in the Grid. Lecture Notes in Computer Science, 3648:421–431, 2005.
[27] A. Keller and H. Ludwig. The WSLA Framework: Specifying and Monitoring Service Level Agreements for Web Services. Journal of Network and Systems Management, 11(1):57–81, 2003.
[28] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo. VMPlants: Providing and managing virtual machine execution environments for grid computing. In SC ’04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, page 7, Washington, DC, USA, 2004. IEEE Computer Society.
[29] B. Lin and P. A. Dinda. VSched: Mixing batch and interactive virtual machines using periodic real-time scheduling. In SC ’05: Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, page 8, Washington, DC, USA, 2005. IEEE Computer Society.
[30] G. Lodi, F. Panzieri, D. Rossi, and E. Turrini. SLA-Driven Clustering of QoS-Aware Application Servers. IEEE Transactions on Software Engineering, pages 186–197, 2007.
[31] mencoder. http://www.mplayerhq.hu.
[32] A. Menon, J. Santos, Y. Turner, G. Janakiraman, and W. Zwaenepoel. Diagnosing performance overheads in the Xen virtual machine environment. Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments, 11(12):13–23, 2005.
[33] H. Nwana. Software agents: An overview. Knowledge Engineering Review, 11(3):205–244, 1996.
[34] OpenNebula. http://www.opennebula.org.
[35] P. Padala, K. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, and K. Salem. Adaptive control of virtualized resources in utility computing environments. Proceedings of the 2007 Conference on EuroSys, pages 289–302, 2007.
[36] M. Papazoglou and D. Georgakopoulos. Service-Oriented Computing. Communications of the ACM, 46(10):25–28, 2003.
[37] Reservoir Project. http://www.reservoir-fp7.eu.
[38] RUBiS: Rice University Bidding System. http://www.cs.rice.edu/CS/Systems/DynaServer/RUBiS/.
[39] P. Ruth, P. McGachey, and D. Xu. VioCluster: Virtualization for dynamic computational domains. In 2005 IEEE International Conference on Cluster Computing, page 7. IEEE Computer Society, 2005.
[40] P. Ruth, J. Rhee, D. Xu, R. Kennell, and S. Goasguen. Autonomic Live Adaptation of Virtual Computational Environments in a Multi-Domain Infrastructure. Proc. IEEE ICAC, pages 5–14, 2006.
[41] Y. Song, Y. Sun, H. Wang, and X. Song. An Adaptive Resource Flowing Scheme amongst VMs in a VM-Based Utility Computing. Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on, pages 1053–1058, 2007.
[42] B. Sotomayor. A resource management model for VM-based virtual workspaces. 2007.
[43] M. Steinder, I. Whalley, and D. Chess. Server virtualization in autonomic management of heterogeneous workloads. 2008.
[44] B. Urgaonkar and P. Shenoy. Sharc: Managing CPU and Network Bandwidth in Shared Clusters. 2004.
[45] J. Xu, M. Zhao, J. Fortes, R. Carpenter, and M. Yousif. On the Use of Fuzzy Modeling in Virtualized Data Center Management. Proceedings of the Fourth International Conference on Autonomic Computing, 2007.
[46] Íñigo Goiri and J. Guitart. Towards Computing Resource Abstraction: using Virtualization. Technical Report UPC-DAC-RR-2007-71, 2007.


Glossary

ACL      Agent Communication Language
ARP      Address Resolution Protocol
BREIN    Business objective driven Reliable and Intelligent grids for real Business
CPU      Central Processing Unit
EC2      Amazon Elastic Compute Cloud
FIPA     Foundations of Intelligent Physical Agents
FTP      File Transfer Protocol
I/O      Input/Output
J2EE     Java Platform, Enterprise Edition
JADE     Java Agent Development Framework
JRE      Java Runtime Environment
JVM      Java Virtual Machine
HTTP     Hypertext Transfer Protocol
IP       Internet Protocol; in some cases it refers to the IP address
MAC      Media Access Control; in some cases it refers to the MAC address
NFS      Network File System
NTP      Network Time Protocol
OS       Operating System
QoS      Quality of Service
SLA      Service Level Agreement
VE       Virtual Environment
VM       Virtual Machine
VMM      Virtual Machine Monitor
VtM      Virtualization Manager
VPS      Virtual Private Server
VSP      Virtualized Service Provider
RUBiS    Rice University Bidding System
SERA     Semantically-Enhanced Resource Allocator
SOC      Service-Oriented Computing
SFTP     Secure File Transfer Protocol
SP       Service Provider
SSH      Secure Shell
SSL      Secure Sockets Layer
VWS      Virtual Workspace
WS       Web Service
WSLA     Web Service Level Agreement
XML      Extensible Markup Language
XML-RPC  Remote Procedure Call based on XML