An Artificial Intelligence Model for Autonomous Resource Allocation in Cloud Computing Environments

Rachael Shaw

Thesis submitted to the College of Engineering & Informatics, National University of Ireland, Galway

In fulfilment of the requirements for the degree of Master of Science in Software Design & Development

Supervisor: Dr. Enda Howley

August 2016

Declaration

I hereby declare that I am the sole author of this thesis and that the material is entirely my own work and has not been taken from the work of others, except work that has been cited and acknowledged within the text of my work. This thesis has not been accepted for any degree and is not concurrently submitted in candidature for any other degree at this or any other university.


To Dad.

I have had many teachers along the way but among all you have been the greatest. Beyond everything, you have taught me to persevere always and for that alone I am forever grateful.


Acknowledgements

Firstly, I would like to express my sincere thanks and appreciation to my supervisor, Dr. Enda Howley. In particular, I would like to thank you for the valuable time you gave to this research, your encouragement throughout the year and also your immense guidance and support, for without any of it this research would not have been possible. I would like to sincerely thank my parents for your love, support and encouragement not only throughout this research but all through my time in college. I would like to thank you for affording me many opportunities throughout the years, without it I would not be where I am today and I am truly grateful. I would also like to thank my siblings and friends for all of your love and support, you have all helped me along this journey in some way, even if it was just giving me something to laugh about. Thank you all. A special thanks to Cathal for your encouragement and support throughout and for always believing in me. I sincerely appreciate it. Lastly, I would like to thank the members of our research group for your encouragement and advice throughout the year. In particular, I would like to say a special thanks to Dan who listened to me waffle on for a full year and yet gave up his time to proof read this thesis. Cheers, I think a pint is well and truly in order. On this note, all that is left to say is onwards and upwards, here's to the next chapter.


Publications

• Rachael Shaw and Enda Howley, “A Reinforcement Learning Model for Optimising Virtual Machine Placement in Cloud Data Centers”. The Third International Symposium on Next Generation Computing and Internet, ACM, 2017. (Under Review)


Abstract

The recent proliferation and commercial use of cloud based services has caused major concern for the high power and energy consumption rates involved in the operation of large scale data centers today. While many efforts have been widely devoted to achieving improved performance of cloud based services, a limited amount of research has been focused towards energy-aware strategies for the efficient management and utilisation of cloud resources. One approach that can be leveraged in order to improve resource utilisation and to drive a more sustainable green cloud infrastructure is Virtual Machine (VM) placement. Through the deployment of a more dynamic and robust VM placement strategy we seek to address this challenging problem by optimising resource usage and, in doing so, reaching new frontiers in energy efficiency. This work proposes an intelligent Reinforcement Learning (RL) resource optimisation model for the efficient allocation of VM instances in order to significantly reduce energy consumption while also delivering the required Quality of Service (QoS). This technique leverages an underlying mathematical framework which allows an agent to learn optimal behaviour in highly stochastic environments due to its innate ability to reason under uncertainty. Using workload data from real data center deployments we evaluate the proposed solution while also providing evidence of the overall contribution of this research by comparing its performance against the state-of-the-art consolidation policy. The results obtained from this research indicate the immense potential of deploying an energy efficient learning agent to consolidate and support the movement of VMs between physical servers. In particular, the model shows a significant improvement in energy consumption, as evidenced by a total reduction of 25% in energy, while also reducing the number of service violations by a staggering 63% overall.


Contents

List of Figures  ix
List of Tables  xi

1 Introduction  1
  1.1 Problem Formulation  3
  1.2 Research Objectives  5
  1.3 Research Questions  5
  1.4 Thesis Structure  6

2 Background and Literature Review  7
  2.1 Overview  7
  2.2 The Evolution of Cloud Computing  8
    2.2.1 Cluster Computing  8
    2.2.2 Grid Computing  9
    2.2.3 Cloud Computing  9
  2.3 The Key Characteristics of Cloud Computing  11
    2.3.1 Elimination of Upfront Investments  11
    2.3.2 On-Demand Computing Resources  12
    2.3.3 High Scalability and Elasticity  12
    2.3.4 Ubiquitous Network Access  12
    2.3.5 Measured Service  12
  2.4 A Review of the Cloud Architecture  14
    2.4.1 Application Layer  15
    2.4.2 Platform Layer  15
    2.4.3 Infrastructure Layer  15
    2.4.4 Hardware Layer  16
  2.5 The Cloud Computing Deployment Models  16
    2.5.1 Public Clouds  16
    2.5.2 Private Clouds  17
    2.5.3 Hybrid Clouds  18
    2.5.4 Community Clouds  18
    2.5.5 Federated Clouds  19
  2.6 The Data Center and Energy Consumption  21
    2.6.1 Key Areas of Energy Consumption  22
    2.6.2 Server Power Consumption  23
    2.6.3 Power Management Techniques  24
  2.7 Virtualisation  25
    2.7.1 An Abstract Description  26
    2.7.2 Virtualisation Techniques  27
    2.7.3 Virtualisation Technologies  28
    2.7.4 Migration and Consolidation  31
  2.8 An AI Approach to Optimisation  33
    2.8.1 Agents  34
    2.8.2 Multiagent Systems  35
    2.8.3 Reinforcement Learning  36
  2.9 Related Work  43
    2.9.1 Threshold and Non AI Based Approaches  43
    2.9.2 AI Based Approaches  47
    2.9.3 Complex Systems Research Group  51
  2.10 Research Gap  52
  2.11 Summary  53

3 CloudSim  54
  3.1 Overview  54
  3.2 Simulation Tool  54
    3.2.1 Key Components  55
    3.2.2 Simulation Model  57
    3.2.3 Simulation Setup  58
  3.3 Summary  59

4 Reinforcement Learning Framework  60
  4.1 Overview  60
  4.2 Selected Policies  60
  4.3 PABFD Algorithm Hypothesis  61
  4.4 Additional Classes  62
    4.4.1 Lr-MmtRlPlacement  62
    4.4.2 PowerVmAllocationRlPlacement  62
    4.4.3 Agent  62
    4.4.4 Environment  62
    4.4.5 Algorithm  62
    4.4.6 RlHelper  63
  4.5 Defining the State-Action Space  65
  4.6 Q-Learning Implementation  66
  4.7 Sarsa Implementation  68
  4.8 Summary  70

5 Gridworld  71
  5.1 Overview  71
  5.2 Experimental Details  71
  5.3 Results  72
    5.3.1 Learning an Optimum Policy  72
    5.3.2 Comparison of Q-Learning and Sarsa Algorithms Through Accumulated Rewards  77
  5.4 Discussion  79
  5.5 Summary  80

6 An Examination of RL Policies Implemented in the Cloud  81
  6.1 Overview  81
  6.2 Experimental Details  81
  6.3 Performance Metrics  82
    6.3.1 Energy Consumption  82
    6.3.2 Number of Virtual Machine Migrations  82
    6.3.3 Service Level Agreement Violations  82
    6.3.4 Energy Service Level Agreement Violations  83
  6.4 Results  84
    6.4.1 Energy  84
    6.4.2 Migrations  89
    6.4.3 SLAV  91
    6.4.4 ESV  92
  6.5 Discussion  94
  6.6 Summary  95

7 Potential Based Reward Shaping with a Comparative Study on the State of the Art Placement Policy  96
  7.1 Overview  96
  7.2 PBRS Experimental Details  96
  7.3 Results  98
    7.3.1 PBRS Gridworld Example  98
    7.3.2 Energy  100
    7.3.3 Migrations  102
    7.3.4 SLAV  102
    7.3.5 ESV  103
  7.4 Comparative Analysis Experimental Details  105
  7.5 Results  105
    7.5.1 Energy  105
    7.5.2 Migrations  108
    7.5.3 SLAV  108
    7.5.4 ESV  109
  7.6 Evaluation of the PABFD Hypothesis  110
    7.6.1 Active Hosts  111
    7.6.2 Shutdowns  113
    7.6.3 Server Utilisation  115
  7.7 Discussion  117
  7.8 Summary  120

8 Conclusion  121
  8.1 Contributions  121
    8.1.1 Experimental Review  122
    8.1.2 Reflecting on the State-of-the-Art  125
  8.2 Impact  127
  8.3 Future Work  128

Bibliography  129

Appendices  138

List of Figures

1 Typical data center  3
2 Optimized data center  4
3 Comparison of computing paradigms [8]  11
4 Cloud computing four layered architecture [18]  14
5 Public cloud [28]  17
6 Private cloud [28]  17
7 Hybrid cloud [29]  18
8 Community cloud [29]  19
9 Federated network of clouds [30]  20
10 Breakdown of data center energy overheads [36]  22
11 Server power consumption [36]  23
12 Virtual machine abstraction [47]  26
13 Virtualised servers with live migration capability [63]  32
14 The agent environment interaction model [70]  36
15 CloudSim class diagram [31]  55
16 Simulation interaction diagram [31]  57
17 High level class diagram with integrated RL framework  64
18 Sequence diagram showing CloudSim resource optimisation with RL component  65
19 The gridworld problem illustrating the optimal path from each state  72
20 Q-Learning ε-greedy number of moves to goal 5x5 gridworld  73
21 Comparison of Q-Learning ε-greedy and softmax action selection policies  74
22 Comparison of Sarsa ε-greedy and softmax action selection policies  75
23 Comparison of agents performance with Q-Learning and Sarsa  75
24 Total moves for Q-Learning and Sarsa  76
25 Accumulated rewards  77
26 Sutton & Barto cliff walking rewards  78
27 Energy consumption over 100 iterations  84
28 Energy consumption Q-Learning with ε-greedy and softmax  85
29 Energy consumption Sarsa with ε-greedy and softmax  86
30 Energy consumption Q-Learning softmax and Sarsa softmax  86
31 Total energy consumption over 100 trials  87
32 Total energy consumption over 30 day workload  88
33 Average energy consumption over 30 day workload  88
34 Total migrations over 100 trials  89
35 Average migrations over 100 trials  90
36 Average migrations over 30 day workload  90
37 Total SLAV 100 trials  91
38 Total SLAV 30 day workload  92
39 Total ESV over 100 trials  93
40 Total ESV 30 day workload  93
41 Rate of convergence with PBRS in gridworld  98
42 Total number of moves to goal with PBRS  99
43 Comparison of total moves with PBRS over five learning trials  99
44 Energy consumption for PBRS Sarsa against standard Sarsa algorithm  100
45 Total energy consumption over 100 iterations for PBRS Sarsa and standard Sarsa  101
46 Comparison of energy consumption with PBRS over five iterations  101
47 PBRS Sarsa vs standard Sarsa average migrations  102
48 PBRS Sarsa vs standard Sarsa total SLAV  103
49 PBRS Sarsa vs standard Sarsa ESV  104
50 Energy consumption over 30 days Lr-Mmt vs RL Agent  106
51 Total energy consumption Lr-Mmt vs RL Agent  106
52 Migrations Lr-Mmt vs RL Agent  108
53 SLAV Lr-Mmt vs RL Agent  109
54 ESV Lr-Mmt vs RL Agent  109
55 Number of active hosts Lr-Mmt vs RL Agent  111
56 Host shutdowns over 30 day workload  113
57 Number of host shutdowns and migrations Lr-Mmt vs RL Agent  114
58 Active servers resource utilisation Lr-Mmt vs RL Agent  115
59 Server utilisation percentage proportions Lr-Mmt vs RL Agent  116

List of Tables

1 Percentage of daily energy savings over 30 day workload  107
2 Percentage reduction of required hosts over 30 day workload  112
2 Comparison of improvements on Lr-Mmt with standard Sarsa vs PBRS Sarsa algorithm over 30 day workload  139
3 Percentage reduction of daily migrations over 30 day workload  139
4 Percentage of daily SLAV savings over 30 day workload  140
5 Percentage reduction of daily ESV over 30 day workload  141
6 Percentage reduction of server shutdowns over 30 day workload  142

Acronyms

AI  Artificial Intelligence
ARPANET  Advanced Research Projects Agency Network
CEO  Chief Executive Officer
CLI  Command Line Interface
CPU  Central Processing Unit
DCD  Dynamic Component Deactivation
DPM  Dynamic Power Management
DPS  Dynamic Performance Scaling
DRAM  Dynamic Random Access Memory
DRS  Distributed Resource Scheduler
DVFS  Dynamic Voltage Frequency Scaling
ESV  Energy Service Level Agreement Violations
ESX  Elastic Sky X
FSM  Finite State Machine
GA  Genetic Algorithm
GPS  General Problem Solver
HaaS  Hardware as a Service
HPC  High-Performance Computing
HPG  Highest Potential Growth
HTTP  Hyper Text Transfer Protocol
IaaS  Infrastructure as a Service
IT  Information Technology
KVM  Kernel-based Virtual Machine
LAN  Local Area Network
LR  Local Regression
MAS  Multiagent Systems
MC  Maximum Correlation
MDP  Markov Decision Process
MIPS  Millions of Instructions Per Second
MM  Minimisation of Migrations
MMT  Minimum Migration Time
MMU  Memory Management Unit
MPP  Minimum Power Packing
NAS  Network-Attached Storage
NIST  National Institute of Standards and Technology
OS  Operating System
PaaS  Platform as a Service
PABFD  Power Aware Best-Fit Decreasing
PBRS  Potential Based Reward Shaping
PDM  Performance Degradation Due to Migrations
PM  Power Manager
PSO  Particle Swarm Optimisation
PUE  Power Usage Effectiveness
QOS  Quality of Service
RAM  Random Access Memory
RL  Reinforcement Learning
SaaS  Software as a Service
SLA  Service Level Agreements
SLATAH  Service Level Agreement Violation Time Per Active Host
SLAV  Service Level Agreement Violation
SPM  Static Power Management
TCO  Total Cost of Ownership
TCP/IP  Transmission Control Protocol/Internet Protocol
TD  Temporal Difference
TLB  Translation Lookaside Buffer
UPS  Uninterruptible Power Supply
VM  Virtual Machine
VMCS  Virtual Machine Control Structure
VME  Virtual Machine Extensions
VMM  Virtual Machine Monitor
VO  Virtual Organisation
WAN  Wide Area Network

Chapter 1
Introduction

In recent years cloud computing has rapidly emerged as one of the most promising and compelling landmark developments in the field of Information Technology. Cloud computing promotes dynamic and on-demand provisioning of IT infrastructure, resources and services charged on a pay-per-use basis. This evolution in computing paradigms has given rise to the widespread adoption of the cloud computing model across many industries and sectors. As a result, there has been a move towards the global operation of large scale data centers in order to meet the ever increasing demand for such services. This advancement has also inadvertently resulted in an astronomical rise in energy consumption and carbon emissions. In 2006, it was estimated that data centers accounted for 1.5% of the total U.S. electricity consumption, resulting in a staggering $4.5 billion in electricity costs [1]. Other studies report that energy costs in data center deployments double every five years [2]. Furthermore, by the year 2020 the carbon footprint of data centers is set to exceed that of the airline industry [3]. In light of these findings, it has also been reported that existing energy efficient technologies could reduce energy consumption by an estimated 25%, with further reductions of up to 55% achievable through more advanced technologies and state-of-the-art energy efficient practices [1]. Overall, these figures suggest that energy efficiency in cloud infrastructures has become a formidable problem and one of the greatest challenges in the advancement of Information Technology.

Virtualisation technologies serve as a standard technique for both power management and resource optimisation in data center operations. The concept of virtualisation plays a pivotal role in the delivery of more dynamic, scalable and elastic computing resources which are inherent characteristics of the cloud computing paradigm [4]. More specifically, virtualisation promotes increased server utilisation by apportioning the resources of large physical servers into much smaller independent VMs running on a single host in apparent isolation. Each VM is equipped with its own Operating System (OS) and applications, which allows for the simultaneous execution of multiple tasks. This results in increased efficiency and resource utilisation overall. However, despite the benefits gained from advancements in virtualisation technologies, one of the major inefficiencies in data center deployments is caused by poorly managed and idle resources [5]. Current studies have revealed that on average servers operate at a mere 12-15% of their full capacity, indicating the necessity for improved strategies [6].

Solving the problem of energy efficient resource allocation while delivering on strict Service Level Agreements (SLA) is a very challenging obstacle. To address this problem, one approach that could be leveraged in order to maximise energy efficiency in data centers is VM placement. The goal of such an approach is to strategically allocate a greater number of VMs onto a reduced number of servers while satisfying SLA constraints. This is often modelled as an NP-hard bin-packing problem, a type of computational problem in which finding an optimal solution becomes challenging given the number of servers in the environment and the number of VMs to be placed. To date an extensive amount of research has been devoted to classical heuristic based VM placement techniques, which have proved to be a reasonable solution to this problem. However, one issue with these solutions is that they are relatively static and as a result they are limited in their effectiveness to reduce energy consumption in a more dynamic environment such as the cloud.

In this thesis we present an alternative and more dynamic RL based approach to VM placement. This approach is designed to optimise the allocation of VM instances in order to drive a more sustainable and energy efficient resource management model. This technique utilises an intelligent learning agent which leverages an underlying mathematical framework based on probabilistic theory in order to gain knowledge about the environment in which it operates. As a result this technique provides the agent with the capability of making informed decisions in non-deterministic environments. Through the deployment of a more adaptive and flexible approach we seek to achieve greater energy efficiency, which will not only reduce the high energy costs evident in data center operations today but will also have a much wider impact on global environmental sustainability.
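Since the placement task above is framed as a bin-packing problem, a small sketch can make the analogy concrete. The following Java example (Java being the language of the CloudSim toolkit used later in this thesis) packs a list of VM CPU demands onto identical hosts using a plain best-fit decreasing heuristic. All class, field and method names are hypothetical; this is neither CloudSim's Power Aware Best-Fit Decreasing implementation nor the RL model proposed here, it simply illustrates how a greedy packing concentrates load onto fewer, fuller hosts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: names are hypothetical and the logic is a plain
// best-fit decreasing packing of VM CPU demands (in MIPS) onto identical hosts.
public class BestFitDecreasingExample {

    static class Host {
        final double capacityMips;
        double usedMips = 0;

        Host(double capacityMips) { this.capacityMips = capacityMips; }

        double spare() { return capacityMips - usedMips; }
    }

    /** Places each VM demand on the active host that leaves the least spare capacity. */
    static List<Host> place(List<Double> vmDemands, double hostCapacity) {
        List<Double> sorted = new ArrayList<>(vmDemands);
        sorted.sort(Comparator.reverseOrder());           // largest demands first

        List<Host> hosts = new ArrayList<>();
        for (double demand : sorted) {
            Host best = null;
            for (Host h : hosts) {
                if (h.spare() >= demand && (best == null || h.spare() < best.spare())) {
                    best = h;                              // tightest feasible fit so far
                }
            }
            if (best == null) {                            // no active host fits: power on a new one
                best = new Host(hostCapacity);
                hosts.add(best);
            }
            best.usedMips += demand;
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<Double> demands = List.of(500.0, 1200.0, 700.0, 300.0, 900.0, 400.0);
        List<Host> hosts = place(demands, 2000.0);         // each host offers 2000 MIPS
        System.out.println("Active hosts required: " + hosts.size());
        for (Host h : hosts) {
            System.out.printf("Host utilisation: %.0f%%%n", 100 * h.usedMips / h.capacityMips);
        }
    }
}
```

Heuristics of this kind operate on a single static snapshot of demand, which is precisely the limitation that motivates the more adaptive, learning based placement strategy developed in the following chapters.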

1.1 Problem Formulation

This research proposes to address the formidable problem of energy efficiency in data center deployments. In order to provide a novel and innovative solution to this problem we introduce an intelligent RL agent responsible for optimising the distribution of VMs across the data center. Through the introduction of such an approach we intend to advance the state-of-the-art while also maximising our research potential by offering a generic solution which can be harnessed across public, private, hybrid, community and federated cloud infrastructures at the IaaS level.

Achieving a more energy efficient resource optimisation solution in cloud based infrastructures requires strategically allocating resources to incoming user requests in such a way that the least amount of resources is used while also adhering to strict SLA constraints. In order to contextualise this problem and to convey the core idea of this research we present a very simplified representation of both a typical data center setup and an optimised data center. Figure 1 describes a typical data center: as illustrated, the VMs are distributed across several hosts which remain severely underutilised. This common approach, evident in current deployments, results in the wastage of valuable resources and constrains the overall capacity of the data center. In addition, servers operating at low capacity have been shown to use up to 60% of their maximum power, resulting in significant draws on energy consumption [6]. In a realistic cloud setting this problem is further magnified through the operation of potentially hundreds of thousands of servers, depending on the size of the data center.

Figure 1: Typical data center


The proposed approach seeks to address these fragilities by training an RL agent to strategically allocate VMs onto a reduced number of servers in order to conserve energy and to drive a more sustainable green cloud infrastructure. As illustrated in Figure 2, in order to maximise efficiency the VMs have been migrated from servers three and four onto servers one and two, while guaranteeing that a certain portion of resources remains available in the event of a VM experiencing a surge in demand and, as a result, requiring additional resources. This effectively allows the available resources to be optimised on a reduced number of servers while also ensuring an acceptable level of service is provided according to the SLA. Additionally, it allows idle servers to be powered down in order to further conserve energy.

While the implementation of this approach has the potential to achieve significant energy savings, it also provides additional benefits. These include reducing the Total Cost of Ownership (TCO) for cloud providers while diminishing the costs of operating additional cooling and auxiliary equipment. Furthermore, an unoptimised data center as illustrated in Figure 1 leaves fragmented resources spread across numerous machines; by consolidating workloads onto a reduced number of servers the data center can make improved use of its overall capacity, allowing it to better serve the needs of its users.

Figure 2: Optimized data center

1.2 Research Objectives

The following are the research objectives:

• To review the state-of-the-art in data center resource optimisation research in order to gain an understanding of current work in this area.
• To determine the limitations of current approaches and identify a suitable research gap.
• To develop novel RL algorithms for implementation in cloud based infrastructures.
• To test and identify the most efficient algorithm across a range of performance metrics.
• To explore the implications of more advanced RL techniques on energy, performance and SLA.
• To analyse the credibility of the proposed approach against the current state of the art placement policy.

1.3 Research Questions

The research presented in this thesis aims to address the following set of research questions:

1. What are the effects of introducing an RL approach to VM placement on energy consumption and the overall performance of the data center?
2. Can the implementation of a more advanced RL technique further improve the proposed solution?
3. How does an RL solution perform in comparison to the state-of-the-art placement policy?
4. If an RL approach outperforms the state-of-the-art heuristic based placement policy, why is the deployment of a learning agent a better solution?

1.4 Thesis Structure

The structure and layout of this thesis is as follows:

• Chapter 1 provides an introduction to the research project and its subject matter, including the problem formulation, research objectives and research questions.
• Chapter 2 presents the literature review. More specifically, it provides the background material on cloud computing, data center energy efficiency, virtualisation and Reinforcement Learning, and also a review of the most significant and relevant research conducted to date within the field.
• Chapter 3 introduces the CloudSim simulator and describes the key components and the simulation setup that will be required for the work outlined in this thesis.
• Chapter 4 describes the design and development of the RL framework which is the core contribution of this research. More specifically, it introduces the RL algorithms which have been developed explicitly for implementation in a cloud environment.
• Chapter 5 illustrates and discusses how the concept of RL works in a more simplistic gridworld environment.
• Chapter 6 focuses on the implementation of the RL framework. It presents a range of performance metrics which will be used in order to measure the results of each experiment. Furthermore, this chapter analyses and evaluates each of the proposed RL algorithms in order to determine the most promising algorithm for further examination.
• Chapter 7 investigates the effects of introducing a more advanced RL technique and also evaluates the proposed approach against the state-of-the-art consolidation algorithm.
• Chapter 8 provides a conclusion and overall reflection on the research conducted. It also highlights any areas requiring further improvements and future research.

Chapter 2
Background and Literature Review

2.1 Overview

The goal of the following chapter is to firstly introduce the fundamentals of cloud computing. It will explore its origins, the principal characteristics of the cloud computing paradigm and the underlying cloud architecture, and it will also introduce the range of cloud deployment models which are most common throughout the body of literature. This chapter will then provide an insight into contemporary data center energy consumption by examining the key areas of energy consumption within a data center. Following on from this, the chapter will explore the dimensions of virtualisation. It will uncover the origins of virtualisation and the range of available virtualisation techniques and technologies. This section will close with a discussion of one of the most compelling features of virtualisation, namely the capacity to migrate and consolidate workloads.

Next, this chapter will focus on introducing the field of Artificial Intelligence (AI). In particular, this section seeks to develop a theoretical foundation for RL, a method derived from Machine Learning and one of the most active research domains in the field of AI. It will introduce the fundamental RL components and methodologies such as the agent environment interaction model, learning algorithms and action selection strategies. Lastly, this chapter will close by examining the most profound and relevant research conducted to date in the field with regard to energy efficiency and dynamic resource allocation. It will explore both threshold/non AI based approaches and AI based approaches. This section also outlines some of the major work that has been conducted by the Complex Systems Research Group, under which the work in this thesis has been carried out.

2.2 The Evolution of Cloud Computing

The significant advancements that have emerged in technology in the twenty-first century have given rise to an unprecedented paradigm shift in computing known as cloud computing. Since its inception it has rapidly transformed the landscape of Information Technology. The cloud computing paradigm is built upon the ideology of commoditising computing services and delivering them in a way that is analogous to traditional everyday utilities such as water, electricity, gas and the telephone [4]. Prior to the advent of cloud computing it was envisioned that computing would one day be the fifth utility. In fact, in 1961 Leonard Kleinrock, a computer scientist renowned for his role in the development of the Advanced Research Projects Agency Network (ARPANET), the precursor of the Internet, said:

“As of now, computer networks are still in their infancy. But as they grow up and become more sophisticated, we will probably see the spread of ‘computer utilities’ which, like present electric and telephone utilities, will service individual homes and offices across the country” [7].

The development of the cloud computing model has been derived from an evolution of computing movements, each of which has made significant contributions to the technological breakthroughs which propelled the cloud computing paradigm [8]. In the earlier days of the 1950s, service bureaus and time-sharing systems paved the way, followed by more recent predominant developments such as cluster computing and grid computing [9].

2.2.1 Cluster Computing

In the 1980s, at a time when significantly expensive supercomputers were the pinnacle of High-Performance Computing (HPC), a widespread need for smaller scale operations requiring high computing power at a lower cost became evident in both the science and business domains. It was from this point on that cluster computing gained momentum, ultimately emulating computational power analogous to that of the supercomputer [10]. A cluster can be described as a collection of interconnected distributed computers which operate over high speed networks and share computational workloads in order to complete a single task. Each node in the network runs the same OS, while specialised software is responsible for the scheduling and deployment of computational or data intensive tasks across the network [11]. The resources within a cluster often span a particular administrative domain which is managed and controlled by a single entity [4]. In addition, owing to their design characteristics clusters typically promote improved load balancing and high availability, consisting of several redundant nodes which prevent a single point of failure. This is one of the many reasons why clusters are still pervasive today and are leveraged by companies such as Google to provide storage and backup facilities [11, 12].

2.2.2 Grid Computing

Grid computing emerged in the 1990s, inspired by the inherent pervasiveness and reliability of the electrical power grid [13]. Grids provided a coherent solution for solving the large scale, resource intensive computational problems evident within scientific research domains. Such problems required significantly more computing power than the already established supercomputer or cluster computing paradigms could provide [8]. A grid can be described as an enabler for the sharing, selection and aggregation of a wide variety of resources disseminated across multiple heterogeneous administrative domains [13]. Grids are composed of multiple Virtual Organisations (VOs), each of which contributes a small number of resources to a virtual community in return for access to a more formidable pool of converged resources [8]. Unlike cluster computing, the grid ideology consolidates a community of users often governed by their own management, policies and goals [4]. Today grid technology is still utilised across many cutting edge science and engineering research initiatives which capitalise on the availability of idle computational resources [14].

2.2.3 Cloud Computing

Cloud computing has become the most promising paradigm to date, building on the advancements made by its predecessors and gaining exponential widespread adoption across many industries and sectors. Many enterprises leverage cloud computing services in order to benefit from the cost associativity, reliability and dynamic availability of seemingly infinite resources [4]. In October 2007, Google and IBM announced a collaboration to promote developments in large scale computing, and from this point forward the term Cloud Computing became more prevalent [15]. Many experts agree that this new computing model is not solely a combination of the technologies that have come before it, but rather possesses its own innate characteristics which have propelled the development of next generation data centres, advancements in virtualisation technologies and dynamically provisioned resources [8]. A number of driving factors enabled cloud computing to be the success it is today, such as the availability of high speed bandwidth across large interconnected networks, the development of multi-core processors and also state-of-the-art virtualisation technologies [16]. Despite Larry Ellison, Chief Executive Officer (CEO) of Oracle, defining the cloud computing model as “everything that we already do”, there is still no unified definition [17]. Vaquero et al. present more than twenty definitions of cloud computing, which highlights a lack of consensus within the industry [18, 19]. Some of the most widely accepted definitions in the literature are illustrated below.


According to Buyya et al. [20], cloud computing can be described as:

“A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualised computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers”.

The National Institute of Standards and Technology (NIST) [21] defined it as:

“A model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction”.

Armbrust et al. [16] simply state that:

“Cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centers that provide those services”.

Buyya et al. referred to cloud computing as an interconnected network of virtualised computers which allocates computing resources in compliance with service agreements [20]. NIST highlighted some of the key characteristics of the cloud paradigm in their definition, describing it as convenient and provisioned on-demand [21]. Armbrust et al. referred to the applications, hardware and systems software in order to define cloud computing [16]. See Figure 3 for an illustration of the relationship between the aforementioned computing models. This figure shows the earlier supercomputer and cluster computing paradigms located at the application orientated side, while cloud computing is focused on delivering computing as a large-scale service.


Figure 3: Comparison of computing paradigms [8]

2.3 The Key Characteristics of Cloud Computing

Cloud computing possesses several key characteristics which distinguish it from other computing paradigms. These characteristics are core enablers of its widespread adoption, propelling it to the forefront of the computing age. A number of the fundamental characteristics which have been widely discussed in the literature are outlined below.

2.3.1 Elimination of Upfront Investments

In traditional computing many enterprises experience initial high cost expenditures associated with the procurement of the essential infrastructure and software to host many of their services [18]. Cloud computing supports the dynamic leasing of computing resources and infrastructure on a pay-per-use basis, for example processors charged by the hour and storage charged by the day [8]. It converts capital expenses into operating expenses, thereby reducing the TCO and allowing enterprises to focus on their core strengths by outsourcing Information Technology (IT) requirements [16]. The cost efficiency of cloud computing is attributed to vendors operating large scale commodity data centers at low cost locations, reaching massive economies of scale [16].

2.3.2 On-Demand Computing Resources

Cloud users have ubiquitous access to a wealth of configurable resources such as networks, servers and storage which can be rapidly provisioned on the fly, allowing service providers to dynamically utilise resources through the Internet without having to invest in costly equipment [18, 21]. This allows for efficiently safeguarding against temporal variations in demand. This prime feature drives efficiency for service providers at a fraction of the cost, given that traditionally the provisioning of resources was planned in advance. This rigid approach often meant that service providers provisioned for peak loads, leading to underutilisation of resources in off-peak hours [16].

2.3.3 High Scalability and Elasticity

This is one of the most prominent features of cloud computing and the most sought after by service providers. Resources are considered to be elastic, meaning that as demand patterns fluctuate the cloud computing environment can autonomously scale resources up or down with a quick turnaround time. For most this has become an operational requirement, particularly for new start-up services which commonly experience the flash-crowd effect, whereby an initial surge in demand is followed by a levelling off period [16]. To the user these capabilities often give the illusion of limitless server capacity and storage [21]. These attributes are achievable by means of state-of-the-art virtualisation technologies, which allow for the abstraction and encapsulation of the underlying infrastructure, enabling the deployment of a more agile and responsive service with guaranteed delivery on SLAs as defined by Quality of Service (QOS) parameters [8, 19]. Furthermore, Foster et al. highlighted that, with the increasing size of datasets coupled with the widespread adoption of cloud services, the ability of vendors to continuously provide highly scalable and elastic solutions relies on effectively mapping the location of data relative to computational resources [8].

2.3.4 Ubiquitous Network Access

The services hosted in the cloud offer immense flexibility by allowing resources to be easily accessed through standard networking protocols such as Transmission Control Protocol/Internet Protocol (TCP/IP) and Hyper Text Transfer Protocol (HTTP) using a variety of heterogeneous devices [22]. These devices extend to both thick and thin clients such as mobile phones, laptops and desktops [21].

2.3.5 Measured Service

The cloud computing model possesses inbuilt mechanisms which allow for the automatic monitoring, controlling and capturing of usage patterns. This underlying trait lends itself to the implementation and offering of utility based pricing services [18]. This inherent quality also allows for better optimisation and control of virtualised resources in accordance with SLAs [21]. Overall this can be seen to promote comprehensive resource conservation.

2.4 A Review of the Cloud Architecture

Cloud computing can be considered a modern mainstream technology which offers consumers an alternative strategy for the management of their business IT requirements. Cloud computing is based on the principle of the commercialisation of computing services. These services are often referred to as the cloud service stack and can typically be categorised as Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [8]. The integrity of such services is governed by an SLA, a contract between the provider and consumer which details the level of service expected, such as throughput, availability and reliability [23]. These three fundamental services, merged with the systems level hardware and software, constitute the cloud computing architecture. The cloud architecture is often represented by several key layers, which according to Youseff et al. are inherently interdependent [24]. In this sense, the service offerings at each layer are composed from the services of the underlying layers; Youseff et al. referred to this as composable services [24]. To gain a thorough understanding of the architecture, a detailed four layer synopsis is discussed below. See Figure 4.

Figure 4: Cloud computing four layered architecture [18]

2.4.1 Application Layer

The application layer constitutes the software applications which support the SaaS model. Such a service is remotely accessible to the general public over the Internet and typically charged on a utility basis. More specifically, this type of service can be delivered on demand, on a subscription basis or, more commonly, using a pay-as-you-go model [8, 16]. Often the end users of this service have no control over the system other than the typical application settings [25]. SaaS migrates the computational workload from the end user's hardware to the data center where the application is hosted. In this vein, end users benefit from cost savings such as reduced daily operation and support costs. On the other hand, service providers reduce the complexities associated with upgrades and testing while benefiting from a continuous flow of revenue, in comparison to traditional licensing associated with a one-time payment [24].

2.4.2 Platform Layer

This layer leverages PaaS offerings targeted at application developers. It comprises the application frameworks and OS needed to easily build and expedite the deployment of applications on the cloud. In brief, it supports the scalability requirements of new web services, improved load balancing and integration [24]. The provider manages the tools used to build and deploy applications, such as the frameworks and libraries, while also exercising control over the underlying infrastructure such as the network, storage and OS. The consumer has control over the environment configuration settings and the deployment of applications [25]. PaaS motivates rapid application development and for this reason it is often adopted by agile development teams [26]. Developers leverage the enhanced resources without requiring large monetary outlays for the associated software and underlying infrastructure.

2.4.3 Infrastructure Layer

This layer promotes the IaaS model. It enables the provisioning of hardware resources such as processors, storage and network, along with the associated software comprising OSs and virtualisation software [21]. This layer is responsible for providing the fundamental resources for the layers higher up in the service stack. The ability to abstract and encapsulate available resources into numerous VMs is the key enabler of this layer [4]. According to Youseff et al., consumers of this layer have super-user access to their VMs, allowing for greater control by permitting the customisation of the software stack across each VM, which in turn promotes increased performance and efficiency [24].

2.4.4 Hardware Layer

This layer is composed of all of the physical computing equipment in the data center, including servers, routers and switches, as well as the additional power and cooling systems [18]. Youseff et al. identified that this layer also offers a service known as Hardware as a Service (HaaS): the subleasing of entire data centers to interested consumers without requiring the stark investments involved in building a data center. Vendors offering this service scenario are responsible for the operational management and upgrading of the physical hardware [24]. Notably, across the body of literature several research experts have recognised the need for improved interoperability of systems and user interfaces. With cloud computing service providers offering flexible mashups of various applications presented as a unified interface, most argued that as cloud computing becomes more commonplace across all industries there will be a growing need to design a transparent platform across the entire cloud ecosystem to further expand the capabilities of the cloud [8, 24, 27].

2.5 The Cloud Computing Deployment Models

In general there are four main deployment models, which can be classified as public clouds, private clouds, hybrid clouds and community clouds. Each model is characterised by an inherent set of attributes and is often harnessed to meet different business requirements. Additionally, in more recent times the cloud movement has made progress with a more advanced deployment model known as a federated cloud. This model operates at a much more global level than the traditional models mentioned above. This section will also address this newer model in order to present the current body of knowledge on the subject.

2.5.1 Public Clouds

The public cloud is often leveraged by small to medium businesses as a primary solution to their IT requirements due to the cost savings that can be harnessed. Larger organisations often use a public cloud as an extension to their in-house server setup [25]. The public cloud offers computing resources made available to the general public and accessible through the Internet. The range of services available, as outlined in the previous section, is charged on a pay-per-use basis. This type of cloud is relatively cheap in comparison to private clouds as it does not require investment in costly infrastructure [25]. Additionally, it allows service providers to shift the risk of downtime and the provisioning of resources to the infrastructure provider, preventing the idle resources often associated with traditional computing. However, one of the major concerns of utilising a public cloud is the lack of control over the associated hardware, network, data and security [18].


Figure 5: Public cloud [28]

2.5.2 Private Clouds

Also known as an internal cloud, a private cloud refers to the in-house data center of a business which is built and operated for the explicit use of the business's employees, customers and partners, often secured by means of an intranet [25]. This type of setup can be fully managed by the business, outsourced to a third party or, in certain circumstances, a combination of both [21]. This type of solution guarantees improved control over performance, reliability and security. In contrast, it is more expensive, analogous to the large capital expenditures associated with traditional proprietary server farms, and is often only affordable for larger, more viable businesses [18].

Figure 6: Private cloud [28]

2.5.3 Hybrid Clouds

This can be considered a more blended approach to cloud computing: a combination of both a public and a private cloud solution. Hybrid clouds provide more flexibility by enabling easy migration of applications and data between both infrastructures [21]. Furthermore, they permit better control over security and the locality of applications and data while still leveraging the benefits associated with public clouds such as scalability and elasticity [25].

Figure 7: Hybrid cloud [29]

2.5.4 Community Clouds

This type of cloud infrastructure is set up by a specific cohort of organisations that have a similar requirement or a shared concern. It may be owned, managed and controlled by a single organisation or by several organisations within the community; alternatively it can be managed by a third party or, in some use cases, a combination of both [21].


Figure 8: Community cloud [29]

2.5.5 Federated Clouds

A federation is, simply put, the interconnection of several clouds owned by distinct vendors. At any given time a cloud environment is constrained to a limited amount of hosting capacity. The standard architecture and design does not easily lend itself to dynamically distributing load across other globally distributed clouds in the event of resources becoming saturated [8]. This outcome has a detrimental effect on a provider's ability to deliver on SLAs [30]. The ideology behind a federated cloud is the pooling of unused capacity from geographically diverse locations into a global marketplace which allows the real time trading of capacity. A federated cloud allows for the optimisation of service placement relative to geographic location, resulting in reduced latency and increased QOS [31]. It empowers cloud providers to become more agile and adaptable in the event of a natural disaster and, furthermore, it promotes energy efficient data centers in order to drive the ideology of green cloud computing, a major issue which will be discussed in the next section [30, 31].

Buyya et al. proposed a federated inter-cloud environment [30]. See Figure 9 for an overview. This proposed model consists of entities such as the following (a brief illustrative sketch is given after the list):

• Cloud Exchange: Responsible for the aggregation of available resources and the processing of incoming queries.
• Broker: Mediates between consumer and cloud provider by negotiating the provisioning of resources to meet the consumer's QOS parameters.
• Cloud Coordinator: Responsible for publishing available resources in the directory of the cloud exchange.
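To make the division of responsibilities more tangible, the short Java sketch below models the three entities as interfaces. The type and method names are hypothetical illustrations; they are not taken from Buyya et al.'s specification [30] or from any existing toolkit.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the federated inter-cloud roles described above.
// All names and signatures are illustrative assumptions, not an existing API.
interface CloudExchange {
    // Aggregates the resources published by participating providers.
    void publishOffer(String providerId, Map<String, Integer> availableResources);

    // Processes an incoming query and returns the providers able to serve it.
    List<String> query(Map<String, Integer> requestedResources);
}

interface Broker {
    // Negotiates provisioning on behalf of a consumer so that the
    // requested QOS parameters (here, a latency bound) can be met.
    String negotiate(Map<String, Integer> requestedResources, double maxLatencyMs);
}

interface CloudCoordinator {
    // Publishes a provider's spare capacity into the exchange directory.
    void advertise(CloudExchange exchange, Map<String, Integer> spareCapacity);
}
```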


Figure 9: Federated network of clouds [30]

Throughout the literature several researchers, such as Foster et al. and Armbrust et al., have called for better interoperability and collaboration between service providers [8, 16]. The lack of universal standards governing cloud technology is a nontrivial issue and creates barriers that prevent the leveraging of future opportunities. To date the developments of cloud federations are still in their infancy, with many research initiatives being undertaken by both industry leaders and research bodies. Developments to date have included the OPTIMIS toolkit, Open Cirrus and Contrail [32]. More recently, in 2014 Cisco announced major investments into the development of the largest global cloud federation [33].

2.6 The Data Center and Energy Consumption

Data centers are a basic and fundamental unit of cloud computing; they can be seen as the backbone of modern IT solutions, providing the essential infrastructure for the execution of the most critical business processes. A data center can refer to a server room in an organisational setting, an enterprise data center serving an entire corporation or, alternatively, server farms hosting a multitude of cloud computing services operated by industry tycoons such as Amazon, Google and Facebook [6]. A significant ongoing concern fuelled by the alarmingly high power and energy consumption rates endemic in existing data centers has become a prevalent issue in recent times. According to the Natural Resources Defense Council, the explosion of big data, digital content and e-Commerce are major contributing factors as to why data centers are one of the fastest growing users of electricity [6]. More specifically, in 2013 U.S. data centers consumed an estimated total of 91 billion kilowatt-hours of electricity, the equivalent of not only the output of 34 large coal-fired power plants but enough electricity to supply all of the households in New York City twice over. By 2020 the level of consumption is estimated to increase to approximately 140 billion kilowatt-hours annually, costing $13 billion per year in electricity bills and, furthermore, producing 150 million metric tons of carbon pollution [6]. According to the GeSI SMARTer 2020 report, data center emissions are projected to increase by 7% annually [34]. Nevertheless, it is also important to highlight that, according to the literature, larger companies such as Google, Facebook and Microsoft have placed major emphasis on data center energy efficiency, motivated by the Total Cost of Ownership (TCO) and also a significant amount of pressure from environmental organisations. However, these state-of-the-art data centers only represent a small minority of globally installed data centers, between 5% and 7% to be precise. More profoundly, the remainder have at best failed to adopt industry best practices [6, 34]. These findings accentuate the importance of implementing techniques to curtail energy and power consumption within data centers while also supporting environmental sustainability.

2.6.1 Key Areas of Energy Consumption

Energy management continues to be a key issue and a growing concern for data center operations. In order to address such a concern it is essential to identify the core elements in a data center which are responsible for high levels of energy consumption. The scope of energy efficient design comprises both computing devices and associated auxiliary equipment, such as cooling systems and Uninterruptible Power Supply (UPS) units, which assist in sustaining a reliable system [35]. Figure 10 illustrates a breakdown of data center energy overheads adapted from a study carried out by Google [36]. Notably, one of the largest areas of energy consumption is IT equipment, accounting for 30% in total, while the remainder is shared by several other associated infrastructural components.

Figure 10: Breakdown of data center energy overheads [36]

Power Usage Effectiveness (PUE) is an industry recognised data center efficiency metric based on the total amount of energy used by a data center relative to the energy used by the IT equipment. PUE [37] can formally be defined as:

$$\mathrm{PUE} = \frac{\text{Total Facility Energy Consumption}}{\text{IT Equipment Energy}}$$

A PUE ratio of 2.0 signifies that for every watt consumed by IT equipment an additional watt is used by associated auxiliary equipment. Ideally, a well designed and operated data center should yield a PUE ratio of less than 2.0. State-of-the-art data centers often achieve a ratio of approximately 1.2, while less prestigious facilities yield ratios of up to 2.0 [38]. While much research and development in the last five years has been devoted to data center energy efficiency, most evidently efficiency improvements in hardware and auxiliary equipment [35], surprisingly only a small portion has been geared towards server utilisation and operational efficiencies [6]. Moreover, according to current research, low levels of server utilisation are the most profound source of energy inefficiency: idle servers are reported to draw more than 70% of their peak power, therefore yielding the greatest opportunity for energy savings [6, 35].

2.6.2 Server Power Consumption

In order to investigate energy consumption within a server it is essential to identify the main components involved. Google, a strong advocate of data center energy efficiency and an operator of global state-of-the-art data centers, conducted research into energy utilisation in modern IT equipment. A major finding, which further solidifies the continuous need for improved efficiency, was that on average servers spend a relatively small portion of their total time at peak capacity; instead, a substantial amount of time is spent within the 10-50% Central Processing Unit (CPU) utilisation range. Furthermore, the results also uncovered that the most concentrated use of energy occurs in the CPU at a rate of 33%, followed closely by Dynamic Random Access Memory (DRAM), see Figure 11 [36]. These findings show that the CPU no longer holds dominance over the other various subsystems in terms of energy consumption. This has been the result of incremental improvements in CPU power efficiency converged with the implementation of power saving mechanisms such as Dynamic Voltage Frequency Scaling (DVFS). DVFS gives rise to active low power modes enabling the CPU to operate at power levels as low as 30% [35]. The following subsection will discuss the associated power management methods in more detail.

Figure 11: Server power consumption [36]

2.6.3 Power Management Techniques

Server power management can be broadly classified into Static Power Management (SPM) and Dynamic Power Management (DPM) techniques [35]. In order to remain within the scope of this thesis the author will primarily focus on DPM. DPM can be described as a methodology which enables run time dynamic reconfiguration of a system's resource utilisation relative to current resource requirements [35]. It achieves energy efficient computation by selectively reducing the performance of components or, alternatively, decommissioning components that may be underutilised or remain in an idle state [39]. The inherently complex and non-deterministic system environment stipulates that the implementation of DPM techniques must be governed by a Power Manager (PM). The PM is responsible for implementing policies based on a set of observations or assumptions on any given workload [39]. According to Beloglazov et al. DPM techniques can be further subdivided into Dynamic Component Deactivation (DCD) and Dynamic Performance Scaling (DPS) [35].

Dynamic Component Deactivation
This technique is often implemented by components which do not support DPS and therefore require transitions between a finite number of power states, for example active, standby and sleep modes [35]. Transitioning between power modes incurs a significant power and performance overhead. Therefore, in order to achieve efficiency, the projected power savings must be greater than the power required to reinitialise such a component [40]. In some instances the power consumed during a transition can be comparable to that consumed in active run time mode [39]. This accentuates the importance of strategic power management policies. In order to implement DCD a workload prediction model is used to forecast efficient transitions [35]. The prediction model is a product of static and adaptive predictive techniques, which give rise to predictive shutdown and predictive wake up power modes. In order to make a prediction a correlation must exist between past historical data of a system's behaviour and its future demands; the strength of the correlation impacts greatly on the overall accuracy of the prediction [35, 40].

Dynamic Performance Scaling
Components such as the CPU support the dynamic adjustment of their performance relative to resource requirements. One of the latest DPS techniques utilised in all modern processors is DVFS. In order to drive CPU optimisation the CPU clock frequency and voltage supply are altered as necessary. This prevents component deactivation, improves the amortisation of resources and promotes energy efficient consumption without affecting performance adversely [35, 41, 42]. More concretely, DVFS curtails the number of instructions the CPU can process over a particular time frame. Notably, however, this reduction in performance also increases the run time of a program, and therefore the careful management of energy/performance trade-offs proves essential [35]. There are also challenges associated with the execution of this technique, as outlined by Venkatachalam et al. [41]. Firstly, the algorithm must accurately predict workloads at any given time. This proves elusive given the unpredictable stochastic nature of workload patterns, for example arbitrary I/O device requests. Furthermore, determining the velocity at which the CPU should operate is also a non-trivial matter. There are a number of existing methods for power conservation which leverage DVFS. Interval based methods are analogous to predictive and adaptive DCD techniques in that they utilise information from past CPU activity in order to predict future utilisation rates. Intertask methods identify the various tasks running in the system and allocate each of them an appropriate performance level relative to their requirements. Lastly, intratask methods investigate the internal structure of a program and provision resources appropriately [35]. A simple interval based policy is sketched below.
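To make the interval based idea more concrete, the following minimal Python sketch predicts the next interval's CPU utilisation as a moving average of recent observations and then selects the lowest discrete frequency able to serve the predicted demand. The frequency levels, window size and example utilisation history are hypothetical illustrative values rather than parameters of any real DVFS governor.

    # Illustrative sketch of a simple interval-based DVFS policy.
    FREQUENCY_LEVELS = [0.6, 1.2, 1.8, 2.4]  # hypothetical CPU frequencies in GHz
    F_MAX = FREQUENCY_LEVELS[-1]

    def predict_utilisation(history, window=5):
        # Moving-average prediction of the next interval's utilisation (0.0-1.0).
        recent = history[-window:]
        return sum(recent) / len(recent)

    def select_frequency(history):
        # Choose the lowest frequency whose capacity covers the predicted demand.
        predicted = predict_utilisation(history)
        required_capacity = predicted * F_MAX  # demand expressed in GHz
        for f in FREQUENCY_LEVELS:
            if f >= required_capacity:
                return f
        return F_MAX

    # Example: a host that has recently hovered around 40% utilisation
    print(select_frequency([0.35, 0.42, 0.38, 0.45, 0.40]))  # -> 1.2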

2.7 Virtualisation

The era of pervasive on-demand computing is heavily accredited to virtualisation. Virtualisation is the foundational idea upon which the cloud computing paradigm is built, and the concept has been widely understood for a significant amount of time. Its origins are rooted as far back as the early 1960s, when the Atlas computer, known as the first supercomputer, became the first machine to deploy a Virtual Machine Monitor (VMM). The Atlas computer was a development project run by the University of Manchester; it fundamentally utilised a supervisor to manage computational resources and provision the computing environment for executing user program instructions. In essence this development marked the birth of the VMM, as it initiated the ideology of an abstraction layer, a principle fundamental to the core design of VMMs today [43]. One of the most influential developments in virtualisation history occurred in the late 1960s when IBM fostered CP-40, influenced by the ongoing resource intensive research initiatives evident in science and engineering domains. CP-40 marked a turning point in the evolution of virtualisation, attributed as the first successful virtual machine OS. It leveraged fully virtualised hardware to support shared access to costly resources. CP-67 eventually replaced CP-40 and soon after its inception gained widespread acceptance as a time-sharing system [44]. It was not until 1972 that the IBM System/370 propelled the commercialisation of virtualisation technology, allowing it to achieve a significant foothold in the market. Today the emergence of a variety of technological advancements has spurred considerable demand for the development of virtualisation software [43].

2.7.1 An Abstract Description

Virtualisation is the fundamental concept which powers the cloud computing paradigm. It can be defined as the faithful reproduction of an entire architecture which provides the illusion of a real machine to all software running above it [45]. Virtualisation promotes the abstraction and encapsulation of the underlying infrastructure, resulting in the optimisation of resources while delivering highly scalable and elastic computing services, an essential inherent feature of the cloud computing paradigm [4]. Virtualisation software is responsible for controlling and provisioning resources to all VMs operating on the host. The VMM typically resides between the underlying hardware and the OS; this facilitates the decoupling of the physical hardware from the associated software for improved modularity, therefore allowing VMs to operate in apparent isolation [35]. The utilisation of virtualisation technology achieves improved performance and fault tolerance while supporting not only hardware and software heterogeneity but also the dynamic migration and consolidation of workloads for improved efficiency [46]. Although in the past decade there has been a tremendous surge in the adoption of virtualisation software, the average server utilisation rate is still just 12-15%, indicating the need for further advancements [6].

Figure 12: Virtual machine abstraction [47]

2.7.2 Virtualisation Techniques

The design of the x86 architecture presents a significant shortcoming pertaining to the implementation of a virtualised environment. As outlined by Neiger et al. the architecture comprises four privilege levels fundamental to the management of the underlying hardware [48]. The OS resides at level 0, obtaining the highest level of privileges. The insertion of a virtualised layer forces the OS to shift upwards to a less privileged level, which is known as ring compression. As a result it prevents the execution of privileged instructions as they are deemed semantically incompatible. This initiated the development of three virtualisation techniques known as para virtualisation, full virtualisation and hardware assisted virtualisation, which seek to circumvent the aforementioned shortcoming.

Full Virtualisation
Full virtualisation, also known as hardware emulation, allows for the decoupling of the guest OS from the underlying hardware. In order to implement such an approach the kernel code must be trapped and translated using binary translation, thus replacing the original instructions with an equivalent set of executable instructions [49]. Full virtualisation maintains high performance during this process by enabling the CPU to directly execute user level code [50]. This approach proves advantageous in that the guest OS shows no awareness of the virtualised environment, enabling improved security and isolation in a shared environment. Additionally, the unmodified guest OS is compatible for execution on both native and virtualised hardware and therefore facilitates seamless migration [50]. Unfortunately, the additional task of trapping privileged instructions incurs a significant performance penalty, resulting in multiple instructions being translated together [49].

Para Virtualisation
This approach involves modifying the kernel of the guest OS and substituting the instruction architecture with hypercalls. Hypercalls allow a direct channel of communication between the guest OS and the VMM [51]. This enables improved performance and reduced overhead by eliminating the need for binary translation or resource emulation, as the guest OS is aware of the shared virtualised environment within which it operates [49, 50]. This technique is often susceptible to significant compatibility and maintainability issues, particularly evident within production environments, due to the required kernel modifications [50].

Hardware Assisted Virtualisation
Vendors such as Intel and AMD sought to address the virtualisation challenges that persisted in the x86-64 series processors, offering built-in virtualisation features which eradicated some of the limitations and overheads evident in the previous two approaches [49]. Intel's VT-x virtualisation extensions introduced two additional modes of operation known as VMX root and VMX non-root. Both modes support all four privilege levels, which in turn allows guest OSs to execute on the intended level 0 privilege layer [48]. More concretely, the VMX root mode is deployed by the VMM and provides a mechanism to automatically trap incoming privileged instructions, while the VMX non-root mode is leveraged by the guest OSs and the application layer [48, 50]. The management and mediation between the VMM and VM instances is aided by an additional instruction set known as Virtual Machine Extensions (VMX). Furthermore, transition mechanisms known as VM exits and VM entries are also harnessed and managed by what is known as a Virtual Machine Control Structure (VMCS). This allows for improved management over the processor state for simultaneous interactions [48]. This entire approach provides a coherent and simplified solution to achieving a high performance virtualised environment while supporting the direct execution of native OSs on the same physical platform [49].

2.7.3 Virtualisation Technologies

The evolution of virtualisation greatly revolves around the notion of a VMM, also commonly referred to as a hypervisor. As previously stated, a VMM is an integral component of the cloud computing paradigm. It enables successful dynamic sharing of the underlying resources between a multitude of disparate guest OSs while also supporting the migration of workloads for improved energy efficiency and performance. VMMs can be broadly classified as bare metal or hosted. A bare metal VMM is installed directly onto the hardware while a hosted VMM resides on a layer above the host's OS; both are commonly in use today. Below is an outline of the most widely deployed VMMs.

Xen
Xen is an open source virtualisation technology originally fostered at the University of Cambridge and continues to be developed by the Xen open source community [52]. Xen has in recent times gained widespread adoption and today plays a pivotal role in powering a large number of state-of-the-art public clouds such as Amazon Web Services and Rackspace. Initially Xen was introduced for implementation on paravirtualised guest OSs; its earlier releases were heavily influenced by the limitations pertaining to full virtualisation, such as limited cooperation and an overall reduction in performance [52]. In 2006, Intel contributed modifications to Xen technology which led to the release of Xen 3.0. It was considered the first open source bare metal VMM to harness Intel virtualisation technology and support both fully virtualised and paravirtualised guest OSs [53]. The Xen architecture is compartmentalised into several key subdomains. Domain 0 is a mandatory control domain, composed of a centralised interface responsible for the creation, configuration and termination of all guest domains [52]. Furthermore, it provides the management required for all I/O devices connected to the platform and also acts as an intermediary domain for the transportation of data between all other guest domains [53]. Each guest OS is contained within a non-privileged isolated domain, often referred to as DomainU. This serves as a security measure in the event of a failure or the occurrence of malicious activity [54]. Memory management is controlled by initialising each VM with a specified allocation of memory resources, often bound by a maximum allowable parameter. VMs can request additional memory pages from Xen provided they are operating within their reservation limit. Each VM can also evade unnecessary costs by releasing idle memory back to Xen, thus incentivising improved optimisation of memory resources [52]. All I/O requests are coordinated through Xen shared memory and asynchronous buffer descriptor rings. This allows Xen to validate memory usage for each hosted VM. In many instances the guest OSs multiplex update requests and transfer them together in order to better amortise the overhead of interacting with the VMM [52].

KVM
Also known as the Kernel-based Virtual Machine, KVM is an open source technology which, like Xen, operates under the GNU General Public License. KVM supports full virtualisation and is the only VMM to require the Intel VT-x or, alternatively, AMD-V instruction sets in its operation [47]. In order to provide support for both hardware extensions KVM leverages a function pointer vector which detects architecture dependent operations. As a result such operations are contained within modularised containers which isolate them from core KVM functionality [55]. KVM is a hosted VMM, implemented as an additional module in the Linux kernel, and supports native guest OSs [56]. The VMM is capable of hosting multiple VMs which appear as regular processes utilising both the native Linux scheduler and memory management features. This in effect reduces the complexities of standard VMM implementations and provides seamless integration with the existing Linux system [55]. The system architecture is partitioned into three distinct execution modes known as the user mode, kernel mode and an additional guest mode [56]. In order to deliver a virtualised platform a systematic coordination among all three execution modes must occur. Each mode is assigned a different level of privileges, thus providing a shield for preventing rogue users from causing harm to the central system [55]. KVM provides a Command Line Interface (CLI) allowing users access to advanced features for controlling power management and memory allocation for hosted VMs [47]. More specifically, VMs are initialised by means of invoking the /dev/kvm device node; this spawns a guest VM which is allocated a subset of the system's resources. The device node has many auxiliary capabilities which give rise to a virtual environment, including reading and writing to and from CPU registers and injecting interrupts into virtual CPUs [57]. The KVM virtual memory system maps virtual memory pages into physical addresses by instantiating separate page tables for both the Memory Management Unit (MMU) and the guest VM. Harmony and consistency among both entities is achieved by trapping instructions which attempt to update the original page table and synchronising them with a so called shadow page table. For improved efficiency the MMU is equipped with a Translation Lookaside Buffer (TLB) which accelerates lookups in the page table. Additionally, KVM also supports caching virtual memory by creating write-protected guest memory pages which are shadowed by the KVM. This greatly improves guest performance [55].


KVM also facilitates I/O virtualisation by means of offloading programmed I/O and memory mapped I/O requests to a device model which simulates their behaviour [55]. Similar to Xen, KVM also supports live migration of VMs for improved optimisation of available resources [47].

VMware
VMware was initiated in 1998 as a proprietary technology with a limited licensing scheme. Since its inception VMware technology has proliferated extensively; today it is the current market leader in commercialised virtualisation technology [49]. VMware's more recent endeavours have resulted in the Elastic Sky X (ESX) Server, a bare-metal VMM which exerts greater control over resource provisioning and delivers improved I/O performance [58]. The ESX VMM leverages a specially configured Linux/Unix proprietary OS. The platform supports a range of native commodity OSs including Linux and Windows [47]. The VMM creates the illusion of virtualised memory by introducing an extra layer of address translation, analogous to that of Xen and KVM. Memory within each VM is partitioned into guest virtual memory and guest physical memory. The guest physical memory houses a virtual page table invisible to the MMU [58]. The VMM intercepts instructions which attempt to alter the guest OS page tables, thereby preventing updates to the MMU and TLB [52]. The VMM deploys for each VM a physical memory map data structure known as a pmap. This provides a mechanism for tracking the translation of virtual pages to physical pages. The shadow page table harnesses the pmap data structure to maintain table consistency, which is subsequently utilised by the processor [58]. In order to discourage VMs from accumulating idle memory the VMM implements a so called memory tax, similar to that of Xen. By contrast, however, the VMM supports memory overcommitment. This feature permits VMs residing on the host to utilise more memory than is available on the physical host. Although such a feature may seem counter-intuitive, the ideology is based on the presumption that most VMs will often only utilise a fraction of the memory allocated to them [59]. Memory reclamation techniques play a pivotal role in memory overcommitment while, in a broader sense, they are also a significant enabler in the delivery of highly scalable and elastic computing services [59]. Several techniques can be deployed in order to reclaim idle memory; one of the more commonly used is memory ballooning [58]. The guest OS comprises a balloon driver; the server invokes the balloon causing it to expand, implicitly coaxing the guest OS to reclaim any idle memory. Other approaches include content-based page sharing and demand paging; for further detail on the above techniques consult [58, 59]. VMware also supports live migration, implementing VMware VMotion and the VMware Distributed Resource Scheduler (DRS) to coordinate such an activity. VMotion enables seamless transportation of VMs from one host to another, while DRS is a mechanism which monitors resource utilisation among a collection of servers and leverages VMotion technology to rebalance system load relative to the current workload. This in essence promotes full optimisation of available resources [35].

2.7.4 Migration and Consolidation

Migrating VM instances across distinct physical hosts is a fundamental feature of the cloud architecture. Migration facilitates load balancing for improved system stability, it provides a mechanism for system maintenance and it also drives performance optimisation for better delivery of SLAs [60]. Migration also plays a key role in reducing energy and power consumption, a notably pressing issue in the operation of a data center facility as highlighted in the previous sections. In order to reduce power consumption, workloads operating on underutilised hosts are migrated and consolidated into a single coherent management domain, which improves resource utilisation while also enabling the underutilised server to be decommissioned. Consolidation therefore allows for efficient amortisation of idle power costs over a period of time [61]. Aggressive consolidation of VM instances can, however, lead to performance degradation when applications experience a surge in the demand for resources [62]. According to Srikantaiah et al. there exists an optimal energy/performance point [61]. Striking a balance between both is essential to achieving high performance while reducing overall energy consumption. Moreover, there are additional factors that must also be considered. These include minimising downtime, during which a service is unavailable, and total migration time, spanning from the initialisation of the migration process up to the decommissioning of the original VM [60]. Measures must also often be put in place to safeguard against the possibility of interrupting active services, a problem commonly associated with resource contention among competing VMs [63].

Live and Offline Migration
The process of migrating VM instances can be broadly classified into two types, often referred to as live migration and offline migration [35]. In order to implement live migration there are several approaches that can be adopted, each of which produces variations in downtime and also total migration time. One of the most common approaches to achieving transparent migration is referred to as pre-copy [64]. The migration process is initialised by selecting a target host with the available resources; an initial request is sent to the destination host which invokes the instantiation of a VM container comprising the required resources. An iterative pre-copy approach is then executed which transfers all VM memory pages to the destination host in the first iteration; subsequent iterations scan the VM memory for any pages modified since the last transfer. The migration process then enters a stop-and-copy phase which involves several tasks such as suspending the VM instance, redirecting network traffic to the new host and also transferring the CPU state and any other inconsistent memory pages. Once the destination host receives a consistent VM image the primary VM instance is decommissioned and from this point forward the migrated VM becomes active [60]. The major difference between live migration and offline migration is often attributed to the suspension of the VM instance evident in the offline approach; once the VM instance is suspended, only then is its contents copied over. This approach is often susceptible to system downtime and additional overheads which make it a less attractive alternative for VM migration [35]. In this respect, live migration can be described as a more robust and resilient approach to VM migration. Often, to simplify the overall migration process with regard to both approaches, modern data centers eradicate the need to migrate disk storage by consolidating storage on what is referred to as a Network-Attached Storage (NAS) device, which is accessible to all hosts on the network [60]. The pre-copy procedure is sketched below.
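For illustration, the following Python sketch outlines the iterative pre-copy procedure described above. The vm, source and destination objects and their methods are hypothetical placeholders, not an API of any real hypervisor, and the stopping threshold is an arbitrary illustrative value.

    # Illustrative sketch of pre-copy live migration; all objects are hypothetical.
    def live_migrate(vm, source, destination, max_rounds=30, dirty_threshold=50):
        # 1. Reserve a container with the required resources on the destination host.
        destination.reserve_container(vm.resource_requirements())

        # 2. Iterative pre-copy: the first round transfers all memory pages, subsequent
        #    rounds resend only the pages dirtied since the previous round.
        pages = vm.all_memory_pages()
        for _ in range(max_rounds):
            destination.receive_pages(pages)
            pages = vm.pages_dirtied_since_last_transfer()
            if len(pages) <= dirty_threshold:   # dirty set small enough to stop
                break

        # 3. Stop-and-copy: suspend the VM, redirect traffic, transfer the CPU state
        #    and any remaining inconsistent pages.
        vm.suspend()
        destination.receive_pages(pages)
        destination.receive_cpu_state(vm.cpu_state())
        source.redirect_network_traffic(vm, destination)

        # 4. Commit: activate the migrated VM and decommission the original instance.
        destination.activate(vm)
        source.decommission(vm)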

Figure 13: Virtualised servers with live migration capability [63]

Migration Over a LAN/WAN Network
Migration does not merely refer to the process of transferring workloads from one host to another within the same network. As highlighted by Harney et al., its application can be portrayed in a much broader sense by offloading VM instances between a number of interconnected networks [65]. Live migration can be harnessed over both a Local Area Network (LAN) and a Wide Area Network (WAN). As discussed above, migration over a LAN facilitates the high speed distribution of an entire VM environment across a heterogeneous collection of hosts. By contrast, live migration across a WAN proves to be a far greater challenge. More specifically, there are often several obstacles associated with this activity. Transferring a VM's network state without compromising access to the service is a non-trivial challenge which entails modifications to the VM's original IP address. Furthermore, such an activity can also incur high latency overheads; unlike migration in a LAN setting, servers residing on discrete networks often require the transferring of memory and VM disk state over the network [66].

2.8 An AI Approach to Optimisation

The core objective of AI is to attempt to understand human cognitive behaviour, that is perception, reasoning, prediction and manipulation, and to simulate such processes in an automated environment [67]. Bellman, a staunch advocate of mathematics and computational intelligence, formally defines AI as: “The automation of activities that we associate with human thinking, activities such as decision making, problem solving, learning..” [68]. The first landmark development in the area of AI can be dated back to 1943, when Warren McCulloch and Walter Pitts developed the first mathematical model of an artificial neural network [69]. In the 1950s major works in the field of AI began to emerge which formed the foundations for more contemporary developments. These works included the Turing Test, pioneered by Alan Turing, a benchmark utilised to determine a machine's capacity to “think” in a manner analogous to that of a human. Others included Newell and Simon's General Problem Solver (GPS), credited as the earliest computer program to harness human problem solving approaches in order to solve a selection of puzzles [67]. Eventually, in the year 1956, the term Artificial Intelligence was coined by John McCarthy, a significant and influential figure in the field of computational rationality [67]. Since its earliest endeavours the field of AI has greatly diversified into many sub-fields and is indeed today considered an interdisciplinary approach to problem solving. However, it was the earlier work devoted to learning by trial and error, optimal control and temporal-difference methodologies, which surfaced in the 1980s, that propelled the more modern field of RL. Notably, advancements such as Bellman's equation, the MENACE trial-and-error learning system and also Samuel's checkers-playing program have all signified major milestones in the evolution of RL [70]. RL has in recent years invoked major interest among the Machine Learning and AI communities [71]. RL provides a more attractive approach to learning and optimisation. It is fundamentally based on the notion of an autonomous agent with the inherent capability to learn by means of environmental interaction. This contrasts significantly with the more commonly implemented method of Supervised Learning, which requires a set of input-output pairs from which an agent can infer optimal behaviour [72]. Since its advancement, RL theories have been successfully applied to a wide variety of complex problem domains such as video gaming, cloud computing, urban traffic congestion management and also autonomous helicopter aerobatics [73–76].

2.8.1 Agents

An agent can be described as anything that has the innate ability to perceive its environment, often by means of sensors, and react to its environment through actuators [67]. In general there are several types of agents, each with different design objectives and often deployed in distinct environments. These agents, as outlined by Wooldridge and Russell & Norvig [67, 77], include:

• Simple Reflex Agents: Also known as purely reactive agents. These are the most simple type of agents, which react to their environment based on their current perception rather than an accumulated historic perception. The internal workings of such an agent consist of a current internal state which is responsible for guiding the agent's decision process; also present is a set of condition-action rules which are called upon to invoke some action. The design of such an agent often results in limited intelligence and, in order to be implemented successfully, they require an environment that is fully observable.

• Model Based Reflex Agents: Unlike the previous agent, this type of agent implements an internal data structure which is a mechanism for tracking environmental states and accumulated historic data. The internal state of such an agent is updated at regular time steps, often by means of a cyclical process of combining the current perception with the internal state and thereby generating current state updates. One of the more distinct differences between this agent and the previously discussed agent is that two additional factors are leveraged. These comprise knowledge about how the world evolves and also the impact of the agent's actions on the world. For this reason such agents have the capacity to operate in partially observable environments.

• Goal-Based Agents: This type of agent makes informed decisions based on goal information which describes situations that are perceived to be desirable. In order to execute actions that eventually achieve desirable goals the agent combines information about the expected outcomes of possible actions with goal information. This approach inherently incorporates a level of reasoning and consideration of future situations.

• Utility Based Agents: In order to generate optimal behaviour, as opposed to merely correct behaviour, a utility based agent requires an additional factor known as a performance measure. This measure in essence maps a state or sequence of states to a utility, while the objective of this type of agent is to maximise the expected utility. This agent is considered to be more robust as it is instilled with the capacity to be deployed in partially observable environments with high levels of uncertainty.

• Learning Agents: This type of agent is one of the more advanced and complex agents. It is composed of learning capabilities which allow such an agent to operate in initially unknown environments while becoming increasingly more competent over time. Such an agent often comprises four crucial components: a critic, which is responsible for determining the performance level of the agent; a learning element, which processes feedback from the critic to enhance future performance; a performance element, responsible for the selection of external actions; and lastly a problem generator, which invokes exploration within the agent in order to potentially discover more optimal solutions in the long term.

2.8.2 Multiagent Systems

Multiagent Systems (MAS) have been theoretically well-grounded within the field of AI for some time; however, their inherent ability to achieve interoperability has gained them widespread recognition across multiple domains. As a result, interacting systems have become more commonplace and pervasive throughout the computing era [77]. For this reason this thesis will provide a brief overview of MAS. MAS are composed of an assembly of autonomous agents, often referred to as a society of agents, within which each agent is composed of a set of commitments [78]. Agents operating in a MAS influence different parts of a shared environment. Often dependency relationships exist between agents whereby an agent requires the services of another agent in order to achieve a goal. Dependency relationships can occur at discrete levels of complexity, such as unilateral, whereby one agent depends on another but not the inverse. A mutual dependency relationship is one where both agents depend on one another and share a common goal. Reciprocal dependency can also occur, when both agents depend on each other to achieve different goals [77]. Two of the most fundamental and desirable characteristics of MAS are cooperation and collaboration. Cooperation transpires when agents identify common goals and recognise that other agents are committed to the same goals [78]. Cooperation allows agents to operate in a systematic and coherent manner. Harmony among cooperating agents is often controlled by arbitration and negotiation, thus preventing overall system and performance degradation [79]. Collaboration among agents requires the decomposition of a problem into a number of sub-tasks which are solved simultaneously by several heterogeneous agents [77]. Communication between agents allows for the coordination of actions and behaviours, allowing agents to leverage information consumed by other agents operating in the environment, thus expanding their perceptive capabilities [79]. Since the rise of MAS a number of agent communication languages have been developed which can be broadly classified into KQML and FIPA-ACL implementations [77]. For more information related to the domain of MAS refer to Wooldridge, who provides an intuitive explanation [77].

2.8.3 Reinforcement Learning

Reinforcement Learning enables an agent to learn optimal behaviour in a highly stochastic, non-deterministic environment with no prior knowledge of its surroundings. An agent interacts with its environment in order to gain knowledge about how to optimise its behaviour and discover the most optimal policy to satisfy its design objectives; this is often a cyclical learning process of state-action-reward interactions, as illustrated in Figure 14. The agent must discover which actions yield the greatest reward by means of trial and error. Notably, the selected action affects both the immediate reward and also successor environmental states, and therefore all subsequent rewards; in the literature this is often referred to as a delayed reward [80].

Figure 14: The agent environment interaction model [70]

Markov Decision Process
All RL control problems can be intuitively modelled as a Markov Decision Process (MDP); this is considered to be the standard approach to defining learning in sequential decision making problems faced with uncertainty. The MDP framework allows agents to learn an optimal policy incrementally by means of simulated trials [81]. All MDPs are governed by the Markov property. This property states that only the current state of the environment is required in the prediction of all future states [70]. As outlined by Barrett et al. the MDP formalism can be represented as a four-tuple $\langle S, A, p, q \rangle$ comprising states, actions, transition probabilities and rewards [82], where:

• $S$ denotes a set of possible states;

• $A$ represents a set of actions;

• $p(s_{t+1} \mid s_t, a_t)$ represents a probability distribution governing state transitions;

• $q(s_{t+1} \mid s_t, a_t)$ denotes the probability distribution governing the rewards received $R(s_t, a_t)$.

A toy instance of this formalism is sketched below.
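For illustration only, the following Python fragment expresses a toy MDP in the above form. The two states, two actions and all transition probabilities and rewards are hypothetical values chosen purely to demonstrate the structure of the tuple.

    # A toy, hand-crafted MDP in the <S, A, p, q> form described above.
    # All numbers are illustrative assumptions, not experimental values.
    S = ["underloaded", "overloaded"]
    A = ["consolidate", "no_action"]

    # p[s][a] -> probability distribution over successor states
    p = {
        "underloaded": {"consolidate": {"underloaded": 0.9, "overloaded": 0.1},
                        "no_action":   {"underloaded": 0.6, "overloaded": 0.4}},
        "overloaded":  {"consolidate": {"underloaded": 0.7, "overloaded": 0.3},
                        "no_action":   {"underloaded": 0.1, "overloaded": 0.9}},
    }

    # q[s][a] -> expected immediate reward R(s, a), e.g. a negative energy cost
    q = {
        "underloaded": {"consolidate": 1.0,  "no_action": -0.5},
        "overloaded":  {"consolidate": -1.0, "no_action": -2.0},
    }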


The learning process as illustrated in Figure 14 is often segregated into discrete time steps $t$. At the end of each time step $t$ the learning agent occupies state $s_t \in S$. The agent selects a possible action $a_t \in A(s_t)$, where $A(s_t)$ refers to the set of possible actions available in the current state $s_t$. The execution of the selected action induces an environmental state transition to $s_{t+1}$ and results in the allocation of a reward $R(s_t, a_t)$ to the agent. The state transition probability $p(s_{t+1} \mid s_t, a_t)$ calculates the probability of a transition to $s_{t+1}$ given that the agent occupies state $s_t$ and selects action $a_t$ [74], formally defined as:

$$P^{a}_{s,s'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\} \qquad (2.1)$$

$q(s_{t+1} \mid s_t, a_t)$ denotes the expected reward received by the agent after transitioning from state $s_t$ to state $s_{t+1}$ by executing action $a_t$ [74]; this is defined as:

$$R^{a}_{s,s'} = \mathbb{E}\{r_{t+1} \mid s_t = s, a_t = a, s_{t+1} = s'\} \qquad (2.2)$$

A policy $\pi$ is the solution to an MDP; it provides a mapping of states to actions and guides the agent's decision making process when selecting an appropriate action for any given state. The objective of an MDP is to generate the most optimal policy which results in the highest overall reward. An optimal policy is often denoted as $\pi^*$ [83].

Temporal Difference Learning
RL, as outlined above, is primarily concerned with converging to an optimum solution. Temporal Difference (TD) methodologies, as outlined by Sutton and Barto, are one such set of approaches which can be harnessed in order to discover an optimal policy [70]. They implement prediction based learning by incrementally updating current estimates based on the outcome of previous estimates, a process often referred to as bootstrapping [74]. The application of TD procedures in real world prediction problems has proven to be a more effective solution in comparison to traditional approaches, resulting in more accurate predictions and faster convergence rates [84]. There are several TD learning strategies in use today; the two most widely implemented when faced with an unknown model of the environment are Q-Learning and Sarsa [85].


Q-Learning
Watkins pioneered the ideology of combining optimal control theory and dynamic programming to foster Q-Learning, which today is the most well understood and widely used off-policy TD learning algorithm [80, 86]. Q-values denote the expected reward for each state-action pair and are often stored in a matrix which is a representation of the agent's knowledge to date [87]. At each discrete time step $t$ an agent updates its current Q-value estimate based on the outcome of selecting action $a_t$ while in state $s_t$, utilising Sutton and Barto's update rule [70] which is outlined below:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t)\right] \qquad (2.3)$$

The discount factor $\gamma$ determines the degree to which an agent favours long term rewards over short term gains. The value of $\gamma$ typically ranges between 0 and 1. A rate set closer to 1 results in an agent that is more forward looking and strives to maximise future rewards, while a rate closer to 0 results in an agent that assigns a greater weight to short term rewards, often referred to as a myopic agent [74]. The learning rate $\alpha$ also ranges between 0 and 1. It determines the rate at which Q-values are updated. A learning rate closer to 0 results in smaller updates to the Q-value estimates and therefore slower learning, while a rate closer to 1 induces faster learning. It is often desirable to reduce the learning rate over time as the agent approaches a more optimal strategy [88]. One of the differentiating features of off-policy algorithms such as Q-Learning is that, even in the event of the agent selecting a suboptimal action, the agent still attempts to learn the most optimal policy possible [81]. This is achieved by updating Q-values based on adding the immediate reward to the discounted estimate of the most rewarding action the agent could take in the resulting state, as exemplified in the term $r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a)$ of the update rule. The procedural form of the algorithm as outlined by Sutton and Barto [70] is illustrated below:

Algorithm 1: Q-Learning
    Initialize Q(s, a) arbitrarily
    Repeat (for each episode):
        Initialize s
        Repeat (for each step of episode):
            Choose a from s using policy derived from Q (e.g., ε-greedy)
            Take action a, observe r, s'
            Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') − Q(s, a)]
            s ← s'
        until s is terminal
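A minimal tabular implementation of Algorithm 1 is sketched below in Python for illustration. The env object, assumed to expose reset(), step() and actions() methods, is a hypothetical stand-in for whatever environment the agent interacts with and is not part of the model proposed in this thesis.

    import random
    from collections import defaultdict

    # Minimal tabular Q-Learning corresponding to Algorithm 1.
    def q_learning(env, episodes, alpha=0.1, gamma=0.9, epsilon=0.1):
        Q = defaultdict(float)                       # Q[(state, action)], initialised to 0

        def choose_action(state):
            actions = env.actions(state)
            if random.random() < epsilon:            # explore
                return random.choice(actions)
            return max(actions, key=lambda a: Q[(state, a)])   # exploit

        for _ in range(episodes):
            s = env.reset()
            done = False
            while not done:
                a = choose_action(s)
                s_next, r, done = env.step(a)
                # Off-policy update: bootstrap on the greedy action in the next state
                best_next = 0.0 if done else max(Q[(s_next, a2)] for a2 in env.actions(s_next))
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s_next
        return Q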


Sarsa
Sarsa was originally introduced by Rummery and Niranjan in 1994 [89]. The authors emphasised that the use of $\gamma \max_{a} Q(s_{t+1}, a)$ as observed in Q-Learning was not a reliable estimate of the expected future return for any given state $s_t$. Q-Learning assumes that the most optimal action, yielding the greatest reward, will be selected in the subsequent state and therefore updates Q-values based on this assumption. The authors' argument was warranted on the grounds that Q-values are likely to be inaccurate in the early stages of the learning process, while in the latter stages the maximum value may be overestimated. Sarsa is an acronym for state, action, reward, state, action. Its name is derived from the quintuple of events that must occur in order to transition from one state to the next and update the current Q-value approximations. The update rule as outlined by Sutton and Barto [70] is as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t)\right] \qquad (2.4)$$

In contrast to Q-Learning, the Sarsa learning algorithm is considered to be an on-policy algorithm which updates action value estimates strictly based on the experience gained as a result of selecting an action according to some policy. In this regard, an agent implementing the Sarsa learning algorithm in some way recognises that a suboptimal action may be invoked and therefore this approach is often likely to produce more accurate estimates. As a result, on-policy algorithms are more favourable in highly volatile environments; they tend to avoid parts of the state-action space where exploration poses greater danger [70, 90]. The procedural form of the algorithm is adopted from the work of Sutton and Barto [70] and is presented below:

Algorithm 2: Sarsa
    Initialize Q(s, a) arbitrarily
    Repeat (for each episode):
        Initialize s
        Choose a from s using policy derived from Q (e.g., ε-greedy)
        Repeat (for each step of episode):
            Take action a, observe r, s'
            Choose a' from s' using policy derived from Q (e.g., ε-greedy)
            Q(s, a) ← Q(s, a) + α[r + γ Q(s', a') − Q(s, a)]
            s ← s'; a ← a'
        until s is terminal
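Assuming the same hypothetical env interface and ε-greedy choose_action function as in the Q-Learning sketch above, a single Sarsa episode could be sketched as follows; note that the update bootstraps on the action actually chosen in the successor state rather than on the greedy one.

    # On-policy Sarsa update for one episode (illustrative sketch only).
    def sarsa_episode(env, Q, choose_action, alpha=0.1, gamma=0.9):
        s = env.reset()
        a = choose_action(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            if done:
                Q[(s, a)] += alpha * (r - Q[(s, a)])
                break
            a_next = choose_action(s_next)          # action actually taken next
            Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
            s, a = s_next, a_next
        return Q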


Action Selection Strategies
One of the fundamental challenges faced by model-free learning agents is the trade-off between exploration and exploitation [70]. The inherently complex and uncertain nature of RL based control problems requires the agent to explore all states infinitely often in order to discover a more optimal policy; this is particularly vital in highly dynamic environments in order to maintain an up-to-date policy [81]. However, exploration is often conducive to the selection of suboptimal actions, resulting in a local optimum [83]. Therefore, in order to perform well the agent should also exploit the knowledge it has already gained. Balancing such a trade-off is critical as it impacts greatly on the speed of converging to an optimal policy and also the overall quality of the learned policy [91]. Two of the most widely implemented methodologies to manage this trade-off are ε-greedy and softmax.

ε-greedy
This exploration mechanism has gained widespread recognition across the research community due to its capacity to achieve near optimal performance with the use of a single parameter when applied to a variety of problem domains [91]. This probability parameter is known as epsilon (ε) and it controls the rate of exploration in the state space. At each time step $t$ a random value is drawn and compared against ε; if the random value falls below ε a random action is selected by the agent, otherwise the greedy action is taken. Depending on the environment, ε is often reduced after each learning episode as the agent converges to the optimal policy [92].

Softmax
One of the major drawbacks of ε-greedy is that in order to explore it chooses equally among all actions and therefore it is equally probable to choose the worst appearing action as the next best action. In order to overcome this problem softmax assigns action probabilities according to the expected utility, thus ensuring higher rewarding actions are more likely to be explored. Softmax selection can be implemented in a number of ways; one such approach, as highlighted by Sutton et al. [70], leverages the Boltzmann distribution, in which case the learner selects an action according to the following probability:

$$P(a) = \frac{e^{Q_t(a)/\tau}}{\sum_{b=1}^{n} e^{Q_t(b)/\tau}} \qquad (2.5)$$

where τ is a temperature parameter which determines the degree to which actions with higher expected returns are favoured: the lower τ, the more strongly higher-valued actions are preferred, while the higher τ, the more equiprobable the actions become.
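The two selection strategies can be sketched as follows. The Q-value dictionary used in the example at the end is a hypothetical set of estimates, included only to show how the functions would be invoked.

    import math
    import random

    # Illustrative sketches of ε-greedy and Boltzmann (softmax) action selection,
    # operating on a dictionary of Q-value estimates for the current state.
    def epsilon_greedy(q_values, epsilon=0.1):
        # With probability epsilon pick a random action, otherwise the greedy one.
        if random.random() < epsilon:
            return random.choice(list(q_values))
        return max(q_values, key=q_values.get)

    def softmax(q_values, tau=1.0):
        # Higher-valued actions receive proportionally higher selection probability.
        prefs = {a: math.exp(q / tau) for a, q in q_values.items()}
        total = sum(prefs.values())
        r, cumulative = random.random(), 0.0
        for action, weight in prefs.items():
            cumulative += weight / total
            if r <= cumulative:
                return action
        return action  # fallback for floating point rounding

    # Example with hypothetical estimates
    q = {"migrate": 0.8, "do_nothing": 0.2}
    print(epsilon_greedy(q), softmax(q, tau=0.5))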


Reward Shaping
One of the more profound limitations of conventional RL algorithms is the slow rate at which the agent converges to an optimal policy [93]. In order to expedite the learning process the research community has developed more advanced techniques which incorporate domain knowledge into the reward structure, allowing for improved guidance during the learning process. In more traditional RL approaches Q-values are typically initialised at a default value of 0, a random value or, alternatively, pessimistic/optimistic values [94]. The ideology of incorporating domain knowledge allows Q-values to be initialised in a more precise and meaningful way. Reward shaping introduces an additional reward which guides the agent during the learning process, ideally without impacting on the optimal policy [95]. While much research has been devoted to reward shaping techniques, both in single and multiagent systems, as outlined in the following more significant papers [96–99], there is a general consensus that standard reward shaping can often lead to unexpected results. One of the more classical examples is portrayed in the work of Randløv and Alstrøm, whereby the learning agent, as a result of a poorly defined reward shaping strategy, cycles around in circles rather than in the direction of the goal destination [96]. A number of alternative techniques have since been developed in order to solidify reward shaping approaches. One of the most significant advancements to date has been accredited to the development of Potential Based Reward Shaping (PBRS).

Potential Based Reward Shaping
PBRS is an extension of the standard reward shaping approach fostered by Ng et al. which proved to be a more reliable and effective solution in reducing the time required by the agent to learn while yielding unchanged optimal behaviour [100]. PBRS associates a potential with each state; the shaping reward is thus formulated as the difference between the potentials of successive states [100], which can be formally expressed as:

$$F(s, a, s') = \gamma\Phi(s') - \Phi(s) \qquad (2.6)$$

where Φ is the potential function which maps states to their associated potentials and γ is the same discount factor applied in the update rule. The PBRS reward is added to the standard reward received from the environment in order to encourage faster convergence.
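A minimal sketch of how a shaped reward could be computed is given below. The potential function used here, which favours states with fewer active hosts, is a hypothetical example of domain knowledge and is not the potential function employed later in this thesis.

    # Sketch of potential based reward shaping: F(s, a, s') = γΦ(s') − Φ(s)
    # is added to the environment reward. The potential function is hypothetical.
    GAMMA = 0.9

    def potential(state):
        # Hypothetical domain knowledge: fewer active hosts -> higher potential
        return -state.active_host_count

    def shaped_reward(env_reward, s, s_next):
        F = GAMMA * potential(s_next) - potential(s)
        return env_reward + F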


The Curse of Dimensionality
One of the most common challenges faced by RL algorithms is known as the curse of dimensionality, a term originally coined by Bellman in 1957 [101]. In trivial problem domains value functions are often represented discretely in the form of a matrix or table. In more complex real world problems the state space tends to grow exponentially; in some cases the environment is exposed to a continuous state space, which renders the implementation of a matrix unsustainable. This becomes a key issue of generalisation, whereby the agent must learn by applying previously experienced state knowledge to situations that have not yet been explored [70]. One approach to curtailing this problem is to rely on function approximation, a technique which allows the value function to be approximated and thus enables the prediction of action values for unseen states [89].
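As a brief illustration, the following sketch replaces the Q table with a linear combination of state-action features and updates the weights towards a TD target. The feature representation is assumed to be supplied by some hypothetical feature extractor.

    # Minimal sketch of linear function approximation for action values.
    def q_value(weights, features):
        # Q(s, a) is approximated as a weighted sum of state-action features.
        return sum(w * f for w, f in zip(weights, features))

    def td_update(weights, features, target, prediction, alpha=0.01):
        # Gradient-descent style update of the weights towards the TD target.
        error = target - prediction
        return [w + alpha * error * f for w, f in zip(weights, features)]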

2.9 Related Work

In recent times the pervasiveness of on-demand computing has invoked major concern for the efficiency of resource allocation policies across the cloud computing paradigm. Initially, high performance computing research initiatives were the greatest focus in data center deployments. However, there has been an ever-growing incentive for the development of more energy efficient resource management policies in order to curtail the colossal energy consumption rates endemic in existing data centers. The objective is to drive a more sustainable cloud platform with the capacity to advance technology for future generations. Below is an examination of the most relevant research conducted to date with regard to energy efficiency and dynamic resource allocation.

2.9.1 Threshold and Non-AI Based Approaches

Throughout the body of literature threshold/non AI based approaches have prevailed as one of the most widely used approaches for efficient resource management in cloud computing. These type of approaches often require domain or application knowledge in their operation. One such approach is that of Lee et al. who proposed the implementation of two task consolidation heuristics known as ECTC and MaxUtil in order to curtail energy consumption of underutilised resources [5]. The overall procedure requires matching incoming requests with available resources utilising either of the devised consolidation algorithms. The objective is to maximise utilisation while minimising the overall energy consumption without violating time or performance constraints. Both heuristics consider active and idle energy consumption the most significant difference between them is that MaxUtil bases its consolidation decision on average resource utilisation while ECTC calculates the actual energy consumption incurred from the overlapping time of new tasks running in parallel with current tasks on a given resource. The authors experimental evaluation revealed that the performance of each algorithm is entirely dependent on the current state of the environment. Overall both procedures have the capacity to promote improved energy savings and resource utilisation. In particular, the ECTC algorithm achieved an 18% reduction in energy consumption while MaxUtil improved energy efficiency by a lower rate of 13%. However their study did reveal several shortcomings. Their cloud model does not appear to facilitate dynamic reallocation of running tasks. Furthermore, one of the most prominent limitations is that their proposed algorithms base allocation decisions on the current state of the system which fails to consider long term optimisation resulting in a local optimum. Srikantaiah et al. also proposed a solution for dynamic consolidation of workloads in order to achieve optimal management of resources [61]. Similar to the work of Lee et al. [5] the authors focused on the interconnection between energy consumption, resource utilisation and performance levels, their study examined how such factors impact on one another when various workloads are consolidated on a single host. Their initial observations revealed that it is possible to determine an optimal utilisation point. The authors modelled the consolidation problem as a modified


The authors modelled the consolidation problem as a modified multidimensional bin packing problem in order to determine the optimal allocation of workloads. They proposed a heuristic which seeks to maximise resource utilisation relative to the optimal point. If a resource is utilised to its limit, a new host is booted up and all requests are re-allocated using the proposed heuristic. However, there were also some limitations in their work. The authors assume that all applications can be hosted by any server. Also, the proposed solution fails to recognise that migrating workloads incurs significant overheads, which is one of the central arguments in the work of Beloglazov et al. and was also highlighted by Clark et al. [60, 62].

Similar ideas have also been proposed by Beloglazov et al. in one of their more highly cited papers [102]. Their study examined energy aware provisioning of resources for incoming requests and also the optimisation of current workload allocations. Similar to the work of Srikantaiah et al. [61], they model the provisioning of requests as a bin packing problem. Unlike the previous studies, the authors implement a Power Aware Best-Fit Decreasing (PABFD) heuristic for the efficient allocation of VM instances to existing hosts. This heuristic sorts all VM instances in decreasing order based on their current CPU utilisation rates and then allocates each VM to the host which will cause the least increase in power consumption. The authors further introduce three double threshold VM selection policies in order to determine which VM instances to offload. These comprise the Minimisation of Migrations (MM) policy, which selects the minimum number of VMs required to reduce the load when it rises above the upper threshold; the Highest Potential Growth (HPG) policy, which migrates VMs that underutilise their allocated resources; and lastly a random policy, which randomly selects a number of VMs to migrate. The proposed algorithms were analysed using the CloudSim toolkit and were evaluated based on Service Level Agreement Violations (SLAVs), power consumption and the total number of migrations. Their results showed that the MM policy outperformed all other policies and rendered the most significant energy savings with the fewest SLAVs and migrations compared to more static resource allocation techniques. The nature of the proposed policies required regression analysis to model the relationship between energy consumption and the values of the thresholds in order to determine the most appropriate values relative to QOS. Additionally, all policies fail to consider that in real world systems it is common for discrete workloads to have various levels of priority; this lack of granularity could be seen as a limitation.

Extending their above research, Beloglazov et al. specifically investigated the impact of online deterministic algorithms coupled with adaptive heuristics for the dynamic consolidation of VMs [62]. The motivation for such an approach was driven by the practicality of more dynamic utilisation thresholds for highly stochastic and unpredictable workloads. Their proposed energy model places major emphasis on the delivery of SLA while considerably reducing energy consumption. They introduced a set of host overload detection algorithms, one of the most significant being a Local Regression (LR) algorithm originally fostered by Cleveland [103]. This approach allows for the analysis of a local subset of data from which a curve can be derived to approximate the original data.
The proposed algorithm infers the probability of a host becoming over-utilised by multiplying the predicted CPU utilisation by a safety parameter and comparing it against the maximum possible utilisation.


In addition, the authors proposed several novel VM selection policies such as Minimum Migration Time (MMT), which selects VMs based on the least amount of time required to migrate. Also suggested was a Maximum Correlation (MC) policy, which analyses the CPU correlation of all VM instances and offloads the ones which have the highest correlation. Their experimental results concluded that the combination of LR and MMT in conjunction with the PABFD algorithm significantly outperforms all other VM consolidation procedures, resulting in a profound reduction in energy consumption and SLAVs.

Cardosa et al. investigated the impact on VM resource allocation and power consumption of leveraging min, max and shares parameters analogous to those utilised in commercial virtualisation technologies [104]. Unlike the previous research initiatives outlined above, this work establishes the need for workload priorities. The min and max parameters define the lower and upper bounds for resource utilisation for each VM, while the shares parameter is a weighted value which denotes the priority of each VM during the distribution of spare resources. The authors' hypothesis is that distributing limited resources to higher utility VMs will improve the overall utility of the data center while also managing the power-performance tradeoff. The authors developed a PowerExpandMinMax algorithm and evaluated it by harnessing both a small scale synthetic data center setup and a large real data center testbed. One of their experiments revealed a 47% improvement in the overall utility of the data center. However, their study focused merely on CPU resource utilisation. Furthermore, the input parameters for the min, max and shares values were defined statically and therefore could not be dynamically adjusted at run time. This questions the suitability of such an approach if deployed in a real world stochastic cloud environment. Notably, the authors also did not comment on the impact of such an approach on SLA, which could potentially impact the credibility of their findings.

Kusic et al. proposed the implementation of a predictive lookahead control algorithm in order to promote reduced energy consumption and SLAVs while maximising overall system utility [105]. The controller is configured to account for the migration costs of relocating a VM from one host to another while also incorporating the risk of excessive migration into its decision making process. More specifically, their approach leverages the Kalman filter estimator in order to approximate future requests for improved resource provisioning. Their methodology was evaluated on a small scale cluster of servers. Their findings revealed that a cluster of servers managed by the proposed controller conserves 26% of the energy required by an uncontrolled environment while still conforming to QOS agreements. Additionally, the experiment showed that a risk-aware controller is an empirically effective method, reducing SLAVs by 35%. However, one of the most prominent limitations of their study is the susceptibility of the proposed model to an exponential increase in worst-case complexity when faced with a rise in the number of control options and more timely predictions. Also equally apparent was that their experimental setup was based on a small scale testbed.


Verma et al. presented an application placement controller known as pMapper which aims to dynamically minimise energy consumption and migration costs while maintaining SLA [106]. The underlying architecture leverages the power management capabilities of commercial virtualisation technologies, while the upper layer of the architecture is composed of three distinct management entities: a performance manager, which monitors current performance and resizes VM instances based on the SLA; a power manager, which manages the power state of the underlying hardware; and a migration manager, which estimates the cost of live migration. The output estimates of all three management entities are fed into an arbitrator which has a global view of the system and utilises these estimates to decide the most optimal allocation of VM instances. The authors further introduced three dynamic placement algorithms, which include Minimum Power Packing (MPP), an algorithm that consolidates VMs in a manner which reduces overall power consumption. MPPH is an additional algorithm which factors in a parameter responsible for tracking the previous location of a VM instance in order to minimise the number of migrations. They also introduce pMap, which seeks to discover an allocation strategy that minimises total power and migration costs. Their overall results showed a 25% saving in power consumption over a small cluster of servers. Similar to the work outlined in the following papers [5, 61, 102, 105], the proposed energy model did not consider workload priorities. Furthermore, their study focused on CPU utilisation, while their empirical evaluation did not reveal the overall impact on SLAVs; similar limitations were also highlighted in the work of Cardosa et al. [104].

Stein et al. focused on dynamic service composition in order to address the problem of complex computational workflows heavily relying on non-deterministic and highly volatile third party services [107]. They explored the optimisation of service selection in order to meet the resource requirements of tasks in a given workflow. More specifically, their approach investigated the implications of choosing particular service compositions on the overall cost and expected reward. Unlike the work proposed by Wajid et al. [108], during service selection the consumer makes predictions about workflow performance and selects a service composition to meet its needs based on QOS criteria such as cost, failure probability and expected duration. The intention is to select services so as to maximise the expected utility. In order to counteract the occurrence of failure, their approach introduces redundancy by selecting multiple services per task, thus ensuring tasks are eventually completed. Their experimental results showed an average 46.8% improvement in utility compared to other selection approaches. This work is, however, primarily focused on the management of resources for improved performance as opposed to considering the implications of such an approach on energy and performance in equal measure.

2.9.2 AI Based Approaches

Alternatively, a number of research undertakings have been devoted to the application of AI methodologies for autonomic resource allocation in cloud computing. This section will firstly introduce some of the major works in autonomous resource provisioning, while the second half of this section will specifically explore the range of RL based approaches.

Wei et al. explored the application of game theory to optimise resource allocation across a range of inter-dependent tasks with associated time and cost constraints [109]. Their approach is implemented in two phases. The first phase requires each agent to solve the optimisation problem locally, independently of all other agents. In the second phase, an evolutionary global optimisation algorithm is leveraged in order to approximate a final optimal solution while adhering to QOS constraints, resulting in overall improved efficiency.

Berral et al. explored the problem of dynamic consolidation of VMs for improved energy efficiency and delivery of SLA [110]. The authors proposed a framework composed of energy aware consolidation algorithms and Machine Learning techniques. They developed a Supervised Machine Learning method which implements a prediction model to estimate the impact of workloads on available resources in terms of performance and energy consumption. Their approach reallocates workloads as required and dynamically powers down idle hosts. Unlike the work of Srikantaiah et al. [61] and Beloglazov et al. [102], this approach considers the costs in terms of the energy and time required for powering servers up and down. Their experimental results concluded that their methodology outperformed the other scheduling policies utilised in their analysis. However, there were also some limitations in their work. They assumed a negligible migration cost while also only considering response time as an SLA guarantee. It could also be argued that a Supervised Learning approach is less adaptive, as it does not leverage the immediate outcome of resource allocations and does not take into consideration the impact of immediate decisions on future system performance. Interestingly, the authors argue that power consumption cannot be determined by linear regression alone as it has a non-linear relationship with CPU; however, this seems to be a direct contradiction of the work of Verma et al. and Cardosa et al., which describes a linear relationship between power consumption and CPU [104, 106].

Pandey et al. proposed a Particle Swarm Optimisation (PSO) technique in order to efficiently map workflow tasks to cloud resources [111]. Similar to Wei et al. [109], their approach considers both data transfer costs and execution time. However, a distinct feature of the authors' work was the introduction of a PSO algorithm whereby the particles represent the tasks in a given workflow which are mapped to cloud resources. Each particle is evaluated using a fitness function in order to discover the optimal mapping of tasks to resources. Additionally, the authors designed a dynamic heuristic which seeks to optimise the cost associated with each mapping derived from the PSO technique. Their experimental results revealed that such an approach could produce three times more cost savings compared to a generic selection policy.


However, the application of such a technique in a real cloud platform is questionable given its failure to consider the highly volatile nature of user requirements, as their approach assumes to know the size of the input and output data of all tasks.

Wajid et al. studied energy efficient dynamic service composition in application workflow executions within a federated cloud environment [108]. Their approach introduced a process template for each application which denoted the set of tasks required for the successful execution of that application, including both functional (service description and quality) and non-functional (energy consumption and response time) task requirements and constraints. They implemented a Genetic Algorithm (GA) to discover the most optimal set of composed services to adequately satisfy task requirements. Their experimental results illustrated that by dynamically composing service offerings from a multitude of service providers they could reduce energy consumption considerably. Furthermore, the authors also suggested the use of a utility based run time adaptation model which allows a task to periodically check the utility of the current service selection and adapt as necessary. This was suggested in order to take account of the highly stochastic and dynamic nature of resource availability, thus promoting a more robust, energy efficient approach. The most evident limitations of this study were that it did not consider the total execution time of the application, and that it ignored the associated cost of migrating workloads across geo-diverse locations which, according to Verma et al., is significant and should be considered in the dynamic allocation of workloads [106].

Bao et al. also explored service composition in the cloud while primarily focusing on the delivery of QOS [112]. Their work was heavily influenced by the lack of research into service interdependency in the cloud; most studies focus merely on task dependencies in service composition. Their approach models available web-services as a Finite State Machine (FSM) in order to accurately represent service constraints and the required sequence in which a service must be invoked. They introduce a tree pruning algorithm which derives an appropriate composition of web-services based on QOS requirements. The selection of services is implemented using a weighting scheme in order to invoke the most optimal composition possible.


Reinforcement Learning Based Approaches

Tesauro et al. proposed a novel hybrid approach for autonomous resource allocation consisting of both RL and model-based queuing methods [113]. Their work was largely motivated by their original research endeavours as outlined in the following papers [114–116], from which they discovered several RL complexities, namely scalability issues in large state spaces and the suboptimal behaviour which naturally occurs in the early stages of learning. To overcome such problems they introduce a non-linear function approximator which is trained offline using collected systems data, while allowing a queuing policy to manage the system until training is complete. Their results proved that this is an empirically effective approach to avoid suboptimal performance in the initial stages of learning while also achieving greater batch performance with minimal degradation. The authors extended this research further by presenting an RL approach to discover optimal control policies for the online management of power consumption and performance in application servers [117]. Their proposed solution consists of a performance manager and an RL based power manager which strive to optimise the power-performance tradeoff by means of cooperation and information sharing over workload variations. The resulting control policies are then harnessed by a CPU frequency controller. Their results showed more than 10% power savings while delivering the required performance. However, their simulation did not represent realistic workload variability and it was not tested in a large scale environment, which questions its suitability in real world applications.

More recently, Barrett et al. explored the optimisation of resource provisioning to support application scalability in the cloud [74]. They applied a parallel RL methodology in order to expedite the rate of convergence to an optimal policy in a highly stochastic, non-deterministic environment. Their approach comprised a systematic coordination of information sharing among multiple agents, which effectively proved to reduce the traversal of the state space for each agent. Each agent in the environment seeks to approximate an optimal policy based on its own experience, which is communicated to all other agents. The combined experience of all agents is leveraged by an agent manager who is responsible for making a global allocation decision which improves the overall system performance.

Another study devoted to dynamic resource provisioning which leverages RL based approaches is that of Rao et al. [118]. They proposed the introduction of an RL based VM auto configuration agent known as VCONF in order to respond effectively to variations in demand in a cloud environment. VCONF is designed to operate in the control domain of virtualised software; it leverages model based RL techniques to speed up the rate of convergence in a non-deterministic environment. More specifically, it harnesses environmental models to create simulated experience for action value estimations. Once the model is trained it can be leveraged in order to predict the expected rewards of unseen state-action pairs. Their experimental analysis showed that the proposed solution outperforms basic RL approaches by achieving greater throughput and reduced response time. However, their research utilised standardised samples in their training model and did not account for specific VM behaviours.


Das et al. introduce a multiagent based approach to manage the power-performance tradeoff by specifically focusing on powering down underutilised servers [119]. Their solution builds on previous work in which they formalise an architecture for the management of autonomous agents [120]. The modified architecture comprises several application manager agents which are responsible for managing workloads; they communicate the expected utility of receiving the required resources to a resource governing agent. The utility function comprises both power and performance estimations. The resource governing agent is responsible for the global optimisation of resource allocations; it seeks to maximise the total expected utility across all competing applications while also taking account of the cost of powering down underutilised hosts. However, one of the major limitations of their proposed solution pertains to the assumption that applications are allocated the resources of an entire server, and as a result the focus of this study is merely on minimising the number of active servers while delivering the required level of performance.

Flesk et al. proposed Lr-RL, a dynamic VM selection policy which leveraged RL methodologies to optimise the selection of VM instances from over-utilised servers [121]. Their approach was largely inspired by the work of Beloglazov et al., who introduced Lr-Mmt, a VM consolidation algorithm which has since become one of the leading research advancements in energy aware VM resource allocation [62]. Their proposed algorithm harnessed an autonomous agent in order to decide the most optimal selection of a VM instance for migration based on the current utilisation of a given host. In order to demonstrate the credibility of their proposed algorithm they analysed and compared it against the Lr-Mmt VM selection policy. Their experimental analysis revealed that their algorithm outperformed Lr-Mmt. More specifically, their results rendered significant improvements in energy consumption, the total number of migrations and SLAVs. The authors further expanded their work by introducing PBRS, which resulted in an overall improvement upon their initial Lr-RL algorithm. Similar to their previous research, this solution required the agent to select the most optimal VM for migration based on the current state of the over-utilised host; however, this selection process required the agent to specifically choose a VM instance with a size capable of reducing the number of migrations on a given host. Their overall results concluded that this approach was extremely effective, resulting in a 15% decrease in energy consumption and a 36% decrease in the total number of VM migrations while also significantly reducing the number of SLAVs.

2.9.3 Complex Systems Research Group

The research proposed in this thesis has been conducted within the Complex Systems Research Group in NUIG, led by Dr. Enda Howley and Dr. Jim Duggan. The research conducted within this group largely examines a variety of AI problems across a diverse range of domains. Some of the more recent research initiatives within the group have been devoted to the area of cloud computing. Among the most influential has been the work of Barrett et al., whose earlier research presented an application workflow scheduling architecture [122]. Their approach deployed multiple solver agents to evolve workflow schedules utilising a GA. The outcome produces various cost and makespan scheduling solutions which are leveraged by an MDP agent in order to discover the most optimal schedule based on the current state of the environment. Their results showed that such an approach was effective, as the MDP agent was able to optimally select the best set of evolved workflow schedules in spite of a highly dynamic environment. Their work was further extended, as described earlier, by deploying a multiagent RL methodology to optimise resource allocation and application scalability. The outcome of their solution produced significant improvements in the rate of convergence to an optimal policy for overall improved system performance [74]. Further research in the cloud domain was conducted by Flesk et al. [121]. As outlined earlier, their work primarily focused on energy efficiency by developing a dynamic VM selection algorithm which employs RL methodologies. Other research in the group spans a number of areas, such as the work in Smart Cities conducted by Mannion et al., who continues to advance the state-of-the-art with significant contributions to traffic signal control [75, 83, 87, 94]. Extensive research over the years has also been devoted to evolutionary computation and cooperation [123–125] and to general optimisation [126, 127].

2.10 Research Gap

Drawing on the entire body of knowledge presented throughout this section and also across the entire chapter, there is strong evidence to suggest that, although a significant amount of research has been devoted to this field of study, there still remains a vast amount of research to be carried out in order to further advance the state-of-the-art. In particular, there is a clear need for the development of more adaptive and flexible cloud based infrastructures with the capacity to tackle the challenging problem of energy efficiency in data center deployments. Below is a summary of the most prominent limitations of the state-of-the-art research; it is intended that the research proposed in this thesis will address a select number of these limitations while also addressing a gap in the current literature.

• Failure to consider that the migration of workloads between hosts incurs significant overhead.

• Lack of consideration for workload priorities, which could further reduce the impact on SLAVs and energy consumption.

• Some of the work presented in this chapter was conducted using small scale testbeds rather than more realistic large scale cloud environments.

• The majority of research presented in this chapter focused merely on CPU resource management.

• Some studies did not consider realistic workload variability in their experimental analysis.

It is clearly evident that AI approaches are one way of introducing more robust and flexible solutions, as presented in the following significant papers [74, 108, 113, 117, 121, 122]. More noticeably, RL methodologies are theoretically well understood and in particular they facilitate highly dynamic environments due to their inherent ability to reason under uncertainty while also considering allocation decisions that may suffer from delayed consequences. For these reasons this thesis intends to advance the state-of-the-art in this domain by harnessing RL methodologies, a technique that overall has not been fully leveraged in the cloud computing paradigm. Furthermore, it is evident that there has been a very limited amount of research focused on the application of RL methodologies in the area of VM placement. The vast majority of RL techniques applied to optimise cloud resources focus on performance, or on achieving improved energy efficiency on a much more refined basis, as demonstrated in the following publications [74, 117, 118]. The work that has been published to date within the niche domain of VM placement is primarily devoted to the application of heuristic based approaches, as outlined in the following papers [5, 61, 102, 105, 106]. While the use of heuristics provides a reasonable solution to this problem, they fail to address the need for more adaptive solutions which are better aligned to the dynamic nature of the cloud environment. This presents a major opportunity for the development of a more robust and novel solution to the complex problem of energy efficient resource provisioning.


Overall, this thesis aims to investigate the effects of introducing an RL methodology for VM placement in a data center deployment. More specifically, it intends to analyse and measure the performance of both Q-Learning and Sarsa learning algorithms combined with various action selection policies, which to the best of our knowledge has not yet been investigated in the domain of VM placement. Furthermore, it also aims to identify whether more advanced RL techniques such as PBRS can further improve energy efficiency relative to the standard RL approach. Lastly, in order to analyse the efficacy of an RL VM placement approach, this work will also investigate whether the proposed approach is able to outperform the state-of-the-art placement policy.

2.11 Summary

This chapter reviewed the most current literature and theoretical components across the fields of both cloud computing and AI that are required for the purposes of this thesis. More specifically, it firstly introduced the background and fundamentals of cloud computing. Following on from this, it focused on the domain of AI and in particular a subset of AI known as Reinforcement Learning. This chapter also highlighted the most significant and relevant research conducted to date concerning energy efficiency and dynamic resource allocation, beginning with the more classical threshold based approaches outlined in Section 2.9.1, which are predominant throughout the body of literature. The key advancements presented in this section are attributed to the work of Beloglazov et al. and Kusic et al. [62, 102, 105]. Following on from this, Section 2.9.2 introduced a range of AI based approaches; more specifically, it explored techniques such as Game Theory, Supervised Machine Learning, PSO, FSMs and GAs. It then introduced the body of research leveraging RL methodologies which seek to drive resource optimisation by means of autonomous resource allocation. The major works in this section were conducted by authors such as Tesauro et al. [113, 117] and Barrett et al. [74]. Lastly, this chapter discussed the most significant research presented by the Complex Systems Research Group, to which this thesis is strongly affiliated. It also identified a gap in the current literature and discussed the rationale behind an RL approach.

Chapter 3

CloudSim

3.1 Overview

The following chapter seeks to firstly introduce the CloudSim toolkit as the chosen methodology for the work outlined in this thesis. It will also discuss the rationale behind the utilisation of such a tool and highlight the core classes and components of the CloudSim toolkit. Furthermore, since CloudSim is an event driven simulation tool, this chapter will provide a high level overview of the simulation model, which is fundamental to the communication process between the core elements of CloudSim and is required in order to model all possible cloud scenarios within this environment. Lastly, this chapter will discuss the components required to set up the energy aware simulation used for the purposes of this work.

3.2 Simulation Tool

CloudSim is a toolkit developed by the CLOUDS laboratory at The University of Melbourne, Australia [31]. It is composed of a Java library which provides a framework for modeling and simulating a range of cloud computing infrastructures and scenarios. In order to evaluate the scientific contribution and credibility of the work in this thesis, it is essential to conduct analysis on a large scale virtualised cloud infrastructure. The utilisation of a real cloud infrastructure in such a scenario is an unsuitable methodology, as it proves extremely difficult to conduct large scale experimental analysis in a repeatable and controlled environment. In order to accurately evaluate the performance of the work in this thesis, the CloudSim toolkit has been selected as the simulation framework. CloudSim, among other functions, supports the management of cloud resources across large scale cloud platforms and contains the necessary components to enable the empirical evaluation of energy aware cloud based simulations. Additionally, CloudSim is leveraged by a large number of researchers across both industry and academia while also playing a pivotal role in some of the latest cutting edge research initiatives to date, as highlighted in the following publications [3, 62, 102, 128, 129].

3.2.1 Key Components

The CloudSim toolkit is composed of a variety of Java classes and interfaces which can be used for describing and modeling data centers, VMs, applications, users, computing resources, scheduling and provisioning policies for the complete management of the entire cloud environment. All components can be extended, modified or replaced to facilitate the simulation of specific cloud scenarios. Below is an outline of some of the fundamental classes and interfaces of the CloudSim toolkit which are standard across all simulations. Figure 15 below illustrates a high level overview of the CloudSim class diagram.

Figure 15: CloudSim class diagram [31]

CloudInformationServices: This class models an entity that acts as a central repository for the registration of all cloud resources. At the start of the simulation process all data centers within the environment register their resources with the CIS entity.

Broker: This class allows for the modeling of a broker entity which is responsible for retrieving a list of all of the available resources from the CIS entity and requesting the allocation of those resources to meet an application's QOS requirements.

Datacenter: An instance of the Datacenter class models the available hardware which is offered by the cloud provider. It encapsulates all of the hosts within the data center and is responsible for the implementation of a set of policies which govern the allocation of memory, bandwidth and storage devices to all hosts and their respective VMs.


Host: This class is used to model a server within a data center. All hosts in the data center can be either heterogeneous or homogeneous. Each instance stores information such as memory, storage, the number and type of processing cores, the allocation policy for delegating processing power among the VMs executing on the host and also the policies which govern the provisioning of memory and bandwidth to those VMs.

Vm: This class is used to model VM entities which are subsequently managed by the host they are currently executing on. All VMs are instantiated with a certain configuration such as memory, size and the number of processing elements. Each VM is allocated a certain portion of the host's resources according to the policies implemented by the host, such as the VmScheduler, BwProvisioner and RamProvisioner policies.

Cloudlet: This class is used to model the workload of application services in the cloud. A cloudlet object fundamentally represents the complexity of an application and its computational requirements. In order to accurately model the stochastic nature of the cloud environment, this class allows a cloudlet instance to read in sample CPU utilisation values collected from PlanetLab workload files gathered from real servers, as opposed to manually setting cloudlet configurations.

DataCenterCharacteristics: This class encapsulates the properties and configuration of each data center instance such as the OS, system architecture, list of hosts, type of VMM and the allocation policy being implemented by the data center to assign VMs to hosts.

RamProvisioner: This is an abstract class which defines how Random Access Memory (RAM) is allocated to the VM instances executing on a host. By default each host implements the RamProvisionerSimple policy, which allocates RAM to VMs if it is available on the host. This class can also be extended to create new policies. In order to assign a VM instance to a given host it must be approved by the RamProvisioner component to ensure that the host has enough free memory.

BwProvisioner: Similar to the RamProvisioner class, this class is also abstract and signifies how bandwidth is provisioned to the VMs residing on a host. By default the BwProvisionerSimple policy is implemented, which allows VMs to be allocated as much bandwidth as is requested, providing that the host has enough to allocate. To create a new policy this class must be extended and an implementation for its abstract methods must be provided.

VmAllocationPolicy: This is an abstract class which is used to determine the policy for the allocation of VMs to hosts. The core responsibility of this class is to select a suitable host in the data center which has the appropriate resources required for the execution of the VM. By default a policy called VmAllocationPolicySimple is implemented, which allocates each VM instance to the host with the fewest VMs executing on it.


CloudSim: This is the root class of all classes in the CloudSim toolkit. This class is essential for the management of the simulation process and the event calls which are invoked during the simulation. This will be discussed in more detail in the subsequent section.

SimEntity: This abstract class also plays a pivotal role in the simulation process. All entities in CloudSim must extend this class in order to be able to send events to other entities and also invoke event handlers to process incoming events during the simulation process. The three principal methods which are central to this class are startEntity(), processEvent() and shutdownEntity().

SimEvent: This class models an event which is invoked by an entity and passed between two or more entities during the simulation process. This class encapsulates the details of an event, such as the time the event should start and be delivered to the destination entity, the data that must be passed with the event and also the event tag. The tag is critical in order to identify the type of processing required by the event that has been sent and therefore directly impacts the resulting behaviour of an entity.

3.2.2 Simulation Model

The CloudSim simulation engine is largely controlled by the org.cloudbus.cloudsim.core package which contains the critical classes used for modeling the flow of communication between entities during the simulation process. Each time an entity is created it automatically calls its startEntity() method which simulates its behaviour. Entities interact with each other by sending events. In order for any entity to process an event it invokes the method processEvent() which is responsible for identifying the type of event tag and subsequently invoking the correct handler for the event. The simulation operates in two phases. Phase one involves the generation of events which are added to a future queue. Phase two involves executing the runClockTick() method in the CloudSim class in order to move the events in the future queue to the deferred queue for processing. Both queues allow for the simulation of a dynamic environment in which new events are generated during run time.

Figure 16: Simulation interaction diagram [31]
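To make the entity life-cycle described above more concrete, the following is a minimal sketch of a custom entity, assuming the CloudSim 3.x SimEntity API; the entity name, event tag value and monitoring interval are illustrative assumptions only and are not part of the work described in this thesis.

    import org.cloudbus.cloudsim.core.SimEntity;
    import org.cloudbus.cloudsim.core.SimEvent;

    // Minimal sketch of a custom entity that periodically sends itself an event.
    public class MonitoringEntity extends SimEntity {

        private static final int MONITOR = 900001; // hypothetical custom event tag
        private static final double INTERVAL = 300.0; // illustrative interval in simulation seconds

        public MonitoringEntity(String name) {
            super(name);
        }

        @Override
        public void startEntity() {
            // Called automatically when the simulation starts: schedule the first event.
            schedule(getId(), INTERVAL, MONITOR);
        }

        @Override
        public void processEvent(SimEvent ev) {
            // Dispatch on the event tag to decide how the incoming event should be handled.
            switch (ev.getTag()) {
                case MONITOR:
                    // ... inspect data center state here ...
                    schedule(getId(), INTERVAL, MONITOR); // re-schedule the next tick
                    break;
                default:
                    break;
            }
        }

        @Override
        public void shutdownEntity() {
            // Clean-up when the simulation terminates.
        }
    }

Events scheduled in this way are placed on the future queue and later moved to the deferred queue by runClockTick(), as described above.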

3.2.3 Simulation Setup

The initialisation of an energy aware simulation in CloudSim can be achieved by accessing a number of packages in the toolkit. In order to set up the simulation environment for this research, the packages, classes, hardware configurations and workload traces discussed below were used.

Creating Cloud Entities

The org.cloudbus.cloudsim.examples.power.planetlab package can be accessed in order to create entities for the simulation. A new class must be created; this class will contain the main method which will be the starting point for the simulation. This class will also be used to set up the workload for the simulation, define the VM allocation and selection policies and also initialise an output folder which can be used to print out the results of the simulation. This information is passed as parameters to a new PlanetLabRunner instance. The PlanetLabRunner class is primarily responsible for initialising cloud entities such as a broker, data center, hosts, VMs and the cloud services central repository for the registration of available resources.

Initialising a Placement Policy

The initialisation of a new placement policy can be achieved by accessing the org.cloudbus.cloudsim.power package. This package contains all of the classes for VM allocation and selection policies. It also contains the abstract class PowerVmAllocationPolicyMigrationAbstract; this class defines the method findHostForVm() which is responsible for selecting a host for the execution of a VM instance. To initialise a new placement policy, a new class must be created which extends this abstract class and provides a new implementation for the method findHostForVm() (a sketch of this extension process is provided at the end of this section). In order to optimise the distribution of VMs across the data center during the simulation process, the method optimizeAllocation() found in the class PowerVmAllocationPolicyMigrationAbstract is the central point in CloudSim where host overload detection, VM selection and placement are controlled. It is from within this method that a call to get a new placement for a VM instance occurs, and subsequently the new implementation for the method findHostForVm() is invoked. To register a new allocation policy, the method getVmAllocationPolicy() must be modified to provide a check for the name of the new allocation policy and instantiate it for the simulation. This method can be found in the RunnerAbstract class located in the package org.cloudbus.cloudsim.examples.power.

Hardware & Workload Configurations

In order to simulate a large scale cloud environment, 800 servers were configured in the data center. These servers are HP ProLiant ML110 G5 servers consisting of two cores with the capacity to process 2660 Millions of Instructions Per Second (MIPS). Each server is also configured with 4GB of memory and 1GB for storage and bandwidth. CloudSim allows for the manual setting of cloudlet configurations, which results in a static workload.


In order to conduct more realistic experiments which accurately model the inherent complexity and dynamic nature of the cloud environment, this research utilised real workload traces obtained from the CoMon project, a monitoring infrastructure for PlanetLab [130]. These traces were collected over a period of two months from real servers located in over 500 places in different parts of the world. During the simulation process each VM is dynamically allocated a workload trace; at discrete time steps the current CPU rate changes as a new value is read in from the workload trace. Harnessing this workload makes the overall findings of this work more reliable and accurate.
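Below is a minimal sketch of the placement policy extension process described under "Initialising a Placement Policy" above, assuming the CloudSim 3.x power package API; the class name and the simple packing criterion are illustrative assumptions and do not represent the RL policy developed in this thesis.

    import java.util.List;
    import java.util.Set;

    import org.cloudbus.cloudsim.Host;
    import org.cloudbus.cloudsim.Vm;
    import org.cloudbus.cloudsim.power.PowerHost;
    import org.cloudbus.cloudsim.power.PowerVmAllocationPolicyMigrationAbstract;
    import org.cloudbus.cloudsim.power.PowerVmSelectionPolicy;

    // Sketch of a custom placement policy that overrides findHostForVm().
    public class PowerVmAllocationPolicyExample extends PowerVmAllocationPolicyMigrationAbstract {

        public PowerVmAllocationPolicyExample(List<? extends Host> hostList,
                PowerVmSelectionPolicy vmSelectionPolicy) {
            super(hostList, vmSelectionPolicy);
        }

        @Override
        public PowerHost findHostForVm(Vm vm, Set<? extends Host> excludedHosts) {
            PowerHost selected = null;
            double bestScore = Double.MAX_VALUE;
            for (PowerHost host : this.<PowerHost> getHostList()) {
                if (excludedHosts.contains(host) || !host.isSuitableForVm(vm)) {
                    continue;
                }
                // Example criterion: prefer the host with the least spare capacity
                // after the allocation (i.e. pack VMs on to busier hosts).
                double spare = host.getAvailableMips() - vm.getCurrentRequestedTotalMips();
                if (spare >= 0 && spare < bestScore) {
                    bestScore = spare;
                    selected = host;
                }
            }
            return selected;
        }

        @Override
        protected boolean isHostOverUtilized(PowerHost host) {
            // A static 90% CPU threshold, purely for illustration.
            double totalRequested = 0;
            for (Vm vm : host.getVmList()) {
                totalRequested += vm.getCurrentRequestedTotalMips();
            }
            return totalRequested / host.getTotalMips() > 0.9;
        }
    }

The new class would then be registered by adding a check for its name in getVmAllocationPolicy() in the RunnerAbstract class, as described above.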

3.3 Summary

This chapter firstly introduced the CloudSim toolkit. It discussed the most important classes and components of the toolkit that are required in order to simulate the cloud environment. An overview of the simulation model was also provided which highlighted the key interactions among cloud entities which occur during the simulation process. Lastly, this chapter outlined the process of setting up cloud entities, creating a new placement policy in CloudSim, the setup and configuration of the servers and also the application workloads used in this research.

Chapter 4

Reinforcement Learning Framework

4.1 Overview

This chapter will describe the design and development of an RL framework as an extension of the CloudSim toolkit, which is required for the purposes of this work. It will firstly discuss in more detail how CloudSim operates in relation to resource allocation and the policies that will be used alongside the proposed RL model. Leading on from this, it will discuss the theory underlying the state-of-the-art VM consolidation algorithm which has motivated this research, and it will address how the proposed approach seeks to deal with the shortcomings arising from that approach. In order to implement an RL VM placement model a number of additional classes and methods were created. These include classes to simulate RL behaviour and additionally to measure and record its performance. This chapter will detail the development of these classes and methods while also providing a diagrammatic representation of the relationship between CloudSim and the proposed RL model. Lastly, this chapter will provide a more in depth insight into the inner workings of the RL framework. In particular, it will describe the state-action space that has been implemented in this work. Additionally, it will also detail the development of both the Q-Learning and Sarsa learning algorithms which have been specifically tailored for implementation within CloudSim.

4.2 Selected Policies

The process of developing an RL framework that will aid in the achievement of the pursued energy optimisation goals firstly requires an understanding of the key point of exchange between CloudSim and the proposed RL model. CloudSim, as mentioned previously, operates in a three stage process in terms of resource allocation at the IaaS level. These stages comprise host overload detection, VM selection and VM placement.


All three stages occur in a cyclical manner: firstly detecting hosts that are likely to become overloaded, then selecting VMs to be offloaded and finally placing the VMs that require migration on to more suitable hosts in a way that drives energy efficiency. All three stages are controlled by policies. In a typical cloud setup, underutilised servers often experience a drastic surge in the amount of resources required by each VM operating on them and as a result can quickly become overloaded, leading to an increase in SLAVs. For the purposes of this work the Lr-Mmt policy has been selected to govern host overload detection and VM selection [62]. Since its first proposal, Lr-Mmt has become a standard throughout the community and often serves as an experimental benchmark for various energy efficient research undertakings. As briefly mentioned in Chapter 2, this heuristic leverages the PABFD algorithm in order to optimise the placement of VMs in the data center.

4.3 PABFD Algorithm Hypothesis

PABFD sorts the VMs in decreasing order based on their current CPU utilisation rates and then selects, for each VM instance, the host which will cause the least increase in power consumption. However, although the Lr-Mmt approach has demonstrated its efficacy in all three stages of resource allocation, there still remains a requirement for a more robust and dynamic solution to resource optimisation. The merit upon which this work is proposed is that, theoretically, the implementation of an algorithm such as PABFD could potentially result in the placement of VMs across numerous underutilised hosts. Although this algorithm is designed to allocate VMs to hosts resulting in the least power increase, the power increase is measured relative to each host independently rather than as a global power estimate for the entire data center. The theory which motivates this research is that, given a scenario of hosts operating between 2% and 50% utilisation, this algorithm would allocate VM instances to the hosts operating at substantially lower rates of utilisation and overall is likely to result in the operation of numerous servers that remain relatively underutilised. Furthermore, operating 5 hosts at a higher rate of 50% would manage the same workload as running double the number of hosts at a lower rate of 25%. This would result in the improved utilisation of available resources and it would also allow for the conservation of energy by decommissioning poorly utilised hosts. In view of this theory, this research proposes the deployment of an RL agent in order to optimise the distribution of VMs across the data center and consolidate the VMs on to a reduced number of servers while delivering the required QOS. More concretely, the objective of the agent is to discover the overall balance in the dispersal of VMs so as to prevent hosts becoming overloaded too quickly while also ensuring that they are operating efficiently. In order to implement such an approach, the RL agent must be invoked in place of the PABFD algorithm.
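To make this intuition concrete, consider a simple linear power model P(u) = Pidle + (Pmax − Pidle) · u, a relationship commonly assumed in the literature cited in Chapter 2. The figures below are purely illustrative assumptions and are not measurements from the servers used in this work. Taking Pidle = 90W and Pmax = 120W:

10 hosts at 25% utilisation: 10 × (90 + 30 × 0.25) = 975W
5 hosts at 50% utilisation: 5 × (90 + 30 × 0.5) = 525W

Both configurations serve the same aggregate CPU demand (the equivalent of 2.5 fully utilised hosts), yet the consolidated configuration draws roughly 46% less power, because the idle power of the additional servers dominates overall consumption.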

4.4 Additional Classes

The implementation of the proposed RL agent requires a specific set of behaviours and attributes which have been designed by introducing an RL package containing all the relevant classes required to simulate this behaviour and also to record its performance. Below is a brief description of each of the classes in the RL package and also the additional classes required within other packages in CloudSim.

4.4.1 Lr-MmtRlPlacement

The Lr-MmtRlPlacement class is located in the org.cloudbus.cloudsim.examples.power package. As mentioned previously this class contains the main method which is responsible for initialising the simulation and it also allows for the configuration of both the workload data and resource management policies.

4.4.2 PowerVmAllocationRlPlacement

This class contains a new implementation for the findHostForVm() method which executes the proposed RL model. More specifically, it controls the interaction between the core RL classes outlined below and returns a mapping for the placement of VMs on to hosts as selected by the agent. This new method is invoked by the CloudSim optimizeAllocation() method instead of the default placement policy.

4.4.3 Agent

The Agent class is primarily responsible for selecting a host for the placement of a VM instance by implementing either a Q-Learning or Sarsa learning strategy coupled with an ε-greedy or softmax action selection policy. Additionally, this class is also responsible for updating and retrieving values from the Q-value matrix which essentially stores the agent's knowledge to date.
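To illustrate the ε-greedy selection strategy mentioned above, below is a minimal Java sketch; the class and method names are assumptions and do not correspond to the actual Agent implementation. A softmax policy would instead select each candidate with probability proportional to e^(Q(s,a)/τ), where τ is a temperature parameter.

    import java.util.List;
    import java.util.Random;

    // Illustrative sketch of epsilon-greedy action selection over candidate hosts.
    public class ActionSelector {

        private final Random random = new Random();
        private final double epsilon; // e.g. 0.1: explore 10% of the time

        public ActionSelector(double epsilon) {
            this.epsilon = epsilon;
        }

        // qValues.get(i) holds Q(s, a_i) for the i-th candidate host in the current state.
        public int selectAction(List<Double> qValues) {
            if (random.nextDouble() < epsilon) {
                // Exploration: pick a random candidate host.
                return random.nextInt(qValues.size());
            }
            // Exploitation: pick the candidate host with the highest Q-value.
            int best = 0;
            for (int i = 1; i < qValues.size(); i++) {
                if (qValues.get(i) > qValues.get(best)) {
                    best = i;
                }
            }
            return best;
        }
    }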

4.4.4 Environment

The primary role of the Environment class is to carry out a series of functions in order to generate the necessary information required by the agent to make a decision. Examples of the functionality in this class include, but are not limited to, producing a list of hosts with available resources, calculating and retrieving the global state of the data center and computing the resource utilisation of both the VM to be placed and the hosts in the environment. This class is also responsible for the calculation of the rewards which are given to the agent after selecting an action in a given state.

4.4.5 Algorithm

The Algorithm class allows for the invocation of either the Q-Learning or Sarsa learning algorithms. This class also declares the values for both the learning rate and discount factor utilised in the Q-value update rule.

4.4.6 RlHelper

This class was designed to help monitor and record the performance of the RL agent. More specifically, CloudSim automatically calculates and generates the results of the overall performance of the data center at the end of each simulation. However, this class proves imperative in order to monitor the performance of the agent on a more refined basis, which allows for a coherent examination of its behaviour throughout the simulation process. In addition, the functionality provided by this class enables a cross comparison of other experimental factors that may uncover further findings of interest in this study. Examples of the functionality belonging to this class include methods to read the Q-values back into the Q-value matrix when the simulation is run over a number of iterations, ensuring the agent continues learning from where it left off in the previous simulation. It also contains methods to print Q-values and possible actions out to files to help validate the program, while also allowing for the calculation of the performance metrics at more discrete intervals than the default CloudSim setting.

Figure 17 below illustrates a high level class diagram of the novel system which comprises the RL framework. More specifically, this diagram highlights the integration of the RL framework with CloudSim, which is achieved through the method findHostForVmRl(). This method provides a new implementation for the abstract method findHostForVm(). As illustrated, the findHostForVmRl() method harnesses the RL classes and methods to simulate the agent's behaviour and as a result allocates VMs with a more informed approach. To further convey the integration of the proposed RL framework with CloudSim, Figure 18 depicts the message passing which now occurs during the three stage resource optimisation process. The method optimizeAllocation() is continuously called at discrete time frames during the simulation in order to drive improved resource management. This results in various method calls which, in brief, create a list of overloaded hosts utilising the LR host overload detection policy. As a result, VMs are removed from each of the overloaded hosts until the hosts are no longer saturated. This process generates a list of VMs required for placement on other available hosts. Each of the VMs on this list is passed into the RL model to be placed by the agent. Once the agent has decided which hosts to place all the VMs on, it sends back a migration map which is processed by CloudSim.


Figure 17: High level class diagram with integrated RL framework


Figure 18: Sequence diagram showing CloudSim resource optimisation with RL component
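As an illustration of the Q-value persistence behaviour described above for the RlHelper class, below is a minimal Java sketch of saving and reloading a Q-value matrix between simulation runs; the file format, class name and method names are assumptions for illustration only and do not reflect the actual implementation.

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintWriter;

    // Illustrative sketch only: persisting a Q-value matrix between simulation runs.
    public class QValueStore {

        // Writes each Q-value as "state action value" on its own line.
        public static void save(double[][] qValues, File file) throws IOException {
            try (PrintWriter out = new PrintWriter(new FileWriter(file))) {
                for (int s = 0; s < qValues.length; s++) {
                    for (int a = 0; a < qValues[s].length; a++) {
                        out.println(s + " " + a + " " + qValues[s][a]);
                    }
                }
            }
        }

        // Reads previously stored Q-values back into the matrix so that learning
        // continues from where the previous simulation finished.
        public static void load(double[][] qValues, File file) throws IOException {
            try (BufferedReader in = new BufferedReader(new FileReader(file))) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length == 3) {
                        int s = Integer.parseInt(parts[0]);
                        int a = Integer.parseInt(parts[1]);
                        qValues[s][a] = Double.parseDouble(parts[2]);
                    }
                }
            }
        }
    }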

4.5 Defining the State-Action Space

The application of RL in more complex problem domains requires careful definition of the state-action space. As previously mentioned in Chapter 2, Section 2.8.3, conventional RL algorithms such as Q-Learning and Sarsa can suffer from exponentially large state-action spaces, often referred to as “the curse of dimensionality”. This problem can have a severely negative impact on the overall performance of the agent while also limiting its ability to operate effectively. The implication of such a problem within a cloud infrastructure would likely be a surge in energy consumption as the agent continues an exhaustive search of the state-action space. This would result in slower decision making and prolong the total amount of time that resources are poorly utilised. It is therefore imperative to the design of the proposed RL energy model to define a more cohesive and sustainable state-action space.


The state space S represents a global state of the environment. It can be defined as the percentage of active hosts ah in the environment, where th is the total number of hosts. By defining all possible states as a percentage ranging from 0-100%, the result is a more manageable and efficient state space. Furthermore, the state space defined in Equation 4.1 aligns well with the design objective of our agent, which is to reduce the number of underutilised hosts in the environment by consolidating VMs on to a smaller number of hosts in order to optimise resource usage.

S = \frac{\sum_{i=1}^{n} ah_i}{th} \cdot 100 \quad (4.1)

An action A is a combined variable composed of the utilisation rate of a possible host coupled with the size of the VM to be placed. This calculation is defined in Equation 4.2, where the host utilisation rate hu is calculated as the sum of the total requested resources of each VM residing on the host as a percentage of the host's capacity hc. Additionally, the VM utilisation vmu is computed as the VM's requested resources r expressed as a percentage of the total host capacity hc.

A = \left[ h_u + vm_u \right], \quad h_u = \frac{\sum_{j=1}^{n} vm_{r_j}}{h_c} \cdot 100, \quad vm_u = \frac{r}{h_c} \cdot 100 \quad (4.2)
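To illustrate how these two quantities might be computed in practice, below is a minimal Java sketch of the state and action calculations in Equations 4.1 and 4.2; the class, method and variable names are assumptions for illustration and do not correspond to the actual classes described in Section 4.4.

    import java.util.List;

    // Illustrative sketch of the state and action calculations in Equations 4.1 and 4.2.
    public class StateActionCalculator {

        // Equation 4.1: the global state as the percentage of active hosts in the data center.
        public static int globalState(int activeHosts, int totalHosts) {
            return (int) Math.round(((double) activeHosts / totalHosts) * 100);
        }

        // Equation 4.2: combined action = host utilisation % + VM utilisation %,
        // both expressed relative to the host capacity hc.
        public static int combinedAction(List<Double> requestedMipsPerVm,
                double vmRequestedMips, double hostCapacityMips) {
            double sum = 0;
            for (double r : requestedMipsPerVm) {
                sum += r;
            }
            double hostUtilisation = (sum / hostCapacityMips) * 100;           // hu
            double vmUtilisation = (vmRequestedMips / hostCapacityMips) * 100; // vmu
            return (int) Math.round(hostUtilisation + vmUtilisation);
        }
    }

Because both quantities are expressed as percentages, the combined action always remains within the 0-100% range for any host with sufficient spare capacity to run the VM.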

4.6 Q-Learning Implementation

The procedural form of the Q-Learning algorithm is outlined below. This provides a high level overview of the internal operations of the RL agent-environment interaction which occurs during the simulation process.


Algorithm 3: Q-Learning VM placement procedure

calculate globalState
foreach Vm → vmPlacementList do
    foreach host → hostList do
        calculate hostUtilisationRate
    end
    calculate vmSize
    foreach hostUtilisationRate do
        possibleActions ← vmSize + hostUtilisationRate
    end
    select host from possibleActions using π
    allocate vm
    observe globalState+1, reward
    calculate Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]
    update QValueMatrix
    globalState ← globalState+1
end

A list of VMs that need to be migrated elsewhere in the data center is passed into the method getNewPlacement(), which subsequently invokes the RL placement policy. Initially, the global state of the environment is calculated and a list of all possible hosts in the data center is created, excluding the host the VM was removed from and also hosts which lack the available resources to run the VM. This ensures the combined action percentage remains within the 0-100% range. The first VM to be placed is selected from the placement list and the host utilisation rate for each host on the host list is calculated and returned as a percentage ranging between 0-100%. Next, the size of the VM is calculated and a list of possible actions is generated using the combined action variable. The agent selects a host based on the action selection strategy being followed, e.g. ε-greedy or softmax. Once the host is selected it is placed on a migration map and sent back to CloudSim. The global state is recalculated and a reward is received by the agent. Next, the Q-value update rule is calculated and the result is stored in the Q-value matrix. Lastly, the global state is updated for the next iteration. This algorithm continues until all VMs in the placement list have been reallocated on to other hosts in the environment.
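The value update at the heart of the procedure can be expressed compactly in Java; the following is a minimal sketch in which the class, method and variable names are illustrative assumptions rather than the actual implementation.

    // Minimal sketch of the Q-Learning value update used in Algorithm 3; alpha and
    // gamma correspond to the learning rate and discount factor declared in the
    // Algorithm class.
    public final class QLearningUpdate {

        public static double update(double qCurrent, double reward,
                double maxQNext, double alpha, double gamma) {
            // Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
            return qCurrent + alpha * (reward + gamma * maxQNext - qCurrent);
        }

        public static void main(String[] args) {
            // Example with alpha = 0.1, gamma = 0.9, reward = 1.0 and illustrative Q estimates.
            double updated = update(0.5, 1.0, 0.8, 0.1, 0.9);
            System.out.println(updated); // 0.5 + 0.1 * (1.0 + 0.72 - 0.5) = 0.622
        }
    }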


4.7 Sarsa Implementation

The Sarsa procedure is provided below. This algorithm is similar to the Q-Learning implementation illustrated above. As outlined in Section 2.8.3, the fundamental difference between the two algorithms is that Sarsa requires a quintuple of events consisting of state, action, reward, state, action to occur in order to transition from one state to the next and subsequently update the Q-value estimates. As shown in the Sarsa procedure, the algorithm differs after the point at which the agent observes the new state it transitioned into and the reward it received as a result of previously selecting a host for the allocation of a VM in a given state. Unlike the Q-Learning algorithm, instead of immediately calculating and updating the Q-value estimates, the Sarsa procedure recomputes the host list and generates all possible actions based on the current state the agent is in and the next VM to be placed from the placement list. It then selects an action to execute in the following iteration and updates its Q-values based on this definite action. In the Q-Learning implementation the agent assumes the most lucrative action will be selected in the subsequent state. However, this does not consider the possibility of the agent selecting a random action and exploring the environment, which is not uncommon.
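Relative to the Q-Learning sketch given in the previous section, the only change required is the bootstrap term. A hedged illustration of the on-policy Sarsa update, written as an additional method for the hypothetical QLearningUpdate class above, is:

    // On-policy Sarsa update: bootstraps on the action actually chosen for the next state,
    // rather than on the maximising action (compare with the Q-Learning sketch above).
    void sarsaUpdate(int state, int action, double reward, int nextState, int nextAction) {
        double updated = q(state, action)
                + alpha * (reward + gamma * q(nextState, nextAction) - q(state, action));
        qTable.put(key(state, action), updated);
    }

Because the update uses the Q-value of the action that will actually be executed, exploratory moves feed back into the value estimates, which is what makes Sarsa more conservative around poorly rewarded states.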


Algorithm 4: Sarsa VM placement procedure

    calculate globalState
    foreach host → hostList do
        calculate hostUtilisationRate
    end
    calculate vmSize
    foreach hostUtilisationRate do
        possibleActions ← vmSize + hostUtilisation
    end
    select host from possibleActions using π
    foreach Vm → vmPlacementList do
        allocate vm
        observe globalState+1, reward
        foreach host → hostList do
            calculate hostUtilisationRate
        end
        calculate nextVmSize
        foreach hostUtilisationRate do
            possibleActions ← nextVmSize + hostUtilisation
        end
        select host from possibleActions using π
        calculate Q(s, a) ← Q(s, a) + α[r + γQ(s', a') − Q(s, a)]
        update QValueMatrix
        globalState ← globalState+1
        action ← host
    end


4.8 Summary

This chapter discussed the development of the RL framework which will be implemented in subsequent chapters as part of our experimental analysis. More specifically, it discussed the theoretical viewpoint of the author which, alongside the need for improved energy efficiency in cloud infrastructures, forms the motivation behind this work. Next, this chapter outlined the key classes and methods required to simulate the behaviour of the RL agent in order to optimise the placement of VMs in the data center for improved energy conservation. This chapter also addressed one of the key underlying issues of RL methodologies known as the "curse of dimensionality". In particular, it discussed how the strategy of defining the state-action space as percentages ranging from 0-100% significantly reduces the size of the state-action space in the proposed RL energy model. This type of approach allows for the deployment of a more agile and efficient agent capable of pursuing its design objectives. Lastly, the customised Q-Learning and Sarsa learning algorithms which will be implemented in CloudSim were also introduced.


Chapter 5

Gridworld

5.1 Overview

Thus far this thesis has explored RL methodologies solely from a theoretical perspective. This chapter presents a simplified test problem which will serve as an aid to further convey and illustrate the key concepts of RL as an AI technique. The test problem that has been selected consists of a deterministic gridworld and will be utilised in a small number of experiments. These experiments have been conducted both to illustrate the general idea and to highlight the fundamental differences between Q-Learning and Sarsa, which are the learning strategies selected for the work carried out as part of this thesis.

5.2 Experimental Details

As an initial test problem, experiments were conducted using RL in a 5x5 gridworld environment. In order to develop an in-depth understanding of the learning process we examine and compare the behaviour of the agent when implementing different combinations of learning and action selection strategies. In particular we analyse the Q-Learning and Sarsa learning procedures coupled with both ε-greedy and softmax action selection strategies in order to gain a greater understanding of these conventional RL algorithms. In addition, in order to further expose the key differentiating behaviour of the Q-Learning and Sarsa learning approaches, we analyse both policies based on the rewards accumulated throughout the learning cycle. Figure 19 below illustrates the gridworld problem domain used to demonstrate the general concept. The goal state is Sg; at the beginning of each episode the agent begins at the start state, which is located on the bottom row in the far left cell. The shaded square represents a danger zone; if the agent enters this state it receives a poor reward and therefore should avoid it. The agent has a specific action set available to it, composed of four directional moves North, South, East and West depending on the agent's current state. Each time the agent discovers the goal state a new episode begins which promotes further learning. The agent's overall objective is to learn the optimal path through the grid in order to reach the goal state.


The agent learns by visiting each state infinitely often and updating its Q-values in the form of a matrix, which in essence stores the agent's knowledge to date. The Q-values for all experiments were set to a default value of 0 at the beginning of each run to encourage exploration in the early stages of the learning process. In order to ensure the accuracy of the experiments and to improve the overall clarity of the graphs, the average was taken across 30-100 runs.
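A minimal sketch of such a gridworld learning loop is given below, using ε-greedy Q-Learning over a tabular Q-value matrix. The grid layout, reward values and random seed are illustrative placeholders rather than the exact settings used in these experiments.

    import java.util.Random;

    // Illustrative 5x5 gridworld with epsilon-greedy tabular Q-Learning.
    // Reward values and cell positions are placeholders, not the thesis settings.
    public final class GridworldDemo {

        static final int SIZE = 5;
        static final int[][] MOVES = {{-1, 0}, {1, 0}, {0, 1}, {0, -1}};  // N, S, E, W
        static final double ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.10;

        public static void main(String[] args) {
            double[][] q = new double[SIZE * SIZE][MOVES.length];  // Q-values default to 0
            Random rng = new Random(42);
            int goal = SIZE - 1;                 // e.g. top-right corner
            int danger = 2 * SIZE + 2;           // e.g. centre cell

            for (int episode = 0; episode < 100; episode++) {
                int state = (SIZE - 1) * SIZE;   // start: bottom-left cell
                while (state != goal) {
                    int action = rng.nextDouble() < EPSILON
                            ? rng.nextInt(MOVES.length)
                            : argMax(q[state]);
                    int next = step(state, action);
                    double reward = next == goal ? 10 : (next == danger ? -10 : -1);
                    q[state][action] += ALPHA * (reward + GAMMA * q[next][argMax(q[next])] - q[state][action]);
                    state = next;
                }
            }
            System.out.println("Training complete; greedy policy stored in q.");
        }

        static int step(int state, int action) {
            int row = Math.max(0, Math.min(SIZE - 1, state / SIZE + MOVES[action][0]));
            int col = Math.max(0, Math.min(SIZE - 1, state % SIZE + MOVES[action][1]));
            return row * SIZE + col;   // moves off the grid leave the agent in place
        }

        static int argMax(double[] v) {
            int best = 0;
            for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
            return best;
        }
    }

With the Q-values initialised to zero and a small negative step reward, the agent is pushed to explore early on and to shorten its route to the goal as the estimates improve.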

5.3 Results

The following sections outline the results of the various experiments that were conducted. These results are illustrated below in Figures 19 to 25.

5.3.1 Learning an Optimum Policy

As an initial experiment, an intelligent agent implementing Q-Learning combined with an ε-greedy action selection policy was deployed in the 5x5 gridworld environment. Figure 19 depicts the optimal path the agent learned in order to navigate through the grid to reach the goal state. It can be observed that the agent learned an optimal policy while avoiding the danger zone. The agent selects more lucrative actions which maximise the expected reward. The arrow in each cell of the grid illustrates these actions.

Figure 19: The gridworld problem illustrating the optimal path from each state

In order to further illustrate proof of concept and to expose the learning capabilities of an RL agent executing a Q-Learning ε-greedy policy, we analyse the number of moves the agent took over each episode in order to reach the goal state. As illustrated in Figure 20, in the early stages of the learning process it can be observed that on average the agent took approximately 15 moves to reach the goal state when implementing the Q-Learning ε-greedy algorithm. At this stage in the learning process the agent had acquired relatively little knowledge about its environment and therefore continued to explore all possible actions. As a result it can be seen that the agent took a greater number of moves in order to reach the goal state.


However, it can also be observed that this figure dropped significantly over time, which indicates the ability of the agent to learn in unknown environments. As the agent explored by means of repeated state-action-reward interactions it learned to optimise its behaviour, and as a result on episode 5 it converged to an optimal policy of 6 moves. As illustrated, there is a slight variance in the plotted line from episodes 5-100; this is not a design defect, rather it is merely due to the set rate of ε = 0.10. As a result, the agent continued to explore 10% of the time although it had already discovered the optimal policy, which caused a slight fluctuation in the number of moves taken to reach the goal state.

Figure 20: Q-Learning ε-greedy number of moves to goal in the 5x5 gridworld

Figure 21 illustrates the impact of coupling the Q-Learning procedure with both an ε-greedy and a softmax action selection policy. In the earlier stages of the learning cycle Q-Learning softmax required a greater number of moves to reach the goal state, resulting in an average of 17.53 moves before converging to an optimal solution of 6 moves on iteration No.5 (Std Dev 1.35091). In comparison, as previously mentioned, Q-Learning ε-greedy recorded an average of 15.50 moves, resulting in an 11.5% decrease in the number of moves over the first learning trial before successfully converging on iteration No.5 (Std Dev 1.2203). As shown, once both policies converged there is no significant difference in performance.


Figure 21: Comparison of Q-Learning ε-greedy and softmax action selection policies

Below we analyse the implications of the ε-greedy and softmax action selection policies on the Sarsa learning procedure. Figure 22 displays the behaviour of both policies. Sarsa ε-greedy resulted in an average of 35.37 moves before descending below the 10-move threshold to converge on iteration No.2 with a total standard deviation of 2.58842. Similar results were also achieved through the deployment of the Sarsa softmax policy. The learning agent generated an average of 35.5 moves to reach the goal state in the first trial before successfully converging on iteration No.2. Notably, the standard deviation was calculated at a slightly higher rate of 2.64048, which suggests that there is more variability evident in the Sarsa softmax procedure in terms of the optimal policy being followed. However, in light of such findings, once again there is very limited differentiation in performance relative to the simplistic gridworld environment.


Figure 22: Comparison of Sarsa ε-greedy and softmax action selection policies

Having explored the implications of leveraging both the ε-greedy and softmax action selection policies relative to the selected learning procedures, we now focus on evaluating the Q-Learning algorithm against the Sarsa algorithm. Although previous experiments showed no significant difference in performance in relation to the action selection policy being implemented, for the purposes of this experiment the Q-Learning ε-greedy and Sarsa ε-greedy policies have been selected, primarily based on the performance of both policies in the early stages of the learning process as outlined above.

Figure 23: Comparison of the agent's performance with Q-Learning and Sarsa


Figure 23 above illustrates a comparison of the behaviour of the agent during the learning process utilising both the Q-Learning and Sarsa algorithms. It can be noted that both plotted lines illustrate the agent's ability to learn, as the number of moves executed by the agent reduced significantly over time. However, there are slight variances in the graph. The Sarsa algorithm initially took far more moves to reach the goal state in the early stages of learning, with approximately 35 moves in total, before it converged to an optimal policy of approximately 10 moves. The agent, which initially has no knowledge of the environment, recognised the danger zones and as a result took less direct routes to avoid undesired states. In comparison, the Q-Learning algorithm experienced a significantly lower number of moves in the early stages of learning, with approximately 15 moves in total. As illustrated, it converged on an optimal policy of 6 moves, demonstrating its off-policy nature. Figure 24 illustrates an overall count of the steps taken by the agent using both algorithms at the end of 100 iterations. The Q-Learning algorithm resulted in a 34% reduction in the total number of moves required to reach the goal state.

Figure 24: Total moves for Q-Learning and Sarsa


5.3.2 Comparison of Q-Learning and Sarsa Algorithms Through Accumulated Rewards

In order to fully address the key differentiating feature between Q-Learning and Sarsa, an experiment was conducted similar to Sutton and Barto's cliff walking example [70]. This experiment primarily focused on the accumulation of rewards per episode for each learning method. The gridworld used in this experiment contained a number of danger zones which the agent should avoid. Figure 25 below illustrates this key differentiation. The Sarsa learning policy experienced a more lucrative accumulation of rewards with an average reward of 59 (Std Dev 4.01). This is primarily the result of the on-policy nature of the Sarsa learning algorithm. The Sarsa algorithm discovered an optimal policy by taking into account the danger posed by exploration; it learned the safest path to the goal state, resulting in the accumulation of greater rewards. In comparison, it can be observed that the Q-Learning algorithm received fewer rewards, resulting in an average reward of 49 (Std Dev 3.39). The Q-Learning agent attempted to pursue the most optimal policy with a disregard for the action selection policy being followed. As a result, the agent often experienced negative rewards. Its off-policy behaviour assumed that the action with the greatest expected reward would be selected in the subsequent state; however, a suboptimal action may have been invoked, causing the agent to enter an undesired state and negatively impacting the accumulation of rewards. Figure 25 compares favourably with Sutton and Barto's cliff walking example in Figure 26. Stark similarities can be observed between both diagrams.

Figure 25: Accumulated rewards


Figure 26: Sutton & Barto cliff walking rewards


5.4 Discussion

Figures 20 to 23 illustrate the basic and fundamental behaviour of an agent during the learning process leveraging both the Q-Learning and Sarsa learning algorithms coupled with ε-greedy and softmax action selection policies. Having analysed and compared the behaviour of an agent following various combinations of such learning and action selection policies, there are several key points to reflect on. In particular, while the results showed no profound differences in performance in relation to the action selection policy being implemented, it is evident that learning algorithms coupled with an ε-greedy action selection policy prove to be a more efficient combination when applied to a simplistic gridworld problem. This finding is based primarily on the reduction in the number of moves required to enter the goal state in the early stages of learning. For example, the Q-Learning ε-greedy algorithm achieves up to an 11.5% decrease in the number of moves in the earlier stages, while a smaller but similar result was also evident when comparing both action selection strategies with the Sarsa procedure. In the early stages of learning the Q-values stored by the agent are often inaccurate, as the agent requires more interactions with the surrounding environment in order to learn an optimal strategy and to ultimately increase its knowledge. Evidently, in a simplistic environment the application of a weighted approach early on in the learning process, whereby actions which appear to be more rewarding have a better probability of being selected, results in less informed decision making. As a result, this drives up the number of moves required to enter the goal state.

Additionally, another key point is that there was a slight fluctuation evident in the number of moves taken by the agent once it had converged, due to the rate of ε. If the rate of ε were reduced over time as the agent becomes more certain about the environment, this line would level off completely. However, an important point to highlight is that in highly volatile environments such as the cloud, reducing the rate of ε would not cater to the stochastic nature of the environment. Such an environment requires an agent to explore to some degree in order to ensure an up-to-date policy is maintained; reducing ε would therefore make the agent less adaptive. Overall, the deployment of this type of strategy often depends heavily on the nature of the environment.

Figures 23 and 25 compare the key differentiating features of both learning algorithms in terms of their behaviour during the learning process and also through the accumulation of rewards. As illustrated, the Sarsa learning policy converges at a quicker rate, three iterations ahead of the Q-Learning policy, but the optimal policy learned is different. The Sarsa algorithm updates its Q-values based on the action taken in the subsequent state, whereas Q-Learning makes an update assuming the action with the most lucrative reward is selected. This key differentiation provides Sarsa with a more realistic view of the environment, allowing it to find an optimal policy faster. In addition, it also causes the agent to become more cautious when faced with states that have a negative impact on the expected reward, as reflected in Figures 23 and 24 in the overall increase in the number of moves to reach the goal state.


An important point to address in relation to both algorithms is the key differentiation in terms of the optimal policy learned by each agent; for example, Q-Learning converged with an optimal policy of an estimated 6 moves while Sarsa converged at approximately 10 moves. Both policies are nonetheless in line with the expected behaviour of the algorithms. Sarsa takes an average of 4 moves more to reach the goal state, which in this case is the optimal policy when faced with a series of danger zones. In light of this finding there is evidence to suggest that there is no one best algorithm, as highlighted by the work of Sutton and Barto [70]. In fact, the performance of each learning strategy depends on the nature of the environment in which it is deployed. To place the results in context with published literature, Figure 25 displays the accumulation of rewards in order to highlight the key differentiating feature between both algorithms. This graph is compared with that of Sutton and Barto's cliff walking experiment, in which strong similarities are evident. Overall, this technique proves to be an effective approach in the deployment of a self-governing agent capable of learning in unknown environments. Furthermore, it is also clear to see how the application of such a technique holds massive potential in the development of a more dynamic energy efficient resource model in cloud based infrastructures.

5.5 Summary

This chapter intended to provide a greater understanding of the fundamental and general concept of RL by illustrating the use of this AI methodology in a simpler gridworld environment. In particular it focused on both the Q-Learning and Sarsa learning algorithms coupled with both softmax and ε-greedy action selection approaches, which will be implemented as part of our energy aware cloud resource model in the subsequent chapters. It firstly compared the behaviour of an agent implementing various combinations of policies during the learning process. It also showed that over time the agent successfully converges to an optimal policy utilising both learning approaches. Following on from this, it also highlighted the most profound difference between the Q-Learning and Sarsa learning algorithms. In particular, it showed how the Sarsa learning algorithm learns the safest and most rewarding path through the grid while Q-Learning pursues the most optimal path, which sometimes results in the agent entering a poorly rewarded state.


Chapter 6

An Examination of RL Policies Implemented in the Cloud

6.1 Overview

As illustrated in the previous chapter, the overall performance of both the Q-Learning and Sarsa learning algorithms can vary depending on the nature of the environment in which they are deployed. The goal of the following chapter is to conduct a series of experiments in order to analyse and compare the performance of both algorithms utilising two different action selection strategies, i.e. ε-greedy and softmax. This analysis is necessary in determining the most robust combination of policies which will ensure the best possible performance from the proposed RL learning agent.

6.2 Experimental Details

This set of experiments compares the following combinations of learning and action selection policies:

1. Q-Learning and ε-greedy.
2. Q-Learning and softmax.
3. Sarsa and ε-greedy.
4. Sarsa and softmax.

In order to conduct a thorough examination of each policy there are two broad divisions under which the performance of each policy must be analysed. These categories include the overall robustness and adaptability of our agent when faced with a more dynamic workload and also the rate at which the agent converges to an optimum solution over a single iterative workload. In light of this, each combination of the above policies will be run over both a 30 day stochastic workload and also a single workload over 100 iterations. The 30 day workload has been created using the PlanetLab workload traces. It has been implemented in order to examine the performance of our agent under a more volatile workload which the agent has not been continuously learning over. As a result this will expose the capability of our


agent in a more realistic cloud scenario. Additionally, the performance of each combination of policies must be consistently measured in order to produce reliable and accurate findings. Therefore, the performance metrics outlined below will be used to evaluate each algorithmic approach.
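For reference, a minimal sketch of the two action selection strategies being compared is given below. The temperature parameter and class name are illustrative assumptions; their exact values are not fixed by the text here.

    import java.util.Random;

    // Sketch of the two exploration strategies compared in this chapter.
    // The temperature value is illustrative only.
    public final class ActionSelection {

        private static final Random RNG = new Random();

        // Epsilon-greedy: explore uniformly with probability epsilon, otherwise exploit.
        static int epsilonGreedy(double[] qValues, double epsilon) {
            if (RNG.nextDouble() < epsilon) {
                return RNG.nextInt(qValues.length);
            }
            int best = 0;
            for (int a = 1; a < qValues.length; a++) if (qValues[a] > qValues[best]) best = a;
            return best;
        }

        // Softmax (Boltzmann): actions with higher Q-values are proportionally more likely.
        static int softmax(double[] qValues, double temperature) {
            double[] prefs = new double[qValues.length];
            double sum = 0.0;
            for (int a = 0; a < qValues.length; a++) {
                prefs[a] = Math.exp(qValues[a] / temperature);
                sum += prefs[a];
            }
            double sample = RNG.nextDouble() * sum;
            for (int a = 0; a < qValues.length; a++) {
                sample -= prefs[a];
                if (sample <= 0) return a;
            }
            return qValues.length - 1;
        }
    }

Under softmax, placements whose Q-values are higher remain proportionally more likely to be chosen even while exploring, whereas ε-greedy explores uniformly across all feasible hosts; this difference underpins the comparison that follows.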

6.3 Performance Metrics

The key performance metrics which have been selected in order to compare and evaluate the effectiveness of the proposed algorithms were proposed by Beloglazov et al. in one of their most highly cited papers [62]. Since being first proposed these metrics have been widely adopted within the community as a standard set of data center performance metrics. These metrics are composed of the following:

6.3.1 Energy Consumption

This can be defined as the total energy consumed by the data center's computational resources as a direct result of processing application workloads. Although other energy overheads exist within a data center, such as cooling systems and additional auxiliary equipment, the core focus of this research is the energy consumption from computational resources.

6.3.2 Number of Virtual Machine Migrations

This is the total number of VM migrations that occur during the simulation process. Each time a VM is migrated it is typically subjected to SLAVs. In order to reduce SLAVs it is therefore advantageous to reduce the number of migrating VMs.

6.3.3 Service Level Agreement Violations

As discussed in previous chapters, the ability of cloud providers to deliver SLA is critical and a core function of their operation. Beloglazov et al. proposed a combined metric in order to measure the level of SLA being adhered to in the delivery of cloud services. This combined metric can be calculated using both SLATAH and PDM.

SLATAH

Service Level Agreement Violation Time per Active Host (SLATAH) determines how long each active host in the environment has experienced 100% utilisation of its resources. According to Beloglazov et al., during the time period in which a host is experiencing 100% utilisation the VMs residing on the host are not provided with the required resources and as a result SLAVs occur [62]. The SLATAH calculation is presented below in equation 6.1. N is defined as the total number of active hosts, Tsi is the total time host i has experienced 100% utilisation of its resources, and Tai is defined as the total time that host i is actively serving VMs.

    SLATAH = \frac{1}{N} \sum_{i=1}^{N} \frac{T_{si}}{T_{ai}}    (6.1)


PDM

Performance Degradation Due to Migrations (PDM) calculates the overall decrease in the performance experienced by migrating VMs. The PDM calculation is presented below in equation 6.2. M is defined as the number of VMs, Cdj is the estimated performance degradation for VM j, while Crj is the total CPU requested by VM j over the span of its life.

    PDM = \frac{1}{M} \sum_{j=1}^{M} \frac{C_{dj}}{C_{rj}}    (6.2)

The combined metric for both SLATAH and PDM is defined below:

    SLAV = SLATAH · PDM    (6.3)

6.3.4 Energy Service Level Agreement Violations

Energy Service Level Agreement Violations (ESV) is also a combined metric which recognises the multi-objective nature of energy aware resource allocation. The goal of this approach is to reduce energy consumption while also delivering the required level of service; therefore, the ESV metric allows for a more accurate and global view of the data center performance. The ESV calculation is shown below.

    ESV = E · SLAV    (6.4)
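The following sketch summarises how these four metrics could be computed from per-host and per-VM aggregates gathered during a simulation run. The class and parameter names are illustrative only.

    // Sketch of the combined SLAV and ESV metrics from equations 6.1-6.4.
    // Inputs are per-host and per-VM aggregates gathered during a simulation run.
    public final class DataCenterMetrics {

        // Equation 6.1: mean fraction of active time each host spent at 100% utilisation.
        static double slatah(double[] timeAtFullUtil, double[] timeActive) {
            double sum = 0.0;
            for (int i = 0; i < timeActive.length; i++) {
                sum += timeAtFullUtil[i] / timeActive[i];
            }
            return sum / timeActive.length;
        }

        // Equation 6.2: mean performance degradation due to migration per unit of requested CPU.
        static double pdm(double[] degradation, double[] requestedCpu) {
            double sum = 0.0;
            for (int j = 0; j < degradation.length; j++) {
                sum += degradation[j] / requestedCpu[j];
            }
            return sum / degradation.length;
        }

        // Equations 6.3 and 6.4: combined SLA violation and energy-SLA metrics.
        static double slav(double slatah, double pdm) { return slatah * pdm; }

        static double esv(double energyKWh, double slav) { return energyKWh * slav; }
    }

ESV couples the two competing objectives, so a policy only scores well if it reduces energy without letting SLA violations grow.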

The reallocation and placement of VM instances has a direct impact on each of the above performance metrics. Placing a VM on an already loaded host can cause a surge in energy consumption while also potentially causing the host to become overloaded, incurring SLAVs and requiring further VMs to be migrated. Additionally, placing a VM on an underutilised host may also result in the continuation of poor resource utilisation, which according to current research is one of the most profound sources of energy inefficiency in data center operations today [6, 35]. The aim of the proposed approach in this work is to strike a balance in the energy-performance tradeoff and to measure the effectiveness of such an approach by leveraging the performance metrics outlined above.


6.4 Results

Below are the results obtained from this experiment which are categorised according to the selected performance metrics.

6.4.1 Energy

In order to demonstrate the application of RL in provisioning and managing cloud resources, Figure 27 portrays the energy consumption of the data center over 100 learning trials utilising Q-Learning combined with an ε-greedy action selection policy. The 126 kWh energy range has been deemed the optimum energy rate across all four combinations of policies and will be used to measure the rate at which each policy converges. As illustrated, in the early stages of the learning process energy consumption was measured close to 129 kWh. Between trials 0-7 the line shows a downward trend as the energy rate proceeded to decrease, demonstrating the agent's ability to learn in more dynamic environments. In trial No.7 the energy consumption stabilised to a certain extent at 126.63 kWh, indicating that the agent converged to an optimal policy. However, it also began to fluctuate marginally due to the set rate of ε, which caused the agent to select a random action periodically. Overall the energy rate remained within the 126 kWh energy span.

Figure 27: Energy consumption over 100 iterations


The Q-Learning algorithm was compared utilising both ε-greedy and softmax action selection strategies, as depicted in Figure 28. Q-Learning ε-greedy recorded close to 129 kWh of energy in the early stages of learning while Q-Learning softmax resulted in sub 128 kWh. It can be observed that Q-Learning softmax appears to have converged at a faster rate, on trial No.4, in comparison to Q-Learning ε-greedy which converged on trial No.7. Additionally, after the 60th iteration mark Q-Learning softmax generated a slightly higher rate of energy, averaging 126.77 kWh in comparison to Q-Learning ε-greedy which averaged 126.60 kWh.

Figure 28: Energy consumption Q-Learning with ε-greedy and softmax

Below, Figure 29 illustrates a comparison of the Sarsa learning algorithm coupled with ε-greedy and softmax action selection strategies. As displayed, Sarsa ε-greedy commenced learning at an energy rate of 128.56 kWh. On iteration No.6 it achieved sub 127 kWh and converged to an optimum policy. In contrast, Sarsa softmax initially recorded energy consumption of 127.33 kWh and converged much quicker, on iteration No.2, than Sarsa ε-greedy.


Figure 29: Energy consumption Sarsa with ε-greedy and softmax

Both the Q-Learning and Sarsa algorithms paired with a softmax selection strategy resulted in a faster rate of convergence. Below, Figure 30 compares both policies relative to the rate at which they converge.

Figure 30: Energy consumption Q-Learning softmax and Sarsa softmax

As previously highlighted, Q-Learning softmax recorded an initial energy consumption rate of 127.93 kWh prior to converging on trial No.4, as indicated in Figure 30 by the black dot. Sarsa softmax produced a slightly lower energy rate of 127.33 kWh in the early stages of the learning process before experiencing a descent which caused it to converge on trial No.2. In terms of average energy consumption rates, Sarsa softmax experienced a minimal reduction in energy with a rate of 126.73 kWh and a standard deviation of 0.126115. Q-Learning softmax averaged 126.77 kWh with a higher deviation of 0.188252.


By calculating the standard deviation of both policies we can infer that Q-Learning softmax resulted in a greater dispersion and variation in energy while Sarsa softmax produced a more refined and overall reduced rate of energy. While measuring the rate at which each combination of policies converged to an optimum solution provides a perspective on the learning capability of the agent utilising the different variations of learning and action selection procedures, in order to elaborate on the findings it is also vital to review each policy in terms of the total amount of energy consumed over numerous iterative trials. This will impact greatly on the selection of a policy going forward. Figure 31 displays the total energy consumed utilising each variation of policy. It can be observed that Sarsa ε-greedy consumed the highest rate of energy at a total measure of 12,678 kWh, followed in close succession by both Q-Learning ε-greedy (12,677 kWh) and Q-Learning softmax (12,676 kWh). Sarsa softmax recorded the least amount of energy, outperforming all other algorithmic approaches with a total rate of 12,673 kWh. Although the performance differential of each approach in relation to energy is minimal, it remains significant in identifying the algorithm which demonstrates the most potential for the further analysis conducted in the subsequent chapters.

Figure 31: Total energy consumption over 100 trials

In order to evaluate the performance and overall robustness of our agent in a more volatile environment, Figure 32 and Figure 33 delineate both the total and average energy output for each combination of policies over a more rigorous 30 day stochastic workload. One of the key observations to be noted is that both variations of the Q-Learning algorithm incurred greater energy consumption than Sarsa relative to the action selection policy being implemented. Furthermore, both Q-Learning softmax and Sarsa softmax consumed the least amount of energy. Similar to the results arising out of the total energy consumed over 100 iterations, Sarsa softmax once again prevailed with a total energy consumption of 3,547 kWh, an average of 118.23 kWh and also the lowest standard deviation of 25.09.


Figure 32: Total energy consumption over 30 day workload

Figure 33: Average energy consumption over 30 day workload


6.4.2 Migrations

While migrating VM instances across discrete physical hosts in the data center provides a plausible mechanism for performance optimisation and plays a pivotal role in the reduction of energy and power consumption, the process of spawning a VM image and instantiating it on a new host often causes a surge in energy consumption. Therefore, it is vital to implement a migration strategy which strikes a balance between both energy and performance, thus preventing excessive migrations. Below, Figure 34 and Figure 35 present the total and average migrations incurred by each combination of policies over 100 iterative trials. As illustrated, both Sarsa ε-greedy and Q-Learning ε-greedy resulted in the greatest number of migrations, averaging 15,187 and 15,182 with standard deviations of 273.58 and 274.25 respectively. Once again Q-Learning softmax and Sarsa softmax produced the least amount of migrations, resulting in a considerable reduction of approximately 20,000 migrations in comparison to the algorithms utilising an ε-greedy exploration strategy.

Figure 34: Total migrations over 100 trials


Figure 35: Average migrations over 100 trials

In addition, the number of migrations generated by each combination of policies was also analysed over the 30 day workload. Figure 36 depicts the average migrations for each policy. Again the results are analogous to those obtained over a single iterative workload. As illustrated, Q-Learning ε-greedy generated the greatest number of migrations with an average of 15,547 (Std Dev 3702.18), followed by Sarsa ε-greedy 15,509 (Std Dev 3567.18), Q-Learning softmax 15,282 (Std Dev 3422.20) and Sarsa softmax 15,172 (Std Dev 3464.85).

Figure 36: Average migrations over 30 day workload


6.4.3 SLAV

Adhering to SLA is essential in the delivery of cloud based services. SLA govern the availability, reliability and overall performance of the service. Failure to deliver the pre-agreed QoS can have a detrimental impact on both profitability and the ability of the service provider to gain competitive advantage. In this respect, it is imperative in this analysis to consider the number of SLAVs incurred by each variation of policy. Below, Figure 37 depicts the total amount of SLAVs over 100 learning trials. As shown, Sarsa ε-greedy was subjected to the greatest number of violations while Sarsa softmax once again outperformed all other algorithmic approaches.

Figure 37: Total SLAV 100 trials

Figure 38 illustrates the results over the 30 day workload. This chart shows results similar to those obtained from the single workload. The main differentiation is that when faced with a more randomised 30 day workload Sarsa ε-greedy outperformed Q-Learning ε-greedy, which has been a common observation across all metrics.


Figure 38: Total SLAV 30 day workload

6.4.4 ESV

The challenging problem in achieving improved energy conservation in data center deployments is that there is an inverse relationship between energy and the level of service provided. As a result, a reduction in energy consumption often causes a surge in the number of SLAVs. One of the key objectives of the proposed RL model is to strike a balance between both variables in order to manage this relationship and drive a more globalised optimisation strategy. The ESV metric plays a pivotal role in addressing this energy-performance tradeoff; as previously mentioned, ESV combines both energy and SLAVs in order to measure this impact. In a broad sense this allows for an unambiguous representation of the data center performance. Figure 39 and Figure 40 present the performance of the data center using each variation of policy under the ESV metric. Once again, over 100 iterative learning trials Sarsa ε-greedy performed the worst with an ESV metric of 0.189. Sarsa softmax again performed marginally better than Q-Learning softmax with an ESV of 0.186. As illustrated in Figure 40, when presented with a more volatile workload Sarsa ε-greedy outperformed Q-Learning ε-greedy with an ESV metric of 0.068, while Sarsa softmax, as expected, outperformed all other variations with an ESV of 0.067.


Figure 39: Total ESV over 100 trials

Figure 40: Total ESV 30 day workload


6.5 Discussion

As a general observation, the experiments conducted in this chapter expose the ability of an RL agent to learn in more volatile cloud environments. In order to demonstrate the overall efficacy and robustness of the proposed RL algorithms, each combination of policies was evaluated under both the rate of convergence and its performance under a more dynamic and strenuous workload. In regards to the rate of convergence, one of the most prominent observations is that when both algorithms follow a softmax exploration strategy they converge at a quicker rate. The results as highlighted in Figure 28 and Figure 29 show Q-Learning softmax converging on iteration No.4 while Q-Learning ε-greedy failed to converge until iteration No.7. Similarly, Sarsa softmax converges on iteration No.2 while Sarsa ε-greedy requires four additional trials. Figure 30 went on to further compare both algorithms leveraging a softmax exploration policy. This analysis confirmed that Sarsa softmax outperforms all other variations in terms of convergence rates, narrowly beating Q-Learning softmax by a reduction of two iterations.

Sarsa softmax also performed best in terms of energy consumption, under both a single workload with an energy rate of 12,673 kWh and a 30 day workload with a rate of 3,547 kWh, as illustrated in Figures 31 to 33. In both instances it showed a narrow improvement of 0.02-0.03% respectively over Q-Learning softmax. In a broader sense, it secured energy savings of 5-13 kWh relative to the algorithm which performed worst over both workloads. Although these improvements are in some cases minimal, they remain significant in identifying the most promising policy for further analysis. In terms of the action selection policy being pursued, these were the expected findings. The softmax action selection strategy outperforms ε-greedy in a more stochastic environment. This is largely due to the weighted approach used in the softmax policy, which prevents an agent from choosing equally among all actions, a fundamental behavioural characteristic of ε-greedy. By implementing a softmax action selection strategy, when entering a phase of exploration it is more probable that the agent will select a host for VM allocation which will result in a more lucrative reward. In a cloud scenario this proves to be a more empirically effective method which results in the agent discovering an optimal policy faster.

An interesting observation is that the relative performance of Sarsa ε-greedy and Q-Learning ε-greedy shifts when presented with a more volatile workload, as shown in Figure 31 onwards. Sarsa ε-greedy marginally outperforms Q-Learning ε-greedy over the 30 day workload. Again, this is due in part to the off-policy nature of the Q-Learning algorithm coupled with an ε-greedy exploration policy. In a more convoluted environment Q-Learning ε-greedy assumes that in a subsequent state a VM will be allocated to a host which will result in the maximum possible reward, and it updates its Q-values based on this assumption. Additionally, when the agent explores it chooses equally among all possible placement options, which also contributes to this slight increase in energy, the number of migrations and also SLAVs.


Equally apparent is that the policies which generated the highest energy consumption also resulted in an increased number of migrations, as illustrated in Figures 32 and 36. This confirms that there is a parallel relationship between energy consumption and the number of migrations, as stated in previously published work by Beloglazov et al. and Clark et al. [60, 62]. Overall it is clear that the policy which consistently outperforms all other variations under each metric, over both a single and a 30 day stochastic workload, is Sarsa softmax. With respect to both workloads, Sarsa softmax achieved up to 13 kWh in energy savings, a best-case reduction of 2.4% in migrations, a 2.3% decrease in SLAVs and an overall ESV saving of up to 2.8%. As it stands, Sarsa softmax has been selected as the most robust and optimum VM placement policy and will be utilised in further experiments.

6.6 Summary

This chapter firstly introduced each of the performance metrics used to effectively measure the efficiency and robustness of the proposed algorithms. These metrics can be broadly classified into energy consumption, SLAVs, ESV and the number of VM migrations. The chapter then evaluated the effects of different combinations of policies in optimising the available cloud resources under each of the performance metrics. More specifically, it validated each policy under both a single iterative workload and also a 30 day dynamic workload. The findings proved Sarsa softmax to be the most promising algorithm going forward, outperforming all other variations of policies.


Chapter 7

Potential Based Reward Shaping with a Comparative Study on the State of the Art Placement Policy

7.1 Overview

The following chapter seeks to reinforce and provide a more robust algorithm by exploring the effects of introducing an advanced RL technique, PBRS, into the original algorithm. Sarsa softmax was tested and verified as the most promising algorithm in the previous chapter. As a result, this experiment is designed to measure the performance of PBRS Sarsa softmax against the standard Sarsa softmax procedure. More concretely, the PBRS technique will firstly be implemented using the gridworld environment outlined in Chapter 5. This enables us to verify the approach while also providing a greater perspective on the intended behaviour of an agent utilising PBRS in a more simplified problem domain. Following on from this, the effects of PBRS will be measured and compared against the standard algorithm across the selected performance metrics. Furthermore, it is imperative to evaluate the overall significance of this research to the development of the discipline. In light of this, this chapter will evaluate the best performing algorithm against the Lr-Mmt policy which harnesses the PABFD placement heuristic. The purpose of this is to place our proposed solution in context with the state-of-the-art VM consolidation policy and to expose the benefits, if any, of implementing such an approach.

7.2 PBRS Experimental Details

As mentioned previously in Chapter 2 Section 2.8.3, one of the underlying issues pertaining to the implementation of RL algorithms is the rate at which an agent converges to an optimum policy. In order for a standard RL agent to gain the required capacity to operate in any given environment it must learn by means of repeated interactions with that environment. This requires the agent to visit each state infinitely often, which is generally a time-consuming activity. This problem becomes an even more profound issue in highly volatile environments, which require


an agent to adapt by exploring the state space at regular intervals, thus helping to ensure the agent maintains an up-to-date model of the environment resulting in informed decision making. PBRS, often referred to as knowledge-based RL, is designed to provide the learner with an additional reward in order to encourage the agent to make more optimal decisions and advance it closer to the goal state during the learning process [131]. The idea is that a system designer often holds domain-specific knowledge which could be used to better guide the agent during exploration. As outlined earlier, PBRS requires the mapping of states to potentials in order to offer the agent an additional reward which conveys the designer's range of preferred states [100]. This potential function is defined as:

    F(s, a, s') = γΦ(s') − Φ(s)    (7.1)

The incorporation of PBRS requires a modification to the underlying MDP reward function. For the purposes of this work we use the Sarsa update rule as outlined below.

    Q(s_t, a_t) ← Q(s_t, a_t) + α[r_{t+1} + F(s, s') + γQ(s_{t+1}, a_{t+1}) − Q(s_t, a_t)]    (7.2)

In order to implement PBRS, the calculation used to compute the potentials and to ultimately shape the reward must be defined relative to the environment in which the agent is being deployed. In light of this, the values in the Q-value matrix from the previous results chapter were leveraged in order to help shape the reward by providing a greater perspective on the most rewarding VM allocation strategy. This allows the most optimal strategy to be communicated to the agent in the early stages of learning in order to expedite the learning and discovery process. The advanced PBRS algorithm will be compared against the standard Sarsa softmax procedure in order to measure the effects of PBRS in a more volatile cloud environment. More specifically, the algorithm will be run over a single iterative workload in order to accurately measure the effects of PBRS on the rate of convergence and also to identify any additional benefits of harnessing such an approach. Although previous experimental analysis showed Sarsa softmax converging on iteration No.2, it is important to highlight that the run time cycle for each iteration is often lengthy given the application workloads utilised throughout these experiments. As a result, the agent component gets called thousands of times during each cycle. Introducing this technique therefore provides a mechanism for potentially improving convergence rates. Hypothetically, it is expected that the implementation of PBRS with the proposed Sarsa softmax algorithm will result in the agent discovering the optimal policy faster while also having an overall positive impact across all four performance metrics.
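As a hedged illustration of how the shaping term of equations 7.1 and 7.2 could be folded into the Sarsa update, the sketch below assumes a per-state potential Φ derived offline (for example from a previously learned Q-value matrix, as described above); the class and field names are hypothetical.

    // Sketch of the shaped Sarsa update from equations 7.1 and 7.2.
    // The potential array is assumed to have been derived offline, e.g. from an
    // earlier Q-value matrix; names are illustrative only.
    public final class ShapedSarsa {

        private final double alpha = 0.1;
        private final double gamma = 0.9;
        private final double[] potential;   // Phi(s), one entry per discretised global state
        private final double[][] q;         // Q(s, a)

        ShapedSarsa(double[] potential, int numActions) {
            this.potential = potential;
            this.q = new double[potential.length][numActions];
        }

        // Equation 7.1: F(s, s') = gamma * Phi(s') - Phi(s)
        double shapingReward(int state, int nextState) {
            return gamma * potential[nextState] - potential[state];
        }

        // Equation 7.2: standard Sarsa update with the shaping term added to the reward.
        void update(int s, int a, double reward, int sNext, int aNext) {
            double shaped = reward + shapingReward(s, sNext);
            q[s][a] += alpha * (shaped + gamma * q[sNext][aNext] - q[s][a]);
        }
    }

Because the shaping term is defined as a difference of potentials, it rewards transitions towards higher-potential states without altering which policies are optimal in the underlying MDP.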


7.3 Results

This section presents the findings of this experiment, starting with an implementation of PBRS in a simplified problem before moving on to its implications in a more convoluted cloud environment.

7.3.1 PBRS Gridworld Example

In order to gain some insight into the behaviour of an agent utilising PBRS, the deterministic gridworld presented in Chapter 5 will be harnessed to provide an intuitive explanation. This will illustrate the effects of introducing such a technique in optimising the agent's search space while also highlighting its potential in more volatile cloud based environments. Figure 41 below conveys the effects of introducing PBRS in a much more simplified 5x5 gridworld environment. As illustrated, the standard agent required an average of approximately 15 moves (Std Dev 1.17) to reach the goal state in the first cycle of the learning process before converging on iteration No.5. In comparison, it can be observed that the PBRS agent reached the goal state in an average of approximately 8 moves (Std Dev 0.76) in the first iteration. More significantly, it converged to an optimal policy much faster, on iteration No.2. Once both agents converged there was no significant difference in performance, as both algorithms discovered the same optimal policy consisting of 6 moves.

Figure 41: Rate of convergence with PBRS in gridworld

In order to further convey the effects of introducing a more advanced technique such as PBRS, Figure 42 provides a perspective on the total amount of moves required by each agent in order to discover the goal state over 100 iterations. As shown, the PBRS agent outperforms the standard agent with a total average of 626.44 moves (Std Dev 0.28) in comparison to 641.82 moves (Std Dev 1.12) which resulted from the standard approach.


Figure 42: Total number of moves to goal with PBRS

To provide an even greater insight into the potential benefits of such a technique, Figure 43 below compares the behaviour of both agents over the first 5 iterations until both have successfully converged to an optimal solution. As illustrated, the PBRS agent on average requires fewer than 9 moves to discover the goal state on the first learning trial, while it took the standard agent 2 additional trials to penetrate below this barrier. Overall it can be observed that PBRS resulted in an agent making more optimal decisions early on in the learning process, resulting in a significant reduction in the number of moves required to reach the goal state. More specifically, it achieved a total reduction of 16 moves between iterations 1 and 4. In terms of the rate of convergence, PBRS converged much earlier on iteration 2 while the standard approach eventually converged to an optimal solution on iteration 5.

Figure 43: Comparison of total moves with PBRS over five learning trials


Having demonstrated the effects of introducing PBRS into a standard RL algorithm in gridworld, this technique is now incorporated into the Sarsa softmax algorithm which outperformed all other algorithm variations in the previous chapter. It is then compared to the standard version in order to expose the implications of leveraging a more advanced technique. The following section presents the results of this experiment across the range of selected performance metrics.

7.3.2 Energy

Figure 44 illustrates the performance of PBRS Sarsa softmax against the standard Sarsa softmax procedure. After a single iteration the standard algorithm produced 127.33 kWh of energy before successfully converging to an optimal policy of 126.86 kWh on the second iteration. The PBRS Sarsa algorithm recorded an initial energy rate of 126.79 kWh, which was deemed the optimum rate of energy in the previous chapter. Remarkably, the benefits of PBRS are further illustrated as the PBRS Sarsa algorithm continued to learn. This is indicated by the downward descent below the 126 kWh barrier before converging in a 125 kWh energy range with a deviation of a mere 0.1 kWh. A two-tailed t-test also confirmed that the results obtained were statistically significant with a p-value of