Classification: (not required for public dissertations)

Doctoral Dissertation

Job Scheduling Strategies and Their Implications in Location-Based Services on HPC Clouds

Author:

AFTAB AHMED CHANDIO

Supervisor:

Professor Dr. Cheng-Zhong Xu, Shenzhen Institutes of Advanced Technology

Degree:

Doctor of Engineering

Major:

Computer Application Technology

Institute:

Shenzhen Institutes of Advanced Technology

December 2015

Job Scheduling Strategies and Their Implications in Location-Based Services on HPC Clouds

By Aftab Ahmed Chandio

A dissertation submitted to the University of Chinese Academy of Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Application Technology)

Supervisor: Professor Dr. Cheng-Zhong Xu

Shenzhen Institutes of Advanced Technology

December 2015

Dedications

I would like to dedicate this dissertation to my whole family, especially my beloved mother and father, who dreamed of this success in my life, and to my elder brothers, Asghar Ali and Anwar Ali, who always supported and loved me.


Declaration

I, Aftab Ahmed Chandio, hereby declare that this thesis, titled "Job Scheduling Strategies and Their Implications in Location-Based Services on HPC Clouds", and the work presented in it are my own research, carried out under the guidance of my mentor, except as cited in the references. To the best of my knowledge, no portion of the work referred to in this dissertation has been submitted in support of an application for another degree or qualification at this or any other university or institution, except for the PhD degree of the University of Chinese Academy of Sciences, Beijing, China.

Author’s Signature:

Date:

Copyright Authorization

I authorize the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, to preserve this dissertation, to send electronic copies of it to the relevant institutions, and to allow the material to be searched, photocopied, or scanned from public databases.

Author’s signature:

Supervisor’s signature:


Date:





I declare that this dissertation presents the research work I carried out, and the results I obtained, under the guidance of my supervisor. To the best of my knowledge, except where specifically noted and acknowledged in the text, this dissertation contains no research results previously published or written by others. Any contribution made to this research by colleagues who worked with me has been clearly acknowledged in the dissertation. Author's signature:

Date:

Copyright Authorization

I authorize the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, to retain this dissertation and to submit copies and electronic versions of it to the relevant national departments or institutions; to allow the dissertation to be consulted and borrowed; to compile all or part of its contents into relevant databases for retrieval; and to preserve and compile it by photocopying, reduced-scale printing, scanning, or other means of reproduction. (For classified dissertations, this authorization applies after declassification.) Author's signature:

Supervisor's signature:

Date:


Acknowledgments

First of all, I thank almighty ALLAH for inspiring me. His help and mercy have enabled me to complete my PhD studies and to spend an unforgettable time in China. I am grateful to my mentor, Professor Dr. Cheng-Zhong Xu, for his motivation, enthusiasm, guidance, immense knowledge, and inspiration during this work. Without his support this dissertation would not have been possible.

Secondly, I am indebted to my whole family, especially my wife Qamar-Ul-Nisa, for their love, patience, and support throughout my educational career.

I am thankful to Dr. Samee U. Khan, an associate professor at North Dakota State University, Fargo, USA, and a visiting professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, for his valuable guidance on the first part of my dissertation work (i.e., job scheduling strategies). I am also thankful to all of the members of the Center for Cloud Computing Research at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. In particular, I am indebted to the following people for their insightful comments, helpful discussions, and proofreading of this whole dissertation: Dr. Nikos Tziritas (assistant professor), Dr. Fan Zhang (associate professor), and Dr. Zhibin Yu (associate professor).

I am also grateful to the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China (i.e., the source of my PhD funding), which awarded me a full scholarship to pursue my PhD studies here.

Finally, I would like to thank my former teachers, Professor Dr. M. Saleem Chandio, Professor Dr. Riaz Ahmed Memon, Associate Professor Dr. Imtiaz Ali Korejo, Associate Professor Dr. Hyder Ali Nizamani, Lecturer Ms. Moomal Pandhiyani, and all my teachers at the Institute of Mathematics and Computer Science, University of Sindh, Jamshoro, Pakistan, for their support and inspiration. I am also thankful to my well-wishers, especially Professor Dr. Rafiq Ahmed Chandio, Shahnawaz Shaikh, and Shafiq Ahmed Chandio.


Abstract (in Chinese)

Recently, a growing number of organizations worldwide have rapidly adopted the cloud paradigm to provide on-demand access, storage, and big-data processing. Location-based service (LBS) applications have given significant impetus to the adoption of cloud services. Owing to the real-time nature of vehicle GPS data and the infrequent, noisy readings of the global positioning system, the volume of GPS data grows exponentially (i.e., it takes the form of big-data), and quality-of-service (QoS) requirements (such as high accuracy and fast response) usually must be met in real-time processing. The map-matching service therefore plays a pivotal role in many LBS applications. Unfortunately, current map-matching strategies cannot satisfy the demands of real-time processing well, because they do not employ adaptive strategies or parallel computation. In addition, in parallel computing environments, resource allocation and energy-saving techniques are required to guarantee quality and to reduce operating costs, since energy consumption is a dominant part of the expenditure of both users and operators. However, when energy consumption is taken into account in a parallel computing environment, resource allocation becomes more difficult, and quality guarantees (e.g., queue time and response time) may not be achieved.

To address the aforementioned problems, this dissertation focuses on two parts: (a) the design of a cloud-based map-matching service (a fundamental location-based service) for the trajectories of GPS-equipped vehicles, and (b) a comparative study of resource allocation strategies for energy-efficient high-performance cloud computing. Specifically, in (a) we introduce a novel, adaptive, cloud-based map-matching strategy for real-time GPS trajectories, which addresses a key challenge, the shortest path query, through pre-computation and by adaptively fine-tuning the interior and exterior parameters during the matching process. In the proposed map-matching strategy, unlike traditional approaches, the shortest path distances and temporal constraints are pre-computed following the bulk synchronous parallel paradigm in a cloud environment. On the other hand, in (b), map-matching bulk GPS trajectories requires a high-performance parallel computing environment and raises a resource allocation problem. We therefore provide a comparative study of job scheduling strategies for high-performance cloud computing, to help select an appropriate job scheduling strategy in a given scenario.

Keywords: map-matching, job scheduling, cloud computing, shortest path query, transportation.


Job Scheduling Strategies and Their Implications in Location-Based Services on HPC Clouds

Aftab Ahmed Chandio (Computer Application Technology)

Directed by Professor Dr. Cheng-Zhong Xu

Abstract

Recently, more and more organizations have rapidly adopted the cloud paradigm to provide on-demand access, storage, and large-scale data processing over the Internet anywhere in the world. The applications of location-based services (LBSs) have also given significant impetus to cloud adoption. Because of the infrequent and noisy global positioning system (GPS) data of vehicles and its real-time nature, the amount of GPS data grows exponentially (i.e., it forms big-data), and the quality of service (QoS) of an LBS (such as high accuracy and fast response time) is often degraded in real-time processing. The map-matching service therefore plays a pivotal role in ascertaining the quality of many LBS applications. Unfortunately, current map-matching strategies are not apropos for LBSs needing real-time processing, because they do not apply adaptive strategies and parallel processing. On the other hand, in parallel computing environments, resource allocation and energy-efficient techniques are required to deliver the QoS and to reduce the operational cost of the system, because the cost of energy consumption is a dominant part of the owner's and user's budget. However, when energy efficiency is considered in a parallel computing environment, resource management becomes more difficult, and the QoS (i.e., queue time and response time) may be violated.

Towards the aforementioned issues, the contribution of this dissertation is two-fold: (a) the design of a cloud-based map-matching service (a fundamental LBS) for the trajectories of GPS-embedded vehicles, and (b) a comparative study on resource management (a set of job scheduling strategies) for energy efficiency in the high performance computing (HPC) cloud. Specifically, (a) we introduce a novel, fully adaptive, cloud-based map-matching strategy for real-time GPS trajectories that addresses a key challenge, the shortest path queries (SPQs), the most expensive part of a map-matching process, through pre-computation and by adaptively fine-tuning the interior and exterior parameters. In the proposed map-matching strategy, unlike traditional approaches, the shortest path distances and temporal constraints are pre-computed following the bulk synchronous parallel (BSP) paradigm in the cloud environment. On the other hand, (b) matching a bulk of GPS trajectories requires an HPC environment and leads to a resource management problem. We therefore provide a comparative study on job scheduling strategies of the HPC cloud and analyse their implications in LBSs, which helps to choose an appropriate job scheduling policy in a given scenario.

Keywords: map-matching, job scheduling, cloud computing, shortest path query, transportation.

Table of Contents

Title Page ............................................................... II
Dedication ............................................................... III
Declaration .............................................................. IV
Copyright Authorization .................................................. IV
Acknowledgments .......................................................... VI
Abstract ................................................................. VIII
Table of Contents ........................................................ IX
List of Figures .......................................................... XIII
List of Tables ........................................................... XVI
Chapter 1. Introduction .................................................. 1
  1.1. Overview and Motivation ........................................... 1
    1.1.1. Location-Based Services ....................................... 2
    1.1.2. Job Scheduling ................................................ 3
  1.2. The Problem Statement ............................................. 4
    1.2.1. Map-Matching Problem .......................................... 4
    1.2.2. Job Scheduling Problem ........................................ 5
  1.3. Research Contributions ............................................ 6
  1.4. Research Methodology .............................................. 6
    1.4.1. Data Collection for Map-Matching .............................. 6
      1.4.1.1. Digital Road Network Map .................................. 7
      1.4.1.2. Real-World Trajectory Dataset ............................. 7
      1.4.1.3. Synthetic Trajectory Dataset .............................. 7
    1.4.2. Simulation Environments for Map-Matching ...................... 7
    1.4.3. HPC Workload for Job Scheduling ............................... 8
    1.4.4. Simulation Setup for Job Scheduling ........................... 8
  1.5. Dissertation Outline .............................................. 8
  1.6. Summary ........................................................... 9
Chapter 2. Cloud Computing and Big-Data: Trends and Issues ............... 10
  2.1. Introduction ...................................................... 10
  2.2. Cloud Computing and Data-Processing Platforms ..................... 12
    2.2.1. Deployment Models ............................................. 13
    2.2.2. Key Aspects of Cloud Architecture ............................. 14
      2.2.2.1. Service Orientation ....................................... 14
      2.2.2.2. Virtualization ............................................ 14
      2.2.2.3. Parallel Computing ........................................ 14
        2.2.2.3.1. MapReduce (MR) ........................................ 14
        2.2.2.3.2. Bulk Synchronous Parallel (BSP) ....................... 14
        2.2.2.3.3. Directed Acyclic Graph (DAG) .......................... 14
  2.3. Big-Data .......................................................... 16
    2.3.1. Characteristics ............................................... 16
      2.3.1.1. Volume .................................................... 16
      2.3.1.2. Velocity .................................................. 16
      2.3.1.3. Variety ................................................... 17
      2.3.1.4. Veracity .................................................. 17
    2.3.2. Big-Data Lifecycle ............................................ 18
      2.3.2.1. Big-Data Generation ....................................... 18
      2.3.2.2. Big-Data Acquisition ...................................... 18
      2.3.2.3. Big-Data Storage .......................................... 18
      2.3.2.4. Big-Data Production ....................................... 19
  2.4. Big-Data Processing ............................................... 19
    2.4.1. Analytic Approaches ........................................... 20
      2.4.1.1. Mathematical Approaches ................................... 20
      2.4.1.2. Data-Mining Approaches .................................... 20
    2.4.2. Cloud-Based Big-Data Processing Techniques .................... 21
      2.4.2.1. Batch Processing .......................................... 21
      2.4.2.2. Stream Processing ......................................... 22
      2.4.2.3. Graph Processing .......................................... 24
      2.4.2.4. DAG Processing ............................................ 25
      2.4.2.5. Interactive Processing .................................... 25
      2.4.2.6. Visualization ............................................. 25
  2.5. Big-Data Research Directions and Challenges ....................... 26
    2.5.1. Map-Matching .................................................. 26
    2.5.2. Visualizing Transportation Data ............................... 27
  2.6. Summary ........................................................... 27
Chapter 3. Job Scheduling Strategies for HPC ............................. 29
  3.1. Introduction ...................................................... 29
  3.2. Related Work ...................................................... 32
  3.3. Job Scheduling Strategy ........................................... 33
  3.4. Energy Efficient Strategy ......................................... 34
    3.4.1. DPM ........................................................... 35
    3.4.2. DVFS .......................................................... 35
  3.5. Workload Characterization and Analysis ............................ 36
    3.5.1. Dataset Information ........................................... 36
    3.5.2. Job Characteristics and Classification ........................ 37
  3.6. Resource Allocation Job Scheduling Policies ....................... 40
    3.6.1. Experimental Setup ............................................ 41
    3.6.2. Simulation Results ............................................ 41
    3.6.3. Discussions ................................................... 51
  3.7. Energy Efficient Job Scheduling Policies .......................... 53
    3.7.1. Energy Model .................................................. 53
    3.7.2. Energy Efficient Criteria ..................................... 54
    3.7.3. Problem Statement ............................................. 55
    3.7.4. Proposed Energy Efficient Policies ............................ 55
    3.7.5. Results of the Proposed Energy Efficient Policies ............. 56
  3.8. Summary ........................................................... 59
Chapter 4. Job Scheduling Strategies for HPC Cloud ....................... 61
  4.1. Introduction ...................................................... 61
  4.2. Related Work ...................................................... 63
    4.2.1. Energy Efficient Techniques ................................... 64
      4.2.1.1. DPM ....................................................... 65
      4.2.1.2. DVFS ...................................................... 65
  4.3. System Model ...................................................... 65
    4.3.1. Data Center Model ............................................. 66
    4.3.2. VM Resource Model ............................................. 67
    4.3.3. VM Workload Criteria .......................................... 68
    4.3.4. Energy Model .................................................. 69
    4.3.5. Problem Statement ............................................. 70
  4.4. VM Scheduling Strategies .......................................... 71
    4.4.1. List Heuristics ............................................... 71
    4.4.2. Energy Efficient VM Scheduling Strategies ..................... 72
  4.5. Experimental Setup ................................................ 75
    4.5.1. Computation Environment ....................................... 75
    4.5.2. HPC Workloads ................................................. 75
    4.5.3. Performance Metrics ........................................... 77
    4.5.4. Results and Discussions ....................................... 77
  4.6. Summary ........................................................... 81
Chapter 5. Adaptive and Tuneable Map-Matching Strategy ................... 83
  5.1. Introduction ...................................................... 83
  5.2. Map-Matching Strategies ........................................... 86
    5.2.1. Spatial and Temporal Matching Techniques ...................... 87
  5.3. Definitions ....................................................... 89
  5.4. Fully Adaptive Map-Matching Strategy .............................. 90
    5.4.1. Architecture .................................................. 90
    5.4.2. Modified SSSP Algorithm Following BSP Parallel Paradigm ....... 91
    5.4.3. Map-Matching Strategy for Real-Time GPS Trajectories .......... 94
  5.5. Experimental Settings ............................................. 98
    5.5.1. Computational Environment ..................................... 98
    5.5.2. Datasets ...................................................... 98
      5.5.2.1. Road Network Dataset ...................................... 98
      5.5.2.2. Real-World Trajectories Dataset ........................... 99
      5.5.2.3. Synthetic Trajectories Dataset ............................ 99
  5.6. Experimental Results and Discussions .............................. 99
    5.6.1. RT-Matching vs ST-Matching .................................... 100
    5.6.2. Impact of Interior and Exterior Settings ...................... 101
      5.6.2.1. Static Parameters Analysis ................................ 101
      5.6.2.2. Adaptive Parameters Analysis .............................. 102
      5.6.2.3. Comparison ................................................ 105
  5.7. Summary ........................................................... 106
Chapter 6. Conclusions ................................................... 107
  6.1. Problem Area ...................................................... 107
  6.2. Contributions ..................................................... 108
  6.3. Limitations and Future Direction .................................. 109
References ............................................................... 117
Appendix A: List of published/accepted and submitted articles ........... 110
Appendix B: Pseudocodes .................................................. 112
Author Biography ......................................................... 116

List of Figures

Figure 1 Illustration of GPS trajectory for map-matching on the digital road system .... 2
Figure 2 Illustration of job scheduling process .... 3
Figure 3 Digital road map of Shenzhen city in China .... 7
Figure 4 The survey report on the applications driving big-data needs (multiple responses, n = 95) [36] .... 11
Figure 5 Classification of cloud computing .... 13
Figure 6 MapReduce framework .... 15
Figure 7 Bulk synchronous parallel paradigm .... 15
Figure 8 Classification of big-data .... 16
Figure 9 Big-data framework .... 19
Figure 10 Big-data computing models .... 21
Figure 11 Total offered load in a month per day .... 37
Figure 12 Jobs arriving per hour .... 37
Figure 13 Jobs breakdown according to number of CPUs .... 38
Figure 14 Breakdown distributions of the Short jobs .... 39
Figure 15 Breakdown distributions of the Long jobs .... 39
Figure 16 Mean queue time .... 42
Figure 17 Mean response time .... 43
Figure 18 Summary for queue time .... 43
Figure 19 Summary for response time .... 44
Figure 20 Mean slowdown .... 44
Figure 21 Slowdown ratio .... 45
Figure 22 Mean queue time (ob1) .... 45
Figure 23 Mean response time (ob1) .... 46
Figure 24 Mean slowdown (ob1) .... 46
Figure 25 Mean queue time (ob2) .... 47
Figure 26 Mean response time (ob2) .... 47
Figure 27 Mean slowdown (ob2) .... 48
Figure 28 Mean queue time (ob3) .... 48
Figure 29 Mean response time (ob3) .... 49
Figure 30 Mean slowdown (ob3) .... 49
Figure 31 Mean queue time (ob4) .... 50
Figure 32 Mean response time (ob4) .... 50
Figure 33 Mean slowdown (ob4) .... 51
Figure 34 Energy sum (mJ) .... 57
Figure 35 Runtime sum .... 57
Figure 36 Energy sum (mJ) of Narrow jobs .... 58
Figure 37 Runtime sum of Narrow jobs .... 58
Figure 38 Energy sum (mJ) of Wide jobs .... 58
Figure 39 Runtime sum of Wide jobs .... 59
Figure 40 A Gantt chart for a user's VM request scheduled on cores (CPUs) .... 68
Figure 41 Total percentage of busy CPUs in the CCR Buffalo data center .... 76
Figure 42 Total offered load in a month per day in the CCR Buffalo data center .... 76
Figure 43 Energy efficiency of energy efficient VM scheduling strategies under the different power management governor schemes .... 78
Figure 44 Total energy consumption of energy efficient VM scheduling strategies under the different power management governor schemes .... 79
Figure 45 Overall running time of energy efficient VM scheduling strategies under the different power management governor schemes .... 79
Figure 46 Makespan of energy efficient VM scheduling strategies under the different power management governor schemes .... 80
Figure 47 Mean queue time of energy efficient VM scheduling strategies under the different power management governor schemes .... 80
Figure 48 Mean slowdown of energy efficient VM scheduling strategies under the different power management governor schemes .... 81
Figure 49 Slowdown ratio of energy efficient VM scheduling strategies under the different power management governor schemes .... 81
Figure 50 Basic steps in map-matching .... 86
Figure 51 Simulation results of the ST-Matching w.r.t. a different number of GPS points in real-world trajectories (total number of SPQs) .... 88
Figure 52 Simulation results of the ST-Matching w.r.t. a different number of GPS points in real-world trajectories (total number of LessFound CP) .... 88
Figure 53 The interior setting for considering candidate points for a sampling point .... 88
Figure 54 An overview of the proposed system .... 91
Figure 55 An example for SSSP computation: a network graph .... 92
Figure 56 An example for SSSP computation: SSSP results of all nodes in the graph .... 92
Figure 57 An example of the proposed window-based scheme of map-matching for a real-time trajectory (W=5) .... 95
Figure 58 Running time w.r.t. a different number of GPS points in trajectories .... 100
Figure 59 Accuracy w.r.t. a different sampling interval (synthetic dataset) .... 101
Figure 60 Running time w.r.t. a different number of GPS points in trajectories .... 101
Figure 61 Overall running time w.r.t. a different number of candidate points of the full-day trajectory .... 102
Figure 62 Average running time of the sliding window of full-day trajectories w.r.t. a different number of circle radius and candidate points .... 102
Figure 63 Evaluation of locality of road network: CMP accuracy w.r.t. a different GPS sampling-rate .... 103
Figure 64 Evaluation of locality of road network: distribution of GPS points in each class .... 103
Figure 65 Evaluation of locality of road network: performance metrics, i.e., CPs and SPQs .... 103
Figure 66 Average running time of the sliding window: running time of each sliding window in a plot .... 105
Figure 67 Running time of each sliding window in a Box-and-Whiskers plot .... 105

List of Tables

Table 1 Peak power of the components of a typical server .... 30
Table 2 Full specification of data center .... 36
Table 3 Breakdown distribution for job length .... 38
Table 4 Percentage breakdown for correlation between jobs width and length .... 39
Table 5 Observation table for entire workloads (job breakdowns for correlation between job width and job length) .... 42
Table 6 Performance of job scheduling policies .... 52
Table 7 CPU classes .... 54
Table 8 DVFS module .... 54
Table 9 Peak power of the components of a typical server .... 62
Table 10 Nomenclature .... 66
Table 11 PM classes .... 67
Table 12 DVFS models .... 70
Table 13 Percentage breakdowns for correlation between jobs width and length .... 77
Table 14 Parameters based on locality information .... 95
Table 15 Percentage of total GPS sample data .... 103
Table 16 Distributions of GPS sample data in each class for map-matching the full-day real-world trajectory .... 104
Table 17 Statistics of sliding window .... 104
Table 18 Comparison ratios of adaptive strategy RT-MM-II against static strategy RT-MM-I in terms of the different metrics for map-matching the full-day real-world trajectory .... 105

Chapter 1

Introduction

In the first chapter of this dissertation, we present an overview of the research problems identified in this dissertation and their motivation. This chapter also summarizes our major contributions and outlines the organization of the dissertation.

1.1. Overview and Motivation

In recent years, most organizations have been gradually moving their applications to the cloud environment, because the cloud enables on-demand access, storage services, and large-scale data processing over the Internet anywhere in the world. The rapid growth of information and communication technologies (ICT) in developed and developing countries has led to a new emerging concept called urban computing, wherein sensors, vehicles, devices, buildings, people, and roads are used as components to probe city dynamics. Much of this data is available in the form of global positioning system (GPS) records, whose volume ranges from a few dozen terabytes to petabytes (i.e., GPS big-data). Information from GPS data has been used in many location-based service (LBS) and intelligent transportation system (ITS) applications (e.g., traffic flow analysis [1], the route planner [2], geographical social networks [3], and the hot route finder [4]) in order to achieve better quality of service (QoS). Because LBS and ITS applications need to process a massive amount of data in an effective way, they have also given significant impetus to the adoption of the cloud (e.g., agent-based urban transportation systems [5], urban intelligence transportation [6], a cloud-enabled intensive FCD computation framework [7], and traffic flow forecasting [8]) and of the parallel computing paradigm (e.g., the CyberGIS framework [9] and the TeraGrid GIScience Gateway [10, 11]).

On the other hand, in parallel computing environments (such as supercomputing, cluster, grid, or cloud), resource allocation and energy-efficient techniques are required to deliver the QoS and to reduce the operational cost of the system, because the cost of energy consumption is a dominant part of the owner's and user's budget. However, when many objectives are considered, i.e., non-violation of the QoS, fulfilling the service level agreement (SLA) requirements, and achieving energy efficiency, resource allocation techniques become harder to define. In other words, the resource allocation (i.e., job scheduling) problem is NP-hard.

From the above motivation, this dissertation works on two major research problems: the design of a cloud-based map-matching service (a fundamental LBS) for GPS trajectories, and a comparative study on resource allocation techniques (a set of job scheduling strategies) for energy efficiency and QoS delivery, analysing their implications in LBSs.

1.1.1. Location-Based Services

A location-based service (LBS) is a service that requires knowledge of a vehicle's position (latitude and longitude coordinates, known as GPS data) continuously during its operation. Most LBSs (such as advanced traffic management systems, advanced public transport and operations, commercial vehicle operations, emergency management systems, and advanced vehicle safety systems) require navigation and positioning capabilities to provide the physical location of a vehicle travelling on a road network. Specifically, through these capabilities, LBSs assist in keeping track of vehicles and finding desired paths to destinations.

Map-matching is a key procedure of LBSs that provides navigation and positioning capabilities, and it plays a pivotal role in ascertaining the QoS of LBSs (such as driving directions, road guidance, moving object management, and traffic flow analysis). Basically, map-matching is a technique to accurately align the observed GPS positions onto a road network that is given in the form of a digital map [12]. Figure 1 illustrates (a) GPS log data and a GPS-embedded vehicle trajectory and (b) road segments of the digital road system (Figure 1 (a) and (b) are taken from [12]). In recent years, due to its sensitive nature, a large number of map-matching techniques have been proposed, and map-matching remains a top priority of LBS application developers. Quddus et al. [13] presented a survey of map-matching algorithms and identified their performance and overheads.

Figure 1 Illustration of GPS trajectory for map-matching on the digital road system: (a) GPS log data and a GPS-embedded vehicle trajectory; (b) road segments of the digital road system (p: GPS sampling point; e: road segment; e.start: start of road segment e; e.end: end of road segment e).

1.1.2. Job Scheduling

The job scheduler is a major component of resource management in any large-scale parallel computing environment, such as supercomputing, grid computing, cluster computing, and cloud computing. A job scheduling strategy in a job scheduler is used to assign jobs to resources at specific time intervals such that the capacity of the resources meets the jobs' needs [14]. Specifically, a job is a user request queued in the job scheduler system (e.g., a user sends a request for map-matching a single GPS trajectory or a bulk of trajectories). Suppose m denotes the total number of machines (i.e., computing resources) M_i (i = 1, ..., m) available to process n jobs J_j (j = 1, ..., n). Figure 2 (a) illustrates the job scheduling process. A job J_j is a program submitted by a user at a specific time (its submit time). Each job contains one or more tasks, J_j = {T_j1, ..., T_jo}, with each task being executed on a separate CPU for a given time period, as shown in the Gantt chart of Figure 2 (b). A complete scheduling process schedules each job and allocates one or more time intervals on one or more machines. The corresponding scheduling problem is to find an optimal schedule subject to various constraints, such as (a) minimizing the queue time, response time, makespan, and energy consumption, and (b) maximizing the overall system utilization.

Figure 2 Illustration of job scheduling process: (a) the job scheduling process; (b) a Gantt chart of jobs J1-J5 scheduled on machines M1-M5 over time T.
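To make the above model concrete, the following minimal sketch (a hypothetical illustration written for this text, not the scheduler studied in this dissertation) assigns each job, in submit-time order, to the machine that becomes free earliest, and reports each job's queue time and the resulting makespan:

import java.util.Comparator;
import java.util.PriorityQueue;

// Minimal first-come-first-served list scheduler: each job is placed on the
// machine that becomes available earliest (jobs are given in submit-time order).
public class GreedyScheduler {
    public static void main(String[] args) {
        // Each job: {submitTime, runTime}; the values are arbitrary, for illustration.
        long[][] jobs = {{0, 30}, {5, 10}, {5, 50}, {12, 20}};
        int m = 2; // number of machines M_1..M_m

        // Each machine: {machineIndex, timeWhenFree}, ordered by earliest-free.
        PriorityQueue<long[]> free =
                new PriorityQueue<long[]>(Comparator.comparingLong((long[] a) -> a[1]));
        for (int i = 0; i < m; i++) free.add(new long[]{i, 0});

        long makespan = 0;
        for (int j = 0; j < jobs.length; j++) {
            long[] machine = free.poll();                  // earliest-available machine
            long start = Math.max(machine[1], jobs[j][0]); // wait for machine or submit time
            long end = start + jobs[j][1];
            System.out.printf("J%d -> M%d, queue time = %d%n",
                    j + 1, machine[0] + 1, start - jobs[j][0]);
            machine[1] = end;                              // machine is busy until 'end'
            free.add(machine);
            makespan = Math.max(makespan, end);
        }
        System.out.println("makespan = " + makespan);
    }
}

Optimal scheduling, unlike this greedy pass, must weigh the constraints (a) and (b) above simultaneously, which is what makes the problem NP-hard.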

1.2. The Problem Statement

In this study, we consider two research problems: (i) the map-matching problem for GPS-embedded vehicle trajectories, and (ii) the resource management (job scheduling) problem for high performance computing (HPC) workloads, such as matching a bulk of GPS trajectories.

1.2.1. Map-Matching Problem

An intensive review of the literature [12, 13, 15-17] shows that the best map-matching performance in terms of accuracy is achieved by strategies based on the transition probability, which incorporates the shortest path between two consecutive candidate points. On the other hand, within the map-matching process in an LBS, the execution of the shortest path queries (SPQs) is the most expensive part in terms of running time [12].

Inaccurate Trajectory Data: Furthermore, map-matching becomes a critical step in an LBS when it needs to process infrequently sampled and imprecise GPS location data, which is typically affected by two problems [18]: 1) Due to the limitations of positioning devices, moving objects mostly generate noisy and imprecise GPS location data; this is called the measurement error, and it leads to uncertainty about the original locations of the object. 2) The second problem is the sampling error: moving objects update their locations at discrete time intervals. A low sampling rate with long intervals between updates reduces energy consumption and communication bandwidth at the expense of increasing the uncertainty about the actual location; conversely, a high sampling rate greatly increases the amount of extraneous data.

Fixed Values of Map-Matching Parameters: Noisy GPS data can be handled by increasing the number of candidate points (CP) and the error circle radius (ECR) in the map-matching process, so that more likely road segments are considered. This involves a trade-off: when the ECR is too small, the map-matching process selects no road segments, while when the ECR is too large, the map-matching process considers many road segments and may become computation-intensive. Next, GPS data collected over an extended sampling period typically contains a large amount of extraneous data (especially when a vehicle (a) stops many times, (b) moves slowly, (c) is trapped in a traffic jam, (d) waits for a signal, or (e) moves on a highway link), which must be eliminated before the map-matching process. Moreover, a large gap between two consecutive GPS points (a low-sampling-rate GPS trajectory) makes it difficult for the map-matching process to judge which road segment the driver chose between these two sampling points when multiple roads are near the trajectory, so accuracy is degraded.

Extra Number of SPQs: Both GPS location errors (i.e., measurement and sampling errors) incur an extra number of SPQs in a map-matching process that uses a fixed number of CP and a fixed ECR, which extensively increases the overall running time of the process. Existing map-matching strategies in LBSs do not take these facts into account. The approaches [19-24] in the literature deal with the execution of SPQs by pre-computing the shortest path distances and by partitioning a large network graph into small regions such that the required partition fits in memory [23]. Because the SPQs are executed sequentially, the current state-of-the-art approaches suffer from high pre-computation and storage costs [24]. Due to the aforementioned facts, the map-matching techniques in the current state of the art unfortunately degrade the QoS (i.e., high accuracy and fast running time) and are not apropos for LBS applications needing real-time processing; real-time traffic information provides vital inputs for dynamic traffic control and management systems [16, 25]. This is because they do not apply adaptive strategies and parallel processing. Therefore, it is of paramount importance for real-time map-matching to handle the execution of the SPQs in an effective and flexible way.
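As an illustration of the CP/ECR trade-off described above, the following sketch (hypothetical; the class names and the planar distance function are ours, not the dissertation's implementation) selects, for one GPS sample, the road segments whose distance to the sample falls within the error circle radius, capped at a maximum number of candidate points:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical candidate selection for one GPS sample: keep the road
// segments within the error circle radius (ECR), at most maxCp of them.
public class CandidateSelector {
    static class Segment {
        final int id; final double x1, y1, x2, y2;
        Segment(int id, double x1, double y1, double x2, double y2) {
            this.id = id; this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }
        // Euclidean distance from point (px, py) to this segment.
        double distanceTo(double px, double py) {
            double dx = x2 - x1, dy = y2 - y1;
            double len2 = dx * dx + dy * dy;
            double t = len2 == 0 ? 0
                     : Math.max(0, Math.min(1, ((px - x1) * dx + (py - y1) * dy) / len2));
            double cx = x1 + t * dx, cy = y1 + t * dy; // closest point on the segment
            return Math.hypot(px - cx, py - cy);
        }
    }

    static List<Segment> candidates(List<Segment> roads, double px, double py,
                                    double ecr, int maxCp) {
        List<Segment> within = new ArrayList<Segment>();
        for (Segment s : roads)
            if (s.distanceTo(px, py) <= ecr) within.add(s);   // inside the error circle
        within.sort(Comparator.comparingDouble(s -> s.distanceTo(px, py)));
        return within.subList(0, Math.min(maxCp, within.size())); // nearest maxCp
    }
}

With a fixed ecr and maxCp, every noisy sample pays the full cost; the adaptive strategy of Chapter 5 instead tunes both values per sampling point.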

1.2.2. Job Scheduling Problem

Map-Matching a Bulk of GPS Trajectories: In today's ICT era, GPS data is voluminous and is collected in real time. For example, there are 14,000 taxis in Shenzhen; with a 30-second sampling rate, these taxis generate about 40 million GPS records per day (14,000 taxis x 86,400/30 samples each). Map-matching such a bulk of GPS trajectories thus becomes a form of high performance computing (HPC) workload.

Intensive Workloads: Basically, HPC is the use of parallel computing for running advanced application programs that handle HPC workloads efficiently, reliably, and quickly. Such application programs are usually complex jobs that require a large amount of computing power and/or data; they may be computation-intensive (i.e., the job demands more CPU time), data-intensive (i.e., the job demands more storage space and communication cost), or mixed (i.e., both computation-intensive and data-intensive). Furthermore, resource allocation for such workloads while incorporating energy efficiency is a serious concern for a system owner or resource provider (RP).

Selection of Resource Scheduling Strategy: Basically, an RP always aims to efficiently utilize its finite resources for a vast number of users and to maintain different QoS levels. A vast body of research has introduced resource scheduling strategies to address the problem of system utilization under different QoS constraints. However, resource scheduling strategies are designed according to the intensity of the workload or to different goals. Therefore, comparing and analysing different scheduling strategies for HPC workloads is a difficult task when choosing the best job scheduling strategy for a given scenario, e.g., LBS applications.


1.3. Research Contributions

Towards the aforementioned research problems, the contribution of this dissertation is two-fold: (i) the design of a cloud-based map-matching strategy and (ii) the study of a set of job scheduling strategies for energy efficiency in the cloud environment. We introduce cloud-based RT-MapMatching, a fully adaptive map-matching strategy for real-time GPS trajectories. More specifically, our contributions are:

(a) We present a systematic model of a map-matching strategy for real-time GPS trajectories.

(b) We propose a window-based strategy that periodically monitors the flow of real-time GPS workloads and the parameters of the sliding windows.

(c) We introduce a tuning-based strategy that adaptively adjusts the sampling rate of GPS data and fine-tunes the interior and exterior parameters of the map-matching process in real time in order to reduce the possible number of SPQs. The sampling-rate adjustment yields highly accurate GPS data after eliminating extraneous data, while the parameter fine-tuning selects an apt number of candidate points and an apt error circle radius for each GPS sampling point in a trajectory. Together, the window-based and tuning-based strategies provide a viable solution to the issues posed by the SPQs and handle them efficiently.

(d) Moreover, to compute the shortest path distances and the temporal/speed constraint, we propose an extension of the single source shortest path (SSSP) function following the bulk synchronous parallel (BSP) paradigm in the cloud environment, which reduces pre-computation time and storage cost.

(e) The above approach is empirically evaluated using real-world and synthetic datasets.

(f) Furthermore, we provide a comparative study on a set of job scheduling strategies for HPC workloads for resource management. Finally, we incorporate traditional job scheduling strategies with the dynamic voltage and frequency scaling (DVFS) power management technique. The studied job scheduling strategies consist of user-oriented and system-oriented policies. The strategies are extensively simulated and compared under three power management governing methods provided at the hypervisor level (i.e., userspec, ondemand, and performance).

1.4. Research Methodology

In this section, we describe the methodology used for the map-matching and job scheduling research. The simulated datasets and the simulation environments for both research problems are also explained.

1.4.1. Data Collection for Map-Matching

In our map-matching experiments, we used three different datasets: a road network dataset, a real-world trajectory dataset, and a synthetic trajectory dataset.

1.4.1.1. Digital Road Network Map

We use the real-world road network map of Shenzhen city in China. The road network graph contains a total of 86,335 vertices and 133,397 road segments. Figure 3 shows a snapshot of the digital road map of Shenzhen [26].

Figure 3 Digital road map of Shenzhen city in China.

1.4.1.2. Real-World GPS Trajectory Dataset

For testing the proposed map-matching strategy, the real-world trajectory dataset used in our simulation was collected on 10 October 2013 from a GPS-embedded taxi that travelled for a full day around Shenzhen city. About 5,128 GPS points were captured over the 24 hours. We also selected a few short trip-based trajectories from the aforementioned full-day trajectory, which vary in the number of GPS sampling points and the sampling rate.

1.4.1.3. Synthetic GPS Trajectory Dataset

We also used a synthetic trajectory dataset, randomly generated by our own simulator. The simulator first creates a shortest path between two random vertices in the road network and considers the edges returned by the shortest path as the ground truth. Sampling points are then generated on the respective edges according to the sampling rate. To encapsulate GPS sampling error, each GPS sampling point is perturbed with normally distributed noise with zero mean and a 20-meter standard deviation. The method for creating the synthetic trajectory dataset has been verified in [12].
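A minimal sketch of this noise-injection step follows (the class and method names are hypothetical; only the zero-mean, 20 m standard deviation Gaussian perturbation is taken from the text, and a flat-earth meters-to-degrees conversion is assumed for brevity):

import java.util.Random;

// Perturb a ground-truth point with zero-mean Gaussian noise (sigma = 20 m),
// approximating the GPS error model used for the synthetic trajectories.
public class GpsNoise {
    private static final double SIGMA_METERS = 20.0;
    private static final double METERS_PER_DEG_LAT = 111_320.0; // rough flat-earth scale
    private final Random rng = new Random();

    // Returns {latitude, longitude} of the noisy sample.
    public double[] perturb(double lat, double lon) {
        double dNorth = rng.nextGaussian() * SIGMA_METERS; // meters
        double dEast  = rng.nextGaussian() * SIGMA_METERS; // meters
        double newLat = lat + dNorth / METERS_PER_DEG_LAT;
        double newLon = lon + dEast / (METERS_PER_DEG_LAT * Math.cos(Math.toRadians(lat)));
        return new double[]{newLat, newLon};
    }
}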

1.4.2. Simulation Environments for Map-Matching

For our overall experiments, we established two computation environments. (a) Because the pre-processing step uses the parallel paradigm discussed in Section 5.4.1, we created a setup on a Hadoop cloud environment. Each of the three physical machines, one master and two workers, comprises eight Intel(R) Xeon(R) 2.20 GHz CPUs and 32 GB of memory. We used the Hama (version 0.6.2) parallel distributed environment on top of Hadoop (version 1.1.2). (b) Next, to provide real-time map-matching processing, we selected one of the above physical machines and designed a custom simulation setup. To provide a fair comparison, we implemented all of the studied map-matching strategies in the Java programming language (version 1.7), because the Java environment allows database connectivity [27]. We stored all of the datasets in the PostgreSQL (version 9.1) database together with the PostGIS (version 2.0) spatial extension and the pgRouting extension for SPQs; an example query against this stack is sketched below.
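For instance, an SPQ can be issued from Java via JDBC roughly as follows (a sketch under assumptions: a "ways" edge table with id, source, target, and cost columns, and the pgr_dijkstra signature of pgRouting 2.0; the table, column, and vertex values are illustrative, not taken from the dissertation):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Issue a shortest path query (SPQ) through pgRouting from Java via JDBC.
public class ShortestPathQuery {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/roadnet", "user", "password")) {
            // pgRouting 2.0: pgr_dijkstra(edge_sql, source, target, directed, has_rcost)
            String sql = "SELECT seq, id1 AS node, id2 AS edge, cost "
                       + "FROM pgr_dijkstra("
                       + "'SELECT id, source, target, cost FROM ways', ?, ?, false, false)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, 1001); // source vertex id (illustrative)
                ps.setInt(2, 2002); // target vertex id (illustrative)
                try (ResultSet rs = ps.executeQuery()) {
                    double total = 0;
                    while (rs.next()) {
                        total += rs.getDouble("cost"); // accumulate path cost edge by edge
                        System.out.printf("node %d via edge %d%n",
                                rs.getInt("node"), rs.getInt("edge"));
                    }
                    System.out.println("shortest path distance = " + total);
                }
            }
        }
    }
}

Each such round trip is exactly the per-query cost that the pre-computation of Chapter 5 is designed to avoid.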

1.4.3. HPC Workloads for Job Scheduling

We used a real data center workload from the Center for Computational Research (CCR) of the State University of New York at Buffalo to evaluate system performance. The data center is a collection of multiple computational resources clustered together over a communication infrastructure; such systems fall into two categories: (a) homogeneous and (b) heterogeneous. The resources in a homogeneous system are similar in size and capacity, so a job executes with similar capacity anywhere, whereas the resources in a heterogeneous system have different specifications. Our experiments include both types of data center systems. The workloads were collected over a 30-day period from February 20, 2009 to March 22, 2009. A total of 22,385 jobs were executed on more than 1,000 dual-processor nodes (i.e., 2,000 CPUs at 3.0 GHz) [28-30]. Each job comprises a number of tasks, each requiring one or more processing elements.

1.4.4. Simulation Environment for Job Scheduling

All of the job scheduling strategies are used to schedule the aforementioned HPC workload in order to identify the best job scheduling policies for optimizing energy efficiency along with system performance. For the workload simulation, we developed a custom Java-based discrete-event simulator. The Java environment allows database connectivity [31, 32], through which we accessed the stored dataset described above. In the event-based setup, the scheduling policies check the queue periodically (i.e., every 10 seconds [33]) and schedule the jobs accordingly, as illustrated by the sketch below.
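The periodic queue check can be captured in a few lines of simulated time; the following is a minimal sketch (a hypothetical skeleton, not the simulator itself) that advances a virtual clock in 10-second ticks and dispatches whatever has arrived by each tick:

import java.util.ArrayDeque;
import java.util.Queue;

// Skeleton of a discrete-event loop with a 10-second scheduling tick:
// jobs that arrived by the current virtual time are dispatched in FCFS order.
public class SchedulerLoop {
    static final long TICK_SECONDS = 10;

    public static void main(String[] args) {
        long[] submitTimes = {3, 12, 14, 47, 95}; // illustrative arrivals (seconds)
        Queue<Long> pending = new ArrayDeque<Long>();
        int next = 0;

        for (long now = 0; now <= 100; now += TICK_SECONDS) {
            // Enqueue every job whose submit time has passed.
            while (next < submitTimes.length && submitTimes[next] <= now)
                pending.add(submitTimes[next++]);
            // Scheduling pass: here, simply dispatch everything that is queued;
            // a real policy would order the queue and test resource availability.
            while (!pending.isEmpty()) {
                long submitted = pending.poll();
                System.out.printf("t=%ds: dispatch job submitted at %ds (queue time %ds)%n",
                        now, submitted, now - submitted);
            }
        }
    }
}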

1.5. Dissertation Outline

The rest of the dissertation is organized as follows:

Chapter 2 presents a brief introduction to cloud computing and big-data processing techniques. In Section 2.2, we give an overview of cloud computing. In Section 2.3, we give an overview of big-data. In Section 2.4, we introduce state-of-the-art big-data processing technologies. In Section 2.5, we discuss big-data research directions and challenges. Section 2.6 concludes the chapter.

Chapter 3 presents a comparative analysis of job scheduling strategies for large-scale parallel computing systems. Section 3.2 states the related work, followed by the job scheduling strategy and the energy efficient strategy in Section 3.3 and Section 3.4, respectively. The comprehensive characterization and analysis of the data center workload is explained in Section 3.5. In Section 3.6, we present the simulation and experimental results of various job scheduling policies with their discussion. We extend the job scheduling policies to implement the energy efficiency technique, and the corresponding results are discussed in Section 3.7. Finally, Section 3.8 concludes the chapter and highlights future research directions.

Chapter 4 provides a comparative analysis of VM scheduling strategies for HPC clouds. Section 4.2 presents the related work on energy efficient strategies. In Section 4.3, we explain the system model and problem statement, while in Section 4.4 we introduce VM scheduling strategies for energy efficiency. Section 4.5 states the experimental settings and simulation results of the various VM scheduling strategies with their discussion. Finally, in Section 4.6, we conclude the chapter.

Chapter 5 presents an adaptive and tuneable cloud-based map-matching strategy for GPS trajectories. Section 5.2 sets forth the related work on map-matching, and Section 5.3 gives the major definitions of terms used in this chapter. In Section 5.4, we present a complete overview of the proposed fully adaptive map-matching strategy for real-time GPS trajectories. In Section 5.5, we describe our experimental settings and computation environments. Lastly, Section 5.6 presents the results evaluation and discussion, while Section 5.7 concludes the chapter.

Chapter 6 concludes the dissertation. Section 6.1 describes the problem area, Section 6.2 highlights the contributions, and Section 6.3 addresses limitations and future research.

1.6. Summary

In this chapter, we presented an overview of the research problems and objectives of this dissertation, followed by the contributions and the dissertation outline. In the next chapter, we thoroughly describe the fundamentals of cloud computing and current big-data key technologies. Big-data processing techniques are categorized as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based according to their processing models.


Chapter 2

Cloud Computing and Big-Data: Trends and Issues¹

The second chapter of this dissertation describes the fundamentals of cloud computing and current big-data key technologies. We discuss big-data processing techniques and categorize them as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based according to their processing models. In this chapter, we highlight the strengths and weaknesses of various big-data cloud processing techniques in order to help the big-data community select appropriate techniques. We also discuss big-data research challenges with respect to transportation management systems.

2.1. Introduction

In today's ICT era, data is more voluminous and multifarious and is being transferred with increasing speed. Some reasons for these trends are: scientific organizations are solving big problems related to high-performance computing workloads; various types of public services are emerging and being digitized; and new types of resources are being used [34]. Mobile devices, global positioning systems, computer logs, social media, sensors, and monitoring systems are all generating big-data. Managing and mining such data to unlock useful information is a significant challenge [35].

Big-data is huge and complex structured or unstructured data that is difficult to manage using traditional technologies such as a database management system (DBMS). Call logs, financial transaction logs, social media analytics, intelligent transport services, location-based services, earth observation, medical imaging, and high-energy physics are all sources of big-data. Figure 4 shows the results of a big-data survey conducted by Talend [36]. The survey revealed that many common real-world applications deal with big-data.

A real-time monitoring traffic system (RMTS) is one of the most interesting examples of a transportation monitoring system (TMS) [37]. In such a system, information about vehicles, buildings, people, and roads is accessed to probe city dynamics. The data is often in the form of GPS locations. Because of the real-time nature of the data collected in a TMS, the amount of data can grow exponentially and exceed several dozen terabytes [26]. For example, there are 14,000 taxis in Shenzhen; with a 30 s sampling rate, these taxis generate about 40 million GPS records per day. This GPS data is used by numerous transportation services for traffic flow analysis, route planning and hot route finding, geographical social networking, smart driving, and map-matching [12, 26, 37]. However, to extract and mine massive transportation data from a database comprising millions of GPS location records, a TMS needs an effective, optimized, and intelligent ICT infrastructure.

¹ A paper covering part of Chapter 2 has been published in ZTE Communications (indexed by EI-Inspec), 13(1): 50-59, March 2015, ISSN 1673-5188, doi: 10.3969/j.issn.1673-5188.2015.01.007 (wwwen.zte.com.cn/en/about/publications/), under the title "Big-Data Processing Techniques and Their Challenges in Transport Domain".

Figure 4. The survey report on the applications driving big-data needs (multiple responses, n = 95) [36].

Cloud computing is one of the most promising solutions for dealing with big-data, and many big-data generators have been adapted to cloud computing. According to a survey by GigaSpaces [38], only 20% of IT professionals said their company had no plans to move their big-data to the cloud, which indicates that most companies dealing with big-data have turned to the cloud [26]. Several TMS applications, such as cloud-agent-based urban transportation systems, MapReduce for traffic flow forecasting, and a cloud-enabled intensive FCD computation framework [5, 7], have been significant in bringing forward the cloud computing paradigm. Cloud computing integrates computing infrastructures (e.g., data centres and computing farms), software frameworks (e.g., Hadoop, MapReduce, and HDFS), and storage systems to optimize and manage big-data [35]. Because of the importance and usability of cloud computing in daily life, the number of cloud resource providers has increased. Cloud resource providers offer a variety of services, including computation and storage, to customers at low cost and on a pay-per-use basis.

Currently, the cloud computing paradigm is still in its infancy and has to address several issues, such as energy efficiency and efficient resource use [30, 39, 40]. Unfortunately, as big-data applications are driven into the cloud, the research issues for the cloud paradigm become more complicated. Hosting big-data applications in the cloud is still an open area of research.

In this chapter, we describe the fundamentals of cloud computing, and we discuss current big-data technologies. We categorize key big-data technologies as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based. To the best of our knowledge, the Hadoop big-data techniques that fall into these categories have not been covered in the literature to date. In this survey, we highlight the strengths and weaknesses of various Hadoop-based big-data processing techniques in the cloud and, in doing so, intend to help people within the big-data community select an appropriate processing technique. We also discuss challenges in big-data research as well as future directions in big-data related to the transportation domain. In Section 2.2, we give an overview of cloud computing. In Section 2.3, we give an overview of big-data. In Section 2.4, we introduce state-of-the-art big-data processing technologies. In Section 2.5, we discuss big-data research directions and challenges. Section 2.6 concludes the chapter.

2.2. Cloud Computing and Data-Processing Platforms

Cloud computing is a mature technology and is being adapted to every kind of real-world application. Over the next two decades, cloud computing technologies will be crucial to innovation in education, government, healthcare, transportation, traffic control, Internet-based business, manufacturing, and media.

Cloud computing is a collection of computing resources that can be accessed via a digital network such as a wide-area network (WAN) or the Internet. These resources can be accessed using a computer, tablet, notebook, smart phone, GPS device, or some other device. Cloud servers provide and manage the applications and also store data remotely [41]. Cloud computing has been defined in numerous ways by different researchers and experts; the authors of [41-43] all have their own opinions of what constitutes cloud computing. NIST [42] defines cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." Cloud computing is not a new concept; it has been derived from several emerging trends and key technologies, including distributed, utility, and parallel computing [43]. In the following sections, we describe the architecture and key technologies of cloud computing, as classified in Figure 5.

Figure 5 Classification of cloud computing: the architecture comprises organizational deployment models (public, private, hybrid) and technical service models (SaaS, PaaS, IaaS); the key technologies are virtualization, service-oriented architecture (SOA), and parallel computing (MapReduce, BSP, DAG).

2.2.1. Cloud Deployment Models Cloud architecture can be explained from organizational and technical perspectives. From an organizational perspective, cloud architecture can be categorized as public, private or hybrid [44] according to deployment model. 

Public: A public cloud deployment model is used for the general public or a large group of industries. Examples are Google App Engine, Microsoft Windows Azure, IBM Smart Cloud, and Amazon EC2.



Private: A private cloud deployment model is used for an organization. Examples are Eucalyptus, Amazon VPC, VMware, and Microsoft ECI data centre. Hybrid: A hybrid cloud deployment model is a mixture of two or more clouds (i.e., public and private) for a unique domain. Examples are Windows Azure and VMware vCloud.



From a technical perspective, cloud architecture has three main service models: infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [45]. 

IaaS: The first-layer IaaS provides access with an abstracted view of centrally located hardware, computers, mass storage, and networks. The most popular examples of IaaS are IBM IaaS, Amazon EC2, Eucalyptus, Nimbus, and Open Nebula.



PaaS: The PaaS layer is designed for developers rather than end-users. It is an environment in which the programmer can execute services. Examples of PaaS are Google AppEngine and Microsoft Azure. opportunity 13



• SaaS: The SaaS layer is designed for the end-user, who does not need to install any application software on a local computer. In short, SaaS provides software for rent and is sometimes described as on-demand applications over the Internet. The most popular examples of SaaS are Google Maps, Google Docs, Microsoft Windows Live, and Salesforce.com.

2.2.2. Key Aspects of Cloud Architecture

2.2.2.1. Service Orientation
In cloud computing, service-oriented architecture (SOA) is a software architecture that defines how services are offered and used. Functions and messages in the SOA model are used by end-users, applications, and other services in cloud computing. In other words, the SOA determines the way services are designed, deployed, and managed. SOA services are flexible, scalable, and loosely coupled [46]. In an SOA, services are interoperable, which means that distributed systems can communicate and exchange data with one another [46].

2.2.2.2. Virtualization
Virtualization involves creating an abstract, logical view of the physical resources, e.g., servers, data storage disks, networks, and software, in the cloud. These physical resources are pooled, managed, and utilized. Virtualization has many advantages in terms of resource usage, management, consolidation, energy consumption, space saving, emergency planning, dynamic behaviour, availability, and accessibility [47]. Operating systems, platforms, storage devices, network devices, and software applications can all be virtualized.

2.2.2.3. Parallel Computing
The parallel computing paradigm in cloud computing is pivotal for solving large, complex computing problems. The current parallel computing paradigms in cloud environments include MapReduce, bulk synchronous parallel (BSP), and directed acyclic graph (DAG). The jobs handled within these paradigms are computation requests from end-users and may be split into several tasks.
MapReduce was introduced by Google to process mass data on a large cluster of low-end commodity machines [48]. It is used to analyze big-data and perform high-throughput computing. Hadoop [49] is an Apache project that provides a library for distributed and parallel processing and includes an open-source implementation of MapReduce. Each job is divided into several map and reduce tasks (Figure 6). MapReduce takes input data in the form of <key, value> pairs, which are distributed over the computation nodes; the map tasks then produce intermediate <key, value> pairs, which are redistributed over the computation nodes. Finally, the intermediate data is processed by the reduce tasks to generate the final output <key, value> pairs. During this process, input and output data are stored in the Hadoop distributed file system (HDFS), which creates multiple copies of the data as blocks for distribution on the nodes.

Figure 6 MapReduce framework.
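To make the dataflow concrete, the following is a minimal single-machine sketch of a word-count job expressed as map and reduce phases. It imitates the <key, value> flow described above, but the function names are ours and this is not the Hadoop API.

# Minimal word-count sketch of the MapReduce dataflow (not the Hadoop API).
from itertools import groupby
from operator import itemgetter

def map_task(record):
    # map: one input record -> intermediate <key, value> pairs
    return [(word, 1) for word in record.split()]

def reduce_task(key, values):
    # reduce: all intermediate values for one key -> final <key, value> pair
    return (key, sum(values))

def run_job(records):
    # shuffle: group intermediate pairs by key before the reduce phase
    intermediate = sorted(pair for r in records for pair in map_task(r))
    return [reduce_task(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

print(run_job(["big data on hpc clouds", "big data processing"]))
# -> [('big', 2), ('clouds', 1), ('data', 2), ('hpc', 1), ('on', 1), ('processing', 1)]

In a real Hadoop deployment, the shuffle step is performed by the framework across the cluster, and the input records come from HDFS blocks rather than an in-memory list.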

The bulk synchronous parallel (BSP) computing paradigm was introduced by Leslie Valiant [50]. A BSP algorithm [50, 51] executes a series of super-steps, each of which runs a user-defined function in parallel and performs computations asynchronously. At the end of every super-step, the BSP algorithm uses a synchronization barrier to synchronize the computations within the system. Figure 7 shows a BSP program; the synchronization barrier is the point at which every super-step waits for the other super-steps running in parallel. The BSP paradigm is well suited to graph computation problems; in [51], BSP is shown to perform better than MapReduce on graph processing problems. Hama [52] and Pregel [53] are common BSP-based graph-processing technologies for big-data analytics.


Figure 7 Bulk synchronous parallel paradigm.
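As a concrete illustration of the superstep structure in Figure 7, the sketch below runs a user-defined local computation in parallel threads and synchronizes them with a barrier at the end of every superstep. It is a toy rendering of the paradigm in plain Python, not the Hama or Pregel API.

# Toy BSP run: parallel local computation, then a synchronization barrier
# at the end of every superstep (plain Python threads, no BSP framework).
import threading

PROCESSORS, SUPERSTEPS = 4, 3
barrier = threading.Barrier(PROCESSORS)   # the synchronization barrier

def worker(pid, state):
    for step in range(SUPERSTEPS):
        state[pid] += pid + step          # local computation (asynchronous)
        # process communication would happen here in a real BSP system
        barrier.wait()                    # wait for all peers, then proceed

state = [0] * PROCESSORS
threads = [threading.Thread(target=worker, args=(p, state)) for p in range(PROCESSORS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state)  # -> [3, 6, 9, 12]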

The directed acyclic graph (DAG) computing model describes complicated computing jobs in terms of dataflow graph processing. DAG is widely used in Dryad, a scalable parallel and distributed project of Microsoft [54]. In Dryad, a job is processed in a directed-graph manner.
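The sketch below captures the essence of DAG-based dataflow execution: vertices are computations, edges are channels, and a vertex runs once the outputs of all its predecessors are available. It is a simplified stand-in for Dryad's model; the vertex names and functions are invented for illustration.

# Simplified DAG (dataflow) execution: run each computational vertex
# once the outputs of all its predecessor vertices are available.
from graphlib import TopologicalSorter  # Python >= 3.9

# edges: vertex -> set of predecessors (communication channels)
dag = {"read": set(), "filter": {"read"}, "count": {"filter"},
       "sample": {"read"}, "join": {"count", "sample"}}

vertex_fn = {                      # hypothetical per-vertex computations
    "read":   lambda ins: list(range(10)),
    "filter": lambda ins: [x for x in ins[0] if x % 2 == 0],
    "count":  lambda ins: len(ins[0]),
    "sample": lambda ins: ins[0][:3],
    "join":   lambda ins: (ins[0], ins[1]),
}

results = {}
for v in TopologicalSorter(dag).static_order():   # respects edge order
    inputs = [results[p] for p in sorted(dag[v])]
    results[v] = vertex_fn[v](inputs)
print(results["join"])  # -> (5, [0, 1, 2])

Dryad additionally distributes the vertices over a cluster and can rewrite the graph at runtime; the topological execution order shown here is the part the model guarantees.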

2.3. Big-Data
Big-data is a huge structured or unstructured data set that is difficult to compute using a traditional DBMS. An increasing number of organizations are producing huge data sets, the sizes of which start at a few terabytes. For example, in the U.S., Wal-Mart processes one million transactions an hour, which creates more than 2.5 PB of data [36]. In the following sections, we discuss the characteristics and lifecycle of big-data.

2.3.1. Characteristics
Big-data characteristics are often described using a multi-V model (Figure 8). Gartner proposed a 3V model of big-data, but an additional dimension, veracity, is also important for data reliability and accuracy [44].


Figure 8 Classification of big-data.

2.3.1.1. Volume
Volume is a major dimension of big-data. Currently, the volume of data is increasing exponentially, from terabytes to petabytes and beyond.

2.3.1.2. Velocity
Velocity includes the speed of data creation, capture, aggregation, processing, and streaming. Different types of big-data may need to be processed at different speeds [44]. Velocity can be categorized as:

• Batch: Data arrives and is processed at certain intervals. Many big-data applications process data in batches and have batch velocity.
• Near-time: The time between when data arrives and is processed is very small, close to real-time.
• Real-time: Data arrives and is processed in a continuous manner, which enables real-time analysis.
• Streaming: Similar to real-time, data arrives and is processed upon incoming data flows.



2.3.1.3. Variety
Variety is one of the most important characteristics of big-data. Many sources of big-data generate many different forms of data, and as new applications are developed, new data formats may be introduced. As the number of big-data forms grows, designing algorithms or logic for big-data mining and analysis becomes more challenging. Big-data can be categorized as:

• Structured: Big-data in this form is very easy to input and analyse because there are several relational database management system (RDBMS) tools that can store, query, and manage the data effectively. Structured big-data comprises characters, numbers, floating points, and dates, commonly used in customer relationship management systems.
• Unstructured: Big-data in this form cannot be stored and managed using traditional RDBMS tools because it is not in a table (i.e., according to a relational model). Unstructured big-data includes location information, photos, videos, audio, emails, sensor data, social media data, biological data, and PDFs that are totally amorphous and very difficult to store, analyse, and mine. Social media websites and sensors are major sources of unstructured big-data. Eighty to ninety percent of today's data in the world is unstructured social media data [55]. HP Labs has estimated that by 2030 approximately 1 trillion sensors will be in use, monitoring phenomena such as energy consumption, cyberspace, and weather [55].
• Semi-structured: Big-data in this form cannot be processed using traditional RDBMS tools. Semi-structured data is a type of structured data that is not organized in a table (i.e., according to a relational model).
• Mixed: Big-data may also be a mixture of the above forms of data. Mixed big-data requires complex data capture and processing.

2.3.1.4. Veracity
The veracity of big-data is the reliability, accuracy, understandability, and trustworthiness of the data. A recent report [56] found that 40-60% of the time needed for big-data analysis was spent preparing the data so that it was as reliable and accurate as possible. In several big-data applications, controlling data quality and accuracy has proven to be a big challenge.

2.3.2. Big-Data Lifecycle
In this section, we describe the big-data lifecycle and divide it into four major phases: (a) generation, (b) acquisition, (c) storage, and (d) production.

2.3.2.1. Big-Data Generation
The first phase of the big-data lifecycle involves the generation of big-data. Specific sources generate a huge amount of multifarious data, which can be categorized as enterprise data, related to online trading, operations, and analysis data managed by RDBMS; Internet of Things (IoT) data, related to transport, agriculture, government, healthcare, and urbanization; and scientific data, related to bio-medical, computational biology, astronomy, and telescope data [57].

2.3.2.2. Big-Data Acquisition
Big-data acquisition is the second phase of the lifecycle and involves the collection, pre-processing, and transmission of big-data. In this phase, raw data generated by different sources is collected and transmitted to the next stage of the big-data lifecycle. Log files, sensing, and the packet capture library (i.e., Libpcap) are common techniques for acquiring big-data. Because big-data has many forms, an efficient pre-processing and transmission mechanism is required to ensure the data's veracity. In particular, before data is sent to the next phase, it is filtered during the acquisition phase to remove redundant and useless data. Data integration, cleaning, and redundancy elimination are the major methods of big-data pre-processing. In this way, a new data layout containing only meaningful data saves storage space and improves the overall computing efficiency of big-data processing.

2.3.2.3. Big-Data Storage
As big-data has grown rapidly, the requirements on storage and management have also increased. Specifically, this phase is responsible for the availability and reliability of data for big-data analytics. A distributed file system (DFS) is commonly used to store big-data originating from large-scale distributed and data-intensive applications. A variety of distributed file systems have been introduced recently, including GFS, HDFS, TFS and FastDFS by Taobao, Microsoft Cosmos, and Facebook Haystack. NoSQL databases are also commonly used for big-data storage and management. NoSQL databases have three different storage models: key-value (i.e., Dynamo and Voldemort), document-oriented (i.e., MongoDB, SimpleDB, and CouchDB), and column-oriented (i.e., BigTable).

2.3.2.4. Big-Data Production
Big-data production is the last stage of the big-data lifecycle and includes big-data analysis approaches and techniques. Big-data analysis is similar to traditional data analysis in that potentially useful data is extracted and analysed to maximize the value of the data. Approaches to big-data analysis include cluster analysis, factor analysis, correlation analysis, regression analysis, and data-mining algorithms such as k-means, Naïve Bayes, Apriori, and SVM. However, these methods cannot be applied to big-data as-is because of the massive size of the data. If any of these methods are to be leveraged for big-data analysis, they must be re-designed to make use of parallel computing techniques, which may be batch-based (i.e., MapReduce-based), BSP-based, or stream-based.
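As an example of such a re-design, the sketch below recasts one k-means iteration in a MapReduce style: each worker maps a chunk of the data to partial per-cluster sums, and a reduce step merges the partial sums into new centroids. It is a one-dimensional toy under our own naming, not a production algorithm; on a cluster, the worker pool would be replaced by distributed map tasks.

# Sketch of re-designing the k-means assignment step in a MapReduce style:
# map each data chunk to partial per-cluster sums, then reduce the partial
# sums into new centroids. Names and data are illustrative.
from multiprocessing import Pool  # stands in for a cluster of worker nodes

def assign_chunk(args):
    chunk, centroids = args
    # map: accumulate (partial_sum, count) per cluster for this chunk
    sums = {i: [0.0, 0] for i in range(len(centroids))}
    for x in chunk:
        cid = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        sums[cid][0] += x
        sums[cid][1] += 1
    return sums

def kmeans_step(data, centroids, workers=4):
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(assign_chunk, [(c, centroids) for c in chunks])
    # reduce: merge the partial sums and recompute the centroids
    merged = {i: [0.0, 0] for i in range(len(centroids))}
    for part in partials:
        for cid, (s, n) in part.items():
            merged[cid][0] += s
            merged[cid][1] += n
    return [s / n if n else centroids[cid]
            for cid, (s, n) in sorted(merged.items())]

if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
    print(kmeans_step(data, centroids=[0.0, 5.0]))  # -> [1.0, 9.53...]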

2.4. Big-Data Processing
In this section, we explain big-data processing approaches and techniques that are based on cloud environments. First, we discuss the major analysis approaches used to analyse big-data. Then, in the following subsection, we explain six different categories of big-data processing techniques. Figure 9 depicts the complete big-data framework.

Figure 9 Big-data framework.


2.4.1. Analytic Approaches
Big-data analysis approaches are used to retrieve hidden information from big-data, and currently most of them follow well-established basic analysis methods. The analysis approach chosen by a big-data application depends entirely on the nature of the application problem and its requirements, and different analysis approaches provide different outcomes. We categorize big-data analysis approaches into two broad categories: mathematical approaches and data-mining approaches [58].

2.4.1.1. Mathematical Approaches
Mathematical analysis approaches for big-data involve basic mathematical functions, including statistical analysis, factor analysis, and correlation analysis, used in many fields (e.g., engineering, physics, economics, healthcare, and biology). Statistical analysis describes and summarizes big-data and draws conclusions that support further analysis; economic and healthcare applications widely use this approach. In factor analysis, the relationships among the many elements present in big-data are analysed in terms of only a few major factors, which can reveal the most important information. Correlation analysis is a common mathematical approach used in several big-data applications; with its help, we can extract information about strong and weak dependence relationships among the many elements contained in big-data.

2.4.1.2. Data-Mining Approaches
Data mining involves finding useful information in big-data sets and presenting it in a way that is clear and can aid decision-making. Approaches to data mining in big-data applications include regression analysis, clustering analysis, association rule learning, classification, anomaly or outlier detection, and machine learning.
Regression analysis is used to find and analyse trends and dependencies between variables. For example, in CRM big-data applications, the levels of customer satisfaction that affect customer loyalty can be determined through regression analysis; a prediction model can then be created to help make decisions about how to increase value for an organization. Clustering analysis is used to identify pieces of big-data that have similar characteristics and to understand the differences and similarities between these pieces. In CRM, cluster analysis is used to identify groups of customers who have similar purchasing habits and to predict the products they are likely to buy. Association rule learning is used to discover interesting relationships between different variables and to uncover hidden patterns in big-data. A business can use patterns and interdependencies between different variables to recommend new products based on products that were retrieved together, which helps the business increase its conversion rate. Classification analysis, which is closely related to clustering analysis, is used to assign the items in a data set comprising different types of data to known categories. Anomaly (outlier) detection is a data-mining technique for detecting data with unmatched patterns and unexpected behaviour; detected data has to be analysed because it may indicate fraud or risk within an organization.
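As a minimal illustration of the last technique, the sketch below flags values whose z-score exceeds a threshold, assuming roughly normally distributed data. The transaction amounts are hypothetical, and a big-data pipeline would compute the same statistics in parallel rather than on one machine.

# Minimal anomaly (outlier) detection sketch using z-scores.
from statistics import mean, stdev

def detect_outliers(values, threshold=3.0):
    mu, sigma = mean(values), stdev(values)
    # flag values whose distance from the mean exceeds threshold * sigma
    return [v for v in values if sigma and abs(v - mu) / sigma > threshold]

transactions = [102, 98, 101, 99, 97, 103, 100, 950]  # hypothetical amounts
print(detect_outliers(transactions, threshold=2.0))   # -> [950]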

2.4.2. Cloud-Based Big-Data Processing Techniques
Cloud-based Hadoop is used for processing in a growing number of big-data applications [49], each of which has a different platform and focus. For example, some big-data applications require batch processing and others require real-time processing. Here, we give a taxonomy of cloud-based big-data processing techniques (Figure 10).

Figure 10 Big-data computing models (batch-based: Mahout, Pentaho, Skytree, Karmasphere, Datameer, Cloudera; stream-based: Storm, S4, SQLStream, Splunk, Kafka, SAP Hana, Infochimps, IBM BigInsights; graph-based (BSP): Hama, Pregel, Giraph; DAG-based: Dryad; interactive-based: Tableau, Google's Dremel, Apache Drill; visual-based: Talend Open Studio).

2.4.2.1. Batch Processing
Big-data batch processing follows the MapReduce-based parallel computing paradigm of cloud computing (Section 2.2.2.3). Several tools and techniques are based on batch processing and run on top of Hadoop, including Mahout [59], Pentaho [60], Skytree [61], Karmasphere [62], Datameer [63], Cloudera [64], Apache Hive, and Google Tenzing.
Mahout [59] was introduced by Apache and takes a scalable, parallel approach to mining big-data. It is used in large-scale data-analysis applications; Google, IBM, Amazon, Facebook, and Yahoo! have all used Mahout in their projects. Mahout supports clustering analysis, pattern analysis, dimension reduction, classification, and regression.
Skytree [61] is a general-purpose server with machine learning and advanced analytics for processing huge datasets at high speed, and it offers simple commands for users. Machine-learning tasks in the Skytree server include anomaly or outlier detection, clustering analysis, regression, classification, dimension reduction, density estimation, and similarity search. Because its main focus is real-time analytics, it enables optimized implementations of machine-learning tasks on both structured and unstructured big-data.
Pentaho [60] is a big-data software platform for generating business reports. It enables data capture, integration, exploration, and visualization for business users; with its business analytics, the user can make data-based decisions and increase profitability. Pentaho uses Hadoop for data storage and management and provides a set of plugins to communicate with the document-oriented NoSQL model (i.e., MongoDB) and the Cassandra database.
Karmasphere [62] is a Hadoop-based platform for business big-data analysis. With Karmasphere, programs can be efficiently designed for big-data analytics and self-service access. Karmasphere is capable of big-data ingestion, reporting, visualization, and iterative analysis for gaining business insight, and it can process structured and unstructured big-data on Hadoop embedded with Hive.
Datameer [63] provides a business-integration PaaS, called the Datameer Analytic Solution (DAS), which is based on Hadoop and is used to analyze large volumes of business data. DAS includes an analytics engine, data-source integration, and data visualization (in the form of reports, dashboards, and charts). DAS services are deployed in other Hadoop distributions, such as Cloudera, Yahoo!, Amazon, IBM BigInsights, MR, and GreenplumHD. Because the main objective of Datameer is data integration, data can be imported from structured data sources, such as Oracle, MySQL, IBM, HBase, and Cassandra, as well as from unstructured sources, such as log files, LinkedIn, Twitter, and Facebook.
Cloudera [64] provides Hadoop solutions such as batch processing, interactive search, and interactive SQL. Cloudera offers an Apache Hadoop distribution called CDH that supports MR, Pig, Flume, and Hive; it also supports embedded plugins for Teradata, Oracle, and Netezza.

2.4.2.2. Stream Processing
Stream-based processing techniques are used to compute continuous flows of data (data streams).

Real-time processing overcomes the limitations of batch-based processing. Projects that use stream processing include Storm [65], S4 [66], SQLStream [67], Splunk, Kafka, SAP Hana, Infochimps, and BigInsights.
Storm [65] is a fault-tolerant, scalable, distributed system that provides an open-source, real-time computation environment. In contrast to batch processing, Storm reliably processes unbounded, limitless streaming data in real time. Real-time analytics, online machine learning, interactive operating systems, and distributed remote procedure calls (RPC) are all implemented in the Storm project. Storm allows the programmer to create and operate an easy setup and to process more than a million tuples per second. Storm comprises different topologies for different Storm tasks, created and submitted by a programmer in any programming language. Because Storm works through graph-based computation, the nodes of a topology are spouts and bolts; each node contains processing logic and runs in parallel. A source of streams is called a spout, and a bolt computes input and output streams. A Storm cluster is managed by Apache ZooKeeper.
In 2010, Yahoo! introduced S4 [66], and Apache included it as an Incubator project in 2011. S4 is a platform that facilitates fault-tolerant, distributed, pluggable, scalable computing. It is designed to process large-scale continuous streams of data. Because its core library is written in Java, a programmer can easily develop applications in S4, which supports cluster management and is robust, scalable, and decentralized. Analogous to Storm, S4 manages its cluster using Apache ZooKeeper. Yahoo! has deployed S4 for computing thousands of search queries.
SQLStream [67] is a platform for processing large-scale, unbounded streaming data in real time with the support of automatic, intelligent operations. Specifically, SQLStream is used to discover interesting patterns in unstructured data. The platform responds rapidly because the streaming data is processed in memory. Server 3.0, a recently released version of SQLStream, is used for real-time big-data analytics and management.
Splunk [68] is a platform for analyzing real-time streams of machine-generated big-data. Senthub, Amazon, and Heroku have all used the Splunk big-data intelligence platform to monitor and analyse their data via a web interface. Splunk can be used with structured or unstructured machine-generated log files.
Kafka [69] was developed at LinkedIn. Kafka is a stream-processing tool for managing large-scale streaming and messaging data and processing it using in-memory techniques. Kafka provides a unified solution for handling the two different types of data belonging to a website, i.e., operational data and activity data. Service logs, CPU/IO usage, and request times are examples of operational data, which describes the performance of servers. Activity data, on the other hand, describes the actions of online users, such as clicking a list, scrolling through webpage content, searching keywords, or copying content. Kafka is used in several organizations.
SAP Hana [70] is a stream-processing tool that also processes streaming data in memory. SAP Hana is used for real-time business processes, sentiment-data processing, and predictive analysis. It provides three kinds of real-time analytics: operational reporting, predictive and text analysis, and data warehousing. SAP Hana can also work with interactive demographic applications and social media.
The Infochimps [71] cloud suite covers several cloud IaaS services, categorized as:

• Cloud streams: real-time analytics for multiple data sources.
• Cloud queries: query capability for NewSQL and NoSQL stores (i.e., Apache Cassandra, HBase, MySQL, and MongoDB).
• Cloud Hadoop: analysis of massive amounts of data in HDFS.

The Infochimps platform is suitable for both private and public clouds, and it can also control Storm, Kafka, Pig, and Hive.
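The stream tools described in this section share one underlying model: operators consume an unbounded sequence of tuples as they arrive while keeping only bounded state. The sketch below renders that model as a sliding-window counter in plain Python; it uses no platform API, and the window length and event names are assumptions.

# Stream-processing sketch: count events per key over a sliding window,
# updating the result as each tuple arrives (bounded state, unbounded input).
from collections import Counter, deque

WINDOW = 60.0                     # window length in seconds (assumed)
window, counts = deque(), Counter()

def on_tuple(timestamp, key):
    window.append((timestamp, key))
    counts[key] += 1
    while window and window[0][0] < timestamp - WINDOW:   # expire old tuples
        _, old_key = window.popleft()
        counts[old_key] -= 1
    return counts[key]            # current count for this key in the window

for t, k in [(0, "click"), (10, "search"), (30, "click"), (90, "click")]:
    print(t, k, on_tuple(t, k))   # at t=90 the tuples at t=0 and t=10 expire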

BigInsights [72] is part of the Infosphere platform introduced by IBM. BigInsights manages and integrates information within the Hadoop environment for big-data analytics and leverages InfosphereStreams, the stream-based tool of IBM Infosphere. BigInsights is used for real-time analytics on large-scale data streams. JAQL, Pig, Hive (for querying), Apache Lucene (for text mining), and Apache Oozie (for job orchestration) are supported by BigInsights.

2.4.2.3. Graph Processing
Graph-based big-data processing techniques follow the BSP parallel computing paradigm of cloud computing (Section 2.2.2.3). Several big-data applications are better suited to graph-based processing than to batch processing [51]. Hama [52], Pregel [53], and Giraph [73] are common graph-processing techniques for big-data analytics.
Hama [52] is a complete programming model introduced by Apache. It is inspired by the BSP parallel computing paradigm and runs on top of Hadoop. Hama is written in Java. Massive scientific computations, including matrix algorithms, graph functions, and network computation algorithms, can be easily implemented with Hama [52]. In the Hama architecture, a graph is distributed over all the computational nodes, and the vertices are assumed to reside in main memory during computation. The Hama architecture has three main components: the BSPMaster, groom servers, and ZooKeeper. The BSPMaster maintains the status of the groom servers, super-steps, and job progress; a groom server performs the BSP tasks assigned by the BSPMaster; and the synchronization barrier is managed efficiently by the ZooKeeper component.
Pregel [53] is a graph computational model for efficiently processing billions of vertices connected through trillions of edges. A Pregel program comprises a sequence of iterations; in each iteration, a vertex may receive messages, update its state, or dispatch messages. In this model, a problem is approached through the BSP processing model.
Apache Giraph [73] is an iterative graph-processing system built for high scalability. It is widely used within Facebook to analyze and process the social graph generated by users and their connections. Giraph originates from Pregel and is inspired by the BSP distributed computation model. Features of Giraph include master computation, out-of-core computation, and edge-oriented input.

2.4.2.4. DAG Processing
Dryad [54] is a scalable parallel and distributed programming model based on dataflow graph processing. Similar to the MR programming model, a Dryad program can be executed in a distributed way on a cluster of multiprocessor or multicore computing nodes. Dryad computes a job as a directed-graph computation, wherein each vertex denotes a computation and each edge denotes a communication channel. The model can generate and dynamically update the job graph and schedule the processes on the resources. Microsoft SQL Server 2005 Integration Services (SSIS) and DryadLINQ are built on Dryad.

2.4.2.5. Interactive Processing
Tableau [74] sits between users and big-data applications by providing an interactive mechanism for large-scale data processing. Tableau comprises three different tools: Tableau Desktop, Tableau Server, and Tableau Public. Tableau Desktop visualizes and normalizes data in different ways; Tableau Server offers browser-based analytics, called a business intelligence system; and Tableau Public is used for interactive visuals. Tableau uses the Hadoop environment and Hive to process queries.
Google Dremel [75] is an interactive analysis system proposed by Google and used for processing nested data. Dremel is a scalable system that complements batch-processing tools such as MapReduce. The system is capable of scaling to thousands of processing units; it can process petabytes of data, respond to thousands of users, and query very large tables.
Apache Drill [76] is also an interactive analysis system designed for processing nested data, similar to Google Dremel. It has the flexibility to support different queries, data sources, and formats. A Drill system can scale to more than ten thousand servers that process petabytes of data (i.e., trillions of records) in seconds. Like Dremel, Drill stores data in HDFS and performs batch analysis using a MapReduce tool.

2.4.2.6. Visualization
Talend Open Studio [77] is specially designed for visual big-data analysis. It offers a graphical user platform, is completely open-source, and is developed on top of Apache Hadoop. On this platform, a programmer can easily build a program for a big-data problem without writing Java code; specifically, Talend Open Studio provides drag-and-drop icons for composing big-data tasks. It also offers a Really Simple Syndication (RSS) feed that can be consumed by its components.

2.5. Big-Data Research Directions and Challenges
In this section, we highlight research directions and challenges related to big-data in transportation management systems (TMS), one of the emerging generators of big-data. In a TMS, moving objects such as GPS-embedded taxis and buses generate GPS location data that exponentially increases the volume of big-data. Location data is required by numerous transportation services, such as map matching, dealing with the uncertainty of trajectories, visualizing transport data, analysing traffic flow, mining driving patterns, and giving smart driving directions; it is also used for crowd sourcing and geographical social networking. However, to handle and manage the big-data associated with these transportation services, which produce a massive number of GPS records, a TMS needs an optimized, intelligent ICT infrastructure. Here, we describe the major transportation services that require further research in terms of big-data management.

2.5.1. Map-Matching
GPS location data is affected by two typical problems. 1) Due to the limitations of positioning devices, moving objects mostly generate noisy and imprecise GPS location data, called measurement error, which leads to uncertainty in recovering the true locations of the objects. 2) Moving objects update their locations at discrete time intervals, which leads to sampling error: a low sampling rate with long intervals between updates reduces energy consumption and communication bandwidth at the expense of increasing the uncertainty about the actual location, whereas a high sampling rate greatly increases the amount of extraneous data. Therefore, map matching in a TMS is used to accurately align the observed GPS locations onto a road network in the form of a digital map [12]. Map matching over massive historical GPS location records is performed to predict a driver's destination, suggest the shortest route, and mine traffic patterns. Moreover, [12] suggests that map matching is most accurate when it uses the transition probability, which incorporates the shortest path between two consecutive observed GPS location points. On the other hand, executing the shortest-path queries (SPQs) in the map-matching service involves high computational cost, which makes map-matching unaffordable for real-time processing [12]. Moreover, extraneous data (i.e., from a vehicle that stops many times, moves slowly, is trapped in a traffic jam, waits at traffic lights, or moves on a highway link) incurs an extra number of SPQs. The approaches in [23, 24] execute the SPQs by pre-computing the shortest-path distances and splitting the road network into small portions so that the required portion can be loaded into memory [23]. Due to the sequential execution of the SPQs, these approaches incur high pre-computation and storage costs [24]. To map-match the huge number of moving objects with their tremendous GPS location records (i.e., big-data), there is a dire need to execute the SPQs in a computationally efficient environment. The SPQs can be implemented in graph-based big-data processing paradigms (see Section 2.4.2.3) on a large cluster of low-end commodity machines. Consequently, pre-computing the SPQs on such a cluster yields low wall-clock time and storage cost.
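To make this concrete, the sketch below computes single-source shortest paths with superstep-style iterations over a toy road graph, the kind of computation that a BSP framework such as Hama or Pregel would distribute over a cluster. The graph and vertex names are illustrative only.

# Superstep-style single-source shortest paths (Bellman-Ford flavor):
# in each superstep, active vertices send tentative distances to their
# neighbors; a BSP framework would run this distributed over many nodes.
import math

road = {"A": {"B": 4, "C": 1}, "B": {"D": 1},
        "C": {"B": 2, "D": 5}, "D": {}}      # toy road network

dist = {v: math.inf for v in road}
dist["A"], active = 0.0, {"A"}
while active:                                # one iteration == one superstep
    messages = {}                            # vertex -> best incoming distance
    for u in active:
        for v, w in road[u].items():
            d = dist[u] + w
            if d < messages.get(v, math.inf):
                messages[v] = d
    active = {v for v, d in messages.items() if d < dist[v]}
    for v in active:
        dist[v] = messages[v]
print(dist)  # -> {'A': 0.0, 'B': 3, 'C': 1, 'D': 4}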

2.5.2. Visualizing Transportation Data
Visualizing transportation data is crucial in a TMS to present raw data and the results generated by data mining [37]. Such presentation reveals hidden knowledge that helps in making decisions to solve problems in the system. In this service, transportation data can be viewed from different perspectives to detect and describe patterns, trends, and relationships in the data. Moreover, it provides an interactive way to present the multiple types of data in a TMS, called exploratory visualization, for the purpose of investigation. Exploratory visualization can help detect relevant patterns, trends, and relations, which can raise new questions that lead to viewing the visualized data in more detail [37]. The massive amount of transportation data (i.e., big-data) conveys a huge amount of information that cannot be adequately visualized and presented with simple, traditional visualization tools. This service becomes even more challenging when it visualizes multimodal data, which leads to high-dimensional views such as social, temporal, and spatial [37]. In big-data research, visualizing the tremendous volume of transportation data is an open issue that demands considerable work on new techniques of big-data management.

2.6. Summary
In this chapter, we have described cloud computing and key big-data technologies. We categorized the key big-data technologies as batch-based, stream-based, graph-based, DAG-based, interactive-based, or visual-based, and we discussed the strengths of various Hadoop-based big-data cloud processing techniques to help the big-data community select an appropriate processing technique. Moreover, we have highlighted research directions and challenges for big-data in the transportation domain. In the next chapter, we will provide a comprehensive study of job scheduling strategies for HPC workloads. First, the characterization and analysis of the HPC workload will be described, and then simulation and experimental results of several job scheduling policies will be presented.


Chapter. 3

Job Scheduling Strategies for HPC²

This chapter presents a comprehensive characterization and analysis of a data center workload, along with simulation and experimental results of various job scheduling policies and a discussion of these results. We then extend the job scheduling policies to implement an energy-efficiency technique. Because the cost of energy consumption in such environments is a dominant part of the owner's and users' budgets, resource allocation and energy-efficiency techniques are required to deliver the expected Quality of Service (QoS) and to reduce the operational cost of the system. However, when energy efficiency is considered, resource allocation strategies become more difficult to design, and QoS metrics (i.e., queue time and response time) may be violated. Therefore, this chapter is a comparative study of job scheduling in such environments aiming to: (a) minimize the queue time, response time, and energy consumption and (b) maximize the overall system utilization. In this chapter, we highlight the strengths and weaknesses of various job scheduling policies, which helps in choosing an appropriate job scheduling policy for a given scenario.

3.1. Introduction
Scientific organizations are increasingly adopting High Performance Computing (HPC) for solving large problems, which increases computational and storage needs. In the last decade, various scientific organizations spent enormous budgets to carry out research projects using supercomputers [78]. Because supercomputers are unaffordable for many organizations, such organizations were forced to choose low-budget solutions. Consequently, the cloud environment emerged as an alternative that provides the facility of large-scale parallel computation. Currently, many cloud Resource Providers (RPs) offer thousands of computational nodes and a variety of services to facilitate end-users. In large-scale parallel computing environments, the end-users submit their requests unaware of the resource allocation strategy. These requests are usually complex jobs, which may be computation-intensive (i.e., the job demands more CPU time), data-intensive (i.e., the job demands more storage space and communication cost), or mixed (i.e., both computation-intensive and data-intensive) [79].

² Two papers covering parts of Chapter 3 have been published. The first appeared in an SCI journal (JCR 2014-Q1), Cluster Computing: The Journal of Networks, Software Tools and Applications 17(4): 1349-1367, December 2014, ISSN 1386-7857, doi: 10.1007/s10586-014-0384-x (http://www.springer.com/journal/10586), under the title “A comparative study on resource allocation and energy efficient job scheduling strategies in large-scale parallel computing systems”. The second appeared in the proceedings of the 11th IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2013), in conjunction with IEEE/ACM TrustCom 2013 and IUCC (indexed by EI, CPCI, IEEE, and ACM), Melbourne, Australia, July 2013, pp. 949-957, doi: 10.1109/TrustCom.2013.116, under the title “A Comparative Study of Scheduling Strategies in Large-scale Parallel Computational Systems”.


Moreover, these requests may require different levels of Quality of Service (QoS), including job turnaround and queue time. Furthermore, large-scale parallel computing environments consist of (a) a mixture of applications and (b) a pool of finite resources. Because of the factors identified above, RPs pay considerable attention to resource management to deliver the required QoS and enhance system utilization [40]. Several researchers focus on resource management to optimize system performance under various QoS constraints. Job scheduling is one of the major components of a resource management system. A scheduling process involves assigning resources to jobs such that no other jobs access the same resources during the same time interval [14]. However, due to the dynamic nature of the workload, the scheduling problem is hard to solve. A scheduling policy should behave equally well under resource heterogeneity and workload variability. Moreover, the scheduling and resource allocation process is even more difficult to design when energy efficiency is also considered [29, 79]. Because of environmental concerns and energy prices, energy efficiency is one of the major requirements of cloud computing. However, energy-efficient resource scheduling must not violate the QoS and SLA requirements.
In recent years, RPs such as IBM, Google, and Microsoft have deployed data centers for scientific research, hosting, and storage services for Internet applications. These data centers comprise hundreds of thousands of servers and storage resources. The various components within a data center consume different amounts of electricity for their operations, as shown in Table 1 [80]. The table shows the breakdown of the peak power consumed by the major components of a single server; it can be observed that the CPU is the largest power consumer within a server.

Table 1 Peak power of the components of a typical server

Component      Peak Power   Count   Total
CPU            40 W         2       80 W
Memory         9 W          4       36 W
Disk           12 W         1       12 W
PCI slots      25 W         2       50 W
Motherboard    25 W         1       25 W
Fan            10 W         1       10 W
System Total                        213 W

Considering the aforementioned issues, the contribution of this chapter is three-fold: (a) a comprehensive characterization of real-world data center workloads, (b) a comparison and analysis of a set of job scheduling policies to evaluate system performance, and (c) the design of job scheduling policies for energy efficiency. The analysis of job scheduling policies can help in selecting an appropriate job scheduling policy for a given scenario. Additionally, this chapter presents an analysis of how workload characteristics affect job scheduling performance. Collecting log files from a real-world system is a common approach to estimating future workloads [81].

Therefore, we use the workloads of a real data center for the experimental evaluation. As workload characterization is a major factor in evaluating system performance [82], we present a comprehensive characterization of the workload. The workload characterization helps interpret the differences between jobs' computation times and identifies similar and repeatable workload trends.
First, we examine thirteen job scheduling policies: five priority-based policies, the five variants obtained by tuning them with one fitting technique, one window-based policy, and two backfilling techniques. The priority-based scheduling policies are: (a) First Come First Serve (FCFS), (b) Smallest Job First (SJF), (c) Largest Job First (LJF), (d) Minimum estimated Execution Time (MinET), and (e) Maximum estimated Execution Time (MaxET). Each of these priority-based policies is tuned by applying the First Fit (FF) technique. Moreover, we use a window-based scheduling policy called Window-K. Furthermore, we consider two backfilling techniques, namely: (a) aggressive backfilling and (b) a K-reserved-based technique; both work in conjunction with the FCFS policy. We simulate all of these scheduling policies with the real data center workload. In all of the studied scheduling policies, the job parameters, such as the number of jobs, the tasks in each job, and the submission and execution time of each job, remain the same to ensure a fair comparison.
The aforementioned policies are analyzed using numerous results, wherein we split the workload to create multiple datasets. This setup allows determining which job scheduling policy produces better results under different datasets. We use four class-based job observations for a detailed analysis and comparison of scheduler performance. Our detailed analysis produced interesting findings, such as: (a) MinET and SJF, when combined with the FF technique, exhibit better performance than the other policies, and (b) a large number of small jobs in the workload can prevent the MaxET policy from producing results at least equal to those of the MinET policy under certain job characteristics. Our analysis groups the results of all policies into three performance classes, i.e., best, average, and worst.
Finally, after examining the results of the aforementioned policies, we incorporate energy efficiency into three of them based on their performance: (a) the best result producer, (b) an average result producer, and (c) the worst result producer. The selected policies are extended with an energy-efficiency technique (i.e., DVFS) to further examine the behavior of the workload. The proposed energy-efficient policies, (a) SJF-Energy Efficient (SJF-EE), (b) LJF-Energy Efficient (LJF-EE), and (c) LJF-FF-Energy Efficient (LJF-FF-EE), extend our previous work [30]. Selecting policies from the best, average, and worst classes helps analyze the energy-efficiency strategy. We use the same workload as input for all job scheduling policies.
The rest of the chapter is organized as follows. Section 3.2 states the related work, followed by the job scheduling strategy and the energy-efficient strategy in Section 3.3 and Section 3.4, respectively. The comprehensive characterization and analysis of the data center workload is explained in Section 3.5. In Section 3.6, we present the simulation and experimental results of the various job scheduling policies with their discussion. We extend the job scheduling policies to implement the energy-efficiency technique, and the results are discussed in Section 3.7. Finally, Section 3.8 concludes the chapter and highlights future research directions.

3.2. Related Work
In large-scale parallel computing environments, an RP offers dynamic and geographically distributed access to computational and storage resources. Moreover, the RP aims to efficiently allocate the finite resources to a vast number of users and to maintain different QoS levels [40]. The resource management problem can be handled by selecting an appropriate job scheduling technique for performance optimization.
A vast body of research, such as [83-89], has focused on resource management through scheduling techniques to address the problem of resource allocation under different QoS constraints. For instance, Wei et al. [83] proposed a metric-aware scheduling policy, where the scheduler balances competing scheduling objectives represented by different performance metrics, i.e., fairness, job waiting time, and system utilization. Khan et al. used a self-adaptive weighted-sum technique in [84] and [85], game-theoretical methodologies in [86], and a goal programming approach in [87] to optimize system performance for grid resource allocation under different QoS constraints. Braun et al. [88] studied eleven static heuristics for mapping independent tasks onto heterogeneous distributed computing systems; the authors analyzed and implemented a collection of task mapping policies under a single set of common assumptions. The author of [89] compared the performance of six online scheduling algorithms for batch jobs, considering three objective functions: average flow time, makespan, and maximum wait time. This chapter addresses a similar problem of system performance through a comparative study of job scheduling strategies.
Additionally, these environments respond to a large number of users with pay-per-use and pay-as-you-go methods and execute several jobs in parallel. These jobs require long execution times and are considered computation-intensive. The expected workloads for such systems comprise a mixture of applications that demand different resources, which results in highly variable workloads [90]. Chapin et al. [91] described parallel workload models in detail and explained the standard workload format for large-scale parallel computing environments. The authors used publicly available parallel workloads from [92] that consist of various real-world workloads obtained from several large-scale parallel computers. These workloads have been characterized and analyzed in [93] and [94]; the authors observed similarities and differences in the workload characteristics, such as different arrival patterns in peak and non-peak intervals and “power-of-two” processor requirements for job execution. Workload characterization is considered a useful approach for system design. Therefore, we examine and characterize the workloads from different perspectives to find similarities and differences that can be used as a tool for system performance optimization. In the state-of-the-art, various authors have used publicly available workloads to analyze scheduler performance, and most make certain assumptions about the nature of jobs to represent a specific real system in their experiments. In contrast, we use log files collected from a real data center for an evaluation that is closer to real scenarios. Moreover, we study a set of scheduling policies to analyze and compare simulation results, highlighting their performance.

3.3. Job Scheduling Strategy
The scheduler is a major component of resource management in large-scale parallel environments. A scheduler's policy assigns jobs to resources at specific time intervals such that the capacity of the resources meets the jobs' needs [14]. Suppose m machines M_i (i = 1, ..., m) process n jobs J_j (j = 1, ..., n). A job J_j is a program submitted by a user at a specific time (the submit time). Each job contains one or more tasks T_jk (k = 1, ..., o), with each task being executed on a separate CPU for a given time period. A complete scheduling process schedules the job and allocates one or more time intervals on one or more machines, as shown in Figure 2. The corresponding scheduling problem is to find an optimal schedule subject to various constraints, such as: (a) minimize the queue time, response time, makespan, and energy consumption and (b) maximize the overall system utilization.
Job scheduling policies can be static or dynamic. In static scheduling, the set of jobs is known a priori, while dynamic scheduling is performed as jobs arrive. Because the jobs' arrival rate and the status of some nodes (offline or online) may change without any prior knowledge, the dynamic scheduling method is required [33]. The scheduling process is categorized into two groups: (a) batch mode and (b) online mode. In online mode, a job is scheduled on nodes immediately upon arrival, while batch-mode schedulers collect jobs in a queue until a specified condition is met. The set of jobs considered for scheduling includes newly arrived jobs and jobs that were left unscheduled in earlier scheduling events, called meta-tasks [33]. The meta-tasks are examined by the corresponding scheduling policy at prescheduled times called scheduling events, which can be defined at regular time intervals, such as every 10 seconds [33]. The batch scheduling method has been successfully applied in large-scale parallel environments, such as banking systems, health systems, virtual campuses, and bio-informatics applications [95]. However, the batch scheduling problem with independent jobs is hard to solve [33]. The schedulers examined in this work are based on either dynamic or static batch scheduling policies; in the case of a batch scheduling policy, the jobs are grouped in batches and executed irrespective of the dynamic environment.
Beyond the aforementioned scheduler properties, the scheduling process can be further considered a family of problems with respect to different job models. These job models directly affect the scheduling policies and are inspired by the way systems are managed and the way parallel applications are written [96]. In such a model, job flexibility is an advanced partitioning method supported by the application (i.e., rigid, moldable, evolving, or malleable jobs) [96]. There is a difference between rigid and moldable/evolving/malleable jobs: for a rigid job, the number of CPUs assigned to the job does not change throughout its execution, whereas for a moldable/evolving/malleable job, the number of assigned CPUs is subject to change during execution. Another scheduler model concerns the supported level of preemption, i.e., preemptive or non-preemptive [96]. With preemption, tasks or entire jobs can be preempted and migrated during execution (e.g., gang scheduling), while in non-preemptive scheduling, the processors are dedicated to the job throughout its execution after allocation. Preemption can greatly improve system performance, but it may incur extra overheads, such as memory/communication costs due to migration and preemption [96]. Therefore, we consider a non-preemptive scheduling process in our experiments.
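To fix ideas before the detailed comparison, the sketch below simulates a simple non-preemptive batch scheduler over a queue of rigid jobs on a pool of identical CPUs; swapping the sort key switches between FCFS-, SJF-, and LJF-like priorities. The job tuples and resource pool are illustrative only and do not model the CCR system, backfilling, or the FF technique.

# Non-preemptive batch scheduling sketch over a pool of identical CPUs.
# Each job is (submit_time, num_cpus, run_time); the priority key decides
# the policy. Assumes every job fits in the pool (num_cpus <= total_cpus).
import heapq

def simulate(jobs, total_cpus, key):
    queue = sorted(jobs, key=key)          # scheduling event: order the batch
    free, running, clock, waits = total_cpus, [], 0.0, []
    for submit, cpus, run in queue:
        clock = max(clock, submit)
        while free < cpus:                 # wait until enough CPUs are freed
            end, released = heapq.heappop(running)
            clock, free = max(clock, end), free + released
        waits.append(clock - submit)       # queue time of this job
        heapq.heappush(running, (clock + run, cpus))
        free -= cpus
    return sum(waits) / len(waits)         # average queue time

jobs = [(0, 4, 100), (1, 1, 5), (2, 2, 50), (3, 4, 10)]
print("FCFS:", simulate(jobs, total_cpus=4, key=lambda j: j[0]))
print("SJF :", simulate(jobs, total_cpus=4, key=lambda j: j[2]))
print("LJF :", simulate(jobs, total_cpus=4, key=lambda j: -j[2]))

Even on this toy batch, ordering by the smallest estimated execution time reduces the average queue time relative to FCFS, which previews the behavior observed for SJF and MinET in the experiments that follow.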

3.4. Energy Efficient Strategy
Because the primary focus of data centers and computing farms in the cloud environment is high throughput and performance, energy efficiency is rarely considered at design time [97]. The US Environmental Protection Agency (EPA) reported that data centers consumed 1.5% of the total electricity sold in the USA in 2006, i.e., 61 billion kilowatt-hours (kWh) [98]. The energy consumption within data centers was estimated to double by 2011 (i.e., more than 100 billion kWh). According to [99], the worldwide power consumption of data centers doubled from 2000 to 2005; the author estimated that around 80% of this growth is attributable to servers, 10% to communication, and the remaining 10% to storage resources. A more recent report by Koomey [100], published in 2011, estimated that the rapid growth in data center electricity use that prevailed from 2000 to 2005 slowed significantly from 2005 to 2010, rather than doubling again. The authors of [99-101] identified the following major techniques for reducing energy consumption without much overhead [97, 99, 101]: (a) applying Dynamic Voltage and Frequency Scaling (DVFS), (b) applying Dynamic Power Management (DPM) or device reduction, (c) improving server, storage, and cooling efficiency, (d) multi-core processor designs, and (e) virtualization. The next sections briefly explain the DPM and DVFS techniques for power efficiency.

3.4.1. DPM
DPM is an effective and aggressive technique for power saving in which devices can be powered on/off dynamically [79]. Benini et al. [102] introduced the DPM methodology for dynamically reconfiguring electronic devices. The DPM technique can be applied to various power-hungry components, such as CPUs, storage disks, memory, servers, and network devices; all of these devices consume substantial power even under low or zero load. It has been reported in [103] that an idle machine consumes about two-thirds of the server's peak power. In addition, the average load on data center resources is only around 30% [104]. These facts allow placing around 70% of the resources in sleep mode to eliminate idle power consumption most of the time [79]. DPM can be applied to place processor cores in sleep mode [105], turn off banks of memory [54], and transition network devices to sleep [79, 106, 107]. Because the peak performance of system components is required only during a few time intervals, the components need not always be in the active state. For instance, virtual machine workload consolidation techniques are used to consolidate the workload on the least number of nodes so that the rest of the nodes can be powered off. Typical energy-efficient scheduling policies aim to: (a) consolidate the workload on the smallest set of computing resources and (b) increase the number of computing resources that can be transitioned to sleep mode [108]. The DPM technique may incur some overheads; for example, sleeping devices take considerable time to wake up, which consumes more energy and increases delay.
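A minimal rendering of the DPM idea is a timeout policy: a node that stays idle longer than a threshold is put to sleep, and waking it later costs extra time and energy. In the sketch below, the power and timing constants are assumptions for illustration (idle power is set near two-thirds of the 213 W peak of Table 1).

# DPM timeout-policy sketch: sleep a node after it has idled too long,
# and charge a wake-up penalty when work arrives for a sleeping node.
IDLE_POWER, SLEEP_POWER = 140.0, 10.0     # watts (idle ~ 2/3 of peak)
TIMEOUT, WAKE_TIME = 30.0, 5.0            # seconds (assumed)

def idle_energy(idle_seconds):
    """Energy (joules) spent during one idle period, without and with DPM."""
    no_dpm = IDLE_POWER * idle_seconds
    if idle_seconds <= TIMEOUT:           # too short: the node never slept
        return no_dpm, no_dpm
    with_dpm = (IDLE_POWER * TIMEOUT                      # idling until timeout
                + SLEEP_POWER * (idle_seconds - TIMEOUT)  # sleeping
                + IDLE_POWER * WAKE_TIME)                 # wake-up penalty
    return no_dpm, with_dpm

for idle in (20.0, 60.0, 600.0):
    base, dpm = idle_energy(idle)
    print(f"idle={idle:>5}s  no-DPM={base:>8.0f} J  DPM={dpm:>8.0f} J")

The output shows the trade-off discussed above: for short idle periods DPM saves nothing (and the wake-up delay remains), while for long idle periods the savings dominate the wake-up overhead.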

3.4.2. DVFS
DVFS for energy efficiency has become a popular research topic in the last decade [109-112]. The basic idea behind the DVFS technique is to adjust the clock frequency of the CPU through an appropriate supply voltage in order to reduce the CPU's energy consumption. DVFS is an example of an energy-proportional technique [107]; energy proportionality means consuming energy according to the workload, i.e., less workload should result in lower energy consumption. On the other hand, the CPU accounts for a shrinking fraction of total server power, only up to 25% in current systems [113]. DVFS can be applied at two levels [110]: (a) the behavioral level and (b) the system level. At the behavioral level, a unique supply voltage for the machine is determined at design time and remains constant at runtime, while at the system level, the supply voltage of the machine can vary at runtime. Dynamic-voltage machines offer better potential and more flexibility for reducing energy consumption. This chapter addresses the problem of energy-efficient job scheduling using the DVFS method at the behavioral level.
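The DVFS trade-off can be quantified with the standard CMOS approximation that dynamic CPU power scales as P ≈ C·V²·f while execution time scales as 1/f, so the dynamic energy of a fixed amount of work scales with V². The sketch below compares job time and energy at assumed voltage-frequency pairs; the constant C and the V-f pairs are illustrative, not measured values.

# DVFS energy sketch under the approximation P_dyn = C * V^2 * f, with
# execution time inversely proportional to frequency. Constants assumed.
def job_energy(work_cycles, freq_ghz, volt, c=1.0):
    power = c * volt**2 * freq_ghz          # dynamic power (arbitrary units)
    time = work_cycles / freq_ghz           # scaled execution time
    return power * time, time

for f, v in [(3.0, 1.30), (2.0, 1.10), (1.0, 0.90)]:   # assumed V-f pairs
    energy, time = job_energy(work_cycles=300.0, freq_ghz=f, volt=v)
    print(f"f={f} GHz, V={v} V -> time={time:.0f}, energy={energy:.1f}")

Under this model, lowering the frequency (and hence the voltage) cuts the energy roughly with V², but it lengthens the job, which is precisely why energy-efficient scheduling can degrade QoS metrics such as response time.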

3.5. Workload Characterization and Analysis
System performance is evaluated by considering the characteristics of the hardware and software components, as well as the workload the system processes [82]. Workload characterization helps in understanding the overall behavior of the system by highlighting the job arrival rate, job size, and job length. Major challenges include: (a) how to manage the system under different loads, (b) how to utilize the resources efficiently, (c) how to meet user demands, and (d) how to minimize the Total Cost of Ownership (TCO). These questions mandate that RPs select appropriate resource management techniques, such as job scheduling policies.

3.5.1. Dataset Information
We characterized a real data center workload from the Center for Computational Research (CCR) of the State University of New York at Buffalo to evaluate system performance. The data center is a collection of multiple computational resources clustered together through a communication infrastructure; such resources fall into two categories: (a) homogeneous and (b) heterogeneous. The resources in a homogeneous system are similar in size and capacity, so a job executes with the same capacity anywhere, whereas the resources in a heterogeneous system have different specifications. The workloads were collected during a 30-day period from February 20, 2009 to March 22, 2009. A total of 22385 jobs were executed on more than 1000 dual-processor nodes [28-30]. The complete specification of the data center is presented in Table 2.

Table 2 Full specification of the data center
Time duration           20 Feb. 2009 to 22 Mar. 2009
Total jobs run on DC    22385
Total distinct nodes    1045
Processor name          1056 Dell PowerEdge SC1425 nodes
Processor speed         3.0 GHz or 3.2 GHz
Peak performance        13 TFlop/s

The total offered load, which is the amount of queued workload over time, is shown in Figure 11. The offered load exceeds 100% on many days, which implies that sufficient resources are unavailable to complete the tasks given to the system at particular times.

Figure 11 Total offered load in a month per day (x-axis: days; y-axis: load in %).

3.5.2. Job Characteristics and Classification

A job is generated by a user and submitted to the system. The system in turn, according to its scheduling policy, allocates a number of processors meeting the demands of the job in question. In this section, we characterize the jobs from different perspectives, such as job arrival rate and job size. Figure 12 shows the total number of jobs arriving per hour over the 30-day cycle. From this figure we can make three observations: (a) there are fluctuations in the job arrival rate per hour, (b) the system experiences high job arrival rates in specific time intervals, and (c) the job arrival rate does not follow a uniform distribution over the hourly cycle.

Figure 12 Jobs arriving per hour.

In terms of the job size we make the following observations. The job size can be well explained in a 2D chart, with the y axis representing the number of processors and the x axis representing the execution time [114]. Therefore, we distribute the job size according to: (i) the job's width, representing the number of CPUs required by the job in question, and (ii) the job's length, representing the execution time of the respective job. These two perspectives are further classified into four categories: (a) Narrow, (b) Wide, (c) Short, and (d) Long. Specifically, regarding (i), a job requests either a single CPU (Narrow category) or an even number of CPUs (Wide category). Regarding (ii), the job's length, a job is executed within either an hour (Short category) or more than an hour (Long category). The above categories are derived from the aforementioned workload. Figure 13 shows the job distribution according to width (CPU requirement). Our analysis revealed that the jobs demand either a single CPU (i.e., Narrow jobs) or an even number of CPUs (i.e., Wide jobs). It is worth mentioning that the number of Narrow jobs is dominant, i.e., 79% of the total jobs, whereas 21% of the total jobs fall into the Wide category.

Figure 13 Jobs breakdown according to number of CPUs (1 CPU: 79%; 2~4 CPUs: 5%; 8~10 CPUs: 8%; 11~20 CPUs: 2%; 21~50 CPUs: 4%; 51~100 CPUs: 1%; > 100 CPUs: 1%).

Our next observation concerns the job length, i.e., the time for which a job executes. As Table 3 shows, almost half of the total jobs belong to the Short category, and the rest belong to the Long category.

Table 3 Breakdown distribution for job length

  Job length          No. of jobs   % of jobs
  < 1 hour (Short)    10428         46.58
  > 1 hour (Long)     11957         53.42
  Total               22385         100

For a more in-depth analysis of the workload, Figure 14 and Figure 15 present the breakdown of the jobs in the (a) Short and (b) Long categories. It can be observed that most of the Short jobs (around 86%) are executed within 18 minutes, while around 66.6% of the Long jobs are executed within 12 hours.

Figure 14 Breakdown distribution of the Short jobs (x-axis: minutes; y-axis: % of jobs).

Figure 15 Breakdown distribution of the Long jobs (x-axis: hours; y-axis: % of jobs).

We also analyzed the job size to address the correlation between job width and job length, as shown in Table 4. The correlation table reveals that Narrow jobs dominate all of the categories. Moreover, Short jobs with execution times between 11 and 20 minutes and Long jobs with execution times of more than 9 hours are also prevalent.

Table 4 Percentage breakdown for correlation between job width and length

  Job Length    1 CPU   2~24 CPUs   > 32 CPUs   Total jobs
  < 1 Min.       4.83     0.58        0.13        5.55
  2~10 Min.      7.60     0.90        0.38        8.88
  11~20 Min.    20.72     3.40        2.97       27.09
  21~60 Min.     3.73     1.07        0.27        5.07
  2~4 hours     13.23     0.67        0.68       14.57
  5~8 hours     18.43     0.15        0.16       18.74
  > 9 hours     10.73     8.27        1.10       20.11
  Total Jobs    79.27    15.03        5.70      100

In summary, our workload characterization revealed important job characteristics: the job arrival rate does not follow any clear trend, and the workload possesses heterogeneity in job size and resource requirements. Such heterogeneity in workloads dictates analyzing the effect of various scheduling policies in such scenarios.

3.6. Resource Allocation Job Scheduling Policies

Considering the scheduler properties discussed in Section 3.3, we simulate thirteen resource allocation job scheduling policies, briefly described as follows.

The FCFS is a simple and static job scheduling policy, where jobs are served in order of arrival. In this policy, a job can cause a long delay for subsequent jobs when the ready processors do not meet the requirements of the job in question [115]. The LJF and SJF policies order the batch of jobs (i.e., the meta-task) in decreasing and increasing order of job size (i.e., job width), respectively. Alternatively, the MinET and MaxET policies order the batch of jobs in increasing and decreasing order of job length, respectively [33]. The FF is an additional technique that enhances the capability of the above five policies: it finds a job in the shared ordered queue that fits the first available idle resources, in order to increase resource utilization.

A backfilling technique [115, 116] makes resource reservations for jobs in the queue and backfills subsequent jobs under the constraint that these jobs (i.e., Short jobs) may not violate the time reserved for previous jobs. There are two basic backfilling techniques: (a) aggressive and (b) conservative. The aggressive technique makes a reservation only for the first job in the queue, while the conservative technique makes reservations for all of the jobs contained in the shared queue. The aggressive backfilling policy (named EAZY), based on the FCFS scheduling policy, was developed for the IBM SP1 parallel supercomputer [115, 116]. In our work, we use the aggressive technique because it outperforms the conservative backfilling technique [116].

In the K-reserved based policy, each job in the queue has a counter recording the number of times it has been overtaken by subsequent jobs [117]. The K-reserved based policy works similarly to aggressive backfilling, with the difference that in the K-reserved based policy a job may be overtaken by backfilled jobs only K times, whereas the aggressive technique has no such limitation. The Window-K policy enhances the FCFS policy with a window of K consecutive jobs [117]. The window starts with the oldest waiting job and contains up to K successively arrived jobs. We set K = 5 for the K-reserved based policy and K = 10 for the Window-K policy, as suggested by the authors in [117]. The sketch below illustrates the orderings used by the size- and length-based policies.
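As an illustration, the four size- and length-based orderings can be expressed as comparators over the queued batch. This is a minimal sketch: the Job fields below are illustrative stand-ins for the trace attributes, not the simulator's actual schema.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative job record: 'width' is the requested CPU count (job width),
// 'estimatedRuntime' the user-estimated running time in seconds (job length),
// and 'submitTime' is used by the scheduling loop sketched in the next section.
class Job {
    final long submitTime;
    final int width;
    final long estimatedRuntime;

    Job(long submitTime, int width, long estimatedRuntime) {
        this.submitTime = submitTime;
        this.width = width;
        this.estimatedRuntime = estimatedRuntime;
    }
}

public class BatchOrderings {
    // SJF: smallest job first -> increasing job width.
    static final Comparator<Job> SJF = Comparator.comparingInt(j -> j.width);
    // LJF: largest job first -> decreasing job width.
    static final Comparator<Job> LJF = SJF.reversed();
    // MinET: minimum estimated execution time first -> increasing job length.
    static final Comparator<Job> MIN_ET = Comparator.comparingLong(j -> j.estimatedRuntime);
    // MaxET: maximum estimated execution time first -> decreasing job length.
    static final Comparator<Job> MAX_ET = MIN_ET.reversed();

    public static void main(String[] args) {
        List<Job> batch = new ArrayList<>(List.of(
                new Job(0, 32, 7200), new Job(5, 1, 600), new Job(9, 4, 60)));
        batch.sort(SJF); // reorder the queued batch before dispatch
        batch.forEach(j -> System.out.println(j.width + " CPUs, " + j.estimatedRuntime + " s"));
    }
}
```

The FF technique then simply scans the ordered queue and starts the first job that fits the currently idle resources.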


3.6.1. Experimental Setup

This section presents the simulation details for the set of scheduling policies. All of the policies are used to schedule the aforementioned workload in order to identify the best job scheduling policies for optimizing energy efficiency along with system performance. For the workload simulation, we developed a custom Java-based discrete event simulator. The Java environment allows database connectivity [31, 32], where we stored the dataset. In this event-based setup, the scheduling policies check the queue periodically (i.e., every 10 seconds [33]) and schedule the jobs accordingly, as sketched below.
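The following is a minimal sketch of that 10-second scheduling cycle, reusing the Job sketch above. Policy and Cluster are illustrative interfaces, not the simulator's real API; job-completion events and database I/O are omitted, and arrivals are assumed ordered by submit time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SchedulerLoop {
    static final long INTERVAL_SECONDS = 10; // the queue is scanned every 10 s [33]

    interface Policy { void order(List<Job> waitQueue); }  // e.g., SJF, LJF, MinET, MaxET
    interface Cluster { boolean tryStart(Job j); }         // true if enough idle CPUs exist

    static void simulate(Queue<Job> arrivals, Policy policy, Cluster cluster, long endTime) {
        List<Job> waitQueue = new ArrayList<>();
        for (long clock = 0; clock <= endTime; clock += INTERVAL_SECONDS) {
            // move jobs whose submit time has passed into the wait queue
            while (!arrivals.isEmpty() && arrivals.peek().submitTime <= clock) {
                waitQueue.add(arrivals.poll());
            }
            policy.order(waitQueue);               // apply the policy's ordering
            waitQueue.removeIf(cluster::tryStart); // FF-style: start every job that fits now
        }
    }
}
```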

3.6.2. Simulation Results

In this section, we analyze all of the scheduling policies considered in this work. We use four metrics to evaluate their performance: (a) mean queue-time, (b) mean response-time, (c) mean slowdown, and (d) slowdown ratio. It is worth noting that service providers are mainly concerned with the mean response-time and mean queue-time, while customers are concerned with the mean slowdown and the slowdown ratio. A job's queue-time is the time elapsed from the arrival of the job until its assignment to the allocated nodes. A job's response-time is the time elapsed from the arrival of the job until its finish [118]. Equations 1 and 2 are used to calculate the mean queue-time and mean response-time of the entire workload, respectively:

$$\text{Mean Queue Time} = \frac{\sum (start\_time - submit\_time)}{\text{Total number of jobs}},\qquad(1)$$

$$\text{Mean Response Time} = \frac{\sum (end\_time - submit\_time)}{\text{Total number of jobs}}.\qquad(2)$$

In the sequel, we define the rest of the metrics. The mean slowdown is the mean normalized time of each job (i.e., job completion time divided by job running time). The slowdown ratio is the normalized mean response-time [119], derived in Equation 3:

$$\text{Slowdown Ratio} = \frac{Time(mean\_response)}{Time(mean\_execution)},\qquad(3)$$

where Time(mean_execution) and Time(mean_response) (see Eq. 2) denote the mean running time and the mean response-time of the entire workload. For example, if the mean response-time of a set of jobs is 10 time units and the mean execution time of the jobs on the nodes is 5 time units, then the slowdown ratio is 2. The slowdown ratio is important for measuring the performance of a scheduling policy over the entire workload [118]. Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, and Figure 21 present the results of the aforementioned performance metrics (mean queue-time, mean response-time, summary of mean queue-time, summary of mean response-time, mean slowdown, and slowdown ratio, respectively) for the entire workload.

We further compare the performance metrics in terms of job size correlations. To see how job characteristics affect scheduling, we create four job observations based on the aforementioned job categories. Table 5 presents the classification as Short and Narrow (SN), Long and Narrow (LN), Short and Wide (SW), and Long and Wide (LW) jobs, and the results for mean queue-time, mean response-time, and mean slowdown are depicted in Figure 22-Figure 33 for each observation (Ob1-Ob4):

Job_width = 1 CPU AND Job_length ≤ 1 Hour     (Ob.1)
Job_width = 1 CPU AND Job_length > 1 Hour     (Ob.2)
Job_width > 1 CPU AND Job_length ≤ 1 Hour     (Ob.3)
Job_width > 1 CPU AND Job_length > 1 Hour     (Ob.4)
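For concreteness, the sketch below is a direct translation of Equations (1)-(3); the CompletedJob record and its fields are hypothetical stand-ins for the trace's timestamp columns (all in seconds).

```java
import java.util.List;

public class Metrics {
    record CompletedJob(long submit, long start, long end) {} // hypothetical trace row

    static double meanQueueTime(List<CompletedJob> jobs) {    // Eq. (1)
        return jobs.stream().mapToLong(j -> j.start() - j.submit()).average().orElse(0);
    }

    static double meanResponseTime(List<CompletedJob> jobs) { // Eq. (2)
        return jobs.stream().mapToLong(j -> j.end() - j.submit()).average().orElse(0);
    }

    static double slowdownRatio(List<CompletedJob> jobs) {    // Eq. (3)
        double meanExecution = jobs.stream()
                .mapToLong(j -> j.end() - j.start()).average().orElse(1);
        return meanResponseTime(jobs) / meanExecution;
    }
}
```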

Table 5 Observation table for the entire workload (percentage breakdown for the correlation between job width and job length)

  Job size             Short (S) ≤ 1 Hr   Long (L) > 1 Hr   Total
  Narrow (N), 1 CPU         36.88              42.39         79.27
  Wide (W), > 1 CPU          9.70              11.03         20.73
  Total                     46.58              53.42        100.00

Figure 16 Mean queue time (hours).

Figure 17 Mean response time (hours).

Figure 18 Summary for queue time.

Figure 19 Summary for response time.

Figure 20 Mean slowdown (minutes).

Figure 21 Slowdown ratio.

Figure 22 Mean queue time, Ob1 (hours).

Figure 23 Mean response time, Ob1 (hours).

Figure 24 Mean slowdown, Ob1 (minutes).

Figure 25 Mean queue time, Ob2 (hours).

Figure 26 Mean response time, Ob2 (hours).

Figure 27 Mean slowdown, Ob2 (minutes).

Figure 28 Mean queue time, Ob3 (hours).

Figure 29 Mean response time, Ob3 (hours).

Figure 30 Mean slowdown, Ob3 (minutes).

Figure 31 Mean queue time, Ob4 (hours).

Figure 32 Mean response time, Ob4 (hours).

Figure 33 Mean slowdown, Ob4 (minutes).

3.6.3. Discussions

This section discusses the results and conclusions of the above sections. We explored the job characteristics, such as (a) the large percentage of total jobs requesting a single CPU for execution and (b) the percentage of jobs requesting an even number of CPUs. Another remark concerning job running time is that almost half of the total jobs execute within an hour (Short jobs), while the rest require more than an hour (Long jobs). Moreover, the job arrival rate does not follow a uniform distribution. These findings exhibit the workload heterogeneity that may affect the various services offered by the system under consideration. Therefore, it is essential to compare and analyze different scheduling policies.

Various job scheduling policies for large-scale parallel computing systems are studied in this work. Some job scheduling policies produce results with overheads, and each policy possesses different characteristics. For example, FCFS produces better results with respect to fairness, but suffers from resource fragmentation: if a job demands a large number of CPUs and the system cannot serve it at that time due to unavailable CPUs, the job waits in the queue and prevents the next jobs from being executed. This happens even when the requirements of the next job could be met by the system. Consequently, this behavior increases the job queue time as well as the response time. Our analysis across all of the observations in the previous section reveals that the results of the FCFS policy are not satisfactory in terms of the job queue time and response time metrics. A solution to the processor fragmentation problem of FCFS, which additionally maintains fairness, is introduced by the backfilling techniques (i.e., conservative and aggressive [115, 116]).

However, taking into account the introduced job characteristics, other job scheduling policies may become superior to FCFS. For instance, with respect to job size (i.e., job width), LJF yields better solutions for Wide jobs, while SJF exhibits better results for Narrow jobs. Similarly, with respect to job running time (i.e., job length), the MaxET and MinET policies are well suited for Long jobs and Short jobs, respectively. The scheduling policies are evaluated under three different classes, as shown in Table 6. These classes divide the scheduling policies into three sets: Class-I, Class-II, and Class-III. The sets of scheduling policies in each class are as follows. Class-I includes four policies: MinET and SJF, as well as these policies combined with the FF technique. In Class-II, the FF technique is combined with FCFS, LJF, and MaxET; in the same class we also include the aggressive backfilling (EAZY) and K-reserved based (Max_Pri) policies. Finally, Class-III consists of four policies: FCFS, LJF, MaxET, and Window-K.

Table 6 Performance of job scheduling policies

  Policies                                     Performance   Rank
  MinET, MinET-FF, SJF, SJF-FF                 Best          1st Class
  FCFS-FF, LJF-FF, MaxET-FF, EAZY, Max_Pri     Average       2nd Class
  FCFS, LJF, MaxET, Window-K                   Worst         3rd Class

We found that the policies in Class-I produce better results compared to the policies in Class-II and Class-III, and the policies in Class-II are superior to the policies in Class-III. The reasons are explained in the following paragraphs.

The major reason that the SJF policy is superior to the rest of the policies in all of the figures is the large number of Narrow jobs. Regarding the MinET and MaxET policies, we have already highlighted that (a) MinET and MaxET take into account the user's estimated running time, and (b) half of the jobs are Long and the rest are Short. Because of these factors, MinET and MaxET might be expected to produce almost identical results. However, the results shown in Figure 16-Figure 21 reveal that this does not hold, which is explained by the fact that the number of Narrow jobs is much larger than that of Wide jobs. Because MinET and MinET-FF benefit from this fact, they yield better solutions than MaxET and MaxET-FF.

Figure 22-Figure 27 present the results for Narrow jobs. These jobs account for 36% of the total workload in Figure 22-Figure 24 and 42% in Figure 25-Figure 27. In Figure 22-Figure 24 the jobs belong to the SN category, while in Figure 25-Figure 27 they belong to the LN category. In both observations, because the policies in Class-I schedule the jobs according to smallest job first and minimum execution time first, the policies of Class-I dominate the policies of the other classes. The results for Wide jobs (i.e., SW and LW) are shown in Figure 28-Figure 30 and Figure 31-Figure 33, with job percentages of 10% and 11%, respectively. In Figure 28-Figure 30, the scheduling policies of both Class-I and Class-II exhibit better results compared to the policies in Class-III. Figure 31-Figure 33 show that the results are almost the same for all of the scheduling policies.

3.7. Energy Efficient Job Scheduling Policies

In the previous section, we evaluated several resource allocation policies and categorized them based on their performance. As the growing power consumption of large-scale parallel systems is a dominant part of the user's and owner's budget, an energy efficient scheduling policy is needed together with a resource allocation policy [120]. In this section, we incorporate energy efficiency into three policies, one selected from each class discussed in Section 3.6. In the following sections, we discuss the power model, scheduling criteria, and problem statement, and then we explain our proposed energy efficient policies along with their results and discussion.

3.7.1. Energy Model

In this work we use an energy model for DVFS that is based on the power consumption model of CMOS (complementary metal-oxide semiconductor) logic circuits [121]. The machine $M_i$ consumes capacitive power $P_{ijk}$ for computing task $T_{jk}$. The power is calculated as follows:

$$P_{ijk} = A \cdot C \cdot v^2 \cdot f,\qquad(1)$$

where $A$ represents the number of switches per clock cycle, $C$ represents the total capacitance load, $v$ represents the supply voltage, and $f$ represents the machine frequency. The number of switches per clock cycle and the total capacitance load are determined at design time [121], while the combined reduction of the clock frequency and supply voltage is the essence of the dynamic power management technique called DVFS. Brown et al. [98] assumed that the frequency of each machine is proportional to its processing speed. The energy consumed by a machine can therefore be reduced by decreasing its frequency and supply voltage.

We assume that each machine in the data center is equipped with a DVFS module [103] and belongs to one specific machine class. Table 7 shows the configuration of the different classes, which are created in such a way that the total computation size matches the original setup. For instance, in the original setup the total number of CPUs was 2090 with 3.0 GHz speed (i.e., the total computational size was 6270.0 GHz). The parameters for the DVFS levels are presented in Table 8, which shows the frequency and voltage values of the twelve DVFS levels of the three machine classes [120]. Each class of machines has a different computing capacity (i.e., number of computational cycles per second).

Table 7 CPU classes

  Machine     Total   Speed in MHz   Computation Size
  Class-I     1175    3000           3525000
  Class-II    520     3600           1872000
  Class-III   220     4000           880000
  Total       1915    10600          6277000

Table 8 DVFS module

  DVFS Level   Freq.   Class-I Volt.   Class-II Volt.   Class-III Volt.
  0            1.0     1.3             1.9              2.5
  1            0.9     1.2             1.7              2.3
  2            0.8     1.1             1.5              2.1
  3            0.7     1.0             1.3              1.9

When the voltage is lowered, the operating frequency of the machine decreases and the computational time of a task executed on the machine increases. We apply DVFS at the behavioral level, where the supply voltage of a processing unit is constant for a task during its execution but may differ between tasks. The sketch below illustrates the power model over these DVFS levels.
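To make the model concrete, the following sketch evaluates Equation (1) across the DVFS levels of Table 8 (Class-III voltages are used as an example). Since A and C are design-time constants [121], they are folded into a single illustrative factor here; only the relative power between levels of the same machine class is meaningful.

```java
// Relative CPU power per DVFS level using Equation (1): p = A * C * v^2 * f.
public class DvfsPower {
    static double power(double activityTimesCapacitance, double voltage, double normalizedFreq) {
        return activityTimesCapacitance * voltage * voltage * normalizedFreq;
    }

    public static void main(String[] args) {
        double[] freq = {1.0, 0.9, 0.8, 0.7};         // normalized frequency per level (Table 8)
        double[] voltClassIII = {2.5, 2.3, 2.1, 1.9}; // Class-III voltages (Table 8)
        double ac = 1.0;                              // A*C folded into one constant (assumption)
        for (int level = 0; level < freq.length; level++) {
            System.out.printf("level %d: relative power %.2f%n",
                    level, power(ac, voltClassIII[level], freq[level]));
        }
    }
}
```

Under these numbers, stepping from level 0 down to level 3 cuts the dynamic power to roughly 40% of its peak (from 6.25 to about 2.53 in relative units), at the cost of a 30% lower clock frequency.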

3.7.2. Energy Efficient Criteria

Each job is a collection of tasks and has a deadline and a workload; all tasks of a job share the same deadline and workload values. The scheduler randomly generates a deadline for each job in the meta-task event. The randomly generated deadline must lie between the running time of the task at the maximum power level of the highest-class machine and the running time of the task at the minimum power level of the lowest-class machine. A load $wl$ of task $T_{jk}$, expressed in Millions-of-Instructions (MI), executes on a machine $M_i$ with computing capacity $cc$ expressed in Millions-of-Instructions-Per-Second (MIPS). We use the job running time from the previous setup as the job's computational cycles. The overall workload $WL$ and computing capacity $CC$ are denoted by $WL = [wl_1, \ldots, wl_n]$ and $CC = [cc_1, \ldots, cc_n]$. The Estimated Time for Completion (ETC) of task $T_{jk}$ executing on machine $M_i$ is obtained by dividing the workload $wl_{jk}$ by the computing capacity $cc_i$. It is denoted by $ETC_{ijk}$ ($i \in m, j \in n, k \in o$) and calculated as follows:

$$ETC_{ijk} = \frac{wl_{jk}}{cc_i}\qquad(2)$$

For a schedule $S$ of task-machine pairs $(T_{jk}, M_i)$, the completion time and the energy consumption of task $T_{jk}$ on machine $M_i$ must be minimized without violating the task's deadline. Suppose we are given the processing units and a set of jobs, each consisting of a set of tasks. Each job must be placed on processing units such that the job's constraints (i.e., number of tasks and deadline) are fulfilled: the running time of the job on the processing units must be less than its deadline. Moreover, each job and its tasks should be placed, as far as possible, on the minimum power level of the lowest-class machine so that energy efficiency is achieved.

3.7.3. Problem Statement

This work addresses the following main scheduling objectives:
 Job placement must meet the job's requirements (i.e., each job needs a certain number of CPUs to execute).
 Minimize the total energy consumed by the processing units and the makespan (finishing time of the latest job).

The energy consumption objective is expressed as:

$$\text{minimize} \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij},\qquad(3)$$

where $p$ is the power, $i$ indexes a machine, and $j$ indexes a task.

3.7.4. Proposed Energy Efficient Policies

Section 3.6 established the performance of each job scheduling policy in the three classes. From each class, one scheduling policy has been selected and modified with the DVFS technique in order to analyze the behavior of the job scheduling policy under the specific workload characteristics. Selecting one policy each from the best, average, and worst classes helps to analyze the energy efficiency strategy. For the energy efficient policies, we use the same workload described in Section 3.5.1, which was simulated for the resource allocation policies (i.e., without the DVFS technique). The modified policies that incorporate energy efficiency are: (1) the best performer SJF, (2) the average performer LJF-FF, and (3) the worst performer LJF. Their names, when combined with the energy efficient technique, become: (a) SJF energy efficient, SJF-EE, (b) LJF energy efficient, LJF-EE, and (c) LJF-FF energy efficient, LJFFF-EE.

Algorithm 1 shows the pseudo-code for SJF-EE. To emulate a dynamic, realistic environment in our setup, the jobs submitted to the system are queued, and the scheduling policy checks the queue periodically (i.e., at a 10-second time interval). In each time interval, the SJF-EE policy sorts the machines in increasing order of their power consumption (Line 3), and then Line 4 sorts the jobs in increasing order of their CPU demand (i.e., number of tasks). For the placement of a job on the machines, each job in the queue is examined such that (a) the number of tasks in the job must be less than or equal to the number of ready machines in some machine class (Line 6), (b) the job is preferably placed on the lowest machine class (Line 7), and (c) the job's estimated time for completion must be less than or equal to the job's deadline (Line 9). Consequently, the energy consumption of the system is minimized with a small running-time overhead.

Algorithm 1. Smallest Job First Energy Efficient (SJF-EE)
Input: Processing units, initialized to their power levels using DVFS = dvfs_1, dvfs_2, ..., dvfs_n, and jobs J_i (i = 1, ..., n)
Output: Each job with its tasks mapped on the machines with the minimum possible energy and makespan cost.
1.  Set loop_interval = start_time
2.  while batch_interval do
3.      Sort the ready machines M_j (j = 1, ..., m) in increasing order of their power consumption (i.e., lowest class first, then minimum level)
4.      Sort the submitted jobs J_i (i = 1, ..., n) in increasing order of their CPU demand, i.e., number of tasks T_ik (k = 1, ..., p)
5.      for each J_i do
6.          if number of ready machines m in any class >= number of tasks p in J_i then
7.              for each power_level do
8.                  Calculate the job's estimated time for completion, ETC
9.                  if ETC_ij <= J_i.deadline then
10.                     Place the job on the machine
11.                     break
12.                 end if
13.             end for
14.         end if
15.     end for
16.     loop_interval = loop_interval + 10 seconds
17. end while
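The following is a condensed Java sketch of one scheduling interval of Algorithm 1, under the simplifying assumption that each (machine, power level) pair is flattened into one Machine entry sorted by power; Machine, EeJob, and the bookkeeping are illustrative, not the simulator's actual structures.

```java
import java.util.Comparator;
import java.util.List;

public class SjfEeInterval {
    record Machine(double mips, double powerPerSecond) {}                // one (cpu, DVFS level)
    record EeJob(int tasks, double workloadMi, double deadlineSeconds) {}

    static void scheduleInterval(List<Machine> readyMachines, List<EeJob> queue) {
        // Line 3: lowest-power configurations first
        readyMachines.sort(Comparator.comparingDouble(Machine::powerPerSecond));
        // Line 4: smallest jobs (fewest tasks) first
        queue.sort(Comparator.comparingInt(EeJob::tasks));
        for (EeJob job : queue) {
            for (Machine m : readyMachines) {              // lines 6-7: try low power first
                double etc = job.workloadMi() / m.mips();  // line 8: ETC = wl / cc, Eq. (2)
                if (etc <= job.deadlineSeconds()) {        // line 9: deadline respected?
                    // line 10: place the job's tasks here (capacity bookkeeping omitted)
                    break;
                }
            }
        }
    }
}
```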

The other modified energy efficient scheduling policies, LJF-EE and LJFFF-EE, operate in the same way as SJF-EE. The differences are that LJF-EE sorts the jobs in decreasing order of their CPU demand, and LJFFF-EE additionally finds jobs that fit the first idle machines in order to increase resource utilization (as described in Section 3.6).

3.7.5. Results and Discussion of the Energy Efficient Policies

In this section, we present the results produced by the proposed energy efficient scheduling policies. We consider two metrics to evaluate their performance: (a) the total energy consumption of the system and (b) the total running time (i.e., makespan) of the entire workload, shown in Figure 34 and Figure 35, respectively. The results reveal that LJFFF-EE consumed less energy than the others, but its overall running time (makespan) increased. SJF-EE, the modified version of SJF that ranked in the first class in the previous experiment (see Section 3.6), does not produce satisfactory results in terms of either the total energy consumption or the total runtime of the entire workload. On the other hand, LJF-EE, the extended LJF policy, consumes less energy than SJF-EE and also minimizes the overall running time compared to the other policies. The reason LJFFF-EE provides the best performance in terms of low energy consumption is that in each scheduling batch interval the Wide jobs get the chance to execute on the low-performance machines. At the same time, LJFFF-EE increases the overall running time because the scheduler tries to place the jobs on the low power levels of the machines, which increases the job execution times. The same considerations explain the unsatisfactory results of SJF-EE, which increases both the energy consumption of the system and the overall running time of the workload.

Figure 34 Energy sum (mJ).

Figure 35 Runtime sum.

After obtaining the results for the entire workload, we further analyze the workload for Narrow and Wide jobs (discussed in Section 3.5.2). The results are shown in Figure 36-Figure 37 and Figure 38-Figure 39 for Narrow jobs and Wide jobs, respectively. As the primary goal of the energy efficient technique in a job scheduling policy is to minimize the power consumption with a reasonable running-time overhead, LJF-EE produces better results than the other policies here. In both scenarios, Narrow and Wide jobs, LJF-EE consumed less energy with little running-time overhead, as shown in both pairs of figures, whereas SJF-EE did not produce better results than the other policies in either scenario. From the above evaluation of the energy efficient policies, we found that LJF-EE provides affordable results in terms of both low energy consumption of the system and minimum running time of the entire workload.

Figure 36 Energy sum (mJ) of Narrow jobs.

Figure 37 Runtime sum of Narrow jobs.

Figure 38 Energy sum (mJ) of Wide jobs.

Figure 39 Runtime sum of Wide jobs.

In the above sections, we thoroughly discussed the characterization and analysis of a real data center workload and several resource allocation job scheduling policies, and then modified selected policies for energy efficiency. We found that most of the job scheduling policies are affected significantly by certain workload characteristics. For instance: (a) due to the large number of Narrow jobs, MinET and SJF, when combined with the FF technique, exhibit better performance compared to other policies; (b) the large number of small jobs in the workload prevents the MaxET policy from producing results comparable to the MinET policy under these job characteristics; and (c) the Wide jobs getting the chance to execute on low-performance machines at low power levels give LJF-EE the best performance in terms of energy consumption.

3.8. Summary

This chapter is a comparative study of resource allocation and energy efficient job scheduling strategies in large-scale parallel computing environments. Because the growing cost of power and cooling in such environments is a dominant part of the user's and owner's budget, resource allocation job scheduling techniques must consider energy efficiency while maintaining the desired QoS constraints. However, when energy efficiency is considered together with QoS metrics such as queue time and response time, the job scheduling problem becomes more difficult to design for. In this chapter, we studied a total of thirteen job scheduling policies and analyzed and compared their results. The set of job scheduling policies considered in this work includes (a) priority-based, (b) first fit, (c) backfilling, and (d) window-based techniques. We used three metrics (queue time, response time, and slowdown ratio) to evaluate the performance of the job scheduling policies. All of the policies were extensively simulated and compared using a real data center workload exhibiting a wide range of job heterogeneity. Based on their performance, we then incorporated energy efficiency into three job scheduling policies. We found that most job scheduling policies are affected significantly by certain workload characteristics: (a) due to the large number of Narrow jobs, MinET and SJF, when combined with the FF technique, exhibit better performance compared to other policies; (b) the large number of Short jobs in the workload prevents the MaxET policy from producing results comparable to the MinET policy; and (c) the Wide jobs getting the chance to execute on low-performance machines at low power levels help LJF-EE to produce the best results in terms of energy consumption. Our analysis revealed that a single policy is not sufficient for resource management in parallel computing environments; such environments need dynamic and adaptive scheduling policies.

Having simulated job scheduling strategies on large-scale parallel environments, in the next chapter we provide a comprehensive study of VM scheduling strategies for HPC workloads in the cloud environment. Simulation and experimental results of six VM scheduling policies will be presented, all incorporated with the dynamic voltage and frequency scaling (DVFS) power management technique.


Chapter. 4

Job Scheduling Strategies for HPC Clouds

In virtualized environments, virtual machine (VM) scheduling strategies incorporating energy efficient techniques are needed to reduce the operational cost of the system while delivering high Quality of Service (QoS). It is widely accepted that the cost of energy consumption in such environments is a dominant part of the owner's budget. However, when energy efficiency is considered, VM scheduling decisions become more constrained, leading to violations of job deadlines and hence compromised QoS. This chapter studies energy efficient VM scheduling strategies in virtualized environments that minimize the queue time and makespan while fulfilling the SLA requirements (i.e., deadlines). Specifically, six energy efficient VM scheduling strategies, incorporated with the dynamic voltage and frequency scaling (DVFS) power management technique, are investigated; they consist of user-oriented and system-oriented policies. The strategies are extensively simulated and compared under three power management governing methods provided at the hypervisor level (i.e., userspace, ondemand, and performance). To conduct the simulation experiments, we employ real-world high performance computing (HPC) workloads collected from a production data center. For comparison and evaluation, we analyze: (a) the energy consumption, (b) the runtime, (c) the queue time, (d) the makespan, and (e) the slowdown ratio. Lastly, we highlight the strengths and weaknesses of the VM scheduling strategies, which can help in choosing the most appropriate VM scheduling strategy for a given scenario.

4.1. Introduction

Recently, virtualization has become more and more attractive for low-budget environments, such as scientific organizations, to solve large problems involving high performance computing (HPC) workloads. In the last decade, various scientific organizations have employed supercomputers to carry out research at the expense of a huge amount of their budget [78]. Consequently, cloud computing has emerged as an alternative virtualized environment that provides parallel computation at low cost, available to customers on a pay-per-use basis [34]. The bottom level of cloud computing is structured as thousands of servers, called a computing farm or data center. Due to its importance and usability in our daily life, the number of cloud resource providers (RPs) is gradually increasing. Cloud RPs offer a variety of services, including computation power and storage space, to customers.

3 A paper based on part of Chapter 4 has been submitted for review to an SCI journal, SCIENCE CHINA Information Sciences, ISSN 1674-733X, under the title "Energy efficient VM scheduling strategies using dynamic voltage scaling for HPC workload in cloud data center".


In the aforementioned environments, users submit their requests for virtual machine (VM) resources to cloud RPs; the VM resource scheduling strategy is hidden from the users. Such requests may be computation-intensive, data-intensive, or mixed (i.e., both computation- and data-intensive), the latter called complex jobs [79]. Moreover, such requests may carry different quality of service (QoS) requirements in the form of service level agreements (SLAs), including a maximum response time, a minimum queue time, and fulfillment of the job deadline constraint. Furthermore, the cost of energy consumption in a cloud data center is a dominant percentage of the owner's budget. Because of the above considerations, a main concern of cloud RPs is the energy efficiency of the underlying cloud data center.

The various components of the servers deployed in cloud data centers consume different amounts of power for their operations. For example, the power consumption at peak utilization of the major components of a single server is described in Table 9 [80]. The table reveals that the processor consumes a significant part of the power within a server.

Table 9 The peak power of the components of a typical server

  Power Elements    Processor   RAM   Storage   PCI   M. board   Fan
  Power (W)         40          9     12        25    25         10
  Total Count       2           4     1         2     1          1
  Total Power (W)   80          36    12        50    25         10

Due to the above factors, cloud RPs pay considerable attention to resource management in order to fulfill the required QoS [40] and to reduce the operational cost. A vast amount of research has focused on resource management that optimizes system performance while considering energy efficiency and various QoS constraints. A resource scheduling strategy is one of the major components of resource management in cloud data centers. However, it is hard to design a resource scheduling strategy when many objectives, i.e., non-violation of the QoS, fulfillment of the SLA requirements, and energy efficiency, are considered [29, 79]. In a cloud computing environment, the primary concern of cloud RPs is to efficiently provide resources to a huge number of customers while maintaining different QoS levels [40].

The contribution of this chapter is three-fold: (a) comparison and analysis of a set of VM scheduling strategies for HPC workloads to evaluate the performance of virtualized environments comprised of DVFS-based heterogeneous processing elements; (b) the design of VM scheduling strategies that achieve energy efficiency while meeting QoS constraints; and (c) analysis of the power governor schemes (i.e., userspace, ondemand, and performance) provided at the hypervisor level. The analysis of energy efficient VM scheduling strategies can help in choosing the best VM scheduling strategy for a given power governor scheme. We employ the HPC workloads of a production data center for the experimental evaluation.

The workloads have been fully characterized and analyzed in our previous work [30, 39], and the results revealed that they possess heterogeneity in resource requirements (i.e., job width and height). It is therefore reasonable to analyze the effect of various energy efficient VM scheduling strategies on HPC workloads. Specifically, we study a total of six energy efficient VM scheduling strategies: four size-based policies, one first-serve-based policy, and one backfilling-based policy. The size-based energy efficient VM scheduling strategies are: (a) Large Job First (LJF-EE), (b) Small Job First (SJF-EE), (c) Minimum Millions of Instructions (MinMI-EE), and (d) Maximum Millions of Instructions (MaxMI-EE). The first-serve-based energy efficient policy is the well-known First Come First Serve (FCFS-EE) strategy. The size-based and first-serve-based energy efficient VM scheduling strategies are combined with the First Fit (FF) technique. Lastly, we consider the energy efficient Aggressive Backfilling First (ABF-EE) technique as the backfilling-based policy, which works in conjunction with the FCFS policy.

The major highlights and insights of this chapter are: (a) all of the strategies under the ondemand power management scheme are more energy efficient than the strategies under the userspace power management scheme; (b) LJF-FF-EE and MaxMI-FF-EE consume less energy but degrade the mean slowdown and the slowdown ratio; (c) the energy efficient SJF-FF-EE and MinMI-FF-EE VM scheduling strategies exhibit better performance in terms of the mean queue time compared to the rest of the heuristics; and (d) the energy efficient backfilling VM scheduling policy (i.e., ABF-EE) produces the smallest makespan under the studied power management schemes compared to the other policies.

The rest of this chapter is organized as follows: Section 4.2 presents related work on energy efficient strategies. In Section 4.3, we explain the system model and problem statement, while in Section 4.4 we introduce the VM scheduling strategies for energy efficiency. Section 4.5 states the experimental settings and simulation results of the various VM scheduling strategies together with their discussion. Finally, in Section 4.6, we conclude the chapter.

4.2. Related Work

In this section, we present the state of the art of energy efficient resource scheduling strategies. Next, we explain the most popular energy efficient techniques that are incorporated into resource scheduling strategies to achieve different objectives.

A vast amount of research, such as [30, 39, 83, 122-129], has focused on resource scheduling strategies that address the problem of system utilization under different QoS constraints. Tang et al. [83] introduced a metric-aware job scheduling strategy, where different scheduling objectives, i.e., waiting time, fairness, and system utilization, are balanced by the scheduler in large-scale parallel computing environments. Khan et al. proposed optimization techniques using goal programming and game theoretical methodologies in [86, 87] for the joint optimization of performance and power consumption in data centers. Tziritas et al. addressed the problem of virtual machine placement to minimize both energy consumption and network load in cloud environments through application-aware workload consolidation in [122, 123, 130]. Beloglazov et al. [124] adopted the best fit decreasing (BFD) approach and introduced an energy-aware resource allocation heuristic called MBFD for the efficient management of data centers in cloud computing. The MBFD technique consolidates VMs onto the minimum number of physical machines, switching off the idle physical machines to reduce power consumption [124]. Fu and Zhou [125] proposed a novel VM placement policy that prefers placing a migratable VM on the host with the minimum correlation coefficient. Mansouri [126] developed a job scheduling policy called hierarchical job scheduling strategy (HJSS) to reduce the search time for an appropriate computing node. Nesmachnow et al. [127] studied eight scheduling heuristics for mapping VMs in cloud systems and provided an analysis and evaluation of list scheduling heuristics that maximize the broker profit. Villegas et al. [128] compared the performance of eight provisioning and four allocation policies in three cloud environments, including Amazon EC2. Von Laszewski et al. [129] presented the design and implementation of VM scheduling in a DVFS-based virtualized cluster.

In this chapter, we address a similar problem of system performance through a comprehensive study of energy efficient VM scheduling strategies for HPC workloads in cloud computing. Different from past studies, we provide an experimental investigation of energy efficient VM scheduling strategies in the context of the power management governing schemes provided at the hypervisor level, and we design a DVFS-based energy efficient model for VM scheduling strategies. Furthermore, while various studies used publicly available workloads to evaluate the performance of resource scheduling strategies, most of them made assumptions about the nature of the workloads to represent a specific real system in their experiments. On the contrary, we consider workloads collected from a real data center, which is closer to real scenarios. Our study of a set of energy efficient scheduling strategies analyzes and compares the simulation results and highlights their performance.

4.2.1. Energy Efficient Techniques

Koomey [99] observed that the worldwide energy consumption of data centers doubled from 2000 to 2005. In the same study, the growth in energy consumption was estimated at around 80%, 10%, and 10% for servers, communication devices, and storage resources, respectively. Koomey [100] later reported in 2011 that the growth in energy consumption slowed significantly, rather than doubling, over the years 2005 to 2010. This is attributed to the following factors: dynamic voltage and frequency scaling (DVFS), dynamic power management (DPM), cooling-efficient strategies, and virtualization. In the following sections, we describe the two major energy efficient techniques, i.e., DPM and DVFS, which can be incorporated into VM scheduling strategies.

4.2.1.1. DPM

In the DPM energy saving technique, the power elements of a server, such as the processor, memory, storage, and communication devices, can be powered on/off dynamically [79]. Even when the power elements of a server are idle, they still consume about 2/3 of the peak load of the server in question [103]. To eliminate this power consumption, idle resources can be put in sleep mode most of the time [79]. In particular, through the DPM power saving technique, the processor cores and the network devices can be put in sleep mode [79, 105, 106], while banks of memory can be turned off [54]. Furthermore, in virtualized environments, the workload can be consolidated on the least number of servers, while powering off the rest [122-124]. A drawback of the DPM technique is that devices in sleep mode take time to wake up, which adds delay and consumes substantial power.

4.2.1.2. DVFS

The DVFS energy saving technique is an energy proportional technique applied to a processor, adjusting the processor frequency through an apt supply voltage [39, 110, 113, 131]. With DVFS, the energy consumption of the processor depends on the workload. DVFS can be applied at two levels of voltage supply [110], i.e., the behavioral and the system level.

 Behavioral level: DVFS determines a supply voltage that remains constant at runtime.

 System level: DVFS varies the supply voltage of the processor at runtime.

DVFS provides a satisfactory solution for reducing the energy consumption of the processor. In this chapter, we study DVFS at the behavioral level, incorporated with various energy efficient VM scheduling strategies for HPC workloads in cloud computing.

4.3. System Model

In this section, we describe the data center model, VM resource model, energy model, and problem statement considered in this work. Table 10 provides a nomenclature of the important notations used in this chapter.

65

Table 10 Nomenclature

  Notation       Description
  PM             Heterogeneous physical machine
  DVFS           Dynamic voltage and frequency scaling
  MIPS           Millions-of-Instructions-Per-Second
  MI             Millions-of-Instructions
  SJF-FF-EE      Energy efficient small job first policy that schedules jobs on a first fit basis
  LJF-FF-EE      Energy efficient large job first policy that schedules jobs on a first fit basis
  MinMI-FF-EE    Energy efficient minimum MI job first policy that schedules jobs on a first fit basis
  MaxMI-FF-EE    Energy efficient maximum MI job first policy that schedules jobs on a first fit basis
  FCFS-FF-EE     Energy efficient first come first served policy that schedules jobs on a first fit basis
  ABF-EE         Energy efficient aggressive backfilling policy
  c_i            i-th cpu
  CC             Total computation capacity of the data center
  U_j            User's request under consideration
  T_jk           k-th task contained in the j-th request
  VM_r           Virtual machine
  P(X)           Total energy consumption within a data center by applying X
  f_ix           Frequency of the i-th cpu under the x-th DVFS level
  v_ix           Supply voltage of the i-th cpu under the x-th DVFS level
  wl_jk          Computation workload of T_jk
  WL             Total computation workload
  ETC_ijkx       Estimated time for completion on the i-th cpu

4.3.1. Data Center Model

In cloud computing, a data center typically comprises a large number of physical machines (PMs) that are homogeneous, heterogeneous, or mixed. In this study, we focus on a private cloud dedicated to HPC applications that is comprised of a number of heterogeneous DVFS-enabled PMs. The PMs in the dedicated HPC cloud can be divided into different energy-wise categories according to computing capacity. Table 11 shows the configuration of the three PM classes used in our simulation; each PM class has a different computing capacity (i.e., number of computation instructions per second).

Table 11 PM classes

  PM          Total (PMs)   Cores (cpu)   Speed (MIPS)   Computation Size in GHz
  Class-I     45            16            3.6            2592
  Class-II    85            8             3.3            2244
  Class-III   120           4             3.0            1440
  Total       250           28            9.9            6276

Table 11 shows that a PM in each category contains a number of cores, called cpus. Each core in a PM has equal computation capacity in terms of millions-of-instructions-per-second (MIPS). The total computing capacity of the data center comprised of heterogeneous PMs is expressed as follows:

$$CC = \sum_{i=1}^{n} f_{i1},\qquad(1)$$

where $f_{i1}$ denotes the computation capacity of the $i$-th cpu at full frequency, with $n$ signifying the total number of cpus within the system. The total computation size of the data center is chosen such that it matches the original setup simulated in our previous work [30, 39]. Because virtualization is a convenient environment for the targeted applications, supporting low-cost deployment, acceptable input/output (I/O) overheads, etc., it is assumed that the original setup would be shifted onto a private virtualized cloud data center dedicated to HPC workloads. We simulated the original setup of the Center for Computational Research (CCR) data center of the State University of New York at Buffalo (see Subsection 4.5.2), which has been used for scientific computing on large-scale parallel computing systems [29, 30, 39]. The original setup contains a total of 2090 CPUs with 3.0 GHz speed (i.e., a total computational size of about 6270.0 GHz) [30, 39]. We assume that each PM of the underlying data center is interconnected with a high-speed LAN network and a high-bandwidth link to the Internet.

4.3.2. VM Resource Model

In this chapter, we consider the resource model of the virtualized environment in the following way. The resource management component, i.e., the hypervisor or virtual machine monitor (VMM), manages the pool of provisioned VMs on the system and receives a set of requests from users. These incoming requests are preconfigured in VM templates provided by the hypervisor or VMM. More precisely, the request of the j-th user is denoted by $U_j$ ($j = 1, \ldots, m$), with $m$ signifying the total number of user requests. A request $U_j$ may contain a single or multiple tasks, $U_j = T_{jk}$ ($k = 1, \ldots, o$), with $T_{jk}$ denoting the k-th task contained in the request of the j-th user, and $o$ denoting the total number of tasks contained in the request under consideration. Each of these tasks requires execution on a single VM provisioned for a given time period, as shown in Figure 40.

Figure 40 A Gantt chart for a user's VM request scheduled on cores (cpus).

Furthermore, each VM is preconfigured on a single core, whose computation capacity depends on the energy-wise PM class. We assume that all of the tasks contained in a user request are assigned the same execution time and are executed independently (i.e., there is no communication between tasks). When a request $U_j$ is submitted to the VM resource management component, $o$ corresponding VMs are started to execute the $o$ tasks of the request $U_j$ on certain compute PMs. A VM is scheduled for execution if the QoS requirements of the corresponding request are met. The QoS requirements considered in this work are the cpu frequency, the computation capacity, and the job's deadline. A VM is defined in the following way:

$$VM_r = (c_i, T_{jk}),\qquad(2)$$

where $VM_r$ is the r-th virtual machine and $c_i$ is the cpu that $VM_r$ is assigned to. The sketch below illustrates these types.
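The following is a minimal sketch of the types behind this model; the names are assumptions for illustration, not the simulator's API. The Vm record mirrors Equation (2), pairing a started VM with the core it is pinned to.

```java
import java.util.ArrayList;
import java.util.List;

public class VmModel {
    record Task(int userId, int taskId, double workloadMi, double deadlineSec) {} // T_jk
    record Core(int id, double mipsAtFullFrequency) {}                            // c_i, one cpu of a PM
    record Vm(Core core, Task task) {}                                            // VM_r = (c_i, T_jk), Eq. (2)

    // Starting a request U_j spawns one VM per task, each on its own core;
    // if there are fewer idle cores than tasks, the request stays queued.
    static List<Vm> startRequest(List<Task> requestTasks, List<Core> idleCores) {
        if (idleCores.size() < requestTasks.size()) return List.of();
        List<Vm> vms = new ArrayList<>();
        for (int k = 0; k < requestTasks.size(); k++) {
            vms.add(new Vm(idleCores.get(k), requestTasks.get(k)));
        }
        return vms;
    }
}
```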

4.3.3. VM Workload Criteria

Each task executing on a VM has a computation workload $wl_k$ and a deadline $d_k$ constraint. A deadline $d_k$ is randomly generated such that it lies between the time needed to execute the task in question at the maximum cpu frequency of the highest-energy PM and at the minimum cpu frequency of the lowest-energy PM. We assume that all of the tasks contained in a user request have the same deadline. The deadline constraint is a QoS requirement provided by the user. The computation workload $wl_{jk}$ denotes the workload of the k-th task of the j-th user and is expressed in millions-of-instructions (MI). The total computation workload $WL$ within the system is expressed as follows:

$$WL = \sum_{j=1}^{m} \sum_{k=1}^{o} wl_{jk},\qquad(3)$$

$ETC_{ijkx}$ denotes the estimated time for completion of $T_{jk}$ on the VM preconfigured on the i-th cpu under the x-th frequency (the frequencies are discussed in the next subsection). $ETC_{ijkx}$ is calculated as follows:

$$ETC_{ijkx} = \frac{wl_{jk}}{f_{ix}}\qquad(4)$$
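As a minimal sketch of the deadline rule above, the following draws a deadline uniformly between a task's ETC on the fastest and slowest configurations; the method names are illustrative, frequencies are in MIPS and workloads in MI per Equation (4), and maxMips is assumed strictly greater than minMips.

```java
import java.util.concurrent.ThreadLocalRandom;

public class DeadlineGen {
    static double etc(double workloadMi, double mips) { // Eq. (4): ETC = wl / f
        return workloadMi / mips;
    }

    static double randomDeadline(double workloadMi, double minMips, double maxMips) {
        double fastest = etc(workloadMi, maxMips); // max frequency of highest-energy PM (lower bound)
        double slowest = etc(workloadMi, minMips); // min frequency of lowest-energy PM (upper bound)
        return ThreadLocalRandom.current().nextDouble(fastest, slowest);
    }
}
```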

4.3.4. Energy Model

In this chapter, we use the power consumption model of CMOS logic circuits [121], called the DVFS-based energy model. As in [129], it is assumed that each cpu of a PM in the virtualized data center is equipped with the DVFS-based energy model. The power consumption of such a model is given as follows:

$$p_{ix} = A \times C \times v_{ix}^2 \times f_{ix},\qquad(5)$$

where $A$ denotes the number of switches per clock cycle, $C$ denotes the total capacitance load, $v_{ix}$ denotes the supply voltage of the i-th cpu under the x-th DVFS level, and $f_{ix}$ denotes the frequency of the i-th cpu under the x-th DVFS level. The total capacitance load and the switches per clock cycle are determined at design time [121]. The combined reduction of the clock frequency and supply voltage is the essence of the dynamic power management technique called DVFS. When $T_{jk}$ is executed on the VM preconfigured on the i-th cpu under the x-th DVFS level, the energy consumption equals $p_{ix} \times ETC_{ijkx}$. Let $X_{ijkx}$ be a binary decision variable: $X_{ijkx}$ equals 1 when $T_{jk}$ is executed on the VM preconfigured on the i-th cpu under the x-th DVFS level, and 0 otherwise. Let $X$ denote the matrix of all of the binary decision variables. The total energy consumption $P(X)$ of a data center for executing the entire workload is calculated in the following way:

$$P(X) = \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{o} \sum_{x=1}^{l} p_{ix} \times ETC_{ijkx} \times X_{ijkx}\qquad(6)$$
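For concreteness, the following is a minimal sketch of evaluating Equation (6) over a fixed schedule. The Assignment record is hypothetical: it simply caches $p_{ix}$ and $ETC_{ijkx}$ for each pair the scheduler has chosen (i.e., each $X_{ijkx} = 1$), so summing over it is equivalent to the four-fold sum with the zero terms dropped.

```java
import java.util.List;

public class EnergyObjective {
    // One selected (task, cpu, DVFS level) triple: its power and its ETC there.
    record Assignment(double powerIx, double etcIjkx) {}

    static double totalEnergy(List<Assignment> schedule) { // P(X), Eq. (6)
        return schedule.stream()
                .mapToDouble(a -> a.powerIx() * a.etcIjkx())
                .sum();
    }
}
```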

According to [98], the frequency of each cpu is proportional to its processing speed; decreasing the frequency and supply voltage of a cpu reduces the power it consumes. In this chapter, we assume that each cpu of a PM in the data center is equipped with a DVFS-based energy module, which can operate at multiple voltages with different cpu frequencies. We choose a set of DVFS levels for each of the aforementioned three classes of DVFS-enabled PMs, shown in Table 12.

Table 12 DVFS models

  DVFS Level   f     Class-I v   Class-II v   Class-III v
  0            1.0   2.5         1.9          1.3
  1            0.9   2.3         1.7          1.2
  2            0.8   2.1         1.5          1.1
  3            0.7   1.9         1.3          1.0
  4            0.6   1.7         1.1          --
  5            0.5   1.5         --           --
  6            0.4   1.3         --           --

Table 12 shows the frequency and voltage parameters of the sixteen DVFS levels of the three PM classes (the same parameters are used in [120]). Conversely, when we decrease the voltage and operating frequency of a cpu, the execution time of a task running on that cpu increases.

4.3.5. Problem Statement

The problem tackled in this chapter is defined as follows: schedule the VMs on the minimum cpu frequency level of the lowest-energy PMs such that the total energy consumption and the ETC of the jobs are minimized, under the requirement that each VM meets the QoS requirements. This problem is called the VM scheduling problem. The minimum total energy consumption within the underlying data center is attained by finding the optimal matrix $X^*$ of binary decision variables:

$$\min(P) = \min\left(\sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{o} \sum_{x=1}^{l} p_{ix} \times ETC_{ijkx} \times X^{*}_{ijkx}\right).\qquad(7)$$

4.4. VM Scheduling Strategies

In this section, we describe the VM scheduling strategies considered in this chapter for incorporating energy efficient techniques. All of the VM scheduling strategies are based on list heuristics, and each list heuristic is incorporated with an energy efficient technique. The set of list heuristics and the energy efficient techniques are introduced in the following sections.

4.4.1. List Heuristics Basically, in list heuristics, user’s requests (i.e., user’s jobs) are prioritized based on particular criteria [127], which favor (a) user objectives and (b) system objectives. In this paper, we study a total of six list heuristics for VM scheduling strategies. The aforementioned are detailed below. 

Heuristics based on the size of request: The heuristics based on the size of a user request are user-oriented. Specifically, they prioritize the user requests according to their size i.e., job width (number of task) and job height (execution of task). The Large Job First (LJF) heuristic gives priority to jobs containing as many tasks as possible, while the Short Job First (SJF) heuristic gives priority to jobs containing as less tasks as possible. The above heuristics are based on job width. On the other extreme, the Maximum Millions of Instructions (MaxMI) and the Minimum Millions of Instructions (MinMI) are based on job height. Specifically, MaxMI gives priority to the job with the maximum workload, while MinMI gives priority to the job with the minimum workload.



Heuristic based on backfilling: The heuristic based on backfilling is system-oriented in the sense that it utilizes the system space while providing fairness to users. The heuristic based on backfilling called the Aggressive Backfilling First (ABF). In the ABF, the space reservation for the first request is reserved under the condition that the next jobs (i.e., short jobs) are scheduled if they do not violate for the reservation of the first request [116].



Heuristic based on first served: The heuristic based on first served (called FCFS) is user-oriented in the sense that it serves the users on a first-come, first-served basis.



Heuristic based on BinPacking: The heuristic based on BinPacking is system-oriented in the sense that it increases the resource utilization. The BinPacking heuristic considered in this paper is the well-known First Fit (FF) heuristic. FF takes the first request from the queue and places it on the first available idle resource that fits. The heuristics based on the size of the request are combined with the FF technique, as sketched below.
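The following is a minimal sketch of the heuristic orderings and the First Fit placement described above; the Job fields and the free-core "fits" test are simplified stand-ins for the simulator's data structures, not the actual implementation.

# Sketch of the list-heuristic orderings plus First Fit placement.
from dataclasses import dataclass

@dataclass
class Job:
    submit_time: float
    num_tasks: int      # job width
    workload_mi: float  # job height, in millions of instructions

ORDERINGS = {
    "FCFS":  lambda jobs: sorted(jobs, key=lambda j: j.submit_time),
    "SJF":   lambda jobs: sorted(jobs, key=lambda j: j.num_tasks),
    "LJF":   lambda jobs: sorted(jobs, key=lambda j: j.num_tasks, reverse=True),
    "MinMI": lambda jobs: sorted(jobs, key=lambda j: j.workload_mi),
    "MaxMI": lambda jobs: sorted(jobs, key=lambda j: j.workload_mi, reverse=True),
}

def first_fit(job, idle_vms):
    """Place job on the first idle VM with enough free cores (First Fit)."""
    for vm in idle_vms:
        if vm["free_cores"] >= job.num_tasks:
            vm["free_cores"] -= job.num_tasks
            return vm
    return None  # no idle VM fits; the job waits for the next interval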


4.4.2. Energy Efficient VM Scheduling Strategies

In this section, we describe the energy efficient VM scheduling strategies that incorporate all of the above heuristics. The VM scheduling strategies employ the power management schemes provided at the hypervisor (VMM) level. Specifically, the Xen Hypervisor provides the facility for adjusting the frequency of each core of the PM through the xenpm operating system command [129]. Several modern PMs allow each virtual core to operate at an independent cpu frequency, such as the Intel Nehalem quad-core processors with Hyperthreading [129]. The Xen Hypervisor has a cpufreq governor consisting of four governor schemes for power management. We describe each of these governor schemes in the following paragraphs.

ondemand: This power management scheme is a system-oriented energy efficient scheme. In this scheme, the Xen Hypervisor chooses the frequency of each core that best fits the current workload.



userspace: In this scheme, the administrator can specify the frequency of each core in the PM. This scheme enables the administrator to manually control the cpu frequencies as desired.



performance: This scheme is the default scheme in the Xen Hypervisor. The frequency of each core in the PM is fixed at the highest frequency, which increases the power consumption.



powersave: The Xen Hypervisor chooses the lowest frequency of each core in the PM, which can adversely affect the performance in terms of task execution.

In this paper, we analyze and evaluate the first three power management schemes incorporated with the aforementioned list heuristics tackling the VM scheduling problem. Because the powersave power management scheme always runs at the lowest frequency and increases task execution time, we do not evaluate it further. The performance power management scheme is instead used as a baseline in order to analyze and compare the userspec and ondemand power management schemes in terms of power efficiency. The basic purpose of this is to understand the best energy efficient VM scheduling strategy under different power management schemes and workloads. The studied VM scheduling strategies, when combined with an energy efficient technique, become: (1) SJF-FF-EE, (2) LJF-FF-EE, (3) MinMI-FF-EE, (4) MaxMI-FF-EE, (5) FCFS-FF-EE, and (6) ABF-EE (EE stands for energy efficient). In our simulation, the studied heuristics incorporated with the above power management schemes are simulated wherein the levels of cpu frequency (described in Table 12) are selected. By default, and at the time of completion of user requests, all of the cpus configured for a VM are placed at the minimum level of cpu frequency in the ondemand and userspec schemes.

4 K. Yu, Xen power management, March 2009, [online] http://wiki.xen.org/wiki/Xen_power_management


Algorithm 2 shows the pseudo-code for the size-based energy efficient VM scheduling strategies (i.e., SJF-FF-EE, LJF-FF-EE, MinMI-FF-EE, and MaxMI-FF-EE). The user requests submitted to the resource management system are placed in the queue for VM starting. The VM scheduling strategy then checks the queue periodically, every ΔI time interval (Line 2-19). In each time interval, the VM scheduling strategy collects the user requests submitted within the interval in question and sorts them according to the scheduling criteria (Line 4). Information on ready VMs is also collected for scheduling the submitted requests (Line 5). Each user request in the queue is considered for placement on a ready VM (Line 6-18). The criteria for the placement are: (a) all of the tasks in a user request must be scheduled on VMs (Line 7-17), (b) the DVFS level of the core of each VM assigned to a user request should be adjusted to the lowest possible level (Line 8-13), and (c) the ETC of each task must be less than or equal to the deadline of the user request (Line 9-15). In that way, the system minimizes the total energy consumption under the deadline constraint.

Algorithm 2. Size-Based Energy Efficient VM Scheduling Strategy
Input: A set of provisioned VMs, the power of each core initialized using DVFS = f_1, f_2, ..., f_n, and user requests U_j (j = 1, ..., m)
Output: Each task of a user request mapped on a VM with the lowest possible energy cost.
1:  ΔB = Start                           // batch start time
2:  each ΔI seconds do {
3:    ΔB = ΔB + ΔI                       // increment for interval
4:    U = collectUserJobs(ΔB);           // collect user's jobs
5:    VMs = collectReadyVMs(ΔB);         // collect ready VMs
6:    for each U_j in U do
7:      for each T_jk in U_j do
8:        for each VM_r in VMs do
9:          for each f_x in VM_r.c_i do
10:           if ETC_ijkx ≤ d_k then
11:             VM = removeVMfrom(VMs);  // get a VM
12:             placeUserTask(T_jk, VM); // place task on a VM
13:             break;
14:           end if
15:         end for loop
16:       end for loop
17:     end for loop
18:   end for loop
19: }
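For readability, a compact, runnable Python rendering of the core loop of Algorithm 2 follows. The Task, VM, and ETC definitions are simplified assumptions for illustration, not the simulator's actual interfaces.

from dataclasses import dataclass

@dataclass
class Level:
    freq: float          # GHz

@dataclass
class Task:
    workload_mi: float   # millions of instructions
    deadline: float      # seconds

@dataclass
class VM:
    name: str
    levels: list

def etc(task, level, mips_per_ghz=1000.0):
    # estimated time to compute, assuming throughput scales with frequency
    return task.workload_mi / (level.freq * mips_per_ghz)

def schedule_batch(jobs, vms):
    """Core of Algorithm 2 (Lines 6-18): for each task, try the DVFS levels of
    each VM from the lowest frequency upward and take the first combination
    whose ETC meets the task's deadline."""
    placements = []
    for tasks in jobs:  # jobs are assumed already sorted by the heuristic
        for task in tasks:
            for vm in vms:
                hit = next((lv for lv in sorted(vm.levels, key=lambda l: l.freq)
                            if etc(task, lv) <= task.deadline), None)
                if hit:
                    placements.append((task, vm.name, hit.freq))
                    break
    return placements

jobs = [[Task(5000, 12.0)], [Task(20000, 30.0)]]
vms = [VM("vm0", [Level(f) for f in (1.0, 0.8, 0.6, 0.4)])]
print(schedule_batch(jobs, vms))  # each task lands on the lowest feasible frequency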

In the sequel, we explain the pseudo-code for the backfilling-based energy efficient VM scheduling strategy (i.e., ABF-EE) in Algorithm 3. Similarly, in this algorithm, the user requests are queued to the system for VM allocation and checked periodically (Line 2-38). At each time interval, the ABF-EE VM scheduling strategy sorts the incoming user requests on a first come, first served basis, i.e., by submission time (Line 4). Then, the user requests are placed on the VMs with the same energy reducing placement as above. Each user request submitted to the queue is checked periodically by the VM scheduler for placement (Line 7-37). If the first job in the queue cannot be scheduled due to a shortage of available resources, then the job is assigned the point in time at which enough resources become available for it to be executed (Line 34). This point in time is called the shadow point in time. The scheduler can execute subsequent jobs under the condition that the shadow point in time of the first job is not violated (Line 20-32); that is, subsequent jobs are executed as long as their execution does not affect the point in time at which the first job will be executed. In that way, fairness and system utilization are achieved.

Algorithm 3. Backfilling-Based Energy Efficient VM Scheduling Strategy
Input: A set of provisioned VMs, the power of each core initialized using DVFS = f_1, f_2, ..., f_n, and user requests U_j (j = 1, ..., m)
Output: Each task of a user request mapped on a VM with the lowest possible energy cost.
1:  ΔB = Start                           // batch start time
2:  each ΔI seconds do {
3:    ΔB = ΔB + ΔI                       // increment for interval
4:    U = collectUserJobs(ΔB);           // collect user's jobs
5:    VMs = collectReadyVMs(ΔB);         // collect ready VMs
6:    flag = TRUE
7:    for each U_j in U do

8:      if flag AND numberOfTasks(U_j) ...

Table 13 Distribution of total jobs by runtime (buckets up to > 9 hours; column values 10.73, 8.27, 1.10, 20.11, 79.27, 15.03, and 5.70; Total Jobs 100)

4.5.3. Performance Metrics

In this section, we describe the performance metrics considered to analyze all of the energy efficient VM scheduling strategies studied in this paper. We consider a total of seven metrics: (1) energy efficiency, (2) total energy consumption, (3) total runtime, (4) makespan, (5) mean queue time, (6) mean slowdown, and (7) slowdown ratio. Resource providers are primarily concerned with the energy efficiency, total energy consumption, total runtime, and makespan, while users are concerned with the mean queue time, the mean slowdown, and the slowdown ratio.

The energy efficiency metric is measured in normalized mJoule against the baseline VM scheduling strategy, which runs under the performance power management governor scheme.



The total runtime is the total execution time of all of the user requests on the system.



The makespan is the time difference between the start time of the first request and the end time of the last request.



The mean queue time is the elapsed time from the submission time of a user’s request until the time the VM assignment takes place.



Mean slowdown is the completion time of a user’s request divided by the running time of the request in question.



The slowdown ratio represents the normalized mean response time.
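As a concrete reference for these definitions, the sketch below computes the user-oriented metrics together with the makespan and total runtime from per-job records; the record fields submit, start, and end are hypothetical timestamp names.

# Computing the metrics above from per-job records (hypothetical field names).
def metrics(jobs):
    queue_times = [j["start"] - j["submit"] for j in jobs]           # queue time
    slowdowns = [(j["end"] - j["submit"]) / (j["end"] - j["start"])  # completion / running
                 for j in jobs]
    return {
        "mean_queue_time": sum(queue_times) / len(jobs),
        "mean_slowdown": sum(slowdowns) / len(jobs),
        "makespan": max(j["end"] for j in jobs) - min(j["start"] for j in jobs),
        "total_runtime": sum(j["end"] - j["start"] for j in jobs),
    }

print(metrics([{"submit": 0, "start": 5, "end": 20},
               {"submit": 2, "start": 6, "end": 30}]))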

4.5.4. Results and Discussion

This section provides the results and discussion of the energy efficient scheduling strategies. Because the main focus of this study is to reduce energy consumption, we initially evaluate the performance of the various energy efficient VM scheduling strategies in terms of energy efficiency

with respect to the different power governor schemes. The energy efficiency results are shown in Figure 43 for the entire workload. Figure 43 reveals that all of the strategies under the ondemand power governor scheme perform better than the strategies under the userspec power governor scheme. The reason is that the ondemand scheme lets the energy efficient scheduling strategies place jobs on the cpu frequency that best fits the job's workload. Moreover, Figure 43 shows that MaxMI-FF-EE and LJF-FF-EE perform better than the rest of the heuristics under the ondemand power governor scheme. This is because MaxMI-FF-EE and LJF-FF-EE place the largest requests (in terms of workload and number of tasks, respectively) first fit on the lowest-energy PM at the minimum level of cpu frequency, which consequently increases the energy efficiency of the data center.

Figure 43 Energy efficiency of energy efficient VM scheduling strategies under the different power management governor schemes.

Next, we present the results in terms of the total energy consumption and total runtime in Figure 44 and Figure 45, respectively. We observe in Figure 44 that the VM scheduling strategies under the ondemand power governor scheme consumed less energy than those under the userspec power governor scheme. In the figure, we can also observe that MaxMI-FF-EE under the ondemand power governor scheme consumes less energy than the rest of the heuristics. On the other hand, Figure 45 reveals that the overall running time is slightly increased by the VM scheduling strategies under the ondemand power governor scheme. This is due to the fact that lowering the clock frequency of a processor increases the job execution time.


Figure 44 Total energy consumption of energy efficient VM scheduling strategies under the different power management governor schemes.

Figure 45 Overall running time (log scale) of energy efficient VM scheduling strategies under the different power management governor schemes.

In Figure 46 and Figure 47, we show the results in terms of the makespan and the mean queue time, respectively. As shown in Figure 46, the makespan is slightly increased under the ondemand power governor scheme compared to the userspec one. It must be noted that ABF-EE produced the smallest makespan compared to the rest of the heuristics in both power governor schemes. This is because ABF-EE is an aggressive energy efficient VM scheduling strategy that takes the opportunity to backfill jobs at the minimum level of cpu frequency while guaranteeing that the next job's execution does not violate the shadow point of the first job. In that way, the makespan of the complete system is drastically decreased. In terms of the mean queue time in Figure 47, we observed that each heuristic produces similar performance in both power governor schemes. As can be seen, SJF-FF-EE and MinMI-FF-EE are superior to the rest of the heuristics in terms of queue time in both power governor schemes. This is because SJF-FF-EE and MinMI-FF-EE place the shortest requests (in terms of number of tasks and workload, respectively) first fit, which subsequently decreases the mean queue time in the data center.

Figure 46 Makespan of energy efficient VM scheduling strategies under the different power management governor schemes.

Figure 47 Mean queue time (min.) of energy efficient VM scheduling strategies under the different power management governor schemes.

Lastly, we provide a comparison of the studied energy efficient VM scheduling strategies under the power governor schemes in terms of the mean slowdown and slowdown ratio, shown in Figure 48 and Figure 49, respectively. Figure 48 reveals that the energy efficient VM scheduling strategies under the userspec power governor scheme produced a smaller mean slowdown than under the ondemand power governor scheme, with the only exception being the MaxMI-FF-EE policy. Similarly, Figure 49 shows that the slowdown ratio is also smaller under the userspec power governor scheme than under the ondemand power governor scheme; again, the only exception is the MaxMI-FF-EE policy. Due to the priority given to placing the largest job (in terms of workload) first, MaxMI-FF-EE increases the mean slowdown and slowdown ratio in both power governor schemes.

Figure 48 Mean slowdown of energy efficient VM scheduling strategies under the different power management governor schemes.

Figure 49 Slowdown ratio of energy efficient VM scheduling strategies under the different power management governor schemes.

4.6. Summary

In this paper, we provided a comparative study of energy efficient VM scheduling strategies for HPC workloads in a virtualized environment. Because energy consumption is a dominant part of the cloud RP's budget, energy efficient VM scheduling strategies must be considered while maintaining the desired QoS constraints. In this study, we studied a total of six VM scheduling strategies under three power governor schemes. To conduct the simulation, we employed

a real-world HPC workload obtained from a production data center. This work highlights the strengths and weaknesses of the studied VM scheduling strategies, which can help in choosing the most appropriate energy efficient VM scheduling strategy in a given scenario. The major highlights and insights of the paper are: (a) all of the strategies under the ondemand power governor scheme are more energy efficient than the strategies under the userspec power governor scheme; (b) MaxMI-FF-EE saved more energy but degraded the performance in terms of mean slowdown and slowdown ratio; (c) SJF-FF-EE and MinMI-FF-EE exhibited better performance in terms of mean queue time than the rest of the heuristics under all of the power governor schemes; and (d) ABF-EE produced the smallest makespan under the studied power governor schemes compared to the rest of the policies. Our analysis revealed that a single policy is not sufficient for energy efficient resource management in virtualized environments. Such environments need to implement dynamic and adaptive energy efficient VM scheduling policies. In our future work, all of these policies will be analyzed with respect to performance penalties, subsystems other than the cpu, and resource contention. After this study of VM scheduling strategies on the cloud, the next chapter introduces a cloud-based adaptive and tunable map-matching strategy for real-time GPS trajectories. The proposed method solves the key challenge of SPQs by pre-computing the shortest path distances and the road segment speed constraint following the bulk synchronous parallel (BSP) paradigm in the cloud environment.


Chapter. 5 Adaptive and Tuneable Map Matching Strategy

In this chapter, we introduce RT-MM, a fully adaptive cloud-based map-matching strategy that addresses a key challenge of SPQs in the map-matching process for real-time GPS trajectories. We also propose a tuning-based strategy that adaptively fine-tunes the interior and the exterior parameters of the map-matching process. Furthermore, unlike the traditional approaches, the shortest path distances and the road segment speed constraint are pre-computed following the bulk synchronous parallel (BSP) paradigm in the cloud environment. The evaluation of our approach against the current state-of-the-art approaches found in the literature is performed through simulation results based on both synthetic and real-world datasets.

5.1. Introduction

In recent years, most organizations have gradually been moving their applications to the cloud environment, because the cloud enables on-demand access, storage services, and large-scale data processing over the Internet anywhere in the world. The advent and rapid growth of information and communication technologies (ICT) has led to a new emerging concept called urban computing, wherein sensors, vehicles, devices, buildings, people, and roads are used as components to probe city dynamics [37]. The data produced by the aforementioned components is usually obtainable in the form of global positioning system (GPS) data. Due to the live nature of the above GPS-based components, their data can grow exponentially in volume, ranging from a few dozen terabytes to petabytes (i.e., Big Data). To achieve better quality, the information in the GPS data is often used in transport applications, including intelligent transportation systems (ITS) and location based services (LBS) (i.e., traffic flow analysis [1], the route planner [2], geographical social networks [3], and the hot route finder [4]). Because the aforementioned applications need to process a massive amount of data in an effective way, they have also given significant impetus to adopting the cloud (i.e., agent-based urban transportation systems [5], urban intelligence transportation [6], a cloud-enabled intensive FCD computation framework [7], and traffic flow forecasting [8]) and parallel computing paradigms (i.e., the CyberGIS framework [9] and the TeraGrid GIScience Gateway [10]).

5 Two papers have been produced from parts of Chapter 5. One has been submitted for review to an SCI journal, Frontiers of Information Technology & Electronic Engineering (ISSN 2095-9184), formerly known as the Journal of Zhejiang University SCIENCE C (Computers & Electronics); the title of the journal paper is "Towards Adaptable and Tuneable Cloud-Based Map-Matching Strategy for GPS Trajectories". The second has been accepted (to be published) in Springer's Lecture Notes in Computer Science (LNCS), in the proceedings of the 2nd International Conference on Internet of Vehicles (IOV 2015) (indexed by EI, CPCI, and Springer), Chengdu, China, December 2015; the title of the paper is "An approach for map-matching strategy of GPS-trajectories based on the locality of road networks".


In this paper, we address the problem of map-matching, which plays a pivotal role in ascertaining the quality of many trajectory-based applications (such as driving directions, road guidance, moving object management, and traffic flow analysis). Basically, map-matching is a fundamental process of the aforementioned applications that aligns, in an accurate way, the observed GPS positions onto a road network in the form of a digital map [12]. In terms of accuracy, the studies [12, 15-17] suggested that the best map-matching performance is based on the transition probability, which incorporates the shortest path between two consecutive observed GPS points. On the other hand, the execution of the shortest path queries (SPQs) in the map-matching process requires a high computational cost, which subsequently renders map-matching unaffordable for real-time processing [12]. Moreover, the map-matching process becomes a critical step when it needs to process infrequently and imprecisely sampled GPS data.

Particularly, GPS data may suffer from two typical errors [18]: (1) measurement error and (2) sampling error. The measurement error is caused by limitations of the GPS technology that generate noisy GPS data, while the sampling error arises from a high sampling rate that generates extraneous GPS data. Maximizing the number of candidate points (CPs) and the error circle radius (ECR), so as to consider an apt number of likely road segments, can handle noisy GPS data. Unfortunately, the current state-of-the-art map-matching approaches [12, 15-17] are based on a fixed number of CPs and a fixed ECR for considering the most likely road segments, which may lead to identifying either no road segments or a large number of them; this causes either low accuracy or the expense of a large number of SPQs. As for the sampling error, GPS data with a high sampling rate (i.e., a period of less than 10 seconds) typically contains extraneous data (i.e., in the case of a vehicle that: (a) stops many times, (b) moves slowly, (c) is trapped in a traffic jam, (d) waits for a signal, or (e) moves on a highway link), which also incurs an extra number of SPQs. Because the elimination of such GPS data may cause a trajectory to suffer from discontinuity, a reasonable technique is required that can intelligently adjust the sampling rate, providing an accurate trajectory before the map-matching process while reducing unnecessary GPS data. The approaches [19-24] can deal with the execution of SPQs by pre-computing the shortest path distances and partitioning a large network graph into small regions such that the required partition fits in memory [23]. However, because the execution of the SPQs is processed in a sequential way, the current state-of-the-art approaches suffer high pre-computation and storage costs [24].

Due to the aforementioned facts, the approaches in the current state-of-the-art violate the quality of service (QoS) requirements (i.e., high accuracy and fast running time) and are not apropos for map-matching GPS trajectories in real time. Particularly, real-time traffic information provides vital inputs for dynamic traffic control and management systems [16, 25, 133]. Consequently, it is of paramount importance for the map-matching of real-time GPS trajectories to optimize the problem of SPQs in an adaptive and efficient way.

To deal with the above challenges, this chapter presents RT-MM, a fully adaptive cloud-based map-matching strategy for real-time GPS trajectories. The proposed map-matching strategy is approached in the following way: (a) We present a systematic model of the map-matching strategy for real-time GPS trajectories. (b) We introduce a tuning-based strategy that adaptively fine-tunes the interior and the exterior parameters of the map-matching process. The interior parameters, i.e., the number of CPs and the ECR, are tuned based on the locality of the road network. Particularly, the technique that fine-tunes the interior parameters of the map-matching process selects an apt number of CPs and an apt ECR based on different classes of road network locality for each GPS sampling point. We characterize the locality of the road network into different classes, which are learnt by splitting the road network into small grids in a pre-processing step. The exterior parameters, which are relevant to the sampling rate in the sliding window, are intelligently adjusted based on feedback information from monitored parameters at runtime. The technique for adjusting the sampling rate of GPS data provides a highly accurate trajectory after eliminating extraneous data that does not affect the map-matching accuracy. (c) Moreover, to compute the shortest path distances and the temporal/speed constraint, we propose an extension of the single source shortest path (SSSP) function following the bulk synchronous parallel (BSP) paradigm in the cloud environment. We implemented the SSSP function following the BSP paradigm in the Hama environment, which is deployed on top of the Hadoop cloud. By taking advantage of the cloud environment, our proposed strategy drastically reduces the pre-computation time and storage cost. By the above tuning- and cloud-based strategies, we propose a viable solution that efficiently handles the issues posed by the SPQs. (d) The above approach is empirically evaluated using real-world and synthetic datasets. The real-world dataset is a taxi-cab GPS dataset recently collected from the Shenzhen Transportation Systems. In this paper, we consider four performance metrics to evaluate the performance of the studied map-matching strategies: (1) overall running time, (2) percentage of correct matching points (CMP), (3) total number of SPQs, and (4) total number of CPs. The results reveal that our proposed strategy reduces the overall running time of the map-matching process by reducing the number of SPQs and CPs, while maintaining the accuracy.

The rest of this chapter is organized as follows. Section 5.2 sets forth the related work on map-matching and Section 5.3 gives the major definitions of terms used in this paper. In Section 5.4, we present a complete overview of the proposed fully adaptive map-matching strategy for real-time GPS trajectories. In the sequel, in Section 5.5, we describe our experimental settings and computation environments. Lastly, Section 5.6 presents the results evaluation and discussion, while Section 5.7 concludes the chapter.


5.2. Map-Matching Algorithms

In this section, we first discuss the basic steps of a map-matching strategy and then address the research problems of the map-matching strategies found in the current state-of-the-art approaches. Particularly, all map-matching strategies follow three major steps: (1) initialization, (2) weight calculation, and (3) weight aggregation, as shown in Figure 50. The initialization step prepares a number of CPs projected on the candidate road segments within an ECR range. The weight calculation step finds a path between two consecutive points in a trajectory by calculating a weight function of the CPs. Finally, the weight aggregation step aggregates the weight scores.

Figure 50 Basic steps in a map-matching strategy (initialization: prepare candidate points; weight calculation: calculate weights between candidate points; weight aggregation: aggregate the weight score).
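The following toy, self-contained sketch illustrates the shape of this three-step pipeline on a one-dimensional road model: candidate segments within the ECR are scored by proximity, and the best candidate sequence is chosen by max-sum dynamic programming. It only illustrates the pipeline structure; real strategies use the spatial/temporal weight functions discussed below, and the 1-D model is an assumption made for brevity.

# Toy three-step map-matching on a 1-D road model (assumes every point has
# at least one candidate within the error circle radius).
def map_match_toy(points, segments, ecr=1.0):
    # 1) initialization: candidate segments within the ECR of each point
    cand = [[s for s in segments if abs(s - p) <= ecr] for p in points]
    # 2) weight calculation + 3) aggregation: Viterbi-style max-sum
    best = [{c: -abs(points[0] - c) for c in cand[0]}]
    back = [{}]
    for i in range(1, len(points)):
        layer, bp = {}, {}
        for c in cand[i]:
            prev = max(best[-1], key=lambda q: best[-1][q] - abs(q - c))
            layer[c] = best[-1][prev] - abs(prev - c) - abs(points[i] - c)
            bp[c] = prev
        best.append(layer)
        back.append(bp)
    path = [max(best[-1], key=best[-1].get)]  # trace back the best sequence
    for i in range(len(points) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(map_match_toy([0.1, 0.9, 2.2], segments=[0.0, 1.0, 2.0]))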

Map-matching strategies can be mainly classified into (a) incremental [134, 135], (b) global [12, 15, 136], and (c) statistical [16, 17, 137, 138] methods. Incremental map-matching strategies endeavor to find a local match of geometries for each GPS sample and consider only the small portion of the road network close to the GPS sample. In this approach, the weight score is aggregated based on the previous result for each GPS sampling point. This approach performs well in terms of accuracy when the sampling frequency is very high, i.e., a 2-5 second time interval between GPS points. In the global map-matching approaches, the algorithms match an entire trajectory with a road network; they produce better results in terms of accuracy when applied to low-sampling-rate GPS data. Because they match GPS sample points one by one or over a small portion of a trajectory, the incremental approaches are very fast but suffer from low accuracy, while the algorithms using the global approach achieve better accuracy at the expense of high computational cost. The approaches in the last category apply statistical methods, i.e., Bayesian classifiers, hidden Markov models [137], Kalman filters, and cubic spline interpolation [138], to match GPS points with a road network. These approaches are particularly effective at handling GPS measurement errors. Most current map-matching algorithms aim to achieve better QoS in terms of (a) high accuracy and (b) fast response time. Due to the tradeoff between high accuracy and fast response time in a map-matching strategy, this goal is problematic to achieve.

Because most taxis travel very frequently, their GPS applications can be configured to reduce the sampling rate of GPS data to save energy, communication, and computation costs. For example, the study [18] minimizes the energy cost by reducing the sampling rate of GPS data. Unfortunately, a low sampling rate of GPS data increases uncertainty and leads map-matching algorithms to produce low-accuracy results. On the other hand, because of the SPQs, map-matching makes the overall running time of LBS applications unaffordable for real-time processing. The tradeoff between high accuracy and fast response time thus makes the design of an efficient map-matching strategy a big challenge.

5.2.1. Spatial and Temporal Matching Technique

Our proposed map-matching strategy is inspired by the spatial and temporal matching (ST-M) [12] global map-matching strategy, which aims to provide high accuracy. Spatial (i.e., geometric structure) and temporal (i.e., speed) constraints are incorporated in ST-M to solve the problem of low-sampling-rate GPS trajectories. In ST-M, firstly the weight function based on the spatial and temporal constraints is defined with respect to two consecutive GPS points and their CPs. A candidate graph is then created, wherein each node is a set of CPs and each edge represents a set of road segments on the shortest path between two neighboring points, associated with the spatial and temporal weight score. The algorithm then generates a true path based on the largest summation of the weight functions. Because the ST-M algorithm uses SPQs, we found through experimental evaluation that the map-matching process violates the QoS requirement of fast response time [26]. For instance, Figure 51 and Figure 52 show ST-M results, i.e., (a) the total number of SPQs and (b) the total number of cases where too few CPs were found, with respect to the CPs and GPS points of real-world trajectories. The execution time is high because the number of SPQs grows with (a) the number of GPS points under a fixed number of (b) CPs and (c) a fixed ECR, as shown in Figure 51. On the other extreme, a fixed number of CPs and a fixed ECR can produce low-accuracy results if the considered space of the road network does not encompass an apt number of road segments [13], as shown in Figure 52.


Figure 51 Simulation results of the ST-Matching w.r.t. a different number of GPS points in real-world trajectories (Total number of SPQs).

Figure 52 Simulation results of the ST-Matching w.r.t. a different number of GPS points in real-world trajectories (Total number of LessFound CP).

For example, Figure 53 shows a snapshot of two fixed ECR values (i.e., r1 = small and r2 = large). In the figure, two CPs would be considered if the ECR value were equal to r1 (small); otherwise, four CPs are considered for the point p_i.

Figure 53 The interior setting for considering candidate points for a sampling point (p: GPS sampling point; e: road segment; c: candidate point (CP); r: error circle radius (ECR)).
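As an illustration of how CPs are prepared within an ECR (cf. Figure 53), the sketch below projects a GPS point onto each road segment and keeps the closest projections within the radius. Planar coordinates and the max_cp cap are simplifying assumptions; real systems project latitude/longitude first.

# Candidate-point preparation for one GPS point (planar coordinates assumed).
def project_on_segment(p, a, b):
    """Orthogonal projection of point p onto segment ab, clamped to the segment."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def candidate_points(p, segments, ecr, max_cp):
    """Return up to max_cp candidate points within distance ecr of p."""
    cands = []
    for a, b in segments:
        c = project_on_segment(p, a, b)
        d = ((c[0] - p[0]) ** 2 + (c[1] - p[1]) ** 2) ** 0.5
        if d <= ecr:
            cands.append((d, c))
    return [c for _, c in sorted(cands)[:max_cp]]

print(candidate_points((1.0, 1.0), [((0, 0), (3, 0)), ((0, 2), (3, 2))], ecr=1.5, max_cp=3))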


Therefore, in this paper, we introduce a novel adaptive strategy that addresses the aforementioned challenges. Our tuning-based strategy adaptively adjusts the sampling rate of GPS data and fine-tunes the map-matching parameters (such as the number of CPs and the ECR). The proposed tuning-based strategy is conceptually similar to the tuneable job scheduling strategy [83]. Moreover, we enhance our tuning-based map-matching strategy with a strategy of pre-computing the shortest path distances and partitioning the road network for real-time GPS trajectories. The approaches [19-24] deal with the execution of SPQs by pre-computing the shortest path distances and partitioning a large network graph into small regions such that the required partition fits in memory [23]. It must be noted that the aforecited techniques deal with the execution of SPQs in a sequential way, which comes at the expense of high pre-computation and storage costs [24]. In contrast to the traditional approaches, we propose an extension of the SSSP function following the BSP parallel paradigm in the cloud environment, which significantly reduces the pre-computation time and storage cost.

5.3. Definitions

Definition 1 (GPS trajectory): A trip completed by a taxi is called a trajectory T, measured as a sequence of GPS sample points, i.e., T: p_1 → p_2 → ⋯ → p_n, where 0 < p_{i+1}.t − p_i.t < ΔT (1 ≤ i < n) and ΔT is the time spent between two consecutive GPS points. Each GPS point p_i ∈ T contains information including the GPS position in latitude p_i.lat and longitude p_i.lon, and the timestamp p_i.t.

Definition 2 (Strategy event): The map-matching examines the trajectory T at pre-scheduled times, called map-matching strategy events E. The pre-scheduled time is a regular time interval called the event time ΔE.

Definition 3 (Meta-points): The set of GPS points from a trajectory T considered for map-matching based on the new sampling rate in each ΔE is called the meta-points M, i.e., M: p_1 → p_2 → ⋯ → p_n, where p_i ∈ M, M ∈ T and 0 < p_{i+1}.t − p_i.t < ΔM (1 ≤ i < n, 0 ≤ ΔT < ΔM). ΔM is the time spent between two consecutive GPS points.

Definition 4 (Sliding window): The total number of GPS points in M considered for a map-matching is called the sliding window W.

Definition 5 (Road graph): A directed road network graph G(V, E) is called a road graph. V denotes the set of points intersecting the road segments, called vertices. E signifies the set of road segments, called edges e. A directed edge e is associated with: (a) a unique id e.gid, (b) the average travel speed e.v, (c) the road length e.l, (d) the starting point e.start, (e) the ending point e.end, and (f) the intermediate points comprising the road polyline.


Definition 6 (Path): A path P is a list of connected road segments between two given vertices (V_i, V_j) in a road network G, i.e., P: e_1 → e_2 → ⋯ → e_n, where e_1.start = V_i, e_n.end = V_j, and e_k.end = e_{k+1}.start, 1 ≤ k < n.
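For concreteness, the definitions above can be rendered as lightweight types. The following sketch mirrors the notation (p.lat, p.lon, p.t; e.gid, e.v, e.l) and is only an illustration, not our implementation.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GPSPoint:              # Definition 1
    lat: float
    lon: float
    t: float                 # timestamp

Trajectory = List[GPSPoint]  # consecutive points satisfy 0 < p[i+1].t - p[i].t < dT

@dataclass
class Edge:                  # Definition 5: a directed road segment
    gid: int                 # unique id
    v: float                 # average travel speed
    l: float                 # road length
    start: Tuple[float, float]
    end: Tuple[float, float]

Path = List[Edge]            # Definition 6: e[k].end == e[k+1].start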

5.4. Fully Adaptive Map-Matching Strategy

In this section, we introduce our proposed RT-MM, a fully adaptive cloud-based map-matching strategy for real-time GPS trajectories. The primary goal of this study is to improve the map-matching strategy so that it provides a trade-off between the major QoS goals (i.e., high accuracy and fast response time). This section discusses the architecture and major components of the proposed map-matching strategy.

5.4.1. Architecture

A complete systematic model of our proposed RT-MM consists of two major parts: off-line efforts and online efforts. The architecture of the system is shown in Figure 54, and an overview of the system is given as follows. The off-line efforts consist of partitioning the road network graph and pre-computing the shortest path distances and temporal/speed constraints. The road network graph is split into small sub-graphs, with each sub-graph keeping its boundary values such as the maximum and minimum longitude and latitude of its sides [22]. Our partitioning approach guarantees that the sub-graphs are created subject to the constraint that each of them consists of an approximately equal number of nodes, for the purpose of load balancing. The purpose of splitting the road network graph into small grids is to fit a required grid into memory. Next, in the off-line efforts, the shortest path distances and temporal/speed constraints are computed following the parallel computing paradigm (i.e., BSP [53]) in the cloud environment to reduce the pre-processing time. According to our problem requirements, we provide an extension of the SSSP function following the BSP paradigm. The implementation takes place in Hama [52] on top of the Hadoop environment. Besides computing the shortest path distance, the proposed SSSP function also computes two more fields required in our system, i.e., the number of edges and the total speed of all edges on a shortest path. We briefly discuss the modified SSSP function following the BSP parallel paradigm in Section 5.4.2.


Figure 54 An overview of the proposed system.

In the online efforts, our proposed map-matching strategy periodically monitors the GPS sampling data in real time in order to adaptively fine-tune the interior and the exterior parameters of the map-matching process. The interior parameters (i.e., CPs and ECR) are tuned based on the locality of the road network. The exterior parameters, relevant to the sampling rate and the number of GPS points in the sliding window, are tuned based on feedback information from monitored parameters at runtime. In this phase, the map-matching strategy checks the queue periodically and accepts a real-time GPS trajectory T for map-matching on the digital map. Nevertheless, the online phase can work independently using SPQs without the aforementioned off-line efforts. The complete map-matching process for real-time GPS trajectories is further discussed in Section 5.4.3.

5.4.2. Modified SSSP Algorithm Following the BSP Parallel Paradigm in the Cloud Environment

In this section, we discuss our modified SSSP algorithm, which follows the BSP parallel paradigm to generate the all-pairs shortest path distances together with the temporal/speed constraint for all of the nodes in the graph. The problem of all-pairs shortest paths is solved by an SSSP function based on Dijkstra's algorithm [139]. The SSSP function computes the shortest path distances from a single source node to all nodes in the graph. Figure 55 and Figure 56 show an example of the SSSP process on a network graph consisting of six connected nodes and seven edges. Each edge is weighted with a value, as shown in Figure 55. The SSSP results for each node in the graph are shown in Figure 56.

Figure 55 An example for SSSP computation: a network graph.

Figure 56 An example for SSSP computation: SSSP results of all nodes in the graph.

Because the shortest path distance computation involves (a) graph computations and (b) large data processing, we use a parallel paradigm to reduce the pre-computation time. The SSSP function has recently been studied in two well-known parallel paradigms [51], i.e., (a) MapReduce (MR) [48] and (b) BSP [53]. Due to the graph-based and iterative processing nature of the problem, MR is badly suited for the shortest path computations [51]. Therefore, Google introduced an alternative model, Pregel [53], based on the BSP parallel paradigm. Pregel as implemented in Hama [52] has been selected in our system for the shortest path computations. Hama runs on top of the Hadoop cloud environment. Particularly, an algorithm following BSP [51, 140] generates a series of supersteps (see Figure 7 in Section 2.2.2.3.2). By default, the SSSP function accepts as input the network graph and a single source node. Unlike the default structure of the SSSP function provided by Hama, the format of both the input and output data is modified in our version. Besides the shortest path distance, the proposed SSSP function computes two more fields required in our system, i.e., (a) the number of edges and (b) the total speed of all edges on a shortest path. The output of the function is used to calculate the spatial and temporal weight scores of two consecutive GPS sample points for the final map-matching process. Algorithm 4 shows the pseudo-code of the modified SSSP function following the BSP parallel paradigm. This algorithm takes as input the road network graph in the following modified format.


Format : Node\tEdge1:Length,Speed\t...\tEdgen:Length,Speed
Example: 84\t192:89.4,180
         85\t82:7.3,260\t81:8.3,90\t176:162.3,90
         86\t48:107.1,90\t164:120.2,260\t306:203.4,260
         87\t105:24.6,260
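A small parser for this modified format might look as follows; this is a sketch for illustration, since the actual Hama job consumes the format through its own input readers.

# Parse one line of the modified input format into
# (node_id, [(edge_id, length, speed), ...]).
def parse_line(line):
    fields = line.rstrip("\n").split("\t")
    node = int(fields[0])
    edges = []
    for f in fields[1:]:
        edge_id, rest = f.split(":")
        length, speed = rest.split(",")
        edges.append((int(edge_id), float(length), float(speed)))
    return node, edges

print(parse_line("85\t82:7.3,260\t81:8.3,90\t176:162.3,90"))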

Each line in the input data represents a node. The edges connected to the respective node are separated by the tab key on the same line. Each connected edge is encoded by three variables: (1) the edge id, (2) the edge length, and (3) the edge speed. At the start of the function, the variable holding the actual value (at each vertex except the source) that corresponds to the minimum distance is initialized with an infinity value (Line 2-6). In the modified SSSP, each vertex reads the messages arriving over its connected edges; each message contains three parts: (a) the distance, (b) the total number of edges, and (c) the total speed of the edges between the source and the current vertex. If the distance value in a message at a given vertex is smaller than the actual value associated with the current vertex, then the function updates the above (a), (b), and (c) values associated with the current vertex (Line 7-13). Finally, the current vertex sends a message with the updated values to all of its neighbouring edges (Line 14-23) and becomes an inactive vertex by calling voteToHalt() (Line 25). When a vertex receives a message, it becomes active. The process terminates when there are no active vertices left to be considered.

Algorithm 4: Modified SSSP algorithm in the BSP approach
Input: A road network graph G(V, E) in the specific format and a single source node
Output: Shortest paths from the single source node to all nodes in the graph. Note: a record in the output data provides the shortest distance, the number of edges, and the total speed of the roads on each shortest path.
1:  function COMPUTE (vertex v, messages m)
2:    if isStartVertex(v) then
3:      minDistanceInDefault = 0
4:    else
5:      minDistanceInDefault = ∞
6:    end if
7:    for m in messages do
8:      if m.distance < minDistanceInDefault then
9:        minDistanceInDefault = m.distance
10:       edgeCountInDefault = m.edgeCount
11:       speedCountInDefault = m.speedCount
12:     end if
13:   end for loop
14:   if v.distance > minDistanceInDefault then
15:     v.minDistance = minDistanceInDefault
16:     v.edgeCount = edgeCountInDefault + 1
17:     v.speedCount = speedCountInDefault
18:     for e in neighbors of v
19:       m.distance = e.distance + minDistanceInDefault
20:       m.edgeCount = m.edgeCount + 1
21:       m.speedCount = e.speed + speedCountInDefault
22:       sendMessage(e.id, m)
23:     end for loop
24:   end if
25:   voteToHalt()
26: end function COMPUTE

6 The default format is only used for computing the shortest distance; it is explained on the Hama SSSP page, http://wiki.apache.org/hama/SSSP
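For reference, the three output fields of the modified SSSP (shortest distance, number of edges on the path, total speed) can also be computed sequentially with Dijkstra's algorithm. The sketch below is that sequential counterpart, which the BSP version parallelizes across supersteps.

# Sequential counterpart of the modified SSSP: Dijkstra's algorithm extended
# to carry the edge count and the total speed along each shortest path.
import heapq

def sssp(graph, source):
    """graph: {node: [(neighbor, length, speed), ...]};
    returns {node: (distance, edge_count, total_speed)}."""
    best = {source: (0.0, 0, 0.0)}
    heap = [(0.0, 0, 0.0, source)]
    while heap:
        dist, edges, speed, u = heapq.heappop(heap)
        if (dist, edges, speed) != best.get(u):
            continue  # stale heap entry
        for v, length, spd in graph.get(u, []):
            cand = (dist + length, edges + 1, speed + spd)
            if v not in best or cand[0] < best[v][0]:
                best[v] = cand
                heapq.heappush(heap, cand + (v,))
    return best

g = {0: [(1, 2.0, 60.0), (2, 5.0, 90.0)], 1: [(2, 1.0, 30.0)]}
print(sssp(g, 0))  # node 2 via 0->1->2: distance 3.0, 2 edges, total speed 90.0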

5.4.3. Map-Matching Strategy for Real-Time GPS Trajectories

The proposed map-matching strategy is based on window-based and tuning-based techniques. A raw trajectory is periodically selected in a sliding window, and then the interior and the exterior parameters of the map-matching are fine-tuned. The steps used in our proposed map-matching strategy are: (a) monitoring, (b) tuning, (c) candidate preparation, (d) spatial analysis, (e) temporal analysis, and (f) matching score.

In the monitoring phase, our map-matching strategy periodically monitors the flow of real-time GPS workloads and the parameters of the sliding window in each strategy event E. From the monitored parameters, we determine the current driving state of the vehicle, which helps the next phase to fine-tune the sampling rate of the GPS data. Our intuition is that if the vehicle is driven on a highway or on long road segments, then the sampling rate can be reduced by filtering the noisy and extraneous data, because it is unnecessary to run the map-matching process for sampling points located on the same road segment. Alternatively, a high sampling rate of GPS data ensures the accuracy of the map-matching if the vehicle travels inside the city. In contrast to Ref. [18], to estimate the current driving state of the vehicle we choose the average speed of the vehicle extracted from the previous sliding window, defined by:

$\bar{S}_{i-1} = \frac{\sum_{u=1}^{k} e_u.l}{\Delta W_{(p_i.t - p_1.t)}}$,  (1)

where $\sum_{u=1}^{k} e_u.l$ is the total length of the matched shortest path and $\Delta W_{(p_i.t - p_1.t)}$ is the total time spent on the path P in the previous sliding window W_{i-1}. If the vehicle traveled with an average speed $\bar{S}_{i-1}$ of 40 kilometers per hour or above, then the vehicle is considered to be on a highway; otherwise, we consider that the vehicle moved inside the city.

The tuning phase is responsible for amending the sampling rate ΔT of the GPS sampling data based on the monitored parameter $\bar{S}_{i-1}$. A new sampling rate ΔM of the GPS sampling data is adjusted by:

$\Delta M = \frac{\Delta E}{W}$,  (2)

where ΔE represents the time interval, while W represents the number of GPS sampling points in the sliding window. We call ΔE and W the exterior parameters. Both parameters are tunable with pre-defined values, such that ΔE = 10 and W = 12 imply that the vehicle is in the highway driving state, while ΔE = 5 and W = 10 imply that the vehicle is in the inside-city driving state. In this way, the sampling rate of the GPS data is adjusted to a reasonable value. To maintain the quality of our map-matching strategy, we incorporate the weight score of the previously mapped GPS sampling point as the source of the path in the current sliding window. A schematic model is shown in Figure 57. The figure shows three sliding windows. As observed, the second and third sliding windows follow the recent results (i.e., the last mapped GPS point) of the previous sliding window, which guarantees the creation of a real path.

Figure 57 An example of the proposed window-based scheme of map-matching for real-time trajectory (W=5).
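A minimal sketch of the exterior-parameter tuning follows, combining Eqs. (1) and (2). The 40 km/h threshold and the (ΔE, W) presets come from the text; the function signature and the metre/second units are illustrative assumptions.

# Exterior tuning: estimate the average speed over the previous window
# (Eq. (1)), classify the driving state, and derive the new rate dM = dE / W
# (Eq. (2)).
def tune_exterior(matched_lengths_m, window_seconds):
    avg_kmh = (sum(matched_lengths_m) / window_seconds) * 3.6  # Eq. (1), in km/h
    if avg_kmh >= 40.0:
        dE, W = 10, 12   # highway driving state
    else:
        dE, W = 5, 10    # inside-city driving state
    return avg_kmh, dE / float(W)                              # Eq. (2)

print(tune_exterior([500.0, 420.0, 380.0], window_seconds=90.0))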

Besides the exterior settings, the tuning phase is also responsible for tuning the interior parameters, i.e., the CPs and the ECR, for the consideration of the most likely real road segments. The tuning strategy decides the interior settings based on the locality of the road network. Because the characterization of workloads helps in understanding the overall behaviour of the system [141], we characterized the locality of the road network in grid format (i.e., 200 × 200 grids). Each grid is about 440 meters wide and 210 meters high. A tuple for each grid stores a density (i.e., the total number of road segments located in the grid) and their ids. From the characterization of the locality of the road network, we create the following table (Table 14) for choosing the CP and ECR settings.

Table 14 Parameters based on locality information

Class       Density (∑e)    ECR (r)    CP (c)
Class-I     ≥21             60         3
Class-II    ≥16 ~