2008 14th IEEE International Conference on Parallel and Distributed Systems

Dynamic Resource Allocation in Enterprise Systems

James W.J. Xue, Adam P. Chester, Ligang He and Stephen A. Jarvis
Department of Computer Science, University of Warwick, Coventry CV4 7AL, United Kingdom

Abstract

It is common for Internet service hosting centres to use several logical pools to assign server resources to different applications, and to try to achieve the highest total revenue by making efficient use of these resources. In this paper, multi-tiered enterprise systems are modelled as multi-class closed queueing networks, with each network station corresponding to an application tier. In such queueing networks, bottlenecks can limit overall system performance and should therefore be avoided. We propose a bottleneck-aware server switching policy, which responds to system bottlenecks and switches servers to alleviate these problems as necessary. The switching engine compares the benefits and penalties of a potential switch, and decides whether the switch is likely to be worthwhile. We also propose a simple admission control scheme, in addition to the switching policy, to deal with system overloading and optimise the total revenue of the multiple applications in the hosting centre. Performance evaluation has been conducted via simulation, and the results are compared with those from a proportional switching policy and from a system that implements no switching policy. The experimental results show that the combination of the bottleneck-aware switching policy and the admission control scheme consistently outperforms the other two policies in terms of revenue contribution.

1 Introduction

It is common for companies to outsource their IT infrastructure to Internet Service Providers (ISPs) to reduce their operational costs. ISPs have their own resource centres and commonly host many services simultaneously. ISPs provide services based on Service Level Agreements (SLAs) between themselves and the IT service companies. In order to make a profit, ISPs have to make efficient use of their resources while providing the agreed services on behalf of their customers.

Enterprise systems are typically multi-tiered, consisting of web servers, application servers and database servers. Some systems also employ load balancers or edge servers before the web server tier to balance the workload. In each tier, there is normally a cluster of servers to improve the collective processing power. In this work, we model typical enterprise systems using a multi-class closed queueing network (see Figure 1). The advantage of using an analytical model is that we can easily capture the different performance metrics and identify potential bottlenecks without running the actual system. The model can also react to parameter changes while the application is running (e.g. from monitoring tools or system logs) and make dynamic server switching decisions to optimise pre-defined performance metrics.

A bottleneck is a phenomenon where the performance or capacity of an entire system is severely limited by a single component, sometimes called the bottleneck point. Formally, a bottleneck lies on a system's critical path and provides the lowest throughput [5]. The multi-tiered architecture of an enterprise system can introduce bottlenecks, which limit overall system performance. Moreover, the workload mix for a particular application often changes at run-time, which can shift system bottlenecks between tiers [2]. Therefore, system designers need to determine the best server configuration to avoid bottlenecks during capacity planning and provisioning, and ideally provide schemes to support dynamic server allocation at run-time.

Workload demand for Internet services is usually very bursty [1][3][26], so it is difficult to predict the workload level at a given point in time. A fixed server configuration is therefore far from satisfactory: it under-provisions an application when its workload is high, and it potentially wastes resources on the remaining applications supported by the system while their workload is light. It is thus desirable that server resources in a shared hosting environment can be switched between applications to accommodate workload variation. Moreover, during special events, some Internet services are subject to huge increases in demand, which in extreme cases can lead to system overloading. During an overload


period, the service's response time may grow to an unacceptable level, and the exhaustion of resources may cause the service to behave erratically or even crash [24]. In order to better deal with system overloading, many services employ admission control schemes [8][11][23][24]. By rejecting less important requests when the system is overloaded, admission control schemes can guarantee the performance of specific requests.

In this paper we propose a model-driven server switching policy to dynamically allocate resources in multi-tiered enterprise architectures in order to achieve the highest revenue. The switching decision is guided by bottleneck identification results from an established approach [6]. A local search algorithm is designed and used to find better server configurations as the basis for server switching when the system state changes. Furthermore, a simple admission control scheme is used to maintain the number of simultaneous jobs in an enterprise system at an appropriate level. We evaluate the performance of the switching policy via simulation and compare the results with those from a proportional switching policy and from a non-switching policy. The experimental results show considerable benefits of our proposed switching system.

The remainder of this paper is organised as follows. Section 2 reviews related work; section 3 presents the modelling of multi-tiered enterprise systems; section 4 introduces the concept of system bottlenecks and an identification methodology; section 5 describes the new server switching policy; section 6 briefly describes how admission control is achieved in this framework; the experimental results in section 7 demonstrate the quality of the combination of the admission control scheme and server switching policy; and section 8 concludes the paper.

Figure 1. A model of a typical configuration of a cluster-based multi-tiered Internet service. C represents customer machines; WS, AS and DS represent web servers, application servers and database servers, respectively.

2 Related Work

Recent research on revenue maximisation has attracted great interest. The work in [16] studies methods for maximising the profits of best-effort and QoS-demanding jobs in a web server farm. [18] provides differentiated services to different jobs using priority queues to maximise a service provider's revenue. In [7], the authors use an economic approach to manage resources in hosting centres; services "bid" for resources as a function of delivered performance, and as a result server energy usage can be reduced. [19] addresses the issue of server switching between different queues and tries to optimise the total profit by solving a dynamic programming equation, taking into account various rewards and penalties (such as a revenue reward and various job holding costs). In [12], the authors also try to maximise the total revenue by partitioning servers into logical pools and switching servers between pools at run-time. This paper is different from [12][19] in the following respects: a) this work addresses the same server switching issue, but for multi-tiered enterprise system architectures, and servers can be switched between pools within the same tier; b) a multi-class closed queueing network model is employed, as opposed to a single M/M/m queue, for computing the various performance metrics; c) this paper also captures the notion of bottlenecks and uses an established identification method to guide server switching; d) this paper also deals with system overloading by using a supporting admission control scheme.

3 Modelling of Multi-tiered Internet Services

3.1 The Model

A multi-tiered Internet service can be modelled using a multi-class closed queueing network [22][25]. Figure 1 shows a model for a typical configuration of such applications. In the model, C refers to the client; WS, AS and DS refer to the web server, application server and database server respectively. The queueing network is solved using the MVA (Mean Value Analysis) algorithm [20], which is based on Little’s law [15] and the Arrival Theorem [20][21] from standard queueing theory. In this section, we briefly describe how different performance metrics can be derived from the closed queueing network model. Table 1 summarises the notation used throughout this paper.


Consider a product-form closed queueing network with N load-independent service stations, and let {1, 2, . . . , N} be the set of station indexes. Suppose there are K customers, partitioned into R classes according to their service request patterns; customers grouped in a class are assumed to be statistically identical. {1, 2, . . . , R} is the set of class indexes. The service time, Sir, in a multi-class closed queueing network is the average time


Table 1. Notation used in this paper

  Symbol   Description
  Sir      Service time of a class-r job at station i
  vir      Visiting ratio of class-r jobs at station i
  N        Number of service stations in the QN
  K        Number of jobs in the QN
  R        Number of job classes in the QN
  Kir      Number of class-r jobs at station i
  mi       Number of servers at station i
  φr       Revenue of each class-r job
  πi       Marginal probability at centre i
  T        System response time
  Dr       Deadline for class-r jobs
  Er       Exit time for class-r jobs
  Pr       Probability that a class-r job stays
  Xr       Class-r throughput before switching
  X′r      Class-r throughput after switching
  Ui       Utilisation at station i
  ts       Server switching time
  td       Switching decision interval time

spent by a class-r job during a single visit to station¹ i. The service demand, denoted Dir, is the total service requirement: the average amount of time that a class-r job spends in service at station i during execution. It can be derived from the Service Demand Law [17] as Dir = Sir · vir, where vir is the visiting ratio of class-r jobs to station i. Kr is the total population of customers of class r, so the total population of the network is K = Σr Kr. The vector K = (K1, K2, . . . , KR) represents the population of the network.

In modern enterprise systems, clusters of servers are commonly used in each application tier to improve processing capability. Thus, when modelling such applications, we need to consider both -/M/1-FCFS and -/M/m-FCFS stations. Suppose there are k jobs in the queueing network; for i = 1, . . . , N and r = 1, . . . , R, the mean response time of a class-r job at station i can be computed as follows [4]:

$$\bar T_{ir}(\mathbf{k}) = \begin{cases} S_{ir}\left[1 + \sum_{s=1}^{R} \bar K_{is}(\mathbf{k} - \mathbf{1}_r)\right], & m_i = 1 \\ \dfrac{S_{ir}}{m_i}\left[1 + \sum_{s=1}^{R} \bar K_{is}(\mathbf{k} - \mathbf{1}_r) + \sum_{j=0}^{m_i - 2} (m_i - j - 1)\,\pi_i(j \mid \mathbf{k} - \mathbf{1}_r)\right], & m_i > 1 \end{cases} \qquad (1)$$

here, (k − 1r) = (k1, . . . , kr − 1, . . . , kR) is the population vector with one class-r job fewer in the system. The mean system response time is the sum of the mean response times of each tier.

For the case of multi-server stations (mi > 1), it is necessary to compute the marginal probabilities. The marginal probability that there are j jobs (j = 1, . . . , mi − 1) at station i, given that the network is in state k, is given by [4]:

$$\pi_i(j \mid \mathbf{k}) = \frac{1}{j}\left[\sum_{r=1}^{R} S_{ir}\, v_{ir}\, X_r(\mathbf{k})\,\pi_i(j - 1 \mid \mathbf{k} - \mathbf{1}_r)\right] \qquad (2)$$

Applying Little's law [15], the throughput of class-r jobs can be calculated:

$$X_r(\mathbf{k}) = \frac{k_r}{\sum_{i=1}^{N} v_{ir}\,\bar T_{ir}(\mathbf{k})} \qquad (3)$$

Applying Little's law again, together with the Forced Flow Law [17], we derive the mean queue length of class-r jobs at station i:

$$\bar K_{ir}(\mathbf{k}) = X_r(\mathbf{k}) \cdot \bar T_{ir}(\mathbf{k}) \cdot v_{ir} \qquad (4)$$

The starting point of this recursion is K̄ir(0, 0, . . . , 0) = 0, πi(0 | 0) = 1 and πi(j | 0) = 0; after K iterations, the system response time, throughput and mean queue length in each tier can be computed. In multi-class product-form queueing networks, per-class station utilisation can be computed using the following equation [20]:

$$U_{ir}(\mathbf{k}) = \frac{K_r\, D_{ir}}{\sum_{i=1}^{N} D_{ir}\left[1 + \bar K_i(\mathbf{k} - \mathbf{1}_r)\right]} \qquad (5)$$

and the total station utilisation Ui(k) is the sum of the per-class station utilisations, Ui(k) = Σ_{r=1..R} Uir(k).

The above is the exact solution for multi-class product-form queueing networks. The trade-off between exact solutions and approximations is one of accuracy against speed. We use exact solutions to guide server switching decisions, as a higher degree of accuracy is believed to be important here. A dedicated machine can be used for the switching system itself, to address speed and storage issues and to reduce interference with the servers themselves.

¹ In this paper, the terms station, centre and node have the same meaning, and are used interchangeably.
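To make the recursion concrete, the following is a minimal sketch of the exact MVA iteration, assuming every station is a single-server -/M/1-FCFS queue, so that only the first branch of eq. (1) applies; the multi-server branch would additionally maintain the marginal probabilities of eq. (2). The parameter values are the pool-1 entries of Table 2 (with single servers per tier for illustration), and all function and variable names are ours.

```python
import itertools
from typing import Dict, List, Tuple

def exact_mva(S: List[List[float]], v: List[List[float]], K: Tuple[int, ...]):
    """Exact multi-class MVA for load-independent single-server FCFS stations.

    S[i][r] -- mean service time of class-r jobs at station i (S_ir)
    v[i][r] -- visiting ratio of class-r jobs at station i (v_ir)
    K       -- population vector (K_1, ..., K_R)
    Returns the per-class throughputs X_r(K) and response times T_ir(K).
    """
    N, R = len(S), len(K)
    # Mean total queue length Q_i(k) for each population vector k, seeded
    # with the empty network (the starting point of eq. (4)).
    Q: Dict[Tuple[int, ...], List[float]] = {(0,) * R: [0.0] * N}
    X, T = [0.0] * R, [[0.0] * R for _ in range(N)]

    # Visit population vectors in order of increasing total population, so
    # that k - 1_r has always been solved before k is needed.
    for k in sorted(itertools.product(*(range(c + 1) for c in K)), key=sum):
        if sum(k) == 0:
            continue
        T = [[0.0] * R for _ in range(N)]
        X = [0.0] * R
        for r in range(R):
            if k[r] == 0:
                continue
            k_minus = tuple(c - (j == r) for j, c in enumerate(k))
            for i in range(N):
                # eq. (1), single-server case: an arriving class-r job sees
                # the queue of the network with one class-r job removed.
                T[i][r] = S[i][r] * (1.0 + Q[k_minus][i])
            # eq. (3): Little's law applied to the whole network.
            X[r] = k[r] / sum(v[i][r] * T[i][r] for i in range(N))
        # eq. (4): mean queue lengths for this population vector.
        Q[k] = [sum(X[r] * T[i][r] * v[i][r] for r in range(R))
                for i in range(N)]
    return X, T

# Pool-1 parameters from Table 2 (rows: WS, AS, DS; columns: silver, gold).
S = [[0.07, 0.10], [0.03125, 0.1125], [0.05, 0.025]]
v = [[1.0, 0.6], [1.6, 0.8], [1.2, 0.8]]
X, T = exact_mva(S, v, (3, 2))   # 3 silver jobs and 2 gold jobs
print("class throughputs:", X)
```

Because the recursion visits every population vector, its cost grows with the product of the (Kr + 1) values; this is the accuracy/speed trade-off noted above.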

4 Bottleneck Identification

It has been shown in [2] that multi-class models can exhibit multiple simultaneous bottlenecks, and the dependency of the bottleneck set on the workload mix is derived there. In an enterprise system there are normally several classes of jobs, and the class mix can change at run-time. This means that there may be several bottlenecks at the same time, and that bottlenecks can shift from tier to tier over time. Bottleneck identification methodologies are therefore desirable in such systems.


4.1 Identification Methods

In [9], it is shown that the bottleneck of a single-class queueing network is the station i with the largest service demand Si vi, under the assumption that the service time Si, the visiting ratio vi and the routing frequencies are invariant. Considerable research exists [2][9][14] studying bottleneck identification for multi-class closed product-form queueing networks as the population grows to infinity. For a finite population, the results in [10][13] can be used. In this paper we use the approach developed in [6], which uses convex polytopes for bottleneck identification in multi-class queueing networks. This method can compute the set of potential bottlenecks in a network with one thousand servers and fifty customer classes in just a few seconds.

Figure 2 and Figure 3 show the bottleneck identification results, obtained using convex polytopes, for our chosen configurations of pool 1 and pool 2. Figure 2 shows that in pool 1, when the percentage of gold class jobs is less than 46.2%, the web server tier is the bottleneck; when it is between 46.2% and 61.5%, the system enters a crossover region, where the bottleneck changes; and when the percentage of gold class jobs in pool 1 exceeds 61.5%, the application server tier becomes the bottleneck. Figure 3 shows the bottleneck identification for pool 2. It is more complex, and is a good example of multiple bottlenecks and bottleneck shifting. In this case, when the percentage of silver class jobs is less than 16.7%, the web server tier is the bottleneck; when it is between 16.7% and 33.3%, both the web server tier and the database tier are in a crossover region; if the percentage of silver class jobs lies between 33.3% and 50.0%, the database tier becomes the bottleneck; when it is between 50.0% and 75.0%, the system enters another crossover region, where the application server tier and the database server tier dominate; and finally, if the percentage of silver class jobs exceeds 75.0%, the application server tier is the bottleneck.

Figure 2. Bottleneck of the two-class queueing network in pool 1 (regions, by percentage of gold class jobs: WS tier, crossover at 46.2%-61.5%, AS tier).

Figure 3. Bottleneck of the two-class queueing network in pool 2 (regions, by percentage of silver class jobs: WS tier, WS/DS crossover, DS tier, AS/DS crossover, AS tier).
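The polytope construction of [6] is beyond the scope of a short example, but the classical asymptotic rule of [2][9] can be sketched directly: for a fixed class mix β, station utilisations grow in proportion to Σr βr Dir, so the stations maximising this weighted service demand saturate first. The sketch below, with names of our own choosing, applies this rule to the pool-1 demands of Table 2. Note that this simplified rule yields single crossover points rather than the crossover regions of Figures 2 and 3, so it will not reproduce the boundary percentages quoted above exactly.

```python
def asymptotic_bottlenecks(D, beta, tol=1e-9):
    """Station(s) that saturate first as the population grows with a fixed
    class mix: utilisation scales with sum_r beta_r * D_ir, so the stations
    maximising this weighted demand are the asymptotic bottlenecks [2][9].

    D[i][r] -- service demand of class r at station i (D_ir = S_ir * v_ir)
    beta[r] -- fraction of class-r jobs in the workload mix
    """
    load = [sum(b * d for b, d in zip(beta, row)) for row in D]
    peak = max(load)
    return [i for i, l in enumerate(load) if peak - l < tol]

# Pool-1 demands from Table 2; classes ordered (silver, gold).
S = [[0.07, 0.10], [0.03125, 0.1125], [0.05, 0.025]]
v = [[1.0, 0.6], [1.6, 0.8], [1.2, 0.8]]
D = [[s * w for s, w in zip(si, vi)] for si, vi in zip(S, v)]

tiers = ["WS", "AS", "DS"]
for gold in range(0, 101, 10):
    beta = [(100 - gold) / 100, gold / 100]
    names = [tiers[i] for i in asymptotic_bottlenecks(D, beta)]
    print(f"{gold:3d}% gold: bottleneck {names}")
```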

5 Server Switching

As previously highlighted, the workload in enterprise systems can vary significantly, and owing to this variation it is very difficult to predict the workload in advance. A one-time system configuration is therefore no longer effective, and it is desirable that servers can be switched from one pool to another depending on the load conditions. However, the server switching operation is not cost-free: during the switching period, the servers being switched cannot serve jobs. Therefore, a decision has to be made as to whether it is worth switching and, if so, how much resource should be switched [12].

Figure 4. Illustration of the proposed switching system (admission control, workload model, performance model, system monitoring and switching engine).

Figure 4 illustrates our proposed system. First, a workload model is built from the load admitted by the admission control component. Based on the workload model and system hardware information, a performance model can be built. The system monitoring facility collects run-time system data and communicates with the switching engine. If the job class mix changes, the monitoring tool should catch the change and pass it to the server switching engine. The engine then solves the performance

model and compares the benefits and penalties of all possible switches before making the final decision.

5.1 Revenue Function

For a typical Internet service, a user normally issues a sequence of requests (referred to as a session) during a visit to the service site. Intuitively, a request contributes full revenue if it is processed before its deadline Dr. When a request of class r misses its deadline, it still waits for execution with probability P(Tr), and credit is still due for late, yet successful, processing. When the response time Tr < Dr, then P(Tr) = 1: the request contributes full revenue and the user will send another request. Suppose Er is the time point at which the request is dropped from the system. It is assumed in this paper that when Dr ≤ Tr ≤ Er, the request remains in the system with probability P(Tr), which decreases linearly from 1 at the deadline to 0 at the exit time (equivalently, the quit time is uniformly distributed over [Dr, Er]). If Tr ≥ Er, then P(Tr) = 0: the request quits the system without contributing any revenue. The following equation is used for calculating P(Tr):

$$P(T_r) = \begin{cases} 1, & T_r < D_r \\ \dfrac{E_r - T_r}{E_r - D_r}, & D_r \le T_r \le E_r \\ 0, & T_r > E_r \end{cases} \qquad (6)$$

The meaning of the above equation is that the further a job's completion time exceeds its deadline, the more likely it is that the client will quit the system; this approximates real-world client behaviour.

Based on the revenue function, the revenue gained and lost through server switching can be calculated. Suppose some servers need to be switched from pool i to pool j. We use V^i_loss to denote the revenue loss in pool i. From the moment switching begins, the service capacity offered by server pool i starts to degrade. The revenue loss in pool i over the decision interval is

$$V^i_{\mathrm{loss}} = \sum_{r=1}^{R} X^i_r(\mathbf{k}^i)\,\phi^i_r\,P_r\,t_d \;-\; \sum_{r=1}^{R} X'^i_r(\mathbf{k}^i)\,\phi^i_r\,P_r\,t_d \qquad (7)$$

The server switching itself takes time, during which neither pool i nor pool j can use the servers being switched; only after the switching time ts does pool j benefit from the switched servers. Over the switching decision interval td, the revenue gain in pool j is

$$V^j_{\mathrm{gain}} = \sum_{r=1}^{R} X'^j_r(\mathbf{k}^j)\,\phi^j_r\,P_r\,(t_d - t_s) \;-\; \sum_{r=1}^{R} X^j_r(\mathbf{k}^j)\,\phi^j_r\,P_r\,(t_d - t_s) \qquad (8)$$

here, it is assumed that the decision interval time td > ts.

Our goal in this paper is to maximise the ISP's total revenue contributed by both pool i and pool j. In other words, when we decide whether to switch servers, we need to compare the revenue gain and loss caused by server switching, and the switch is made only when V^j_gain > V^i_loss. In this paper, we only consider switching servers between pools in the same tier (i.e., we switch web servers from pool i to the web server tier in pool j), although given a proper configuration, switching between tiers is also possible (i.e., switching web servers in pool i to the application tier in pool j).
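A short sketch of this revenue test follows, assuming the per-class throughputs before and after a candidate switch have already been obtained from the MVA model; the function names are ours, and P_r is evaluated with eq. (6) as reconstructed above (a stay probability that decays linearly between deadline and exit time).

```python
def stay_probability(T_r: float, D_r: float, E_r: float) -> float:
    """Eq. (6): a job earns full credit within its deadline D_r, a linearly
    decreasing credit up to the exit time E_r, and nothing afterwards."""
    if T_r < D_r:
        return 1.0
    if T_r <= E_r:
        return (E_r - T_r) / (E_r - D_r)
    return 0.0

def switch_is_worthwhile(X_i, Xp_i, phi_i, P_i,
                         X_j, Xp_j, phi_j, P_j,
                         t_d: float, t_s: float) -> bool:
    """Eqs. (7)-(8): compare the revenue lost in the donor pool i against
    the revenue gained in the recipient pool j over one decision interval.

    X_*   -- per-class throughputs before the switch
    Xp_*  -- per-class throughputs after the switch (primed in the text)
    phi_* -- per-class revenue units;  P_* -- per-class stay probabilities
    Assumes t_d > t_s, as in the text.
    """
    v_loss = t_d * sum((x - xp) * f * p                     # eq. (7)
                       for x, xp, f, p in zip(X_i, Xp_i, phi_i, P_i))
    v_gain = (t_d - t_s) * sum((xp - x) * f * p             # eq. (8)
                               for x, xp, f, p in zip(X_j, Xp_j, phi_j, P_j))
    return v_gain > v_loss
```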

5.2 Server Switching Policy

5.2.1 Proportional Switching Policy

First, we consider a naïve policy called the proportional switching policy (PSP). The policy switches servers between pools based on the workload proportion in the two pools. The performance of a candidate configuration is computed using the queueing network model; if the performance of the new configuration is better than that of the current one, the servers are switched, otherwise the server configuration remains the same. Algorithm 1 describes how the policy operates.

Algorithm 1 Proportional Switching Policy
Input: N, mi, R, Kir, Sir, vir, φr, ts, td
Output: server configuration
 1. for each i in N do
 2.   m¹i / m²i = K¹ / K²
 3. end for
 4. calculate Vloss and Vgain using eq. (7) and eq. (8);
 5. if Vgain > Vloss then
 6.   do the switching according to the calculations;
 7.   Sir ← S′ir;
 8. else
 9.   the server configuration remains the same;
10. end if
11. return current configuration

Algorithm 1 is simple in that it only considers the workload proportion. In fact, the workload mix and the revenue contributions of individual classes in different pools also affect the total revenue. In the next section, we introduce a new switching policy that takes these factors into account.

5.2.2 Bottleneck-aware Switching Policy

Here we describe a more sophisticated server switching policy, the bottleneck-aware switching policy (BSP), described in Algorithm 2. BSP works in two phases:

207

Authorized licensed use limited to: WARWICK UNIVERSITY. Downloaded on March 11, 2009 at 06:47 from IEEE Xplore. Restrictions apply.

Algorithm 2 The Bottleneck-aware Switching Policy
Input: N, mi, R, Kir, Sir, vir, φr, ts, td
Output: new configuration
 1. while bottleneck saturation found in one pool do
 2.   if found at the same tier in the other pool then
 3.     return;
 4.   else switch servers to the bottleneck tier;
 5.     mi ← m′i and Sir ← S′ir;
 6.   end if
 7. end while
 8. search configurations using Algorithm 3
 9. return current configuration.

Algorithm 3 The Configuration Search Algorithm
Input: N, mi, R, Kir, Sir, vir, φr, ts, td
Output: best configuration
Initialisation: compute U¹i, U²i
 1. while U¹0 > U²0 do
 2.   if m²0 > 1 then
 3.     m²0 ↓, m¹0 ↑; S²0r ← S′²0r;
 4.     while U¹1 > U²1 do
 5.       if m²1 > 1 then
 6.         m²1 ↓, m¹1 ↑; S²1r ← S′²1r;
 7.         while U¹2 > U²2 do
 8.           if m²2 > 1 then
 9.             m²2 ↓, m¹2 ↑; S²2r ← S′²2r;
10.             compute Vloss using eq. (7);
11.             S¹2r ← S′¹2r;
12.             compute Vgain using eq. (8);
13.             if Vgain > Vloss then
14.               store current configuration;
15.             end if
16.             compute new U¹i, U²i;
17.           end if
18.         end while
19.         similar steps for U¹2 < U²2
20.         S¹1r ← S′¹1r;
21.         compute new U¹i, U²i;
22.       end if
23.     end while
24.     similar steps for U¹1 < U²1
25.     S¹0r ← S′¹0r;
26.     compute new U¹i, U²i;
27.   end if
28. end while
29. similar steps for U¹0 < U²0
30. return best configuration.

1) Bottleneck identification. The algorithm first checks for bottleneck saturation in both pools. If both pools have a bottleneck at the same tier, two cases are considered: a) if both are saturated, then no server is switched; b) if a bottleneck is saturated in one pool but not in the other, then the algorithm incrementally switches servers to the bottleneck tier and compares the new revenue with that of the current configuration. If a potential switch results in more revenue, the configuration is stored. The process continues until there is no bottleneck saturation in either pool, or no more servers can be switched from the other pool. Note that when bottleneck saturation is found, server switching in other tiers has little or no effect, so it can safely be neglected. 2) Local search. If there is no bottleneck saturation in either pool, the algorithm computes the server utilisation of all tiers in both pools and switches servers from low-utilisation tiers to high-utilisation tiers using a local search algorithm (Algorithm 3). In both algorithms, superscripts denote pools and subscripts 0, 1, 2 denote the web tier, application tier and database tier, respectively.

Algorithm 3 uses nested loops to search for possible server switches, starting from the web tier and continuing down to the database tier. It explores as many switching configurations as possible; however, it does not guarantee that the best switching result (the global optimum) will be found, and it is thus a best-effort algorithm. If m0, m1 and m2 denote the total numbers of web servers, application servers and database servers across both pools, then in the worst case the total number of configurations examined by Algorithm 3 is (m0 - 2) × (m1 - 2) × (m2 - 2), so the time complexity is O(m0 · m1 · m2). For typical server configurations, m0, m1 and m2 are not normally large, so Algorithm 3 is feasible in practice. The time for each search iteration depends on the complexity of the underlying queueing network model, which in turn depends on the number of stations and the number of job classes (the dominant factor, as shown in [14]). Enterprise systems are normally three-tiered (N = 3), and the number of job

classes is normally small, depending on the classification criteria. Therefore, solving such a multi-class closed queueing network model is very quick, and the same applies to each iteration of the search algorithm. As shown later in this paper, for our configuration the average runtime of the algorithm is less than 200 milliseconds on a 2.2GHz machine, which is considered acceptable.
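As an illustration of this search space, rather than of Algorithm 3 itself, the brute-force enumeration below examines every split of each tier's servers between the two pools, of the same order as the (m0 - 2) × (m1 - 2) × (m2 - 2) worst case above, and keeps the most profitable configuration. The evaluate_revenue callback is hypothetical; it stands for solving the queueing model for a candidate split and discounting the switching cost via eqs. (7) and (8).

```python
from itertools import product
from typing import Callable, Tuple

def best_configuration(total: Tuple[int, int, int],
                       current: Tuple[int, int, int],
                       evaluate_revenue: Callable[[Tuple[int, int, int]], float]
                       ) -> Tuple[int, int, int]:
    """Best-effort search over pool-1 server counts (web, app, db).

    total   -- total servers per tier across both pools (m0, m1, m2)
    current -- servers currently assigned to pool 1 in each tier
    Pool 2 implicitly receives total[t] - cfg[t] servers in tier t, and
    every tier keeps at least one server in each pool.
    """
    best_cfg, best_val = current, evaluate_revenue(current)
    for cfg in product(*(range(1, m) for m in total)):
        val = evaluate_revenue(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg
```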

6 Admission Control

In this paper, we also use a simple admission control scheme, in addition to the server switching policy, to maintain the number of concurrent jobs in the system at an appropriate level. When the workload is high, which in turn makes the overall system response time high, the less important requests are rejected first. If requests in this category are rejected but the overall response time remains high, the AC scheme continues to reject jobs until the response time decreases to an acceptable level.
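A minimal sketch of this loop follows, assuming a hypothetical response_time callback (e.g. the MVA model applied to the current admitted populations) and one-job-at-a-time rejection granularity; both are our assumptions, not details given in the text.

```python
def admission_control(response_time, jobs, target):
    """Sketch of the admission control scheme described above: while the
    modelled response time exceeds the target, reject jobs one at a time,
    least important class (lowest revenue unit) first.

    response_time(jobs) -- overall response time for the given populations,
                           e.g. from the MVA model (hypothetical callback)
    jobs   -- {class_name: [population, revenue_unit]}
    target -- acceptable overall response time
    """
    for cls in sorted(jobs, key=lambda c: jobs[c][1]):   # silver before gold
        while jobs[cls][0] > 0 and response_time(jobs) > target:
            jobs[cls][0] -= 1                            # reject one job
    return jobs
```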

7 Performance Evaluation

7.1 Experimental Setup

We have designed and developed a simulator to evaluate the server switching approach in this paper. Two applications are simulated, running on two logical pools (1 and 2). Each application has two classes of job (gold and silver), which represent the importance of these jobs. Both applications are multi-tiered and run on clusters of servers. The service times Sir and the visiting ratios vir are chosen based on realistic values or taken from the supporting literature.

Table 2. Experimental parameters.

                           Pool 1              Pool 2
                        Silver    Gold      Gold     Silver
  Service     WS        0.07      0.1       0.05     0.025
  time        AS        0.03125   0.1125    0.01     0.06
  (sec)       DS        0.05      0.025     0.0375   0.025
  Visiting    WS        1.0       0.6       1.0      0.8
  ratio       AS        1.6       0.8       2.0      1.0
              DS        1.2       0.8       1.6      1.6
  Deadline (sec)        20        15        6        8
  Exit point (sec)      30        20        10       12
  Revenue unit          2         10        20       4
  Number      WS             4                   5
  of          AS             10                  15
  servers     DS             2                   3

Based on a real test-bed to which we have access, application server switching takes less than five seconds, and web server switching is relatively straightforward. Database server switching is more complex; however, this does not affect the switching policy itself. For simplicity, we assume in this paper that the switching cost for web servers, application servers and database servers is the same. The experimental parameters used for our evaluation can be found in Table 2.

7.2 Evaluation Results

Experiments have been conducted for two different workload scenarios, called mixed workload and random workload. For each case, we compare the results from our proposed bottleneck-aware server switching policy (BSP) with those from the proportional server switching policy (PSP) and the non-switching policy (NSP).

7.2.1 Mixed Workload

As described in section 4, even if the total workload remains the same, system bottlenecks can shift between tiers depending on the workload mix. To study the system behaviour under different workload mixes, we choose a few key evaluation points illustrated in Figure 2 and Figure 3. Two sets of experiments are run: 1) keeping the workload mix constant in pool 1 and altering the workload mix in pool 2, as shown in Figure 5; 2) keeping the workload mix in pool 2 constant and altering the workload mix in pool 1, as shown in Figure 6. The server switching time is set to 5 seconds and the switching decision is made every 30 seconds. We explain the impact of the workload mix on the total revenue for NSP, and compare the results against the PSP and BSP policies.

From Figure 5, it can be seen that when the workload mix in pool 1 is constant, (a), (b) and (c) show similar patterns. The total revenue from both pools under NSP and PSP decreases as the percentage of silver class jobs in pool 2 increases from 10% to 40%. This is understandable, as silver class jobs contribute less to the total revenue. When the percentage increases to 50%, there is a large increase in total revenue; based on our observations, this is due to a lower response time in pool 2, which falls below Er for gold class jobs in that pool. When the percentage of silver class jobs is over 50%, although the response time in pool 2 decreases, the total revenue decreases again due to the decreasing weight of gold class jobs. It can also be seen that Figure 5(a) has the highest revenue and Figure 5(c) the lowest among the three cases. This is due to the longer response time (within the deadline) in pool 1 as the percentage of gold class jobs in that pool increases: a longer response time results in lower throughput, which in turn results in a smaller revenue contribution.

In the second set of experiments, the workload mix in pool 2 is constant and the percentage of gold class jobs in pool 1 is altered. Figures 6(a), 6(b) and 6(c) again present similar patterns. The total revenue in all three cases decreases as the percentage of gold class jobs in pool 1 increases from 10% to 50%, and the difference in revenue between BSP and the other two policies becomes smaller as the weight of gold class jobs increases. When the percentage is greater than 50%, the total revenue increases as the percentage of gold class jobs in pool 1 increases. We notice that the total revenue in Figure 6(c) is significantly higher than in the other two cases. This is due to the low response times (below the Er values) of both classes of jobs in pool 2, which results in a significant increase in revenue.

In both sets of experiments, it can be seen that PSP and NSP have almost the same impact on total revenue for the one-time switching. The total revenue from BSP is always higher than those from the other two policies, as the local search algorithm is employed in BSP and switching is done only when a better configuration is found.

Figure 5. The relationship between total revenue and workload mix. The number of customers in both pools is 100. The ratio of silver class jobs to gold class jobs in pool 1 is (80:20) in (a), (60:40) in (b) and (20:80) in (c), respectively. The percentage of silver class jobs in pool 2 ranges from 10% to 90%.

Figure 6. The relationship between total revenue and workload mix. The number of customers in both pools is 100. The ratio of gold class jobs to silver class jobs in pool 2 is (80:20) in (a), (60:40) in (b) and (20:80) in (c), respectively. The percentage of gold class jobs in pool 1 ranges from 10% to 90%.
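For reference, the first mixed-workload experiment can be summarised as the driver loop below; simulate is a hypothetical callback wrapping our simulator, returning the total revenue of both pools for the given mixes under one policy.

```python
def mixed_workload_sweep(policies, pool1_mix=(80, 20)):
    """Sweep the pool-2 silver percentage from 10% to 90% while holding the
    pool-1 (silver:gold) mix fixed, recording total revenue per policy.

    policies -- {"NSP"|"PSP"|"BSP": simulate(pool1_mix, pool2_mix) -> revenue}
    """
    results = {name: [] for name in policies}
    for silver_pct in range(10, 100, 10):
        pool2_mix = (silver_pct, 100 - silver_pct)
        for name, simulate in policies.items():
            results[name].append(simulate(pool1_mix, pool2_mix))
    return results
```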


7.2.2 Random Workload

In this section, we consider a more representative workload scenario: the random workload. The number of users in pools 1 and 2 is uniformly distributed between 20 and 200, and the workload mix in each pool is also random. In section 7.2.1, a thirty-second fixed switching decision interval was used. In this section the switching decision interval is the same as the workload change interval, which is a random number uniformly distributed in a fixed range. Two cases are considered: 1) a short switching decision interval, uniformly distributed between 15 and 25 seconds; 2) a long switching decision interval, uniformly distributed between 25 and 55 seconds. In section 7.2.1, a fixed 5-second server switching time was used; here we also alter the switching time (5, 10 and 15 seconds) and evaluate the impact of the switching cost on the total revenue of the three switching policies. We evaluate the performance of the three policies with and without the admission control scheme for each of the above cases. All the experiments run for approximately two hours, during which 1,000 switching decisions are made.

Tables 3 and 4 list the performance results for the short and long switching decision intervals, respectively. As can be seen from Table 3, for all server switching times, both PSP and BSP perform better than NSP in terms of revenue contribution, with and without AC. When no AC is applied, the improvements (PSP and BSP respectively) are 21.1% and 143.3%, 23.3% and 102.2%, and 25.2% and 102.1% for the 5, 10 and 15 second switching times. With AC, the improvements are 20.2% and 143.7%, 23.7% and 142.9%, and 25.5% and 104.4%. Without AC, the numbers of switches are 130 and 20, 108 and 3, and 101 and 3 for the 5, 10 and 15 second switching times; with AC, the numbers are 145 and 15, 112 and 13, and 106 and 3. As can be seen, the number of server switches decreases as the server switching time increases, because a longer switching time makes server switching more costly, which results in fewer switches. PSP always performs more switches than BSP. Also, as the server switching time increases, the total revenue from BSP decreases slightly whereas that from PSP increases. This is understandable, since PSP makes switching decisions solely on the basis of workload proportion, and it switches servers even though the performance improvement may be very small. BSP, on the other hand, searches for the best switch, yielding a larger improvement at each switching step. We find that the configuration returned by BSP is usually much further from the current configuration (one not found by PSP), so each BSP switching step is more costly than a PSP step; however, on average the ratio of improvement to cost for each BSP step is greater than for PSP. Thus, BSP yields more revenue than PSP. Due to the nature of the random load, servers may need to be switched back to their original pool. As the switching time increases, the number of switches under both policies decreases; the total revenue therefore increases under PSP but decreases under BSP. Nevertheless, BSP consistently outperforms PSP in terms of revenue contribution in all cases, and the improvement of BSP over NSP is more than four times that of PSP.

Table 3. Short decision interval.

  Switching time (sec)                          5                    10                   15
  Policy                                NSP   PSP    BSP     NSP   PSP    BSP     NSP   PSP    BSP
  Without  Number of switches           0     130    20      0     108    3       0     101    3
  AC       Total revenue (x1000)        2340  2833   5692    2340  2886   4731    2340  2928   4730
           Improvement over NSP (%)     0     21.1   143.3   0     23.3   102.2   0     25.2   102.1
  With     Number of switches           0     145    15      0     112    13      0     106    3
  AC       Total revenue (x1000)        2340  2813   5702    2340  2894   5684    2340  2937   4783
           Improvement over NSP (%)     0     20.2   143.7   0     23.7   142.9   0     25.5   104.4
           Improvement over non-AC (%)  0     -0.71  0.17    0     0.27   20.2    0     0.29   1.13

From Table 3, it can also be seen that when AC is employed, there is a considerable improvement (20.2%) when the server switching time is 10 seconds; the improvement in the other two cases is less pronounced. The table also shows that when AC is employed, PSP performs more switches in each case than without AC. We believe this is a result of the workload mix change caused by the AC.

Table 4. Long decision interval.

  Switching time (sec)                          5                    10                   15
  Policy                                NSP   PSP    BSP     NSP   PSP    BSP     NSP   PSP    BSP
  Without  Number of switches           0     152    20      0     134    20      0     119    3
  AC       Total revenue (x1000)        4778  5702   11567   4778  5710   11557   4778  5832   9539
           Improvement over NSP (%)     0     19.4   142.1   0     19.5   141.9   0     22.1   99.7
  With     Number of switches           0     158    13      0     82     15      0     80     15
  AC       Total revenue (x1000)        4778  5661   11579   4778  6399   11577   4778  6436   11566
           Improvement over NSP (%)     0     18.5   142.4   0     33.9   142.3   0     34.7   142.1
           Improvement over non-AC (%)  0     -0.73  0.11    0     12.1   0.17    0     10.4   21.2

Table 4 presents results similar to those in Table 3. Without AC, the number of switches for PSP increases (relative to Table 3) from 130 to 152, 108 to 134 and 101 to 119 for the 5, 10 and 15 second switching times, respectively; the number for BSP drops to 3 in the 15 second case, a trend that can also be seen in Table 3. This is reasonable, as longer switching interval times result in potentially better configurations and thus more switches. With AC, the number of server switches for PSP increases from 145 to 158 in the 5 second case, but decreases from 112 to 82 and from 106 to 80 in the other two cases; the numbers of switches for BSP are 13, 15 and 15 for the 5, 10 and 15 second switching times, respectively. We believe that the workload mix (more weight on gold class jobs) in the long switching decision interval case results in more potentially better configurations, and thus more switches. The revenue improvement when using BSP is almost 142% in all cases, regardless of the use of AC (the exception is the 99.7% improvement when the server switching time is 15 seconds and no AC is employed); the reason for this decrease is the same as for the number of switches above. The total revenue improvements from PSP without AC are 19.4%, 19.5% and 22.1% for the three switching times; with AC, the improvements are 18.5%, 33.9% and 34.7%. These improvements are, however, much smaller than those from BSP, regardless of the use of AC.

8 Conclusions

In this paper we propose a model-driven server switching policy to dynamically allocate server resources in enterprise systems. Such systems are normally multi-tiered, and in each tier a cluster of servers is commonly used to improve processing capability. We model the multi-tiered architecture as a multi-class closed queueing network, with each network station corresponding to an application tier. The


multi-tiered architecture can introduce bottlenecks, which limit overall system performance. In this paper, we use a convex-polytope-based approach to identify bottlenecks in the multi-class closed queueing network. The proposed switching policy responds to identified bottlenecks and switches servers between pools when necessary. In addition, we use an admission control scheme to deal with system overloading, which guarantees that the underlying system can remain responsive to specific customers. Performance evaluation has been conducted via simulation, and the results are compared with those from a naïve switching policy and from a system that implements no switching. Our experimental results show that the combination of the admission control scheme and the proposed switching policy performs substantially better than the other two policies in terms of revenue contribution.

Acknowledgment

This work is supported in part by the UK Engineering and Physical Sciences Research Council (EPSRC), contract number EP/C538277/1.

References

[1] M. Arlitt and T. Jin. A Workload Characterization Study of the 1998 World Cup Web Site. IEEE Network, 14(3):30–37, 2000.
[2] G. Balbo and G. Serazzi. Asymptotic Analysis of Multiclass Closed Queueing Networks: Multiple Bottlenecks. Performance Evaluation, 30(3):115–152, 1997.
[3] P. Barford and M. Crovella. Generating Representative Web Workloads for Network and Server Performance Evaluation. ACM SIGMETRICS Performance Evaluation Review, 26(1):151–160, 1998.
[4] G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modelling and Performance Evaluation with Computer Science Applications. Wiley, 2nd edition, 2006.
[5] J.-Y. Le Boudec. Rate Adaptation, Congestion Control and Fairness: A Tutorial, Nov 2005.
[6] G. Casale and G. Serazzi. Bottlenecks Identification in Multiclass Queueing Networks Using Convex Polytopes. In 12th Annual Meeting of the IEEE International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2004.
[7] J. S. Chase and D. C. Anderson. Managing Energy and Server Resources in Hosting Centers. In 18th ACM Symposium on Operating Systems Principles, 2001.
[8] L. Cherkasova and P. Phaal. Session-Based Admission Control: A Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, 51(6), 2002.
[9] P. J. Denning and J. P. Buzen. The Operational Analysis of Queueing Network Models. ACM Computing Surveys, 10(3):225–261, 1978.
[10] D. L. Eager and K. C. Sevcik. Bound Hierarchies for Multiple-class Queueing Networks. Journal of the ACM, 33(1):179–206, 1986.
[11] S. Elnikety, E. Nahum, J. Tracey, and W. Zwaenepoel. A Method for Transparent Admission Control and Request Scheduling in e-Commerce Web Sites. In International WWW Conference, New York, USA, 2004.
[12] L. He, W. J. Xue, and S. A. Jarvis. Partition-based Profit Optimisation for Multi-class Requests in Clusters of Servers. In IEEE International Conference on e-Business Engineering, 2007.
[13] T. Kerola. The Composite Bound Method for Computing Throughput Bounds in Multiple Class Environments. Performance Evaluation, 6(1):1–9, 1986.
[14] M. Litoiu. A Performance Analysis Method for Autonomic Computing Systems. ACM Transactions on Autonomous and Adaptive Systems, 2(1):3, 2007.
[15] J. Little. A Proof of the Queueing Formula L = λW. Operations Research, 9(3):383–387, May 1961.
[16] Z. Liu, M. Squillante, and J. Wolf. On Maximizing Service-level-agreement Profits. ACM SIGMETRICS Performance Evaluation Review, 29:43–44, 2001.
[17] D. A. Menasce and V. A. F. Almeida. Capacity Planning for Web Performance: Metrics, Models, and Methods. Prentice Hall PTR, 1998.
[18] D. A. Menasce, V. A. F. Almeida, R. Fonseca, and M. A. Mendes. Business-oriented Resource Management Policies for e-Commerce Servers. Performance Evaluation, 42:223–239, 2000.
[19] J. Palmer and I. Mitrani. Optimal and Heuristic Policies for Dynamic Server Allocation. Journal of Parallel and Distributed Computing, 65(10):1204–1211, 2005.
[20] M. Reiser and S. Lavenberg. Mean-value Analysis of Closed Multi-chain Queueing Networks. Journal of the ACM, 27:313–322, 1980.
[21] K. Sevcik and I. Mitrani. The Distribution of Queueing Network States at Input and Output Instants. Journal of the ACM, 28(2), April 1981.
[22] B. Urgaonkar, G. Pacifici, P. J. Shenoy, M. Spreitzer, and A. Tantawi. An Analytical Model for Multi-tier Internet Services and its Applications. ACM SIGMETRICS Performance Evaluation Review, pages 291–302, 2005.
[23] B. Urgaonkar and P. Shenoy. Cataclysm: Policing Extreme Overloads in Internet Applications. In Proceedings of WWW 2005, Chiba, Japan, May 2005.
[24] M. Welsh and D. Culler. Adaptive Overload Control for Busy Internet Servers. In 2003 USENIX Symposium on Internet Technologies and Systems, 2003.
[25] A. Zalewski and A. Ratkowski. Evaluation of Dependability of Multi-tier Internet Business Applications with Queueing Networks. In International Conference on Dependability of Computer Systems (DEPCOS-RELCOMEX'06), 2006.
[26] J. Y. Zhou and T. Yang. Selective Early Request Termination for Busy Internet Services. In 15th International Conference on World Wide Web, Edinburgh, Scotland, 2006.
