
SIP Overload Control Using Automatic Classification

Zohair Chentouf
Department of Computer Science, College of Computer and Information Sciences
King Saud University, Riyadh, Saudi Arabia
[email protected]; 966-14695215

Abstract- As the Session Initiation Protocol (SIP) increasingly becomes the core of communication network convergence, there is an urgent need to manage the critical problem of SIP service availability under extreme overload. This paper proposes a novel solution to control SIP overload. We demonstrate the efficiency of the approach compared with a well-known SIP overload control algorithm. The solution builds on monitoring a set of SIP server features and uses Support Vector Machines to classify traffic behavior as problematic or not. The proposed solution is validated through experiments.

Keywords- SIP, overload control, automatic classification, SVM

I. INTRODUCTION

The main driver of today’s evolving communications world is the Session Initiation Protocol (SIP) [11], which has been adopted as the signaling protocol for the IP Multimedia Subsystem (IMS). The latter standard constitutes the convergence of the packet-switched, the Public Switched Telephone Network (PSTN), the wireless, and the cable technology networks. The central role of SIP in communications architectures makes it indispensable to pay attention to its capability of assuring reliable and highly available services. This is particularly sensitive when services require high amounts of computation and I/O, which is exactly the case for SIP, since SIP sessions often imply concurrent processing and multiple database lookups. Besides, as in any other client-server architecture, it is not uncommon for a SIP server to receive a request rate that exceeds its capacity. The SIP server has to be able to manage such an overload; otherwise, it will quickly be pushed into congestion collapse.

To manage overload, a SIP server may be designed to simply drop the requests that it cannot process. However, dropping incoming requests causes the corresponding SIP timers in the sending party to fire, which causes the requests to be retransmitted. Hence, the overload situation is quickly amplified [10]. Another design option conforms to the SIP standard [11], which recommends the use of the SIP built-in overload control mechanism. The latter relies on the 503-Service-Unavailable response message, which is used to reject an incoming request. When the upstream server receives 503-Service-Unavailable, it stops retransmitting the SIP request. However, the processing cost of this mechanism is known to be high [10]. SIP also specifies an optional parameter “retry-after” in the SIP 503-Service-Unavailable response. This parameter defines the amount of time that the upstream server should wait before sending any new request to the server. The mechanism that combines 503-Service-Unavailable with “retry-after” can be seen as an on/off overload control, which is known not to be sufficient to prevent congestion collapse [10].

A. Related Work
We are interested in the load control solutions that are designed to be executed by two SIP servers, an upstream server (Proxy A in Figure 1) and a downstream server (Proxy B in Figure 1). This configuration conforms to the Internet Engineering Task Force (IETF) requirements listed in [10]. In order to control the overload in Proxy B, the latter includes the control feedback in the SIP response messages that are sent back to Proxy A, which then has to apply the control to reduce the load that reaches Proxy B. The following research works used this configuration to test and compare different algorithms. In [2], the authors tested two overload control algorithms: the bang-bang and the CPU occupancy (OCC) algorithms. The output of the bang-bang algorithm is the decision to accept or reject a new incoming SIP message, based on a comparison between the incoming message queue size and a given threshold. The OCC algorithm calculates the acceptance rate based on the CPU occupancy. The experiments concluded that, in terms of goodput, bang-bang is better than OCC. In [3], the authors designed a distributed and dynamic window-based overload control called the Flow Control (FC) algorithm. The algorithm is based on the call establishment delay. Simulations showed that FC performs better than OCC in terms of both call setup delay and throughput. In [8], four different SIP overload control algorithms have been tested. The first algorithm is an improved version of the SIP built-in “retry-after” mechanism. The second is OCC.
The third algorithm, called Queue Delay Control, relies on the queue delay to estimate the message acceptance rate. The fourth algorithm, called Window Control, is based on the acceptance rate calculated by the upstream server instead of the downstream one. Simulations showed that the latter algorithm is better than the former three in terms of goodput. In [12], five algorithms are compared. Some are based on the message queue size or the queuing delay. The others are based on the processor occupancy, like OCC. The authors conclude through simulations that queuing-based algorithms outperform occupancy-based ones in terms of goodput. In [1], the authors propose a signal-based approach, which consists in implementing a controller in the upstream server. The controller’s operation is based on the measurement of the rate of overload responses (503 SIP messages) issued by the downstream SIP server and the rate of the request timeouts fired locally. Simulation results show that this algorithm and OCC provide the same performance in terms of goodput. Sun, Yu, and Zheng [14] propose a solution to accomplish Service Level Agreement (SLA)-oriented overload management. For that, the upstream server controls overload and manages request queuing based on the SLA. Simulations showed that the proposed solution prevents the system from overload collapse and better satisfies the service level. Other research works [5, 9, 18] focused on single SIP server overload control, where only one server implements the control algorithm without the cooperation of the upstream server. However, such a configuration does not conform to the viewpoint of the IETF described in [10], which recommends designing server-to-server overload solutions.
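To make the bang-bang decision rule from [2] concrete, the following minimal Python sketch accepts or rejects each incoming message by comparing the queue size with a threshold. It is an illustration only: the single-threshold form, the tick-based queue model, and the names `bang_bang_accept` and `simulate` are assumptions made here, not details taken from [2].

```python
def bang_bang_accept(queue_size: int, threshold: int) -> bool:
    """Accept a new message only while the queue is below the threshold."""
    return queue_size < threshold


def simulate(arrivals, threshold, service_per_tick):
    """Hypothetical tick-based queue: arrivals[i] messages arrive at tick i,
    then up to service_per_tick queued messages are served."""
    queue = 0
    accepted = 0
    rejected = 0
    for n in arrivals:
        for _ in range(n):
            if bang_bang_accept(queue, threshold):
                queue += 1
                accepted += 1
            else:
                rejected += 1
        # Serve part of the queue before the next tick.
        queue = max(0, queue - service_per_tick)
    return accepted, rejected


if __name__ == "__main__":
    print(simulate([5, 5, 5], threshold=4, service_per_tick=2))
```

The decision uses only the instantaneous queue size, which is what distinguishes it from the occupancy-based and delay-based algorithms discussed above.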

B. Research Motivation and Plan
The basic idea of the research reported here is to verify the following conjecture, which summarizes the author’s observations during his work in the SIP service industry: the performance of overload control mechanisms might be enhanced if they take into consideration multiple types of input data. For example, we noticed that in a SIP server it is often the database that contributes most to congestion, because it operates more slowly than the other modules of the SIP server. So, a good idea would be to integrate data on the database operation into the load control algorithm.

Our research plan consists in implementing and comparing two algorithms. The first is a queue delay-based algorithm called the Adaptive Rate Control (ARC) algorithm in [12]. The second algorithm realizes our conjecture by integrating four input data: (1) call establishment delay, (2) queuing delay, (3) SIP 100-Trying message delay, and (4) database response time. The choice of the first two parameters conforms to the conclusions of [3] and [8, 12], respectively. The third parameter expresses the status of the SIP stack. The fourth parameter is important because the database is often the slowest component in SIP servers. The integration of these four input data is done through a Support Vector Machine (SVM)-based automatic classifier.

The rest of the paper is organized as follows. Section II is an overview of SIP and Support Vector Machines. Section III details the ARC and SVM-based algorithms and the corresponding implementation architectures. Section IV discusses experimentation and results. Section V is a summary of the investigation and suggestions on potential directions.

II. BACKGROUND

A. Session Initiation Protocol
The Session Initiation Protocol (SIP) was originally designed in 1999 by the Internet Engineering Task Force (IETF). Since then, many other Requests for Comments (RFCs) have been produced in order to fix some issues in SIP or to add new features. The latest SIP specification is RFC 3261 [11]. SIP is used for initiating and terminating media sessions, and for changing parameters of a current session. A SIP session can involve voice and media such as video and text. A SIP architecture consists of SIP User Agents (UA) and SIP Servers. A SIP UA represents any network end-point that can originate or terminate a SIP session. It might be a SIP-enabled telephone, a SIP PC-client (known as a “softphone”), or a SIP-enabled gateway.

There exist four types of SIP servers:
(1) SIP Proxy Server: a call-control server that provides routing services of SIP messages (requests and responses) between SIP UAs and servers;
(2) SIP Redirect Server: receives an INVITE request from a UA and replies with a specific SIP response containing the right address to which the UA must send the INVITE request;
(3) SIP Registrar Server: receives REGISTER requests from UAs. Each REGISTER request contains the current address of the UA. Based on this, the registrar server updates its database of addresses;
(4) SIP Location Server: when a SIP request for a new session is received, this server makes a database lookup for the address of the callee.

Each SIP response has a reply code ranging from 100 to 699 and a reason phrase. For example, when a UA receives an INVITE request, it may reply by sending sequentially the following three responses: 100-Trying means that the UA has received the INVITE and that it is being processed. 180-Ringing means that the UA has received the INVITE and is alerting the user. 200-OK means that the INVITE is accepted (e.g., the called user has picked up). Figure 1 illustrates a SIP call between two UAs where two SIP proxies are involved.

Figure 1. Server-to-Server SIP call sequence pattern

B. Support Vector Machines
Support Vector Machines (SVM) have gained popularity as a machine learning algorithm. An SVM is used to learn a classification problem from examples. Training data are given by a set of couples $(x_i, y_i)$, with $y_i \in \{-1, +1\}$ being the correct classification of the vectors $x_i$. The SVM algorithm aims to distinguish between the two classes. For that, it first tries to find a separating hyperplane that solves the equation $w \cdot x - b = 0$. If such a hyperplane exists, SVM will generate a solution to the problem of maximizing the distance between the two hyperplanes $w \cdot x - b = 1$ and $w \cdot x - b = -1$ such that for all points $x_i$ either $w \cdot x_i - b \geq 1$ or $w \cdot x_i - b \leq -1$. This problem is proved to be equivalent to finding $w$ and $b$ to minimize $\|w\|$ so that $y_i (w \cdot x_i - b) \geq 1$ for all $i$. SVM may also use a non-linear separation, which has a similar formulation except that the dot product is replaced by a nonlinear kernel function such as a sigmoid or polynomial function. More details can be found in [17].
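As a small illustration of the linear formulation above, the following Python sketch checks whether a candidate pair (w, b) satisfies the margin constraint y_i (w · x_i − b) ≥ 1 on a toy data set and computes the distance 2/‖w‖ between the two margin hyperplanes. The data points, the candidate values, and the function names are hypothetical and chosen only for illustration. The sketch does not solve the optimization problem and is unrelated to the SVM-light solver used later in the paper.

```python
import math


def decision(w, b, x):
    """Value of w . x - b for one sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b


def satisfies_margin(w, b, samples):
    """True if every (x, y) pair satisfies y * (w . x - b) >= 1."""
    return all(y * decision(w, b, x) >= 1 for x, y in samples)


def margin_width(w):
    """Distance between the hyperplanes w . x - b = 1 and w . x - b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical, linearly separable toy data: (vector, label).
samples = [
    ((2.0, 2.0), +1),
    ((3.0, 3.0), +1),
    ((0.0, 0.0), -1),
    ((1.0, 0.0), -1),
]

if __name__ == "__main__":
    for w, b in [((2.0, 2.0), 6.0), ((1.0, 1.0), 3.0), ((0.5, 0.5), 1.5)]:
        print(w, b, satisfies_margin(w, b, samples), round(margin_width(w), 3))
```

Among the feasible candidates, the one with the smaller norm ‖w‖ yields the wider margin, which is why the problem is stated as a minimization of ‖w‖ under the constraints. The third candidate has an even smaller norm but violates the constraints, so it is not admissible.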

Among the different methods of classification, we chose the Support Vector Machines approach for its ability to process high-dimensional data [15, 16]. It has demonstrated very good performance in many domains such as bioinformatics, pattern recognition, and network-based anomaly detection. It has even demonstrated better accuracy than neural networks [6].

III. OVERLOAD CONTROL ALGORITHMS AND ARCHITECTURES

A. Implementation of ARC
The implementation of the Adaptive Rate Control (ARC) and the SVM-based algorithms relies on two different architectures. Figure 2 depicts the architecture of ARC. At the end of every control interval Tc, the SIP proxy sends the average queuing delay value dk to the front-end server, which calculates the new target load based on the following:

[Equation not recoverable from the extracted text]

where D is the allowed message queuing delay budget and C is a constant. The feedback is enforced at the front-end server by accepting only the target number of SIP requests during the next control interval Tc and dropping the excess requests. The feedback communication from the proxy to the front-end server is included in the SIP 200-OK response messages (see Figure 1).

Figure 2. SIP Architecture for ARC (diagram labels: SIP Load Generator, Front-end SIP Server, SIP Proxy Server, SIP, Regulated load, Feedback)

B. Implementation of SVM-SLC
Our proposed SVM-based method is called SVM-SIP Load Controller (SVM-SLC). The corresponding architecture is depicted in Figure 3. The SIP monitor is a SIP user agent that is located on the same host as the front-end server. It also contains the SVM classifier. It periodically sends INVITE requests to the proxy and establishes a complete SIP call session as depicted in Figure 1. The aim is for the monitor to measure the average 100-Trying message delay Tk and the average service time Sk. The former measures the performance of the proxy’s SIP stack and the latter measures the whole proxy’s performance. At the end of every control interval Tc, the SIP monitor sends a SIP OPTIONS message to the proxy, which responds with a 200-OK message that contains the average queuing delay value dk and the average database response time Dk. The monitor calculates the new target load based on the following:

(1) [Equation not recoverable from the extracted text]

where one parameter is a constant. The response of the SVM classifier is either +1 or -1. +1 means that during the last Tc period, and according to the four input variables, the system was not evolving towards congestion. So, the acceptance rate is increased. -1 means that the four inputs indicate that congestion is about to happen. That is why the acceptance rate should be decreased.

Figure 3. SIP Architecture for SVM-SLC (diagram labels: SIP Load Generator, Front-end SIP Server, SIP Proxy Server, SIP Monitor, SIP, Regulated load, Feedback, Status)
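The feedback enforcement described in this section (the front-end server accepts only a target number of SIP requests during a control interval Tc and drops the rest) can be sketched as a simple per-interval counter. This is a minimal Python illustration under stated assumptions: the class name `IntervalAdmission`, its methods, and the idea that the target is expressed as an integer count per interval are hypothetical. The sketch does not compute the target, since the rate equations are not available in this text.

```python
class IntervalAdmission:
    """Hypothetical front-end gate: admit at most `target` requests per
    control interval and drop the excess."""

    def __init__(self, target_per_interval: int):
        self.target = target_per_interval
        self.accepted = 0

    def new_interval(self, target_per_interval: int) -> None:
        """Start a new control interval with the latest target from feedback."""
        self.target = target_per_interval
        self.accepted = 0

    def admit(self) -> bool:
        """Return True if the request is forwarded, False if it is dropped."""
        if self.accepted < self.target:
            self.accepted += 1
            return True
        return False


if __name__ == "__main__":
    gate = IntervalAdmission(target_per_interval=3)
    print([gate.admit() for _ in range(5)])  # the last two requests are dropped
    gate.new_interval(target_per_interval=1)
    print([gate.admit() for _ in range(2)])
```

In such a design the gate itself is independent of how the target is produced, so the same enforcement step could serve both ARC and SVM-SLC.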

IV. EXPERIMENTAL STUDY AND RESULTS

Three stations have been used to host the load generator, the front-end server together with the monitor, and the proxy, respectively. Each machine has an Intel Pentium 4 CPU at 3.40 GHz and 1 GB of RAM, and runs Windows XP. We used JNI_SVM-light 6.01 [4] as the SVM classifier. To generate SIP load, we developed a Java-based load generator. This tool is capable of issuing concurrent calls and playing the role of callers and callees at the same time. Another option would have been to use SIPp, which is a Linux-based load generator [13]. The SIP server consists of the NIST SIP Stack and the NIST JAIN SIP 1.2 layer [7]. We developed a small application on top of the JAIN API that consists of a location and Authentication, Authorization and Accounting (AAA) service. When a SIP call request is received, the location-AAA service looks in the location database to check whether the callee is registered and whether the caller has the right to communicate with the callee. The call session then follows the state machine depicted in Figure 1. On receipt of BYE, the location-AAA service looks a second time in the database in order to extract the data required for calculating the call charge, then stores a complete Call Data Record (CDR) in the database.

A. Benchmarking and Calibration
For comparison purposes, we first conducted a load test in order to determine the system’s Busy Hour Call Attempt (BHCA) number with no overload control mechanism. Each call signalling conforms to the state machine illustrated in Figure 1. The call holding time is random between 10 and 20 seconds. Our result metric is goodput at a success rate of five nines (99.999%). Measuring the system’s load management performance in terms of goodput conforms to the IETF requirements listed in [10]. The maximum load that has been reached was 18000 cph, which represents the system’s BHCA. A load of 20000 cph causes the server to block at the beginning of the third hour. After the BHCA has been determined, we had to calibrate the ARC algorithm.
Table I summarizes the best configuration of the algorithm.

TABLE I. ARC CONFIGURATION
D         C      Tc
1000 ms   7000   500 ms

Then, in order to configure the SVM classifier, training data consisted of a set of vectors and the corresponding classification values (+1, -1). +1 means successful call and -1 means failed call. Those data have been logged while experimenting with a load of 19000 cph with no overload control. This load generates successful as well as failed calls before completely falling into a congestion collapse after more than two hours. In the log, every line corresponds to a vector Vk of four parameters and represents the last 500 ms of operation data. For every Vk, a classification result is assigned (in the log) based on the call success rate (during 500 ms). If the latter is 100%, the classification result is +1; otherwise, it is -1.

In the input data (Vk vectors and corresponding classification values), we noticed the presence of periods of failure that contain n consecutive failures (vectors labelled with -1). When n is small, the system can quickly recover to a 100% service rate. However, larger n values indicate that the server is in, or will soon enter, a failure phase from which it will never recover to a 100% service rate. When this situation occurs, it takes only a few minutes before the server completely falls into a congestion collapse. So, the input data for the SVM classifier have been changed: only the vectors Vk that belong to a failure period where n ≥ 10 have been labelled as failed (-1). The vectors Vk that belong to a failure period where n < 10 have been labelled as success (+1). This aims to prevent the SVM classifier from reporting false positives, which correspond to failures from which the system is able to recover. Later experiments showed that better SIP overload control is obtained if, in the SVM training phase, we also label as failure (-1) the 5 vectors Vk that immediately precede a failure period where n ≥ 10. By doing so, we allow the SIP monitor to suspect the long periods of failure or congestion early. Experiments also showed that the best classification accuracy (98%) is obtained when the polynomial SVM kernel is used.

After the training and tuning phase, the SVM classifier has been embedded in the SIP monitor. During load tests, the SIP monitor initiates a SIP call every monitoring period Tm. For every call, the monitor measures Tk (the 100-Trying message time) and Sk (the service time for the whole SIP call). In the 200-OK message (see Figure 1), the proxy sends to the monitor the value of Dk (the database average time of processing INVITE and BYE messages) and dk (the queuing time of INVITE messages). The monitor feeds the SVM classifier with the vector Vk of these four values. The classifier returns a value (+1 or -1).
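The relabelling of the training log described above (short failure runs relabelled as success, runs of at least 10 consecutive failures kept as failures together with the 5 vectors that precede them) can be sketched as follows. This is a minimal Python illustration, not the author’s tooling: the function name `relabel` and its parameters `min_run` and `lead` are hypothetical, and only the label sequence is processed, not the Vk vectors themselves.

```python
def relabel(raw, min_run=10, lead=5):
    """Relabel a sequence of raw per-interval labels (+1 or -1).

    Runs of consecutive -1 shorter than min_run become +1.
    Runs of at least min_run stay -1, and the `lead` labels
    immediately before such a run are also set to -1.
    """
    labels = [+1] * len(raw)
    i = 0
    while i < len(raw):
        if raw[i] == -1:
            j = i
            while j < len(raw) and raw[j] == -1:
                j += 1  # j ends one past the last failure of this run
            if j - i >= min_run:
                for k in range(max(0, i - lead), j):
                    labels[k] = -1
            i = j
        else:
            i += 1
    return labels


if __name__ == "__main__":
    raw = [+1] * 8 + [-1] * 3 + [+1] * 6 + [-1] * 10 + [+1] * 2
    print(relabel(raw))
```

In this example, the run of 3 failures is treated as recoverable and relabelled +1, while the run of 10 failures and the 5 intervals before it are labelled -1.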
At the end of every control period Tc > Tm, the monitor has C = Tc/Tm classification results. If there have been (C-α) failures or more, the monitor sets the classification term of equation (1) (Section III.B) to -1, which means a failure. If there have been α failures or fewer, the monitor sets it to +1. The monitor then calculates rk+1 using equation (1) and sends the result to the front-end SIP server using a SIP OPTIONS message. If there have been between (α+1) and (C-α-1) failures, the monitor does not send a new rate rk+1 to the front-end server. When the latter receives a new rate rk+1, it applies it immediately. Table II shows the different values we used during simulations.

TABLE II. SVM-SLC CONFIGURATION
Tm       Tc        α    constant in equation (1)   kernel
500 ms   3000 ms   1    0.15                       polynomial
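The per-interval decision rule just described can be sketched in Python. With the values of Table II, C = Tc/Tm = 3000/500 = 6 classification results are collected per control period and α = 1. The function name `control_decision` and the use of `None` to represent "send no new rate" are assumptions made for illustration. The rate computation of equation (1) is deliberately left out.

```python
def control_decision(results, alpha):
    """Aggregate the classifier outputs (+1 / -1) of one control period.

    Returns -1 if at least C - alpha results are failures,
            +1 if at most alpha results are failures,
            None otherwise (no new rate is sent).
    """
    c = len(results)
    failures = sum(1 for r in results if r == -1)
    if failures >= c - alpha:
        return -1
    if failures <= alpha:
        return +1
    return None


if __name__ == "__main__":
    tc_ms, tm_ms, alpha = 3000, 500, 1
    c = tc_ms // tm_ms  # 6 classification results per control period
    print(c)
    print(control_decision([+1, +1, +1, +1, +1, -1], alpha))
    print(control_decision([-1, -1, -1, -1, -1, +1], alpha))
    print(control_decision([-1, -1, +1, +1, +1, +1], alpha))
```

With these values, 0 or 1 failures lead to an increase, 5 or 6 failures lead to a decrease, and 2 to 4 failures leave the current rate unchanged.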

B. Experimentation results
A series of tests have been conducted with ARC and SVM-SLC. Results are summarized in Figure 4 and Figure 5, in which the load unit corresponds to BHCA = 18000 cph. Each experiment ran for one hour. Figure 4 shows that SVM-SLC drops more messages, which means that it detects congestion earlier than ARC. Figure 5 shows that SVM-SLC outperforms ARC in terms of goodput from more than 1 BHCA to less than 3 BHCA. As long as the load is less than 1.5 BHCA, SVM-SLC performs at 100%. Starting from more than 3 BHCA, the two algorithms perform equally. From 1.5 BHCA to 2 BHCA, SVM-SLC serves more than 80% of the accepted requests while ARC serves less.

Figure 4. Acceptance rate of ARC and SVM-SLC

Figure 5. Goodput of ARC and SVM-SLC

V. CONCLUSION

In this paper, we proposed a new solution to the problem of overload control in SIP servers. The solution conforms to [10] since the control feedback is calculated by the SIP server and communicated to the upstream SIP server to execute the control. The basic motivation of the research has been to verify an initial conjecture that summarizes the author’s observations during his work in the SIP service industry: the performance of overload control mechanisms might be enhanced if they take into consideration multiple types of input data. This conjecture has been validated through a comparative experimental study between the well-known Adaptive Rate Control (ARC) load control algorithm and the solution proposed here, called Support Vector Machine based SIP Load Control (SVM-SLC). In conformance with [10], the comparison has been based on the server goodput. Experimentation results show that SVM-SLC outperforms ARC in terms of goodput. In SVM-SLC, the SVM classifier is fed with four input data: call establishment delay, queuing delay, SIP 100-Trying message delay, and database response time. The ARC algorithm relies on one single input, which is the average queuing delay. The SVM method relies on supervised learning. Unsupervised learning techniques are appealing because they do not need manually labelled training data. As future work, we intend to investigate the use of unsupervised learning.

ACKNOWLEDGMENT
The work reported in this paper was carried out with the support of the College of Information and Computer Science Research Center under the grant 12018.

REFERENCES
[1] A. Abdelal and W. Matragi, Signal-Based Overload Control for SIP Servers, 7th IEEE Consumer Communications and Networking Conference (CCNC), 2010, pp. 1–7.
[2] V. Hilt and I. Widjaja, Controlling Overload in Networks of SIP Servers, IEEE International Conference on Network Protocols (ICNP), 2008, pp. 83–93.
[3] M. Homayouni, M. Jahanbakhsh, V. Azhari, and A. Akbar, Overload Control in SIP Servers: Evaluation and Improvement, IEEE 17th International Conference on Telecommunications (ICT), 2010, pp. 666–672.
[4] JNI_SVM-light 6.01, available at: http://www.mpi-inf.mpg.de/~mtb/.
[5] S. Montagna and M. Pignolo, Performance evaluation of Load Control Techniques in SIP Signaling Servers, Third International Conference on Systems (ICONS), 2008, pp. 51–56.
[6] S. Mukkamala, G. Janoski, and A. Sung, Intrusion detection: Support vector machines and neural networks, IEEE Computer Society Student Magazine, Vol. 10, N. 2, 2002.
[7] NIST SIP, available at: http://www-x.antd.nist.gov/proj/iptel/nist-sipdownloads.html.
[8] E. Noel and C. Johnson, Novel Overload Controls for SIP Networks, 21st International Teletraffic Congress, 2009, pp. 1–8.
[9] M. Ohta, Overload Protection in a SIP Signaling Network, International Conference on Internet Surveillance and Protection, 2006.
[10] J. Rosenberg, Requirements for Management of Overload in the Session Initiation Protocol, RFC 5390, December 2008.
[11] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, SIP: Session Initiation Protocol, RFC 3261, June 2002.
[12] C. Shen, H. Schulzrinne, and E. Nahum, Session Initiation Protocol (SIP) Server Overload Control: Design and Evaluation, Principles, Systems and Applications of IP Telecommunications (IPTComm), July 2008.
[13] SIPp, available at: http://sipp.sourceforge.net/.
[14] J. Sun, H. Yu, and W. Zheng, Flow Management with Service Differentiation for SIP Application Servers, The 3rd ChinaGrid Annual Conference, 2008, 272–27.
[15] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, 1995.
[16] V.N. Vapnik, Statistical Learning Theory, Springer-Verlag New York, Inc., New York, 1998.
[17] L. Wang (Ed.), Support Vector Machines: Theory and Applications, Springer, 2005.
[18] J. Yang, F. Huang, and S. Gou, An Optimized Algorithm for Overload Control of SIP Signaling Network, 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCom), 2009, pp. 1–4.