A Statistical Model for Detecting Abnormality in Static-Priority Scheduling Networks with Differentiated Services

Ming Li¹ and Wei Zhao²

¹ School of Information Science & Technology, East China Normal University, Shanghai 200062, China
[email protected], [email protected]
http://www.ee.ecnu.edu.cn/teachers/mli/js_lm(Eng).htm

² Department of Computer Science, Texas A&M University, College Station, TX 77843-1112, USA
[email protected]
http://faculty.cs.tamu.edu/zhao/

Abstract. This paper presents a new statistical model for detecting signs of abnormality in static-priority scheduling networks with differentiated services at the connection level on a class-by-class basis. Formulas for the detection probability, miss probability, classification probabilities, and detection threshold are proposed.

Keywords: Anomaly detection, real-time systems, traffic constraint, static-priority scheduling networks, differentiated services, time series.

Y. Hao et al. (Eds.): CIS 2005, Part II, LNAI 3802, pp. 267-272, 2005. © Springer-Verlag Berlin Heidelberg 2005

1 Introduction

Anomaly detection has found applications in computer communication networks, for example in network security; see e.g. [1], [2], [3], [4], [5], [6], [7]. This paper considers the identification of abnormality in arrival traffic time series (traffic for short) at the connection level, which relates to traffic models.

In traffic engineering, traffic models can be classified into two categories [8]. One is statistical modeling, see e.g. [9], [10], [11]. The other is bounded modeling, see e.g. [12], [13], [14], [15]. Although statistical modeling has made considerable progress, it is worth noting that statistical models agree well with real-life data mainly in the aggregated case. In general, they are not sufficient when traffic at the connection level has to be taken into account. In fact, traffic modeling at the connection level remains challenging in the field [16]. In computer science, a notable approach to modeling traffic at the connection level is to study traffic from the viewpoint of deterministic queuing theory, which is often called network calculus or bounded modeling. One of the contributions of this paper is to develop the traffic constraint (a kind of deterministically bounded model [13]) into a statistical bound of traffic.

Recent developments in networking show an increased interest in differentiated services (DiffServ) [13], [17]. From the viewpoint of abnormality detection, instead of detecting abnormality of all connections, we are in practice more interested in identifying abnormality of some connections. Thus, this paper studies abnormality detection in the DiffServ environment.

As far as detection is concerned, the field does not lack detection methods [18] but is short of reliable ones, as can be seen from the following statement: “The challenge is to develop a system that detects close to 100 percent of attacks. We are still far from achieving this goal [19].” From the viewpoint of statistical detection, however, instead of developing a way to detect close to 100 percent of abnormality, we study how to achieve an accurate detection for a given detection probability. By accurate detection, we mean that a detection model is able to report signs of abnormality for a predetermined detection probability.

This paper proposes an accurate detection model of abnormality in static-priority scheduling networks with DiffServ based on two points: 1) the null hypotheses and 2) averaging the traffic constraint in [13]. A key point of this contribution is to randomize the traffic constraint on an interval-by-interval basis so that time series techniques can be used to obtain a statistical traffic bound, which we shall call the average traffic constraint for simplicity. To the best of our knowledge, this paper is the first attempt to propose an average traffic constraint from the viewpoint of stochastic processes and to apply it to abnormality detection.

The rest of the paper is organized as follows. Section 2 introduces the average traffic constraint in static-priority scheduling networks with DiffServ. Section 3 discusses the detection probability and detection threshold. Section 4 concludes the paper.

2 Average Traffic Constraint

In this section, we first briefly review the conventional traffic constraint and then randomize it into a statistical constraint of traffic. The traffic constraint is given by the following definition.

Definition 1: Let $f(t)$ be the arrival traffic function. If $f(t + I) - f(t) \le F(I)$ for $t > 0$ and $I > 0$, then $F(I)$ is called the traffic constraint function of $f(t)$ [13].



Definition 1 is a general description of the traffic constraint, meaning that the increment of traffic $f(t)$ over any interval of length $I$ is upper-bounded by $F(I)$. It is in fact a bounded traffic model [13]. The practical significance of such a model is that it describes traffic at the connection level. Accordingly, we write the traffic constraint function of a group of flows as follows.

Definition 2: Let $f^{i}_{p,j,k}(t)$ be all flows of class $i$ with priority $p$ going through server $k$ from input link $j$. Let $F^{i}_{p,j,k}(I)$ be the traffic constraint function of $f^{i}_{p,j,k}(t)$. Then, $F^{i}_{p,j,k}(I)$ is given by $f^{i}_{p,j,k}(t + I) - f^{i}_{p,j,k}(t) \le F^{i}_{p,j,k}(I)$ for $t > 0$ and $I > 0$.
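As an illustration of Definitions 1 and 2, the following is a minimal Python sketch that estimates the tightest traffic constraint from a measured trace. It assumes the cumulative arrival function is sampled at unit time steps and takes the largest observed increment over a window of `window` samples as the empirical value of $F(I)$. The function name `empirical_traffic_constraint` and the sample trace are hypothetical and do not come from [13].

```python
def empirical_traffic_constraint(cumulative, window):
    """Smallest F(I) with f(t + I) - f(t) <= F(I) on the sampled trace.

    cumulative: cumulative arrivals f(t) at unit time steps (non-decreasing)
    window:     interval length I, in samples (positive integer)
    """
    if window <= 0 or window >= len(cumulative):
        raise ValueError("window must satisfy 0 < window < len(cumulative)")
    # Slide a window of length I over the trace and keep the largest increment.
    return max(cumulative[t + window] - cumulative[t]
               for t in range(len(cumulative) - window))


# Hypothetical trace: cumulative arrivals of one class/priority/link/server.
f = [0, 3, 4, 9, 9, 12, 20, 21]
print(empirical_traffic_constraint(f, 1))  # largest one-step increment
print(empirical_traffic_constraint(f, 3))  # largest increment over 3 steps
```

Any value at least as large as the returned one also satisfies the inequality; the maximum is simply the tightest bound the trace supports.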



Definition 2 provides a bounded model of traffic in static-priority scheduling networks with DiffServ at the connection level. Nevertheless, it is still a deterministic model in the bounded modeling sense. We now present a statistical model from the viewpoint of bounded modeling.

Theoretically, the interval length $I$ can be any positive real number. In practice, however, it is usually selected as a finite positive integer. Fix the value of $I$ and observe $F^{i}_{p,j,k}(I)$ in the interval $[(n - 1)I, nI]$, $n = 1, 2, \ldots, N$. For each interval, there is a traffic constraint function $F^{i}_{p,j,k}(I)$, which is therefore also a function of the index $n$. We denote this function by $F^{i}_{p,j,k}(I, n)$. Usually, $F^{i}_{p,j,k}(I, n) \ne F^{i}_{p,j,k}(I, q)$ for $n \ne q$. Therefore, $F^{i}_{p,j,k}(I, n)$ is a random variable over the index $n$.

Now, divide the interval $[(n - 1)I, nI]$ into $M$ non-overlapping segments, each of length $L$. For the $m$th segment, we compute the mean $E[F^{i}_{p,j,k}(I, n)]_m$ ($m = 1, 2, \ldots, M$), where $E$ is the mean operator. Again, $E[F^{i}_{p,j,k}(I, n)]_l \ne E[F^{i}_{p,j,k}(I, n)]_m$ for $l \ne m$. Thus, $E[F^{i}_{p,j,k}(I, n)]_m$ is a random variable too. According to statistics, if $M \ge 10$, $E[F^{i}_{p,j,k}(I, n)]_m$ quite accurately follows a Gaussian distribution [1], [20]. In this case,

$$E[F^{i}_{p,j,k}(I, n)]_m \sim \frac{1}{\sqrt{2\pi}\,\sigma_F} \exp\left[-\frac{\{E[F^{i}_{p,j,k}(I, n)]_m - F_\mu(M)\}^2}{2\sigma_F^2}\right], \tag{1}$$

where $\sigma_F^2$ is the variance of $E[F^{i}_{p,j,k}(I, n)]_m$ and $F_\mu(M)$ is its mean. We call $E[F^{i}_{p,j,k}(I, n)]_m$ the average traffic constraint of traffic flow $f^{i}_{p,j,k}(t)$.
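The construction of the average traffic constraint can be sketched in Python as follows. This is only one possible reading of the procedure, under two assumptions that the text does not state explicitly: the per-interval constraint $F(I, n)$ is taken as the traffic increment over $[(n-1)I, nI]$, and the $N$ per-interval values are grouped into $M$ non-overlapping segments of $L$ consecutive values each. All function names are hypothetical.

```python
from statistics import mean, stdev


def interval_constraints(cumulative, I):
    """F(I, n), n = 1..N, taken here as f(nI) - f((n-1)I) (an assumption)."""
    N = (len(cumulative) - 1) // I
    return [cumulative[n * I] - cumulative[(n - 1) * I] for n in range(1, N + 1)]


def segment_means(values, M):
    """Split values into M non-overlapping segments of equal length L and
    return the mean of each segment; leftover values at the end are dropped."""
    L = len(values) // M
    if L == 0:
        raise ValueError("need at least M values")
    return [mean(values[m * L:(m + 1) * L]) for m in range(M)]


def template(xi):
    """Sample estimates of F_mu(M) and sigma_F from the segment means."""
    return mean(xi), stdev(xi)


# Hypothetical cumulative trace; M = 2 is used only to keep the example
# short, whereas the Gaussian approximation in the text asks for M >= 10.
cumulative = [0, 1, 3, 6, 10, 15, 21, 28, 36]
F = interval_constraints(cumulative, I=2)
xi = segment_means(F, M=2)
f_mu, sigma_f = template(xi)
print(F, xi, f_mu, sigma_f)
```

The sample mean and sample standard deviation stand in for $F_\mu(M)$ and $\sigma_F$; with real traffic these would be estimated from a trace known to be normal.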

3 Detection Probability

In the case of $M \ge 10$, it is easily seen that

$$\mathrm{Prob}\left[ z_{1-\alpha/2} < \frac{F_\mu(M) - E[F^{i}_{p,j,k}(I, n)]_m}{\sigma_F / \sqrt{M}} \le z_{\alpha/2} \right] = 1 - \alpha, \tag{2}$$

where $(1 - \alpha)$ is called the confidence coefficient. Let $C_F(M, \alpha)$ be the confidence interval with confidence coefficient $(1 - \alpha)$. Then,

$$C_F(M, \alpha) = \left[ F_\mu(M) - \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}},\; F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right]. \tag{3}$$
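As a numerical illustration of (3), the following Python sketch evaluates the confidence interval for given values of $F_\mu(M)$, $\sigma_F$, $M$, and $\alpha$. It assumes $z_{\alpha/2}$ denotes the upper $\alpha/2$ quantile of the standard normal distribution, computed here with `statistics.NormalDist` from the Python standard library. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(f_mu, sigma_f, M, alpha):
    """Confidence interval C_F(M, alpha) of Eq. (3)."""
    # z_{alpha/2}: standard normal value exceeded with probability alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = sigma_f * z / sqrt(M)
    return f_mu - half_width, f_mu + half_width


# Hypothetical template: F_mu = 100, sigma_F = 10, M = 25, alpha = 0.05.
low, high = confidence_interval(100.0, 10.0, 25, 0.05)
print(round(low, 2), round(high, 2))
```

A smaller $\alpha$ gives a larger $z_{\alpha/2}$ and therefore a wider interval, while a larger $M$ narrows it.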

The above expression shows that $F_\mu(M)$ is a template of the average traffic constraint. Statistically, we have $100(1 - \alpha)\%$ confidence that $E[F^{i}_{p,j,k}(I, n)]_m$ takes $F_\mu(M)$ as its approximation with a variation less than or equal to $\sigma_F z_{\alpha/2} / \sqrt{M}$.

Denote $\xi \triangleq E[F^{i}_{p,j,k}(I, n)]_m$. Then,

$$\mathrm{Prob}\left( \xi > F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right) = \frac{\alpha}{2}. \tag{4}$$

On the other hand,

$$\mathrm{Prob}\left( \xi \le F_\mu(M) - \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right) = \frac{\alpha}{2}. \tag{5}$$

To facilitate the discussion, two terms are explained as follows. Correctly recognizing an abnormal sign is called detection, and failing to recognize it is called miss. We explain the detection probability and the miss probability by the following theorem.

Theorem 1 (Detection probability and detection threshold): Let

$$V = F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \tag{6}$$

be the detection threshold. Let $P_{det}$ and $P_{miss}$ be the detection probability and the miss probability, respectively. Then,

$$P_{det} = P\{V < \xi < \infty\} = (1 - \alpha/2), \tag{7}$$

$$P_{miss} = P\{-\infty < \xi < V\} = \alpha/2. \tag{8}$$

Proof: The probability of $\xi \in C_F(M, \alpha)$ is $(1 - \alpha)$. According to (2) and (5), the probability of $\xi \le V$ is $(1 - \alpha/2)$. Therefore, $\xi > V$ exhibits a sign of abnormality with probability $(1 - \alpha/2)$. Hence, $P_{det} = (1 - \alpha/2)$. Since the detection probability plus the miss probability equals 1, $P_{miss} = \alpha/2$. □

From Theorem 1, we obtain the following statistical classification criterion for a given detection probability by setting the value of $\alpha$.

Corollary 1 (Classification): Let $f^{i}_{p,j,k}(t)$ be arrival traffic of class $i$ with priority $p$ going through server $k$ from input link $j$ at a protected site. Then,

$$f^{i}_{p,j,k}(t) \in N \quad \text{if} \quad E[F^{i}_{p,j,k}(I, n)]_m \le V, \tag{9a}$$

where $N$ denotes the normal set of traffic flows, and

$$f^{i}_{p,j,k}(t) \in A \quad \text{if} \quad E[F^{i}_{p,j,k}(I, n)]_m > V, \tag{9b}$$

where $A$ denotes the abnormal set. The proof is straightforward from Theorem 1. The diagram of our detection model is shown in Fig. 1.

[Figure: block diagram with the elements: setting detection probability (1 − α/2); input f(t); feature extractor (output ξ); establishing template (F_µ(M)); detection threshold V; classifier; report.]

Fig. 1. Diagram of detection model
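The threshold of Theorem 1 and the classification rule of Corollary 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. It assumes the template is built from segment means of traffic known to be normal, that $F_\mu(M)$ and $\sigma_F$ are replaced by their sample estimates, and that $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of the standard normal distribution. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def detection_threshold(template_xi, alpha):
    """V = F_mu(M) + sigma_F * z_{alpha/2} / sqrt(M), as in Eq. (6)."""
    M = len(template_xi)
    f_mu = mean(template_xi)      # sample estimate of F_mu(M)
    sigma_f = stdev(template_xi)  # sample estimate of sigma_F
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return f_mu + sigma_f * z / sqrt(M)


def classify(xi, V):
    """Corollary 1: normal if xi <= V (9a), abnormal if xi > V (9b)."""
    return "abnormal" if xi > V else "normal"


# Hypothetical template of M = 10 segment means from normal traffic.
template_xi = [98, 101, 100, 99, 102, 100, 97, 103, 100, 100]
V = detection_threshold(template_xi, alpha=0.05)
print(round(V, 2))
print(classify(100.5, V), classify(130.0, V))
```

Choosing $\alpha$ fixes the threshold in advance, which is the sense in which the detection probability is predetermined in the model.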




4 Conclusions

In this paper, we have extended the traffic constraint in [13], which is conventionally a bound function of arrival traffic, to a time series by averaging the traffic constraints of flows on an interval-by-interval basis in the DiffServ environment. We have then derived a statistical traffic constraint to bound traffic. Based on this, we have proposed a statistical model for abnormality detection in static-priority scheduling networks with differentiated services at the connection level. With the present model, signs of abnormality can be identified on a class-by-class basis according to a predetermined detection probability. The detection probability may be very high and the miss probability very low if $\alpha$ is set to be very small. The results in the paper suggest that signs of abnormality can be detected at an early stage of their occurrence, since identification is done at the connection level.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under the project grant number 60573125, by the National Science Foundation under Contracts 0081761, 0324988, 0329181, by the Defense Advanced Research Projects Agency under Contract F30602-99-1-0531, and by Texas A&M University under its Telecommunication and Information Task Force Program. Any opinions, findings, conclusions, and/or recommendations expressed in this material, either expressed or implied, are those of the authors and do not necessarily reflect the views of the sponsors listed above.

References

1. Li, M.: An Approach to Reliably Identifying Signs of DDOS Flood Attacks Based on LRD Traffic Pattern Recognition. Computers & Security 23 (2004) 549-558
2. Bettati, R., Zhao, W., Teodor, D.: Real-Time Intrusion Detection and Suppression in ATM Networks. Proc. 1st USENIX Workshop on Intrusion Detection and Network Monitoring, April 1999, 111-118
3. Schultz, E.: Intrusion Prevention. Computers & Security 23 (2004) 265-266
4. Cho, S.-B., Park, H.-J.: Efficient Anomaly Detection by Modeling Privilege Flows Using Hidden Markov Model. Computers & Security 22 (2003) 45-55
5. Cho, S., Cha, S.: SAD: Web Session Anomaly Detection Based on Parameter Estimation. Computers & Security 23 (2004) 312-319
6. Gong, F.: Deciphering Detection Techniques: Part III Denial of Service Detection. White Paper, McAfee Network Security Technologies Group, Jan. 2003
7. Sorensen, S.: Competitive Overview of Statistical Anomaly Detection. White Paper, Juniper Networks Inc., www.juniper.net, 2004
8. Michiel, H., Laevens, K.: Teletraffic Engineering in a Broad-Band Era. Proc. IEEE 85 (1997) 2007-2033
9. Willinger, W., Paxson, V.: Where Mathematics Meets the Internet. Notices of the American Mathematical Society 45 (1998) 961-970
10. Li, M., Zhao, W., et al.: Modeling Autocorrelation Functions of Self-Similar Teletraffic in Communication Networks Based on Optimal Approximation in Hilbert Space. Applied Mathematical Modelling 27 (2003) 155-168
11. Li, M., Lim, S. C.: Modeling Network Traffic Using Cauchy Correlation Model with Long-Range Dependence. Modern Physics Letters B 19 (2005) 829-840
12. Le Boudec, J.-Y., Thiran, P.: Network Calculus: A Theory of Deterministic Queuing Systems for the Internet. Springer (2001)
13. Wang, S., Xuan, D., Bettati, R., Zhao, W.: Providing Absolute Differentiated Services for Real-Time Applications in Static-Priority Scheduling Networks. IEEE/ACM T. Networking 12 (2004) 326-339
14. Cruz, R. L.: A Calculus for Network Delay, Part I: Network Elements in Isolation; Part II: Network Analysis. IEEE T. Inform. Theory 37 (1991) 114-131, 132-141
15. Chang, C. S.: On Deterministic Traffic Regulation and Service Guarantees: A Systematic Approach by Filtering. IEEE T. Inform. Theory 44 (1998) 1097-1109
16. Estan, C., Varghese, G.: New Directions in Traffic Measurement and Accounting: Focusing on the Elephants, Ignoring the Mice. ACM T. Computer Systems 21 (2003) 270-313
17. Minei, I.: MPLS DiffServ-Aware Traffic Engineering. White Paper, Juniper Networks Inc., www.juniper.net, 2004
18. Leach, J.: TBSE - An Engineering Approach to the Design of Accurate and Reliable Security Systems. Computers & Security 23 (2004) 265-266
19. Kemmerer, R. A., Vigna, G.: Intrusion Detection: A Brief History and Overview. Supplement to Computer (IEEE Security & Privacy) 35 (2002) 27-30
20. Bendat, J. S., Piersol, A. G.: Random Data: Analysis and Measurement Procedures. 2nd Edition, John Wiley & Sons (1991)