Research Article

Traffic Anomaly Detection Algorithm for Wireless Sensor Networks Based on Improved Exploitation of the GM(1,1) Model

Qin Yu,1 Jibin Lyu,2 Lirui Jiang,1 and Longjiang Li1

1 School of Communication and Information Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 Department of Computer Science, University of Southern California (USC), Los Angeles, CA 90089, USA

Correspondence should be addressed to Qin Yu.

Received 16 October 2015; Revised 28 February 2016; Accepted 31 May 2016

Academic Editor: Paolo Bellavista

Copyright © 2016 Qin Yu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As WSNs gain popularity, traffic anomaly detection is becoming increasingly necessary for them. Because worms, attacks, intrusions, and other kinds of malicious behavior can be recognized by traffic analysis and anomaly detection, WSN traffic anomaly detection provides useful tools for timely reaction and appropriate prevention in network security. In this paper, we improve the exploitation of the GM(1,1) model to predict traffic and judge traffic anomalies in WSNs. Based on a systematic study of the characteristics of WSN traffic, the causes of abnormal WSN traffic, and the latest related research and development, we better exploit the GM(1,1) model following four guidelines: using a sliding window to determine the historical data for modeling, optimizing the initial value of the first-order grey differential equation, making traffic predictions by a short step exponential weighted average method, and judging whether the traffic at the next moment is abnormal by Euclidean distance. We then propose a traffic anomaly detection algorithm for WSNs based on the improved exploitation of the GM(1,1) model. Simulation results and comparative analyses demonstrate that the proposed algorithm reduces the undetected rate and achieves better anomaly detection accuracy than traditional traffic anomaly detection algorithms.

1. Introduction

In recent years, developments in distributed computing and microelectromechanical systems have prompted the emergence of a variety of wireless sensor network (WSN) applications, such as military applications [1], home automation [2], smart buildings [3], health and medical applications [4], vehicle and target tracking [5], and industrial domains [6, 7]. In general, a WSN is composed of a large number of densely deployed, battery-powered, low-power sensor nodes with sensing, processing, storage, and wireless communication capabilities [6]. The main purpose of sensor nodes, which consist of power, sensing, computing, and communication modules, is to monitor a certain phenomenon, such as a tracked object or environmental data [8].

As WSNs gain popularity, traffic anomaly detection is becoming increasingly necessary for them. In a WSN, traffic anomaly detection is a useful method for understanding network behavior and determining network performance and reliability, which contributes to effective and prompt troubleshooting and to resolving various issues. Over the past few years, traffic anomaly detection in WSN scenarios has become an increasingly active field of study. Furthermore, intrusions, attacks, worms, and other kinds of malicious behavior can be identified by traffic analysis and anomaly detection, so traffic anomaly detection in a WSN provides a sound basis for prevention and reaction in network security.

In wired networks, traffic anomaly detection has been widely discussed, and a variety of methods have been developed to detect abnormal traffic correctly. However, because the traffic characteristics of traditional wired networks differ greatly from those of WSNs, detection methods designed for wired networks cannot be applied directly to WSNs. An obvious characteristic of a WSN is that node energy, storage capacity, and computing power are severely limited, which makes designing a WSN traffic anomaly detection algorithm a major challenge; dealing with the application correlation (burstiness) and nonstationary characteristics of WSN traffic is another.

In this paper, we summarize the characteristics of WSN traffic and the causes of abnormal traffic in a WSN. We classify the existing traffic anomaly detection models and methods for WSNs and compare them. The GM(1,1) model is efficient and has low computational complexity, so it is well suited to real-time traffic anomaly detection in WSNs, whose nodes have limited energy and computing capability. We better exploit the GM(1,1) model following four guidelines: using a sliding window to determine the historical data for modeling, optimizing the initial value of the first-order grey differential equation, making traffic predictions by a short step exponential weighted average method, and judging whether the traffic at the next moment is abnormal by Euclidean distance. Simulation results and comparative analyses indicate that the novel algorithm based on the improved exploitation of the GM(1,1) model achieves higher detection accuracy and better real-time performance than the traditional method.

The remainder of this paper is organized as follows. In Section 2, we briefly introduce existing anomaly detection algorithms and compare them. In Section 3, we analyze in depth the characteristics of WSN traffic and the causes of abnormal WSN traffic, and we introduce the GM(1,1) model in detail. In Section 4, we design four methods to improve the exploitation of the GM(1,1) model, and in Section 5 we propose a complete traffic anomaly detection algorithm for WSNs. In Section 6, we simulate the algorithm in Matlab; the simulation results demonstrate that it reduces the undetected rate and improves detection accuracy. Section 7 concludes the paper.

2. Related Work

Research on traffic anomaly detection can be classified into three main directions: detection based on features and behavior, statistics-based detection, and intelligent detection based on machine learning and data mining. We review each direction below.

2.1. Detection Based on Feature and Behavior. This method detects abnormal traffic by searching the network's traffic data for patterns that match known anomalous traffic. It takes network traffic or data packets as input and offers real-time performance and good detection accuracy. It can not only detect network anomalies but also analyze and identify their types. However, because it requires real-time comparison between the features of abnormal traffic and the current traffic, the database of abnormal-traffic characteristics is a vital factor limiting detection accuracy. A huge feature database must be built and constantly updated, which is a great challenge for wireless sensor networks with constrained computing and storage capacity.

International Journal of Distributed Sensor Networks

In [9], Wang extracts profiles of sensor-node characteristics and network behavior from wireless sensor network packet traffic; anomalies can then be identified by monitoring the behavior of the nodes and the network.

2.2. Detection Based on Statistics. Statistics-based detection methods, mainly including CUSUM algorithms and wavelet analysis, do not require prior knowledge of the behavioral characteristics of nodes and the network. They directly calculate statistics of the input traffic data, such as the mean and variance, and then determine whether the traffic is abnormal according to the statistical deviation. In [10], a matrix-based multistatistics modified CUSUM algorithm (M-CUSUM) is proposed; by computing the ratio between the sum of the ingress and egress traffic differences and the corresponding absolute values, it can detect abnormal network flow in real time. The wavelet analysis-based real-time anomaly detection (WARAD) algorithm proposed in [11] collects the network traffic in reverse order in real time and then uses the variance of the wavelet coefficients. This method not only improves the accuracy and timeliness of anomaly detection but also reduces the computational complexity of estimating the Hurst parameter. Moreover, the variances of the wavelet coefficients at different levels determine the Hurst parameters at the different decomposition levels, so abnormalities can be determined simply by detecting marked changes in the variances of the wavelet coefficients at adjacent levels.

2.3. Intelligent Detection Based on Machine Learning and Data Mining. In this type of algorithm, anomaly detection is usually regarded as a clustering or classification problem: a machine learning model is first established, and judgments are then made in real time.
This family of methods includes many branches, such as the ARMA model, Markov models, support vector machines (SVMs), backpropagation (BP) neural networks, and immune-genetic algorithms. In [12], a series of Markov models, including tree-indexed Markov chains, are applied to characterize network behavior, and optimal decision rules and large deviations techniques are used to identify anomalies. Tian et al. present a community intrusion detection system based on SVM classification in [13]. In [14], the researchers put forward two new clustering algorithms, namely, the improved competitive learning network (ICLN) and the supervised improved competitive learning network (SICLN). In [15], an enhanced method for detecting DDoS attacks is proposed, in which the parameters of the traffic matrix are optimized by a Genetic Algorithm (GA) to maximize the detection rate. The preceding three subsections summarize the current mainstream methods for detecting traffic anomalies in WSNs, and Table 1 presents the advantages and disadvantages of their performance.


Table 1: Performance of different detection methods.

Detection method (based on)   Data needed   Complexity   Accuracy   Intelligence   Independence
Feature and behavior          RH            RL           N          RG             RG
CUSUM                         L             RL           N          N              G
Wavelet analysis              RL            N            RG         RG             RB
Markov model                  RH            N            RG         N              RG
ARMA model                    RL            N            RG         N              G
Immune-Genetic                H             H            G          G              B
Neural networks               H             RH           G          G              RB
SVM                           RH            RH           G          G              RG

Notations. G: good; B: bad; H: high; L: low; N: normal; R: relatively.

In Table 1, independence describes how well a detection method performs when it is applied alone to detect anomalies. Methods with relatively bad or bad independence usually serve as optimization or auxiliary methods [16]. The feature-and-behavior method requires building a feature database, which needs abundant data. The Markov-model method needs to obtain a Markov prediction model, which requires a large amount of data. Similarly, the last three methods also require plenty of data. Generally, complexity increases as detection accuracy improves. Our research goal is a detection method with low complexity and high accuracy.

3. Theoretical Analysis

3.1. WSN Traffic Characteristics. Overall, WSN traffic has two important properties, namely, imbalance and application correlation [16]:
(1) Imbalance is mainly reflected in the traffic of sensor nodes and convergence nodes. A large proportion of the data is transferred from sensor nodes to convergence nodes, whereas only a small proportion, namely, control messages, needs to be transferred from convergence nodes to sensor nodes. Therefore, most of the data is aggregated at the base station and the convergence nodes.
(2) Application correlation means that the network is full of unexpected (bursty) traffic. WSN traffic is tied to the application, which is time-driven and queries data periodically, so the traffic data is cyclical. When a target is tracked and its data collected, however, the traffic increases sharply because a large amount of data must be transferred in a very short period.

3.2. Causes of WSN Traffic Anomaly. WSN nodes usually communicate by radio and are deployed in open areas, which not only makes the network vulnerable to malicious damage but also brings a series of security risks, such as information disclosure. Frequent attacks, including the resource depletion attack [17], sinkhole attack, and flooding attack, cause abnormal network traffic behavior. Table 2 lists the common attack methods at the different network layers, together with the traffic anomalies they cause.

Table 2: WSN traffic anomaly causes and traffic change shapes. (Attacks are listed by layer, from the application layer through the transport, network, and link layers to the physical layer.)

Attack method              Traffic change
Malicious code             Anomaly (whole)
Ping/ICMP flood            Increase (whole)
SYN flood                  Increase (whole)
Packet forgery/playback    Indefinite
Selected forwarding        Anomaly (part)
Direction misleading       Anomaly (part)
Sinkhole                   Increase (part)
Resource depletion         High for a long time
Collision                  Concentrated (part)
Congestion                 High anomaly (part)
Physical damage            Decrease to zero

As Table 2 shows, almost all attacks cause a traffic anomaly. Monitoring network traffic therefore helps determine whether an abnormality has occurred and whether the network is under attack, which in turn supports taking appropriate defensive measures.

3.3. Definition of GM(1,1) Model. Grey systems theory, established by Julong Deng in 1982, is a methodology that focuses on problems involving small samples and poor information. It deals with uncertain systems with partially known information by generating, mining, and extracting useful information from what is available, so that a system's operational behavior and its laws of evolution can be correctly described and effectively monitored [18]. The grey model is abstracted from the grey system; GM(1,1), the simplest grey model, is a first-order differential equation in one variable. Uncertain systems with small samples and poor information are common in the natural world, which gives grey systems theory a wide range of applicability. The GM(1,1) model needs little data and little computation and forecasts accurately, so it is widely used in agriculture, forestry, water conservancy, energy, transportation, economics, and other fields. So far, however, the GM(1,1) model has not been applied to WSN traffic anomaly detection.


Denote the original data sequence by $x^{(0)} = (x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n))$, a given discrete sequence of length $n$. Its 1-AGO (first-order accumulated generating operation) sequence is defined as

$$x^{(1)} = \left(x^{(1)}(1), x^{(1)}(2), \ldots, x^{(1)}(n)\right), \tag{1}$$

where $x^{(1)}(1) = x^{(0)}(1)$ and $x^{(1)}(k) = \sum_{i=1}^{k} x^{(0)}(i)$, $k = 2, 3, \ldots, n$. According to GM(1,1), we obtain the following first-order grey differential equation:

$$\frac{dx^{(1)}}{dt} + a\,x^{(1)} = b, \tag{2}$$

where $a$ is the developing coefficient and $b$ is the grey control variable. Writing the derivative in difference form gives

$$\frac{dx^{(1)}}{dt} = x^{(1)}(k+1) - x^{(1)}(k). \tag{3}$$

Before building a GM(1,1) model, a proper value of $\alpha$ must be chosen to obtain a good background value $z^{(1)}(k)$. The sequence of background values is defined as

$$z^{(1)} = \left(z^{(1)}(2), z^{(1)}(3), \ldots, z^{(1)}(n)\right), \tag{4}$$

where $z^{(1)}(k) = \alpha\,x^{(1)}(k) + (1-\alpha)\,x^{(1)}(k-1)$, $k = 2, 3, \ldots, n$, $0 \le \alpha \le 1$. For convenience, $\alpha$ is often set to 0.5, which gives

$$z^{(1)}(k) = 0.5\,x^{(1)}(k) + 0.5\,x^{(1)}(k-1). \tag{5}$$

Since $x^{(1)}(k) - x^{(1)}(k-1) = x^{(0)}(k)$, substituting the difference form (3) and the background value $z^{(1)}(k)$ into (2) yields the grey difference equation $x^{(0)}(k) + a\,z^{(1)}(k) = b$, $k = 2, 3, \ldots, n$. Set $\mathbf{u} = [a, b]^{T}$, the coefficient vector $\mathbf{Y} = [x^{(0)}(2), x^{(0)}(3), \ldots, x^{(0)}(n)]^{T}$, and the accumulated matrix

$$\mathbf{B} = \begin{bmatrix} -z^{(1)}(2) & 1 \\ -z^{(1)}(3) & 1 \\ \vdots & \vdots \\ -z^{(1)}(n) & 1 \end{bmatrix}. \tag{6}$$

Then, set $J(\mathbf{u}) = (\mathbf{Y} - \mathbf{B}\mathbf{u})^{T}(\mathbf{Y} - \mathbf{B}\mathbf{u})$. According to the ordinary least squares (OLS) method, the estimate of $\mathbf{u}$ that minimizes $J(\mathbf{u})$ is

$$\hat{\mathbf{u}} = [\hat{a}, \hat{b}]^{T} = \left(\mathbf{B}^{T}\mathbf{B}\right)^{-1}\mathbf{B}^{T}\mathbf{Y}. \tag{7}$$

Solving the first-order grey differential equation gives

$$\hat{x}^{(1)}(k) = \left[x^{(0)}(1) - \frac{\hat{b}}{\hat{a}}\right] e^{-\hat{a}(k-1)} + \frac{\hat{b}}{\hat{a}}, \quad k = 1, 2, \ldots, n. \tag{8}$$

3.4. Prediction Steps of GM(1,1) Model

Step 1 (inspection and processing of the data sequence). To guarantee the feasibility of the model, the original data sequence $x^{(0)} = (x^{(0)}(1), x^{(0)}(2), \ldots, x^{(0)}(n))$ must first be inspected and processed. Calculate the stepwise ratio $\lambda(k)$ of the sequence, defined as

$$\lambda(k) = \frac{x^{(0)}(k-1)}{x^{(0)}(k)}, \quad k = 2, 3, \ldots, n. \tag{9}$$

If all stepwise ratios $\lambda(k)$ lie in the range $\Theta = \left(e^{-2/(n+2)}, e^{2/(n+2)}\right)$, the sequence $x^{(0)}$ can be used for forecasting with the GM(1,1) model. Otherwise, a constant $c$ is added to every element of $x^{(0)}$ so that all $\lambda(k)$ fall in $\Theta$.

Step 2 (building the GM(1,1) model). Based on the data sequence that has passed inspection, the GM(1,1) model is established according to (2).

Step 3 (model checking).
(a) Residual test: the relative residual $\varepsilon(k)$ is defined as

$$\varepsilon(k) = \frac{x^{(0)}(k) - \hat{x}^{(0)}(k)}{x^{(0)}(k)}, \quad k = 1, 2, \ldots, n, \tag{10}$$

where $\hat{x}^{(0)}(1) = x^{(0)}(1)$. If $|\varepsilon(k)| < 0.2$, the GM(1,1) model meets the general requirements; if $|\varepsilon(k)| < 0.1$, it meets the higher requirements.
(b) Stepwise ratio deviation test: from the stepwise ratio $\lambda(k)$ of the original data sequence $x^{(0)}$ and the developing coefficient $a$, the stepwise ratio deviation is calculated as

$$\rho(k) = 1 - \left(\frac{1 - 0.5a}{1 + 0.5a}\right)\lambda(k). \tag{11}$$

If $|\rho(k)| < 0.2$, the GM(1,1) model meets the general requirements; if $|\rho(k)| < 0.1$, it meets the higher requirements.

Step 4 (predicting). Based on a GM(1,1) model that has passed the tests, future values are predicted according to (8), and the predicted original values are restored by $\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1)$.

3.5. Advantages of GM(1,1) Model

Applying the GM(1,1) model to WSN traffic anomaly detection has three main advantages:

(1) Modeling with GM(1,1) does not require much data: as few as four data points are needed to establish a GM(1,1) model. The model can therefore be used when historical data is scarce and the sequence is incomplete.
(2) Building the model on a differential equation captures the essential behavior of the system and yields higher accuracy.
(3) The model is well suited to real-time traffic anomaly detection in WSNs, whose nodes have limited energy and computing capability.
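The prediction steps of Section 3.4 can be sketched in plain Python as follows. This is a minimal illustration, not the authors' Matlab implementation: it assumes positive traffic values, uses α = 0.5 by default, solves the 2 × 2 normal equations of (7) directly instead of calling a linear-algebra library, and does not implement the constant shift c of Step 1. All function names are hypothetical.

```python
import math

def ago(x0):
    """1-AGO of eq. (1): x1(k) = x0(1) + ... + x0(k)."""
    x1, total = [], 0.0
    for v in x0:
        total += v
        x1.append(total)
    return x1

def ratio_check(x0):
    """Step 1: stepwise ratios of eq. (9) and whether all lie in Theta as stated in the text."""
    n = len(x0)
    lo, hi = math.exp(-2 / (n + 2)), math.exp(2 / (n + 2))
    lams = [x0[k - 1] / x0[k] for k in range(1, n)]
    return lams, all(lo < lam < hi for lam in lams)

def fit_gm11(x0, alpha=0.5):
    """OLS estimate (a_hat, b_hat) of eqs. (4)-(7)."""
    x1 = ago(x0)
    z = [alpha * x1[k] + (1 - alpha) * x1[k - 1] for k in range(1, len(x0))]  # z1(2..n)
    y = x0[1:]                                                               # Y = x0(2..n)
    # Rows of B are [-z1(k), 1]; solve the 2x2 normal equations (B^T B) u = B^T Y.
    m, s_z, s_zz = len(z), sum(z), sum(v * v for v in z)
    s_y, s_zy = sum(y), sum(zv * yv for zv, yv in zip(z, y))
    det = m * s_zz - s_z * s_z  # zero only for degenerate windows (e.g. n < 3)
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b

def x1_hat(k, x0_1, a, b):
    """Time response (8) with the traditional initial value x0(1)."""
    if abs(a) < 1e-12:  # limit of (8) as a -> 0
        return x0_1 + b * (k - 1)
    return (x0_1 - b / a) * math.exp(-a * (k - 1)) + b / a

def fitted_and_forecast(x0, a, b, steps=1):
    """x0_hat(1..n+steps), restored from x1_hat by first-order differencing."""
    x1h = [x1_hat(k, x0[0], a, b) for k in range(1, len(x0) + steps + 1)]
    return [x1h[0]] + [x1h[k] - x1h[k - 1] for k in range(1, len(x1h))]

def model_checks(x0, x0_hat, a):
    """Step 3: max |eps(k)| from (10) and max |rho(k)| from (11)."""
    eps = max(abs((x0[k] - x0_hat[k]) / x0[k]) for k in range(len(x0)))
    lams, _ = ratio_check(x0)
    rho = max(abs(1 - (1 - 0.5 * a) / (1 + 0.5 * a) * lam) for lam in lams)
    return eps, rho

if __name__ == "__main__":
    x0 = [10.0, 11.0, 12.1, 13.31, 14.641]  # toy data growing by 10% per step
    _, ok = ratio_check(x0)
    a, b = fit_gm11(x0)
    x0_hat = fitted_and_forecast(x0, a, b, steps=1)
    eps, rho = model_checks(x0, x0_hat[: len(x0)], a)
    print(ok, round(a, 4), round(x0_hat[-1], 2), eps < 0.1, rho < 0.1)
```

With this toy sequence, the fitted developing coefficient is about −0.095, both checks of Step 3 pass comfortably, and the one-step forecast lands close to the next term of the geometric sequence (16.1051). The a → 0 branch keeps constant traffic windows, which are common in steady WSN traffic, from causing a division by zero.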


4. Improvement of Exploitation of GM(1,1) Model

4.1. Using a Sliding Window to Determine Historical Data for Modeling. The historical data used to build the GM(1,1) model and predict future data is quite short. To ensure the real-time performance and accuracy of the model, we use a fixed-size sliding window, which should be as short as possible while still giving high accuracy. Besides ensuring real-time performance, this keeps the latest historical data effective, so more accurate predictions (reasonable network traffic expectations) can be obtained.

4.2. Optimizing Initial Value of One-Order Grey Differential Equation. In the traditional GM(1,1) model, the first piece of historical data is used as the initial condition of the first-order grey differential equation. In fact, however, new information contributes more to understanding the system than old information. Therefore, to make the GM(1,1) model more accurate, we use the last piece of historical data as the initial condition. That is, taking the last accumulated value $x^{(1)}(n)$ as the initial condition gives

$$\hat{x}^{(1)}(k) = \left[x^{(1)}(n) - \frac{\hat{b}}{\hat{a}}\right] e^{-\hat{a}(k-n)} + \frac{\hat{b}}{\hat{a}}, \quad k = 1, 2, \ldots, n, \tag{12}$$

$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1), \quad k = 2, 3, \ldots, n, \qquad \hat{x}^{(0)}(1) = \hat{x}^{(1)}(1).$$

4.3. Making Traffic Prediction by Short Step Exponential Weighted Average Method. The short step exponential weighted average method consists of two parts, short step prediction and weighted averaging of the predicted traffic values, and is a vital step in perceiving WSN traffic anomalies. To a certain degree, the method lowers prediction accuracy, but it improves the capability of judging abnormal traffic. Data at different times are correlated: the shorter the interval between them, the greater their correlation, and the longer the interval, the smaller their correlation. Therefore, when several time series values are used as sample data for traffic prediction, short-step forecasts are more accurate than long-step forecasts [16]. For the GM(1,1) model, predictions with step $L \le 3$ are highly effective, and the shorter the step, the more accurate the value. By this analysis, the prediction with $L = 1$ is the most effective and accurate. However, this value is not suitable for anomaly detection: an anomaly detection algorithm sometimes needs a deliberately "inaccurate" prediction, so that when abnormal traffic arrives, the GM(1,1) model fitted to normal traffic does not change easily. A prediction value better suited to detecting abnormalities can thus be obtained. The theoretical basis is that network traffic usually stays in a certain steady state with a certain "inertia," so any sudden traffic change is caused by equipment malfunction or human-caused unnatural behavior and can be judged to be abnormal [16].

Figure 1: Exponential weighted average method. (Modeling using the data in the sliding window; making L-step predictions; making an exponential weighted average to produce a final determination value.)

To make traffic anomalies easier to detect, the short step exponential weighted average method is applied to normal traffic, as shown in Figure 1 and described as follows:
(1) Use the data in the sliding window to establish the model, predict the following $L$ steps, and save the predicted values in the corresponding positions of a timetable (the column coordinate corresponds to time).
(2) Produce a final determination value by taking an exponential weighted average of the $L$ values in the same column of the timetable.

4.4. Judging Whether the Traffic of the Next Moment Is Abnormal by Euclidean Distance. The traditional judgment method often uses the relative error, but its effect is not ideal, so we propose a Euclidean distance method. Let two data sequences of size $W$ be $a = (a_1, a_2, \ldots, a_W)$ and $b = (b_1, b_2, \ldots, b_W)$. The Euclidean distance $D$ is defined as

$$D = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \cdots + (a_W - b_W)^2}. \tag{13}$$
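Equation (13) is the ordinary Euclidean distance between two sequences of size W. A minimal Python sketch follows; the function name is hypothetical, and in Python 3.8 and later the standard library's `math.dist` computes the same quantity.

```python
import math

def euclidean_distance(a, b):
    """Eq. (13): D = sqrt((a1-b1)^2 + ... + (aW-bW)^2) for two size-W sequences."""
    if len(a) != len(b):
        raise ValueError("both sequences must have the same size W")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

if __name__ == "__main__":
    # Illustrative values only: W = 3 observed points vs. their expected values.
    print(euclidean_distance([3.0, 0.0, 1.0], [0.0, 4.0, 1.0]))  # 5.0
```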

If the final determination value sequence is $p = (p_1, p_2, \ldots, p_n)$ and we need to judge whether the $T$th value of the original data sequence is abnormal, we define $a = (x^{(0)}(T - W + 1), x^{(0)}(T - W + 2), \ldots, x^{(0)}(T))$ and $b = (p_{T-W+1}, p_{T-W+2}, \ldots, p_T)$ and then calculate the Euclidean distance $D$. A threshold must be set for each WSN; when $D$ exceeds the threshold, the traffic is considered abnormal and a warning signal is raised. It is important to select appropriate values of $W$ and the threshold. If $W$ is too large, the model responds slowly; if it is too small, the model is not accurate enough. Similarly, if the threshold is too large, the system is insensitive to abnormal traffic; if it is too small, normal traffic is easily considered abnormal. The threshold can be obtained in various ways; our method is as follows. Measure some normal traffic data and calculate the maximum $D_{\max}$ of the resulting sequence of $D$ values. Then take $C D_{\max}$ as the threshold, where, in general, $1 < C \le 3$.

Figure 2: Flow chart of the whole proposed algorithm. (Start; get the data in the sliding window; perform inspection and processing of the data sequence; build the GM(1,1) model; if the model is valid, make L-step predictions; make the traffic prediction by the short step exponential weighted average method; judge whether the traffic of the next moment is abnormal; if not all data has been traversed, move the sliding window one step forward and repeat; otherwise, end.)
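The threshold rule of Section 4.4 (threshold = C·D_max, measured on normal traffic) can be sketched as follows. This is an illustration under stated assumptions: the determination values `p` are assumed to come from the predictor, indices are 0-based, the default C = 2.0 is an arbitrary choice inside the suggested range 1 < C ≤ 3, and all names are hypothetical.

```python
import math

def window_distance(x0, p, t, w):
    """D of eq. (13) for 0-based time t: compare x0[t-w+1..t] with p[t-w+1..t]."""
    a = x0[t - w + 1 : t + 1]
    b = p[t - w + 1 : t + 1]
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def calibrate_threshold(x0_normal, p_normal, w, c=2.0):
    """Threshold = C * D_max over traffic known to be normal (1 < C <= 3)."""
    if not 1 < c <= 3:
        raise ValueError("C is expected to satisfy 1 < C <= 3")
    d_max = max(window_distance(x0_normal, p_normal, t, w)
                for t in range(w - 1, len(x0_normal)))
    return c * d_max

def flag_anomalies(x0, p, w, threshold):
    """Time indices whose window distance D exceeds the threshold."""
    return [t for t in range(w - 1, len(x0)) if window_distance(x0, p, t, w) > threshold]

if __name__ == "__main__":
    # Illustrative numbers only, not measured WSN traffic.
    th = calibrate_threshold([1.0] * 6, [1.0, 1.1, 1.0, 0.9, 1.0, 1.0], w=3)
    print(flag_anomalies([1.0, 1.0, 1.0, 1.0, 2.0, 2.0], [1.0] * 6, 3, th))  # [4, 5]
```

Because D aggregates the last W differences, a single large deviation keeps D above the threshold for up to W consecutive time points, which is the trade-off between W and responsiveness discussed above.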

5. Design and Implementation of Traffic Anomaly Detection in WSN

Based on the improved exploitation of the GM(1,1) model described in the last two sections, we design a complete anomaly detection algorithm for WSNs. We also introduce an additional traffic anomaly determination mechanism to assist detection: the first detected abnormal traffic value is taken as a reference, and if the traffic then keeps fluctuating around this reference value, within the relative error judging threshold, over the following continuous period, it is still considered abnormal and warning signals are sent. The whole improved GM(1,1)-based traffic anomaly detection algorithm for WSNs is described in Figure 2.
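The following is a compact sketch of the complete detector described in this section, written under several assumptions. It assumes positive traffic values and omits the Step 1 and Step 3 validity checks of Section 3.4. The decay factor `beta` of the exponential weighted average, the relative tolerance `rel_tol` of the reference-value mechanism, and the handling of the first window (observed data copied in place of predictions) are hypothetical choices, because the text does not specify them. The defaults window = 5, L = 3, and W = 5 follow the settings reported in Section 6.

```python
import math

def fit_gm11(x0):
    """OLS estimate of (a, b) with alpha = 0.5 (eqs. (1)-(7)); also returns x1(n)."""
    x1 = [sum(x0[: k + 1]) for k in range(len(x0))]
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x0))]
    y = x0[1:]
    m, s_z, s_zz = len(z), sum(z), sum(v * v for v in z)
    s_y, s_zy = sum(y), sum(zv * yv for zv, yv in zip(z, y))
    det = m * s_zz - s_z * s_z
    if abs(det) < 1e-12:  # degenerate window (e.g. all zeros): flat model, an assumption
        return 0.0, s_y / m, x1[-1]
    return (s_z * s_y - m * s_zy) / det, (s_zz * s_y - s_z * s_zy) / det, x1[-1]

def x1_hat(k, n, x1_n, a, b):
    """Eq. (12): time response anchored at the last accumulated value x1(n)."""
    if abs(a) < 1e-12:  # limit of (12) as a -> 0
        return x1_n + b * (k - n)
    return (x1_n - b / a) * math.exp(-a * (k - n)) + b / a

def predict_steps(window, steps):
    """Fit on one sliding window and return the 1..L-step-ahead values of x0."""
    a, b, x1_n = fit_gm11(window)
    n = len(window)
    return [x1_hat(n + j, n, x1_n, a, b) - x1_hat(n + j - 1, n, x1_n, a, b)
            for j in range(1, steps + 1)]

def determination_values(x0, window=5, steps=3, beta=0.5):
    """Secs. 4.1 and 4.3: fill the timetable with L-step predictions and
    exponentially weight each column (beta is a hypothetical decay factor)."""
    table = {}  # time index -> list of (step j, predicted value)
    for end in range(window, len(x0)):
        for j, v in enumerate(predict_steps(x0[end - window:end], steps), start=1):
            table.setdefault(end + j - 1, []).append((j, v))
    p = list(x0[:window])  # no prediction exists for the first window (assumption: copy data)
    for t in range(window, len(x0)):
        weights = [beta ** (j - 1) for j, _ in table[t]]
        p.append(sum(w * v for w, (_, v) in zip(weights, table[t])) / sum(weights))
    return p

def detect(x0, threshold, window=5, steps=3, w=5, beta=0.5, rel_tol=0.1):
    """Sec. 4.4 distance test plus the Sec. 5 reference-value mechanism."""
    p = determination_values(x0, window, steps, beta)
    alarms, ref = [], None
    for t in range(max(window, w - 1), len(x0)):
        d = math.sqrt(sum((x0[i] - p[i]) ** 2 for i in range(t - w + 1, t + 1)))
        if d > threshold:
            alarms.append(t)
            if ref is None:
                ref = x0[t]  # first detected abnormal value becomes the reference
        elif ref is not None and abs(x0[t] - ref) <= rel_tol * abs(ref):
            alarms.append(t)  # still fluctuating around the reference: keep warning
        else:
            ref = None  # traffic has left the reference band
    return alarms

if __name__ == "__main__":
    traffic = [1.0] * 15 + [3.0] * 5  # toy trace with a level shift at index 15
    print(detect(traffic, threshold=0.5))
```

In the toy trace, every window before the shift is constant, so the determination values equal the traffic and D stays at zero; the level shift then raises alarms from index 15 onward. Weighting shorter-step predictions more heavily (beta < 1) is only one reading of "exponential weighted average"; other weightings fit the description equally well.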

6. Simulations and Results Analysis

In this section, simulations are carried out on simulated WSN traffic and on part of a real WSN traffic trace consisting of humidity measurements collected at 5-second intervals over a 6-hour period in 2010 at the University of North Carolina. In all simulations, the sliding window is set to 5 steps and the prediction length to 3 steps. For the Euclidean distance $D$, whose appropriate threshold depends on the properties of the WSN traffic, we set $W = 5$ and choose a threshold of 0.05 for the simulated WSN traffic and 2.5 for the real WSN traffic. The simulation results are shown in Figures 3(c) and 4(c); for comparison, the results of the traditional GM(1,1)-based algorithm are shown in Figures 3(b) and 4(b).

Table 3: Definition of TP, FP, TN, and FN.

                     Predicted abnormal     Predicted normal
Actual abnormal      True positive (TP)     False negative (FN)
Actual normal        False positive (FP)    True negative (TN)

The simulation results show that a smoother predictive curve is obtained, which reflects the "inertia" (stability) of normal traffic. Consequently, when an exception occurs, the model does not quickly adapt to the abnormality, which makes abnormal traffic easier to detect, and the delay mechanism further helps to detect the anomaly and send an alert. As shown in Figures 3 and 4, compared with the traditional method, the improved algorithm raises the correct detection rate considerably while keeping the incorrect detection rate at a very low level. Therefore, the improved GM(1,1)-based algorithm outperforms the traditional GM(1,1)-based algorithm. To support this conclusion quantitatively, true positive (TP), false positive (FP), true negative (TN), and false negative (FN) are defined in Table 3: positive/negative means that the model judges the data to be abnormal/normal, and true/false means that the judgment is right/wrong. We use the false positive rate (FPR) and the false negative rate (FNR) to evaluate the traditional and improved GM(1,1)-based algorithms:

$$\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad \mathrm{FNR} = \frac{FN}{FN + TP}. \tag{14}$$
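Equation (14) and the counts of Table 3 can be computed from a list of alarm indices and a list of known anomaly indices. A minimal sketch, assuming anomalies are labeled per time point and each time point counts as one sample; the function names are hypothetical, and the example numbers are illustrative rather than the paper's data.

```python
def confusion_counts(predicted, actual, n):
    """Table 3 counts over n time points, given alarm indices and true anomaly indices."""
    pred, act = set(predicted), set(actual)
    tp = len(pred & act)   # abnormal and flagged
    fp = len(pred - act)   # normal but flagged
    fn = len(act - pred)   # abnormal but missed
    tn = n - tp - fp - fn  # normal and not flagged
    return tp, fp, tn, fn

def fpr_fnr(tp, fp, tn, fn):
    """Eq. (14); a rate is reported as 0.0 when its denominator is zero (assumption)."""
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr

if __name__ == "__main__":
    # Illustrative only: 10 time points, 4 true anomalies, 3 of them flagged.
    counts = confusion_counts(predicted=[4, 5, 6], actual=[3, 4, 5, 6], n=10)
    print(counts, fpr_fnr(*counts))  # (3, 0, 6, 1) (0.0, 0.25)
```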


Figure 3: Simulation results on simulated WSN traffic. (a) Simulated WSN traffic (real traffic, known alarm); (b) detection by traditional GM(1,1)-based algorithm (real traffic, predictive traffic, alarm); (c) detection by improved GM(1,1)-based algorithm (real traffic, predictive traffic, alarm). Axes: time (minutes), 0 to 90; traffic (Mbps), 0 to 0.7.

Table 4: Detection capabilities of different algorithms.

Anomaly detection algorithm               FPR    FNR
Traditional GM(1,1)-based algorithm       0      87.18%
Improved GM(1,1)-based algorithm          0      20.51%

In this paper, we take only the simulation results on real WSN traffic as an example; the results are shown in Table 4. While the FPR remains 0, the improved algorithm sharply lowers the FNR, that is, it reduces the undetected rate and thus improves detection accuracy. Note that the results in Table 4 were obtained from our own simulations; other implementations may obtain slightly different values, but they all show the same trend.

7. Conclusions

In this paper, we introduce traffic anomaly detection techniques for WSNs and the GM(1,1) model in detail. Then, through model improvement analysis and algorithm design, we propose an improved GM(1,1)-based traffic anomaly detection algorithm for WSNs. Finally, we simulate the algorithm in Matlab; the simulation results demonstrate that it reduces the undetected rate and improves detection accuracy. In addition, the algorithm requires little computation and is efficient, so it is well suited to real-time traffic anomaly detection in WSNs, whose nodes have limited energy and computing capability.


Figure 4: Simulation results on real WSN traffic. (a) Real WSN traffic (real traffic, known alarm); (b) detection by traditional GM(1,1)-based algorithm (real traffic, predictive traffic, alarm); (c) detection by improved GM(1,1)-based algorithm (real traffic, predictive traffic, alarm). Axes: time (minutes), 185 to 215; traffic (%RH).

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

This work is partly supported by the Chengdu Science and Technology Project (2014-HM01-00310-SF), the Information Technology Research Projects of Ministry of Transport of China (2014 364X14 040), and the National Natural Science Foundation of China (61104042 and 61273235).

References

[1] G. Simon, M. Maroti, A. Lédeczi et al., “Sensor network-based countersniper system,” in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems (SenSys ’04), pp. 1–12, Baltimore, Md, USA, November 2004.
[2] M.-T. Vo, V.-S. Tran, T.-D. Nguyen, and H.-T. Huynh, “Wireless sensor network for multi-storey building: design and implementation,” in Proceedings of the International Conference on Computing, Management and Telecommunications (ComManTel ’13), pp. 175–180, Ho Chi Minh City, Vietnam, January 2013.
[3] M. S. Familiar, J. F. Martinez, and L. Lopez, “Pervasive smart spaces and environments: a service-oriented middleware architecture for wireless Ad Hoc and sensor networks,” International Journal of Distributed Sensor Networks, vol. 2012, Article ID 725190, 11 pages, 2012.
[4] M. Al Ameen, J. Liu, and K. Kwak, “Security and privacy issues in wireless sensor networks for healthcare applications,” Journal of Medical Systems, vol. 36, no. 1, pp. 93–101, 2012.
[5] A. Arora, P. Dutta, S. Bapat et al., “A line in the sand: a wireless sensor network for target detection, classification, and tracking,” Computer Networks, vol. 46, no. 5, pp. 605–634, 2004.
[6] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, “Wireless sensor networks: a survey,” Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[7] A. Flammini, P. Ferrari, D. Marioli, E. Sisinni, and A. Taroni, “Wired and wireless sensor networks for industrial applications,” Microelectronics Journal, vol. 40, no. 9, pp. 1322–1336, 2009.
[8] Z. M. Saric, D. D. Kukolj, and N. D. Teslic, “Acoustic source localization in wireless sensor network,” Circuits, Systems, and Signal Processing, vol. 29, no. 5, pp. 837–856, 2010.
[9] Q. Wang, “Packet traffic: a good data source for wireless sensor network modeling and anomaly detection,” IEEE Journals & Magazines, vol. 25, no. 3, pp. 15–21, 2011.
[10] Z.-X. Sun, Y.-W. Tang, and Y. Cheng, “Router anomaly traffic detection based on modified-CUSUM algorithms,” Journal of Software, vol. 16, no. 12, pp. 2117–2123, 2005.
[11] L. Zhiyuan, Z. Qiuzhi, W. Yongkun, T. Zhenyu, and H. Huaming, “Wavelet analysis-based real-time anomaly detection algorithm for wireless sensor network,” Journal of Nanjing Normal University (Natural Science Edition), vol. 1, pp. 87–92, 2014 (Chinese).
[12] I. C. Paschalidis and Y. Chen, “Anomaly detection in sensor networks based on large deviations of Markov chain models,” in Proceedings of the 47th IEEE Conference on Decision and Control (CDC ’08), pp. 2338–2343, IEEE, Cancun, Mexico, December 2008.
[13] J. Tian, M. Gao, and S. Zhou, “Wireless sensor network for community intrusion detection system based on classify support vector machine,” in Proceedings of the IEEE International Conference on Information and Automation (ICIA ’09), pp. 1217–1221, Zhuhai, China, June 2009.
[14] J. Z. Lei and A. A. Ghorbani, “Improved competitive learning neural networks for network intrusion and fraud detection,” Neurocomputing, vol. 75, pp. 135–145, 2012.
[15] S. M. Lee, D. S. Kim, J. H. Lee, and J. S. Park, “Detection of DDoS attacks using optimized traffic matrix,” Computers and Mathematics with Applications, vol. 63, no. 2, pp. 501–510, 2012.
[16] Q. Yu, L. Jibin, and L. Jiang, “An improved ARIMA-based traffic anomaly detection algorithm for wireless sensor networks,” International Journal of Distributed Sensor Networks, vol. 2016, Article ID 9653230, 9 pages, 2016.
[17] F. Rongrong, Research on Key Technologies of Intrusion Detection for Wireless Sensor Network, Beijing Jiaotong University, Beijing, China, 2013.
[18] S. Liu, J. Forrest, and Y. Yang, “A brief introduction to grey systems theory,” in Proceedings of the IEEE International Conference on Grey Systems and Intelligent Services (GSIS ’11), pp. 1–9, IEEE, Nanjing, China, September 2011.