Incremental Anomaly-based Intrusion Detection

3rd International Conference on Web Research

Incremental Anomaly-based Intrusion Detection System Using Limited Labeled Data

Parisa Alaei
Faculty of Graduate Studies, Safahan Institute of Higher Education, Isfahan, Iran
[email protected]

Fakhroddin Noorbehbahani
Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
[email protected]

Abstract—With the proliferation of the internet and increased global access to online media, cybercrime is also occurring at an increasing rate. Currently, both personal users and companies are vulnerable to cybercrime. A number of tools, including firewalls and Intrusion Detection Systems (IDS), can be used as defense mechanisms. A firewall acts as a checkpoint which allows packets to pass through according to predetermined conditions; in extreme cases, it may even disconnect all network traffic. An IDS, on the other hand, automates the monitoring process in computer networks. The streaming nature of data in computer networks poses a significant challenge in building IDS. In this paper, a method is proposed to overcome this problem by performing online classification on datasets. In doing so, an incremental naive Bayesian classifier is employed. Furthermore, active learning enables solving the problem using a small set of labeled data points, which are often very expensive to acquire. The proposed method includes two groups of actions, i.e., offline and online. The former involves data preprocessing while the latter introduces the NADAL online method. The proposed method is compared to the incremental naive Bayesian classifier using the standard NSL-KDD dataset. The proposed method has three advantages: (1) overcoming the streaming data challenge; (2) reducing the high cost associated with instance labeling; and (3) improved accuracy and Kappa compared to the incremental naive Bayesian approach. Thus, the method is well-suited to IDS applications.

Keywords—Intrusion Detection; Anomaly Detection; Incremental Classification; Data Stream; Active Learning

I. INTRODUCTION

In recent years, the rapid growth of network-based services and technologies has resulted in a surge in the number of network-based computer attacks. An attack refers to a set of actions that compromise the confidentiality, integrity, and accessibility of resources. A system is known to be secure if it can guarantee these three criteria. Attacks must be identified before doing any harm to the organization. Even Local Area Networks (LAN) need to be able to withstand such attacks since network performance is important in terms of bandwidth and other resources. The most common means of defense against potential attacks involves a two-layered system. The first layer comprises a firewall which controls access to the network, while the second layer is configured to detect threats that somehow manage to pass through the firewall and take appropriate action to defend the network. This second layer is known as an Intrusion Detection System (IDS), which is able to identify intrusion attempts by monitoring and analyzing network packets and logs. In case an intrusion is detected, the system alerts the network administrator [1-3].

With respect to information source, IDS are divided into two categories: host-based and network-based. Host-based methods tend to monitor and analyze internal computer operations, for instance by determining the resources that are allowed for each host as well as illegal access attempts. Network-based systems, in contrast, deal with intrusion at the network level. Anomalies at this level are often caused by external attackers whose aim is to gain unauthorized network access, steal information, and disrupt the network [4,5]. In terms of method, IDS detect either misuse or anomalies. The former method, also known as the signature-based method, uses known attacks or vulnerable points in the system to identify attacks; however, unknown attacks without matching patterns cannot be detected. In the latter method, the behaviors of normal users are profiled and deviations from the normal profile are flagged as intrusions [2]. There are certain challenges for anomaly detection systems. Unlike traditional data packets, which are inherently static, data streams are continuous flows of data which cannot be stored; they must be analyzed as one unit. Assuming anomaly detection is a classification problem, this paper aims to present a novel method for incrementally classifying data streams. In doing so, a new framework is proposed to improve anomaly detection performance by classifying data streams in an online manner. Moreover, active learning is employed to reduce the costs associated with data labeling, while the performance of the proposed system is analyzed using the standard NSL-KDD dataset.

The remainder of this paper is organized as follows. Section 2 reviews previous works on the topic. Naive Bayesian classification and active learning are discussed in Sections 3 and 4, respectively. The proposed method is detailed in Section 5 and evaluated in Section 6. Finally, concluding remarks are given in Section 7.

1 978-1-5386-0420-5/17/$31.00 ©2017 IEEE

II. RELATED WORK

Anomaly-based IDS have been extensively studied; however, few studies present an incremental approach. Incremental methods may be supervised, semi-supervised, or unsupervised [6]. In this paper, supervised methods are considered, which model the normality of the data. Here, the problem of anomaly detection is converted into one of classification [7]. In [8], the authors propose an incremental learning method by cascading a Service Classifier (SC) with Incremental Tree Inductive (ITI) learning. The cascading approach includes three steps: (1) training; (2) test; and (3) incremental learning. Firstly, an SC partitions the training data into clusters and the ITI is trained on each cluster. Secondly, test instances are assigned to clusters and the corresponding ITI is evaluated on them. Finally, SC and ITI are updated, as opposed to retrained, using the authors' proposed approach. The authors use incremental learning to train the binary ITI. According to the results, ITI is superior to K-Means, SOM, K-Means+ITI, and SOM+ITI in terms of detection rate. However, with respect to false positives, the cascading SC+ITI approach outperforms ITI while achieving a better detection rate. Yet, it remains to be tested in online conditions.

In another study, a novel anomaly detection system is proposed by Ren et al. [9] which dynamically updates normal usage profiles. Upon encountering new behavior, density-based incremental clustering is used to insert the new behavior into old profiles. The authors report less sensitivity to data disruptions compared to Anomaly Detection With fast Incremental Clustering (ADWICE) profiles. The approach also improves cluster quality and reduces false alarms; nevertheless, the method displays poor performance when working with large datasets.

Other authors [10] propose the Reserved Set-Incremental Support Vector Machine (RS-ISVM), which is an improved incremental SVM for intrusion detection. In order to reduce the noise caused by large differences between feature values, the authors propose a modified kernel function known as U-RBF, which embeds feature means and root square mean differences in the RBF kernel. The authors claim that RS-ISVM alleviates the fluctuation phenomenon in the learning process while providing better and more reliable performance. However, it suffers from low U2R and R2L detection rates and requires a large number of parameters.

In [1], an online Bayesian classifier is constructed which distinguishes between normal and intrusive links in the KDD99 dataset. The classifier starts with a small number of training instances of both normal and intrusive classes. The remaining instances are then classified while the mean and standard deviation of the features are continuously updated. A key action in this online naive Bayesian classifier is to update the µ and σ values following each instance test. The method carries out naive incremental updating.

Many modern intrusion detection methods focus on feature selection or reduction. This is because many features may be irrelevant or redundant and may inhibit system performance. In [11], important features are identified through reduced input, and efficient naive Bayesian classifiers are applied to the reduced dataset to detect possible intrusions. Experimental results show that the selected features are more appropriate for designing IDS and result in more effective intrusion detection. In that paper, the naive Bayesian algorithm is evaluated using the NSL-KDD dataset to detect four types of attacks: Probe, DoS, U2R, and R2L. Feature reduction may use three standard feature selection methods: correlation, information gain, or gain ratio; the method in [11] employs feature vitality based reduction. The results indicate that the model provides better performance.

III. NAIVE BAYESIAN CLASSIFICATION

Naive Bayesian classification is a popular method for stream mining. The popularity of the method is due to the fact that the model can be updated with new data streams very easily. The method is inherently incremental since new data points are incorporated as they arrive. Given this incremental nature, the algorithm is very suitable for stream mining [12]. Assuming m classes, namely C1, C2, …, Cm, for tuple X, the classifier seeks to find the class with the highest posterior probability conditioned on X. In fact, the classifier predicts that tuple X belongs to class Ci if and only if:

P(Ci|X) > P(Cj|X)  for 1 ≤ j ≤ m, j ≠ i

By Bayes' theorem,

P(Ci|X) = P(X|Ci) P(Ci) / P(X)    (1)

Since P(X) remains constant for all classes, one must determine the class that maximizes P(X|Ci)P(Ci). If prior probabilities are unknown, they are commonly regarded as being equal, i.e., P(C1) = P(C2) = ⋯ = P(Cm); hence, only P(X|Ci) must be maximized. Moreover, the priors may be estimated using P(Ci) = |Ci,D| / |D|, where |Ci,D| is the number of training tuples with the label Ci and |D| is the total number of training tuples.

Datasets with large numbers of features impose a high computational cost for P(X|Ci). To reduce the computation, the attributes are assumed to be conditionally independent given the class. Thus, the following holds for X = (x1, x2, …, xn):

P(X|Ci) = ∏(k=1..n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)    (2)

Using the training tuples, the individual probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) may be estimated [7].

IV. ACTIVE LEARNING

Instead of inquiring about the correct labels for all instances, active learning determines how input instances are selectively labeled. Quite often, this approach requires considerably fewer instances to learn a concept, compared to typical supervised methods. The majority of research on the topic is focused on tuple selection for labeling [7].
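To make the incremental nature of Section III concrete, a count-based naive Bayesian classifier can be sketched as below. This is a minimal illustration for categorical features with Laplace smoothing, not the authors' implementation; the class and method names are hypothetical.

```python
import math
from collections import defaultdict

class IncrementalNaiveBayes:
    """Count-based naive Bayes for categorical features. Each new
    labeled instance only increments counts, so the model updates in
    O(n_features) time per instance -- the property that makes the
    classifier suitable for stream mining."""

    def __init__(self, n_features):
        self.n_features = n_features
        self.class_counts = defaultdict(int)    # |C_i,D| per class
        self.feature_counts = defaultdict(int)  # counts keyed by (class, k, value)
        self.feature_values = [set() for _ in range(n_features)]
        self.total = 0                          # |D|

    def update(self, x, y):
        """Incorporate one labeled instance (the incremental update)."""
        self.class_counts[y] += 1
        self.total += 1
        for k, v in enumerate(x):
            self.feature_counts[(y, k, v)] += 1
            self.feature_values[k].add(v)

    def log_posterior(self, x, c):
        """log P(c) + sum_k log P(x_k | c), with Laplace smoothing."""
        logp = math.log(self.class_counts[c] / self.total)  # P(C_i) = |C_i,D|/|D|
        for k, v in enumerate(x):
            num = self.feature_counts[(c, k, v)] + 1
            den = self.class_counts[c] + len(self.feature_values[k])
            logp += math.log(num / den)
        return logp

    def predict(self, x):
        """Class with maximum posterior; P(X) is constant, so it is omitted."""
        return max(self.class_counts, key=lambda c: self.log_posterior(x, c))
```

For numerical features, the same structure applies with running means and variances per class, as in the online classifier of [1].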

In active learning, once an instance is scanned, depending on the selected strategy, the algorithm queries the correct label and the predictive model is trained with the new instance. In the following, we briefly explain four active learning strategies:

• Random Strategy: labels are requested for randomly selected input instances.

• Fixed Uncertainty Strategy: the instances for which the current classifier has minimum confidence are labeled. A constant threshold is considered; only those instances for which the maximum posterior probability, as estimated by the classifier, does not exceed the threshold are labeled.

• Variable Uncertainty Strategy: instances below the threshold are labeled, but the threshold varies with time so that the labeling budget is spent in a uniform fashion over time.

• Uncertainty Strategy With Randomization: the threshold is randomized, and the labels for instances near the threshold are inquired.

V. PROPOSED METHOD

The proposed model, called Network Anomaly Detection using Active Learning (NADAL), involves an offline and an online step. The selected dataset is preprocessed in an offline fashion. The NSL-KDD dataset contains instances labeled with the attack type. During the preprocessing step, the attacks are divided into four categories: DoS, Probe, R2L, and U2R. Furthermore, there are four classifiers, one at each attack layer. Thus, the preprocessing carried out using Weka selects the appropriate features for each classifier. The selected features are then given to the feature filtering module in NADAL.

Figure 1 illustrates the NADAL framework. In the proposed online method, each instance is processed at most once to improve the model and is then discarded. Initially, instance x_t, having label y_t, passes through the feature filtering module and the appropriate features for each classifier are selected. At each layer, the naive Bayesian module incrementally predicts the probability that the instance belongs to the class. Thereafter, the selected active learning strategy (i.e., uncertainty with randomization) is invoked. The output of the strategy determines whether the label for the instance must be inquired. A logical OR gate is used to aggregate the results from the four active learning modules. The classifiers are updated using the instance if the gate outputs 1. Otherwise, the aggregate output module predicts the label according to the maximum certainty calculated by the classifiers. Here, y_t represents the actual label for instance x_t.

Fig. 1. The proposed model called NADAL (flowchart: instance x_t passes through per-attack feature selection for DoS, Probe, R2L, and U2R; the four naive Bayesian classifiers produce CDOS(x_t,DOS), CProbe(x_t,Probe), CR2L(x_t,R2L), and CU2R(x_t,U2R); the 0/1 outputs of the four active learning modules feed an OR gate that triggers label acquisition (Get y_t) and classifier updates, otherwise the classifier outputs are aggregated)
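The online step described above can be sketched as follows. This is a schematic of one NADAL iteration, not the authors' code: the per-layer classifier interface (select_features, max_posterior, predict, update) and the threshold parameters are assumptions made for illustration only.

```python
import random

def uncertainty_with_randomization(max_posterior, threshold=0.9,
                                   rand_scale=1.0, rng=random):
    """Section IV, fourth strategy: query the label when the classifier's
    maximum posterior falls below a randomized threshold. Randomizing the
    threshold means instances near it are labeled only some of the time,
    spreading queries around the decision boundary."""
    randomized = threshold * (1 + rand_scale * (rng.random() - 0.5))
    return max_posterior < randomized  # True (1) -> ask for the true label

def nadal_step(x, classifiers, get_true_label,
               query_strategy=uncertainty_with_randomization):
    """One online NADAL step for instance x. `classifiers` maps each attack
    layer (DoS, Probe, R2L, U2R) to a per-layer classifier object."""
    # Each layer filters its own features and votes on whether to query.
    queries = [query_strategy(clf.max_posterior(clf.select_features(x)))
               for clf in classifiers.values()]
    if any(queries):                         # logical OR gate outputs 1
        y = get_true_label(x)                # the label is inquired
        for clf in classifiers.values():
            clf.update(clf.select_features(x), y)   # update all classifiers
        return y
    # Otherwise, predict with the classifier reporting maximum certainty.
    best = max(classifiers.values(),
               key=lambda c: c.max_posterior(c.select_features(x)))
    return best.predict(best.select_features(x))
```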

VI. EVALUATION

The proposed framework was implemented using Java in NetBeans 8.0.2. Feature selection was performed using Weka and the Wrapper method. The active learning modules as well as the incremental naive Bayesian module were implemented by modifying the code from Massive Online Analysis (MOA)1 2016.04, written in Java. The standard NSL-KDD2 dataset is used for evaluation purposes. The dataset was randomized via the Randomize functionality in Weka. The accuracy and Kappa values were then calculated for the framework at four layers: DoS, Probe, U2R, and R2L. The results were compared to those of the incremental naive Bayesian approach in MOA. In this section, we briefly explain the evaluation criteria. Also, different diagrams resulting from the implementation are depicted.

1 Available on: http://moa.cms.waikato.ac.nz/downloads
2 Available on: http://www.unb.ca/research/iscx/dataset/iscx-NSL-KDDdataset.html

A. Dataset
As mentioned earlier, in this paper, the standard NSL-KDD dataset is used for evaluation purposes. The dataset is a revision of KDD-99 without repetitive and redundant instances. Each record includes 42 features. The KDDTrain+.txt file was used, wherein the 42nd feature identifies a normal vs. attack label. There are four types of attacks: DoS, Probe, R2L, and U2R [14, 15].

B. Evaluation Criteria
The results are evaluated according to accuracy and Kappa. Accuracy represents the percentage of tuples in the dataset that are correctly labeled. The measure is calculated as below:

Accuracy = (TP + TN) / (P + N)    (3)

where TP and TN denote true positives and true negatives, and P and N denote the numbers of positive and negative tuples. The Kappa coefficient measures the agreement among raters who classify or measure items. The value is obtained as follows:

κ = (p_o − p_e) / (1 − p_e)    (4)

where p_o and p_e denote observed and chance agreement, respectively.

C. Implementation Results
Implementation results can be seen in Table I. The results exhibit a clear improvement in both accuracy and Kappa compared to the incremental naive Bayesian approach. The results are shown for the NSL-KDD dataset with 10 randomizations.
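Equations (3) and (4) can be computed directly from a confusion matrix, as in the following sketch (the multi-class form of Kappa from observed vs. chance agreement; the function names are ours, not part of MOA):

```python
def accuracy(tp, tn, p, n):
    """Eq. (3): fraction of correctly labeled tuples."""
    return (tp + tn) / (p + n)

def kappa(confusion):
    """Eq. (4): kappa = (p_o - p_e) / (1 - p_e) for a square
    confusion matrix indexed as confusion[actual][predicted]."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction on the diagonal.
    p_o = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of the per-class marginals.
    p_e = sum(
        (sum(confusion[i]) / total) *                  # actual-class marginal
        (sum(row[i] for row in confusion) / total)     # predicted-class marginal
        for i in range(len(confusion))
    )
    return (p_o - p_e) / (1 - p_e)
```

For example, the binary confusion matrix [[40, 10], [5, 45]] gives an accuracy of 0.85 and a Kappa of 0.70, illustrating how Kappa discounts the agreement a chance classifier would achieve.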

TABLE I. ACCURACY AND KAPPA FOR TEN RANDOMIZATIONS: NADAL VS. INCREMENTAL NAIVE BAYESIAN CLASSIFIER
(columns 1–10 denote the ten randomized NSL-KDD datasets)

Metric           Method                     1      2      3      4      5      6      7      8      9      10
DoS Accuracy     NADAL                    93.32  93.05  92.89  93.36  93.61  93.47  93.30  93.81  93.19  93.38
DoS Accuracy     Incremental Naive Bayes  89.96  89.10  88.87  88.46  89.76  88.99  89.13  89.02  89.25  89.45
DoS Kappa        NADAL                    88.20  87.95  87.54  88.41  88.79  88.54  88.36  89.17  88.06  88.42
DoS Kappa        Incremental Naive Bayes  82.81  80.96  81.30  80.60  82.47  81.13  81.42  81.47  81.58  81.89
Probe Accuracy   NADAL                    93.44  93.07  92.97  93.46  93.62  93.40  93.36  93.86  93.43  93.32
Probe Accuracy   Incremental Naive Bayes  90.64  90.17  88.38  83.00  89.60  90.96  89.89  87.72  88.33  90.01
Probe Kappa      NADAL                    88.36  87.94  87.61  88.55  88.76  88.37  88.42  89.21  88.44  88.27
Probe Kappa      Incremental Naive Bayes  83.68  82.96  80.33  72.27  81.57  84.32  82.52  78.84  80.04  82.70
R2L Accuracy     NADAL                    89.06  86.09  89.03  94.51  90.78  89.40  94.50  86.44  90.63  92.53
R2L Accuracy     Incremental Naive Bayes  89.11  89.35  87.80  87.65  88.63  88.81  89.09  88.41  89.50  88.86
R2L Kappa        NADAL                    78.52  72.70  78.46  89.07  81.85  79.12  89.07  73.40  81.52  85.22
R2L Kappa        Incremental Naive Bayes  78.94  79.28  76.70  76.20  77.97  78.24  78.79  77.77  77.77  78.15
U2R Accuracy     NADAL                    95.61  94.88  92.78  95.22  87.59  94.66  87.12  87.09  91.60  87.05
U2R Accuracy     Incremental Naive Bayes  90.41  89.70  89.57  89.28  89.58  89.11  89.29  89.47  89.81  89.57
U2R Kappa        NADAL                    91.15  89.71  85.44  90.41  74.65  89.20  73.90  73.59  82.86  73.55
U2R Kappa        Incremental Naive Bayes  80.78  79.41  79.17  78.72  79.08  78.17  78.57  79.04  79.54  79.02
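As a sanity check, the DoS rows of Table I reproduce the averaged improvements reported in the comparison of classifier averages (4.14 accuracy units and 6.78 Kappa units). A short standalone script, with the values transcribed from the table:

```python
# DoS rows of Table I (accuracy and Kappa over the ten randomized sets)
nadal_acc = [93.32, 93.05, 92.89, 93.36, 93.61, 93.47, 93.30, 93.81, 93.19, 93.38]
inb_acc   = [89.96, 89.10, 88.87, 88.46, 89.76, 88.99, 89.13, 89.02, 89.25, 89.45]
nadal_kap = [88.20, 87.95, 87.54, 88.41, 88.79, 88.54, 88.36, 89.17, 88.06, 88.42]
inb_kap   = [82.81, 80.96, 81.30, 80.60, 82.47, 81.13, 81.42, 81.47, 81.58, 81.89]

def mean(xs):
    return sum(xs) / len(xs)

acc_gain = mean(nadal_acc) - mean(inb_acc)  # average accuracy improvement
kap_gain = mean(nadal_kap) - mean(inb_kap)  # average Kappa improvement
print(round(acc_gain, 2), round(kap_gain, 2))  # prints: 4.14 6.78
```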

D. Diagrams
In this section, several diagrams are presented to compare NADAL and the incremental naive Bayesian classifier in terms of accuracy and Kappa.

1) Accuracy
As seen in Figures 2 and 3, the proposed NADAL method enjoys higher accuracy for the DoS and Probe classifiers. Furthermore, with respect to R2L and U2R, the proposed method has higher accuracy in most cases (Figures 4 and 5). Figure 2 compares the DoS classifier in terms of accuracy for the incremental naive Bayesian approach and NADAL. Similar comparisons are depicted in Figures 3 through 5 for the Probe, R2L, and U2R classifiers, respectively.

Fig. 2. DoS accuracy: NADAL vs. incremental naive Bayesian (x-axis: randomized NSL-KDD dataset 1–10; y-axis: DoS classifier accuracy)

Fig. 3. Probe accuracy: NADAL vs. incremental naive Bayesian

Fig. 4. R2L accuracy: NADAL vs. incremental naive Bayesian

Fig. 5. U2R accuracy: NADAL vs. incremental naive Bayesian

2) Kappa
As seen in Figures 6 and 7, the proposed NADAL method enjoys higher Kappa for the DoS and Probe classifiers. Furthermore, with respect to R2L and U2R, the proposed method has higher Kappa in most cases (Figures 8 and 9). Figure 6 compares the DoS classifier in terms of Kappa for the incremental naive Bayesian approach and NADAL. Similar comparisons are depicted in Figures 7 through 9 for the Probe, R2L, and U2R classifiers, respectively.

Fig. 6. DoS Kappa: NADAL vs. incremental naive Bayesian

Fig. 7. Probe Kappa: NADAL vs. incremental naive Bayesian

Fig. 8. R2L Kappa: NADAL vs. incremental naive Bayesian

Fig. 9. U2R Kappa: NADAL vs. incremental naive Bayesian

3) Comparing classifier average accuracies and Kappa
On average, NADAL enhances classification accuracy in DoS, Probe, R2L, and U2R. Compared to the incremental naive Bayesian approach, the enhancements are 4.14, 4.62, 1.79, and 1.57 units, respectively. Furthermore, the improvements in Kappa are 6.78, 7.47, 2.74, and 3.29 units, respectively. Figure 10 compares the average accuracy of NADAL with that of the naive Bayesian approach over the ten randomized sets. As seen, NADAL provides higher accuracy for all four classifiers. Average Kappa values are illustrated in Figure 11. Again, NADAL has superior performance.

Fig. 10. Comparing average accuracy of the four classifiers: NADAL vs. incremental naive Bayesian

Fig. 11. Comparing average Kappa of the four classifiers: NADAL vs. incremental naive Bayesian

VII. CONCLUSION AND RECOMMENDATIONS

Traditional data packets are inherently static. In contrast, streaming data are continuously created; they cannot be stored and must be analyzed as a single unit. In this paper, a novel network anomaly detection framework was proposed to improve efficiency in classifying data in an online fashion. Furthermore, active learning was used to reduce labeling costs. The proposed system was evaluated using the standard NSL-KDD dataset. Implementation results revealed that the proposed method outperforms the incremental naive Bayesian approach in terms of both accuracy and Kappa.

There are many challenges in detecting network anomalies which can be addressed in future studies. Our recommendations are as follows:

• Employing other incremental classification approaches in NADAL and comparing the evaluation criteria.

• Improving classification accuracy in data with class imbalance so that the data are equally distributed among the training classes.

• Detecting concept drift in data streams, where the relationship between input data and labels may be modified due to concept drift. The IDS must be able to detect such modifications. We recommend adding a

module to identify modifications and investigate the results [13].

• Using online feature extraction methods in the NADAL framework.

ACKNOWLEDGEMENTS
The information in this paper was extracted from a Master's thesis by the authors for the Master of IT Engineering Program in Safahan Higher Education Institute.

REFERENCES

[1] F. Gumus, C. O. Sakar, Z. Erdem, and O. Kursun, "Online naive Bayes classification for network intrusion detection," in Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on, 2014, pp. 670–674.
[2] A. Rasoulifard, A. Ghaemi Bafghi, and M. Kahani, "Incremental hybrid intrusion detection using ensemble of weak classifiers," Commun. Comput. Inf. Sci., vol. 6 CCIS, pp. 577–584, 2008.
[3] R. Singh, H. Kumar, and R. K. Singla, "An intrusion detection system using network traffic profiling and online sequential extreme learning machine," Expert Syst. Appl., vol. 42, no. 22, pp. 8609–8624, 2015.
[4] P. García-Teodoro, J. Díaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Comput. Secur., vol. 28, no. 1–2, pp. 18–28, 2009.
[5] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Network anomaly detection: methods, systems and tools," IEEE Commun. Surv. Tutorials, vol. 16, no. 1, pp. 303–336, 2014.
[6] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, "Survey on incremental approaches for network anomaly detection," arXiv preprint arXiv:1211.4493, 2012.
[7] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Elsevier, 2011.
[8] W.-Y. Yu and H.-M. Lee, "An incremental-learning method for supervised anomaly detection by cascading service classifier and ITI decision tree methods," in Intelligence and Security Informatics, Springer, 2009, pp. 155–160.
[9] F. Ren, L. Hu, H. Liang, X. Liu, and W. Ren, "Using density-based incremental clustering for anomaly detection," in Computer Science and Software Engineering, 2008 International Conference on, 2008, vol. 3, pp. 986–989.
[10] Y. Yi, J. Wu, and W. Xu, "Incremental SVM based on reserved set for network intrusion detection," Expert Syst. Appl., vol. 38, no. 6, pp. 7698–7707, 2011.
[11] S. Mukherjee and N. Sharma, "Intrusion detection using naive Bayes classifier with feature reduction," vol. 4, pp. 119–128, 2012.
[12] P. E. N. Lutu, "Fast feature selection for naive Bayes classification in data stream mining," in Proceedings of the World Congress on Engineering, 2013, vol. 3.
[13] I. Zliobaite, A. Bifet, B. Pfahringer, and G. Holmes, "Active learning with drifting streaming data," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 1, pp. 27–39, 2014.
[14] P. Aggarwal and S. K. Sharma, "Analysis of KDD dataset attributes — class wise for intrusion detection," Procedia Comput. Sci., vol. 57, pp. 842–851, 2015.
[15] L. Dhanabal and S. P. Shantharajah, "A study on NSL-KDD dataset for intrusion detection system based on classification algorithms," Int. J. Adv. Res. Comput. Commun. Eng., vol. 4, no. 6, pp. 446–452, 2015.
