Agent-based Network Intrusion Detection System Using Data Mining Approaches

Cheung-Leung Lui Data Cog Corporation Ltd [email protected]

Tak-Chung Fu Data Cog Corporation Ltd [email protected]

Ting-Yee Cheung Data Cog Corporation Ltd [email protected]

Abstract

Most existing commercial NIDS products are signature-based and not adaptive. In this paper, an adaptive NIDS using data mining technology is developed. Data mining approaches are used to capture the actual behavior of network traffic, and the mined profile is used to differentiate “normal” from “attack” traffic. Moreover, whereas most current research uses a single engine to detect all kinds of attack, the proposed system is built from a number of agents that differ in both their training and detection processes. Each agent is strong at capturing one kind of network behavior, so the system as a whole can detect different types of attack. In addition, the system is shown to detect new types of attack and to tolerate fluctuations in traffic better. The experimental results show that the frequent patterns mined from the audit data can serve as reliable agents, which outperform a traditional signature-based NIDS.

1. Introduction

As networking becomes more widespread, the number of violations of normal operation is increasing. Firewalls alone are not sufficient to secure computer networks: some intrusions exploit vulnerabilities in computer systems or use social engineering techniques against which traditional intrusion prevention offers too little protection. A Network Intrusion Detection System (NIDS) provides another wall of protection. Most existing commercial NIDSs are signature-based and not adaptive, and they face three main problems. The first is attack stealthiness: attackers try to hide their actions from both the individuals monitoring the system and the NIDS. The second is novel intrusions: these are undetectable by a signature-based NIDS and can only be detected as anomalies, by observing deviations from normal network behavior. The last is distributed attacks, which are detected by correlating attacks.

In this paper, an adaptive NIDS using data mining technologies is developed, which captures the actual behavior of network traffic. The proposed NIDS is built from different types of agent [1]. There are five types of agent based on three data mining techniques: clustering, association rules and sequential association rules. With the resulting network profile, the normal behavior of a network can be characterized and anomalous traffic can be detected. In addition, the adaptive learning of the agents lets the system adapt to changes in the network automatically.

This paper is organized into five sections. Section 2 discusses related work and Section 3 introduces the proposed agent-based NIDS. Experimental results are reported in Section 4 and the final section concludes the paper.

2. Related Work

Most commercial NIDSs on the market are signature-based, with the disadvantage that they detect only previously known attacks. New kinds of attack appear every day, and a signature-based NIDS fails when one arrives. Many researchers have therefore proposed and implemented intrusion detection models based on data mining techniques to tackle this problem. This section gives a brief review of that work.

A NIDS needs to be accurate, adaptive and extensible. [2,3] developed a general and systematic method for intrusion detection and give an overview of two general data mining algorithms that have been implemented: association rules [4,5] and frequent episodes [6]. Using network intrusion detection as a concrete application example, [7] describes how to construct models that are both accurate in describing the underlying concepts and efficient for analyzing data in real time. The same authors describe MADAM ID, a framework for Mining Audit Data for Automated Models for Intrusion Detection, in [8]. [9] introduces a new type of clustering-based algorithm for an unsupervised anomaly NIDS, which trains on unlabeled data in order to detect new intrusions. [10] presents a data mining based approach to support signature discovery in NIDS. Furthermore, [11] discusses outlier detection algorithms used in data mining systems.

In this paper, an adaptive NIDS based on various data mining techniques is proposed. Unlike most current research, in which a single engine is used to detect all kinds of attack, the proposed system is built from a number of agents that differ in both their training and detection processes. At this stage, three data mining approaches (clustering, association and sequential association) are adopted and five types of agent are built. After training on the normal traffic of a network, the proposed system can detect a new type of attack as an anomaly by distinguishing it from normal traffic.

Proceedings of the Third International Conference on Information Technology and Applications (ICITA’05) 0-7695-2316-1/05 $20.00 © 2005 IEEE

3. Proposed agent-based NIDS

In this section, an adaptive NIDS based on a set of agents is proposed. Three main approaches, each focused on different features, are described. Each approach is represented by a number of agents, each of which is strong at detecting certain kinds of network intrusion. The overall architecture of the proposed system is discussed in the next subsection, and a detailed description of each agent follows.

3.1. System architecture

The proposed NIDS is composed of three modules: a feature extractor, detection agents and agent trainers. First, the feature extractor converts the data from the monitored system into features that are used in both the training and the intrusion detection stages. Figure 1 shows the overall system architecture.

Figure 1. Architecture of agent-based NIDS (diagram labels: Network Traffic, Feature Extractor, Real Time, Batch Processing, Detection Engine, Trainer, Update Corresponding Agents, Alarm, Proposed IDS)

In the Detection Engine, a Feature Distributor allocates the necessary feature vectors to each agent. At this stage, three main data mining approaches (clustering, association rules and sequential association rules) are introduced. The results of all agents are gathered by the Decision Maker, which makes the final decision of the system (Figure 2).

Figure 2. Architecture of the detection engine (diagram labels: Set of Feature, Feature Distributor, Agent ... Agent, Decision Maker, Alarm, Detection Engine)

For each detection agent, a corresponding trainer is built to update the agent adaptively. As in the Detection Engine, a Feature Distributor assigns the necessary feature vectors to each training node. Each training node implements the corresponding data mining approach and updates its agent adaptively. Figure 3 shows the structure of a Trainer.

Figure 3. Architecture of the agent trainer (diagram labels: Set of Feature, Feature Distributor, Approach1, Approach2, ..., Agent n Trainer, Update, Trainer)

An anomaly detection model is based only on normal behavior and deviations from it; in other words, the normal behavior of the network is profiled. Such a model may have a high false alarm rate, because previously unseen (yet legitimate) system behaviors may be recognized as anomalies, but it is expected to adapt better to its environment.

3.2. Feature extractor

The Feature Extractor has a corresponding function for each kind of statistic, which makes it flexible to use. For example, statistics over the packets of a frame are collected (min/average/max value, number of unique items, etc.). These statistics form the feature vectors for both the detection and the training processes. Currently, the system supports the following frame features:

1. Min/Ave/Max value of a field
2. Number of unique items of a field
3. Time range covered by a frame
4. Number of packets in a frame
5. Number of packets in a frame after passing a filter
6. Number of connections attempted to open in a frame
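The frame features listed above can be illustrated with a short sketch. It is not the authors' implementation: the packet representation (a dict with the hypothetical keys `ts`, `dst_port`, `size`, `syn` and `ack`) is an assumption, and counting a connection attempt as a SYN packet without ACK is likewise an assumption made only for illustration. Packets are grouped into fixed-size, non-overlapping frames, as in the jump packet frame used later in the paper.

```python
from statistics import mean


def jump_packet_frames(packets, frame_size):
    """Split packets into consecutive, non-overlapping frames of frame_size packets."""
    return [packets[i:i + frame_size] for i in range(0, len(packets), frame_size)]


def frame_features(frame, field="size", packet_filter=None):
    """Compute the six kinds of frame statistics for one frame of packets."""
    values = [p[field] for p in frame]
    times = [p["ts"] for p in frame]
    # Packets that pass the optional filter (all packets if no filter is given).
    passed = [p for p in frame if packet_filter is None or packet_filter(p)]
    return {
        "min": min(values),                                   # 1. min of a field
        "ave": mean(values),                                  # 1. average of a field
        "max": max(values),                                   # 1. max of a field
        "unique_ports": len({p["dst_port"] for p in frame}),  # 2. unique items of a field
        "time_range": max(times) - min(times),                # 3. time range of the frame
        "n_packets": len(frame),                              # 4. packets in the frame
        "n_filtered": len(passed),                            # 5. packets passing the filter
        # 6. assumption: a connection attempt is a SYN packet without ACK
        "n_conn_attempts": sum(1 for p in frame if p["syn"] and not p["ack"]),
    }
```

Each frame then yields one feature vector; an agent would pick the subset of entries it needs.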

3.3. Clustering-based agent

The clustering-based agent extracts properties from traffic in terms of frames and, in the training stage, groups the normal traffic into isolated clusters. Each cluster then has a representative feature vector that represents a certain normal property. To classify unknown traffic, its properties are compared with those of the trained clusters. If the feature vector of the unknown traffic is too far from the normal clusters, it is classified as attack traffic; otherwise it is classified as normal.

3.3.1. Feature selection. The feature sets of the different clustering-based agents are shown in Table 1. The selected features target quantity-based attacks such as probing and denial of service.

Table 1. Features for clustering-based agents

Agent | Features selected
Cluster TCP | Number of unique ports accessed; Mean packet size; Number of RST packets; Time range covered by frame
Cluster UDP | Number of unique ports accessed; Mean packet size; Number of ICMP packets; Time range covered by frame
Cluster SYN flood | Number of unique ports accessed; Number of connection open attempted; Number of RST packets; Time range covered by frame

3.3.2. Frame formation. In the proposed NIDS, the basic unit for feature extraction is the frame. A frame contains a number of packets; Figure 4 demonstrates the frame formation. There are four basic frame formations: jump time frame, slide time frame, jump packet frame and slide packet frame. Tables 2 and 3 show their corresponding behaviors.

Table 3. Behaviors of time frame vs. packet frame

Time frame | Packet frame
1. More attack tuples in total | 1. Fewer attack tuples in total
2. Normal data becomes very noisy in a short time frame | 2. Normal data is less noisy in a short packet frame

3.3.3. Training phase – cluster formation. The performance of two clustering approaches, k-means and hierarchical clustering, is evaluated first. Their properties are shown in Table 4.

Table 4. k-means vs. hierarchical clustering

k-means | Hierarchical
1. Fast | 1. Slow
2. Less accurate than hierarchical | 2. More accurate than k-means
3. Accuracy depends on the seeding method | 3. Accuracy depends on the distance renew method

Although Table 5 shows that hierarchical clustering has a slightly higher detection rate than k-means clustering, hierarchical clustering needs a higher dimensional space and its computational complexity increases exponentially with the number of features. Thus, in this paper, k-means clustering is the default approach for clustering-based agents unless otherwise specified.

Table 5. Detection rate on different clustering approaches (jump packet frame = 100 packets)

Approach | True positive | False negative | True negative | False positive
Hierarchical | 97.23% | 2.77% | 99.78% | 0.22%
k-means | 97.15% | 2.85% | 99.79% | 0.21%

3.3.4. Detection Engine. Detection is based on the Euclidean distance of a candidate cluster from the normal clusters. If the distance is larger than a threshold, the cluster is regarded as an intrusion; otherwise it is regarded as normal.
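The training and detection steps of the clustering-based agent can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in the paper: seeding with the first k training vectors is a stand-in for the seeding method (which the paper does not detail), the function names are hypothetical, and a single frame vector is tested against the centroids rather than a candidate cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_loops=500):
    """Plain k-means over normal-traffic feature vectors; returns the centroids."""
    if k > len(points):
        raise ValueError("k must not exceed the number of training vectors")
    # Assumption: seed the centroids with the first k training vectors.
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_loops):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids


def is_intrusion(vector, centroids, threshold):
    """Flag a vector whose distance to the nearest normal centroid exceeds threshold."""
    return min(euclidean(vector, c) for c in centroids) > threshold
```

Because the features have different units, a practical agent would probably normalize them before measuring distances; that step is omitted here.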

Figure 4. Different frame formations

Table 2. Behaviors of jump frame vs. slide frame

Jump frame | Slide frame
1. Generates fewer marginal frames | 1. Generates more marginal frames
2. Fewer attack tuples in total | 2. More attack tuples in total

3.4. Association rule-based agent

This agent finds the relationship between the selected features and the traffic property. The features used are those commonly tested with the clustering algorithm.

3.4.1. Feature selection. Four features are selected: 1) number of unique ports accessed (large under attack), 2) average packet size in the frame (smaller than in normal traffic), 3) number of “ICMP destination unreachable” packets in the frame (large because of the victim's responses), and 4) time range covered by the packets (a port scan usually appears as a burst, and hence covers a shorter time range).


3.4.2. Training phase – rules formation. The basic association rule algorithm [12] is used to capture the rules that hold among the selected features of network traffic.
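As an illustration of rule formation, the sketch below mines frequent itemsets with an Apriori-style level-wise search and derives rules from them. It is a stand-in, not necessarily the algorithm of [12] as used by the authors. It assumes that the numeric frame features have already been discretized into categorical items such as `"ports=low"`; that discretization, the item names and the function names are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support fraction}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join step: merge pairs of frequent itemsets into candidates of the next size.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules (lhs, rhs, confidence) from the frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found
```

Trained on normal traffic only, the resulting rules describe which feature values tend to occur together in normal frames.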

3.4.3. Detection Engine. The number of rules that each connection matches is counted. If this count is larger than a threshold, the connection is declared normal traffic.

3.5. Sequential rule-based agent

Besides the general behavior of normal network traffic, normal traffic shows some common sequential patterns at the connection level, and attack traffic shows different ones. The sequential rule-based agent therefore extracts pattern rules that help differentiate normal traffic from intrusions.

3.5.1. Feature selection. Six TCP flags (URG, ACK, PSH, RST, SYN and FIN) are used as the items in the rule mining process.

3.5.2. Training phase – rules formation. The sequential association rule algorithm [13] is used to capture the sequential patterns in the network traffic dialog. To facilitate the mining process, a different minimum support is applied to itemsets of different size (e.g. 1-itemsets and 2-itemsets have different minimum supports). A depreciation mechanism is defined by the following formula, in which the minimum support is decreased by a specified depreciation percentage (depr):

$\mathrm{support}_K(\%) = \mathrm{depr}(\%) \times \mathrm{support}_{K-1}(\%)$, where K denotes the K-itemset.

For example, if the initial support is set to 100% and depr is 90%, then the 1-itemset support is 100%, the 2-itemset support is 100% × 90% = 90%, and so on.

3.5.3. Detection engine. Two parameters can be adjusted. The first is a threshold: for each frame, when the number of abnormal connections matched within the packet/time frame is larger than the threshold, the frame is declared an intrusion. The other is the minimum number of abnormal connections in one frame that will initiate an attack alarm (MIN_ATTACK). The initial threshold is 10%. The best initial value of MIN_ATTACK is obtained through a training process.

3.6. Decision maker

As the system is composed of a batch of agents, a policy is used to make the alarm declaration decision; it is shown in Figure 5. A true attack is expected to raise alarms from both the clustering-based and the rule-based agents (the overlapped area in Figure 6). Therefore, a false alarm from one side may be eliminated because the agents on the other side may not generate an alarm.

Figure 5. Alarm declaration policy structure

Figure 6. Alarm relationship

4. Experiments

In this section, the performance of the proposed NIDS is investigated, and different types of attack are tested to evaluate the strengths and limitations of each agent. An open source signature-based NIDS, SNORT [14], serves as the benchmark in the experiments.

4.1. Experimental settings

The traffic data from MIT Lincoln Laboratory [15] is used for training and performance testing. The attack scenario lasts for 7 weeks (7 days per week). MIT's descriptions of the attacks are shown in Table 6.

Table 6. Attack types in the experiment data

Attack type | Description
Port sweep | Surveillance sweep through many ports to determine which services are supported on a single host (it is regarded as probing)
nmap | Network mapping using the nmap tool
Smurf | Denial of service ICMP echo reply flood
IP sweep | Surveillance sweep performing either a port sweep or ping on multiple host addresses
Pod | Denial of service ping of death
Neptune | SYN flood denial of service on one or more ports


4.2. Parameters settings

The jump packet mode is used for frame extraction, with a frame size of 100 packets. In the cluster-based agents, the k-means clustering approach is adopted and the statistic seed is set with k = 256. The squared-error threshold is set to 0 (i.e. not set) and the maximum loop count is 500. The detailed features selected for each clustering-based agent are specified in Section 3.3.1. In the rule-based agents, the minimum support is set to 100% and the depreciation percentage is depr = 96%. Finally, the threshold settings for the different agents are listed in Table 7. For a tidy representation of the data, the different combinations of agents are given aliases in Table 8.

Table 7. Threshold settings for different agents

Clustering-based agents
Dataset | Cluster TCP | Cluster UDP | Cluster SYN flood
Local Intrusion Lab Dataset | 0.051 | 0.065 | 0.069
MIT Lab Dataset | 0.114 | 0.116 | 0.078

Rule-based agents
Dataset | Sequential rule: Threshold | Sequential rule: Min_Attack | Association rule: Threshold
Local Intrusion Lab Dataset | 10 | 6 | 40
MIT Lab Dataset | 10 | 6 | 40

Table 8. Alias for different sets of agents

Alias | Agents
Agent 1 (A1) | Cluster TCP
Agent 2 (A2) | Cluster UDP
Agent 3 (A3) | Cluster SYN flood
Agent 4 (A4) | Sequential rule
Agent 5 (A5) | Association rule
Policy 1 (P1) | [(Cluster TCP OR Cluster SYN flood) AND Sequential rule] OR (Cluster UDP AND Association rule)
Policy 2 (P2) | [(Cluster TCP OR Cluster SYN flood) OR Sequential rule] OR (Cluster UDP OR Association rule)
Policy 3 (P3) | A traditional signature-based NIDS (SNORT)

In this paper, the experimental results are measured with the standard metrics for evaluating intrusion detection: true negative (TN), false negative (FN), false alarm (false positive, FP) and correctly detected intrusions (detection rate, true positive, TP). Because TN and FP are complements, as are FN and TP, only the TP and FP values are listed in the results. The detection rate (TP) is the ratio of the number of correctly detected attacks to the total number of attacks, while the false alarm (FP) rate is the ratio of the number of normal connections incorrectly classified as attacks to the total number of normal connections.

4.3. Experimental results

As each agent is intended to inspect a specific kind of traffic, each agent on its own has a lower detection rate, and certain agents have a higher false alarm rate. After the alarm decision, a higher detection rate is achieved, while the applied policy limits the false alarm rate. The rule-based agents yield a better detection rate than the clustering-based agents on the same attack type, owing to their tolerance of noisy background traffic and their ability to adapt to the attack speed. The results are summarized in Tables 9 and 10 and Figures 7 and 8.

Table 9. Performance on local intrusion dataset (%)

Attack | A1 | A2 | A3 | A4 | A5 | P1 | P2 | P3
SYN | 100 | 0.0 | 100 | 100 | 97.1 | 100 | 100 | 97.1
ACK | 100 | 3.0 | 100 | 100 | 100 | 100 | 100 | 100
FIN | 100 | 6.0 | 100 | 100 | 100 | 100 | 100 | 100
NULL | 100 | 0.0 | 100 | 100 | 97.0 | 100 | 100 | 100
Xmas | 100 | 0.0 | 100 | 100 | 9.7 | 100 | 100 | 100
Connect | 98.9 | 0.0 | 89.1 | 91.4 | 44.9 | 91.1 | 99.7 | 89.7
UDP Scan | 59.4 | 98.6 | 1.0 | 1.0 | 98.3 | 98.0 | 99.0 | 42.0
SYN flood | 91.4 | 1.1 | 99.6 | 100 | 98.2 | 99.6 | 100 | 91.8
Overall Detection Rate | 87.3 | 23.3 | 74.0 | 74.8 | 83.2 | 96.9 | 99.7 | 69.4
Overall False Alarm | 0.1 | 0.9 | 0.0 | 0.0 | 14.1 | 0.8 | 14.2 | 1.1

Figure 7. Detection rate on local intrusion dataset (bar chart of Detection Rate and False alarm, in Percentage (%) from 0.00 to 100.00, for Agent 1 to Agent 5 and Policy A to Policy C)
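The alarm declaration policies P1 and P2 of Table 8 are plain boolean combinations of the five agents' alarms, which the following sketch makes explicit. The dictionary keys and function names are hypothetical; only the boolean structure is taken from Table 8.

```python
def policy_1(alarm):
    """P1: [(Cluster TCP OR Cluster SYN flood) AND Sequential rule]
    OR (Cluster UDP AND Association rule)."""
    tcp_side = (alarm["cluster_tcp"] or alarm["cluster_syn_flood"]) and alarm["sequential_rule"]
    udp_side = alarm["cluster_udp"] and alarm["association_rule"]
    return tcp_side or udp_side


def policy_2(alarm):
    """P2: [(Cluster TCP OR Cluster SYN flood) OR Sequential rule]
    OR (Cluster UDP OR Association rule)."""
    tcp_side = (alarm["cluster_tcp"] or alarm["cluster_syn_flood"]) or alarm["sequential_rule"]
    udp_side = alarm["cluster_udp"] or alarm["association_rule"]
    return tcp_side or udp_side


# A frame flagged only by the Cluster TCP agent: P1 suppresses the alarm
# because no rule-based agent confirms it, while P2 raises it.
clustering_only = {
    "cluster_tcp": True,
    "cluster_udp": False,
    "cluster_syn_flood": False,
    "sequential_rule": False,
    "association_rule": False,
}
print(policy_1(clustering_only), policy_2(clustering_only))
```

Since P1 requires agreement between a clustering-based and a rule-based agent while P2 accepts any single alarm, every frame flagged by P1 is also flagged by P2. This is consistent with P2 showing both the higher detection rate and the higher false alarm rate in the results.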

Table 10. Performance on MIT lab dataset (%)

Attack | A1 | A2 | A3 | A4 | A5 | P1 | P2 | P3
Port sweep | 76.8 | 0.0 | 85.3 | 85.3 | 93.7 | 85.3 | 99.0 | 89.5
nmap | 0.0 | 93.8 | 0.0 | 0.0 | 93.8 | 93.8 | 93.8 | 100
Smurf | 0.0 | 100 | 0.1 | 0.1 | 100 | 100 | 100 | 19.2
IP sweep | 0.0 | 84.6 | 0.0 | 0.0 | 91.6 | 86.9 | 91.5 | 99.2
pod | 7.1 | 96.4 | 0.0 | 0.0 | 96.4 | 96.4 | 96.4 | 96.4
Neptune | 0.0 | 0.0 | 99.6 | 99.8 | 88.4 | 99.6 | 99.8 | 94.0
Overall Detection Rate | 0.8 | 71.6 | 28.0 | 28.0 | 96.6 | 99.5 | 99.8 | 41.8
Overall False Alarm | 4.6 | 4.7 | 5.6 | 0.5 | 17.4 | 0.6 | 23.2 | 26.7
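The detection rate and false alarm rate reported in Tables 9 and 10 follow the definitions given in Section 4.2. A minimal sketch of that computation is shown below; the function name and the per-connection boolean inputs are assumptions made for illustration.

```python
def detection_metrics(is_attack, alarm_raised):
    """Return detection rate (TP) and false alarm rate (FP) in percent.

    is_attack[i] is the ground truth for connection i and alarm_raised[i]
    is the decision of an agent or policy for the same connection.
    """
    pairs = list(zip(is_attack, alarm_raised))
    tp = sum(1 for truth, alarm in pairs if truth and alarm)
    fn = sum(1 for truth, alarm in pairs if truth and not alarm)
    fp = sum(1 for truth, alarm in pairs if not truth and alarm)
    tn = sum(1 for truth, alarm in pairs if not truth and not alarm)
    attacks, normals = tp + fn, fp + tn
    return {
        # correctly detected attacks / total attacks
        "detection_rate": 100.0 * tp / attacks if attacks else 0.0,
        # normal connections misclassified as attacks / total normal connections
        "false_alarm_rate": 100.0 * fp / normals if normals else 0.0,
    }
```

The false negative and true negative rates are the complements of these two values, which is why they are not listed in the tables.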


Figure 8. Detection rate on MIT lab dataset (bar chart of Detection rate and False alarm, in Percentage (%) from 0.00 to 100.00, for Agent 1 to Agent 5 and Policy A to Policy C)

The benchmark, a traditional signature-based NIDS, shows a very low detection rate for attack types without a signature, while the proposed NIDS, which is trained only on normal traffic, shows its strength in capturing “unseen” attacks.

5. Conclusions

Commercial signature-based NIDSs are weak at detecting previously unknown attacks. In this paper, an agent-based NIDS built on various data mining techniques is proposed. Unlike most current research, which uses a single engine to detect all kinds of attack, the proposed system is built from a number of agents. Because each agent is strong at capturing one kind of network behavior, the NIDS gains a broader view of the different behaviors of network traffic. In addition, training with only normal traffic profiles the normal behavior of a network, which allows the system to detect new types of attack and gives it a higher tolerance to fluctuations. The experimental results show that the frequent patterns mined from the audit data can be used as reliable agents, and that these agents outperform a traditional signature-based NIDS. For future development, the following directions are proposed: (i) to develop more agents that are strong in other aspects, (ii) to have the system set the thresholds with minimal human intervention and (iii) to introduce an incremental updating mechanism for the detection agents.

6. References

[1] IBM's Aglet Mobile Agent Implementation.

[2] W. Lee and S. J. Stolfo, “Data Mining Approaches for Intrusion Detection,” In Proceedings of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998.

[3] W. Lee, S. Stolfo and K. Mok, “Mining Audit Data to Build Intrusion Detection Models,” In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD '98), New York, NY, August 1998.

[4] R. Agrawal, T. Imielinski and A. Swami, “Mining Associations between Sets of Items in Massive Databases,” In Proceedings of the ACM-SIGMOD Int'l Conference on Management of Data, Washington D.C., pp. 207-216, May 1993.

[5] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” In Proceedings of the 20th Int'l Conference on Very Large Databases (VLDB94), Santiago, Chile, September 1994.

[6] H. Mannila, H. Toivonen and A. I. Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery 1(3), pp. 259-289, 1997.

[7] W. Lee, S. Stolfo and K. Mok, “Mining in a Data-flow Environment: Experience in Network Intrusion Detection,” In Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '99), San Diego, CA, August 1999.

[8] W. Lee and S. J. Stolfo, “A framework for constructing features and models for network intrusion detection systems,” ACM Transactions on Information and System Security (TISSEC), Vol. 3, Issue 4, pp. 227-261, November 2000.

[9] L. Portnoy, E. Eskin and S. J. Stolfo, “Intrusion detection with unlabeled data using clustering,” In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), Philadelphia, PA, November 2001.

[10] H. Han, X. L. Lu, J. Lu, C. Bo, and R. L. Yong, “Data mining aided signature discovery in network-based network intrusion detection system,” ACM SIGOPS Operating Systems Review, Vol. 36, No. 4, pp. 7-13, October 2002.

[11] M. I. Petrovskiy, “Outlier Detection Algorithms in Data Mining Systems,” Source Programming and Computing Software, Vol. 29, Issue 4, pp. 228-237, July-August 2003.

[12] R. Agrawal, T. Imielinski, A. Swami and R. Srikant, “Mining Association Rules between Sets of Items in Large Databases,” In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 207-216, May 1993.

[13] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” In Proceedings of the Int'l Conference on Data Engineering (ICDE), Taipei, Taiwan, March 1995.

[14] http://www.snort.org

[15] http://www.ll.mit.edu/SST/ideval/data/data_index.html
