Anomaly Intrusion Detection using Multi-Objective Genetic Fuzzy System and Agent-based Evolutionary Computation Framework

Chi-Ho Tsang1, Sam Kwong2, Hanli Wang1
Department of Computer Science, City University of Hong Kong
83 Tat Chee Avenue, Kowloon, Hong Kong
1 {wilson, hanli}@cs.cityu.edu.hk [email protected]

Abstract

In this paper, we present a multi-objective genetic fuzzy system for anomaly intrusion detection. The proposed system extracts accurate and interpretable fuzzy rule-based knowledge from network data using an agent-based evolutionary computation framework. The experimental results on the KDD-Cup99 intrusion detection benchmark data demonstrate that our system can achieve a high detection rate for intrusion attacks and a low false positive rate for normal network traffic.

1. Introduction

Learning classification rules from network data is an effective anomaly detection approach that automates and simplifies the manual development of intrusion signatures. In constructing intelligent rule-based Intrusion Detection Systems (IDS), a key challenge is to ensure that the extracted rules are (i) accurate enough to detect both known and unknown attacks and to recognize normal traffic, and (ii) linguistically interpretable for human comprehension. In traditional Genetic Fuzzy Rule-Based Systems (GFRBS), accuracy and interpretability often conflict with each other and are not addressed simultaneously. As highly interpretable knowledge is desirable in IDS to assist security experts in complicated intrusion analysis, both accuracy and interpretability should be optimized. To achieve this goal, we propose a Multi-Objective Genetic Fuzzy IDS (MOGFIDS), which applies an agent-based evolutionary computation framework to generate and evolve an accurate and interpretable fuzzy knowledge base. In addition, MOGFIDS can search for a near-optimal feature subset from network data. The interpretability issues, including distinguishability, completeness, consistency, compactness and utility, are discussed in our previous work [1] and are only briefly mentioned in this paper. The rest of this paper is organized as follows. Section 2 describes the agent-based framework in MOGFIDS. Experimental results are evaluated in Section 3. Finally, Section 4 concludes this paper.

2. Multi-Objective Genetic Fuzzy Intrusion Detection Systems (MOGFIDS)

An agent-based evolutionary computation framework, which consists of Fuzzy Set Agents (FSAs) and an Arbitrator Agent (AA), is proposed to construct a GFRBS. The framework is illustrated in Figure 1 and described below.

Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05)

1550-4786/05 $20.00 © 2005 IEEE

Figure 1. Agent-based framework and agent evolution process

Each autonomous FSA employs three main strategies to construct and evolve its fuzzy systems. It initializes the fuzzy set information using the fuzzy sets distribution strategy. Based on the initialized fuzzy sets, an interpretable fuzzy rule base is then generated through the interpretability-based regulation strategy and the fuzzy rules generation strategy. To find the globally optimal fuzzy rule base, in each generation the FSAs generate their offspring by cooperatively exchanging their fuzzy set information and applying crossover and mutation operations to the hierarchically formulated chromosomes. The fuzzy rule bases of the offspring FSAs are generated using the same strategies. Finally, the FSAs submit their fitness values to the AA, which evaluates the parent and offspring FSAs on both accuracy and interpretability criteria. As a result, the elitist FSAs are retained and the low-fitness FSAs are removed in each generation. The three intra-behaviors of the FSAs and the interactions between the agents are discussed in the following subsections.
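As a rough illustration of the cycle just described, the sketch below runs a generic parent/offspring/elitist-selection loop. All names (`evolve_agents`, `recombine`, `evaluate`, `select`) are hypothetical placeholders introduced here and are not part of MOGFIDS; in particular, `select` merely stands in for the multi-objective evaluation performed by the AA, and the sketch assumes an even number of offspring.

```python
import random


def evolve_agents(agents, n_offspring, recombine, evaluate, select,
                  generations, rng):
    """Generic agent-level loop: pair parents, create offspring, keep the best.

    ASSUMPTIONS (not from the paper): n_offspring is even and does not
    exceed len(agents); recombine(a, b, rng) returns two offspring;
    select(scored, k) returns the k agents that survive.
    """
    for _ in range(generations):
        # Randomly pick the parents from the current agent population.
        parents = rng.sample(agents, n_offspring)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            offspring.extend(recombine(a, b, rng))
        # Parents and offspring compete together, so elite agents survive.
        pool = agents + offspring
        scored = [(evaluate(agent), agent) for agent in pool]
        agents = select(scored, len(agents))
    return agents
```

Because parents and offspring are judged in one pool, the best agent found so far cannot be lost between generations, which is the elitism property the framework relies on.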

2.1. Intra-Behaviors of Fuzzy Set Agents

2.1.1. Fuzzy Sets Distribution Strategy

A minimal number of fuzzy sets and rules can be searched effectively by a Hierarchical GA (HGA) [2] without a priori knowledge of the fuzzy set topology. As depicted in Figure 2, each chromosome in HGA consists of control genes and parameter genes, which are used to optimize the distribution of fuzzy sets.

Figure 2. Example of hierarchical chromosome. Three-level gene structure has a phenotype value (7,6).

To sufficiently represent each fuzzy variable x_i, a maximal possible number of fuzzy sets M_i is determined. For an N-dimensional problem, there are in total P = M_1 + M_2 + … + M_N possible fuzzy sets, which require P binary-valued control genes to manage the activation of their parameter genes. The Gaussian combinational membership function (Gauss2mf) is used to formulate the antecedent fuzzy sets in the parameter genes. The Gauss2mf is defined by the lower bound a_1, left center a_2, right center a_3 and upper bound a_4 of the definition domain (a_1 ≤ a_2 ≤ a_3 ≤ a_4). The Gauss2mf used in HGA is shown in Figure 3. FSAs randomly initialize all the genes at the beginning of a run.
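To make the encoding more concrete, the following is a minimal sketch of a two-sided Gaussian membership function driven by the four parameters above, together with the gating of parameter genes by control genes. How the spreads are derived from the bounds is not stated here, so the rule used below (spread = distance from center to bound divided by `k`) is an assumption, as are the function names `gauss2mf` and `active_fuzzy_sets`.

```python
import math


def gauss2mf(x, a1, a2, a3, a4, k=3.0):
    """Two-sided Gaussian membership with a flat top between a2 and a3.

    ASSUMPTION: each side's spread is (center - bound) / k, so the
    membership has decayed to exp(-k**2 / 2) at the bounds a1 and a4.
    """
    if not (a1 <= a2 <= a3 <= a4):
        raise ValueError("expected a1 <= a2 <= a3 <= a4")
    if a2 <= x <= a3:
        return 1.0
    if x < a2:
        center, sigma = a2, (a2 - a1) / k
    else:
        center, sigma = a3, (a4 - a3) / k
    if sigma == 0.0:
        return 0.0  # degenerate side: treat it as a crisp edge
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def active_fuzzy_sets(control_genes, parameter_genes):
    """Keep only the parameter genes whose binary control gene is 1."""
    return [p for c, p in zip(control_genes, parameter_genes) if c == 1]
```

For example, `active_fuzzy_sets([1, 0, 1], sets)` keeps the first and third candidate fuzzy sets of a variable, which is how switching a control gene off removes a fuzzy set without deleting its parameters.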

Figure 3. Example of Gauss2mf encoded in HGA.

2.1.2. Interpretability-based Regulation Strategy

As the distinguishability of the fuzzy partitioning cannot be guaranteed by the above initialization, we apply an interpretability-based regulation strategy to maintain a more appropriate distribution of fuzzy sets. If the similarity between two fuzzy sets is greater than a given threshold, the two fuzzy sets are merged into a new one. If a fuzzy set is similar to the universal set or to a singleton set, it is removed from the rule base. For details of the fuzzy set merging and removal methods, the reader may refer to Section 4.2 in [1].

2.1.3. Fuzzy Rules Generation Strategy

To perform genetic optimization on the fuzzy rule base, the Pittsburgh approach is applied to extract rules from the training data. Suppose there are N fuzzy variables, M_i^a is the number of active fuzzy sets for variable x_i, and “don’t care” conditions are included for incomplete rules. Hence, the maximum number of possible rules is (M_1^a+1)×(M_2^a+1)×…×(M_N^a+1). In order to search for a minimal number of fuzzy rules considering both accuracy and interpretability, FSAs perform the following tasks:

(a) Initialization of Rule Base Population

Each fuzzy rule is encoded as a string of length N, where the ith element has a value c_i with 0 ≤ c_i ≤ M_i^a, indicating either that the c_i-th fuzzy set is triggered (c_i > 0) or that the ith fuzzy variable plays no role in the rule (c_i = 0). After that, the FSA defines the population size N_pop, i.e., the number of fuzzy rule sets, each of which represents a complete rule base. Each individual of the rule set population is represented as a concatenated string of length N×N_rule, where N_rule is a predefined integer specifying the size of the initial fuzzy rule base. In this concatenated string, each substring of length N represents a single fuzzy rule. A heuristic procedure [3] is applied to generate the rule consequents for classification, so the consequents are not coded as part of the concatenated string. The fuzzy rule sets are randomly initialized so that each element of the concatenated string either represents one of the fuzzy sets of the corresponding fuzzy variable or equals zero, indicating a “don’t care” condition.

(b) Crossover and Mutation

New offspring rule sets are generated by crossover and mutation. The one-point crossover operation randomly selects a different cutoff point for each parent to generate the offspring rule sets. An example of crossover is given in Figure 4.

Figure 4. Example of crossover operation on the rule sets.


The mutation operation randomly replaces an element of a rule set with another linguistic value if a simple probability test is satisfied. Elimination of existing rules and addition of new rules can also be used as mutation operations. As a result, the number of rules in the rule set string can change accordingly.
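The rule set encoding and the two genetic operators can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the cut points are assumed to fall on rule boundaries so that every rule stays whole, each parent is assumed to hold at least two rules, and the rule elimination and addition mutations are omitted.

```python
import random


def random_rule_set(n_rules, active_sets, rng):
    """Flat string of n_rules * N genes; gene i of a rule lies in 0..active_sets[i].

    A value of 0 encodes a "don't care" condition for that variable.
    """
    return [rng.randint(0, m) for _ in range(n_rules) for m in active_sets]


def one_point_crossover(parent_a, parent_b, n_vars, rng):
    """Cut each parent at its own point and swap the tails.

    ASSUMPTION: cut points lie on rule boundaries (multiples of n_vars),
    so offspring may differ in rule count but contain only whole rules.
    """
    cut_a = n_vars * rng.randint(1, len(parent_a) // n_vars - 1)
    cut_b = n_vars * rng.randint(1, len(parent_b) // n_vars - 1)
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


def mutate(rule_set, active_sets, p_mut, rng):
    """With probability p_mut, replace a gene by a different admissible value."""
    n_vars = len(active_sets)
    mutated = list(rule_set)
    for pos in range(len(mutated)):
        if rng.random() < p_mut:
            upper = active_sets[pos % n_vars]
            choices = [v for v in range(upper + 1) if v != mutated[pos]]
            if choices:
                mutated[pos] = rng.choice(choices)
    return mutated
```

Choosing a different cut point for each parent is what lets the number of rules vary between parents and offspring, while the total number of rules across the two offspring equals that of the two parents.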

(c) Evaluation Criteria and Selection Mechanism

Each FSA applies three criteria to evaluate fuzzy rule set candidates: (i) the classification accuracy, (ii) the number of fuzzy rules, and (iii) the total length of the fuzzy rules, i.e., the total number of rule antecedents in the rule base. The classification accuracy is calculated using the single winner rule method [4]. For each training sample X_i, the winner rule R_i is determined as μ_i(X_i) = max{μ_k(X_i) | k = 1, 2, …, R}, where R is the number of rules. All the fuzzy rule base candidates are evaluated by the FSAs using the robust multi-objective optimization algorithm NSGA-II [5]. Suppose there are N_pop + N_offs candidates, where N_pop is the size of the parent population and N_offs is the size of the offspring population resulting from the crossover and mutation operations. The FSAs employ an elitism strategy to select the N_pop best candidates from the mixed population.
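The single winner rule classification can be illustrated with the short sketch below. It is not the authors' implementation: the matching degree of a rule is assumed to be the product of its antecedent memberships, rule weights are omitted, and `membership(i, c, x)` is a hypothetical callable returning the membership of value `x` in the `c`-th fuzzy set of variable `i`.

```python
def compatibility(rule, sample, membership):
    """Matching degree of one rule: product over its non-"don't care" antecedents."""
    grade = 1.0
    for i, c in enumerate(rule):
        if c > 0:  # a gene of 0 means the variable is ignored by this rule
            grade *= membership(i, c, sample[i])
    return grade


def classify(rules, consequents, sample, membership):
    """Return the class of the single winner rule, or None if no rule fires."""
    grades = [compatibility(rule, sample, membership) for rule in rules]
    best = max(range(len(rules)), key=grades.__getitem__)
    return consequents[best] if grades[best] > 0.0 else None


def accuracy(rules, consequents, samples, labels, membership):
    """Fraction of samples whose winner rule predicts the correct label."""
    hits = sum(
        classify(rules, consequents, x, membership) == y
        for x, y in zip(samples, labels)
    )
    return hits / len(samples)
```

Only the rule with the largest matching degree decides the class, so shortening a rule (more zeros) changes the prediction only if it changes which rule wins.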

2.2. Interactions between FSAs and AA

The FSAs interact with one another to exchange fuzzy set information and generate offspring agents. Assuming that the number of offspring FSAs N_offs is less than or equal to the number of parent FSAs N_curr, N_offs FSAs are randomly selected from the current agent population as parents. Each pair of parent FSAs generates two offspring FSAs through crossover and mutation, so that N_offs offspring FSAs are produced. As depicted in Figure 1, the offspring FSAs apply the interpretability-based regulation strategy and the fuzzy rules generation strategy to generate their rule bases. After that, the FSAs send their fitness information to the AA, which applies NSGA-II to evaluate the parent and offspring FSAs and selects the N_curr best FSAs as the population of the next generation. Elitist FSAs that perform well in both accuracy and interpretability survive the competition, while the low-fitness FSAs are discarded.

3. Experimental Results

3.1. Data Description and Preprocessing

The KDD-Cup99 dataset from the UCI repository (http://www.ics.uci.edu/~mlearn/MLRepository.html) is widely used as benchmark data for IDS evaluation. We use its 10% training data (494021 connection records) for training. Each record is classified as normal traffic or as one of 22 different classes of attacks. All attacks fall into 4 main categories: DOS, R2L, U2R and Probing. To alleviate the class imbalance problem in training, random sub-sampling is applied to the three largest classes, Normal, Neptune and Smurf, which together contain 98% of the records in the whole dataset. As a result, 20752 records are used for training. The KDD-Cup99 independent test data (311029 records), which has a different class probability distribution and contains new attacks, is used for evaluation. As each record contains both continuous and nominal features, the nominal features are converted into binary numeric features. Hence, 52 numeric features are constructed and normalized to the interval [0, 1]. They are given in Table 1 below.

Table 1. Feature set of the preprocessed KDD-Cup99 data. Note that since the feature ‘service type’ would be expanded into 71 binary features that heavily increase the dimensionality as well as the initial rule length, this feature is not used in this work.

 #  feature name          #  feature name          #  feature name
 1  duration             19  wrongFragment        37  srvSerrorRate
 2  protocolType=tcp     20  Urgent               38  rerrorRate
 3  protocolType=udp     21  Hot                  39  srvRerrorRate
 4  protocolType=icmp    22  NumFailed            40  sameSrvRate
 5  flag=SF              23  LoggedIn             41  diffSrvRate
 6  flag=REJ             24  numCompromised       42  srvDiffHostRate
 7  flag=S0              25  RootShell            43  dstHostCount
 8  flag=S1              26  suAttempted          44  dstHostSrvCount
 9  flag=S2              27  numRoot              45  dstHostSameSrvRate
10  flag=S3              28  numFileCreations     46  dstHostDiffSrvRate
11  flag=SH              29  numShells            47  dstHostSameSrcPortRate
12  flag=RSTO            30  numAccessFiles       48  dstHostSrvDiffHostRate
13  flag=RSTOS0          31  numOutboundCmds      49  dstHostSerrorRate
14  flag=RSTR            32  isHostLogin          50  dstHostSrvSerrorRate
15  flag=OTH             33  isGuestLogin         51  dstHostRerrorRate
16  SrcBytes             34  count                52  dstHostSrvRerrorRate
17  DstBytes             35  srvCount
18  Land                 36  serrorRate

3.2. Performance Evaluations

We apply 12 FSAs, each of which has 10 fuzzy rule set solutions; therefore, 120 fuzzy systems are generated at initialization. The trends of the multiple objectives against the number of iterations are plotted in Figure 5.


Figure 5. Trends of average accuracy, average number of fuzzy sets, average number of rules and average total-length of fuzzy rule-base among all FSAs on the training data.

The results show that the agents can continuously improve the average accuracy using the elitism strategy in each generation. On average, stable accuracies are obtained using about 106 rules, each with an approximate rule length of only 17. The rule base is acceptable for accurate classification. The trade-offs among the multiple objectives within the non-dominated fuzzy system solutions are shown in Figure 6.

Figure 6. Non-dominated Pareto front about fuzzy systems of MOGFIDS on the training data.
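The notion of a non-dominated solution behind Figure 6 can be illustrated with a small filter. The sketch assumes that each fuzzy system is summarized by the three criteria of Section 2.1.3 (c), namely accuracy (to be maximized), number of rules and total rule length (both to be minimized); the function names and the tuple layout are illustrative choices, and this is a plain dominance filter rather than the full NSGA-II procedure.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one.

    a and b are (accuracy, n_rules, total_length) tuples: accuracy is
    maximized, the other two objectives are minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def non_dominated(candidates):
    """Return the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

A solution with lower accuracy can still be non-dominated if it uses fewer or shorter rules, which is why the front offers a range of accuracy/interpretability trade-offs rather than a single best system.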

The results show that, out of 120 fuzzy systems, 19 non-dominated solutions are found in training. The average accuracy obtained by the FSAs varies from 80% to 99%, with the number of rules ranging from 30 to 300. As MOGFIDS obtains widely spread non-dominated solutions that consider both accuracy and interpretability, these solutions help experts comprehend the intrusion attacks recognized by the fired rules. The fuzzy system that extracts 148 fuzzy rules from the training data and obtains the peak accuracy (99.24%) as well as the lowest false positive rate (FPR) (1.1%) for normal traffic is applied for test validation. Another advantage of MOGFIDS is that it can be considered as a genetic wrapper for feature selection. As the fuzzy variables can be selected and removed through the crossover and mutation operations during the evolution of the FSAs, a desired feature subset that minimizes the classification error and improves the interpretability of the fuzzy system can be searched for accordingly. Only 27 out of the 52 features are found to be significantly relevant and are used in the fuzzy rules of the selected fuzzy system for classification. The selected features are given in Table 2.

Table 2. Feature subset selected by MOGFIDS.

Features selected in the extracted fuzzy rules: 2, 3, 4, 16, 17, 18, 19, 20, 22, 23, 24, 25, 28, 29, 33, 34, 36, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51
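Under the rule encoding of Section 2.1.3, where a value of 0 marks a “don’t care” condition, the set of selected features is simply the set of variables that appear in at least one rule. The helper below is a minimal sketch of that bookkeeping; the name `used_features` is hypothetical.

```python
def used_features(rules):
    """Return 1-based indices of variables used (gene > 0) by at least one rule."""
    n_vars = len(rules[0])
    return [
        i + 1
        for i in range(n_vars)
        if any(rule[i] > 0 for rule in rules)
    ]
```

A variable whose gene is 0 in every rule never influences a classification, so it can be dropped from the input without changing the behavior of the rule base.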

Due to the space limit, the distributions of only two feature variables are shown in Figure 7. It demonstrates that MOGFIDS generates distinguishable fuzzy sets distributions easily understandable by human beings.

Figure 7. Distribution of fuzzy sets of feature with index #46 (left) and #51 (right).

The classification performance of MOGFIDS is measured and compared with that of different baseline classifiers, including pruned C4.5, Naïve Bayes (NB), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Note that in the k-NN classifier the parameter k is set to 5, and the SVM is trained using the well-known fast sequential minimal optimization method with a polynomial kernel. The results are also compared with the winner of the KDD-Cup99 contest [6], RIPPER and an improved PNrule [7] classifier recently proposed in the literature. Table 3 shows the comparative results of the different classifiers.

Table 3. Recall, precision, F-measure, classification cost [6] and accuracy obtained with different classifiers on the test data. The best results are bold-faced, and the second and third best results are underlined. The numbers of records of each class in our training and test data are given next to the category name in the format of {training: test}.

Class {training: test}  Measure     C4.5    NB      5-NN    SVM     MOGFIDS  Cup Winner [6]  Ripper [7]  PNrule [7]
Probe {4107: 4166}      recall      81.88   90.45   81.61   86.27   88.60    83.30           81.16       89.01
                        precision   52.20   64.16   55.46   77.72   74.40    64.81           77.92       82.11
                        f-measure   63.76   75.07   66.05   81.77   80.88    72.90           79.51       85.42
DOS {5467: 229853}      recall      96.99   82.75   97.00   97.56   97.20    97.10           22.06       21.74
                        precision   99.69   94.00   99.42   99.86   99.90    99.88           95.75       96.68
                        f-measure   98.32   88.02   98.19   98.70   98.53    98.47           35.86       35.50
U2R {52: 228}           recall      14.47   13.16   14.91   10.09   15.79    13.20           11.84       11.40
                        precision    9.35    2.05    5.47   53.49   61.02    71.43           55.10       53.06
                        f-measure   11.36    3.54    8.00   16.97   25.09    22.28           19.49       18.77
R2L {1126: 16189}       recall       1.45   62.74    6.90    3.55   11.01     8.40            8.33       13.05
                        precision   30.32   42.70   66.97   62.39   68.39    98.84           81.85       82.37
                        f-measure    2.77   50.82   12.51    6.71   18.97    15.48           15.12       22.53
Normal {10000: 60593}   recall      98.38   55.47   95.89   97.99   98.36    99.50           N/A         N/A
                        precision   74.75   43.33   74.15   73.42   74.74    74.61           N/A         N/A
                        f-measure   84.96   48.65   83.63   83.94   84.94    85.28           N/A         N/A
Accuracy                            92.02   76.45   91.83   92.54   92.77    92.71           N/A         N/A
Classification cost                 0.2480  0.4965  0.2458  0.2457  0.2317   0.2331          N/A         N/A

Regarding the rare classes of U2R and R2L attacks, MOGFIDS outperforms the other baseline classifiers in terms of both recall and F-measure for U2R attacks, and also achieves competitive results for R2L attacks. This indicates that MOGFIDS can alleviate the over-fitting problem to some extent when it is trained on small samples and evaluated on large test samples containing novel attacks. For the major classes such as Probe and DOS attacks, both SVM and MOGFIDS achieve high recall, precision and F-measure rates. For normal traffic recognition, both C4.5 and MOGFIDS obtain relatively low FPRs. When MOGFIDS is further compared with the KDD-Cup99 winner and the other rule-based classifiers in the literature, the overall results demonstrate that MOGFIDS achieves robust performance by detecting known and unseen attacks with a high detection rate and recognizing normal traffic with an acceptably low FPR.

4. Conclusions

It is desirable for an anomaly rule-based IDS to achieve high classification accuracy while reducing the complexity of the rule bases extracted from the training data. Because accuracy and interpretability are often contradictory in the optimization of GFRBS, we propose MOGFIDS, which applies an agent-based evolutionary computation framework to evolve an accurate and interpretable fuzzy knowledge base for anomaly intrusion detection. Experimental results demonstrate that MOGFIDS achieves robust performance in classifying both intrusion attacks and normal network traffic. In addition, it can search for a reduced feature subset and obtain interpretable fuzzy systems.

References

[1] H. L. Wang, S. Kwong, Y. Jin, W. Wei, and K. F. Man, Multiobjective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction, Fuzzy Sets and Systems, 149(1), Jan. 2005, pp. 149-186.
[2] K. S. Tang, K. F. Man, Z. F. Liu, and S. Kwong, Minimal fuzzy memberships and rules using hierarchical genetic algorithms, IEEE Trans. Industrial Electronics, 45(1), Feb. 1998, pp. 162-169.
[3] H. Ishibuchi and T. Nakashima, Effect of rule weights in fuzzy rule-based classification systems, IEEE Trans. Fuzzy Systems, 9(4), Aug. 2001, pp. 506-515.
[4] H. Ishibuchi, K. Nozaki, and H. Tanaka, Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets and Systems, 52(1), Nov. 1992, pp. 21-32.
[5] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evolutionary Computation, 6(2), Apr. 2002, pp. 182-197.
[6] C. Elkan, Results of the KDD’99 classifier learning, ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Boston, MA, 1(2), 2000, pp. 63-64.
[7] R. Agarwal and M. V. Joshi, PNrule: a new framework for learning classifier models in data mining (a case-study in network intrusion detection), In: Proc. First SIAM Conf. on Data Mining, Apr. 2001.

2. Multi-Objective Genetic Fuzzy Intrusion Detection Systems (MOGFIDS) An agent-based evolutionary computation framework, which consists of Fuzzy Set Agent (FSA) and Arbitrator Agent (AA), is proposed to construct a GFRBS. The framework is illustrated in Figure 1 and described below.

1. Introduction Learning classification rules from network data is one of the effective anomaly detection approaches to automate and simplify the manual development of intrusion signatures. In order to construct intelligent rule-based Intrusion Detection Systems (IDS), one of the key challenges is to ensure that the extracted rules should be (i) accurate to detect both known and unknown attacks and recognize normal traffic, and (ii) linguistically interpretable for human comprehension. In traditional Genetic Fuzzy Rule-Based Systems (GFRBS), accuracy and interpretability are often contradictive to each other and not addressed simultaneously. As it is desirable to obtain highly interpretable knowledge in IDS to assist security experts for complicated intrusion analysis, optimizations of both accuracy and interpretability should be taken into account. To achieve this goal, we propose a Multi-Objective Genetic Fuzzy IDS (MOGFIDS), which applies agent-based evolutionary computation framework to generate and evolve an accurate and interpretable fuzzy knowledge base. In addition, the MOGFIDS can search for a near-optimal feature subset from network data. The interpretability including distinguishability, completeness, consistency, compactness and utility are discussed in our previous work [1], and will only be briefly mentioned in this paper. The rest of this paper is organized as follows. Section 2 describes the agent-based framework in MOGFIDS. Experimental results are evaluated in Section 3. Finally, Section 4 concludes this paper.

Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05)

1550-4786/05 $20.00 © 2005 IEEE

Figure 1. Agent-based framework and agent evolution process

Each autonomous FSA employs three main strategies to construct and evolve its fuzzy systems. It initializes fuzzy sets information using the fuzzy sets distribution strategy. Based on the initialized fuzzy sets, an interpretable fuzzy rule base is then generated through the interpretability-based regulation strategy and fuzzy rules generation strategy. To find the global optimal fuzzy rule base, in each generation FSAs generate their offspring by cooperatively exchanging their fuzzy sets information, and applying crossover and mutation operations to the chromosomes of hierarchical formulation. The fuzzy rule bases of the offspring FSAs are generated using the previously mentioned strategies in a similar manner. Finally, FSAs submit their fitness values to AA for evaluation. The AA evaluates the parent and offspring FSAs based on their fitness assessments in both accuracy and interpretability criteria. As a result, the elitist FSAs are retained and the low-fitness FSAs are removed in each generation. Three intra-behaviors as well as interactions between the agents are discussed as follows.

2.1. Intra-Behaviors of Fuzzy Set Agents 2.1.1. Fuzzy Sets Distribution Strategy Minimal number of fuzzy sets and rules can be effectively searched without aprior knowledge of fuzzy set topology by Hierarchical GA (HGA) [2]. As depicted in Figure 2, each chromosome in HGA consists of control genes and parameter genes, which are used to optimize the distribution of fuzzy sets.

Figure 2. Example of hierarchical chromosome. Three-level gene structure has a phenotype value (7,6).

To sufficiently represent each fuzzy variable xi, a possible maximal number of fuzzy sets Mi is determined. For N dimensional problem, totally P=M1+M2+…+MN possible fuzzy sets require P binary-valued control genes to manage the activation of their parameter genes. Gaussian combinational membership function (Gauss2mf) is used to formulate antecedent fuzzy sets in parameter genes. The Gauss2mf is defined by the lower bound a1, left center a2, right center a3 and upper bound a4 of the definition domain (a1a2a3a4). The Gauss2mf used in HGA is shown in Figure 3. FSAs randomly initialize all the genes at the beginning of run.

minimal number of fuzzy rules considering both accuracy and interpretability, FSAs perform the following tasks:

(a) Initialization of Rule Base Population Each fuzzy rule is encoded as a string of length N, where the ith element has value ci: 0 ci Mia which indicates the ci th fuzzy set is triggered (ci>0) or the ith fuzzy variable does not play a role in rule generation (ci=0). After that, FSA defines the population size Npop, i.e., the number of fuzzy rule sets that represent a complete rule base. Each individual of fuzzy rule sets population is represented as a concatenated string of the length N×Nrule, where Nrule is a predefined integer specifying the size of the initial fuzzy rule base. In this concatenated string, each substring of length N represents a single fuzzy rule. A heuristic procedure [3] is applied to generate rule consequents for classification such that the consequents are not coded as parts of the concatenated string. The fuzzy rule sets are randomly initialized so that the value of the concatenated string can present one of the fuzzy sets of the corresponding fuzzy variable, or is equal to zero indicating “don’t care” conditions.

(b) Crossover and Mutation New offspring rule sets are generated by crossover and mutation. One-point crossover operation randomly selects different cutoff points for each parent to generate offspring rule sets. An example of crossover is given in Figure 4.

Figure 4. Example of crossover operation on the rule sets.

Figure 3. Example of Gauss2mf encoded in HGA.

2.1.2. Interpretability-based Regulation Strategy As the distinguishability of fuzzy partitioning cannot be guaranteed in the above initialization, we apply an interpretability-based regulation strategy to maintain a more appropriate distribution of fuzzy sets. If the similarity between two fuzzy sets is greater than a given threshold, then the two fuzzy sets will be merged and become a new one. If a fuzzy set is similar to the universal set or a singleton set, then it will be removed from the rule base. For details of the fuzzy sets merging and removal methods, reader may refer to section 4.2 in [1].

2.1.3. Fuzzy Rules Generation Strategy To perform genetic optimization on fuzzy rule base, Pittsburgh approach is applied to extract rules from the training data. Suppose there are N fuzzy variables, Mia is the number of active fuzzy sets for variable xi. and the “don’t care” conditions are included for incomplete rules. Hence, the maximum number of possible rules is (M1a+1)×(M2a+1)×…×(MNa+1). In order to search for a

Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05)

1550-4786/05 $20.00 © 2005 IEEE

The mutation operation randomly replaces an element of the rule sets with another linguistic value if a simple probability test is satisfied. Elimination of existing rules and addition of new rules can also be used as mutation operations. As a result, the number of rules in the rule sets string can be changed accordingly.

(c) Evaluation Criteria and Selection Mechanism FSA applies three criteria to evaluate fuzzy rule set candidates: (i) classification accuracy, (ii) number of fuzzy rules, and (iii) total length of fuzzy rules, i.e. the total number of rule antecedents in rule base. The classification accuracy can be calculated using a single winner rule method [4]. For each training sample Xi, the winner rule Ri is determined as ǎi(Xi)=max{ǎk(Xi) | k=1,2,…,R} where R is the number of rules. All the fuzzy rule base candidates are evaluated by FSAs using the robust multi-objective optimization algorithm NSGA-II [5]. Suppose there are Npop+Noffs candidates, where Npop is the size of parent population and Noffs is the size of offspring population resulting from crossover and mutation operations. The FSAs employ elitism strategy to select Npop best candidates from the mixed populations.

2.2. Interactions between FSAs and AA

3.2. Performance Evaluations

The FSAs interact with one another for exchanging fuzzy sets information and generating offspring agents. Assume the number of offspring FSAs Noffs is less than or equal to that of parent FSAs Ncurr, Noffs FSAs can be randomly selected from the current agent population. Therefore, two parent FSAs generate two offspring FSAs, and Noffs offspring FSAs can be generated and applied with crossover and mutation. As depicted in Figure 1, the offspring FSAs apply interpretability-based regulation strategy and fuzzy rules generation strategy to generate rule bases. After that, FSAs send their fitness information to AA, which applies NSGA-II to evaluate parent and offspring FSAs and selects Ncurr best FSAs to be the population in next generation. Elitist FSAs considering both accuracy and interpretability survive from the competition, while the low-fitness FSAs are discarded.

We apply 12 FSAs each of which has 10 fuzzy rule sets solutions, therefore 120 fuzzy systems are generated for initialization. The trends of the multiple objectives against the number of iterations are plotted in Figure 5.

3. Experimental Results 3.1. Data Description and Preprocessing The KDD-Cup99 dataset from the UCI repository (http://www.ics.uci.edu/~mlearn/MLRepository.html) is widely used as the benchmark data for IDS evaluation. We apply its 10% training data (494021 connection records) for training. Each record can be classified as normal traffic, or one of 22 different classes of attacks. All attacks fall into 4 main categories: DOS, R2L, U2R and Probing. To alleviate class imbalance problem in training, random sub-sampling is applied to three largest classes: Normal, Neptune and Smurf, which contain 98% records of whole dataset. As a result, 20752 records are applied for training. The KDD-Cup99 independent test data (311029 records) with different class probability distribution and new attacks is used for evaluation. As each record contains both continuous and nominal features, the nominal features are converted into binary numeric features. Hence, 52 numeric features are constructed and normalized to the interval [0, 1]. They are given in Table 1 below. Table 1. Feature set of the preprocessed KDD-Cup99 data. Note that since feature ‘service type’ can be expanded into 71 binary features that heavily increase the dimensionality as well as the initial rule length, this feature is not applied in this work. # 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

feature name duration protocolType=tcp protocolType=udp protocolType=icmp flag=SF flag=REJ flag=S0 flag=S1 flag=S2 flag=S3 flag=SH flag=RSTO flag=RSTOS0 flag=RSTR flag=OTH SrcBytes DstBytes Land

# 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36

feature name wrongFragment Urgent Hot NumFailed LoggedIn numCompromised RootShell suAttempted numRoot numFileCreations numShells numAccessFiles numOutboundCmds isHostLogin isGuestLogin count srvCount serrorRate

# 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52

feature name srvSerrorRate rerrorRate srvRerrorRate sameSrvRate diffSrvRate srvDiffHostRate dstHostCount dstHostSrvCount dstHostSameSrvRate dstHostDiffSrvRate dstHostSameSrcPortRate dstHostSrvDiffHostRate dstHostSerrorRate dstHostSrvSerrorRate dstHostRerrorRate dstHostSrvRerrorRate

Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05)

1550-4786/05 $20.00 © 2005 IEEE

Figure 5. Trends of average accuracy, average number of fuzzy sets, average number of rules and average total-length of fuzzy rule-base among all FSAs on the training data.

The results show that the agents can continuously improve the average accuracy using the elitism strategy in each generation. On average, stable accuracies can be obtained using about 106 rules each of which has approximate rule length of 17 only. The rule base is acceptable for accurate classification. The trade-offs among the multiple objectives within the non-dominated fuzzy system solutions are shown in Figure 6.

Figure 6. Non-dominated Pareto front of the fuzzy systems of MOGFIDS on the training data.
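The notion of a non-dominated solution behind Figure 6 can be illustrated with a short Python sketch. It assumes that every objective is expressed so that smaller is better (for example classification error and number of rules). The function names and the candidate tuples are made up for illustration and are not results from the experiments. This is only the basic dominance filter, not the selection procedure used inside MOGFIDS.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one. All objectives are assumed to be minimised."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(solutions):
    """Return the solutions that are not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]


# Hypothetical (classification error, number of rules) pairs.
candidates = [(0.02, 120), (0.05, 60), (0.20, 30), (0.06, 90), (0.02, 200)]
front = non_dominated(candidates)
print(front)  # [(0.02, 120), (0.05, 60), (0.20, 30)]
```

In this toy example (0.06, 90) is dominated by (0.05, 60), and (0.02, 200) by (0.02, 120). The remaining three points cannot be ranked against each other without a preference between accuracy and compactness.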

The results show that, out of 120 fuzzy systems, 19 non-dominated solutions are found in training. The average accuracy obtained from the FSAs varies from 80% to 99%, with the number of rules ranging from 30 to 300. Because MOGFIDS obtains a wide spread of non-dominated solutions that consider both accuracy and interpretability, these solutions help experts comprehend the intrusion attacks recognized by the fired rules. The fuzzy system that extracts 148 fuzzy rules from the training data and obtains the peak accuracy (99.24%) as well as the lowest false positive rate (FPR) (1.1%) for normal traffic is selected for validation on the test data.

Another advantage of MOGFIDS is that it can be regarded as a genetic wrapper for feature selection. As fuzzy variables can be selected and removed through the crossover and mutation operations during the evolution of the FSAs, a feature subset that minimizes the classification error and improves the interpretability of the fuzzy system can be searched for accordingly. Only 27 out of the 52 features are found to be significantly relevant and are used in the fuzzy rules of the selected fuzzy system for classification. The selected features are given in Table 2.

Table 2. Feature subset selected by MOGFIDS.

Features selected in the extracted fuzzy rules: 2, 3, 4, 16, 17, 18, 19, 20, 22, 23, 24, 25, 28, 29, 33, 34, 36, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51
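To read Table 2 alongside Table 1, the selected indices can be resolved to feature names. The following Python sketch simply copies the names from Table 1 and the indices from Table 2; the variable and function names are illustrative only.

```python
# Feature names copied from Table 1 in index order (index 1 is FEATURES[0]).
FEATURES = [
    "duration", "protocolType=tcp", "protocolType=udp", "protocolType=icmp",
    "flag=SF", "flag=REJ", "flag=S0", "flag=S1", "flag=S2", "flag=S3",
    "flag=SH", "flag=RSTO", "flag=RSTOS0", "flag=RSTR", "flag=OTH",
    "SrcBytes", "DstBytes", "Land", "wrongFragment", "Urgent", "Hot",
    "NumFailed", "LoggedIn", "numCompromised", "RootShell", "suAttempted",
    "numRoot", "numFileCreations", "numShells", "numAccessFiles",
    "numOutboundCmds", "isHostLogin", "isGuestLogin", "count", "srvCount",
    "serrorRate", "srvSerrorRate", "rerrorRate", "srvRerrorRate",
    "sameSrvRate", "diffSrvRate", "srvDiffHostRate", "dstHostCount",
    "dstHostSrvCount", "dstHostSameSrvRate", "dstHostDiffSrvRate",
    "dstHostSameSrcPortRate", "dstHostSrvDiffHostRate", "dstHostSerrorRate",
    "dstHostSrvSerrorRate", "dstHostRerrorRate", "dstHostSrvRerrorRate",
]

# Indices copied from Table 2.
SELECTED = [2, 3, 4, 16, 17, 18, 19, 20, 22, 23, 24, 25, 28, 29, 33, 34, 36,
            41, 43, 44, 45, 46, 47, 48, 49, 50, 51]


def selected_names(features, indices):
    """Resolve 1-based Table 1 indices to feature names."""
    return [features[i - 1] for i in indices]


names = selected_names(FEATURES, SELECTED)
chosen = set(SELECTED)
dropped = [f for i, f in enumerate(FEATURES, start=1) if i not in chosen]
print(len(names), "selected,", len(dropped), "dropped")  # 27 selected, 25 dropped
```

Read this way, Table 2 keeps the three protocol indicators but none of the eleven flag indicators (features 5 to 15), and it also drops duration.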

Due to the space limit, the fuzzy set distributions of only two feature variables are shown in Figure 7. They demonstrate that MOGFIDS generates distinguishable fuzzy set distributions that are easily understood by human beings.

Figure 7. Distribution of fuzzy sets of feature with index #46 (left) and #51 (right).

The classification performance of MOGFIDS is measured and compared with that of several baseline classifiers: C4.5 with pruning, Naïve Bayes (NB), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). In the k-NN classifier the parameter k is set to 5, and the SVM is trained using the well-known fast sequential minimal optimization method with a polynomial kernel. The results are also compared with the winner of the KDD-Cup99 contest [6], RIPPER and an improved PNrule [7] classifier recently proposed in the literature. Table 3 shows the comparative results of the different classifiers.

Table 3. Recall, precision, F-measure, classification cost [6] and accuracy obtained with different classifiers on the test data. The numbers of records of each class in our training and test data are given with the category name in the format of {training: test}.

Class {training: test}  Measure    C4.5     NB       5-NN     SVM      MOGFIDS  Cup Winner [6]  Ripper [7]  PNrule [7]
Probe {4107: 4166}      recall     81.88    90.45    81.61    86.27    88.60    83.30           81.16       89.01
                        precision  52.20    64.16    55.46    77.72    74.40    64.81           77.92       82.11
                        f-measure  63.76    75.07    66.05    81.77    80.88    72.90           79.51       85.42
DOS {5467: 229853}      recall     96.99    82.75    97.00    97.56    97.20    97.10           22.06       21.74
                        precision  99.69    94.00    99.42    99.86    99.90    99.88           95.75       96.68
                        f-measure  98.32    88.02    98.19    98.70    98.53    98.47           35.86       35.50
U2R {52: 228}           recall     14.47    13.16    14.91    10.09    15.79    13.20           11.84       11.40
                        precision  9.35     2.05     5.47     53.49    61.02    71.43           55.10       53.06
                        f-measure  11.36    3.54     8.00     16.97    25.09    22.28           19.49       18.77
R2L {1126: 16189}       recall     1.45     62.74    6.90     3.55     11.01    8.40            8.33        13.05
                        precision  30.32    42.70    66.97    62.39    68.39    98.84           81.85       82.37
                        f-measure  2.77     50.82    12.51    6.71     18.97    15.48           15.12       22.53
Normal {10000: 60593}   recall     98.38    55.47    95.89    97.99    98.36    99.50           N/A         N/A
                        precision  74.75    43.33    74.15    73.42    74.74    74.61           N/A         N/A
                        f-measure  84.96    48.65    83.63    83.94    84.94    85.28           N/A         N/A
Accuracy                           92.02    76.45    91.83    92.54    92.77    92.71           N/A         N/A
Classification cost                0.2480   0.4965   0.2458   0.2457   0.2317   0.2331          N/A         N/A

Regarding the rare classes of U2R and R2L attacks, MOGFIDS outperforms the other baseline classifiers in terms of both Recall and F-measure for U2R attacks, and also achieves competitive results for R2L attacks. This indicates that MOGFIDS can relatively alleviate the over-fitting problem when it is trained on small samples and evaluated on large test samples containing novel attacks. For the major classes, Probe and DOS attacks, both SVM and MOGFIDS achieve high Recall, Precision and F-measure rates. For normal traffic recognition, both C4.5 and MOGFIDS obtain relatively low FPRs. When MOGFIDS is further compared with the KDD-Cup99 winner and other rule-based classifiers in the literature, the overall results demonstrate that MOGFIDS achieves robust performance by detecting known and unseen attacks with a high detection rate and recognizing normal traffic with an acceptably low FPR.

4. Conclusions


It is desirable for an anomaly rule-based IDS to achieve high classification accuracy while reducing the complexity of the rule bases extracted from training data. Because accuracy and interpretability often conflict in the optimization of GFRBS, we propose MOGFIDS, which applies an agent-based evolutionary computation framework to evolve an accurate and interpretable fuzzy knowledge base for anomaly intrusion detection. Experimental results demonstrate that MOGFIDS achieves robust performance in classifying both intrusion attacks and normal network traffic. In addition, it can search for a reduced feature subset and obtain interpretable fuzzy systems.

References

[1] H. L. Wang, S. Kwong, Y. Jin, W. Wei, and K. F. Man, Multiobjective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction, Fuzzy Sets and Systems, 149(1), Jan. 2005, pp. 149-186.
[2] K. S. Tang, K. F. Man, Z. F. Liu, and S. Kwong, Minimal fuzzy memberships and rules using hierarchical genetic algorithms, IEEE Trans. Industrial Electronics, 45(1), Feb. 1998, pp. 162-169.
[3] H. Ishibuchi and T. Nakashima, Effect of rule weights in fuzzy rule-based classification systems, IEEE Trans. Fuzzy Systems, 9(4), Aug. 2001, pp. 506-515.
[4] H. Ishibuchi, K. Nozaki, and H. Tanaka, Distributed representation of fuzzy rules and its application to pattern classification, Fuzzy Sets and Systems, 52(1), Nov. 1992, pp. 21-32.
[5] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evolutionary Computation, 6(2), Apr. 2002, pp. 182-197.
[6] C. Elkan, Results of the KDD’99 classifier learning, ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, Boston, MA, 1(2), 2000, pp. 63-64.
[7] R. Agarwal and M. V. Joshi, PNrule: a new framework for learning classifier models in data mining (a case-study in network intrusion detection), In: Proc. First SIAM Conf. on Data Mining, Apr. 2001.