
MCDM 2006, Chania, Greece, July 19-23, 2006

A Data Classification method based on Fuzzy Linear Programming

Aihua Li¹, Yong Shi², Jing He
Chinese Academy of Sciences Research Centre on Data Technology and Knowledge Economy
School of Management, Graduate University of Chinese Academy of Sciences, Beijing 100080, China
E-mail: [email protected], {yshi,hejing}@gucas.ac.cn

Keywords: classification, data mining, MCLP, fuzzy linear programming, membership function

Summary: Multiple criteria linear programming (MCLP) and multiple criteria quadratic programming (MCQP) classification models have been applied in financial risk analysis and credit risk control, such as credit cardholders' behavior analysis. In this paper, we propose a fuzzy linear programming classification method with both soft constraints and soft criteria, building on previous findings by other researchers. In this method, a satisfying result can be obtained by selecting the constraint and criterion boundary variables di* respectively. A general framework of the method is also constructed. Two real-life datasets, one from a major US bank and the other from the KDD 99 database, are used to test the accuracy of the proposed method, and the results show its feasibility.

1. Introduction

Data mining, which extracts non-trivial, implicit, previously unknown and potentially useful patterns or knowledge from databases, has become an important technology with the development of databases and the internet. Classification is one of the functions of data mining and is a form of supervised learning. There are two steps in the classification process (Han, 2001). First, a hidden pattern or discriminant function is derived from the training set. Second, the pattern or discriminant function is applied to classify the testing set. The training accuracy and the testing accuracy are often used to evaluate a model. Classification methods initially employed artificial intelligence (AI), traditional statistics and machine learning tools, such as decision trees (Quinlan, 1986), linear discriminant analysis (LDA) (Fisher, 1936) and support vector machines (SVM) (Vapnik, 1998). They have been applied to real-life medical, communication and strategic management problems. For datasets with different characteristics, classification methods show different advantages and disadvantages. For example, SVM or BNN fits the output of some datasets well, but may sometimes over-fit. LDA shows its advantage when the data obey a normal distribution, but it is not a good choice otherwise.

¹ The authors would like to thank Professor Siwei Cheng for his patience and encouragement with this work. They also thank Mr. Gang Kou and Mr. Peng Zhang for their constructive comments in preparing this paper. This research is partially supported by grants 70531040 and 70472074 from the National Natural Science Foundation of China.

² Corresponding author.

The linear programming (LP) classification method was first proposed in the 1980s (Freed, 1981a; Freed, 1986; Glover, 1990a) and showed its potential in applications. In the 1990s, multiple criteria linear programming (MCLP) and multiple criteria quadratic programming (MCQP) classification models were developed (Shi, 2001a; Shi, 2002a; Kou, 2003a), which have been successfully used in credit cardholders' behavior analysis. He et al. (He, 2004a) proposed a fuzzy linear programming model with soft criteria only (FLP), from which a satisfying solution can be obtained. In this paper, we propose a fuzzy linear programming classification method with both soft constraints and soft criteria based on this previous work. The paper is organized as follows: Section 2 reviews the LP, MCLP and FLP methods. Section 3 proposes FLP with soft constraints and criteria, which allows the decision maker to choose reasonable bounds for the constraints in deriving a satisfying solution. Section 4 uses two examples, one from a major US bank and the other from the KDD 99 database, to test the accuracy of the proposed method. Some remarks are given in Section 5.

2. LP, MCLP and FLP Classification Models

In the linear programming classification method, the objectives of the initial forms can be categorized as MMD and MSD (Freed, 1981b). MMD maximizes the minimum distance of the observations from the critical value; MSD minimizes the sum of the distances of the observations from the critical value. For example, in credit cardholder behavior analysis, a basic framework for two-class problems can be presented as follows. Given a set of r variables (attributes) about a cardholder, a = (a1, a2, …, ar), let Ai = (Ai1, Ai2, …, Air) be the development sample of data for the variables, where i = 1, 2, …, n and n is the sample size. We want to determine the best coefficients of the variables, denoted by X = (x1, x2, …, xr)T, and a boundary value b (a scalar) to separate two classes, G (Good, for non-bankrupt accounts) and B (Bad, for bankrupt accounts):

Ai X ≤ b, Ai ∈ B (Bad);  Ai X ≥ b, Ai ∈ G (Good).

To measure the separation of Good and Bad, we define:
αi = the overlapping of the two-class boundary for case Ai (external measurement);
α = the maximum overlapping of the two-class boundary over all cases Ai (αi < α);
βi = the distance of case Ai from its adjusted boundary (internal measurement);
β = the minimum distance over all cases Ai from their adjusted boundaries (βi > β).

A simple version of Freed and Glover's model which seeks MSD can be written as:

Minimize Σi αi, (M1)
Subject to: Ai X ≤ b + αi, Ai ∈ B; Ai X ≥ b − αi, Ai ∈ G,

where Ai are given, X and b are unrestricted, and αi ≥ 0. The alternative of the above model is to find MMD as follows:

Maximize Σi βi, (M2)
Subject to: Ai X ≥ b − βi, Ai ∈ B; Ai X ≤ b + βi, Ai ∈ G,

where Ai are given, X and b are unrestricted, and βi ≥ 0. A hybrid model (Glover, 1990b) that combines models (M1) and (M2) is as follows:

Minimize Σi αi − Σi βi, (M3)
Subject to: Ai X = b + αi − βi, Ai ∈ B; Ai X = b − αi + βi, Ai ∈ G,

where Ai are given, X and b are unrestricted, and αi, βi ≥ 0 respectively. Shi (Shi, 2001b) applied the compromise solution of multiple criteria linear programming (MCLP) to minimize the sum of αi and maximize the sum of βi simultaneously. A two-criteria linear programming model is stated as follows:

Minimize Σi αi and Maximize Σi βi, (M4)
Subject to: Ai X = b + αi − βi, Ai ∈ B; Ai X = b − αi + βi, Ai ∈ G,

where Ai are given, X and b are unrestricted, and αi, βi ≥ 0 respectively.
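As a concrete illustration, model (M1) can be solved directly with an off-the-shelf LP solver. The sketch below uses `scipy.optimize.linprog` on a small synthetic two-attribute sample; following the experimental setup in Section 4, the boundary is fixed at b = 0.5 (with b left free, the trivial solution X = 0, b = 0 would be optimal). The data and variable names are illustrative, not taken from the paper's datasets.

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-attribute sample: G = Good, B = Bad (illustrative data only)
G = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]])
B = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.05]])
b = 0.5                                  # boundary fixed as in Section 4
r, n = G.shape[1], len(G) + len(B)

# Decision variables z = (x_1..x_r, alpha_1..alpha_n); minimize sum(alpha)
c = np.concatenate([np.zeros(r), np.ones(n)])

# Ai X <= b + alpha_i for Bad, Ai X >= b - alpha_i for Good,
# rewritten in A_ub z <= b_ub form
rows, rhs = [], []
for i, a in enumerate(B):                # Bad: a.x - alpha_i <= b
    row = np.zeros(r + n); row[:r] = a; row[r + i] = -1.0
    rows.append(row); rhs.append(b)
for i, a in enumerate(G):                # Good: -a.x - alpha_j <= -b
    j = len(B) + i
    row = np.zeros(r + n); row[:r] = -a; row[r + j] = -1.0
    rows.append(row); rhs.append(-b)

bounds = [(None, None)] * r + [(0, None)] * n   # x free, alpha >= 0
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
x = res.x[:r]

# Classify by score: Ai x >= b -> Good
print("sum of overlaps:", round(res.fun, 6))
print("Good scores:", G @ x, "Bad scores:", B @ x)
```

For this separable toy sample the minimal total overlap Σi αi is zero, i.e. the LP finds coefficients that place all Good scores above b and all Bad scores below it.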

In the compromise solution approach (Yu, 1985a), the best trade-off between −Σi αi and Σi βi is identified as an "optimal" solution. To explain this, assume the "ideal value" of −Σi αi is α* > 0 and the "ideal value" of Σi βi is β* > 0. Then, if −Σi αi > α*, the regret measure is defined as −dα+ = Σi αi + α*; otherwise it is 0. If −Σi αi < α*, the regret measure is defined as dα− = α* + Σi αi; otherwise it is 0. Thus, the relationships among these measures are: (i) α* + Σi αi = dα− − dα+, (ii) |α* + Σi αi| = dα− + dα+, and (iii) dα−, dα+ ≥ 0. Similarly, we derive β* − Σi βi = dβ− − dβ+, |β* − Σi βi| = dβ− + dβ+, and dβ−, dβ+ ≥ 0. An MCLP model for two-class separation is presented as:

Minimize dα− + dα+ + dβ− + dβ+, (M5)
Subject to:
α* + Σi αi = dα− − dα+,
β* − Σi βi = dβ− − dβ+,
Ai X = b + αi − βi, Ai ∈ B,
Ai X = b − αi + βi, Ai ∈ G,

where Ai, α* and β* are given, X and b are unrestricted, and αi, βi, dα−, dα+, dβ−, dβ+ ≥ 0.

In the fuzzy linear programming approach with soft criteria (He, 2004b), the membership functions for the criteria Minimize Σi αi and Maximize Σi βi are expressed respectively by:

µF1(x) = 1, if Σi αi ≥ y1U; (Σi αi − y1L)/(y1U − y1L), if y1L < Σi αi < y1U; 0, if Σi αi ≤ y1L;

µF2(x) = 1, if Σi βi ≥ y2U; (Σi βi − y2L)/(y2U − y2L), if y2L < Σi βi < y2U; 0, if Σi βi ≤ y2L.

Then, a fuzzy classification method which relaxes the criteria Σi αi and Σi βi is given as:

Maximize ξ, (M6)
Subject to:
ξ ≤ (Σi αi − y1L)/(y1U − y1L),
ξ ≤ (Σi βi − y2L)/(y2U − y2L),
Ai X = b + αi − βi, Ai ∈ G,
Ai X = b − αi + βi, Ai ∈ B,

where Ai, y1L, y1U, y2L and y2U are known, X and b are unrestricted, and αi, βi, ξ ≥ 0.

3. A FLP Classification Method with Soft Constraints and Criteria

It has been recognized that in many decision-making problems, instead of seeking an exact "optimal" solution (a goal value), decision makers often approach a "satisfying solution" between upper and lower aspiration levels, which can be represented by the upper and lower bounds of acceptability for objective payoffs (Charnes, 1961; Yu, 1985a). This behavior, which has an important and pervasive impact on human decision making (Lindsay, 1972), is called the decision makers' goal-seeking and compromise behavior. Zimmermann used it as the basis of his pioneering work on fuzzy linear programming (Zimmermann, 1978). The fuzzy linear programming problem can be described as follows (Dubois, 1980):

B(x*) = max_{x∈X} (D(x) ∧ F(x)) = max {λ | D(x) ≥ λ, F(x) ≥ λ, λ ≥ 0}
      = max {λ | D1(x) ≥ λ, …, Dm(x) ≥ λ, F(x) ≥ λ, λ ≥ 0},

where the fuzzy sets D(x) and F(x) are transferred from the constraints and criteria of a general programming problem with the membership functions µDi and µF respectively, and x* is the satisfying solution of the original programming problem. For decision makers, since the optimal solution is not necessary in most cases, a satisfying solution may be enough to solve real-life problems.

In the MSD and MMD models, the crisp "distance measurements" (αi and βi) of observations in the criteria and constraints are used to evaluate the classification model in applications. To allow flexibility in the choice of these measurements when seeking a satisfying solution, we relax the crisp criteria and constraints to soft ones. This means that we allow a flexible boundary b for the classification scalar, so as to derive the result we expect in reality. Based on this idea, we can build an FLP method with both soft criteria and constraints by the following steps. First, we define the membership functions for the MSD problem with a soft criterion and soft constraints as follows:

µF1(x) = 1, if Σi αi ≤ y1L; (Σi αi − y1U)/(y1L − y1U), if y1L < Σi αi < y1U; 0, if Σi αi ≥ y1U;

µD1(x) = 1, if Ai X ≤ b + αi; 1 − (1/d1)[Ai X − (b + αi)], if b + αi < Ai X < b + αi + d1; 0, if Ai X ≥ b + αi + d1;

µD2(x) = 1, if Ai X ≥ b − αi; 1 + (1/d2)[Ai X − (b − αi)], if b − αi − d2 < Ai X < b − αi; 0, if Ai X ≤ b − αi − d2.
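The piecewise-linear membership functions above translate directly into code. The following sketch implements µF1 and the two soft-constraint memberships µD1 and µD2; the function and parameter names are ours, introduced for illustration only.

```python
def mu_F1(sum_alpha, y1L, y1U):
    """Membership of the soft MSD criterion: 1 when the total overlap
    is at most y1L, falling linearly to 0 at y1U."""
    if sum_alpha <= y1L:
        return 1.0
    if sum_alpha >= y1U:
        return 0.0
    return (sum_alpha - y1U) / (y1L - y1U)

def mu_D1(score, b, alpha_i, d1):
    """Membership of the soft Bad-class constraint Ai X <= b + alpha_i,
    relaxed by a tolerance d1 beyond the boundary."""
    if score <= b + alpha_i:
        return 1.0
    if score >= b + alpha_i + d1:
        return 0.0
    return 1.0 - (score - (b + alpha_i)) / d1

def mu_D2(score, b, alpha_i, d2):
    """Membership of the soft Good-class constraint Ai X >= b - alpha_i,
    relaxed by a tolerance d2 below the boundary."""
    if score >= b - alpha_i:
        return 1.0
    if score <= b - alpha_i - d2:
        return 0.0
    return 1.0 + (score - (b - alpha_i)) / d2
```

Each function returns 1 when the crisp condition holds, 0 once the violation exceeds the tolerance, and decreases linearly in between.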

Then, define y1L = min Σi αi and y1U ∈ (min Σi αi, max Σi αi), in which the former can be computed from (M1); thus the fuzzy MSD problem with a soft criterion and soft constraints for (M1) is constructed as follows:

Maximize λ, (M~1)
Subject to:
(Σi αi − y1U)/(y1L − y1U) ≥ λ,
1 − [Ai X − (b + αi)]/d1 ≥ λ, Ai ∈ B,
1 + [Ai X − (b − αi)]/d2 ≥ λ, Ai ∈ G,
1 ≥ λ > 0,

where Ai are given, X and b are unrestricted, and αi, d1, d2 > 0 respectively. For model (M2), we similarly define the membership functions as follows:

µF2(x) = 1, if Σi βi ≥ y2U; (Σi βi − y2L)/(y2U − y2L), if y2L < Σi βi < y2U; 0, if Σi βi ≤ y2L;

µD3(x) = 1, if Ai X ≥ b − βi; 1 + (1/d3)[Ai X − (b − βi)], if b − βi − d3 < Ai X < b − βi; 0, if Ai X ≤ b − βi − d3;

µD4(x) = 1, if Ai X ≤ b + βi; 1 − (1/d4)[Ai X − (b + βi)], if b + βi < Ai X < b + βi + d4; 0, if Ai X ≥ b + βi + d4.

Then, with the definitions y2U = max Σi βi and y2L ∈ (min Σi βi, max Σi βi), a fuzzy linear program for model (M2) is built as below:

Maximize λ, (M~2)
Subject to:
(Σi βi − y2L)/(y2U − y2L) ≥ λ,
1 + [Ai X − (b − βi)]/d3 ≥ λ, Ai ∈ B,
1 − [Ai X − (b + βi)]/d4 ≥ λ, Ai ∈ G,
1 ≥ λ > 0,

where Ai are given, X and b are unrestricted, and βi, d3, d4 > 0 respectively.
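Like (M1), the fuzzy model (M~1) remains an ordinary linear program, now in the variables (X, α, λ). A minimal sketch with `scipy.optimize.linprog` follows, again fixing b = 0.5 as in Section 4; the data, tolerances d1, d2 and criterion bounds y1L, y1U are illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Same illustrative two-attribute sample as for (M1)
G = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.3]])
B = np.array([[0.1, 0.2], [0.2, 0.1], [0.15, 0.05]])
b = 0.5
d1, d2 = 1.0, 1.0            # soft-constraint tolerances (illustrative)
y1L, y1U = 0.0, 1.0          # soft-criterion bounds (illustrative)
r, n = 2, len(G) + len(B)

# z = (x_1, x_2, alpha_1..alpha_n, lam); maximize lam
c = np.zeros(r + n + 1); c[-1] = -1.0

rows, rhs = [], []
# soft criterion (Sum(alpha)-y1U)/(y1L-y1U) >= lam, rewritten as
# Sum(alpha) + lam*(y1U - y1L) <= y1U
row = np.zeros(r + n + 1); row[r:r + n] = 1.0; row[-1] = y1U - y1L
rows.append(row); rhs.append(y1U)
for i, a in enumerate(B):    # Bad: a.x - alpha_i + lam*d1 <= b + d1
    row = np.zeros(r + n + 1); row[:r] = a; row[r + i] = -1.0; row[-1] = d1
    rows.append(row); rhs.append(b + d1)
for i, a in enumerate(G):    # Good: -a.x - alpha_j + lam*d2 <= -b + d2
    j = len(B) + i
    row = np.zeros(r + n + 1); row[:r] = -a; row[r + j] = -1.0; row[-1] = d2
    rows.append(row); rhs.append(-b + d2)

bounds = [(None, None)] * r + [(0, None)] * n + [(0, 1)]
res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
x, lam = res.x[:r], res.x[-1]
print("satisfaction level lambda:", round(lam, 3))
```

For this separable sample the satisfaction level reaches its upper bound, since all soft constraints and the soft criterion can be met with zero overlap.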

In order to unify the notation of the models in this research, we use the same membership function µF1 as in model (M~1) and µF2 as in model (M~2) in place of those in model (M6), so model (M6) changes into the following form:

Maximize λ, (M~3)
Subject to:
(Σi αi − y1U)/(y1L − y1U) ≥ λ,
(Σi βi − y2L)/(y2U − y2L) ≥ λ,
Ai X = b + αi − βi, Ai ∈ G,
Ai X = b − αi + βi, Ai ∈ B,

where Ai is known, X and b are unrestricted, αi, βi, λ ≥ 0, and y1L, y1U, y2L and y2U are the same as in models (M~1) and (M~2).

To identify a fuzzy model for model (M4), we first relax model (M4)'s equality constraints to inequality constraints. Then, supposing d1 = d2 = d1* and d3 = d4 = d2*, a fuzzy model combining (M~1) and (M~2) for the relaxed (M4) is:

Maximize λ, (M~4)
Subject to:
(Σi αi − y1U)/(y1L − y1U) ≥ λ,
(Σi βi − y2L)/(y2U − y2L) ≥ λ,
1 + [Ai X − (b + αi − βi)]/d1* ≥ λ, Ai ∈ B,
1 − [Ai X − (b + αi − βi)]/d1* ≥ λ, Ai ∈ B,
1 + [Ai X − (b − αi + βi)]/d2* ≥ λ, Ai ∈ G,
1 − [Ai X − (b − αi + βi)]/d2* ≥ λ, Ai ∈ G,
1 ≥ λ > 0,

where Ai are given, X and b are unrestricted, and αi, βi > 0 respectively; di* > 0, i = 1, 2, are fixed in the computation. The definitions of y1L, y1U, y2L and y2U are the same as those in models (M~1) and (M~2) respectively.

There are two differences between model (M~4) and model (M4). First, instead of an optimal solution, we obtain a satisfying solution based on the membership functions from the fuzzy linear programming. Second, with the soft constraints added to model (M4), the boundary b can move flexibly between the upper and lower bounds by the separation distances di, i = 1, 2, 3, 4, according to the characteristics of the data.

4. Experimental Studies

Two datasets are used here to test the accuracy of the proposed fuzzy classification method with both soft criteria and constraints. The first dataset came from a major US bank and has 65 attributes, including each credit cardholder's over-limit fee, overcharge fee and other credit card usage history. There are 6000 records in the dataset in total. Here we compare the proposed FLP with both soft criteria and constraints against the MSD, MMD and MCLP models. We randomly select 1400 records, with 700 Good (non-bankrupt) and 700 Bad (bankrupt), from the dataset for training, and the remaining 4600 records are used to test the classifier accuracy, following a cross-validation procedure. In the experiment, b is set to 0.5 for all models, and d1 = d2 = d1* = 1 and d3 = d4 = d2* = 1.5 for fuzzy model (M~4). Five groups of training results are listed in Table 1 and testing results in Table 2 below. In Tables 1, 2 and 3, we define:

Absolute Accuracy Rate of Good = Sensitivity = t_Good / Good,
Absolute Accuracy Rate of Bad = Specificity = t_Bad / Bad,
Catch Rate = Accuracy = Sensitivity · Good/(Good + Bad) + Specificity · Bad/(Good + Bad),

where t_Good is the number of "Good" records that were correctly classified, Good is the total number of "Good" records, t_Bad is the number of "Bad" records that were correctly classified, and Bad is the total number of "Bad" records. In this case, catching a "Bad" cardholder is more important than catching a "Good" cardholder, in order to avoid defaults.

Tables 1 and 2 show that model (M5)-MCLP, fuzzy model (M~3)-FLP1 and the proposed model (M~4)-FLP2 are better than (M1)-MSD and (M2)-MMD. Although model (M2)-MMD is the best at catching "Bad" records, it cannot be selected because of its poor "Good" catching and its instability in the experiment. MCLP shows its "trade-off" with balanced "Good" and "Bad" accuracy rates. FLP1 works well for the overall catch rate and is a little worse than FLP2 for "Bad" catching. Thus, among MSD, MCLP and FLP, if we consider the importance of catching "Bad" cardholders while keeping a satisfactory absolute accuracy rate, fuzzy model (M~4) is a good choice. Table 3 shows that the choice of the boundary values di* in model (M~4) affects the classification results. By adjusting di*, we can obtain a satisfying classification result in the training process.
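The evaluation measures defined above can be computed with a small helper (the function name and the example counts are ours, for illustration only):

```python
def catch_rate(t_good, n_good, t_bad, n_bad):
    """Sensitivity, specificity and the class-weighted overall accuracy
    ('catch rate') as defined in the text."""
    sensitivity = t_good / n_good          # fraction of Good correctly classified
    specificity = t_bad / n_bad            # fraction of Bad correctly classified
    total = n_good + n_bad
    accuracy = (sensitivity * n_good / total) + (specificity * n_bad / total)
    return sensitivity, specificity, accuracy

# Example: 700 Good of which 540 caught, 700 Bad of which 560 caught
sens, spec, acc = catch_rate(540, 700, 560, 700)
print(round(sens, 3), round(spec, 3), round(acc, 3))
```

Note that with the class-proportion weights, the catch rate reduces to the plain fraction of all records correctly classified, (t_Good + t_Bad)/(Good + Bad).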

Table 1. Training results of 1400 records (each cell: absolute accuracy rate Good / Bad / catch rate)

| Group  | Model 1 (MSD)  | Model 2 (MMD)  | Model 5 (MCLP) | Fuzzy Model 3 (FLP1) | Fuzzy Model 4 (FLP2) |
| Group1 | 0.67/0.68/0.68 | 0.06/0.93/0.50 | 0.74/0.74/0.74 | 0.77/0.80/0.78       | 0.62/0.82/0.72       |
| Group2 | 0.70/0.70/0.70 | 0.04/0.90/0.47 | 0.79/0.79/0.79 | 0.75/0.78/0.76       | 0.68/0.83/0.75       |
| Group3 | 0.69/0.70/0.70 | 0.04/0.93/0.49 | 0.77/0.77/0.77 | 0.74/0.78/0.76       | 0.67/0.79/0.73       |
| Group4 | 0.69/0.70/0.69 | 0.06/0.91/0.48 | 0.76/0.75/0.76 | 0.75/0.79/0.77       | 0.62/0.84/0.73       |
| Group5 | 0.72/0.69/0.71 | 0.28/0.60/0.44 | 0.73/0.78/0.75 | 0.28/0.60/0.44       | 0.58/0.84/0.71       |

Table 2. Testing results of 4600 records (each cell: absolute accuracy rate Good / Bad / catch rate)

| Group  | Model 1 (MSD)  | Model 2 (MMD)  | Model 5 (MCLP) | Fuzzy Model 3 (FLP1) | Fuzzy Model 4 (FLP2) |
| Group1 | 0.70/0.77/0.70 | 0.03/0.92/0.08 | 0.75/0.74/0.75 | 0.72/0.78/0.73       | 0.61/0.84/0.62       |
| Group2 | 0.69/0.71/0.69 | 0.03/0.91/0.08 | 0.73/0.79/0.74 | 0.74/0.74/0.74       | 0.63/0.79/0.64       |
| Group3 | 0.72/0.73/0.72 | 0.03/0.90/0.08 | 0.75/0.72/0.75 | 0.76/0.75/0.76       | 0.67/0.78/0.67       |
| Group4 | 0.69/0.70/0.69 | 0.04/0.88/0.09 | 0.77/0.68/0.76 | 0.74/0.70/0.74       | 0.65/0.82/0.66       |
| Group5 | 0.70/0.68/0.70 | 0.28/0.63/0.30 | 0.75/0.72/0.75 | 0.72/0.63/0.72       | 0.61/0.78/0.62       |

Table 3. Training and testing results of 1400 records for Fuzzy Model 4 with different di

| d1 = d2 | d3 = d4 | Training Good/Bad/Catch | Testing Good/Bad/Catch |
| 1       | 3       | 0.547/0.897/0.722       | 0.538/0.896/0.558      |
| 1       | 2       | 0.574/0.863/0.719       | 0.570/0.877/0.588      |
| 1       | 1.5     | 0.619/0.823/0.721       | 0.607/0.838/0.620      |
| 1       | 1       | 0.673/0.746/0.709       | 0.670/0.777/0.676      |

The second dataset came from KDD 99. Here a connection is a sequence of TCP packets starting and ending at well-defined times, between which data flows from a source IP address to a target IP address under some well-defined protocol. Each connection is labeled as either normal or an attack; "dos" is one specific attack type. For this task we select the 38 attributes needed. There are 1,060,078 records in the dataset used in this example: 812,812 "Normal" records and 247,266 "Dos" records. First, 4000 records were selected randomly from the dataset for training, 2000 labeled "Normal" and 2000 labeled "Dos". Second, the remaining records, 810,812 "Normal" and 245,266 "Dos", were used for testing. Tables 4 and 5 show the training and testing results.
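The sampling procedure just described (a balanced random training sample per class, with the remainder held out for testing) can be sketched as follows. The arrays here are tiny stand-ins: the real dataset has 38 attributes and over a million records, and the helper name is ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_split(normal, dos, n_per_class=2000):
    """Randomly draw n_per_class records from each class for training;
    the remaining records of each class form the test set."""
    idx_n = rng.permutation(len(normal))
    idx_d = rng.permutation(len(dos))
    train = np.vstack([normal[idx_n[:n_per_class]], dos[idx_d[:n_per_class]]])
    labels = np.array([0] * n_per_class + [1] * n_per_class)  # 0=Normal, 1=Dos
    test_normal = normal[idx_n[n_per_class:]]
    test_dos = dos[idx_d[n_per_class:]]
    return train, labels, test_normal, test_dos

# Tiny illustrative stand-in data
normal = rng.normal(size=(5000, 3))
dos = rng.normal(loc=2.0, size=(5000, 3))
train, labels, test_normal, test_dos = sample_split(normal, dos, n_per_class=2000)
print(train.shape, test_normal.shape, test_dos.shape)
```

With the paper's full dataset this would yield a 4000-record balanced training set and 810,812 + 245,266 held-out test records.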

Table 4. Training results of 4000 records (each cell: absolute accuracy rate Normal / Dos / catch rate)

| Group   | Model 1 (MSD)     | Model 2 (MMD)     | Model 5 (MCLP)    | Fuzzy Model 4 (FLP2) |
| Group1  | 0.989/0.997/0.993 | 0.508/0.919/0.713 | 0.998/0.993/0.995 | 0.989/0.997/0.993    |
| Group2  | 0.991/0.995/0.993 | 0.269/0.972/0.621 | 0.992/0.998/0.995 | 0.990/0.996/0.993    |
| Group3  | 0.987/0.998/0.992 | 0.232/0.992/0.612 | 0.993/0.998/0.995 | 0.987/0.997/0.992    |
| Group4  | 0.989/0.997/0.993 | 0.263/0.982/0.622 | 0.994/0.997/0.995 | 0.990/1.000/0.995    |
| Average | 0.989/0.997/0.993 | 0.318/0.966/0.642 | 0.994/0.997/0.995 | 0.989/0.998/0.993    |

Table 5. Testing results of the remaining records (each cell: absolute accuracy rate Normal / Dos / catch rate)

| Group   | Model 1 (MSD)     | Model 2 (MMD)     | Model 5 (MCLP)    | Fuzzy Model 4 (FLP2) |
| Group1  | 0.918/0.990/0.935 | 0.499/0.916/0.595 | 0.975/0.983/0.977 | 0.953/0.988/0.961    |
| Group2  | 0.971/0.986/0.975 | 0.323/0.980/0.476 | 0.968/0.989/0.973 | 0.959/0.988/0.965    |
| Group3  | 0.925/0.989/0.940 | 0.254/0.988/0.424 | 0.930/0.987/0.943 | 0.966/0.988/0.972    |
| Group4  | 0.914/0.989/0.931 | 0.290/0.982/0.451 | 0.963/0.985/0.968 | 0.954/0.988/0.962    |
| Average | 0.932/0.989/0.945 | 0.342/0.967/0.487 | 0.959/0.986/0.965 | 0.958/0.988/0.965    |

In Tables 4 and 5, we use:

Absolute Accuracy Rate of Normal = Sensitivity = t_Normal / Normal,
Absolute Accuracy Rate of Dos = Specificity = t_Dos / Dos,
Catch Rate = Accuracy = Sensitivity · Normal/(Normal + Dos) + Specificity · Dos/(Normal + Dos),

where t_Normal is the number of "Normal" records that were correctly classified, Normal is the total number of "Normal" records, t_Dos is the number of "Dos" records that were correctly classified, and Dos is the total number of "Dos" records. In this experimental study, MMD shows the same character as in the credit cardholder analysis. Since the comparison is not very clear from the separate group training and testing results, we compute the average values to analyze the classification efficiency. The averages show that MCLP and FLP2 achieve better catch rates in testing. MSD works well for "Dos" catching, and fuzzy model (M~4)-FLP2 does only a little worse. In this paper, we compared the proposed FLP classification method only with MMD, MSD and MCLP on two real-life datasets. The reader can find previous comparisons between MCLP, FLP with soft criteria, decision trees and neural networks in (Kou, 2003b; He, 2004c; Shi, 2002b). Thus, we shall not elaborate on the comparison of this FLP method with other classification methods.

5. Remarks

In this paper, we have proposed a fuzzy linear programming (FLP) classification method with both soft criteria and soft constraints based on previous researchers' work. The relationship between this model and other related models was discussed. Two real-life datasets, one from a major US bank and the other from KDD 99, have been used to evaluate the classification accuracy. The results show the feasibility of this method. Moreover, the general framework of FLP for classification has been systematically described and evaluated for the first time. However, there is new research to be considered and continued along this line. For example, how does the value of di affect the classification result? How can we use ensemble analysis to improve the selection of the best classifier? We shall report significant results of these ongoing projects in the near future.

References

A. Charnes and W.W. Cooper (1961), Management Models and Industrial Applications of Linear Programming, New York: Wiley.
D. Dubois and H. Prade (1980), Fuzzy Sets and Systems: Theory and Applications, New York: Academic Press, 242-248.
D. Olson and Y. Shi (2005), Introduction to Business Data Mining, McGraw-Hill/Irwin.
R.A. Fisher (1936), "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics, 7, 179-188.
G. Kou, X. Liu, Y. Peng, Y. Shi, M. Wise and W. Xu (2003), "Multiple Criteria Linear Programming Approach to Data Mining: Models, Algorithm Designs and Software Development", Optimization Methods and Software, 18 (4), 453-473.
H.J. Zimmermann (1978), "Fuzzy Programming and Linear Programming with Several Objective Functions", Fuzzy Sets and Systems, 1, 45-55.
J. Han and M. Kamber (2001), Data Mining: Concepts and Techniques, Beijing: Academic Press.
F. Glover (1990), "Improved Linear Programming Models for Discriminant Analysis", Decision Sciences, 21, 771-785.
J. He, X. Liu, Y. Shi, W. Xu and N. Yan (2004), "Classifications of Credit Cardholder Behavior by Using Fuzzy Linear Programming", International Journal of Information Technology and Decision Making, 3, 633-650.
J.R. Quinlan (1986), "Induction of Decision Trees", Machine Learning, 1, 81-106.
N. Freed and F. Glover (1981), "Simple but Powerful Goal Programming Models for Discriminant Problems", European Journal of Operational Research, 7, 44-60.
N. Freed and F. Glover (1986), "Evaluating Alternative Linear Programming Models to Solve the Two-group Discriminant Problem", Decision Science, 17, 151-162.
P.H. Lindsay and D.A. Norman (1972), Human Information Processing: An Introduction to Psychology, New York: Academic Press.
P.L. Yu (1985), Multiple Criteria Decision Making: Concepts, Techniques and Extensions, New York: Plenum Press.
Y. Lin (2002), "Improvement on Behavior Scores by Dual-model Scoring System", International Journal of Information Technology and Decision Making, 1, 153-164.
Y. Shi, M. Wise, M. Luo and Y. Lin (2001), "Data Mining in Credit Card Portfolio Management: A Multiple Criteria Decision Making Approach", Multiple Criteria Decision Making in the New Millennium, Berlin: Springer, 427-436.
Y. Shi, Y. Peng, X. Xu and X. Tang (2002), "Data Mining via Multiple Criteria Linear Programming: Applications in Credit Card Portfolio Management", International Journal of Information Technology and Decision Making, 1 (1), 145-166.
V. Vapnik (1998), Statistical Learning Theory, New York: Wiley.
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html