Classification with Degree of Membership: A Fuzzy Approach

Wai-Ho Au and Keith C.C. Chan
Department of Computing
The Hong Kong Polytechnic University
Hung Hom, Kowloon, Hong Kong
E-mail: {cswhau, cskcchan}@comp.polyu.edu.hk

Abstract

Classification is an important topic in data mining research. It is concerned with the prediction of the values of some attribute in a database based on other attributes. To tackle this problem, most existing data mining algorithms adopt either a decision tree based approach or an approach that requires users to provide thresholds to guide the search for interesting rules. In this paper, we propose a new approach based on the use of an objective interestingness measure to distinguish interesting rules from uninteresting ones. Because it uses linguistic terms to represent the revealed regularities and exceptions, this approach is especially useful when the discovered rules are presented to human experts for examination, owing to the affinity of linguistic terms with human knowledge representation. The use of fuzzy techniques allows the prediction of attribute values to be associated with degrees of membership. Our approach is therefore able to deal with cases in which an object belongs to more than one class; for example, a person can suffer from a cold and a fever, each to a certain extent, at the same time. Furthermore, the use of fuzzy techniques makes our approach more resilient to noise and missing data values. To evaluate the performance of our approach, we tested it using several real-life databases. The experimental results show that it can be very effective at data mining tasks and that, compared to popular data mining algorithms, it is better able to uncover useful rules hidden in databases.

1. Introduction

Classification is an important topic in data mining research [2, 9-12, 24, 26]. The problem is concerned with the mining of a set of production rules that allow the values of an attribute in a database to be accurately predicted based on those of other attributes [1-2, 16, 19, 22, 24]. For example, given a customer database with each record characterized by such attributes as income, car-owned, and plan-subscribed, a rule that could be discovered is "90% of high-income customers who own a Jeep are subscribers of Plan B; 3% of all customers have both characteristics." The discovery of such a rule could be important to a marketing manager who may, as a result, concentrate on promoting Plan B among high-income Jeep owners.

For data mining to be effective, an algorithm should be able to handle linguistic or fuzzy variables, because the ability to do so allows some interesting patterns to be more easily discovered and expressed. For example, if crisp boundaries are defined for "high-income" in the above rule, the rule may turn out not to be interesting at all, since the support and confidence measures depend to a large extent on how the boundaries are defined. Despite this, many data mining algorithms (e.g., [1-4, 16, 18-19, 22, 24-25]) were not developed to handle fuzzy data or fuzzy rules. They deal mainly with categorical and quantitative attributes. In particular, when dealing with quantitative attributes, their domains are usually divided into equal-width or equal-frequency intervals. In most cases, the resulting intervals are not very meaningful and are hard to understand.

To deal with fuzzy data and fuzzy rules, we present in this paper a new approach that employs linguistic terms to represent the regularities and exceptions discovered. These linguistic terms are defined as fuzzy sets so that, based on their membership functions, both categorical and quantitative data can be transformed by fuzzification. To discover fuzzy rules from the fuzzified data, this approach utilizes the idea of residual analysis [5-8]. With it, our approach is able to reveal interesting associations hidden in a database without requiring users to supply subjective thresholds. In other words, unlike many data mining algorithms (e.g., [1-4, 16, 18-19, 22, 24-25]) that only discover rules whose consequents consist of categorical or discretized crisp-boundary quantitative attributes, our approach is able to discover rules whose consequents are composed of linguistic terms. This allows the prediction of attribute values to be associated with degrees of membership. Consequently, our approach is able to deal with cases in which an object belongs to more than one class; for example, a person can suffer from a cold and a fever, each to a certain extent, at the same time. Furthermore, the use of linguistic terms to represent the discovered rules also allows quantitative values to be inferred.

The rest of this paper is organized as follows. In Section 2, we provide a brief description of how existing algorithms can be used for classification and how fuzzy techniques can be applied to the data mining process. The details of our approach are given in Section 3. To evaluate its performance, we applied it to several real-life databases; the results of these experiments are discussed in Section 4. Finally, in Section 5, we provide a summary of the paper.

2. Related work

Among the different approaches to solving the classification problem, decision tree based algorithms (e.g., [1-2, 19, 21-22]) are the most popular. Other than decision tree based algorithms, techniques that have been developed to mine association rules can also be used for classification (e.g., [16]). It is important to note that the intervals involved in quantitative association rules may not be concise and meaningful enough for human experts to obtain nontrivial knowledge from them. Linguistic summaries, introduced in [27], express knowledge in a linguistic representation, which is natural for people to comprehend. In addition to linguistic summaries, the applicability of fuzzy modeling techniques to data mining has been discussed in [13]. Furthermore, an information-theoretic fuzzy approach has been proposed in [17] to discover unreliable data in databases. Nevertheless, these fuzzy techniques were not developed for classification.

An approach that combines symbolic decision trees with the approximate reasoning offered by fuzzy representation has been proposed in [14] for building fuzzy decision trees. Based on a set of predefined fuzzy linguistic variables, [14] presents a method for constructing fuzzy decision trees together with a number of inference procedures based on conflict resolution in rule-based systems and efficient approximate reasoning methods. Given a database, this approach can be used to build a fuzzy decision tree, and the resulting tree can then be used for inference. An empirical comparison of our approach with C4.5 [21] (a decision tree based approach), CBA [16] (an association rule mining approach), and FID [14] (a fuzzy decision tree approach) on several real-life databases is given in Section 4.

3. A fuzzy approach for data mining

Our approach is capable of mining fuzzy rules in large databases without any need for user-specified thresholds or for mapping quantitative attributes into binary attributes. A fuzzy rule describes an interesting relationship between two or more linguistic terms. The definition of linguistic terms is presented in Section 3.1. The details of the approach are then given in Section 3.2. In Section 3.3, we describe how interesting fuzzy rules can be identified. A confidence measure, called the weight of evidence measure [5-8], is then defined in Section 3.4 to provide a means of representing the uncertainty associated with the fuzzy rules. In Section 3.5, we describe how to predict unknown values using the discovered fuzzy rules.

3.1. Linguistic terms

Consider a set of records, D, each of which consists of a set of attributes J = {I_1, ..., I_n}, where I_v, v = 1, ..., n, can be quantitative or categorical. For any record, d ∈ D, d[I_v] denotes the value i_v in d for attribute I_v. For any quantitative attribute, I_v ∈ J, let dom(I_v) = [l_v, u_v] ⊆ ℜ denote the domain of the attribute. Based on fuzzy set theory, a set of linguistic terms can be defined over the domain of each quantitative attribute. Let us therefore denote the linguistic terms associated with some quantitative attribute, I_v ∈ J, as L_vr, r = 1, ..., s_v, so that a corresponding fuzzy set, L_vr, can be defined for each L_vr. The membership function of the fuzzy set is denoted µ_{L_vr} and is defined as:

$$ \mu_{L_{vr}}: dom(I_v) \rightarrow [0, 1] $$

The fuzzy sets L_vr, r = 1, ..., s_v, are then defined as:

$$ L_{vr} = \begin{cases} \sum_{i_v \in dom(I_v)} \mu_{L_{vr}}(i_v) / i_v & \text{if } I_v \text{ is discrete} \\[4pt] \int_{dom(I_v)} \mu_{L_{vr}}(i_v) / i_v & \text{if } I_v \text{ is continuous} \end{cases} $$

for all i_v ∈ dom(I_v). The degree of membership of some value i_v ∈ dom(I_v) with some linguistic term L_vr is given by µ_{L_vr}(i_v).

Note that I_v ∈ J can also be categorical and crisp. In such a case, let dom(I_v) = {i_v1, ..., i_vm_v} denote the domain of I_v. To handle categorical and quantitative attributes in a uniform manner, we can also define a set of linguistic terms, L_vr, r = 1, ..., m_v, for each categorical attribute, I_v ∈ J, where L_vr is represented by a fuzzy set, L_vr, such that

$$ L_{vr} = 1 / i_{vr} $$

Using the above technique, we can represent the original attributes, J, using a set of linguistic terms, L = {L_vr | v = 1, ..., n, r = 1, ..., s_v}, where s_v = m_v for categorical attributes. Since each linguistic term is represented by a fuzzy set, we have a set of fuzzy sets, L = {L_vr | v = 1, ..., n, r = 1, ..., s_v}. Given a record, d ∈ D, and a linguistic term, L_vr ∈ L, which is, in turn, represented by a fuzzy set, L_vr ∈ L, the degree of membership of the values in d with respect to L_vr is given by µ_{L_vr}(d[I_v]). In other words, d is characterized by the term L_vr to the degree µ_{L_vr}(d[I_v]). If µ_{L_vr}(d[I_v]) = 1, d is completely characterized by the term L_vr. If µ_{L_vr}(d[I_v]) = 0, d is not characterized by the term L_vr at all. If 0 < µ_{L_vr}(d[I_v]) < 1, d is partially characterized by the term L_vr.

Realistically, d can also be characterized by more than one linguistic term. Let ϕ be a subset of integers such that ϕ = {v_1, ..., v_m}, where v_1, ..., v_m ∈ {1, ..., n}, v_1 ≠ ... ≠ v_m, and |ϕ| = h ≥ 1. We further suppose that J_ϕ = {I_v | v ∈ ϕ}. Any such J_ϕ is associated with a set of linguistic terms, L_ϕr, r = 1, ..., s_ϕ, where s_ϕ = ∏_{v∈ϕ} s_v. Each L_ϕr is defined by a set of linguistic terms, L_{v_1 r_1}, ..., L_{v_m r_m} ∈ L. The degree, λ_{L_ϕr}(d), to which d is characterized by the term L_ϕr is defined as:

$$ \lambda_{L_{\varphi r}}(d) = \min(\mu_{L_{v_1 r_1}}(d[I_{v_1}]), ..., \mu_{L_{v_m r_m}}(d[I_{v_m}])) \quad (1) $$

D can then be represented by a set of fuzzy data, F, which is characterized by a set of linguistic attributes, L = {L_1, ..., L_n}. For any linguistic attribute, L_v ∈ L, the value of L_v in a record, t ∈ F, is a set of ordered pairs such that t[L_v] = {(L_v1, µ_v1), ..., (L_vs_v, µ_vs_v)}, where L_vk and µ_vk, k ∈ {1, ..., s_v}, are a linguistic term and its degree of membership, respectively.

For any record, t ∈ F, let o_{L_pq L_ϕk} be the degree to which t is characterized by the linguistic terms L_pq and L_ϕk, p ∉ ϕ. o_{L_pq L_ϕk} is defined as:

$$ o_{L_{pq} L_{\varphi k}} = \min(\mu_{L_{pq}}, \mu_{L_{\varphi k}}) $$

We further suppose that deg_{L_pq L_ϕk} is the sum of the degrees to which records in F are characterized by the linguistic terms L_pq and L_ϕk. deg_{L_pq L_ϕk} is given by:

$$ \deg_{L_{pq} L_{\varphi k}} = \sum_{t \in F} o_{L_{pq} L_{\varphi k}} \quad (2) $$
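To make these two definitions concrete, the following Python sketch computes λ_{L_ϕr}(d) as the minimum of the component memberships (Equation 1) and accumulates deg_{L_pq L_ϕk} over a set of records (Equation 2). The data layout and helper names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of Equations (1) and (2), not the authors' code.
# `membership` maps attribute -> {linguistic term -> membership function};
# a record maps attribute -> raw value. All names here are illustrative.

def compatibility(record, composite_term, membership):
    """lambda_{L_phi_r}(d), Eq. (1): the degree to which a record is
    characterized by a composite term is the minimum of the memberships
    of its component linguistic terms."""
    return min(membership[attr][term](record[attr])
               for attr, term in composite_term)

def total_degree(records, membership, consequent, composite_term):
    """deg_{L_pq L_phi_k}, Eq. (2): sum over all records of the degree
    to which each record is characterized by both terms."""
    attr_p, term_q = consequent
    return sum(min(membership[attr_p][term_q](rec[attr_p]),
                   compatibility(rec, composite_term, membership))
               for rec in records)
```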

Based on the linguistic terms, we can apply our approach to mine fuzzy rules from the fuzzy data and present them to human users in a way that is much easier to understand. Because the fuzzy technique blurs the boundaries between adjacent intervals of numeric quantities, our approach is resilient to noise such as inaccuracies in physical measurements of real-life entities.

3.1.1. An illustrative example. In this section, we illustrate how a relation in a relational database can be transformed into a fuzzy relation based on the linguistic terms. Let us consider the sample relation shown in Figure 1.

Age   MaritalStatus   Salary
23    U               40,000
29    M               43,000
33    M               55,000
35    U               64,000
55    M               62,000

Figure 1. A sample relation.

Let us further suppose that the MaritalStatus attribute, which is a categorical attribute, is represented by two linguistic terms defined as:

Unmarried = 1/U and Married = 1/M

The remaining two quantitative attributes, Age and Salary, are represented by the linguistic terms given in Figure 2. Based on these linguistic terms, the sample relation is transformed into the fuzzy relation shown in Figure 3. Instead of mining interesting rules from the original relation, we perform data mining on the resulting fuzzy relation.

[Figure 2 (plots not reproduced): the membership functions of the linguistic terms. Panel (a) defines Young, Middle Aged, and Old over the Age range 0-80; panel (b) defines Low, Medium, and High over the Salary range 0-100,000. Degree of membership runs from 0 to 1 in both panels.]

Figure 2. The definitions of linguistic terms.

Age                                    MaritalStatus      Salary
{(Young, 0.85), (Middle Aged, 0.15)}   {(Unmarried, 1)}   {(Low, 0.5), (Medium, 0.5)}
{(Young, 0.55), (Middle Aged, 0.45)}   {(Married, 1)}     {(Low, 0.35), (Medium, 0.65)}
{(Young, 0.35), (Middle Aged, 0.65)}   {(Married, 1)}     {(Medium, 0.75), (High, 0.25)}
{(Young, 0.25), (Middle Aged, 0.75)}   {(Unmarried, 1)}   {(Medium, 0.3), (High, 0.7)}
{(Middle Aged, 0.25), (Old, 0.75)}     {(Married, 1)}     {(Medium, 0.4), (High, 0.6)}

Figure 3. The resulting fuzzy relation.
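As a concrete check, the following Python sketch encodes piecewise-linear membership functions for the Age attribute. The breakpoints (20, 40, 60) are our reading of Figure 2, not values stated in the paper; under that assumption the code reproduces the Age column of Figure 3.

```python
# A sketch of the fuzzification step illustrated in Figures 1-3. The
# breakpoints of the membership functions are read off Figure 2 and
# are therefore an assumption, not values given in the paper.

def interp(x, points):
    """Piecewise-linear membership function through the given
    (value, degree) breakpoints; flat outside the covered range."""
    if x <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return points[-1][1]

age_terms = {
    "Young":       lambda a: interp(a, [(20, 1.0), (40, 0.0)]),
    "Middle Aged": lambda a: interp(a, [(20, 0.0), (40, 1.0), (60, 0.0)]),
    "Old":         lambda a: interp(a, [(40, 0.0), (60, 1.0)]),
}

for age in [23, 29, 33, 35, 55]:
    degrees = {t: round(f(age), 2) for t, f in age_terms.items() if f(age) > 0}
    print(age, degrees)   # e.g. 23 -> {'Young': 0.85, 'Middle Aged': 0.15}
```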

3.2. The fuzzy data mining algorithm

It is important to note that a fuzzy rule can be of different orders. A first-order fuzzy rule is defined to be a rule involving one linguistic term in its antecedent; a second-order rule has two; a third-order rule has three linguistic terms, and so on. Our approach is given in Figure 4 below. To mine interesting first-order rules, it makes use of an objective interestingness measure introduced in Section 3.3 below. After these rules are discovered, they are stored in R1 (Figure 4). Rules in R1 are then used to generate second-order rules, which are stored in R2. R2 is then used to generate third-order rules that are stored in R3, and so on for the fourth and higher orders. Our approach iterates until no higher-order rule can be found. The function interesting(Lpq, Lϕk) computes an objective measure to determine whether the relationship between Lpq and Lϕk is interesting. If interesting(Lpq, Lϕk) returns true, a fuzzy rule is then generated by the rulegen function. For each rule generated, this function also returns an uncertainty measure associated with the rule (see Section 3.4). All fuzzy rules generated by rulegen are stored in R, which is later used for prediction or for the users to examine.

R1 = {first-order fuzzy rules};
for (m = 2; |Rm-1| ≠ ∅; m++) do begin
    C = {each condition in the antecedent of r | r ∈ Rm-1};
    forall ϕ composed of m elements in C do begin
        forall t ∈ F do
            forall (Lpq, µpq) ∈ t[Lp], (Lϕk, µϕk) ∈ t[Lϕ], p ∉ ϕ do
                deg_LpqLϕk += min(µpq, µϕk);
        forall (Lpq, µpq) ∈ t[Lp], (Lϕk, µϕk) ∈ t[Lϕ], p ∉ ϕ do
            if interesting(Lpq, Lϕk) then
                Rm = Rm ∪ rulegen(Lpq, Lϕk);
    end
end
R = ∪m Rm;

Figure 4. The fuzzy data mining algorithm.
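The following Python skeleton mirrors the control flow of Figure 4. It is a sketch under stated assumptions, not the authors' implementation: a rule is assumed to be a pair (antecedent, consequent) whose antecedent is a tuple of (attribute, term) conditions, and interesting() and rulegen() stand in for the procedures of Sections 3.3 and 3.4.

```python
from itertools import combinations

# A sketch of the control flow of Figure 4. Fuzzy records map each
# attribute to a dict {linguistic term: degree}, as in Figure 3.

def mine_fuzzy_rules(fuzzy_records, attributes, first_order_rules,
                     interesting, rulegen):
    levels = [set(first_order_rules)]                    # R1
    m = 2
    while levels[-1]:                                    # until R_{m-1} is empty
        conds = {c for ant, _ in levels[-1] for c in ant}
        r_m = set()
        for phi in combinations(sorted(conds), m):       # candidate antecedents
            phi_attrs = {attr for attr, _ in phi}
            if len(phi_attrs) < m:                       # one term per attribute
                continue
            deg = {}                                     # deg_{L_pq L_phi_k}
            for t in fuzzy_records:
                mu_phi = min(t[a].get(term, 0.0) for a, term in phi)
                for p in attributes:
                    if p in phi_attrs:                   # consequent p not in phi
                        continue
                    for term_q, mu_pq in t[p].items():
                        key = ((p, term_q), phi)
                        deg[key] = deg.get(key, 0.0) + min(mu_pq, mu_phi)
            for (pq, ant) in deg:
                if interesting(pq, ant, deg):            # objective test, Sec. 3.3
                    r_m.add(rulegen(pq, ant, deg))       # rule with weight, Sec. 3.4
        levels.append(r_m)
        m += 1
    return set().union(*levels)                          # R = union of all R_m
```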

3.3. Discovering interesting rules in fuzzy data

To decide whether the relationship between a linguistic term, Lϕk, and another linguistic term, Lpq, is interesting, we determine whether

$$ \Pr(L_{pq} \mid L_{\varphi k}) = \frac{\text{sum of degrees to which records are characterized by both } L_{\varphi k} \text{ and } L_{pq}}{\text{sum of degrees to which records are characterized by } L_{\varphi k}} \quad (3) $$

is significantly different from

$$ \Pr(L_{pq}) = \frac{\text{sum of degrees to which records are characterized by } L_{pq}}{M} \quad (4) $$

where M = Σ_{u=1}^{s_p} Σ_{i=1}^{s_ϕ} deg_{L_pu L_ϕi}. If this is the case, we consider the relationship between Lϕk and Lpq interesting. The significance of the difference can be evaluated objectively based on the idea of an adjusted residual [5-8], defined as:

$$ d_{L_{pq} L_{\varphi k}} = \frac{z_{L_{pq} L_{\varphi k}}}{\sqrt{\gamma_{L_{pq} L_{\varphi k}}}} \quad (5) $$

where z_{L_pq L_ϕk} is the standardized residual [5-8], given by:

$$ z_{L_{pq} L_{\varphi k}} = \frac{\deg_{L_{pq} L_{\varphi k}} - e_{L_{pq} L_{\varphi k}}}{\sqrt{e_{L_{pq} L_{\varphi k}}}} \quad (6) $$

where e_{L_pq L_ϕk} is the sum of degrees to which records are expected to be characterized by Lϕk and Lpq. It is defined as:

$$ e_{L_{pq} L_{\varphi k}} = \frac{\sum_{i=1}^{s_\varphi} \deg_{L_{pq} L_{\varphi i}} \cdot \sum_{u=1}^{s_p} \deg_{L_{pu} L_{\varphi k}}}{M} \quad (7) $$

and γ_{L_pq L_ϕk} is the maximum likelihood estimate [5-8] of the variance of z_{L_pq L_ϕk}, given by:

$$ \gamma_{L_{pq} L_{\varphi k}} = \left(1 - \frac{\sum_{i=1}^{s_\varphi} \deg_{L_{pq} L_{\varphi i}}}{M}\right)\left(1 - \frac{\sum_{u=1}^{s_p} \deg_{L_{pu} L_{\varphi k}}}{M}\right) \quad (8) $$

If d_{L_pq L_ϕk} > 1.96 (the critical value of the standard normal distribution at the 95% confidence level), we can conclude that the discrepancy between Pr(Lpq | Lϕk) and Pr(Lpq) is significant and hence that the relationship between Lϕk and Lpq is interesting. Specifically, the presence of Lϕk implies the presence of Lpq; in other words, a record having Lϕk is more likely to also have Lpq.
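Equations (3)-(8) reduce to a few lines of arithmetic. The following Python sketch computes the adjusted residual and the interestingness test under an assumed nested-dict layout for deg; it is our reading of the formulas, not the authors' code.

```python
import math

# A sketch of Equations (5)-(8). `deg` is assumed to be a nested dict,
# deg[q][k] = deg_{L_pq L_phi_k}, over all terms q of I_p and k of J_phi.

def adjusted_residual(deg, q, k):
    m_total = sum(sum(row.values()) for row in deg.values())  # M
    row = sum(deg[q].values())           # sum_i deg_{L_pq L_phi_i}
    col = sum(deg[u][k] for u in deg)    # sum_u deg_{L_pu L_phi_k}
    e = row * col / m_total              # expected degree, Eq. (7)
    z = (deg[q][k] - e) / math.sqrt(e)   # standardized residual, Eq. (6)
    gamma = (1 - row / m_total) * (1 - col / m_total)  # variance est., Eq. (8)
    return z / math.sqrt(gamma)          # adjusted residual, Eq. (5)

def interesting(deg, q, k):
    """A positive association is interesting when d > 1.96, i.e.
    significant at the 95% confidence level."""
    return adjusted_residual(deg, q, k) > 1.96
```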

3.4. Uncertainty representation

Given that a linguistic term Lϕk is associated with another linguistic term Lpq, we can form the following fuzzy rule:

Lϕk → Lpq [w_{L_pq L_ϕk}]

where w_{L_pq L_ϕk} is the weight of evidence measure, defined as follows. Since the relationship between Lϕk and Lpq is interesting, there is some evidence for a record to be characterized by Lpq given that it has Lϕk. The weight of evidence measure is defined in terms of an information-theoretic measure known as mutual information. Mutual information measures the change in uncertainty about the presence of Lpq in a record given that it has Lϕk, and is defined as:

$$ I(L_{pq} : L_{\varphi k}) = \log \frac{\Pr(L_{pq} \mid L_{\varphi k})}{\Pr(L_{pq})} \quad (9) $$

Based on mutual information, the weight of evidence measure is defined in [5-8] as:

$$ w_{L_{pq} L_{\varphi k}} = I(L_{pq} : L_{\varphi k}) - I\Big(\bigvee_{i \neq q} L_{pi} : L_{\varphi k}\Big) = \log \frac{\Pr(L_{\varphi k} \mid L_{pq})}{\Pr(L_{\varphi k} \mid \bigvee_{i \neq q} L_{pi})} \quad (10) $$

w_{L_pq L_ϕk} can be interpreted intuitively as measuring the difference in the gain in information when a record with Lϕk is characterized by Lpq and when it is characterized by some other Lpi, i ≠ q. The weight of evidence measure can be used to weigh the significance or importance of fuzzy rules. Given that Lϕk is defined by a set of linguistic terms, L_{v_1 k_1}, ..., L_{v_m k_m} ∈ L, we have a high-order fuzzy rule as follows:

L_{v_1 k_1}, ..., L_{v_m k_m} → Lpq [w_{L_pq L_ϕk}]

where v_1, ..., v_m ∈ ϕ.
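The following Python sketch evaluates Equation (10) under the same deg[q][k] layout as the residual sketch above. The degree-based estimate of Pr(Lϕk | ∨_{i≠q} Lpi) used here is our inference from Equations (3) and (4), not a formula stated explicitly in the paper.

```python
import math

# A sketch of Equation (10); the conditional probabilities are
# degree-based estimates in the spirit of Equations (3) and (4).

def weight_of_evidence(deg, q, k):
    row_q = sum(deg[q].values())                  # degree of L_pq
    col_k = sum(deg[u][k] for u in deg)           # degree of L_phi_k
    rest = sum(sum(deg[u].values()) for u in deg if u != q)
    pr_k_given_q = deg[q][k] / row_q              # Pr(L_phi_k | L_pq)
    pr_k_given_rest = (col_k - deg[q][k]) / rest  # Pr(L_phi_k | L_pi, i != q)
    return math.log(pr_k_given_q / pr_k_given_rest)
```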

3.5. Predicting unknown values using fuzzy rules

Given a record, d ∈ dom(I_1) × ... × dom(I_p) × ... × dom(I_n), let d be characterized by n attribute values, α_1, ..., α_p, ..., α_n, where α_p is the value to be predicted. Let L_pq, q = 1, ..., s_p, be the linguistic terms corresponding to the class attribute, I_p. We further let l_p be a linguistic variable with domain dom(l_p) = {L_p1, ..., L_ps_p}. The value of α_p is given by the value of l_p. To predict the correct value of l_p, our approach searches the fuzzy rules with L_pq ∈ dom(l_p) as consequents. Any combination of attribute values, α_ϕ, p ∉ ϕ, of d is characterized by a linguistic term, L_ϕk, to a degree of compatibility, λ_{L_ϕk}(d), for each k ∈ {1, ..., s_ϕ}. Given the rules implying the assignment of L_pq, that is, Lϕk → Lpq [w_{L_pq L_ϕk}] for all k ∈ ζ ⊆ {1, ..., s_ϕ}, the evidence for this assignment is given by:

$$ w_{L_{pq} \alpha_\varphi} = \sum_{k \in \zeta} w_{L_{pq} L_{\varphi k}} \cdot \lambda_{L_{\varphi k}}(d) \quad (11) $$

Suppose that, of the n − 1 attribute values excluding α_p, only some combinations of them, α_[1], ..., α_[j], ..., α_[β] with α_[j] ⊆ {α_i | i ∈ {1, ..., n} − {p}}, are found to match one or more rules; then the overall weight of evidence for the value of l_p to be assigned to L_pq is given by:

$$ w_q = \sum_{j=1}^{\beta} w_{L_{pq} \alpha_{[j]}} \quad (12) $$

As a result, the value of α_p is given by {(L_p1, w_1), ..., (L_pq, w_q), ..., (L_ps_p, w_{s_p})}. When a crisp value is to be assigned to α_p, the following methods are used depending on whether I_p is categorical or quantitative.

If I_p is categorical, l_p is assigned L_pc if

$$ w_c > w_g, \quad g = 1, ..., s'_p, \; g \neq c \quad (13) $$

where s'_p (≤ s_p) is the number of linguistic terms implied by the rules. α_p is therefore assigned the value i_pc ∈ dom(I_p).

If I_p is quantitative, a new method is used to assign an appropriate value to α_p. Given the linguistic terms, L_p1, ..., L_ps_p, and their overall weights of evidence, w_1, ..., w_{s_p}, let µ'_{L_pu}(i_p) be the weighted degree of membership of i_p ∈ dom(I_p) in the fuzzy set L_pu, u ∈ {1, ..., s_p}. µ'_{L_pu}(i_p) is given by:

$$ \mu'_{L_{pu}}(i_p) = w_u \cdot \mu_{L_{pu}}(i_p) \quad (14) $$

where i_p ∈ dom(I_p) and u = 1, ..., s_p. The defuzzified value, F^{-1}(∪_{u=1}^{s_p} L_pu), which provides an appropriate value for α_p, is then defined as:

$$ F^{-1}\Big(\bigcup_{u=1}^{s_p} L_{pu}\Big) = \frac{\int_{dom(I_p)} \mu'_{L_{p1} \cup ... \cup L_{ps_p}}(i_p) \cdot i_p \, di_p}{\int_{dom(I_p)} \mu'_{L_{p1} \cup ... \cup L_{ps_p}}(i_p) \, di_p} \quad (15) $$

where µ'_{X∪Y}(i) = max(µ'_X(i), µ'_Y(i)) for any fuzzy sets X and Y.

For quantitative predictions, we use the root-mean-squared error as a performance measure. Given a set of test records, D, let n be the number of records in D. For any record, r ∈ D, let [l, u] ⊂ ℜ denote the domain of the class attribute. We further let t_r be the target value of the class attribute in r and o_r be the value predicted by our approach. The root-mean-squared error, rms, is defined as:

$$ rms = \sqrt{\frac{1}{n} \sum_{r \in D} \left(\frac{t_r - l}{u - l} - \frac{o_r - l}{u - l}\right)^2} \quad (16) $$
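The two prediction modes above fit in a few lines. The following Python sketch shows the categorical assignment of Equation (13) and a numeric centroid defuzzification of Equations (14)-(15); the data layout and the integration grid are illustrative assumptions, not the authors' implementation.

```python
# A sketch of Equations (13)-(15). `weights` maps each linguistic term
# of the class attribute to its overall weight of evidence w_u (Eq. 12);
# `terms` maps each term to (w_u, membership function).

def predict_categorical(weights):
    """Eq. (13): assign the linguistic term with the largest overall
    weight of evidence."""
    return max(weights, key=weights.get)

def predict_quantitative(terms, lo, hi, steps=1000):
    """Eqs. (14)-(15): weight each membership function by w_u, take the
    max-union, and defuzzify by the centroid over dom(I_p) = [lo, hi]."""
    num = den = 0.0
    for j in range(steps + 1):
        ip = lo + (hi - lo) * j / steps
        mu = max(w * f(ip) for w, f in terms.values())  # mu' of the union
        num += mu * ip
        den += mu
    return num / den if den else None
```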



4. Performance Analysis

To evaluate the effectiveness of our approach, we tested it using several real-life databases: a credit card database, a diabetes database, and a social database. For each experiment, each of these databases was divided into two datasets, with records assigned to each dataset at random. The mining of rules was performed on one of them; the other dataset was reserved for testing. For each testing dataset, the values of the attributes to be predicted were deleted. The rules discovered by mining the other dataset were then used to predict the deleted attribute values. The predicted values were then compared against the original values; whenever they matched, an accuracy count was incremented. Based on this accuracy count, the percentage accuracy was computed for our approach, C4.5 [21] (a decision tree based approach), CBA [16] (an association rule mining approach), and FID [14] (a fuzzy decision tree approach). The experiments performed for each of the databases were repeated ten times, and the percentage accuracy, averaged over the ten trials, was recorded; the results are presented in the following sections.

4.1. The credit card database

The credit card database [20] contains data about credit card applications. It consists of 15 attributes, of which one, the Success attribute, records whether or not an application was successful. The meanings of these attributes are not known, as the names of the attributes and their values were changed by the donor of the database into meaningless symbols to protect the confidentiality of the data. Of the 15 attributes, 6 are quantitative and 9 are categorical. Each of the 6 quantitative attributes was represented by 4 linguistic terms for our approach and FID. There are altogether 690 records in the database. For the experiments, we randomly selected 30% of them (207 records) for testing by deleting from them the values of the Success attribute. Each of our approach, C4.5, CBA, and FID was then used to mine rules from the rest of the database (70%, or 483 records). The discovered rules were then used to predict the missing Success values in the test records. This procedure of randomly selecting different sets of records for data mining and testing was repeated ten times. The percentage accuracy was computed for each trial, and the percentage accuracy averaged over these ten trials is given in Table 1. Of the four approaches, ours performed better than C4.5, CBA, and FID by 6.3%, 3.9%, and 30.9%, respectively.

4.2. The diabetes database

The diabetes database [23] contains 768 patient records. These records are characterized by 9 attributes, including one denoted Test-results, which contains either a "1" (tested positive for diabetes) or a "2" (tested negative for diabetes). The other attributes are all quantitative. Each of these quantitative attributes was represented by 4 linguistic terms for our approach and FID. A total of 30% of the records were randomly selected from the database, and the values of Test-results in these records were deleted. Using each of our approach, C4.5, CBA, and FID, rules were mined from the remaining 70% of the data. These rules were then used to determine the values of Test-results in the test dataset. This testing procedure was repeated ten times for each of our approach, C4.5, CBA, and FID, and the percentage accuracy, averaged over the ten trials, was determined. Of these approaches, ours performed better than C4.5, CBA, and FID by 3.8%, 3.2%, and 15.6%, respectively (Table 1).

4.3. The social database

The social database [15] contains data collected by the US Census Bureau. The data were divided into two sets by the donor: the first dataset, which consists of 32,561 records, was used for data mining, whereas the second dataset, which consists of 16,281 records, was used for testing. The records in the database are characterized by 15 attributes. Of these attributes, 6 are quantitative; each of these was represented by 4 linguistic terms for our approach and FID. The remaining 9 attributes are all categorical. Using each of our approach, C4.5, CBA, and FID, predictive modeling rules were mined from the data mining dataset. These rules were then used to predict the values of the Salary attribute in the test data. The percentage accuracies of the four approaches are given in Table 1. Of these approaches, ours performed better than C4.5, CBA, and FID by 0.5%, 1.7%, and 62.3%, respectively. Unlike the case with the credit card and diabetes databases, it should be noted that testing was not repeated for this particular database, because the records for data mining and for testing were fixed by the donor rather than randomly selected.

4.4. Discussion

In summary, our approach performed better than C4.5, CBA, and FID in the above cases. It achieved an average accuracy of 84.1%, which is better than C4.5 by 3.5%, CBA by 2.9%, and FID by 36.2%. If we define the baseline accuracy to be the accuracy obtained by simply assigning the most frequently occurring value to the attribute being predicted, the baseline accuracy for the credit card database is 55.5%, for the diabetes database 65.1%, and for the social database 76.1%. For all the databases, the accuracies of the rules discovered by our approach, C4.5, and CBA are significantly higher than the baseline accuracy. For the credit card database, the accuracy of FID is only marginally higher than the respective baseline accuracy; for the diabetes database, it is marginally lower; and for the social database, it is significantly lower than the baseline accuracy.

Table 1. Average percentage accuracy.

Databases     Our Approach   C4.5    CBA     FID
credit card   88.9%          82.6%   85.0%   58.0%
diabetes      77.6%          73.8%   74.4%   62.0%
social        85.9%          85.4%   84.2%   23.6%
Average       84.1%          80.6%   81.2%   47.9%

5. Conclusions

In summary, we have presented a fuzzy approach that can be used for mining interesting rules for classification with degrees of membership. This approach represents the revealed regularities and exceptions using linguistic terms, which allows human users to better understand the discovered rules because of the affinity of linguistic terms with human knowledge representation. Furthermore, our approach is capable of finding interesting relationships among attributes without any subjective input from the users. Its effectiveness has been evaluated using several real-life databases. The experimental results show that our approach can be very effective at data mining tasks; in fact, in the experiments we performed, it was found to predict more accurately than C4.5, CBA, and FID.

References

[1] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, "An Interval Classifier for Database Mining Applications," in Proc. of the 18th Int'l Conf. on Very Large Data Bases, Vancouver, British Columbia, Canada, 1992, pp. 560-573.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Database Mining: A Performance Perspective," IEEE Trans. on Knowledge and Data Engineering, vol. 5, no. 6, pp. 914-925, 1993.
[3] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," in Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, Washington, D.C., 1993, pp. 207-216.
[4] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," in Proc. of the 20th Int'l Conf. on Very Large Data Bases, Santiago, Chile, 1994, pp. 487-499.
[5] W.-H. Au and K.C.C. Chan, "An Effective Algorithm for Discovering Fuzzy Rules in Relational Databases," in Proc. of the 7th IEEE Int'l Conf. on Fuzzy Systems, Anchorage, Alaska, 1998, pp. 1314-1319.
[6] W.-H. Au and K.C.C. Chan, "FARM: A Data Mining System for Discovering Fuzzy Association Rules," in Proc. of the 8th IEEE Int'l Conf. on Fuzzy Systems, Seoul, Korea, 1999, pp. 1217-1222.
[7] K.C.C. Chan and W.-H. Au, "Mining Fuzzy Association Rules," in Proc. of the 6th Int'l Conf. on Information and Knowledge Management, Las Vegas, Nevada, 1997, pp. 209-215.
[8] K.C.C. Chan and W.-H. Au, "Mining Fuzzy Association Rules in a Database Containing Relational and Transactional Data," in A. Kandel, M. Last, and H. Bunke (Eds.), Data Mining and Computational Intelligence, Heidelberg, Germany; New York, NY: Physica-Verlag, 2001, pp. 95-114.
[9] M.-S. Chen, J. Han, and P.S. Yu, "Data Mining: An Overview from a Database Perspective," IEEE Trans. on Knowledge and Data Engineering, vol. 8, no. 6, pp. 866-883, 1996.
[10] U.M. Fayyad, "Mining Databases: Towards Algorithms for Knowledge Discovery," Bulletin of the Technical Committee on Data Mining, vol. 21, no. 1, 1998, pp. 335-341.
[11] U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, "From Data Mining to Knowledge Discovery: An Overview," in U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, Menlo Park, CA: AAAI/MIT Press, 1996, pp. 1-34.
[12] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
[13] K. Hirota and W. Pedrycz, "Fuzzy Computing for Data Mining," Proc. of the IEEE, vol. 87, no. 9, pp. 1575-1600, 1999.
[14] C.Z. Janikow, "Fuzzy Decision Trees: Issues and Methods," IEEE Trans. on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 28, no. 1, pp. 1-14, 1998.
[15] R. Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid," in Proc. of the 2nd Int'l Conf. on Knowledge Discovery and Data Mining, Portland, Oregon, 1996.
[16] B. Liu, W. Hsu, and Y. Ma, "Integrating Classification and Association Rule Mining," in Proc. of the 4th Int'l Conf. on Knowledge Discovery and Data Mining, New York, NY, 1998.
[17] O. Maimon, A. Kandel, and M. Last, "Information-Theoretic Fuzzy Approach to Data Reliability and Data Mining," Fuzzy Sets and Systems, vol. 117, pp. 183-194, 2001.
[18] H. Mannila, H. Toivonen, and A.I. Verkamo, "Efficient Algorithms for Discovering Association Rules," in Proc. of the AAAI Workshop on Knowledge Discovery in Databases, Seattle, Washington, 1994, pp. 181-192.
[19] M. Mehta, J. Rissanen, and R. Agrawal, "SLIQ: A Fast Scalable Classifier for Data Mining," in Proc. of the 5th Int'l Conf. on Extending Database Technology, Avignon, France, 1996.
[20] J.R. Quinlan, "Simplifying Decision Trees," Int'l J. of Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[21] J.R. Quinlan, C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993.
[22] J. Shafer, R. Agrawal, and M. Mehta, "SPRINT: A Scalable Parallel Classifier for Data Mining," in Proc. of the 22nd Int'l Conf. on Very Large Data Bases, Mumbai (Bombay), India, 1996.
[23] J.W. Smith, J.E. Everhart, W.C. Dickson, W.C. Knowler, and R.S. Johannes, "Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus," in Proc. of the Symp. on Computer Applications and Medical Care, 1983, pp. 422-425.
[24] P. Smyth and R.M. Goodman, "An Information Theoretic Approach to Rule Induction from Databases," IEEE Trans. on Knowledge and Data Engineering, vol. 4, no. 4, 1992, pp. 301-316.
[25] R. Srikant and R. Agrawal, "Mining Quantitative Association Rules in Large Relational Tables," in Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, Montreal, Canada, 1996, pp. 1-12.
[26] S.M. Weiss and N. Indurkhya, Predictive Data Mining: A Practical Guide, San Francisco, CA: Morgan Kaufmann, 1998.
[27] R.R. Yager, "On Linguistic Summaries of Data," in G. Piatetsky-Shapiro and W.J. Frawley (Eds.), Knowledge Discovery in Databases, Menlo Park, CA: AAAI/MIT Press, 1991, pp. 347-363.
[28] L.A. Zadeh, "Fuzzy Sets," Information and Control, vol. 8, pp. 338-353, 1965.