FUZZ-IEEE 2009, Korea, August 20-24, 2009

General Type-2 Fuzzy Neural Network with Hybrid Learning for Function Approximation

Wen-Hau Roger Jeng, Chi-Yuan Yeh, and Shie-Jue Lee, Member, IEEE

Wen-Hau Roger Jeng, Chi-Yuan Yeh, and Shie-Jue Lee are with the Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan (corresponding author: [email protected]).

Abstract— A novel Takagi-Sugeno-Kang (TSK) type fuzzy neural network which uses general type-2 fuzzy sets in a type-2 fuzzy logic system, called general type-2 fuzzy neural network (GT2FNN), is proposed for function approximation. The problems of constructing a GT2FNN include type reduction, structure identification, and parameter identification. An efficient strategy is proposed by using α-cuts to decompose a general type-2 fuzzy set into several interval type-2 fuzzy sets to solve the type reduction problem. Incremental similarity-based fuzzy clustering and linear least squares regression are combined to solve the structure identification problem. Regarding the parameter identification, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) and a recursive least squares (RLS) estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results show that the resulting networks are robust against outliers.

I. INTRODUCTION

During the past decades, fuzzy logic systems (FLS) based on traditional fuzzy sets, called type-1 fuzzy sets (T1FS), which represent uncertainties by numbers in the range [0, 1], have been successfully applied in many areas, such as automatic control, function approximation, and data classification [1]. However, T1FS may not be enough to handle uncertainty that is difficult to represent as a single real value [2]. Type-2 fuzzy logic systems (T2FLS) based on type-2 fuzzy sets (T2FS) have been used to solve this problem, and may perform better than type-1 fuzzy logic systems (T1FLS) due to the flexibility that the membership degrees of a T2FS can themselves be fuzzy sets [3],[4]. To date, most T2FLSs use interval type-2 fuzzy sets (IT2FS), a special case of general type-2 fuzzy sets (GT2FS). The main reasons are: (1) the inference procedure in a general T2FLS (GT2FLS) is much more complicated than in an interval T2FLS (IT2FLS), and there are no appropriate inference mechanisms for GT2FLS; (2) the amount of time required by a GT2FLS is demanding due to the complexity of the type reduction procedure. Recently, Liu proposed an efficient centroid type reduction strategy for GT2FLS [5]. The main idea is to use α-cuts to decompose a GT2FS into several IT2FS, and then apply the KM algorithm [6] to convert each IT2FS into a T1FS. In this work, the idea of α-cuts is exploited to solve the type reduction problem for GT2FLS and to design a general type-2 fuzzy neural network (GT2FNN). Two more



stages, structure identification and parameter identification, are required. Regarding the structure identification, an incremental similarity-based fuzzy clustering method [7] is used to partition the dataset into several clusters, a local regression model is built for each cluster, and then a general type-2 fuzzy rule is extracted from each cluster and its regressor. Regarding the parameter identification, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) [8] and a recursive least squares (RLS) [9] estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results show that the resulting networks are robust against outliers.

The rest of this paper is organized as follows. Section II presents basic concepts about general type-2 fuzzy inference systems. Section III describes incremental similarity-based fuzzy clustering, multiple linear regression, and rule extraction for structure identification. Section IV describes the hybrid learning algorithm for parameter identification. Experimental results are presented in Section V. Finally, a conclusion is given in Section VI.

II. GENERAL TYPE-2 FUZZY NEURAL NETWORK

In this section, basic concepts about general type-2 fuzzy sets and fuzzy logic systems are introduced. The general type-2 TSK fuzzy neural network is also briefly described.

A. General type-2 fuzzy sets

When we cannot determine the membership degree of an element in a set as 0 or 1, we can apply fuzzy sets to fuzzify it. Similarly, when we cannot determine the membership degree of an element in a fuzzy set as a crisp number in [0, 1], we can fuzzify it again, i.e., the membership function itself is fuzzy. This concept, called type-2 fuzzy logic, was first introduced by Zadeh [2] in 1975. A type-2 fuzzy set (T2FS), denoted as \tilde{A}, can be defined on universe X as

\tilde{A} = \int_{x \in X} \mu_{\tilde{A}}(x)/x = \int_{x \in X} \Big[ \int_{\mu \in J_x} f_x(\mu)/\mu \Big] \Big/ x   (1)

where \mu_{\tilde{A}}(x) is a secondary membership function (MF), J_x ⊆ [0, 1] is the set of primary membership degrees of x ∈ X, with μ ∈ J_x, ∀x ∈ X, and f_x(μ) ∈ [0, 1] is a secondary membership degree. Fig. 1 shows, from left to right, the Gaussian primary membership function and the Gaussian secondary membership function, respectively. Note that, when f_x(μ) = 1, ∀μ ∈ J_x ⊆ [0, 1], i.e., the secondary MFs are interval sets, the fuzzy set is called an interval type-2 fuzzy set (IT2FS), a special case of T2FS.

Fig. 1. General type-2 Gaussian membership function.
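To make the vertical-slice form of Eq. (1) concrete, the following sketch (ours, not part of the paper) evaluates one slice of a Gaussian-Gaussian GT2FS such as the one in Fig. 1: for a given x it returns a discretized primary grid J_x and the secondary degrees f_x(μ). The parameter names m1, sigma1, and sigma2 anticipate the notation used later in Eqs. (5)-(6).

```python
import numpy as np

def gt2_vertical_slice(x, m1=0.0, sigma1=1.0, sigma2=0.2, n_primary=101):
    """Return one vertical slice of a Gaussian-Gaussian GT2FS at input x:
    a grid over J_x in [0, 1] and the secondary degrees f_x(mu)."""
    # Apex of the secondary MF = Gaussian primary degree (cf. Eq. (6)).
    p = np.exp(-((x - m1) / sigma1) ** 2)
    mu_grid = np.linspace(0.0, 1.0, n_primary)
    f = np.exp(-0.5 * ((mu_grid - p) / sigma2) ** 2)
    return mu_grid, f

# Example: the slice at x = 1; the primary degree whose secondary grade
# is closest to 1 recovers the apex p.
mu_grid, f = gt2_vertical_slice(1.0)
print(mu_grid[np.argmax(f)])
```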

In order to distinguish IT2FS from T2FS, a fully general T2FS will be called a general type-2 fuzzy set (GT2FS). Besides, a T1FS is also a special case of a GT2FS, obtained when J_x contains only one element whose secondary degree is 1.

B. General type-2 fuzzy logic system

Similar to a T1FLS, a GT2FLS includes a fuzzifier, a fuzzy rule base, a fuzzy inference engine, and output processing, where output processing contains a type-reducer and a defuzzifier. A block diagram of a GT2FLS is depicted in Fig. 2.

Fig. 2. General type-2 fuzzy logic system.

We briefly describe the functionality of each component and the operation of the whole system as follows:

1. Fuzzifier. For each crisp input value, the fuzzifier transforms it into a GT2FS to express the associated measurement uncertainty.

2. Fuzzy reasoning. Fuzzy reasoning is performed by the GT2 fuzzy inference engine based on the GT2FS obtained in step 1 and the fuzzy rule base, which is composed of a set of fuzzy IF-THEN rules. After reasoning, we have a GT2FS for each output variable. To solve the type reduction problem, an efficient strategy is proposed: α-cuts are used to decompose a general type-2 fuzzy set into several interval type-2 fuzzy sets, and then the firing strength of the antecedent of each rule can be computed at the different α-cuts. Note that (\mu_{\tilde{A}_{i,j}}(x_i))_\alpha = [l_{i,j}, r_{i,j}]_\alpha denotes the α-cut plane of \mu_{\tilde{A}_{i,j}}(x_i) in rule j. In this paper, the product operation is used for inference, and the firing strength of the antecedent is defined as

[\underline{f}^j, \bar{f}^j]_\alpha = \prod_{i=1}^{n} [l_{i,j}, r_{i,j}]_\alpha = \Big[ \prod_{i=1}^{n} l_{i,j,\alpha}, \; \prod_{i=1}^{n} r_{i,j,\alpha} \Big]   (2)

where n is the number of input dimensions, \underline{f}^j is the lower bound of the firing strength of the j-th rule, and \bar{f}^j is the upper bound of the firing strength of the j-th rule.

3. Type-reducer. The output sets of the GT2FIS are type-2 fuzzy sets. To obtain an embedded type-1 fuzzy set for each α-cut, a type reduction method, centroid type reduction, is used in the type-reducer. An efficient algorithm, the Karnik-Mendel (KM) algorithm [6], has been developed for centroid type reduction. The type-reduced set at each α-plane is

[\underline{y}_\alpha, \bar{y}_\alpha] = \left[ \frac{\sum_{j=1}^{L} b^j \bar{f}^j_\alpha + \sum_{j=L+1}^{J} b^j \underline{f}^j_\alpha}{\sum_{j=1}^{L} \bar{f}^j_\alpha + \sum_{j=L+1}^{J} \underline{f}^j_\alpha}, \; \frac{\sum_{j=1}^{R} b^j \underline{f}^j_\alpha + \sum_{j=R+1}^{J} b^j \bar{f}^j_\alpha}{\sum_{j=1}^{R} \underline{f}^j_\alpha + \sum_{j=R+1}^{J} \bar{f}^j_\alpha} \right]   (3)

where b^j is the consequent value of the j-th rule (cf. Eq. (9)), \underline{y}_\alpha is the left-most point of y, \bar{y}_\alpha is the right-most point of y, and L and R are the switch points obtained from the KM algorithm.

4. Defuzzifier. To obtain a crisp output value for each output variable, a defuzzification method, the weighted average, is used in the defuzzifier to convert the fuzzy conclusion obtained in the previous step into a single real number:

\hat{y} = \frac{\sum_\alpha \alpha \, (\underline{y}_\alpha + \bar{y}_\alpha)/2}{\sum_\alpha \alpha}   (4)

where α = [1, 0.8, 0.6, 0.4, 0.2, 0.01].
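The paper invokes the KM algorithm [6] for Eq. (3) without spelling it out. Below is a minimal sketch, under our assumptions, of the standard iterative KM procedure for one endpoint, combined with the α-plane weighted average of Eq. (4); the function names and toy firing intervals are ours.

```python
def km_endpoint(b, f_lo, f_hi, right, eps=1e-9):
    """Karnik-Mendel iteration for one endpoint of Eq. (3).
    b: rule consequents; f_lo/f_hi: lower/upper firing strengths."""
    order = sorted(range(len(b)), key=lambda j: b[j])
    b = [b[j] for j in order]
    lo = [f_lo[j] for j in order]
    hi = [f_hi[j] for j in order]
    f = [(l + h) / 2.0 for l, h in zip(lo, hi)]  # start from midpoints
    y = sum(bj * fj for bj, fj in zip(b, f)) / sum(f)
    while True:
        # Right endpoint: lower weights below the switch point, upper
        # weights above it; the left endpoint swaps the two roles.
        if right:
            f = [lo[j] if b[j] <= y else hi[j] for j in range(len(b))]
        else:
            f = [hi[j] if b[j] <= y else lo[j] for j in range(len(b))]
        y_new = sum(bj * fj for bj, fj in zip(b, f)) / sum(f)
        if abs(y_new - y) < eps:
            return y_new
        y = y_new

def defuzzify(alphas, planes):
    """Eq. (4): weighted average of the alpha-plane interval midpoints."""
    num = sum(a * (yl + yr) / 2.0 for a, (yl, yr) in zip(alphas, planes))
    return num / sum(alphas)

alphas = [1, 0.8, 0.6, 0.4, 0.2, 0.01]   # alpha-planes used in the paper
b = [1.2, 0.4, 2.0]                      # toy rule consequents
f_lo, f_hi = [0.2, 0.3, 0.1], [0.5, 0.6, 0.4]  # toy: same at every plane
planes = [(km_endpoint(b, f_lo, f_hi, right=False),
           km_endpoint(b, f_lo, f_hi, right=True)) for _ in alphas]
print(defuzzify(alphas, planes))
```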

C. General type-2 TSK fuzzy neural network

The general type-2 TSK fuzzy neural network is a four-layer network structure, as shown in Fig. 3.

Fig. 3. General type-2 fuzzy neural network.

The four layers are called the fuzzification layer (layer 1), the conjunction layer (layer 2), the normalization layer (layer 3), and the output layer (layer 4), respectively. The operation of the fuzzy neural network is described as follows.

Layer 1: Layer 1 contains J rules and each rule contains n nodes. Node (i, j) of this layer produces its output, o^{(1)}_{i,j}, by computing the value of the corresponding general type-2 membership function \tilde{A}_{i,j}, i.e.,

o^{(1)}_{i,j} = \tilde{A}_{i,j}(x_i) = gt2(x_i; m^1_{i,j}, \sigma^1_{i,j}, \sigma^2_{i,j}) = \vee_\alpha \big[ \alpha \wedge (\mu_{\tilde{A}_{i,j}}(x_i))_\alpha \big]   (5)


where x_i is the i-th dimension of the input, m^1_{i,j} and \sigma^1_{i,j} are the mean and standard deviation, respectively, of the primary membership function of the i-th feature in the j-th fuzzy rule, and \sigma^2_{i,j} is the deviation of the secondary membership function of the i-th feature in the j-th fuzzy rule, which is defined as

\mu_{\tilde{A}_{i,j}(x_i)}(a) = \exp\left( -\frac{1}{2} \left( \frac{a - \exp\big( -\big( \frac{x_i - m^1_{i,j}}{\sigma^1_{i,j}} \big)^2 \big)}{\sigma^2_{i,j}} \right)^2 \right)   (6)

where a ∈ [0, 1], i = 1, ..., n, n is the number of input dimensions, j = 1, ..., J, J is the number of fuzzy rules, and (\mu_{\tilde{A}_{i,j}}(x_i))_\alpha is the input of Layer 2.

Layer 2: Layer 2 contains J rules and each rule contains t nodes, where t is the number of α-cut planes. The output (o^{(2)}_j)_\alpha is

(o^{(2)}_j)_\alpha = \prod_{i=1}^{n} (o^{(1)}_{i,j})_\alpha   (7)

where j = 1, ..., J.
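Because the secondary MF of Eq. (6) is Gaussian in a, its α-cut has a closed form: with primary degree p = exp(−((x_i − m^1_{i,j})/σ^1_{i,j})^2), the cut is [p − δ, p + δ] clipped to [0, 1], where δ = σ^2_{i,j} √(2 ln(1/α)). The following sketch of Layers 1-2 (Eqs. (5)-(7)) is ours and builds on this observation.

```python
import math

def alpha_cut(x, m1, s1, s2, alpha):
    """Alpha-cut [l, r] of the Gaussian secondary MF in Eq. (6) at input x."""
    p = math.exp(-((x - m1) / s1) ** 2)            # primary degree
    delta = s2 * math.sqrt(2.0 * math.log(1.0 / alpha))
    return max(0.0, p - delta), min(1.0, p + delta)

def firing_interval(x_vec, m1, s1, s2, alpha):
    """Eqs. (2)/(7): product of per-feature alpha-cut intervals for one rule."""
    f_lo, f_hi = 1.0, 1.0
    for i, xi in enumerate(x_vec):
        l, r = alpha_cut(xi, m1[i], s1[i], s2[i], alpha)
        f_lo *= l
        f_hi *= r
    return f_lo, f_hi

# Toy rule with two antecedents, evaluated at the alpha = 0.6 plane.
print(firing_interval([0.3, -0.1], m1=[0.0, 0.0], s1=[1.0, 1.0],
                      s2=[0.2, 0.2], alpha=0.6))
```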

Layer 3: Layer 3 contains t nodes, where t is the number of α-cut planes. The output (o^{(3)})_\alpha of Layer 3 is the result obtained from the KM algorithm:

(o^{(3)})_\alpha = KM\big( (o^{(2)}_1)_\alpha, (o^{(2)}_2)_\alpha, ..., (o^{(2)}_J)_\alpha, cons \big)   (8)

where

cons = \big[ \beta_{0,1} + \beta_{1,1}x_1 + ... + \beta_{n,1}x_n, \; \beta_{0,2} + \beta_{1,2}x_1 + ... + \beta_{n,2}x_n, \; ..., \; \beta_{0,J} + \beta_{1,J}x_1 + ... + \beta_{n,J}x_n \big]^T   (9)

is the consequence vector generated from all rules.

Layer 4: Layer 4 contains one node and its output, o^{(4)}, represents the result of the centroid defuzzification, i.e.,

o^{(4)} = \frac{\sum_\alpha \alpha \, (o^{(3)})_\alpha}{\sum_\alpha \alpha}.   (10)


III. STRUCTURE IDENTIFICATION FOR GT2FNN

To date, there are no general guidelines that can be applied to specify the optimal number of fuzzy rules and their corresponding initial values for a first-order TSK type FNN. In this study, we propose a self-constructing method which consists of incremental similarity-based fuzzy clustering, multiple linear regression, and fuzzy rule extraction to solve this problem. The flowchart of the structure identification for the first-order TSK type FNN is depicted in Fig. 4 and the detailed process is described as follows.

Fig. 4. Structure identification for the first-order TSK type GT2 fuzzy rules.

A. Incremental similarity-based clustering

The incremental similarity-based clustering algorithm considers one training pattern at a time. The input-similarity and output-similarity between the pattern and each existing fuzzy cluster are calculated to determine whether to assign the pattern to the most similar cluster, updating that cluster's statistical mean and standard deviation, or to create a new cluster for it and set the initial values of the new cluster. Suppose we are given a set of input-output training patterns (x_1, y_1), ..., (x_l, y_l), with input x_i ∈ R^n, i = 1, ..., l, where l is the number of training patterns, and output y_i ∈ R. Let J be the number of existing fuzzy clusters. The input-similarity between the i-th training pattern x_i and the j-th fuzzy cluster C_j is calculated by

G_{i,j} = \prod_{k=1}^{n} \exp\left( -\left( \frac{x_{k,i} - m_{k,j}}{\sigma_{k,j}} \right)^2 \right)   (11)

where m_{k,j} and σ_{k,j} denote the mean and standard deviation, respectively, of the k-th feature of cluster C_j. The output-similarity between the i-th training pattern and the j-th cluster C_j is calculated by

e_{i,j} = |y_i − y_{c_j}|   (12)

where y_{c_j} denotes the representative output of cluster C_j. If G_{i,j} ≥ ρ and e_{i,j} ≤ τ, where 0 ≤ ρ ≤ 1 and τ are predefined thresholds, then x_i has passed the input-similarity and output-similarity tests, i.e., x_i is similar to C_j. In this case, x_i is assigned to the most similar cluster C_a with the


largest input-similarity, and the modification of this cluster is defined as follows:

m_{k,a} = \frac{|C_a| m_{k,a} + x_{k,i}}{|C_a| + 1}, \quad k = 1, ..., n   (13)

where |C_a| is the cardinality of C_a,

\sigma_{k,a} = \sigma_{k,0} + \sqrt{A - B}, \quad k = 1, ..., n   (14)

where \sigma_{k,0} is a user-defined initial standard deviation, A = \frac{(|C_a| - 1)(\sigma_{k,a} - \sigma_{k,0})^2 + |C_a| m_{k,a}^2 + x_{k,i}^2}{|C_a|}, B = \frac{(|C_a| m_{k,a} + x_{k,i})^2}{|C_a|(|C_a| + 1)}, and

y_{c_a} = \frac{|C_a| y_{c_a} + y_i}{|C_a| + 1}.   (15)

If x_i does not pass the input-similarity test or the output-similarity test with any existing cluster, a new fuzzy cluster C_{J+1} is created with m_{J+1} = x_i, \sigma_{J+1} = \sigma_0, and y_{c_{J+1}} = y_i.
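A compact sketch of one pass of the clustering procedure follows. We read the m_{k,a} appearing in A and B of Eq. (14) as the pre-update mean, which matches the standard incremental variance identity; all function and field names are ours, and σ_0 > 0 is assumed.

```python
import math

def incremental_clustering(X, y, rho, tau, sigma0):
    """One pass of incremental similarity-based clustering, Eqs. (11)-(15)."""
    clusters = []  # each cluster: mean vector m, deviations sigma, yc, size
    for xi, yi in zip(X, y):
        best, best_G = None, -1.0
        for c in clusters:
            G = math.prod(math.exp(-((xk - mk) / sk) ** 2)
                          for xk, mk, sk in zip(xi, c["m"], c["sigma"]))
            e = abs(yi - c["yc"])
            if G >= rho and e <= tau and G > best_G:   # both tests passed
                best, best_G = c, G
        if best is None:                                # create a new cluster
            clusters.append({"m": list(xi), "sigma": [sigma0] * len(xi),
                             "yc": yi, "size": 1})
            continue
        S = best["size"]
        for k, xk in enumerate(xi):
            m_old, s_old = best["m"][k], best["sigma"][k]
            m_new = (S * m_old + xk) / (S + 1)                      # Eq. (13)
            A = ((S - 1) * (s_old - sigma0) ** 2
                 + S * m_old ** 2 + xk ** 2) / S
            B = (S * m_old + xk) ** 2 / (S * (S + 1))
            best["sigma"][k] = sigma0 + math.sqrt(max(A - B, 0.0))  # Eq. (14)
            best["m"][k] = m_new
        best["yc"] = (S * best["yc"] + yi) / (S + 1)                # Eq. (15)
        best["size"] = S + 1
    return clusters
```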

B. Linear least squares regressor

After the J clusters are obtained, we can build a local regression model for each cluster by applying the linear least squares approach. The multiple linear regression model for the j-th cluster is

y_j = X_j \beta_j + \varepsilon   (16)

where y_j ∈ R^{|C_j|} is the output of the j-th cluster, X_j ∈ R^{|C_j| \times (n+1)} is the input of the j-th cluster, \beta_j ∈ R^{n+1} is the vector of regression parameters of the j-th cluster, and \varepsilon ∈ R^{|C_j|} is the vector of random errors. We wish to find \beta_j that minimizes

E_j = (y_j − X_j \beta_j)^T (y_j − X_j \beta_j).   (17)

Setting the partial derivatives of E_j with respect to the regression parameters \beta_j to zero, we obtain the least squares normal equation

X_j^T X_j \beta_j = X_j^T y_j   (18)

which implies that

\beta_j = (X_j^T X_j)^{-1} X_j^T y_j.   (19)

Note that if (X_j^T X_j)^{-1} does not exist, we can use (X_j^T X_j + \lambda I)^{-1} in Eq. (19), where \lambda is an arbitrarily small positive real number and I is the (n+1) × (n+1) identity matrix.
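Eq. (19) with the regularized fallback translates directly into a few lines; the helper below (our naming) prepends the bias column so that β_j ∈ R^{n+1} as in Eq. (16).

```python
import numpy as np

def fit_consequents(Xj, yj, lam=1e-8):
    """Solve Eq. (19) for one cluster; fall back to
    (X^T X + lam I)^{-1} X^T y when X^T X is singular."""
    Xj = np.hstack([np.ones((Xj.shape[0], 1)), Xj])  # bias column
    G = Xj.T @ Xj
    try:
        return np.linalg.solve(G, Xj.T @ yj)
    except np.linalg.LinAlgError:
        return np.linalg.solve(G + lam * np.eye(G.shape[0]), Xj.T @ yj)
```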

C. Fuzzy rule extraction

After the J clusters and the linear local regressors are obtained, we can extract a first-order TSK type GT2 IF-THEN fuzzy rule from each cluster and its regressor. The parameters of the 'IF' part of the j-th rule are obtained from the j-th fuzzy cluster, while the parameters of the 'THEN' part are obtained from the j-th linear local regressor. The j-th first-order TSK type GT2 IF-THEN fuzzy rule is:

IF x_1 is \tilde{A}_{1,j} and ... and x_i is \tilde{A}_{i,j} and ... and x_n is \tilde{A}_{n,j}
THEN y_j is \beta_{0,j} + \beta_{1,j} x_1 + ... + \beta_{i,j} x_i + ... + \beta_{n,j} x_n   (20)

where x = [x_1, ..., x_i, ..., x_n]^T is the input vector, y_j is the output of the j-th rule, \beta_j = [\beta_{0,j}, \beta_{1,j}, ..., \beta_{i,j}, ..., \beta_{n,j}]^T is the vector of consequence parameters of the j-th rule obtained from the j-th linear local regressor, and \tilde{A}_{i,j} is the GT2 fuzzy set of the antecedent part of the i-th feature in the j-th rule. The primary membership function \mu_{i,j} can be defined as

\mu_{i,j} = \exp\left( -\left( \frac{x_i - m^1_{i,j}}{\sigma^1_{i,j}} \right)^2 \right)   (21)

where m^1_{i,j} is the statistical mean of the i-th feature in the j-th cluster and \sigma^1_{i,j} is the standard deviation of the i-th feature in the j-th cluster, and the secondary membership function \mu_{\tilde{A}_{i,j}} can be defined as

\mu_{\tilde{A}_{i,j}} = gt2(\mu_{i,j}, \sigma^2_{i,j})   (22)

where 0 ≤ \mu_{i,j} ≤ 1 is a membership degree and \sigma^2_{i,j} = \sigma^1_{i,j}. Now the first-order TSK type GT2 IF-THEN fuzzy rules have been extracted by the above definitions and the initial values of m^1_{i,j}, \sigma^1_{i,j}, and \sigma^2_{i,j} have been determined. These parameter values can be refined by a hybrid learning algorithm in the parameter identification phase, to be described below.
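Rule extraction then amounts to copying statistics out of each cluster and its regressor into the initial rule parameters of Eqs. (20)-(22); a minimal illustration (ours), reusing the cluster dictionaries and the fitted β vectors from the sketches above:

```python
def extract_rules(clusters, regressors):
    """Initial rule parameters: antecedents from cluster statistics
    (Eq. (21)), consequents from the local regressors (Eq. (20))."""
    rules = []
    for c, beta in zip(clusters, regressors):
        rules.append({
            "m1": list(c["m"]),        # primary means
            "s1": list(c["sigma"]),    # primary deviations
            "s2": list(c["sigma"]),    # secondary deviations, sigma2 = sigma1
            "beta": list(beta),        # THEN-part coefficients
        })
    return rules
```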

IV. PARAMETER IDENTIFICATION FOR GT2FNN

In order to improve the convergence speed, a hybrid learning algorithm (HLA) which combines particle swarm optimization (PSO) and the recursive least squares (RLS) estimator is used to train the network. In each iteration, PSO and RLS are applied to refine the antecedent and consequent parameters, respectively, of the first-order TSK type GT2 fuzzy rules. The flowchart of the HLA is depicted in Fig. 5 and the detailed process is described as follows.

Fig. 5. Parameter identification for the first-order TSK type GT2 fuzzy rules.

A. Particle Swarm Optimization

PSO is a population-based global search algorithm proposed by Kennedy and Eberhart in 1995 [8]. Each particle is a candidate solution that moves with an adaptable velocity within the search space and


remembers the best position it has ever encountered. Assume a d-dimensional search space S. The i-th particle is a d-dimensional vector P_i = [p_{i,1}, ..., p_{i,d}]^T ∈ S. The corresponding current velocity of this particle is V_i(t) = [v_{i,1}, ..., v_{i,d}]^T. The new velocity V_i(t+1) is updated by

V_i(t+1) = w \, V_i(t) + c_1 \, rand() \, (Pbest_i − P_i(t)) + c_2 \, rand() \, (Gbest − P_i(t))   (23)

where w, c_1, and c_2 are the inertia, cognitive, and social coefficients, respectively, rand() returns uniformly distributed random numbers in [0, 1], Pbest_i is the best previous position of this particle (the cognitive effect), and Gbest is the overall best particle (the social effect). The particle then updates its position by using this new velocity. When all particles in a swarm have updated their positions, the swarm migrates to the next generation. If a new position jumps out of the search space, it is set back to a proper value. Note that the structure of a particle in this study is

P_i(t) = [m^1_{1,1}, \sigma^1_{1,1}, \sigma^2_{1,1}, ..., m^1_{n,1}, \sigma^1_{n,1}, \sigma^2_{n,1}, ..., m^1_{1,j}, \sigma^1_{1,j}, \sigma^2_{1,j}, ..., m^1_{n,j}, \sigma^1_{n,j}, \sigma^2_{n,j}, ..., m^1_{1,J}, \sigma^1_{1,J}, \sigma^2_{1,J}, ..., m^1_{n,J}, \sigma^1_{n,J}, \sigma^2_{n,J}]   (24)

where n is the dimension of a training pattern, J is the number of GT2 fuzzy rules, superscript 1 refers to the primary membership function for the crisp input, and superscript 2 refers to the secondary membership function for the membership degree. The dimension of a particle is d = 3 × n × J.
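One PSO generation per Eq. (23) is sketched below with the coefficient values used later in Section V (w = 0.5, c_1 = c_2 = 1.5); the clamping bounds stand in for the paper's unspecified "proper value" reset and are our assumption. Each particle encodes the 3 × n × J antecedent parameters of Eq. (24).

```python
import random

def pso_step(P, V, pbest, gbest, w=0.5, c1=1.5, c2=1.5, bounds=(0.01, 5.0)):
    """One velocity/position update, Eq. (23), for every particle.
    P, V, pbest: lists of d-dimensional lists; gbest: d-dimensional list."""
    for i in range(len(P)):
        for k in range(len(P[i])):
            V[i][k] = (w * V[i][k]
                       + c1 * random.random() * (pbest[i][k] - P[i][k])
                       + c2 * random.random() * (gbest[k] - P[i][k]))
            P[i][k] += V[i][k]
            # Pull positions that jump out of the search space back in
            # (assumed clamping; the paper does not specify the reset).
            P[i][k] = min(max(P[i][k], bounds[0]), bounds[1])
    return P, V
```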

B. Recursive Least Squares Estimator

Once the reduced type-1 fuzzy sets are obtained by using the KM algorithm [6], solving the consequence parameters can be considered as a multiple linear regression problem. From Eq. (19), we know that the optimal consequence parameters of the fuzzy rules, β ∈ R^{(n+1)×J}, can be solved by

\beta = (W^T W)^{-1} W^T y   (25)

where the input W = [w_1^T, ..., w_l^T]^T ∈ R^{l \times ((n+1) \times J)} is a non-linear transformation of x and y ∈ R^l is the desired output. Apparently, the size of (W^T W) is larger than that of (X^T X) in Eq. (19), and it is unavoidable to calculate the inverse of a large matrix for each particle in each iteration. To avoid this problem, another approach, recursive singular value decomposition (RSVD), which considers one training pattern, (w_i^T, y_i), at a time, was proposed to replace the linear least squares approach. In RSVD, a singular value decomposition of W′ is needed each time. Although the size of W′ is smaller than that of W, the amount of time required by RSVD is still demanding. Another effective approach, called recursive least squares (RLS), which minimizes the summation of squared errors for all training patterns up to the present iteration t, was proposed. The updating formulation for β is

\beta_{t+1} = \beta_t + \lambda^{-1} H_{t+1} z_{t+1} e_{t+1}   (26)

where λ > 0 is a scaling factor, z_{t+1} = [1, w_{t+1}^T]^T ∈ R^{n+1}, e_{t+1} = y_{t+1} − z_{t+1}^T \beta_t is the prediction error of the (t+1)-th training pattern, and

H_{t+1} = H_t − \frac{H_t z_{t+1} (H_t z_{t+1})^T}{\lambda + z_{t+1}^T H_t z_{t+1}}   (27)

where λ + z_{t+1}^T H_t z_{t+1} is a scalar and H_t z_{t+1} (H_t z_{t+1})^T is a rank-one matrix. Apparently, RLS runs more efficiently than the other approaches. Thus, we adopt RLS here.
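A direct transcription of the updates (26)-(27) follows; the initialization H_0 = κI with a large κ is customary for RLS but not stated in the paper, so it is an assumption here.

```python
import numpy as np

def rls_update(beta, H, z, y, lam=1.0):
    """One RLS step on pattern (z, y): Eq. (27) then Eq. (26).
    z = [1, w^T]^T is the transformed input; lam is the scaling factor."""
    Hz = H @ z
    H_new = H - np.outer(Hz, Hz) / (lam + z @ Hz)   # Eq. (27)
    e = y - z @ beta                                 # prediction error
    return beta + H_new @ z * e / lam, H_new         # Eq. (26)

# Hypothetical initialization for d consequent parameters in total.
d = 9
beta, H = np.zeros(d), 1e4 * np.eye(d)   # assumed large initial H
```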

V. EXPERIMENTAL RESULTS

In order to test the approximating capability of the proposed method, we have conducted two simulation experiments using two non-linear functions. For a 'fair' comparison, the same set of parameters is used in both simulation experiments. For instance, in PSO, the population size is set to 5, the maximum number of iterations is set to 30, and the parameters w, c_1, and c_2 are set to 0.5, 1.5, and 1.5, respectively.

A. Experiment I

In this simulation, the true function is given by

y = 1.1 × (1 − x + 2x^2) × e^{−x^2/2}, \quad x ∈ [−5, 5].   (28)

The uncorrupted training dataset consists of 200 randomly generated patterns, with input x and corresponding output y. The testing dataset consists of 50 uncorrupted testing patterns generated in the same way. A corrupted training pattern keeps the same output as the corresponding uncorrupted one but has its input corrupted by adding a random value drawn from a normal distribution with zero mean and standard deviation σ = 0.1. Three corrupted datasets, in which 20%, 30%, and 40% of the patterns are randomly corrupted, are used. The parameters ρ and τ of the incremental similarity-based clustering are set to 0.01 and 0.2, respectively. The simulation results are shown in Fig. 6 and Table I, where the training and testing errors are root mean square errors (RMSE).

Fig. 6. Simulation results for four datasets of Experiment I (panels: uncorrupted, 20%, 30%, and 40% corrupted data; legends: training data, T1FNN, IT2FNN, GT2FNN).

For uncorrupted data shown in Fig. 6, T1FNN,

IT2FNN and GT2FNN estimates are almost indistinguishable from the true function. For corrupted data with progressively increased corruption, GT2FNN estimates are more robust to x-space outliers and they outperform T1FNN and IT2FNN estimates. T1FNN may overfit the training data due to the high percentage of outliers.
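For reproducibility, here is our reconstruction of the Experiment I data protocol (Eq. (28) plus the x-space corruption); the random seed and the uniform sampling of x are assumptions where the paper is silent.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """True function of Eq. (28)."""
    return 1.1 * (1 - x + 2 * x**2) * np.exp(-x**2 / 2)

# 200 training patterns on [-5, 5]; corrupt 20% of the inputs with
# zero-mean Gaussian noise (sigma = 0.1), leaving the outputs intact.
x = rng.uniform(-5, 5, 200)
y = target(x)
idx = rng.choice(200, size=int(0.2 * 200), replace=False)
x_corrupt = x.copy()
x_corrupt[idx] += rng.normal(0.0, 0.1, size=idx.size)
```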

TABLE I
SIMULATION RESULTS FOR FOUR DATASETS OF EXPERIMENT I.

uncorrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.0022            0.0022           3.76             6
IT2FNN     0.0047            0.0048           9.49             6
GT2FNN     0.0036            0.0035           31.81            6

20% corrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.0901            0.023            3.75             5
IT2FNN     0.0920            0.026            8.47             5
GT2FNN     0.0900            0.023            30.44            5

30% corrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.1372            0.0872           5.4              6
IT2FNN     0.1412            0.0598           9.27             6
GT2FNN     0.1406            0.0435           31.38            6

40% corrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.1470            0.1112           3.75             5
IT2FNN     0.1616            0.0761           8.70             5
GT2FNN     0.1635            0.0647           30.08            5

B. Experiment II

In this simulation, the true function is given by

y = x_1^2 × \sin(x_2 \pi).   (29)

The uncorrupted training dataset consists of 225 randomly generated patterns, with input x = [x_1, x_2]^T and corresponding output y. The testing dataset consists of 50 uncorrupted testing patterns generated in the same way. A corrupted training pattern keeps the same output as the corresponding uncorrupted one but has its input corrupted by adding a random value drawn from a normal distribution with zero mean and standard deviation σ = 0.2. Two corrupted datasets, in which 20% and 40% of the patterns are randomly corrupted, are used. The parameters ρ and τ of the incremental similarity-based clustering are set to 0.0001 and 0.4, respectively. The simulation results are shown in Table II.

TABLE II
SIMULATION RESULTS FOR THREE DATASETS OF EXPERIMENT II.

uncorrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.0280            0.0280           3.81             6
IT2FNN     0.0178            0.0186           9.85             6
GT2FNN     0.0282            0.0297           36.20            6

20% corrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.0492            0.0403           4.20             5
IT2FNN     0.0506            0.0379           9.10             5
GT2FNN     0.0427            0.0298           34.25            5

40% corrupted data
methods    training error    testing error    refining time    number of rules
T1FNN      0.0685            0.0384           3.69             7
IT2FNN     0.0689            0.0380           9.44             7
GT2FNN     0.0664            0.0353           34.44            7

Again, for uncorrupted data, the performances of the three fuzzy neural networks on the testing dataset are about the same. However, we can see clearly that GT2FNN performs better for corrupted data. The approximating results obtained by GT2FNN are shown in Fig. 7.

Fig. 7. Simulation results for three datasets with GT2FNN (panels: target function, uncorrupted data, 20% corrupted data, 40% corrupted data).

VI. CONCLUSION



We have presented an efficient approach for constructing general type-2 fuzzy neural networks (GT2FNN). The idea of α-cuts is exploited to solve the type reduction problem. For structure identification, an incremental similarity-based fuzzy clustering method is used to partition the dataset into several clusters, a local regression model is built for each cluster, and then a general type-2 fuzzy rule is extracted from each cluster and its regressor. For parameter identification, a hybrid learning algorithm which combines particle swarm optimization and a recursive least squares estimator is proposed for refining the antecedent and consequent parameters, respectively, of the fuzzy rules. Simulation results have shown that the resulting networks are robust against outliers.

REFERENCES

[1] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic. Prentice Hall PTR, May 1995.
[2] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning—I," Information Sciences, vol. 8, pp. 199–249, January 1975.
[3] J. M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems. Prentice Hall PTR, January 2001.
[4] J. M. Mendel, "Type-2 fuzzy sets and systems: An overview," IEEE Computational Intelligence Magazine, vol. 2, no. 1, pp. 20–29, February 2007.
[5] F. Liu, "An efficient centroid type-reduction strategy for general type-2 fuzzy logic system," Information Sciences, vol. 179, no. 9, pp. 2224–2236, April 2009.
[6] N. N. Karnik and J. M. Mendel, "Centroid of a type-2 fuzzy set," Information Sciences, vol. 132, no. 1–4, pp. 195–220, February 2001.
[7] S. J. Lee and C. S. Ouyang, "A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning," IEEE Transactions on Fuzzy Systems, vol. 11, no. 3, pp. 341–353, June 2003.
[8] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, pp. 1942–1948.
[9] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models. The MIT Press, March 2001.