A Hierarchical Genetic Algorithm For The Design Of Beta Basis Function Neural Networks

Chaouki Aouiti¹, Adel M. Alimi², Fakhreddine Karray³, Aref Maalej⁴

¹ University 7 November Carthage, Faculty of Sciences of Bizerta, Tunisia, [email protected]
² REGIM: Research Group on Intelligent Machines, University of Sfax, ENIS, Department of Electrical Engineering, BP W-3038, Sfax, Tunisia, [email protected]
³ PAMI: Pattern Analysis and Machine Intelligence Laboratory, Systems Design Engineering Department, University of Waterloo, Waterloo, ON N2L 3G1, Canada, [email protected]
⁴ LASEM: Laboratory of Electromechanical Systems, University of Sfax, ENIS, Department of Mechanical Engineering, BP W-3038, Sfax, Tunisia, [email protected]

Abstract - We propose an evolutionary training algorithm for Beta basis function neural networks (BBFNN). Classic training algorithms for neural networks start with a predetermined network structure. Generally, the network resulting from learning applied to a predetermined architecture is either insufficient or over-complicated. This paper describes a hierarchical genetic learning model of the BBFNN. To examine the performance of the proposed algorithm, it was applied to approximation problems. The results obtained are very satisfactory with respect to the relative error.

I. INTRODUCTION

Artificial neural networks are typically used to generalize an input-output mapping over a set of examples. The first artificial neural network was the perceptron, introduced by Rosenblatt in 1957 [10]. Several algorithms, such as back-propagation, have been developed for training artificial neural networks. The general procedure is to view training as a search over an n-dimensional parameter space of weights in light of an error function, and to rely on the gradient of this error surface to guide the search. The discovery and popularization of the back-propagation learning rule has strongly stimulated research on neural networks. However, the standard BP learning algorithm suffers from the typical handicaps of all steepest-descent approaches: a very slow convergence rate and the need for predetermined learning parameters limit its practical use. Many improved learning algorithms have been reported in the literature. Some use heuristic rules to find optimal learning parameters. Others refine the gradient descent method to accelerate convergence. Further approaches employ different nonlinear optimization methods such as conjugate gradient, Newton's method or quasi-Newton techniques [9].

Evolutionary algorithms have also been used for the design of artificial neural networks. These algorithms are based on the genetic processes: a population of chromosomes is manipulated and each individual represents a possible solution to the problem. Each chromosome can be assigned a fitness which indicates the quality of the solution that it encodes. The first insights into evolving such networks date back to some of the earliest efforts in evolutionary computation; Friedman in 1956 proposed, but did not implement, an evolutionary search.


Classic training algorithms for artificial neural networks start with a predetermined network structure, so the response of the network depends strongly on its structure. Generally, the network resulting from learning applied to a predetermined architecture is either insufficient or over-complicated. In the last few years, however, some researchers have developed learning algorithms that incorporate structure-selection mechanisms, such as constructive algorithms and pruning algorithms [12], [13]. A constructive algorithm starts with a minimal network, i.e. an artificial neural network with a minimal number of hidden layers, hidden neurons and connections, and adds new layers, neurons or connections during the training phase if necessary. A pruning algorithm does the opposite: it starts with a maximal network and deletes unnecessary layers, nodes and connections during training. Some researchers have developed learning GAs that incorporate structure-selection mechanisms for neural networks (Yao and Liu, 1997; Samuel and Gisele, 1998; Billings and Zheng, 1995). Two approaches exist to evolve ANN architectures (Yao and Liu, 1997). In the first approach only the architectures are evolved; once a near-optimal architecture is obtained, the connection weights are trained. In the second approach, both the architectures and the weights are found. For the first category, it is important to choose which information about an architecture will be coded in the chromosome. One option is to encode all the information (weights, number of layers, the different neurons in each layer, ...); this representation is the direct encoding. Another option is to encode only the

most important parameters, such as the number of hidden layers.

The present work is the continuation of a series of efforts concerning the application of genetic algorithms to the optimization of Beta Basis Function Neural Networks and, more generally, of feedforward neural networks. In [5] a Beta Basis Function Neural Network was evolved by a discrete genetic algorithm. In [6] a Beta Basis Function Neural Network was evolved by a real-coded genetic algorithm. GAs are not used only for the design of artificial neural networks; they are used in many other domains, such as fuzzy systems; an example is the work of Rahmouni et al. [14]. Our study is inspired by these works. It focuses on the evolutionary design of Beta Basis Function Neural Networks (BBFNNs) with a hierarchical GA. We propose a hierarchical genetic algorithm for the design of BBFNNs, which incorporates two key ideas: an outer GA to find the optimum number of neurons in the hidden layer of the BBFNN, and an inner GA to find the parameters of the BBFNN with the best number of neurons in the hidden layer. The proposed methodology simultaneously derives the optimal structure and the optimal parameters of the BBFNN. Simulation results show the effectiveness of the proposed method.

The rest of the paper is organized as follows: Section II describes the Beta Basis Function Neural Network, an example of a feedforward neural network. In Section III, we present a hierarchical genetic algorithm for the design of BBFNNs. Numerical examples and a discussion are given in Section IV.

II. BETA BASIS FUNCTION NEURAL NETWORKS

The idea of using Beta functions for the design of Beta Basis Function Neural Networks (BBFNN), which are generalized versions of RBFNNs, was introduced by Alimi in 1997 [4]. The Beta function is used as a kernel function for many reasons, such as its great flexibility (see Fig. 1) and its universal approximation characteristics ([5], [6]). The Beta function is defined by:

$$\beta(x) = \beta(x, x_0, x_1, p, q) = \begin{cases} \left(\dfrac{x - x_0}{x_c - x_0}\right)^p \left(\dfrac{x_1 - x}{x_1 - x_c}\right)^q & \text{if } x \in [x_0, x_1] \\ 0 & \text{elsewhere} \end{cases} \tag{1}$$

where p > 0, q > 0, x_0 and x_1 are real parameters, and

$$x_c = \frac{p\,x_1 + q\,x_0}{p + q} \tag{2}$$

Let

$$D = x_1 - x_0 \tag{3}$$

be the width of the Beta function. So

$$x_0 = x_c - \frac{Dp}{p + q}, \qquad x_1 = x_c + \frac{Dq}{p + q} \tag{4}$$

From (1) and (4):

$$\beta(x) = \begin{cases} \left(1 + \dfrac{(p+q)(x - x_c)}{Dp}\right)^p \left(1 + \dfrac{(p+q)(x_c - x)}{Dq}\right)^q & \text{if } x \in \left[x_c - \dfrac{Dp}{p+q},\; x_c + \dfrac{Dq}{p+q}\right] \\ 0 & \text{elsewhere} \end{cases} \tag{5}$$

In the multi-dimensional case (dimension = N), if X = [x^1, x^2, ..., x^N], X_0 = [x_0^1, x_0^2, ..., x_0^N], X_1 = [x_1^1, x_1^2, ..., x_1^N], P = [p^1, p^2, ..., p^N] and Q = [q^1, q^2, ..., q^N], then

$$\beta(X, X_0, X_1, P, Q) = \prod_{i=1}^{N} \beta(x^i, x_0^i, x_1^i, p^i, q^i) \tag{6}$$

Alimi has shown ([3], [4]) that, for any given continuous real function and any arbitrary precision, there exists a Beta fuzzy basis function expansion that approximates it.

Fig. 1. Examples of the Beta function in the one- and two-dimensional cases.
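As an illustration, the following NumPy sketch evaluates the one-dimensional Beta kernel of Eqs. (1)-(2) and the multi-dimensional product of Eq. (6). The function names and the example parameter values are ours, not from the paper.

```python
import numpy as np

def beta_1d(x, x0, x1, p, q):
    """One-dimensional Beta function, Eq. (1); non-zero only on [x0, x1]."""
    xc = (p * x1 + q * x0) / (p + q)                 # centre, Eq. (2)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    inside = (x >= x0) & (x <= x1)
    xi = x[inside]
    out[inside] = ((xi - x0) / (xc - x0)) ** p * ((x1 - xi) / (x1 - xc)) ** q
    return out

def beta_nd(X, X0, X1, P, Q):
    """N-dimensional Beta function, Eq. (6): product of 1-D Beta functions."""
    return np.prod([beta_1d(X[i], X0[i], X1[i], P[i], Q[i])[0]
                    for i in range(len(X))])

# Example: a symmetric Beta kernel on [-1, 1] with p = q = 2 peaks at its centre xc = 0.
print(beta_1d([-1.0, 0.0, 0.5, 1.0], -1.0, 1.0, 2.0, 2.0))   # [0. 1. 0.5625 0.]
```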

III. THE GENETIC MODEL

Many papers have shown that genetic algorithms are able to find near-optimal solutions to complex problems [11]. In this paper we present a hierarchical genetic model for the design of the BBFNN. We use a discrete representation to code the center xc, the width D, p and q of each Beta basis function. Using genetic terminology, the first step in our data representation is to choose a representation for possible solutions. Each chromosome that represents a network is a matrix (see Fig. 2). The number of rows in this matrix is equal to the number of variables of the function to be approximated. The number of columns is variable, because the hidden layer has a variable number of neurons: a sequence of four genes codes one Beta function, where the first gene codes the center, the second gene codes the width, and the other two genes code p and q. The whole chromosome codes the set of parameters of the hidden layer. Since the repeated use of the same neuron in a neural network does not improve the approximation capability of the network, each chromosome is formed by distinct sequences.

Fig. 2. A chromosome that represents a BBFNN with N hidden neurons (the function has two variables).
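A possible concrete layout of such a chromosome is sketched below with NumPy: one row per input variable and four columns (centre, width, p, q) per hidden neuron, with a variable neuron count. The value ranges, the helper name random_chromosome and the use of real-valued genes (the paper's outer GA uses a discrete coding) are our own assumptions for illustration.

```python
import numpy as np

def random_chromosome(n_vars, n_min, n_max, rng=np.random.default_rng()):
    """A chromosome as in Fig. 2: one row per input variable, four genes
    (centre, width, p, q) per hidden neuron, with a variable neuron count."""
    n_neurons = rng.integers(n_min, n_max + 1)
    genes = np.empty((n_vars, 4 * n_neurons))
    genes[:, 0::4] = rng.uniform(-1.0, 1.0, (n_vars, n_neurons))   # centres xc
    genes[:, 1::4] = rng.uniform(0.1, 2.0, (n_vars, n_neurons))    # widths D
    genes[:, 2::4] = rng.uniform(0.5, 5.0, (n_vars, n_neurons))    # p
    genes[:, 3::4] = rng.uniform(0.5, 5.0, (n_vars, n_neurons))    # q
    return genes

chrom = random_chromosome(n_vars=2, n_min=2, n_max=6)
print(chrom.shape)          # (2, 4 * number_of_hidden_neurons)
```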

The second step is to choose the objective function. If one chooses

$$f^*(x) = \sum_{j=1}^{n_c} w_j \beta_j(x) \tag{7}$$

where n_c is the number of neurons in the hidden layer, β_j is the kernel function of the j-th neuron, w_j is the j-th weight and f^* is the output of the network, then, given a set of data (x_i, y_i), i = 1, 2, ..., N, one can obtain the connection weights, the parameters of each neuron and the number of neurons by minimizing the following objective function:

$$f_1(n_c, w, x_0, x_1, p, q) = \sum_{i=1}^{N} \left(y_i - f^*(x_i)\right)^T \left(y_i - f^*(x_i)\right) \tag{8}$$

If one uses this function as an objective function [7], the best structure that minimizes it has N hidden nodes. To provide a trade-off between network performance and network structure, the objective function can be amended to:

$$f_2(n_c, w, x_0, x_1, p, q) = \left[\log\left(N_{max} - n_c + 1\right) + \log\left(n_c - N_{min} + 1\right)\right] \frac{1}{1 + f_1} \tag{9}$$

where N_max and N_min are the maximum and the minimum number of neurons in the hidden layer.
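A sketch of how Eqs. (7)-(9) can be evaluated for one chromosome is given below: the hidden-layer responses are built from the Beta kernel, the output weights of Eq. (7) are obtained by least squares (as the paper does in the inner GA), and f2 follows our reading of the printed Eq. (9). The neuron format (per-dimension arrays Xc, D, P, Q), the function names and the example values are our own.

```python
import numpy as np

def beta_1d(x, x0, x1, p, q):
    # Beta kernel of Eq. (1), written so that it is exactly 0 outside [x0, x1]
    xc = (p * x1 + q * x0) / (p + q)
    left = np.clip((x - x0) / (xc - x0), 0.0, None)
    right = np.clip((x1 - x) / (x1 - xc), 0.0, None)
    return left ** p * right ** q

def hidden_matrix(X, neurons):
    """Row i, column j = response of hidden neuron j to sample X[i] (Eq. (6)).
       Each neuron is (Xc, D, P, Q) per dimension; x0, x1 follow from Eq. (4)."""
    H = np.ones((X.shape[0], len(neurons)))
    for j, (Xc, D, P, Q) in enumerate(neurons):
        X0 = Xc - D * P / (P + Q)                    # Eq. (4)
        X1 = Xc + D * Q / (P + Q)
        for d in range(X.shape[1]):
            H[:, j] *= beta_1d(X[:, d], X0[d], X1[d], P[d], Q[d])
    return H

def f1_and_f2(X, y, neurons, n_min, n_max):
    """Eq. (8): squared error with least-squares output weights;
       Eq. (9): trade-off between accuracy and hidden-layer size."""
    H = hidden_matrix(X, neurons)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)        # weights of Eq. (7)
    err = y - H @ w
    f1 = float(err @ err)                            # Eq. (8)
    nc = len(neurons)
    f2 = (np.log(n_max - nc + 1) + np.log(nc - n_min + 1)) / (1.0 + f1)   # Eq. (9)
    return f1, f2, w

# Example: two hidden neurons for a 1-D problem (all values arbitrary)
X = np.linspace(-1, 1, 50).reshape(-1, 1)
y = np.sin(np.pi * X[:, 0])
neurons = [(np.array([-0.5]), np.array([1.5]), np.array([2.0]), np.array([2.0])),
           (np.array([0.5]),  np.array([1.5]), np.array([2.0]), np.array([2.0]))]
print(f1_and_f2(X, y, neurons, n_min=1, n_max=10))
```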

After the choice of the objective function we randomly choose an initial population of M chromosomes. Each chromosome has between N_min and N_max neurons in the hidden layer. In the following, the genetic operators used in the outer and inner algorithms are described in detail.

A) The Outer Genetic Algorithm

1) The Crossover operators: The most important operator in a genetic algorithm is the crossover operator. It recombines the genetic material from two parents into their children. In our outer genetic algorithm we have used two crossover operators. One of them changes the number of columns of each chromosome, and thus the number of neurons in the hidden layer; the second does not change the number of neurons in the hidden layer.

a) The RN-Crossover1 operator: After selecting the two chromosomes to which the crossover operator will be applied, we choose an arbitrary position a in the first chromosome and a position b in the second chromosome according to a. Then we exchange the second parts of the two chromosomes. If one of the children has more than N_max or fewer than N_min neurons in the hidden layer, we choose other positions.

b) The RN-Crossover2 operator: After selecting the two chromosomes to which the crossover operator will be applied, we choose an arbitrary position a in the first chromosome and a position b in the second chromosome according to a. Let Min_point = Min(a, b). We first change the values of a and b to Min_point, then exchange the second parts of the two chromosomes. In this case the first child necessarily has the same length as the second chromosome, and the second child has the same length as the first chromosome.

2) The RN-Mating operator: When the position of a neuron in the hidden layer is changed, the output of the network does not change. For this reason, before applying the crossover operator we apply the mating operator, which changes the positions of the neurons in the hidden layer (see Fig. 3). With this operator the crossover operator is no longer a simple one-point crossover.
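The two crossover operators can be sketched as follows, treating a chromosome as a list of hidden neurons so that cut points always fall on neuron boundaries. The list representation and the retry loop for the size constraint are our own reading of the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def rn_crossover1(p1, p2, n_min, n_max):
    """Tail exchange at independent cut points: children may have a different
       number of hidden neurons than their parents (kept within [n_min, n_max],
       re-drawing the positions otherwise, as the paper describes)."""
    while True:
        a = rng.integers(1, len(p1))          # cut points on neuron boundaries
        b = rng.integers(1, len(p2))
        c1, c2 = p1[:a] + p2[b:], p2[:b] + p1[a:]
        if n_min <= len(c1) <= n_max and n_min <= len(c2) <= n_max:
            return c1, c2

def rn_crossover2(p1, p2):
    """Tail exchange at a common cut point min(a, b): each child inherits the
       length of the *other* parent, so neuron counts are only swapped."""
    m = min(rng.integers(1, len(p1)), rng.integers(1, len(p2)))
    return p1[:m] + p2[m:], p2[:m] + p1[m:]

# Chromosomes as lists of hidden neurons; here each neuron is just a label.
A = ['a1', 'a2', 'a3', 'a4']
B = ['b1', 'b2', 'b3', 'b4', 'b5', 'b6']
print(rn_crossover1(A, B, n_min=2, n_max=8))
print(rn_crossover2(A, B))
```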

Fig. 3. The RN-Mating operator.

3) The RN-Mutation operator: The goal of the mutation operator is to inject new information into the population because, in general, the initial population does not contain all the information that is essential to the solution. We begin by associating with each gene a real value between 0 and 1. If this value is less than the probability of mutation Pm, then the mutation operator is applied to the gene by changing its value.

4) The RN-Addition and RN-Elimination operators: Classic training algorithms for Beta Basis Function Neural Networks start with a predetermined network structure, and the quality of the response of the BBFNN depends strongly on its structure. With genetic algorithms, this problem is addressed by the two operators RN-Addition and RN-Elimination. The first operator adds a neuron to the hidden layer (see Fig. 4) and the second operator eliminates a neuron (see Fig. 5).

Fig. 4. The RN-Addition operator.

Fig. 5. The RN-Elimination operator.
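The outer-GA operators described above (mating, mutation, addition and elimination) can be sketched as follows, again treating a chromosome as a list of hidden neurons, each a tuple of per-dimension arrays (Xc, D, P, Q). The additive-noise mutation, the random choice of the neuron to delete, and all parameter values are stand-ins of ours; the paper operates on a discrete gene coding.

```python
import numpy as np

rng = np.random.default_rng(1)

def rn_mating(chrom):
    """Reorder the hidden neurons (the network output is order-independent)."""
    return [chrom[i] for i in rng.permutation(len(chrom))]

def rn_mutation(chrom, p_m=0.05, scale=0.1):
    """With probability p_m per gene, perturb its value (additive noise here,
       as a stand-in for the paper's discrete gene change)."""
    mutated = []
    for neuron in chrom:
        genes = tuple(np.where(rng.random(g.shape) < p_m,
                               g + scale * rng.standard_normal(g.shape), g)
                      for g in neuron)
        mutated.append(genes)
    return mutated

def rn_addition(chrom, new_neuron, n_max):
    """Append one hidden neuron if the size limit allows it."""
    return chrom + [new_neuron] if len(chrom) < n_max else chrom

def rn_elimination(chrom, n_min):
    """Delete one randomly chosen hidden neuron if the size limit allows it."""
    if len(chrom) <= n_min:
        return chrom
    k = rng.integers(len(chrom))
    return chrom[:k] + chrom[k + 1:]

# Example on a 2-D problem with toy parameter values
neuron = lambda: tuple(rng.uniform(size=2) for _ in range(4))   # (Xc, D, P, Q)
chrom = [neuron() for _ in range(3)]
chrom = rn_addition(rn_mutation(rn_mating(chrom)), neuron(), n_max=10)
print(len(chrom))   # 4
```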

B) The Inner Genetic Algorithm

1) The RN-Uniform Crossover operator: The RN-Uniform crossover operator is radically different from the RN-Crossover1 and RN-Crossover2 operators. Each gene in the child is created by copying the corresponding gene from one or the other parent, chosen according to a randomly generated crossover mask. Where there is a 1 in the crossover mask, the gene is copied from the first parent; where there is a 0, the gene is copied from the second parent. The process is repeated with the parents exchanged to produce the second offspring. A new crossover mask is randomly generated for each pair of parents. After the crossover operator we apply the mutation operator as in the outer genetic algorithm.

2) The New_center operator: The last operator in the inner genetic algorithm is the new_center operator. Just as Pm controls the probability of mutation, another parameter, Pad-el, gives the probability that this operator is applied to a chromosome. Suppose the chromosome represents a BBFNN with Nh neurons in the hidden layer. Let X = (x_1, x_2, ..., x_N) be the N training points and Z = (z_1, z_2, ..., z_N) the desired outputs. We find the vector of weights W = (w_1, w_2, ..., w_Nh) between the hidden layer and the output by the least squares method. Let Y = (y_1, y_2, ..., y_N) be the vector of outputs of the BBFNN, and let

$$Err = |Z - Y| = (|z_1 - y_1|, |z_2 - y_2|, ..., |z_N - y_N|).$$

Let j_1 be such that $\max_{1 \le i \le N} Err_i = Err_{j_1}$ and j_2 be such that $\min_{1 \le i \le N_h} w_i = w_{j_2}$; then the center of the j_2-th neuron in the hidden layer is set to X_{j_1}.

C) The Selection Operator

The selection operator used in both the inner and the outer genetic algorithm is truncation selection with a threshold T. In this selection scheme only the fraction T of best individuals can be selected, and they all have the same selection probability.
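The two operator pieces that are specific to this model, the new_center move and truncation selection, might look as follows. Whether the neuron to recentre is the one with the smallest weight or the smallest weight magnitude is not entirely clear from the text; the sketch uses the magnitude, and it assumes that a higher fitness value is better. All names are ours.

```python
import numpy as np

def new_center(centers, weights, X_train, Z, Y):
    """Move the centre of the least useful hidden neuron (smallest |weight|)
       onto the worst-approximated training point (largest error)."""
    j1 = np.argmax(np.abs(Z - Y))          # worst training point
    j2 = np.argmin(np.abs(weights))        # least useful hidden neuron
    centers = centers.copy()
    centers[j2] = X_train[j1]
    return centers

def truncation_selection(population, fitness, T=0.3, rng=np.random.default_rng()):
    """Keep only the best fraction T; parents are then drawn uniformly from them."""
    k = max(1, int(T * len(population)))
    best = np.argsort(fitness)[-k:]        # assumes higher fitness is better
    return [population[i] for i in rng.choice(best, size=len(population))]
```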


D) Algorithm

1. Randomly choose the initial population.
2. Decode each chromosome in the population.
3. Compute the connection weights from the hidden layer to the output layer.
4. Find the fitness f2 of each chromosome (find the best chromosome). Go to step 5, or go to step a:
   a. Randomly choose an initial population in which each chromosome defines a BBFNN with the same number of hidden neurons as the best solution of step 4.
   b. Decode each chromosome in the population.
   c. Compute the connection weights from the hidden layer to the output layer.
   d. Find the fitness f1 of each chromosome.
   e. Apply truncation selection.
   f. Apply the crossover, mutation and new_center operators.
   g. If the number of generations is equal to an integer N' or f1 < ε, go to step 5; otherwise return to step b.
5. Apply truncation selection.
6. Apply the RN-Crossover1 or RN-Crossover2, mutation, addition, elimination and new_center operators.
7. If the number of generations is equal to an integer N or f2 > M, stop; otherwise return to step 2.

Here f1 and f2 are the functions described in (8) and (9), N' is the maximum number of iterations of the inner GA, N is the maximum number of iterations of the outer GA, and ε and M are two thresholds.
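To make the control flow of steps 1-7 concrete, the following self-contained sketch runs a much simplified version of the hierarchical GA on a 1-D target: the outer loop searches over the number of hidden neurons using the f2 trade-off (treated here as a fitness to be maximised, consistent with the stopping rule f2 > M in step 7), and the inner loop refines the parameters of networks of the currently best size using f1. The full operator set of the paper (crossover, mating, new_center) is replaced by simple mutation and a size-change step, and the target function, population sizes and all parameter values are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.linspace(-1.0, 1.0, 100)                 # 1-D training set (cf. Sec. IV)
Z = np.sin(np.pi * X)                           # target to approximate (our choice)
N_MIN, N_MAX = 2, 15

def beta(x, xc, D, p, q):
    # Beta kernel written with centre xc and width D, Eqs. (4)-(5)
    x0, x1 = xc - D * p / (p + q), xc + D * q / (p + q)
    l = np.clip((x - x0) / (xc - x0), 0, None)
    r = np.clip((x1 - x) / (x1 - xc), 0, None)
    return l ** p * r ** q

def evaluate(ch):
    """Least-squares output weights, then f1 (Eq. 8) and f2 (Eq. 9)."""
    H = np.column_stack([beta(X, *row) for row in ch])
    w, *_ = np.linalg.lstsq(H, Z, rcond=None)
    f1 = float(np.sum((Z - H @ w) ** 2))
    f2 = (np.log(N_MAX - len(ch) + 1) + np.log(len(ch) - N_MIN + 1)) / (1 + f1)
    return f1, f2

def random_chrom(nc):
    # one row per hidden neuron: (xc, D, p, q)
    return np.column_stack([rng.uniform(-1, 1, nc), rng.uniform(0.5, 2, nc),
                            rng.uniform(1, 4, nc), rng.uniform(1, 4, nc)])

def mutate(ch, p_m=0.2, s=0.1):
    noise = s * rng.standard_normal(ch.shape) * (rng.random(ch.shape) < p_m)
    out = ch + noise
    out[:, 1:] = np.clip(out[:, 1:], 0.1, None)     # keep D, p, q positive
    return out

def resize(ch):
    # stand-in for RN-Addition / RN-Elimination: grow or shrink by one neuron
    if len(ch) < N_MAX and (len(ch) <= N_MIN or rng.random() < 0.5):
        return np.vstack([ch, random_chrom(1)])
    return np.delete(ch, rng.integers(len(ch)), axis=0)

def truncation(pop, scores, T=0.3):
    keep = np.argsort(scores)[-max(1, int(T * len(pop))):]
    return [pop[i] for i in rng.choice(keep, size=len(pop))]

# ----- outer GA over the number of hidden neurons (steps 1-7) -----
outer = [random_chrom(rng.integers(N_MIN, N_MAX + 1)) for _ in range(20)]
for gen in range(30):
    f2s = [evaluate(ch)[1] for ch in outer]
    best = outer[int(np.argmax(f2s))]
    # ----- inner GA: refine parameters at the best size (steps a-g) -----
    inner = [best] + [mutate(random_chrom(len(best))) for _ in range(19)]
    for _ in range(20):
        f1s = [-evaluate(ch)[0] for ch in inner]    # minimise f1
        inner = [mutate(ch) for ch in truncation(inner, f1s)]
    elite = min(inner, key=lambda c: evaluate(c)[0])
    outer = [elite] + [mutate(resize(ch)) for ch in truncation(outer, f2s)[1:]]

best = max(outer, key=lambda c: evaluate(c)[1])
print("hidden neurons:", len(best), " f1:", round(evaluate(best)[0], 4))
```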

IV. SIMULATION RESULTS AND DISCUSSION

We trained the BBFNN to approximate different test functions. We tested our hierarchical genetic algorithm on many approximands in the 1-D, 2-D, 3-D and 4-D cases. We sampled 100 points of the function in the 1-D case to form the training set. The 2-D case used 10x10 = 100 samples, the 3-D case used 7x7x7 = 343 samples and the 4-D case used 5x5x5x5 = 625 samples. Below are some examples of test functions we used as approximands:

$$f_1(x) = 10\left[\exp(-5x) + \exp(-0.3(x - 0.8)) + \exp(x + 0.6)\right], \quad -1 \le x \le 1$$

$$g_1(x, y) = \begin{cases} 1 & \text{if } x = 0 \text{ and } y = 0 \\ \dfrac{\sin x}{x} & \text{if } x \ne 0 \text{ and } y = 0 \\ \dfrac{\sin y}{y} & \text{if } x = 0 \text{ and } y \ne 0 \\ \dfrac{\sin x}{x}\,\dfrac{\sin y}{y} & \text{if } x \ne 0 \text{ and } y \ne 0 \end{cases} \quad -1 \le x, y \le 1$$

$$h_1(x, y, z) = \sin\left(\frac{\pi x}{2}\right) \cos\left(\frac{\pi y}{2}\right) \cos\left(\frac{\pi z}{2}\right), \quad -1 \le x, y, z \le 1$$

$$k_1(x, y, z, t) = 10\left[\exp(-5x) + \exp(-10(x + 0.6))\right] + 10\exp(-5(y + 0.9))\exp(-3(x - 0.8)/10) + 10(z - 1)(z - 1.9)(t - 0.7)(t + 1.8), \quad -1 \le x, y, z, t \le 1$$

Fig. 6. The output of the BBFNN and the desired output f1.

Fig. 7. The output of the BBFNN and the desired output g1.

Fig. 8. The output of the BBFNN and the desired output h1.

Fig. 9. The output of the BBFNN and the desired output k1.

Fig. 10. Evolution of the maximum, minimum and best number of neurons in the hidden layer (panels (a)-(d)).

If Y is the vector of desired outputs and Z is the vector of outputs of the BBFNN, then the percentage of successful approximation is

$$100\left(1 - \frac{\lVert Y - Z \rVert}{\lVert Y \rVert}\right).$$
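A one-line implementation of this success measure (function name and test values ours):

```python
import numpy as np

def percent_success(Y, Z):
    """100 * (1 - ||Y - Z|| / ||Y||), the paper's measure of approximation success."""
    return 100.0 * (1.0 - np.linalg.norm(Y - Z) / np.linalg.norm(Y))

print(percent_success(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.1, 2.9])))  # ~96.2
```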

Fig. 6 plots the output of the BBFNN and the desired output f1; the percentage of successful approximation is 99.4%. Fig. 7 plots the output of the BBFNN and the desired output g1; the percentage of successful approximation is 99.5%. Fig. 8 plots the output of the BBFNN and the desired output h1; the percentage of successful approximation is 96.7%. Fig. 9 plots the output of the BBFNN and the desired output k1; the percentage of successful approximation is 95.6%, which confirms the effectiveness of the BBFNN in the approximation task.

Fig. 10a plots the evolution of the maximum, minimum and best number of neurons in the hidden layer for f1; the best BBFNN has 14 neurons in the hidden layer. Fig. 10b plots the same evolution for g1; the best BBFNN has 32 hidden neurons. Fig. 10c plots the evolution for h1; the best BBFNN has 34 hidden neurons. Fig. 10d plots the evolution for k1; the best BBFNN has 56 hidden neurons. These results confirm our objective of showing that genetic algorithms can be used for the design of BBFNNs. We tested our hierarchical GA for the design of BBFNNs on many approximands and it performed well on all of them.

V. CONCLUSION

This paper presented a hierarchical GA to train BBFNNs. The problem was to find the optimal network structure and the different parameters of the network. Our genetic model can automatically determine appropriate structures and network parameters of Beta basis function neural networks. In order to find the network's optimal structure, the process modifies the number of neurons in the hidden layer. The performance of the algorithm is achieved by evolving the initial population and by using operators that alter the sizes of the networks. This strategy attempts to avoid Lamarckism. The originality of the genetic model resides in:
- a hierarchical GA for the design of BBFNNs;
- a genetic model that can automatically determine appropriate structures and network parameters of BBFNNs;
- the new genetic operators (the mating operator and the new_center operator);
- the design of BBFNNs used to approximate multi-variable functions by GA.

REFERENCES

[1] D. H. Ackley, "A connectionist algorithm for genetic search", Proc. Int. Conf. on Genetic Algorithms and their Applications, J. J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, NJ, (1985) 121-135.
[2] A. M. Alimi, R. Hassine, and M. Selmi, "Beta Fuzzy Logic Systems: Approximation Properties in the SISO Case", Int. J. Applied Mathematics & Computer Science, special issue edited by D. Rutkowska and L. A. Zadeh, vol. 10, no. 4, (2000) 101-119.
[3] A. M. Alimi, "What are the Advantages of Using the Beta Neuro-Fuzzy System?", Proc. IEEE/IMACS Multiconference on Computational Engineering in Systems Applications: CESA'98, Hammamet, Tunisia, April, vol. 2, (1998) 339-344.
[4] A. M. Alimi, "The Beta Fuzzy System: Approximation of Standard Membership Functions", Proc. 17emes Journees Tunisiennes d'Electrotechnique et d'Automatique: JTEA'97, Nabeul, Tunisia, Nov., vol. 1, (1997) 108-112.
[5] C. Aouiti, M. A. Alimi, and A. Maalej, "A Genetic Designed Beta Basis Function Neural Networks for approximating of multi-variables functions", Proc. Int. Conf. Artificial Neural Nets and Genetic Algorithms, Springer Computer Science: ICANNGA'2001, Prague, Czech Republic, April, (2001) 383-386.
[6] C. Aouiti, M. A. Alimi, and A. Maalej, "Genetic Algorithms to Construct Beta Neuro-Fuzzy Systems", Proc. Int. Conf. Artificial & Computational Intelligence for Decision, Control & Automation: ACIDCA'2000, Monastir, Tunisia, March, (2000) 88-93.
[7] S. A. Billings and G. L. Zheng, "Radial Basis Function Network Configuration Using Genetic Algorithms", Neural Networks, Vol. 8, No. 6, (1995) 877-890.
[8] C. Chui and X. Li, "Approximation by ridge functions and neural networks with one hidden layer", Journal of Approximation Theory, 70, (1992) 131-141.
[9] Y. Ito, "Representation of functions by superpositions of a step or sigmoidal function and their applications to neural network theory", Neural Networks, Vol. 4, (1991) 385-394.
[10] F. Rosenblatt, "The perceptron, a perceiving and recognizing automaton", Project PARA, Cornell Aeronautical Lab. Rep., No. 85-460-1, Buffalo, NY (1957).
[11] X. Yao, "Evolving Artificial Neural Networks", Int. J. Neural Systems, Vol. 4, (1993) 385-394.
[12] X. Yao and Y. Liu, "A new evolutionary system for evolving artificial neural networks", IEEE Transactions on Neural Networks, 8, 3, 1997.
[13] X. Yao, "Evolving artificial neural networks", Proceedings of the IEEE, 87, 9, September 1999, 1423-1447.
[14] A. Rahmouni and M. Benmohamed, "Genetic algorithm based methodology to generate automatically optimal fuzzy systems", IEE Proc. Control Theory Appl., Vol. 145, No. 6, (1998) 583-586.