GRAMMATICAL EVOLUTION FOR DEVELOPMENT OF NEURAL NETWORKS WITH REAL-VALUED WEIGHTS USING CELLULAR ENCODING

Jan Drchal, Miroslav Šnorek
Czech Technical University, Faculty of Electrical Engineering, Department of Computer Science and Engineering
Karlovo náměstí 13, 121 35, Prague, Czech Republic
email: {drchaj1|snorek}@fel.cvut.cz

KEYWORDS

Neural networks, Grammatical Evolution, Cellular Encoding

ABSTRACT

The Artificial Neural Network is a well-known tool for data modeling of systems. This paper focuses on so-called TWEANNs (Topology and Weight Evolving Artificial Neural Networks). TWEANNs are Evolutionary Algorithms (EAs) which evolve both the topology and the parameters (weights) of neural networks. Here, we concentrate on the use of an indirect developmental encoding, an approach inspired by the development of multi-cellular organisms from a single cell (zygote) known from Nature. We examine several modifications of a known tree-based indirect developmental encoding: the Cellular Encoding. Grammatical Evolution (GE) is employed instead of Genetic Programming (GP) to optimize program trees. GE is advantageous mainly in the way it handles constraints, as it evolves program trees which conform to a grammar prespecified in BNF notation. Moreover, we employ GE's inner mechanisms to efficiently encode neural network parameters (weights and biases). The Cellular Encoding is a neuron-centric approach, so increased attention must be paid to the way in which the proper synaptic link is selected prior to the modification of its parameters. In this work, we compare three different link selection schemes. The results of our investigation show that our modifications of Cellular Encoding improve the ability to evolve real-valued Artificial Neural Networks.

INTRODUCTION

Using Evolutionary Algorithms (EAs) (Michalewicz 1994) to learn Artificial Neural Networks (ANNs) (Tsoukalas and Uhrig 1996) is a well-examined approach, as EAs are very robust. Even harder optimization problems must be solved for TWEANNs (Topology and Weight Evolving Artificial Neural Networks): these algorithms search not only for

proper weight settings but also optimize the topologies of neural networks. The TWEANN approach is useful for the ANN user, who does not have to experiment with different topologies. Moreover, finding an (at least nearly) optimal topology leads to better data modeling results.

The problem of most recent TWEANN algorithms is their inability to evolve large-scale modular ANNs. This is mainly caused by the so-called curse of dimensionality, where optimization methods fail because of the high-dimensional space they must search through. Most current TWEANN methods use direct encoding approaches to represent ANNs. In a direct encoding, a single gene describes a single feature (e.g. a connection or a neuron) of an ANN. This is unlike Nature, where the genome rather represents a "program" for building the target organism. The human genome consists of roughly 30 000 genes and is able to encode more than 20 billion neurons, each linked to as many as 10 000 others, plus, of course, the rest of the organism. This efficient information storage can be seen as a kind of compression; hence, the translation of a genotype to a phenotype is a decompression process. Of course, such a high level of compression is only possible if the information is highly regular. This is true in Nature, as organisms and their brains are known to be highly modular, hierarchical systems. Artificial encodings which attempt to possess such attributes are known as indirect encodings. Indirect developmental encodings for ANNs can be found, for example, in the works of Eggenberger (Eggenberger-Hotz 1997, Eggenberger-Hotz et al. 2003) or Stanley (D'Ambrosio and Stanley 2007, Gauci and Stanley 2007).

This work particularly focuses on experiments with tree-based indirect encodings of ANNs. In (Gruau 1994), Gruau introduced an indirect encoding called Cellular Encoding (CE), in which the development of ANNs is controlled by evolved program trees. Trees are well-examined data structures, and there has already been a great deal of interest in developing Evolutionary Algorithms for tree structure optimization. The most widely known, Koza's Genetic Programming (GP) (Koza 1992), which was originally used to evolve

LISP program trees, was adopted by Gruau and used to optimize CE development trees. Here, we employ GP's successor, Grammatical Evolution (GE), as we have already used it in (Drchal and Šnorek 2008). In contrast with the previous work, where we experimented with Boolean neural networks, here we compare different modifications of CE which are able to encode ANNs with real-valued weights.

EVOLUTION OF TREE STRUCTURES

Koza's Genetic Programming (GP) (Koza 1992) is a well-known Evolutionary Algorithm for the optimization of tree structures. GP trees can represent mathematical models of data or programs. Figure 1 shows a simple function of two variables described by an evolved GP tree. The tree nodes are labeled by symbols which can represent operations, variables or constants. Each node has a predefined arity (number of child nodes). Constants and variables are located in leaf nodes. GP was already used for the evolution of ANNs (Koza and Rice 1991); however, that encoding was direct.
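As a concrete illustration (our own sketch, not taken from the cited works), the tree of Figure 1 can be represented and evaluated as follows; the Node class and all names are illustrative assumptions:

# Minimal sketch of a GP program tree (illustrative; names are our own).
# Each node holds a symbol and its children; leaves hold variables/constants.

class Node:
    def __init__(self, symbol, children=()):
        self.symbol = symbol            # operation, variable name, or constant
        self.children = list(children)

    def evaluate(self, env):
        # Recursively evaluate the tree given variable bindings in env.
        if self.symbol == '+':
            return self.children[0].evaluate(env) + self.children[1].evaluate(env)
        if self.symbol == '*':
            return self.children[0].evaluate(env) * self.children[1].evaluate(env)
        if self.symbol == '/':
            return self.children[0].evaluate(env) / self.children[1].evaluate(env)
        if isinstance(self.symbol, str):  # variable leaf
            return env[self.symbol]
        return self.symbol                # constant leaf

# The tree of Figure 1: f(x, y) = (x * 2) + (5 / y)
tree = Node('+', [Node('*', [Node('x'), Node(2)]),
                  Node('/', [Node(5), Node('y')])])
print(tree.evaluate({'x': 3, 'y': 5}))  # -> 7.0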

Figure 1: A tree evolved by Genetic Programming. The tree is assembled from 7 symbols: it uses three operations (+, ×, /), two variables (x, y) and two constants (2, 5). It represents the function f(x, y) = (x × 2) + (5/y).

GRAMMATICAL EVOLUTION

In (Ryan et al. 1998), Ryan, Collins and O'Neill introduced Grammatical Evolution (GE), an Evolutionary Algorithm able to evolve solutions according to a user-specified grammar. Unlike GP, GE brings more flexibility, as the user is able to constrain the way in which program symbols are assembled together. In GP it is impossible to control their order: any symbol can become a child or a parent of any other symbol. The GE approach is therefore able to radically cut down the search space. The grammar in GE is specified using the Backus-Naur Form (BNF). GE uses a linear genome of integer numbers (codons). When expanding a non-terminal symbol having a choice of n possible rewriting rules, we choose rule number i MOD n, where i is the integer value of the current codon. A problem arises when real-valued constants need to be encoded; most often, the grammar is expanded to do so. The other approach is to use the integer number which encodes a codon and transform it to the right interval.
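The codon-to-rule mapping just described can be sketched in a few lines. The following is a simplified illustration of this standard GE decoding step (no wrapping, and with a hypothetical toy grammar), not the implementation used in this paper:

# Sketch of the GE genotype-to-phenotype mapping (simplified illustration).
# The grammar maps each non-terminal to its list of rewriting rules.

grammar = {
    '<expr>': [['<expr>', '+', '<expr>'],
               ['<expr>', '*', '<expr>'],
               ['x'],
               ['y']],
}

def decode(codons, start='<expr>'):
    # Expand the leftmost non-terminal first; a choice among n rules
    # consumes one codon i and picks rule number i MOD n.
    # Assumes enough codons (wrapping is not modeled here).
    symbols, pos, out = [start], 0, []
    while symbols:
        sym = symbols.pop(0)
        if sym in grammar:
            rules = grammar[sym]
            rule = rules[codons[pos] % len(rules)]   # i MOD n
            pos += 1
            symbols = list(rule) + symbols
        else:
            out.append(sym)
    return ' '.join(out)

print(decode([0, 2, 3]))   # -> "x + y"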
CELLULAR ENCODING

Cellular Encoding (CE) was introduced by Gruau (1994), where a more detailed description can be found. In CE, the development of an ANN starts with a single cell (neuron). A development tree is traversed using breadth-first search starting from the root node. CE uses symbols for cell division (PAR, SEQ) and for updating cell registers such as the neuron bias or the input link pointer (B+, INC, etc.). The original Cellular Encoding was limited to Boolean Neural Networks, whose inputs and outputs are binary. Boolean Neural Networks use neurons which fire 1 for activations above a given threshold and 0 otherwise; weights can be either 1 or −1. The following list contains short descriptions of the symbols used in CE development trees (a simplified development sketch is given after the list):

• SEQ – sequential division: a new cell inherits the mother cell's outputs; the input of the new cell is connected to the mother cell's output using a connection of weight 1. Development instructions for the mother cell continue in the left subtree, while those for the new cell continue in the right.

• PAR – parallel division: create a new cell which has the same set of inputs and outputs as the mother cell.

• INC, DEC – increase/decrease the internal input link pointer.

• W+, W- – set the weight of the input link designated by the internal input link pointer to 1 or −1.

• B+, B- – increase/decrease the bias by 1.

• CUT – cut the incoming connection given by the internal input link pointer.

• WAIT – do nothing; continue with the next step (needed for synchronization).

• END – end processing of this cell.

Gruau later developed a modified method which is capable of encoding real-valued weights (Gruau et al. 1996).
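The development sketch promised above is a heavily simplified illustration of our own: it handles only SEQ, PAR, B+/B- and END, and omits the link pointers, weight symbols, CUT and WAIT. All names are hypothetical; this is not Gruau's implementation:

# Highly simplified CE development sketch (our own illustration).
from collections import deque

class Cell:
    def __init__(self, node):
        self.node = node      # current position in the development tree
        self.bias = 0
        self.inputs = []      # list of [source_cell, weight]

def develop(root, net_inputs):
    zygote = Cell(root)
    zygote.inputs = [[src, 1] for src in net_inputs]
    cells, queue = [zygote], deque([zygote])
    while queue:
        cell = queue.popleft()
        symbol, children = cell.node
        if symbol == 'SEQ':                 # child inherits mother's outputs
            child = Cell(children[1])
            for other in cells:             # rewire outputs onto the child
                for link in other.inputs:
                    if link[0] is cell:
                        link[0] = child
            child.inputs = [[cell, 1]]      # mother feeds child, weight 1
            child.bias = cell.bias
            cells.append(child)
            cell.node = children[0]         # mother continues in left subtree
            queue.extend([cell, child])
        elif symbol == 'PAR':               # child copies inputs and outputs
            child = Cell(children[1])
            child.inputs = [list(l) for l in cell.inputs]
            for other in cells:             # duplicate mother's outgoing links
                for link in list(other.inputs):
                    if link[0] is cell:
                        other.inputs.append([child, link[1]])
            child.bias = cell.bias
            cells.append(child)
            cell.node = children[0]
            queue.extend([cell, child])
        elif symbol in ('B+', 'B-'):        # register update, then continue
            cell.bias += 1 if symbol == 'B+' else -1
            cell.node = children[0]
            queue.append(cell)
        # 'END': the cell stops developing; nothing is re-queued.
    return cells

# Tree nodes are (symbol, [child_nodes]); END is a leaf.
tree = ('SEQ', [('PAR', [('END', []), ('B+', [('END', [])])]), ('END', [])])
print(len(develop(tree, net_inputs=['in0', 'in1'])))  # -> 3 cells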
TESTED GRAMMARS

This section describes the CE-based grammars which we have used in our experiments. Rather than using Gruau's real-valued weight encoding (Gruau et al. 1996), which was not suitable for Grammatical Evolution, we decided to modify the existing CE for Boolean Networks. The distinction is that we do not use any kind of special input and output cells (neurons). At the start, an initial cell (zygote) is fully connected to both the input neurons and the output neurons.

This modification led us to introduce a new cell register, the output link pointer, which is an exact counterpart of the input link pointer mentioned above. It selects the neuron's output link for link operations such as setting a weight or cutting a link. The CE, of course, had to be extended by counterparts of INC, DEC and of all the symbols which manipulate links (W+, W- and CUT). These changes to the original CE seem only to increase the dimensionality of the problem. However, we found that it was impossible to evolve real-valued ANNs for even the simplest tasks without the ability to control both the input and the output links of a neuron. This will be addressed in further research. Note that we have omitted the CUT symbol in all of our grammars, as it was not needed by the test problem (see the next Section). It can be simply added, though.

PNT-GRA grammar

The grammar designated PNT-GRA (link pointer-based, grammar encoded parameters) resembles the original CE described in the previous section (note that the program symbols are shortened in order to save space):

<prog>  ::= N | n | S | P | I | i | O | o | W | w | B
<const> ::= <digit><digit><digit>
<digit> ::= 0|1|2|3|4|5|6|7|8|9

The non-terminal <prog>, which represents a program tree, is also the starting symbol. The symbol N corresponds to END, n to WAIT, S to SEQ and P to PAR. The symbols I, i, O and o increase/decrease the cell's input/output link pointer. The symbols W and w set the proper input/output weight to <const>, and in a similar way the symbol B sets the cell's bias to <const>. Note that the non-terminal <const> produces integers in the range from 0 to 999. In the final network, this integer is linearly transformed onto the interval −100..100, for both weights and biases.

PNT-INT

The PNT-INT (link pointer-based, integer encoded parameters) grammar uses a different way to encode constants:

<prog>  ::= N | n | S | P | I | i | O | o | W | w | B
<const> ::= C | c

Here, the <const> non-terminal was replaced by one which can be rewritten to two special symbols, C and c. The value of the codon used to choose the rewriting rule also serves as the parameter value. Clearly, we have to use at least two symbols (C and c): in the case of only a single rewriting rule, no codon would be consumed. To match the grammar-encoded parameter approach (GRA), the codons used were likewise represented by integers from 0 to 999. Note that, in comparison with the GRA approach, which needs three codons to encode a parameter with the given precision, INT suffices with only one. Both encodings are contrasted in the sketch below.
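The following minimal sketch (our own) contrasts the two parameter encodings; the exact affine map from 0..999 onto −100..100 is our assumption, as the text only states that the transform is linear:

# Sketch of the GRA and INT parameter encodings (affine map is assumed).

def to_weight(n):
    # Map an integer 0..999 linearly onto the interval [-100, 100].
    return n * 200.0 / 999.0 - 100.0

def decode_gra(codons):
    # GRA: three digit codons (each taken modulo 10) form one integer 0..999.
    digits = [c % 10 for c in codons[:3]]
    return to_weight(digits[0] * 100 + digits[1] * 10 + digits[2])

def decode_int(codon):
    # INT: a single codon (kept in 0..999) is used directly.
    return to_weight(codon % 1000)

print(decode_gra([9, 9, 9]))   # -> 100.0
print(decode_int(0))           # -> -100.0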

ABS-GRA

The ABS-GRA (absolute link pointers, grammar encoded parameters) grammar is based on the PNT-GRA grammar:

<prog>  ::= N | n | S | P | W | w | B
<int>   ::= C | c
<const> ::= <digit><digit><digit>
<digit> ::= 0|1|2|3|4|5|6|7|8|9

Both the input link pointer and the output link pointer were removed. On the other hand, the link modification symbols (W and w) were given an additional integer input <int>. The input/output link is selected as this integer taken modulo the current number of input/output links.

ABS-INT

The ABS-INT grammar (absolute link pointers, integer encoded parameters) further simplifies the previous grammar by using the integer parameter encoding:

<prog> ::= N | n | S | P | W | w | B
<int>  ::= C | c

REL-INT

The REL-INT grammar (relative link pointers, integer encoded parameters) is an attempt to simplify the classical PNT link pointer approach:

<prog> ::= N | n | S | P | Y | y | W | w | B
<int>  ::= C | c

In the classical approach, the symbols I, i, O and o increase/decrease the relevant link pointers. When the number of a neuron's input/output links is high, this leads to program trees with a high number of repeated symbols. The REL approach introduces the symbols Y and y, which change the input/output link pointer by an integer number from the interval −5..5. The three selection schemes are sketched below.
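The following short sketch (our own illustration; function names are hypothetical) summarizes how each scheme picks the link to be modified:

# Sketch of the three link selection schemes.

def select_pnt(pointer, n_links):
    # PNT: a pointer register, stepped beforehand by I/i (or O/o) symbols.
    return pointer % n_links

def select_abs(argument, n_links):
    # ABS: the W/w symbol carries an integer; the link index is its modulo.
    return argument % n_links

def select_rel(pointer, delta, n_links):
    # REL: Y/y shift the pointer by an integer from the interval -5..5.
    return (pointer + delta) % n_links

print(select_abs(7, 3))        # -> 1 (7 mod 3)
print(select_rel(2, -5, 4))    # -> 1 ((2 - 5) mod 4)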

EXPERIMENTS

In this section we show the results obtained using the grammars introduced in the previous section. We have evolved ANNs with real-valued weights which solve the well-known XOR problem. The XOR problem, although very simple, is a satisfactory tool for showing the different behaviour of the proposed encodings. The XOR function is described by Table 1:

Table 1: The XOR function table. The x_1^i, x_2^i are the function parameters, while y^i designates the function value.

 i | x_1^i | x_2^i | y^i
---+-------+-------+----
 1 |   0   |   0   |  0
 2 |   0   |   1   |  1
 3 |   1   |   0   |  1
 4 |   1   |   1   |  0

The fitness f of a developed ANN N was computed using the following equation:

f(N) = 4.0 - \sum_{i=1}^{4} \left| N(x_1^i, x_2^i) - y^i \right|    (1)
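A minimal sketch of the fitness computation in Eq. (1); here `network` stands for any callable mapping the two inputs to a real output, which is an assumption made purely for illustration:

# Sketch of the fitness from Eq. (1) over the XOR table (Table 1).

XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def fitness(network):
    # f(N) = 4.0 minus the sum of absolute output errors over the XOR table.
    return 4.0 - sum(abs(network(x1, x2) - y) for (x1, x2), y in XOR_TABLE)

# A network is considered successful when fitness(network) >= 3.9.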

Table 2: A comparison of link selection schemes PNT and ABS for the grammatical approach of parameter encoding (grammars PNT-GRA and ABS-GRA). The table shows average (avg), standard deviation (stdv), minimum (min) and maximum (max) values for both the number of generations needed for optimization and the number of symbols in the evolved program trees.

                  |   PNT  |   ABS
Generations   avg | 212.26 | 192.03
             stdv |  34.73 |  35.11
              min |    123 |     59
              max |    307 |    285
Symbols       avg |  75.44 |  69.29
             stdv |  33.89 |  29.44
              min |     26 |     27
              max |    303 |    293

The network was considered successful when its fitness f(N) ≥ 3.9. All experiments were done using a population of 1500 individuals, with an initial chromosome size of 20, a crossover probability of 0.9 and a mutation probability of 0.1. Wrapping was turned off. We used tournament selection with a tournament size of 2 (a minimal sketch of this selection scheme is given below). The results averaged over 1000 runs are shown in Tables 2 and 3. Table 2 summarizes the grammatical approach of parameter encoding (PNT-GRA and ABS-GRA). We can see that the ABS link selection performs about 10% better than the classical PNT approach in the number of generations needed for optimization. Also, the length of the resulting ANN encoding is shorter, by 8% on average. Table 3 summarizes the integer approach (PNT-INT, ABS-INT and REL-INT). It is clear that for both the PNT and the ABS link selection schemes the integer approach outperforms the grammatical one. The ABS-INT is the winner among all tested grammars: it beats the classic Gruau-like PNT-GRA by about 33% in the number of generations and by 23% in the number of symbols needed. The REL scheme was the least successful of all the grammars using the integer approach.
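For completeness, a minimal sketch of the tournament selection scheme used (size 2 by default); the population and fitness representations are our own assumptions:

# Sketch of tournament selection (tournament size 2 in our experiments).
import random

def tournament_select(population, fitnesses, size=2):
    # Draw `size` distinct individuals at random and return the fittest one.
    contestants = random.sample(range(len(population)), size)
    return population[max(contestants, key=lambda i: fitnesses[i])]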

Table 3: A comparison of link selection schemes PNT, ABS and REL for the integer approach of parameter encoding (grammars PNT-INT, ABS-INT and REL-INT).

                  |   PNT  |   ABS  |   REL
Generations   avg | 148.81 | 133.04 | 210.06
             stdv |  25.79 |  29.32 |  34.04
              min |     48 |     39 |      3
              max |    216 |    216 |    319
Symbols       avg |  57.82 |  51.67 |  58.52
             stdv |  25.00 |  22.47 |  25.63
              min |     18 |     19 |     19
              max |    182 |    172 |    213

CONCLUSIONS

In this paper, several modifications of Cellular Encoding (CE) for real-weighted Artificial Neural Networks (ANNs) were presented. The modifications take advantage of the Grammatical Evolution (GE) algorithm used for evolving CE program trees. The comparison was done in two ways:

• Comparison of different approaches to parameter encoding (weights and biases). We compared a classical grammar approach (GRA) and an integer approach (INT) which benefits from the inner mechanism of the GE algorithm, whose genome consists of integer codons.

• Comparison of different link selection schemes: the classical pointer-to-link scheme (PNT), our absolute selection scheme (ABS) and our modification of the PNT scheme, the relative selection scheme (REL).

The experiments showed the superiority of the INT over the GRA parameter encoding approach. This can be explained by the fact that INT is able to encode shorter genomes (in our case only a single symbol was needed instead of three to encode a parameter). Also, the Evolutionary Algorithm benefits from the ordering of integer numbers when its mutation operators are well designed. In the comparison of link selection schemes, the ABS scheme came out as the winner. Again, this can be explained by the compactness of the encoding. Another reason may be the fact that each link modification operator is explicitly forced to select a target link. The REL scheme came out as the worst; however, we suppose that when a kind of recurrent tree evaluation operator is used (see the R program symbol in Gruau (1994)), it may bring results. In the future, it may also be interesting

to experiment with a combination of the ABS and REL approaches. The combination of an indirect developmental encoding and a proper optimization method is assumed to bring compact genomes able to describe large-scale, modular neural networks. Further experiments with our modifications of Cellular Encoding should be aimed in this direction.

ACKNOWLEDGMENTS

This research is partially supported by the grant Automated Knowledge Extraction (KJB201210701) of the Grant Agency of the Academy of Sciences of the Czech Republic, by the research program "Transdisciplinary Research in the Area of Biomedical Engineering II" (MSM6840770012) sponsored by the Ministry of Education, Youth and Sports of the Czech Republic, and by the CTU IGS under grant CTU0812113.

REFERENCES

D'Ambrosio D.B. and Stanley K.O., 2007. A novel generative encoding for exploiting neural network sensor and output geometry. In GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. ACM, New York, NY, USA. ISBN 978-1-59593-697-4, 974–981. doi:10.1145/1276958.1277155.

Drchal J. and Šnorek M., 2008. Tree-based Indirect Encodings for Evolutionary Development of Neural Networks. In Artificial Neural Networks – ICANN 2008, 18th International Conference Proceedings. Springer, Heidelberg, vol. 2. ISBN 978-3-540-87535-2, ISSN 0302-9743, 839–848.

Eggenberger-Hotz P., 1997. Creation of Neural Networks Based on Developmental and Evolutionary Principles. In ICANN '97: Proceedings of the 7th International Conference on Artificial Neural Networks. Springer-Verlag, London, UK. ISBN 3-540-63631-5, 337–342.

Eggenberger-Hotz P.; Gómez G.; and Pfeifer R., 2003. Evolving the morphology of a neural network for controlling a foveating retina and its test on a real robot. In Artificial Life VIII: The 8th International Conference on the Simulation and Synthesis of Living Systems.

Gauci J. and Stanley K., 2007. Generating large-scale neural networks through discovering geometric regularities. In GECCO '07: Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation. ACM, New York, NY, USA. ISBN 978-1-59593-697-4, 997–1004. doi:10.1145/1276958.1277158.

Gruau F., 1994. Neural Network Synthesis using Cellular Encoding and the Genetic Algorithm. Ph.D. thesis, France. URL citeseer.ist.psu.edu/frederic94neural.html.

Gruau F.; Whitley D.; and Pyeatt L., 1996. A Comparison between Cellular Encoding and Direct Encoding for Genetic Neural Networks. In J.R. Koza; D.E. Goldberg; D.B. Fogel; and R.L. Riolo (Eds.), Genetic Programming 1996: Proceedings of the First Annual Conference. MIT Press, Stanford University, CA, USA, 81–89. URL citeseer.ist.psu.edu/gruau96comparison.html.

Koza J.R., 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, USA. ISBN 0-262-11170-5.

Koza J.R. and Rice J.P., 1991. Genetic Generation of Both the Weights and Architecture for a Neural Network. In International Joint Conference on Neural Networks, IJCNN-91. IEEE Computer Society Press, Seattle, WA, USA, vol. II. ISBN 0-7803-0164-1, 397–404. URL citeseer.ist.psu.edu/koza91genetic.html.

Michalewicz Z., 1994. Genetic Algorithms + Data Structures = Evolution Programs (2nd, extended ed.). Springer-Verlag, New York, NY, USA. ISBN 3-540-58090-5.

Ryan C.; Collins J.J.; and O'Neill M., 1998. Grammatical Evolution: Evolving Programs for an Arbitrary Language. In EuroGP, 83–96.

Tsoukalas L.H. and Uhrig R.E., 1996. Fuzzy and Neural Approaches in Engineering. John Wiley & Sons, Inc., New York, NY, USA. ISBN 0471160032.