Evolutionary Reinforcement Learning for Neurofuzzy Control

Munir-ul M. CHOWDHURY, Yun LI
Centre for Systems and Control and Dept. of Electronics and Electrical Engineering, University of Glasgow, Rankine Building, Glasgow G12 8LT, UK

Abstract. Traditional reinforcement learning techniques suffer from complicated structures and from training algorithms that are often reliant on derivative information of the problem domain and require a priori information about the network architecture. Such handicaps are overcome in this paper with the use of 'messy genetic algorithms', whose main characteristic is a variable-length chromosome. This paper presents a novel approach to globally optimised, on-line learning fuzzy controllers for cases where supervised learning is difficult. The design method is based on the functional reasoning of fuzzy logic combined with the reinforcement learning paradigm. In addition to structurally optimising the neurofuzzy network, the messy genetic algorithm has proved extremely flexible and computationally efficient. In keeping with reinforcement learning tradition, the evolutionary reinforcement learning based neurofuzzy controller is applied to the potentially unstable, non-linear cart-pole balancing problem.

Keywords: Neurofuzzy Control, Messy Genetic Algorithms, Reinforcement Learning

1. INTRODUCTION

Recent realisation of the similarity between neural and fuzzy methods, and of the potential of integrating them, has resulted in the development of numerous neurofuzzy controller (NFC) design methods [1-3]. The motivation for such integration lies in the fact that fuzzy logic can emulate human reasoning and logic and is model independent, but its design is very much driven by trial and error and human experience. Neural networks, on the other hand, can emulate the human brain and thought process, have the ability to learn from past actions, and process information in parallel. However, many existing neurofuzzy control design methods are based on the backpropagation algorithm [4,5] and are inadequate in that the network structure is fixed and the network requires quality training data. Obtaining accurate, high-quality training data for engineering systems is probably the primary difficulty, because often it is not possible to do so, or the algorithms producing the training data are not in general predictive and cannot truly represent the real world.

An alternative learning strategy that has been receiving attention of late in learning fuzzy design methods is reinforcement learning (RL) [6]. However, traditional reinforcement learning techniques have complicated structures and training algorithms that are often reliant on derivative information of the problem domain, and they also require a priori information about the network architecture. Numerous attempts have been made to overcome some of these problems using evolutionary or genetic algorithm techniques [7-9]. In a regular genetic algorithm, however, a difficulty exists in encoding the problem with highly fit gene combinations of a fixed length. The required linkage format for the structure of the controller to be coded is not exactly known, and the chance of obtaining such a linkage in a random generation of coded chromosomes is slim. This issue can be addressed using a 'messy genetic algorithm' (mGA), whose main characteristic is the variable length of its chromosomes. The first attempt to use an mGA in fuzzy control was to design hierarchical fuzzy controllers, based on techniques used for fuzzy classifier systems [10]. There, the rule base is constructed from general rules and from special rules relating to special circumstances. In this paper, no such structure is necessary, since neurocontrol techniques are used. To this effect, the aims of this paper are, first, to present a brief overview of our earlier work on neurofuzzy network optimisation using an mGA and, secondly, to extend that work to reinforcement learning. The layout of the remainder of the paper is as follows. Section 2 gives a brief overview of mGAs and Section 3 outlines the type of neurofuzzy structure employed. The reinforcement learning algorithm is described in Section 4 and results of a simple simulation are presented in Section 5.

2. MESSY GENETIC ALGORITHMS

Genetic algorithms (GAs) are loosely modelled on processes that appear to be at work in biological evolution and in the working of the immune system [9,10]. Central to an evolutionary system is the idea of a population of genotypes, which are elements of a high-dimensional search space. More generally, a genotype can be thought of as an arrangement of genes, where each gene takes on values from a suitably defined domain. Each genotype encodes typically one, but possibly a set of, candidate solutions (phenotypes), in our case a class of neurofuzzy architectures. The evolutionary process works on a population of such genotypes, preferentially selecting and reproducing genotypes that code for high-fitness phenotypes. Genetic operators such as mutation, crossover and inversion are used to introduce variety into the population and to sample variants of the candidate solutions represented within the current population. Thus, by survival of the fittest over several generations, the population gradually evolves towards genotypes that correspond to high-fitness phenotypes.

A GA is a non-deterministic search algorithm based on the ideas of genetics. GAs mimic the Darwinian theory of natural selection and evolution, tending to find optimal solutions to problems rather than trying to solve them directly [9,11]. As global optimisation methods requiring no derivative information, GAs have been successfully applied to many fuzzy control applications, but not without objections. The problem arises with the encoding of the problem parameters. In a regular GA, a coded chromosome is of fixed length, and highly fit allele combinations must be formed for convergence towards the global optimum. Unfortunately, the required linkage format (the structure of the controller to be coded) is not exactly known, and the chance of obtaining such a linkage in a random generation of coded strings is poor. Poor linkage also means that the probability of the genetic operators disrupting a building block is much higher [12]. Although inversion and reordering methods can be used to adaptively search for tight gene ordering, these are too slow to be considered useful.

In our previous work we developed a new learning method using a messy GA [13]. Fig. 1 illustrates the pseudocode for an mGA. The main differences between an mGA and a regular GA are that the mGA uses varying string lengths; its coding scheme considers both allele positions and allele values; the crossover operator is replaced by two new operators called cut and splice; and it works in two phases, a primordial phase and a juxtapositional phase. The selection mechanism is as in a regular GA but is executed in both phases.
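As an illustration, the following minimal C sketch shows one way to realise messy coding and the cut and splice operators just described. It is our illustration under stated assumptions, not the authors' implementation; the type names, the MAX_GENES capacity and the express() resolution routine are all assumed.

    /* messy_ops.c - a minimal sketch (assumptions, not the authors' code)
     * of messy coding and the cut/splice operators described above. */
    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_GENES 32   /* capacity of one messy string (assumption) */

    /* A messy gene names its locus explicitly, so strings may be of any
     * length and may over- or under-specify the problem variables. */
    typedef struct { int position; int allele; } MessyGene;

    typedef struct {
        MessyGene gene[MAX_GENES];
        int       length;          /* variable, unlike a regular GA string */
    } MessyString;

    /* cut: split one string into two at a random point (length >= 2) */
    static void cut(const MessyString *p, MessyString *a, MessyString *b)
    {
        int point = 1 + rand() % (p->length - 1);
        a->length = point;
        b->length = p->length - point;
        for (int i = 0; i < a->length; i++) a->gene[i] = p->gene[i];
        for (int i = 0; i < b->length; i++) b->gene[i] = p->gene[point + i];
    }

    /* splice: concatenate two strings end to end */
    static void splice(const MessyString *a, const MessyString *b,
                       MessyString *out)
    {
        assert(a->length + b->length <= MAX_GENES);
        out->length = a->length + b->length;
        for (int i = 0; i < a->length; i++) out->gene[i]             = a->gene[i];
        for (int i = 0; i < b->length; i++) out->gene[a->length + i] = b->gene[i];
    }

    /* express: resolve a messy string onto a fixed-length phenotype.
     * Duplicated loci are taken first come, first served; loci never
     * named are filled in from the competitive template. */
    static void express(const MessyString *s, const int *templ,
                        int *pheno, int n)
    {
        assert(n <= MAX_GENES);
        int seen[MAX_GENES] = {0};
        for (int i = 0; i < n; i++) pheno[i] = templ[i];
        for (int i = 0; i < s->length; i++) {
            int p = s->gene[i].position;
            if (p >= 0 && p < n && !seen[p]) {
                pheno[p] = s->gene[i].allele;
                seen[p]  = 1;
            }
        }
    }

    int main(void)
    {
        /* parent over-specifies locus 0 and never names locus 1 */
        MessyString parent = { { {0,1}, {3,1}, {0,0}, {2,1} }, 4 };
        MessyString a, b, child;
        int templ[4] = {0}, pheno[4];

        cut(&parent, &a, &b);
        splice(&b, &a, &child);          /* splicing order is arbitrary */
        express(&child, templ, pheno, 4);
        for (int i = 0; i < 4; i++) printf("%d ", pheno[i]);
        putchar('\n');
        return 0;
    }

Because each gene carries its own locus, a cut-and-spliced string may name a locus twice (over-specification, resolved first come, first served in express()) or not at all (under-specification, filled in from the competitive template; compare "template = zeros" in Fig. 1).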

During the primordial phase, the population is first initialised to contain all possible building blocks of a particular length; thereafter, only the selection operator is applied. This results in a population enriched with building blocks whose combination will create optimal or near-optimal strings. Also during this phase, the population size is reduced by halving the number of individuals at specified intervals. The juxtapositional phase follows the primordial phase, and here the GA invokes cut, splice and the other GA operators.

Fig. 1. Pseudocode for the messy genetic algorithm (the listing is truncated in this copy; a reconstruction follows).
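Reading the surviving fragment of Fig. 1 ("template = zeros" and a loop over levels) against Goldberg's published level-wise mGA, the figure can plausibly be reconstructed as below. Every identifier here (PROBLEM_LENGTH, MAX_LEVEL, initialise_level, primordial_phase, juxtapositional_phase, copy_best_to_template) is an assumed name, not the authors' exact code.

    /* Hedged reconstruction of the truncated Fig. 1 listing, following
     * Goldberg's level-wise mGA; all identifiers are assumptions. */
    #include <string.h>

    #define PROBLEM_LENGTH 16  /* assumption: number of genes in the problem   */
    #define MAX_LEVEL       3  /* assumption: highest building-block order run */

    static int template_[PROBLEM_LENGTH];

    /* Stubs standing in for the routines named in the text. */
    static void initialise_level(int k)     { (void)k; } /* enumerate all order-k building blocks  */
    static void primordial_phase(void)      { }          /* selection only; population halved      */
    static void juxtapositional_phase(void) { }          /* cut, splice and the other GA operators */
    static void copy_best_to_template(void) { }          /* best string refines the template       */

    void mGA(void)
    {
        memset(template_, 0, sizeof template_);          /* "template = zeros" */
        for (int level = 1; level <= MAX_LEVEL; level++) {
            initialise_level(level);     /* partially enumerative initialisation */
            primordial_phase();          /* enrich useful building blocks        */
            juxtapositional_phase();     /* juxtapose them into full solutions   */
            copy_best_to_template();     /* seed the next level                  */
        }
    }

The level loop reflects the mGA's bootstrapping design: each pass processes building blocks of one order, and the best string found fills in the unspecified loci of candidate strings at the next level.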