Genetic Programming

Basics of Genetic Programming and GPLab Toolbox

Dr. Asifullah Khan

Pattern Recognition Lab Department of Computer and Information Sciences Pakistan Institute of Engineering and Applied Sciences (PIEAS)

Things to be Discussed
• Evolutionary Computing and its Types
• Introduction to Genetic Programming
• GPLab (Genetic Programming Toolbox)
• Modules of GPLab
• Parameter settings
• Graphical outputs
• GP examples
• GPLab-based implementation

Genetic Programming (GP)
• “Genetic Programming is the automated learning of computer programs” [1].
• GP algorithms are inspired by the theory of evolution.

Natural Evolution
The Darwinian theory of evolution can be summarized as follows:
• In a world with limited resources and stable populations, each individual competes with others for survival.
• Individuals with the “best” characteristics (traits) are more likely to survive and to reproduce.
• These desirable characteristics are passed on to their offspring from generation to generation.

Natural Evolution
“Viewed as a learning process, natural evolution results in very long-time learning from the collective experience of generations of populations.”
In other words, every living organism is the result of millions of years of learning by its ancestors about how to survive on earth long enough to reproduce.

Natural Evolution
There are four essential preconditions for the occurrence of evolution by natural selection [1]:
1. Reproduction of individuals in the population
2. Variation that affects the likelihood of survival of individuals
3. Heredity in reproduction
4. Finite resources causing competition

Natural Evolution “Competition results in winners or losers”

Natural Evolution “Generally, the most-fit pass on their traits”

Natural Evolution “Evolution requires a diverse population”


Evolution “Mating and mutation create feature diversity from among the pool of mostly advantageous traits”

“It takes thousands of cycles for truly amazing adaptations to emerge”

Evolutionary Computing “Evolutionary computing is the branch of computational intelligence that models the process of natural evolution”


Evolutionary Computing
• Several evolutionary algorithms (EAs) have been developed so far. Some of the common evolutionary algorithms are:
– Differential Evolution
– Cultural Algorithm
– Genetic Algorithms
– Genetic Programming
– ……

Differential Evolution (DE)
• Differential Evolution was proposed by Price and Storn in 1995.
• DE differs from other EAs in that distance and direction information from the current population is used to guide the search process [2].
• Nowadays, it is considered one of the most powerful evolutionary algorithms for real-valued function optimization.
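To make “distance and direction information” concrete, here is a minimal Python sketch of the common DE/rand/1/bin variant: each target vector is challenged by a trial vector built from a third member plus a scaled difference of two others. The sphere objective, scale factor F = 0.8, crossover rate CR = 0.9, population size and bounds are illustrative assumptions, not values from the slides.

```python
import random

def sphere(x):
    # Illustrative objective (an assumption): sum of squares, minimum 0 at the origin.
    return sum(v * v for v in x)

def differential_evolution(f, dim, pop_size=20, F=0.8, CR=0.9,
                           generations=200, bounds=(-5.0, 5.0), seed=0):
    """Minimize f with DE/rand/1/bin (no bound handling after initialization)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = [x[:] for x in pop], fit[:]
        for i in range(pop_size):
            # Three distinct members, all different from the target vector i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Mutant: base vector a moved along the difference (b - c); the
            # difference supplies both the direction and the step length.
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover; index j_rand always takes the mutant's value.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (k == j_rand or rng.random() < CR) else pop[i][k]
                     for k in range(dim)]
            # Greedy one-to-one selection: keep the trial if it is not worse.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]

best_x, best_f = differential_evolution(sphere, dim=3)
print(best_x, best_f)
```

The step size shrinks automatically as the population contracts around the optimum, because the differences between members shrink with it.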

Cultural Algorithm (CA)

[Figure: Cultural Algorithm framework, showing a belief space (adjusting beliefs) and a population space (evolutionary operators, performance function) connected by an acceptance function and an influence function.]

General Features of CA
• Dual inheritance (at the population and knowledge levels)
• Knowledge acts as “beacons” that guide the evolution of the population
• Supports self-adaptation at various levels
• Evolution can take place at different rates at different levels (“Culture evolves 10 times faster than the biological component”)
• Supports hybrid approaches to problem solving

Genetic Algorithm


Genetic Algorithms (GA)
• The first evolutionary computing technique, developed to simulate genetic systems
• A probabilistic search algorithm that iteratively transforms a set of mathematical objects (each with an associated fitness value) into a new population using the Darwinian principle of natural selection
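A minimal GA sketch in Python, assuming a bit-string representation and the toy OneMax problem (maximize the number of 1 bits). The tournament size, crossover and mutation rates, and population size are illustrative choices, not part of the slides; fitness drives selection, and crossover and mutation produce each new population.

```python
import random

def onemax(bits):
    # Illustrative fitness (an assumption): number of 1 bits, to be maximized.
    return sum(bits)

def tournament(pop, fits, rng, k=2):
    # Return the fitter of k randomly chosen individuals.
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]

def genetic_algorithm(n_bits=20, pop_size=30, generations=100, p_cross=0.9, seed=1):
    rng = random.Random(seed)
    p_mut = 1.0 / n_bits                       # per-bit mutation probability
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [onemax(ind) for ind in pop]
        best_idx = max(range(pop_size), key=lambda i: fits[i])
        new_pop = [pop[best_idx]]              # elitism: carry over the best string
        while len(new_pop) < pop_size:
            p1 = tournament(pop, fits, rng)
            p2 = tournament(pop, fits, rng)
            if rng.random() < p_cross:         # one-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each position.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=onemax)

best = genetic_algorithm()
print(onemax(best), best)
```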

Genetic Programming (GP)
• In 1992, Koza introduced GP, which evolves computer programs represented as tree structures.

Genetic Programming (GP)
• Information learned through biological evolution is stored in DNA base pairs. Sequences of DNA base pairs act like instructions, directing the manufacture of proteins.
• This program-like nature of DNA, together with its variable length, explains the appeal of biological evolution as a model for the evolution of computer programs.

Genetic Programming (GP)
• GP is mostly used as a machine learning approach.
[Figure: GP, EP, GA and ES placed among the evolutionary algorithms, in relation to machine learning and artificial intelligence.]

Genetic Programming (GP)
• A specialization of Genetic Algorithms
• GA vs. GP: the main difference between GA and GP is their representation scheme
– Genetic Algorithms: strings
– Genetic Programming: tree structures

Genetic Programming (Basic Algorithm)
1. Create a population of programs
• Each program attempts to solve the given problem
2. Determine the fitness of programs
• Determine fitness by success in solving the problem
• The fitter the member, the better its chance to produce offspring in the next generation
3. Select parents and produce offspring
• Use selection schemes
• Use crossover and mutation

Genetic Search Cycle
[Figure: genetic search cycle. Initial Population → Evaluate Candidate Solutions → Fitness Values → Check Termination Criteria (Save the Best) → Perform Selection → Selected Individuals → Apply Genetic Operators → New Candidates → Evaluate Candidate Solutions]
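The cycle itself does not depend on the representation. The Python sketch below mirrors its boxes one to one; to keep it short and deterministic, integers stand in for programs, with fitness equal to the distance from 42 (a hypothetical toy problem). All function names are illustrative.

```python
import random

def search_cycle(init, evaluate, select, vary, max_generations=100, target=0):
    """Run the genetic search cycle; lower fitness is better (an assumption)."""
    population = init()                                   # Initial Population
    best, best_fit = None, float("inf")
    for _ in range(max_generations):
        fitness = [evaluate(ind) for ind in population]   # Evaluate -> Fitness Values
        for ind, fit in zip(population, fitness):         # Save the Best
            if fit < best_fit:
                best, best_fit = ind, fit
        if best_fit <= target:                            # Check Termination Criteria
            break
        parents = select(population, fitness)             # Perform Selection
        population = vary(parents)                        # Apply Genetic Operators -> New Candidates
    return best, best_fit

# Toy stand-in for programs: integers, with fitness = distance from 42.
rng = random.Random(0)
best, best_fit = search_cycle(
    init=lambda: [rng.randint(0, 100) for _ in range(20)],
    evaluate=lambda x: abs(x - 42),
    # Truncation selection: keep the 10 lowest-error individuals.
    select=lambda pop, fit: [p for _, p in sorted(zip(fit, pop))[:10]],
    # Each parent survives and produces the neighbours p - 1 and p + 1.
    vary=lambda parents: parents + [p - 1 for p in parents] + [p + 1 for p in parents],
)
print(best, best_fit)
```

Because the best individual always survives truncation and produces a neighbour one step closer to 42, the error falls by at least 1 per generation, so the loop terminates well within 100 generations.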

Tree Based Representation
• GP evolves executable computer programs.
• Each individual represents a computer program as a tree structure.
• The tree structure has the following implications [1]:
– Adaptive individuals: unlike GAs, where individuals usually have a fixed size, a GP population usually contains individuals of different size, shape and complexity.
– Domain-specific grammar: a grammar needs to be defined that accurately reflects the problem to be solved. It should be possible to represent any possible solution using the defined grammar.

Tree Representation (XOR)
T = [x1, x2], F = [OR, AND, NOT]
XOR(x1, x2) = OR(AND(x1, NOT(x2)), AND(NOT(x1), x2)), i.e. (x1 AND NOT x2) OR (NOT x1 AND x2)
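As a check on the XOR tree, this Python sketch stores it as nested tuples (function name first, then its arguments) and evaluates it on all four inputs; the tuple encoding is an assumption made for illustration.

```python
from itertools import product

# Function set F = {OR, AND, NOT}: name -> (arity, implementation).
FUNCTIONS = {
    "OR":  (2, lambda a, b: a or b),
    "AND": (2, lambda a, b: a and b),
    "NOT": (1, lambda a: not a),
}

def evaluate(tree, env):
    """Evaluate a tree given as nested tuples (name, child, ...) or a terminal name."""
    if isinstance(tree, str):            # terminal: look up x1 or x2
        return env[tree]
    name, *children = tree
    arity, fn = FUNCTIONS[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} argument(s)")
    return fn(*(evaluate(child, env) for child in children))

# OR(AND(x1, NOT(x2)), AND(NOT(x1), x2))
XOR_TREE = ("OR", ("AND", "x1", ("NOT", "x2")), ("AND", ("NOT", "x1"), "x2"))

for x1, x2 in product([False, True], repeat=2):
    print(int(x1), int(x2), int(evaluate(XOR_TREE, {"x1": x1, "x2": x2})))
```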

GP-based Modeling of a Problem

Genetic Programming Components
Terminal Set
• Works as the primitive data types
• Constants
• Parameter-less functions
• Inputs
• Members of this set make up the leaves of the program tree
Function Set
• Set of available functions
• Often tailored to the needs of the problem domain [2,4]

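Using Trees To Represent Computer Programs
9 + ((X * 7) / (Y - 5)) is represented by the tree +(9, /(*(X, 7), -(Y, 5))): the functions +, /, * and - are internal nodes, and the terminals 9, X, 7, Y and 5 are leaves.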
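A minimal Python sketch of the same idea: internal nodes are tuples whose first element names a function from the function set (with its arity), and leaves are terminals, i.e. constants or input variables. The encoding and names are illustrative assumptions. Plain division is used here; GP systems commonly substitute a protected division so that Y = 5 does not raise an error.

```python
import operator

# Function set: name -> (arity, implementation). These label internal nodes.
FUNCTION_SET = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}

def evaluate(node, env):
    """Evaluate a program tree.

    Terminals (leaves) are numbers (constants) or variable names looked up in env;
    internal nodes are tuples (function_name, child1, child2, ...).
    """
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, str):
        return env[node]
    name, *children = node
    arity, fn = FUNCTION_SET[name]
    if len(children) != arity:
        raise ValueError(f"{name} expects {arity} arguments")
    return fn(*(evaluate(child, env) for child in children))

# 9 + ((X * 7) / (Y - 5))
tree = ("+", 9, ("/", ("*", "X", 7), ("-", "Y", 5)))
print(evaluate(tree, {"X": 2, "Y": 12}))  # 9 + 14 / 7 = 11.0
```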

Initial Population
• The initial population is generated randomly, within the restriction of a maximum depth.
• For each individual:
– The root is randomly selected from the function set.
– The arity of each function determines the branching factor of the root and of the other non-terminal nodes.
– For each non-root node, an element is selected randomly from either the terminal set or the function set.
– If the element is selected from the terminal set, the node becomes a leaf.

Initial Population (Cont.)
Full
• Non-terminals are used to build a complete tree down to the leaf nodes, which are then all populated with terminals. Every branch is grown to the maximum depth, so (when all functions have the same arity) the tree has the maximum number of nodes allowed.
Grow
• The root node is chosen from the function set.
• All nodes not at the maximum depth are chosen randomly (from the function and terminal sets), and the growth of a branch ends when a terminal is chosen.
• Trees can have irregular shapes.
Ramped Half-and-Half
• The population is separated into M partitions, with the ith partition having a maximum depth of M − i.
• Half of each partition is populated with grow, while the other half is populated with full.
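The three initialization methods can be sketched in Python as follows, using a hypothetical function set {+, -, *} and terminal set {x, 1.0}. Depth counts edges from the root (a lone terminal has depth 0), and partitions are numbered from i = 0, so partition i gets maximum depth M − i; both conventions are assumptions of this sketch.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # illustrative function set: name -> arity
TERMINALS = ["x", 1.0]                 # illustrative terminal set: a variable and a constant

def full(depth, rng):
    """Full method: functions at every level above `depth`, terminals only at the bottom."""
    if depth == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name, *(full(depth - 1, rng) for _ in range(FUNCTIONS[name])))

def grow(depth, rng, is_root=True):
    """Grow method: root from the function set; other nodes above `depth` are drawn
    from functions and terminals together, and a terminal ends that branch."""
    if depth == 0:
        return rng.choice(TERMINALS)
    pick = rng.choice(list(FUNCTIONS)) if is_root else rng.choice(list(FUNCTIONS) + TERMINALS)
    if pick not in FUNCTIONS:
        return pick
    return (pick, *(grow(depth - 1, rng, is_root=False) for _ in range(FUNCTIONS[pick])))

def depth_of(tree):
    """Depth of a tree; a lone terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """M = max_depth partitions; partition i (i = 0..M-1) uses maximum depth M - i.
    Within each partition, alternate individuals use full and grow."""
    population = []
    for n in range(pop_size):
        i = n % max_depth                        # partition index
        method = full if (n // max_depth) % 2 == 0 else grow
        population.append(method(max_depth - i, rng))
    return population

rng = random.Random(1)
pop = ramped_half_and_half(12, 3, rng)
print([depth_of(t) for t in pop])
```

Full trees always reach their partition's depth exactly, while grow trees may stop early, which is what gives ramped half-and-half its mix of sizes and shapes.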

Genetic Operators; Simple Examples
Crossover
• Randomly select a node from Parent 1.
• Randomly select a node from Parent 2.
• Swap the two nodes along with their subtrees.
Mutation
• Randomly select a node in the program tree.
• Remove that node and its subtree.
• Replace the node with a new subtree (generated using full, grow or ramped half-and-half).
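A Python sketch of both operators on trees stored as nested tuples. Nodes are addressed by paths of child indices, and mutation draws the new subtree with a grow-style generator. The function and terminal sets are hypothetical, and unlike practical systems, the sketch imposes no maximum depth on the offspring.

```python
import random

FUNCTIONS = {"+": 2, "*": 2}   # illustrative function set: name -> arity
TERMINALS = ["x", 1]           # illustrative terminal set

def paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node, root first."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, path + (i,))

def get(tree, path):
    """Return the subtree rooted at path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree in which the node at path (and its subtree) is replaced."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(parent1, parent2, rng):
    """Subtree crossover: pick a random node in each parent and swap the subtrees."""
    p1 = rng.choice(list(paths(parent1)))
    p2 = rng.choice(list(paths(parent2)))
    return (replace(parent1, p1, get(parent2, p2)),
            replace(parent2, p2, get(parent1, p1)))

def grow(depth, rng):
    """Random subtree: each node above `depth` is drawn from functions and terminals."""
    pick = rng.choice(TERMINALS) if depth == 0 else rng.choice(list(FUNCTIONS) + TERMINALS)
    if pick not in FUNCTIONS:
        return pick
    return (pick, *(grow(depth - 1, rng) for _ in range(FUNCTIONS[pick])))

def mutate(tree, rng, max_depth=2):
    """Subtree mutation: replace a random node (and its subtree) with a new subtree."""
    point = rng.choice(list(paths(tree)))
    return replace(tree, point, grow(max_depth, rng))

def size(tree):
    return len(list(paths(tree)))

rng = random.Random(3)
a = ("+", "x", 1)                 # x + 1
b = ("+", ("*", "x", "x"), 1)     # x * x + 1
child1, child2 = crossover(a, b, rng)
print(child1, child2, mutate(a, rng))
```

Swapping subtrees conserves the total number of nodes across the two children, which is a quick sanity check on an implementation.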

Genetic Operators (Cont.)
• Developing new genetic operators (crossover and mutation) is an area of extensive ongoing research.

Crossover Operator
[Figure: Parent 1 and Parent 2 exchange randomly selected subtrees to produce Child 1 and Child 2.]

Mutation Operator
[Figure: the right subtree of a tree is randomly selected for mutation, and the entire subtree is replaced by a new one.]

Fitness-Based Selection
• Parents for the production of the next generation are selected on the basis of their fitness.
• The sum of absolute errors is the most basic fitness function used.
• The fitness measure can be varied depending on the problem domain, e.g.:
– Number of correct solutions
– Number of errors navigating a maze
– Time required to solve a puzzle
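The sum of absolute errors can be written as a short Python function; the fitness cases for the target y = x² + x + 1 below are hypothetical sample points chosen for illustration.

```python
def sum_absolute_error(program, fitness_cases):
    """Fitness = sum over all fitness cases of |expected - obtained|; lower is better."""
    return sum(abs(expected - program(*inputs)) for inputs, expected in fitness_cases)

# Hypothetical fitness cases for the target function y = x*x + x + 1.
cases = [((x,), x * x + x + 1) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

candidates = {
    "x + 1": lambda x: x + 1,
    "x * x + 1": lambda x: x * x + 1,
    "x * x + x + 1": lambda x: x * x + x + 1,
}
for name, program in candidates.items():
    print(name, sum_absolute_error(program, cases))
```

A perfect program scores 0, and smaller scores mean a better chance of being selected as a parent.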

Ranking Selection
• Selection based on fitness order:
– The members of the population are ranked from best to worst.
– The selection probability is assigned based on the rank.
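One common way to assign rank-based probabilities is linear ranking, sketched below in Python. The slides do not specify a formula, so the linear scheme and its selection-pressure parameter (between 1 and 2) are assumptions; the error values are illustrative.

```python
import random

def linear_rank_probabilities(fitness, pressure=1.5, lower_is_better=True):
    """Linear ranking: probability depends only on rank, not on raw fitness.

    The worst individual gets (2 - pressure) / N, the best gets pressure / N,
    and ranks in between are interpolated linearly (requires N >= 2).
    """
    n = len(fitness)
    # Order indices from worst to best.
    order = sorted(range(n), key=lambda i: fitness[i], reverse=lower_is_better)
    probs = [0.0] * n
    for rank, idx in enumerate(order):       # rank 0 = worst, n - 1 = best
        probs[idx] = (2 - pressure + 2 * (pressure - 1) * rank / (n - 1)) / n
    return probs

fitness = [0.67, 1.00, 1.70, 2.67]           # illustrative errors: lower is better
probs = linear_rank_probabilities(fitness)
parent = random.Random(0).choices(range(len(fitness)), weights=probs, k=1)[0]
print(probs, parent)
```

Because only the order matters, one individual with a much better raw fitness cannot take over the selection, which helps preserve diversity.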

Tournament Selection
• Randomly select a subset of the population (its size is the tournament size).
• More fit (winning) individuals are used to generate replacements for less fit (losing) individuals.
– Accelerates processing time (compared with full competition)
– Facilitates parallel processing
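A Python sketch of tournament selection and of the replacement step in which the winner of a tournament produces the individual that replaces the loser. The integer toy population and the ±1 mutation are hypothetical stand-ins for programs and genetic operators.

```python
import random

def tournament_select(population, fitness, k, rng):
    """Return the best of k randomly chosen individuals (lower fitness is better)."""
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: fitness[i])]

def tournament_replacement(population, fitness, evaluate, vary, k, rng):
    """One steady-state step: the tournament winner's offspring replaces the loser.

    Only the k contenders are compared, never the whole population, and
    tournaments on disjoint subsets could run in parallel.
    """
    contenders = rng.sample(range(len(population)), k)
    contenders.sort(key=lambda i: fitness[i])       # best first, worst last
    winner, loser = contenders[0], contenders[-1]
    child = vary(population[winner], rng)
    population[loser] = child                       # replace the losing individual
    fitness[loser] = evaluate(child)
    return winner, loser

# Toy population of integers; fitness = distance from 42 (an assumption).
rng = random.Random(0)
population = [10, 50, 30, 42, 7]
fitness = [abs(x - 42) for x in population]
print(tournament_select(population, fitness, k=5, rng=rng))   # whole population competes -> 42
winner, loser = tournament_replacement(
    population, fitness,
    evaluate=lambda x: abs(x - 42),
    vary=lambda x, r: x + r.choice([-1, 1]),        # hypothetical mutation operator
    k=3, rng=rng)
print(winner, loser, population)
```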

Types of GP
• Conventional Tree based GP
• Linear GP
• Graph based GP
• Cartesian GP
• Multi-Gene GP
• Multi-Objective Optimization based GP
• …….

Science-oriented Applications of GP
• Sequence prediction/classification
• Forecasting & Prediction
• Crystallography; Biochemistry
• Data mining
• Geoscience and Remote Sensing
• …….

Engineering-oriented Applications of GP
• On-line control of real robots
• Design of Electrical Circuits
• Spacecraft Attitude Maneuvers
• Antenna Design
• Motion Animation
• …….

Computer-Science-oriented Applications of GP
• Cellular Encoding of ANN
• Intrusion Detection
• Image Classification
• Digital Watermarking
• Computer Aided Diagnostics Systems
• Soccer Softbot Team Coordination
• GP-Music
• Evolutionary Art
• …….

Genetic Programming
Part II: GPLab

GPLab
• GPLab is an open-source genetic programming toolbox for MATLAB.
• Developed by Sara Silva (Evolutionary and Complex Systems Group, University of Coimbra, Portugal)
– gplab.sourceforge.net/

GPLab (Continued)
• SET VARS: initializes the parameters
• GEN POP: generates the initial population and calculates its fitness
• GENERATION: generates a new population by applying genetic operators

SET VARS
• This module
– initializes the parameters with the default values
– updates the parameters with the user settings
• This module can be called
– by the user
– by a request for parameter initialization from GEN POP

GEN POP
• This module generates the initial population and calculates its fitness.
• Three initialization methods exist:
– Full
– Grow
– Ramped Half-and-Half
• By default, the fitness is the sum of absolute differences between the obtained and expected results.
• A custom fitness function can also be used.

GENERATION
• Generates a new population by applying the genetic operators (tree crossover, tree mutation)
• Parents are selected from the pool through one of the following four sampling methods:
– Roulette
– SUS
– Tournament
– Lexicographic Parsimony Pressure Tournament
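Roulette-wheel sampling and SUS (stochastic universal sampling) differ in how the wheel is spun: roulette spins once per parent, while SUS spins once and reads off n equally spaced pointers, which keeps each individual's actual number of selections close to its expected number. The Python sketch below illustrates the general techniques only; it is not GPLab's code, and the weights are hypothetical expected offspring counts.

```python
import random

def roulette(weights, n, rng):
    """Roulette-wheel sampling: n independent spins; slot size proportional to weight."""
    return rng.choices(range(len(weights)), weights=weights, k=n)

def sus(weights, n, rng):
    """Stochastic universal sampling: a single spin with n equally spaced pointers."""
    step = sum(weights) / n
    start = rng.random() * step                  # one random offset in [0, step)
    chosen, i, cumulative = [], 0, weights[0]
    for k in range(n):
        pointer = start + k * step
        # Slot i covers [cumulative - weights[i], cumulative); advance to the slot
        # containing the pointer (the bound guards against float round-off).
        while pointer >= cumulative and i < len(weights) - 1:
            i += 1
            cumulative += weights[i]
        chosen.append(i)
    return chosen

rng = random.Random(0)
weights = [4.0, 2.0, 1.0, 1.0]                   # e.g. expected numbers of offspring
print(sus(weights, 8, rng))                      # each index appears exactly weights[i] times here
print(roulette(weights, 8, rng))                 # counts vary from run to run
```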

GENERATION (Continued)
• Three methods to calculate the expected number of offspring are:
– Absolute
– Rank85
– Rank89
• The process repeats until the stopping condition is met or the maximum number of generations is reached.

Parameters (Tree Initialization)
• The initial population of trees is created at runtime, at the beginning of a GPLAB run.
– The initial maximum depth/size of the new trees is determined by the parameter “inicmaxlevel”.
• The method used to generate the initial population is specified by the parameter “initpoptype”.
– Possible values: 'fullinit', 'growinit', 'rampedinit'

Parameters (Functions)
• The “functions” parameter indicates which functions GPLab should use.
• The GPLab functions used to set the function set are (in each call, a function name is followed by its arity):
params = setfunctions(params, 'func1', 2, 'func2', 1);
params = addfunctions(params, 'func1', 2, 'func2', 1);
• The table of functions on the next slide lists the functions available in GPLab.
• User-defined functions can also be used.

Parameters (Available Functions)


Parameters (Terminals)
• Terminals are the variables needed to evaluate the fitness cases.
• GPLab can use as terminals:
– Constants
– Any function with null arity, e.g. rand()
• The “terminals” parameter can be used to set the terminals. The function to set terminals is:
params = setterminals(params, '1', 'rand');

Parameters (Genetic Operators)
• Four operators are available in GPLab:
– Crossover
– Mutation
– Shrink Mutation
– Swap Mutation
• The functions to set operators are:
params = setoperators(params, 'operator1', 2, 2, 'operator2', 2, 1);
params = addoperators(params, 'operator1', 2, 2, 'operator2', 2, 1);

Parameters (Selection)
• Parents are selected according to one of five sampling methods:
– 'roulette'
– 'sus'
– 'tournament'
– 'lexictour'
– 'doubletour'
• User-defined sampling methods can also be used:
params.sampling = 'new_sampling_method';

Parameters (Expected Number of Children)
• The expected number of children can be calculated using one of three available methods:
– Absolute
– Rank85
– Rank89
• The method is selected by setting the parameter “expected”.

Parameters (Fitness Measurement)
• The “calcfitness” parameter determines the method for fitness measurement.
• The methods available in GPLab for fitness measurement are “regfitness” and “antfitness”.
• A user-defined function can also be used for fitness measurement.
• Data files
– When starting a GPLAB run, the user is required to indicate the names of the files where the fitness cases are stored.

Runtime Graphical Output
• GPLab can represent some state variables of the algorithm graphically as plots.
– Runtime plots are updated after every generation.
• Four different graphs can be plotted at runtime, as determined by the parameter “graphics”:
– plotfitness
– plotdiversity
– plotcomplexity
– plotoperators

Fitness

Diversity

Complexity

Operators Probability

Offline Graphical Output
• GPLab provides five specialized functions to visualize different aspects of the evolution and the results obtained:
– Accuracy VS Complexity
– Pareto Front
– Desired VS Obtained
– Operator Evolution
– Tree Visualization

Accuracy VS Complexity

Pareto Front

Desired VS Obtained

Tree Visualization

Genetic Programming (example)
Training Data (XOR), 2-bit input data
X1   X2   XOR
0    0    0
0    1    1
1    0    1
1    1    0

Genetic Programming (example)
• Terminals
– Two bits of data (each bit as a separate terminal)
T = { X1, X2 }
• Functions
– Candidate functions for the XOR problem include the logical operators OR, AND, NOR and NAND; the function set used here is
F = { OR, AND, NOR }

Tree Representation
[Figure: an evolved tree for XOR, with the functions NOR, AND and OR as internal nodes and the terminals x1 and x2 as leaves.]

Tree Representation; Evolving If-Then Rules for Decision Making
IF (quality > 20) AND (service > 80) THEN good ELSE bad
can be represented by the tree AND(>(quality, 20), >(service, 80)).

Tree Representation; Evolving Code
i = 1; while (i < 20) { i = i + 1 }

SYMBOLIC REGRESSION

PREPARATORY STEPS

SYMBOLIC REGRESSION

POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0

SYMBOLIC REGRESSION; x² + x + 1
FITNESS OF THE 4 INDIVIDUALS IN GEN 0
(a) x + 1: 0.67
(b) x² + 1: 1.00
(c) 2: 1.70
(d) x: 2.67
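These fitness values are consistent with measuring, for each individual, the area between its curve and the target x² + x + 1 over −1 ≤ x ≤ 1. The Python sketch below assumes that measure and approximates the integral with the midpoint rule.

```python
def target(x):
    return x * x + x + 1

# The four generation-0 individuals.
individuals = {
    "(a) x + 1":   lambda x: x + 1,
    "(b) x^2 + 1": lambda x: x * x + 1,
    "(c) 2":       lambda x: 2.0,
    "(d) x":       lambda x: x,
}

def area_error(f, lo=-1.0, hi=1.0, steps=10_000):
    """Approximate the integral of |f(x) - target(x)| over [lo, hi] (midpoint rule)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(f(x) - target(x))
    return total * h

for name, f in individuals.items():
    print(f"{name}: {area_error(f):.2f}")
```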

SYMBOLIC REGRESSION x² + x + 1, GENERATION 1
• Copy of (a)
• Mutant of (c), picking “2” as mutation point
• First offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
• Second offspring of crossover of (a) and (b), picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
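Assuming parent (a) is the tree +(x, 1) and parent (b) is +(*(x, x), 1), this crossover can be replayed in Python. The “+” of (a) is its root, so one child is (b) with its left-most x replaced by all of (a), and the other child is simply x. The tuple encoding and helper names are illustrative.

```python
def get(tree, path):
    """Return the subtree at path (a tuple of child indices; () is the root)."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], subtree),) + tree[i + 1:]

def evaluate(tree, x):
    """Evaluate a tree built from '+', '*', the variable 'x' and numeric constants."""
    if tree == "x":
        return x
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b

a = ("+", "x", 1)                    # parent (a): x + 1
b = ("+", ("*", "x", "x"), 1)        # parent (b): x^2 + 1, written as x * x + 1
plus_of_a = ()                       # the "+" of (a) is its root
leftmost_x_of_b = (1, 1)             # first argument of the "*" node in (b)

child_from_b = replace(b, leftmost_x_of_b, get(a, plus_of_a))   # (x + 1) * x + 1
child_from_a = replace(a, plus_of_a, get(b, leftmost_x_of_b))   # x

print(child_from_b, child_from_a)
print([evaluate(child_from_b, x) for x in range(-2, 3)])        # matches x^2 + x + 1
```

The child built from (b), (x + 1)·x + 1, expands to x² + x + 1, the target function.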

References
1. Banzhaf, W., Nordin, P., Keller, R. E., & Francone, F. D. (1998). Genetic programming: an introduction (Vol. 1). San Francisco: Morgan Kaufmann.
2. Alpaydin, E. (2014). Introduction to machine learning. MIT Press.
3. Poli, R., Langdon, W. B., McPhee, N. F., & Koza, J. R. (2008). A field guide to genetic programming. Lulu.com.
4. Poli, R., & Koza, J. (2014). Genetic programming (pp. 143-185). Springer US.

Thanks

Thanks to the Department of Electrical Engineering, COMSATS Attock, and especially to Dr. Raja Asif.