Comparing Evolutionary Methods for Reservoir Computing Pretraining

Aida A. Ferreira, Teresa B. Ludermir

Abstract—Evolutionary algorithms are very efficient at finding "optimal" solutions for a variety of problems because they do not impose many of the limitations encountered in traditional methods. Reservoir Computing is a type of recurrent neural network that allows for the black-box modeling of (nonlinear) dynamic systems. In contrast to other recurrent neural network approaches, Reservoir Computing does not train the input and internal weights of the network; only the output layer is trained. However, it is still necessary to adjust parameters and topology in order to create a "good" reservoir for a given application. This study compares three different evolutionary methods for finding the best reservoir for the task of time series forecasting. The methods are compared with respect to performance (prediction error) and computational complexity (time), using three data sets. The results show that well-adjusted networks can be found automatically, and that searching the weights directly, without the restriction of the echo state property, yields more adequate solutions to the problem at a lower computational cost.

Manuscript received January 30, 2011. Aida A. Ferreira is with the Federal Institute of Education, Science and Technology of Pernambuco, Av. Professor Luis Freire, 500, Cidade Universitária, CEP 50.740-530, Recife, PE, Brazil (corresponding author; phone: 55-81-2126-8986; e-mail: [email protected]), and with the Center of Informatics (CIn), Federal University of Pernambuco (UFPE), P.O. Box 7851, Cidade Universitária, CEP 50.740-530, Recife, PE, Brazil. Teresa B. Ludermir is with the Center of Informatics (CIn), Federal University of Pernambuco (UFPE), P.O. Box 7851, Cidade Universitária, CEP 50.740-530, Recife, PE, Brazil (phone: 55-81-2126-8430; e-mail: [email protected]).

I. INTRODUCTION

Optimization is the process of finding the best solution among a group of possible solutions to a problem. The search normally starts from an initial solution, or a group of them, and carries out progressive improvements until reaching a set that contains one or all of the best possible solutions within the search space [1]. Optimization problems occur frequently in all scientific and engineering areas, arising whenever there is a need to minimize (or maximize) an objective function that depends on a set of decision variables while satisfying a set of restrictions on the choice of those variables. The objective function allows different choices of the variables to be compared in order to determine which is best. Usual applications include minimizing costs, maximizing profits and minimizing errors.

Recurrent Neural Networks (RNNs) of the Reservoir Computing type differ basically in size, connectivity, type of neuron, topology and output-layer training algorithm. Optimization schemes for these systems have attracted many researchers' attention [2][3][4][5]. Although RC optimization is a challenge, assessing the performance of a given reservoir on a certain task is relatively simple; therefore, evolutionary pre-training methods are a natural strategy for finding the best RC solution for a given task [6].

II. RESERVOIR COMPUTING

Recently, two different solutions were proposed for problems concerning RNNs: the Liquid State Machine (LSM) [8] and the Echo State Network (ESN) [9]. They were created independently, with distinct types of application and different parameterizations [10]. A new RNN paradigm, suggested by Verstraeten, Schrauwen, D'Haene and Stroobandt [11] and named Reservoir Computing (RC), proposes the union of LSMs and ESNs in a single research field. This paradigm has yielded promising results [6]. Reservoir Computing is an RNN architecture whose basic structure is made up of two main components: a recurrent non-linear topology of processing elements, called the Dynamic Reservoir (DR), and a linear output layer, called the readout. The DR states are called echo states and they contain information on the history of the input patterns. In this study we specifically use ESNs within the RC paradigm. Fig. 1 shows the schematic view of a typical ESN with K input units, N internal units and L output units. The general ESN state update is given in equation (1) and the readout in equation (2):

$x(n+1) = f\left(W_{res}^{res}\, x(n) + W_{in}^{res}\, u(n+1) + W_{out}^{res}\, y(n) + W_{bias}^{res}\right)$   (1)

$y(n+1) = f^{out}\left(W_{res}^{out}\, x(n+1) + W_{in}^{out}\, u(n+1) + W_{out}^{out}\, y(n) + W_{bias}^{out}\right)$   (2)

where u(n) is the input at time n; x(n) is the state of the dynamic reservoir at time n; y(n) is the output at time n; f is the activation function of the DR units (usually a sigmoid function) and f^out is the activation function of the output units. The matrix W_res^res holds the internal DR connections, W_in^res the input-to-DR connections, W_out^res the output-to-DR (feedback) connections and W_bias^res the bias-to-DR connections. Likewise, W_res^out holds the DR-to-output connections, W_in^out the input-to-output connections, W_out^out the output-to-output connections and W_bias^out the bias-to-output connections. Only the connections directed to the output layer are trained (W_res^out, W_in^out, W_out^out and W_bias^out).
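As an illustration of equations (1) and (2), the sketch below runs one step of the state update and readout with randomly initialized matrices. All names, sizes and initialization scales are placeholders chosen for the example, and the readout is simplified to the trained DR-to-output term only (the optional input, output and bias terms of equation (2) are omitted); tanh is used for f and the identity for f^out.

```python
import numpy as np

rng = np.random.default_rng(42)
K, N, L = 1, 100, 1  # input, reservoir and output sizes (example values)

# Randomly initialized connection matrices (placeholder scales).
W_res_res  = rng.normal(0.0, 1.0, (N, N))   # DR internal connections
W_in_res   = rng.uniform(-1, 1, (N, K))     # input -> DR
W_out_res  = rng.uniform(-1, 1, (N, L))     # output -> DR (feedback)
W_bias_res = rng.uniform(-1, 1, N)          # bias -> DR
W_res_out  = rng.uniform(-1, 1, (L, N))     # DR -> output (trained in practice)

def update_state(x, u_next, y, f=np.tanh):
    """Equation (1): compute the next reservoir state."""
    return f(W_res_res @ x + W_in_res @ u_next
             + W_out_res @ y + W_bias_res)

def readout(x_next):
    """Simplified equation (2): linear readout from the DR only."""
    return W_res_out @ x_next

x = np.zeros(N)
y = np.zeros(L)
u_next = np.array([0.5])   # one input sample
x = update_state(x, u_next, y)
y = readout(x)
```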

Fig. 1. Schematic view of a typical ESN with K input units, N internal units and L output units. Dashed arrows indicate connections that are possible but not required.

In this work we also use a leak rate α in the state update equation in order to make the reservoir timescale more flexible in matching the input timescale. Equation (3) shows the change. The leak rate can effectively tune the dynamics of the reservoir: if it is chosen correctly, the reservoir dynamics can be adjusted to match the timescale of the input flow, making it possible to achieve enhanced performance [6]:

$x(n+1) = (1-\alpha)\, x(n) + \alpha\, f\left(W_{res}^{res}\, x(n) + W_{in}^{res}\, u(n+1) + W_{out}^{res}\, y(n) + W_{bias}^{res}\right)$   (3)

In [9] Jaeger describes the conditions required for a reservoir, built with sigmoid neurons and the tanh() activation function, to acquire the echo state property. The echo state property indicates that the effect of a previous state x(n) and of a previous input u(n) on a future state x(n+1) should gradually vanish as time passes, and should not persist or be amplified. The conditions described by Jaeger in [9] and [12] are based on the largest singular value, σ_max(W), and on the spectral radius, ρ(W) = max_i |λ_i(W)|. The first condition states that the echo state property holds if the largest singular value satisfies σ_max(W) < 1. The second condition states that if the spectral radius of the reservoir weight matrix is greater than 1, the network has an asymptotically unstable null state and consequently does not possess the echo state property. Jaeger considers the first condition, σ_max(W) < 1, too restrictive [9], and uses the second condition to propose a heuristic method for the construction of reservoirs in [12]: randomly build the reservoir weight matrix from a normal distribution with mean 0 and variance 1, then rescale it by first dividing the matrix by the largest absolute value of its eigenvalues and then multiplying it by a factor chosen by the specialist (generally close to 1, in order to stay near the stability frontier). Jaeger also pointed out in [12] that networks lacking the echo state property can sometimes be driven into it by useful components of the inputs, something that can be observed in biological neural networks.
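A minimal sketch of this construction heuristic, assuming numpy and an example scaling factor of 0.98 (an arbitrary value close to 1):

```python
import numpy as np

def build_reservoir(n, factor=0.98, seed=0):
    """Jaeger's heuristic [12]: draw W from N(0, 1), then rescale it so
    its spectral radius equals `factor` (chosen by the specialist)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, (n, n))
    rho = np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius of W
    return (W / rho) * factor

W = build_reservoir(100)
print(np.max(np.abs(np.linalg.eigvals(W))))     # ~0.98
```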

III. EVOLUTIONARY STRATEGY FOR RC OPTIMIZATION

Genetic Algorithms (GAs) are very efficient in searching for optimal solutions (or at least approximately optimal ones) for a wide variety of problems, because they do not impose many of the limitations found in traditional methods [15]. A GA creates three types of children for the next generation. Elite children are the individuals in the current generation with the best fitness values; these individuals automatically survive to the next generation. Crossover children are created by combining the vectors of a pair of parents. Mutation children are created by introducing random changes, or mutations, to a single parent.

A Reservoir Computing search using a GA can be carried out in three different ways: a) as an evolutionary process that operates directly on the network's topology; b) as an evolutionary process that operates on the main generation parameters of the system; and c) both at the same time. According to Verstraeten et al. [11] and Ishii et al. [16], the main parameters that establish the RC dynamics are the number of neurons, the spectral radius, the connection percentage and the type of neuron activation function. If the evolutionary method operates directly on the connections (options a and c above), the search space can be too big: for a network with N neurons the genetic vector is of order N², and special operators are needed to combine individuals of distinct sizes [16]. The second way (option b above) was used in various studies [16], [17], [18]. A problem with that strategy is the imprecise investigation of the global parameters, specifically because the network's interconnections and weights are randomly generated; on the other hand, the genetic vector and the search space are significantly smaller.

This study presents three evolutionary methods, called RCDESIGN (Reservoir Computing Design and Training), Classical Searching, and Topology and Radius Searching. Classical Searching and Topology and Radius Searching use the traditional uniform crossover and mutation operators, but for RCDESIGN it was necessary to create crossover and mutation operators adapted to populations of individuals of varying sizes. The methods use the same fitness function (4):

$f = \dfrac{MSE_{training} + MSE_{validation}}{2}$   (4)

where f is the value to be minimized by the evolutionary algorithm and the MSE is calculated as in (5):

$MSE = 100 \cdot \dfrac{1}{NP} \sum_{j=1}^{P} \sum_{i=1}^{N} \left(L_{ij} - T_{ij}\right)^2$   (5)

where P is the total number of patterns in the set, N is the number of output units of the ESN, and L_ij and T_ij are the actual and desired (target) outputs of the i-th neuron in the output layer for the j-th pattern, respectively.

The selection function chooses parents for the next generation based on their scaled values from the fitness scaling function. An individual can be selected more than once as a parent, in which case it contributes its genes to more than one child. The selection option used in all methods was stochastic uniform: the parents are chosen along a line in which each parent corresponds to a section whose length is proportional to its scaled value. The algorithm moves along the line in steps of equal size and, at each step, allocates a parent from the section it lands on. The first step is a uniform random number smaller than the step size (a code sketch of this selection scheme is given after TABLE I). The GA parameters used by the three methods were chosen through random experiments; TABLE I shows a summary of these parameters.

TABLE I
GA PARAMETERS

  Parameter                          RCDESIGN   Classical    Topology and
                                                Searching    Radius Searching
  Population size (TP)               120        120          120
  Number of generations (NG)         10         10           10
  Number of elite                    2          2            2
  Probability of crossover (ProbC)   75%        85%          85%
  Mutation rate (TM)                 3%         3%           5%
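A sketch of the stochastic uniform selection just described, assuming fitness values have already been converted into non-negative scaled values (all names are ours, chosen for the example):

```python
import numpy as np

def stochastic_uniform_selection(scaled_values, n_parents, rng):
    """Lay the scaled values out on a line and walk it in equal steps,
    starting from a random offset smaller than the step size."""
    edges = np.cumsum(scaled_values)      # section boundaries on the line
    total = edges[-1]
    step = total / n_parents
    start = rng.uniform(0.0, step)        # first step < step size
    pointers = start + step * np.arange(n_parents)
    # Index of the section each pointer lands on = selected parent.
    return np.searchsorted(edges, pointers)

rng = np.random.default_rng(0)
scaled = np.array([4.0, 2.0, 1.0, 0.5])   # example scaled fitness values
print(stochastic_uniform_selection(scaled, n_parents=6, rng=rng))
```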

We developed the methods using MATLAB and the RC Toolbox [7]. The toolbox has two linear regression functions for readout training: the Moore-Penrose pseudo-inverse [6] and ridge regression [19]. The regularization parameter used in this work is the added noise used by the pseudo-inverse or the λ of ridge regression.

A. RCDESIGN

RCDESIGN uses a GA and simultaneously looks for the best values of parameters, topology and weight matrices, without rescaling the reservoir matrix by the spectral radius. RCDESIGN considers RC in all its non-linearity, because it allows all possible connections instead of only the required ones. P_n is a group (population) of S vectors, where n represents a genetic algorithm generation and S_i represents an individual in the population. The maximum value for n is the maximum number of generations (NG) and the size of the set P_n is defined by the TP parameter of the algorithm. The notation S_ij denotes the j-th characteristic (gene) of the individual denoted as i. The genes are:

• S_1, or η – size of W_res^res, W_in^res, W_out^res and W_bias^res; range [50; 200].
• S_2 – if 1, the connection W_out^res exists; if 0, there is no connection.
• S_3 – if 1, the connection W_bias^res exists; if 0, there is no connection.
• S_4 – if 1, the connection W_in^out exists; if 0, there is no connection.
• S_5 – if 1, the connection W_out^out exists; if 0, there is no connection.
• S_6 – if 1, the connection W_bias^out exists; if 0, there is no connection.
• S_7 – if 1, the activation function is tanh(); if 2, it is sign().
• S_8 – if 1, the readout training function is the pseudo-inverse; if 2, it is ridge regression.
• S_9 – leak rate value; this parameter can effectively tune the time scales of the reservoir. Range [0.1; 1].
• S_10 – regularization parameter value. Range [10^-8; 10^-1].
• S_11, ..., S_m – weights of W_res^res, W_in^res, W_out^res and W_bias^res; range [-1; 1].

Vector S will have a minimum size of 2,660 when η equals 50 and a maximum size of 40,610 when η equals 200. Fig. 2 shows the conceptual division of S; a decoding sketch is given after the figure.

Fig. 2. Conceptual division of S.
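As a reading aid for this encoding, the sketch below unpacks the fixed part of an individual. The gene order follows the numbering above; the thresholds used to decode the discrete genes are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def decode_fixed_part(S):
    """Unpack the 10 fixed genes of an RCDESIGN individual (names ours)."""
    eta = int(round(S[0]))                      # reservoir size, in [50, 200]
    flags = [s >= 0.5 for s in S[1:6]]          # optional connection switches
    activation = np.tanh if S[6] < 1.5 else np.sign
    readout = "pseudo-inverse" if S[7] < 1.5 else "ridge regression"
    leak_rate = S[8]                            # in [0.1, 1]
    regularization = S[9]                       # in [1e-8, 1e-1]
    return eta, flags, activation, readout, leak_rate, regularization

S = np.array([91, 1, 1, 0, 0, 1, 1, 1, 0.8, 1e-4])  # example fixed part
print(decode_fixed_part(S))
```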

Pseudo-code of RCDESIGN:
1) Create a new random population P_0.
2) Call subroutine CreateIndividuals.
3) For n = 1 to stop criterion (NG or convergence):
   a. Select parents from P_{n-1}.
   b. Create population P_n by elite, crossover and mutation.
   c. Call subroutine CreateIndividuals.
4) Return the best solution from the final population.
5) Create an RC network following the best solution.
6) Compute test errors.

Pseudo-code of subroutine CreateIndividuals:
1) For i = 1 to TP:
   a. Create an RC network following S_i.
   b. For fold = 1 to 10 (cross-validation):
      i. Create training and validation sets.
      ii. Simulate the network on the training set.
      iii. Train the readout.
      iv. Compute training errors.
      v. Compute validation errors.
   c. Compute the fitness of S_i.

The crossover operator used by RCDESIGN is an adaptation of uniform crossover to populations of individuals of different sizes. For each pair of parents, the vector with the biggest dimension is defined as S_A and the smallest one as S_B. The S elements are combined according to a randomly generated mask, S_mask. The mask indicates which characteristics will be inherited from S_A and which will be inherited from S_B. Fig. 3 illustrates how the crossover operator works; a code sketch follows the figure.

Fig. 3. Crossover of RCDESIGN. (a) First part of the crossover, which operates on the fixed part of S. (b) Second part of the crossover.
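A minimal sketch of such a variable-length uniform crossover, assuming individuals are plain numpy vectors whose first 10 positions are the fixed part. The handling of the tail that only the longer parent has (it is simply kept from S_A) is one plausible choice, not necessarily the authors' exact operator:

```python
import numpy as np

FIXED = 10  # number of fixed genes (S_1..S_10 in the encoding above)

def crossover(parent_a, parent_b, rng):
    """Two-part uniform crossover for parents of different sizes:
    part (a) over the fixed genes, part (b) over the shared weight genes."""
    s_a, s_b = (parent_a, parent_b) if len(parent_a) >= len(parent_b) \
               else (parent_b, parent_a)
    child = s_a.copy()                       # tail beyond len(S_B) stays from S_A
    # Part (a): uniform crossover on the fixed part.
    mask = rng.random(FIXED) < 0.5           # True -> inherit gene from S_B
    child[:FIXED][mask] = s_b[:FIXED][mask]
    # Part (b): uniform crossover on the weight genes both parents share.
    n = len(s_b)
    mask = rng.random(n - FIXED) < 0.5
    child[FIXED:n][mask] = s_b[FIXED:n][mask]
    return child

rng = np.random.default_rng(1)
a = rng.uniform(-1, 1, 15)                   # toy individuals (fixed + weights)
b = rng.uniform(-1, 1, 12)
print(crossover(a, b, rng))
```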

The mutation operator used by RCDESIGN is also an adaptation of mutation to populations of different-sized individuals. For each individual selected as a parent, a mask is created and the genes are copied to the child according to it. Mutation may occur from the second gene to the last gene, at a rate defined by the parameter TM.

B. Classical Searching

Classical Searching uses the heuristic method of ESN creation [12], with rescaling of the W matrix by a spectral radius (≤ 1) and a fixed topology containing only the required RC connections. Classical Searching uses a GA to search the principal RC generation parameters, which are the reservoir size, the spectral radius and the density of interconnections of W. Neurons use tanh() and the readout function is the pseudo-inverse. The notation S_ij denotes the j-th characteristic (gene) of the individual denoted as i:

• S_1, or η – size of W and W_in^res; range [50; 200].
• S_2 – spectral radius; range [0.7; 1].
• S_3 – density of interconnections; range [10%; 100%].

Pseudo-code of Classical Searching and of Topology and Radius Searching:
1) Create a new random population P_0.
2) Call subroutine CreateIndividuals.
3) For n = 1 to stop criterion (NG or convergence):
   a. Select parents from P_{n-1}.
   b. Create population P_n by elite, crossover and mutation.
   c. Call subroutine CreateIndividuals.
4) Return the best solution from P_n.
5) Create an RC network following the best solution.
6) Compute test errors.

Pseudo-code of subroutine CreateIndividuals:
1) For i = 1 to TP:
   a. Create an RC network following S_i.
   b. Compute all eigenvalues of W.
   c. Divide W by max(|eigenvalues|).
   d. Multiply W by the spectral radius (S_2).
   e. For fold = 1 to 10 (cross-validation):
      i. Create training and validation sets.
      ii. Simulate the network on the training set.
      iii. Train the readout.
      iv. Compute training errors.
      v. Compute validation errors.
   f. Compute the fitness of S_i.

C. Topology and Radius Searching

Topology and Radius Searching uses a GA to simultaneously search for the spectral radius and the network topology. This method is an intermediate approach between Classical Searching and RCDESIGN: it works like Classical Searching, optimizing only the RC generation parameters; however, like RCDESIGN, it disregards the restriction that approximates RC to linear systems. It uses the GA to search the principal RC generation parameters, which are the reservoir size, the spectral radius, the density of interconnections of W and the use or not of the optional connections. Neurons use tanh() and the readout function is the pseudo-inverse. The pseudo-code is the same as in Classical Searching; the difference is in the formation of the individual. The notation S_ij denotes the j-th characteristic (gene) of the individual denoted as i:

• S_1, or η – size of W_res^res, W_in^res, W_out^res and W_bias^res; range [50; 200].
• S_2 – spectral radius; range [0.7; 1.5].
• S_3 – density of interconnections; range [10%; 100%].
• S_4 – if 1, the connection W_out^res exists; if 0, there is no connection.
• S_5 – if 1, the connection W_bias^res exists; if 0, there is no connection.
• S_6 – if 1, the connection W_in^out exists; if 0, there is no connection.
• S_7 – if 1, the connection W_out^out exists; if 0, there is no connection.
• S_8 – if 1, the connection W_bias^out exists; if 0, there is no connection.

IV. CRITERIA FOR COMPARING THE METHODS

The methods are compared in terms of computational complexity, more specifically time and processing, and in terms of prediction error.

A. Time Complexity

The term "complexity", concerning algorithms, refers to the resources an algorithm requires to solve a problem from a computational point of view. When that resource is time, one or more fundamental operations are chosen and the number of times this fundamental operation is executed during the run of the algorithm is counted. Intuitively, if an instance with an input of length n can be solved in g(n) steps, the problem is said to have a time complexity of O(g(n)). When the cost of a problem is given in this notation for one machine and language configuration, the cost remains the same on all computers, so this notation generalizes the notion of cost regardless of the equipment used.

Through the analysis of the pseudo-code of the RCDESIGN, Classical Searching and Topology and Radius Searching methods, three basic differences were noted among them:
• the size of the individuals (T_S);
• the rates of the genetic operators (T_O);
• the rescaling of the W matrix by the spectral radius.

A genetic algorithm's complexity depends on the size of the individual, on the number of generations, on the population size and on the probability of the genetic operators. The complexity of the population generation procedure can be defined, in a simplified way, as O(TP · T_S), and the reproduction procedure's complexity as O(NG · TP · T_S). The difference in the size of the individuals in the three methods implies a difference in the complexity of the initial population generation and also of the reproduction operations (crossover and mutation). Rescaling the W matrix by the spectral radius incurs a high computational cost, since the process involves computing all eigenvalues of W, dividing W by the highest absolute value among its eigenvalues and finally multiplying W by the spectral radius. According to Anderson et al. [21], the time complexity of computing the eigenvalues of an n × n matrix is O(n³), so we defined, in a simplified manner, the complexity of rescaling the W matrix as O(n³) per individual. TABLE II presents a summary of the basic differences among the three methods in terms of time complexity; a small timing sketch of the rescaling step follows the table. In light of TABLE II, the assumption that a weights search method for RC would be more costly in computational terms than a search over the RC generation parameters, due to the large size of the search space [16], is misguided: the cost associated with the large search space is compensated by the absence of the cost of rescaling the W matrix by the spectral radius.

TABLE II
COMPARISON OF THE TIME COMPLEXITY

  Operation                            RCDESIGN          Classical          Topology and
                                                         Searching          Radius Searching
  Creating the initial population      O(TP · T_S)       O(TP · T_S)        O(TP · T_S)
  Reproduction                         O(NG · TP · T_S)  O(NG · TP · T_S)   O(NG · TP · T_S)
  Rescale W by the spectral radius     -                 O(NG · TP · n³)    O(NG · TP · n³)

(T_S, the individual size, is much larger for RCDESIGN, which searches the weights directly.)
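To make the cost of the rescaling step concrete, a small timing sketch (matrix sizes and the 0.98 factor are arbitrary example values):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 200, 400, 800):
    W = rng.normal(0.0, 1.0, (n, n))
    t0 = time.perf_counter()
    rho = np.max(np.abs(np.linalg.eigvals(W)))   # dominant O(n^3) step
    W_scaled = (W / rho) * 0.98                  # cheap O(n^2) steps
    print(f"n={n:4d}  rescaling took {time.perf_counter() - t0:.4f}s")
```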

The time complexity of each method is reported as the average time spent by the method to create each network. The average time is calculated by equation (6):

$\bar{t} = \dfrac{1}{NG \cdot TP}\, t_{total}$   (6)

where t̄ is the average time, NG is the number of generations executed by the algorithm, TP is the size of the population and t_total is the time spent by the algorithm from its creation until the evaluation of the last individual.

B. Forecasting Errors

The system performance was measured by the mean squared error (MSE) specified in (5), the mean absolute percentage error (MAPE) specified in (7), and the mean absolute error (MAE) specified in (8):

$MAPE = \dfrac{100}{PN} \sum_{j=1}^{P} \sum_{i=1}^{N} \left| \dfrac{L_{ij} - T_{ij}}{T_{ij}} \right|$   (7)

$MAE = \dfrac{1}{PN} \sum_{j=1}^{P} \sum_{i=1}^{N} \left| L_{ij} - T_{ij} \right|$   (8)

where P is the total number of patterns in the set, N is the number of output units of the ESN, and L_ij and T_ij are the actual and desired (target) outputs of the i-th neuron in the output layer, respectively.
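A direct numpy transcription of metrics (5), (7) and (8), with L and T as P × N arrays (a sketch; variable names are ours):

```python
import numpy as np

def mse(L, T):
    """Equation (5): 100 times the mean squared error over all patterns and outputs."""
    return 100.0 * np.mean((L - T) ** 2)

def mape(L, T):
    """Equation (7): mean absolute percentage error."""
    return 100.0 * np.mean(np.abs((L - T) / T))

def mae(L, T):
    """Equation (8): mean absolute error."""
    return np.mean(np.abs(L - T))

# Example with P=4 patterns and N=1 output unit.
T = np.array([[1.0], [2.0], [4.0], [5.0]])   # targets
L = np.array([[1.1], [1.8], [4.2], [5.0]])   # predictions
print(mse(L, T), mape(L, T), mae(L, T))
```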

V. DATA SETS

In this research, real data from the SONDA project (System of National Organization of Ambient Data, http://www.cptec.inpe.br/sonda/) were used to create the models. SONDA is a project of Brazil's National Institute of Space Research (INPE) for the implementation of the physical infrastructure and human resources needed to gather and improve the database of solar and wind energy resources in Brazil. Wind power modeling in energy planning is based on the statistical operation of wind farms, considering the wind regime; for the Northeast region of Brazil, the most important characteristic is the wind speed, as shown in [20].

We chose three wind speed time series for the experiments. The first series consists of the average hourly wind speeds recorded by the wind station in Triunfo in the period from July 01, 2004 to December 31, 2006, which amounts to 21,936 patterns. The second series consists of the average hourly wind speeds recorded by the wind station of Belo Jardim in the period from July 01, 2004 to December 31, 2005, which amounts to 13,176 patterns. The third series consists of the average hourly wind speeds recorded by the wind station of São João do Cariri, located in Paraíba, in the period from January 01, 2006 to December 31, 2007, which amounts to 17,520 patterns.

The bases were preprocessed with 20% being added to the maximum value of each series: the maximum value accepted by the model is the maximum value found in the series increased by 20%, and the minimum value is zero. The values of each series were then scaled into the range [0; 1]. A twenty-four-step-ahead predictor of the average hourly wind speed was chosen.
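A sketch of this preprocessing and of building 24-step-ahead targets. Array names are ours and the random series stands in for the real data; the exact input/target windowing used by the authors is not specified here, so the pairing below is only one plausible choice.

```python
import numpy as np

def normalize(series):
    """Scale to [0, 1] with 20% headroom above the observed maximum."""
    vmax = series.max() * 1.2     # maximum value accepted by the model
    return series / vmax, vmax

def make_pairs(series, horizon=24):
    """Pair each sample with the value `horizon` hours ahead."""
    inputs = series[:-horizon]
    targets = series[horizon:]
    return inputs, targets

# Stand-in for an hourly wind speed series (e.g., Triunfo: 21,936 patterns).
hourly = np.abs(np.random.default_rng(0).normal(7.0, 2.0, 21936))
scaled, vmax = normalize(hourly)
u, t = make_pairs(scaled)         # u[i] -> predict t[i], 24 hours ahead
```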

VI. RESULTS AND COMPARATIVE ANALYSIS

For the Triunfo base, the RCDESIGN method selected a network that enabled, in addition to the required connections, most of the optional ones. The topology of the network chosen by Classical Searching is fixed, having only the obligatory RC connections. Topology and Radius Searching selected a topology that also enabled some of the optional connections. The spectral radius of the network chosen by RCDESIGN was equal to 1.5558, which is greater than the limit suggested by Jaeger in [12]. The spectral radius of the network chosen by Classical Searching was equal to 1, and that of the network selected by Topology and Radius Searching was equal to 1.1.

TABLE III shows the average values and the standard deviations (in parentheses) of the prediction errors, as well as the spectral radius and the sizes of the three networks selected for the Triunfo base. As the table shows, the MSE, MAPE and MAE of the system created by RCDESIGN were much smaller than those of the system created by Classical Searching. The results of the system created by RCDESIGN were also better than those of Topology and Radius Searching, although the two were close.

TABLE III
RESULTS IN TRIUNFO

  RCDESIGN. Size = 91. Spec. = 1.5558
               MSE              MAPE (%)       MAE (%)
  Training     0.0876 (0.002)   8.43 (0.10)    0.83 (0.71)
  Validation   0.0876 (0.002)   8.43 (0.09)    0.83 (0.71)
  Test         0.0988 (0.002)   9.88 (0.09)    0.87 (0.77)

  Classical Searching. Size = 176. Spec. = 1
               MSE              MAPE (%)       MAE (%)
  Training     0.4541 (0.008)   19.52 (0.24)   1.92 (1.60)
  Validation   0.4541 (0.007)   19.52 (0.21)   1.92 (1.57)
  Test         0.3501 (0.006)   20.43 (0.21)   1.70 (1.38)

  Topology and Radius Searching. Size = 182. Spec. = 1.1
               MSE              MAPE (%)       MAE (%)
  Training     0.0944 (0.002)   8.721 (0.10)   0.87 (0.74)
  Validation   0.0944 (0.002)   8.721 (0.09)   0.87 (0.73)
  Test         0.108 (0.002)    10.19 (0.10)   0.90 (0.81)

When compared with the model created by Ferreira et al. in [17] for the same base, RCDESIGN was also superior. The model generated in [17] obtained an MSE of 0.95 in the test set with a network of 400 neurons in its reservoir, while RCDESIGN obtained an MSE of 0.0988 with a reservoir of only 91 neurons. Compared with another model created by Ferreira et al. in [18] for the same base, RCDESIGN was again superior: the model created in [18] obtained an MSE of 0.96 in the test set with a network of 547 neurons in its reservoir and a spectral radius of 0.947, while RCDESIGN generated a network with an MSE of 0.0988, 91 neurons in the reservoir and a spectral radius of 1.5558.

The networks selected by RCDESIGN and by Topology and Radius Searching performed very well on the Triunfo base in spite of using a spectral radius larger than 1. Fig. 4 presents the last 50 test set patterns of the predictions produced by the models generated by the three methods. As we can see, the models created by RCDESIGN and by Topology and Radius Searching were able to learn and identify most of the behavior presented in this series, while the model created by Classical Searching showed a much inferior performance.

Fig. 4. Test patterns for Triunfo. (A) Actual vs. RCDESIGN, (B) Actual vs. Classical Searching and (C) Actual vs. Topology and Radius Searching.

As we can see in Fig. 5, the fitness of the best individual decreases steadily under the RCDESIGN method, but this decreasing behavior is not seen in the other two methods. The irregular behavior presented by Classical Searching and by Topology and Radius Searching reinforces what Ozturk et al. demonstrated in [13]: the reservoir dynamics is not determined by its fixed weights alone, since the input signals also influence this dynamics. They showed that there are various weight matrices with the same spectral radius which, unfortunately, do not display the same performance when the MSE is calculated. The same behavior was observed for the Belo Jardim and São João do Cariri bases.

Fig. 5. Comparison of the fitness evolution for the Triunfo base. (A) Classical Searching, (B) Topology and Radius Searching and (C) RCDESIGN.
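The point made by Ozturk et al. [13] is easy to reproduce in a few lines: reservoirs rescaled to exactly the same spectral radius can still exhibit very different dynamics. The sketch below is our own illustration, not the experiment from [13]; it drives several such reservoirs with the same input and compares the resulting state trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, rho_target = 100, 200, 0.9
u = rng.uniform(-1, 1, steps)                # one shared input signal

for trial in range(3):
    W = rng.normal(0.0, 1.0, (n, n))
    W *= rho_target / np.max(np.abs(np.linalg.eigvals(W)))  # same spectral radius
    w_in = rng.uniform(-1, 1, n)
    x = np.zeros(n)
    states = []
    for t in range(steps):
        x = np.tanh(W @ x + w_in * u[t])
        states.append(x.copy())
    # Same rho, yet the richness of the dynamics differs between trials.
    variance = np.var(np.array(states), axis=0).mean()
    print(f"trial {trial}: mean state variance = {variance:.4f}")
```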

For the Belo Jardim base, the RCDESIGN method selected a network that again enabled several of the optional connections. The topology of the network chosen by Classical Searching is fixed, with only the obligatory RC connections. Topology and Radius Searching also selected a topology with optional connections. The spectral radii of the networks chosen by RCDESIGN and by Topology and Radius Searching were larger than the limit (1) suggested for the echo state property. TABLE IV presents the average values and the standard deviations (in parentheses) of the prediction errors, as well as the spectral radius and the sizes of the three networks selected for the Belo Jardim base.

TABLE IV
RESULTS IN BELO JARDIM

  RCDESIGN. Size = 80. Spec. = 1.6144
               MSE              MAPE (%)       MAE (%)
  Training     0.4976 (0.010)   17.48 (0.24)   0.69 (0.59)
  Validation   0.4976 (0.009)   17.48 (0.22)   0.69 (0.59)
  Test         0.3988 (0.007)   12.1 (0.13)    0.63 (0.52)

  Classical Searching. Size = 197. Spec. = 1
               MSE              MAPE (%)       MAE (%)
  Training     1.251 (0.018)    31.57 (0.51)   1.15 (0.88)
  Validation   1.251 (0.018)    31.57 (0.44)   1.15 (0.87)
  Test         1.117 (0.016)    20.18 (0.21)   1.07 (0.85)

  Topology and Radius Searching. Size = 92. Spec. = 1.3
               MSE              MAPE (%)       MAE (%)
  Training     0.5489 (0.010)   18.57 (0.23)   0.73 (0.62)
  Validation   0.5489 (0.010)   18.57 (0.23)   0.73 (0.61)
  Test         0.4514 (0.008)   12.52 (0.13)   0.66 (0.56)

The prediction errors of the system created by RCDESIGN were much smaller than those of the system designed by Classical Searching, and also smaller than the results of the Topology and Radius Searching system. When compared with the model created by Ferreira et al. in [17] for the same base, RCDESIGN was also superior. The model created in [17] obtained, in the test set, an MSE, MAPE and MAE of 4.11, 18.21 and 0.71, respectively, with a network of 200 neurons. The model generated by RCDESIGN presented an MSE, MAPE and MAE of 0.3988, 12.1 and 0.63, respectively, with a network of only 80 neurons. Fig. 6 displays the last 50 patterns of the test set, from the predictions carried out by the models created by the three methods. As we can see, the model generated by RCDESIGN as well as the model designed by Topology and Radius Searching were able to learn and identify most of the behavior presented in this series, while the model created by Classical Searching had a much inferior performance.

Fig. 6. Test patterns for Belo Jardim. (A) Actual vs. RCDESIGN, (B) Actual vs. Classical Searching and (C) Actual vs. Topology and Radius Searching.

For the São João do Cariri base, the RCDESIGN method chose a network that enabled most of the optional connections, the network chosen by Classical Searching kept the fixed topology with only the obligatory connections, and Topology and Radius Searching again selected a topology with some optional connections. TABLE V presents the average values and the standard deviations (in parentheses) of the prediction errors, as well as the spectral radius and the size of the networks chosen by the three methods for the São João do Cariri base. The prediction errors of the system created by RCDESIGN were much lower than those of the system designed by Classical Searching, and also lower than the results of the Topology and Radius Searching system. The networks selected by RCDESIGN and by Topology and Radius Searching displayed a much better performance for the São João do Cariri base, even using a spectral radius much larger than 1.

TABLE V
RESULTS IN SÃO JOÃO DO CARIRI

  RCDESIGN. Size = 106. Spec. = 1.7826
               MSE              MAPE (%)       MAE (%)
  Training     0.3715 (0.008)   17.3 (0.27)    0.67 (0.59)
  Validation   0.3715 (0.007)   17.3 (0.24)    0.67 (0.58)
  Test         0.3226 (0.006)   13.01 (0.82)   0.64 (0.53)

  Classical Searching. Size = 179. Spec. = 1
               MSE              MAPE (%)       MAE (%)
  Training     1.098 (0.017)    33.61 (0.57)   1.21 (0.94)
  Validation   1.098 (0.017)    33.61 (0.57)   1.21 (0.94)
  Test         1.144 (0.017)    24.71 (1.60)   1.22 (0.97)

  Topology and Radius Searching. Size = 170. Spec. = 1.4
               MSE              MAPE (%)       MAE (%)
  Training     0.4286 (0.009)   18.3 (0.27)    0.72 (0.63)
  Validation   0.4286 (0.009)   18.3 (0.24)    0.72 (0.62)
  Test         0.3912 (0.007)   13.91 (0.91)   0.70 (0.59)

TABLE VI presents the average time to generate each network (in seconds) for each method. The average generating time of the RCDESIGN networks was consistently lower than the average times of Classical Searching and of Topology and Radius Searching, which means that the strategy of searching the weights of the W matrix has a lower computational cost than the strategy of searching the RC generation parameters when the latter involves rescaling the W matrix by the spectral radius.

TABLE VI
RESULTS OF TIME COMPLEXITY (AVERAGE TIME PER NETWORK, IN SECONDS)

  Base                 RCDESIGN   Classical   Topology and
                                  Searching   Radius Searching
  Triunfo              3.7625     8.1702      8.6523
  Belo Jardim          1.8951     3.8850      3.3211
  São João do Cariri   2.1693     6.4905      5.9670

VII. CONCLUSION

The results presented in this study show that it is possible to optimize parameters, topology and weights simultaneously, without having to impose limitations to reduce the search space. Moreover, the RCDESIGN method showed a lower computational cost for finding the best system than the Classical Searching and Topology and Radius Searching methods. The results on the bases studied show that there is no need to impose a spectral radius smaller than or equal to 1 on the W matrix [14] in order to create "good" RCs. The networks generated by the method that optimizes the weights of W (RCDESIGN) were also smaller than the networks designed by the other methods, which implies a lower computational cost when operating these networks.

REFERENCES

[1] R. T. Rockafellar. (2010, December). "Fundamentals of Optimization". [Online]. Available: http://www.math.washington.edu/~rtr/fundamentals.pdf
[2] U. D. Schiller and J. J. Steil, "Analyzing the weight dynamics of recurrent learning algorithms", Neurocomputing, vol. 63C, pp. 5-23, 2005.
[3] J. Schmidhuber, D. Wierstra, M. Gagliolo and F. Gomez, "Training recurrent networks by Evolino", Neural Computation, vol. 19(3), pp. 757-779, 2007.
[4] T. van der Zant, V. Becanovic, K. Ishii, H. U. Kobialka and P. G. Plöger, "Finding good echo state networks to control an underwater robot using evolutionary computations", in Proceedings of the 5th Symposium on Intelligent Autonomous Vehicles (IAV 2004), 2004.
[5] X. Dutoit, H. Van Brussel and M. Nuttin, "A first attempt of reservoir pruning for classification problems", in Proceedings of the 15th European Symposium on Artificial Neural Networks (ESANN 2007), pp. 507-512, 2007.
[6] M. Lukosevicius and H. Jaeger, "Reservoir computing approaches to recurrent neural network training", Computer Science Review, vol. 3(3), pp. 127-149, 2009.
[7] B. Schrauwen, D. Verstraeten and M. D'Haene. (2010, December). "Reservoir Computing Toolbox Manual". [Online]. Available: http://reslab.elis.ugent.be/rctoolbox
[8] W. Maass, T. Natschläger and H. Markram, "Real-time computing without stable states: A new framework for neural computation based on perturbations", Neural Computation, vol. 14(11), pp. 2531-2560, 2002.
[9] H. Jaeger, "The echo state approach to analyzing and training recurrent neural networks", Technical Report GMD 148, German National Research Center for Information Technology, 2001.
[10] H. Jaeger, W. Maass and J. Principe, "Special issue on echo state networks and liquid state machines", Neural Networks, vol. 20(3), pp. 287-289, 2007.
[11] D. Verstraeten, B. Schrauwen, M. D'Haene and D. Stroobandt, "An experimental unification of reservoir computing methods", Neural Networks, vol. 20(3), pp. 391-403, 2007.
[12] H. Jaeger, "Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the echo state network approach", Technical Report GMD 159, German National Research Center for Information Technology, 2002.
[13] M. C. Ozturk, D. Xu and J. C. Príncipe, "Analysis and design of echo state networks", Neural Computation, vol. 19(1), pp. 111-138, 2007.
[14] M. C. Ozturk and J. C. Príncipe, "Computing with transiently stable states", in Proceedings of the IEEE International Joint Conference on Neural Networks 2005, vol. 3, pp. 1467-1472, 2005.
[15] J. H. Holland, Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence, MIT Press, Cambridge, MA, USA, 1992.
[16] K. Ishii, T. van der Zant, V. Becanovic and P. Plöger, "Identification of motion with echo state network", in Proceedings of OCEANS 2004 MTS/IEEE - TECHNO-OCEAN 2004, vol. 3, pp. 1205-1210, 2004.
[17] A. A. Ferreira, T. B. Ludermir, R. B. de Aquino, M. M. S. Lira and O. N. Neto, "Investigating the use of reservoir computing for forecasting the hourly wind speed in short-term", in Proceedings of the International Joint Conference on Neural Networks 2008, pp. 1950-1957, 2008.
[18] A. A. Ferreira and T. B. Ludermir, "Genetic algorithm for reservoir computing optimization", in Proceedings of the International Joint Conference on Neural Networks 2009, Atlanta, pp. 811-815, 2009.
[19] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics, Springer, 2006.
[20] R. R. B. Aquino, M. A. C. Junior, M. M. S. Lira, O. N. Neto, G. J. Almeida and S. N. N. Tiburcio, "Recurrent neural networks solving a real large scale mid-term scheduling for power plants", in Proceedings of the International Joint Conference on Neural Networks 2010 (IJCNN 2010), Barcelona, pp. 3439-3444, 2010.
[21] E. Anderson, Z. Bai, C. Bischof et al., LAPACK Users' Guide, SIAM, 1999. [Online]. Available: http://www.netlib.org/lapack/lug/ (accessed December 2010).