
A Reinforcement Learning Automata Optimization Approach for Optimum Tuning of PID Controller in AVR System

Mohammad Kashki¹, Youssef Lotfy Abdel-Magid², and Mohammad Ali Abido³

¹ Electrical Eng. Dept. of Shahid Bahonar University, Kerman, Iran
² Electrical Engineering Program, The Petroleum Institute, Abu Dhabi, UAE
³ Dept. of Electrical Eng., King Fahd University of Petroleum & Minerals, Dhahran, SA

Abstract. In this paper, an efficient optimization method based on reinforcement learning automata (RLA) for the optimum parameter setting of a conventional proportional-integral-derivative (PID) controller for the AVR system of a power synchronous generator is proposed. The proposed method is Continuous Action Reinforcement Learning Automata (CARLA), which is able to explore and learn to improve control performance without knowledge of the analytical system model. This paper presents the full details of the CARLA technique and compares its performance with Particle Swarm Optimization (PSO) and Genetic Algorithms (GA), two well-known evolutionary optimization methods. The simulation results show the superior efficiency and performance of the proposed method compared with the other two.

Keywords: reinforcement learning automata; CARLA; PID; evolutionary computations.

1 Introduction

The PID controller is the most frequently used control element in industry, ahead of alternatives such as adaptive controllers, artificial neural network based controllers, and fuzzy and neuro-fuzzy controllers. It is estimated that at least 90% of the controllers employed in industry are PIDs or their variants [1]. The popularity of the PID controller is attributed to its simple structure, high reliability, and robust performance over a wide range of operating conditions. Despite these good features, appropriate gain tuning of the PID controller is still a problem in many practical industrial applications because of the high order, time delay and nonlinearity of the plants [2]. In many applications tuning is carried out using the classical rules proposed by Ziegler and Nichols [3], which in general do not yield optimal or near-optimal behavior in many industrial plants and cannot be counted on as a feasible solution. In recent years, many heuristic methods for the optimum tuning of PID parameters, such as genetic algorithms (GA) and simulated annealing (SA), have been proposed with noticeable success in solving complex optimization problems [4-9]. More recently, a modern heuristic algorithm called Particle Swarm Optimization (PSO) was proposed by Kennedy and Eberhart; it was developed through simulation of a simplified social system and has been found to be robust in solving nonlinear optimization problems [10].

In this paper, a reinforcement learning automata based method called Continuous Action Reinforcement Learning Automata (CARLA), first introduced by Howell, Frost, Gordon and Wu [12], is used for the optimum tuning of the PID controller of a synchronous generator, and its performance is compared with the PSO and GA based PID controllers thoroughly investigated by Gaing [13]. CARLA operates through interaction with a random or unknown environment by selecting actions in a stochastic trial-and-error process. The CARLA method has been successfully applied to different kinds of optimization problems [14-15]. The generator excitation system maintains the generator voltage and controls the reactive power flow using an automatic voltage regulator (AVR) [16]. The role of the AVR is to hold the terminal voltage magnitude of a synchronous generator at a specified level; hence, the stability of the AVR system seriously affects the security of the power system.

2 PID Controller

The PID controller is composed of three main components: proportional, integral and derivative. Fig. 1 shows this structure.

Fig. 1. Structure of the PID controller: the proportional (kp), integral (ki/s) and derivative (kd s) branches act on the error e(t) = yref(t) − y(t), and their sum forms the control signal u(t) that drives the process.

Here yref(t) is the reference output, e(t) is the error, u(t) is the control signal, and y(t) is the output. Each of the PID controller components has its own specific effect on the controller performance: the proportional component increases the loop gain to make the system less sensitive to disturbances, the integral component is used principally to eliminate steady-state errors, and the derivative action helps to improve closed-loop stability [14]. The gain parameters kp, ki, kd are thus chosen to meet prescribed performance criteria, classically specified in terms of rise and settling times, overshoot and steady-state error following a step change in the reference output signal. The PID control law can be expressed by Eq. (1):

$$u(t) = k_p e(t) + k_i \int_0^t e(\tau)\,d\tau + k_d \frac{de(t)}{dt}$$ (1)

Thus, the decision variables involved in the optimization problem are the gain parameters kp, ki and kd.
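As an illustration, the following minimal Python sketch writes the controller of Eq. (1) in transfer-function form, C(s) = kp + ki/s + kd·s = (kd·s² + kp·s + ki)/s, as plain polynomial coefficient lists (highest power first). The gain values are arbitrary placeholders, not results from the paper.

```python
# Sketch of Eq. (1) in the Laplace domain: U(s)/E(s) = (kd s^2 + kp s + ki)/s.
# Note the ideal derivative makes C(s) improper on its own; it only becomes
# proper once multiplied by the plant (see the AVR sketch in Section 3).
def pid_tf(kp, ki, kd):
    """Return (num, den) coefficient lists of the ideal PID controller C(s)."""
    return [kd, kp, ki], [1.0, 0.0]

num, den = pid_tf(kp=1.0, ki=0.5, kd=0.2)
print("C(s) =", num, "/", den)   # [0.2, 1.0, 0.5] / [1.0, 0.0]
```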


3 Automatic Voltage Regulator

The role of an AVR is to hold the terminal voltage magnitude of a synchronous generator at a specified level. A simple AVR system is composed of four main components, namely the amplifier, the exciter, the generator, and the sensor. For comparison with the PSO and GA based PID controllers, a linearized model of the AVR is considered in this paper; it takes into account the major time constant of each AVR component and ignores nonlinearities [11], as shown in Fig. 2.

Fig. 2. Block diagram of the linearized AVR with PID controller: the PID controller kp + ki/s + kd·s drives the amplifier KA/(1 + τA s), exciter KE/(1 + τE s) and generator KG/(1 + τG s) in the forward path, with the sensor KR/(1 + τR s) in the feedback path.

Here Vref is the reference voltage and Vr is the excitation voltage. Table 1 summarizes the typical ranges of the linearized model parameters.

Table 1. Typical range of AVR linearized model parameters

AVR component   Parameter   Typical range
Amplifier       KA          [10, 400]
                τA          [0.02, 0.1]
Exciter         KE          [10, 400]
                τE          [0.5, 1]
Generator       KG          [0.7, 1]
                τG          [1, 2]
Sensor          KR          [1, 2]
                τR          [0.001, 0.06]
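The loop of Fig. 2 is straightforward to compose numerically. The sketch below builds the closed-loop terminal-voltage transfer function Vt(s)/Vref(s) = G/(1 + GH) with scipy.signal; this composition is our own illustration (the authors ran their simulations in Matlab/Simulink), and the function name avr_closed_loop is our choice.

```python
# Sketch of the linearized AVR loop of Fig. 2: PID, amplifier, exciter and
# generator in the forward path G(s); sensor in the feedback path H(s).
import numpy as np
from scipy import signal

def avr_closed_loop(kp, ki, kd, KA, tA, KE, tE, KG, tG, KR, tR):
    # Forward path G(s) = C(s) * KA/(1+tA s) * KE/(1+tE s) * KG/(1+tG s),
    # with C(s) = (kd s^2 + kp s + ki) / s the PID of Eq. (1).
    num, den = np.array([kd, kp, ki]), np.array([1.0, 0.0])
    for K, tau in ((KA, tA), (KE, tE), (KG, tG)):
        num, den = np.polymul(num, [K]), np.polymul(den, [tau, 1.0])
    # Feedback path H(s) = KR/(1+tR s); closed loop T(s) = G/(1 + G*H).
    hn, hd = [KR], [tR, 1.0]
    return signal.TransferFunction(
        np.polymul(num, hd),
        np.polyadd(np.polymul(den, hd), np.polymul(num, hn)))
```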

4 Continuous Action Reinforcement Learning Automata

In the CARLA optimization method, a continuous probability density function (CPDF) is associated with each decision variable, and the optimal values of the decision variables are found by modifying these CPDFs over a sufficient number of iterations. The modification in each iteration is driven by a reinforcement signal derived from a predefined cost function. Fig. 3 shows the diagram of CARLA. The learners for the individual decision variables run in parallel, all minimizing the same predefined cost function; their only interconnection is through the environment, i.e. the PID controller and the AVR system, and the shared performance evaluation function. The probability density functions are the basis of random action selection and are initially defined uniformly as in Eq. (2).


Fig. 3. Diagram of the CARLA optimization method: initialize the probability distribution functions f1(0), f2(0), ..., fn(0); in each iteration, randomly select an action for each decision variable according to its CPDF, evaluate the system, calculate the cost function and the reinforcement signal, and update the PDFs fj(k) → fj(k+1), j = 1, 2, ..., n; repeat until convergence.

$$f_i^{(0)}(x_i) = \begin{cases} \dfrac{1}{x_{i,\max} - x_{i,\min}} & x_i \in [x_{i,\min}, x_{i,\max}] \\[4pt] 0 & \text{otherwise} \end{cases} \qquad i = 1, 2, \ldots, n$$ (2)

Here n is the number of decision variables, x1, x2, ..., xn are the decision variables and f1, f2, ..., fn are the corresponding CPDFs. The actions, i.e. the values of the decision variables selected in each iteration, are chosen according to Eq. (3):

$$\int_0^{x_i} f_i^{(k)}(x)\,dx = z_i(k), \qquad i = 1, 2, \ldots, n$$ (3)
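In practice, Eq. (3) is inverse-CDF sampling. A minimal sketch on a discretized CPDF grid follows; the grid resolution and helper names are our own choices, not the authors' implementation.

```python
# Sketch of Eqs. (2)-(3): a uniform initial CPDF on a grid, and action
# selection by drawing z ~ U[0,1] and inverting the cumulative distribution.
import numpy as np

def uniform_cpdf(x_min, x_max, grid=500):          # Eq. (2)
    x = np.linspace(x_min, x_max, grid)
    f = np.full(grid, 1.0 / (x_max - x_min))
    return x, f

def select_action(x, f, rng):                      # Eq. (3)
    cdf = np.cumsum(f)
    cdf /= cdf[-1]                                 # force cdf[-1] == 1
    return np.interp(rng.random(), cdf, x)

rng = np.random.default_rng(0)
x, f = uniform_cpdf(0.0, 1.5)                      # e.g. the kp range used later
kp_trial = select_action(x, f, rng)
```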

Here k is the iteration number and zi(k) is drawn uniformly at random from [0, 1]. Once all actions have been selected, the PID controller is constructed and applied to the plant for a suitable time. In this paper this step is done by computer simulation, but in practical applications it can be a real-time evaluation. After this evaluation, a scalar cost value is calculated according to a predefined cost function. Normally the IAE, ISE, or ITSE functions are selected as the cost function, but each of these criteria has some disadvantages [13]. Moreover, for an accurate comparison with PSO-PID and GA-PID, the same cost function is selected, as expressed in Eq. (4):

$$J^{(k)} = (1 - e^{-\lambda})(M_p + E_{ss}) + e^{-\lambda}(t_s - t_r)$$ (4)

where J(k) is the cost of the kth iteration, λ is the weighting factor, set between 0.8 and 1.5, Mp is the overshoot of the output signal, Ess is the steady-state error, and ts, tr are the settling and rise times, respectively. It is reiterated that the CARLA algorithm does not require knowledge of the system dynamics, but the designer should be aware of the system behavior in order to define an appropriate cost function. The performance evaluation and the consequent modification of the CPDFs are carried out through the reinforcement signal, which is defined by Eq. (5):

$$\beta(k) = \min\left\{1,\; \max\left\{0,\; \frac{J_{mean} - J^{(k)}}{J_{mean} - J_{min}}\right\}\right\}$$ (5)
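A sketch of Eq. (4) computed from a simulated step response, plus the reinforcement signal of Eq. (5), is given below. The 2% settling band and the 10-90% rise-time definition are our assumptions; the paper does not state the thresholds it used.

```python
import numpy as np

def step_cost(t, y, lam=1.0, band=0.02):
    """Eq. (4) from a step-response pair (t, y) toward a unit reference."""
    y_final = y[-1]
    Mp = max(0.0, (np.max(y) - y_final) / y_final)       # overshoot
    Ess = abs(1.0 - y_final)                             # steady-state error
    outside = np.where(np.abs(y - y_final) > band * abs(y_final))[0]
    ts = t[outside[-1]] if outside.size else t[0]        # settling time
    tr = t[np.argmax(y >= 0.9 * y_final)] - t[np.argmax(y >= 0.1 * y_final)]
    return (1 - np.exp(-lam)) * (Mp + Ess) + np.exp(-lam) * (ts - tr)

def reinforcement(J, history):
    """Eq. (5): beta in [0,1]; 0 = inaction, 1 = full reward."""
    if len(history) < 2:
        return 0.0
    J_mean, J_min = np.mean(history), np.min(history)
    if J_mean <= J_min:                                  # degenerate history
        return 0.0
    return float(np.clip((J_mean - J) / (J_mean - J_min), 0.0, 1.0))
```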

Where β(k) is the reinforcement signal in the kth iteration and Jmean, Jmin are the mean and minimum values of previous iterations cost value, respectively. This definition of the reinforcement signal performs a reward/inaction rule in CPDFs modification. In the other word, if current selected actions are less that mean value of previous cost, i.e. β=0, then no modification of CPDFs must be performed (inaction) and, if selected action lead to cost value less than minimum of previous cost, i.e. β=1, then maximum reinforcement will be done (reward). The CPDFs modification is done by Eq. 6.

(

⎧α ( k ) f i ( k ) ( x ) + β ( k ) H i ( x i , ~ xi ) f i ( k +1) ( x) = ⎨ i 0 ⎩

)

x i ∈ [ x i ,min , x i ,max ] otherwise

(6)

Here Hi(x, x̃i) is a symmetrical Gaussian function centered at the chosen action x̃i and defined by Eq. (7):

$$H_i(x, \tilde{x}_i) = \frac{g_h}{x_{i,\max} - x_{i,\min}} \exp\left( -\frac{(x - \tilde{x}_i)^2}{2 \left( g_w (x_{i,\max} - x_{i,\min}) \right)^2} \right)$$ (7)

Here gh and gw are the normalized height and width of the Gaussian function, respectively; they determine the speed and resolution of learning. The Gaussian function changes the probability of the selected actions as well as of their neighboring actions. The parameter αi(k) is the distribution normalization factor for the (k+1)th iteration and is defined by Eq. (8):

$$\alpha_i^{(k)} = \frac{1}{\displaystyle\int_{x_{i,\min}}^{x_{i,\max}} \left( f_i^{(k)}(x_i) + \beta(k)\, H(x_i, \tilde{x}_i) \right) dx_i}$$ (8)
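The update of Eqs. (6)-(8) amounts to adding a Gaussian bump around the chosen action, scaled by β, and renormalizing to unit area. A minimal sketch on the same discretized grid as before (the trapezoid normalization is our numerical choice):

```python
import numpy as np

def update_cpdf(x, f, x_sel, beta, gh=0.7, gw=0.03):
    """Eqs. (6)-(8): reinforce the CPDF f around the selected action x_sel."""
    span = x[-1] - x[0]
    H = (gh / span) * np.exp(-(x - x_sel) ** 2
                             / (2.0 * (gw * span) ** 2))     # Eq. (7)
    f_new = f + beta * H                                     # Eq. (6)
    area = np.sum(0.5 * (f_new[1:] + f_new[:-1]) * np.diff(x))
    return f_new / area                                      # Eq. (8): alpha
```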

Finally, the convergence criterion determines whether the algorithm should stop. This criterion can be a specified number of iterations, the stagnation of the selected actions, etc. Once the algorithm halts, the CPDFs are expected to peak at the optimal values of the corresponding decision variables.
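Wiring the helpers sketched above together with the AVR model from Section 3 gives the full loop. The evaluate function below, including its simulation horizon and the fixed-iteration stopping rule, is our own stand-in for the authors' Simulink-based evaluation; the plant parameters are the test values quoted later in Table 2.

```python
import numpy as np
from scipy import signal

def evaluate(kp, ki, kd, lam=1.5):
    sys_cl = avr_closed_loop(kp, ki, kd, KA=10, tA=0.1, KE=1, tE=0.4,
                             KG=1, tG=1.0, KR=1, tR=0.01)   # Table 2 values
    t, y = signal.step(sys_cl, T=np.linspace(0, 7, 2000))
    return step_cost(t, y, lam=lam)

rng = np.random.default_rng(1)
bounds = {"kp": (0.0, 1.5), "ki": (0.0, 1.0), "kd": (0.0, 1.0)}
cpdfs = {g: uniform_cpdf(lo, hi) for g, (lo, hi) in bounds.items()}
history = []
for k in range(150):                       # fixed-iteration stopping rule
    gains = {g: select_action(x, f, rng) for g, (x, f) in cpdfs.items()}
    J = evaluate(**gains)
    beta = reinforcement(J, history)
    history.append(J)
    for g, (x, f) in cpdfs.items():
        cpdfs[g] = (x, update_cpdf(x, f, gains[g], beta))
```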

5 Computer Simulations and Results

To verify the efficiency of the designed CARLA-PID controller, a practical high-order AVR system is used in the computer simulations. The simulations are carried out in the Matlab® and Simulink® environments. Furthermore, the performance of the proposed CARLA-PID is compared with that of the PSO-PID and GA-PID controllers under the same performance index criterion. Table 2 summarizes the parameters of the AVR system.


Table 2. The test AVR model parameters

AVR component   Parameter   Value
Amplifier       KA          10
                τA          0.1
Exciter         KE          1
                τE          0.4
Generator       KG          1
                τG          1
Sensor          KR          1
                τR          0.01
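The uncontrolled test discussed below can be reproduced with a sketch like the following: the Table 2 loop closed without any PID block in the forward path (T = G/(1 + GH), with G the amplifier-exciter-generator chain). The time grid is our choice; the printed figures should land near the overshoot and steady-state error reported in the text, with the exact values depending on the simulation grid.

```python
import numpy as np
from scipy import signal

num, den = np.array([1.0]), np.array([1.0])
for K, tau in ((10, 0.1), (1, 0.4), (1, 1.0)):   # amplifier, exciter, generator
    num, den = np.polymul(num, [K]), np.polymul(den, [tau, 1.0])
hn, hd = [1.0], [0.01, 1.0]                      # sensor
sys_cl = signal.TransferFunction(
    np.polymul(num, hd),
    np.polyadd(np.polymul(den, hd), np.polymul(num, hn)))
t, y = signal.step(sys_cl, T=np.linspace(0, 10, 4000))
print(f"overshoot = {(y.max() - y[-1]) / y[-1]:.1%}, Ess = {abs(1 - y[-1]):.1%}")
```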

To emphasize the necessity of a PID controller in this AVR system, the step response of the terminal voltage without a PID controller was simulated, as shown in Fig. 4. As can be seen, the overshoot and steady-state error are about 50.51% and 8.81%, respectively. Designing the PID controller by the Ziegler-Nichols method yields the gain parameters kp = 1.0228, ki = 1.8423, and kd = 0.1357. The terminal voltage response with the Ziegler-Nichols PID controller is shown in Fig. 5.

Fig. 4. Terminal voltage step response without PID controller

Fig. 5. Terminal voltage step response with Ziegler-Nichols PID controller
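For reference, the Ziegler-Nichols design above can be checked by reusing the avr_closed_loop sketch from Section 3 with the quoted gains and the Table 2 plant values; the time grid is again our choice.

```python
import numpy as np
from scipy import signal

sys_zn = avr_closed_loop(kp=1.0228, ki=1.8423, kd=0.1357,
                         KA=10, tA=0.1, KE=1, tE=0.4,
                         KG=1, tG=1.0, KR=1, tR=0.01)
t, y = signal.step(sys_zn, T=np.linspace(0, 5, 2000))
print(f"overshoot = {(y.max() - y[-1]) / max(y[-1], 1e-9):.1%}")
```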

Although the Ziegler-Nichols PID controller succeeds in eliminating the steady-state error, the overshoot value is still high, i.e. Mp = 64.42%.

5.1 Optimal PID Controllers

The performance evaluation of the optimal PID controllers, namely the CARLA-PID, the PSO-PID and the GA-PID, is performed for two different values of the weighting factor in the cost function, i.e. λ = 1.5 and λ = 1. The parameters of the CARLA algorithm are as follows:
- the decision variables are x1 = kp, x2 = ki and x3 = kd (n = 3);
- the upper and lower bounds of the decision variables are 0 ≤ kp ≤ 1.5, 0 ≤ ki ≤ 1, 0 ≤ kd ≤ 1;
- the height and width of the Gaussian function are gh = 0.7 and gw = 0.03.


Fig. 6 shows the variation of the CPDFs over the algorithm iterations for λ = 1.5.

Fig. 6. CPDF variation of the CARLA-PID gain parameters

The trend of convergence is shown in Fig. 7: the CARLA algorithm converges in about 89 iterations for λ = 1.5 and 120 iterations for λ = 1, while PSO and GA converge in fewer than 30 iterations.

Fig. 7. Trend of convergence of CARLA-PID for different values of λ (minimum of costs versus iterations)

The parameters of the PSO-PID controller are as follows:
- each individual consists of kp, ki and kd;
- population size = 50;
- the limit of change in velocity for each member is half of the maximum value of the corresponding gain parameter, as in CARLA-PID.

Finally, the parameters of the GA-PID controller with the elitism scheme [5,6] are as follows:
- each individual consists of kp, ki and kd;
- population size = 50;
- crossover rate Pc = 0.6;
- mutation rate Pm = 0.01.

5.2 Comparison of Optimal PID Controllers

Table 3 summarizes the best solutions of the optimum CARLA-PID, PSO-PID and GA-PID controller gains, together with the resulting cost function values.


Table 3. Comparison of the optimal PID controllers

                        λ = 1.5                          λ = 1
Parameter    PSO-PID   GA-PID   CARLA-PID    PSO-PID   GA-PID   CARLA-PID
kp           0.6476    0.8935   1.2191       0.6570    0.8663   1.0184
ki           0.5216    0.6458   0.2943       0.5390    0.7531   0.2809
kd           0.2375    0.4014   0.2742       0.2458    0.3365   0.2308
Mp (%)       14.91     14.73    1.2          15.38     16.58    1.93
Ess          0         0        0            0         0        0
ts (s)       3.8929    4.3684   0.5961       3.8540    3.7042   0.6462
tr (s)       0.1972    0.1787   0.1530       0.1934    0.1738   0.1689
Cost value   0.9408    1.0499   0.1084       1.4442    1.4056   0.1879

Fig. 8 shows the efficiency of each optimal PID controller in the AVR system for the different values of the weighting factor in the cost function. As the results show, the CARLA-PID achieves better efficiency and performance than the other two PID controllers. Moreover, Fig. 9 shows the control signals of the optimal PID controllers.

Fig. 8. Terminal voltage step response with the optimal PID controllers, for λ = 1.5 and λ = 1

Fig. 9. Control signals of the optimal PID controllers, for λ = 1.5 and λ = 1

6 Conclusion

An efficient design method using Continuous Action Reinforcement Learning Automata for the optimal tuning of the conventional PID controller in the AVR system of a synchronous generator has been proposed. The method does not require knowledge of the dynamics and equations of the plant. In addition, the performance of the proposed method has been compared with that of the Particle Swarm Optimization and Genetic Algorithm methods. As indicated by the simulation results, the proposed method converges more slowly than the other methods, but its performance is clearly superior to that of the GA and the PSO. Therefore, the proposed method can overcome some of the shortcomings of other optimization techniques and can be applied to more complex optimization problems.

References

1. Santos, M., de la Cruz, J.M.: Between Fuzzy-PID and PID-Conventional Controllers: a Good Choice, pp. 123–127 (1996)
2. Visioli, A.: Tuning of PID Controllers with Fuzzy Logic. IEE Proceedings - Control Theory and Applications 148(1), 1–8 (2001)
3. Ziegler, J.G., Nichols, N.B.: Optimum Settings for Automatic Controllers. Transactions of the ASME 64(1), 759–768 (1942)
4. Seng, T.L., Khalid, M.B., Yusof, R.: Tuning of a Neuro-Fuzzy Controller by Genetic Algorithm. IEEE Transactions on Systems, Man, and Cybernetics 29, 226–236 (1999)
5. Kawabe, T., Tagami, T.: A Real Coded Genetic Algorithm for Matrix Inequality Design Approach of Robust PID Controller with Two Degrees of Freedom. In: 12th IEEE International Symposium on Intelligent Control, Istanbul, Turkey, pp. 119–124 (1997)
6. Krohling, R.A., Jaschek, H., Rey, J.P.: Designing PI/PID Controllers for a Motion Control System Based on Genetic Algorithms. In: 12th IEEE International Symposium on Intelligent Control, Istanbul, Turkey, pp. 125–130 (1997)
7. Kwok, D.P., Sheng, F.: Genetic Algorithm and Simulated Annealing for Optimal Robot Arm PID Control. In: IEEE Conference on Evolutionary Computation, Orlando, FL, pp. 707–713 (1994)
8. Ota, T., Omatu, S.: Tuning of the PID Control Gains by GA. In: IEEE Conference on Emerging Technologies and Factory Automation, Kauai, HI, pp. 272–274 (1996)
9. Jones, A.H., Oliveira, P.B.D.: Genetic Auto-Tuning of PID Controllers. In: IEE Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 141–145 (1995)
10. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: IEEE International Conference on Neural Networks, Perth, Australia, pp. 1942–1948 (1995)
11. Yoshida, H., Kawata, K., Fukuyama, Y.: A Particle Swarm Optimization for Reactive Power and Voltage Control Considering Voltage Security Assessment. IEEE Transactions on Power Systems 15, 1232–1239 (2000)
12. Howell, M.N., Frost, G.P., Gordon, T.J., Wu, Q.H.: Continuous Action Reinforcement Learning Applied to Vehicle Suspension Control. Mechatronics 7(3), 263–276 (1997)
13. Gaing, Z.L.: A Particle Swarm Optimization Approach for Optimum Design of PID Controller in AVR System. IEEE Transactions on Energy Conversion 19(2), 384–391 (2004)
14. Howell, M.N., Best, M.C.: On-Line PID Tuning for Engine Idle-Speed Control Using Continuous Action Reinforcement Learning Automata. Control Engineering Practice 8, 147–154 (2000)
15. Howell, M.N., Gordon, T.J.: Continuous Action Reinforcement Learning Automata and Their Application to Adaptive Digital Filter Design. Engineering Applications of Artificial Intelligence 14, 549–562 (2001)
16. Saadat, H.: Power System Analysis. McGraw-Hill, New York (1999)