Fifth International Conference on Fuzzy Systems and Knowledge Discovery

A Fuzzy Logic Controller with Adaptive Dynamic Programming Optimization for Traffic Signals

Yi Tian, Dongbin Zhao, and Jianqiang Yi
Key Laboratory of Complex Systems and Intelligent Science, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
{yi.tian, jianqiang.yi, dongbin.zhao}@ia.ac.cn

Pappis and Mamdani [2] designed a simple fuzzy logic controller to control traffic signals at an isolated intersection of two one-way streets; Heung and Ho [3] used a hierarchical fuzzy logic controller with genetic algorithms to formulate the control rules; Trabia et al. [4] designed a two-stage fuzzy logic controller for signal control at a single traffic intersection; Bingham [5] proposed a neurofuzzy traffic signal controller that used reinforcement learning to tune the membership functions. All of these studies reported better performance from fuzzy logic controllers than from a fixed-time controller or an actuated controller. However, optimizing the fuzzy rules or the parameters of the fuzzy controller still poses some problems. In [10], K. Lu pointed out that most fuzzy controllers for traffic signal control perform well under undersaturated traffic conditions but less well under oversaturated conditions. In fact, most available strategies, such as fixed-time control and actuated control, are only applicable to undersaturated traffic conditions. In this paper, we consider this problem and design a fuzzy logic traffic controller that adopts rules similar to the extension-based rules of actuated control and uses ADP to optimize its parameters.

Adaptive dynamic programming was first introduced by Werbos in 1977. It is an advanced control technology suitable for learning in noisy non-linear environments. It uses the universal approximation capacity of artificial neural networks to solve the representation problem of the cost function in dynamic programming.
The applications of ADP include aircraft flight, power systems, communication networks, and so on. In [6], Si successfully used ADP in a single cart-pole balancing problem, a pendulum swing-up and balancing task, and a triple-link inverted pendulum balancing problem. In [7], Xu et al. used adaptive critic designs (a synonym of ADP) in the ramp metering problem, and the results showed that their controller performed better than the well-known ALINEA ramp control algorithm. Existing ADP schemes fall into several categories. In this paper we use ADHDP (action dependent heuristic dynamic programming). A fuzzy logic controller for traffic signal control is taken as the action model, which is optimized with a critic model, composed of a neural network, to achieve sub-optimal control performance.

Abstract

This paper proposes a fuzzy logic signal controller, optimized with adaptive dynamic programming (ADP), for a traffic intersection. Because fuzzy logic can make good use of expert knowledge, we adopt it in our controller. Because adaptive dynamic programming is well suited to optimization problems in nonlinear stochastic systems, we use it to optimize the fuzzy logic controller. We set up an isolated intersection model of two-way roads with both left and right turns, and test the controller with ADP optimization under both constant and variable traffic flow rates. The simulation results show that the controller performs well, in particular under heavy traffic and sudden changes in flow.

1. Introduction

Traffic congestion is an important problem in big cities. Because land is limited, however, we cannot simply build more roads. One thing we can do to address this problem is to maximize the utility of the existing roads, that is, to optimize the signal settings at traffic intersections. The existing methods for isolated intersection traffic signal control can be classified into two categories: fixed-time strategies and traffic-responsive strategies. Fixed-time strategies use historical traffic information to decide traffic signal settings, while traffic-responsive strategies use real-time measurements to calculate suitable signal settings in real time [1]. Actuated control is a typical and widely used traffic-responsive strategy. It is simple in structure and performs well under many traffic conditions, but its disadvantage is that it cannot coordinate the streams that currently have the right of way with the other waiting streams. In recent years, as intelligent control has developed quickly, many new methods have been introduced into this field. Fuzzy logic control may be the first of these and has been extensively studied and implemented. First introduced by Zadeh, fuzzy logic has been successfully implemented in many systems. Because a traffic intersection is a non-linear stochastic system with inherent uncertainties, fuzzy logic has an advantage over traditional control methods in traffic signal control: it can make good use of expert knowledge.

978-0-7695-3305-6/08 $25.00 © 2008 IEEE DOI 10.1109/FSKD.2008.493


The paper is organized as follows: Section 2 introduces the traffic intersection model and a related signal control strategy; Sections 3 and 4 present our fuzzy logic controller and its ADP optimization method; Section 5 gives the simulation results of our controller under both constant and variable traffic flows.

2.2. Related Control Strategy

Actuated control is a typical traffic-responsive strategy. It is simple in construction but performs well under most circumstances. Since we compare our controller's performance with that of an actuated controller in the simulation part of this paper, actuated control is briefly introduced below. The green time of each traffic light must be no shorter than a minimum green time and no longer than a maximum green time. The time between the minimum green time and the maximum green time is divided into several intervals of a few seconds each. At the end of the minimum green time, if no vehicle is detected between the front and the rear detectors, the actuated controller passes the green light to the next phase. Otherwise the current phase stays green for one more interval. At the end of this interval, if the detectors of the green phase again detect no vehicle between them, the right of way passes to the next phase; otherwise another interval of green time is added to the current phase. This process continues until the green time of the current phase reaches the maximum green time.
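The extension rule described above can be sketched in a few lines. This is only an illustration, not the authors' simulation code: the callback `vehicle_present` and the 5 s extension interval are assumptions (the text says only "a few seconds"), while the 15 s and 60 s bounds are the values quoted in the simulation section.

```python
def actuated_green_time(vehicle_present, g_min=15, g_max=60, interval=5):
    """Green duration (s) of one phase under extension-based actuated control.

    vehicle_present(t) is a hypothetical callback that returns True when a
    vehicle is detected between the front and rear detectors t seconds
    into the green phase.
    """
    green = g_min
    # Extend by one interval while vehicles are still detected,
    # but never beyond the maximum green time.
    while green < g_max and vehicle_present(green):
        green = min(green + interval, g_max)
    return green


if __name__ == "__main__":
    print(actuated_green_time(lambda t: False))   # no demand: minimum green
    print(actuated_green_time(lambda t: t < 30))  # demand clears at 30 s
    print(actuated_green_time(lambda t: True))    # constant demand: maximum green
```

The decision is driven only by the green phase's own detectors, which is the limitation noted in the introduction: the waiting streams play no part in it.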

2. Traffic Signal Control

2.1. Traffic Intersection

The object studied in this paper is an isolated intersection of two-way roads with both left and right turns (see Figure 1). Each lane has two detectors: the front detector (at the stop line) and the rear detector (about 50 to 100 meters from the stop line). These detectors provide the number of vehicles waiting in each lane. Sometimes cameras and image analysis are used instead of detectors for this purpose. Figure 2 shows the four phases of the traffic intersection studied in this paper. Although we model right-turning streams, they can cross the intersection in every phase because they are compatible with all the other streams. A saturation flow is the average flow crossing the stop line of an approach when the corresponding stream has the right of way, the upstream demand (or the waiting queue) is sufficiently large, and the downstream links are not blocked by queues [1]. We assume a saturation flow rate of 1 veh/s in this paper.

3. Fuzzy Logic Controller

The proposed fuzzy logic controller has two inputs and one output. One input is the number of vehicles in the green phase (Q_green); the other is the weighted sum of the numbers of vehicles in the other three phases (Q_red). Q_green and Q_red share the same linguistic values: zero, small, medium and big. We use trapezoidal membership functions for the input variables. Figure 3 shows the initial membership functions, which are the same for Q_green and Q_red. During the optimization, however, the two sets are trained separately.
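As an illustration of the input fuzzification, the sketch below evaluates four trapezoidal membership functions for a queue length. The breakpoints in `MF_PARAMS` are hypothetical placeholders chosen for readability; the actual initial values are those of Figure 3, which are not recoverable from the text.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


INF = float("inf")
# Hypothetical breakpoints (vehicles); not the values used in the paper.
MF_PARAMS = {
    "zero":   (0, 0, 0, 4),
    "small":  (0, 4, 8, 12),
    "medium": (8, 12, 16, 20),
    "big":    (16, 20, INF, INF),
}


def fuzzify(q):
    """Membership grade of queue length q in each linguistic value."""
    return {name: trapezoid(q, *p) for name, p in MF_PARAMS.items()}


if __name__ == "__main__":
    print(fuzzify(2))
    print(fuzzify(18))
```

Training Q_green and Q_red separately would amount to keeping two such parameter tables and adjusting their breakpoints independently.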

Figure 3. The initial membership functions of the input variables

Figure 1. An isolated intersection

Figure 2. Four phases

Table 1. Fuzzy rules (output: E = extend, T = terminate)

                     Q_red
Q_green     Zero   Small   Medium   Big
Zero        T      T       T        T
Small       E      E       T        T
Medium      E      E       E        T
Big         E      E       E        E
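A minimal sketch of how the rule base in Table 1 could be evaluated with minimum implication, maximum aggregation and centroid defuzzification, followed by the 0.5 threshold. Several details are assumptions rather than the authors' choices: the two antecedents are combined with `min`, the output membership function of E is taken to be a Z-shaped spline on [0, 1] (similar to MATLAB's `zmf`) with T as its mirror image, and an empty aggregate is treated as 0.5.

```python
import numpy as np

LEVELS = ["zero", "small", "medium", "big"]
# RULES[i][j]: output for Q_green = LEVELS[i] and Q_red = LEVELS[j] (Table 1).
RULES = ["TTTT",
         "EETT",
         "EEET",
         "EEEE"]


def zmf(y):
    """Z-shaped spline on [0, 1]: 1 at y = 0, falling to 0 at y = 1 (assumed shape for E)."""
    y = np.clip(np.asarray(y, dtype=float), 0.0, 1.0)
    return np.where(y <= 0.5, 1.0 - 2.0 * y**2, 2.0 * (1.0 - y)**2)


def decide(mu_green, mu_red, n=1001):
    """Return (action, centroid) from membership grades of Q_green and Q_red."""
    y = np.linspace(0.0, 1.0, n)
    out_mf = {"E": zmf(y), "T": 1.0 - zmf(y)}  # T mirrors E
    agg = np.zeros_like(y)
    for i, g in enumerate(LEVELS):
        for j, r in enumerate(LEVELS):
            w = min(mu_green.get(g, 0.0), mu_red.get(r, 0.0))  # firing strength (assumed AND = min)
            clipped = np.minimum(w, out_mf[RULES[i][j]])        # minimum implication
            agg = np.maximum(agg, clipped)                      # maximum aggregation
    total = agg.sum()
    centroid = float((y * agg).sum() / total) if total > 0 else 0.5
    return ("E" if centroid < 0.5 else "T"), centroid


if __name__ == "__main__":
    print(decide({"big": 1.0}, {"zero": 1.0}))   # long green queue: extend
    print(decide({"zero": 1.0}, {"big": 1.0}))   # empty green queue: terminate
```

The grades passed to `decide` are plain dictionaries keyed by linguistic value, so any fuzzification step that produces such grades can be plugged in.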

The output of the fuzzy logic controller is either to extend (E) the current phase for another time interval or to terminate (T) it and turn to the next phase. The membership function of E is a polynomial curve that opens to the left, and the membership function of T is its mirror image, which opens to the right. We use minimum as the implication method, maximum as the aggregation method, and centroid as the defuzzification method. If the defuzzified result is less than 0.5, another interval is added to the current phase; otherwise the right of way passes to the next phase. The rules of the fuzzy logic controller are shown in Table 1.

The critic network is trained to approximate the cost function J by minimizing the following error function

$$E_c(k) = \frac{1}{2}\left(\hat{J}(k) - U(k) - \gamma \hat{J}(k+1)\right)^2 \tag{5}$$

When $E_c(k) = 0$ for all $k$, we find that

$$\hat{J}(k) = U(k) + \gamma \hat{J}(k+1) = U(k) + \gamma\left(U(k+1) + \gamma \hat{J}(k+2)\right) = \cdots = \sum_{i=k}^{\infty} \gamma^{i-k} U(i) \tag{6}$$

Thus, after training, the output of the critic network converges to the J function. The action network can then be trained by minimizing J(k) once the critic network has been trained. A schematic diagram of ADP as the training strategy for the fuzzy controller is shown in Figure 4. We use Q_green and Q_red as the inputs, and the output is the decision whether the current phase is extended or terminated. The critic network is a nonlinear neural network with one hidden layer of ten nodes. Backpropagation (BP) is used to train the critic neural network, and a genetic algorithm is used to train the fuzzy controller.
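The critic described above could look roughly like the following NumPy sketch: one hidden layer of ten nodes, trained by backpropagation on the squared error of (5). Everything beyond that is an assumption made for illustration: the critic input is taken to be (Q_green, Q_red, u), the hidden activation is tanh, the learning rate and discount factor are arbitrary, and the gradient is taken through the current estimate only, treating the next-step estimate as a fixed target.

```python
import numpy as np


class Critic:
    """One-hidden-layer critic; sizes other than the ten hidden nodes are assumptions."""

    def __init__(self, n_in=3, n_hidden=10, lr=0.05, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, n_hidden)
        self.b2 = 0.0
        self.lr, self.gamma = lr, gamma

    def forward(self, z):
        h = np.tanh(self.W1 @ z + self.b1)
        return float(self.W2 @ h + self.b2), h

    def update(self, z_k, U_k, z_next):
        """One backpropagation step on E_c = 0.5 * e^2; returns E_c before the step."""
        J_k, h = self.forward(z_k)
        J_next, _ = self.forward(z_next)
        e = J_k - U_k - self.gamma * J_next      # residual inside (5)
        dh = e * self.W2 * (1.0 - h**2)          # error backpropagated to the hidden layer
        self.W2 -= self.lr * e * h
        self.b2 -= self.lr * e
        self.W1 -= self.lr * np.outer(dh, z_k)
        self.b1 -= self.lr * dh
        return 0.5 * e * e


if __name__ == "__main__":
    critic = Critic(gamma=0.5)
    z = np.array([0.5, 0.2, 0.3])                # hypothetical normalized (Q_green, Q_red, u)
    for _ in range(2000):
        E = critic.update(z, -1.0, z)            # same state, constant utility -1
    print(E, critic.forward(z)[0])
```

In the usage example the state never changes and U is always -1, so the estimate should settle near U / (1 - γ) = -2, which is the geometric sum in (6) for a constant utility.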

Fig. 4. Schematic diagram for implementation of ADP as the training strategy for fuzzy controller
[Diagram labels: $X(k)$, $u(k)$, $X(k+1)$, $J(k)$, $J(k-1)$, $U(k)$, $\gamma$, $z^{-1}$, $R^*$]

The delay of vehicles reflects the performance of the control system well, so in this paper we choose a function of the average vehicle delay as the utility function. Suppose $D(1), D(2), D(3), \ldots, D(k-2), D(k-1), D(k)$ are the average delays at steps $1, 2, 3, \ldots, k-2, k-1$ and $k$; then the utility function is

$$U(k) = \begin{cases} -1 \ (\text{failure}), & \text{if } \dfrac{D(k-10) + D(k-9) + \cdots + D(k-1)}{10} < D(k) \\[2ex] 0 \ (\text{success}), & \text{if } \dfrac{D(k-10) + D(k-9) + \cdots + D(k-1)}{10} \ge D(k) \end{cases} \tag{7}$$

4. Adaptive Dynamic Programming

4.1. Dynamic Programming

Suppose there is a discrete-time nonlinear (time-varying) dynamic system

$$x(k+1) = F(x(k), u(k), k), \quad k = 0, 1, 2, \ldots \tag{1}$$

where $x \in \mathbb{R}^n$ is the state vector of the system and $u \in \mathbb{R}^m$ is the control action. The function

$$J(x(k), k) = \sum_{i=k}^{\infty} \gamma^{i-k} U(x(i), u(i), i), \quad 0 < \gamma < 1 \tag{2}$$

is called the performance index (or cost function), where $U(x(k), u(k), k)$ is the utility function of the system and $\gamma$ is a discount factor. Suppose that $J^*(x(k+1), k+1)$ is the optimal cost from step $k+1$ on for all possible states $x(k+1)$. According to Bellman's principle of optimality, the optimal cost from step $k$ on equals

$$J^*(x(k), k) = \min_{u(k)} \left\{ U(x(k), u(k), k) + \gamma J^*(x(k+1), k+1) \right\} \tag{3}$$

Thus the optimal control $u^*(k)$ at step $k$ is the $u(k)$ that achieves this minimum:

$$u^*(k) = \arg\min_{u(k)} \left\{ U(x(k), u(k), k) + \gamma J^*(x(k+1), k+1) \right\} \tag{4}$$
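The recursion in (3) and (4) can be illustrated on a toy problem small enough to enumerate. The sketch below is a generic value iteration for a time-invariant, deterministic system with finite state and action sets; the three-state chain, its dynamics `F` and its utility `U` are invented for the example and have nothing to do with the traffic model.

```python
def value_iteration(states, actions, F, U, gamma=0.9, n_iter=100):
    """Iterate J(x) <- min_u [U(x, u) + gamma * J(F(x, u))], then read off the policy."""
    J = {x: 0.0 for x in states}
    for _ in range(n_iter):
        J = {x: min(U(x, u) + gamma * J[F(x, u)] for u in actions) for x in states}
    policy = {x: min(actions, key=lambda u: U(x, u) + gamma * J[F(x, u)])
              for x in states}
    return J, policy


if __name__ == "__main__":
    states = [0, 1, 2]
    actions = [-1, +1]                              # move down or up the chain
    F = lambda x, u: min(2, max(0, x + u))          # hypothetical dynamics, clipped to [0, 2]
    U = lambda x, u: float(x)                       # hypothetical cost: the state itself
    J, policy = value_iteration(states, actions, F, U)
    print(J)       # optimal costs: J(0) = 0, J(1) = 1, J(2) = 2 + 0.9
    print(policy)  # always move toward state 0
```

The table `J` needs one entry per state, which is exactly what becomes infeasible when the state is a vector of queue lengths; this is the motivation for approximating J with a network.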

4.2. Adaptive Dynamic Programming [8][9]

The idea above is the principle of dynamic programming. If the function F in (1) and the cost function J in (2) are known, we can calculate $u^*(k)$ according to (4). However, because of the well-known "curse of dimensionality", running true dynamic programming is often computationally intractable. ADP introduces a critic system to address this problem: the critic approximates the cost function with a neural network. A typical ADP structure has three modules: the model network, the critic network and the action network. The output of the critic network approximates the cost function J.

R* is the desired ultimate performance objective. The error between R* and the approximate function J from the critic network is used to adapt the membership functions of the fuzzy controller. Because we define "0" as the utility function's value for success, R* ought to be set to "0". However, given the randomness of traffic flow, the utility function cannot always be "0" even when the training is successful. We therefore select a small number close to zero (−0.01) as the ultimate objective of success. Figure 5 shows the flow chart of the training process.
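For concreteness, the utility signal of (7) and a truncated version of the discounted sum in (2) might be computed as below. This is a sketch only: the list `delays` is indexed from 0 rather than 1, the discount factor in the example is arbitrary, and how the resulting J is compared with R* = −0.01 during training is not specified here.

```python
def utility(delays, k, window=10):
    """U(k) as in (7): 0 (success) if D(k) does not exceed the mean of the
    previous `window` average delays, otherwise -1 (failure).

    delays[i] holds D(i); k must be at least `window`.
    """
    mean_prev = sum(delays[k - window:k]) / window
    return 0.0 if mean_prev >= delays[k] else -1.0


def discounted_cost(utilities, gamma):
    """Finite-horizon version of the sum in (2), starting at the first element."""
    return sum(gamma**i * u for i, u in enumerate(utilities))


if __name__ == "__main__":
    improving = [10.0] * 10 + [9.0]
    worsening = [10.0] * 10 + [11.0]
    print(utility(improving, 10), utility(worsening, 10))
    print(discounted_cost([-1.0, 0.0, -1.0], gamma=0.5))
```

Because the utility is either 0 or −1, a run with no failures yields a discounted sum of exactly 0, and occasional random failures push it slightly below zero, which is why a target just under zero is used instead of 0.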

5000 steps for simulations under variable flows. We set five different traffic conditions to study the controller's performance. Table 2 shows the flow rates for simulations I, II, and III; Tables 3 and 4 show the flow rates for simulations IV and V. In simulations I, II, and III, traffic flows are constant. Simulation I is an undersaturated traffic condition, while the traffic in simulation III is so heavy that it can be considered oversaturated. The flow rate in simulation II lies between those of I and III. Simulations IV and V involve sudden changes in traffic flow and are set up to test the controller's ability to recover quickly from a sudden disturbance.
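The vehicle generation used in the simulations (binomial arrivals, turning probabilities 0.2 / 0.6 / 0.2, 1 s time steps) could be sketched as follows. The per-step model is an assumption: each second is treated as a single Bernoulli trial with probability flow/3600, so at most one vehicle arrives per approach per step and flows above 3600 veh/h are not representable.

```python
import numpy as np


def generate_arrivals(flow_veh_per_h, n_steps, seed=0):
    """Arrivals on one approach, split into (left, through, right) queues per 1 s step."""
    rng = np.random.default_rng(seed)
    p = flow_veh_per_h / 3600.0                       # arrival probability per step (assumed model)
    arrivals = rng.binomial(1, p, size=n_steps)       # 0 or 1 vehicle in each step
    # Route each arriving vehicle: left 0.2, through 0.6, right 0.2.
    turns = np.array([rng.multinomial(a, [0.2, 0.6, 0.2]) for a in arrivals])
    return turns                                      # shape (n_steps, 3)


if __name__ == "__main__":
    east = generate_arrivals(720, 2000)               # East approach, simulation I
    print(east.sum(axis=0))                           # vehicles per queue over 2000 s
```

For the time-varying flows of simulations IV and V, the same function could be called once per period with the corresponding flow rate and the results concatenated.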

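[Flow chart content, Fig. 5: start from $X(0), u(0), J(0)$; at each step $k = k+1$, observe $X(k), u(k), J(k)$ and update the critic weights $w_c$, with the test $E_c(k) < T_c$]

$$e_c(k) \leftarrow \gamma J(k) - [J(k-1) - U(k)], \qquad E_c(k) \leftarrow \frac{1}{2} e_c^2(k)$$

$$\Delta w_c(k) \leftarrow l_c(k)\left[-\frac{\partial E_c(k)}{\partial w_c(k)}\right], \qquad w_c(k+1) \leftarrow w_c(k) + \Delta w_c(k)$$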

Table 2. Flow rates of simulations I, II, and III (veh/h)

                  East   South   West   North
Simulation I       720    1800    720     600
Simulation II      720    1800   1800     900
Simulation III    1800    2400   1800    2400

[Flow chart content, Fig. 5: $F = \dfrac{1}{0.5\,(J - 0.01)^2}$]

Table 3. Flow rate settings for simulation IV (veh/h)

          1 s to 2000 s   2001 s to 3000 s   3001 s to 5000 s
East           360              1800                720
South          900              2400               1800
West           720              1800                600
North          540              1200                450

Table 4. Flow rate settings for simulation V (veh/h)

          1 s to 2000 s   2001 s to 2200 s   2201 s to 5000 s
East           360              3600                720
South          900              2400               1800
West           720              1800                600
North          540              2400                450

Figures 6 to 10 show the average delays of the actuated controller and our fuzzy logic controller in simulations I to V. In these figures, solid lines are the results of the actuated controller and dashed lines are those of the fuzzy controller with ADP. The fuzzy logic controller clearly reduces the average delay when traffic flow is heavy (Figure 8). Figures 9 and 10 show that when traffic flow increases suddenly, our fuzzy logic controller produces lower average vehicle delays and recovers faster from the change.


Fig. 5. Flow chart of the ADP optimizing process

5. Simulation Results


In the simulation, we consider the isolated intersection introduced in Section 2.1. Vehicles are generated by a random number generator with a binomial distribution upstream of the four approaches. When a vehicle enters the detection area, it joins the left-turn queue with probability 0.2, the through queue with probability 0.6, and the right-turn queue with probability 0.2. Each phase has a minimum green time of 15 s and a maximum green time of 60 s. Vehicles at the stop line leave the junction at a rate of 1 veh/s during green time. In our simulations, we use 1 s for each time step and select 2000 steps for simulations under constant flows and

Figure 6. Average delays in simulation I (light traffic flow)
[Plot: average delay (s/veh) versus time (s)]


actuated controller and fuzzy traffic signal controller to some extent. When traffic flow is constant and light, this controller performs almost the same as the actuated controller. Under heavy constant traffic flow and flows with sudden changes, however, it performs clearly better than the actuated controller. The success of this study encourages further research on extending this controller to the coordination of networks of traffic intersections.

Figure 7. Average delays in simulation II (medium traffic flow)
[Plot: average delay (s/veh) versus time (s)]

Acknowledgements

This work was partly supported by the NSFC Projects under Grant No. 60621001, 60575047 and 60475030, the National 973 Project No. 2006CB705500, the Outstanding Overseas Chinese Scholars Fund of Chinese Academy of Sciences (No. 2005-1-11), and the International Cooperative Project on Intelligence and Security Informatics by Chinese Academy of Sciences, China.

Figure 8. Average delays in simulation III (heavy traffic flow)
[Plot: average delay (s/veh) versus time (s)]

References

1. M. Papageorgiou, C. Diakaki, V. Dinopoulou, A. Kotsialos, Y. Wang, "Review of road traffic control strategies", Proceedings of the IEEE, vol. 91, no. 12, 2003, pp. 2043-2067.
2. C. P. Pappis, E. H. Mamdani, "A fuzzy logic controller for a traffic junction", IEEE Trans. on Syst. Man and Cyber, vol. SMC-7, no. 10, 1977, pp. 707-717.
3. T. H. Heung, T. K. Ho, "Hierarchical fuzzy logic traffic control at a road junction using genetic algorithms", Proceedings of the 1998 IEEE International Conference on Fuzzy Systems, 1998, pp. 1170-1175.
4. M. Trabia, M. Kaseko, M. Ande, "Two-stage fuzzy logic controller for traffic signals", Transportation Research Part C7, 1999, pp. 353-367.
5. E. Bingham, "Reinforcement learning in neurofuzzy traffic signal control", European Journal of Operational Research 131, 2001, pp. 232-241.
6. J. Si, Y-T. Wang, "On line learning control by association and reinforcement", IEEE Transactions on Neural Networks, vol. 12, no. 2, 2001, pp. 264-276.
7. J. Xu, W-S. Yu, F-Y. Wang, "Ramp metering based on adaptive critic designs", Proceedings of the IEEE ITSC, 2006, pp. 1531-1536.
8. J. Si, D. Liu, L. Yang, "Handbook of learning and approximate dynamic programming", Chapter 5, IEEE Press, 2004, pp. 123-138.
9. D. Liu, "Approximate dynamic programming for self-learning control", ACTA Automatica Sinica, vol. 31, no. 1, 2005, pp. 13-18.
10. K. Lu, "Signal control strategies for intersection under different traffic flow", Journal of Highway and Transportation Research and Development, vol. 23, no. 4, 2006, pp. 128-131.

Figure 9. Average delays in simulation IV (traffic flow with sudden change)

Figure 10. Average delays in simulation V (traffic flow with sudden change)

6. Conclusion

In this paper, a fuzzy logic controller that uses ADP to optimize its membership function parameters is proposed to control the traffic signal lights at an isolated traffic intersection. This controller combines the advantages of
