IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 28, NO. 6, JUNE 2017

Biomimetic Hybrid Feedback Feedforward Neural-Network Learning Control

Yongping Pan, Member, IEEE, and Haoyong Yu, Member, IEEE

Abstract— This brief presents a biomimetic hybrid feedback feedforward neural-network learning control (NNLC) strategy inspired by the human motor learning control mechanism for a class of uncertain nonlinear systems. The control structure includes a proportional-derivative controller acting as a feedback servo machine and a radial-basis-function (RBF) NN acting as a feedforward predictive machine. Under sufficient constraints on the control parameters, the closed-loop system achieves semiglobal practical exponential stability, such that an accurate NN approximation is guaranteed in a local region along recurrent reference trajectories. Compared with the existing NNLC methods, the novelties of the proposed method include: 1) the implementation of an adaptive NN control to guarantee plant states being recurrent is not needed, since recurrent reference signals rather than plant states are utilized as NN inputs, which greatly simplifies the analysis and synthesis of the NNLC and 2) the domain of NN approximation can be determined a priori by the given reference signals, which leads to an easy construction of the RBF-NNs. Simulation results have verified the effectiveness of this approach.

Index Terms— Adaptive control, hybrid feedback feedforward, human motor learning control, neural network (NN), nonlinear system, persistent excitation.

I. INTRODUCTION

Neural-network (NN) approximation-based adaptive control (AAC) has attracted continuous attention in recent years; see [1]–[13] for examples. Compared with traditional adaptive control, AAC has at least two prominent merits. First, due to the universal approximation property of NNs, the difficulty of system modeling in many practical control problems can be greatly alleviated, which simplifies control synthesis for a larger class of nonlinear systems with functional uncertainties [14]. Second, due to the practical persistently exciting (PE) condition of NNs, parameter convergence is easier to obtain during control, resulting in accurate online modeling and improved control performance [16]. The second merit differentiates learning AAC from traditional AAC, where the ability to learn is one of the fundamental attributes of autonomous intelligent behavior [15]. In particular, an adaptive control treats a similar or even the same task as a new one and recalculates the adaptive parameters to ensure stability of the closed-loop system, without any learned knowledge kept in memory, whereas learning control requires not only adaptability but also the capabilities of accurate parameter estimation and of knowledge storage and reuse for other similar or identical tasks [15]. Yet, while most AAC approaches exploit the above-mentioned first merit [1]–[13], only a few make use of the second.

Earlier results on the PE property of NNs show that for radial-basis-function (RBF) NNs constructed on a regular lattice, the regression vector is PE if the NN inputs belong to certain neighborhoods of the neuron centers whose size is less than 1/2 of the minimal distance between any two neuron centers [24]–[26]. In this sense, ideal NN inputs satisfying the PE condition are characterized by periodic or ergodic trajectories visiting the limited neighborhoods of all neuron centers [26]. However, this requirement is impractical, since a random NN input trajectory may not visit the specified neighborhood of any neuron center. Pioneering works on NN learning control (NNLC) in [17] and [18] indicate that, with locally supported NNs, PE of only a reduced-dimension regression subvector can lead to closed-loop exponential stability. Yet, the PE condition of the NN inputs is not satisfied in [17] and [18]. A deterministic learning technique was developed in [16] to remove the restriction on the size of the neighborhoods in [26], where it is proven that for RBF-NNs constructed on a regular lattice, any recurrent trajectory that stays within the lattice leads to a partial PE condition. Then, an NNLC approach was proposed in [16] to overcome the difficulty in [17] and [18] via the following three steps: 1) an NN-AAC law is developed to achieve convergence of the plant states to recurrent reference trajectories, so that the plant states become recurrent; 2) due to this state convergence, the regression subvector of the localized RBF-NNs along the recurrent state trajectories subsequently satisfies a partial PE condition; and 3) under the partial PE condition, practical exponential stability of the closed-loop system is established to ensure tracking error convergence and accurate function approximation within a local region along the recurrent state trajectories.

Manuscript received November 28, 2014; revised June 17, 2015 and January 14, 2016; accepted February 2, 2016. Date of publication March 30, 2016; date of current version May 15, 2017. This work was supported in part by the Biomedical Engineering Programme within the Agency for Science, Technology and Research, Singapore, under Grant 1421480015, and in part by the Defense Innovative Research Programme within the Ministry of Defence, Singapore, under Grant MINDEF-NUSDIRP/2012/02. (Corresponding author: H. Yu.) The authors are with the Department of Biomedical Engineering, National University of Singapore, Singapore 117583 (e-mail: [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2016.2527501
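The partial PE mechanism in step 2 can be probed numerically. The sketch below is our own illustration (the lattice, trajectory, widths, and activation threshold are assumed, not taken from [16] or [24]–[26]): it evaluates a regular lattice of Gaussian RBFs along a periodic, hence recurrent, trajectory, retains only the neurons noticeably activated along it, and checks that the windowed Gramian of the resulting regression subvector is positive definite.

```python
import numpy as np

def rbf_vector(chi, centers, sigma):
    # Gaussian RBFs: phi_j(chi) = exp(-||chi - c_j||^2 / (2 sigma^2))
    return np.exp(-np.sum((centers - chi) ** 2, axis=1) / (2.0 * sigma ** 2))

grid = np.linspace(-2.0, 2.0, 5)                     # 5 x 5 lattice, spacing 1
centers = np.array([[a, b] for a in grid for b in grid])
sigma, dt, delta = 0.5, 0.01, 2 * np.pi              # window = one period

# recurrent NN input trajectory: the unit circle chi(t) = (cos t, sin t)
ts = np.arange(0.0, delta, dt)
Phi = np.vstack([rbf_vector(np.array([np.cos(t), np.sin(t)]), centers, sigma)
                 for t in ts])

# regression subvector: keep only neurons activated along the trajectory
active = Phi.max(axis=0) > 0.2
Phi_zeta = Phi[:, active]

# windowed PE Gramian: approximate integral of Phi_zeta Phi_zeta^T dt
G = Phi_zeta.T @ Phi_zeta * dt
alpha1 = np.linalg.eigvalsh(G).min()

assert Phi_zeta.shape[1] < centers.shape[0]          # a strict subvector
assert alpha1 > 1e-4                                 # positive definite: partial PE
```

Only the neurons near the trajectory enter the subvector, which is exactly why PE can hold along a recurrent trajectory even though the full regression vector is not PE.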
The design difficulty of the NNLC in [16] and its variations, such as those in [19]–[23], lies in two aspects: 1) the three-step procedure complicates the analysis and synthesis of the NNLC and 2) the construction of the RBF-NNs is inconvenient, as the applied NN inputs, i.e., the plant states, are unknown before control.

This brief focuses on a novel biomimetic design of the NNLC. Biological research suggests that human motions are controlled by a hierarchical neural system, where the lateral part of the cerebellum forms a feedforward path acting as a predictive machine for transforming high-level information, such as intention, into middle-level information, and the intermediate part of the cerebellum forms a feedback path acting as a servo machine for executing movement [27]. This hybrid feedback feedforward (HFF) architecture of human motor learning control has also been verified by several neuroscientific results [28]–[30]. This principle motivates the contributions of this brief: to propose a biomimetic HFF-NNLC strategy for a class of uncertain nonlinear systems and to establish stability of the resulting closed-loop system. In the HFF design, an RBF-NN is applied as the feedforward predictive machine, and a proportional-derivative (PD) controller is applied as the feedback servo machine. With sufficient constraints on the control parameters, the closed-loop system achieves semiglobal practical exponential stability in the sense that both the tracking and parameter estimation errors exponentially converge to small neighborhoods of zero. This brief extends our previous work on HFF-AAC in [31] and [32] to the learning scenario.
Compared with the existing NNLC works such as [16]–[23], this brief has the following significant advantages: 1) The use of NN-AAC to guarantee that the plant states are recurrent, as in the deterministic learning, is not needed, since recurrent reference signals rather than plant states are utilized as NN inputs, which greatly simplifies the analysis and synthesis of the NNLC;

2162-237X © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

2) The domain of NN approximation can be determined in advance by the given reference signals, which results in a convenient construction of the RBF-NNs.

This brief is organized as follows. The problem formulation and preliminaries are given in Sections II and III, respectively. The HFF-NNLC is developed in Section IV. Simulation results are given in Section V. Finally, the conclusions are drawn in Section VI.

Throughout this brief, N, R, R+, R^n, and R^{n×m} denote the spaces of natural numbers, real numbers, positive real numbers, real n-vectors, and real n × m matrices, respectively; ‖·‖ denotes the Euclidean norm; L∞ denotes the space of bounded signals; tr(A) denotes the trace of A; diag(·) is a diagonal matrix; min(·), max(·), and sup(·) are the functions of minimum, maximum, and supremum, respectively; col(x, y) := [x^T, y^T]^T; Ω_c := {x : ‖x‖ ≤ c} is the ball of radius c; and C^k represents the space of functions whose k-order derivatives all exist and are continuous, where c ∈ R+, x, y ∈ R^n, A ∈ R^{n×n}, and n, m, k ∈ N.

II. PROBLEM FORMULATION

Consider a class of single-input single-output (SISO) affine nonlinear systems in the Brunovsky canonical form [16]

ẋ_i = x_{i+1} (i = 1, 2, . . . , n − 1),  ẋ_n = f(x) + bu   (1)

in which x(t) := [x_1(t), x_2(t), . . . , x_n(t)]^T ∈ R^n is the state vector, u(t) ∈ R and x_1(t) ∈ R are the control input and the system output, respectively, f(x) : R^n → R satisfying f(0) = 0 is an unknown C^1 nonlinear driving function, and b ∈ R is an unknown constant control gain. Let x_d(t) ∈ R denote a reference signal. The following assumptions are given to facilitate control synthesis [16].

Assumption 1: There exists an unknown finite constant b_0 ∈ R+ such that 0 < b_0 ≤ b holds.

Assumption 2: The reference signal x_d(t) is recurrent and satisfies x_d^{(i)}(t) ∈ L∞ for i = 0, 1, . . . , n + 1.

Let x_d := [x_d, ẋ_d, . . . , x_d^{(n−1)}]^T ∈ R^n and x_de := col(x_d, x_d^{(n)}) ∈ R^{n+1}.
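Assumption 2 can be illustrated with a toy numerical check (our own sketch; the signal and the tolerance μ are assumptions): for a periodic reference, the trajectory (x_d, ẋ_d) returns to a μ-neighborhood of each of its points within one period T.

```python
import numpy as np

dt = 1e-3
ts = np.arange(0.0, 4 * np.pi, dt)
traj = np.column_stack([np.sin(ts), np.cos(ts)])   # (x_d, x_d_dot)

mu, T = 0.01, 2 * np.pi                            # tolerance and return time
idx_T = int(T / dt)

# distance between each point and the point one period later
gaps = np.linalg.norm(traj[:-idx_T] - traj[idx_T:], axis=1)
assert gaps.max() < mu                             # recurrent within T(mu)
```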
Define the output tracking error e_1 := x_1 − x_d and the filtered tracking errors e_2 to e_n as follows [34]:

e_2 := ė_1 + k_1 e_1,  e_i := ė_{i−1} + k_{i−1} e_{i−1} + e_{i−2}  (i = 3, 4, . . . , n)   (2)

where k_i ∈ R+ with i = 1, 2, . . . , n − 1 are control gain parameters. It has been shown in [34] that

e_i = Σ_{j=0}^{i−1} a_{ij} e_1^{(j)}  (i = 2, 3, . . . , n)   (3)

with a_{ij} = 1 for j = i − 1, where a_{ij} ∈ R+ are obtained from applying (3) to (2) and comparing coefficients. From (1)–(3) with the definition of e_1, one obtains

ė_n = Σ_{j=0}^{n−2} a_{(n−1)j} e_1^{(j+2)} + k_{n−1} ė_{n−1} + ė_{n−2}
    = Σ_{j=0}^{n−3} a_{(n−1)j} e_1^{(j+2)} + x_1^{(n)} − x_d^{(n)} + k_{n−1} ė_{n−1} + ė_{n−2}
    = Σ_{j=0}^{n−3} a_{(n−1)j} e_1^{(j+2)} + k_{n−1} ė_{n−1} + ė_{n−2} − x_d^{(n)} + f(x) + bu.   (4)

Thus, one gets the open-loop tracking error dynamics

ė_1 = −k_1 e_1 + e_2,
ė_i = −k_i e_i − e_{i−1} + e_{i+1}  (i = 2, 3, . . . , n − 1),
ė_n = b(h(x, ν) + u)   (5)

where ν := x_d^{(n)} − Σ_{j=0}^{n−3} a_{(n−1)j} e_1^{(j+2)} − k_{n−1} ė_{n−1} − ė_{n−2}, and h(·) is a lumped uncertainty given by

h(x, ν) := (f(x) − ν)/b   (6)

where ν is directly attainable, since all of e_1, ė_1, . . . , e_1^{(n−1)} in ν can be calculated from x − x_d [34]. Let e := [e_1, e_2, . . . , e_n/b]^T.¹ The objective is to develop an NNLC

strategy for system (1) under Assumptions 1 and 2 such that: 1) e converges to a small neighborhood of zero and 2) h(·) is accurately approximated by an NN along recurrent NN input trajectories.

Remark 1: For simplicity of discussion, this brief only considers the SISO affine nonlinear system (1) with b being a constant. However, the following results can be extended to system (1) with b being a function of x, to multi-input multi-output affine nonlinear systems, to strict-feedback nonlinear systems, to nonaffine nonlinear systems, and to output-feedback nonlinear systems by the approaches in [3], [19]–[21], and [23], respectively. Although these extensions are interesting, they are out of the scope of this brief.

Remark 2: A recurrent trajectory represents a large class of periodic and period-like trajectories, including periodic, quasi-periodic, almost-periodic, and chaotic trajectories [16]. A recurrent trajectory is characterized as follows: Given a constant μ ∈ R+, there exists a time T(μ) > 0 such that the trajectory returns to the μ-neighborhood of any point on the trajectory within the time T(μ) [21]. The requirement in Assumption 2 that the reference signal x_d be recurrent is consistent with the fact that human beings can learn skills effectively by repeating motions. Hence, Assumption 2 is intuitive for learning control.

III. PRELIMINARIES

A. Radial-Basis-Function Neural Networks

To approximate a C^1 function h(χ) : Ω_χ → R with Ω_χ ⊂ R^p, an RBF-NN ĥ(χ|Ŵ) : Ω_χ × R^N → R is given as follows [33]:

ĥ(χ|Ŵ) = Σ_{j=1}^{N} ŵ_j φ_j(χ) = Ŵ^T Φ(χ)   (7)

in which χ = [χ_1, χ_2, . . . , χ_p]^T ∈ Ω_χ is a vector of NN inputs, Ŵ = [ŵ_1, ŵ_2, . . . , ŵ_N]^T ∈ R^N is a vector of NN weights, N is the number of NN nodes, and Φ(χ) = [φ_1(χ), φ_2(χ), . . . , φ_N(χ)]^T : Ω_χ → R^N is the regression vector. The functions φ_j(χ) are commonly chosen to be Gaussian RBFs as follows:

φ_j(χ) = exp(−‖χ − c_j‖²/(2σ_j²))   (8)

with c_j := [c_{1j}, c_{2j}, . . . , c_{pj}]^T, where c_{ij} ∈ R and σ_j ∈ R+ are the centers and widths of the Gaussian functions, i = 1, 2, . . . , p, and j = 1, 2, . . . , N. The Gaussian RBF belongs to the class of localized RBFs, since φ_j(χ) → 0 as ‖χ‖ → ∞. According to the universal approximation theorem of NNs [33], given any h(χ) and ε̄ ∈ R+, there exist a sufficiently large number N and a constant vector of optimal NN weights W* ∈ R^N in (7) such that

h(χ) = W*^T Φ(χ) + ε(χ)   (9)

where ε(χ) with |ε(χ)| ≤ ε̄ is an optimal approximation error.

¹The special definition of e is used only to avoid the control gain b amplifying the perturbation terms in the following derivation.
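A minimal numerical sketch of (7) and (8) may help fix ideas (our own illustration; the 11 × 11 lattice on [−2, 2]² and the width value mirror Example 1 in Section V, and the helper names are ours):

```python
import numpy as np

def gaussian_rbf_vector(chi, centers, sigma):
    # Phi(chi) in (7): phi_j(chi) = exp(-||chi - c_j||^2 / (2 sigma_j^2)),
    # here with a common width sigma for all nodes
    d2 = np.sum((centers - chi) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

grid = np.linspace(-2.0, 2.0, 11)                  # regular lattice of centers
centers = np.array([[cx, cy] for cx in grid for cy in grid])
sigma = 0.2828

W_hat = np.zeros(len(centers))                     # NN weights (zero-initialized)
chi = np.array([0.2, 0.3])                         # an NN input
phi = gaussian_rbf_vector(chi, centers, sigma)
h_hat = W_hat @ phi                                # h_hat(chi | W_hat) in (7)

# localized RBFs: each basis function peaks (value 1) at its own center
assert abs(gaussian_rbf_vector(centers[0], centers, sigma)[0] - 1.0) < 1e-12
assert h_hat == 0.0                                # zero weights give zero output
```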

B. Useful Definitions and Lemmas

The following useful definitions and lemmas are introduced for the convenience of control synthesis.

Definition 1 [35]: A bounded signal Φ(t) ∈ R^N is PE iff there exist constants α_1, α_2, δ ∈ R+ such that

α_2 I ≥ ∫_t^{t+δ} Φ(τ)Φ^T(τ) dτ ≥ α_1 I,  ∀t ≥ 0.   (10)

Definition 2 [35]: A linear time-varying system (c, A) given by ẋ = A(t)x, y = c^T(t)x with A ∈ R^{n×n}, c ∈ R^n, x ∈ R^n, and y ∈ R is uniformly completely observable (UCO) iff there exist constants β_1, β_2, δ ∈ R+ such that

β_2 I ≥ N(t, t + δ) ≥ β_1 I,  ∀t ≥ 0   (11)

in which N(t, t + δ) := ∫_t^{t+δ} ϕ^T(τ, t)c(τ)c^T(τ)ϕ(τ, t) dτ denotes an observability Gramian matrix, and ϕ(τ, t) denotes the state transition matrix associated with A(t).

Definition 3 [36]: A variable χ(t, ϵ) : R+ × R → R^p is of the order O(ϵ), denoted χ(t, ϵ) = O(ϵ), if there exist constants κ, ϵ* ∈ R+ such that ‖χ(t, ϵ)‖ ≤ κ|ϵ|, ∀|ϵ| ∈ (0, ϵ*).

Lemma 1 (Boundedness of Basis Functions [26]): There exists a constant ψ ∈ R+ independent of χ and N such that the regression vector Φ in (7) satisfies max{‖Φ(χ)‖, ‖Φ̇(χ)‖} ≤ ψ, ∀χ ∈ R^p.

Lemma 2 (Spatially Localized Approximation [16]): For any given trajectory χ(t) : R+ → Ω_χ, a C^1 function h(χ) : Ω_χ → R can be approximated by the RBF-NN (7) with a limited number of neurons located in a local region along χ(t), such that

h(χ) = W_ζ*^T Φ_ζ(χ) + ε_ζ(χ)   (12)

with W_ζ* = [w*_{l_1}, . . . , w*_{l_{N_ζ}}]^T and Φ_ζ(χ) = [φ_{l_1}(χ), . . . , φ_{l_{N_ζ}}(χ)]^T (φ_{l_j}(χ) > δ), where w*_{l_j} and φ_{l_j} are the elements of the vectors W* and Φ with the index l_j ∈ [1, N], respectively, δ ∈ R+ is a small constant, N_ζ < N, O(ε_ζ) = O(ε), and j = 1, 2, . . . , N_ζ.

Lemma 3 (Partial PE Condition of the RBF-NNs [16]): Consider the localized RBF-NN (7) with the centers placed on a regular lattice to cover Ω_{c_χ}. For any given C^1 recurrent trajectory χ(t) : R+ → Ω_χ, the regression subvector Φ_ζ(χ) in (12) is almost always PE.

IV. NEURAL-NETWORK LEARNING CONTROL

A. Hybrid Feedback/Feedforward Structure

In the traditional NNLC [16]–[23], an RBF-NN in (7) with the inputs (x, ν) would be applied in the feedback loop to approximate h(x, ν) in (6). In this brief, by contrast, an RBF-NN in (7) with the inputs x_de is applied in the feedforward loop to approximate h(x_de) := h(x, ν)|_{x=x_d}. From Section III-A, h(x_de) can be expressed as follows:

h(x_de) = W*^T Φ(x_de) + ε(x_de)   (13)

where the optimal weight vector W* is given by

W* := arg min_{Ŵ ∈ Ω_{c_w}} sup_{x_de ∈ Ω_{c_d}} |h(x_de) − ĥ(x_de|Ŵ)|   (14)

with Ω_{c_w} ⊂ R^N and Ω_{c_d} ⊂ R^{n+1}, where c_w, c_d ∈ R+ are certain constants. The control law u is designed as follows:

u = −k_n e_n − e_{n−1} − ĥ(x_de|Ŵ)   (15)

where ĥ(·) is in the form of (7), and k_n ∈ R+ is a PD gain parameter. The adaptive law of Ŵ is given by

Ŵ̇ = γ e_n Φ(x_de)   (16)

where γ ∈ R+ is a learning rate. Applying (15) to (5) yields

ė_n/b = −k_n e_n − e_{n−1} + h(x, ν) − ĥ(x_de|Ŵ).   (17)

Let h̃(x, x_de) := h(x, ν) − h(x_de). Adding and subtracting h(x_de) in (17) and using (13), one obtains

ė_n/b = −k_n e_n − e_{n−1} + W̃^T Φ(x_de) + h̃(x, x_de) + ε(x_de)   (18)

where W̃ := W* − Ŵ is the parameter estimation error. Applying the localized RBF-NN in (7) along the recurrent trajectories of the NN input x_de and using (12) in Lemma 2, one gets the entire closed-loop dynamics composed of (5), (16), and (18) as follows:

ė_1 = −k_1 e_1 + e_2,
ė_i = −k_i e_i − e_{i−1} + e_{i+1}  (i = 2, 3, . . . , n − 1),
ė_n/b = −k_n e_n − e_{n−1} + W̃_ζ^T Φ_ζ(x_de) + h̃(x, x_de) + ε_ζ(x_de),
W̃̇_ζ = −γ e_n Φ_ζ(x_de).   (19)

Remark 3: In the traditional NNLC [16]–[23], all NN inputs (x, ν) are unknown before control, whereas in the proposed HFF-NNLC, all NN inputs x_de are known before control. This important difference leads to some attractive features of the proposed NNLC: 1) the usage of an NN-AAC law as in the deterministic learning is not needed, since the recurrent reference signal x_de is applied as the NN input, which greatly simplifies the analysis and synthesis of the NNLC; 2) the approximation domain Ω_{c_d} can be determined in advance by the given x_de, resulting in a convenient construction of the RBF-NNs; and 3) semiglobal practical exponential stability can be guaranteed by increasing the control parameters k_1 to k_n and γ. Features 1 and 2 are straightforward, and feature 3 will be proven in Section IV-B.

B. Stability and Performance Analysis

Let z := col(e, W̃_ζ) ∈ R^{n+N_ζ} and define an auxiliary output y_a := c_0^T e = e_n with c_0 = [0 · · · 0 b]^T. With these notations, (19) can be expressed in the compact form

ż = [Λ  b̄Φ_ζ^T(x_de); −γΦ_ζ(x_de)c_0^T  0] z + [b̄(h̃ + ε_ζ); 0],
y_a = [c_0^T 0 · · · 0] z   (20)

where

A(x_de) = [Λ  b̄Φ_ζ^T(x_de); −γΦ_ζ(x_de)c_0^T  0],  b̄ = [0 · · · 0 1]^T,  c = [c_0^T 0 · · · 0]^T

and

Λ = [ −k_1    1      0     ···    0
       −1   −k_2     1      ⋱     ⋮
        0     ⋱      ⋱      ⋱     0
        ⋮     ⋱     −1   −k_{n−1}  b
        0    ···     0     −1   −b k_n ].

It follows from the definitions of Λ, b̄, and c_0 that the transfer function G(s) = c_0^T(sI − Λ)^{−1} b̄ is strictly proper and rational, (Λ, b̄) is controllable, and (c_0, Λ) is observable, where s is a complex variable. Besides, there are positive-definite symmetric matrices P = diag(1, . . . , 1, b) and Q = diag(2k_1, . . . , 2k_{n−1}, 2b²k_n) such that

Λ^T P + PΛ = −Q,  P b̄ = c_0   (21)


which implies that G(s) is strictly positive real (SPR) by the Kalman–Yakubovich–Popov lemma [33, Lemma A.2.8].

Using f(x) ∈ C^1 and Assumption 2, one gets that h(·) in (6) is of C^1. In addition, from Assumption 2, there exists a constant c_d ∈ R+ such that x_de ∈ Ω_{c_d}. Thus, applying the mean value theorem to h(·) in (6) and noting f(0) = 0, one gets that for any given compact set Ω_{c_e}, with c_e ∈ R+ being a constant, there exists a certain constant λ_h ∈ R+ positively correlated with c_e such that

|h̃(x, x_de)| ≤ λ_h ‖e‖,  ∀e ∈ Ω_{c_e}.   (22)

Consequently, according to the theory of perturbed systems [36, Sec. 9], h̃ and ε_ζ in (20) can be regarded as vanishing and nonvanishing perturbations, respectively. Now, the following theorem is established to demonstrate the main result of this brief.

Theorem 1: For system (1) with Assumptions 1 and 2 driven by the control law (15) with (2), (7), and (16), there exist suitably large control parameters k_1 to k_n and γ such that the entire closed-loop system (20) achieves semiglobal practical exponential stability in the sense that both e and W̃_ζ converge to small neighborhoods of zero within a finite time, where the small neighborhoods can be arbitrarily diminished, and a domain of attraction Ω_{c_{e0}} with c_{e0} < c_e can be arbitrarily enlarged, both by increasing k_1 to k_n and γ.

Proof: Let the nominal system (c, A) of (20) be

ż = [Λ  b̄Φ_ζ^T(x_de); −γΦ_ζ(x_de)c_0^T  0] z,  y_a = [c_0^T 0 · · · 0] z   (23)

and choose its Lyapunov function candidate

V(z) = e^T P e/2 + W̃_ζ^T W̃_ζ/(2γ).   (24)

According to [36, Lemma 3.2], if Φ(x_de) ∈ C^1 on Ω_{c_d}, then Φ(x_de) is locally Lipschitz in x_de on Ω_{c_d}. Noting the definition of Φ(x_de) in (7) and the C^∞ property of the Gaussian RBFs in (8), one obtains Φ(x_de) ∈ C^∞ (and naturally ∈ C^1) on Ω_{c_d}. Therefore, [36, Lemma 3.2] can be invoked to conclude that Φ(x_de), as well as A(x_de), in (23) is locally Lipschitz in x_de on Ω_{c_d}. From [35, Th. 1.5.2], if there exist constants λ_1, λ_2, λ_3 ∈ R+ such that V(z) in (24) satisfies

λ_1 ‖z‖² ≤ V(z) ≤ λ_2 ‖z‖²   (25)
V̇(z)|_{(23)} ≤ 0   (26)
∫_t^{t+δ} V̇(z(τ))|_{(23)} dτ ≤ −λ_3 ‖z(t)‖²   (27)

then the nominal system (23) is exponentially stable. In the following proof, the conditions (25)–(27) are first established to show global exponential stability of the nominal system (23), and then the theory of perturbed systems [36, Sec. 9] is applied to show the semiglobal practical exponential stability of the entire closed-loop system (20). The proof is accomplished by the following four steps.

Step 1 [Verifying the Conditions (25) and (26)]: Define λ_1 := min{λ_min(P), 1/γ}/2 ∈ R+ and λ_2 := max{λ_max(P), 1/γ}/2 ∈ R+. Then, (25) is satisfied, ∀t ≥ 0 and ∀z ∈ R^{n+N_ζ}. The time derivative of V(z) in (24) along the solutions of (23) satisfies

V̇|_{(23)} ≤ −k_s ‖e‖² + W̃_ζ^T (e_n Φ_ζ(x_de) − Ŵ̇_ζ/γ) ≤ −k_s ‖e‖² ≤ 0   (28)

with k_s := min{k_1, . . . , k_{n−1}, b²k_n}. Therefore, one immediately obtains that (26) is satisfied, ∀t ≥ 0 and ∀z ∈ R^{n+N_ζ}.

Step 2 [Verifying UCO of the Nominal System (c, A)]: Let K := col(0, γΦ_ζ(x_de)). Then, an auxiliary system (c, A + Kc^T) is obtained

from (23) in the following form:

ż = [Λ  b̄Φ_ζ^T(x_de); 0  0] z,  y_a = [c_0^T 0 · · · 0] z.   (29)

From Lemma 3 and Assumption 2, one gets that Φ_ζ(x_de) is PE. From Lemma 1 and the definition of Φ_ζ(x_de) in (12), one gets Φ_ζ(x_de), Φ̇_ζ(x_de) ∈ L∞. Now, since Φ_ζ(x_de) is PE, G(s) is SPR, and Φ(x_de), Φ̇(x_de) ∈ L∞, in the same manner as in [17] and [18], the derivation of (2.6.33) in [35] can be invoked to state that (29) is UCO. Hence, according to (10) in Definition 1 and tr(Φ_ζ(x_de)Φ_ζ^T(x_de)) = ‖Φ_ζ(x_de)‖², there exist constants α_1, α_2, δ ∈ R+ such that

α_2 N_ζ ≥ ∫_t^{t+δ} ‖Φ_ζ(x_de(τ))‖² dτ ≥ α_1 N_ζ,  ∀t ≥ 0.   (30)

Thus, one immediately obtains

∫_t^{t+δ} ‖K(τ)‖² dτ ≤ N_ζ α_2 γ²,  ∀t ≥ 0.   (31)

From [35, Lemma 2.5.2], the result in (31) implies that (c, A + Kc^T) being UCO is equivalent to (c, A) being UCO.

Step 3 [Establishing the Condition (27)]: Since (c, A) is UCO, (11) is satisfied from Definition 2. Multiplying both sides of (11) by z^T(t) and z(t) and applying z(τ) = ϕ(τ, t)z(t) and y_a(τ) = c^T z(τ) to the resulting expression, one obtains

β_2 ‖z(t)‖² ≥ ∫_t^{t+δ} y_a²(τ) dτ ≥ β_1 ‖z(t)‖²   (32)

for some constants β_1, β_2 ∈ R+. Integrating (28) over [t, t + δ] yields

∫_t^{t+δ} V̇(τ)|_{(23)} dτ ≤ −k_s ∫_t^{t+δ} ‖e(τ)‖² dτ.   (33)

Noting y_a = c_0^T e, one gets ‖e‖ ≥ |y_a|/b. Applying ‖e‖ ≥ |y_a|/b to the above result leads to

∫_t^{t+δ} V̇(τ)|_{(23)} dτ ≤ −(k_s/b²) ∫_t^{t+δ} y_a²(τ) dτ.   (34)

Combining (32) with (34), one immediately gets

∫_t^{t+δ} V̇(τ)|_{(23)} dτ ≤ −(k_s β_1/b²) ‖z(t)‖²   (35)

which implies that there exists λ_3 := k_s β_1/b² satisfying (27). Now, as the conditions (25)–(27) are established by Steps 1–3, the nominal system (23) is exponentially stable. Because the conditions (25)–(27) are valid for all z(0) ∈ R^{n+N_ζ} and V in (24) is radially unbounded (i.e., V(z) → ∞ as ‖z‖ → ∞), the stability is global.

Step 4 (Analyzing Perturbed Systems): Consider a perturbed system comprised of (23) and the vanishing perturbation h̃ as follows:

ż = [Λ  b̄Φ_ζ^T(x_de); −γΦ_ζ(x_de)c_0^T  0] z + [b̄h̃; 0],  y_a = [c_0^T 0 · · · 0] z.   (36)

From the above-mentioned proof, one gets that the nominal system (23) is locally Lipschitz in x_de on Ω_{c_d} and is globally exponentially stable. In addition, one also has |h̃| ≤ λ_h ‖e‖, ∀e ∈ Ω_{c_e}, by (22). Hence, the vanishing perturbation lemma [36, Lemma 9.1] is invoked to state that there exist suitably large control parameters k_1 to k_n and γ such that (36) is exponentially stable, ∀e(0) ∈ Ω_{c_{e0}} ⊂ Ω_{c_e} with c_{e0} < c_e. Since Ω_{c_{e0}} can be arbitrarily enlarged by increasing k_1 to k_n and γ, the stability is semiglobal. Now, consider the entire closed-loop system (20) constituted by the perturbed system (36) and the nonvanishing perturbation ε_ζ. As

Fig. 1. Control trajectories by the proposed HFF-NNLC in Example 1. (a) Transient tracking at t ∈ [0, 10] s. (b) Steady-state tracking at t ∈ [190, 200] s.

in [16], since (36) possesses semiglobal exponential stability, the nonvanishing perturbation lemma [36, Lemma 9.2] is applied to state that (20) achieves semiglobal practical exponential stability in the sense that both e and W̃_ζ converge to small neighborhoods of zero within a finite time, where the small neighborhoods can be arbitrarily diminished by increasing k_1 to k_n and γ.

Remark 4: In Theorem 1, thanks to the HFF structure of the proposed NNLC law (15), NN learning can commence at the very beginning of control rather than after the convergence of x to x_d as in [16] and [19]–[23]. At the early learning stage, the contribution of the RBF-NN ĥ(x_de|Ŵ) could be small, and the plant is mainly driven by the PD feedback; as the NN learning proceeds, ĥ(x_de|Ŵ) captures the dynamics of the lumped uncertainty h(x_de) in (6) and plays a key part in the tracking error convergence; after learning is accomplished, the learned knowledge stored in ĥ(x_de|Ŵ) can be reused to achieve stability and improved performance, such that repeated adaptation is avoided. The details of knowledge storage and reuse are omitted here, since they are exactly the same as those of [16].
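The structural identities (21), which underpin the SPR property used in the proof, can be spot-checked numerically. Below is our own sketch for n = 3; the gain values and b are arbitrary positive examples.

```python
import numpy as np

k1, k2, k3, b = 3.0, 10.0, 2.0, 1.5               # example gains and control gain

# Lambda for n = 3, acting on e = (e1, e2, e3/b)
Lam = np.array([[-k1,  1.0,  0.0],
                [-1.0, -k2,  b],
                [0.0, -1.0, -b * k3]])
P = np.diag([1.0, 1.0, b])
Q = np.diag([2 * k1, 2 * k2, 2 * b ** 2 * k3])
b_bar = np.array([0.0, 0.0, 1.0])
c0 = np.array([0.0, 0.0, b])

assert np.allclose(Lam.T @ P + P @ Lam, -Q)       # Lyapunov identity in (21)
assert np.allclose(P @ b_bar, c0)                 # P b_bar = c0 in (21)
```

The skew-symmetric off-diagonal pattern of Λ in the P-weighted metric is what makes the cross terms cancel in (28).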

Fig. 2. Learning trajectories by the proposed HFF-NNLC in Example 1.

V. SIMULATION STUDIES

A. Example 1: Van der Pol Oscillator

Consider the same example as in [16], where the controlled plant is a Van der Pol oscillator described by

ẋ_1 = x_2,  ẋ_2 = −x_1 + β(1 − x_1²)x_2 + u

with β ∈ R+ being an unknown parameter, and the reference signal x_d is generated by a Duffing oscillator as follows:

ẋ_{d1} = x_{d2},  ẋ_{d2} = −p_1 x_{d2} − p_2 x_{d1} − p_3 x_{d1}³ + q cos(wt)

with p_1, p_2, p_3, q, w ∈ R being known parameters. Since b = 1 in this example, one has h(x, ν) = f(x) − ν with f(x) = β(1 − x_1²)x_2 − x_1, h(x_de) = f(x_d) − ν, and u = −k_n e_n − e_{n−1} + ν − f̂(x_d|Ŵ). For the simulations, set β = 0.7, p_1 = 0.4, p_2 = −1.1, p_3 = 1.0, q = 1.498, w = 1.8, and x(0) = x_d(0) = [0.2, 0.3]^T as in [16].

To make comparisons, the NNLC of [16] is selected as a baseline controller. The construction of the proposed control law (15) with (2), (7), and (16) is the same as that of [16] except that x_d rather than x is applied as the NN input, where the c_{ij} are spaced evenly on Ω_{c_x} = [−2, 2] × [−2, 2], σ_j = 0.2828, k_1 = 3, k_2 = 10, γ = 5, and Ŵ(0) = 0, in which i = 1, 2 and j = 1 to 11². The simulations are carried out in MATLAB running on Windows 7, where the solver is set as the fixed-step ode5, the step size is set as 1 × 10⁻³ s, and the other settings are kept at their defaults.

Control trajectories by the proposed HFF-NNLC are given in Fig. 1, where fast and accurate tracking with a rough approximation of h(·) by ĥ(·) is shown at the beginning of control [see Fig. 1(a)], and even better tracking with an exact approximation of h(·) by ĥ(·) is shown after sufficient NN learning [see Fig. 1(b)]. Evolutions of Ŵ and ‖Ŵ‖ by the proposed HFF-NNLC are given in Fig. 2 to clearly show the learning capability of the RBF-NNs, where some elements of Ŵ converge to constants, so that ‖Ŵ‖ tends to a constant as time evolves. Although W* in (13) is unknown, such that W̃_ζ → 0 is difficult to verify directly, W̃_ζ → 0 can still be indirectly verified by combining the exact approximation of h(·) by ĥ(·) along the trajectories of x_d [see Fig. 1(b)] and the partial convergence of

Fig. 3. Control trajectories by the proposed HFF-NNLC in Example 2. (a) Transient tracking at t ∈ [0, 10] s. (b) Steady-state tracking at t ∈ [190, 200] s.

TABLE I. COMPARISONS OF PERFORMANCE INDICES FOR TWO CONTROLLERS

Ŵ to some constants (see Fig. 2). Control trajectories by the NNLC of [16] are omitted here to save space, as they are almost the same as those by the proposed approach. Instead, a comparison of performance indices between the two controllers is given in Table I, where e_m(t) := h(t) − ĥ(t) is the modeling error. It is observed that the proposed HFF-NNLC performs very similarly to the NNLC of [16], although its analysis and synthesis are much simpler.

B. Example 2: Aircraft Wing Rock Model

Consider a third-order aircraft wing rock dynamics as follows [3]:

θ̇_1 = θ_2,
θ̇_2 = −ω²θ_1 + μ_1θ_2 + b_1θ_2³ + μ_2θ_1²θ_2 + b_2θ_1θ_2² + b_0δ,
δ̇ = −δ/τ + u/τ

where θ_1 (rad) is the aircraft roll angle, θ_2 (rad/s) is the roll rate, δ (rad) is the actuator output, b_0 is the actuator gain, τ is the aileron time constant, ω² = −c_1a_1, μ_1 = c_1a_2 − c_2, b_1 = c_1a_3, μ_2 = c_1a_4, and b_2 = c_1a_5, in which a_1 to a_5 and c_1, c_2 are certain coefficients. It is shown in [3] that the above system can be transformed into

ẋ_1 = x_2,
ẋ_2 = x_3,
ẋ_3 = x_2(−ω² + 2μ_2x_1x_2 + b_2x_2²) + x_3(μ_1 + 3b_1x_2² + μ_2x_1² + 2b_2x_1x_2) + (μ_1x_2 + b_1x_2³ + μ_2x_1²x_2 + b_2x_1x_2²)/τ − (ω²x_1 + x_3)/τ + (b_0/τ)u

where x_1 = θ_1, x_2 = θ_2, and x_3 = −ω²θ_1 + μ_1θ_2 + b_1θ_2³ + μ_2θ_1²θ_2 + b_2θ_1θ_2² + b_0δ. Thus, one has f(x) = x_2(−ω² + 2μ_2x_1x_2 + b_2x_2²) + x_3(μ_1 + 3b_1x_2² + μ_2x_1² + 2b_2x_1x_2) + (μ_1x_2 + b_1x_2³ + μ_2x_1²x_2 + b_2x_1x_2²)/τ − (ω²x_1 + x_3)/τ and b = b_0/τ. For the simulations, choose b_0 = 1.5, τ = 1/15, c_1 = 0.3540, c_2 = 0.0010, a_1 = −0.04207, a_2 = −0.01456, a_3 = 0.04714, a_4 = −0.18583, a_5 = 0.24234, x(0) = [π/8, 0, 0]^T, and x_d(t) = (π/6) sin(2t).

Since the wing rock dynamics is assumed to be unknown, x_2 and x_3 are unmeasurable. Here, a high-gain observer as follows:

x̂̇_1 = x̂_2 + (3/ε)(x_1 − x̂_1),
x̂̇_2 = x̂_3 + (3/ε²)(x_1 − x̂_1),
x̂̇_3 = (1/ε³)(x_1 − x̂_1)

is applied, such that the control input u in (15) is applicable, where x̂_i with i = 1, 2, 3 are the estimates of x_i, and ε ∈ (0, 1) is an observer gain. The convergence property of the above observer has been well established in [36]. The construction of the proposed control law (15) with (2), (7), and (16) is the same as that of [16], except that x_de rather than (x, ν) is applied as the NN input, where the c_{ij} are spaced evenly on Ω_{c_d} = [−π/6, π/6] × [−π/3, π/3] × [−2π/3, 2π/3] × [−4π/3, 4π/3], σ_i = (π/6)2^{i−2}, k_1 = k_2 = 3, k_3 = 2, γ = 10, Ŵ(0) = 0, and ε = 0.005, in which i = 1 to 4 and j = 1 to 3⁴.

Fig. 4. Learning trajectories by the proposed HFF-NNLC in Example 2.
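The coordinate transformation of the wing rock model can be verified numerically: differentiating x_3 along the θ-dynamics by the chain rule must reproduce the closed-form expression for ẋ_3. The following is our own spot check at a random operating point.

```python
import numpy as np

# parameter values as in the simulations of Example 2
rng = np.random.default_rng(0)
b0, tau = 1.5, 1.0 / 15.0
c1, c2 = 0.3540, 0.0010
a1, a2, a3, a4, a5 = -0.04207, -0.01456, 0.04714, -0.18583, 0.24234
w2, mu1 = -c1 * a1, c1 * a2 - c2
b1, mu2, b2 = c1 * a3, c1 * a4, c1 * a5

th1, th2, delta, u = rng.uniform(-0.5, 0.5, size=4)
th1d = th2
th2d = (-w2 * th1 + mu1 * th2 + b1 * th2 ** 3
        + mu2 * th1 ** 2 * th2 + b2 * th1 * th2 ** 2 + b0 * delta)
deltad = -delta / tau + u / tau

# x3_dot obtained by the chain rule from x3 = theta2_dot's defining expression
x3d_chain = (-w2 * th1d + mu1 * th2d + 3 * b1 * th2 ** 2 * th2d
             + mu2 * (2 * th1 * th1d * th2 + th1 ** 2 * th2d)
             + b2 * (th1d * th2 ** 2 + 2 * th1 * th2 * th2d) + b0 * deltad)

# x3_dot from the transformed model, with x = (th1, th2, x3)
x1, x2, x3 = th1, th2, th2d
x3d_model = (x2 * (-w2 + 2 * mu2 * x1 * x2 + b2 * x2 ** 2)
             + x3 * (mu1 + 3 * b1 * x2 ** 2 + mu2 * x1 ** 2 + 2 * b2 * x1 * x2)
             + (mu1 * x2 + b1 * x2 ** 3 + mu2 * x1 ** 2 * x2 + b2 * x1 * x2 ** 2) / tau
             - (w2 * x1 + x3) / tau + (b0 / tau) * u)

assert np.isclose(x3d_chain, x3d_model)            # the two expressions agree
```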

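Putting the pieces together, the loop of Section IV can be sketched on Example 1. The following is our own simplified Python re-creation (forward-Euler at 1 ms, not the authors' MATLAB/ode5 setup), wiring the PD feedback, the reference-driven RBF-NN feedforward, and the adaptive law (16):

```python
import numpy as np

def rbf_vector(chi, centers, sigma):
    return np.exp(-np.sum((centers - chi) ** 2, axis=1) / (2.0 * sigma ** 2))

# parameters of Example 1
beta = 0.7
p1, p2, p3, q, w = 0.4, -1.1, 1.0, 1.498, 1.8
k1, k2, gamma = 3.0, 10.0, 5.0

grid = np.linspace(-2.0, 2.0, 11)                  # 11 x 11 lattice on [-2, 2]^2
centers = np.array([[a, b] for a in grid for b in grid])
sigma = 0.2828
W_hat = np.zeros(len(centers))

x = np.array([0.2, 0.3])                           # plant state (x1, x2)
xd = np.array([0.2, 0.3])                          # reference state (xd1, xd2)
dt, T = 1e-3, 30.0
for step in range(int(T / dt)):
    t = step * dt
    xdd = -p1 * xd[1] - p2 * xd[0] - p3 * xd[0] ** 3 + q * np.cos(w * t)
    e1, e1dot = x[0] - xd[0], x[1] - xd[1]
    e2 = e1dot + k1 * e1                           # filtered error (2)
    nu = xdd - k1 * e1dot                          # nu for n = 2
    phi = rbf_vector(xd, centers, sigma)           # NN fed by the reference only
    u = -k2 * e2 - e1 + nu - W_hat @ phi           # PD feedback + NN feedforward
    f = beta * (1.0 - x[0] ** 2) * x[1] - x[0]
    x = x + dt * np.array([x[1], f + u])           # Euler step of the plant
    xd = xd + dt * np.array([xd[1], xdd])          # Euler step of the reference
    W_hat = W_hat + dt * gamma * e2 * phi          # adaptive law (16)

assert abs(x[0] - xd[0]) < 0.2                     # small tracking error
```

Because the NN input is the known reference state rather than the plant state, the basis activations could in principle even be precomputed offline along the whole reference trajectory.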

Control and learning trajectories under the proposed HFF-NNLC are shown in Figs. 3 and 4, respectively, where the qualitative analysis of these results is the same as that of Figs. 1 and 2. A comparison of performance indices between the two controllers is given in Table I, which demonstrates that the proposed HFF-NNLC still performs very similarly to the NNLC of [16]. All these results further verify the effectiveness of the proposed approach.

VI. CONCLUSION

This brief has developed a biomimetic HFF-NNLC strategy for a class of uncertain nonlinear systems, where the control structure includes a PD controller acting as a feedback servo machine and an RBF-NN acting as a feedforward predictive machine. The significance of the proposed approach is that it greatly simplifies the analysis and synthesis of the NNLC while performing very similarly to the conventional NNLC. Two illustrative examples have been provided to verify the effectiveness of the proposed approach. Further work will focus on the NNLC without the PE condition.

ACKNOWLEDGMENT

The authors would like to thank the reviewers for their valuable comments, which have greatly improved the quality of this brief.

REFERENCES

[1] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, pp. 1513–1525, Oct. 2013.
[2] C.-F. Hsu, "Adaptive neural complementary sliding-mode control via functional-linked wavelet neural network," Eng. Appl. Artif. Intell., vol. 26, no. 4, pp. 1221–1229, Apr. 2013.
[3] Y. Pan, Y. Zhou, T. Sun, and M. J. Er, "Composite adaptive fuzzy H∞ tracking control of uncertain nonlinear systems," Neurocomputing, vol. 99, no. 1, pp. 15–24, Jan. 2013.
[4] H. Zargarzadeh, T. Dierks, and S. Jagannathan, "Adaptive neural network-based optimal control of nonlinear continuous-time systems in strict-feedback form," Int. J. Adapt. Control Signal Process., vol. 28, nos. 3–5, pp. 305–324, Mar./May 2014.
[5] D. Liu, D. Wang, and H. Li, "Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418–428, Feb. 2014.
[6] Y.-J. Liu and S. Tong, "Adaptive fuzzy control for a class of nonlinear discrete-time systems with backlash," IEEE Trans. Fuzzy Syst., vol. 22, no. 5, pp. 1359–1365, Oct. 2014.
[7] Y. Pan, H. Yu, and M. J. Er, "Adaptive neural PD control with semiglobal asymptotic stabilization guarantee," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 12, pp. 2264–2274, Dec. 2014.
[8] Y.-C. Wang, C.-J. Chien, R. Chi, and Z. Hou, "A fuzzy-neural adaptive terminal iterative learning control for fed-batch fermentation processes," Int. J. Fuzzy Syst., vol. 17, no. 3, pp. 423–433, Sep. 2015.
[9] K. Shojaei, "Neural adaptive robust output feedback control of wheeled mobile robots with saturating actuators," Int. J. Adapt. Control Signal Process., vol. 29, no. 7, pp. 855–876, Jul. 2015.
[10] A. Melingui, O. Lakhal, B. Daachi, J. B. Mbede, and R. Merzouki, "Adaptive neural network control of a compact bionic handling arm," IEEE/ASME Trans. Mechatron., vol. 20, no. 6, pp. 2862–2875, Dec. 2015.
[11] Y.-J. Liu and S. Tong, "Adaptive fuzzy identification and control for a class of nonlinear pure-feedback MIMO systems with unknown dead zones," IEEE Trans. Fuzzy Syst., vol. 23, no. 5, pp. 1387–1398, Oct. 2015.


[12] Y. Pan, T. Sun, and H. Yu, "Peaking-free output-feedback adaptive neural control under a nonseparation principle," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 12, pp. 3097–3108, Dec. 2015.
[13] Y.-J. Liu, Y. Gao, S. Tong, and Y. Li, "Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with dead-zone," IEEE Trans. Fuzzy Syst., vol. 24, no. 1, pp. 16–28, Feb. 2016.
[14] K. S. Narendra, "Neural networks for control theory and practice," Proc. IEEE, vol. 84, no. 10, pp. 1385–1406, Oct. 1996.
[15] P. J. Antsaklis, "Intelligent learning control," IEEE Control Syst., vol. 15, no. 3, pp. 5–7, Jun. 1995.
[16] C. Wang and D. J. Hill, "Learning from neural control," IEEE Trans. Neural Netw., vol. 17, no. 1, pp. 130–146, Jan. 2006.
[17] J. A. Farrell, "Persistence of excitation conditions in passive learning control," Automatica, vol. 33, no. 4, pp. 699–703, Apr. 1997.
[18] J. A. Farrell, "Stability and approximator convergence in nonparametric nonlinear adaptive control," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 1008–1020, Sep. 1998.
[19] T. Liu, C. Wang, and D. J. Hill, "Learning from neural control of nonlinear systems in normal form," Syst. Control Lett., vol. 58, no. 9, pp. 633–638, Sep. 2009.
[20] C. Wang, M. Wang, T. Liu, and D. J. Hill, "Learning from ISS-modular adaptive NN control of nonlinear strict-feedback systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 10, pp. 1539–1550, Oct. 2012.
[21] S.-L. Dai, C. Wang, and M. Wang, "Dynamic learning from adaptive neural network control of a class of nonaffine nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 1, pp. 111–123, Jan. 2014.
[22] M. Wang, C. Wang, and X. Liu, "Dynamic learning from adaptive neural control with predefined performance for a class of nonlinear systems," Inf. Sci., vol. 279, pp. 874–888, Sep. 2014.
[23] B. Xu, C. Yang, and Z. Shi, "Reinforcement learning output feedback NN control using deterministic learning technique," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 635–641, Mar. 2014.
[24] D. Gorinevsky, "On the persistency of excitation in radial basis function network identification of nonlinear systems," IEEE Trans. Neural Netw., vol. 6, no. 5, pp. 1237–1244, Sep. 1995.
[25] S. Lu and T. Başar, "Robust nonlinear system identification using neural-network models," IEEE Trans. Neural Netw., vol. 9, no. 3, pp. 407–429, May 1998.
[26] A. J. Kurdila, F. J. Narcowich, and J. D. Ward, "Persistency of excitation in identification using radial basis function approximants," SIAM J. Control Optim., vol. 33, no. 2, pp. 625–642, Mar. 1995.
[27] S. Khemaissia and A. Morris, "Use of an artificial neuroadaptive robot model to describe adaptive and learning motor mechanisms in the central nervous system," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 28, no. 3, pp. 404–416, Jun. 1998.
[28] T. Lam, M. Anderschitz, and V. Dietz, "Contribution of feedback and feedforward strategies to locomotor adaptations," J. Neurophysiol., vol. 95, no. 2, pp. 766–773, Feb. 2006.
[29] S.-H. Lee and D. Terzopoulos, "Heads up! Biomechanical modeling and neuromuscular control of the neck," ACM Trans. Graph., vol. 25, no. 3, pp. 1188–1198, Jul. 2006.
[30] M. J. Wagner and M. A. Smith, "Shared internal models for feedforward and feedback control," J. Neurosci., vol. 28, no. 42, pp. 10663–10673, Oct. 2008.
[31] Y. Pan and H. Yu, "Biomimetic hybrid feedback feedforward adaptive neural control of robotic arms," in Proc. IEEE Symp. Comput. Intell. Control Autom., Orlando, FL, USA, Dec. 2014, pp. 1–7.
[32] Y. Pan, Y. Liu, B. Xu, and H. Yu, "Hybrid feedback feedforward: An efficient design of adaptive neural network control," Neural Netw., vol. 76, pp. 122–134, Apr. 2016.
[33] J. A. Farrell and M. M. Polycarpou, Adaptive Approximation Based Control: Unifying Neural, Fuzzy and Traditional Adaptive Approximation Approaches. Hoboken, NJ, USA: Wiley, 2006.
[34] B. Xian, D. M. Dawson, M. S. de Queiroz, and J. Chen, "A continuous asymptotic tracking control strategy for uncertain nonlinear systems," IEEE Trans. Autom. Control, vol. 49, no. 7, pp. 1206–1211, Jul. 2004.
[35] S. Sastry and M. Bodson, Adaptive Control: Stability, Convergence and Robustness. Englewood Cliffs, NJ, USA: Prentice-Hall, 1989.
[36] H. K. Khalil, Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.