Mathematical and Computer Modelling 54 (2011) 367–377

Limited memory BFGS method with backtracking for symmetric nonlinear equations

Gonglin Yuan a,∗, Zengxin Wei a, Sha Lu b

a College of Mathematics and Information Science, Guangxi University, Nanning, Guangxi, 530004, PR China
b School of Mathematics Science, Guangxi Teacher's Education University, Nanning, Guangxi, 530001, PR China

Article history: Received 18 February 2010; received in revised form 13 February 2011; accepted 14 February 2011.

Keywords: Limited memory BFGS method; Line search; Symmetric nonlinear equations; Global convergence

Abstract. A limited memory BFGS algorithm is presented for solving symmetric nonlinear equations. The global convergence of the given method is established under mild conditions. Numerical results show that the limited memory BFGS method is more effective than the normal BFGS method on the given problems.

1. Introduction

In this paper, we consider the following system of nonlinear equations

g(x) = 0,   x ∈ ℜ^n,   (1.1)

where g : ℜ^n → ℜ^n is continuously differentiable and the Jacobian ∇g(x) of g is symmetric for all x ∈ ℜ^n. This problem arises from unconstrained optimization problems and equality constrained optimization problems. Eq. (1.1) is the first order necessary condition for the unconstrained optimization problem

min f(x),   x ∈ ℜ^n,   (1.2)

when g is the gradient mapping of some function f : ℜ^n → ℜ.

For the equality constrained problem

min f(z)   s.t.   h(z) = 0,   (1.3)

where h is a vector-valued function, the KKT conditions can be represented as the system (1.1) with x = (z, v) and

g(z, v) = (∇f(z) + ∇h(z)v, h(z)),   (1.4)

where v is the vector of Lagrange multipliers. Notice that the Jacobian ∇g(z, v) is symmetric for all (z, v). Other practical problems, such as the saddle point problem, the discretized two-point boundary value problem, and the discretized elliptic boundary value problem, also take the form of (1.1) with a symmetric Jacobian (see, e.g., Chapter 1 in [1]).



∗ Corresponding author. E-mail address: [email protected] (G. Yuan).

Let θ be the norm function defined by θ(x) = (1/2)‖g(x)‖^2, where ‖·‖ is the Euclidean norm. Then the nonlinear equations problem (1.1) is equivalent to the following global optimization problem

min θ(x),   x ∈ ℜ^n.   (1.5)

Numerical methods for (1.1) fall into two main classes: trust region methods [2–4] and line search methods. Line search methods solve (1.1) by the iterative formula x_{k+1} = x_k + α_k d_k, where α_k is a steplength and d_k is a search direction. To begin with, we briefly review some line search methods for (1.1). First, we consider techniques for choosing α_k. Brown and Saad [5] proposed the following line search technique to obtain the stepsize α_k:

θ(x_k + α_k d_k) − θ(x_k) ≤ σ α_k ∇θ(x_k)^T d_k,   (1.6)

where σ ∈ (0, 1) and ∇θ(x_k) = ∇g(x_k) g(x_k). Based on this technique, Zhu [6] presented the nonmonotone line search technique

θ(x_k + α_k d_k) − θ(x_{l(k)}) ≤ σ α_k ∇θ(x_k)^T d_k,   (1.7)

where θ(x_{l(k)}) = max_{0≤j≤m(k)} θ(x_{k−j}), m(0) = 0, m(k) = min{m(k−1) + 1, M} for k ≥ 1, and M is a nonnegative integer. It is easy to see that the Jacobian matrix ∇g_k must be computed at every iteration when the two techniques (1.6) and (1.7) are used, which increases the workload, especially for large-scale problems. In order to avoid this drawback, we [7] presented the new backtracking inexact technique

‖g(x_k + α_k d_k)‖^2 ≤ ‖g(x_k)‖^2 + δ α_k^2 g_k^T d_k,   (1.8)

where δ ∈ (0, 1), g_k = g(x_k), and d_k is a solution of the system of linear equations (1.12). Global convergence and superlinear convergence were established, and the numerical results showed that the new line search technique is more effective than the normal one. Li and Fukushima [8] proposed an approximate monotone line search technique

θ(x_k + α_k d_k) − θ(x_k) ≤ −δ_1 ‖α_k d_k‖^2 − δ_2 ‖α_k g_k‖^2 + ϵ_k ‖g(x_k)‖^2,   (1.9)

where δ_1 and δ_2 are positive constants, α_k = r^{i_k} with r ∈ (0, 1), i_k is the smallest nonnegative integer i such that (1.9) holds for α_k = r^i, and ϵ_k satisfies

\sum_{k=0}^{∞} ϵ_k < ∞.   (1.10)
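For concreteness, the following MATLAB sketch shows one way to implement the backtracking loop for (1.8) and (1.9). It is our illustration under our own naming (gfun is a handle to the mapping g, and the constants delta, delta1, delta2, r, eps_k are supplied by the caller), not the authors' code.

function alpha = backtrack(gfun, x, d, r, delta, delta1, delta2, eps_k, rule)
% Find the smallest nonnegative integer i such that alpha = r^i satisfies
% rule (1.8) or (1.9), where theta(x) = 0.5*norm(g(x))^2.
g0 = gfun(x);
alpha = 1;
for i = 0:8                          % Section 4 safeguard: accept alpha after eight trials
    gt = gfun(x + alpha*d);
    if strcmp(rule, '1.8')           % (1.8): ||g(x+a*d)||^2 <= ||g(x)||^2 + delta*a^2*g'*d
        ok = norm(gt)^2 <= norm(g0)^2 + delta*alpha^2*(g0'*d);
    else                             % (1.9): approximate monotone descent
        ok = 0.5*norm(gt)^2 - 0.5*norm(g0)^2 <= -delta1*norm(alpha*d)^2 ...
             - delta2*norm(alpha*g0)^2 + eps_k*norm(g0)^2;
    end
    if ok, break; end
    alpha = r*alpha;                 % next trial stepsize: alpha = r^(i+1)
end
end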

They combined the line search (1.9) with a special BFGS update formula and obtained good results (see [8] for details). Inspired by their ideas, we [9] made a further study. At present, many methods have been proposed for (1.1) (see [10–12] etc.).

Second, we consider techniques for choosing d_k. One of the most effective methods is Newton's method. It normally requires the fewest function evaluations and is very good at handling ill-conditioning. However, its efficiency largely depends on the possibility of efficiently solving the linear system that arises when computing the search direction d_k at each iteration:

∇g(x_k) d_k = −g(x_k).   (1.11)

Moreover, the exact solution of system (1.11) can be too burdensome, or is not necessary when x_k is far from a solution [13]. Inexact Newton methods represent the basic approach underlying most Newton-type large-scale algorithms (see [6,7,14]). At each iteration, the current estimate of the solution is updated by approximately solving the linear system (1.11); the inner iteration is typically "truncated" before an exact solution is obtained. One of the most effective quasi-Newton methods is the famous BFGS method, where the search direction d_k is determined by

B_k d_k + g_k = 0,   (1.12)

where g_k = g(x_k) and B_k is generated by the following BFGS update formula

B_{k+1} = B_k − \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},   (1.13)

where s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k. Let H_k be the inverse of B_k; then the inverse update formula of (1.13) is represented as

H_{k+1} = H_k − \frac{y_k^T (s_k − H_k y_k) s_k s_k^T}{(y_k^T s_k)^2} + \frac{(s_k − H_k y_k) s_k^T + s_k (s_k − H_k y_k)^T}{y_k^T s_k}
        = \left(I − \frac{s_k y_k^T}{y_k^T s_k}\right) H_k \left(I − \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k},   (1.14)
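As a quick numerical illustration (ours, not part of the original paper), one can check in MATLAB that the direct update (1.13) and the inverse update (1.14) are mutually inverse whenever H_k is the inverse of B_k and y_k^T s_k > 0:

n = 4;
A = randn(n); Bk = A'*A + eye(n);        % a symmetric positive definite B_k
Hk = inv(Bk);                            % H_k = inv(B_k)
sk = randn(n, 1); yk = randn(n, 1);
if yk'*sk <= 0, yk = -yk; end            % enforce the curvature condition y'*s > 0
Bk1 = Bk - (Bk*(sk*sk')*Bk)/(sk'*Bk*sk) + (yk*yk')/(yk'*sk);   % update (1.13)
rho = 1/(yk'*sk); Vk = eye(n) - rho*(yk*sk');
Hk1 = Vk'*Hk*Vk + rho*(sk*sk');          % second form of (1.14), i.e. (1.15)
disp(norm(Hk1*Bk1 - eye(n)))             % of the order of machine precision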


which is the dual form of the DFP update formula in the sense that H_k ↔ B_k, H_{k+1} ↔ B_{k+1}, and s_k ↔ y_k. The limited memory BFGS (L-BFGS) method (see [15]) is an adaptation of the BFGS method for large-scale problems. The implementation is almost identical to that of the standard BFGS method; the only difference is that the inverse Hessian approximation is not formed explicitly, but defined by a small number of BFGS updates. It often provides a fast rate of linear convergence and requires minimal storage. In the L-BFGS method, the matrix H_k is obtained by updating the basic matrix H_0 m̃ (> 0) times using the BFGS formula with the pairs from the previous m̃ iterations. The standard BFGS correction (1.14) has the following form:

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T,   (1.15)

where ρ_k = 1/(s_k^T y_k), V_k = I − ρ_k y_k s_k^T, and I is the unit matrix. Thus, H_{k+1} in the L-BFGS method has the following form:

H_{k+1} = V_k^T H_k V_k + ρ_k s_k s_k^T
        = V_k^T [V_{k−1}^T H_{k−1} V_{k−1} + ρ_{k−1} s_{k−1} s_{k−1}^T] V_k + ρ_k s_k s_k^T
        = · · ·
        = [V_k^T · · · V_{k−m̃+1}^T] H_{k−m̃+1} [V_{k−m̃+1} · · · V_k]
          + ρ_{k−m̃+1} [V_k^T · · · V_{k−m̃+2}^T] s_{k−m̃+1} s_{k−m̃+1}^T [V_{k−m̃+2} · · · V_k]
          + · · · + ρ_k s_k s_k^T.   (1.16)
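In implementations H_{k+1} is never formed explicitly: the product H_k g_k needed in Step 2 of the algorithm below can be obtained directly from the stored pairs {s_l, y_l} by the standard L-BFGS two-loop recursion. The following MATLAB sketch is our illustration (S and Y store the last m̃ pairs as columns, oldest first, and the basic matrix is H_0 = gamma*I); it computes d_k = −H_k g_k:

function d = lbfgs_direction(g, S, Y, gamma)
% Two-loop recursion: returns d = -H*g, where H is the L-BFGS matrix (1.16)
% built from H0 = gamma*I and the pairs S(:,j) = s_l, Y(:,j) = y_l.
m = size(S, 2);
q = g; a = zeros(m, 1);
for j = m:-1:1                           % first (backward) loop over the pairs
    rho = 1/(Y(:,j)'*S(:,j));
    a(j) = rho*(S(:,j)'*q);
    q = q - a(j)*Y(:,j);
end
r = gamma*q;                             % apply the basic matrix H0
for j = 1:m                              % second (forward) loop
    b = (Y(:,j)'*r)/(Y(:,j)'*S(:,j));
    r = r + (a(j) - b)*S(:,j);
end
d = -r;
end

When no pairs are stored yet (k = 0), the loops are skipped and the sketch returns d = −gamma*g, i.e. the steepest descent direction scaled by H_0.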

Some modified L-BFGS methods have been proposed for large-scale bound constrained optimization problems (see [16,17] etc.).

Motivated by the above observations, we extend the line search techniques (1.8) and (1.9) to the limited memory scheme and present a limited memory BFGS algorithm. Under appropriate conditions, its global convergence is established. Numerical experiments with the proposed method on a set of large problems indicate that the given algorithm is promising.

In the next section, the backtracking inexact L-BFGS algorithm is stated. Under some reasonable conditions, we establish the global convergence of the algorithm in Section 3. Numerical results are reported in Section 4.

2. Algorithms

This section gives the inexact L-BFGS method in association with the new backtracking line search techniques (1.8) and (1.9) for (1.1).

Algorithm 1 (L-BFGS1).
Step 0: Choose an initial point x_0 ∈ ℜ^n, an initial symmetric positive definite matrix H_0 ∈ ℜ^{n×n}, positive constants δ_1, δ_2, constants r, δ, ρ ∈ (0, 1), a positive integer m_1, and ϵ_k satisfying (1.10). Let k := 0.
Step 1: Stop if ‖g_k‖ = 0.
Step 2: Determine d_k by d_k = −H_k g_k.
Step 3: If

‖g(x_k + d_k)‖ ≤ ρ ‖g(x_k)‖,   (2.1)

then take α_k = 1 and go to Step 5; otherwise go to Step 4.
Step 4: Let i_k be the smallest nonnegative integer i such that (1.9) or (1.8) holds for α = r^i. Let α_k = r^{i_k}.
Step 5: Let the next iterate be x_{k+1} = x_k + α_k d_k.
Step 6: Let m̃ = min{k + 1, m_1}. Put s_k = x_{k+1} − x_k = α_k d_k and y_k = g_{k+1} − g_k. Update H_0 m̃ times to get H_{k+1} by (1.14).
Step 7: Let k := k + 1. Go to Step 1.

In the following, we assume that the algorithm updates B_k, the inverse of H_k. We also assume that the basic matrix B_0 and its inverse H_0 are bounded and positive definite. Algorithm 1, restated in terms of B_k, reads as follows; only the steps that differ are listed.

Algorithm 2 (L-BFGS2).
Step 2: Determine d_k by

B_k d_k = −g_k.   (2.2)

Step 6: Let m̃ = min{k + 1, m_1}. Put s_k = x_{k+1} − x_k = α_k d_k and y_k = g_{k+1} − g_k. Update B_0 m̃ times, i.e., for l = k − m̃ + 1, . . . , k compute

B_k^{l+1} = B_k^l − \frac{B_k^l s_l s_l^T B_k^l}{s_l^T B_k^l s_l} + \frac{y_l y_l^T}{y_l^T s_l},   (2.3)

where s_l = x_{l+1} − x_l, y_l = g_{l+1} − g_l, B_k^{k−m̃+1} = B_0 for all k, and B_{k+1} = B_k^{k+1}.

Note that Algorithms 1 and 2 are mathematically equivalent. In our numerical experiments we implement Algorithm 1; Algorithm 2 is given only for the purpose of analysis. Throughout this paper, we therefore discuss Algorithm 2. In the following section, we concentrate on its global convergence.
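Putting the pieces together, a minimal MATLAB driver for Algorithm 1 might look as follows. This is a sketch under our own naming, reusing lbfgs_direction and backtrack from the earlier sketches; the memory handling via the matrices S and Y and the tolerance tol (replacing the exact test ‖g_k‖ = 0 of Step 1) are our illustrative choices, not the authors' code.

function [x, k] = lbfgs1(gfun, x0, m1, r, rho, delta, delta1, delta2, tol)
% Illustrative driver for Algorithm 1 (L-BFGS1).
x = x0; g = gfun(x); k = 0;
S = []; Y = [];
while norm(g) > tol
    d = lbfgs_direction(g, S, Y, 1);      % Step 2: d_k = -H_k*g_k, H_0 = I
    if norm(gfun(x + d)) <= rho*norm(g)   % Step 3: condition (2.1)
        alpha = 1;
    else                                  % Step 4: backtracking, here by rule (1.8)
        eps_k = (1/(k + 1))^2;            % summable eps_k, as in Section 4
        alpha = backtrack(gfun, x, d, r, delta, delta1, delta2, eps_k, '1.8');
    end
    s = alpha*d; x = x + s;               % Step 5
    gnew = gfun(x);
    S = [S, s]; Y = [Y, gnew - g];        % Step 6: keep only the last m1 pairs
    if size(S, 2) > m1
        S(:,1) = []; Y(:,1) = [];
    end
    g = gnew; k = k + 1;                  % Step 7
end
end

For instance, with gfun a handle to one of the residuals of Section 4, the call [x, k] = lbfgs1(gfun, x0, 6, 0.1, 0.5, 0.9, 0.01, 0.01, 1e-6) mirrors the parameter values reported there.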


3. Global convergence

Let Ω be the level set defined by

Ω = {x | ‖g(x)‖ ≤ ‖g(x_0)‖}.   (3.1)

Similar to [7], the following assumptions are needed to get the global convergence of Algorithm 2.

Assumption A. (i) g is continuously differentiable on an open convex set Ω_1 containing Ω.
(ii) The Jacobian of g is symmetric, bounded, and positive definite on Ω_1, namely, there exist positive constants M ≥ m > 0 such that

‖∇g(x)‖ ≤ M   ∀x ∈ Ω_1   (3.2)

and

m‖d‖^2 ≤ d^T ∇g(x) d   ∀x ∈ Ω_1, d ∈ ℜ^n.   (3.3)

Assumption B. B_k is a good approximation to ∇g_k, i.e.,

‖(∇g_k − B_k) d_k‖ ≤ ϵ ‖g_k‖,   (3.4)

where ϵ ∈ (0, 1) is a small quantity. The following Lemmas 3.1–3.3 are given in [7]. We restate them here for the sake of completeness, but only give the proof of Lemma 3.1.

Lemma 3.1. Let Assumption B hold and {α_k, d_k, x_{k+1}, g_{k+1}} be generated by Algorithm 2. Then d_k is a descent direction for θ(x) at x_k, i.e.,

∇θ(x_k)^T d_k < 0.   (3.5)

Proof. By (2.2), we have

∇θ(x_k)^T d_k = g(x_k)^T ∇g(x_k) d_k = g(x_k)^T [(∇g(x_k) − B_k) d_k − g(x_k)] = g(x_k)^T (∇g(x_k) − B_k) d_k − g(x_k)^T g(x_k).   (3.6)

Therefore, taking norms on the right-hand side of (3.6) and using (3.4), we obtain

∇θ(x_k)^T d_k ≤ ‖g(x_k)‖ ‖(∇g(x_k) − B_k) d_k‖ − ‖g(x_k)‖^2 ≤ −(1 − ϵ) ‖g(x_k)‖^2.   (3.7)

Hence, for ϵ ∈ (0, 1), this lemma is true. □

Lemma 3.2. Let Assumption B hold and {α_k, d_k, x_{k+1}, g_{k+1}} be generated by Algorithm 2. Then {x_k} ⊂ Ω. Moreover, {‖g_k‖} converges.

Lemma 3.3. Let Assumption A be satisfied and {α_k, d_k, x_{k+1}, g_{k+1}} be generated by Algorithm 2. Then

y_k^T s_k ≥ m ‖s_k‖^2   ∀k.   (3.8)

Using y_k^T s_k ≥ m‖s_k‖^2 > 0, we can deduce that B_{k+1} inherits the symmetry and positive definiteness of B_k (see [18]). If s_k^T y_k > 0 for all k and B_0 is symmetric and positive definite, then B_k generated by the update formula (2.3) is nonsingular [15]. Therefore, (2.2) has a unique solution for each k. In order to get global convergence, in the following we assume that B_k is positive definite. By the above lemma and Assumption A, we obtain

\frac{s_k^T y_k}{‖s_k‖^2} ≥ m,   \frac{‖y_k‖^2}{s_k^T y_k} ≤ \frac{M^2}{m}.   (3.9)

Lemma 3.4. Let B_k be updated by the BFGS formula (2.3), let B_0 be symmetric and positive definite, and suppose that for any k ≥ 0, s_k and y_k satisfy (3.9). Then, for any positive integer k′, there exist positive constants β_1, β_2 and β_3 such that

‖B_k d_k‖ ≤ β_1 ‖d_k‖,   β_2 ‖d_k‖^2 ≤ d_k^T B_k d_k ≤ β_3 ‖d_k‖^2   (3.10)

hold for at least ⌈k′/2⌉ values of k ∈ {1, 2, . . . , k′}.



Proof. Similar to [19], for any positive definite matrix B, we define the function

ψ(B) = tr(B) − ln(det(B)),   (3.11)


where tr(B) and det(B) denote the trace and determinant of B, and ln denotes the natural logarithm. Note that ψ(B) > 0 since

ψ(B) = \sum_{i=1}^{n} (λ_i − ln λ_i),   (3.12)

where 0 < λ_1 ≤ · · · ≤ λ_n are the eigenvalues of B. We also note that the function

w(t) = t − ln t,   t > 0,

is strictly convex and attains its minimum value 1 at t = 1. Thus ψ(B) ≥ n, and ψ(B) can be considered as a measure of closeness between B and the identity matrix, for which ψ(I) = n. Note also that since w(t) > ln t, (3.12) gives

ψ(B) > ln λ_n − ln λ_1 = ln \frac{λ_n}{λ_1} = ln cond(B).   (3.13)
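As a brief aside (our numerical illustration, not part of the original proof), both properties of ψ are easy to confirm in MATLAB for a random symmetric positive definite matrix:

n = 5;
A = randn(n); B = A'*A + eye(n);          % a random symmetric positive definite matrix
psiB = trace(B) - log(det(B));            % the measure (3.11)
fprintf('psi(B) = %.4f >= n = %d\n', psiB, n);
fprintf('psi(B) = %.4f > ln cond(B) = %.4f\n', psiB, log(cond(B)));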

By (2.3), we have

tr(B_{k+1}) = tr(B_k^{k−m̃+1}) − \sum_{l=k−m̃+1}^{k} \frac{‖B_l s_l‖^2}{s_l^T B_l s_l} + \sum_{l=k−m̃+1}^{k} \frac{‖y_l‖^2}{s_l^T y_l},   (3.14)

and

det(B_{k+1}) = det(B_k^{k−m̃+1}) \prod_{l=k−m̃+1}^{k} \frac{s_l^T y_l}{s_l^T B_l s_l}.   (3.15)

Now we derive a closed-form expression for ψ(B_{k+1}). From (3.11), (3.14) and (3.15), we get

ψ(B_{k+1}) = ψ(B_k^{k−m̃+1}) − \sum_{l=k−m̃+1}^{k} \frac{‖B_l s_l‖^2}{s_l^T B_l s_l} + \sum_{l=k−m̃+1}^{k} \frac{‖y_l‖^2}{s_l^T y_l} − \ln \prod_{l=k−m̃+1}^{k} \frac{s_l^T y_l}{s_l^T B_l s_l}
           = ψ(B_k^{k−m̃+1}) − \sum_{l=k−m̃+1}^{k} \frac{[‖B_l s_l‖ ‖s_l‖]^2}{(s_l^T B_l s_l)^2} \cdot \frac{s_l^T B_l s_l}{s_l^T s_l} + \sum_{l=k−m̃+1}^{k} \frac{‖y_l‖^2}{s_l^T y_l} − \ln \prod_{l=k−m̃+1}^{k} \frac{s_l^T y_l}{s_l^T s_l} \cdot \frac{s_l^T s_l}{s_l^T B_l s_l}.

Let θ_k be the angle between s_k and B_k s_k, defined by

cos θ_k = \frac{s_k^T B_k s_k}{‖s_k‖ ‖B_k s_k‖}.   (3.16)

Let q_k denote the Rayleigh quotient defined by

q_k = \frac{s_k^T B_k s_k}{s_k^T s_k}.   (3.17)

By the definitions (3.16) and (3.17) we have

ψ(B_{k+1}) = ψ(B_k^{k−m̃+1}) + \sum_{l=k−m̃+1}^{k} \frac{‖y_l‖^2}{s_l^T y_l} − \sum_{l=k−m̃+1}^{k} \frac{q_l}{\cos^2 θ_l} − \ln \prod_{l=k−m̃+1}^{k} \frac{s_l^T y_l}{s_l^T s_l} + \ln \prod_{l=k−m̃+1}^{k} \frac{q_l}{\cos^2 θ_l} + \ln \prod_{l=k−m̃+1}^{k} \cos^2 θ_l
           = ψ(B_k^{k−m̃+1}) + \sum_{l=k−m̃+1}^{k} \left[ \frac{‖y_l‖^2}{s_l^T y_l} − 1 − \ln \frac{s_l^T y_l}{s_l^T s_l} \right] + \sum_{l=k−m̃+1}^{k} \left[ \ln \cos^2 θ_l + 1 − \frac{q_l}{\cos^2 θ_l} + \ln \frac{q_l}{\cos^2 θ_l} \right].   (3.18)

From (3.9) and (3.18), noting that B_k^{k−m̃+1} = B_0, we obtain

ψ(B_{k+1}) ≤ ψ(B_0) + (k − m̃ + 2) \left( \frac{M^2}{m} − 1 − \ln m \right) + \sum_{j=0}^{k} \left[ \ln \cos^2 θ_j + \left( 1 − \frac{q_j}{\cos^2 θ_j} + \ln \frac{q_j}{\cos^2 θ_j} \right) \right].


Based on the above results and the assumed positive definiteness of B_k, by an argument similar to Theorem 2.1 of [19], it is not difficult to get the conclusion of this lemma. □

According to Lemma 3.4, we get

β_2 ‖d_k‖ ≤ ‖B_k d_k‖ ≤ β_1 ‖d_k‖   (3.19)

and

g_k^T d_k = −d_k^T B_k d_k ≤ −β_2 ‖d_k‖^2,   −β_3 ‖d_k‖^2 ≤ −d_k^T B_k d_k = g_k^T d_k.   (3.20)

Lemma 3.5 (Lemma 3.5 in [7]). Let Assumptions A and B hold. Then Algorithm 2 will produce an iterate x_{k+1} = x_k + α_k d_k in a finite number of backtracking steps.

Proof. Suppose the line search technique (1.8) is used in Algorithm 2. From Lemma 3.8 in [5] we have that, in a finite number of backtracking steps, α_k must satisfy

‖g(x_k + α_k d_k)‖^2 − ‖g(x_k)‖^2 ≤ σ α_k g(x_k)^T ∇g(x_k) d_k.   (3.21)

By (2.2), (3.7), (3.19) and (3.20), we get

α_k g(x_k)^T ∇g(x_k) d_k ≤ −α_k (1 − ϵ) ‖g(x_k)‖^2 = −α_k (1 − ϵ) \frac{(g_k^T d_k)^2}{‖B_k d_k‖^2} ≤ α_k (1 − ϵ) \frac{β_2^2}{β_3} g_k^T d_k.   (3.22)

By α_k ≤ 1, we have

α_k g(x_k)^T ∇g(x_k) d_k ≤ α_k (1 − ϵ) \frac{β_2^2}{β_3} g_k^T d_k ≤ α_k^2 (1 − ϵ) \frac{β_2^2}{β_3} g_k^T d_k.   (3.23)

Let δ ∈ (0, min{1, σ(1 − ϵ) β_2^2/β_3}); then we obtain the line search (1.8). By a similar process, it is easy to deduce that the line search (1.9) is well defined too. The proof is complete. □

Lemma 3.5 shows that the line search techniques (1.8) and (1.9) are well defined. Based on the above lemmas, it is not difficult to establish the global convergence theorem of Algorithm 2, so we only state it as follows and omit the proof.

Theorem 3.6 (Theorem 3.1 of [7] and Theorem 3.3 of [8]). Let Assumptions A and B hold and {α_k, d_k, x_{k+1}, g_{k+1}} be generated by Algorithm 2. Then

lim_{k→∞} ‖g_k‖ = 0.   (3.24)

By Lemma 3.2, {‖g_k‖} converges. So, if (3.24) holds, then every accumulation point of {x_k} is a solution of (1.1). Since ∇g(x) is positive definite on Ω_1, (1.1) has only one solution. Moreover, since Ω is bounded, {x_k} ⊂ Ω has at least one accumulation point. Therefore {x_k} itself converges to the unique solution x∗ of (1.1).

4. Numerical experiments

In this section, we report results of numerical experiments with the L-BFGS1 algorithm and the normal BFGS algorithm. We list the test problems as g(x) = (f_1(x), f_2(x), . . . , f_n(x))^T, where Problems 1 and 2 are proposed in [7]; every function is stated together with its associated initial guess x_0.

Problem 1. The discretized two-point boundary value problem, like the problem in [1]:

g(x) := A x + \frac{1}{(n+1)^2} F(x) = 0,


where A is the n × n tridiagonal matrix given by

A = \begin{pmatrix}
8 & −1 & & & \\
−1 & 8 & −1 & & \\
& −1 & 8 & \ddots & \\
& & \ddots & \ddots & −1 \\
& & & −1 & 8
\end{pmatrix},

and F(x) = (F_1(x), F_2(x), . . . , F_n(x))^T with F_i(x) = sin x_i − 1, i = 1, 2, . . . , n.

Initial guess: x_0 = (10, 10, . . . , 10)^T.
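As an example of how the test functions translate into code, a minimal MATLAB sketch of the residual for Problem 1 (our transcription of the formulas above, with an illustrative function name) is:

function g = problem1(x)
% Residual of Problem 1: g(x) = A*x + F(x)/(n+1)^2 with F_i(x) = sin(x_i) - 1
% and A the tridiagonal matrix with 8 on the diagonal and -1 off-diagonal.
n = length(x);
A = spdiags([-ones(n,1), 8*ones(n,1), -ones(n,1)], -1:1, n, n);
g = A*x + (sin(x) - 1)/(n + 1)^2;
end

Called with x0 = 10*ones(n,1), this reproduces the starting point used in Table 1.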

Problem 2. Unconstrained optimization problem

min f(x),   x ∈ ℜ^n,

with the Engval function [20] f : ℜ^n → ℜ defined by

f(x) = \sum_{i=2}^{n} [(x_{i−1}^2 + x_i^2)^2 − 4 x_{i−1} + 3].

The related symmetric nonlinear equation is

g(x) := \frac{1}{4} ∇f(x) = 0,

where g(x) = (g_1(x), g_2(x), . . . , g_n(x))^T with

g_1(x) = x_1 (x_1^2 + x_2^2) − 1,
g_i(x) = x_i (x_{i−1}^2 + 2 x_i^2 + x_{i+1}^2) − 1,   i = 2, 3, . . . , n − 1,
g_n(x) = x_n (x_{n−1}^2 + x_n^2).

Initial guess: x_0 = (0.01, 0, 0.01, 0, . . .)^T.

Problem 3. Exponential function 1

f_1(x) = e^{x_1 − 1} − 1,
f_i(x) = i (e^{x_i − 1} − x_i),   i = 2, 3, . . . , n.

Initial guess: x_0 = (1/n^2, 1/n^2, . . . , 1/n^2)^T.

Problem 4. Trigonometric function

f_i(x) = 2 \left( n + i(1 − \cos x_i) − \sin x_i − \sum_{j=1}^{n} \cos x_j \right) (2 \sin x_i − \cos x_i),   i = 1, 2, 3, . . . , n.

Initial guess: x_0 = (101/(100n), 101/(100n), . . . , 101/(100n))^T.

Problem 5. Singular function

f_1(x) = \frac{1}{3} x_1^3 + \frac{1}{2} x_2^2,
f_i(x) = −\frac{1}{2} x_i^2 + \frac{i}{3} x_i^3 + \frac{1}{2} x_{i+1}^2,   i = 2, 3, . . . , n − 1,
f_n(x) = −\frac{1}{2} x_n^2 + \frac{n}{3} x_n^3.

Initial guess: x_0 = (1, 1, . . . , 1)^T.

Problem 6. Logarithmic function

f_i(x) = ln(x_i + 1) − \frac{x_i}{n},   i = 1, 2, 3, . . . , n.

Initial guess: x_0 = (1, 1, . . . , 1)^T.


Problem 7. Broyden tridiagonal function [21, pp. 471–472]

f_1(x) = (3 − 0.5 x_1) x_1 − 2 x_2 + 1,
f_i(x) = (3 − 0.5 x_i) x_i − x_{i−1} + 2 x_{i+1} + 1,   i = 2, 3, . . . , n − 1,
f_n(x) = (3 − 0.5 x_n) x_n − x_{n−1} + 1.

Initial guess: x_0 = (−1, −1, . . . , −1)^T.

Problem 8. Trigexp function [21, p. 473]

f_1(x) = 3 x_1^3 + 2 x_2 − 5 + \sin(x_1 − x_2) \sin(x_1 + x_2),
f_i(x) = −x_{i−1} e^{x_{i−1} − x_i} + x_i (4 + 3 x_i^2) + 2 x_{i+1} + \sin(x_i − x_{i+1}) \sin(x_i + x_{i+1}) − 8,   i = 2, 3, . . . , n − 1,
f_n(x) = −x_{n−1} e^{x_{n−1} − x_n} + 4 x_n − 3.

Initial guess: x_0 = (0, 0, . . . , 0)^T.

Problem 9. Strictly convex function 1 [22, p. 29]

g(x) is the gradient of h(x) = \sum_{i=1}^{n} (e^{x_i} − x_i):

f_i(x) = e^{x_i} − 1,   i = 1, 2, 3, . . . , n.

Initial guess: x_0 = (1/n, 2/n, . . . , 1)^T.

Problem 10. Strictly convex function 2 [22, p. 30]

g(x) is the gradient of h(x) = \sum_{i=1}^{n} \frac{i}{10} (e^{x_i} − x_i):

f_i(x) = \frac{i}{10} (e^{x_i} − 1),   i = 1, 2, 3, . . . , n.

Initial guess: x_0 = (1, 1, . . . , 1)^T.

Problem 11. Linear function, full rank

f_i(x) = x_i − \frac{2}{n} \sum_{j=1}^{n} x_j + 1.

Initial guess: x_0 = (100, 100, . . . , 100)^T.

Problem 12. Penalty function

f_i(x) = \sqrt{10^{−5}} (x_i − 1),   i = 1, 2, 3, . . . , n − 1,
f_n(x) = \frac{1}{4n} \sum_{j=1}^{n} x_j^2 − \frac{1}{4}.

Initial guess: x_0 = (1/3, 1/3, . . . , 1/3)^T.

Problem 13. Variable dimensioned function

f_i(x) = x_i − 1,   i = 1, 2, 3, . . . , n − 2,

f_{n−1}(x) = \sum_{j=1}^{n−2} j (x_j − 1),
f_n(x) = \left( \sum_{j=1}^{n−2} j (x_j − 1) \right)^2.

Initial guess: x_0 = (1 − 1/n, 1 − 2/n, . . . , 0)^T.


Problem 14. Tridiagonal system [23]

f_1(x) = 4(x_1 − x_2^2),
f_i(x) = 8 x_i (x_i^2 − x_{i−1}) − 2(1 − x_i) + 4(x_i − x_{i+1}^2),   i = 2, 3, . . . , n − 1,
f_n(x) = 8 x_n (x_n^2 − x_{n−1}) − 2(1 − x_n).

Initial guess: x_0 = (12, 12, . . . , 12)^T.

Problem 15. Five-diagonal system [23]

f_1(x) = 4(x_1 − x_2^2) + x_2 − x_3^2,
f_2(x) = 8 x_2 (x_2^2 − x_1) − 2(1 − x_2) + 4(x_2 − x_3^2) + x_3 − x_4^2,
f_i(x) = 8 x_i (x_i^2 − x_{i−1}) − 2(1 − x_i) + 4(x_i − x_{i+1}^2) + x_{i−1}^2 − x_{i−2} + x_{i+1} − x_{i+2}^2,   i = 3, 4, . . . , n − 2,
f_{n−1}(x) = 8 x_{n−1} (x_{n−1}^2 − x_{n−2}) − 2(1 − x_{n−1}) + 4(x_{n−1} − x_n^2) + x_{n−2}^2 − x_{n−3},
f_n(x) = 8 x_n (x_n^2 − x_{n−1}) − 2(1 − x_n) + x_{n−1}^2 − x_{n−2}.

Initial guess: x_0 = (−2, −2, . . . , −2)^T.

Problem 16. Seven-diagonal system [23]

f_1(x) = 4(x_1 − x_2^2) + x_2 − x_3^2 + x_3 − x_4^2,
f_2(x) = 8 x_2 (x_2^2 − x_1) − 2(1 − x_2) + 4(x_2 − x_3^2) + x_1^2 + x_3 − x_4^2 + x_4 − x_5^2,
f_3(x) = 8 x_3 (x_3^2 − x_2) − 2(1 − x_3) + 4(x_3 − x_4^2) + x_2^2 − x_1 + x_4 − x_5^2 + x_1^2 + x_5 − x_6^2,
f_i(x) = 8 x_i (x_i^2 − x_{i−1}) − 2(1 − x_i) + 4(x_i − x_{i+1}^2) + x_{i−1}^2 − x_{i−2} + x_{i+1} − x_{i+2}^2 + x_{i−2}^2 + x_{i+2} − x_{i−3} − x_{i+3}^2,   i = 4, 5, . . . , n − 3,
f_{n−2}(x) = 8 x_{n−2} (x_{n−2}^2 − x_{n−3}) − 2(1 − x_{n−2}) + 4(x_{n−2} − x_{n−1}^2) + x_{n−3}^2 − x_{n−4} + x_{n−1} − x_n^2 + x_{n−4}^2 + x_n − x_{n−5},
f_{n−1}(x) = 8 x_{n−1} (x_{n−1}^2 − x_{n−2}) − 2(1 − x_{n−1}) + 4(x_{n−1} − x_n^2) + x_{n−2}^2 − x_{n−3} + x_n + x_{n−3}^2 − x_{n−4},
f_n(x) = 8 x_n (x_n^2 − x_{n−1}) − 2(1 − x_n) + x_{n−1}^2 − x_{n−2} + x_{n−2} − x_{n−3}.

Initial guess: x_0 = (−3, −3, . . . , −3)^T.

Problem 17. Extended Freudenstein and Roth function (n is even) [24]. For i = 1, 2, . . . , n/2:

f_{2i−1}(x) = x_{2i−1} + ((5 − x_{2i}) x_{2i} − 2) x_{2i} − 13,
f_{2i}(x) = x_{2i−1} + ((1 + x_{2i}) x_{2i} − 14) x_{2i} − 29.

Initial guess: x_0 = (6, 3, 6, 3, . . . , 6, 3)^T.

Problem 18. Discrete boundary value problem [25]

f_1(x) = 2 x_1 + 0.5 h^2 (x_1 + h)^3 − x_2,
f_i(x) = 2 x_i + 0.5 h^2 (x_i + h i)^3 − x_{i−1} + x_{i+1},   i = 2, 3, . . . , n − 1,
f_n(x) = 2 x_n + 0.5 h^2 (x_n + h n)^3 − x_{n−1},
h = 1/(n + 1).

Initial guess: x_0 = (h(h − 1), h(2h − 1), . . . , h(nh − 1))^T.

Problem 19. Troesch problem [26]

f_1(x) = 2 x_1 + ϱ h^2 \sinh(ϱ x_1) − x_2,
f_i(x) = 2 x_i + ϱ h^2 \sinh(ϱ x_i) − x_{i−1} − x_{i+1},   i = 2, 3, . . . , n − 1,
f_n(x) = 2 x_n + ϱ h^2 \sinh(ϱ x_n) − x_{n−1},
h = 1/(n + 1),   ϱ = 10.

Initial guess: x_0 = (0, 0, . . . , 0)^T.


Table 1
Numerical results of the test problems. Each entry reports NI/NG/GF.

Functions | Dim | Normal BFGS with (1.9) | L-BFGS with (1.9) | Normal BFGS with (1.8) | L-BFGS with (1.8)
1  | 20  | 23/38/9.045186e−007 | 11/13/8.881740e−007 | 23/41/5.327578e−007 | 12/15/3.007441e−008
1  | 300 | 104/197/9.038979e−007 | 12/14/2.386361e−007 | 97/194/8.917331e−007 | 12/14/2.386361e−007
1  | 700 | 104/193/9.224300e−007 | 12/14/2.500912e−007 | 96/190/8.112812e−007 | 12/14/2.500912e−007
2  | 20  | 33/229/8.733449e−008 | 15/33/8.822533e−007 | 33/243/8.733449e−008 | 15/33/8.822533e−007
2  | 300 | 45/318/4.331731e−007 | 17/33/2.137398e−007 | 45/318/4.331731e−007 | 17/33/2.137398e−007
2  | 700 | 45/319/8.771359e−007 | 16/34/9.807169e−007 | 45/319/8.771359e−007 | 16/34/9.807169e−007
3  | 20  | Fails | 78/394/8.294420e−007 | Fails | 78/436/8.294420e−007
3  | 100 | Fails | 56/351/3.152238e−007 | Fails | 56/372/3.152238e−007
3  | 400 | Fails | 59/305/9.355782e−007 | Fails | 59/333/9.355782e−007
4  | 20  | 18/33/3.702085e−007 | 31/46/8.607576e−007 | 18/68/3.702085e−007 | 31/67/8.607576e−007
4  | 100 | 506/1606/9.806729e−007 | 20/28/1.897456e−007 | 506/1403/9.806729e−007 | 20/42/1.897456e−007
4  | 400 | 9999/70704/5.378534e−005 | 19/27/1.916593e−007 | 9999/68562/5.378534e−005 | 19/41/1.916593e−007
5  | 20  | 1238/9688/9.931001e−007 | 263/1272/9.349690e−007 | 1238/7098/9.931001e−007 | 263/1300/9.349690e−007
5  | 100 | Fails | 767/5087/9.612016e−007 | Fails | 767/5129/9.612016e−007
5  | 400 | Fails | 1381/8795/9.342003e−007 | Fails | 1381/8851/9.342003e−007
6  | 20  | 7/8/5.091924e−010 | 7/8/5.091922e−010 | 7/8/5.091924e−010 | 7/8/5.091922e−010
6  | 100 | 7/8/1.533339e−010 | 7/8/1.533339e−010 | 7/8/1.533339e−010 | 7/8/1.533339e−010
6  | 400 | 7/8/2.070211e−010 | 7/8/2.070210e−010 | 7/8/2.070211e−010 | 7/8/2.070210e−010
7  | 20  | 33/34/4.893097e−007 | 43/44/3.413141e−007 | 33/34/4.893097e−007 | 43/44/3.413141e−007
7  | 100 | 86/87/9.430458e−007 | 129/130/9.665198e−007 | 86/87/9.430458e−007 | 129/130/9.665198e−007
7  | 400 | 90/91/9.143368e−007 | 133/134/9.674219e−007 | 90/91/9.143368e−007 | 133/134/9.674219e−007
8  | 20  | 34/35/2.587953e−007 | 109/110/9.943799e−007 | 34/35/2.587953e−007 | 109/110/9.943799e−007
8  | 100 | 64/65/7.960480e−007 | 17/18/6.523010e−007 | 64/65/7.960480e−007 | 17/18/6.523010e−007
8  | 400 | 69/70/6.264744e−007 | 17/18/5.117020e−007 | 69/70/6.264744e−007 | 17/18/5.117020e−007
9  | 20  | 7/8/1.721378e−007 | 7/8/2.559413e−007 | 7/8/1.721378e−007 | 7/8/2.559413e−007
9  | 100 | 7/8/2.733460e−007 | 7/8/4.284203e−007 | 7/8/2.733460e−007 | 7/8/4.284203e−007
9  | 400 | 7/8/5.086604e−007 | 7/8/8.023870e−007 | 7/8/5.086604e−007 | 7/8/8.023870e−007
10 | 20  | 30/52/6.862573e−007 | 41/168/8.682101e−007 | 30/31/6.862573e−007 | 41/42/8.682101e−007
10 | 100 | 100/318/9.310603e−007 | 127/688/9.017690e−007 | 100/136/9.310603e−007 | 127/247/9.017690e−007
10 | 400 | 124/188/9.829743e−007 | 205/584/5.625530e−007 | 124/181/9.829743e−007 | 205/500/5.625530e−007
11 | 20  | Fails | 2/3/3.724462e−012 | Fails | 2/10/3.724462e−012
11 | 100 | Fails | 2/3/3.701625e−011 | Fails | 2/10/3.701625e−011
11 | 400 | Fails | 2/3/2.402670e−010 | Fails | 2/10/2.402670e−010
12 | 20  | Fails | 179/1216/8.423661e−008 | Fails | 179/740/8.423661e−008
12 | 100 | Fails | 564/3911/3.372063e−007 | Fails | 564/2756/3.372063e−007
12 | 400 | Fails | Fails | Fails | Fails
13 | 20  | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000
13 | 100 | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000
13 | 400 | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000 | 1/2/0.000000e+000
14 | 20  | Fails | 341/979/9.440352e−007 | Fails | 341/958/9.440352e−007
14 | 100 | Fails | 234/732/4.230040e−007 | Fails | 234/718/4.230040e−007
14 | 400 | Fails | 287/750/9.409452e−007 | Fails | 287/736/9.409452e−007
15 | 20  | Fails | 120/338/5.558580e−007 | Fails | 120/310/5.558580e−007
15 | 100 | Fails | 179/299/9.257026e−007 | Fails | 179/299/9.257026e−007
15 | 400 | Fails | 79/227/8.518929e−007 | Fails | 79/199/8.518929e−007
16 | 20  | Fails | 104/189/4.289544e−007 | Fails | 104/189/4.289544e−007
16 | 100 | Fails | Fails | Fails | Fails
16 | 400 | Fails | 206/522/9.115842e−007 | Fails | 206/522/9.115842e−007
17 | 20  | Fails | 27/91/5.034704e−007 | Fails | 27/91/5.034704e−007
17 | 100 | Fails | 32/82/9.188707e−007 | Fails | 32/82/9.188707e−007
17 | 400 | Fails | 29/79/5.495169e−007 | Fails | 29/79/5.495169e−007
18 | 20  | 51/52/3.913197e−007 | 24/25/9.819359e−007 | 51/52/3.913197e−007 | 24/25/9.819359e−007
18 | 100 | 69/70/9.375596e−007 | 24/25/4.275626e−007 | 69/70/9.375596e−007 | 24/25/4.275626e−007
18 | 400 | 58/59/9.983970e−007 | 23/24/1.721439e−007 | 58/59/9.983970e−007 | 23/24/1.721439e−007
19 | 20  | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000
19 | 100 | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000
19 | 400 | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000 | 0/1/0.000000e+000

In the experiments, the parameters in Algorithm 1 and the normal BFGS method were chosen as r = 0.1, ρ = 0.5, δ = 0.9, σ = 0.95, m_1 = 6, δ_1 = δ_2 = 0.01, ϵ_k = (1/NI)^2, and B_0 the unit matrix. The program was coded in MATLAB 7.0.1. We stopped the iteration when the condition ‖g(x)‖ ≤ 10^{−6} was satisfied. Since the line search cannot always ensure the descent condition d_k^T g_k < 0, an uphill search direction may occur in the numerical experiments, in which case the line search rule may fail. In order to avoid this, the stepsize α_k is accepted whenever the number of line search trials exceeds eight in the inner loop. The columns of Table 1 have the following meaning:

x_0: the starting point.
GF: the final value of the norm function ‖g(x)‖.
NI: the total number of iterations.
NG: the total number of function evaluations.


Dim: the dimension of the problem.
Fails: the condition ‖g(x)‖ ≤ 10^{−6} was not satisfied.

From the numerical results in Table 1, it is not difficult to see that the L-BFGS method is more successful than the normal BFGS method. Many problems cannot be solved by the normal BFGS method at all, and on most of the test problems the numbers of iterations and function evaluations of Algorithm 1 are smaller than those of the normal BFGS method. We can also see that the numerical results of the line search techniques (1.8) and (1.9) are similar for many problems. We therefore conclude that the L-BFGS method is more competitive than the normal BFGS method.

5. Conclusion

We combine the limited memory BFGS method and the inexact backtracking line search techniques for solving symmetric nonlinear equations. The major contribution of this paper is an extension of the line search techniques (1.8) and (1.9) to the limited memory scheme. The given algorithm possesses global convergence, and the numerical results show that it is successful on the test problems. We hope the method can be a starting point for further work on symmetric nonlinear equations. For further research, the convergence of the limited memory BFGS method under other line search rules should be studied. Moreover, more numerical experiments on large practical problems should be done in the future.

Acknowledgements

We would like to thank the anonymous referees for catching several typos in the paper, and for their useful suggestions and comments which improved the paper greatly. The first author's work is supported by the Program for Excellent Talents in Guangxi Higher Education Institutions, China NSF grant 10761001, Guangxi Education research project grant 201012MS013, and Guangxi SF grant 0991028.

References

[1] J.M. Ortega, W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
[2] J.Y. Fan, A modified Levenberg–Marquardt algorithm for singular system of nonlinear equations, Journal of Computational Mathematics 21 (2003) 625–636.
[3] Y. Yuan, Trust region algorithm for nonlinear equations, Information 1 (1998) 7–21.
[4] J. Zhang, Y. Wang, A new trust region method for nonlinear equations, Mathematical Methods of Operations Research 58 (2003) 283–298.
[5] P.N. Brown, Y. Saad, Convergence theory of nonlinear Newton–Krylov algorithms, SIAM Journal on Optimization 4 (1994) 297–330.
[6] D. Zhu, Nonmonotone backtracking inexact quasi-Newton algorithms for solving smooth nonlinear equations, Applied Mathematics and Computation 161 (2005) 875–895.
[7] G. Yuan, X. Lu, A new backtracking inexact BFGS method for symmetric nonlinear equations, Computers and Mathematics with Applications 55 (2008) 116–129.
[8] D. Li, M. Fukushima, A globally and superlinearly convergent Gauss–Newton-based BFGS method for symmetric nonlinear equations, SIAM Journal on Numerical Analysis 37 (1999) 152–172.
[9] G. Yuan, X. Li, A rank-one fitting method for solving symmetric nonlinear equations, Journal of Applied Functional Analysis 5 (2010) 389–407.
[10] D. Li, L. Qi, S. Zhou, Descent directions of quasi-Newton methods for symmetric nonlinear equations, SIAM Journal on Numerical Analysis 40 (2002) 1763–1774.
[11] G. Yuan, A new method with descent property for symmetric nonlinear equations, Numerical Functional Analysis and Optimization 31 (2010) 974–987.
[12] G. Yuan, X. Lu, Z. Wei, BFGS trust-region method for symmetric nonlinear equations, Journal of Computational and Applied Mathematics 230 (2009) 44–58.
[13] S.G. Nash, A survey of truncated-Newton methods, Journal of Computational and Applied Mathematics 124 (2000) 45–59.
[14] A. Griewank, The 'global' convergence of Broyden-like methods with a suitable line search, Journal of the Australian Mathematical Society, Series B 28 (1986) 75–92.
[15] R.H. Byrd, J. Nocedal, R.B. Schnabel, Representations of quasi-Newton matrices and their use in limited memory methods, Mathematical Programming 63 (1994) 129–156.
[16] Y. Xiao, D. Li, An active set limited memory BFGS algorithm for large-scale bound constrained optimization, Mathematical Methods of Operations Research 67 (2008) 443–454.
[17] Y. Xiao, Z. Wei, A new subspace limited memory BFGS algorithm for large-scale bound constrained optimization, Applied Mathematics and Computation 185 (2007) 350–359.
[18] R. Fletcher, Practical Methods of Optimization, 2nd ed., John Wiley and Sons, Chichester, 1987.
[19] R. Byrd, J. Nocedal, A tool for the analysis of quasi-Newton methods with application to unconstrained minimization, SIAM Journal on Numerical Analysis 26 (1989) 727–739.
[20] E. Yamakawa, M. Fukushima, Testing parallel variable transformation, Computational Optimization and Applications 13 (1999) 253–274.
[21] M. Gomez-Ruggiero, J.M. Martinez, A. Moretti, Comparing algorithms for solving sparse nonlinear systems of equations, SIAM Journal on Scientific and Statistical Computing 23 (1992) 459–483.
[22] M. Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem, SIAM Journal on Optimization 7 (1997) 26–33.
[23] G. Li, Successive column correction algorithms for solving sparse nonlinear systems of equations, Mathematical Programming 43 (1989) 187–207.
[24] Y. Bing, G. Lin, An efficient implementation of Merrill's method for sparse or partially separable systems of nonlinear equations, SIAM Journal on Optimization 2 (1991) 206–221.
[25] J.J. Moré, B.S. Garbow, K.E. Hillström, Testing unconstrained optimization software, ACM Transactions on Mathematical Software 7 (1981) 17–41.
[26] S.M. Roberts, J.J. Shipman, On the closed form solution of Troesch's problem, Journal of Computational Physics 21 (1976) 291–304.