A Posteriori Error Estimates of Krylov Subspace Approximations to Matrix Functions∗

Zhongxiao Jia†   Hui Lv‡

arXiv:1307.7219v1 [math.NA] 27 Jul 2013

∗Supported in part by National Basic Research Program of China 2011CB302400 and National Science Foundation of China (No. 11071140).
†Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, People's Republic of China, [email protected]
‡Department of Mathematical Sciences, Tsinghua University, Beijing, 100084, People's Republic of China, [email protected]

Abstract. Krylov subspace methods for approximating a matrix function $f(A)$ times a vector $v$ are analyzed in this paper. For the Arnoldi approximation to $e^{-\tau A}v$, two reliable a posteriori error estimates are derived from the new bounds and the generalized error expansion we establish. One of them is similar to the residual norm of an approximate solution of a linear system, and the other is determined critically by the first term of the error expansion of the Arnoldi approximation to $e^{-\tau A}v$ due to Saad. We prove that each of the two estimates measures the true error norm reliably, and the second one theoretically justifies an empirical claim by Saad. In this paper, by introducing certain general functions defined recursively by a general function $f(z)$ for arbitrary nodes, we obtain the error expansion of the Arnoldi approximation for a wide range of matrix functions $f(A)$, which generalizes Saad's result on the Arnoldi approximation to $e^{-\tau A}v$. Similarly, it is shown that the first term of the generalized error expansion can be used as a reliable a posteriori estimate for the Arnoldi approximation to some other matrix functions times $v$. Numerical examples are reported to demonstrate the effectiveness of the a posteriori error estimates for the Arnoldi approximations to $e^{-\tau A}v$, $\cos(A)v$ and $\sin(A)v$.

Keywords. Krylov subspace method, Arnoldi approximation, matrix functions, a posteriori error estimates, error bounds, error expansion

AMS Subject Classifications (2010). 15A16, 65F15, 65F60

1 Introduction

For a given matrix $A \in \mathbb{C}^{N\times N}$ and a complex-valued function $f(z)$, the problem of numerically approximating $f(A)v$ for a given vector $v \in \mathbb{C}^N$ arises in many applications; see, e.g., [16]. Solving the linear system $Ax = b$, which involves the reciprocal function $f(z) = z^{-1}$, is a special instance and of great importance. Approximating the matrix exponential $e^{A}v$ is the core of many exponential integrators for solving systems of linear ordinary differential equations or time-dependent partial differential equations. Other applications include the evaluation of $f(A)v$ for the square root function, trigonometric functions, the sign function, the logarithm function, etc. [10, 15, 16, 36]. In many applications the matrix $A$ is large and sparse, so that computing $f(A)$ by conventional algorithms designed for small or medium sized $A$ is infeasible. In this case, Krylov subspace methods, in which the only action of $A$ is to form matrix-vector products, have been effective tools for computing $f(A)v$. The Krylov subspace methods for approximating $f(A)v$ first project $f(A)$ onto a low dimensional Krylov subspace $\mathcal{K}_m(A,v) = \mathrm{span}\{v, Av, \ldots, A^{m-1}v\}$, and then compute an approximation to $f(A)v$ by calculating a small sized matrix function times a specific vector [15, 16, 28]. Since computing an approximation to $f(A)v$ needs all the basis vectors of $\mathcal{K}_m(A,v)$, this generally limits the dimension of $\mathcal{K}_m(A,v)$ and makes restarting necessary. The restarted Krylov subspace methods presented in [2, 11, 20, 30, 32, 35] aim at overcoming this disadvantage. However, restarting may slow down the convergence substantially, and may even fail to converge [1, 3, 11, 35]. A deflated restarting technique, adapted from [24, 25] for eigenvalue problems and linear systems, is proposed in [12] to accelerate the convergence.

A priori error estimates for Krylov subspace approximations to $f(A)v$ can be found in a number of papers, e.g., [14, 22, 23, 26]. As is known, the convergence of the approximations depends on the regularity of the function $f(z)$ over the given domain, the spectral properties of $A$, and the choice of the interpolation nodes [23]. In particular, several a priori error estimates are available for Krylov subspace methods that approximate $e^{-\tau A}v$; see, e.g., [13, 15, 28]. As it has turned out, the methods may exhibit superlinear convergence [17]. A few quantitatively similar bounds have been derived in [9, 34]. The results in [31] also reveal the superlinear convergence of the approximations, but they are less sharp than those in [9, 17, 34], at least experimentally [21]. The bounds in [37, 38] relate the convergence to the condition number of $A$ when it is symmetric positive definite, and can be quite accurate. Obviously, these results help understand the methods, but they are not applicable in practical computations, as they involve a priori knowledge of the spectrum or the field of values.

For many familiar linear algebra problems, such as the linear system, the eigenvalue problem, the least squares problem, and the singular value decomposition, each problem has a clear residual notion that can be used for determining the convergence and designing stopping criteria in iterative methods. However, the Krylov subspace approximation to $f(A)v$ is not naturally equipped with a standard residual notion. In fact, the lack of a residual notion is a central problem in computing matrix functions acting on a vector. Effective a posteriori error estimates are, therefore, very appealing and crucial for designing practical algorithms. Effective terminations of iterative algorithms have attracted much attention over the last two decades. In [18, 23], the authors have introduced a generalized residual of the Arnoldi approximation to $f(A)v$. In [5], the authors have provided more details for the case $f(z) = e^{-\tau z}$. For $A$ symmetric, a Krylov subspace method based on a Chebyshev series expansion of $e^{-\tau z}$ on the spectral interval of $A$ has been proposed in [4]. The method uses very low storage, but requires an accurate approximation of the spectral interval of $A$; an a posteriori error estimate for this method has also been derived in [4]. In [7], a posteriori error estimates have been established for several polynomial Krylov approximations to a class of matrix functions, formulated in a general integral form, acting on a vector. These functions include exponential-like functions, the cosine, and the sinc function arising in the solution of second-order differential problems. These estimates cannot be unified; each of them is function dependent and quite complicated.
In [28], Saad empirically claimed that the first term of the error expansion can be a reliable a posteriori error estimate for the Arnoldi approximation to $e^{A}v$. However, no theoretical justification has been given hitherto. In this paper, by introducing certain general functions defined recursively by a general function $f(z)$ for arbitrary nodes in the field of values of $A$, we obtain an error expansion of the Arnoldi approximation for a wide range of matrix functions $f(A)$. The error expansion is an infinite series, which generalizes Saad's result on the Arnoldi approximation to $e^{-\tau A}v$. With a specific choice of the nodes, the infinite series reduces to a finite sum. We establish two new upper bounds for the Arnoldi approximation to $e^{-\tau A}v$, where $\tau$ is often the time step parameter in a finite difference time-stepping method. In the case that $A$ is Hermitian, we derive more compact results that are more convenient to use and more accurate in measuring the true error. For the matrix exponential, by a rigorous analysis of the infinite series expansion of the error, we derive a reliable a posteriori error estimate, theoretically proving that the first term in the infinite series expansion suffices to provide a good estimate for the true error and confirming the empirical claim due to Saad [28]. We also show why the first term of the error expansion provides a new reliable a posteriori estimate for the Arnoldi approximation to a wide range of general functions other than the exponential function only. As a consequence, the Arnoldi approximation to $e^{-\tau A}v$ and to more functions acting on a vector is equipped with new theoretically solid stopping criteria. Some typical numerical examples are reported to illustrate the effectiveness of the a posteriori error estimates for the Arnoldi approximations to not only $e^{-\tau A}v$ but also $\sin(A)v$ and $\cos(A)v$, confirming the sharpness of our a posteriori error estimates.

The paper is organized as follows. In Section 2, we review some definitions and properties of matrix functions, and describe the framework and some basic properties of Krylov subspace approximations to $f(A)v$ based on the Krylov like decomposition. In Section 3, we establish the error expansion of the Arnoldi approximation to general $f(A)v$. In Section 4, we present some upper bounds for the Arnoldi approximation to $e^{-\tau A}v$ and derive two more compact bounds in the case that $A$ is Hermitian. In Section 5, we derive some a posteriori error estimates from the bounds and expansions established in the previous sections, and justify the rationale of using the first term of each error expansion as an error estimate. Numerical results are then reported to illustrate the sharpness of our a posteriori error estimates for the matrix exponential and the sine and cosine functions. Finally, in Section 6 we conclude the paper with some further remarks and point out future work.

Throughout the paper let $A$ be a given matrix of order $N$, denote by $\|\cdot\|$ the Euclidean vector norm and the corresponding induced matrix norm, by the asterisk $^*$ the conjugate transpose of a matrix or vector, and by the superscript $^T$ the transpose of a matrix or vector. The set $F(A) \equiv \{x^*Ax : x \in \mathbb{C}^N, x^*x = 1\}$ denotes the field of values of $A$, and $\mathrm{spec}(A)$ is the spectrum of $A$.

2 Krylov subspace approximations to $f(A)v$

In this section, we present some definitions and properties of matrix functions to be used later and review a class of popular Krylov subspace approximations to $f(A)v$.

2.1 Polynomial interpolatory properties of $f(A)$

The matrix function $f(A)$ can be equivalently defined via the Jordan canonical form, the Hermite interpolation or the Cauchy integral theorem; see [19, pp. 383-436] for details. We review the definition via the Hermite interpolation and some properties associated with it.

Definition 2.1. Let $A$ have the minimal polynomial $q_A(z) = (z-\lambda_1)^{r_1}\cdots(z-\lambda_\mu)^{r_\mu}$, where $\lambda_1,\ldots,\lambda_\mu$ are distinct and all $r_i \geq 1$, let $f(z)$ be a given scalar-valued function whose domain includes the points $\lambda_1,\ldots,\lambda_\mu$, and assume that each $\lambda_i$ is in the interior of the domain and $f(z)$ is $(r_i-1)$ times differentiable at $\lambda_i$. Then $f(A) \equiv p(A)$, where $p(z)$ is the unique polynomial of degree $\sum_{i=1}^{\mu} r_i - 1$ that satisfies the interpolation conditions
$$p^{(j)}(\lambda_i) = f^{(j)}(\lambda_i), \qquad j = 0,1,\ldots,r_i-1, \quad i = 1,\ldots,\mu.$$

Proposition 2.1. The polynomial $p(z)$ interpolating $f(z)$ and its derivatives at the roots of $q_A(z) = 0$ can be given explicitly by the Hermite interpolating polynomial. Its Newtonian divided difference form is
$$p(z) = f[x_1] + f[x_1,x_2](z-x_1) + f[x_1,x_2,x_3](z-x_1)(z-x_2) + \cdots + f[x_1,x_2,\ldots,x_m](z-x_1)(z-x_2)\cdots(z-x_{m-1}),$$
where $m = \deg q_A(z)$, the set $\{x_i\}_{i=1}^m$ comprises the distinct eigenvalues $\lambda_1,\ldots,\lambda_\mu$ with $\lambda_i$ having multiplicity $r_i$, and $f[x_1,x_2,\ldots,x_k]$ is the divided difference of order $k-1$ at $x_1,x_2,\ldots,x_k$ with $f[x_1] = f(x_1)$. In Section 3, we will give the definition of divided differences and present some properties to be used.
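For illustration, the Newton form above can be built from a standard divided difference table and evaluated by a Horner-like scheme. The following minimal Python sketch is our own illustration, not part of the algorithms analyzed in this paper; it assumes pairwise distinct nodes, whereas the confluent (Hermite) case would additionally require derivative values.

```python
import numpy as np

def divided_differences(x, fx):
    """Newton coefficients f[x_1], f[x_1,x_2], ..., f[x_1,...,x_m],
    assuming the nodes x are pairwise distinct."""
    c = np.array(fx, dtype=float)
    for j in range(1, len(x)):
        # after sweep j, c[i] holds the divided difference of order j
        # on the nodes x[i-j], ..., x[i] for i >= j
        c[j:] = (c[j:] - c[j - 1:-1]) / (x[j:] - x[:-j])
    return c

def newton_eval(x, c, z):
    """Evaluate p(z) = c[0] + c[1](z - x[0]) + ... by a Horner-like scheme."""
    p = c[-1]
    for k in range(len(c) - 2, -1, -1):
        p = p * (z - x[k]) + c[k]
    return p

x = np.linspace(0.0, 1.5, 4)
c = divided_differences(x, np.exp(x))      # interpolate f(z) = e^z
print(newton_eval(x, c, 0.75), np.exp(0.75))
```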

2.2 The Krylov like decomposition and computation of $f(A)v$

The Arnoldi approximation to $f(A)v$ is based on the Arnoldi decomposition
$$AV_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T, \tag{2.1}$$
where the columns of $V_m = [v_1, v_2, \ldots, v_m]$ form an orthonormal basis of the Krylov subspace $\mathcal{K}_m(A,v) = \mathrm{span}\{v, Av, \ldots, A^{m-1}v\}$ with $v_1 = v/\|v\|$, $H_m = [h_{i,j}]$ is an unreduced upper Hessenberg matrix, and $e_m \in \mathbb{R}^m$ denotes the $m$th unit coordinate vector. The Arnoldi approximation to $f(A)v$ is given by
$$f_m = \|v\|\, V_m f(H_m) e_1. \tag{2.2}$$
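For illustration, a minimal Python sketch of the Arnoldi process (2.1) and the approximation (2.2) follows. It is our own illustration under simplifying assumptions: the matrix is real, no breakdown occurs, and $f(z) = e^{-z}$ is evaluated via scipy.linalg.expm.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(A, v, m):
    """m steps of Arnoldi: A V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T,
    via modified Gram-Schmidt; assumes a real A and no breakdown."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)) / 20.0
v = rng.standard_normal(200)
m = 30
V, H = arnoldi(A, v, m)
beta = np.linalg.norm(v)
f_m = beta * V[:, :m] @ expm(-H[:m, :m])[:, 0]   # approximation (2.2)
print(np.linalg.norm(expm(-A) @ v - f_m))        # small for this mild A
```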

For $H_m$, there is the following well-known property [27].

Proposition 2.2. Each eigenvalue of $H_m$ has geometric multiplicity equal to one, and the minimal polynomial of $H_m$ is its characteristic polynomial.

From this proposition and Definition 2.1, it is direct to get the following result.

Proposition 2.3. Assume that the spectrum of $H_m$ is contained in the domain of $f(z)$. Then $f(H_m) = p(H_m)$, where $p(z)$ is the polynomial interpolating $f(z)$, in the Hermite sense, at the eigenvalues of $H_m$, repeated according to their multiplicities.

In [12], a more general decomposition, called the Krylov like decomposition, is introduced, and the associated Krylov like approximation to $f(A)v$ is given. The Krylov like decomposition of $A$ with respect to $\mathcal{K}_m(A,v)$ is of the form
$$AW_{m+l} = W_{m+l} K_{m+l} + w k_{m+l}^T, \tag{2.3}$$
where $K_{m+l} \in \mathbb{C}^{(m+l)\times(m+l)}$, $W_{m+l} \in \mathbb{C}^{N\times(m+l)}$ with $\mathrm{range}(W_{m+l}) = \mathcal{K}_m(A,v)$, $w \in \mathcal{K}_{m+1}(A,v) \setminus \mathcal{K}_m(A,v)$ and $k_{m+l} \in \mathbb{C}^{m+l}$. Let $f(z)$ be a function such that $f(K_{m+l})$ is defined. Then the Krylov like approximation to $f(A)v$ associated with (2.3) is given by
$$\hat f_m = W_{m+l} f(K_{m+l}) \hat b, \tag{2.4}$$
where $\hat b \in \mathbb{C}^{m+l}$ is any vector such that $W_{m+l}\hat b = v$. Since the vector $w$ lies in $\mathcal{K}_{m+1}(A,v) \setminus \mathcal{K}_m(A,v)$, it can be expressed as $w = p_m(A)v$ with a unique polynomial $p_m(z)$ of exact degree $m$. The following result is proved in [12] for $\hat f_m$.

Theorem 2.1. For the polynomial $p_m(z)$ defined by $w = p_m(A)v$, the Krylov like approximation (2.4) to $f(A)v$ can be characterized as $\hat f_m = q_{m-1}(A)v$, where $q_{m-1}(z)$ interpolates the function $f(z)$ in the Hermite sense at the zeros of $p_m(z)$, i.e., at some but, in general, not all eigenvalues of $K_{m+l}$.

For more properties of the Krylov like approximation to $f(A)v$, one can refer to [12]. Obviously, the Arnoldi decomposition (2.1) is the special instance of the Krylov like decomposition (2.3) with $l = 0$. If $A$ is Hermitian, the Arnoldi decomposition and approximation reduce to the Lanczos decomposition and approximation. Define the error
$$E_m \equiv f(A)v - W_{m+l} f(K_{m+l}) \hat b. \tag{2.5}$$
Unlike many familiar linear algebra problems, such as the linear system, the eigenvalue problem, the least squares problem and the singular value decomposition, which have exact a posteriori residual norms that are used for stopping criteria in iterative methods, the Krylov subspace approximation to $f(A)v$ is not naturally equipped with a stopping criterion, since the error norm $\|E_m\|$ is an a priori quantity that cannot be computed explicitly. Therefore, it is crucial to establish reliable and accurate a posteriori error estimates for practical computations. As it has turned out in the literature, this task is nontrivial. We devote ourselves to this task in what follows.

3 The expansion of the error $E_m$

In this section, we analyze the error $E_m$ produced by the Krylov like approximation (2.4). Inspired by an error expansion of the Arnoldi approximation to $e^{A}v$ due to Saad [28], we establish a more general form of the error expansion for a wide range of functions. Our approach is different from Saad's, which is confined to seeking an expansion formula in terms of $A^k v_{m+1}$, $k = 0, 1, \ldots, \infty$; we find a more general, insightful and informative formula that works for as many functions as possible.

Note that the field of values $F(A)$ is a closed set. Assume that $f(z)$ is analytic in $F(A)$. Let $z_0, z_1, \ldots, z_k, \ldots \in F(A)$ be ordered so that equal points are contiguous, i.e.,
$$z_i = z_j \ (i < j) \;\Rightarrow\; z_i = z_{i+1} = \cdots = z_j, \tag{3.1}$$
and define the function sequence $\{\phi_k\}_{k=0}^{\infty}$ by the recurrence
$$\phi_0(z) = f(z), \qquad \phi_{k+1}(z) = \frac{\phi_k(z) - \phi_k(z_k)}{z - z_k}, \quad k \geq 0, \tag{3.2}$$
where $z_k \in F(A)$. Noting that $\phi_k(z_k)$ is well defined by continuity, it is clear that these functions are analytic for all $k$.
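For example, for $f(z) = e^z$ and the simplest node choice $z_0 = z_1 = \cdots = 0$, the recurrence (3.2) produces the familiar $\varphi$-functions of exponential integrators:
$$\phi_1(z) = \frac{e^z - 1}{z}, \qquad \phi_2(z) = \frac{\phi_1(z) - \phi_1(0)}{z} = \frac{e^z - 1 - z}{z^2},$$
with $\phi_1(0) = 1$ and $\phi_2(0) = 1/2$ defined by continuity.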

Divided differences of the function $f(z)$ at the points $z_k$ are defined recursively by $f[z_k] = f(z_k)$,
$$f[z_k, z_{k+1}] = \begin{cases} \dfrac{f(z_{k+1}) - f(z_k)}{z_{k+1} - z_k}, & z_k \neq z_{k+1},\\[1mm] f'(z_k), & z_k = z_{k+1}, \end{cases} \qquad f[z_0, z_1, \ldots, z_{k+1}] = \begin{cases} \dfrac{f[z_1, \ldots, z_{k+1}] - f[z_0, \ldots, z_k]}{z_{k+1} - z_0}, & z_0 \neq z_{k+1},\\[1mm] \dfrac{f^{(k+1)}(z_{k+1})}{(k+1)!}, & z_0 = z_{k+1}. \end{cases}$$
If $f(z)$ is $k$ times continuously differentiable, then $f[z_0, z_1, \ldots, z_k]$ is a continuous function of its arguments. Moreover, for real points $z_i \in F(A)$, we have
$$f[z_0, z_1, \ldots, z_k] = \frac{f^{(k)}(\zeta)}{k!} \quad \text{for some } \zeta \in F(A). \tag{3.3}$$

However, no result of the form (3.3) holds for complex $z_i$. Nevertheless, if $z_0, z_1, \ldots, z_k \in F(A)$, it holds [16, p. 333] that
$$|f[z_0, z_1, \ldots, z_k]| \leq \frac{\max_{z \in F(A)} |f^{(k)}(z)|}{k!}. \tag{3.4}$$
From the above it is direct to get $\phi_{k+1}(z) = f[z, z_0, \ldots, z_k]$.

Next we establish a result on the expansion of the error $E_m$.

Theorem 3.1. Assume that $f(z)$ is analytic in the field of values $F(A)$, and that there exists a positive constant $C$ such that $\max_{z \in F(A)} |f^{(k)}(z)| \leq C$ for all $k \geq 0$. Then the error $E_m$ produced by the Krylov like approximation satisfies the expansion
$$E_m = f(A)v - W_{m+l} f(K_{m+l})\hat b = \sum_{k=1}^{\infty} \bigl(k_{m+l}^T \phi_k(K_{m+l}) \hat b\bigr)\, q_{k-1}(A)\, w, \tag{3.5}$$
where $q_0(z) = 1$, $q_k(z) = (z - z_0)\cdots(z - z_{k-1})$, $k \geq 1$, and $\hat b \in \mathbb{C}^{m+l}$ is any vector satisfying $W_{m+l}\hat b = v$. In particular, if $z_0, z_1, \ldots, z_{N-1}$ are the $N$ exact eigenvalues of $A$ counting multiplicities, then the infinite series (3.5) simplifies to the finite one
$$E_m = f(A)v - W_{m+l} f(K_{m+l})\hat b = \sum_{k=1}^{N} \bigl(k_{m+l}^T \phi_k(K_{m+l}) \hat b\bigr)\, q_{k-1}(A)\, w, \tag{3.6}$$
provided only that $f(z)$ is analytic in $F(A)$.

Proof. Define
$$s_m^j = \phi_j(A)v - W_{m+l}\phi_j(K_{m+l})\hat b, \tag{3.7}$$

which is the error of the Krylov like approximation to $\phi_j(A)v$. Making use of the relation $f(z) = (z - z_0)\phi_1(z) + f(z_0)$, we have
$$\begin{aligned}
f(A)v &= f(z_0)v + (A - z_0 I)\phi_1(A)v\\
&= f(z_0)v + (A - z_0 I)\bigl(W_{m+l}\phi_1(K_{m+l})\hat b + s_m^1\bigr)\\
&= f(z_0)v + \bigl(W_{m+l}(K_{m+l} - z_0 I) + w k_{m+l}^T\bigr)\phi_1(K_{m+l})\hat b + (A - z_0 I)s_m^1\\
&= W_{m+l}\bigl(f(z_0)\hat b + (K_{m+l} - z_0 I)\phi_1(K_{m+l})\hat b\bigr) + \bigl(k_{m+l}^T \phi_1(K_{m+l})\hat b\bigr) w + (A - z_0 I)s_m^1\\
&= W_{m+l} f(K_{m+l})\hat b + \bigl(k_{m+l}^T \phi_1(K_{m+l})\hat b\bigr) w + (A - z_0 I)s_m^1.
\end{aligned} \tag{3.8}$$
Proceeding in the same way, we get
$$\begin{aligned}
\phi_1(A)v &= \phi_1(z_1)v + (A - z_1 I)\phi_2(A)v\\
&= \phi_1(z_1)v + (A - z_1 I)\bigl(W_{m+l}\phi_2(K_{m+l})\hat b + s_m^2\bigr)\\
&= \phi_1(z_1)v + \bigl(W_{m+l}(K_{m+l} - z_1 I) + w k_{m+l}^T\bigr)\phi_2(K_{m+l})\hat b + (A - z_1 I)s_m^2\\
&= W_{m+l}\bigl(\phi_1(z_1)\hat b + (K_{m+l} - z_1 I)\phi_2(K_{m+l})\hat b\bigr) + \bigl(k_{m+l}^T \phi_2(K_{m+l})\hat b\bigr) w + (A - z_1 I)s_m^2\\
&= W_{m+l}\phi_1(K_{m+l})\hat b + \bigl(k_{m+l}^T \phi_2(K_{m+l})\hat b\bigr) w + (A - z_1 I)s_m^2.
\end{aligned}$$
By definition (3.7), this gives $s_m^1 = \bigl(k_{m+l}^T \phi_2(K_{m+l})\hat b\bigr) w + (A - z_1 I)s_m^2$. Continuing to expand $s_m^2, s_m^3, \ldots, s_m^{j-1}$ in the same manner, we get the key recurrence formula
$$s_m^{j-1} = \bigl(k_{m+l}^T \phi_j(K_{m+l})\hat b\bigr) w + (A - z_{j-1} I)s_m^j, \qquad j = 2, 3, \ldots.$$
Substituting $s_m^1, s_m^2, \ldots, s_m^{j-1}$ successively into (3.8), we obtain
$$E_m = f(A)v - W_{m+l} f(K_{m+l})\hat b = \sum_{k=1}^{j} \bigl(k_{m+l}^T \phi_k(K_{m+l})\hat b\bigr)\, q_{k-1}(A)\, w + q_j(A) s_m^j,$$
where $q_0(z) = 1$ and $q_k(z) = (z - z_0)\cdots(z - z_{k-1})$, $k \geq 1$.

Next we prove that $\|q_j(A)s_m^j\|$ converges to zero faster than $O(1/j)$, i.e., $\|q_j(A)s_m^j\| = o(1/j)$ for $j \geq M$ with $M$ a large enough positive integer. By the definition of $s_m^j$ and (3.4), we get
$$|\phi_j(z)| = |f[z, z_0, \ldots, z_{j-1}]| \leq \frac{\max_{\zeta \in F(A)} |f^{(j)}(\zeta)|}{j!} \leq \frac{C}{j!}.$$
Since $z_0, z_1, \ldots, z_{j-1} \in F(A)$, we have $|z_i| \leq \|A\|$, $i = 0, 1, \ldots, j-1$, and consequently $\|q_j(A)\| \leq (2\|A\|)^j$. So, by Stirling's inequality
$$\sqrt{2\pi j}\left(\frac{j}{e}\right)^j < j! < \sqrt{2\pi j}\left(\frac{j}{e}\right)^j \left(1 + \frac{1}{12j - 1}\right),$$
where $e$ is the base of the natural logarithm, we obtain
$$\|q_j(A)s_m^j\| \leq \frac{C_1 (2\|A\|)^j}{j!} < \frac{C_1}{\sqrt{2\pi j}} \left(\frac{2\|A\| e}{j}\right)^j$$
with $C_1$ a constant independent of $j$. The $k$th term of this bound is smaller than the order of $k^{-3/2}$ once
$$\left(\frac{2\|A\| e}{k}\right)^k < \frac{1}{k}.$$
From this, we get
$$\log k > \log 2\|A\| e + \frac{\log k}{k}.$$
Note that $0 \leq \frac{\log k}{k} < 1$ for $k \geq 1$. Therefore, we relax the above inequality to
$$\log k \geq \log 2\|A\| e + 1 = \log 2e^2\|A\|,$$
whose minimal integer solution is $k = M = \lceil 2e^2\|A\| \rceil$, where $\lceil\cdot\rceil$ is the ceiling function. From this, the sum of the norms of the terms from the $M$th one on is no more than the order of $\sum_{k=M}^{\infty} k^{-3/2} \leq \int_M^{\infty} x^{-3/2}\,dx = \frac{2}{\sqrt{M}}$, which is smaller than the sum of the norms of the first $M - 1$ terms, each of which is $O(1/\sqrt{k})$, $k = 1, 2, \ldots, M - 1$. Noting this remarkable decaying tendency, we deduce that the norm of the first term in (3.5) is generally of the same order as $\|E_m\|$ for all sufficiently smooth functions.

4 Upper bounds for $\|E_m(\tau)\|$

Here and hereafter let $\beta = \|v\|$. In this section, we first establish some new upper bounds for the norm of $E_m(\tau) = e^{-\tau A}v - \beta V_m e^{-\tau H_m}e_1$. Then we prove theoretically why the first term in (3.5) measures the error reliably. This justifies the observation in [28] that the first term is numerically "surprisingly sharp", and for the first time provides solid theoretical support for the rationale of the error estimates advanced in [28].

Let $\mu_2[A]$ denote the 2-logarithmic norm of $A$, which is defined by
$$\mu_2[A] = \lim_{h \to 0^+} \frac{\|I + hA\| - 1}{h}.$$

The logarithmic norm has plenty of properties; see [6, p. 31] and [33]. Here we list some of them that will be used later.

Proposition 4.1. Let $\mu_2[A]$ denote the 2-logarithmic norm of $A$. Then we have

(1) $-\|A\| \leq \mu_2[A] \leq \|A\|$;

(2) $\mu_2[A] = \lambda_{\max}\left(\dfrac{A + A^*}{2}\right)$;

(3) $\mu_2[tA] = t\,\mu_2[A]$ for all $t \geq 0$;

(4) $\|e^{tA}\| \leq e^{t\mu_2[A]}$ for all $t \geq 0$.
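Property (2) makes $\mu_2$ directly computable. A small Python sketch (our own illustration) computes $\mu_2[-A]$, as used below, and checks property (4) numerically:

```python
import numpy as np
from scipy.linalg import expm

def log_norm_2(A):
    """2-logarithmic norm via property (2): mu_2[A] = lambda_max((A + A^*)/2)."""
    return np.linalg.eigvalsh((A + A.conj().T) / 2).max()

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 50))
mu = log_norm_2(-A)                      # mu_2[-A], as used in (4.1)
for t in (0.1, 0.5, 1.0):
    # property (4): ||e^{-tA}|| <= e^{t*mu_2[-A]} for t >= 0
    print(np.linalg.norm(expm(-t * A), 2) <= np.exp(t * mu) + 1e-12)
```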

4.1 An upper bound for $\|E_m(\tau)\|$

Next we establish an upper bound on the error norm for a general $A$ and refine it when $A$ is Hermitian.

Theorem 4.1. With the notation described previously, it holds that
$$\|E_m(\tau)\| \leq \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T e^{-tH_m} e_1|\, \frac{e^{\tau\mu_2} - 1}{\tau\mu_2}, \tag{4.1}$$
where $\mu_2 = \mu_2[-A]$.

Proof. Let $w(t) = e^{-tA}v$ denote the true solution and $w_m(t) = \beta V_m e^{-tH_m}e_1$ the $m$th Arnoldi approximation. Then $w(t)$ and $w_m(t)$ satisfy
$$w'(t) = -Aw(t), \quad w(0) = v, \tag{4.2}$$
and $w_m'(t) = -\beta V_m H_m e^{-tH_m}e_1$, $w_m(0) = v$, respectively. Using (2.1), we have
$$\begin{aligned}
w_m'(t) &= -\beta (AV_m - h_{m+1,m} v_{m+1} e_m^T) e^{-tH_m}e_1\\
&= -\beta A V_m e^{-tH_m}e_1 + \beta h_{m+1,m} (e_m^T e^{-tH_m}e_1) v_{m+1}\\
&= -A w_m(t) + \beta h_{m+1,m} (e_m^T e^{-tH_m}e_1) v_{m+1}.
\end{aligned} \tag{4.3}$$
Define the error $E_m(t) = w(t) - w_m(t)$. Then by (4.2) and (4.3), $E_m(t)$ satisfies $E_m'(t) = -A E_m(t) - g(t)$, $E_m(0) = 0$, where $g(t) = \beta h_{m+1,m} (e_m^T e^{-tH_m}e_1) v_{m+1}$. Solving the above ODE, we get
$$E_m(\tau) = -\int_0^{\tau} e^{(t-\tau)A} g(t)\,dt = -\beta h_{m+1,m} \int_0^{\tau} (e_m^T e^{-tH_m}e_1)\, e^{(t-\tau)A} v_{m+1}\,dt.$$
Taking norms on the two sides and using $\|e^{(t-\tau)A}\| \leq e^{(\tau-t)\mu_2}$ from Proposition 4.1(4) gives
$$\|E_m(\tau)\| \leq \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T e^{-tH_m}e_1|\, \frac{e^{\tau\mu_2} - 1}{\tau\mu_2},$$
where $\mu_2 = \mu_2[-A]$. $\square$

In the case that $A$ is Hermitian, the Arnoldi decomposition reduces to the Lanczos decomposition
$$AV_m = V_m T_m + \beta_{m+1} v_{m+1} e_m^T, \tag{4.4}$$
where $T_m$ is Hermitian tridiagonal. The approximation to $e^{-\tau A}v$ is given by $e^{-\tau A}v \approx \beta V_m e^{-\tau T_m}e_1$.

Remark 4.1. For $A$ Hermitian positive semidefinite, it follows from (4.1) that
$$\|E_m(\tau)\| \leq \tau \beta \beta_{m+1} \max_{0 \leq t \leq \tau} |e_m^T e^{-tT_m}e_1|, \tag{4.5}$$
which coincides with the result presented in [37, Theorem 3.1] by setting $\alpha = 0$ or $\tau$ there.

In the above theorem, for $A$ Hermitian we can derive a more compact form, which is practically more convenient to use and more accurate in measuring the true error.

Theorem 4.2. Assume that $\mathrm{spec}(A)$ is contained in the interval $\Lambda \equiv [a,b]$ and that the Lanczos process (4.4) can be run $m$ steps without breakdown. Then
$$\|E_m(\tau)\| \leq \gamma_1 \tau \beta \beta_{m+1} |e_m^T e^{-\tau T_m}e_1|, \tag{4.6}$$
where
$$\gamma_1 = \begin{cases} e^{\tau(b-a)}\,\dfrac{e^{\tau\mu_2} - 1}{\tau\mu_2}, & \mu_2 \neq 0,\\[1mm] e^{\tau(b-a)}, & \mu_2 = 0, \end{cases} \tag{4.7}$$
and $\mu_2 = \mu_2[-A]$.

Proof. As before, the error $E_m(\tau)$ satisfies
$$E_m(\tau) = -\beta\beta_{m+1} \int_0^{\tau} (e_m^T e^{-tT_m}e_1)\, e^{(t-\tau)A} v_{m+1}\,dt. \tag{4.8}$$

Let $\lambda_1, \ldots, \lambda_m$ be the $m$ eigenvalues of $T_m$ and $f(z) = e^z$. According to Propositions 2.1–2.3, it is easy to verify that for all $t \geq 0$,
$$e^{-tT_m} = \sum_{i=0}^{m-1} a_i(t) (-tT_m)^i,$$
where $a_{m-1}(t) = f[-t\lambda_1, \ldots, -t\lambda_m]$, $t \geq 0$. From the tridiagonal structure of $T_m$, we have
$$e_m^T e^{-tT_m} e_1 = (-t)^{m-1} a_{m-1}(t)\, e_m^T T_m^{m-1} e_1. \tag{4.9}$$
In particular, for $t = \tau$ we obtain
$$e_m^T e^{-\tau T_m} e_1 = (-\tau)^{m-1} a_{m-1}(\tau)\, e_m^T T_m^{m-1} e_1. \tag{4.10}$$
Combining (4.8), (4.9) and (4.10), we get
$$E_m(\tau) = -\beta\beta_{m+1} \bigl(e_m^T e^{-\tau T_m}e_1\bigr) \int_0^{\tau} \left(\frac{t}{\tau}\right)^{m-1} \frac{a_{m-1}(t)}{a_{m-1}(\tau)}\, e^{(t-\tau)A} v_{m+1}\,dt. \tag{4.11}$$
Denote $-t\Lambda \equiv [-tb, -ta]$, $0 \leq t \leq \tau$. (3.3) shows that there exist $\zeta_1 \in -t\Lambda$ and $\zeta_2 \in -\tau\Lambda$ such that
$$\left|\frac{a_{m-1}(t)}{a_{m-1}(\tau)}\right| = \left|\frac{f^{(m-1)}(\zeta_1)}{f^{(m-1)}(\zeta_2)}\right| \leq \frac{\max_{\zeta \in -t\Lambda} f^{(m-1)}(\zeta)}{\min_{\zeta \in -\tau\Lambda} f^{(m-1)}(\zeta)} \leq \frac{\max_{\zeta \in -\tau\Lambda} f^{(m-1)}(\zeta)}{\min_{\zeta \in -\tau\Lambda} f^{(m-1)}(\zeta)} = e^{\tau(b-a)}. \tag{4.12}$$
From the above and (4.11), we get
$$\begin{aligned}
\|E_m(\tau)\| &\leq \beta\beta_{m+1} |e_m^T e^{-\tau T_m}e_1| \int_0^{\tau} \left(\frac{t}{\tau}\right)^{m-1} e^{\tau(b-a)}\, \|e^{(t-\tau)A}\|\,dt\\
&\leq \beta\beta_{m+1} |e_m^T e^{-\tau T_m}e_1|\, e^{\tau(b-a)} \int_0^{\tau} \left(\frac{t}{\tau}\right)^{m-1} e^{(\tau-t)\mu_2}\,dt\\
&\leq \gamma_1 \tau \beta\beta_{m+1} |e_m^T e^{-\tau T_m}e_1|,
\end{aligned}$$
where $\gamma_1$ is defined in (4.7) and $\mu_2 = \mu_2[-A]$. $\square$

Consider the linear system $Ax = v$. It is known [29, pp. 159-160] that the Lanczos approximation $x_m$ to $x = A^{-1}v$ is given by $x_m = \beta V_m T_m^{-1} e_1$, and the a posteriori residual $r_m = v - Ax_m$ satisfies
$$r_m = \beta\beta_{m+1} \bigl(e_m^T T_m^{-1} e_1\bigr) v_{m+1},$$
whose norm is $\|r_m\| = \beta\beta_{m+1} |e_m^T T_m^{-1} e_1|$. The a priori error $e_m = x - x_m$ is closely related to $r_m$ by $e_m = A^{-1} r_m$. Therefore, we have
$$\|e_m\| = \|A^{-1} r_m\| \leq \|A^{-1}\| \|r_m\| = \tilde\gamma_1 \beta\beta_{m+1} |e_m^T T_m^{-1} e_1|, \tag{4.13}$$
where $\tilde\gamma_1 = \|A^{-1}\|$. Theorem 4.2 gives a similar result for the matrix exponential $e^{-\tau A}v$. In the same spirit, for the Lanczos approximation to $f(A)v$, the left-hand side of the error bound (4.6) is the a priori error, while its right-hand side, excluding the a priori factor $\gamma_1$, is computable and can be interpreted as an a posteriori quantity. We can write (4.6) and (4.13) in the unified form
$$\|f(A)v - \beta V_m f(T_m)e_1\| \leq \gamma \beta\beta_{m+1} |e_m^T f(T_m)e_1|,$$
where $\gamma$ is a constant depending on the spectrum of $A$ and $f(z) = z^{-1}$ or $f(z) = e^{-\tau z}$.

4.2 The other upper bound for $\|E_m(\tau)\|$

We now analyze the expansion (3.5) when $f(z) = e^z$ and the sequence $\{\phi_k(z)\}$ is defined by (3.2), derive compact upper bounds for the first term and for the sum of the rest of the expansion (3.5), and for the first time prove that $\|E_m(\tau)\|$ is determined by the first term of the error expansion. For the Arnoldi approximation (2.2) to $e^{-tA}v$, the expansion (3.5) becomes
$$E_m(t) = -t\beta h_{m+1,m} \sum_{k=1}^{\infty} (-t)^{k-1} \bigl(e_m^T \phi_k(-tH_m) e_1\bigr)\, q_{k-1}(A) v_{m+1},$$
where $q_0 = 1$, $q_k(z) = (z - z_0)\cdots(z - z_{k-1})$, $z_k \in F(A)$, $k \geq 1$. Denoting
$$E_m^{(2)}(t) = -t\beta h_{m+1,m} \sum_{k=2}^{\infty} (-t)^{k-1} \bigl(e_m^T \phi_k(-tH_m) e_1\bigr)\, q_{k-1}(A) v_{m+1},$$
we can present the following results.

Theorem 4.3. Let $\mu_2 = \mu_2[-A]$. Then $E_m^{(2)}(\tau)$ and $E_m(\tau)$ satisfy
$$\|E_m^{(2)}(\tau)\| \leq \gamma_2 \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T \phi_1(-tH_m) e_1|, \qquad \|E_m(\tau)\| \leq (1 + \gamma_2)\, \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T \phi_1(-tH_m) e_1|, \tag{4.14}$$
respectively, where
$$\gamma_2 = \begin{cases} \|q_1(A)v_{m+1}\|\, \dfrac{e^{\tau\mu_2} - 1}{\mu_2}, & \mu_2 \neq 0,\\[1mm] \tau \|q_1(A)v_{m+1}\|, & \mu_2 = 0. \end{cases} \tag{4.15}$$

Proof. Define the $(m+1)\times(m+1)$ matrix
$$\bar H_m \equiv \begin{pmatrix} H_m - z_0 I & 0\\ h_{m+1,m} e_m^T & 0 \end{pmatrix}$$
and
$$w(t) = e^{-tA}v, \qquad w_m^{(2)}(t) = \beta V_{m+1}\, e^{-t(\bar H_m + z_0 I)} e_1,$$
where $H_m$, $h_{m+1,m}$ and $V_{m+1} = [V_m, v_{m+1}]$ are generated by the Arnoldi process (2.1). Then
$$e^{-t\bar H_m} = \begin{pmatrix} e^{tz_0}\, e^{-tH_m} & 0\\ -t h_{m+1,m}\, e^{tz_0}\, e_m^T \phi_1(-tH_m) & 1 \end{pmatrix} \tag{4.16}$$
and
$$w'(t) = -Aw(t), \quad w(0) = v, \qquad w_m^{(2)\prime}(t) = -\beta V_{m+1} (\bar H_m + z_0 I)\, e^{-t(\bar H_m + z_0 I)} e_1, \quad w_m^{(2)}(0) = v.$$
Since $E_m^{(2)}(t) = w(t) - w_m^{(2)}(t)$, we get
$$E_m^{(2)\prime}(t) = -Aw(t) + \beta V_{m+1} (\bar H_m + z_0 I)\, e^{-t(\bar H_m + z_0 I)} e_1.$$
From (2.1), we have
$$V_{m+1} (\bar H_m + z_0 I) = [V_m, v_{m+1}] \begin{pmatrix} H_m & 0\\ h_{m+1,m} e_m^T & z_0 \end{pmatrix} = [AV_m, z_0 v_{m+1}].$$
Then it follows from the above that
$$\begin{aligned}
E_m^{(2)\prime}(t) &= -Aw(t) + \beta [AV_m, z_0 v_{m+1}]\, e^{-t(\bar H_m + z_0 I)} e_1\\
&= -A\bigl(w(t) - w_m^{(2)}(t)\bigr) - A w_m^{(2)}(t) + \beta [AV_m, z_0 v_{m+1}]\, e^{-t(\bar H_m + z_0 I)} e_1\\
&= -A E_m^{(2)}(t) - \beta [0, q_1(A) v_{m+1}]\, e^{-t(\bar H_m + z_0 I)} e_1,
\end{aligned}$$
where $q_1(z) = z - z_0$. Therefore, from (4.16) we get
$$E_m^{(2)\prime}(t) = -A E_m^{(2)}(t) + g(t), \qquad E_m^{(2)}(0) = 0,$$
where $g(t) = t\beta h_{m+1,m} \bigl(e_m^T \phi_1(-tH_m) e_1\bigr)\, q_1(A) v_{m+1}$. Solving the above ODE, we obtain
$$E_m^{(2)}(\tau) = \beta h_{m+1,m} \left(\int_0^{\tau} t\, \bigl(e_m^T \phi_1(-tH_m) e_1\bigr)\, e^{(t-\tau)A}\,dt\right) q_1(A) v_{m+1}. \tag{4.17}$$
Taking norms on the two sides and defining $\gamma_2$ as in (4.15), we get
$$\|E_m^{(2)}(\tau)\| \leq \gamma_2 \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T \phi_1(-tH_m) e_1|$$
and
$$\|E_m(\tau)\| \leq \tau \beta h_{m+1,m} |e_m^T \phi_1(-\tau H_m) e_1| + \|E_m^{(2)}(\tau)\| \leq (1 + \gamma_2)\, \tau \beta h_{m+1,m} \max_{0 \leq t \leq \tau} |e_m^T \phi_1(-tH_m) e_1|. \qquad \square$$
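As a sanity check on the block identity (4.16), the following Python sketch (our own illustration, with random data) compares the blocks of $e^{-t\bar H_m}$ with the right-hand side of (4.16); here $\phi_1(-tH_m)$ is formed with the scaled node $-tz_0$, consistent with the divided-difference relation used in the proof of Theorem 4.5 below.

```python
import numpy as np
from scipy.linalg import expm, solve

# Numerical check of the block identity (4.16); all data are random
rng = np.random.default_rng(4)
m, h, t = 6, 0.7, 0.3                     # h plays the role of h_{m+1,m}
Hm = rng.standard_normal((m, m))
z0 = Hm[0, 0]
em = np.eye(m)[m - 1]                     # e_m^T as a row vector

Hbar = np.zeros((m + 1, m + 1))
Hbar[:m, :m] = Hm - z0 * np.eye(m)
Hbar[m, :m] = h * em
E = expm(-t * Hbar)

# phi_1(-t H_m), formed with the scaled node -t z0:
# (-t H_m + t z0 I)^{-1} (e^{-t H_m} - e^{-t z0} I)
phi1 = solve(-t * (Hm - z0 * np.eye(m)),
             expm(-t * Hm) - np.exp(-t * z0) * np.eye(m))
print(np.linalg.norm(E[:m, :m] - np.exp(t * z0) * expm(-t * Hm)))       # ~0
print(np.linalg.norm(E[m, :m] + t * h * np.exp(t * z0) * (em @ phi1)))  # ~0
print(E[m, m])                                                          # 1.0
```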

Let $f(z) = e^z$ and $\phi_1(z) = \dfrac{e^z - e^{z_0}}{z - z_0}$. Then the divided differences of $f(z)$ and $\phi_1(z)$ obey the following relationship.

Lemma 4.4. Given $z_1, z_2, \ldots, z_m \in F(A)$, we have
$$\phi_1[z_1, \ldots, z_m] = f[z_0, z_1, \ldots, z_m]. \tag{4.18}$$

Proof. We prove the lemma by induction. For $i = 1$,
$$\phi_1[z_1] = \frac{e^{z_1} - e^{z_0}}{z_1 - z_0} = f[z_0, z_1],$$
so (4.18) is true. Assume that (4.18) holds for $i = k$, $k < m$, i.e., $\phi_1[z_1, \ldots, z_k] = f[z_0, z_1, \ldots, z_k]$. Then for $i = k + 1$, we get
$$\phi_1[z_1, \ldots, z_{k+1}] = \frac{\phi_1[z_1, \ldots, z_{k-1}, z_{k+1}] - \phi_1[z_1, \ldots, z_k]}{z_{k+1} - z_k} = \frac{f[z_0, z_1, \ldots, z_{k-1}, z_{k+1}] - f[z_0, z_1, \ldots, z_k]}{z_{k+1} - z_k} = f[z_0, z_1, \ldots, z_{k+1}].$$
Thus, the lemma is true. $\square$

With $\gamma_1$ and $\mu_2$ defined as in Theorem 4.2, we can refine Theorem 4.3 and get an explicit and compact bound for $\|E_m(\tau)\|$ when $A$ is Hermitian.

Theorem 4.5. Assume that $\mathrm{spec}(A)$ is contained in the interval $\Lambda \equiv [a,b]$ and that the Lanczos process (4.4) can be run $m$ steps without breakdown. Then we have
$$\|E_m^{(2)}(\tau)\| \leq \gamma_3 \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1|, \qquad \|E_m(\tau)\| \leq (1 + \gamma_3)\, \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1|,$$
where $\gamma_3 = \tau\gamma_1 \|(A - z_0 I)v_{m+1}\|$ and $\gamma_1$ is defined as in (4.7), for any $z_0 \in \Lambda$.

Proof. It follows from (4.17) that for a Hermitian $A$,
$$E_m^{(2)}(\tau) = \beta\beta_{m+1} \left(\int_0^{\tau} t\, \bigl(e_m^T \phi_1(-tT_m) e_1\bigr)\, e^{(t-\tau)A}\,dt\right) (A - z_0 I) v_{m+1}.$$
Let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $T_m$ and $z_0 \in \Lambda$. By Propositions 2.1–2.3 and Lemma 4.4, we know that for all $t \geq 0$,
$$\phi_1(-tT_m) = \sum_{i=0}^{m-1} \hat a_i(t) (-tT_m)^i,$$
where
$$\hat a_{m-1}(t) = \phi_1[-t\lambda_1, \ldots, -t\lambda_m] = f[-tz_0, -t\lambda_1, \ldots, -t\lambda_m].$$
Similarly to the proof of Theorem 4.2, we get
$$E_m^{(2)}(\tau) = \beta\beta_{m+1}\, \bigl(e_m^T \phi_1(-\tau T_m) e_1\bigr) \left(\int_0^{\tau} \frac{t^m}{\tau^{m-1}}\, \frac{\hat a_{m-1}(t)}{\hat a_{m-1}(\tau)}\, e^{(t-\tau)A}\,dt\right) (A - z_0 I) v_{m+1} \tag{4.19}$$
and
$$\left|\frac{\hat a_{m-1}(t)}{\hat a_{m-1}(\tau)}\right| \leq \frac{\max_{\zeta \in -t\Lambda} f^{(m)}(\zeta)}{\min_{\zeta \in -\tau\Lambda} f^{(m)}(\zeta)} \leq \frac{\max_{\zeta \in -\tau\Lambda} f^{(m)}(\zeta)}{\min_{\zeta \in -\tau\Lambda} f^{(m)}(\zeta)} = e^{\tau(b-a)}. \tag{4.20}$$
Substituting (4.20) into (4.19) gives
$$\|E_m^{(2)}(\tau)\| \leq \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1|\, e^{\tau(b-a)} \int_0^{\tau} e^{(\tau-t)\mu_2}\,dt\; \|(A - z_0 I)v_{m+1}\| \leq \gamma_3 \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1|$$
and
$$\|E_m(\tau)\| \leq \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1| + \|E_m^{(2)}(\tau)\| \leq (1 + \gamma_3)\, \tau \beta\beta_{m+1} |e_m^T \phi_1(-\tau T_m) e_1|,$$
where $\gamma_3 = \tau\gamma_1 \|(A - z_0 I)v_{m+1}\|$. $\square$

Remark 4.2. Remarkably, Theorem 4.5 shows that the a priori error norm $\|E_m(\tau)\|$ is essentially determined by the first term of the error expansion, provided that $\gamma_3$, or equivalently $\gamma_1$, is not large.

5 Some a posteriori error estimates for approximating $f(A)v$

In the previous section, we established some upper bounds for the Arnoldi approximation to $e^{-\tau A}v$. They constitute the basis for seeking reliable a posteriori error estimates. Define
$$\xi_1 = \beta h_{m+1,m} |e_m^T f(H_m) e_1| \quad \text{and} \quad \xi_2 = \beta h_{m+1,m} |e_m^T \phi_1(H_m) e_1|. \tag{5.1}$$
In [21], the authors simply suggested taking $\xi_1$ as an a posteriori error estimate, but realized that, due to the lack of a definition of residual, an insightful interpretation of $\xi_1$ as an a posteriori estimate is not always immediate. In [18, 23], the authors have introduced a generalized residual notion for approximating $f(A)v$. In [5], the authors have provided more details for $f(z) = e^{-\tau z}$, where the rationale of $\xi_1$ is justified. Let us briefly review why this is so. By the Cauchy integral definition [16, p. 8], the error of the $m$th Arnoldi approximation to $f(A)v$ can be expressed as
$$E_m = \frac{1}{2\pi i} \int_{\Gamma} f(\lambda) \bigl((\lambda I - A)^{-1}v - \beta V_m (\lambda I - H_m)^{-1} e_1\bigr)\,d\lambda, \tag{5.2}$$
where $\Gamma$ is the contour of $F(A)$. For the Arnoldi method for solving the linear system $(\lambda I - A)x = v$, the error is $e_m(\lambda) = (\lambda I - A)^{-1}v - \beta V_m (\lambda I - H_m)^{-1} e_1$, and the residual is $r_m(\lambda) = (\lambda I - A)e_m(\lambda)$, which, by the Arnoldi decomposition, can be expressed as
$$r_m(\lambda) = \beta h_{m+1,m} \bigl(e_m^T (\lambda I - H_m)^{-1} e_1\bigr) v_{m+1}. \tag{5.3}$$

With the notation above, (5.2) becomes
$$E_m = \frac{1}{2\pi i} \int_{\Gamma} f(\lambda)\, e_m(\lambda)\,d\lambda.$$
Replacing $e_m(\lambda)$ by $r_m(\lambda)$, one defines the generalized residual
$$R_m = \frac{1}{2\pi i} \int_{\Gamma} f(\lambda)\, r_m(\lambda)\,d\lambda,$$
which exactly equals the standard residual $r_m$ of the linear system $(\lambda I - A)x = v$ for $f(\lambda) = (\lambda I - A)^{-1}$. Substituting (5.3) into the above leads to
$$R_m = \beta h_{m+1,m} \bigl(e_m^T f(H_m) e_1\bigr) v_{m+1}.$$
Since $\xi_1 = \|R_m\|$, $\xi_1$ can be interpreted as an a posteriori error estimate and used to design a robust stopping criterion. However, the generalized residual notion does not apply to $\xi_2$. The theoretical validity of $\xi_2$ as an a posteriori error estimate has been unclear and unjustified, because no estimate for $\|E_m^{(2)}(\tau)\|$ has been available before. With our bounds established, it is soon clear that both $\xi_1$ and $\xi_2$ can be naturally used as reliable a posteriori error estimates without the help of the generalized residual notion. As a matter of fact, our bounds have essentially given close relationships between the a priori error norm $\|E_m(\tau)\|$ and the a posteriori quantities $\xi_1$ and $\xi_2$. In what follows we give more details, showing why both $\xi_1$ and $\xi_2$ are generally reliable and accurate a posteriori error estimates, and we confirm their effectiveness by numerical experiments.
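For concreteness, the following Python sketch (an illustration of ours, not code from the Matlab experiments reported below) computes the two quantities in (5.1) for $f(z) = e^{-\tau z}$ from the Arnoldi data, taking $z_0 = h_{1,1}$ as in Section 5.1 and forming $\phi_1(H_m) = (H_m - z_0 I)^{-1}\bigl(f(H_m) - f(z_0)I\bigr)$ by a linear solve, which assumes $z_0$ is not an eigenvalue of $H_m$:

```python
import numpy as np
from scipy.linalg import expm, solve

def xi_estimates(Hm, h_next, beta, tau):
    """A posteriori estimates (5.1) for f(z) = exp(-tau*z):
    xi1 = beta*h_{m+1,m}*|e_m^T f(H_m) e_1|,
    xi2 = beta*h_{m+1,m}*|e_m^T phi_1(H_m) e_1|, with z0 = h_{1,1}."""
    m = Hm.shape[0]
    I = np.eye(m)
    z0 = Hm[0, 0]                     # a node in F(A); see Section 5.1
    fH = expm(-tau * Hm)
    # phi_1(H_m) = (H_m - z0 I)^{-1}(f(H_m) - f(z0) I); assumes z0 is
    # not an eigenvalue of H_m
    phi1H = solve(Hm - z0 * I, fH - np.exp(-tau * z0) * I)
    xi1 = beta * h_next * abs(fH[m - 1, 0])
    xi2 = beta * h_next * abs(phi1H[m - 1, 0])
    return xi1, xi2
```

Here Hm is the $m\times m$ Hessenberg matrix, h_next plays the role of $h_{m+1,m}$, and beta is $\|v\|$; dividing the returned values by $\|f(A)v\|$, or an approximation of it, yields the relative estimates (5.5) below.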

5.1 The case that $A$ is Hermitian

For a Hermitian $A$, we can give a new justification of $\xi_1$ for computing $e^{-\tau A}v$. Theorem 4.2 shows that $\|E_m(\tau)\| \leq \gamma_1 \xi_1$, where $\gamma_1$ is defined in (4.7), while Theorem 4.5 indicates that $\|E_m(\tau)\| \leq (1 + \gamma_3)\xi_2$ with $\gamma_3 = \tau\gamma_1 \|(A - z_0 I)v_{m+1}\|$. Note that if $\tau\|A\|$ is not very large, the factor $\gamma_1$ is moderate, which means that $\xi_1$ may be a very good estimate for the true error norm. The rationale of $\xi_2$ can also be justified: roughly speaking, $\gamma_3$ is the same as $\gamma_1$ up to a multiple of about $\tau\|A\|$. Note that the inequalities (4.12) and (4.20) may be conservative, as the factor $\gamma_1$ reflects the worst case. Thus, each of the error estimates $\xi_1$ and $\xi_2$ should be reliable and accurate enough to determine whether or not an approximate solution to $e^{-\tau A}v$ achieves a desired accuracy in practical computations.

All the experiments in this paper are performed on an Intel(R) Core(TM)2 Duo CPU T6600 @2.20GHz with 2.00 GB RAM, using Matlab 7.8.0 under the Windows XP operating system. $f(H_m)$ and $\phi_1(H_m)$ are computed by means of the spectral decomposition when $H_m$ is Hermitian, or by the Matlab built-in functions funm and expm otherwise. To illustrate the effectiveness of $\xi_1$ and $\xi_2$, we compute the "exact" solution $f(A)v$ by first using the above functions to calculate $f(A)$ explicitly and then multiplying it with $v$.

Keep in mind that we always require $z_0, z_1, \ldots \in F(A)$; however, only $z_0$ is needed to define $\phi_1(z)$ and hence to compute $\xi_2$. Note that all the diagonal entries $h_{i,i}$ of $H_m$ lie in $F(A)$, so there are infinitely many admissible choices of $z_0$. To be definite, here and hereafter we simply take $z_0 = h_{1,1}$ in practical implementations, at no extra cost. With $z_0$ given, we then have the function $\phi_1(z)$ and can compute the a posteriori estimate $\xi_2$ defined in (5.1). Experimentally, we have found that the choice of $z_0$ has little essential effect on the size of $\xi_2$.

The Arnoldi process is run until the approximation $f_m = \beta V_m f(H_m)e_1$ satisfies
$$\|f(A)v - f_m\| / \|f(A)v\| \leq \epsilon. \tag{5.4}$$
We take $\epsilon = 10^{-12}$ in the experiments. We define the relative a posteriori estimates
$$\xi_1 := \frac{\xi_1}{\|f(A)v\|}, \qquad \xi_2 := \frac{\xi_2}{\|f(A)v\|}, \tag{5.5}$$

and compare them with the true relative error (5.4).

Example 1. We justify the effectiveness of $\xi_1$ and $\xi_2$ as estimates for $\|E_m(\tau)\|$ when $f(A) = e^{-\tau A}$. Consider the diagonal matrix $A$ of size $N = 1001$ taken from [17], with equidistantly spaced eigenvalues in the interval $[0, 40]$. The $N$-dimensional vector $v$ is generated randomly from a uniform distribution and is normalized so that $\|v\| = 1$.

[Figure 1 here: three panels, titled $\tau = 0.1$, $\tau = 0.5$ and $\tau = 1$, each plotting $\log_{10}(\|\mathrm{error}\|/\|\mathrm{exact}\|)$ against the dimension $m$ of the Krylov subspace for the true relative error norm and the relative error estimates $\xi_1$ and $\xi_2$.]

Figure 1: Example 1: The relative error estimates and the true relative error norm for $e^{-\tau A}v$ for $A$ symmetric with $N = 1001$.

Figure 1 depicts the curves of the two relative error estimates $\xi_1$, $\xi_2$ and the true relative error norm (5.4) for the Lanczos approximations to $e^{-\tau A}v$ with the three parameters $\tau = 0.1$, $0.5$ and $1$. It is seen that both estimates $\xi_1$ and $\xi_2$ are accurate; in particular, $\xi_2$ becomes indistinguishable from the true error within a few steps and is sharper than $\xi_1$ by roughly one order of magnitude. For $\tau = 0.5, 1$, $\xi_1$ and $\xi_2$ are not accurate in the first very few steps. However, we find that during this stage the true relative error norms themselves stay around one, meaning that the Arnoldi approximations are very poor. After this stage, $\xi_1$ and especially $\xi_2$ soon become smooth and provide accurate and reliable estimates for the true relative errors

as $m$ increases. In the meantime, we observe from the figure that the effectiveness of $\xi_1$ and $\xi_2$ is little affected by $\tau\|A\|$. However, we find that the smaller $\tau$ is, the faster the Lanczos approximations converge, so the convergence itself is affected by the size of $\tau$ considerably.
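For reference, the experiment of Example 1 can be reproduced in outline with the following Python sketch. It is our own illustration: the original experiments used Matlab, and the random seed and vector below are arbitrary, so the numbers will differ in detail from Figure 1.

```python
import numpy as np
from scipy.linalg import expm, solve

# Example 1 setup (a sketch): diagonal A with equispaced eigenvalues in [0, 40]
N, tau, mmax = 1001, 0.5, 30
d = np.linspace(0.0, 40.0, N)
rng = np.random.default_rng(3)
v = rng.random(N)
v /= np.linalg.norm(v)                   # so beta = ||v|| = 1
exact = np.exp(-tau * d) * v             # e^{-tau A} v, A = diag(d)
nrm = np.linalg.norm(exact)

V = np.zeros((N, mmax + 1))
H = np.zeros((mmax + 1, mmax))
V[:, 0] = v
for j in range(mmax):                    # Lanczos, with full reorthogonalization
    w = d * V[:, j]                      # A @ x for diagonal A
    for i in range(j + 1):
        H[i, j] = V[:, i] @ w
        w -= H[i, j] * V[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    V[:, j + 1] = w / H[j + 1, j]

for m in range(2, mmax + 1):
    Hm = H[:m, :m]
    fH = expm(-tau * Hm)
    err = np.linalg.norm(exact - V[:, :m] @ fH[:, 0]) / nrm
    z0 = Hm[0, 0]                        # assumes z0 is not an eigenvalue of Hm
    phi1H = solve(Hm - z0 * np.eye(m), fH - np.exp(-tau * z0) * np.eye(m))
    xi1 = H[m, m - 1] * abs(fH[m - 1, 0]) / nrm
    xi2 = H[m, m - 1] * abs(phi1H[m - 1, 0]) / nrm
    print(f"m={m:2d}  err={err:.2e}  xi1={xi1:.2e}  xi2={xi2:.2e}")
```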

5.2 The case that $A$ is non-Hermitian

Unlike the Hermitian case, for $A$ non-Hermitian, Theorems 4.1 and 4.3 do not give explicit relationships between the a priori error norm and $\xi_1$, $\xi_2$. So it is not obvious how to use $\xi_1$ and $\xi_2$ to estimate the true error norm. However, if we lower our standard a little bit, it is instructive to make use of these two theorems to justify the effectiveness of $\xi_1$ and $\xi_2$ for estimating the true error of the Arnoldi approximations to $e^{-\tau A}v$, as shown below. Define the two functions $g_1(t) = e_m^T e^{-tH_m} e_1$ and $g_2(t) = e_m^T \phi_1(-tH_m) e_1$. By continuity, there exist two positive constants $c_1, c_2 \geq 1$ such that
$$\max_{0 \leq t \leq \tau} |g_1(t)| = c_1 |e_m^T e^{-\tau H_m} e_1| \quad \text{and} \quad \max_{0 \leq t \leq \tau} |g_2(t)| = c_2 |e_m^T \phi_1(-\tau H_m) e_1|.$$
From the above, (4.1) and (4.14), we have
$$\|E_m(\tau)\| \leq c_1\, \frac{e^{\tau\mu_2} - 1}{\tau\mu_2}\, \xi_1 \quad \text{and} \quad \|E_m(\tau)\| \leq c_2 (1 + \gamma_2)\, \xi_2,$$
where $\mu_2 = \mu_2[-A]$ and $\gamma_2$ is defined in (4.15). Provided that $c_1$ or $c_2$ is not large, $\xi_1$ or $\xi_2$ is expected to estimate the true error norm reliably. However, we should be aware that $\xi_1$ or $\xi_2$ may underestimate the true error norm considerably in the non-Hermitian case when $c_1$ or $c_2$ is large. The following example illustrates the behavior of $\xi_1$ and $\xi_2$ for $A$ non-Hermitian.

Example 2. This example is taken from [11, Example 5.3], and considers the initial boundary value problem
$$\begin{aligned}
\dot u - \Delta u + \tau_1 u_{x_1} + \tau_2 u_{x_2} &= 0 \quad \text{on } (0,1)^3 \times (0,T),\\
u(x,t) &= 0 \quad \text{on } \partial(0,1)^3 \text{ for all } t \in [0,T],\\
u(x,0) &= u_0(x), \quad x \in (0,1)^3.
\end{aligned}$$
Discretizing the Laplacian by the seven-point stencil and the first-order derivatives by central differences on a uniform mesh with meshsize $h = 1/(n+1)$ leads to the ordinary initial value problem
$$\dot u(t) = -Au(t), \quad t \in (0,T), \qquad u(0) = u_0.$$
The nonsymmetric matrix $A$ of order $N = n^3$ can be represented in the Kronecker product form
$$A = -\frac{1}{h^2}\bigl[I_n \otimes (I_n \otimes C_1) + (B \otimes I_n + I_n \otimes C_2) \otimes I_n\bigr].$$
Here $I_n$ is the identity matrix of order $n$ and
$$B = \mathrm{tridiag}(1, -2, 1), \qquad C_j = \mathrm{tridiag}(1 + \mu_j, -2, 1 - \mu_j), \quad j = 1, 2,$$
where $\mu_j = \tau_j h/2$. This is a popular test problem, as the eigenvalues of $A$ are explicitly known. Furthermore, if $|\mu_j| > 1$ for at least one $j$, the eigenvalues of $A$ are complex and lie in both the left and the right half-plane. For the spectral properties of $A$, refer to [11, 22]. As in [11], we choose $h = 1/15$, $\tau_1 = 96$, $\tau_2 = 128$, which leads to $N = 2744$, $\mu_1 = 3.2$ and $\mu_2 \approx 4.27$, and we approximate $e^{-\tau A}v$ with $\tau = h^2$ and $v = [1, 1, \ldots, 1]^T$.
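The coefficient matrix can be assembled directly from the Kronecker product form above; a minimal Python sketch using scipy.sparse (our own illustration of the construction, not code from the paper) reads:

```python
import scipy.sparse as sp

# A sketch of the Example 2 operator via Kronecker products
n = 14                                   # h = 1/(n+1) = 1/15
h = 1.0 / (n + 1)
tau1, tau2 = 96.0, 128.0
mu1, mu2 = tau1 * h / 2, tau2 * h / 2    # mu1 = 3.2, mu2 ~ 4.27

I = sp.identity(n, format="csr")
B  = sp.diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n))
C1 = sp.diags([1.0 + mu1, -2.0, 1.0 - mu1], [-1, 0, 1], shape=(n, n))
C2 = sp.diags([1.0 + mu2, -2.0, 1.0 - mu2], [-1, 0, 1], shape=(n, n))

A = -(sp.kron(I, sp.kron(I, C1))
      + sp.kron(sp.kron(B, I) + sp.kron(I, C2), I)) / h**2
print(A.shape)                           # (2744, 2744), i.e., N = n**3
```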

We compare the relative error estimates $\xi_1$ and $\xi_2$ defined by (5.5) with the true relative error norm (5.4). The convergence curves of these three quantities are depicted in Figure 2. Similarly to the case that $A$ is Hermitian, we observe that $\xi_1$ and $\xi_2$ both behave excellently and mimic the true error norm very well. In particular, the values of $\xi_2$ are almost identical to the true error norms and are more accurate than those of $\xi_1$ by about one order of magnitude.

[Figure 2 here: one panel, titled $\tau_1 = 96$, $\tau_2 = 128$, plotting $\log_{10}(\|\mathrm{error}\|/\|\mathrm{exact}\|)$ against the dimension $m$ of the Krylov subspace for the true relative error norm and the relative error estimates $\xi_1$ and $\xi_2$.]

Figure 2: Example 2: The relative error estimates and the true relative error norm for $e^{-\tau A}v$ for $A$ nonsymmetric with order $N = 2744$.

5.3 Applications to other matrix functions

Theorem 3.1 has indicated that the error expansion works for other general analytic functions, including $\sin(z)$ and $\cos(z)$; see Remark 3.2. Furthermore, as analyzed and elaborated in Remark 3.2, just as for $e^{-\tau A}v$, the first term of (3.5) is generally a good error estimate for the Arnoldi approximation to these matrix functions acting on a vector. We now confirm the effectiveness of $\xi_2$ for $\sin(A)v$ and $\cos(A)v$. We also test the behavior of $\xi_1$ for estimating $\|E_m(\tau)\|$, which has been justified at the beginning of Section 5.

Example 3. We consider the Arnoldi approximation to $\cos(-\tau A)v$. Here we choose the matrix $A$ and the vector $v$ as in Example 1 for the Hermitian case and as in Example 2 for the nonsymmetric case, respectively. Figures 3–4 (shown below) illustrate the behavior of the error estimates $\xi_1$ and $\xi_2$ when computing $\cos(-\tau A)v$ for $A$ symmetric and nonsymmetric, respectively. For $A$ symmetric, $\xi_1$ and $\xi_2$ are far from accurate and underestimate the true relative errors considerably in the first very few steps for $\tau = 0.5, 1$. However, this does not matter because, during this stage, the true relative error norms also stay around one and the Arnoldi approximations have no accuracy. After this stage, $\xi_1$ and especially $\xi_2$ soon become smooth and provide accurate estimates for the true errors as $m$ increases. In particular, the values of $\xi_2$ differ little from the true error norms for the three given $\tau$. For the error estimates of the Arnoldi approximations to $\sin(-\tau A)v$ we have very similar findings, so we omit the details.

For $A$ nonsymmetric, the situation for $\xi_1$ is not as good as in the symmetric case in the first few steps, but afterwards it is still quite accurate in estimating the true error norm for each $m$ once the latter starts becoming small. In contrast, apart from the first few iterations,

$\xi_2$ is considerably better than $\xi_1$, and it mimics the true error norm very well when the Arnoldi approximation starts converging. So we conclude that $\xi_2$ is very reliable for measuring the true relative error norm of the Arnoldi approximations to other analytic functions.

[Figure 3 here: three panels, titled $\tau = 0.1$, $\tau = 0.5$ and $\tau = 1$, each plotting $\log_{10}(\|\mathrm{error}\|/\|\mathrm{exact}\|)$ against the dimension $m$ of the Krylov subspace for the true relative error norm and the relative error estimates $\xi_1$ and $\xi_2$.]

Figure 3: Example 3: The relative error estimates and the true relative error norm for $\cos(-\tau A)v$ for $A$ symmetric with $N = 1001$.

[Figure 4 here: one panel, titled $\tau_1 = 96$, $\tau_2 = 128$, plotting $\log_{10}(\|\mathrm{error}\|/\|\mathrm{exact}\|)$ against the dimension $m$ of the Krylov subspace for the true relative error norm and the relative error estimates $\xi_1$ and $\xi_2$.]

Figure 4: Example 3: The relative error estimates and the true relative error norm for $\cos(-\tau A)v$ for $A$ nonsymmetric with $N = 2744$.

6 Conclusion

We have generalized the error expansion of the Arnoldi approximation to $e^{A}v$ to $f(A)v$ for a wide range of matrix functions. We have derived two new a priori upper bounds for the Arnoldi approximations to $e^{-\tau A}v$ and established more compact results for $A$ Hermitian. From them, we have proposed two practical a posteriori error estimates. For the matrix exponential, based on the new error expansion, we have quantitatively proved that the first term of the expansion is a reliable a posteriori estimate, which has been experimentally shown to be very accurate in estimating the true error. For other general analytic functions such as $\sin(z)$ and $\cos(z)$, we have shown why the first term of the error expansion can also be a reliable error estimate, and we have numerically confirmed the effectiveness of these estimates for the cosine and sine matrix functions frequently occurring in applications. It is worthwhile to point out that $\xi_2$ is experimentally more accurate than $\xi_1$ for the exponential, cosine and sine functions.

References

[1] M. Afanasjew, M. Eiermann, O. G. Ernst, and S. Güttel, A generalization of the steepest descent method for matrix functions, Electron. Trans. Numer. Anal., 28 (2008), pp. 206–222.
[2] M. Afanasjew, M. Eiermann, O. G. Ernst, and S. Güttel, Implementation of a restarted Krylov subspace method for the evaluation of matrix functions, Linear Algebra Appl., 429 (2008), pp. 229–314.
[3] C. Beattie, M. Embree, and J. Rossi, Convergence of restarted Krylov subspaces to invariant subspaces, SIAM J. Matrix Anal. Appl., 25 (2004), pp. 1074–1109.
[4] L. Bergamaschi and M. Vianello, Efficient computation of the exponential operator for large, sparse, symmetric matrices, Numer. Linear Algebra Appl., 7 (2000), pp. 27–45.
[5] M. A. Botchev, V. Grimm, and M. Hochbruck, Residuals, restarting and Richardson iteration for the matrix exponential, SIAM J. Sci. Comput., to appear.
[6] K. Dekker and J. G. Verwer, Stability of Runge-Kutta Methods for Stiff Nonlinear Differential Equations, North Holland, Amsterdam, 1984.
[7] F. Diele, I. Moret, and S. Ragni, Error estimates for polynomial Krylov approximations to matrix functions, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1546–1565.
[8] V. Druskin, On monotonicity of the Lanczos approximation to the matrix exponential, Linear Algebra Appl., 429 (2008), pp. 1679–1683.
[9] V. L. Druskin and L. A. Knizhnerman, Two polynomial methods of calculating functions of symmetric matrices, Comput. Math. Math. Phys., 29 (1989), pp. 112–121.
[10] V. L. Druskin and L. A. Knizhnerman, Extended Krylov subspaces: Approximation of the matrix square root and related functions, SIAM J. Matrix Anal. Appl., 19 (1998), pp. 755–771.
[11] M. Eiermann and O. G. Ernst, A restarted Krylov subspace method for the evaluation of matrix functions, SIAM J. Numer. Anal., 44 (2006), pp. 2481–2504.
[12] M. Eiermann, O. G. Ernst, and S. Güttel, Deflated restarting for matrix functions, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 621–641.
[13] J. van den Eshof and M. Hochbruck, Preconditioning Lanczos approximations to the matrix exponential, SIAM J. Sci. Comput., 27 (2006), pp. 1438–1457.
[14] A. Frommer, Monotone convergence of the Lanczos approximations to matrix functions of Hermitian matrices, Electron. Trans. Numer. Anal., 35 (2009), pp. 118–128.
[15] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation methods, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1236–1264.
[16] N. J. Higham, Functions of Matrices, SIAM, Philadelphia, PA, 2008.
[17] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911–1925.
[18] M. Hochbruck, C. Lubich, and H. Selhofer, Exponential integrators for large systems of differential equations, SIAM J. Sci. Comput., 19 (1998), pp. 1552–1574.
[19] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1991.
[20] M. Ilić, I. W. Turner, and D. P. Simpson, A restarted Lanczos approximation to functions of a symmetric matrix, IMA J. Numer. Anal., 30 (2010), pp. 1044–1061.
[21] L. Lopez and V. Simoncini, Analysis of projection methods for rational function approximation to the matrix exponential, SIAM J. Numer. Anal., 44 (2006), pp. 613–635.
[22] I. Moret and P. Novati, An interpolatory approximation of the matrix exponential based on Faber polynomials, J. Comput. Appl. Math., 131 (2001), pp. 361–380.
[23] I. Moret and P. Novati, Interpolating functions of matrices on zeros of quasi-kernel polynomials, Numer. Linear Algebra Appl., 12 (2005), pp. 337–353.
[24] R. B. Morgan, A restarted GMRES method augmented with eigenvectors, SIAM J. Matrix Anal. Appl., 16 (1995), pp. 1154–1171.
[25] R. B. Morgan, GMRES with deflated restarting, SIAM J. Sci. Comput., 24 (2002), pp. 20–37.
[26] P. Novati, A method based on Fejér points for the computation of functions of nonsymmetric matrices, Appl. Numer. Math., 44 (2003), pp. 201–224.
[27] B. N. Parlett, Global convergence of the basic QR algorithm on Hessenberg matrices, Math. Comp., 22 (1968), pp. 803–817.
[28] Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 29 (1992), pp. 209–228.
[29] Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM, Philadelphia, PA, 2003.
[30] D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 357–385.
[31] D. E. Stewart and T. S. Leyk, Error estimates for Krylov subspace approximations of matrix exponentials, J. Comput. Appl. Math., 72 (1996), pp. 359–369.
[32] G. W. Stewart, A Krylov-Schur algorithm for large eigenproblems, SIAM J. Matrix Anal. Appl., 23 (2001), pp. 601–614.
[33] T. Ström, On logarithmic norms, SIAM J. Numer. Anal., 12 (1975), pp. 741–753.
[34] H. Tal-Ezer, Spectral methods in time for parabolic problems, SIAM J. Numer. Anal., 26 (1989), pp. 1–11.
[35] H. Tal-Ezer, On restart and error estimation for Krylov approximation of w = f(A)v, SIAM J. Sci. Comput., 29 (2007), pp. 2426–2441.
[36] J. van den Eshof, A. Frommer, Th. Lippert, K. Schilling, and H. A. van der Vorst, Numerical methods for the QCD overlap operator. I. Sign-function and error bounds, Comput. Phys. Comm., 146 (2002), pp. 203–224.
[37] Q. Ye, Error bounds for the Lanczos methods for approximating matrix exponentials, SIAM J. Numer. Anal., 51 (2013), pp. 68–87.
[38] P. Zhang, Iterative methods for computing eigenvalues and exponentials of large matrices, Ph.D. thesis, Department of Mathematics, University of Kentucky, Lexington, Kentucky, 2009.