Numerische Mathematik © Springer-Verlag 1986

Numer. Math. 48, 671-697 (1986)

Improved Estimates of Statistical Regularization Parameters in Fourier Differentiation and Smoothing

A.R. Davies (1) and R.S. Anderssen (2)

(1) Department of Applied Mathematics, University College of Wales, Aberystwyth SY23 3BZ, UK
(2) Centre for Mathematical Analysis, Australian National University, G.P.O. Box 4, Canberra, A.C.T. 2601; and Division of Mathematics and Statistics, G.P.O. Box 1965, Canberra, A.C.T. 2601, Australia

Summary. We investigate the statistical methods of cross-validation (CV) and maximum-likelihood (ML) for estimating optimal regularization parameters in the numerical differentiation and smoothing of non-exact data. Various criteria for optimality are examined, and the (asymptotic) notions of strong optimality, weak optimality and suboptimality are introduced relative to these criteria. By restricting attention to certain N-dimensional Hilbert spaces of smooth and stochastic functions, where N is the number of data, we give regularity conditions on the data under which $\lambda_{CV}$, the regularization parameter predicted by CV, is strongly optimal with respect to the predictive mean-square signal error. We show that $\lambda_{ML}$ is at best weakly optimal with respect to this criterion, but is strongly optimal with respect to the innovation variance of the data. For numerical differentiation, $\lambda_{CV}$ and $\lambda_{ML}$ are both shown to be suboptimal with respect to the predictive mean-square derivative error.

Subject Classifications: AMS (MOS): 65D25, 65D10; CR: G1.2.

1. Introduction

We consider the problem of estimating the mth derivative of a data function (or signal) g(x), given only N sampled values

$$y_n = g(x_n) + \epsilon_n, \qquad n = 0, \ldots, N-1, \qquad (1.1)$$

contaminated by white noise of unknown variance $\sigma^2$, i.e.

$$E\epsilon_n = 0; \qquad E\epsilon_l \epsilon_n = \sigma^2 \delta_{ln}, \qquad l, n = 0, \ldots, N-1, \qquad (1.2)$$

where E denotes expectation, m is a non-negative integer, and the special case m = 0 describes the problem of smoothing or data graduation. We restrict attention to equally-spaced sampling points $x_n = n/N$ on the interval [0, 1). The mth derivative is denoted by $f(x) = g^{(m)}(x)$, and is to be estimated on [0, 1].
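To make the sampling model concrete, here is a minimal Python sketch of (1.1)–(1.2); the test signal g, the sample size N and the noise level $\sigma$ are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 256                        # number of samples
sigma = 0.05                   # noise level; sigma^2 is unknown to the methods below
x = np.arange(N) / N           # equally spaced points x_n = n/N on [0, 1)

g = np.sin(2 * np.pi * x)      # an illustrative smooth periodic data function
y = g + sigma * rng.standard_normal(N)   # y_n = g(x_n) + eps_n, white noise (1.2)
```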


A survey of numerical differentiation procedures for non-exact data (up to about 1972) may be found in Anderssen and Bloomfield [1]; for more recent approaches see, for example, Anderssen and de Hoog [3] and Rice and Rosenblatt [14]. Satisfactory methods for differentiating non-exact data take into account, in a natural way, the structure within the data. If g is assumed to belong to a space of smooth functions, then it is natural to model the data using splines, whereas if g is the realization of a continuous time stationary stochastic process, it is natural to use spectral analysis. In either setting it is necessary to introduce a form of stabilization, since differentiation is an unstable or ill-posed process, the degree of instability or ill-posedness increasing with m, the order of differentiation. Let $\mathscr{K}$ denote an integral operator such that $\mathscr{K}f = g$. (For differentiation $\mathscr{K}$ depends on boundary conditions and is discussed in §2 of this paper.) A stabilized (regularized) derivative can be constructed using pth-order Tikhonov regularization:

$$\min_{f \in F} \left\{ \frac{1}{N} \sum_{n=0}^{N-1} \bigl[\mathscr{K}f(x_n) - y_n\bigr]^2 + \lambda \|f\|_F^2 \right\}, \qquad (1.3)$$

where F is a suitably chosen Hilbert space with norm $\|\cdot\|_F$ which depends on a parameter p > 0 (usually an integer) called the order of regularization, and $\lambda > 0$ is the regularization parameter, which controls the trade-off between the smoothness of the solution of (1.3) and the fidelity of $\mathscr{K}f$ to the available data. Anderssen and Bloomfield [1] have shown that there is a formal equivalence between stabilization in this Hilbert space setting and stabilization in a stochastic setting by means of minimum-variance parametrized Wiener filtering (§5). Both approaches give rise to a regularization filter, or equivalently a Wiener window, which depends on $\lambda$ and p. In the related but more general context of regularizing integral equations of the first kind, the importance of choosing p correctly has been emphasized by Cullum [7], and more recently by Lukas [12, 13]. The implications of Lukas' results in the special case of numerical differentiation have not yet been fully explored, and this is left to a later study. In the present paper we assume that p has been chosen in an optimal or near-optimal fashion relative to the underlying data function g and the order of differentiation m, and proceed to examine the choice of the regularization parameter $\lambda$. (For pragmatic methods of estimating optimal $(p, \lambda)$ pairs, see Baart [5], Davies et al. [9] and Gamber [10].)

Let $f_{N;\lambda}$ denote the regularized or filtered derivative of (1.3) which approximates f, and let $g_{N;\lambda} = \mathscr{K}f_{N;\lambda}$ be the corresponding smoothed estimate of g, satisfying $g_{N;\lambda}^{(m)} = f_{N;\lambda}$. Various criteria may be used to define an optimal value of $\lambda$; the most natural is the minimization with respect to $\lambda$ of some real-valued functional $C(\lambda)$ of the signal error $g_{N;\lambda} - g$ or the derivative error $f_{N;\lambda} - f$. Let $\lambda_0$ be the minimizer of $C(\lambda)$, where, for example,

$$C(\lambda) = S(\lambda) \equiv \frac{1}{N} \sum_{n=0}^{N-1} \bigl[g_{N;\lambda}(x_n) - g(x_n)\bigr]^2 \qquad (1.4)$$

is the predictive mean-square signal error, or


$$C(\lambda) = D(\lambda) \equiv \frac{1}{N} \sum_{n=0}^{N-1} \bigl[f_{N;\lambda}(x_n) - f(x_n)\bigr]^2 \qquad (1.5)$$

is the predictive mean-square derivative error. We introduce the following levels of optimality, which are to be interpreted in the limit as $N \to \infty$:

Definition 1.1. A value of $\lambda$ is said to be

(i) strongly C-optimal of order $\beta$ if

$$\frac{C(\lambda)}{C(\lambda_0)} = 1 + O(N^{-\beta}), \quad \beta > 0, \quad \text{as } N \to \infty; \qquad (1.6a)$$

(ii) C-optimal if

$$\frac{C(\lambda)}{C(\lambda_0)} = 1 + o(1) \quad \text{as } N \to \infty, \qquad (1.6b)$$

where $o(1) \to 0$ as $N \to \infty$;

(iii) weakly C-optimal if

$$\frac{C(\lambda)}{C(\lambda_0)} = O(1) \quad \text{as } N \to \infty; \qquad (1.6c)$$

(iv) C-suboptimal if

$$\frac{C(\lambda)}{C(\lambda_0)} \to \infty \quad \text{as } N \to \infty. \qquad (1.6d)$$

Remark. We still maintain the definitions (i)–(iv) if in (1.6) $C(\lambda)$ is replaced by its expectation $EC(\lambda)$ and $\lambda_0$ is replaced by the minimizer $\hat\lambda_0$ of $EC(\lambda)$.
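In the periodic Fourier setting developed in §2, the regularization filter associated with (1.3) damps the discrete Fourier coefficients of the data by the factor $1/(1+\lambda\omega^{2\nu})$, $\nu = m + p$, and the mth derivative estimate is obtained by multiplying the filtered coefficients by $(i\omega)^m$. The sketch below is a minimal Python realization under these assumptions; the discretization details (grid and DFT conventions) are illustrative rather than taken from the paper.

```python
import numpy as np

def fourier_tikhonov(y, lam, m=1, p=1):
    """Return (g_est, f_est): the smoothed signal g_{N;lam} and the regularized
    m-th derivative f_{N;lam}, via the filter 1/(1 + lam * w^(2*nu))."""
    N = len(y)
    nu = m + p
    w = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)      # angular frequencies
    filt = 1.0 / (1.0 + lam * np.abs(w) ** (2 * nu))  # regularization filter
    G = filt * np.fft.fft(y)                          # filtered Fourier coefficients
    g_est = np.fft.ifft(G).real                       # g_{N;lam}
    f_est = np.fft.ifft((1j * w) ** m * G).real       # f_{N;lam} = g_{N;lam}^{(m)}
    return g_est, f_est
```

Larger $\lambda$ gives a smoother but more biased estimate; the criteria (1.4) and (1.5) quantify this trade-off on the sample points.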

Determining $\lambda$ by Cross-Validation

It is well known that for the smoothing problem and related problems the method of generalized cross-validation (GCV) (Craven and Wahba [6], Wahba [16]) determines a value $\lambda_{CV}$ for the regularization parameter in (1.3) which is S-optimal in the sense of (1.6b) with $C(\lambda) = ES(\lambda)$. It can be shown that $\lambda_{CV}$ minimizes

$$V_{CV}(\lambda) = \frac{\frac{1}{N}\, y^T (I - A(\lambda))^2\, y}{\left(\frac{1}{N}\, \mathrm{Trace}(I - A(\lambda))\right)^2}, \qquad (1.7)$$

where $A(\lambda)$ is the $N \times N$ influence matrix satisfying

$$g_{N;\lambda} = A(\lambda)\, y \qquad (1.8)$$

with $g_{N;\lambda} = (g_{N;\lambda}(x_0), \ldots, g_{N;\lambda}(x_{N-1}))^T$ and $y = (y_0, \ldots, y_{N-1})^T$. Working in an appropriate Sobolev space for the spline smoothing problem, Craven and


Wahba have shown, in particular, that under certain regularity assumptions on the data function g,

$$\hat\lambda_{CV} = \hat\lambda_0\,(1 + o(1)), \qquad (1.9)$$

where $o(1) \to 0$ as $N \to \infty$, and $\hat\lambda_0$, $\hat\lambda_{CV}$ denote the minimizers of $ES(\lambda)$, $EV_{CV}(\lambda)$, respectively. Equation (1.9) holds also for the general case of integral equations of the first kind [16] and for the numerical differentiation problem. For the latter problem, with $\nu = m + p$, it can be shown that under strong regularity conditions on g (see §3)

$$\hat\lambda_{CV},\ \hat\lambda_0 \sim \frac{C_\nu}{N^{2\nu/(4\nu+1)}}\,[1 + o(1)] \quad \text{as } N \to \infty, \qquad (1.10)$$

where $C_\nu$ is a constant depending on $\sigma^2$, $\nu$ and g. Under weaker conditions (see §5) the asymptotic behaviour is

$$\hat\lambda_{CV},\ \hat\lambda_0 \sim \frac{C'_\nu}{N^{2\nu/(2\nu+1)}}\,[1 + o(1)]. \qquad (1.11)$$

(See also Lukas [13].) In either case $\lambda_{CV}$ is S-optimal.
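Because the influence matrix $A(\lambda)$ of (1.8) is diagonalized by the discrete Fourier transform in the periodic setting, the quadratic form and trace in (1.7) reduce to sums over the filter weights. The following is a sketch under that assumption:

```python
import numpy as np

def v_cv(y, lam, m=1, p=1):
    """V_CV(lambda) of (1.7), with A(lambda) diagonal in the Fourier basis:
    eigenvalues a_q = 1/(1 + lam * w_q^(2*nu))."""
    N = len(y)
    nu = m + p
    w = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
    a = 1.0 / (1.0 + lam * np.abs(w) ** (2 * nu))      # eigenvalues of A(lambda)
    Y = np.fft.fft(y) / np.sqrt(N)                     # unitary DFT of the data
    num = np.sum((1.0 - a) ** 2 * np.abs(Y) ** 2) / N  # (1/N) y^T (I - A)^2 y
    den = (np.sum(1.0 - a) / N) ** 2                   # ((1/N) Tr(I - A))^2
    return num / den

# lambda_CV by a crude grid search (illustrative only):
# lams = np.logspace(-12, 0, 200)
# lam_cv = min(lams, key=lambda l: v_cv(y, l))
```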

Determining $\lambda$ by Maximum-Likelihood (ML)

In their stochastic approach to numerical differentiation, Anderssen and Bloomfield [1, 2] determine a near-optimal value $\lambda_{ML}$ for $\lambda$ by maximizing the Whittle likelihood for parameters in a spectral density model for the data. For the spline smoothing problem and its generalizations, Wahba [17] has extended the usual notion of ML estimation to the case of improper prior distributions (called GML estimation) and has thereby effected a comparison of the behaviour and properties of $\hat\lambda_{ML}$ and $\hat\lambda_{CV}$. Interpreted for the differentiation problem, Wahba's results show:

(1) $\lambda_{ML}$ is the minimizer of

$$V_{ML}(\lambda) = \frac{y^T (I - A(\lambda))\, y}{\bigl[\det^+ (I - A(\lambda))\bigr]^{1/(N-r)}}, \qquad (1.12)$$

where r is the multiplicity of the zero eigenvalues (if any) of $I - A(\lambda)$, and $\det^+(\cdot)$ denotes the product of non-zero eigenvalues;

(2) under both strong and weak regularity assumptions on g, the minimizer of $EV_{ML}(\lambda)$ satisfies

$$\hat\lambda_{ML} \sim \frac{C''_\nu}{N^{2\nu/(2\nu+1)}}\,[1 + o(1)] \quad \text{as } N \to \infty, \qquad (1.13)$$

where $C''_\nu$ depends on $\sigma^2$, g and $\nu = m + p$. By comparison with (1.10) and (1.11), it follows that $\hat\lambda_{ML}$ undersmooths with respect to $\hat\lambda_{CV}$ and $\hat\lambda_0$ as $N \to \infty$ when the underlying signal is very smooth, whereas the estimates are compatible when the signal is rougher. In particular,


(3) under strong regularity

$$\frac{ES(\hat\lambda_{ML})}{ES(\hat\lambda_0)} \to \infty \quad \text{as } N \to \infty, \qquad (1.14)$$

whereas under weak regularity

$$\frac{ES(\hat\lambda_{ML})}{ES(\hat\lambda_0)} = O(1) \quad \text{as } N \to \infty. \qquad (1.15)$$

Thus $\lambda_{ML}$ is weakly S-optimal for rough signals but S-suboptimal for smooth signals.
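The GML criterion (1.12) reduces in the same diagonal setting to a ratio of a weighted sum of squared Fourier coefficients to a geometric mean of the non-zero eigenvalues of $I - A(\lambda)$; the zero eigenvalue at the constant ($\omega = 0$) mode is excluded via $\det^+$. A sketch under the same assumptions as before:

```python
import numpy as np

def v_ml(y, lam, m=1, p=1):
    """V_ML(lambda) of (1.12) with a diagonal influence matrix."""
    N = len(y)
    nu = m + p
    w = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
    b = lam * np.abs(w) ** (2 * nu) / (1.0 + lam * np.abs(w) ** (2 * nu))
    Y = np.fft.fft(y) / np.sqrt(N)                 # unitary DFT of the data
    num = np.sum(b * np.abs(Y) ** 2)               # y^T (I - A(lambda)) y
    nz = b > 0                                     # exclude the r zero eigenvalues
    r = np.count_nonzero(~nz)                      # here r = 1 (the w = 0 mode)
    log_det_plus = np.sum(np.log(b[nz]))           # log det^+ (I - A(lambda))
    return num / np.exp(log_det_plus / (N - r))
```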

Objectives of the Present Paper

We first examine (§§3 and 4) the asymptotic behaviour in (1.9), (1.10) and (1.13) more closely, and obtain improved order-of-convergence estimates for the $o(1)$ terms at the cost of slightly stricter regularity conditions on g. The key idea is to use Euler–Maclaurin sum formulae to estimate moments of discrete regularization filters (§3). We give conditions under which $\lambda_{CV}$ is strongly S-optimal.

Secondly, in §6 we abandon the predictive mean-square signal error minimization criterion (1.4), which is the natural criterion for defining optimal $\lambda$ when m = 0, and study instead the predictive mean-square derivative error minimization criterion (1.5), which is the natural criterion when m > 0. If $\lambda_0^{(m)}$ denotes the minimizer of $ED(\lambda)$, we show that under strong regularity of g

$$\hat\lambda_0^{(m)} \sim \frac{C_\nu^{(m)}}{N^{2\nu/(4\nu+2m+1)}}\,[1 + o(1)] \quad \text{as } N \to \infty, \qquad (1.16)$$

whereas under weak regularity

$$\hat\lambda_0^{(m)} = O(N^{-2\nu/(2\nu+2m+1)}). \qquad (1.17)$$

Thus both $\hat\lambda_{CV}$ and $\hat\lambda_{ML}$ always undersmooth with respect to $\hat\lambda_0^{(m)}$ as $N \to \infty$ when m > 0. In particular, we show that $\lambda_{CV}$ and $\lambda_{ML}$ are both D-suboptimal.

Finally, in §5 we show in the stationary stochastic setting that $\lambda_{ML}$ minimizes the innovation variance $\sigma_1^2$ of the data as $N \to \infty$. This variance is the mean-square projection error of the linear regression on past data which gives rise to the innovation sequence underlying the data. Given the spectral density model for the data inherent in the work of Anderssen and Bloomfield [2], we show that the innovation variance $\sigma_1^2 = \sigma_1^2(\lambda)$ is a function of $\lambda$. Thus, whereas $\lambda_{CV}$ tracks the minimizer of $S(\lambda)$ and is (sometimes) strongly S-optimal, we show that $\lambda_{ML}$ tracks the minimizer of $\sigma_1^2(\lambda)$ and is strongly $\sigma_1^2$-optimal.

In order to simplify the analysis and to present a unified treatment of both smooth and stochastic functions, we restrict our analysis to N-dimensional approximating Hilbert spaces which may be interpreted either as trigonometric subspaces of a parent Sobolev space with periodic end conditions, or as


subspaces of a Hilbert space of continuous time stationary stochastic processes in which all limit operations are construed in a mean-square sense (see, for example, Hannan [11]). These approximating spaces become dense in their respective parent spaces (with respect to the appropriate norm) as $N \to \infty$. We discuss these approximating spaces and the regularization filters associated with them in §2.

2. Approximating Spaces and Regularization Filters

Consider a Hilbert space $H_s = \{\phi : \phi^{(r)} \in \mathscr{A},\ r = 0, 1, \ldots, s\}$, with inner product and norm given by

$$(\phi, \psi)_s = (\phi^{(s)}, \psi^{(s)})_{\mathscr{A}}, \qquad \|\phi\|_s = \|\phi^{(s)}\|_{\mathscr{A}}, \qquad (2.1)$$

where we specify two possibilities for $\mathscr{A}$: either

(i) $H_s$ is smooth, by which we mean that $(\cdot,\cdot)_{\mathscr{A}}$ and $\|\cdot\|_{\mathscr{A}}$ denote the $\mathscr{L}^2[0,1]$ inner product and norm, respectively; or

(ii) $H_s$ is stochastic, by which we mean $\mathscr{A}$ is the space of continuous time stationary stochastic processes which are pointwise mean-square continuous. The inner product is defined by

$$(\phi, \psi)_{\mathscr{A}} = \int_0^1 E[\phi(x)\,\psi(x)]\,dx$$

and the norm by

$$\|\phi\|_{\mathscr{A}}^2 = \int_0^1 E[\phi(x)]^2\,dx. \qquad (2.2)$$

The probability measures associated with the random variables $\phi(x)$ and $\psi(x)$ are assumed to be such that the integrals in (2.2) exist in the conventional Lebesgue sense. $\phi^{(r)}(x)$ denotes the mean-square derivative of order r at x; a sufficient condition for its existence is that the autocovariance function $\gamma(h) = E\,\phi(x+h)\,\phi(x)$ be 2r-times continuously differentiable at h = 0. For convenience of notation in what follows, we shall often omit the expectation symbol E when treating real-valued functionals of $\phi$ in a stochastic space $H_s$. In all cases the meaning should be clear; for example, in (1.3), when F is stochastic, the sum of squared residuals should strictly be replaced by its expectation. For any $\phi \in H_s$ we define the sequence of operators $\mathscr{D}^{-r}$, $r = 1, 2, \ldots$, by

$$\mathscr{D}^{-1}\phi(x) = \int_0^x \bigl[\phi(t) - \hat\phi_0\bigr]\,dt, \qquad \mathscr{D}^{-r}\phi = \mathscr{D}^{-1}\bigl(\mathscr{D}^{-r+1}\phi\bigr), \quad r = 2, 3, \ldots, \qquad (2.3)$$

with the property that $\mathscr{D}^{-r}\phi(0) = \mathscr{D}^{-r}\phi(1) = 0$, and $\mathscr{D}^{-r}\phi(x)$ is continuous or mean-square continuous for all $x \in [0, 1]$. In terms of the Fourier series of $\phi$:

…

… for integers $k \ge 0$ and $l > 0$. We also introduce the convenient notation

$$N_{\lambda\nu} = N\lambda^{1/(2\nu)}. \qquad (3.12)$$


Lemma 3.1. For integers $k \ge 0$, $l > 0$, satisfying $k < \nu l$, we have

$$\Omega_{kl}(\lambda) = \frac{\mu_{kl}}{\lambda^{(2k+1)/(2\nu)}}\,\bigl[1 - O(N_{\lambda\nu}^{-2(\nu l - k)+1})\bigr] \qquad (3.13)$$

as $N \to \infty$, provided also $N_{\lambda\nu} \to \infty$. The constant $\mu_{kl}$ is given by

$$\mu_{kl} = \frac{1}{\pi}\int_0^\infty \frac{t^{2k}\,dt}{(1+t^{2\nu})^l}. \qquad (3.14)$$

Proof. Defining

$$\Phi(\omega) = \frac{\omega^{2k}}{(1+\lambda\omega^{2\nu})^l}$$

and using the Euler–Maclaurin sum formula for the right-hand side of (3.10), we find

$$\Omega_{kl}(\lambda) = \frac{1}{\pi}\int_0^{\omega_M} \Phi(\omega)\,d\omega + 2\sum_{r=1}^{\infty} \frac{B_{2r}}{(2r)!}\,\Phi^{(2r-1)}(\omega_M), \qquad (3.15)$$

where the $B_{2r}$ are Bernoulli numbers and we have used the fact that $\Phi(\omega)$ is an even function of $\omega$ and $\Phi^{(2r-1)}(\omega)$ is odd. Using the substitution $t = \lambda^{1/(2\nu)}\omega$, we have, since $\omega_M = \pi N$,

$$\frac{1}{\pi}\int_0^{\omega_M} \Phi(\omega)\,d\omega = \frac{1}{\pi\,\lambda^{(2k+1)/(2\nu)}}\left[\int_0^{\infty} \frac{t^{2k}\,dt}{(1+t^{2\nu})^l} - \int_{\pi N_{\lambda\nu}}^{\infty} \frac{t^{2k}\,dt}{(1+t^{2\nu})^l}\right]. \qquad (3.16)$$

Assuming that $N_{\lambda\nu} \to \infty$ as $N \to \infty$, the tail integral behaves asymptotically as

$$\int_{\pi N_{\lambda\nu}}^{\infty} t^{-2(\nu l - k)}\,dt = \frac{(\pi N_{\lambda\nu})^{-2(\nu l - k)+1}}{2(\nu l - k) - 1}. \qquad (3.17)$$

Finally, we show that the tail integral dominates the remainder term given by the infinite series in (3.15) as $N \to \infty$ and $N_{\lambda\nu} \to \infty$. As $\omega \to \infty$ and $\lambda^{1/(2\nu)}\omega \to \infty$, we have

$$\Phi(\omega) \sim \lambda^{-l}\,\omega^{-2(\nu l - k)}, \qquad \Phi^{(2r-1)}(\omega) \sim \frac{[2(\nu l - k + r - 1)]!}{[2(\nu l - k) - 1]!}\,\lambda^{-l}\,\omega^{-2(\nu l - k + r)+1}.$$

Since, for $|\omega| > 1$, we have

$$\sum_{r=1}^{\infty} \omega^{-2(\nu l - k + r)+1} = \frac{\omega^{-2(\nu l - k)-1}}{1 - \omega^{-2}},$$

it follows by majorization that the remainder term in (3.15) is

$$\lambda^{-l}\,O(N^{-2(\nu l - k)-1}) = \lambda^{-(2k+1)/(2\nu)}\,O\bigl(N^{-2}\,N_{\lambda\nu}^{-2(\nu l - k)+1}\bigr).$$


The lemma now follows immediately upon substitution of (3.16) and (3.17) in (3.15).
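Both the constant $\mu_{kl}$ of (3.14) and the tail estimate (3.17) are straightforward to verify numerically; the following sketch does so by quadrature, with the values of $k$, $l$, $\nu$ and the truncation point chosen purely for illustration.

```python
import numpy as np
from scipy.integrate import quad

k, l, nu = 0, 2, 2        # sample indices with k < nu*l, as required by Lemma 3.1
T = 50.0                  # truncation point, standing in for pi * N_{lambda nu}

integrand = lambda t: t ** (2 * k) / (1 + t ** (2 * nu)) ** l

mu_kl = quad(integrand, 0, np.inf)[0] / np.pi        # mu_kl of (3.14)
tail = quad(integrand, T, np.inf)[0]                 # exact tail integral
tail_est = T ** (-2 * (nu * l - k) + 1) / (2 * (nu * l - k) - 1)   # (3.17)

print(mu_kl)              # the constant in the leading term of (3.13)
print(tail, tail_est)     # agree to leading order as T grows
```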

Lemma 3.2. Suppose

$$|\hat g(\omega)| = O(|\omega|^{-\alpha}), \quad \alpha > 2, \quad \text{as } |\omega| \to \infty. \qquad (3.18)$$

Then for integers $0 \le k \le \min\{\alpha, 2\nu l\}$, $l > 0$, we have as $N \to \infty$ and $N_{\lambda\nu} \to \infty$,

$$\sigma_{kl}(\lambda) = |||g|||^2_{kl}(\lambda) + O\bigl(N^{-(\alpha-k)+1}\,N_{\lambda\nu}^{\,k+1}\bigr), \qquad (3.19)$$

where

$$|||g|||^2_{kl}(\lambda) = \int_0^\infty \frac{\omega^{2k}\,|\hat g(\omega)|^2}{(1+\lambda\omega^{2\nu})^l}\,d\omega. \qquad (3.20)$$

Proof. From the relation between discrete and exact Fourier coefficients, we have from (3.18) and (3.6), as $N \to \infty$,

$$\hat g_{N,q} = \hat g_q + O\left(\sum_{r=1}^{\infty}\bigl[(Nr - q)^{-\alpha} + (Nr + q)^{-\alpha}\bigr]\right) = \hat g_q + O(N^{-\alpha}),$$
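The relation invoked here is the standard aliasing identity for sampled Fourier coefficients: the discrete coefficient $\hat g_{N,q}$ collects the exact coefficients at frequencies $q \pm N, q \pm 2N, \ldots$, whence the $O(N^{-\alpha})$ error under (3.18). A small numerical illustration (the test frequency is chosen for illustration only):

```python
import numpy as np

N, q = 16, 3
x = np.arange(N) / N
g_high = np.cos(2 * np.pi * (q + N) * x)   # energy at the exact frequency q + N
c = np.fft.fft(g_high) / N                 # discrete Fourier coefficients
print(np.round(np.abs(c), 3))              # mass appears in bins q and N - q: aliasing
```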

… it follows that the order of convergence in (4.10) is best possible. We conclude this section by showing that $\lambda_{CV}$ is strongly S-optimal; more precisely,

$$\frac{ES(\lambda_{CV})}{ES(\hat\lambda_0)} = 1 + O(N^{-2\nu/(4\nu+1)}) \qquad (4.14)$$

under either regularity condition (3.8) or (3.36). From (3.27) and (3.28) we deduce that

$$ES(\lambda) = \lambda^2\,\sigma_{2\nu,2}(\lambda) + \frac{\sigma^2}{N}\,\Omega_{02}(\lambda), \qquad (4.15)$$

and hence, from Lemmas 3.1 and 3.2 with $\alpha > 3\nu$,

$$ES(\lambda) = \lambda^2\,[1 + A_N(\lambda)]\,|||g|||^2_{2\nu} + \frac{\sigma^2}{N}\,\frac{K_\nu}{\lambda^{1/(2\nu)}}\,[1 + B_N(\lambda)] \qquad (4.16)$$

with

$$A_N(\lambda) = -\theta(\lambda) + O\bigl(N^{-\alpha+1}\,N_{\lambda\nu}^{2(\nu+1)}\bigr), \qquad B_N(\lambda) = -O\bigl(N_{\lambda\nu}^{-4\nu+1}\bigr).$$

Substitution of either (3.9) or (3.37) into (4.16) then reveals

$$ES(\hat\lambda_0) = (4\nu+1)\left(\frac{K_\nu\,\sigma^2}{4\nu N}\right)^{4\nu/(4\nu+1)} |||g|||^{2/(4\nu+1)}_{2\nu}\,\bigl[1 + O(N^{-2\nu/(4\nu+1)})\bigr].$$
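Neglecting the error terms $A_N$ and $B_N$ in (4.16) leaves the model $ES(\lambda) \approx \lambda^2 G + (\sigma^2/N)\,K_\nu\,\lambda^{-1/(2\nu)}$ with $G = |||g|||^2_{2\nu}$; setting its derivative to zero gives $\hat\lambda_0 = (K_\nu \sigma^2 / 4\nu G N)^{2\nu/(4\nu+1)}$, the $N^{-2\nu/(4\nu+1)}$ rate of (1.10), and substituting back reproduces the leading term above. The sketch checks the closed form against direct numerical minimization (all parameter values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

nu, G, sigma2, K, N = 2.0, 1.0, 1e-2, 0.25, 10_000   # illustrative values
c = sigma2 * K / N

es = lambda lam: lam ** 2 * G + c * lam ** (-1.0 / (2 * nu))   # model ES(lambda)

lam_closed = (c / (4 * nu * G)) ** (2 * nu / (4 * nu + 1))     # stationary point
lam_num = minimize_scalar(es, bounds=(1e-12, 1.0), method="bounded").x
print(lam_closed, lam_num)   # the two minimizers agree
```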

…

Finally, assuming the regularity condition

$$\sum_{q=0}^{\infty} \omega_q^{6\nu+2m}\,|\hat g_q|^2 < \infty, \qquad (6.6)$$


it follows from applying Lemmas 3.1 and 3.2 in (6.4) and (6.5) that (6.4) may be rewritten in the form

$$\hat\lambda^{(4\nu+2m+1)/(2\nu)} = \frac{\mu_{\nu+m,3}\,\sigma^2}{|||g|||^2_{2\nu+m}\,N}\,\Bigl(1 + O(\hat\lambda) + O\bigl(N^{-\nu+1}\,N_{\lambda\nu}^{2(\nu+m+1)}\bigr)\Bigr).$$

Then, from the observation $\mu_{\nu+m,3} = (2m+1)\,\mu_{m2}/(4\nu)$, and using the same methods as previously, we establish

Theorem 6.1. Under the regularity condition (6.6) and $f_{N;\lambda} \in H^0_{p,N}$, the minimizer $\hat\lambda_0^{(m)}$ of $ED(\lambda)$ may be written

$$\hat\lambda_0^{(m)} = \left(\frac{(2m+1)\,\mu_{m2}\,\sigma^2}{4\nu\,|||g|||^2_{2\nu+m}\,N}\right)^{2\nu/(4\nu+2m+1)} \bigl[1 + O(N^{-2\nu/(4\nu+2m+1)})\bigr] \quad \text{as } N \to \infty. \qquad (6.7)$$
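The moment identity $\mu_{\nu+m,3} = (2m+1)\,\mu_{m2}/(4\nu)$ used above follows from a single integration by parts in (3.14), since $\frac{d}{dt}(1+t^{2\nu})^{-2} = -4\nu\,t^{2\nu-1}(1+t^{2\nu})^{-3}$, and can be confirmed by quadrature; the sample $m$, $\nu$ below are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def mu(k, l, nu):
    """mu_{kl} of (3.14), computed by quadrature."""
    return quad(lambda t: t ** (2 * k) / (1 + t ** (2 * nu)) ** l, 0, np.inf)[0] / np.pi

m, nu = 1, 3
lhs = mu(nu + m, 3, nu)
rhs = (2 * m + 1) * mu(m, 2, nu) / (4 * nu)
print(lhs, rhs)   # equal up to quadrature tolerance
```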

Remark. If the regularity condition (6.6) is weakened to

$$\sum_{q=0}^{\infty} \omega_q^{4\nu+2m}\,|\hat g_q|^2 < \infty, \qquad (6.8)$$

then the $O(\cdot)$ term in (6.7) may be replaced by $o(1)$ as $N \to \infty$.

To investigate the optimality of $\lambda_{CV}$ and $\lambda_{ML}$ under the criterion $D(\lambda)$, we see from (6.3) that

$$ED(\lambda) = \lambda^2\,\sigma_{2\nu+m,2}(\lambda) + \frac{\sigma^2}{N}\,\Omega_{m2}(\lambda) + \lambda\,\Omega_{2\nu+2m,1}(\lambda)\cdot O\bigl(N^{-(\nu-m)}\bigr), \qquad (6.9)$$

from which, under condition (6.8) or stronger conditions, we deduce

$$ED(\hat\lambda_0^{(m)}) = O\bigl(N^{-4\nu/(4\nu+2m+1)}\bigr), \qquad (6.10)$$

whereas

$$ED(\hat\lambda_{CV}) = O\bigl(N^{-2(2\nu-m)/(4\nu+1)}\bigr)$$

and

$$ED(\hat\lambda_{ML}) = O\bigl(N^{-2(\nu-m)/(2\nu+1)}\bigr)$$

as $N \to \infty$.

Hence

$$\frac{ED(\hat\lambda_{CV})}{ED(\hat\lambda_0^{(m)})} = O\bigl(N^{2m(2m+1)/(4\nu+1)(4\nu+2m+1)}\bigr) \to \infty \qquad (6.11)$$

and

$$\frac{ED(\hat\lambda_{ML})}{ED(\hat\lambda_0^{(m)})} = O\bigl(N^{(4m+2)(2m+p)/(4\nu+2m+1)(2\nu+1)}\bigr) \to \infty \quad \text{as } N \to \infty. \qquad (6.12)$$
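The divergence exponents in (6.11) and (6.12) are explicit in $m$ and $p$ (recall $\nu = m + p$); evaluating them for a few pairs shows how D-suboptimality worsens with the order of differentiation. A short illustrative computation:

```python
# Exponents of N in the ratios (6.11) and (6.12); nu = m + p.
for m in (1, 2):
    for p in (1, 2):
        nu = m + p
        e_cv = 2 * m * (2 * m + 1) / ((4 * nu + 1) * (4 * nu + 2 * m + 1))
        e_ml = (4 * m + 2) * (2 * m + p) / ((4 * nu + 2 * m + 1) * (2 * nu + 1))
        print(f"m={m}, p={p}: CV ratio ~ N^{e_cv:.3f}, ML ratio ~ N^{e_ml:.3f}")
```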

Under the weaker condition

$$\sum_{q=0}^{\infty} \omega_q^{2(\nu+m)}\,|\hat g_q|^2 < \infty,$$

(6.10) is no longer valid. From (6.9) we have

$$ED(\lambda)\;\ldots$$