Block momentum-LMS algorithm based on the method of parallel tangents

O. Tanrikulu, J.A. Chambers and A.G. Constantinides

(c) IEE, 1997. IEE Proceedings online no. 19971096. Paper first received 6th June and in revised form 5th December 1996. The authors are with the Signal Processing and Digital Systems Section, Department of Electrical and Electronic Engineering, Imperial College of Science, Technology and Medicine, London SW7 2BT, UK.

Indexing terms: Block adaptive algorithms, Momentum, LMS, PARTAN, Convergence, Stability

Abstract: Based on the method of parallel tangents, the block-LMS algorithm is modified, and the block momentum-LMS algorithm is proposed. The new algorithm has lower computational complexity than the LMS algorithm. It converges significantly faster than the block-LMS algorithm when the input signal is coloured. The time constant, mean and mean-square convergence conditions and the misadjustment of the proposed algorithm are derived. As a special case, an accurate mean-square convergence condition is obtained for the block-LMS algorithm. Extension to the frequency domain is also discussed. Comprehensive experimental results on system identification and channel equalisation are presented that validate the theoretical findings.

1 Introduction

In the design of stochastic gradient-descent adaptive filtering algorithms, there is a tradeoff between the speed of convergence and the steady-state misadjustment [1-4]. The performance of the well known least mean square (LMS) algorithm deteriorates when the input signal is highly coloured [5, 6]. Techniques based on the exact solution of the least squares (LS) problem over a window of data, such as the exponentially weighted or sliding-window recursive LS (RLS), are less sensitive in such operating conditions. However, their computational complexities may not be affordable if a large number of adaptive weights is to be updated, such as in acoustic echo cancellation [7] or in adaptive channel equalisation [8]. Hence, there is further need to develop computationally efficient adaptive algorithms for some signal processing applications. In this paper, we use the method of parallel tangents (PARTAN) [9, 11] to accelerate the convergence speed of the block-LMS algorithm for highly coloured input signals. Encouraging results have been reported in [12] that show an increased learning rate while training

multilayer perceptrons with a version of PARTAN. The application of PARTAN in adaptive filtering is also recent. Tao [13] proposed the SGA-PARTAN algorithm, which has similar performance to the RLS algorithm. The computational complexity of SGA-PARTAN is less than that of the standard RLS algorithm but higher than that of fast RLS versions. It is well known that, upon converting the LMS update equation to the block-LMS update, the misadjustment is decreased by a factor equal to the block size, but the speed of convergence is also reduced by the same amount [14]. Therefore, it seems that the only apparent advantage of the block-LMS algorithm over the LMS algorithm is its efficient implementation using fast algorithms [15]. The block momentum-LMS (BMLMS) algorithm proposed in this paper yields an improvement over the convergence speed of the block-LMS algorithm without a significant increase in computational complexity. The theoretical and experimental results presented indicate that the BMLMS algorithm can be tuned to have similar convergence speed and misadjustment to the LMS algorithm, although it may require fewer arithmetic computations depending on the block size. Our analysis also leads to a better understanding of the mean-square convergence of the block-LMS algorithm.

Fig. 1 Adaptive system identification

2 Method of PARTAN

Consider the finite impulse response (FIR) adaptive filter shown in Fig. 1. The input signal, desired signal, additive noise and the error signal are, respectively, denoted by u(n), d(n), w(n) and e(n), where n is the discrete-time index. During the derivations, column vectors and matrices are, respectively, denoted by lower-case and upper-case bold letters, and (.)^T is the transposition operation. In the LMS algorithm, the cost function that defines the error performance surface (EPS) is

J(n) = E{e^2(n)}    (1)

and the corresponding stochastic-gradient update equation for real signals is

h(n + 1) = h(n) + μe(n)u(n)    (2)

e(n) = d(n) − h^T(n)u(n)    (3)

where u(n) = [u(n) u(n − 1) ... u(n − N + 1)]^T is the regressor vector and μ is the step size. Moreover, we have d(n) = e_opt(n) + h_opt^T u(n), where h_opt is the optimal weight vector and e_opt(n) is the error signal at h_opt. The eigenvalue spread of the input signal is defined as

χ(R) = λ_max/λ_min    (4)

where R = E{u(n)u^T(n)} is the N × N autocorrelation matrix of u(n), and λ_min and λ_max are, respectively, the minimum and the maximum eigenvalues of R. If χ(R) = 1, u(n) is white and the convergence of the LMS algorithm is fast. On the other hand, if χ(R) ≫ 1 then u(n) is highly coloured and LMS convergence is more difficult [5].

The original statement in [9] suggested that, in the case of two search variables on a quadratic EPS, the exact optimum can be located after only three one-dimensional searches. In a later study, Shah et al. [10] showed that, by using a particular version called the continued-PARTAN, the optimum in an N-dimensional search space can be located after a maximum of 2N − 1 steps. The convergence behaviours of the PARTAN and best-step steepest descent (BSSD) methods for the same EPS and initial condition are illustrated in Fig. 2 (in best-step steepest descent, the optimum step size is used at each iteration). For an arbitrary initial point, h(0), a one-dimensional BSSD search is conducted in order to find the minimum along the negative gradient vector direction. This point is denoted by h(1). Then, the same operation is repeated and h(2) is obtained. At the last step, there is no need to evaluate the gradient vector: a one-dimensional search conducted along the line joining h(0) and h(2) is sufficient to locate the global minimum, h_opt, exactly. This is shown explicitly in Fig. 2, where the BSSD search follows a different route after h(2) and takes many more iterations to reach h_opt.

Fig. 2 Convergence behaviours of BSSD and PARTAN on equimagnitude contours of the EPS
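As a concrete numerical check of this geometric property, the short sketch below runs two exact line searches followed by a search along the line joining h(0) and h(2) on a two-dimensional quadratic EPS; the matrix R, the minimiser h_opt and the starting point are our own illustrative test values, not taken from the paper.

```python
import numpy as np

R = np.array([[2.0, 0.9],
              [0.9, 1.0]])                 # illustrative positive-definite Hessian
h_opt = np.array([1.0, -2.0])              # minimiser of the quadratic EPS

def grad(h):
    # gradient of J(h) = 0.5 * (h - h_opt)^T R (h - h_opt)
    return R @ (h - h_opt)

def line_search(h, direction):
    # exact minimiser of the quadratic along h + t*direction
    t = -(direction @ grad(h)) / (direction @ R @ direction)
    return h + t * direction

h0 = np.array([5.0, 4.0])                  # arbitrary initial point
h1 = line_search(h0, -grad(h0))            # first one-dimensional BSSD search
h2 = line_search(h1, -grad(h1))            # second one-dimensional BSSD search
h3 = line_search(h2, h2 - h0)              # PARTAN step along the h(0)-h(2) line
print(np.allclose(h3, h_opt))              # True: optimum located in 3 searches
```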

In [6], the PARTAN-LMS algorithm is developed by utilising the search mechanics described above. This algorithm has a split update equation described by

h(2n) = h(2n − 1) + μe(2n − 1)u(2n − 1)    (5)

h(2n + 1) = h(2n) + β[h(2n) − h(2n − 3)]    (6)

where β > 0, e(n) is as shown in eqn. 3, and β[h(2n) − h(2n − 3)] is the momentum term. Another related algorithm, momentum-LMS, was proposed [16, 17] by obtaining a stochastic approximation of the conjugate gradient method [11, 18]. This algorithm has the update equation

h(n + 1) = h(n) + μe(n)u(n) + β[h(n) − h(n − 1)]    (7)

in which β[h(n) − h(n − 1)] is the momentum term; momentum-LMS is more computationally complex than the PARTAN-LMS and LMS algorithms. The momentum-LMS algorithm is a special case of the high-order algorithms discussed in [19-21].
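A minimal sketch of the momentum-LMS recursion of eqn. 7 is given below, assuming a real-valued system-identification setting; the test channel, the signal length and the values of μ and β are illustrative placeholders rather than values from the paper.

```python
import numpy as np

def momentum_lms(u, d, N, mu, beta):
    """Momentum-LMS of eqn. 7 for an N-tap real FIR adaptive filter."""
    h = np.zeros(N)
    h_prev = np.zeros(N)
    e = np.zeros(len(u))
    for n in range(N, len(u)):
        un = u[n - N + 1:n + 1][::-1]                      # [u(n) ... u(n-N+1)]
        e[n] = d[n] - h @ un                               # eqn. 3
        h_new = h + mu * e[n] * un + beta * (h - h_prev)   # eqn. 7
        h_prev, h = h, h_new
    return h, e

# Example: identify a short FIR channel driven by white noise.
rng = np.random.default_rng(0)
u = rng.standard_normal(5000)
h_true = np.array([1.0, -0.5, 0.25, 0.1])                  # hypothetical channel
d = np.convolve(u, h_true)[:len(u)]
h_hat, e = momentum_lms(u, d, N=4, mu=0.01, beta=0.5)
```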

3 BMLMS adaptive algorithm

When χ(R) ≫ 1, the PARTAN-LMS and momentum-LMS algorithms exhibit faster convergence than the LMS algorithm [22]. However, eqn. 6 and the 'momentum term' in eqn. 7 cause drift in the adaptive weights around the optimum and increase the misadjustment. Furthermore, these algorithms are more sensitive to noise than the LMS algorithm [22]. These drawbacks appear since (i) sample gradient estimates are used, which do not correspond to the true steepest-descent direction; (ii) constant μ and β do not necessarily ensure an optimal step in a particular direction. Possible refinements of eqns. 5, 6 and 7 have not yet been explored, and such a refinement is the main contribution of this paper. One possibility to enhance the performance is to obtain a more accurate gradient-descent direction by using a block average of the sample gradient vector. After substituting eqn. 5 into eqn. 6:

h(2n + 1) = h(2n − 1) + μ(1 + β)e(2n − 1)u(2n − 1) + β[h(2n − 1) − h(2n − 3)]    (8)

from which we synthesise the block update equation of the BMLMS algorithm as

h(nL + L) = h(nL) + (μ(1 + β)/L) Σ_{i=0}^{L−1} e(nL + i)u(nL + i) + β[h(nL) − h(nL − L)]    (9)

where L is the block size, β[h(nL) − h(nL − L)] is the momentum term, and

e(nL + i) = d(nL + i) − h^T(nL)u(nL + i),  i = 0, 1, ..., L − 1    (10)
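The following sketch implements the block recursion of eqns. 9 and 10 directly, under the same illustrative conventions as the momentum-LMS example above; the 1/L averaging of the gradient shown here is our reading of the block normalisation, chosen to be consistent with eqn. 18 below.

```python
import numpy as np

def bmlms(u, d, N, L, mu, beta):
    """BMLMS of eqns. 9-10; all errors within a block are formed with h(nL)."""
    h = np.zeros(N)
    h_prev = np.zeros(N)
    for nL in range(N, len(u) - L + 1, L):                # block start indices
        g = np.zeros(N)
        for i in range(L):
            un = u[nL + i - N + 1:nL + i + 1][::-1]       # regressor u(nL + i)
            e = d[nL + i] - h @ un                        # eqn. 10
            g += e * un                                   # block gradient sum
        h_new = h + (mu * (1 + beta) / L) * g + beta * (h - h_prev)   # eqn. 9
        h_prev, h = h, h_new
    return h
```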

During the convergence analysis of the BMLMS algorithm, we use the well known independence assumptions, which have been extensively described and used in the analysis of stochastic gradient-descent adaptive algorithms [1, 2, 5]. For the sake of clarity, we rewrite eqn. 9 as

h_L(n + 1) = h_L(n) + (μ(1 + β)/L)U_L^T(n)e_L(n) + β[h_L(n) − h_L(n − 1)]    (11)

where

e_L(n) = d_L(n) − U_L(n)h_L(n)    (12)

In this new notation:

U_L(n) = [u(nL) u(nL + 1) ... u(nL + L − 1)]^T    (13)

e_L(n) = [e(nL) e(nL + 1) ... e(nL + L − 1)]^T    (14)

d_L(n) = [d(nL) d(nL + 1) ... d(nL + L − 1)]^T    (15)

and h_L(n + 1) = h(nL + L).
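In this notation U_L(n) is an L × N matrix whose ith row is u^T(nL + i), so a single update of eqn. 11 can be written without an inner loop, as in the sketch below (same illustrative conventions and the same assumed 1/L normalisation as before).

```python
import numpy as np

def bmlms_step(u, d, h, h_prev, nL, N, L, mu, beta):
    """One BMLMS update in the block notation of eqns. 11-15."""
    # U_L(n): rows are the regressors of eqn. 13
    U = np.array([u[nL + i - N + 1:nL + i + 1][::-1] for i in range(L)])
    e_blk = d[nL:nL + L] - U @ h                          # eqns. 12, 14, 15
    h_new = h + (mu * (1 + beta) / L) * (U.T @ e_blk) + beta * (h - h_prev)
    return h_new, h                                       # h_L(n+1), h_L(n)
```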

3.1 Convergence in the mean

In this Section, conditions for convergence in the mean and the time constant are obtained for the BMLMS algorithm. We first write

d_L(n) = e_opt(n) + U_L(n)h_opt    (16)

where e_opt(n) is the error vector for h_opt in each block. In terms of the weight error vector, v_L(n) = h_L(n) − h_opt, by substituting eqns. 16 and 12 into eqn. 11, we obtain

v_L(n + 1) = (1 + β)v_L(n) − βv_L(n − 1) + (μ(1 + β)/L)U_L^T(n)[e_opt(n) − U_L(n)v_L(n)]    (17)

By using the independence assumptions [2], E{U_L^T(n)U_L(n)} = LR, and taking the statistical expectation of eqn. 17, we get

E{v_L(n + 1)} = (1 + β)(I − μR)E{v_L(n)} − βE{v_L(n − 1)}    (18)

where E{U_L^T(n)e_opt(n)} = 0 due to the principle of orthogonality [5]. The autocorrelation matrix R can be diagonalised as R = QΛQ^T, where Q is orthonormal and Λ = diag{λ_0, ..., λ_{N−1}}, so that eqn. 18 decouples into N scalar second-order recursions. The time constant corresponding to the kth mode, τ_k, is then approximated by

τ_k ≈ (1 − β)/((1 + β)μλ_k)

by assuming that μλ_k ≪ 1.
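Both the stability of eqn. 18 and the time-constant approximation can be checked numerically from the characteristic polynomial z^2 − (1 + β)(1 − μλ_k)z + β of each decoupled mode: convergence in the mean requires both roots inside the unit circle. The parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

def mode_roots(mu, beta, lam):
    a = (1.0 + beta) * (1.0 - mu * lam)
    return np.roots([1.0, -a, beta])          # z^2 - a*z + beta = 0

# Stability: for 0 <= beta < 1 the roots lie inside the unit circle
# exactly when 0 < mu*lam < 2, the familiar per-mode step-size bound.
for mu, beta, lam in [(0.10, 0.5, 1.0), (0.10, 0.5, 19.0), (0.12, 0.9, 19.0)]:
    stable = bool(np.all(np.abs(mode_roots(mu, beta, lam)) < 1.0))
    print(f"mu*lam = {mu * lam:.2f}: stable = {stable}")

# Time constant: for small mu*lam the dominant root approaches
# 1 - mu*lam*(1+beta)/(1-beta), giving tau ~ (1-beta)/((1+beta)*mu*lam).
mu, beta, lam = 0.001, 0.5, 1.0
z1 = np.max(np.real(mode_roots(mu, beta, lam)))
print(1.0 / (1.0 - z1), (1 - beta) / ((1 + beta) * mu * lam))   # ~equal
```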

Fig. 7 System mismatches of the LMS, block-LMS and BMLMS algorithms in channel equalisation with SNR = 30 dB

6 Extension to the frequency domain

When the block-LMS algorithm is implemented in the frequency domain, the convergence speed can be improved significantly. In the resulting algorithm, fast-LMS, the modes in the frequency domain are normalised with respect to the power of the input signal corresponding to each frequency bin [5]. Existing frequency-domain implementations differ in the way the output signal and the adaptive weights are computed and in the amount of overlap between successive blocks of data. While the constrained implementation computes proper linear convolutions and solves the Wiener problem, the unconstrained implementation uses circular convolutions. Although the latter yields faster convergence, the adaptive weights no longer converge to the Wiener solution, thereby leading to higher misadjustment [28].

The frequency-domain BMLMS (F-BMLMS) algorithm is built around the following quantities, with F denoting the 2N-point DFT and i the block index:

Initialisation:
δ_m > 0, (m = 0, ..., 2N − 1): initial power estimate in each frequency bin
ε_m > 0, (m = 0, ..., 2N − 1): safety constant in each frequency bin
h_f(0) = 0_{2N}, P_m(0) = δ_m

For each new block of N input samples:
u_f(i) = F[u(iN − N) ... u(iN) ... u(iN + N − 1)]^T
U_f(i) = diag{u_f(i)}
y_f(i) = U_f(i)h_f(i)
y(i) = last N elements of F^{−1}[y_f(i)]
e(i) = d(i) − y(i)
e_f(i) = F[0_N^T e^T(i)]^T
P_m(i) = αP_m(i − 1) + (1 − α)|u_{f,m}(i)|^2, m = 0, ..., 2N − 1
M(i) = diag{[ε_0 + P_0(i)]^{−1} ... [ε_{2N−1} + P_{2N−1}(i)]^{−1}}
φ(i) = first N elements of F^{−1}[M(i)U_f^H(i)e_f(i)]
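Under the overlap-save conventions of the summary above, one possible Python rendering is sketched below. The power-normalised, constrained gradient follows the standard fast-LMS structure [5]; the placement of the (1 + β) factor and of the momentum term is our assumption, since the complete F-BMLMS weight update is not reproduced here.

```python
import numpy as np

def f_bmlms(u, d, N, mu, beta=0.0, alpha=0.5, delta=1.0, eps=0.1):
    """Overlap-save frequency-domain adaptive filter with a momentum term."""
    hf = np.zeros(2 * N, dtype=complex)              # h_f(0) = 0_{2N}
    hf_prev = hf.copy()
    P = np.full(2 * N, delta)                        # P_m(0) = delta_m
    for i in range(1, len(u) // N):                  # block index i
        uf = np.fft.fft(u[(i - 1) * N:(i + 1) * N])              # u_f(i)
        y = np.real(np.fft.ifft(uf * hf))[N:]                    # last N samples
        e = d[i * N:(i + 1) * N] - y                             # e(i)
        ef = np.fft.fft(np.concatenate([np.zeros(N), e]))        # e_f(i)
        P = alpha * P + (1 - alpha) * np.abs(uf) ** 2            # P_m(i)
        phi = np.real(np.fft.ifft(np.conj(uf) * ef / (eps + P)))[:N]
        grad_f = np.fft.fft(np.concatenate([phi, np.zeros(N)]))  # constrained
        hf_new = hf + mu * (1 + beta) * grad_f + beta * (hf - hf_prev)
        hf_prev, hf = hf, hf_new
    return np.real(np.fft.ifft(hf))[:N]              # time-domain weights
```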

The following experiment is performed to compare the fast-LMS and F-BMLMS algorithms.
Experiment 5: The second part of Experiment 3 is repeated with the fast-LMS and F-BMLMS algorithms. The step size of the fast-LMS algorithm is chosen as μ_FLMS = 1.2, which is the largest step size that yields the fastest convergence without unstable or erratic behaviour. The convergence parameters of the F-BMLMS algorithm are chosen as μ_F-BMLMS = 0.6 and β = 0.5. Although the step size is decreased relative to the fast-LMS algorithm, the convergence speed is maintained by using the momentum term. Other parameters are chosen as α = 0.5, δ_m = 1.0, ε_m = 0.1. The resulting system mismatches are averaged over 50 trials and shown in Fig. 8. The F-BMLMS algorithm exhibits slightly faster convergence and lower steady-state system mismatch than the fast-LMS algorithm. The improvement in performance is not as significant as in the block-LMS vs. BMLMS case. Due to the normalisation in the frequency domain, each mode of the adaptive algorithm essentially converges at the same rate. This corresponds to driving the adaptive algorithm with an almost white signal. In this case, the convergence speed improvement with an added momentum term is less significant compared with the time-domain implementation.

Fig. 8 System mismatches of the fast-LMS and F-BMLMS algorithms in system identification with SNR = 20 dB

7 Conclusions

The original method of PARTAN assumes knowledge of the true gradient direction and the optimum step size at each iteration [9-11]. In the PARTAN-LMS and momentum-LMS algorithms, neither of these is available. To address the first problem, the method of PARTAN is combined with the block-LMS algorithm so that a more accurate estimate of the gradient vector is used. The convergence analysis of the resulting BMLMS algorithm is carried out, and the time constant, conditions for convergence and the misadjustment are derived. A new mean-square convergence condition is also obtained for the block-LMS algorithm, which is more accurate than those reported in [14, 15, 26]. During experiments with coloured input signals, it is observed that the convergence of the BMLMS algorithm is significantly faster than that of the block-LMS algorithm. Compared to the LMS algorithm, the BMLMS algorithm has similar convergence speed and slightly higher steady-state system mismatch, but lower computational complexity depending upon the block size, which makes the BMLMS algorithm an attractive alternative to the LMS algorithm for applications with highly coloured input signals. Extension to the frequency domain is also discussed. The experimental results comparing the fast-LMS and F-BMLMS algorithms show that a small increase in the convergence speed can be achieved by using a momentum term.

8 Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable suggestions.

9 References

1 WIDROW, B., McCOOL, J.M., LARIMORE, M.G., and JOHNSON, C.R.: 'Stationary and nonstationary learning characteristics of the LMS adaptive filter', Proc. IEEE, 1976, 64, (8), pp. 1151-1162


2 FEUER, A., and WEINSTEIN, E.: 'Convergence analysis of LMS filters with uncorrelated Gaussian data', IEEE Trans., 1985, ASSP-33, (1), pp. 222-230
3 HARRIS, R.W., CHABRIES, D.M., and BISHOP, F.A.: 'A variable step (VS) adaptive filter algorithm', IEEE Trans., 1986, ASSP-34, (2), pp. 309-316
4 TANRIKULU, O., and CHAMBERS, J.A.: 'Convergence and steady-state properties of the least mean mixed-norm (LMMN) adaptive algorithm', IEE Proc. Vis. Image Signal Process., 1996, 143, (3), pp. 137-142
5 HAYKIN, S.: 'Adaptive filter theory' (Prentice-Hall, Englewood Cliffs, New Jersey, 1996, 3rd edn.)
6 TANRIKULU, O., and CONSTANTINIDES, A.G.: 'The PARTAN-LMS adaptive filtering algorithm'. Proceedings of DSPCAES, Nicosia, Cyprus, 1993, Vol. 1, pp. 69-74
7 GILLOIRE, A., and VETTERLI, M.: 'Adaptive filtering in subbands with critical sampling: analysis, experiments and application to acoustical echo cancellation', IEEE Trans. Signal Process., 1992, 40, (8), pp. 1862-1875
8 TREICHLER, J.R., FIJALKOW, I., and JOHNSON, C.R.: 'Fractionally spaced equalisers: how long should they really be?', IEEE Sig. Proc. Mag., 1996, pp. 65-81
9 FORSYTHE, G.E., and MOTZKIN, T.S.: 'Acceleration of the optimum gradient method, preliminary report', Bull. Am. Math. Soc., 1951, 57, pp. 304-305
10 SHAH, B.V., BUEHLER, R.J., and KEMPTHORNE, O.: 'Some algorithms for minimising a function of several variables', J. Soc. Indust. Appl. Math., 1964, 12, pp. 74-92
11 PIERRE, D.A.: 'Optimisation theory with applications' (Dover, New York, 1986, reprint)
12 BROWN, M.K.: 'Methods for rapid learning in artificial neural networks'. IEEE International Conference on Systems, Man and Cybernetics, 1991, Vol. 3, pp. 1575-1580
13 TAO, K.M.: 'Statistical averaging and PARTAN: some alternatives to LMS and RLS'. ICASSP-92, San Francisco, 1992, Vol. 4, pp. 25-28
14 FEUER, A.: 'Performance analysis of the block least mean square algorithm', IEEE Trans., 1985, CAS-32, (9), pp. 960-963
15 CLARK, G.A., MITRA, S.K., and PARKER, S.R.: 'Block implementation of adaptive digital filters', IEEE Trans., 1981, CAS-28, (6), pp. 584-592
16 ROY, S., and SHYNK, J.J.: 'Analysis of the momentum LMS algorithm', IEEE Trans. Acoust. Speech Signal Process., 1990, 38, (11), pp. 2088-2095
17 TUGAY, M.A., and TANIK, Y.: 'Properties of the momentum LMS algorithm'. Proceedings of MELECON-89, Portugal, 1989, pp. 197-200
18 BORAY, G.K., and SRINATH, M.D.: 'Conjugate gradient techniques for adaptive filtering', IEEE Trans. Circuits Syst., 1992, 39, (1), pp. 1-10
19 PROAKIS, J.G.: 'Channel identification for high speed digital communications', IEEE Trans., 1974, AC-19, pp. 916-922
20 GLOVER, J.R.: 'Comments on "Channel identification for high speed digital communications"', IEEE Trans., 1975, AC-20, p. 823
21 GLOVER, J.R.: 'High order algorithms for adaptive filters', IEEE Trans., 1979, COM-27, pp. 216-221
22 TANRIKULU, O.: 'Adaptive signal processing algorithms with accelerated convergence and noise immunity'. PhD thesis, Imperial College of Science, Technology and Medicine, 1995
23 OGATA, K.: 'Discrete-time control systems' (Prentice-Hall, Englewood Cliffs, New Jersey, 1987)
24 JAGERMAN, D.L.: 'Nonstationary blocking in telephone traffic', Bell Syst. Tech. J., 1975, 54, (3)
25 OPPENHEIM, A.V., WILLSKY, A.S., and YOUNG, I.T.: 'Signals and systems' (Prentice-Hall, Englewood Cliffs, New Jersey, 1983)
26 HSIEH, L.S.L., and WOOD, S.L.: 'Performance analysis of time domain block LMS algorithms'. ICASSP-93, Minneapolis, USA, 1993, Vol. 3, pp. 535-538
27 DOUGLAS, S.C.: 'Analysis of the multiple-error and block least-mean-square adaptive algorithms', IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 1995, 42, (2), pp. 92-101
28 SHYNK, J.J.: 'Frequency-domain and multirate adaptive filtering', IEEE Sig. Proc. Mag., 1992, 9, (1), pp. 14-37
