ML ESTIMATION OF COVARIANCE MATRICES WITH KRONECKER AND PERSYMMETRIC STRUCTURE

Magnus Jansson¹, Petter Wirfält¹, Karl Werner², Björn Ottersten¹

¹ACCESS Linnaeus Center, Electrical Engineering/Signal Processing Lab, KTH – Royal Institute of Technology, Sweden
²Ericsson Research, Ericsson AB, Kista, Sweden

ABSTRACT

Estimation of covariance matrices is often an integral part of many signal processing algorithms. In some applications, the covariance matrices can be assumed to have a certain structure, and imposing this structure in the estimation typically leads to improved accuracy and robustness (e.g., to small-sample effects). In MIMO communications, or in signal modelling of EEG data, the full covariance matrix can sometimes be modelled as the Kronecker product of two smaller covariance matrices. These smaller matrices may also be structured, e.g., being Toeplitz or at least persymmetric. In this paper we discuss a recently proposed closed-form maximum likelihood (ML) based method for the estimation of the Kronecker factor matrices. We also extend the previously presented method to impose the persymmetric constraint in the estimator. Numerical examples show that the mean square errors of the new estimator attain the Cramér-Rao bound even for very small sample sizes.

Index Terms— Structured covariance matrices; Kronecker; Persymmetric; Centro-Hermitian; Forward-backward; Maximum likelihood

1. INTRODUCTION

In wireless MIMO communications, channels are in certain scenarios modeled by the so-called Kronecker model. In this model the channel matrix elements are assumed to be zero-mean Gaussian, and the correlations between elements are assumed to satisfy particular relations such that the covariance matrix of the vectorized channel matrix has a Kronecker product structure [1, 2]. Clearly, estimating covariance matrices of this form from data is an important step in obtaining such channel models. It may also be part of the design and analysis of efficient signal processing algorithms for MIMO communications [3]. The problem of estimating covariance matrices having the Kronecker product structure also appears in other applications, for example in spatio-temporal modeling of MEG/EEG data [4, 5].
In statistics, processes having Kronecker covariance matrices are referred to as separable. They occur if the data vector can be modeled as the Kronecker

product between two independent vector processes. The maximum likelihood (ML) method for this problem has been discussed previously in the statistical literature (see, e.g., [6, 7, 8]). In particular, a cyclic optimization approach has been proposed to find the ML estimates of the Kronecker factor matrices. This approach, called the flipflop algorithm in [6], is reviewed in Section 3.1. The flipflop algorithm has been found to perform very well in numerical examples. Recently, in the signal processing literature, the performance of the flipflop iterations was analytically studied in [9]. It was shown that the flipflop estimates are asymptotically statistically efficient already after three iterations, independently of the initialization of the method. This partly explains the fast convergence of the method previously observed in numerical examples.

In some applications it is reasonable to assume that the Kronecker factor matrices have additional structure. They may, for example, be covariance matrices corresponding to stationary data and hence have a Toeplitz structure. In MIMO communications this additional Toeplitz structure is sometimes assumed when uniform linear arrays are employed in the systems. If the arrays are not necessarily uniform but are symmetric and linear, it may instead be assumed that the (receive and transmit covariance) matrices are persymmetric. Note that the set of Toeplitz matrices is a subset of the set of persymmetric matrices. A drawback of the flipflop algorithm mentioned in [9] is that it is difficult to account for additional structure of the Kronecker factor matrices in the estimator. In this paper we show, however, that it is possible to account for the persymmetric structure by a simple modification of the basic flipflop algorithm.
Hence, for estimation problems in which the Kronecker factors are also constrained to be persymmetric, we can devise an estimator that inherits the nice properties of the previous ML based flipflop algorithm.

2. PROBLEM FORMULATION

Suppose that x(t) is a zero-mean complex Gaussian discrete-time stochastic vector process with

    E[x(k)x^*(l)] = R_0 δ(k, l)

where ^* denotes Hermitian transpose and δ(k, l) is the Kronecker delta function. The covariance matrix R_0 is assumed to be separable in the sense that it can be written as the Kronecker product

    R_0 = A_0 ⊗ B_0,    (1)

where A_0 is m × m and B_0 is n × n. Both A_0 and B_0 are Hermitian and positive definite (p.d.). In this paper we further assume that the factor matrices satisfy a persymmetric constraint such that

    A_0 = J A_0^T J,    (2)
    B_0 = J B_0^T J,    (3)

where J is the exchange matrix with ones on the antidiagonal and zeros elsewhere,

    J = [ 0 ... 0 1 ]
        [ 0 ... 1 0 ]
        [ :       : ]
        [ 1 0 ... 0 ]

We will use the same symbol J for matrices of this form although they may refer to matrices of different dimensions; the dimension should be clear from the context.

The main problem considered in this paper is to devise a maximum likelihood (ML) based method for estimating R_0 from observations of x(t) for t = 0, 1, ..., N − 1, under the Kronecker and persymmetric constraints. Note that the Kronecker parametrization is not identifiable since A_0 ⊗ B_0 = (αA_0) ⊗ (α^{-1}B_0) for any scalar α ≠ 0. Thus, it is only possible to estimate A_0 and B_0 up to a scale factor.

3. ML ESTIMATION

In this section we first review the ML method for the Kronecker model [6, 8], that is, when the covariance matrix R_0 satisfies (1). In Section 3.2 we then present the new ML method when the persymmetric constraints (2)-(3) on A_0 and B_0 are also included.

3.1. ML for Kronecker model

The negative log-likelihood function for the problem can be written as (omitting parameter-independent terms)

    l(A, B) = n log|A| + m log|B| + tr{R̂(A^{-1} ⊗ B^{-1})},    (4)

where

    R̂ = (1/N) Σ_{t=0}^{N−1} x(t)x^*(t)    (5)

is the sample covariance matrix and |·| is the determinant function. To find the matrices A and B that minimize l(A, B), a common approach is to use cyclic minimization and alternate between minimizing with respect to A and B while keeping the other variable fixed. The minimization needed in each step of this procedure can be performed in closed form, which is one reason behind the popularity of the method. To see how this is done, we rewrite the last term in (4) as

    tr{R̂(A^{-1} ⊗ B^{-1})} = tr{ Σ_{k=1}^{m} Σ_{l=1}^{m} R̂_{kl} (A^{-1})_{lk} B^{-1} },    (6)

where R̂_{kl} is the (k, l)th block of size n × n in R̂, and (A^{-1})_{lk} refers to the (l, k)th element in A^{-1}. Then, since log|X| + tr{RX^{-1}} is minimized by X = R (for p.d. matrices), using (6) in (4) we see that the B that minimizes (4) for a fixed A is

    B̂(A) = (1/m) Σ_{k=1}^{m} Σ_{l=1}^{m} R̂_{kl} (A^{-1})_{lk}.    (7)

(It can be shown that B̂(A) is p.d. at least as long as R̂ and A are p.d.) In a similar manner it can be shown that the A that minimizes (4) for a given B is

    Â(B) = (1/n) Σ_{k=1}^{n} Σ_{l=1}^{n} R̄_{kl} (B^{-1})_{lk},    (8)
where R̄_{kl} is the (k, l)th block of size m × m in

    R̄ = P_{m,n} R̂ P_{m,n}

and P_{m,n} is a symmetric permutation matrix defined such that P_{m,n}(X ⊗ Y) = (Y ⊗ X)P_{m,n} for any matrices X and Y of dimensions m × m and n × n, respectively.

The cyclic minimization algorithm, which is also referred to as the flipflop algorithm, now runs as follows:

(i) Select an admissible initial estimate of A = A_i.
(ii) Minimize (4) with respect to B given A = A_i; that is, use (7) to compute B̂(A_i).
(iii) Minimize (4) with respect to A given B = B̂ from the previous step, using (8).
(iv) Minimize (4) with respect to B given A = Â from the previous step, using (7).
(v) Iterate steps (iii) and (iv) until convergence.

Obviously, we may choose to reverse the order of minimization and first initialize B and then minimize with respect to A, etc. Empirically, the flipflop algorithm has been shown to perform very well, and it can even converge faster than Newton methods. Additionally, in [9] it was shown that the estimates obtained already after step (iv) above are asymptotically efficient (regardless of the initialization!). Thus, at least for large N, there is no need to iterate the algorithm; it suffices to perform the computations in (i)-(iv). We will call this the non-iterative flipflop algorithm.
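The closed-form updates (7)-(8) and the non-iterative sweep (i)-(iv) can be sketched in NumPy as follows. This is a minimal illustration with our own function names, not the authors' implementation; the block sums follow (7)-(8) directly, and the permutation defining R̄ is realized as a reshape/transpose.

```python
import numpy as np

def sample_covariance(X):
    """Eq. (5): X holds N snapshots x(t) as columns of an (m*n, N) array."""
    return (X @ X.conj().T) / X.shape[1]

def b_given_a(R, A, m, n):
    """Eq. (7): B-hat(A) = (1/m) sum_{k,l} R_kl (A^{-1})_{lk}."""
    Ainv = np.linalg.inv(A)
    B = np.zeros((n, n), dtype=complex)
    for k in range(m):
        for l in range(m):
            B += R[k*n:(k+1)*n, l*n:(l+1)*n] * Ainv[l, k]
    return B / m

def a_given_b(R, B, m, n):
    """Eq. (8): the same update applied to Rbar = P_{m,n} R P_{m,n}."""
    # Swapping the Kronecker order (A (x) B -> B (x) A) is a reshape/transpose.
    Rbar = R.reshape(m, n, m, n).transpose(1, 0, 3, 2).reshape(m*n, m*n)
    Binv = np.linalg.inv(B)
    A = np.zeros((m, m), dtype=complex)
    for k in range(n):
        for l in range(n):
            A += Rbar[k*m:(k+1)*m, l*m:(l+1)*m] * Binv[l, k]
    return A / n

def noniterative_flipflop(R, m, n, A_init=None):
    """Steps (i)-(iv): initialize A, then update B -> A -> B once."""
    A = np.eye(m, dtype=complex) if A_init is None else A_init  # step (i)
    B = b_given_a(R, A, m, n)                                   # step (ii)
    A = a_given_b(R, B, m, n)                                   # step (iii)
    B = b_given_a(R, A, m, n)                                   # step (iv)
    return A, B
```

Since the factors are identifiable only up to a scale, a sensible check is that the product A ⊗ B, rather than the individual factors, matches the target; applied to an exactly separable R̂ = A_0 ⊗ B_0, one sweep already recovers A ⊗ B = R̂.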

3.2. ML for persymmetric Kronecker model

The difference in the maximum likelihood problem for the persymmetric Kronecker model is that the criterion in (4) should be minimized with respect to A and B subject to the constraints (2)-(3). That is, we should find the minimizers of the optimization problem

    min_{A,B} l(A, B)  subject to  A = JA^T J,  B = JB^T J,    (9)

where l(A, B) is given in (4). To find these minimizers we rewrite the criterion function by using the following facts. Given that A = JA^T J and B = JB^T J, we have

    A^{-1} = (JA^T J)^{-1} = J A^{-T} J

since J^{-1} = J. This implies that

    A^{-1} ⊗ B^{-1} = JA^{-T}J ⊗ JB^{-T}J = (J ⊗ J)(A^{-T} ⊗ B^{-T})(J ⊗ J) = J(A^{-T} ⊗ B^{-T})J,    (10)

where we used that X_1 X_2 ⊗ Y_1 Y_2 = (X_1 ⊗ Y_1)(X_2 ⊗ Y_2) for arbitrary matrices X_i, Y_i (i = 1, 2) of suitable dimensions, and in the last equality we observed that J ⊗ J is equal to a J-matrix of a larger dimension. Using (10) in the trace term in (4) we get

    tr{R̂(A^{-1} ⊗ B^{-1})} = tr{R̂(J[A^{-T} ⊗ B^{-T}]J)} = tr{JR̂J(A^{-T} ⊗ B^{-T})} = tr{JR̂^T J(A^{-1} ⊗ B^{-1})},    (11)

where we used the facts that tr{XY} = tr{YX} and tr{X} = tr{X^T} for arbitrary matrices X and Y. Thus, because of (11) we can rewrite the likelihood criterion in the persymmetric case as

    l_PS(A, B) = n log|A| + m log|B| + tr{ (1/2)(R̂ + JR̂^T J)(A^{-1} ⊗ B^{-1}) }.    (12)

Note that l_PS(A, B) above is equivalent to l(A, B) in (4) given that A and B are persymmetric; this is the reason we put the index PS (persymmetric) on l_PS(A, B) in (12). Also note that the data enters l_PS(A, B) in the form (1/2)(R̂ + JR̂^T J), which is persymmetric by construction. For future use we define

    R̂_PS = (1/2)(R̂ + JR̂^T J).    (13)

(R̂_PS is in fact the ML estimate in the class of persymmetric matrices given R̂; see, e.g., [10].)

The reason for rewriting the likelihood criterion is that it can be shown that the unconstrained minimizers of l_PS(A, B) in (12) are persymmetric. Hence, the unconstrained minimizers of l_PS(A, B) must be equal to the solution of the constrained ML problem in (9). We conclude that one way to obtain the persymmetric ML estimates is to apply the flipflop algorithm as reviewed in the preceding section but replace R̂ by R̂_PS and choose the initial estimate of A (or B) to be persymmetric.

Fig. 1. Normalized root mean square error of three estimators (sample covariance, non-iterative flipflop, and non-iterative flipflop PS), together with the persymmetric CRB, as a function of the number of samples N when A_0 and B_0 have persymmetric structure. Averages taken over L = 500 Monte Carlo runs for each sample set.

4. NUMERICAL EXAMPLES

Numerical Monte Carlo simulations are performed in order to judge the performance of the investigated estimators. Random complex A_0 and B_0 matrices of the correct structure are generated and then fixed, and R_0 is formed from these. In each simulation, N samples are generated from a complex Gaussian distribution with covariance matrix R_0 to form R̂ according to (5). The root mean square errors of the investigated estimators are then calculated according to

    RMSE = sqrt( (1/L) Σ_{k=1}^{L} ‖R̂_k − R_0‖_F² / ‖R_0‖_F² ),    (14)

where R_0 is the true covariance matrix as given by (1), R̂_k is the covariance estimate according to the studied estimator from Monte Carlo simulation k, ‖·‖_F is the Frobenius norm given by ‖A‖_F² = Σ_{i,j} |A_{ij}|², and L is the number of Monte Carlo simulations for the current sample case.

In Fig. 1 the root mean square errors (14) of three different estimators are compared. A_0 and B_0 have the discussed persymmetric structure with n = m = 4. The first estimator is simply the sample covariance and never approaches the optimum estimate. The second estimator is the non-iterative

flipflop algorithm, which only utilizes the Kronecker structure of the problem, while the third estimator is the non-iterative flipflop algorithm that also takes the persymmetry into account. Both flipflop algorithms are initialized with a random p.d. matrix, which is persymmetric for the structured estimator and Hermitian for the unstructured one. In Fig. 1, the Cramér-Rao bound (CRB) is computed under the persymmetric assumption and is obtained by using the result in [9]. The structured flipflop algorithm can be seen to attain the CRB already for very small sample sizes. The unstructured flipflop never becomes optimal.

In Fig. 2, additional Toeplitz structure is imposed on A_0 and B_0, which allows for better estimation due to the fewer parameters needed in the parametrization. The simulations are performed in the same way as in Fig. 1. In addition to the estimators shown in Fig. 1, the optimal covariance matching estimator developed in [9] is also displayed. This algorithm is able to take any arbitrary linear structure in A_0 and B_0 into account in a statistically efficient manner, but it is slightly more complex than the flipflop methods and often requires a larger N to approach the CRB. The algorithm devised in Section 3.2 is not able to take the Toeplitz structure of the problem into account and thus cannot reach the global CRB but, as in Fig. 1, it still attains the relaxed CRB corresponding to the persymmetric structure of the problem.

Fig. 2. Normalized root mean square error of four estimators (sample covariance, non-iterative flipflop, non-iterative flipflop PS, and covariance matching) as a function of the number of samples when A_0 and B_0 have Toeplitz structure. Also shown are the CRB computed under the Toeplitz assumption and, based on the same A_0 and B_0, the CRB computed under the persymmetric assumption. Averages taken over L = 500 Monte Carlo runs.

5. CONCLUSIONS

We have studied the ML estimation of covariance matrices from independent zero-mean samples of a multivariate complex Gaussian process, where the covariance matrix of the true process is the Kronecker product of two valid covariance matrices. In this paper we also assumed these Kronecker factor matrices to be persymmetric. We discussed the ML estimation of the Kronecker covariance matrix under the persymmetric constraint and suggested using the non-iterative flipflop algorithm for computing the estimates. Numerical examples indicate excellent estimation performance of the method; the error was close to the Cramér-Rao bound even for very small sample cases.

6. REFERENCES

[1] J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen, "A stochastic MIMO radio channel model with experimental validation," IEEE Journal on Selected Areas in Communications, vol. 20, no. 6, pp. 1211–1226, Aug. 2002.

[2] K. Yu, M. Bengtsson, B. Ottersten, D. McNamara, and P. Karlsson, "Modeling of wide-band MIMO radio channels based on NLoS indoor measurements," IEEE Transactions on Vehicular Technology, vol. 53, no. 8, pp. 655–665, May 2004.

[3] K. Werner and M. Jansson, "Estimating MIMO channel covariances from training data under the Kronecker model," Signal Processing, vol. 89, no. 1, pp. 1–13, 2009.

[4] J. C. de Munck, H. M. Huizenga, L. J. Waldorp, and R. M. Heethaar, "Estimating stationary dipoles from MEG/EEG data contaminated with spatially and temporally correlated background noise," IEEE Transactions on Signal Processing, vol. 50, no. 7, pp. 1565–1572, Jul. 2002.

[5] P. Strobach, "Low-rank detection of multichannel Gaussian signals using block matrix approximation," IEEE Transactions on Signal Processing, vol. 43, no. 1, pp. 233–242, Jan. 1995.

[6] N. Lu and D. L. Zimmerman, "On likelihood-based inference for a separable covariance matrix," Technical Report 337, Department of Statistics and Actuarial Science, University of Iowa, Iowa City, Iowa, 2004.

[7] N. Lu and D. L. Zimmerman, "The likelihood ratio test for a separable covariance matrix," Statistics and Probability Letters, vol. 73, no. 5, pp. 449–457, May 2005.

[8] K. Mardia and C. Goodall, "Spatial-temporal analysis of multivariate environmental data," in Multivariate Environmental Statistics, pp. 347–386. Elsevier, 1993.

[9] K. Werner, M. Jansson, and P. Stoica, "On estimation of covariance matrices with Kronecker product structure," IEEE Transactions on Signal Processing, vol. 56, no. 2, pp. 478–491, Feb. 2008.

[10] M. Jansson and P. Stoica, "Forward-only and forward-backward sample covariances – A comparative study," Signal Processing, vol. 77, no. 3, pp. 235–245, Sep. 1999.
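As a supplementary sketch of the modification in Section 3.2: the only change to the flipflop computations is the forward-backward style averaging (13) of the sample covariance (plus a persymmetric initialization). The helper names below are our own illustration, not code from the paper.

```python
import numpy as np

def exchange_matrix(p):
    """The exchange matrix J: ones on the antidiagonal (J = J^T = J^{-1})."""
    return np.fliplr(np.eye(p))

def persymmetric_part(R):
    """Eq. (13): R_PS = (R + J R^T J) / 2, persymmetric by construction."""
    J = exchange_matrix(R.shape[0])
    return 0.5 * (R + J @ R.T @ J)

def is_persymmetric(M, tol=1e-10):
    """Check the constraint M = J M^T J used in (2)-(3) and (9)."""
    J = exchange_matrix(M.shape[0])
    return np.allclose(M, J @ M.T @ J, atol=tol)
```

Feeding persymmetric_part(R̂) to the flipflop updates, with a persymmetric initial A, then yields persymmetric factor estimates, as argued in Section 3.2. Note that the projection preserves Hermitian symmetry and is idempotent.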