communications

IEEE TRANSACTIONS ON COMMUNICATIONS
A PUBLICATION OF THE IEEE COMMUNICATIONS SOCIETY

MAY 2008, VOLUME 56, NUMBER 5, IECMBT (ISSN 0090-6778)

TRANSACTIONS LETTERS

DIGITAL COMMUNICATIONS
Cramer-Rao Lower Bound for Non-Data-Aided SNR Estimation of Linear Modulation Schemes
    Wilfried Gappmair

FADING/EQUALIZATION
Simple Average BER Formulas for M-ary Orthogonal Signals with Noncoherent Diversity Combining in Nakagami-m Fading Channels
    Redha M. Radaydeh and Mustafa M. Matalgah

SYNCHRONIZATION
On the Joint Synchronization of Clock Offset and Skew in RBS-Protocol
    Ilkay Sari, Erchin Serpedin, Kyoung-Lae Noh, Qasim Chaudhari, and Bruce Suter

TRANSMISSION SYSTEMS
Blind Estimation of Carrier Frequency Offset and DC Offset for OFDM Systems
    Hai Lin, Herath Mudiyanselage Sankassa Bandara Senevirathna, and Katsumi Yamashita
Capacity of MRC on Correlated Rician Fading Channels
    Khairi Ashour Hamdi

TRANSACTIONS PAPERS

DATA COMPRESSION
Joint Source and Channel Coding using Punctured Ring Convolutional Coded CPM
    Zihuai Lin and Tor Aulin

MODULATION/DETECTION
Near Optimal Common Detection Techniques for Shaped Offset QPSK and Feher's QPSK
    Tom Nelson, Erik Perrins, and Michael Rice
Probability Density Function of Reliability Metrics in BICM with Arbitrary Modulation: Closed-form through Algorithmic Approach
    Leszek Szczecinski, Rolando Bettancourt, and Rodolfo Feick

MULTIPLE ACCESS
Mobility Enhanced Smart Antenna Adaptive Sectoring for Uplink Capacity Maximization in CDMA Cellular Network
    Alex Wang and Vikram Krishnamurthy

NETWORKS
Robust Optimal Cross-Layer Designs for TDD-OFDMA Systems with Imperfect CSIT and Unknown Interference: State-Space Approach Based on 1-bit ACK/NAK Feedbacks
    Rui Wang and Vincent K. N. Lau

OPTICAL COMMUNICATIONS
Hard-Limiting Performance Analysis of 2-D Optical Codes Under the Chip-Asynchronous Assumption
    Chia-Cheng Hsu, Guu-Chang Yang, and Wing C. Kwong

SPREAD SPECTRUM
Joint Optimum Linear Precoding and Power Control Strategies for Downlink MC-CDMA Systems
    Nevio Benvenuto, Paola Bisaglia, and Federico Boccardi
Minimum Mean-Squared Error Iterative Successive Parallel Arbitrated Decision Feedback Detectors for DS-CDMA Systems
    Rodrigo C. de Lamare and Raimundo Sampaio-Neto

SYNCHRONIZATION
Synchronization and Integration Region Optimization for UWB Signals with Non-coherent Detection and Auto-correlation Detection
    Rongrong Zhang and Xiaodai Dong

TRANSMISSION SYSTEMS
Cross-layer Adaptive Transmission: Optimal Strategies in Fading Channels
    Anh Tuan Hoang and Mehul Motani

TRANSMISSION/RECEPTION
Non Binary and Precoded Faster Than Nyquist Signaling
    Fredrik Rusek and John B. Anderson
Progressive Linear Precoder Optimization for MIMO Packet Retransmissions Exploiting Channel Covariance Information
    Haitong Sun, Zhihua Shi, Chunming Zhao, Jonathan H. Manton, and Zhi Ding
Optimum Power Allocation for Multiuser OFDM with Arbitrary Signal Constellations
    Angel Lozano, Antonia M. Tulino, and Sergio Verdu
Quantized Principal Component Selection Precoding for Spatial Multiplexing with Limited Feedback
    Cheol Mun

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008


Transactions Letters

Cramer-Rao Lower Bound for Non-Data-Aided SNR Estimation of Linear Modulation Schemes

Wilfried Gappmair, Member, IEEE

Abstract: Powerful parameter estimators exhibit a jitter variance which is fairly close to the Cramer-Rao lower bound (CRLB) as the theoretical limit. In contrast to symbol timing and carrier frequency/phase, little information is available in the open literature with respect to the signal-to-noise ratio (SNR), i.e., the CRLB has been reported only for the data-aided case and for some simple M-PSK examples of non-data-aided SNR estimation. Motivated by this background, an efficient algorithm is presented which applies to any M-ary one/two-dimensional modulation scheme with axis/halfplane symmetry and a channel distorted by additive white Gaussian noise. Finally, the performance of different SNR estimators is compared to the derived bound.

Index Terms: Digital receivers, Cramer-Rao lower bound, non-data-aided SNR estimation, linear modulation schemes.

I. INTRODUCTION

Parameter estimation and synchronization in digital receivers have reached a mature state [1], [2]. Powerful algorithms exhibit no bias, and their jitter variance is close to the Cramer-Rao lower bound (CRLB) as the theoretical limit [3]. However, compared to symbol timing or carrier frequency/phase, little information is available about the estimation of the signal-to-noise ratio (SNR), although many modern communication systems require the SNR to be known for proper operation, e.g., power control for adaptive modulation [4] or iterative soft-decoding procedures [5].

Pauluzzi and Beaulieu [6] provided an excellent overview of useful algorithms including, for comparison purposes, the derivation of the CRLB for data-aided (DA) estimation of the SNR, which applies to any linear modulation scheme. Assuming 2-PSK and 4-PSK signals, the CRLB for the non-data-aided (NDA) case is discussed in [7], with an elegant numerical solution given in [8]. The approach taken in this letter, however, allows the efficient computation of the CRLB for NDA SNR estimation using any M-ary one/two-dimensional (1/2D) symbol constellation with axis/halfplane symmetry.

The rest of the paper is organized as follows. In Section II, the equivalent baseband model for analysis and simulation work is introduced, assuming that the symbol timing has been established already. Then, in Section III, the computation of the CRLB via Fisher's information matrix is reviewed and applied to DA and NDA estimation of the SNR. In Section IV, the bound is compared to the performance of different SNR estimators available from the open literature, focusing on 4-PSK and 16-QAM as prominent representatives of linear modulation schemes with constant and non-constant modulus, respectively. Finally, conclusions are drawn in Section V.

Paper approved by H. Leib, the Editor for Communication and Information Theory of the IEEE Communications Society. Manuscript received May 6, 2006; revised February 14, 2007. The author is with the Institute of Communication Networks and Satellite Communications, Graz University of Technology, 8010 Graz, Austria (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060275.

0090-6778/08$25.00 © 2008 IEEE

II. EQUIVALENT BASEBAND MODEL

Throughout this paper, it is assumed that the symbol timing has been perfectly recovered and that the independent and identically distributed (i.i.d.) symbols $c_k \in \mathcal{C}$ are complex zero-mean with normalized variance $E[|c_k|^2] = 1$, where $\mathcal{C}$ denotes the M-ary symbol alphabet. Furthermore, let the received signal be rotated by the carrier phase $\theta_k$. Note that the latter is not necessarily a constant if, for example, carrier frequency offsets have to be taken into account. In the sequel, the complex samples $r_k$ at the output of the receiver matched filter (MF), distorted by additive white Gaussian noise (AWGN), are given by

$$ r_k = \sqrt{S}\, c_k e^{j\theta_k} + \sqrt{N}\, w_k. \tag{1} $$

Real and imaginary parts of the complex zero-mean AWGN samples $w_k$ are independent, each with the same variance of 1/2. Signal and noise power are expressed by $S$ and $N$ such that the true SNR is simply defined by $\rho := S/N$. In the following, the analytical work is based on an observation interval of $L$ samples (1), $k = 1, 2, \ldots, L$. Note also that no oversampling is considered (baud-rate sampling, one sample per symbol), because symbol timing is assumed to be established by appropriate means [1], [2].

III. COMPUTATION OF THE CRAMER-RAO LOWER BOUND

Given a vector $\mathbf{u}$ of parameters to be estimated, the probability of a sequence $\mathbf{r}$ of $L$ independent receiver samples (1), frequently also denoted as observations, is computed as $\Pr(\mathbf{r}|\mathbf{u}) = E_{\mathbf{v}}[\Pr(\mathbf{r}|\mathbf{u}, \mathbf{v})]$, where $E_{\mathbf{v}}[\cdot]$ expresses expectation with respect to the vector $\mathbf{v}$ of nuisance parameters. Using an unbiased estimator, i.e., $E[\hat{\mathbf{u}}] = \mathbf{u}$, the CRLB of the $i$-th element in $\mathbf{u}$ is provided by

$$ \mathrm{CRLB}(u_i) = \left[\mathbf{F}^{-1}(\mathbf{u})\right]_{ii} \tag{2} $$

with

$$ F_{ij}(\mathbf{u}) = -E_{\mathbf{r}}\!\left[\frac{\partial^2 \Lambda(\mathbf{r}|\mathbf{u})}{\partial u_i\, \partial u_j}\right] \equiv E_{\mathbf{r}}\!\left[\frac{\partial \Lambda(\mathbf{r}|\mathbf{u})}{\partial u_i}\, \frac{\partial \Lambda(\mathbf{r}|\mathbf{u})}{\partial u_j}\right] \tag{3} $$

as the $(i, j)$-th element in the Fisher information matrix [3]; $\Lambda(\mathbf{r}|\mathbf{u}) := \log \Pr(\mathbf{r}|\mathbf{u})$ is the associated log-likelihood function (LLF) and $E_{\mathbf{r}}[\cdot]$ symbolizes the expected operation with respect to data and noise.
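As a minimal sketch of the baseband model (1) in Section II, the following Python fragment generates baud-rate MF output samples for a given SNR. The function name `simulate_mf_samples`, its parameters, and the particular 4-PSK alphabet are illustrative assumptions and do not appear in the letter; NumPy is assumed to be available.

```python
import numpy as np

def simulate_mf_samples(num_samples, snr_db, constellation,
                        noise_power=1.0, theta=0.0, rng=None):
    """Draw samples r_k = sqrt(S) c_k exp(j theta) + sqrt(N) w_k as in (1).

    Hypothetical helper: a constant carrier phase theta is assumed here,
    although (1) allows theta_k to vary with k.
    """
    rng = np.random.default_rng() if rng is None else rng
    rho = 10.0 ** (snr_db / 10.0)        # true SNR, rho = S / N
    signal_power = rho * noise_power     # S = rho * N
    # i.i.d. symbols, drawn uniformly from the alphabet
    c = rng.choice(constellation, size=num_samples)
    # complex AWGN: real and imaginary parts independent, variance 1/2 each
    w = (rng.standard_normal(num_samples)
         + 1j * rng.standard_normal(num_samples)) / np.sqrt(2.0)
    r = np.sqrt(signal_power) * c * np.exp(1j * theta) + np.sqrt(noise_power) * w
    return r, c

# 4-PSK alphabet with unit energy, E[|c_k|^2] = 1 (illustrative choice)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
r, c = simulate_mf_samples(100_000, 10.0, qpsk, rng=np.random.default_rng(0))
print(np.mean(np.abs(r) ** 2))           # sample estimate of S + N
```

With $N = 1$ and an SNR of 10 dB, the average power of the samples should be close to $S + N = 11$, which is a quick sanity check of the normalization used in (1).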

A. Data-Aided Estimation

If both carrier phase and data are known, no nuisance parameter needs to be taken into account. Therefore, with $\mathbf{u} = (\rho, N)$, the Cramer-Rao lower bound for unbiased estimation of $\rho$ develops as [3], [6]

$$ \mathrm{CRLB}(\rho) = \frac{E_{\mathbf{r}}\!\left[\frac{\partial^2 \Lambda}{\partial N^2}\right]}{\left(E_{\mathbf{r}}\!\left[\frac{\partial^2 \Lambda}{\partial \rho\, \partial N}\right]\right)^2 - E_{\mathbf{r}}\!\left[\frac{\partial^2 \Lambda}{\partial N^2}\right] E_{\mathbf{r}}\!\left[\frac{\partial^2 \Lambda}{\partial \rho^2}\right]}. \tag{4} $$

With independent samples $r_k$, the LLF in (4) is straightforwardly given by

$$ \Lambda(\mathbf{r}|\rho, N) = \log \prod_{k=1}^{L} \Pr(r_k|\rho, N, c_k, \theta_k) \tag{5} $$

where

$$ \Pr(r_k|\rho, N, c_k, \theta_k) = \frac{1}{\pi N} \exp\!\left(-\frac{|r_k - \sqrt{S}\, c_k e^{j\theta_k}|^2}{N}\right). \tag{6} $$

Hence, with $S = \rho N$, the LLF can be restated as

$$ \Lambda(\mathbf{r}|\rho, N) = -L \log(\pi N) - \frac{1}{N} \sum_{k=1}^{L} \left(|r_k|^2 - 2\sqrt{\rho N}\, \mathrm{Re}[r_k c_k^* e^{-j\theta_k}] + \rho N |c_k|^2\right). \tag{7} $$

Using (7), the derivatives in (4) are computed easily. Omitting the algebraic details, the CRLB for data-aided SNR estimation is given by $\mathrm{CRLB}_{\mathrm{DA}}(\rho) = (\rho^2 + 2\rho)/L$, for convenience normalized with $\rho^2$ and henceforth written as

$$ \mathrm{NCRLB}_{\mathrm{DA}}(\rho) := \frac{\mathrm{CRLB}_{\mathrm{DA}}(\rho)}{\rho^2} = \frac{1}{L}\left(1 + \frac{2}{\rho}\right). \tag{8} $$

The relationship is equivalent to the CRLB in [6], evaluated at one sample per symbol. It will serve as a benchmark for SNR estimates obtained at the baud rate from the MF output.

For one-dimensional signals $r_k$, i.e., $\theta_k = 0$, $c_k$ and $w_k$ real-valued with $E[c_k^2] = E[w_k^2] = 1$, the LLF appears as

$$ \Lambda(\mathbf{r}|\rho, N) = -\frac{L}{2} \log(2\pi N) - \frac{1}{2N} \sum_{k=1}^{L} \left(r_k^2 - 2\sqrt{\rho N}\, r_k c_k + \rho N c_k^2\right). \tag{9} $$

B. Non-Data-Aided Estimation

If neither carrier phase nor data are known in advance, i.e., $\mathbf{v} = (\mathbf{c}, \boldsymbol{\theta})$ with $\mathbf{c} = (c_1, c_2, \ldots, c_L)$ and $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_L)$, the likelihood function $\Pr(\mathbf{r}|\mathbf{u}, \mathbf{v})$ first has to be averaged with respect to $\mathbf{v}$ before the CRLB can be computed according to (4). Assuming again independent samples $r_k$, the joint probability $\Pr(\mathbf{r}|\rho, N, \boldsymbol{\theta})$ is obtained as $\prod_{k=1}^{L} E_{\mathcal{C}}[\Pr(r_k|\rho, N, c_k, \theta_k)]$, where $E_{\mathcal{C}}[\cdot]$ denotes expectation of (6) with respect to the M-ary symbol alphabet $\mathcal{C}$. Due to the i.i.d. elements of the latter, i.e., $\Pr[c_k \in \mathcal{C}] = 1/M$, $E_{\mathcal{C}}[\cdot]$ is established as

$$ \Pr(r_k|\rho, N, \theta_k) = \frac{1}{M} \sum_{d_i \in \mathcal{C}} \frac{1}{\pi N}\, e^{-|r_k - \sqrt{S}\, d_i e^{j\theta_k}|^2/N} = \frac{1}{\pi N}\, e^{-|r_k|^2/N}\, \frac{1}{M} \sum_{d_i \in \mathcal{C}} e^{2\sqrt{\rho/N}\, \mathrm{Re}[r_k d_i^* e^{-j\theta_k}] - \rho |d_i|^2}. \tag{10} $$

Furthermore, if $d_i$ and $-d_i \in \mathcal{C}$, the relationship simplifies to

$$ \Pr(r_k|\rho, N, \theta_k) = \frac{1}{\pi N}\, e^{-|r_k|^2/N}\, \frac{2}{M} \sum_{d_i \in \mathcal{H}} e^{-\rho |d_i|^2} \cosh\!\left(2\sqrt{\rho/N}\, \mathrm{Re}[r_k d_i^* e^{-j\theta_k}]\right). \tag{11} $$

$\mathcal{H}$ denotes the subset of symbols in $\mathcal{C}$ located in the right (or left) half of the complex plane; alternatively, the same result is achieved if $\mathcal{H}$ is the subset in the upper (or lower) part of the plane. Note also that this property of $d_i$, throughout this paper denoted as halfplane symmetry, does not necessarily mean that the related constellation is quadrature symmetric, although the latter represents a special case of the former. Thus, it is easily verified that (11) applies also to 2-PSK ($M = 2$, $\mathcal{H} = \{1\}$) and M-PAM schemes, as will be detailed below.

With $y_k := r_k e^{-j\theta_k} = \sqrt{S}\, c_k + \sqrt{N}\, n_k$, where the derotated noise component $n_k := w_k e^{-j\theta_k}$ has the same probability distribution as $w_k$, i.e., both $w_k$ and $n_k$ are complex zero-mean AWGN samples, the expectation with respect to $\boldsymbol{\theta}$ does not need to be considered for subsequent analysis. Hence, by inspection of (11), the likelihood function for $y_k$ appears as

$$ f(y_k|\rho, N) = \frac{1}{\pi N}\, e^{-|y_k|^2/N}\, \frac{2}{M} \sum_{d_i \in \mathcal{H}} e^{-\rho |d_i|^2} \cosh\!\left(2\sqrt{\rho/N}\, \mathrm{Re}[y_k d_i^*]\right). \tag{12} $$

For 2-PSK as the simplest case, explicitly addressed in [7] and [8], this yields

$$ f(y_k|\rho, N) = \frac{1}{\pi N}\, e^{-\rho - |y_k|^2/N} \cosh\!\left(2\sqrt{\rho/N}\, \mathrm{Re}[y_k]\right). \tag{13} $$

Taking into account the fact that the samples $y_k$ are independent, it is easy to show that the CRLB for unbiased NDA estimation of the SNR is given by

$$ \mathrm{CRLB}_{\mathrm{NDA}}(\rho) = \frac{1}{L}\, \frac{E_y\!\left[\left(\frac{\partial \Lambda}{\partial N}\right)^2\right]}{E_y\!\left[\left(\frac{\partial \Lambda}{\partial \rho}\right)^2\right] E_y\!\left[\left(\frac{\partial \Lambda}{\partial N}\right)^2\right] - \left(E_y\!\left[\frac{\partial \Lambda}{\partial \rho}\, \frac{\partial \Lambda}{\partial N}\right]\right)^2} \tag{14} $$

where $\Lambda(y_k|\rho, N) := \log f(y_k|\rho, N)$ defines the related LLF. In contrast to the DA case, no closed-form solution is available. Simplified numerical evaluation is the reason why the equivalent form in (3), involving only the first derivatives of $\Lambda(\cdot)$, is employed in (14). Since $E_y[\cdot]$ symbolizes the averaging with respect to data $c_k \in \mathcal{C}$ and noise $n_k$, the expected operations in (14) develop as

$$ E_y\!\left[\left(\frac{\partial \Lambda}{\partial \rho}\right)^2\right] = E_{\mathcal{C}}\!\left[\int_{-\infty}^{\infty} \left(\frac{f_\rho(y_k|\rho, N)}{f(y_k|\rho, N)}\right)^2 p_N(n_k)\, dn_k\right] \tag{15} $$

$$ E_y\!\left[\left(\frac{\partial \Lambda}{\partial N}\right)^2\right] = E_{\mathcal{C}}\!\left[\int_{-\infty}^{\infty} \left(\frac{f_N(y_k|\rho, N)}{f(y_k|\rho, N)}\right)^2 p_N(n_k)\, dn_k\right] \tag{16} $$

$$ E_y\!\left[\frac{\partial \Lambda}{\partial \rho}\, \frac{\partial \Lambda}{\partial N}\right] = E_{\mathcal{C}}\!\left[\int_{-\infty}^{\infty} \frac{f_\rho(y_k|\rho, N)\, f_N(y_k|\rho, N)}{f(y_k|\rho, N)^2}\, p_N(n_k)\, dn_k\right]. \tag{17} $$

In this context, $f_\rho(\cdot)$ and $f_N(\cdot)$ are the partial derivatives of (12) with respect to $\rho$ and $N$, whereas $p_N(n_k)$ denotes the probability density of $n_k$. Employing the definitions

$$ C_{ik}(\rho, N) := \cosh\!\left(2\sqrt{\rho/N}\, \mathrm{Re}[y_k d_i^*]\right) \tag{18} $$

$$ S_{ik}(\rho, N) := \sinh\!\left(2\sqrt{\rho/N}\, \mathrm{Re}[y_k d_i^*]\right) \tag{19} $$

the derivatives can be expressed as

$$ f_\rho(y_k|\rho, N) = A_k \sum_{d_i \in \mathcal{H}} e^{-\rho |d_i|^2} \left(\frac{\mathrm{Re}[y_k d_i^*]}{\sqrt{\rho N}}\, S_{ik}(\rho, N) - |d_i|^2\, C_{ik}(\rho, N)\right) \tag{20} $$

$$ f_N(y_k|\rho, N) = A_k \sum_{d_i \in \mathcal{H}} e^{-\rho |d_i|^2} \left(\left(\frac{|y_k|^2}{N^2} - \frac{1}{N}\right) C_{ik}(\rho, N) - \sqrt{\frac{\rho}{N^3}}\, \mathrm{Re}[y_k d_i^*]\, S_{ik}(\rho, N)\right). \tag{21} $$

Note that the factor $A_k := (\pi N)^{-1} e^{-|y_k|^2/N}$ is also part of $f(\cdot)$ in (12) such that it may be omitted throughout the computation of (15)–(17). The constant factor $2/M$ of (12), which likewise cancels in the ratios of (15)–(17), is not carried in (20) and (21). In the following, with $|y_k|^2 = N |\sqrt{\rho}\, c_k + n_k|^2$ and $\mathrm{Re}[y_k d_i^*] = \sqrt{N}\, \mathrm{Re}[(\sqrt{\rho}\, c_k + n_k) d_i^*]$ plugged into (12), (20) and (21), it becomes clear that the noise power $N$ cancels out in (14), i.e., the CRLB emerges only as a function of $\rho$ and the corresponding modulation scheme. Note also that for M-PSK the computation simplifies insofar as $|c_k| = |d_i| = 1$.

Each of the terms in (15)–(17) must be averaged with respect to the complex zero-mean noise samples $n_k = n_{k,I} + j n_{k,Q}$ by means of the probability density

$$ p_N(n_I, n_Q) = \frac{1}{\pi}\, e^{-(n_I^2 + n_Q^2)}. \tag{22} $$

Therefore, with $E_{\mathcal{C}}[\cdot]$ as the average with respect to the M-ary symbol alphabet $\mathcal{C}$ and the term of interest in (15)–(17) generically denoted by $\Phi(c_k, n_k)$, the expected operations can be summarized as

$$ E_y[\Phi(c_k, n_k)] = E_{\mathcal{C}}\!\left[\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \Phi(c_k, n_I, n_Q)\, p_N(n_I, n_Q)\, dn_I\, dn_Q\right] = \frac{1}{M} \sum_{c_k \in \mathcal{C}} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \Phi(c_k, n_I, n_Q)\, p_N(n_I, n_Q)\, dn_I\, dn_Q. \tag{23} $$

As already mentioned, no closed-form solution is available for this problem such that a numerical approach must be envisaged. From the open literature [9], [10], it is well known that the type of integrals encountered in (23) might be tackled most effectively by a Gauss-Hermitean quadrature, i.e.,

$$ E_y[\Phi(c_k, n_k)] \approx \frac{1}{\pi M} \sum_{c_k \in \mathcal{C}} \sum_{l=1}^{\nu} \sum_{m=1}^{\nu} H_l H_m\, \Phi(c_k, x_l, x_m) \tag{24} $$

where $H_l$, $H_m$ represent the Hermitean weighting coefficients and $x_l$, $x_m$ are roots of the Hermitean polynomial with degree $\nu$. Throughout this letter, a value of $\nu = 20$ turned out to be large enough for sufficient accuracy.

For several 2D constellations, Fig. 1 illustrates the evolution of the CRLB factor defined as $\mathrm{CRLB}_{\mathrm{NDA}}(\rho)/\mathrm{CRLB}_{\mathrm{DA}}(\rho)$. It can be seen that the NDA curves approach asymptotically the value of the DA limit, which is the same for all 2D schemes [6], whereas the NDA results turn out to be different. In contrast to the PSK schemes shown in Fig. 1, the 16-QAM factor develops in a non-monotone manner. Irrespective of the parameter to be estimated, this phenomenon seems to be typical for modulation schemes with non-constant modulus since it is also observed with the CRLB for NDA estimation of the carrier frequency/phase [11].

Fig. 1. CRLB factors for different 2D modulation schemes. (Plot of $\mathrm{CRLB}_{\mathrm{NDA}}/\mathrm{CRLB}_{\mathrm{DA}}$ versus SNR [dB]; curves for 2-PSK, 4-PSK, 8-PSK, and 16-QAM.)

The framework developed previously may be employed for real-valued signals as well, as is the case for 1D schemes like M-PAM. In this case, with $r_k = \sqrt{S}\, c_k + \sqrt{N}\, w_k$ and $E[c_k^2] = E[w_k^2] = 1$, the related probability function appears as

$$ f(r_k|\rho, N) = \frac{1}{\sqrt{2\pi N}}\, e^{-|r_k|^2/2N}\, \frac{2}{M} \sum_{d_i \in \mathcal{H}} e^{-\rho |d_i|^2/2} \cosh\!\left(\sqrt{\rho/N}\, r_k d_i\right). \tag{25} $$
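The numerical procedure of (14)–(24) for 2D constellations can be sketched in Python as follows. This is a hedged illustration rather than the author's implementation: instead of the cosh/sinh form of (12), (20), and (21), it works from the full-alphabet sum in (10) and normalizes the exponentials (a softmax over the symbols) to avoid overflow at high SNR; the two forms are algebraically equivalent when $d_i$ and $-d_i$ both belong to the alphabet. Since $N$ cancels in (14), $N = 1$ is assumed. The function names `fisher_terms`, `ncrlb_nda`, and `ncrlb_da` are hypothetical.

```python
import numpy as np

def fisher_terms(rho, constellation, order=20):
    """Approximate the expectations (15)-(17) by the double sum (24), with N = 1."""
    alphabet = np.asarray(constellation, dtype=complex)
    # Gauss-Hermite nodes x_l and weights H_l for the weight function exp(-x^2)
    x, h = np.polynomial.hermite.hermgauss(order)
    weights = np.outer(h, h) / np.pi                  # H_l H_m / pi
    noise = x[:, None] + 1j * x[None, :]              # n_I + j n_Q on the grid
    # y = sqrt(rho) c + n for every symbol c and grid point, shape (M, nu, nu)
    y = np.sqrt(rho) * alphabet[:, None, None] + noise[None, :, :]
    a = np.real(y[..., None] * np.conj(alphabet))     # Re[y d_i^*], shape (M, nu, nu, M)
    d2 = np.abs(alphabet) ** 2
    # normalized weights of the exponentials in (10), computed stably
    expo = 2.0 * np.sqrt(rho) * a - rho * d2
    expo -= expo.max(axis=-1, keepdims=True)
    post = np.exp(expo)
    post /= post.sum(axis=-1, keepdims=True)
    # first derivatives of the per-sample LLF with respect to rho and N (at N = 1)
    dl_drho = np.sum(post * (a / np.sqrt(rho) - d2), axis=-1)
    dl_dn = np.abs(y) ** 2 - 1.0 - np.sqrt(rho) * np.sum(post * a, axis=-1)

    def expect(phi):
        # (1 / (pi M)) sum_c sum_l sum_m H_l H_m phi, as in (24)
        return np.mean(np.sum(weights * phi, axis=(1, 2)))

    return expect(dl_drho ** 2), expect(dl_dn ** 2), expect(dl_drho * dl_dn)

def ncrlb_nda(rho, constellation, num_samples=1, order=20):
    """CRLB_NDA(rho) / rho^2 according to (14)."""
    j_rr, j_nn, j_rn = fisher_terms(rho, constellation, order)
    return j_nn / (num_samples * (j_rr * j_nn - j_rn ** 2)) / rho ** 2

def ncrlb_da(rho, num_samples=1):
    """Normalized DA bound (8)."""
    return (1.0 + 2.0 / rho) / num_samples

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
levels = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()   # E[|c|^2] = 1
for snr_db in (0, 5, 10, 15, 20):
    rho = 10.0 ** (snr_db / 10.0)
    print(snr_db,
          ncrlb_nda(rho, qpsk) / ncrlb_da(rho),
          ncrlb_nda(rho, qam16) / ncrlb_da(rho))
```

The printed ratios correspond to the CRLB factor plotted in Fig. 1. At high SNR the ratio should tend to one, which provides a simple consistency check against the closed-form DA bound (8).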

In (25), $\mathcal{H}$ denotes the subset of symbols on the positive (or negative) part of the real axis. Since (25) is very similar to (12), all relationships necessary to compute the CRLB for 1D problems are more or less the same as those provided for the 2D case. The main differences are some immaterial constants, the fact that no $\mathrm{Re}[\cdot]$ and $|\cdot|$ operations must be taken into account, and that the Gauss-Hermitean quadrature involves solely a single sum.

IV. PERFORMANCE COMPARISON

For oversampled signals distorted by AWGN, typically encountered at the matched filter (MF) input of digital receivers, a maximum-likelihood (ML) algorithm has been developed in [6], [12] for DA estimation of the SNR. However, it cannot be applied to oversampled signals at the MF output, mainly because the noise samples are then correlated (non-white); split-symbol estimators, also discussed in [6], are not useful alternatives because of their inferior performance. On the other hand, assuming the symbol timing to be established by appropriate means [1], [2] such that only one sample per symbol has to be processed, the ML method, denoted in [6] as the squared signal-to-noise variance (SNV) estimator, turns out to be an attractive solution for SNR estimation.

In the following, it is assumed that the carrier phase in (1) is suitably recovered by DA techniques [1], [2]. For convenience, bias and jitter effects will be neglected such that the phase estimate $\hat{\theta}_k = \theta_k$. Applying the perfectly recovered samples $z_k = r_k e^{-j\theta_k}$ to the analytical work in [6], the DA SNR estimate may be derived as

$$ \hat{\rho} = \frac{M_1^2}{M_0^2 M_2 - M_0 M_1^2} \tag{26} $$

where

$$ M_0 := \frac{1}{L} \sum_{k=1}^{L} |c_k|^2, \quad M_1 := \frac{1}{L} \sum_{k=1}^{L} \mathrm{Re}[c_k^* z_k], \quad M_2 := \frac{1}{L} \sum_{k=1}^{L} |z_k|^2. \tag{27} $$

Note that in [6] a bias-reducing factor $L/(L - 3/2)$ is taken into account as well. The latter can be omitted if $L$ is sufficiently large, as is the case throughout this letter. Moreover, if $|c_k| = 1$, then $M_0 = 1$ such that (26) simplifies accordingly.

For 4-PSK and $L = 512$, Fig. 2 depicts the simulation results of the mean square error (MSE), i.e., $E[(\rho - \hat{\rho})^2]$, for convenience normalized with $\rho^2$. It can be seen that the performance of the DA estimator is very close to $\mathrm{CRLB}_{\mathrm{DA}}$. Fig. 3 exemplifies the behavior of the mean estimator output given by $E[\hat{\rho}]$. No significant deviation from the unbiased case $E[\hat{\rho}] = \rho$ is identified.

Fig. 2. Evolution of the normalized mean square error $E[(\rho - \hat{\rho})^2]/\rho^2$: $L = 512$. (Plot of normalized MSE versus SNR [dB]; curves for DA (4-PSK), DA (16-QAM), DD (4-PSK), DD (16-QAM), NCRLB NDA (4-PSK), NCRLB NDA (16-QAM), and NCRLB DA.)

Fig. 3. Evolution of the mean estimator output $E[\hat{\rho}]$: $L = 512$. (Plot of mean estimator output versus SNR [dB]; curves for DA (4-PSK), DA (16-QAM), DD (4-PSK), DD (16-QAM), and the unbiased reference.)

If data are not known in advance, the carrier phase in (1) must be computed via suitable NDA algorithms [1], [2]. Taking into consideration phase ambiguity but neglecting again bias and jitter effects, $\hat{\theta}_k$ is now given by $\theta_k + 2i\pi/M$, $i = 0, 1, \ldots, M - 1$, for M-PSK, and $\theta_k + i\pi/2$, $i = 0, 1, \ldots, 3$, for M-QAM or similar quadrature-symmetric schemes. Using (27) evaluated at $z_k = r_k e^{-j\hat{\theta}_k}$ and $c_k$ replaced by the decision value $\hat{c}_k$, (26) may be employed to obtain a decision-directed (DD) SNR estimate. Fig. 2 shows that the MSE deviates significantly from $\mathrm{CRLB}_{\mathrm{DA}}$ as soon as SNR < 10 dB. Note also the gap between the MSE and $\mathrm{CRLB}_{\mathrm{NDA}}$, which suggests the development of an NDA estimator derived from the ML principle. This observation is also confirmed by Fig. 3, where the mean DD output starts to degrade at SNR < 7 dB.

Fig. 2 also illustrates the DA/DD performance for 16-QAM and $L = 512$. As already observed with 4-PSK, the DA variance is close to $\mathrm{CRLB}_{\mathrm{DA}}$, whereas the DD solution diverges considerably from $\mathrm{CRLB}_{\mathrm{NDA}}$ when SNR < 15 dB, which is also verified by Fig. 3 showing the mean estimator output.

V. CONCLUDING REMARKS

In order to compute the CRLB for NDA estimation of the SNR, an efficient procedure has been presented. This is mainly achieved (i) by exploiting the axis/halfplane symmetry of conventional one/two-dimensional modulation schemes and (ii) by numerical evaluation of the involved integrals via a Gauss-Hermitean quadrature. Regardless of the signal constellation, the NDA bound approaches asymptotically the DA case with increasing SNR values. In the sequel, the performance of DA and DD estimators available from the open literature has been verified by simulation results: whereas the former exhibit an MSE close to the related CRLB, the latter reveal an increasing gap in the medium-to-low SNR range, which might be bridged by an NDA estimator derived from the corresponding ML principle. For real-valued signals like M-PAM, the derived relationships apply as well, although results, due to their minor relevance in practice, are not shown.

ACKNOWLEDGMENT

Part of the work has been carried out in SatNEx-II (Satellite Communications Network of Excellence, IST No. 27393) launched by the European Commission for advanced research in satellite communications within the Sixth Framework Programme. Also, the author would like to thank his colleagues for fruitful discussions and the anonymous reviewers for their comments, which helped to improve the quality of this paper.
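The estimator (26)–(27) of Section IV lends itself to a short Monte Carlo check. The following Python sketch is an illustration under stated assumptions, not the simulation setup of the letter: it uses 4-PSK, $N = 1$, perfect phase recovery, a fixed random seed, and 2000 trials, and the helper names `snv_snr_estimate` and `psk_decisions` are hypothetical. The DD variant uses hard decisions that maximize $\mathrm{Re}[z_k d_i^*]$, which is only appropriate for constant-modulus alphabets.

```python
import numpy as np

def snv_snr_estimate(z, c):
    """SNR estimate (26) from derotated samples z and (known or decided) symbols c."""
    m0 = np.mean(np.abs(c) ** 2)               # M_0 in (27)
    m1 = np.mean(np.real(np.conj(c) * z))      # M_1 in (27)
    m2 = np.mean(np.abs(z) ** 2)               # M_2 in (27)
    return m1 ** 2 / (m0 ** 2 * m2 - m0 * m1 ** 2)

def psk_decisions(z, constellation):
    """Hard decisions for a constant-modulus alphabet: pick d_i maximizing Re[z d_i^*]."""
    corr = np.real(z[:, None] * np.conj(constellation[None, :]))
    return constellation[np.argmax(corr, axis=1)]

rng = np.random.default_rng(1)
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
num_samples, trials, rho = 512, 2000, 10.0     # rho = 10 corresponds to 10 dB

da = np.empty(trials)
dd = np.empty(trials)
for t in range(trials):
    c = rng.choice(qpsk, size=num_samples)
    n = (rng.standard_normal(num_samples)
         + 1j * rng.standard_normal(num_samples)) / np.sqrt(2.0)
    z = np.sqrt(rho) * c + n                   # derotated samples with N = 1
    da[t] = snv_snr_estimate(z, c)                          # data-aided
    dd[t] = snv_snr_estimate(z, psk_decisions(z, qpsk))     # decision-directed

nmse_da = np.mean((da - rho) ** 2) / rho ** 2
ncrlb_da = (1.0 + 2.0 / rho) / num_samples     # normalized DA bound (8)
print("normalized MSE (DA):", nmse_da, "NCRLB_DA:", ncrlb_da)
print("mean DA estimate:", da.mean(), "mean DD estimate:", dd.mean())
```

Lowering `rho` in this sketch is one way to observe the growing gap between the DD estimate and the DA estimate that Figs. 2 and 3 describe for the low-SNR range.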


REFERENCES

[1] U. Mengali and A. N. D'Andrea, Synchronization Techniques for Digital Receivers. New York: Plenum Press, 1997.
[2] H. Meyr, M. Moeneclaey, and S. A. Fechtel, Digital Communication Receivers: Synchronization, Channel Estimation, and Signal Processing. New York: Wiley, 1998.
[3] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ: Prentice Hall, 1993.
[4] T. S. Chung and A. J. Goldsmith, "Degrees of freedom in adaptive modulation: a unified view," IEEE Trans. Commun., vol. COM-49, no. 9, pp. 1561–1571, Sept. 2001.
[5] T. A. Summers and S. G. Wilson, "SNR mismatch and online estimation in Turbo decoding," IEEE Trans. Commun., vol. COM-46, no. 4, pp. 421–423, Apr. 1998.
[6] D. R. Pauluzzi and N. C. Beaulieu, "A comparison of SNR estimation techniques for the AWGN channel," IEEE Trans. Commun., vol. COM-48, no. 10, pp. 1681–1691, Oct. 2000.
[7] N. S. Alagha, "Cramer-Rao bounds of SNR estimates for BPSK and QPSK modulated signals," IEEE Commun. Lett., vol. 5, no. 1, pp. 10–12, Jan. 2001.
[8] A. Wiesel, J. Goldberg, and H. Messer, "Non-data-aided signal-to-noise ratio estimation," in Proc. Int. Commun. Conf., New York, May 2002, pp. 197–201.
[9] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York: Dover Publications, 1970.
[10] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing. New York: Cambridge Univ. Press, 1994.
[11] F. Rice, B. Cowley, B. Moran, and M. Rice, "Cramer-Rao lower bounds for QAM phase and frequency estimation," IEEE Trans. Commun., vol. COM-49, no. 9, pp. 1582–1591, Sept. 2001.
[12] Y. Chen and N. C. Beaulieu, "Maximum likelihood SNR estimators for digital receivers," in Proc. Pacific RIM Conference on Communications, Computers and Signal Processing, Victoria, BC, Canada, Aug. 2005, pp. 637–640.


Simple Average BER Formulas for M-ary Orthogonal Signals with Noncoherent Diversity Combining in Nakagami-m Fading Channels

Redha M. Radaydeh, Member, IEEE, and Mustafa M. Matalgah, Senior Member, IEEE

Abstract: This paper presents simple and exact expressions for the average bit error rate (BER) of M-ary orthogonal signals with noncoherent diversity combining in independent as well as arbitrarily correlated nonidentically distributed Nakagami-m fading channels. The expressions are given in terms of elementary functions, and they do not involve numerical integrals or complicated functional operations. In addition, they are valid for the generalized case when the channels have arbitrary average signal-to-noise ratios (SNRs) as well as nonidentical fading parameters. Simulation results are provided to validate the analytical results.

Index Terms: M-ary orthogonal signals, noncoherent diversity combining, Nakagami-m fading, correlated fading, average bit error rate.

I. INTRODUCTION

Noncoherent M-ary orthogonal modulation is employed in practice in current wireless communication systems (for example, in the reverse link of IS-95 wireless systems). The average error performance of M-ary orthogonal signals over nonidentical and/or correlated fading channels with diversity reception has been investigated in several places (see, e.g., [2]–[6] and [1, ch. 9]) for different fading channel distributions. In these references, the conventional noncoherent equal-gain combining (NC-EGC), which is also known as square-law combining, is employed at the receiver to achieve diversity gain. Although noncoherent square-law diversity combining is simple to implement, it usually results in complicated expressions for the system average error performance. In [1] (see also [2]) and [6], the system average bit error rate (BER) has been analyzed with two different approaches, and the final results have been expressed in terms of one-fold finite-range integrals. In [3], the average BER has been expressed in terms of higher-order derivatives of the moment generating function (MGF) of the "signal-plus-noise" decision variable, which were evaluated in closed forms given in terms of special summations. These special summations require look-up tables to be calculated separately for each channel fading scenario and set of transmission conditions. In [4], the average BER has also been expressed in terms of higher-order derivatives of the MGF of the "signal-plus-noise" decision variable. To evaluate these higher-order derivatives, special complicated recursive algorithms were developed for Nakagami-m and Rician fading models. In [5], a special algorithm has been presented (see [5, app. C]) to evaluate the higher-order derivatives of the MGF of the "signal-plus-noise" decision variable (or more specifically, the MGF of the combined signal-to-noise ratio (SNR)).

A main drawback of the conventional square-law combiner, in addition to the complexity of its performance expressions, is that it incurs noncoherent combining loss at relatively low average SNRs [1, ch. 7], [7, ch. 12]. Recently in [8], a noncoherent diversity combiner has been proposed for nonidentical Nakagami-m fading channels. This novel noncoherent combiner has the same structure as the conventional square-law combiner but with weighted branches, and hence it may have a comparable level of implementation complexity, given that the weighting coefficients are known. It has been shown in [8] that the proposed combiner is capable of completely eliminating the noncoherent combining loss associated with the conventional square-law combiner. The analysis in [8] is restricted to the case of binary differential phase shift keying (DPSK) and binary noncoherent orthogonal frequency shift keying (NC-FSK) signals over independent nonidentical Nakagami-m fading channels.

In this paper, we use the combiner proposed in [8] to extend the analysis of the system average BER to the case of M-ary orthogonal signals over independent and arbitrarily correlated nonidentical Nakagami-m fading channels. The expressions for the average BER derived herein are given in terms of elementary functions, and they do not involve numerical integrations or complicated functional operations. Interestingly, the results show that the combiner proposed in [8] works well (more specifically, does not incur noncoherent combining loss) even for the case of arbitrarily correlated channels that experience high correlation coefficients.

The rest of the paper is organized as follows. Section II presents the system model. Section III contains the derivation of the system average BER. Comparisons between the expressions for the system average BER derived in this paper and those addressed previously are depicted in Section IV. Finally, conclusions follow in Section V.

Paper approved by V. A. Aalo, the Editor for Diversity and Fading Channel Theory of the IEEE Communications Society. Manuscript received May 2, 2006; revised December 6, 2006 and May 6, 2007. This paper was presented in part at the 49th annual IEEE Global Telecommunications Conference (GLOBECOM'2006), San Francisco, CA, November 2006. R. M. Radaydeh is with the Department of Electrical Engineering, Jordan University of Science and Technology (JUST), Irbid, 22110, Jordan (e-mail: [email protected]). M. M. Matalgah is with the Center for Wireless Communications, Department of Electrical Engineering, The University of Mississippi, University, MS 38677, USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060274.

0090-6778/08$25.00 © 2008 IEEE

II. SYSTEM MODEL

The input information bits are divided into groups, each containing $k_s$ bits. Every group of bits is used by the modulator to activate one of the predefined complex-valued baseband orthogonal signals $\{\tilde{s}_i(x)\}_{i=1}^{M}$, where $M = 2^{k_s}$ is the total number of signals. The real-valued bandpass transmitted signal can be written as

$$ s_i(x) = \mathrm{Re}\!\left\{\tilde{s}_i(x)\, e^{j 2\pi f_c x}\right\}; \quad 0 \le x \le T_s, \tag{1} $$

where $j = \sqrt{-1}$, $T_s = k_s T_b$ is the signal duration, $T_b$ is the bit duration, and $f_c$ is the carrier frequency in Hz. The signals $\{\tilde{s}_i(x)\}_{i=1}^{M}$ in (1) satisfy the following

$$ \int_{0}^{T_s} \tilde{s}_k(x)\, \tilde{s}_i^*(x)\, dx = 2E_s\, \delta(i - k), \tag{2} $$

where $E_s = k_s E_b$ is the energy of the real-valued bandpass signal, and $E_b$ is the bit energy. The signal in (1) is transmitted over a slowly varying fading channel, which is typical of digital mobile radio applications. The fading is said to be slow as long as the signaling duration $T_s$ is less than the channel coherence time $T_c$ [1, ch. 2]. For an average reference signaling rate of 100 kb/s, the average signaling duration is $0.01\,k_s$ ms, where $k_s$ is the number of bits per signal. The value of $T_c$ ranges from 30 ms down to 1 ms for mobile speeds of 2.5 to 75 km/h [10]. Therefore, the fading will be constant over at least one signaling duration.

To combat the severe effects of deep fading, a diversity system of order $L$ is used. There are several design methods for achieving diversity in the reception of faded signals. One method is to use an antenna array at the receiver (antenna diversity). Another method is to use multiple frequency channels separated by at least the channel coherence bandwidth (e.g., multicarrier systems such as MC-CDMA and OFDM). In broadband wireless applications over frequency-selective channels, diversity can be achieved by resolving received multipath components at different time delays (e.g., as in direct-sequence spread-spectrum systems with a Rake receiver). The complex-valued impulse response of the channel is given by $\tilde{h}(\tau) = \sum_{\ell=1}^{L} \nu_\ell\, e^{j\phi_\ell}\, \delta(\tau - \tau_\ell)$, where $\nu_\ell$, $\phi_\ell$, and $\tau_\ell$ are, respectively, the channel fading envelope, phase shift, and time delay of the $\ell$th diversity branch. The envelopes $\{\nu_\ell\}_{\ell=1}^{L}$ are in general arbitrarily correlated and nonidentically distributed Nakagami-m random variables. The faded signal replica at the input of the diversity receiver that corresponds to the $i$th transmitted signal can now be written, in equivalent low-pass representation, as

$$ \tilde{r}_i(x) = \sum_{\ell=1}^{L} \nu_\ell\, \tilde{s}_i(x - \tau_\ell)\, e^{j\phi_\ell} + \tilde{z}_{i,\ell}(x), \tag{3} $$

where $\{\tilde{z}_{i,\ell}(x)\}_{\ell=1}^{L}$, for $i = 1, 2, \ldots, M$, are independent identically distributed (i.i.d.) complex-valued additive white Gaussian noise (AWGN) random processes, each having zero mean and normalized variance (i.e., $\mathrm{Var}\{\tilde{z}_{i,\ell}(x)\} = E\{|\tilde{z}_{i,\ell}(x)|^2\} = 1$). On each diversity branch, the signal in (3) is passed through a bank of $M$ matched filters that are tuned to $\{\tilde{s}_i(x)\}_{i=1}^{M}$. The resulting signals at the output of the $\ell$th diversity branch may be written as

$$ \tilde{r}_{k,\ell} = \begin{cases} \sqrt{2\Gamma_\ell} + \tilde{z}_{i,\ell}, & \text{if } k = i \\ \tilde{z}_{k,\ell}, & \text{if } k \neq i \end{cases}, \tag{4} $$

where $\{\tilde{z}_{i,\ell}\}_{\ell=1}^{L}$, for $i = 1, 2, \ldots, M$, are i.i.d. complex-valued AWGN random variables each having zero mean and unity variance, and $\Gamma_\ell = \nu_\ell^2 E_s / \mathrm{Var}\{\tilde{z}_{i,\ell}\}$ is the instantaneous SNR over the $\ell$th diversity branch. In [8], a suboptimum noncoherent decision metric for the case of nonidentical Nakagami-m fading channels has been proposed. That is, we obtain the decision variables as

$$ \lambda_k = \sum_{\ell=1}^{L} \lambda_{k,\ell} = \sum_{\ell=1}^{L} w_\ell\, |\tilde{r}_{k,\ell}|^2; \quad k = 1, 2, \ldots, M, \tag{5} $$

where $w_\ell = \dfrac{\bar{\Gamma}_\ell}{\bar{\Gamma}_\ell / m_\ell + 1}$ ($\bar{\Gamma}_\ell$ is the average SNR per symbol and $m_\ell$ is the Nakagami-m fading parameter, both corresponding to the $\ell$th diversity branch), and $\tilde{r}_{k,\ell}$ is defined in (4). The correct decision is made at the receiver iff $\lambda_i = \max\{\lambda_k\}_{k=1}^{M}$, i.e., iff $\lambda_i > \lambda_k$ for all $k \neq i$.
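The weighted square-law metric in (5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight is taken as $w_\ell = \bar{\Gamma}_\ell / (\bar{\Gamma}_\ell/m_\ell + 1)$, and the function names and the nested-list layout of the matched-filter outputs (`r[k][l]` for signal index k and branch l) are hypothetical choices, not part of the paper.

```python
def branch_weights(avg_snr, m):
    """Weights w_l = avg_snr_l / (avg_snr_l / m_l + 1), one per diversity branch.

    avg_snr: average SNR per symbol of each branch (linear scale).
    m:       Nakagami-m fading parameter of each branch.
    """
    return [g / (g / ml + 1.0) for g, ml in zip(avg_snr, m)]


def decision_variables(r, w):
    """lambda_k = sum_l w_l * |r[k][l]|^2 for each of the M matched-filter banks."""
    return [sum(wl * abs(rkl) ** 2 for wl, rkl in zip(w, row)) for row in r]


def decide(r, w):
    """Return the (0-based) index of the signal with the largest decision variable."""
    lam = decision_variables(r, w)
    return max(range(len(lam)), key=lam.__getitem__)


if __name__ == "__main__":
    # Hypothetical example: M = 2 signals, L = 2 branches.
    w = branch_weights(avg_snr=[10.0, 1.0], m=[1.0, 2.0])
    r = [[1 + 1j, 0.1], [0.2, 0.1j]]
    print(w, decision_variables(r, w), decide(r, w))
```

Setting all weights to one recovers the conventional square-law combiner, so the same helper can be used to compare the two receivers in a simulation.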

III. AVERAGE ERROR PERFORMANCE

From (4) and (5), the random variables $\{\lambda_{k,\ell}\}$, for $k \neq i$, are i.i.d. Gamma distributed random variables, each with parameters $\lambda_{k,\ell} \sim \mathcal{G}(1, 2w_\ell)$. The characteristic function (CF) of $\lambda_{k,\ell}$ is given by $\Psi_{\lambda_{k,\ell}}(jv) = (1 - jv2w_\ell)^{-1}$. Then the CF of $\lambda_k = \sum_{\ell=1}^{L} \lambda_{k,\ell}$, for $k \neq i$, becomes [7, ch. 2]

$$ \Psi_{\lambda_k}(jv) = \prod_{\ell=1}^{L} \frac{1}{1 - jv2w_\ell} = \sum_{\ell=1}^{L} \frac{A_\ell}{1 - jv2w_\ell}, \tag{6} $$

where

$$ A_\ell = \Big[ \Psi_{\lambda_k}(jv)\,(1 - jv2w_\ell) \Big]\Big|_{jv = 1/(2w_\ell)}. \tag{7} $$

Assuming equiprobable signals and that the $i$th signal is transmitted, the average symbol error rate (SER) is defined as

$$ P_s(e) = 1 - \int_0^{\infty} f_{\lambda_i}(x) \left[ \int_0^{x} f_{\lambda_k}(u)\, du \right]^{M-1} dx, \tag{8} $$

where $f_\lambda(x)$ is the probability density function (pdf) of the nonnegative decision variable $\lambda$. The pdf $f_{\lambda_k}(u)$ in (8) can be obtained by taking the inverse Fourier transform of (6), and the result is

$$ f_{\lambda_k}(u) = \sum_{\ell=1}^{L} \frac{A_\ell}{2w_\ell}\, e^{-u/2w_\ell}. \tag{9} $$

Now the inner term of the integral in (8) can be expressed as

$$ \left[ \int_0^{x} f_{\lambda_k}(u)\, du \right]^{M-1} = \sum_{n=0}^{M-1} (-1)^n \binom{M-1}{n} \left( \sum_{\ell=1}^{L} A_\ell\, e^{-x/2w_\ell} \right)^{n}. \tag{10} $$

Applying the multinomial expansion, one has [11]

$$ \left( \sum_{\ell=1}^{L} A_\ell\, e^{-x/2w_\ell} \right)^{n} = n! \sum_{|\mathbf{t}|=n}\ \prod_{\ell=1}^{L} \frac{\left( A_\ell\, e^{-x/2w_\ell} \right)^{t_\ell}}{t_\ell!}, \tag{11} $$

where $\mathbf{t} = [t_1\ t_2\ \ldots\ t_L]$ is an $L$-dimensional vector of nonnegative integers. The sum in (11) is taken over all vectors whose elements sum to $n$ (i.e., $|\mathbf{t}| = \sum_{\ell=1}^{L} t_\ell = n$). The equality in (11) can be easily verified by considering the simple case of $n = 1$; wherein, for the right hand side, a total of $L$ vectors each containing one unity element and $L - 1$ zero elements are


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

considered. Substituting (11) into (10), and then the result in (8), gives

$$ P_s(e) = \sum_{n=1}^{M-1} \sum_{|\mathbf{t}|=n} B(M,\mathbf{t}) \int_0^{\infty} \exp\!\left( -x \sum_{\ell=1}^{L} \frac{t_\ell}{2w_\ell} \right) f_{\lambda_i}(x)\, dx = \sum_{n=1}^{M-1} \sum_{|\mathbf{t}|=n} B(M,\mathbf{t})\, \Psi_{\lambda_i}\!\left( -\sum_{\ell=1}^{L} \frac{t_\ell}{2w_\ell} \right), \tag{12} $$

where

$$ B(M,\mathbf{t}) = \frac{(-1)^{n+1} (M-1)!}{(M-n-1)!} \prod_{\ell=1}^{L} \frac{(A_\ell)^{t_\ell}}{t_\ell!}, \tag{13} $$

and $\Psi_{\lambda_i}(jv)$ is the CF of the “signal-plus-noise” decision variable $\lambda_i = \sum_{p=1}^{L} \lambda_{i,p}$. Conditioned on the fading statistics, the “signal-plus-noise” random variable $\lambda_{i,p}$ follows a noncentral chi-square distribution with 2 degrees of freedom and noncentrality parameter $2w_p\Gamma_p$. The conditional CF of $\lambda_i$ is given by [7, ch. 2]

$$ \Psi_{\lambda_i}\!\left( jv \,\middle|\, \{\Gamma_p\}_{p=1}^{L} \right) = \prod_{p=1}^{L} \frac{1}{1 - jv2w_p} \exp\!\left\{ \frac{jv2w_p\Gamma_p}{1 - jv2w_p} \right\}. \tag{14} $$

The unconditional CF of $\lambda_i$ can be obtained by averaging (14) over the joint pdf of the received SNRs $\{\Gamma_p\}_{p=1}^{L}$. The result depends on whether the diversity channels are independent or not. We now consider the cases of independent as well as arbitrarily correlated nonidentically distributed Nakagami-m channels.

A. Independent Nonidentical Channels

For independent nonidentically distributed Nakagami-m channels, the joint pdf of $\{\Gamma_p\}_{p=1}^{L}$ factors into the product of their marginal pdfs. Therefore the unconditional CF of $\lambda_i$ may be expressed as

$$ \Psi_{\lambda_i}(jv) = \prod_{p=1}^{L} \frac{1}{1 - jv2w_p} \int_0^{\infty} \exp\!\left\{ \frac{jv2w_p\Gamma_p}{1 - jv2w_p} \right\} f_{\Gamma_p}(\Gamma_p)\, d\Gamma_p = \prod_{p=1}^{L} \frac{1}{1 - jv2w_p}\, \Psi_{\Gamma_p}\!\left( \frac{jv2w_p}{1 - jv2w_p} \right). \tag{15} $$

The CF of $\Gamma_p$ is given by $\Psi_{\Gamma_p}(jv) = (1 - jv\bar{\Gamma}_p/m_p)^{-m_p}$ [1, (2.22)]. Substituting this result into (15), we obtain

$$ \Psi_{\lambda_i}(jv) = \prod_{p=1}^{L} (1 - jv2w_p)^{m_p - 1} \left( 1 - jv2w_p \left( 1 + \frac{\bar{\Gamma}_p}{m_p} \right) \right)^{-m_p}. \tag{16} $$

The SER in (12) can be expressed in closed form, and then the average BER, which is given by $P_b(e) = \frac{M}{2(M-1)} P_s(e)$, becomes

$$ P_b(e) = \frac{M}{2(M-1)} \sum_{n=1}^{M-1} \sum_{|\mathbf{t}|=n} B(M,\mathbf{t}) \prod_{p=1}^{L} \left( 1 + \sum_{\ell=1}^{L} \frac{w_p t_\ell}{w_\ell} \right)^{m_p - 1} \left( 1 + \left( 1 + \frac{\bar{\Gamma}_p}{m_p} \right) \sum_{\ell=1}^{L} \frac{w_p t_\ell}{w_\ell} \right)^{-m_p}. \tag{17} $$
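A direct numerical evaluation of (17) may help in checking the derivation. The sketch below enumerates the vectors $\mathbf{t}$ with $|\mathbf{t}| = n$, forms $A_\ell$ from (7) as $\prod_{k \neq \ell} w_\ell/(w_\ell - w_k)$ (which assumes pairwise distinct weights), and sums the terms of (17). The function names are hypothetical, and the weights are passed in explicitly so that no particular weight formula is assumed.

```python
from math import factorial, prod


def compositions(n, L):
    """Yield all length-L tuples of nonnegative integers that sum to n."""
    if L == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, L - 1):
            yield (first,) + rest


def partial_fraction_coeffs(w):
    """A_l of (7): product over k != l of w_l / (w_l - w_k); weights must be distinct."""
    return [prod(wl / (wl - wk) for k, wk in enumerate(w) if k != l)
            for l, wl in enumerate(w)]


def avg_ber_independent(M, avg_snr, m, w):
    """Average BER of (17) for independent nonidentical Nakagami-m branches."""
    L = len(w)
    A = partial_fraction_coeffs(w)
    total = 0.0
    for n in range(1, M):
        sign = (-1) ** (n + 1)
        falling = factorial(M - 1) // factorial(M - 1 - n)
        for t in compositions(n, L):
            # B(M, t) of (13)
            B = sign * falling * prod(A[l] ** t[l] / factorial(t[l]) for l in range(L))
            S = sum(t[l] / w[l] for l in range(L))  # sum_l t_l / w_l
            term = prod((1 + w[p] * S) ** (m[p] - 1)
                        * (1 + (1 + avg_snr[p] / m[p]) * w[p] * S) ** (-m[p])
                        for p in range(L))
            total += B * term
    return M / (2 * (M - 1)) * total


if __name__ == "__main__":
    snr, m = [10.0, 5.0], [1.0, 1.0]
    w = [g / (g / mm + 1.0) for g, mm in zip(snr, m)]  # assumed weight formula
    print(avg_ber_independent(2, snr, m, w))
```

For a single branch the weight cancels out of (17), so the routine can be checked against the single-channel formula (18) for any positive weight.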

The result in (17) is a new simple expression for the average BER of M-ary orthogonal signals in nonidentical Nakagami-m channels employing noncoherent diversity combining. For the special case when $L = 1$, the parameters in (17) become as follows: $\mathbf{t} = n$, $B(M,\mathbf{t}) = \binom{M-1}{n} (-1)^{n+1}$, $m_p = m$, and $\bar{\Gamma}_p = \bar{\Gamma}$. Then (17) reduces to

$$ P_b(e) = \frac{M}{2(M-1)} \sum_{n=1}^{M-1} \binom{M-1}{n} \frac{(-1)^{n+1} (1+n)^{m-1}}{\left( 1 + n\left( 1 + \frac{\bar{\Gamma}}{m} \right) \right)^{m}}, \tag{18} $$

which is the well-known result for the average BER of M-ary orthogonal signals in a single Nakagami-m fading channel (see, e.g., [1, eq. (8.195)]).

B. Arbitrarily Correlated Nonidentical Channels

For the case of arbitrarily correlated nonidentical Nakagami-m channels, the joint pdf of $\{\Gamma_p\}_{p=1}^{L}$ does not factor into the product of their marginal pdfs. Alternatively, we define

$$ \Gamma_0 = \sum_{p=1}^{L} \frac{2w_p\Gamma_p}{1 - jv2w_p} = \sum_{p=1}^{L} a_p\Gamma_p, \tag{19} $$

where $a_p = \dfrac{2w_p}{1 - jv2w_p}$. Then (15) may now be written as

$$ \Psi_{\lambda_i}(jv) = \prod_{p=1}^{L} \frac{1}{1 - jv2w_p} \int_0^{\infty} e^{jv\Gamma_0} f_{\Gamma_0}(\Gamma_0)\, d\Gamma_0 = \Psi_{\Gamma_0}(jv) \prod_{p=1}^{L} \frac{1}{1 - jv2w_p}. \tag{20} $$

The instantaneous SNR $\Gamma_p$ is a Gamma distributed random variable having parameters $\Gamma_p \sim \mathcal{G}(m_p, \bar{\Gamma}_p/m_p)$. Therefore, $\Gamma_p$ can be expressed as the sum of $2m_p$ squared i.i.d. Gaussian random variables, $\{g_{p,h}\}_{h=1}^{2m_p}$, each with zero mean and variance $\mathrm{Var}\{g_{p,h}\} = \bar{\Gamma}_p/2m_p$ (i.e., $g_{p,h} \sim \mathcal{N}(0, \bar{\Gamma}_p/2m_p)$). That is, $\Gamma_p = \sum_{h=1}^{2m_p} g_{p,h}^2 = \mathbf{g}_p^T \mathbf{g}_p$, where $\mathbf{g}_p = [g_{p,1}\ g_{p,2}\ \cdots\ g_{p,2m_p}]^T$, for $p = 1, 2, \ldots, L$. Then $\Gamma_0 = \sum_{p=1}^{L} \mathbf{g}_p'^{T} \mathbf{g}_p'$, where $\mathbf{g}_p' = \sqrt{a_p}\, \mathbf{g}_p$. The statistical correlation among $\{\Gamma_p\}_{p=1}^{L}$ can be defined in terms of that among the vectors $\{\mathbf{g}_p'\}_{p=1}^{L}$. Following the assumptions made in [12], let the fading parameters be indexed in increasing order such that $m_1 \le m_2 \le \cdots \le m_L$. Define $\mathbf{g}' = [\mathbf{g}_1'^{T}\ \mathbf{g}_2'^{T}\ \cdots\ \mathbf{g}_L'^{T}]^T$ to be an $m_T \times 1$ vector, where $m_T = \sum_{p=1}^{L} 2m_p$. Then the elements of the $m_T \times m_T$ covariance matrix $\mathbf{Q} = \mathrm{Cov}\{\mathbf{g}' \mathbf{g}'^{T}\}$ are constructed as

$$ \mathrm{Cov}\{ g_{p,h}'\, g_{c,e}' \} = \begin{cases} a_p \mathrm{Var}\{g_{p,h}\}, & \text{if } p = c \text{ and } h = e; \\ \rho_{p,c} \sqrt{a_p a_c \mathrm{Var}\{g_{p,h}\} \mathrm{Var}\{g_{c,h}\}}, & \text{if } p \neq c \text{ but } h = e = 1, 2, \ldots, 2\min\{m_p, m_c\}; \\ 0, & \text{otherwise.} \end{cases} $$

The correlation coefficient between $\Gamma_p$ and $\Gamma_c$ is defined as $\rho_{\Gamma_p,\Gamma_c} = \mathrm{Cov}\{\Gamma_p \Gamma_c\} / \sqrt{\mathrm{Var}\{\Gamma_p\} \mathrm{Var}\{\Gamma_c\}}$, which can be expressed in terms of $\rho_{p,c}$ as $\rho_{\Gamma_p,\Gamma_c} = \frac{\min\{m_p, m_c\}}{\max\{m_p, m_c\}}\, \rho_{p,c}^2$ [12]. Let $\{\beta_u\}_{u=1}^{m_T}$ be the eigenvalues of $\mathbf{Q}$. Using the Karhunen-Loève (KL) expansion of the vector $\mathbf{g}'$, we have $\mathbf{g}' = \sum_{u=1}^{m_T} \sqrt{\beta_u}\, \chi_u \mathbf{S}_u$, where $\{\chi_u\}_{u=1}^{m_T}$ is a set of i.i.d. Gaussian random variables each with zero mean and unity variance, and $\mathbf{S}_u$ is the $u$th orthonormal eigenvector corresponding to $\beta_u$. Then $\Gamma_0 \overset{d}{=} \sum_{u=1}^{m_T} \chi_u'$, where the symbol “$\overset{d}{=}$” denotes equality in distribution, $\chi_u' = \beta_u \chi_u^2$, and $\{\chi_u'\}_{u=1}^{m_T}$ is a set of independent Gamma distributed random variables with parameters $\chi_u' \sim \mathcal{G}(1/2, 2\beta_u)$. Let $\{\beta_d\}_{d=1}^{D}$, for $D \le m_T$, be the distinct eigenvalues of the matrix $\mathbf{Q}$, where $\beta_d$ has algebraic multiplicity $\mu_d$ such that $\sum_{d=1}^{D} \mu_d = m_T$. Then $\Gamma_0 \overset{d}{=} \sum_{d=1}^{D} \zeta_d$, where $\{\zeta_d\}_{d=1}^{D}$ is a set of independent Gamma distributed random variables with parameters $\zeta_d \sim \mathcal{G}(\mu_d/2, 2\beta_d)$. The CF of $\Gamma_0$ can now be expressed as

$$ \Psi_{\Gamma_0}(jv) = \det(\mathbf{I} - jv2\mathbf{Q})^{-1/2} = \prod_{d=1}^{D} (1 - jv2\beta_d)^{-\mu_d/2}, \tag{21} $$

where $\mathbf{I}$ is an identity matrix. Substituting (21) into (20) and then the resulting CF into (12) gives¹

$$ P_b(e) = \frac{M}{2(M-1)} \sum_{n=1}^{M-1} \sum_{|\mathbf{t}|=n} B(M,\mathbf{t}) \prod_{d=1}^{D} \left( 1 + \beta_d \sum_{\ell=1}^{L} \frac{t_\ell}{w_\ell} \right)^{-\mu_d/2} \prod_{p=1}^{L} \left( 1 + \sum_{\ell=1}^{L} \frac{w_p t_\ell}{w_\ell} \right)^{-1}. \tag{22} $$

¹Note that the eigenvalues of the matrix $\mathbf{Q}$ are obtained when $jv = -\sum_{\ell=1}^{L} \frac{t_\ell}{2w_\ell}$ in $\left\{ a_p = \frac{2w_p}{1 - jv2w_p} \right\}_{p=1}^{L}$.

For the case of independent nonidentical fading channels, the covariance matrix can be expressed as

$$ \mathbf{Q} = \mathrm{diag}\Big[ \underbrace{a_1 \tfrac{\bar{\Gamma}_1}{2m_1}\ \cdots\ a_1 \tfrac{\bar{\Gamma}_1}{2m_1}}_{2m_1 \text{ identical terms}}\ \ \underbrace{a_2 \tfrac{\bar{\Gamma}_2}{2m_2}\ \cdots\ a_2 \tfrac{\bar{\Gamma}_2}{2m_2}}_{2m_2 \text{ identical terms}}\ \cdots\ \underbrace{a_L \tfrac{\bar{\Gamma}_L}{2m_L}\ \cdots\ a_L \tfrac{\bar{\Gamma}_L}{2m_L}}_{2m_L \text{ identical terms}} \Big]. $$

There are $D = L$ distinct eigenvalues of the matrix $\mathbf{Q}$, which are its distinct diagonal elements, with $\beta_p = a_p \frac{\bar{\Gamma}_p}{2m_p}$ having $\mu_p = 2m_p$, for $p = 1, 2, \ldots, L$. In this case, (22) reduces to (17), as expected.

For the case of correlated Nakagami-m channels that have the same level of fading severity (i.e., channels having the same fading parameter $m_p = m$, for $p = 1, 2, \ldots, L$), the covariance matrix can be written as $\mathbf{Q} = \mathrm{diag}[\mathbf{D}_1\ \mathbf{D}_2\ \cdots\ \mathbf{D}_{2m}]$, where $\{\mathbf{D}_d\}_{d=1}^{2m}$ is a set of $L \times L$ identical matrices. In general, $\mathbf{D}$ has $L$ positive eigenvalues $\{\beta_p\}_{p=1}^{L}$. Then the CF of $\Gamma_0$ may be expressed as [12], [1, eq. (9.230)]

$$ \Psi_{\Gamma_0}(jv) = \det(\mathbf{I} - jv2\mathbf{D})^{-m} = \det\!\left( \mathbf{I} - jv\mathbf{C}\sqrt{\mathbf{R}} \right)^{-m} = \prod_{p=1}^{L} (1 - jv2\beta_p)^{-m}, \tag{23} $$

where $\mathbf{R}$ is an $L \times L$ positive definite power correlation matrix whose $(p, c)$ element is $\rho_{\Gamma_p,\Gamma_c}$, for $p \neq c$, and unity for $p = c$. The correlation properties among $\{\Gamma_p\}_{p=1}^{L}$ are independent of any scaling constants. The matrix $\mathbf{C}$ in (23) is defined as $\mathbf{C} = \frac{1}{m} \mathrm{diag}\left[ a_1\bar{\Gamma}_1\ a_2\bar{\Gamma}_2\ \ldots\ a_L\bar{\Gamma}_L \right]$. Substituting (23) into (20) and then the resulting CF into (12) gives

$$ P_b(e) = \frac{M}{2(M-1)} \sum_{n=1}^{M-1} \sum_{|\mathbf{t}|=n} B(M,\mathbf{t}) \prod_{p=1}^{L} \left( 1 + \beta_p \sum_{\ell=1}^{L} \frac{t_\ell}{w_\ell} \right)^{-m} \left( 1 + \sum_{\ell=1}^{L} \frac{w_p t_\ell}{w_\ell} \right)^{-1}. \tag{24} $$

IV. PERFORMANCE COMPARISONS

In this section, we compare the average BERs of M-ary orthogonal signals evaluated using the expressions derived herein with those evaluated using previously derived expressions for conventional square-law combining, in independent and arbitrarily correlated nonidentical Nakagami-m fading channels. The results in [1]–[6] are equivalent, as they should give the same average BER when evaluated for a specific case. However, we choose to compare with the result in [1, eq. (9.134)] since it is expressed in terms of a one-fold integral, which can be readily evaluated numerically. Figs. 1 and 2 present the average BER (i.e., $P_b(e)$) of M-ary orthogonal signals ($M = 4$ in Fig. 1 and $M = 8$ in Fig. 2) as a function of the average SNR per bit of the first diversity branch ($\bar{\Gamma}_{b1} = \bar{\Gamma}_1 / \log_2(M)$) over independent nonidentical Nakagami-m fading channels at different values of the fading parameter $m$ and diversity order $L$. An exponentially decaying power delay profile (PDP) is assumed in the figures such that $\bar{\Gamma}_\ell = \bar{\Gamma}_1 e^{-\delta(\ell-1)}$, where $\delta$ is the average power decaying factor. The dashed-line curves in Figs. 1 and 2 were obtained using (17), whereas the dotted-line curves were evaluated using [1, eq. (9.134)]. In addition, results for the case of single-channel reception (i.e., $L = 1$) are included in both figures (solid-line curves) to show the noncoherent combining loss, which is associated with the conventional square-law receiver, at different values of $M$, $m$, and $L$. Note that the marked points in Figs. 1 and 2 represent the simulation results for the three cases mentioned above, wherein at least $1 \times 10^5$ simulation runs were executed to obtain each of these points.
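The mapping from the plotted abscissa to the per-branch average SNRs can be sketched as follows, using only the two relations stated above (the SNR per bit of the first branch and the exponentially decaying PDP). The function name and argument names are hypothetical.

```python
import math


def branch_average_snrs(snr_b1_db, M, L, delta):
    """Per-branch average SNRs (linear scale) under an exponentially decaying PDP.

    snr_b1_db: average SNR per bit of the first branch, in dB.
    The first-branch SNR per symbol is log2(M) times the SNR per bit, and
    branch l (1-based) is attenuated by exp(-delta * (l - 1)).
    """
    snr_b1 = 10.0 ** (snr_b1_db / 10.0)
    snr_1 = math.log2(M) * snr_b1
    return [snr_1 * math.exp(-delta * l) for l in range(L)]


if __name__ == "__main__":
    # Settings of Fig. 1: M = 4, L = 3, delta = 2.5.
    for snr_db in (0, 10, 20):
        print(snr_db, branch_average_snrs(snr_db, M=4, L=3, delta=2.5))
```

With $\delta = 2.5$, each successive branch is weaker by a factor of $e^{-2.5}$ in average SNR, so the first branch dominates the combined SNR.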
It is clear from both figures that the noncoherent receiver that implements (5) outperforms the conventional square-law receiver, and it is capable of eliminating the noncoherent combining loss associated with the latter for any combination of $M$, $L$, and $m$ (note that the dotted-line curves in Figs. 1 and 2, which represent the average BER of the conventional square-law combiner presented in [1, eq. (9.134)], intersect with

Fig. 1. Average BER of 4-ary orthogonal signals as a function of the average SNR per bit of the first diversity branch ($\bar{\Gamma}_{b1}$, in dB) in independent nonidentical Nakagami-m fading channels at different values of $m$, assuming an exponentially decaying PDP ($\delta = 2.5$) and $L = 1, 3$. Curves: [1, eq. (9.134)], square-law combiner; eq. (17), noncoherent combiner of (5); $L = 1$. Simulation results: (∗) conventional square-law combiner ($M = 4$, $L = 3$, $\delta = 2.5$); (◦) noncoherent combiner in (5) ($M = 4$, $L = 3$, $\delta = 2.5$); a third marker for single-channel reception ($M = 4$, $L = 1$).

Fig. 2. Average BER of 8-ary orthogonal signals as a function of $\bar{\Gamma}_{b1}$ (in dB) in independent nonidentical Nakagami-m fading channels at different values of $m$, assuming an exponentially decaying PDP ($\delta = 2.5$) and $L = 1, 4$. Curves: [1, eq. (9.134)], square-law combiner; eq. (17), noncoherent combiner of (5); $L = 1$. Simulation results: (∗) conventional square-law combiner ($M = 8$, $L = 4$, $\delta = 2.5$); (◦) noncoherent combiner in (5) ($M = 8$, $L = 4$, $\delta = 2.5$); a third marker for single-channel reception ($M = 8$, $L = 1$).

the curves corresponding to $L = 1$, whereas the dashed-line curves, which represent the average BER evaluated using (17), always lie below those for $L = 1$). These results generalize our observation made in [8], which was limited to the binary signaling case (i.e., $M = 2$). We now compare the result in (12) (or equivalently (17)) with the well-known formula in [1, eq. (9.134)] in terms of computational speed. It is seen that (12) is given as a double sum of elementary functions and the CF of the combined SNR. On the other hand, [1, eq. (9.134)] contains a double sum of elementary functions and a finite integral, in which the integrand is given in terms of elementary functions

Fig. 3. Average BER of 4-ary orthogonal signals as a function of $\bar{\Gamma}_{b1}$ (in dB) in independent and arbitrarily correlated nonidentical Nakagami-m fading channels at different values of $m$, assuming an exponentially decaying PDP ($\delta = 3$) and arbitrary CP (25), with $L = 1, 4$. Curves: [1, eq. (9.134)], square-law combiner, and eq. (17), noncoherent combiner of (5), for independent channels; dotted curves: [1, eq. (9.134)], square-law combiner, and eq. (24), noncoherent combiner of (5), for correlated channels.

and the CF (or more specifically the MGF) of the combined SNR. Generally speaking, the main difference between (12) and [1, eq. (9.134)] is that the latter involves an integral (which takes the form of a summation in numerical computation) that needs to be evaluated repeatedly for each summation increment. This additional computational effort does impact the complexity of the calculations. As an example, Table I presents the estimated computation time (in seconds) required to generate the average BER curves in Figs. 1 and 2 over the extended range of $\bar{\Gamma}_{b1}$ from 0 dB to 25 dB (note that Figs. 1 and 2 show results limited to 22 dB). In the calculations, a 1.7 GHz Pentium M processor was used, and $\bar{\Gamma}_{b1}$ was increased in uniform steps of 1 dB. The integral in [1, eq. (9.134)] was numerically evaluated using the Matlab function quadl with the error tolerance set to $10^{-15}$ to achieve the desired accuracy. It should be mentioned that although the evaluation of [1, eq. (9.134)] is affected by the integration routine employed as well as by the computation accuracy needed, the results in Table I represent a relative reference for the purpose of comparison. It is clear from the table that large time savings are achieved by using (17) to estimate the system average BER as compared to [1, eq. (9.134)].

Fig. 3 shows $P_b(e)$ of 4-ary orthogonal signals as a function of $\bar{\Gamma}_{b1}$ over independent as well as arbitrarily correlated nonidentical Nakagami-m fading channels at different values of $m$, assuming an exponentially decaying PDP ($\delta = 3$), an arbitrary correlation profile (CP), and $L = 1$ and 4. The power correlation matrix in (23) is assumed to follow [9, eq. (55)]. That is,

$$ \mathbf{R} = \begin{bmatrix} 1 & 0.795 & 0.605 & 0.375 \\ 0.795 & 1 & 0.795 & 0.605 \\ 0.605 & 0.795 & 1 & 0.795 \\ 0.375 & 0.605 & 0.795 & 1 \end{bmatrix}. \tag{25} $$
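Given a power correlation matrix such as (25), the correlated-channel formula (24) can be evaluated without computing eigenvalues explicitly. By (23), $2\beta_p$ are the eigenvalues of $\mathbf{C}\sqrt{\mathbf{R}}$, so $\prod_p (1 + \beta_p S) = \det(\mathbf{I} + \tfrac{S}{2}\mathbf{C}\sqrt{\mathbf{R}})$ with $S = \sum_\ell t_\ell/w_\ell$. The sketch below uses this identity; it assumes that $\sqrt{\mathbf{R}}$ denotes the element-wise square root, that all branches share the same $m$, and that the weights are pairwise distinct. All function names are hypothetical.

```python
from math import factorial, prod, sqrt


def compositions(n, L):
    """Yield all length-L tuples of nonnegative integers that sum to n."""
    if L == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, L - 1):
            yield (first,) + rest


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def avg_ber_correlated(M, avg_snr, m, w, R):
    """Average BER of (24): equal fading parameter m, power correlation matrix R."""
    L = len(w)
    A = [prod(wl / (wl - wk) for k, wk in enumerate(w) if k != l)
         for l, wl in enumerate(w)]
    total = 0.0
    for n in range(1, M):
        coeff = (-1) ** (n + 1) * (factorial(M - 1) // factorial(M - 1 - n))
        for t in compositions(n, L):
            B = coeff * prod(A[l] ** t[l] / factorial(t[l]) for l in range(L))
            S = sum(t[l] / w[l] for l in range(L))
            # a_p evaluated at jv = -S/2 (see footnote 1); real and positive here.
            a = [2 * w[p] / (1 + w[p] * S) for p in range(L)]
            G = [[(1.0 if p == c else 0.0)
                  + 0.5 * S * a[p] * avg_snr[p] / m * sqrt(R[p][c])
                  for c in range(L)] for p in range(L)]
            # det(G) is assumed positive, which holds when sqrt(R) is positive definite.
            total += B * det(G) ** (-m) * prod(1 / (1 + w[p] * S) for p in range(L))
    return M / (2 * (M - 1)) * total


if __name__ == "__main__":
    snr, m = [10.0, 5.0], 1.0
    w = [g / (g / m + 1.0) for g in snr]  # assumed weight formula
    print(avg_ber_correlated(2, snr, m, w, [[1.0, 0.5], [0.5, 1.0]]))
```

With $\mathbf{R}$ equal to the identity matrix the determinant factors into the diagonal terms and the routine reproduces the independent-channel result (17), which gives a quick consistency check before using a matrix such as (25).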

The system average BER for correlated channels is obtained using (24) and [1, eq. (9.134)]. We notice that the observations made for Figs. 1 and 2 also hold for Fig. 3. In

TABLE I
COMPARISON OF THE ESTIMATED COMPUTATION TIME (IN SECONDS) FOR THE GENERATED AVERAGE BER CURVES SHOWN IN FIGS. 1 AND 2 FOR THE EXTENDED RANGE OF $\bar{\Gamma}_{b1} \in [0 : 1 : 25]$ IN DECIBELS.

Expression for Avg. BER | Fig. 1 (M = 4, L = 3, δ = 2.5)              | Fig. 2 (M = 8, L = 4, δ = 2.5)
                        | m = 0.5    m = 2      m = 4.5    m = 20     | m = 2       m = 4.5     m = 20
[1, eq. (9.134)]        | 149.718    128.890    128.984    142.859    | 1226.953    1144.516    1284.657
eq. (17)                | 0.297      0.234      0.203      0.203      | 3.172       3.188       3.203

TABLE II
COMPARISON OF $\bar{\Gamma}_{b1}$ REQUIRED TO ACHIEVE CERTAIN AVERAGE BER OF 8-ARY SYSTEM OVER NAKAGAMI-m CHANNELS AT DIFFERENT m ASSUMING EXPONENTIAL PDP (δ = 3), ARBITRARY CP (25), AND L = 1, 3.

Average SNR per bit of first diversity branch (in dB) required to achieve certain average BER

M = 8, L = 3, δ = 3, m = 4
Avg. BER | Independent channels          | Correlated channels           | L = 1, m = 4
         | eq. (17)    [1, eq. (9.134)]  | eq. (24)    [1, eq. (9.134)]  |
1e-2     | 7.637       8.525             | 7.727       8.814             | 7.781
1e-4     | 13.311      13.845            | 13.931      14.741            | 14.078
1e-6     | 17.346      17.705            | 18.869      19.411            | 19.410
1e-8     | 20.619      20.873            | 22.940      23.292            | 24.509
1e-10    | 23.463      23.625            | 26.331      26.581            | 29.541

M = 8, L = 3, δ = 3, m = 8
Avg. BER | Independent channels          | Correlated channels           | L = 1, m = 8
         | eq. (17)    [1, eq. (9.134)]  | eq. (24)    [1, eq. (9.134)]  |
1e-2     | 6.411       7.375             | 6.432       7.504             | 6.452
1e-4     | 10.758      11.399            | 10.877      11.730            | 10.940
1e-6     | 13.775      14.238            | 14.103      14.851            | 14.231
1e-8     | 16.217      16.581            | 16.884      17.521            | 17.118
1e-10    | 18.304      18.583            | 19.402      19.931            | 19.822

addition, we observe that (24) indicates that the noncoherent receiver that implements (5) can eliminate the noncoherent combining loss even when the diversity branches are highly correlated. Table II compares [1, eq. (9.134)], (17), and (24) in terms of the $\bar{\Gamma}_{b1}$ required to achieve a certain average BER of 8-ary orthogonal signals over independent and arbitrarily correlated nonidentical Nakagami-m channels at different values of $m$, assuming an exponentially decaying PDP ($\delta = 3$), arbitrary CP as in (25), and $L = 1$ and 3. The results obtained using the expressions derived herein, for both independent and correlated channels, are not only simple to calculate, but also indicate that (5) completely eliminates the noncoherent combining loss associated with the conventional square-law receiver and, at the same time, provides performance improvements for any combination of $M$, $L$, $m$, and $\delta$.

V. CONCLUSION

We have presented simple and exact expressions for the average BER of M-ary orthogonal signals with noncoherent diversity combining in independent as well as arbitrarily correlated nonidentically distributed Nakagami-m fading channels. The derived expressions are given in terms of elementary functions, and they do not involve numerical integrals or complicated functional operations. Interestingly, we have found that the noncoherent diversity combiner used herein alleviates the combining loss associated with the conventional square-law combiner, and, at the same time, provides performance improvements for any combination of the number of orthogonal signals, the diversity order, the fading parameters, the power delay profile, and the correlation profile models.

REFERENCES

[1] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels. John Wiley & Sons, 2005.
[2] M. K. Simon and M.-S. Alouini, “Bit error probability of noncoherent M-ary orthogonal modulation over generalized fading channels,” J. Commun. and Networks, vol. 1, pp. 111–117, June 1999.
[3] M. Z. Win and R. K. Mallik, “Error analysis of noncoherent M-ary FSK with postdetection EGC over correlated Nakagami and Rician channels,” IEEE Trans. Commun., vol. 50, pp. 378–383, Mar. 2002.
[4] Q. T. Zhang, “Error performance of noncoherent MFSK with L-diversity on correlated fading channels,” IEEE Trans. Wireless Commun., vol. 1, pp. 531–539, July 2002.
[5] A. Annamalai and C. Tellambura, “A moment-generating function (MGF) derivative based unified analysis of incoherent diversity reception of M-ary orthogonal signals over independent and correlated fading channels,” Int. J. Wireless Inform. Network, pp. 41–56, Jan. 2003.
[6] R. M. Radaydeh, M. M. Matalgah, and G. Matalkah, “Probability of error performance of noncoherent M-ary FSK over multi-link generalized fading channels,” in Proc. VTC’05-Fall, vol. 3, pp. 1499–1503, Sept. 2005.
[7] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw Hill, 1995.
[8] R. M. Radaydeh and M. M. Matalgah, “Improved performance noncoherent weighted-coefficients diversity combiner for DPSK and NCFSK signals in nonidentical Nakagami fading channels,” IEEE Commun. Lett., vol. 10, no. 4, pp. 281–283, Apr. 2006.
[9] Q. T. Zhang, “Exact analysis of postdetection combining for DPSK and NFSK systems over arbitrarily correlated Nakagami channels,” IEEE Trans. Commun., vol. 46, pp. 1459–1467, Nov. 1998.
[10] V. K. N. Lau and S. V. Maric, “Variable rate adaptive modulation for DS-CDMA,” IEEE Trans. Commun., vol. 47, no. 4, pp. 577–589, Apr. 1999.
[11] R. L. Graham, D. E. Knuth, and O. Patashnik, Concrete Mathematics. Addison-Wesley, 1989, pp. 166–172.
[12] M. Z. Win, G. Chrisikos, and J. H. Winters, “MRC performance for M-ary modulation in arbitrarily correlated Nakagami fading channels,” IEEE Commun. Lett., vol. 4, no. 10, pp. 301–303, Oct. 2000.


On the Joint Synchronization of Clock Offset and Skew in RBS-Protocol Ilkay Sari, Erchin Serpedin, Kyoung-Lae Noh, Qasim Chaudhari, and Bruce Suter

Abstract—Motivated by the necessity of having a good clock synchronization amongst the nodes of wireless ad-hoc sensor networks, the joint maximum likelihood (JML) estimator for clock phase offset and skew under exponential noise model for reference broadcast synchronization (RBS) protocol is formulated and found via a direct algorithm. The Gibbs Sampler is also proposed for joint clock phase offset and skew estimation and shown to provide superior performance relative to JMLestimator. Lower and upper bounds for the mean-square errors (MSE) of JML-estimator and Gibbs Sampler are introduced in terms of the MSE of the Uniform Minimum Variance Unbiased (UMVU) estimator and the conventional Best Linear Unbiased Estimator (BLUE), respectively. Index Terms—Clock synchronization, maximum likelihood, Gibbs sampling, sensor networks.

I. INTRODUCTION

In sensor networks, the need for synchronized time arises as a very valuable tool for intra-network coordination among various sensors and for obtaining a coordinated interaction between the sensor network and the physical real world. Obtaining more accurate and energy-efficient synchronization protocols for wireless sensor networks that achieve the best performance limits represents a fundamental design problem for large-scale sensor networks. As explained in [1], the energy constraints in wireless sensor networks are very strict. Hence, using more overhead for better synchronization is not a good solution. To overcome both of these challenges at the same time, one way is to employ energy-efficient protocols specifically designed for wireless sensor networks, as in [2] and [3]. The other way is to reduce the amount of energy spent on signal transmissions by using sophisticated tools from statistical signal processing, as in [4] and [5]. In this letter, we follow the latter strategy by relying on more powerful statistical signal processing algorithms. It has been experimentally demonstrated by Pottie and Kaiser [6] that the energy required to transmit 1 bit over 100 meters (3 Joules) is equivalent to the energy required to execute 3 million instructions. Therefore, the strategy proposed here aims at trading computational energy for reduced communication energy. By developing highly accurate synchronization schemes, we aim to minimize the RF energy consumption (by reducing the number of synchronization beacons) at the cost of slightly more complex computational algorithms. The key to achieving this objective is to employ powerful statistical signal processing techniques for developing highly accurate clock synchronization protocols.

Paper approved by H. Minn, the Editor for Synchronization and Equalization of the IEEE Communications Society. Manuscript received March 27, 2006; revised February 14, 2007, June 6, 2007, and September 24, 2007. I. Sari, E. Serpedin (contact author), K.-L. Noh and Q. Chaudhari are with the Dept. of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843-3128, USA (e-mail: {ilkay, serpedin}@ece.tamu.edu). Dr. B. W. Suter is in New York (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060184.

This paper considers the joint maximum likelihood (JML) estimation of clock offset and skew under the exponential noise model in the Reference Broadcast Synchronization (RBS) protocol [2]. The Gibbs Sampler is also proposed for joint estimation of clock offset and skew. Lower and upper bounds for the performance of the JML-estimator and the Gibbs Sampler are derived in terms of the mean-square errors (MSE) of the Uniform Minimum Variance Unbiased (UMVU) estimator and the Best Linear Unbiased Estimator (BLUE), respectively.

II. RBS-PROTOCOL AND MODELING ASSUMPTIONS

RBS is a recently proposed receiver-to-receiver synchronization protocol for wireless sensor networks [2]. Roughly speaking, a transmitter-node broadcasts $N$ synchronization signals and the receiver-nodes put time-stamps on these signals. Then, for efficient implementation, the receivers pass the data consisting of the time-stamps to the transmitter, where the clock offsets and skews between different pairs of nodes are calculated. With this protocol, two of the main error sources of clock synchronization, namely the uncertainties at Send Time and Access Time, are eliminated. Furthermore, the difference between propagation times is negligible compared to the uncertainty at Receive Time, which becomes the only error source. In [2], it is experimentally argued that the uncertainty at receive time can be modeled in terms of the normal distribution. However, in real-life scenarios, there will be communication signals in the network, not just synchronization signals, and nodes will have other jobs, not just the time-stamping. Therefore, we expect that the link from the transmitter to an individual receiver might behave as a regular point-to-point network link (M/M/1), where the receive time uncertainty has an exponential distribution (see [7]). In some texts, this type of uncertainty is referred to as processing delay. Besides, whether the time-stamping is done at the Application Layer or the Data Link Layer will change only the mean of the distribution.

In short, the $i$th time-stamps at the receivers $X$ and $Y$ are given by

$$ X[i] = T_1 + \theta_x + \beta_x \tau[i] + v_{x,\lambda_x}[i], \tag{1} $$

$$ Y[i] = T_1 + \theta_y + \beta_y \tau[i] + v_{y,\lambda_y}[i], \tag{2} $$

where $T_1$ stands for the time on the transmitter when it sends the first synchronization signal, $\theta_x$ and $\beta_x$ stand for the offset and skew between the clocks of the receiver $X$ and the transmitter, $\tau[i]$ stands for the difference between $T_1$ and the time of the $i$th synchronization signal (with respect to the transmitter’s clock), and $v_{x,\lambda_x}[i]$ stands for the exponential i.i.d. (independently and identically distributed) noise (with mean $1/\lambda_x$), with $i = 1, \ldots, N$. The parameters to be estimated, the offset and skew between the clocks of the nodes $X$ and $Y$, are given by the following equations:

$$ \Theta = \theta_x - \theta_y, \qquad \beta = \beta_x - \beta_y. \tag{3} $$
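The observation model (1)–(3) is easy to simulate, which is useful for testing any estimator of the offset and skew. The sketch below draws exponential receive-time noise with Python's standard `random` module. It assumes $\tau[i] = i - 1$ (uniformly spaced beacons), and the function and parameter names are hypothetical.

```python
import random


def simulate_timestamps(n, theta, beta, lam, t1=0.0, seed=None):
    """Simulate n time-stamps: T1 + theta + beta * tau[i] + exponential noise.

    Assumes tau[i] = i - 1 for i = 1..n, i.e. uniformly spaced beacons.
    lam is the rate of the exponential noise (mean 1 / lam).
    """
    rng = random.Random(seed)
    return [t1 + theta + beta * i + rng.expovariate(lam) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical parameter values for two receivers X and Y.
    x = simulate_timestamps(10, theta=2.0, beta=1.001, lam=5.0, seed=1)
    y = simulate_timestamps(10, theta=0.5, beta=0.999, lam=4.0, seed=2)
    true_offset = 2.0 - 0.5      # Theta in (3)
    true_skew = 1.001 - 0.999    # beta in (3)
    print(x[:3], y[:3], true_offset, true_skew)
```

Because the noise is nonnegative, every simulated time-stamp lies on or above the noise-free line, which is the property the likelihood indicator functions rely on.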

SARI et al.: ON THE JOINT SYNCHRONIZATION OF CLOCK OFFSET AND SKEW IN RBS-PROTOCOL

III. JML ESTIMATION OF THE OFFSET AND SKEW

The estimation of clock skew becomes more important in the context of energy-constrained sensor networks. If nodes have good skew estimates, they can operate much longer without dedicating valuable resources to periodic resynchronization. With this motivation, we now consider the JML estimation of the skew and the phase offset. Although important, the joint estimation of clock skew and phase offset might not be easy. Sadler shows that under uniform noise, there are infinitely many solutions for ML estimation [5]. Besides, the support of the likelihood function is not convex, which rules out the possibility of taking the mean of all equally likely solutions. In this letter, we consider the case described in (1) and (2). As long as the two parameter sets $\{\theta_x, \beta_x, \lambda_x\}$ and $\{\theta_y, \beta_y, \lambda_y\}$ do not have a direct relationship and the noise sources in different nodes are independent (both of which are realistic assumptions), we can find the JML-estimator for $\Theta$ and $\beta$ without loss of any information by estimating the parameters $(\theta_x, \beta_x)$ and $(\theta_y, \beta_y)$ separately and plugging these estimates back into (3). Thus, we concentrate on the estimation of $\theta_x$ and $\beta_x$. First of all, for simplicity, we assume that $\tau[i] = i - 1$ and $T_1 = 0$. Then the likelihood function becomes

$$ L(\theta_x, \beta_x) = \prod_{i=1}^{N} \lambda_x e^{-\lambda_x \left( X[i] - (\theta_x + (i-1)\beta_x) \right)}\, I_{(X[i] \ge \theta_x + (i-1)\beta_x)} = \lambda_x^{N} e^{-\lambda_x N (\bar{X} - f)} \prod_{i=1}^{N} I_{(X[i] \ge f_i)}, \tag{4} $$

where $f(\theta_x, \beta_x) = \theta_x + \frac{N-1}{2}\beta_x$, $f_i(\theta_x, \beta_x) = \theta_x + (i-1)\beta_x$, $\bar{X}$ stands for the sample mean of the observations $X[i]$ ($i = 1, \ldots, N$), and $I_{(x \ge a)}$ denotes the indicator function, being equal to 1 when $x \ge a$ and 0 elsewhere. Note that in (4), the product of indicator functions defines a convex region $S$ on the parameter space $(\theta_x, \beta_x)$, with $S = \bigcap_{i=1}^{N} \{(\theta_x, \beta_x) : f_i(\theta_x, \beta_x) \le X[i]\}$. $S$ has $k$ vertices $\{s_j\}_{j=1}^{k}$ and $k+1$ edges ($1 \le k \le N-1$). Specifically, the shape of this region and the value of $k$ depend strongly on the ordering of $X[1], \ldots, X[N]$. On this region, we have to maximize the objective function $f(\theta_x, \beta_x) = \theta_x + \frac{N-1}{2}\beta_x$. Since $0 < \frac{N-1}{2} < N-1$, the support of the solution is guaranteed to be a closed convex region on the boundary of $S$. If $N = 2m$, the solution will be one of the vertices $s_j$, and if $N = 2m-1$ the solution will possibly be a segment of the line $f_m : \theta_x + (m-1)\beta_x$ (or again one of the vertices $s_j$, depending on the ordering of the observations). Fig. 1 illustrates these remarks for $N = 2$ and $X[2] > X[1]$. In this illustrative example, since $f$ attains its maximum at $s_1$ amongst all points of $S$, $s_1$ gives the JML estimate of $\theta_x$ and $\beta_x$. Before proceeding any further, we have to clarify one more point. In the derivations up to now, we assumed that $\lambda_x$ and $\lambda_y$ were both known. However, if we assume they are unknown and use the reduced likelihood function for $(\theta_x, \beta_x)$ as in [8], it is straightforward to show that we end up with the same JML solution.

IV. APPLICATION OF GIBBS SAMPLER

Although it is possible to find the exact solution for the ML-estimate as explained above, we will also apply the Gibbs

Fig. 1. S and the solution s1.

Sampler to jointly estimate the parameters. Although the Gibbs Sampler can yield an approximate JML estimate that is arbitrarily close to the exact one, it offers some more important advantages. First of all, it can be shown that the JML estimate $(\hat{\theta}_{x,ML}, \hat{\beta}_{x,ML})$ is biased for finite $N$. (As an example, in the case of Fig. 1, $E[\hat{\theta}_{x,ML}] = E[X[1]] = E[\theta_x + v_{x,\lambda_x}[1]] = \theta_x + 1/\lambda_x$.) For this reason, we need to look for a uniform minimum variance unbiased (UMVU) estimator. However, the Neyman-Fisher factorization theorem yields $\min_i \left(X[i] - (\theta_x + (i-1)\beta_x)\right)$ as the sufficient statistic, which is not independent of the parameters to be estimated. The Gibbs Sampler, on the other hand, outputs not just a single point estimate but the posterior distribution of the parameters to be estimated. Then, we can either find the JML estimator or take the mean of the posterior distribution of each parameter as its estimator, which automatically performs the marginalization and gives better results with reduced bias and variance. For details on the Gibbs Sampler, please refer to [9]. Another appealing feature of the Gibbs Sampler is that it extends straightforwardly to additional unknown parameters. For example, $\lambda_x$ may be unknown, or in addition to the clock phase offset and skew we could have clock drifts $\gamma_x$ and $\gamma_y$. The drifts would appear on the right-hand side (RHS) of (1) and (2) as the additional terms $\tau^2[i]\gamma_x$ and $\tau^2[i]\gamma_y$. Although this scenario is important, we do not consider it in this letter due to the limited space; adapting the Gibbs Sampler to it is straightforward.

Before applying it, we briefly review Gibbs Sampling. Assume that we have the data vector $z$ and we want to estimate some parameters $\Phi = [\phi_1, \phi_2, \ldots, \phi_M]^T$.
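
As a hedged illustration of the JML solution of Section III, the sketch below maximizes $\theta_x + \frac{N-1}{2}\beta_x$ over $S$ by enumerating candidate vertices directly rather than calling a linear programming solver. It assumes $\tau[i] = i - 1$ and $T_1 = 0$ as in (4) and uses 0-based indices, so constraint $i$ reads $\theta + i\beta \le x[i]$. The function name `jml_offset_skew`, the tolerance, and the demo parameters are illustrative choices, not part of the letter. When the maximizer is a line segment (possible for odd $N$), only one endpoint is returned.

```python
import random


def jml_offset_skew(x, tol=1e-12):
    """Return one maximizer (theta, beta) of theta + (N-1)/2 * beta
    subject to theta + i*beta <= x[i] for i = 0, ..., N-1."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    best = None
    # Every vertex of S is the intersection of two active constraints i < j.
    for i in range(n):
        for j in range(i + 1, n):
            beta = (x[j] - x[i]) / (j - i)
            theta = x[i] - i * beta
            # Keep the intersection only if it satisfies all constraints.
            if all(theta + k * beta <= x[k] + tol for k in range(n)):
                value = theta + 0.5 * (n - 1) * beta
                if best is None or value > best[0]:
                    best = (value, theta, beta)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical demo: theta = 1, beta = 0.01, exponential noise with rate 1e3.
    rng = random.Random(0)
    obs = [1.0 + 0.01 * i + rng.expovariate(1e3) for i in range(10)]
    print(jml_offset_skew(obs))
```

The brute-force search costs on the order of $N^3$ operations, which is acceptable for the small $N$ considered here; a linear programming solver would scale better for larger $N$.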
For any kind of statistical inference, we want to use the joint posterior distribution of the parameters $p(\Phi|z) \propto p(z|\Phi)p(\Phi)$ (in point estimation, the prior distribution $p(\Phi)$ is chosen as noninformative). When it is hard to carry out mathematical derivations on the posterior, we resort to Monte-Carlo methods, i.e., we draw as many samples as possible from the posterior so that the inference made from these samples is arbitrarily close to the exact solution. When it is hard to draw from the joint posterior directly, MCMC


(Markov Chain Monte-Carlo) iterative methods are used. This amounts to setting up a Markov chain whose stationary distribution is the joint posterior we need. One convenient way to do this is Gibbs Sampling, in which we iteratively draw samples from the one-dimensional conditionals $p(\phi_i | z, \Phi_{-i})$, where $\Phi_{-i}$ is an $(M-1) \times 1$ vector with entries $\{\phi_j\}_{j \ne i}$. Under mild conditions, these one-dimensional conditional distributions uniquely determine the joint posterior distribution [10]. Specifically, the general algorithm for Gibbs Sampling with initial values $\Phi^{(0)} = [\phi_1^{(0)}, \ldots, \phi_M^{(0)}]$ is to iterate the following:
• Draw $\phi_1^{(1)}$ from $p(\phi_1 | z, \phi_2^{(0)}, \ldots, \phi_M^{(0)})$
• Draw $\phi_2^{(1)}$ from $p(\phi_2 | z, \phi_1^{(1)}, \phi_3^{(0)}, \ldots, \phi_M^{(0)})$
  ...
• Draw $\phi_M^{(1)}$ from $p(\phi_M | z, \phi_1^{(1)}, \ldots, \phi_{M-1}^{(1)})$.
After a threshold value $t$, the set $\{\Phi^{(t)}, \Phi^{(t+1)}, \ldots\}$ behaves like samples from the joint posterior of the parameters. One important point is that the joint posterior distribution should be proper. Otherwise the Gibbs Sampler always converges to some local points, but not necessarily to meaningful ones [11]. For this reason, to ensure that the posterior is proper when the Gibbs Sampler is applied to point estimation, priors are not chosen directly as flat; they are chosen from conjugate families, and their parameters are then set so as to make them noninformative. In our case, however, the likelihood function itself can be used as the posterior distribution, since its integral is always bounded and positive-valued, which makes it proper. We do not need to use any type of prior other than flat. Then in our case, using (4), the procedure becomes

• Draw $\theta_x^{(1)}$ from $\propto e^{\lambda_x N \theta_x} I\left(\theta_x \le \min_i \left(X[i] - (i-1)\beta_x^{(0)}\right)\right)$
• Draw $\beta_x^{(1)}$ from $\propto e^{\lambda_x \frac{N(N-1)}{2} \beta_x} I\left(\beta_x \le \min_{i=2,\ldots,N} \left(\frac{X[i] - \theta_x^{(1)}}{i-1}\right)\right)$.

For $\theta_x^{(t+1)}$, we draw a sample from the exponential distribution with parameter $\lambda_x N$, multiply it by $-1$, and add $\min_i \left(X[i] - (i-1)\beta_x^{(t)}\right)$ to it. The procedure for $\beta_x^{(t+1)}$ is similar. Note that if $\lambda_x$ were unknown, we would use the Gamma distribution to draw $\lambda_x^{(t+1)}$.

V. PERFORMANCE BOUNDS AND SIMULATIONS

In this part, we examine the performance of the Gibbs Sampler and the JML estimator. It is useful to have some benchmarks against which to compare them. First we look for lower bounds. Since the likelihood function does not satisfy the regularity conditions required by the CRLB (Cramer-Rao Lower Bound), the CRLB cannot be used. One possible lower bound can be found by assuming that all the parameters are known except the one to be bounded, which reduces the problem to the well-known derivation of the bound for a single unknown parameter in exponential noise. Then we can find the UMVU estimators for both the phase offset and the skew in closed form. For $\theta_x$, the UMVU estimator is

$$
\hat{\theta}_{x,UMVU} = \min_i \left(X[i] - (i-1)\beta_x\right) - \frac{1}{N\lambda_x}, \tag{5}
$$

and the MSE (Mean-Square Error) of the estimator equals $1/(N\lambda_x)^2$ [12]-[13]. For $\beta_x$, the likelihood function is

$$
L(\beta_x) = C e^{\lambda_x \frac{N(N-1)}{2} \beta_x} \prod_{i=2}^{N} I\left(\beta_x \le \frac{X[i] - \theta_x}{i-1}\right). \tag{6}
$$

Fig. 2. MSE for $\hat{\theta}_{x,BLUE}$, $\hat{\theta}_{x,JML}$, $\hat{\theta}_{x,GIBBS}$, and $\hat{\theta}_{x,UMVU}$ (MSE of estimators versus N, the number of synchronization signals; curves: BLUE, GIBBS, JML, UMVU, and GIBBS for truncated Gaussian priors).
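
The Gibbs iteration of Section IV can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: $\lambda_x$ is taken as known, $\tau[i] = i - 1$ with 0-based indexing, the chain starts from zeros, and the posterior mean after a burn-in period is used as the point estimate. The function name, the iteration counts, and the seed are hypothetical choices.

```python
import random


def gibbs_offset_skew(x, lam, iters=2000, burn_in=500, seed=0):
    """Posterior-mean estimate of (theta, beta) under a flat prior,
    for observations x[i] = theta + i*beta + Exp(lam) noise."""
    rng = random.Random(seed)
    n = len(x)
    theta, beta = 0.0, 0.0  # initial clock parameters set to zero
    sum_theta = 0.0
    sum_beta = 0.0
    for t in range(iters):
        # theta | beta: density proportional to exp(lam*n*theta) below a bound,
        # i.e. the bound minus an exponential variate with rate lam*n.
        bound_theta = min(x[i] - i * beta for i in range(n))
        theta = bound_theta - rng.expovariate(lam * n)
        # beta | theta: same construction with rate lam*n*(n-1)/2.
        bound_beta = min((x[i] - theta) / i for i in range(1, n))
        beta = bound_beta - rng.expovariate(lam * n * (n - 1) / 2.0)
        if t >= burn_in:
            sum_theta += theta
            sum_beta += beta
    kept = iters - burn_in
    return sum_theta / kept, sum_beta / kept


if __name__ == "__main__":
    # Hypothetical demo with the simulation values quoted in the text.
    rng = random.Random(1)
    obs = [1.0 + 0.01 * i + rng.expovariate(1e3) for i in range(20)]
    print(gibbs_offset_skew(obs, lam=1e3))
```

Each draw respects the indicator constraints in (4) by construction, so every retained sample lies inside the region S.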

By the Factorization Theorem, $\min_i \left(\frac{X[i] - \theta_x}{i-1}\right)$ is a sufficient statistic, and it is straightforward to show that it is also complete. This result can be established along the same lines as the proof in [12] for $\theta_x$. Then, by the Lehmann-Scheffe Theorem, the UMVU estimator for the skew when the offset and $\lambda_x$ are known takes the form

$$
\hat{\beta}_{x,UMVU} = \min_i \left(\frac{X[i] - \theta_x}{i-1}\right) - \frac{2}{\lambda_x N(N-1)}. \tag{7}
$$
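
The closed forms (5) and (7) can be checked numerically. The sketch below is a Monte Carlo experiment under the same assumptions as the text (known $\lambda_x$, one parameter known at a time, $\tau[i] = i - 1$ with 0-based indexing); it compares the empirical MSE with the values $1/(N\lambda_x)^2$ and $4/(\lambda_x N(N-1))^2$ stated in the text. Function names, trial count, and seed are illustrative assumptions.

```python
import random


def umvu_offset(x, beta, lam):
    """Estimator (5): offset estimate when the skew beta and lam are known."""
    n = len(x)
    return min(x[i] - i * beta for i in range(n)) - 1.0 / (n * lam)


def umvu_skew(x, theta, lam):
    """Estimator (7): skew estimate when the offset theta and lam are known."""
    n = len(x)
    return min((x[i] - theta) / i for i in range(1, n)) - 2.0 / (lam * n * (n - 1))


def empirical_mse(n, trials=20000, theta=1.0, beta=0.01, lam=1e3, seed=0):
    """Return the empirical MSE of (5) and (7) over independent trials."""
    rng = random.Random(seed)
    se_theta = 0.0
    se_beta = 0.0
    for _ in range(trials):
        x = [theta + i * beta + rng.expovariate(lam) for i in range(n)]
        se_theta += (umvu_offset(x, beta, lam) - theta) ** 2
        se_beta += (umvu_skew(x, theta, lam) - beta) ** 2
    return se_theta / trials, se_beta / trials


if __name__ == "__main__":
    n, lam = 8, 1e3
    mse_theta, mse_beta = empirical_mse(n, lam=lam)
    print(mse_theta, 1.0 / (n * lam) ** 2)
    print(mse_beta, 4.0 / (lam * n * (n - 1)) ** 2)
```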

The MSE of the estimator (7) is equal to the variance of $Z = \min_{i=2,3,\ldots,N} \left(\frac{X[i] - \theta_x}{i-1}\right)$. Thus, we first need to determine the distribution of $Z$. From the theory of order statistics, the distribution of the minimum of a sample set is given by $F(z) = 1 - (1 - F_2(z))(1 - F_3(z)) \cdots (1 - F_N(z))$, where $F_i(z) = \Pr\left(\frac{X[i] - \theta_x}{i-1} \le z\right) = \Pr\left(v_{x,\lambda_x}[i] \le (i-1)(z - \beta_x)\right) = \left(1 - e^{-\lambda_x (i-1)(z - \beta_x)}\right) I(z \ge \beta_x)$. Then, for $z \ge \beta_x$, the distribution becomes

$$
F(z) = 1 - e^{-\lambda_x (z - \beta_x)(1 + 2 + \cdots + (N-1))} = 1 - e^{-\lambda_x (z - \beta_x) \frac{N(N-1)}{2}}, \tag{8}
$$

which is an exponential distribution with rate parameter $\lambda_x \frac{N(N-1)}{2}$ and location parameter $\beta_x$. The MSE of $\hat{\beta}_{x,UMVU}$ equals the variance of $Z$, which is $4/(\lambda_x N(N-1))^2$. Therefore, we do not expect the MSEs of the joint estimator for $(\theta_x, \beta_x)$ to decay faster than $\propto (1/N^2, 1/N^4)$.

We also consider the BLUE (Best Linear Unbiased Estimator), since it represents an upper bound. Here, the same notation as in [5] is used, except that $X$ is replaced with $A$ ($A \triangleq [\mathbf{1}, \mathbf{x}]$, where $\mathbf{1} = [1, 1, \ldots, 1]^T$ and $\mathbf{x} = [0, 1, \ldots, N-1]^T$) to prevent possible confusion. Since, unlike in [5], the noise in our model is not zero-mean, we need to subtract $1/\lambda_x$ from the resulting linear estimate of $\theta_x$ to obtain the BLUE. Then we have $[\hat{\theta}_{x,BLUE}, \hat{\beta}_{x,BLUE}]^T = (A^T A)^{-1} A^T X - [\frac{1}{\lambda_x}, 0]^T$. It is known that $\mathrm{var}([\hat{\theta}_{x,BLUE}, \hat{\beta}_{x,BLUE}]^T) = \mathrm{diag}\{\frac{1}{\lambda_x^2} (A^T A)^{-1}\} \propto [1/N, 1/N^3]^T$. For a detailed discussion of this estimator, interested readers are referred to [5].

The MSEs of the Gibbs Sampler and the JML estimator for $\theta_x = 1$ and $\beta_x = 0.01$ with $\lambda_x = 10^3$ (which makes $\mathrm{var}(v_{x,\lambda_x}) = 10^{-6}$) are presented in Fig. 2 for the offset and in Fig. 3 for the skew. In these simulations, the initial values of the clock parameters are chosen as zeros. These figures also include the

Fig. 3. MSE for $\hat{\beta}_{x,BLUE}$, $\hat{\beta}_{x,JML}$, $\hat{\beta}_{x,GIBBS}$, and $\hat{\beta}_{x,UMVU}$ (MSE of estimators versus N, the number of synchronization signals; curves: BLUE, GIBBS, JML, UMVU, and GIBBS for truncated Gaussian priors).

lower and upper bounds presented above. The MSEs are plotted against the number of synchronization signals, from 4 to 36. It is interesting to note that the MSEs of the Gibbs Sampler and the JML estimator behave like the lower bound, i.e., they decay at rates on the order of $\propto 1/N^2$ for $\theta_x$ and $\propto 1/N^4$ for $\beta_x$. Note also that the Gibbs Sampler performs better, with MSE values around 40% for $\theta_x$ and 25% for $\beta_x$ relative to the corresponding values of the JML estimator. We should also note that the convergence of the Gibbs Sampler is achieved after a number of iterations on the order of $N$.

To shed some light on the sensitivity of the Gibbs Sampler to prior mismatch, we have also provided simulation results for mismatched prior knowledge. This is important for engineers and system designers in choosing a proper estimator for their systems. Fig. 2 and Fig. 3 show the performance of the Gibbs Sampler when the actual prior is a truncated Gaussian while the prior assumed in the Gibbs Sampler is uniform. For the prior of the offset, the truncation points are 0 and 10, and the mean and standard deviation of the parent Gaussian distribution are 5 and 2, respectively. For the prior of the skew, the truncation points are 0 and 1, and the mean and standard deviation of the parent Gaussian distribution are 0.5 and 0.25, respectively.

One drawback of the Gibbs Sampler is its computational complexity, which is determined by the random number generations in each iteration and the number of iterations necessary to converge. Fig. 4 compares the computational complexity of the Gibbs Sampler with that of the Linear Programming algorithm using Matlab's built-in flops function. Although the Gibbs Sampler clearly requires more computations, the required level of precision can be achieved with fewer signal transmissions. Hence, there is a tradeoff between the complexity and the gains achieved by the Gibbs Sampler.

VI. CONCLUSIONS

Under the exponential noise model, we have shown that the JML estimator for the clock skew and phase offset is not ill-behaved, as opposed to the uniformly distributed noise case in [5]. The JML estimator of the skew and the phase offset exists and is either unique or a line segment

Fig. 4. Computational Complexity of the LP and the GIBBS Algorithms (relative number of computations versus N, the number of synchronization signals).

depending on the magnitudes of the observed data samples. At worst, the support of all equally likely solutions is a closed convex set (a line segment). The setting made it convenient to apply the Gibbs Sampler, which further improved on the performance of the JML estimator. The performances of both estimators (JML and Gibbs Sampler) scale with the same power law with respect to the number of synchronization signals N. Lower and upper bounds for the performance of the JML and Gibbs Sampler estimators were also presented in terms of the MSE performances of the UMVU estimator and the BLUE, respectively.

REFERENCES

[1] B. Sundararaman et al., “Clock synchronization in wireless sensor networks: a survey,” Ad-Hoc Networks, vol. 3, pp. 281-323, May 2005.
[2] J. Elson, L. Girod, and D. Estrin, “Fine-grained network time synchronization using reference broadcasts,” in Proc. OSDI, 2002.
[3] S. Ganeriwal et al., “Timing-sync protocol for sensor networks,” in Proc. ACM SenSys, 2003, pp. 138-149.
[4] N. Khajehnouri and A. H. Sayed, “A distributed broadcasting time-synchronization scheme for wireless sensor networks,” in Proc. ICASSP, 2005, pp. 1053-1056.
[5] B. M. Sadler, “Local and broadcast clock sync. in a sensor network,” IEEE Signal Processing Lett., vol. 13, pp. 9-12, Jan. 2006.
[6] G. Pottie and W. Kaiser, “Wireless integrated network sensors,” Commun. ACM, vol. 43, no. 5, pp. 51-58, May 2000.
[7] H. S. Abdel-Ghaffar, “Analysis of synchronization algorithms with timeout control over networks with exponentially symmetric delays,” IEEE Trans. Commun., vol. 50, pp. 1652-1661, Oct. 2002.
[8] D. R. Jeske, “On the maximum likelihood estimation of clock offset,” IEEE Trans. Commun., vol. 53, pp. 53-54, Jan. 2005.
[9] A. E. Gelfand and A. F. M. Smith, “Sampling-based approaches to calculating marginal densities,” J. Amer. Stat. Ass., vol. 85, pp. 398-409, 1990.
[10] J. Besag, “Spatial interaction and the statistical analysis of lattice systems,” J. Roy. Stat. Soc. Series B (Methodological), vol. 36, pp. 192-236, 1974.
[11] J. P. Hobert and G. Casella, “Functional compatibility, Markov chains, and Gibbs sampling with improper posteriors,” J. Comp. Graph. Stat., vol. 7, pp. 42-60, 1998.
[12] E. L. Lehmann and G. Casella, Theory of Point Estimation. Springer, 1998.
[13] D. R. Jeske and A. Sampath, “Estimation of clock offset using bootstrap bias-correction techniques,” Technometrics, vol. 45, pp. 256-261, Aug. 2003.


Blind Estimation of Carrier Frequency Offset and DC Offset for OFDM Systems Hai Lin, Member, IEEE, Herath Mudiyanselage Sankassa Bandara Senevirathna, and Katsumi Yamashita, Member, IEEE

Abstract—In this letter, a blind joint carrier frequency offset and DC Offset (DCO) estimator for OFDM systems is proposed. By exploiting the underlying subspace of the received OFDM signals after the time-domain-average based coarse DCO cancellation, the proposed estimator can achieve excellent performance, which is demonstrated by simulations. Index Terms—Carrier frequency offset (CFO), DC offset (DCO), orthogonal frequency division multiplexing (OFDM).

Paper approved by S. K. Wilson, the Editor for Multicarrier Modulation of the IEEE Communications Society. Manuscript received April 9, 2006; revised November 1, 2006 and February 13, 2007. This work was partly presented at the IEEE Global Communications Conference, San Francisco, CA, Nov. 2006. H. Lin and K. Yamashita are with the Department of Electrical and Information Systems, Osaka Prefecture University, Osaka, 599-8531, Japan (e-mail: {lin, yamashita}@eis.osakafu-u.ac.jp). H. Senevirathna was with the Department of Electrical and Information Systems, Osaka Prefecture University. He is now with Lanka Bell Ltd., 78 Grandpass road, Colombo 14, Sri Lanka (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060215.

I. INTRODUCTION

Orthogonal frequency division multiplexing (OFDM) has become a popular technology in wireless communications [1]–[3]. The main drawback of OFDM is its sensitivity to carrier frequency offset (CFO) [4], which needs to be compensated by baseband digital signal processing (DSP) [5]. On the other hand, the recent demand for low-cost mobile terminals has led to the application of direct-conversion architecture based receiver (DCR), which, however, introduces additional disturbances, such as DC offset (DCO), I/Q imbalance, even-order distortion, and flicker noise [6]. Among these, perhaps the DCO is the most serious problem [7]. Although it can be removed by a high-pass filter (HPF) [6], this analog cancellation will distort the desired signals, which have been corrupted by the CFO. The coexistence of CFO and DCO is a critical problem in OFDM systems using DCR. Until now, some DSP-based studies have been conducted on this topic [8], [9]. [8] uses periodic preambles for the CFO estimation; however, this provides only a coarse estimate of the DCO power. In [9], the best linear unbiased estimator (BLUE) is used to estimate the DCO, which also requires periodic preambles to achieve satisfactory performance. It is desirable to estimate CFO and DCO simultaneously. Unlike the pilot-aided estimators mentioned above, we propose a novel blind joint CFO and DCO estimator. By exploiting the underlying subspace structure of the received OFDM signals after the time-domain-average (TDA) based coarse DCO cancellation, the proposed estimator is able to achieve excellent performance, which is demonstrated by simulations.

The rest of the letter is organized as follows: The fundamentals of OFDM and the MUSIC-like blind CFO estimator in the absence of the DCO are given in Section II. The development of the blind joint CFO and DCO estimator is given in Section III. Simulation-based performance comparisons are presented in Section IV. Finally, Section V concludes this letter.

II. BLIND CFO ESTIMATION FOR OFDM SYSTEMS

A. OFDM Fundamentals

An OFDM system with $N$ subcarriers and $\Delta f = B/N$ frequency spacing is considered, where $B$ is the total system bandwidth. Let $S_{k,m}$ and $H_m$ represent the signal carried by the $m$th subcarrier and the corresponding channel frequency response, respectively. Then, the $N$ noise-free samples in the $k$th received OFDM block can be described as

$$
\mathbf{r}(k) = \Gamma_N(\varepsilon) \mathbf{W}_N \mathbf{d}_N(k), \tag{1}
$$

where $\mathbf{W}_N$ is the $N \times N$ inverse DFT (IDFT) matrix, $\varepsilon$ is the CFO normalized to $\Delta f$, and

$$
\Gamma_N(\varepsilon) = \mathrm{diag}\left(1, e^{j\frac{2\pi\varepsilon}{N}}, \ldots, e^{j\frac{2\pi\varepsilon(N-1)}{N}}\right), \tag{2}
$$

$$
\mathbf{d}_N(k) = e^{j\phi_k} [H_0 S_{k,0}, H_1 S_{k,1}, \ldots, H_{N-1} S_{k,N-1}]^T. \tag{3}
$$

In Eq. (3), $\phi_k = 2\pi\varepsilon(k(N + N_{cp}) + N_{cp})/N$, where $N_{cp}$ is the duration of the cyclic prefix (CP). A general non-integer CFO $\varepsilon$ destroys the orthogonality among the subcarriers and causes inter-carrier interference.

B. Blind CFO Estimation Without DCO

In the literature on blind CFO estimation for OFDM systems, there are CP-based [10] and subspace-based [11], [12] approaches. The CP-based approach was originally developed for frequency-flat channels and thus suffers from the long delay induced by the frequency-selective channel [13], [14]. The ESPRIT approach [12], which needs not only several OFDM blocks to make the correlation matrix full-rank but also online singular value decomposition (SVD) and matrix inversion, is computationally demanding. In contrast, the MUSIC-like estimator (ME) [11] is relatively light in computational cost and has been considered for practical implementation [15]. Usually, a practical OFDM system has some null subcarriers to avoid aliasing and ease transmit filtering. Then, (1) can be rewritten as $\mathbf{r}(k) = \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k)$, where $\mathbf{W}_Q = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_Q]$ is a column subset of $\mathbf{W}_N$, and $\mathbf{d}_Q(k)$ is a row subset of $\mathbf{d}_N(k)$, corresponding to the $Q < N$ modulated subcarriers. Let $\mathbf{W}_\perp = [\mathbf{w}_{Q+1}, \ldots, \mathbf{w}_N]$ represent the known orthogonal complement of $\mathbf{W}_Q$ and $\mathbf{E}_N(\nu) = \Gamma_N^H(\nu)$. Since $\mathbf{W}_\perp^H \mathbf{E}_N(\nu) \mathbf{r}(k) = \mathbf{0}$ for $\nu = \varepsilon$, the CFO estimate $\hat{\varepsilon}$ is obtained by finding the $\nu$ that minimizes [11]

$$
J_{ME}(\nu) = \sum_{k=1}^{K} \sum_{i=1}^{L} \left\| \mathbf{w}_{Q+i}^H \mathbf{E}_N(\nu) \mathbf{r}(k) \right\|^2, \tag{4}
$$

where $L \le N - Q$ and $K$ is the number of OFDM blocks used. This estimator is similar to the high-resolution MUSIC estimator in array signal processing and can be easily implemented in hardware, since $\mathbf{w}_{Q+i}^H \mathbf{E}_N(\nu)$ can be pre-calculated and stored in memory.
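
The grid search over the cost function (4) can be sketched as follows. This is a minimal noise-free illustration with K = 1, not the authors' implementation; it assumes a hypothetical 16-point system in which subcarriers 1 to 12 are loaded and subcarriers 0, 13, 14, and 15 are null, and it omits the common phase $e^{j\phi_k}$, which does not change the cost. All function names and parameter values are illustrative.

```python
import cmath
import math
import random


def received_block(symbols, eps, n_fft):
    """Noise-free block r = Gamma_N(eps) W_Q d_Q; symbols maps a loaded
    subcarrier index m to the product H_m * S_m."""
    block = []
    for n in range(n_fft):
        s = sum(d * cmath.exp(2j * math.pi * m * n / n_fft)
                for m, d in symbols.items()) / math.sqrt(n_fft)
        block.append(cmath.exp(2j * math.pi * eps * n / n_fft) * s)
    return block


def me_cost(r, nu, null_idx):
    """Cost (4) for one block: energy left on the null subcarriers after
    derotating the block by the trial CFO nu."""
    n_fft = len(r)
    cost = 0.0
    for m in null_idx:
        # w_m^H E_N(nu) r, with w_m the m-th IDFT column
        proj = sum(r[n] * cmath.exp(-2j * math.pi * (m + nu) * n / n_fft)
                   for n in range(n_fft)) / math.sqrt(n_fft)
        cost += abs(proj) ** 2
    return cost


def me_estimate(r, null_idx, grid):
    """Return the grid point that minimizes the ME cost."""
    return min(grid, key=lambda nu: me_cost(r, nu, null_idx))


def demo(eps=0.32, n_fft=16, seed=0):
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    # Random channel gain times a QPSK symbol on each loaded subcarrier.
    symbols = {m: complex(rng.gauss(0, 1), rng.gauss(0, 1)) * rng.choice(qpsk)
               for m in range(1, 13)}
    r = received_block(symbols, eps, n_fft)
    grid = [k / 100 for k in range(-50, 50)]  # 1-D search over [-0.5, 0.5)
    return me_estimate(r, [0, 13, 14, 15], grid)


if __name__ == "__main__":
    print(demo())
```

Because the true CFO lies on the search grid and no noise is added, the cost at that grid point vanishes up to rounding error; with noise or an off-grid CFO the estimate would only be accurate to within the grid resolution.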

III. BLIND CFO AND DCO ESTIMATION

In this letter, we consider only the DCO and neglect other disturbances introduced by the DCR. Usually, the DC subcarrier is unloaded, i.e., $S_{k,0} = 0$. Further, $\mathbf{w}_{Q+1} = \mathbf{a}/\sqrt{N}$, where $\mathbf{a}$ is an all-ones column vector. This can be considered as a solution to the DCO, since in the absence of the CFO, the DCO can be easily obtained by a TDA. However, when the CFO coexists with an unknown DCO $\alpha$, assuming that the ADC has a sufficiently wide dynamic range, we have $\mathbf{r}(k) = \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k) + \alpha\mathbf{a}$, whose TDA, given by

$$
\bar{r}(k) = \frac{1}{N} \mathbf{a}^H \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k) + \alpha, \tag{5}
$$

is no longer the DCO. Since the DC component includes useful signal components, DC component removal by an HPF results in performance loss. Furthermore, in the case of non-integer $\varepsilon$, $\mathbf{w}_{Q+i}^H \mathbf{E}_N(\varepsilon) \mathbf{r}(k) = \alpha \mathbf{w}_{Q+i}^H \mathbf{E}_N(\varepsilon) \mathbf{a} = \alpha\sqrt{N} \mathbf{w}_{Q+i}^H \mathbf{E}_N(\varepsilon) \mathbf{w}_{Q+1}$ is non-zero. Consequently, the ME loses its validity in the presence of the DCO.

A. TDA-based DCO cancellation in DSP stage

The DCO can be treated as the desired signal, while the combination of the OFDM signal and white Gaussian noise (WGN) is a hybrid noise. Since this hybrid noise is not WGN, it has been shown that the TDA in the DSP stage is not the optimal DCO estimator [9], which is consistent with (5). The signals after the TDA-based DCO cancellation can be given as

$$
\tilde{\mathbf{r}}(k) = \Omega_N \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k), \tag{6}
$$

where $\Omega_N = \mathbf{I}_{N \times N} - (1/N)\mathbf{a}\mathbf{a}^H$, and $\mathbf{I}_{N \times N}$ is the $N \times N$ identity matrix. Evidently, the rank of $\Omega_N$ is $N - 1$. Notably, since $\alpha\Omega_N \mathbf{a} = \mathbf{0}$, we have $\tilde{\mathbf{r}}(k) = \Omega_N \mathbf{r}(k)$.

B. Blind CFO Estimation

Although the TDA is not the optimal DCO estimator, it can remove the DCO to some extent. A direct idea is then to combine the ME with the TDA-based coarse DCO cancellation (ME-TDA). Considering the ME-TDA using $\tilde{\mathbf{r}}(k)$, we have

$$
\mathbf{w}_{Q+i}^H \mathbf{E}_N(\nu) \tilde{\mathbf{r}}(k) = \mathbf{w}_{Q+i}^H \mathbf{E}_N(\nu) \Omega_N \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k). \tag{7}
$$

Clearly, the ME-TDA suffers from the non-identity of $\Lambda_N(\varepsilon) = \mathbf{E}_N(\varepsilon) \Omega_N \Gamma_N(\varepsilon)$. One may consider employing $\tilde{\mathbf{E}}_N(\nu) = \mathbf{E}_N(\nu) \Omega_N^\dagger$ instead of $\mathbf{E}_N(\nu)$, where $\Omega_N^\dagger$ is the Moore-Penrose pseudoinverse of $\Omega_N$. However, $\Omega_N$ is a self-pseudoinverse matrix and $\Omega_N^\dagger \Omega_N = \Omega_N$ (see Appendix A

Fig. 1. CFO NMSE versus SNR (curves: CPE, ME, ME-TDA, and NBE, each for Cases A, B, and C).

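for the proof); therefore, it is of little help in resolving the non-identity. Recalling the basic concept of the ME, what matters is the subspace orthogonal to the signal rather than $\mathbf{W}_\perp$ itself. In fact, $\mathbf{W}_\perp$ is the nullspace of the CFO-free OFDM signals. This indicates that the nullspace of $\Theta(\nu) \triangleq \Omega_N \Gamma_N(\nu) \mathbf{W}_Q$ is a good candidate for blind CFO estimation. Since $\Gamma_N(\nu)$ is a full-rank matrix and $\mathbf{W}_Q$ is a tall matrix with rank $Q < N - 1$, the rank of $\Theta(\nu)$ is $Q$. The SVD of $\Theta(\nu)$ is

$$
\Theta(\nu) = \begin{bmatrix} \mathbf{U}_s(\nu) & \mathbf{U}_z(\nu) \end{bmatrix} \begin{bmatrix} \Sigma_s(\nu) \\ \mathbf{0} \end{bmatrix} \mathbf{V}_s(\nu), \tag{8}
$$

where $\mathbf{U}_z(\nu)$ is an $N \times (N - Q)$ tall matrix that represents the nullspace. Let $\mathbf{U}_{z,i}(\nu)$ represent the $i$th column of $\mathbf{U}_z(\nu)$ and let $\tilde{\mathbf{r}}(k) = \Omega_N \mathbf{r}(k)$ denote the signal after TDA-based DCO cancellation; then $\mathbf{U}_{z,i}^H(\nu) \tilde{\mathbf{r}}(k) = \mathbf{U}_{z,i}^H(\nu) \Omega_N \mathbf{r}(k)$ will be zero when $\nu = \varepsilon$. The cost function of CFO estimation in the proposed nullspace-based estimator (NBE) is given by

$$
J(\nu) = \sum_{k=1}^{K} \sum_{i=1}^{L} \left\| \mathbf{U}_{z,i}^H(\nu) \Omega_N \mathbf{r}(k) \right\|^2, \tag{9}
$$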

where $L$ and $K$ are identical to those in the ME. Similarly, $\mathbf{U}_{z,i}^H(\nu) \Omega_N$ and $\mathbf{w}_{Q+i}^H \mathbf{E}_N(\nu) \Omega_N$ can be calculated in advance over the 1-D grid search range, which means that the TDA need not be performed online and that the NBE and ME-TDA have the same complexity as the ME.

C. Blind DCO Estimation

Once the CFO estimate $\hat{\varepsilon}$ has been obtained, we can calculate

$$
y_{k,i} = \mathbf{w}_{Q+i}^H \mathbf{E}_N(\hat{\varepsilon}) \mathbf{r}(k) \approx \alpha \mathbf{w}_{Q+i}^H \mathbf{E}_N(\hat{\varepsilon}) \mathbf{a} \tag{10}
$$

for $i \in [1, L]$ and $k \in [1, K]$. Since $x_{k,i} = \mathbf{w}_{Q+i}^H \mathbf{E}_N(\hat{\varepsilon}) \mathbf{a}$ can also be calculated, we have $G = K \times L$ equations for only one unknown variable $\alpha$. Using two $G \times 1$ vectors $\mathbf{y}$ and $\mathbf{x}$ to represent $y_{k,i}$ and $x_{k,i}$, respectively, this full-rank overdetermined problem can be solved by the least squares method, providing the unique solution

$$
\hat{\alpha} = (\mathbf{x}^H \mathbf{x})^{-1} \mathbf{x}^H \mathbf{y}. \tag{11}
$$
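
The least squares step (10)-(11) can be sketched as follows. This is a minimal noise-free illustration that assumes the CFO estimate is exact, so that $y_{k,i} = \alpha x_{k,i}$ holds with equality. The 16-point subcarrier layout (nulls at 0, 13, 14, 15), the helper names, and the test values are hypothetical, not taken from the letter.

```python
import cmath
import math


def make_block(data, eps, alpha, n_fft):
    """r = Gamma_N(eps) W_Q d_Q + alpha * a; data maps a loaded subcarrier
    index m to H_m * S_m, and alpha is the DC offset."""
    block = []
    for n in range(n_fft):
        s = sum(d * cmath.exp(2j * math.pi * m * n / n_fft)
                for m, d in data.items()) / math.sqrt(n_fft)
        block.append(cmath.exp(2j * math.pi * eps * n / n_fft) * s + alpha)
    return block


def null_projection(v, eps_hat, m):
    """w_m^H E_N(eps_hat) v for the m-th IDFT column w_m."""
    n_fft = len(v)
    return sum(v[n] * cmath.exp(-2j * math.pi * (m + eps_hat) * n / n_fft)
               for n in range(n_fft)) / math.sqrt(n_fft)


def estimate_dco(blocks, eps_hat, null_idx):
    """Least squares solution (11): alpha_hat = (x^H x)^{-1} x^H y."""
    ones = [1.0] * len(blocks[0])
    # x_{k,i} does not depend on k, so it is computed once per null subcarrier.
    x_vals = [null_projection(ones, eps_hat, m) for m in null_idx]
    num = 0j
    den = 0.0
    for r in blocks:
        for m, x in zip(null_idx, x_vals):
            y = null_projection(r, eps_hat, m)
            num += x.conjugate() * y
            den += abs(x) ** 2
    return num / den


if __name__ == "__main__":
    data = {1: 1 + 1j, 2: -1 + 1j, 3: 1 - 1j, 5: -1 - 1j}
    block = make_block(data, eps=0.32, alpha=0.5 + 0.5j, n_fft=16)
    print(estimate_dco([block], eps_hat=0.32, null_idx=[0, 13, 14, 15]))
```

With noise or an imperfect CFO estimate, (10) holds only approximately and the least squares solution averages the error over the G equations.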


TABLE I
COMPLEXITY COMPARISON

Operation        CPE (perfect timing)    NBE, ME-TDA, ME (K = 1)
Multiplication   Ncp + 1                 Nsr × L × 2 × N
Addition         Ncp − 1                 Nsr × L × (N − 1)
Argument         1                       -
Argmin           -                       1

Nsr: size of CFO search range.

IV. SIMULATION RESULTS

Simulations were performed to demonstrate the validity of the proposed blind joint CFO and DCO estimator. The simulated OFDM system takes its parameters from the 802.11a WLAN standard [2], where $B = 20$ MHz, $N = 64$, $Q = 52$, and $N_{cp} = 16$, with QPSK signaling. The frequency-selective fading channel has 15 paths, with an exponential power delay profile such that $E\{|h_p|^2\} = e^{-p/5}$ for $p = 0, 1, 2, \ldots, 14$. The normalized mean square error (NMSE) of the CFO, defined as $E[(\varepsilon - \hat{\varepsilon})^2]$, is used to measure the estimator's performance. The power of the received distortion-free signals is adjusted to be 1, and the signal-to-noise ratio (SNR) is defined as $1/\sigma_z^2$. The NBE is compared with the ME-TDA and ME, as well as with the CP-based estimator (CPE). $K = 1$ and $L = N - Q$ are the common parameters for the NBE, ME-TDA and ME. The ME and CPE are used without DC cancellation. Three cases of CFO and DCO are considered. Case A: $\varepsilon = 0.32$, $|\alpha|^2 = 0.5$; Case B: $\varepsilon = 0.08$, $|\alpha|^2 = 0.5$; Case C: $\varepsilon = 0.32$, $\alpha = 0$.

From the results in Fig. 1, one can see that when $\alpha \ne 0$, the NBE obtains satisfactory performance, the ME completely fails, and the ME-TDA exhibits an error floor at high SNR. This error floor can be explained as follows. For the ME-TDA, the performance loss arises from two sources: the WGN and the non-identity of $\Lambda_N(\varepsilon)$. At low SNR, the WGN is the dominant part of the disturbance. As the SNR increases, the non-identity of $\Lambda_N(\varepsilon)$ cannot be overcome, and thus the error floor occurs. When $\alpha = 0$, the NBE shows a small performance loss compared with the ME, since it takes the nonexistent DCO into account. Comparing the results for $\alpha \ne 0$ and $\alpha = 0$, we see that the NBE's performance does not vary with the DCO, since the residual DCO after TDA-based DCO cancellation is $-(1/N)\mathbf{a}^H \Gamma_N(\varepsilon) \mathbf{W}_Q \mathbf{d}_Q(k)$, which is independent of $\alpha$. For the CPE, although it has very low computational complexity, as shown in Table I, the channel delay spread and the DCO cause an error floor across almost the entire SNR range in all three cases.

In addition, the comparison of the NBE and the BLUE [9] in terms of the mean square error of the DCO, i.e., $E[|\alpha - \hat{\alpha}|^2]$, is shown in Fig. 2, where the BLUE's coefficients are designed for SNR = 20 dB. The NBE shows better performance than the BLUE, which has an error floor. Since the information-bearing OFDM symbol has the same subcarrier occupation as the “long training sequence” in 802.11a, the reason for this error floor lies in the frequency response of the BLUE, which cannot suppress the signal components at carriers near DC [9].

Fig. 2. DCO MSE versus SNR, Case A (curves: BLUE, NBE).

Fig. 3. Effective SNR versus SNR, Case A (curves: CPE, ME, ME-TDA, NBE).

The NBE's ability of compensating CFO and DCO can be

seen from the effective SNR performance in Fig. 3. Furthermore, Fig. 4 shows the block error rates (BLER) for a block size of 1000 bytes in the 802.11a 12 Mbps mode. As expected, the NBE, which achieves the target BLER of 10% from SNR = 16 dB, outperforms the CPE, ME and ME-TDA.

V. CONCLUSIONS

In this letter, we discussed the underlying subspace structure of the received OFDM signals after the TDA-based coarse DCO cancellation. Based on this subspace structure, a novel blind joint CFO and DCO estimator for OFDM systems has been proposed. In contrast to the previously proposed pilot-aided approaches, which only partially solve the coexistence of CFO and DCO, the proposed estimator can estimate the CFO and DCO simultaneously without the assistance of pilots. Further, it is noteworthy that it has the same computational complexity as the ME. The superior performance of the proposed estimator has been demonstrated by simulations based on the practical 802.11a WLAN application.

Fig. 4. BLER versus SNR, Case A (curves: CPE, ME, ME-TDA, NBE).

ACKNOWLEDGMENT

The authors wish to thank the editor and anonymous reviewers for their helpful comments and suggestions.

APPENDIX A
PROOF OF SELF-PSEUDOINVERSE OF $\Omega_N$

Clearly, $\Omega_N$ is a real symmetric matrix and can be rewritten as $\Omega_N = \mathbf{I}_{N \times N} - \frac{1}{N}\mathbf{B}_N$, where $\mathbf{B}_N$ is the $N \times N$ all-ones matrix and $\mathbf{B}_N \mathbf{B}_N = N\mathbf{B}_N$. The pseudoinverse operation finds $\Omega_N^\dagger$ from

$$
\Omega_N \Omega_N^\dagger \Omega_N = \Omega_N. \tag{12}
$$

We have

$$
\Omega_N \Omega_N = \left(\mathbf{I}_{N \times N} - \frac{1}{N}\mathbf{B}_N\right)\left(\mathbf{I}_{N \times N} - \frac{1}{N}\mathbf{B}_N\right) = \mathbf{I}_{N \times N} - \frac{2}{N}\mathbf{B}_N + \frac{1}{N^2}\mathbf{B}_N \mathbf{B}_N = \Omega_N. \tag{13}
$$

Comparing (13) with (12), it is clear that $\Omega_N^\dagger = \Omega_N$ and $\Omega_N$ is self-pseudoinverse.

REFERENCES

[1] H. Sari, G. Karam, and I. Jeanclaude, “Transmission techniques for digital terrestrial TV broadcasting,” IEEE Commun. Mag., vol. 33, pp. 100–109, Feb. 1995.
[2] Part 11: Wireless LAN Medium Access Control and Physical Layer (PHY) Specifications: High-Speed Physical Layer in the 5GHz Band, IEEE Standard 802.11a-1999.
[3] Part 11: Wireless LAN Medium Access Control and Physical Layer (PHY) Specifications: Further Higher Data Rate Extension in the 2.4GHz Band, IEEE Standard 802.11g-2003.
[4] T. Pollet, M. van Bladel, and M. Moeneclaey, “BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise,” IEEE Trans. Commun., vol. 43, pp. 191–193, Feb./Mar./Apr. 1995.
[5] P. H. Moose, “A technique for orthogonal frequency division multiplexing frequency offset correction noise,” IEEE Trans. Commun., vol. 42, pp. 2908–2914, Oct. 1994.
[6] B. Razavi, “Design considerations for direct-conversion receivers,” IEEE Trans. Circuits Systems II, vol. 44, no. 6, pp. 428–435, June 1997.
[7] A. A. Abidi, “Direct-conversion radio transceivers for digital communications,” IEEE J. Solid-State Circuits, vol. 30, no. 12, pp. 1399–1410, Dec. 1995.
[8] C. K. Ho, S. Sun, and P. He, “Low complexity frequency offset estimation in the presence of DC offset,” in Proc. IEEE ICC 2003, pp. 2051–2055, May 2003.
[9] S. Marsili, “DC offset estimation in OFDM based WLAN application,” in Proc. IEEE GLOBECOM 2004, Dec. 2004.
[10] J.-J. van de Beek, M. Sandell, and P. O. Borjesson, “ML estimation of time and frequency offset in OFDM systems,” IEEE Trans. Signal Processing, vol. 48, pp. 1800–1805, July 1997.
[11] H. Liu and U. Tureli, “A high-efficiency carrier estimator for OFDM communications,” IEEE Commun. Lett., vol. 2, pp. 104–106, Apr. 1998.
[12] U. Tureli, H. Liu, and M. D. Zoltowski, “OFDM blind carrier offset estimation: ESPRIT,” IEEE Trans. Commun., vol. 48, pp. 1459–1461, Sept. 2000.
[13] U. Tureli, P. J. Honan, and H. Liu, “Low-complexity nonlinear least squares carrier offset estimator for OFDM: identifiability, diversity and performance,” IEEE Trans. Signal Processing, vol. 52, pp. 2441–2452, Sept. 2004.
[14] Y. Yao and G. B. Giannakis, “Blind carrier frequency offset estimation in SISO, MIMO, and multiuser OFDM systems,” IEEE Trans. Commun., vol. 53, pp. 173–183, Jan. 2005.
[15] U. Tureli, D. Kivanc, and H. Liu, “Experimental and analytical studies on a high-resolution OFDM carrier frequency offset estimator,” IEEE Trans. Veh. Technol., vol. 50, pp. 629–643, Mar. 2001.


Capacity of MRC on Correlated Rician Fading Channels Khairi Ashour Hamdi, Senior Member, IEEE

Abstract—A new exact explicit expression is derived for the ergodic capacity of maximal ratio combining (MRC) schemes over arbitrarily correlated Rician fading channels. This is used to study the effects of channel correlation on the ergodic capacity. Numerical results reveal that both the phase and the magnitude of correlation have an impact on the ergodic capacity of Rician fading channels. This is in contrast to correlated Rayleigh fading, where the phase of the correlation has no effect on the ergodic capacity. It is also observed that negatively correlated branches in Rician fading may lead to an increase in ergodic capacity beyond that obtained by uncorrelated branches. Index Terms—Channel capacity, maximal ratio combining (MRC), Rician fading, correlated fading, wireless SIMO systems.

Paper approved by M. Chiani, the Editor for Wireless Communications of the IEEE Communications Society. Manuscript received September 15, 2006; revised May 15, 2007. K. A. Hamdi is with the School of Electrical & Electronic Engineering, The University of Manchester, Sackville Street, PO Box 88, Manchester M60 1QD, United Kingdom (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060381.

I. INTRODUCTION

Diversity reception is increasingly becoming a primary technique for improving the performance of radio communication systems in multipath propagation environments. Therefore, the performance of diversity schemes has recently received considerable research effort (e.g., [1]-[15]). Recent relevant research on the evaluation of ergodic capacity in correlated fading channels includes [3]-[12]. Boche and Jorswieck [3] have shown analytically that correlation in Rayleigh fading causes a loss in the ergodic capacity (compared to uncorrelated fading), and gave a simple expression for the capacity loss in a fully correlated diversity system. Closed-form expressions for the capacity in the special case of correlated Rayleigh fading are given in [4]-[6]. Capacity analyses for Rician fading are given in [7]-[12]. Zhang and Liu [7] used Porteous' lemma to find a simple approximate expression for the ergodic capacity in correlated Rician fading channels. Taylor series expansions are used in [8] to obtain more accurate approximations for the ergodic capacity in terms of the moments of the combined channel gain. Laguerre-series expansions are used in [9], [11] for the distribution of the combined SNR. The accuracy of some other approximations for the ergodic capacity in Rician fading has recently been analyzed in [12]. On the other hand, recent advances in the performance analysis of digital communication systems in fading channels have recognized the potential importance of moment generating functions (MGF), or Laplace transforms, as a powerful tool for simplifying the analysis of diversity communication systems. This has led to simple expressions for average bit and symbol

error rates for a wide variety of digital signaling schemes on fading channels, including multichannel reception with correlated diversity (e.g. [14]-[16]). Key to these developments was the transformations of the conditional error rate expressions into different equivalent forms in which the conditional variable appears only as an exponent. For instance, the following identity has been widely employed to simplify the error rate analysis of coherent communication systems in fading    √ 2 π/2 −SNR dθ. (1) exp erfc SNR = π 0 sin2 θ In this letter, we show that it is possible to express the conditional capacity log (1 + SNR ) in a form similar to (1), in which SNR appears only as an exponent. This facilitates using the moment generating functions and leads to a new simple expression for the ergodic capacity in arbitrarily correlated Rician fading channels. This letter is organized as follows. The problem is stated in Section II, and the proposed new solution is given in Section III. Some numerical examples are given in Section IV, and Section V concludes this letter. II. T HE P ROBLEM The purpose of this letter is to derive a simple expression for computing the following average, which represents the ergodic (average) capacity of the MRC diversity system    (2) C = E log2 1 + g † g where E [.] is the expectation operator, g is a M × 1 complex random vector that represents the normalized complex channel gains, and the superscript † denotes Hermitian transposition. Here, the instantaneous signal-to-noise ratio (SNR) at the 2 mth is |gm | , m = 1, 2, .., M, whereas g † g = M channel 2 m=1 |gm | is the combined SNR. In Rician fading channels, g is a complex Gaussian vector having a probability density function (pdf) given by f (g) =

† −1 1 e−(g−μ) Λ (g−μ) π M det (Λ)

(3)

where $\mu = E[g]$ is the mean vector and $\Lambda = E\left[(g-\mu)(g-\mu)^\dagger\right]$ is the covariance matrix. The Rician factor for the $m$th channel is equal to $\kappa_m = |\mu_m|^2 / \Lambda_{m,m}$.

Direct evaluation of the average in (2),

$$C = \int_{g} \log_2\left(1 + g^\dagger g\right) \frac{1}{\pi^M \det(\Lambda)} e^{-(g-\mu)^\dagger \Lambda^{-1} (g-\mu)} \, dg \tag{4}$$

where the integration is over the $M$-dimensional vector $g$, requires a huge computational effort. Other common methods use the pdf of the quadratic form $g^\dagger g$ instead. However,

0090-6778/08$25.00 © 2008 IEEE

HAMDI: CAPACITY OF MRC ON CORRELATED RICIAN FADING CHANNELS

simple closed-form expressions for the pdf of Gaussian quadratic forms are known only in some special cases. Otherwise, the cumulative distribution function of an arbitrary Gaussian quadratic form is expressed as an infinite series whose coefficients are determined recursively in terms of the eigenvalues of the covariance matrix (e.g., [1], [9], [11], [18]). On the other hand, closed-form expressions for the MGFs (or Laplace transforms) of Gaussian quadratic forms are readily known for arbitrary complex Gaussian vectors (e.g., [17], [18]):

$$M(z) = E\left[e^{-z g^\dagger g}\right] = \frac{1}{\det(I_M + z\Lambda)} \exp\left(-\mu^\dagger \left(z^{-1} I_M + \Lambda\right)^{-1} \mu\right) \tag{5}$$

where $I_M$ is the $M \times M$ identity matrix. This has been successfully employed in [14] and [15] to obtain simple expressions for bit and symbol error rates of different digital communication systems. However, in order to utilize (5) for computing the ergodic capacity, the conditional capacity expression $\log_2(1 + g^\dagger g)$ must first be represented in a different equivalent form in which the quadratic random variable $g^\dagger g$ appears only as an exponent. In the next section, we show that this is possible.

III. THE ERGODIC CAPACITY EVALUATION

In order to employ the available closed-form expression (5) for the MGF of Gaussian quadratic forms in evaluating the ergodic capacity in correlated Rician fading, we first need the following lemma.

Lemma 1: For any $x > 0$,

$$\ln(1 + x) = \int_0^\infty \frac{1}{z}\left(1 - e^{-xz}\right) e^{-z} \, dz. \tag{6}$$

Proof: The proof is given in the Appendix.

Now, with (6), the conditional capacity expression $\log_2(1 + g^\dagger g)$ can be expressed in the following desirable form

$$\log_2\left(1 + g^\dagger g\right) = \log_2 e \int_0^\infty \frac{1}{z}\left(1 - e^{-z g^\dagger g}\right) e^{-z} \, dz \tag{7}$$

in which the quadratic form $g^\dagger g$ appears only in the exponent. Therefore, from (7) and (5), the ergodic capacity (2) can be evaluated as follows

$$C = \log_2 e \int_0^\infty \frac{1 - M(z)}{z} e^{-z} \, dz = \log_2 e \int_0^\infty \frac{1}{z}\left(1 - \frac{e^{-\mu^\dagger \left(z^{-1} I_M + \Lambda\right)^{-1} \mu}}{\det(I_M + z\Lambda)}\right) e^{-z} \, dz \tag{8}$$

which involves only a single integral over the non-negative real line $R^+$.

As far as the evaluation of the integral in (8) is concerned, we prove in what follows that the integrand is continuous and bounded (and therefore has no singular points in the range of integration). To prove this, notice from [21, Equ. 2.6, p. 435] that the term $\frac{1 - M(z)}{z}$, with $M(z) = E\left[e^{-z g^\dagger g}\right]$, is related to the tail probability as follows

$$\frac{1 - M(z)}{z} = \int_0^\infty e^{-zx} \Pr\left\{g^\dagger g > x\right\} dx.$$

Now, owing to the fact that $0 \le \Pr\{g^\dagger g > x\} \le 1$, it can be shown by applying Steffensen's inequality for integrals [19, Equ. 12.316] that, $\forall z \in R^+$,

$$0 < \frac{1 - M(z)}{z} = \int_0^\infty e^{-zx} \Pr\left\{g^\dagger g > x\right\} dx \le \int_0^{E[g^\dagger g]} e^{-zx} \, dx = \frac{1 - e^{-z E[g^\dagger g]}}{z} \le E\left[g^\dagger g\right] \tag{9}$$

where we have used the fact that, for any non-negative random variable $X$, $\int_0^\infty \Pr\{X > x\} \, dx = E[X]$. Inequality (9) proves that the integrand in (8) is bounded. Furthermore, it can be seen that it is also continuous and possesses all derivatives $\forall z \in R^+$. Therefore, standard numerical integration packages can be used straightforwardly to compute (8).

To summarize, (8) gives the ergodic capacity of MRC over correlated Rician fading directly in terms of the covariance matrix $\Lambda$ and the mean vector $\mu$. It should be emphasized that, although (8) involves a numerical integration, it offers a huge reduction in the required computational complexity when compared to the direct method (4), which requires $M$-fold integrals. On the other hand, when comparing the computational complexity of the new expression (8) with other known methods, note that the ergodic capacity (8) is given directly in terms of the original covariance matrix $\Lambda$, without the need for any eigendecomposition operations. This is in contrast to most previous research on the capacity of MRC over correlated fading channels (e.g., [5], [6], [9]-[11]), which requires all distinct eigenvalues of the covariance matrix with their multiplicities. It is also worth mentioning that, although an expression for the pdf of the combined SNR can be obtained by the Laplace inversion of $M(z)$, this would take the form of an infinite series (Laguerre series) and involves (in addition to the eigendecomposition of the covariance matrix) solving a large set of linear equations recursively (e.g., [9], [11], [18]).

IV. NUMERICAL EXAMPLES

This section gives some numerical examples that demonstrate the effects of correlation on the ergodic capacity. In Fig. 1, we consider a constant correlation model in Rayleigh fading ($\kappa = 0$) with $\rho_{ij} = \mathrm{SNR}\,\rho$ $\forall i \ne j$, and $\rho_{ii} = \mathrm{SNR}$, $i = 1, 2, \ldots, M$, where SNR is the average signal-to-noise ratio per branch. Here, we plot the ergodic capacity against SNR for $M = 2$ and 10 branches in the cases of uncorrelated, negatively correlated and positively correlated branches. Fig. 1 shows that correlation (positive or negative) decreases the ergodic capacity of MRC in Rayleigh fading. However, the loss in ergodic capacity does not exceed 7% when $M = 10$ (from 11.554 bps/Hz to 10.798 bps/Hz at SNR = 25 dB). We also note from Fig. 1 that negative and positive correlations result in identical capacities in the case of Rayleigh fading.
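As a minimal numerical sketch of (8), the integral can be evaluated with a plain trapezoidal rule after the substitution z = e^u. That substitution and the helper names `ergodic_capacity`, `mgf_rayleigh_iid` and `mgf_rayleigh_full` are assumptions made here for illustration; they are not taken from the letter. The two Rayleigh MGFs follow from (5) with μ = 0, where det(I_M + zΛ) equals (1 + z SNR)^M for uncorrelated branches and 1 + zM SNR when all correlation coefficients equal 1. For M = 10 at 25 dB the two capacities should come out close to the 11.554 and 10.798 bps/Hz quoted above.

```python
import math

def ergodic_capacity(mgf, u_min=-40.0, u_max=5.0, step=0.01):
    """Evaluate (8): C = log2(e) * int_0^inf (1 - M(z))/z * exp(-z) dz.

    Hypothetical helper. With z = exp(u) we have dz/z = du, so the
    integrand becomes (1 - M(e^u)) * exp(-e^u), which vanishes at both
    ends of the u-axis and suits a simple trapezoidal rule.
    """
    n = int(round((u_max - u_min) / step))
    total = 0.0
    for i in range(n + 1):
        z = math.exp(u_min + i * step)
        weight = 0.5 if i in (0, n) else 1.0   # trapezoidal end weights
        total += weight * (1.0 - mgf(z)) * math.exp(-z)
    return math.log2(math.e) * step * total

def mgf_rayleigh_iid(snr, m):
    """MGF (5) with mu = 0 and Lambda = snr * I_M (uncorrelated branches)."""
    return lambda z: (1.0 + snr * z) ** (-m)

def mgf_rayleigh_full(snr, m):
    """MGF (5) with mu = 0 and all correlation coefficients rho = 1."""
    return lambda z: 1.0 / (1.0 + m * snr * z)

snr = 10.0 ** (25.0 / 10.0)                      # 25 dB per branch
c_uncorrelated = ergodic_capacity(mgf_rayleigh_iid(snr, 10))
c_fully_correlated = ergodic_capacity(mgf_rayleigh_full(snr, 10))

# Sanity check of Lemma 1: a deterministic SNR x has MGF exp(-x z),
# so (8) must then return log2(1 + x); here x = 3.
c_fixed = ergodic_capacity(lambda z: math.exp(-3.0 * z))

print(c_uncorrelated, c_fully_correlated, c_fixed)
```

Passing the MGF as a function keeps the integration routine independent of the fading model, which mirrors the way (8) depends on the channel only through M(z).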




IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Fig. 1. The ergodic capacity [b/s/Hz] against SNR [dB] for M = 2 and 10, with ρ = 0 and ±1, in a Rayleigh fading channel (κ = 0).

Fig. 2. The ergodic capacity [b/s/Hz] against the correlation magnitude |ρ| for several values of the correlation phase arg ρ = 0, π/4, π/2, 3π/4, π. M = 2, κ = 1 and SNR = 10 dB.

Fig. 3. The ergodic capacity [b/s/Hz] against the correlation phase arg ρ [degrees] for fully correlated dual branches with |ρ| = 1 and several values of the Rice factor κ = 0, 0.2, 0.5, 1, 5. SNR = 10 dB.

In order to gain insight into the effect of correlation on the ergodic capacity of MRC in Rician fading, we consider in Figs. 2-4 a dual diversity system in a Rician fading scenario with mean vector $\mu_m = \sqrt{\frac{\kappa\,\mathrm{SNR}}{1+\kappa}}$, $m = 1, 2$, and covariance matrix

$$\Lambda = \begin{bmatrix} \frac{\mathrm{SNR}}{1+\kappa} & \frac{\mathrm{SNR}}{1+\kappa}\rho \\ \frac{\mathrm{SNR}}{1+\kappa}\rho^{*} & \frac{\mathrm{SNR}}{1+\kappa} \end{bmatrix}$$

with SNR = 10 dB.

In Fig. 2, we let the Rice factor κ = 1, and plot the ergodic capacity against the correlation magnitude |ρ| for several values of the correlation phase arg ρ = 0, π/4, π/2, 3π/4, π. We observe that, in contrast to Rayleigh fading, the ergodic capacity depends on the correlation phase. Furthermore, depending on the correlation phase, the ergodic capacity can increase with increasing correlation magnitude. Specifically, as can be seen from Fig. 2, the ergodic capacity increases with increasing |ρ| when arg ρ > π/2.

In Fig. 3, we plot the ergodic capacity against arg ρ when |ρ| = 1 and κ = 0, 0.2, 0.5, 1, 5. We observe that when κ = 0, the phase arg ρ has no effect on the capacity. However, when κ > 0, the ergodic capacity is maximized when arg ρ = π (i.e., negative correlation).

In Fig. 4, we plot the ergodic capacity against κ for different correlations. One can make the following observations from Figs. 2-4:
1) For a given correlation, increasing the Rice factor κ leads to an increase in the ergodic capacity.
2) For any value of the Rice factor κ, the capacity with positively correlated branches does not exceed that with negatively correlated branches.
3) Negatively correlated branches perform better than uncorrelated branches when κ > 0.4. On the other hand, increasing the magnitude of the correlation when κ < 0.2 decreases the ergodic capacity.

Fig. 4. The ergodic capacity [b/s/Hz] against the Rice factor κ. M = 2, SNR = 10 dB, and several correlation coefficients ρ = 0, ±0.5, ±1.

V. SUMMARY

A new simple expression is derived for computing the ergodic capacity of MRC with arbitrarily correlated Rician faded branches. This is used to determine the effect of correlated branches on the performance of MRC diversity. Numerical results indicate that the ergodic capacity of MRC in negatively correlated Rician fading channels can be improved beyond what would be achieved in uncorrelated channels.
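The dual-branch Rician scenario of Section IV can be explored with a short sketch along the following lines. It assumes the M = 2 special case of (5) worked out by hand for the mean vector and covariance matrix given in Section IV: with a = SNR/(1+κ) and |μ_m|² = κ SNR/(1+κ), one obtains det(I + zΛ) = (1 + za)² − (za|ρ|)² and μ†(z⁻¹I + Λ)⁻¹μ = 2z|μ_m|²(1 + za(1 − Re ρ))/det(I + zΛ). The function names and the trapezoidal rule in the variable u = ln z are illustrative choices, not part of the letter.

```python
import cmath
import math

def ergodic_capacity(mgf, u_min=-40.0, u_max=5.0, step=0.01):
    """Trapezoidal evaluation of (8) after substituting z = exp(u)."""
    n = int(round((u_max - u_min) / step))
    total = 0.0
    for i in range(n + 1):
        z = math.exp(u_min + i * step)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * (1.0 - mgf(z)) * math.exp(-z)
    return math.log2(math.e) * step * total

def mgf_dual_rician(snr, kappa, rho):
    """MGF (5) for M = 2 (hypothetical helper, closed form derived by hand)."""
    a = snr / (1.0 + kappa)              # diagonal entry of Lambda
    m2 = kappa * snr / (1.0 + kappa)     # |mu_m|^2, equal on both branches
    def mgf(z):
        det = (1.0 + z * a) ** 2 - (z * a * abs(rho)) ** 2
        quad = 2.0 * z * m2 * (1.0 + z * a * (1.0 - rho.real)) / det
        return math.exp(-quad) / det
    return mgf

snr = 10.0      # 10 dB expressed as a linear ratio
kappa = 1.0
capacity = {}
for phase in (0.0, math.pi / 2, math.pi):
    rho = cmath.rect(1.0, phase)         # fully correlated, |rho| = 1
    capacity[phase] = ergodic_capacity(mgf_dual_rician(snr, kappa, rho))
    print(f"arg(rho) = {phase:.3f} rad, C = {capacity[phase]:.4f} b/s/Hz")
```

Setting `kappa = 0.0` removes the exponential factor, so the result then depends on ρ only through |ρ|, in line with the Rayleigh observation made for Fig. 3.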


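APPENDIX

In order to give a formal proof of Lemma 1, consider the following series expansion of $\ln(1 + x)$, which is valid for all $x \ge 0$ [20, Eq. 4.1.25]:

$$\ln(1 + x) = \sum_{n=1}^{\infty} \frac{1}{n}\left(\frac{x}{1+x}\right)^n, \quad x \ge 0. \tag{10}$$

Now, using the identity¹ (e.g., [19, Eqs. 8.312.2 or 3.381.4])

$$x^n = \int_0^\infty \frac{s^{n-1}}{\Gamma(n)} e^{-s/x} \, ds, \quad n, x > 0 \tag{11}$$

(10) becomes

$$\begin{aligned} \ln(1 + x) &= \sum_{n=1}^{\infty} \frac{1}{n} \int_0^\infty \frac{s^{n-1}}{\Gamma(n)} e^{-s\frac{1+x}{x}} \, ds \\ &= \int_0^\infty \sum_{n=1}^{\infty} \frac{1}{n} \frac{s^{n-1}}{\Gamma(n)} e^{-s\frac{1+x}{x}} \, ds \\ &= \int_0^\infty \frac{1}{s}\left(e^{s} - 1\right) e^{-s\frac{1+x}{x}} \, ds \end{aligned} \tag{12}$$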

which reduces to (6) when we substitute s = zx.

¹ Notice that (11) is common in communication engineering. For instance, it is related to the gamma probability density function and to the Laplace transform of $t^{n-1}$.

REFERENCES

[1] H. T. Hui, “The performance of the maximum ratio combining method in correlated Rician-fading channels for antenna-diversity signal combining,” IEEE Trans. Antennas Propag., vol. 53, no. 3, pp. 958-964, Mar. 2005.
[2] E. A. Jorswieck, T. J. Oechtering, and H. Boche, “Performance analysis of combining techniques with correlated diversity,” IEEE WCNC’05, pp. 849-854, Mar. 2005.
[3] H. Boche and E. A. Jorswieck, “On the ergodic capacity as a function of the correlation properties in systems with multiple transmit antennas without CSI at the transmitter,” IEEE Trans. Commun., vol. 52, no. 10, pp. 1654-1657, Oct. 2004.
[4] R. Annavajjala and L. B. Milstein, “On the capacity of dual diversity combining schemes on correlated Rayleigh fading channels with unequal branch gains,” IEEE WCNC’04, pp. 300-305, Mar. 2004.
[5] H. Zhang, W. Li, and T. A. Gulliver, “Capacity and error probability of orthogonal space time block codes over correlated Rayleigh and Rician channels,” IEICE Trans. Fundamentals of Elec., Commun., and Comp. Science, vol. E 88-A, no. 11, pp. 3203-3213, Nov. 2005.
[6] A. Forenza, M. R. McKay, I. B. Collings, and R. W. Heath Jr., “Switching between OSTBC and spatial multiplexing with linear receivers in spatially correlated MIMO channels,” in Proc. IEEE Veh. Technol. Conf. (VTC), Melbourne, Australia, May 2006.
[7] Q. T. Zhang and D. P. Liu, “A simple capacity formula for correlated diversity Rician fading channels,” IEEE Commun. Lett., vol. 6, no. 1, pp. 481-483, Nov. 2002.
[8] J. Perez et al., “Tight closed-form approximation for the ergodic capacity of orthogonal STBC,” IEEE Trans. Wireless Commun., vol. 6, no. 2, pp. 452-457, Feb. 2007.
[9] L. Musavian, M. Dohler, M. R. Nakhai, and A. H. Aghvami, “Closed-form capacity expressions of orthogonalized correlated MIMO channels,” IEEE Commun. Lett., vol. 8, no. 6, pp. 365-367, June 2004.
[10] S. Furrer and P. Cornoel, “Simple ergodic and outage capacity expressions for correlated diversity Ricean fading channels,” IEEE Trans. Wireless Commun., vol. 5, no. 7, pp. 1606-1609, July 2006.
[11] R. U. Nabar, H. Bolcskei, and A. J. Paulraj, “Diversity and outage performance in space-time block coded Ricean MIMO channels,” IEEE Trans. Wireless Commun., vol. 4, no. 5, pp. 2519-2532, Sept. 2005.
[12] S. Khatalin and J. P. Fonseka, “On the channel capacity in Rician and Hoyt environments with MRC diversity,” IEEE Trans. Veh. Technol., vol. 55, no. 1, pp. 173-141, Jan. 2006.
[13] M. K. Simon and M.-S. Alouini, “A unified approach to the probability of error for noncoherent and differentially coherent modulations over generalized fading channels,” IEEE Trans. Commun., vol. 46, no. 12, pp. 1654-1657, Dec. 1998.
[14] V. V. Veeravalli, “On performance analysis for signaling on correlated fading channels,” IEEE Trans. Commun., vol. 49, no. 11, pp. 1879-1883, Nov. 2001.
[15] M. Yao, “Impact of correlated diversity branches in Rician fading channels,” in Proc. IEEE ICC’05, pp. 473-477, May 2005.
[16] M.-S. Alouini and A. Goldsmith, “A unified approach for calculating error rates of linearly modulated signals over generalized fading channels,” IEEE Trans. Commun., vol. 47, no. 92, pp. 1324-1334, Sept. 1999.
[17] M. Schwarz, W. R. Bennet, and S. Stein, Communication Systems and Techniques. New York: McGraw-Hill, 1966.
[18] A. M. Mathai and S. B. Provost, Quadratic Forms in Random Variables, Theory and Applications. New York: Marcel Dekker, 1992.
[19] L. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, 6th ed. Academic Press, 2000.
[20] S. M. Abramowitz, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. U.S. Department of Commerce, 1972.
[21] W. Feller, An Introduction to Probability Theory and Its Applications, vol. II, 2nd ed. John Wiley & Sons, 1971.


Transactions Papers

Joint Source and Channel Coding using Punctured Ring Convolutional Coded CPM

Zihuai Lin, Member, IEEE, and Tor Aulin, Fellow, IEEE

Abstract—In this paper, a novel trellis source encoding scheme based on punctured ring convolutional codes is presented. Joint Source and Channel Coding (JSCC) using trellis coded Continuous Phase Modulation (CPM) with punctured convolutional codes over rings is investigated. The channels considered are the Additive White Gaussian Noise (AWGN) channel and the Rayleigh fading channel. Optimal soft decoding for the proposed JSCC scheme is studied. The soft decoder is based on the A Posteriori Probability (APP) algorithm for trellis coded CPM with punctured ring convolutional codes. It is shown that these systems with soft decoding outperform the same systems with hard decoding especially when the systems operate at low to medium Signal-to-Noise Ratio (SNR). Furthermore, adaptive JSCC approaches based on the proposed source coding scheme are investigated. Compared with JSCC schemes with fixed source coding rates, the proposed adaptive approaches can achieve much better performance in the high SNR region. The novelties of this work are the development of a trellis source encoding method based on punctured ring convolutional codes, the use of a soft decoder, the APP algorithm for the combined systems and the adaptive approaches to the JSCC problem. Index Terms—Joint source and channel coding, punctured ring TCQ, combined punctured ring TCQ/CPM, punctured ring convolutional coded CPM, soft decoding.

I. INTRODUCTION

DIGITAL mobile radio communications are becoming more and more popular in our daily life. Such communications usually require powerful coding and modulation schemes to preserve the limited bandwidth and power resources. Due to complexity and delay constraints of modern communication systems, design principles based on Shannon's source and channel separation theorem are being reconsidered. Joint source-channel coding and decoding, as an alternative for reliable communications over noisy channels, is beginning to draw more and more attention.

For data transmission over nonlinear and/or fading channels, such as satellite and mobile radio channels, the digital modulation class with constant envelope is generally considered as the most suitable choice, because it minimizes the distortion due to non-linear amplification in the high power amplifiers. Phase Shift Keying (PSK) schemes [1] have constant envelope but discontinuous phase transitions between adjacent channel symbols. Continuous Phase Modulation (CPM) [2] schemes have not only constant envelope, but also continuous phase change between adjacent channel symbols. Thus, compared with the PSK schemes, CPM schemes have less side-lobe power in their spectra.

Due to their excellent bandwidth efficiency, some nonconstant envelope schemes, e.g., Q²PSK and SQAM [3], can also be applicable to satellite communications. In this case, however, the Traveling Wave Tube (TWT) type amplifier, which is currently used in communication satellite transponders and exhibits non-linear characteristics in both amplitude and phase of the signal, must operate in the linear region [3]. This increases the system complexity and causes a significant loss of transmitter power.

In [4], we developed a trellis source encoding scheme based on non-binary trellis encoders, particularly on ring convolutional encoders [5]. A Joint Source-Channel Coding (JSCC) scheme using combined Ring Trellis Coded Quantization (RTCQ) with Ring Convolutional Coded Continuous Phase Modulation (RCCCPM) was also investigated. Both the Rayleigh fading channel and the Additive White Gaussian Noise (AWGN) channel were considered. Similar to the use of the same trellis encoder for both Trellis Coded Quantization (TCQ) and Trellis Coded Modulation (TCM) in combined TCQ/TCM JSCC schemes [6], RTCQ and RCCCPM of the investigated systems employ the same trellis encoder. Therefore, decoding can be based on the trellis of RCCCPM. In [7], [8], it is shown that, compared to trellis coded CPM systems using binary codes of the same complexity, RCCCPM systems yield a larger minimum normalized squared Euclidean distance [2]. Therefore, the combined RTCQ/RCCCPM JSCC scheme yields performance superior to that of the scheme using binary convolutional coded CPM [9].

In this paper, we propose a novel TCQ scheme based on punctured ring convolutional codes. For convenience, we call it Punctured Ring TCQ (PRTCQ). Depending on the puncturing matrix, the source encoding rate in bits per source sample can be fractional. Furthermore, we propose a JSCC scheme using combined PRTCQ with Punctured Ring Convolutional Coded CPM (PRCCCPM).

Paper approved by F. Alajaji, the Editor for Source and Source/Channel Coding of the IEEE Communications Society. Manuscript received March 18, 2006; revised January 3, 2007 and August 22, 2007. Z. Lin was with the Department of Computer Science and Engineering, Chalmers University of Technology, SE 412 96, Gothenburg, Sweden, and is now with the School of Electrical and Information Engineering, University of Sydney, Sydney, NSW 2006, Australia (e-mail: [email protected]). T. Aulin is with the Department of Computer Science and Engineering, Chalmers University of Technology, SE 412 96, Gothenburg, Sweden (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060138.


LIN and AULIN: JOINT SOURCE AND CHANNEL CODING USING PUNCTURED RING CONVOLUTIONAL CODED CPM

We begin with the introduction of a new trellis quantization scheme using punctured ring convolutional codes. Then, an adaptive JSCC scheme for combined PRTCQ/PRCCCPM is proposed. Furthermore, an iterative approach to serially concatenated PRTCQ/CPM is developed. Finally, we give some simulation results.

II. TCQ USING PUNCTURED CONVOLUTIONAL CODES OVER RINGS

The concept of convolutional codes over rings was first introduced by Massey and Mittelholzer in [5]. They show that, when combined with $M$-ary phase modulation and with the same number of states, convolutional codes over the ring of integers modulo $M$ have at least as large a free Euclidean distance [1] as the best binary convolutional codes over the Galois field GF(2). Motivated by the observation that there is a similarity between coded modulation and coded quantization, we proposed in [4] a RTCQ scheme based on ring convolutional codes.

Source encoding using RTCQ can be described as follows. Let $Z_M$ and $S$ be the ring of integers modulo $M$ and the state space of the trellis encoder, respectively. Given a discrete-time, continuous-amplitude input source sequence $\{x_k : k = 1, \cdots, L_J\}$, the $Z_M$ trellis encoded quantizer searches its trellis diagram to find a reproduction sequence $\{\hat{x}_k\}$ having the minimum squared error distortion between the input source sequence and the reproduced sequence. Let $D_j(s)$ be the minimum accumulative distortion¹ for state $s$ at discrete time $j$, $s \in S$. Then the distortion $D_{j+1}(s)$ can be computed recursively by

$$D_{j+1}(s) = \min_{s' \in S}\left[D_j(s') + \left(x_{j+1} - \Gamma(s', s)\right)^2\right]. \tag{1}$$

Here $\Gamma(s', s)$ is the reproduction level associated with the valid transition² between state $s'$ at discrete time $j$ and its next state $s$ at time $j + 1$, $\Gamma(s', s) \in C$, where $C$ is an expanded set of reproduction levels. Equation (1) can be solved for all $s \in S$ and all $j \in \{0, 1, \ldots, L_J - 1\}$. Given an input source sequence $\{x_k\}$, the quantizer keeps track of the distortion of the survivor path through its trellis defined by the $Z_M$ ring convolutional encoder. Assume the initial state is $s_0 \in S$ and $s_L$ is the final survivor state, which has the minimum accumulative distortion

$$D_{\min}(s_L) = \min_{\forall s \in S} D_{L_J}(s). \tag{2}$$
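The recursion (1), the termination rule (2) and the subsequent walk back along the survivor path can be sketched as follows. This is only an illustration under stated assumptions: the two-state trellis and its four reproduction levels are a hypothetical toy example, not the ring convolutional trellis of the paper, and the function name `trellis_quantize` is made up here.

```python
import math

def trellis_quantize(samples, levels, states, start_state):
    """Minimum squared-error path search following (1) and (2).

    levels maps a valid transition (s_prev, s_next) to its reproduction
    level Gamma(s_prev, s_next); pairs that are absent are invalid.
    """
    dist = {s: (0.0 if s == start_state else math.inf) for s in states}
    history = []                                    # back-pointers per step
    for x in samples:
        new_dist = {s: math.inf for s in states}
        back = {}
        for (s_prev, s_next), gamma in levels.items():
            cand = dist[s_prev] + (x - gamma) ** 2  # term inside the min of (1)
            if cand < new_dist[s_next]:
                new_dist[s_next] = cand
                back[s_next] = (s_prev, gamma)
        history.append(back)
        dist = new_dist
    state = min(dist, key=dist.get)                 # final survivor state, (2)
    d_min = dist[state]
    path = []
    for back in reversed(history):                  # walk back to the start
        state, gamma = back[state]
        path.append(gamma)
    path.reverse()
    return d_min, path

# Hypothetical toy trellis: two states, one level per transition.
toy_levels = {(0, 0): -1.5, (0, 1): 0.5, (1, 0): -0.5, (1, 1): 1.5}
d_min, path = trellis_quantize([0.4, 1.6, -0.4, -1.4], toy_levels, [0, 1], 0)
print(d_min, path)
```

Storing one back-pointer per state and time step is what allows the reproduction sequence to be recovered once the final survivor state is known.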

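By tracking back from the final state $s_L$ to the initial state $s_0$, a survivor path with the minimum squared error distortion can be found.

One advantage of using a ring convolutional encoder over a binary encoder is that a larger codebook can be obtained. For example, for a rate $R/(R+1)$ encoder, the size of the codebook for TCQ [10] is $2^{R+1}$, whereas for a $Z_M$ RTCQ, it is $M^{R+1}$. Another advantage is that the ring encoder can be combined with a Continuous Phase Encoder (CPE)³ to form a RCCCPM [7], [8], which can increase the minimum Normalized Squared Euclidean Distance (NSED) in comparison with binary convolutional coded CPM [12].

¹ Accumulative distortion is the sum of the distortions of all source samples up to the $j$th sample.
² Here a “valid transition” is such that the state transition between $s'$ and $s$ exists.
³ According to [11], CPM can be described as a concatenation of a CPE and a Memoryless Modulator (MM). In fact, this was also pointed out in [2].

Based on RTCQ, we now propose a trellis based source encoding scheme using a Punctured Ring Convolutional Code (PRCC). A high rate PRCC can be obtained by puncturing a parent $1/n$ ring convolutional code. The operation of puncturing some coded symbols is implemented by using an $(n \times p)$ puncturing matrix $P_{mat}$, where $p$ is the puncturing period. Let $t$ be the total number of transmitted symbols during a puncturing period $p$; the coding rate of a PRCC is then $r = p/t$. By combining the trellis for a PRCC and a quantizer, PRTCQ is obtained. For a PRTCQ scheme with an underlying PRCC obtained by puncturing a parent $1/n$ ring convolutional code over $Z_M$, the source encoding rate $R$ is $n \cdot \log_2 M \cdot p/t$ bits/sample, and the size of the codebook is $2^{R + \log_2 M}$.

Consider an independent, identically distributed (i.i.d.), real-valued, discrete-time, continuous-amplitude, stationary random process $\{x_k : k = 1, 2, \cdots\}$ with zero mean and variance $\sigma_x^2$. The punctured ring convolutional encoded quantizer maps each source sample into one of $N = 2^{R + \log_2 M}$ reproduction levels, drawn from a finite reproduction alphabet (codebook) $C = \{c_1, c_2, \cdots, c_N\}$. It searches all possible paths through the source code trellis to determine the sequence of reproduction levels that minimizes the total average distortion over the entire sequence of source samples. Using the idea of the encoder-decoder pair description for Vector Quantization (VQ) [13], PRTCQ can be described as a pair of a PRTCQ encoder and a decoder.

Assume that the trellis of a PRTCQ is defined by a PRCC obtained by puncturing a rate $R/(R+1)$ parent $Z_M$ convolutional code. Let $L_s$ be the length of the encoded source sequence. The encoder maps each source symbol into a sequence of $M$-ary codewords, each of length $R$, in which $(R - 1)$ symbols are used to specify the subset, and the remaining 1 symbol to specify the quantization levels in the subset. The decoder inputs the sequence from the encoder and outputs a sequence of $M$-ary codewords, each of length $R + 1$. The total length⁴ of the output sequence of the PRTCQ decoder is $(R + 1) \cdot L_s$. The $i$th codeword is $u_i = (u_i^1, u_i^2, \cdots, u_i^{R+1})$, $u_i^j \in \{0, \cdots, M-1\}$, $i \in \{1, \cdots, N\}$ and $j \in \{1, \cdots, R+1\}$. Here $u_i$ is referred to as the index vector of the reproduction level $c_i$, drawn from a finite reproduction alphabet (codebook) $C$.

⁴ Note that the total length of the output sequence of a PRTCQ decoder is not related to the rate of the underlying PRCC, but to the rate of the parent code.

A. Branch Metric

For a RTCQ [4] scheme based on a trellis which is defined by a rate 1/2 ring convolutional code, at time $k$, the branch metric used in the Viterbi algorithm [1] is the squared error $d^2_{z_k^1 z_k^0} = (x_k - c_{z_k^1 z_k^0})^2$ between a source sample $x_k$ and the quantization level $c_{z_k^1 z_k^0}$ corresponding to the symbols $(z_k^1, z_k^0)$ at the output of the RTCQ encoder.

For a PRTCQ scheme with an underlying PRCC obtained by puncturing a rate 1/2 parent convolutional code, since one source sample may be used by two trellis sections [14], the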
The encoder maps each source symbol into a sequence of M -ary codewords, each of length R, in which (R − 1) symbols are used to specify the subset, and the remaining 1 symbol to specify the quantization levels in the subset. The decoder inputs the sequence from the encoder and outputs a sequence of M -ary codewords, each of length R + 1. The total length4 of the output sequence of the PRTCQ decoder is (R + 1) · Ls . The ith codeword is ui =(u1i , u2i , · · · , uR+1 ), i uji ∈ {0, · · · , M −1}, i ∈ {1, · · · , N } and j ∈ {1, · · · , R+1}. Here ui is referred to as the index vector of the reproduction level ci , drawn from a finite reproduction alphabet (codebook) C. A. Branch Metric For a RTCQ [4] scheme based on a trellis which is defined by a rate 1/2 ring convolutional code, at time k, the branch metric used in the Viterbi algorithm [1] is the squared error d2z1 z0 = (xk − czk1 zk0 )2 between a source sample xk and the k k quantization level czk1 zk0 corresponding to the symbols (zk1 , zk0 ) as the output of the RTCQ encoder. For a PRTCQ scheme with an underlying PRCC obtained by puncturing a rate 1/2 parent convolutional code, since one source sample may be used by two trellis sections [14], the 4 Note that the total length of the output sequence of a PRTCQ decoder is not related to the rate of the underlying PRCC, but related to the rate of the parent code.

branch metric needs to be separated into two parts, one part for each trellis section, like the one used in [15], [16] for a binary trellis encoder. In this work, the metric used by the PRTCQ source encoder for trellis sections without puncturing (i.e., both of the output symbols $(z_k^1, z_k^0)$ are kept) is given by

$$d_k^{2,AP} = d^2_{z_k^1 \cdot} + d^2_{\cdot z_k^0} \tag{3}$$

where $d^2_{z_k^1 \cdot} = \min(d^2_{z_k^1 0}, d^2_{z_k^1 1}, d^2_{z_k^1 2}, d^2_{z_k^1 3})$ and $d^2_{\cdot z_k^0} = \min(d^2_{0 z_k^0}, d^2_{1 z_k^0}, d^2_{2 z_k^0}, d^2_{3 z_k^0})$. For trellis sections with puncturing, say at time $k$, if only the output symbol $z_k^0$ is punctured, then the decoder will use $d^2_{z_k^1 \cdot}$ as the metric increment for Viterbi decoding; otherwise, $d^2_{\cdot z_k^0}$ will be used.

Fig. 1. System model for a combined PRTCQ/PRCCCPM system over the channel.
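A minimal sketch of the metric (3) and of the puncturing rule described above is given below for a quaternary (M = 4) rate 1/2 parent code. The function name `section_metric`, the flags `keep_z1` and `keep_z0`, and the uniform 16-level table with its natural index mapping are all assumptions made for illustration; the mapping between quantization levels and output symbols actually used in the paper is not reproduced here.

```python
def section_metric(x, levels, z1, z0, keep_z1=True, keep_z0=True):
    """Metric increment for one trellis section of a rate-1/2 parent code.

    levels[a][b] is the quantization level indexed by the output symbol
    pair (a, b); keep_z1 / keep_z0 say which symbols survive puncturing.
    """
    m = len(levels)
    d_z1 = min((x - levels[z1][b]) ** 2 for b in range(m))   # d^2_{z1 .}
    d_z0 = min((x - levels[a][z0]) ** 2 for a in range(m))   # d^2_{. z0}
    if keep_z1 and keep_z0:
        return d_z1 + d_z0          # approximate metric (3), no puncturing
    return d_z1 if keep_z1 else d_z0

# Hypothetical table: 16 uniform levels, level index 4*a + b.
levels = [[-3.75 + 0.5 * (4 * a + b) for b in range(4)] for a in range(4)]

x = 0.3
print(section_metric(x, levels, z1=2, z0=1))                  # both kept
print(section_metric(x, levels, z1=2, z0=1, keep_z0=False))   # z0 punctured
print(section_metric(x, levels, z1=2, z0=1, keep_z1=False))   # z1 punctured
```

Each marginal term only needs the levels that share one of the two output symbols, which is why a punctured section can still contribute a metric increment from the single symbol that remains.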

Note that the incremental metric $d_k^{2,AP}$ is an approximation to the branch metric $d^2_{z_k^1 z_k^0}$. For a binary trellis encoder, it was shown in [15] that it provides a good approximation to the optimal branch metric in the sense of Maximum Likelihood Sequence Detection (MLSD). The observation also holds for a non-binary trellis encoder if there is an appropriate mapping (which will be explained in Section V) between the quantization levels and the output symbols of a PRTCQ encoder.

III. ADAPTIVE JSCC OF COMBINED PRTCQ/PRCCCPM

We now investigate a JSCC scheme using punctured trellis coded CPM over the ring of integers modulo $M$. The investigated CPM schemes are $M$-ary with modulation index $h = 1/M$. Both the punctured ring convolutional encoder and the CPE have the same algebraic structure. A block diagram of the system is shown in Fig. 1.

The source encoder, PRTCQ, maps (using the Viterbi algorithm) each source sample into one of $N = 2^{R + \log_2 M}$ quantization levels. Assume that at discrete time $k$, a source sample $x_k$ is quantized into a quantization level $c_i \in C$ using a trellis coded quantizer based on a punctured ring convolutional code. Denote by $B_k$ the output symbol of the PRTCQ encoder and by $Y_k = (Y_k^1, \cdots, Y_k^n)$ the output vector of the PRTCQ decoder at time $k$, $Y_k^j \in \{0, \cdots, M-1\}$ for $j \in \{1, \cdots, n\}$. Obviously, $Y_k = u_i$. Note that although $B_k$ and $Y_k$ correspond to one source sample $x_k$, they may extend over more than one trellis section. The sequence $\{Y_k\}$ is then mapped to an input sequence $\{\mu_k\}$ of the CPE. The generated waveform from the CPM modulator is transmitted over the channel. The channel can be the AWGN channel or the frequency flat slow Rayleigh fading channel [17].

In the proposed systems, the PRTCQ decoder also acts as the encoder for RCCCPM. The overall channel encoder of the combined PRTCQ/PRCCCPM systems is the combined PRTCQ decoder and the CPE. The CPE takes one output symbol from the PRTCQ decoder as an input and generates one vector, which is used by the memoryless modulator of CPM to generate one channel transmission waveform. Therefore, the overall channel coding rate is the same as the encoding rate of the underlying punctured ring convolutional encoder of the PRTCQ.

There is a clear relationship between the channel coding rate and the source encoding rate due to the use of the same punctured ring convolutional encoder for both PRTCQ and PRCCCPM. That is, the source encoding rate in bits per source sample is $n \cdot \log_2 M$ times the channel coding rate. Thus, the transmission rate of the overall system is $n$ transmitted channel symbols per source sample. The tradeoff between the source coding rate and the channel coding rate as described in [18] is also reflected in this kind of system. The higher the puncture rate, the higher the channel coding rate and the more bits per source sample. The highest puncture rate is limited to 1; therefore, the source encoding rate is no more than $n \cdot \log_2 M$ bits/sample. Since the PRTCQ is based on a trellis which is defined by a punctured ring convolutional encoder, decoding of the combined system can be based on the trellis for trellis coded CPM over rings.

A. Soft Decoding of Combined PRTCQ/PRCCCPM

We now study a symbol-by-symbol a posteriori probability (APP), or BCJR, decoding algorithm [19] for PRCCCPM over the ring of integers modulo M. It differs from the systems in the literature so far, e.g., [7], [8]. For PRCCCPM, the decoding is more complicated than for RCCCPM without puncturing, because the trellis structure varies between trellis sections. For example, consider a combined punctured ring convolutional encoder over Z4 with Pmat = [110; 101] and a quaternary 1REC (footnote 5) CPM with modulation index h = 1/4. If at time k there is no puncturing (which corresponds to the first column of the puncturing matrix Pmat), the decoder employing the APP algorithm determines the transition branch metric from the two received symbols or waveforms corresponding to the two non-punctured output symbols of the ring convolutional encoder. At time k + 1, the decoder determines the metric from the received symbol corresponding to the non-punctured upper output symbol of the ring encoder. Since the state of the CPE depends on the branch output symbols, the state transitions differ between trellis sections.

Footnote 5: LREC means that the frequency pulse of the CPM scheme is rectangular with a pulse length of L symbol durations; see [2] for details.

It is possible to fix the trellis structure for decoding. The decoder can wait until receiving two symbols to decode even for trellis sections with puncturing, that is, the decoding is based on two trellis sections. The problem, however, is that the concatenated symbols of the two trellis sections are not the output from the ring convolutional encoder with one input symbol. They correspond to two consecutive input symbols of the ring convolutional encoder. It is then hard to determine the state of the PRCCCPM. In this work, the decoding is based on a varying trellis structure.

The state $\sigma_j$ of the trellis at discrete time j is defined as $(\sigma_j^{cc}, \sigma_j^{cpm})$, where $\sigma_j^{cc}$ and $\sigma_j^{cpm}$ denote the state of the PRCC and the state of the CPE at discrete time j, respectively. For a PRCCCPM system with a punctured ring convolutional encoder over the ring of integers modulo M having m memory elements, and an M-ary CPM scheme with a rational and irreducible modulation index h = K/P, the total number of states is $M^m \cdot P \cdot M^{(L-1)}$. The state transition $\sigma_j \to \sigma_{j+1}$ is determined by the input of the punctured ring convolutional encoder. Associated with this transition is also the input symbol $\mu \in \{0, \cdots, M-1\}$ of the CPE and the mean vector, which is obtained by letting the transmitted CPM waveform pass through a bank of complex filters matched to the transmitted signals.

The APP decoding algorithm developed for RCCCPM systems in [4] can be used for PRCCCPM systems with some modification. The APP algorithm [19] computes the APP $Pr(U_k = \mu | r_1^\ell)$ of an input symbol μ of the CPE at symbol interval k conditioned on a sufficient statistic $r_1^\ell = (r_1, \cdots, r_\ell)$ based on channel observations r(t, μ), where $\ell$ is the length of the input data sequence to the CPE. The advantage of considering joint source and channel coding is that the source statistics obtained by source encoding can be used directly for combined source and channel decoding. The source statistics are obtained by computing the probabilities of the symbols at the input to the punctured ring convolutional encoder. Now, we present a soft decoding algorithm for the combined PRTCQ and CPM systems. This algorithm is based on the APP decoding of PRCCCPM.
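In outline, the soft decoder developed in this subsection weights the centroid of every PRTCQ quantization level by the APP of its index vector and sums the results, which is the estimate derived as (9). A minimal Python sketch of that final step is given here, assuming the per-symbol APPs have already been produced by an APP decoder. The function name `soft_estimate`, the toy codebook and the APP values are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def soft_estimate(symbol_apps, centroids):
    """APP-weighted centroid estimate in the spirit of (9).

    symbol_apps[j][m] : Pr(U_k^j = m | r) for position j (n positions, M values)
    centroids[u]      : E[x | u] for index vector u, a tuple of n symbols
    """
    estimate = 0.0
    for u, centroid in centroids.items():
        weight = 1.0
        for j, symbol in enumerate(u):
            weight *= symbol_apps[j][symbol]  # product over the n positions
        estimate += weight * centroid
    return estimate


# Hypothetical toy setup: M = 4, n = 2, so N = 16 index vectors.
M, n = 4, 2
index_vectors = list(product(range(M), repeat=n))
# Uniformly spaced centroids (an assumption, not a trained codebook).
toy_centroids = {u: -1.5 + 0.2 * i for i, u in enumerate(index_vectors)}

uniform_apps = [[1.0 / M] * M for _ in range(n)]
# A decoder that is certain the index vector is (2, 1).
certain_apps = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

x_uniform = soft_estimate(uniform_apps, toy_centroids)
x_certain = soft_estimate(certain_apps, toy_centroids)
```

With uniform APPs the estimate collapses to the mean of the codebook, and with one-hot APPs it returns the single corresponding centroid, which is what a hard-decision decoder would output.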
Under the minimum Mean Square Error (MSE) distortion criterion, given the observable $r_1^\ell = \{r_1, \cdots, r_\ell\}$, the optimal estimate $\hat{x}_{opt}$ of the source sample x can be obtained by setting $\partial D/\partial \hat{x} = 0$, where D is the distortion of the PRTCQ/PRCCCPM system

$$D = E[(x - \hat{x})^2 | r_1^\ell] = \sum_{i=1}^{N} \int (x - c_i)^2 \cdot p(x, u_i | r_1^\ell)\,dx. \tag{4}$$

Here $u_i$ is the index vector for the quantization level $c_i$, $c_i \in C$, $\forall i \in \{1, 2, \cdots, N\}$. From estimation theory [20] (let $\partial D/\partial c_i = 0$), we have

$$\hat{x}_{opt} = \sum_{i=1}^{N} \int x \cdot p(x, u_i | r_1^\ell)\,dx = \int x \cdot p(x | r_1^\ell)\,dx = E[x | r_1^\ell]. \tag{5}$$

Consequently, we have

$$\hat{x}_{opt} = \sum_{i=1}^{N} \int x \cdot p(x, u_i | r_1^\ell)\,dx = \sum_{i=1}^{N} \int x \cdot \frac{p(x, u_i, r_1^\ell)}{p(r_1^\ell)}\,dx. \tag{6}$$

Factorize the term

$$p(x, u_i, r_1^\ell) = p(r_1^\ell | x, u_i) \cdot p(x | u_i) \cdot Pr(u_i), \tag{7}$$

and note that $p(r_1^\ell | x, u_i) = p(r_1^\ell | u_i)$ since $r_1^\ell$ is obtained when the index sequence u is transmitted. Then we have

$$\hat{x}_{opt} = \sum_{i=1}^{N} \int x \cdot \frac{p(r_1^\ell | u_i) \cdot p(x | u_i) \cdot Pr(u_i)}{p(r_1^\ell)}\,dx = \sum_{i=1}^{N} \frac{p(r_1^\ell | u_i) \cdot Pr(u_i)}{p(r_1^\ell)} \int x \cdot p(x | u_i)\,dx = \sum_{i=1}^{N} p(u_i | r_1^\ell) \cdot E[x | u_i], \tag{8}$$

where $E[x | u_i]$ is the centroid of the ith encoder region, i.e., the ith codeword (quantization level) $c_i$ of the source encoder PRTCQ. (8) can be further written as

$$\hat{x}^k_{opt} = E[x | r_1^\ell] = \sum_{i=0}^{N-1} Pr(U_k = u_i | r_1^\ell) \cdot E[x | u_i] = \sum_{i=0}^{N-1} \Big\{ \prod_{j=1}^{n} Pr(U_k^j = u_i^j | r_1^\ell) \Big\} \cdot E[x | u_i]. \tag{9}$$

The APP $Pr(U_k^j = u_i^j | r_1^\ell)$ can be obtained using the APP decoding algorithm for PRCCCPM systems.

B. Asymptotical Performance of Combined PRTCQ/PRCCCPM

In [21], we developed an upper bound on the channel distortion (footnote 6) for a combined TCQ with binary convolutional coded CPM system under MLSD. The bound is based on the transfer function technique [22], which is modified and generalized to include analog signals in discrete time. This, in turn, is based upon the union bound. It is shown in [21] that the developed analytical bounds are consistent with the simulation results. This method can also be applied to the JSCC system using PRCCCPM. Let $E_s$ be the transmitted symbol energy and $N_0/2$ be the double sided power spectral density of the additive white Gaussian noise. Let m be the number of memory elements of the underlying punctured ring convolutional encoder of PRTCQ and ν the transmission rate in bits per channel symbol. The channel distortion for a memoryless non-uniform source of a combined PRTCQ/PRCCCPM system will follow the following statement.

Footnote 6: The channel distortion is the signal distortion caused by the noisy channel, further explained in [21].

Theorem 1: [Upper bound on the channel distortion for a memoryless non-uniform source] For a discrete memoryless non-uniform source, under the assumption that the channel distortion per single letter for a combined PRTCQ/PRCCCPM systems is upper bounded by

$$D_c < Q\left(\sqrt{d^2_{min} \frac{E_s}{\nu N_0}}\right) \exp\left(d^2_{min} \cdot \frac{E_s}{2 \nu N_0}\right) \frac{\partial T(\Gamma, I, W)}{\partial I}\bigg|_{\Gamma = P_{max},\, I = 1,\, W = e^{-E_s / 2 \nu N_0}}, \tag{10}$$

where $d^2_{min}$ is the minimum NSED and Γ, I, W are dummy variables [22]. The average transfer function is given in (11).
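The form of (10) reflects the usual step in transfer function bounding: the Gaussian tail is split with the inequality $Q(\sqrt{a+b}) \le Q(\sqrt{a})\,e^{-b/2}$ for $a, b \ge 0$, so that the sum over error events becomes a power series in $W = e^{-E_s/2\nu N_0}$. This inequality is a standard property of the Q function rather than something stated in the paper. The short Python check below, with hypothetical helper names, evaluates both sides on a grid of illustrative values.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_split_holds(a, b):
    """Check Q(sqrt(a + b)) <= Q(sqrt(a)) * exp(-b / 2) for a, b >= 0."""
    lhs = q_function(math.sqrt(a + b))
    rhs = q_function(math.sqrt(a)) * math.exp(-b / 2.0)
    return lhs <= rhs * (1.0 + 1e-12)  # small slack for rounding at b = 0


# Grid of non-negative arguments: 0, 0.25, ..., 10 (illustrative values only).
grid = [0.25 * i for i in range(0, 41)]
all_hold = all(tail_split_holds(a, b) for a in grid for b in grid)
```

Taking $a = d^2_{min} E_s/(\nu N_0)$ and $b = (d^2 - d^2_{min}) E_s/(\nu N_0)$ in this inequality produces the factor $Q(\cdot)\exp(\cdot)$ in front of the derivative in (10).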

$$T(\Gamma, I, W) = M^{-m} \sum_{\kappa=1}^{M^m} T(\kappa, \Gamma, I, W) = M^{-m} \sum_{\kappa=1}^{M^m} \sum_{l} \sum_{z} \sum_{d} B_{s_\kappa, l, z, d}\, \Gamma^l I^z W^{d^2} \tag{11}$$

where $B_{s_\kappa, l, z, d}$ is the number of error events that start from state $s_\kappa$, and have NSED $d^2$, length l and total channel distortion caused by the error event given by z. $P_{max}$ is a constant value, $0 < P_{max} < 1$, and the Q function is defined as $Q(x) = (\sqrt{2\pi})^{-1} \int_x^\infty e^{-z^2/2}\,dz$.

The proof of Theorem 1 is similar to the one for the combined TCQ with binary convolutional coded CPM system [21]. Equation (10) can be further expressed as

$$D_c \le \sum_{d^2} B_d \cdot Q\left(\sqrt{d^2 \frac{E_s}{\nu N_0}}\right), \tag{12}$$

where

$$B_d = M^{-m} \sum_{\kappa=1}^{M^m} \sum_{l} \sum_{z} B_{s_\kappa, l, z, d} \cdot z \cdot M^{-\nu \cdot l}. \tag{13}$$

Asymptotically (large SNR values), the channel distortion of the system will be dominated by the minimum NSED $d^2_{min}$ and $B_{d_{min}}$ (the number of error events with $d^2_{min}$). For the JSCC systems using PRCCCPM, the NSED $d^2$ associated with an error event $\Upsilon = \mu - \hat{\mu}$ can be calculated as

$$d^2 = r \cdot \log_2 M \cdot \left( l - \frac{1}{T} \sum_{i=0}^{l-1} \int_{iT}^{(i+1)T} \cos \phi(t, \gamma)\,dt \right). \tag{14}$$

Here T is the symbol interval duration and $\phi(t, \gamma) = 2\pi h \omega_i + 4\pi h \sum_{j=i-L+1}^{i} \gamma_j q(t - jT)$, where $\omega_j = R_P(\sigma_j^{cpm} - \hat{\sigma}_j^{cpm}) = R_P\big(\sum_{n=0}^{j-L} \gamma_n\big)$ is the difference phase state and r is the code rate of the trellis encoder of the PRTCQ, r = p/s. $R_x\{\cdot\}$ is the modulo x operator and q(·) is the phase response [2]. Larger $d^2_{min}$ is used as the design criterion for the selection of the best puncture matrices of the investigated PRTCQ/PRCCCPM systems.

IV. AN ITERATIVE APPROACH TO SERIALLY CONCATENATED PRTCQ/CPM SYSTEMS

While the APP based soft decoding algorithm is the optimal solution for the PRCCCPM systems described in Section III in the sense of minimum MSE, the performance can be significantly improved by introducing a pseudo-random interleaver between the PRTCQ encoder and the CPE and using iterative decoding [23]. With a large and random interleaver and with iterative decoding, the bit a priori probabilities input to the APP algorithm of each component encoder can be almost statistically independent. Under this independence assumption, iterative decoding can be shown to be an optimal algorithm for iteratively minimizing cross entropy, a decoding strategy which is lossless and prohibitively complex [24]. It is shown in [23] that the iterative decoder can give performance close to MLSD of the concatenated and interleaved codes, a procedure which is infeasible for large interleavers due to the prohibitive complexity.

In this section, we will develop an iterative decoding algorithm for serially concatenated PRTCQ/CPM systems. The block diagram is shown in Fig. 2. The subscript k denotes discrete time. π denotes a random interleaver and $\pi^{-1}$ denotes the deinterleaver. The source sequence $\{x_k : k = 1, 2, \cdots, L_s\}$ is first quantized by a PRTCQ quantizer. Suppose the trellis of the PRTCQ is defined by a rate 1/n convolutional encoder over the ring of integers modulo M. The generated sequence $\{Y_k\}$ with length $nL_s$ is then fed into a random interleaver π and becomes the input sequence to the CPE.

Fig. 2. Iterative decoding for serially concatenated PRTCQ/CPM systems.

The generated output waveform s(t, μ) from the CPM modulator is transmitted over the channel, which can be an AWGN channel or a Rayleigh fading channel. The receiver consists of a PRTCQ Soft Input Soft Output (SISO) [23] module and a CPE SISO module. The CPE SISO module is described in [14]. Here we focus on the description of the PRTCQ SISO module.

Again, we use the varying structured trellis of the PRTCQ decoder. Let $\tilde{Y}_k = (\tilde{Y}_k^1, \tilde{Y}_k^2, \cdots, \tilde{Y}_k^{n_0})$ be the output symbols of the punctured ring convolutional encoder at time k, where $n_0 \le n$. Denote by $\tilde{B}_k$ the corresponding input symbol. Let $\tilde{b}_k$ and $\tilde{y}_k^j$ be a realization of $\tilde{B}_k$ and $\tilde{Y}_k^j$, respectively, with $\tilde{b}_k, \tilde{y}_k^j \in \{0, \cdots, M-1\}$, $j \in \{1, \cdots, n_0\}$. Note that $\tilde{Y}_k$ and $\tilde{B}_k$ are different from $Y_k$ and $B_k$ in that they correspond to a trellis section k rather than a source sample $x_k$.

For a trellis section at discrete time k, we define the log ratio of the APP values for the jth symbol of $\tilde{Y}_k$ as

$$\Lambda_{k,i}^{j,O} = \log \frac{\sum_{e: \tilde{Y}_k^j(e) = i} A_{k-1}(s') \cdot \Gamma_k(s', s) \cdot Z_k(s)}{\sum_{e: \tilde{Y}_k^j(e) = 0} A_{k-1}(s') \cdot \Gamma_k(s', s) \cdot Z_k(s)}, \quad i \in \{1, \cdots, M-1\}. \tag{15}$$

Here, $\tilde{Y}_k^j(e)$ is the jth symbol of $\tilde{Y}_k$ of edge e, which is associated with the state transition from s' to s at time k, and the letter O denotes the outer decoder. $A_k(s)$ and $Z_k(s)$ can be recursively computed from the branch metric $\Gamma_k(s', s)$ as in [23], $A_k(s) = \sum_{s'} A_{k-1}(s') \Gamma_k(s', s)$ and $Z_k(s') = \sum_{s} Z_{k+1}(s) \Gamma_{k+1}(s', s)$. The branch metric $\Gamma_k(s', s)$ for the trellis section at time k can be computed by $\Gamma_k(s', s) = Pr(\tilde{B}_k = \tilde{b}_k) \prod_{j=1}^{n_0} Pr(\tilde{Y}_k^j = \tilde{y}_k^j)$. For trellis sections without puncturing, $n_0 = n$. Let $\Pi_{k,i}^{j,O}$ be the log ratio of the a priori values for the jth symbol of $\tilde{Y}_k$, then

$$\Pi_{k,i}^{j,O} = \log \frac{Pr(\tilde{Y}_k^j = i)}{Pr(\tilde{Y}_k^j = 0)}, \quad i \in \{1, 2, \cdots, M-1\}. \tag{16}$$
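The forward and backward quantities $A_k(s)$ and $Z_k(s)$ in (15) follow the usual BCJR-style recursions. The pure-Python sketch below runs them on a generic trellis whose branch metrics are supplied as matrices. The names `forward_backward` and `gamma`, the initial vectors of both recursions and the toy two-state metrics are assumptions made for illustration. The backward recursion is written in the common convention where the state at time k is weighted by the branch metrics of section k + 1, and no normalisation is applied, so the sketch is only suitable for short blocks.

```python
def forward_backward(gamma, num_states):
    """Return unnormalised transition weights for every trellis section.

    gamma[t][sp][s] is the branch metric of section t for the transition
    sp -> s (0.0 where no edge exists). The forward recursion starts from a
    uniform vector and the backward recursion from all ones; both choices
    are assumptions of this sketch.
    """
    sections = len(gamma)
    alpha = [[1.0 / num_states] * num_states]
    for t in range(sections):
        alpha.append([
            sum(alpha[t][sp] * gamma[t][sp][s] for sp in range(num_states))
            for s in range(num_states)
        ])
    beta = [[1.0] * num_states for _ in range(sections + 1)]
    for t in range(sections - 1, -1, -1):
        beta[t] = [
            sum(gamma[t][sp][s] * beta[t + 1][s] for s in range(num_states))
            for sp in range(num_states)
        ]
    weights = []
    for t in range(sections):
        weights.append([
            [alpha[t][sp] * gamma[t][sp][s] * beta[t + 1][s]
             for s in range(num_states)]
            for sp in range(num_states)
        ])
    return weights


# Toy two-state trellis with three sections (illustrative numbers only).
toy_gamma = [
    [[0.6, 0.1], [0.2, 0.3]],
    [[0.5, 0.4], [0.1, 0.7]],
    [[0.3, 0.3], [0.8, 0.2]],
]
toy_weights = forward_backward(toy_gamma, num_states=2)
section_totals = [sum(sum(row) for row in w) for w in toy_weights]
```

Summing these weights over the edges whose jth output symbol equals i, and dividing by the corresponding sum for symbol 0, gives the argument of the logarithm in (15). The totals per section are equal, since each one is the likelihood of the whole observation.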


(15) can be further expressed as

$$\Lambda_{k,i}^{j,O} = \Lambda_{k,i}^{j,O,e} + \Pi_{k,i}^{j,O} \tag{17}$$

where $\Lambda_{k,i}^{j,O,e}$ is the Log-Likelihood Ratio (LLR) value of the extrinsic information for the jth symbol of $\tilde{Y}_k$. It is obtained from the other symbols rather than from symbol j itself, which is why it is called extrinsic information. With the iterative decoding algorithm, the vector of the log ratios of the extrinsic APP values $\Lambda_k^{j,O,e} = [\Lambda_{k,1}^{j,O,e}, \Lambda_{k,2}^{j,O,e}, \cdots, \Lambda_{k,M-1}^{j,O,e}]$ of the outer decoder is passed to the inner CPE SISO module as the a priori information $\Pi_k^{j,I} = [\Pi_{k,1}^{j,I}, \Pi_{k,2}^{j,I}, \cdots, \Pi_{k,M-1}^{j,I}]$ for the next iteration, where the letter I denotes the inner decoder. Similarly, the extrinsic APP values $\Lambda_k^{j,I,e}$ from the inner decoder become the a priori information $\Pi_k^{j,O}$ for the outer decoder in the next iteration. From a large number of simulations, we notice that the density of the extrinsic LLR Λ has a Gaussian-like distribution [25].

The design of the concatenated PRTCQ and PRCCCPM with interleaver can be based on two techniques. The first one is based on the union bound technique under MLSD of the combined system with interleaver [23]. In [26], the performance of serially concatenated TCQ/CPM systems under MLSD has been analyzed. This technique aims at minimizing upper bounds on the channel distortion and generally provides good solutions for medium to high channel SNR values. It can also be applied to the systems proposed in this paper. The second one is based on probability density evolution, or similar techniques [27], [28]. The design goal is to minimize the decoding threshold (footnote 7) of the system with iterative decoding by increasing the interleaver size. This analysis method is suited to low to medium channel SNR values. For JSCC systems, the error floor effect in the high SNR region is usually less significant, because the channel distortion caused by a few bit errors in a long source block is small. In this paper, we therefore focus on convergence analysis of the proposed systems.
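The symbol-level log ratios in (16) and the split (17) can be illustrated with a few lines of Python. The sketch assumes natural logarithms and strictly positive probabilities; the helper names and the example probability vectors are hypothetical.

```python
import math


def log_ratios(probabilities):
    """Map [Pr(0), ..., Pr(M-1)] to the M-1 log ratios log(Pr(i) / Pr(0))."""
    reference = probabilities[0]
    return [math.log(p / reference) for p in probabilities[1:]]


def probabilities_from_log_ratios(ratios):
    """Inverse map: prepend the reference ratio 0 and normalise."""
    weights = [1.0] + [math.exp(r) for r in ratios]
    total = sum(weights)
    return [w / total for w in weights]


def extrinsic(app_ratios, prior_ratios):
    """Rearranged (17): extrinsic = APP log ratio minus a priori log ratio."""
    return [a - p for a, p in zip(app_ratios, prior_ratios)]


# Illustrative quaternary example (M = 4); the numbers are made up.
app = [0.1, 0.6, 0.2, 0.1]      # APP of one code symbol
prior = [0.25, 0.25, 0.4, 0.1]  # a priori probabilities of the same symbol
lam = log_ratios(app)
pi = log_ratios(prior)
lam_e = extrinsic(lam, pi)
recovered = probabilities_from_log_ratios(lam)
```

Only the extrinsic vector `lam_e` would be handed to the other SISO module, so that a module is never fed back the a priori information it received.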
The convergence analysis technique developed in [14] for serially concatenated TCQ/CPM systems working at symbol level can be applied to the studied systems with some modifications. The modifications are made mainly to accommodate the varying trellis structure of the serially concatenated PRTCQ/CPM systems. Now for convenience, we suppose that the input of a SISO module at a discrete time is a random variable V. Let v be a realization of V, with $v \in \{v_1, v_2, \cdots, v_N\}$. The mutual information I(V, Λ) between a symbol V and the extrinsic LLR Λ is given by

$$I(V, \Lambda) = \sum_{j=1}^{N} \int_{-\infty}^{+\infty} p_\Lambda(\xi | v_j) Pr(v_j) \cdot \log_2 \frac{p_\Lambda(\xi | v_j)}{\sum_{i=1}^{N} p_\Lambda(\xi | v_i) Pr(v_i)}\,d\xi \tag{18}$$

where $p_\Lambda(\xi | v)$ is the conditional probability density function of a random vector Λ. For simplicity, we assume that all the symbols $v_i$ are transmitted with equal probability (footnote 8). Then (18) becomes (19), where n is the number of bits per symbol. Note that (19) is derived without using the uniform error property [30] of geometrically uniform codes [31]. It can also be applied to general codes, i.e., codes without any regularity [32]. For geometrically uniform codes, (19) simplifies as in [33] for a symbol level decoder by assuming that the all zero sequence is transmitted.

Footnote 7: Decoding threshold is the minimum channel SNR value above which the bit error rate approaches zero as the interleaver size goes to infinity [29].

Footnote 8: The assumption is invalid if the analog source is non-uniformly distributed. However, we can still use it as an approximation in that case.

Through independent simulations of the two SISOs, the input/output mutual information with parameter $E_s/N_0$ can be estimated [33]. Similar to [33], we estimate the mutual information for the inner and outer decoders separately. Due to the non-uniform error property of CPM, we cannot use the all zero sequence as the transmitted sequence. Instead, we generate an input data sequence with each symbol chosen independently from the alphabet $\{0, 1, \cdots, N-1\}$. For each symbol, a Gaussian random vector is generated as the LLR of the extrinsic information for that symbol [25]. The sequence of generated Gaussian random vectors is then fed into the SISO module, and the SISO module evaluates the mutual information at symbol level. The Mutual Information (MI) $I(\sigma^2_{\Lambda^I})$ for the inner CPE module can be estimated using

$$I(\sigma^2_{\Lambda^I}) = n - \frac{1}{N} \sum_{j=1}^{N} \left\{ \frac{1}{L_j} \sum_{l=1}^{L_j} \log_2 \left( 1 + \sum_{i=1, i \ne j}^{N} e^{\Lambda_i^l} \cdot e^{-\Lambda_j^l} \right) \right\}. \tag{20}$$

Here $L_j$ (j = 1, · · · , N) is the number of occurrences of the symbol $u_j$ in the index sequence of the source block and $\Lambda_i^l$ denotes the lth extrinsic LLR given that the symbol $u_i$ is transmitted, with $\Lambda_1^l = 0$. For the outer PRTCQ SISO decoder, due to the uniform error property, the all zero sequence can be used to evaluate the output MI

$$I(\sigma^2_{\Lambda^O}) = n - \frac{1}{L_1} \sum_{l=1}^{L_1} \log_2 \left( 1 + \sum_{i=2}^{N} e^{\Lambda_i^l} \right). \tag{21}$$

Here $L_1$ is the number of occurrences of the symbol $u_1$ in the index sequence of the source block. The extrinsic information transfer (EXIT) chart [29] is used for the design of the optimal puncture matrices used for the serially concatenated PRTCQ/CPM systems. The design criterion is to search for the puncture patterns with the lowest decoding threshold and the smaller area between the outer and inner decoder decoding trajectories for each rate. According to [34], the area indicates the SNR loss between the convergence decoding threshold and the Shannon limit.

V. SYSTEM PERFORMANCE OF PRTCQ FOR MEMORYLESS SOURCES

Computer simulations have been performed for encoding samples from the memoryless uniform, Gaussian and Laplacian sources using PRTCQ. The PRTCQ encoders are obtained by puncturing a rate 1/2 Z4 parent non-systematic convolutional code with the generator polynomial matrix $G(D) = [1 + D + D^2, 1 + D^2]$. The puncturing matrices $P^{so}_{mat}$ for these systems are listed in the second row of Table 1.
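To make the encoder structure concrete, the following Python sketch implements a rate 1/2 convolutional encoder over Z4 with $G(D) = [1 + D + D^2, 1 + D^2]$ and applies a puncturing matrix to its output. It is a minimal illustration that assumes a zero initial state, modulo-4 arithmetic on the tap sums and the first generator as the upper output. The function names are hypothetical, and the pattern [110; 101] is the example matrix from Section III-A, not an entry of Table 1.

```python
from fractions import Fraction

M = 4  # ring of integers modulo 4


def ring_conv_encode(inputs):
    """Rate 1/2 encoder over Z4 with G(D) = [1 + D + D^2, 1 + D^2]."""
    d1 = d2 = 0  # the two memory elements, assumed to start at zero
    outputs = []
    for u in inputs:
        upper = (u + d1 + d2) % M
        lower = (u + d2) % M
        outputs.append((upper, lower))
        d1, d2 = u, d1
    return outputs


def puncture(outputs, pattern):
    """Keep output j of section k when pattern[j][k mod p] is 1."""
    period = len(pattern[0])
    kept = []
    for k, section in enumerate(outputs):
        kept.append(tuple(sym for j, sym in enumerate(section)
                          if pattern[j][k % period] == 1))
    return kept


def punctured_rate(pattern):
    """Code rate p/s: puncturing period over transmitted symbols per period."""
    period = len(pattern[0])
    transmitted = sum(sum(row) for row in pattern)
    return Fraction(period, transmitted)


example_pattern = [[1, 1, 0], [1, 0, 1]]
example_input = [1, 3, 2, 0, 1, 2]
encoded = ring_conv_encode(example_input)
punctured = puncture(encoded, example_pattern)
rate = punctured_rate(example_pattern)
```

For this pattern the trellis sections carry 2, 1 and 1 transmitted symbols in turn, which is the varying trellis structure the decoder has to follow, and the code rate is p/s = 3/4.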


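$$\begin{aligned} I(V, \Lambda) &= \log_2 N - \frac{1}{N} \sum_{j=1}^{N} \int_{-\infty}^{+\infty} p_\Lambda(\xi | v_j) \cdot \log_2 \left( 1 + \frac{\sum_{i=1, i \ne j}^{N} p_\Lambda(\xi | v_i)}{p_\Lambda(\xi | v_j)} \right) d\xi \\ &= n - \frac{1}{N} \sum_{j=1}^{N} E\left[ \log_2 \left( 1 + \sum_{i=1, i \ne j}^{N} \frac{p(\Lambda | v_i)}{p(\Lambda | v_1)} \cdot \frac{p(\Lambda | v_1)}{p(\Lambda | v_j)} \right) \right] \end{aligned} \tag{19}$$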
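Replacing the expectation in (19) by a sample average over observed extrinsic LLR vectors gives Monte Carlo estimators of the form (20) and (21). The Python sketch below is one way to write that average. It assumes natural-log LLRs measured against the reference symbol, so the first entry of every LLR vector is 0, and at least one observation per symbol. The function name and the two sanity-check inputs are hypothetical.

```python
import math


def mutual_information_estimate(llr_samples):
    """Sample-average estimate of I(V; Lambda) in bits per symbol.

    llr_samples[j] holds the LLR vectors observed while symbol j was sent;
    each vector has N entries and entry 0 (the reference symbol) equals 0.
    """
    num_symbols = len(llr_samples)
    bits_per_symbol = math.log2(num_symbols)
    penalty = 0.0
    for j, vectors in enumerate(llr_samples):
        inner = 0.0
        for llr in vectors:
            ratio_sum = sum(math.exp(llr[i] - llr[j])
                            for i in range(num_symbols) if i != j)
            inner += math.log2(1.0 + ratio_sum)
        penalty += inner / len(vectors)
    return bits_per_symbol - penalty / num_symbols


N = 4
# Uninformative decoder: every LLR is zero, so the estimate should be 0 bits.
flat = [[[0.0] * N for _ in range(5)] for _ in range(N)]


def confident_vector(sent):
    """LLRs (relative to symbol 0) of a decoder that strongly favours `sent`."""
    if sent == 0:
        return [0.0] + [-40.0] * (N - 1)
    return [40.0 if i == sent else 0.0 for i in range(N)]


sharp = [[confident_vector(j) for _ in range(5)] for j in range(N)]

mi_flat = mutual_information_estimate(flat)
mi_sharp = mutual_information_estimate(sharp)
```

The two test inputs bracket the range of the estimate: all-zero LLRs give 0 bits, and LLRs that strongly favour the transmitted symbol give close to n = log2 N = 2 bits. Generating the Gaussian a priori LLR vectors fed to each SISO module [25] is outside the scope of this sketch.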

TABLE I. Puncturing matrices for the investigated PRTCQ schemes. Columns: source encoding rate R = 2, 2 1/3, 2 2/3, 3 and 3 1/3 bits/sample. Rows: puncture matrix $P^{so}_{mat}$, puncture matrix $P^{co}_{mat}$ and puncture matrix $P^{Ico}_{mat}$.

They are source optimized and are obtained by an exhaustive search for the best performing puncturing pattern in terms of the Signal to Quantization Noise Ratio (SQNR) performance. The mapping between the quantization levels and the output symbols of the punctured ring convolutional encoder is designed according to the rule that maximizes the Euclidean distance of the quantization levels within a group, i.e., the quantization levels corresponding to the output symbols of the edges entering and leaving the same state. As an example, the appropriate mapping for the rate R = 2 2/3 bits/sample PRTCQ encoder is shown in Fig. 3.

Fig. 3. A rate R = 2 2/3 bits/sample PRTCQ encoder quantization level mapping. The quantization levels $c_i$ are assigned the index vectors $u_i$ as follows:

c_i:  c1  c2  c3  c4  c5  c6  c7  c8  c9  c10  c11  c12  c13  c14  c15  c16
u_i:  00  03  02  01  11  10  13  12  22  21   20   23   33   32   31   30

Two PRTCQ schemes are investigated in this paper. For PRTCQ scheme I, the expanded codebook is obtained by using a rate $(R + \log_2 M)$ Lloyd-Max quantizer [13]. For PRTCQ scheme II, the codebook is obtained using a modified generalized Lloyd algorithm [13] based on a training sequence [10]. The training sequence consists of 100000 samples from a source generating i.i.d. samples. For each simulation, 100 source blocks each containing 1000 samples (or vectors) are used. The performance is measured using the SQNR.

The simulation results for the two systems are listed in Table 2. It also shows the results for a rate 2 2/3 three dimensional TCVQ scheme [35], [36] of 16 states using a double sized codebook. The codebook of the TCVQ scheme is designed using an LBG algorithm [37]. The size of the codebook of the PRTCQ is $N = 2^{(R + \log_2 4)}$, while the size of the codebook for a κ-dimensional TCVQ is $2^{\kappa R + 1}$.

Now let us look at the complexity of the PRTCQ scheme. Here, we adopt the same complexity measure as the one used in [38]. The complexity is defined as the total number of edges that the Viterbi Algorithm has to evaluate per source sample. Let $N_s$ denote the number of states of the trellis. For a rate R bits/sample PRTCQ scheme, the total number of edges in the trellis corresponding to one source sample is $N_s \cdot 2^{R+1} \cdot p/s$, where p and s are the puncturing period and the total number of transmitted symbols per puncturing period, respectively.

As a reference, for a κ-dimensional TCVQ [35], [36] with source encoding rate R bits per (scalar) sample, the total number of edges in the trellis corresponding to one source sample is $N_s \cdot 2^{\kappa R}$. It is easy to see that the encoding complexity with TCVQ is more than $2^{\kappa(R-1)}$ times higher than for PRTCQ. For the investigated R = 2.67 bits/sample schemes, the complexity for the TCVQ scheme is about 42 times higher than the one for the PRTCQ scheme. The complexity of the PRTCQ, however, is reduced at the sacrifice of the SQNR performance. For example, with R = 2.67 bits/sample TCVQ, the SQNR is 2.7 dB higher than the one with PRTCQ.


TABLE II. SQNR performance comparison of PRTCQ and TCVQ for uniform, Gaussian and Laplacian sources at rate R = 2 2/3 and 3 1/3 bits/sample. Both of them have 16 states. The entries of the table are in dB.

Source      PRTCQ I      PRTCQ I      PRTCQ II     PRTCQ II     TCVQ
            R = 2 2/3    R = 3 1/3    R = 2 2/3    R = 3 1/3    R = 2 2/3
Uniform     15.84        20.64        16.31        20.82        17.14
Gaussian    12.93        17.45        14.14        17.85        15.63
Laplacian   11.58        15.90        12.83        16.13        15.08

Note that the complexity of TCVQ mentioned above is for traditional TCVQ, e.g., [35], [36]. In [39], a symmetric codebook is used by TCVQ to reduce the quantizer complexity with very little performance loss for a generalized Gaussian source. In [40], Hu and Tiong proposed a computational acceleration method based on a so-called partial distance search scheme. Experimental results showed that the method can reduce the computation complexity by about 60 to 90%, depending on the codebook structure of TCVQ. It can be seen that even with a 90% reduction, the computation complexity of the studied R = 2.67 bits/sample TCVQ scheme is still about 4 times higher than that of the proposed PRTCQ scheme with the same rate.

The trellis based scalar vector quantization [41] idea, which combines TCQ with a scalar vector quantizer [42] to achieve boundary gain and granular gain, may also be extended by replacing TCQ with PRTCQ. In this case the codebook of the PRTCQ will be obtained by using a scalar vector quantizer based on the asymptotic equipartition property [43] of the multidimensional probability density function of the source.

To evaluate the significance, or the reliability, of the measured SQNR values, 95% confidence intervals of the true average SQNR value for each source are computed. For Gaussian and uniform sources, the 95% confidence interval of the SQNR values listed in Table 2 is no more than 0.004 dB. That is, with probability 95%, the difference between the true average SQNR value and the values listed in Table 2 for the Gaussian or uniform source is no more than 0.004 dB. For the Laplacian source, the corresponding difference is no more than 0.02 dB.

VI. SIMULATION RESULTS FOR JSCC USING COMBINED PRTCQ/PRCCCPM

Computer simulations have been performed for combined PRTCQ/PRCCCPM systems with soft decoding over the AWGN channel and the Rayleigh fading channel. Five systems are simulated; the source encoding rates for these systems are 2, 2 1/3, 2 2/3, 3 and 3 1/3 bits per sample, respectively. The PRTCQ encoders are obtained by puncturing a rate 1/2 Z4 parent nonsystematic convolutional code with the generator polynomial matrix $G(D) = [1 + D + D^2, 1 + D^2]$. The puncturing matrices $P^{co}_{mat}$ for these systems are listed in the third row of Table 1. Unlike $P^{so}_{mat}$, the matrices $P^{co}_{mat}$ are optimized over the combined PRTCQ/PRCCCPM systems by an exhaustive search for the best performing puncturing pattern in terms of the largest

Fig. 4. System performance of combined PRTCQ/PRCCCPM systems for a memoryless Gaussian source, soft decoding vs hard decoding. (Horizontal axis: channel Es/N0 in dB; vertical axis: joint source-channel SDR in dB. Curves: soft and hard decoding for R = 2, 2.33, 2.67, 3 and 3.33, together with OPTA_C2 and BD_CPM.)

minimum NSED $d^2_{min}$ and less SQNR degradation compared with source optimized puncturing matrices. For each channel SNR value, 50 source blocks with 4000 samples each of a zero mean, unit variance Gaussian source were used in the simulation. The same principle can be applied for a memoryless uniform source and a Laplacian source. The codebook of the PRTCQ in the combined system is channel optimized. It is obtained by using a modified generalized Lloyd algorithm over the noisy channel based on a source training sequence and a channel noise training sequence. Each sequence consists of 100000 zero mean, unit variance Gaussian samples. The algorithm can be found in [14]. This design algorithm is similar to the one used in [44] and [45]. The performance measure is the Signal to Distortion Ratio (SDR) in dB.

A. Combined PRTCQ/PRCCCPM over AWGN Channel

The quaternary 1REC CPM scheme with h = 1/4 is used for this investigation. Simulation results for the Gaussian distributed source are shown in Fig. 4. For comparison, the SDR performance for the considered systems with hard decoding is also plotted in Fig. 4. It can be seen that the systems with soft decoding outperform the corresponding systems with hard decoding by about 1 dB at low channel SNR. As a reference, the Optimal Performance Theoretically Attainable (OPTA) for a memoryless Gaussian source is also

Fig. 5. System performance of combined PRTCQ/PRCCCPM systems with rate allocation for a memoryless Gaussian source. (Horizontal axis: channel Es/N0 in dB; vertical axis: joint source-channel SDR in dB. Curves: OPTA_C2, R = 2 RTCQ/CPM with soft decoding, TCQ/TCM with soft decoding, BD_CPM, and adaptive PRTCQ/PRCCCPM with soft decoding.)

shown in Fig. 4. In the figure, $OPTA_{C_2}$ is obtained by evaluating the distortion rate function [46] at the channel capacity in bits per source sample for two dimensional signalling (e.g., 8PSK). Note that the $OPTA_{C_2}$ provided here is not the upper bound for the described PRTCQ/PRCCCPM systems, but the upper bound for the investigated TCQ/TCM/8PSK systems, which will be described later. This is because CPM is, in general, a multi-dimensional signalling scheme. The upper bound for the PRTCQ/PRCCCPM systems can be obtained by evaluating the distortion rate function at the channel capacity in bits per source symbol for the investigated CPM signalling. The capacity of CPM signalling can be found in the literature, e.g., [47]. This upper bound is denoted by $BD_{CPM}$ in the figure.

An adaptive JSCC scheme using combined PRTCQ/PRCCCPM is also investigated. The source encoding rate can be selected according to the channel condition. For each channel SNR value, the best performing PRTCQ/PRCCCPM scheme among the five investigated schemes is selected. This can be implemented either by using a feedback channel between the receiver and the transmitter or by using a pre-defined lookup table. The simulation result of the adaptive PRTCQ/PRCCCPM scheme is shown in Fig. 5. Also shown is the performance for a combined RTCQ/RCCCPM system described in [4]. The source encoding rate is 2 bits/sample. It can be seen that the performance of the proposed PRTCQ/PRCCCPM is close to the one for combined RTCQ/RCCCPM at low to medium channel SNR. At high channel SNR values, the proposed PRTCQ/PRCCCPM system significantly outperforms the combined RTCQ/RCCCPM system.

Fig. 5 also shows the system performance for a rate R = 2 bits/sample combined TCQ/TCM system described in [48]. Unlike [48], where 8PAM was used, 8PSK modulation is employed in our simulation. The reason is that 8PAM is one dimensional signaling, while 8PSK is two dimensional signaling.
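As a rough illustration of how an OPTA reference of this kind is evaluated, the Python sketch below combines the distortion rate function of a unit-variance memoryless Gaussian source, $D(R) = 2^{-2R}$, with a channel capacity expressed in bits per source sample. It assumes the unconstrained complex AWGN capacity $\log_2(1 + E_s/N_0)$ per channel symbol and, by default, one channel symbol per source sample. Both are simplifying assumptions of this sketch (the curves in the paper may rest on a modulation-constrained capacity), and the function name is hypothetical.

```python
import math


def opta_sdr_db(es_n0_db, symbols_per_sample=1.0):
    """SDR in dB of a unit-variance Gaussian source coded at the capacity rate."""
    snr = 10.0 ** (es_n0_db / 10.0)
    capacity = math.log2(1.0 + snr)        # bits per channel symbol (assumed)
    rate = symbols_per_sample * capacity   # bits per source sample
    distortion = 2.0 ** (-2.0 * rate)      # Gaussian distortion rate function
    return -10.0 * math.log10(distortion)  # SDR = 10 log10(variance / D)


sdr_at_0_db = opta_sdr_db(0.0)
sdr_curve = [(snr_db, opta_sdr_db(snr_db)) for snr_db in range(2, 13, 2)]
```

At $E_s/N_0$ = 0 dB the assumed capacity is 1 bit per symbol, so the bound equals $20\log_{10} 2 \approx 6.02$ dB, i.e., the familiar 6 dB per bit of a Gaussian source.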
It was shown in [26] that the SDR performance of the combined TCQ/TCM system described in [48] using 8PSK outperforms the one with 8PAM. Another reason for using 8PSK is that, in this paper, we aim at applications in satellite and mobile radio communications, where a constant envelope property is desired. The source encoder TCQ is based on a rate 2/3 16 state binary recursive systematic convolutional encoder with parity check matrix H = (23, 04, 16) given in octal form [49]. The simulation result for the combined TCQ/TCM/8PSK system with soft decoding is shown in Fig. 5. Given the same transmission rate in terms of the transmitted channel symbols per source sample (footnote 9), it can be seen that, except in the very low channel SNR region (footnote 10), the combined PRTCQ/PRCCCPM systems outperform the investigated TCQ/TCM system in terms of the SDR.

The minimum NSEDs for these systems when the observation interval is larger than or equal to $N_B$ (footnote 11) are given in Table 3, along with the normalized double-sided bandwidth $2BT_b$ for the corresponding PRCCCPM systems. The bandwidth used here is defined as the 99% in-band power bandwidth, which is explained explicitly in [2]. In comparison, for the combined TCQ/TCM/8PSK system, $2BT_b$ is 2 [9].

B. Combined PRTCQ/PRCCCPM over Rayleigh Fading Channel

The channel model for the JSCC systems considered so far is solely the AWGN channel. In a mobile radio communication system, due to the multipath propagation of the transmitted signals, the channel is usually modelled as a multipath fading channel [17]. The simplest case of multipath fading is the so-called frequency flat slow Rayleigh fading [17]. Frequency flat implies that the received multipath signal has no time dispersion, so there is no intersymbol interference for the transmitted signal. Slow fading means that within a transmitted symbol interval, the fading is nearly constant. Combined PRTCQ/PRCCCPM systems over frequency flat slow Rayleigh fading channels are also simulated in this paper. We assume that the receiver has perfect channel state information.
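A frequency flat slow Rayleigh fading channel with perfect channel state information can be simulated along the following lines. The sketch draws one unit-variance complex Gaussian fading coefficient per symbol interval, with independent real and imaginary parts, and adds complex white Gaussian noise. The function name, the seed and the unit-energy test symbols are assumptions of this example, and symbol-rate samples stand in for the CPM waveforms of the actual system.

```python
import math
import random


def rayleigh_channel(symbols, es_n0_db, rng):
    """Return (received, fading) for r_k = h_k * s_k + n_k.

    h_k ~ CN(0, 1): real and imaginary parts are independent N(0, 1/2).
    n_k ~ CN(0, N0) with N0 = Es / (Es/N0); Es is the mean symbol energy.
    """
    es = sum(abs(s) ** 2 for s in symbols) / len(symbols)
    n0 = es / (10.0 ** (es_n0_db / 10.0))
    received, fading = [], []
    for s in symbols:
        h = complex(rng.gauss(0.0, math.sqrt(0.5)),
                    rng.gauss(0.0, math.sqrt(0.5)))
        n = complex(rng.gauss(0.0, math.sqrt(n0 / 2.0)),
                    rng.gauss(0.0, math.sqrt(n0 / 2.0)))
        received.append(h * s + n)
        fading.append(h)  # returned because the receiver is assumed to know h_k
    return received, fading


rng = random.Random(2008)
# Unit-energy quaternary phase symbols used purely as test input.
tx = [complex(math.cos(math.pi / 2 * (k % 4)), math.sin(math.pi / 2 * (k % 4)))
      for k in range(20000)]
rx, h = rayleigh_channel(tx, es_n0_db=10.0, rng=rng)
mean_fading_power = sum(abs(x) ** 2 for x in h) / len(h)
```

Returning the fading coefficients alongside the received samples mirrors the perfect channel state information assumption: a coherent receiver would use each h_k when computing its branch metrics.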
The fading level at each time interval is modeled as a zero mean complex Gaussian random variable with unit variance. The real and the imaginary parts of this complex random variable are independent. Fig. 6 shows the simulation results for a memoryless Gaussian source with different source encoding rates. Also shown is the SDR performance for the corresponding systems in the AWGN channel. It can be seen that, compared with the systems for the AWGN channel, there is a performance loss of roughly 2 to 4 dB for the systems with the Rayleigh fading channel due to the multipath propagation.

Footnote 9: That is, the transmission time of n CPM symbols is equal to the one for the transmission of one 8PSK symbol.

Footnote 10: The worse performance of the combined PRTCQ/PRCCCPM systems at low SNR is due to the recursive nature of the CPE, where the errors are propagated.

Footnote 11: $N_B$ is the number of observation symbol intervals needed to reach the upper bound on the minimum Euclidean distance [2].

C. Simulation Results for Serially Concatenated PRTCQ/CPM with Iterative Decoding

An adaptive JSCC system using serially concatenated PRTCQ/CPM with iterative soft decoding for a memoryless


TABLE III. The minimum normalized squared Euclidean distance for the investigated combined PRTCQ/PRCCCPM systems.

Combined PRTCQ/PRCCCPM system   Coding rate r   d2min      d2min      NB         NB         2BTb
                                of PRCC         (Pso_mat)  (Pco_mat)  (Pso_mat)  (Pco_mat)
System for 2 bits/sample        1/2             5.39       5.39       11         11         1.88
System for 2 1/3 bits/sample    7/12            8.06       9.01       14         11         1.61
System for 2 2/3 bits/sample    2/3             5.45       6.97       6          13         1.41
System for 3 bits/sample        3/4             2.18       4.73       6          6          1.25
System for 3 1/3 bits/sample    5/6             3.63       6.97       11         14         1.13

Fig. 6. System performance for combined PRTCQ/PRCCCPM systems with iterative decoding for rates R = 2 1/3, 2 2/3 and 3 bits/sample over Rayleigh fading channel. (Horizontal axis: channel Es/N0 in dB; vertical axis: SDR in dB. Curves: 7/3, 8/3 and 3 bits/sample, each for the AWGN and the fading channel.)

Fig. 7. System performance of combined PRTCQ/PRCCCPM systems with rate allocation for a memoryless Gaussian source with the iterative decoding approach. (Horizontal axis: channel Es/N0 in dB; vertical axis: joint source-channel SDR in dB. Curves: adaptive iterative approach, OPTA_C2, BD_CPM, and adaptive PRTCQ/PRCCCPM.)

Gaussian source over the AWGN channel is also simulated. The source encoding rates are R = 2, 2 1/3, 2 2/3, 3 and 3 1/3 bits/sample. The generator polynomial of the parent ring convolutional encoder is the same as the one used for combined PRTCQ/PRCCCPM without iterative decoding. The puncture matrices $P^{Ico}_{mat}$ used for this investigation are listed in the fourth row of Table 1; they are obtained by means of the EXIT chart (see Section IV). Fig. 7 shows the simulation results for the considered system. Also shown is the SDR performance for an adaptive PRTCQ/PRCCCPM system without iterative decoding. It can be seen that the system with an iterative approach significantly improves the SDR performance compared to the one without iterative decoding. When the channel SNR is larger than about 4.8 dB, a performance gain of roughly 1.5 to 3 dB can be obtained over the adaptive system without iterative decoding.

As described in [14], a convergence analysis based on the EXIT chart is mainly used to predict the decoding threshold of the systems with iterative decoding. The EXIT charts for the five studied systems are plotted in Fig. 8. The decoding thresholds for the five studied serially concatenated PRTCQ/CPM systems with iterative decoding are 4.31, 5.19, 6.07, 7.08 and 8.29 dB in $E_s/N_0$, respectively. It can be seen that the area between the outer and inner decoder decoding trajectories for the investigated higher rate codes is relatively small compared with the one for the lower rate codes. Thus, the higher rate codes have less SNR loss between the convergence decoding threshold and the Shannon limit than the lower rate codes [34]. The search for the optimal PRCC codes for the proposed PRTCQ/PRCCCPM systems is left for further study. Further comparison with other combined source channel coding schemes, e.g., a channel optimized vector quantization with BPSK using soft demodulation [50], [51], will be studied later. In general, our proposed systems will outperform a fixed rate JSCC scheme in the high SNR region due to the adaptation to channel conditions.

VII. SUMMARY

We have proposed a new trellis coded quantization scheme based on PRCCs. It is shown that the proposed PRTCQ can achieve a fractional source encoding rate with low complexity. We also investigate a JSCC scheme using PRCCCPM. The channel can be the AWGN channel or the Rayleigh fading channel. An optimal soft decoding algorithm for the investigated systems is first derived. For a memoryless Gaussian source, it is shown that the combined PRTCQ/PRCCCPM systems with soft decoding outperform the corresponding systems with hard decoding. An adaptive JSCC scheme based on combined PRTCQ/PRCCCPM is further investigated. Compared


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

[Plot residue for Fig. 8: EXIT chart with I_out(CPM), I_in(PRCC) on the vertical axis and I_in(CPM), I_out(PRCC) on the horizontal axis, both running from 0 to 2. Inner Q1REC CPM curves are labeled Es/N0 = 4.31, 5.19, 6.07, 7.08, and 8.29 dB; outer PRCC curves are labeled rate = 1/2, 7/12, 2/3, 3/4, and 5/6.]

Fig. 8. EXIT chart for serially concatenated PRTCQ/CPM systems with iterative decoding for punctured ring convolutional encoding rates r = 1/2, 7/12, 2/3, 3/4, 5/6. The corresponding source encoding rates are R = 2, 2 1/3, 2 2/3, 3, 3 1/3 bits per source sample.

with a fixed rate RTCQ/RCCCPM system, the performance can be improved significantly in the high SNR region by using the proposed adaptive PRTCQ/PRCCCPM system. An iterative decoding approach for serially concatenated PRTCQ/CPM has also been presented. It is shown that the studied systems with iterative decoding perform much better than the JSCC systems using PRCCCPM in terms of SDR performance.

ACKNOWLEDGMENT

The authors would like to thank the Editor and the anonymous reviewers for their valuable comments.

REFERENCES

[1] J. M. Wozencraft and I. M. Jacobs, Principles of Communication Engineering. New York: John Wiley & Sons, 1965.
[2] J. B. Anderson, T. Aulin, and C. E. Sundberg, Digital Phase Modulation. New York: Plenum Press, 1986.
[3] International Telecommunication Union, Handbook on Satellite Communications, 3rd ed. John Wiley & Sons, Inc., 2002.
[4] Z. Lin and T. Aulin, “Joint source and channel coding using ring convolutional coded CPM,” in Proc. IEEE International Symposium on Information Theory, Adelaide, Australia, Sept. 2005, pp. 1656–1660.
[5] J. L. Massey and T. Mittelholzer, “Convolutional codes over rings,” in Proc. 4th Joint Swedish-USSR Int. Workshop Information Theory, pp. 14–18, 1989.
[6] M. Marcellin and T. Fischer, “Joint trellis-coded quantization/modulation,” IEEE Trans. Commun., vol. 39, no. 2, pp. 172–176, Feb. 1991.
[7] R. H. Yang and D. P. Taylor, “Trellis-coded continuous-phase frequency-shift keying with ring convolutional codes,” IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1057–1067, July 1994.
[8] B. Rimoldi and Q. Li, “Coded continuous phase modulation using ring convolutional codes,” IEEE Trans. Commun., vol. 43, no. 11, pp. 2714–2720, Nov. 1995.
[9] D. G. Daut and C. A. Sanders, “Joint source/channel coding using trellis-coded CPFSK,” in Proc. IEEE Comm. Theory Mini-Conf. in conj. with Globecom’94, pp. 212–217, 1994.
[10] M. W. Marcellin and T. R. Fischer, “Trellis coded quantization of memoryless and Gauss-Markov sources,” IEEE Trans. Commun., vol. 38, no. 1, pp. 82–93, Jan. 1990.
[11] B. Rimoldi, “A decomposition approach to CPM,” IEEE Trans. Inform. Theory, vol. 34, no. 2, pp. 260–270, Mar. 1988.
[12] G. Lindell, On coded continuous phase modulation, Ph.D. thesis, Dept. Telecommunication Theory, University of Lund, Sweden, 1985.
[13] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Kluwer Academic, 1991.
[14] Z. Lin, Joint source-channel coding using trellis coded CPM, Ph.D. thesis, Chalmers University of Technology, Gothenburg, Sweden, Jan. 2006, http://www.ce.chalmers.se/TCT.
[15] T. Woerz and R. Schweikert, “Performance of punctured pragmatic codes,” in Proc. IEEE Global Telecommunications Conference, pp. 664–669, 1995.
[16] X. D. Tian, Image and video transmission over noisy channels, Ph.D. thesis, University of California, San Diego, 2005.
[17] T. S. Rappaport, Wireless Communications: Principles and Practice. Upper Saddle River, NJ: Prentice Hall PTR, 1996.
[18] B. Hochwald, “Tradeoff between source and channel coding on a Gaussian channel,” IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3044–3055, Nov. 1998.
[19] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inform. Theory, vol. IT-20, pp. 284–287, Mar. 1974.
[20] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: John Wiley & Sons, Inc., 2001.
[21] Z. Lin and T. Aulin, “Upper bounds on the channel distortion of combined TCQ/CPM systems,” in Proc. International Conference on Communications, Seoul, South Korea, May 2005.
[22] T. Aulin, “Symbol error probability bounds for coherently Viterbi detected continuous phase modulated signals,” IEEE Trans. Commun., vol. COM-29, no. 11, pp. 1707–1715, Nov. 1981.
[23] S. Benedetto and E. Biglieri, Principles of Digital Transmission With Wireless Applications. New York: Kluwer Academic/Plenum, 1999.
[24] M. Moher and T. A. Gulliver, “Cross-entropy and iterative decoding,” IEEE Trans. Inform. Theory, vol. 44, no. 7, pp. 3097–3104, Nov. 1998.
[25] A. Grant, “Convergence of non-binary iterative decoding,” in Proc. IEEE Global Telecommunications Conference, vol. 2, no. 25-29, pp. 1058–1062, Nov. 2001.
[26] Z. Lin, “Joint source and channel coding using combined TCQ and CPM schemes,” Lic. Eng. thesis, Chalmers University of Technology, Gothenburg, Sweden, Sept. 2003, http://www.ce.chalmers.se/TCT.
[27] H. El Gamal and A. R. Hammons, “Analyzing the turbo decoder using the Gaussian approximation,” IEEE Trans. Inform. Theory, vol. 47, pp. 671–686, Feb. 2001.
[28] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. Inform. Theory, vol. 47, pp. 599–618, Feb. 2001.
[29] S. ten Brink, “Design of serially concatenated codes based on iterative decoding convergence,” in Proc. Int. Symp. Turbo Codes and Related Topics, Brest, France, 2000, pp. 319–322.
[30] A. J. Viterbi and J. K. Omura, Principles of Digital Communication and Coding. New York: McGraw-Hill, 1979.
[31] G. D. Forney, Jr., “Geometrically uniform codes,” IEEE Trans. Inform. Theory, vol. 37, no. 5, pp. 1241–1260, Sept. 1991.
[32] S. Benedetto, M. Mondin, and G. Montorsi, “Performance evaluation of trellis-coded modulation schemes,” Proc. IEEE, vol. 82, no. 6, pp. 833–855, June 1994.
[33] B. Scanavino, G. Montorsi, and S. Benedetto, “Convergence properties of iterative decoders working at bit and symbol level,” in Proc. IEEE Global Telecommunications Conference, vol. 2, no. 25-29, pp. 1037–1041, Nov. 2001.
[34] A. Ashikhmin, S. ten Brink, and G. Kramer, “Extrinsic information transfer functions: model and erasure channel properties,” IEEE Trans. Inform. Theory, vol. 50, pp. 2657–2673, Nov. 2004.
[35] H. S. Wang and N. Moayeri, “Trellis coded vector quantization,” IEEE Trans. Commun., vol. 40, no. 8, pp. 1273–1276, Aug. 1992.
[36] T. R. Fischer, M. W. Marcellin, and M. Wang, “Trellis coded vector quantization,” IEEE Trans. Inform. Theory, vol. 37, pp. 1551–1566, Nov. 1991.
[37] Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. 28, no. 1, pp. 84–95, Jan. 1980.
[38] R. J. van der Vleuten and J. H. Weber, “Construction and evaluation of trellis-coded quantizers for memoryless sources,” IEEE Trans. Inform. Theory, vol. 41, no. 3, pp. 853–859, May 1995.
[39] B. Belzer and J. D. Villasenor, “Symmetric trellis-coded vector quantization,” IEEE Trans. Commun., vol. 45, no. 11, pp. 1354–1357, Nov. 1997.
[40] M. Hu and T. B. Tiong, “Trellis-coded vector quantization based on modified set partition method and partial vector search scheme,” Electron. Lett., vol. 36, no. 10, pp. 884–886, May 2000.

LIN and AULIN: JOINT SOURCE AND CHANNEL CODING USING PUNCTURED RING CONVOLUTIONAL CODED CPM

[41] R. Laroia and N. Farvardin, “Trellis-based scalar-vector quantizer for memoryless sources,” IEEE Trans. Inform. Theory, vol. 40, no. 3, pp. 860–870, May 1994.
[42] R. Laroia and N. Farvardin, “A structured fixed-rate vector quantizer derived from a variable-length scalar quantizer: part I-memoryless sources,” IEEE Trans. Inform. Theory, vol. 39, no. 3, pp. 851–867, May 1993.
[43] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley and Sons, Inc., 1991.
[44] M. Wang and T. Fischer, “Trellis-coded quantization designed for noisy channels,” IEEE Trans. Inform. Theory, vol. 40, no. 6, pp. 1792–1802, Nov. 1994.
[45] E. Ayanoglu and R. M. Gray, “The design of joint source and channel trellis waveform coders,” IEEE Trans. Inform. Theory, vol. IT-33, pp. 855–865, Nov. 1987.
[46] T. Berger, Rate Distortion Theory. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1971.
[47] K. Padmanabhan, S. Ranganathan, S. P. Sundaravaradhan, and O. M. Collins, “General CPM and its capacity,” in Proc. IEEE International Symposium on Information Theory, Sept. 2005, pp. 750–754.
[48] K. P. Ho and K. H. Chei, “Optimal soft decoding for combined trellis-coded quantization/modulation,” IEEE Trans. Commun., vol. 48, no. 6, pp. 901–904, June 2000.
[49] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, vol. 28, no. 1, pp. 55–67, Jan. 1982.
[50] F. Alajaji and N. C. Phamdo, “Soft-decision COVQ for Rayleigh-fading channels,” IEEE Commun. Lett., vol. 2, no. 6, pp. 162–164, June 1998.
[51] N. Phamdo and F. Alajaji, “Soft-decision demodulation design for COVQ over white, colored, and ISI Gaussian channels,” IEEE Trans. Commun., vol. 48, no. 9, pp. 1499–1506, June 2000.

Zihuai Lin received the M.S. degree in electrical engineering from Beijing Polytechnic University, Beijing, China, in 1995 and the M.S. (with distinction), Licentiate, and Ph.D. degrees in electrical engineering from the Chalmers University of Technology, Gothenburg, Sweden, in 1999, 2003, and 2006, respectively. From February 1995 to June 1996 he was with Beijing Polytechnic University as a lecturer. He became a system engineer at the American compression Labs Inc. in 1996 with a focus on video communication systems. In 1999, he joined Ericsson Radio Systems AB, Stockholm, Sweden, as a system designer, where he worked with the European Wireless LAN standard–HiperLAN/2, Wireless Internet, etc. From November 2000 to January 2006, he was with the Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden, working with telecommunication theory, especially on trellis based joint source and channel coding, coded modulation, iterative decoding, etc. In May 2006, he was appointed an associate professor at the Department of Electronic Systems, Aalborg University, Denmark, working with MIMO, space time coding, cooperative communications, relay networks, etc. He also served as an external senior researcher for Nokia Research, Aalborg, Denmark, working with 3GPP EUTRA LTE standardization, relay communications, etc. Currently, he works at the School of Electrical and Information Engineering, University of Sydney, Australia. His research interests are digital communications, telecommunication theory, iterative decoding, joint source channel coding, coded modulation, radio resource management, MIMO, space time coding, cooperative communications, ad hoc networks, network coding, relay networks, wireless networking, etc. Dr. Lin received the Swedish Foundation for International Cooperation and Higher Education (STINT) Scholarship from 1997 to 1998 and the Chinese Government Award for Outstanding Self-Financed Students Abroad in 2005.


Tor Aulin (S’77-M’80-SM’83-F’99) was born in Malmö, Sweden, on September 12, 1948. He received the M.S. degree in electrical engineering from the University of Lund, Lund, Sweden, in 1974 and the Dr. Techn. (Ph.D.) degree from the Institute of Telecommunication Theory, University of Lund, in November 1979. He became a Docent at the University of Lund in 1981 and worked at this institute as a Postdoctoral Fellow. During this period he was also a Visiting Scientist at the ECSE Department at Rensselaer Polytechnic Institute, Troy, NY. Following this he spent one year at the European Space Agency (ESA), the European Space Research and Technology Centre (ESTEC) in Noordwijk, the Netherlands, as an ESA Research Fellow. In 1983 he became a Research Professor (Docent) in Information Theory at Chalmers University of Technology, Göteborg, Sweden. In 1991 he formed the Telecommunication Theory Group there and also became a Docent in Computer Engineering in 1995. During the fall of 1995 he was a Visiting Fellow at the Telecommunications Engineering Department, Australian National University, Canberra, ACT, Australia. He was a Visiting Professor at City University of Hong Kong in 2004 and in 2005 he was a Research Scholar at the University of Southern California (USC) in Los Angeles, CA, USA. During 2005 he also spent several months working at the Communication Systems Department at Lund University, Lund, Sweden. Some of his research interests are communication theory, combined modulation/coding strategies (such as CPM and TCM), analysis of general sequence detection strategies, digital radio channel characterization, digital satellite communication systems, and information theory. During recent years the potentials of these have been considered for iterative decoding in concatenated versions. This is also the case for such schemes integrated into Multiple Access strategies (TCMA, Trellis Code Multiple Access and its CPM counterpart). 
Joint source/channel coding also falls into this concept. His company, AUCOM, has performed several advanced theoretical studies as a consultant to some of the major international organizations dealing with developing and operating satellite communication systems, e.g., INTELSAT and ESA. He has also performed theoretical study contracts for Saab and Volvo. Nokia has trusted him as an Internal Lecturer and he has performed numerous studies for Ericsson in the area of digital radio transmission, the latter resulting in a patent. He has authored and published some 200 technical papers and has also authored the book Digital Phase Modulation (Plenum, 1986) as a result of his extensive research in this area at that time. He has organized and chaired several sessions at international symposia/conferences organized by, e.g., IEEE and is an EAMEC representative within the Communications Society of the IEEE. He has been an Editor for IEEE Transactions on Communications in the area of communication theory and coding for a decade. He is also (since 30 years) on the Communication Theory Committee within IEEE COMSOC. In December 1997 Dr. Aulin was awarded the Senior Individual Grant at a ceremony in Stockholm, Sweden, handed over by the Prime Minister of Sweden. This has thereafter been repeated in 2004. Dr. Aulin has two papers among the best (Best-of-the-Best) published during the first 50 years of the IEEE COMSOC, selected in connection with their 50th anniversary in 2002. Dr. Aulin also has an academic degree as a solo cellist.


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Near Optimal Common Detection Techniques for Shaped Offset QPSK and Feher’s QPSK

Tom Nelson, Member, IEEE, Erik Perrins, Senior Member, IEEE, and Michael Rice, Senior Member, IEEE

Abstract—A detector architecture capable of detecting both shaped offset quadrature phase shift keying (SOQPSK-TG) and Feher’s quadrature phase shift keying (FQPSK-JR) is developed and analyzed. Both modulations are embodied as fully interoperable modulations in the Interrange Instrumentation Group (IRIG) standard IRIG-106. It is shown that the common detector achieves near optimal bit error rate performance without knowledge of which modulation is used by the transmitter. The detection techniques are based on a common trellis-coded modulation representation and a common continuous phase modulation (CPM) representation for these two modulations. In addition, the common pulse amplitude modulation (PAM) decomposition of the common CPM representation is developed. The common PAM-based detector offers the best performance-complexity trade-off among the detectors considered.

Index Terms—Offset QPSK, continuous phase modulation, cross-correlated trellis-coded quadrature modulation, detection.

I. INTRODUCTION

Power and bandwidth constraints present challenges to modulation design. The constant envelope constraint is also imposed when operation through a fully saturated nonlinear RF power amplifier is required. Examples include commercial and military satellite communication links, digital mobile telephony (i.e., Gaussian minimum shift keying (GMSK) for the Global System for Mobile communications (GSM) [1]), and aeronautical telemetry [2]. Aeronautical telemetry is an interesting case study since the solution to this problem resulted in the adoption of two interoperable waveforms known as Feher-patented quadrature phase shift keying (FQPSK) and shaped offset quadrature phase shift keying (SOQPSK). From the 1970s, pulse code modulation/frequency modulation (PCM/FM) has been the dominant modulation used for test and evaluation on government test ranges in the USA, Europe, and Asia. (PCM/FM is binary continuous phase modulation (CPM) with a digital modulation index h = 0.7 and a frequency pulse which is a rectangular pulse with a duration of one bit time that has been low-pass filtered.) In the USA, the main spectral allocations for aeronautical telemetry are L-band (1435 –

Paper approved by H. Leib, the Editor for Communication and Information Theory of the IEEE Communications Society. Manuscript received March 14, 2006; revised November 17, 2006. This work was supported by a grant from the U.S. Air Force under contract FA9302-05-C-0001. T. Nelson was with the Department of Electrical & Computer Engineering, Brigham Young University, Provo, UT. He is now with L-3 Communications, Communication Systems–West, Salt Lake City, UT 84116 (e-mail: [email protected]). E. Perrins is with the Department of Electrical Engineering & Computer Science, University of Kansas, Lawrence, KS 66045 (e-mail: [email protected]). M. Rice is with the Department of Electrical & Computer Engineering, Brigham Young University, Provo, UT 84602 (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060155.

1535 MHz), lower S-band (2200 – 2290 MHz), and upper S-band (2310 – 2390 MHz). Increasing data rate requirements along with an ever increasing number of test flights put tremendous pressure on these spectral allocations in the 1980s and 1990s. The situation was further exacerbated in 1997 when the lower portion of upper S-band from 2310 to 2360 MHz was reallocated in two separate auctions¹.

In response to this situation, the Telemetry Group of the Range Commanders Council adopted a more bandwidth efficient modulation as part of its Interrange Instrumentation Group (IRIG) standard, IRIG-106 [3], in 2000. This modulation, known as FQPSK-B, was a proprietary version of FQPSK described in [4]. Efforts to reduce some aspects of the implementation complexity resulted in a non-proprietary version, known as FQPSK-JR [5], which was adopted as part of IRIG-106 in 2004. Also in 2004, a version of SOQPSK, known as SOQPSK-TG [6], was adopted as a license-free, fully interoperable alternative in the IRIG-106 standard.

These modulations are described in more detail in Section II. Briefly, FQPSK (and its variants) is a linear modulation whose inphase and quadrature components are drawn from a set of waveforms in a constrained way. The waveforms in this set, called “wavelets” in the original patents [7], are defined to produce a quasi-constant envelope modulated carrier. (The quadrature waveforms can be defined as delayed versions of the inphase waveforms, thereby giving the modulation the look and feel of an offset modulation.) Simon’s pioneering analysis of this waveform revealed that the waveform selection constraints can be formulated as a trellis code; he termed this, and the general class of modulations, cross-correlated trellis-coded quadrature modulation or XTCQM. 
SOQPSK-TG is defined as a constrained ternary partial response CPM with modulation index h = 1/2 and was derived from the full response version of SOQPSK defined in the military UHF satellite communication standard MIL-STD 188-181 [8].

In most situations, a trellis-coded linear modulation and a CPM are not adopted as equivalent transmission techniques in a standard. In this case, the difficulties of adopting a standard with proprietary components and the challenges of licensing patented technology proved the dominant factors in arriving at this odd situation. FQPSK-JR and SOQPSK-TG can coexist in a standard because both have essentially the same bandwidth and the same bit error rate performance when detected using a simple integrate-and-dump offset quadrature phase shift keying (OQPSK) detector.

¹ 2320 – 2345 MHz was reallocated for digital audio radio in one auction while 2305 – 2320 MHz and 2345 – 2360 MHz were allocated to wireless communications services in the other auction.

In the absence of errors,

0090-6778/08$25.00 © 2008 IEEE

NELSON et al.: NEAR OPTIMAL COMMON DETECTION TECHNIQUES FOR SHAPED OFFSET QPSK AND FEHER’S QPSK

the simple symbol-by-symbol detector produces exactly the same sequence when the transmitter uses either FQPSK-JR or SOQPSK-TG. It is in this sense that the modulations are considered fully interoperable. The simple symbol-by-symbol detector has two attractive features: 1) low complexity, and 2) it does not have to “know” which modulation is used by the transmitter. These features are achieved at the expense of detection efficiency: the bit error rate performance of this simple detector is about 2 dB worse than what could be achieved with maximum likelihood detection.

Since SOQPSK-TG is a CPM and FQPSK-JR is an XTCQM, it is natural to assume that the optimal detector must be equipped with two different detection algorithms and endowed with the knowledge of which modulation is used by the transmitter. In this paper we show how a single detection algorithm can be used for both modulations and that this algorithm does not have to “know” which modulation is used. We refer to such a detector as a common detector and show that its bit error rate performance for both SOQPSK-TG and FQPSK-JR is within about 0.1 dB of the maximum likelihood performance for each.

While considering candidates for the common detector we will compare each detector’s bit error rate performance for the two modulations in question to the optimum performance for each of those modulations. SOQPSK-TG and FQPSK-JR have similar distance properties and hence their optimum detectors have similar probabilities of bit error. To facilitate these comparisons for SOQPSK-TG, we define the performance metric

$$\Delta_S = \mathrm{SNR}_{C,S} - 10.21\ \text{dB} \tag{1}$$

where $\mathrm{SNR}_{C,S}$ is the signal to noise ratio (SNR, which is $E_b/N_0$ in dB, where $E_b$ is the energy per bit and $N_0$ is the variance of the complex noise) that the common detector requires to achieve probability of bit error $P_b = 10^{-5}$ when SOQPSK-TG is transmitted, and 10.21 dB is the SNR required for the optimum detector to achieve the same probability of bit error². For FQPSK-JR, we define

$$\Delta_F = \mathrm{SNR}_{C,F} - 10.32\ \text{dB} \tag{2}$$

where $\mathrm{SNR}_{C,F}$ is the SNR required for the common detector to achieve $P_b = 10^{-5}$ when FQPSK-JR is transmitted, and 10.32 dB is the SNR required for the maximum likelihood FQPSK-JR detector to achieve $P_b = 10^{-5}$. $\Delta_S$ and $\Delta_F$ quantify the detection efficiency loss for the candidate common detectors and will be used as figures of merit in this paper.

² The number 10.21 dB comes from (10), which is an approximation to the union bound. For both SOQPSK-TG and FQPSK-JR (for which the number is 10.32 dB), the inclusion of two terms in (10) and (13), respectively, is sufficient to obtain a very good approximation even at SNR values as low as 3 dB, as confirmed by simulations reported later in this paper.

The major contribution of this paper is to show that a given maximum likelihood detector can be “tuned” to give near optimal performance for two different modulation types. In particular,
• We derive the equivalent XTCQM representation for SOQPSK-TG in Appendix A and use this representation to produce a common approximate XTCQM representation for both SOQPSK-TG and FQPSK-JR. The detection performance of SOQPSK-TG and FQPSK-JR using the maximum likelihood XTCQM detector based on this approximation is analyzed and simulated. This development is detailed in Section III where it is shown that the common XTCQM detector has $\Delta_S = 0.14$ dB and $\Delta_F = 0.08$ dB.
• We derive an approximate CPM representation for FQPSK-JR and use this representation to produce a common approximate CPM representation for both SOQPSK-TG and FQPSK-JR. The detection performance of SOQPSK-TG and FQPSK-JR using the maximum likelihood CPM detector based on this approximation is analyzed and simulated. The CPM approximation for FQPSK-JR is derived in Appendix B and the performance analysis is presented in Section IV-A where it is shown that $\Delta_S = 0.21$ dB and $\Delta_F = 0.01$ dB.
• The common CPM representation suggests an equivalent PAM representation which can be used as the basis for a common detector. The PAM representation is presented in Section IV-B. The corresponding analysis and simulation of the common PAM detector shows that $\Delta_S = 0.11$ dB and $\Delta_F = 0.09$ dB.
We use the preceding results to propose a common detector for SOQPSK-TG and FQPSK-JR in Section V.

II. INTEROPERABLE MODULATIONS

A. SOQPSK-TG

SOQPSK-TG is defined as a CPM of the form

$$s_S(t, \boldsymbol{\alpha}) = \sqrt{\frac{2E_b}{T_b}} \exp\left[j\left(\phi(t, \boldsymbol{\alpha}) + \phi_0\right)\right] \tag{3}$$

where $T_b$ is the bit time (or reciprocal of the bit rate) and $E_b$ is the energy per bit in the signal. The phase is

$$\phi(t, \boldsymbol{\alpha}) = 2\pi h \int_{-\infty}^{t} \sum_{n=-\infty}^{\infty} \alpha_n g_S(\tau - nT_b)\, d\tau = 2\pi h \sum_{n=-\infty}^{\infty} \alpha_n q_S(t - nT_b) \tag{4}$$

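where $g_S(t)$ is the frequency pulse; $q_S(t) = \int_{-\infty}^{t} g_S(\tau)\, d\tau$ is the phase pulse; $\phi_0$ is an arbitrary phase which, without loss of generality, can be set to 0; $h = 1/2$ is the modulation index; and $\alpha_n \in \{-1, 0, 1\}$ are the ternary symbols which are related to the binary input symbols $a_n \in \{-1, 1\}$ by [9]

$$\alpha_n = (-1)^{n+1} \frac{a_{n-1}(a_n - a_{n-2})}{2}. \tag{5}$$

The frequency pulse for SOQPSK-TG is a spectral raised cosine windowed by a temporal raised cosine [6]:

$$g_S(t) = C \times \frac{\cos\left(\frac{\pi \rho B t}{2T_b}\right)}{1 - 4\left(\frac{\rho B t}{2T_b}\right)^2} \times \frac{\sin\left(\frac{\pi B t}{2T_b}\right)}{\frac{\pi B t}{2T_b}} \times w(t) \tag{6}$$

for

$$w(t) = \begin{cases} 1 & 0 \le \left|\frac{t}{2T_b}\right| < T_1 \\ \frac{1}{2} + \frac{1}{2}\cos\left(\frac{\pi}{T_2}\left(\left|\frac{t}{2T_b}\right| - T_1\right)\right) & T_1 \le \left|\frac{t}{2T_b}\right| \le T_1 + T_2 \\ 0 & T_1 + T_2 < \left|\frac{t}{2T_b}\right|. \end{cases} \tag{7}$$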
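The mapping (5) and the pulse definition (6), (7) can be explored numerically. The following is a minimal sketch, not the authors' implementation. It assumes time is measured in units of the bit time (Tb = 1), uses the SOQPSK-TG parameter values quoted in the text (ρ = 0.7, B = 1.25, T1 = 1.5, T2 = 0.5), takes the two symbols preceding a block to be +1, and starts the index n at 0. The function names `precode`, `window`, `pulse_unscaled`, and `phase_pulse`, as well as the integration step, are illustrative choices. The constant C is found by numerical integration so that the phase pulse settles at 1/2.

```python
import math

# SOQPSK-TG parameter values quoted in the text; time is in bit times (Tb = 1).
RHO, B, T1, T2 = 0.7, 1.25, 1.5, 0.5


def precode(bits):
    """Binary-to-ternary mapping of (5) for a_n in {-1, +1}.

    Assumption: the two symbols preceding the block are +1 and n starts at 0.
    """
    a = [1, 1] + list(bits)  # a[n + 2] holds a_n
    out = []
    for n in range(len(bits)):
        a_n, a_n1, a_n2 = a[n + 2], a[n + 1], a[n]
        # The product is -2, 0, or +2, so integer division by 2 is exact.
        out.append(((-1) ** (n + 1)) * a_n1 * (a_n - a_n2) // 2)
    return out


def window(t):
    """Temporal raised-cosine window w(t) of (7)."""
    x = abs(t / 2.0)
    if x < T1:
        return 1.0
    if x <= T1 + T2:
        return 0.5 + 0.5 * math.cos(math.pi / T2 * (x - T1))
    return 0.0


def pulse_unscaled(t):
    """g_S(t) / C from (6)."""
    x = B * t / 2.0
    denom = 1.0 - 4.0 * (RHO * x) ** 2
    # Removable singularities: the sinc term at x = 0 and the
    # raised-cosine term where 2 * rho * x = +/-1 (limit pi / 4).
    sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    if abs(denom) < 1e-9:
        rc = math.pi / 4.0
    else:
        rc = math.cos(math.pi * RHO * x) / denom
    return rc * sinc * window(t)


def phase_pulse(step=1e-3):
    """Return (C, times, q): q is the running integral of g_S, ending at 1/2."""
    half_span = 2.0 * (T1 + T2)  # pulse support is [-4 Tb, 4 Tb]
    n = int(round(2.0 * half_span / step))
    times = [-half_span + (k + 0.5) * step for k in range(n)]  # midpoint rule
    g = [pulse_unscaled(t) for t in times]
    scale = 0.5 / (sum(g) * step)  # the constant C
    q, acc = [], 0.0
    for value in g:
        acc += scale * value * step
        q.append(acc)
    return scale, times, q


C, times, q = phase_pulse()
print(precode([1, -1, -1, 1, 1, -1]))
print(round(q[-1], 6))
```

With these parameter values the window reaches zero at |t| = 2(T1 + T2)Tb = 4Tb, which is why the integration grid covers eight bit intervals.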


For SOQPSK-TG, the parameters are³ $\rho = 0.7$, $B = 1.25$, $T_1 = 1.5$, and $T_2 = 0.5$. The constant $C$ is chosen to make $q_S(t) = 1/2$ for $t \ge 2(T_1 + T_2)T_b$. The frequency pulse and corresponding phase pulse for this case are shown in Figure 1. Observe that these values of $\rho$, $B$, $T_1$ and $T_2$ make SOQPSK-TG a partial response CPM spanning $L = 8$ bit intervals.

Fig. 1. The frequency pulse and phase pulse for both SOQPSK-TG ($g_S(t)$ and $q_S(t)$) and the CPM approximation of FQPSK-JR ($g_F(t)$ and $q_F(t)$). [Plot residue: amplitude from 0 to 0.5 versus $t/T_b$ from −4 to 4.]

An analysis of maximum likelihood detection of SOQPSK was performed by Geoghegan [10], [11], [12] following the standard union bound technique based on pairwise error probabilities [13]. The binary-to-ternary mapping (5) contributes an extra step to the analysis. Let

$$\mathbf{a} = \ldots, a_{k-3}, a_{k-2}, a_{k-1}, a_k, a_{k+1}, a_{k+2}, a_{k+3}, \ldots \tag{8}$$

represent a generic binary symbol sequence with $a_k \in \{-1, +1\}$. The minimum distance error event occurs between the waveforms corresponding to two binary symbol sequences whose difference satisfies

$$\mathbf{a}_1 - \mathbf{a}_2 = \pm\left[\ldots, 0, 0, 0, 2, 0, 0, 0, \ldots\right] \tag{9}$$

where the difference (or erroneous symbol) occurs at index $k$. As it turns out, there are two ways a pair of binary sequences can produce (9). Half of the sequence pairs have $a_{k-1} = a_{k+1}$ and are characterized by waveforms separated by a normalized squared Euclidean distance of 1.60. The other half of the sequence pairs have $a_{k-1} = -a_{k+1}$ and are characterized by waveforms separated by a normalized squared Euclidean distance of 2.58. Since these error events produce one bit error, the probability of error is well approximated by

$$P_b \approx 1 \times \frac{1}{2} Q\left(\sqrt{1.60\,\frac{E_b}{N_0}}\right) + 1 \times \frac{1}{2} Q\left(\sqrt{2.58\,\frac{E_b}{N_0}}\right). \tag{10}$$

Using this expression, $P_b = 10^{-5}$ is achieved at SNR = 10.21 dB.

³ In the original publication [6], two versions of SOQPSK were described: SOQPSK-A defined by $\rho = 1$, $B = 1.35$, $T_1 = 1.4$, and $T_2 = 0.6$ and SOQPSK-B defined by $\rho = 0.5$, $B = 1.45$, $T_1 = 2.8$, and $T_2 = 1.2$. SOQPSK-A has a slightly narrower bandwidth (measured at the −60 dB level) and slightly worse detection efficiency than SOQPSK-B. The Telemetry Group of the Range Commanders Council adopted the compromise waveform, designated SOQPSK-TG, in 2004.

B. FQPSK-JR

FQPSK-JR is defined as an OQPSK modulation of the form

$$s_F(t) = \sum_n s_{I,m}(t - nT_s) + j s_{Q,m}(t - nT_s - T_s/2) \tag{11}$$

with data dependent pulses $s_{I,m}(t)$ and $s_{Q,m}(t)$ each drawn in a constrained way from a set of 16 waveforms [5]. The waveform index $m$ is determined by the modulating data bits as explained in [14]. The 16 pulses have a duration of $2T_b = T_s$ and are listed in [5] and [14].

Simon showed that the original version of FQPSK has an XTCQM interpretation from which the optimum maximum likelihood detector followed [14]. This representation consists of 16 waveforms for the inphase component and 16 waveforms for the quadrature component for a total of 32 possible complex-valued waveforms when the constraints on possible combinations are taken into account. The XTCQM representation of FQPSK-JR is the same as that for the original FQPSK except that three of the waveforms are modified. Consequently the optimum detector for FQPSK-JR has the same form as that described by Simon [14] for FQPSK.

For the purposes of comparison with the XTCQM representation of SOQPSK-TG, it is advantageous to re-express FQPSK-JR in the form

$$s_F(t) = \sum_k I_F(t - kT_s; a_{2k}, \ldots, a_{2k-4}) + j Q_F(t - kT_s; a_{2k}, \ldots, a_{2k-4}). \tag{12}$$
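Before continuing with the FQPSK-JR representation, the reference value of 10.21 dB attached to (10) can be checked numerically. The sketch below is illustrative only: it evaluates a two-term approximation of the same form as (10), using Q(x) = ½ erfc(x/√2), and finds by bisection the SNR at which the bit error probability equals 10⁻⁵. The function names (`q_func`, `pb_two_term`, `required_snr_db`) and the 0 to 20 dB search interval are assumptions made for this example. The FQPSK-JR entry uses the distances 1.56 and 2.59 quoted in the text for that modulation, with equal weights of 1/2. Because the distances are rounded to two decimals, the results should land close to, rather than exactly on, the quoted 10.21 dB and 10.32 dB.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pb_two_term(snr_db, terms):
    """Sum of weight * Q(sqrt(d2 * Eb/N0)) over (weight, d2) pairs, as in (10)."""
    ebn0 = 10.0 ** (snr_db / 10.0)
    return sum(w * q_func(math.sqrt(d2 * ebn0)) for w, d2 in terms)


def required_snr_db(target_pb, terms, lo=0.0, hi=20.0):
    """Bisection on Pb(SNR), which decreases monotonically with SNR."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if pb_two_term(mid, terms) > target_pb:
            lo = mid  # error rate still too high, so more SNR is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


# (weight, normalized squared Euclidean distance) pairs taken from the text.
SOQPSK_TG = [(0.5, 1.60), (0.5, 2.58)]
FQPSK_JR = [(0.5, 1.56), (0.5, 2.59)]

for name, terms in (("SOQPSK-TG", SOQPSK_TG), ("FQPSK-JR", FQPSK_JR)):
    print(name, round(required_snr_db(1e-5, terms), 2), "dB")
```

The same helper gives the figures of merit (1) and (2) once a detector's required SNR is known: subtract the optimum value for the transmitted modulation from the SNR that detector needs to reach the same error probability.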

Five information bits are used to select an in-phase waveform IF (t; ·) and a quadrature waveform QF (t; ·) which are transmitted during an interval of 2Tb seconds. The next waveform is determined by clocking in two new bits (and discarding the two oldest bits) to form a new group of 5 bits that select the waveform. This slightly different, but equivalent, point of view represents the memory in the modulated carrier using a sliding window that is five bits wide and strides through the input data 2 bits at a time. Note that IF (t; ·) is drawn from a set of 16 waveforms (the sI,m (t) of (11) which are uniquely specified using 4 bits) but that the addressing uses 5 bits. This is explained as follows: When FQPSK is expressed in the form (12), the list of 32 waveforms for IF (t; ·) consists of the 16 waveforms sI,m (t) of (11) each repeated twice. The same applies to QF (t; ·) together with sQ,m (t) of (11). This “double listing” is required to accommodate the waveform indexing scheme to produce the proper 32 waveforms. The representation is largely notional and is used to conceptualize the relationships between FQPSK and SOQPSK. The asymptotic performance of maximum likelihood detection of FQPSK has been analyzed by Simon [14]. In concept, maximum likelihood detection of FQPSK organizes the outputs of 32 matched filters (one filter matched to each of the 32 possible transmitted waveforms) in a trellis and performs maximum likelihood sequence estimation. The standard union bound composed of pairwise error probabilities is used to quantify the bit error rate performance of this modulation. The minimum distance error events span three trellis states. Over this span, every trellis state is reachable by every trellis state via two paths. Thus there are 32 × 32 × 2 = 2048 such

NELSON et al.: NEAR OPTIMAL COMMON DETECTION TECHNIQUES FOR SHAPED OFFSET QPSK AND FEHER’S QPSK


[Fig. 2 plot: bit error rate (logarithmic axis, 10^-6 to 10^-1) versus Eb/N0 (dB), 4 to 12 dB. Curves: Theory (Tx=SOQPSK-TG), Theory (Tx=FQPSK-JR), I&D (Tx=SOQPSK-TG), Common det. filter (Tx=SOQPSK-TG), I&D (Tx=FQPSK-JR), Common det. filter (Tx=FQPSK-JR).]

Fig. 2. Bit error rates for SOQPSK-TG and FQPSK-JR for the integrate-and-dump (I&D) detector and the common symbol-by-symbol detector along with the theoretical curves for each modulation. For the I&D detector ΔS = 2.0 dB and ΔF = 2.2 dB while the average matched filter detector has ΔS = 1.5 dB and ΔF = 1.6 dB.

paths over three steps. Each of these paths has one competing path associated with it that contributes a single bit error. 1024 of these path pairs are separated by a normalized Euclidean distance of 1.56 and 1024 of these path pairs are separated by a normalized Euclidean distance of 2.59. As such, the probability of bit error is well approximated by

$$P_b \approx 1 \times \frac{1024}{2048}\, Q\!\left(\sqrt{1.56\,\frac{E_b}{N_0}}\right) + 1 \times \frac{1024}{2048}\, Q\!\left(\sqrt{2.59\,\frac{E_b}{N_0}}\right). \tag{13}$$

(Why the coefficients are expressed this way will become evident in Section III.) Using this expression, $P_b = 10^{-5}$ is achieved at SNR = 10.32 dB.

C. Symbol-by-Symbol Detection

SOQPSK-TG and FQPSK-JR are considered to be interoperable because of their essentially identical bandwidth and similar bit error rate performance with symbol-by-symbol detection using an integrate-and-dump detection filter. Using an unshaped OQPSK detector with FQPSK (and its variants) is natural since FQPSK is defined as an offset modulation with data dependent pulse shapes. The use of this detection technique with SOQPSK-TG is motivated by the well established connection between CPM with modulation index $h = 1/2$ and OQPSK [15]–[18]. Symbol-by-symbol detection has been thoroughly investigated for SOQPSK-TG by Geoghegan [10] and for FQPSK by Simon [14]. Our own simulation results are shown in Figure 2 where we see that ΔS ≈ 2.0 dB and ΔF ≈ 2.2 dB. Symbol-by-symbol detection with better detection filters has also been investigated for SOQPSK-TG in [10] and for FQPSK in [14]. The XTCQM representations for both modulations can be used to define detection filters for use with a symbol-by-symbol detector as explained in Appendix A. The bit error rate performance of SOQPSK-TG and FQPSK-JR using the improved detection filter is also plotted in Figure 2. Observe that the improved performance reduces ΔS to about 1.5 dB and ΔF to about

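[Fig. 3 diagram labels: in-phase component with matched filters I(−t; 0), …, I(−t; N_X − 1); quadrature component with matched filters Q(−t; 0), …, Q(−t; N_X − 1); sampling at t = 2kT_b; trellis with 2 × N_X states.]

Fig. 3. Block diagram of the maximum likelihood XTCQM detector. Each filter is a real-valued filter of length $2T_b$. The indexes in the filter impulse responses are the decimal equivalents of the binary symbol patterns that define the waveforms.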

1.6 dB. The performance of this approach falls well short of that of maximum likelihood detection since symbol-by-symbol detection ignores the memory inherent in the waveforms. The fact that these losses are still significant motivates the search for common detectors that perform better than the symbol-by-symbol detector.

III. COMMON XTCQM DETECTOR

A generic maximum-likelihood XTCQM detector is illustrated in Figure 3. The in-phase component of the noisy received waveform is filtered by a bank of filters matched to the $N_X$ possible in-phase waveforms. Likewise, the quadrature component of the received waveform is filtered by a bank of filters matched to the possible quadrature waveforms. These matched filter outputs are sampled, once per symbol, and used by a maximum likelihood sequence estimator operating on a trellis with $2N_X$ states. For FQPSK and FQPSK-JR, $N_X = 16$. In order to formulate such a detector for SOQPSK-TG, an XTCQM representation of SOQPSK-TG is needed. This representation, developed in Appendix A, is of the form shown in (14). This representation consists of 1024 in-phase waveforms $I_S(t)$ and 1024 quadrature waveforms $Q_S(t)$ indexed by a sliding window that is 11 bits wide and strides through the bits 2 at a time. The XTCQM representation of SOQPSK-TG requires 2048 complex waveforms and is an exact representation of this modulation. The maximum-likelihood XTCQM detector is that of Figure 3 with $N_X = 1024$. Since this detector performs maximum likelihood detection, the bit error rate performance of this detector is given by (10). In the case of the optimum XTCQM detector, the constant that scales each term in (10), 1/2, is obtained as follows. The minimum distance error event spans 6 trellis stages and produces a single bit error. Over this interval, each of the 2048 states is reachable by every state via two paths, so there are a total of $2048 \times 2048 \times 2 = 2^{23}$ possible pairs of paths to consider. $2^{22}$ of these path pairs are separated by a normalized Euclidean distance of 1.60 and the other $2^{22}$ are separated by a normalized Euclidean distance of


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

$$s_S(t) = \sum_k I_S(t - kT_s; a_{2k+1}, \ldots, a_{2k-9}) + j Q_S(t - kT_s; a_{2k+1}, \ldots, a_{2k-9}) \tag{14}$$

2.58. Thus (10) can be rewritten as

$$P_b \approx \frac{1 \times 2^{22}}{2^{23}}\, Q\!\left(\sqrt{1.60\,\frac{E_b}{N_0}}\right) + \frac{1 \times 2^{22}}{2^{23}}\, Q\!\left(\sqrt{2.58\,\frac{E_b}{N_0}}\right). \tag{15}$$

This form of the expression for the probability of bit error will be helpful in the discussion below about suboptimal detectors. Since the number of matched filters and the number of trellis states are different for FQPSK-JR and SOQPSK-TG, the foregoing formulation does not permit the detector in Figure 3 to operate as a common detector for these modulations. This issue is resolved by identifying a set of 32 waveforms suitable for use with both FQPSK-JR and SOQPSK-TG in Figure 3. The first step is to reduce the number of waveforms required to represent SOQPSK-TG to 32. This is achieved using the averaging technique described in Appendix A to produce (16). The in-phase waveform $\tilde I_S(t;\cdot)$ is drawn from a set of 16 waveforms while the quadrature waveform $\tilde Q_S(t;\cdot)$ is drawn from a different set of 16 waveforms. The total number of complex-valued waveforms is 32, and 5 bits are used to select the waveform. Note that with a simple redefinition of the bit indexes, the representation (16) is identical in form to the XTCQM representation of FQPSK-JR given by (12). Since the sets of waveforms from which the waveforms $I_F(t;\cdot)$ and $\tilde I_S(t;\cdot)$ are drawn are different, a common set must be identified in order to produce a common detector of the form shown in Figure 3. The common waveforms define the matched filters and trellis connections. Three possibilities were explored: 1) the FQPSK-JR waveforms $I_F(t;\cdot)$ and $Q_F(t;\cdot)$; 2) the SOQPSK-TG waveforms $\tilde I_S(t;\cdot)$ and $\tilde Q_S(t;\cdot)$; and 3) average waveforms

$$I_{\mathrm{avg}}(t;\cdot) = \frac{I_F(t;\cdot) + \tilde I_S(t;\cdot)}{2}, \qquad Q_{\mathrm{avg}}(t;\cdot) = \frac{Q_F(t;\cdot) + \tilde Q_S(t;\cdot)}{2}. \tag{17}$$

A number of other possibilities could be envisioned (e.g., waveforms that minimize the average squared error). However, the performance results, summarized below, show that a detector based on the average waveforms is on the order of 1/10 of a dB from optimum. This suggests there is very little to be gained by using waveforms based on more elaborate criteria. Since the set of waveforms used by the detector is different from the set of waveforms used by the modulator, the mismatched receiver analysis technique, described in [13], [19], [20], can be used to evaluate the performance of each of these options. Due to space limitations, only the performance of the common detector based on the average waveforms is reported here since it was the best of the three.⁴

When FQPSK-JR is produced by the modulator, the mismatch is a result of the fact that the detector's model for the transmitted signal is based on the 32 waveforms defined by (17) rather than on the actual 32 FQPSK-JR waveforms. Each error event included in (13) involves a pair of waveforms defined by bit sequences $\mathbf{a}_1$ and $\mathbf{a}_2$. Let $s(t;\mathbf{a}_1)$ and $s(t;\mathbf{a}_2)$ represent the corresponding signals produced by the transmitter and let $\tilde s(t;\mathbf{a}_1)$ and $\tilde s(t;\mathbf{a}_2)$ represent the corresponding signals used by the detector based on its set of waveforms. The probability of the error event is

$$Q\!\left(\sqrt{\tilde d^{\,2}\,\frac{E_b}{N_0}}\right)$$

where

$$\tilde d = \frac{1}{\sqrt{2E_b}}\,\frac{d_1 - d_2}{\sqrt{d_3}} \tag{18}$$

and

$$d_1 = \int_{\mathbb{R}} \left|s(t;\mathbf{a}_1) - \tilde s(t;\mathbf{a}_2)\right|^2 dt \tag{19}$$

$$d_2 = \int_{\mathbb{R}} \left|s(t;\mathbf{a}_1) - \tilde s(t;\mathbf{a}_1)\right|^2 dt \tag{20}$$

$$d_3 = \int_{\mathbb{R}} \left|\tilde s(t;\mathbf{a}_1) - \tilde s(t;\mathbf{a}_2)\right|^2 dt. \tag{21}$$

⁴ For a detailed performance analysis of all three common XTCQM detectors including plots of the distance spectra, see [21].
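The modified distance of (18)–(21) can be approximated numerically from sampled waveforms. The sketch below is illustrative only: the helper names, the Riemann-sum integration, and the rectangular antipodal test signals are assumptions chosen so the result can be checked by hand, and they are not the FQPSK-JR or SOQPSK-TG waveforms.

```python
import math

def squared_difference(x, y, dt):
    """Riemann-sum approximation of the integral of |x(t) - y(t)|^2 dt."""
    return sum(abs(a - b) ** 2 for a, b in zip(x, y)) * dt

def modified_distance(s1, st1, st2, dt, eb):
    """Mismatched-receiver distance of (18).

    s1  : samples of the transmitted signal s(t; a1)
    st1 : samples of the detector's model for a1
    st2 : samples of the detector's model for a2
    """
    d1 = squared_difference(s1, st2, dt)    # (19)
    d2 = squared_difference(s1, st1, dt)    # (20)
    d3 = squared_difference(st1, st2, dt)   # (21)
    return (d1 - d2) / (math.sqrt(2.0 * eb) * math.sqrt(d3))

# Hypothetical test case: unit-amplitude antipodal rectangular pulses of
# duration 1 s, so the energy per bit is 1.
n = 100
dt = 1.0 / n
s_plus = [1.0 + 0j] * n
s_minus = [-1.0 + 0j] * n

# Matched detector: the detector's waveforms equal the transmitted ones.
d_matched = modified_distance(s_plus, s_plus, s_minus, dt, eb=1.0)

# Mismatched detector whose waveforms are scaled copies of the true ones.
d_scaled = modified_distance(s_plus, [0.9 * v for v in s_plus],
                             [0.9 * v for v in s_minus], dt, eb=1.0)
print(d_matched ** 2, d_scaled ** 2)
```

In the matched case $d_2 = 0$ and $d_1 = d_3$, so (18) reduces to the ordinary normalized Euclidean distance; for this antipodal example the squared distance is 2.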

The 1024 pairs of sequences associated with the minimum distance error event in (13) produce a set of 1024 modified distances denoted $\tilde d_l$ (for $l = 0, \ldots, 1023$) consisting of 128 unique values that range from 1.41 to 1.70. In the same way, the 1024 pairs of sequences associated with the error event quantified by the second term in (13) produce a set of 1024 modified distances $\tilde d_m$ (for $m = 0, \ldots, 1023$) with 128 unique values ranging from 2.42 to 2.76. The probability of error is

$$P_b \approx \frac{1}{2048} \sum_{l=0}^{1023} Q\!\left(\sqrt{\tilde d_l^{\,2}\,\frac{E_b}{N_0}}\right) + \frac{1}{2048} \sum_{m=0}^{1023} Q\!\left(\sqrt{\tilde d_m^{\,2}\,\frac{E_b}{N_0}}\right). \tag{22}$$

A plot of this expression, along with computer simulations, is shown in Figure 4. This detector achieves $P_b = 10^{-5}$ at SNR = 10.37 dB. Thus, ΔF = 0.05 dB.

When SOQPSK-TG is produced by the modulator, the mismatch is a consequence of the fact that the detector uses only 32 waveforms (the full representation requires 2048 waveforms). As before, all 32 trellis states can be reached by all the trellis states via two paths over three trellis stages. These $32 \times 32 \times 2 = 2048$ trellis paths correspond to $2^3 \times 2048 \times 2^3 = 2^{17}$ sequences that the modulator is capable of producing. (Recall that the averaging process eliminated three bits from each end of the 11-bit sequence that defined the full SOQPSK-TG waveforms.) Pairing each of the possible transmitted paths with the corresponding trellis path using the modified distance measure (18) produces a set of $2^{16}$ distances denoted $\tilde d_l^{\,2}$ for $l = 0, 1, \ldots, 2^{16} - 1$ ranging from 1.40 to 1.81 that correspond to the error events quantified by the first term in (15) and a set of $2^{16}$ distances denoted $\tilde d_m^{\,2}$ for $m = 0, 1, \ldots, 2^{16} - 1$ ranging from 2.45 to 2.71 that correspond


$$s_S(t) \approx \sum_k \tilde I_S(t - kT_s; a_{2k-2}, \ldots, a_{2k-6}) + j \tilde Q_S(t - kT_s; a_{2k-2}, \ldots, a_{2k-6}) \tag{16}$$

[Fig. 4 plot: bit error rate (logarithmic axis, 10^-6 to 10^-1) versus Eb/N0 (dB), 3 to 11 dB. Curves: Optimum SOQPSK-TG, Optimum FQPSK-JR, SOQPSK-TG analysis, FQPSK-JR analysis, SOQPSK-TG sim., FQPSK-JR sim.]

Fig. 4. Probability of bit error versus SNR (Eb /N0 in dB) for the common XTCQM detector for both SOQPSK-TG and FQPSK-JR. For this detector ΔS = 0.14 dB and ΔF = 0.05 dB.

to the error events quantified by the second term in (15). Thus the probability of error is approximated by

$$P_b \approx \frac{1}{2^{17}} \sum_{l=0}^{2^{16}-1} Q\!\left(\sqrt{\tilde d_l^{\,2}\,\frac{E_b}{N_0}}\right) + \frac{1}{2^{17}} \sum_{m=0}^{2^{16}-1} Q\!\left(\sqrt{\tilde d_m^{\,2}\,\frac{E_b}{N_0}}\right). \tag{23}$$

A plot of this expression, along with computer simulations, is shown in Figure 4. This detector achieves $P_b = 10^{-5}$ at SNR = 10.35 dB. Thus, ΔS = 0.14 dB.

The common XTCQM detector has the structure shown in Figure 3. As explained above, there are $N_X = 32$ complex waveforms in the common XTCQM representation of SOQPSK-TG and FQPSK-JR. However, there are only 16 unique in-phase waveforms (each is repeated once). In addition, these 16 waveforms consist of 8 waveforms and their negatives. The same is true for the quadrature waveforms. Furthermore, the 8 quadrature waveforms are shifted versions of the 8 in-phase waveforms. By exploiting these symmetries, one can reduce $N_X$ to 8 in Figure 3. Consequently, the common XTCQM detector has 16 real-valued length-$2T_b$ matched filters and a 16-state trellis, the same as the XTCQM detector described by Simon in [14], [22] for FQPSK.

IV. COMMON CPM DETECTOR

Another candidate for the common detector is the CPM detector. SOQPSK-TG is defined as a CPM, as explained in Section II-A. In order to formulate a common CPM detector, a CPM representation of FQPSK-JR is needed. A CPM representation of a different (and proprietary) variety of FQPSK known as FQPSK-B was presented in [23] and [24], but that work did not result in a representation that is compatible with the CPM representation of SOQPSK described in Section II-A. In Appendix B we present a representation that is compatible with SOQPSK where we show that FQPSK-JR

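[Fig. 5 diagram labels: in-phase/quadrature input; $2^{L+1} - 1$ matched filters exp{−jφ(−t; 0)}, …, exp{−jφ(−t; $2^{L+1} - 2$)}; sampling at t = kT_b; compute phase state (θ_n); trellis with $4 \times 2^{L-1}$ states.]

Fig. 5. The maximum likelihood CPM detector for a length-L frequency pulse and ternary symbols constrained by (5). Each filter is a complex-valued filter of length $T_b$. The indexes in the filter impulse responses are the decimal equivalents of the binary symbol patterns that define the waveforms.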

can be described in the form given by (3)–(4) for SOQPSK with the same constrained ternary alphabet (5) and a length-$2T_b$ frequency pulse $g_F(t)$ given by (44). The two most common approaches for detecting CPM signals are the detector based on the traditional complex exponential representation of CPM [13] and a detector based on a decomposition of CPM [25]–[31]. In the following we explore how detectors following both of these approaches can be modified to form common CPM detectors for SOQPSK-TG and FQPSK-JR.

A. Common Detector Based on Complex Exponential Representation

With FQPSK-JR represented (approximately) as a CPM signal, we can develop a common CPM detector for SOQPSK-TG and FQPSK-JR. The structure for the maximum likelihood detector for CPM with a length-$L$ frequency pulse and ternary symbols, whose relationship to the binary symbols is given by (5), is illustrated in Figure 5. The complex-baseband signal is processed by a bank of $2^{L+1} - 1$ length-$T_b$ matched filters [32] that correlate the received signal with the possible waveforms defined by the correlative state vector. The matched filter (or correlator) outputs are sampled every $T_b$ seconds in synchronism with the ternary symbol boundaries and processed by a maximum likelihood sequence estimator operating on a trellis with $4 \times 2^{L-1}$ states. (Note that with the ternary alphabet (5), there are 4 phase states.) In order for this structure to operate as a common detector, the representation for both waveforms must be based on a common frequency pulse. Since $L = 8$ for SOQPSK-TG and $L = 2$ for the FQPSK-JR approximation, the first step is to truncate the SOQPSK-TG frequency pulse $g_S(t)$ to span $2T_b$. This is a common technique for reducing the complexity of partial response CPM [19]. The next step is to identify a suitable frequency pulse upon which the detector can be based.

As with the case for the common XTCQM detector, several options could be envisioned. The three explored in this work are the truncated SOQPSK-TG frequency pulse

$$g_1(t) = \begin{cases} g_S(t + 3T_b) & 0 \le t \le 2T_b \\ 0 & \text{otherwise,} \end{cases} \tag{24}$$

the frequency pulse of the CPM approximation for FQPSK-JR

$$g_2(t) = g_F(t), \tag{25}$$

and the average of the first two

$$g_3(t) = \frac{g_1(t) + g_2(t)}{2}. \tag{26}$$

All three of these pulses have length $L = 2$ bit times. There are seven pairs of $(\alpha_{n-1}, \alpha_n)$ that the receiver can encounter,⁵ so seven length-$T_b$ complex matched filters are required. Exploiting the symmetries of cosine and sine, the filterbank may be implemented using 12 real-valued, length-$T_b$ filters and two integrate-and-dump filters. The mismatched receiver analysis used in Section III can be used here to determine which of the three candidates for the common detector is the best option. A detailed analysis can be found in [21]. The results of that analysis are summarized here. In all cases, the error event given by (9) is the one of interest. The probability of bit error for each case is of the form

$$P_b \approx \frac{1}{M} \sum_{l=0}^{M_1 - 1} Q\!\left(\sqrt{\tilde d_l^{\,2}\,\frac{E_b}{N_0}}\right) + \frac{1}{M} \sum_{m=0}^{M_2 - 1} Q\!\left(\sqrt{\tilde d_m^{\,2}\,\frac{E_b}{N_0}}\right) \tag{27}$$

where $M$ is the total number of pairs of bit sequences that need to be considered, $M_1$ is the number of pairs whose distances are close to 1.56 (the minimum distance in the optimum detector), and $M_2$ is the number of pairs whose distances are close to 2.59 (the next smallest distance in the optimum detector). When SOQPSK-TG is transmitted, $M = 256$ and $M_1 = M_2 = 128$ in (27). The detector based on frequency pulse $g_2(t)$ has the best distance properties among the three options, with the 128 sequences in the first sum of (27) having modified distances ranging from 1.26 to 1.93 and the 128 sequences in the second sum of (27) having modified distances ranging from 2.29 to 2.86. When FQPSK-JR is transmitted, $M = 64$ and $M_1 = M_2 = 32$ (these numbers are smaller than they are for SOQPSK-TG because the frequency pulse of FQPSK-JR has $L = 2$ rather than $L = 8$). The detector based on the frequency pulse $g_2(t)$ has the best modified distance properties, with 32 pairs of waveforms having distances ranging from 1.54 to 1.58 and 32 pairs of waveforms having distances ranging from 2.57 to 2.61. Plots of (27) for both modulations are illustrated in Figure 6. Simulation results are also included to demonstrate how accurate the approximations are. The common CPM detector based on $g_2(t)$ achieves ΔS = 0.21 dB and ΔF = 0.01 dB.

⁵ There are a total of $3^2 = 9$ different combinations of two ternary symbols, but the combinations (−1, +1) and (+1, −1) are not allowed as a result of the constraints that (5) places on the sequence of ternary symbols.
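The count of seven symbol pairs, and more generally the $2^{L+1} - 1$ matched filters of Figure 5, can be checked by brute-force enumeration. The sketch below assumes the bit-to-ternary mapping $\alpha_i = (-1)^{i+1} a_{i-1}(a_i - a_{i-2})/2$, which is the form that appears inside the sum of (37); the function names are hypothetical.

```python
from itertools import product

def ternary_symbol(i, a_im2, a_im1, a_i):
    """alpha_i = (-1)^(i+1) * a_{i-1} * (a_i - a_{i-2}) / 2, bits in {-1, +1}."""
    # The product is always -2, 0 or +2, so integer division by 2 is exact.
    return (-1) ** (i + 1) * a_im1 * (a_i - a_im2) // 2

def allowed_sequences(length):
    """Set of all ternary sequences of the given length the mapping can produce."""
    sequences = set()
    for first in (2, 3):  # cover both parities of the first symbol index
        for bits in product((-1, 1), repeat=length + 2):
            sequences.add(tuple(
                ternary_symbol(first + j, bits[j], bits[j + 1], bits[j + 2])
                for j in range(length)))
    return sequences

for L in (2, 8):
    n_filters = len(allowed_sequences(L))   # compare with 2^(L+1) - 1
    n_states = 4 * 2 ** (L - 1)             # trellis states quoted for Fig. 5
    print(L, n_filters, n_states)
```

For $L = 2$ the enumeration yields the seven pairs of footnote 5 and an 8-state trellis; for $L = 8$ it yields 511 sequences, which illustrates why the SOQPSK-TG pulse is truncated before a common detector is formed.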

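[Fig. 6 plot: bit error rate (logarithmic axis, 10^-6 to 10^-1) versus Eb/N0 (dB), 3 to 11 dB. Curves: Optimum SOQPSK-TG, Optimum FQPSK-JR, SOQPSK-TG analysis, FQPSK-JR analysis, SOQPSK-TG sim., FQPSK-JR sim.]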

Fig. 6. Probability of bit error for SOQPSK-TG and FQPSK-JR for the common CPM detector (based on the FQPSK-JR frequency pulse) along with the theoretical curves for each modulation. Also shown are simulation results for this detector with both modulations. The common CPM detector has ΔS = 0.21 dB and ΔF = 0.01 dB.
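Analytical curves of the form (27) can be evaluated directly once the two sets of squared modified distances are known. The sketch below is a generic evaluator under that assumption; the distance spectra themselves are reported in [21] and are not reproduced here, so the usage example substitutes the degenerate case in which every pair has the optimum distance.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_mismatched(ebn0_db, d2_first, d2_second, m_total):
    """Evaluate a sum of the form (27).

    d2_first, d2_second : squared modified distances for the two sums
    m_total             : total number of sequence pairs M
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    total = sum(q_function(math.sqrt(d2 * ebn0)) for d2 in d2_first)
    total += sum(q_function(math.sqrt(d2 * ebn0)) for d2 in d2_second)
    return total / m_total

# Placeholder spectra (NOT the values from [21]): all 32 + 32 pairs sit
# exactly at the optimum distances 1.56 and 2.59 quoted in the text.
pb_degenerate = pb_mismatched(10.32, [1.56] * 32, [2.59] * 32, m_total=64)
print(pb_degenerate)
```

With these placeholder inputs the sum collapses to the two-term expression (13), which gives a simple consistency check before real distance spectra are substituted.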

B. Common Detector Based on PAM Representation

It is well known that CPM signals can be decomposed (either exactly or approximately) as sums of linearly modulated signals [25], [27], [29], [31], [33]. Since SOQPSK-TG is defined as a CPM, and with the CPM approximation of FQPSK-JR, the PAM representations of these modulations provide another option for a common detector. The PAM decomposition of CPM signals is well known to provide an alternative and often greatly simplified structure for detecting CPM signals. The PAM representation of binary CPM was first introduced by Laurent [25] and was used as the basis for simplified CPM detectors by Kaleh [26]. The representation was later extended to M-ary CPM (with M even) by Mengali and Morelli [27] and applied to the detection of CPM by Colavolpe and Raheli [28]. In addition, Huang and Li presented a different approach for decomposing M-ary CPM (M even) based on a switched linear modulation model [33]. CPM detectors based on this model were presented in [34]. The non-binary multi-index case was explored by Perrins and Rice [29]. Recently the PAM decomposition for M-ary CPM (with M odd) has been developed [31]. This PAM decomposition was applied to SOQPSK-TG in [35]. Because SOQPSK-TG and the CPM approximation of FQPSK-JR are ternary CPM, we will use the Perrins-Rice decomposition to illustrate the technique for developing a common PAM detector.

A generic block diagram of the maximum likelihood detector for CPM based on the PAM representation is illustrated in Figure 7. A bank of filters, each matched to one of the $N_P$ constituent PAM pulses, processes the received signal at complex baseband. The filter outputs are sampled once per bit and used by a maximum likelihood sequence estimator operating on a trellis with $S_P$ states. In order to formulate a common PAM detector, it is necessary to approximate both SOQPSK-TG and FQPSK-JR by the same number of PAM pulses. Suitable pulses for this approximation must also be identified.
[Fig. 7 diagram labels: in-phase/quadrature input; matched filters $c_0(-t)$, $c_1(-t)$, …, $c_{N_P-1}(-t)$; sampling at t = kT_b; trellis with $S_P$ states.]

Fig. 7. The maximum likelihood detector for CPM based on the equivalent PAM representation. Each filter is a real-valued filter with varying lengths as described in the text.

The PAM decomposition of SOQPSK-TG consists of representing the continuous phase signal as the sum of pulse amplitude modulated signals. Thus the nonlinear modulation is transformed into a linear modulation where the modulating symbols (the so-called pseudo-symbols) are related to the original data symbols in a nonlinear way. The PAM decomposition of SOQPSK-TG is given by

$$s_S(t) = \sum_{k=0}^{4373} \sum_i b_{k,i}\, c_{k,S}(t - iT_b) \tag{28}$$

where $b_{k,i}$ are the pseudo-symbols, which are a function of the data bits as explained in [31], and $c_{k,S}(t)$ are the PAM pulses for the decomposition of SOQPSK-TG. The number of terms in the sum in (28) is a function of the duration of the frequency pulse and is explained in detail in [31]. The PAM pulses have varying lengths ranging from $(L + 1)T_b$ (which is $9T_b$ in this case) to $T_b$. The large number of terms in the sum in (28) is due to the length of the SOQPSK-TG frequency pulse $g_S(t)$ and would appear to render this representation prohibitively complex. However, as is often the case with the PAM decomposition of partial response CPM, the vast majority of the PAM pulses $c_{k,S}(t)$ contain essentially no energy and hence can be neglected. In fact, $s_S(t)$ is well approximated as the sum of only two pulses, which are known as the principal pulses [35]. As a result, (28) can be rewritten as

$$s_S(t) \approx \sum_i b_{0,i}\, c_{0,S}(t - iT_b) + b_{1,i}\, c_{1,S}(t - iT_b). \tag{29}$$

The principal pulses for SOQPSK-TG are plotted in Figure 8. Strictly speaking, the pulses $c_{0,S}(t)$ and $c_{1,S}(t)$ have lengths $9T_b$ and $8T_b$, respectively. However, as can be seen in Figure 8, these pulses are approximately zero over most of their durations. We will exploit this observation when forming the common PAM detector below. The PAM decomposition for FQPSK-JR is

$$s_F(t) = \sum_{k=0}^{5} \sum_i b_{k,i}\, c_{k,F}(t - iT_b). \tag{30}$$

There are six pulses in the PAM representation of FQPSK-JR, but four of them contain negligible energy, so the PAM

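[Fig. 8 plot: amplitude (0 to 0.6) versus t/T_b (0 to 9). Curves: $c_{0,S}(t)$ and $c_{1,S}(t)$ for SOQPSK-TG; $c_{0,F}(t)$ and $c_{1,F}(t)$ for FQPSK-JR.]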

Fig. 8. Principal PAM pulses for SOQPSK-TG and FQPSK-JR. The lengths of these pulses are L + 1 and L, so the SOQPSK-TG pulses have lengths 9Tb and 8Tb while the FQPSK-JR pulses have lengths 3Tb and 2Tb . Note that for the SOQPSK-TG pulses, even though, strictly speaking, the pulses have lengths of 9Tb and 8Tb , they are well approximated as being nonzero only over lengths of 3Tb and 2Tb , respectively. This observation will be exploited when formulating the common PAM detector for these two modulations.

decomposition of FQPSK-JR is well approximated as

$$s_F(t) \approx \sum_i b_{0,i}\, c_{0,F}(t - iT_b) + b_{1,i}\, c_{1,F}(t - iT_b). \tag{31}$$

The principal pulses $c_{0,F}(t)$ and $c_{1,F}(t)$ for FQPSK-JR have lengths $3T_b$ and $2T_b$, respectively, and are shown in Figure 8. The trellis for the SOQPSK-TG-based PAM detector is identical to the trellis for the FQPSK-JR-based PAM detector. This is because the relationship between the pseudo-symbols and the data bits is the same for both modulations. The trellis has $S_P = 4$ states and is described in [35]. This trellis is the obvious choice for the common detector trellis.

The remaining task in formulating the common PAM detector is the determination of the common matched filters. As explained above, the PAM pulses for SOQPSK-TG are much longer than those for FQPSK-JR. However, as also explained above, the SOQPSK-TG PAM pulses are essentially equal to zero over the majority of their durations. In fact, the first and second principal SOQPSK-TG pulses are very well approximated as being nonzero over lengths of $3T_b$ and $2T_b$, respectively, which are the lengths of the principal FQPSK-JR PAM pulses. Consequently, we consider the following options for the PAM pulses for the common PAM detector: Option 1, the truncated SOQPSK-TG based PAM pulses

$$c_{0,1}(t) = \begin{cases} c_{0,S}(t + 3T_b) & 0 \le t \le 3T_b \\ 0 & \text{otherwise} \end{cases} \qquad c_{1,1}(t) = \begin{cases} c_{1,S}(t + 4T_b) & 0 \le t \le 2T_b \\ 0 & \text{otherwise,} \end{cases} \tag{32}$$

Option 2, the FQPSK-JR based PAM pulses

$$c_{0,2}(t) = c_{0,F}(t), \qquad c_{1,2}(t) = c_{1,F}(t), \tag{33}$$

and Option 3, pulses which are the average of those two

$$c_{0,3}(t) = \frac{c_{0,1}(t) + c_{0,2}(t)}{2}, \qquad c_{1,3}(t) = \frac{c_{1,1}(t) + c_{1,2}(t)}{2}. \tag{34}$$
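On sampled pulses, the three options in (32)–(34) reduce to a slice and an element-wise average. The sketch below shows only these two operations; the helper names, the sampling rate, and the smooth placeholder shapes are assumptions and do not reproduce the principal pulses of Fig. 8.

```python
import math

def truncate(pulse, start_bits, length_bits, sps):
    """Keep `length_bits` bit times of a sampled pulse, starting `start_bits` in."""
    begin = start_bits * sps
    return pulse[begin:begin + length_bits * sps]

def average(p, q):
    """Element-wise average of two equal-length sampled pulses."""
    if len(p) != len(q):
        raise ValueError("pulses must have the same number of samples")
    return [(a + b) / 2.0 for a, b in zip(p, q)]

sps = 8  # samples per bit time, an arbitrary choice for the sketch

# Hypothetical placeholder shapes: a 9-bit pulse concentrated near its
# centre and a 3-bit pulse. These are NOT the pulses of the decomposition.
long_pulse = [math.sin(math.pi * n / (9 * sps)) ** 8 for n in range(9 * sps)]
short_pulse = [math.sin(math.pi * n / (3 * sps)) ** 2 for n in range(3 * sps)]

c0_option1 = truncate(long_pulse, start_bits=3, length_bits=3, sps=sps)  # as in (32)
c0_option2 = short_pulse                                                 # as in (33)
c0_option3 = average(c0_option1, c0_option2)                             # as in (34)
print(len(c0_option1), len(c0_option2), len(c0_option3))
```

The slice drops the sample at the closing endpoint of the interval, a discretization choice made here for simplicity.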

TABLE I
COMPARISON OF COMMON DETECTORS. NOTE THAT THE DETECTION FILTER COUNTS ARE GIVEN IN TERMS OF EQUIVALENT REAL-VALUED, LENGTH-Tb FILTERS IN ORDER TO ALLOW A CONSISTENT COMPARISON BETWEEN THE DETECTORS.

                          Optimal   Sym.    XTCQM   CPM     PAM
SNR (Eb/N0 in dB) for Pb = 10^-5
  FQPSK-JR                10.32     12.0    10.37   10.33   10.41
  SOQPSK-TG               10.21     11.7    10.35   10.42   10.32
Detection Efficiency Loss (dB)
  ΔF                      -         1.6     0.05    0.01    0.09
  ΔS                      -         1.5     0.14    0.21    0.11
Detector Complexity
  Detection filters       -         1       32      12      5
  Trellis states          -         1       16      8       4

[Fig. 9 plot: bit error rate (logarithmic axis, 10^-6 to 10^-1) versus Eb/N0 (dB), 3 to 11 dB. Curves: Optimum SOQPSK-TG, Optimum FQPSK-JR, SOQPSK-TG analysis, FQPSK-JR analysis, SOQPSK-TG sim., FQPSK-JR sim.]

Fig. 9. Probability of bit error for the common PAM detector based on the SOQPSK-TG PAM pulses. Also shown are simulation results for both SOQPSK-TG and FQPSK-JR detected with this common PAM detector. The common PAM detector has ΔS = 0.11 dB and ΔF = 0.09 dB.
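The "Optimal" SNR values of Table I and the detection efficiency losses can be cross-checked with a few lines of arithmetic. The sketch below evaluates the two-term expressions (13) and (15) at the quoted SNRs and recomputes the losses as differences from the optimum; the function and variable names are illustrative, and because the distances are quoted to two decimals the computed error rates are only expected to be close to $10^{-5}$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pb_two_term(ebn0_db, d2_min, d2_next):
    """Two-term approximation with weights 1/2, as in (13) and (15)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return (0.5 * q_function(math.sqrt(d2_min * ebn0))
            + 0.5 * q_function(math.sqrt(d2_next * ebn0)))

# SNR values (dB) copied from Table I; the symbol-by-symbol column is omitted
# because its entries are quoted to one decimal place only.
snr_db = {
    "FQPSK-JR": {"Optimal": 10.32, "XTCQM": 10.37, "CPM": 10.33, "PAM": 10.41},
    "SOQPSK-TG": {"Optimal": 10.21, "XTCQM": 10.35, "CPM": 10.42, "PAM": 10.32},
}

pb_fqpsk = pb_two_term(snr_db["FQPSK-JR"]["Optimal"], 1.56, 2.59)    # (13)
pb_soqpsk = pb_two_term(snr_db["SOQPSK-TG"]["Optimal"], 1.60, 2.58)  # (15)

# Detection efficiency loss: extra SNR needed relative to the optimum detector.
loss_db = {
    mod: {det: round(snr - row["Optimal"], 2)
          for det, snr in row.items() if det != "Optimal"}
    for mod, row in snr_db.items()
}
print(pb_fqpsk, pb_soqpsk)
print(loss_db)
```

The recomputed losses agree with the ΔF and ΔS rows of Table I for the XTCQM, CPM, and PAM detectors.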

All of the options listed above have lengths $3T_b$ and $2T_b$ for the two PAM pulses. As a result, the common PAM detector has 2 real-valued matched filters which are the equivalent of 5 real-valued length-$T_b$ matched filters. The common PAM detector is a special case of the detector of Figure 7 with $N_P = 2$ and $S_P = 4$. A mismatched receiver analysis can be used to determine which set of the candidate PAM pulses listed above is the best option for the common PAM detector. The analysis for the PAM detector differs from the analysis for the XTCQM or CPM detector because the distance measure given by (18) does not apply, as explained in [30]. The modified distance measure $\acute d$ that is appropriate for PAM detectors, along with the resulting expression for the probability of bit error, is given in [30]. This analysis is given in [21] and the results are summarized here.

An examination of $\acute d$ reveals that the distances for the three options are very similar for both SOQPSK-TG and FQPSK-JR, and, as a result, the bit error rate performance is also very similar. In fact, when SOQPSK-TG is the transmitted modulation the three options vary by less than 0.02 dB at $P_b = 10^{-5}$ and when FQPSK-JR is the transmitted modulation they vary by less than 0.01 dB at that same $P_b$. The PAM detector based on Option 1, the truncated SOQPSK-TG PAM pulses, is very slightly better than the other options, so that detector is chosen as the common PAM detector. The probability of bit error for this detector is plotted in Figure 9. With this common PAM detector ΔS = 0.11 dB and ΔF = 0.09 dB. Computer simulations confirm the mismatched analysis and are also plotted in Figure 9.

V. CONCLUSIONS

SOQPSK-TG and FQPSK-JR share many similarities. We have shown that both may be represented as cross-correlated trellis-coded quadrature modulations and that both may be represented as continuous phase modulations (although the CPM interpretation for FQPSK-JR is only an approximation).
These common views confirm their interoperability and suggest architectures for common detectors: XTCQM, CPM, and PAM based detectors. We have shown how these three detectors can be modified to form common detectors for SOQPSK-TG and FQPSK-JR whose bit error rate performance is a great improvement over the existing common detector based on symbol-by-symbol detection. This improvement comes at the cost of increased complexity in the detector. The attractive feature of these common detectors is that they offer this improved detection efficiency without requiring knowledge of which modulation is employed by the transmitter.

The bit error rate performance and complexity of these detectors are summarized in Table I. All three of the proposed common detectors have bit error rate performances that are fairly close to optimum. The common CPM detector has the best performance for FQPSK-JR (ΔF = 0.01 dB) and the common PAM detector has the best performance for SOQPSK-TG (ΔS = 0.11 dB). When considering the performance of these detectors for both modulations combined, one could look for the minimum arithmetic mean of ΔF and ΔS, the minimum geometric mean of these losses, or one could minimize the maximum loss between ΔF and ΔS. Regardless of which criterion one uses, the PAM detector has the best detection performance, although the difference among the three detectors is quite small. Because the PAM detector has the lowest complexity of the three detectors considered, it appears to be the best choice for the common detection of SOQPSK-TG and FQPSK-JR.

APPENDIX A
XTCQM REPRESENTATION OF SOQPSK-TG

In order to obtain the XTCQM representation of SOQPSK-TG we need to determine the data dependent pulses $I_S(t; a_n, \ldots)$ and $Q_S(t; a_n, \ldots)$ for this modulation which are analogous to those for FQPSK-JR in (12). These waveforms have a duration of $2T_b$, which causes the resulting XTCQM signal trellis to be time invariant.⁶ In order to obtain length-$2T_b$ quadrature waveforms for SOQPSK-TG, we begin by examining the phase (4) during the interval $nT_b \le t \le (n+1)T_b$. $\phi(t, \boldsymbol{\alpha})$ during this interval can be written as

$$\begin{aligned} \phi(t, \boldsymbol{\alpha}) &= 2\pi h \sum_{k=-\infty}^{n-L} \alpha_k\, q_S(t - kT_b) + 2\pi h \sum_{k=n-L+1}^{n} \alpha_k\, q_S(t - kT_b) \\ &= \frac{\pi}{2} \sum_{k=-\infty}^{n-L} \alpha_k + \pi \sum_{k=n-L+1}^{n} \alpha_k\, q_S(t - kT_b), \qquad nT_b \le t \le (n+1)T_b. \end{aligned} \tag{35}$$

⁶ Aulin's quadrature representation of CPM has waveforms with a duration of $T_b$ and the resulting signal trellis is time varying [36]. Rimoldi incorporated a tilted phase into Aulin's length-$T_b$ representation [37] to obtain a time invariant trellis. Simon took a different approach to obtain a quadrature representation with a time invariant trellis: he extended the duration of the waveforms to $2T_b$ [9]. Our quadrature representation of SOQPSK-TG is a combination of Aulin's approach and Simon's approach. It has a time invariant trellis due to the fact that the waveforms have a duration of $2T_b$, similar to Simon, but it does not have an encoder separate from a waveform mapper as Simon's representation does.

The duration of interest can be extended from $(n + 1)T_b$ to $(n + 2)T_b$ by extending the sum for the correlative state vector to produce

$$\phi(t; \boldsymbol{\alpha}) = \frac{\pi}{2} \sum_{l=-\infty}^{n-L} \alpha_l + \pi \sum_{l=n-L+1}^{n+1} \alpha_l\, q_S(t - lT_b), \qquad nT_b \le t \le (n+2)T_b \tag{36}$$

where $n$ is now constrained to be even. We make this constraint explicit by setting $n = 2k$. Inserting (5) into (36) results in

$$\phi(t; \mathbf{a}_{2k}) = \theta_{2k} + \pi \sum_{i=2k-L+1}^{2k+1} (-1)^{i+1}\, \frac{a_{i-1}(a_i - a_{i-2})}{2}\, q_S(t - iT_b) \tag{37}$$

where $\mathbf{a}_{2k} = [a_{2k-L-1}\; a_{2k-L}\; \cdots\; a_{2k+1}]$ and $\theta_{2k} = \frac{\pi}{2} \sum_{i=-\infty}^{2k-L} \alpha_i$ is the phase state. With $L = 8$ there are 9 terms in the sum in (37) and 11 bits that contribute to $\phi(t; \mathbf{a}_{2k})$ during the interval $2kT_b \le t \le (2k + 2)T_b$. The phase state $\theta_{2k}$ does not introduce a dependency on any additional bits. Therefore the number of waveforms is determined solely by the number of bits that contribute to the sum in (37). As a result, 2048 complex waveforms are needed to exactly represent SOQPSK-TG as an XTCQM. The I and Q waveforms are given by

$$I_S(t; \mathbf{a}_{2k}) = \cos\left(\phi(t; \mathbf{a}_{2k})\right), \qquad Q_S(t; \mathbf{a}_{2k}) = \sin\left(\phi(t; \mathbf{a}_{2k})\right). \tag{38}$$

Then the XTCQM representation of SOQPSK-TG can be expressed as

$$s_S(t) = \sum_k I_S(t - kT_s; \mathbf{a}_{2k}) + j Q_S(t - kT_s; \mathbf{a}_{2k}). \tag{39}$$

This shows that even though SOQPSK-TG is defined as a constrained ternary CPM, it can also be viewed as an XTCQM consisting of 2048 waveforms. This view suggests an alternative form for the optimal detector: an XTCQM detector. As explained in Section III, the common detector requires a representation for SOQPSK that uses 32 waveforms instead of the 2048 required by (39). The number of waveforms required by the XTCQM representation of SOQPSK-TG can be reduced by averaging the waveforms that differ in the first and last bits. These are the waveforms that are most similar. (This technique was used by Simon [22] to reduce the number of waveforms required to represent FQPSK.) This averaging technique is illustrated for the inphase waveforms $I_S(t;\cdot)$ below. Application to the quadrature waveforms $Q_S(t;\cdot)$ is straightforward. The number of inphase waveforms is reduced from 2048 to 512 using

$$\begin{aligned} I_{S,512}(t; a_{2k-8},\ldots,a_{2k}) ={}& \tfrac{1}{4} I_S(t; -1, a_{2k-8},\ldots,a_{2k}, -1) + \tfrac{1}{4} I_S(t; -1, a_{2k-8},\ldots,a_{2k}, +1) \\ &+ \tfrac{1}{4} I_S(t; +1, a_{2k-8},\ldots,a_{2k}, -1) + \tfrac{1}{4} I_S(t; +1, a_{2k-8},\ldots,a_{2k}, +1). \end{aligned} \tag{40}$$

Performing the same averaging process on the 512 waveforms results in the 128 waveforms given by

$$\begin{aligned} I_{S,128}(t; a_{2k-7},\ldots,a_{2k-1}) ={}& \tfrac{1}{4} I_{S,512}(t; -1, a_{2k-7},\ldots,a_{2k-1}, -1) + \tfrac{1}{4} I_{S,512}(t; -1, a_{2k-7},\ldots,a_{2k-1}, +1) \\ &+ \tfrac{1}{4} I_{S,512}(t; +1, a_{2k-7},\ldots,a_{2k-1}, -1) + \tfrac{1}{4} I_{S,512}(t; +1, a_{2k-7},\ldots,a_{2k-1}, +1) \end{aligned} \tag{41}$$

and averaging those waveforms results in the 32 waveforms given by

$$\begin{aligned} I_{S,32}(t; a_{2k-6},\ldots,a_{2k-2}) ={}& \tfrac{1}{4} I_{S,128}(t; -1, a_{2k-6},\ldots,a_{2k-2}, -1) + \tfrac{1}{4} I_{S,128}(t; -1, a_{2k-6},\ldots,a_{2k-2}, +1) \\ &+ \tfrac{1}{4} I_{S,128}(t; +1, a_{2k-6},\ldots,a_{2k-2}, -1) + \tfrac{1}{4} I_{S,128}(t; +1, a_{2k-6},\ldots,a_{2k-2}, +1). \end{aligned} \tag{42}$$
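The three averaging stages in (40) to (42) can be sketched with NumPy by storing the waveforms in an array with one axis of size 2 per contributing bit and a final time axis. This is only an illustration of the bookkeeping: the array below holds random placeholder samples rather than actual SOQPSK-TG waveforms, and the names `average_outer_bits` and `n_samples` are assumptions made for the example.

```python
import numpy as np


def average_outer_bits(w):
    """Average over the first and the last bit axis; the final array axis is time.

    Each bit axis has size 2 (index 0 for bit -1, index 1 for bit +1), so the
    mean over both axes is the sum of four waveforms weighted by 1/4.
    """
    return w.mean(axis=(0, -2))


rng = np.random.default_rng(0)
n_samples = 16                                             # hypothetical samples per waveform
w2048 = rng.standard_normal((2,) * 11 + (n_samples,))      # placeholder for I_S(t; a_2k)

w512 = average_outer_bits(w2048)   # (40): drop a_{2k-9} and a_{2k+1}
w128 = average_outer_bits(w512)    # (41): drop a_{2k-8} and a_{2k}
w32 = average_outer_bits(w128)     # (42): drop a_{2k-7} and a_{2k-1}

for w in (w2048, w512, w128, w32):
    print(2 ** (w.ndim - 1), "waveforms of", w.shape[-1], "samples")
```

Keeping one axis per bit makes each stage a single reduction; a flat array of 2048 rows would need explicit index arithmetic to pair the waveforms that differ only in the outer bits.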

The waveforms $I_{S,32}(t;\cdot)$ and $Q_{S,32}(t;\cdot)$ are the waveforms $\tilde{I}_S(t;\cdot)$ and $\tilde{Q}_S(t;\cdot)$ in (16).

Note that the averaging process can be continued until only two waveforms remain. It turns out to be a single length-$2T_b$ waveform and its negative. This waveform can be averaged with the corresponding average of the FQPSK-JR waveforms to produce a common waveform for use by a common symbol-by-symbol detector. The bit error rate of such a detector was discussed in Section II-C.

APPENDIX B
CPM APPROXIMATION OF FQPSK-JR

The CPM approximation of FQPSK-JR is obtained by determining the phase pulse $q_F(t)$ as a function of the XTCQM waveforms for FQPSK-JR. The length of $q_F(t)$ is $L = 2$ because the waveforms are defined over a two bit interval. The phase pulse, $q_F(t)$, is defined by the phase trajectories between constellation points. For example, consider the case where the initial phase state is $\pi/4$ and $\alpha_n = -1$. In that case the I and Q waveforms ($I_F(t;\cdot)$ and $Q_F(t;\cdot)$, respectively) are given by⁷

$$I_F(t;\cdot) = \sqrt{1 - A^2\cos^2\left(\frac{\pi t}{2T_b}\right)}, \qquad Q_F(t;\cdot) = -A\sin\left(\frac{\pi(t-T_b)}{2T_b}\right)$$

for $0 \le t \le 2T_b$. The phase pulse in this case is given by

$$q_F(t) = \arctan\left[\frac{-A\sin\left(\frac{\pi(t-T_b)}{2T_b}\right)}{\sqrt{1 - A^2\cos^2\left(\frac{\pi t}{2T_b}\right)}}\right] = \arctan\left[\frac{-A\cos\left(\frac{\pi t}{2T_b}\right)}{\sqrt{1 - A^2\cos^2\left(\frac{\pi t}{2T_b}\right)}}\right]. \tag{43}$$

The frequency pulse $g_F(t)$ is then the derivative of $q_F(t)$ and is given by

$$g_F(t) = \frac{\frac{A\pi}{2T_b}\sin\left(\frac{\pi t}{2T_b}\right)}{\sqrt{1 - A^2\cos^2\left(\frac{\pi t}{2T_b}\right)}} \tag{44}$$

and is plotted in Figure 1 along with the phase pulse $q_F(t)$. It is easy to show that starting with the three other phase states this approach produces the same $g_F(t)$. The same is true for all four phase states when $\alpha_n = 1$. When $\alpha_n = 0$ no phase transition occurs and the $g_F(t)$ in (44) can be assumed. Thus $g_F(t)$ in (44) is the frequency pulse for the CPM approximation of FQPSK-JR. As explained in [38], the CPM representation of FQPSK-JR is exact when the sequence of ternary data symbols $\boldsymbol{\alpha}$ is such that a single nonzero $\alpha_n$ surrounded by zeros occurs, but it is only approximate when two or more consecutive nonzero symbols occur, although the approximation is quite good. Since the phase shift due to nonzero values of $\alpha_n$ is $\pm\pi/2$, the modulation index of the CPM approximation is $h = 1/2$.

⁷ $A$ is an adjustable parameter that appears in the definitions of the 16 waveforms for FQPSK. This parameter is set to $1/\sqrt{2}$ to produce the constant envelope FQPSK-JR.

REFERENCES
[1] T. Rappaport, Wireless Communications: Principles and Practice. New York: Prentice Hall PTR, 2nd ed., 2001.
[2] S. Horan, Introduction to PCM Telemetering Systems. CRC Press, 2nd ed., 2002.
[3] Range Commanders Council Telemetry Group, Range Commanders Council, White Sands Missile Range, New Mexico, IRIG Standard 106-04: Telemetry Standards, 2004. [Online]. Available: http://www.jcte.jcs.mil/rcc/manuals/106-04/.
[4] P. S. Leung and K. Feher, “F-QPSK–a superior modulation technique for mobile and personal communications,” IEEE Trans. Broadcast., vol. 39, pp. 288–294, June 1993.
[5] R. P. Jefferis, “Evaluation of constant envelope offset quadrature phase shift keying transmitters with a software based signal analyzer,” in Proc. Internat. Telem. Conf., San Diego, CA, Oct. 2004.

[6] T. J. Hill, “An enhanced, constant envelope, interoperable shaped offset QPSK (SOQPSK) waveform for improved spectral efficiency,” in Proc. Internat. Telem. Conf., San Diego, CA, Oct. 2000.
[7] K. Feher et al., U.S. patents: 4,567,602; 4,339,724; 4,644,565; 5,784,402; 5,491,457. Canadian patents: 1,211,517; 1,130,871; 1,265,851.
[8] D. I. S. Agency, “Department of defense interface standard, interoperability standard for single-access 5-kHz and 25-kHz UHF satellite communications channels,” tech. rep. MIL-STD-188-181B, Department of Defense, Mar. 1999.
[9] L. Li and M. K. Simon, “Performance of coded OQPSK and MIL-STD SOQPSK with iterative decoding,” IEEE Trans. Commun., vol. 52, pp. 1890–1900, Nov. 2004.
[10] M. Geoghegan, “Optimal linear detection of SOQPSK,” in Proc. Internat. Telem. Conf., San Diego, CA, Oct. 2002.
[11] M. Geoghegan, “Implementation and performance results for trellis detection of SOQPSK,” in Proc. Internat. Telem. Conf., Las Vegas, NV, Oct. 2001.
[12] M. Geoghegan, “Bandwidth and power efficiency trade-offs of SOQPSK,” in Proc. Internat. Telem. Conf., San Diego, CA, Oct. 2002.
[13] J. B. Anderson, T. Aulin, and C.-E. Sundberg, Digital Phase Modulation. New York: Plenum Press, 1986.
[14] M. K. Simon and T.-Y. Yan, “Performance evaluation and interpretation of unfiltered Feher-patented quadrature phase-shift keying (FQPSK),” Telecommunications and Mission Operations Progress Report, Jet Propulsion Laboratory, May 1999. [Online]. Available: http://tmo.jpl.nasa.gov/progress report/42-137/137C.pdf.
[15] A. Svensson and C.-E. Sundberg, “Optimum MSK-type receivers for CPM on Gaussian and Rayleigh fading channels,” Proc. IEE, pp. 480–490, Aug. 1984.
[16] A. Svensson and C.-E. Sundberg, “Serial MSK-type detection of partial response continuous phase modulation,” IEEE Trans. Commun., vol. 33, pp. 44–52, Jan. 1985.
[17] P. Galko and S. Pasupathy, “Linear receivers for correlatively coded MSK,” IEEE Trans. Commun., vol. 33, pp. 338–347, Apr. 1985.
[18] R. Rhodes, S. Wilson, and A. Svensson, “MSK-type reception of continuous phase modulation: Cochannel and adjacent channel interference,” IEEE Trans. Commun., vol. 35, pp. 185–193, Feb. 1987.
[19] A. Svensson, C.-E. Sundberg, and T. Aulin, “A class of reduced-complexity Viterbi detectors for partial response continuous phase modulation,” IEEE Trans. Commun., vol. 32, pp. 1079–1087, Oct. 1984.
[20] E. Perrins and M. Rice, “Reduced complexity detectors for multi-h CPM in aeronautical telemetry,” IEEE Trans. Aerospace Electron. Syst., vol. 43, no. 1, pp. 286–300, Jan. 2007.
[21] T. Nelson, E. Perrins, and M. Rice, “Performance analysis of common detectors for shaped offset QPSK and Feher’s QPSK.” Department of Electrical and Computer Engineering, Brigham Young University, Nov. 2006. [Online]. Available: http://hdl.handle.net/1877/422.
[22] M. K. Simon, Bandwidth-Efficient Digital Modulation with Application to Deep-Space Communications. New York: Wiley, 2003.
[23] H. C. Park, K. Lee, and K. Feher, “Continuous phase modulation of spectrally efficient FQPSK signals,” in Proc. Veh. Technol. Conf., pp. 692–695, Oct. 2003.
[24] H. C. Park, K. Lee, and K. Feher, “Continuous phase modulation of FQPSK-B signals,” IEEE Trans. Veh. Technol., vol. 56, pp. 157–172, Jan. 2007.
[25] P. A. Laurent, “Exact and approximate construction of digital phase modulations by superposition of amplitude modulated pulses (AMP),” IEEE Trans. Commun., vol. 34, pp. 150–160, Feb. 1986.
[26] G. K. Kaleh, “Simple coherent receivers for partial response continuous phase modulation,” IEEE J. Select. Areas Commun., vol. 7, pp. 1427–1436, Dec. 1989.
[27] U. Mengali and M. Morelli, “Decomposition of M-ary CPM signals into PAM waveforms,” IEEE Trans. Inform. Theory, vol. 41, pp. 1265–1275, Sept. 1995.
[28] G. Colavolpe and R. Raheli, “Reduced-complexity detection and phase synchronization of CPM signals,” IEEE Trans. Commun., vol. 45, pp. 1070–1079, Sept. 1997.
[29] E. Perrins and M. Rice, “PAM decomposition of M-ary multi-h CPM,” IEEE Trans. Commun., vol. 53, pp. 2065–2075, Dec. 2005.
[30] E. Perrins and M. Rice, “A new performance bound for PAM-based CPM detectors,” IEEE Trans. Commun., vol. 53, pp. 1688–1696, Oct. 2005.
[31] E. Perrins and M. Rice, “PAM representation of ternary CPM,” IEEE Trans. Commun., to appear.
[32] E. Perrins, R. Schober, M. Rice, and M. K. Simon, “Multiple-bit differential detection of shaped-offset QPSK,” IEEE Trans. Commun., vol. 55, no. 12, pp. 2328–2340, Dec. 2007.


[33] X. Huang and Y. Li, “MMSE-optimal approximation of continuous-phase modulated signal as superposition of linearly modulated pulses,” IEEE Trans. Commun., vol. 53, pp. 1166–1177, July 2005.
[34] X. Huang and Y. Li, “Simple CPM receivers based on a switched linear modulation model,” IEEE Trans. Commun., vol. 53, pp. 1100–1103, July 2005.
[35] E. Perrins and M. Rice, “Simple detectors for shaped-offset QPSK using the PAM decomposition,” in Proc. IEEE Global Telecommun. Conf., Nov. 2005.
[36] T. Aulin, N. Rydbeck, and C.-E. Sundberg, “Continuous phase modulation–part II: partial response signaling,” IEEE Trans. Commun., vol. 29, pp. 210–225, Mar. 1981.
[37] B. E. Rimoldi, “A decomposition approach to CPM,” IEEE Trans. Inform. Theory, vol. 34, pp. 260–270, Mar. 1988.
[38] T. Nelson and M. Rice, “Common detectors for shaped offset QPSK (SOQPSK) and Feher-patented QPSK (FQPSK),” in Proc. IEEE Global Telecommun. Conf., St. Louis, MO, Nov. 2005.

Tom Nelson (S’95, M’08) received his BSEE (1994, magna cum laude), his MSEE (1995) and his Ph.D. (2007) from Brigham Young University. From 1995 to 2002 he worked in the wireless communications field for Signal Science, Inc., Allen Telecom, Inc., Condor Systems, Inc., and Radix Systems, Inc. doing digital modem design and real-time embedded system development. He also has worked as a consultant doing advanced digital communications receiver design. He currently works for L-3 Communications, Communication Systems–West, Inc. doing research with and analysis of wireless communications systems. Dr. Nelson is a member of the IEEE Communications Society and the IEEE Signal Processing Society. His research interests include waveform design, detection and estimation theory, synchronization, digital communications theory, and error correcting codes.


Erik Perrins (S’96-M’05-SM’06) received the B.S. (magna cum laude), M.S., and Ph.D. degrees from Brigham Young University, Provo, UT, in 1997, 1998, and 2005, respectively, all in electrical engineering. From 1998 to 2004, he was with Motorola, Inc., Schaumburg, IL, where he was engaged in research on advanced development of land mobile radio products. Since August 2005, he has been with the Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, where he is currently an Assistant Professor. Since 2004, he has been an Industry Consultant on problems such as reduced-complexity receiver design and receiver synchronization. His current research interests include digital communication theory, advanced modulation techniques, synchronization, channel coding, and complexity reduction in receivers. He is an Editor of the IEEE Transactions on Communications for modulation theory. Dr. Perrins is a member of the IEEE Communications Society and the IEEE Vehicular Technology Society.

Michael Rice (M’82 SM’98) received a BSEE from Louisiana Tech University in 1987 and his Ph.D. from Georgia Tech in 1991. Dr. Rice was with Digital Transmission Systems, Inc. in Atlanta and joined the faculty at Brigham Young University in 1991 where he is currently the Jim Abrams Professor in the Department of Electrical & Computer Engineering. Professor Rice was a NASA/ASEE Summer Faculty Fellow at the Jet Propulsion Laboratory during 1994 and 1995 where he worked on land mobile satellite systems. During the 1999-2000 academic year, Professor Rice was a visiting scholar at the Communication Systems and Signal Processing Institute at San Diego State University. Professor Rice’s research interests are in the area of digital communication theory and error control coding with a special interest in applications to telemetering and software radio design. He has been a consultant to both government and industry on telemetry related issues. He is a member of the IEEE Communications Society. He is currently serving as the vice-chair of the Communication Theory Technical Committee in the IEEE Communications Society and as Technical Editor for Command, Control and Communication Systems for IEEE Transactions on Aerospace and Electronic Systems.


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Probability Density Function of Reliability Metrics in BICM with Arbitrary Modulation: Closed-form through Algorithmic Approach

Leszek Szczecinski, Senior Member, IEEE, Rolando Bettancourt, and Rodolfo Feick, Senior Member, IEEE

Abstract—In the popular bit-interleaved coded modulation (BICM) the output of the channel encoder and the input of the modulator are separated by a bit-level interleaver. From the decoder’s point of view, the modulator, the transmission channel, and the demodulator (calculating bits’ reliability metrics) become a memoryless BICM channel with binary inputs and real outputs. In unfaded channels, the BICM channel’s outputs (reliability metrics) are known to be Gaussian for binary or quaternary phase shift keying, but their probability density function (PDF) is not known for higher-order modulation. We fill this gap by presenting an algorithmic method to calculate closed-form expressions for the PDF of reliability metrics in BICM with arbitrary modulation and bits-to-symbol mapping when the so-called max-log approximation is applied. Such a probabilistic description of the BICM channel is useful to analyze, from an information-theoretic point of view, any BICM constellation/mapping design.

Index Terms—Logarithmic likelihood ratios, L-values, bit-interleaved coded modulation, probability density functions, soft-input decoding, LLR, BICM, PDF.

I. INTRODUCTION

In this paper we propose a method to obtain the parametric description of probability density functions (PDF) of the reliability metrics for bit-interleaved coded modulation (BICM) in unfaded channels. BICM is a flexible modulation/coding scheme in which the selection of the modulation constellation is decoupled from the choice of the coding rate [1][2]. It was devised [1] as a competing alternative to trellis coded modulation (TCM) [3] in fading channels. However, its versatility is the reason for its popularity also in unfaded channels [2], particularly because it is accompanied by a small performance penalty when compared to less flexible coded modulation approaches such as TCM.

Paper approved by S. A. Jafar, the Editor for Wireless Communication Theory and CDMA of the IEEE Communications Society. Manuscript received March 27, 2006; revised August 28, 2006. This work was supported by NSERC, Canada (under research grant 249704-02), by Fundacion Andes, Chile, and by Conicyt, Chile (under project PBCT-ACT-11/2004). This work was presented at the IEEE Global Telecommunications Conference 2006, San Francisco, USA, Nov. 2006.
L. Szczecinski is with the Institut National de la Recherche Scientifique, INRS-EMT, University of Quebec, Place Bonaventure 800, Gauchetiere W. Suite 6900 Montreal, H5A 1K6, Canada (e-mail: [email protected]).
R. Bettancourt was with the Universidad Técnica Féderico Santa María, Valparaíso, Chile. He is now with VTR Globalcom, Reyes Lavalle 3340, Las Condes, Santiago, Chile (e-mail: [email protected]).
R. Feick is with the Department of Electronics Engineering, Universidad Tecnica Federico Santa María, Avenida Espana 1680, Valparaíso, Chile (email: [email protected], [email protected]).
Digital Object Identifier 10.1109/TCOMM.2008.060169.

Fig. 1. Model of the BICM transmission where demapping uses a priori L-values: the interleaver Π, bits-to-symbols mapper μ{·}, metrics (L-values) calculator μ⁻¹{·}, and deinterleaver Π⁻¹ may be modelled as a BICM channel with binary inputs y and real outputs L.

0090-6778/08$25.00 © 2008 IEEE

In BICM, the output of the channel encoder and the input to the modulator are separated by a bit-level interleaver, as shown in Fig. 1. At the receiver, the reliability metrics calculated for the coded bits according to the maximum likelihood principle are deinterleaved and passed to the soft-input decoder. These metrics are most often expressed as logarithmic likelihood ratios (LLR), or simply L-values. To calculate them, the so-called max-log approximation is frequently considered [1][2][4][5] as a way to alleviate the computational complexity of the metrics’ calculation.

From the decoder’s point of view, the modulator, the transmission channel, and the demodulator (calculating the L-values) may be seen as one entity equivalent to a binary-input, real-output BICM channel [6] which, when assuming perfect interleaving, may be considered as memoryless [1]. The probabilistic description of the L-values fully defines the BICM channel, so knowing their probability density function (PDF) we may evaluate the capacity, cutoff rate, and/or any other information-theoretic parameter. Even though some of these parameters may be obtained through Gaussian quadrature or Monte-Carlo integration (e.g., the channel’s capacity [2]), the possibility of using an analytical form of the PDFs would enormously simplify calculations. Moreover, for accurate evaluation of the performance of the soft-input decoder, knowledge of the PDFs is necessary [6][7]. To deal with this problem, a Gaussian approximation is sometimes used [6][8] because, in unfaded channels, L-values are indeed Gaussian for binary and quaternary phase-shift keying (BPSK and QPSK) [9][10]. However, it is quite clear from the observations of the histograms (e.g., [6][8]) that the Gaussian model is not appropriate for higher-order modulations.

Although the analytical description of the PDFs is fundamental to effectively characterize BICM channels, this problem was considered difficult [6] and very little work has been reported on how to explicitly address this issue. Only recently, an exact expression for the PDF of L-values in $M$-ary quadrature amplitude modulation (QAM) with Gray mapping was obtained. It was shown to be a sum of truncated Gaussian functions [11]. To the best of our knowledge, there is no published work showing the form of the PDF for other modulations.

We note that the developments of [11] took advantage of the particularity of the Gray mapping in $M$-QAM, and were based on one-dimensional considerations. The objective of this paper and its main contribution is to generalize the approach of [11] to arbitrary complex constellations. Therefore, our results extend those presented in [11], and we demonstrate that the PDF is a sum of truncated Gaussian functions weighted with shifted and scaled complementary error functions. With no constraints on the modulation or mapping, the parameters of the sought closed-form expressions of the PDFs are obtained through well defined algorithmic steps; this explains the title of this paper.

For an even more general framework, we consider also the case when a priori L-values of the modulating bits are available. Such a scenario resembles the case of iterative (turbo) detection [12], [13] or it may be encountered when non-uniform signaling is to be considered [14].

This paper is organized as follows. A model of the system is shown in Section II. In Section III we present the method to obtain closed form expressions for the PDF of the L-values. We show in Section IV numerical examples contrasting the developed analytical expressions with the histograms of the LLRs obtained from simulated data. The conclusions are presented in Section V.

II. SYSTEM MODEL

Consider a baseband model of the BICM transmission where coded bits $y_k(n)$ are interleaved, gathered in codewords of length $m$, $\mathbf{y}(n) = [y_0(n), \ldots, y_{m-1}(n)] \in \mathcal{B}$ and mapped into symbols $x(n) = \mu\{\mathbf{y}(n)\} \in \mathcal{X}$, where $n$ denotes the discrete time, $M = 2^m$, and $\mathcal{B} = \{\mathbf{b}_0, \ldots, \mathbf{b}_{M-1}\}$ is a set of all codewords labelling the symbols from the zero-mean and unitary energy constellation $\mathcal{X} = \{a_0, \ldots, a_{M-1}\}$, i.e., $\sum_{k=0}^{M-1} a_k = 0$ and $\frac{1}{M}\sum_{k=0}^{M-1} |a_k|^2 = 1$. The probabilities of generating zeros and ones are equal, i.e., $\Pr\{y_k(n) = 1\} = \frac{1}{2}$. Aiming at a fully general formulation, we do not restrain $\mathcal{X}$ or $\mu\{\cdot\}$ to be of any particular form. The case of $\mu\{\cdot\}$ implementing Gray mapping for $\mathcal{X}$ being $M$-QAM, treated in [11], is a particular instance of our model. In what follows, to simplify the notation, we omit showing the dependence on $n$ because all the operations are memoryless.

Transmission over an additive white Gaussian noise (AWGN) channel results in $r = x + \eta$, where $\eta \in \mathbb{C}$ is a zero-mean, complex, white Gaussian noise with variance $N_0$ (i.e., its real and imaginary parts are independent Gaussian variables each with variance $\frac{N_0}{2}$), thus the average received signal-to-noise ratio (SNR) per symbol is given by $\gamma = \frac{1}{N_0}$. The a posteriori reliability metrics for the $k$-th transmitted bits are calculated as logarithmic likelihood ratios (L-values)

[4]

$$L^{\mathrm{ap}}_k = \log\frac{\Pr\{y_k = 1|r\}}{\Pr\{y_k = 0|r\}} = \log\frac{p(r|y_k = 1)}{p(r|y_k = 0)} + L^{a}_k = L_k + L^{a}_k \tag{1}$$

where $p(\cdot|\cdot)$ is the conditional PDF of the channel output and $L^a_k$ is the a priori L-value for the $k$-th bit. The inclusion of this term generalizes our approach and has practically no impact on the complexity of the proposed algorithm. On the other hand, it might be exploited in the context of the so-called BICM with iterative decoding (BICM-ID) [12][15] or for transmission with non-uniform signaling [14]. If the a priori L-values are of no interest, their effect may be simply omitted by setting $L^a_k$ to zero. The term $L_k$ in (1) is known as the extrinsic L-value, and for the AWGN channel, it is calculated as [16][4]

$$L_k = \log\frac{\sum_{a\in\mathcal{X}_{k,1}} \exp(-\gamma|r-a|^2)\Pr\{x = a\}}{\sum_{a\in\mathcal{X}_{k,0}} \exp(-\gamma|r-a|^2)\Pr\{x = a\}} - L^a_k, \tag{2}$$

where $\mathcal{X}_{k,b}$ is the set of symbols having the $k$-th labelling bit equal to $b$; its cardinality is given by $|\mathcal{X}_{k,b}| = \frac{M}{2}$. Deinterleaved extrinsic L-values $L_k$ are denoted by $L$. In this paper we assume that the so-called max-log approximation is used by the receiver, i.e.,

$$L_k = \min_{a\in\mathcal{X}_{k,0}}\left\{\gamma|r-a|^2 - \Lambda_k\{a\}\right\} - \min_{a\in\mathcal{X}_{k,1}}\left\{\gamma|r-a|^2 - \Lambda_k\{a\}\right\} \tag{3}$$

where $\Lambda_k\{a\} = \sum_{j=0,\, j\neq k}^{m-1} \beta_j\{a\} L^a_j$ includes the effect of the a priori L-values¹ and $\beta_j\{a\}$ denotes the $j$-th bit of the codeword labelling the symbol $a$. We note that (3), considered already in the early works on BICM [1][2], is a frequently used approximation for (2) whose application is justified by a relatively small performance loss (as will be also discussed at the end of this paper) and a significant complexity reduction, an important issue in an industrial context, e.g., [5]. Thus, the metrics’ calculator implementing (3) is an element of the BICM channel, which justifies the analysis we conduct in this paper. As we will see, the use of approximation (3) is also key to obtain the exact expressions for the metrics’ PDF which, otherwise, are difficult to treat analytically. Deriving the PDF for (2) seems to be a much harder problem.

¹ The bits $y_k$ are assumed independent, so using a priori L-values we obtain $\Pr\{x = a\} = \prod_{j=1}^{m}\Pr\{y_j = \beta_j\{a\}\}$.

III. PDF OF L-VALUES

We must find an expression for the conditional PDF of the L-values of the $k$-th bit given by

$$p_k(\lambda|y_k = b) = \frac{d}{d\lambda}F_k(\lambda|b) = \frac{d}{d\lambda}\Pr\{L_k < \lambda|y_k = b\}, \tag{4}$$

where $F_k(\lambda|b)$ is the cumulative distribution function (CDF) for the bit at the $k$-th position (or simply, for the $k$-th bit) of


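the label $\mathbf{y}$. Knowing that $\Pr\{x = a\} = \frac{1}{|\mathcal{X}_{k,b}|} = \frac{2}{M}$ due to the assumed equiprobable generation of the bits $y_k$, $F_k(\lambda|b)$ is calculated as

$$F_k(\lambda|b) = \Pr\{L_k < \lambda|y_k = b\} = \frac{1}{|\mathcal{X}_{k,b}|}\sum_{a\in\mathcal{X}_{k,b}}\Pr\{L_k < \lambda|x = a\} = \frac{2}{M}\sum_{a\in\mathcal{X}_{k,b}} P_k(\lambda|a), \tag{5}$$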

where $P_k(\lambda|a)$ also has the meaning of a CDF, but conditioned on sending the symbol $a$. Since we do not assume any particular form of $\mathcal{X}$ or $\mu\{\cdot\}$, all the required terms in (5) must be calculated explicitly. Based on the assumption of perfect interleaving, the PDF averaged over the bits’ positions yields

$$p(\lambda|y = b) = \frac{1}{m}\sum_{k=0}^{m-1} p_k(\lambda|y_k = b). \tag{6}$$
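To make the difference between (2) and (3) concrete, the following Python sketch evaluates both for a single received sample with all a priori L-values set to zero, so that $\Pr\{x = a\}$ cancels in (2) and $\Lambda_k\{a\} = 0$ in (3). The 8-PSK constellation, the natural binary labelling, the SNR value and the function name `l_values` are assumptions chosen only for illustration; they are not the constellation or mapping used in the paper's numerical examples.

```python
import numpy as np


def l_values(r, X, labels, gamma):
    """Extrinsic L-values for one observation r with zero a priori L-values.

    Returns (exact, maxlog): exact follows (2), maxlog follows (3).
    """
    dist = gamma * np.abs(r - X) ** 2
    exact, maxlog = [], []
    for k in range(labels.shape[1]):
        d0 = dist[labels[:, k] == 0]   # metrics of the symbols in X_{k,0}
        d1 = dist[labels[:, k] == 1]   # metrics of the symbols in X_{k,1}
        exact.append(np.log(np.sum(np.exp(-d1))) - np.log(np.sum(np.exp(-d0))))
        maxlog.append(d0.min() - d1.min())
    return np.array(exact), np.array(maxlog)


M, m = 8, 3
X = np.exp(2j * np.pi * np.arange(M) / M)                          # zero-mean, unit-energy 8-PSK
labels = (np.arange(M)[:, None] >> np.arange(m - 1, -1, -1)) & 1   # hypothetical natural binary labelling
gamma = 5.0                                                        # SNR, gamma = 1/N0

rng = np.random.default_rng(0)
noise = (rng.standard_normal() + 1j * rng.standard_normal()) * np.sqrt(1 / (2 * gamma))
r = X[3] + noise
exact, maxlog = l_values(r, X, labels, gamma)
print("label:", labels[3])
print("exact (2):  ", exact)
print("max-log (3):", maxlog)
```

With equal-size subsets, the two values for each bit can differ by at most $\log(M/2)$, since each log-sum in (2) exceeds its largest term by no more than that amount.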

To arrive at the desired solution we will use the following general steps:
• First, we will divide the observation space into regions, within which the expression (3) is a linear form of the channel output $r$.
• Next, we will calculate the CDF of the L-values (5), identifying explicitly the terms which are dependent on $\lambda$.
• Finally, we will differentiate the CDF with respect to $\lambda$, to obtain the desired expression for the PDF.


A. Linearization and related problems


We start with the following representation of (3)

$$\begin{aligned} L_k(r) &= \left\{\gamma|r - a_{k,0}(r)|^2 - \Lambda_k\{a_{k,0}(r)\}\right\} - \left\{\gamma|r - a_{k,1}(r)|^2 - \Lambda_k\{a_{k,1}(r)\}\right\} \\ &= 2\gamma\,\Re\left[r\cdot\left(a_{k,1}(r) - a_{k,0}(r)\right)^*\right] + \gamma\left(|a_{k,0}(r)|^2 - |a_{k,1}(r)|^2\right) + \Lambda_k\{a_{k,1}(r)\} - \Lambda_k\{a_{k,0}(r)\} \end{aligned} \tag{7}$$

where $\Re[\cdot]$ denotes the real part and the terms $a_{k,b}(r)$, $b \in \{0,1\}$ are non-linear functions of $r$ given by

$$a_{k,b}(r) = \arg\min_{a\in\mathcal{X}_{k,b}}\left\{\gamma|r-a|^2 - \Lambda_k\{a\}\right\}, \qquad b = 0, 1. \tag{8}$$

This yields the symbols from $\mathcal{X}_{k,b}$ “closest” to the observation $r$ in the sense of a metric that combines the squared Euclidean distance and a priori L-values. The non-linearity in (8) is difficult to deal with directly, so for each bit $k$, we divide the observation space into $T_k$ non-empty regions

$$\mathcal{Z}_{k,t} = \{r : a_{k,0}(r) = a_{k,0,t} \wedge a_{k,1}(r) = a_{k,1,t}\}, \qquad t = 1,\ldots,T_k \tag{9}$$

within which $a_{k,b}(r)$ are constant and equal to symbols $a_{k,b,t} \in \mathcal{X}_{k,b}$, $b \in \{0,1\}$, cf. Fig. 2; the operator $\wedge$ here denotes a logical AND. The division defined in (9) is crucial for our development because for $r \in \mathcal{Z}_{k,t}$ the symbols $a_{k,b}(r)$ do not depend on $r$, so the L-value $L_k(r)$ then becomes a locally (i.e., within $\mathcal{Z}_{k,t}$) linear form of $r$, cf. (7).
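The partition in (9) can be explored numerically. The Python sketch below labels each point of a coarse grid with the pair of closest symbols from (8) and counts the distinct pairs, which gives a grid-based estimate of the non-empty regions; it then checks that the max-log L-value agrees with the linear form of (7) at one sample point. A priori L-values are set to zero, and the 8-PSK constellation, natural binary labelling, grid resolution and function name `closest_pair` are assumptions made for the example. A coarse grid may miss very small regions, so the count should be read as a lower bound on $T_k$.

```python
import numpy as np


def closest_pair(r, X, labels, k, gamma):
    """Indices of a_{k,0}(r) and a_{k,1}(r) from (8), with zero a priori L-values."""
    dist = gamma * np.abs(r - X) ** 2
    idx = np.arange(len(X))
    zero, one = labels[:, k] == 0, labels[:, k] == 1
    i0 = idx[zero][np.argmin(dist[zero])]
    i1 = idx[one][np.argmin(dist[one])]
    return int(i0), int(i1)


M, m, gamma, k = 8, 3, 4.0, 0
X = np.exp(2j * np.pi * np.arange(M) / M)                          # hypothetical 8-PSK constellation
labels = (np.arange(M)[:, None] >> np.arange(m - 1, -1, -1)) & 1   # hypothetical natural binary labelling

# Grid estimate of the non-empty regions Z_{k,t}: one region per distinct pair.
grid = np.linspace(-2.0, 2.0, 81)
pairs = {closest_pair(x + 1j * y, X, labels, k, gamma) for x in grid for y in grid}
print(len(pairs), "distinct pairs found on the grid")

# Inside one region the max-log L-value equals a linear form of r, cf. (7).
r = 0.3 - 0.7j
i0, i1 = closest_pair(r, X, labels, k, gamma)
a0, a1 = X[i0], X[i1]
l_maxlog = gamma * abs(r - a0) ** 2 - gamma * abs(r - a1) ** 2
l_linear = 2 * gamma * np.real(r * np.conj(a1 - a0)) + gamma * (abs(a0) ** 2 - abs(a1) ** 2)
print(np.isclose(l_maxlog, l_linear))  # True
```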


Fig. 2. 16-APSK constellation (symbols shown as markers) and labelling used in numerical examples. The regions Zk,t , t = 1, 2, 3, 4 obtained for a) k = 0, and b) k = 3 are shown shaded. Following the notational convention of Fig. 3, the symbols ak,b,t are indicated by blank markers: squares for b = 0 and circles for b = 1.

To determine the form of $\mathcal{Z}_{k,t}$ and to find the corresponding symbols $a_{k,b,t}$ we rewrite (9) using (8)

$$\begin{aligned} \mathcal{Z}_{k,t} = \{r :{}& \gamma|r - a_{k,0,t}|^2 - \Lambda_k\{a_{k,0,t}\} \le \gamma|r - a|^2 - \Lambda_k\{a\} \\ &\wedge\ \gamma|r - a_{k,1,t}|^2 - \Lambda_k\{a_{k,1,t}\} \le \gamma|r - a'|^2 - \Lambda_k\{a'\}, \\ & a \in \mathcal{X}_{k,0},\ a' \in \mathcal{X}_{k,1},\ a \ne a_{k,0,t},\ a' \ne a_{k,1,t}\}. \end{aligned} \tag{10}$$

The quadratic terms of $r$ cancel in (10) so, for given $a_{k,b,t}$, $b = 0, 1$, there are $M - 2$ linear inequalities defining the region $\mathcal{Z}_{k,t}$, but only $J_{k,t} \le M - 2$ are non-redundant, i.e., we may write

$$\mathcal{Z}_{k,t} = \{r : \mathcal{L}_{k,t,j}(r) \le 0,\ j = 1,\ldots,J_{k,t}\}, \tag{11}$$

where $\mathcal{L}_{k,t,j}(r) \equiv \Re[r\cdot c^*_{k,t,j}] + d_{k,t,j}$ are linear forms of $r$ defining non-redundant inequalities obtained from (10). Henceforth, we will often use the symbol $\mathcal{L}$ (i.e., without the argument $r$) to denote a set of parameters $\{c, d\}$ defining the linear form $\mathcal{L}(r) = \Re[r\cdot c^*] + d$, and we use $-\mathcal{L} \equiv \{-c, -d\}$. A numerically efficient procedure adopted from the area of computational geometry, described in [17], may be used to find


$\mathcal{L}_{k,t,j}$ and the vertices $v_{k,t,f}$ of the polygon directly from (10). Note, however, that although the problem (10) resembles the one defining the so-called decision regions in [17], the forms of both entities are entirely different, and cannot be deduced one from another.

The region $\mathcal{Z}_{k,t}$ may be i) a (convex) polygon with $V_{k,t} = J_{k,t}$ vertices, ii) a region that extends to infinity (i.e., an “infinite” polygon) defined by $V_{k,t} = J_{k,t} - 1$ vertices, or iii) an empty set [if the inequalities in (10) are contradictory, i.e., $J_{k,t} = 0$]. The last condition also allows us to reject those candidates for pairs of symbols $a_{k,0,t}$ and $a_{k,1,t}$ which produce empty regions $\mathcal{Z}_{k,t}$.

For completeness, we note that in some cases, the two-dimensional constellation may be seen as a product of two independent one-dimensional constellations. Such a particular case, which occurs, e.g., for $2^m$-QAM with Gray mapping, requires different (and simpler) considerations and results in a different form of PDF as shown in [11].

B. Calculating the CDF

Because $\mathcal{Z}_{k,t}$ are disjoint and their union covers the whole observation space, i.e., $\bigcup_{t=1}^{T_k}\mathcal{Z}_{k,t} = \mathbb{C}$ ($\mathbb{C}$ is the space of complex numbers), we can write the terms appearing in (5) as

$$\begin{aligned} P_k(\lambda|a) &= \sum_{t=1}^{T_k}\Pr\{L_k(r) < \lambda \wedge r \in \mathcal{Z}_{k,t}\,|\,x = a\} \\ &= \sum_{t=1}^{T_k}\Pr\{\mathcal{L}_{k,t}(r;\lambda) < 0 \wedge \mathcal{L}_{k,t,1}(r) \le 0 \wedge \ldots \wedge \mathcal{L}_{k,t,J_{k,t}}(r) \le 0\,|\,x = a\} \end{aligned} \tag{12}$$

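where, using (7), we define

$$\mathcal{L}_{k,t}(r;\lambda) = \Re[r\cdot c^*_{k,t}] + d_{k,t}(\lambda) = \mathcal{L}_{k,t}(r;0) - \frac{\lambda}{2\gamma} \tag{13}$$

with

$$c_{k,t} = a_{k,1,t} - a_{k,0,t}, \qquad d_{k,t}(\lambda) = \frac{1}{2}\left[|a_{k,0,t}|^2 - |a_{k,1,t}|^2\right] + \frac{1}{2\gamma}\left[\Lambda_k\{a_{k,1,t}\} - \Lambda_k\{a_{k,0,t}\} - \lambda\right], \tag{14}$$

so that

$$\mathcal{L}_{k,t}(r;\lambda) + \frac{\lambda}{2\gamma} = \frac{1}{2\gamma}L_k(r). \tag{15}$$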
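The relation between the shifted linear form in (13) and (14) and the L-value can be verified numerically for a fixed pair of symbols. The Python sketch below uses arbitrary illustrative values for $a_{k,0,t}$, $a_{k,1,t}$, $\Lambda_k\{a_{k,0,t}\}$, $\Lambda_k\{a_{k,1,t}\}$ and $\gamma$ (none of them taken from the paper) and checks (15), together with the equivalence of the events $L_k(r) < \lambda$ and $\mathcal{L}_{k,t}(r;\lambda) < 0$ that is used in (12).

```python
import numpy as np

gamma = 2.5                           # illustrative SNR
a0, a1 = 0.8 + 0.2j, -0.3 - 0.9j      # hypothetical a_{k,0,t} and a_{k,1,t}
lam0, lam1 = 0.4, -0.1                # hypothetical Lambda_k{a_{k,0,t}} and Lambda_k{a_{k,1,t}}

c = a1 - a0                           # c_{k,t} from (14)


def d(lam):
    """Offset d_{k,t}(lambda) from (14)."""
    return 0.5 * (abs(a0) ** 2 - abs(a1) ** 2) + (lam1 - lam0 - lam) / (2 * gamma)


def linear_form(r, lam):
    """Linear form of (13): Re[r c*] + d(lambda)."""
    return np.real(r * np.conj(c)) + d(lam)


def l_value(r):
    """Max-log L-value inside the region where a0 and a1 are the closest symbols, cf. (7)."""
    return (gamma * abs(r - a0) ** 2 - lam0) - (gamma * abs(r - a1) ** 2 - lam1)


rng = np.random.default_rng(2)
r_samples = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
lam_samples = rng.uniform(-5.0, 5.0, 1000)

lhs = np.array([linear_form(r, lam) + lam / (2 * gamma) for r, lam in zip(r_samples, lam_samples)])
rhs = np.array([l_value(r) / (2 * gamma) for r in r_samples])
print(np.allclose(lhs, rhs))  # True: this is identity (15)
```

Because the two sides of (15) differ only by the positive factor $2\gamma$ and the shift $\lambda$, thresholding the L-value at $\lambda$ and thresholding the linear form at zero select the same half-plane.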

Again, using the introduced notation we may write $\mathcal{L}_{k,t}(\lambda) = \{c_{k,t}, d_{k,t}(\lambda)\}$ to emphasize that the linear form (13) depends on $\lambda$. From (13) and (14) we may also deduce that the line $\mathcal{L}_{k,t}(r;\lambda) = 0$ is perpendicular to the line passing through $a_{k,0,t}$ and $a_{k,1,t}$.

The terms in (12) determine the probabilities of finding $r$ in the region defined by $\mathcal{L}_{k,t,j}(r) \le 0$, $j = 1,\ldots,J_{k,t}$ and $\mathcal{L}_{k,t}(r;\lambda) \le 0$, conditioned on the transmission of $x = a$. We thus now consider two effectively different cases shown schematically in Fig. 3; the vectors normal to the lines defining the regions are shown pointing to the half-plane where the inequality is not satisfied [i.e., where $\mathcal{L}_{k,t,j}(r) > 0$].

Fig. 3. Examples of the regions $\mathcal{Z}_{k,t}$: a) polygon, b) “infinite” polygon.

In the first case, cf. Fig. 3a, depending on the value of $\lambda$, the line $\mathcal{L}_{k,t}(r;\lambda) = 0$ either intersects two of the region’s sides or has no intersection at all. These conditions may be encountered when the region $\mathcal{Z}_{k,t}$ is a polygon or an “infinite” polygon. In the second case, independently of the value of $\lambda$, the line $\mathcal{L}_{k,t}(r;\lambda) = 0$ intersects with only one of the region’s sides. This may happen only when the region is an “infinite” polygon, cf. Fig. 3b.

Calculation of (12) requires integration of the Gaussian PDF over the region defined by linear constraints. This problem was already treated in [17], where it was demonstrated that each of such regions (shown hashed in Fig. 3) may be represented as a union and subtraction of the so-called wedges. By definition, the wedge $\mathcal{W}(\mathcal{L}_x, \mathcal{L}_y)$ is a region where two arbitrary linear inequalities $\mathcal{L}_x(r) \le 0$ and $\mathcal{L}_y(r) \le 0$ are satisfied. The probability that $r$ belongs to a wedge, conditioned on sending $x = a$, is given by [17][18]

$$\Pr\{\mathcal{L}_x(r) < 0 \wedge \mathcal{L}_y(r) < 0\,|\,x = a\} \equiv I(\mathcal{L}_x, \mathcal{L}_y\,|\,a) = Q\left(\frac{\mathcal{L}_x(a)}{|c_x|}\sqrt{2\gamma},\ \frac{\mathcal{L}_y(a)}{|c_y|}\sqrt{2\gamma};\ \frac{\Re[c_x c^*_y]}{|c_x||c_y|}\right) \tag{16}$$

where the two-dimensional Q-function is defined as [19]

$$Q(t, s; \rho) = \frac{1}{2\pi\sqrt{1-\rho^2}}\int_t^\infty\int_s^\infty \exp\left(-\frac{v^2 - 2vu\rho + u^2}{2(1-\rho^2)}\right) dv\, du. \tag{17}$$

In what follows we will also use the one-dimensional Q-function $Q(t) = \frac{1}{\sqrt{2\pi}}\int_t^\infty \exp\left(-\frac{\tau^2}{2}\right) d\tau$; the difference in

notation with (17) is clear through the number of arguments. Since the hashed regions in Fig. 3 are union/subtraction of wedges, the integral of the Gaussian distribution over these regions can be expressed in terms of the sum/subtraction of the integrals over wedges. Thus, the terms in (12) may be evaluated as

$$\Pr\{L_{k,t}(r;\lambda) < 0 \wedge r \in Z_{k,t} \mid x = a\} = \begin{cases} I(L_{k,t}(\lambda), L_{k,t,j_\alpha} \mid a) - I(L_{k,t}(\lambda), -L_{k,t,j_\omega} \mid a) + \varphi(\{L_{k,t,j}\}_{j=1}^{J_{k,t}}) & \text{cf. Fig. 3a} \\ I(L_{k,t}(\lambda), L_{k,t,j_\alpha} \mid a) + \varphi(\{L_{k,t,j}\}_{j=1}^{J_{k,t}}) & \text{cf. Fig. 3b} \end{cases} \tag{18}$$

where $j_\alpha$, $j_\omega$ are indices of the polygon’s sides intersecting with the line $L_{k,t}(r;\lambda) = 0$ (in Fig. 3a $j_\alpha = 3$ and $j_\omega = 5$; in Fig. 3b $j_\alpha = 2$). Integrals over the wedges indicated by the darkened arcs in Fig. 3 correspond to $I(L_{k,t}(\lambda), \cdot \mid a)$, while $\varphi(\cdot)$ contains integrals over the wedges indicated by blank arcs.
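For numerical evaluation, the two-dimensional Q-function of (17) can be reduced to a single integral by conditioning on one of the two Gaussian variables, $Q(t,s;\rho) = \int_t^\infty \phi(u)\,Q\big((s-\rho u)/\sqrt{1-\rho^2}\big)\,du$, with $\phi$ the standard normal density. The sketch below is only an illustration of that idea: the names `q1`, `q2`, and `wedge_prob`, the Simpson rule, and the truncated upper limit are assumptions made here and are not taken from [17]-[19]. The $\sqrt{2\gamma}$ scaling and the correlation $\Re[c_x c_y^*]/(|c_x||c_y|)$ used in `wedge_prob` reflect one reading of (16), with $\gamma$ on a linear (not dB) scale.

```python
import math


def q1(t):
    """One-dimensional Gaussian Q-function, Q(t) = P(N(0,1) > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def q2(t, s, rho, n=2000, span=12.0):
    """Two-dimensional Q-function Q(t, s; rho) of (17), for |rho| < 1.

    Uses Q(t, s; rho) = int_t^inf phi(u) * Q((s - rho*u)/sqrt(1 - rho^2)) du,
    evaluated with a composite Simpson rule on [t, max(t, 0) + span].
    Intended for moderate arguments; very negative t would need a larger n.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if n % 2:
        n += 1  # Simpson's rule needs an even number of sub-intervals
    upper = max(t, 0.0) + span  # phi(u) is negligible beyond this point
    h = (upper - t) / n
    scale = math.sqrt(1.0 - rho * rho)

    def integrand(u):
        phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
        return phi * q1((s - rho * u) / scale)

    total = integrand(t) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(t + i * h)
    return total * h / 3.0


def wedge_prob(lx_a, ly_a, cx, cy, gamma):
    """Wedge probability I(Lx, Ly | a) as read from (16) (assumed scaling).

    lx_a, ly_a: values Lx(a), Ly(a) of the two linear forms at the sent symbol
    cx, cy:     complex coefficients of the two linear forms (hypothetical inputs)
    gamma:      SNR on a linear scale
    """
    rho = (cx * cy.conjugate()).real / (abs(cx) * abs(cy))
    g = math.sqrt(2.0 * gamma)
    return q2(g * lx_a / abs(cx), g * ly_a / abs(cy), rho)
```

A simple sanity check is the quadrant probability of two correlated standard Gaussian variables, $Q(0,0;\rho) = 1/4 + \arcsin(\rho)/(2\pi)$, which the sketch should reproduce to within the integration error.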

C. Calculating the PDF

To obtain (6), we need to differentiate (12) with respect to $\lambda$. This implies differentiation of (18) as well, and is the reason why the expression of $\varphi(\cdot)$ in (18) is not shown here²: it is independent of $\lambda$, and thus will be eliminated after the differentiation. First, differentiation of (17) yields

$$\frac{d}{d\lambda} Q(t(\lambda), s; \rho) = \frac{-1}{\sqrt{2\pi}} \exp\left(-\frac{t^2(\lambda)}{2}\right) Q\left(\frac{s - t(\lambda)\rho}{\sqrt{1-\rho^2}}\right) \frac{d}{d\lambda} t(\lambda) \tag{19}$$

so $I(L_{k,t}(\lambda), L_{k,t,j} \mid a)$ is also easily differentiated with respect to $\lambda$

$$\frac{d}{d\lambda} I(L_{k,t}(\lambda), L_{k,t,j} \mid a) \equiv f(\lambda; A_{k,t,j}(a), B_{k,t,j}, \Omega_{k,t}(a), \Sigma_{k,t}) = \frac{1}{\sqrt{2\pi\Sigma_{k,t} \cdot \gamma}} \exp\left(-\frac{(\Omega_{k,t}(a) - \lambda)^2}{2 \cdot \Sigma_{k,t} \cdot \gamma}\right) \cdot Q\left(\sqrt{\gamma}\,(A_{k,t,j}(a) + \lambda B_{k,t,j})\right) \tag{20}$$

where

$$A_{k,t,j}(a) = \sqrt{\frac{2}{1 - \rho_{k,t,j}^2}} \left(\frac{L_{k,t,j}(a)}{|c_{k,t,j}|} - \rho_{k,t,j} \frac{L_{k,t}(a; 0)}{|c_{k,t}|}\right) \tag{21}$$

$$B_{k,t,j} = \frac{\rho_{k,t,j}}{|c_{k,t}|\,\gamma \sqrt{2(1 - \rho_{k,t,j}^2)}} \tag{22}$$

$$\Omega_{k,t}(a) = 2\gamma L_{k,t}(a; 0) \tag{23}$$

$$\Sigma_{k,t} = 2|c_{k,t}|^2 \tag{24}$$

$$\rho_{k,t,j} = \frac{\Re[c_{k,t} c_{k,t,j}^*]}{|c_{k,t}||c_{k,t,j}|}. \tag{25}$$

We note that the coefficients of the PDF ($A$, $B$, $\Omega$, $\Sigma$) will depend on the “active” polygon’s sides, i.e., those indicated by $j_\alpha$ and $j_\omega$. Changing value of $\lambda$, the line $L_{k,t}(r;\lambda) = 0$ moves along the line linking the points $a_{k,0,t}$ and $a_{k,1,t}$ (shown dashed in Fig. 3). In particular, by increasing $\lambda$ we move $L_{k,t}(r;\lambda) = 0$ towards $a_{k,1,t}$. It is obvious that the indices $j_\alpha$ and $j_\omega$ of the “active” sides do not change with $\lambda$ as long as the value of $\lambda$ is such that the line $L_{k,t}(r;\lambda) = 0$ does not pass through a vertex of the polygon. Therefore, the coefficients of the PDF are constant for $\lambda$ belonging to the interval limited by L-values $\lambda$ calculated at the vertices of the polygon: $\lambda_{k,t,j} = L_k(v_{k,t,j})$, cf. (7). The limits of the intervals are schematically shown in Fig. 3 as orthogonal projections of the vertices on the line passing through $a_{k,0,t}$ and $a_{k,1,t}$, (L-values calculated for such a projection and for the vertex itself are equal). Sorting $\lambda_{k,t,j}$, we obtain the limits of the intervals over which the parameters of the function defined in (20) remain constant. Since there are $V_{k,t}$ vertices, there will be $V_{k,t} + 1$ of such disjoint intervals. Differentiating (12) and applying (18) and (20) we obtain the PDF conditioned on the sent symbol

$$\frac{d}{d\lambda} P_k(\lambda \mid a) \equiv P_k(\lambda \mid a) = \sum_{t=1}^{T} \Psi_{k,t}(\lambda \mid a) \tag{26}$$

where

$$\Psi_{k,t}(\lambda \mid a) = \sum_{v=1}^{V_{k,t}+1} \Big[ f(\lambda; A_{k,t,j_{\alpha,k,t,v}}(a), B_{k,t,j_{\alpha,k,t,v}}, \Omega_{k,t}(a), \Sigma_{k,t}) - f(\lambda; -A_{k,t,j_{\omega,k,t,v}}(a), -B_{k,t,j_{\omega,k,t,v}}, \Omega_{k,t}(a), \Sigma_{k,t}) \Big] \cdot w_{k,t,v}(\lambda) \tag{27}$$

and the windowing function

$$w_{k,t,v}(\lambda) = \begin{cases} 1 & \text{if } \lambda \in (\lambda_{k,t,v-1}, \lambda_{k,t,v}) \\ 0 & \text{otherwise} \end{cases} \tag{28}$$

determines the interval of $\lambda$ over which the function $f(\cdot)$ contributes to $P_k(\lambda \mid a)$. For convenience, we use here $\lambda_{k,t,0} = -\infty$ and $\lambda_{k,t,V_{k,t}+1} = \infty$. The indices $j_{\alpha,k,t,v}$ and $j_{\omega,k,t,v}$ correspond to the values of $j_\alpha$ and $j_\omega$ obtained in the $v$-th interval of $\lambda$ in the region $Z_{k,t}$. Finally, grouping the results will define the complete form of $p_k(\lambda \mid b) = \frac{2}{M} \sum_{a \in \mathcal{X}_{k,b}} P_k(\lambda \mid a)$, which further should be used to obtain $p(\lambda \mid b)$ via (6). We note that if the extrinsic L-values $L_k$ are calculated for $L_k^a = 0$ (i.e., without a priori information), the form of the regions $Z_{k,t}$ does not change with $\gamma$, cf. (10). Therefore, once the parameters of the function $f(\cdot)$ in (20) are known, the PDF can be calculated for any value of SNR $\gamma$ without geometric considerations, i.e., practically without computational overhead. From a computational complexity point of view, this compares favorably with the histogram-based approach.

² But it is straightforward to obtain as shown in [17]. For example, in Fig. 3a, $\varphi(\{L_{k,t,j}\}_{j=1}^{5}) = -I(L_{k,t,5}, -L_{k,t,1} \mid a) - I(L_{k,t,1}, -L_{k,t,2} \mid a) + I(-L_{k,t,2}, -L_{k,t,3} \mid a)$.

IV. NUMERICAL EXAMPLES

As an example we consider a 16-ary amplitude phase shift keying (APSK) constellation whose form and labelling are defined in [20, Sec. 5.4.3] and reproduced in Fig. 2. The ratio of the outer and inner constellation radii $R_2/R_1$ was taken equal to the standard value 3.15 [20, Sec. 5.4.3]; this constellation was designed to work with a coding rate $\rho = 2/3$.
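To make the example concrete, the following sketch generates the points of a 16-APSK constellation with the radius ratio 3.15 quoted above. Several details are assumptions made here rather than statements of the text: the split into 4 inner and 12 outer points, the phase offsets $\pi/4$ and $\pi/12$, and the normalization to unit average symbol energy. The bit labelling of [20] is not reproduced, and the function name `apsk16` is hypothetical.

```python
import cmath
import math


def apsk16(ratio=3.15):
    """Return 16 complex points: 4 on an inner ring, 12 on an outer ring.

    ratio is R2/R1. Ring sizes and phase offsets are assumptions of this sketch.
    """
    # Unit average symbol energy: (4*r1^2 + 12*r2^2) / 16 = 1 with r2 = ratio * r1
    r1 = math.sqrt(16.0 / (4.0 + 12.0 * ratio ** 2))
    r2 = ratio * r1
    inner = [cmath.rect(r1, math.pi / 4 + k * math.pi / 2) for k in range(4)]
    outer = [cmath.rect(r2, math.pi / 12 + k * math.pi / 6) for k in range(12)]
    return inner + outer
```

The returned list can serve as the symbol alphabet when experimenting with the regions $Z_{k,t}$, once a labelling has been attached to each point.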

Fig. 4. PDF $P_k(\lambda|b)$ obtained from the proposed analytical formulas (lines) and estimated using histograms (markers) for a) k = 0, b = 0, b) k = 0, b = 1, c) k = 2, b = 0, and d) k = 2, b = 1. The solid lines denote results obtained when γ = 4 dB and dashed ones when γ = 9 dB. [Plot omitted; the vertical axes of the four panels are labelled P0(λ|0), P0(λ|1), P3(λ|0), and P3(λ|1).]
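The analytical curves are sums of the building block $f(\cdot)$ of (20), switched on and off by the windowing function of (28). The sketch below shows how such a windowed sum, cf. (27), could be evaluated for a single region. It is an illustration under stated assumptions: the function names, the `terms` data structure, and the argument order are hypothetical, the coefficients $A$, $B$, $\Omega$, $\Sigma$ are taken as given, and $\gamma$ is on a linear scale.

```python
import math


def q1(t):
    """One-dimensional Gaussian Q-function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def f(lam, A, B, Omega, Sigma, gamma):
    """Building block of (20): a Gaussian kernel in lam times a Q-function."""
    var = Sigma * gamma
    kernel = math.exp(-(Omega - lam) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return kernel * q1(math.sqrt(gamma) * (A + lam * B))


def window(lam, lo, hi):
    """Windowing function of (28): 1 on the open interval (lo, hi), else 0."""
    return 1.0 if lo < lam < hi else 0.0


def psi(lam, vertex_lvalues, terms, Omega, Sigma, gamma):
    """Windowed sum in the spirit of (27) for one region.

    vertex_lvalues: L-values computed at the V vertices (any order).
    terms: list of V + 1 lists; terms[v] holds (sign, A, B) tuples for the
           v-th interval, e.g. [(+1, A_alpha, B_alpha), (-1, -A_omega, -B_omega)].
    """
    edges = [-math.inf] + sorted(vertex_lvalues) + [math.inf]
    if len(terms) != len(edges) - 1:
        raise ValueError("need one list of terms per interval (V + 1 in total)")
    total = 0.0
    for v, interval_terms in enumerate(terms):
        w = window(lam, edges[v], edges[v + 1])
        if w:
            total += w * sum(sign * f(lam, A, B, Omega, Sigma, gamma)
                             for sign, A, B in interval_terms)
    return total
```

Because $\Omega = 2\gamma L(a;0)$ depends on the SNR, a caller would recompute it for each $\gamma$; the interval limits and the active sides, by contrast, stay fixed when no a priori information is used.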

The forms of regions $Z_{0,t}$ and $Z_{3,t}$, shown in Fig. 2a and Fig. 2b, respectively, are obtained for $L_k^a \equiv 0$ (i.e., without a priori information). Regions $Z_{k,t}$ are notably irregular and could not have been predicted by simple inspection. Thus, their efficient and automated calculation, using techniques of [17], is helpful for the implementation of the proposed method. Further, using γ = 4 dB and γ = 9 dB, and applying the proposed algorithm, we calculate the conditional PDFs $P_k(\lambda|b)$ for k = 0, 2 and b = 0, 1 and compare them in Fig. 4 to the histograms of the L-values obtained by transmitting $5 \cdot 10^5$ randomly generated symbols. First of all, we note the perfect match between histograms (markers) and analytical formulas (lines), as well as the fact that the L-values are clearly not well characterized by a Gaussian PDF. We may also appreciate that the distributions are not symmetric for k = 0. Note that increased SNR not only shifts the PDF away from zero (which reflects the higher reliability of the bits’ metrics) but also changes the form of the PDF significantly. Finally, as an additional consistency check, and using the developed expressions for the PDF, we calculated the mutual information (MI) between the L-values L and the bits y (for 16APSK modulation and a wide range of SNR γ). Since MI has the meaning of the BICM channel capacity, it may be directly compared to the capacity obtained using the method of [2]. Note that the latter corresponds to calculating the L-values by means of the exact formula (2). On the other hand, the former takes into account the use of the max-log approximation (3); to obtain the capacity curve, the effort (related to calculation of the parameters of the PDF) is not high and must be spent only once. When the

parameters are calculated, the capacity is rapidly evaluated via a one-dimensional numerical integration (to calculate the expectation in the MI formulation) for any value of SNR. We do not show here the graphical representation of the obtained results because both MI curves practically overlapped. At a coding rate $\rho = 2/3$, for which the modulation 16APSK was designed, the SNR loss of the max-log approximation was less than 0.05 dB. This fully justifies applications of the approximation to alleviate the computational complexity of the metrics calculation.

V. CONCLUSIONS

In this paper we develop a method for the calculation of the parameters of the probability density function for the reliability metrics (L-values) in BICM. For the case of unfaded channels, closed-form expressions for these PDFs can be obtained assuming that the so-called max-log approximation is applied to calculate the metrics. Our method works for any modulation and bits-to-symbols mapping. A generalization which deals with a priori L-values is also considered. The advantage of the proposed method in the analysis and design of the constellation and/or mapping for BICM transmission is twofold. First, the PDFs of the L-values may be used to efficiently calculate (i.e., via one-dimensional integration) information-theoretic properties of the BICM channel. This offers an advantage when compared to two-dimensional or Monte-Carlo integrations used, e.g., to calculate the capacity [2]. Second, the form of the PDF, which is necessary to estimate the coded performance, is available, so the proposed method may lead to a simplified and/or more accurate analysis of coded performance.

REFERENCES

[1] E. Zehavi, “8-PSK trellis codes for a Rayleigh channel,” IEEE Trans. Commun., vol. 40, no. 3, pp. 873–884, May 1992.
[2] G. Caire, G. Taricco, and E. Biglieri, “Bit-interleaved coded modulation,” IEEE Trans. Inform. Theory, vol. 44, no. 3, pp. 927–946, May 1998.
[3] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, no. 1, pp. 55–67, Jan. 1982.
[4] B. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.
[5] Ericsson, Motorola, and Nokia, “Link evaluation methods for high speed downlink packet access (HSDPA),” TSG-RAN Working Group 1 Meeting #15, TSGR1#15(00)1093, Tech. Rep., Aug. 2000.
[6] A. Martinez, A. Guillén i Fàbregas, and G. Caire, “Error probability analysis of bit-interleaved coded modulation,” IEEE Trans. Inform. Theory, vol. 52, no. 1, pp. 262–271, Jan. 2006.
[7] A. Abedi and A. K. Khandani, “An analytical method for approximate performance evaluation of binary linear block codes,” IEEE Trans. Commun., vol. 52, no. 2, pp. 228–235, Feb. 2004.
[8] A. Guillén i Fàbregas, A. Martinez, and G. Caire, “Error probability of bit-interleaved coded modulation using the Gaussian approximation,” in Proc. Conference on Information Sciences and Systems, 2004.
[9] S. ten Brink, “Convergence behaviour of iteratively decoded parallel concatenated codes,” IEEE Trans. Commun., vol. 49, no. 10, pp. 1727–1737, Oct. 2001.
[10] M. Tüchler, R. Koetter, and A. C. Singer, “Minimum mean squared error equalisations using a priori information,” IEEE Trans. Signal Processing, vol. 50, no. 3, pp. 673–683, Mar. 2002.
[11] M. Benjillali, L. Szczecinski, and S. Aissa, “Probability density functions of logarithmic likelihood ratios in rectangular QAM,” in Proc. Twenty-Third Biennial Symposium on Communications, Kingston, Canada, May 2006, pp. 283–286.
[12] X. Li and J. A. Ritcey, “Bit-interleaved coded modulation with iterative decoding,” IEEE Commun. Lett., vol. 1, no. 6, pp. 169–171, Nov. 1997.
[13] S. ten Brink, “Convergence of iterative decoding,” IEE Electron. Lett., vol. 35, no. 10, pp. 806–808, May 1999.
[14] G. Takahara, F. Alajaji, N. C. Beaulieu, and H. Kuai, “Constellation mappings for two-dimensional signaling of nonuniform sources,” IEEE Trans. Commun., vol. 51, no. 3, pp. 400–408, Mar. 2003.
[15] F. Schreckenbach, N. Görtz, J. Hagenauer, and G. Bauch, “Optimization of symbol mappings for bit-interleaved coded modulation with iterative decoding,” IEEE Commun. Lett., vol. 7, no. 12, pp. 593–595, Dec. 2003.
[16] G. Caire, G. Taricco, and E. Biglieri, “Capacity of bit-interleaved channels,” IEE Electron. Lett., vol. 32, no. 12, pp. 1060–1061, June 1996.
[17] L. Szczecinski, S. Aissa, C. Gonzalez, and M. Bacic, “Exact evaluation of bit- and symbol-error rates for arbitrary 2-D modulation and nonuniform signaling in AWGN channel,” IEEE Trans. Commun., vol. 54, no. 6, pp. 1049–1056, June 2006.
[18] S. Park and D. Yoon, “An alternative expression for the symbol-error probability of MPSK in the presence of I/Q unbalance,” IEEE Trans. Commun., vol. 52, no. 12, pp. 2079–2081, Dec. 2004.
[19] M. K. Simon, “A simpler form of the Craig representation for the two-dimensional joint Gaussian Q-function,” IEEE Commun. Lett., vol. 6, no. 2, pp. 49–51, Feb. 2002.
[20] ETSI EN 302 307, “Digital video broadcasting (DVB), second generation framing structure, channel coding and modulation systems for broadcasting, interactive services, news gathering and other broadband satellite applications,” Jan. 2004.

Leszek Szczecinski (M’98-SM’07) received the M.Eng. degree from the Technical University of Warsaw in 1992, and the Ph.D. degree from INRS-Telecommunications, Montreal, Canada, in 1997. From 1998 to 2000, he was Assistant Professor at the Department of Electrical Engineering, University of Chile. From 2001 to 2007, he was Assistant Professor, and since 2007 he has been Associate Professor, at INRS-EMT, University of Quebec, Canada. His research interests are in the area of digital signal processing, communication theory, wireless communications, and analysis and design of iterative (turbo) processing algorithms.

Rolando Bettancourt was born in 1982 in Valparaíso, Chile. He received the B.S. and M.Sc. degrees in Electronic Engineering in 2003 and 2006, respectively, from Universidad Técnica Federico Santa María, Valparaíso, Chile. From 2005 to 2006, he conducted research in collaboration with Institut National de la Recherche Scientifique, Montreal, QC, Canada, in the field of iterative (turbo) receivers. Since February 2007, he has been working with VTR GlobalCom, Santiago, Chile. His current research interests are wireless and mobile communications, advanced coding techniques, iterative processing, and data communications and networking.

Rodolfo Feick (S’71-M’76-SM’95) obtained the degree of Ingeniero Civil Electrónico at Universidad Técnica Federico Santa María, Valparaíso, Chile in 1970 and the Ph.D. degree in Electrical Engineering at the University of Pittsburgh in 1975. He has been with the Department of Electronics Engineering at Universidad Técnica Federico Santa María since 1975, where he is the head of the telecommunications area. His current interests include RF channel modeling, digital communications, microwave system design and RF measurement.

Mobility Enhanced Smart Antenna Adaptive Sectoring for Uplink Capacity Maximization in CDMA Cellular Network

Alex Wang and Vikram Krishnamurthy

Abstract—In this paper, adaptive sectoring of a CDMA cellular network is investigated, and the aim is to maximize the uplink capacity by utilizing mobiles’ spatial information. One important feature of the algorithm developed is that it does not depend on tracking individual mobiles, but rather on the statistics of mobiles. The distribution of mobiles is modeled as a spatial Poisson process, whose rate function quantizes mobile concentration and is inferred with a Bayesian estimator based on the statistics of network traffic. In addition, the time dynamics of the rate function is assumed to evolve according to mobiles’ mobility pattern and it is formulated using the Influence model. With the knowledge of mobiles’ spatial distribution, the interference and thus the outage probability of different sector partitions of a cell can be computed. The adaptive sectoring problem is formulated as a shortest path problem, where each path corresponds to a particular sector partition, and the partition is weighted by its outage probability. In simulation examples, a hot spot scenario is simulated with the adaptive sectoring mechanism, and it is observed that load balancing between sectors is achieved, which greatly reduces the effect of the hot spot.

Index Terms—Mobility estimation, adaptive sectoring, smart antenna, and CDMA uplink capacity.

I. INTRODUCTION

In wireless cellular networks, CDMA is a promising technology to offer high quality and robust voice/data services. Its RAKE receiver design and soft handoff greatly increase the robustness against multipath and fast fading environments. However, the ever increasing demand for wireless services is continuously challenging the capacity limit of CDMA networks. In this paper, the application of adaptive sectorization to increase the network capacity is studied. It is well known that CDMA systems are interference limited, and sectoring has been an effective means of increasing the network capacity by introducing spatial domain orthogonalization to the system. The conventional method applied in, for example, GSM and IS-95 employs 120° or 60° sectoring to achieve better reuse of network resources. However, one major drawback of this scheme is its inflexibility in dealing with non-stationary and non-uniform mobile distribution. For example, hot spots can cause outage in a sector while other sectors have light traffic.

Paper approved by X. Wang, the Editor for Multiuser Detection and Equalization of the IEEE Communications Society. Manuscript received February 23, 2006; revised January 10, 2007. The authors are with the University of British Columbia, Department of Electrical and Computer Engineering, 2112 Acadia Road, Vancouver, British Columbia V6T 1R5, Canada (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060117.

In this paper, we extend the conventional sectorization by allowing base stations to observe network traffic and adaptively sectorize cells accordingly. The dynamic sectorization is achieved by deploying smart antenna systems at base stations. While smart antennas are often associated with adaptive beam forming, our approach is fundamentally different. Both approaches utilize the spatial domain, but while beam forming directs a dedicated beam to each mobile, sectorization spans cells with a few main beams, each of which corresponds to a sector. Smart antenna systems are supported in many wireless standards: [1] describes auxiliary pilot support with switched beam in CDMA2000, [2] details the application of smart antennas in IS-95, and [3] describes how dedicated pilot symbols in WCDMA systems can render future deployment of smart antennas easier. Many researchers [4]–[6] have investigated the adaptive sectoring problem. [4] considers the case with fixed and known mobile locations, as in the wireless local loop, and formulates the adaptive sectoring as a shortest path problem. The problem is solved for two cases: minimizing the mobiles’ total transmit power and minimizing the base station’s total received power. Our approach follows the modeling technique used in [4] and extends it to take mobile movement into account. [6] assumes a spatial Poisson process with known intensity function $\lambda$, and the probability of having $k$ mobiles in an area $A$ is given by the Poisson distribution $P(k, A) = \frac{(\lambda A)^k}{k!} e^{-\lambda A}$. By fixing $k$ and $P = \sum_{j=k}^{\infty} P(j, A)$, and replacing $A$ by $\frac{r^2\theta}{2}$, where $r$ is the cell radius and $\theta$ is a sector’s angle span, $P$ is the probability of having at least $k$ users in $\theta$. Adaptive sectoring is computed by an iterative method which reduces $\theta$ when $k$ is above a certain threshold. [5] continuously monitors the SINR (signal to interference and noise ratio) of all the users, and sectorizes cells to equalize the SINR in all sectors.
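The Poisson quantities used in the description of [6] are easy to evaluate numerically. The following sketch computes $P(k, A)$ and the tail sum starting at $j = k$ for a sector of angle $\theta$ (in radians) and radius $r$; the function names are hypothetical and the code is an illustration of the formulas above, not the iterative algorithm of [6].

```python
import math


def poisson_pmf(k, lam, area):
    """P(k, A) = (lam*A)^k * exp(-lam*A) / k! for a spatial Poisson process."""
    mean = lam * area
    return mean ** k * math.exp(-mean) / math.factorial(k)


def sector_area(radius, theta):
    """Area r^2 * theta / 2 of a circular sector with angle theta in radians."""
    return 0.5 * radius ** 2 * theta


def tail_probability(k, lam, radius, theta):
    """P = sum over j >= k of P(j, A), computed through the complement."""
    area = sector_area(radius, theta)
    return 1.0 - sum(poisson_pmf(j, lam, area) for j in range(k))
```

Narrowing $\theta$ shrinks the area and hence the tail probability, which is the mechanism the iterative method relies on.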
However, each of the above solutions has certain limitations. The work in [4] is designed for the wireless local loop and does not work with constantly moving mobiles. In [6], the success of the algorithm depends on the knowledge of mobile concentration. Moreover, the SINR-based sectoring in [5] may be unstable because of the shadowing and fast fading in the measurement of SINR [7]. The major difference between adaptive and conventional sectoring is the system’s responsiveness to changes in mobile distribution. Movement of people is observed to follow certain patterns [8], and in this paper, a mobility-enhanced traffic model is developed to capture the dynamics of mobile

concentration over the period of a day from the network traffic observed. Based on the estimated mobile distribution, the sectoring problem is solved to maximize the uplink capacity. The main contributions of the paper are:

1) A mobility-enhanced traffic model is developed. Mobile distribution is modeled as a spatial Poisson process with time-varying rate function. The rate functions of different locations are assumed to evolve according to the mobiles’ aggregate mobility pattern, which is formulated with the Influence model [9]. If the dynamics were formulated as an ordinary Markov chain, the curse of dimensionality would greatly limit the applicability of the model.

2) Recursive MAP estimator of the spatial Poisson’s rate function given the network traffic. It provides real time tracking of mobile distribution over the network. Remark: A global MAP estimator which tracks the joint a posteriori distribution of the entire network’s rate functions is too computationally intensive for real time applications. The MAP estimator developed tracks the marginal a posteriori distribution of each individual location’s rate function.

3) Formulation of the adaptive sectoring problem as a shortest path problem for changing mobile distribution. The modeling technique was first applied in [4] to sectorize cells in wireless local loop based on individual mobiles, and it is extended to work with the aggregate mobility pattern in this paper.

The paper is organized as follows. Sec. II defines the adaptive sectoring problem and formulates the related models. Sec. III develops the algorithms for solving the adaptive sectoring problem. Sec. IV presents the simulation results and Sec. V concludes the paper.

II. MODEL DEFINITIONS AND FORMULATION OF ADAPTIVE SECTORING PROBLEM

In this section, adaptive sectoring of CDMA networks is formulated as a sequence of uplink capacity maximization problems, and the adaptive capability deals particularly with the mobility pattern of mobile users. Uplink capacity is chosen as the cost function because it is the limiting factor [10], [11], and it is measured by the probability of interference at a base station exceeding a certain threshold value [10], [12], i.e., the outage probability. In summary, the aim is to formulate an outage probability minimization problem for CDMA uplinks with adaptive sectorization. In order to compute outage probability at each base station, knowledge of mobiles’ whereabouts is necessary. Yet, as the number of mobiles increases, tracking individuals is computationally intensive, and it may lead to frequent sectoring because of their various movement patterns. The approach taken considers mobiles’ aggregate movement, and it is implemented by dividing the network into areal units and tracking the time evolution of mobile concentration at each unit. The tracked mobile concentration in turn enables the computation of outage probability. The adaptive sectoring problem consists of three components: the formulation of the sectoring problem as an optimization problem, the mobility

model for the mobile users, and the numerical computation of outage probability. The components are established in this section.

A. Formulation of Adaptive Sectoring Problem

The network model consists of hexagonal cellular networks and, as illustrated in Fig. 1, each cell is divided into six equally spaced areal units called subareas. As will be described in further detail in the next two subsections, mobility of the mobiles is modeled as a graph, where each node represents the mobile concentration in each subarea and the nodes are connected by edges indicating the prior assumption of the movement patterns of the mobiles. In this subsection, given the mobile concentration, discrete sectoring is considered and formulated as an optimization problem. Sectors at each base station are defined by the antenna’s sector-beams, whose beamwidth is multiples of a subarea’s angular span. Perfect beam pattern (no overlap between beams) is assumed, and thus mutual interference is ignored. Note the dimension of subareas defines the granularity of the model, i.e., finer tracking of mobile distribution is enabled by smaller subareas, but with higher computational load on the system. A natural mathematical representation of the adaptive sectoring problem is with graph partitioning. The key advantage is that, under certain conditions, the partitioning problem has a one-to-one correspondence to a shortest path problem, and which is readily solvable. The modeling technique described was first applied in [4] to sectorize wireless local loops with stationary mobiles, and it is extended in this paper to deal with aggregate statistics of mobile movement. Because the rest of the paper builds on top of the model in [4], it is briefly summarized in this subsection. Fig. 1 illustrates the graph theoretical representation of the cellular network.
A cell is modeled as a ring of nodes where each node represents a subarea, and the sectoring problem is equivalent to the partitioning of nodes (subareas) into subsets (sectors). Denote by $A = \{a_1, a_2, \ldots, a_M\}$ the nodes of the ring, and by $\pi = \{S_1, \ldots, S_N\}$ the partition of $A$ into $N$ subsets. The partitioning is considered optimal if it minimizes the cost function $C(\pi)$, where $C(\pi)$ is the outage probability experienced in all sectors of $\pi$. Denoting the outage probability in each sector $S_i$ as $W(S_i)$, the adaptive sectoring problem is reduced to a graph partitioning problem with the following cost function:

$$C(\pi) = \sum_{i=1}^{N} W(S_i) \tag{1}$$

In general, the problem of optimally partitioning an arbitrary graph with an arbitrary cost function is an NP-hard optimization problem. However, it has been shown that the partitioning problem can be solved in polynomial time if the graph is a string and the cost function is separable.

Definition A function of $M$ variables, $f(x_1, x_2, \ldots, x_M)$, is separable if it can be expressed as a sum of $M$ functions of a single variable; i.e., $f(x_1, x_2, \ldots, x_M) = \sum_{i=1}^{M} f_i(x_i)$.

Theorem 1 If the cost function is separable, the problem of optimally partitioning a string can be reduced to a shortest

Fig. 1. The network is modeled as a hexagonal cellular network, and each cell is divided into six equally spaced areas called subareas. The graph-theoretic representation of a cell is illustrated. The graph on the left is the ring representation of a cell (nodes v1 to v6 joined by edges e1 to e6), where each node is a subarea. (Sectors are disjoint subsets of nodes.) The graph on the right is one of six reduced string representations, where the edge e6 is arbitrarily chosen and removed.

path problem. ([4] applied this result to sectorize wireless local loop.)

Proof The proof can be found in [13].

The important observation of a ring is that it can be broken into a string if an edge is removed. In addition, with the perfect sector-beam assumption, the computation of a sector’s outage probability is independent of other sectors. In other words, the hypothesis of the above theorem holds. Fig. 1 illustrates the string representation where the edge e6 is arbitrarily chosen and removed. It should be noted that the removal of an edge trades problem complexity for computational complexity, since six strings are generated from one ring. The mapping of the graph partitioning problem to a shortest path problem is illustrated by the construction of an acyclic network; the details are omitted here and can be found in [4], [13]. The important point to note is that once the acyclic network is constructed, the weight at each edge corresponds to the outage probability in each sector, and it changes in time. According to the above formulation, the adaptive sectoring problem can be viewed as a shortest path problem with a changing weight matrix, where the weight depends on the evolving statistics of mobile distribution. In Sec. II-B, a model is developed to track the statistics in each subarea, and Sec. III-C computes the outage probability.
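The reduction can be illustrated with a small dynamic program. Partitioning a string into $N$ contiguous segments with an additive cost is a shortest path computation on an acyclic network whose edges are candidate segments; a ring is handled by removing each edge in turn and keeping the best string solution. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the construction is not necessarily the one in [4] or [13], and `weight` is a placeholder for the per-sector outage probability $W(S_i)$.

```python
def partition_string(nodes, n_sectors, weight):
    """Split a string of nodes into n_sectors contiguous segments of minimum total weight.

    best[j][n] is the cheapest way to cover the first j nodes with n segments,
    i.e., the shortest path to vertex (j, n) in an acyclic network.
    """
    m = len(nodes)
    if not 1 <= n_sectors <= m:
        raise ValueError("need 1 <= n_sectors <= number of nodes")
    inf = float("inf")
    best = [[inf] * (n_sectors + 1) for _ in range(m + 1)]
    back = [[0] * (n_sectors + 1) for _ in range(m + 1)]
    best[0][0] = 0.0
    for j in range(1, m + 1):
        for n in range(1, min(j, n_sectors) + 1):
            for i in range(n - 1, j):  # the last segment is nodes[i:j]
                cost = best[i][n - 1] + weight(nodes[i:j])
                if cost < best[j][n]:
                    best[j][n], back[j][n] = cost, i
    # Walk the back-pointers to recover the segments
    sectors, j = [], m
    for n in range(n_sectors, 0, -1):
        i = back[j][n]
        sectors.append(nodes[i:j])
        j = i
    return best[m][n_sectors], sectors[::-1]


def partition_ring(nodes, n_sectors, weight):
    """Break the ring at every edge in turn and keep the best string partition."""
    best_cost, best_sectors = float("inf"), None
    for r in range(len(nodes)):
        rotated = nodes[r:] + nodes[:r]  # removes the edge entering node r
        cost, sectors = partition_string(rotated, n_sectors, weight)
        if cost < best_cost:
            best_cost, best_sectors = cost, sectors
    return best_cost, best_sectors
```

As a toy example with loads `[3, 1, 4, 1, 3]`, two sectors, and the squared total load as a stand-in weight, the string with the last edge removed gives a cost of 80, while the ring search finds the wrap-around partition `{1, 2, 3}`, `{4, 0}` with cost 72.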

B. Mobility-Enhanced Traffic Model

In this subsection, the aim is to develop a mobility model for mobile users that enables the tracking of mobile distribution by collecting its statistics. The first major problem that has to be addressed is the observability of mobility. In general, mobility is not observable, and it can only be indirectly observed through the network traffic processed at the base station. It is realized that if mobile users place calls according to a Poisson process, something could be said regarding the mobiles’ spatial distribution according to the following theorem.

Theorem 2 Let $\Pi$ be a Poisson process of arriving calls at a base station with constant rate, $X$, from mobiles in an arbitrary subarea. Once the mobiles have placed the call, they move at random around the subarea with independent trajectories. Let $E$ be a spatial subset of the subarea such that the probability of the mobile who called at time $s$ being in $E$ at a subsequent time $t$ is $p(s, t)$. Then the number of mobiles in $E$ at time $t$ has a Poisson distribution with mean

$$u(t) = X \int_0^t p(s, t)\,ds.$$

Assuming uniform distribution for $p(s, t)$ over the subarea, the distribution of the mobiles in the subarea is a spatial Poisson process with rate equal to that of the arriving calls.

Proof The proof can be found in [14], (pg 49 Bartlett’s Theorem).

In Theorem 2, a number of assumptions are made, and it is worthwhile going into the details.

Assumption 1 The arrival process at the base station is a Poisson process with constant rate. The arrival process referred to in the Theorem is the connection requests made by mobiles in the subarea. For example, the number of times the Access Channel is requested in IS-95 or CDMA2000. From the study of broadband network traffic [15], the connection request is generally modeled as an inhomogeneous Poisson process. The additional assumption is that the rate function, $X$, in a subarea is a jump process with finite states, and jumps occur on an hourly basis.

Assumption 2 Mobiles are distributed uniformly over the subarea. The assumption is made to simplify the discussion, and it seems reasonable if the subarea is small enough such that highly attractive locations such as shopping malls do not appear as a clustered point in the subarea. However, other distributions may be applied but they are not studied here.

Given the relation of the network traffic and the spatial Poisson process, we can express mobile concentration in a subarea as a spatial Poisson process and estimate, in real time, its rate function based on the statistics of connection requests. In addition, the time dynamics of the rate function in each subarea can be expressed as a function of mobiles’ mobility pattern, which will be given in more detail in the next subsection. The network model based on the spatial Poisson process is formulated and given below.

Model Definition Let $i = 1, 2, \ldots, M$ index subareas, where $M$ is the number of subareas in the network, and let $k = 0, 1, \ldots, 23$ denote hours of a day. $\Pi_k^i$ is a spatial Poisson process with a constant rate $X_k^i$ in the subarea $i$ during the time interval $[k, k + 1)$. $X_k = [X_k^1, X_k^2, \ldots, X_k^M]$ is a discrete time discrete state stochastic process, and, with Theorem 2, its state controls the rate of connection arrivals observed in each subarea. If there is only one subarea, $X_k^1$ is a hidden


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Markov chain observed through a Poisson process. For M subareas, the state $X_k^i$ of every subarea i is modeled with the Influence model [9]. Suppose the subarea of interest is i, and let D(i) denote the dependency set of i, which consists of i itself and its adjacent subareas. The transition probability for $X_k^i$ is

\[ P(X_{k+1}^i \mid X_k^1, \ldots, X_k^M) = \sum_{j \in D(i)} d_{ij}\, P(X_{k+1}^i \mid X_k^j), \tag{2} \]

where $d_{ij}$ (with $\sum_j d_{ij} = 1$ and $d_{ij} > 0$) and $P(X_{k+1}^i \mid X_k^j)$ are model parameters that are assumed known. (Some parameter estimation techniques can be found in [16].) Furthermore, the initial probability distribution $P(X_{k=0}^i)$ is also assumed known for all i.

C. Justification of Influence Model and Model Reduction with Geographical Interpretation

The premises of the mobility model are the mapping of the aggregate inter-subarea mobile movement to the network-traffic pattern, and the use of a Markovian framework to model the aggregate movement. The implicit model assumptions made in (2) are the division of the network into subareas, and the pattern of spatial interaction embedded in the parameters $d_{ij}$ and $P(X_{k+1}^i \mid X_k^j)$. In this subsection, we provide a justification of these assumptions that is in line with the Markovian framework applied in geographical analysis, such as the study of migration processes and commuting patterns, and discuss how the model parameters could be reduced based on their geographical interpretation.

The use of a Markovian nodal structure in geographical analysis is common, and some examples can be found in [17], [18]. The most straightforward approach to characterizing the network's nodal structure is via the transition matrix, i.e., $P(X_{k+1}^1, \ldots, X_{k+1}^M \mid X_k^1, \ldots, X_k^M)$, which contains the joint state dynamics of the network. However, such a complete specification is not desirable: not only does the model dimension grow exponentially with the number of subareas, but the interpretation of the model parameters is also difficult. In Asavathiratham et al. [9], the Influence Model is introduced to describe interactions between many Markov chains. The model simplifies $P(X_{k+1}^1, \ldots, X_{k+1}^M \mid X_k^1, \ldots, X_k^M)$ to the M conditional distributions $P(X_{k+1}^i \mid X_k^1, \ldots, X_k^M)$, and for each i, $P(X_{k+1}^i \mid X_k^1, \ldots, X_k^M)$ is a convex combination of $P(X_{k+1}^i \mid X_k^j)$ as shown in Eq. (2). Note that Eq. (2) degenerates to a standard Markov chain if $d_{ij} = 1$ for i = j, and 0 otherwise.

In order to explain the spatial interaction among different subareas, we are concerned with the attributal and associational properties of nodes [19]. Attributal properties refer to nodal characteristics due to the node itself (e.g., population), and associational properties refer to nodal characteristics due to the relationship between the nodes (e.g., distance). The associational properties are captured by the model parameter $d_{ij}$ and the attributal properties by the conditional probability $P(X_{k+1}^i \mid X_k^j)$.

The spatial interaction interpretation of the parameters is established by some geographical indices. In terms of mobility, for example, attributes such as residential or business district may be assigned to each node, and indices such as the number of bus routes may be assigned to each edge. Other choices such

as the number of office buildings, residences, or the number of registered companies may also be used. According to their geographical interpretation, the model parameters can either be empirically estimated based on some geographical indices or be aggregated according to their attributes to reduce the model complexity.

The parameter $d_{ij}$ is a constant factor indicating how often subarea i is influenced by subarea j, and it can be interpreted as the probability of mobiles commuting from j to i as a function of the routes connecting them. The parameter could be empirically estimated, for example, by counting the outgoing bus routes from one subarea to another. Consider the Home-Work mobility pattern seen in many mobility papers [20], [21] and a simple network consisting of only three fully connected subareas, A, B, and C. Let $m_A$, $m_B$ and $m_C$ be the proportions of working people in A, B and C respectively, and let $b_{ij}$ be the number of bus routes running from j to i. The influence matrix can be constructed as

\[
\begin{pmatrix}
1 - m_A & m_A \left( \frac{b_{BA}}{b_{BA} + b_{CA}} \right) & m_A \left( \frac{b_{CA}}{b_{BA} + b_{CA}} \right) \\
m_B \left( \frac{b_{AB}}{b_{AB} + b_{CB}} \right) & 1 - m_B & m_B \left( \frac{b_{CB}}{b_{AB} + b_{CB}} \right) \\
m_C \left( \frac{b_{AC}}{b_{AC} + b_{BC}} \right) & m_C \left( \frac{b_{BC}}{b_{AC} + b_{BC}} \right) & 1 - m_C
\end{pmatrix}.
\]

On the other hand, the conditional probability $P(X_{k+1}^i \mid X_k^j)$ specifies the effect of subarea j on subarea i, and it could be interpreted as the probability of mobiles switching between active and non-active talking states given that they are, for example, commuting from a business district to a residential area. In the same Home-Work setting, the complexity of the parameters can be reduced by assigning a residential or business attribute to each subarea. The attribute is useful because, combined with the state of the subarea (mobile concentration), the time of day can be inferred and thus the time variance in the parameter removed. Intuitively, a residential area has high traffic, for example, in the morning and the evening when people are not working. Moreover, the attributes allow the application of the previous technique to empirically estimate the parameters.

In addition to the geographical interpretation that the Influence model possesses and the ways to estimate and reduce the model parameters, the model complexity of the Influence model is another advantage in justifying its use. Suppose there are M subareas and each subarea has P states; the total number of model parameters is $MP^2 + M^2$, which is greatly reduced from $P^{2M}$ in the case of complete specification.
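The Home-Work influence matrix and the parameter counts above can be checked with a short Python sketch. The function names, the dictionary layout, and all numerical values (worker proportions and bus-route counts) are hypothetical and chosen only for illustration; the indexing follows the matrix as printed, so the off-diagonal entry in row i, column j is $m_i b_{ji} / \sum_{l \neq i} b_{li}$.

```python
def home_work_influence_matrix(m, b):
    """Build the influence matrix of the three-subarea Home-Work example.

    m[i]      : proportion of working people in subarea i
    b[(i, j)] : number of bus routes running from j to i
    """
    areas = sorted(m)
    matrix = {}
    for i in areas:
        others = [j for j in areas if j != i]
        total = sum(b[(j, i)] for j in others)  # e.g. b_BA + b_CA for row A
        row = {i: 1.0 - m[i]}                   # diagonal entry 1 - m_i
        for j in others:
            row[j] = m[i] * b[(j, i)] / total
        matrix[i] = row
    return matrix


def influence_param_count(n_subareas, n_states):
    """Parameter count M*P^2 + M^2 quoted for the Influence model."""
    return n_subareas * n_states ** 2 + n_subareas ** 2


def full_param_count(n_subareas, n_states):
    """Parameter count P^(2M) quoted for the complete specification."""
    return n_states ** (2 * n_subareas)


# Hypothetical inputs.
workers = {"A": 0.5, "B": 0.4, "C": 0.2}
routes = {("B", "A"): 3, ("C", "A"): 1,
          ("A", "B"): 2, ("C", "B"): 2,
          ("A", "C"): 1, ("B", "C"): 4}
D = home_work_influence_matrix(workers, routes)
```

Each row of `D` sums to one, as required of the weights in (2). With M = 3 subareas and P = 3 states, the two counts are 36 and 729 respectively.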

D. CDMA Network Assumptions and Outage Probability Calculation In this subsection, the models used for the CDMA network and the propagation are introduced, and the outage probability expression is derived. For adaptive sectoring, because the performance analysis concerns time scale in hours, many important CDMA physical layer effects, such as the signature sequence structure and the fast fading losses, are not included. The performance of the adaptive sectoring is studied with perfect power control, soft handoff and log-normal shadowing. Propagation Model and Interference Calculation The propagation loss in general is modeled as the product of γth power

WANG and KRISHNAMURTHY: MOBILITY ENHANCED SMART ANTENNA ADAPTIVE SECTORING FOR UPLINK CAPACITY MAXIMIZATION

of distance and a log-normal shadowing component [10]. Let $B_n$ denote the location of base station n. For an arbitrary mobile at location z, the following propagation loss model is assumed:

\[ \Gamma_z[B_n] \equiv d[z, B_n]^{\gamma}\, 10^{\xi/10} = d[z, B_n]^{\gamma}\, 10^{a \xi_z/10}\, 10^{b \xi_{z,B_n}/10}, \tag{3} \]

where $d[z, B_n]$ is the distance between z and $B_n$, and $\gamma$ is the path loss exponent. The shadowing $\xi = a\xi_z + b\xi_{z,B_n}$ is the superposition of two components: $\xi_z$ is the shadowing in the near field of the mobile at point z, and $\xi_{z,B_n}$ is the shadowing in the wireless link between the mobile at z and the base station $B_n$. $\xi_z$ and $\xi_{z,B_n}$ are independent Gaussian random variables with the following properties: $E(\xi_z) = E(\xi_{z,B_n}) = 0$, $\mathrm{Var}(\xi_z) = \mathrm{Var}(\xi_{z,B_n}) = \sigma^2$, $E(\xi_z \xi_{z,B_n}) = 0$ for all n, and $E(\xi_{z,B_n} \xi_{z,B_m}) = 0$ for all $n \neq m$. In addition, $\xi_z$ and $\xi_{z,B_n}$ are assumed to have equal standard deviation, and thus $a^2 = b^2 = 1/2$ is assumed.

With the propagation model in place, the base station that performs the power control and the set of base stations that participate in soft handoff can be defined. Let $B = [B_1\ B_2\ \ldots\ B_N]$ denote the N base stations in the network; the path loss of a mobile at location z to each of the base stations is then $\Gamma_z[B] = [\Gamma_z[B_1]\ \Gamma_z[B_2]\ \ldots\ \Gamma_z[B_N]]$. The base station that power controls the mobile at z is defined as $C_z = \arg\min_i \Gamma_z[B_i]$. With regard to soft handoff, assume $N_s$ base stations are involved; the soft handoff set, $\zeta_z$, is defined with respect to location z as the set of $N_s$ base stations with the least path loss values in $\Gamma_z[B]$. $N_s$ is taken to be 2 in the rest of the paper. Furthermore, the received power from the mobile at z is power controlled to have magnitude 1 at the base station $C_z$. As a result, a mobile at z that is power controlled by $C_z$ has to transmit with power $\Gamma_z[C_z]$, and the interference it induces on base station $B_n$ is equal to

\[ I = \begin{cases} \Gamma_z[C_z]/\Gamma_z[B_n], & \text{if } B_n \neq C_z \\ 1, & \text{if } B_n = C_z. \end{cases} \tag{4} \]

Interference Calculation Revisited with Spatial Poisson In Sec. II-B, Theorem 2 establishes that the mobile distribution in the network is spatial Poisson. The problem to be addressed is to revisit the interference calculation and take the spatial Poisson process into account. Fig. 2 shows the network model conceptually.
Each point $z \in \mathbb{R}^2$ is assigned a soft handoff set $\zeta_z$, which is represented as a diamond-shaped area; any mobile within the diamond is in soft handoff with the two base stations at the vertices, and is power controlled by the one with the smaller path loss. Suppose the interference at $B_1$, the central base station in Fig. 2, is of interest. Let S be the set of all subareas in the shaded area A, and $\Pi_k^i$ the spatial Poisson process of subarea i with rate $X_k^i$; the entire set of mobiles loading the sector is denoted as $\Pi_k = \bigcup_{i \in S} \Pi_k^i$. With (4), the total interference at the sector of $B_1$ is

\[ I[B_1] = \sum_{z \in \Pi_k} \Gamma_z[C_z]/\Gamma_z[B_1]. \tag{5} \]

According to [22], (5) can be classified into two components by identifying the two point patterns in $\Pi_k$; any point in $\Pi_k$


Fig. 2. Computation of the outage probability at the central base station 1 for a particular sector configuration covering the network area A. The diamonds shown in the figure represent the soft handoff set (Soft handoff is assumed to involve only two base stations in this paper), that is, for each mobile located in a diamond, the mobile is in soft handoff with the two base stations at the diamond’s vertices, and power controlled by the one with smaller pathloss.

can either belong to the point pattern corresponding to $C_z = B_1$ ($\Pi_k[B_1]$) or to the one corresponding to $C_z \neq B_1$ ($\Pi_k[\bar{B}_1]$). The classification is achieved by the Poisson Marking theorem. According to the network definitions given above, the probability that a mobile at z is power controlled by $B_1$ is $P(C_z = B_1) = P(\Gamma_z[B_1] < \Gamma_z[B_n])$, and the probability that it is not power controlled by $B_1$ is $P(C_z \neq B_1) = P(\Gamma_z[B_1] > \Gamma_z[B_n])$, for all $n \in \zeta_z$ and $n \neq 1$. Furthermore, let $z \in \mathbb{R}^2$ and define the mean measure of $\Pi_k$ as shown in (6). The two point patterns of mobiles in $\Pi_k$ are then classified as follows.

In-cell mobile is the point pattern $\Pi_k[B_1]$ with mean measure $m[B_1](dz)$ defined by $d(m[B_1])(z) = P(C_z = B_1)\,dm(z)$.

Other-cell mobile is the point pattern $\Pi_k[\bar{B}_1] \equiv \Pi_k - \Pi_k[B_1]$ with mean measure $m[\bar{B}_1](dz)$ defined by $d(m[\bar{B}_1])(z) = P(C_z \neq B_1)\,dm(z) = (1 - P(C_z = B_1))\,dm(z)$.

As a result, let $I^i[B_1]$ and $I^o[B_1]$ denote the in-cell and other-cell interference at $B_1$ respectively. The total interference at $B_1$ expressed in (5) is the sum of $I^i[B_1]$ and $I^o[B_1]$:

\[ I[B_1] = I^i[B_1] + I^o[B_1] = \sum_{z \in \Pi_k[B_1]} 1 + \sum_{z \in \Pi_k[\bar{B}_1]} \Gamma_z[C_z]/\Gamma_z[B_1]. \tag{7} \]

The outage probability of the sector is then

\[ P\left( \sum_{z \in \Pi_k[B_1]} 1 + \sum_{z \in \Pi_k[\bar{B}_1]} \Gamma_z[C_z]/\Gamma_z[B_1] > \alpha \right), \tag{8} \]

where $\alpha$ is the threshold value for the total interference. The

evaluation of (8) is presented in Sec. III-C. Note that the outage probability expression derived in this subsection is for a particular sector partition; however, it should be clear that the expression is identical for different partitions except for the spatial Poisson process $\Pi_k$.

\[ m(dz) = \sum_{i \in S} X_k^i\, \delta(i, dz), \quad \text{where } \delta(i, dz) = \begin{cases} 1, & \text{if } dz \text{ is in subarea } i \\ 0, & \text{otherwise.} \end{cases} \tag{6} \]

III. ADAPTIVE SECTORING ALGORITHM

In the previous section, the adaptive sectoring problem is mapped to a shortest path problem on an acyclic network with a changing weight matrix. In this section, the algorithms that work behind each model component are described. Sec. III-A provides an overview, Sec. III-B describes the MAP estimator of the spatial Poisson process's rate function, Sec. III-C computes the outage probability, and Sec. III-D discusses a possible antenna architecture for deploying the system discussed.

A. Overview of the Adaptive Sectoring Algorithm in Pseudocode

Recall from Sec. II-B that the time t of a day is divided into hourly intervals $k \in \{0, 1, \ldots, 23\}$. Let M be the number of subareas and N the number of sectors in a cell. The pseudocode of the adaptive sectoring algorithm is provided in Algorithm 1.

Algorithm 1 Adaptive Sectoring Algorithm
1: Initialize $P(X_{k=0}^i)$ for all subareas i and the Influence model parameters {See Sec. II-B}
2: for (k = 0; k ≤ 23; k++) do
3:   t ← k
4:   repeat
5:     t = t + T
6:     MAP estimation of each subarea's rate function. Complexity O(M). {See Sec. III-B}
7:   until (MAP estimator stabilizes)
8:   Construct the shortest path's weight matrix. Complexity O(M). {See Sec. II-A for the shortest path formulation and Sec. III-C for the outage probability computation}
9:   Dijkstra's algorithm to find the shortest path. Complexity $O(M^2 N)$.
10:  Cell sectorization. {See Sec. III-D}
11:  Update the rate function estimator for k + 1. {See Sec. III-B}
12: end for

B. MAP Estimator of Spatial Poisson's Rate Function

In this subsection, a MAP estimator of $X_k^i$ given the traffic statistics is introduced. It should be noted that, due to computational complexity, the joint a posteriori distribution of $X_k$ is not tracked, but only the a posteriori distribution of each $X_k^i$ (see Appendix A for details). Since the estimators for all subareas are equivalent, and for notational convenience, the subarea to be estimated is labeled as $X_k^1$, and the neighbors of $X_k^1$ are labeled as $X_k^2$, $X_k^3$ and $X_k^4$. The setup is illustrated in Fig. 7.

MAP Estimator Algorithm Let $\Pi_k^i$ be a spatial Poisson process with rate $X_k^i$ at subarea i, and let $\{N_k^i(\sigma); k \le \sigma < t\}$ denote the observed path of $\Pi_k^i$ in the time interval [k, t), i.e., the connection requests processed at the base station. The approximate a posteriori probability mass function of $X_k^1$ is iteratively calculated by the following algorithm. For $t \in [k, k+1)$, define $\Delta N_t^i = N_k^i(t + \Delta t) - N_k^i(t)$, where $\Delta t$ is an arbitrary time interval. The a posteriori probability mass function $P(X_k^i \mid N_k^i(\sigma); k \le \sigma < t)$, for i = 1, 2, 3, 4, is updated as shown in (9), where $\bar{X}_k^i = E\{X_k^i \mid N_k^i(\sigma); k \le \sigma < t\}$, and for $\Delta t$ small enough, $\Delta N_t^i$ is either 1 or 0 depending on the occurrence or nonoccurrence of an event.

At the end of the time interval [k, k + 1), label $n^1 = \{N_k^1(\sigma); k \le \sigma < k+1\}, \ldots,$ and $n^4 = \{N_k^4(\sigma); k \le \sigma < k+1\}$. The probability mass function of subarea 1 at the beginning of the next time interval [k + 1, k + 2) is

\[ P(X_{k+1}^1 = x \mid n^1, n^2, n^3, n^4) = \sum_{j=1}^{4} d_{1j} \sum_{X_k^j} P(X_{k+1}^1 = x \mid X_k^j)\, P(X_k^j \mid n^j), \tag{10} \]

where $d_{1j}$ and $P(X_{k+1}^1 \mid X_k^j)$ are Influence model parameters. For $t \in [k+1, k+2)$, (9) again continuously updates the a posteriori probability upon receiving connection requests. As a result, assuming the initial probability $P(X_{k=0}^i)$ is known for all i, the a posteriori probability of $X_k^i$ can be tracked for any time t, and thus the MAP estimator at time t is

\[ \arg\max_x P(X_k^1 = x \mid N_k^1(\sigma), \ldots, N_k^4(\sigma); k \le \sigma < t). \tag{11} \]
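The three steps of the estimator (the finite-difference update (9), the hourly prediction (10), and the MAP decision (11)) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the three rate values, the step size, and the synthetic arrival pattern are all hypothetical, the $o(\Delta t)$ term of (9) is dropped, and a clip-and-renormalize safeguard is added that is not part of (9).

```python
def update_posterior(post, rates, d_n, dt):
    """One finite-difference step of (9) with the o(dt) term dropped.

    post  : current posterior pmf over the candidate rate values
    rates : candidate (strictly positive) values of the rate X_k^i
    d_n   : number of connection requests observed in [t, t + dt), 0 or 1
    dt    : step length, in the same time unit as 1/rate
    """
    mean_rate = sum(p * x for p, x in zip(post, rates))
    innovation = d_n - mean_rate * dt
    new = [p * (1.0 + (x - mean_rate) / mean_rate * innovation)
           for p, x in zip(post, rates)]
    # Numerical safeguard (an addition to (9)): clip and renormalize.
    new = [max(v, 0.0) for v in new]
    total = sum(new)
    return [v / total for v in new]


def predict_next_hour(d_row, trans, posts):
    """Prediction (10): mix the neighbours' end-of-hour posteriors."""
    n_states = len(trans[0][0])
    pred = [0.0] * n_states
    for w, t_mat, post in zip(d_row, trans, posts):
        for a, p_a in enumerate(post):
            for b in range(n_states):
                pred[b] += w * p_a * t_mat[a][b]
    return pred


def map_state(post, rates):
    """MAP decision (11): the rate value with the largest posterior mass."""
    best = max(range(len(post)), key=lambda s: post[s])
    return rates[best]


# Synthetic check: 20 requests in one time unit, candidate rates 2, 10, 20.
rates = [2.0, 10.0, 20.0]
post = [1.0 / 3.0] * 3
dt = 0.001
for step in range(1000):
    d_n = 1 if step % 50 == 49 else 0
    post = update_posterior(post, rates, d_n, dt)
estimate = map_state(post, rates)
```

Without the safeguard, the update preserves the total probability exactly, because the correction term has zero mean under the current posterior; the clipping only matters if $\Delta t$ is chosen too large.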

C. Outage Probability Evaluation for Adaptive Sectoring

In this subsection, the aim is to evaluate the outage probability (8) of an arbitrary sector configuration. The spatial Poisson distribution is assumed known, and its resulting outage probability is computed. Let $\Pi_k$ be the union of spatial Poisson processes loading a sector. The total interference received at the sector of base station $B_1$ is computed with (7), and the outage probability is

\[ P(I^i[B_1] + I^o[B_1] > \alpha) = P\left( \sum_{z \in \Pi_k[B_1]} 1 + \sum_{z \in \Pi_k[\bar{B}_1]} \Gamma_z[C_z]/\Gamma_z[B_1] > \alpha \right). \]

Recall that $\Pi_k[B_1]$ is the mobile point pattern power controlled by $B_1$ and $\Pi_k[\bar{B}_1]$ is the point pattern not power controlled by $B_1$. The first term, $I^i[B_1]$, counts the points of a Poisson pattern and is therefore a Poisson random variable. In addition, in order for the base station $B_1$ to be well defined, the condition $I^i[B_1] > 0$ should be imposed. Combining the two observations, the outage probability becomes

\[ P(I^i[B_1] + I^o[B_1] > \alpha \mid I^i[B_1] > 0) = \frac{e^{-u}}{1 - e^{-u}} \sum_{j=1}^{\infty} \frac{u^j}{j!}\, P(I^o[B_1] > \alpha - (j - 1)), \]

where $u \equiv E(I^i[B_1])$. The exact expression for $P(I^o[B_1])$ is difficult to obtain; however, [12] and [22] show that a Gaussian approximation can be applied. The approximation is motivated by the central limit theorem and is treated rigorously in [12]. The mean and variance of the Gaussian approximation are the first and second cumulants of $I^o[B_1]$ respectively, whose computation is shown in Theorems 3 and 4.

Theorem 3 Let $\Pi_k[\bar{B}_1]$ be a Poisson point pattern on the network area A with mean measure $m[\bar{B}_1]$, and let $I^o[B_1] = \sum_{z \in \Pi_k[\bar{B}_1]} \Gamma_z[C_z]/\Gamma_z[B_1]$. If $\int_A \min(|\Gamma_z[C_z]/\Gamma_z[B_1]|, 1)\, m[\bar{B}_1](dz) < \infty$ holds, then for any complex number s,

\[ E(\exp(s I^o[B_1])) = \exp\left\{ \int_A \left[ \exp(s\, \Gamma_z[C_z]/\Gamma_z[B_1]) - 1 \right] m[\bar{B}_1](dz) \right\}. \]

Proof The proof can be found in [22]. The hypothesis holds since the area A is finite (only the first layer of interference is considered in this paper), and the mean measure of the spatial Poisson process has finite states.

Theorem 4 Divide the network area A into $A_{\text{in-cell}}$ and $A_{\text{other-cell}}$, where in-cell (other-cell) is defined by the inclusion (exclusion) of $B_1$ in the soft handoff set $\zeta_z$ at the location z. Label the soft handoff base stations in $A_{\text{in-cell}}$ as $C_z$ and $B_1$, and the base stations in $A_{\text{other-cell}}$ as $B_m$ and $B_n$ (assuming soft handoff between 2 base stations). Let $\kappa_c$ denote the cth cumulant of $I^o[B_1]$, and let $\phi_z[B_1] = \Gamma_z[C_z]/\Gamma_z[B_1]$ if $C_z \neq B_1$ and 0 otherwise. Then $\kappa_c$ is given by (12), where m(dz) is defined in (6), $\beta \equiv \ln 10/10$ and $M_{B_l} \equiv 10\gamma \log_{10} d[z, B_l]$.

Proof The proof can be found in [22].

With (12), the mean and variance of $I^o[B_1]$, $\kappa_1$ and $\kappa_2$ respectively, are calculated, and the outage probability becomes

\[ P(I^i[B_1] + I^o[B_1] > \alpha \mid I^i[B_1] > 0) = \frac{e^{-m}}{1 - e^{-m}} \sum_{j=1}^{\infty} \frac{m^j}{j!}\, Q(\tilde{y}_j), \tag{13} \]

where $\tilde{y}_j \equiv (\alpha - j + 1 - \kappa_1)/\sqrt{\kappa_2}$, m is the mean of $\Pi_k[B_1]$ and Q is the Q-function of the standard normal distribution. From the above equations, it can be seen that if the rate function of the spatial Poisson process is known, the cost function for each sectoring configuration can be computed. However, the computation requires two numerical integrations: one for the in-cell mobiles and the other for the other-cell mobiles. The numerical integration process is computationally intensive and time consuming. Fortunately, because the rate function of the spatial Poisson process is assumed to be constant over each subarea, the integration can be precomputed, and the real-time operation has computational complexity linear in the number of subareas in $\Pi_k$.

\[ P(X_k^i \mid N_k^i(\sigma); k \le \sigma < t + \Delta t) = P(X_k^i \mid N_k^i(\sigma); k \le \sigma < t) \left[ 1 + (X_k^i - \bar{X}_k^i)\, (\bar{X}_k^i)^{-1} (\Delta N_t^i - \bar{X}_k^i \Delta t) \right] + o(\Delta t), \tag{9} \]
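Once $\kappa_1$ and $\kappa_2$ are available, the series (13) is inexpensive to evaluate. The following Python sketch truncates the sum once the Poisson weights become negligible. The function names, the truncation rule, and the parameter values used in the example are assumptions for illustration; the cumulants are taken as given inputs here instead of being computed from (12).

```python
import math


def q_function(y):
    """Tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))


def outage_probability(alpha, m, kappa1, kappa2, tol=1e-12, max_terms=100000):
    """Evaluate (13) under the Gaussian approximation of I^o[B_1].

    alpha  : interference threshold
    m      : mean number of in-cell mobiles (mean of the Poisson count)
    kappa1 : first cumulant (mean) of the other-cell interference
    kappa2 : second cumulant (variance) of the other-cell interference
    """
    if m <= 0.0 or kappa2 <= 0.0:
        raise ValueError("m and kappa2 must be positive")
    total = 0.0
    weight = math.exp(-m)            # Poisson pmf at j = 0
    for j in range(1, max_terms + 1):
        weight *= m / j              # Poisson pmf at j
        y = (alpha - j + 1 - kappa1) / math.sqrt(kappa2)
        total += weight * q_function(y)
        if j > m and weight < tol:   # the remaining tail is negligible
            break
    # Condition on at least one in-cell mobile: divide by 1 - exp(-m).
    return total / (-math.expm1(-m))


# Hypothetical values: the outage grows with the in-cell load m.
p_light = outage_probability(alpha=20.0, m=5.0, kappa1=3.0, kappa2=4.0)
p_heavy = outage_probability(alpha=20.0, m=10.0, kappa1=3.0, kappa2=4.0)
```

Since the Q-function terms do not depend on m, a table of $Q(\tilde{y}_j)$ values could be cached per sector configuration and reused as the estimated rates change.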

D. Antenna Architecture

The antenna architecture that supports the adaptive sectoring algorithm can be considered as a migration from a fixed 3-sector CDMA system to a switched-beam smart antenna system. A switched-beam system has a set of predefined antenna patterns, and it serves each mobile with the dynamically chosen antenna pattern of best signal. However, in many cases, such individual-based adaptation is not necessary. Therefore, as an intermediate stage between fixed sectoring and the switched-beam antenna system, adaptive sectoring forms sectors by combining subsets of the switched beams. The implementation can be built on top of an existing fixed sectoring system by deploying a circular antenna array and a beamforming network per sector [23]. Each sector is identified by its pilot signal, and softer handoff is used when mobiles travel between sectors. In addition, since the location of mobiles can be identified by the beam with the strongest signal strength, the statistics of each subarea's network traffic (recall Sec. III-B) can be collected by summing up the connection requests in the beams making up the subarea.

IV. SIMULATION RESULTS

In this section, the focus is on the numerical studies of the adaptive sectoring algorithm. The analysis consists of two parts: Sec. IV-A simulates the traffic tracking with the mobility-enhanced traffic model described in Sec. III-B, and Sec. IV-B simulates a hot-spot scenario in which the performance of adaptive and fixed sectoring is compared.

A. Simulation of Spatial Poisson Estimation

As described in Sec. II-B, the network traffic is a spatial Poisson process with rate function modeled according to the Influence Model. Ideally, the parameters of the Influence model can be learned from the actual traffic statistics using particle filtering or the EM algorithm [16]. However, in this paper, the model parameters are assumed known from empirical data, and a hypothetical network is used to simulate the network traffic. The hypothetical network consists of 10 adjacent subareas, and the Influence model parameters are arbitrarily chosen. The rate function $X_k^i$ in each subarea is assumed to have three states {High, Medium, Low}. Fig. 3 illustrates the tracking of two subareas for four time slices. The blue line is the true state, and the green dotted line is the MAP estimator. It can be seen that the estimator follows the true state closely, except during the first few minutes after each change in the value of the rate function (which occurs on an hourly basis). The reason is that the estimation equation is formulated in a finite difference form, and the update is done only upon the arrival of new connection requests. As a result, a certain convergence time is needed for the MAP estimator to reach the true value. Fig. 4 illustrates the real-time tracking with the finite difference equation (9) of the two subareas during the first time interval. The top (bottom) plot shows the convergence of the probability mass function to the state High (Low) as connection requests are accumulated. It can be observed that the

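\[
\begin{aligned}
\kappa_c &= \frac{d^c}{ds^c} \ln E(\exp(s I^o[B_1])) \Big|_{s=0} \\
&= \int_A E(\phi_z[B_1]^c)\, m(dz) = \int_A E\left[ \left( \frac{\Gamma_z[C_z]}{\Gamma_z[B_1]} \right)^c ; \Gamma_z[C_z] < \Gamma_z[B_1] \right] m(dz) \\
&= \int_{A_{\text{in-cell}}} E\left[ \left( \frac{\Gamma_z[C_z]}{\Gamma_z[B_1]} \right)^c ; \Gamma_z[C_z] < \Gamma_z[B_1] \right] m(dz) \\
&\quad + \int_{A_{\text{other-cell}}} E\left[ \left( \frac{\Gamma_z[B_m]}{\Gamma_z[B_1]} \right)^c ; \Gamma_z[B_m] < \Gamma_z[B_n] \right] m(dz) \\
&\quad + \int_{A_{\text{other-cell}}} E\left[ \left( \frac{\Gamma_z[B_n]}{\Gamma_z[B_1]} \right)^c ; \Gamma_z[B_n] < \Gamma_z[B_m] \right] m(dz) \\
&= \int_{A_{\text{in-cell}}} \left( \frac{d[z, C_z]}{d[z, B_1]} \right)^{c\gamma} \exp((c\beta b\sigma)^2)\, Q\left( \sqrt{2}\, c\beta b\sigma + \frac{M_{C_z} - M_{B_1}}{\sqrt{2}\,\sigma b} \right) m(dz) \\
&\quad + 2 \int_{A_{\text{other-cell}}} \left( \frac{d[z, B_m]}{d[z, B_1]} \right)^{c\gamma} \exp((c\beta b\sigma)^2)\, Q\left( \frac{c\beta b\sigma}{\sqrt{2}} + \frac{M_{B_m} - M_{B_n}}{\sqrt{2}\, b\sigma} \right) m(dz),
\end{aligned}
\tag{12}
\]

[Fig. 3 plot: "Spatial Poisson rate tracking of two subareas"; two panels of Rate versus Time (minutes); curves: Estimated intensity, True intensity.]

[Fig. 4 plot: panels "Probability tracking of a subarea in state High" and "Probability tracking of a subarea in state Low"; Probability versus Time (minutes); curves: High, Medium, Low.]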

Fig. 3. The real time MAP estimations of the traffic in 2 subareas are plotted for four 60-minute periods. The realizations of the intensity functions in subareas are generated using the Influence model, and (11) is applied to follow. The solid line is the simulated realization, and the dotted line is the MAP estimator value.

convergence time is inversely proportional to the magnitude of the rate function. The tracking converges within 4 minutes when the subarea is in state High, while the tracking of the state Low takes approximately 40 minutes. However, since the MAP estimator depends only on the absolute difference between the state probabilities, the estimation yields accurate result as long as the true state has the highest probability. B. Simulation of Adaptive Sectorization The typical problem of nonuniform traffic is manifested in the generation of hot spots. In this subsection, the response of the adaptive sectoring algorithm is studied against a hot spot scenario, where a comparison in network capacity of the adaptive and the fixed sectoring is made. Fig. 2 illustrates the network model. The network consists of 19 cells and each cell has radius of one. The value of the path-loss exponent, γ, is assumed to be 4, and the required SIR is set


Fig. 4. The two plots illustrate the real-time tracking of the a posteriori probability of the two subareas in Fig. 3 for the first 60-minute period. (9) is applied to update the a posteriori probability mass function as traffic data is accumulated. It is evident from the figure that the convergence time in tracking is inversely proportional to the rate function.

to 7 dB/128, which corresponds to a despread SIR of 7 dB when the spreading factor is 128. Furthermore, the shadowing component in the propagation uncertainty is taken to have standard deviation of 8 dB. Under the network assumptions made, the outage probability of different sector configurations under uniform traffic is illustrated in Fig. 5. It is obvious that the sector with angular span of 4 subareas has the steepest slope, and the sector with one subarea is the smoothest. Suppose the cell of interest is the central cell, the hot spot scenario considered is to increase the rate function of its two neighboring cells, and determine how adaptive sectoring can mitigate the effect. Fig. 6 illustrates the difference in outage probabilities between the fixed and the adaptive sectoring. In the fixed sectoring case, when the rate function is gradually increased, the sector closest to the hot spot experiences high outage probability while the other two sectors have all the

[Fig. 5 plot: "Outage probability of different sectoring configurations with constant Poisson rate in all subareas"; Outage probability versus spatial Poisson rate; curves: 1 subarea, 2 subareas, 3 subareas, 4 subareas.]

[Fig. 7 diagram: nodes $X_k^1$ to $X_k^4$ with observations $N_k^1$ to $N_k^4$, shown for k = 0 and k = 1.]

Fig. 5. Outage probability of sector configuration consisting of 1 to 4 subareas under constant spatial Poisson rate.


Fig. 7. Label the subarea of interest as Xk1 , the network on the left illustrates the dependency structure of Xk1 on its neighbors. Only 4 subareas are shown because it is assumed that only adjacent subareas have influence on the dynamics of Xk1 (Recall Sec. II-B). The dynamic Bayesian network on the right models the evolution of Xk1 . The solid lines indicate the dependency structure of Xk1 on itself and its neighbors’ previous states. The dotted lines indicate the dependency structure of other subareas.

[Fig. 6 plot: panels "Outage probability with fixed sectoring under hot spot" and "Outage probability with dynamic sectoring under hot spot"; Outage probability versus spatial Poisson rate; curves: Sector1, Sector2, Sector3.]

Fig. 6. Comparison of system performance with fixed and dynamic sectoring under the hot spot condition. It can be observed that dynamic sectoring balances the traffic and keeps the outage probabilities of the three sectors under 1%, whereas the loaded sector in fixed sectoring has approximately 9% outage probability.

unutilized resources. On the other hand, the adaptive sectoring algorithm narrows the loaded sector when its outage probability starts to rise, and shares the load among the three sectors. It is observed that even though the outage probabilities have risen in the other two sectors, they remain well below 1%, whereas the outage probability in the fixed sectoring case has risen to approximately 9%.

V. CONCLUSION

In this paper, the adaptive sectoring problem is formulated as a shortest path problem. The weight matrix of the acyclic network constructed depends on the mobiles' spatial distribution, which is estimated by a MAP estimator as a function of the network traffic. The real-time tracking of the network traffic enables the system to minimize the outage probability at a base station by responding to non-stationary and non-uniform mobile distribution with adaptive sectoring.

The simulation of the tracking of the spatial Poisson process's rate functions has shown rapid convergence when the rate function is high, and slow convergence when the rate is low. Slow convergence does not, however, imply poor performance: the estimate is accurate as long as the true state has the highest probability. Furthermore, the simulation of the hot spot scenario has demonstrated the ability of adaptive sectoring to cope with nonuniform traffic distribution. Adaptive sectoring balances the load between sectors such that no sector has an outage probability exceeding 1%, while the fixed sectoring scheme experiences an outage probability of approximately 9%.

Future work with the model developed is parameter estimation with real traffic data. Currently, the model parameters are assumed known, and each subarea is assumed to have the same state space. However, the Influence Model is very flexible. It is possible to have a different state space for different subareas, and to have the model parameters estimated based on real-time traffic data.

APPENDIX

A. Derivation of MAP Estimator

In this Appendix, the MAP estimator introduced in Sec. III-B is derived. The estimator developed is similar to other HMM-type estimators except that 1) the a posteriori probability is updated continuously in discrete steps during the time interval [k, k + 1) instead of once every k, and 2) the a posteriori probability of each individual subarea is tracked instead of the joint a posteriori probability of all the subareas. Tracking the joint a posteriori probability distribution is too computationally intensive for real-time applications.

For notational convenience, the state of the subarea of interest is labeled as $X_k^1$ and its neighbors as $X_k^2$, $X_k^3$ and $X_k^4$. The time evolution of $X_k^1$ and its dependency are illustrated in Fig. 7, where the solid lines show the dependency of $X_k^1$ at k + 1 on its own previous state and on its neighbors' previous states. (The dotted lines refer to other subareas' dependency, and they are irrelevant in estimating $X_k^1$.) For each subarea i, $X_k^i$ is hidden, and only the

752

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

1 Pk|k−1

Nk1

2 Pk|k−1

3 Pk|k−1

Xk1

Nk2 Xk2

Nk3

4 Pk|k−1

Xk3

Nk4 Xk4

1 Pk|k

1 Pk+1|k

Xk1

2 Pk|k

R EFERENCES 3 Pk|k

4 Pk|k

k=1

k=0

Fig. 8. Factor graph representation of the Bayesian network in Fig. 7. The Xk1 is isolated and the conditional probabilities relevant to the estimation of Xk1 are explicitly illustrated.

connection request, $N_k^i$, is observed. Denote the a posteriori probability distribution of $X_k^1$ given all the network traffic up to time k as $P^1_{k|k} = P(X_k^1 \mid N^1_{0:k}, N^2_{0:k}, N^3_{0:k}, N^4_{0:k})$, and the prediction as $P^1_{k+1|k} = P(X_{k+1}^1 \mid N^1_{0:k}, N^2_{0:k}, N^3_{0:k}, N^4_{0:k})$, where $N^i_{0:k} = (N^i_0\ N^i_1 \ldots N^i_k)$. The recursive computation of $P^1_{k|k}$ and $P^1_{k+1|k}$ tracks the a posteriori probability for subarea 1. Fig. 8 illustrates the dynamics as a factor graph with the a posteriori distribution explicitly stated.

The prediction step can be computed easily given $P^i_{k|k}$ for i = 1, 2, 3, 4. Denoting $X_k \equiv [X_k^1\ X_k^2\ \cdots\ X_k^4]$, $P^1_{k+1|k}$ can be written as

$$
\begin{aligned}
P^1_{k+1|k} &= \sum_{X_k} P(X^1_{k+1} \mid X^1_k, X^2_k, X^3_k, X^4_k)\, P^1_{k|k} P^2_{k|k} P^3_{k|k} P^4_{k|k} \\
&= \sum_{X_k} \Big( \sum_j d_{1j}\, P(X^1_{k+1} \mid X^j_k) \Big) P^1_{k|k} P^2_{k|k} P^3_{k|k} P^4_{k|k} \\
&= \sum_j d_{1j} \Big( \sum_{X^j_k} P(X^1_{k+1} \mid X^j_k)\, P^j_{k|k} \Big)
\end{aligned}
$$

where the second step uses the Influence Model representation, and both $d_{1j}$ and $P(X^1_{k+1} \mid X^j_k)$ are defined model parameters.

The updating step, i.e., the computation of $P^i_{k|k}$ from the previous prediction $P^i_{k|k-1}$, is given by the following theorem.

Theorem: Suppose $N_k^i(t)$ is doubly stochastic Poisson with rate $X_k^i$, and $X_k^i$ is a random variable. If we let $P_t(X_k^i \mid N_k^i(\sigma); k \le \sigma < t)$ denote the conditional probability density function for $X_k^i$ given the connection request statistics $\{N_k^i(\sigma); k \le \sigma < t\}$, then we obtain (14), with $P(X_k^i \mid N_k^i(\sigma); \sigma = k) = P^i_{k|k-1}$ and $\bar{X}_k^i = E\{X_k^i \mid N_k^i(\sigma); k \le \sigma < t\}$.

Proof: The proof can be found in [24].

Eq. (14) can also be viewed as defining an updating algorithm according to which the prior density $P^i_{k|k-1}$ is propagated forward in time to form the a posteriori density $P^i_{k|k}$ as data are accumulated. For this interpretation, it is convenient to rewrite (14) in the finite-difference form shown in (15). For $\Delta t$ sufficiently small, the term $o(\Delta t)$ can be disregarded and $\Delta N_t^i = N_k^i(t + \Delta t) - N_k^i(t)$ will be either zero or one according to the nonoccurrence or occurrence of a point in $[t, t + \Delta t)$. As $\sigma$ reaches time k + 1, $P(X_k^i \mid N_k^i(\sigma)) = P^i_{k|k}$, which completes the cycle.
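As a rough illustration of the two steps derived in this appendix, the following Python sketch implements the prediction sum over the neighbours and the finite-difference update (15) with the $o(\Delta t)$ term dropped. The function names, the list-based data layout and the renormalisation after each step are assumptions made for illustration only; they are not taken from the paper.

```python
def predict(posteriors, weights, transitions):
    """Prediction step: sum_j d_1j * sum_x P(X^1_{k+1} | X^j_k = x) * P^j_{k|k}(x).

    posteriors[j][x]     : P^j_{k|k}(x) for subarea j (j = 0 is the subarea of interest)
    weights[j]           : influence weight d_1j (assumed to sum to one over j)
    transitions[j][x][y] : P(X^1_{k+1} = y | X^j_k = x)
    """
    n_states = len(transitions[0][0])
    prediction = [0.0] * n_states
    for d, post, trans in zip(weights, posteriors, transitions):
        for x, p_x in enumerate(post):
            for y in range(n_states):
                prediction[y] += d * p_x * trans[x][y]
    return prediction


def update(prior, rates, arrivals, dt):
    """Finite-difference update (15), applied once per interval of length dt.

    prior    : P^i_{k|k-1} over the candidate states
    rates    : Poisson rate associated with each state (all assumed > 0)
    arrivals : sequence of 0/1 flags, one per interval (the increments Delta N)
    """
    post = list(prior)
    for dn in arrivals:
        mean_rate = sum(p * r for p, r in zip(post, rates))
        post = [p * (1.0 + (r - mean_rate) / mean_rate * (dn - mean_rate * dt))
                for p, r in zip(post, rates)]
        total = sum(post)  # guards against floating-point drift (assumption)
        post = [p / total for p in post]
    return post
```

With many connection requests in a short window, the posterior mass in this sketch moves toward the high-rate state, which matches the qualitative behaviour described for high-rate subareas.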

REFERENCES

[1] S. Dennett, “The CDMA2000 ITU-R RTT candidate submission (0.18),” TIA, Tech. Rep., July 1998.
[2] J. C. Liberti and T. S. Rappaport, Smart Antennas for Wireless Communications: IS-95 and Third Generation CDMA Applications. Prentice Hall PTR, 1999.
[3] 3GPP, “Technical specification group radio access network, physical channels and mapping of transport channels onto physical channels (FDD), TS 25.221 V3.2.0 (2000-03).”
[4] C. U. Saraydar and A. Yener, “Adaptive cell sectorization for cdma systems,” IEEE J. Select. Areas Commun., vol. 19, pp. 1041–1051, June 2001.
[5] F. M. R. Giuliano and F. Vatalaro, “Smart cell sectorization for third generation cdma systems,” Wireless Commun. and Mobile Comput., vol. 2, pp. 253–267, May 2002.
[6] A. Ahmad, “A CDMA network architecture using optimized sectoring,” IEEE Trans. Veh. Technol., vol. 51, pp. 404–410, May 2002.
[7] J. C. Yun et al., “Traffic balancing performance of adaptive sectorized systems,” in Proc. Vehicular Technology Conference, vol. 3, pp. 1878–1881, Sept. 2002.
[8] D. Tang and M. Baker, “Analysis of a metropolitan-area wireless network,” Wireless Networks, vol. 8, pp. 107–120, Mar. 2002.
[9] C. Asavathiratham, S. Roy, B. Lesieutre, and G. Verghese, “The influence model,” IEEE Control Syst. Mag., vol. 21, pp. 52–64, Dec. 2001.
[10] A. J. Viterbi, CDMA: Principles of Spread Spectrum Communication. Addison-Wesley Publishing Company, 1995.
[11] L. Korowajczuk et al., Designing CDMA2000 Systems. John Wiley and Sons, Ltd, 2004.
[12] J. S. Evans and D. Everitt, “On the teletraffic of CDMA cellular networks,” IEEE Trans. Veh. Technol., vol. 48, pp. 153–165, Jan. 1999.
[13] C. D. Simone et al., “Fair dissections of spiders, worms, and caterpillars,” Networks, vol. 20, pp. 323–344, 1990.
[14] J. F. C. Kingman, Poisson Processes. Clarendon Press, Oxford, 1993.
[15] M. Sexton and A. Reid, Broadband Networking. Artech House, Inc., 1997.
[16] S. Basu, T. Choudhury, B. Clarkson, and A. Pentland, “Learning human interactions with the influence model,” MIT Media Laboratory, Tech. Rep., June 2001.
[17] C. Dennis, D. Marsland, and T. Cockett, “Central place practice: shopping centre attractiveness measures, hinterland boundaries and the UK retail hierarchy,” J. Retailing and Consumer Services, vol. 9, pp. 185–199, 2002.
[18] G. Navdal, I. Thorsen, and J. Uboe, “Modeling spatial structures through equilibrium states for transition matrices,” J. Regional Science, vol. 36, pp. 171–196, 1996.
[19] L. A. Brown and F. E. Horton, “Functional distance: An operational approach,” Geographical Analysis, vol. 2, pp. 76–83, 1970.
[20] D. Lam, D. Cox, and J. Widom, “Teletraffic modeling for personal communications services,” IEEE Commun. Mag., vol. 35, pp. 79–87, Feb. 1997.
[21] A. Kumar, M. N. Umesh, and R. Jha, “Mobility modeling of rush hour traffic for location area design in cellular networks,” in Proc. 3rd ACM International Workshop on Wireless Mobile Multimedia. New York: ACM Press, 2000, pp. 48–54.
[22] C. C. Chan and S. V. Hanly, “Calculating the outage probability in a cdma network with spatial poisson traffic,” IEEE Trans. Veh. Technol., vol. 50, pp. 183–203, Jan. 2001.
[23] M. Mahmoudi, E. S. Sousa, and H. Alavi, “Adaptive sector size control in a cdma system using butler matrix,” IEEE Trans. Veh. Technol., vol. 2, pp. 1355–1359, May 1999.
[24] D. L. Snyder and M. I. Miller, Random Point Processes in Time and Space. Springer-Verlag, 1991.

WANG and KRISHNAMURTHY: MOBILITY ENHANCED SMART ANTENNA ADAPTIVE SECTORING FOR UPLINK CAPACITY MAXIMIZATION

$$
dp(X_k^i \mid N_k^i(\sigma); k \le \sigma < t) = p(X_k^i \mid N_k^i(\sigma); k \le \sigma < t)\,(X_k^i - \bar{X}_k^i)\,(\bar{X}_k^i)^{-1}\,(dN_t^i - \bar{X}_k^i\,dt) \tag{14}
$$

$$
P(X_k^i \mid N_k^i(\sigma); k \le \sigma < t + \Delta t) = P(X_k^i \mid N_k^i(\sigma); k \le \sigma < t)\,\big[1 + (X_k^i - \bar{X}_k^i)\,(\bar{X}_k^i)^{-1}\,(\Delta N_t^i - \bar{X}_k^i\,\Delta t)\big] + o(\Delta t) \tag{15}
$$

Alex Wang was born in 1979 in Taiwan. He received his bachelor degree (with honours) in Engineering Physics with Commerce Minor and his Master degree in Electrical and Computer Engineering from the University of British Columbia in 2003 and 2005 respectively. He is currently pursuing his Ph.D. degree in Statistical Signal Processing, under the supervision of Dr. Vikram Krishnamurthy, in the University of British Columbia. His research interests include radar signal processing, syntactic pattern recognition and uncertain reasoning.

Vikram Krishnamurthy (S’90-M’91-SM’99-F’05) was born in 1966. He received his bachelor’s degree from the University of Auckland, New Zealand in 1988, and Ph.D. from the Australian National University, Canberra, in 1992. Since 2002, he has been a professor and Canada Research Chair at the Department of Electrical Engineering, University of British Columbia, Vancouver, Canada. Prior to 2002, he was a chaired professor at the Department of Electrical and Electronic Engineering, University of Melbourne, Australia, where he also served as deputy head of department. His current research interests include stochastic dynamical systems for modeling of biological ion channels and biosensors, stochastic optimization and scheduling, and statistical signal processing. Dr. Krishnamurthy has served as associate editor for several journals including IEEE Transactions Automatic Control, IEEE Transactions on Signal Processing, IEEE Transactions Aerospace and Electronic Systems, IEEE Transactions Circuits and Systems B, IEEE Transactions Nanobioscience, and Systems and Control Letters. He is co-editor with S. H. Chung and O. Andersen of the book Biological Membrane Ion Channels – Dynamics Structure and Applications, published by Springer-Verlag in 2006.


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Robust Optimal Cross-Layer Designs for TDD-OFDMA Systems with Imperfect CSIT and Unknown Interference: State-Space Approach Based on 1-bit ACK/NAK Feedbacks

Rui Wang and Vincent K. N. Lau

Abstract—Cross-layer designs for OFDMA systems have been shown to offer significant gains in spectral efficiency by exploiting multiuser diversity over the temporal and frequency domains. In this paper, we propose a robust optimal cross-layer design for downlink TDD-OFDMA systems with imperfect channel state information at the base station (CSIT) and unknown interference in slow fading channels. By exploiting the 1-bit ACK/NAK feedbacks from the mobiles, the proposed cross-layer design does not require knowledge of the CSIT error statistics or interference statistics. To account for the potential packet errors due to imperfect CSIT and unknown interference, we define the average system goodput (which measures the average b/s/Hz successfully delivered to the mobiles) as our optimization objective. We formulate the cross-layer design as a state-space control problem. The optimal power, rate and user allocations are determined as the output equations from the system states based on a dynamic programming approach. Simulation results illustrate that the performance of the proposed closed-loop cross-layer design is very robust with respect to imperfect CSIT, unknown interference, model mismatch, and channel variations due to Doppler.

Index Terms—Closed-loop, multiple antennas, imperfect CSIT, cross-layer.

Paper approved by Y. Fang, the Editor for Wireless Networks of the IEEE Communications Society. Manuscript received March 29, 2006; revised January 31, 2007 and May 14, 2007. This work was supported by the Research Grants Council of the Hong Kong Government through the grant RGC 615606. The authors are with the Dept. of ECE, Hong Kong University of Science and Technology (e-mail: {wray, eeknlau}@ust.hk). Digital Object Identifier 10.1109/TCOMM.2008.060100.

I. INTRODUCTION

Recently, cross-layer scheduling in OFDMA systems has received tremendous attention. High spectral efficiency can be achieved by exploiting the multi-user selection diversity over the temporal and frequency domains [1]–[4]. To exploit the multi-user selection diversity, knowledge of Channel State Information is required at the base station (CSIT). However, for TDD systems, obtaining perfect CSIT is very challenging, especially for a large number of subcarriers or a large number of users. When the base station has perfect knowledge of CSIT, the transmitted packet will be virtually error free (with powerful error correction coding) in slow fading channels, and hence, the system can achieve ergodic capacity. However, when the base station has imperfect CSIT or there is unknown interference at the mobile receivers, the

scheduled data rate may be larger than the instantaneous channel capacity, which is unknown to the base station. This results in packet transmission errors even if a powerful error correction code is applied. Moreover, the efficiency of multi-user scheduling is reduced because the wrong set of users may be selected for transmission.

Most existing cross-layer designs that address the imperfect CSIT issue are based on heuristic approaches. For example, in [5], [6], the cross-layer schedulers are designed assuming perfect CSIT, and the effect of imperfect CSIT is evaluated by simulations. However, this approach offers no insight into what the optimal design and performance should be with imperfect CSIT, as the optimal design can be quite different from that with perfect CSIT. It is also found that the performance of the naive cross-layer scheduler (designed for perfect CSIT) is very sensitive to imperfect CSIT even at very small CSIT errors [7]. In [7], [8], the authors discuss the optimal cross-layer design with imperfect CSIT. However, knowledge of the CSIT error statistics (such as the error distribution or error variance) and interference statistics is required, which may not be available in practice.

In all the works mentioned above, the cross-layer design is open-loop. In open-loop scheduling, the set of admitted users, the power allocation and the rate allocation are determined based on the estimated CSIT (as well as the estimated interference) and remain the same for the entire scheduling time slot. There are some existing works on closed-loop adaptation with ACK/NAK feedbacks [9]–[12]. For example, in [9], the authors present a power and rate control policy for a point-to-point system with delay-constrained traffic based on ACK/NAK feedback. However, the cross-layer scheduling (user selection) issue is not addressed. In [10], the authors present a heuristic adaptive rate control and randomized scheduling algorithm for flat-fading channels based on learning automata. In all these works, the solutions are heuristic and there is no insight into how closely the heuristic solutions approach the optimal performance. Furthermore, knowledge of the CSIT error statistics is needed, and the potential issue of unknown interference is not addressed.

In this paper, we propose a robust and optimal closed-loop cross-layer design for downlink TDD-OFDMA systems with imperfect CSIT and unknown interference for slow fading channels. We shall utilize the ACK/NAK (1-bit) feedbacks

0090-6778/08$25.00 © 2008 IEEE

WANG and LAU: ROBUST OPTIMAL CROSS-LAYER DESIGNS FOR TDD-OFDMA SYSTEMS

[Figure 1: scheduling slots n and n+1, each divided into packet slots 1, ..., N; packet slot i carries the allocations $\{r_{k,m,i}\}$, $\{p_{k,m,i}\}$ and $\{A_{m,i}\}$.]

Fig. 1. Illustration of scheduling slot and packet slot.

from the mobiles to adjust the power allocation, the rate allocation and the user assignment per packet slot. No knowledge of the CSIT error statistics or interference statistics is required at the base station. To account for the potential packet errors, we define the average system goodput, which measures the average b/s/Hz successfully delivered to the mobiles, as the optimization objective. We formulate the cross-layer design as a state-space control problem, where the optimal power, rate and user allocations are determined as the output equations from the system states based on a dynamic programming approach. Finally, simulation results illustrate that the performance of the proposed closed-loop cross-layer design is very robust with respect to imperfect CSIT, unknown interference, model mismatch, and channel variation due to Doppler.

This paper is organized as follows. In Section II, we outline the OFDMA system model as well as the imperfect CSIT and unknown interference model. In Section III, we define the system goodput and formulate the closed-loop cross-layer design as a state-space control problem in the presence of imperfect CSIT and unknown interference. In Section IV, we derive the optimal system outputs as well as the optimal state evolution in the transient state based on a dynamic programming approach. In Section V, we discuss the convergence of our state-space approach. In Section VI, numerical results are presented and discussed. Finally, we give a brief summary in Section VII.

II. OFDMA SYSTEM MODEL

A. Slow Fading Channel Model

We consider a communication system with K mobile users and one base station over a slowly varying frequency-selective fading channel. Let M be the number of subcarriers in the system. We consider a scheduling slot structure, which consists of N packet bursts as illustrated in Figure 1. We assume the channel is quasi-static within a scheduling slot in this paper. Let $X_{m,n}$ be the transmit symbol on the m-th subcarrier in the n-th packet burst. The received signal $Y_{k,m,n}$ of the k-th user on the m-th subcarrier in the n-th packet burst can be expressed as:

$$
Y_{k,m,n} = h_{k,m} X_{m,n} + Z_{k,m,n} + I_{k,m,n} \tag{1}
$$

where $h_{k,m}$ is the channel coefficient of the m-th subcarrier and the k-th user, which is i.i.d. complex Gaussian distributed with zero mean and unit variance, $Z_{k,m,n}$ is the i.i.d. zero-mean complex Gaussian noise with variance $\sigma_z^2/M$, and $I_{k,m,n}$ denotes the zero-mean complex Gaussian interference (due to other-cell interference) at the k-th mobile receiver with variance $\beta_k^2/M$.
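The received-signal model in (1) can be simulated with a few lines of Python. This is only a sketch under stated assumptions: the helper names are hypothetical, and each complex Gaussian sample is drawn as circularly symmetric, with half of the variance in the real part and half in the imaginary part, which the text does not spell out.

```python
import random


def complex_gaussian(variance, rng):
    """Zero-mean complex Gaussian sample with E|x|^2 = variance (circular symmetry assumed)."""
    std = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def received_symbol(x, h, noise_var, interference_var, n_sub, rng):
    """Received signal (1): Y = h*X + Z + I on one subcarrier.

    noise_var        : sigma_z^2 (the per-subcarrier noise variance is sigma_z^2 / M)
    interference_var : beta_k^2  (the per-subcarrier interference variance is beta_k^2 / M)
    n_sub            : number of subcarriers M
    """
    z = complex_gaussian(noise_var / n_sub, rng)
    i = complex_gaussian(interference_var / n_sub, rng)
    return h * x + z + i
```

A channel coefficient with unit variance can be drawn with `complex_gaussian(1.0, rng)`, where `rng` is a `random.Random` instance.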

Fig. 2. The structure of the closed-loop cross-layer scheduler.

B. Channel Estimation Model and Maximum Achievable Data Rate

In this paper, we consider imperfect channel state information at the base station (imperfect CSIT), which can be modelled as:

$$
\hat{h}_{k,m} = h_{k,m} + \Delta_{k,m} \tag{2}
$$

where $h_{k,m}$ is the actual CSI and $\Delta_{k,m}$ is the CSIT estimation error. We consider the case where there is interference at the mobile receivers, which may come from the surrounding cells. As is usually the case in practical systems, we assume the base station knows neither the mobile interference power $\beta_k$ of the K users nor the variance of the CSIT errors $\Delta_{k,m}$, denoted by $\sigma_\Delta^2$.

For simplicity, we assume the CSIR as well as the interference power measurement at the mobile station is perfect for the detection of downlink packets. Hence, based on the received signal model in (1), the maximum achievable data rate of the k-th user on the m-th subcarrier in the n-th packet burst is given by the maximum mutual information between $Y_{k,m,n}$ and $X_{m,n}$ conditional on the CSIR $h_{k,m}$:

$$
C_{k,m,n} = \max_{\Pr(X_{m,n})} I(Y_{k,m,n}; X_{m,n} \mid h_{k,m}) = \log_2\left(1 + p_{m,n}\,\frac{|h_{k,m}|^2}{\sigma_z^2/M + \beta_k^2/M}\right) \tag{3}
$$

where $p_{m,n}$ is the corresponding transmit power.

C. MAC Layer Model

The MAC layer is responsible for scheduling the radio resource at each scheduling slot based on the estimated CSIT as well as the ACK/NAK feedbacks. Figure 2 illustrates the structure of the cross-layer scheduler. The outputs of the MAC scheduler include the power allocation $\{p_{k,m,n}\}$, the rate allocation $\{r_{k,m,n}\}$ and the user selection $\{A_{m,n}\}$. After the packets in the first packet slot are transmitted, the selected mobiles send the ACK/NAK feedbacks to the base station before the next packet is delivered (for simplicity, we assume the delay of the ACK/NAK is small compared with the packet duration). For subsequent packet bursts in a scheduling slot, the cross-layer scheduler adapts the power allocation, rate allocation and user selection based on the CSIT $\hat{h} = \{\hat{h}_{k,m}\}$ and the ACK/NAK feedbacks from the mobiles $f_1^{n-1} = \{f_{A_{m,i},m,i} \mid i \in \{1, \ldots, n-1\}, \forall m\}$


($f_{k,m,n} = 1$ if an ACK is received from the k-th user after transmitting the n-th packet on the m-th subcarrier, and 0 otherwise). Hence, the MAC layer scheduler can be represented by the power allocation policy, rate allocation policy and user selection policy defined below.

Definition 1. Power Allocation Policy:

$$
\mathcal{P} = \Big\{ p_{k,m,n}(\hat{h}, f_1^{n-1}) \;\Big|\; \forall k, m, n \text{ and } \sum_{k,m,n} p_{k,m,n} \le P_0 \Big\} \tag{4}
$$

Rate Allocation Policy:

$$
\mathcal{R} = \big\{ r_{k,m,n}(\hat{h}, f_1^{n-1}) \;\big|\; \forall k, m, n \big\}
$$

User Selection Policy:

$$
\mathcal{A} = \big\{ A_{m,n}(\hat{h}, f_1^{n-1}) \subset \{1, \ldots, K\} \;\big|\; \forall m, n \text{ and } |A_{m,n}| \le 1 \big\}
$$

D. Packet Transmission Error and Average Goodput

Let $r_{k,m,n}$ be the scheduled data rate for user k on the m-th subcarrier in the n-th packet. The instantaneous goodput of the k-th user on the m-th subcarrier in the n-th packet, which measures the bits successfully delivered to the receiver, is given by:

$$
\rho_{k,m,n} = r_{k,m,n}\,\mathbf{1}[C_{k,m,n} \ge r_{k,m,n}] \tag{5}
$$

where $\mathbf{1}(I)$ is the indicator function, which is equal to 1 if the event I is true and 0 otherwise. The average total goodput, which measures the average total b/s/Hz successfully delivered to the mobiles (averaged over ergodic realizations of the CSI), is defined as:

$$
\begin{aligned}
U(P_0, \mathcal{A}, \mathcal{R}, \mathcal{P}) &= E\Big[\sum_{n=1}^{N}\sum_{m=1}^{M} \rho_{A_{m,n},m,n}\Big] \\
&= E_{\hat{h}}\Big[\sum_{n=1}^{N}\sum_{m=1}^{M} E_h\big[ r_{A_{m,n},m,n}\,\mathbf{1}[C_{A_{m,n},m,n} \ge r_{A_{m,n},m,n}] \,\big|\, \hat{h} \big]\Big] \\
&= E_{\hat{h}}\big[ G(P_0, \hat{h}, \mathcal{A}, \mathcal{R}, \mathcal{P}) \big]
\end{aligned} \tag{6}
$$

where h denotes the actual channel coefficients and $E_{\hat{h}}[X]$ denotes the expectation of the random variable X w.r.t. $\hat{h}$. G(.) measures the conditional system goodput (conditioned on the estimated CSIT $\hat{h}$). To account for the potential packet error ($r_{k,m,n} > C_{k,m,n}$), we shall design the cross-layer scheduler to optimize the total average system goodput U(.).

III. CROSS-LAYER DESIGN FORMULATION WITH IMPERFECT CSIT AND UNKNOWN INTERFERENCE

A. Closed-Loop Structure of the Cross-Layer Scheduler

Figure 2 illustrates the structure of the closed-loop cross-layer scheduler. The scheduler is characterized by an internal state $S_n$, and the state evolves based on the feedbacks of the users after each packet transmission. The scheduler outputs are uniquely determined by the system state. We first define the notation as follows:

• $s_{k,m,n}$ denotes the state of user k on subcarrier m during the n-th packet burst, and $S_n = \{s_{k,m,n} \mid \forall k, m\}$.
• The causal state evolution policy $\mathcal{S}$ is defined as:

$$
s_{k,m,n} = \mathcal{S}(S_{n-1}, f_1^{n-1}) \tag{7}
$$

The system outputs, including the admitted users, power allocation and rate allocation, are functions of the current system state $S_n$.

From (3), the actual SINR (with unit power) $\frac{|h_{k,m}|^2}{\sigma_z^2/M + \beta_k^2/M}$ is a random variable with a certain conditional pdf $q_{k,m,n}(x \mid f_1^{n-1}, \hat{h})$. (The base station does not know this distribution explicitly due to the lack of knowledge of the CSIT error statistics and interference statistics; however, the base station can make an assumption about this distribution. We shall show the robustness of this assumption.) We define the state $s_{k,m,n} = [l_{k,m,n}, u_{k,m,n}]$ to be the lower bound and upper bound of the SINR given the knowledge of the CSIT $\hat{h}$ and the ACK/NAK feedbacks $f_1^{n-1}$:

$$
l_{k,m,n} = \inf_x \{x \mid q_{k,m,n}(x \mid f_1^{n-1}, \hat{h}) > 0\} \tag{8}
$$

$$
u_{k,m,n} = \sup_x \{x \mid q_{k,m,n}(x \mid f_1^{n-1}, \hat{h}) > 0\} \tag{9}
$$
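To make the capacity expression (3) and the goodput definition (5) concrete, the short Python sketch below evaluates both for a single user, subcarrier and packet burst. The function and parameter names are hypothetical and chosen only for illustration; `gain` stands for $|h_{k,m}|^2$.

```python
import math


def capacity(power, gain, noise_var, interference_var, n_sub):
    """Capacity (3): log2(1 + p * |h|^2 / (sigma_z^2/M + beta_k^2/M))."""
    sinr_unit_power = gain / (noise_var / n_sub + interference_var / n_sub)
    return math.log2(1.0 + power * sinr_unit_power)


def instantaneous_goodput(rate, cap):
    """Goodput (5): the scheduled rate counts only if it does not exceed the capacity."""
    return rate if cap >= rate else 0.0
```

For example, with unit power, unit gain, $\sigma_z^2 = \beta_k^2 = 2$ and M = 4, the capacity is 1 b/s/Hz, so a scheduled rate of 1 is delivered while a scheduled rate of 1.5 yields zero goodput.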

B. Optimization Objective

To account for the potential packet errors, given any realization of the imperfect CSIT, we shall optimize the conditional average system goodput G(.). Since the user selection, power allocation and rate allocation are functions of the system state $S_n$, we rewrite (6) as:

$$
G(P_0, \hat{h}, \mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{S}) = E_{S_1^N}\Big[\sum_{n=1}^{N} \bar{g}_n(p_n, \hat{h}, S_n)\Big] \tag{10}
$$

where $S_1^N = \{S_1, \ldots, S_N\}$, $p_n$ is the total transmit power for the n-th packet burst, and $\bar{g}_n$ denotes the conditional average goodput (conditioned on the CSIT $\hat{h}$ and the current system state $S_n$) contributed by the n-th packet burst, given by:

$$
\bar{g}_n(.) = \sum_{m=1}^{M} r_{A_{m,n},m,n}\,\Pr[C_{A_{m,n},m,n} \ge r_{A_{m,n},m,n} \mid \hat{h}, S_n] \tag{11}
$$

Thus, the closed-loop cross-layer scheduling problem with imperfect CSIT and unknown interference can be summarized as the following optimization problem:

Problem 1 (Cross-Layer Problem Formulation with Imperfect CSIT). Given any realization of the estimated CSIT for all mobile users at all subcarriers $\hat{h} = \{\hat{h}_{k,m}\}$, determine the optimal state evolution policy $\mathcal{S}$, the optimal user selection policy $\mathcal{A}$, the optimal power allocation policy $\mathcal{P}$ and the optimal rate allocation policy $\mathcal{R}$ such that the conditional total goodput G(.) is maximized. That is,

$$
G^*(P_0, \hat{h}) = \max_{\mathcal{A}, \mathcal{R}, \mathcal{P}, \mathcal{S}} E_{S_1^N}\Big[\sum_{n=1}^{N} \bar{g}_n(p_n, \hat{h}, S_n)\Big] \tag{12}
$$

where the power and rate allocation policies are subject to the following constraints:

• Total Transmit Power Constraint in (4)
• Quality of Service (QoS) Requirement: The conditional packet error probability of all the users is less than a target $\epsilon$.


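IV. OPTIMAL SOLUTION

A. Optimal State Evolution

At the base station, the actual SINR of user k on the m-th subcarrier in the n-th packet burst is a random variable with density $q_{k,m,n}(x) = q_{k,m,1}(x \mid f_1^{n-1})$. The event $\{f_1^{n-1}\}$ is equivalent to the event $\{LB_{k,m,n} \le x \le UB_{k,m,n}\}$, where

$$
\begin{aligned}
LB_{k,m,n} &= \max_i \Big\{ \frac{2^{r_{k,m,i}} - 1}{p_{k,m,i}} \;\Big|\; 1 \le i \le n-1 \text{ and } f_{k,m,i} = 1 \Big\} \\
UB_{k,m,n} &= \min_i \Big\{ \frac{2^{r_{k,m,i}} - 1}{p_{k,m,i}} \;\Big|\; 1 \le i \le n-1 \text{ and } f_{k,m,i} = 0 \Big\}
\end{aligned} \tag{13}
$$

Hence, we have $q_{k,m,n}(x) = q_{k,m,1}(x \mid LB_{k,m,n} \le x \le UB_{k,m,n})$. According to the definition of the system state in (8) and (9), we get $l_{k,m,n} = LB_{k,m,n}$ and $u_{k,m,n} = UB_{k,m,n}$, and the optimal state evolution in (7) is given by (suppose $k \in A_{m,n}$):

$$
l_{k,m,n+1} = \begin{cases} \max\Big\{ l_{k,m,n}, \dfrac{2^{r_{k,m,n}} - 1}{p_{k,m,n}} \Big\} & \text{if } f_{k,m,n} = 1, \\ l_{k,m,n} & \text{otherwise.} \end{cases} \tag{14}
$$

$$
u_{k,m,n+1} = \begin{cases} \min\Big\{ u_{k,m,n}, \dfrac{2^{r_{k,m,n}} - 1}{p_{k,m,n}} \Big\} & \text{if } f_{k,m,n} = 0, \\ u_{k,m,n} & \text{otherwise.} \end{cases} \tag{15}
$$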
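The state evolution above amounts to tightening an interval on the unit-power SINR after every feedback: an ACK at rate r and power p implies that the SINR is at least $(2^r - 1)/p$, and a NAK implies that it is below that value. A minimal Python sketch of one update step follows; the function name and argument layout are hypothetical.

```python
def update_sinr_bounds(lower, upper, rate, power, ack):
    """One step of the state evolution for a scheduled user on one subcarrier.

    lower, upper : current bounds [l, u] on the unit-power SINR
    rate         : scheduled rate r in this packet burst (b/s/Hz)
    power        : transmit power p > 0 used in this packet burst
    ack          : True for an ACK, False for a NAK
    """
    threshold = (2.0 ** rate - 1.0) / power  # SINR at which capacity equals the rate
    if ack:
        return max(lower, threshold), upper  # packet decoded: raise the lower bound
    return lower, min(upper, threshold)      # packet lost: lower the upper bound
```

Starting from the interval (0, inf), an ACK at r = 1 and p = 2 gives (0.5, inf), and a subsequent NAK at r = 2 and p = 2 gives (0.5, 1.5).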


B. Optimal System Output Equations

In fact, the optimization objective G(.) can be decomposed into a set of recursive equations. This recursive relationship is summarized in the following lemma:

Lemma 1 (Recursive Formulation of the Conditional Goodput). Let $F_n(P, \hat{h}, S_n)$ be the total average goodput from the n-th packet burst to the N-th packet burst conditional on the CSIT and the system state $S_n$ with total residual power P, i.e.,

$$
F_n(P, \hat{h}, S_n) = \bar{g}_n(p_n, \hat{h}, S_n) + \sum_{i=n+1}^{N} \sum_{S_i} \Pr(S_i \mid S_n, \hat{h})\,\bar{g}_i(p_i, \hat{h}, S_i) \tag{16}
$$

and let $F_n^*(P, \hat{h}, S_n)$ be the optimized $F_n(.)$ subject to the power constraint $\sum_{i=n}^{N} \sum_{m=1}^{M} p_{A_{m,i},m,i} \le P$. Then $F_n^*(.)$ can be expressed recursively as:

$$
F_n^*(P, \hat{h}, S_n) = \max_{\{p_{k,m,n}\}, \{r_{k,m,n}\}, \{A_{m,n}\}} \Big[ \bar{g}_n(p_n) + \sum_{S_{n+1}} \Pr(S_{n+1} \mid S_n, \hat{h})\,F_{n+1}^*(P - p_n, \hat{h}, S_{n+1}) \Big] \tag{17}
$$

where $p_n = \sum_{m=1}^{M} p_{A_{m,n},m,n}$ and we assume $F_{N+1}^* = 0$. Furthermore, the optimal conditional goodput in (12) is given by

$$
G^*(P_0, \hat{h}) = F_1^*(P_0, \hat{h}) \tag{18}
$$

Proof 1. The proof of this lemma is based on the recursive structure of $F_n(.)$. We omit it here due to the page limit.

As a result of Lemma 1, the optimization problem with respect to $\{A_{m,n}\}$, $\{p_{k,m,n}\}$, $\{r_{k,m,n}\}$ (given any CSIT realization $\hat{h}$ and current system state $S_n$) can be divided into N steps. The recursive equation in (17) is also called Bellman's equation [13], and the optimization problem is a Markov decision problem. The general solution of the Markov decision problem involves an offline recursion and an online strategy. We elaborate on these two procedures as follows.

1) Backward Recursion for User Selection Policy and Power/Rate Allocation Policies: In the offline strategy, we partition the optimization of the average goodput $G^*(P, \hat{h})$ with respect to the user selection policy $\{A_{m,n}\}$, the power allocation policy $\{p_{k,m,n}\}$ and the rate allocation policy $\{r_{k,m,n}\}$ (for the N packet bursts) into N recursive optimizations using the recursive relationship between $F_n^*$ and $F_{n+1}^*$ in (17). These optimal policies will be used by the online algorithm when the actual ACK/NAK feedbacks are received. The offline recursive solution is elaborated in the following steps.

• Step 1. Consider the last packet burst n = N. Recall that the channel capacity is given by:

$$
C_{k,m,N} = \log_2\left(1 + p_{k,m,N}\,\frac{|h_{k,m}|^2}{\sigma_z^2/M + \beta_k^2/M}\right) \tag{19}
$$

where $\frac{|h_{k,m}|^2}{\sigma_z^2/M + \beta_k^2/M}$ is a random variable with density $q_{k,m,N}(x)$. Let $Q_{k,m,N}(x)$ be the corresponding cumulative distribution function. To satisfy the packet error requirement $\epsilon$, the scheduled data rate is given by:

$$
r_{k,m,N} = \log_2(1 + p_{k,m,N}\,\theta_{k,m,N}) \tag{20}
$$

where $\theta_{k,m,N}$ is the SINR scaling factor given by:

$$
\theta_{k,m,N} = Q_{k,m,N}^{-1}(\epsilon) \tag{21}
$$

To determine the optimal power allocation policies $\{p_{k,m,N}\}$, we form the Lagrangian as:

$$
L = \sum_{m=1}^{M} (1 - \epsilon) \log_2(1 + p_{A_{m,N},m,N}\,\theta_{A_{m,N},m,N}) - \lambda_N \sum_{m=1}^{M} p_{A_{m,N},m,N}
$$

Using standard optimization techniques, the optimal power allocation policy is given by:

$$
p^*_{A_{m,N},m,N} = \left( \frac{1}{\lambda_N} - \frac{1}{\theta_{A_{m,N},m,N}} \right)^+ \tag{22}
$$

where $(X)^+ = \max(0, X)$ and $\lambda_N$ is the Lagrange multiplier given by

$$
\frac{1}{\lambda_N} = \frac{1}{M}\Big( p_N + \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,N},m,N}} \Big) \tag{23}
$$

for sufficiently large $p_N$. Finally, substituting (22) and (23) into the objective function $F_N^*(.)$ in (17), the optimal user selection is given by:

$$
A_{m,N} = \arg\max_k \{\theta_{k,m,N}\} \tag{24}
$$
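Step 1 can be sketched in Python as follows. The closed-form water level in (23) holds when every subcarrier receives positive power; the sketch below additionally drops the weakest subcarriers until that condition is met, which is the usual water-filling procedure and an assumption beyond what the text states. The function names and the `theta[k][m]` layout are hypothetical.

```python
import math


def water_fill(total_power, thetas):
    """Water-filling p_m = (level - 1/theta_m)^+ with sum(p_m) = total_power."""
    active = sorted(range(len(thetas)), key=lambda m: thetas[m], reverse=True)
    level = 0.0
    while active:
        # Water level 1/lambda computed over the active subcarriers, as in (23)
        level = (total_power + sum(1.0 / thetas[m] for m in active)) / len(active)
        if level > 1.0 / thetas[active[-1]]:
            break
        active.pop()  # weakest subcarrier would get non-positive power
    powers = [0.0] * len(thetas)
    for m in active:
        powers[m] = level - 1.0 / thetas[m]
    return powers


def last_burst_allocation(total_power, theta):
    """User selection (24), power allocation (22) and rate allocation (20) for n = N.

    theta[k][m] : SINR scaling factor of user k on subcarrier m
    """
    n_sub = len(theta[0])
    users = [max(range(len(theta)), key=lambda k: theta[k][m]) for m in range(n_sub)]
    best = [theta[users[m]][m] for m in range(n_sub)]
    powers = water_fill(total_power, best)
    rates = [math.log2(1.0 + p * t) for p, t in zip(powers, best)]
    return users, powers, rates
```

When the power budget is large enough that no subcarrier is dropped, `water_fill` reduces to the closed form in (22) and (23).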


Hence, the closed form for $F_N^*(p_N)$ is given by:

$$
F_N^*(p_N) = \bar{g}_N^*(p_N) = (1 - \epsilon) \log_2 \frac{\Big( p_N + \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,N},m,N}} \Big)^{M}}{M^{M}} + (1 - \epsilon) \log_2 \prod_{m=1}^{M} \theta_{A_{m,N},m,N} \tag{25}
$$

Since $\{\theta_{k,m,N}\}$ are functions of $S_N$, equations (24), (22) and (20) give the optimal user selection, the optimal power allocation and the optimal rate allocation in terms of the system state.

• Step 2. Consider the packet burst n, where $n \in \{N-1, N-2, \ldots, 1\}$. Given the target error probability $\epsilon$, the state transition probability $\Pr(S_{n+1} \mid S_n, \hat{h})$ in (17) has the form $(1 - \epsilon)^a \epsilon^b$, where a is the total number of ACK feedbacks and b is the total number of NAK feedbacks after the transmission of the n-th packet. Since $\epsilon$ is usually chosen to be very small, most of the state transition probabilities are very small except the one with $a = |A|$ and $b = 0$ (in this case, there is no transmission error). Hence, we have:

$$
F_n^*(P, \hat{h}, S_n) \approx \max_{\{p_{A_{m,n},m,n}\}, \{r_{A_{m,n},m,n}\}, \{A_{m,n}\}} \Big[ \bar{g}_n(p_n, \hat{h}, S_n) + F_{n+1}^*(P - p_n, \hat{h}, S_{n+1}) \Big] \tag{26}
$$

where the state $S_{n+1}$ is derived from its previous state $S_n$ assuming all-ACK feedbacks. Similar to Step 1, the optimal power and rate allocation policies are given by:

$$
p_{A_{m,n},m,n} = \left( \frac{1}{\lambda_n} - \frac{1}{\theta_{A_{m,n},m,n}} \right)^+ \tag{27}
$$

$$
r_{A_{m,n},m,n} = \log_2(1 + p_{A_{m,n},m,n}\,\theta_{A_{m,n},m,n}) \tag{28}
$$

where

$$
\frac{1}{\lambda_n} = \frac{1}{M}\Big( p_n + \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,n},m,n}} \Big) \tag{29}
$$

$$
p_n = \frac{P}{N-n+1} + \frac{1}{N-n+1} \sum_{i=n}^{N} \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,i},m,i}} - \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,n},m,n}} \tag{30}
$$

The optimal user selection is given by the following lemma:

Lemma 2 (Optimal User Selection). Let $s_{k,m,n+1}, \ldots, s_{k,m,N}$ be the system states evolved from $s_{k,m,n}$ with all-ACK feedbacks and $\theta_{k,m,n}, \ldots, \theta_{k,m,N}$ be the corresponding SINR scaling factors. The optimal admitted user in the m-th subcarrier and the n-th packet burst is given by:

$$
A_{m,n} = \arg\max_k \prod_{i=n}^{N} \theta_{k,m,i} \tag{31}
$$

Proof 2. Please refer to Appendix A.

Hence, we have

$$
F_n^*(.) = (1 - \epsilon) \log_2 \frac{\Big( \frac{P}{N-n+1} + \frac{1}{N-n+1} \sum_{i=n}^{N} \sum_{m=1}^{M} \frac{1}{\theta_{A_{m,i},m,i}} \Big)^{(N-n+1)M}}{M^{(N-n+1)M}} + (1 - \epsilon) \log_2 \prod_{i=n}^{N} \prod_{m=1}^{M} \theta_{A_{m,i},m,i} \tag{32}
$$
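The per-burst power budget in Step 2 can be read as a common water level shared by all remaining bursts: the residual power P plus the sum of all $1/\theta$ terms is spread evenly over the $(N-n+1)M$ burst and subcarrier pairs, and the current burst receives its share minus its own $1/\theta$ terms. The Python sketch below computes $p_n$ under that reading, i.e. $p_n = \big(P + \sum_{i}\sum_{m} 1/\theta\big)/(N-n+1) - \sum_{m} 1/\theta_{A_{m,n},m,n}$, which is an interpretation of (30) rather than a quotation. The function name and data layout are hypothetical.

```python
def burst_power(total_power, inv_theta):
    """Power budget p_n of the current burst under a common water level.

    total_power     : residual power P available for bursts n, ..., N
    inv_theta[i][m] : 1/theta of the selected user on subcarrier m in the i-th
                      remaining burst (row 0 is the current burst n), assuming
                      all-ACK state evolution for the future bursts
    """
    remaining = len(inv_theta)                      # N - n + 1
    total_inv = sum(sum(row) for row in inv_theta)  # double sum over i and m
    return (total_power + total_inv) / remaining - sum(inv_theta[0])
```

For instance, with P = 3, two remaining bursts and two subcarriers with `inv_theta = [[1.0, 1.0], [0.5, 0.5]]`, the common water level is 1.5, so the current burst receives 1.0 and the last burst receives the remaining 2.0.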

2) Online Solution: The online strategy is a real-time algorithm. Upon receiving the specific ACK/NAK feedbacks $f_n$, we update the system state $S_n$ to $S_{n+1}$ by (7), and then select the optimal users and the optimal power and rate allocation using the optimal policies $\{A_{m,n}\}$, $\{p_{k,m,n}\}$ and $\{r_{k,m,n}\}$ (obtained in the offline backward recursion). The online processing is illustrated below:

• Step 1. At the first packet burst, the optimal users and the optimal power and rate allocation $\{A_{m,1}\}$, $\{r_{k,m,1}\}$, $\{p_{k,m,1}\}$ based on the estimated CSIT $\hat{h}$ are obtained according to (31), (27) and (28).
• Step 2. Before transmitting the (n+1)-th packet burst ($n \in \{1, 2, \ldots, N-1\}$), the base station has already obtained the specific ACK/NAK feedbacks of the previous packet $f_n$ and updated the system state accordingly. The optimal user selection and the optimal power and rate allocation for the (n+1)-th packet are obtained from (31), (27), (28) and (24), (22), (20) in the offline recursion.

V. STEADY STATE ANALYSIS

The convergence of the system state can be summarized in the following lemma:

Lemma 3. For sufficiently large n and a quasi-static fading channel, we have:

$$
\lim_{n \to \infty} A_{m,n} = \arg\max_k \frac{|h_{k,m}|^2}{\sigma_z^2/M + \beta_k^2/M}. \tag{33}
$$

Furthermore, if user j has the largest SINR in the m-th subcarrier, we have

$$
\lim_{n \to \infty} S_{j,m,n} = \frac{|h_{j,m}|^2}{\sigma_z^2/M + \beta_j^2/M}. \tag{34}
$$

In other words, for sufficiently large n, the system state of the user with the largest SINR will converge to the actual SINR and the user selection will converge to the best user selection (as if perfect CSIT were available).

Proof 3. Please refer to Appendix B.

VI. NUMERICAL RESULT AND DISCUSSION

In this section, we illustrate the performance of our closed-loop cross-layer scheduler design. In our simulation, the number of users K is 5, the number of multipaths $L_p$ is 4 and the target packet error probability $\epsilon$ is 0.01. For simplicity, we assume the unknown interference power $\beta_k^2$ of each user is the same. The unknown interference is quasi-static within a scheduling slot but random between scheduling slots according to U(0, I). In the simulation, the actual CSI is generated according to the complex Gaussian distribution CN(0, 1). We


[Figure 3: bandwidth efficiency (bit/s/Hz) versus average transmit power per packet (dB) for the perfect CSIT, proposed closed-loop, non-adaptive closed-loop, open-loop, naive and round robin schedulers at I = 0.1 and I = 2.]

Fig. 3. Average goodput performance versus transmit power at $\sigma_\Delta^2 = 0.1$, M = 4, I = {0.1, 2}. Open-loop refers to the cross-layer design based on the imperfect CSIT knowledge obtained at the beginning of the scheduling slot only. Non-adaptive closed-loop refers to the closed-loop cross-layer design where the CSIT can be updated according to the feedbacks and where there is no power adaptation among the packet bursts. Perfect CSIT refers to the ideal system with perfect CSIT and serves as a performance upper bound for benchmarking. Round robin scheduler refers to the naive cross-layer design assuming the CSIT is perfect while selecting users randomly. Naive scheduler refers to the cross-layer scheduler assuming the CSIT is perfect.

assume the base station does not have any knowledge of the actual interference power $\beta$, the actual distribution of the SINR, or the actual CSIT estimation error variance $\sigma_\Delta^2$. The base station has default values for these parameters ($\beta_{def} = 1$, $\sigma_{\Delta,def}^2 = 0.5$), which are not the same as the actual parameters. We shall show by simulation that although the default parameters may not equal the actual parameters, the system state in the proposed design still converges to the actual SINR and the closed-loop system is very robust with respect to the mismatch, even at high CSIT error and high interference power. Each point in the figures is obtained by averaging over 1000 independent fading realizations.

A. Performance of the Closed-Loop Cross-Layer Scheduler on Static Channel

We first consider the case of slow fading in which the channel fading is quasi-static within a scheduling slot. Figure 3 shows the average system goodput versus the transmit power of the proposed closed-loop scheduler at high CSIT errors $\sigma_\Delta^2 = 0.1$, M = 4 and maximum unknown interference power I = 0.1, 2. For comparison, we also evaluate various baselines, namely the open-loop cross-layer scheduler, the non-adaptive closed-loop cross-layer scheduler, the naive scheduler (designed assuming perfect CSIT) and the round robin scheduler. The open-loop scheduler, the round robin scheduler and the naive scheduler are considered open-loop designs because they do not exploit the ACK/NAK feedbacks from the mobiles. The proposed closed-loop scheduler achieves a significant performance gain over these open-loop schedulers. This illustrates that with the ACK/NAK feedback, significant cross-layer gains can be

Fig. 4. Average goodput performance of each packet burst at $\sigma_\Delta^2 = 0.1$, M = 4, P0 = 23 dB, I = 1. Round robin scheduler refers to the naive cross-layer design that assumes the CSIT is perfect while selecting users randomly. Naive scheduler refers to the cross-layer scheduler designed for perfect CSIT.

achieved even at large CSIT errors and large unknown interference. Furthermore, the proposed closed-loop scheduler also achieves a significant performance gain over the non-adaptive closed-loop scheduler, in which the CSIT is updated according to the feedbacks but there is no power adaptation among the packet bursts. This illustrates the importance of our proposed state-space based adaptation. The proposed design is also robust to the mismatch in the channel statistics and parameters.

Figure 4 illustrates the average goodput of each packet burst (averaged over multiple scheduling slots) at high CSIT errors $\sigma_\Delta^2 = 0.1$, I = 1, M = 4 and P0 = 23 dB. The average goodput of the closed-loop scheduler increases with the packet burst index. There are two reasons for this. On one hand, because the scheduler obtains a better estimate of the actual SINR at later packet slots after receiving more ACK/NAK feedbacks, the user selection decisions made in the later packet slots are more accurate. Since the scheduler can exploit more multiuser diversity in the later packet bursts, the performance is better. On the other hand, since the CSIT is more accurate in the later packet slots, more power is allocated to them to exploit the performance gain of multiuser diversity. In contrast, the two reference schedulers do not show such behavior because their knowledge of the actual SINR remains the same at all packet slots.

B. The Performance Sensitivity on Doppler Spread

In this part, we consider frequency selective fading channels with Doppler frequency fd from 20 Hz to 100 Hz, which corresponds to speeds of 9 to 45 km/h at 2.4 GHz. The duration of the packet slot is 0.2 ms. Figure 5 illustrates the average system goodput versus the Doppler frequency of the proposed closed-loop scheduler, the round robin scheduler and the naive scheduler at large CSIT errors $\sigma_\Delta^2 = 0.1$, I = 1, 2, M = 4 and P0 = 23 dB.
It can be observed that significant gain of the proposed closed-loop cross-layer design can be achieved at moderate to large Doppler.
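The speeds quoted for the Doppler range above can be checked with the usual relation fd = v·fc/c. The following is a minimal sketch; the function name and the use of the vacuum speed of light are illustrative assumptions, not part of the paper.

```python
# Speed of light in vacuum (m/s).
SPEED_OF_LIGHT = 299_792_458.0


def doppler_to_speed_kmh(doppler_hz: float, carrier_hz: float) -> float:
    """Return the speed (km/h) whose maximum Doppler shift is doppler_hz.

    Uses f_d = v * f_c / c, i.e. v = f_d * c / f_c.
    """
    speed_m_per_s = doppler_hz * SPEED_OF_LIGHT / carrier_hz
    return speed_m_per_s * 3.6  # convert m/s to km/h


if __name__ == "__main__":
    for fd in (20.0, 100.0):
        speed = doppler_to_speed_kmh(fd, 2.4e9)
        print(f"f_d = {fd:5.1f} Hz -> {speed:.1f} km/h")
```

At a 2.4 GHz carrier, 20 Hz and 100 Hz map to roughly 9 km/h and 45 km/h, in line with the values stated in the text.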


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Fig. 5. Average goodput performance versus Doppler frequency at $\sigma_\Delta^2 = 0.1$, M = 4, I = {1, 2} and P0 = 23 dB. Round robin scheduler refers to the naive cross-layer design that assumes the CSIT is perfect while selecting users randomly. Naive scheduler refers to the cross-layer scheduler designed for perfect CSIT. [Plot: bandwidth efficiency (bit/s/Hz) versus Doppler frequency (Hz), 20 to 100 Hz; curves for the closed-loop, round robin and naive schedulers at I = 1 and I = 2.]

Fig. 7. The transient of the instantaneous scheduled data rate and the actual instantaneous channel capacity versus time (packet slot) at fd = 20 Hz, $\sigma_\Delta^2 = 0.1$, M = 4, K = 1 and P0 = 29 dB. [Plot: data rate (bit per subcarrier) versus index of packet burst, 0 to 80; curves for instantaneous data rate and instantaneous capacity at I = 1 and I = 2.]

We formulate the cross-layer design as a state-space control problem. The optimal power, optimal rate and optimal user allocation are determined as the output equations from the system state. Based on a dynamic programming approach, we work out the optimal state evolution using backward recursion and forward recursion algorithms. Simulations illustrate that the proposed closed-loop cross-layer scheduler has very robust goodput performance at moderate to high CSIT errors and interference power, and at moderate Doppler.


APPENDIX A: PROOF OF LEMMA 2

Fig. 6. The transient of the instantaneous scheduled data rate and the actual instantaneous channel capacity versus time (packet slot) at fd = 0 Hz, $\sigma_\Delta^2 = 0.1$, M = 4, K = 1 and P0 = 29 dB. [Plot: data rate (bit per subcarrier) versus index of packet burst, 0 to 80; curves for instantaneous data rate and instantaneous capacity at I = 1 and I = 2.]

C. The Convergence of the Closed-Loop Adaptation

Figures 6 and 7 illustrate the instantaneous scheduled data rate versus time in a scheduling slot at Doppler frequencies of fd = 0 and fd = 20 Hz, high CSIT errors $\sigma_\Delta^2 = 0.1$ and high interference I = 1, 2. In the simulation, M = 4, K = 1 and P0 = 29 dB. In both cases, the scheduled data rate of the proposed closed-loop cross-layer design converges to the instantaneous actual capacity quite well. This demonstrates the robustness of the proposed closed-loop scheduler with respect to the CSIT error, unknown interference, model mismatch and the channel variation due to Doppler.

VII. SUMMARY

In this paper, we propose a robust cross-layer design for the downlink OFDMA systems with imperfect CSIT and unknown interference for slow frequency selective fading channels.

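\begin{align*}
F_n^*(P, h_b, S_n) &= \max_{\{A_{m,i}\},\{p_{k,m,i}\}} \sum_{m=1}^{M}\sum_{i=n}^{N} (1-\epsilon)\, r_{A_{m,i},m,i} \\
&= \max_{\{A_{m,i}\}} (1-\epsilon) \log_2\left[ \prod_{i=n}^{N}\prod_{m=1}^{M} \theta_{A_{m,i},m,i} \cdot \frac{\left(P + \sum_{i=n}^{N}\sum_{m=1}^{M} \frac{1}{\theta_{A_{m,i},m,i}}\right)^{N-n+1}}{[M(N-n+1)]^{N-n+1}} \right] \\
&\approx \max_{\{A_{m,i}\}} (1-\epsilon) \log_2\left[ \prod_{i=n}^{N}\prod_{m=1}^{M} \theta_{A_{m,i},m,i} \cdot \frac{P^{N-n+1}}{[M(N-n+1)]^{N-n+1}} \right]
\end{align*}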



where the first equality comes from the target packet error rate constraint, which is similar to (20); the second equality is obtained from the standard water-filling approach over m = 1 to M and i = n to N with a sufficiently large power constraint P; the last approximation is made for sufficiently large P. We can observe from the above equation that the average goodput of a subcarrier is independent of the user selection of other subcarriers. In other words, the user selection of each subcarrier can be decoupled, i.e.:

\[ \{A_{m,n}, \ldots, A_{m,N}\} = \arg\max \prod_{i=n}^{N} \theta_{A_{m,i},m,i} \quad \forall m \in \{1, \ldots, M\} \tag{35} \]

WANG and LAU: ROBUST OPTIMAL CROSS-LAYER DESIGNS FOR TDD-OFDMA SYSTEMS

Since we only consider ACK feedback, as the packet index grows from n to N, the SINR scaling factor $\theta_{k,m,i}$ of the selected user will increase. However, the SINR scaling factor of an unselected user will remain the same (because it is not updated by feedbacks). Hence, the optimal user selection of any subcarrier must satisfy:

\[ A_{m,n} = A_{m,n+1} = \cdots = A_{m,N} \quad \forall m \in \{1, \ldots, M\} \tag{36} \]

Combining (35) and (36), we complete the proof of Lemma 2.

APPENDIX B: PROOF OF LEMMA 3

Let us consider another lemma first.

Lemma 4: If user k is selected infinitely many times in the m-th subcarrier, the state of this user in this subcarrier will converge to the actual SINR.

Proof: Every selection leads to an update of the user state, which makes the lower bound of the state approach the upper bound of the state. As a result, both bounds converge to the actual SINR. We omit the details of the proof here due to the page limit.

Assume user j has the largest SINR $B_{j,m}$ in the m-th subcarrier. We can argue that only this user will be selected infinitely many times in the m-th subcarrier when N tends to infinity. Otherwise, suppose another user i with SINR $B_{i,m}$ is selected infinitely many times; we then have the following inconsistent conclusions:
• Since user i is selected infinitely many times, according to the above lemma, there should exist a packet burst indexed by n such that $l_{i,m,n} \le B_{i,m} \le u_{i,m,n} < B_{j,m}$.
• According to the strategy of state evolution, we have $u_{i,m,p} < B_{j,m} \le u_{j,m,p}$ for all p ≥ n.
• Let us compare $\prod_{l=p}^{N} \theta_{j,m,l}$ and $\prod_{l=p}^{N} \theta_{i,m,l}$ at the p-th (p ≥ n) packet burst. Since $\theta_{j,m,l}$ and $\theta_{i,m,l}$ (l = p + 1, ..., N) are derived by assuming all ACK feedbacks, we have $\theta_{j,m,l} \to u_{j,m,p}$ and $\theta_{i,m,l} \to u_{i,m,p}$ for sufficiently large l. Because $u_{j,m,p} > u_{i,m,p}$, we can conclude that $\theta_{j,m,l} > \theta_{i,m,l}$ for sufficiently large l.
• Hence, we have $\prod_{l=p}^{N} \theta_{j,m,l} > \prod_{l=p}^{N} \theta_{i,m,l}$ when N tends to infinity. According to our user selection strategy, user i will never be selected in the p-th packet burst. Since p ≥ n, user i will never be selected after the n-th packet burst.
This conflicts with the statement that user i with SINR $B_{i,m}$ is selected infinitely many times. Hence, only user j, who has the largest SINR, will be selected infinitely many times. Combining this result with Lemma 4, we obtain the conclusion of Lemma 3.

REFERENCES

[1] L. C. Wang and W. J. Lin, “Throughput and fairness enhancement for OFDMA broadband wireless access systems using the maximum C/I scheduling,” in Proc. IEEE VTC 2004, pp. 4696–4700, Sept. 2004.
[2] C. Wengerter, J. Ohlhorst, and A. von Elbwart, “Fairness and throughput analysis for generalized proportional fair frequency scheduling in OFDMA,” in Proc. IEEE VTC 2005, pp. 1903–1907, May 2005.
[3] M. Y. Shen, G. Q. Li, and H. Liu, “Effect of traffic channel configuration on the orthogonal frequency division multiple access downlink performance,” IEEE Trans. Wireless Commun., pp. 1901–1913, July 2005.


[4] D. Niyato and E. Hossain, “Adaptive fair subcarrier/rate allocation in multirate OFDMA networks: radio link level queuing performance analysis,” IEEE Trans. Veh. Technol., pp. 1897–1907, Nov. 2006.
[5] 3GPP, “Physical layer aspects of UTRA high speed downlink packet access,” TR 25.848 V4.0.0.
[6] V. K. N. Lau, M. L. Jiang, and Y. J. Liu, “Cross layer design of uplink multi-antenna wireless systems with outdated CSI,” IEEE Trans. Wireless Commun., pp. 1250–1253, June 2006.
[7] M. L. Jiang and V. K. N. Lau, “Performance analysis of proportional fair uplink scheduling with channel estimation error in multiple antennas system,” in Proc. IEEE PIMRC 2004, pp. 1628–1632, Sept. 2004.
[8] R. Wang and V. Lau, “On the design of downlink multi-user multi-antenna OFDMA systems with imperfect CSIT,” in Proc. IEEE PIMRC 2005, Sept. 2005.
[9] T. Holliday, A. Goldsmith, and P. Glynn, “Wireless link adaptation policies: QoS for deadline constrained traffic with imperfect channel estimates,” in Proc. IEEE ICC 2002, pp. 3366–3371, April 2002.
[10] M. A. Haleem and R. Chandramouli, “Joint adaptive rate control and randomized scheduling for multimedia wireless systems,” in Proc. IEEE ICC 2004, pp. 1500–1504, June 2004.
[11] A. K. Karmokar, D. V. Djonin, and V. K. Bhargava, “Delay constrained rate and power adaptation over correlated fading channels,” in Proc. IEEE Global Telecommunications Conference, 2004, vol. 6, pp. 3448–3453, Nov. 2004.
[12] H. T. Zheng and H. Viswanathan, “Optimizing the ARQ performance in downlink packet data systems with scheduling,” IEEE Trans. Wireless Commun., vol. 4, pp. 495–506, Mar. 2005.
[13] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, 2005.

Rui Wang graduated from the Dept. of Computer Science and Technology, University of Science & Technology of China, with a B.Eng. in 2004. After that, he was admitted by the Dept. of Electronic & Computer Engineering, Hong Kong University of Science & Technology, for Ph.D. study. He is now a Ph.D. candidate in wireless communication. His current research interests include cross-layer optimization, wireless ad-hoc networks, and cognitive radio. He is also involved in the standardization of IEEE 802.22 (Wireless Regional Area Network) and IEEE 802.16m.

Vincent Lau graduated from the Dept. of EEE, University of Hong Kong, with a B.Eng. (Distinction 1st Hons) in 1992. He joined HK Telecom after graduation for three years as a project engineer and was later promoted to system engineer. He obtained the Sir Edward Youde Memorial Fellowship and the Croucher Foundation in 1995 and went to the University of Cambridge for a Ph.D. in mobile communications. He completed the Ph.D. degree in two years and joined Lucent Technologies - Bell Labs (ASIC department) as a member of technical staff in 1997. In 2004, he joined the Department of Electrical and Electronic Engineering, Hong Kong University of Science and Technology. At the same time, he is a technology advisor of HK-ASTRI on the R&D of wireless LAN access infrastructure with smart antennas. He has a total of seven years of industrial experience and three years of academic experience. His research interests include adaptive modulation and channel coding, information theory with state feedback, multi-user MIMO scheduling, cross-layer optimization, and baseband SoC design (UMTS base station ASIC, 3G1x mobile ASIC, Wireless LAN MIMO ASIC). He is the principal author of a book on MIMO technologies (to be published by John Wiley and Sons) as well as the chapter author of two books on wideband CDMA technologies. He has published more than 40 papers in IEEE transactions and journals, 47 papers in international conferences, and 14 Bell Labs Technical Memos, and has received two best paper awards. He has eight US patents pending and is currently a senior member of IEEE.


Hard-Limiting Performance Analysis of 2-D Optical Codes Under the Chip-Asynchronous Assumption

Chia-Cheng Hsu, Guu-Chang Yang, Senior Member, IEEE, and Wing C. Kwong, Senior Member, IEEE

Abstract—The traditional chip-synchronous assumption used in the analyses of optical codes in optical code-division multiple access gives a pessimistic performance upper bound, while a more realistic chip-asynchronous assumption gives a more accurate performance. It is also known that a hard-limiter can be placed at the front end of an optical decoder to reduce the effects of multiple-access interference and the near-far problem. In this paper, the “hard-limiting” performance of two-dimensional (2-D) optical codes is analyzed under the chip-asynchronous assumption. We apply a Markov-chain method for a more accurate analysis, which can be generalized to 2-D optical codes with arbitrary maximum cross-correlation values. The performance of 2-D optical codes with the hard-limiting and chip-asynchronous assumptions is also compared with the soft-limiting and chip-synchronous assumptions.

Index Terms—Chip asynchronous, hard limiting, optical code division multiple access.

Paper approved by I. Andonovic, the Editor for Optical Networks and Devices of the IEEE Communications Society. Manuscript received April 20, 2006; revised January 11, 2007. This work was supported in part by the National Science Council of Republic of China under Grant NSC 95-2221-E005-023-MY3, in part by the Ministry of Education, Taiwan, R.O.C. under the ATU plan, in part by the U.S. Defense Advanced Research Projects Agency under Grant MDA972-03-1-0006, and in part by the Presidential Research Award and Faculty Development and Research Grants of Hofstra University.
C.-C. Hsu and G.-C. Yang are with the Department of Electrical Engineering, National Chung-Hsing University, Taichung 402, Taiwan, R.O.C. (e-mail: [email protected]).
W. C. Kwong is with the Department of Engineering, Hofstra University, Hempstead, NY 11549, USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCOMM.2008.060222.

I. INTRODUCTION

OPTICAL code-division multiple access (O-CDMA) has been attracting interest in the areas of fast fiber-optic and optical-wireless multiple-access networks [1]–[7]. O-CDMA allows many simultaneous users to access the same optical transmission channel asynchronously through the assignment of unique signature codes. Because of soft blocking on the numbers of simultaneous users and possible subscribers, O-CDMA also finds applications, for example, in the areas of optical passive networks [8] and packet-switching lightwave networks [9], [10]. Recently, a two-dimensional (2-D) incoherent coding technique, the so-called wavelength-hopping time-spreading (or simply wavelength-time) coding [11]–[17], has attracted attention. This is because the coding can be flexibly expanded in the wavelength or time domain, supporting larger code cardinality or more simultaneous users than one-dimensional (1-D) coding schemes. The wavelength-time scheme has also been successfully demonstrated in multigigabit/s, multiuser O-CDMA testbeds recently [18], [19]. To support the scheme, 2-D optical codes have been proposed [11]–[17]. However, most studies on the performance analyses of these optical codes

used a pessimistic chip-synchronous assumption for ease of computation [2], [3], [11]–[17], [20], [21]. Chip synchronous means that the timings of all users are perfectly aligned at the time-slot (or so-called chip) level, but this assumption always gives an upper bound on the performance. In [22], Brady and Verdu used a semiclassical method to analyze the effects of chip asynchronism on the performance of a family of 1-D prime codes [3]. We reported a Gaussian approximation method for analyzing the performances of 1-D and 2-D optical codes with cross-correlation functions of at most one without the chip-synchronous assumption in [3] and [23]. In [24], we also provided a more accurate chip-asynchronous performance analysis of 2-D optical codes by using a combinatorial method, and the analysis could be generalized to an arbitrary maximum cross-correlation value λc ≥ 1. It is known that a hard-limiter can be placed at the front end of a receiver, before correlation is performed, to reduce the effects of multiple-access interference (MAI) and the near-far problem [2], [3]. While a receiver with a hard-limiter is called a hard-limiting receiver, a soft-limiting receiver simply refers to one without hard limiting. For the conventional chip-synchronous case, the hard-limiting performances of 1-D optical orthogonal codes with λc = 1 and prime codes with λc = 2 were analyzed in [2], [3], [11], [20], [21], [23], and [25]. Chen and Yang [26] further showed that their analysis could be generalized to an arbitrary maximum cross-correlation value λc ≥ 1. The performance of modified prime codes with double hard-limiters (i.e., one before and one after an optical decoder) was also analyzed in [27]. To the best of our knowledge, however, there is a lack of studies on the hard-limiting performance of 2-D optical codes with arbitrary λc under the more realistic chip-asynchronous assumption, which we address in this paper.
Recently, the effect of beat noise in O-CDMA systems has been studied theoretically and experimentally [28]–[35]. It is found that the amount of beat noise is directly related to the coherence length of the incoherent laser sources used in the systems. Since this paper focuses on the influences of chip asynchronism and hard limiting on the performance of optical codes with an arbitrary λc, we do not include beat noise in the analyses. That is, we assume the use of laser sources with very short coherence length in order to isolate the code performance from the beat-noise effect. If one is interested in the effect of beat noise on practical O-CDMA systems, the analyses in [28]–[35] can be combined with our new analytical method. The organization of the rest of this paper is as follows. We first review the hard-limiting operation in O-CDMA in Section II. The impact of the chip-asynchronous assumption

0090-6778/08$25.00 © 2008 IEEE

II. HARD-LIMITING OPERATION

For a hard-limiting receiver, the interferences at all nonempty pulse positions of every cross-correlation function are equalized. Therefore, an error occurs whenever the transmitted data bit is zero but the total number of nonzero pulse positions in the cross-correlation function exceeds a predetermined decision threshold $Th_{\mathrm{hard}}$. If the input light intensity at a time instant is greater than or equal to $Th_{\mathrm{hard}}$, a hard-limiter will clip its output light intensity to a fixed level [36]–[38]. Otherwise, the output of the hard-limiter will be zero. Therefore, an ideal hard-limiter can be defined as [2]

\[ g(x) = \begin{cases} 1, & x \ge Th_{\mathrm{hard}} \\ 0, & 0 \le x < Th_{\mathrm{hard}} \end{cases} \tag{1} \]

where x is the input light intensity and “1” represents the clipped output intensity level. Hard limiting provides a performance improvement because it is able to exclude some combinations of the interference patterns from becoming heavily localized in a small number of time slots [2], [20]. For example, a decoder with a signature sequence 0000010100010000000000101000001 of length 31 and weight 6 is receiving a multiplexed (from several interferers) O-CDMA signal 0000010000030000000000102000001. Without the hard-limiter, the cross-correlation function can be as high as 8 at the decoder output, resulting in a decoding error because the cross-correlation function is greater than the expected autocorrelation peak (i.e., weight 6). With the hard-limiter, the multiplexed O-CDMA signal will be clipped to 0000010000010000000000101000001, which will not cause any decoding error because the cross-correlation function is now at most 5.

III. CHIP-ASYNCHRONOUS ASSUMPTION

Illustrating the effect of chip asynchronism, Fig. 1 shows an example of the cross-correlation functions between two 1-D code sequences, 1000100001100 and 0001000101100, for the chip-synchronous and chip-asynchronous cases.
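The hard-limiting example of Section II can be reproduced with a few lines of code. This is a minimal sketch: the function names are illustrative, and a limiter threshold of one intensity unit is an assumption, since the example in the text does not state the value of the threshold explicitly.

```python
def hard_limit(signal, threshold=1):
    """Ideal hard-limiter of (1): output 1 where the intensity reaches the
    threshold, 0 elsewhere. threshold=1 is an assumed value."""
    return [1 if x >= threshold else 0 for x in signal]


def correlate_in_phase(signature, signal):
    """Decoder output when the signal is aligned with the signature:
    the sum of the signal chips at the signature's pulse positions."""
    return sum(s * x for s, x in zip(signature, signal))


# Signature sequence and multiplexed signal quoted in Section II.
signature = [int(ch) for ch in "0000010100010000000000101000001"]
received = [int(ch) for ch in "0000010000030000000000102000001"]

weight = sum(signature)                                   # autocorrelation peak
soft_output = correlate_in_phase(signature, received)     # no hard-limiter
hard_output = correlate_in_phase(signature, hard_limit(received))

print(weight, soft_output, hard_output)
```

Without the limiter the interference alone exceeds the weight-6 autocorrelation peak, whereas after clipping it stays below the peak, which is the behavior described in the text.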
Under the chip-synchronous assumption, the discrete cross-correlation values, $I_g$, within the chip interval [g, g + 1) for g = {0, 1, 2, . . . , 12} are equal to {2, 2, 1, 2, 1, 0, 2, 2, 1, 0, 2, 0, 1}, correspondingly. However, under the chip-asynchronous assumption, the average cross-correlation values, $I_\mu$, within the same chip interval [g, g + 1) become {2, 3/2, 3/2, 3/2, 1/2, 1, 2, 3/2, 1/2, 1, 1, 1/2, 3/2},


on the “hit” probabilities created by interferers is formulated in Section III. In Section IV, a Markov-chain method for analyzing the hard-limiting, chip-asynchronous performance of optical codes with an arbitrary maximum cross-correlation function λc is shown. As a numerical example in Section V, the hard-limiting and soft-limiting performances of the λc = 1 2-D carrier-hopping prime code (CHPC) [3], [23] and the λc = 2 2-D quadratic-congruence CHPC (QC-CHPC) [15] are compared under the chip-synchronous and chip-asynchronous assumptions. Although we use 2-D optical codes as numerical examples, our new analytical technique can also be applied to 1-D optical codes once the hit probabilities have been obtained.


HSU et al.: HARD-LIMITING PERFORMANCE ANALYSIS OF 2-D OPTICAL CODES UNDER THE CHIP-ASYNCHRONOUS ASSUMPTION

Fig. 1. Cross-correlation functions between two 1-D code sequences, 1000100001100 and 0001000101100, under the (a) chip-synchronous and (b) chip-asynchronous assumptions. [Plots (a) and (b): correlation versus chip number, 1 to 13.]

correspondingly. That is, for the chip-asynchronous case, the cross-correlation value in a time slot depends on the amount of time shift between any two correlating code sequences (or code matrices in 2-D optical codes) and can be caused by a pulse (or nothing) from the preceding time slot and a pulse (or nothing) from the present time slot, as shown in Fig. 1. Thus, in addition to defining $q_i$ (used in the traditional chip-synchronous case) as the (hit) probability of having a cross-correlation value (or number of hits) of i in a time slot, we now need to consider two consecutive time slots and define $q_{i,j}$ as the probability that the cross-correlation value in the preceding time slot equals i and the cross-correlation value in the present time slot equals j, under the original chip-synchronous assumption, where i and j = {0, 1, 2, . . . , λc}. For λc = 1 optical codes, the hit probabilities are given by [3, Chapter 3]

\[ q_1 = 0.5\,q_{1,0} + 0.5\,q_{0,1} + q_{1,1} \tag{2} \]

\[ q_{0,0} + q_{1,0} + q_{0,1} + q_{1,1} = 1 \tag{3} \]

\[ q_{1,0} = q_{0,1} \tag{4} \]

The equation of each hit-probability term depends on the optical code in use. Once $q_1$ and $q_{1,1}$ are formulated, the remaining hit probabilities in (2)–(4) can be obtained. For example, the λc = 1 2-D CHPC in [3] or [15] has $q_1 = w^2/(2LN)$ and $q_{1,1} = w(w-1)/[2N(N-1)]$, where w is the code weight, L is the number of wavelengths, and N is the code length. In general, for λc = c (≥ 1) optical codes, the hit probabilities are given by

\[ q_i = \frac{1}{2} \sum_{j=0}^{c} \left( q_{i,j} + q_{j,i} \right) \tag{5} \]

\[ \sum_{i=0}^{c} \sum_{j=0}^{c} q_{i,j} = 1 \tag{6} \]

for i and j = {0, 1, 2, . . . , c}. Note that $q_{i,j} = q_{j,i}$ is true only for the case of λc = 1.
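The two sets of cross-correlation values quoted in Section III for the sequences of Fig. 1 can be reproduced numerically. The sketch below is illustrative only: the function names and the direction of the cyclic shift are assumptions chosen to match the quoted values, and the chip-asynchronous average assumes rectangular pulses, so that the correlation varies linearly with the sub-chip offset and its mean over [g, g + 1) is the midpoint of the two neighboring chip-synchronous values.

```python
from fractions import Fraction


def chip_sync_cross_correlation(x, y):
    """Periodic cross-correlation I_g = sum_i x[(i + g) mod N] * y[i]
    for every integer chip shift g = 0, ..., N - 1."""
    n = len(x)
    return [sum(x[(i + g) % n] * y[i] for i in range(n)) for g in range(n)]


def chip_async_average(sync_values):
    """Average correlation over a uniformly distributed offset in [g, g + 1):
    the midpoint of I_g and I_{g+1} (rectangular-pulse assumption)."""
    n = len(sync_values)
    return [Fraction(sync_values[g] + sync_values[(g + 1) % n], 2)
            for g in range(n)]


# The two 1-D code sequences of Fig. 1.
x = [int(ch) for ch in "1000100001100"]
y = [int(ch) for ch in "0001000101100"]

sync = chip_sync_cross_correlation(x, y)
async_avg = chip_async_average(sync)

print(sync)
print([str(v) for v in async_avg])
```

The averaged values never exceed the larger of the two neighboring chip-synchronous values, which is one way to see why the chip-synchronous assumption is pessimistic.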

Fig. 2. Dividing the w pulse positions into 2w sub-positions.

IV. PERFORMANCE ANALYSIS

In this section, we analyze the hard-limiting performance of λc = c optical codes under the chip-asynchronous assumption. For the chip-synchronous case, the combinatorial hard-limiting error probability of the λc = c optical codes is known and given by [26]

\[ P_{e,\mathrm{syn,hard}} = \frac{1}{2} \sum_{j=Th}^{w} \binom{w}{j} \sum_{i=0}^{j} (-1)^{j-i} \binom{j}{i} \left[ \sum_{m=0}^{c} \frac{\binom{i}{m}}{\binom{w}{m}}\, q_m \right]^{K-1} \tag{7} \]

where $\binom{i}{0} = 1$, K is the number of simultaneous users, w is the code weight, $q_m$ is the probability of getting m = {0, 1, 2, . . . , c} hits in a time slot, and Th is the decision threshold. In traditional analyses, Th = w was assumed. However, this assumption is not necessary, and (7) is the generalized form for any Th ≤ w.

The soft-limiting, chip-asynchronous error probability of the λc = c optical codes is given by [24]

\[ P_{e,\mathrm{asyn}} = \frac{1}{2} - \frac{1}{2} \sum_{l_{\mathrm{hit}}=0}^{w-1} \frac{(K-1)!}{\left( \prod_{i=0}^{c} \prod_{\substack{j=0 \\ (i,j)\neq(0,0)}}^{c} l_{i,j}! \right) (K-1-l)!} \left( \prod_{i=0}^{c} \prod_{\substack{j=0 \\ (i,j)\neq(0,0)}}^{c} q_{i,j}^{\,l_{i,j}} \right) q_{0,0}^{\,K-1-l} \left[ 1 - Q\!\left( \frac{w - \frac{1}{2} \sum_{i=0}^{c} \sum_{j=0}^{c} (i+j)\, l_{i,j}}{\sqrt{\frac{1}{12} \sum_{i=0}^{c} \sum_{j=0}^{c} (i-j)^2\, l_{i,j}}} \right) \right] \tag{8} \]

where the factor 1/2 assumes equiprobable 0-1 data-bit transmission, $Q(x) = (1/\sqrt{2\pi}) \int_x^\infty e^{-y^2/2}\, dy$, $l_{i,j}$ is the number of users interfering in such a way that the cross-correlation value in the preceding time slot is i ∈ [0, c] but the cross-correlation value in the present time slot is j ∈ [0, c], $l_{\mathrm{hit}} = \sum_{i=0}^{c} \sum_{j=0,(i,j)\neq(0,0)}^{c} \min(i,j)\, l_{i,j}$ denotes the minimum number of hits that causes no errors, and $l = \sum_{i=0}^{c} \sum_{j=0,(i,j)\neq(0,0)}^{c} l_{i,j} \le K - 1$.

Now, we apply the Markov-chain method [39] to analyze the hard-limiting performance of the λc = c optical codes. For the chip-asynchronous case, each interferer may contribute one to 2λc pulse overlaps (or hits) in the cross-correlation function. The probability of contributing one hit is $q_{1,0} + q_{0,1}$, and the probability of contributing two hits is $q_{2,0} + q_{0,2} + q_{1,1}$. In general, the probability of contributing m hits is given by $q_{m,0} + q_{m-1,1} + q_{m-2,2} + \cdots + q_{0,m}$ (or $\sum_{i+j=m} q_{i,j}$).

While the original MAI level is uniformly distributed in the closed interval [0, 1], we here assume the MAI level of one hit to be 1/2 on average, for ease of computation. Then, we can divide the w pulse positions into 2w sub-positions, as shown in Fig. 2. Every hit occurs in one of the 2w sub-positions. These sub-positions can be modeled as the states of a Markov chain [26], [39]. As shown in Fig. 3, there are in total 2w + 1 states, labeled as {0, 1/2, 1, 3/2, . . . , w − 1/2, w}, correspondingly. Let $w_l(i)$ represent that i of the w pulse positions of the desired user are hit by l interferers, where i = {0, 1/2, 1, 3/2, . . . , w − 1/2, w}. Then, the transition probability $p_{i,j}$ from $w_l(i)$ to $w_{l+1}(j)$ is given by

\[ p_{i,j} = \begin{cases} \displaystyle \sum_{m=0}^{c} \sum_{n=0}^{c} \frac{\binom{2i}{m+n-2k} \binom{2w-2i}{2k}}{\binom{2w}{m+n}}\, q_{m,n} & \text{if } j = i + k \\ 0 & \text{otherwise} \end{cases} \tag{9} \]

where k = {0, 1/2, 1, 3/2, . . . , c − 1/2, c} and $\binom{x}{y} = 0$ if x < y. Next, we define P as the matrix of the transition probabilities $p_{i,j}$, such that

\[ P = \begin{bmatrix} p_{0,0} & p_{0,1/2} & p_{0,1} & \cdots & p_{0,w} \\ p_{1/2,0} & p_{1/2,1/2} & p_{1/2,1} & \cdots & p_{1/2,w} \\ p_{1,0} & p_{1,1/2} & p_{1,1} & \cdots & p_{1,w} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{w,0} & p_{w,1/2} & p_{w,1} & \cdots & p_{w,w} \end{bmatrix} \tag{10} \]

Let $r_i^{(l)}$ denote the probability that i of the w pulse positions of the desired user are hit by l interferers and let $r^{(l)}$ denote the vector of these probabilities as

\[ r^{(l)} = \left[ r_0^{(l)} \;\; r_{1/2}^{(l)} \;\; r_1^{(l)} \;\; \cdots \;\; r_w^{(l)} \right] \tag{11} \]

Because $r^{(0)} = [1\; 0\; 0\; \cdots\; 0]$, the vector $r^{(l)}$ can be obtained by multiplying $r^{(0)}$ with the transition probability matrix P l times [39], such that

\[ r^{(l)} = r^{(0)} P^l = [1\; 0\; 0\; \cdots\; 0]\, P^l \tag{12} \]
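The chain defined by (9)–(12) is small enough to be evaluated directly. The sketch below builds P for given two-slot hit probabilities, propagates the initial state K − 1 times as in (12), and compares the result with the closed form that the paper derives later as (18). It is an illustration under stated assumptions: the function names are hypothetical, states are indexed by the number of hit sub-positions s = 2i, exact rational arithmetic is used, and the hit probabilities are those of the (w, L, N) = (5, 5, 25) CHPC of Section V, with $q_{0,0}$ derived from (3).

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient, taken as 0 whenever k < 0 or k > n."""
    return comb(n, k) if 0 <= k <= n else 0


def transition_matrix(w, q):
    """Transition matrix P of (10), indexed by s = 2i hit sub-positions.

    q maps (m, n) to the two-slot hit probability q_{m,n}; one interferer
    contributes m + n hits (assumed <= 2w), of which t = 2k are newly hit.
    """
    size = 2 * w + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for s in range(size):
        for (m, n), prob in q.items():
            hits = m + n
            for t in range(hits + 1):
                if s + t < size:
                    ways = binom(s, hits - t) * binom(2 * w - s, t)
                    P[s][s + t] += prob * Fraction(ways, comb(2 * w, hits))
    return P


def state_distribution(P, steps):
    """r^(l) = [1 0 ... 0] P^l of (12), by repeated vector-matrix products."""
    size = len(P)
    r = [Fraction(1)] + [Fraction(0)] * (size - 1)
    for _ in range(steps):
        r = [sum(r[a] * P[a][b] for a in range(size)) for b in range(size)]
    return r


def closed_form(w, q, users):
    """r^(K-1) from the closed form (18), using the diagonal entries of P."""
    def diag(i):
        return sum(p * Fraction(binom(i, m + n), comb(2 * w, m + n))
                   for (m, n), p in q.items())
    powers = [diag(i) ** (users - 1) for i in range(2 * w + 1)]
    return [comb(2 * w, s) * sum((-1) ** (s - i) * comb(s, i) * powers[i]
                                 for i in range(s + 1))
            for s in range(2 * w + 1)]


# Hit probabilities of the (5, 5, 25) CHPC; q_{0,0} follows from (3).
q = {(0, 0): Fraction(49, 60), (0, 1): Fraction(1, 12),
     (1, 0): Fraction(1, 12), (1, 1): Fraction(1, 60)}
w, K = 5, 10
P = transition_matrix(w, q)
r_matrix = state_distribution(P, K - 1)
r_closed = closed_form(w, q, K)
print(r_matrix == r_closed)
```

Repeated multiplication costs on the order of K(2w + 1)² operations, while the closed form needs only the 2w + 1 diagonal entries of P, which is the practical benefit of the diagonalization that follows.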

As P is an upper triangular matrix, the eigenvalues of P are the entries on its main diagonal [40, Th. 6.1]. P has 2w + 1 distinct eigenvalues and is therefore diagonalizable [40, Th. 6.6], such that $P = SDS^{-1}$. The diagonal entries of D are the eigenvalues of P and the columns of S are the corresponding eigenvectors [40, Th. 6.5]. Then, we have

\[ P^{K-1} = S D^{K-1} S^{-1} \tag{13} \]

Because D is diagonal, $D^{K-1}$ is also a diagonal matrix and given by

\[ D^{K-1} = \begin{bmatrix} p_{0,0}^{K-1} & 0 & 0 & \cdots & 0 \\ 0 & p_{1/2,1/2}^{K-1} & 0 & \cdots & 0 \\ 0 & 0 & p_{1,1}^{K-1} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & p_{w,w}^{K-1} \end{bmatrix} \tag{14} \]

Fig. 3. State transition diagram of the Markov chain on $p_{i,j}$, where the states represent that i of the w pulse positions of the desired user are hit by the l interferers.

The corresponding eigenvectors are

\[ S = \begin{bmatrix} 1 & \binom{2w}{1} & \binom{2w}{2} & \cdots & \binom{2w}{2w} \\ 0 & 1 & \binom{2w-1}{1} & \cdots & \binom{2w-1}{2w-1} \\ 0 & 0 & 1 & \cdots & \binom{2w-2}{2w-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{15} \]

The details of the derivation of (15) can be found in [26, Appendix]. Thus, the inverse of S can be found by using [40, Th. 3.7] and is given by

\[ S^{-1} = \begin{bmatrix} 1 & -\binom{2w}{1} & \binom{2w}{2} & \cdots & (-1)^{2w} \binom{2w}{2w} \\ 0 & 1 & -\binom{2w-1}{1} & \cdots & (-1)^{2w-1} \binom{2w-1}{2w-1} \\ 0 & 0 & 1 & \cdots & (-1)^{2w-2} \binom{2w-2}{2w-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \tag{16} \]

Substituting (13)–(16) into (11) and (12) with l = K − 1, the vector $r^{(K-1)}$ can be written as

\[ r^{(K-1)} = \left[ r_0^{(K-1)} \;\; r_{1/2}^{(K-1)} \;\; r_1^{(K-1)} \;\; \cdots \;\; r_k^{(K-1)} \;\; \cdots \;\; r_w^{(K-1)} \right] \tag{17} \]

where

\[ r_k^{(K-1)} = \sum_{i=0}^{2k} (-1)^{2k-i} \binom{2w}{2k} \binom{2k}{i}\, p_{i/2,i/2}^{K-1} \tag{18} \]

With hard limiting, an error occurs when the desired user is receiving a data bit “0” but interferers are contributing MAI such that the total number of pulse positions with an MAI level greater than or equal to one reaches or exceeds the threshold Th. In other words, an error occurs only when there are n ∈ [Th, w] pulse positions with MAI levels of at least 1. If there are n such pulse positions, the remaining pulse positions will have MAI levels equal to 1/2 or zero. When the MAI level of one pulse position is 1/2, one of the two sub-positions of that pulse position is hit. Then, the total number of combinations in which there are n pulse positions with MAI levels greater than or equal to 1 is

\[ \binom{w}{n} \sum_{j=0}^{w-n} \binom{w-n}{j}\, 2^j \tag{19} \]

where j is the number of the other hit pulse positions. There are in total 2n + j hit sub-positions. Thus, the probability of 2n + j sub-positions being hit by interferers is given by (18), such that

\[ (P_h)^{K-1}_{2n+j} = \frac{r_{(2n+j)/2}^{(K-1)}}{\binom{2w}{2n+j}} = \sum_{i=0}^{2n+j} (-1)^{2n+j-i} \binom{2n+j}{i}\, p_{i/2,i/2}^{K-1} \tag{20} \]

Fig. 4. Example of the condition that there are three pulse positions with the MAI levels greater than one. [Diagram: five pulse positions, each divided into two sub-positions.]

Using Fig. 4 as an example, there are five pulse positions and the MAI levels of pulse positions 1, 3, and 5 are all at least 1, but the MAI level of position 4 is 1/2. There are 7 sub-positions being hit. If there are K simultaneous users, the probability of these 7 sub-positions being hit is given by $(P_h)^{K-1}_{7} = \sum_{i=0}^{7} (-1)^{7-i} \binom{7}{i}\, p_{i/2,i/2}^{K-1}$. Finally, the hard-limiting, chip-asynchronous error probability is obtained by using (9), (19), and (20), such that

\begin{align*}
P_{e,\mathrm{asyn,hard}} &= \frac{1}{2} \sum_{n=Th}^{w} \binom{w}{n} \sum_{j=0}^{w-n} \binom{w-n}{j}\, 2^j\, (P_h)^{K-1}_{2n+j} \\
&= \frac{1}{2} \sum_{n=Th}^{w} \binom{w}{n} \sum_{j=0}^{w-n} \binom{w-n}{j}\, 2^j \sum_{i=0}^{2n+j} (-1)^{2n+j-i} \binom{2n+j}{i} \left[ \sum_{k=0}^{c} \sum_{l=0}^{c} \frac{\binom{i}{k+l}}{\binom{2w}{k+l}}\, q_{k,l} \right]^{K-1} \tag{21}
\end{align*}

where the factor 1/2 comes from the assumption of equiprobable bit-one and bit-zero transmission.

V. NUMERICAL EXAMPLES

In Fig. 5, the hard-limiting error probabilities, $P_{e,\mathrm{syn,hard}}$ of (7) and $P_{e,\mathrm{asyn,hard}}$ of (21), of the λc = 1 2-D CHPC

766

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008 −2

−2

10

10 (w,N )

−3

(w,N )

−3

10

10 (5,25)

−4

−4

10

10

(5,25) −5

(7,49)

10 Error probability Pe

Error probability Pe

−5

(7,49)

10

−6

10

−7

10

−8

10

−9

−6

10

−7

10

−8

10

−9

10

10

−10

−10

Simulation

10

Simulation (hard−limiting)

10

Synchronous −11

Sof t−limitin g −11

Asynchronous

10

−12

10

Hard−limitin g

10

−12

0

5

10

15

20 25 30 35 Number of simultaneous users K

40

45

10

50

Fig. 5. Hard-limiting error probabilities, Pe,syn,hard and Pe,asyn,hard , versus the number of simultaneous users K for the λc = 1 2-D CHPC under the chip-synchronous and chip-asynchronous assumptions.

0

5

10

15

20 25 30 35 Number of simultaneous users K

40

45

50

Fig. 6. Chip-asynchronous error probabilities, Pe,asyn,soft and Pe,asyn,hard , versus the number of simultaneous users K for the λc = 1 2-D CHPC without and with hard limiting. 0

10

TABLE I

PARAMETERS FOR THE ERROR PROBABILITY PLOTS

(w,N ) −2

10

(w,L,N)

(5,5,25)

(7,7,49)

(5,5,25)

(7,7,49)

q1

0.1

0.0714

0.0862

0.06333

q2

--

0.00645

0.00376

0.0105

0.00523

0.00052

0.00034

0.0757

0.0581

q q q

QC-CHPC [15]

0,1

0.0167

1,1

--

2,2

=q

--

1,0

0.0089 --

0.0833

0.0625

q 0,2 = q 2,0

--

--

0.00504

0.00287

q 1,2 = q 2,1

--

--

0.00089

0.00055

27

6

23

K @10

-9

−4

10 Error probability Pe

CHPC [3], [23]

(5,25)

(7,49)

−6

10

−8

10

Simulation −10

10

Asynchronous −12

10

8

with the threshold T h = w are plotted against the number of simulations users K, under the chip-synchronous and chipasynchronous assumptions. (For the example of the CHPC code matrices, please see [3] or [23].) For weight w = 5, number of wavelengths L = p1 = p2 = 5, and length N = p1 p2 = 25, we have q1 = w2 /(2LN ) = 0.1, q1,1 = w(w − 1)/[2N (N − 1)] = 0.0167, and q1,0 = q0,1 = 0.0833, as shown in Table I. Also shown in the table are the hit probabilities of the CHPC with w = 7, L = p1 = p2 = 7, and N = p1 p2 = 49. As shown in Fig. 5, the hard-limiting error probability gets worse as K increases. As expected, the performance of the chip-asynchronous case is better than the chip-synchronous case since the latter always gives the performance upper bound. The difference in the error probability increases with w or N and is found to be at least 2-3 orders of magnitude in this example. The computer-simulation results are found closer to the results from the new chip-asynchronous analysis, validating its correctness. For the plots in this section, we perform the computer simulation by first constructing all possible matrices in the 2-D optical code set of given length and weight, and then randomly assigning these code matrices in accordance with the number of simultaneous users selected for the simulation.
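The closed-form CHPC hit probabilities quoted above can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers in the text and in Table I. The function name is hypothetical, and the relation $q_{1,0} = q_1 - q_{1,1}$ is an assumption inferred from the tabulated values rather than a formula stated in this section.

```python
def chpc_hit_probabilities(w: int, L: int, N: int) -> dict[str, float]:
    """Hit probabilities of the lambda_c = 1 2-D CHPC for weight w,
    L wavelengths and length N, using the expressions quoted in the text."""
    q1 = w**2 / (2 * L * N)
    q11 = w * (w - 1) / (2 * N * (N - 1))
    # Assumption: q_{1,0} = q_{0,1} = q_1 - q_{1,1}, consistent with Table I.
    q10 = q1 - q11
    return {"q1": q1, "q11": q11, "q10": q10}


if __name__ == "__main__":
    for w, L, N in [(5, 5, 25), (7, 7, 49)]:
        values = chpc_hit_probabilities(w, L, N)
        print((w, L, N), {name: round(v, 4) for name, v in values.items()})
```

For both parameter sets the rounded results agree with the CHPC columns of Table I.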

Fig. 7. Hard-limiting error probabilities, $P_{e,\mathrm{syn,hard}}$ and $P_{e,\mathrm{asyn,hard}}$, versus the number of simultaneous users $K$ for the $\lambda_c = 2$ 2-D QC-CHPC under the chip-synchronous and chip-asynchronous assumptions. [Plot: error probability $P_e$ (logarithmic scale) versus $K$ from 0 to 100, for $(w, N) = (5, 25)$ and $(7, 49)$; curves: Simulation, Synchronous, Asynchronous.]

The starting transmission time of each code matrix is chosen as a random number in $[0, 1)$, which corresponds to a fraction of a chip interval, in order to emulate the chip-asynchronous assumption. If needed, nonzero amplitude levels of the multiplexed O-CDMA signal at any time instant are clipped to the same level in order to emulate the hard-limiting operation. To obtain a given error probability, say $10^{-8}$, for a given $K$, we run at least $10^{10}$ iterations of the simulation.

In Fig. 6, the soft-limiting and hard-limiting error probabilities, $P_{e,\mathrm{asyn,soft}}$ of (8) and $P_{e,\mathrm{asyn,hard}}$ of (21), of the $\lambda_c = 1$ 2-D CHPC are plotted against the number of simultaneous users $K$, under the chip-asynchronous assumption. The same parameters as in Fig. 5 are used. As expected, the chip-asynchronous case with hard limiting performs better than without hard limiting, and the difference in the error probability is about 3-4 orders of magnitude in this example.

In Fig. 7, the hard-limiting error probabilities, $P_{e,\mathrm{syn,hard}}$ of (7) and $P_{e,\mathrm{asyn,hard}}$ of (21), of the $\lambda_c = 2$ 2-D QC-CHPC [15] are plotted against the number of simultaneous users $K$, under the chip-synchronous and chip-asynchronous assumptions. (For examples of the QC-CHPC code matrices, see [15].) For $(w = 5, L = p_1 = p_2 = 5, N = p_1 p_2 = 25)$ and $(w = 7, L = p_1 = p_2 = 7, N = p_1 p_2 = 49)$, the hit probabilities are shown in Table I. Similar to Fig. 5, the chip-asynchronous case always performs better than the chip-synchronous case, especially for a small $K$. The computer-simulation results are again closer to the results from the new chip-asynchronous analysis, validating its correctness.

In Fig. 8, the error probabilities, $P_{e,\mathrm{asyn,soft}}$ of (8) and $P_{e,\mathrm{asyn,hard}}$ of (21), of the $\lambda_c = 2$ 2-D QC-CHPC are plotted against the number of simultaneous users $K$ under the chip-asynchronous assumption. The difference in the error probability increases with $w$ or $N$ and is in the range of 1-4 orders of magnitude in this example.

Fig. 9 shows the hard-limiting error probability, $P_{e,\mathrm{asyn,hard}}$ of (21), of the $\lambda_c = 2$ 2-D QC-CHPC versus the number of simultaneous users $K$ with different decision thresholds $Th$ under the chip-asynchronous assumption. For $w = 5$, $L = 5$, and $N = 25$, the performances of the QC-CHPC with $Th = \{3, 4, 5\}$ are shown in the upper part of the figure. For $w = 7$, $L = 7$, and $N = 49$, the performances of the QC-CHPC with $Th = \{5, 6, 7\}$ are shown in the lower part of the figure. It can be seen that the QC-CHPC performs increasingly better as $Th$ increases. The difference in the error probability is about 1-2 orders of magnitude as $Th$ increases by one.

While we use 2-D optical codes as the numerical examples in this paper, it is important to point out that the new analytical technique in Section IV can also be applied to 1-D optical codes once the hit probabilities in (2)–(6) have been obtained, because our error-probability equations are functions of the hit probabilities.

Fig. 8. Chip-asynchronous error probabilities, $P_{e,\mathrm{asyn,soft}}$ and $P_{e,\mathrm{asyn,hard}}$, versus the number of simultaneous users $K$ for the $\lambda_c = 2$ 2-D QC-CHPC without and with hard limiting. [Plot: error probability $P_e$ (logarithmic scale) versus $K$ from 0 to 100, for $(w, N) = (5, 25)$ and $(7, 49)$; curves: Simulation (hard-limiting), Soft-limiting, Hard-limiting.]

Fig. 9. Hard-limiting, chip-asynchronous error probability, $P_{e,\mathrm{asyn,hard}}$, versus the number of simultaneous users $K$ for the $\lambda_c = 2$ 2-D QC-CHPC with various decision thresholds $Th$. [Plot: error probability $P_e$ (logarithmic scale) versus $K$ from 0 to 100; curves for $Th = 3, 4, 5$ with $(w, N) = (5, 25)$ and $Th = 5, 6, 7$ with $(w, N) = (7, 49)$, plus Simulation.]

VI. CONCLUSIONS

In this paper, the hard-limiting performance of optical codes in O-CDMA was analyzed under the more realistic chip-asynchronous assumption. We applied a Markov-chain method to obtain the error probability of optical codes with an arbitrary maximum cross-correlation value. As expected, the performance of the chip-asynchronous case is superior to that of the chip-synchronous case. The difference in the error probability increases with the code weight or length. We also showed that hard limiting improved performance.

REFERENCES

[1] J. A. Salehi, “Code division multiple-access techniques in optical fiber networks–part I: fundamental principles,” IEEE Trans. Commun., vol. 37, no. 8, pp. 824-833, Aug. 1989.
[2] J. A. Salehi and C. A. Brackett, “Code division multiple-access techniques in optical fiber networks–part II: system performance analysis,” IEEE Trans. Commun., vol. 37, no. 8, pp. 834-842, Aug. 1989.
[3] G.-C. Yang and W. C. Kwong, Prime Codes with Applications to CDMA Optical and Wireless Networks. Norwood, MA: Artech House, 2002.
[4] A. Stok and E. H. Sargent, “The role of optical CDMA in access networks,” IEEE Commun. Mag., vol. 40, no. 9, pp. 83-87, Sept. 2002.
[5] J. Ratnam, “Optical CDMA in broadband communication–scope and applications,” J. Opt. Commun., vol. 23, no. 1, pp. 11-21, Jan. 2002.
[6] B. Hamzeh and M. Kavehrad, “OCDMA-coded free-space optical links for wireless optical-mesh networks,” IEEE Trans. Commun., vol. 52, no. 12, pp. 2165-2174, Dec. 2004.
[7] V. J. Hernandez, A. J. Mendez, C. V. Bennett, R. M. Gagliardi, and W. J. Lennon, “Bit-error-rate analysis of a 16-user gigabit ethernet optical-CDMA (O-CDMA) technology demonstrator using wavelength/time codes,” IEEE Photon. Technol. Lett., vol. 17, no. 12, pp. 2784-2786, Dec. 2005.
[8] K. Kitayama, X. Wang, and N. Wada, “OCDMA over WDM PON-solution path to gigabit-symmetric FTTH,” J. Lightwave Technol., vol. 24, no. 4, pp. 1654-1662, Apr. 2006.
[9] X. Wang and N. Wada, “Experimental demonstration of OCDMA traffic over optical packet switching network with hybrid PLC and SSFBG en/decoders,” J. Lightwave Technol., vol. 24, no. 8, pp. 3012-3020, Aug. 2006.
[10] C.-S. Brès, I. Glesk, R. J. Runser, and P. R. Prucnal, “All-optical OCDMA code-drop unit for transparent ring networks,” IEEE Photon. Technol. Lett., vol. 17, no. 5, pp. 1088-1090, May 2005.
[11] G.-C. Yang and W. C. Kwong, “Performance comparison of multiwavelength CDMA and WDMA+CDMA for fiber-optic networks,” IEEE Trans. Commun., vol. 45, no. 11, pp. 1426-1434, Nov. 1997.
[12] A. J. Mendez, R. M. Gagliardi, V. J. Hernandez, C. V. Bennett, and W. J. Lennon, “High-performance optical CDMA system based on 2-D optical orthogonal codes,” J. Lightwave Technol., vol. 22, no. 11, pp. 2409-2419, Nov. 2004.
[13] W. C. Kwong, G.-C. Yang, V. Baby, C.-S. Brès, and P. R. Prucnal, “Multiple-wavelength optical orthogonal codes under prime-sequence permutations for optical CDMA,” IEEE Trans. Commun., vol. 53, no. 1, pp. 117-123, Jan. 2005.
[14] F.-R. Gu and J. Wu, “Construction of two-dimensional wavelength/time optical orthogonal codes using difference family,” J. Lightwave Technol., vol. 23, no. 11, pp. 3642-3652, Nov. 2005.
[15] C.-Y. Chang, G.-C. Yang, and W. C. Kwong, “Wavelength-time codes with maximum cross-correlation function of two for multicode-keying optical CDMA,” J. Lightwave Technol., vol. 24, no. 3, pp. 1093-1100, Mar. 2006.
[16] S. Sun, H. Yin, Z. Wang, and A. Xu, “A new family of 2-D optical orthogonal codes and analysis of its performance in optical CDMA access networks,” J. Lightwave Technol., vol. 24, no. 4, pp. 1646-1653, Apr. 2006.
[17] C.-P. Hsieh, C.-Y. Chang, G.-C. Yang, and W. C. Kwong, “A bipolar-bipolar code for asynchronous wavelength-time optical CDMA,” IEEE Trans. Commun., vol. 54, no. 7, pp. 1190-1194, July 2006.
[18] V. Baby, I. Glesk, R. J. Runser, R. Fischer, Y.-K. Huang, C.-S. Brès, W. C. Kwong, T. H. Curtis, and P. R. Prucnal, “Experimental demonstration and scalability analysis of a four-node 102-Gchip/s fast frequency-hopping time-spreading optical CDMA network,” IEEE Photon. Technol. Lett., vol. 17, no. 1, pp. 253-255, Jan. 2005.
[19] C.-S. Brès, I. Glesk, and P. R. Prucnal, “Demonstration of an eight-user 115-Gchip/s incoherent OCDMA system using supercontinuum generation and optical time gating,” IEEE Photon. Technol. Lett., vol. 18, no. 7, pp. 889-891, Apr. 2006.
[20] M. Y. Azizoğlu, J. A. Salehi, and Y. Li, “Optical CDMA via temporal codes,” IEEE Trans. Commun., vol. 40, no. 7, pp. 1162-1170, July 1992.
[21] H. Kwon, “Optical orthogonal code-division multiple-access systems–part II: multibits/sequence-period OOCDMA,” IEEE Trans. Commun., vol. 42, no. 8, pp. 2592-2599, Aug. 1994.
[22] D. Brady and S. Verdú, “A semiclassical analysis of optical code division multiple access,” IEEE Trans. Commun., vol. 39, no. 1, pp. 85-93, Jan. 1991.
[23] G.-C. Yang and W. C. Kwong, “Performance analysis of extended carrier-hopping prime codes for optical CDMA,” IEEE Trans. Commun., vol. 53, no. 5, pp. 876-881, May 2005.
[24] C.-C. Hsu, G.-C. Yang, and W. C. Kwong, “Performance analysis of 2-D optical codes with arbitrary cross-correlation value under the chip-asynchronous assumption,” IEEE Commun. Lett., accepted.
[25] H. M. H. Shalaby, “Chip-level detection in optical code division multiple access,” J. Lightwave Technol., vol. 16, no. 6, pp. 1077-1087, June 1998.
[26] J.-J. Chen and G.-C. Yang, “CDMA fiber-optic systems with optical hard limiters,” IEEE J. Lightwave Technol., vol. 18, no. 7, pp. 950-958, July 2001.
[27] T. Ohtsuki, K. Sato, I. Sasase, and S. Mori, “Direct-detection optical synchronous CDMA systems with double optical hard-limiters using modified prime sequence codes,” IEEE Trans. Commun., vol. 14, no. 9, pp. 1879-1887, Dec. 1996.
[28] X. Wang and K. Kitayama, “Analysis of beat noise in coherent and incoherent time-spreading OCDMA,” J. Lightwave Technol., vol. 22, no. 10, pp. 2226-2235, Oct. 2004.
[29] M. Meenakshi and I. Andonovic, “Affect of physical layer impairments on SUM and AND detection strategies for 2-D optical CDMA,” IEEE Photon. Technol. Lett., vol. 17, no. 5, pp. 1112-1114, May 2005.
[30] S. Ayotte and L. A. Rusch, “Experimental comparison of coherent versus incoherent sources in a four-user lambda-t OCDMA system at 1.25 Gb/s,” IEEE Photon. Technol. Lett., vol. 17, no. 11, pp. 2493-2495, Nov. 2005.
[31] C.-H. Lin, J. Wu, and C.-L. Yang, “Noncoherent spatial/spectral optical CDMA system with two-dimensional perfect difference codes,” J. Lightwave Technol., vol. 23, no. 12, pp. 3966-3980, Dec. 2005.
[32] J.-F. Huang and C.-C. Yang, “Permuted M-matrices for the reduction of phase-induced intensity noise in optical CDMA network,” IEEE Trans. Commun., vol. 54, no. 1, pp. 150-158, Jan. 2006.
[33] T. M. Bazan, D. Harle, and I. Andonovic, “Performance analysis of 2-D time-wavelength OCDMA systems with coherent light sources: code design considerations,” J. Lightwave Technol., vol. 24, no. 10, pp. 3583-3589, Oct. 2006.
[34] T. M. Bazan, D. Harle, and I. Andonovic, “Mitigation of beat noise in time-wavelength optical code-division multiple-access systems,” J. Lightwave Technol., vol. 24, no. 11, pp. 4215-4222, Nov. 2006.
[35] C.-S. Brès, Y.-K. Huang, D. Rand, I. Glesk, P. R. Prucnal, T. M. Bazan, C. Michie, D. Harle, and I. Andonovic, “On the experimental characterization of beat noise in 2-D time-spreading wavelength-hopping OCDMA systems,” IEEE Photon. Technol. Lett., vol. 18, no. 21, pp. 2314-2316, Nov. 2006.
[36] H. Gibbs, Optical Bistability: Controlling Light with Light. New York: Academic, 1985.
[37] J. Jewell, M. Rushford, and H. Gibbs, “Use of a single nonlinear Fabry-Perot etalon as optical logic gates,” Appl. Phys. Lett., vol. 44, no. 2, pp. 172-174, Jan. 1984.
[38] J.-H. Wu and J. Wu, “Synchronous fiber-optic CDMA using hard limiter and BCH codes,” J. Lightwave Technol., vol. 13, pp. 1169-1176, June 1995.
[39] A. Leon-Garcia, Probability and Random Processes for Electrical Engineering, 2nd ed. Reading, MA: Addison-Wesley, 1994.
[40] D. Lay, Linear Algebra and Its Applications. Reading, MA: Addison-Wesley, 1994.

Chia-Cheng Hsu was born in Taipei, Taiwan. He received a B.S. degree in 2004 and an M.S. degree in 2006, both in electrical engineering from the National Chung-Hsing University, Taichung, Taiwan. His research interests include optical communications and wireless communications.

Guu-Chang Yang (S’88-M’89-SM’05) received the B.S. degree in electrical engineering from the National Taiwan University, Taiwan, in 1985, and the M.S. and Ph.D. degrees in electrical engineering from the University of Maryland, College Park, Maryland, in 1989 and 1992, respectively. From 1988 to 1992, Dr. Yang was a research assistant in the System Research Center at the University of Maryland. In 1992, he joined the faculty of the National Chung-Hsing University, Taichung, Taiwan, where he is currently a Professor of the Department of Electrical Engineering. He was the chairman of the Department of Electrical Engineering from 2001 to 2004. His research interests include wireless and optical communication systems, spreading code designs, and applications of code-division multiple access. He co-authored the first-of-its-kind technical book on optical CDMA, “Prime codes with applications to CDMA optical and wireless networks” (Norwood, MA: Artech House), in 2002 and contributed one chapter to another optical CDMA book, “Optical Code Division Multiple Access: Fundamentals and Applications” (Boca Raton, FL: Taylor & Francis), in 2006. Dr. Yang served as Chairman of the IEEE Information Theory Society Taipei Chapter from 2003 to 2005, and served as the Vice-Chairman of the IEEE Information Theory Society Taipei Chapter from 1999 to 2000. He received the Distinguished Research Award from the National Science Council in 2004 and Excellent Young Electrical Engineering Award from the Chinese Institute of Electrical Engineering in 2003. He also received the Best Teaching Awards from the Department of Electrical Engineering of the National Chung-Hsing University from 2001 to 2004.

Wing C. Kwong (S’88-M’92-SM’97) received a B.S. in electrical engineering from the University of California, San Diego, in 1987, and a Ph.D. degree in electrical engineering from Princeton University, Princeton, New Jersey, in 1992. In 1992, he joined the faculty of Hofstra University, Hempstead, New York, where he is presently a professor in the Department of Engineering. His research interests are centered on optical and wireless communication systems and multiple-access networks, optical interconnection networks, and ultrafast all-optical signal processing techniques. Dr. Kwong is an associate editor of the IEEE Transactions on Communications. He was the recipient of the NEC Graduate Fellowship awarded by the NEC Research Institute in 1991. He received the Young Engineer Award from the IEEE (Long Island section) in 1998. He has published over 120 professional papers and chaired technical sessions in international conferences. He also gave invited seminars in various countries, such as Canada, Korea, and Taiwan. He was a co-author of the first-of-its-kind technical book on optical CDMA, “Prime codes with applications to CDMA optical and wireless networks” (Norwood, MA: Artech House), in 2002 and contributed one chapter to another optical CDMA book, “Optical Code Division Multiple Access: Fundamentals and Applications” (Boca Raton, FL: Taylor & Francis), in 2006.


Joint Optimum Linear Precoding and Power Control Strategies for Downlink MC-CDMA Systems

Nevio Benvenuto, Paola Bisaglia, and Federico Boccardi

Abstract—In this paper we derive two iterative joint precoding and power allocation methods for downlink MC-CDMA systems with multiple transmit antennas at the base station. The first method optimally solves the so-called assigned target SINRs problem, i.e., it determines the precoding coefficients and the minimum sum-power allocation under a constraint on the SINR of each user. The second method instead solves the maximum sum-rate problem, i.e., it maximizes the system throughput under a constraint on the overall transmit power. The main result of this paper is that the proposed joint precoding and power control solutions provide an advantage with respect to suboptimal solutions for a fully loaded system, i.e., when the number of users is equal to the spreading factor times the number of transmit antennas. For lower loads, the performance of the various schemes becomes closer.

Index Terms—Downlink, MC-CDMA, multi-antenna, power control, precoding.

Paper approved by S. Ulukus, the Editor for Wireless Communication Theory of the IEEE Communications Society. Manuscript received July 25, 2005; revised September 7, 2006 and January 30, 2007. This research was supported in part by the Italian Ministry of University and Research (MIUR) and in part by the European project NEWCOM. The work was initiated when P. Bisaglia and F. Boccardi were with the University of Padova.
N. Benvenuto is with the Dipartimento di Ingegneria dell’Informazione, University of Padova, Via Gradenigo 6/B 35131 Padova, Italy (e-mail: [email protected]).
P. Bisaglia is with Dora Spa, STMicroelectronics Group, Via Lavoratori Vittime del Col du Mont 24, 11100 Aosta, Italy (e-mail: [email protected]).
F. Boccardi is with Bell Labs, Alcatel-Lucent, The Quadrant, Stonehill Green, Westlea, Swindon, Wiltshire, SN5 7DJ, UK (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCOMM.2008.050153.
0090-6778/08$25.00 © 2008 IEEE

I. INTRODUCTION

Multicarrier-code division multiple access (MC-CDMA) systems are being considered as potential candidates for next generation (4G) technology. In fact, MC-CDMA inherits from CDMA the multiple access flexibility, and from orthogonal frequency division multiplexing (OFDM) characteristics such as robustness against multi-path propagation [1]. Experimental systems using MC-CDMA have been implemented by NTT DoCoMo [2] and the IST European projects MATRICE [3] and 4MORE [4].

Unfortunately, the performance of MC-CDMA systems is essentially limited by the multiple-access interference (MAI), caused by the loss of orthogonality among users in multipath environments. To overcome this problem, in downlink transmissions with multiple antennas at the base station, precoding techniques have been proposed [5]-[11]: the idea is that channel-dependent transmit processing can improve performance by optimally allocating resources, such as complex gain and loaded bits over the different subcarriers and antennas [12]. The channel state information should be fed back to the transmitter in frequency division duplex (FDD) systems (see for example [13] and references therein) or should be estimated at either the receiver or the transmitter in time division duplex (TDD) systems with symmetric channels.

In [5]-[8] different precoding methods are proposed, based on i) the maximum ratio transmission (MRT) criterion, which is the analog of maximum ratio combining (MRC) at the receiver, ii) the zero-forcing (ZF) criterion and iii) the modified signal-to-interference plus noise ratio (SINR) criterion. However, in these works separate power constraints are imposed for each active user. This leads to a suboptimum total transmit power allocation. Only in [9]-[11] are precoding techniques with a constraint on the overall transmit power considered. In particular, in [9] and [10] the precoding coefficients are found according to a sort of ZF criterion. Besides, the precoding coefficients and power allocation are found by separate criteria. Overall, it is seen that the performance of this scheme is poor, especially at low signal-to-noise ratio (SNR) [14]. In [11] the precoding coefficients and power allocation are jointly determined according to a suboptimum criterion, given by the sum of mean square errors of all users.

In this paper we develop two iterative joint precoding and power allocation methods for downlink MC-CDMA systems with multiple transmit antennas at the base station. The first method optimally solves the so-called assigned target signal-to-interference-plus-noise ratios (SINRs) problem, i.e., the precoding coefficients and the minimum sum-power allocation are found under a constraint on the SINR of each user. The second method instead solves the maximum sum-rate problem, i.e., it maximizes the system throughput under a constraint on the overall transmit power. To achieve these targets we first develop a new analytical formulation for the downlink multi-input single-output (MISO) MC-CDMA received signal, which is reminiscent of the well-known multi-input multi-output Gaussian broadcast channel (MIMO-GBC) model.

In fact, we tackle the assigned target SINRs problem as follows. Firstly, we solve the dual uplink precoding problem using the standard power control strategy proposed in [15], [16], iteratively updating the transmit powers and receiver filter coefficients of the users. If the problem is feasible, this algorithm converges to both a minimum solution for the powers and a minimum mean square error (MMSE) multiuser solution for the receiver coefficients. Secondly, using the duality as in [17] and [18], we determine the downlink optimum precoding coefficients. Lastly, still based on [17], we find the user power allocation.

As regards the maximum sum-rate problem, we use the algorithm proposed in [19] (see also [20]), in the framework of the single carrier MIMO-GBC, suitably modified for MC-CDMA systems. This technique solves an unconstrained maximization problem, starting from a suitable initial solution and iteratively updating the precoding coefficients.

In this paper, we also compare the above optimum solutions with lower complexity suboptimum solutions where the precoding is set to be a suboptimal Moore-Penrose matrix and two power allocation strategies are considered: the first is based on a water-filling approach and the second on an assigned target SINRs approach. We compare the various solutions in terms of sum-power and sum-rate, both with perfect channel knowledge and with inaccurate channel estimate. It is seen that the proposed joint precoding and power control solutions may provide a large performance advantage with respect to the suboptimal solutions [7], [9]–[11] only for a fully loaded system. Furthermore, we note that, although the optimum schemes proposed in this paper are based on iterative solutions while a closed-form solution exists for the suboptimal schemes of [7], [9]–[11], the number of required iterations is usually small.

The paper is organized as follows. In Section II we introduce the system model and in Section III we present two optimal joint precoding and power allocation schemes. In Section IV we consider two simplified precoding schemes which make use of the Moore-Penrose pseudoinverse and different power allocation strategies. As a benchmark for the proposed precoders, in Section V we briefly review the structure proposed in [9], [10] and also present the performance of a scheme based on an MMSE single user post detector and iterative power allocation. In Section VI we discuss the simulation results, also with inaccurate channel estimate. Finally, conclusions are given in Section VII.

The following notation will be used. Vectors and matrices are in bold face.
The symbols $(\cdot)^*$, $(\cdot)^T$, $(\cdot)^H$, $(\cdot)^{-1}$, $\|\cdot\|$ and $E[\cdot]$ designate the complex conjugation, the transposition, the Hermitian, the inverse, the Euclidean vector norm and the expectation, respectively. Moreover,
• $\mathbf{I}$ denotes the identity matrix while $\mathbf{0}$ indicates a vector whose entries are all zero.
• If $\mathbf{T}$ is a matrix, $\mathbf{T}^{\dagger}$, $\mathrm{tr}(\mathbf{T})$, $[\mathbf{T}]_{[:,u]}$ and $[\mathbf{T}]_{u,v}$ denote respectively the pseudo-inverse, the trace, the $u$-th column and the $(u, v)$ element.
• If $\mathbf{x}$ and $\mathbf{y}$ are vectors, $x_j$, $\mathbf{x} \circ \mathbf{y}$, $\mathbf{x} \succeq \mathbf{y}$ and $\mathrm{diag}\{\mathbf{x}\}$ indicate, respectively, the $j$-th element of $\mathbf{x}$, the element-wise vector multiplication, the componentwise inequality between the two vectors and a diagonal matrix whose diagonal is the vector $\mathbf{x}$.
• $(x)^{+}$ denotes the positive part of $x$, i.e.

$$(x)^{+} = \begin{cases} x & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{1}$$

II. SYSTEM MODEL

We consider a downlink MC-CDMA system within a MISO configuration where the base station is equipped with $M$ antennas, while the mobile terminal has only one antenna. Let $d^{(u)}$ denote the transmitted data symbols of user $u$, $N_{SF}$ the spreading factor and $\mathbf{c}^{(u)} = [c_0^{(u)}, \ldots, c_{N_{SF}-1}^{(u)}]^T$ the unit-energy spreading sequence, $c_i^{(u)} \in \{\pm 1/\sqrt{N_{SF}}\}$, selected from a set of orthogonal codes. Each symbol is spread by repeating $d^{(u)}$ $N_{SF}$ times and multiplying by the user-specific spreading sequence. A serial-to-parallel conversion then follows to load the chips into the OFDM modulator. The chip period $T_{chip}$ is related to the symbol period $T$ by $T_{chip} = T/N_{SF}$. In this work, to simplify the notation, the number of OFDM subcarriers $Q$ equals the spreading factor ($Q = N_{SF}$). Precoding is performed before OFDM modulation by multiplying the signal by a complex vector $\tilde{\mathbf{k}}_{pre}^{(u)}$ that will be introduced later. OFDM modulation is implemented by using the inverse discrete Fourier transform (IDFT). Last, a cyclic prefix (CP) of suitable length is inserted in front of every OFDM symbol to avoid intersymbol interference. For each user $u$, with $u = 0, \ldots, U - 1$, we define the following quantities.

1) The $QM \times 1$ spreading vector

$$\mathbf{c}_s^{(u)} = [\mathbf{c}^{(u)T}, \mathbf{c}^{(u)T}, \ldots, \mathbf{c}^{(u)T}]^T \tag{2}$$

assuming the same spreading sequence is used across the $M$ transmit antennas.

2) The $QM \times 1$ precoding coefficient vector, including the spreading operation,

$$\mathbf{k}_{pre}^{(u)} = \tilde{\mathbf{k}}_{pre}^{(u)} \circ \mathbf{c}_s^{(u)} \tag{3}$$

with

$$\tilde{\mathbf{k}}_{pre}^{(u)} = \left[\tilde{k}_{pre}^{(u)}[0], \ldots, \tilde{k}_{pre}^{(u)}[QM-1]\right]^T \tag{4}$$

which assigns a complex gain for each subcarrier signal transmitted over the different antennas.

3) The $QM \times 1$ channel vector, including the despreading operation,

$$\mathbf{h}_{eq}^{(u)} = \left[\mathbf{h}_{eq}^{(0,u)T}, \ldots, \mathbf{h}_{eq}^{(M-1,u)T}\right]^T \tag{5}$$

with

$$\mathbf{h}_{eq}^{(m,u)} = \mathbf{c}^{(u)*} \circ \mathbf{h}^{(m,u)}, \quad m = 0, \ldots, M-1 \tag{6}$$

and $\mathbf{h}^{(m,u)}$ is the $Q \times 1$ channel frequency response between the $m$-th transmit antenna and the $u$-th user.

We emphasize that, differently from [21], in this paper we do not consider joint transmit-receive optimization since, although powerful, this approach would lead to an iterative procedure with an increased computational complexity. Here, at the receiver, the signals received on the different subcarriers are simply combined via despreading. For a discussion about different combining techniques in the presence of a precoder in an MC-CDMA system we refer to [11]. Last, from (3) and (5) we define the $QM \times U$ precoding matrix

$$\mathbf{K}_{pre} = \begin{bmatrix} \mathbf{k}_{pre}^{(0)T} \\ \vdots \\ \mathbf{k}_{pre}^{(U-1)T} \end{bmatrix}^T \tag{7}$$

and the equivalent $U \times QM$ system channel matrix

$$\mathbf{H}_{eq} = \begin{bmatrix} \mathbf{h}_{eq}^{(0)T} \\ \vdots \\ \mathbf{h}_{eq}^{(U-1)T} \end{bmatrix} \tag{8}$$

At the receiver, after the DFT and despreading operation, the equivalent input-output model is given by

$$\mathbf{y} = \mathbf{H}_{eq} \mathbf{K}_{pre} \mathbf{d} + \mathbf{n}_{eq} = \mathbf{H}_{eq} \mathbf{x} + \mathbf{n}_{eq} \tag{9}$$

where $\mathbf{d} = [d_0, \ldots, d_{U-1}]^T$ is the $U \times 1$ user data vector, $\mathbf{x} = \mathbf{K}_{pre} \mathbf{d}$ is the transmitted vector, and $\mathbf{n}_{eq} \sim \mathcal{CN}(\mathbf{0}, \sigma^2_{n_{eq}} \mathbf{I})$ represents the additive white (complex) Gaussian noise (AWGN), after the despreading stage. If $P_{max}$ is the total transmit power, the transmitted vector is constrained such that

$$E\left[\|\mathbf{x}\|^2\right] = \mathrm{tr}\left(E\left[\mathbf{x}\mathbf{x}^H\right]\right) \le P_{max} \tag{10}$$

We also define the average SNR as

$$\rho = \frac{P_{max}}{\sigma^2_{n_{eq}} Q} \tag{11}$$
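The vectors in (2)-(6) and the noise-free part of (9) can be assembled in a few lines. The sketch below is illustrative only: the function names are hypothetical, Walsh-Hadamard sequences are assumed as the set of orthogonal codes (the text does not say which family is used), and the flat unit-gain channel with equal gains $1/M$ is a toy choice made so that the result is easy to check.

```python
import math


def walsh_codes(q: int) -> list[list[float]]:
    """Rows of a Sylvester-Hadamard matrix scaled to unit energy.

    Assumes q is a power of two; entries are +-1/sqrt(q).
    """
    rows = [[1.0]]
    while len(rows) < q:
        rows = [r + r for r in rows] + [r + [-x for x in r] for r in rows]
    scale = 1.0 / math.sqrt(q)
    return [[x * scale for x in r] for r in rows]


def precoding_vector(c, k_tilde, m_antennas):
    """k_pre^(u) as in (2)-(3): k_tilde multiplied element-wise by c repeated M times."""
    c_s = c * m_antennas
    return [k * ci for k, ci in zip(k_tilde, c_s)]


def equivalent_channel(c, h_per_antenna):
    """h_eq^(u) as in (5)-(6): conj(c) times h^(m,u), stacked over the antennas."""
    return [ci.conjugate() * hi for h in h_per_antenna for ci, hi in zip(c, h)]


def received(h_eq_rows, k_pre_cols, d):
    """Noise-free part of (9): y = H_eq K_pre d."""
    length = len(k_pre_cols[0])
    x = [sum(k[i] * du for k, du in zip(k_pre_cols, d)) for i in range(length)]
    return [sum(hi * xi for hi, xi in zip(h, x)) for h in h_eq_rows]


if __name__ == "__main__":
    Q, M, U = 4, 2, 3
    codes = walsh_codes(Q)[:U]
    flat = [[1.0 + 0.0j] * Q for _ in range(M)]  # toy channel: unit gain everywhere
    k_cols = [precoding_vector(c, [1.0 / M] * (Q * M), M) for c in codes]
    h_rows = [equivalent_channel(c, flat) for c in codes]
    d = [1 + 1j, -1 + 0j, 0.5j]
    print(received(h_rows, k_cols, d))
```

With a flat channel the orthogonality of the codes is preserved, so the off-diagonal entries of $\mathbf{H}_{eq}\mathbf{K}_{pre}$ vanish and each user recovers its own symbol. A frequency-selective `h_per_antenna` breaks this property and produces the MAI that the precoder is meant to counter.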

To simplify the notation, in the following we consider $\sigma^2_{n_{eq}} = 1$. Defining $q_u = E[|d_u|^2]$, $u = 0, \dots, U-1$, and $q = [q_0, \dots, q_{U-1}]^T$, from (9) the SINR $\gamma_u$ of user $u$ at the detection point is given by

$$\gamma_u = \frac{q_u \, |[H_{eq} K_{pre}]_{u,u}|^2}{1 + \sum_{j=0, j \ne u}^{U-1} q_j \, |[H_{eq} K_{pre}]_{u,j}|^2}, \quad u = 0, \dots, U-1 \qquad (12)$$

and the vector $\gamma$ of SINRs is given by $\gamma = [\gamma_0, \dots, \gamma_{U-1}]^T$.

III. OPTIMUM JOINT PRECODING MATRIX AND POWER LOADING

A. Minimum sum-power for the assigned target SINRs (SOP)

We consider the following problem

$$\min_{q, K_{pre}} \mathrm{tr}(E[x x^H]) \quad \text{s.t.} \quad \gamma \succeq \bar{\gamma} \qquad (13)$$

where $\bar{\gamma}$ is the target SINR vector. In the framework of beamforming for the MIMO-BC this problem has been solved using the duality between uplink and downlink [17], [20]. Here, the dual uplink channel model of (9) can be written as

$$y_{ul} = G H_{ul} d_{ul} + n_{ul} \qquad (14)$$

where $G = K_{pre}^H$, $H_{ul} = H_{eq}^H$, $n_{ul} \sim \mathcal{CN}(0, \sigma^2_{n_{ul}} I)$ is AWGN with $\sigma^2_{n_{ul}} = 1$, and $d_{ul}$ is the $U \times 1$ information data vector, such that $R_{d_{ul}} = E[d_{ul} d_{ul}^H] = \mathrm{diag}(p_0, \dots, p_{U-1})$. We recall from [17] the following major findings:

i) The set of SINRs achieved with all possible choices of $G$ is the same for dual uplink and downlink.
ii) For all possible choices of $G$, the sum of the powers of the users in the dual uplink is equal to the total transmitted power in the downlink.

In the dual uplink the optimal receive vectors and the power allocation that minimizes the total transmitted power are easy to find. From [22] we know that for any set of transmit powers $p$, the receiver that maximizes the SINR of each user is the linear MMSE filter given by

$$G_{opt}(p) = \left(\sigma^2_{n_{ul}} R_{d_{ul}}^{-1} + H_{ul}^H H_{ul}\right)^{-1} H_{ul}^H = \left(\mathrm{diag}(p_0^{-1}, \dots, p_{U-1}^{-1}) + H_{ul}^H H_{ul}\right)^{-1} H_{ul}^H \qquad (15)$$

where we have made explicit the dependence on $p$. The optimal allocation of the powers for the dual uplink can be obtained using the iterative algorithm proposed in [15], as shown in (16), with $[G_{opt}(p^{(l-1)})]_{[u,:]}$ the $u$-th row of $G_{opt}(p^{(l-1)})$, and $l$ denoting the iteration index. This power control technique has the property that $\lim_{l \to \infty} \mathrm{diag}(p^{(l)}) = p^{\star}$, where $p^{\star}$ is the componentwise minimum power solution for the SINRs constraint problem. If the sum-power diverges, then the problem is not feasible. Once the optimal receiver for the dual uplink is found, the optimal precoder for the problem (13) is given by

$$K_{pre} = G_{opt}^H(p^{\star}) \qquad (17)$$

and the optimal power allocation $q^{\star}$ for the downlink can be found by solving the following system of equations

$$[I - \mathrm{diag}(a) H_{eq} K_{pre}] \, q = a \qquad (18)$$

where

$$a_u = \frac{\bar{\gamma}_u}{(\bar{\gamma}_u + 1) [H_{eq} K_{pre}]_{u,u}}, \quad u = 0, \dots, U-1 \qquad (19)$$
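As an illustration of the downlink power-loading step, the sketch below finds the powers that meet a set of target SINRs for a fixed precoder, working directly from the SINR expression (12) with unit noise variance. It is not the authors' implementation: it uses a fixed-point update in the style of the power control framework of [16] rather than a direct solve of (18), and the function names and the input `gains` (standing for the squared magnitudes $|[H_{eq} K_{pre}]_{u,j}|^2$) are assumptions made for this example.

```python
def min_power_for_targets(gains, targets, noise=1.0, max_iter=500, tol=1e-12):
    """Fixed-point search for the powers q that meet the target SINRs in (12).

    gains[u][j] is assumed to hold |[H_eq K_pre]_{u,j}|^2 (hypothetical name).
    Returns the power vector, or None when the iteration does not settle
    within max_iter steps (the targets are then treated as infeasible).
    """
    n_users = len(targets)
    q = [0.0] * n_users
    for _ in range(max_iter):
        updated = []
        for u in range(n_users):
            # Interference seen by user u from all other users.
            interference = sum(q[j] * gains[u][j] for j in range(n_users) if j != u)
            # Power that would give user u exactly its target SINR.
            updated.append(targets[u] * (noise + interference) / gains[u][u])
        if max(abs(a - b) for a, b in zip(updated, q)) <= tol:
            return updated
        q = updated
    return None


def achieved_sinrs(gains, q, noise=1.0):
    """Evaluate (12) for every user."""
    n_users = len(q)
    result = []
    for u in range(n_users):
        interference = sum(q[j] * gains[u][j] for j in range(n_users) if j != u)
        result.append(q[u] * gains[u][u] / (noise + interference))
    return result


# Two users with weak cross-talk: the targets are feasible.
example_gains = [[1.0, 0.1], [0.2, 1.0]]
example_targets = [2.0, 1.0]
example_q = min_power_for_targets(example_gains, example_targets)
```

When the cross-talk is too strong for the requested targets, the powers grow without bound and the function returns `None`, which mirrors the remark in the text that a diverging sum-power signals an infeasible problem.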

This structure will be denoted as SINR constrained Optimum Precoder (SOP). We emphasize that the SOP solves the assigned target SINRs problem with the minimum sum power with respect to all possible linear precoders.

B. Maximum sum-rate under a total power constraint (POP)

Let $R = \sum_{u=0}^{U-1} \log(1 + \gamma_u)$. The max sum-rate problem can be written as

$$\max_{q, K_{pre}} R \quad \text{s.t.} \quad \mathrm{tr}(E[x x^H]) \le P_{max} \qquad (20)$$

This is a non-convex problem, and the optimum solution can be searched for, for example, by heuristic approaches based on evolutionary techniques [23]. In [19], a faster heuristic technique is proposed for the MIMO-BC when the channel matrix is square. In fact, it can be verified that the proof of Lemma 1 in [19] also holds when the channel matrix is not square; hence we are able to use it when $QM > U$. The following equivalent input-output unconstrained model is used

$$y = H_{eq} \frac{K_{pre}}{\beta} d + n_{eq} \qquad (21)$$


$$p_u^{(l)} = \bar{\gamma}_u \, \frac{\sum_{v=0, v \ne u}^{U-1} p_v^{(l-1)} \left|[G_{opt}(p^{(l-1)}) H_{ul}]_{u,v}\right|^2 + \left\|[G_{opt}(p^{(l-1)})]_{[u,:]}\right\|^2}{\left|[G_{opt}(p^{(l-1)}) H_{ul}]_{u,u}\right|^2} \qquad (16)$$

In (21), $d = [d_0, \dots, d_{U-1}]^T$ is the normalized data vector such that $E[|d_u|^2] = 1$, $u = 0, \dots, U-1$, $n_{eq} \sim \mathcal{CN}(0, \sigma^2_{n_{eq}} I)$, where $\sigma^2_{n_{eq}} = 1/\rho$, with $\rho$ defined in (11), and

$$\beta = \sqrt{\mathrm{tr}(K_{pre}^H K_{pre})} \qquad (22)$$

is a normalization factor. Hence, (20) can be rewritten as

$$\max_{K_{pre}} R = \max_{K_{pre}} \sum_{u=0}^{U-1} \log(1 + \gamma_u) = \max_{K_{pre}} \sum_{u=0}^{U-1} \log\left(1 + \frac{|[H_{eq} K_{pre}]_{u,u}|^2}{\alpha_u}\right) \qquad (23)$$

where $\alpha_u = \beta^2 \sigma^2_{n_{eq}} + \sum_{j=0, j \ne u}^{U-1} |[H_{eq} K_{pre}]_{u,j}|^2$. We emphasize that (23) corresponds to (20), rewritten as an unconstrained problem, and that the power loading is implicitly realized by the precoding matrix. It is possible to show that a stationary point of (23) has the form [19], [24]

$$K_{pre} = \left(\sigma^2_{n_{eq}} \mathrm{tr}(D) I + H_{eq}^H D H_{eq}\right)^{-1} H_{eq}^H \Delta \qquad (24)$$

where

$$D = \mathrm{diag}\left(\frac{|[H_{eq} K_{pre}]_{u,u}|^2}{\alpha_u \left(\alpha_u + |[H_{eq} K_{pre}]_{u,u}|^2\right)}\right) \qquad (25)$$

and

$$\Delta = \mathrm{diag}\left(\frac{[H_{eq} K_{pre}]_{u,u}}{\alpha_u}\right) \qquad (26)$$

Equation (24) cannot be solved explicitly, and the following iterative algorithm is proposed [19]:

0) $D^{(0)} \leftarrow I$, $\Delta^{(0)} \leftarrow I$, $l \leftarrow 1$
1) $K_{pre}^{(l)} \leftarrow \left(\sigma^2_{n_{eq}} \mathrm{tr}(D^{(l-1)}) I + H_{eq}^H D^{(l-1)} H_{eq}\right)^{-1} H_{eq}^H \Delta^{(l-1)}$
2) Compute $D^{(l)}$ and $\Delta^{(l)}$ by (25) and (26)
3) $R^{(l)} \leftarrow \sum_{u=0}^{U-1} \log(1 + \gamma_u^{(l)})$
4) if $|R^{(l)} - R^{(l-1)}| / R^{(l-1)} < \varepsilon$ then STOP, else GOTO 5)
5) $l \leftarrow l + 1$, GOTO 1)

In 4), $\varepsilon$ is a suitably small real number. Moreover, here $K_{pre}^{(1)} = \left(\sigma^2_{n_{eq}} I + H_{eq}^H H_{eq}\right)^{-1} H_{eq}^H$ corresponds to the regularized pseudo-inverse of the channel matrix [14]. This choice of $K_{pre}^{(1)}$ guarantees that the iterative technique converges within a small number of iterations. We call this structure Power constrained Optimum Precoder (POP).

IV. THE MOORE-PENROSE PRECODER WITH OPTIMUM POWER ALLOCATION

In this section we consider suboptimum precoding strategies. Firstly, the Moore-Penrose precoding is used with

$$K_{pre} = H_{eq}^{\dagger} = H_{eq}^H \left(H_{eq} H_{eq}^H\right)^{-1} \qquad (27)$$

Hence, the received signal vector (9) becomes

$$y = d + n_{eq} \qquad (28)$$

Still, the transmitted vector must satisfy the power constraint in (10). We observe that using a Moore-Penrose beamformer, the MISO multi-user channel is decoupled into $U$ SISO single-user parallel AWGN channels, with a common power constraint [25]. From (28) the SINR of user $u$ is given by

$$\gamma_u = \frac{q_u}{\sigma^2_{n_{eq}}}, \quad u = 0, \dots, U-1 \qquad (29)$$

Next, based on the SNR (29), in Section IV-A we consider the minimum power solution under a SINR constraint and in Section IV-B the maximum sum-rate solution under a power constraint.

A. Minimum sum-power for the assigned target SINRs (SMPP)

By using $K_{pre}$ given in (27), problem (13) becomes

$$\min_{q} \mathrm{tr}(E[x x^H]) \quad \text{s.t.} \quad \gamma_u \ge \bar{\gamma}_u, \quad u = 0, \dots, U-1 \qquad (30)$$

From (29), the minimum sum power solution is trivial and is given by

$$q_u = \sigma^2_{n_{eq}} \bar{\gamma}_u, \quad u = 0, \dots, U-1 \qquad (31)$$

In (31) the power is allocated in order to guarantee the same SINR to each user. We note that if the channel is full-rank and $QM \ge U$, it is always possible to find the power allocation which guarantees to each user the required SINR. We call this structure SINRs constrained Moore-Penrose Precoder (SMPP).

B. Maximum sum-rate under a total power constraint (PMPP)

By using $K_{pre}$ given in (27), problem (20) becomes

$$\max_{q} \sum_{u=0}^{U-1} \log(1 + \gamma_u) \quad \text{s.t.} \quad \mathrm{tr}(E[x x^H]) \le P_{max} \qquad (32)$$
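The decoupling in (28) and the SMPP allocation (31) can be checked on a toy channel. The sketch below is only an illustration under simplifying assumptions that are not in the paper: a real-valued channel with $U = 2$ users, so that the $2 \times 2$ inverse in (27) can be written in closed form. All function names are hypothetical. The transmit power is evaluated as $\mathrm{tr}(K_{pre}\,\mathrm{diag}(q)\,K_{pre}^H) = \sum_u [K_{pre}^H K_{pre}]_{u,u}\, q_u$.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def moore_penrose_precoder(h):
    """K_pre = H^T (H H^T)^{-1} for a real 2 x n full-rank channel, cf. (27)."""
    ht = transpose(h)
    return matmul(ht, inverse_2x2(matmul(h, ht)))


def smpp_sum_power(k_pre, targets, noise_var=1.0):
    """Total transmit power when q_u = noise_var * target_u, as in (31)."""
    n_users = len(targets)
    # Diagonal of K_pre^T K_pre: squared norm of each precoder column.
    theta_diag = [sum(row[u] ** 2 for row in k_pre) for u in range(n_users)]
    q = [noise_var * t for t in targets]
    return sum(th * qu for th, qu in zip(theta_diag, q))


# Toy channel: 2 users, 3 transmit dimensions (illustrative values only).
h_eq = [[1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]
k_pre = moore_penrose_precoder(h_eq)
effective_channel = matmul(h_eq, k_pre)      # close to the 2 x 2 identity
total_power = smpp_sum_power(k_pre, [3.0, 1.5])
```

The product `effective_channel` being the identity is what removes the multi-user interference, so each user's SINR depends only on its own power, as stated in (29).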

Let $\Theta = K_{pre}^H K_{pre}$. From (29), problem (32) can be rewritten as

$$\max_{q} \sum_{u=0}^{U-1} \log\left(1 + \frac{q_u}{\sigma^2_{n_{eq}}}\right) \quad \text{s.t.} \quad \sum_{u=0}^{U-1} [\Theta]_{u,u} q_u \le P_{max} \qquad (33)$$

which, for $q'_u = [\Theta]_{u,u} q_u$, becomes

$$\max_{q'} \sum_{u=0}^{U-1} \log\left(1 + \frac{q'_u}{[\Theta]_{u,u} \sigma^2_{n_{eq}}}\right) \quad \text{s.t.} \quad \sum_{u=0}^{U-1} q'_u \le P_{max} \qquad (34)$$

Problem (34) is a convex optimization problem whose solution can be found by the water-filling method [25], [26], and is given by

$$q'_u = \left(\nu - [\Theta]_{u,u} \sigma^2_{n_{eq}}\right)^+, \quad u = 0, 1, \dots, U-1 \qquad (35)$$

where $\nu$ is such that

$$\sum_{u=0}^{U-1} q'_u = P_{max} \qquad (36)$$

We note that with a water-filling power allocation the number of active users, for which $q_u > 0$, increases with the SNR. We call this structure Power constrained Moore-Penrose Precoder (PMPP).

V. BENCHMARKS

A. A suboptimum MC-CDMA precoder [9], [10] (SISMPP)

In this section we review, as a benchmark for our proposed precoders, the suboptimum precoder of [9], [10], where again a Moore-Penrose beamformer is used and the power is allocated such that the sum of the inverses of the SINRs is minimized. We emphasize that this criterion is suboptimum both in the beamformer choice and in the power allocation strategy. The problem can be formulated as follows:

$$\min_{q} \sum_{u=0}^{U-1} \gamma_u^{-1} \quad \text{s.t.} \quad q \succeq 0, \quad \sum_{u=0}^{U-1} [\Theta]_{u,u} q_u \le P_{max} \qquad (37)$$

Assuming that all users have the same noise variance, (37) can be written as

$$\min_{q} \sum_{u=0}^{U-1} \frac{1}{q_u} \quad \text{s.t.} \quad q \succeq 0, \quad \sum_{u=0}^{U-1} [\Theta]_{u,u} q_u \le P_{max} \qquad (38)$$

The solution to (38) is found by looking for the minimum of the augmented cost function

$$J = \sum_{u=0}^{U-1} \frac{1}{q_u} + \lambda \left(\sum_{u=0}^{U-1} [\Theta]_{u,u} q_u - P_{max}\right) - \sum_{u=0}^{U-1} \mu_u q_u \qquad (39)$$

where $\lambda$ and $\mu_u$, $u = 0, \dots, U-1$, are the Lagrange multipliers. From the KKT conditions it is $\mu_u = 0$, $u = 0, \dots, U-1$. Setting the first derivative of $J$ with respect to $q_u$ to zero gives

$$q_u = \frac{1}{\sqrt{\lambda [\Theta]_{u,u}}} \qquad (40)$$

Moreover, since $\lambda \ge 0$, from (39) we set

$$\sum_{v=0}^{U-1} [\Theta]_{v,v} q_v = P_{max} \qquad (41)$$

Now, substituting (40) into (41) yields

$$\lambda = \frac{\left(\sum_{v=0}^{U-1} \sqrt{[\Theta]_{v,v}}\right)^2}{P_{max}^2} \qquad (42)$$

Substituting (42) in (40) we finally obtain

$$q_u = \frac{P_{max}}{\sqrt{[\Theta]_{u,u}} \sum_{v=0}^{U-1} \sqrt{[\Theta]_{v,v}}} \qquad (43)$$
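The two Moore-Penrose power allocations can be compared numerically. The sketch below is a minimal illustration, not the authors' code: `water_filling` finds the water level $\nu$ of (35)-(36) by bisection (a numerical choice made for this example) and returns the per-user powers $q_u = q'_u / [\Theta]_{u,u}$, while `sismpp` evaluates the closed form (43). The function names and the input `theta_diag` (the diagonal entries $[\Theta]_{u,u}$, assumed strictly positive) are hypothetical.

```python
import math


def water_filling(theta_diag, p_max, noise_var=1.0, iters=200):
    """PMPP powers: q'_u = (nu - theta_u * noise_var)^+ with sum(q') = p_max."""
    floors = [t * noise_var for t in theta_diag]
    # The water level lies between the lowest floor and the highest floor plus p_max.
    lo, hi = min(floors), max(floors) + p_max
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        used = sum(max(nu - f, 0.0) for f in floors)
        if used > p_max:
            hi = nu
        else:
            lo = nu
    nu = 0.5 * (lo + hi)
    # Convert the scaled powers q'_u back to q_u = q'_u / theta_u.
    return [max(nu - f, 0.0) / t for f, t in zip(floors, theta_diag)]


def sismpp(theta_diag, p_max):
    """Closed-form allocation (43) that minimizes the sum of inverse SINRs."""
    root_sum = sum(math.sqrt(t) for t in theta_diag)
    return [p_max / (math.sqrt(t) * root_sum) for t in theta_diag]


theta_example = [1.0, 4.0]                            # illustrative values only
low_power = water_filling(theta_example, p_max=1.0)   # second user is switched off
high_power = water_filling(theta_example, p_max=10.0) # both users are active
inverse_sinr = sismpp(theta_example, p_max=6.0)
```

In this toy case the water-filling allocation serves only the first user at low power and both users at high power, consistent with the remark that the number of active users grows with the SNR, whereas the allocation (43) always gives every user a non-zero power.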

This scheme is denoted as Sum of Inverse SINR Moore-Penrose Precoder (SISMPP).

B. The single-user MMSE post-detector with optimum power allocation (SUPD)

For comparison purposes, it is interesting to study the gap between pre- and post-detection techniques for MC-CDMA systems. We recall that at the receiver only single-user detection techniques can be performed, whereas at the transmitter joint encoding techniques are possible. In this section we consider the single-user minimum mean square error (MMSE) post-detector, whose coefficients are found iteratively in conjunction with the power control [15], [16], originally proposed by Yates and Ulukus in the framework of multi-user detection in uplink CDMA systems. In order to perform combining at the receiver side, we consider a modified system model. Let $H^{(m,u)} = \mathrm{diag}(h^{(m,u)})$ and $H^{(u)} = [H^{(0,u)}, \dots, H^{(M-1,u)}]$, a $Q \times QM$ matrix. Setting $k_{pre}^{(u)} = c_s^{(u)}$, $u = 0, \dots, U-1$, for the generic user $u$ the $Q \times 1$ received vector after the DFT and before combining and despreading is given by

$$r^{(u)} = H^{(u)} K_{pre} d + n \qquad (44)$$

where, from (9), $n \sim \mathcal{CN}\left(0, \frac{\sigma^2_{n_{eq}}}{N_{SF}} I\right)$. We now consider a $1 \times Q$ combining vector $g^{(u)}$, $u = 0, \dots, U-1$, that also includes the despreading operation, to yield the signal $y^{(u)} = g^{(u)} r^{(u)}$. The MMSE receiver is found by

$$g^{(u)}_{MMSE} = \arg\min_{g^{(u)}} E\left[|y^{(u)} - d^{(u)}|^2\right] \qquad (45)$$

$$\phantom{g^{(u)}_{MMSE}} = \arg\min_{g^{(u)}} E\left[|g^{(u)} r^{(u)} - d^{(u)}|^2\right] \qquad (46)$$

whose solution yields

$$g^{(u)H}_{MMSE} = R^{(u)\,-1} s^{(u)} \qquad (47)$$

where

$$R^{(u)} = E\left[r^{(u)} r^{(u)H}\right] = \frac{\sigma^2_{n_{eq}}}{N_{SF}} I + H^{(u)} K_{pre} \, \mathrm{diag}(q) \, K_{pre}^H H^{(u)H} \qquad (48)$$

and

$$s^{(u)} = E\left[r^{(u)} d^{(u)*}\right] = H^{(u)} k_{pre}^{(u)} q_u \qquad (49)$$

We now apply the power control of [15], [16] to find the power allocation that fulfils the SINR constraints. Let

$$\zeta_u\left(g^{(u)}(q), q\right) = \frac{\left|g^{(u)} H^{(u)} [K_{pre}]_{[:,u]}\right|^2}{\frac{\sigma^2_{n_{eq}}}{N_{SF}} \|g^{(u)}\|^2 + \sum_{i=0, i \ne u}^{U-1} q_i \left|g^{(u)} H^{(u)} [K_{pre}]_{[:,i]}\right|^2}, \quad u = 0, \dots, U-1 \qquad (50)$$

the SINR for user $u$ is given by

$$\gamma_u = q_u \, \zeta_u\left(g^{(u)}(q), q\right) \qquad (51)$$

We define the interference function $I$ as

$$I(q) = \left[\frac{\bar{\gamma}_0}{\zeta_0\left(g^{(0)}_{MMSE}(q), q\right)}, \dots, \frac{\bar{\gamma}_{U-1}}{\zeta_{U-1}\left(g^{(U-1)}_{MMSE}(q), q\right)}\right]^T \qquad (52)$$

Since $I(\cdot)$ satisfies the following properties [15], [16]

• $I(q) \succ 0$.
• If $q_1 \succeq q_2$ then $\zeta_u\left(g^{(u)}_{MMSE}(q_2), q_2\right) \ge \zeta_u\left(g^{(u)}_{MMSE}(q_1), q_2\right) \ge \zeta_u\left(g^{(u)}_{MMSE}(q_1), q_1\right)$, $u = 0, \dots, U-1$. Hence $I(q_1) \succeq I(q_2)$.
• If $\alpha > 1$ then $\zeta_u\left(g^{(u)}_{MMSE}(\alpha q), \alpha q\right) \ge \zeta_u\left(g^{(u)}_{MMSE}(q), \alpha q\right) \ge \zeta_u\left(g^{(u)}_{MMSE}(q), q\right) / \alpha$, for $u = 0, \dots, U-1$. Hence $\alpha I(q) \succ I(\alpha q)$,

it is seen that the iterative equation $q^{(l)} = I\left(q^{(l-1)}\right)$ converges to the optimal solution.

VI. SIMULATION RESULTS

A. Performance in presence of perfect channel estimate

In order to evaluate the performance of the proposed schemes, we simulated an MC-CDMA system having 64 subcarriers and a cyclic prefix equal to 16. The spreading factor was chosen equal to the number of subcarriers. The channel was modeled as Rayleigh frequency-selective fading with an exponential power delay profile, having a normalized r.m.s. delay spread of one sample period. Channels across the different antennas were assumed i.i.d. Moreover, we assumed the channel to be stationary over at least one OFDM symbol, although it may change from symbol to symbol. In the case of maximum sum-rate precoders, the results are given in terms of sum-rate (in bit/s/Hz) versus the normalized SNR $\rho$, while in the case of minimum sum-power precoders, the results are given in terms of sum-power versus the required SINR.

In Figure 1(a) a comparison of POP, PMPP, SISMPP and the precoders proposed in [7] and [11], in terms of average sum-rate versus $\rho$, is shown for $Q = 64$, $U = 64$ and two values of $M$. We observe that for $M = 1$ (fully loaded system) the POP outperforms the other structures. However, at very high SNRs the performance of the various precoders gets closer. Moreover, for $M = 2$ (half loaded system) the gap between the considered schemes is reduced. We note that the Moore-Penrose precoder depends on the solution of a system of $U$ equations and $MQ$ unknowns, and for $M = 2$ the number of degrees of freedom is much greater than in the case with $M = 1$.

Fig. 1. Comparison between different precoders in terms of sum-rate versus $\rho$, $Q = 64$. 1(a): $U = 64$, $M = 1$ (fully loaded system) and $M = 2$ (half loaded system). 1(b): $U = 128$, $M = 1$ (overloaded system) and $M = 2$ (fully loaded system). Axes: sum-rate [bit/s/Hz] versus $\rho$ [dB]. Curves: POP, [7], [11], PMPP, SISMPP and, in 1(b), MP with equal power allocation.

In order to consider a hybrid space division multiple access (SDMA)/CDMA approach, we have also simulated an overloaded MC-CDMA system, with $U > QM$, where a random reallocation of $U - Q$ spreading sequences between the $U$ users is used. In Figure 1(b) the comparison is for a system with $Q = 64$ and $U = 128$. For $M = 1$ (overloaded system) we observe that the POP, by exploiting the multiuser diversity, selects the best user subset to maximize the sum-rate.

As regards the Moore-Penrose precoder, for $U > QM$ the solution requires solving a system with more equations than unknowns; therefore the zero forcing (ZF) solution does not exist. For this reason we considered an equal power allocation, which experiences a floor due to interference at 2 bit/s/Hz. However, for $M = 2$ (fully loaded system) it is $U = QM$ and the ZF solution exists. We observe that at low SNRs, the POP slightly outperforms the other schemes.

In Figure 2 we study the convergence speed of the POP algorithm, for a given channel realization with $\rho = 20$ dB and two system configurations: i) $Q = 64$, $U = 64$ and $M = 1$, and ii) $Q = 64$, $U = 128$, and $M = 2$. We note that for the first configuration the algorithm achieves 90% of the final sum-rate value after 5 iterations, while for the second configuration the convergence is slightly slower, due to the increased dimensions of the precoding matrix.

In Figure 3(a) a comparison in terms of sum-power versus required SINR is shown, for $Q = 64$, $U = 64$, $M = 1$ and $M = 2$. We observe that for $M = 1$ the SOP outperforms both SUPD and SMPP, which in turn have a similar performance. However, at very high SINRs, all precoders perform similarly. For $M = 2$ both the SOP and SMPP perform better than the SUPD, which shows the same performance as in the case $M = 1$. Again, at low SINRs the SOP performs slightly better than the SMPP. Figure 3(b) reports the same comparison, but with $Q = 64$ and $U = 32$. We observe that the increased number of degrees of freedom brings the performance of the different precoders closer.

Fig. 2. Sum-rate vs iteration number, as determined by the POP algorithm, for a single channel realization with $\rho = 20$ dB and two system configurations: i) $U = Q = 64$ and $M = 1$, and ii) $Q = 64$, $U = 128$, and $M = 2$. Axes: sum rate [bit/s/Hz] versus iteration number.

Fig. 3. Comparison between three precoders in terms of minimum power versus required SINR, for $Q = 64$. 3(a): $U = 64$, $M = 1$ (fully loaded system) and $M = 2$ (half loaded system). 3(b): $U = 32$, $M = 1$ (half loaded system) and $M = 2$ (lightly loaded system). Axes: Sum Power [dB] versus required SINR [dB]. Curves: SMPP, SOP, SUPD.

B. Performance in presence of inaccurate channel estimate

Here we consider the effect of imperfect channel state information on the various proposed schemes. For the precoding schemes this effect can be due to inaccurate channel estimates or to the delay between the channel estimation at the receive side and the reception of the feedback at the transmit side. For the MMSE post-detection scheme this effect may be due to inaccurate channel estimates.

If $h^{(m,u)}$ is the time-domain channel impulse response (CIR) between the $m$-th transmit antenna and the $u$-th user, we denote by $\hat{h}^{(m,u)}$ the estimated CIR. We define the estimation error vector $\Delta h = \hat{h}^{(m,u)} - h^{(m,u)}$ and the normalized ratio $\Lambda_n = \frac{U}{\rho E[\|\Delta h\|^2]}$ [27]. Hence we model the channel estimate $\hat{h}^{(m,u)}$ as a Gaussian random vector with mean $h^{(m,u)}$ and variance given by $\sigma^2_{\Delta h} = E[\|\Delta h\|^2] = \frac{U}{\rho \Lambda_n}$. In Figure 4, a comparison of the various precoders in terms of sum-rate versus $\rho$ is shown, for $Q = 64$, $U = 64$, $M = 1$, $M = 2$, and $\Lambda_n = 5$ and $\Lambda_n = 15$ dB. We observe that for $\Lambda_n = 15$ dB, the POP loses only roughly 1.5 bit/s/Hz for all values of $\rho$, for both $M = 1$ and $M = 2$, with respect to the ideal case with $\Lambda_n = \infty$ (shown in Figure 1(a)). Let $\bar{\gamma}_u$ be the required SINR and $\gamma_u$ the actual SINR (12),

in correspondence to $K_{pre}$ evaluated by using the estimated channel. The average error in the actual SINR is evaluated by the quantity $\mu = E\left[10 \log_{10} \frac{\gamma_u}{\bar{\gamma}_u}\right]$. In Figure 5, a comparison of the various precoding schemes in terms of $\mu$ versus $\bar{\gamma}_u$ is shown, for $Q = 64$, $U = 64$, $M = 1$ and for two values of $\Lambda_n$. We note that post-detection schemes are less sensitive to channel estimation errors.

Fig. 4. Comparison between the different structures in terms of sum-rate versus $\rho$ in the presence of inaccurate channel estimate. $Q = 64$, $U = 64$ and $\Lambda_n = 5, 15$ dB. 4(a): $M = 1$; 4(b): $M = 2$. Axes: sum-rate [bit/s/Hz] versus $\rho$ [dB]. Curves: POP, [7], [11], PMPP, SISMPP.

Fig. 5. Comparison between three precoders in terms of relative MSE $\mu$ versus required SINR. $Q = 64$, $U = 64$, $M = 1$ and $\Lambda_n = 5, 15$ dB. Curves: SMPP, SOP, SUPD.

VII. CONCLUSIONS

In this paper we have presented a class of joint linear precoding and power control techniques for downlink MC-CDMA systems. We have proposed two iterative structures: the first optimally (in the sense of the minimum power among all the linear precoders) solves the assigned target SINRs problem, the second optimally (in the sense of the maximum sum rate among all the linear precoders) solves the max sum-rate problem. It is seen that the proposed joint precoding and power control solutions may provide a significant advantage with respect to [7]–[11] for a fully loaded system, i.e. when the number of users is equal to the spreading factor times the number of transmit antennas. However, for lower loads, the performance of the various schemes becomes closer. Moreover, while suboptimal schemes yield closed-form solutions, the proposed optimum schemes require iterative methods, whose number of iterations is usually small.

REFERENCES

[1] K. Fazel and S. Kaiser, Multi-Carrier and Spread Spectrum Systems. Chichester: Wiley, 2003.
[2] Y. Kishiyama, N. Maeda, K. Higuchi, H. Atarashi, and M. Sawahashi, “Experiments of throughput performance above 100-Mbps in forward link for VDF-OFCDM broadband packet wireless access,” in Proc. IEEE VTC’F03, Oct. 2003.
[3] IST-MATRICE project, http://www.ist-matrice.org.
[4] IST-4MORE project, http://www.ist-4more.org.
[5] A. Silva and A. Gameiro, “Pre-filtering antenna array for downlink TDD MC-CDMA systems,” in Proc. Vehicular Techn. Conf., VTC 2003-Spring, April 2003.
[6] T. Sälzer, A. Silva, A. Gameiro, and D. Mottier, “Pre-filtering using antenna arrays for multiple access interference mitigation in multicarrier CDMA downlink,” in Proc. IST Mobile Comm. Summit, June 2003.
[7] T. Sälzer and D. Mottier, “Downlink strategies using antenna arrays for interference mitigation in multi-carrier CDMA,” in Proc. MCSS 2003, Sept. 2003.
[8] A. Silva and A. Gameiro, “Pre-filtering techniques using antenna arrays for downlink TDD MC-CDMA systems,” in Proc. MCSS 2003, Sept. 2003.
[9] P. Bisaglia, N. Benvenuto, S. Pupolin, L. Sanguinetti, and M. Morelli, “Pre-equalization techniques for downlink and uplink TDD MC-CDMA systems,” in Proc. WPMC’04, Sept. 2004.
[10] M. Morelli and L. Sanguinetti, “A novel pre-filtering technique for downlink transmissions in TDD MC-CDMA systems,” IEEE Trans. Wireless Commun., vol. 4, pp. 2064–2069, Sept. 2005.
[11] L. Sanguinetti, M. Morelli, and I. Cosovic, “MMSE pre-filtering techniques for TDD MC-CDMA downlink transmissions,” in Proc. Vehicular Techn. Conf., VTC 2005-Spring, May 2005.
[12] H. Sampath, P. Stoica, and A. Paulraj, “Generalized linear precoder and decoder design for MIMO channels using the weighted MMSE criterion,” IEEE Trans. Commun., vol. 49, pp. 2198–2206, Dec. 2001.

[13] T. Yoo, N. Jindal, and A. Goldsmith, “Multi-antenna broadcast channels with limited feedback and user selection,” submitted to IEEE J. Select. Areas Commun., May 2006.
[14] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multi-antenna multi-user communication–part I,” IEEE Trans. Commun., vol. 53, pp. 195–202, Jan. 2005.
[15] S. Ulukus and R. D. Yates, “Adaptive power control and MMSE interference suppression,” Wireless Networks, pp. 489–496, Nov. 1998.
[16] R. D. Yates, “A framework for uplink power control in cellular radio systems,” IEEE J. Select. Areas Commun., pp. 1341–1347, Sept. 1995.
[17] P. Viswanath and D. Tse, “Sum capacity of the vector Gaussian channel and uplink-downlink duality,” IEEE Trans. Inform. Theory, vol. 49, pp. 1912–1921, Aug. 2003.
[18] F. Rashid-Farrokhi, K. J. R. Liu, and L. Tassiulas, “Transmit beamforming and power control in wireless networks with fading channels,” IEEE J. Select. Areas Commun., vol. 16, pp. 1437–1450, Oct. 1998.
[19] M. Stojnic, H. Vikalo, and B. H. Hassibi, “Rate maximization in multiantenna broadcast channels with linear preprocessing,” in Proc. IEEE Globecom, pp. 3957–3961, Dec. 2004.
[20] F. Boccardi, F. Tosato, and G. Caire, “Precoding schemes for the MIMO-GBC,” in Proc. IEEE International Zurich Seminar on Communications, Feb. 2006.
[21] J. H. Chang, L. Tassiulas, and F. Rashid-Farrokhi, “Joint transmitter receiver diversity for efficient space division multiaccess,” IEEE Trans. Wireless Commun., vol. 1, pp. 16–27, Jan. 2002.
[22] U. Madhow and M. L. Honig, “MMSE interference suppression for direct-sequence spread-spectrum CDMA,” IEEE Trans. Commun., vol. 42, pp. 3178–3188, Dec. 1994.
[23] R. Storn and K. Price, “Differential evolution - a simple and efficient adaptive scheme for global optimization over continuous spaces,” Berkeley, CA, Tech. Rep. TR-95-012, 1995. [Online]. Available: http://www.icsi.berkeley.edu/ftp/pub/techreports/1995/tr-95-012.pdf
[24] G. Caire, S. Shamai (Shitz), Y. Steinberg, and H. Weingarten, “MIMO broadcast channel: theoretical and system aspects,” Foundations and Trends in Communications and Information Theory, (in preparation).
[25] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[26] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.
[27] N. Benvenuto and G. Cherubini, Algorithms for Communications Systems and Their Applications. Chichester, UK: Wiley, 2002.

Nevio Benvenuto received the Laurea degree from the University of Padova, Padova, Italy, and the Ph.D. degree from the University of Massachusetts, Amherst, in 1976 and 1983, respectively, both in electrical engineering. From 1983 to 1985 he was with AT&T Bell Laboratories, Holmdel, NJ, working on signal analysis problems. He spent the next three years alternating between the University of Padova, where he worked on communication systems research, and Bell Laboratories, as a Visiting Professor. From 1987 to 1990, he was a member of the faculty at the University of Ancona. He was a member of the faculty at the University of L’Aquila from 1994 to 1995. Currently, he is a Professor in the Electrical Engineering Department, University of Padova. His research interests include voice and data communications, digital radio, and signal processing.

Paola Bisaglia was born in Padova, Italy, on August 8, 1971. She received the Laurea (cum laude) and Ph.D. degrees in electronic engineering from the University of Padova, Padova, Italy in 1996 and 2000 respectively. In 2000 she joined Hewlett-Packard Research Laboratories in Bristol, England, where she worked on home phone-line networking and wireless LANs. From 2002 to 2005 she was a research fellow at the Department of Information Engineering of the University of Padova, Italy, investigating pre- and post-detection strategies for next generation (4G) broadband cellular systems. She has been with Dora S.p.A., an STMicroelectronics Group since May 2005, where she is involved in research activities and in the design of integrated circuits for narrow-band and wide-band power-line communications dedicated to indoor and outdoor applications.

Federico Boccardi received the Laurea degree in Telecommunication Engineering from the University of Padova, Italy, in 2002 and the Ph.D. in Electronic and Telecommunication Engineering from the same university in 2006. From October 2004 to April 2005 he was a visiting student in Eurecom, France. From November 2005 to May 2006, he was a visiting student in Bell Labs, Crawford Hill, NJ. Since December 2006 he has been with Bell Labs, Alcatel-Lucent in Swindon, UK. His research activity is focused on multiple antenna multiuser schemes and relaying techniques.


Minimum Mean-Squared Error Iterative Successive Parallel Arbitrated Decision Feedback Detectors for DS-CDMA Systems

Rodrigo C. de Lamare and Raimundo Sampaio-Neto

Abstract—In this paper we propose minimum mean squared error (MMSE) iterative successive parallel arbitrated decision feedback (DF) receivers for direct sequence code division multiple access (DS-CDMA) systems. We describe the MMSE design criterion for DF multiuser detectors along with successive, parallel and iterative interference cancellation structures. A novel efficient DF structure that employs successive cancellation with parallel arbitrated branches and a near-optimal low complexity user ordering algorithm are presented. The proposed DF receiver structure and the ordering algorithm are then combined with iterative cascaded DF stages for mitigating the deleterious effects of error propagation for convolutionally encoded systems with both Viterbi and turbo decoding as well as for uncoded schemes. We mathematically study the relations between the MMSE achieved by the analyzed DF structures, including the novel scheme, with imperfect and perfect feedback. Simulation results for an uplink scenario assess the new iterative DF detectors against linear receivers and evaluate the effects of error propagation of the new cancellation methods against existing ones. Index Terms—DS-CDMA systems, multiuser detection, decision feedback structures, iterative detection, iterative decoding.

Paper approved by X. Wang, the Editor for Multiuser Detection and Equalization of the IEEE Communications Society. Manuscript received April 4, 2006; revised December 6, 2006. R. C. de Lamare is with the Communications Research Group, Department of Electronics, University of York, York Y010 5DD, United Kingdom (e-mail: [email protected]). R. Sampaio-Neto is with CETUC/PUC-RIO, 22453-900, Rio de Janeiro, Brazil (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060209.

I. INTRODUCTION

Multiuser detection has been proposed as a means to suppress multi-access interference (MAI), increasing the capacity and the performance of CDMA systems [1]. The optimal multiuser detector of Verdu [2] suffers from exponential complexity and requires the knowledge of timing, amplitude and signature sequences. This fact has motivated the development of various sub-optimal strategies: the linear [3] and decision feedback (DF) [4] receivers, the successive interference canceller [5] and the multistage detector [6]. Recently, Verdu and Shamai [7] and Rapajic et al. [8] have investigated the information theoretic trade-off between the spectral and power efficiency of linear and non-linear multiuser detectors in synchronous AWGN channels. These works have shown that, given a sufficient signal to noise ratio and for high loads (the ratio of users to processing gain close to one), DF detection has a substantially higher spectral efficiency than linear detection. For uplink scenarios, DF structures, which are relatively simple and perform linear interference suppression followed by interference cancellation, provide substantial gains over linear detection. Minimum mean squared error (MMSE) multiuser detectors usually show good performance and have simple adaptive implementation. In particular, when used with short or repeated spreading sequences the MMSE design criterion leads to adaptive versions which only require a training sequence for estimating the receiver parameters.

Previous work on DF detectors examined successive interference cancellation [9], [10], [11], parallel interference cancellation [13], [14], [15] and multistage or iterative DF detectors [14], [15]. The DF detector with successive interference cancellation (S-DF) is optimal, in the sense that it achieves the sum capacity of the synchronous AWGN channel [10]. The S-DF scheme is capable of alleviating the effects of error propagation, although it generally leads to non-uniform performance over the users. In particular, the user ordering plays an important role in the performance of S-DF detectors. Studies on decorrelator DF detectors with optimal user ordering have been reported in [11] for imperfect feedback and in [12] for perfect feedback. The problem with the optimal ordering algorithms in [11], [12] is that they represent a very high computational burden for practical receiver design. Conversely, the DF receiver with parallel interference cancellation (P-DF) [13], [14], [15] satisfies the uplink requirements, namely, cancellation of intracell interference and suppression of the remaining other-cell interference, and provides, in general, uniform performance over the user population even though it is more sensitive to error propagation. The multistage or iterative DF schemes presented in [14], [15] are based on the combination of S-DF and P-DF schemes in multiple stages in order to refine the symbol estimates, resulting in improved performance over conventional S-DF and P-DF and in mitigation of error propagation.

In this work, we propose the design of MMSE DF detectors that employ a novel successive parallel arbitrated DF (SPA-DF) structure based on the generation of parallel arbitrated branches. The motivation for the novel DF structures is to mitigate the effects of error propagation often found in P-DF structures [13], [14], [15]. The basic idea is to improve the S-DF structure using different orders of cancellation and then select the most likely estimate. A near-optimal user ordering algorithm is described for the new SPA-DF detector structure and is compared to the optimal user ordering algorithm, which


DE LAMARE and SAMPAIO-NETO: MINIMUM MEAN-SQUARED DECISION FEEDBACK DETECTORS

requires the evaluation of K! different cancellation orders. The results show that the SPA-DF structure with the suboptimal ordering algorithm can achieve a performance very close to that of the S-DF with optimal ordering. Furthermore, the new SPA-DF scheme is combined with iterative cascaded DF stages, where the subsequent stage uses S-DF, P-DF or the new SPA-DF system to refine the symbol estimates of the users and combat the effects of error propagation. The performance of the proposed SPA-DF scheme and the sub-optimal ordering algorithm, and of their combinations with other schemes in a multistage detection structure, is investigated for both uncoded and convolutionally encoded systems with Viterbi and turbo decoding.

This paper is structured as follows. Section II briefly describes the DS-CDMA system model. The MMSE decision feedback receiver filters are described in Section III. Section IV is devoted to the novel SPA-DF scheme, the near-optimal user ordering algorithm and the combination of the SPA-DF detector with iterative cascaded DF stages, and Section V details the proposed SPA-DF receiver for convolutionally coded systems with Viterbi and turbo decoding. Section VI presents and discusses the simulation results and Section VII draws the concluding remarks of this paper.

II. DS-CDMA SYSTEM MODEL

Let us consider the uplink of a symbol synchronous binary phase-shift keying (BPSK) DS-CDMA system with K users, N chips per symbol and $L_p$ propagation paths. It should be remarked that a synchronous model is assumed for simplicity, although it captures most of the features of more realistic asynchronous models with small to moderate delay spreads. The baseband signal transmitted by the k-th active user to the base station is given by

$$x_k(t) = A_k \sum_{i=-\infty}^{\infty} b_k(i)\, s_k(t - iT) \tag{1}$$

where $b_k(i) \in \{\pm 1\}$ denotes the i-th symbol for user k, and the real-valued spreading waveform and the amplitude associated with user k are $s_k(t)$ and $A_k$, respectively. The spreading waveforms are expressed by $s_k(t) = \sum_{i=1}^{N} a_k(i)\,\phi(t - iT_c)$, where $a_k(i) \in \{\pm 1/\sqrt{N}\}$, $\phi(t)$ is the chip waveform, $T_c$ is the chip duration and $N = T/T_c$ is the processing gain. Assuming that the receiver is synchronised with the main path, the coherently demodulated composite received signal is

$$r(t) = \sum_{k=1}^{K} \sum_{l=0}^{L_p-1} h_{k,l}(t)\, x_k(t - \tau_{k,l}) + n(t) \tag{2}$$

where $h_{k,l}(t)$ and $\tau_{k,l}$ are, respectively, the channel coefficient and the delay associated with the l-th path and the k-th user. Assuming that $\tau_{k,l} = lT_c$, that the channel is constant during each symbol interval, that the spreading codes are repeated from symbol to symbol and that the receiver is synchronized with the main path, the received signal r(t), after filtering by a chip-pulse matched filter and sampling at the chip rate, yields the M-dimensional received vector

$$\begin{aligned}
\mathbf{r}(i) &= \sum_{k=1}^{K} \Big( A_k b_k(i)\mathbf{C}_k\mathbf{h}_k(i) + A_k b_k(i-1)\bar{\mathbf{C}}_k\mathbf{h}_k(i-1) + A_k b_k(i+1)\breve{\mathbf{C}}_k\mathbf{h}_k(i+1) \Big) + \mathbf{n}(i) \\
&= \sum_{k=1}^{K} \Big( A_k b_k(i)\mathbf{p}_k(i) + \boldsymbol{\eta}_k(i) \Big) + \mathbf{n}(i)
\end{aligned} \tag{3}$$

where $M = N + L_p - 1$, $\mathbf{n}(i) = [n_1(i) \ldots n_M(i)]^T$ is the complex Gaussian noise vector with $E[\mathbf{n}(i)\mathbf{n}^H(i)] = \sigma^2\mathbf{I}$, $(\cdot)^T$ and $(\cdot)^H$ denote transpose and Hermitian transpose, respectively, $E[\cdot]$ stands for ensemble average, $b_k(i) \in \{\pm 1 + j0\}$ is the symbol for user k, the amplitude of user k is $A_k$, and the channel vector of user k is $\mathbf{h}_k(i) = [h_{k,0}(i) \ldots h_{k,L_p-1}(i)]^T$ with $h_{k,l}(i) = h_{k,l}(iT_c)$ for $l = 0, \ldots, L_p - 1$. The ISI is given by $\boldsymbol{\eta}_k(i) = A_k b_k(i-1)\bar{\mathbf{C}}_k\mathbf{h}_k(i-1) + A_k b_k(i+1)\breve{\mathbf{C}}_k\mathbf{h}_k(i+1)$ and assumes that the channel order is not greater than N, i.e. $L_p - 1 \le N$. The vector $\mathbf{s}_k = [a_k(1) \ldots a_k(N)]^T$ is the signature sequence for user k and $\mathbf{p}_k(i) = \mathbf{C}_k\mathbf{h}_k(i)$ is the effective signature sequence for user k. The $M \times L_p$ convolution matrix $\mathbf{C}_k$ contains one-chip shifted versions of $\mathbf{s}_k$, and the $M \times L_p$ matrices $\bar{\mathbf{C}}_k$ and $\breve{\mathbf{C}}_k$, which contain segments of $\mathbf{s}_k$, have the following structure:

$$\mathbf{C}_k = \begin{bmatrix}
a_k(1) & 0 & \ldots & 0 \\
\vdots & a_k(1) & \ddots & \vdots \\
a_k(N) & \vdots & \ddots & 0 \\
0 & a_k(N) & \ddots & a_k(1) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \ddots & a_k(N)
\end{bmatrix},$$

$$\bar{\mathbf{C}}_k = \begin{bmatrix}
0 & a_k(N) & \ldots & a_k(N - L_p + 1) \\
\vdots & 0 & \ddots & \vdots \\
0 & \vdots & \ddots & a_k(N) \\
\vdots & \vdots & 0 & 0 \\
0 & 0 & \ddots & \vdots \\
0 & 0 & \ldots & 0
\end{bmatrix},$$

$$\breve{\mathbf{C}}_k = \begin{bmatrix}
0 & \ldots & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \ldots & 0 & 0 \\
a_k(1) & \ddots & 0 & 0 \\
\vdots & \ddots & \ddots & \vdots \\
a_k(L_p - 1) & \ldots & a_k(1) & 0
\end{bmatrix}.$$

The MAI comes from the non-orthogonality between the received signature sequences, whereas the ISI span Ls depends on the length of the channel response, which is related to the length of the chip sequence. For Lp = 1, Ls = 1 (no ISI), for 1 < Lp ≤ N, Ls = 2, for N < Lp ≤ 2N, Ls = 3.
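The construction of the convolution matrix and the ISI span rule can be illustrated with a short NumPy sketch. It is only an illustration under stated assumptions: the function names `convolution_matrix` and `isi_span`, the random seed and the chosen sizes are hypothetical and do not come from the paper. The sketch builds $\mathbf{C}_k$ column by column from one-chip shifts of $\mathbf{s}_k$ and forms the effective signature $\mathbf{p}_k = \mathbf{C}_k\mathbf{h}_k$, which coincides with the linear convolution of the signature with the channel.

```python
import numpy as np

def convolution_matrix(s, Lp):
    """Return the M x Lp matrix whose l-th column is s delayed by l chips (M = N + Lp - 1)."""
    s = np.asarray(s, dtype=float)
    N = s.size
    C = np.zeros((N + Lp - 1, Lp))
    for l in range(Lp):
        C[l:l + N, l] = s          # one-chip shifted copy of the signature
    return C

def isi_span(Lp, N):
    """ISI span Ls as stated in the text: 1, 2 or 3 symbols depending on Lp."""
    if Lp < 1:
        raise ValueError("Lp must be at least 1")
    if Lp == 1:
        return 1                   # no ISI
    if Lp <= N:
        return 2
    if Lp <= 2 * N:
        return 3
    raise ValueError("the rule is only stated for Lp <= 2N")

# Hypothetical example values (not taken from the paper's experiments).
rng = np.random.default_rng(0)
N, Lp = 16, 3
s = rng.choice([-1.0, 1.0], size=N) / np.sqrt(N)   # chips in {+-1/sqrt(N)}
h = rng.uniform(-1.0, 1.0, size=Lp)
h /= np.linalg.norm(h)                             # unit-energy channel
C = convolution_matrix(s, Lp)
p = C @ h                                          # effective signature p_k = C_k h_k
```

Building the matrix explicitly, rather than calling a convolution routine directly, keeps the correspondence with the $M \times L_p$ structure of $\mathbf{C}_k$ visible.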


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

III. MMSE DECISION FEEDBACK RECEIVERS

Let us describe in this section the design of synchronous MMSE decision feedback detectors. The input to the hard decision device corresponding to the ith symbol is $\mathbf{z}(i) = \mathbf{W}^H(i)\mathbf{r}(i) - \mathbf{F}^H(i)\hat{\mathbf{b}}(i)$, where the input $\mathbf{z}(i) = [z_1(i) \ldots z_K(i)]^T$, $\mathbf{W}(i) = [\mathbf{w}_1 \ldots \mathbf{w}_K]$ is the $M \times K$ feedforward matrix, and $\hat{\mathbf{b}}(i) = [\hat{b}_1(i) \ldots \hat{b}_K(i)]^T$ is the $K \times 1$ vector of estimated symbols, which are fed back through the $K \times K$ feedback matrix $\mathbf{F}(i) = [\mathbf{f}_1(i) \ldots \mathbf{f}_K(i)]$. Generally, the DF receiver design is equivalent to determining for user k a feedforward filter $\mathbf{w}_k(i)$ with M elements and a feedback filter $\mathbf{f}_k(i)$ with K elements that provide an estimate of the desired symbol:

$$z_k(i) = \mathbf{w}_k^H(i)\mathbf{r}(i) - \mathbf{f}_k^H(i)\hat{\mathbf{b}}(i), \qquad k = 1, 2, \ldots, K \tag{4}$$

where $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H\mathbf{r}(i))]$ is the vector with initial decisions provided by the linear section, and $\mathbf{w}_k$ and $\mathbf{f}_k$ are optimized by the MMSE criterion. In particular, the feedback filter $\mathbf{f}_k(i)$ of user k has a number of non-zero coefficients corresponding to the available number of feedback connections for each type of cancellation structure. The final detected symbol is:

$$\hat{b}_k^{(f)}(i) = \mathrm{sgn}\big(\Re[z_k(i)]\big) = \mathrm{sgn}\big(\Re[\mathbf{w}_k^H(i)\mathbf{r}(i) - \mathbf{f}_k^H(i)\hat{\mathbf{b}}(i)]\big) \tag{5}$$

where the operator $(\cdot)^H$ denotes Hermitian transpose, $\Re(\cdot)$ selects the real part and $\mathrm{sgn}(\cdot)$ is the signum function. To describe the optimal MMSE filters we will initially assume perfect feedback, that is $\hat{\mathbf{b}} = \mathbf{b}$, and then will consider a more general framework. Consider the following cost function:

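$$J_{\mathrm{MSE}} = E\big[\,|b_k(i) - \mathbf{w}_k^H\mathbf{r}(i) + \mathbf{f}_k^H\mathbf{b}(i)|^2\,\big] \tag{6}$$

Let us divide the users into two sets, similarly to [14]:

$$D = \{j : \hat{b}_j \text{ is fed back}\} \tag{7}$$

$$U = \{j : j \notin D\} \tag{8}$$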

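where the two sets D and U correspond to detected and undetected users, respectively. Let us also define the matrices of effective spreading sequences $\mathbf{P} = [\mathbf{p}_1 \ldots \mathbf{p}_K]$, $\mathbf{P}_D = [\mathbf{p}_1 \ldots \mathbf{p}_D]$ and $\mathbf{P}_U = [\mathbf{p}_1 \ldots \mathbf{p}_U]$. The minimization of the cost function in (6) with respect to the filters $\mathbf{w}_k$ and $\mathbf{f}_k$ yields:

$$\mathbf{w}_k = \mathbf{R}_U^{-1}\mathbf{p}_k \tag{9}$$

$$\mathbf{f}_k = \mathbf{P}_D^H\mathbf{w}_k \tag{10}$$

where the associated covariance matrices are $\mathbf{R} = E[\mathbf{r}(i)\mathbf{r}^H(i)] = \mathbf{P}\mathbf{P}^H + \sigma^2\mathbf{I}$ and $\mathbf{R}_U = \mathbf{P}_U\mathbf{P}_U^H + \sigma^2\mathbf{I} = \mathbf{R} - \mathbf{P}_D\mathbf{P}_D^H$. Thus, assuming perfect feedback and that user k is the desired one, the associated MMSE for the DF receiver is given by:

$$J_{\mathrm{MMSE}} = \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}_U^{-1}\mathbf{p}_k \tag{11}$$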

where $\sigma_b^2 = E[|b_k^2(i)|]$. The result in (11) means that in the absence of error propagation, the MAI in set D is eliminated and user k is only affected by interferers in set U. For the successive interference cancellation DF (S-DF) detector, we have for user k

$$D = \{1, \ldots, k-1\}, \qquad U = \{k, \ldots, K\} \tag{12}$$

where the filter matrix $\mathbf{F}(i)$ is strictly upper triangular. The S-DF structure is optimal in the sense that it achieves the sum capacity of the synchronous CDMA channel with AWGN [10]. In addition, the S-DF scheme is less affected by error propagation, although it generally does not provide uniform performance over the user population. In order to design the S-DF receivers and satisfy the constraints of the S-DF structure, the designer must obtain the vector with initial decisions $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H(i)\mathbf{r}(i))]$ and then resort to the following cancellation approach. The non-zero part of the filter $\mathbf{f}_k$ corresponds to the number of used feedback connections and to the users to be cancelled. For the S-DF, the number of feedback elements and their associated number of non-zero filter coefficients in $\mathbf{f}_k$ (where k goes from the second detected user to the last one) range from 1 to K − 1.

The parallel interference cancellation DF (P-DF) [14] receiver can offer uniform performance over the users but it suffers from error propagation. For the P-DF in a single cell, we have [14]

$$D = \{1, \ldots, k-1,\; k+1, \ldots, K\}, \qquad U = \{k\} \tag{13}$$

$$\mathbf{w}_k = \mathbf{R}_U^{-1}\mathbf{p}_k = \frac{\mathbf{p}_k}{A_k^2 + \sigma^2} \tag{14}$$
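The perfect-feedback filters in (9) and (10) and the MMSE in (11) can be evaluated numerically for the S-DF and P-DF user sets. The following NumPy sketch is an illustration only: the function name `df_filters`, the 0-based user indexing, the random effective signatures and the choice $\sigma_b^2 = 1$ (BPSK) are assumptions made here, and the amplitudes are taken to be absorbed into the columns of $\mathbf{P}$.

```python
import numpy as np

def df_filters(P, sigma2, k, detected):
    """Perfect-feedback MMSE DF filters for user k (0-based indices).

    P        : M x K matrix of effective signatures (amplitudes absorbed, an assumption)
    detected : indices of the users whose decisions are fed back (set D)
    Returns (w_k, f_k, J_MMSE) following w = R_U^{-1} p, f = P_D^H w, J = 1 - p^H R_U^{-1} p.
    """
    M, K = P.shape
    if k in detected:
        raise ValueError("the desired user must not be in the detected set")
    D = np.array(sorted(detected), dtype=int)
    U = np.array([j for j in range(K) if j not in set(detected)], dtype=int)
    P_U, P_D = P[:, U], P[:, D]
    R_U = P_U @ P_U.conj().T + sigma2 * np.eye(M)
    w = np.linalg.solve(R_U, P[:, k])          # feedforward filter
    f = np.zeros(K, dtype=P.dtype)
    f[D] = P_D.conj().T @ w                    # non-zero taps only for fed-back users
    J = 1.0 - np.real(np.vdot(P[:, k], w))     # sigma_b^2 = 1 assumed (BPSK)
    return w, f, J

# Hypothetical example: K = 4 users, desired user index k = 2.
rng = np.random.default_rng(1)
M, K, sigma2, k = 18, 4, 0.1, 2
P = rng.standard_normal((M, K)) / np.sqrt(M)
_, f_lin, J_lin = df_filters(P, sigma2, k, detected=[])          # linear MMSE (no feedback)
_, f_sdf, J_sdf = df_filters(P, sigma2, k, detected=[0, 1])      # S-DF: D = {users before k}
_, f_pdf, J_pdf = df_filters(P, sigma2, k, detected=[0, 1, 3])   # P-DF: D = {all other users}
```

Under perfect feedback the three MMSE values are ordered as P-DF ≤ S-DF ≤ linear, since each additional fed-back user removes one interferer from $\mathbf{R}_U$; this ordering says nothing about behaviour under error propagation.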

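The MMSE associated with the P-DF system is obtained by substituting $\mathbf{R}_U = \mathbf{R} - \mathbf{P}_D\mathbf{P}_D^H$ into (11), which yields:

$$J_{\mathrm{MMSE}} = \sigma_b^2 - \mathbf{p}_k^H(\mathbf{p}_k\mathbf{p}_k^H + \sigma^2\mathbf{I})^{-1}\mathbf{p}_k = \frac{\sigma^2}{A_k^2 + \sigma^2} \tag{15}$$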

where for P-DF $\mathbf{F}(i)$ is full and constrained to have zeros along the diagonal to avoid cancelling the desired symbols. In order to design P-DF receivers and satisfy their constraints, the designer must obtain the vector with initial decisions $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H(i)\mathbf{r}(i))]$ and then resort to the following cancellation approach. The non-zero part of the filter $\mathbf{f}_k$ corresponds to the number of used feedback connections and to the users to be cancelled. For the P-DF, the number of feedback connections used and the associated number of non-zero filter coefficients in $\mathbf{f}_k$ are equal to K − 1 for all users, and the matrix $\mathbf{F}(i)$ has zeros on the main diagonal to avoid cancelling the desired symbols.

Now let us consider a more general framework, where the feedback is not perfect. The minimization of the cost function in (6) with respect to $\mathbf{w}_k$ and $\mathbf{f}_k$ leads to the following filter expressions:

$$\mathbf{w}_k = \mathbf{R}^{-1}(\mathbf{p}_k + \mathbf{B}\mathbf{f}_k) \tag{16}$$

$$\mathbf{f}_k = \big(E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H]\big)^{-1}\mathbf{B}^H\mathbf{w}_k \approx \mathbf{B}^H\mathbf{w}_k \tag{17}$$

where $E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] \approx \mathbf{I}$ for small error rates and $\mathbf{B} = E[\mathbf{r}(i)\hat{\mathbf{b}}^H(i)]$. The associated MMSE for DF receivers subject to $E[\hat{\mathbf{b}}\hat{\mathbf{b}}^H] \approx \mathbf{I}$ and imperfect feedback is approximately given by

$$J_{\mathrm{MMSE}} \approx \sigma_b^2 - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{p}_k - \mathbf{p}_k^H\mathbf{R}^{-1}\mathbf{B}\mathbf{f}_k \tag{18}$$

In Appendix I we show that the expression in (18) equals (11) under perfect feedback, and provide several other relationships between DF structures with and without perfect feedback. Note that the MMSE associated with DF receivers that are subject to imperfect feedback depends on the matrix $\mathbf{B} = E[\mathbf{r}\hat{\mathbf{b}}^H]$, which under perfect feedback equals $\mathbf{P}_D$, and the feedback


filter $\mathbf{f}_k$ or set of filters $\mathbf{F}$. Specifically, if we choose a given structure for $\mathbf{F}$, this approach will lead to different methods of interference cancellation and performance improvements for the DF detector as compared to linear detection. The motivation for our work is to investigate alternative methods of finding structures for $\mathbf{F}$ that provide enhanced performance.

IV. SUCCESSIVE PARALLEL ARBITRATED DF AND ITERATIVE DETECTION

In this section, we present a novel interference cancellation structure and describe a low complexity near-optimal ordering algorithm that employs different orders of cancellation and then selects the most likely symbol estimate. The proposed ordering algorithm is compared with the optimal user ordering algorithm, which requires the evaluation of K! different cancellation orders and turns out to be too complex for practical use. The new receiver structure, denoted successive parallel arbitrated DF (SPA-DF) detection, is then combined with iterative cascaded DF stages [14], [15] to further refine the symbol estimates. The motivation for the novel DF structures is to mitigate the effects of error propagation often found in P-DF structures [14], [15], which are of great interest for uplink scenarios due to their capability of providing uniform performance over the users.

A. Successive Parallel Arbitrated DF Detection

The idea of parallel arbitration is to employ successive interference cancellation (SIC) to rapidly converge to a local maximum of the likelihood function and, by running parallel branches of SIC with different orders of cancellation, one can arrive at sufficiently different local maxima [16]. The goal of the new scheme, whose block diagram is shown in Fig. 1, is to improve performance using parallel searches and to select the most likely symbol estimate.
The idea of the ordering algorithm is to employ SIC for different branches based on the power of the users to rapidly converge to a local maximum of the likelihood function and, on the basis of the Euclidean distance, our approach selects the most likely estimate. In order to obtain the benefits of parallel search, the candidates should be arbitrated, yielding different estimates of a symbol. The estimate of a symbol that has the highest likelihood is then selected at the output. Unlike the work of Barriac and Madhow [16], which employed matched filters as the starting point, we adopt MMSE DF receivers as the initial condition and the Euclidean distance for selecting the most likely symbol. The concept of parallel arbitration is thus incorporated into a DF detector structure, which applies linear interference suppression followed by SIC and yields improved starting points as compared to matched filters. Note that our approach does not require signal reconstruction as the PASIC in [16] does, because the MMSE filters automatically compute the coefficients for interference cancellation. Following the schematic of Fig. 1, the user k output of the parallel branch l (l = 1, . . . , L) for the SPA-DF receiver structure is given by:

$$z_k^l(i) = \mathbf{w}_k^H(i)\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}]_k^H\hat{\mathbf{b}}(i) \tag{19}$$
Fig. 1.

Block diagram of the proposed SPA-DF receiver.

where $\hat{\mathbf{b}}(i) = \mathrm{sgn}[\Re(\mathbf{W}^H\mathbf{r}(i))]$ and the matrices $\mathbf{M}_l$ are permuted square identity ($\mathbf{I}_K$) matrices with dimension K whose structures for an L = 4-branch SPA-DF scheme are given by:

$$\mathbf{M}_1 = \mathbf{I}_K, \quad
\mathbf{M}_2 = \begin{bmatrix} \mathbf{0}_{K/4,3K/4} & \mathbf{I}_{3K/4} \\ \mathbf{I}_{K/4} & \mathbf{0}_{K/4,3K/4} \end{bmatrix}, \quad
\mathbf{M}_3 = \begin{bmatrix} \mathbf{0}_{K/2} & \mathbf{I}_{K/2} \\ \mathbf{I}_{K/2} & \mathbf{0}_{K/2} \end{bmatrix}, \quad
\mathbf{M}_4 = \begin{bmatrix} 0 & \ldots & 1 \\ \vdots & \iddots & \vdots \\ 1 & \ldots & 0 \end{bmatrix} \tag{20}$$

where $\mathbf{0}_{m,n}$ denotes an m × n-dimensional matrix full of zeros and the structures of the matrices $\mathbf{M}_l$ correspond to phase shifts regarding the cancellation order of the users. The purpose of the matrices in (20) is to change the order of cancellation. When $\mathbf{M}_l = \mathbf{I}$, the order of cancellation is a simple successive cancellation (S-DF) based upon the user powers (the same as [9], [10]). Specifically, the above matrices perform the cancellation with the following order with respect to user powers: $\mathbf{M}_1$ with indices 1, . . . , K; $\mathbf{M}_2$ with indices K/4, K/4 + 1, . . . , K, 1, . . . , K/4 − 1; $\mathbf{M}_3$ with indices K/2, K/2 + 1, . . . , K, 1, . . . , K/2 − 1; $\mathbf{M}_4$ with K, . . . , 1 (reverse order). The proposed ordering algorithm shifts the ordering of the users according to K/B, where B is the number of parallel branches. The rationale for this approach is to shift the ordering and attempt to benefit a given user or group of users for each decoding branch. Following this approach, a user that for a given ordering appears to be in an unfavorable position can benefit in other parallel branches by being detected in a more favorable situation. For more branches, additional phase shifts are applied with respect to user cancellation ordering. Note that different update orders were tested, although they did not result in performance improvements. The final output $\hat{b}_k^{(f)}(i)$ of the SPA-DF detector chooses the best estimate of the L candidates for each symbol interval i as described by:

$$\hat{b}_k^{(f)}(i) = \mathrm{sgn}\Big(\Re\Big[\arg\min_{1 \le l \le L} e_k^l(i)\Big]\Big) \tag{21}$$

where the best estimate is the value $z_k^l(i)$ that minimizes $e_k^l(i) = |b_k(i) - z_k^l(i)|$ and $\hat{b}_k^{(f)}(i)$ forms the vector of final decisions $\hat{\mathbf{b}}^{(f)}(i) = [\hat{b}_1^{(f)}(i) \ldots \hat{b}_K^{(f)}(i)]^T$. The number of parallel branches L that yield detection candidates is a parameter that must be chosen by the designer. In this context,


the optimal ordering algorithm conducts an exhaustive search and is given by

$$\hat{b}_k^{(f)}(i) = \mathrm{sgn}\Big(\Re\Big[\arg\min_{1 \le l \le K!} e_k^l(i)\Big]\Big) \tag{22}$$

where the number of candidates is L = K!, which is clearly too complex for practical systems. Our studies indicate that L = 4 achieves most of the gains of the new structure and offers a good trade-off between performance and complexity. The SPA-DF system employs the same filters, namely $\mathbf{W}$ and $\mathbf{F}$, as the traditional S-DF structure and requires additional arithmetic operations to compute the parallel arbitrated candidates. A discussion of the approximate MMSE attained by the proposed SPA-DF structure is included in Appendix II, whereas expressions for the MMSE of the optimal ordering algorithm are given in Appendix III. As occurs with S-DF receivers, a disadvantage of the SPA-DF detector is that it generally does not provide uniform performance over the user population. In a scenario with tight power control, successive techniques tend to favor the last detected users, resulting in non-uniform performance. To equalize the performance of the users, an iterative technique with multiple stages can be used.

B. Iterative Successive Parallel Arbitrated DF Detection

In [14], Woodward et al. presented an iterative detector with an S-DF in the first stage and P-DF or S-DF structures, with users being demodulated in reverse order, in the second stage. The work of [14] was then extended to account for coded systems and training-based reduced-rank filters [15]. Here, we focus on the proposed SPA-DF receiver and the low complexity near-optimal ordering algorithm, and combine the SPA-DF structure with iterative detection. An iterative receiver with hard-decision feedback is defined by:

$$\mathbf{z}^{(m+1)}(i) = \mathbf{W}^H(i)\mathbf{r}(i) - \mathbf{F}^H(i)\hat{\mathbf{b}}^{(m)}(i) \tag{23}$$

where the filters $\mathbf{W}$ and $\mathbf{F}$ can be S-DF or P-DF structures, and $\hat{\mathbf{b}}^{(m)}(i)$ is the vector of tentative decisions from the preceding iteration that is described by:

$$\hat{\mathbf{b}}^{(1)}(i) = \mathrm{sgn}\big(\Re[\mathbf{W}^H(i)\mathbf{r}(i)]\big) \tag{24}$$

$$\hat{\mathbf{b}}^{(m)}(i) = \mathrm{sgn}\big(\Re[\mathbf{z}^{(m)}(i)]\big), \qquad m > 1 \tag{25}$$

where the number of stages m depends on the application. More stages can be added and the order of the users is reversed from stage to stage. To equalize the performance over the user population, we consider a two-stage structure. The first stage is an SPA-DF scheme with filters $\mathbf{W}^1$ and $\mathbf{F}^1$. The tentative decisions are passed to the second stage, which consists of an S-DF, a P-DF or an SPA-DF detector with filters $\mathbf{W}^2$ and $\mathbf{F}^2$, which are computed similarly to $\mathbf{W}^1$ and $\mathbf{F}^1$ but use the decisions of the first stage. The resulting iterative receiver system is denoted ISPAS-DF when an S-DF scheme is deployed in the second stage, whereas for P-DF filters in the second stage the overall scheme is called ISPAP-DF. The output of the second stage of the resulting scheme is:

$$z_j^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{26}$$

where $z_j^{(2)}$ is the jth component of the soft output vector $\mathbf{z}$, $\mathbf{M}$ is a square permutation matrix with ones along the reverse diagonal and zeros elsewhere (similar to $\mathbf{M}_4$ in (20)), $[\cdot]_j$ denotes the jth column of the argument (a matrix), and $\hat{b}_j^{(m)}(i) = \mathrm{sgn}[\Re(z_j^{(m)}(i))]$. The third proposed iterative scheme is denoted ISPASPA-DF and corresponds to an SPA-DF architecture employed in both stages. The output of the lth branch of its second stage is:

$$z_{l,j}^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{27}$$

where $\hat{b}_j^{(2)}(i) = \mathrm{sgn}\big(\Re\big[\arg\min_{1 \le l \le L} e_{l,j}(i)\big]\big)$ and $e_{l,j}(i) = |b_j(i) - z_{l,j}^{(2)}(i)|$. Note that the users in the second stage are demodulated successively and in reverse order relative to the first branch of the SPA-DF structure (a conventional S-DF). The role of reversing the cancellation order in successive stages is to equalize the performance of the users over the population or at least reduce the performance disparities. Indeed, it provides a better performance than keeping the same ordering, as the last decoded users in the first stage tend to be favored by the reduced interference. The rationale is that by using these benefited users (the last decoded ones) as the first ones to be decoded in the second stage, the resulting performance is improved. Additional stages can be included, although our studies suggest that the gains in performance are marginal. Hence, the two-stage scheme is adopted for the rest of this work.
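The branch orderings and the arbitration step of the SPA-DF detector can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names `branch_orders` and `arbitrate` are hypothetical, and, because the transmitted symbol $b_k(i)$ is not known at the receiver, the sketch assumes that the Euclidean distance is measured between each soft output and its own hard BPSK decision.

```python
import numpy as np

def branch_orders(K):
    """1-based cancellation orders for the four branches listed in the text.

    Assumes K >= 4; K/4 and K/2 are taken as integer divisions (an assumption).
    """
    if K < 4:
        raise ValueError("this sketch assumes at least four users")

    def shifted(start):
        # start, start+1, ..., K, 1, ..., start-1
        return list(range(start, K + 1)) + list(range(1, start))

    return [shifted(1), shifted(K // 4), shifted(K // 2), list(range(K, 0, -1))]

def arbitrate(Z):
    """Select, per user, the branch whose soft output is closest to a BPSK symbol.

    Z : L x K array of soft outputs z_k^l(i), one row per branch.
    ASSUMPTION: the distance is |sgn(Re z) - z|, used here in place of |b_k - z|
    since b_k is unknown at the receiver.
    Returns (final decisions, index of the selected branch for each user).
    """
    Z = np.asarray(Z, dtype=complex)
    hard = np.where(Z.real >= 0, 1.0, -1.0)       # per-branch hard decisions
    err = np.abs(hard - Z)                        # distance to the nearest symbol
    best = np.argmin(err, axis=0)                 # best branch for each user
    users = np.arange(Z.shape[1])
    return hard[best, users], best

orders = branch_orders(8)
# Two branches, two users: branch 1 is more reliable for user 0, branch 0 for user 1.
Z = np.array([[0.20, -0.90],
              [0.95,  0.10]])
decisions, best = arbitrate(Z)
```

The soft outputs themselves would come from (19) for each branch; they are supplied directly here so that the selection rule can be examined in isolation.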

V. SUCCESSIVE PARALLEL ARBITRATED DF AND ITERATIVE DETECTION FOR CODED SYSTEMS

This section is devoted to the description of the proposed SPA-DF detector and iterative detection schemes for coded systems which employ convolutional codes with Viterbi and turbo decoding. Specifically, we present iterative DF detectors based on the proposed SPA-DF structure, which exploits user ordering, and combine the SPA-DF with either the S-DF, the P-DF or another SPA-DF in the second stage. We show that a reduced number of turbo iterations can be used with the proposed iterative detector when a near-optimal user ordering is employed and that savings in transmitted power are also obtained as compared to previously reported turbo detectors [19]-[23].

A. Convolutional Codes with Viterbi Decoding

The structure shown in Fig. 1 can be extended to coded systems by including a decoder after the selection unit and before the slicer, and an encoder that processes the refined estimates before the feedback filter $\mathbf{F}(i)$. For the proposed SPA-DF receiver structure, users are decoded successively with the aid of the Viterbi algorithm for each parallel arbitrated branch and then re-encoded with a convolutional encoder and used for interference cancellation. The motivation for the proposed encoded structure is that significant gains can be obtained from iterative techniques with soft cancellation methods and error control coding [17]-[23] and from efficient receiver structures and ordering algorithms such as the novel SPA-DF detector. The decoding processes of the existing S-DF, P-DF and iterative schemes, namely the ISS-DF and the ISP-DF, are explained in


[14]. The decoding of the proposed iterative detection schemes that employ the SPA-DF detector (ISPAS-DF, ISPAP-DF and ISPASPA-DF) resembles the uncoded case, where the second stage benefits from the enhanced estimates provided by the first stage that now employs convolutional codes followed by a Viterbi decoder with branch metrics based on the Hamming distance. Specifically, the output of the second stage of the resulting scheme for coded systems is:

$$z_j^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{28}$$

where

$$[\hat{\mathbf{b}}^{(2)}(i)]_l = \begin{cases} \hat{b}_j^{(2)} & \text{for } l > j \\ \hat{b}_j^{(1)} & \text{for } l < j \end{cases} \tag{29}$$

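where $[\hat{\mathbf{b}}^{(2)}(i)]_l$ is the lth entry of the decision vector $\hat{\mathbf{b}}^{(2)}(i)$. Accordingly, the output of the second stage of the ISPASPA-DF (the SPA-DF architecture is employed in both stages) is described by:

$$z_{l,j}^{(2)}(i) = [\mathbf{M}\mathbf{W}^2(i)]_j^H\mathbf{r}(i) - [\mathbf{M}_l\mathbf{F}^2(i)]_j^H\hat{\mathbf{b}}^{(2)}(i) \tag{30}$$

where $\hat{b}_j^{(2)}(i) = \mathrm{sgn}\big(\Re\big[\arg\min_{1 \le l \le L} e_{l,j}(i)\big]\big)$ and

$$e_{l,j}(i) = \begin{cases} |b_j(i) - z_{l,j}^{(2)}(i)| & \text{for } l > j \\ |b_j(i) - z_{l,j}^{(1)}(i)| & \text{for } l < j \end{cases} \tag{31}$$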
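The way the second stage mixes first-stage and second-stage decisions, as in (28) and (29), can be illustrated with a small NumPy sketch. It is a simplified reading under stated assumptions, not the authors' implementation: the function name `second_stage_sdf` is hypothetical, the reversal performed by the permutation matrix $\mathbf{M}$ is replaced by an explicit loop over the users in reverse order, and decoding and re-encoding are omitted so that only the decision bookkeeping is shown.

```python
import numpy as np

def second_stage_sdf(r, W2, F2, b1):
    """Second-stage successive detection in reverse user order (0-based indices).

    r  : received vector (length M)
    W2 : M x K second-stage feedforward filters (columns w_j)
    F2 : K x K second-stage feedback filters (columns f_j, zero diagonal assumed)
    b1 : first-stage decisions (length K)
    When user j is processed, entries l > j of the decision vector already hold
    second-stage decisions and entries l < j still hold first-stage decisions.
    """
    K = len(b1)
    b = np.array(b1, dtype=float)                 # starts as the first-stage decisions
    z = np.zeros(K, dtype=complex)
    for j in range(K - 1, -1, -1):                # last detected user goes first
        z[j] = np.vdot(W2[:, j], r) - np.vdot(F2[:, j], b)
        b[j] = 1.0 if z[j].real >= 0 else -1.0    # overwrite with the stage-2 decision
    return z, b

# Hypothetical two-user example with hand-picked filters.
W2 = np.eye(2)
F2 = np.array([[0.0, 0.5],
               [0.5, 0.0]])
r = np.array([0.2, -1.0])
b1 = [-1.0, 1.0]                                  # first-stage decisions
z2, b2 = second_stage_sdf(r, W2, F2, b1)
```

In this example both first-stage decisions are overturned: user 1 is re-detected first, and its refined decision is then used when cancelling interference for user 0.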

B. Iterative Turbo Receiver and Decoding

A CDMA system with convolutional codes being used at the transmitter and the proposed iterative SPA-DF receiver with turbo decoding is illustrated in Fig. 2. The proposed iterative (turbo) receiver structure consists of the following stages: a soft-input-soft-output (SISO) SPA-DF detector and a maximum a posteriori (MAP) decoder. These stages are separated by interleavers and deinterleavers. Specifically, soft outputs from the SPA-DF are used to estimate likelihoods which are interleaved and input to the MAP decoder for the convolutional code. The MAP decoder computes a posteriori probabilities (APPs) for each user's encoded symbols, which are used to generate soft estimates. These soft estimates are subsequently used to update the SPA-DF filters, de-interleaved and fed back through the feedback filter. This process is then iterated.

Fig. 2. Block diagram of the proposed system with the SPA-DF detector and turbo decoding.

The proposed SPA-DF detector yields the a posteriori log-likelihood ratio (LLR) of a transmitted symbol (+1 or −1) for every code bit of each user as given by

$$\Lambda_1[b_k(i)] = \log\frac{P[b_k(i) = +1 \mid \mathbf{r}(i)]}{P[b_k(i) = -1 \mid \mathbf{r}(i)]}, \qquad k = 1, \ldots, K. \tag{32}$$

Using Bayes' rule, the above equation can be written as

$$\Lambda_1[b_k(i)] = \log\frac{P[\mathbf{r}(i) \mid b_k(i) = +1]}{P[\mathbf{r}(i) \mid b_k(i) = -1]} + \log\frac{P[b_k(i) = +1]}{P[b_k(i) = -1]} = \lambda_1[b_k(i)] + \lambda_2^p[b_k(i)] \tag{33}$$

where $\lambda_2^p[b_k(i)] = \log\frac{P[b_k(i) = +1]}{P[b_k(i) = -1]}$ represents the a priori LLR of the code bit $b_k(i)$, which is computed by the MAP decoder of the kth user in the previous iteration, interleaved and then fed back to the SPA-DF detector. Note that the superscript p denotes the quantity obtained in the previous iteration. Assuming equally likely bits, for the first iteration we have $\lambda_2^p[b_k(i)] = 0$ for all users. The first term in (33), i.e. $\lambda_1[b_k(i)] = \log\frac{P[\mathbf{r}(i) \mid b_k(i) = +1]}{P[\mathbf{r}(i) \mid b_k(i) = -1]}$, represents the extrinsic information yielded by the SISO SPA-DF detector based on the received data $\mathbf{r}(i)$, the prior information about the code bits of all other users $\lambda_2^p[b_l(i)]$, $l \neq k$, and the prior information about the code bits of the kth user other than the ith bit. The extrinsic information $\lambda_1[b_k(i)]$ provided by the SPA-DF detector is then de-interleaved and fed into the MAP decoder of the kth user as the a priori information in the next iteration. Based on the prior information $\lambda_1^p[b_k(i)]$ and the trellis structure of the code, the kth user's MAP decoder computes the a posteriori LLR of each code bit as described by

$$\Lambda_2[b_k(i)] = \log\frac{P[b_k(i) = +1 \mid \lambda_1^p[b_k(i)]; \text{decoding}]}{P[b_k(i) = -1 \mid \lambda_1^p[b_k(i)]; \text{decoding}]} = \lambda_2[b_k(i)] + \lambda_1^p[b_k(i)], \qquad k = 1, \ldots, K. \tag{34}$$

From the above equality, it is seen that the output of the MAP decoder is the sum of the prior information $\lambda_1^p[b_k(i)]$ and the extrinsic information $\lambda_2[b_k(i)]$ yielded by the MAP decoder. This extrinsic information is the information about the code bit $b_k(i)$ obtained from the prior information about the other code bits $\lambda_1^p[b_k(j)]$, $j \neq i$ [22]. The MAP decoder also computes the a posteriori LLR of every information bit, which is used to make a decision on the decoded bit at the last iteration. After interleaving, the extrinsic information yielded by the K MAP decoders $\lambda_2[b_k(i)]$, k = 1, . . . , K, is fed back to the SPA-DF detector as the prior information about the code bits of all users in the subsequent iteration. At the first iteration, the extrinsic information $\lambda_1[b_k(i)]$ and $\lambda_2[b_k(i)]$ are statistically independent; as the iterations are computed they become more correlated and the improvement due to each iteration is gradually reduced.

For the purpose of MAP decoding, we assume that the interference plus noise at the output of the subtractor in Fig. 2 (b), which corresponds to $\mathbf{z}(i)$, is Gaussian. This assumption is reasonable when there are many active users, has been used in previous works [15], [22]-[23] and provides an efficient and accurate way of computing the extrinsic information. Thus, for the kth user and mth iteration the soft output of the SPA-DF detector is written as

$$z_k^{(m)}(i) = V_k^{(m)} b_k(i) + \xi_k^{(m)}(i) \tag{35}$$

where $V_k^{(m)}(i)$ is a scalar variable equivalent to the kth user's amplitude and $\xi_k^{(m)}(i)$ is a Gaussian random variable with variance $\sigma^2_{\xi_k^{(m)}}$. Since we have

$$V_k^{(m)}(i) = E\big[b_k^*(i)\, z_k^{(m)}(i)\big] \tag{36}$$

and

$$\sigma^2_{\xi_k^{(m)}}(i) = E\big[|z_k^{(m)}(i) - V_k^{(m)}(i)\, b_k(i)|^2\big] \tag{37}$$

the designer can obtain the estimates $\hat{V}_k^{(m)}(i)$ and $\hat{\sigma}^2_{\xi_k^{(m)}}(i)$ via the corresponding sample averages over the packet transmission. These estimates are used to compute the detector a posteriori probabilities $P[b_k(i) = \pm 1 \mid z_k^{(m)}(i)]$, which are de-interleaved and input to the MAP decoder for the convolutional code. In what follows, we assume that the MAP decoder generates APPs $P[b_k(i) = \pm 1]$, which are used to compute the input to the feedback filter $\mathbf{f}_k(i)$. From (35) the extrinsic information delivered by the soft output SPA-DF is given by

$$\lambda_1[b_k(i)] = \log\frac{P[z_k^{(m)}(i) \mid b_k(i) = +1]}{P[z_k^{(m)}(i) \mid b_k(i) = -1]} = -\frac{\big(z_k^{(m)}(i) - V_k^{(m)}\big)^2}{2\sigma^2_{\xi_k^{(m)}}(i)} + \frac{\big(z_k^{(m)}(i) + V_k^{(m)}\big)^2}{2\sigma^2_{\xi_k^{(m)}}(i)} = \frac{2 V_k^{(m)} z_k^{(m)}(i)}{\sigma^2_{\xi_k^{(m)}}(i)} \tag{38}$$

The SPA-DF turbo detector chooses the best estimate of the L candidates for the mth turbo decoding iteration as:

$$l_{\mathrm{best},k}^{(m)}(i) = \arg\min_{1 \le l \le L} e_k^l(i) \tag{39}$$

where the best estimate is the value $z_k^l(i)$ which minimizes $e_k^l(i) = |b_k(i) - z_k^l(i)|$.

C. Extensions

Here, we briefly comment on how the proposed receiver structures can be extended to take into account asynchronous systems, dynamic scenarios, other types of communications systems and multiple access techniques. For asynchronous systems with large relative delays amongst the users, the observation window of each user should be expanded in order to consider an increased number of samples derived from the offsets amongst users. Alternatively, for small relative delays amongst users, the designer can resort to chip oversampling to compensate for the random timing offsets. These remedies imply augmented filter lengths and consequently increased computational complexity. To alleviate the increase in filter length and the increased amount of training, the designer can resort to reduced-rank estimation techniques such as the Multistage Wiener Filter, as in [14], or to a new very promising technique that employs interpolated FIR filters [25]. An extension with low complexity turbo schemes such as the one in [26] is also possible with the structures presented in this paper. For dynamic channels that are subject to fading, the designer can rely on adaptive signal processing techniques and make the proposed detector structures adaptive in order to track the variations of the channel and the interference. This includes some modifications for CDMA systems with long codes, which require a different approach for estimating the covariance observation matrix $\mathbf{R}$ due to the loss of the cyclostationarity. Finally, we also remark that the proposed detection schemes can be deployed for narrow-band systems with multiple transmitter and receiver antennas, exploiting the capacity improvements of spatial multiplexing.

VI. SIMULATIONS

In this section, we evaluate the performance of the iterative arbitrated DF structures introduced in Section IV and compare them with other existing structures. Due to the extreme difficulty of theoretically analyzing such schemes, we adopt a simulation approach and conduct several experiments in order to verify the effectiveness of the proposed techniques. In particular, we have carried out experiments to assess the bit error rate (BER) performance of the DF receivers for different

DE LAMARE and SAMPAIO-NETO: MINIMUM MEAN-SQUARED DECISION FEEDBACK DETECTORS

• • • • • •

• •

S-DF: the successive DF detector of [9], [10]. P-DF: the parallel DF detector of [13], [14]. ISS-DF: the iterative system of Woodward et al. [14] with S-DF in the first and second stages. ISP-DF: the iterative system of Woodward et al. [14] with S-DF in the first stage and P-DF in the second stage. SPA-DF: the proposed successive parallel arbitrated receiver. ISPAS-DF: the proposed iterative detector with the novel SPA-DF in the first stage and the S-DF in the second stage. ISPAP-DF: the proposed iterative receiver with the SPADF in the first stage and the P-DF in the second stage. ISPASPA-DF: the proposed iterative receiver with the SPA-DF in the first and second stages.

Let us first consider the proposed SPA-DF, evaluate the number of arbitrated branches that should be used in the ordering algorithm and account for the impact of additional branches upon performance. In addition to this, we carry out a comparison of the proposed low complexity user ordering algorithm against the optimal ordering approach, briefly described in Section IV. A, that tests K! possible branches and selects the most likely estimate. We designed the novel DF receivers with L = 2, 4, 8 parallel branches and compared their BER performance versus number of symbols with the existing S-DF and P-DF structures, as depicted in Fig. 3. The results show that the proposed low complexity ordering algorithm achieves a performance close to the optimal ordering, whilst keeping the complexity reasonably low for practical utilization. Furthermore, the performance of the new SPA-DF scheme with L = 2, 4, 8 outperforms the S-DF and the P-DF detector. It can be noted from the curves that the performance of the new SPA-DF improves as the number of parallel branches increase. In this regard, we also notice that the gains of performance obtained through additional branches

N=16, K=5 users, E /N =12 dB b

0

10

0

Linear S−DF P−DF SPA−DF(L=2) SPA−DF(L=4) SPA−DF(L=8) SPA−DF−optimal ordering

−1

10 Average BER

loads, channel profiles, and signal to noise ratios (Eb /N0 ). The DS-CDMA system employs random generated spreading sequences of length N = 16, N = 32 and N = 64, has perfect power control and use statistically independent random channels with Lp = 3, whose coefficients hk,l are taken, for each run, from uniform random variables −1 Lp between h2k,l = 1. It and 1, and which are normalized so that l=1 should be remarked that the existence of multipath creates an error floor for the multiuser receivers, making it more difficult the interference suppression of associated users. Note also that given the performance of current power control algorithms, ideal power control is not far from a realistic situation. Thematrices used in (14) and (15) iare estimated i ˆ ˆ ˆ H (l). by R(i) = 1i l=1 r(l)rH (l) and B(i) = 1i l=1 r(l)b For coded systems, we employ a convolutional code with rate R = 3/4 and constraint length 6 which can be found in [24]. In particular, for turbo decoding plots we used S-random interleavers with block size equal to 256. In the following experiments, averaged over 200 runs for uncoded systems, over 2000 for encoded systems with Viterbi decoding and over 20000 for turbo decoded schemes, it is indicated the receiver structure (linear or decision feedback (DF)). Amongst the different DF structures, we consider:

Fig. 3. BER performance versus number of symbols (horizontal axis: number of received symbols, 0 to 2000).

decrease as L is increased, resulting in marginal improvements for more than L = 4 branches. For this reason, we adopt L = 4 for the remaining experiments because it presents a very attractive trade-off between performance and complexity. A BER performance comparison of the proposed DF structures, namely SPA-DF, ISPAP-DF, ISPAS-DF and ISPASPA-DF, with existing iterative and conventional DF and linear detectors is illustrated in Figs. 4 and 5 for uncoded systems and in Fig. 6 for convolutionally coded systems. In particular, we show BER performance curves versus $E_b/N_0$ and number of users (K) for the analyzed receivers. The results for a system with N = 32, depicted in Fig. 4, indicate that the best performance is achieved with the novel ISPASPA-DF (in which the SPA-DF is employed in two cascaded stages), followed by the new ISPAP-DF, the existing ISP-DF [14], the ISPAS-DF, the SPA-DF, the P-DF, the ISS-DF, the S-DF and the linear detector. Specifically, the ISPASPA-DF detector can save up to 1.5 dB and support up to 4 more users in comparison with the ISP-DF (which is the best existing scheme) for the same BER performance. The ISPAP-DF scheme can save up to 1 dB and support up to 2 more users in comparison with the ISP-DF for the same BER performance. Moreover, the performance advantages of the ISPASPA-DF and ISPAP-DF systems over the other existing approaches are substantial. The results for a larger system with N = 64, illustrated in Fig. 5, corroborate the curves obtained for the smaller system in Fig. 4. In particular, the same BER performance hierarchy is observed for the detection schemes (except for the ISPAS-DF, which now outperforms the ISP-DF) and we notice some additional gains in performance for the proposed schemes over the existing techniques. Specifically, the ISPASPA-DF detector can save up to 1.8 dB and support up to 10 additional users in comparison with the ISP-DF for the same BER performance.
The ISPAP-DF scheme can save up to 1.4 dB and support up to 8 more users in comparison with the ISP-DF for the same BER performance. Moreover, the performance advantages of the ISPASPA-DF and ISPAP-DF systems are even more pronounced over the other analyzed schemes for

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

Fig. 4. BER performance versus (a) Eb/N0 and (b) number of users (K). Panels: (a) N = 32, K = 14 users; (b) N = 32, Eb/N0 = 10 dB.

Fig. 5. BER performance versus (a) Eb/N0 and (b) number of users (K). Panels: (a) N = 64, K = 28 users; (b) N = 64, Eb/N0 = 10 dB.

Fig. 6. BER performance of a convolutionally coded system with R = 3/4 versus (a) Eb/N0 and (b) number of users (K). Panels: (a) N = 32, K = 14 users; (b) N = 32, Eb/N0 = 6 dB.

Fig. 7. BER performance of a turbo decoded system with R = 3/4 versus (a) Eb/N0 and (b) number of users (K). Panels: (a) N = 32, K = 24, R = 3/4, Iterations = 4; (b) N = 32, Eb/N0 = 4 dB, R = 3/4, Iterations = 4.

[Figs. 4 to 7 plot the average BER of the Linear, S-DF, P-DF, SPA-DF, ISS-DF, ISP-DF, ISPAS-DF, ISPAP-DF and ISPASPA-DF detectors; Fig. 7 also shows the single user bound.]

larger systems. The BER performance of the analyzed detection schemes was then examined for convolutionally encoded systems with Viterbi decoding, N = 32 and rate R = 3/4, as depicted in Fig. 6. The results corroborate those obtained for uncoded systems in Figs. 4 and 5, and indicate that the proposed ISPASPA-DF and ISPAP-DF detection schemes significantly outperform the remaining receiver structures. In particular, the ISPASPA-DF detector can support up to 8 additional users in comparison with the ISP-DF for the same BER performance, whereas the ISPAP-DF scheme can accommodate up to 6 more users in comparison with the ISP-DF for the same BER performance. It is worth noting that the linear and P-DF detectors experience performance losses for coded systems relative to the other structures, as verified in [14]; this is a result of the loss in spreading gain, which increases the interference power at the output of the MMSE receiver. The BER performance of the analyzed detection schemes was also investigated for convolutionally encoded systems with turbo decoding. In our studies with turbo receivers, we tested several code rates and found that R = 1/2 was unable to attain good performance for highly loaded systems, whereas R = 3/4 was powerful enough to obtain good performance even in fully loaded systems. For this reason, we adopted the rate R = 3/4 for the remaining experiments with iterative decoders and considered a system with N = 32, as depicted in Fig. 7. The results corroborate those obtained for uncoded and encoded systems with Viterbi decoding in Figs. 5 and 6, and indicate that the proposed ISPASPA-DF and ISPAP-DF detection schemes significantly outperform the remaining receiver structures. In particular, the ISPASPA-DF detector can approach the single user bound with only 4 iterations and offer a significant advantage over the existing detectors. In comparison with existing iterative DF detectors, the ISPASPA-DF can save up to 0.5 dB for the same BER performance, and it can accommodate a fully loaded system with only 4 iterations while operating at only 4 dB, with negligible performance degradation as the load is increased. Fig. 8 illustrates the average BER performance of the detectors versus the number of iterations of the turbo

DE LAMARE and SAMPAIO-NETO: MINIMUM MEAN-SQUARED DECISION FEEDBACK DETECTORS

Fig. 8. BER performance of a turbo decoded system with R = 3/4 versus number of iterations. Panel: N = 32, K = 24, Eb/N0 = 4 dB; vertical axis: average BER.

Fig. 9. BER performance versus user index for (a) an uncoded system and (b) a convolutionally coded system with rate R = 3/4. Panels: (a) N = 32, K = 14 users, Eb/N0 = 10 dB; (b) N = 32, K = 14 users, Eb/N0 = 6 dB; vertical axis: individual BER.

decoder. The plots show that the proposed ISPASPA-DF and ISPAP-DF detectors achieve the single user bound with only 4 and 7 iterations, respectively, whereas the remaining detectors require more iterations to achieve this performance. This is an important feature of the proposed detectors, as they can save considerable computational resources by operating with a lower number of turbo iterations. The last scenario, shown in Fig. 9, considers the individual BER performance of the users for both uncoded and convolutionally encoded systems with Viterbi decoding. From the curves, we observe that a disadvantage of S-DF relative to P-DF is that it does not provide uniform performance over the user population. We also notice that for the S-DF receivers, user 1 achieves the same performance as its linear receiver counterpart, and as the successive cancellation is performed users with higher indices benefit from the interference cancellation. The same non-uniform performance is verified for the proposed SPA-DF, the existing ISS-DF and the novel ISPAS-DF and ISPASPA-DF. Conversely, the new ISPAP-DF, the existing P-DF and the existing ISP-DF provide uniform performance over the users, which is an important goal for the uplink of DS-CDMA systems. In particular, the novel ISPAP-DF detector achieves the best uniform performance of the analyzed structures and is superior to the ISP-DF and to the P-DF, which suffers from error propagation. For coded systems, we notice that the performance of the proposed ISPASPA-DF and ISPAS-DF, and the existing ISS-DF and S-DF, becomes very attractive for the users with indices greater than 5 (where the SIC-based schemes outperform the ISPAP-DF, the ISP-DF and the P-DF). This suggests the deployment of these structures for systems that rely on differentiated services, where the quality of service (QoS) can be made different for different groups of users.
In this context and as an example, users with the first indices and poorer performance should be allocated to voice services, while the users with better performance should be designated to data transmission services that require improved QoS.

VII. CONCLUSIONS

A novel SPA-DF structure and a low-complexity near-optimal ordering algorithm were presented and combined with iterative techniques for use with cascaded DF stages for mitigating the deleterious effects of error propagation. The proposed SPA-DF and iterative receivers for DS-CDMA systems were investigated in an uplink scenario and compared to existing schemes in the literature. The results for both uncoded and convolutionally encoded systems using Viterbi and turbo decoding show that the new detection schemes can offer considerable gains as compared to existing DF and linear receivers, support systems with higher loads and mitigate the phenomenon of error propagation.

APPENDIX

In this Appendix, we provide some relationships between the MMSE attained by a decision feedback structure with perfect and imperfect feedback. Let us consider an alternative expression for the cost function in (4) for user k:

$$J_{MSE} = \sigma_b^2 - w_k^H p_k - p_k^H w_k + w_k^H R w_k + f_k^H f_k - w_k^H B f_k - f_k^H B^H w_k \tag{40}$$

Consider the expression for the feedforward filter $w_k = R^{-1}(p_k + B f_k)$ obtained in (16) and the expression for the feedback filter $f_k = Q^{-1} B^H w_k$ with $Q = E[\hat{b}\hat{b}^H]$ in (17). By combining the optimal MMSE expressions for the filters in (16) and (17), we obtain an alternative expression for the feedback filter $f_k$:

$$f_k = D^{-1} Q^{-1} B^H R^{-1} p_k \tag{41}$$

where $D = (I - Q^{-1} B^H R^{-1} B)$ and the above expression depends only on Q, B, R and $p_k$. By inserting the expression $w_k = R^{-1}(p_k + B f_k)$ and (41) into (40), we have for user k:

$$\begin{aligned}
J_{MMSE} &= \sigma_b^2 - p_k^H R^{-1} p_k - f_k^H B^H R^{-1} p_k - p_k^H R^{-1} B f_k - f_k^H B^H R^{-1} B f_k + f_k^H f_k \\
&= \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B Q^{-1} D^{-1} B^H R^{-1} p_k - p_k^H R^{-1} B Q^{-1} D^{-1} Q^{-1} B^H R^{-1} p_k \\
&\quad - p_k^H R^{-1} B Q^{-1} D^{-1} B^H R^{-1} B D^{-1} Q^{-1} B^H R^{-1} p_k + p_k^H R^{-1} B Q^{-1} D^{-1} Q^{-1} B^H R^{-1} p_k \\
&= \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B Q^{-1} D^{-1} B^H R^{-1} p_k - p_k^H R^{-1} B Q^{-1} D^{-1} Q^{-1} B^H R^{-1} p_k \\
&\quad + p_k^H R^{-1} B Q^{-1} D^{-1} (I - B^H R^{-1} B) D^{-1} Q^{-1} B^H R^{-1} p_k
\end{aligned}$$

At this point, it is convenient to adopt the judicious approximation $Q = E[\hat{b}\hat{b}^H] \approx I$, which is justified for moderate to low BER values. By using this approximation we have $f_k \approx D^{-1} B^H R^{-1} p_k$, where $D \approx (I - B^H R^{-1} B)$, and the MMSE expression for user k is approximated by:

$$\begin{aligned}
J_{MMSE} &\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k \\
&\quad + p_k^H R^{-1} B D^{-1} (I - B^H R^{-1} B) D^{-1} B^H R^{-1} p_k \\
&\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k + p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k \\
&\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k \\
&\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B (I - B^H R^{-1} B)^{-1} B^H R^{-1} p_k
\end{aligned} \tag{42}$$

The approximate expression obtained in (43) represents the MMSE attained by a general decision feedback structure that has imperfect feedback. The equation in (43) is a function of B, R and $p_k$, and is still dependent on the decisions. Let us now assume perfect feedback ($b = \hat{b}$) and look at the filter expressions. Since $w_k = R^{-1}(p_k + B f_k)$ and $f_k = B^H w_k = B^H R^{-1}(p_k + B f_k)$, we have

$$\begin{aligned}
J_{MMSE} &\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B D^{-1} B^H R^{-1} p_k \\
&\approx \sigma_b^2 - p_k^H R^{-1} p_k - p_k^H R^{-1} B f_k \\
&\approx \sigma_b^2 - p_k^H R^{-1} (p_k + B f_k) \\
&\approx \sigma_b^2 - p_k^H R^{-1} R w_k = \sigma_b^2 - p_k^H w_k
\end{aligned} \tag{43}$$

The approximate expression obtained in (43) has been significantly simplified due to the assumption of perfect feedback and indicates that the MMSE for user k is a function of $w_k$. If we consider a decision feedback structure such as successive cancellation (S-DF) and use the expression for the feedforward filter $w_k = R_U^{-1} p_k$, the MMSE for user k is approximately given by:

$$J_{MMSE} \approx \sigma_b^2 - p_k^H R_U^{-1} p_k \tag{44}$$

where the above result means that the MMSE attained by user k is proportional to the number of undetected users expressed by the covariance matrix $R_U$. If we consider a decision feedback structure such as parallel cancellation (P-DF) and use the expression for the feedforward filter $w_k = R_U^{-1} p_k = \frac{p_k}{|A_k|^2 + \sigma^2}$, the MMSE for user k is approximately given by:

$$J_{MMSE} \approx \sigma_b^2 - p_k^H (p_k p_k^H + \sigma^2 I)^{-1} p_k \tag{45}$$

Note that the above result corresponds to the single-user bound because we assume that all users (with perfect decisions) have been fed back, as in P-DF. For imperfect feedback, the P-DF is known to be susceptible to error propagation, while the S-DF is more effective in combating these deleterious effects. The proposed SPA-DF employs several versions of S-DF in parallel and chooses the best estimate amongst these parallel branches, resulting in improved performance over the S-DF, as verified in our studies.

Here, we mathematically discuss the MMSE of the SPA-DF, under the assumption of perfect feedback. If we consider the SPA-DF with L branches, we have L different groups of undetected users, namely, $U_1, U_2, \ldots, U_L$, and the associated expression for the feedforward filter $w_k = R_{U_l}^{-1} p_k$, where l = 1, 2, ..., L. Therefore, the MMSE for user k is approximately given by:

$$J_{MMSE} \approx \arg\min_{1 \le l \le L} (MSE_{U_l}) \tag{46}$$

where $MSE_{U_l} = \sigma_b^2 - p_k^H R_{U_l}^{-1} p_k$ and the above expression means that the MMSE attained by user k with the SPA-DF is no larger than that of a standard S-DF (with L = 1 and approximate MMSE given by (45)). The approximate MMSE in (47) is also proportional to the number of undetected users expressed by the covariance matrix $R_{U_l}$, but can benefit from different groups of undetected users by selecting the group of undetected users that yields the smallest MSE, resulting in better performance. Indeed, the MMSE of the proposed SPA-DF structure in (47) is upper bounded by the MMSE of the standard S-DF detector given through (45).

Here, we mathematically discuss the MMSE of S-DF detectors with the optimal ordering algorithm. If we consider an exhaustive search over all the possible orderings for an S-DF, we have K! different groups of undetected users or, equivalently, K! possible orderings. The optimal ordering S-DF can be seen as a generalisation of the proposed SPA-DF structure in which the number of branches is equal to K!. Mathematically, for the case of imperfect decisions we have for the optimal ordering S-DF the following expression

$$J_{MMSE} \approx \arg\min_{1 \le l \le K!} (J_{MSE,l}) \tag{47}$$

where

$$J_{MSE,l} = \sigma_b^2 - p_k^H R^{-1} p_k - f_{k,l}^H B^H R^{-1} p_k - p_k^H R^{-1} B f_{k,l} - f_{k,l}^H B^H R^{-1} B f_{k,l} + f_{k,l}^H f_{k,l} \tag{48}$$

The expression in (49) is similar in form to the first line of (42) but depends on the ordering l and the associated feedback filter $f_{k,l}$. In the case of perfect feedback, the corresponding expression for the feedforward filter is $w_k = R_{U_l}^{-1} p_k$, where l = 1, 2, ..., K! and we have K! different groups of undetected users, namely, $U_1, U_2, \ldots, U_{K!}$. Therefore, the MMSE for user k is approximately given by

$$J_{MMSE} \approx \arg\min_{1 \le l \le K!} (MSE_{U_l}) \tag{49}$$

where $MSE_{U_l} = \sigma_b^2 - p_k^H R_{U_l}^{-1} p_k$ and the above expression means that the MMSE attained by user k with the optimal ordering is no larger than that of a standard S-DF (with L = 1 and approximate MMSE given by (45)). The approximate MMSE in (50) is indeed proportional to the number of undetected users expressed by the covariance matrix $R_{U_l}$. The key point is that the designer searches over all possible groups of undetected users and selects the one which yields the smallest MSE, resulting in better performance. The main problem is that as K increases the complexity becomes prohibitive and its implementation impractical.
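The perfect-feedback relationships discussed in this Appendix can be checked numerically with a small sketch. The code below is an illustration under assumptions that are not taken from the paper: unit-power symbols ($\sigma_b^2 = 1$), real binary signatures with unit norm, a covariance model $R_U = p_k p_k^H + \sum_{j \in U} p_j p_j^H + \sigma^2 I$, and four hand-picked groups of undetected users standing in for the branches. The helper name `mse_undetected` is hypothetical.

```python
import numpy as np

def mse_undetected(p_k, undetected, sigma2, sigma_b2=1.0):
    """MSE sigma_b^2 - p_k^H R_U^{-1} p_k for a given group of undetected users.

    Assumed model: R_U = p_k p_k^H + sum_j p_j p_j^H + sigma2 * I.
    """
    M = p_k.shape[0]
    R_U = np.outer(p_k, p_k.conj()) + sigma2 * np.eye(M)
    for p_j in undetected:
        R_U = R_U + np.outer(p_j, p_j.conj())
    return float(np.real(sigma_b2 - p_k.conj() @ np.linalg.solve(R_U, p_k)))

rng = np.random.default_rng(0)
M, K, sigma2 = 16, 5, 0.1
P = rng.choice([-1.0, 1.0], size=(K, M)) / np.sqrt(M)  # unit-norm toy signatures
k = 0
others = [j for j in range(K) if j != k]

# Each branch leaves a different group of users undetected when user k is decided.
branches = [others, others[1:], others[2:], others[:1]]
mse_per_branch = [mse_undetected(P[k], [P[j] for j in U], sigma2) for U in branches]

mse_sdf = mse_per_branch[0]                 # one ordering only (L = 1)
mse_spa = min(mse_per_branch)               # best of the L = 4 branches
mse_pdf = mse_undetected(P[k], [], sigma2)  # every other user cancelled
single_user_bound = sigma2 / (np.linalg.norm(P[k]) ** 2 + sigma2)
```

Under these assumptions the branch minimum is never larger than the single-ordering value, and the case with no undetected users coincides with $\sigma^2 / (\|p_k\|^2 + \sigma^2)$, which follows from the matrix inversion lemma applied to $(p_k p_k^H + \sigma^2 I)^{-1}$.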

REFERENCES

[1] S. Verdu, Multiuser Detection. Cambridge, 1998.
[2] S. Verdu, “Minimum probability of error for asynchronous Gaussian multiple-access channels,” IEEE Trans. Inform. Theory, vol. IT-32, no. 1, pp. 85-96, Jan. 1986.
[3] R. Lupas and S. Verdu, “Linear multiuser detectors for synchronous code-division multiple-access channels,” IEEE Trans. Inform. Theory, vol. 35, pp. 123-136, Jan. 1989.
[4] M. Abdulrahman, A. U. K. Sheikh, and D. D. Falconer, “Decision feedback equalization for CDMA in indoor wireless communications,” IEEE J. Select. Areas Commun., vol. 12, no. 4, May 1994.
[5] P. Patel and J. Holtzman, “Analysis of a simple successive interference cancellation scheme in a DS/CDMA systems,” IEEE J. Select. Areas Commun., vol. 12, no. 5, June 1994.
[6] M. K. Varanasi and B. Aazhang, “Multistage detection in asynchronous CDMA communications,” IEEE Trans. Commun., vol. 38, no. 4, pp. 509-19, Apr. 1990.
[7] S. Verdu and S. Shamai, “Spectral efficiency of CDMA with random spreading,” IEEE Trans. Inform. Theory, vol. 45, pp. 622-640, 1999.
[8] P. B. Rapajic, M. L. Honig, and G. K. Woodward, “Multiuser decision-feedback detection: Performance bounds and adaptive algorithms,” in Proc. IEEE Int. Symp. on Inform. Theory, Boston, MA, Aug. 1998, p. 34.
[9] A. Duel-Hallen, “A family of multiuser decision-feedback detectors for asynchronous CDMA channels,” IEEE Trans. Commun., vol. 43, Feb.-Apr. 1995.
[10] M. K. Varanasi and T. Guess, “Optimum decision feedback multiuser equalization with successive decoding achieves the total capacity of the Gaussian multiple-access channel,” in Proc. 31st Asilomar Conf. Signals, Systems and Computers, Monterey, Nov. 1997, pp. 1405-1409.
[11] M. K. Varanasi, “Decision feedback multiuser detection: a systematic approach,” IEEE Trans. Inform. Theory, vol. 45, pp. 219-240, Jan. 1999.
[12] J. Luo, K. R. Pattipati, P. K. Willet, and F. Hasegawa, “Optimal user ordering and time labeling for ideal decision feedback detection in asynchronous CDMA,” IEEE Trans. Commun., vol. 51, no. 11, Nov. 2003.
[13] G. Woodward, R. Ratasuk, M. L. Honig, and P. Rapajic, “Multistage decision-feedback detection for DS-CDMA,” in Proc. IEEE ICC, June 1999.
[14] G. Woodward, R. Ratasuk, M. L. Honig, and P. Rapajic, “Minimum mean-squared error multiuser decision-feedback detectors for DS-CDMA,” IEEE Trans. Commun., vol. 50, no. 12, Dec. 2002.
[15] M. Honig, G. Woodward, and Y. Sun, “Adaptive iterative multiuser decision feedback detection,” IEEE Trans. Wireless Commun., vol. 3, no. 2, Mar. 2004.
[16] G. Barriac and U. Madhow, “PASIC: a new paradigm for low-complexity multiuser detection,” in Proc. Conf. on Inform. Sciences and Systems, The Johns Hopkins University, Mar. 2001.
[17] J. Foerster and L. B. Milstein, “Coding for a coherent DS-CDMA system employing an MMSE receiver in a Rayleigh fading channel,” IEEE Trans. Commun., vol. 48, pp. 1909-1918, June 2000.
[18] W. G. Phoel and M. L. Honig, “Performance of coded DS-CDMA with pilot-assisted channel estimation and linear interference suppression,” IEEE Trans. Commun., vol. 50, pp. 822-832, May 2002.
[19] P. D. Alexander, A. J. Grant, and M. C. Reed, “Iterative detection in code-division multiple-access with error control coding,” Eur. Trans. Telecommun., vol. 9, pp. 419-425, Sept.-Oct. 1998.
[20] M. C. Reed, C. B. Schlegel, P. D. Alexander, and J. A. Asenstorfer, “Iterative multiuser detection for CDMA with FEC: near-single-user performance,” IEEE Trans. Commun., vol. 46, pp. 1693-1699, Dec. 1998.
[21] P. D. Alexander, M. C. Reed, J. Asenstorfer, and C. B. Schlegel, “Iterative multiuser interference reduction: turbo CDMA,” IEEE Trans. Commun., vol. 47, July 1999.
[22] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, pp. 1046-1061, July 1999.
[23] H. El Gamal and E. Geraniotis, “Iterative multiuser detection for coded CDMA signals in AWGN and fading channels,” IEEE J. Select. Areas Commun., vol. 18, pp. 30-41, Jan. 2000.
[24] J. G. Proakis, Digital Communications, 3rd edition. New York: McGraw Hill, 1995.
[25] R. C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank MMSE filtering with interpolated FIR filters and adaptive interpolators,” IEEE Signal Processing Lett., vol. 12, no. 3, Mar. 2005.
[26] F. Vogelbruch and S. Haar, “Low complexity turbo equalization based on soft feedback interference cancellation,” IEEE Commun. Lett., vol. 9, no. 7, July 2005.

Rodrigo Caiado de Lamare received the Diploma in electronic engineering from the Federal University of Rio de Janeiro (UFRJ) in 1998 and the M.Sc. and PhD degrees, both in electrical engineering, from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio) in 2001 and 2004, respectively. From January 2005 to June 2005, he was a Post-Doctoral Fellow at the Center for Studies in Telecommunications (CETUC) of PUC-Rio and from July 2005 to January 2006, he worked as a Post-Doctoral Fellow at the Signal Processing Laboratory, UFRJ. Since January 2006 he has been with the Communications Research Group, Department of Electronics, University of York, United Kingdom, where he is currently Lecturer in Communications Engineering. His research interests lie in communications and signal processing.

Raimundo Sampaio Neto received the Diploma and the M.Sc. degrees, both in electrical engineering, from Pontificia Universidade Católica do Rio de Janeiro (PUC-Rio) in 1975 and 1978, respectively, and the Ph.D. degree in electrical engineering from the University of Southern California (USC), Los Angeles, in 1983. From 1978 to 1979 he was an Assistant Professor at PUC-Rio, and from 1979 to 1983 he was a doctoral student and a Research Assistant in the Department of Electrical Engineering at USC with a fellowship from CAPES. From November 1983 to June 1984 he was a Post-Doctoral fellow at the Communication Sciences Institute of the Department of Electrical Engineering at USC, and a member of the technical staff of Axiomatic Corporation, Los Angeles. He is now a researcher at the Center for Studies in Telecommunications (CETUC) and an Associate Professor of the Department of Electrical Engineering of PUC-Rio, where he has been since July 1984. During 1991 he was a Visiting Professor in the Department of Electrical Engineering at USC. Prof. Sampaio has participated in various projects and has consulted for several private companies and government agencies. He was co-organizer of the Session on Recent Results for the IEEE Workshop on Information Theory, 1992, Salvador. He has also served as Technical Program co-Chairman for IEEE Global Telecommunications Conference (Globecom’99) held in Rio de Janeiro in December 1999 and as a member of the technical program committees of several national and international conferences. He was in office for two consecutive terms for the Board of Directors of the Brazilian Communications Society where he is now a member of its Advisory Council and Associate Editor of the Journal of the Brazilian Communication Society. His areas of interest include communication systems theory, digital transmission, satellite communications and multiuser detection.


Synchronization and Integration Region Optimization for UWB Signals with Non-coherent Detection and Auto-correlation Detection

Rongrong Zhang and Xiaodai Dong

Abstract—Non-coherent detection and auto-correlation detection are promising techniques for low-complexity, low-cost and low-data-rate ultra-wideband communication applications. For both schemes, the integration region of a receiver integrator significantly affects the bit error rate (BER) performance. In this paper, a method for synchronization and for estimating the optimal integration region, i.e., the initial point and the length of the integration, is presented. Following a theoretical BER analysis, a data-aided estimation method using the idea of inter-symbol correlation is proposed. It is shown that, using noise-corrupted received signals, the proposed method is not only practically applicable but also enhances the performance compared to non-optimal timing methods.

Index Terms—Ultra-wideband (UWB), binary pulse position modulation (BPPM), transmitted reference (TR), integration region optimization, data-aided synchronization, auto-correlation detection, non-coherent detection.

I. INTRODUCTION

In recent years, ultra-wideband (UWB) communication for personal area networks has attracted significant interest from researchers and engineers in the communication field. A UWB signal has a large bandwidth, which is either greater than 500 MHz or more than 20% of its center frequency. This ultra-wide bandwidth makes it possible to transmit very short pulses with a low duty cycle, providing greater multipath diversity as more multipath components are resolvable than in the traditional narrowband system. Optimal coherent detection of UWB signals requires formidably complex channel estimation [1]. Hence suboptimal receivers that do not require channel estimation were proposed for low-complexity and low-data-rate applications, using either non-coherent detection of pulse position modulation (PPM) signals (e.g., [2]) or auto-correlation detection of transmitted reference (TR) signals (e.g., [3]). These two schemes simply perform “multiply-integration-and-detection” at the receiver, which requires no channel estimation and only frame or symbol rate sampling. They yield reasonable performance with a sufficiently low complexity. For these schemes, the position and length of the integration region greatly affect their performance, because the start and end points of the integration determine how much signal energy and noise energy will be captured, which in turn determines the bit-error-rate (BER).

This issue has been studied in the literature [4]-[6]. In [4], synchronization for non-coherent schemes was performed iteratively until the integration interval is approximately the same as the signal region (SR), i.e., the time interval during which the transmitted pulses and their multipath components are received. This led to some performance improvement since the noise-only region is excluded from the integration. However, covering the entire signal region is not necessarily an optimal solution. In [5] and [6], the integration region was divided into a large number of smaller intervals, and a weighted summation was performed on the integration results so that the signal-to-noise ratio (SNR) at the detector is maximized. Both [5] and [6] focused on methods of finding the optimal combining weights, while assuming that synchronization was done beforehand and the integration intervals were fixed. This is a different approach from the one in this paper, which performs synchronization first and then determines the position of the integration interval.

In this paper, determining the optimal integration region includes the estimation of two parameters: the start point of the integration and the length of the integration. The start point estimation is a similar problem to the fine synchronization that estimates the time of arrival (TOA) of the first significant tap. Previous literature, e.g., [7]-[8], proposed methods such as energy detection or maximum likelihood detection for the TOA estimation problem. The performance sensitivity of a non-coherent UWB system to timing inaccuracy was derived in [9]. Theoretical discussion on the relationship between BER performance and the integration length of PPM non-coherent receivers and TR receivers can be found in [2] and [10], respectively. Apart from the derived SNR maximization criterion, neither paper gives a practical method for carrying out the optimal integration length estimation. In this paper, we develop an optimum integration region estimation method that is able to perform timing acquisition from the noise-corrupted received signal. The method we propose is based on the idea of maximizing the received SNR captured at the integrator. In particular, we propose a data-aided approach that first performs a frame-level coarse timing, followed by accurate timing acquisition that determines the start point of the detection integration and then estimates the optimal integration length.

Paper approved by Y. Li, the Editor for Wireless Communication Theory of the IEEE Communications Society. Manuscript received April 13, 2006; revised February 19, 2007. This work was presented in part at the IEEE Wireless Communication & Network Conference (WCNC), Hong Kong, March 2007. The authors are with the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada V8W 3P6 (e-mail: {rongrong, xdong}@ece.uvic.ca). Digital Object Identifier 10.1109/TCOMM.2008.060229.

0090-6778/08$25.00 © 2008 IEEE

ZHANG and DONG: SYNCHRONIZATION AND INTEGRATION REGION OPTIMIZATION FOR UWB SIGNALS

The idea of data-aided timing using inter-symbol correlation can be traced back to previous research on the synchronization for UWB signals in [11]-[12]. These two papers studied frame-level synchronization, which is similar to the coarse timing step in our method. Since both [11] and [12] focused on bipolar modulation with coherent detection, they did not deal with the integration length problem. In this paper, we further apply the idea of inter-symbol correlation with a training sequence to non-coherent and auto-correlation schemes to perform fine timing and determine the optimal integration interval. Due to the transmitted reference signal structure, we are able to devise a relatively simple training sequence.

The rest of this paper is organized as follows. Section II describes the system models. A theoretical BER performance analysis of the optimal integration region is given in Section III. Section IV proposes a practical estimation approach using training sequences. Section V presents some simulation results using the developed method and Section VI gives concluding remarks.

II. SYSTEM MODEL

In impulse radio UWB communications, data are conveyed by pulses. Normally, a UWB symbol comprises several frames, with one or more pulses in each frame. Time hopping (TH), delay hopping (DH) or direct sequence (DS) spread spectrum may be applied at the frame level or symbol level so as to achieve multi-user access. Pulse position dithering or polarity scrambling can also be adopted for data transmission to avoid spectral lines in the transmitted signal. In this paper we focus on synchronization with training sequences. Therefore, multiple access and scrambling are not involved in the system model. However, they can be adopted in data transmission once the synchronization is done.

The modulation techniques considered in this paper are binary pulse position modulation (BPPM) and the transmitted reference (TR) scheme. For BPPM, each frame contains a single pulse which is located in either the first half or the second half of the frame, depending on the data being “0” or “1”. For the TR scheme, one frame has two pulses. The first one, which sits at the beginning of the frame, is the reference pulse, and the second one, at the middle of the frame, is the data pulse, which may be the replica of the reference pulse or its antipode. Both BPPM and TR signals can be represented as

$$s^{(i)}(t) = \sqrt{\frac{E_b}{N_f}} \sum_{n=1}^{N_f} \left[ a_1^{(i)} p(t - nT_f) + a_2^{(i)} p(t - nT_f - T_f/2) \right] \tag{1}$$

where (i) is the symbol index, $N_f$ is the total number of frames in a symbol, $T_f$ is the frame duration, $E_b$ is the bit signal energy, and p(t) is the shaping pulse. Variables $a_1^{(i)}$ and $a_2^{(i)}$ are related to both the data and the modulation scheme. For BPPM modulation, $a_1^{(i)} = 1 - d^{(i)}$ and $a_2^{(i)} = d^{(i)}$, where $d^{(i)} \in \{0, 1\}$ is a binary input data; whereas for transmitted reference, $a_1^{(i)} \equiv \sqrt{2}/2$ and $a_2^{(i)} = \sqrt{2} \times d^{(i)} - \sqrt{2}/2$. Every single symbol in (1) is expressed as having its own time axis. The origin of the time axis is set at the time when that particular symbol is transmitted.
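The mapping from a data bit to the pulse amplitudes in (1) can be made concrete with a short sampled-signal sketch. This is only an illustration under assumptions made here: the function names `pulse_amplitudes` and `symbol_waveform`, the discrete-time sampling grid, the rectangular test pulse, and a frame index that runs from 0 rather than 1 (a shift of the time origin by one frame) are not part of the paper.

```python
import numpy as np

def pulse_amplitudes(d, scheme):
    """Return (a1, a2) for a data bit d in {0, 1} under 'bppm' or 'tr'."""
    if d not in (0, 1):
        raise ValueError("d must be 0 or 1")
    if scheme == "bppm":
        return 1.0 - d, float(d)           # pulse in first or second half-frame
    if scheme == "tr":
        return np.sqrt(2) / 2, np.sqrt(2) * d - np.sqrt(2) / 2  # reference, data
    raise ValueError("scheme must be 'bppm' or 'tr'")

def symbol_waveform(d, scheme, pulse, samples_per_frame, num_frames, Eb=1.0):
    """Sampled counterpart of (1): pulses at offsets 0 and Tf/2 in every frame."""
    a1, a2 = pulse_amplitudes(d, scheme)
    half = samples_per_frame // 2
    if len(pulse) > half:
        raise ValueError("pulse must fit inside half a frame")
    s = np.zeros(samples_per_frame * num_frames)
    for n in range(num_frames):
        start = n * samples_per_frame
        s[start:start + len(pulse)] += a1 * pulse
        s[start + half:start + half + len(pulse)] += a2 * pulse
    return np.sqrt(Eb / num_frames) * s

# Toy usage: unit-energy rectangular pulse (a placeholder for p(t)).
pulse = np.ones(4) / 2.0
s_bppm = symbol_waveform(1, "bppm", pulse, samples_per_frame=16, num_frames=3)
s_tr = symbol_waveform(0, "tr", pulse, samples_per_frame=16, num_frames=3)
```

With a unit-energy pulse, both schemes place a total energy of $E_b$ in each symbol: BPPM uses one full-amplitude pulse per frame, while TR splits the same energy equally between the reference and data pulses.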


The UWB channel models described in IEEE 802.15.4a can be generalized as a quasi-static tapped delay line [13]. That is,

$$ h(t) = \sum_{l=1}^{L} \alpha_l\, \delta(t - \tau_l) \tag{2} $$

where $L$ is the total number of resolvable multipath taps, and $\alpha_l$ and $\tau_l$ are the complex magnitude and delay of the $l$th tap, respectively. To avoid inter-frame interference (IFI) as well as inter-pulse interference (IPI) between the received reference and data pulses in the TR scheme, the frame duration $T_f$ is chosen to satisfy $T_f > 2(\tau_L - \tau_1 + T_p)$, where $T_p$ is the duration of $p(t) * f(t)$, $f(t)$ is the receiver filter matched to $p(t)$, and $*$ denotes convolution. The received waveform after passing through $f(t)$ is given by

$$ r^{(i)}(t) = \sqrt{\frac{E_b}{N_f}} \sum_{n=1}^{N_f} \left[ a_1^{(i)}\, g(t - nT_f) + a_2^{(i)}\, g(t - nT_f - T_f/2) \right] + n(t) \tag{3} $$

where $g(t) \triangleq p(t) * h(t) * f(t)$ is the impulse response of the equivalent channel. The non-zero support of $g(t)$ is $[\tau_1, \tau_L + T_p]$, which is shorter than $T_f/2$. The additive band-limited complex Gaussian noise $n(t)$ has a variance of $N_0$. Assuming an ideal low-pass filter (LPF) is used for $f(t)$, the autocorrelation function of $n(t)$ is $R_n(\tau) = 2BN_0\,\mathrm{sinc}(2B\tau)$, where $B$ is the bandwidth of $n(t)$, or equivalently the bandwidth of $f(t)$. An LPF matched to $p(t)$ can also be used in the receiver, bringing only a slight modification to $R_n(\tau)$.

Because the channel information is known beforehand by neither the transmitter nor the receiver, and estimating it from the noise-corrupted received signal is overwhelmingly costly, non-coherent or auto-correlation detection is a good compromise between complexity and performance. Non-coherent detection is used for orthogonal signals such as BPPM. The detector simply calculates the received signal energy in the two possible time slots in one frame, i.e.,

$$ y_1^{(i)} = \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} \left| r^{(i)}(t + jT_f) \right|^2 dt \tag{4} $$

$$ y_2^{(i)} = \sum_{j=1}^{N_f} \int_{T_f/2+T_0}^{T_f/2+T_0+T} \left| r^{(i)}(t + jT_f) \right|^2 dt \tag{5} $$

where $T_0$ and $T$ are the start point and the length of the integration region, respectively. The decision is then made as

$$ y_1^{(i)} \underset{\text{“0”}}{\overset{\text{“1”}}{\lessgtr}} y_2^{(i)}. $$

The auto-correlation detector for TR signals correlates the reference pulse with the data pulse. That is,

$$ y_{TR}^{(i)} = \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} \Re\left\{ r^{(i)}(t + jT_f)\, r^{(i)*}(t + jT_f + T_f/2) \right\} dt \tag{6} $$

where the superscript $*$ denotes complex conjugate, and $\Re(x)$ takes the real part of $x$. Then the decision is made as

$$ y_{TR}^{(i)} \underset{\text{“0”}}{\overset{\text{“1”}}{\gtrless}} 0. $$

Since $T_0$ and $T$ determine the receiving SNR, finding proper values for these two parameters is crucial to the performance


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

of the system. Next we present a method of estimating the optimal values for these two parameters.

III. BER PERFORMANCE ANALYSIS

In this section, the performance analysis will first focus on PPM modulation, and the results can be easily extended to the transmitted reference scheme thereafter. In the absence of IFI and IPI, demodulation and detection can be carried out symbol by symbol. Thus in the subsequent discussion in this section the index $i$ will be omitted for notational brevity. Intuitively, since more noise will be counted into the decision statistics with a longer integration length, the integration region should be chosen within the non-zero support of $g(t)$, i.e., $T_0 \geq \tau_1$ and $T_0 + T \leq \tau_L + T_p$. Suppose that in the $i$-th symbol interval $d = 0$ is transmitted; the detector outputs are then given by (7) and

$$ y_2 = \sum_{j=1}^{N_f} \int_{T_f/2+T_0}^{T_f/2+T_0+T} |n(t + jT_f)|^2\, dt \triangleq \zeta_4, \tag{8} $$

where $E_{cap}(T_0, T) = E_b \int_{T_0}^{T_0+T} |g(t)|^2\, dt$ is the signal energy captured in the integration, $\zeta_1$ and $\zeta_2$ are the signal-noise cross terms, and $\zeta_3$ and $\zeta_4$ are the noise-noise cross terms. It can be proved that when $T \gg T_p$, they can be approximated as independent Gaussian random variables whose distributions are respectively $\zeta_1, \zeta_2 \sim \mathcal{N}\left(0, N_0 E_{cap}(T_0, T)\right)$ and $\zeta_3, \zeta_4 \sim \mathcal{N}\left(2N_f N_0 BT,\ 2N_f N_0^2 BT\right)$. Similar to the expression given in [2], the bit error rate of BPPM non-coherent detection is given by

$$ P_{e,PPM} = P(y_1 < y_2) = P\left(E_{cap}(T_0, T) + \zeta_1 + \zeta_2 + \zeta_3 - \zeta_4 < 0\right) = Q\left( \sqrt{\frac{E_{cap}^2(T_0, T)}{4N_f N_0^2 BT + 2N_0 E_{cap}(T_0, T)}} \right) \tag{9} $$

where $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$. Eq. (9) shows that the BER of the BPPM signal largely depends on the choice of $T_0$ and $T$. Therefore the optimization of $T_0$ and $T$ is equivalent to the maximization

$$ (\hat{T}_0, \hat{T}) = \arg\max_{T_0, T} \frac{E_{cap}^2(T_0, T)}{4N_f N_0^2 BT + 2N_0 E_{cap}(T_0, T)}. \tag{10} $$

Noticing that most of the energy in $h(t)$ is in the front part and the latter part contains relatively scattered and low-energy pulses, by including the latter part of SR in the integration interval the additional collected signal energy may not compensate for the additional noise energy. Therefore, the optimal integration length $T$ is usually smaller than the length of SR. Although the two-dimensional maximization in (10) can be solved by trying all the possible values within $T_0 \in [\tau_1, \tau_L + T_p]$ and $T \in [0, \tau_L + T_p - T_0]$ for both variables, this approach has a computational complexity of $O\left( \left( \frac{\tau_L + T_p - \tau_1}{\Delta t} \right)^2 \right)$, which grows rapidly as $\tau_L + T_p - \tau_1$ gets larger or the trial step size $\Delta t$ gets smaller. In order to alleviate the computational task, we hope to fix one degree of freedom first and then perform the maximization over the other variable. It is found through simulation that deeming $T_0$ as the arrival time of the first “significant” tap, i.e., the first tap whose captured energy exceeds a small threshold $\xi$, involves only negligible performance degradation compared to the optimum. Thus a threshold crossing (TC) scheme can be used in the estimation of $T_0$. That is,

$$ \begin{cases} \hat{T}_0 = \max\left\{ \tau \mid E_{cap}(\tau_1, \tau - \tau_1) < \xi \right\} \\[4pt] \hat{T} = \arg\max_{T} \dfrac{E_{cap}^2(\hat{T}_0, T)}{4N_f N_0^2 BT + 2N_0 E_{cap}(\hat{T}_0, T)}. \end{cases} \tag{11} $$

A similar derivation leads to the optimal integration interval of the transmitted reference signals. Again, suppose $d = 0$ is transmitted; the auto-correlator output given by (6) can be expressed as shown in (12). Similar to the PPM case, the noise terms $\zeta_1, \zeta_2, \zeta_3$ can be approximated as independent Gaussian random variables when $T \gg T_p$, with distributions $\zeta_1, \zeta_2 \sim \mathcal{N}\left(0, \frac{N_0 E_{cap}(T_0, T)}{4}\right)$ and $\zeta_3 \sim \mathcal{N}\left(0, N_f N_0^2 BT\right)$. Thus the BER of a TR receiver can be written as

$$ P_{e,TR} = P(y_{TR} > 0) = P\left(-E_{cap}(T_0, T)/2 + \zeta_1 + \zeta_2 + \zeta_3 > 0\right) = Q\left( \sqrt{\frac{E_{cap}^2(T_0, T)}{4N_f N_0^2 BT + 2N_0 E_{cap}(T_0, T)}} \right). \tag{13} $$

A similar expression for TR can be found in [14]. From (13), the optimal integration interval for the TR system is given by

$$ (\hat{T}_0, \hat{T}) = \arg\max_{T_0, T} \frac{E_{cap}^2(T_0, T)}{4N_f N_0^2 BT + 2N_0 E_{cap}(T_0, T)}. \tag{14} $$

Fig. 1. Received TR signal (noiseless) $r^{(i)}(t)$ and $r^{(i+1)}(t)$, $N_f = 4$.
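The exhaustive two-dimensional search over (10) and (14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampled energy profile `PROFILE`, the 1 ns grid `DT`, and the parameter values passed to `best_region` are hypothetical, and the function names are not taken from the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), as defined after (9)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def region_metric(e_cap, n_f, n0, bandwidth, t_int):
    """Objective maximized in (10) and (14) for captured energy e_cap and length t_int."""
    return e_cap ** 2 / (4.0 * n_f * n0 ** 2 * bandwidth * t_int + 2.0 * n0 * e_cap)


def best_region(g_energy, dt, eb, n_f, n0, bandwidth):
    """Try every (start, length) pair on the sample grid; returns (metric, start, length)."""
    best = None
    for start in range(len(g_energy)):
        for length in range(1, len(g_energy) - start + 1):
            # E_cap(T0, T) = Eb * integral of |g(t)|^2, approximated by a Riemann sum.
            e_cap = eb * sum(g_energy[start:start + length]) * dt
            metric = region_metric(e_cap, n_f, n0, bandwidth, length * dt)
            if best is None or metric > best[0]:
                best = (metric, start, length)
    return best


# Hypothetical |g(t)|^2 samples on a 1 ns grid, normalized to unit total energy.
DT = 1e-9
PROFILE = [v / DT for v in (0.0, 0.0, 0.6, 0.3, 0.08, 0.02)]
metric, start, length = best_region(PROFILE, DT, eb=1.0, n_f=1, n0=0.05, bandwidth=500e6)
ber = q_function(math.sqrt(metric))
```

For this toy profile the search skips the two empty leading samples and stops before the weakest tail sample, which mirrors the observation that the optimal integration length is usually shorter than the full channel response. The double loop is the quadratic-cost search discussed above, which the threshold crossing scheme in (11) is meant to avoid.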

IV. OPTIMIZATION USING TRAINING SEQUENCE

Comparing (9) and (13), the BER performance of BPPM and TR is exactly the same, provided that all the parameters in (9) and (13) are equal. This implies that the optimum integration region of these two schemes should be exactly the same for the same channel condition. In this section, a data-aided $\{T_0, T\}$ estimation method for both schemes is presented. Because the designed training sequence is based on the TR signal structure, we introduce our method for the TR scheme. However, exactly the same training pulses and estimation method apply to the BPPM scheme, since both schemes have the same optimal integration region.

Suppose at the $i$-th bit interval, $d^{(i)} = 1$, and at the $(i+1)$-th bit interval, $d^{(i+1)} = 0$. Fig. 1 depicts the noiseless version of the two received TR signals $r^{(i)}(t)$ and $r^{(i+1)}(t)$ with $N_f = 4$. The integration region optimization procedure consists of three steps:

1) Coarse timing

Let the receiver start at an arbitrary point $\tau_0 = iN_f T_f + \tilde{\tau}$ at the very beginning, where $i$ is an integer and $\tilde{\tau}$ is in the

ZHANG and DONG: SYNCHRONIZATION AND INTEGRATION REGION OPTIMIZATION FOR UWB SIGNALS



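$$ \begin{aligned} y_1 ={}& E_b \int_{T_0}^{T_0+T} |g(t)|^2\, dt + \sqrt{\frac{E_b}{N_f}} \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} g(t)\, n^*(t + jT_f)\, dt \\ &+ \sqrt{\frac{E_b}{N_f}} \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} g^*(t)\, n(t + jT_f)\, dt + \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} |n(t + jT_f)|^2\, dt \\ \triangleq{}& E_{cap}(T_0, T) + \zeta_1 + \zeta_2 + \zeta_3 \end{aligned} \tag{7} $$

$$ \begin{aligned} y_{TR} ={}& -\frac{E_b}{2} \int_{T_0}^{T_0+T} |g(t)|^2\, dt + \Re\left\{ \sqrt{\frac{E_b}{2N_f}} \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} g(t)\, n^*\!\left(t + jT_f + \frac{T_f}{2}\right) dt \right\} \\ &+ \Re\left\{ \sqrt{\frac{E_b}{2N_f}} \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} g^*(t)\, n(t + jT_f)\, dt \right\} + \Re\left\{ \sum_{j=1}^{N_f} \int_{T_0}^{T_0+T} n(t + jT_f)\, n^*\!\left(t + jT_f + \frac{T_f}{2}\right) dt \right\} \\ \triangleq{}& -E_{cap}(T_0, T)/2 + \zeta_1 + \zeta_2 + \zeta_3 \end{aligned} \tag{12} $$

$$ I_j = \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tilde{\tau}}^{\tilde{\tau}+\frac{T_f}{2}} r^{(i)}\!\left(t + nT_f + j\frac{T_f}{2}\right) r^{*(i)}\!\left(t + nT_f + \frac{T_f}{2} + j\frac{T_f}{2}\right) dt \right\}, \quad j = 0, \cdots, 2N_f - 1. \tag{15} $$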

region $[\tau_L + T_p - T_f/2,\ \tau_L + T_p + (N_f - 1/2)T_f]$ as shown in Fig. 1. Perform the integrations shown in (15). If for a certain $j$, $\tilde{\tau} + jT_f/2$ falls in the first half of the first frame in a symbol plus a small interval from the end of the last received pulse of the previous symbol to the beginning of the current symbol, i.e., $\tilde{\tau} + jT_f/2 \in [\tau_L + T_p - T_f/2,\ \tau_1 + T_f/2]$, the noiseless part of $I_j$ will be $E_b/2$. Then for another $j$, if $\tilde{\tau} + jT_f/2$ falls in the second half of the first frame, the noiseless part of $I_j$ becomes $\frac{E_b}{2}\left[1 - 2E(\dot{\tilde{\tau}})/N_f\right] < \frac{E_b}{2}$, where $\dot{\tilde{\tau}} \triangleq \left((\tilde{\tau} - \tau_1) \bmod \frac{T_f}{2}\right) + \tau_1$. For other $j$'s, $\tilde{\tau} + jT_f/2$ falls into other frames of the current symbol, and the noiseless part of $I_j$ will be even smaller. Therefore simply shifting $\tau$ to

$$ \hat{\tau} = \tau_0 + \frac{T_f}{2} \arg\max_j I_j \tag{16} $$

will ensure that $\hat{\tau}$ is in the first half of the first frame in a symbol plus a small interval before it, i.e., $[iN_f T_f + \tau_L + T_p - T_f/2,\ iN_f T_f + \tau_1 + T_f/2]$ or $[(i+1)N_f T_f + \tau_L + T_p - T_f/2,\ (i+1)N_f T_f + \tau_1 + T_f/2]$, which is the prerequisite of the following fine timing steps.

2) Estimating $T_0$, or equivalently $\tau_1$

After the coarse timing step we have $\hat{\tau} \in [\tau_L + T_p - T_f/2,\ \tau_1 + T_f/2]$, which is indicated in Fig. 1. Note that since whether $\hat{\tau}$ is at the beginning of the $i$-th symbol or the $(i+1)$-th symbol does not affect the fine timing step, parameter $i$ is omitted here for brevity. Now perform the following integration

$$ x(\tau) = \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tau}^{\tau+\frac{T_f}{2}} r^{(i)}(t + nT_f)\, r^{(i+1)*}(t + nT_f)\, dt \right\}, \quad \hat{\tau} - \frac{T_f}{2} \leq \tau \leq \hat{\tau} + \frac{T_f}{2} \tag{17} $$

for all $\tau$'s in the range $[\hat{\tau} - \frac{T_f}{2},\ \hat{\tau} + \frac{T_f}{2}]$, which is just the correlation between the shadowed parts of $r^{(i)}(t)$ and $r^{(i+1)}(t)$ in Fig. 1. Define $E(\tau) \triangleq \int_0^{\tau} |g(t)|^2\, dt$; then

$$ x(\tau) = \frac{E_b}{2}\left[1 - 2E(\tau)\right] + \zeta(\tau) \tag{18} $$

where $\zeta(\tau)$ is approximately a zero-mean Gaussian noise process with variance $\sigma_\zeta^2 = (N_f N_0^2 BT_f + N_0 E_b)/2$. A plot of $x(\tau)$ for $\tau \geq \tau_L + T_p - T_f/2$ including the effect of noise $\zeta$ is illustrated in Fig. 2. When $\tau \in [\tau_L + T_p - T_f/2,\ \tau_1]$, which is the blank space from the end of the last received pulse to the beginning of the current pulse, $E(\tau)$ achieves its minimum value 0, which leads $x(\tau)$ to its maximum $E_b/2$ if $\zeta(\tau)$ is neglected. When $\tau < \tau_L + T_p - T_f/2$, $x(\tau)$ increases monotonically, whereas when $\tau > \tau_1$, $x(\tau)$ decreases monotonically. In other words, a noiseless $x(\tau)$ has a plateau shape with a flat top in the interval $\tau \in [\tau_L + T_p - \frac{T_f}{2},\ \tau_1]$. Thus we can estimate $T_0$ as

$$ \hat{T}_0 = \max\left\{ \tau \mid x(\tau) > x_0,\ \hat{\tau} - \frac{T_f}{2} \leq \tau \leq \hat{\tau} + \frac{T_f}{2} \right\} \tag{19} $$

where the threshold $x_0 \approx E_b\left(\frac{1}{2} - \xi\right)$. The rule used to decide the value of $x_0$ is given later.

Fig. 2. Typical curve of $x(\tau)$ with detailed display of the threshold and estimated $T_0$.


3) Estimating $T$

Since $\hat{T}_0$ is obtained, we can confine candidates of $T$ in the region $[\hat{T}_0,\ \hat{T}_0 + T_f/2]$. Comparing the definitions of $E(\tau)$ and $E_{cap}(\hat{T}_0, T)$ and using (13) and (18), $T$ can be estimated as

$$ \hat{T} = \arg\max_{\tau} \frac{\left[x(\hat{T}_0) - x(\tau + \hat{T}_0)\right]^2}{4N_f N_0^2 B\tau + 2N_0\left[x(\hat{T}_0) - x(\tau + \hat{T}_0)\right]}. \tag{20} $$

Defining the SNR $\gamma \triangleq E_b/N_0$, and normalizing $x(\tau)$ to $\tilde{x}(\tau) \triangleq x(\tau)/E_b$, (20) can be rewritten as

$$ \hat{T} = \arg\max_{\tau} \frac{\gamma^2\left[\tilde{x}(\hat{T}_0) - \tilde{x}(\tau + \hat{T}_0)\right]^2}{4N_f B\tau + 2\gamma\left[\tilde{x}(\hat{T}_0) - \tilde{x}(\tau + \hat{T}_0)\right]}. \tag{21} $$

To use (20) or (21), knowledge of the SNR $\gamma$ is required at the receiver. When $\gamma$ is not too large, e.g. when $\gamma < 15$ dB for CM1 and when $\gamma < 20$ dB for CM8, which are quite reasonable in practical environments, the second term in the denominator is much smaller than the first one. For example, when $N_f = 4$, $B = 500$ MHz, $T_f = 1\ \mu$s, $\gamma = 17$ dB, $\tau$ will be more than 125 ns for a typical CM8 channel, thus the first term $4N_f B\tau > 1000$ while the second term $2\gamma\left(\tilde{x}(\hat{T}_0) - \tilde{x}(\tau + \hat{T}_0)\right) < 100$. Therefore we can simply ignore the second term in the denominator of (20) or (21) so that they can be further simplified to

$$ \hat{T} = \arg\max_{\tau} \frac{\left[x(\hat{T}_0) - x(\tau + \hat{T}_0)\right]^2}{\tau}. \tag{22} $$

Eq. (22) is the formula adopted in Section V to estimate $T$. For a system operating in a high SNR environment we can use (21) instead of (22), substituting $\gamma$ by a fixed SNR value less than the true SNR. It is found in our simulation that if the true SNR is in the 10-20 dB region, $\gamma$ fixed at 10 dB will work well.

The discussion so far has not considered the nuisance noise term $\zeta(\tau)$. When the noise term is taken into account, the useful signal waveform will be distorted, sometimes even inundated, making accurate estimation almost impossible without the necessary denoising treatment. Since $\zeta(\tau)$ is approximately a Gaussian random process, an effective way of denoising is by averaging. To better fulfill this task, a training sequence is implemented.
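The simplification from (21) to (22) above rests on the relative size of the two denominator terms, which can be checked numerically for the quoted example. In the short Python sketch below, the helper names and the value 0.9 used for the normalized difference $\tilde{x}(\hat{T}_0) - \tilde{x}(\tau + \hat{T}_0)$ are assumptions made for illustration; by (18) that difference cannot exceed 1 when $g(t)$ has unit energy.

```python
def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)


def denominator_terms(n_f, bandwidth_hz, tau_s, snr_db, x_diff):
    """Return the two denominator terms of (21): (4*Nf*B*tau, 2*gamma*x_diff)."""
    first = 4.0 * n_f * bandwidth_hz * tau_s
    second = 2.0 * db_to_linear(snr_db) * x_diff
    return first, second


# Example values quoted in the text; x_diff = 0.9 is a hypothetical illustration.
first, second = denominator_terms(n_f=4, bandwidth_hz=500e6, tau_s=125e-9,
                                  snr_db=17.0, x_diff=0.9)
```

With these inputs the first term evaluates to about 1000 and the second to about 90, so dropping the second term changes the denominator by less than ten percent in this example.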
Observing (17), we can see that $x(\tau)$ is essentially the correlation of two consecutive TR symbols, one conveying data “1” and the other conveying “0”. Thus a training sequence can be designed such that “0”s and “1”s appear alternately, i.e.,

$$ d_t^{(i)} = (-1)^{i-1}, \quad i = 1, \cdots, N \tag{23} $$

where the even integer $N$ denotes the length of the sequence. With the $N$-symbol training sequence, there are now $N - 1$ pairs of consecutive symbols that can be used to generate $x(\tau)$. We can average the $N - 1$ correlation outcomes to arrive at a revised $x(\tau)$ as

$$ \bar{x}(\tau) = \frac{1}{N-1} \sum_{i=1}^{N-1} \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tau}^{\tau+\frac{T_f}{2}} r^{(i)}(t + nT_f) \cdot r^{*(i+1)}(t + nT_f)\, dt \right\}. \tag{24} $$

It is easy to find that $\bar{x}(\tau)$ has a smaller noise variance of $\sigma_{\bar{\zeta}}^2 = (N_f N_0^2 BT_f + N_0 E_b)/\left[2(N - 1)\right]$.
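To give a feel for how quickly the averaging in (24) suppresses the noise, the following Python sketch evaluates the standard deviation of the noise term with and without averaging. The function names and the numerical values ($N_0 = 0.05$, $E_b = 1$, $B = 500$ MHz, $T_f = 1\ \mu$s) are hypothetical and chosen only for illustration.

```python
import math


def sigma_single(n_f, n0, bandwidth, t_f, eb):
    """Standard deviation of zeta(tau) in (18), without averaging."""
    return math.sqrt((n_f * n0 ** 2 * bandwidth * t_f + n0 * eb) / 2.0)


def sigma_averaged(n_f, n0, bandwidth, t_f, eb, n_train):
    """Standard deviation after averaging N - 1 symbol pairs as in (24)."""
    if n_train < 2:
        raise ValueError("at least two training symbols are needed")
    variance = (n_f * n0 ** 2 * bandwidth * t_f + n0 * eb) / (2.0 * (n_train - 1))
    return math.sqrt(variance)


# Hypothetical operating point used only to show the trend with N.
PARAMS = dict(n_f=1, n0=0.05, bandwidth=500e6, t_f=1e-6, eb=1.0)
sigmas = {n: sigma_averaged(n_train=n, **PARAMS) for n in (8, 16, 32, 64, 128)}
```

The standard deviation falls as $1/\sqrt{N-1}$, so going from $N = 8$ to $N = 128$ reduces it by a factor of $\sqrt{127/7} \approx 4.3$ in this sketch.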

Furthermore, if we are able to implement analog averaging over several symbols, the variance of the noise term can be further reduced by first averaging over the analog symbol waveforms and then correlating the averaged signals as shown in (25). The variance of the noise term of $\bar{\bar{x}}(\tau)$ is $\sigma_{\bar{\bar{\zeta}}}^2 = (2N_f N_0^2 BT_f + N N_0 E_b)/N^2$. Obviously, a longer training sequence will result in smaller $\sigma_{\bar{\zeta}}^2$ and $\sigma_{\bar{\bar{\zeta}}}^2$, which makes the estimation more accurate. With a large $N$, $\sigma_{\bar{\zeta}}^2$ will be much greater than $\sigma_{\bar{\bar{\zeta}}}^2$, thus the improvement of $\bar{\bar{x}}(\tau)$ becomes more prominent than that of $\bar{x}(\tau)$.

The value of the threshold $x_0$ can be decided according to the chosen averaging scheme. In simulation, we choose $x_0$ as $\max\{x(\tau)\} - 2\sigma_\zeta$ if no denoising process is employed at all. If (24) or (25) is used for averaging, the threshold should be adjusted to $\bar{x}_0 = \max\{\bar{x}(\tau)\} - 2\sigma_{\bar{\zeta}}$ or $\bar{\bar{x}}_0 = \max\{\bar{\bar{x}}(\tau)\} - 2\sigma_{\bar{\bar{\zeta}}}$, respectively. An example of the threshold $x_0$ and the estimated $\hat{T}_0$ is shown in Fig. 2.

Similar to (24) and (25), an averaging process can also be performed in the coarse timing step, using (26) or (27) instead of (15), where $M$ is the length of the training sequence that is used for the coarse timing. It may or may not be equal to $N$. One thing worth noting is that with coarse timing and fine synchronization together, the length of the training sequence should be at least $M + N$, so that $M$ training symbols are used for coarse timing and another $N$ symbols are used for fine synchronization and estimating the optimal integration length $T$.

Furthermore, some implementation considerations are presented in the following. First, the above discussion of the timing method is based on the continuous signal. To search for the estimates of $T_0$ and $T$ exhaustively in the continuous regions $[\hat{\tau} - \frac{T_f}{2},\ \hat{\tau} + \frac{T_f}{2}]$ and $[\hat{T}_0,\ \hat{T}_0 + T_f/2]$, respectively, is a formidable task, if not an impossible one. To reduce the complexity, the search can be performed on a discrete series of time instants instead. The distance between adjacent time instants, denoted by $T_b$, depends on the required synchronization resolution. Since the searching process requires the whole training sequence to be sent once for every calculation of $x(\tau)$, a long preamble would be needed, which affects the achievable data rate. To avoid this problem we can substitute the integration in (17) with a series of integrations over smaller intervals, as shown in (28), where $Q = \left\lfloor \frac{T_f}{2T_b} \right\rfloor$ and $\lfloor x \rfloor$ takes the integer part of $x$. The $x_k$'s are stored in registers and then (17) can be obtained by digital processing as

$$ x(\tau)\big|_{\tau = \hat{\tau} + kT_b} = \sum_{j=k}^{k+Q-1} x_j, \quad k = -Q, \cdots, Q - 1. \tag{29} $$

Consequently, eqs. (19) and (22) can be expressed as

$$ \hat{T}_0 = T_b \max\left\{ k \mid x(\hat{\tau} + kT_b) > x_0,\ k = -Q, \cdots, Q - 1 \right\} \tag{30} $$

and

$$ \hat{T} = T_b \cdot \arg\max_{k} \frac{\left[x(\hat{T}_0) - x(kT_b + \hat{T}_0)\right]^2}{kT_b}, \quad k = 1, \cdots, Q \tag{31} $$

and (24)-(25) can be modified accordingly.
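The register-based processing in (29)-(31) can be sketched as follows. This is a minimal illustration, assuming the short-interval integrals $x_j$ of (28) are already available as a list; the function names, the noiseless toy values in `XK`, and the threshold 0.45 are hypothetical. One further assumption is labeled in the code: the search over $k$ in (31) stops where the stored window sums run out.

```python
def sliding_sums(xk, q):
    """Eq. (29): x(tau_hat + k*Tb) for k = -Q..Q-1; xk[j + q] holds x_j, j = -Q..2Q-1."""
    if len(xk) != 3 * q:
        raise ValueError("expected 3*Q short-interval integrals")
    return [sum(xk[k + q:k + 2 * q]) for k in range(-q, q)]


def estimate_t0_index(x, q, x0):
    """Eq. (30): largest k whose window sum exceeds the threshold x0."""
    above = [m - q for m, value in enumerate(x) if value > x0]
    if not above:
        raise ValueError("no sample exceeds the threshold")
    return max(above)


def estimate_t_index(x, q, k0):
    """Eq. (31): k maximizing [x(T0) - x(T0 + k*Tb)]^2 / k (constant Tb omitted)."""
    best_k, best_value = None, float("-inf")
    for k in range(1, q + 1):
        m = k0 + k + q
        if m >= len(x):
            break  # assumption: stop when no stored window sum is left
        diff = x[k0 + q] - x[m]
        value = diff * diff / k
        if value > best_value:
            best_k, best_value = k, value
    return best_k


# Noiseless toy example with Q = 4 and Eb = 1: positive values come from the
# reference-pulse response, negative values from the data-pulse response.
Q = 4
XK = [0.0, -0.3, -0.15, -0.05, 0.0, 0.3, 0.15, 0.05, 0.0, -0.3, -0.15, -0.05]
x = sliding_sums(XK, Q)
k0 = estimate_t0_index(x, Q, x0=0.45)
k_best = estimate_t_index(x, Q, k0)
```

In this toy case the plateau of $x(\tau)$ ends one step after $\hat{\tau}$, and the estimated integration length is two steps of $T_b$. Multiplying the returned indices by $T_b$ gives the estimates in time units.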

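$$ \bar{\bar{x}}(\tau) = \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tau}^{\tau+T_f/2} \left[ \frac{2}{N} \sum_{i=1}^{N/2} r^{(2i-1)}(t + nT_f) \right] \cdot \left[ \frac{2}{N} \sum_{i=1}^{N/2} r^{*(2i)}(t + nT_f) \right] dt \right\} \tag{25} $$

$$ \bar{I}_j = \frac{1}{M} \sum_{i=0}^{M-1} \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tilde{\tau}}^{\tilde{\tau}+\frac{T_f}{2}} r^{(k+i)}\!\left(t + nT_f + j\frac{T_f}{2}\right) \times r^{*(k+i)}\!\left(t + nT_f + \frac{T_f}{2} + j\frac{T_f}{2}\right) dt \right\}, \quad j = 0, \cdots, 2N_f - 1 \tag{26} $$

$$ \begin{aligned} \bar{\bar{I}}_j ={}& \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tilde{\tau}}^{\tilde{\tau}+\frac{T_f}{2}} \left[ \frac{2}{M} \sum_{i=0}^{\frac{M}{2}-1} r^{(k+2i)}\!\left(t + nT_f + j\frac{T_f}{2}\right) \right] \left[ \frac{2}{M} \sum_{i=0}^{\frac{M}{2}-1} r^{*(k+2i)}\!\left(t + nT_f + \frac{T_f}{2} + j\frac{T_f}{2}\right) \right] dt \right\} \\ &+ \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\tilde{\tau}}^{\tilde{\tau}+\frac{T_f}{2}} \left[ \frac{2}{M} \sum_{i=0}^{\frac{M}{2}-1} r^{(k+2i+1)}\!\left(t + nT_f + j\frac{T_f}{2}\right) \right] \left[ \frac{2}{M} \sum_{i=0}^{\frac{M}{2}-1} r^{*(k+2i+1)}\!\left(t + nT_f + \frac{T_f}{2} + j\frac{T_f}{2}\right) \right] dt \right\}, \quad j = 0, \cdots, 2N_f - 1 \end{aligned} \tag{27} $$

$$ x_k = \Re\left\{ \sum_{n=0}^{N_f-1} \int_{\hat{\tau}+kT_b}^{\hat{\tau}+(k+1)T_b} r^{(i)}(t + nT_f)\, r^{(i+1)*}(t + nT_f)\, dt \right\}, \quad k = -Q, \cdots, 2Q - 1 \tag{28} $$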

Note that the algorithm still requires an analog delay line as long as a symbol duration, which is beyond the capability of a normal wideband analog delay line. This is a well-known open problem in the TR literature. Future research will focus on modifying the algorithm to avoid the long delay requirement.

V. SIMULATION RESULTS

In this section, simulation results of PPM non-coherent detection and the transmitted reference scheme are presented for the IEEE 802.15.4a CM1 and CM8 channel models. The default parameters for the simulations are as follows: number of frames in a symbol $N_f = 1$, frame duration $T_f = 1\ \mu$s, bandwidth $B = 494$ MHz, sampling rate for the simulation $f_s = 3.952$ GHz, the shaping pulse $p(t)$ is a root raised cosine pulse with roll-off factor $\beta = 0.25$, and the resolution $T_b$ equals the simulation sampling period. All the simulation results are the average of the error rate over 100 different channel realizations. Synchronization is performed on each of the channel realizations.

Figs. 3-4 represent four different scenarios, including PPM modulation in the CM1 channel model, the TR scheme in the CM1 channel model, PPM modulation in the CM8 channel model, and the TR scheme in the CM8 channel model. In all simulations, we assume that the coarse timing is perfectly done beforehand and we only focus on the fine timing steps. Each figure contains five BER vs. SNR curves. Among them, the theoretically optimal curve stands for the method given by (10) and (14), with the assumption that full channel state information is known at the receiver. The “Estimated with (24)” curve represents the proposed estimation method using eqs. (19) and (22)-(24). Eq. (23) is used to construct a training sequence, (19) and (22) are used to estimate $T_0$ and $T$ respectively, and eq. (24) is used for averaging. The “Estimated with (25)” curve represents the estimation method that utilizes almost the same equations as the “Estimated with (24)” method, except that this time eq. (25) is employed to do the averaging. It is clearly shown that this estimation method has a very close performance to the theoretically optimal method. The performance of the method using (24) for denoising is slightly worse than that of the method using (25), because the noise variance of (24) is larger than that of (25). We have also performed simulations using (21) instead of (22), by fixing $\gamma$ to a value smaller than the actual SNR. Our results show negligible difference between the two for both CM1 and CM8 channels.

The remaining two non-optimal methods, one of which determines the integration region from the amplitude of the channel gain while the other fixes the integration length to a constant, are included in Figs. 3-4 for comparison to show the benefit of performing integration region optimization. The “Scale=0.5” method defines the integration start and end points to be the first and last taps that have magnitude greater than or equal to 50% of the strongest tap. Since it uses the channel amplitude instead of the channel energy to determine the integration region, this approach has worse performance than the theoretical and estimated optimal ones. Note that this method also requires channel state information, and thus is difficult to implement. The “Fixed length” curve integrates the signal from the beginning of a frame (assumed known) over a fixed integration length, which is set to 25 ns for the CM1 channel and 100 ns for the CM8 channel. Apparently, this non-adaptive method gives the worst performance among the five. CM1 is found to be more sensitive to the choice of integration region than CM8. The parameters in the scale method and the fixed length method are carefully chosen through simulation trials such that the average BER over different channel realizations is reduced as much as possible.

Figs. 5-6 display the effect of different training sequence lengths on the BER performance. Since the PPM non-coherent detection case and the TR scheme are equivalent in terms of BER performance, we plotted the curves of using (24) with PPM non-coherent detection and the curves of using (25) with the TR scheme. Actually the results are interchangeable for PPM and TR. As discussed in the previous section, a longer training sequence results in a smaller noise variance and consequently a lower BER. Generally, if (24) is used, a training sequence with $N = 64$ is sufficient, while if (25) is used, $N = 32$ or more makes very little difference to the BER performance.

Fig. 7 shows the simulation results of the $N_f = 2$ and $N_f = 4$ cases in CM1 channels for the TR scheme. Compared to Figs. 3-4, we can find that the difference between a multi-frame signal structure and a single-frame structure is that the former performs integration on multiple intervals, thus involves more noise and has a worse performance. For example, the performance of an $N_f = 2$ system is roughly 1.5 dB worse than that of an $N_f = 1$ system, and the same difference exists between an $N_f = 4$ system and an $N_f = 2$ system.

Fig. 3. BER obtained with 5 different integration regions for PPM non-coherent detection in CM1 and CM8 channel realizations. (Axes: BER versus SNR in dB.)

Fig. 4. BER obtained with 5 different integration regions for transmitted reference in CM1 and CM8 channel realizations. (Axes: BER versus SNR in dB.)

Fig. 5. The effect of different training sequence length on the performance of PPM non-coherent detection in CM1 and CM8 channel realizations. Eq. (24) is used for averaging. (Axes: BER versus SNR in dB; curves for N = 8, 16, 32, 64, 128.)

Fig. 6. The effect of different training sequence length on the performance of transmitted reference in CM1 and CM8 channel realizations. Eq. (25) is used for averaging. (Axes: BER versus SNR in dB; curves for N = 8, 16, 32, 64, 128.)

Fig. 7. The BER performance of 5 different integration region determining methods. The top plot represents the $N_f = 2$ case, the bottom plot represents the $N_f = 4$ case. (Axes: bit error rate versus $E_b/N_0$.)

In Fig. 8, the effect of $T_0$ or $T$ mistiming is evaluated. We can see that the BER performance in CM1 channels is quite sensitive to the accuracy of $T_0$ and $T$ estimation. The BER performance in CM8 channels is less sensitive to the estimation accuracy, but estimation errors can still cause visible BER degradation. Fig. 9 plots the distributions of the optimal $T_0$, $T$ and the corresponding BER of 50 channel realizations for CM1 and CM8. It is shown that both $T_0$ and $T$ may vary over a large range between different channel realizations, and the corresponding BER also changes greatly. Therefore, for every single channel realization an integration region optimization is worthwhile for the non-coherent detection and auto-correlation detection systems.

We have also tested the coarse timing step of our algorithm but the figures are not shown here. We find that when the SNR is greater than 11 dB in CM1 channels or greater than 15 dB in CM8 channels, $M = 16$ is sufficient if applying (27) and $M = 64$ if employing (26) to ensure 90% probability of correct coarse timing. Our simulation also shows that the coarse timing step of the proposed method is more reliable than the methods presented in [11] and [12]. This is mainly because the method of [11], in which a {+1, +1, -1, -1} pattern training sequence is employed, utilizes only $M/2$ correlations compared to $M$ correlations in the proposed method, and the method of [12] requires an additional analog averaging process to construct a template beforehand.

Fig. 8. The effect of a timing error on the BER performance of the noncoherent and TR schemes. The top two plots represent error in $T_0$ and $T$ for CM1 channels; the bottom two plots represent error in $T_0$ and $T$ for CM8 channels. (Axes: bit error rate versus error in $T_0$ or $T$ estimation in ns; curves for CM1 at 10, 12, 14 dB and CM8 at 16, 18, 20 dB.)

Fig. 9. The distribution of the optimal integration region and the corresponding BER of 50 CM1 and CM8 channel realizations. (Panels: $T_0 - \tau_1$ in ns, $T$ in ns, and BER, each versus channel realization index.)

VI. CONCLUSION

In this paper, a data-aided algorithm for synchronization and optimal integration region estimation has been proposed. The algorithm is based on the theoretical analysis of the optimal integration region that minimizes the probability of error. The proposed method employs a {0,1,0,1} pattern training sequence to first perform a frame-level coarse timing, then does a fine synchronization and finally estimates the optimal integration length by auto-correlation operations on consecutive symbol waveforms. To enhance the estimation accuracy, averaging over the training sequence has been applied for denoising. Simulation results have shown that the proposed method can greatly reduce the bit error rate compared to the conventional fixed length method and achieve performance close to the theoretical optimum.

REFERENCES

[1] V. Lottici, A. D’Andrea, and U. Mengali, “Channel estimation for ultrawideband communications,” IEEE J. Select. Areas Commun., vol. 20, no. 9, pp. 1638-1645, Dec. 2002.
[2] M. Weisenhorn and W. Hirt, “Robust noncoherent receiver exploiting UWB channel properties,” in Proc. IEEE Conference on Ultra-wideband Systems and Technologies, pp. 156-160, May 2004.
[3] R. Hoctor and H. Tomlinson, “Delay-hopped transmitted-reference RF communications,” in Proc. IEEE Conference on Ultra-wideband Systems and Technologies, pp. 265-269, May 2002.
[4] N. He and C. Tepedelenlioglu, “Adaptive synchronization for noncoherent UWB receivers,” in Proc. ICSSAP, vol. 4, pp. 517-520, May 2004.
[5] J. Romme and K. Witrisal, “Transmitted-reference UWB systems using weighted autocorrelation receivers,” IEEE Trans. Microwave Theory Techniques, vol. 54, no. 4, pp. 1754-1761, Apr. 2006.
[6] G. Leus and A. van der Veen, “A weighted autocorrelation receiver for transmitted reference ultra wideband communications,” in Proc. IEEE Workshop on SPAWC, pp. 965-969, June 2005.
[7] J. Y. Lee and R. A. Scholtz, “Ranging in a dense multipath environment using an UWB radio link,” IEEE J. Select. Areas Commun., vol. 20, no. 9, pp. 1677-1683, Dec. 2002.
[8] A. Rabbachin and I. Oppermann, “Synchronization analysis for UWB systems with a low-complexity energy collection receiver,” in Proc. IEEE Conf. Ultra-wideband Systems and Technologies, pp. 288-292, May 2004.
[9] N. He and C. Tepedelenlioglu, “Performance analysis of non-coherent UWB receivers at different synchronization levels,” IEEE Trans. Wireless Commun., vol. 5, no. 6, pp. 1266-1273, June 2006.
[10] S. Franze and U. Mitra, “Integration interval optimization and performance analysis for UWB transmitted reference systems,” in Proc. IEEE Conference on Ultra-wideband Systems and Technologies, pp. 26-30, May 2004.
[11] L. Yang and G. Giannakis, “Timing ultra-wideband signals with dirty templates,” IEEE Trans. Commun., vol. 53, no. 11, pp. 1952-1963, Nov. 2005.
[12] Z. Tian and G. Giannakis, “A GLRT approach to data-aided timing acquisition in UWB radios–part I: algorithms,” IEEE Trans. Wireless Commun., vol. 4, no. 6, pp. 2956-2967, Nov. 2005.
[13] A. F. Molisch et al., “IEEE 802.15.4a channel model - final report,” Tech. Rep. Document IEEE 802.15-04-0662-02-004a, 2005.
[14] Y.-L. Chao and R. A. Scholtz, “Optimal and suboptimal receivers for ultra-wideband transmitted reference systems,” in Proc. IEEE Global Telecommunications Conference, vol. 2, pp. 759-763, Dec. 2003.

Rongrong Zhang (S’06) received his B.Eng. degree in Electronics and Information Engineering from Shanghai Jiao Tong University, China in 2004. Currently he is working towards an M.A.Sc degree with Dr. Xiaodai Dong in University of Victoria. He is presently a research assistant in the wireless communication and network lab in University of Victoria, Victoria, BC, Canada. His research interest is timing and synchronization of ultra-wideband communication.

Xiaodai Dong (S’97-M’00) received her B.Sc. degree in Information and Control Engineering from Xi’an Jiaotong University, China in 1992, her M.Sc. degree in Electrical Engineering from National University of Singapore in 1995 and her Ph.D. degree in Electrical and Computer Engineering from Queen’s University, Kingston, ON, Canada in 2000. She is presently an Assistant Professor and Canada Research Chair (Tier II) in Ultra-wideband Communications at the Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada. Prior to joining UVic, she was an Assistant Professor at the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. From 1999 to 2002, she was with Nortel Networks, Ottawa, ON, Canada and involved in the base transceiver design of the third-generation (3G) mobile communication systems. Dr. Dong is an Associate Editor for IEEE Transactions on Communications and an Editor for Journal of Communications and Networks. Her research interests include communication theory, modulation and coding, and ultrawideband radio.


Cross-layer Adaptive Transmission: Optimal Strategies in Fading Channels

Anh Tuan Hoang, Member, IEEE, and Mehul Motani, Member, IEEE

Abstract—We consider cross-layer adaptive transmission for a single-user system with stochastic data traffic and a time-varying wireless channel. The objective is to vary the transmit power and rate according to the buffer and channel conditions so that the system throughput, defined as the long-term average rate of successful data transmission, is maximized, subject to an average transmit power constraint. When adaptation is subject to a fixed bit error rate (BER) requirement, maximizing the system throughput is equivalent to minimizing packet loss due to buffer overflow. When the BER requirement is relaxed, maximizing the system throughput is equivalent to minimizing total packet loss due to buffer overflow and transmission errors. In both cases, we obtain optimal transmission policies through dynamic programming. We identify an interesting structural property of these optimal policies, i.e., for certain correlated fading channel models, the optimal transmit power and rate can increase when the channel gain decreases toward outage. This is in sharp contrast to the water-filling structure of policies that maximize the rate of transmission over fading channels. Numerical results are provided to support the theoretical development.

Index Terms—Cross-layer design, adaptive transmission, throughput maximization, Markov decision process.

I. INTRODUCTION

In modern and future wireless communications, maximizing throughput under limited available energy and bandwidth is and will be a challenging task. The task becomes even harder in scenarios when the data arrival processes are stochastic, the buffer space is limited, and the transmission medium is time-varying. In this paper, we study a problem of adapting the transmission parameters of a single-user system according to the data arrival statistics, buffer occupancy, and channel condition in order to maximize the system throughput.

We consider the single-user system depicted in Fig. 1. Time is divided into frames of equal length. During each frame, data packets arrive to the buffer according to some known stochastic distribution. The buffer has a finite length and when there is no space left, arriving packets are dropped and considered lost. The transmitter transmits data in the buffer over a discrete-time block-fading channel. The fading process is represented by a finite state Markov chain (FSMC). We define the system state during each time frame as the combination of the buffer occupancy and the channel state.

Paper approved by R. Fantacci, the Editor for Wireless Networks and Systems of the IEEE Communications Society. Manuscript received April 4, 2006; revised August 28, 2006, November 8, 2006, and February 19, 2007.
A. T. Hoang is with the Department of Networking Protocols, Institute for Infocomm Research (I2R), 21 Heng Mui Keng Terrace, Singapore 119613. Previously, he was with the Department of Electrical and Computer Engineering, National University of Singapore (e-mail: [email protected]).
M. Motani is with the Department of Electrical and Computer Engineering, National University of Singapore, Singapore 119260 (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCOMM.2008.060214.

[Block diagram with labels: Packets, Buffer, Transmitter, Wireless Channel, Receiver, Control Signals]

Fig. 1. A single-user system with stochastic data arrival, limited buffer, and time-varying channel.

In this paper, we assume that the transmitter and receiver have complete knowledge of the instantaneous system state information (SSI). We deal with imperfect SSI (e.g., delayed, erroneous, and quantized SSI) in [1], [2]. In general, for the SSI to be available at the transmitter and receiver, some processing and signaling is required. Assuming that both the transmitter and receiver have complete knowledge of the current system state, we consider the problem of adapting the transmit power and rate during each time frame according to the SSI so that the system throughput is maximized, subject to an average transmit power constraint. The system throughput is defined as the rate at which packets are successfully transmitted. We first consider the case when the adaptation is subject to a fixed bit error rate (BER) requirement. This may be appropriate when a certain quality of service is mandated by communication standards or specific user applications. In this case, maximizing the system throughput is equivalent to minimizing the packet loss rate due to buffer overflow. When the BER requirement is relaxed, we take into account the tradeoff between packet loss due to buffer overflow and packet loss due to transmission errors. Our work is closely related to the work by Goldsmith in [3] and [4]. The objective of our work and that of [3], [4] are similar, i.e., to maximize the throughput of transmission over a time-varying channel subject to an average transmit power constraint. However, we take into account the effects of stochastic data arrival, finite-length buffer, and transmission errors, and adapt the transmit power and rate to both the channel gain and buffer occupancy. With this formulation, we point out an interesting structural property of the optimal policies, i.e., for certain correlated fading channel models, the optimal transmit power and rate can increase as the channel gain decreases. 
This is in sharp contrast to the water-filling structure of the capacity achieving policy in [3], [4]. Taking a broader view, our work follows the cross-layer design approach, which aims to take the system variations and statistics at multiple layers of the protocol stack into account. In our work, the transmission decisions, which are part of the physical layer, take into account the data arrival statistics and the buffer condition, which are the parameters of higher layers. In this context, our paper is closely related to the works

0090-6778/08$25.00 © 2008 IEEE


in [5]–[8], in which a similar system model with stochastic data arrival, a finite-length buffer, and a time-varying channel is considered. However, our work is different from [5]–[8] in several significant ways. First, while the objective of our work is to maximize the system throughput, [5]–[8] concentrate more on achieving good quality of service (QoS), which is defined as the average delay experienced by each data packet. Second, the objective of maximizing the throughput motivates us to consider the effects of transmission errors, which are not considered in [5]–[8]. Third, in [5]–[8], the authors characterize how the optimal transmission rate depends on the channel condition; however, their characterization is only for the case when the fading process is independent and identically distributed (i.i.d.) over time. In that case, the structure of the optimal policies is similar to water-filling. In our work, we look at the dependency when the fading process is time correlated and make an interesting observation.

The works in [9] and [10] also take both data arrival and channel statistics into account when carrying out adaptive transmission. While their formulation allows for optimizing over both packet losses due to transmission failure and buffer overflow, their assumptions result in no packet losses due to transmission errors. Specifically, their policies never transmit above the Shannon capacity and they assume no transmission errors at rates below capacity.

In their recent works ([11], [12]), Liu et al. do take into account both packet losses due to transmission errors and buffer overflows. Their definition of system throughput is also similar to ours. However, the policies considered in [11], [12] adapt to the channel state information only, not to the buffer and data arrival statistics.

With respect to the existing literature, the main contributions of this paper can be summarized as follows.

• We obtain, via dynamic programming, optimal policies which maximize the system throughput for two scenarios, i.e., with and without a fixed BER requirement. When the BER requirement is relaxed, we show that there is a tradeoff between packet loss due to transmission errors and packet loss due to buffer overflow.

• We show that for certain correlated channel models and relatively large power constraints, the optimal transmission power and rate can increase as the channel gain decreases. This effect is in contrast to policies which operate in the spirit of water-filling.

• We present numerical results to support the theoretical development. Specifically, we compare via simulation the performance of our optimal policies to various suboptimal schemes. We also confirm via numerical computations the structure of the optimal policies mentioned above.

The rest of this paper is organized as follows. In Section II, we define our throughput maximization problem. In Section III, we describe our approach to solve the problem, via dynamic programming. Section IV deals with the structure of the optimal policies. In Section V, we relax the BER constraint and consider both buffer overflow and transmission error. In Section VI, we present numerical results and discussion. We end with some concluding remarks in Section VII.

II. PROBLEM DEFINITION

A. System Model

We consider a single-user system depicted in Fig. 1. Time is divided into frames of equal length of $T_f$ seconds each and frame $i$ refers to the time period $[iT_f, (i+1)T_f)$. The number of packets arriving to the buffer during frame $i$ is denoted by $A_i$. We assume that these $A_i$ packets are only added to the buffer at the end of frame $i$. We consider the case when $\{A_i\}$ is i.i.d. over time so that the index $i$ can be omitted. The distribution of the number of packets arriving during each time frame is assumed known and denoted by $p_A(a)$. The average packet arrival rate is $\lambda$. All packets have the same length of $L$ bits. The buffer can store up to $B$ packets and if a packet arrives when the buffer is full, it is dropped and considered lost.

We consider a discrete-time block-fading channel with additive white Gaussian noise (AWGN). $W$ and $N_o/2$ respectively denote the channel bandwidth and noise power density. The fading process is represented by a stationary and ergodic $K$-state Markov chain, with the channel states numbered from 0 to $K-1$. The channel power gain of state $g$, $g \in \{0, \ldots, K-1\}$, is denoted by $\gamma_g$. During each frame, the channel is assumed to remain in a single state. Letting $G_i$ denote the channel state during time frame $i$, the channel state transition probability is defined as

$$P_G(g, g') = \Pr\{G_i = g' \mid G_{i-1} = g\}. \tag{1}$$

We assume that $P_G(g, g')$ are known for all $g$ and $g'$. The stationary distribution of each channel state $g$ is denoted by $p_G(g)$.

In general, a finite state Markov channel (FSMC) is suitable for modeling a slowly varying flat-fading channel [13], [14]. An FSMC is constructed by first partitioning the range of the fading gain into a finite number of sections. Then, each section corresponds to a state in the Markov chain. Given knowledge of the fading process, the stationary distribution $p_G(g)$ as well as the channel state transition probabilities $P_G(g, g')$ can be derived [13], [14].

B. Adaptive Transmission

We denote the system state in frame $i$ by $S_i = (B_i, G_i)$, where $B_i$ is the number of packets in the buffer at the beginning of frame $i$ while $G_i$ is the channel state during frame $i$. At the beginning of frame $i$, we assume that the transmitter and the receiver have complete knowledge of $S_i$. We assume that, based on $S_i$, the transmitter can vary its transmit power and rate. For frame $i$, let $P_i$ (Watts) and $U_i$ (packets/frame) denote the transmit power and rate, respectively. We must have $0 \le U_i \le B_i$. In addition, we assume that $P_i \in \mathcal{P}$, where $\mathcal{P}$ is the set of all power levels that the transmitter can operate at. We call a pair $(U_i, P_i)$ a control action for frame $i$. Note that the transmitter can change its transmission rate $U_i$ by changing the coding and/or modulation schemes [4], [15]–[17].

Let $P_b(g, u, P)$ be the function that gives the BER when the channel state is $g$ and the transmit power and rate are $P$ and $u$ respectively. $P_b(g, u, P)$ depends on the specific coding, modulation, and detection schemes used. We further

HOANG and MOTANI: CROSS-LAYER ADAPTIVE TRANSMISSION: OPTIMAL STRATEGIES IN FADING CHANNELS

assume that a packet is in error if at least $l$ out of its $L$ bits are corrupted. Then we can characterize the packet error probability in terms of $u$, $g$, $P$ as

$$P_p(g, u, P) = \sum_{j=l}^{L} \binom{L}{j} P_b(g, u, P)^j \bigl(1 - P_b(g, u, P)\bigr)^{L-j}. \tag{2}$$

As an example, let us change the transmission rate by varying the constellation size of an M-ary quadrature amplitude modulator (MQAM) while fixing its symbol rate. From [18], assuming ideal coherent phase detection, the BER for a particular transmit power $P$ and rate $u$ bits per QAM symbol can be upper-bounded by

$$P_b(g, u, P) \le 0.2 \exp\left(-1.5\,\frac{P \gamma_g}{W N_o (2^u - 1)}\right). \tag{3}$$

C. Throughput Maximization Problem

We adopt the following definition of the system throughput.

Definition 1: The system throughput is the long-term average rate at which packets are successfully transmitted.

For an average packet arrival rate $\lambda$, a buffer overflow probability $P_{of}$, and a packet error rate $P_p$, the system throughput can be calculated by

$$\text{throughput} = \lambda - \lambda P_{of} - P_p(\lambda - \lambda P_{of}) = \lambda (1 - P_{of})(1 - P_p). \tag{4}$$
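The chain from the BER bound (3) to the packet error probability (2) and the throughput (4) can be sketched numerically as follows. This is only an illustration: the function and parameter names are hypothetical, the bound (3) is used as if it held with equality, and the rate `u_bits` is assumed to be at least 1.

```python
import math


def mqam_ber_bound(p_watts, gain, u_bits, bandwidth_hz, n0):
    """Right-hand side of (3): BER bound for MQAM with u_bits >= 1 bits/symbol."""
    return 0.2 * math.exp(
        -1.5 * p_watts * gain / (bandwidth_hz * n0 * (2 ** u_bits - 1))
    )


def packet_error_prob(ber, length_bits, min_bad_bits=1):
    """Equation (2): probability that at least min_bad_bits of L bits are corrupted."""
    return sum(
        math.comb(length_bits, j) * ber ** j * (1 - ber) ** (length_bits - j)
        for j in range(min_bad_bits, length_bits + 1)
    )


def throughput(arrival_rate, p_overflow, p_packet_error):
    """Equation (4): rate of packets that are neither dropped nor received in error."""
    return arrival_rate * (1 - p_overflow) * (1 - p_packet_error)


if __name__ == "__main__":
    # Illustrative values only: W = 100 kHz, N_o = 2e-5 W/Hz, 100-bit packets.
    ber = mqam_ber_bound(p_watts=40.0, gain=0.5545, u_bits=2,
                         bandwidth_hz=1e5, n0=2e-5)
    p_pkt = packet_error_prob(ber, length_bits=100)
    print(ber, p_pkt, throughput(3.0, 0.1, p_pkt))
```

With `min_bad_bits=1` the sum in (2) collapses to $1 - (1 - P_b)^L$, which gives a quick sanity check on the implementation.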

We consider the following optimization problem:

Throughput Maximization Problem: At the beginning of each time frame $i$, given that the system state $S_i$ is known to the transmitter and the receiver, select the transmission parameters $(U_i, P_i)$ so that the system throughput is maximized, subject to an average transmit power constraint $\bar{P}$.

D. Satisfying a BER Constraint

From this point on until the end of Section IV, we adopt an extra constraint that the control action $(U_i, P_i)$ must be selected so that a fixed BER is satisfied. From a practical point of view, many existing communication protocols require a fixed BER. Furthermore, enforcing a BER requirement enables us to have a good comparison between our optimal adaptive transmission policies and those obtained in [3]–[6], where a BER constraint is also enforced.

Let $P(u, g, P_b)$ be the minimum power needed to transmit $u$ packets in a frame of length $T_f$ seconds when the channel state is $g$ and the BER constraint is $P_b$. $P(u, g, P_b)$ depends on the specific coding, modulation, and detection schemes being used. Furthermore, we must have $P(u, g, P_b) \in \mathcal{P}$. If no power level in $\mathcal{P}$ satisfies both the transmission rate $u$ and the BER constraint when the channel state is $g$, then transmission rate $u$ is not feasible in channel state $g$.

As an example, if an adaptive MQAM scheme as described at the end of Section II-B is employed, from (3), we can approximate $P(u, g, P_b)$ by:

$$P(u, g, P_b) = \arg\min_{P \in \mathcal{P}} \left\{ P \ge \frac{W N_o}{\gamma_g} \cdot \frac{-\log(5 P_b)\,(2^u - 1)}{1.5} \right\}. \tag{5}$$
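A minimal sketch of the power selection in (5) is given below. It assumes a finite list of power levels, takes the logarithm in (5) to be the natural logarithm (as follows from inverting the exponential in (3)), and treats rate 0 as costing no power. The function name, the `None` return value for infeasible rates, and the example power levels are illustrative choices, not part of the paper.

```python
import math


def min_power(u, gain, target_ber, bandwidth_hz, n0, power_levels):
    """Smallest level in power_levels meeting the threshold in (5).

    Returns None when rate u is infeasible in this channel state.
    """
    if u == 0:
        return 0.0  # assumption: not transmitting consumes no power
    if gain <= 0:
        return None  # outage state: no finite power meets the BER target
    required = (bandwidth_hz * n0 * (-math.log(5 * target_ber))
                * (2 ** u - 1) / (1.5 * gain))
    feasible = [p for p in power_levels if p >= required]
    return min(feasible) if feasible else None


if __name__ == "__main__":
    levels = [1.0, 5.0, 10.0, 20.0, 50.0]  # hypothetical power set, in Watts
    for u in range(4):
        print(u, min_power(u, 1.0, 1e-3, 1e5, 2e-5, levels))
```

The returned power is non-decreasing in `u` and non-increasing in `gain` whenever it is defined, in line with the monotonicity assumed in the text.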


In general, we assume that $P(u, g, P_b)$ is non-decreasing in $u$ and non-increasing in $g$.

As the BER performance is always kept at $P_b$, from (2), the packet error probability $P_p$ is always kept unchanged. When both $\lambda$ and $P_p$ are fixed, from (4), it is clear that maximizing the system throughput is equivalent to minimizing $P_{of}$. So from now on, we concentrate on minimizing the rate at which packets are dropped due to buffer overflow.

For frame $i$, given that there are $b$ packets in the buffer and we decide to transmit at rate $u$ packets/frame, the expected number of packets that are dropped due to buffer overflow is

$$L_o(b, u) = E\bigl[\max\{0, A + b - u - B\}\bigr], \tag{6}$$

where the expectation is with respect to the distribution of $A$, i.e., the number of packets arriving in the frame. Our optimization problem can be written as:

$$\arg\min_{U_0, \ldots, U_{T-1}} \limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} L_o(B_i, U_i)\right\} \tag{7}$$

subject to:

$$U_i \in \{0, 1, \ldots, B_i\} \quad \forall i = 0, 1, \ldots, T-1, \tag{8}$$

$$\limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} P(U_i, G_i, P_b)\right\} \le \bar{P}. \tag{9}$$
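The expected overflow in (6) is straightforward to evaluate once an arrival distribution is fixed. The sketch below assumes Poisson arrivals (the distribution used later in the numerical section) and truncates the infinite sum at a hypothetical cutoff `a_max`. The Poisson probabilities are computed in the log domain to avoid overflow in the factorial.

```python
import math


def poisson_pmf(a, mean):
    """Poisson probability of a arrivals, computed via logs (requires mean > 0)."""
    return math.exp(-mean + a * math.log(mean) - math.lgamma(a + 1))


def expected_overflow(b, u, buffer_size, mean_arrivals, a_max=100):
    """Equation (6): E[max(0, A + b - u - B)] for Poisson A, truncated at a_max."""
    return sum(
        poisson_pmf(a, mean_arrivals) * max(0, a + b - u - buffer_size)
        for a in range(a_max + 1)
    )


if __name__ == "__main__":
    # Full buffer (b = B = 15) with 3 packets/frame arriving on average.
    for u in range(0, 7):
        print(u, expected_overflow(b=15, u=u, buffer_size=15, mean_arrivals=3.0))
```

When the buffer is full and nothing is sent, every arrival is dropped, so the expected overflow equals the mean arrival rate; this is a convenient check on the truncation.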

III. PARETO OPTIMAL POLICIES

Instead of directly solving the above optimization problem, we reformulate it as a problem of minimizing a weighted sum of the long-term packet drop rate and average transmit power. In particular, we aim to minimize

$$J_{avr} = \limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} C_I(B_i, G_i, U_i)\right\}, \tag{10}$$

where $C_I(b, g, u)$ is the immediate cost incurred in state $(b, g)$ when the control action $(u, P)$ is taken, i.e.,

$$C_I(b, g, u) = P(u, g, P_b) + \beta L_o(b, u). \tag{11}$$

In (11), $\beta$ is a positive weighting factor that gives the priority to reducing packet loss over conserving power. In particular, by increasing $\beta$, we tend to transmit at a higher rate in order to lower the packet loss rate at the expense of more transmit power. On the other hand, for smaller values of $\beta$, the average transmission power will be reduced at the cost of increasing packet loss rate. As pointed out in [6], if $\bar{P}_\beta$ and $L_\beta$ are the average power and packet loss rate (due to buffer overflow) obtained when minimizing $J_{avr}$ for a particular value of $\beta$, then $L_\beta$ is also the minimum achievable loss rate given a power constraint of $\bar{P}_\beta$. In other words, for each value of $\beta$, minimizing $J_{avr}$ gives us a Pareto optimal point $(L_\beta, \bar{P}_\beta)$ in the Loss Rate versus Power Constraint curve.

The problem of minimizing $J_{avr}$ is an infinite horizon average cost Markov decision process (MDP) with system state $S_i = (B_i, G_i)$, control action $U_i$, and immediate cost function $C_I(B_i, G_i, U_i)$. For an MDP to be well defined, we also need to characterize the dynamics of the system given a control action in a particular system state. Supposing the system state at time frame $i$ is $S_i = s = (b, g)$ and a control


action $u$ is taken, the probability of the system being in state $s' = (b', g')$ in the next time frame is

$$P_S(s, s', u) = \Pr\{S_{i+1} = s' \mid S_i = s, U_i = u\} = P_G(g, g')\,P_B(b, b', u), \tag{12}$$

where

$$P_B(b, b', u) = \Pr\{B_{i+1} = b' \mid B_i = b, U_i = u\}. \tag{13}$$

Furthermore,

$$B_{i+1} = \min\{B_i + A_i - U_i, B\}. \tag{14}$$

Based on (12), (13), (14), the system dynamics are well defined. Let $\pi = \{\mu_0, \mu_1, \mu_2, \ldots\}$ be a policy which maps system states into transmission rates for each frame $i$, i.e., $U_i = \mu_i(B_i, G_i)$. We have the optimization problem

$$\pi^* = \arg\min_{\pi} J_{avr}(\pi) = \arg\min_{\pi} \limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} C_I(B_i, G_i, U_i)\right\}. \tag{15}$$

In our system, as all states are connected, there exists a stationary policy $\pi^*$, i.e., $\mu_i \equiv \mu$ for all $i$, which is a solution to (15). To simplify the notation, we just write $U_i = \pi^*(B_i, G_i)$. As shown in [19], using a simple policy iteration algorithm, an optimal policy $\pi^*$ can be reached in a finite number of steps.

Finally, it is also useful to consider the discounted cost problem defined as:

$$\arg\min_{\pi} J_\alpha(b, g, \pi) = \arg\min_{\pi} \lim_{T \to \infty} E\left\{\sum_{i=0}^{T-1} \alpha^i C_I(B_i, G_i, U_i) \,\Big|\, B_0 = b, G_0 = g, \pi\right\}, \tag{16}$$

where $0 < \alpha < 1$ is the discounting factor. As the immediate cost function $C_I$ is bounded, the limit in (16) always exists. As shown in [19], when $\alpha \to 1$ the solution of the discounted cost problem converges to that of the average cost problem in (15). Moreover, the solution of the discounted cost problem satisfies a simple dynamic programming equation given by

$$J_\alpha^*(b, g) = \min_{u} \left\{ C_I(b, g, u) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_\alpha^*(\min\{b + a - u, B\}, g') \right\}. \tag{17}$$

Equation (17) is particularly useful for analyzing the structure of the optimal policy.

IV. STRUCTURE OF THE OPTIMAL POLICY

In this section, we will point out that, for certain FSMC models in which the fading process is correlated over time, when the transmission power constraint is relatively large, the optimal transmission power and rate are non-increasing in the channel gain. This is counter to the well known water-filling structure of the capacity-achieving link adaptation policy, which allocates more transmission power to good channel states and less power to bad channel states [3].

We will show the above effect for a simple FSMC model which has three possible states, i.e., $K = 3$. In particular, we assume that $0 = \gamma_0 < \gamma_1 < \gamma_2$. Moreover, in the channel model, transitions can only happen between adjacent channel states, i.e., $P_G(0, 2) = P_G(2, 0) = 0$ while $P_G(0, 0)$, $P_G(0, 1)$, $P_G(1, 1)$, $P_G(1, 0)$, $P_G(1, 2)$, $P_G(2, 2)$, $P_G(2, 1)$ are all positive.

Let us look at the insight behind the dynamic programming equation (17). When the system is in state $(b, g)$, $b > 0$, $g > 0$, there are two effects of taking a control action $u$. First, transmitting at rate $u$ incurs an immediate cost $C_I(b, g, u)$. Second, transmitting at rate $u$ in state $(b, g)$ also reduces the future cost

$$C_F(b, g, u) = \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_\alpha^*(\min\{b + a - u, B\}, g'). \tag{18}$$

For state $(b, g)$ with $b > 0$, $g > 0$, let $0 \le u_1 < u_2 \le b$ be two possible transmission rates. Let

$$\Delta_I(b, g, u_1, u_2) = C_I(b, g, u_2) - C_I(b, g, u_1) \tag{19}$$

and

$$\Delta_F(b, g, u_1, u_2) = C_F(b, g, u_1) - C_F(b, g, u_2). \tag{20}$$

As can be seen, $\Delta_I(b, g, u_1, u_2)$ is the increase in immediate cost while $\Delta_F(b, g, u_1, u_2)$ is the reduction in future cost when the transmission rate is increased from $u_1$ to $u_2$. Clearly, action $u_2$ is more favorable than $u_1$ in state $(b, g)$ if and only if $\Delta_I(b, g, u_1, u_2) < \Delta_F(b, g, u_1, u_2)$. From (11) and (19), we have

$$\Delta_I(b, 1, u_1, u_2) - \Delta_I(b, 2, u_1, u_2) = P(u_2, 1, P_b) - P(u_1, 1, P_b) - P(u_2, 2, P_b) + P(u_1, 2, P_b). \tag{21}$$

We state the following lemma, the proof of which is given in the Appendix.

Lemma 1: For each buffer state $b > 0$, there exists a constant $\beta_o$ such that for every $\beta > \beta_o$ and $0 \le u_1 < u_2 \le b$, the following inequality holds:

$$\Delta_I(b, 1, u_1, u_2) - \Delta_I(b, 2, u_1, u_2) < \Delta_F(b, 1, u_1, u_2) - \Delta_F(b, 2, u_1, u_2). \tag{22}$$

Theorem 1: For each buffer state $b > 0$, let $\beta_o$ be defined as in Lemma 1 and $\beta > \beta_o$, then the optimal transmission rate for each state $(b, g)$, $g > 0$, is non-increasing in $g$.

Proof: We present a proof by contradiction. Let $u_1^*$ and $u_2^*$ be the optimal transmission rates at states $(b, 1)$ and $(b, 2)$ respectively. Suppose $0 \le u_1^* < u_2^* \le b$. From (17) we have

$$C_I(b, 1, u_1^*) + C_F(b, 1, u_1^*) \le C_I(b, 1, u_2^*) + C_F(b, 1, u_2^*), \tag{23}$$

$$C_I(b, 2, u_2^*) + C_F(b, 2, u_2^*) \le C_I(b, 2, u_1^*) + C_F(b, 2, u_1^*). \tag{24}$$

Inequalities (23) and (24) respectively imply (25) and (26):

$$\Delta_I(b, 1, u_1^*, u_2^*) = C_I(b, 1, u_2^*) - C_I(b, 1, u_1^*) \ge C_F(b, 1, u_1^*) - C_F(b, 1, u_2^*) = \Delta_F(b, 1, u_1^*, u_2^*), \tag{25}$$

$$\Delta_I(b, 2, u_1^*, u_2^*) \le \Delta_F(b, 2, u_1^*, u_2^*). \tag{26}$$

From (25) and (26) we have:

$$\Delta_I(b, 1, u_1^*, u_2^*) - \Delta_I(b, 2, u_1^*, u_2^*) \ge \Delta_F(b, 1, u_1^*, u_2^*) - \Delta_F(b, 2, u_1^*, u_2^*), \tag{27}$$

which contradicts Lemma 1 and therefore, $u_1^* \ge u_2^*$.

Comment: Theorem 1 shows that for a certain correlated fading channel model and average transmission power constraint, the optimal transmission rate is non-increasing in the channel gain. In fact, our numerical results (see Section VI) show an even stronger effect, i.e., in some cases, the optimal transmission rate decreases when the channel gain increases.

V. REMOVING THE BER CONSTRAINT

In this section, we relax the BER constraint and allow the tradeoff between packet loss due to buffer overflow and packet loss due to transmission errors.

A. Taking Packet Loss Due to Transmission Error into Account

Given the transmission rate $u$, power $P$, channel state $g$, and the packet error probability of $P_p(g, u, P)$, the expected number of packets lost due to transmission error is

$$L_e(g, u, P) = u\,P_p(g, u, P). \tag{28}$$

For fixed packet arrival rate, maximizing the system throughput is equivalent to minimizing total packet loss rate due to both buffer overflow and transmission error. So we have the optimization problem:

$$\arg\min_{U_i, P_i} \limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} \bigl[L_o(B_i, U_i) + L_e(G_i, U_i, P_i)\bigr]\right\} \tag{29}$$

subject to:

$$U_i \in \{0, 1, \ldots, B_i\} \quad \forall i = 0, 1, \ldots, T-1, \tag{30}$$

$$P_i \in \mathcal{P} \quad \forall i = 0, 1, \ldots, T-1, \tag{31}$$

$$\limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} P_i\right\} \le \bar{P}. \tag{32}$$

B. Optimal Policies

Similar to the approach in Section III, we can reformulate the above optimization problem as a problem of minimizing a weighted sum of the total packet loss rate (due to buffer overflow and transmission error) and average transmission power. The only modification needed here is for the immediate cost function $C_I$. Now we have:

$$C_I(b, g, u, P) = P + \beta\bigl[L_o(b, u) + L_e(g, u, P)\bigr]. \tag{33}$$

At time $i$, let the system state be $S_i = s = (b, g)$ and a control action $(u, P)$ is taken, the probability of the system being in state $s' = (b', g')$ in the next time frame is still characterized by (12), (13), (14). An important point to note from (12), (13), (14) is that the chosen transmission power level $P$ does not have any effect on the system dynamics. Therefore, given a choice of transmission rate $u$, the necessary and sufficient condition for a power level to be optimal is that it must satisfy

$$P = \arg\min_{P \in \mathcal{P}} C_I(b, g, u, P) = \arg\min_{P \in \mathcal{P}} \{P + \beta L_e(g, u, P)\}. \tag{34}$$

In other words, in each system state, we only have to decide which rate the transmitter should use. After that, the power level will follow directly by solving (34). Let $\pi$ be a stationary policy which maps system states into transmission rate for each frame $i$, i.e., $U_i = \mu_i(B_i, G_i)$. Define

$$C_I^*(b, g, u) = \min_{P \in \mathcal{P}} C_I(b, g, u, P) \tag{35}$$

and

$$J_{avr}(\pi) = \limsup_{T \to \infty} \frac{1}{T} E\left\{\sum_{i=0}^{T-1} C_I^*(B_i, G_i, U_i) \,\Big|\, \pi\right\}. \tag{36}$$

We have to solve the optimization problem

$$\pi^* = \arg\min_{\pi} J_{avr}(\pi). \tag{37}$$

Again, this problem can be solved efficiently using dynamic programming techniques [19].
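As a rough illustration of such a dynamic programming solution, the sketch below runs value iteration on the discounted-cost equation (17). It is a swapped-in technique for illustration, not the policy iteration algorithm of [19], and it approximates the average-cost problem only as the discount factor approaches 1. The immediate cost is passed in as a callable, so either (11) or the minimized cost (35) could be plugged in. The toy cost function, its gains, and all parameter values are hypothetical.

```python
import math


def value_iteration(buffer_size, p_gain, p_arrival, cost,
                    alpha=0.95, tol=1e-9, max_iter=100000):
    """Iterate (17). cost(b, g, u) returns a float, or None if u is infeasible.

    Rate u = 0 is assumed to be feasible in every state.
    Returns (J, policy), both indexed as [b][g].
    """
    n_states = len(p_gain)
    J = [[0.0] * n_states for _ in range(buffer_size + 1)]
    policy = [[0] * n_states for _ in range(buffer_size + 1)]
    for _ in range(max_iter):
        J_new = [[0.0] * n_states for _ in range(buffer_size + 1)]
        for b in range(buffer_size + 1):
            for g in range(n_states):
                best = math.inf
                for u in range(b + 1):
                    c = cost(b, g, u)
                    if c is None:
                        continue
                    future = sum(
                        p_gain[g][g2] * pa * J[min(b + a - u, buffer_size)][g2]
                        for g2 in range(n_states)
                        for a, pa in enumerate(p_arrival)
                    )
                    total = c + alpha * future
                    if total < best:
                        best, policy[b][g] = total, u
                J_new[b][g] = best
        delta = max(abs(J_new[b][g] - J[b][g])
                    for b in range(buffer_size + 1) for g in range(n_states))
        J = J_new
        if delta < tol:
            break
    return J, policy


def make_toy_cost(beta, buffer_size, p_arrival, gains=(0.2, 1.0)):
    """Hypothetical cost in the shape of (11): power plus beta times overflow."""
    def cost(b, g, u):
        power = (2 ** u - 1) / gains[g]
        overflow = sum(pa * max(0, a + b - u - buffer_size)
                       for a, pa in enumerate(p_arrival))
        return power + beta * overflow
    return cost


if __name__ == "__main__":
    p_gain = [[0.9, 0.1], [0.1, 0.9]]   # two-state correlated channel
    p_arrival = [0.5, 0.5]              # zero or one packet per frame
    _, policy = value_iteration(2, p_gain, p_arrival,
                                make_toy_cost(1000.0, 2, p_arrival), alpha=0.9)
    print(policy)
```

Sweeping `beta` and recording the resulting average power and loss rate would trace out the Pareto curve described in Section III.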

VI. NUMERICAL RESULTS AND DISCUSSION

In this section, we present numerical results to illustrate the previous theoretical development. We focus on the structure of the optimal buffer and channel adaptive transmission policies as well as the performance, in terms of the packet loss rate, normalized by the arrival rate $\lambda$.

A. System Parameters

Packets arrive to the buffer according to a Poisson distribution with average rate $\lambda = 10^3$ and $3 \times 10^3$ packets/second. All packets have the same length of $L = 100$ bits. The buffer length is $B = 15$ packets. The channel bandwidth is $W = 100$ kHz and the power density of AWGN noise is $N_o/2 = 10^{-5}$ Watt/Hz.

We consider both cases of correlated and i.i.d. fading channels. For the correlated channel model, we use an 8-state FSMC as described in Table I. This channel model is obtained by quantizing the power gain of a Rayleigh fading channel that has average power gain γ = 0.8 and Doppler frequency $f_D = 10$ Hz. For the i.i.d. channel model, the values of the channel gains are the same as in Table I; however, the channel evolves independently over time with all states being equiprobable.

Adaptive transmission is based on a variable-rate, variable-power MQAM scheme similar to that described in [4]. Let $T_s$ be the symbol period of the MQAM modulator and assume a Nyquist signaling pulse, $\mathrm{sinc}(t/T_s)$, is used so that the value of $T_s$ is fixed at $1/W$ seconds. When the symbol period $T_s$ is kept unchanged, varying the signal constellation size of the modulator gives us different data transmission rates. As specified in Section II, the power and rate adaptation are carried out on a frame-by-frame basis. Each frame consists of $F$ modulated symbols, i.e., $T_f = F T_s$. We set $F = L = 100$ so that when a signal constellation of size $M = 2^u$ is used, exactly $u$ packets are transmitted during each time frame.
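The arithmetic linking these parameters can be checked with a few lines. The sketch below uses only the values stated above; the constant and function names are illustrative. It shows that a frame lasts 1 ms, so the two arrival rates correspond to 1 and 3 packets per frame.

```python
BANDWIDTH_HZ = 100e3       # W
SYMBOLS_PER_FRAME = 100    # F
PACKET_BITS = 100          # L

SYMBOL_SECONDS = 1.0 / BANDWIDTH_HZ                   # T_s = 1/W
FRAME_SECONDS = SYMBOLS_PER_FRAME * SYMBOL_SECONDS    # T_f = F * T_s


def arrivals_per_frame(packets_per_second):
    """Average number of packets arriving during one frame."""
    return packets_per_second * FRAME_SECONDS


def packets_per_frame(u_bits_per_symbol):
    """Packets carried by one frame when each symbol carries u bits."""
    return SYMBOLS_PER_FRAME * u_bits_per_symbol / PACKET_BITS


if __name__ == "__main__":
    print(FRAME_SECONDS, arrivals_per_frame(1e3), arrivals_per_frame(3e3))
    print(packets_per_frame(4))
```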


TABLE I
CHANNEL STATES AND TRANSITION PROBABILITIES

Channel state k   0        1        2        3        4        5        6        7
γ_k               0        0.1068   0.2301   0.3760   0.5545   0.7847   1.1090   1.6636
P_{k,k}           0.9359   0.8552   0.8334   0.8306   0.8420   0.8665   0.9048   0.9639
P_{k,k+1}         0.0641   0.0807   0.0859   0.0835   0.0745   0.0590   0.0361   0
P_{k,k−1}         0        0.0641   0.0807   0.0859   0.0835   0.0745   0.0590   0.0361
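The entries of Table I can be assembled into a transition matrix and checked for consistency. The sketch below is one way to do this; the variable and function names are illustrative. It builds the tridiagonal matrix and computes the stationary distribution from the detailed-balance relation of a birth-death chain. Because the upward probability out of each state equals the downward probability out of the next one, the stationary distribution comes out uniform, matching the equiprobable states used for the i.i.d. model.

```python
GAINS = [0.0, 0.1068, 0.2301, 0.3760, 0.5545, 0.7847, 1.1090, 1.6636]
P_STAY = [0.9359, 0.8552, 0.8334, 0.8306, 0.8420, 0.8665, 0.9048, 0.9639]
P_UP = [0.0641, 0.0807, 0.0859, 0.0835, 0.0745, 0.0590, 0.0361, 0.0]
P_DOWN = [0.0, 0.0641, 0.0807, 0.0859, 0.0835, 0.0745, 0.0590, 0.0361]


def transition_matrix():
    """Tridiagonal matrix P[k][k2] built from the rows of Table I."""
    n = len(GAINS)
    matrix = [[0.0] * n for _ in range(n)]
    for k in range(n):
        matrix[k][k] = P_STAY[k]
        if k + 1 < n:
            matrix[k][k + 1] = P_UP[k]
        if k - 1 >= 0:
            matrix[k][k - 1] = P_DOWN[k]
    return matrix


def stationary_distribution(matrix):
    """Detailed balance: pi[k] * P[k][k+1] = pi[k+1] * P[k+1][k]."""
    pi = [1.0]
    for k in range(len(matrix) - 1):
        pi.append(pi[-1] * matrix[k][k + 1] / matrix[k + 1][k])
    total = sum(pi)
    return [x / total for x in pi]


if __name__ == "__main__":
    P = transition_matrix()
    print([round(sum(row), 4) for row in P])
    print(stationary_distribution(P))
```

Each row sums to one up to the four-decimal rounding of the table.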

[Plot: packets transmitted/frame versus channel state (0 = worst state, 7 = best state); one curve each for 1, 5, 10, and 14 packets in buffer.]

Fig. 2. Structure of optimal policies, i.e., transmission rates for different channel states when the buffer occupancy is fixed at 1, 5, 10, 14 packets. Power constraint $\bar{P}$ = 16 dB. Channel is correlated over time (Table I).

[Plot: packets transmitted/frame versus channel state (0 = worst state, 7 = best state); one curve each for 1, 5, 10, and 14 packets in buffer.]

Fig. 3. Structure of optimal policies, i.e., transmission rates for different channel states when the buffer occupancy is fixed at 1, 5, 10, 14 packets. Power constraint $\bar{P}$ = 16 dB. Channel is i.i.d. over time.

As discussed in Sections III and V, we consider two classes of buffer and channel adaptive transmission policies. In the first class, transmit power and rate are selected subject to a fixed BER requirement. We use (5) to approximate the power needed to transmit u bits per QAM symbol when the channel gain is γk and the BER constraint is Pb . This class of policies is called MDP I. The other class of adaptive transmission policies is called MDP II. In MDP II policies, the BER requirement is removed and packet loss due to transmission errors is taken into account. Also, for MDP II policies, we assume that the set P of possible power levels is finite.


B. Structure of Optimal Policies

First, let us look at the structure of MDP I policies obtained by solving (15) for the correlated FSMC given in Table I. In Fig. 2, we plot the optimal transmission rates of an MDP I policy obtained when $\lambda = 10^3$ packets/sec, $B = 15$ packets, $f_D = 10$ Hz, $P_b = 10^{-3}$ and $\bar{P} = 16$ dB for different values of the buffer occupancy. As can be seen, for a particular value of the buffer occupancy, the optimal transmission rate increases when the channel gain decreases toward the outage point (state 0). This is consistent with our discussion in Section IV. For comparison, we also obtain an optimal policy for the i.i.d. channel model and plot its structure in Fig. 3. As can be seen, for each buffer occupancy, the optimal transmission rate increases when the channel gain increases.

[Plot: normalized packet loss rate versus power (dB); curves for MDP I (BER = $10^{-3}$), C Adpt, and C Inv.]

Fig. 4. Normalized packet loss rate (due to buffer overflow only) versus average transmission power. $B = 15$, $\lambda = 3$ packets/frame, $P_b = 10^{-3}$. Channel model is given by Table I.

C. Packet Loss due to Buffer Overflow

Now we compare the performance of MDP I policies with those of other adaptive transmission schemes. All transmission is subject to a BER constraint of $P_b = 10^{-3}$ and we only care about packet loss due to buffer overflow. We consider two other types of policies, channel inversion, i.e., C Inv, and channel adaptive, i.e., C Adpt. In a C Inv policy, the transmission rate is always kept unchanged and given the


channel gain, the necessary power is calculated based on (5) to guarantee the target BER. In a C Adpt policy, we use the optimal link-adaptive policy that maximizes the transmission rate for our channel model under some power constraint and with the assumption that there are always packets to transmit.

The performance of the three schemes, in terms of normalized packet loss rate (due to buffer overflow) versus consumed power, is shown in Fig. 4. As expected, MDP I outperforms the other two classes of adaptive policies. For low values of average transmit power, the performance of MDP I policies and C Adpt policies is very close while that of the C Inv policies is much worse. This is expected, since at low power, the structure of an MDP I policy is similar to that of the C Adpt policy, and by focusing on conserving power, the system performance is improved. At high power, the performance of MDP I and C Inv policies is close and it is interesting to see that the C Inv scheme results in a lower packet loss rate than the C Adpt scheme. This means that at this high range of average transmission power, if we only adapt to the channel, the performance can be worse than not doing any rate adaptation at all.

D. Packet Loss due to Buffer Overflow and Transmission Errors

Now we take packet transmission errors into account and compare the performance, in terms of total normalized packet loss rate (due to buffer overflow and transmission errors) versus average transmit power, of the two classes of buffer and channel adaptive transmission policies, namely MDP I and MDP II.

Fig. 5 is for the correlated channel model. We plot the performance of MDP I policies corresponding to BER values of $10^{-3}$, $10^{-4}$, $10^{-5}$, $10^{-6}$ and an MDP II scheme that has 20 different power levels, from 4 to 40 dB. As can be seen, among all the schemes, the MDP II scheme performs best. For high values of BER, i.e., $10^{-3}$ and $10^{-4}$, MDP I policies perform well in low ranges of transmission power but become much worse than the MDP II policies when the power is high.

Fig. 5. Normalized packet loss rate (due to buffer overflow and transmission errors) versus average transmission power. B = 15, λ = 3 packets/frame. Channel model is correlated over time and is given in Table I.

Fig. 6. Normalized packet loss rate (due to buffer overflow and transmission errors) versus average transmission power. B = 15, λ = 3 packets/frame. Channel model is correlated over time and is given in Table I.

Fig. 7. Normalized packet loss rate (due to buffer overflow and transmission errors) versus average transmission power (dB). B = 15, λ = 3 packets/frame. Channel model is i.i.d. over time. Curves: MDP_I with BER = $10^{-3}$, $10^{-4}$, $10^{-5}$, $10^{-6}$; MDP_II with 20 power levels.

On the other hand, for a low value of BER, i.e. $10^{-6}$, the performance of MDP I is much worse than that of MDP II in the low power range. This can be explained by looking at the structure of the MDP II policy. As MDP II tries to balance packet loss due to buffer overflow against packet loss due to transmission errors, when the power constraint is low, it tends to transmit at relatively high BER values, and when the power constraint is high, it transmits at low BER levels. In other words, at low power, the structure of an MDP II scheme is similar to those of MDP I schemes corresponding to high BER constraints. On the other hand, when the power constraint is high, MDP II is closer to an MDP I scheme with a low BER constraint. In Fig. 6, we plot the performance of different MDP II policies that correspond to different numbers of possible power levels (from 4 to 40 dB). As can be seen, even with only 5 different power levels, the MDP II scheme can perform much better than MDP I schemes. Figs. 7 and 8 show results for i.i.d. channel models, and similar effects can be observed.


Fig. 8. Normalized packet loss rate (due to buffer overflow and transmission errors) versus average transmission power (dB). B = 15, λ = 3 packets/frame. Channel model is i.i.d. over time. Curves: MDP_I with BER = $10^{-3}$; MDP_II with 5, 10, and 20 power levels.

VII. CONCLUSION

In this paper, we considered the problem of buffer and channel adaptive transmission for maximizing the system throughput subject to an average transmit power constraint. Given that accurate buffer and channel states are available for making decisions, we showed how optimal control policies can be obtained for transmission with and without a fixed BER requirement. Our paper highlights some important issues in wireless data communications. First, as nodes are only equipped with limited batteries and have to operate within a dynamic environment, cross-layer design is essential to achieve good performance and conserve energy. Second, when statistics at multiple layers are taken into account, the popular intuition associated with layered design may no longer be true. For example, this paper shows that, in a correlated fading channel, the structure of the optimal buffer and channel adaptive transmission policies can be in sharp contrast to the well known strategy of water-filling.

APPENDIX
PROOF OF LEMMA 1

Let us first prove the following Lemmas 2, 3, and 4.

Lemma 2: For all $0 \le g < K$, $J_\alpha^*(b, g)$ is increasing in the buffer occupancy $b$.

Proof: This lemma can be proved by induction. Let $J_0$ be a bounded and increasing function on the state space $(b, g)$. For $i = 1, 2, \ldots$, let

$$J_i(b, g) = \min_{u} \Big\{ C_I(b, g, u) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_{i-1}\big(q(b - u, a), g'\big) \Big\}, \tag{38}$$

where $q(b, a) = \min\{b + a, B\}$. Note that from the value iteration algorithm for solving the discounted cost problem (17), we have $J_\alpha^*(b, g) = \lim_{i \to \infty} J_i(b, g)$ for all $0 \le b \le B$ and $0 \le g < K$. Now assuming $J_{i-1}(b, g)$ is increasing in $b$ for all $g$, we will show that $J_i(b, g)$ is also increasing in $b$ for all $g$. For $0 < b \le B$, let $u^*$ be a value that achieves the minimization in (38).

a) If $u^* = 0$, then from (38) we have:

$$\begin{aligned}
J_i(b - 1, g) &\le C_I(b - 1, g, 0) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_{i-1}\big(q(b - 1, a), g'\big) \\
&< C_I(b, g, 0) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_{i-1}\big(q(b, a), g'\big) \\
&= J_i(b, g).
\end{aligned} \tag{39}$$

b) If $u^* > 0$, then from (38) we have:

$$\begin{aligned}
J_i(b - 1, g) &\le C_I(b - 1, g, u^* - 1) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_{i-1}\big(q(b - u^*, a), g'\big) \\
&< C_I(b, g, u^*) + \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a)\, J_{i-1}\big(q(b - u^*, a), g'\big) \\
&= J_i(b, g).
\end{aligned} \tag{40}$$

We have proved that if $J_{i-1}(b, g)$ is increasing in $b$ for all $g$ then the same is true for $J_i(b, g)$. Therefore, by induction, $J_\alpha^*(b, g) = \lim_{i \to \infty} J_i(b, g)$ is increasing in $b$ for all $g$.

Lemma 3: For all $0 \le b_1 < b_2 \le B$ and all $0 < g < K$, $J_\alpha^*(b_2, g) - J_\alpha^*(b_1, g)$ is upper bounded when $\beta$ increases.

Proof: Let $u_1^*$ be the optimal transmission rate in state $(b_1, g)$ and $u_2 = u_1^* + b_2 - b_1$, then

$$\begin{aligned}
J_\alpha^*(b_2, g) - J_\alpha^*(b_1, g) &\le C_I(b_2, g, u_2) + C_F(b_2, g, u_2) - C_I(b_1, g, u_1^*) - C_F(b_1, g, u_1^*) \\
&= C_I(b_2, g, u_2) - C_I(b_1, g, u_1^*) + C_F(b_2, g, u_2) - C_F(b_1, g, u_1^*).
\end{aligned} \tag{41}$$

As $u_2 = u_1^* + b_2 - b_1$, $C_F(b_2, g, u_2) = C_F(b_1, g, u_1^*)$ while

$$C_I(b_2, g, u_2) - C_I(b_1, g, u_1^*) = P(u_1^* + b_2 - b_1, g, P_b) - P(u_1^*, g, P_b).$$

Therefore

$$J_\alpha^*(b_2, g) - J_\alpha^*(b_1, g) \le P(u_1^* + b_2 - b_1, g, P_b) - P(u_1^*, g, P_b). \tag{42}$$

It is clear that the left hand side of (42) is bounded when $\beta$ increases, so the proof is completed.

Lemma 3 covers the situation in which the channel state $g > 0$; when $g = 0$, we have the following lemma.

Lemma 4: For all $0 \le b_1 < b_2 \le B$, $J_\alpha^*(b_2, 0) - J_\alpha^*(b_1, 0)$ increases without bound when $\beta$ increases.

Proof: When the channel is in state 0, no transmission is possible, therefore

$$\begin{aligned}
J_\alpha^*(b_1, 0) &= C_I(b_1, 0, 0) + \alpha \sum_{g=0}^{K-1} \sum_{a=0}^{\infty} P_G(0, g)\, p_A(a)\, J_\alpha^*\big(q(b_1, a), g\big), \\
J_\alpha^*(b_2, 0) &= C_I(b_2, 0, 0) + \alpha \sum_{g=0}^{K-1} \sum_{a=0}^{\infty} P_G(0, g)\, p_A(a)\, J_\alpha^*\big(q(b_2, a), g\big).
\end{aligned} \tag{43}$$


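Therefore

$$\begin{aligned}
J_\alpha^*(b_2, 0) - J_\alpha^*(b_1, 0) &= C_I(b_2, 0, 0) - C_I(b_1, 0, 0) + \alpha \sum_{g=0}^{K-1} \sum_{a=0}^{\infty} P_G(0, g)\, p_A(a) \Big[ J_\alpha^*\big(q(b_2, a), g\big) - J_\alpha^*\big(q(b_1, a), g\big) \Big] \\
&> C_I(b_2, 0, 0) - C_I(b_1, 0, 0) \\
&= \beta \big( L(b_2, 0) - L(b_1, 0) \big).
\end{aligned} \tag{44}$$

The inequality in (44) is due to Lemma 2. From (44), it is clear that $J_\alpha^*(b_2, 0) - J_\alpha^*(b_1, 0)$ increases without bound when $\beta$ increases and the proof is completed.

Proof of Lemma 1: First of all, we have

$$\Delta_I(b, 1, u_1, u_2) - \Delta_I(b, 2, u_1, u_2) = W N_0 \big( f(u_2, P_b) - f(u_1, P_b) \big) \Big( \frac{1}{\gamma_1} - \frac{1}{\gamma_2} \Big). \tag{45}$$

Therefore, the left hand side of (22) does not depend on $\beta$. For the right hand side of (22), we have:

$$\Delta_F(b, g, u_1, u_2) = \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} P_G(g, g')\, p_A(a) \Big( J_\alpha^*\big(q(b - u_1, a), g'\big) - J_\alpha^*\big(q(b - u_2, a), g'\big) \Big). \tag{46}$$

Now

$$\begin{aligned}
\Delta_F(b, 1, u_1, u_2) - \Delta_F(b, 2, u_1, u_2) &= \alpha \sum_{g'=0}^{K-1} \sum_{a=0}^{\infty} \big( P_G(1, g') - P_G(2, g') \big)\, p_A(a) \Big( J_\alpha^*\big(q(b - u_1, a), g'\big) - J_\alpha^*\big(q(b - u_2, a), g'\big) \Big)
\end{aligned} \tag{47}$$

$$\begin{aligned}
&= \alpha \sum_{g'=1}^{K-1} \sum_{a=0}^{\infty} \big( P_G(1, g') - P_G(2, g') \big)\, p_A(a) \Big( J_\alpha^*\big(q(b - u_1, a), g'\big) - J_\alpha^*\big(q(b - u_2, a), g'\big) \Big) \\
&\quad + \alpha \sum_{a=0}^{\infty} \big( P_G(1, 0) - P_G(2, 0) \big)\, p_A(a) \Big( J_\alpha^*\big(q(b - u_1, a), 0\big) - J_\alpha^*\big(q(b - u_2, a), 0\big) \Big).
\end{aligned} \tag{48}$$
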

As β increases, the first term in (48) is bounded from below (from Lemmas 2 and 3) while the second term increases without bound (from Lemma 4). This combined with (45) completes the proof.

REFERENCES

[1] A. T. Hoang and M. Motani, “Buffer and channel adaptive transmission over fading channels with imperfect channel state information,” in Proc. IEEE WCNC 2004, Atlanta, Mar. 2004, pp. 1891–1896.
[2] ——, “Cross-layer adaptive transmission: coping with incomplete system state information,” submitted to IEEE Trans. Commun., 2006.
[3] A. J. Goldsmith and P. P. Varaiya, “Capacity of fading channels with channel side information,” IEEE Trans. Inform. Theory, vol. 43, pp. 1986–1992, Nov. 1997.
[4] A. J. Goldsmith and S. G. Chua, “Variable-rate variable-power MQAM for fading channels,” IEEE Trans. Commun., vol. 45, no. 10, pp. 1218–1230, Oct. 1997.
[5] B. Collins and R. Cruz, “Transmission policy for time varying channel with average delay constraints,” in Proc. 1999 Allerton Conf. on Commun. Control and Comp, 1999, pp. 1–9.
[6] R. A. Berry and R. G. Gallager, “Communication over fading channels with delay constraints,” IEEE Trans. Inform. Theory, vol. 48, no. 5, pp. 1135–1149, May 2002.
[7] M. Goyal, A. Kumar, and V. Sharma, “Power constrained and delay optimal policies for scheduling transmission over a fading channel,” in Proc. IEEE INFOCOM’03, Mar. 2003, pp. 311–320.
[8] A. Fu, E. Modiano, and J. Tsitsiklis, “Optimal energy allocation for delay-constrained data transmission over a time-varying channel,” in Proc. IEEE INFOCOM’03, Mar. 2003.
[9] D. Rajan, A. Sabharwal, and B. Aazhang, “Transmission policies for bursty traffic sources on wireless channels,” in Proc. 35th Annual Conference on Information Science and Systems, Baltimore, Mar. 1991.
[10] H. Wang and N. Mandayam, “A simple packet scheduling scheme for wireless data over fading channels,” IEEE Trans. Commun., vol. 52, no. 7, pp. 1055–1059, July 2004.
[11] Q. Liu, S. Zhou, and G. B. Giannakis, “Cross-layer combining of adaptive modulation and coding with truncated ARQ over wireless links,” IEEE Trans. Wireless Commun., vol. 3, no. 5, pp. 1746–1755, Sept. 2004.
[12] ——, “Queuing with adaptive modulation and coding over wireless link: Cross-layer analysis and design,” IEEE Trans. Wireless Commun., vol. 4, no. 3, pp. 1142–1153, May 2005.
[13] H. S. Wang and N. Moayeri, “Finite-state Markov channel–a useful model for radio communication channels,” IEEE Trans. Veh. Technol., vol. 44, pp. 473–479, Feb. 1995.
[14] D. Zhang, W. B. Wu, and K. M. Wasserman, “Analysis on Markov modeling of packet transmission over wireless channels,” in Proc. IEEE WCNC’02, Mar. 2002, pp. 876–880.
[15] B. Vucetic, “An adaptive coding scheme for time-varying channels,” IEEE Trans. Commun., vol. 39, pp. 653–663, May 1991.
[16] W. T. Webb and R. Steele, “Variable rate QAM for mobile radio,” IEEE Trans. Commun., vol. 43, pp. 2223–2230, July 1995.
[17] A. T. Hoang and M. Motani, “Buffer and channel adaptive modulation for transmission over fading channels,” in Proc. ICC’03, July 2003, pp. 2748–2752.
[18] G. J. Foschini and J. Salz, “Digital communications over fading radio channels,” Bell Syst. Tech. J., pp. 429–456, Feb. 1983.
[19] P. R. Kumar and P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control. Englewood Cliffs, NJ: Prentice Hall, 1986.

Anh Tuan Hoang (IEEE Member) received the Bachelor degree (with First Class Honours) in telecommunications engineering from the University of Sydney in 2000. He completed his Ph.D. degree in electrical engineering at the National University of Singapore in 2005. Dr. Hoang is currently a Research Fellow at the Department of Networking Protocols, Institute for Infocomm Research, Singapore. His research focuses on the design and optimization of wireless communication networks. Specific areas of interest include cross-layer design, dynamic spectrum access, and cooperative communications.

Mehul Motani is an Assistant Professor in the Electrical and Computer Engineering Department at the National University of Singapore. He graduated with a Ph.D. from Cornell University, focusing on information theory and coding for CDMA systems. Prior to his Ph.D., he was a member of technical staff at Lockheed Martin in Syracuse, New York for over four years. Recently he has been working on research problems which sit at the boundary of information theory, communications and networking, including the design of wireless ad-hoc and sensor network systems. He was awarded the Intel Foundation Fellowship for work related to his Ph.D. in 2000. He is on the organizing committees for ISIT 2006 and 2007 and the technical program committees of MobiCom 2007 and Infocom 2008 and several other conferences. He participates actively in IEEE and ACM and has served as the secretary of the IEEE Information Theory Society Board of Governors.


Non Binary and Precoded Faster Than Nyquist Signaling

Fredrik Rusek and John B. Anderson

Abstract—Faster than Nyquist (FTN) signaling is an important method of narrowband coding. The concept is extended here to non binary signal constellations; these are much more bandwidth efficient than binary ones. A powerful method of finding the minimum distance for binary and non binary FTN is presented. Precoding FTN transmissions with short linear filters proves to be an effective way to gain distance. A Shannon limit to bit error rate is derived that applies for FTN. Tests of an M-algorithm receiver are performed and compared to this limit.

Index Terms—Coded modulation, Mazo limit, faster than Nyquist, bandwidth efficient coding.

I. INTRODUCTION

THE concept of Faster Than Nyquist (FTN) signaling is well established. If a pulse amplitude modulation (PAM) signal $\sum_n a[n] v(t - nT)$ is based on an orthogonal pulse $v(t)$, the pulses can be packed closer than the Nyquist rate $1/T$ without suffering any distance loss. In a bandpass system two quadrature signals can be used. The result is a much more bandwidth-efficient coding system. Mazo showed [1] that for binary sinc pulses the symbol time can be reduced to $0.802T$ without suffering any loss in minimum Euclidean distance. We refer to this value as the Mazo limit. An introduction to the philosophy of Mazo signaling has been given in [2].

At first, FTN signaling seems to contradict the Nyquist limit and so it is useful to review how it works. Nyquist pulse signal design is based on orthogonality. There exist about $2WT$ orthogonal signals in $W$ positive Hertz and $T$ seconds. By means of filters matched to each one, data values that modulate each can be maximum likelihood detected independently, and therefore about $2WT$ symbols can be transmitted. If $v$ is $\sqrt{1/T}\,\mathrm{sinc}(t/T)$ and there are $N$ pulses each in the I and Q baseband channels, the product $2WT$ tends in ratio to $2(1/T)(NT) = 2N$. The sinc pulses thus carry as many symbols as any orthogonal pulse train can carry. If the aim is to achieve asymptotically the same error rate, without necessarily using orthogonal pulses, then the sinc pulses can arrive faster than $1/T$. A more complex maximum likelihood sequence estimation (MLSE) receiver is required because of intersymbol interference.

A similar phenomenon occurs for other $T$-orthogonal pulses, and the limits for root raised cosine (RC) pulses with excess bandwidth were derived in [3]. Efficient receivers for FTN signaling were presented there for the first time. Methods of computing the minimum distance of FTN signaling can be found in [4] and [5]. Mazo-type limits can be derived for pulse shapes that are not orthogonal for any $T$ [6]. Mazo limit phenomena turn up in other places as well, for example, in constant-envelope coded modulation; see [7] and references therein. Precoding strategies for FTN were studied in [8] and [9]. The non binary case has not been studied as much, and its minimum distances are still an open problem.

In this paper we develop an efficient method of finding minimum distances for non binary FTN. Distances for short (4–8 tap) optimal precoding filters with quaternary as well as binary FTN are also studied. The paper is organized as follows. In section II we give the system model and in section III we derive the algorithm used to search for the minimum Euclidean distance. In section IV a method to find optimal precoding filters is presented. Numerical results and capacity calculations are given in sections V and VI. Decoding is discussed in section VII.

Paper approved by H. Leib, the Editor for Communication and Information Theory of the IEEE Communications Society. Manuscript received February 8, 2006; revised November 23, 2006 and April 4, 2007. This work was supported in part by the Swedish Research Council (VR), grant number 6212003-3210. The authors are with the Department of Electrical and Information Technology, Lund University, Box 118 SE-221 00 Lund, Sweden (e-mail: {fredrikr, anderson}@eit.lth.se). Digital Object Identifier 10.1109/TCOMM.2008.060075. 0090-6778/08$25.00 © 2008 IEEE.

II. SYSTEM MODEL

Consider a baseband PAM system based on a $T$-orthogonal pulse $\psi(t)$. We are mostly interested in $\psi(t)$ being a root RC pulse with excess bandwidth $\alpha$. When $\alpha = 0$, $\psi(t)$ is a sinc pulse. The one sided bandwidth of $\psi(t)$ is $W = (1 + \alpha)/(2T)$. The transmitted signal is

$$s_a(t) = \sum_{n=-\infty}^{\infty} a[n]\, \psi(t - n\tau T), \qquad \tau \le 1, \tag{1}$$

where $a[n]$ are independent identically distributed data symbols and $1/\tau T$ is the signaling rate. We assume $\psi(t)$ to be unit energy, i.e. $\int_{-\infty}^{\infty} |\psi(t)|^2\, dt = 1$. For $T$-orthogonal pulses the system will not suffer from intersymbol interference (ISI) when $\tau = 1$. For $\tau < 1$ we say that we have FTN signaling, and ISI is unavoidable. The normalized bandwidth consumption is

$$\mathrm{nbw} = \tau T\, \frac{1 + \alpha}{2T}\, \frac{1}{\log_2 M_a} \quad \text{Hz/bit/s}, \tag{2}$$

where $M_a$ is the data alphabet size. The optimum receiver should filter the received signal $s_a(t) + n(t)$, where $n(t)$ is additive white Gaussian noise (AWGN) with spectral density $N_0/2$, with a filter matched to $\psi(t)$ [3]. This should be followed by sampling every $\tau T$

second and a decoding algorithm to mitigate the effects of the ISI. The system model is illustrated in figure 1.

Fig. 1. Overall system of faster-than Nyquist signaling. Consecutive data symbols $a[n]$ are spaced every $\tau T$ seconds. [Block diagram not reproduced.]

For MLSE reception, it can be shown that there exist constants $K_1$ and $K_2$ such that the probability of a symbol error $P_s$ can be bounded by [16]

$$K_1 Q\Big( \sqrt{d_{\min}^2 \frac{E_b}{N_0}} \Big) \le P_s \le K_2 Q\Big( \sqrt{d_{\min}^2 \frac{E_b}{N_0}} \Big). \tag{3}$$

These inequalities are tight for large $E_b/N_0$, and $d_{\min}^2$ thus drives the asymptotic error probability and is a measure of a system's noise immunity. The square Euclidean distance, henceforth simply called “distance”, between the (real) data sequences $a$ and $a'$ is

$$\begin{aligned}
d^2(a, a') &\triangleq \frac{1}{2E_b} \int_{-\infty}^{\infty} |s_a(t) - s_{a'}(t)|^2\, dt \\
&= \frac{1}{2E_b} \int_{-\infty}^{\infty} \Big| \sum_{n=-\infty}^{\infty} (a[n] - a'[n])\, \psi(t - n\tau T) \Big|^2 dt \\
&= \int_{-\infty}^{\infty} \Big| \sum_{n=-\infty}^{\infty} e[n]\, \psi(t - n\tau T) \Big|^2 dt \\
&= \sum_{m,n=-\infty}^{\infty} e[m]\, \rho_\psi[n - m]\, e[n] = d^2(e),
\end{aligned} \tag{4}$$

where

$$\rho_\psi[n] = \int_{-\infty}^{\infty} \psi(t)\, \psi(t + n\tau T)\, dt \tag{5}$$

is the autocorrelation of the continuous pulse $\psi(t)$ at samples spaced $\tau T$ seconds, and $e[n] = (a[n] - a'[n])/\sqrt{2E_b}$. An important fact is that (4) takes the linear form

$$d^2(e) = \sum_{k=-\infty}^{\infty} r_e[k]\, \rho_\psi[k], \tag{6}$$

where

$$r_e[k] = \sum_{n=-\infty}^{\infty} e[n]\, e[n + k]. \tag{7}$$
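The linear form (6)–(7) is easy to evaluate numerically. The following is a minimal sketch, not the search method of this paper: it assumes a unit-energy sinc pulse, so that $\rho_\psi[k] = \mathrm{sinc}(k\tau)$, binary data with error symbols in $\{0, \pm\sqrt{2}\}$, and an exhaustive search over short error events. The function names are illustrative only.

```python
import itertools
import numpy as np

def error_autocorr(e):
    """r_e[k] of (7) for lags k = -(len(e)-1) .. len(e)-1."""
    e = np.asarray(e, dtype=float)
    return np.correlate(e, e, mode="full")

def distance(e, tau):
    """d^2(e) from (6), assuming a sinc pulse so that rho_psi[k] = sinc(k*tau)."""
    r = error_autocorr(e)
    lags = np.arange(-(len(e) - 1), len(e))
    rho = np.sinc(lags * tau)          # np.sinc(x) = sin(pi x) / (pi x)
    return float(np.dot(r, rho))

def brute_force_dmin(tau, max_len):
    """Smallest d^2 over binary error events of at most max_len symbols."""
    s = np.sqrt(2.0)                   # (a - a') / sqrt(2 Eb) for a = +/- sqrt(Eb)
    best = distance([s], tau)          # single-error event
    for length in range(2, max_len + 1):
        # first symbol fixed to +s (sign symmetry), last symbol nonzero
        for mid in itertools.product((-s, 0.0, s), repeat=length - 2):
            for last in (-s, s):
                best = min(best, distance((s,) + mid + (last,), tau))
    return best
```

With these assumptions, `brute_force_dmin(0.9, 6)` stays at the antipodal value 2, while `brute_force_dmin(0.7, 6)` falls below 2, in line with the Mazo limit of $0.802T$ quoted above. Because only short events are searched, the result is in general an upper bound on $d_{\min}^2$.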

By concatenating an outer code to the FTN signals the $d_{\min}^2$ here can be significantly increased, which will reduce BER. There may be a bandwidth expansion, so it will be important to compare systems with similar bandwidth in what follows.

Mazo's claim that it is possible to transmit with symbol time $0.802T$ for $\alpha = 0$ without distance loss was proven rigorously in [4]. The results for $\alpha > 0$, given in [3], were obtained by an exhaustive search out to 14 error symbols. For the nonbinary case little is known. Finding minimum distances by searching is very hard for large alphabet sizes since there are $|E|^L$ error events of length $L$ for the error symbol alphabet $E$. For 8 PAM, $|E| = 15$, and searching out to length 14 as in [3] gives $3 \times 10^{16}$ error sequences, which is beyond our computation capability. Therefore symmetry properties of the error events are important. In [8] the following hypothesis was stated: If $|\rho_\psi[1]| \gg |\rho_\psi[n]|$, $n > 1$, then the error event causing the minimum distance should be one where the symbols alternate in sign. Another hypothesis is that for low enough bandwidth the worst error event is a zero sum event, i.e. $\sum_n e[n] = 0$. Low enough bandwidth means typically nbw < 0.2 Hz/bit/s; see [7] and references therein. Both these hypotheses heavily reduce the computation. However, we can never be sure that they are valid and in our numerical results we give an example where a search based on the first hypothesis does not produce the minimum distance of the system. We will show that the second assumption gives too small a reduction to be really useful. We therefore take a new look at the problem of efficiently finding the minimum distance for non binary alphabets.

The notation used is as follows:

$e$: discrete vector of symbols, with $n$th element $e[n]$
$\rho_\psi[n]$: $\tau T$-sampled autocorrelation of a continuous pulse $\psi(t)$
$g_b[n]$: autocorrelation of a discrete sequence $b[n]$
$T_N e$: truncation to the first $N + 1$ symbols of $e$
$d^2_{\rho_\psi}(e)$: distance of $e$ calculated by (4) using $\rho_\psi[n]$
$u \star v$: convolution of $u$ and $v$
$u^*[n]$: complex conjugate of $u[n]$
$\mathrm{supp}(e)$: support of $e$

III. FINDING THE MINIMUM DISTANCE

We start by describing a different but closely related problem; we will then transform our original ISI problem into the new one. Consider the finite causal ISI tap sequence $b[n]$. The transmitted signal for data symbols $a$ and generator sequence $b$ is

$$x_{a|b}[k] = \sum_{n=-\infty}^{\infty} a[n]\, b[k - n]. \tag{8}$$

The distance between two data signals is

$$d^2(a, a') \triangleq \frac{1}{2E_b} \sum_{n=-\infty}^{\infty} |x_{a|b}[n] - x_{a'|b}[n]|^2 = \sum_{m,n=-\infty}^{\infty} e[m]\, g_b[n - m]\, e[n] = d^2(e), \tag{9}$$

with

$$g_b[k] = \sum_{n=-\infty}^{\infty} b[n]\, b[n + k]. \tag{10}$$

It can be shown [10] that the Z transform of $g_b[k]$ can be written as

$$G_b(z) = c c^* \prod_{i=1}^{N_z} (1 - \zeta_i z^{-1})(1 - \zeta_i^* z), \tag{11}$$

where $\zeta_i$ and $\zeta_i^*$ are the zeros of $G_b(z)$ and $c$ is a normalization constant. From (11) we see that it is always possible to construct

$$H(z) = c \prod_{i=1}^{N_z} (1 - \zeta_i z^{-1}), \tag{12}$$

such that

$$H^*(1/z^*) = c^* \prod_{i=1}^{N_z} (1 - \zeta_i^* z), \tag{13}$$


and

$$G_b(z) = H(z)\, H^*(1/z^*). \tag{14}$$

Let $h[n]$ be a sequence obtained by the inverse z-transform of $H(z)$, i.e. $h[n] = Z^{-1}\{H(z)\}$; note that since there is a great degree of freedom when constructing $H(z)$ we have in general $b[n] \ne h[n]$. We say that $h[n]$ and $H(z)$ are obtained from the spectral factorization of $G_b(z)$ [10]. Since $h^*[-n] = Z^{-1}\{H^*(1/z^*)\}$ it follows that

$$g_b[k] = \sum_{n=-\infty}^{\infty} h[n]\, h[n + k]. \tag{15}$$
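The factorization (11)–(15) can be illustrated numerically. The sketch below is one possible realization under stated assumptions (real taps, no zeros exactly on the unit circle): it computes the zeros of $G_b(z)$ with `numpy.roots`, keeps the half of them with the smallest magnitude, and rescales so that the energy $g_b[0]$ is preserved. The helper names are hypothetical and not taken from [10].

```python
import numpy as np

def autocorr(x):
    """Two-sided autocorrelation g[k], lags -(len(x)-1) .. len(x)-1, cf. (10)."""
    x = np.asarray(x, dtype=float)
    return np.correlate(x, x, mode="full")

def spectral_factor_inside(b):
    """Return h with the same autocorrelation as b and all zeros inside the unit circle.

    Assumes real taps, nonzero first and last tap, and no zeros on the unit circle.
    """
    g = autocorr(b)
    zeros = np.roots(g)                       # zeros of G_b(z) come in reciprocal pairs
    zeros = zeros[np.argsort(np.abs(zeros))]
    inside = zeros[: len(b) - 1]              # the half with the smallest magnitude
    h = np.real(np.poly(inside)).astype(float)   # coefficients of prod (1 - zeta_i z^-1)
    scale = np.sqrt(g[len(b) - 1] / np.dot(h, h))  # match the energy g_b[0]
    return h * scale
```

For example, the taps `[1.0, 2.25, 0.5]` have zeros at $-0.25$ and $-2$; the sketch returns `[2.0, 1.5, 0.25]`, whose zeros are $-0.25$ and $-0.5$ and whose autocorrelation equals that of the original taps, as (15) requires.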

If $H(z)$ is obtained from the factorization with $|\zeta_i| \le 1$, $\forall i$, the sequence obtained by the inverse z-transform is minimum phase and is denoted $h_{mp}[n]$. The minimum distances of all $h[n]$, including $b[n]$ and $h_{mp}[n]$, are equal since they have the same autocorrelation sequence. But more effective bounds will stem from $h_{mp}[n]$ since it is minimum phase. Henceforth we construct all tap sequences such that they are minimum phase, and $b[n]$ will be taken as the factorization $h_{mp}[n]$.

An efficient branch and bound algorithm to find the minimum distance of an ISI sequence is given in [14]. The algorithm works as follows. For a given error event $e$, a lower bound on distance for all error events starting with the same symbols as $e$ is found. This lower bound is then compared to an upper bound for $d_{\min}^2$; when the lower bound is larger than the upper, the whole tree emanating from $e$ is removed. The lower bound is lemma 1 below. This bound can be further sharpened but this is omitted here since we will eventually replace the lemma with another.

Lemma 1: Given a generator sequence $h[n]$ and a particular error sequence $e_s[n]$, if $A_N(e_s)$ is the set of error sequences

$$A_N(e_s) = \{e : T_N e = T_N e_s\}, \tag{16}$$

then a lower bound for these is

$$l_N^2(e_s) \triangleq \sum_{n=0}^{N} |x_{e_s|h}[n]|^2 \le \min_{e \in A_N(e_s)} \{d^2(e)\}. \tag{17}$$

The lemma implies that if $l_N^2(e_s)$ is larger than some known upper bound $d_{ub}^2$ to $d_{\min}^2$ then all events in the set $A_N(e_s)$ can be eliminated from the search for $d_{\min}^2$, as previously mentioned. Note that the distance of any error event gives an upper bound to $d_{\min}^2$. Based on the sequence $h[n]$ and lemma 1 it is straightforward to set up a branch and bound algorithm to solve for $d_{\min}^2$. Any $h[n]$ giving the same autocorrelation $g$ may be used, but the minimum phase one will be most effective in curtailing the search.

We now return to our original problem: given an arbitrary time continuous pulse $\psi(t)$, find the minimum distance $d_{\min}^2$. From (6) we see that if the error event support is limited to $L + 1$ error symbols then the distance of an event only depends on $\{\rho_\psi[-L], \rho_\psi[-L + 1], \ldots, \rho_\psi[L]\}$. If we only consider events of length $L + 1$ we actually only find upper bounds to $d_{\min}^2$. But if $L$ is large, say 20 or so, we are confident that the result is valid. This is motivated by the fact that the $d_{\min}^2$ achieving error events turned out to be much shorter than the search length $(L + 1)$ used in forthcoming sections. In the sequel we write $d_{\min}^2$ when we mean upper bound to $d_{\min}^2$.
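The pruning rule of lemma 1 can be sketched as a depth-first search over error events for a discrete tap sequence. This is a simplified illustration and not the algorithm of [14]: it assumes a real causal tap sequence `h`, a sign-symmetric error alphabet that contains 0, and events of at most `max_len` symbols; the function names are hypothetical.

```python
import numpy as np

def distance(e, h):
    """d^2(e) = sum_n |x_{e|h}[n]|^2 with x the convolution of e and h, cf. (8)-(9)."""
    x = np.convolve(np.asarray(e, dtype=float), np.asarray(h, dtype=float))
    return float(np.dot(x, x))

def min_distance_bb(h, alphabet, max_len):
    """Branch and bound over error events of at most max_len symbols."""
    h = np.asarray(h, dtype=float)
    positive = sorted(a for a in alphabet if a > 0)
    # The distance of any event is an upper bound; start with a single error.
    best = positive[0] ** 2 * float(np.dot(h, h))

    def extend(e, partial):
        nonlocal best
        if e[-1] != 0:                       # a complete event ends in a nonzero symbol
            best = min(best, distance(e, h))
        if len(e) == max_len:
            return
        n = len(e)
        for a in alphabet:
            cand = e + [a]
            lo = max(0, n - len(h) + 1)
            x_n = sum(cand[k] * h[n - k] for k in range(lo, n + 1))
            bound = partial + x_n ** 2       # lemma 1 bound for the extended prefix
            if bound < best:                 # otherwise the whole subtree is pruned
                extend(cand, bound)

    for a in positive:                       # sign symmetry: first symbol positive
        extend([a], (a * h[0]) ** 2)
    return best
```

On small examples the result can be checked against exhaustive enumeration; a minimum phase `h` concentrates energy in the early taps, so the partial sums grow quickly and more of the tree is pruned.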

Since $d_{\min}^2$ for $\psi(t)$ only depends on a finite sequence of autocorrelation values there is in principle nothing that distinguishes this problem from finding $d_{\min}^2$ for a discrete tap sequence. We could try to find a sequence $b[n]$ having an autocorrelation sequence equal to

$$g_b[n] = \begin{cases} \rho_\psi[n], & |n| \le L \\ 0, & \text{otherwise.} \end{cases} \tag{18}$$

But this truncated $g_b[n]$ is in general not a valid autocorrelation sequence and consequently no sequence $b[n]$ need exist such that $g_b[n] = b[n] \star b^*[-n]$. Thus lemma 1 cannot be used. Note that if we truncate the pulse $\psi(t)$ to a certain length, the (finite) autocorrelation stemming from the truncation is valid; then the method to come is unnecessary. However, the obtained result is then only a (good) approximation. If we want to avoid truncations and seek distances for infinite pulse shapes we need the method below. Furthermore, using our approach the problem of escaped distance (see [7], chapter 6) is completely avoided.

The following lemma gives a sufficient and necessary condition for a sequence to be a valid autocorrelation sequence. A formal proof is found in [15], although the lemma has appeared much earlier.

Lemma 2: A sequence $g[n]$ with Hermitian symmetry is a valid autocorrelation sequence if and only if

$$G(e^{i\omega}) = \sum_{k=-\infty}^{\infty} g[k]\, e^{-i\omega k} \ge 0, \quad \text{for all } \omega \in (-\pi, \pi]. \tag{19}$$

Now let $g_b$ be as in (18) and take

$$\theta = \min_{\omega \in (-\pi, \pi]} G_b(e^{i\omega}). \tag{20}$$

The case $\theta \ge 0$ implies that a sequence $b[n]$ can be found from $g_b$ and consequently the algorithm in [14] can be applied. Take $\theta < 0$ and define a new autocorrelation sequence $g_{b'}$ such that

$$g_{b'}[n] = \begin{cases} g_b[n] - \theta, & n = 0 \\ g_b[n], & n \ne 0. \end{cases} \tag{21}$$

From $g_{b'}[n]$ it is now possible to obtain a sequence $b'[n]$ through spectral factorization since $G_{b'}(e^{i\omega}) \ge 0$, $\omega \in (-\pi, \pi]$. However, the distance of an error event $e[n]$ calculated using $g_{b'}$ is not equal to the distance calculated using $g_b$ (or $\rho_\psi$). In fact, from (6),

$$d^2_{g_b}(e) = d^2_{g_{b'}}(e) + \theta\, r_e[0]. \tag{22}$$
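Steps (18)–(22) can be checked numerically with a short sketch. It assumes a sinc pulse, so that $\rho_\psi[n] = \mathrm{sinc}(n\tau)$, and it approximates the minimum in (20) on a finite frequency grid, which is an assumption of this illustration rather than part of the method; the function names are hypothetical.

```python
import numpy as np

def spectrum(g_onesided, omega):
    """G(e^{i w}) of (19) for a real symmetric sequence given as g[0..L]."""
    n = np.arange(1, len(g_onesided))
    return g_onesided[0] + 2.0 * np.cos(np.outer(omega, n)) @ g_onesided[1:]

def shifted_autocorr(tau, L, grid=4097):
    """Truncated sequence (18), theta of (20) on a grid, and the shifted sequence (21)."""
    g = np.sinc(np.arange(L + 1) * tau)         # lags 0..L for a sinc pulse
    omega = np.linspace(-np.pi, np.pi, grid)
    theta = float(np.min(spectrum(g, omega)))
    g_shift = g.copy()
    if theta < 0:
        g_shift[0] -= theta                     # raise g[0] so the spectrum is nonnegative
    return g, g_shift, theta, omega

def distance(e, g_onesided):
    """d^2(e) = sum_k r_e[k] g[k], cf. (6); requires len(e) <= len(g_onesided)."""
    e = np.asarray(e, dtype=float)
    r = np.correlate(e, e, mode="full")[len(e) - 1:]    # r_e[0..len(e)-1]
    return float(r[0] * g_onesided[0] + 2.0 * np.dot(r[1:], g_onesided[1:len(e)]))
```

For $\tau = 0.5$ and $L = 10$ the truncated spectrum dips below zero, so $\theta < 0$, and for any event `e` the value `distance(e, g)` agrees with `distance(e, g_shift) + theta * sum(e**2)`, which is identity (22).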

Due to (22) lemma 1 does not hold and consequently the algorithm to find $d_{\min}^2$ must be modified. First let $\epsilon$ denote the largest energy among the error symbols in $E$, i.e.

$$\epsilon = \max |e[n]|^2, \qquad e[n] \in E. \tag{23}$$

We now initialize every error event with the “distance” $(L + 1)\epsilon\theta$. This corresponds to the worst case of $\theta r_e[0]$ in (22) for a given search length $L$. We can modify lemma 1 into

Lemma 3: Let the set $A_N(e_s) = \{e : T_N e = T_N e_s,\ \mathrm{supp}(e) \le L + 1\}$, with $N \le L$. For $g_b$ as in (18) and a particular error sequence $e_s[n]$, a lower bound for set $A_N(e_s)$ is

$$\lambda_N^2(e_s) \triangleq \sum_{n=0}^{N} |x_{e_s|b'}[n]|^2 + (L + 1)\epsilon\theta + \sum_{n=0}^{N} (e_s^2[n] - \epsilon)\theta \le \min_{e \in A_N(e_s)} \{d^2_{g_b}(e)\}. \tag{24}$$

Proof: According to (22) we can write $d^2_{g_b}(e)$ as

$$\begin{aligned}
d^2_{g_b}(e) &= \sum_{n=0}^{\infty} |x_{e|b'}[n]|^2 + \theta \sum_{n=0}^{L} e^2[n] \\
&= \sum_{n=0}^{N} |x_{e|b'}[n]|^2 + \sum_{n=N+1}^{\infty} |x_{e|b'}[n]|^2 + \theta (L + 1)\epsilon + \sum_{n=0}^{L} (e^2[n] - \epsilon)\theta \\
&= \lambda_N^2(e) + \sum_{n=N+1}^{\infty} |x_{e|b'}[n]|^2 + \sum_{n=N+1}^{L} (e^2[n] - \epsilon)\theta.
\end{aligned} \tag{25}$$

Since $\epsilon \ge e^2[n]$, $\theta \le 0$ implies that the last term of (25) is always nonnegative. $|x_{e|b'}[n]|^2$ being nonnegative, we have $\lambda_N^2(e) \le d^2_{g_b}(e)$. All sequences in $A_N(e_s)$ have the same first $N + 1$ components as $e_s$ and we have for all $e \in A_N(e_s)$

$$\lambda_N^2(e_s) = \lambda_N^2(e) \le d^2_{g_b}(e), \tag{26}$$

which especially implies that $\lambda_N^2(e_s) \le \min_{e \in A_N(e_s)} \{d^2_{g_b}(e)\}$ and the proof is complete.

We now further sharpen lemma 3. Let

$$d^2_{\min, g_{b'}}[n] = \min_{e\,:\,\mathrm{supp}(e) \le n} \{d^2_{g_{b'}}(e)\}, \qquad n \text{ integer}. \tag{27}$$

We can then prove the following lemma.

Lemma 4: Define the set $B_N(e_s) = \{e : T_N e = T_N e_s,\ T_N e \ne e,\ \mathrm{supp}(e) \le L + 1\}$, with $N \le L$. Let $\Delta_N(e) = d^2_{g_b}(T_N e) - \lambda_N^2(e) - (L - N)\epsilon\theta$. Then for $e \in B_N(e_s)$ and $\Delta_N(e) < d_{\min, g_{b'}}[L + 1 - N]$ we have

$$d^2_{g_b}(e) \ge d^2_{\min, g_{b'}}[L + 1 - N] + d_{\min, g_{b'}}[L + 1 - N]\, \Delta_N(e) + d^2_{g_b}(T_N e) - (L - N)\epsilon\theta. \tag{28}$$

And for the case $\Delta_N(e) \ge d_{\min, g_{b'}}[L + 1 - N]$ we have

$$d^2_{g_b}(e) \ge \lambda_N^2(e_s). \tag{29}$$

This lemma is proved in appendix B. The lemma is a modification of a lemma in [15]. It is now straightforward to set up a branch and bound algorithm to find the minimum distance of the system. For lemma 4 to be useful, it should be easy to find $d^2_{\min, g_{b'}}[n]$ compared to $d^2_{\min, g_b}[n]$. From our experience this is always the case; since $g_{b'}$ is a valid autocorrelation sequence the search method in [14] can be applied. The root node (depth 0) should be initialized by $(L + 1)\epsilon\theta$ and the branch metric at depth $k$ is

$$|x_{e|b'}[k]|^2 + (e^2[k] - \epsilon)\theta. \tag{30}$$

If the expressions for $d^2_{g_b}(e_s)$ in lemma 4 are larger at some node than an upper bound to $d_{\min}^2$, the entire tree beyond that node can be removed from the search.

Fig. 2. System model for precoded FTN signaling. The input data symbols are spaced $\tau T$ seconds apart. [Block diagram not reproduced.]

IV. PRECODED FASTER THAN NYQUIST SIGNALING

In this section we improve $d_{\min}^2$ by precoding the input data sequence. In the literature there is a rich variety of coding/precoding methods for ISI and partial response signaling (PRS) channels; see [11]–[13] and references therein. Most of these methods work by applying some sort of rate decreasing outer code and possibly a precoder; sometimes the precoder is a linear filter. In [3] constraint coding was used to increase the distance, but the systems there could never achieve the full antipodal distance of an uncoded system. The task of the precoder is usually to ease the decoding burden at the expense of the bit error rate (BER); examples are the famous Tomlinson-Harashima precoder and the Laroia-Tretter-Farvardin precoder from [12]. We will in a sense do the opposite: use a rate 1 PRS precoder in order to improve the BER, at the cost of decoding complexity. This strategy was used in [8], with an argument tracing back to Forney [16], but the results are apparently incorrect.¹

¹ Our outcomes differ from [8] in a number of ways in the sequel, most often because the error events explored in [8] were too short. Some brief examples are as follows. (i) The distance 0.9023 marked by an asterisk in table III stems from the error sequence 2, −2, 0 repeated 15 times, an event of length 44; [8] suggests 0.9778, which stems from a shorter event. (ii) Combining the duobinary pulse (37) having $\rho = 0.65$ with the 2-tap prefilter 1.4302, −1.4302 leads not to signals with $d_{\min}^2 = 2$, but to $d_{\min}^2 = 0.3665$; it is achieved by the error event 2, −2, 2, 0, 0, 0, 2, −2, 2, 0, 0, 0, 2, −2, 2. Furthermore, for optimum 2-tap prefilters, $d_{\min}^2$ cannot equal 2 for any $\rho \le 0.867$. (iii) The paper proposes another system in which the transfer function is $|H(f)| = \sqrt{(\pi T/2\rho) \cos(\pi T f/\rho)}$, $|f| < (\rho/2T)$, $\rho \le 1$. It is claimed that $\rho = 0.60$ and the 3-tap prefilter 1.3727, −1.3727, 1.3727 lead to $d_{\min}^2 = 2$, but we find that $d_{\min}^2$ is 0.7027, achieved by the event 2, −2, 2, −2, 0, 0, 2, −2, 2, −2, 0, 0, 2, −2, 2, −2. Furthermore, for the best 3-tap prefilters, $d_{\min}^2$ cannot equal 2 for $\rho \le 0.775$.

A method of designing optimal linear filters with respect to $d_{\min}^2$ is given in [14]. The scope of that paper was to design optimal PRS codes based on orthogonal interpolation pulses $\psi(t)$. A technique called partial spectrum mapping was used in [15] for designing bandlimited filters; this technique is essentially what we make use of next. But since the algorithm in section III was unknown it was not possible to find $d_{\min}^2$ and it could only be estimated by means of a heavy search. Furthermore, how to implement the pulses obtained via partial spectral mapping was not described; there was no concept of an underlying pulse form.

Assume that the input data sequence is convolved with a finite tap sequence $b[n]$. Alternatively this can be seen as a linear modulation of the data sequence with a different pulse $\phi(t)$; this is illustrated in figure 2. The transmitted signal is

812

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

given by

$$ s_a(t) = \sum_{n=-\infty}^{\infty} a[n]\,\phi(t - n\tau T), \tag{31} $$

where φ(t) is

$$ \phi(t) = \sum_{n=0}^{L_b-1} b[n]\,\psi(t - n\tau T), \tag{32} $$

and $L_b$ is the support of b[n]. It can be shown that the autocorrelation $\rho_\phi[n]$ is

$$ \rho_\phi[n] = g_b[n] \star \rho_\psi[n]. \tag{33} $$

TABLE I. The obtained minimum distances for 4 and 8 PAM. For 4 PAM the search length L was 25 for all α. For 8 PAM L was 20 for all α. Entries are $d^2_{\min}$ for $M_a$ PAM.

| τ | α = 10%, $M_a$ = 4 | α = 10%, $M_a$ = 8 | α = 20%, $M_a$ = 4 | α = 20%, $M_a$ = 8 | α = 30%, $M_a$ = 4 | α = 30%, $M_a$ = 8 |
|---|---|---|---|---|---|---|
| .80 | 4/5 | 2/7 | 4/5 | 2/7 | 4/5 | 2/7 |
| .75 | .708 | .204 | 4/5 | 2/7 | 4/5 | 2/7 |
| .70 | .593 | .131 | .677 | .203 | .791 | .282 |
| .65 | .358 | .0885 | .547 | .127 | .642 | .184 |
| .60 | .151 | .0479 | .254 | .091 | .437 | .114 |
| .55 | .129 | | .151 | .0393 | .198 | .0708 |
| .50 | | | .131 | .0212 | .147 | .0331 |
| .45 | | | .102 | | .128 | .0160 |

Fig. 3. Comparison of $d^2_{\min}$ for 2-, 4- and 8PAM. [Plot of $d^2_{\min}$ versus normalized bandwidth nbw in Hz/bit/s, with curves for 2PAM, 4PAM and 8PAM.]

TABLE II. Minimum distances for precoded 2 and 4 PAM with different lengths $L_b$ of the precoding filter. Search length L was L = 25 for 2 PAM, L = 20 for 4 PAM with $L_b$ = 4 and L = 18 for 4 PAM with $L_b$ = 6. The no-precoding case is included for comparison. Entries are $d^2_{\min}$ for precoded $M_a$ PAM.

| τ | uncoded, $M_a$ = 2 | uncoded, $M_a$ = 4 | $L_b$ = 4, $M_a$ = 2 | $L_b$ = 4, $M_a$ = 4 | $L_b$ = 6, $M_a$ = 2 | $L_b$ = 6, $M_a$ = 4 | $L_b$ = 8, $M_a$ = 2 |
|---|---|---|---|---|---|---|---|
| .65 | 1.60 | .64 | 2 | 4/5 | 2 | 4/5 | 2 |
| .625 | 1.47 | .59 | 2 | .759 | 2 | 4/5 | 2 |
| .58 | 1.27 | .35 | 1.59 | .578 | 1.86 | .67 | 2 |
| .55 | 1.21 | .20 | 1.46 | .445 | 1.66 | .61 | |
| .50 | 1.01 | .15 | 1.31 | .193 | | | |
| .46 | .93 | | 1.14 | | | | |

It can be shown that the distance of an error event is

$$ d^2(e) = \sum_{n=1-L_b}^{L_b-1} g_b[n]\,\mu_e[n], \tag{34} $$

where

$$ \mu_e[n] = \sum_{k=-\infty}^{\infty} r_e[k]\,\rho_\psi[k-n], \tag{35} $$

with $r_e[k]$ defined in (7). The energy of φ(t) should equal 1; this is equivalent to $\rho_\phi[0] = 1$, or from (33)

$$ \sum_{k=1-L_b}^{L_b-1} g_b[k]\,\rho_\psi[k] = 1. \tag{36} $$
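As a numerical illustration of the energy condition (36), the sketch below evaluates its left-hand side for the frequency-scaled duobinary pulse (37) with ρ = 0.65 and the 2-tap prefilter 1.4302, −1.4302 quoted in footnote 1. It is not the authors' code. Two assumptions are made and should be read as such: $g_b[n]$ is taken to be the autocorrelation of the taps b[n], and the closed form used for $\rho_\psi[n]$ is our own evaluation of the inverse Fourier transform of $|\Psi(f)|^2$ from (37) at multiples of T. All function names are hypothetical.

```python
import math

def duobinary_autocorr(n, rho):
    """rho_psi[n] for |Psi(f)|^2 = (2T/rho) cos^2(pi T f / rho), |f| < rho/(2T).

    Assumed closed form: sin(pi a) / (pi a (1 - a^2)) with a = rho * n.
    """
    a = rho * n
    if a == 0:
        return 1.0
    if abs(abs(a) - 1.0) < 1e-12:      # removable singularity at |a| = 1
        return 0.5
    return math.sin(math.pi * a) / (math.pi * a * (1.0 - a * a))

def tap_autocorr(b):
    """g_b[n] = sum_k b[k] b[k+n] for n = 1-Lb .. Lb-1 (assumed definition)."""
    lb = len(b)
    return {n: sum(b[k] * b[k + n] for k in range(lb) if 0 <= k + n < lb)
            for n in range(1 - lb, lb)}

def pulse_energy(b, rho):
    """Left-hand side of (36): sum_k g_b[k] * rho_psi[k]."""
    return sum(g * duobinary_autocorr(n, rho)
               for n, g in tap_autocorr(b).items())

energy = pulse_energy([1.4302, -1.4302], rho=0.65)
print(round(energy, 3))
```

Under these assumptions the energy comes out within rounding of 1, which is consistent with the prefilter taps in footnote 1 having been scaled to satisfy (36).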

We now have linear equations both for distance (34) and energy normalization (36). Together with the linear condition given in lemma 2 we can solve for the optimal sequence b[n] for a given pulse ψ(t). For the procedure used we refer to [14]. To find $d^2_{\min}$ we use the branch and bound algorithm in section III.

V. NUMERICAL RESULTS

We start with $d^2_{\min}$ for the uncoded root RC FTN case in table I. We present results for 4 and 8 PAM and α = 10, 20 and 30%. Recall that the matched filter bounds are .8 and 2/7 for 4 and 8 PAM. The Mazo limits, i.e. the τ where $d^2_{\min}$ falls below the matched filter bound for the first time as τ is reduced, are the same for 2, 4 and 8 PAM and are τ = .7032 for α = 30%. A comparison between 2, 4 and 8 PAM is shown in figure 3 for α = 30%. It is seen that there is an optimal alphabet size

for each bandwidth. For example, at nbw 0.15 Hz/bit/s there is roughly 3 dB gain by using 8 PAM instead of 4 PAM. A similar result for Butterworth pulses is reported in [6].

We give some results for the precoded FTN signaling next. We consider only α = 30%. Results for 2 PAM with $L_b$ = 4, 6, 8 and 4 PAM with $L_b$ = 4, 6 are given in table II. Note especially the 4.8 dB gain obtained by using a precoding filter of support 6 for 4 PAM and τ = .55.

We also found $d^2_{\min}$ for frequency scaled versions of the duobinary pulse, as proposed for binary transmission in [8]. The transfer function of the duobinary pulse is

$$ |\Psi(f)| = \sqrt{2T/\rho}\,\cos(\pi T f/\rho), \quad |f| < \rho/2T, \quad \rho \le 1. \tag{37} $$

The normalized bandwidth is nbw = $\rho/(2\log_2 M_a)$ Hz/bit/s. Results are given in table III. Some of these differ from [8].

To see the strength of the $d^2_{\min}$ algorithm we compare the effort of finding $d^2_{\min}$ for 8 PAM, α = 10%, τ = .60 with an exhaustive search. We searched over all events with support ≤ 20. For an exhaustive search this implies testing $15^{20} \approx 3.33 \times 10^{23}$ events, but our algorithm only considered $\approx 6 \times 10^{7}$. Using generating functions one can show that the number of error events fulfilling the zero sum assumption for $M_a$ PAM

and support L is

$$ \sum_{0 \le i \le \frac{(M_a-1)L}{2M_a-1}} (-1)^i \binom{L}{i} \binom{M_a L + i(1-2M_a) - 1}{(M_a-1)L + i(1-2M_a)}. $$

RUSEK and ANDERSON: NON BINARY AND PRECODED FASTER THAN NYQUIST SIGNALING

TABLE III. Minimum distances for the duobinary pulse, 2, 4 and 8 PAM. Search length L was 20 for all three alphabets. The value marked by an asterisk is different from that in [8]. Entries are $d^2_{\min}$ for the duobinary pulse.

| ρ | 2 PAM | 4 PAM | 8 PAM |
|---|---|---|---|
| 1 | 2 | 4/5 | 2/7 |
| .95 | 1.648 | .659 | .236 |
| .90 | 1.49 | .595 | .207 |
| .85 | 1.37 | .531 | .139 |
| .80 | 1.281 | .361 | .109 |
| .75 | 1.19 | .233 | .078 |
| .70 | 1.11 | .172 | .031 |
| .65 | .9023* | .143 | .021 |
| .60 | .723 | .094 | |
| .55 | .628 | .076 | |
| .50 | .568 | .055 | |
| .45 | .420 | | |
| .40 | .336 | | |
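The counting formula can be evaluated exactly with integer arithmetic. The sketch below is an illustration under stated assumptions rather than the authors' program: it reads the formula as the number of length-L sequences over the $2M_a - 1$ possible error symbols (represented here as the integers $-(M_a-1), \ldots, M_a-1$, i.e. the error values divided by 2) whose sum is zero, and it cross-checks the closed form against a simple dynamic program. The function names are hypothetical.

```python
from math import comb

def zero_sum_events(ma, length):
    """Evaluate the alternating binomial sum for Ma-PAM and support `length`."""
    total = 0
    for i in range((ma - 1) * length // (2 * ma - 1) + 1):
        top = ma * length + i * (1 - 2 * ma) - 1
        bottom = (ma - 1) * length + i * (1 - 2 * ma)
        total += (-1) ** i * comb(length, i) * comb(top, bottom)
    return total

def zero_sum_events_dp(ma, length):
    """Independent cross-check: count zero-sum sequences by dynamic programming."""
    counts = {0: 1}                      # partial sum -> number of prefixes
    for _ in range(length):
        nxt = {}
        for s, c in counts.items():
            for e in range(-(ma - 1), ma):
                nxt[s + e] = nxt.get(s + e, 0) + c
        counts = nxt
    return counts.get(0, 0)

count = zero_sum_events(8, 20)           # 8 PAM, support 20
ratio = 15 ** 20 / count                 # reduction relative to exhaustive search
print(f"{count:.3e}", round(ratio, 1))
```

The ratio of the exhaustive count $15^{20}$ to the zero-sum count gives the reduction factor obtained from the zero sum assumption alone.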

Evaluating this for $M_a$ = 8 and L = 20 gives $\approx 6.81 \times 10^{21}$ error events; thus the zero sum reduction is less than 100-fold and is not very helpful.

In general the worst case error events are rather short, 4–12 error symbols, but we found events up to 18 error symbols long. In the lower region (τ < .6) of table I, all events for 8 PAM have support > 10 symbols, and the largest support is 14. We found many examples of worst case error events that contradict the hypothesis in [8] (see footnote 2). The zero sum assumption was never violated in any of the tables.

Footnote 2: An example is uncoded binary transmission with α = 30%, τ = .45. Then $\rho_\psi[1] = .6868$ and $\max_{n>1} |\rho_\psi[n]| = .1796$, so the conditions in the hypothesis are fulfilled, but the worst event is 2, −2, 0, 2, −2, 0, 2, −2, 0.

VI. CAPACITY

In this section we derive the capacity of schemes like FTN and the Shannon bound to bit error rate for them. In section VII the BER bound for FTN will be compared to both actual decoding performance and to BER bounds for trellis-coded modulation (TCM) schemes based on RC pulses. The capacity calculation has appeared in the literature (see e.g. [21]) but the BER bound has apparently not. It is an important tool in the evaluation of FTN-like coding schemes, since it includes the effect of both energy and spectral density and it directly relates to an easily measured quantity.

According to classical Shannon theory, signals with W positive hertz, uniform power spectral density (PSD) and P Watts have capacity $C_W = W \log_2[1 + P/(N_0 W)]$, where $N_0/2$ is the noise density. Elementary calculus extends this brickwall result to signals with an arbitrary PSD $|H(f)|^2$; the outcome is

$$ C_H = \int_0^{\infty} \log_2\left[1 + \frac{2|H(f)|^2}{N_0}\right] df \quad \text{(bit/s)} \tag{38} $$

in which P is now $\int |H(f)|^2\,df$. Some calculations with (38) show that the stopband of H(f) has a major effect on the capacity of narrowband signaling, even though its power is small.

A Shannon limit to BER gives the lowest BER of coding schemes with a given PSD, as a function of $E_b/N_0$. It is derived as follows. Consider a coded modulation with PSD $|H(f)|^2$ that carries $R_{ber}$ binary data bits/s. If $R_{ber} \ge C_H$, standard rate-distortion theory tells us that $R_{ber}$ can be compressed to $C_H$ in the ratio

$$ \frac{C_H}{R_{ber}} = 1 - h_B(\beta) \tag{39} $$

where β is the resulting error rate of the compression and $h_B(\cdot)$ is the binary entropy function. The channel carries the compressed data nearly perfectly at rate $C_H$. We now fix the system rate $R_{ber}$ and scale $|H(f)|^2$ by a parameter γ > 0. This scales $E_b/N_0 = P/(N_0 R_{ber})$ to $\gamma E_b/N_0$, and eqs. (38)–(39) yield a BER β for the new $\gamma E_b/N_0$. We thus obtain a relationship between β and $E_b/N_0$ that is parameterized in γ. The highest allowed $E_b/N_0$ is the one for which $C_H$ is the given $R_{ber}$.

The Shannon BER limit is the ultimate limit for any coding scheme having PSD $|H(f)|^2$, but some coded modulations may be bounded away from this limit. Consider TCM, multilevel coding and many other types of coded modulation, in which the signals have the form $s(t) = \sum_n a_n v(t - nT)$ with T-orthogonal v(t). The $\{a_n\}$ can be thought of as real valued code letters. If the $\{a_n\}$ are uncorrelated and zero mean, the PSD of s(t) has the same shape as $|V(f)|^2$. As shown by Nyquist, useful orthogonal pulse PSDs obey a symmetry condition about the frequency W = 1/2T. The most common pulse is the root RC, with nominal passband [0, W] Hz and stopband [W, (1 + α)/2T]. The most narrowband orthogonal pulse is sinc(t/T), with flat PSD and α = 0. It can be shown that $C_V$ in (38) always increases compared to the brickwall $C_W$ when a pulse with Nyquist's symmetry and α > 0 is substituted for sinc(t/T). This is the fundamental reason why orthogonal-pulse schemes can be bounded away from their limit: the BER of these does not depend on α but only on the fact that v(t) is orthogonal; in particular, α can be zero, so these schemes must achieve a Shannon BER limit derived from the brickwall capacity $C_W$. The general BER limit derived from a non-sinc $|V(f)|^2$ via eqs. (38)–(39) must lie strictly to the left in a plot of BER vs. $E_b/N_0$ like those in the next section. This opens an avenue for FTN schemes to perform better than orthogonal ones.

VII. DECODING

Now we discuss decoding of the coded modulation schemes and give some receiver tests to verify the obtained minimum distances. The optimal strategy is a full MLSE, but this is out of the question, since the complexity of MLSE is $M_a^L$ where L, the length of the ISI, is in theory infinite and in practice very long. Thus a reduced sequence estimation (RSE) is necessary. However, a detector that gives essentially the MLSE performance is needed, otherwise the bandwidth gains are not exploited; this implies that all forms of linear equalization and nonlinear methods such as straightforward decision feedback are ordinarily ruled out.

Over the past 30 years much research has gone into finding good RSE strategies for the AWGN channel; see [17], [18] and references therein. For example, a common strategy is reduced state sequence estimation (RSSE); this method works with a considerably smaller trellis than the original and obtains close to optimal performance as $E_b/N_0$ increases. An efficient receiver structure well suited to binary FTN was recently proposed in [3]; the structure was based on the Ungerboeck observation model [19]. This structure could be generalized to $M_a$-ary signaling. But the first part of the receiver is a soft output truncated Viterbi algorithm [20], whose complexity grows as $M_a^{L_v}$, where $L_v$ denotes the truncation length of the ISI. Since we have precoded signals and significant FTN complexity, a large $L_v$ is probably needed in order to avoid too much residual ISI; therefore we believe that this receiver is in general too complex for quaternary and octal signaling with precoders as long as 4–6 taps.

In this paper a different strategy is tested, the simple M-algorithm. If the Ungerboeck model is used, the M-algorithm is observed to work badly for large alphabets, such as 8 PAM. Therefore the whitened matched filter (WMF) model [16] is assumed. Similar to [3], we found it hard to work with the WMF model when the impulse response of the root RC pulse is long, e.g. 80T. Therefore we have done the following: a front end whitened matched receiver filter was determined for root RC pulses of length 20T; all receiver tests are done with pulses of length 80T but the receiver filter is for the 20T pulse. This of course makes the decoder mismatched, but the noise variance emanating from the mismatch is small compared to the AWGN variance. We now have to decode backwards, which might be a drawback, since the decoding cannot start until the whole block has arrived. If the receiver filter is set up for forward decoding, it is no longer stable; this is due to the mismatching of the filters.

This receiver is simple and shows good performance. At each level in the trellis the M-algorithm keeps only the M most promising paths. As usual the symbols are released with some delay (the decision depth); see [22] for a study of decision depths for ISI channels.

We have performed receiver tests for both 4- and 8PAM for uncoded, precoded and convolutionally encoded FTN systems; the tests are shown in figures 4–6. By convolutionally encoded FTN signaling we mean a scheme where 1 out of k input bits is first encoded by a rate 1/2 convolutional code, and these k+1 bits are then mapped onto a $2^{k+1}$-PAM signal set. This is followed by ordinary FTN signaling. The minimum distance of these systems has not been conclusively determined.

Figure 4 compares M-algorithm error rates for 4 and 8PAM systems with optimal precoding to their $d^2_{\min}$-based Q-function estimate. The signal generation parameters are τ = 0.55, $L_b$ = 6, α = 0.3. Note that precoded 8PAM FTN is not included in table II because it is very hard to find $d^2_{\min}$. In the test we therefore used the precoder constructed for 4PAM also for 8PAM. The required M is approximately 32 and 100 for 4- and 8PAM, and these are used in the tests. The normalized bandwidth of e.g. 8PAM is [(1 + 0.3)/2] · 0.55/3 = 0.1192 Hz/bit/s. The Q-function estimates lie 1–2 dB to the left of the test results. This reference is based solely on $d^2_{\min}$ for 4PAM and only applies asymptotically. For 8PAM we included the multiplicity $K_1$, see eq. (3), of the $d^2_{\min}$-achieving error event [16], [23]. This reference becomes tight.

Fig. 4. M-algorithm receiver tests of 4- and 8PAM systems with parameters τ = 0.55, $L_b$ = 6, α = 0.30. Bandwidths are 0.1788 and 0.1192 Hz/bit/s for 4- and 8PAM. Curves marked by Ref. are the references $Q(\sqrt{d^2_{\min} E_b/N_0})$ for 4PAM and $K_1 Q(\sqrt{d^2_{\min} E_b/N_0})$ for 8PAM; $K_1$ is the multiplicity of the $d^2_{\min}$-achieving error event. M is 32 and 100 for 4- and 8PAM. The actual precoder used is {2.2418, −3.6121, 4.2673, −3.5962, 2.3436, −.9159}. [Plot of BER versus $E_b/N_0$; curves: 4PAM, Ref. 4PAM, 8PAM, Ref. 8PAM.]

Figure 5 shows coding systems at rate 2 bits/channel use. An uncoded 4PAM system (decoder M = 8) is compared to a (23,40) convolutionally encoded 8PAM system (decoder M = 80). Signal parameters are τ = 0.7 and α = 0.3. The convolutionally coded system is about 2 dB better at all $E_b/N_0$ tested, although it requires a decoder with 10 times the complexity. The figure shows the $d^2_{\min}$ reference for the uncoded 4PAM system, and it is a tight estimate at high $E_b/N_0$. The figure also shows Shannon BER limits both for the FTN pulse PSD and for 30% root RC-based TCM systems with the same bandwidth (dashed curve).

Fig. 5. Comparison of M-algorithm receiver tests for rate 2 uncoded and convolutionally encoded signals having τ = 0.7 and α = 0.30. The uncoded system, denoted FTN, is a 4PAM system with τ = 0.7, M = 8; the curve marked by Ref. is $Q(\sqrt{d^2_{\min} E_b/N_0})$ for this system. The curve marked c.c.+FTN denotes a convolutionally encoded 8PAM system with τ = 0.7, M = 80; the encoder is the (23,40) convolutional code. C FTN denotes the Shannon BER limit (38). The curve marked C TCM denotes the limit for TCM and related methods having the same normalized bandwidth. [Plot of BER versus $E_b/N_0$; curves: FTN, c.c.+FTN, Ref., C FTN, C TCM.]

Figure 6 shows an uncoded rate 3 system with the same signal parameters as figure 5. Once again, performance is compared to the $d^2_{\min}$ reference and the two Shannon BER limits. Agreement with the reference is again good.

Fig. 6. M-algorithm receiver tests of a rate 3 uncoded 8PAM FTN system with τ = 0.7, α = 0.30 and M = 16. The curve marked Ref. is $Q(\sqrt{d^2_{\min} E_b/N_0})$. The curve marked C FTN denotes the FTN Shannon BER bound (38); C TCM denotes the bound for TCM and related methods having the same normalized bandwidth. [Plot of BER versus $E_b/N_0$; curves: FTN, Ref., C FTN, C TCM.]

In both figures 5 and 6 we see that the uncoded FTN signaling lies roughly 6 dB from its Shannon limit at BER $10^{-5}$ but only 3–4 dB from the Shannon limit for competing methods such as TCM and multilevel coding. The convolutionally encoded scheme (figure 5) actually gets within 1.5 dB of the competing method Shannon limit at BER $10^{-3}$.

APPENDIX: PROOF OF LEMMA 4

The proof of lemma 4 is a modification of the proof of lemma 7.4.2 in [15]. The modification is due to the extra term in (22). The proof requires lemma 5 below; except for notation, lemma 5 is identical to lemma 7.4.1 in [15] and therefore we give it without proof. Let $x_{e|b}[k]$ be an error signal generated according to (8). Decompose $x_{e|b}[k]$ into two parts

$$ \bar{x}_{e|b}[n,N] \triangleq \sum_{k=0}^{N} e[k]\,b[n-k], \qquad \dot{x}_{e|b}[n,N] \triangleq \sum_{k=N+1}^{n} e[k]\,b[n-k]. \tag{40} $$
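The two-part decomposition (40) used in the appendix can be illustrated with a few lines of code. This is only a sketch: it assumes that the error signal of (8) is the causal convolution $x_{e|b}[n] = \sum_k e[k]\,b[n-k]$, which is an assumption on our part since (8) is not restated here, and the error sequence and filter taps are hypothetical.

```python
def error_signal(e, b, n):
    """x_{e|b}[n] = sum_k e[k] b[n-k] (assumed form of the error signal)."""
    return sum(e[k] * b[n - k] for k in range(len(e)) if 0 <= n - k < len(b))

def x_bar(e, b, n, N):
    """Contribution of e[0..N] to sample n (first part of the decomposition)."""
    return sum(e[k] * b[n - k] for k in range(0, N + 1)
               if k < len(e) and 0 <= n - k < len(b))

def x_dot(e, b, n, N):
    """Contribution of e[N+1..n] to sample n (second part of the decomposition)."""
    return sum(e[k] * b[n - k] for k in range(N + 1, n + 1)
               if k < len(e) and 0 <= n - k < len(b))

e = [2, -2, 0, 2, -2]          # hypothetical error sequence
b = [0.9, -0.4, 0.2, 0.1]      # hypothetical filter taps
N = 1
# For every n > N the two parts add up to the full error signal.
residuals = [error_signal(e, b, n) - x_bar(e, b, n, N) - x_dot(e, b, n, N)
             for n in range(N + 1, len(e) + len(b) - 1)]
```

The first part depends only on the error symbols already fixed up to index N, while the second part collects everything that later symbols can still change; this split is what the lower bound in the proof exploits.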

Then lemma 7.4.1 in [15] with our notation reads

Lemma 5: If $e_r$ is an error sequence such that $e_r = T_N e_r$ then

$$ \sum_{n=N+1}^{\infty} |\dot{x}_{e|b}[n,N]|^2 \ge d^2_{\min,g_b}[L+1-N]. \tag{41} $$

We can now prove lemma 4.

Proof of lemma 4. According to (22) we can write $d^2_{g_b}(e)$ as shown in (42). By using $T_N e$ instead of e in (42) we obtain

$$ d^2_{g_b}(T_N e) = \lambda_N^2(e) + \sum_{n=N+1}^{\infty} |\bar{x}_{e|b}[n,N]|^2 + (L-N)\theta. \tag{43} $$

Following [15] we conclude as shown in (44). Instead of finding the minimum in (44) we lower bound it as shown in (45). The last inequality follows from the fact that there is more freedom to choose z[n] than $\dot{x}_{e_r|b}[n,N]$ in the minimization. That $\|z\| \ge d_{\min,g_b}[L+1-N]$ is clear from lemma 5. The minimization (45) is easy to solve; there are two types of solution depending on whether the difference

$$ \Delta_N^2(e_r) \triangleq d^2_{g_b}(T_N e_r) - \lambda_N^2(e_r) - (L-N)\theta = \sum_{n=N+1}^{\infty} |\bar{x}_{e|b}[n,N]|^2 \tag{46} $$

is larger or smaller than $d^2_{\min,g_b}[L+1-N]$. The solution is

$$ z[n] = -\max\{1,\; d_{\min,g_b}[L+1-N]/\Delta_N(e_r)\}\,\bar{x}_{e_r|b}[n,N], \quad n > N. \tag{47} $$

The value of this solution is shown in (48). Inserting (48) into (44) gives (49), and the lemma is proved.

[Displays (42), (44) and (45), set full-width in the printed article, are not legible in this copy.]

$$ \min_{\|z\| \ge d_{\min,g_b}[L+1-N]} \sum_{n=N+1}^{\infty} |\bar{x}_{e_r|b}[n,N] + z[n]|^2 = \left(\max\{0,\; d_{\min,g_b}[L+1-N] - \Delta_N(e_r)\}\right)^2 \tag{48} $$

$$ d^2_{g_b}(e) \ge \begin{cases} \lambda_N^2(e), & \Delta_N(e) \ge d_{\min,g_b}[L+1-N] \\ d^2_{\min,g_b}[L+1-N] - 2\,d_{\min,g_b}[L+1-N]\,\Delta_N(e) + d^2_{g_b}(T_N e) - (L-N)\theta, & \Delta_N(e) < d_{\min,g_b}[L+1-N] \end{cases} \tag{49} $$

VIII. CONCLUSIONS

We have proposed a powerful algorithm to search for the minimum distance of nonbinary FTN signaling. We are capable of searching out to a very large search length even for large alphabets. The algorithm is the extension to infinite time signals of an existing algorithm proposed for discrete signals in [15]. Furthermore, the method can be used for a general ISI signal, and no assumptions on the worst case error events are made, except for length. We also found the optimal precoding filter for binary and quaternary FTN transmissions; the distance algorithm was a crucial tool here. M-algorithm receiver tests give a reasonable verification of these distances. A major reason for the improved performance of FTN systems is their more favorable Shannon limit. In a future paper we will present a careful study of the FTN Shannon capacity.

REFERENCES

[1] J. E. Mazo, "Faster-than-Nyquist signaling," Bell Syst. Tech. J., vol. 54, pp. 1451–1462, Oct. 1975.
[2] J. B. Anderson and F. Rusek, "Improving OFDM: multistream faster than Nyquist signaling," in Proc., 6th Int. ITG-Conf. Source and Channel Coding, Munich, April 2006.
[3] A. D. Liveris and C. N. Georghiades, "Exploiting faster-than-Nyquist signaling," IEEE Trans. Commun., vol. 51, pp. 1502–1511, Sept. 2003.
[4] D. Hajela, "On computing the minimum distance for faster-than-Nyquist signaling," IEEE Trans. Inform. Theory, vol. 36, pp. 289–295, Mar. 1990.
[5] J. E. Mazo and H. J. Landau, "On the minimum distance problem for faster-than-Nyquist signaling," IEEE Trans. Inform. Theory, vol. IT-34, pp. 1420–1427, Nov. 1988.
[6] F. Rusek and J. B. Anderson, "M-ary coded modulation by Butterworth filtering," in Proc. Int. Symp. Information Theory, Yokohama, p. 184, June 2003.
[7] J. B. Anderson and A. Svensson, Coded Modulation Systems. New York: Plenum, 2003.
[8] C.-K. Wang and L.-S. Lee, "Practically realizable digital transmission significantly below the Nyquist bandwidth," IEEE Trans. Commun., vol. 43, pp. 166–169, Feb./Mar./Apr. 1995.
[9] K.-T. Wu and K. Feher, "Multilevel PRS/QPRS above the Nyquist rate," IEEE Trans. Commun., vol. 33, no. 7, pp. 735–739, July 1985.
[10] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[11] J. K. Wolf and G. Ungerboeck, "Trellis coding for partial-response channels," IEEE Trans. Commun., vol. 34, no. 8, pp. 765–772, Aug. 1986.
[12] R. Laroia, S. A. Tretter, and N. Farvardin, "A simple and effective precoding scheme for noise whitening on intersymbol interference channels," IEEE Trans. Commun., vol. 41, pp. 460–463, Oct. 1993.
[13] R. Karabed, P. H. Siegel, and E. Soljanin, "Constrained coding for binary channels with high intersymbol interference," IEEE Trans. Inform. Theory, vol. IT-45, pp. 1777–1797, Sept. 1999.
[14] A. Said and J. B. Anderson, "Design of optimal signals for bandwidth-efficient linear coded modulation," IEEE Trans. Inform. Theory, vol. 44, pp. 701–713, Mar. 1998.
[15] A. Said, "Design of optimal signals for bandwidth-efficient linear coded modulation," Ph.D. thesis, Dept. Elec., Computer and Systems Eng., Rensselaer Poly. Inst., Troy, NY, Feb. 1994.
[16] G. D. Forney, Jr., "Maximum likelihood sequence estimation of digital sequences in the presence of intersymbol interference," IEEE Trans. Inform. Theory, vol. 18, pp. 363–378, May 1972.
[17] M. V. Eyuboglu and S. U. Qureshi, "Reduced-state sequence estimation with set partitioning and decision feedback," IEEE Trans. Commun., vol. 36, pp. 13–20, Jan. 1988.
[18] F. Xiong, A. Zerik, and E. Shwedyk, "Sequential sequence estimation for channels with intersymbol interference of finite or infinite length," IEEE Trans. Commun., vol. 38, pp. 795–804, June 1990.
[19] G. Ungerboeck, "Adaptive maximum-likelihood receiver for carrier-modulated data-transmission systems," IEEE Trans. Commun., vol. 22, pp. 624–636, May 1974.
[20] P. J. McLane, "A residual interference error bound for truncated state detectors," IEEE Trans. Inform. Theory, vol. 26, pp. 548–553, Sept. 1980.
[21] S. Shamai, L. H. Ozarow, and A. D. Wyner, "Information rates for a discrete-time Gaussian channel with intersymbol interference and stationary inputs," IEEE Trans. Inform. Theory, vol. 37, pp. 1527–1539, Nov. 1991.
[22] F. Rusek and J. B. Anderson, "On decision depth for partial response codes," in Proc., Int. Conf. Commun., Seoul, May 2005.
[23] J. G. Proakis, Digital Communications, 4th ed. New York: McGraw-Hill, 2001.

Fredrik Rusek was born in Lund, Sweden in 1978. He received the M.S. and Ph.D. degrees in electrical engineering from Lund Institute of Technology in 2003 and 2007. Since 2008 he has been employed as a researcher at the Department of Electrical and Information Technology at Lund Institute of Technology. His research interests include modulation theory, equalization, wireless communications and applied information theory.


John B. Anderson was born in New York State in 1945. He received the B.S., M.S. and Ph.D. degrees in electrical engineering from Cornell University in 1967, 1969 and 1972. During 1972-80 he was on the faculty of the Electrical and Computer Engineering Dept. at McMaster University in Canada, and during 1981-98 he was Professor in the Electrical, Computer and Systems Engineering Dept. at Rensselaer Polytechnic Institute. Since 1998 he has held the Ericsson Chair in Digital Communication at Lund Univ., Sweden. He has held visiting professorships at the Univ. of Calif., Berkeley (1978-79), Chalmers Univ., Sweden (1987), Queen’s Univ., Canada (1987), Deutsche Luft- und Raumfahrt, Germany (1991-92, 1995-96) and Tech. Univ. of Munich (1995-96). His research work is in coding and communication algorithms, bandwidth-efficient coding, and the application of these to data transmission and compression. He has served widely as a consultant in these fields. Presently, he is Director of the Swedish Strategic Research Foundation Center for High Speed Wireless Communication at Lund. Dr. Anderson was a member of the IEEE Information Theory Society


Board of Governors during 1980–87 and 2001–06, serving as the Society's Vice-President (1983–84) and President (1985). In 1983 and 2006 he was Co-Chair of the IEEE International Symposium on Information Theory. He served during the 1990s as chair of Research Initiation Grants for the IEEE Foundation. In the IEEE publications sphere, he served on the Publications Board of IEEE during 1989–91 and 1994–96. He was a member of the IEEE Press Board during 1993–2006 and during 1994–96 was Editor-in-Chief of the Press. Since 1998 he has edited the IEEE Press book Series on Digital and Mobile Communication. He has also served as Associate Editor of the IEEE Transactions on Information Theory (1980–84) and as Guest Editor of the IEEE Transactions on Communications on several occasions. Dr. Anderson is author or coauthor of six textbooks, including most recently Digital Transmission Engineering (IEEE Press, 2nd ed. 2005), Coded Modulation Systems (Plenum/Springer 2003), and Understanding Information Transmission (IEEE Press 2005). He is Fellow of the IEEE (1987) and received the Humboldt Research Prize (Germany) in 1991. In 1996 he was elected Swedish National Visiting Chair in Information Technology. He received the IEEE Third Millennium Medal in 2000.


Progressive Linear Precoder Optimization for MIMO Packet Retransmissions Exploiting Channel Covariance Information

Haitong Sun, Student Member, IEEE, Zhihua Shi, Chunming Zhao, Member, IEEE, Jonathan H. Manton, Senior Member, IEEE, and Zhi Ding, Fellow, IEEE

Abstract—This work investigates the design of linear precoders for ARQ packet retransmissions in Multi-Input Multi-Output (MIMO) systems. We consider transmitter precoder design based on partial MIMO channel information in the form of their covariance feedback. Our objective is to maximize the ergodic mutual information provided by multiple (re)transmissions of a packet subject to transmission power constraint. We propose a set of near-optimal successive linear ARQ precoders for flat fading MIMO channels. This progressive linear ARQ precoder combines the appropriate power loading and the reverse-order pairing of singular values in the current retransmission with previous transmissions. This reverse-order pairing is a special feature unique to our sequential ARQ precoding approach with demonstrated performance gains. Index Terms—MIMO systems, automatic repeat request, mutual information, antenna correlation, channel state information.

Paper approved by D. L. Goeckel, the Editor for Space-Time and OFDM of the IEEE Communications Society. Manuscript received March 1, 2006; revised October 25, 2006 and March 16, 2007. This work was supported in part by the MOST (China) 973 Program Award No. 2007CB310603. The work of H. Sun and Z. Ding is also supported in part by the National Science Foundation Grants CCF-0515058, CNS-0520126 and by the US Army Research Office Grant W911NF-05-1-0382. H. Sun was with the Univ. of California, Davis, CA 95616, USA. He is now with Qualcomm, San Diego, CA, USA (e-mail: [email protected]). Z. Shi and C. Zhao are with the National Mobile Communications Research Laboratory, Southeast University, China (e-mail: {shizhihua, cmzhao}@seu.edu.cn). J. H. Manton is with the Dept. of Information Engineering, Research School of Information Sciences and Engineering (RSISE), Australian National University, Canberra ACT 0200, Australia (e-mail: [email protected]). Z. Ding is Guest Changjiang Chair Professor of Southeast University, Nanjing, China, and Professor of Electrical and Computer Engineering, Univ. of California Davis, CA 95616, USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2008.060114.

I. INTRODUCTION

It has been widely known that substantial increase of channel capacity can be achieved by Multi-Input Multi-Output (MIMO) systems [1], [2] through well designed transmitter precoding according to full or partial (statistical) Channel State Information (CSI). There has been extensive research on the selection of optimal precoders under various criteria based on different levels of CSI knowledge. In applications where full CSI is available at transmitters, closed form linear precoders have been found under various optimization criteria [2]–[9]. In practical wireless systems, even though training may help receivers to acquire full CSI, sending full CSI to the transmitter through feedback may not be

possible due to wireless channel fading, delay, and limited feedback capacity. However, statistical MIMO CSI, such as the channel mean or covariance, can be better estimated and given to the transmitter. Considerable effort has been devoted to the design of optimal precoders from partial (statistical) CSI feedback to maximize ergodic capacity [10], [11], or to maximize capacity bounds [12], [13]. Similar ideas have been applied to precoder designs that minimize mean square symbol error, e.g., [14].

We note that to improve link reliability, practical modern systems are often equipped with Automatic Repeat reQuest (ARQ). While extensive ARQ studies exist, primarily from the FEC perspective, only a limited number of works in the literature address the integrated effect of MIMO and ARQ. Recently, a theoretical framework has been presented by El Gamal et al. [15] for MIMO ARQ systems. By generalizing the well known work by Zheng and Tse [16] on single transmission MIMO systems, [15] analyzed the diversity-multiplexing-delay tradeoff for MIMO ARQ retransmissions. It should be noted, however, that this new MIMO-ARQ tradeoff analysis is focused on scenarios where the transmitter does not have any CSI. When a certain level of CSI is available, more active transmission techniques, such as precoding, can be developed to deliver better system performance. Given full CSI at the transmitter, we have recently developed optimal sequential linear precoders for ARQ retransmissions under both the maximum capacity criterion and the minimum MSE criterion [17], [18].

In this work, we investigate the problem of precoder design when only partial MIMO CSI (channel covariance) is known. As an optimality criterion, outage mutual information would better capture the practical MIMO system performance under fading channels. However, its analysis includes multiple factors, such as target information rate and data packet length, which complicate the design problem.
To reduce design complexity, we choose ergodic mutual information as the performance metric. By sequentially optimizing the precoder for each ARQ transmission, our goal is to maximize the ergodic mutual information delivered by multiple transmissions of a packet. It should be remarked that sequential (progressive) precoding optimization is a feature unique to ARQ. Previous transmissions cannot be altered, future retransmissions may not be needed, and knowledge of the future channel may not be available at the transmitter; the transmitter should therefore optimize only the current

c 2008 IEEE 0090-6778/08$25.00 

SUN et al.: PROGRESSIVE LINEAR PRECODER OPTIMIZATION FOR MIMO PACKET RETRANSMISSIONS

 w1

 u

F1

H1

ARQ F2

ARQ FM

. . .

 y1

 y

 w2

 y2

H2  wM

HM

 yM

Fig. 1. Block diagram of M linearly precoded (re-) transmissions of a packet through flat fading MIMO channel.

(re-)transmission.

The paper is organized as follows. Section II outlines the system and channel model. Section III formulates the sequential precoder design problem under the criterion of maximum ergodic mutual information given channel covariance information, and presents a set of possible optimal precoders involving a unitary matrix. In Section IV, we restrict the unitary matrix in the precoder design to the smaller set of permutation matrices. Deriving a tight ergodic mutual information bound, we propose an optimal permutation matrix and provide two suboptimal power loading strategies. Simulation results are presented in Section VI, while concluding remarks are given in Section VII.

II. SYSTEM MODEL

A. Notations

We apply standard notations here. $\mathbf{A}$ represents a matrix and $\mathbf{a}$ represents a vector. $\{\cdot\}^H$ denotes conjugate transpose. $\mathbf{I}_N$ denotes an $N \times N$ identity matrix. In addition, we use

$\mathrm{diag}(a_1, \ldots, a_N)$: diagonal matrix with diagonal elements $a_1, \ldots, a_N$
$\lambda^i_{\mathbf{A}}$: $i$th nonnegative singular value of a matrix $\mathbf{A}$
$\mathbf{A}^{i_1, i_2, \cdots, i_m}_{j_1, j_2, \cdots, j_n}$: $m \times n$ submatrix formed by entries of $\mathbf{A}$ from rows $i_1, i_2, \cdots, i_m$ and columns $j_1, j_2, \cdots, j_n$.

B. System Model for MIMO ARQ Retransmissions

Consider a flat fading MIMO wireless system with $M_T$ transmit and $M_R$ receive antennas. Let $M_R \ge M_T$ to guarantee symbol recoverability. As shown in Fig. 1, the MIMO system has an ARQ retransmission mechanism. We let the same data packet be transmitted during ARQ retransmissions. Extending the single-transmission precoding design problems of [2], [7], [11], our goal is to apply different precoders $\{\mathbf{F}_i\}$ targeting MIMO channels $\{\mathbf{H}_i\}$ during different retransmissions. Hence, after $M$ (ARQ) transmissions of the packet symbol $\mathbf{u}$, the overall received signal is

$$
\mathbf{y} = \begin{bmatrix} \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_M \end{bmatrix}
= \underbrace{\begin{bmatrix} \mathbf{H}_1 \mathbf{F}_1 \\ \vdots \\ \mathbf{H}_M \mathbf{F}_M \end{bmatrix}}_{\mathbf{H}} \mathbf{u}
+ \underbrace{\begin{bmatrix} \mathbf{w}_1 \\ \vdots \\ \mathbf{w}_M \end{bmatrix}}_{\mathbf{w}}
= \mathbf{H}\mathbf{u} + \mathbf{w}, \tag{1}
$$

where $\mathbf{H}_i$ and $\mathbf{F}_i$ ($i = 1, \cdots, M$) are the MIMO channel and the linear precoder during the $i$th transmission with noise vector $\mathbf{w}_i$. The sizes of $\mathbf{H}_i$, $\mathbf{F}_i$, $\mathbf{w}_i$, and $\mathbf{u}$ are $M_R \times M_T$, $M_T \times M_T$, $M_R \times 1$, and $M_T \times 1$, respectively. Note that the transmission number $M$ depends on the success of previous (re)transmissions. Without loss of generality, data symbols in $\mathbf{u}$ are i.i.d. with zero mean and covariance matrix $\mathbf{R}_u = E\{\mathbf{u}\mathbf{u}^H\} = \sigma_u^2 \mathbf{I}_{M_T}$. The noises $\{\mathbf{w}_i\}$ are also zero mean with covariance matrix $\mathbf{R}_{w_i} = E\{\mathbf{w}_i \mathbf{w}_i^H\} = \sigma_w^2 \mathbf{I}_{M_R}$. Thus, $\mathbf{R}_w = E\{\mathbf{w}\mathbf{w}^H\} = \mathrm{diag}(\mathbf{R}_{w_1}, \ldots, \mathbf{R}_{w_M}) = \sigma_w^2 \mathbf{I}_{M \cdot M_R}$.

We define the transmission signal-to-noise ratio (SNR) as $\gamma = \sigma_u^2/\sigma_w^2$. In this paper, we design optimum precoders $\mathbf{F}_i$ to maximize the ergodic mutual information due to multiple transmissions of $\mathbf{u}$ via $M$ MIMO channels. For fairness, the transmit power for each transmission is limited to $\mathrm{Tr}\{\mathbf{F}_m \mathbf{F}_m^H\} = p$.

C. Channel Model and Information

Transmitter precoder design relies on the amount of available CSI. Conventionally, if CSI is not fully known, flat fading MIMO channels are modeled as a matrix of complex Gaussian entries. In this work, we consider a well-established channel covariance feedback model [11], [19], [20] in which the channel is fading rapidly. In general, a correlated MIMO channel matrix can be decomposed into

$$
\mathbf{H} = \boldsymbol{\Sigma}_R^{1/2} \mathbf{H}_\omega \boldsymbol{\Sigma}_T^{1/2}, \tag{2}
$$

where $\mathbf{H}_\omega$ is an $M_R \times M_T$ matrix of i.i.d. zero mean, unit variance complex Gaussian random variables, while $\boldsymbol{\Sigma}_T$ and $\boldsymbol{\Sigma}_R$ are $M_T \times M_T$ and $M_R \times M_R$ covariance matrices representing the correlation among transmit and receive antennas, respectively. In designing the optimal precoder at the transmitter, we focus on exploiting the correlation among transmit antennas. Thus, we assume uncorrelated receive antennas, i.e., $\boldsymbol{\Sigma}_R = \mathbf{I}_{M_R}$.

III. ERGODIC MUTUAL INFORMATION AND OPTIMAL PRECODER STRUCTURE

Input/output mutual information delivered by MIMO systems (1) has been extensively investigated [1], [2]. We design linear MIMO-ARQ precoders to maximize the ergodic mutual information provided by multiple transmissions of the same data symbol. We present a generic optimal precoder structure that is subsequently used for designing suboptimal precoders. We emphasize again that our ARQ design of optimal precoders assumes that the transmitter only has knowledge of the channel covariance matrix $\boldsymbol{\Sigma}_T$ while $\boldsymbol{\Sigma}_R = \mathbf{I}_{M_R}$.
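The stacked signal model (1) and the correlated channel model (2) can be sketched numerically as follows. This is a minimal illustration, not the authors' simulation setup: the antenna counts, the exponential correlation matrix `exp_corr`, the scaled-identity placeholder precoders, and all helper names are assumptions introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_gaussian(shape, rng):
    """i.i.d. zero-mean, unit-variance circularly symmetric complex Gaussians."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def psd_sqrt(S):
    """Hermitian square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def exp_corr(n, rho):
    """Hypothetical exponential correlation matrix with entries rho^|i-j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

# Illustrative sizes and powers (assumed values, not taken from the paper)
M_T, M_R, M = 2, 3, 3
sigma_u2, sigma_w2, p = 1.0, 0.1, float(M_T)
gamma = sigma_u2 / sigma_w2                      # transmission SNR

Sigma_T = exp_corr(M_T, 0.7)                     # transmit covariance; Sigma_R = I
F = [np.sqrt(p / M_T) * np.eye(M_T)] * M         # placeholder precoders, Tr{F F^H} = p

# Channel model (2) with Sigma_R = I: H_i = H_w,i Sigma_T^{1/2}
H_list = [complex_gaussian((M_R, M_T), rng) @ psd_sqrt(Sigma_T) for _ in range(M)]

# Stacked model (1): y = H u + w with H = [H_1 F_1; ...; H_M F_M]
H_stack = np.vstack([H_i @ F_i for H_i, F_i in zip(H_list, F)])
u = np.sqrt(sigma_u2) * complex_gaussian((M_T, 1), rng)
w = np.sqrt(sigma_w2) * complex_gaussian((M * M_R, 1), rng)
y = H_stack @ u + w

print(H_stack.shape, y.shape)
```

Each retransmission adds $M_R$ rows to the stacked channel, which is why the receiver observes an $(M \cdot M_R) \times M_T$ effective channel after $M$ transmissions.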


IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 56, NO. 5, MAY 2008

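$$
\begin{aligned}
\mathcal{I} &= E\left\{ \log_2 \det\left( \mathbf{I}_{M \cdot M_R} + \mathbf{R}_w^{-1} \mathbf{H} \mathbf{R}_u \mathbf{H}^H \right) \right\} \\
&= E\left\{ \log_2 \det\left( \mathbf{I}_{M_T} + \gamma \sum_{i=1}^{M} \mathbf{F}_i^H (\boldsymbol{\Sigma}_{T,i}^{1/2})^H \mathbf{H}_{\omega,i}^H \mathbf{H}_{\omega,i} \boldsymbol{\Sigma}_{T,i}^{1/2} \mathbf{F}_i \right) \right\}
\end{aligned} \tag{3}
$$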

A. Ergodic Mutual Information

The ergodic mutual information between the source data $\mathbf{u}$ and the received signal $\mathbf{y}$ depends on the distribution of $\mathbf{u}$. As shown in [2], given any input covariance matrix $\mathbf{R}_u$, the maximum mutual information is achieved when the input signal is circularly symmetric complex Gaussian. Using the channel model of (2), $\mathbf{H}_{\omega,i}$, $i = 1, \cdots, M$, are independent Gaussian random matrices and $\boldsymbol{\Sigma}_{T,i}$ is the channel covariance matrix at the transmitter during the $i$th transmission of a packet. Hence, our goal is to design $\mathbf{F}_i$ to maximize the ergodic capacity as shown in (3), where the second equality follows from the identity $\det(\mathbf{I} + \mathbf{A}\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}\mathbf{A})$. Note that, to simplify the problem, we do not use the channel distribution conditioned on the fact that the previous $M - 1$ transmissions failed. In other words, in the ergodic capacity computation, we still consider all possible channel realizations, without excluding the good ones that could have given correct reception during the previous transmissions.

B. Sequential Precoder Design and Optimal Precoder Structure

Sequential design of linear MIMO precoders under ARQ differs from design under predetermined time diversity because future ARQ retransmissions may not occur when considering the $i$th transmission. Meanwhile, based on our problem formulation (3), since the channel covariance matrix $\boldsymbol{\Sigma}_{T,i}^{1/2}$ is allowed to change from transmission to transmission, the transmitter does not have covariance knowledge of the future channel. Thus, at a given ($i$th) retransmission, the ARQ transmitter must progressively find the best linear precoder based on knowledge of both the current CSI and the previous transmissions. For a single transmission, an optimal precoder structure has been proposed [11]. Applying similar techniques, here we propose the optimal sequential precoder structure under ARQ retransmissions.
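The ergodic mutual information in (3) has no simple closed form, but it can be estimated by Monte Carlo averaging. The sketch below does this and also evaluates both lines of (3) for a single channel draw, which should agree by the determinant identity. The function names, the covariance values, and the identity precoders are assumptions made for illustration only.

```python
import numpy as np

def complex_gaussian(shape, rng):
    """i.i.d. zero-mean, unit-variance circularly symmetric complex Gaussians."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def psd_sqrt(S):
    """Hermitian square root of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T

def mutual_info_both_forms(H_list, F_list, gamma):
    """Log-det of (3) in bits for one channel draw, in both equivalent forms."""
    H = np.vstack([Hi @ Fi for Hi, Fi in zip(H_list, F_list)])   # (M*M_R) x M_T
    big = np.eye(H.shape[0]) + gamma * H @ H.conj().T            # first line of (3)
    small = np.eye(H.shape[1]) + gamma * H.conj().T @ H          # second line of (3)
    to_bits = lambda A: np.linalg.slogdet(A)[1] / np.log(2)
    return to_bits(big), to_bits(small)

def ergodic_mi(Sigma_T_list, F_list, gamma, M_R, n_trials=2000, seed=0):
    """Monte Carlo estimate of (3) with H_i = H_w,i Sigma_T,i^{1/2}."""
    rng = np.random.default_rng(seed)
    roots = [psd_sqrt(S) for S in Sigma_T_list]
    M_T = roots[0].shape[0]
    total = 0.0
    for _ in range(n_trials):
        H_list = [complex_gaussian((M_R, M_T), rng) @ R for R in roots]
        total += mutual_info_both_forms(H_list, F_list, gamma)[1]
    return total / n_trials

# Hypothetical 2x2 example: one transmission versus two transmissions
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
F = np.eye(2)
mi_one = ergodic_mi([Sigma], [F], gamma=10.0, M_R=2)
mi_two = ergodic_mi([Sigma, Sigma], [F, F], gamma=10.0, M_R=2)
print(mi_one, mi_two)
```

The second form works with an $M_T \times M_T$ matrix regardless of how many retransmissions have occurred, which is what makes the sum over transmissions in (3) convenient for sequential design.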
In short, for the $m$th transmission, the transmitter will optimize the current linear precoder $\mathbf{F}_m$ based on the current and previous CSI (covariance matrices) $\{\boldsymbol{\Sigma}_{T,i}, i = 1, 2, \cdots, m\}$ plus the previous precoders $\{\mathbf{F}_i^{\mathrm{opt}}, i = 1, 2, \cdots, m-1\}$. Therefore, the sequential precoder design for the $m$th transmission under the power constraint can be formulated as shown in (4). Before proceeding, we clarify the following:

(1) We select ergodic mutual information as the performance metric. Although actual system performance can be better characterized by outage capacity, it unfortunately leads to a highly complex design problem. Our choice is mainly for better tractability.

(2) Our precoding design focuses on the Type-I hybrid ARQ scheme, where the same packet is resent during ARQ retransmissions. Compared with Type-II ARQ, where additional incremental redundancy bits are sent during ARQ retransmissions for better error correction, Type-I ARQ requires much lower encoding and decoding complexity during retransmissions and is attractive for multiple antenna systems. For precoder design under Type-II ARQ, the single-transmission design results of [11] can be directly applied, which gives a linear increase of ergodic mutual information with the number of retransmissions.

Now, to simplify problem (4), we decompose $\boldsymbol{\Sigma}_{T,i}^{1/2}$ as $\boldsymbol{\Sigma}_{T,i}^{1/2} = \mathbf{U}_{T,i} \boldsymbol{\Lambda}_{T,i}^{1/2} \mathbf{V}_{T,i}^H$, where $\mathbf{U}_{T,i}$ and $\mathbf{V}_{T,i}$ are unitary while $\boldsymbol{\Lambda}_{T,i}^{1/2} = \mathrm{diag}\{(\lambda^1_{\Sigma_{T,i}})^{1/2}, \cdots, (\lambda^{M_T}_{\Sigma_{T,i}})^{1/2}\}$ with descending entries. Because multiplying a precoder matrix by a unitary matrix does not change its power, we use the transformation

$$
\tilde{\mathbf{F}}_i = \mathbf{V}_{T,i}^H \mathbf{F}_i, \quad i = 1, \cdots, m. \tag{5}
$$

Using (5) and the fact that $\mathbf{H}_{\omega,i}$ and $\mathbf{H}_{\omega,i} \mathbf{U}_{T,i}$ have identical distributions, problem (4) can be simplified as shown in (6). The following theorem characterizes some special structures of the sequentially optimal precoder.

Theorem 1: The solution $\tilde{\mathbf{F}}_m^{\mathrm{opt}}$ to the optimization of (6) can be decomposed into the product

$$
\tilde{\mathbf{F}}_m^{\mathrm{opt}} = \mathbf{V}_{T,m}^H \mathbf{F}_m^{\mathrm{opt}} = \boldsymbol{\Lambda}_{F_m} \mathbf{U}_{F_m} \quad \text{or} \quad \mathbf{F}_m^{\mathrm{opt}} = \mathbf{V}_{T,m} \boldsymbol{\Lambda}_{F_m} \mathbf{U}_{F_m}, \tag{7}
$$

in which $\boldsymbol{\Lambda}_{F_m} = \mathrm{diag}\{\lambda^1_{F_m}, \cdots, \lambda^{M_T}_{F_m}\}$ is a diagonal matrix while $\mathbf{U}_{F_m}$ is unitary.

The proof of Theorem 1 follows the techniques in [11] and is given in Appendix A. We note that $\mathbf{U}_{F_1}$ is an inherent ambiguity for the first transmission, as it does not affect the ergodic mutual information. Nevertheless, in subsequent transmissions, $\mathbf{U}_{F_i}$ significantly affects the system mutual information. This effect is clear from our previous precoder design with perfect CSI [17], [18]. The basic role of $\mathbf{U}_{F_i}$ is to allocate strong channel modes to unlucky data symbols that were assigned unreliable channel modes in previous transmissions. Finding the optimal unitary matrix $\mathbf{U}_{F_i}$ is a hard problem. To simplify, we restrict $\mathbf{U}_{F_i}$ to a permutation matrix $\mathbf{P}_{F_i}$ rather than a general unitary matrix. This is not an arbitrary decision. The justification is that this simplification preserves the important property of the optimal precoder shared by all existing precoder designs under various levels of CSI and different optimality criteria [2]–[7], [11], [17], [18]. It still permits current channel allocation based on previous channel effects. Basically, each data stream will be power loaded and transmitted over one orthogonal channel mode. This simplified precoder structure leads to the suboptimal solution

$$
\mathbf{F}_i^{*} = \mathbf{V}_{T,i} \boldsymbol{\Lambda}_{F_i} \mathbf{P}_{F_i}, \quad i = 1, \cdots, m. \tag{8}
$$
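The structure in (8) can be assembled directly from a decomposition of $\boldsymbol{\Sigma}_{T,i}^{1/2}$. The sketch below builds such a precoder and checks that the unitary transformation (5) leaves the transmit power unchanged, so that the power constraint reduces to $\mathrm{Tr}\{\boldsymbol{\Lambda}_{F}^2\} = p$. The helper names, the covariance matrix, the power loading values, and the row-indexing convention used for the permutation matrix are assumptions made for illustration.

```python
import numpy as np

def decompose_cov_sqrt(Sigma_T):
    """Return U, s, V with Sigma_T^{1/2} = U diag(s) V^H and s in descending order."""
    vals, vecs = np.linalg.eigh(Sigma_T)
    root = (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.conj().T
    U, s, Vh = np.linalg.svd(root)          # singular values are sorted descending
    return U, s, Vh.conj().T

def structured_precoder(V, lam_F, perm):
    """Build F* = V Lambda_F P as in (8).

    Convention assumed here: (P u)[k] = u[perm[k]], i.e. channel mode k
    carries data stream perm[k].
    """
    P = np.eye(len(perm))[list(perm)]
    return V @ np.diag(lam_F) @ P, P

# Hypothetical example values
Sigma_T = np.array([[1.0, 0.6], [0.6, 1.0]])
p = 2.0
lam_F = np.sqrt([1.5, 0.5])                 # diagonal of Lambda_F, Tr{Lambda_F^2} = p

U, s, V = decompose_cov_sqrt(Sigma_T)
F, P = structured_precoder(V, lam_F, perm=(1, 0))
F_tilde = V.conj().T @ F                    # transformation (5)

power = np.trace(F @ F.conj().T).real
power_tilde = np.trace(F_tilde @ F_tilde.conj().T).real
print(power, power_tilde)
```

With this structure the design variables are only the diagonal power loading and a permutation, which is the simplification the remainder of the paper relies on.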

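Our precoder design will from now on be based on this near-optimal precoder structure.

$$
\mathbf{F}_m^{\mathrm{opt}} = \arg\max_{\mathbf{F}_m} E\left\{ \log_2 \det\left( \mathbf{I}_{M_T} + \gamma \sum_{i=1}^{m-1} (\mathbf{F}_i^{\mathrm{opt}})^H (\boldsymbol{\Sigma}_{T,i}^{1/2})^H \mathbf{H}_{\omega,i}^H \mathbf{H}_{\omega,i} \boldsymbol{\Sigma}_{T,i}^{1/2} \mathbf{F}_i^{\mathrm{opt}} + \gamma \mathbf{F}_m^H (\boldsymbol{\Sigma}_{T,m}^{1/2})^H \mathbf{H}_{\omega,m}^H \mathbf{H}_{\omega,m} \boldsymbol{\Sigma}_{T,m}^{1/2} \mathbf{F}_m \right) \right\}, \quad \text{subject to } \mathrm{Tr}\{\mathbf{F}_m \mathbf{F}_m^H\} = p \tag{4}
$$

$$
\tilde{\mathbf{F}}_m^{\mathrm{opt}} = \arg\max_{\tilde{\mathbf{F}}_m} E\left\{ \log_2 \det\left( \mathbf{I}_{M_T} + \gamma \sum_{i=1}^{m-1} (\tilde{\mathbf{F}}_i^{\mathrm{opt}})^H \boldsymbol{\Lambda}_{T,i}^{1/2} \mathbf{H}_{\omega,i}^H \mathbf{H}_{\omega,i} \boldsymbol{\Lambda}_{T,i}^{1/2} \tilde{\mathbf{F}}_i^{\mathrm{opt}} + \gamma \tilde{\mathbf{F}}_m^H \boldsymbol{\Lambda}_{T,m}^{1/2} \mathbf{H}_{\omega,m}^H \mathbf{H}_{\omega,m} \boldsymbol{\Lambda}_{T,m}^{1/2} \tilde{\mathbf{F}}_m \right) \right\}, \quad \text{subject to } \mathrm{Tr}\{\tilde{\mathbf{F}}_m \tilde{\mathbf{F}}_m^H\} = p \tag{6}
$$

IV. ERGODIC MUTUAL INFORMATION BOUND AND SUBOPTIMAL DESIGN

Alternatively, we can first derive a tight upper bound on the ergodic mutual information. Based on this bound, we will derive a suboptimal permutation matrix and present two suboptimal power loading strategies.

A. Ergodic Mutual Information

While closed-form ergodic mutual information of MIMO wireless fading channels has been found for uncorrelated channels [2], [21] and for correlated channels [22], these forms are often too complex for designing precoders. Existing precoder designs in terms of power loading generally consider simpler mutual information bounds for suboptimal solutions [12], [13], [23]. Utilizing the suboptimal precoder structure (8), the ergodic mutual information becomes a function of only the power loading matrix $\boldsymbol{\Lambda}_{F_m}$ and the permutation matrix $\mathbf{P}_{F_m}$, as shown in (9), in which $\{\mathbf{P}^*_{F_i}, i = 1, \cdots, m-1\}$ and $\{\boldsymbol{\Lambda}^*_{F_i}, i = 1, \cdots, m-1\}$ are known matrices that have already been designed and applied during previous transmissions. We denote the diagonal matrix

$$
\mathbf{D}_i = \mathbf{P}_{F_i}^H \boldsymbol{\Lambda}_{T,i}^{1/2} \boldsymbol{\Lambda}_{F_i} \mathbf{P}_{F_i}, \tag{10}
$$

and the expanded matrices

$$
\mathbf{D} = \left[ \mathbf{D}^*_1, \cdots, \mathbf{D}^*_{m-1}, \mathbf{D}_m \right], \tag{11}
$$

$$
\mathbf{H}_\omega = \mathrm{diag}\left( \mathbf{H}_{\omega,1}^H, \cdots, \mathbf{H}_{\omega,m}^H \right). \tag{12}
$$

The ergodic mutual information (9) can be written as

$$
\mathcal{I}(\boldsymbol{\Lambda}_{F_m}, \mathbf{P}_{F_m}) = E\left\{ \log_2 \det\left( \mathbf{I}_{M_T} + \gamma \mathbf{D} \mathbf{H}_\omega \mathbf{H}_\omega^H \mathbf{D}^H \right) \right\}. \tag{13}
$$

The major difficulty in finding a closed form of (13) arises from the logarithm operation. Applying Jensen's inequality, an upper bound on the ergodic mutual information is obtained, and the precoder design objective can be simplified to maximizing

$$
\tilde{\mathcal{I}} = \log_2 E\left\{ \det\left( \mathbf{I}_{M_T} + \gamma \mathbf{D} \mathbf{H}_\omega \mathbf{H}_\omega^H \mathbf{D}^H \right) \right\} \tag{14}
$$

subject to the power constraint. The following theorem summarizes the closed form of (14).

Theorem 2: $E\left\{ \det\left( \mathbf{I}_{M_T} + \gamma \mathbf{D} \mathbf{H}_\omega \mathbf{H}_\omega^H \mathbf{D}^H \right) \right\}$ can be expressed as shown in (15), where $|j_1, \cdots, j_k|_d$ denotes the number of columns in $\mathbf{D}^{i_1, \cdots, i_k}_{j_1, \cdots, j_k}$ that are chosen from $\mathbf{D}_d$, i.e., the number of elements in $\{j_1, j_2, \cdots, j_k\}$ satisfying $(d-1) \cdot M_T + 1 \le j_i \le d \cdot M_T$.

The proof is given in Appendix B.

Note that we always have the identity $\sum_{d=1}^{m} |j_1, \cdots, j_k|_d = k$. Additionally, (15) can be further simplified by incorporating the parallel-diagonal structure of $\mathbf{D}$ in (11) into (16), where $(\mathbf{D}_{j_b})_{i_b, i_b}$ is the $(i_b)$th diagonal entry of the submatrix $\mathbf{D}_{j_b}$ in (11). Since $j_b$ can be chosen freely from $1, \ldots, m$, $|j_1, \cdots, j_k|_d$ is equivalently the number of elements in $\{j_1, j_2, \cdots, j_k\}$ that are equal to $d$. Based on our analysis thus far, instead of designing the optimal precoder ($\boldsymbol{\Lambda}_{F_m}$ and $\mathbf{P}_{F_m}$) that maximizes the exact metric (9), we seek suboptimal precoders that maximize the mutual information bound (14) using the closed form of (16). Hence, the precoder design problem now becomes

$$
\max_{\boldsymbol{\Lambda}_{F_m}, \mathbf{P}_{F_m}} E\left\{ \det\left( \mathbf{I}_{M_T} + \gamma \mathbf{D} \mathbf{H}_\omega \mathbf{H}_\omega^H \mathbf{D}^H \right) \right\}, \quad \text{subject to } \mathrm{Tr}\{\boldsymbol{\Lambda}_{F_m}^2\} = p. \tag{17}
$$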
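The relation between the exact metric (13) and the Jensen bound (14) can be illustrated with a small Monte Carlo experiment. The sketch below estimates both quantities from the same channel draws; it does not use the closed forms (15) and (16), and the function name and the example diagonal blocks are assumptions made for illustration.

```python
import numpy as np

def complex_gaussian(shape, rng):
    """i.i.d. zero-mean, unit-variance circularly symmetric complex Gaussians."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def exact_and_bound(D_blocks, gamma, M_R, n_trials=2000, seed=0):
    """Monte Carlo estimates (in bits) of (13) and of the Jensen bound (14).

    D_blocks: list of M_T x M_T matrices D_1, ..., D_m forming D in (11).
    """
    rng = np.random.default_rng(seed)
    m, M_T = len(D_blocks), D_blocks[0].shape[0]
    D = np.hstack(D_blocks)                                    # M_T x (m*M_T)
    log_dets = np.empty(n_trials)
    for t in range(n_trials):
        # (12): block-diagonal matrix whose blocks H_w,i^H are M_T x M_R;
        # the conjugate transpose of an i.i.d. Gaussian matrix is again i.i.d.
        H_w = np.zeros((m * M_T, m * M_R), dtype=complex)
        for i in range(m):
            H_w[i * M_T:(i + 1) * M_T, i * M_R:(i + 1) * M_R] = \
                complex_gaussian((M_T, M_R), rng)
        A = D @ H_w
        G = np.eye(M_T) + gamma * A @ A.conj().T
        log_dets[t] = np.linalg.slogdet(G)[1]                  # natural log of det
    exact = log_dets.mean() / np.log(2)                        # estimate of (13)
    # log of the sample mean of det, computed stably in the log domain
    bound = (np.logaddexp.reduce(log_dets) - np.log(n_trials)) / np.log(2)
    return exact, bound

# Hypothetical diagonal blocks for m = 2 transmissions
D_blocks = [np.diag([1.0, 0.5]), np.diag([0.8, 0.6])]
exact, bound = exact_and_bound(D_blocks, gamma=10.0, M_R=2)
print(exact, bound)
```

Because the logarithm is concave, the estimate of (14) is never smaller than the estimate of (13) computed from the same samples; how tight the gap is for a given configuration is what such an experiment can probe.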

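B. Optimal Permutation Matrix

We solve the optimization of (17) in two steps: (1) finding the optimal permutation matrix $\mathbf{P}_{F_m}$; and (2) proposing a power loading strategy $\boldsymbol{\Lambda}_{F_m}$. The following theorem provides the optimal permutation matrix $\mathbf{P}_{F_m}$.

Theorem 3: Given a fixed power loading matrix $\boldsymbol{\Lambda}_{F_m}$, the optimal permutation matrix $\mathbf{P}_{F_m}$ maximizing (14) arranges, in reverse order, the diagonal entries of

$$
\sum_{i=1}^{m-1} \mathbf{D}_i^2 = \sum_{i=1}^{m-1} (\mathbf{P}^*_{F_i})^H \boldsymbol{\Lambda}_{T,i} (\boldsymbol{\Lambda}^*_{F_i})^2 \mathbf{P}^*_{F_i}, \tag{18}
$$

and

$$
\mathbf{D}_m^2 = \mathbf{P}_{F_m}^H \boldsymbol{\Lambda}_{T,m} \boldsymbol{\Lambda}_{F_m}^2 \mathbf{P}_{F_m}. \tag{19}
$$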
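The reverse-order pairing of Theorem 3 can be checked by brute force on a small case. The sketch below estimates the expectation inside (14) for every arrangement of the current diagonal entries, using common random channel draws, and compares the best arrangement with the reverse-order rule. It is a numerical check on one hypothetical example with a single previous transmission, not a proof; the function names and all numerical values are assumptions.

```python
import numpy as np
from itertools import permutations

def complex_gaussian(shape, rng):
    """i.i.d. zero-mean, unit-variance circularly symmetric complex Gaussians."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def reverse_order_assignment(prev_gain, cur_gain):
    """Give the largest current entry to the position with the smallest history."""
    prev_rank = np.argsort(prev_gain)            # positions, weakest history first
    cur_sorted = np.sort(cur_gain)[::-1]         # current entries, strongest first
    assigned = np.empty_like(cur_sorted)
    assigned[prev_rank] = cur_sorted
    return assigned

def mean_det(diag_sq_blocks, gamma, G):
    """Sample mean of det(I + gamma * sum_i D_i G_i D_i) for diagonal D_i.

    diag_sq_blocks[i] is the diagonal of D_i^2; G[t, i] = H_w,i^H H_w,i.
    """
    d = np.sqrt(np.asarray(diag_sq_blocks))                  # shape (m, M_T)
    S = gamma * np.einsum('ia,tiab,ib->tab', d, G, d)        # sum over transmissions
    S = S + np.eye(d.shape[1])
    return np.linalg.det(S).real.mean()

rng = np.random.default_rng(1)
M_T, M_R, gamma, n_trials = 3, 3, 10.0, 20000
prev = np.array([2.0, 1.0, 0.3])     # diagonal of D_1^2 (one previous transmission)
cur = np.array([1.5, 0.8, 0.2])      # diagonal entries of D_m^2 to be arranged

H = complex_gaussian((n_trials, 2, M_R, M_T), rng)
G = np.conj(np.swapaxes(H, -1, -2)) @ H                      # (n_trials, 2, M_T, M_T)

scores = {perm: mean_det([prev, cur[list(perm)]], gamma, G)
          for perm in permutations(range(M_T))}
best = max(scores, key=scores.get)
best_arrangement = cur[list(best)]
theorem3_arrangement = reverse_order_assignment(prev, cur)
print(best_arrangement, theorem3_arrangement)
```

The intuition matches the role described for the permutation earlier: data streams that accumulated little gain in earlier transmissions are given the strongest current modes, which balances the per-stream totals.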

Proof: To show Theorem 3, it suffices to consider two entries in $\mathbf{D}_m^2$ that have the same order as their corresponding elements in $\sum_{i=1}^{m-1} \mathbf{D}_i^2$. We show that switching the positions of these two entries without changing other entries increases the value of (14). Hence, by induction and iteration over all entry pairs, the optimality of reverse-order pairing is evident. Now we prove the 2-entry case. Let $(\mathbf{D}_i)_{j,j}$ denote the $j$th diagonal entry of $\mathbf{D}_i$. Without loss of generality, assume that the first two entries are paired in the order of $\sum_{i=1}^{m-1} (\mathbf{D}_i)^2_{1,1} \ge \sum_{i=1}^{m-1} (\mathbf{D}_i)^2_{2,2}$ and $(\mathbf{D}_m)^2_{1,1} \ge (\mathbf{D}_m)^2_{2,2}$. We only need to inspect parts of (16) related


I(ΛFm , PFm ) = E ⎧ MT ⎨ ! k=0



⎧ ⎨ ⎩

⎛ log2 det ⎝

!

γk

IMT

γ

+

i=1 1/2 1/2 H γPH Fm ΛFm ΛT,m Hω,m Hω,m ΛT,m ΛFm PFm

"

!

k=0



!

γk

#

1≤i1