Multicarrier Communications and Signal Processing

EURASIP Journal on Applied Signal Processing

Multicarrier Communications and Signal Processing Guest Editors: Ye (Geoffrey) Li, Hamid R. Sadjadpour, Dirk Dahlhaus, and Kung Yao


Copyright © 2004 Hindawi Publishing Corporation. All rights reserved. This is a special issue published in volume 2004 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Editor-in-Chief Marc Moonen, Belgium

Senior Advisory Editor K. J. Ray Liu, College Park, USA

Associate Editors Gonzalo Arce, USA Jaakko Astola, Finland Kenneth Barner, USA Mauro Barni, Italy Sankar Basu, USA Jacob Benesty, Canada Helmut Bölcskei, Switzerland Joe Chen, USA Chong-Yung Chi, Taiwan M. Reha Civanlar, Turkey Luciano Costa, Brazil Satya Dharanipragada, USA Petar M. Djurić, USA Jean-Luc Dugelay, France Touradj Ebrahimi, Switzerland Frank Ehlers, Germany Moncef Gabbouj, Finland Sharon Gannot, Israel Fulvio Gini, Italy

A. Gorokhov, The Netherlands Peter Handel, Sweden Ulrich Heute, Germany John Homer, Australia Jiri Jan, Czech Søren Holdt Jensen, Denmark Mark Kahrs, USA Thomas Kaiser, Germany Moon Gi Kang, Korea Aggelos Katsaggelos, USA Walter Kellermann, Germany Alex Kot, Singapore C.-C. Jay Kuo, USA Chin-Hui Lee, USA Sang Uk Lee, Korea Geert Leus, The Netherlands Mark Liao, Taiwan Yuan-Pei Lin, Taiwan Bernie Mulgrew, UK

King N. Ngan, Hong Kong Douglas O’Shaughnessy, Canada Antonio Ortega, USA Montse Pardas, Spain Vincent Poor, USA Phillip Regalia, France Markus Rupp, Austria Hideaki Sakai, Japan Bill Sandham, UK Dirk Slock, France Piet Sommen, The Netherlands John Sorensen, Denmark Sergios Theodoridis, Greece Dimitrios Tzovaras, Greece Jacques Verly, Belgium Xiaodong Wang, USA Douglas Williams, USA Xiang-Gen Xia, USA Jar-Ferr Yang, Taiwan

Contents

Editorial, Ye (Geoffrey) Li, Hamid R. Sadjadpour, Dirk Dahlhaus, and Kung Yao. Volume 2004 (2004), Issue 10, Pages 1431–1432
Split SR-RLS for the Joint Initialization of the Per-Tone Equalizers and Per-Tone Echo Cancelers in DMT-Based Receivers, Geert Ysebaert, Koen Vanbleu, Gert Cuypers, and Marc Moonen. Volume 2004 (2004), Issue 10, Pages 1433–1445
Zero-Forcing Frequency-Domain Equalization for Generalized DMT Transceivers with Insufficient Guard Interval, Tanja Karp, Steffen Trautmann, and Norbert J. Fliege. Volume 2004 (2004), Issue 10, Pages 1446–1459
EM-Based Channel Estimation Algorithms for OFDM, Xiaoqiang Ma, Hisashi Kobayashi, and Stuart C. Schwartz. Volume 2004 (2004), Issue 10, Pages 1460–1477
Adaptive Zero-Padding OFDM over Frequency-Selective Multipath Channels, Neng Wang and Steven D. Blostein. Volume 2004 (2004), Issue 10, Pages 1478–1488
Parallel Multistage Decision Feedback Equalizer for Single-Carrier Layered Space-Time Systems in Frequency-Selective Channels, Jing Xu, Haifeng Wang, Shixin Cheng, and Ming Chen. Volume 2004 (2004), Issue 10, Pages 1489–1497
PSD-Constrained PAR Reduction for DMT/OFDM, Niklas Andgart, Brian S. Krongold, Per Ödling, Albin Johansson, and Per Ola Börjesson. Volume 2004 (2004), Issue 10, Pages 1498–1507
Bandwidth Efficient OFDM Transmitter Diversity Techniques, King F. Lee and Douglas B. Williams. Volume 2004 (2004), Issue 10, Pages 1508–1519
Partial Crosstalk Cancellation for Upstream VDSL, Raphael Cendrillon, Marc Moonen, George Ginis, Katleen Van Acker, Tom Bostoen, and Piet Vandaele. Volume 2004 (2004), Issue 10, Pages 1520–1535
Performance Analysis of Multiple-Symbol Differential Detection for OFDM over Both Time- and Frequency-Selective Rayleigh Fading Channels, Akira Ishii, Hideki Ochiai, and Tadashi Fujino. Volume 2004 (2004), Issue 10, Pages 1536–1545
Optimized Irregular Low-Density Parity-Check Codes for Multicarrier Modulations over Frequency-Selective Channels, Valérian Mannoni, David Declercq, and Guillaume Gelle. Volume 2004 (2004), Issue 10, Pages 1546–1556
Layered Video Transmission on Adaptive OFDM Wireless Systems, D. Dardari, M. G. Martini, M. Mazzotti, and M. Chiani. Volume 2004 (2004), Issue 10, Pages 1557–1567
Multicarrier Block-Spread CDMA for Broadband Cellular Downlink, Frederik Petré, Geert Leus, Marc Moonen, and Hugo De Man. Volume 2004 (2004), Issue 10, Pages 1568–1584
Bit Error Rate Analysis for MC-CDMA Systems in Nakagami-m Fading Channels, Zexian Li and Matti Latva-aho. Volume 2004 (2004), Issue 10, Pages 1585–1594
Performance of Asynchronous MC-CDMA Systems with Maximal Ratio Combining in Frequency-Selective Fading Channels, Keli Zhang and Yong Liang Guan. Volume 2004 (2004), Issue 10, Pages 1595–1603
Design and Implementation of MC-CDMA Systems for Future Wireless Networks, Sébastien Le Nours, Fabienne Nouvel, and Jean-François Hélard. Volume 2004 (2004), Issue 10, Pages 1604–1615

EURASIP Journal on Applied Signal Processing 2004:10, 1431–1432 © 2004 Hindawi Publishing Corporation

Editorial

Ye (Geoffrey) Li
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

Hamid R. Sadjadpour
School of Engineering, University of California, Santa Cruz, CA 95064, USA

Dirk Dahlhaus
Communication Technology Lab, ETH Zurich, Sternwartstrasse 7, 8092 Zurich, Switzerland

Kung Yao
Department of Electrical Engineering, University of California, Los Angeles, CA 90095, USA

Multicarrier (MC) transmission, especially orthogonal frequency division multiplexing (OFDM), has recently attracted considerable attention since it has been shown to be an effective technique to combat delay spread or frequency-selective fading of wireless or wireline channels. This approach has been adopted in standards for several outdoor and indoor high-speed wireless and wireline data applications, including wireless local area networks, digital audio and video broadcasting, and digital subscriber line modems. MC transmission requires no complex time-domain equalizers, which makes it possible to combine it with many advanced techniques to improve the capacity and enhance the performance of transmission. At the same time, many issues in MC communications, such as time- and frequency-offset estimation and correction, channel estimation, and peak-to-average power ratio (PAPR) reduction, need to be solved. This special issue includes 15 papers that address many of these issues.

Channel estimation and (one-tap) equalization are very important for signal detection in MC or OFDM systems. The first five papers are on this topic. The papers by G. Ysebaert et al. and by T. Karp et al. investigate one-tap or per-tone equalization in DMT. The paper by X. Ma et al. applies EM algorithms to channel estimation in OFDM-based wireless communication systems. The paper by N. Wang and S. D. Blostein develops adaptive zero-padding approaches for bandwidth-efficient OFDM. The paper by J. Xu et al. compares the complexity and the performance of multiple-input multiple-output (MIMO) OFDM and single-carrier systems with frequency-domain equalization (SC-FDE).

The PAPR problem is dealt with in the paper by N. Andgart et al., where tone reservation is used to reduce the PAPR of OFDM (or DMT) systems.

There are five papers that investigate signal detection and coding in OFDM or DMT systems. The paper by K. F. Lee and D. B. Williams proposes iterative space-time and space-frequency block-coded OFDM with transmit antenna arrays. The paper by R. Cendrillon et al. deals with partial crosstalk cancellation in DMT-based very-high-data-rate digital subscriber line (VDSL) systems. The papers by A. Ishii et al. and by V. Mannoni et al. study differential detection and LDPC codes for OFDM systems, respectively. The paper by D. Dardari et al. studies adaptive modulation and bit loading for OFDM-based video transmission systems.

MC can be combined with code-division multiple access (CDMA) to form MC-CDMA, which inherits the advantages of both. There are four papers on this topic. The paper by F. Petré et al. studies MC-based block-spread CDMA for broadband cellular systems. The papers by Z. Li and M. Latva-aho and by K. Zhang and Y. L. Guan analyze the performance of MC-CDMA systems. The paper by S. Le Nours et al. investigates implementation issues of MC-CDMA.

We would like to thank the authors for their submissions and the reviewers for their high-quality reviews.

Ye (Geoffrey) Li
Hamid R. Sadjadpour
Dirk Dahlhaus
Kung Yao

Ye (Geoffrey) Li received his B.S.E. and M.S.E. degrees in 1983 and 1986, respectively, from the Department of Wireless Engineering, Nanjing Institute of Technology, Nanjing, China, and his Ph.D. degree in 1994 from the Department of Electrical Engineering, Auburn University, Alabama. After spending several years at AT&T Labs – Research, he joined the School of Electrical and Computer Engineering at Georgia Tech as an Associate Professor in 2000. His general research interests include statistical signal processing and wireless communications. In these areas, he has contributed over 100 papers published in refereed journals and presented at various international conferences. He also has over 10 U.S. patents granted or pending. He once served as a Guest Editor for two special issues on signal processing for wireless communications for the IEEE J-SAC. He is currently serving as an Editor for Wireless Communication Theory for the IEEE Transactions on Communications and as an Editorial Board Member of EURASIP Journal on Applied Signal Processing. He has organized and chaired many international conferences; for example, he was the Technical Program Vice-Chair of the IEEE 2003 International Conference on Communications.

Hamid R. Sadjadpour received his B.S. and M.S. degrees in 1986 and 1988, respectively, from the Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran, and his Ph.D. degree in 1996 from the Department of Electrical Engineering, University of Southern California, California. He worked first as a Senior Technical Staff Member and then as a Principal Technical Staff Member at AT&T Labs – Research between 1995 and 2001. He joined the Department of Electrical Engineering at the University of California, Santa Cruz, as an Assistant Professor in 2001. His general research interests include communication theory and signal processing for wireless communications, fiber-optic, and wired applications.
In these areas, he has published over 45 journal, conference, or technical papers. He also has 11 patents granted or pending.

Dirk Dahlhaus received the Dipl.-Ing. degree in electrical engineering from Ruhr-Universität Bochum, Germany, in 1992, and the Ph.D. degree from the Swiss Federal Institute of Technology (ETH) Zurich, Switzerland, in 1998. Since April 1999, he has been an Assistant Professor for mobile radio systems at the Communication Technology Laboratory, ETH Zurich. His research interests include different aspects of the physical layer of wireless and mobile radio communication systems, where he has published some 50 papers (http://www.nari.ee.ethz.ch). In 2002, he was the President of the International Zurich Seminar on Broadband Communications.

Kung Yao received the B.S.E. (with highest honors), M.A., and Ph.D. degrees in electrical engineering, all from Princeton University, Princeton, NJ. Presently, he is a Professor in the Electrical Engineering Department at UCLA. In 1969, he was a Visiting Assistant Professor at MIT. From 1985 to 1988, he served as an Assistant Dean of the School of Engineering and Applied Science at UCLA. His research interests include sensor array systems, digital communication theory, wireless radio systems, chaos communications, digital and array processing, systolic and VLSI algorithms, and simulation. He has published over 250 journal and conference papers. Dr. Yao received the IEEE Signal Processing Society's 1993 Senior Award on VLSI signal processing. He was the Coeditor of a two-volume series of an IEEE reprint book, High-Performance VLSI Signal Processing: Innovative Architectures and Algorithms, IEEE Press, 1997. From 1991 to 1993, he was the Associate Editor for VLSI Signal Processing of the IEEE Transactions on Circuits and Systems. Since 1999, he has been an Associate Editor of the IEEE Communications Letters. He is also an Associate Editor of the Journal of VLSI Signal Processing and EURASIP Journal on Applied Signal Processing. He is a Fellow of IEEE.

EURASIP Journal on Applied Signal Processing 2004:10, 1433–1445 © 2004 Hindawi Publishing Corporation

Split SR-RLS for the Joint Initialization of the Per-Tone Equalizers and Per-Tone Echo Cancelers in DMT-Based Receivers

Geert Ysebaert
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium

Koen Vanbleu
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium

Gert Cuypers
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium

Marc Moonen
ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium

Received 6 March 2003; Revised 25 August 2003

In asymmetric digital subscriber lines (ADSL), the available bandwidth is divided into subcarriers or tones, which are assigned to the upstream and/or downstream transmission direction. To allow efficient bidirectional communication over one twisted pair, echo cancellation is required to separate the upstream and downstream channels. In addition, intersymbol interference and intercarrier interference have to be reduced by means of equalization. In this paper, a computationally efficient algorithm for adaptively initializing the per-tone equalizers (PTEQ) and per-tone echo cancelers (PTEC) is presented. For a given number of equalizer and echo canceler taps per tone, it has been shown that the joint PTEQ/PTEC receiver structure is able to maximize the signal-to-noise ratio (SNR) on each subcarrier and hence also the achievable bit rate. The proposed initialization scheme is based on a modification of the square root recursive least squares (SR-RLS) algorithm that reduces the computational complexity and memory requirement compared to full SR-RLS, while keeping the convergence rate acceptably fast. Our performance analysis shows that the proposed method converges in the mean, and an upper bound for the step size is given. Moreover, we indicate how the presented initialization method can be reused in several other ADSL applications.

Keywords and phrases: adaptive signal processing, split SR-RLS, DMT, DSL, per-tone equalization, per-tone echo cancellation.

1. INTRODUCTION

ADSL stands for asymmetric digital subscriber lines and is able to provide broadband data transmission over the existing telephone network. To increase the spectral efficiency of the available bandwidth, ADSL employs a transmission technique based on multicarrier modulation, namely, discrete multitone (DMT) [1, 2]. DMT divides the available bandwidth into N parallel subchannels or tones by means of an N-point inverse fast Fourier transform (IFFT). At the transmitter, each tone is modulated by quadrature amplitude modulation (QAM), and the IFFT is applied to obtain a time domain signal. At the receiver, an N-point FFT can be used for demodulation. Prepending a cyclic prefix to each data block after IFFT modulation ensures that the subchannels remain independent after transmission over a channel. If the order of the channel (modeled as an FIR filter) does not exceed the cyclic prefix length ν, the transmitted signal can easily be recovered by a bank of complex scalars, the so-called frequency domain equalizers (FEQs). In the ADSL context, however, the channel impulse response typically exceeds the cyclic prefix length, thereby destroying subchannel orthogonality. As a result, intersymbol interference (ISI) and intercarrier interference (ICI) will be present, and a channel-shortening time domain equalizer (TEQ) is required [3, 4, 5, 6, 7]. An alternative equalization structure is based on "per-tone" equalization (PTEQ), which accomplishes the joint task of TEQ/FEQ independently for each tone [8, 9].

Besides equalization, echo cancellation is required to separate upstream and downstream signals and to enable efficient bidirectional communication over the same telephone wire. Echo occurs due to signal leakage from the transmit side to the receive side in the modem, since both sides are imperfectly coupled to the telephone line. If properly designed, echo cancellation can improve the reach and/or noise margin of an ADSL system by allowing both upstream and downstream signals to share the low-frequency portion of the available frequency band. Several echo cancellation structures for DMT transceivers have been studied in the literature [6, 8, 10, 11, 12, 13]. All the proposed structures exploit a common principle, namely, the echo channel is estimated through an adaptive updating process and an emulated version of the echo is subtracted from the received signal. Unfortunately, the echo cancelers studied in [10, 11, 12] are designed independently from the equalizer. Van Acker et al. presented a joint per-tone echo cancellation (PTEC) and PTEQ structure, where an echo canceler and an equalizer have to be designed for each tone separately [13]. For a given number of equalizer and echo canceler taps per subcarrier, this approach is able to optimize the signal-to-noise ratio (SNR) on each subcarrier and hence maximize the achievable bit rate [13].

In this paper, we will focus on adaptively initializing the PTEQ/PTEC receiver structure. The problem consists of solving several parallel minimum mean square error (MMSE) problems (one MMSE problem for each tone) in an adaptive way. We are especially interested in developing an adaptive algorithm which exhibits fast convergence, low memory requirement, and low computational complexity.
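The role of the cyclic prefix described at the start of this introduction can be checked numerically. The sketch below is a minimal NumPy illustration under assumed toy parameters (N = 64 tones, ν = 8, 4-QAM on every tone, two hand-picked FIR channels); none of these values are ADSL parameters, and the helper names (`dmt_symbol`, `fft_output`, `feq_max_error`) are hypothetical. Two consecutive DMT symbols are sent so that a too-long channel produces genuine interference from the previous symbol. When the channel order does not exceed ν, the FFT output on tone n should equal H[n]X[n], so dividing by H[n] (a one-tap FEQ) recovers the symbols to machine precision; with a longer channel, residual ISI/ICI remains.

```python
import numpy as np

# Hypothetical toy parameters (not ADSL values): N tones, cyclic prefix length nu
N, nu = 64, 8
rng = np.random.default_rng(1)


def qpsk(n):
    """Random unit-energy 4-QAM symbols, one per tone."""
    return (rng.choice([-1.0, 1.0], n) + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)


def dmt_symbol(X):
    """N-point IFFT modulation followed by prepending a length-nu cyclic prefix."""
    x = np.fft.ifft(X)
    return np.concatenate([x[-nu:], x])


def fft_output(X_prev, X_cur, h):
    """Send two consecutive symbols through FIR channel h and return the FFT
    of the second symbol after discarding its cyclic prefix."""
    tx = np.concatenate([dmt_symbol(X_prev), dmt_symbol(X_cur)])
    rx = np.convolve(tx, h)
    start = (N + nu) + nu                  # skip symbol 1 and the prefix of symbol 2
    return np.fft.fft(rx[start:start + N])


def feq_max_error(h):
    """Largest per-tone error after a bank of one-tap FEQs 1/H[n]."""
    X_prev, X_cur = qpsk(N), qpsk(N)
    H = np.fft.fft(h, N)                   # channel frequency response on each tone
    X_hat = fft_output(X_prev, X_cur, h) / H
    return np.max(np.abs(X_hat - X_cur))


h_short = np.array([1.0, 0.4, 0.2, 0.1])   # order 3 <= nu: orthogonality preserved
h_long = 0.8 ** np.arange(20)              # order 19 > nu: ISI and ICI appear

err_short = feq_max_error(h_short)
err_long = feq_max_error(h_long)
print(f"max FEQ error, channel order <= nu: {err_short:.2e}")
print(f"max FEQ error, channel order >  nu: {err_long:.2e}")
```

Both channels were chosen so that |H[n]| stays well away from zero on every tone, which keeps the division by H[n] numerically harmless; the error for the long channel comes from the tail taps beyond ν, which a TEQ or PTEQ would have to handle.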
In the literature, several adaptive algorithms exist to solve an MMSE problem of the form

$$\min_{\mathbf{w}} E\left\{\left|d^{(k)} - \mathbf{w}^T\mathbf{u}^{(k)}\right|^2\right\}, \tag{1}$$

where $E\{\cdot\}$ represents the expectation operator, $\{\cdot\}^T$ denotes the transpose, $d^{(k)}$ is some desired signal at time $k$, $\mathbf{w}$ contains the unknown coefficients, and $\mathbf{u}^{(k)}$ is the input vector. The most well-known and extensively studied adaptive algorithm is certainly the least mean square (LMS) algorithm by Widrow and Hoff [14, 15]. Although the algorithm is simple, the poor conditioning of the input autocorrelation matrices (one for each tone) of the PTEQ/PTEC receiver leads to slow convergence. Since the seventies, much effort has been spent on finding alternatives to LMS with faster convergence, which has led to a variety of algorithms.
(i) LMS derivatives: these algorithms are derived from the original LMS scheme and include algorithms such as normalized LMS (NLMS) [14] and looping LMS (LLMS) [16]. In NLMS, the step size is normalized with the input signal power to avoid gradient noise amplification [14], which leads to slightly improved convergence. LLMS repeatedly applies LMS to a block of data, but still requires too many iterations and computations in the case of the PTEQ/PTEC receiver.
(ii) Transform domain LMS: this type of adaptive filter refers to LMS filters where blocks of input data are preprocessed with a (unitary) data-independent transformation [17, 18]. The main purpose of this preprocessing step is to improve the eigenvalue distribution of the input autocorrelation matrix and hence to accelerate convergence. The choice of this transformation largely depends on the underlying problem. Time series filtering applications, where $\mathbf{u}^{(k)}$ is drawn from a tapped delay line, typically use the discrete Fourier transform (DFT) to obtain the so-called frequency domain LMS algorithm. However, the PTEQ/PTEC receiver is in fact a "linear combiner" problem, where no shift structure in $\mathbf{u}^{(k)}$ is available. Hence, an optimal transformation is not straightforward to obtain.
(iii) Square root recursive least squares (SR-RLS): in general, the SR-RLS algorithm does not impose any restrictions on the structure of the input data $\mathbf{u}^{(k)}$. SR-RLS exhibits fast convergence, albeit at a higher computational complexity than the LMS derivatives. Since the complexity grows with the square of the number of parameters in $\mathbf{w}$, complexity reductions are desired. To mitigate the high computational burden of RLS, the family of fast RLS algorithms, such as fast transversal filters (FTF) [19] and QR-decomposition-based lattice filters (QRD-LSL), has been proposed. Unfortunately, the complexity reductions attained in these algorithms again rely on the shift structure of the filtering problem. Hence, these fast schemes are not suitable for our problem.
(iv) Split RLS: this algorithm approximates the RLS algorithm by several lower-dimensional RLS problems and is able to obtain a complexity which is linear in the number of parameters [20]. Although this method does not require any specific data structure, it only computes the estimation error, without finding $\mathbf{w}$ directly. Moreover, the authors of [20] do not prove the convergence of the obtained algorithm and indicate that a high level of misadjustment is possible for highly correlated input signals.

The contributions of this paper can be summarized as follows. First, we will derive a general method for adaptively computing $\mathbf{w}$ of (1) without relying on any specific data structure in $\mathbf{u}^{(k)}$. Whereas the split RLS algorithm of [20] only computes the estimation error, $d^{(k)} - \mathbf{w}^T\mathbf{u}^{(k)}$, the proposed method "merges" the SR-RLS algorithm (sometimes also referred to as the inverse QR-RLS algorithm [14]) and the split RLS algorithm to find the tap weight vector $\mathbf{w}$ explicitly. The resulting structure will be referred to as split SR-RLS. As opposed

to [20], we will provide a general proof of convergence. The proof will indicate that the step size of the proposed adaptation process can always be chosen in such a way that convergence in the mean is achieved. In addition, an upper bound for the step size will be derived. The second contribution of this paper is the application of the proposed split SR-RLS method to the PTEQ/PTEC initialization problem. Due to the specific nature of the PTEQ/PTEC input elements, we will illustrate how a lower complexity and lower memory requirement can be achieved compared to full SR-RLS. Although the rate of convergence will be slower than that of full SR-RLS, the presented algorithm will converge much faster than NLMS. We will also briefly indicate the applicability of the proposed split SR-RLS method to other ADSL initialization problems.

The paper is organized as follows. In Section 2, the data model and the notation for standard adaptive algorithms are introduced. Section 3 describes the split SR-RLS algorithm, which is applied to initialize the PTEQ/PTEC in Section 4. Finally, simulation results are presented in Section 5, followed by the conclusions in Section 6.

2. DATA MODEL AND STANDARD ADAPTIVE ALGORITHMS

Notation
Throughout this paper the following notation will be used:
(i) time domain vectors and matrices are indicated by boldface lowercase and uppercase letters, respectively;
(ii) $\{\cdot\}^T$, $\{\cdot\}^H$, and $\{\cdot\}^*$ denote transpose, complex conjugate transpose, and complex conjugate, respectively;
(iii) $\mathbf{w}$ is the unknown, complex-valued tap weight vector with $T$ parameters, while $\mathbf{u}^{(k)}$ is used to indicate a complex-valued input signal vector at time $k$;
(iv) $\mathbf{X}_{uu}$ and $\mathbf{X}_{ku}$ denote autocorrelation and crosscorrelation matrices, respectively (defined in (5) and (13)).

Problem formulation
Given the input data vectors $\mathbf{u}^{(k)}$ at time instant $k$,

$$\mathbf{u}^{(k)} = \begin{bmatrix} u_0^{(k)} & \cdots & u_{T-1}^{(k)} \end{bmatrix}^T, \tag{2}$$

the goal is to find the $T$ unknown weight coefficients

$$\mathbf{w} = \begin{bmatrix} w_0 & \cdots & w_{T-1} \end{bmatrix}^T \tag{3}$$

such that the filter output, $\mathbf{w}^T\mathbf{u}^{(k)}$, is as close as possible to some desired signal $d^{(k)}$ in the mean square sense, compare (1). Here, every variable can be complex-valued and no specific structure on the input data is assumed. In general, $\mathbf{w}$ just forms a linear combination of the input elements and is henceforth referred to as a linear combiner. In the following subsections, we will discuss NLMS and SR-RLS to find the optimal MMSE solution of (1) in an adaptive way.

2.1. Least mean square

The (normalized) LMS algorithm was designed as a stochastic gradient descent method to solve (1) [14]. It approximates the MMSE solution by continuously updating the weight vector $\mathbf{w}$ as new data vectors are received, according to

$$\mathbf{w}^{(k+1)} \longleftarrow \mathbf{w}^{(k)} + \frac{\mu}{\alpha^2 + \mathbf{u}^{(k+1)H}\mathbf{u}^{(k+1)}}\,\mathbf{u}^{(k+1)*}e^{(k)}, \tag{4}$$

where $e^{(k)} = d^{(k+1)} - \mathbf{w}^{(k)T}\mathbf{u}^{(k+1)}$, $\mu$ represents the step size governing the convergence rate, and $\alpha$ prevents overflow for signals with low energy. This algorithm is computationally simple, but a large eigenvalue spread of the input correlation matrix

$$\mathbf{X}_{uu} = E\left\{\mathbf{u}^{(k)*}\mathbf{u}^{(k)T}\right\} \tag{5}$$

often leads to an unacceptably slow convergence rate.

2.2. Square root recursive least squares

To overcome the slow convergence of LMS, (1) can be approximated by a least squares (LS) problem

$$\min_{\mathbf{w}^{(k)}} \left\|\mathbf{d}^{(k)} - \mathbf{U}^{(k)}\mathbf{w}^{(k)}\right\|^2, \tag{6}$$

where $\mathbf{d}^{(k)}$ is a vector of $k+1$ training or desired symbols

$$\mathbf{d}^{(k)} = \begin{bmatrix} d^{(0)} & \cdots & d^{(k)} \end{bmatrix}^T, \tag{7}$$

and $\mathbf{U}^{(k)}$ contains a set of $k+1$ input signal vectors

$$\mathbf{U}^{(k)} = \begin{bmatrix} u_0^{(0)} & \cdots & u_{T-1}^{(0)} \\ \vdots & \ddots & \vdots \\ u_0^{(k)} & \cdots & u_{T-1}^{(k)} \end{bmatrix}. \tag{8}$$

Given that $\mathbf{U}^{(k)H}\mathbf{U}^{(k)}$ is full rank (in practice, $k$ must be at least equal to $T-1$ to satisfy this condition), the LS solution of (6) is given by

$$\mathbf{w}^{(k)} = \left(\mathbf{U}^{(k)H}\mathbf{U}^{(k)}\right)^{-1}\mathbf{U}^{(k)H}\mathbf{d}^{(k)}. \tag{9}$$

With $\mathbf{Q}^{(k)}\mathbf{R}^{(k)}$ the QR-decomposition of $\mathbf{U}^{(k)}$ [21], we can rewrite (9) as

$$\mathbf{w}^{(k)} = \mathbf{R}^{(k)-1}\mathbf{z}^{(k)}, \tag{10}$$

where $\mathbf{z}^{(k)} = \mathbf{Q}^{(k)H}\mathbf{d}^{(k)}$. The SR-RLS algorithm is based on iteratively updating the lower triangular matrix $\mathbf{S}^{(k)} = \mathbf{R}^{(k)-T}$ by means of unitary Givens or Jacobi rotations [14]. The matrix $\mathbf{R}^{(k)}$ is the (upper triangular) Cholesky factor of the sample covariance matrix $\mathbf{U}^{(k)H}\mathbf{U}^{(k)} = \sum_{j=0}^{k}\mathbf{u}^{(j)*}\mathbf{u}^{(j)T}$. Often, an exponential weighting factor $0 < \lambda < 1$ is included to ensure that data in the distant past are forgotten in order to track

statistical variations of the input data in a nonstationary environment. Correspondingly, we can write

$$\mathbf{U}^{(k)H}\mathbf{U}^{(k)} = \mathbf{R}^{(k)H}\mathbf{R}^{(k)} = \sum_{j=0}^{k}\lambda^{2(k-j)}\,\mathbf{u}^{(j)*}\mathbf{u}^{(j)T} \approx \frac{1}{1-\lambda^2}\,\mathbf{X}_{uu}, \tag{11}$$

where $1/(1-\lambda^2)$ in fact represents the memory of the system. The last approximation only holds for large $k$ and $\lambda$ close to unity. As mentioned before, LMS convergence is dictated by the eigenvalue spread of the input correlation matrix $\mathbf{X}_{uu}$. SR-RLS is able to "get rid" of the eigenvalue spread by using an iterative update based on a transformed update direction

$$\mathbf{k}^{(k)} = \mathbf{S}^{(k)T}\mathbf{S}^{(k)*}\mathbf{u}^{(k)*}, \tag{12}$$

which is called the Kalman gain vector. An efficient realization of updating $\mathbf{S}^{(k)}$ and $\mathbf{w}^{(k)}$ is described in Algorithm 1 [22]. Similar to LMS (cf. (5)), the convergence of SR-RLS is determined by the crosscorrelation matrix of $\mathbf{k}^{(k)}$ and $\mathbf{u}^{(k)}$:

$$\mathbf{X}_{ku} = E\left\{\mathbf{k}^{(k)}\mathbf{u}^{(k)T}\right\}. \tag{13}$$

Based on (11), (12), and (13), we observe that all eigenvalues of $\mathbf{X}_{ku}$ are (approximately) equal: since $\mathbf{S}^{(k)T}\mathbf{S}^{(k)*} = (\mathbf{R}^{(k)H}\mathbf{R}^{(k)})^{-1}$, (11) gives $\mathbf{k}^{(k)} \approx (1-\lambda^2)\mathbf{X}_{uu}^{-1}\mathbf{u}^{(k)*}$, so that $\mathbf{X}_{ku} \approx (1-\lambda^2)\mathbf{I}$. Hence, the Kalman gain update direction removes the eigenvalue spread and thereby improves the convergence speed. This improvement in performance, however, is achieved at the expense of a large increase in computational complexity and memory requirement. Whereas the complexity of NLMS is on the order of $O(T)$, the complexity and memory requirement of SR-RLS are $O(T^2)$.

Algorithm 1: The SR-RLS algorithm [22].
Initialize filter coefficients $\mathbf{w}^{(0)}$ and $\mathbf{S}^{(0)}$.
For $k = 0, \ldots, \infty$,
(1) form the matrix-vector product: $\mathbf{a} = -\mathbf{S}^{(k)}\mathbf{u}^{(k+1)}$;
(2) for $m = 0, \ldots, T-1$, determine the Givens rotations [14] $\mathbf{Q}_m$, where each $\mathbf{Q}_m$ zeroes out the $(m+1)$st element of $\mathbf{a}$:

$$\mathbf{Q}_m \longleftarrow \begin{bmatrix} \mathbf{I}_m & & & \\ & \cos\phi_m & & -e^{-j\psi}\sin\phi_m \\ & & \mathbf{I}_{T-m-1} & \\ & e^{j\psi}\sin\phi_m & & \cos\phi_m \end{bmatrix}, \qquad \begin{bmatrix} \mathbf{0}_{T\times 1} \\ \delta \end{bmatrix} \longleftarrow \mathbf{Q}_{T-1}\cdots\mathbf{Q}_0\begin{bmatrix} \mathbf{a} \\ 1 \end{bmatrix};$$

(3) update $\mathbf{S}^{(k)}$ and determine the Kalman gain vector $\mathbf{k}^{(k+1)}$ using the previously obtained $\mathbf{Q}_m$, $m = 0, \ldots, T-1$, and apply exponential weighting with $\lambda$:

$$\begin{bmatrix} \mathbf{S}^{(k+1)} \\ -\delta\cdot\mathbf{k}^{(k+1)T} \end{bmatrix} \longleftarrow \mathbf{Q}_{T-1}\cdots\mathbf{Q}_0\begin{bmatrix} \mathbf{S}^{(k)} \\ \mathbf{0}_{1\times T} \end{bmatrix}, \qquad \mathbf{S}^{(k+1)} \longleftarrow \frac{\mathbf{S}^{(k+1)}}{\lambda};$$

(4) update $\mathbf{w}^{(k)}$: $\mathbf{w}^{(k+1)} \longleftarrow \mathbf{w}^{(k)} + \mathbf{k}^{(k+1)}e^{(k)}$.

3. SPLIT SR-RLS WITH REDUCED COMPLEXITY

To alleviate the computational burden of a full-blown SR-RLS, the input elements of the "linear combiner" application under consideration could be divided into smaller groups, compare the split RLS algorithm in [20]. Unlike [20], our goal is to compute $\mathbf{w}^{(k)}$ instead of only $e^{(k)}$. As we will motivate in the next section, for the PTEQ/PTEC receiver we are mainly interested in dividing the input vector into two (unequal) parts. The ultimate goal is to design a modified SR-RLS scheme that maintains a fast convergence rate but has a lower computational complexity and memory requirement. To achieve this goal, we will merge the split RLS and SR-RLS algorithms into a split SR-RLS algorithm. Assume we split the input vector $\mathbf{u}^{(k)}$ into two parts of length $T_1$ and $T_2$, respectively, such that $T_1 + T_2 = T$ (a reordering of the inputs might be possible), that is,

$$\mathbf{u}^{(k)} = \begin{bmatrix} \mathbf{u}_1^{(k)T} & \mathbf{u}_2^{(k)T} \end{bmatrix}^T, \tag{14}$$

with

$$\mathbf{u}_1^{(k)} = \begin{bmatrix} u_0^{(k)} & \cdots & u_{T_1-1}^{(k)} \end{bmatrix}^T, \qquad \mathbf{u}_2^{(k)} = \begin{bmatrix} u_{T_1}^{(k)} & \cdots & u_{T-1}^{(k)} \end{bmatrix}^T. \tag{15}$$
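To make the partition (14)–(15) concrete, the NumPy sketch below runs one exponentially weighted RLS recursion per input group and stacks the per-group Kalman gains into a single update direction scaled by a step size µ, which is the idea behind the split scheme developed in this section. It is a rough illustration under stated assumptions, not the square-root implementation of Algorithms 1 and 2: each group keeps an explicit inverse correlation matrix updated with the matrix inversion lemma (covariance form) instead of applying Givens rotations to S, which should be equivalent in exact arithmetic but not numerically. With a single group of size T and µ = 1 it reduces to conventional RLS; NLMS as in (4) is included for comparison. The data model (strong correlation within each group, weak correlation between groups, noise-free desired signal), the forgetting factor λ = 0.99, the step sizes, the initial scaling `p0`, and the group sizes T1 = 5, T2 = 3 are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def rls_gain(P, g, lam):
    """One exponentially weighted RLS step (covariance form) for one group.

    g is the conjugated input sub-vector u_i^*; past data are weighted by
    lam**2 per step, as in (11). Returns the Kalman gain and the updated P.
    """
    Pg = P @ g
    k = Pg / (lam**2 + np.vdot(g, Pg).real)        # vdot conjugates g: g^H P g
    P_new = (P - np.outer(k, Pg.conj())) / lam**2  # matrix inversion lemma
    return k, 0.5 * (P_new + P_new.conj().T)       # keep P Hermitian


def split_rls(U, d, sizes, lam=0.99, mu=1.0, p0=1e4):
    """Update w along l = [k_1; k_2; ...], one RLS recursion per group.

    sizes=[T] with mu=1 is conventional (full) RLS.
    """
    T = U.shape[1]
    edges = np.cumsum([0] + list(sizes))
    Ps = [p0 * np.eye(s, dtype=complex) for s in sizes]
    w = np.zeros(T, dtype=complex)
    for u, dk in zip(U, d):
        e = dk - w @ u                              # a priori error d - w^T u
        l = np.empty(T, dtype=complex)
        for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
            l[lo:hi], Ps[i] = rls_gain(Ps[i], u[lo:hi].conj(), lam)
        w = w + mu * l * e
    return w


def nlms(U, d, mu=0.5, alpha=1e-3):
    """Normalized LMS update as in (4)."""
    w = np.zeros(U.shape[1], dtype=complex)
    for u, dk in zip(U, d):
        e = dk - w @ u
        w = w + mu / (alpha**2 + np.vdot(u, u).real) * u.conj() * e
    return w


# Assumed data model: group 1 has a large eigenvalue spread, group 2 is mildly
# correlated with group 1, and d is noise-free.
T1, T2, n = 5, 3, 4000
Q1, _ = np.linalg.qr(crandn(T1, T1))
Q2, _ = np.linalg.qr(crandn(T2, T2))
U1 = (crandn(n, T1) * np.array([1.0, 0.3, 0.1, 0.03, 0.01])) @ Q1.T
U2 = (crandn(n, T2) * np.array([1.0, 0.5, 0.3])) @ Q2.T + 0.1 * U1[:, :T2]
U = np.hstack([U1, U2])
w0 = crandn(T1 + T2)
d = U @ w0                                          # d = w0^T u for every row u


def rel_err(w):
    return np.linalg.norm(w - w0) / np.linalg.norm(w0)


rls_err = rel_err(split_rls(U, d, [T1 + T2]))
split_err = rel_err(split_rls(U, d, [T1, T2], mu=0.5))
nlms_err = rel_err(nlms(U, d))
print(f"full RLS: {rls_err:.1e}  split RLS: {split_err:.1e}  NLMS: {nlms_err:.1e}")
```

With this data model, one would expect full RLS to reach the noise-free solution almost immediately, the split recursion to converge more slowly but still to a small error (the within-group eigenvalue spread is removed and the groups are only weakly coupled), and NLMS to stay far from the solution along the weak eigen-directions of group 1. This ordering matches the claim that split SR-RLS converges more slowly than full SR-RLS but much faster than NLMS; with strongly coupled groups the split recursion would be expected to slow down.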

Now, we design a separate SR-RLS problem for each set of inputs. This requires two lower triangular matrices S(k) 1 and S(k) (of size T × T and T × T , respectively) to be updated, 1 1 2 2 2 see Algorithm 2. The update direction is now determined by l(k+1) , which consists of a concatenation of two Kalman gain vectors, one for each input set. Similar to (12), we can write 

l

(k)

T

(k) S(k) 1 S1 = 0T2 ×T1









(k) 0T1 ×T2  u1  (k) (k)∗ . T (k)∗ (k)∗ = T u u S(k) S 2 2 2

(16)

Notice that a step size µ has been added to ensure convergence. In Appendix A, we show that the convergence of the proposed scheme is determined by the maximum eigenvalue of the crosscorrelation matrix between l(k) and u(k) : 

T



Xlu = E l(k) u(k) .

(17)

Additionally, in Appendix B it is shown that $\mathbf{X}_{lu}$ has the eigenvalue $1-\lambda^2$ with multiplicity $T_1 - T_2$ and $2T_2$ eigenvalues equal to $(1-\lambda^2)(1 \pm d_i)$, with the $d_i$'s equal to the cosines squared of the principal angles between the subspaces $\mathcal{S}_1$ and $\mathcal{S}_2$ spanned by the columns of $\mathbf{U}_1^{(k)}$ and $\mathbf{U}_2^{(k)}$, where $\mathbf{U}_1^{(k)}$ and $\mathbf{U}_2^{(k)}$ are matrices containing the first $T_1$ and the last $T_2$ columns of $\mathbf{U}^{(k)}$, respectively. Hence, the modified update direction partially removes the eigenvalue spread and thereby yields a convergence speed between those of SR-RLS and NLMS. In Appendix B, it is also shown that convergence in the mean is achieved when $\mu$ satisfies $0 < \mu < \cdots$

Algorithm 2: The split SR-RLS algorithm.

(0) Initialize the filter coefficients $\mathbf{w}^{(0)}$ and $\mathbf{S}_1^{(0)}$, $\mathbf{S}_2^{(0)}$. For $k = 0, \ldots, \infty$:

(1) form the matrix-vector products $\mathbf{a}_1 = -\mathbf{S}_1^{(k)}\mathbf{u}_1^{(k+1)}$ and $\mathbf{a}_2 = -\mathbf{S}_2^{(k)}\mathbf{u}_2^{(k+1)}$;

(2) for $m = 0, \ldots, T-1$, determine the Givens rotations [14] $\mathbf{Q}_m$, where $\mathbf{Q}_m$ zeroes out the elements of $\mathbf{a}_1$ and $\mathbf{a}_2$:
$$\begin{bmatrix} \mathbf{0}_{T_1\times 1} \\ \delta_1 \end{bmatrix} \leftarrow \mathbf{Q}_{T_1-1}\cdots\mathbf{Q}_0 \begin{bmatrix} \mathbf{a}_1 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} \mathbf{0}_{T_2\times 1} \\ \delta_2 \end{bmatrix} \leftarrow \mathbf{Q}_{T-1}\cdots\mathbf{Q}_{T_1} \begin{bmatrix} \mathbf{a}_2 \\ 1 \end{bmatrix};$$

(3) update $\mathbf{S}_1^{(k)}$ and $\mathbf{S}_2^{(k)}$ and determine the Kalman gain vectors using the previously obtained $\mathbf{Q}_m$, $m = 0, \ldots, T-1$, then apply exponential weighting with $\lambda$:
$$\begin{bmatrix} \mathbf{S}_1^{(k+1)} \\ -\delta_1\,\mathbf{k}_1^{(k+1)T} \end{bmatrix} \leftarrow \mathbf{Q}_{T_1-1}\cdots\mathbf{Q}_0 \begin{bmatrix} \mathbf{S}_1^{(k)} \\ \mathbf{0}_{1\times T_1} \end{bmatrix}, \qquad \begin{bmatrix} \mathbf{S}_2^{(k+1)} \\ -\delta_2\,\mathbf{k}_2^{(k+1)T} \end{bmatrix} \leftarrow \mathbf{Q}_{T-1}\cdots\mathbf{Q}_{T_1} \begin{bmatrix} \mathbf{S}_2^{(k)} \\ \mathbf{0}_{1\times T_2} \end{bmatrix},$$
$$\mathbf{S}_1^{(k+1)} \leftarrow \frac{\mathbf{S}_1^{(k+1)}}{\lambda}, \qquad \mathbf{S}_2^{(k+1)} \leftarrow \frac{\mathbf{S}_2^{(k+1)}}{\lambda};$$

(4) update $\mathbf{w}^{(k)}$:
$$\mathbf{l}^{(k+1)} = \begin{bmatrix} \mathbf{k}_1^{(k+1)} \\ \mathbf{k}_2^{(k+1)} \end{bmatrix}, \qquad \mathbf{w}^{(k+1)} \leftarrow \mathbf{w}^{(k)} + \mu\,\mathbf{l}^{(k+1)} e^{(k)}.$$

4. SPLIT SR-RLS INITIALIZATION OF THE PTEQ/PTEC RECEIVER

In this section, we apply the split SR-RLS algorithm to the initialization of the PTEQ/PTEC receiver structure. The PTEQ-only receiver [9] is briefly reviewed in the first subsection and is extended with PTEC in the second subsection [13].

4.1. Per-tone equalization

As mentioned in the introduction, the channel impulse response in the ADSL context typically exceeds the cyclic prefix length, thereby destroying subchannel orthogonality. The resulting ISI and ICI can be mitigated by means of a channel-shortening TEQ combined with a bank of one-tap FEQs [3, 4, 5, 6, 7]. An alternative equalization structure is based on PTEQ, which performs the joint TEQ/FEQ task independently for each subcarrier [8, 9] and is able to optimize the overall bit rate. In the following, the ADSL data model is mainly based on [9], and only the main results are repeated here. Mathematically, the received signal vector $\mathbf{y}^{(k)}$ is obtained from the transmitted data through the following operations:

$$\mathbf{y}^{(k)} = \begin{bmatrix} y_{ks+\nu-T_{EQ}+2+1} \\ \vdots \\ y_{(k+1)s+1} \end{bmatrix} = \cdots \qquad (18)$$

between $-A_{\max}$ and $A_{\max}$. As a result, starting with a symbol with peak level $\max_n |x_L[n]|$, the peak level can at best be reduced to $\max_n |x_L[n]| - A_{\max}$. Since this model admits more degrees of freedom than the true reduction signal, it yields a lower bound on the achievable PAR level. Given a peak value for a symbol block, this can be expressed as

3.4.3. 2-Bound

The $A_{\max}$ bound from (17) corresponds to the peak level achieved when all tones are filled in order to reduce the largest peak in $x_L$. A similar bound can be computed after the active set approach has performed its first iteration: the two balanced peaks can be reduced (without any regard for the other samples, which is what makes this a bound) until all tones meet the PSD constraint. This bound, which we refer to as the 2-bound, is simple to simulate because $\alpha_0$ and $\alpha_1$ must be of equal magnitude due to the symmetry of p.

4. SIMULATIONS

A DMT system with symbol length N = 512 is simulated with tones 33–255 used for either data transmission or PAR reduction (the same parameters as for downlink ADSL transmission). Each of the data-carrying tones uses a 1024-point QAM constellation. Before active set processing, the signals are oversampled by a factor L = 4 to limit analog peak-regrowth effects upon digital-to-analog conversion. It has been observed that operating on the non-oversampled (L = 1) digital signal does not provide worthwhile PAR reduction for the analog signal [15]. Oversampling by L = 4 increases the computational cost by a factor of 4; L = 2 could be used instead, at a performance loss that depends on the number of tones, their locations, and the PSD constraints. As described in Section 2.3, the averaged PSD constraint for the reduction tones could be set to about 1 dB above the nominal PSD mask for the given example. We use this figure as a guideline for the instantaneous PSD mask in the following simulations. To illustrate the effect of varying the maximum reduction power per tone, the simulations first use a restrictive constraint set at the nominal PSD mask and then a looser mask whose magnitude is increased by 50% (+3.5 dB). We present the PAR results on a per-symbol basis, using the simulated probability that at least one sample in a symbol block exceeds a given PAR level. This corresponds to taking the maximum value over one symbol in (1), thereby reflecting the probability that a symbol is transmitted with distortion. This clip probability is also commonly used in the literature. A viable alternative would be to evaluate the clip probability of each individual sample, which reflects the percentage of time the transmitted signal is clipped.

[Figures 4–9 plot the clip probability, p(symbol PAR > clip level), from $10^0$ to $10^{-6}$ versus the clip level from 9 to 16 dB, with curves for the initial PAR, the PAR after the 1st to 4th active set iterations, the minimum PAR bound, the 2-bound, and the $A_{\max}$ bound.]

Figure 4: Symbol clip probability for 12 PAR reduction tones, chosen as a contiguous block of the highest tones. Up to four active set iterations are applied, but the algorithm stops once any tone hits the PSD constraint. The three leftmost curves represent optimal solution bounds.

4.1. Block placed tones

4.1.1. Restrictive PSD constraint

Figure 4 shows simulations with the upper block of 12 tones (numbers 244–255) used for PAR reduction and subjected to an instantaneous PSD constraint equal to the nominal PSD level for the data tones. The curves show the reduction performance of the extended active set algorithm, stopping as soon as any PSD constraint is reached. The vertical axis shows the probability that the time-domain symbol block $\bar{x}_L$ would be clipped if subjected to the clip level $\gamma_c$ on the x-axis, that is,

$$\mathrm{Prob}\left\{\mathrm{PAR}\left(\bar{x}_L\right) > \gamma_c\right\}. \qquad (18)$$
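The clip probability in (18) can be estimated by Monte Carlo simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' simulator: it generates unreduced real-baseband DMT symbols with N = 512 and data on tones 33–255 drawn from an unnormalized square 1024-QAM grid, oversamples by L = 4 by zero-padding the spectrum before the inverse FFT, and applies no tone reservation, so it corresponds only to the "initial PAR" curve. The function names are hypothetical. It also checks the +3.5 dB quoted for a 50% magnitude increase.

```python
import cmath
import math
import random


def ifft(X):
    """Unnormalized radix-2 inverse FFT: x[n] = sum_k X[k] exp(+2j*pi*k*n/len(X))."""
    n = len(X)
    if n == 1:
        return [X[0]]
    even, odd = ifft(X[0::2]), ifft(X[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def qam_point(rng, M):
    """Uniformly drawn point of a square M-QAM grid with odd integer coordinates."""
    side = math.isqrt(M)
    return complex(2 * rng.randrange(side) - (side - 1), 2 * rng.randrange(side) - (side - 1))


def dmt_symbol(rng, N=512, L=4, tones=range(33, 256), M=1024):
    """Real-baseband DMT symbol, oversampled by L by zero-padding the spectrum to L*N bins."""
    X = [0j] * (L * N)
    for k in tones:
        X[k] = qam_point(rng, M)
        X[L * N - k] = X[k].conjugate()  # Hermitian symmetry gives a real time signal
    return [v.real for v in ifft(X)]


def par_db(x):
    """Peak-to-average power ratio of one symbol block, in dB."""
    peak = max(v * v for v in x)
    mean = sum(v * v for v in x) / len(x)
    return 10 * math.log10(peak / mean)


def clip_probability(pars_db, clip_level_db):
    """Empirical Prob{PAR > clip level} over a list of per-symbol PAR values."""
    return sum(p > clip_level_db for p in pars_db) / len(pars_db)


# A 50% larger magnitude per reserved tone, expressed in dB (about 3.5 dB)
MAG_STEP_DB = 20 * math.log10(1.5)

if __name__ == "__main__":
    rng = random.Random(0)
    pars = [par_db(dmt_symbol(rng)) for _ in range(100)]
    for gamma in (10.0, 11.0, 12.0, 13.0):
        print(f"Prob(PAR > {gamma} dB) ~ {clip_probability(pars, gamma):.2f}")
```

Reaching the probabilities around $10^{-6}$ shown in the figures needs millions of symbols, which is why a vectorized FFT would be used in practice; the pure-Python version only keeps the example dependency-free.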

Starting at the rightmost line, which corresponds to the clip probability of an unreduced symbol, curves representing iterations one through four are shown. The two leftmost curves show the lower bounds from Section 3.4 ($A_{\max}$ bound and 2-bound), which the simulations cannot cross. The third lowest curve, dashed and ending at a clip probability of $3 \cdot 10^{-4}$, is the PAR achieved by finding the minimum value of (6) with linearized quadratic constraints (a 32-sided polygon, cf. Figure 3) using the same upper block of 12 tones. This curve also serves as a bound for the suboptimal algorithm, but because of its much larger complexity it has not been simulated for the lower clip probabilities. For the low complexity algorithm at the higher clip probabilities, going beyond two iterations gains about 0.15 dB, and a remaining gap of about 0.1 dB separates it from the minimum PAR bound (dashed line). At the lower clip probabilities, the curves converge towards the $A_{\max}$ bound from (17). Here a restrictive PSD constraint and a small number of reduction tones set a limit on the achievable PAR level: the reduction performance is limited by the $A_{\max}$ bound, and not necessarily by the block placement. The low complexity algorithm provides near-optimal performance at a very low cost for this system.

Figure 5: Symbol clip probability for PAR reduction with the 12 highest tones. The PSD constraint allows 50% higher magnitude per tone than in Figure 4. The reduction performance shows only a small gain compared to Figure 4, indicating that this placement cannot take much advantage of the loosened PSD constraint.

Figure 6: Symbol clip probability for PAR reduction with the 24 highest tones with the same PSD constraint used in Figure 4. The simulations indicate only a small reduction gain compared to Figure 4, showing that adding extra tones to the reserved block does not help PAR reduction much.

4.1.2. Loosening the PSD constraint

In Figure 5, the PSD constraint is increased by 50% in magnitude for each tone. Comparing the figures, we see that the lower bound decreases because the maximum reduction signal increases. However, the simulated reduction performance, including the optimal solution, improves by only about 0.3 dB. The block placement simply cannot take advantage of the increased reduction power and is the real limiting factor in this case. The loss of the low complexity algorithm compared to the minimum PAR bound is about 0.2 dB.
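The $A_{\max}$ lower bound used above can be sketched in a few lines. The conventions here are assumptions, since the paper's exact normalization and (17) are not reproduced in this section: $A_{\max}$ is taken as the largest amplitude the reserved tones can jointly produce at one sample, that is, the sum of their per-tone time-domain amplitude limits under a $1/(LN)$-normalized IDFT with Hermitian symmetry, and the PAR is measured against the average power of the unreduced symbol. The function names are hypothetical.

```python
import math


def amax_from_tone_limits(limits, N=512, L=4):
    """Largest |c[n]| of a real reduction signal whose reserved tone k has magnitude <= limits[k].

    Assumes c[n] = (1/(L*N)) * sum_k 2*|C_k|*cos(...), so each tone contributes 2*|C_k|/(L*N)."""
    return 2.0 * sum(limits) / (L * N)


def amax_bound_par_db(x, amax):
    """Lower bound on the PAR (dB) if the reduction signal lies in [-amax, amax] at the peak."""
    peak = max(abs(v) for v in x)
    mean = sum(v * v for v in x) / len(x)  # reference power: the unreduced symbol
    reduced = max(peak - amax, 0.0)        # best case: the peak shrinks by amax
    if reduced == 0.0:
        return float("-inf")
    return 20 * math.log10(reduced / math.sqrt(mean))
```

Under this model, loosening the per-tone magnitude limit by 50% scales $A_{\max}$ by 1.5, and doubling the number of reserved tones at the same limit doubles it, which matches the 100% increase quoted for 24 tones.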

4.1.3. Increasing the number of tones

Figure 6 shows results when the upper block of 24 tones is used for PAR reduction with the same PSD constraint as in Figure 4. The gain from 12 to 24 tones is only about 0.4 dB, which is small considering that the maximum reduction magnitude has been doubled (the $A_{\max}$ bound is significantly lower). After 4 active set iterations, however, we are about 0.2 dB from the minimum PAR bound at higher probabilities, which suggests that further iterations are likely not worth their significant cost.

4.2. Randomly chosen tones

We have seen that even when the constraints (PSD limit or number of tones) are loosened, a poor tone set selection can still be a limiting factor. We now evaluate a more “spread-out” tone set, where the reserved tones are randomly selected in the interval from 33 to 255 inclusive.

4.2.1. Restrictive PSD constraint

Figure 7 shows simulations similar to those of Figure 4, using the restrictive instantaneous PSD constraint equal to the average power mask for the data tones. The iterations quickly converge to within 0.1 dB of the $A_{\max}$ bound, and the performance is only slightly better than for block-placed tones. Here the $A_{\max}$ bound effectively sets the limit on system performance [20, 21].

Figure 7: Symbol clip probability for 12 randomly chosen PAR reduction tones. The three lowest curves show bounds on the achievable performance, as in the previous simulations.

4.2.2. Loosening the PSD constraint

Figure 8 shows the performance when the PSD constraint allows a tone magnitude 50% higher than before. The reduction performance improves thanks to the additional allowed power. At the lower clip probabilities, the gains are close to 1 dB compared to Figure 7, and the active set results are very close to the performance bounds. At higher clip probabilities, the gains are close to 0.5 dB, but the results remain a noticeable distance from the very tight minimum-PAR bound. This is only a minor issue, since in these regions the PAR level after 3 or 4 iterations is already rather low.

Figure 8: Symbol clip probability for PAR reduction with 12 random tones. The PSD constraint allows 50% higher magnitude per tone than in Figure 7.

4.2.3. Increasing the number of tones

Finally, Figure 9 shows simulations using 24 randomly chosen tones with the restrictive PSD constraint. Owing to the superior reduction ability of this placement type, the resulting PAR level is clearly lower than in the previous simulation. The allowed $A_{\max}$ is 100% higher than with half the number of tones, and a larger number of active set iterations may be needed to achieve PAR levels very close to the optimal solution. At lower clip probabilities, however, the 4th active set iteration is not far from the 2-bound.

Figure 9: Symbol clip probability for PAR reduction with 24 random tones with the same PSD constraint used in Figure 7.

5. CONCLUSIONS

Introducing PSD constraints into tone reservation affects the achievable PAR reduction and significantly alters the complexity-versus-performance tradeoff for practical algorithms. The results in this paper show the impact that PSD constraints have on tone reservation performance, and it is clear that the effect when using randomly chosen tone sets is more severe than for contiguous tone sets. A low complexity suboptimal solution has been presented, and the results show that its performance is close to the optimal solution bounds. Since small performance increases incur a major computational cost (greater than that of the low complexity algorithm itself), we assert that our proposed approach gives a very good tradeoff between complexity and PAR reduction. To evaluate whether oversampling by L = 4 is sufficient, the signals were oversampled by an additional factor of 4 after reduction; the observed peak regrowth was less than 0.2 dB. Further studies could also include the effect of the transmitter filter chain on peak regrowth [16, 17, 18].

An important special case arises when a nonuniform PSD constraint is given, that is, when more power is allowed on some reserved tones than on others. In this case, certain tones may reach their PSD constraint much sooner than the rest, while sizeable performance gains beyond this stopping point may still exist. An intelligent approach may be to modify the formation of p by weighting the impulse projection onto the tones according to the nonuniformity of the PSD mask, so that the more restricted tones do not reach their PSD constraint more easily than the others.

Although the real baseband DMT case is the main focus of this paper, the principles can also be applied to the complex baseband case (for wireless OFDM systems), as an active set approach for this case has already been developed in [14, 16]. The problem with tone reservation in wireless systems is that it may not be desirable to sacrifice data tones in a fading channel. However, in a fixed wireless scenario (with a slowly varying channel), channel state feedback could be employed and certain subchannels with low SNRs could be used for tone reservation.

ACKNOWLEDGMENT

This work was supported by Ericsson AB and by the Australian Research Council.

REFERENCES

[1] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[2] T. Starr, J. M. Cioffi, and P. J. Silverman, Understanding Digital Subscriber Line Technology, Prentice Hall, Upper Saddle River, NJ, USA, 1999.
[3] ITU-T, Asymmetric digital subscriber line (ADSL) transceivers, Recommendation G.992.1, June 1999.
[4] A. Gatherer and M. Polley, “Controlling clipping probability in DMT transmission,” in Proc. Asilomar Conference on Signals, Systems, and Computers, vol. 1, pp. 578–584, Pacific Grove, Calif, USA, November 1997.
[5] J. Tellado-Mourelo, Peak to average power reduction for multicarrier modulation, Ph.D. dissertation, Stanford University, Stanford, Calif, USA, September 1999.
[6] M. Friese, “Multitone signals with low crest factor,” IEEE Trans. Communications, vol. 45, no. 10, pp. 1338–1344, 1997.
[7] D. L. Jones, “Peak power reduction in OFDM and DMT via active channel modification,” in Proc. Asilomar Conference on Signals, Systems, and Computers, vol. 2, pp. 1076–1079, Pacific Grove, Calif, USA, October 1999.
[8] D. J. G. Mestdagh and P. M. P. Spruyt, “A method to reduce the probability of clipping in DMT-based transceivers,” IEEE Trans. Communications, vol. 44, no. 10, pp. 1234–1238, 1996.
[9] B. M. Popović, “Synthesis of power efficient multitone signals with flat amplitude spectrum,” IEEE Trans. Communications, vol. 39, no. 7, pp. 1031–1033, 1991.
[10] R. W. Bäuml, R. F. H. Fischer, and J. B. Huber, “Reducing the peak-to-average power ratio of multicarrier modulation by selected mapping,” Electronics Letters, vol. 32, no. 22, pp. 2056–2057, 1996.
[11] P. O. Börjesson, H. G. Feichtinger, N. Grip, et al., “A low-complexity PAR-reduction method for DMT-VDSL,” in Proc. 5th IEEE International Symposium on Digital Signal Processing for Communication Systems, pp. 164–199, Perth, Australia, February 1999.
[12] P. O. Börjesson, H. G. Feichtinger, N. Grip, et al., “DMT PAR-reduction by weighted cancellation waveforms,” in Proc. Radiovetenskaplig Konferens, pp. 303–307, Karlskrona, Sweden, June 1999.
[13] B. S. Krongold and D. L. Jones, “A new method for PAR reduction in baseband DMT systems,” in Proc. Asilomar Conference on Signals, Systems, and Computers, vol. 1, pp. 502–506, Pacific Grove, Calif, USA, November 2001.
[14] B. S. Krongold and D. L. Jones, “A new tone reservation method for complex-baseband PAR reduction in OFDM systems,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 2321–2324, Orlando, Fla, USA, May 2002.
[15] B. S. Krongold, New techniques for multicarrier communication systems, Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, Ill, USA, November 2001.
[16] B. S. Krongold and D. L. Jones, “An active-set approach for OFDM PAR reduction via tone reservation,” IEEE Trans. Signal Processing, vol. 52, no. 2, pp. 495–509, 2004.
[17] W. Henkel and V. Zrno, “PAR reduction revisited: an extension to Tellado’s method,” in Proc. International OFDM Workshop, pp. 31.1–31.6, Hamburg, Germany, September 2001.
[18] J. Tellado and J. M. Cioffi, Further results on peak-to-average ratio reduction, ANSI Document, T1E1.4 no. 98-252, August 1998.
[19] ITU-T, Asymmetric digital subscriber line (ADSL) transceivers 2 (ADSL2), Recommendation G.992.3, July 2002.
[20] N. Petersson, A. Johansson, P. Ödling, and P. O. Börjesson, “A performance bound on PSD-constrained PAR reduction,” in Proc. IEEE International Conference on Communications, pp. 3498–3502, Anchorage, Alaska, USA, May 2003.
[21] N. Petersson, Peak and power reduction in multicarrier systems, Licentiate thesis, Lund University, Lund, Sweden, November 2002.
[22] P. Ödling, N. Petersson, A. Johansson, and P. O. Börjesson, “How much PAR to bring to the party?,” in Proc. Nordic Signal Processing Symposium, Tromsø–Trondheim, Norway, October 2002.
[23] N. Petersson, A. Johansson, P. Ödling, and P. O. Börjesson, “Analysis of tone selection for PAR reduction,” in Proc. International Conference on Information, Communications and Signal Processing, Singapore, October 2001.
[24] D. G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley, Boston, Mass, USA, 1984.
[25] G. H. Golub and C. F. van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 2nd edition, 1989.


Niklas Andgart was born in Hässleholm, Sweden, in 1975. He received his M.S.E.E. degree in 2000 and his Licentiate in Engineering degree in 2002, both from Lund University. During the fall of 1999 he was with the Vehicle and Dynamics Laboratory at the University of California at Berkeley, and in early 2004 he visited the Department of Electrical and Electronic Engineering at the University of Melbourne. He is currently working towards a Ph.D. in signal processing at the Department of Information Technology at Lund University. His research is within signal processing for communication systems, and he works on DSL research in cooperation with Ericsson AB in Stockholm.

Albin Johansson was born in 1968 in Stockholm, Sweden. He received his M.S.E.E. degree in 1993 from the Royal Institute of Technology in Stockholm and is now pursuing his Ph.D. at Lund Institute of Technology. Since 1993 he has held a position at Ericsson AB as Chief of Technology Linecards within broadband access, where he is responsible for the choice of technology in Ericsson’s wireline broadband access products. He has been actively involved in the standardization of ADSL within ETSI, ANSI, ITU-T, and the ADSL Forum. He has been Editor of ITU-T G.997.1 and chair of one of the ADSL Forum's subcommittees. In addition, from 1992 to 1995, he taught undergraduate students at the Royal Institute of Technology. Since 2001, he has been a member of the Signal Processing group at Lund Institute of Technology. He has published 6 conference papers and numerous standardization contributions, and holds 7 patents.

Brian S. Krongold received his B.S., M.S., and Ph.D. degrees in electrical engineering in 1995, 1997, and 2001, respectively, from the University of Illinois at Urbana-Champaign, and worked as a Research Assistant at the Coordinated Science Laboratory from 1995 to 2001. Since December 2001, he has been a Research Fellow in the ARC Special Research Centre for Ultra-Broadband Information Networks in the Department of Electrical and Electronic Engineering at the University of Melbourne, Australia. During the summer of 1994, he interned for Martin Marietta at the Oak Ridge National Laboratory, Oak Ridge, Tennessee. From January to August 1995, he consulted at Bell Laboratories in Middletown, New Jersey. During the summer of 1998, he worked at the Electronics and Telecommunications Research Institute, Taejon, South Korea, under a National Science Foundation summer research grant. He received second prize in the Student Paper Contest at the 2001 Asilomar Conference on Signals, Systems, and Computers. His research interests are in multicarrier communication systems, electro-optical signal processing, and time-frequency analysis and wavelets.

Per Ola Börjesson was born in Karlshamn, Sweden, in 1945. He received his M.S. degree in electrical engineering in 1970 and his Ph.D. degree in telecommunication theory in 1980, both from Lund Institute of Technology (LTH), Lund, Sweden. In 1983, he received the degree of Docent in telecommunication theory. From 1988 to 1998, he was Professor of signal processing at Luleå University of Technology. Since 1998, he has been Professor of signal processing at Lund University. His primary research interest lies in high performance communication systems, in particular, high data rate wireless and twisted pair systems. He is presently researching signal processing techniques in communication systems that use orthogonal frequency division multiplexing (OFDM) or discrete multitone (DMT) modulation. He emphasizes the interaction between models and real systems, from the creation of application-oriented models based on system knowledge to the implementation and evaluation of algorithms.

Per Ödling was born in 1966 in Örnsköldsvik, Sweden. He received his M.S.E.E. degree in 1989, his Licentiate of Engineering in 1993, and his Ph.D. in signal processing in 1995, all from Luleå University of Technology, Sweden. In 2000, he was awarded the Docent degree from Lund Institute of Technology, and in 2003 he was appointed Full Professor there. From 1995, he was an Assistant Professor at Luleå University of Technology, serving as Vice Head of the Division of Signal Processing. In parallel, he consulted for Telia AB and ST-Microelectronics, developing an OFDM-based proposal for UMTS/IMT-2000 and VDSL standardization in ITU, ETSI, and ANSI. Accepting a position as Key Researcher at the Telecommunications Research Center, Vienna, in 1999, he left the arctic north for historic Vienna, where he spent three years advising graduate students and industry. He also consulted for the Austrian Telecommunications Regulatory Authority on the unbundling of the local loop. Since 2003, he has been a Professor at Lund Institute of Technology, stationed at Ericsson AB, Stockholm. He also serves as an Associate Editor for the IEEE Transactions on Vehicular Technology. He has published more than forty journal and conference papers, thirty-five standardization contributions, and a dozen patents.

EURASIP Journal on Applied Signal Processing 2004:10, 1508–1519
© 2004 Hindawi Publishing Corporation

Bandwidth Efficient OFDM Transmitter Diversity Techniques

King F. Lee
Multimedia Architecture Lab, Motorola Labs, Schaumburg, IL 60196, USA

Douglas B. Williams
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250, USA

Received 17 December 2002; Revised 2 September 2003

Space-time block-coded orthogonal frequency division multiplexing (OFDM) transmitter diversity techniques have been shown to be efficient means of achieving near-optimal diversity gain in frequency-selective fading channels. However, these known techniques all require a cyclic prefix to be added to the transmitted symbols, resulting in bandwidth expansion. In this paper, iterative space-time and space-frequency block-coded OFDM transmitter diversity techniques are proposed that exploit spatial diversity to improve spectral efficiency by eliminating the need for a cyclic prefix.

Keywords and phrases: space-time coding, space-frequency coding, transmitter diversity, OFDM, channel estimation, pilot symbols.

1. INTRODUCTION

The last decade has witnessed explosive growth in wireless communications, especially in mobile communications and personal communications services (PCS). With the continuing expansion of both existing and new markets and the introduction of exciting new services such as wireless internet access and multimedia applications, the wireless communications market is expected to keep growing at a rapid pace. Furthermore, the ever-increasing demand for faster and more reliable services to support new applications has created strong interest in developing high data rate wireless communications systems. With existing and emerging wireless applications all competing for a limited radio spectrum, developing spectrally efficient high data rate wireless communications systems is especially important. The main challenge in developing reliable high data rate mobile communications systems is to overcome the detrimental effects of frequency-selective fading in mobile communications channels. A number of space-time coded orthogonal frequency division multiplexing (OFDM) transmitter diversity techniques have recently been proposed for high data rate wireless communications [1, 2, 3, 4]. It has been shown in [3, 4] that space-time and space-frequency block-coded OFDM (STBC-OFDM and SFBC-OFDM) systems are efficient means of achieving near optimum diversity gain in frequency-selective fading channels. These previously proposed OFDM transmitter diversity systems all require a cyclic prefix to be added to the transmitted symbols to avoid intersymbol interference (ISI) and interchannel interference (ICI) in the OFDM symbols, and the number of cyclic prefix symbols has to be equal to or greater than the order of the wireless channels [5]. The cyclic prefix causes bandwidth expansion if a desired data rate is to be maintained, or a reduction in data rate if the transmission bandwidth is fixed. For many high data rate systems, the cyclic prefix can cause more than a 15% bandwidth expansion, which is a very significant loss of a valuable resource [6].

In this paper, we propose iterative space-time and space-frequency block-coded OFDM (ISTBC-OFDM and ISFBC-OFDM) transmitter diversity techniques that do not require a cyclic prefix and are therefore more bandwidth efficient than previously proposed systems. Computer simulations are used extensively to evaluate the performance of the various systems considered in this paper. The COST207 six-ray typical urban (TU) channel power delay profile [7] is used to model the frequency-selective fading channels in all the simulations. Furthermore, for the simulations in Sections 2 and 3, perfect estimates of the channel impulse responses (CIRs) are assumed to be available at the receiver.
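The two ways of paying for the cyclic prefix mentioned above can be made concrete with a short calculation, using as an illustration the block size K = 32 and prefix length 5 from the example in Section 2. This is a minimal sketch; the function name is hypothetical. Keeping the data rate requires (K + G)/K times the bandwidth, while keeping the bandwidth means that a fraction G/(K + G) of the transmitted samples carry no new data.

```python
def cp_overhead(K, G):
    """Cyclic-prefix cost for a block of K data samples with a length-G prefix."""
    bandwidth_expansion = G / K    # extra bandwidth needed to keep the data rate
    rate_reduction = G / (K + G)   # fraction of data rate lost at fixed bandwidth
    return bandwidth_expansion, rate_reduction


if __name__ == "__main__":
    bw, rate = cp_overhead(K=32, G=5)
    print(f"bandwidth expansion {bw:.1%}, rate reduction {rate:.1%}")
```

For these values the bandwidth expansion is 15.6%, while the equivalent data-rate loss at fixed bandwidth is about 13.5%.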

[Figure 1 block diagram. Transmitter: X(u), serial to parallel, X(n), transmitter diversity encoder producing X1(n) and X2(n), IDFT & cyclic prefix on each branch, antennas Tx1 and Tx2. Channels: h1(n) and h2(n). Receiver Rx: r(n), prefix removal & DFT giving Y(n), channel estimator, diversity decoder, parallel to serial.]

Figure 1: Block diagram of a two-branch OFDM transmitter diversity system utilizing a cyclic prefix.

The remainder of the paper is organized as follows. Section 2 provides a brief overview of OFDM transmitter diversity systems utilizing a cyclic prefix. Section 3 gives a detailed description of the proposed bandwidth efficient ISTBC-OFDM and ISFBC-OFDM transmitter diversity systems. Section 4 considers channel estimation techniques for OFDM transmitter diversity systems without a cyclic prefix. Finally, Section 5 summarizes the results and outlines possible future research in this area.

2. OFDM TRANSMITTER DIVERSITY SYSTEMS UTILIZING A CYCLIC PREFIX

[Figure 2 plot: average BER ($10^0$ to $10^{-6}$) versus average received SNR (0 to 40 dB); curves: 4-QAM in a flat Rayleigh fading channel (theoretical), 2-branch STBC-OFDM without a cyclic prefix (simulated), 2-branch STBC-OFDM with a cyclic prefix (simulated).]

Figure 2: Performance of STBC-OFDM without a cyclic prefix in a TU channel with $T_S = 2^{-20}$ s, $K = 32$, $L = 5$, and $f_D = 10$ Hz.

A block diagram of a general two-branch OFDM transmitter diversity system with a cyclic prefix is shown in Figure 1. Let $X(u)$ denote the input serial data symbols with symbol duration $T_S$. The serial-to-parallel converter collects $K$ serial data symbols into a data vector $\mathbf{X}(n) = [X(n,0)\; X(n,1)\; \cdots\; X(n,K-1)]^T$, which has a block duration of $KT_S$. (Throughout the paper, we use the notation that $A(n,k)$ denotes the $k$th element of the vector $\mathbf{A}(n)$.) The transmitter diversity encoder codes $\mathbf{X}(n)$ into two vectors $\mathbf{X}_1(n)$ and $\mathbf{X}_2(n)$ according to an appropriate coding scheme as in [1, 2, 3, 4]. The coded vector $\mathbf{X}_1(n)$ is modulated by an inverse discrete Fourier transform (IDFT) into an OFDM symbol sequence. A length-$G$ cyclic extension is added to the OFDM symbol sequence, and the resulting signal is transmitted from the first transmit antenna. Similarly, $\mathbf{X}_2(n)$ is modulated by an IDFT, cyclically extended, and transmitted from the second transmit antenna. Let $h_1(n)$ denote the CIR between the first transmit antenna and the receiver, and let $h_2(n)$ denote the CIR between the second transmit antenna and the receiver. To avoid ISI and ICI, the length $G$ of the cyclic extension is chosen to be greater than or equal to $L$, the maximum order of the CIRs, that is, $G \geq L$ [5]. At the receiver, the cyclic prefix is first removed from the received signal vector, which is then demodulated by a discrete Fourier transform (DFT) to yield the demodulated signal vector $\mathbf{Y}(n)$. Assuming the CIRs remain constant during the entire block interval, the demodulated signal is given by [3, 4]

$$\mathbf{Y}(n) = \boldsymbol{\Lambda}_1(n)\mathbf{X}_1(n) + \boldsymbol{\Lambda}_2(n)\mathbf{X}_2(n) + \mathbf{Z}(n), \qquad (1)$$

where $\boldsymbol{\Lambda}_1(n)$ and $\boldsymbol{\Lambda}_2(n)$ are diagonal matrices whose elements are the DFTs of the respective CIRs, and $\mathbf{Z}(n)$ is the DFT of the channel noise. The elements of $\mathbf{Z}(n)$ are generally assumed to be additive white Gaussian noise (AWGN) with variance $\sigma_Z^2$. In OFDM systems, the cyclic prefix turns the linear convolution between the transmitted symbols and the frequency-selective CIR into a circular convolution. The IDFT and DFT pair used in OFDM modulation and demodulation then transforms this time-domain circular convolution into a simple multiplication in the frequency domain. The net effect is that OFDM with a cyclic prefix transforms the frequency-selective fading channel into multiple perfectly decoupled flat fading subchannels. The OFDM transmitter diversity systems in [1, 2, 3, 4] all rely on this property in their precoding and decoding processes to achieve good diversity performance. Without the cyclic prefix, the convolution between the transmitted symbols and the frequency-selective CIR reverts to the usual linear convolution, causing ISI and ICI in the OFDM system. As a result, the underlying OFDM subchannels are no longer decoupled flat fading channels, and the diversity performance of STBC-OFDM and SFBC-OFDM transmitter diversity systems is significantly degraded. For example, Figure 2 shows simulated BER performance of an STBC-OFDM transmitter diversity system in a slow fading channel with maximum Doppler frequency $f_D = 10$ Hz, both with and without a cyclic prefix. The example STBC-OFDM system has block size $K = 32$ and channel order $L = 5$, requiring a cyclic prefix of length 5 with a resulting bandwidth expansion of $L/K = 15.6\%$. Figure 2 clearly shows the loss of diversity gain for STBC-OFDM without a cyclic prefix. Although not shown here, SFBC-OFDM transmitter diversity systems without a cyclic prefix exhibit similar degradation.

3. BANDWIDTH EFFICIENT OFDM TRANSMITTER DIVERSITY SYSTEMS

As described in Section 2 and demonstrated in the example of Figure 2, the performance of STBC-OFDM and SFBC-OFDM transmitter diversity systems is significantly degraded without the cyclic prefix. Therefore, to eliminate the cyclic prefix requirement for STBC-OFDM and SFBC-OFDM systems, some form of ISI and ICI equalization is needed. A number of equalization techniques have been proposed to reduce the effects of ISI and ICI for OFDM systems without a cyclic prefix or when the cyclic prefix is shorter than the channel memory [8, 9, 10, 11, 12]. Unfortunately, these equalization techniques are highly channel specific; that is, the equalizer coefficients are strong functions of the channel response. With transmitter diversity, as shown in Figure 1, the received signal is the superposition of signals transmitted simultaneously from multiple transmitters, and the channel responses between each transmitter and the receiver are generally different. In general, no single equalizer can simultaneously equalize the channel responses from all the transmitters. Therefore, any equalization technique that is specific to one channel response will not be effective for transmitter diversity systems. However, a compensation technique that is only “partially” dependent on the channel responses will be shown here to be very effective for STBC-OFDM and SFBC-OFDM transmitter diversity systems without a cyclic prefix. The proposed technique, described in detail in the following sections, provides an effective and efficient means of eliminating the need for a cyclic prefix in STBC-OFDM and SFBC-OFDM transmitter diversity systems, thus eliminating the bandwidth expansion while still achieving very good diversity performance.
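The degradation described above can be reproduced on a single block. The sketch below is a minimal illustration in plain Python, not the simulation behind Figure 2: the block size K = 16, the three-tap CIR, the QPSK symbols, and the helper names `dft`, `idft`, `fir`, and `max_subcarrier_error` are illustrative assumptions. It passes one OFDM block, preceded by a random previous block, through the CIR with and without a length-L cyclic prefix, and reports the largest deviation of the DFT output from the per-subcarrier product $\Lambda(k)X(k)$ of (1).

```python
import cmath
import random

def dft(x):
    """Unnormalized K-point DFT: X[k] = sum_t x[t] exp(-j 2 pi k t / K)."""
    K = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / K) for t in range(K))
            for k in range(K)]

def idft(X):
    """Inverse DFT with 1/K scaling, so idft(dft(x)) recovers x."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / K) for k in range(K)) / K
            for t in range(K)]

def fir(stream, h):
    """Linear convolution of a sample stream with the CIR h (truncated to len(stream))."""
    return [sum(h[l] * stream[t - l] for l in range(len(h)) if t >= l)
            for t in range(len(stream))]

def max_subcarrier_error(use_cp, K=16, h=(1.0, 0.5 - 0.3j, 0.2j), seed=1):
    """Largest |Y(k) - Lambda(k) X(k)| over the subcarriers of the current block."""
    rng = random.Random(seed)
    qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    L = len(h) - 1
    prev = idft([rng.choice(qpsk) for _ in range(K)])   # previous block, time domain
    X = [rng.choice(qpsk) for _ in range(K)]            # current block, frequency domain
    x = idft(X)
    prefix = x[-L:] if use_cp else []                   # cyclic prefix of length L
    rx = fir(prev + prefix + x, h)
    Y = dft(rx[-K:])                                    # discard prefix, demodulate
    Lam = dft(list(h) + [0] * (K - len(h)))             # Lambda(k): DFT of zero-padded CIR
    return max(abs(Y[k] - Lam[k] * X[k]) for k in range(K))

print(max_subcarrier_error(use_cp=True))    # round-off only: decoupled flat subchannels
print(max_subcarrier_error(use_cp=False))   # clearly nonzero: ISI/ICI from the previous block
```

With the prefix, the deviation stays at round-off level because the DFT diagonalizes the resulting circular convolution; without it, the previous block's tail enters the first L samples and couples the subcarriers.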
The proposed technique extends the tail cancellation and cyclic reconstruction ideas of [13] and the iterative technique of [14] to STBC-OFDM and SFBC-OFDM transmitter diversity systems. Therefore, the proposed techniques will be referred to as ISTBC-OFDM and ISFBC-OFDM transmitter diversity. The ISTBC-OFDM and ISFBC-OFDM techniques rely on two key properties of the IDFT and DFT.

(1) The IDFT and DFT pair diagonalizes any circulant matrix. This property is equivalent to the more familiar property of the DFT that circular convolution in the time domain equates to simple multiplication in the frequency domain. It is the key to transforming a frequency-selective fading channel into multiple completely decoupled flat fading subchannels.
(2) The IDFT and DFT are linear transforms, so superposition holds when they are applied to the received signal in a transmitter diversity system, which is a sum of signals from multiple transmitters. Linearity allows the transforms to operate on the received signal components without any undesirable cross-terms.

The proposed technique is applicable to both STBC-OFDM and SFBC-OFDM transmitter diversity systems. Since the ISFBC-OFDM algorithm is simpler, it will be described first in Section 3.1, followed by the ISTBC-OFDM algorithm in Section 3.2.

3.1. ISFBC-OFDM systems

A block diagram of the ISFBC-OFDM system is shown in Figure 3. Let $\mathbf{X}(n)$ denote the $K \times 1$ vector at the output of the serial-to-parallel converter at block instant $n$. The space-frequency encoder codes $\mathbf{X}(n)$ into vectors $\mathbf{X}_1(n)$ and $\mathbf{X}_2(n)$ according to the coding scheme for SFBC-OFDM [4]. The SFBC vectors $\mathbf{X}_1(n)$ and $\mathbf{X}_2(n)$ are modulated by the IDFT into time-domain OFDM signals $\mathbf{x}_1(n)$ and $\mathbf{x}_2(n)$ and then transmitted through channels with CIRs $\mathbf{h}_1(n)$ and $\mathbf{h}_2(n)$. Note that no cyclic prefix is added to either $\mathbf{x}_1(n)$ or $\mathbf{x}_2(n)$, so there is no bandwidth expansion or rate reduction. For simplicity of presentation, the additive channel noise is omitted in the following derivation. In the absence of noise, the received signal vector is given by

$$\mathbf{r}(n) = \mathbf{x}_1(n) * \mathbf{h}_1(n) + \mathbf{x}_2(n) * \mathbf{h}_2(n), \tag{2}$$

where $*$ denotes linear convolution. Equivalently, the received signal vector can be expressed in terms of convolution matrices of the CIRs and the transmitted signal vectors as follows:

$$\mathbf{r}(n) = \mathbf{H}_{1,0}\mathbf{x}_1(n) + \mathbf{H}_{1,1}\mathbf{x}_1(n-1) + \mathbf{H}_{2,0}\mathbf{x}_2(n) + \mathbf{H}_{2,1}\mathbf{x}_2(n-1), \tag{3}$$

where the first index in the subscript denotes the spatial dimension and the second index denotes the temporal dimension. The convolution matrices $\mathbf{H}_{m,0}$ and $\mathbf{H}_{m,1}$ are defined in terms of the CIRs $\mathbf{h}_m(n)$ as follows:

$$\mathbf{H}_{m,0} = \begin{bmatrix}
h_{m,0} & 0 & \cdots & \cdots & \cdots & 0\\
h_{m,1} & h_{m,0} & \ddots & & & \vdots\\
\vdots & h_{m,1} & \ddots & \ddots & & \vdots\\
h_{m,L} & \vdots & \ddots & \ddots & \ddots & \vdots\\
0 & \ddots & \ddots & \ddots & \ddots & 0\\
0 & \cdots & h_{m,L} & \cdots & h_{m,1} & h_{m,0}
\end{bmatrix}, \qquad
\mathbf{H}_{m,1} = \begin{bmatrix}
0 & \cdots & 0 & h_{m,L} & \cdots & h_{m,2} & h_{m,1}\\
0 & \cdots & 0 & 0 & h_{m,L} & \cdots & h_{m,2}\\
\vdots & & & & \ddots & \ddots & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & 0 & h_{m,L}\\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0\\
\vdots & & & & & & \vdots\\
0 & \cdots & \cdots & \cdots & \cdots & \cdots & 0
\end{bmatrix}, \tag{4}$$

respectively, where $m = 1$ and $2$, and the implicit dependence of the time-varying CIRs on the block instant $n$ has been omitted for brevity. Element-wise, $[\mathbf{H}_{m,0}]_{k,q} = h_{m,k-q}$ for $0 \le k-q \le L$ and $[\mathbf{H}_{m,1}]_{k,q} = h_{m,K+k-q}$ for $1 \le K+k-q \le L$, with all other entries zero. The $\mathbf{H}_{1,1}\mathbf{x}_1(n-1)$ and $\mathbf{H}_{2,1}\mathbf{x}_2(n-1)$ terms in (3) represent contributions from the previous block that can be eliminated using the previous decision $\hat{\mathbf{X}}(n-1)$ and the estimated channel responses $\hat{\mathbf{h}}_m(n)$ from the channel estimator. Notice that $\mathbf{x}_1(n-1)$ and $\mathbf{x}_2(n-1)$ are simply the IDFTs of the SFBC-encoded $\mathbf{X}(n-1)$, so they can be estimated from $\hat{\mathbf{X}}(n-1)$. Elimination of the contributions from $\mathbf{x}_1(n-1)$ and $\mathbf{x}_2(n-1)$ is referred to as tail cancellation [13] and can be achieved by

$$\tilde{\mathbf{r}}(n) = \mathbf{r}(n) - \hat{\mathbf{H}}_{1,1}\hat{\mathbf{x}}_1(n-1) - \hat{\mathbf{H}}_{2,1}\hat{\mathbf{x}}_2(n-1) \approx \mathbf{H}_{1,0}\mathbf{x}_1(n) + \mathbf{H}_{2,0}\mathbf{x}_2(n). \tag{5}$$

Figure 3: Block diagram of the ISFBC-OFDM transmitter diversity system. (Transmitter: serial-to-parallel conversion, space-frequency encoding into $\mathbf{X}_1(n)$ and $\mathbf{X}_2(n)$, and an IDFT for each of Tx1 and Tx2; receiver: tail cancellation and cyclic reconstruction of $\mathbf{r}(n)$ using the channel estimator and the previous decision $\hat{\mathbf{X}}(n-1)$, followed by a DFT, the space-frequency decoder, and parallel-to-serial conversion.)

On the other hand, the desired received signal for SFBC-OFDM transmitter diversity, which has the correct circular convolution (or cyclic) property, has the form

$$\mathbf{y}(n) = \mathbf{H}_{1,0}\mathbf{x}_1(n) + \mathbf{H}_{1,1}\mathbf{x}_1(n) + \mathbf{H}_{2,0}\mathbf{x}_2(n) + \mathbf{H}_{2,1}\mathbf{x}_2(n). \tag{6}$$

Notice that $(\mathbf{H}_{1,0} + \mathbf{H}_{1,1})$ is a circulant matrix corresponding to $\mathbf{h}_1(n)$, $(\mathbf{H}_{2,0} + \mathbf{H}_{2,1})$ is a circulant matrix for $\mathbf{h}_2(n)$, and (6) is simply the sum of circular convolutions. The equivalent equation in the frequency domain is

$$\mathbf{Y}(n) = \boldsymbol{\Lambda}_1(n)\mathbf{X}_1(n) + \boldsymbol{\Lambda}_2(n)\mathbf{X}_2(n), \tag{7}$$

where $\boldsymbol{\Lambda}_1(n)$ and $\boldsymbol{\Lambda}_2(n)$ are diagonal matrices whose elements are the DFTs of the respective CIRs $\mathbf{h}_1(n)$ and $\mathbf{h}_2(n)$. The time-domain equation (6), or equivalently the frequency-domain equation (7), is the desired ISI- and ICI-free flat fading subchannel system we are attempting to achieve. Hence, the goal is to add an estimate of $\mathbf{H}_{1,1}\mathbf{x}_1(n) + \mathbf{H}_{2,1}\mathbf{x}_2(n)$ to $\tilde{\mathbf{r}}(n)$ to approximate the desired signal $\mathbf{y}(n)$. Adding this estimate amounts to restoring the cyclic property of the SFBC-OFDM system and is referred to as cyclic reconstruction [13]. Since $\mathbf{x}_1(n)$ and $\mathbf{x}_2(n)$ are functions of the yet-to-be-determined symbol vector $\mathbf{X}(n)$, they are not readily available for the cyclic reconstruction. The iterative approach in [14] is therefore adapted here for the cyclic reconstruction process. A flow diagram of the ISFBC-OFDM algorithm is shown in Figure 4, and an outline of the algorithm is as follows.

(1) Space-frequency code the previous decision $\hat{\mathbf{X}}(n-1)$ into $\hat{\mathbf{X}}_1(n-1)$ and $\hat{\mathbf{X}}_2(n-1)$ and modulate with an IDFT to form $\hat{\mathbf{x}}_1(n-1)$ and $\hat{\mathbf{x}}_2(n-1)$. Tail cancellation is then performed on the received signal vector $\mathbf{r}(n)$ to form $\tilde{\mathbf{r}}(n)$ as in (5). Initialize the iteration number $i$ to zero.
(2) Demodulate $\tilde{\mathbf{r}}(n)$ with a DFT and decode with the space-frequency decoder and decision device to form the estimate $\hat{\mathbf{X}}^{(0)}(n)$.²
(3) Space-frequency code $\hat{\mathbf{X}}^{(i)}(n)$ into $\hat{\mathbf{X}}_1^{(i)}(n)$ and $\hat{\mathbf{X}}_2^{(i)}(n)$ and modulate with an IDFT to form $\hat{\mathbf{x}}_1^{(i)}(n)$ and $\hat{\mathbf{x}}_2^{(i)}(n)$.
(4) Form the cyclic reconstructed signal as
$$\mathbf{y}^{(i)}(n) = \tilde{\mathbf{r}}(n) + \hat{\mathbf{H}}_{1,1}\hat{\mathbf{x}}_1^{(i)}(n) + \hat{\mathbf{H}}_{2,1}\hat{\mathbf{x}}_2^{(i)}(n) \tag{8}$$
and increment the iteration number to $i = i + 1$.
(5) An updated decision on $\mathbf{X}(n)$ can then be obtained from $\mathbf{y}^{(i)}(n)$ with a DFT, space-frequency decoding, and the decision device, yielding the updated decision $\hat{\mathbf{X}}^{(i)}(n)$.
(6) Repeat steps 3–5 a predetermined number of times to obtain the final decision $\hat{\mathbf{X}}(n)$.

² The parenthesized superscript denotes the iteration number; for example, $\hat{\mathbf{X}}^{(1)}(n)$ is the estimate of $\mathbf{X}(n)$ after the first iteration.

Figure 4: Flow diagram of the ISFBC-OFDM transmitter diversity algorithm.

Simulation results for a two-branch ISFBC-OFDM transmitter diversity system at various iterations ($i$ = 0, 1, and 2) are shown in Figure 5. For the simulations in Sections 3.1 and 3.2, perfect estimates of the CIRs are assumed to be available at the receiver. The results in Figure 5 show that the ISFBC-OFDM transmitter diversity system performs significantly better than SFBC-OFDM without a cyclic prefix. For this example, the performance of the ISFBC-OFDM system approaches that of SFBC-OFDM with a cyclic prefix within just one to two iterations.

Figure 5: Performance of ISFBC-OFDM transmitter diversity in a TU channel with $T_S = 2^{-21}$ second, K = 256, L = 10, and $f_D$ = 20 Hz. Iteration number is indicated for $i$ = 0, 1, and 2. (Average BER versus average received SNR (dB); curves: 4-QAM in flat Rayleigh fading channel (theoretical), 2-branch SFBC-OFDM without a cyclic prefix (simulated), 2-branch SFBC-OFDM with a cyclic prefix (simulated), and 2-branch ISFBC-OFDM transmitter diversity (simulated).)
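Steps (1)–(6) can be sketched in plain Python for a single noiseless block. This is a minimal illustration under explicit assumptions, not the authors' implementation: the SFBC is taken to be an Alamouti-type code over adjacent subcarrier pairs (a stand-in for the coding scheme of [4]), the modulation is QPSK with hard sign decisions, the CIRs `h1` and `h2` are hypothetical three-tap (L = 2) channels known exactly at the receiver, and the previous block's decision is assumed correct. The helpers `H0` and `H1` apply $\mathbf{H}_{m,0}$ and $\mathbf{H}_{m,1}$ of (4) directly instead of forming K × K matrices.

```python
import cmath
import random

def dft(x):
    K = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / K) for t in range(K)) for k in range(K)]

def idft(X):
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / K) for k in range(K)) / K for t in range(K)]

def H0(h, x):
    """H_{m,0} x: the part of the linear convolution that stays inside the block."""
    return [sum(h[l] * x[t - l] for l in range(len(h)) if l <= t) for t in range(len(x))]

def H1(h, x):
    """H_{m,1} x: the tail of block x that spills into the first L samples of the next block."""
    K = len(x)
    return [sum(h[l] * x[K + t - l] for l in range(len(h)) if l > t) for t in range(K)]

def sfbc_encode(X):
    """Alamouti-type code over subcarrier pairs (2p, 2p+1); assumed stand-in for the SFBC of [4]."""
    X1, X2 = [], []
    for p in range(0, len(X), 2):
        a, b = X[p], X[p + 1]
        X1 += [a, -b.conjugate()]
        X2 += [b, a.conjugate()]
    return X1, X2

def sfbc_decode(Y, Lam1, Lam2):
    """Linear combining per subcarrier pair, then hard QPSK decisions (signs of Re and Im)."""
    out = []
    for p in range(0, len(Y), 2):
        y0, y1c = Y[p], Y[p + 1].conjugate()
        for e in (Lam1[p].conjugate() * y0 + Lam2[p + 1] * y1c,
                  Lam2[p].conjugate() * y0 - Lam1[p + 1] * y1c):
            out.append(complex(1 if e.real >= 0 else -1, 1 if e.imag >= 0 else -1))
    return out

def isfbc_receive(r, X_prev, h1, h2, iterations=2):
    """Steps (1)-(6): returns the decisions X^(0), ..., X^(iterations) for one block."""
    K = len(r)
    Lam1 = dft(h1 + [0] * (K - len(h1)))
    Lam2 = dft(h2 + [0] * (K - len(h2)))
    # (1) tail cancellation, eq. (5), using the previous decision X_prev
    xp1, xp2 = (idft(v) for v in sfbc_encode(X_prev))
    r_t = [a - b - c for a, b, c in zip(r, H1(h1, xp1), H1(h2, xp2))]
    # (2) initial decision from the tail-cancelled block
    decisions = [sfbc_decode(dft(r_t), Lam1, Lam2)]
    for _ in range(iterations):
        # (3) re-encode and remodulate the latest decision
        x1, x2 = (idft(v) for v in sfbc_encode(decisions[-1]))
        # (4) cyclic reconstruction, eq. (8)
        y = [a + b + c for a, b, c in zip(r_t, H1(h1, x1), H1(h2, x2))]
        # (5) updated decision
        decisions.append(sfbc_decode(dft(y), Lam1, Lam2))
    return decisions

# Noiseless example with hypothetical CIRs (L = 2) and perfect channel knowledge.
rng = random.Random(7)
K = 64
qpsk = [complex(a, b) for a in (1, -1) for b in (1, -1)]
h1, h2 = [1.0, 0.3, 0.1], [0.8j, -0.25, 0.05]
X_prev = [rng.choice(qpsk) for _ in range(K)]
X_cur = [rng.choice(qpsk) for _ in range(K)]
xp1, xp2 = (idft(v) for v in sfbc_encode(X_prev))
xc1, xc2 = (idft(v) for v in sfbc_encode(X_cur))
r = [a + b + c + d for a, b, c, d in
     zip(H0(h1, xc1), H1(h1, xp1), H0(h2, xc2), H1(h2, xp2))]   # eq. (3), no cyclic prefix
decisions = isfbc_receive(r, X_prev, h1, h2)
for i, X_hat in enumerate(decisions):
    print(i, sum(a != b for a, b in zip(X_hat, X_cur)), "symbol errors")
```

Because the receiver starts from the tail-cancelled block $\tilde{\mathbf{r}}(n)$, the error remaining in $\mathbf{y}^{(i)}(n)$ is $\sum_m \mathbf{H}_{m,1}(\mathbf{x}_m(n) - \hat{\mathbf{x}}_m^{(i)}(n))$; once the decision is correct, $\mathbf{y}^{(i)}(n)$ equals the cyclic signal of (6) exactly in this noiseless setting.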

3.2. ISTBC-OFDM systems

The ISTBC-OFDM transmitter diversity system will be described next. A block diagram of the ISTBC-OFDM system is shown in Figure 6. For the two-branch STBC-OFDM system, the diversity encoding and decoding are performed on two consecutive data blocks over two block instants [3]. The space-time encoder codes $\mathbf{X}(n)$ and $\mathbf{X}(n+1)$ into two vector pairs $[\mathbf{X}_1(n), \mathbf{X}_1(n+1)]$ and $[\mathbf{X}_2(n), \mathbf{X}_2(n+1)]$ using the coding scheme for STBC-OFDM, where $n$ is incremented by two for every two block instants. The STBC vectors $\mathbf{X}_1(n)$, $\mathbf{X}_1(n+1)$, $\mathbf{X}_2(n)$, and $\mathbf{X}_2(n+1)$ are first modulated by an IDFT into time-domain OFDM signals $\mathbf{x}_1(n)$, $\mathbf{x}_1(n+1)$, $\mathbf{x}_2(n)$, and $\mathbf{x}_2(n+1)$ and then transmitted through channels with CIRs $\mathbf{h}_1(n)$ and $\mathbf{h}_2(n)$. In the absence of noise, the received signals during the two corresponding block instants are given by

$$\begin{aligned}
\mathbf{r}(n) &= \mathbf{H}_{1,0}\mathbf{x}_1(n) + \mathbf{H}_{1,1}\mathbf{x}_1(n-1) + \mathbf{H}_{2,0}\mathbf{x}_2(n) + \mathbf{H}_{2,1}\mathbf{x}_2(n-1),\\
\mathbf{r}(n+1) &= \mathbf{H}_{1,0}\mathbf{x}_1(n+1) + \mathbf{H}_{1,1}\mathbf{x}_1(n) + \mathbf{H}_{2,0}\mathbf{x}_2(n+1) + \mathbf{H}_{2,1}\mathbf{x}_2(n).
\end{aligned} \tag{9}$$

Here, the desired signals with the correct cyclic property are

$$\begin{aligned}
\mathbf{y}(n) &= \mathbf{H}_{1,0}\mathbf{x}_1(n) + \mathbf{H}_{1,1}\mathbf{x}_1(n) + \mathbf{H}_{2,0}\mathbf{x}_2(n) + \mathbf{H}_{2,1}\mathbf{x}_2(n),\\
\mathbf{y}(n+1) &= \mathbf{H}_{1,0}\mathbf{x}_1(n+1) + \mathbf{H}_{1,1}\mathbf{x}_1(n+1) + \mathbf{H}_{2,0}\mathbf{x}_2(n+1) + \mathbf{H}_{2,1}\mathbf{x}_2(n+1).
\end{aligned} \tag{10}$$
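Subtracting (9) from (10) term by term shows that each desired block differs from the corresponding received block only through the $\mathbf{H}_{m,1}$ terms; for block $n+1$, $\mathbf{y}(n+1) = \mathbf{r}(n+1) - \sum_m \mathbf{H}_{m,1}\mathbf{x}_m(n) + \sum_m \mathbf{H}_{m,1}\mathbf{x}_m(n+1)$, which is why both $\mathbf{x}_m(n)$ and $\mathbf{x}_m(n+1)$ are needed. The plain-Python check below confirms this identity, together with the circulant structure of $\mathbf{H}_{m,0} + \mathbf{H}_{m,1}$, assuming every block and CIR is known exactly; random complex vectors stand in for real OFDM signals, and the helper names are hypothetical.

```python
import random

def H0(h, x):
    """H_{m,0} x: in-block part of the linear convolution (lower-triangular Toeplitz)."""
    return [sum(h[l] * x[t - l] for l in range(len(h)) if l <= t) for t in range(len(x))]

def H1(h, x):
    """H_{m,1} x: tail of block x that lands in the first L samples of the next block."""
    K = len(x)
    return [sum(h[l] * x[K + t - l] for l in range(len(h)) if l > t) for t in range(K)]

def circ(h, x):
    """Circular convolution of x with h, computed directly (no H0/H1)."""
    K = len(x)
    return [sum(h[l] * x[(t - l) % K] for l in range(len(h))) for t in range(K)]

def vsum(*vecs):
    return [sum(v) for v in zip(*vecs)]

def vneg(v):
    return [-a for a in v]

rng = random.Random(3)
K, L = 16, 3

def cvec(n):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

h1, h2 = cvec(L + 1), cvec(L + 1)           # hypothetical CIRs
x1 = {d: cvec(K) for d in (-1, 0, 1)}       # x_1(n-1), x_1(n), x_1(n+1)
x2 = {d: cvec(K) for d in (-1, 0, 1)}       # x_2(n-1), x_2(n), x_2(n+1)

# Received blocks without a cyclic prefix, eq. (9)
r_n = vsum(H0(h1, x1[0]), H1(h1, x1[-1]), H0(h2, x2[0]), H1(h2, x2[-1]))
r_n1 = vsum(H0(h1, x1[1]), H1(h1, x1[0]), H0(h2, x2[1]), H1(h2, x2[0]))

# Desired cyclic blocks, eq. (10), computed directly as circular convolutions
y_n = vsum(circ(h1, x1[0]), circ(h2, x2[0]))
y_n1 = vsum(circ(h1, x1[1]), circ(h2, x2[1]))

# Block n: remove the tail of block n-1, add the wrapped tail of block n
y_n_rec = vsum(r_n, vneg(H1(h1, x1[-1])), vneg(H1(h2, x2[-1])), H1(h1, x1[0]), H1(h2, x2[0]))
# Block n+1: remove the tail of block n, add the wrapped tail of block n+1
y_n1_rec = vsum(r_n1, vneg(H1(h1, x1[0])), vneg(H1(h2, x2[0])), H1(h1, x1[1]), H1(h2, x2[1]))

err = max(abs(a - b) for a, b in zip(y_n + y_n1, y_n_rec + y_n1_rec))
print(err)   # round-off level when all blocks and CIRs are known exactly
```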

Figure 6: Block diagram of the ISTBC-OFDM transmitter diversity system. (As in Figure 3, but with a space-time encoder and decoder operating on the block pair $\mathbf{X}(n)$, $\mathbf{X}(n+1)$; the tail cancellation and cyclic reconstruction block uses the previous decisions $\hat{\mathbf{X}}(n-2)$ and $\hat{\mathbf{X}}(n-1)$ and the received blocks $\mathbf{r}(n)$ and $\mathbf{r}(n+1)$.)

Tail cancellation can be performed on $\mathbf{r}(n)$ with the previous decisions $\hat{\mathbf{X}}(n-2)$ and $\hat{\mathbf{X}}(n-1)$ as follows:

$$\tilde{\mathbf{r}}(n) = \mathbf{r}(n) - \hat{\mathbf{H}}_{1,1}\hat{\mathbf{x}}_1(n-1) - \hat{\mathbf{H}}_{2,1}\hat{\mathbf{x}}_2(n-1), \tag{11}$$

where $\hat{\mathbf{x}}_1(n-1)$ and $\hat{\mathbf{x}}_2(n-1)$ are the IDFTs of the STBC-encoded $\hat{\mathbf{X}}(n-2)$ and $\hat{\mathbf{X}}(n-1)$. Cyclic reconstruction of $\mathbf{y}^{(i)}(n)$ can be done as in the ISFBC-OFDM algorithm, except that space-time block coding is used instead. Tail cancellation for $\mathbf{r}(n+1)$, however, requires $\hat{\mathbf{X}}(n)$ and $\hat{\mathbf{X}}(n+1)$, which are still to be determined. Therefore, the ISTBC-OFDM algorithm requires some modifications to the ISFBC-OFDM algorithm. Recall that in the ISFBC-OFDM algorithm, tail cancellation is performed once at the beginning and only the cyclic reconstruction is updated iteratively. For ISTBC-OFDM, both the tail cancellation and the cyclic reconstruction for $\mathbf{y}^{(i)}(n+1)$ have to be updated iteratively. A flow diagram of the ISTBC-OFDM algorithm is shown in Figure 7, and an outline of the algorithm is as follows.

(1) Space-time code the previous decisions $\hat{\mathbf{X}}(n-1)$ and $\hat{\mathbf{X}}(n-2)$ and modulate with an IDFT to form $\hat{\mathbf{x}}_1(n-1)$ and $\hat{\mathbf{x}}_2(n-1)$. Tail cancellation is then performed on the received signal vector $\mathbf{r}(n)$ to form $\tilde{\mathbf{r}}(n)$ as in (11). Initialize the iteration number $i$ to zero.
(2) Demodulate $\tilde{\mathbf{r}}(n)$ and $\mathbf{r}(n+1)$ with a DFT and decode using the space-time decoder and decision device to form the estimates $\hat{\mathbf{X}}^{(0)}(n)$ and $\hat{\mathbf{X}}^{(0)}(n+1)$.
(3) Space-time code $\hat{\mathbf{X}}^{(i)}(n)$ and $\hat{\mathbf{X}}^{(i)}(n+1)$ and modulate with an IDFT to form $\hat{\mathbf{x}}_1^{(i)}(n)$, $\hat{\mathbf{x}}_1^{(i)}(n+1)$, $\hat{\mathbf{x}}_2^{(i)}(n)$, and $\hat{\mathbf{x}}_2^{(i)}(n+1)$.
(4) Form the cyclic reconstructed signal $\mathbf{y}^{(i)}(n)$ as in (8).
(5) Perform tail cancellation and cyclic reconstruction on $\mathbf{r}(n+1)$ as follows:
$$\mathbf{y}^{(i)}(n+1) = \mathbf{r}(n+1) - \hat{\mathbf{H}}_{1,1}\hat{\mathbf{x}}_1^{(i)}(n) - \hat{\mathbf{H}}_{2,1}\hat{\mathbf{x}}_2^{(i)}(n) + \hat{\mathbf{H}}_{1,1}\hat{\mathbf{x}}_1^{(i)}(n+1) + \hat{\mathbf{H}}_{2,1}\hat{\mathbf{x}}_2^{(i)}(n+1) \tag{12}$$
and increment the iteration number as $i = i + 1$.
(6) Updated decisions on $\mathbf{X}(n)$ and $\mathbf{X}(n+1)$ can then be obtained from $\mathbf{y}^{(i)}(n)$ and $\mathbf{y}^{(i)}(n+1)$ by performing a DFT, space-time decoding, and passing through the decision device to yield the updated decisions $\hat{\mathbf{X}}^{(i)}(n)$ and $\hat{\mathbf{X}}^{(i)}(n+1)$.
(7) Repeat steps 3–6 a predetermined number of times to obtain the final decisions $\hat{\mathbf{X}}(n)$ and $\hat{\mathbf{X}}(n+1)$.

Simulation results for a two-branch ISTBC-OFDM transmitter diversity system at various iterations ($i$ = 0, 1, 2, and 3) are shown in Figure 8. The results show that the ISTBC-OFDM transmitter diversity system provides a significant improvement over STBC-OFDM without a cyclic prefix. For this particular example, ISTBC-OFDM provides over 12 dB of diversity gain at a BER of $10^{-4}$ and lowers the error floor from $10^{-3}$ to $2 \times 10^{-5}$.

3.3. Computational complexity

As compared to STBC-OFDM and SFBC-OFDM systems, ISTBC-OFDM and ISFBC-OFDM systems require additional computations to combat the ISI and ICI caused by the lack of a cyclic prefix. The additional complexity depends on several system parameters, such as the block size $K$, the number of iterations $i$, and the channel order $L$. In this section, the computational complexities of the ISTBC-OFDM and ISFBC-OFDM algorithms are considered. First, notice that the space-time and space-frequency block encodings involve only minor reindexing, negation, and conjugation (which is negation of the imaginary part). These operations have essentially zero cost and therefore are not counted in the computational load of the algorithms. The computational complexity of the ISTBC-OFDM and ISFBC-OFDM algorithms for each OFDM block, that is, for every $K$ data symbols, is summarized in Tables 1 and 2, respectively. Since the block size $K$ is usually much larger than the channel order $L$, the convolution matrices used in the tail cancellation and cyclic reconstruction steps are generally sparse. Therefore, the multiplications for tail cancellation and cyclic reconstruction have only a minor impact on the computational load. As shown in Tables 1 and 2, the ISTBC-OFDM and ISFBC-OFDM algorithms have about the same computational load, especially when the number of iterations is large, and most of the computational complexity is in the DFTs. To reduce the computational load, the block size $K$ can be chosen to be a power of two so that a highly efficient FFT algorithm, which requires only approximately $(K/2)\log_2 K$ multiplications and $K\log_2 K$ additions [15], can be used.

Figure 7: Flow diagram of the ISTBC-OFDM transmitter diversity algorithm.

Figure 8: Performance of ISTBC-OFDM transmitter diversity in a TU channel with $T_S = 2^{-20}$ second, K = 32, L = 5, and $f_D$ = 10 Hz. Iteration number is indicated for $i$ = 0, 1, 2, and 3. (Average BER versus average received SNR (dB); curves: 4-QAM in flat Rayleigh fading channel (theoretical), 2-branch STBC-OFDM without a cyclic prefix (simulated), 2-branch STBC-OFDM with a cyclic prefix (simulated), and 2-branch ISTBC-OFDM transmitter diversity (simulated).)
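The totals in Tables 1 and 2 are obtained by adding the tail-cancellation cost once, the cyclic-reconstruction cost $i$ times, and, for every DFT, the FFT cost stated in the table footnote. The short sketch below (function names are illustrative) performs this bookkeeping from the per-step table entries, so the closed-form totals can be checked or evaluated for particular parameters.

```python
import math

def fft_cost(K):
    """(multiplications, additions) of one radix-2 FFT, as assumed in Tables 1 and 2."""
    n = int(round(math.log2(K)))
    if 2 ** n != K:
        raise ValueError("K must be a power of two")
    return (K // 2) * n, K * n

def total_cost(K, L, i, scheme):
    """(DFTs, multiplications, additions) per OFDM block after i iterations."""
    q = L * (L + 1)
    if scheme == "ISTBC":                                    # Table 1
        tail, per_iter = (1, q / 2, q / 2), (3, 3 * q / 2 + 2 * K, 3 * q / 2 + K)
    elif scheme == "ISFBC":                                  # Table 2
        tail, per_iter = (2, q, q), (3, q + 2 * K, q + K)
    else:
        raise ValueError(scheme)
    fft_mul, fft_add = fft_cost(K)
    dfts = tail[0] + i * per_iter[0]
    mults = tail[1] + i * per_iter[1] + dfts * fft_mul
    adds = tail[2] + i * per_iter[2] + dfts * fft_add
    return dfts, mults, adds

print(total_cost(32, 5, 3, "ISTBC"))    # Figure 8 setting -> (10, 1142.0, 1846.0)
print(total_cost(256, 10, 2, "ISFBC"))  # Figure 5 setting -> (8, 9546, 17226)
```

In the Figure 8 setting, the FFTs account for 800 of the 1142 multiplications and 1600 of the 1846 additions, consistent with the observation that most of the load is in the DFTs.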

Compared with other equalization techniques for OFDM systems without a sufficient cyclic prefix [8, 9, 10, 11, 12], which often have a computational complexity of $O(K^3)$ for a block size of $K$ [16], the proposed ISTBC-OFDM and ISFBC-OFDM algorithms have significantly lower computational loads. More importantly, as mentioned at the beginning of Section 3, none of the techniques in [8, 9, 10, 11, 12] is applicable to multiple transmitter systems. Therefore, the proposed ISTBC-OFDM and ISFBC-OFDM algorithms are not only efficient but also the only techniques known to the authors that are applicable to OFDM transmitter diversity systems without a cyclic prefix. Although ISTBC-OFDM and ISFBC-OFDM transmitter diversity systems incur additional computational complexity beyond that required by STBC-OFDM and SFBC-OFDM systems, the added computational load allows a significant improvement in bandwidth efficiency. It is important to note that radio spectrum is a limited resource, while the computational power of signal processors continues to double about every eighteen months [17]. Therefore, tradeoffs between bandwidth efficiency and reasonable increases in computational complexity will likely continue to favor bandwidth efficient approaches.

4. CHANNEL ESTIMATION FOR ISTBC-OFDM AND ISFBC-OFDM SYSTEMS

It has been shown in previous sections that the ISTBC-OFDM and ISFBC-OFDM transmitter diversity techniques are effective and efficient means of achieving good diversity gain in frequency-selective fading channels without requiring a cyclic prefix. For these systems, knowledge of the channel parameters is required at the receiver for tail cancellation, cyclic reconstruction, and decoding. All the diversity gain results shown in Figures 5 and 8 were achieved under the assumption that perfect channel information was available at the receiver. In practice, the receiver has to estimate the channel information, and the channel estimation process is usually far from perfect. Channel estimation techniques for conventional OFDM systems have been studied extensively [18, 19, 20, 21, 22, 23, 24]. However, channel estimation for OFDM systems with transmitter diversity has seen only limited development so far. Channel estimation for transmitter diversity systems is generally complicated by the fact that signals transmitted simultaneously from multiple antennas interfere with each other during the channel estimation process. In this section, we study channel estimation techniques that are compatible with OFDM transmitter diversity systems without a cyclic prefix and are thus applicable to ISTBC-OFDM and ISFBC-OFDM systems. In [25], a decision-directed MMSE channel estimator for OFDM systems with transmitter diversity was proposed. The main drawback of the MMSE channel estimation approach is the high computational complexity required to update the channel estimates during the data transmission mode. More importantly, the channel estimator in [25] requires the subchannels to be completely decoupled. In the absence of a cyclic prefix of sufficient length, the subchannels are no longer decoupled and the performance of the estimator is significantly degraded. Therefore, a different channel estimation approach is needed for the ISTBC-OFDM and ISFBC-OFDM systems. In this section, we consider an extension of the multirate pilot-symbol-assisted (PSA) channel estimation technique proposed in [26] to OFDM transmitter diversity systems without a cyclic prefix, making it suitable for the ISTBC-OFDM and ISFBC-OFDM systems.

Table 1: Computational complexity of the ISTBC-OFDM algorithm.

| | DFTs | Multiplications | Additions |
|---|---|---|---|
| Tail cancellation | 1 | $\frac{L(L+1)}{2}$ | $\frac{L(L+1)}{2}$ |
| Cyclic reconstruction (per iteration) | 3 | $\frac{3L(L+1)}{2} + 2K$ | $\frac{3L(L+1)}{2} + K$ |
| Total complexity‡ for $i$ iterations | $3i+1$ | $\frac{3i+1}{2}K\log_2 K + 2iK + \frac{3i+1}{2}L(L+1)$ | $(3i+1)K\log_2 K + iK + \frac{3i+1}{2}L(L+1)$ |

Table 2: Computational complexity of the ISFBC-OFDM algorithm.

| | DFTs | Multiplications | Additions |
|---|---|---|---|
| Tail cancellation | 2 | $L(L+1)$ | $L(L+1)$ |
| Cyclic reconstruction (per iteration) | 3 | $L(L+1) + 2K$ | $L(L+1) + K$ |
| Total complexity‡ for $i$ iterations | $3i+2$ | $\frac{3i+2}{2}K\log_2 K + 2iK + (i+1)L(L+1)$ | $(3i+2)K\log_2 K + iK + (i+1)L(L+1)$ |

‡ Assuming $K$ is a power of two and each FFT requires $(K/2)\log_2 K$ multiplications and $K\log_2 K$ additions.
The lack of a cyclic prefix in ISTBC-OFDM and ISFBC-OFDM systems presents a particular challenge to the channel estimation process. Without a sufficiently long cyclic prefix, the subchannels of these OFDM systems are distorted by ISI and ICI. Thus, the decoupled relationship in (1), on which both the decision-directed MMSE channel estimator in [25] and the PSA channel estimator in [26] depend, no longer holds. Therefore, neither the decision-directed MMSE channel estimator in [25] nor the PSA channel estimator in [26] is directly applicable to the ISTBC-OFDM and ISFBC-OFDM systems. With the decision-directed approach, in addition to minimizing the interference among the multiple transmitted signals, the channel estimator would also have to eliminate the ISI and ICI caused by the lack of the cyclic prefix during the data transmission mode. Hence, any decision-directed approach is unlikely to yield an effective channel estimator for the ISTBC-OFDM and ISFBC-OFDM systems. With the PSA channel estimator, on the other hand, the ISI and ICI caused by the lack of the cyclic prefix need to be eliminated only during the pilot mode, which is generally an easier problem. Therefore, we propose a modification to the PSA channel estimator in [26] that makes it suitable for OFDM transmitter diversity systems without a cyclic prefix.

First, an interesting property of any length-$K$ sequence $s(m)$ with only even harmonics (that is, all the odd frequency bins are zero) is that $s(m)$ is periodic in $K/2$: $s(m) = s(m + K/2)$ for $0 \le m \le K/2 - 1$, because $e^{j2\pi k(m+K/2)/K} = (-1)^k e^{j2\pi km/K}$ and every even harmonic therefore repeats after $K/2$ samples. The first half of the sequence is in effect the cyclic extension of the second half and can therefore be used just like a length-$K/2$ guard interval for the second half [14]. The PSA channel estimator developed in [26] can be extended to OFDM transmitter diversity systems without a cyclic prefix by using pilot sequences with this cyclic property. Define a length-$K$ chirp sequence as follows:

$$C(k) = e^{j\pi k^2/K}, \qquad 0 \le k \le K-1. \tag{13}$$
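The half-period property is easy to confirm numerically. The sketch below (plain Python; K = 16 and the helper name `idft` are arbitrary choices for illustration) builds the chirp of (13), zeroes its odd bins, and compares the two halves of the resulting IDFT, with the full chirp as a contrast.

```python
import cmath

def idft(X):
    """Inverse DFT with 1/K scaling."""
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / K) for k in range(K)) / K
            for t in range(K)]

K = 16
C = [cmath.exp(1j * cmath.pi * k * k / K) for k in range(K)]      # chirp C(k), eq. (13)
even_only = [C[k] if k % 2 == 0 else 0 for k in range(K)]         # zero all odd bins
s = idft(even_only)
half_period_gap = max(abs(s[m] - s[m + K // 2]) for m in range(K // 2))

full = idft(C)                                                     # odd bins kept, for contrast
full_gap = max(abs(full[m] - full[m + K // 2]) for m in range(K // 2))

print(half_period_gap)   # round-off level: s(m) = s(m + K/2)
print(full_gap)          # nonzero: odd harmonics break the K/2 periodicity
```

Every chirp sample has unit magnitude, so zeroing bins is the only thing that shapes the pilot spectrum here.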

Let $PS_m(n,k)$ denote the $k$th tone of the pilot symbol transmitted from the $m$th transmit antenna during block instant $n$. The pilot symbols are constructed as follows:

$$PS_m\big(n, k+2(m-1)\big) = \begin{cases} (-1)^m \sqrt{M}\, C\big(k+2(m-1)\big), & \text{if } (k)_{2M} = 0,\\ 0, & \text{otherwise}, \end{cases} \tag{14}$$

where $C(k)$ is the chirp sequence defined in (13), $M$ is the number of transmitters, $(k)_{2M}$ denotes $k$ modulo $2M$, $1 \le m \le M$, $0 \le k \le K-1$, and $1 \le m + k \le K$. Figure 9 shows the pilot symbol patterns for a typical two-branch OFDM transmitter diversity system without a cyclic prefix. Notice that the pilot symbols in Figure 9 satisfy the following properties.

(1) The pilot symbols transmitted from different transmitters occupy different frequency bins. This property avoids interference among pilot symbols from different transmitters and is the same property as that implemented for the channel estimator in [26].
(2) The pilot symbols transmitted from the same transmitter have nonzero values only on even subcarriers. This property ensures that the time-domain pilot sequence is periodic in $K/2$, so that the first half of the sequence can serve as the guard interval for the second half.

Figure 9: Pilot symbol patterns for an OFDM transmitter diversity system without a cyclic prefix where K = 8 and M = 2. (Panels PS1 and PS2: frequency index k = 0, ..., 7 versus time index n, with pilot blocks every N block instants; legend: D = data symbol, P = pilot symbol, blank = null symbol.)

At the receiver, the last $K/2$ samples of the received signal vector $\mathbf{r}_{PS}(n)$ are assigned to the vector $\mathbf{y}(n)$ as follows:

$$y(n,k) = \begin{cases} r_{PS}\left(n, k + \frac{K}{2}\right), & 0 \le k \le \frac{K}{2} - 1,\\ r_{PS}(n,k), & \frac{K}{2} \le k \le K - 1, \end{cases} \tag{15}$$

where the subscript PS denotes the received signal during the pilot mode. The resulting vector $\mathbf{y}(n)$ is simply the cyclic extension of the received signal after removal of the guard interval. The vector $\mathbf{y}(n)$ is then demodulated with a DFT to yield the input signal $\mathbf{Y}(n)$ to the channel estimator. With the pilot symbols constructed as in (14), the cyclic property is ensured during the pilot mode, and each symbol in $\mathbf{Y}(n)$ contains the pilot contribution from only one transmitter. The complex gain of the $(k+2(m-1))$th subcarrier from the $m$th transmitter can be estimated by

$$\tilde{\Lambda}_m\big(n, k+2(m-1)\big) = \begin{cases} \dfrac{Y\big(n, k+2(m-1)\big)}{PS_m\big(n, k+2(m-1)\big)}, & \text{if } (k)_{2M} = 0,\\ 0, & \text{otherwise}. \end{cases} \tag{16}$$

Notice that the nonzero estimate is

$$\tilde{\Lambda}_m\big(n, k+2(m-1)\big) = \Lambda_m\big(n, k+2(m-1)\big) + W\big(n, k+2(m-1)\big), \tag{17}$$

where $\Lambda_m(n, k+2(m-1))$ is the actual complex gain of the $(k+2(m-1))$th subcarrier from the $m$th transmitter and $W(n, k+2(m-1))$ is the sampled channel noise, which is a zero-mean complex Gaussian random variable with variance $\sigma_W^2 = \sigma_Z^2/(2M)$ [27].

The diagonal elements of $\tilde{\boldsymbol{\Lambda}}_m(n)$ are, in effect, samples of the frequency response of the channel between the $m$th transmitter and the receiver. Let $\tilde{\mathbf{h}}_m(n)$ be the IDFT of the diagonal of $\tilde{\boldsymbol{\Lambda}}_m(n)$. In the absence of noise, $\tilde{\mathbf{h}}_m(n)$ is related to the actual CIR $\mathbf{h}_m(n)$ by [27]

$$\tilde{h}_m(n,k) = \frac{1}{2M}\sum_{l=0}^{2M-1} h_m\left(n, k + \frac{lK}{2M}\right) e^{j(\pi m/M)l}. \tag{18}$$

Notice that $\tilde{\mathbf{h}}_m(n)$ is the sum of circularly shifted images of $\mathbf{h}_m(n)$. The images in (18) are a direct result of sampling in the frequency domain. To avoid aliasing in the time domain, the condition $K \ge 2M(L+1)$ must be satisfied. To remove the images, $\tilde{\mathbf{h}}_m(n)$ is passed through a length-$(L+1)$ rectangular window of gain $M$ to yield the temporal estimate $\hat{\mathbf{h}}_m(n)$ at the pilot instant as follows:

$$\hat{h}_m(n,k) = \begin{cases} h_m(n,k) + \xi(n,k), & 0 \le k \le L,\\ 0, & L+1 \le k \le K-1. \end{cases} \tag{19}$$

The DFT of $\hat{\mathbf{h}}_m(n)$ yields the estimate of the channel parameters

$$\hat{\boldsymbol{\Lambda}}_m(n) = \boldsymbol{\Lambda}_m(n) + \boldsymbol{\Xi}(n), \tag{20}$$

where the elements of the noise vector $\boldsymbol{\Xi}(n)$ have a variance of $\sigma_W^2\,(2M(L+1)/K)$. Since $2M(L+1) < K$ in general, in addition to removing the images, the windowing operation also reduces the noise variance by a factor of $2M(L+1)/K$. These temporal estimates at the pilot instants, $\hat{\mathbf{h}}_m(n)$, are then passed through a third-order least-square interpolation filter [26] to provide the estimated channel parameters during the data transmission mode. A block diagram of the proposed PSA channel estimator for a two-branch OFDM transmitter diversity system without a cyclic prefix is shown in Figure 10.

Figure 10: Block diagram of the proposed PSA channel estimator for a two-branch OFDM transmitter diversity system without a cyclic prefix. (For each transmitter m = 1, 2: the pilot bins of $Y(pN, \cdot)$ are scaled by $1/PS_m(pN, \cdot)$ with zeros elsewhere, transformed by an IDFT to $\hat{h}_m(pN, \cdot)$, truncated to the first $L+1$ taps, interpolated across pilot instants, and transformed by a DFT to $\hat{\Lambda}_m(n, 0), \ldots, \hat{\Lambda}_m(n, K-1)$.)

The performance of the proposed PSA channel estimator for OFDM transmitter diversity systems without a cyclic prefix has been evaluated by simulation. The simulations used K = 128 and N = 20 for ISTBC-OFDM and K = 256 and N = 10 for ISFBC-OFDM. Simulation results of the average BER after two iterations ($i$ = 2) for a two-branch ISTBC-OFDM system with ideal channel parameters and with channel parameters estimated by the proposed PSA channel estimator with a third-order least-square interpolator are shown in Figure 11. Simulation results for the ISFBC-OFDM system are shown in Figure 12.
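The pilot-mode processing of (14)–(20) can be traced end to end in a noiseless setting. The following is a minimal sketch under stated assumptions, not the simulation code behind Figures 11 and 12: plain Python DFTs with a 1/K-scaled IDFT, M = 2 transmitters, hypothetical random CIRs with L = 3, and a random data block preceding each pilot block so that the missing cyclic prefix really does cause ISI. Noise and the least-square interpolation across pilot instants are omitted. With the 1/K IDFT scaling used here, the window gain that restores unit scale is 2M, the reciprocal of the pilot density; the constant in front of the window depends on the transform scaling convention.

```python
import cmath
import random

def dft(x):
    K = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / K) for t in range(K)) for k in range(K)]

def idft(X):
    K = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / K) for k in range(K)) / K for t in range(K)]

def rx_no_cp(prev, cur, h):
    """Current received block when no cyclic prefix is used (prev's tail leaks in)."""
    s, K = prev + cur, len(cur)
    return [sum(h[l] * s[K + t - l] for l in range(len(h))) for t in range(K)]

rng = random.Random(11)
K, M, L = 32, 2, 3
assert K >= 2 * M * (L + 1)                     # no time-domain aliasing, see (18)
C = [cmath.exp(1j * cmath.pi * k * k / K) for k in range(K)]    # chirp, eq. (13)

def cgauss():
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))

hs = [[cgauss() * 0.6 ** l for l in range(L + 1)] for _ in range(M)]   # hypothetical CIRs

# Pilot mode: each transmitter sends its pilot block right after a random data block.
pilots, r = [], [0j] * K
for m in range(1, M + 1):
    PS = [0j] * K
    for k in range(0, K, 2 * M):                                   # (k)_{2M} = 0
        PS[k + 2 * (m - 1)] = (-1) ** m * M ** 0.5 * C[k + 2 * (m - 1)]   # eq. (14)
    pilots.append(PS)
    prev = idft([cgauss() for _ in range(K)])
    r = [a + b for a, b in zip(r, rx_no_cp(prev, idft(PS), hs[m - 1]))]

y = r[K // 2:] + r[K // 2:]      # eq. (15): ISI-free second half reused as its own guard
Y = dft(y)

estimates = []
for PS in pilots:
    Lt = [Y[k] / PS[k] if PS[k] != 0 else 0j for k in range(K)]    # eq. (16)
    ht = idft(Lt)                                                    # aliased CIR, eq. (18)
    hw = [2 * M * ht[k] if k <= L else 0j for k in range(K)]        # window, cf. eq. (19)
    estimates.append(dft(hw))                                        # eq. (20), noiseless

true = [dft(h + [0j] * (K - len(h))) for h in hs]
err = max(abs(a - b) for e, t in zip(estimates, true) for a, b in zip(e, t))
print(err)   # round-off level: images removed and ISI avoided without a cyclic prefix
```

Discarding the first half of the received pilot block is what removes the ISI here: with $L \le K/2$, the second half depends only on the current pilot block, and its $K/2$-periodicity lets it stand in for the full cyclic block.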

At low SNR and with estimated channel parameters, both the ISTBC-OFDM and the ISFBC-OFDM systems show about 2 dB of performance degradation relative to the corresponding systems using ideal channel parameters. At high SNR, the BER performance of the ISTBC-OFDM system with estimated parameters approaches that with the ideal parameters. The ISFBC-OFDM system, however, still exhibits a slight degradation with estimated parameters, especially in faster fading environments ($f_D$ = 100 Hz). The ISFBC-OFDM system thus appears to be more sensitive to channel estimation error in fast fading than the ISTBC-OFDM system. The cause of this difference in sensitivity between the two systems is under investigation.

5. SUMMARY

Bandwidth efficient ISTBC-OFDM and ISFBC-OFDM transmitter diversity systems have been presented in this paper. A low-complexity PSA channel estimator for OFDM transmitter diversity systems without a cyclic prefix, and therefore applicable to ISTBC-OFDM and ISFBC-OFDM systems, has also been presented. The proposed ISTBC-OFDM and ISFBC-OFDM systems are shown to be effective and efficient means of eliminating the need for a cyclic prefix while still providing good spatial diversity gain. The computational complexity of the ISTBC-OFDM and ISFBC-OFDM algorithms has been analyzed and shown to be significantly lower than that of other equalization techniques for OFDM systems without a cyclic prefix. More importantly, the ISTBC-OFDM and ISFBC-OFDM algorithms are the only techniques known to the authors that are applicable to OFDM transmitter diversity systems without a cyclic prefix. For ease of presentation, this paper has focused on systems with two transmit antennas (M = 2) and a single receive antenna. The proposed approach is also applicable to systems with more transmit antennas (M > 2) and can be easily extended to systems with multiple receive antennas by replicating the proposed technique at each receive antenna branch together with a signal combining scheme similar to that shown in [28, Section III-B]. We have developed the ISTBC-OFDM and ISFBC-OFDM algorithms to take advantage of the relatively simple space-time block coding process in the STBC-OFDM and SFBC-OFDM systems. Other, more sophisticated OFDM transmitter diversity systems [29, 30, 31, 32, 33] have the potential to achieve higher performance than STBC-OFDM and SFBC-OFDM systems, albeit at higher computational complexity. Future work will include applying the tail cancellation and cyclic reconstruction techniques to these more recently proposed OFDM transmitter diversity systems to remove the cyclic prefix and, consequently, improve bandwidth efficiency.

Figure 11: Performance comparison of ISTBC-OFDM systems with ideal channel parameters and channel parameters estimated with a third-order least-square interpolator. (Average BER versus average received SNR (dB); curves: ideal and estimated parameters for $f_D$ = 50 Hz and $f_D$ = 100 Hz.)

Figure 12: Performance comparison of ISFBC-OFDM systems with ideal channel parameters and channel parameters estimated with a third-order least-square interpolator. (Average BER versus average received SNR (dB); curves: ideal and estimated parameters for $f_D$ = 50 Hz and $f_D$ = 100 Hz.)

ACKNOWLEDGMENT

This paper was presented in part at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, Florida, May 2002.

REFERENCES

[1] D. Agrawal, V. Tarokh, A. Naguib, and N. Seshadri, “Space-time coded OFDM for high data-rate wireless communication over wideband channels,” in Proc. IEEE Vehicular Technology Conference (VTC ’98), vol. 3, pp. 2232–2236, Ottawa, Ont, Canada, May 1998.
[2] Y. Li, J. C. Chuang, and N. R. Sollenberger, “Transmitter diversity for OFDM systems and its impact on high-rate data wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 7, pp. 1233–1243, 1999.
[3] K. F. Lee and D. B. Williams, “A space-time coded transmitter diversity technique for frequency selective fading channels,” in Proc. IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM ’00), pp. 149–152, Cambridge, Mass, USA, March 2000.
[4] K. F. Lee and D. B. Williams, “A Space-frequency transmitter diversity technique for OFDM systems,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’00), vol. 3, pp. 1473–1477, San Francisco, Calif, USA, November–December 2000.
[5] A. Ruiz, J. M. Cioffi, and S. Kasturia, “Discrete multiple tone modulation with coset coding for the spectrally shaped channel,” IEEE Trans. Communications, vol. 40, no. 6, pp. 1012–1029, 1992.
[6] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House, Boston, Mass, USA, 2000.
[7] Commission of European Communities, Digital Land Mobile Radio Communications—COST 207, Office for Official Publications of the European Communities, Luxembourg, 1989.

[8] N. Al-Dhahir and J. M. Cioffi, “Optimum finite-length equalization for multicarrier transceivers,” IEEE Trans. Communications, vol. 44, no. 1, pp. 56–64, 1996.
[9] P. J. W. Melsa, R. C. Younce, and C. E. Rohrs, “Impulse response shortening for discrete multitone transceivers,” IEEE Trans. Communications, vol. 44, no. 12, pp. 1662–1672, 1996.
[10] L. Vandendorpe, “MMSE equalizers for multitone systems without guard time,” in Proc. 8th European Signal Processing Conference (EUSIPCO ’96), vol. 3, pp. 2049–2052, Trieste, Italy, September 1996.
[11] L. Vandendorpe, “Fractionally spaced linear and DF MIMO equalizers for multitone systems without guard time,” Annals of Telecommunications, vol. 52, no. 1-2, pp. 21–30, 1997.
[12] Y. Sun and L. Tong, “Channel equalization for wireless OFDM systems with ICI and ISI,” in Proc. IEEE International Conference on Communications (ICC ’99), vol. 1, pp. 182–186, Vancouver, BC, Canada, June 1999.
[13] J. M. Cioffi and J. A. C. Bingham, “A data-driven multitone echo canceller,” IEEE Trans. Communications, vol. 42, no. 10, pp. 2853–2869, 1994.
[14] D. Kim and G. L. Stüber, “Residual ISI cancellation for OFDM with applications to HDTV broadcasting,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1590–1599, 1998.
[15] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, USA, 1989.
[16] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 3rd edition, 1996.
[17] G. E. Moore, “Cramming more components onto integrated circuits,” Electronics Magazine, vol. 38, no. 8, pp. 114–117, 1965.
[18] J.-J. van de Beek, O. Edfors, M. Sandell, S. K. Wilson, and P. O. Börjesson, “On channel estimation in OFDM system,” in Proc. IEEE Vehicular Technology Conference (VTC ’95), vol. 2, pp. 815–819, Chicago, Ill, USA, July 1995.
[19] P. Hoeher, S. Kaiser, and P. Robertson, “Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’97), vol. 3, pp. 1845–1848, Munich, Germany, April 1997.
[20] P. Hoeher, S. Kaiser, and P. Robertson, “Pilot-symbol-aided channel estimation in time and frequency,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’97), Communication Theory Mini-Conference, pp. 90–96, Phoenix, Ariz, USA, November 1997.
[21] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Börjesson, “OFDM channel estimation by singular value decomposition,” IEEE Trans. Communications, vol. 46, no. 7, pp. 931–939, 1998.
[22] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Börjesson, “Analysis of DFT-based channel estimators for OFDM,” Wireless Personal Communications, vol. 12, no. 1, pp. 55–70, 2000.
[23] Y. Li, L. J. Cimini Jr., and N. R. Sollenberger, “Robust channel estimation for OFDM systems with rapid dispersive fading channels,” IEEE Trans. Communications, vol. 46, no. 7, pp. 902–915, 1998.
[24] Y. Li, “Pilot-symbol-aided channel estimation for OFDM in wireless systems,” IEEE Trans. Vehicular Technology, vol. 49, no. 4, pp. 1207–1215, 2000.
[25] Y. Li, N. Seshadri, and S. Ariyavisitakul, “Channel estimation for OFDM systems with transmitter diversity in mobile wireless channels,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 3, pp. 461–471, 1999.
[26] K. F. Lee and D. B. Williams, “Pilot-symbol-assisted channel estimation for space-time coded OFDM systems,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 5, pp. 507–516, 2002.
[27] N. J. Fliege, Multirate Digital Signal Processing: Multirate Systems, Filter Banks, Wavelets, John Wiley & Sons, West Sussex, UK, 1994.
[28] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[29] Y. Gong and K. B. Letaief, “Space-frequency-time coded OFDM for broadband wireless communications,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’01), vol. 1, pp. 519–523, San Antonio, Tex, USA, November 2001.
[30] B. Lu, X. Wang, and K. R. Narayanan, “LDPC-based space-time coded OFDM systems over correlated fading channels: Performance analysis and receiver design,” IEEE Trans. Communications, vol. 50, no. 1, pp. 74–88, 2002.
[31] Y. G. Li, J. H. Winters, and N. R. Sollenberger, “MIMO-OFDM for wireless communications: signal detection with enhanced channel estimation,” IEEE Trans. Communications, vol. 50, no. 9, pp. 1471–1477, 2002.
[32] A. F. Molisch, M. Z. Win, and J. H. Winters, “Space-time-frequency (STF) coding for MIMO-OFDM systems,” IEEE Communications Letters, vol. 6, no. 9, pp. 370–372, 2002.
[33] Z. Liu, Y. Xin, and G. B. Giannakis, “Space-time-frequency coded OFDM over frequency-selective fading channels,” IEEE Trans. Signal Processing, vol. 50, no. 10, pp. 2465–2476, 2002.

King F. Lee received his B.S.E.E. degree from the University of Florida, M.S.E. degree from Florida Atlantic University, and Ph.D. degree from the Georgia Institute of Technology. In 1979, he joined Motorola Inc., where he is currently a Distinguished Member of the Technical Staff. His areas of interest include mixed analog-digital integrated circuit design, wireless communications, and digital signal processing. Dr. Lee served as a Member of the Industrial Advisory Board of the NSF Research Center for the Design of Analog-Digital Integrated Circuits (CDADIC) from 1989 to 1993. He is a Registered Professional Engineer and a Member of the Eta Kappa Nu Honor Society.

Douglas B. Williams received the B.S.E.E., M.S., and Ph.D. degrees in electrical and computer engineering from Rice University, Houston, Texas, in 1984, 1987, and 1989, respectively. In 1989, he joined the faculty of the School of Electrical and Computer Engineering at the Georgia Institute of Technology, Atlanta, Georgia, where he is currently an Associate Professor. There he is also affiliated with the Center for Signal and Image Processing and teaches courses in signal processing and telecommunications. Dr. Williams has served as an Associate Editor of the IEEE Transactions on Signal Processing and is a Member of the IEEE Signal Processing Society’s SPTM Technical Committee. He was on the conference committee for the 1996 International Conference on Acoustics, Speech, and Signal Processing that was held in Atlanta and is currently Cochair of the 2002 IEEE DSP and Signal Processing Education Workshops. Dr. Williams was Coeditor of the Digital Signal Processing Handbook published in 1998 by CRC Press and IEEE Press. He is a Member of the Tau Beta Pi, Eta Kappa Nu, and Phi Beta Kappa Honor Societies.

EURASIP Journal on Applied Signal Processing 2004:10, 1520–1535
© 2004 Hindawi Publishing Corporation

Partial Crosstalk Cancellation for Upstream VDSL

Raphael Cendrillon
Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven-Heverlee 3001, Belgium

Marc Moonen
Department of Electrical Engineering, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Leuven-Heverlee 3001, Belgium

George Ginis
Texas Instruments, 2043 Samaritan Drive, San Jose, CA 95124, USA

Katleen Van Acker
Alcatel Bell, Francis Wellesplein 1, Antwerp 2018, Belgium

Tom Bostoen
Alcatel Bell, Francis Wellesplein 1, Antwerp 2018, Belgium

Piet Vandaele
Alcatel Bell, Francis Wellesplein 1, Antwerp 2018, Belgium

Received 5 March 2003

Crosstalk is a major problem in modern DSL systems such as VDSL. Many crosstalk cancellation techniques have been proposed to help mitigate crosstalk, but whilst they lead to impressive performance gains, their complexity grows with the square of the number of lines within a binder. In binder groups which can carry up to hundreds of lines, this complexity is beyond the reach of current implementations. In this paper, we investigate partial crosstalk cancellation for upstream VDSL. The majority of the detrimental effects of crosstalk are typically limited to a small subset of lines and tones. Furthermore, significant crosstalk is often only seen from neighbouring pairs within the binder configuration. We present a number of algorithms which exploit these properties to reduce the complexity of crosstalk cancellation. These algorithms are shown to achieve the majority of the performance gains of full crosstalk cancellation with significantly reduced run-time complexity.

Keywords and phrases: DSL, interference cancellation, reduced complexity, partial crosstalk cancellation, crosstalk selectivity, hybrid selection/combining.

1. INTRODUCTION

VDSL is the next step in the on-going evolution of DSL systems. Supporting data rates up to 52 Mbps in the downstream, VDSL offers the potential of bringing truly broadband access to the consumer market. VDSL supports such high data rates by operating over short line lengths and transmitting in frequencies up to 12 MHz.

The twisted pairs in the access network are distributed within large binder groups which typically contain anything from 20 to 100 individual pairs. As a result of the close distance between twisted pairs within binders and the high frequencies used in VDSL transmission, there is significant electromagnetic coupling between nearby pairs. This electromagnetic coupling leads to interference or crosstalk between the different systems operating within a binder.

There are two types of crosstalk: near-end crosstalk (NEXT) and far-end crosstalk (FEXT). NEXT occurs when the upstream (US) signal of one modem couples into the downstream signal of another, or vice versa. FEXT occurs when two signals traveling in the same direction couple. In VDSL, NEXT is avoided through the use of frequency-division duplexing (FDD). FEXT, on the other hand, is still present. It is typically 10–15 dB larger than the background noise and is the dominant source of performance degradation in VDSL.

Many crosstalk cancellation schemes have been proposed for VDSL, based on linear pre- and postfiltering [1, 2], successive interference cancellation [3, 4], and turbo coding [5]. These schemes are applicable to US transmission, where the receiving modems are colocated. In downstream transmission, it is also possible to precompensate for crosstalk, since the transmitters are then colocated at the central office (CO) [3, 6]. Cancellation of crosstalk from alien systems such as HPNA and HDSL has also been investigated [7, 8]. Since crosstalk is the dominant source of performance degradation in VDSL, removing it leads to spectacular performance gains, for example, 50–130 Mbps in the US direction [3].

Whilst the benefits of crosstalk cancellation are large, its complexity can be extremely high. For example, in a bundle with 20 users all transmitting on 4096 tones and operating at a block rate of 4000 blocks per second, linear crosstalk cancellation requires roughly 6.5 billion multiplications per second. This is beyond the reach of present-day implementations and may remain economically infeasible for several years. Other techniques, such as soft-interference cancellation and nonlinear crosstalk cancellation, add even more complexity.

What is required is a crosstalk cancellation scheme with scalable complexity. It should support both conventional single-user detection (SUD) and full crosstalk cancellation, and it should exhibit graceful performance degradation as complexity is reduced. We present a US crosstalk cancellation scheme which exhibits these properties. It is shown that, by exploiting the space- and frequency-selective nature of crosstalk channels, this scheme can achieve the majority of the performance gains of full crosstalk cancellation with a fraction of the run-time complexity.

This paper is organised as follows. In Section 2, we describe the system model for the crosstalk environment. Section 3 describes crosstalk cancellation, its performance, and its complexity. Due to the high complexity of full crosstalk cancellation, in Sections 4 and 5 we introduce the concept of partial crosstalk cancellation, which exploits both the space- and frequency-selectivity of the crosstalk channel. This takes advantage of the fact that the majority of the crosstalk experienced by a modem comes from only a few other crosstalkers in the binder. Furthermore, since crosstalk coupling varies dramatically with frequency, the worst effects of crosstalk are limited to a small selection of tones. Exploiting these two properties leads to significant reductions in complexity. In Section 6, we describe a partial cancellation algorithm which exploits space-selectivity. An algorithm which exploits frequency-selectivity only is described in Section 7. As we will see, achieving the largest possible reduction in run-time complexity requires algorithms that exploit both forms of selectivity, and in Section 8 we describe such algorithms. The performance of the algorithms is compared in Section 9 and conclusions are drawn in Section 10.

2. UPSTREAM SYSTEM MODEL

We begin by assuming that all receiving modems are colocated at the CO, as is the case in US transmission. This is a prerequisite for crosstalk cancellation, since signal-level coordination is required between receivers. Through synchronized transmission and the cyclic structure of DMT blocks, crosstalk can be modelled independently on each tone. We assume there are $N + 1$ users within the binder group, so that each user has $N$ interferers. Transmission of a single DMT block can be modelled as

$$
\begin{bmatrix} y_k^1 \\ \vdots \\ y_k^{N+1} \end{bmatrix}
=
\begin{bmatrix}
h_k^{(1,1)} & \cdots & h_k^{(1,N+1)} \\
\vdots & \ddots & \vdots \\
h_k^{(N+1,1)} & \cdots & h_k^{(N+1,N+1)}
\end{bmatrix}
\begin{bmatrix} x_k^1 \\ \vdots \\ x_k^{N+1} \end{bmatrix}
+
\begin{bmatrix} z_k^1 \\ \vdots \\ z_k^{N+1} \end{bmatrix},
\qquad
y_k = H_k x_k + z_k. \tag{1}
$$

Here $x_k^n$ and $y_k^n$ denote the symbols transmitted and received, respectively, by user $n$ on tone $k$. The tone index $k$ ranges over $1, \ldots, K$, where $K$ is the number of tones in the DMT system (e.g., $K = 4096$ for VDSL). $h_k^{(n,n)}$ is the direct channel of user $n$ at tone $k$, and $h_k^{(n,m)}$ is the crosstalk channel from user $m$ into user $n$. $z_k^n$ represents the additive noise experienced by user $n$ on tone $k$ and is assumed to be spatially white and Gaussian, such that $E\{z_k z_k^H\} = \sigma_k^2 I_{N+1}$. We denote the transmit autocorrelation on tone $k$ as $E\{x_k x_k^H\} = S_k$, with $s_k^m \triangleq [S_k]_{m,m}$. Note that $S_k$ is a diagonal matrix, since coordination is not available between the different customer premises (CP) transmitters.

A matrix $A$ is said to be column-wise diagonal dominant if it satisfies

$$|a^{(m,m)}| \gg |a^{(n,m)}|, \quad \forall n \neq m, \tag{2}$$

where $a^{(n,m)} \triangleq [A]_{n,m}$, whilst $A$ is said to be row-wise diagonal dominant if it satisfies

$$|a^{(n,n)}| \gg |a^{(n,m)}|, \quad \forall n \neq m. \tag{3}$$

If $A$ satisfies both (2) and (3), it is said to be strictly diagonal dominant. In DSL channels with colocated receivers, the channel matrix $H_k$ is column-wise diagonal dominant and satisfies the following property:

$$|h_k^{(m,m)}| \gg |h_k^{(n,m)}|, \quad \forall n \neq m. \tag{4}$$

In other words, the direct channel of any user always has a larger gain than the channel from that user’s transmitter into any other user’s receiver. This property has been verified through extensive cable measurements (see the semiempirical crosstalk channel models in [9]). It will be exploited in the remaining sections.

3. CROSSTALK CANCELLATION

3.1. Optimal crosstalk cancellation

When both the transmitters and the receivers of the modems within a binder are colocated, channel capacity can be achieved in a simple fashion [1, 2]. Using the singular value decomposition (SVD), define

$$H_k \overset{\text{svd}}{=} U_k \Lambda_k V_k^H, \tag{5}$$

where the columns of $U_k$ and $V_k$ are the left and right singular vectors of $H_k$, respectively, and the singular values are $\Lambda_k \triangleq \operatorname{diag}\{\lambda_k^1, \ldots, \lambda_k^{N+1}\}$. It is assumed that $H_k$ is nonsingular, which is ensured by (4) provided that $h_k^{(n,n)} \neq 0$ for all $n$. Define the true set of symbols $\tilde{x}_k \triangleq [\tilde{x}_k^1 \cdots \tilde{x}_k^{N+1}]^T$, which are generated by the QAM encoders, and define $E\{\tilde{x}_k \tilde{x}_k^H\} \triangleq \tilde{S}_k = \operatorname{diag}\{\tilde{s}_k^1, \ldots, \tilde{s}_k^{N+1}\}$. For a given $\tilde{S}_k$, the optimal transmitter structure prefilters $\tilde{x}_k$ with the matrix

$$P_k = V_k \tag{6}$$

such that $x_k = P_k \tilde{x}_k$. At the receiver, we apply the filter $w_k^n = e_n^H \Lambda_k^{-1} U_k^H$ to generate our estimate of the transmitted symbol:

$$\hat{x}_k^n = w_k^n y_k = w_k^n \left(H_k P_k \tilde{x}_k + z_k\right) = \tilde{x}_k^n + \tilde{z}_k^n, \tag{7}$$

where $e_n \triangleq [I_{N+1}]_{\text{col } n}$, $I_{N+1}$ is the $(N+1) \times (N+1)$ identity matrix, and $\tilde{z}_k^n \triangleq e_n^H \Lambda_k^{-1} U_k^H z_k$. Here we use $[A]_{\text{row } n}$ and $[A]_{\text{col } n}$ to denote the $n$th row and column of matrix $A$, respectively. Note that $E\{|\tilde{z}_k^n|^2\} = \sigma_k^2 (\lambda_k^n)^{-2}$. The pre- and postfiltering operations remove crosstalk without causing noise enhancement. Applying a conventional slicer to $\hat{x}_k^n$ achieves the following rate for user $n$ on tone $k$:

$$c_k^n = \log\left(1 + \frac{1}{\Gamma}\, \sigma_k^{-2} \left(\lambda_k^n\right)^2 \tilde{s}_k^n\right), \tag{8}$$

where $\Gamma$ represents the SNR gap to capacity and is a function of the target BER, coding gain, and noise margin [10]. The maximum achievable rate of the multiline DSL channel is

$$C = \sum_k \log \left| I_{N+1} + \frac{1}{\Gamma}\, \sigma_k^{-2} H_k S_k H_k^H \right|. \tag{9}$$

It is straightforward to show that $\sum_n \sum_k c_k^n = C$. So through the application of a simple linear pre- and postfilter and a conventional slicer, it is possible to operate at the maximum achievable rate of the DSL channel for the given $\tilde{S}_k$. Unfortunately, application of a prefilter requires the transmitting modems to be colocated. In US DSL, this is typically not the case, since the transmitting modems are located at different CPs.
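As a numerical sanity check of (5)–(9), the following Python/NumPy sketch builds a small synthetic channel for a single tone (random values, not measured cable data), applies the prefilter $P_k = V_k$ and the postfilter rows $w_k^n = e_n^H \Lambda_k^{-1} U_k^H$, and confirms that the noiseless output equals $\tilde{x}_k$ and that the per-user rates (8) sum to this tone's term of (9). The channel size, the noise variance $\sigma_k^2$, the gap $\Gamma$, equal QAM powers, and the use of base-2 logarithms (rates in bits) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 4                   # N + 1 users (illustrative size)
sigma2 = 1e-3                 # noise variance sigma_k^2 (assumed)
gamma = 4.0                   # SNR gap Gamma, linear scale (assumed)
s_tilde = np.ones(n_users)    # diagonal of S~_k: equal QAM powers (assumed)

# Hypothetical single-tone channel: strong direct paths, weak complex crosstalk.
# Synthetic values, not measured cable data.
crosstalk = rng.standard_normal((n_users, n_users)) + 1j * rng.standard_normal((n_users, n_users))
H = np.diag(rng.uniform(0.5, 1.0, n_users)) + 0.05 * crosstalk

# SVD (5): H = U diag(lam) V^H; NumPy returns V^H directly as Vh.
U, lam, Vh = np.linalg.svd(H)
P = Vh.conj().T                       # prefilter P_k = V_k, eq. (6)
W = np.diag(1.0 / lam) @ U.conj().T   # row n is w_k^n = e_n^H Lambda_k^-1 U_k^H

# Noiseless check of (7): the pre/postfilter pair returns the QAM symbols.
x_tilde = rng.standard_normal(n_users) + 1j * rng.standard_normal(n_users)
x_hat = W @ (H @ (P @ x_tilde))

# Per-user rates (8) in bits; user n is carried on singular mode n.
rates = np.log2(1.0 + lam**2 * s_tilde / (gamma * sigma2))

# This tone's term of (9), with transmit covariance S_k = V_k S~_k V_k^H.
S = P @ np.diag(s_tilde) @ P.conj().T
M = np.eye(n_users) + H @ S @ H.conj().T / (gamma * sigma2)
_, logdet = np.linalg.slogdet(M)
capacity = logdet / np.log(2)

print(np.allclose(x_hat, x_tilde), np.isclose(rates.sum(), capacity))
```

The equality of the two rate expressions follows because $H_k S_k H_k^H = U_k \Lambda_k \tilde{S}_k \Lambda_k U_k^H$, so the determinant in (9) factors into the per-mode terms of (8).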

3.2. Simplified, near-optimal crosstalk cancellation

As a result of the column-wise diagonal dominance of $H_k$, rates close to the maximum can be achieved with a very simple receiver structure. Furthermore, prefiltering is not required, so such rates can be achieved without colocated transmitting modems. We now show why this is true.

Theorem 1. Any column-wise diagonal dominant matrix $H_k$ which satisfies (A.7) can be decomposed into

$$H_k = Q_k \Sigma_k \tag{10}$$

such that $Q_k$ is unitary and $\Sigma_k$ is strictly diagonal dominant with positive diagonal elements. Furthermore, the off-diagonal elements of $\Sigma_k$ can be bounded using (A.27) and (A.30).

Proof. See the appendix.

The strict diagonal dominance of $\Sigma_k$ allows us to make the approximations

$$\Sigma_k \approx \operatorname{diag}\{\Sigma_k\}, \qquad \Sigma_k^{-1} \approx \left(\operatorname{diag}\{\Sigma_k\}\right)^{-1}. \tag{11}$$

Hence

$$H_k \approx Q_k \operatorname{diag}\{\Sigma_k\}\, I_{N+1}. \tag{12}$$

Comparison with (5) yields $U_k \approx Q_k$, $\Lambda_k \approx \operatorname{diag}\{\Sigma_k\}$, and $V_k^H \approx I_{N+1}$. So the optimal transmit/receive structure of Section 3.1 is well approximated by

$$
\begin{aligned}
P_k &\approx I_{N+1}, \\
w_k^n &\approx e_n^H \left(\operatorname{diag}\{\Sigma_k\}\right)^{-1} Q_k^H \\
&\approx e_n^H \Sigma_k^{-1} Q_k^H \\
&= e_n^H H_k^{-1},
\end{aligned} \tag{13}
$$

where we use (11) to go from the second line to the third. In [6], an upper bound is proposed for the capacity loss incurred by the above approximation, and this loss is shown to be minimal for all practical DSL channels. Since $P_k \approx I_{N+1}$, prefiltering is not required. This is important since, in US DSL, the transmitting modems are not colocated. Furthermore, the optimal receiver structure is well approximated by a linear zero-forcing (ZF) design. Thus we can achieve close to the maximum rate using the following estimate:

$$\hat{x}_k^n = e_n^H H_k^{-1} y_k. \tag{14}$$
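The near-optimality claimed for the ZF estimate (14) can be checked numerically. The sketch below uses a synthetic, column-wise diagonal dominant single-tone channel (random values, not measured data), assumed values for $\sigma_k^2$ and $\Gamma$, equal transmit PSDs with $S_k = I_{N+1}$ and no prefilter, and base-2 logarithms. It computes the ZF filters $w_k^n = e_n^H H_k^{-1}$, the rate each user supports given the filtered-noise gain $\|w_k^n\|^2$, and compares their sum with the maximum achievable rate (9).

```python
import numpy as np

rng = np.random.default_rng(2)
n_users = 8                  # N + 1 users (illustrative size)
sigma2, gamma = 1e-4, 4.0    # noise variance and SNR gap (assumed)
s = np.ones(n_users)         # equal transmit PSDs, so S_k = I (no prefilter)

# Synthetic column-wise diagonal dominant channel (not measured data).
xt = rng.standard_normal((n_users, n_users)) + 1j * rng.standard_normal((n_users, n_users))
H = np.diag(rng.uniform(0.5, 1.0, n_users)) + 0.02 * xt

# Linear ZF receiver (14): row n of H_k^-1 is the filter for user n.
W = np.linalg.inv(H)
noise_gain = np.sum(np.abs(W) ** 2, axis=1)   # ||w_k^n||^2, filtered-noise gain
zf_rates = np.log2(1.0 + s / (gamma * sigma2 * noise_gain))

# Maximum achievable rate (9) for the same diagonal S_k, in bits.
M = np.eye(n_users) + H @ np.diag(s) @ H.conj().T / (gamma * sigma2)
capacity = np.linalg.slogdet(M)[1] / np.log(2)

print(zf_rates.sum(), capacity, zf_rates.sum() / capacity)
```

A linear ZF receiver can never exceed (9), so the printed ratio is at most one; for a strongly dominant channel such as this one it should sit very close to one.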

Note that noise enhancement is not a problem, since $H_k^{-1} = \Sigma_k^{-1} Q_k^H$. $Q_k$ is unitary, hence it does not alter the statistics of the noise. $\Sigma_k^{-1}$ is approximately diagonal, hence it scales the signal and noise equally.

Using this scheme, crosstalk cancellation of one user at one tone requires $N$ multiplications per DMT block. So crosstalk cancellation for $N + 1$ users on $K$ tones at a block rate $b$ (DMT blocks per second) requires $(N^2 + N)Kb$ multiplications per second. Thus the complexity grows rapidly with the number of users in a bundle. For example, in a 20-user system with 4096 tones and a block rate of 4000, the complexity is roughly 6.5 billion multiplications per second. So whilst crosstalk cancellation leads to significant performance gains, it can be extremely complex, certainly beyond the complexity available in present-day systems. This is the motivation behind partial crosstalk cancellation.

[Figure 1 plot: channel gain (dB, −55 to −30) versus frequency (0–12 MHz); curves h(1, 2), h(1, 3), h(1, 4).]
Figure 1: FEXT transfer functions for 0.5 mm British Telecom cable.
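The complexity expression can be evaluated directly. The Python sketch below (the function name is hypothetical) computes $(N^2 + N)Kb$ for $N + 1 = 20$ users, $K = 4096$ tones, and $b = 4000$ blocks per second. It gives about $6.2 \times 10^9$ multiplications per second; the quoted figure of roughly $6.5 \times 10^9$ matches $(N + 1)^2 Kb \approx 6.55 \times 10^9$ more closely, so the example appears to count about $(N + 1)^2$ multiplications per tone and block. Either way, the count is several billion per second.

```python
def full_cancellation_mults(n_users, n_tones, block_rate):
    """(N^2 + N) * K * b multiplications per second, with N = n_users - 1
    interferers per user: N multiplications per user, per tone, per block."""
    n_interferers = n_users - 1
    return (n_interferers**2 + n_interferers) * n_tones * block_rate

formula = full_cancellation_mults(20, 4096, 4000)
squared = 20**2 * 4096 * 4000   # (N + 1)^2 K b, for comparison with the text

print(f"{formula:.3e} {squared:.3e}")   # 6.226e+09 6.554e+09
```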

[Figure 2 plot: proportion of total crosstalk (0–1) versus number of largest crosstalkers i (1–7).]
Figure 2: Proportion of crosstalk caused by i largest crosstalkers.

4. CROSSTALK SELECTIVITY

In Figure 1, some crosstalk transfer functions are plotted from a set of measurements of a British Telecom cable consisting of 8 × 0.5 mm pairs. Examining this plot, we can make two observations.

First, from a particular user’s perspective, some crosstalkers cause significant amounts of interference, whilst others cause little interference at all. We refer to this as the space-selectivity of crosstalk, since the crosstalk channels vary significantly between lines. Space-selectivity arises naturally due to the physical layout of binders. A 25-pair binder is depicted in Figure 4. As can be seen, each pair is typically surrounded by 4–5 neighbours. Since electromagnetic coupling decreases rapidly with distance, each pair will experience significant crosstalk from only a few other surrounding pairs within the binder. Naturally, twisted pairs which are close together within a binder group cause each other more crosstalk. The near-far effect also gives rise to space-selectivity: in US transmission, modems which are located closer to the CO cause more crosstalk than those located further away. To illustrate the space-selectivity of crosstalk, we calculated the proportion of total crosstalk energy that is caused by the $i$ largest crosstalkers of user $n$ on tone $k$. All users have identical transmit PSDs; hence, from the perspective of user $n$, crosstalker $m$ is said to be larger than crosstalker $q$ at tone $k$ if $|h_k^{(n,m)}| > |h_k^{(n,q)}|$. The result was averaged across all tones $k$ and every line $n$ within the binder. The measurements were done using the British Telecom cable, and the result is shown in Figure 2. As can be seen, on average approximately 90% of the crosstalk energy is caused by the 4 largest crosstalkers.

[Figure 4 diagram: cross-section of a 25-pair bundle, marking a user of interest and a dominant crosstalker.]
Figure 4: Geometry of a 25-pair bundle.

Second, crosstalk channels vary significantly with frequency. So whilst a user may experience significant crosstalk on one tone, weak crosstalk may be experienced on other tones. We refer to this as the frequency-selectivity of crosstalk, which arises naturally from the frequency-dependent nature of electromagnetic coupling. To illustrate the frequency-selectivity of crosstalk, we calculated the proportion of total crosstalk energy contained within the $i$ worst tones. From the perspective of user $n$ and crosstalker $m$, tone $k$ is said to be worse than tone $l$ if $|h_k^{(n,m)}| > |h_l^{(n,m)}|$. The result is shown in Figure 3. Approximately 90% of the crosstalk is contained within half of the tones.

[Figure 3 plot: proportion of total crosstalk (0–1) versus number of worst tones i (0–3000).]
Figure 3: Proportion of crosstalk contained within i worst tones.

So the effects of crosstalk vary considerably with both space and frequency. Furthermore, the majority of its effects are contained within a relatively small subset of tones and crosstalkers. These observations suggest that we can achieve the majority of the performance gains of crosstalk cancellation by cancelling only the largest crosstalkers on each tone; we refer to this as partial crosstalk cancellation. Some tones will see more significant crosstalkers than others, and we can scale between conventional SUD and full crosstalk cancellation on a tone-by-tone basis. On each tone, we choose the degree of crosstalk cancellation based on the severity of the crosstalk experienced. By cancelling only the largest crosstalkers and by varying the degree of crosstalk cancellation on each tone, partial crosstalk cancellation can approach the performance of full crosstalk cancellation with a fraction of the run-time complexity.

5. PARTIAL CROSSTALK CANCELLATION

5.1. Partial crosstalk canceller structure

We now describe the design of partial crosstalk cancellation in more detail. In the detection of user $n$, we observe the direct line of user $n$ (to recover the signal) and $p_{k,n}$ additional lines (to enable crosstalk cancellation). $p_{k,n}$ varies with both the tone $k$ and the user $n$ to match the severity of crosstalk seen by that user on that tone. Note that $p_{k,n} = N$ corresponds to full crosstalk cancellation, whilst $p_{k,n} = 0$ corresponds to none (i.e., SUD). Define the set of extra observation lines

$$\mathcal{M}_k^n \triangleq \{m_{k,n}(1), \ldots, m_{k,n}(p_{k,n})\} \tag{15}$$

and the corresponding received signals

$$\underline{y}_k^n \triangleq \left[y_k^n,\ y_k^{m_{k,n}(1)} \cdots y_k^{m_{k,n}(p_{k,n})}\right]^T. \tag{16}$$

We also define the set of lines which are not observed in the detection of user $n$ on tone $k$:

$$\bar{\mathcal{M}}_k^n \triangleq \{1, \ldots, n-1, n+1, \ldots, N+1\} \setminus \mathcal{M}_k^n = \{\bar{m}_{k,n}(1), \ldots, \bar{m}_{k,n}(N - p_{k,n})\}, \tag{17}$$

where $\mathcal{A} \setminus \mathcal{B}$ denotes the elements contained in set $\mathcal{A}$ and not in set $\mathcal{B}$. We form an estimate of the transmitted symbol using a linear combination of the received signals on the observation lines only:

$$\hat{x}_k^n = w_k^n \underline{y}_k^n. \tag{18}$$

Note that crosstalk cancellation for user $n$ at tone $k$ now requires only $p_{k,n}$ multiplications per DMT block, in contrast to the $N$ multiplications required for full crosstalk cancellation. This technique has many similarities to hybrid selection/combining from the wireless field [11, 12]. There, selection is also used between receive antennas to reduce run-time complexity and to reduce the number of analog front ends (AFEs) required.

5.2. Partial crosstalk canceller design

We now describe the design of the partial cancellation coefficients $w_k^n$. We begin with a reduced system model which only contains the signals observed in the detection of user $n$ at tone $k$:

$$\underline{y}_k^n = \underline{H}_k^n \underline{x}_k^n + \bar{H}_k^n \bar{x}_k^n + \underline{z}_k^n. \tag{19}$$

$\underline{x}_k^n$ contains the signals transmitted onto the set of observed lines $\{n, \mathcal{M}_k^n\}$:

$$\underline{x}_k^n \triangleq \left[x_k^n,\ x_k^{m_{k,n}(1)} \cdots x_k^{m_{k,n}(p_{k,n})}\right]^T, \tag{20}$$

and $\underline{H}_k^n$ contains the corresponding channels

$$\underline{H}_k^n \triangleq \begin{bmatrix} h_k^{(n,n)} & [H_k]_{\text{row } n,\ \text{cols } \mathcal{M}_k^n} \\ [H_k]_{\text{rows } \mathcal{M}_k^n,\ \text{col } n} & [H_k]_{\text{rows } \mathcal{M}_k^n,\ \text{cols } \mathcal{M}_k^n} \end{bmatrix}, \tag{21}$$

where $[A]_{\text{rows } \mathcal{A},\ \text{cols } \mathcal{B}}$ denotes the submatrix formed from the rows $\mathcal{A}$ and columns $\mathcal{B}$ of matrix $A$. $\bar{x}_k^n$ contains the signals transmitted onto the set of nonobserved lines $\bar{\mathcal{M}}_k^n$:

$$\bar{x}_k^n \triangleq \left[x_k^{\bar{m}_{k,n}(1)} \cdots x_k^{\bar{m}_{k,n}(N - p_{k,n})}\right]^T, \tag{22}$$

and $\bar{H}_k^n$ contains the corresponding channels

$$\bar{H}_k^n \triangleq \begin{bmatrix} [H_k]_{\text{row } n,\ \text{cols } \bar{\mathcal{M}}_k^n} \\ [H_k]_{\text{rows } \mathcal{M}_k^n,\ \text{cols } \bar{\mathcal{M}}_k^n} \end{bmatrix}. \tag{23}$$

$\underline{z}_k^n$ contains the noise seen on the observed lines:

$$\underline{z}_k^n \triangleq \left[z_k^n,\ z_k^{m_{k,n}(1)} \cdots z_k^{m_{k,n}(p_{k,n})}\right]^T. \tag{24}$$

We choose a ZF design, which was shown in Section 3.2 to be a near-optimal transmit/receive structure. The partial cancellation filter is designed to remove all crosstalk from crosstalkers in the set $\mathcal{M}_k^n$:

$$w_k^n \triangleq e_1^H \left(\underline{H}_k^n\right)^{-1}, \tag{25}$$

where $e_n \triangleq [I_{p_{k,n}+1}]_{\text{col } n}$. Hence

$$\hat{x}_k^n = x_k^n + w_k^n \bar{H}_k^n \bar{x}_k^n + w_k^n \underline{z}_k^n. \tag{26}$$

The first term is the transmitted signal, whilst the second and third terms are the residual crosstalk and filtered noise, respectively.
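The structure of (18)–(26) can be sketched in a few lines of Python/NumPy. The function name, the synthetic channel, and the chosen observation set are hypothetical, and indices are 0-based. The sketch forms the reduced channel of (21) by selecting rows and columns $\{n, \mathcal{M}_k^n\}$ of $H_k$, computes the filter (25) by solving a linear system rather than forming an explicit inverse, and checks (26) in the noiseless case: the estimate equals $x_k^n$ plus the residual crosstalk from the unobserved lines, and observing all $N$ other lines ($p_{k,n} = N$) removes the crosstalk entirely.

```python
import numpy as np

def partial_zf_filter(H, n, observed):
    """Partial ZF filter of (25): w_k^n = e_1^H (reduced channel)^-1.

    H        -- full (N+1) x (N+1) channel matrix on one tone
    n        -- index of the user of interest (0-based here)
    observed -- the extra observation lines M_k^n (p_{k,n} indices)
    """
    lines = [n] + list(observed)          # observed set {n, M_k^n}, user n first
    H_obs = H[np.ix_(lines, lines)]       # reduced channel of (21)
    e1 = np.zeros(len(lines))
    e1[0] = 1.0
    # Solve w @ H_obs = e1^T via H_obs^T w^T = e1 instead of inverting H_obs.
    w = np.linalg.solve(H_obs.T, e1)
    return w, lines

rng = np.random.default_rng(1)
n_lines = 6                                # N + 1 = 6 (illustrative)
xt = rng.standard_normal((n_lines, n_lines)) + 1j * rng.standard_normal((n_lines, n_lines))
H = np.diag(rng.uniform(0.5, 1.0, n_lines)) + 0.05 * xt   # synthetic channel
x = rng.standard_normal(n_lines) + 1j * rng.standard_normal(n_lines)
y = H @ x                                  # noiseless received signals at the CO

n = 0
observed = [2, 4]                          # hypothetical M_k^n, p_{k,n} = 2
unobserved = [m for m in range(n_lines) if m != n and m not in observed]
w, lines = partial_zf_filter(H, n, observed)
x_hat = w @ y[lines]                       # (18): combine observed lines only
residual = w @ H[np.ix_(lines, unobserved)] @ x[unobserved]  # second term of (26)

# Full cancellation (p_{k,n} = N) leaves no residual crosstalk.
w_full, lines_full = partial_zf_filter(H, n, [m for m in range(n_lines) if m != n])
x_full = w_full @ y[lines_full]

print(np.isclose(x_hat, x[n] + residual), np.isclose(x_full, x[n]))
```

Solving the transposed system is a common choice because it avoids forming $(\underline{H}_k^n)^{-1}$ explicitly; only the first row of the inverse is needed.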

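6. LINE SELECTION

In DSL, the majority of the crosstalk that a particular user experiences comes from only a few of the other users within the system. We have referred to this effect as the space-selectivity of the crosstalk channel, and we exploit it to reduce the complexity of crosstalk cancellation. In practice, this corresponds to observing only the subset $\mathcal{M}_k^n$ of the lines at the CO when detecting user $n$. In this section, we investigate the optimal choice for the subset $\mathcal{M}_k^n$. Our problem is thus

$$\max_{\mathcal{M}_k^n} c_k^n \quad \text{s.t.} \quad |\mathcal{M}_k^n| \le p_{k,n}, \tag{27}$$

where $|\mathcal{A}|$ denotes the cardinality of set $\mathcal{A}$ and $c_k^n$ is the rate of user $n$ on tone $k$.

6.1. Residual interference

Column-wise diagonal dominance in $H_k$ implies the same in $\underline{H}_k^n$. Hence we can use the decomposition defined in Theorem 1:

$$\underline{H}_k^n = \underline{Q}_k^n \underline{\Sigma}_k^n, \tag{28}$$

where $\underline{Q}_k^n$ is unitary and $\underline{\Sigma}_k^n$ is strictly diagonal dominant. Hence

$$\underline{\Sigma}_k^n \approx \operatorname{diag}\{\underline{\Sigma}_k^n\}. \tag{29}$$

We define $\rho_{k,n}^{(i,j)} \triangleq [\underline{\Sigma}_k^n]_{i,j}$. Since $\underline{\Sigma}_k^n$ is column-wise diagonal dominant,

$$[\underline{H}_k^n]_{\text{col } 1} = \sum_i \rho_{k,n}^{(i,1)} [\underline{Q}_k^n]_{\text{col } i} \approx \rho_{k,n}^{(1,1)} [\underline{Q}_k^n]_{\text{col } 1}. \tag{30}$$

Now, since the diagonal elements of $\underline{\Sigma}_k^n$ are positive, taking the norm of both sides of (30) yields

$$\rho_{k,n}^{(1,1)} \approx \frac{\left\|[\underline{H}_k^n]_{\text{col } 1}\right\|_2}{\left\|[\underline{Q}_k^n]_{\text{col } 1}\right\|_2} \approx |h_k^{(n,n)}|, \tag{31}$$

where we use the column-wise diagonal dominance of $\underline{H}_k^n$ and the observation $[\underline{H}_k^n]_{1,1} = h_k^{(n,n)}$. Hence

$$
\begin{aligned}
w_k^n &= e_1^H \left(\underline{H}_k^n\right)^{-1} \\
&\approx e_1^H \left(\operatorname{diag}\{\underline{\Sigma}_k^n\}\right)^{-1} \left(\underline{Q}_k^n\right)^H \\
&= \left(\rho_{k,n}^{(1,1)}\right)^{-1} [\underline{Q}_k^n]_{\text{col } 1}^H \\
&\approx |h_k^{(n,n)}|^{-1} [\underline{Q}_k^n]_{\text{col } 1}^H.
\end{aligned} \tag{32}
$$

From (30),

$$[\underline{Q}_k^n]_{\text{col } 1}^H \approx \left(\rho_{k,n}^{(1,1)}\right)^{-1} [\underline{H}_k^n]_{\text{col } 1}^H \approx |h_k^{(n,n)}|^{-1} \left[ \left(h_k^{(n,n)}\right)^* \ \left(h_k^{(m_{k,n}(1),n)}\right)^* \cdots \left(h_k^{(m_{k,n}(p_{k,n}),n)}\right)^* \right]. \tag{33}$$

Thus we find

$$w_k^n \approx |h_k^{(n,n)}|^{-2} \left[ \left(h_k^{(n,n)}\right)^* \ \left(h_k^{(m_{k,n}(1),n)}\right)^* \cdots \left(h_k^{(m_{k,n}(p_{k,n}),n)}\right)^* \right]. \tag{34}$$

Using (4), we can make the approximation

$$w_k^n \bar{H}_k^n \approx \left(h_k^{(n,n)}\right)^* |h_k^{(n,n)}|^{-2} \left[\bar{H}_k^n\right]_{\text{row } 1}, \tag{35}$$

hence the residual interference is

$$w_k^n \bar{H}_k^n \bar{x}_k^n \approx \left(h_k^{(n,n)}\right)^* |h_k^{(n,n)}|^{-2} \sum_{m \in \bar{\mathcal{M}}_k^n} h_k^{(n,m)} x_k^m. \tag{36}$$

The power of the residual interference is thus

$$E\left\{ w_k^n \bar{H}_k^n \bar{x}_k^n \left(w_k^n \bar{H}_k^n \bar{x}_k^n\right)^H \right\} \approx |h_k^{(n,n)}|^{-2} \sum_{m \in \bar{\mathcal{M}}_k^n} |h_k^{(n,m)}|^2 s_k^m. \tag{37}$$

6.2. Filtered noise

Using (4) and (33), we can make the approximation

$$w_k^n \underline{z}_k^n \approx \left(h_k^{(n,n)}\right)^* |h_k^{(n,n)}|^{-2} z_k^n. \tag{38}$$

The power of the filtered noise is thus

$$E\left\{ w_k^n \underline{z}_k^n \left(w_k^n \underline{z}_k^n\right)^H \right\} \approx |h_k^{(n,n)}|^{-2} \sigma_k^2. \tag{39}$$

6.3. SINR after partial crosstalk cancellation

After crosstalk cancellation, we have the following estimate of the transmitted signal:

$$\hat{x}_k^n = x_k^n + w_k^n \bar{H}_k^n \bar{x}_k^n + w_k^n \underline{z}_k^n. \tag{40}$$

The signal-to-interference-plus-noise ratio (SINR) at the input of the decision device is thus

$$\mathrm{SINR}_k^n \approx \frac{|h_k^{(n,n)}|^2 s_k^n}{\sum_{m \in \bar{\mathcal{M}}_k^n} |h_k^{(n,m)}|^2 s_k^m + \sigma_k^2}, \tag{41}$$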

with the approximation becoming exact in strongly column-wise diagonal dominant channels. There are two interesting observations to make at this point. First, as we expected, the ZF crosstalk canceller perfectly removes the crosstalk caused by the modems in the set $\mathcal{M}_k^n$. Second, and more surprisingly, the ZF crosstalk canceller does not change the statistics of the crosstalk caused by modems outside the set $\mathcal{M}_k^n$. It also does not change the statistics of the noise. So the column-wise diagonal dominance of $\mathbf{H}_k$ ensures that a ZF partial crosstalk canceller will not enhance either the crosstalk caused by modems outside $\mathcal{M}_k^n$ or the noise.
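The approximation (41) can be checked numerically. The NumPy sketch below is our own illustration and is not part of the original analysis. It draws a hypothetical column-wise diagonal dominant channel for $N + 1 = 8$ lines, with unit-magnitude direct channels and crosstalk couplings between −46 dB and −40 dB. It then computes the ZF partial canceller $\mathbf{w}_k^n = \mathbf{e}_1^H(\mathbf{H}_k^n)^{-1}$ exactly and compares its SINR with (41). The sketch assumes flat unit transmit PSDs, spatially white background noise of power $\sigma_k^2$ on every line, and the indexing convention `H[i, j]` = coupling from user `j` into line `i`. The helper names are hypothetical.

```python
import numpy as np


def exact_sinr(H, s, sigma2, n, observed):
    """SINR of the ZF partial canceller w = e_1^H (H_k^n)^{-1} for user n on one tone."""
    idx = [n] + sorted(observed)                      # line n first: [H_k^n]_{1,1} = h^(n,n)
    rest = [m for m in range(H.shape[0]) if m not in idx]
    w = np.linalg.inv(H[np.ix_(idx, idx)])[0, :]      # first row of (H_k^n)^{-1}
    resid = np.sum(np.abs(w @ H[np.ix_(idx, rest)]) ** 2 * s[rest])
    noise = sigma2 * np.sum(np.abs(w) ** 2)           # white noise of power sigma2 per line
    return s[n] / (resid + noise)                     # ZF passes x_k^n with unit gain


def approx_sinr(H, s, sigma2, n, observed):
    """Approximation (41): crosstalk from outside M and the noise pass through unchanged."""
    rest = [m for m in range(H.shape[0]) if m != n and m not in observed]
    interference = np.sum(np.abs(H[n, rest]) ** 2 * s[rest])
    return np.abs(H[n, n]) ** 2 * s[n] / (interference + sigma2)


def random_phase(rng, shape):
    return np.exp(2j * np.pi * rng.random(shape))


rng = np.random.default_rng(0)
L = 8                                                 # N + 1 = 8 lines, as in Section 9
H = rng.uniform(0.005, 0.01, (L, L)) * random_phase(rng, (L, L))   # weak crosstalk
np.fill_diagonal(H, random_phase(rng, L))            # unit-magnitude direct channels
s, sigma2, n = np.ones(L), 1e-5, 0

strength = np.abs(H[n]) ** 2 * s                      # received crosstalk power, as in (42)
order = [int(m) for m in np.argsort(-strength) if m != n]
for p in (0, 3, L - 1):
    M = order[:p]
    print(p, exact_sinr(H, s, sigma2, n, M) / approx_sinr(H, s, sigma2, n, M))
```

With no lines observed ($p = 0$), the two expressions coincide exactly, because the canceller reduces to a scalar FEQ. With 3 or 7 observed lines, the printed ratios stay close to 1. This is because the terms neglected in (35) and (38) are products of two small crosstalk coefficients.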

$$\mathcal{M}_k^n = \left\{ q_{k,n}(1), \ldots, q_{k,n}(c) \right\}, \quad \forall n, k$$

Algorithm 1: Line selection only.

6.4. Line selection algorithm

Maximizing $\text{SINR}_k^n$, and thus rate $c_k^n$, corresponds to minimizing the residual interference from the crosstalkers outside the set $\mathcal{M}_k^n$. Note that we assume a sufficient number of noise sources and crosstalkers such that the background noise and residual interference are approximately Gaussian. So, to maximize rate $c_k^n$, we simply choose $\mathcal{M}_k^n$ to contain the largest crosstalkers of user $n$ on tone $k$. Define the indices of the crosstalkers of user $n$ on tone $k$, sorted in order of crosstalk strength:

$$\left\{ q_{k,n}(1), \ldots, q_{k,n}(N) \right\} \quad \text{s.t.} \quad \left| h_k^{(n,q_{k,n}(i))} \right|^2 s_k^{q_{k,n}(i)} \ge \left| h_k^{(n,q_{k,n}(i+1))} \right|^2 s_k^{q_{k,n}(i+1)}, \quad q_{k,n}(i) \neq n, \quad \forall i. \tag{42}$$

Remark 1 (optimal line selection). In column-wise diagonal dominant channels, the set $\mathcal{M}_k^n$ that maximizes the rate of user $n$ on tone $k$ subject to a complexity constraint of $p_{k,n}$ multiplications/DMT block (see optimization in (27)) is

$$\mathcal{M}_k^n = \left\{ q_{k,n}(1), \ldots, q_{k,n}(p_{k,n}) \right\}. \tag{43}$$

Proof. Follows from examination of (41). The numerator does not depend on $\mathcal{M}_k^n$, so the SINR is maximized by removing the $p_{k,n}$ largest terms from the denominator.

At this point, we can propose a simple approach to partial crosstalk cancellation: Algorithm 1. Assume we operate under a complexity limit of $cK$ multiplications/DMT block/user,

$$\sum_k \left| \mathcal{M}_k^n \right| \le cK, \quad \forall n. \tag{44}$$

This corresponds to $c$ times the complexity of a conventional frequency domain equalizer (FEQ) as currently implemented in VDSL modems. In this algorithm, we simply cancel the $c$ largest crosstalkers on each tone, hence

$$p_{k,n} = c, \quad \forall n, k. \tag{45}$$

The reduction in run-time complexity from this algorithm comes from space-selectivity only. Since the degree of partial cancellation stays constant across all tones, this algorithm cannot exploit the frequency-selectivity of the crosstalk channel. As we will see, this leads to suboptimal performance compared with algorithms that exploit both space- and frequency-selectivity. The advantage of this algorithm is its simplicity. The algorithm requires only $O(KN)$ multiplications and $K$ sorting operations of $N$ values to initialize the partial crosstalk canceller for one user. Here we define initialization complexity as the complexity of determining $\mathcal{M}_k^n$ for all $k$. Initialization complexity does not include the actual calculation of the crosstalk cancellation parameters $\mathbf{w}_k^n$ for each tone. That calculation requires $O\big(\sum_k (p_{k,n} + 1)^3\big)$ multiplications for user $n$, regardless of the partial cancellation algorithm employed. We assume that the direct and crosstalk channel gains $|h_k^{(n,m)}|^2$ for all $n, m, k$ are available and do not need to be calculated. The initialization complexity (in terms of multiplications and logarithm operations per user) of the different partial cancellation algorithms is listed in Table 1. The required number of sort operations of each size is listed in Table 2. For a given $c$, all algorithms have equal run-time complexity.

7. TONE SELECTION

In the previous section, we presented Algorithm 1 for partial crosstalk cancellation. This algorithm exploits the space-selectivity of the crosstalk channel, that is, the fact that crosstalk varies significantly between different lines. Crosstalk coupling also varies significantly with frequency, and this can also be exploited to reduce run-time complexity. At low frequencies, crosstalk coupling is minimal, so we would expect minimal gains from crosstalk cancellation. At higher frequencies, on the other hand, crosstalk coupling can be severe. However, at high frequencies the direct channel attenuation is high, so the channel can only support minimal bit-loading even in the absence of crosstalk. This limits the potential gains of crosstalk cancellation. The largest gains from crosstalk cancellation will be experienced at intermediate frequencies, and this is where most of the run-time complexity should be allocated. Define the rate achieved by user $n$ on tone $k$ when the $p_{k,n}$ largest crosstalkers are cancelled as

$$r_{k,n}(p_{k,n}) \triangleq \log\left(1 + \frac{1}{\Gamma} \frac{\left| h_k^{(n,n)} \right|^2 s_k^n}{\sum_{i=p_{k,n}+1}^{N} \left| h_k^{(n,q_{k,n}(i))} \right|^2 s_k^{q_{k,n}(i)} + \sigma_k^2}\right). \tag{46}$$

Define the gain of full crosstalk cancellation ($p_{k,n} = N$)

$$g_{k,n} \triangleq r_{k,n}(N) - r_{k,n}(0) \tag{47}$$

and the indices of the tones ordered by this gain

$$\left\{ k_n(1), \ldots, k_n(K) \right\} \quad \text{s.t.} \quad g_{k_n(i),n} \ge g_{k_n(i+1),n}, \quad \forall i. \tag{48}$$

Note that by operating on a logarithmic scale, $g_{k,n}$ can be calculated by dividing the arguments of the logarithms in $r_{k,n}(N)$ and $r_{k,n}(0)$. We can now define another partial crosstalk cancellation algorithm: Algorithm 2. This algorithm simply employs full crosstalk cancellation on the $cK/N$ tones with the largest gain and no cancellation on all other tones. This leads to a run-time complexity of $cK$ multiplications/DMT block/user. Note that in this algorithm, $p_{k,n}$ is restricted to take only the values 0 or $N$. As a result, it is not possible to cancel only the largest crosstalkers, and this algorithm cannot exploit space-selectivity. The initialization complexity of this algorithm is $O(KN)$ multiplications and one sort of size $K$, per user.
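Algorithms 1 and 2 can be sketched for a single user as follows. The sketch assumes that the power gains $|h_k^{(n,m)}|^2$ are available as nested lists `G[k][n][m]`, the PSDs as `s[k][m]`, and the noise powers as `sigma2[k]`. The logarithm in (46) is taken to base 2, and the SNR gap Γ is a free parameter. The 3-user, 4-tone channel is hypothetical and was chosen only to make the selections easy to follow. All function names are illustrative.

```python
import math


def sorted_crosstalkers(G, s, k, n):
    """q_{k,n}(1), ..., q_{k,n}(N) of (42): other users by received crosstalk power on tone k."""
    others = [m for m in range(len(G[k])) if m != n]
    return sorted(others, key=lambda m: G[k][n][m] * s[k][m], reverse=True)


def rate(G, s, sigma2, gamma, k, n, p):
    """r_{k,n}(p) of (46), in bits: the p largest crosstalkers are cancelled."""
    residual = sum(G[k][n][m] * s[k][m] for m in sorted_crosstalkers(G, s, k, n)[p:])
    return math.log2(1 + G[k][n][n] * s[k][n] / (gamma * (residual + sigma2[k])))


def line_selection(G, s, n, c):
    """Algorithm 1: observe the c largest crosstalkers of user n on every tone."""
    return [set(sorted_crosstalkers(G, s, k, n)[:c]) for k in range(len(G))]


def tone_selection(G, s, sigma2, gamma, n, c):
    """Algorithm 2: full cancellation on the cK/N tones with the largest gain (47)."""
    K, N = len(G), len(G[0]) - 1
    gain = [rate(G, s, sigma2, gamma, k, n, N) - rate(G, s, sigma2, gamma, k, n, 0)
            for k in range(K)]
    # Floor division: assumes cK/N is (close to) an integer number of tones.
    chosen = set(sorted(range(K), key=lambda k: gain[k], reverse=True)[: c * K // N])
    everyone_else = set(range(N + 1)) - {n}
    return [set(everyone_else) if k in chosen else set() for k in range(K)]


# Hypothetical 3-user (N = 2), 4-tone channel; only row n = 0 matters below.
row0 = [[1.0, 1e-3, 1e-5], [0.5, 1e-2, 1e-5], [0.1, 1e-3, 1e-2], [1e-3, 2e-4, 1e-4]]
G = [[r, [1e-4, 1.0, 1e-4], [1e-4, 1e-4, 1.0]] for r in row0]
s = [[1.0] * 3 for _ in range(4)]          # flat transmit PSDs
sigma2 = [1e-3] * 4

print(line_selection(G, s, n=0, c=1))
print(tone_selection(G, s, sigma2, gamma=1.0, n=0, c=1))
```

Both calls spend the same $cK = 4$ multiplications. Line selection removes the strongest crosstalker on every tone, including tone 3, where the direct channel is so weak that little is gained. Tone selection instead concentrates full cancellation on tones 1 and 2, where (47) is largest.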

$$\mathcal{M}_k^n = \left\{ m : (m,k) \in \left\{ d_n(1), \ldots, d_n(cK) \right\} \right\}$$

Algorithm 3: Simple tone-line selection.

Table 1: Initialization complexity (multiplications and logarithm operations) of partial crosstalk cancellation algorithms (per user).

| Scheme | Mults. (formula) | Logs (formula) | Mults. (N = 7, K = 4096) | Logs (N = 7, K = 4096) | Mults. (N = 99, K = 4096) | Logs (N = 99, K = 4096) |
|---|---|---|---|---|---|---|
| Line selection only | $KN$ | 0 | $29\times10^{3}$ | 0 | $0.4\times10^{6}$ | 0 |
| Tone selection only | $K(N + 5)$ | 0 | $49\times10^{3}$ | 0 | $0.4\times10^{6}$ | 0 |
| Simple joint selection | $3K(N + 1)$ | 0 | $98\times10^{3}$ | 0 | $1.2\times10^{6}$ | 0 |
| Optimal joint selection | $K(0.5N^2 + 2.5N + 3)$ | $K(N + 1)$ | $184\times10^{3}$ | $33\times10^{3}$ | $21.1\times10^{6}$ | $0.4\times10^{6}$ |

Table 2: Initialization complexity (sort operations) of partial cancellation algorithms (per user).

| Scheme | Sort size $N$ | Sort size $K$ | Sort size $KN$ |
|---|---|---|---|
| Line selection only | $K$ | 0 | 0 |
| Tone selection only | 0 | 1 | 0 |
| Simple joint selection | 0 | 0 | 1 |
| Optimal joint selection | 0 | 0 | $KN$ |
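The rounded entries in Table 1 follow directly from the closed-form expressions. The short helper below is hypothetical and not code from the paper. It evaluates those expressions for $K = 4096$ so the tabulated numbers can be checked. It uses the identity $K(0.5N^2 + 2.5N + 3) = K(N+2)(N+3)/2$ to keep the arithmetic in integers.

```python
def init_complexity(N, K):
    """Per-user initialization cost from Table 1 as (multiplications, logarithms)."""
    return {
        "Line selection only": (K * N, 0),
        "Tone selection only": (K * (N + 5), 0),
        "Simple joint selection": (3 * K * (N + 1), 0),
        # K(0.5 N^2 + 2.5 N + 3) = K (N + 2)(N + 3) / 2, always an integer
        "Optimal joint selection": (K * (N + 2) * (N + 3) // 2, K * (N + 1)),
    }


for N in (7, 99):
    for scheme, (mults, logs) in init_complexity(N, K=4096).items():
        print(f"N={N:<3} {scheme:<24} mults={mults:>11,} logs={logs:>8,}")
```

For example, the optimal scheme with $N = 99$ needs $4096 \cdot 5151 = 21{,}098{,}496$ multiplications, which matches the $21.1\times10^{6}$ entry.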

$$\mathcal{M}_k^n = \begin{cases} \{1, \ldots, n-1, n+1, \ldots, N+1\}, & k \in \{k_n(1), \ldots, k_n(cK/N)\}, \\ \emptyset, & \text{otherwise.} \end{cases}$$

Algorithm 2: Tone selection only.

8. JOINT TONE-LINE SELECTION

In Sections 6 and 7, we described partial cancellation algorithms that exploit only one form of selectivity in the crosstalk channel. To achieve the maximum reduction in run-time complexity, it is necessary to exploit both space- and frequency-selectivity. We should adapt the degree of crosstalk cancellation done on each tone, $p_{k,n}$, to match the potential gains. In practice, this means that we allow $p_{k,n}$ to take on values other than 0 and $N$, whilst also allowing $p_{k,n}$ to vary from tone to tone.

8.1. Simple joint tone-line selection

As we saw in Section 6.3, observing the direct line of a crosstalker allows us to remove the crosstalk it causes to the user being detected. Hence line selection is equivalent to choosing which subset of crosstalkers we desire to cancel. When combined with tone selection, our problem is effectively to choose which (crosstalker, tone) pairs to cancel in the detection of a certain user. The rate improvement from cancelling a particular crosstalker on a particular tone depends on the other crosstalkers that will be cancelled on that tone. As such, there is an inherent coupling in crosstalker selection, which greatly complicates matters [13]. In this algorithm, we remove this coupling by ignoring the effect of other crosstalkers in the system. This greatly simplifies (crosstalker, tone) pair selection with only a small performance penalty, as will be demonstrated in Section 9. Define the gain of cancelling crosstalker $m$ on tone $k$ in the detection of user $n$, in the absence of all other crosstalkers, as

$$g_{k,n}(m) \triangleq \log\left(1 + \frac{\left| h_k^{(n,n)} \right|^2 s_k^n}{\Gamma \sigma_k^2}\right) - \log\left(1 + \frac{\left| h_k^{(n,n)} \right|^2 s_k^n}{\Gamma\left(\left| h_k^{(n,m)} \right|^2 s_k^m + \sigma_k^2\right)}\right). \tag{49}$$

Note that if we work on a logarithmic scale, $g_{k,n}(m)$ can be calculated by simply dividing the arguments of each log function. Define the (crosstalker, tone) pair $d_n(i) \triangleq (m_n(i), k_n(i))$ and its corresponding gain $g^n(d_n(i)) \triangleq g_{k_n(i),n}(m_n(i))$. This allows us to define the indices of (crosstalker, tone) pairs ordered by gain:

$$\left\{ d_n(1), \ldots, d_n(KN) \right\} \quad \text{s.t.} \quad g^n\big(d_n(i)\big) \ge g^n\big(d_n(i+1)\big), \quad \forall i. \tag{50}$$

We can now define our simplified joint tone-line selection algorithm: Algorithm 3. In the detection of user $n$, we observe the direct line of crosstalker $m$ on tone $k$ if the pair

$$(m, k) \in \left\{ d_n(1), \ldots, d_n(cK) \right\}. \tag{51}$$
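Algorithm 3 reduces to a single sort over all $KN$ (crosstalker, tone) pairs. The sketch below is a minimal, hypothetical illustration. It assumes power gains in nested lists `G[k][n][m]` $= |h_k^{(n,m)}|^2$, PSDs `s[k][m]`, noise powers `sigma2[k]`, base-2 logarithms in (49), and a made-up 3-user, 4-tone channel.

```python
import math


def pair_gain(G, s, sigma2, gamma, k, n, m):
    """g_{k,n}(m) of (49): rate gain from cancelling crosstalker m alone on tone k."""
    sig = G[k][n][n] * s[k][n] / gamma
    return (math.log2(1 + sig / sigma2[k])
            - math.log2(1 + sig / (G[k][n][m] * s[k][m] + sigma2[k])))


def simple_joint_selection(G, s, sigma2, gamma, n, c):
    """Algorithm 3: observe the lines of the cK best (crosstalker, tone) pairs, (50)-(51)."""
    K, users = len(G), len(G[0])
    pairs = [(m, k) for k in range(K) for m in range(users) if m != n]
    pairs.sort(key=lambda d: pair_gain(G, s, sigma2, gamma, d[1], n, d[0]), reverse=True)
    M = [set() for _ in range(K)]
    for m, k in pairs[: c * K]:
        M[k].add(m)                       # (m, k) is among the cK highest-gain pairs
    return M


# Hypothetical 3-user (N = 2), 4-tone channel; only row n = 0 matters below.
row0 = [[1.0, 1e-3, 1e-5], [0.5, 1e-2, 1e-5], [0.1, 1e-3, 1e-2], [1e-3, 2e-4, 1e-4]]
G = [[r, [1e-4, 1.0, 1e-4], [1e-4, 1e-4, 1.0]] for r in row0]
s = [[1.0] * 3 for _ in range(4)]
sigma2 = [1e-3] * 4

print(simple_joint_selection(G, s, sigma2, gamma=1.0, n=0, c=1))
```

On this toy channel, full cancellation on the two tones with the largest gain (47) would spend all four multiplications on tones 1 and 2. Algorithm 3 instead spends one multiplication each on tones 0 and 1, where a single crosstalker dominates, and two on tone 2.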

This leads to a run-time complexity of $cK$ multiplications/DMT block/user. The benefit of this algorithm is its low complexity. Pair selection for one user has a complexity of $O(KN)$ multiplications and one sort of size $KN$. Furthermore, this algorithm exploits both the space- and frequency-selectivity of the crosstalk channel, allowing it to cancel the largest crosstalkers on the tones where they do the most harm. In Section 9, we will see that this algorithm leads to near-optimal performance.

8.2. Optimum joint tone-line selection

It is interesting to evaluate the suboptimality of the algorithms described so far through an upper bound achieved by a truly optimal partial cancellation algorithm. The problem of partial cancellation is effectively a resource allocation problem. Given $cK$ multiplications per user, we need to distribute these across tones such that the largest rate is achieved:

$$\max_{\{\mathcal{M}_k^n\}_{k=1,\ldots,K}} \sum_k c_k^n \quad \text{s.t.} \quad \sum_k \left| \mathcal{M}_k^n \right| \le cK. \tag{52}$$

Since the channel is column-wise diagonal dominant, Remark 1 allows us to determine, in a simple fashion, the best set of lines to observe in the detection of user $n$. Hence our problem simplifies to

$$\max_{\{p_{k,n}\}_{k=1,\ldots,K}} \sum_k c_k^n \quad \text{s.t.} \quad \sum_k p_{k,n} \le cK. \tag{53}$$

An exhaustive search could require us to evaluate up to $N^K$ different allocations. In VDSL, $K = 4096$, which makes any such search numerically intractable. Due to the structure of the problem, it is possible to construct a greedy algorithm, Algorithm 4, which iteratively finds the optimal allocation for some values of $c$. The algorithm cannot find a solution for an arbitrary value of $c$; however, the values of $c$ generated by the algorithm are so closely spaced that this is not a practical problem. Define the value of cancelling $p$ crosstalkers on tone $k$ as

$$v_{k,n}(p) = \frac{r_{k,n}(p) - r_{k,n}(0)}{p}. \tag{54}$$

Recall that $r_{k,n}(p)$ is the rate achieved by user $n$ on tone $k$ when the $p$ largest crosstalkers are cancelled, and is evaluated using (46). Value is the increase in rate (benefit) divided by the increase in run-time complexity (cost). It measures the increase in bit rate per multiplication when $p$ multiplications are spent on tone $k$. The algorithm begins by initializing $v_{k,n}(p)$ for all values of $p$ and $k$. It then proceeds as follows:

(1) Find the tone $k$ and number of cancelled crosstalkers $p$ with the largest value $v_{k,n}(p)$. Store this in $(k_s, p_s)$.
(2) Set the lines to be observed on tone $k_s$ to $\mathcal{M}_{k_s}^n = \{q_{k_s,n}(1), \ldots, q_{k_s,n}(p_s)\}$.
(3) Set the value of cancelling $p_s$ or fewer crosstalkers on tone $k_s$ to zero. This prevents reselection of previously selected pairs.
(4) Update the value of cancelling $p_s + 1$ or more crosstalkers on tone $k_s$. The rate increase and cost should be relative to the currently selected number of crosstalkers.

The algorithm iterates through steps (1)–(4) until the allocated complexity exceeds $cK$. This yields an upper bound on the partial crosstalk cancellation performance for a given complexity. Since the algorithm allocates at most $N$ multiplications in each iteration, the total allocated complexity will be at most $cK + N$. With $K = 4096$, typically $cK \gg N$. Hence the difference between the desired run-time complexity and that of the solution provided by the algorithm is minimal, and the upper bound is tight. Like Algorithm 3, this algorithm can exploit both the space- and frequency-selectivity of crosstalk to reduce run-time complexity. The resource allocation this algorithm generates at the end of each iteration is optimal. That is, of all resource allocations of equal run-time complexity, the one generated by this algorithm achieves the highest rate. Unfortunately, this algorithm is considerably more complex than Algorithm 3. Pair selection for a single user requires $O(KN^2)$ multiplications and $O(KN)$ logarithm operations. It is hard to define the exact sorting complexity, since it varies significantly with the scenario. Sorting complexity is typically much higher than for any of the other algorithms, and can require up to $KN$ sort operations with sizes as large as $KN$.

Initialize $v_{k,n}(p) = \big(r_{k,n}(p) - r_{k,n}(0)\big)/p$, $\forall k,\ p > 0$
Repeat
  $(k_s, p_s) = \arg\max_{(k,p)} v_{k,n}(p)$
  $\mathcal{M}_{k_s}^n = \{q_{k_s,n}(1), \ldots, q_{k_s,n}(p_s)\}$
  $v_{k_s,n}(p) = 0$, $p = 1, \ldots, p_s$
  $v_{k_s,n}(p) = \big(r_{k_s,n}(p) - r_{k_s,n}(p_s)\big)/(p - p_s)$, $p = p_s + 1, \ldots, N$
While $\sum_k |\mathcal{M}_k^n| < cK$

Algorithm 4: Optimal tone-line selection.
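The greedy loop of Algorithm 4 can be sketched in a few lines of Python. The sketch takes a precomputed table `r[k][p]` $= r_{k,n}(p)$ for $p = 0, \ldots, N$ and returns $p_{k,n}$ for each tone. By Remark 1, the observed lines are then $q_{k,n}(1), \ldots, q_{k,n}(p_{k,n})$. The rate table is hypothetical. The early exit when no remaining choice has positive value is our own safeguard; it is not a step of the algorithm box. All names are illustrative.

```python
def greedy_tone_line_selection(r, budget):
    """Sketch of Algorithm 4 for one user; r[k][p] = r_{k,n}(p), budget = cK."""
    K, N = len(r), len(r[0]) - 1
    p_sel = [0] * K
    # value[k][p] = v_{k,n}(p) of (54); index 0 is unused.
    value = [[0.0] + [(r[k][p] - r[k][0]) / p for p in range(1, N + 1)] for k in range(K)]
    while sum(p_sel) < budget:
        # Step (1): tone and number of crosstalkers with the largest value.
        ks, ps = max(((k, p) for k in range(K) for p in range(1, N + 1)),
                     key=lambda kp: value[kp[0]][kp[1]])
        if value[ks][ps] <= 0:
            break                           # safeguard: nothing left increases the rate
        p_sel[ks] = ps                      # step (2): cancel q(1), ..., q(ps) on tone ks
        for p in range(1, ps + 1):          # step (3): prevent reselection
            value[ks][p] = 0.0
        for p in range(ps + 1, N + 1):      # step (4): value relative to ps
            value[ks][p] = (r[ks][p] - r[ks][ps]) / (p - ps)
    return p_sel


# Hypothetical rate table r[k][p] (bits) for K = 3 tones and N = 3 crosstalkers.
r = [
    [5.0, 5.0, 5.0, 5.0],   # cancellation does not help on this tone
    [2.0, 4.0, 4.5, 4.6],   # one dominant crosstalker (space-selective)
    [1.0, 1.2, 1.5, 4.0],   # equal crosstalkers: the gain appears only near p = N
]
print(greedy_tone_line_selection(r, budget=4))
```

The first iteration cancels the dominant crosstalker on tone 1 (value 2.0 bits per multiplication). The second cancels all three crosstalkers on tone 2 (value 1.0), giving a total rate of 13.0 bits. Note that a budget of 2 also ends at `[0, 1, 3]`. This reproduces the overshoot of at most $N$ multiplications discussed above.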

8.3. Complexity distribution between users

So far we have limited the run-time complexity of detecting each user to $cK$, such that

$$\sum_k \left| \mathcal{M}_k^n \right| \le cK, \quad \forall n. \tag{55}$$

If crosstalk cancellation of all lines in a binder is integrated into a single processing module at the CO, then multiplications can be shared between users. That is, the true constraint is on the total complexity of crosstalk cancellation for all users:

$$\sum_n \sum_k \left| \mathcal{M}_k^n \right| \le cK(N+1). \tag{56}$$

The available complexity can be divided between users based on our desired rates for each. Denote the number of multiplications/DMT block allocated to user $n$ as $\kappa_n$; then

$$\kappa_n = \mu_n\, cK(N+1) \quad \text{s.t.} \quad \sum_n \mu_n = 1. \tag{57}$$

Here $\mu_n$ is a parameter that determines the proportion of computing resources allocated to user $n$. This allows us to view partial cancellation as a resource allocation problem not just across tones but across users as well. Given a fixed number of multiplications, we must divide them between users based on the desired rate of each user. In a similar fashion to work done in multiuser power allocation (see, e.g., [14, 15]), we can define a rate region as the set of all achievable rate tuples under a given total complexity constraint. This allows us to visualise the different trade-offs that can be achieved between the rates of different users inside a binder. Limiting crosstalk cancellation on each tone to the users who benefit the most leads to further reductions in run-time complexity with minimal performance loss. This is demonstrated in Section 9.2.
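As a worked instance of (57), the sketch below computes per-user budgets for a binder of $N + 1 = 8$ users split into four near-end and four far-end users. The class shares $\mu_{\text{near}}$ and $\mu_{\text{far}}$ are divided equally within each class, as in the near-far simulations of Section 9.2. The particular value $\mu_{\text{far}} = 0.25$ and the helper name are hypothetical.

```python
def per_user_budgets(mu, c, K):
    """kappa_n = mu_n c K (N + 1) multiplications/DMT block, eq. (57)."""
    if abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("the shares mu_n must sum to one")
    total = c * K * len(mu)              # len(mu) = N + 1 users in the binder
    return [m_n * total for m_n in mu]


mu_far = 0.25                            # hypothetical share of the far-end class
mu_near = 1.0 - mu_far
mu = [mu_near / 4] * 4 + [mu_far / 4] * 4   # equal split inside each class of 4 users
kappa = per_user_budgets(mu, c=2, K=4096)
print(kappa)
```

Each near-end user receives $2\mu_{\text{near}} cK = 12288$ multiplications/DMT block and each far-end user $2\mu_{\text{far}} cK = 4096$. The total equals $cK(N+1) = 65536$, so the constraint (56) is met with equality.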

9. PERFORMANCE

We now compare the performance of the partial crosstalk cancellation algorithms described in Sections 6, 7, and 8. Performance is compared over a range of scenarios with crosstalk channels that exhibit both space- and frequency-selectivity. As we show, the ability to exploit both space- and frequency-selectivity is essential for achieving low run-time complexity in all scenarios. We use semi-empirical transfer functions from the ETSI VDSL standards [9]. Note that in these channel models, each user sees identical crosstalk channels to all crosstalkers of equal line length. That is, the variation of crosstalk channel attenuation with the distance between lines within the binder is not modelled. When a binder consists of lines of varying length, the model does capture the near-far effect: all users will see the modems located closest to the CO (near-end) as the largest sources of crosstalk. On the other hand, when a binder consists of lines of equal length, all users will see equal crosstalk from all other users, so there will be no space-selectivity in the crosstalk channel model. In reality, we would expect more space-selectivity than these channel models contain. Hence we can expect the reduction in run-time complexity to be even larger than that shown here. The number of lines in the binder is always 8, so $N = 7$. Other simulation parameters are listed in Table 3.

Table 3: Simulation parameters.

| Parameter | Value |
|---|---|
| Number of DMT tones | 4096 |
| Tone width | 4.3125 kHz |
| Symbol rate | 4 kHz |
| Coding gain | 3 dB |
| Noise margin | 6 dB |
| Symbol error probability | $< 10^{-7}$ |
| Transmit PSD | Flat −60 dBm/Hz |
| FDD band plan | 998 |
| Cable type | 0.5 mm (24-Gauge) |
| Source/load resistance | 135 Ohm |
| Alien crosstalk | ETSI type A [9] |

9.1. Equidistant lines (8 × 1000 m)

In the first scenario, the binder contains 8 × 1000 m lines. Since the lines are of equal length, the crosstalk channels exhibit frequency-selectivity only; no space-selectivity is present. Figure 5 shows the rates achieved by each of the algorithms versus run-time complexity. Complexity is shown as a percentage relative to full crosstalk cancellation ($c = N$). Algorithm 1 can only exploit space-selectivity. There is no space-selectivity in this scenario, so this algorithm gives extremely poor performance. Worst of all, we actually see a nonconvex rate versus run-time complexity curve, so partial crosstalk cancellation gives worse performance than time sharing. In other words, we could do full crosstalk cancellation for some fraction of the time and none for the rest, and this would lead to better performance than Algorithm 1 with the same run-time complexity.

[Figure 5 plot: data rate (Mbps) versus run-time complexity (%); curves: line selection, tone selection, simple joint selection, optimal joint selection.]
Figure 5: Data rate versus run-time complexity (equidistant lines).

The reason for this is as follows: as we increase the number of cancelled crosstalkers $p_{k,n}$, the increase in signal-to-interference ratio (SIR) grows rapidly. We illustrate this with the following example. Consider a binder with 7 crosstalkers. We assume that the crosstalkers all have identical crosstalk channels $\chi_k^{(n)}$ to user $n$, as is the case in our simulation. Cancelling the first crosstalker causes the SIR to increase from $\tfrac{1}{7}|h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$ to $\tfrac{1}{6}|h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$. Cancelling the sixth crosstalker gives a much larger SIR increase, from $\tfrac{1}{2}|h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$ to $|h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$. In general, cancelling the $p$th crosstalker leads to an SIR increase of $\big[(N-p)^{-1} - (N-p+1)^{-1}\big] |h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$. So the increase in SIR grows rapidly with $p$ as $p \to N$. Recall that $c_k^n = \log(1 + \text{SINR}_k^n) \approx \text{SINR}_k^n$ for low $\text{SINR}_k^n$. So when crosstalkers have equal strength and the SINR is low, the data-rate gain grows rapidly with the number of cancelled crosstalkers $p$. This is why cancelling $N$ crosstalkers typically gives more than $N$ times the data-rate gain of cancelling one crosstalker, which leads to the nonconvex rate-complexity curve of Figure 5. When the channel exhibits space-selectivity, the first crosstalker causes much more interference than the second, and so on. This effect counteracts the rapid growth of SIR with $p$. As a result, the best trade-off between performance and complexity usually occurs somewhere between no and full crosstalk cancellation.

Algorithm 2 cannot exploit space-selectivity. In this scenario, this is not a problem, since all crosstalkers have equal strength. Algorithm 2 can implement a form of frequency sharing. This is analogous to the time sharing just discussed and allows this algorithm to cancel, for example, 6 crosstalkers on half of the tones instead of 3 crosstalkers on all of the tones. For this reason, Algorithm 2 will always give a convex rate versus complexity curve. Comparing the performance of Algorithm 2 to the optimal algorithm, Algorithm 4,

we see that it gives near-optimal performance in this scenario. Algorithm 3 also gives near-optimal performance. Note that with 29% of the complexity of full crosstalk cancellation, we can achieve 89% of the performance gains.

[Figure 6 plot: near-end data rate (Mbps) versus run-time complexity (%); curves: line selection, tone selection, simple joint selection, optimal joint selection.]
Figure 6: Near-end data rate versus run-time complexity.

[Figure 7 plot: far-end data rate (Mbps) versus run-time complexity (%); curves: line selection, tone selection, simple joint selection, optimal joint selection.]
Figure 7: Far-end data rate versus run-time complexity.
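The equal-crosstalker example of Section 9.1 can be reproduced with exact fractions. The sketch below is our own illustration. It measures SIR in units of $|h_k^{(n,n)}|^2 |\chi_k^{(n)}|^{-2}$ and assumes equal transmit PSDs. For the rate part, it assumes a hypothetical low-SINR operating point, $\text{SINR}(p) = a / \big((N - p) + b\big)$, with made-up constants `a` (direct-to-crosstalk ratio) and `b` (noise-to-crosstalk ratio), and base-2 logarithms.

```python
import math
from fractions import Fraction


def sir(p, N):
    """SIR after cancelling p of N equal crosstalkers, in units of |h|^2 |chi|^-2 (p < N)."""
    return Fraction(1, N - p)


N = 7
increase = [sir(p, N) - sir(p - 1, N) for p in range(1, N)]
print(increase)   # SIR gain from cancelling the 1st, 2nd, ..., 6th crosstalker


def rate(p, N, a, b):
    """log2(1 + SINR) with SINR = a / ((N - p) + b); a and b are hypothetical constants."""
    return math.log2(1 + a / ((N - p) + b))


a, b = 0.1, 0.5                                   # assumed low-SINR operating point
gain_one = rate(1, N, a, b) - rate(0, N, a, b)
gain_all = rate(N, N, a, b) - rate(0, N, a, b)
print(gain_all / gain_one)                        # compare with N = 7
```

The SIR increments grow from 1/42 to 1/2, so cancelling the sixth crosstalker buys 21 times the SIR gain of cancelling the first. At this low-SINR point, cancelling all seven crosstalkers yields far more than seven times the rate gain of cancelling one. This is the mechanism behind the nonconvex curve of Algorithm 1 in Figure 5.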

9.2. Near-far scenario (4 × 300 m, 4 × 1200 m)

We now evaluate the selection algorithms in a binder consisting of 4 × 300 m loops and 4 × 1200 m loops. In this configuration, the lines suffer from the near-far effect, causing all users to see the 300 m near-end lines as the largest sources of crosstalk. This space-selectivity helps the partial cancellation algorithms reduce run-time complexity. Frequency-selectivity is present in this scenario and is most pronounced on far-end lines. Near-end lines have relatively flat channels and benefit less from algorithms that exploit frequency-selectivity alone. Figure 6 shows the rates of the 300 m near-end users versus complexity under the different algorithms. Figure 7 shows the same for the 1200 m far-end users. Algorithm 1 cannot exploit frequency-selectivity. On near-end lines, frequency-selectivity is minimal, and reasonable performance is still achieved. Again we see a nonconvex rate-complexity curve; however, above 43% complexity, Algorithm 1 gives near-optimal performance. On far-end users, frequency-selectivity is pronounced, and Algorithm 1 gives poor performance. Algorithm 2 cannot exploit space-selectivity, and on near-end users this leads to poor performance that is virtually identical to time sharing. On far-end users, frequency-selectivity is pronounced, and this algorithm still achieves reasonable performance despite its inability to exploit space-selectivity. Algorithm 3 can exploit both space- and frequency-selectivity. As a result, it gives near-optimal performance for both near- and far-end users. With 43% complexity, this algorithm can achieve 99% of the performance gains on near-end users. On far-end users, 29% complexity achieves 97% of the performance gains.

We now examine the distribution of run-time complexity between users, as described in Section 8.3. Figure 8 shows the achievable rate regions under varying complexities $c$ using Algorithm 3. The rate region was constructed by dividing multiplications between the two classes of near-end and far-end users. All users within a class receive an equal number of multiplications: $2\mu_{\text{near}} cK$ and $2\mu_{\text{far}} cK$ multiplications per DMT block for the near-end and far-end users, respectively. By varying the parameter $\mu_{\text{far}}$, we can trace out the boundary of the rate region. Note that $\mu_{\text{near}} = 1 - \mu_{\text{far}}$. We see in Figure 8 that with $c = 2$ (29% of the run-time complexity of full crosstalk cancellation), we can achieve the majority of the operating points within the rate region. In Figure 9, the achievable rate regions of the different partial cancellation algorithms are compared for $c = 2$. Note the considerably larger rate region achieved by exploiting both space- and frequency-selectivity in Algorithms 3 and 4.

9.3. Distributed scenario (300 : 100 : 1000 m)

Simulations were run in a distributed scenario consisting of 8 lines ranging from 300 m to 1000 m in 100 m increments. Algorithm 3 exhibited near-optimal performance and could increase the average rate from 9.7 Mbps to 23.7 Mbps with only 29% of the complexity of full crosstalk cancellation. This is equivalent to 2 times the complexity of a conventional FEQ. We have seen that the performance of algorithms that exploit only one type of selectivity, such as Algorithms 1 and 2, varies considerably with the scenario. By exploiting both space- and frequency-selectivity, Algorithm 3 consistently gave near-optimal performance. This algorithm is also considerably less complex than the optimal algorithm, Algorithm 4.

[Figure 8 plot: far-end data rate (Mbps) versus near-end data rate (Mbps); rate-region boundaries for no cancellation (c = 0), c = 1, c = 2, and full cancellation (c = 7).]
Figure 8: Achievable rate regions versus complexity (simple joint selection algorithm).

[Figure 9 plot: far-end data rate (Mbps) versus near-end data rate (Mbps); rate regions for full cancellation, line selection, tone selection, simple joint selection, and optimal joint selection.]
Figure 9: Achievable rate regions of different algorithms (c = 2).

10. CONCLUSIONS

Crosstalk is the limiting factor in VDSL performance. Many crosstalk cancellation techniques have been proposed, and these lead to significant performance gains. Unfortunately, crosstalk cancellation has a high run-time complexity, and this grows rapidly with the number of users in a binder. Crosstalk channels in the DSL environment exhibit both space- and frequency-selectivity: most of the effects of crosstalk are limited to a small number of crosstalkers and tones. Partial crosstalk cancellation exploits this by performing crosstalk cancellation only on the tones and lines where it gives the most benefit. This allows it to come close to the performance of full crosstalk cancellation with considerably reduced run-time complexity.

In this paper, we presented several partial crosstalk cancellation algorithms for upstream transmission. Designing a partial crosstalk canceller requires us to choose which lines to observe when detecting each user on each tone. This is equivalent to choosing the (crosstalker, tone) pairs to cancel in the detection of each user. We described different algorithms for choosing pairs. These included simple algorithms such as Algorithm 1, which exploits space-selectivity only, and Algorithm 2, which exploits frequency-selectivity only. In Section 9, we saw that the performance of these two algorithms varies greatly with the scenario. Robust performance requires us to exploit both space- and frequency-selectivity together. We presented an optimal algorithm (Algorithm 4) for partial crosstalk cancellation. Whilst this algorithm is highly complex, its ability to exploit both space- and frequency-selectivity led to good performance in all scenarios. Partial crosstalk canceller initialization for one user in this algorithm requires $O(KN^2)$ multiplications and $O(KN)$ logarithms. We also described a simple joint selection algorithm (Algorithm 3), which decouples the problem of (crosstalker, tone) pair selection, thereby reducing initialization complexity significantly. This algorithm gave near-optimal performance in all of the scenarios we evaluated and has an initialization complexity of only $O(KN)$ multiplications per user. With Algorithm 3, it is possible to increase the average rate from 9.7 to 23.7 Mbps using only 2 times the run-time complexity of a conventional single-user detector (SUD), that is, a frequency domain equalizer (FEQ) as currently implemented in VDSL modems. With this complexity, the algorithm achieves 89% of the performance gains of full crosstalk cancellation.

By treating computational complexity as a resource to be divided across tones and users, we developed rate regions in Section 9. These allow us to visualize all of the achievable rate tuples under a given run-time complexity constraint. This is similar to work done in the area of multiuser power allocation (see, e.g., [14, 15]); however, here we consider the allocation of computing resources rather than transmit power. Whilst this paper has focused on crosstalk cancellation in VDSL, the techniques here are also applicable to MIMO-CDMA systems. Taking into account the processing gain, the interference path typically has 15–20 dB more attenuation than the main path [16]. Hence the MIMO-CDMA channel is column-wise diagonal dominant, and the partial crosstalk cancellation techniques developed here can be applied directly.

In this work, we have considered crosstalk cancellation, which is applicable only to upstream DSL, where receivers are colocated at the CO. In downstream DSL, it is also possible to mitigate the effects of crosstalk through crosstalk precompensation [3, 6]. The development of partial crosstalk precompensation algorithms with reduced run-time complexity is the subject of ongoing research. The simulations here neglected the problem of power loading and assumed flat transmit PSDs. The use of nonflat PSDs through multiuser water filling or power back-off is currently the subject of much activity in the research community (see, e.g., [14, 15, 17, 18]). Nonflat PSDs increase space- and frequency-selectivity and would allow partial cancellation to achieve even greater run-time complexity reductions whilst maintaining similar performance. The combination of multiuser power allocation and partial cancellation will lead to even larger achievable rates with implementable run-time complexities. This is an important area for future work.

1532

EURASIP Journal on Applied Signal Processing

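APPENDIX

PROOF OF THEOREM 1

Sort the diagonal elements of $H_k$ in order of decreasing magnitude:
$$\bigl(t_k(1),\ldots,t_k(N+1)\bigr)\ \text{s.t.}\ \bigl|h_k^{(t_k(i),t_k(i))}\bigr| \ge \bigl|h_k^{(t_k(i+1),t_k(i+1))}\bigr|,\quad \forall i. \qquad\text{(A.1)}$$
We define the permutation matrix
$$\Pi_k \triangleq \bigl[e_{t_k(1)} \cdots e_{t_k(N+1)}\bigr], \qquad\text{(A.2)}$$
where $e_i$ denotes the $i$th column of the identity matrix $I_{N+1}$. We use $\Pi_k$ to reorder the rows and columns of $H_k$:
$$\tilde H_k = \Pi_k^H H_k \Pi_k. \qquad\text{(A.3)}$$
Define $\tilde h_k^{(n,m)} \triangleq [\tilde H_k]_{n,m}$. From (A.3),
$$\tilde h_k^{(n,m)} = h_k^{(t_k(n),t_k(m))}. \qquad\text{(A.4)}$$
Using (A.1) yields
$$\bigl|\tilde h_k^{(n,n)}\bigr| \ge \bigl|\tilde h_k^{(m,m)}\bigr|, \quad \forall m > n. \qquad\text{(A.5)}$$
So application of $\Pi_k$ reorders the rows and columns of $H_k$ such that its diagonal elements are in decreasing order of magnitude. Define $\alpha$, which measures the degree of column-wise diagonal dominance of the channel:
$$\alpha \triangleq \max_{n\ne m}\arctan\frac{\bigl|h_k^{(n,m)}\bigr|}{\bigl|h_k^{(m,m)}\bigr|} = \max_{n\ne m}\arctan\frac{\bigl|\tilde h_k^{(n,m)}\bigr|}{\bigl|\tilde h_k^{(m,m)}\bigr|}. \qquad\text{(A.6)}$$
Hence
$$\bigl|h_k^{(n,m)}\bigr|^2 \le \tan^2\alpha\,\bigl|h_k^{(m,m)}\bigr|^2, \quad \forall n\ne m. \qquad\text{(A.7)}$$
Using the QR decomposition,
$$\tilde H_k = \tilde Q_k\tilde R_k. \qquad\text{(A.8)}$$
We define the QR decomposition such that $\tilde R_k$ has positive values on the diagonal; this is without loss of generality. From [3],
$$\bigl|\tilde h_k^{(n,n)}\bigr|\sqrt{1-4\alpha^2} \le \bigl|\tilde r_k^{(n,n)}\bigr| \le \bigl|\tilde h_k^{(n,n)}\bigr|\sqrt{1+N\tan^2\alpha}, \qquad\text{(A.9)}$$
where $\tilde r_k^{(n,m)} \triangleq [\tilde R_k]_{n,m}$. Now, since $\tilde Q_k$ is unitary,
$$\bigl\|[\tilde R_k]_{\text{col }m}\bigr\|^2 = \bigl\|[\tilde H_k]_{\text{col }m}\bigr\|^2, \quad \forall m, \qquad\text{(A.10)}$$
which implies
$$\begin{aligned}
\bigl|\tilde r_k^{(n,m)}\bigr|^2 &= \bigl\|[\tilde H_k]_{\text{col }m}\bigr\|^2 - \bigl|\tilde r_k^{(m,m)}\bigr|^2 - \sum_{i\notin\{n,m\}}\bigl|\tilde r_k^{(i,m)}\bigr|^2\\
&\le \bigl|\tilde h_k^{(m,m)}\bigr|^2 + \sum_{i\ne m}\bigl|\tilde h_k^{(i,m)}\bigr|^2 - \bigl|\tilde r_k^{(m,m)}\bigr|^2\\
&\le \bigl|\tilde h_k^{(m,m)}\bigr|^2\bigl(1+N\tan^2\alpha\bigr) - \bigl|\tilde r_k^{(m,m)}\bigr|^2\\
&\le \bigl|\tilde r_k^{(m,m)}\bigr|^2\left(\frac{1+N\tan^2\alpha}{1-4\alpha^2}-1\right)\\
&= \bigl|\tilde r_k^{(m,m)}\bigr|^2\,\frac{4\alpha^2+N\tan^2\alpha}{1-4\alpha^2}, \quad \forall n\ne m,
\end{aligned}\qquad\text{(A.11)}$$
where we use (A.6) to get from line 2 to 3 and the lower bound in (A.9) to get from line 3 to 4. Hence
$$\bigl|\tilde r_k^{(n,m)}\bigr|^2 \le f_1(\alpha)\bigl|\tilde r_k^{(m,m)}\bigr|^2, \quad \forall n\ne m, \qquad\text{(A.12)}$$
where
$$f_1(\alpha) \triangleq \frac{4\alpha^2+N\tan^2\alpha}{1-4\alpha^2}. \qquad\text{(A.13)}$$
From (A.11),
$$\bigl|\tilde h_k^{(m,m)}\bigr|^2\bigl(1+N\tan^2\alpha\bigr) \ge \bigl|\tilde r_k^{(m,m)}\bigr|^2 + \bigl|\tilde r_k^{(n,m)}\bigr|^2 \ge \bigl|\tilde r_k^{(m,m)}\bigr|^2, \quad \forall n\ne m, \qquad\text{(A.14)}$$
such that
$$\bigl|\tilde h_k^{(m,m)}\bigr|^2 \ge \frac{\bigl|\tilde r_k^{(m,m)}\bigr|^2}{1+N\tan^2\alpha}. \qquad\text{(A.15)}$$
Now
$$\bigl\|[\tilde R_k]_{\text{col }n}\bigr\|^2 = \bigl\|[\tilde H_k]_{\text{col }n}\bigr\|^2 \qquad\text{(A.16)}$$
implies
$$\begin{aligned}
\bigl|\tilde r_k^{(n,n)}\bigr|^2 &= \bigl\|[\tilde H_k]_{\text{col }n}\bigr\|^2 - \sum_{m\ne n}\bigl|\tilde r_k^{(m,n)}\bigr|^2\\
&\ge \bigl|\tilde h_k^{(n,n)}\bigr|^2 - \sum_{m\ne n}\bigl|\tilde r_k^{(m,n)}\bigr|^2\\
&\ge \bigl|\tilde h_k^{(n,n)}\bigr|^2 - \bigl|\tilde r_k^{(n,n)}\bigr|^2 N f_1(\alpha),
\end{aligned}\qquad\text{(A.17)}$$
where we use (A.12) to get from line 2 to 3. So
$$\bigl|\tilde r_k^{(n,n)}\bigr|^2 \ge \frac{\bigl|\tilde h_k^{(n,n)}\bigr|^2}{1+Nf_1(\alpha)}.$$
Thus
$$\begin{aligned}
\bigl|\tilde r_k^{(n,n)}\bigr|^2 &\ge \frac{\bigl|\tilde h_k^{(n,n)}\bigr|^2}{1+Nf_1(\alpha)}\\
&\ge \frac{\bigl|\tilde h_k^{(m,m)}\bigr|^2}{1+Nf_1(\alpha)}\\
&\ge \frac{\bigl|\tilde r_k^{(m,m)}\bigr|^2}{\bigl(1+Nf_1(\alpha)\bigr)\bigl(1+N\tan^2\alpha\bigr)}\\
&\ge \frac{\bigl|\tilde r_k^{(n,m)}\bigr|^2}{f_1(\alpha)\bigl(1+Nf_1(\alpha)\bigr)\bigl(1+N\tan^2\alpha\bigr)}, \quad \forall m>n,
\end{aligned}\qquad\text{(A.18)}$$
where we use (A.5) to get from line 1 to 2, (A.15) from 2 to 3, and (A.12) from 3 to 4. Since $\tilde R_k$ is upper triangular, $\tilde r_k^{(n,m)} = 0$ for all $m<n$, which implies
$$\bigl|\tilde r_k^{(n,n)}\bigr|^2 \ge \bigl|\tilde r_k^{(n,m)}\bigr|^2 = 0, \quad \forall m<n. \qquad\text{(A.19)}$$
Combining (A.18) and (A.19),
$$\bigl|\tilde r_k^{(n,m)}\bigr|^2 \le f_2(\alpha)\bigl|\tilde r_k^{(n,n)}\bigr|^2, \quad \forall n\ne m, \qquad\text{(A.20)}$$
where
$$f_2(\alpha) \triangleq f_1(\alpha)\bigl(1+Nf_1(\alpha)\bigr)\bigl(1+N\tan^2\alpha\bigr). \qquad\text{(A.21)}$$
Now
$$H_k = \Pi_k\tilde H_k\Pi_k^H = \bigl(\Pi_k\tilde Q_k\Pi_k^H\bigr)\bigl(\Pi_k\tilde R_k\Pi_k^H\bigr) = Q_k\Sigma_k, \qquad\text{(A.22)}$$
where we exploit the fact that $\Pi_k$ is unitary. We have defined
$$Q_k \triangleq \Pi_k\tilde Q_k\Pi_k^H, \qquad \Sigma_k \triangleq \Pi_k\tilde R_k\Pi_k^H. \qquad\text{(A.23)}$$
Since $\Pi_k$ and $\tilde Q_k$ are unitary, $Q_kQ_k^H = I_{N+1}$, and hence $Q_k$ is unitary. Note that, unlike $\tilde R_k$, $\Sigma_k$ is not in general upper triangular. Define the inverse permutation order
$$\bigl(v_k(1),\ldots,v_k(N+1)\bigr)\ \text{s.t.}\ h_k^{(n,m)} = \tilde h_k^{(v_k(n),v_k(m))}. \qquad\text{(A.24)}$$
Define $\rho_k^{(n,m)} \triangleq [\Sigma_k]_{n,m}$. Comparing (A.3) and (A.23), (A.24) implies
$$\rho_k^{(n,m)} = \tilde r_k^{(v_k(n),v_k(m))}. \qquad\text{(A.25)}$$
Using (A.12),
$$\bigl|\rho_k^{(n,m)}\bigr|^2 = \bigl|\tilde r_k^{(v_k(n),v_k(m))}\bigr|^2 \le f_1(\alpha)\bigl|\tilde r_k^{(v_k(m),v_k(m))}\bigr|^2, \quad \forall v_k(n)\ne v_k(m). \qquad\text{(A.26)}$$
Thus
$$\bigl|\rho_k^{(n,m)}\bigr|^2 \le f_1(\alpha)\bigl|\rho_k^{(m,m)}\bigr|^2, \quad \forall n\ne m. \qquad\text{(A.27)}$$
For small $\alpha$, $f_1(\alpha)\ll 1$, hence
$$\bigl|\rho_k^{(m,m)}\bigr| \gg \bigl|\rho_k^{(n,m)}\bigr|, \quad \forall n\ne m. \qquad\text{(A.28)}$$
So column-wise diagonal dominance in $H_k$ implies column-wise diagonal dominance in $\Sigma_k$. Similarly, using (A.20),
$$\bigl|\rho_k^{(n,m)}\bigr|^2 = \bigl|\tilde r_k^{(v_k(n),v_k(m))}\bigr|^2 \le f_2(\alpha)\bigl|\tilde r_k^{(v_k(n),v_k(n))}\bigr|^2, \quad \forall v_k(n)\ne v_k(m). \qquad\text{(A.29)}$$
Thus
$$\bigl|\rho_k^{(n,m)}\bigr|^2 \le f_2(\alpha)\bigl|\rho_k^{(n,n)}\bigr|^2, \quad \forall n\ne m. \qquad\text{(A.30)}$$
For small $\alpha$, $f_2(\alpha)\ll 1$, hence
$$\bigl|\rho_k^{(n,n)}\bigr| \gg \bigl|\rho_k^{(n,m)}\bigr|, \quad \forall n\ne m. \qquad\text{(A.31)}$$
So column-wise diagonal dominance in $H_k$ implies row-wise diagonal dominance in $\Sigma_k$. Combining (A.28) and (A.31) leads to strict diagonal dominance in $\Sigma_k$. Since the diagonal elements of $\tilde R_k$ are positive, the diagonal elements of $\Sigma_k$ are also positive. Thus $H_k$ can be decomposed as
$$H_k = Q_k\Sigma_k, \qquad\text{(A.32)}$$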

where $Q_k$ is unitary and $\Sigma_k$ is strictly diagonal dominant with positive diagonal elements.

ACKNOWLEDGMENTS

This work was carried out within the framework of IUAP P5/22, Dynamical Systems and Control: Computation, Identification, and Modeling, and P5/11, Mobile Multimedia Communication Systems and Networks; the Concerted Research Action GOA-MEFISTO-666, Mathematical Engineering for Information and Communication Systems Technology; and FWO Project G.0196.02, Design of Efficient Communication Techniques for Wireless Time-Dispersive Multiuser MIMO Systems. It was partially sponsored by Alcatel Bell.
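The decomposition (A.32) in the appendix can be checked numerically on a small example. The pure-Python sketch below is an illustration only, not the authors' code: the helper names `matmul`, `qr_positive`, `q_sigma_decomposition`, and `dominance_factors` are hypothetical, and the test matrix (its diagonal values and the off-diagonal scale 0.01) is an arbitrary assumption. It sorts the diagonal as in (A.1)–(A.4), computes a QR decomposition with a positive real diagonal by modified Gram–Schmidt, maps $\tilde R_k$ back through the inverse permutation as in (A.25), and evaluates $\alpha$, $f_1(\alpha)$, and $f_2(\alpha)$ from (A.6), (A.13), and (A.21).

```python
import cmath
import math


def matmul(A, B):
    """Product of two complex matrices stored as lists of rows."""
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def qr_positive(A):
    """Modified Gram-Schmidt QR of a square complex matrix with diag(R) > 0."""
    n = len(A)
    q_cols = []
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            r = sum(q[p].conjugate() * v[p] for p in range(n))
            R[i][j] = r
            v = [v[p] - r * q[p] for p in range(n)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        R[j][j] = complex(norm)  # positive real diagonal, as assumed after (A.8)
        q_cols.append([x / norm for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def q_sigma_decomposition(H):
    """Return (Q, Sigma) with H = Q Sigma, following (A.1)-(A.25)."""
    n = len(H)
    # t: indices ordered by decreasing |diagonal|, as in (A.1)
    t = sorted(range(n), key=lambda i: -abs(H[i][i]))
    v = [0] * n
    for pos, idx in enumerate(t):
        v[idx] = pos  # inverse permutation v_k, as in (A.24)
    H_tilde = [[H[t[a]][t[b]] for b in range(n)] for a in range(n)]  # (A.4)
    Q_tilde, R_tilde = qr_positive(H_tilde)  # (A.8)
    # Pi X Pi^H has (a, b) entry X[v(a)][v(b)]: this gives (A.23) and (A.25)
    Q = [[Q_tilde[v[a]][v[b]] for b in range(n)] for a in range(n)]
    Sigma = [[R_tilde[v[a]][v[b]] for b in range(n)] for a in range(n)]
    return Q, Sigma


def dominance_factors(H):
    """alpha from (A.6), f1 from (A.13), f2 from (A.21), with N + 1 = len(H)."""
    n = len(H)
    N = n - 1
    alpha = max(math.atan(abs(H[a][b]) / abs(H[b][b]))
                for a in range(n) for b in range(n) if a != b)
    tan2 = math.tan(alpha) ** 2
    f1 = (4 * alpha ** 2 + N * tan2) / (1 - 4 * alpha ** 2)
    f2 = f1 * (1 + N * f1) * (1 + N * tan2)
    return alpha, f1, f2


# Hypothetical 4x4 channel: unequal diagonal, weak off-diagonal crosstalk.
diag = [1.0, 2.0, 0.5, 1.5]
H = [[diag[a] if a == b else 0.01 * cmath.exp(1j * (a + 2 * b)) for b in range(4)]
     for a in range(4)]
Q, Sigma = q_sigma_decomposition(H)
alpha, f1, f2 = dominance_factors(H)
QS = matmul(Q, Sigma)
print("max |H - Q Sigma| =", max(abs(QS[a][b] - H[a][b]) for a in range(4) for b in range(4)))
print("alpha =", alpha, "f1 =", f1, "f2 =", f2)
```

On this example the off-diagonal entries of $\Sigma_k$ respect both the column-wise bound (A.27) and the row-wise bound (A.30); a strongly crosstalk-dominated matrix (large $\alpha$) would not be expected to satisfy them.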

REFERENCES

[1] G. Taubock and W. Henkel, “MIMO systems in the subscriber-line network,” in Proc. 5th International OFDM Workshop, pp. 18.1–18.3, Hamburg, Germany, September 2000.

[2] R. Cendrillon, M. Moonen, R. Suciu, and G. Ginis, “Simplified power allocation and TX/RX structure for MIMO-DSL,” in Proc. IEEE Global Telecommunications Conference, vol. 4, pp. 1842–1846, San Francisco, Calif, USA, December 2003.

[3] G. Ginis and J. Cioffi, “Vectored transmission for digital subscriber line systems,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 5, pp. 1085–1104, 2002.

[4] W. Yu and J. Cioffi, “Multiuser detection in vector multiple access channels using generalized decision feedback equalization,” in Proc. 5th International Conference on Signal Processing, World Computer Congress, Beijing, China, August 2000.

[5] H. Dai and V. Poor, “Turbo multiuser detection for coded DMT VDSL systems,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 2, pp. 351–362, 2002.

[6] R. Cendrillon and M. Moonen, “Improved linear crosstalk precompensation for downstream VDSL,” in preparation.

[7] C. Zeng and J. Cioffi, “Crosstalk cancellation in ADSL systems,” in Proc. IEEE Global Telecommunications Conference, vol. 1, pp. 344–348, San Antonio, Tex, USA, November 2001.

[8] K. Cheong, W. Choi, and J. Cioffi, “Multiuser soft interference canceler via iterative decoding for DSL applications,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 2, pp. 363–371, 2002.

[9] ETSI, Transmission and Multiplexing (TM); Access transmission systems on metallic access cables; Very high speed Digital Subscriber Line (VDSL); Part 1: Functional requirements, ETSI Std. TS 101 270-1, Rev. V.1.3.1, 2003.

[10] G. Forney and M. Eyuboglu, “Combined equalization and coding using precoding,” IEEE Communications Magazine, vol. 29, no. 12, pp. 25–34, 1991.

[11] D. Gore and A. Paulraj, “Space-time block coding with optimal antenna selection,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 4, pp. 2441–2444, Salt Lake City, Utah, USA, May 2001.

[12] R. Cendrillon, M. Moonen, D. Gore, and A. Paulraj, “Low complexity crosstalk cancellation through line selection in upstream VDSL,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 4, pp. 692–695, Hong Kong, China, April 2003.

[13] R. Cendrillon, M. Moonen, G. Ginis, K. V. Acker, T. Bostoen, and P. Vandaele, “Partial crosstalk cancellation exploiting line and tone selection in upstream VDSL,” in Proc. Baiona Workshop on Signal Processing in Communications, Baiona, Spain, September 2003.

[14] R. Cendrillon, M. Moonen, J. Verlinden, T. Bostoen, and W. Yu, “Optimal multi-user spectrum management for digital subscriber lines,” in 2004 IEEE International Conference on Communications, vol. 1, pp. 1–5, Paris, France, June 2004.

[15] W. Yu, G. Ginis, and J. Cioffi, “Distributed multiuser power control for digital subscriber lines,” IEEE Journal on Selected Areas in Communications, vol. 20, no. 5, pp. 1105–1115, 2002.

[16] S. Chung, Digital transmission techniques for frequency selective Gaussian interference channels, Ph.D. dissertation, Stanford University, Stanford, Calif, USA, 2003.

[17] R. Cendrillon, M. Moonen, and R. Suciu, “Simplified power allocation for the DSL multi-access channel through columnwise diagonal dominance,” in Proc. IEEE 3rd International Symposium on Image and Signal Processing and Analysis, vol. 2, pp. 634–638, Rome, Italy, September 2003.

[18] R. Cendrillon, O. Rousseaux, M. Moonen, E. Van den Bogaert, and J. Verlinden, “Waterfilling in MIMO systems with power constraints on each transmitter,” in Proc. Symposium on Information Theory in the Benelux, De Koningshof, Veldhoven, The Netherlands, May 2003.

Raphael Cendrillon was born in Melbourne, Australia, in 1978. He received the Bachelor of Electrical Engineering degree (with first-class honours) from the University of Queensland, Australia, in 1999. He is currently pursuing the Ph.D. degree in electrical engineering at the Katholieke Universiteit Leuven, Belgium. His current research interests include DSL systems, multiuser information theory, signal processing, and MIMO techniques. His work is done in close collaboration with Alcatel Research and Innovation, Belgium, for which he was awarded the Alcatel Bell Award in 2004. He also received an IEEE Travel Grant in 2003 and the KU Leuven Bursary for Advanced Foreign Students in 2004.

Marc Moonen received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 2004, he has been a Full Professor at the Electrical Engineering Department of Katholieke Universiteit Leuven, where he currently heads a research team of 16 Ph.D. candidates and postdocs working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing.
He received the 1994 KU Leuven Research Council Award, the 1997 Alcatel Bell (Belgium) Award (with Piet Vandaele), and was a 1997 “Laureate of the Belgium Royal Academy of Science.” He was the Chairman of the IEEE Benelux Signal Processing Chapter (1998–2002), and is currently a EURASIP AdCom Member (European Association for Signal, Speech and Image Processing, from 2000 till now). He is Editor-in-Chief for the “EURASIP Journal on Applied Signal Processing” (from 2003 till now), and a Member of the Editorial Board of “Integration, the VLSI Journal,” “IEEE Transactions on Circuits and Systems II” (2002–2003), “EURASIP Journal on Wireless Communications and Networking,” and “IEEE Signal Processing Magazine.”

George Ginis received the Diploma in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 1997, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, Calif, in 1998 and 2002, respectively. Currently, he is with the Broadband Communications Group of Texas Instruments, San Jose, Calif. His research interests include multiuser transmission theory, interference mitigation, and their application to wireline and wireless communications.

Katleen Van Acker was born in Belgium in 1973. She received the electrical engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1996 and 2001, respectively. She is currently working at Alcatel Research and Innovation. Her research interests are in signal processing for digital communications.

Tom Bostoen received the M.S. degree in physical engineering from Ghent University, Ghent, Belgium, in 1998. Since 1998, he has been with the Research & Innovation Department of Alcatel, Antwerpen, Belgium. His research interests include time-domain reflectometry (TDR), multiuser transmission theory, interference mitigation, and their applications to digital subscriber line (DSL) technology.

Piet Vandaele was born in Diksmuide, Belgium, in 1972. He received the electrical engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Belgium, in 1995 and 1999, respectively. In 2000, he joined Alcatel Bell, Antwerp, Belgium. Currently, he is working in the Network Strategy Group on DSL Access and Edge Network evolution. Dr. Vandaele received the Alcatel Bell Award (with M. Moonen) in 1997 and the IEEE Signal Processing Society 2002 Young Author Best Paper Award (with G. Leus) in 2003.


EURASIP Journal on Applied Signal Processing 2004:10, 1536–1545
© 2004 Hindawi Publishing Corporation

Performance Analysis of Multiple-Symbol Differential Detection for OFDM over Both Time- and Frequency-Selective Rayleigh Fading Channels

Akira Ishii
Department of Communications and Systems, University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan

Hideki Ochiai
Division of Physics, Electrical and Computer Engineering, Yokohama National University, 79-1 Tokiwadai, Hodogaya-ku, Yokohama 240-8501, Japan

Tadashi Fujino
Department of Communications and Systems, University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan

Received 28 February 2003; Revised 8 October 2003

The performance of an orthogonal frequency-division multiplexing (OFDM) system with multiple-symbol differential detection (MSDD) is analyzed over both time- and frequency-selective Rayleigh fading channels. The optimal decision metrics of time-domain MSDD (TD-MSDD) and frequency-domain MSDD (FD-MSDD) are derived by calculating the exact covariance matrix under the assumption that the guard time is longer than the delay spread, so that no effective intersymbol interference (ISI) occurs. Since calculating the exact covariance matrix turns out to be computationally demanding for FD-MSDD, we also develop a suboptimal metric based on a simplified covariance matrix. The comparative analysis of TD-MSDD and FD-MSDD suggests that the most significant improvement is achieved by FD-MSDD with the optimal metric and a large symbol observation interval, since the time selectivity of the channel has a dominant effect on the bit error rate of the OFDM system.

Keywords and phrases: orthogonal frequency-division multiplexing, multiple-symbol differential detection, time- and frequency-selective channels, Rayleigh fading.

1. INTRODUCTION

In mobile communications systems, there has been a growing demand for high-data-rate services such as video phone, high-quality digital distribution of music, and digital television terrestrial broadcasting (DTTB) [1]. In such systems, the delay spread of the channel becomes a major impairment, since it may cause severe intersymbol interference (ISI). It is well known that orthogonal frequency-division multiplexing (OFDM), which transmits the information symbols in parallel over a number of spectrally overlapping but temporally orthogonal subchannels [2], is an effective technique to combat ISI. With a guard interval longer than the maximum delay spread of the channel, OFDM can effectively avoid ISI with high spectral efficiency and reasonable complexity. However, the time-selective nature of the channel due to the Doppler shift also results in a loss of orthogonality among subcarriers, causing considerable interchannel interference (ICI) [3]. When the time selectivity of the channel becomes severe, that is, when both the amplitude and the phase of the received signal vary rapidly, reliable estimation of the channel state information (CSI) becomes challenging. In such cases, differential detection (DD) in combination with OFDM may lead to a simple receiver structure, eliminating the need for complex channel estimation. In general, however, DD suffers a performance penalty compared to coherent detection with perfect CSI over an additive white Gaussian noise (AWGN) channel. In order to reduce this gap between coherent detection and conventional DD, multiple-symbol differential detection (MSDD) was introduced for M-ary phase-shift keying (MPSK) signals over the AWGN channel in [4]. By making a joint decision on a block of $N_M$ consecutive information symbols based on $N_M+1$ received samples, as opposed to conventional symbol-by-symbol detection, MSDD can asymptotically achieve the performance of the coherent detector. Since conventional DD and MSDD rely on the channel impulse response being time-invariant over adjacent symbols, their performance is considerably degraded when the channel is time selective, which results in an irreducible error floor. To cope with this time variance, MSDD was modified in [5, 6]; its decision metric utilizes the covariance matrix conditioned on the transmitted information symbol sequence.

For OFDM, DD can be applied in the time domain, in the frequency domain, or in both. Because of the long symbol duration, the performance of time-domain DD (TD-DD) is expected to be affected mainly by time-selective fading. On the other hand, the performance of frequency-domain DD (FD-DD) also depends on the frequency selectivity of the channel associated with the delay spread [7, 8]. In [8, 9, 10], the bit error rate (BER) performance of TD-DD and FD-DD was analyzed theoretically over time- and frequency-selective Rayleigh fading channels, including the effects of the ISI caused by a delay spread longer than the guard time. In [11], the performance of MSDD with coded modulation was studied in terms of channel capacity over quasistatic Rayleigh fading channels in an OFDM scenario with ideal interleaving. In this paper, the performance of MSDD combined with OFDM is analyzed over time- and frequency-selective Rayleigh fading channels.
Assuming that the guard time is longer than the delay spread, we derive the optimal decision metrics. Furthermore, we study the theoretical BER performance of MSDD for OFDM by extending the result of [6]. Our approach is based on the truncated union bound, which retains only the dominant terms of the pairwise error probability (PEP) in the union bound. Based on these analytical results, we compare TD-MSDD and FD-MSDD in terms of their irreducible BER at high signal-to-noise ratio (SNR).

The paper is organized as follows. After describing the system model used throughout the paper in Section 2, we present the proposed metrics of TD-MSDD and FD-MSDD in Section 3. The bit error probability based on these metrics is studied in Section 4. Section 5 is devoted to a comparative study of the theoretical and simulation results of MSDD with the various decision metrics developed in the paper. Finally, concluding remarks are given in Section 6.

2. SYSTEM MODEL

2.1. OFDM with differential encoding

The discrete-time baseband equivalent model of the system under consideration is described in Figure 1.

[Figure 1 block diagram: MDPSK modulation in TD or FD → $N_s$-point IFFT → add guard interval → time- and frequency-selective channel → + white Gaussian noise → remove guard interval → $N_s$-point FFT → multiple-symbol differential detector in TD or FD.]

Figure 1: The discrete-time baseband equivalent model of OFDM with MSDD.

Information bits are Gray mapped onto MPSK symbols. Let $c_i(n) = \exp(j\theta_i(n))$, where $\theta_i(n) \in \{2\pi m/M,\ m = 0, 1, \ldots, M-1\}$, denote the information symbol prior to differential encoding, which is assigned to the $n$th subcarrier of the $i$th OFDM symbol with $N_s$ subcarriers. Information symbols are assumed to be independent and identically distributed (i.i.d.). For TD-(MS)DD, information symbols are differentially encoded over consecutive OFDM symbols with the same subcarrier index $n$. For FD-(MS)DD, on the other hand, information symbols are differentially encoded over adjacent subcarriers within the same OFDM symbol index $i$. The differentially encoded symbol $s_i(n)$ in each domain can thus be expressed as
$$s_i(n) = \begin{cases} c_i(n)\,s_{i-1}(n), & \text{in TD},\\ c_i(n)\,s_i(n-1), & \text{in FD},\end{cases} \qquad\text{(1)}$$

where $s_i(n) \in \{\exp(j2\pi m/M),\ m = 0, 1, \ldots, M-1\}$. The symbol transmitted on the $n$th subcarrier of the $i$th OFDM symbol is given by
$$a_i(n) = \sqrt{E_s}\,s_i(n), \quad n = 0, 1, \ldots, N_s-1, \qquad\text{(2)}$$

where $E_s$ denotes the signal energy per subcarrier symbol. The complex sequence $a_i(n)$, $n = 0, 1, \ldots, N_s-1$, is modulated by the $N_s$-point inverse discrete Fourier transform (IDFT) to yield $N_s$ time-domain samples corresponding to the $i$th OFDM symbol. Let $T_s$ denote the Nyquist interval between the output samples. Thus, the OFDM symbol length without guard interval is $N_sT_s$. After the insertion of the guard interval, the transmitted baseband sequence of the $i$th OFDM symbol can be expressed as
$$x_i^g(k) = \frac{1}{\sqrt{N_s}}\sum_{n=0}^{N_s-1} a_i(n)\,e^{j2\pi nk/N_s}, \quad \text{for } -G \le k \le N_s-1, \qquad\text{(3)}$$

where the initial $G$ samples of $x_i^g(k)$, $k = -G, -G+1, \ldots, -1$, constitute the guard interval. Assuming that $x_i^g(k)$ is zero for $k < -G$ and $k \ge N_s$, the total transmitted baseband sequence is written as
$$x(k) = \sum_{i=-\infty}^{\infty} x_i^g\bigl(k - i(N_s+G)\bigr). \qquad\text{(4)}$$
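To make (1)–(4) concrete, here is a minimal pure-Python sketch of the transmitter side, offered as an illustration rather than the authors' simulator. It assumes that each differential chain starts from a reference symbol equal to 1 (the text does not specify the initial symbol), uses a direct $O(N_s^2)$ IDFT instead of an FFT for clarity, and uses the hypothetical function names `diff_encode`, `ofdm_symbol_with_guard`, and `transmit`. Because (3) evaluates the same $N_s$-periodic sum for $-G \le k \le N_s-1$, the guard interval produced this way is a cyclic prefix.

```python
import cmath
import math


def diff_encode(c, domain):
    """Differential encoding (1); c[i][n] is the MPSK information symbol.

    Assumption: each chain starts from a reference symbol equal to 1,
    i.e. s_{-1}(n) = 1 in TD and s_i(-1) = 1 in FD.
    """
    n_sym, n_sub = len(c), len(c[0])
    s = [[0j] * n_sub for _ in range(n_sym)]
    for i in range(n_sym):
        for n in range(n_sub):
            if domain == "TD":  # across OFDM symbols, same subcarrier
                prev = s[i - 1][n] if i > 0 else 1.0
            else:  # "FD": across adjacent subcarriers, same OFDM symbol
                prev = s[i][n - 1] if n > 0 else 1.0
            s[i][n] = c[i][n] * prev
    return s


def ofdm_symbol_with_guard(a, G):
    """Samples x_i^g(k) for k = -G, ..., Ns-1 per (3), with 1/sqrt(Ns) scaling."""
    Ns = len(a)
    return [sum(a[n] * cmath.exp(2j * math.pi * n * k / Ns) for n in range(Ns)) / math.sqrt(Ns)
            for k in range(-G, Ns)]


def transmit(c, domain, Es, G):
    """Concatenate guarded OFDM symbols as in (4), with a_i(n) = sqrt(Es) s_i(n) from (2)."""
    s = diff_encode(c, domain)
    x = []
    for s_i in s:
        a_i = [math.sqrt(Es) * v for v in s_i]
        x.extend(ofdm_symbol_with_guard(a_i, G))
    return s, x


M, Ns, G = 4, 8, 2
# Hypothetical QPSK information symbols for 3 OFDM symbols.
idx = [[(3 * i + n) % M for n in range(Ns)] for i in range(3)]
c = [[cmath.exp(2j * math.pi * m / M) for m in row] for row in idx]
s_td, x_td = transmit(c, "TD", Es=1.0, G=G)
print(len(x_td), "samples for", len(c), "OFDM symbols")
```

Conventional DD then recovers the information symbols as $s_i(n)s_{i-1}^*(n)$ in TD or $s_i(n)s_i^*(n-1)$ in FD, which is the $N_M = 1$ special case of the detectors discussed in Section 3.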

2.2. Channel model and received baseband sequence

We assume that the channel is subject to wide-sense stationary uncorrelated scattering (WSSUS) Rayleigh fading [12] and is modeled as a time-variant tapped delay line with fixed tap spacing $T_s$, each tap having the Jakes power spectrum [13]. Provided that the maximum delay of the channel impulse response $T_m$ does not exceed $M_pT_s$ for some integer $M_p$, the received baseband sequence, assuming perfect synchronization, can be expressed as
$$r(k) = \sum_{i=-\infty}^{\infty}\sum_{m=0}^{M_p-1} h_m(k)\,x_i^g\bigl(k-m-i(N_s+G)\bigr) + n(k), \qquad\text{(5)}$$

where $n(k)$ is a sample of an AWGN process. Then, the $i$th received OFDM symbol is given by $r_i(k) = r(i(N_s+G)+k)$ for $-G \le k \le N_s-1$. Assuming that $T_m$ does not exceed $GT_s$, $r_i(k)$ after eliminating the initial $G$ guard samples can be expressed as
$$r_i(k) = \sum_{m=0}^{M_p-1} h_{m,i}(k)\,x_i^g(k-m) + n_i(k), \quad \text{for } 0 \le k \le N_s-1, \qquad\text{(6)}$$
where $h_{m,i}(k) = h_m(i(N_s+G)+k)$. The demodulator performs the DFT on $\{r_i(k),\ 0 \le k \le N_s-1\}$, producing the output [14]
$$\begin{aligned}
R_i(l) &= \frac{1}{N_s}\sum_{k=0}^{N_s-1}\sum_{m=0}^{M_p-1} h_{m,i}(k)\,e^{-j2\pi lm/N_s}\,a_i(l)\\
&\quad + \frac{1}{N_s}\sum_{n\ne l}\sum_{k=0}^{N_s-1}\sum_{m=0}^{M_p-1} a_i(n)\,h_{m,i}(k)\,e^{-j2\pi nm/N_s}\,e^{j2\pi k(n-l)/N_s}\\
&\quad + \frac{1}{\sqrt{N_s}}\sum_{k=0}^{N_s-1} n_i(k)\,e^{-j2\pi kl/N_s}\\
&= H_i(l)\,a_i(l) + C_i(l) + W_i(l), \quad \text{for } 0 \le l \le N_s-1.
\end{aligned}\qquad\text{(7)}$$
Here, $R_i(l)$ denotes the received symbol on the $l$th subcarrier of the $i$th OFDM symbol. In (7), $H_i(l)$, $C_i(l)$, and $W_i(l)$ are the multiplicative distortion, the ICI, and the AWGN, respectively, on the $l$th subcarrier of the $i$th OFDM symbol. Based on $R_i(l)$, a multiple-symbol differential detector in each domain makes a decision on the information symbols, as described in the next section.

3. OPTIMAL AND SUBOPTIMAL METRICS

3.1. Multiple-symbol differential detection

Following the MSDD formulation in [4, 5, 6], we rewrite the transmitted complex sequence in (2) as
$$\begin{aligned}
a_{i+d}(l) &= \sqrt{E_s}\,s_i(l)\,z_{d,i}(l), && \text{in TD},\\
a_i(l+d) &= \sqrt{E_s}\,s_i(l)\,z_{d,i}(l), && \text{in FD},
\end{aligned}\qquad\text{(8)}$$
where
$$z_{d,i}(l) = \begin{cases} 1, & \text{for } d = 0,\\ \prod_{j=1}^{d} c_{i+j}(l), & \text{for } 1 \le d \le N_M, \text{ in TD},\\ \prod_{j=1}^{d} c_i(l+j), & \text{for } 1 \le d \le N_M, \text{ in FD},\end{cases}\qquad\text{(9)}$$
and $N_M$ denotes the observation interval of the information symbols. Note that with this definition of $N_M$, conventional DD corresponds to the case $N_M = 1$. Also, clearly, $z_{d,i}(l) \in \{\exp(j2\pi m/M),\ m = 0, 1, \ldots, M-1\}$. The received symbols in (7) are divided into detection blocks of $N_M+1$ symbols each:
$$\mathbf R_i(l) = \begin{cases}\bigl[R_i(l), R_{i+1}(l), \ldots, R_{i+N_M}(l)\bigr]^t, & \text{in TD},\\ \bigl[R_i(l), R_i(l+1), \ldots, R_i(l+N_M)\bigr]^t, & \text{in FD},\end{cases}\qquad\text{(10)}$$
where, throughout the paper, $(\cdot)^t$ and $(\cdot)^\dagger$ denote the transpose and the Hermitian transpose, respectively. The column vector $\mathbf R_i(l)$ is input to a multiple-symbol differential detector based on maximum-likelihood sequence estimation (MLSE). The MLSE detects the most likely information symbol sequence
$$\hat{\mathbf c}_i(l) = \begin{cases}\bigl[\hat c_{i+1}(l), \ldots, \hat c_{i+N_M}(l)\bigr], & \text{in TD},\\ \bigl[\hat c_i(l+1), \ldots, \hat c_i(l+N_M)\bigr], & \text{in FD},\end{cases}\qquad\text{(11)}$$
from all $M^{N_M}$ possible $N_M$-length sequences. As shown in [6], this is accomplished by selecting the sequence $\hat{\mathbf c}_i(l)$ for which the metric
$$M\bigl(\hat{\mathbf c}_i(l)\bigr) = \mathbf R_i(l)^\dagger\,\hat\Phi_{\mathbf R_i(l)}^{-1}\,\mathbf R_i(l) \qquad\text{(12)}$$
is the smallest, where $\hat\Phi_{\mathbf R_i(l)}$ is the covariance matrix of $\mathbf R_i(l)$ conditioned on $\hat{\mathbf c}_i(l)$. Note that the number of candidate sequences, and hence the complexity of MSDD, is $M^{N_M}$, which grows exponentially with $N_M$. In the following, we derive the covariance matrix for each case.

3.2. Covariance matrix in time-domain MSDD

The covariance of $R_i(l)$ in (7) can be written as
$$\begin{aligned}
E\bigl[R_i(l)R_{i+\alpha}^*(l)\bigr] = E\bigl[&a_i(l)H_i(l)H_{i+\alpha}^*(l)a_{i+\alpha}^*(l) + a_i(l)H_i(l)C_{i+\alpha}^*(l)\\
&+ a_{i+\alpha}^*(l)H_{i+\alpha}^*(l)C_i(l) + C_i(l)C_{i+\alpha}^*(l) + W_i(l)W_{i+\alpha}^*(l)\bigr],
\end{aligned}\qquad\text{(13)}$$
where $E[\cdot]$ and $(\cdot)^*$ denote expectation and complex conjugation, respectively. For uncorrelated and isotropic scattering, the correlation of the tap coefficients is expressed, by definition, as
$$E\bigl[h_{m,i}(k)h_{m',i+\alpha}^*(k')\bigr] = \sigma_m^2\,J_0\bigl(2\pi f_DT_s\bigl(k-k'-\alpha(N_s+G)\bigr)\bigr)\,\delta_{m,m'}, \qquad\text{(14)}$$
where $\sigma_m^2$ is the average power of the $m$th channel tap, $J_0(\cdot)$ is the zeroth-order Bessel function of the first kind, $f_D$ is the maximum Doppler frequency, and $\delta_{m,m'}$ is the Kronecker delta. By normalizing the average power of each path such that $\sum_{m=0}^{M_p-1}\sigma_m^2 = 1$, the correlation of the multiplicative distortion is expressed as
$$\phi_t(\alpha) \equiv E\bigl[H_i(l)H_{i+\alpha}^*(l)\bigr] = \frac{1}{N_s^2}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s\bigl(k-k'-\alpha(N_s+G)\bigr)\bigr). \qquad\text{(15)}$$
Due to the assumed statistical independence of the information symbols, we have $E[a_i(l)a_{i'}^*(l')] = E_s\delta_{i,i'}\delta_{l,l'}$, which yields
$$E\bigl[a_i(l)H_i(l)C_{i+\alpha}^*(l)\bigr] = E\bigl[a_{i+\alpha}^*(l)H_{i+\alpha}^*(l)C_i(l)\bigr] = 0. \qquad\text{(16)}$$
As shown in [3], for sufficiently large $N_s$, the central limit theorem can be invoked, and the ICI can be modeled as a zero-mean complex Gaussian random process. The correlation of the ICI is then obtained as
$$E\bigl[C_i(l)C_{i+\alpha}^*(l)\bigr] = \left[E_s - \frac{E_s}{N_s^2}\left(N_s + 2\sum_{k=1}^{N_s-1}(N_s-k)\,J_0(2\pi f_DT_sk)\right)\right]\delta_{0,\alpha} \equiv \sigma_{ICI}^2\,\delta_{0,\alpha}, \qquad\text{(17)}$$
where $\sigma_{ICI}^2$ is the variance of the ICI. The correlation of the AWGN is given by
$$E\bigl[W_i(l)W_{i+\alpha}^*(l)\bigr] = N_0\,\delta_{0,\alpha}, \qquad\text{(18)}$$
where $N_0$ is the one-sided power spectral density of the AWGN process. Recognizing that the covariance matrix of an arbitrary $\mathbf R_i(l)$, denoted by $\Phi_{\mathbf R_i(l)}$, does not depend on the index $l$, and using (13)–(18), one can easily show that
$$\Phi_{\mathbf R_i} = A_i\Phi_tA_i^\dagger + \bigl(N_0+\sigma_{ICI}^2\bigr)I, \qquad\text{(19)}$$
where $A_i = \mathrm{diag}(a_i, a_{i+1}, \ldots, a_{i+N_M})$ is a diagonal matrix, $\Phi_t$ is the covariance matrix of the multiplicative distortion whose $(\beta,\gamma)$th element is $\phi_t(\gamma-\beta)$ defined in (15), and $I$ is the identity matrix of size $N_M+1$. With $A_iA_i^\dagger = E_sI$ and (8), (19) can be rewritten as
$$\Phi_{\mathbf R_i} = Z_i\bigl[E_s\Phi_t + \bigl(N_0+\sigma_{ICI}^2\bigr)I\bigr]Z_i^\dagger, \qquad\text{(20)}$$
where $Z_i = \mathrm{diag}(1, z_{1,i}, \ldots, z_{N_M,i})$. Then, since $Z_i$ is a unitary matrix, it follows that
$$\Phi_{\mathbf R_i}^{-1} = Z_i\bigl[E_s\Phi_t + \bigl(N_0+\sigma_{ICI}^2\bigr)I\bigr]^{-1}Z_i^\dagger. \qquad\text{(21)}$$
Therefore, $\hat\Phi_{\mathbf R_i}^{-1}$ can be obtained by substituting the estimated sequence $\hat Z_i = \mathrm{diag}(1, \hat z_{1,i}, \ldots, \hat z_{N_M,i})$ for $Z_i$ in (21). When the channel is stationary such that all the variables $E_s$, $N_0$, $\Phi_t$, and $\sigma_{ICI}$ remain constant, $\hat\Phi_{\mathbf R_i}^{-1}$ need not be recalculated from scratch each time: the inner inverse in (21) is fixed, and only $\hat Z_i$ changes from one candidate sequence to the next.

3.3. Covariance matrix in frequency-domain MSDD

Likewise, for FD-MSDD, noting that the correlation of interest does not depend on the OFDM symbol index $i$, the covariance of $R(l)$ in (7) can be expressed as
$$\begin{aligned}
E\bigl[R(l)R^*(l+\alpha)\bigr] = E\bigl[&a(l)H(l)H^*(l+\alpha)a^*(l+\alpha) + a(l)H(l)C^*(l+\alpha)\\
&+ a^*(l+\alpha)H^*(l+\alpha)C(l) + C(l)C^*(l+\alpha) + W(l)W^*(l+\alpha)\bigr].
\end{aligned}\qquad\text{(22)}$$
Given the transmitted symbols $a(l)$, (22) can be decomposed as
$$\begin{aligned}
E\bigl[R(l)R^*(l+\alpha)\bigr] &= a(l)\,E\bigl[H(l)H^*(l+\alpha)\bigr]\,a^*(l+\alpha) + E\bigl[a(l)H(l)C^*(l+\alpha)\bigr]\\
&\quad + E\bigl[a^*(l+\alpha)H^*(l+\alpha)C(l)\bigr] + E\bigl[C(l)C^*(l+\alpha)\bigr] + E\bigl[W(l)W^*(l+\alpha)\bigr].
\end{aligned}\qquad\text{(23)}$$
The first term in (23) requires the correlation of the multiplicative distortion, which is given by
$$\begin{aligned}
\phi_f(\alpha) \equiv E\bigl[H(l)H^*(l+\alpha)\bigr] &= \frac{1}{N_s^2}\left(N_s + 2\sum_{k=1}^{N_s-1}(N_s-k)\,J_0(2\pi f_DT_sk)\right)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi\alpha m/N_s}\\
&= \left(1 - \frac{\sigma_{ICI}^2}{E_s}\right)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi\alpha m/N_s}.
\end{aligned}\qquad\text{(24)}$$
Due to the wide-sense stationarity of the fading process, the covariance matrix of $H(l)$ is given by $\Phi_f$, whose $(\beta,\gamma)$th element is $\phi_f(\gamma-\beta)$ from (24).
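The scalar correlations (15), (17), and (24) are straightforward to evaluate numerically. The sketch below is a pure-Python illustration, not the authors' code. Since the Python standard library has no Bessel function, it evaluates $J_0$ through the integral representation $J_0(x) = \frac{1}{\pi}\int_0^\pi \cos(x\sin\theta)\,d\theta$ with a midpoint rule (an implementation choice). The function names, the normalized Doppler value `fD_Ts` $= f_DT_s$, and the three-tap delay profile are hypothetical.

```python
import cmath
import math


def j0(x, n_pts=200):
    """Zeroth-order Bessel function of the first kind via its integral form.

    The integrand is smooth and pi-periodic, so the midpoint rule converges fast.
    """
    h = math.pi / n_pts
    return sum(math.cos(x * math.sin((p + 0.5) * h)) for p in range(n_pts)) / n_pts


def phi_t(alpha, Ns, G, fD_Ts):
    """Time-domain correlation of the multiplicative distortion, (15)."""
    total = sum(j0(2 * math.pi * fD_Ts * (k - kp - alpha * (Ns + G)))
                for k in range(Ns) for kp in range(Ns))
    return total / Ns ** 2


def sigma2_ici(Es, Ns, fD_Ts):
    """ICI variance sigma_ICI^2 from (17)."""
    s = sum((Ns - k) * j0(2 * math.pi * fD_Ts * k) for k in range(1, Ns))
    return Es - Es / Ns ** 2 * (Ns + 2 * s)


def phi_f(alpha, Es, Ns, fD_Ts, tap_powers):
    """Frequency-domain correlation of the multiplicative distortion, (24).

    tap_powers are the sigma_m^2 and must sum to 1, as assumed for (15).
    """
    profile = sum(p * cmath.exp(2j * math.pi * alpha * m / Ns)
                  for m, p in enumerate(tap_powers))
    return (1 - sigma2_ici(Es, Ns, fD_Ts) / Es) * profile


Es, Ns, G, fD_Ts = 1.0, 16, 4, 0.002  # hypothetical parameters
taps = [0.5, 0.3, 0.2]                # hypothetical delay profile, sums to 1
print("phi_t(0) =", phi_t(0, Ns, G, fD_Ts))
print("phi_t(1) =", phi_t(1, Ns, G, fD_Ts))
print("sigma_ICI^2 =", sigma2_ici(Es, Ns, fD_Ts))
print("phi_f(1) =", phi_f(1, Es, Ns, fD_Ts, taps))
```

A useful consistency check is that the bracket in (17) equals $E_s\bigl(1-\phi_t(0)\bigr)$, because the double sum in (15) at $\alpha = 0$ collapses to $N_s + 2\sum_{k=1}^{N_s-1}(N_s-k)J_0(2\pi f_DT_sk)$; this is also what links the two lines of (24).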


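The second term in (23) requires the calculation of the following term:
$$\begin{aligned}
\kappa_l(\beta,\gamma) &\equiv E\bigl[a(l+\beta)H(l+\beta)C^*(l+\gamma)\bigr]\\
&= \frac{1}{N_s^2}\,a(l+\beta)\sum_{\substack{n=l\\ n\ne l+\gamma}}^{l+N_M} a^*(n)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi(n-(l+\beta))m/N_s}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s(k-k')\bigr)\,e^{-j2\pi k'(n-(l+\gamma))/N_s}\\
&= \frac{E_s}{N_s^2}\,z_\beta(l)\sum_{\substack{n=l\\ n\ne l+\gamma}}^{l+N_M} z_{n-l}^*(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi(n-(l+\beta))m/N_s}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s(k-k')\bigr)\,e^{-j2\pi k'(n-(l+\gamma))/N_s},
\end{aligned}\qquad\text{(25)}$$
where we have applied (8). Using the Taylor series expansion of the Bessel function, $J_0(2\pi x) \approx 1-(\pi x)^2$, which is valid for $|x| \ll 1$ [15], $\kappa_l(\beta,\gamma)$ in (25) can be approximated as
$$\kappa_l(\beta,\gamma) \approx \frac{E_s(\pi f_DT_s)^2}{N_s}\,z_\beta(l)\sum_{\substack{n=l\\ n\ne l+\gamma}}^{l+N_M} z_{n-l}^*(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi(n-(l+\beta))m/N_s}\,p\bigl(n-(l+\gamma)\bigr), \qquad\text{(26)}$$
where
$$p(\alpha) = \sum_{k'=0}^{N_s-1}\bigl(N_s-1-k'\bigr)k'\,e^{-j2\pi k'\alpha/N_s}. \qquad\text{(27)}$$
(The constant term of the expansion does not contribute, because $\sum_{k'=0}^{N_s-1}e^{-j2\pi k'\alpha/N_s} = 0$ for $\alpha \not\equiv 0 \pmod{N_s}$.) Likewise, the third term in (23) requires the following:
$$\begin{aligned}
\xi_l(\beta,\gamma) &\equiv E\bigl[a^*(l+\gamma)H^*(l+\gamma)C(l+\beta)\bigr]\\
&= \frac{E_s}{N_s^2}\,z_\gamma^*(l)\sum_{\substack{n=l\\ n\ne l+\beta}}^{l+N_M} z_{n-l}(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{-j2\pi(n-(l+\gamma))m/N_s}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s(k'-k)\bigr)\,e^{-j2\pi k'(-n+l+\beta)/N_s}\\
&\approx \frac{E_s(\pi f_DT_s)^2}{N_s}\,z_\gamma^*(l)\sum_{\substack{n=l\\ n\ne l+\beta}}^{l+N_M} z_{n-l}(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{-j2\pi(n-(l+\gamma))m/N_s}\,p(-n+l+\beta).
\end{aligned}\qquad\text{(28)}$$
The fourth term in (23) corresponds to the ICI, which is given by
$$\begin{aligned}
\phi_{C,l}(\beta,\gamma) &\equiv E\bigl[C(l+\beta)C^*(l+\gamma)\bigr]\\
&= \frac{E_s}{N_s^2}\sum_{\substack{n=0\\ n\ne l,\ldots,l+N_M}}^{N_s-1}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s(k-k')\bigr)\,e^{j2\pi k(n-(l+\beta))/N_s}\,e^{j2\pi k'(-n+(l+\gamma))/N_s}\\
&\quad + \frac{E_s}{N_s^2}\sum_{\substack{n=l\\ n\ne l+\beta}}^{l+N_M} z_{n-l}(l)\sum_{\substack{n'=l\\ n'\ne l+\gamma}}^{l+N_M} z_{n'-l}^*(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi(n'-n)m/N_s}\sum_{k=0}^{N_s-1}\sum_{k'=0}^{N_s-1} J_0\bigl(2\pi f_DT_s(k-k')\bigr)\,e^{j2\pi k(n-(l+\beta))/N_s}\,e^{-j2\pi k'(n'-(l+\gamma))/N_s}\\
&\approx \frac{2E_s(\pi f_DT_s)^2}{N_s^2}\Biggl[\sum_{\substack{n=0\\ n\ne l,\ldots,l+N_M}}^{N_s-1} q\bigl(n-(l+\beta)\bigr)\,q\bigl(-n+(l+\gamma)\bigr)\\
&\qquad + \sum_{\substack{n=l\\ n\ne l+\beta}}^{l+N_M} z_{n-l}(l)\sum_{\substack{n'=l\\ n'\ne l+\gamma}}^{l+N_M} z_{n'-l}^*(l)\sum_{m=0}^{M_p-1}\sigma_m^2\,e^{j2\pi(n'-n)m/N_s}\,q\bigl(n-(l+\beta)\bigr)\,q\bigl(-n'+(l+\gamma)\bigr)\Biggr],
\end{aligned}\qquad\text{(29)}$$
where
$$q(\alpha) = \sum_{k=0}^{N_s-1} k\,e^{j2\pi k\alpha/N_s}. \qquad\text{(30)}$$
Finally, for the AWGN term, we have
$$E\bigl[W(l)W^*(l+\alpha)\bigr] = N_0\,\delta_{0,\alpha}. \qquad\text{(31)}$$
In the following, the notations $K_l$, $\Xi_l$, and $\Phi_{C,l}$ represent the matrices whose $(\beta,\gamma)$th elements are $\kappa_l(\beta,\gamma)$, $\xi_l(\beta,\gamma)$, and $\phi_{C,l}(\beta,\gamma)$, respectively. Then, using (23)–(31), it can be shown that
$$\Phi_{\mathbf R(l)} = Z(l)\bigl[E_s\Phi_f + N_0I\bigr]Z^\dagger(l) + K_l + \Xi_l + \Phi_{C,l}. \qquad\text{(32)}$$
The exact calculation of (32) requires knowledge of both the delay profile and $f_D$. Furthermore, it incurs a higher computational complexity, resulting from (25)–(29) and from the calculation of the inverse matrices $\hat\Phi_{\mathbf R(l)}^{-1}$ over all $\hat Z(l)$. To avoid computing these unwieldy terms, we also introduce the following suboptimal alternative:
$$\hat\Phi_{\mathbf R(l)} = \hat Z(l)\bigl[E_s\Phi_f + \bigl(N_0+\sigma_{ICI}^2\bigr)I\bigr]\hat Z^\dagger(l). \qquad\text{(33)}$$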
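The exhaustive search behind (11)–(12), with a conditional covariance of the structured form used in (21) and (33), can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the names `invert` and `msdd_detect` are hypothetical, the matrix `Phi` stands for either $\Phi_t$ (TD) or $\Phi_f$ (suboptimal FD), and `noise_var` stands for $N_0+\sigma_{ICI}^2$. Because $\hat Z$ is diagonal and unitary, the inner inverse is computed once and each candidate only rotates the received block, which is one way to see why (33) is much cheaper than the exact covariance (32), whose extra terms depend on the candidate sequence.

```python
import cmath
import itertools
import math


def invert(A):
    """Gauss-Jordan inverse of a small complex matrix with partial pivoting."""
    n = len(A)
    aug = [[complex(x) for x in row] + [1.0 + 0j if i == j else 0j for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def msdd_detect(R, Phi, Es, noise_var, M):
    """Return the MPSK indices of the candidate minimizing metric (12).

    R: received block of length N_M + 1, cf. (10).
    Conditional covariance: Zhat (Es*Phi + noise_var*I) Zhat^H, cf. (21), (33).
    """
    n = len(R)
    # Inner inverse: identical for every candidate, so computed only once.
    B_inv = invert([[Es * Phi[a][b] + (noise_var if a == b else 0.0) for b in range(n)]
                    for a in range(n)])
    best, best_metric = None, math.inf
    for cand in itertools.product(range(M), repeat=n - 1):  # M**N_M candidates
        # zhat_d is the product of the first d candidate symbols, cf. (9); zhat_0 = 1.
        z = [1.0 + 0j]
        for m in cand:
            z.append(z[-1] * cmath.exp(2j * math.pi * m / M))
        y = [z[d].conjugate() * R[d] for d in range(n)]  # Zhat^H R
        metric = sum(y[a].conjugate() * B_inv[a][b] * y[b]
                     for a in range(n) for b in range(n)).real
        if metric < best_metric:
            best, best_metric = list(cand), metric
    return best


M, N_M = 4, 3
true_idx = [1, 3, 2]   # hypothetical transmitted QPSK indices
h = 0.8 - 0.3j         # hypothetical static channel gain
z = [1.0 + 0j]
for m in true_idx:
    z.append(z[-1] * cmath.exp(2j * math.pi * m / M))
R = [h * zd for zd in z]  # noise-free block R_d = h * z_d (Es = 1, s_i = 1)
Phi = [[1.0] * (N_M + 1) for _ in range(N_M + 1)]  # static channel: all-ones correlation
print(msdd_detect(R, Phi, Es=1.0, noise_var=0.1, M=M))  # -> [1, 3, 2]
```

With an all-ones `Phi` the metric is smallest when every $\hat z_d^*z_d$ has the same phase as $\hat z_0^*z_0 = 1$, so the noise-free block is detected exactly; a realistic test would instead draw `R` from the covariance (20) or (32) with noise.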

The approximate covariance matrix (33) can be obtained by simply substituting the covariance matrix of the multiplicative distortion in FD, $\Phi_f$, for $\Phi_t$ in (20). Since this approximate covariance matrix has the same structure as the covariance matrix in TD, the required computation can be significantly reduced. The price for this simplification is a performance degradation caused by the time selectivity of the channel, compared to FD-MSDD with the exact covariance matrix. Note that without ICI, the matrices (32) and (33) become identical. The BER performance of this suboptimal FD-MSDD is examined over both time- and frequency-selective Rayleigh fading channels in Section 5.

4.


4.1. Pairwise error probability

The PEP of MSDD for OFDM can be derived simply by substituting the covariance matrices derived in Section 3 into the PEP expression given in [6]. It can be shown that
$$P\bigl(\mathbf c_i(l)\to\hat{\mathbf c}_i(l)\bigr) = -\sum_{\text{RP poles}}\mathrm{Residue}\left[\frac{\Phi_D(s)}{s}\right], \qquad\text{(35)}$$
where $\Phi_D(s)$ is the characteristic function of $D$, and the summation is taken over all the residues calculated at the poles of $\Phi_D(s)/s$ located in the right half-plane. Following [6], one may have
$$\Phi_D(s) = \prod_{k=1}^{N_M+1}\frac{1}{2\lambda_ks+1}, \qquad\text{(36)}$$
where $\lambda_k$ is the $k$th eigenvalue of the matrix
$$G = \Phi_{\mathbf R_i(l)}\left(\hat\Phi_{\mathbf R_i(l)}^{-1} - \Phi_{\mathbf R_i(l)}^{-1}\right). \qquad\text{(37)}$$
This expression gives the exact PEP of TD-MSDD and FD-MSDD. The PEP of the suboptimal FD-MSDD can be obtained simply by replacing the covariance matrix $\hat\Phi_{\mathbf R_i(l)}$ in (37) with the corresponding covariance matrix in (33). The covariance matrix $\Phi_{\mathbf R_i(l)}$ in (37) remains unchanged, since it is the exact covariance matrix associated with the actual received symbols.

4.2. Approximate BER

The information symbol sequence $\mathbf c_i(l)$ carries $N_M\log_2M$ information bits, denoted by $\mathbf u_i(l)$. Let $\hat{\mathbf u}_i(l)$ denote the estimated information bits associated with $\hat{\mathbf c}_i(l)$. The pairwise BER associated with transmitting a sequence $\mathbf c_i(l)$ and detecting an erroneous sequence $\hat{\mathbf c}_i(l)$ is given by
$$P_b\bigl(\mathbf c_i(l)\to\hat{\mathbf c}_i(l)\bigr) = \frac{1}{N_M\log_2M}\,h\bigl(\mathbf u_i(l),\hat{\mathbf u}_i(l)\bigr)\,P\bigl(\mathbf c_i(l)\to\hat{\mathbf c}_i(l)\bigr), \qquad\text{(38)}$$
where $h(\mathbf u_i(l),\hat{\mathbf u}_i(l))$ denotes the Hamming distance between $\mathbf u_i(l)$ and $\hat{\mathbf u}_i(l)$. An upper bound on the BER can be obtained by the union of all pairwise error events. In terms of the theoretical BER associated with the corresponding covariance matrix (20), the BER of TD-MSDD is independent of the OFDM symbol index $i$, the subcarrier index $l$, and the information symbol sequence $\mathbf c$. As a result, $\mathbf c$ can be assumed to be the all-zero-phase sequence, that is, $\mathbf c = (1,\ldots,1)$. The union bound on the BER of TD-MSDD can then be written as
$$P_b \le \sum_{\hat{\mathbf c}\ne\mathbf c}P_b(\mathbf c\to\hat{\mathbf c}) = \frac{1}{N_M\log_2M}\sum_{\hat{\mathbf c}\ne\mathbf c}h(\mathbf u,\hat{\mathbf u})\,P(\mathbf c\to\hat{\mathbf c}), \qquad\text{(39)}$$
where the summation is taken over all distinct sequences $\hat{\mathbf c}$ that differ from the transmitted information symbol sequence $\mathbf c$. On the other hand, the BER of both the optimal and the suboptimal FD-MSDD depends on the transmitted sequence $\mathbf c$. Since it is independent of the subcarrier index $l$, $l$ can be set to 0, but the BER must be averaged over all sequences $\mathbf c$. The union bound on the BER of FD-MSDD can then be obtained as
$$P_b \le \frac{1}{M^{N_M}}\sum_{\mathbf c}\sum_{\hat{\mathbf c}\ne\mathbf c}P_b(\mathbf c\to\hat{\mathbf c}) = \frac{1}{M^{N_M}N_M\log_2M}\sum_{\mathbf c}\sum_{\hat{\mathbf c}\ne\mathbf c}h(\mathbf u,\hat{\mathbf u})\,P(\mathbf c\to\hat{\mathbf c}). \qquad\text{(40)}$$
Direct application of (39) and (40), however, does not yield a tight bound on the bit error performance of TD-MSDD and FD-MSDD over time- and frequency-selective Rayleigh fading channels. As shown in [6] for single-carrier transmission over the time-selective channel, the BER can be approximated by summing the PEP over the set of most likely error events. These most likely error events are determined by the sets $\{\hat z_1,\ldots,\hat z_{N_M}\}$ that have the highest correlation with the set $\{z_1,\ldots,z_{N_M}\}$, where the correlation is defined as $\mu = \bigl|1+\sum_{k=1}^{N_M}z_k\hat z_k^*\bigr|^2$. For each set $\{z_1,\ldots,z_{N_M}\}$, there are only 2 such events for $N_M = 1$ and $2N_M+2$ such events for $N_M \ge 2$. Since the PEP of TD-MSDD differs from that of single-carrier MSDD only by the additive ICI, the BER of TD-MSDD can be approximated by the same method. In the case of FD-MSDD, when the effects of the ICI are relatively small, the covariance matrix of FD-MSDD is similar to that of TD-MSDD. Hence, we conjecture that the BER of FD-MSDD can also be approximated by the same method. Consequently, defining the set of these most likely error events as $\chi$, the approximate BER can be expressed as
$$P_b \approx \frac{1}{N_M\log_2M}\sum_{\hat{\mathbf c}\ne\mathbf c,\ \hat{\mathbf c}\in\chi}h(\mathbf u,\hat{\mathbf u})\,P(\mathbf c\to\hat{\mathbf c}), \quad \text{for TD-MSDD},$$

' 1 ˆ h(u, u)P(c −→ cˆ ), NM log2 M cˆ  =c

Pb ≤



D = M cˆ i (l) − M ci (l)  −1  † ˆ R (l) − ΦR−1(l) Ri (l), = Ri (l) Φ i

(34)

Pb (c −→ cˆ )

cˆ  =c

BIT ERROR PROBABILITY ANALYSIS

P ci (l) −→ cˆ i (l) = Prob(D ≤ 0)

'

Pb ≈

1 M NM NM log2 M

' '

ˆ h(u, u)P(c −→ cˆ ),

c cˆ  =c, cˆ ∈χ

for FD-MSDD.

(41)
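The PEP in (34)–(37) reduces to a short computation on the eigenvalues of $\mathbf{G}$. The Python sketch below shows that computation. It makes one simplifying assumption that the text does not state: the eigenvalues are distinct. Under that assumption, every right-half-plane pole $s_k = -1/(2\lambda_k)$ is simple. There is one such pole for each negative eigenvalue, and the residue of $\Phi_D(s)/s$ at $s_k$ equals $-\prod_{j \ne k} \lambda_k/(\lambda_k - \lambda_j)$.

The function names are hypothetical, and the covariance matrices are plain NumPy arrays. Repeated eigenvalues would require higher-order residues.

```python
import numpy as np


def pep_from_eigenvalues(eigenvalues, tol=1e-12):
    """Evaluate (34) for Phi_D(s) = prod_k 1 / (2 * lambda_k * s + 1), as in (36).

    A negative eigenvalue lambda_k gives a right-half-plane pole s_k = -1 / (2 * lambda_k).
    Assuming distinct eigenvalues (simple poles), -Residue[Phi_D(s)/s] at s_k equals
    prod_{j != k} lambda_k / (lambda_k - lambda_j).
    """
    lam = np.asarray(eigenvalues, dtype=float)
    pep = 0.0
    for k, lam_k in enumerate(lam):
        if lam_k < -tol:  # only negative eigenvalues give right-half-plane poles
            others = np.delete(lam, k)
            pep += np.prod(lam_k / (lam_k - others))  # empty product is 1.0
    return float(pep)


def pep_from_covariances(phi_true, phi_hyp):
    """PEP P(c -> c_hat) via the matrix G of (37).

    phi_true: covariance Phi_R of the received vector for the transmitted sequence c.
    phi_hyp:  covariance Phi_hat_R used by the metric for the competing sequence c_hat.
    """
    g = phi_true @ (np.linalg.inv(phi_hyp) - np.linalg.inv(phi_true))
    # G is similar to a Hermitian matrix, so its eigenvalues are real up to rounding.
    eigenvalues = np.linalg.eigvals(g).real
    return pep_from_eigenvalues(eigenvalues)


if __name__ == "__main__":
    # Toy case: G = diag(-0.5, 1), so the PEP is 0.5 / 1.5, i.e. close to 1/3.
    print(pep_from_covariances(np.eye(2), np.diag([2.0, 0.5])))
```

Consider the toy case with $\boldsymbol{\Phi}_R = \mathbf{I}$ and $\hat{\boldsymbol{\Phi}}_R = \mathrm{diag}(2, 0.5)$. It gives $\mathbf{G} = \mathrm{diag}(-0.5, 1)$ and a PEP of $1/3$. This agrees with a direct calculation for a weighted difference of two independent, identically distributed exponential variables.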

EURASIP Journal on Applied Signal Processing

Figure 2: BER performance of TD-MSDD with QDPSK over the time- and frequency-selective Rayleigh fading channel with $f_D = 0.01$, $T_m \le 7/64$, $R_G = 7/64$. $N_M = 1$ corresponds to conventional DD. (Plot of BER versus $E_b/N_0$ from 10 to 60 dB: theoretical curves for $N_M = 1, 2, 4, 7, 10$ and simulation points for $N_M = 1$ and $N_M = 4$.)

It is shown in [8] that for TD-DD and FD-DD (i.e., $N_M = 1$) with QDPSK, the in-phase and quadrature components of the received sequence are statistically independent. Hence, for TD-DD and FD-DD with QDPSK, the most likely error events are statistically independent, and the BER obtained by the above method takes a closed form.

5. NUMERICAL RESULTS

The numerical results presented in this section include Monte Carlo simulation results and theoretical results based on the approximate BER in (41). All results are obtained over a two-ray equal-power profile. As a generalization of MSDD to OFDM, we normalize the Doppler frequency and the delay spread by the OFDM symbol period, as $f_D N_s T_s$ and $M_p T_s/(N_s T_s) = M_p/N_s$, respectively. In the following, $f_D$ and $T_m$ denote these normalized quantities. For this channel, the average power of the $m$th channel tap can be expressed as

$$\sigma_m^2 = \begin{cases} \tfrac{1}{2}, & m = 0, M_p, \\ 0, & m \ne 0, M_p. \end{cases} \tag{42}$$
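The Python sketch below gives a numerical check of these normalizations. It converts a terminal speed into the normalized Doppler frequency $f_D N_s T_s$ and builds the tap powers of (42).

The sketch relies on three assumptions that the text does not state:
- the classical relation $f_D = v f_c / c$, with $c \approx 3 \times 10^8$ m/s;
- a sample period $T_s$ equal to the inverse of the bandwidth;
- the helper names, which are illustrative only.

With $N_s = 64$, a 5 GHz carrier, and a 1 MHz bandwidth, the sketch gives $f_D \approx 0.01$ at 34 km/h. Conversely, $f_D = 0.05$ corresponds to about 169 km/h. Both values are consistent with those quoted later in this section.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, rounded value (assumption)


def normalized_doppler(speed_kmh, carrier_hz, bandwidth_hz, n_subcarriers):
    """Normalized Doppler f_D * Ns * Ts, assuming f_D = v * fc / c and Ts = 1 / bandwidth."""
    speed_ms = speed_kmh / 3.6
    doppler_hz = speed_ms * carrier_hz / SPEED_OF_LIGHT  # maximum Doppler shift
    symbol_period = n_subcarriers / bandwidth_hz  # Ns * Ts, guard interval excluded
    return doppler_hz * symbol_period


def speed_for_normalized_doppler(fd_norm, carrier_hz, bandwidth_hz, n_subcarriers):
    """Inverse of normalized_doppler: terminal speed in km/h for a given f_D * Ns * Ts."""
    symbol_period = n_subcarriers / bandwidth_hz
    doppler_hz = fd_norm / symbol_period
    return doppler_hz * SPEED_OF_LIGHT / carrier_hz * 3.6


def two_ray_profile(m_p, n_taps):
    """Average tap powers of (42): 1/2 at taps 0 and M_p, zero elsewhere.

    With m_p = 0 the two rays coincide, giving a flat (T_m = 0) channel of unit power.
    """
    powers = [0.0] * n_taps
    powers[0] += 0.5
    powers[m_p] += 0.5
    return powers


if __name__ == "__main__":
    print(f"{normalized_doppler(34, 5e9, 1e6, 64):.4f}")  # 0.0101
    print(f"{speed_for_normalized_doppler(0.05, 5e9, 1e6, 64):.0f} km/h")  # 169 km/h
    print(two_ray_profile(2, 8))  # M_p = 2, i.e. T_m = 2/64 when Ns = 64
```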

5.1. Verification of analysis

Theoretical and simulation results for the BER performance of TD-MSDD with QDPSK over the time- and frequency-selective channel are shown in Figure 2. The channel has $f_D = 0.01$, $T_m \le 7/64$, and guard interval ratio $R_G = 7/64$ (defined as $R_G = G/N_s$). Note that an OFDM system with $N_s = 64$, a carrier frequency of 5 GHz, a bandwidth of 1 MHz, and a mobile station velocity of 34 km/h yields $f_D \approx 0.01$. In this case, no ISI occurs, so these results are independent of the specific value of $T_m$ ($\le 7/64$). Although $R_G$ affects the correlation of the multiplicative distortion, its effect is relatively small in the absence of ISI.

It is observed from Figure 2 that, for $N_M = 4$, the simulation results agree closely with the theoretical results at high SNR (above 25 dB). At lower SNR, however, the approximation appears slightly pessimistic, because the union bound is tight only asymptotically. The performance degradation of TD-DD is noticeable over the time-selective channel. It is caused both by the decrease in the intersymbol correlation of the multiplicative distortion and by the irreducible ICI associated with OFDM transmission. Increasing $N_M$ in TD-MSDD may alleviate the degradation due to the decrease in intersymbol correlation, but it cannot reduce the effect of the ICI. Thus, an error floor appears for TD-MSDD even with large $N_M$.

Figure 3: BER performance of FD-MSDD with QDPSK over the time- and frequency-selective Rayleigh fading channel with $f_D = 0.01$, $T_m = 2/64$, $N_s = 64$, $G \ge 2$. (Plot of BER versus $E_b/N_0$ from 10 to 60 dB: theoretical curves for conventional DD ($N_M = 1$), optimal FD-MSDD, and suboptimal FD-MSDD; simulation points for conventional DD ($N_M = 1$) and for optimal and suboptimal FD-MSDD with $N_M = 4$.)

Figure 3 compares theoretical and simulation results for the BER performance of FD-MSDD with QDPSK over the time- and frequency-selective channel with $f_D = 0.01$, $T_m = 2/64$, $N_s = 64$, $G \ge 2$. Note that the result does not depend on the value of $N_s$. As in the case of TD-MSDD, good agreement between the simulation and theoretical results is observed at high SNR (above 20 dB). Even though the performance degradation is noticeable for FD-DD, increasing $N_M$ may improve the bit error performance of both the optimal and the suboptimal FD-MSDD. Furthermore, the optimal FD-MSDD clearly outperforms the suboptimal FD-MSDD. This stems from the fact that the optimal metric accounts for the exact impact of ICI, whereas the suboptimal metric uses only an approximation.

5.2. Asymptotic performance of FD-MSDD

Figure 4 shows theoretical results for the BER performance of FD-MSDD with QDPSK over the time-nonselective (i.e., $f_D = 0$) frequency-selective channel with $T_m = 2/64$ and $G \ge M_p$. In this case, the optimal FD-MSDD behaves identically to the suboptimal FD-MSDD, since $\mathbf{K}_l$, $\boldsymbol{\Xi}_l$, and $\boldsymbol{\Phi}_{C,l}$ in (32) are all zero matrices. Without ICI, FD-DD suffers an irreducible error floor, caused by the decrease in the inter-subcarrier correlation of the multiplicative distortion. It is observed from Figure 4 that FD-MSDD efficiently eliminates this floor, even with $N_M$ as small as 2. When $N_M = 10$, the degradation relative to the frequency-nonselective channel is approximately 0.4 dB at a BER of $10^{-6}$. These results suggest that, as the observation interval approaches infinity, the BER of FD-MSDD over frequency-selective channels without ICI approaches that obtained with the same observation interval over a static channel.

Figure 4: BER performance of FD-MSDD with QDPSK over the time-nonselective (i.e., $f_D = 0$) frequency-selective Rayleigh fading channel with $T_m = 2/64$, $G \ge M_p$. (Plot of BER versus $E_b/N_0$ from 10 to 60 dB for $N_M = 1, 2, 4, 7, 10$, and for $N_M = 10$ over the frequency-nonselective channel, $T_m = 0$.)

5.3. Comparison between TD-MSDD and FD-MSDD

Figure 5 shows theoretical results for the BER performance of TD-DD and FD-DD with QDPSK employed in each dimension and $R_G = 7/64$. To compare the asymptotic bit error performance in the error floor region, $E_b/N_0$ is fixed at 60 dB. Consider the system parameters $N_s = 64$, a 5 GHz carrier frequency, and a 1 MHz bandwidth. For these parameters, values of $f_D$ up to 0.05 correspond to mobile station velocities up to approximately 170 km/h. It is observed from Figure 5 that, as long as the ISI is negligible, the performance degradation of TD-DD is caused only by the time selectiveness and is unaffected by the frequency selectiveness. For FD-DD, the frequency selectiveness is the limiting factor for the BER. These results show the importance of selecting the DD technique that matches the channel statistics.

Figure 5: BER performance of TD-DD and FD-DD with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $R_G = 7/64$. (BER surfaces versus $f_D$ from 0 to 0.05 and $T_m$ from 0 to 0.1.)

Figure 6 shows theoretical results for the BER performance of the optimal and suboptimal FD-MSDD, with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $N_M = 2$, $N_s = 64$, and $G = 7$. For $N_M = 2$, the difference between the optimal and suboptimal FD-MSDD is negligible. Thus, the optimal FD-MSDD, with its more complicated decision metric, may not necessarily be rewarding in practice. Unlike FD-DD, both FD-MSDD approaches are robust against the frequency selectiveness, and the ICI due to the time selectiveness is the limiting factor.

Figure 6: BER performance of optimal FD-MSDD and suboptimal FD-MSDD with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $N_M = 2$, $N_s = 64$, $G = 7$. (BER surfaces versus $f_D$ from 0 to 0.05 and $T_m$ from 0 to 0.1.)

Figure 7 shows theoretical results for the BER performance of TD-MSDD and the optimal FD-MSDD with the same channel and system parameters as above. It is observed that, for $N_M = 2$, the behavior of FD-MSDD is analogous to that of TD-MSDD. Both mitigate the performance degradation associated with the decrease in the correlation of the multiplicative distortion. With $N_M = 2$, however, neither alleviates the effect of ICI.

Figure 8 uses the same parameters as Figure 7, except that now $N_M = 4$. It is observed that, even though the optimal FD-MSDD requires higher complexity, it outperforms TD-MSDD for almost all of the channel conditions compared. This difference comes from the fact that the optimal FD-MSDD can also mitigate the ICI.

Finally, Figure 9 shows theoretical results for the BER performance of TD-MSDD and the suboptimal FD-MSDD under the same conditions as Figure 8. The behavior of the suboptimal FD-MSDD is analogous to that of TD-MSDD. Thus, for $N_M \ge 2$, the difference between the BER performance of TD-MSDD and that of the suboptimal FD-MSDD may be negligible.

Figure 7: BER performance of TD-MSDD and optimal FD-MSDD with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $N_M = 2$, $N_s = 64$, $G = 7$. (BER surfaces versus $f_D$ from 0 to 0.05 and $T_m$ from 0 to 0.1.)

Figure 8: BER performance of TD-MSDD and optimal FD-MSDD with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $N_M = 4$, $N_s = 64$, $G = 7$. (BER surfaces versus $f_D$ from 0 to 0.05 and $T_m$ from 0 to 0.1.)

Figure 9: BER performance of TD-MSDD and suboptimal FD-MSDD with QDPSK in each dimension, $E_b/N_0 = 60$ dB, $N_M = 4$, $N_s = 64$, $G = 7$. (BER surfaces versus $f_D$ from 0 to 0.05 and $T_m$ from 0 to 0.1.)

6. CONCLUSION

In this paper, we applied MSDD to OFDM over time- and frequency-selective Rayleigh fading channels. We assumed that the guard time is longer than the delay spread, so that no effective ISI occurs. Optimal decision metrics for TD-MSDD and FD-MSDD have been derived based on the exact covariance matrix conditioned on the transmitted information symbol sequence. We have analyzed the theoretical BER performance of MSDD for OFDM. Based on these analytical results, we have shown that, when a simple receiver structure is preferable, both TD-MSDD and the suboptimal FD-MSDD perform well because of their robustness against the time- and frequency-selective nature of the channel. Thus, whereas TD-DD and FD-DD must be carefully selected according to the channel statistics, the difference in BER performance between TD-MSDD and the suboptimal FD-MSDD is negligible.

Furthermore, it has been shown that, if the increased computational complexity at the receiver is acceptable, the optimal FD-MSDD may be a very effective strategy owing to its robustness against the ICI over such channels. In the limit as the observation interval approaches infinity, the BER performance of FD-MSDD over frequency-selective channels without ICI may approach that with the same observation interval over a static channel.

However, the high computational complexity is the main disadvantage of MSDD, and it has been shown in [16, 17] that decision-feedback differential detection (DF-DD) techniques provide good performance at low computational complexity. It has also been shown that MSDD and DF-DD are equivalent, and that DF-DD can be derived from MSDD by introducing decision-feedback symbols into the MSDD metrics. Hence, the metrics proposed in this paper can also be applied to DF-DD for OFDM to reduce computational complexity. The extension of the proposed metric to DF-DD with OFDM may therefore be a topic for future study.

REFERENCES

[1] Y. Wu and B. Caron, "Digital television terrestrial broadcasting," IEEE Communications Magazine, vol. 32, no. 5, pp. 46–52, 1994.
[2] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House Publishers, Boston, Mass, USA, 2000.
[3] M. Russell and G. L. Stüber, "Interchannel interference analysis of OFDM in a mobile environment," in IEEE 45th Vehicular Technology Conference (VTC '95), vol. 2, pp. 820–824, Chicago, Ill, USA, July 1995.
[4] D. Divsalar and M. K. Simon, "Multiple-symbol differential detection of MPSK," IEEE Trans. Communications, vol. 38, no. 3, pp. 300–308, 1990.
[5] P. Ho and D. Fung, "Error performance of multiple symbol differential detection of PSK signals transmitted over correlated Rayleigh fading channels," in Proc. IEEE International Conference on Communications (ICC '91), vol. 2, pp. 568–574, Denver, Colo, USA, June 1991.
[6] P. Ho and D. Fung, "Error performance of multiple-symbol differential detection of PSK signals transmitted over correlated Rayleigh fading channels," IEEE Trans. Communications, vol. 40, no. 10, pp. 1566–1569, 1992.
[7] H. Schubert, A. Richter, and K. Iversen, "Differential modulation for OFDM in frequency vs. time domain," in ACTS Mobile Communication Summit, pp. 763–768, Aalborg, Denmark, October 1997.
[8] M. Lott, "Comparison of frequency and time domain differential modulation in an OFDM system for wireless ATM," in IEEE 49th Vehicular Technology Conference (VTC '99), vol. 2, pp. 877–883, Houston, Tex, USA, May 1999.
[9] M. Lott, P. Seidenberg, and S. Mangold, "BER characteristic for the broadband radio channel at 5.2 GHz," in Proc. ITG-Workshop Wellenausbreitung bei Funksystemen und Mikrowellensystemen, pp. 53–60, Oberpfaffenhofen, Germany, May 1998.
[10] M. Okada, S. Hara, and N. Morinaga, "Bit error rate performances of orthogonal multicarrier modulation radio transmission systems," IEICE Transaction on Communications, vol. E76-B, pp. 113–119, 1993.
[11] R. F. H. Fischer, L. H.-J. Lampe, and S. H. Müller-Weinfurtner, "Coded modulation for noncoherent reception with application to OFDM," IEEE Trans. Vehicular Technology, vol. 50, no. 4, pp. 910–919, 2001.
[12] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 4th edition, 2000.
[13] W. C. Jakes Jr., Ed., Microwave Mobile Communications, John Wiley & Sons, New York, NY, USA, 1974.
[14] Y. H. Kim, I. Song, H. G. Kim, T. Chang, and H. M. Kim, "Performance analysis of a coded OFDM system in time-varying multipath Rayleigh fading channels," IEEE Trans. Vehicular Technology, vol. 48, no. 5, pp. 1610–1615, 1999.
[15] W. C. Lee, Mobile Communications Engineering, McGraw-Hill, New York, NY, USA, 1982.
[16] F. Adachi and M. Sawahashi, "Decision feedback multiple-symbol differential detection for M-ary DPSK," Electronics Letters, vol. 29, no. 15, pp. 1385–1387, 1993.
[17] R. Schober, W. H. Gerstacker, and J. B. Huber, "Decision-feedback differential detection of MDPSK for flat Rayleigh fading channels," IEEE Trans. Communications, vol. 47, no. 7, pp. 1025–1035, 1999.

Akira Ishii received the B.E. and M.E. degrees in communication engineering from the University of Electro-Communications, Tokyo, Japan, in 2002 and 2004, respectively. He joined NTT DoCoMo, Tokyo, Japan, in 2004. His current research interests include modulation and coding techniques in mobile communications.

Hideki Ochiai received the B.E. degree in communication engineering from Osaka University, Osaka, Japan, in 1996, and the M.E. and Ph.D. degrees in information and communication engineering from The University of Tokyo, Tokyo, Japan, in 1998 and 2001, respectively. From 1994 to 1995, he was with the Department of Electrical Engineering, University of California, Los Angeles (UCLA), under a scholarship from the Ministry of Education, Science, and Culture. From 2001 to 2003, he was with the Department of Information and Communication Engineering, The University of Electro-Communications, Tokyo, Japan. Since April 2003, he has been with the Division of Physics, Electrical and Computer Engineering, Yokohama National University, Yokohama, Japan, where he is an Assistant Professor. His current research interests include modulation and coding techniques in mobile communications. Dr. Ochiai was a recipient of a Student Paper Award from the Telecommunications Advancement Foundation in 1999 and the Ericsson Young Scientist Award in 2000.

Tadashi Fujino was born in Osaka, Japan, on 15 July 1945. He received his B.E. and M.E. degrees in electrical engineering and his Ph.D. degree in communication engineering from Osaka University in 1968, 1970, and 1985, respectively. Since April 2000, he has been a Professor of wireless communication at the Department of Information and Communication Engineering, the University of Electro-Communications, Tokyo, Japan. Before joining the University, he was with Mitsubishi Electric Corporation, Tokyo, Japan, from 1970. There he devoted his efforts to R&D in wireless communications, such as digital satellite communications and land mobile communications. His major works include the development of a 120 Mbit/s QPSK modem, a trellis-coded 8-PSK modem operating at 120 Mbit/s, and portable phones used in Japan and Europe. He received the Meritorious Award from ARIB (the Association of Radio Industries and Businesses of Japan) of MPT of Japan in 1997. He is an IEEE Fellow. He is also a Member of IEICE and a Member of the Society of Information Theory and Its Applications.

EURASIP Journal on Applied Signal Processing 2004:10, 1546–1556
© 2004 Hindawi Publishing Corporation

Optimized Irregular Low-Density Parity-Check Codes for Multicarrier Modulations over Frequency-Selective Channels

Valérian Mannoni
ETIS/ENSEA/UCP CNRS 8051, 6 Avenue du Ponceau, 95014 Cergy-Pontoise Cedex, France
Laboratory of Decision and Communications Systems (DéCom), University of Reims Champagne-Ardenne, UFR Sciences, Moulin de la Housse, 51687 Reims Cedex 2, France

David Declercq
ETIS/ENSEA/UCP CNRS 8051, 6 Avenue du Ponceau, 95014 Cergy-Pontoise Cedex, France

Guillaume Gelle
Laboratory of Decision and Communications Systems (DéCom), University of Reims Champagne-Ardenne, UFR Sciences, Moulin de la Housse, 51687 Reims Cedex 2, France

Received 1 March 2003; Revised 30 October 2003

This paper deals with optimized channel coding for OFDM transmissions (COFDM) over frequency-selective channels using irregular low-density parity-check (LDPC) codes. Firstly, we introduce a new characterization of the LDPC code irregularity called "irregularity profile." Then, using this parameterization, we derive a new criterion based on the minimization of the transmission bit error probability to design an irregular LDPC code suited to the frequency selectivity of the channel. The optimization of this criterion is done using the Gaussian approximation technique. Simulations illustrate the good performance of our approach for different transmission channels.

Keywords and phrases: orthogonal frequency division multiplexing, frequency-selective channel, optimized channel coding, irregular LDPC codes, density evolution.

1. INTRODUCTION

In this paper, we address the problem of designing codes for transmission over frequency-selective channels when the orthogonal frequency division multiplexing (OFDM) modulation technique is used. Multicarrier modulations are good candidates for emerging high-rate transmissions, whether wired or wireless, single-user or multiuser. Several standards have chosen OFDM modulation because it allows a very simple mitigation of intersymbol interference (ISI), which can be very destructive when the information rate is high [1, 2]. The OFDM modulator transforms a frequency-selective channel into a set of flat-fading channels, which are easier to equalize. The problem of channel coding for OFDM systems (coded OFDM or COFDM) has already been addressed [3]. Based on the emerging capacity-approaching coding schemes, we propose an alternative coding structure for COFDM.

In some applications, like wired xDSL transmissions, a backward channel carries some information from the receiver back to the transmitter. Properly used, this information can give the transmitter an estimate of the channel that the signal is going to cross. We propose in this paper to make use of this information to design a code that is adapted to a frequency-selective OFDM channel. Although we assume perfect channel state information (CSI) at the transmitter, we will see that partial CSI is sufficient for our code design.

In 1948, Shannon [4] characterized the optimal performance theoretically reachable for coded transmission over a noisy channel. Since then, the construction of capacity-approaching codes has been the main challenge of coding research. Turbo codes [5] and Gallager low-density parity-check (LDPC) codes [6, 7] are the two competing families of pseudorandom codes that can approach capacity on various kinds of channels. It has been shown that irregular LDPC codes are especially interesting. One can optimize the parameters that characterize their irregularity in order to find the codes that are closest to capacity for various types of channels. The optimization of these codes has been done for the binary erasure channel (BEC) [8] and for the AWGN channel [9]. The optimization of LDPC codes for various nonstandard channels has also been addressed in the literature [10, 11].

We propose in this paper to optimize the LDPC code irregularity for OFDM frequency-selective channels. After OFDM demodulation, the signal that feeds the channel decoder is seen as coming from a set of Gaussian flat-fading channels, each with a different noise power. This channel can therefore be interpreted as a nonstationary Gaussian channel, for which the optimization of LDPC codes is not a direct generalization of existing work. Indeed, such an optimization requires a finer characterization of the LDPC code irregularity, which we call the irregularity profile [12].

This paper is organized as follows. In Section 2, we define the irregularity profile and review OFDM signaling, LDPC codes, and their decoding algorithm. Section 3 describes our optimization algorithm suited to OFDM frequency-selective channels. The results are presented in Section 4, and a conclusion is given in Section 5.

2. PARAMETERIZATION OF LDPC CODED OFDM

In this section, we introduce the main concepts and notation for LDPC codes that we will use for the optimization. Like turbo codes, LDPC codes can achieve reliable transmission at a signal-to-noise ratio (SNR) extremely close to the Shannon limit on the AWGN channel [13]. Moreover, these codes have some advantages:
- a simple description of their structure;
- the ease of making them irregular;
- a fully parallelizable decoding implementation [14].

2.1. Irregularity profile

In this section, we generalize the parameterization of LDPC irregularity in order to cope with nonstationary channels. This new parameterization is called the irregularity profile [12]. LDPC block codes are defined by a sparse parity-check matrix $\mathbf{H}$ of size $M \times N$. Here $N$ denotes the codeword length and $M$ the number of parity checks. In this work, we use only full-rank parity-check matrices, so if $R$ is the code rate, we have $M = (1 - R)N$.

An LDPC code can also be represented by its factor graph. This is a bipartite graph with two kinds of nodes: data nodes representing the codeword bits and function nodes representing the parity checks [15]. The $n$th data node and the $k$th check node are connected by an edge if and only if $H_{k,n} = 1$. A regular $(N, t_c, t_r)$ LDPC code has a parity-check matrix with exactly $t_c$ ones per column and $t_r$ ones per row. When the data nodes or the check nodes do not all have the same connection degree (the number of edges connected to a node), the LDPC code is irregular.

Figure 1: Factor graph of a rate $R = 1/2$ irregular LDPC code. The connection degree of each node is specified in braces. We have drawn the irregularity profile according to the fraction of data nodes. (The codeword nodes $u_1, u_2, u_3, r_1, r_2, r_3$ have degrees 7, 4, 4, 2, 2, 2. They are connected through a random permuter to three parity checks of degrees 9, 6, 6. The profile panel plots the connection degrees 7, 4, 2 against codeword position from 0 to $N$, in the position groups (1), (2), (3).)

The irregularity is conveniently specified by two polynomials,

$$\lambda(x) = \sum_{i=2}^{t_{c,\max}} \lambda_i x^{i-1}, \qquad \rho(x) = \sum_{j=2}^{t_{r,\max}} \rho_j x^{j-1},$$

where $\lambda_i$ is the fraction of edges connected to degree-$i$ data nodes and $\rho_j$ is the fraction of edges connected to degree-$j$ check nodes. $t_{c,\max}$ and $t_{r,\max}$ represent the maximal data and check node connection degrees. A degree-$i$ data node (resp., a degree-$j$ check node) is a node connected to exactly $i$ (resp., $j$) edges. These two polynomials are related by

$$(1 - R)\sum_{i=2}^{t_{c,\max}} \frac{\lambda_i}{i} = \sum_{j=2}^{t_{r,\max}} \frac{\rho_j}{j}.$$

It is also useful to consider the following dual polynomial representation:

$$\lambda'(x) = \sum_{i=2}^{t_{c,\max}} \lambda'_i x^{i-1}, \qquad \rho'(x) = \sum_{j=2}^{t_{r,\max}} \rho'_j x^{j-1},$$

with $\lambda'_i$ the fraction of data nodes with connection degree $i$ and $\rho'_j$ the fraction of degree-$j$ check nodes. $(\lambda_i, \rho_j)$ and $(\lambda'_i, \rho'_j)$ are related by

$$\lambda'_i = \frac{\lambda_i / i}{\sum_{k=2}^{t_{c,\max}} \lambda_k / k}, \qquad \rho'_j = \frac{\rho_j / j}{\sum_{k=2}^{t_{r,\max}} \rho_k / k}.$$

Using these irregularity parameters, LDPC codes have already been optimized for various channels, including the BEC [8, 13], the AWGN channel [9], and Rayleigh channels [16]. The optimization method is based on the study of the asymptotic behavior of LDPC codes during the decoding iterations. For more details about irregular codes, we refer the reader to [13].

In the parameterization described above, $(\lambda'(x), \rho'(x))$ represents the distribution of the node degrees, but a node with a given degree could be placed anywhere within the codeword. This is not an issue for memoryless channels. When the channel is nonstationary or has memory, however, the order in which the nodes are placed in the codeword matters. That is why we introduce a more general description of irregular LDPC codes, which we call the irregularity profile. This parameterization includes the location $p_i$ of the set of data nodes with a given connection degree $i$. For example, the code described in Figure 1 is defined by

$$\lambda'(x) = \sum_{i=2}^{t_{c,\max}} \left(\lambda'_i\right)^{(p_i)} x^{i-1} = \left(\tfrac{1}{6}\right)^{(1)} x^6 + \left(\tfrac{1}{3}\right)^{(2)} x^3 + \left(\tfrac{1}{2}\right)^{(3)} x, \qquad \rho'(x) = \tfrac{1}{3}x^8 + \tfrac{2}{3}x^5,$$

where the superscript $(p_i)$ gives the position of the group of degree-$i$ data nodes. Note that the position of the check nodes is arbitrary, since the parity checks are not influenced by the channel. This irregularity description provides a good framework for optimizing the LDPC parameters for a wide range of channels, including nonstationary channels.

We denote the irregularity profile by $(\boldsymbol{\lambda}, \mathbf{p}, \boldsymbol{\rho})$. The vectors $\boldsymbol{\lambda}$ and $\boldsymbol{\rho}$ collect the coefficients of the two polynomials $(\lambda(x), \rho(x))$, while the vector $\mathbf{p}$ indicates the positions of the different groups of nodes. Using this parameterization, we assume that nodes with the same degree are located in the same channel neighborhood. Better codes that do not fulfil this assumption may exist. However, in order to complete the optimization of the irregularity profile in a "reasonable time," we have decided to restrict the number of parameters. We will see in the simulations that, although restrictive, this definition of the irregularity profile yields a significant performance improvement.

Our goal is to optimize the irregularity profile for OFDM frequency-selective channels. At the OFDM demodulator output, such a channel can be interpreted as a nonstationary flat-fading Gaussian channel. We now briefly present the OFDM transmission scheme and derive the likelihood expressions needed to initialize the LDPC decoder.

2.2. OFDM communication system

The OFDM system divides the available spectrum into many carriers, each modulated by a low-rate data stream. The structure of the communication system is shown in Figure 2.

Figure 2: OFDM transmission with LDPC channel coding over frequency-selective channels. (Transmitter: LDPC code, 4-QAM mapping, IFFT, cyclic prefix. Channel: frequency-selective channel $h_n$, $H_k$, plus AWGN $n_n$. Receiver: prefix removal, FFT, equalizer, channel decoding.)

The information bits $b_n$ are encoded by an LDPC code, and the resulting codeword is sent to the OFDM transmitter. After a serial-to-parallel conversion, the bits are mapped onto a 4-QAM constellation on the $N_c$ subcarriers to obtain a block of symbols $X_k$ ($k = 1, \ldots, N_c$). This block is then transformed into a time-domain sequence by the inverse discrete Fourier transform (IDFT). In wired (xDSL) transmissions, the signal is baseband and therefore real valued.
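The chain of Figure 2 can be sketched numerically in Python. The sketch covers the Hermitian-symmetric construction, the per-subcarrier model (1), and the ZF-based LLRs (2), all of which are given in the remainder of this subsection. This is a minimal illustration under the assumptions below, not the authors' simulator:
- the helper names are hypothetical;
- the DC and Nyquist bins are left empty, so the IDFT size is $2N_c + 2$;
- bit 1 is mapped to $+1$ on each 4-QAM rail;
- the channel taps are real.

The noise-free check confirms the key property of the chain. With a cyclic prefix at least as long as the channel memory, the DFT output satisfies $Y_k = H_k X_k$ on every data subcarrier.

```python
import numpy as np


def qam4_map(bits):
    """Map bit pairs (C_2k, C_2k+1) to X_k = (2*C_2k - 1) + 1j*(2*C_2k+1 - 1).

    The bit-to-sign convention (1 -> +1) is an assumption chosen to match the LLR signs in (2).
    """
    bits = np.asarray(bits)
    return (2 * bits[0::2] - 1) + 1j * (2 * bits[1::2] - 1)


def ofdm_block(x_sym, h, cp_len, noise_std=0.0, rng=None):
    """One real-valued OFDM block over a real FIR channel h (requires cp_len >= len(h) - 1).

    Returns (Y_k, H_k) on the Nc data subcarriers k = 1..Nc.
    """
    rng = np.random.default_rng() if rng is None else rng
    nc = x_sym.size
    n_fft = 2 * nc + 2  # bins 0 (DC) and nc + 1 (Nyquist) left empty (assumption)
    block = np.zeros(n_fft, dtype=complex)
    block[1:nc + 1] = x_sym
    block[nc + 2:] = np.conj(x_sym[::-1])  # Hermitian symmetry, so the IDFT output is real
    x_time = np.fft.ifft(block).real
    tx = np.concatenate([x_time[n_fft - cp_len:], x_time])  # prepend cyclic prefix
    rx = np.convolve(tx, h)[:tx.size]  # linear convolution with the channel
    rx = rx + noise_std * rng.standard_normal(tx.size)  # real AWGN
    y_freq = np.fft.fft(rx[cp_len:])  # prefix removal, then DFT
    h_freq = np.fft.fft(h, n_fft)
    return y_freq[1:nc + 1], h_freq[1:nc + 1]


def zf_llrs(y, h, sigma2):
    """ZF equalization R_k = K_k Y_k with K_k = H_k^* / |H_k|^2, followed by the LLRs of (2).

    sigma2 is the variance of the frequency-domain noise N_k.
    """
    k_eq = np.conj(h) / np.abs(h) ** 2
    r = k_eq * y
    metric = r * np.conj(k_eq) * np.conj(h)  # R_k K_k^* H_k^*
    scale = 4.0 / (np.abs(k_eq) ** 2 * sigma2)
    llr = np.empty(2 * y.size)
    llr[0::2] = scale * metric.real  # u_0^{2k}
    llr[1::2] = scale * metric.imag  # u_0^{2k+1}
    return llr
```

With `noise_std = 0`, the signs of the LLRs returned by `zf_llrs` reproduce the transmitted bits. When noise is added, `sigma2` should be set to `n_fft * noise_std**2`, because NumPy's unnormalized FFT scales the time-domain noise variance by the block length.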
Although the optimization of LDPC codes does not require a baseband channel, we have decided to restrict the derivation of the formulas to this case. In order to transmit a real signal, we build a block of symbols with Hermitian symmetry, which yields a real signal after the IDFT. The transmit signal is

$$x_n = 2\sum_{k=1}^{N_c} S_k \cos(2\pi k f_0 n + \phi_k), \qquad \text{with } X_k = S_k e^{i\phi_k}.$$

A cyclic prefix longer than the channel memory is used as a preamble, and the signal $x_n$ is sent through the frequency-selective channel. After removal of the cyclic prefix, the received signal can be written as

$$y_n = \sum_{j=0}^{L-1} h_j x_{n-j} + n_n,$$

with $h_j$ ($j = 0, \ldots, L - 1$) the coefficients of the channel impulse response and $n_n$ the AWGN with zero mean and variance $\sigma_n^2$. The DFT transforms the time-domain sequence $y_n$ into a frequency-domain sequence $Y_k$. The frequency-selective channel thus becomes a set of $N_c$ ISI-free Gaussian channels with fading $H_k$:

$$Y_k = H_k X_k + N_k, \qquad \forall k = 1, \ldots, N_c, \tag{1}$$

with $H_k$ the $k$th channel spectrum coefficient, $X_k$ the $k$th symbol, and $N_k$ the Gaussian noise with zero mean and variance $\sigma_n^2$. The equalizer corrects the channel distortion. This is easily done in the frequency domain by a simple multiplication by a coefficient $K_k$ on each subcarrier: $R_k = K_k Y_k$ for all $k = 1, \ldots, N_c$. We have used the zero-forcing (ZF) equalizer, so the $k$th equalizer coefficient is $K_k = H_k^*/|H_k|^2$. Note that the choice of the equalizer type is not important in our case. Any of the usual equalizers used in OFDM transmission (ZF, MMSE, maximum likelihood) would lead to the same expressions for the messages feeding the LDPC decoder. For the sake of simplicity, we chose the ZF equalizer. Using the equalizer output $R_k$ and the channel model (1), we obtain the expressions of the observed log-likelihood ratios (LLRs) $u_0^{2k}$ and $u_0^{2k+1}$:

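$$\begin{aligned} u_{0,2k} &= \log\frac{p(R_k \mid C_{2k}=1)}{p(R_k \mid C_{2k}=0)} = \log\frac{\sum_{X_k / C_{2k}=1} p(R_k \mid X_k)}{\sum_{X_k / C_{2k}=0} p(R_k \mid X_k)} = \frac{4\,\mathrm{Re}\{R_k K_k^* H_k^*\}}{|K_k|^2 \sigma_n^2},\\ u_{0,2k+1} &= \log\frac{p(R_k \mid C_{2k+1}=1)}{p(R_k \mid C_{2k+1}=0)} = \log\frac{\sum_{X_k / C_{2k+1}=1} p(R_k \mid X_k)}{\sum_{X_k / C_{2k+1}=0} p(R_k \mid X_k)} = \frac{4\,\mathrm{Im}\{R_k K_k^* H_k^*\}}{|K_k|^2 \sigma_n^2} \quad \forall k = 1, \dots, N_c, \end{aligned}\tag{2}$$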


where $(C_{2k}, C_{2k+1})$ are the two codeword bits used to form the $k$th 4-QAM symbol $X_k$, and $\mathrm{Re}\{\cdot\}$ and $\mathrm{Im}\{\cdot\}$ denote the real and imaginary parts, respectively. Using these LLRs as initialization messages, we iteratively decode the noisy codeword to obtain an estimate $\hat{u}_n$ of the input sequence. The decoding algorithm is presented in the next section.

2.3. Decoding LDPC codes using belief propagation

LDPC codes are easily decoded by an iterative probabilistic algorithm known as belief propagation [7]. Applying Bayes' rule locally, belief propagation iteratively updates the a posteriori probabilities (APPs) of each bit in the codeword. It can therefore be viewed as an iterative message-passing algorithm on the associated factor graph. However, for a finite codeword length, the factor graph of an LDPC code contains many cycles, which make the computation of the APPs suboptimal. Each iteration of belief propagation is composed of two steps:
(i) the data pass, which updates the messages through the variable nodes;
(ii) the check pass, which updates the messages through the check nodes.
It is usually more convenient to use LLRs as messages. Let $v = \log(p(y \mid c = 1)/p(y \mid c = -1))$ denote the output message of a variable node and $u = \log(p(y' \mid c = 1)/p(y' \mid c = -1))$ the output message of a check node. During the data pass on a variable node of degree $i$, the output message $v$ on the $q$th branch is
$$v_q = u_0 + \sum_{\substack{n=1\\ n \neq q}}^{i} u_n \quad \forall q = 1, \dots, i, \tag{3}$$

where $u_n$, $n = 1, \dots, i$, are the incoming messages from the neighboring check nodes and $u_0$ is the observed LLR (or channel value). At the first decoding iteration, all the $u_n$ are set to zero. During a check pass, we use the following "tanh rule" [17] to express the output message $u$ on the $p$th branch of a check node of degree $j$:
$$\tanh\frac{u_p}{2} = \prod_{\substack{m=1\\ m \neq p}}^{j} \tanh\frac{v_m}{2} \quad \forall p = 1, \dots, j, \tag{4}$$

where $v_m$, $m = 1, \dots, j$, are the incoming messages from the neighboring variable nodes. After a few iterations of belief propagation, we compute the a posteriori ratio $w_k$ for each data node. It is the sum of all messages feeding that variable node: $w_k = u_0 + \sum_{n=1}^{i} u_n$, $k = 1, \dots, N$. Finally, we use $w_k$ to estimate the information bits: $\hat{u}_k = (1 - \mathrm{sign}(w_k))/2$, $k = 1, \dots, N$. Having introduced the main notation and concepts needed to optimize LDPC codes over OFDM frequency-selective channels, we present our optimization scheme in the next section.
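The receiver processing of Section 2 can be illustrated with a short NumPy sketch. It is a minimal example under stated assumptions, not the authors' implementation. `zf_llrs` evaluates the closed forms of (2) after ZF equalization. `belief_propagation` alternates the data pass (3) and the tanh rule (4) on a toy parity-check matrix, then applies the decision rule $\hat{u}_k = (1 - \mathrm{sign}(w_k))/2$. The following are assumptions of the example:
- the function names;
- the extended Hamming matrix `H_PC`;
- the bit-to-rail mapping: bit 0 is sent as +1 on the I or Q rail, so a positive LLR favors bit 0, as the decision rule implies.

```python
import numpy as np

# Hypothetical toy parity-check matrix (extended Hamming (8,4) code). It contains
# 4-cycles, so BP is approximate here, as it is for any finite-length LDPC code.
H_PC = np.array([[1, 1, 0, 1, 1, 0, 0, 0],
                 [1, 0, 1, 1, 0, 1, 0, 0],
                 [0, 1, 1, 1, 0, 0, 1, 0],
                 [1, 1, 1, 1, 1, 1, 1, 1]])


def zf_llrs(Y, H, sigma2):
    """Observed LLRs of eq. (2) after zero-forcing equalization.

    Returns the I-rail and Q-rail LLRs of every subcarrier, interleaved as
    [u_{0,2k}, u_{0,2k+1}, ...]. Assumed convention: positive LLR favours bit 0.
    """
    K = np.conj(H) / np.abs(H) ** 2                  # ZF coefficients K_k = H_k^* / |H_k|^2
    R = K * Y                                        # equalizer output R_k = K_k Y_k
    z = R * np.conj(K) * np.conj(H) / (np.abs(K) ** 2 * sigma2)
    llr = np.empty(2 * len(Y))
    llr[0::2] = 4 * z.real                           # u_{0,2k}
    llr[1::2] = 4 * z.imag                           # u_{0,2k+1}
    return llr


def belief_propagation(Hpc, u0, n_iter=20):
    """LLR-domain belief propagation; returns hard decisions (1 - sign(w)) / 2."""
    m, n = Hpc.shape
    rows, cols = np.nonzero(Hpc)                     # one entry per edge of the factor graph
    u = np.zeros(len(rows))                          # check-to-variable messages, zero at start
    for _ in range(n_iter):
        # Data pass, eq. (3): channel LLR plus all incoming u except the outgoing branch.
        total = u0 + np.bincount(cols, weights=u, minlength=n)
        v = total[cols] - u
        # Check pass, eq. (4): tanh(u_p/2) = product of tanh(v_m/2) over the other branches.
        t = np.tanh(v / 2)
        t = np.where(np.abs(t) < 1e-12, 1e-12, t)    # guard the division below
        prod = np.ones(m)
        np.multiply.at(prod, rows, t)                # product of all tanh values per check
        u = 2 * np.arctanh(np.clip(prod[rows] / t, -1 + 1e-12, 1 - 1e-12))
    w = u0 + np.bincount(cols, weights=u, minlength=n)   # a posteriori LLR w_k
    return ((1 - np.sign(w)) / 2).astype(int)


# Usage: all-zero codeword (X_k = 1 + j) on 4 hypothetical subcarriers, no noise.
H_ch = np.array([1.0, 0.5 + 0.5j, 0.2, 1j])
u0 = zf_llrs(H_ch * (1 + 1j), H_ch, sigma2=0.5)
print(belief_propagation(H_PC, u0))
```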

3. OPTIMIZATION WITH A GAUSSIAN APPROXIMATION

To predict the performance of LDPC codes under belief propagation, Richardson and Urbanke [18] introduced a general method for analyzing the asymptotic behavior of LDPC codes. This method, called density evolution, tracks the probability density functions (pdfs) of the messages propagated in the factor graph during decoding, assuming a cycle-free graph. For memoryless binary-input continuous-output AWGN channels, Chung et al. [9] proposed a Gaussian approximation of the message densities to simplify the density evolution analysis.

For many channels, including the AWGN channel, LDPC codes (with infinite codeword length) exhibit a threshold phenomenon. There is an SNR above which the bit error probability converges to zero as the number of belief propagation iterations tends to infinity (in [9], the threshold was defined as a noise power). The criterion used by Chung et al. to optimize LDPC codes on AWGN channels is to choose the code with the lowest threshold. The Gaussian approximation makes this threshold quick to compute and thus simplifies the design of good LDPC codes for AWGN channels. In this section, we extend Chung's algorithm to OFDM transmissions over frequency-selective channels.

3.1. Gaussian approximation for AWGN channels

We first introduce the notation used in the Gaussian approximation method for a stationary AWGN channel. We assume that:
- the noise is Gaussian with zero mean and variance $\sigma_n^2$;
- the constellation is BPSK;
- the all-zero codeword is sent.

The observed LLR $u_0$ is then also Gaussian, with mean $2/\sigma_n^2$ and variance $4/\sigma_n^2$. The variance of $u_0$ is thus twice its mean. This property (called the consistency condition in [13]) is preserved through the belief propagation steps, which reduces the study of density evolution to the mean of the pdf alone. It is stated in [9] that the Gaussian approximation is quite good for the variable node outputs $v$ but less so for the check node outputs $u$. We found the Gaussian approximation sufficiently accurate to provide a good LDPC optimization. From (3), the mean $m_{v,i}^{(l)}$ of the output message of a variable node of degree $i$ is
$$m_{v,i}^{(l)} = m_{u_0} + (i-1)\, m_u^{(l-1)}, \tag{5}$$

where $m_{u_0}$ is the mean of the observed LLR $u_0$ and $l$ denotes the $l$th decoding iteration. At the $l$th iteration, an incoming message $v$ to a check node has the following Gaussian mixture density $f_v^{(l)}$:
$$f_v^{(l)} = \sum_{i=2}^{t_{c\max}} \lambda_i\, \mathcal{N}\big(m_{v,i}^{(l)},\, 2m_{v,i}^{(l)}\big). \tag{6}$$

From (4), and under the "local tree assumption" that the messages $v_i$ are independent, the updated mean $m_u^{(l)}$ at the $l$th iteration can be expressed as
$$m_u^{(l)} = \sum_{j=2}^{t_{r\max}} \rho_j\, \phi^{-1}\!\left(1 - \left[1 - \sum_{i=2}^{t_{c\max}} \lambda_i\, \phi\big(m_{v,i}^{(l)}\big)\right]^{j-1}\right), \tag{7}$$

where $\phi(m_v^{(l)})$ is equal to $1 - E[\tanh(v^{(l)}/2)]$ (cf. [9]). Using (5) and (7) iteratively, we can follow the evolution of $m_u^{(l)}$ along the decoding iterations. The word error probability converges to zero if and only if $m_u^{(l)} \to +\infty$ as $l \to +\infty$. It is therefore easy to compute the threshold, that is, the SNR above which $m_u^{(l)}$ tends to infinity as $l$ tends to infinity.
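The recursion (5)–(7) and the threshold search can be sketched as follows. This is an illustrative implementation with our own numerical choices:
- $\phi$ is tabulated by direct numerical integration for means up to 150; the table is clamped beyond that.
- $\phi$ is inverted by interpolating $\log\phi$.
- Decoding is declared successful when $m_u^{(l)}$ exceeds the hypothetical cap `m_cap` within `n_iter` iterations.
- The threshold is found by bisection on the noise standard deviation.

The regular (3,6) code, $\lambda(x) = x^2$ and $\rho(x) = x^5$ in the edge-perspective notation of [9], is only a test case. Check the printed value against the thresholds reported in [9] rather than treating it as a reference.

```python
import numpy as np

# phi(m) = 1 - E[tanh(v/2)] with v ~ N(m, 2m). Tabulated once on a grid of means by
# numerical integration over a fine standard-normal grid, and inverted by
# interpolating log(phi), which decreases with m.
_Z = np.linspace(-40.0, 10.0, 10001)
_DZ = _Z[1] - _Z[0]
_PDF = np.exp(-_Z**2 / 2) / np.sqrt(2 * np.pi)
_M = np.concatenate(([0.0], np.geomspace(1e-4, 150.0, 1200)))


def _phi_numeric(m):
    v = m + np.sqrt(2.0 * m) * _Z
    # 1 - tanh(v/2) = 2 / (1 + e^v), evaluated with logaddexp to avoid overflow
    return np.sum(2.0 * np.exp(-np.logaddexp(0.0, v)) * _PDF) * _DZ


_LOG_PHI = np.minimum.accumulate(np.log([_phi_numeric(m) for m in _M]))


def phi(m):
    return np.exp(np.interp(m, _M, _LOG_PHI))        # means beyond 150 are clamped


def phi_inv(y):
    if y >= 1.0:
        return 0.0
    # np.interp needs increasing abscissae, hence the reversed table
    return float(np.interp(np.log(y), _LOG_PHI[::-1], _M[::-1]))


def ga_converges(sigma, lam, rho, n_iter=300, m_cap=100.0):
    """Iterate (5)-(7) for BPSK on AWGN; True if m_u^(l) grows past m_cap.

    lam, rho: edge-perspective degree distributions {degree: coefficient}, as in [9].
    """
    m_u0 = 2.0 / sigma**2                            # mean of the channel LLR u0
    m_u = 0.0
    for _ in range(n_iter):
        s = sum(l * phi(m_u0 + (i - 1) * m_u) for i, l in lam.items())   # (5), (6)
        m_u = sum(r * phi_inv(-np.expm1((j - 1) * np.log1p(-s)))          # (7)
                  for j, r in rho.items())
        if m_u >= m_cap:
            return True
    return False


def ga_threshold(lam, rho, lo=0.3, hi=1.5, tol=1e-3):
    """Bisection for the largest noise standard deviation that still converges."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ga_converges(mid, lam, rho):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma_star = ga_threshold({3: 1.0}, {6: 1.0})            # regular (3,6) test case
ebn0_db = 10 * np.log10(1 / (2 * 0.5 * sigma_star**2))   # Eb/N0 = 1/(2 R sigma^2), R = 1/2
print(f"sigma* = {sigma_star:.4f}, Eb/N0* = {ebn0_db:.2f} dB")
```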

3.2. Gaussian approximation for OFDM frequency-selective channels

We now extend the Gaussian approximation approach to OFDM channels, which are not stationary. This will lead us to the optimization of the irregularity profile (λ, p, ρ). Before studying the asymptotic behavior of the decoder, we first discuss the statistical properties of the observed messages. As stated in Section 2.2, the messages at the input of the decoder are
$$\begin{aligned} u_{0,2k} &= \log\frac{p(R_k \mid C_{2k}=1)}{p(R_k \mid C_{2k}=0)} = \frac{4\,\mathrm{Re}\{R_k K_k^* H_k^*\}}{|K_k|^2\sigma_n^2},\\ u_{0,2k+1} &= \log\frac{p(R_k \mid C_{2k+1}=1)}{p(R_k \mid C_{2k+1}=0)} = \frac{4\,\mathrm{Im}\{R_k K_k^* H_k^*\}}{|K_k|^2\sigma_n^2} \quad \forall k \in \mathbb{N}. \end{aligned}\tag{8}$$
Assuming that the all-zero codeword is sent,
$$(C_{2k}, C_{2k+1}) = (0, 0)\ \ \forall k, \quad \text{i.e., } X_k = 1 + j, \tag{9}$$
the observed LLRs become
$$U_{0,k} = u_{0,2k} = \frac{4|H_k|^2}{\sigma_n^2} + \frac{4\,\mathrm{Re}\{H_k^* N_k\}}{\sigma_n^2} \quad \forall k \in \mathbb{N}, \tag{10}$$
and $u_{0,2k+1}$ has the same form with $\mathrm{Im}$ in place of $\mathrm{Re}$, hence the same distribution. The observed message $U_{0,k}$ therefore has a consistent Gaussian pdf $f_{u_{0,k}}$:
$$f_{u_{0,k}} = \mathcal{N}\big(m_{u_{0,k}},\, 2m_{u_{0,k}}\big) = \mathcal{N}\!\left(\frac{4|H_k|^2}{\sigma_n^2},\, \frac{8|H_k|^2}{\sigma_n^2}\right) \quad \forall k \in \mathbb{N}. \tag{11}$$
Note first that the "local tree" assumption requires an infinite codeword length, and hence an infinite number of subcarriers. Unlike in the AWGN case, each observed message has different statistical properties. Since there is an infinite number of observed messages (one per subcarrier), the density model (11) involves an infinite number of equations. To circumvent this problem, we build a rectangular approximation of the channel spectrum, which reduces the model to a finite number of equations.

Figure 3: Rectangular approximation of a frequency-selective channel spectrum (axes ν and |H(ν)|; labels b0 = 0, b1, b2, b3 = 1; λ2, λ3, λ4; H3,2; α3,2).

First, we split the channel spectrum into $t_{c\max} - 1$ parts according to the irregularity profile, as represented in Figure 3. Each part corresponds to the spectral band (of length $\lambda_i^{(p_i)}$) in which the bits with connection degree $i$ are transmitted. We sort the parts in ascending order of their positions $p_i$ in the irregularity profile, so that
$$B_i = \big[b_{p_i-1};\, b_{p_i}\big] = \left[\sum_{j=k_1}^{k_{i-1}} \lambda_j^{(p_j)};\ \sum_{j=k_1}^{k_i} \lambda_j^{(p_j)}\right] \quad \forall i = 2, \dots, t_{c\max}, \tag{12}$$
where the $k_l$ are such that $p_{k_l} = l$ for all $l = 1, \dots, t_{c\max} - 1$. To build the rectangular approximation of the channel spectrum $H(\nu)$ in each band $B_i$, we use a staircase function that divides the amplitude range of the channel spectrum into $N_s(i)$ equal parts. This type of approximation is called a simple function [19]. In the band $B_i$, the channel is thus modeled by
$$s_{N_s(i)}(\nu) = \sum_{k=1}^{N_s(i)} H_{i,k} \cdot \mathbb{1}_{A_{i,k}}(\nu) \quad \forall \nu \in B_i, \tag{13}$$
where $\mathbb{1}_A$ is the indicator function of the set $A$, $N_s(i)$ is the number of stairs in the band, $H_{i,k}$ are the amplitudes of the stairs, and $A_{i,k}$ are the sets
$$A_{i,k} = \left\{\nu : |H(\nu)| \in \left[m_i + \frac{k-1}{N_s(i)}(M_i - m_i);\ m_i + \frac{k}{N_s(i)}(M_i - m_i)\right[\ \right\}, \tag{14}$$
with $m_i = \min_{\nu \in B_i} |H(\nu)|$ and $M_i = \max_{\nu \in B_i} |H(\nu)|$. Figure 3 shows a typical ADSL spectrum and its approximation with $t_{c\max} - 1 = 3$ and $(N_s(4) = 3, N_s(3) = 4, N_s(2) = 3)$.

Note that an accurate measure of the channel shape is not needed, since only a rectangular approximation is used in the optimization process. This makes our code design robust with respect to the channel knowledge available at the transmitter. This is a clear advantage: the method remains valid for slowly varying channels or when channel estimation errors occur. This issue is further discussed in the conclusion. Let $\alpha_{i,k}$ denote the normalized width of the subband $k$ located in the band $B_i$:
$$\alpha_{i,k} = \frac{\sup_{\nu \in A_{i,k}} |\nu| - \inf_{\nu \in A_{i,k}} |\nu|}{\lambda_i^{(p_i)}}. \tag{15}$$
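The staircase model (13)–(15) can be sketched for a sampled spectrum as below. Several choices here are assumptions made for illustration:
- `staircase` is a hypothetical helper.
- The width $\alpha_{i,k}$ is estimated as the fraction of frequency samples falling in $A_{i,k}$. This coincides with the sup − inf definition of (15) when $A_{i,k}$ is an interval.
- The stair amplitude $H_{i,k}$ is taken as the RMS value of $|H(\nu)|$ over $A_{i,k}$, which is what (17) below implies.
- The band $[0, 0.125)$ in the usage lines is arbitrary.

```python
import numpy as np


def staircase(h_abs, n_stairs):
    """Simple-function approximation (13)-(14) of |H(nu)| sampled on one band B_i.

    h_abs    : |H(nu)| on a uniform frequency grid covering B_i
    n_stairs : N_s(i)
    Returns (alpha, levels): alpha[k] is the fraction of B_i occupied by A_{i,k}
    (a sample-count estimate of (15)), and levels[k] = H_{i,k}, taken as the RMS
    of |H| over A_{i,k} (as in (17)).
    """
    lo, hi = h_abs.min(), h_abs.max()                       # m_i and M_i
    edges = lo + (hi - lo) * np.arange(n_stairs + 1) / n_stairs
    # stair index of every sample; the maximum M_i is assigned to the top stair
    idx = np.clip(np.searchsorted(edges, h_abs, side="right") - 1, 0, n_stairs - 1)
    counts = np.bincount(idx, minlength=n_stairs)
    power = np.bincount(idx, weights=h_abs**2, minlength=n_stairs)
    alpha = counts / len(h_abs)
    levels = np.sqrt(np.divide(power, counts, out=np.zeros(n_stairs), where=counts > 0))
    return alpha, levels


# Usage with ch A from (25); the band B_i = [0, 0.125) is a hypothetical choice.
h_a = np.array([0.06, 0.72, 0.54, 0.36, 0.18, 0.114, 0.078, 0.054, 0.033, 0.018, 0.012])
nu = np.arange(1024) / 2048                               # normalized frequencies in [0, 0.5)
H_abs = np.abs(np.exp(-2j * np.pi * np.outer(nu, np.arange(len(h_a)))) @ h_a)
alpha, levels = staircase(H_abs[:256], n_stairs=3)
print(alpha, levels)
```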

With this staircase model, an observed message entering a data node of degree $i$ has the Gaussian mixture density $f_{u_{0,i}}$:
$$f_{u_{0,i}} = \sum_{k=1}^{N_s(i)} \alpha_{i,k}\, \mathcal{N}\big(m_{u_0}(H_{i,k}),\, 2m_{u_0}(H_{i,k})\big), \tag{16}$$
where $m_{u_0}(H_{i,k})$ is the mean of the LLRs located on the $k$th subband of $B_i$:
$$m_{u_0}(H_{i,k}) = \frac{4H_{i,k}^2}{\sigma_n^2} = \frac{4}{\sigma_n^2\,\alpha_{i,k}} \int_{A_{i,k}} |H(\nu)|^2\, d\nu. \tag{17}$$
It is important to note that the density $f_{u_{0,i}}$ is consistent for all $i$, because it is a mixture of consistent Gaussian densities. Each Gaussian component of $f_{u_{0,i}}$ therefore has a variance equal to twice its mean, so density evolution can be carried out component by component for frequency-selective channels. The main difference between OFDM frequency-selective channels and AWGN channels is thus that the density $f_{u_{0,i}}$ of the messages entering the factor graph depends on the code irregularity profile. As shown in (17), the mean $m_{u_0}(H_{i,k})$ depends on the $\lambda_i$'s through the parameter $\alpha_{i,k}$ and the set $A_{i,k}$.

We now discuss the accuracy of the rectangular approximation. The larger $N_s(i)$ in each band $B_i$, the better the approximation. However, the resulting density, a mixture of $N_s(i)$ Gaussian densities, becomes more expensive to handle. To choose the number of stairs in each $B_i$, we fix a maximum number $N_{s\max}$ and a threshold $\varepsilon$. We then iteratively evaluate the Kullback divergence between $f_{u_{0,i}}(N_{s\max})$ and $f_{u_{0,i}}(n)$ for $n \in \{1, \dots, N_{s\max}\}$. We choose for $N_s(i)$ the smallest value of $n$ such that $D_K(f_{u_{0,i}}(N_{s\max}), f_{u_{0,i}}(n)) < \varepsilon$. The Kullback divergence [20] is well suited to measuring the distance between the tails of pdfs. This choice therefore keeps the likelihood values computed from the approximation, and hence the means $m_{u_0}(H_{i,k})$, close to the actual ones. The choice of $N_{s\max}$ and $\varepsilon$ is a tradeoff between accuracy and computational complexity.

It is now easy to generalize the equations (5)–(7) describing the evolution of the mean to OFDM frequency-selective channels. The mean $m_{v_{i,k}}^{(l)}$ of the output message of a variable node of degree $i$ in the $k$th subband of $B_i$ is given by
$$m_{v_{i,k}}^{(l)} = m_{u_0}(H_{i,k}) + (i-1)\, m_u^{(l-1)}. \tag{18}$$
Then, at the $l$th iteration of belief propagation decoding, an incoming message $v$ to a check node has the following Gaussian mixture density $f_v^{(l)}$:
$$f_v^{(l)} = \sum_{i=2}^{t_{c\max}} \lambda_i^{(p_i)} \sum_{k=1}^{N_s(i)} \alpha_{i,k}\, \mathcal{N}\big(m_{v_{i,k}}^{(l)},\, 2m_{v_{i,k}}^{(l)}\big), \tag{19}$$
which leads to the generalization of (7):
$$m_u^{(l)} = \sum_{j=2}^{t_{r\max}} \rho_j\, \phi^{-1}\!\left(1 - \left[1 - \sum_{i=2}^{t_{c\max}} \lambda_i^{(p_i)} \sum_{k=1}^{N_s(i)} \alpha_{i,k}\, \phi\big(m_{v_{i,k}}^{(l)}\big)\right]^{j-1}\right). \tag{20}$$
Using (18) and (20) iteratively, we can follow the evolution of $m_u^{(l)}$ along the decoding iterations for a frequency-selective channel. We use these equations to optimize the LDPC code profile.
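As the following sketch shows, the recursion (18)–(20) only changes the inner sums of the AWGN recursion. The following are assumptions made for illustration:
- the helper `ofdm_ga` and its dictionary-based inputs;
- the numerical tabulation of $\phi$;
- the two-band parameters in the usage lines.

The weights $\lambda_i^{(p_i)}\alpha_{i,k}$ are used exactly as they appear in (19)–(20). With a single band and a single stair ($\alpha = 1$), the recursion reduces to (5)–(7), which is a convenient sanity check.

```python
import numpy as np

# Tabulated phi(m) = 1 - E[tanh(v/2)], v ~ N(m, 2m), and its inverse (see (7)).
_Z = np.linspace(-40.0, 10.0, 10001)
_PDF = np.exp(-_Z**2 / 2) / np.sqrt(2 * np.pi) * (_Z[1] - _Z[0])
_M = np.concatenate(([0.0], np.geomspace(1e-4, 150.0, 1200)))
_LOG_PHI = np.minimum.accumulate(np.log(
    [np.sum(2 * np.exp(-np.logaddexp(0, m + np.sqrt(2 * m) * _Z)) * _PDF) for m in _M]))


def phi(m):
    return np.exp(np.interp(m, _M, _LOG_PHI))          # scalars or arrays; clamped above 150


def phi_inv(y):
    return 0.0 if y >= 1.0 else float(np.interp(np.log(y), _LOG_PHI[::-1], _M[::-1]))


def ofdm_ga(lam_p, alpha, m_u0, rho, n_iter=50):
    """Track m_u^(l) through (18)-(20) for a staircase channel model.

    lam_p : {degree i: lambda_i^(p_i)}
    alpha : {degree i: array of alpha_{i,k}, k = 1..N_s(i)}
    m_u0  : {degree i: array of m_u0(H_{i,k}) = 4 H_{i,k}^2 / sigma_n^2}, eq. (17)
    rho   : {degree j: rho_j}
    Returns the list [m_u^(1), ..., m_u^(n_iter)].
    """
    m_u, history = 0.0, []
    for _ in range(n_iter):
        s = 0.0
        for i, lp in lam_p.items():
            m_v = m_u0[i] + (i - 1) * m_u                    # eq. (18), one value per stair
            s += lp * float(np.sum(alpha[i] * phi(m_v)))     # inner sums of (20)
        m_u = sum(r * phi_inv(-np.expm1((j - 1) * np.log1p(-s))) for j, r in rho.items())
        history.append(m_u)
    return history


# Hypothetical two-band example: degree-2 bits on a weak band, degree-6 bits on a strong one.
lam_p = {2: 0.5, 6: 0.5}
alpha = {2: np.array([0.6, 0.4]), 6: np.array([0.5, 0.5])}
m_u0 = {2: np.array([1.5, 2.5]), 6: np.array([4.0, 6.0])}
print(ofdm_ga(lam_p, alpha, m_u0, rho={8: 1.0})[-1])
```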

3.3. Optimization

In [9], the optimization criterion was to minimize the LDPC decoding threshold, which seems to be the best choice for stationary channels. For OFDM frequency-selective channels, we believe a different criterion can be used, based on their nonstationarity. The codeword is fed by likelihoods of unequal reliability, and it is more relevant to place the information bits in the codeword where the channel is less noisy. For this reason, our optimization criterion is to minimize the information bit error probability after $L$ iterations of belief propagation at a carefully chosen $E_b/N_0$. With this criterion, we obtain the LDPC code that asymptotically achieves the best performance after a finite number of decoding iterations for a given $E_b/N_0$. The simulations will show that this criterion is relevant, especially for small block lengths. The probability of error at the $l$th iteration for a bit of degree $i$ transmitted in the subband $k$ of $B_i$ is
$$P_{e,i,k}^{l} = Q\!\left(\sqrt{\frac{m_{u_0}(H_{i,k}) + i\, m_u^{(l)}}{2}}\right), \tag{21}$$

where $Q(\cdot)$ is the Gaussian tail function. Let $B_{\mathrm{info}}$ be the band of normalized length $R$ in which the information bits are transmitted. Likewise, $B_{\mathrm{info},i,k} = B_{\mathrm{info}} \cap A_{i,k}$ is the band in which the information bits of degree $i$ in subband $k$ are transmitted. The length $\alpha_{\mathrm{info},i,k}$ of this band is
$$\alpha_{\mathrm{info},i,k} = \sup_{\nu \in B_{\mathrm{info},i,k}} |\nu| - \inf_{\nu \in B_{\mathrm{info},i,k}} |\nu|. \tag{22}$$

Therefore, we have the following property:
$$\sum_{i=2}^{t_{c\max}} \sum_{k=1}^{N_s(i)} \alpha_{\mathrm{info},i,k} = R. \tag{23}$$


With this notation, the bit error probability on the information bits is
$$P_{e,\mathrm{info}}^{l} = \frac{1}{R} \sum_{i=2}^{t_{c\max}} \sum_{k=1}^{N_s(i)} \alpha_{\mathrm{info},i,k}\, Q\!\left(\sqrt{\frac{m_{u_0}(H_{i,k}) + i\, m_u^{(l)}}{2}}\right). \tag{24}$$
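Once the stair means and $m_u^{(l)}$ are known, for example from the recursion (18)–(20), equations (21) and (24) translate directly into code. In this minimal sketch, `q_func` and `pe_info` are hypothetical helper names, and $Q$ is obtained from `math.erfc`. The stair values in the usage lines are made up for illustration.

```python
import math


def q_func(x):
    """Gaussian tail function Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pe_info(m_u, stairs, rate):
    """Information-bit error probability (24) at one iteration l.

    m_u    : m_u^(l), the check-to-variable mean after iteration l
    stairs : (i, alpha_info, m0) per stair: degree i, alpha_info_{i,k} from (22),
             and m_u0(H_{i,k}) from (17); by (23) the alpha_info sum to `rate`
    """
    total = 0.0
    for i, alpha_info, m0 in stairs:
        total += alpha_info * q_func(math.sqrt((m0 + i * m_u) / 2.0))   # eq. (21)
    return total / rate


# Hypothetical stairs for R = 1/2: information bits of degrees 3 and 8.
stairs = [(3, 0.3, 6.0), (8, 0.2, 2.0)]
print(pe_info(m_u=1.5, stairs=stairs, rate=0.5))
```

A larger $m_u^{(l)}$ benefits high-degree bits the most, since it enters (21) multiplied by $i$. This is why the optimized profiles assign large degrees to information bits on weak subcarriers.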

The bit error probability (24) depends on the $\lambda_i$ and $p_i$ through the parameter $\alpha_{\mathrm{info},i,k}$ and the mean $m_{u_0}(H_{i,k})$. The difficulty is that (24) is nonlinear in the parameters $\lambda_i$, which is the main difference between our optimization scheme and existing work. This nonlinearity does not result from the chosen criterion; it comes from the nonstationarity of the channel. It is therefore not possible to optimize the code profile easily with linear programming, as was done in [9, 18] for the erasure and AWGN channels. This nonlinear cost function over a continuous parameter space can be minimized efficiently with differential evolution [21], a method previously used for LDPC optimization over Rayleigh fading channels [16]. For more details on differential evolution, we refer the reader to [22].

4. SIMULATION RESULTS

This section presents the results obtained with our approach (min Pe) for different channels: the AWGN channel, an ADSL channel, and a channel with a nonmonotonic spectrum. The results are compared with those of a regular coding scheme and with those of the method that minimizes the threshold for frequency-selective channels, called min T. Section 4.1 presents the optimized irregularity profiles, and Section 4.2 compares the performance for different codeword lengths.

4.1. Optimized irregular LDPC code profiles

In this section, we present the structure of LDPC codes optimized for two different frequency-selective channels:
(i) a typical ADSL channel, called ch A, with impulse response
$$h_A = [0.06,\ 0.72,\ 0.54,\ 0.36,\ 0.18,\ 0.114,\ 0.078,\ 0.054,\ 0.033,\ 0.018,\ 0.012]; \tag{25}$$
(ii) a channel with a nonmonotonic spectrum, denoted ch B, with impulse response
$$h_B = [-0.21,\ -0.17,\ 0.31,\ 0.68,\ -0.27,\ -0.15,\ 0.19,\ 0.13]. \tag{26}$$
To find good profiles in a reasonable time, some parameters of the LDPC codes are fixed: we set $\rho(x) = x^7$ and $t_{c\max} = 10$, and choose a coding rate $R = 1/2$. To obtain the optimized code for a given channel, we minimized the bit error probability (24) at a value of $E_b/N_0$ slightly lower than the threshold exhibited by the min T method. Figures 4 and 5 depict the irregularity profiles of the LDPC codes obtained after optimization for ch A and ch B. The resulting degree distributions $\lambda_{\mathrm{ch\,A}}$ and $\lambda_{\mathrm{ch\,B}}$ are
$$\begin{aligned} \lambda_{\mathrm{ch\,A}} &= [\lambda_2, \dots, \lambda_{10}] = [0.5,\ 0.1221,\ 0.0165,\ 0.1035,\ 0.0029,\ 0.1016,\ 0.0675,\ 0.0772,\ 0.0087],\\ \lambda_{\mathrm{ch\,B}} &= [\lambda_2, \dots, \lambda_{10}] = [0.5,\ 0.0069,\ 0.111,\ 0.1202,\ 0.1455,\ 0.0263,\ 0.0102,\ 0.0058,\ 0.0745]. \end{aligned}\tag{27}$$

Figure 4: Data-node connection degrees for ch A with R = 0.5 (connection degree versus subcarrier number; channel spectrum scaled ×5 and irregularity profile overlaid; information and redundancy regions marked).

Figure 5: Data-node connection degrees for ch B with R = 0.5 (connection degree versus subcarrier number; channel spectrum scaled ×7 and irregularity profile overlaid; information and redundancy regions marked).
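A quick consistency check of the profiles in (27) can be scripted as below. It rests on two assumptions:
- The $\lambda_i$ are node fractions ($\lambda_2 = 1 - R$), as the text of Section 4.1 suggests.
- $\rho(x) = x^7$ is read as a check-node degree of 8.

Counting edges from both sides of the graph then gives the design rate $R = 1 - \bar{d}_v/8$ for a full-rank parity-check matrix. The helper name `profile_summary` is ours.

```python
import numpy as np

LAMBDA_CH_A = [0.5, 0.1221, 0.0165, 0.1035, 0.0029, 0.1016, 0.0675, 0.0772, 0.0087]
LAMBDA_CH_B = [0.5, 0.0069, 0.111, 0.1202, 0.1455, 0.0263, 0.0102, 0.0058, 0.0745]


def profile_summary(lam, d_c=8):
    """Return (sum of lambda_i, average variable-node degree, design rate).

    lam lists lambda_2..lambda_10 as node fractions; d_c = 8 reads rho(x) = x^7 as
    a check-node degree of 8 (assumption). Counting edges from both sides,
    N * mean_dv = M * d_c, so R = 1 - M/N = 1 - mean_dv / d_c for a full-rank H.
    """
    lam = np.asarray(lam)
    degrees = np.arange(2, 2 + len(lam))
    mean_dv = float(degrees @ lam)
    return float(lam.sum()), mean_dv, 1.0 - mean_dv / d_c


for name, lam in (("ch A", LAMBDA_CH_A), ("ch B", LAMBDA_CH_B)):
    total, mean_dv, rate = profile_summary(lam)
    print(f"{name}: sum = {total:.4f}, mean degree = {mean_dv:.4f}, design rate = {rate:.4f}")
```

Both profiles sum to one within the four-digit rounding of (27). Both also give an average variable-node degree close to 4, that is, a design rate close to the target $R = 1/2$.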

Interestingly, the optimization yields a proportion of degree-2 bit nodes ($\lambda_2$) equal to $1 - R$, $R$ being the code rate. This results from minimizing the error probability only on the information bits, which represent exactly a proportion $R$ of the codeword. The optimization criterion protects the information bits better than the redundancy bits by allocating more edges in the graph to them. This would not matter on stationary channels, but on OFDM frequency-selective channels the position of the information bits in the codeword matters. Figures 4 and 5 show that including the node degree locations as parameters in the cost function leads to an irregularity profile with two behaviors. First, the redundancy bits are connected to exactly 2 edges and are placed in the codeword where the channel has the lowest gain. Second, the connection degrees of the information bits vary inversely with the channel gain. This kind of profile can be interpreted as a compensation for the channel selectivity. A bit connected to a large number of check nodes is well protected against the additive noise, because it receives a lot of information from the other bits during decoding. An information bit transmitted on a subcarrier with a small SNR must therefore have a large connection degree to be well protected. We recall that these profiles result from the optimization process itself, thanks to the location parameters introduced in the code profile, and do not rely on any a priori assumption or heuristic.

4.2. Performance study

One drawback of LDPC codes is their high encoding complexity. Several authors have proposed LDPC encoding methods whose complexity scales as O(N) [12, 23, 24]. However, these methods permute the codeword bits, which destroys the desired irregularity profile of the code. In [25], MacKay et al. propose to encode the information bits directly with the parity-check matrix, which can be done in linear time when this matrix is upper triangular. This approach preserves the irregularity profile while keeping the encoding simple. For this purpose, we derived a bit-filling algorithm that builds an upper triangular parity-check matrix H with the profiles of Figures 4 and 5 and no cycles of length 4.

In our simulations, the number of subcarriers $N_c$ is set to 1024, the length of the cyclic prefix is set to 12, and the LDPC codeword length is $N = k \times N_c$. Two codeword lengths are used:
- N = 16384, decoded with 200 iterations of belief propagation, to obtain performance close to the asymptotic behavior;
- N = 1024, decoded with 40 iterations of belief propagation, to consider more practical cases.

The results are plotted in Figures 6–10.

Figures 6 and 7 compare the performance of regular and optimized irregular coding schemes for transmission over the AWGN, ch A, and ch B channels. The regular code is defined by $\lambda(x) = x^3$ and $\rho(x) = x^7$. At an error probability of $10^{-5}$, the optimized code improves on the regular one by 1.5 dB for ch A, with roughly the same decoding complexity. The same remarks hold for ch B, with an improvement of 0.8 dB. We have also simulated the optimized code (obtained for ch A) and the regular one on the AWGN channel. These results show that the gap is larger on the frequency-selective channel than on the AWGN channel. We conclude that the optimized irregular coding strategy is well suited to transmission over frequency-selective channels.

Figure 6: Performance comparison between our method and the regular coding scheme for a rate R = 1/2 over ch A and over the AWGN channel, N = 1024 (BER versus Eb/N0 (dB); curves: regular and optimized codes on the ADSL channel and on the AWGN channel).

Figure 7: Performance comparison between our method and the regular coding scheme for a rate R = 1/2 over ch B, N = 1024 (BER versus Eb/N0 (dB); curves: optimized code and regular code).
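Section 4.2 uses the linear-time encoding principle of [25]: the information bits are set first, and the parity bits are obtained from an upper triangular parity-check matrix. This can be sketched by back substitution over GF(2). The bit-filling construction of the matrix itself is not shown. The helper `encode_upper_triangular`, the column layout, and the 3 × 6 matrix in the usage lines are hypothetical.

```python
import numpy as np


def encode_upper_triangular(H, parity_cols, info_bits):
    """Linear-time GF(2) encoding by back substitution.

    Assumes H[:, parity_cols] is upper triangular with ones on its diagonal, i.e.
    check r involves parity bit r and possibly parity bits r+1, ..., never earlier
    ones. The other columns carry the information bits, in increasing column order.
    With sparse row storage the cost is proportional to the number of ones in H.
    """
    m, n = H.shape
    parity = set(parity_cols)
    info_cols = [c for c in range(n) if c not in parity]
    c = np.zeros(n, dtype=np.uint8)
    c[info_cols] = info_bits
    for r in range(m - 1, -1, -1):                       # last check first
        others = np.flatnonzero(H[r])
        others = others[others != parity_cols[r]]        # all other bits in check r
        c[parity_cols[r]] = np.bitwise_xor.reduce(c[others]) if others.size else 0
    return c


# Hypothetical 3 x 6 example: columns 3, 4, 5 hold the parity bits.
H = np.array([[1, 1, 0, 1, 1, 0],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 0, 0, 1]], dtype=np.uint8)
codeword = encode_upper_triangular(H, [3, 4, 5], [1, 0, 1])
print(codeword, (H.astype(int) @ codeword.astype(int)) % 2)
```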

Figure 8: Performance comparison over ch A between the codes obtained using the min Pe and the min T methods. R = 1/2 and N = 1024 (BER versus Eb/N0 (dB); curves: code optimized by minimization of Pe, code optimized by minimization of the threshold).

Figure 9: Performance comparison over ch A between the codes obtained using the min Pe and the min T methods. R = 1/2 and N = 16384 (BER versus Eb/N0 (dB); same two curves).

Figure 10: Performance comparison over ch A between our method and the LDPC code obtained by optimization on the AWGN channel, for a rate R = 1/2. N = 1024 (BER versus Eb/N0 (dB); curves: optimized code and interleaved AWGN code, both on the ADSL channel).

Figures 8 and 9 compare the performance of LDPC codes obtained with two different optimization criteria: minimization of the bit error probability (min Pe) and minimization of the threshold (min T). For N = 1024 (Figure 8), the LDPC code obtained by min Pe outperforms the one obtained by min T. For example, at a bit error rate (BER) of $10^{-5}$, we observe an improvement of 1.1 dB. For N = 16384 (Figure 9), two regions can be distinguished. For SNRs lower than 5.15 dB, min Pe performs better than min T. For SNRs higher than 5.15 dB, the min T code exhibits a threshold phenomenon and outperforms the min Pe code. Hence, for applications where N is constrained to a relatively small value, the min Pe criterion is preferable for designing an optimized channel coding scheme.

To confirm that optimizing LDPC codes for a specific channel is important, we have also compared our codes with the irregular LDPC codes presented in [18]. Those codes were optimized for the AWGN channel with the same parameters, i.e., $\rho(x) = x^7$, $t_{c\max} = 10$, and R = 1/2. Because the positions of the codeword bits result from our optimization process, we added an interleaver for the code optimized on the AWGN channel. The comparison is shown in Figure 10 for a codeword of length 1024 over the frequency-selective channel ch A. Adapting the LDPC code irregularity to the channel shape leads to a large improvement in coding performance.

5. DISCUSSION AND CONCLUSION

In this paper, we have optimized the structure of LDPC codes for transmission over OFDM frequency-selective channels. The optimization is based on a new and general parameterization of the LDPC code irregularity, the irregularity profile, which we optimized using a Gaussian approximation technique. We have shown that it is relevant to minimize the bit error probability instead of seeking an LDPC code with a vanishing block error probability. This has been shown by simulations on several channels and for both small and large codeword lengths. At a BER of $10^{-5}$ (N = 1024), the optimized LDPC codes improve on a regular LDPC code by 1.5 dB on ch A and 0.8 dB on ch B. This improvement is larger than the one observed on the stationary Gaussian channel. Irregular LDPC codes are therefore even more important on OFDM frequency-selective channels than on simpler channels.

We emphasize that the proposed method does not require perfect a priori knowledge of the CSI to optimize the channel code and is therefore well suited to practical cases. Indeed, the a priori knowledge of the CSI is often incomplete because a short training sequence is used to estimate it. Moreover, we believe that our method remains attractive for slowly varying channels and/or when channel estimation mismatches occur. Nevertheless, we are currently studying the robustness of the optimized codes with respect to partial or wrong CSI; this work will be reported in a future publication.

Our approach can also be compared with power allocation, as used in the DMT standard. When perfect CSI is available at the transmitter, power allocation transforms a frequency-selective channel into an AWGN channel. In theory, it is then possible to find the optimal channel code achieving the capacity of the frequency-selective channel. This assumes a perfect power allocation according to the water-filling principle. In practice, however, perfect power allocation is not a realistic assumption, for two reasons. First, the CSI must be estimated. Second, power allocation relies on a bit-loading algorithm that can use only a few constellations, each carrying an integer number of bits. For these reasons, we believe that power allocation and irregularity profile optimization are not necessarily competing approaches and could be used jointly to achieve performance close to capacity. The methods developed in this paper could also be applied to multiuser multicarrier modulations such as MC-CDMA.

ACKNOWLEDGMENTS

We wish to thank the reviewers for their helpful comments, which have improved the presentation of this paper and given us some insights into future developments. This work was partially supported by the Groupe de Recherche Information, Signal, Images et Vision (GdR ISIS, CNRS).

REFERENCES

[1] J. A. C. Bingham, "Multicarrier modulation for data transmission: an idea whose time has come," IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[2] H. Sari, G. Karam, and I. Jeanclaude, "Transmission techniques for digital terrestrial TV broadcasting," IEEE Communications Magazine, vol. 33, no. 2, pp. 100–109, 1995.
[3] W. Y. Zou and Y. Wu, "COFDM: an overview," IEEE Transactions on Broadcasting, vol. 41, no. 1, pp. 1–8, 1995.
[4] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July–October 1948.
[5] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, pp. 1064–1070, Geneva, Switzerland, May 1993.
[6] M. Sipser and D. A. Spielman, "Expander codes," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 1710–1722, 1996.
[7] D. J. C. MacKay, "Good error-correcting codes based on very sparse matrices," IEEE Transactions on Information Theory, vol. 45, no. 2, pp. 399–431, 1999.
[8] A. Shokrollahi and R. Storn, Design of Efficient Erasure Codes with Differential Evolution, Bell Laboratories, Murray Hill, NJ, USA, 1999.
[9] S.-Y. Chung, T. J. Richardson, and R. L. Urbanke, "Analysis of sum-product decoding of low-density parity-check codes using a Gaussian approximation," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 657–670, 2001.
[10] D. Doan and K. Narayanan, "Design of good low rate codes for ISI channels based on spectral shaping," in Proc. 3rd International Symposium on Turbo-Codes & Related Topics, Brest, France, September 2003.
[11] H. Pishro-Nik, N. Rahnavard, J. Ha, F. Fekri, and A. Adibi, "Low-density parity-check codes for volume holographic memory systems," Applied Optics-IP, vol. 42, no. 5, pp. 861–870, 2003.
[12] D. Declercq, G. Gelle, and V. Mannoni, "Irregular channel coding for OFDM transmission," in Proc. 6th International Symposium on Communication Theory and Applications (ISCTA '01), Ambleside, UK, July 2001.
[13] T. J. Richardson, M. A. Shokrollahi, and R. L. Urbanke, "Design of capacity-approaching irregular low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 619–637, 2001.
[14] F. Verdier and D. Declercq, "A LDPC parity check matrix construction for parallel hardware decoding," in Proc. 3rd International Symposium on Turbo-Codes & Related Topics, Brest, France, September 2003.
[15] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, 2001.
[16] J. Hou, P. H. Siegel, and L. B. Milstein, "Performance analysis and code optimization of low density parity-check codes on Rayleigh fading channels," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 924–934, 2001.
[17] C. R. Hartmann and L. D. Rudolph, "An optimum symbol-by-symbol decoding rule for linear codes," IEEE Transactions on Information Theory, vol. 22, no. 5, pp. 514–517, 1976.
[18] T. J. Richardson and R. L. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 599–618, 2001.
[19] M. Capinski and E. Kopp, Measure, Integral and Probability, Springer-Verlag, New York, NY, USA, 1999.
[20] M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, C. Griffin, London, UK, 2nd edition, 1963.
[21] V. Mannoni, D. Declercq, and G. Gelle, "Optimized irregular Gallager codes for OFDM transmission," in Proc. 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC '02), vol. 1, pp. 222–226, Lisboa, Portugal, September 2002.
[22] R. Storn and K. Price, "Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces," Journal of Global Optimization, vol. 11, no. 4, pp. 341–359, 1997.
[23] D. A. Spielman, "Linear-time encodable and decodable error-correcting codes," IEEE Transactions on Information Theory, vol. 42, no. 6, pp. 1723–1731, 1996.

[24] T. J. Richardson and R. L. Urbanke, "Efficient encoding of low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 638–656, 2001.
[25] D. J. C. MacKay, S. T. Wilson, and M. C. Davey, "Comparison of constructions of irregular Gallager codes," IEEE Transactions on Communications, vol. 47, no. 10, pp. 1449–1454, 1999.

Valérian Mannoni was born in 1977 in Brou-sur-Chantereine, France. He received the Diplôme d'Etudes Approfondies (M.S.) degree in automatic and digital signal processing from the University of Reims Champagne-Ardenne, Reims, France, in 2000. Since October 2000, he has been pursuing the Ph.D. degree in the Laboratory of Decision and Communication Systems (DéCom), University of Reims Champagne-Ardenne, and in the ETIS laboratory, University of Cergy-Pontoise, France. His research interests are in channel coding for multicarrier transmission, especially with Low-Density Parity-Check codes.

David Declercq received his Ph.D. degree in electrical and computer engineering in 1998 from the University of Cergy-Pontoise, France. He is now an Associate Professor at the ETIS laboratory of the University of Cergy-Pontoise, in the Signal Processing Research Group. He has been the Head of the Signal Processing Group since 2003. His research interests are mainly in statistical model estimation and information theory for digital communication. In particular, he is interested in designing good LDPC codes for various types of nonstandard channels (OFDM, multiuser, etc.).

Guillaume Gelle was born in France in 1969. He received the DEA (M.S.) degree in 1994 and the Ph.D. degree in 1998 from the University of Reims Champagne-Ardenne in electrical and computer engineering. He is currently an Associate Professor in the Department of Electrical Engineering and in the Laboratory of Decision and Communication Systems (DéCom), where he leads the Communication Systems Team (SysCom).
His general research interests include digital signal processing and communication with emphasis on nonlinear, nonGaussian, and nonstationary signal processing. He currently works on advanced signal processing and coding theory for multicarrier communication systems.


EURASIP Journal on Applied Signal Processing 2004:10, 1557–1567
© 2004 Hindawi Publishing Corporation

Layered Video Transmission on Adaptive OFDM Wireless Systems

D. Dardari
IEIIT-BO/CNR, CNIT, DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

M. G. Martini
IEIIT-BO/CNR, CNIT, DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

M. Mazzotti
IEIIT-BO/CNR, CNIT, DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

M. Chiani
IEIIT-BO/CNR, CNIT, DEIS, University of Bologna, Viale Risorgimento 2, 40136 Bologna, Italy

Received 28 February 2003; Revised 26 January 2004

Future wireless video transmission systems will use orthogonal frequency division multiplexing (OFDM) as the basic modulation technique. OFDM is robust in frequency-selective channels and can be implemented with low complexity. Adaptive bit loading techniques have recently been applied to OFDM, with good performance gains in cable transmission systems. In this paper, we propose a multilayer bit loading technique based on the so-called “ordered subcarrier selection algorithm.” We apply it to a Hiperlan2-like wireless system at 5 GHz for efficient layered multimedia transmission. We compare different schemes that realize unequal error protection at both the coding and the modulation levels. We evaluate the strong impact of this technique on video quality for MPEG-4 video transmission.

Keywords and phrases: OFDM, adaptive modulation, bit loading, UEP, MPEG-4.

1. INTRODUCTION

A main goal for communication systems in the near future is the development of efficient multimedia data coding, compression, and transmission techniques that permit real-time mobile communications. In this context, the major challenge is the integration of different categories of networks and wireless local area networks (WLANs). Systems have to be adaptive, that is, they have to react to changing quality conditions such as varying channel capacity. In high-speed wireless data applications, the orthogonal frequency division multiplexing (OFDM) modulation scheme has been considered for frequency-selective fading channels, where its receiver structure is relatively simple compared with single-carrier transmission. IEEE has adopted OFDM modulation for the extension of the 802.11 wireless LAN standard to the 5 GHz band (IEEE802.11a), which provides data rates up to 54 Mbps [1]. ETSI has also adopted the OFDM scheme for its high performance LAN physical layer standard (Hiperlan2) [2]. Conventional OFDM modems use a fixed constellation size and a fixed power level for all subchannels. More recent standards (e.g., IEEE802.11a) allow the constellation size, which is the same for all subchannels, to follow the time variation of the global channel state. Because of multipath fading, some subchannels can experience a severe degradation of the signal-to-noise ratio (SNR), which results in a high overall bit error rate. Channel coding is a common technique to mitigate this effect. If the channel is static (e.g., in digital subscriber lines (DSL)) or slowly time varying, the receiver can send detailed channel state information (CSI) to the transmitter over a robust feedback channel. More sophisticated adaptive transmission techniques can then use the CSI to dynamically modify the parameters of the modulator and improve the performance [3].

Multicarrier modulation also makes it possible to change the transmitted power and bit rate of each subchannel dynamically, according to variations in channel selectivity (adaptive bit loading). The first applications of bit loading algorithms appeared in DSL systems [4, 5]. It is well known that the theoretical channel capacity can be approached by distributing the total transmitted energy according to the water-filling principle [6]. In practice, the constellation size has finite granularity, and the bit distribution obtained by rounding the water-filling solution may not be optimal. Some suboptimal algorithms that reduce complexity have been proposed in the ADSL context [7, 8]. Campello [9] gives sufficient conditions for a discrete bit allocation to be optimal. Based on these conditions, a “greedy” algorithm can be used to achieve the optimal discrete bit/power loading distribution. Some studies have recently applied adaptive bit loading algorithms to wireless channels [10, 11, 12, 13]. In this case, particular attention must be paid to the effects of channel estimation and of the CSI update rate on performance [14, 15, 16]. However, water-filling-based techniques require a large CSI feedback overhead, which makes them suitable only for static or very slowly time-varying channels. The modem must also be able to change the modulation format and power continually on a per-subcarrier basis, which is highly complex at high data rates. Simple suboptimal algorithms should therefore be investigated to reduce complexity and CSI overhead. Good results may also be obtained by adapting the modulation to the structure and significance of the source data, which realizes unequal error protection (UEP) in the modulation domain.
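Campello's conditions justify placing bits greedily, one at a time. The following sketch shows that kind of greedy discrete loading. It is not a reproduction of the algorithm in [9]. It assumes one-bit granularity, a per-subchannel cap, and the SNR-gap approximation E(b) = Γ(2^b − 1)/g for the energy needed to carry b bits on a subchannel with normalized gain g. The function name and its parameters are hypothetical.

```python
import heapq

def greedy_bit_loading(gains, total_bits, gap=1.0, max_bits=6):
    """Hypothetical helper: greedy discrete bit loading, one bit at a time.

    gains      -- normalized subchannel power gains (|Hn|^2 over noise power)
    total_bits -- bits to place in one OFDM symbol (bT)
    gap        -- SNR gap; energy for b bits is modelled as gap * (2**b - 1) / gain
    max_bits   -- largest number of bits one subchannel may carry
    """
    bits = [0] * len(gains)
    # Min-heap of (energy needed to add the next bit, subchannel index).
    heap = [(gap / g, n) for n, g in enumerate(gains) if g > 0]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            raise ValueError("total_bits exceeds what the subchannels can carry")
        _, n = heapq.heappop(heap)
        bits[n] += 1
        if bits[n] < max_bits:
            # Going from b to b + 1 bits costs gap * 2**b / gain.
            heapq.heappush(heap, (gap * 2 ** bits[n] / gains[n], n))
    return bits

print(greedy_bit_loading([2.0, 0.2, 1.0, 0.05], 8))
```

At each step, the sketch picks the subchannel whose next bit costs the least energy. Under this model, that cost grows with the number of bits already loaded. Greedy incremental allocation therefore minimizes the total energy for the requested number of bits, which is the property Campello's conditions characterize.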
UEP has proven effective for the transmission of compressed sources, where the bits produced differ in significance. Giving the more significant bits a lower bit error rate and the less significant bits less protection increases the perceived quality. UEP has been applied to audio transmission [17, 18], to progressive image transmission [19], and to subband-coded audio and video transmission, since some kinds of sources lend themselves to being partitioned into groups of bits with different sensitivity. UEP for block-based coded video sources has also been proposed [20, 21, 22]. UEP is classically performed at the channel coding level, through convolutional codes and, more recently, turbo codes. Multiresolution constellations allow nonuniform data protection in the modulation domain [23]. Some recent studies have proposed performing UEP in the modulation domain by exploiting the characteristics of multicarrier modulation [13, 24]. In this case, the subcarriers need a nonuniform bit and power allocation. Compared with the uniform case, this implies significant modem complexity and a high CSI signaling overhead between the transmitter and the receiver. It may also cause a higher sensitivity to signaling errors.

In this paper, we propose a simple bit loading algorithm in which the constellation size and power level are constrained to be uniform over all used subcarriers. We extend it to the multilayer case to perform UEP of layered video sources at the modulation level. We compare this technique with UEP realized at the channel coding level and with an equal error protection (EEP) scheme based on classic bit loading techniques. We evaluate the performance in terms of peak signal-to-noise ratio (PSNR) for MPEG-4 video transmission in a wireless data service. The results show that a large gain can be obtained, especially at low SNR.

2. REFERENCE TRANSMISSION SYSTEM

Figure 1 illustrates the considered transmission system. The transmitter section basically consists of three blocks. A channel coder comes first. It is followed by a bit loading unit, which distributes the data bits among the subchannels according to the implemented algorithm (its functions are detailed below). A conventional OFDM modulator completes the chain. The OFDM scheme transmits N parallel complex symbols An (n = 1, 2, ..., N) over N parallel subchannels (or subcarriers). Each symbol An belongs to an Mn-point constellation with levels {±1, ±3, ..., ±(√Mn − 1)} in both the real and the imaginary dimensions. The symbol (or frame) duration is denoted by Ts. Practical modems generally adopt only a limited set of values Mn = M that is constant over the subcarriers (e.g., M = 2, 4, 16, 64) [1]. To ensure orthogonality between subchannels in ideal channel conditions, the subchannels are obtained by means of an inverse fast Fourier transform (IFFT) of order NFFT (N < NFFT to accommodate virtual subcarriers). The samples at the output of the IFFT block are converted from parallel to serial and transmitted every Tc seconds (chip time). In practice, propagation effects destroy the orthogonality between subchannels. A cyclic prefix (guard interval) is therefore added to the OFDM symbol (the IFFT output) to remove the intersymbol interference (ISI) and preserve the orthogonality among subchannels [25]. Its duration is a multiple D of the chip time Tc, that is, Tg = D · Tc. The receiver performs the reverse process. The cyclic prefix is redundant: only the time Tu = NFFT · Tc carries useful data, whereas the total OFDM symbol time is Ts = Tu + Tg = Tc · (NFFT + D). The power efficiency due to the guard interval, which is less than 1, is therefore

$$\eta_D = \frac{T_u}{T_s} = \frac{N_{FFT}}{N_{FFT} + D}. \qquad (1)$$
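As a quick numerical check of (1), the sketch below uses the IEEE802.11a/Hiperlan2 values given later in Section 5.2: Ts = 4 µs, Tg = 800 ns, and NFFT = 64. It assumes Tu = Ts − Tg and recovers the chip time Tc and the guard length D from them. The helper name is hypothetical.

```python
def guard_efficiency(n_fft, ts_ns, tg_ns):
    """Hypothetical helper: return (Tc, D, eta_D) from Eq. (1).

    Times are in nanoseconds; the useful part Tu = Ts - Tg is split into n_fft chips.
    """
    tu_ns = ts_ns - tg_ns
    tc_ns = tu_ns / n_fft          # chip time Tc
    d = tg_ns / tc_ns              # guard length in chips, Tg = D * Tc
    eta = n_fft / (n_fft + d)      # Eq. (1)
    return tc_ns, d, eta

tc, d, eta = guard_efficiency(64, 4000, 800)
print(tc, d, eta)  # 50.0 16.0 0.8
```

The result ηD = 0.8 matches the value quoted for the system parameters. It corresponds to a chip time of 50 ns and a 16-chip guard interval.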

If the maximum multipath delay Td is less than the guard interval Tg, no ISI is present. The complex received signal at the nth output of the FFT block can then be written, in normalized form, as [26]

$$Z_n = H_n \, w_n \, A_n + x_n, \qquad (2)$$

where Hn is the channel transfer function gain of the nth subchannel. The weight coefficient wn provides the nonuniform power allocation at the transmitter side that common bit loading algorithms require. In nonadapted schemes, wn = 1 for all n.

Figure 1: Transmission system block diagram. [Transmitter: video stream, MPEG-4 coder (layers 1–3, layered MPEG-4 stream), channel coder, bit loading (A1, ..., AN), IFFT (NFFT), P/S and guard interval insertion, D/A. Channel: 5 GHz wireless channel. Receiver: A/D, guard interval removal and S/P, FFT (Z1, ..., ZN), detection, channel decoder, MPEG-4 decoder, video stream, PSNR evaluation. Channel estimation at the receiver supplies the CSI to the bit loading unit.]

We follow the normalization in [26]. The random variable xn represents the zero-mean complex Gaussian thermal noise component at the nth FFT output, with power

$$\sigma_x^2 = E\left\{|x_n|^2\right\} = \frac{2N_0}{T_u}, \qquad (3)$$

where N0 is the single-sided noise power spectral density. Recall that symbol An belongs to an Mn-QAM constellation. The average power Pn dedicated to the nth subchannel is then

$$P_n = E\left\{|A_n|^2\right\} w_n^2 = \frac{2\left(M_n - 1\right)}{3}\, w_n^2, \qquad (4)$$

leading to a total average transmitted power

$$P_T = \sum_{n=1}^{N_u} P_n. \qquad (5)$$
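The factor 2(Mn − 1)/3 in (4) is the mean energy of a square QAM constellation with levels ±1, ±3, ..., ±(√Mn − 1) per dimension. The sketch below checks it by enumerating every constellation point. The helper name is hypothetical.

```python
import itertools
import math

def qam_avg_energy(M):
    """Hypothetical helper: average |A|^2 of square M-QAM with odd-integer levels."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square")
    levels = range(-(side - 1), side, 2)   # -(sqrt(M)-1), ..., -1, 1, ..., sqrt(M)-1
    energies = [re * re + im * im for re, im in itertools.product(levels, repeat=2)]
    return sum(energies) / len(energies)

for M in (4, 16, 64):
    print(M, qam_avg_energy(M), 2 * (M - 1) / 3)
```

For M = 4, 16, and 64, both columns give 2, 10, and 42. This confirms the constellation normalization used in (4).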

In (5), Nu is the actual number of subchannels used by the bit loading algorithm. Pilot subcarriers allocated for channel estimation have been neglected. Assume Mn-QAM signaling with ideal phase offset compensation and perfect carrier recovery and synchronization. The bit error probability of the nth subchannel can then be approximated as follows [27]:

$$P_{b_n} \cong \frac{2\left(\sqrt{M_n} - 1\right)}{\sqrt{M_n}\,\log_2 M_n}\,\operatorname{erfc}\left(\sqrt{\frac{w_n^2\,|H_n|^2}{\sigma_x^2}}\right). \qquad (6)$$

Considering (3), (4), and (5), we obtain

$$P_{b_n} \cong \frac{2\left(\sqrt{M_n} - 1\right)}{\sqrt{M_n}\,\log_2 M_n}\,\operatorname{erfc}\left(\sqrt{\frac{3 P_n\,|H_n|^2}{2\left(M_n - 1\right)\sigma_x^2}}\right) = \frac{2\left(\sqrt{M_n} - 1\right)}{\sqrt{M_n}\,\log_2 M_n}\,\operatorname{erfc}\left(\sqrt{\frac{E_s}{N_0}\cdot\frac{3\varepsilon_n\,|H_n|^2\,\eta_D}{2\left(M_n - 1\right)}}\right), \qquad (7)$$

where

$$\frac{E_s}{N_0} = \frac{P_T\,T_s}{2N_0} \qquad (8)$$

denotes the transmitted (OFDM) radio frequency symbol energy-to-noise ratio. The quantity εn = Pn/PT is the fraction of the power dedicated to the nth subchannel, so that $\sum_{n=1}^{N_u} \varepsilon_n = 1$. Once the code rate Rc and the actual number of bits transmitted per frame bT are fixed, Es/N0 can be expressed as a function of the received average bit energy-to-noise ratio Eb/N0:

$$\frac{E_s}{N_0} = R_c\, b_T\, \frac{E_b}{N_0}. \qquad (9)$$
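The sketch below evaluates (7) for a single subchannel and converts Eb/N0 to Es/N0 with (9). It assumes square M-QAM, so √M is an integer as in (6), and uses Python's standard math.erfc. The function names are hypothetical. For M = 4, the prefactor reduces to 1/2, which gives the familiar QPSK form.

```python
import math

def es_n0_from_eb_n0(eb_n0_db, rc, b_t):
    """Hypothetical helper for Eq. (9): Es/N0 = Rc * bT * Eb/N0 (linear result)."""
    return rc * b_t * 10.0 ** (eb_n0_db / 10.0)

def pb_subchannel(es_n0, eps_n, h2, eta_d, M):
    """Hypothetical helper for Eq. (7): approximate BER of one square M-QAM subchannel.

    es_n0 -- OFDM symbol energy-to-noise ratio (linear)
    eps_n -- power fraction Pn / PT
    h2    -- subchannel power gain |Hn|^2
    eta_d -- guard-interval efficiency from Eq. (1)
    """
    sm = math.sqrt(M)
    arg = es_n0 * 3.0 * eps_n * h2 * eta_d / (2.0 * (M - 1))
    return 2.0 * (sm - 1.0) / (sm * math.log2(M)) * math.erfc(math.sqrt(arg))

# Reference scheme of Section 5: Rc = 1/2, bT = 96, QPSK on 48 equally powered subchannels.
es_n0 = es_n0_from_eb_n0(7.0, rc=0.5, b_t=96)
print(pb_subchannel(es_n0, 1 / 48, 1.0, 0.8, 4))
```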

As (7) shows, the performance of each subchannel depends on |Hn|, so severely attenuated subchannels can compromise the overall performance. In general, suitable channel coding is necessary to improve the overall performance (coded OFDM) [26]. Channel coding is used in the numerical results.

3. MULTILAYER ADAPTIVE BIT LOADING

3.1. Ordered subcarrier selection algorithm

Current WLAN standards [1, 2] consider a fixed bit loading scheme. The constellation size M is chosen according to the overall propagation conditions. All subchannels (Nu = N) then use the same size M (Mn = M) and the same power fraction (εn = 1/N), independently of the individual subchannel conditions. In the following, this case is referred to as the reference scheme (conventional OFDM with no adaptation). The total number of bits transmitted in every OFDM symbol time Ts is bT = N log2(M). The basic principle of adaptive modulation techniques is to dynamically modify the modulation parameters (Mn, εn, and Nu) according to the time-variant channel conditions [3]. This can be done efficiently if the transmitter knows the channel state. A feedback channel should thus be available to pass the CSI to the transmitter, as shown in Figure 1. The required CSI update rate depends on the channel variability, in particular on the channel coherence time. Common adaptive schemes load each subchannel with its own constellation size Mn and fractional power level εn, which can differ from those of the other subchannels [7]. Campello's conditions [9] give the optimal set of εn and Mn, which maximizes the power margin. In those cases, all source bits are assumed to have the same importance (EEP). These algorithms lead to high modem complexity and require a large signaling overhead in time-varying wireless channels, especially in high-speed systems. Some techniques in the literature partially overcome this problem. Grunheid et al. [28] propose a simplified scheme where the optimization is performed with a blockwise allocation of modulation levels. In [29], it is shown that a constant power allocation scheme has a negligible performance loss compared to the optimal solution.
To obtain low-complexity modems, we propose a modified scheme that transmits the same number of bits bT as the reference scheme. However, only a subset of Nu ≤ N of the available N subchannels is actually used. The constellation size must then be increased so that all the bT bits can be allocated, that is,

$$M_n = M = 2^{b_T / N_u}, \qquad n = 1, 2, \ldots, N_u. \qquad (10)$$
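Only some values of Nu make (10) produce a constellation size that a modem actually supports. The sketch below lists them for the 48-subcarrier system of Section 5. It assumes that the supported sizes are M = 2, 4, 16, and 64, the set mentioned in Section 2. The helper name is hypothetical. Nu = N/2 (for bT = 96) and Nu = 2N/3 (for bT = 192), the values discussed later in this section, both appear in the output.

```python
def allowed_active_subchannels(b_t, n, bits_options=(1, 2, 4, 6)):
    """Hypothetical helper: return {Nu: M} for Nu <= N such that Eq. (10) gives a supported M.

    bits_options lists the log2(M) values the modem supports (default: M = 2, 4, 16, 64).
    """
    result = {}
    for nu in range(1, n + 1):
        if b_t % nu == 0 and b_t // nu in bits_options:
            result[nu] = 2 ** (b_t // nu)
    return result

print(allowed_active_subchannels(96, 48))    # {16: 64, 24: 16, 48: 4}
print(allowed_active_subchannels(192, 48))   # {32: 64, 48: 16}
```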

Only a limited number of values of Nu is allowed if the constellation size M is to take a practical integer value. The total transmitted power is then distributed uniformly among the Nu used subchannels (εn = 1/Nu). The basic idea of the proposed ordered subcarrier selection algorithm is to select only the Nu strongest subchannels, that is, those with the highest values of |Hn|^2. Larger constellations are used on these subchannels, and the total bit rate and transmitted power are kept unchanged. In our approach, both the power level and the constellation size are constant over the selected set of subchannels. The receiver estimates the channel gains Hn and selects the Nu strongest (most reliable) subchannels. It then informs the transmitter, through the feedback channel, which of them to use in the next packet transmission. Note that the required feedback throughput is very limited compared with that of common bit loading algorithms [5].

To find which choice of Nu gives good results, we analyze the average bit error probability of the proposed algorithm when all subchannels are affected by independent Rayleigh fading. In this case, λ = |Hn|^2 is an exponentially distributed random variable [27]. Its probability density function (pdf) and cumulative distribution function (cdf) are

$$f_\lambda(\lambda) = e^{-\lambda}, \qquad F_\lambda(\lambda) = 1 - e^{-\lambda}, \qquad (11)$$

respectively. The fading process has been normalized so that Eλ[λ] = 1, where Ex[·] denotes statistical expectation over the random variable x. Before data assignment, the new algorithm orders the subchannels so that λ1 ≤ λ2 ≤ · · · ≤ λN, with λk = |Ho(k)|^2. The function n = o(k) describes the index ordering. From order statistics theory [30], the pdf of λk can be expressed as follows:

$$f_k(\lambda_k) = \frac{N!\,F_\lambda(\lambda_k)^{k-1}\left[1 - F_\lambda(\lambda_k)\right]^{N-k}}{(k-1)!\,(N-k)!}\,f_\lambda(\lambda_k) = \frac{N!\,e^{-\lambda_k (N-k+1)}\left(1 - e^{-\lambda_k}\right)^{k-1}}{(k-1)!\,(N-k)!}, \qquad k = 1, \ldots, N. \qquad (12)$$
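The sketch below implements (12) for unit-mean exponential gains and checks it numerically. Each ordered pdf should integrate to one. The mean of the kth smallest of N exponentials should equal the standard order-statistics result 1/N + 1/(N − 1) + ... + 1/(N − k + 1). The coefficient uses the identity N!/((k − 1)!(N − k)!) = N·C(N − 1, k − 1). The trapezoidal integrator and its range are illustrative choices, not taken from the paper.

```python
import math

def ordered_pdf(lam, k, n):
    """Eq. (12): pdf of the kth smallest of n i.i.d. unit-mean exponential gains."""
    coeff = n * math.comb(n - 1, k - 1)          # N! / ((k-1)! (N-k)!)
    return coeff * math.exp(-lam * (n - k + 1)) * (1.0 - math.exp(-lam)) ** (k - 1)

def integrate(fun, upper=40.0, steps=40000):
    """Plain trapezoidal rule on [0, upper]; the tail beyond 40 is negligible here."""
    h = upper / steps
    total = 0.5 * (fun(0.0) + fun(upper))
    total += sum(fun(i * h) for i in range(1, steps))
    return total * h

n = 4
for k in range(1, n + 1):
    mass = integrate(lambda x: ordered_pdf(x, k, n))
    mean = integrate(lambda x: x * ordered_pdf(x, k, n))
    print(k, round(mass, 4), round(mean, 4))
```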

Equation (12) shows that the ordered fading statistics depend on the subchannel index k. The average bit error probability on the kth (ordered) subchannel is defined as follows:

$$\bar{P}_{b_k} = E_{\lambda_k}\left[P_{b_k}\right], \qquad (13)$$

where $E_{\lambda_k}[P_{b_k}]$ is obtained by averaging (7) over the statistics given by (12). Since only the Nu strongest subchannels are used, the final average bit error probability becomes

$$\bar{P}_b = \frac{1}{N_u} \sum_{k=N-N_u+1}^{N} \bar{P}_{b_k}. \qquad (14)$$
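The analysis in [31] averages (7) over (12). The sketch below instead estimates (14) by Monte Carlo, under the same i.i.d. Rayleigh assumption. It draws N exponential gains, keeps the Nu largest, applies (7) with εn = 1/Nu and M from (10), and averages the result. The trial count, seed, and function names are hypothetical. ηD = 0.8 matches the parameters of Section 5.2. Because (6) and (7) assume square QAM, the sketch accepts only an even number of bits per subcarrier.

```python
import math
import random

def pb_eq7(es_n0, eps, lam, eta_d, M):
    """Eq. (7) for square M-QAM on a subchannel with power gain lam."""
    sm = math.sqrt(M)
    arg = es_n0 * 3.0 * eps * lam * eta_d / (2.0 * (M - 1))
    return 2.0 * (sm - 1.0) / (sm * math.log2(M)) * math.erfc(math.sqrt(arg))

def avg_pb_ordered(es_n0, n, nu, b_t, eta_d=0.8, trials=2000, seed=1):
    """Hypothetical helper: Monte Carlo estimate of Eq. (14), i.i.d. unit-mean Rayleigh fading."""
    if b_t % nu or (b_t // nu) % 2:
        raise ValueError("need an even integer number of bits per subcarrier")
    M = 2 ** (b_t // nu)                                     # Eq. (10)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        lam = sorted(rng.expovariate(1.0) for _ in range(n))  # lambda_1 <= ... <= lambda_N
        used = lam[n - nu:]                                    # k = N - Nu + 1, ..., N
        total += sum(pb_eq7(es_n0, 1.0 / nu, x, eta_d, M) for x in used) / nu
    return total / trials

es_n0 = 0.5 * 96 * 10 ** (7.0 / 10)   # Eq. (9) with Rc = 1/2, bT = 96, Eb/N0 = 7 dB
for nu in (48, 24, 16):
    print(nu, avg_pb_ordered(es_n0, 48, nu, 96))
```

The estimates depend on the assumed Es/N0 and on the number of trials. They are printed rather than quoted here and are not a substitute for the analytical results of [31].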

In [31], it has been verified analytically that the average bit error probability in (14) is minimized for Nu = N/2 if M = 4, that is, when only half of the subchannels are used with a fourfold constellation size. The minimum is at Nu = 2N/3 if M = 16. This result shows that the optimum number of active subchannels Nu depends only on the long-term channel statistics (in this case, Rayleigh fading), not on the actual instantaneous SNR. The same minimum has been obtained by simulation in specific practical cases, namely, with the 5 GHz ETSI channel models [32] and with block and convolutional channel coding [31]. The performance gain comes from selecting the more reliable subcarriers.

3.2. Extension to the multilayer case

We now extend the bit loading algorithm to the multilayer case. Here, several data streams must be transmitted simultaneously with different performance requirements (UEP), as is typical in multimedia applications. The subchannels are divided into L sets, one per layer, and each set, denoted by C(l), is associated with a different layer. The bit stream of each layer, with bit rate Brl, must meet a specific target bit error probability Pbl. The problem is to find the set of parameters {C(l), Mn, εn} (l = 1, 2, ..., L and n = 1, 2, ..., N) that minimizes the total transmitted power PT, given Pbl, Brl, and the channel state Hn. This optimization problem is NP-hard [33], and some suboptimal algorithms are available in the literature [13, 24, 33]. These algorithms require the relationship between the video quality and the bit error probability Pbl required for each layer. Video quality is measured in terms of PSNR (see below) or subjective measures. However, this relationship is not easy to find, as it requires extensive simulation or, alternatively, a model valid in general conditions. In this paper, we investigate a simpler suboptimal scheme that realizes UEP at the modulation level. It is based on the adaptive ordered subcarrier selection algorithm described above. UEP is achieved simply by assigning the bits of each layer to subchannels, starting from the most reliable and moving down to the least reliable. Note that the ordered subcarrier selection algorithm minimizes the overall average bit error probability in (14). However, the layered bit assignment leads to unbalanced average bit error probabilities among the data streams of different layers. The ordering process gives better protection to bits that belong to the more important layers.
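The sketch below shows a minimal version of the layered assignment just described, under stated assumptions. It considers a single OFDM symbol, whose bT = Nu·log2(M) bit slots are filled layer by layer. Layer 1, the most important, comes first, on subchannels sorted by decreasing estimated |Hn|^2. It treats all bits of one QAM symbol as equally reliable, which is a simplification. The function name and the (subchannel, bit position) slot representation are hypothetical.

```python
def assign_layers(gains, nu, bits_per_sc, layer_bits):
    """Hypothetical helper: map layer bits to subchannels, most important layer first.

    gains       -- estimated |Hn|^2 for the N subchannels
    nu          -- number of active subchannels Nu (the strongest ones are kept)
    bits_per_sc -- log2(M), identical on all active subchannels
    layer_bits  -- bits of layer 1, 2, ... in this OFDM symbol (layer 1 most important)
    Returns, per layer, the list of (subchannel index, bit position) slots it occupies.
    """
    if sum(layer_bits) != nu * bits_per_sc:
        raise ValueError("layer bits must exactly fill the Nu * log2(M) slots")
    # Strongest Nu subchannels, from the most to the least reliable.
    order = sorted(range(len(gains)), key=lambda n: gains[n], reverse=True)[:nu]
    slots = [(n, b) for n in order for b in range(bits_per_sc)]
    out, start = [], 0
    for count in layer_bits:
        out.append(slots[start:start + count])
        start += count
    return out

gains = [0.1, 2.0, 0.5, 1.0]
print(assign_layers(gains, nu=2, bits_per_sc=2, layer_bits=[1, 1, 2]))
# [[(1, 0)], [(1, 1)], [(3, 0), (3, 1)]]
```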

4. MPEG-4 CODING

To evaluate the performance of video transmission with the proposed technique, we focused on MPEG-4 [34], the latest MPEG ISO/IEC standard for video compression. The MPEG-4 standard uses the concept of object-based coding, which allows interactivity and layered coding. The MPEG-4 bitstream is basically structured in the following units:
- video objects (VOs);
- video object layers (VOLs), that is, the information related to an object in a scalability layer;
- video object planes (VOPs), that is, the instance of an object in a frame;
- optionally, groups of video object planes (GOVs) and packets.

Like most video compression standards, MPEG-4 relies extensively on prediction and entropy coding and is consequently very sensitive to channel errors. For transmission over error-prone channels, some error resilience tools have been added to the standard. Reversible variable length codes (RVLC), header extension codes (HEC), resync markers, and data partitioning all add robustness to the MPEG-4 bitstream. With resync markers, the MPEG-4 bitstream is composed of packets of almost equal length, separated by start codes. Start codes are unique words that can be recognized within any sequence of variable length codewords, but they are not robust to channel errors. The data partitioning tool separates data of different significance within the packet.

Figure 2: Data partitioning of the MPEG-4 packet. [I frames: header | DC-DCT coefficients | marker | AC-DCT coefficients. P frames: header (L1) | motion data (L2) | marker | texture data (L3).]

Despite these tools, MPEG-4 video transmission over wireless channels remains critical, and studies on transmitting MPEG-4 video efficiently over wireless channels are under way. If properly exploited, error resilience tools can further improve the received video quality. In particular, the data partitioning tool can be used to perform UEP. It splits the information bits in each packet into three partitions, each with a different sensitivity to channel errors. Figure 2 shows the partitions for intra (I) frames, which serve as reference frames for predictive coding. They consist of a header, DC discrete cosine transform (DCT) coefficients, and AC-DCT coefficients, and a marker separates the last two. For predicted (P) frames, the partitions consist of a header, a motion partition, and a texture partition. The motion partition contains the data for motion compensation and is thus very sensitive to channel errors. A marker separates it from the texture partition. The data partitioning tool can thus be exploited to perform UEP at both the channel coding and the modulation levels.

5. SYSTEM PARAMETERS

5.1. Source coding parameters

In this work, as in [22], we coded the first 13 frames of a video sequence (the “Foreman” test sequence in CIF format) according to the MPEG-4 standard at a bit rate of 644 kbps. We used the MoMuSys MPEG-4 codec [35], with some modifications in the decoder to improve its robustness to errors. Additional standard-compatible error resilience techniques have also been adopted. In particular, the technique presented in [36] addresses the lack of robustness of packet/VOP/GOV headers. It allows error detection in these critical portions of the bitstream through a transparent cyclic redundancy check (CRC). The technique described in [37] has been used to reorganize the bitstream into fixed-length packets made of fixed-length partitions (Figure 3). It also makes start codes more robust by replacing them with more robust synchronization words.

Figure 3: Reorganization of the MPEG-4 bitstream in fixed-length packets. [The MPEG-4 coder outputs variable-length packets. These undergo start code (SC) substitution and stuffing, and their partitions are placed in queues 1, 2, and 3. The queues are assembled into fixed-length packets of L bits (portions L1, L2, L3), which are passed to the channel encoder.]

In this work, we have organized the bitstream in packets of 432 bits. The first portion of the packet has 27 bits and contains start codes and headers. The second portion has 108 bits and contains data from the first data partitions. The last portion has 297 bits and contains data from the second data partitions. Consequently, the unequal protection schemes considered in this paper refer to a fixed packet structure, whether they use modulation or channel codes. When UEP is realized at the channel coding level, the layers use the code rates Rc1 = 1/3, Rc2 = 8/21, and Rc3 = 8/13, for an average code rate Rc ≈ 1/2. For a fair comparison, the code rate is kept at Rc = 1/2 for all layers when EEP is adopted or UEP is implemented at the modulation level. We use rate-compatible punctured recursive systematic convolutional (RCPRSC) codes [18] with puncturing period P = 8 and rational systematic generator matrix

$$G_s(D) = \left(1,\ \frac{1 + D + D^2 + D^4}{1 + D^3 + D^4},\ \frac{1 + D^2 + D^3 + D^4}{1 + D^3 + D^4}\right).$$

In the following, we assume that the first frame is received error free, so that it can be used to conceal the subsequent frames. If any errors occur, the frame can be retransmitted, since a small delay is tolerable at the beginning of the bitstream.

5.2. Transmission system parameters

Without loss of generality, the system parameters are taken from the IEEE802.11a (or Hiperlan2) physical layer specifications [1, 2]:
- Ts = 4 microseconds;
- NFFT = 64;
- N = 48;
- Tg = 800 nanoseconds;
- subcarrier spacing ∆f = 312.5 kHz.

In this case, ηD = 0.8. The total system capacity is kept constant at bT/Ts = 24 Mbps. Since the average code rate is Rc = 1/2, the final useful bit rate becomes Br = 12 Mbps. The transmission of one packet requires 10 OFDM frames. As the fixed total bit rate here is 12 Mbps, we assume that a packet is sent every 678 microseconds and that other multimedia streams are transmitted in the remaining time. For the channel, we use the 5 GHz ETSI channel model “E” [32] (outdoor, non-line-of-sight conditions), which has 18 Rayleigh fading paths. The channel is assumed invariant during the transmission of each packet. The optimization (bit loading) is performed every Tcsi seconds, following the temporal evolution of the channel. The CSI is assumed to be sent at the same rate. It is advisable that Tcsi < Tch, where Tch is the channel correlation time. In [31], it has been shown that no significant performance degradation occurs if Tcsi < 7–10 milliseconds when the user moves at a maximum speed of 3 km/h.
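Some of the figures quoted in Sections 5.1 and 5.2 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the three UEP code rates apply to the 27-, 108-, and 297-bit packet portions, respectively, in the order in which they are listed. It ignores any trellis termination bits. The variable names are illustrative.

```python
from fractions import Fraction

# Packet structure of Section 5.1 (headers, first partition, second partition).
portion_bits = [27, 108, 297]
uep_rates = [Fraction(1, 3), Fraction(8, 21), Fraction(8, 13)]

coded = sum(b / r for b, r in zip(portion_bits, uep_rates))   # coded bits per packet
avg_rate = sum(portion_bits) / coded                            # information bits / coded bits
print(float(coded), float(avg_rate))   # average rate is about 0.51, i.e. Rc close to 1/2

# Section 5.2: bits per OFDM symbol and per subcarrier in the reference scheme.
ts_s = 4e-6
capacity_bps = 24e6
b_t = round(capacity_bps * ts_s)       # bT bits per OFDM symbol
n_sub = 48
print(b_t, b_t // n_sub)               # 2 bits per subcarrier, i.e. M = 4 (QPSK)
```

The 432 information bits map to 847.125 coded bits, an average rate of about 0.51. At 24 Mbps and Ts = 4 µs, the reference scheme carries bT = 96 bits per OFDM symbol, that is, QPSK on all 48 data subcarriers.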

6. NUMERICAL RESULTS

The different UEP techniques have been compared in terms of PSNR for MPEG-4 video transmission over wireless systems. The PSNR is a measure of video quality, defined in dB as follows:

$$\mathrm{PSNR} = 20 \log_{10}\frac{255}{\mathrm{RMSE}}, \qquad (15)$$

where RMSE is the square root of the mean square error

$$\mathrm{MSE} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left[f(i,j) - F(i,j)\right]^2}{M \times N}, \qquad (16)$$

with f(i, j) and F(i, j) being the luminance of pixel (i, j) in the source and reconstructed images, respectively, each containing M × N pixels. We evaluate the PSNR on the luminance component (Y) of each frame. The PSNR is averaged over the nine P frames of the first GOV and the first four frames of the second GOV. The results of thirty simulations, performed with different noise seeds, have been averaged to obtain more reliable results. The average PSNR is thus

$$\mathrm{PSNR}_{avg} = \frac{1}{N_s N_f}\sum_{s=1}^{N_s}\sum_{f=1}^{N_f}\mathrm{PSNR}(s, f), \qquad (17)$$

where Ns is the number of simulations performed, Nf is the number of frames considered in the average, and s and f are the simulation index and the frame index, respectively. In the case considered, Ns = 22.
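The sketch below computes (15) and (16) for 8-bit luminance frames given as lists of pixel rows, together with the average (17). It uses plain Python lists instead of an image library. The function names are hypothetical. Identical frames give an infinite PSNR. A run that returns such a value would have to be handled before applying (17).

```python
import math

def psnr(ref, rec, peak=255.0):
    """Hypothetical helper for Eqs. (15)-(16): PSNR in dB between two equal-size Y frames."""
    rows, cols = len(ref), len(ref[0])
    mse = sum((ref[i][j] - rec[i][j]) ** 2
              for i in range(rows) for j in range(cols)) / (rows * cols)
    if mse == 0:
        return math.inf                      # identical frames
    return 20.0 * math.log10(peak / math.sqrt(mse))

def psnr_avg(table):
    """Hypothetical helper for Eq. (17): mean of PSNR(s, f) over Ns rows and Nf columns."""
    n_s, n_f = len(table), len(table[0])
    return sum(sum(row) for row in table) / (n_s * n_f)

frame = [[100, 120], [130, 140]]
noisy = [[101, 119], [131, 139]]
print(psnr(frame, noisy))   # every pixel off by 1: 20*log10(255), about 48.13 dB
```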

Table 1: Schemes considered.
A: Reference scheme: no adaptation, no UEP.
B: Adaptive ordered subcarrier selection algorithm, Nu = N (UEP through subcarrier reordering).
C: Adaptive Campello's algorithm, no UEP.
D: Adaptive ordered subcarrier selection algorithm, Nu = N/2 (UEP through subcarrier reordering).
E: Reference scheme: no adaptation, UEP at channel coding level.
F: Adaptive Campello's algorithm, UEP at channel coding level.

Figure 4: Performance comparison between schemes A, B, C, and D in terms of PSNR (dB) as a function of the frame number for Eb/N0 = 11 dB (EEP channel coding).

Figure 5: Performance comparison between schemes A, B, C, and D in terms of PSNR (dB) as a function of the frame number for Eb/N0 = 7 dB (EEP channel coding).

In the following, we consider the bit loading schemes A, B, C, and D listed in Table 1. Scheme C (Campello) is the optimal bit loading solution. However, it cannot provide UEP, since the bit error rate is the same for all subchannels. Conversely, the proposed schemes B and D are suboptimal, but the subchannel ordering process lets them perform UEP at the modulation level. Scheme A (no adaptation) is considered for comparison; it cannot perform UEP at the modulation level either. Figures 4 and 5 show the PSNR as a function of the frame number for schemes A, B, C, and D. Figure 4 reports the case Eb/N0 = 11 dB. Comparing curves A and C shows the large gain (up to 18 dB) obtainable by introducing adaptive loading algorithms. The simpler scheme D proposed here achieves the same gain while employing UEP at the modulation level. The benefit of UEP becomes more evident at lower SNR. Figure 5 shows the case Eb/N0 = 7 dB, where schemes B and D, which implement UEP at the modulation level, both outperform scheme C. In any case, the gain with respect to the reference scheme (scheme A) remains remarkable (about 7 dB).

Figure 6 shows the PSNR versus Eb/N0 for I frames only. Note the crossing point between curves C and D, which refer to Campello's (EEP) algorithm and the ordered subcarrier selection algorithm, respectively. At low SNR values, UEP makes scheme D more robust than scheme C, even though scheme D is simpler and suboptimal in single-layer systems. In this region, the UEP gain of scheme D compensates for the loss due to the suboptimality of its bit loading. For better channel conditions (high SNR values), the UEP benefit decreases and can no longer compensate for the suboptimality loss of the bit loading algorithm. Figure 7 shows a similar behavior for P frames.

Figure 6: PSNR (dB) of I frames versus SNR (Eb/N0) for schemes A, B, C, and D (EEP channel coding).

Figure 7: PSNR (dB) of P frames versus SNR (Eb/N0) for schemes A, B, C, and D (EEP channel coding).

Figures 8 and 9 show the impact of UEP realized through channel coding for Eb/N0 = 11 dB and Eb/N0 = 7 dB, respectively. Schemes E and F are the same as A and C, but with UEP realized at the channel coding level. Schemes A–D are reported for comparison. At high SNR (Figure 8), UEP has a significant impact only on the reference scheme (curve E) and gives no significant improvement to scheme C (curve F). Conversely, at lower SNR (Figure 9), the performance of scheme F becomes comparable to that of scheme D, where UEP is realized at the modulation level. Note that scheme D is much less complex than scheme C. The adaptive loading techniques considered also improve the visual quality, as shown in Figure 10. The figure reports the received frame number 9 of the Foreman sequence for schemes A, C, D, E, and F with Eb/N0 = 7 dB. The quality improvement is most evident for schemes D and F.

Figure 8: Performance comparison between schemes A–F in terms of PSNR (dB) as a function of the frame number for Eb/N0 = 11 dB. Schemes E and F are the same as A and C but with UEP realized through channel coding.

Figure 9: Performance comparison between schemes A–F in terms of PSNR (dB) as a function of the frame number for Eb/N0 = 7 dB. Schemes E and F are the same as A and C but with UEP realized through channel coding.

7. CONCLUSIONS

In this paper, adaptive loading techniques for multicarrier modulation, applied to the Hiperlan2 physical layer, have been analyzed and compared. A simple multilayer bit-loading algorithm has been considered in order to perform UEP at the modulation level, and it has been compared with other bit-loading and UEP schemes. The technique has been applied to MPEG-4 video transmission, with good performance gains over nonadapted schemes, allowing acceptable video reception also at low SNR values. It has been shown that for high SNR values the performance improvement is mainly due to the adaptation of the bit-loading algorithms to channel impairments, whereas at low SNR values the advantage introduced by UEP becomes more significant, both at the modulation and at the coding level.

Layered Video Transmission on Adaptive OFDM Wireless Systems

Figure 10: Frame (P) number 9 of the Foreman sequence. Eb/N0 = 7 dB. Panels: (a) Original; (b) Scheme A; (c) Scheme C; (d) Scheme D; (e) Scheme E; (f) Scheme F.

ACKNOWLEDGMENT

The work has been partly supported by the Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR) in the framework of the “Virtual Immersive Communication” (VICOM) project.

REFERENCES
[1] IEEE 802.11, part 11, “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High Speed Physical Layer in the 5 GHz Band,” P802.11a/D7.0, July 1999.
[2] ETSI, “Broadband Radio Access Networks (BRAN) Hiperlan Type2—Physical Layer,” TS 101 475, V 1.1.1, April 2000.
[3] T. Keller and L. Hanzo, “Adaptive multicarrier modulation: a convenient framework for time-frequency processing in wireless communications,” Proceedings of the IEEE, vol. 88, no. 5, pp. 611–640, 2000.
[4] I. Kalet, “The multitone channel,” IEEE Trans. Communications, vol. 37, no. 2, pp. 119–124, 1989.
[5] J. A. C. Bingham, ADSL, VDSL, and Multicarrier Modulation, John Wiley & Sons, New York, NY, USA, 2000.
[6] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[7] P. S. Chow, J. M. Cioffi, and J. A. C. Bingham, “A practical discrete multitone transceiver loading algorithm for data transmission over spectrally shaped channels,” IEEE Trans. Communications, vol. 43, no. 2/3/4, pp. 773–775, 1995.
[8] R. F. H. Fischer and J. B. Huber, “A new loading algorithm for discrete multitone transmission,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’96), vol. 1, pp. 724–728, London, UK, November 1996.
[9] J. Campello, “Optimal discrete bit loading for multicarrier modulation systems,” in Proc. IEEE International Symposium on Information Theory (ISIT ’98), p. 193, Cambridge, Mass, USA, August 1998.
[10] A. Czylwik, “Adaptive OFDM for wideband radio channels,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’96), vol. 1, pp. 713–718, London, UK, November 1996.
[11] L. van der Perre, S. Thoen, P. Vandenameele, B. Gyselinckx, and M. Engels, “Adaptive loading strategy for a high speed OFDM-based WLAN,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’98), vol. 4, pp. 1936–1940, Sydney, NSW, Australia, November 1998.
[12] A. N. Barreto and S. Furrer, “Adaptive bit loading for wireless OFDM systems,” in Proc. 12th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’01), vol. 2, pp. G–88–G–92, San Diego, Calif, USA, September–October 2001.
[13] D. Dardari, M. G. Martini, M. Milantoni, and M. Chiani, “MPEG-4 video transmission in the 5 GHz band through an adaptive OFDM wireless scheme,” in Proc. 13th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’02), vol. 4, pp. 1680–1684, Lisboa, Portugal, September 2002.
[14] Q. Su and S. Schwartz, “Effects of imperfect channel information on adaptive loading gain of OFDM,” in IEEE 54th Vehicular Technology Conference (VTC ’01), vol. 1, pp. 475–478, Atlantic City, NJ, USA, October 2001.
[15] M. R. Souryal and R. L. Pickholtz, “Adaptive modulation with imperfect channel information in OFDM,” in Proc. IEEE International Conference on Communications (ICC ’01), vol. 6, pp. 1861–1865, Helsinki, Finland, June 2001.
[16] S. Ye, R. S. Blum, and L. J. Cimini Jr., “Adaptive modulation for variable-rate OFDM systems with imperfect channel information,” in IEEE 55th Vehicular Technology Conference (VTC ’02), vol. 2, pp. 767–771, Birmingham, Ala, USA, May 2002.
[17] W. Xu, S. Heinen, M. Adrat, P. Vary, T. Hindelang, M. Schmautz, and J. Hagenauer, “An adaptive multirate speech codec proposed for the GSM,” in Proc. 3rd ITG Conference on Source and Channel Coding, pp. 51–56, München, Germany, January 2000.
[18] J. Hagenauer and T. Stockhammer, “Channel coding and transmission aspects for wireless multimedia,” Proceedings of the IEEE, vol. 87, no. 10, pp. 1764–1777, 1999.
[19] A. A. Alatan, M. Zhao, and A. N. Akansu, “Unequal error protection of SPIHT encoded image bit streams,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 814–818, 2000.
[20] H. Gharavi and S. M. Alamouti, “Multipriority video transmission for third-generation wireless communication systems,” Proceedings of the IEEE, vol. 87, no. 10, pp. 1751–1763, 1999.
[21] M. Budagavi, W. R. Heinzelman, J. Webb, and R. Talluri, “Wireless MPEG-4 video communication on DSP chips,” IEEE Signal Processing Magazine, vol. 17, no. 1, pp. 36–53, 2000.
[22] M. G. Martini and M. Chiani, “Wireless transmission of MPEG-4 video: performance evaluation of unequal error protection over a block fading channel,” in Proc. 53rd IEEE Vehicular Technology Conference (VTC ’01 Spring), vol. 3, pp. 2056–2060, Rhodes, Greece, May 2001.
[23] ETSI EN 300 744, European Standard, “Digital video broadcasting (DVB): framing structure, channel coding and modulation for digital terrestrial television,” 2001.
[24] H. Zheng and K. J. R. Liu, “Robust image and video transmission over spectrally shaped channels using multicarrier modulation,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 88–103, 1999.
[25] S. B. Weinstein and P. M. Ebert, “Data transmission by frequency-division multiplexing using the discrete Fourier transform,” IEEE Trans. Communications, vol. 19, no. 5, pp. 628–634, 1971.
[26] D. Dardari and V. Tralli, “High-speed indoor wireless communications at 60 GHz with coded OFDM,” IEEE Trans. Communications, vol. 47, no. 11, pp. 1709–1721, 1999.
[27] J. G. Proakis, Digital Communications, McGraw Hill, New York, NY, USA, 4th edition, 2001.
[28] R. Grünheid, E. Bolinth, and H. Rohling, “A blockwise loading algorithm for the adaptive modulation technique in OFDM systems,” in IEEE 54th Vehicular Technology Conference (VTC ’01), vol. 2, pp. 948–951, Atlantic City, NJ, USA, October 2001.
[29] W. Yu and J. M. Cioffi, “On constant power water-filling,” in Proc. IEEE International Conference on Communications (ICC ’01), vol. 6, pp. 1665–1669, Helsinki, Finland, June 2001.
[30] H. A. David, Order Statistics, John Wiley & Sons, New York, NY, USA, 1981.
[31] D. Dardari, “A low complexity and low overhead adaptive bit loading algorithm for OFDM based high-speed WLAN systems,” Cost meeting Rep. COST 273 TD(03)155, Prague, Czech Republic, September 2003.
[32] J. Medbo, H. Andersson, and P. Schramm, “Channel models for Hiperlan/2 in different indoor scenarios,” Cost meeting Rep. COST 259 TD (98) 070, Bradford, UK, April 1998.
[33] L. M. C. Hoo, J. Tellado, and J. Cioffi, “Discrete dual QoS loading algorithms for multicarrier systems,” in Proc. IEEE International Conference on Communications (ICC ’99), vol. 2, pp. 796–800, Vancouver, British Columbia, Canada, June 1999.
[34] MPEG-4 standard, MPEG-4 Video Group, “Final Draft of International Standard,” ISO-IEC/JTC1/SC29/WG11 N2502a, Atlantic City, NJ, USA, October 1998.
[35] MoMuSys project website, http://www.cordis.lu/infowin/acts/rus/projects/ac098.htm.
[36] M. G. Martini and M. Chiani, “Joint source-channel error detection with standard compatibility for wireless video transmission,” in Proc. IEEE Wireless Communications and Networking Conference (WCNC ’02), vol. 1, pp. 215–219, Orlando, Fla, USA, March 2002.
[37] M. G. Martini, M. Mazzotti, and M. Chiani, “Fixed-packet-length transcoding for error resilient video transmission over WCDMA radio links,” in Proc. IEEE Packet Video, Nantes, France, April 2003.

D. Dardari was born in Rimini, Italy, on January 19, 1968. He received his Laurea degree in electronic engineering (with the highest honors) and his Ph.D. degree in electronic engineering and computer science from the University of Bologna, Italy, in 1993 and 1998, respectively. In the same year, he joined the Dipartimento di Elettronica, Informatica e Sistemistica to develop his research activity in the area of digital communications. During this activity, he has collaborated on, and taken a significant role in, the following national and European projects: European project “PROMETHEUS” regarding short-range communication systems for cooperative driving; MIUR “WWLAN” project for wideband high-speed wireless LAN; CNIT/ASI (Italian Space Agency) projects “Integration of Multimedia Services on Heterogeneous Satellite Networks” and “Study, Design and Realization of Guaranteed Quality of Service Re-configurable Satellite Network for Multimedia Applications;” MIUR project “VICOM” (Virtual Immersive Communications). Since 2000, he has been a Research Associate at the University of Bologna. He has held the position of Lecturer of electrical communications and digital transmission and telecommunications systems at the same university. His research interests are in OFDM systems, nonlinear effects, cellular mobile radio, satellite systems, and wireless LAN. He is a Member and a Reviewer of the IEEE.

M. G. Martini studied electronic engineering at the University of Perugia (Italy) and at the University of Liège (Belgium) and received the Laurea degree in electronic engineering (with the highest honors) from the University of Perugia (Italy) in 1998. After a collaboration with the Aerospace Engineering Department, University of Rome (Italy), she joined the Dipartimento di Elettronica, Informatica e Sistemistica (DEIS), University of Bologna, in February 1999. There, she has worked as a key person on several national and international projects, such as the “Joint Source and Channel Coding” (JSCC) project, in cooperation with Philips Research Monza and Philips Research France, and the JOCO (Joint Source and Channel Coding Driven Digital Baseband Design for 4G Multimedia Streaming) and Phoenix (Jointly Optimising Multimedia Transmission in IP Based Wireless Networks) European IST projects. She received the Ph.D. degree in electronics and computer science from the University of Bologna in March 2002. She is currently a Postdoc Researcher at DEIS, University of Bologna. Her research interests are mainly in video coding, channel coding, joint source and channel coding, error resilient video transmission, and wireless multimedia networks.

M. Mazzotti was born in Lugo, Italy, on 12 March 1977. He received the degree in telecommunications engineering (with the highest honors) from the University of Bologna in July 2002. He is currently a Ph.D. student at the Dipartimento di Elettronica, Informatica e Sistemistica, University of Bologna. His main research interests cover multimedia communications, joint source and channel coding, and wireless communication systems.

M. Chiani was born in Rimini, Italy, on April 4, 1964. He received the Dr.-Ing. degree (with the highest honors) in electronic engineering and the Ph.D. degree in electronic engineering and computer science from the University of Bologna, Bologna, Italy, in 1989 and 1993, respectively. Since 1994, he has been with the Dipartimento di Elettronica, Informatica e Sistemistica, University of Bologna, Cesena, where he is currently a Professor and Chair for telecommunications. His research interests include the areas of communications theory, coding, and wireless networks. Dr. Chiani is an Editor for Wireless Communications, IEEE Transactions on Communications, and Chair of the Radio Communications Committee, IEEE Communications Society. He has served on the Technical Program Committees of the IEEE conferences GLOBECOM 1997, ICC 1999, ICC 2001, ICC 2002, GLOBECOM 2003, and ICC 2004, and is a Senior Member of IEEE.


EURASIP Journal on Applied Signal Processing 2004:10, 1568–1584
© 2004 Hindawi Publishing Corporation

Multicarrier Block-Spread CDMA for Broadband Cellular Downlink

Frederik Petré
Wireless Research, Interuniversity MicroElectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium

Geert Leus
Electrical Engineering, Mathematics and Computer Science, Delft University of Technology (TUDelft), Mekelweg 4, 2628 CD Delft, The Netherlands

Marc Moonen
Department of Electrical Engineering (ESAT), Katholieke Universiteit Leuven (KULeuven), Kasteelpark Arenberg 10, 3001 Leuven, Belgium

Hugo De Man
Interuniversity MicroElectronics Center (IMEC), Kapeldreef 75, 3001 Leuven, Belgium

Received 6 March 2003; Revised 7 November 2003

Effective suppression of multiuser interference (MUI) and mitigation of frequency-selective fading effects within the complexity constraints of the mobile constitute major challenges for broadband cellular downlink transceiver design. Existing wideband direct-sequence (DS) code division multiple access (CDMA) transceivers suppress MUI statistically by restoring the orthogonality among users at the receiver. However, they call for receive diversity and multichannel equalization to mitigate the fading effects caused by deep channel fades. Relying on redundant block spreading and linear precoding, we design a so-called multicarrier block-spread CDMA (MCBS-CDMA) transceiver that preserves the orthogonality among users and guarantees symbol detection, regardless of the underlying frequency-selective fading channels. These properties allow for deterministic MUI elimination through low-complexity block despreading and enable full diversity gains, irrespective of the system load. Different options to perform equalization and decoding, either jointly or separately, strike the trade-off between performance and complexity. To improve the performance over multi-input multi-output (MIMO) multipath fading channels, our MCBS-CDMA transceiver combines well with space-time block-coding (STBC) techniques to exploit both multiantenna and multipath diversity gains, irrespective of the system load. Simulation results demonstrate the superior performance of MCBS-CDMA compared to competing alternatives.

Keywords and phrases: multicarrier CDMA, broadband cellular system, frequency-selective fading channels, equalization, MIMO, space-time block coding.

1. INTRODUCTION

The main drivers toward future broadband cellular systems, such as high-speed wireless internet access and mobile multimedia, require much higher data rates in the downlink (from base to mobile station) than in the uplink (from mobile to base station). Given the asymmetric nature of most of these broadband services, the capacity and performance bottlenecks clearly reside in the downlink of these future systems. Broadband cellular downlink communications poses three main challenges to successful transceiver design. First, as data rates increase, the underlying multipath channels become more time dispersive, causing intersymbol interference (ISI) and interchip interference (ICI) or, equivalently, frequency-selective fading. Second, as future broadband services succeed, more users will try to access the common network resources, causing multiuser interference (MUI). Both ISI/ICI and MUI are important performance-limiting factors for future broadband cellular systems, because they determine how well these systems handle high data rates and high system loads, respectively. Third, cost, size, and power consumption issues put severe constraints on the receiver complexity at the mobile station (MS).

Direct-sequence (DS) code division multiple access (CDMA) has emerged as the predominant air interface technology for the 3G cellular standard [1], because it increases capacity and facilitates network planning in a cellular system, compared to conventional multiple access techniques like frequency-division multiple access (FDMA) and time-division multiple access (TDMA) [2]. In the downlink, DS-CDMA relies on the orthogonality of the spreading codes to separate the different user signals. However, ICI destroys the orthogonality among users, giving rise to MUI. Since the MUI is essentially caused by the multipath channel, linear chip-level equalization, followed by correlation with the desired user's spreading code, makes it possible to suppress the MUI [3, 4, 5, 6]. However, chip equalizer receivers suppress MUI only statistically and require receive diversity to cope with the effects of deep channel fades [7, 8]. On the other hand, it is well known that orthogonal frequency-division multiplexing (OFDM), also called multicarrier (MC) modulation, with cyclic prefixing (CP) constitutes an elegant solution to combat the wireless channel impairments [9, 10, 11]. It converts a frequency-selective channel into a number of parallel flat-fading channels by multiplexing blocks of information symbols on orthogonal subcarriers using implementation-efficient fast Fourier transform (FFT) operations. Hence, the complex equalizer commonly encountered in single-carrier (SC) systems reduces to a set of parallel and independent single-tap equalizers. However, OFDM in itself does not extract frequency diversity; it calls for bandwidth-overconsuming forward error correction (FEC) coding techniques to enable frequency diversity [12]. Furthermore, OFDM as such does not support multiple users but requires a multiple access technique on top of it.
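The claim that a cyclic prefix turns a frequency-selective channel into parallel flat subchannels, each needing only one equalizer tap, can be checked numerically. The following is a minimal NumPy sketch under illustrative assumptions that do not come from the paper: 16 subcarriers, a random channel of order 3, a CP as long as the channel order, QPSK symbols, and no noise.

```python
import numpy as np

rng = np.random.default_rng(0)

Q = 16   # number of subcarriers (illustrative value)
Lc = 3   # channel order (illustrative value)
L = Lc   # cyclic-prefix length, chosen >= channel order

# Random frequency-selective channel h[0..Lc] and one block of QPSK symbols.
h = (rng.standard_normal(Lc + 1) + 1j * rng.standard_normal(Lc + 1)) / np.sqrt(2 * (Lc + 1))
bits = rng.integers(0, 2, size=(Q, 2))
x_fd = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

# Transmitter: IFFT to the time domain, then prepend the last L samples as a CP.
x_td = np.fft.ifft(x_fd)
u = np.concatenate([x_td[-L:], x_td])

# Channel: linear convolution with h (noise-free for clarity).
v = np.convolve(u, h)[: L + Q]

# Receiver: discard the CP, FFT back, and apply one complex tap per subcarrier.
y_fd = np.fft.fft(v[L:])
h_fd = np.fft.fft(h, Q)   # channel frequency response on the Q subcarriers
x_hat = y_fd / h_fd       # single-tap zero-forcing equalizer

print(np.allclose(x_hat, x_fd))  # True
```

Dividing by the channel response is harmless here only because the sketch is noise-free; on a subcarrier in a deep fade the same division would strongly amplify noise, which is the lack of frequency diversity attributed above to uncoded OFDM.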
In this paper, we propose a novel MC-CDMA transceiver that synergistically combines the advantages of DS-CDMA and OFDM to tackle the challenges of broadband cellular downlink communications. By capitalizing on the general concepts of redundant block spreading and linear precoding, our so-called multicarrier block-spread CDMA (MCBS-CDMA) transceiver possesses three unique properties compared to competing alternatives (Section 2). First, by adding a CP or zero padding (ZP) to the block-spread symbol blocks, our MCBS-CDMA transceiver preserves the orthogonality among users, regardless of the underlying time-dispersive multipath channels. This property allows for deterministic (as opposed to statistical) MUI elimination through low-complexity and channel-independent block despreading. Second, redundant linear precoding guarantees symbol detection and full frequency-diversity gains, thus robustifying the transmission against deep channel fades. Assuming perfect channel state information (CSI) at the receiver, different equalization and decoding options, ranging from linear through decision-directed to maximum likelihood (ML) detection, strike the trade-off between performance and complexity (Section 3). Finally, our transceiver exhibits a rewarding synergy with multiantenna techniques, which increase the spectral efficiency and improve the link reliability of multiple users in a broadband cellular network (Section 4). Simulation results demonstrate the outstanding performance of the proposed transceiver compared to competing alternatives (Section 5).

Several other MC-CDMA techniques that also combine CDMA with OFDM have recently gained momentum as candidate air interfaces for future broadband cellular systems [13]. Three different flavours of MC-CDMA exist, depending on the exact positions of the CDMA and the OFDM components in the transmission scheme. The first variant, called MC-CDMA, performs the spreading operation before the symbol blocking (or serial-to-parallel conversion), which spreads the information symbols across the different subcarriers [14, 15, 16]. However, like classical DS-CDMA, MC-CDMA does not enable full frequency-diversity gains. The second variant, called MC-DS-CDMA, executes the spreading operation after the symbol blocking, which spreads the information symbols along the time axis of the different subcarriers [17, 18]. However, like classical OFDM, MC-DS-CDMA necessitates bandwidth-overconsuming FEC coding plus frequency-domain (FD) interleaving to mitigate frequency-selective fading. The third variant, called multitone (MT) DS-CDMA, performs the spreading after the OFDM modulation, such that the resulting spectrum of each subcarrier no longer satisfies the orthogonality condition [19]. Hence, MT-DS-CDMA suffers from ISI and intertone interference (ITI), as well as MUI, and requires expensive multiuser detection techniques to achieve reasonable performance. Finally, alternative MUI-free MC transceivers, like AMOUR [20] and generalized multicarrier (GMC) CDMA [11], rely on an orthogonal frequency-division multiple access (OFDMA)-like approach to retain the orthogonality among users, regardless of the underlying multipath channels.
Unlike our MCBS-CDMA transceiver, these transceivers do not inherit the attractive properties of CDMA related to universal frequency reuse¹ in a cellular network, such as increased capacity and simplified network planning.

Notation

We use roman letters to represent scalars, lower boldface letters to denote column vectors (i.e., blocks), and upper boldface letters to denote matrices (i.e., a collection of blocks). $(\cdot)^*$, $(\cdot)^T$, and $(\cdot)^H$ represent conjugate, transpose, and Hermitian, respectively. Further, $|\cdot|$ and $\|\cdot\|$ represent the absolute value and Frobenius norm, respectively. We reserve $E\{\cdot\}$ for expectation and $\lfloor\cdot\rfloor$ for integer flooring. Subscripts $n_t$ and $n_r$ point to the $n_t$th transmit and the $n_r$th receive antenna, respectively. Superscript $m$ points to the $m$th user. Argument $i$ denotes the symbol index for symbol scalar sequences and the symbol block index for symbol block sequences. Likewise, argument $n$ denotes the chip index for chip scalar sequences and the chip block index for chip block sequences. Tilded letters $\tilde{x}$ denote FD signals, and overlined letters $\bar{x}$ denote space-time block-encoded signals at the transmitter and block-despread signals at the receiver. Acuted letters $\acute{x}$ denote space-time block-decoded signals at the receiver. Hatted letters $\hat{x}$ denote soft estimates, whereas hatted and underlined letters $\underline{\hat{x}}$ denote hard estimates.

¹ Universal frequency reuse, also called frequency reuse of one-in-one, is a unique attribute of CDMA systems, which refers to the reuse of the same frequencies in neighbouring cells.

[Figure 1 block diagram (transmitter, $m$th user branch): $s^m[i]$ → S/P → $\mathbf{s}^m[i]$ ($B\times 1$) → $\boldsymbol{\Theta}$ → $\tilde{\mathbf{s}}^m[i]$ ($Q\times 1$) → $N\times$ replicator → multiplication by $c^m[n]$ → sum with the other users → $\tilde{\mathbf{x}}[n]$ ($Q\times 1$) → IFFT → $\mathbf{x}[n]$ ($Q\times 1$) → $\mathbf{T}$ → $\mathbf{u}[n]$ ($K\times 1$) → P/S → $u[n]$.]

Figure 1: MCBS-CDMA downlink transmission scheme.

2. MCBS-CDMA TRANSCEIVER DESIGN

Effective suppression of MUI and mitigation of ISI and frequency-selective fading, within the complexity constraints of the MS, pose major challenges to transceiver design for the broadband cellular downlink. To tackle these challenges, we propose a novel MC-CDMA transceiver that combines two specific CDMA and OFDM concepts, namely, block-spread CDMA and linearly precoded OFDM. The resulting so-called MCBS-CDMA transceiver exhibits two unique properties compared to competing alternatives. First, by relying on block-spread CDMA, MCBS-CDMA preserves the orthogonality among users, even after propagation through a time-dispersive multipath channel. This property allows for deterministic (as opposed to statistical) MUI elimination at the receiver through low-complexity block despreading. Second, by relying on linearly precoded OFDM, MCBS-CDMA mitigates ISI and guarantees symbol detection, regardless of the underlying frequency-selective multipath channel. This property enables full frequency-diversity gains and, hence, robustness against frequency-selective fading at the receiver, through ML single-user equalization. Furthermore, different single-user equalization options, ranging from linear through decision-directed to ML detection, strike the trade-off between performance and complexity.

This section is organized as follows. Section 2.1 introduces the MCBS-CDMA downlink transmission scheme and motivates the different operations involved. Section 2.2 demonstrates how our MCBS-CDMA transceiver enables MUI-resilient reception over frequency-selective multipath channels. Finally, Section 2.3 explains the need for single-user equalization and guaranteed symbol detection.

2.1. MCBS-CDMA downlink transmission

We consider a single cell of a cellular system with a base station (BS) serving M active MSs within its coverage area. For now, we limit ourselves to the single-antenna case and defer the multiantenna case to Section 4.
The block diagram in Figure 1 describes the MCBS-CDMA downlink transmission scheme (where only the $m$th user is explicitly shown), which transforms the $M$ user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$, with rate $1/T_s$, into the multiuser chip sequence $u[n]$, with rate $1/T_c$. Apart from the user multiplexing and the IFFT, the MCBS-CDMA transmission scheme performs three major operations, namely, linear precoding, block spreading, and adding transmit redundancy. Since our scheme belongs to the general class of block transmission schemes, the $m$th user's data symbol sequence $s^m[i]$ is first serial-to-parallel converted into blocks of $B$ symbols, leading to the symbol block sequence $\mathbf{s}^m[i] := [s^m[iB], \ldots, s^m[(i+1)B-1]]^T$. The first operation involves complex-field linear precoding, where the encoding is performed over the complex field rather than over the Galois field, as is traditionally done [21, 22]. Unlike MC-CDMA, which spreads the information symbols across the subcarriers using a user-specific spreading code [14, 15, 16], MCBS-CDMA precodes the information symbols on the different subcarriers using a linear precoding matrix. Specifically, the information blocks $\mathbf{s}^m[i]$ are linearly precoded by a $Q \times B$ matrix $\boldsymbol{\Theta}$ to yield the $Q \times 1$ precoded symbol blocks

$$\tilde{\mathbf{s}}^m[i] := \boldsymbol{\Theta} \cdot \mathbf{s}^m[i], \qquad (1)$$

where $Q$ is the number of subcarriers and $\boldsymbol{\Theta}$ is a para-unitary matrix, that is, $\boldsymbol{\Theta}^H \cdot \boldsymbol{\Theta} = \mathbf{I}_B$. The linear precoding can be either redundant ($Q > B$) or nonredundant ($Q = B$). For conciseness, we limit our discussion to redundant precoding, but the proposed concepts apply equally well to nonredundant precoding. As we will show later, linear precoding guarantees symbol detection and maximum frequency-diversity gains, and thus robustifies the transmission against frequency-selective fading. The second operation, also depicted in Figure 1, is block spreading. Unlike DS-CDMA and MC-CDMA, which rely on classical symbol spreading (operating on a scalar symbol), MCBS-CDMA relies on block spreading (operating on a block of symbols). Specifically, the block sequence $\tilde{\mathbf{s}}^m[i]$ is block spread by a factor $N$ with the user composite code sequence $c^m[n]$, which is the product of a short (periodic) orthogonal Walsh-Hadamard spreading code that is MS specific and a long (aperiodic) overlay scrambling code that is BS specific. The chip block sequences of the different active users are added, resulting in the multiuser chip block sequence

$$\tilde{\mathbf{x}}[n] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i]\, c^m[n], \qquad (2)$$

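where the symbol block index $i$ relates to the chip block index $n$ through $i = \lfloor n/N \rfloor$. The block-spreading operation is also illustrated in Figure 1, where the $N\times$ replicator repeats the symbol block at its input $N$ times.

[Figure 2 block diagram (receiver of the $m$th user): $v[n]$ → S/P → $\mathbf{v}[n]$ ($K\times 1$) → $\mathbf{R}$ → $\mathbf{y}[n]$ ($Q\times 1$) → FFT → $\tilde{\mathbf{y}}[n]$ ($Q\times 1$) → multiplication by $c^m[n]^*$ → accumulation $\sum_{n=1}^{N}$ → $\tilde{\mathbf{y}}^m[i]$ ($Q\times 1$) → Equalizer → $\hat{\mathbf{s}}^m[i]$ ($B\times 1$) → P/S → $\hat{s}^m[i]$.]

Figure 2: MUI-resilient MCBS-CDMA downlink reception scheme.

Collecting $N$ consecutive chip blocks $\tilde{\mathbf{x}}[n]$ into $\tilde{\mathbf{X}}[i] := [\tilde{\mathbf{x}}[iN], \ldots, \tilde{\mathbf{x}}[(i+1)N-1]]$, we obtain the symbol-block-level equivalent of (2):

$$\tilde{\mathbf{X}}[i] = \sum_{m=1}^{M} \tilde{\mathbf{s}}^m[i] \cdot \mathbf{c}^m[i]^T = \tilde{\mathbf{S}}[i] \cdot \mathbf{C}[i]^T, \qquad (3)$$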

where $\mathbf{c}^m[i] := [c^m[iN], \ldots, c^m[(i+1)N-1]]^T$ is the $m$th user's composite code vector used to block-spread its data symbol block $\tilde{\mathbf{s}}^m[i]$, $\tilde{\mathbf{S}}[i] := [\tilde{\mathbf{s}}^1[i], \ldots, \tilde{\mathbf{s}}^M[i]]$ collects the symbol blocks of the different active users, and $\mathbf{C}[i] := [\mathbf{c}^1[i], \ldots, \mathbf{c}^M[i]]$ collects their composite code vectors. The block-spreading operation in (3) can be viewed as classical symbol spreading in which every user's information symbols on the different subcarriers are spread along the time axis using the same spreading code. Furthermore, by choosing $Q$ sufficiently large, each subcarrier experiences frequency-flat fading, so that the orthogonality among users is preserved on every subcarrier, even after propagation through a frequency-selective channel. Consequently, as will become apparent in Section 2.2, block spreading enables MUI-resilient reception and thus effectively deals with the MUI. Subsequently, the $Q \times Q$ IFFT matrix $\mathbf{F}_Q^H$ transforms the FD chip block sequence $\tilde{\mathbf{x}}[n]$ into the time-domain (TD) chip block sequence $\mathbf{x}[n] = \mathbf{F}_Q^H \cdot \tilde{\mathbf{x}}[n]$. The third operation involves the addition of transmit redundancy. Specifically, the $K \times Q$ transmit matrix $\mathbf{T}$, with $K \ge Q$ the transmitted block length, adds some redundancy to the chip blocks $\mathbf{x}[n]$, that is, $\mathbf{u}[n] := \mathbf{T} \cdot \mathbf{x}[n]$. As will be clarified later, this transmit redundancy copes with the time-dispersive effect of multipath propagation and enables low-complexity equalization at the receiver. Finally, the resulting transmitted chip block sequence $\mathbf{u}[n]$ is parallel-to-serial converted into the corresponding scalar sequence $[u[nK], \ldots, u[(n+1)K-1]]^T := \mathbf{u}[n]$ and transmitted over the air at rate $1/T_c$. By analyzing the rates of the different transmitter blocks in Figure 1, it is clear that the channel symbol rate $R_s$ relates to the chip rate $R_c$ through $R_s = (B/K)(1/N)R_c$.
From a bandwidth utilization point of view, the BS transmits B information symbols to each of the M users using $NK = N(Q + L) = N(B + 2L)$ transmitted chips, where the overhead of 2L stems from the $(B + L) \times B$ redundant linear precoder $\Theta$, which guarantees symbol detectability, and from the length-L CP, which is common to all users and removes interblock interference (IBI). Therefore, the bandwidth efficiency of our transceiver supporting M users can be calculated as

$$\eta_{\text{MCBS-CDMA}} = \frac{MB}{NK} = \frac{MB}{N(B + 2L)} \le 1. \qquad (4)$$

Clearly, as the number of users approaches its maximum value, that is, M = N, the bandwidth efficiency also converges to its maximum value, $\bar{\eta}_{\text{MCBS-CDMA}} = B/(B + 2L)$.
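To illustrate the bandwidth efficiency numerically, the short Python sketch below evaluates MB/(NK) for the parameters used later in Section 5 (B = 224, L = 32, N = 16). The function name is hypothetical, and K is taken as B + 2L, as in the chip count above.

```python
def mcbs_cdma_efficiency(M: int, N: int, B: int, L: int) -> float:
    """Bandwidth efficiency: M*B information symbols per N*K chips, with K = B + 2L."""
    if not 1 <= M <= N:
        raise ValueError("the number of active users M must satisfy 1 <= M <= N")
    K = B + 2 * L  # K = Q + L with Q = B + L
    return (M * B) / (N * K)


# Section 5 parameters: B = 224, L = 32, N = 16.
full_load = mcbs_cdma_efficiency(M=16, N=16, B=224, L=32)  # equals B / (B + 2L)
half_load = mcbs_cdma_efficiency(M=8, N=16, B=224, L=32)
print(f"M = 16: {full_load:.4f}, M = 8: {half_load:.4f}")
```

At full load the result equals B/(B + 2L) = 224/288 ≈ 0.778; halving the number of active users halves the efficiency.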

2.2. MUI-resilient reception with MCBS-CDMA

Adopting a discrete-time baseband equivalent model, the synchronized and chip-sampled received signal is a channel-distorted version of the transmitted signal and can be written as

$$v[n] = \sum_{l=0}^{L_c} h[l]\, u[n - l] + w[n], \qquad (5)$$

where h[l] is the chip-sampled FIR channel that models the frequency-selective multipath propagation between the transmitter and the receiver, including the effect of the transmit and receive filters, $L_c$ is the order of h[l], and w[n] denotes the additive Gaussian noise, which we assume to be white with variance $\sigma_w^2$. Furthermore, we define L as a known upper bound on the channel order, $L \ge L_c$, which can be well approximated by $L \approx \lfloor \tau_{\max}/T_c \rfloor + 1$, where $\tau_{\max}$ is the maximum excess delay within the given propagation environment. The block diagram in Figure 2 describes the reception scheme for the MS of interest (which we assume to be the mth one), which transforms the received sequence v[n] into an estimate $\hat{s}^m[i]$ of the desired user's data symbol sequence. Assuming perfect chip and block synchronization, the received sequence v[n] is serial-to-parallel converted into its corresponding block sequence $\mathbf{v}[n] := [v[nK], \ldots, v[(n+1)K-1]]^T$. From the scalar input/output relationship in (5), we can derive the corresponding block input/output relationship:

$$\mathbf{v}[n] = \mathbf{H}[0] \cdot \mathbf{u}[n] + \mathbf{H}[1] \cdot \mathbf{u}[n-1] + \mathbf{w}[n], \qquad (6)$$

where $\mathbf{w}[n] := [w[nK], \ldots, w[(n+1)K-1]]^T$ is the noise block sequence, $\mathbf{H}[0]$ is a K × K lower triangular Toeplitz matrix with entries $[\mathbf{H}[0]]_{p,q} = h[p-q]$, and $\mathbf{H}[1]$ is a K × K upper triangular Toeplitz matrix with entries $[\mathbf{H}[1]]_{p,q} = h[K+p-q]$, where h[l] := 0 for l < 0 and l > $L_c$ (see, e.g., [11] for a detailed derivation of the single-user case). The time-dispersive nature of multipath propagation gives rise to so-called IBI between successive blocks, which is modelled by the second term in (6). The Q × K receive matrix $\mathbf{R}$ then removes the redundancy from the blocks $\mathbf{v}[n]$: $\mathbf{y}[n] := \mathbf{R} \cdot \mathbf{v}[n]$. The purpose of the transmit/receive pair $(\mathbf{T}, \mathbf{R})$ is twofold. First, it allows for simple block-by-block processing by removing the IBI. Second, it enables low-complexity FD equalization by making the linear channel convolution appear circulant to the received block.
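A quick way to check the block model (6) is to compare it with the scalar convolution (5). The NumPy sketch below fills H[0] and H[1] entry by entry from their Toeplitz definitions for a random channel; the block length, channel order, number of blocks, and function name are arbitrary illustrative choices, and the noise is omitted.

```python
import numpy as np


def block_channel_matrices(h: np.ndarray, K: int):
    """Return H[0] (lower triangular) and H[1] (upper triangular) K x K Toeplitz matrices of (6)."""
    Lc = len(h) - 1
    if K <= Lc:
        raise ValueError("the block length K must exceed the channel order")
    H0 = np.zeros((K, K), dtype=complex)
    H1 = np.zeros((K, K), dtype=complex)
    for p in range(K):
        for q in range(K):
            if 0 <= p - q <= Lc:
                H0[p, q] = h[p - q]          # [H[0]]_{p,q} = h[p - q]
            if 0 <= K + p - q <= Lc:
                H1[p, q] = h[K + p - q]      # [H[1]]_{p,q} = h[K + p - q]
    return H0, H1


rng = np.random.default_rng(0)
K, Lc, n_blocks = 12, 3, 5
h = (rng.standard_normal(Lc + 1) + 1j * rng.standard_normal(Lc + 1)) / np.sqrt(2)
u = rng.standard_normal(K * n_blocks) + 1j * rng.standard_normal(K * n_blocks)

# Scalar model (5), noise omitted: v[n] = sum_l h[l] u[n - l], with u[n] = 0 for n < 0.
v = np.convolve(h, u)[: K * n_blocks]

# Block model (6), noise omitted: v[n] = H[0] u[n] + H[1] u[n - 1], with u[-1] = 0.
H0, H1 = block_channel_matrices(h, K)
U = u.reshape(n_blocks, K)                  # row n holds the block u[n]
V = np.zeros_like(U)
for n in range(n_blocks):
    previous = U[n - 1] if n > 0 else np.zeros(K, dtype=complex)
    V[n] = H0 @ U[n] + H1 @ previous

print(np.allclose(V.reshape(-1), v))
```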


EURASIP Journal on Applied Signal Processing

To guarantee perfect IBI removal, the pair $(\mathbf{T}, \mathbf{R})$ should satisfy the following condition [11]:

$$\mathbf{R} \cdot \mathbf{H}[1] \cdot \mathbf{T} = \mathbf{0}. \qquad (7)$$

To enable circulant channel convolution, the resulting channel matrix $\dot{\mathbf{H}} := \mathbf{R} \cdot \mathbf{H}[0] \cdot \mathbf{T}$ should be circulant. In this way, we obtain a simplified block input/output relationship in the TD:

$$\mathbf{y}[n] = \dot{\mathbf{H}} \cdot \mathbf{x}[n] + \mathbf{z}[n], \qquad (8)$$

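where $\mathbf{z}[n] := \mathbf{R} \cdot \mathbf{w}[n]$ is the corresponding noise block sequence. In general, two options for the pair $(\mathbf{T}, \mathbf{R})$ exist that satisfy the above conditions. The first option corresponds to CP in classical OFDM systems [23] and boils down to choosing K = Q + L and selecting

$$\mathbf{T} = \mathbf{T}_{cp} := \left[\mathbf{I}_{cp}^T, \mathbf{I}_Q\right]^T, \qquad \mathbf{R} = \mathbf{R}_{cp} := \left[\mathbf{0}_{Q \times L}, \mathbf{I}_Q\right], \qquad (9)$$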

where $\mathbf{I}_{cp}$ consists of the last L rows of $\mathbf{I}_Q$. The circulant property is enforced at the transmitter by adding a cyclic prefix of length L to each block: premultiplying a vector with $\mathbf{T}_{cp}$ copies its last L entries and pastes them on top. The IBI is removed at the receiver by discarding the cyclic prefix of each received block: premultiplying a vector with $\mathbf{R}_{cp}$ deletes its first L entries and thus satisfies (7). The second option corresponds to ZP and boils down to setting K = Q + L and selecting

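$$\mathbf{T} = \mathbf{T}_{zp} := \left[\mathbf{I}_Q, \mathbf{0}_{Q \times L}\right]^T, \qquad \mathbf{R} = \mathbf{R}_{zp} := \left[\mathbf{I}_Q, \mathbf{I}_{zp}\right], \qquad (10)$$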

where $\mathbf{I}_{zp}$ is formed by the first L columns of $\mathbf{I}_Q$. Unlike in classical OFDM systems, here the IBI is entirely dealt with at the transmitter: premultiplying a vector with $\mathbf{T}_{zp}$ pads L trailing zeros to its bottom and thus satisfies (7). The circulant property is enforced at the receiver by time-aliasing each received block: premultiplying a vector with $\mathbf{R}_{zp}$ adds its last L entries to its first L entries. Referring back to (8), circulant matrices possess a useful property that enables simple per-tone equalization in the FD.

Property 1. Circulant matrices can be diagonalized by FFT operations [24]:

$$\dot{\mathbf{H}} = \mathbf{F}_Q^H \cdot \tilde{\mathbf{H}} \cdot \mathbf{F}_Q, \qquad (11)$$

with $\tilde{\mathbf{H}} := \operatorname{diag}(\tilde{\mathbf{h}})$, $\tilde{\mathbf{h}} := [H(e^{j0}), H(e^{j2\pi/Q}), \ldots, H(e^{j2\pi(Q-1)/Q})]^T$ the FD channel response evaluated on the FFT grid, $H(z) := \sum_{l=0}^{L} h[l] z^{-l}$ the z-transform of h[l], and $\mathbf{F}_Q$ the Q × Q FFT matrix. Aiming at low-complexity FD processing, we transform $\mathbf{y}[n]$ into the FD by defining $\tilde{\mathbf{y}}[n] := \mathbf{F}_Q \cdot \mathbf{y}[n]$. Relying on Property 1, this leads to the following FD block input/output relationship:

$$\tilde{\mathbf{y}}[n] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{x}}[n] + \tilde{\mathbf{z}}[n], \qquad (12)$$

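where $\tilde{\mathbf{z}}[n] := \mathbf{F}_Q \cdot \mathbf{z}[n]$ is the corresponding FD noise block sequence. Collecting N consecutive chip blocks $\tilde{\mathbf{y}}[n]$ into $\tilde{\mathbf{Y}}[i] := [\tilde{\mathbf{y}}[iN], \ldots, \tilde{\mathbf{y}}[(i+1)N-1]]$, defining $\tilde{\mathbf{X}}[i]$ and $\tilde{\mathbf{Z}}[i]$ in a similar manner as $\tilde{\mathbf{Y}}[i]$, and exploiting (3), we obtain the symbol block level equivalent of (12), that is,

$$\tilde{\mathbf{Y}}[i] = \tilde{\mathbf{H}} \cdot \tilde{\mathbf{S}}[i] \cdot \mathbf{C}[i]^T + \tilde{\mathbf{Z}}[i]. \qquad (13)$$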
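The CP and ZP pairs in (9) and (10) can be checked numerically against the IBI condition (7), the circulant requirement on the TD channel matrix R·H[0]·T, Property 1, and the per-tone model (12). The NumPy sketch below uses a random channel of order L and assumes the unitary normalization of the FFT matrix (scaling by 1/√Q); the helper names and sizes are illustrative only.

```python
import numpy as np


def channel_blocks(h: np.ndarray, K: int):
    """H[0] with [p, q] -> h[p - q] and H[1] with [p, q] -> h[K + p - q] (zero outside 0..L)."""
    L = len(h) - 1
    d0 = np.arange(K)[:, None] - np.arange(K)[None, :]
    d1 = K + d0
    H0 = np.where((d0 >= 0) & (d0 <= L), h[np.clip(d0, 0, L)], 0)
    H1 = np.where((d1 >= 0) & (d1 <= L), h[np.clip(d1, 0, L)], 0)
    return H0, H1


def is_circulant(A: np.ndarray) -> bool:
    """True if every column is the first column cyclically shifted down by its index."""
    return all(np.allclose(np.roll(A[:, 0], k), A[:, k]) for k in range(A.shape[1]))


Q, L = 8, 3
K = Q + L
rng = np.random.default_rng(1)
h = rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)
H0, H1 = channel_blocks(h, K)

I_Q = np.eye(Q)
T_cp = np.vstack([I_Q[-L:, :], I_Q])         # (9): prepend the last L entries
R_cp = np.hstack([np.zeros((Q, L)), I_Q])    # (9): discard the first L entries
T_zp = np.vstack([I_Q, np.zeros((L, Q))])    # (10): append L trailing zeros
R_zp = np.hstack([I_Q, I_Q[:, :L]])          # (10): add the last L entries to the first L

F = np.fft.fft(np.eye(Q)) / np.sqrt(Q)       # unitary FFT matrix F_Q (assumed normalization)
h_tilde = np.fft.fft(h, Q)                   # H(e^{j 2 pi k / Q}), k = 0, ..., Q - 1
x_tilde = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)

results = {}
for name, T, R in [("CP", T_cp, R_cp), ("ZP", T_zp, R_zp)]:
    ibi_free = np.allclose(R @ H1 @ T, 0)                                 # condition (7)
    H_dot = R @ H0 @ T                                                    # TD channel matrix in (8)
    diagonalized = np.allclose(H_dot, F.conj().T @ np.diag(h_tilde) @ F)  # Property 1, (11)
    y_tilde = F @ (H_dot @ (F.conj().T @ x_tilde))                        # noiseless (8) in the FD
    per_tone = np.allclose(y_tilde, h_tilde * x_tilde)                    # (12)
    results[name] = (ibi_free, is_circulant(H_dot), diagonalized, per_tone)
    print(name, results[name])
```

Both pairs should report four `True` flags; the only difference is whether the redundancy is handled by a cyclic prefix at the transmitter (CP) or by time-aliasing at the receiver (ZP).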

By inspecting (13), we can conclude that our transceiver preserves the orthogonality among users, even after propagation through a (possibly unknown) frequency-selective multipath channel. This property allows for deterministic MUI elimination through low-complexity code-matched filtering. Indeed, by block despreading (13) with the desired user's composite code vector $\mathbf{c}^m[i]$ (we assume the mth user to be the desired one), we obtain

$$\tilde{\mathbf{y}}^m[i] := \tilde{\mathbf{Y}}[i] \cdot \mathbf{c}^m[i]^* = \tilde{\mathbf{H}} \cdot \Theta \cdot \mathbf{s}^m[i] + \tilde{\mathbf{z}}^m[i], \qquad (14)$$

where $\tilde{\mathbf{z}}^m[i] := \tilde{\mathbf{Z}}[i] \cdot \mathbf{c}^m[i]^*$ is the corresponding noise block sequence. Through block despreading, our transceiver converts a multiuser detection problem into an equivalent but simpler single-user equalization problem. Moreover, block despreading preserves ML optimality, since it does not incur any information loss in the Shannon sense regarding the desired user's symbol block $\mathbf{s}^m[i]$.

In the above discussion, our main focus was on the downlink problem, which is simpler in nature than the uplink problem, since the different user signals experience the same multipath channel, time offset, and carrier frequency offset. In theory, the same signal design is also feasible in the uplink. Assuming perfect time and frequency synchronization between the different users and the BS, it can be shown that the orthogonality among users is still preserved, even though the user signals now propagate through different multipath channels. In practice, perfect time and frequency synchronization cannot be guaranteed, since the user signals experience different time offsets and carrier frequency offsets with respect to the BS. Furthermore, the BS receiver can only compensate for a certain user's synchronization mismatches after this user's signal has been separated from the received multiuser mixture; otherwise, a compensation for that particular user would affect all other users too. However, since the proposed block spreading scheme relies on the orthogonality preservation property, which requires perfect time and frequency synchronization, the synchronization mismatches would already have introduced irreducible distortion at that point. Therefore, in contrast with the downlink, which can rely on existing single-user schemes, the uplink needs a new scheme in which each user estimates its synchronization mismatches with respect to the BS and compensates for them before transmission, which we refer to as presynchronization. Only the small residual mismatches that remain after presynchronization should be compensated after separation, which we refer to as postsynchronization.
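The MUI-free structure of (13) and (14) can be reproduced with a few lines of NumPy. This sketch assumes, loosely following Section 5, composite codes formed from Walsh-Hadamard sequences multiplied by a common unit-modulus scrambling sequence and normalized to unit energy (the normalization is an assumption), and a truncated orthonormal DCT precoder Θ. Noise is omitted so that the comparison is exact; all sizes and helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
Q, L, N, M = 16, 4, 8, 8        # subcarriers, channel order, code length, active users
B = Q - L


def walsh_hadamard(n: int) -> np.ndarray:
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H


def truncated_dct(Q: int, B: int) -> np.ndarray:
    """First B columns of the orthonormal Q x Q DCT-II matrix, entries ~ cos((b + 1/2) q pi / Q)."""
    q, b = np.arange(Q)[:, None], np.arange(B)[None, :]
    D = np.sqrt(2.0 / Q) * np.cos((b + 0.5) * q * np.pi / Q)
    D[0, :] /= np.sqrt(2.0)
    return D


def qpsk(shape) -> np.ndarray:
    """Unit-energy QPSK symbols."""
    return (rng.choice([-1.0, 1.0], shape) + 1j * rng.choice([-1.0, 1.0], shape)) / np.sqrt(2)


Theta = truncated_dct(Q, B)
scramble = np.exp(2j * np.pi * rng.random(N))                   # common scrambling (assumed)
C = walsh_hadamard(N)[:, :M] * scramble[:, None] / np.sqrt(N)   # column m = c^m[i]

S = qpsk((B, M))                       # column m = s^m[i]
S_tilde = Theta @ S                    # precoded blocks, column m = s~^m[i]
X_tilde = S_tilde @ C.T                # block spreading: X~[i] = S~[i] C[i]^T  (Q x N)

h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
H_tilde = np.diag(np.fft.fft(h, Q))    # diagonal FD channel
Y_tilde = H_tilde @ X_tilde            # (13), noise omitted

m = 3                                  # desired user (0-based index)
y_m = Y_tilde @ C[:, m].conj()         # block despreading (14)
print(np.allclose(y_m, H_tilde @ Theta @ S[:, m]))   # MUI fully removed
```

The key design point is that all users share the same per-tone channel in the downlink, so code orthogonality (C^H C = I) survives the channel on every tone.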

Multicarrier Block-Spread CDMA


2.3. Single-user equalization for MCBS-CDMA

Table 1: Complexity of ML.

After successful elimination of the MUI, we still need to detect the desired user's symbol block $\mathbf{s}^m[i]$ from (14). Ignoring, for the moment, the presence of $\Theta$ (or, equivalently, setting Q = B and selecting $\Theta = \mathbf{I}_Q$), this requires $\tilde{\mathbf{H}}$ to have full column rank Q. Unfortunately, this condition only holds for channels that do not invoke any zero diagonal entries in $\tilde{\mathbf{H}}$. In other words, if the MS experiences a deep channel fade on a particular tone (corresponding to a zero diagonal entry in $\tilde{\mathbf{H}}$), the information symbol on that tone cannot be recovered. To guarantee symbol detectability of the B symbols in $\mathbf{s}^m[i]$, regardless of the symbol constellation, we thus need to design the precoder $\Theta$ such that

$$\operatorname{rank}(\tilde{\mathbf{H}} \cdot \Theta) = B, \qquad (15)$$

irrespective of the underlying channel realization [11]. Since an FIR channel of order L can invoke at most L zero diagonal entries in $\tilde{\mathbf{H}}$, this requires any Q − L = B rows of $\Theta$ to be linearly independent. In [21, 22], two classes of precoders have been constructed that satisfy this condition and thus guarantee symbol detectability or, equivalently, enable full frequency-diversity gain: the Vandermonde precoders and the real cosine precoders. The Q × B complex Vandermonde precoder is defined by $[\Theta(\boldsymbol{\rho})]_{q,b} = \rho_q^{\,b}$, where $\boldsymbol{\rho} := [\rho_0, \ldots, \rho_{Q-1}]^T$ and the $\rho_q$'s, with q = 0, ..., Q − 1, are Q complex points such that $\rho_q \ne \rho_{q'}$ for all $q \ne q'$. A special case of the general Vandermonde precoder is a truncated FFT matrix, obtained by choosing $\rho_q = \exp(-j2\pi q/Q)$. The Q × B real cosine precoder is defined by $[\Theta(\boldsymbol{\phi})]_{q,b} = \cos\big((b + 1/2)\phi_q\big)$, where $\boldsymbol{\phi} := [\phi_0, \ldots, \phi_{Q-1}]^T$ and the $\phi_q$'s, with q = 0, ..., Q − 1, are Q real points such that $\phi_q \ne (2k+1)\pi$ and $\phi_q \pm \phi_{q'} \ne 2k\pi$ for all $q \ne q'$ and integer k. A special case of the general cosine precoder is a truncated discrete cosine transform (DCT) matrix, obtained by choosing $\phi_q = q\pi/Q$.

3. EQUALIZATION OPTIONS

In this section, we discuss different options to perform equalization and decoding of the linear precoding, either jointly or separately, under the assumption of perfect CSI at the receiver. These options allow one to trade off performance against complexity, ranging from optimal ML detection with exponential complexity to linear and decision-directed detection with linear complexity. To evaluate the complexity, we distinguish between the initialization phase, when the equalizers are calculated based on the channel knowledge, and the data processing phase, when the received data is actually processed. The rate of the former is related to the channel's fading rate, whereas the latter is executed continuously at the symbol block rate. By analyzing the rates of the different receiver blocks in Figure 2, it is clear that the equalizer operates at a rate that is B times lower than the symbol rate, that is, $R_{eq} = R_s/B$. This section is organized as follows. Section 3.1 investigates ML detection. Section 3.2 studies joint linear equalization and decoding, whereas Section 3.3 introduces joint decision feedback equalization and decoding. Finally, Section 3.4 proposes separate linear equalization and decoding.

Data processing QC B B+1 2C − C B − 1 Q − CB C −1 2C B+1 − C B − 1 3Q − 1 + 2QC B − 3 C

Multiplications Additions Data transfers

3.1. ML detection

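The ML algorithm is optimal (it minimizes the block error probability when all transmitted blocks are equally likely) but has a very high complexity. Amongst all possible transmitted blocks, it retains the one that maximizes the likelihood function or, equivalently, minimizes the Euclidean distance:

$$\hat{\mathbf{s}}^m[i] = \arg\min_{\mathbf{s}^m[i] \in \mathcal{S}} \left\| \tilde{\mathbf{y}}^m[i] - \tilde{\mathbf{H}} \cdot \Theta \cdot \mathbf{s}^m[i] \right\|^2. \qquad (16)$$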

In other words, the ML metric is given by the Euclidean distance between the actual received block and the block that would have been received if a particular symbol block had been transmitted in a noiseless environment. The number of possible transmit vectors in $\mathcal{S}$ is the cardinality of $\mathcal{S}$, that is, $|\mathcal{S}| = C^B$, with C the constellation size. Consequently, the number of points to inspect grows exponentially with the initial block length B. The ML algorithm does not require an initialization phase. During the data processing phase, the ML algorithm calculates the Euclidean distance metric of (16) for all possible transmit vectors $\mathbf{s}^m[i]$. To lower the complexity, a treelike implementation avoids frequent recalculation of common subexpressions. Table 1 summarizes the complexity of the ML algorithm in terms of complex multiplications, additions, and data transfers. The overall complexity is $O(QC^B)$ during data processing. Hence, this algorithm is only feasible for a small block length B and a small constellation size C.

3.2. Joint linear equalization and decoding

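Linear equalizers that perform joint equalization and decoding combine a low complexity with medium performance. A first possibility is to apply a zero-forcing block linear equalizer (ZF-BLE) [25]

$$\mathbf{G}_{ZF} = \left(\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta\right)^{-1} \cdot \Theta^H \cdot \tilde{\mathbf{H}}^H, \qquad (17)$$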

which completely eliminates the ISI, irrespective of the noise level. A second possibility is to apply a minimum mean-square-error block linear equalizer (MMSE-BLE) [25]

$$\mathbf{G}_{MMSE} = \left(\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2}\mathbf{I}_B\right)^{-1} \cdot \Theta^H \cdot \tilde{\mathbf{H}}^H, \qquad (18)$$

which minimizes the MSE between the actual transmitted symbol block and its estimate. Here, $\sigma_w^2$ and $\sigma_s^2$ are the noise variance and the information symbol variance, respectively.
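To make (15) to (18) concrete, the sketch below uses a toy configuration (Q = 8, L = 2, B = 6, QPSK) and an illustrative order-2 channel whose frequency response vanishes on two tones. It checks that the truncated-DCT precoder restores full column rank, then applies the ZF-BLE (17), the MMSE-BLE (18), and exhaustive ML search (16). The sizes, noise level, and helper names are assumptions chosen so that the 4^6 = 4096 ML candidates remain cheap to enumerate; this is not the simulation setup of Section 5.

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
Q, L = 8, 2
B = Q - L
QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)


def truncated_dct(Q: int, B: int) -> np.ndarray:
    """First B columns of the orthonormal Q x Q DCT-II matrix (cosine precoder, phi_q = q pi / Q)."""
    q, b = np.arange(Q)[:, None], np.arange(B)[None, :]
    D = np.sqrt(2.0 / Q) * np.cos((b + 0.5) * q * np.pi / Q)
    D[0, :] /= np.sqrt(2.0)
    return D


def slice_qpsk(x: np.ndarray) -> np.ndarray:
    """Nearest QPSK point, decided separately on the real and imaginary parts."""
    return (np.sign(x.real) + 1j * np.sign(x.imag)) / np.sqrt(2)


def ml_detect(y: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Exhaustive search (16) over all 4**B QPSK blocks."""
    cands = np.array(list(itertools.product(QPSK, repeat=A.shape[1])))
    metrics = np.sum(np.abs(y[None, :] - cands @ A.T) ** 2, axis=1)
    return cands[np.argmin(metrics)]


# Illustrative order-L channel whose response is (numerically) zero on tones 1 and 5.
nulls = np.array([1, 5])
h = np.poly(np.exp(2j * np.pi * nulls / Q))      # h[l], l = 0..L
H_tilde = np.diag(np.fft.fft(h, Q))

Theta = truncated_dct(Q, B)
A = H_tilde @ Theta                              # effective channel H~ Theta
print("rank(H~) =", np.linalg.matrix_rank(H_tilde, tol=1e-9),
      " rank(H~ Theta) =", np.linalg.matrix_rank(A, tol=1e-9))

sigma_s2, sigma_w2 = 1.0, 0.05
AH = A.conj().T
G_zf = np.linalg.solve(AH @ A, AH)                                        # (17)
G_mmse = np.linalg.solve(AH @ A + (sigma_w2 / sigma_s2) * np.eye(B), AH)  # (18)

s = QPSK[rng.integers(0, 4, B)]
noise = np.sqrt(sigma_w2 / 2) * (rng.standard_normal(Q) + 1j * rng.standard_normal(Q))
y = A @ s + noise
for name, s_hat in [("ZF-BLE", slice_qpsk(G_zf @ y)),
                    ("MMSE-BLE", slice_qpsk(G_mmse @ y)),
                    ("ML", ml_detect(y, A))]:
    print(f"{name}: {np.count_nonzero(~np.isclose(s_hat, s))} symbol errors")
```

Without the precoder, the two faded tones make the diagonal channel rank deficient (rank Q − L), so ZF would be undefined; with the precoder, the Q × B effective channel keeps full column rank B, as required by (15).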


Table 2: Complexity of ZF-BLE. Initialization B3 Q 13 + 3B 2 Q + BQ 3 6 B3 Q 5 2 + 3B Q − BQ − B 2 3 6 2B 3 Q + 21B 2 Q + 7BQ − 3B 2

Multiplications Additions Data transfers

Data processing BQ BQ − B 6BQ − 3B

Table 3: Complexity of MMSE-BLE. Initialization B3 Q 5 2 7 + B Q + BQ + 1 6 2 3 5 2 BQ − B2 + B B Q− 2 2 11 B3 Q + 15B 2 Q + BQ − 3B 2 + 3B + 3 2 2

Data processing BQ BQ − B 6BQ − 3B

Multiplications Additions Data transfers

During the initialization phase, $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ can be computed from the set of multiple linear systems implicitly shown in (17) and (18), respectively. For the ZF-BLE, the solution of each linear system can be found using the LU decomposition, which relies on Gauss elimination with partial pivoting [24]. For the MMSE-BLE, each linear system can be solved based on the $LDL^H$ decomposition (instead of the LU decomposition), which relies on Gauss elimination without pivoting [24]. During the data processing phase, the equalizers $\mathbf{G}_{ZF}$ and $\mathbf{G}_{MMSE}$ are applied to the received block $\tilde{\mathbf{y}}^m[i]$. Tables 2 and 3 summarize the complexity of the ZF- and the MMSE-BLE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(B^3Q)$ during initialization and $O(BQ)$ during data processing.

3.3. Joint decision feedback equalization and decoding

The class of nonlinear equalizers that perform joint decision feedback equalization and decoding lies in between the former categories, both in terms of performance and in terms of complexity. The block decision feedback equalizers (BDFEs) consist of a feedforward section, represented by the matrix $\mathbf{W}$, and a feedback section, represented by the matrix $\mathbf{B}$ [26, 27]:

$$\hat{\mathbf{s}}^m[i] = \operatorname{slice}\left(\mathbf{W} \cdot \tilde{\mathbf{y}}^m[i] - \mathbf{B} \cdot \hat{\mathbf{s}}^m[i]\right). \qquad (19)$$

The feedforward and feedback sections can be designed according to a ZF or MMSE criterion. In either case, $\mathbf{B}$ should be a strictly upper or lower triangular matrix (i.e., with zero diagonal entries) in order to feed back decisions in a causal way. To design the decision feedback counterpart of the ZF-BLE, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta$ in (17), that is,

$$\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta = \left(\boldsymbol{\Sigma}_1 \cdot \mathbf{U}_1\right)^H \cdot \boldsymbol{\Sigma}_1 \cdot \mathbf{U}_1, \qquad (20)$$

where $\mathbf{U}_1$ is an upper triangular matrix with ones along the diagonal and $\boldsymbol{\Sigma}_1$ is a diagonal matrix with real entries. The ZF-BDFE then follows from

$$\mathbf{W}_{ZF} = \mathbf{U}_1 \cdot \mathbf{G}_{ZF} = \boldsymbol{\Sigma}_1^{-1} \cdot \left(\mathbf{U}_1^H \cdot \boldsymbol{\Sigma}_1\right)^{-1} \cdot \Theta^H \cdot \tilde{\mathbf{H}}^H, \qquad \mathbf{B}_{ZF} = \mathbf{U}_1 - \mathbf{I}_B. \qquad (21)$$

The linear feedforward section $\mathbf{W}_{ZF}$ suppresses the ISI originating from "future" symbols, the so-called precursor ISI, whereas the nonlinear feedback section $\mathbf{B}_{ZF}$ eliminates the ISI originating from "past" symbols, the so-called postcursor ISI. Likewise, to design the decision feedback counterpart of the MMSE-BLE, we compute the Cholesky decomposition of the matrix $\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta + (\sigma_w^2/\sigma_s^2)\mathbf{I}_B$ in (18), that is,

$$\Theta^H \cdot \tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} \cdot \Theta + \frac{\sigma_w^2}{\sigma_s^2}\mathbf{I}_B = \left(\boldsymbol{\Sigma}_2 \cdot \mathbf{U}_2\right)^H \cdot \boldsymbol{\Sigma}_2 \cdot \mathbf{U}_2, \qquad (22)$$

where $\mathbf{U}_2$ is an upper triangular matrix with ones along the diagonal and $\boldsymbol{\Sigma}_2$ is a diagonal matrix with real entries. The MMSE-BDFE can then be calculated as

$$\mathbf{W}_{MMSE} = \mathbf{U}_2 \cdot \mathbf{G}_{MMSE} = \boldsymbol{\Sigma}_2^{-1} \cdot \left(\mathbf{U}_2^H \cdot \boldsymbol{\Sigma}_2\right)^{-1} \cdot \Theta^H \cdot \tilde{\mathbf{H}}^H, \qquad \mathbf{B}_{MMSE} = \mathbf{U}_2 - \mathbf{I}_B. \qquad (23)$$

During the initialization phase, the feedforward and feedback filters of the ZF- and MMSE-BDFE are computed based on (21) and (23), respectively, relying on the Cholesky decomposition [24]. During the data processing phase, the received data is first filtered with the feedforward filter $\mathbf{W}$, after which the already detected symbols are fed back through the feedback filter $\mathbf{B}$, according to (19). Tables 4 and 5 summarize the complexity of the ZF- and MMSE-BDFE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(B^3Q)$ during initialization and $O(BQ)$ during data processing. Hence, the nonlinear BDFEs involve the same order of complexity as their linear counterparts.
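The BDFE design (20) to (23) and the detection rule (19) can be sketched as follows. The Cholesky factor returned by `numpy.linalg.cholesky` (lower triangular, with the input equal to that factor times its conjugate transpose) is conjugate-transposed and rescaled into the unit upper-triangular U and real diagonal Σ of (20) and (22). Because the feedback matrix is strictly upper triangular, the loop decides the last symbol of the block first. The toy dimensions, random channel, unit symbol variance, and helper names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
Q, L = 16, 4
B = Q - L


def truncated_dct(Q: int, B: int) -> np.ndarray:
    """First B columns of the orthonormal Q x Q DCT-II matrix."""
    q, b = np.arange(Q)[:, None], np.arange(B)[None, :]
    D = np.sqrt(2.0 / Q) * np.cos((b + 0.5) * q * np.pi / Q)
    D[0, :] /= np.sqrt(2.0)
    return D


def slice_qpsk(x):
    """Nearest unit-energy QPSK point (works on scalars and arrays)."""
    return (np.sign(np.real(x)) + 1j * np.sign(np.imag(x))) / np.sqrt(2)


def bdfe_design(A: np.ndarray, reg: float = 0.0):
    """ZF-BDFE (reg = 0) or MMSE-BDFE (reg = sigma_w^2 / sigma_s^2), following (20)-(23)."""
    nb = A.shape[1]
    gram = A.conj().T @ A + reg * np.eye(nb)
    R = np.linalg.cholesky(gram).conj().T      # gram = R^H R with R upper triangular
    Sigma = np.diag(np.real(np.diag(R)))        # real positive diagonal Sigma
    U = np.linalg.solve(Sigma, R)                # unit upper triangular, R = Sigma U
    G = np.linalg.solve(gram, A.conj().T)        # block linear equalizer (17) or (18)
    W = U @ G                                    # feedforward section
    B_fb = U - np.eye(nb)                        # strictly upper triangular feedback section
    return W, B_fb, Sigma, U


def bdfe_detect(W, B_fb, y, slicer=slice_qpsk):
    """Decision feedback detection (19); B_fb is strictly upper triangular, so detect last-to-first."""
    z = W @ y
    s_hat = np.zeros(len(z), dtype=complex)
    for b in range(len(z) - 1, -1, -1):
        s_hat[b] = slicer(z[b] - B_fb[b, b + 1:] @ s_hat[b + 1:])
    return s_hat


h = (rng.standard_normal(L + 1) + 1j * rng.standard_normal(L + 1)) / np.sqrt(2 * (L + 1))
A = np.diag(np.fft.fft(h, Q)) @ truncated_dct(Q, B)     # H~ Theta
s = slice_qpsk(rng.standard_normal(B) + 1j * rng.standard_normal(B))

sigma_w2 = 0.01
y = A @ s + np.sqrt(sigma_w2 / 2) * (rng.standard_normal(Q) + 1j * rng.standard_normal(Q))
for name, reg in [("ZF-BDFE", 0.0), ("MMSE-BDFE", sigma_w2)]:   # sigma_s^2 = 1 assumed
    W, B_fb, _, _ = bdfe_design(A, reg)
    errors = np.count_nonzero(~np.isclose(bdfe_detect(W, B_fb, y), s))
    print(f"{name}: {errors} symbol errors")
```

In the noiseless case the ZF feedforward output equals U times the transmitted block, so subtracting the already decided symbols through U − I leaves each symbol isolated before slicing.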


Table 4: Complexity of ZF-BDFE. Initialization B3 Q B 3 13 B2 B + 4B 2 Q + + BQ + + 3 6 6 2 3 3 3 B Q 11 B 5 − + 4B 2 Q + BQ − B 2 + B 3 6 6 6 2B 3 Q + 27B 2 Q + B 3 + 4BQ − B 2 + 4B

Multiplications Additions Data transfers

Data processing BQ + B 2 BQ + B 2 − B 6BQ + 6B 2 − 3B

Table 5: Complexity of MMSE-BDFE. Initialization B3 Q 7 2 B3 7 B2 B + B Q+ + BQ + + +1 6 2 6 3 2 3 B3 3 11 7 2 2 − BQ − B + B Q+ B 2 6 2 6 3 5 B Q + 21B 2 Q + B 3 + BQ − B 2 + 7B + 3 2 2

Multiplications Additions Data transfers

3.4. Separate linear equalization and decoding

Previously, we have only considered joint equalization and decoding of the linear precoding. However, in order to reduce the complexity even further with respect to the block linear equalizers of Section 3.2, equalization and decoding can also be performed separately:

$$\hat{\mathbf{s}}^m[i] = \Theta^H \cdot \tilde{\mathbf{G}} \cdot \tilde{\mathbf{y}}^m[i], \qquad (24)$$

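for which we rely on the para-unitary property of $\Theta$, that is, $\Theta^H \cdot \Theta = \mathbf{I}_B$. Here, $\tilde{\mathbf{G}}$ performs per-tone linear equalization (PT-LE) only and tries to restore $\tilde{\mathbf{s}}^m[i]$, whereas $\Theta^H$ subsequently performs linear decoding only and tries to restore $\mathbf{s}^m[i]$. The ZF-PT-LE, which can be expressed as

$$\tilde{\mathbf{G}}_{ZF} = \left(\tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}}\right)^{-1} \cdot \tilde{\mathbf{H}}^H, \qquad (25)$$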

perfectly removes the amplitude and phase distortion on every tone, irrespective of the noise level. The MMSE-PT-LE, which balances amplitude and phase distortion with noise enhancement on every tone, can be expressed as

$$\tilde{\mathbf{G}}_{MMSE} = \left(\tilde{\mathbf{H}}^H \cdot \tilde{\mathbf{H}} + \sigma_w^2 \mathbf{R}_{\tilde{s}}^{-1}\right)^{-1} \cdot \tilde{\mathbf{H}}^H, \qquad (26)$$

where $\mathbf{R}_{\tilde{s}} := E\{\tilde{\mathbf{s}}^m[i] \cdot \tilde{\mathbf{s}}^m[i]^H\} = \sigma_s^2\, \Theta \cdot \Theta^H$ is the covariance matrix of $\tilde{\mathbf{s}}^m[i]$. The MMSE equalizer only decouples into Q parallel and independent single-tap equalizers if we neglect the color in the precoded symbols, that is, $\mathbf{R}_{\tilde{s}} \approx \sigma_s^2 \mathbf{I}_Q$. During the initialization phase, $\tilde{\mathbf{G}}_{ZF}$ and $\tilde{\mathbf{G}}_{MMSE}$ are calculated from (25) and (26), respectively, where the matrix inversion reduces to Q parallel scalar divisions. During the data processing phase, the received data is separately equalized and decoded according to (24). Furthermore, the linear decoding step relies on implementation-efficient IDCT or IFFT operations. Tables 6 and 7 summarize the complexity of the ZF- and MMSE-PT-LE, respectively, in terms of complex multiplications, additions, and data transfers. In both cases, the overall complexity is $O(Q)$ during initialization and $O(Q\log_2 Q)$ during data processing.

Data processing BQ + B 2 BQ + B 2 − B 6BQ + 6B 2 − 3B

4. EXTENSION TO MULTIPLE ANTENNAS

As shown in Sections 2 and 3, MCBS-CDMA successfully addresses the challenges of broadband cellular downlink communications. However, the spectral efficiency of single-antenna MCBS-CDMA is still limited by the received signal-to-noise ratio (SNR) and cannot be further improved by traditional communication techniques. As opposed to single-antenna systems, MIMO systems that deploy $N_T$ transmit and $N_R$ receive antennas enable an $N_{\min}$-fold capacity increase in rich scattering environments, where $N_{\min} = \min\{N_T, N_R\}$ is called the multiplexing gain [28, 29, 30]. Besides the time, frequency, and code dimensions, MIMO systems create an extra spatial dimension that allows one to increase the spectral efficiency and/or to improve the performance. On the one hand, space-division multiplexing (SDM) techniques achieve high spectral efficiency by exploiting the spatial multiplexing gain [31] (see also [32]). On the other hand, space-time coding (STC) techniques achieve high quality of service (QoS) by exploiting diversity and coding gains [33, 34, 35]. Besides the leverages they offer, MIMO systems also sharpen the challenges of broadband cellular downlink communications. First, time dispersion and ISI are now caused by $N_T N_R$ frequency-selective multipath fading channels instead of just one. Second, MUI originates from $N_T M$ sources instead of just M. Third, the presence of multiple antennas seriously impairs a low-complexity implementation of the MS. To tackle these challenges, we will demonstrate the synergy between our MCBS-CDMA waveform and MIMO signal processing. In particular, we focus on a space-time block-coded (STBC) MCBS-CDMA transmission, but the general principles apply equally well to a space-time trellis coded or a space-division multiplexed MCBS-CDMA transmission.

Table 6: Complexity of ZF-PT-LE.

                   Initialization    Data processing
Multiplications    2Q                Q((1/2)log2(Q) + 1)
Additions                            Q log2(Q)
Data transfers     6Q                3Q((3/2)log2(Q) + 1)

Table 7: Complexity of MMSE-PT-LE.

                   Initialization    Data processing
Multiplications    2Q + 1            Q((1/2)log2(Q) + 1)
Additions          Q                 Q log2(Q)
Data transfers     9Q + 3            3Q((3/2)log2(Q) + 1)

This section is organized as follows. Section 4.1 details the STBC MCBS-CDMA transmission scheme for the case of $N_T = 2$ transmit antennas. Section 4.2 demonstrates how the user orthogonality preservation property of MCBS-CDMA carries over to the MIMO case, which allows us to convert a difficult multiuser MIMO detection problem into an equivalent but simpler single-user MIMO equalization problem. Finally, Section 4.3 explains how space-time decoding and equalization can then be performed for each user separately.

4.1. Space-time block-coded MCBS-CDMA transmission

The block diagram in Figure 3 describes the STBC MCBS-CDMA downlink transmission scheme (where only the mth user is explicitly shown), which transforms the M user data symbol sequences $\{s^m[i]\}_{m=1}^{M}$ into $N_T$ ST coded multiuser chip sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ with a rate $1/T_c$. For conciseness, we limit ourselves to the case of $N_T = 2$ transmit antennas with rate R = 1 space-time block codes. Note, however, that the proposed technique can be easily extended to the case of $N_T > 2$ transmit antennas with R = 1/2 space-time block codes by resorting to the generalized orthogonal designs of [35]. As for the single-antenna case, the information symbols are first grouped into blocks of B symbols and linearly precoded. Unlike the traditional approach of performing ST encoding at the scalar symbol level, we perform ST encoding at the symbol block level; this was also done in, for example, [36]. Our ST encoder operates in the FD and takes two consecutive symbol blocks $\{\tilde{\mathbf{s}}^m[2i], \tilde{\mathbf{s}}^m[2i+1]\}$ to output the following 2Q × 2 matrix of ST coded symbol blocks:

$$\begin{bmatrix} \bar{\mathbf{s}}^m_1[2i] & \bar{\mathbf{s}}^m_1[2i+1] \\ \bar{\mathbf{s}}^m_2[2i] & \bar{\mathbf{s}}^m_2[2i+1] \end{bmatrix} = \begin{bmatrix} \tilde{\mathbf{s}}^m[2i] & -\tilde{\mathbf{s}}^m[2i+1]^* \\ \tilde{\mathbf{s}}^m[2i+1] & \tilde{\mathbf{s}}^m[2i]^* \end{bmatrix}. \qquad (27)$$

At each time interval i, the ST coded symbol blocks $\bar{\mathbf{s}}^m_1[i]$ and $\bar{\mathbf{s}}^m_2[i]$ are forwarded to the first and the second transmit antenna, respectively. From (27), we can easily verify that the transmitted symbol block at time instant 2i + 1 from one antenna is the conjugate of the transmitted symbol block at time instant 2i from the other antenna (with a possible sign change). This corresponds to a per-tone implementation of the classical Alamouti scheme for frequency-flat fading channels [34]. As we will show later, this property allows for deterministic transmit stream separation at the receiver. After ST encoding, the resulting symbol block sequences $\{\bar{\mathbf{s}}^m_{n_t}[i]\}_{n_t=1}^{N_T}$ are block-spread and code-division multiplexed with those of the other users:

$$\tilde{\mathbf{x}}_{n_t}[n] = \sum_{m=1}^{M} \bar{\mathbf{s}}^m_{n_t}[i]\, c^m[n], \qquad n = iN + n', \quad n' \in \{0, \ldots, N-1\}. \qquad (28)$$

At this point, it is important to note that each of the $N_T$ parallel block sequences is block spread by the same composite code sequence $c^m[n]$, guaranteeing an efficient utilization of the available code space. As will become apparent later, this property allows for deterministic user separation at every receive antenna. After IFFT transformation and the addition of some form of transmit redundancy,

$$\mathbf{u}_{n_t}[n] = \mathbf{T} \cdot \mathbf{F}_Q^H \cdot \tilde{\mathbf{x}}_{n_t}[n], \qquad (29)$$

the corresponding scalar sequences $\{u_{n_t}[n]\}_{n_t=1}^{N_T}$ are transmitted over the air at a rate $1/T_c$.

4.2. MUI-resilient MIMO reception

The block diagram in Figure 4 describes the reception scheme for the MS of interest, which transforms the different received sequences $\{v_{n_r}[n]\}_{n_r=1}^{N_R}$ into an estimate $\hat{s}^m[i]$ of the desired user's data sequence. After transmit redundancy removal and FFT transformation, we obtain the multiantenna counterpart of (13):

$$\tilde{\mathbf{Y}}_{n_r}[i] = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t} \cdot \tilde{\mathbf{X}}_{n_t}[i] + \tilde{\mathbf{Z}}_{n_r}[i], \qquad (30)$$

where $\tilde{\mathbf{Y}}_{n_r}[i] := [\tilde{\mathbf{y}}_{n_r}[iN], \ldots, \tilde{\mathbf{y}}_{n_r}[(i+1)N-1]]$ stacks N consecutive received chip blocks $\tilde{\mathbf{y}}_{n_r}[n]$ at the $n_r$th receive antenna, $\tilde{\mathbf{H}}_{n_r,n_t}$ is the diagonal FD channel matrix from the $n_t$th transmit to the $n_r$th receive antenna, and $\tilde{\mathbf{X}}_{n_t}[i]$ and $\tilde{\mathbf{Z}}_{n_r}[i]$ are defined similarly to $\tilde{\mathbf{Y}}_{n_r}[i]$. From (28) and (30), we can conclude that our transceiver retains the user orthogonality at each receive antenna, irrespective of the underlying frequency-selective multipath channels. As in the single-antenna case, a low-complexity block despreading operation with the desired user's composite code vector $\mathbf{c}^m[i]$ deterministically removes the MUI at each receive antenna:

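$$\bar{\mathbf{y}}^m_{n_r}[i] := \tilde{\mathbf{Y}}_{n_r}[i] \cdot \mathbf{c}^m[i]^* = \sum_{n_t=1}^{N_T} \tilde{\mathbf{H}}_{n_r,n_t} \cdot \bar{\mathbf{s}}^m_{n_t}[i] + \bar{\mathbf{z}}^m_{n_r}[i]. \qquad (31)$$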

Hence, our transceiver successfully converts (through block despreading) a multiuser MIMO detection problem into an equivalent single-user MIMO equalization problem.
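The Alamouti-type block encoding (27), the despread model (31), and the per-tone structure exploited in Section 4.3 can be checked numerically. The sketch below works directly with despread FD blocks (MUI already removed, noise omitted) for N_T = 2 transmit and N_R = 2 receive antennas and random diagonal FD channels; the sizes and variable names are illustrative assumptions. The inline assertions check the per-antenna block model, the orthogonality of the resulting 2Q × 2Q channel matrix, and the unitary combining step at each receive antenna, before the final receive antenna combining.

```python
import numpy as np

rng = np.random.default_rng(5)
Q, NR = 8, 2                      # tones and receive antennas (N_T = 2 fixed by (27))


def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


s_even, s_odd = crandn(Q), crandn(Q)          # s~^m[2i] and s~^m[2i+1]

# ST encoding (27): key (antenna n_t, slot t) with t = 0 for 2i and t = 1 for 2i + 1.
s_bar = {(1, 0): s_even, (1, 1): -s_odd.conj(),
         (2, 0): s_odd, (2, 1): s_even.conj()}

# Diagonals of the FD channel matrices H~_{nr,nt}.
h = {(nr, nt): crandn(Q) for nr in range(1, NR + 1) for nt in (1, 2)}

d_per_rx, y_acute = [], []
for nr in range(1, NR + 1):
    h1, h2 = h[(nr, 1)], h[(nr, 2)]
    # Despread blocks (31) without noise, for slots 2i and 2i + 1.
    y_bar = [h1 * s_bar[(1, t)] + h2 * s_bar[(2, t)] for t in (0, 1)]
    r_bar = np.concatenate([y_bar[0], y_bar[1].conj()])            # stacked observation
    H1, H2 = np.diag(h1), np.diag(h2)
    H_bar = np.block([[H1, H2], [H2.conj(), -H1.conj()]])          # per-antenna Alamouti matrix
    assert np.allclose(r_bar, H_bar @ np.concatenate([s_even, s_odd]))
    d = np.sqrt(np.abs(h1) ** 2 + np.abs(h2) ** 2)                 # per-tone combined gain
    assert np.allclose(H_bar.conj().T @ H_bar, np.kron(np.eye(2), np.diag(d ** 2)))
    U_bar = H_bar @ np.kron(np.eye(2), np.diag(1.0 / d))           # unitary combiner
    r_acute = U_bar.conj().T @ r_bar                               # transmit stream separation
    assert np.allclose(r_acute, np.concatenate([d * s_even, d * s_odd]))
    d_per_rx.append(d)
    y_acute.append(r_acute[:Q])                                    # slot 2i only

# Receive antenna combining with a tall unitary matrix.
H_acute = np.vstack([np.diag(d) for d in d_per_rx])               # (N_R Q) x Q
D = np.sqrt(sum(d ** 2 for d in d_per_rx))                        # overall per-tone gain
U_acute = H_acute @ np.diag(1.0 / D)
y_tilde = U_acute.conj().T @ np.concatenate(y_acute)
print(np.allclose(y_tilde, D * s_even))                            # D~ s~^m[2i]
```

The result has the same diagonal-times-precoded-block form as the single-antenna model (14), which is why the equalizers of Section 3 can be reused after space-time decoding.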

Figure 3: STBC MCBS-CDMA downlink transmission scheme.

Figure 4: MUI-resilient STBC/MCBS-CDMA MIMO reception scheme.

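4.3. Single-user space-time decoding and equalization

After MUI elimination, the information blocks $\mathbf{s}^m[i]$ still need to be decoded from the received block despread sequences $\{\bar{\mathbf{y}}^m_{n_r}[i]\}_{n_r=1}^{N_R}$. Our ST decoder decomposes into three steps: an initial ST decoding step, a transmit stream separation step for each receive antenna, and, finally, a receive antenna combining step. The initial ST decoding step considers two consecutive symbol blocks $\bar{\mathbf{y}}^m_{n_r}[2i]$ and $\bar{\mathbf{y}}^m_{n_r}[2i+1]$, both satisfying the block input/output relationship of (31). By exploiting the ST code structure of (27) as in [36], we arrive at

$$\bar{\mathbf{y}}^m_{n_r}[2i] = \tilde{\mathbf{H}}_{n_r,1} \cdot \bar{\mathbf{s}}^m_1[2i] + \tilde{\mathbf{H}}_{n_r,2} \cdot \bar{\mathbf{s}}^m_2[2i] + \bar{\mathbf{z}}^m_{n_r}[2i], \qquad (32)$$

$$\bar{\mathbf{y}}^m_{n_r}[2i+1]^* = -\tilde{\mathbf{H}}^*_{n_r,1} \cdot \bar{\mathbf{s}}^m_2[2i] + \tilde{\mathbf{H}}^*_{n_r,2} \cdot \bar{\mathbf{s}}^m_1[2i] + \bar{\mathbf{z}}^m_{n_r}[2i+1]^*. \qquad (33)$$

Combining (32) and (33) into a single block matrix form, we obtain

$$\underbrace{\begin{bmatrix} \bar{\mathbf{y}}^m_{n_r}[2i] \\ \bar{\mathbf{y}}^m_{n_r}[2i+1]^* \end{bmatrix}}_{\bar{\mathbf{r}}^m_{n_r}[i]} = \underbrace{\begin{bmatrix} \tilde{\mathbf{H}}_{n_r,1} & \tilde{\mathbf{H}}_{n_r,2} \\ \tilde{\mathbf{H}}^*_{n_r,2} & -\tilde{\mathbf{H}}^*_{n_r,1} \end{bmatrix}}_{\bar{\mathbf{H}}_{n_r}} \cdot \begin{bmatrix} \tilde{\mathbf{s}}^m[2i] \\ \tilde{\mathbf{s}}^m[2i+1] \end{bmatrix} + \underbrace{\begin{bmatrix} \bar{\mathbf{z}}^m_{n_r}[2i] \\ \bar{\mathbf{z}}^m_{n_r}[2i+1]^* \end{bmatrix}}_{\bar{\boldsymbol{\eta}}^m_{n_r}[i]}, \qquad (34)$$

where $\bar{\mathbf{s}}^m_1[2i] = \tilde{\mathbf{s}}^m[2i]$ and $\bar{\mathbf{s}}^m_2[2i] = \tilde{\mathbf{s}}^m[2i+1]$ follow from (27). From the structure of $\bar{\mathbf{H}}_{n_r}$ in (34), we can deduce that our transceiver retains the orthogonality among transmit streams at each receive antenna for each tone separately, regardless of the underlying frequency-selective multipath channels. A similar property was also encountered in the classical Alamouti scheme, but only for single-user frequency-flat fading channels [34]. The transmit stream separation step relies on this property to deterministically remove the transmit stream interference through low-complexity linear processing. We define $\tilde{\mathbf{D}}_{n_r}$ as the Q × Q diagonal matrix with nonnegative diagonal entries $\tilde{\mathbf{D}}_{n_r} := [\tilde{\mathbf{H}}_{n_r,1} \cdot \tilde{\mathbf{H}}^*_{n_r,1} + \tilde{\mathbf{H}}_{n_r,2} \cdot \tilde{\mathbf{H}}^*_{n_r,2}]^{1/2}$. From (34), we can verify that the channel matrix $\bar{\mathbf{H}}_{n_r}$ satisfies $\bar{\mathbf{H}}_{n_r}^H \cdot \bar{\mathbf{H}}_{n_r} = \mathbf{I}_2 \otimes \tilde{\mathbf{D}}_{n_r}^2$, where ⊗ stands for the Kronecker product. Based on $\bar{\mathbf{H}}_{n_r}$ and $\tilde{\mathbf{D}}_{n_r}$, we can construct a unitary matrix $\bar{\mathbf{U}}_{n_r} := \bar{\mathbf{H}}_{n_r} \cdot (\mathbf{I}_2 \otimes \tilde{\mathbf{D}}_{n_r}^{-1})$, which satisfies $\bar{\mathbf{U}}_{n_r}^H \cdot \bar{\mathbf{U}}_{n_r} = \mathbf{I}_{2Q}$ and $\bar{\mathbf{U}}_{n_r}^H \cdot \bar{\mathbf{H}}_{n_r} = \mathbf{I}_2 \otimes \tilde{\mathbf{D}}_{n_r}$. Performing unitary combining on (34) (through $\bar{\mathbf{U}}_{n_r}^H$) collects the transmit antenna diversity at the $n_r$th receive antenna:

$$\underbrace{\begin{bmatrix} \acute{\mathbf{y}}^m_{n_r}[2i] \\ \acute{\mathbf{y}}^m_{n_r}[2i+1] \end{bmatrix}}_{\acute{\mathbf{r}}^m_{n_r}[i]} := \bar{\mathbf{U}}_{n_r}^H \cdot \bar{\mathbf{r}}^m_{n_r}[i] = \begin{bmatrix} \tilde{\mathbf{D}}_{n_r} \cdot \tilde{\mathbf{s}}^m[2i] \\ \tilde{\mathbf{D}}_{n_r} \cdot \tilde{\mathbf{s}}^m[2i+1] \end{bmatrix} + \underbrace{\begin{bmatrix} \acute{\mathbf{z}}^m_{n_r}[2i] \\ \acute{\mathbf{z}}^m_{n_r}[2i+1] \end{bmatrix}}_{\acute{\boldsymbol{\eta}}^m_{n_r}[i]}, \qquad (35)$$

where the resulting noise $\acute{\boldsymbol{\eta}}^m_{n_r}[i] := \bar{\mathbf{U}}_{n_r}^H \cdot \bar{\boldsymbol{\eta}}^m_{n_r}[i]$ is still white with variance $\sigma_w^2$. Since multiplying with a unitary matrix preserves ML optimality, we can deduce from (35) that the symbol blocks $\tilde{\mathbf{s}}^m[2i]$ and $\tilde{\mathbf{s}}^m[2i+1]$ can be decoded separately in an optimal way. As a result, the different symbol blocks $\tilde{\mathbf{s}}^m[i]$ can be detected independently from

$$\acute{\mathbf{y}}^m_{n_r}[i] = \tilde{\mathbf{D}}_{n_r} \cdot \tilde{\mathbf{s}}^m[i] + \acute{\mathbf{z}}^m_{n_r}[i]. \qquad (36)$$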

Stacking the blocks from the different receive antennas N

{y´ nmr [i]}nrR=1 for the final receive antenna combining step, we

1578

EURASIP Journal on Applied Signal Processing

obtain

Table 8: Parameters of the ITU pedestrian B channel.     

   m  ˜1  y´ 1m [i] z´ 1 [i] D     ..  =  ..  ·˜sm [i] +   ..   .  .   .  ˜ NR y´ Nm [i] z´ m D N [i] R 

y´ m [i]



   ´ H



R 

(37)



z´ m [i]

At this point, we have only collected the transmit antenna diversity at each receive antenna, but still need to collect the ˜ with receive antenna diversity. We define the Q × Q matrix D NT NR ˜ ˜ nonnegative diagonal entries as D := [ nt =1 nr =1 Hnr ,nt · ´ H ·H ˜ ∗n ,n ]1/2 . From (37), we can verify that: H ´ =D ˜ 2 . Based H r t ´ ˜ ´ · on H and D, we can construct a tall unitary matrix U´ := H −1 H H ´ ´ ˜ ´ ´ ˜ D , which satisfies U · U = IQ and U · H = D. Gathering the receive antenna diversity through multiplying (37) with U´ H , we finally obtain ˜ · Θ · sm [i] + z˜m [i], y˜ m [i] := U´ H · y´ m [i] = D

(38)

where the resulting noise z˜m [i] := U´ H ·z´ m [i] is still white with variance σw2 . Since the multiplication with a tall unitary matrix, which does not remove information, preserves ML decoding optimality, the blocks sm [i] can be optimally decoded from (38). Furthermore, since (38) has the same structure as its single-antenna counterpart in (14), the design of the linear precoder Θ in Section 2.3 and the different equalization options that we have discussed in Section 3 can be applied here as well. Specifically, with Lt the number of taps of the underlying multipath channels, the ML detector achieves the full diversity order of NT NR Lt , hence, both multi-antenna and multipath diversity. The transmit antenna diversity is enabled at the transmitter by the space-time encoder and collected at each receive antenna by the transmit stream separation step. The receive antenna diversity is collected by the final receive antenna combining step. The multipath diversity is enabled at the transmitter by the linear precoder, and extracted at the receiver by the ML joint equalization and decoding step. 5.

SIMULATION RESULTS

We consider the downlink of an MCBS-CDMA system, operating at a carrier frequency of Fc = 2 GHz and transmitting with a chip rate of Rc = 1/Tc = 4.096 MHz. Each user’s bit sequence is QPSK modulated with nb = 2 bits per symbol. To assess the performance of the MCBS-CDMA system, we have selected ITU’s outdoor-to-indoor and pedestrian B channel model, which models typical urban propagation environments. The main parameters of this tapped delay line model are summarized in Table 8. Hence, the multipath channel has Lt = 6 Rayleigh fading taps with a maximum excess delay of τmax = 3700 ns, resulting in a minimum channel order of Lmin = τmax /Tc  = 16. To satisfy the IBI removal condition L ≥ Lmin , we choose the CP length L = 32. This specific design can even handle a maximum excess delay of Tg = LTc = 7812.5 ns, with Tg the guard time. However, a larger transmit redundancy can be used to handle more ICI.

Table 8: Main parameters of the ITU pedestrian B tapped delay line channel model.

Tap | Excess delay (ns) | Average relative power (dB)
1 | 0 | 0
2 | 200 | −0.9
3 | 800 | −4.9
4 | 1200 | −8.0
5 | 2300 | −7.8
6 | 3700 | −23.9

Table 9: Main MCBS-CDMA system parameters.

Parameter | Value
Carrier frequency | Fc = 2 GHz
Chip rate | Rc = 4.096 MHz
Modulation format | nb = 2 (QPSK)
Initial block length | B = 224
Cyclic prefix length | L = 32
Number of subcarriers | Q = 256
Transmitted block length | K = 288
Symbol rate | Rs = 199 kHz
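The entries of Table 9 are linked by the design rules stated in the surrounding text: Q = 8L, K = Q + L, B = Q − L, and Rs = (B/K)(1/N)Rc, together with the equalizer block rate Req = Rs/B quoted in the complexity comparison of Section 5.1. A minimal Python sketch that recomputes them from L = 32, N = 16, and Rc = 4.096 MHz follows; the helper name `mcbs_block_params` is hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction
from typing import Tuple

# Values from Table 9 and the surrounding text.
CHIP_RATE_HZ = 4_096_000  # Rc
CP_LENGTH = 32            # L
SPREAD_LEN = 16           # N, Walsh-Hadamard code length


def mcbs_block_params(cp_length: int, spread_len: int,
                      chip_rate_hz: int) -> Tuple[int, int, int, Fraction, Fraction]:
    """Hypothetical helper: block sizes and rates for the rules Q = 8L, K = Q + L, B = Q - L."""
    q = 8 * cp_length   # number of subcarriers Q
    k = q + cp_length   # transmitted block length K (CP appended)
    b = q - cp_length   # information symbols per block B (precoder adds L redundant symbols)
    rs = Fraction(b, k) * Fraction(chip_rate_hz, spread_len)  # Rs = (B/K)(1/N)Rc in Hz
    req = rs / b        # equalizer block rate Req = Rs/B in Hz
    return q, k, b, rs, req


q, k, b, rs, req = mcbs_block_params(CP_LENGTH, SPREAD_LEN, CHIP_RATE_HZ)
print(q, k, b)               # 256 288 224
print(round(float(rs)))      # 199111, i.e. about 199 kHz
print(round(float(req), 1))  # 888.9, i.e. about 889 Hz
```

Because every block size is tied to L, changing `CP_LENGTH` rescales Q, K, and B together while keeping the overall redundancy 2L/K fixed at 2/9.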

Conversely, a smaller transmit redundancy suffices if less ICI has to be handled. To limit the overhead, we choose the number of subcarriers Q = 8L = 256, leading to a transmitted block length K = Q + L = 288. Hence, the information symbols are parsed into blocks of B = Q − L = 224 symbols and linearly precoded into blocks of size Q = 256. The Q × B precoding matrix Θ consists of the first B columns of the Q × Q DCT matrix [22]. The precoded symbol blocks are subsequently block spread by a real orthogonal Walsh-Hadamard spreading code of length N = 16, along with a complex random scrambling code. For the above parameters, this results in a channel symbol rate of Rs = (B/K)(1/N)Rc = 199 kHz. For convenience, the main MCBS-CDMA system parameters are summarized in Table 9.

In the following, we show the average bit error rate (BER) versus the average received SNR for three different test cases. Here, the SNR is defined as the ratio of the average received energy per bit of the desired user to the noise power spectral density. Section 5.1 compares the different single-user equalization options in terms of both BER performance and complexity. Section 5.2 compares the BER performance of the proposed MCBS-CDMA transceiver with that of two competing CDMA transceivers. Finally, Section 5.3 discusses the BER performance of the STBC-MCBS-CDMA transceiver in different propagation environments.

5.1. Comparison of different equalization options

[Figure 5 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: Uncoded, ZF-BLE, MMSE-BLE, ZF-BDFE, MMSE-BDFE, ML.]
Figure 5: Performance comparison of joint block linear equalization (BLE) and decoding versus joint block decision feedback equalization (BDFE) and decoding for a fully loaded MCBS-CDMA system with M = 16 users. Both the ZF and the MMSE criteria are considered. Uncoded and ML performances are shown as a reference.

[Figure 6 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: Uncoded, ZF-PT-LE, MMSE-PT-LE, ZF-BLE, MMSE-BLE, ML.]
Figure 6: Performance comparison of separate PT-LE and decoding versus joint block linear equalization (BLE) and decoding for a fully loaded MCBS-CDMA system with M = 16 users. Both the ZF and the MMSE criteria are considered. Uncoded and ML performances are shown as a reference.

We test the different equalization options discussed in Section 3 for a fully loaded MCBS-CDMA system with M = 16 active users. Figure 5 compares the performance of the different block linear equalizers (BLEs) and block decision feedback equalizers (BDFEs) that perform joint equalization and decoding. As a reference, the performance of a system without linear precoding (uncoded) and the optimal ML performance are also shown. Clearly, the system without linear precoding achieves a diversity order of only 1, whereas
ML detection achieves the full frequency-diversity gain Lt = 6. The ZF-BLE performs worse than the uncoded system at low SNR but better at high SNR (SNR ≥ 9 dB). The MMSE-BLE always outperforms the uncoded system and achieves a diversity gain between 1 and Lt = 6; at a BER of 10−3, it realizes a 3 dB gain compared to its ZF counterpart. The nonlinear ZF- and MMSE-BDFEs outperform their respective linear counterparts, although this effect is more pronounced for the ZF than for the MMSE criterion. For a target BER of 10−3, the MMSE-BDFE exhibits a 1.9 dB gain relative to the MMSE-BLE, whereas the ZF-BDFE exhibits a 4.2 dB gain relative to the ZF-BLE. Furthermore, the MMSE-BDFE marginally outperforms the ZF-BDFE, by 0.7 dB, and comes within 1.4 dB of the optimal ML detector.

Figure 6 compares the performance of separate PT-LE and decoding versus joint block linear equalization (BLE) and decoding, both of which perform linear equalization. On the one hand, the ZF-PT-LE always performs worse than the uncoded system, due to the excessive noise enhancement caused by channel nulls. For a target BER of 10−2, the ZF-BLE outperforms the corresponding ZF-PT-LE by 7.4 dB. On the other hand, the MMSE-PT-LE performs within 0.3 dB of the corresponding MMSE-BLE and thus also achieves a diversity gain between 1 and Lt = 6. The MMSE-BLE, in turn, outperforms the uncoded system by 4.8 dB and comes within 2.7 dB of the optimal ML detector.

Tables 10 and 11 summarize the complexity of the different MCBS-CDMA equalization options in terms of multiplications (mpys), additions (adds), and data transfers (dts). Table 10 compares the initialization complexity. The initialization complexity of the ZF-BLE, which is similar to that of the ZF-BDFE, involves an operation count of 998 Mmpys and 998 Madds, and a data transfer count of 6.0 Gdts. The initialization complexity of the MMSE-BLE, which is similar to that of the MMSE-BDFE, involves about 2 times fewer multiplications, 30 times fewer additions, and 3.7 times fewer data transfers: an operation count of 512 Mmpys and 32 Madds, and a data transfer count of 1.6 Gdts. The MMSE-PT-LE, on the other hand, has an initialization complexity that is between 5 and 6 orders of magnitude smaller than that of the corresponding MMSE-BLE: an operation count of 0.5 kmpys and 0.3 kadds, and a data transfer count of 2.3 kdts.

Table 11 compares the data processing complexity of the different equalization options. Note that the equalizer block operates at a rate that is B times lower than the symbol rate Rs, that is, Req = Rs/B = 889 Hz. The data processing complexity of the optimal ML algorithm is astronomically high, which rules out any implementation, even on the most advanced quantum computers. The BLEs have a data processing complexity of 51 Mmpys/s and 51 Madds/s, and a data transfer bandwidth of 305 Mdts/s. The data processing complexity of the BDFEs is approximately twice that of the BLEs, whereas that of the PT-LEs is roughly 1 to 2 orders of magnitude lower than that of the corresponding BLEs, amounting to 1 Mmpys/s and 2 Madds/s, and a data transfer bandwidth of 9 Mdts/s.

Table 10: Comparison of the initialization complexity of the different MCBS-CDMA equalization options.

Equalizer | mpys | adds | dts
ML | n/a | n/a | n/a
ZF-BLE | 998 M | 998 M | 6.0 G
MMSE-BLE | 512 M | 32 M | 1.6 G
ZF-BDFE | 1.0 G | 1.0 G | 6.1 G
MMSE-BDFE | 527 M | 47 M | 1.7 G
ZF-PT-LE | 0.5 k |  | 1.5 k
MMSE-PT-LE | 0.5 k | 0.3 k | 2.3 k

Table 11: Comparison of the data processing complexity of the different MCBS-CDMA equalization options.

Equalizer | mpys/s | adds/s | dts/s
ML | 1.7 · 10^131 G | 3.9 · 10^131 G | 1.5 · 10^132 G
ZF-BLE | 51 M | 51 M | 305 M
MMSE-BLE | 51 M | 51 M | 305 M
ZF-BDFE | 96 M | 95 M | 573 M
MMSE-BDFE | 96 M | 95 M | 573 M
ZF-PT-LE | 1 M | 2 M | 9 M
MMSE-PT-LE | 1 M | 2 M | 9 M

5.2. Comparison of different CDMA transceivers

In the following, we compare three different CDMA transceivers.

(1) The first transceiver applies the downlink DS-CDMA transmission scheme used in 3G cellular standards, performing classical symbol spreading. The receiver employs either a classical RAKE combiner or an MMSE time-domain chip equalizer (TD-CE) [3, 4, 5, 6, 7, 8] based on perfect CSI. The number of fingers in the RAKE combiner equals Lt = 6, while the order of the chip equalizer equals Qc = 23. The bandwidth efficiency of this first transceiver, supporting M1 users, can be calculated as $\eta_1 = M_1/N$, where N is the length of the Walsh-Hadamard spreading codes.

(2) The second transceiver applies the downlink MC-CDMA transmission scheme, performing classical symbol spreading followed by OFDM modulation [14, 15, 16]. The receiver employs an MMSE frequency-domain chip equalizer (FD-CE) based on perfect CSI. The bandwidth efficiency of this second transceiver, supporting M2 users, can be calculated as $\eta_2 = \eta_{\text{MC-CDMA}} = \frac{M_2 B_2}{B_2 N + L}$, where $B_2$ is the initial block length and $Q_2 = B_2 N$ is the number of subcarriers. The overhead of L stems from the CP for IBI removal.

(3) The third transceiver is our MCBS-CDMA transceiver derived in Section 2, combining block-spread CDMA and linearly precoded OFDM. The receiver employs an MMSE-PT-LE or ML detection. As discussed in Section 2.1, the bandwidth efficiency of this third transceiver, supporting M3 users, can be calculated as $\eta_3 = \eta_{\text{MCBS-CDMA}} = \frac{M_3 B_3}{N(B_3 + 2L)}$, where
$B_3$ is the initial block length and $Q_3 = B_3 + L$ is the number of tones. The overhead of $2L$ stems from the redundant linear precoding on the one hand and from the CP on the other. For a fair comparison between the three transceivers, we force their bandwidth efficiencies to be the same, that is, $\eta_1 = \eta_2 = \eta_3$. This leads to the following relationships between the numbers of users supported by the different transceivers:

$$M_2 = \frac{B_2 N + L}{B_2 N}\, M_1, \qquad M_3 = \frac{B_3 + 2L}{B_3}\, M_1.$$

With N = 16, L = 32, Q2 = Q3 = 8L = 256, B2 = 16, and B3 = 224, we obtain $M_2 = (9/8) M_1$ and $M_3 = (9/7) M_1$. Furthermore, we ensure that the total transmit power is the same for the different transceivers.

[Figure 7 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: DS-CDMA/RAKE, DS-CDMA/MMSE-TD-CE, MC-CDMA/MMSE-FD-CE, MCBS-CDMA/MMSE-PT-LE, MCBS-CDMA/ML.]
Figure 7: Comparison of DS-CDMA, MC-CDMA, and MCBS-CDMA for small system load with M1 = 3, M2 = 3, and M3 = 4 users, respectively: RAKE and MMSE-TD-CE for DS-CDMA; MMSE-FD-CE for MC-CDMA; MMSE-PT-LE and ML for MCBS-CDMA.

Figure 7 compares the performance of the different transceivers for a small system load with M1 = 3, M2 = 3, and M3 = 4 users, respectively ($\eta_1 \approx \eta_2 \approx \eta_3$). The DS-CDMA RAKE receiver starts flooring off at 10−3 due to ISI/ICI and the associated MUI. The DS-CDMA MMSE-TD-CE actively suppresses these interferences and achieves a significant performance improvement compared to the RAKE. The MC-CDMA MMSE-FD-CE, on the other hand, has the same performance as the DS-CDMA MMSE-TD-CE at low SNR (SNR < 8 dB) but clearly outperforms it at high SNR. Furthermore, the MCBS-CDMA MMSE-PT-LE, which deterministically removes the MUI but still suffers from ISI, performs worse than both the DS-CDMA MMSE-TD-CE and the MC-CDMA MMSE-FD-CE. Specifically, for a target BER of 10−4, the DS-CDMA MMSE-TD-CE realizes a 0.5 dB gain compared to the MCBS-CDMA MMSE-PT-LE, whereas the MC-CDMA MMSE-FD-CE realizes a 2.8 dB gain. Finally, the optimal MCBS-CDMA ML detector achieves the full diversity gain of Lt = 6.

[Figure 8 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: DS-CDMA/RAKE, DS-CDMA/MMSE-TD-CE, MC-CDMA/MMSE-FD-CE, MCBS-CDMA/MMSE-PT-LE, MCBS-CDMA/ML.]
Figure 8: Comparison of DS-CDMA, MC-CDMA, and MCBS-CDMA for large system load with M1 = 12, M2 = 14, and M3 = 16 users, respectively: RAKE and MMSE-TD-CE for DS-CDMA; MMSE-FD-CE for MC-CDMA; MMSE-PT-LE and ML for MCBS-CDMA.

[Figure 9 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: (1, 1), (2, 1), and (2, 2) setups, each with MMSE-PT-LE and ML.]
Figure 9: Performance of STBC-MCBS-CDMA for channels with small delay spread. Different MIMO system setups, ranging from (1, 1) over (2, 1) to (2, 2). MMSE-PT-LE and ML detection.

Figure 8 depicts the performance of the different transceivers for a large system load with M1 = 12, M2 = 14, and M3 = 16 users, respectively ($\eta_1 \approx \eta_2 \approx \eta_3$). The DS-CDMA RAKE receiver clearly suffers from a BER floor at 8 · 10−2, since it does not cope at all with the increased MUI. Although the DS-CDMA MMSE-TD-CE still outperforms the RAKE, its performance also starts flooring off, because it does not completely suppress these interferences at high SNR. Indeed, the existence of a ZF solution for DS-CDMA TD chip equalization requires multichannel reception at the MS [7, 8]. Hence, both DS-CDMA receivers suffer from a BER saturation level that increases with the system load M1. Likewise, since the MC-CDMA MMSE-FD-CE does not deterministically suppress the MUI either, its performance is also affected by the increased MUI. However, unlike the DS-CDMA MMSE-TD-CE, it does not suffer from a BER floor, since it copes more effectively with the ICI through the CP. In contrast with DS-CDMA and MC-CDMA, MCBS-CDMA is an MUI-free CDMA transceiver, so its performance remains unaffected by the increased MUI. Consequently, even at large system load, the MCBS-CDMA MMSE-PT-LE achieves a diversity order between 1 and Lt = 6. Furthermore, the MCBS-CDMA MMSE-PT-LE now performs better than both the DS-CDMA MMSE-TD-CE and the MC-CDMA MMSE-FD-CE. Specifically, for a target BER
of 3 · 10−3, the MCBS-CDMA MMSE-PT-LE outperforms the DS-CDMA MMSE-TD-CE by 6.8 dB. Additionally, for a target BER of 10−4, the MCBS-CDMA MMSE-PT-LE performs 1 dB better than the MC-CDMA MMSE-FD-CE. Finally, the optimal MCBS-CDMA ML detector still achieves the full diversity gain of Lt = 6.

5.3. Performance of space-time block-coded MCBS-CDMA

We test our STBC-MCBS-CDMA transceiver of Section 4, which employs a cascade of STBC and MCBS-CDMA, for three different MIMO system setups (NT, NR): the (1, 1) setup, the (2, 1) setup with TX diversity only, and the (2, 2) setup with both TX and RX diversity. The system is fully loaded, supporting M = 16 active users. For each setup, the receiver employs an MMSE-PT-LE or an ML detector based on perfect CSI. Figure 9 depicts the performance over a propagation environment with a small delay spread. The underlying multipath channel has Lt = 2 chip-spaced Rayleigh fading taps of equal average power. For a target BER of 10−3, and focusing on the MMSE-PT-LE, the (2, 1) setup outperforms the (1, 1) setup by 6 dB. The (2, 2) setup, in turn, achieves a 3.7 dB gain compared to the (2, 1) setup. Compared with its corresponding ML detector, the MMSE-PT-LE incurs a 4.2 dB loss for the (1, 1) setup, but only a 0.3 dB loss for the (2, 2) setup. Hence, the larger the number of transmit and/or receive antennas, the better the linear MMSE-PT-LE succeeds in extracting the full diversity of order NT NR Lt.

[Figure 10 plot: average BER (10^0 down to 10^−4) versus average SNR (0 to 20 dB); curves: (1, 1), (2, 1), and (2, 2) setups, each with MMSE-PT-LE and ML.]
Figure 10: Performance of STBC-MCBS-CDMA for channels with large delay spread. Different MIMO system setups, ranging from (1, 1) over (2, 1) to (2, 2). MMSE-PT-LE and ML detection.

Figure 10 shows the performance over a propagation environment with a large delay spread. The underlying multipath channel, which is the ITU pedestrian B channel introduced before, has Lt = 6 Rayleigh fading taps. For a target BER of 10−3, and focusing on the MMSE-PT-LE, the (2, 1) setup outperforms the (1, 1) setup by 4.4 dB, whereas the (2, 2) setup, in turn, achieves a 1.1 dB gain compared to the (2, 1) setup. Compared to Figure 9, the gains due to multi-antenna diversity are thus smaller, because of the inherently larger underlying multipath diversity. Compared with its corresponding ML detector, the MMSE-PT-LE incurs a 0.9 dB loss for the (2, 2) setup.

6. CONCLUSION

To cope with the challenges of broadband cellular downlink communications, we have designed a novel multicarrier CDMA transceiver that enables significant performance improvements compared to 3G cellular systems, yielding gains of up to 6.8 dB in full load situations. To this end, our MCBS-CDMA transmission technique capitalizes on redundant block spreading and linear precoding to preserve the orthogonality among users and to enable full multipath diversity gains, regardless of the underlying multipath channels. The corresponding receiver relies on low-complexity block despreading to convert a difficult multiuser detection problem into an equivalent but simpler single-user equalization problem, for which any single-user equalizer can be used to trade off performance against complexity. In this context, we have evaluated the performance and complexity of four different single-user equalization options for a realistic MCBS-CDMA cellular system that fits the UMTS channel bandwidth. On the one hand, the performance results show that, for a target BER of 10−3, the MMSE-BDFE exhibits a 1.9 dB gain relative to the MMSE-BLE, and comes
within 1.4 dB of the optimal ML detector. Furthermore, the MMSE-PT-LE performs within 0.3 dB of the MMSE-BLE, while it is 3.6 dB away from the ML detector. On the other hand, the complexity estimates show that the initialization complexity of the MMSE-BDFE is similar to that of the MMSE-BLE, while its data processing complexity is approximately two times higher. Furthermore, the MMSE-PT-LE has an initialization complexity that is between 5 and 6 orders of magnitude smaller than that of the MMSE-BLE, while its data processing complexity is roughly 1 to 2 orders of magnitude smaller. Based on this study, we conclude that the MMSE-PT-LE offers a good trade-off between performance and complexity. Finally, to increase the spectral efficiency and to improve the link reliability of multiple users in a broadband cellular network, we have demonstrated the rewarding synergy between MCBS-CDMA and existing and evolving MIMO communication techniques. Specifically, our STBC-MCBS-CDMA transmission technique retains the orthogonality not only among users but also among the different transmit streams of each user. At the receiver, these properties allow for deterministic ML user separation through low-complexity block despreading and for deterministic transmit stream separation through simple linear processing, respectively. Consequently, ML equalization per transmit stream and per user achieves maximum multi-antenna and multipath diversity gains for every user in the system, irrespective of the system load. Furthermore, the low-complexity MMSE-PT-LE approaches the optimal ML performance (within 0.9 dB for a (2, 2) system) and comes close to extracting the full diversity in reduced as well as full load settings.

REFERENCES

[1] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications, John Wiley & Sons, New York, NY, USA, 2001.
[2] L. B. Milstein, “Wideband code division multiple access,” IEEE Journal on Selected Areas in Communications, vol. 18, no. 8, pp. 1344–1354, 2000.
[3] A. Klein, “Data detection algorithms specially designed for the downlink of CDMA mobile radio systems,” in Proc. IEEE 47th Vehicular Technology Conference (VTC ’97), vol. 1, pp. 203–207, Phoenix, Ariz, USA, May 1997.
[4] I. Ghauri and D. T. M. Slock, “Linear receivers for the DS-CDMA downlink exploiting orthogonality of spreading sequences,” in Proc. IEEE 32nd Asilomar Conference on Signals, Systems & Computers, vol. 1, pp. 650–654, Pacific Grove, Calif, USA, November 1998.
[5] C. D. Frank, E. Visotsky, and U. Madhow, “Adaptive interference suppression for the downlink of a direct sequence CDMA system with long spreading sequences,” The Journal of VLSI Signal Processing, vol. 30, no. 1–3, pp. 273–291, 2002.
[6] K. Hooli, M. Juntti, M. J. Heikkilä, P. Komulainen, M. Latva-aho, and J. Lilleberg, “Chip-level channel equalization in WCDMA downlink,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 8, pp. 757–770, 2002.
[7] T. P. Krauss, W. J. Hillery, and M. D. Zoltowski, “Downlink specific linear equalization for frequency selective CDMA cellular systems,” The Journal of VLSI Signal Processing, vol. 30, no. 1–3, pp. 143–161, 2002.
[8] F. Petré, G. Leus, L. Deneire, M. Engels, M. Moonen, and H. De Man, “Adaptive chip equalization for DS-CDMA downlink with receive diversity,” IEEE Transactions on Wireless Communications, May 2003, accepted subject to major revisions.
[9] L. J. Cimini Jr., “Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing,” IEEE Transactions on Communications, vol. 33, no. 7, pp. 665–675, 1985.
[10] J. A. C. Bingham, “Multicarrier modulation for data transmission: an idea whose time has come,” IEEE Communications Magazine, vol. 28, no. 5, pp. 5–14, 1990.
[11] Z. Wang and G. B. Giannakis, “Wireless multicarrier communications,” IEEE Signal Processing Magazine, vol. 17, no. 3, pp. 29–48, 2000.
[12] W. Y. Zou and Y. Wu, “COFDM: an overview,” IEEE Transactions on Broadcasting, vol. 41, no. 1, pp. 1–8, 1995.
[13] S. Hara and R. Prasad, “Overview of multicarrier CDMA,” IEEE Communications Magazine, vol. 35, no. 12, pp. 126–133, 1997.
[14] N. Yee, J.-P. Linnartz, and G. Fettweis, “Multi-carrier CDMA in indoor wireless radio networks,” in Proc. IEEE International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC ’93), vol. 1, pp. 109–113, Yokohama, Japan, September 1993.
[15] K. Fazel, S. Kaiser, and M. Schnell, “A flexible and high performance cellular mobile communications system based on orthogonal multi-carrier SSMA,” Wireless Personal Communications, vol. 2, no. 1/2, pp. 121–144, 1995.
[16] S. Kaiser, “OFDM code-division multiplexing in fading channels,” IEEE Transactions on Communications, vol. 50, no. 8, pp. 1266–1273, 2002.
[17] V. M. DaSilva and E. S. Sousa, “Multicarrier orthogonal CDMA signals for quasi-synchronous communication systems,” IEEE Journal on Selected Areas in Communications, vol. 12, no. 5, pp. 842–852, 1994.
[18] S. Kondo and L. B. Milstein, “Performance of multicarrier DS CDMA systems,” IEEE Transactions on Communications, vol. 44, no. 2, pp. 238–246, 1996.
[19] L. Vandendorpe, “Multitone spread spectrum multiple access communications system in a multipath Rician fading channel,” IEEE Transactions on Vehicular Technology, vol. 44, no. 2, pp. 327–337, 1995.
[20] G. B. Giannakis, Z. Wang, A. Scaglione, and S. Barbarossa, “AMOUR-generalized multicarrier transceivers for blind CDMA regardless of multipath,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 2064–2076, 2000.
[21] Z. Wang and G. B. Giannakis, “Linearly precoded or coded OFDM against wireless channel fades?,” in Proc. IEEE 3rd Workshop on Signal Processing Advances in Wireless Communications (SPAWC ’01), pp. 267–270, Taiwan, China, March 2001.
[22] Z. Wang and G. B. Giannakis, “Complex-field coding for OFDM over fading wireless channels,” IEEE Transactions on Information Theory, vol. 49, no. 3, pp. 707–720, 2003.
[23] A. Peled and A. Ruiz, “Frequency domain data transmission using reduced computational complexity algorithms,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’80), vol. 5, pp. 964–967, Denver, Colo, USA, April 1980.
[24] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Md, USA, 1996.
[25] A. Klein and P. W. Baier, “Linear unbiased data estimation in mobile radio systems applying CDMA,” IEEE Journal on Selected Areas in Communications, vol. 11, no. 7, pp. 1058–1066, 1993.
[26] A. Klein, G. K. Kaleh, and P. W. Baier, “Zero forcing and minimum mean-square-error equalization for multiuser detection in code-division multiple-access channels,” IEEE Transactions on Vehicular Technology, vol. 45, no. 2, pp. 276–287, 1996.
[27] A. Stamoulis, G. B. Giannakis, and A. Scaglione, “Block FIR decision-feedback equalizers for filterbank precoded transmissions with blind channel estimation capabilities,” IEEE Transactions on Communications, vol. 49, no. 1, pp. 69–83, 2001.
[28] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, 1998.
[29] G. G. Raleigh and J. M. Cioffi, “Spatio-temporal coding for wireless communication,” IEEE Transactions on Communications, vol. 46, no. 3, pp. 357–366, 1998.
[30] D. Gesbert, H. Bolcskei, D. Gore, and A. Paulraj, “MIMO wireless channels: capacity and performance prediction,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’00), vol. 2, pp. 1083–1088, San Francisco, Calif, USA, November–December 2000.
[31] G. J. Foschini, “Layered space-time architecture for wireless communication in a fading environment when using multiple antennas,” Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.
[32] A. Paulraj and T. Kailath, “Increasing capacity in wireless broadcast systems using distributed transmission/directional reception (DTDR),” US Patent 5345599, Stanford University, Stanford, Calif, USA, September 1994.
[33] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: performance criterion and code construction,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 744–765, 1998.
[34] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
[35] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block codes from orthogonal designs,” IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1456–1467, 1999.
[36] Z. Liu, G. B. Giannakis, B. Muquet, and S. Zhou, “Space-time coding for broadband wireless communications,” Wireless Communications and Mobile Computing, vol. 1, no. 1, pp. 35–53, 2001.

Frederik Petré was born in Tienen, Belgium, on December 12, 1974. He received the Electrical Engineering degree and the Ph.D. in applied sciences from the Katholieke Universiteit Leuven (KULeuven), Leuven, Belgium, in July 1997 and December 2003, respectively. In September 1997, he joined the Design Technology for Integrated Information and Communication Systems (DESICS) Division at the Interuniversity Micro-Electronics Center (IMEC) in Leuven, Belgium. Within the Digital Broadband Terminals (DBATE) Group of DESICS, he first performed predoctoral research on wireline transceiver design for twisted pair, coaxial cable, and powerline communications. During the fall of 1998, he visited the Information Systems Laboratory (ISL) at Stanford University, California, USA, working on OFDM-based powerline communications. In January 1999, he joined the Wireless Systems (WISE) group of DESICS as a Ph.D. researcher, funded by the Institute for Scientific and Technological Research in Flanders (IWT). Since January 2004, he has been a Senior Scientist within the Wireless Research group of DESICS. He is investigating baseband signal processing algorithms and architectures for future wireless communication systems, such as third generation (3G) and fourth generation (4G) cellular networks, and wireless local area networks (WLANs). His main research interests are modulation theory, multiple access schemes, channel estimation and equalization, and smart antenna and MIMO techniques. He is a Member of the ProRISC technical program committee and the IEEE Benelux Section on Communications and Vehicular Technology (CVT). He is a Member of the Executive Board and Project Leader of the Reconfigurable Radio Project of the Network of Excellence in Wireless Communications (NEWCOM), established under the sixth framework of the European Commission.

Geert Leus was born in Leuven, Belgium, in 1973. He received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Belgium, in June 1996 and May 2000, respectively. He was a Research Assistant and a Postdoctoral Fellow of the Fund for Scientific Research–Flanders, Belgium, from October 1996 until September 2003. During that period, Geert Leus was affiliated with the Electrical Engineering Department of the Katholieke Universiteit Leuven, Belgium. Currently, Geert Leus is an Assistant Professor at the Faculty of Electrical Engineering, Mathematics and Computer Science of the Delft University of Technology, The Netherlands. During the summer of 1998, he visited Stanford University, and from March 2001 until May 2002, he was a Visiting Researcher and Lecturer at the University of Minnesota. His research interests are in the area of signal processing for communications. Geert Leus received a 2002 IEEE Signal Processing Society Young Author Best Paper Award.
He is a Member of the IEEE Signal Processing for Communications Technical Committee, and an Associate Editor for the IEEE Transactions on Wireless Communications and the IEEE Signal Processing Letters.

Marc Moonen received the Electrical Engineering degree and the Ph.D. degree in applied sciences from the Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively. Since 2004, he has been a Full Professor at the Electrical Engineering Department of Katholieke Universiteit Leuven, where he currently heads a research team of 16 Ph.D. candidates and postdocs, working in the area of signal processing for digital communications, wireless communications, DSL, and audio signal processing. He received the 1994 KU Leuven Research Council Award and the 1997 Alcatel Bell (Belgium) Award (with Piet Vandaele), and was a 1997 “Laureate of the Belgium Royal Academy of Science.” He was the Chairman of the IEEE Benelux Signal Processing Chapter (1998–2002), and has been a EURASIP AdCom Member (European Association for Signal, Speech and Image Processing) since 2000. He has been Editor-in-Chief of the “EURASIP Journal on Applied Signal Processing” since 2003, and a Member of the Editorial Board of “Integration, the VLSI Journal,” “IEEE Transactions on Circuits and Systems II” (2002-2003), “EURASIP Journal on Wireless Communications and Networking,” and “IEEE Signal Processing Magazine.”

Hugo De Man has been Professor of electrical engineering at the Katholieke Universiteit Leuven, Belgium, since 1976. He was Visiting Associate Professor at UC Berkeley in 1975, teaching semiconductor physics and VLSI design. His early research was devoted to the development of mixed-signal, switched capacitor, and DSP simulation tools as well as new topologies for high-speed CMOS circuits, which led to the invention of NORA CMOS. In 1984, he was one of the cofounders of IMEC (Interuniversity Microelectronics Center), which today is the largest independent semiconductor research institute in Europe, with over 1100 employees. From 1984 to 1995, he was Vice President of IMEC, responsible for research in design technology for DSP and telecom applications. In 1995, he became a Senior Research Fellow of IMEC, working on strategies for education and research on the design of future post-PC systems. His research at IMEC has led to many novel tools and methods in the area of high-level synthesis, hardware-software codesign, and C++ based design. Many of these tools are now commercialized by spin-off companies like Coware, Adelante Techn, and Target Compilers. His work and teaching also resulted in a cluster of DSP-oriented companies in Leuven, now known as DSP Valley, where more than 1500 DSP engineers work on design tools and on telecom, networking, and multimedia integrated system products. In 1999, he received the Technical Achievement Award of the IEEE Signal Processing Society, the Phil Kaufman Award of the EDA Consortium, and the Golden Jubilee Medal of the IEEE Circuits and Systems Society. Hugo De Man is an IEEE Fellow and a Member of the Royal Academy of Sciences in Belgium.

EURASIP Journal on Applied Signal Processing 2004:10, 1585–1594
© 2004 Hindawi Publishing Corporation

Bit Error Rate Analysis for MC-CDMA Systems in Nakagami-m Fading Channels

Zexian Li
Centre for Wireless Communications (CWC), University of Oulu, 90014 Oulu, Finland

Matti Latva-aho
Centre for Wireless Communications (CWC), University of Oulu, 90014 Oulu, Finland

Received 24 February 2003; Revised 22 September 2003

Multicarrier code division multiple access (MC-CDMA) is a promising technique that combines orthogonal frequency division multiplexing (OFDM) with CDMA. In this paper, based on an alternative expression for the Q-function, the characteristic function, and a Gaussian approximation, we present a new practical technique for determining the bit error rate (BER) of multiuser MC-CDMA systems in frequency-selective Nakagami-m fading channels. The results are applicable to systems employing coherent demodulation with maximal ratio combining (MRC) or equal gain combining (EGC). The analysis assumes that different subcarriers experience independent fading channels, which are not necessarily identically distributed. The final average BER is expressed as a single finite-range integral whose integrand is composed of tabulated functions that can easily be computed numerically. The accuracy of the proposed approach is demonstrated with computer simulations.

Keywords and phrases: multicarrier CDMA, bit error rate, Nakagami fading channel, spread-spectrum communications.

1. INTRODUCTION

Multicarrier code division multiple access (MC-CDMA), which efficiently combines CDMA with orthogonal frequency division multiplexing (OFDM), has gained considerable attention as a promising multiple access technique for future mobile communications [1, 2, 3, 4, 5, 6, 7, 8]. MC-CDMA is a spread-spectrum technique in which the signal is spread in the frequency domain. Since it combines the advantages of OFDM and CDMA, MC-CDMA has properties desirable for future systems, such as robustness to frequency-selective fading, frequency diversity, and the capability of supporting multirate services through either multicode or variable spreading factor techniques [1].

Many papers have been dedicated to the bit error rate (BER) analysis of MC-CDMA [3, 4, 5, 6, 7]. The performance of MC-CDMA has been studied for both the uplink and the downlink of a mobile communication system [3], assuming perfect time synchronization among users. To obtain the BER, that paper employed three approximations for the distribution of the sum of independent and identically distributed (i.i.d.) Rayleigh random variables (r.v.'s): the law of large numbers (LLN) approximation, the small-parameter approximation, and the central limit theorem (CLT) approximation. The authors of [5] analyzed the BER performance of MC-CDMA systems with a frequency offset, using the CLT approximation. A performance analysis, based on the LLN approximation, of an MC-CDMA system employing an antenna array at the base station was presented in [6]. The bit error probability in multipath channels was analyzed in [7] based on the CLT approximation. It is well known that such approximations are not always accurate in practice, so the approximation method has to be chosen according to the system parameters and/or the operating environment.

For a maximal ratio combining (MRC) receiver operating in a Rayleigh fading channel, the sum of exponentially distributed r.v.'s is known to be gamma distributed, from which the exact average probability of error can be obtained. For an equal gain combining (EGC) receiver, however, finding the distribution of the sum of independent Rayleigh r.v.'s is more problematic. In [9], Beaulieu offered an infinite series representation of this sum. Using the characteristic functions of the decision variables, the authors of [10] studied the performance of MC-CDMA with an EGC receiver in Rayleigh fading channels. Nakagami fading channels have received considerable attention in the study of various aspects of wireless systems [11, 12].

The Nakagami distribution provides a more general and versatile way to model wireless channels [13]. In [8], we investigated the BER performance of MC-CDMA with an EGC receiver in Nakagami-m fading channels with the same fading parameter m on all subcarriers. Although the fading correlation among subcarriers usually cannot be ignored, it can be reduced with a properly designed frequency interleaver. Furthermore, the BER performance of MC-CDMA with independent fading channels provides a helpful benchmark for system design. Motivated by this, the objective of this paper is to present an alternative Gaussian approximation (AGA) approach for deriving the BER of MC-CDMA with both MRC and EGC in Nakagami-m fading channels, where the fading on different subcarriers is assumed to be independent. By using an alternative expression for the Gaussian Q(·) function and the characteristic function of Nakagami-m variables, the average BER of an MC-CDMA system can be found.

The rest of this paper is organized as follows. Section 2 describes the MC-CDMA system model. The performance analysis for both MRC and EGC is carried out in Section 3. Section 4 compares computer simulation results with analytical results. Finally, Section 5 draws the conclusions.

2. SYSTEM MODEL

In this section, the model of an MC-CDMA system is described. We assume that there are K simultaneous users, each having N subcarriers. The block diagrams of the considered MC-CDMA transmitter and receiver with one-tap frequency domain equalizers in the uplink are depicted in Figure 1.

Figure 1: Block diagram for the MC-CDMA (a) transmitter and (b) receiver (GI: guard interval). [Block labels: (a) user data, S/P, spreading code, cos(ω_0 t), …, cos(ω_{N−1} t), GI, channel; (b) received data, remove GI, S/P, cos(ω_0 t), …, cos(ω_{N−1} t), frequency domain equalizer, spreading code, P/S, user data.]

2.1. Transmitter

The transmitted signal $S_k(t)$ corresponding to the block of M data bits of the kth user is

$$ S_k(t) = \sqrt{\frac{2P}{N}} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} c_k[n]\, b_k[m] \cos(\omega_n t)\, p_{T_b}(t - mT_b), \qquad (1) $$

where P is the power of a data bit, M is the packet size, $\{c_k[n]\}$ represents the signature sequence of the kth user, $p_{T_b}(t)$ is the rectangular pulse defined over the bit duration $[0, T_b]$, and $b_k[m]$ represents the mth input data bit of user k; the data bits are independent equiprobable r.v.'s taking values in $\{\pm 1\}$, with $E[b_k[m]] = 0$ and $E[|b_k[m]|^2] = 1$. $\omega_n$ is the angular frequency of the nth subcarrier.

2.2. Channel model

Independent, frequency-selective Nakagami-m fading channels are considered for each user. With a proper selection of the number of subcarriers for a user, it is reasonable to assume that each subcarrier undergoes independent frequency-nonselective Nakagami fading. Therefore, the equivalent time-variant complex fading channel for the kth user and nth subcarrier can be represented as

$$ H_{k,n}(t;\tau) = \beta_{k,n}(t)\, e^{j\theta_{k,n}(t)}\, \delta(\tau - \tau_k), \qquad (2) $$

where $\tau_k$ is the propagation delay of the kth user and $\delta(\cdot)$ is the Dirac delta function. The amplitudes $\{\beta_{k,n}(t)\}$ are independent Nakagami-m r.v.'s, and the phase offsets $\{\theta_{k,n}(t)\}$ are i.i.d. r.v.'s uniformly distributed over $[0, 2\pi)$. The fading amplitude $\beta_{k,n}$ is characterized by the Nakagami-m distribution [13]

$$ p(\beta_{k,n}) = \frac{2\, m_{k,n}^{m_{k,n}}\, \beta_{k,n}^{2m_{k,n}-1}}{\Omega_{k,n}^{m_{k,n}}\, \Gamma(m_{k,n})} \exp\!\left(-\frac{m_{k,n}\,\beta_{k,n}^2}{\Omega_{k,n}}\right), \qquad (3) $$

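with the parameters $m_{k,n} = \Omega_{k,n}^2 / E[(\beta_{k,n}^2 - \Omega_{k,n})^2] \ge 0.5$ and $\Omega_{k,n} = E[\beta_{k,n}^2]$, where $E[\cdot]$ denotes the expectation operator and $\Gamma(\cdot)$ is the Gamma function. The Nakagami assumption on the amplitude implies that $\gamma_{k,n} = \beta_{k,n}^2 (E_b/N_0)$, with $E_b = PT_b$ the energy per bit, follows the well-known gamma distribution

$$ p(\gamma_{k,n}) = \frac{m_{k,n}^{m_{k,n}}\, \gamma_{k,n}^{m_{k,n}-1}}{\bar\gamma_{k,n}^{m_{k,n}}\, \Gamma(m_{k,n})} \exp\!\left(-\frac{m_{k,n}\,\gamma_{k,n}}{\bar\gamma_{k,n}}\right), \qquad (4) $$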

where $\bar\gamma_{k,n} = (E_b/N_0)\,\Omega_{k,n}$ is the average signal-to-noise ratio (SNR) per symbol. For the downlink, $H_{k,n}$ is the same for all users $k = 0, 1, \ldots, K-1$ at a given reception point.

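The Nakagami-m law (3) and the gamma law (4) are linked through $\gamma_{k,n} = \beta_{k,n}^2 E_b/N_0$: if $\beta^2$ is gamma distributed with shape m and scale Ω/m, then β is Nakagami-m with $E[\beta^2] = \Omega$. The NumPy sketch below is not taken from the paper; it uses this relation to draw fading amplitudes and checks the moment definitions of Ω and m given with (3). The values m = 2, Ω = 1, and Eb/N0 = 5 dB are arbitrary illustrative assumptions, and `nakagami_amplitudes` is a hypothetical helper name.

```python
import numpy as np

def nakagami_amplitudes(m, omega, size, rng):
    """Draw Nakagami-m fading amplitudes beta with E[beta^2] = omega.

    beta^2 is gamma distributed with shape m and scale omega/m,
    so beta is obtained as the square root of a gamma sample.
    """
    power = rng.gamma(shape=m, scale=omega / m, size=size)
    return np.sqrt(power)

rng = np.random.default_rng(0)
m, omega = 2.0, 1.0                      # illustrative fading parameters
beta = nakagami_amplitudes(m, omega, 200_000, rng)

power = beta**2
omega_hat = power.mean()                 # Omega = E[beta^2]
m_hat = omega_hat**2 / power.var()       # m = Omega^2 / E[(beta^2 - Omega)^2]

ebn0 = 10 ** (5 / 10)                    # assumed Eb/N0 of 5 dB (linear scale)
snr = power * ebn0                       # gamma_{k,n} = beta^2 * Eb/N0, cf. (4)
print(f"Omega ~ {omega_hat:.3f}, m ~ {m_hat:.2f}")
print(f"mean SNR ~ {snr.mean():.3f}, expected {omega * ebn0:.3f}")
```

Sampling $\beta^2$ directly from the gamma law avoids inverting the Nakagami CDF, and the sample mean of `snr` estimates the average SNR $\bar\gamma_{k,n}$ defined above.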
2.3. Receiver

The received signal r(t) can be written as

$$ r(t) = \sqrt{\frac{2P}{N}} \sum_{k=0}^{K-1} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \beta_{k,n}(t)\, c_k[n]\, b_k[m] \cos\!\big(\omega_n t + \theta_{k,n}(t)\big)\, p_{T_b}(t - mT_b - \tau_k) + n(t), \qquad (5) $$

where n(t) is additive white Gaussian noise (AWGN) with a double-sided power spectral density of $N_0/2$. An equalizer in the frequency domain or the time domain is needed to improve the performance of the system; it multiplies each subcarrier by the factor $G_{k,n}(m)$ in the mth bit interval [14]. Without loss of generality, we consider the signal of the first user as the desired signal. With coherent demodulation, the decision variable $v_0$ for the mth data bit of the first user is given by

$$ v_0 = \frac{1}{T_b} \int_{mT_b}^{(m+1)T_b} r(t) \sum_{n=0}^{N-1} c_0[n]\, G_{0,n}(m) \cos\!\big(\omega_n t + \theta_{0,n}(t)\big)\, dt, \qquad (6) $$

where it has been assumed that one data bit occupies all subcarriers¹ and that the receiver is synchronized with the desired user (k = 0). The channel fading and phase shift variables are assumed to be constant over the interval $[mT_b, (m+1)T_b]$ and are denoted by $\beta_{k,n}(m)$ and $\theta_{k,n}(m)$. In this paper, we concentrate on two commonly used and effective combining methods: MRC ($G_{0,n}(m) = \beta_{0,n}(m)$) and EGC ($G_{0,n}(m) = 1$) [14, 15]. For brevity, the time index m is omitted in the following.

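The decision variable (6) can be illustrated with a discrete complex-baseband model in which each subcarrier contributes one sample per bit interval. The sketch below is a simplification under stated assumptions, not the authors' simulator: users are perfectly synchronous (τ_k = 0), the guard interval and pulse shaping are ignored, WH codes come from the Sylvester construction, and `walsh_hadamard` and `decision_variable` are hypothetical helper names. It removes the phase of user 0 and applies the one-tap gain $G_{0,n}$ ($\beta_{0,n}$ for MRC, 1 for EGC), which is the role of coherent demodulation in (6).

```python
import numpy as np

def walsh_hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def decision_variable(bits, codes, beta, theta, noise_std, combining, rng):
    """Complex-baseband sketch of (5)-(6) for one bit interval of user 0.

    bits:        (K,) data bits b_k in {-1, +1}
    codes:       (K, N) signature chips c_k[n]
    beta, theta: (K, N) fading amplitudes and phases per user and subcarrier
    combining:   "MRC" (G = beta_0) or "EGC" (G = 1)
    """
    n_sub = codes.shape[1]
    h = beta * np.exp(1j * theta)                          # H_{k,n}, cf. (2)
    noise = noise_std * (rng.standard_normal(n_sub)
                         + 1j * rng.standard_normal(n_sub)) / np.sqrt(2)
    r = np.sum(h * codes * bits[:, None], axis=0) + noise  # one sample per subcarrier
    gain = beta[0] if combining == "MRC" else np.ones(n_sub)
    # Coherent demodulation: de-rotate by user 0's phase, weight by G_{0,n},
    # despread with c_0[n], and sum over the subcarriers.
    return np.real(np.sum(codes[0] * gain * np.exp(-1j * theta[0]) * r))

rng = np.random.default_rng(1)
K, N = 4, 8
codes = walsh_hadamard(N)[:K]
bits = rng.choice([-1, 1], size=K)
beta = np.sqrt(rng.gamma(1.0, 1.0, size=(K, N)))          # Rayleigh: m = 1, Omega = 1
theta = rng.uniform(0.0, 2.0 * np.pi, size=(K, N))
for scheme in ("MRC", "EGC"):
    v0 = decision_variable(bits, codes, beta, theta, 0.5, scheme, rng)
    print(scheme, "decision:", int(np.sign(v0)), "sent:", bits[0])
```

With a single user and no noise, the MRC output reduces to $b_0\sum_n\beta_{0,n}^2$ and the EGC output to $b_0\sum_n\beta_{0,n}$, which mirror the desired-signal terms of the MRC and EGC analyses up to the constant $\sqrt{P/2N}$. With several users, the independent fading on each user's subcarriers destroys the orthogonality of the WH codes, which is the source of the MAI term.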
3. PERFORMANCE ANALYSIS

An alternative representation of the Q-function was presented in [16] and leads to a convenient method for performance analysis. By applying the Q-function

$$ Q(x) = \frac{1}{\pi} \int_0^{\pi/2} \exp\!\left(-\frac{x^2}{2\sin^2\theta}\right) d\theta, \qquad x \ge 0, \qquad (7) $$

and the characteristic function of Nakagami-m fading r.v.'s, the bit error probability of an MC-CDMA system can be evaluated. To be more general, the uplink direction is considered. For simplicity, it is assumed that different subcarriers experience i.i.d. fading, although identically distributed fading is not necessary for the analysis. Assuming that the users are time synchronous, after demodulation and combining of the subcarrier signals, the decision variable in (6) can be written as

$$ v_0 = S + I + \eta, \qquad (8) $$

where S represents the desired signal term, I is the multiple access interference (MAI) from other users, and η is the AWGN term.

3.1. Performance of MRC

With $G_{0,n} = \beta_{0,n}$, (6) and (8) give the desired signal

$$ S = \sqrt{\frac{P}{2N}} \sum_{n=0}^{N-1} b_0\, \beta_{0,n}^2, \qquad (9) $$

η is a Gaussian random variable with zero mean and variance $\sigma_\eta^2 = (N_0/4T_b)\sum_{n=0}^{N-1}\beta_{0,n}^2$. The MAI term I can be expressed as

$$ I = \sqrt{\frac{P}{2N}} \sum_{k=1}^{K-1} \sum_{n=0}^{N-1} b_k\, c_k[n]\, c_0[n]\, \beta_{k,n}\, \beta_{0,n} \cos\tilde\theta_{k,n}, \qquad (10) $$

where $\tilde\theta_{k,n} = \theta_{0,n} - \theta_{k,n}$. Since $\theta_{0,n}$ and $\theta_{k,n}$ are i.i.d. r.v.'s uniformly distributed over $[0, 2\pi)$, the probability density function of $\tilde\theta_{k,n}$ can be easily obtained [17], and $E[\cos\tilde\theta_{k,n}] = 0$. Since $\beta_{k,n}$ and $\theta_{k,n}$ (k = 1, 2, …, K − 1; n = 0, 1, …, N − 1) are i.i.d. r.v.'s, all $(K-1)\times N$ terms in the summation of (10) are uncorrelated with zero mean. Assuming that there is no near-far problem, the MAI can be approximated by a conditional Gaussian random variable with zero mean and variance

$$ \sigma_I^2 = E[I^2] = \frac{P}{2N}\,(K-1)\, E[\beta_{k,n}^2]\, E[\cos^2\tilde\theta_{k,n}] \sum_{n=0}^{N-1} \beta_{0,n}^2, \qquad (11) $$

where $E[\cos^2\tilde\theta_{k,n}] = 1/2$. Hence, $v_0$ is a Gaussian variable conditioned on $\{\beta_{0,n}\}$. Since η and I are mutually independent, the probability of error for BPSK modulation conditioned on $\{\beta_{0,n}\}$ is simply given by [18]

$$ \Pr(\text{error} \mid \beta_{0,n}) = Q\!\left(\sqrt{\frac{S^2}{\sigma_\eta^2 + \sigma_I^2}}\right). \qquad (12) $$

To compute the average BER, we must statistically average (12) over the joint probability density function $p_\beta(\beta_{0,0}, \ldots, \beta_{0,N-1})$ of the fading amplitudes. Using the alternative Q-function (7) and the assumption of independent fading on different subcarriers, the average BER can be expressed as

$$ P_e = \int_0^\infty \!\cdots\! \int_0^\infty \frac{1}{\pi}\int_0^{\pi/2} \exp\!\left(-\frac{S^2/(\sigma_\eta^2+\sigma_I^2)}{2\sin^2\phi}\right) p_{\beta_{0,0}}(\beta_{0,0}) \cdots p_{\beta_{0,N-1}}(\beta_{0,N-1})\, d\beta_{0,0}\cdots d\beta_{0,N-1}\, d\phi = \frac{1}{\pi}\int_0^{\pi/2} \prod_{n=0}^{N-1} I_{0,n}\big(\overline{\mathrm{SINR}}_{0,n}, \phi\big)\, d\phi, \qquad (13) $$

where $\overline{\mathrm{SINR}}_{0,n}$ is the average signal-to-interference-plus-noise ratio (SINR) on the nth subcarrier of the first user, and the following relation, obtained from (9) and (11) with unit mean interference power ($E[\beta_{k,n}^2] = 1$), has been used:

$$ \frac{S^2}{\sigma_\eta^2 + \sigma_I^2} = \sum_{n=0}^{N-1} \frac{\gamma_{0,n}}{N/2 + (E_b/N_0)(K-1)/2}. \qquad (14) $$

¹ Higher data rates can be obtained by using a small spreading factor (SF), that is, by letting different data bits use different subcarriers. For SF = 1, the system becomes OFDM.

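The practical value of (7) is that the argument x appears only inside an exponential over a finite range, so averaging over fading reduces to averaging an exponential, which is what (13) exploits. The short standard-library check below compares (7), evaluated with a simple midpoint rule, against the usual form $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the helper names `q_erfc` and `q_craig` are our own.

```python
import math

def q_erfc(x):
    """Gaussian Q-function via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_craig(x, steps=2000):
    """Finite-range form (7) of Q(x), x >= 0, integrated with the midpoint rule."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.exp(-x * x / (2.0 * math.sin(theta) ** 2))
    return total * h / math.pi

for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x={x:.1f}  erfc form={q_erfc(x):.6e}  form (7)={q_craig(x):.6e}")
```

The midpoint rule is a reasonable choice here because the integrand is smooth on $[0, \pi/2]$, symmetric about $\pi/2$, and decays to zero at the origin for x > 0.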
By using (4) and $\int_0^\infty x^{v} \exp(-ax)\, dx = \Gamma(v+1)/a^{v+1}$ (a > 0, v > −1) [19], $I_{0,n}$ can be expressed as

$$ I_{0,n} = \int_0^\infty \exp\!\left(-\frac{\gamma_{0,n}}{2\sin^2\phi\,\big(N/2 + (E_b/N_0)(K-1)/2\big)}\right) p(\gamma_{0,n})\, d\gamma_{0,n} = \left(1 + \frac{\overline{\mathrm{SINR}}_{0,n}}{m_{0,n}\sin^2\phi}\right)^{-m_{0,n}}. \qquad (15) $$

The average $\overline{\mathrm{SINR}}_{0,n}$ on the nth subcarrier of the first user can be obtained as

$$ \overline{\mathrm{SINR}}_{0,n} = \frac{(P/N)\,E[\beta_{0,n}^2]}{N_0 + \sum_{k=1}^{K-1} (P/N)\,E[\beta_{k,n}^2]}. \qquad (16) $$

If all N subcarriers are identically distributed with the same average SINR per bit, then (13) simplifies further to

$$ P_e = \frac{1}{\pi}\int_0^{\pi/2} \Big[I_{0,n}\big(\overline{\mathrm{SINR}}_{0,n}, \phi\big)\Big]^N d\phi. \qquad (17) $$

Since a multiuser system is considered in this paper, the average BER of the system is given by

$$ \mathrm{BER} = \frac{1}{K}\sum_{k=0}^{K-1} P_e(k). \qquad (18) $$

Using (13)–(18), the average BER of the MC-CDMA system with MRC is obtained in the simple form of a single integral with finite limits and an integrand composed of elementary functions.

3.2. Performance of EGC

EGC is of interest because it alleviates the MAI enhancement caused by MRC. As before, the decision on the mth data bit of the first user is analyzed. Similar to MRC, the conditional BER of the system with EGC can be obtained as

$$ \Pr(\text{error} \mid \beta_{0,n}) = Q\!\left(\sqrt{\frac{S^2}{\sigma_\eta^2 + \sigma_I^2}}\right), \qquad (19) $$

where the expressions for S, η, and I differ from those of the MRC receiver and can be derived from (6) and (8). With perfect channel estimation, the desired signal can be expressed as

$$ S = \sqrt{\frac{P}{2N}} \sum_{n=0}^{N-1} b_0\, \beta_{0,n}, \qquad (20) $$

η is a Gaussian random variable with zero mean and variance $\sigma_\eta^2 = N N_0/4T_b$. The MAI term I can be written in the form

$$ I = \sqrt{\frac{P}{2N}} \sum_{k=1}^{K-1}\sum_{n=0}^{N-1} b_k\, c_k[n]\, c_0[n]\, \beta_{k,n} \cos\tilde\theta_{k,n}, \qquad (21) $$

where $\tilde\theta_{k,n}$ has the same meaning as in (10). The term I can be approximated by a Gaussian random variable with zero mean and variance

$$ \sigma_I^2 = E[I^2] = \frac{P}{2}\,(K-1)\,E[\beta_{k,n}^2]\,E[\cos^2\tilde\theta_{k,n}]. \qquad (22) $$

Using the alternative representation of the Q-function (7), the average BER can be expressed as

$$ P_e = \int_0^\infty \!\cdots\! \int_0^\infty \frac{1}{\pi}\int_0^{\pi/2} \exp\!\left(-\frac{(P/2N)/(\sigma_\eta^2+\sigma_I^2)}{2\sin^2\phi}\Big(\sum_{n=0}^{N-1}\beta_{0,n}\Big)^2\right) p_{\beta_{0,0}}(\beta_{0,0})\cdots p_{\beta_{0,N-1}}(\beta_{0,N-1})\, d\phi\, d\beta_{0,0}\cdots d\beta_{0,N-1}. \qquad (23) $$

We extend the technique of [20] to an MC-CDMA system with multiple users. By changing variables, (23) becomes

$$ P_e = \frac{1}{\pi}\int_0^{\pi/2}\int_0^\infty \exp\!\left(-\frac{A^2\lambda^2}{2\sin^2\phi}\right) p_\lambda(\lambda)\, d\lambda\, d\phi, \qquad (24) $$

where

$$ A = \sqrt{\frac{P}{2N(\sigma_\eta^2 + \sigma_I^2)}}, \qquad \lambda = \sum_{n=0}^{N-1}\beta_{0,n}, \qquad (25) $$

and λ denotes the sum of the fading amplitudes after combining. Next, by the definition of the characteristic function, $p_\lambda(\lambda)$ can be obtained by employing the characteristic function of the Nakagami-m fading channel:

$$ p_\lambda(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \psi_\lambda(jv)\, e^{-jv\lambda}\, dv. \qquad (26) $$

Since the fading on different subcarriers is assumed to be mutually independent, the characteristic function of λ simply equals the product of the characteristic functions of the individual components:

$$ \psi_\lambda(jv) = \prod_{n=0}^{N-1} \psi_{\beta_{0,n}}(jv). \qquad (27) $$

Thus (26) takes the form

$$ p_\lambda(\lambda) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Big[\prod_{n=0}^{N-1}\psi_{\beta_{0,n}}(jv)\Big] e^{-jv\lambda}\, dv. \qquad (28) $$

By combining (28) and (24), we get

$$ P_e = \frac{1}{2\pi^2}\int_0^{\pi/2}\int_{-\infty}^{\infty} \Big[\prod_{n=0}^{N-1}\psi_{\beta_{0,n}}(jv)\Big] \underbrace{\int_0^\infty \exp\!\left(-\frac{A^2}{2\sin^2\phi}\lambda^2 - jv\lambda\right) d\lambda}_{J(v,\phi)}\; dv\, d\phi. \qquad (29) $$

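For i.i.d. subcarriers, (15) and (17) reduce the MRC analysis to a one-dimensional integral over φ. The sketch below evaluates it with a midpoint rule. It assumes unit mean power for every user and subcarrier (Ω = 1 and $E[\beta_{k,n}^2] = 1$, matching the equal-power assumption of Section 4), so that, consistently with (14) and (15), the per-subcarrier average SINR is $(E_b/N_0)/\big(N + (K-1)E_b/N_0\big)$. The function name `mrc_ber` is our own.

```python
import math

def mrc_ber(ebn0_db, n_sub, n_users, m, steps=4000):
    """Average uplink MRC BER from (15) and (17) for i.i.d. Nakagami-m subcarriers.

    Assumes Omega = 1 for all users/subcarriers, so the per-subcarrier average
    SINR is (Eb/N0) / (N + (K - 1) Eb/N0), consistent with (14)-(15).
    """
    ebn0 = 10 ** (ebn0_db / 10)
    sinr = ebn0 / (n_sub + (n_users - 1) * ebn0)
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        phi = (i + 0.5) * h
        # [I_{0,n}]^N with I_{0,n} = (1 + SINR / (m sin^2 phi))^(-m), cf. (15), (17)
        total += (1.0 + sinr / (m * math.sin(phi) ** 2)) ** (-m * n_sub)
    return total * h / math.pi

for users in (1, 2, 8):
    print(users, "users:", f"{mrc_ber(10.0, 8, users, 1.0):.3e}")
```

For a single user with N = 1 and m = 1, the result should match the familiar Rayleigh BPSK expression $\tfrac{1}{2}\big(1 - \sqrt{\bar\gamma/(1+\bar\gamma)}\big)$, which makes a convenient sanity check. With equal-power users, every $P_e(k)$ is the same, so the system BER (18) equals this single value.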
The integral J(v, φ) can be obtained as [16]

$$ J(v,\phi) = \big[X(\phi) + jY(v,\phi)\big] \exp\!\left(-\frac{\sin^2\phi}{2A^2}v^2\right) = \sqrt{X^2(\phi) + Y^2(v,\phi)}\; \exp\!\left(j\arctan\frac{Y(v,\phi)}{X(\phi)}\right) \exp\!\left(-\frac{\sin^2\phi}{2A^2}v^2\right), \qquad (30) $$

where ${}_1F_1(\cdot;\cdot;\cdot)$ is the Kummer confluent hypergeometric function [21] and

$$ X(\phi) = \sqrt{\frac{\pi}{2}}\,\frac{\sin\phi}{A}, \qquad Y(v,\phi) = -{}_1F_1\!\left(\frac{1}{2}; \frac{3}{2}; \frac{\sin^2\phi}{2A^2}v^2\right)\frac{\sin^2\phi}{A^2}\,v. \qquad (31) $$

Generally speaking, the characteristic function of a random variable is complex, and hence the product of characteristic functions in (27) is also complex. However, since the average BER is real, it is sufficient to consider only the real part of the right side of (29), which yields

$$ P_e = \frac{1}{2\pi^2}\int_0^{\pi/2}\int_{-\infty}^{\infty} \Re\left\{\prod_{n=0}^{N-1}\psi_{\beta_{0,n}}(jv)\, J(v,\phi)\right\} dv\, d\phi, \qquad (32) $$

where $\Re\{\cdot\}$ denotes the real part. Next, we elaborate the characteristic function of the Nakagami-m fading channel. By definition, the characteristic function of $\beta_{0,n}$ is $\psi_{\beta_{0,n}}(jv) = E[\exp(j\beta_{0,n}v)]$. It can be expressed as [20]

$$ \psi_\lambda(jv) = \prod_{n=0}^{N-1}\psi_{\beta_{0,n}}(jv) = \exp\!\left(-\sum_{n=0}^{N-1}\frac{\Omega_{0,n}}{4m_{0,n}}v^2\right)\prod_{n=0}^{N-1}\sqrt{U_{0,n}^2(v) + V_{0,n}^2(v)}\; \exp\!\left(j\arctan\frac{V_{0,n}(v)}{U_{0,n}(v)}\right), \qquad (33) $$

in which

$$ U_{0,n}(v) = {}_1F_1\!\left(\frac{1}{2} - m_{0,n}; \frac{1}{2}; \frac{\Omega_{0,n}}{4m_{0,n}}v^2\right), \qquad V_{0,n}(v) = \frac{\Gamma(m_{0,n} + 1/2)}{\Gamma(m_{0,n})}\sqrt{\frac{\Omega_{0,n}}{m_{0,n}}}\; v\; {}_1F_1\!\left(1 - m_{0,n}; \frac{3}{2}; \frac{\Omega_{0,n}}{4m_{0,n}}v^2\right). \qquad (34) $$

Substituting (33) and (30) into (32) gives

$$ P_e = \frac{1}{2\pi^2}\int_0^{\pi/2}\int_{-\infty}^{\infty} \exp\!\left[-\left(\frac{\sin^2\phi}{2A^2} + \sum_{n=0}^{N-1}\frac{\Omega_{0,n}}{4m_{0,n}}\right)v^2\right] W(v,\phi)\, dv\, d\phi, \qquad (35) $$

where

$$ W(v,\phi) = R(v,\phi)\cos\Theta(v,\phi), \quad R(v,\phi) = \sqrt{X^2(\phi) + Y^2(v,\phi)}\,\prod_{n=0}^{N-1}\sqrt{U_{0,n}^2(v) + V_{0,n}^2(v)}, \quad \Theta(v,\phi) = \arctan\frac{Y(v,\phi)}{X(\phi)} + \sum_{n=0}^{N-1}\arctan\frac{V_{0,n}(v)}{U_{0,n}(v)}. \qquad (36) $$

Finally, letting $\eta(\phi) = \sqrt{\sin^2\phi/2A^2 + \sum_{n=0}^{N-1}\Omega_{0,n}/4m_{0,n}}$ and changing the variable as $x = \eta(\phi)v$, the inner infinite integral can be written as

$$ \int_{-\infty}^{\infty} \exp\!\big(-\eta^2(\phi)v^2\big)\, W(v,\phi)\, dv = \frac{1}{\eta(\phi)}\int_{-\infty}^{\infty} W\!\left(\frac{x}{\eta(\phi)}, \phi\right)\exp(-x^2)\, dx, \qquad (37) $$

which can be readily evaluated by the Gauss–Hermite quadrature formula [21, 22],

$$ \int_{-\infty}^{\infty} W\!\left(\frac{x}{\eta(\phi)}, \phi\right)\exp(-x^2)\, dx = \sum_{i=1}^{N_p} H_{x_i}\, W\!\left(\frac{x_i}{\eta(\phi)}, \phi\right) + R_{N_p}, \qquad (38) $$

where $N_p$ is the order of the Hermite polynomial $H_{N_p}(\cdot)$ and $x_i$ is the ith zero of the $N_p$-order Hermite polynomial. $H_{x_i}$ are the weight factors of the $N_p$-order Hermite polynomial and are given by

$$ H_{x_i} = \frac{2^{N_p-1}\, N_p!\, \sqrt{\pi}}{N_p^2\, \big[H_{N_p-1}(x_i)\big]^2}. \qquad (39) $$

The remainder of (38) is

$$ R_{N_p} = \frac{N_p!\,\sqrt{\pi}}{2^{N_p}\,(2N_p)!}\; \frac{\partial^{2N_p}}{\partial\xi^{2N_p}} W\!\left(\frac{\xi}{\eta(\phi)}, \phi\right) \qquad (-\infty < \xi < \infty). \qquad (40) $$

The order $N_p$ can be selected by trading off complexity against accuracy. Because of the symmetry of the Hermite polynomials about the origin, the nonzero roots occur in pairs $\pm x_i$, and the corresponding weights obey the symmetry relation $H_{x_i} = H_{x_{N_p+1-i}}$. Both the zeros and the weight factors of the $N_p$-order Hermite polynomial are tabulated in [21, 22] for various polynomial orders $N_p$. This yields the final result in the form of a single finite integral over φ, namely,

$$ P_e = \frac{1}{2\pi^2}\int_0^{\pi/2}\frac{1}{\eta(\phi)}\sum_{i=1}^{N_p} H_{x_i}\, W\!\left(\frac{x_i}{\eta(\phi)}, \phi\right) d\phi. \qquad (41) $$

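The change of variable in (37) and the weight formula (39) can be checked with NumPy, whose `numpy.polynomial.hermite.hermgauss` returns Gauss–Hermite nodes and weights for the weight function $e^{-x^2}$ (physicists' Hermite polynomials, as in (39)). The sketch below recomputes the weights from (39), and applies the scaled rule of (37)–(38) to a test integrand with a known closed form. The test function cos v, the value η = 0.7, and the helper names are arbitrary illustrative choices, not part of the paper.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def weights_from_39(n_p):
    """Gauss-Hermite weights computed from (39) at the zeros x_i of H_{Np}."""
    x, _ = hermgauss(n_p)
    coeffs = np.zeros(n_p)
    coeffs[-1] = 1.0                      # coefficient vector selecting H_{Np-1}
    h_prev = hermval(x, coeffs)
    w = 2.0 ** (n_p - 1) * math.factorial(n_p) * math.sqrt(math.pi) / (n_p**2 * h_prev**2)
    return x, w

def scaled_gauss_hermite(g, eta, n_p=20):
    """Approximate the integral of g(v) * exp(-eta^2 v^2) over the real line
    using the substitution x = eta * v of (37) and the rule (38)."""
    x, w = hermgauss(n_p)
    return np.sum(w * g(x / eta)) / eta

x, w39 = weights_from_39(12)
_, w_np = hermgauss(12)
print("max weight mismatch:", np.max(np.abs(w39 - w_np)))

eta = 0.7
approx = scaled_gauss_hermite(np.cos, eta)
exact = math.sqrt(math.pi) / eta * math.exp(-1.0 / (4.0 * eta**2))
print(f"quadrature {approx:.12f}  closed form {exact:.12f}")
```

In (41) the same rule is applied with $g(v) = W(v, \phi)$ for each φ, after which only the outer finite integral over φ remains to be computed numerically.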
The average BER of a system with multiple users is obtained by averaging $P_e$ in (41) over the individual users, as in (18).

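Equations (33)–(34) give each Nakagami-m characteristic function as $\psi_{\beta_{0,n}}(jv) = e^{-\Omega_{0,n}v^2/4m_{0,n}}\,[U_{0,n}(v) + jV_{0,n}(v)]$. The pure-Python sketch below evaluates ${}_1F_1$ by its power series, which is adequate for the moderate arguments used here but is not a general-purpose implementation, and cross-checks the closed form against direct numerical integration of $E[e^{j\beta v}]$ with the pdf (3). For m = 1 it reproduces the Rayleigh form of the Appendix. All function names are illustrative assumptions.

```python
import cmath
import math

def hyp1f1(a, b, z, tol=1e-16, max_terms=1000):
    """Kummer's 1F1(a; b; z) by direct power series (fine for moderate |z|)."""
    term, total = 1.0, 1.0
    for k in range(max_terms):
        term *= (a + k) / (b + k) * z / (k + 1)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def nakagami_cf(v, m, omega=1.0):
    """psi_beta(jv) from (33)-(34) for a single Nakagami-m amplitude."""
    z = omega * v * v / (4.0 * m)
    u = hyp1f1(0.5 - m, 0.5, z)
    vv = (math.gamma(m + 0.5) / math.gamma(m)) * math.sqrt(omega / m) * v * hyp1f1(1.0 - m, 1.5, z)
    return math.exp(-z) * complex(u, vv)

def nakagami_cf_numeric(v, m, omega=1.0, upper=10.0, n=20000):
    """E[exp(j beta v)] by Simpson's rule on the pdf (3); n must be even."""
    h = upper / n
    norm = 2.0 * m**m / (omega**m * math.gamma(m))
    total = 0j
    for i in range(n + 1):
        b = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        pdf = norm * b ** (2 * m - 1) * math.exp(-m * b * b / omega)
        total += weight * pdf * cmath.exp(1j * b * v)
    return total * h / 3.0

for m in (1.0, 2.0, 3.5):
    for v in (0.5, 2.0):
        print(m, v, nakagami_cf(v, m), nakagami_cf_numeric(v, m))
```

When these values are combined in (32), forming the complex product of $\psi_{\beta_{0,n}}(jv)$ and $J(v,\phi)$ directly, instead of adding arctan terms as in (36), avoids having to track the quadrant of $V_{0,n}/U_{0,n}$ once $U_{0,n}(v)$ becomes negative at larger |v|.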
4. NUMERICAL AND SIMULATION RESULTS

In this section, both computer simulations and the theoretical analysis are used to investigate the BER performance of an MC-CDMA system with multiple active users in Nakagami-m fading channels. The fading channels used in the computer simulations are Rayleigh fading channels (corresponding to m = 1) and Nakagami-m fading channels with m = 2. Both the uplink and the downlink are considered. The simulated system uses Walsh-Hadamard (WH) codes as signature sequences, and the number of subcarriers equals the length of the signature sequence. To calculate the BER, it is assumed that the mean power of each interfering user equals the mean power of the desired signal. It is also assumed that the uplink users are synchronous within a cyclic prefix. Flat fading on each subcarrier and i.i.d. fading among different subcarriers are assumed throughout this section.

Figure 2 compares the results of computer simulations and of the AGA analysis presented in Section 3 for the uplink MRC receiver in a Rayleigh fading channel with different numbers of active users. The number of subcarriers and the maximum number of users in the simulated system are both 8. As Figure 2 shows, the results of the AGA analysis agree well with the computer simulations. Similar results can be seen in Figure 3, which shows the same comparison for the uplink EGC receiver in a Rayleigh fading channel. From Figures 2 and 3, the AGA analysis gives nearly the same results as the computer simulations. A marginal difference appears in the multiuser cases, where MAI becomes the dominant factor affecting system performance. This is due to the inadequacy of the Gaussian MAI model when the number of active users is small (K < 10).

Figure 2: BER as a function of Eb/N0 (dB) for the MRC receivers of the uplink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

Figure 3: BER as a function of Eb/N0 (dB) for the EGC receivers of the uplink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

In [8], we compared the analytical results for EGC receivers obtained with the proposed AGA technique against those of the method proposed in [8]. The two analytical methods were shown to give essentially the same results, which further demonstrates the accuracy of the presented analysis method. However, the method presented in [8] requires that the fading channels on all subcarriers have the same fading parameter.

The approach presented in this paper can also be used to obtain the BER of the receivers in the downlink of an MC-CDMA system, provided that the formulas of Section 3 are modified for the synchronous downlink case. The comparison between the analytical and simulation results for an MRC receiver and an EGC receiver in the downlink is shown in Figures 4 and 5, respectively. The number of subcarriers is fixed at 8 and the number of active users is varied. These two figures clearly show that the approach is also accurate in the downlink. MRC is not practical for the downlink, because it accentuates the loss of orthogonality among the WH codes. In the downlink, EGC outperforms MRC in most cases, especially at high SNR. This means that the loss of orthogonality caused by channel fading is smaller for EGC than for MRC.

To further verify the accuracy of the proposed AGA method, the comparison between analysis and simulation results for Nakagami-m fading channels with m = 2 is shown in Figure 6 (MRC receiver) and Figure 7 (EGC receiver). The considered system is an uplink MC-CDMA system with 8 subcarriers and different numbers of active users. These two figures show that the AGA method gives more accurate results with m = 2 than in Rayleigh fading channels.

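A quick way to cross-check the analytical curves, short of a full bit-level simulation like the one behind Figures 2 to 7, is to average the conditional BER (12) or (19) over randomly drawn fading amplitudes. The NumPy sketch below does this under the same assumptions as the analysis (Gaussian MAI, i.i.d. Nakagami-m subcarriers, unit mean power for all users), so it checks the integration steps of Section 3 rather than the Gaussian MAI approximation itself. For EGC it uses $A^2\lambda^2$ from (25) with $A^2 = 2(E_b/N_0)/\big(N(N + (K-1)E_b/N_0)\big)$, which follows from (20)–(22) when $E[\beta_{k,n}^2] = 1$. The function name `semi_analytic_ber` is hypothetical.

```python
import math
import numpy as np

def semi_analytic_ber(ebn0_db, n_sub, n_users, m, combining, trials=100_000, seed=0):
    """Average the conditional BER (12) (MRC) or (19) (EGC) over sampled
    Nakagami-m amplitudes of user 0, keeping the Gaussian MAI model.
    Assumes Omega = 1 and E[beta_{k,n}^2] = 1 for every user and subcarrier."""
    rng = np.random.default_rng(seed)
    g = 10 ** (ebn0_db / 10)                                   # Eb/N0, linear
    beta = np.sqrt(rng.gamma(m, 1.0 / m, size=(trials, n_sub)))
    if combining == "MRC":
        # S^2 / (sigma_eta^2 + sigma_I^2) from (14)
        ratio = g * np.sum(beta**2, axis=1) / (n_sub / 2 + g * (n_users - 1) / 2)
    elif combining == "EGC":
        # A^2 * lambda^2 with lambda = sum_n beta_{0,n}, cf. (24)-(25)
        a2 = 2 * g / (n_sub * (n_sub + (n_users - 1) * g))
        ratio = a2 * np.sum(beta, axis=1) ** 2
    else:
        raise ValueError("combining must be 'MRC' or 'EGC'")
    q = [0.5 * math.erfc(math.sqrt(r / 2)) for r in ratio]     # Q(sqrt(ratio))
    return float(np.mean(q))

for scheme in ("MRC", "EGC"):
    print(scheme, f"{semi_analytic_ber(5.0, 8, 1, 1.0, scheme):.4e}")
```

Because the same seed produces the same amplitudes for both receivers, the single-user MRC estimate can never exceed the EGC estimate (by the Cauchy-Schwarz inequality, $(\sum_n\beta_{0,n})^2 \le N\sum_n\beta_{0,n}^2$), which is a useful consistency check on any implementation.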
Figure 4: BER as a function of Eb/N0 (dB) for the MRC receivers of the downlink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

Figure 5: BER as a function of Eb/N0 (dB) for the EGC receivers of the downlink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

Figure 6: BER as a function of Eb/N0 (dB) for the MRC receivers of the uplink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; fading parameter m = 2; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

Figure 7: BER as a function of Eb/N0 (dB) for the EGC receivers of the uplink with different numbers of users in a fading channel. The methods used are AGA analysis and computer simulations (N = 8; fading parameter m = 2; ◦: single user; ∗: 2 active users; +: 8 active users). [Plot of average BER versus Eb/N0 (dB); analytical and simulation results.]

Figures 8 and 9 illustrate the effect of the channel fading parameter m on the performance of MRC and EGC receivers. Both figures show analytical results for MRC and EGC receivers in the uplink with different numbers of users. Eb/N0 is fixed at 0 dB and the number of subcarriers N is 8. As expected, the system performance improves as the amount of fading decreases; more specifically, as m increases, the performance of both EGC and MRC receivers improves, and it approaches the performance obtained with a nonfading channel only as m approaches 10. Comparing the two receivers, the performance curves of the EGC receiver change more than those of the MRC receiver, which suggests that the EGC receiver is more sensitive to variations of the fading parameter m. Using the BER expression obtained for uplink transmission in a Nakagami-m fading channel, the average BER versus the number of active users for both EGC and MRC receivers with 256 subcarriers in a Rayleigh fading channel (m = 1) is shown in Figure 10, with Eb/N0 fixed at 0 dB and 7 dB. The significant impact of MAI can be observed. It can also be seen that at Eb/N0 = 7 dB, the fully loaded system works well if efficient channel coding is employed.

Figure 8: BER as a function of the fading parameter m for the MRC receiver of the uplink with different numbers of active users (N = 8, Eb/N0 = 0 dB). [Plot of average BER versus m; curves for k = 1, 2, 4, and 8 active users.]

Figure 9: BER as a function of the fading parameter m for the EGC receiver of the uplink with different numbers of active users (N = 8, Eb/N0 = 0 dB). [Plot of average BER versus m; curves for k = 1, 2, 4, and 8 active users.]

Figure 10: BER as a function of the number of active users for both the EGC and MRC receivers of the uplink in a Rayleigh fading channel (N = 256, Eb/N0 = 0 dB and 7 dB). [Plot of average BER versus number of active users (0 to 250); curves for MRC and EGC.]

5. CONCLUSIONS

The BER analysis for MC-CDMA receivers with multiple active users in frequency-selective Nakagami-m fading channels was presented in this paper. The analysis was applied to evaluate the performance of both EGC and MRC receivers in the uplink and downlink. The AGA approach uses an alternative expression for the Q-function and combines it with the characteristic function of Nakagami-m r.v.'s, thereby eliminating the need to derive the distribution of the sum of Nakagami-m signals for the EGC receiver and, hence, avoiding all approximations required therein. The approach has the advantages of a simple final expression and computational efficiency. Both theoretical analysis and computer simulations were used to evaluate the BER performance of the receivers in Rayleigh and Nakagami-m (m = 2) fading channels. Importantly, the computer simulations confirmed the accuracy of the AGA-based analysis. Therefore, the method presented here provides a powerful practical tool for evaluating the BER performance of MC-CDMA systems, especially when the number of subcarriers and users is too large for simulation to be practical. In addition, the influence of MAI on system performance was seen to be significant, and the BER saturates at high SNR for both EGC and MRC receivers in the uplink and downlink when the system is heavily loaded.

APPENDIX

THE CHARACTERISTIC FUNCTION OF A RAYLEIGH RANDOM VARIABLE

To obtain the performance of receivers using EGC in Rayleigh fading, we can let $m_{0,n} = 1$ in Section 3. Alternatively, it can be obtained by directly using the characteristic function of Rayleigh r.v.'s, which can be expressed by virtue of the sine and cosine transforms:

$$ \psi_{\beta_{0,n}}(jv) = \int_0^\infty \frac{2\beta_{0,n}}{\Omega_{0,n}}\exp\!\left(-\frac{\beta_{0,n}^2}{\Omega_{0,n}}\right)\cos(\beta_{0,n}v)\, d\beta_{0,n} + j\int_0^\infty \frac{2\beta_{0,n}}{\Omega_{0,n}}\exp\!\left(-\frac{\beta_{0,n}^2}{\Omega_{0,n}}\right)\sin(\beta_{0,n}v)\, d\beta_{0,n} = {}_1F_1\!\left(1; \frac{1}{2}; -\frac{\Omega_{0,n}v^2}{4}\right) + j\sqrt{\frac{\pi\Omega_{0,n}}{4}}\; v\exp\!\left(-\frac{\Omega_{0,n}v^2}{4}\right). \qquad (A.1) $$

The following formulas from [23] have been used:

$$ \int_0^\infty x^{v}\exp(-\alpha x^2)\cos(xy)\, dx = \frac{1}{2}\,\alpha^{-(1+v)/2}\,\Gamma\!\left(\frac{1}{2} + \frac{v}{2}\right){}_1F_1\!\left(\frac{1}{2} + \frac{v}{2}; \frac{1}{2}; -\frac{y^2}{4\alpha}\right), \qquad \int_0^\infty x\exp(-\alpha x^2)\sin(xy)\, dx = \frac{\sqrt{\pi}}{4}\,\alpha^{-3/2}\, y\exp\!\left(-\frac{y^2}{4\alpha}\right). \qquad (A.2) $$

The characteristic function of a Rayleigh random variable can also be written as

$$ \psi_{\beta_{0,n}}(jv) = \left[{}_1F_1\!\left(-\frac{1}{2}; \frac{1}{2}; \frac{\Omega_{0,n}v^2}{4}\right) + j\sqrt{\frac{\pi\Omega_{0,n}}{4}}\; v\right]\exp\!\left(-\frac{\Omega_{0,n}v^2}{4}\right), \qquad (A.3) $$

where the property [21]

$$ {}_1F_1(a; b; z) = e^{z}\; {}_1F_1(b - a; b; -z) \qquad (A.4) $$

was used. Then, following the same procedure as in Section 3 with the appropriate changes, the performance of EGC in a Rayleigh fading channel can be obtained.

ACKNOWLEDGMENTS

The authors would like to acknowledge Dr. Mohammed Abdel-Hafez of United Arab Emirates University for useful discussions during the preparation of this paper. The authors also thank the reviewers for their helpful comments and suggestions. This paper was presented in part at the IEEE International Conference on Communications (ICC '02), New York, April 28–May 2, 2002. This research was supported by the Academy of Finland, the Finnish National Technology Agency (TEKES), Nokia, the Finnish Defence Forces, and Elektrobit.

REFERENCES

[1] K. Fazel and S. Kaiser, Eds., Multi-Carrier Spread-Spectrum & Related Topics, Kluwer Academic, Boston, Mass, USA, 2002.
[2] S. Hara and R. Prasad, "Overview of multicarrier CDMA," IEEE Communications Magazine, vol. 35, no. 12, pp. 126–133, 1997.
[3] N. Yee, J.-P. Linnartz, and G. Fettweis, "Multicarrier CDMA in indoor wireless radio networks," in Proc. IEEE Personal, Indoor and Mobile Radio Communications (PIMRC '93), pp. 109–113, Yokohama, Japan, September 1993.
[4] X. Gui and T. S. Ng, "Performance of asynchronous orthogonal multicarrier CDMA system in frequency selective fading channel," IEEE Trans. Communications, vol. 47, no. 7, pp. 1084–1091, 1999.
[5] J. Jang and K. B. Lee, "Effects of frequency offset on MC/CDMA system performance," IEEE Communications Letters, vol. 3, no. 7, pp. 196–198, 1999.
[6] C. K. Kim and Y. S. Cho, "Performance of a wireless MC-CDMA system with an antenna array in a fading channel: reverse link," IEEE Trans. Communications, vol. 48, no. 8, pp. 1257–1261, 2000.
[7] S. Moon, G. KO, and K. Kim, "Performance analysis of orthogonal multicarrier-CDMA on two-ray multipath fading channels," IEICE Transactions on Communications, vol. E84-B, no. 1, pp. 128–133, 2001.
[8] Z. Li and M. Latva-aho, "BER performance evaluation for MC-CDMA systems in Nakagami-m fading," Electronics Letters, vol. 38, no. 24, pp. 1516–1518, 2002.
[9] N. C. Beaulieu, "An infinite series for the computation of the complementary probability distribution function of a sum of independent random variables and its application to the sum of Rayleigh random variables," IEEE Trans. Communications, vol. 38, no. 9, pp. 1463–1474, 1990.
[10] B. Smida, C. L. Despins, and G. Y. Delisle, "MC-CDMA performance evaluation over a multipath fading channel using the characteristic function method," IEEE Trans. Communications, vol. 49, no. 8, pp. 1325–1328, 2001.
[11] M. K. Simon and M.-S. Alouini, "A unified performance analysis of digital communication with dual selective combining diversity over correlated Rayleigh and Nakagami-m fading channels," IEEE Trans. Communications, vol. 47, no. 1, pp. 33–43, 1999.
[12] Q. T. Zhang, "Exact analysis of postdetection combining for DPSK and NFSK systems over arbitrarily correlated Nakagami channels," IEEE Trans. Communications, vol. 46, no. 11, pp. 1459–1467, 1998.
[13] M. Nakagami, "The m-distribution—A general formula of intensity distribution of rapid fading," in Statistical Methods in Radio Wave Propagation, pp. 3–36, Pergamon Press, Oxford, UK, 1960.
[14] S. Kaiser, Multi-carrier CDMA mobile radio system-analysis and optimization of detection, decoding, and channel estimation, Ph.D. dissertation, University of Munich, Munich, Germany, 1998.
[15] Z. Li and M. Latva-aho, "Performance comparison of frequency domain equalizer for MC-CDMA systems," in Proc. IEEE International Conference on Mobile and Wireless Communications Networks (MWCN '01), pp. 85–89, Recife, Brazil, August 2001.
[16] M. K. Simon and M.-S. Alouini, "A unified approach to the performance analysis of digital communication over generalized fading channels," Proceedings of the IEEE, vol. 86, no. 9, pp. 1860–1877, 1998.
[17] A. Papoulis, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, NY, USA, 3rd edition, 1991.
[18] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 3rd edition, 1995.
[19] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals, Series, and Products, Academic Press, San Diego, Calif, USA, 5th edition, 1994.
[20] M.-S. Alouini and M. K. Simon, "Performance analysis of coherent equal gain combining over Nakagami-m fading channels," IEEE Trans. Vehicular Technology, vol. 50, no. 6, pp. 1449–1463, 2001.
[21] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover Publications, New York, NY, USA, 10th edition, 1972.
[22] Z. Kopal, Numerical Analysis, Chapman & Hall, London, UK, 2nd edition, 1961.
[23] A. Erdelyi, Ed., Tables of Integral Transforms, vol. 1, McGraw-Hill, New York, NY, USA, 1954.

Zexian Li received the Ph.D. degree from Beijing University of Posts and Telecommunications, Beijing, China, in 1999. Before that, he received the B.S. and M.S. degrees from Harbin Institute of Technology, Harbin, China, in 1994 and 1996, respectively. From August 1999 to September 2000, he was a Research Engineer at Huawei Technologies Co. Ltd., Beijing. Since October 2000, he has been working at the Centre for Wireless Communications (CWC), University of Oulu, Finland. His research interests include future broadband wireless communications, multicarrier communication systems, communication theory, information theory, and advanced signal processing for communications.

Matti Latva-aho received the M.S. (E.E.), Lic.Tech., and Dr.Tech. degrees from the University of Oulu, Finland, in 1992, 1996, and 1998, respectively. From 1992 to 1993, he was a Research Engineer at Nokia Mobile Phones, Oulu, Finland. From 1994 to 1998, he was a Research Scientist at the Telecommunication Laboratory and Centre for Wireless Communications at the University of Oulu. Prof. Latva-aho was Director of the Centre for Wireless Communications at the University of Oulu from 1998 to 2003. Since 2000, he has been Professor of digital transmission techniques at the Telecommunications Laboratory. His research interests include future broadband wireless communication systems and related transceiver algorithms. Prof. Latva-aho has published more than 70 conference or journal papers in the field of CDMA communications.

EURASIP Journal on Applied Signal Processing 2004:10, 1595–1603
© 2004 Hindawi Publishing Corporation

Performance of Asynchronous MC-CDMA Systems with Maximal Ratio Combining in Frequency-Selective Fading Channels

Keli Zhang
School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798
Email: [email protected]

Yong Liang Guan
School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798
Email: [email protected]

Received 28 February 2003; Revised 29 August 2003

The bit error rate (BER) performance of the asynchronous uplink channel of multicarrier code division multiple access (MC-CDMA) systems with maximal ratio combining (MRC) is analyzed. The study takes into account the effects of channel path correlations in generalized frequency-selective fading channels. Closed-form BER expressions are developed for correlated Nakagami fading channels with arbitrary fading parameters. For channels with correlated Rician fading paths, the BER formula developed is a one-dimensional integral with finite integration limits, which is also easy to evaluate. The accuracy of the derived BER formulas is verified by computer simulations. The derived BER formulas are also useful for computing other system performance measures such as the error floor and user capacity.

Keywords and phrases: MC-CDMA, MRC, asynchronous transmission, correlations, fading channels.

1. INTRODUCTION

Multicarrier code division multiple access (MC-CDMA) is a technique that combines direct sequence (DS) CDMA with orthogonal frequency division multiplexing (OFDM) modulation. It is one of the candidate technologies considered for fourth-generation wireless communication systems [1]. MC-CDMA transmits every data symbol on multiple narrowband subcarriers and uses a cyclic prefix to absorb and remove the intersymbol interference (ISI) arising from frequency-selective fading. As it is unlikely for all subcarriers to experience a deep fade simultaneously, frequency diversity is achieved when the subcarriers are appropriately combined at the receiver. In [2, 3], it is shown that MC-CDMA outperforms conventional DS-CDMA and two other forms of CDMA with OFDM modulation, namely MC-DS-CDMA and multitone CDMA. Several combining techniques have been proposed for MC-CDMA systems. Maximal ratio combining (MRC) maximizes the output signal-to-noise ratio in the presence of spectrally white Gaussian noise [4, 5]. In [3], MRC is shown to achieve better performance for the MC-CDMA uplink than equal gain combining (EGC), and the resultant system has a lower error floor than DS-CDMA and MC-DS-CDMA.

The bit error rate (BER) performance of MC-CDMA systems is difficult to analyze because the receiver coherently combines a large number of independent or correlated fading subcarriers with possibly different fading statistics. Signal analysis is further complicated by the presence of multiuser interference (MUI) in the received signal. In the literature, simulations are often used to study the performance of MC-CDMA systems [2, 6, 7, 8]. For the downlink of MC-CDMA with MRC, performance lower bounds are given in [2, 9]. For the MC-CDMA uplink with MRC, to the authors’ knowledge, the most general performance analysis is given in [3], where Monte Carlo integration is used to evaluate BER expressions that involve multidimensional integration with dimension equal to the number of subcarriers. Although simplified performance formulations are given in [10, 11, 12], they are based on the assumption of independent and identically distributed (i.i.d.) fading among the channel paths [10], or of independent fading among the subcarriers [11, 12]. Furthermore, all the works reported in [2, 3, 9, 10, 11, 12] consider only Rayleigh fading channels. In this paper, we conduct a BER analysis for the MC-CDMA uplink with MRC in Rayleigh, Rician, and Nakagami fading channels with arbitrary fading parameters and correlations between the channel paths or subcarriers. A simplified signal model for the MRC output in the uplink channel is first developed. By employing the time-frequency equivalence for multicarrier signals, the generic BER expression of the MC-CDMA system with MRC is expressed in terms of the channel impulse response (CIR) information instead of the subcarrier fading information. Then, by using the technique of Cholesky decomposition, a closed-form BER formula that does not require integration is obtained for channels with correlated Nakagami fading paths. For channels with correlated Rician fading paths, the BER formula is reduced to a one-dimensional integral by employing an alternative form of the Gaussian Q-function.

The organization of the paper is as follows. The generic BER analysis is given in Section 2, while specific BER formulations for a variety of fading channels are given in Section 3. Section 4 presents the verification results and Section 5 concludes the paper.

2. GENERIC BER ANALYSIS

2.1. Signal model

Consider an MC-CDMA system with $N_u$ users, each of whom employs $N_c$ subcarriers modulated with BPSK. The transmitted signal corresponding to the $k$th user can be expressed as

$$s_k(t)=\sum_{v=-\infty}^{\infty}\sqrt{\frac{2E_b}{N_cT_s}}\sum_{n=1}^{N_c}b_k(v)\,c_{k,n}\,u_{T_s}\left(t-vT_s\right)\cos\left(w_nt+\theta_{k,n}\right), \tag{1}$$

where $E_b$ and $T_s$ are the bit energy and symbol duration, respectively; $u_{T_s}(t)$ is a rectangular pulse with amplitude 1 and duration $T_s$; $b_k(v)$ is the $v$th transmitted data bit; $c_{k,n}$ is the random spreading code; $w_n$ is the frequency of the $n$th subcarrier; and $\theta_{k,n}$ is the random phase at the transmitter. For uplink transmission, the base station receives signals from different users through different propagation channels, so different channel amplitudes and phases are associated with different users. More importantly, asynchronous transmission between users misaligns the signal arrival times of different users. Hence the received MC-CDMA uplink signal from a quasistatic frequency-selective fading channel is of the form

$$r(t)=\eta(t)+\sum_{v=-\infty}^{\infty}\sqrt{\frac{2E_b}{N_cT_s}}\sum_{k=1}^{N_u}\sum_{n=1}^{N_c}h_{k,n}\,b_k(v)\,c_{k,n}\,u_{T_s}\left(t-vT_s-\xi_k\right)\cos\left(w_nt+\phi_{k,n}\right), \tag{2}$$

where $\phi_{k,n}=\theta_{k,n}+\varphi_{k,n}-w_n\xi_k$, $\xi_k$ is the time misalignment of user $k$ with respect to user 1 (the reference user), $h_{k,n}$ and $\varphi_{k,n}$ are the frequency-domain subcarrier fading gain and phase for the $n$th subcarrier, respectively, and $\eta(t)$ is the additive white Gaussian noise (AWGN). The decision variable $U$ of user 1 is

$$U=\int_0^{T_s}r(t)\sum_{n=1}^{N_c}c_{1,n}\cos\left(w_nt+\phi_{1,n}\right)\alpha_{1,n}\,dt=D+I+J+\eta, \tag{3}$$

where $D$ and $\eta$ are the desired signal and noise components, respectively, $I$ and $J$ are the MUI components, and $\alpha_{1,n}$ is the combiner coefficient for the $n$th subcarrier of user 1. Without loss of generality, we abbreviate $h_{1,n}$ as $h_n$ and $\alpha_{1,n}$ as $\alpha_n$. For MRC, the combiner coefficient is $\alpha_n=h_n$. Then the desired signal component is $D=\sqrt{E_bT_s/(2N_c)}\sum_{n=1}^{N_c}h_n^2$, and the noise component $\eta$ has zero mean and variance $(N_0T_s/4)\sum_{n=1}^{N_c}h_n^2$. For simplicity of analysis, the uplink MUI is divided into two parts: $I$ is the MUI from the same subcarrier of other users, while $J$ is the MUI from other subcarriers of other users [3], that is,

$$I=\sqrt{\frac{E_b}{2N_cT_s}}\sum_{k=2}^{N_u}\sum_{n=1}^{N_c}h_{k,n}\alpha_n\cos\left(\phi_{k,n}-\phi_{1,n}\right)\left[b_k(-1)c_{k,n}c_{1,n}\,\xi_k+b_k(0)c_{1,n}c_{k,n}\left(T_s-\xi_k\right)\right],$$

$$J=\sqrt{\frac{E_b}{2N_cT_s}}\sum_{k=2}^{N_u}\sum_{n=1}^{N_c}\sum_{q=1,\,q\neq n}^{N_c}h_{k,q}\alpha_n\left[\int_0^{\xi_k}b_k(-1)c_{k,q}c_{1,n}\cos\left((w_q-w_n)t+\phi_{k,q}-\phi_{1,n}\right)dt+\int_{\xi_k}^{T_s}b_k(0)c_{k,q}c_{1,n}\cos\left((w_q-w_n)t+\phi_{k,q}-\phi_{1,n}\right)dt\right], \tag{4}$$

where $b_k(0)$ and $b_k(-1)$ represent the current and previous data bits of the $k$th user, respectively. Since the user data and fading parameters of different users are uncorrelated, the summands in (4) are uncorrelated too. Even though the subcarrier fading gains $h_{k,n}$ of the same user may be correlated to some extent, the summands in (4) are still uncorrelated in this case because of the other uncorrelated variables they contain, such as the phases $\phi_{k,n}$. Moreover, since the number of summands in (4) is very large (e.g., $N_c$ can be 64 or more and $N_u$ can be as large as $N_c$), both $I$ and $J$ are sums of a large number of uncorrelated terms. Hence, the central limit theorem (CLT) can be applied to approximate $I$ and $J$ as Gaussian random variables (RVs) [13]. It is shown in [3] that $I$ and $J$ have zero mean and variances given by

$$\mathrm{var}(I)=\frac{E_bT_s(N_u-1)\sigma^2}{3N_c}\sum_{n=1}^{N_c}h_n^2, \tag{5}$$

$$\mathrm{var}(J)=\frac{E_bT_s(N_u-1)\sigma^2}{4N_c\pi^2}\sum_{n=1}^{N_c}h_n^2\sum_{i=1,\,i\neq n}^{N_c}(i-n)^{-2}, \tag{6}$$

where $\sigma^2$ is the subcarrier fading power of other users. Thus the BER of an MC-CDMA uplink channel conditioned on the set of subcarrier fading amplitudes $\{h_n\}$ is

$$P_e\mid\{h_n\}=Q\left(\frac{\sum_{n=1}^{N_c}h_n^2}{\sqrt{\left(\frac{2(N_u-1)\sigma^2}{3}+\frac{N_c}{2E_b/N_0}\right)\sum_{n=1}^{N_c}h_n^2+\frac{(N_u-1)\sigma^2}{2\pi^2}\sum_{n=1}^{N_c}\sum_{i=1,\,i\neq n}^{N_c}(i-n)^{-2}h_n^2}}\right), \tag{7}$$

where $Q(\cdot)$ is the Gaussian Q-function. The average BER $P_e$ can then be obtained by averaging (7) over the joint probability density function (jdf) of $\{h_n\}$, that is,

$$P_e=\int_0^{\infty}\cdots\int_0^{\infty}P_e\mid\{h_n\}\;\mathrm{jdf}\left(\{h_n\}\right)dh_1\cdots dh_{N_c}. \tag{8}$$
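To make the brute-force route of (7) and (8) concrete, the following minimal Python sketch (standard library only) draws random channel realizations, maps them to subcarrier amplitudes, and averages the conditional BER (7). It assumes independent Rayleigh paths whose delays are integer multiples of $T_s/N_c$, so that the DFT relation given later in (15) and (16) applies with sample-spaced taps; $\sigma^2$ is treated as a free input, and all function names are illustrative rather than taken from the paper.

```python
import cmath
import math
import random


def q_func(x):
    """Gaussian Q-function, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def icarrier_weights(nc):
    """w_n = sum over i != n of (i - n)^(-2), the inner sum in (6) and (7)."""
    return [sum(1.0 / (i - n) ** 2 for i in range(nc) if i != n) for n in range(nc)]


def subcarrier_gains(paths, delays, nc):
    """|h_n| for n = 0..nc-1 from complex path gains via h = W g.
    Assumes each delay is given in samples, i.e. tau_l = delays[l] * Ts / Nc."""
    return [abs(sum(g * cmath.exp(-2j * math.pi * n * d / nc)
                    for g, d in zip(paths, delays)))
            for n in range(nc)]


def conditional_ber(h, n_users, sigma2, eb_n0, w=None):
    """Conditional BER (7); eb_n0 is the linear Eb/N0."""
    nc = len(h)
    w = icarrier_weights(nc) if w is None else w
    h2 = [x * x for x in h]
    s = sum(h2)
    denom = ((2 * (n_users - 1) * sigma2 / 3 + nc / (2 * eb_n0)) * s
             + (n_users - 1) * sigma2 / (2 * math.pi ** 2)
             * sum(p * q for p, q in zip(h2, w)))
    return q_func(s / math.sqrt(denom))


def monte_carlo_ber(powers, delays, nc, n_users, sigma2, eb_n0, trials=2000, seed=1):
    """Estimate (8) by averaging (7) over independent Rayleigh path realizations."""
    rng = random.Random(seed)
    w = icarrier_weights(nc)
    total = 0.0
    for _ in range(trials):
        # Complex Gaussian taps with E|g_l|^2 = powers[l], i.e. Rayleigh envelopes
        paths = [complex(rng.gauss(0.0, math.sqrt(p / 2)), rng.gauss(0.0, math.sqrt(p / 2)))
                 for p in powers]
        h = subcarrier_gains(paths, delays, nc)
        total += conditional_ber(h, n_users, sigma2, eb_n0, w)
    return total / trials


if __name__ == "__main__":
    # Four equal-power paths, 64 subcarriers, 10 users, Eb/N0 = 10 dB (linear 10.0)
    print(monte_carlo_ber([0.25] * 4, [0, 1, 2, 3], 64, 10, 1.0, 10.0))
```

With a single user and a flat channel ($h_n=1$), (7) collapses to the familiar BPSK result $Q(\sqrt{2E_b/N_0})$, which is a convenient sanity check. The cost of each trial grows with $N_c$, which is the practical burden the next paragraph refers to.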

The multidimensional integration in (8) is not easy to evaluate, even by Monte Carlo integration, because the number of subcarriers is usually large (e.g., 64 in IEEE 802.11a wireless LAN systems) and the subcarrier fading gains are usually correlated.

2.2. Simplification of BER formula

The presence of $\sum_{i=1,\,i\neq n}^{N_c}(i-n)^{-2}$ in (6) gives $h_n$ a different dependence in the variance expressions of $I$, $J$, and $\eta$, which complicates the analysis of the uplink BER. However, we observe that its value does not vary much for different values of $n$; hence it can be approximated as a constant $a$ whose value depends only on $N_c$, that is,

$$\sum_{i=1,\,i\neq n}^{N_c}\frac{1}{(i-n)^2}\approx\frac{1}{N_c}\sum_{n=1}^{N_c}\sum_{i=1,\,i\neq n}^{N_c}\frac{1}{(i-n)^2}=a. \tag{9}$$

Later we will show using simulation results that the effect of the approximation made in (9) on the BER is negligible. With this approximation, the term $\sum_{n=1}^{N_c}h_n^2$ becomes a common factor in the $\mathrm{var}(I)$ expression (5), the $\mathrm{var}(J)$ expression (6), and the variance of $\eta$. Thus the conditional uplink BER expression in (7) can be simplified to

$$P_e\mid\{h_n\}=Q\left(\sqrt{\frac{\sum_{n=1}^{N_c}h_n^2}{\frac{2(N_u-1)\sigma^2}{3}+\frac{(N_u-1)\sigma^2a}{2\pi^2}+\frac{N_c}{2E_b/N_0}}}\right). \tag{10}$$

Denoting

$$\beta=\sum_{n=1}^{N_c}h_n^2, \tag{11}$$

the BER can be obtained by averaging (10) over the probability density function (pdf) of the combined subcarrier fading variable $\beta$, that is,

$$P_e=\int_0^{\infty}Q\left(\sqrt{\nu\beta}\right)f(\beta)\,d\beta, \tag{12}$$

where $f(\cdot)$ denotes the pdf and $\nu$ is given by

$$\nu=\left[\frac{2(N_u-1)\sigma^2}{3}+\frac{(N_u-1)\sigma^2a}{2\pi^2}+\frac{N_cN_0}{2E_b}\right]^{-1}. \tag{13}$$

Comparing (12) and (8), the dimension of integration is reduced from $N_c$ to one, provided that the pdf of $\beta$ can be obtained. In general, however, it is not easy to find the pdf of $\beta$ for a large number of subcarriers whose fading gains may be correlated and/or may have different fading characteristics. We circumvent this problem by transforming the subcarrier-domain integration in (12) into a path-domain integration, as elaborated next.

2.3. Time- and frequency-domain equivalence of MRC output

The CIR of a multipath fading channel with $N_p$ resolvable paths is typically represented using the tapped delay line model as

$$g(t)=\sum_{l=1}^{N_p}g_l\exp\left(j\psi_l\right)\delta\left(t-\tau_l\right), \tag{14}$$

where $g_l$, $\psi_l$, and $\tau_l$ are the fading envelope, phase, and delay of the $l$th path, respectively. Denoting the complex subcarrier fading gains as a vector $\tilde{\mathbf h}$ of length $N_c$ and the complex path fading gains as a vector $\tilde{\mathbf g}$ of length $N_p$, $\tilde{\mathbf h}$ is related to $\tilde{\mathbf g}$ by the discrete Fourier transform (DFT) [14, 15], that is,

$$\tilde{\mathbf h}=\mathbf W\tilde{\mathbf g}, \tag{15}$$

where

$$\mathbf W=\begin{bmatrix}1&1&\cdots&1\\ e^{-j2\pi\tau_1/T_s}&e^{-j2\pi\tau_2/T_s}&\cdots&e^{-j2\pi\tau_{N_p}/T_s}\\ e^{-j4\pi\tau_1/T_s}&e^{-j4\pi\tau_2/T_s}&\cdots&e^{-j4\pi\tau_{N_p}/T_s}\\ \vdots&\vdots&\ddots&\vdots\\ e^{-j(N_c-1)2\pi\tau_1/T_s}&e^{-j(N_c-1)2\pi\tau_2/T_s}&\cdots&e^{-j(N_c-1)2\pi\tau_{N_p}/T_s}\end{bmatrix}. \tag{16}$$

The term $\sum_{n=1}^{N_c}h_n^2$ in (10) can now be represented as

$$\sum_{n=1}^{N_c}h_n^2=\tilde{\mathbf h}^H\tilde{\mathbf h}=\tilde{\mathbf g}^H\mathbf W^H\mathbf W\tilde{\mathbf g}=N_c\,\tilde{\mathbf g}^H\tilde{\mathbf g}=N_c\sum_{l=1}^{N_p}g_l^2 \tag{17}$$

due to the fact that $\mathbf W^H\mathbf W=N_c\mathbf I_{N_p}$, where $\mathbf I_{N_p}$ denotes the $N_p\times N_p$ identity matrix and the superscript $H$ denotes the Hermitian transpose. Expression (17) signifies that MRC of subcarrier fading is equivalent to MRC of

path fading; hence the BER expression in (12) can now be rewritten as

$$P_e=\int_0^{\infty}Q\left(\sqrt{\nu N_c\gamma}\right)f(\gamma)\,d\gamma, \tag{18}$$

where

$$\gamma=\sum_{l=1}^{N_p}g_l^2. \tag{19}$$

Compared to (12), (18) is much easier to compute because the pdf of $\gamma$ is generally easier to analyze than the pdf of $\beta$, for the following reasons: (i) most practical channels are characterized and represented in the form of CIRs or power delay profiles [16, 17], which apply more directly to (18) than to (12); (ii) the number of significant channel paths $N_p$ is normally much smaller than the number of subcarriers $N_c$; for example, $N_c$ can be 64 for the IEEE 802.11a wireless LAN standard, or as large as 2048 for the European digital video broadcasting (DVB) standard, while $N_p$ in practical wireless communication systems is normally less than 10 [16, 17]; (iii) last but not least, fading among the subcarriers is normally correlated (even for channels with independent fading paths [2, 10]), which makes the pdf of $\beta$ harder to determine.

Therefore, we will use (18) and the pdf of the combined path fading variable $\gamma$ to formulate $P_e$ in the next section.

3. BER FORMULATIONS FOR DIFFERENT FADING CHANNELS

Notice that (18) is equal to the BER expression of a conventional time-domain MRC system, so the problem now is to find the pdf of $\gamma$, which is the MRC output of the multiple fading paths. To be most general, we consider the pdf of $\gamma$ with arbitrarily correlated paths in Rayleigh, Rician, and Nakagami fading channels. Rayleigh fading is treated as a special case of Nakagami or Rician fading.

3.1. Nakagami fading channels with independent paths

In this subsection, we model the fading path gains $\{g_l\}$ as independent Nakagami-distributed RVs. The Nakagami fading distribution, also known as the m-distribution, is widely adopted for modelling fading channels because of its good fit to empirical measurements [18, 19, 20], as well as the tractability it renders to BER evaluation [21]. A variety of fading effects can be modelled as Nakagami fading with different m parameters, including Rayleigh fading as the special case m = 1. The Nakagami-m distribution is given by

$$f(g_l)=\frac{2}{\Gamma(m_l)}\left(\frac{m_l}{\Omega_l}\right)^{m_l}g_l^{2m_l-1}\exp\left(-\frac{m_l}{\Omega_l}g_l^2\right), \tag{20}$$

where $\Omega_l=E[g_l^2]$ and $m_l=E^2[g_l^2]/E[(g_l^2-E[g_l^2])^2]$, $\Gamma(\cdot)$ is the Euler gamma function, and $E[\cdot]$ denotes statistical expectation. Since the square of a Nakagami RV, $\gamma_l=g_l^2$, follows the gamma distribution

$$f(\gamma_l)=\frac{(m_l/\Omega_l)^{m_l}\,\gamma_l^{m_l-1}\,e^{-(m_l/\Omega_l)\gamma_l}}{\Gamma(m_l)}, \tag{21}$$

the MRC signal $\gamma=\gamma_1+\gamma_2+\cdots+\gamma_{N_p}$ can be viewed as the sum of $N_p$ gamma variables. If $\{\gamma_l\}$ are independent with identical $m_l/\Omega_l$ for all values of $l$, $\gamma$ follows exactly another gamma distribution with new values of $m$ and $\Omega$ given by [5]

$$m=\sum_{l=1}^{N_p}m_l, \tag{22}$$

$$\Omega=\sum_{l=1}^{N_p}\Omega_l. \tag{23}$$

For independent $\{\gamma_l\}$ with nonidentical $m_l/\Omega_l$, the exact pdf of $\gamma$ becomes much more complicated to derive. In [22], we have shown that the combined output $\gamma$ in this case can be adequately approximated as a new gamma-distributed RV. For this resultant gamma distribution, the power $\Omega$ is given by (23), while the m parameter can be derived through moment matching to be

$$m=\frac{\left(\sum_{l=1}^{N_p}\Omega_l\right)^2}{\sum_{l=1}^{N_p}\Omega_l^2/m_l}. \tag{24}$$

For equal $m_l/\Omega_l$, (24) reduces to (22). Substituting (21) into the BER formula (18) with the appropriate $m$ and $\Omega$ parameters, we have

$$P_e=\int_0^{\infty}Q\left(\sqrt{\nu N_c\gamma}\right)\frac{(m/\Omega)^m\,e^{-(m/\Omega)\gamma}\,\gamma^{m-1}}{\Gamma(m)}\,d\gamma=\frac{\Gamma\left(\frac12+m\right)}{2\sqrt{\pi}\,\Gamma(1+m)}\left(\frac{2m}{\Omega\nu N_c}\right)^m{}_2F_1\left(m,\frac12+m;1+m;-\frac{2m}{\Omega\nu N_c}\right), \tag{25}$$

where ${}_2F_1(\cdot)$ is the Gauss hypergeometric function and $\nu$ is given in (13).
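As a hedged illustration of how (24) and (25) might be evaluated, here is a minimal Python sketch using only the standard library. To keep the hypergeometric series convergent at every SNR, it applies the standard Pfaff transformation ${}_2F_1(a,b;c;z)=(1-z)^{-a}\,{}_2F_1\left(a,c-b;c;\frac{z}{z-1}\right)$ to (25), which yields the equivalent form $P_e=\frac{\Gamma(m+1/2)}{2\sqrt{\pi}\,\Gamma(m+1)}\,x^m\,{}_2F_1\left(m,\frac12;m+1;x\right)$ with $x=2m/(2m+\Omega\nu N_c)$. This rewriting, the function names, and the midpoint-rule cross-check of (18) (including its truncation point) are our assumptions, not the paper's.

```python
import math


def matched_m(m_list, omega_list):
    """Eq. (24): m of the gamma approximation to a sum of independent gamma powers."""
    return sum(omega_list) ** 2 / sum(o * o / m for m, o in zip(m_list, omega_list))


def hyp2f1_half(m, x, tol=1e-15, max_terms=100000):
    """Power series of 2F1(m, 1/2; m + 1; x), valid for 0 <= x < 1."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (m + n) * (0.5 + n) / ((m + 1 + n) * (n + 1)) * x
        total += term
        if term < tol * total:
            break
    return total


def ber_closed_form(m, omega, nu, nc):
    """Eq. (25) in its Pfaff-transformed form."""
    x = 2.0 * m / (2.0 * m + omega * nu * nc)
    coeff = math.gamma(m + 0.5) / (2.0 * math.sqrt(math.pi) * math.gamma(m + 1.0))
    return coeff * x ** m * hyp2f1_half(m, x)


def ber_numeric(m, omega, nu, nc, steps=20000):
    """Midpoint-rule evaluation of (18) with the gamma pdf (21), as a cross-check (m >= 1)."""
    rate = m / omega
    upper = 40.0 * omega  # assumed truncation point; the gamma tail beyond it is negligible for m >= 1
    h = upper / steps
    norm = rate ** m / math.gamma(m)
    total = 0.0
    for k in range(steps):
        g = (k + 0.5) * h
        q = 0.5 * math.erfc(math.sqrt(nu * nc * g / 2.0))  # Q(sqrt(nu * Nc * g))
        total += q * norm * g ** (m - 1) * math.exp(-rate * g)
    return total * h


if __name__ == "__main__":
    # Figure 3(a) paths share m_l / Omega_l = 20, so (24) reduces to (22): m = 20
    m = matched_m([9, 5, 4, 2], [0.45, 0.25, 0.2, 0.1])
    print(m, ber_closed_form(m, 1.0, 0.05, 128), ber_numeric(m, 1.0, 0.05, 128))
```

For $m=1$ the closed form reduces to the familiar Rayleigh result $\frac12\left(1-\sqrt{\bar\gamma/(1+\bar\gamma)}\right)$ with $\bar\gamma=\Omega\nu N_c/2$, which gives a quick consistency check of the reconstructed (25).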

3.2. Nakagami fading channels with correlated paths

Although statistical independence among the diversity branches is desired in MRC systems, there are cases where this assumption is not valid [23, 24]. Hence we consider Nakagami fading channels with correlated paths in this subsection. Dual-branch MRC systems with correlated Nakagami fading branches are discussed in [5, 24]. The study is further generalized to an arbitrary number of diversity branches in [23], subject to the condition of identical branch parameters, that is, $m_l$ and $\Omega_l$ are the same for all values of $l$. Also, the results of [23] apply only to constant or exponential branch correlation models. In [25], arbitrary branch correlation is studied, but the analysis is limited to identical and integer-valued $m_l$ parameters across the branches. In [21, 26], noninteger $m_l$ values are considered, but the $m_l$ parameters must still be identical across the branches. In [27], we proposed an approach to obtain the fading statistics of correlated $\{\gamma_l\}$ without any constraints on the fading parameters and correlations of the channel paths. In this approach, Cholesky decomposition is used to transform correlated gamma RVs into linear combinations of independent gamma RVs. Specifically, denote $\boldsymbol\gamma=[\gamma_1,\ldots,\gamma_{N_p}]^T$ as the set of correlated gamma variables with covariance matrix $\mathbf C_\gamma=E[\boldsymbol\gamma\boldsymbol\gamma^T]-E[\boldsymbol\gamma]E[\boldsymbol\gamma^T]$. By Cholesky decomposition, $\mathbf C_\gamma=\mathbf L\mathbf L^T$, where $\mathbf L$ is a lower triangular matrix with $(j,i)$th element denoted as $l_{ji}$. Let $\mathbf w=[w_1\;w_2\;\cdots\;w_{N_p}]^T$ be a set of independent gamma RVs with identity covariance matrix $\mathbf C_w$. Next, let

$$\boldsymbol\gamma=\mathbf L\mathbf w \tag{26}$$

or, equivalently,

$$\gamma_1=l_{11}w_1,\quad\gamma_2=l_{21}w_1+l_{22}w_2,\quad\ldots,\quad\gamma_l=\sum_{i=1}^{l}l_{li}w_i,\quad\ldots,\quad\gamma_{N_p}=\sum_{i=1}^{N_p}l_{N_pi}w_i; \tag{27}$$

then the covariance matrix of $\mathbf L\mathbf w$ is

$$E\left[\mathbf L\left(\mathbf w-E[\mathbf w]\right)\left(\mathbf w-E[\mathbf w]\right)^T\mathbf L^T\right]=\mathbf L\mathbf L^T=\mathbf C_\gamma. \tag{28}$$

Therefore, the correlated path variables $\boldsymbol\gamma$ have been transformed into weighted sums of independent variables $\mathbf w$, with weights given by (26) or (27), without affecting the path correlations. By matching the moments of both sides of (27) progressively from the top down, the m parameter $m_{w,i}$ and the power $\Omega_{w,i}$ of the elements $w_i$ of $\mathbf w$ can be obtained iteratively using the following equations:

$$m_{w,1}=m_1,\qquad m_{w,i}=l_{ii}^{-2}\left(\sum_{q=1}^{i-1}l_{iq}\sqrt{m_{w,q}}-\Omega_i\right)^2,\qquad\Omega_{w,i}=\sqrt{m_{w,i}}. \tag{29}$$

Summing up both sides of (27) then gives the resultant combined output

$$\gamma=\sum_{l=1}^{N_p}\sum_{i=1}^{l}l_{li}w_i=\sum_{i=1}^{N_p}w_i\sum_{l=i}^{N_p}l_{li}. \tag{30}$$

Since $\{w_i\}$ are independent, it follows from our earlier analysis that $\gamma$ can be approximated as a new gamma-distributed RV with $\Omega$ given by (23) and $m$ given by

$$m=\Omega^2\left[\sum_{i=1}^{N_p}m_{w,i}^{-1}\Omega_{w,i}^2\left(\sum_{l=i}^{N_p}l_{li}\right)^2\right]^{-1}. \tag{31}$$

It can be shown that the m-parameter expression given in (31) reduces to (24) and (22) for independent paths with nonidentical and identical $m_l/\Omega_l$ ratios, respectively. Hence (31) is a more general expression. Finally, it should be clear from the preceding analyses that a closed-form BER formula for the MC-CDMA uplink with MRC in Nakagami fading channels, without any constraint on the fading parameters and correlations of the channel paths, is realized in (25), with $\nu$, $\Omega$, and $m$ given by (13), (23), and (31), respectively. Furthermore, with $m_l=1$ for all fading paths, (25) becomes applicable to channels with Rayleigh fading paths.

3.3. Rician fading channels with independent or correlated paths

The Rician distribution is another popular fading model for signal envelopes received in channels with a direct line-of-sight (LOS) or specular component [28]. When the LOS component is absent, the Rician fading distribution reduces to Rayleigh. The BER of MRC diversity systems with Rayleigh diversity branches is available in [28, 29]. However, for Rician fading, and especially for correlated Rician fading branches, the results in [28, 30, 31] are complicated, as they may include hypergeometric functions or sums of integrals of Bessel functions. In this section, we derive a new one-dimensional integral BER expression based on the characteristic function (CF) [32] of the combined output. The Rician fading path gains $g_l$ follow the pdf

$$f(g_l)=\frac{2(K_l+1)g_l}{\Omega_l}\exp\left(-K_l-\frac{(K_l+1)g_l^2}{\Omega_l}\right)I_0\left(2\sqrt{\frac{K_l(K_l+1)}{\Omega_l}}\,g_l\right), \tag{32}$$

where $\Omega_l=E[g_l^2]$, $I_0(\cdot)$ is the zeroth-order modified Bessel function of the first kind, and $K_l$ is the Rician K factor [28]. When $K_l=0$, (32) reduces to the Rayleigh fading distribution. It is well known that for a Rician fading channel, the complex channel gains can be represented as complex Gaussian RVs, that is,

$$\tilde{\mathbf g}=\left[g_1e^{j\psi_1},\ldots,g_{N_p}e^{j\psi_{N_p}}\right]^T=\mathbf X_c+j\mathbf X_s. \tag{33}$$

Define $\mathbf X=[\mathbf X_c;\mathbf X_s]$, where $\mathbf X_c$ and $\mathbf X_s$ are $N_p\times1$ real Gaussian random vectors, $\boldsymbol\mu=E[\mathbf X]$ as the mean vector, and $\mathbf C_x$ as the $2N_p\times2N_p$ covariance matrix of $\mathbf X$. The CF of $\gamma=\sum_{l=1}^{N_p}g_l^2$ is given in [32] as follows:

$$\Psi_\gamma(j\omega)=\frac{\exp\left(j\omega\sum_{k=1}^{2N_p}\frac{\tilde\mu_k^2}{1-2j\omega\lambda_k}\right)}{\prod_{k=1}^{2N_p}\left(1-2j\omega\lambda_k\right)^{1/2}}, \tag{34}$$

where $\lambda_k$ are the eigenvalues of $\mathbf C_x$, $\mathbf C_x=\mathbf V\boldsymbol\Lambda\mathbf V^T$, $\boldsymbol\Lambda=\mathrm{diag}(\lambda_1,\lambda_2,\ldots,\lambda_{2N_p})$, and the rotated mean components are given by $[\tilde\mu_1,\tilde\mu_2,\ldots,\tilde\mu_{2N_p}]^T=\mathbf V^T\boldsymbol\mu$.

[Figure 1 plot: BER versus Eb/N0 (dB) for number of paths = 1, 2, 4, 8, 64; solid lines: our results; markers: results from [10].]
Figure 1: Comparison of BER values computed using (25) in this paper and BER values taken from [10, Figure 4] (64 subcarriers, 12 users, i.i.d. Rayleigh paths).

By utilizing an alternative expression for the Gaussian Q-function given in [33], that is,

$$Q(x)=\frac{1}{\pi}\int_0^{\pi/2}\exp\left(-\frac{x^2}{2\sin^2\phi}\right)d\phi, \tag{35}$$

together with the CF of $\gamma$ given in (34), the BER expression (18) of the MC-CDMA uplink with MRC in correlated Rician fading channels can now be obtained as

$$P_e=\frac{1}{\pi}\int_0^{\pi/2}\Psi_\gamma\left(-\frac{\nu N_c}{2\sin^2\phi}\right)d\phi, \tag{36}$$

that is, (34) evaluated at $j\omega=-\nu N_c/(2\sin^2\phi)$ and integrated over $\phi$. Since only one-dimensional integration is involved and the integration limits are finite, (36) can be easily evaluated numerically. Also, (36) is general enough to cover the case of independent paths, with $\mathbf C_x$ being a diagonal matrix, and the Rayleigh fading case with $\boldsymbol\mu=\mathbf 0$.
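One way (36) might be evaluated is sketched below in Python with only the standard library. Instead of the eigendecomposition in (34), it uses the algebraically equivalent matrix form $E[e^{s\gamma}]=\det(\mathbf I-2s\mathbf C_x)^{-1/2}\exp\left(s\,\boldsymbol\mu^T(\mathbf I-2s\mathbf C_x)^{-1}\boldsymbol\mu\right)$ with $s=j\omega=-\nu N_c/(2\sin^2\phi)$, which needs only a Cholesky factorization because $\mathbf I-2s\mathbf C_x$ is positive definite for $s<0$. The helper that builds $\boldsymbol\mu$ and $\mathbf C_x$ for independent paths is hypothetical and assumes every LOS component has zero phase; the midpoint rule is our choice of quadrature.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a: symmetric positive-definite, list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mgf_quadratic(s, mu, cov):
    """E[exp(s * X^T X)] for X ~ N(mu, cov) and s < 0; matrix form of (34) with j*omega = s."""
    n = len(mu)
    a = [[(1.0 if i == j else 0.0) - 2.0 * s * cov[i][j] for j in range(n)] for i in range(n)]
    L = cholesky(a)
    y = []  # forward substitution L y = mu, so mu^T a^(-1) mu = y^T y
    for i in range(n):
        y.append((mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return math.exp(-0.5 * log_det + s * sum(v * v for v in y))


def ber_rician(mu, cov, nu, nc, steps=2000):
    """Eq. (36) evaluated with the midpoint rule over phi in (0, pi/2)."""
    h = (math.pi / 2) / steps
    total = sum(mgf_quadratic(-nu * nc / (2.0 * math.sin((k + 0.5) * h) ** 2), mu, cov)
                for k in range(steps))
    return total * h / math.pi


def independent_rician(k_factors, powers):
    """Hypothetical helper: mu and C_x of X = [Xc; Xs] for independent Rician paths,
    assuming every LOS component has zero phase."""
    n = len(powers)
    mu = [math.sqrt(k * p / (k + 1)) for k, p in zip(k_factors, powers)] + [0.0] * n
    var = [p / (2 * (k + 1)) for k, p in zip(k_factors, powers)] * 2
    cov = [[var[i] if i == j else 0.0 for j in range(2 * n)] for i in range(2 * n)]
    return mu, cov


if __name__ == "__main__":
    mu, cov = independent_rician([5, 3, 2], [0.4, 0.35, 0.25])  # K and Omega of Figure 2(a)
    print(ber_rician(mu, cov, 0.05, 128))
```

For correlated paths, any valid (symmetric positive semidefinite) $2N_p\times2N_p$ covariance can be passed as `cov`. With $K_l=0$ the sketch reproduces the textbook Rayleigh MRC results, which the checks below rely on.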

4. RESULTS AND DISCUSSIONS

To demonstrate the validity and simplicity of our proposed BER formulation approaches, we compare our results with those in [10], which considers channels with i.i.d. Rayleigh fading paths. The Laplace transform and the residue method are used in [10] to compute the pdf of $\sum_{n=1}^{N_c}h_n^2$, and the resultant BER expression is a one-dimensional integral. In contrast, our BER expression for this case is the closed-form formula (25). In Figure 1, the analytical BER values computed using (25) in this paper are compared with the BER values taken from [10, Figure 4]. The two sets of BER values match exactly.

In Figures 2 and 3, we use computer simulations of an asynchronous MC-CDMA uplink with 128 subcarriers and 10 active users to verify our BER formulas under different fading conditions. There are three types of BER plots in Figures 2 and 3: one simulation plot and two analytical plots (obtained with or without the approximation made in (9)). Figure 2 is for a channel with independent or correlated Rician fading paths with randomly selected $K_l$, $\Omega_l$, and correlation values. Figure 3 is for Nakagami channels consisting of independent paths with identical $m_l/\Omega_l$ ratios, or correlated paths with unequal $m_l/\Omega_l$ ratios. Detailed channel specifications are given in the respective figure captions. The “analytical with approximation (9)” plots in Figures 2 and 3 are obtained by using (36) and (25), respectively, while the “analytical without approximation (9)” plots are obtained by Monte Carlo integration of the conditional BER given in (7). Both Figures 2 and 3 show that the analytical BER values with and without approximation (9) are indistinguishable; hence the effect of the approximation made in (9) on the system BER is negligible. Although not shown in this paper, the same verification has also been carried out for other values of the subcarrier number $N_c$ and the fading parameters, for example, $N_c$ from 16 to 256, Rician K factor from 0 to 20, and Nakagami m parameter from 1 to 20. In all these cases, the effect of the approximation made in (9) on the system BER is likewise insignificant.

[Figure 2 plot: BER versus Eb/N0 (dB); curves for (a) independent paths and (b) correlated paths; legend: analytical with approx. (10), analytical without approx. (10), simulation.]
Figure 2: Performance of MC-CDMA uplink with MRC (128 subcarriers, 10 users) in a channel with 3 correlated Rician fading paths, K = [5 3 2], Ω = [0.4 0.35 0.25]. Channel (a) contains independent paths with Cx = I6; channel (b) contains correlated paths with Cx = [1 0.1 0.2 0 0 0; 0.1 1 0.5 0 0 0; 0.2 0.5 1 0 0 0; 0 0 0 1 0.1 0.2; 0 0 0 0.1 1 0.5; 0 0 0 0.2 0.5 1].
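The finding above that approximation (9) is harmless can also be probed without simulation. The Python sketch below (standard library only; names are illustrative) computes the per-subcarrier sums on the left of (9), the constant $a$, and the Q-function arguments of the exact conditional BER (7) and of the approximate form (10) for a given amplitude vector. The parameter values ($N_u=10$, $\sigma^2=1$, $E_b/N_0=20$ dB) and the Rayleigh-distributed test amplitudes are assumptions chosen only for illustration.

```python
import math
import random


def icarrier_weights(nc):
    """Per-subcarrier sums sum_{i != n} (i - n)^(-2), the left-hand side of (9)."""
    return [sum(1.0 / (i - n) ** 2 for i in range(nc) if i != n) for n in range(nc)]


def constant_a(nc):
    """Right-hand side of (9): the average of the per-subcarrier sums."""
    return sum(icarrier_weights(nc)) / nc


def q_arguments(h, n_users, sigma2, eb_n0):
    """Q-function arguments of (7) (exact) and (10) (with approximation (9))."""
    nc = len(h)
    w = icarrier_weights(nc)
    a = sum(w) / nc
    s = sum(x * x for x in h)
    base = 2 * (n_users - 1) * sigma2 / 3 + nc / (2 * eb_n0)
    coef = (n_users - 1) * sigma2 / (2 * math.pi ** 2)
    exact = s / math.sqrt(base * s + coef * sum(x * x * wn for x, wn in zip(h, w)))
    approx = math.sqrt(s / (base + coef * a))
    return exact, approx


if __name__ == "__main__":
    w = icarrier_weights(128)
    print(min(w), max(w), constant_a(128))  # edge subcarriers have the smallest sums
    rng = random.Random(0)
    h = [abs(complex(rng.gauss(0, 1), rng.gauss(0, 1))) for _ in range(128)]
    print(q_arguments(h, 10, 1.0, 100.0))
```

For $N_c=128$ the sums range from about 1.64 at the band edges to about 3.26 in the middle, with $a\approx3.19$; because the inter-carrier MUI term carries the small weight $1/(2\pi^2)$, the two Q-function arguments stay within a few percent of each other even in the worst case, and they coincide exactly for a flat channel.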

[Figure 3 plot: BER versus Eb/N0 (dB); curves for (a) independent paths and (b) correlated paths; legend: analytical with approx. (10), analytical without approx. (10), simulation.]
Figure 3: Performance of MC-CDMA uplink with MRC (128 subcarriers, 10 users) in a channel with (a) 4 independent Nakagami fading paths, m = [9 5 4 2], Ω = [0.45 0.25 0.2 0.1]; (b) 3 correlated Nakagami fading paths, m = [9 7 3], Ω = [0.5 0.3 0.2], Cγ = [1 0.4 0.3; 0.4 1 0.7; 0.3 0.7 1].

[Figure 4 plot: error floor (BER) versus number of users; curves: correlated Rician paths, correlated Nakagami paths.]
Figure 4: Error floor values versus number of users for MC-CDMA uplink with MRC. Number of subcarriers = 128.
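The correlated-path procedure of Section 3.2, (26) to (31), can be sketched for the channel of Figure 3(b) as follows (Python, standard library only). Because (21) implies $\mathrm{var}(\gamma_l)=\Omega_l^2/m_l$, whereas the listed $\mathbf C_\gamma$ has a unit diagonal, this sketch assumes that the listed matrix holds correlation coefficients and scales it by the standard deviations $\Omega_l/\sqrt{m_l}$; that interpretation and the function names are our assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L L^T (a: symmetric positive-definite, list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def correlated_nakagami_m(m_list, omega_list, rho):
    """Sketch of (26)-(31). rho is assumed to be the correlation-coefficient matrix of the
    path powers, so C_gamma[i][j] = rho[i][j] * sd_i * sd_j with sd_l = Omega_l / sqrt(m_l)."""
    n = len(m_list)
    sd = [o / math.sqrt(m) for m, o in zip(m_list, omega_list)]
    c_gamma = [[rho[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]
    L = cholesky(c_gamma)                                        # C_gamma = L L^T
    m_w = []
    for i in range(n):                                           # (29), top-down
        partial = sum(L[i][q] * math.sqrt(m_w[q]) for q in range(i))
        m_w.append((partial - omega_list[i]) ** 2 / L[i][i] ** 2)
    omega_w = [math.sqrt(x) for x in m_w]                        # unit variance per w_i
    omega = sum(omega_list)                                      # (23)
    col = [sum(L[l][i] for l in range(i, n)) for i in range(n)]  # weight of w_i in (30)
    denom = sum(omega_w[i] ** 2 / m_w[i] * col[i] ** 2 for i in range(n))
    return omega ** 2 / denom, m_w                               # (31)


if __name__ == "__main__":
    rho = [[1.0, 0.4, 0.3], [0.4, 1.0, 0.7], [0.3, 0.7, 1.0]]
    m_corr, m_w = correlated_nakagami_m([9, 7, 3], [0.5, 0.3, 0.2], rho)
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    m_ind, _ = correlated_nakagami_m([9, 7, 3], [0.5, 0.3, 0.2], identity)
    print(m_corr, m_ind)  # correlation lowers the effective m, i.e. less diversity
```

Because every $w_i$ has unit variance, each factor $\Omega_{w,i}^2/m_{w,i}$ in (31) equals one, so (31) evaluates to $\Omega^2/(\mathbf 1^T\mathbf C_\gamma\mathbf 1)$, the same value obtained by matching the mean and variance of $\gamma$ directly; with an identity correlation matrix it reduces to (24), as the text states.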

For Nakagami fading channels, if the fading in different channel paths is independent and has identical $m_l/\Omega_l$ ratios, the MRC output $\gamma$ follows an exact gamma distribution. Hence, as seen for channel condition (a) in Figure 3, the simulation and analytical BER plots match very well. On the other hand, when the fading in different channel paths has different $m_l/\Omega_l$ ratios or is correlated, $\gamma$ is only approximately gamma-distributed. Hence, some mismatch can be observed between the analytical and simulated BER plots for channel condition (b) in Figure 3. The approximation error associated with the pdf of $\gamma$ has been discussed in detail in [22, 27]. As concluded in these papers, the approximation error is very small and acceptable in most cases.

As seen in Figures 1, 2, and 3, the MC-CDMA uplink exhibits the familiar error floor at high $E_b/N_0$ because the MUI power dominates the AWGN power. With our simple uplink BER expressions (36) and (25), the error floor can be easily evaluated by setting $E_b/N_0$ in (13) to infinity. Figure 4 shows the dependence of the error floor on the number of users in channels with correlated Rician and Nakagami paths. The Rician channel model has the same parameters as channel condition (b) in Figure 2, while the Nakagami channel model is the same as channel condition (b) in Figure 3. The Rician channel suffers a higher error floor because its channel paths are subject to more severe fading than those of the Nakagami channel.

Furthermore, with our closed-form or one-dimensional integral BER formulas, the effect of various system or channel parameters on the system performance can also be obtained with ease. As an example, Figure 5 shows how many users the MC-CDMA system can support while meeting a target BER of $10^{-3}$ in channels with 4 or 8 i.i.d. Rayleigh paths. Such user capacity results can be easily obtained from (25) by simple numerical root finding. Moreover, our earlier analysis predicts that the larger the number of channel paths, the more diversity the MC-CDMA system can achieve. This explains why the channel with more paths in Figure 5 can accommodate more users.

[Figure 5 plot: number of users versus Eb/N0 (dB); curves: 4 i.i.d. Rayleigh paths, 8 i.i.d. Rayleigh paths.]
Figure 5: User capacity of MC-CDMA uplink with MRC in channels with i.i.d. Rayleigh fading paths (target BER = $10^{-3}$, 128 subcarriers).
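The error-floor and user-capacity calculations described above reduce to repeated evaluations of (25) with $\nu$ from (13). The following Python sketch (standard library only) does this for i.i.d. Rayleigh paths, where (22) gives $m=N_p$. It assumes $\Omega=1$ and $\sigma^2=1$, uses the Pfaff-transformed form of (25) so that the series converges at any SNR, and replaces a general root finder with a linear scan over the integer $N_u$ (valid because the BER grows with $N_u$). Absolute user counts depend on these normalization choices and are not meant to reproduce Figure 5; the function names are illustrative.

```python
import math


def constant_a(nc):
    """Eq. (9): a = (1/Nc) * sum_n sum_{i != n} (i - n)^(-2)."""
    return sum(1.0 / (i - n) ** 2 for n in range(nc) for i in range(nc) if i != n) / nc


def nu_param(n_users, nc, a, sigma2, eb_n0):
    """Eq. (13) with linear eb_n0; eb_n0 = math.inf drops the AWGN term (needs n_users >= 2)."""
    return 1.0 / (2 * (n_users - 1) * sigma2 / 3
                  + (n_users - 1) * sigma2 * a / (2 * math.pi ** 2)
                  + nc / (2.0 * eb_n0))


def ber_closed_form(m, omega, nu, nc, tol=1e-15, max_terms=100000):
    """Eq. (25) after the Pfaff transformation: coeff * x^m * 2F1(m, 1/2; m + 1; x)."""
    x = 2.0 * m / (2.0 * m + omega * nu * nc)
    term, series = 1.0, 1.0
    for n in range(max_terms):
        term *= (m + n) * (0.5 + n) / ((m + 1 + n) * (n + 1)) * x
        series += term
        if term < tol * series:
            break
    coeff = math.gamma(m + 0.5) / (2.0 * math.sqrt(math.pi) * math.gamma(m + 1.0))
    return coeff * x ** m * series


def error_floor(n_users, n_paths, nc, sigma2=1.0):
    """BER limit as Eb/N0 -> infinity for n_paths i.i.d. Rayleigh paths (m = n_paths, Omega = 1)."""
    nu = nu_param(n_users, nc, constant_a(nc), sigma2, math.inf)
    return ber_closed_form(float(n_paths), 1.0, nu, nc)


def user_capacity(n_paths, nc, eb_n0_db, target=1e-3, sigma2=1.0):
    """Largest N_u whose BER from (25) stays at or below target (linear scan)."""
    a = constant_a(nc)
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    best = 0
    for n_users in range(1, nc + 1):
        nu = nu_param(n_users, nc, a, sigma2, eb_n0)
        if ber_closed_form(float(n_paths), 1.0, nu, nc) > target:
            break
        best = n_users
    return best


if __name__ == "__main__":
    for paths in (4, 8):
        print(paths, user_capacity(paths, 128, 20.0), error_floor(10, paths, 128))
```

Under these assumptions the 8-path channel supports at least as many users as the 4-path channel, in line with the diversity argument above.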

5. CONCLUSIONS

In this paper, we present a way to obtain the analytical BER of the MC-CDMA uplink with MRC in channels with correlated Rayleigh, Rician, or Nakagami fading paths. We first derived a simplified signal model for the MRC output and established its time-frequency equivalence: combining the subcarriers using MRC has exactly the same effect as combining the channel paths using MRC. This principle is exploited to obtain greatly simplified BER formulas based on the channel impulse response (CIR) information of the MC-CDMA system under study. A new closed-form BER formula is derived for channels with correlated Nakagami fading paths using Cholesky decomposition and a gamma pdf approximation. The BER formula is exact if all the channel paths have an identical ml/Ωl ratio; otherwise, it is approximate but nonetheless adequately accurate. For channels with correlated Rician fading paths, the associated analytical BER formula is derived by appropriate function mapping based on the CF and an alternative form of the Gaussian Q-function. The resulting BER formula contains only a one-dimensional integral and hence can easily be evaluated numerically.

REFERENCES

[1] A. C. McCormick and E. A. Al-Susa, “Multicarrier CDMA for future generation mobile communication,” Electronics & Communication Engineering Journal, vol. 14, no. 2, pp. 52–60, 2002.
[2] S. Hara and R. Prasad, “Overview of multicarrier CDMA,” IEEE Communications Magazine, vol. 35, no. 12, pp. 126–133, 1997.
[3] X. Gui and T. S. Ng, “Performance of asynchronous orthogonal multicarrier CDMA system in frequency selective fading channel,” IEEE Transactions on Communications, vol. 47, no. 7, pp. 1084–1091, 1999.
[4] W. C. Jakes Jr., Ed., Microwave Mobile Communications, Wiley, New York, NY, USA, 1974.
[5] E. K. Al-Hussaini and A. M. Al-Bassiouni, “Performance of MRC diversity systems for the detection of signals with Nakagami fading,” IEEE Transactions on Communications, vol. 33, no. 12, pp. 1315–1319, 1985.
[6] N. Yee, J. P. Linnartz, and G. Fettweis, “Multi-carrier CDMA in indoor wireless radio networks,” in Proc. IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC ’93), pp. 109–113, Yokohama, Japan, September 1993.
[7] S. Kaiser, “On the performance of different detection techniques for OFDM-CDMA in fading channels,” in Proc. IEEE Global Telecommunications Conference (GLOBECOM ’95), vol. 3, pp. 2059–2063, Singapore, November 1995.
[8] R. L. Gouable and M. Helard, “Performance of single and multi-user detection techniques for a MC-CDMA system over channel model used for HIPERLAN2,” in Proc. IEEE International Symposium on Spread Spectrum Techniques and Applications (ISSSTA ’00), vol. 2, pp. 718–722, Parsippany, NJ, USA, September 2000.
[9] Q. Shi and M. Latva-aho, “An exact error floor for downlink MC-CDMA in correlated Rayleigh fading channels,” IEEE Communications Letters, vol. 6, no. 5, pp. 196–198, 2002.
[10] J. Park, J. Kim, S. Choi, N. Cho, and D. Hong, “Performance of MC-CDMA systems in non-independent Rayleigh fading,” in Proc. IEEE International Conference on Communications (ICC ’99), pp. 506–510, Vancouver, BC, Canada, June 1999.

[11] Z. Li and M. Latva-aho, “Simple analysis of MRC receivers for MC-CDMA systems in fading channels,” in Proc. International Conferences on Info-Tech and Info-Net (ICII ’01), vol. 2, pp. 560–565, Beijing, China, October 2001.
[12] Q. Shi and M. Latva-aho, “Exact bit error rate calculations for synchronous MC-CDMA over a Rayleigh fading channel,” IEEE Communications Letters, vol. 6, no. 7, pp. 276–278, 2002.
[13] A. Papoulis and S. U. Pillai, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, NY, USA, 4th edition, 2002.
[14] O. Edfors, M. Sandell, J.-J. van de Beek, S. K. Wilson, and P. O. Borjesson, “OFDM channel estimation by singular value decomposition,” IEEE Transactions on Communications, vol. 46, no. 7, pp. 931–939, 1998.
[15] B. Yang, K. B. Letaief, R. S. Cheng, and Z. Cao, “Channel estimation for OFDM transmission in multipath fading channels based on parametric channel modeling,” IEEE Transactions on Communications, vol. 49, no. 3, pp. 467–479, 2001.
[16] T. Ojanpera and R. Prasad, Wideband CDMA for Third Generation Mobile Communications, Artech House Publishers, Boston, Mass, USA, 2001.
[17] M. Patzold, Mobile Fading Channels, Wiley, New York, NY, USA, 2002.
[18] G. L. Turin, W. S. Jewell, and T. L. Johnston, “Simulation of urban vehicle-monitoring systems,” IEEE Transactions on Vehicular Technology, vol. 21, no. 1, pp. 9–16, 1972.
[19] H. Suzuki, “A statistical model for urban radio propagation,” IEEE Transactions on Communications, vol. 25, no. 7, pp. 673–680, 1977.
[20] A. Wojnar, “Rayleigh, Rice and Nakagami—in search of efficient models of fading radio channels,” in Proc. Int. Wroclaw Symp. Electromagnetic Compatibility, pp. 797–802, Wroclaw, Poland, 1988.
[21] Q. T. Zhang, “Maximal-ratio combining over Nakagami fading channels with an arbitrary branch covariance matrix,” IEEE Transactions on Vehicular Technology, vol. 48, no. 4, pp. 1141–1150, 1999.
[22] K. Zhang, Z. Song, and Y. L. Guan, “A generalized model for the combined output statistics of MRC diversity systems in Nakagami fading channels,” in Proc. Int. Symp. Communication Systems, Networks and Digital Signal Processing (CSNDSP ’02), Staffordshire University, UK, July 2002.
[23] V. A. Aalo, “Performance of maximal-ratio diversity systems in a correlated Nakagami-fading environment,” IEEE Transactions on Communications, vol. 43, no. 8, pp. 2360–2369, 1995.
[24] G. Fedele, L. Izzo, and M. Tanda, “Dual diversity reception of M-ary DPSK signals over Nakagami fading channels,” in Proc. IEEE 6th Int. Symp. Personal, Indoor and Mobile Radio Communication (PIMRC ’95), pp. 1195–1201, Toronto, Canada, September 1995.
[25] P. Lombardo, G. Fedele, and M. M. Rao, “MRC performance for binary signals in Nakagami fading with general branch correlation,” IEEE Transactions on Communications, vol. 47, no. 1, pp. 44–52, 1999.
[26] M.-S. Alouini, A. Abdi, and M. Kaveh, “Sum of gamma variates and performance of wireless communication systems over Nakagami-fading channels,” IEEE Transactions on Vehicular Technology, vol. 50, no. 6, pp. 1471–1480, 2001.
[27] K. Zhang, Z. Song, and Y. L. Guan, “Cholesky decomposition model for correlated MRC diversity systems in Nakagami fading channels,” in Proc. IEEE 56th Vehicular Technology Conference (VTC ’02 Fall), vol. 3, pp. 1515–1519, Vancouver, BC, Canada, September 2002.
[28] M. K. Simon and M.-S. Alouini, Digital Communication over Fading Channels: A Unified Approach to Performance Analysis, Wiley, New York, NY, USA, 2000.

[29] J. G. Proakis, Digital Communications, McGraw-Hill, New York, NY, USA, 4th edition, 2001.
[30] D. D. N. Bevan, V. T. Ermolayev, and A. G. Flaksman, “Coherent multichannel reception of binary modulated signals with dependent Rician fading,” IEE Proceedings Communications, vol. 148, no. 2, pp. 105–111, 2001.
[31] W. C. Lindsey, “Error probabilities for Rician fading multichannel reception of binary and N-ary signals,” IEEE Transactions on Information Theory, vol. 10, no. 4, pp. 339–350, 1964.
[32] R. K. Mallik and M. Z. Win, “Error probability of binary NFSK and DPSK with postdetection combining over correlated Rician channels,” IEEE Transactions on Communications, vol. 48, no. 12, pp. 1975–1978, 2000.
[33] M.-S. Alouini and A. J. Goldsmith, “A unified approach for calculating error rates of linearly modulated signals over generalized fading channels,” IEEE Transactions on Communications, vol. 47, no. 9, pp. 1324–1334, 1999.

Keli Zhang received the B.Eng. degree in electrical and electronic engineering from Nanyang Technological University, Singapore. Since 2001, she has been working towards the Ph.D. degree in wireless communications at the School of Electrical and Electronic Engineering (EEE) in Nanyang Technological University, Singapore. Her research interests include channel modeling for MCM systems, performance analysis of MC-CDMA, and diversity combining.

Yong Liang Guan received his B.Eng. and Ph.D. degrees from the National University of Singapore and Imperial College of Science, Technology and Medicine, University of London, respectively. He is currently an Assistant Professor in the School of Electrical and Electronic Engineering (EEE), Nanyang Technological University (NTU), Singapore. He is also the Program Director for the Wireless Network Research Group in the Positioning and Wireless Technology Center (PWTC), and the Deputy Director of the Center for Information Security, NTU.
His research interests include multicarrier modulation, Turbo and space-time coding/decoding, channel modelling, and digital multimedia watermarking.


EURASIP Journal on Applied Signal Processing 2004:10, 1604–1615
© 2004 Hindawi Publishing Corporation

Design and Implementation of MC-CDMA Systems for Future Wireless Networks

Sébastien Le Nours
CNRS UMR IETR (Institut en Electronique et Télécommunications de Rennes), INSA Rennes, 20 avenue des Buttes de Coësmes, 35043 Rennes Cedex, France

Fabienne Nouvel
CNRS UMR IETR (Institut en Electronique et Télécommunications de Rennes), INSA Rennes, 20 avenue des Buttes de Coësmes, 35043 Rennes Cedex, France

Jean-François Hélard
CNRS UMR IETR (Institut en Electronique et Télécommunications de Rennes), INSA Rennes, 20 avenue des Buttes de Coësmes, 35043 Rennes Cedex, France

Received 28 February 2003; Revised 8 October 2003

The emerging need for high data rate wireless services has raised considerable interest in MC-CDMA systems. In this work, we describe an MC-CDMA system design process for indoor propagation scenarios. The system specifications and simulation results are given first, and implementation aspects on a mixed multi-DSP and FPGA architecture are then presented. To shorten the development cycle, we propose efficient design methodologies that improve development steps such as complexity evaluation, distribution of the system over the architecture, and hardware-software code generation. Implementation results for the considered MC-CDMA system are then given.

Keywords and phrases: MC-CDMA, multi-DSP-FPGA architecture, codesign methodology, hardware-software distribution.

1. INTRODUCTION

The European third-generation (3G) terrestrial mobile system under deployment aims at offering a large variety of circuit and packet services, with greater capacity than second-generation (2G) systems. The evolution from 2G to 3G involves adopting a new air interface but, above all, a shift of focus from voice to multimedia. The fourth generation (4G), for its part, will be defined by the ability to integrate heterogeneous networks, especially radio mobile networks and wireless local area networks (WLAN), that is, to offer access to all services, all the time and everywhere [1]. Moreover, the rapid growth of Internet services and the increasing interest in portable computing devices are likely to create a strong demand for high-speed wireless data services. It is anticipated that systems with maximum information bit rates of more than 2–20 Mbps in a vehicular environment, and possibly 50–100 Mbps in indoor-to-pedestrian environments, will be needed, using a 50–100 MHz bandwidth. Fully meeting these prospects depends on the most efficient use of scarce spectrum resources and on the advent of reconfigurable radio, made conceivable by the emergence of software-defined radio (SDR) equipment [2].

On the one hand, the multicarrier code-division multiple-access (MC-CDMA) modulation scheme has already proven to be a strong candidate access technique for broadband cellular systems [3]. Different concepts combining multicarrier (MC) modulation with direct-sequence CDMA (DS-CDMA) were introduced in 1993 [4, 5, 6, 7]. Since then, owing to its high spectral efficiency and flexibility, MC-CDMA has become a promising access technique for the 4G air interface. The benefits of MC-CDMA are highlighted, for example, in [8]. That work shows that, under universal mobile telecommunications system (UMTS) and International Mobile Telecommunications 2000 (IMT-2000) requirements based on a 5 MHz bandwidth channel, a single user could be assigned a net bit rate of up to 4 Mbps with a rate-1/2 channel code, and even 6 Mbps with a rate-3/4 code. This holds in indoor as well as macrocellular environments with vehicular mobility. Thus, MC-CDMA is nowadays considered a very promising technique, specifically for the downlink of future cellular mobile radio systems. MC-CDMA is studied, for example, within the European IST project MATRICE (MC-CDMA transmission techniques for integrated broadband cellular systems)¹. This work has been partly carried out within MATRICE, which aims at defining a new air interface for 4G systems.

On the other hand, the advent of such wireless communication systems also depends on optimized embedded architectures and, consequently, on advanced design methods. As applications grow more complex, high-performance solutions are no longer guaranteed by fully software (SW) implementations on general-purpose processors (GPP) or digital signal processors (DSP). Nor are they guaranteed by fully hardware (HW) implementations on application-specific integrated circuits (ASIC). Thus, heterogeneous architectures are attractive and appropriate for implementing and rapidly prototyping complex radiocommunication systems. Such architectures combine reconfigurable HW components, such as field-programmable gate arrays (FPGA), with reprogrammable SW processors such as DSPs. As a result, concurrent design, or codesign, methods are well suited to shortening the development cycle of SDR systems [9]. In particular, these methods enable efficient design space exploration, which yields an optimized matching between the developed algorithms and the targeted architectures [10]. In this context, an implementation of an airport data link based on MC communications was proposed in [11]. Work focusing on equalisation receiver design [12] or on system consumption [13] can also be found. More generally, this work investigates MC-CDMA system design in the 4G context, from system definition to implementation under real-time constraints.
This paper is dedicated to the study of MC-CDMA for indoor propagation scenarios. This first step is necessary to guarantee the feasibility of the implementation under real-time constraints. Based on the channel properties, different configurations for an MC-CDMA downlink air interface are proposed and simulated. Implementation results on a heterogeneous platform combining DSP and FPGA are also presented. Our implementation approach relies on specific codesign methods. These provide an efficient design flow that integrates system modelisation, algorithm complexity evaluation, architectural exploration, automatic code generation, and implementation on the testbed platform.

This paper is organized as follows. In Section 2, the main features of the studied MC-CDMA system are presented first. The heterogeneous platform used is then described, and the benefits of our codesign approach are highlighted. In Section 3, system parameters are presented and simulation results are given. Section 4 deals with the complexity analysis of the studied MC-CDMA functions, whereas Section 5 presents implementation aspects of our codesign approach on the mixed architecture. Finally, Section 6 summarizes the results and gives conclusions.

¹ www.ist-matrice.org

Figure 1: Generic codesign flow (system specifications; system modelisation; architecture definition; complexity analysis, distribution, and performance prediction; SW synthesis, HW-SW interface synthesis, and HW synthesis; simulation and validation according to system constraints; implementation on the heterogeneous target).

2. MC-CDMA SYSTEM DESIGN

The codesign flow enables a top-down design, from a system modelisation step to implementation on a prototyping board under real-time constraints, as illustrated in Figure 1. The first step establishes the MC-CDMA system specifications according to the channel properties. Once validated, the system model is used as the entry point of the architectural design. This important step deals with HW-SW distribution according to the complexity of the specified functions and the available architecture. Accurate modelisation is required to investigate various implementation solutions efficiently against real-time constraints such as throughput and consumption. Then, automatic synthesis of the adopted solution, for the SW part, the HW part, and the interfaces alike, leads to a reduced development time and a reliable solution.

2.1. MC-CDMA system modelisation

The MC-CDMA air interface allows high-capacity networks and robustness in frequency-selective channels. It benefits both from the CDMA capability offered by the spread-spectrum technique and from MC modulation such as orthogonal frequency division multiplexing (OFDM). A possible generic downlink transmission scheme is depicted in Figure 2. The data of all users are processed simultaneously at the spreading step before MC modulation. In the following, owing to their good properties for the downlink [14], Walsh-Hadamard (WH) spreading sequences are considered. The presented MC-CDMA configuration transmits multiple data per MC-CDMA symbol for each user. $d_i^j(n)$ denotes the $i$th datum, $1 \le i \le N_b$, transmitted by user $j$, $1 \le j \le N_u$, in the $n$th MC-CDMA symbol.
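The downlink chain of Figure 2 can be traced end to end in a small sketch, with one-tap frequency-domain equalisation and despreading at the receiver. Several choices below are assumptions, not taken from the paper. The block interleaver, the subcarrier map (Nz/2 unused bins before the Ncu used ones), the noise-free channel and the perfect channel knowledge are all illustrative. All function names are hypothetical. The detector gains are the textbook per-subcarrier forms of the MRC, EGC, ORC (zero-forcing), and MMSE single-user detectors. Naive O(N²) transforms keep it dependency-free.

```python
import cmath
import math


def walsh_hadamard(n):
    """n x n Walsh-Hadamard matrix (Sylvester construction), n a power of two."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def used_bins(nc, ncu):
    """Hypothetical subcarrier map: Nz/2 unused bins, then the Ncu used bins."""
    first = (nc - ncu) // 2
    return range(first, first + ncu)


def mc_cdma_tx(data, codes, nc, ng):
    """data[j][i]: symbol i of user j; codes[j]: WH code of user j; ng: CP length."""
    nu, nb = len(codes[0]), len(data[0])
    chips = [0j] * (nb * nu)                  # spreading, summed over all users
    for j, symbols in enumerate(data):
        for i, d in enumerate(symbols):
            for k in range(nu):
                chips[i * nu + k] += d * codes[j][k]
    freq = [0j] * nc
    bins = used_bins(nc, nb * nu)
    for i in range(nb):                       # assumed block frequency interleaver
        for k in range(nu):
            freq[bins[k * nb + i]] = chips[i * nu + k]
    body = idft(freq)                         # OFDM modulation
    return body[-ng:] + body                  # prepend cyclic prefix


def multipath(samples, taps):
    """Linear convolution with a channel impulse response shorter than the CP."""
    return [sum(taps[l] * samples[t - l] for l in range(len(taps)) if t >= l)
            for t in range(len(samples))]


def detector_gains(h, kind, snr=None):
    """Per-subcarrier single-user detector gains; snr is a linear SNR (MMSE only)."""
    if kind == "MRC":
        return [z.conjugate() for z in h]
    if kind == "EGC":
        return [z.conjugate() / abs(z) for z in h]
    if kind == "ORC":                         # zero forcing
        return [z.conjugate() / abs(z) ** 2 for z in h]
    if kind == "MMSE":
        return [z.conjugate() / (abs(z) ** 2 + 1.0 / snr) for z in h]
    raise ValueError(kind)


def mc_cdma_rx(samples, taps, code, nc, ng, nb, kind="ORC", snr=None):
    """Recover the nb symbols of the user whose WH code is `code`."""
    freq = dft(samples[ng:ng + nc])           # remove CP, OFDM demodulation
    nu = len(code)
    bins = used_bins(nc, nb * nu)
    h_full = dft(list(taps) + [0] * (nc - len(taps)))  # perfect CSI (assumed)
    g = detector_gains([h_full[b] for b in bins], kind, snr)
    eq = [g[m] * freq[b] for m, b in enumerate(bins)]
    # Deinterleave and despread: correlate with the user code, normalise by Nu
    return [sum(eq[k * nb + i] * code[k] for k in range(nu)) / nu for i in range(nb)]
```

With configuration I sizes (Nc = 64, Ncu = 48, Nu = 8, hence Nb = 6, and a 16-sample guard interval at 20 MHz), ORC recovers every user's symbols exactly over a short noise-free multipath channel. MRC and EGC leave residual multiple-access interference on a frequency-selective channel, because unequal subcarrier gains destroy WH code orthogonality.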

Figure 2: Studied MC-CDMA transmitter and receiver. (Transmitter: spreading of the first to last sets of transmitted symbols of each active user, frequency interleaving, OFDM modulation, IF and analog RF conversions; propagation channel model; receiver for user j: numerical and BB conversions, OFDM demodulation, channel estimation, frequency deinterleaving, equalisation, and despreading, yielding $d_1^j, \dots, d_{N_b}^j$.)

The maximum number of users, which is also the length of the WH spreading sequences, is denoted Nu. The total number of subcarriers is Nc = Nz + Ncu, where Nz and Ncu are the numbers of unused and used subcarriers, respectively. Therefore, the number of data transmitted by each user in one MC-CDMA symbol is Nb = Ncu/Nu. Frequency interleaving is performed to fully exploit the frequency diversity offered by OFDM modulation. At the receiver, despreading with the specific user sequence is performed after equalisation in the frequency domain. System synchronisation and the intermediate frequency (IF) and baseband (BB) conversions are beyond the scope of this paper and are not addressed. Among the various equalisation techniques, we focus on single-user detection techniques. Channel estimation can be performed efficiently by inserting pilot subcarriers. Their arrangement must guarantee optimal sampling of the channel transfer function in time and in frequency, depending on the coherence bandwidth and the coherence time of the channel [15]. The MC-CDMA system offers high flexibility in resource allocation (spectral efficiency, number of users), which in turn induces large design spaces. As a result, high-level design methods are needed to deal with this complexity and to achieve an efficient implementation.

2.2. Description of the proposed codesign approach

Most radiocommunication systems designed on heterogeneous platforms face the complexity of mixing SW and HW design flows. The distribution of functions between HW and SW mostly depends on the designers' experience. Moreover, the matching between algorithm and architecture, and performance estimation for multicomponent architectures, are rarely addressed. Thus, as illustrated in similar works [16, 17], a high-level specification is required to improve HW-SW distribution and combined simulation. Our purpose is to propose an efficient top-down design flow that supports efficient architectural choices, taking into account the specified algorithms and the properties of the heterogeneous target. Moreover, to favour reusability and to shorten the design process, sources of different kinds, such as HW description language (HDL) and C code, must be integrated.

As illustrated in Figure 3, our design process is based on the concurrent use of two codesign methods and their associated tools. The first is the codesign methodology for embedded systems (CoMES) [18]. The second is the algorithm architecture adequation (AAA) methodology [19]; "adequation" is a French word meaning an efficient matching. The first method is used for system modelisation and simulation, algorithm complexity evaluation, and architectural design, whereas the second is used for function distribution and code generation. CoMES modelisation combines a graph model with C-coded algorithms, allowing complete system simulation without any assumption on the architecture. Function activity and complexity can then be evaluated in a profiling step. Next, the target architecture can be defined as a set of interconnected HW and SW processors. Finally, the distribution of functions on the multicomponent architecture can be studied according to system attributes. These include execution time on each component, data communication durations, and the behaviour of intercomponent interfaces. At this step, a fully validated and detailed performance estimate of the functions mapped onto the distributed architecture can be obtained. The AAA methodology, for its part, is first used for automatic function distribution, taking into account the complexity parameters given by the previous step. This feedback makes accurate system evaluation possible. Once an efficient matching between functions and architecture has been found, the AAA methodology generates the code for the algorithms and the intercomponent communications, both in C and in VHDL.

Figure 3: Considered design flow. (CoMES design flow: system specifications; system modelisation; system simulation without architectural assumptions; architectural attributes; complexity analysis and implementation performance prediction; system simulation with architecture limitations and implementation performance evaluation. AAA design flow: AAA specifications; automatic function distribution and scheduling; implementation performance evaluation of the optimised distributed solution, with feedback for distribution optimisation; SW, HW, and HW-SW interface synthesis; DSP and FPGA implementation. Final step: system validation on the prototyping board.)

Figure 4: Heterogeneous architecture description. (Two software modules, each a C6701 DSP CPU with external memory banks and peripherals; one hardware module with a Virtex FPGA, memory controller, external memory banks, and hardware function implementation; modules linked through quick-port and slow-port interfaces on a PCI bus; legend: IP interfaces, processing units, physical links.)
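The automatic distribution step can be pictured with a generic greedy list scheduler. This is not the AAA heuristic, whose details are not given in this excerpt. The sketch below uses a plain earliest-finish-time rule instead. Each function, taken in dependency order, is placed on the processor where it would finish first. A fixed transfer delay is charged whenever its input comes from another processor. The processor names, task names, and durations are hypothetical values in arbitrary time units.

```python
def list_schedule(tasks, deps, cost, comm):
    """Greedy earliest-finish-time mapping of a task graph onto processors.

    tasks: task names in a topological order
    deps:  task -> list of predecessor tasks
    cost:  task -> {processor: execution time}; absent processors cannot run it
    comm:  transfer time added when a predecessor ran on another processor
    Returns (mapping, finish_times, makespan).
    """
    proc_free = {}                      # time at which each processor becomes idle
    mapping, finish = {}, {}
    for t in tasks:
        best = None
        for p, dur in cost[t].items():
            # All predecessor data must have reached processor p
            ready = max([finish[d] + (0 if mapping[d] == p else comm)
                         for d in deps.get(t, [])], default=0)
            end = max(ready, proc_free.get(p, 0)) + dur
            if best is None or end < best[0]:
                best = (end, p)
        finish[t], mapping[t] = best
        proc_free[best[1]] = best[0]
    return mapping, finish, max(finish.values())


# Hypothetical transmitter chain and durations, for illustration only
tasks = ["data_gen", "modulation", "spreading", "ofdm"]
deps = {"modulation": ["data_gen"], "spreading": ["modulation"], "ofdm": ["spreading"]}
cost = {
    "data_gen": {"DSP1": 2},
    "modulation": {"DSP1": 3, "DSP2": 3},
    "spreading": {"DSP1": 4, "FPGA": 1},
    "ofdm": {"DSP2": 5, "FPGA": 2},
}
mapping, finish, makespan = list_schedule(tasks, deps, cost, comm=1)
```

With these made-up numbers, the scheduler keeps data generation and modulation on DSP1 and moves spreading and the OFDM stage to the FPGA. The single transfer delay is outweighed by the faster hardware execution.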

The next part presents the architecture used for the MC-CDMA system implementation.

2.3. Testbed architecture description

Our prototyping platform is based on a Sundance peripheral component interconnect (PCI) multiprocessor motherboard, on which two DSP-based modules and one FPGA module are plugged. As illustrated in Figure 4, two communication formats are used. The first is an 8-bit bidirectional format, called slow port, allowing a 20 Mbps transfer rate. The second is a 16-bit bidirectional format, called quick port, allowing a 200 Mbps throughput.

Each SW module uses a Texas Instruments TMS320C6701 DSP. This component is based on a very long instruction word (VLIW) architecture that can execute 8 operations per cycle at 167 MHz. The FPGA is a Virtex XCV400 with 400 Kgates, corresponding to 2400 logic blocks. Memory blocks are also available in the FPGA. Dedicated components on the SW modules enable data exchanges between the DSP peripherals and the communication ports. In addition, HW intellectual property (IP) cores are provided for insertion in the FPGA to control the communication channels. The FPGA is configured by a bitstream sent from a DSP. The described codesign approach is applied to this architecture for system implementation. The next part presents the MC-CDMA system parameters and the simulation results for the channel model used.

3. SYSTEM DEFINITION

Table 1: Propagation channel parameters.
Parameter | Value
Maximum delay τmax | 390 ns
Delay spread στ | 50 ns
Measured 50% coherence bandwidth Bc | 11 MHz
Measured 50% coherence time Tc | 15 ms
Typical Doppler shift fD at 1 m/s | 17.33 Hz

Table 2: Configuration parameters.
Parameter | Configuration I | Configuration II
Sampling frequency Fs | 20 MHz | 50 MHz
Number of total/used subcarriers (Nc/Ncu) | 64/48 | 256/192
Useful symbol/guard interval duration (Tu/Tg) | 3.2 µs/0.8 µs | 5.12 µs/0.8 µs
Subcarrier spacing (∆f) | 312.5 kHz | 195.3 kHz
Used bandwidth (W) | 15.0 MHz | 37.5 MHz
Number of users (Nu), full-load system | 4, 8, 16 | 16, 32, 64
Number of symbols per user (Nb) | 12, 6, 3 | 12, 6, 3
Net bit rate per user (Du) | 6, 3, 1.5 Mbps | 4, 2, 1 Mbps

For indoor propagation scenarios, we considered the BRAN-A channel as defined in [20], with a carrier frequency fc = 5.2 GHz. In our simulations, the propagation channel consists of 18 paths, each with its own power loss and a flat Doppler spectrum. The channel parameters used to establish our simulation model for this propagation scenario are summarized in Table 1. This channel model has been implemented on the prototyping board presented in Section 2 in order to simulate the studied MC-CDMA system. The system parameters are chosen according to the time and frequency coherence of the channel in order to reduce intercarrier interference (ICI) and intersymbol interference (ISI). Moreover, the investigated MC-CDMA configurations are designed to offer high-throughput and high-capacity solutions for indoor scenarios. From the system model illustrated in Figure 2, the net bit rate offered per user can be expressed as

$$D_u = \frac{n N_{cu}}{N_u (T_u + T_g)} = \frac{n N_{cu}}{N_u (N_c/F_s + T_g)} = \frac{n N_b}{N_c/F_s + T_g}, \tag{1}$$

where
(i) n is the number of bits per symbol of the modulation used; QPSK is considered in the following (n = 2);
(ii) Ncu is the number of used subcarriers per MC-CDMA symbol;
(iii) Tu + Tg is the whole MC-CDMA symbol duration, where Tu = Nc/Fs for a sampling frequency Fs and Tg is the guard interval duration.

Given the τmax value, and in order to avoid ISI, Tg is set to 0.8 µs. The first proposed configuration, whose parameter set is summarized in Table 2, is based on the HIPERLAN Type 2 specifications with a 20 MHz sampling frequency. Its ratio Og = Tu/(Tu + Tg) = 0.8 shows a low spectral efficiency loss due to guard interval insertion, corresponding to a power efficiency loss of 0.97 dB. In the second configuration, a 50 MHz sampling frequency is targeted to achieve a better tradeoff between bit rate and user capacity. In that case, Og = 0.86 leads to a power efficiency loss of 0.63 dB. In both cases, the Doppler shift is very low compared to the subcarrier spacing.

An appropriate approach for channel estimation in high-speed packet transmission is to insert a dedicated pilot symbol periodically in the transmission frame. Furthermore, the BRAN-A channel coherence time is much longer than the MC-CDMA symbol duration, so the channel evolves very slowly during the transmission of each symbol. Thus, the considered frame structure, illustrated in general form in Figure 5, includes Np = 1 pilot symbol at the beginning of each frame, followed by Ns MC-CDMA data symbols.

Figure 5: MC-CDMA frame structures (time-frequency grid: Np pilot symbols followed by Ns data symbols along the time axis; Ncu used subcarriers and Nz/2 unused subcarriers marked along the frequency axis; legend: unused subcarrier, data subcarrier, pilot subcarrier).
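The derived quantities quoted above follow directly from (1) and from Tu = Nc/Fs, and can be checked with a short script. It assumes QPSK (n = 2) as stated in the text. The used bandwidth is taken as W = Ncu·∆f. The Doppler shift is computed as fD = v·fc/c with v = 1 m/s, fc = 5.2 GHz, and c = 3 × 10⁸ m/s. The function name `config_metrics` is hypothetical.

```python
import math


def config_metrics(fs, nc, ncu, nu, tg, n_bits=2):
    """Derived quantities of one MC-CDMA configuration (SI units)."""
    tu = nc / fs                               # useful symbol duration Tu = Nc/Fs
    delta_f = 1.0 / tu                         # subcarrier spacing
    og = tu / (tu + tg)                        # guard-interval efficiency Og
    return {
        "Tu": tu,
        "delta_f": delta_f,
        "W": ncu * delta_f,                    # used bandwidth
        "Nb": ncu // nu,                       # symbols per user per MC-CDMA symbol
        "Du": n_bits * ncu / (nu * (tu + tg)), # net bit rate per user, eq. (1)
        "Og": og,
        "loss_dB": -10.0 * math.log10(og),     # power efficiency loss
    }


config_1 = config_metrics(fs=20e6, nc=64, ncu=48, nu=8, tg=0.8e-6)
config_2 = config_metrics(fs=50e6, nc=256, ncu=192, nu=32, tg=0.8e-6)
doppler_hz = 1.0 * 5.2e9 / 3e8                 # fD = v fc / c at 1 m/s

for name, c in (("I", config_1), ("II", config_2)):
    print(f"Config {name}: df = {c['delta_f'] / 1e3:.1f} kHz, W = {c['W'] / 1e6:.1f} MHz, "
          f"Du = {c['Du'] / 1e6:.2f} Mbps, Og = {c['Og']:.2f}, loss = {c['loss_dB']:.2f} dB, "
          f"fD/df = {doppler_hz / c['delta_f']:.1e}")
```

For configuration I this gives ∆f = 312.5 kHz, W = 15.0 MHz, Du = 3 Mbps for Nu = 8, Og = 0.8, and a 0.97 dB loss. For configuration II it gives ∆f ≈ 195.3 kHz, W = 37.5 MHz, Du ≈ 2.03 Mbps for Nu = 32, Og ≈ 0.86, and a 0.63 dB loss. In both cases fD/∆f is below 10⁻⁴.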


Table 3: Frame structure parameters.
Parameter | Configuration I | Configuration II
Number of data symbols per frame (Ns) | 100/1000 | 100/500
Frame duration | 400 µs/4 ms | 592 µs/3 ms

Channel estimation is then performed from the pilot symbol received at the beginning of each frame, so that the impact of nonideal channel estimation on transmission quality can be studied. A constant power is allocated to pilots for all configurations. Simulations were performed for a mobile speed of 1 m/s. Table 3 gives the different simulated frame structures. We investigated several detection techniques [21]:
- maximum ratio combining (MRC);
- equal gain combining (EGC);
- orthogonality restoring combining (ORC, labelled ZF in Figures 6 and 7);
- suboptimal minimum mean square error (MMSE) detection, with its coefficients computed using a fixed signal-to-noise ratio parameter of 12 dB.

Figure 6 shows the BER performance of the considered single-user detectors for configuration I with Nu = 8, and Figure 7 shows the performance for configuration II with Nu = 32, both in the full-load case. The curves clearly demonstrate the advantage of the MMSE-based detector over the other techniques. Moreover, the two detectors using linear channel equalisation, that is, the ORC and MMSE detectors, are more sensitive to inaccurate channel estimation than diversity-combining detectors such as EGC. For the presented configurations and frame structure, a tradeoff should be found between the power allocated to pilot symbols and the performance degradation resulting from channel estimation errors. An approach similar to that described in [22] could be used. In the following parts, we present the MC-CDMA system implementation results and the steps of our design methodology, from system simulation to integration.

4. MODELISATION AND COMPLEXITY ANALYSIS EVALUATION

4.1. Modelisation step

In our design approach, and according to the specifications given in Section 3, the CoMES methodology and its associated tool are first used to model the studied MC-CDMA system without any assumption about the architecture. The benefit of this approach is that the same model is used for both functional and architectural descriptions at an abstract level, which eases the HW-SW distribution and the evaluation of implementation performance. The functional model is based on three complementary viewpoints [23]:
(i) the structural organisation viewpoint, which represents data dependencies between functional elements, is specified first; at the functional level, data are exchanged through ideal FIFO (first-in first-out) communication ports;
(ii) the behavioural viewpoint defines the set of operations and their time order for each function; these two complementary specifications are defined graphically;
(iii) the algorithmic viewpoint is finally specified in C/C++ and describes the set of instructions for each operation previously defined.

This description approach leads to an efficient design with reduced error propagation and to a fully executable model for system verification and performance evaluation. System performance is estimated with an uninterpreted model that takes into account system attributes such as operation durations, data exchange formats, FIFO capacity, and so forth. The CoMES simulation model is therefore a true timed model, and not only a functional model as used by most commercial simulators. To illustrate the influence of system attributes, Figure 8 shows an example of MC-CDMA system modelisation and the associated execution graph at the functional level. At this step, attributes can be set by default; they are determined more accurately once the function durations have been evaluated. Figure 9 illustrates the system simulation capabilities of the CoMES tool at the structural and algorithmic viewpoints. Moreover, generic parameters such as spreading length, number of users, or equalisation technique can be set before simulation, which gives a flexible description and eases design space exploration.

Figure 6: Performance results for configuration I, Nu = 8 (BER versus Eb/N0 in dB for MRC, EGC, ZF, and MMSE detection, with Ns = 100 and Ns = 1000 data symbols per frame).

Figure 7: Performance results for configuration II, Nu = 32 (BER versus Eb/N0 in dB for MRC, EGC, ZF, and MMSE detection, with Ns = 100 and Ns = 500 data symbols per frame).

Figure 8: Example of (a) the MC-CDMA system modelisation and (b) execution graph representation according to associated functional attributes. Panel (a) gives the behavioural description of the data generation structure (a loop over Ns symbols covering pilots insertion, modulation, spreading, and so on), with symbols for loop, data transmission, conditional wait, FIFO communication, and operation. Panel (b) shows the temporal execution of the functions (data generation, modulator, spreading, interleaving, and so on) exchanging data through I/O ports; execution states are data read or write, data exchange, function activity, and resource waiting; functional attributes are twr (write/read duration), cwr (port capacity), and tf (function duration).
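The timed, uninterpreted view can be illustrated with a small Python sketch of a linear chain of functions connected by bounded FIFO ports. Each function spends tf on a data item plus twr to write it, and blocks while its output port already holds cwr items, mirroring the functional attributes named in Figure 8. This is a generic tandem-pipeline recurrence written for illustration, not the CoMES simulation engine; reads are assumed instantaneous, and the stage names and durations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    tf: float   # processing duration per data item
    twr: float  # duration of writing one item on the output port


def simulate_chain(functions, cwr, n_items):
    """Return leave[k][n]: the time at which function k has written item n.

    Item n can start on function k once function k is free and item n is in
    its input FIFO. Writing blocks while the output FIFO (capacity cwr) is
    full, i.e. until function k+1 has started item n - cwr.
    """
    K = len(functions)
    start = [[0.0] * n_items for _ in range(K)]
    leave = [[0.0] * n_items for _ in range(K)]
    for n in range(n_items):
        for k, f in enumerate(functions):
            free = leave[k][n - 1] if n > 0 else 0.0
            ready = leave[k - 1][n] if k > 0 else 0.0
            start[k][n] = max(free, ready)
            done = start[k][n] + f.tf + f.twr
            if k + 1 < K and n - cwr >= 0:
                done = max(done, start[k + 1][n - cwr])  # blocked on a full FIFO
            leave[k][n] = done
    return leave


chain = [Function("data generation", 1.0, 0.0),
         Function("modulation", 2.0, 0.0),
         Function("spreading", 1.0, 0.0)]
out = simulate_chain(chain, cwr=1, n_items=4)[-1]
print(out)  # [4.0, 6.0, 8.0, 10.0]
```

Once the pipeline is filled, one item leaves every 2 time units: the slowest function (here the hypothetical modulation stage) sets the rate, and a small port capacity makes upstream functions wait, which is the kind of resource-waiting state shown in the execution graph.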

The MC-CDMA system could thus be fully modeled using the CoMES tool. In addition, BER simulations were performed to validate the system behaviour and the algorithms used. This modelisation and functional validation step must be completed before the function complexity analysis.

Figure 9: Example of system simulation applying the CoMES methodology and using the associated tool. On the left-hand side, the structural simulation is based on an execution graph representation of functions activity, and on the right-hand side, algorithmic simulation capabilities are presented (BER evolution, average power evolution, and data evolution).

4.2. Complexity analysis evaluation

The complexity analysis step aims at determining the complexity of each function and at investigating implementation performance according to the kind of target processor. From the system model and the specified algorithms, the CoMES tool evaluates the activity and relative complexity of each function through a profiling step. The complexity comparison in Figure 10 shows the relative duration of each function in the system execution; the complexity of the channel model is not represented. This step makes it possible to improve function descriptions and coding style, still without any architectural assumption. The activity of each function can also be measured. Figure 11 illustrates the potential bottleneck represented by MC modulation and demodulation compared with the other functions: these remain the most computationally demanding functions in the considered MC-CDMA system. This profiling step helps designers to identify the critical computing functions.

Figure 10: Function complexity evaluated for MC-CDMA configuration using Nu = 32 and Nc = 256 (execution load of data generation, QPSK modulation, spreading, interleaving, MC modulation, MC demodulation, equalisation, deinterleaving, despreading, QPSK demodulation, and data reception).

Figure 11: Functions activity in system execution evaluated for MC-CDMA configuration using Nu = 32 and Nc = 256 (share of active, inactive, and resource waiting states for the same functions).

In addition, for an accurate architectural design, we completed this complexity evaluation by considering the implementation performance of each function on the target processors of our testbed platform [24]. For instance, OFDM modulation is efficiently performed by an inverse fast Fourier transform (IFFT), and spreading can conveniently be implemented with a fast Hadamard transform (FHT). The results given in Table 4 highlight the benefits of an FPGA implementation for most of the considered MC-CDMA functions. Computation times are measured with a 6-nanosecond clock cycle for the C6701 DSP and a 20-nanosecond cycle for the FPGA. These values, measured by implementing each elementary function on the testbed components, are used to find an efficient matching of the MC-CDMA system onto the architecture and to evaluate the performance achievable with such a platform.
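The two fast algorithms mentioned above can be sketched in Python. The fast Hadamard transform below multiplies a block of L symbols by the L x L Sylvester-ordered Walsh-Hadamard matrix using L log2 L additions and subtractions and no multiplications, which suits spreading when the codes of all users are processed jointly. The spread chips are then placed on the Nc subcarriers and OFDM-modulated with an IFFT. The block-by-block subcarrier mapping (no interleaving) and the function names are assumptions made for illustration.

```python
import numpy as np


def fht(x):
    """Fast Walsh-Hadamard transform in natural (Sylvester) order.

    Computes H_L @ x with L*log2(L) additions/subtractions; L must be a power of two.
    """
    a = list(x)
    L = len(a)
    if L == 0 or L & (L - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < L:
        for i in range(0, L, 2 * h):
            for j in range(i, i + h):
                # butterfly: (u, v) -> (u + v, u - v)
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a


def mc_cdma_modulate(blocks):
    """Spread each block of L user symbols with the FHT, place the chips on
    consecutive subcarriers (Nc = len(blocks) * L), and apply an IFFT."""
    chips = np.concatenate([np.asarray(fht(b), dtype=complex) for b in blocks])
    return np.fft.ifft(chips)


print(fht([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

For L = 64, the transform needs 64 x 6 = 384 additions or subtractions instead of the 4096 multiply-accumulate operations of a direct matrix-vector product.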

Figure 12: Architecture model using CoMES methodology. The model maps the functional chain (data generation, QPSK modulation, spreading, interleaving, MC modulation, channel, MC demodulation, equalisation, deinterleaving, despreading, demodulation, and data reception) onto an FPGA HW processor and two SW processors (DSP 1 and DSP 2) connected through HW and SW interfaces and communication links. Architectural attributes: Cint, interface capacity; Cconc, software processor concurrency; Tex, data exchange duration; Sover, software overhead.

Table 4: Function implementation results in microseconds.
Function (parameter set): C6701 DSP implementation / FPGA implementation
QPSK modulation (Nc = 64, 256): 0.9, 3.5 / 0.96, 3.84
Spreading (Nu = 4, 8, 16, 32, 64): 0.642, 1.542, 3.624, 8.274, 18.684 / 0.08, 0.16, 0.32, 0.64, 1.28
Interleaving (Nc = 64, 256): 1.1, 4.2 / 0.96, 3.84
OFDM modulation (Nc = 64, 256): 28.7, 146.718 / 3.84, 15.36
Channel estimation (Nc = 64, 256): 0.7, 2.4 / not implemented
Equalisation (Nc = 64, 256): 1.37, 4.83 / 3.84, 15.36
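The stated clock cycles make it easy to read Table 4 in cycles rather than microseconds. The Python sketch below converts a subset of the Table 4 entries using the 6 ns (C6701 DSP) and 20 ns (FPGA) cycle times given in Section 4.2 and computes the DSP-to-FPGA time ratio; the dictionary layout and helper name are ours.

```python
DSP_CYCLE_NS = 6.0    # C6701 DSP clock cycle
FPGA_CYCLE_NS = 20.0  # FPGA clock cycle

# Subset of Table 4: (function, parameter) -> (DSP time in us, FPGA time in us)
TABLE4 = {
    ("QPSK modulation", "Nc = 256"): (3.5, 3.84),
    ("Spreading", "Nu = 32"): (8.274, 0.64),
    ("Interleaving", "Nc = 256"): (4.2, 3.84),
    ("OFDM modulation", "Nc = 256"): (146.718, 15.36),
    ("Equalisation", "Nc = 256"): (4.83, 15.36),
}


def cycles(time_us, cycle_ns):
    """Number of clock cycles corresponding to a duration in microseconds."""
    return round(time_us * 1000.0 / cycle_ns)


for (func, param), (dsp_us, fpga_us) in TABLE4.items():
    print(f"{func:16s} {param:9s} DSP {cycles(dsp_us, DSP_CYCLE_NS):6d} cycles, "
          f"FPGA {cycles(fpga_us, FPGA_CYCLE_NS):4d} cycles, "
          f"DSP/FPGA time ratio {dsp_us / fpga_us:5.2f}")
```

For instance, the FPGA spreads 32 chips in 32 cycles (one chip per cycle) and performs the 256-point OFDM modulation in 768 cycles, about 9.6 times faster than the DSP, whereas QPSK modulation and equalisation take less time on the DSP.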

5. ARCHITECTURAL DESIGN AND IMPLEMENTATION RESULTS

5.1. Architectural design: functions distribution study and implementation efficiency evaluation

The purpose of the architectural design step is to define an efficient matching between the developed functions and the available architecture. The CoMES methodology makes it possible to study the impact of the distribution of functions on the architecture. Architectural attributes such as SW component concurrency, cycle duration of each component, SW overhead, and intercomponent communication durations complete the architecture description. We then modeled our platform as illustrated in Figure 12. Although the CoMES tool is still used for performance evaluation, the AAA methodology is followed to ease the distribution step. This method and its associated tool, SynDEx, provide a quasiautomatic distribution and scheduling of the defined system on the architecture, taking into account the previously evaluated function durations and communication costs. A heuristic search is performed to reduce the system execution cycle. An example of function distribution and scheduling on the target architecture is illustrated in Figure 13.
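The kind of distribution and scheduling that SynDEx automates can be illustrated with a greedy list scheduler. The Python sketch below is a plain earliest-finish-time heuristic, not the heuristic implemented in SynDEx: tasks are taken in dependency order, and each is placed on the component (DSP 1, DSP 2, or FPGA) where it would finish first, adding a fixed transfer cost when a predecessor ran on another component. Durations are the Table 4 transmitter values for Nc = 64 and Nu = 8; the 1 µs transfer cost is a hypothetical value, and link contention is ignored.

```python
def list_schedule(tasks, preds, duration, procs, comm_us):
    """Greedy earliest-finish-time list scheduling.

    tasks: names in a topological order; preds: task -> list of predecessors;
    duration: task -> {processor: time in us}; comm_us: cost of moving data
    between two different processors. Returns (placement, makespan).
    """
    proc_free = {p: 0.0 for p in procs}
    placed, finish = {}, {}
    for t in tasks:
        best = None
        for p in procs:
            if p not in duration[t]:
                continue  # task not implemented on this processor
            ready = max((finish[q] + (comm_us if placed[q] != p else 0.0)
                         for q in preds.get(t, [])), default=0.0)
            end = max(ready, proc_free[p]) + duration[t][p]
            if best is None or end < best[1]:
                best = (p, end)
        placed[t], finish[t] = best
        proc_free[best[0]] = best[1]
    return placed, max(finish.values())


TX = ["QPSK modulation", "Spreading", "Interleaving", "OFDM modulation"]
PREDS = {TX[i]: [TX[i - 1]] for i in range(1, len(TX))}
DUR = {  # Table 4, Nc = 64 and Nu = 8 (us)
    "QPSK modulation": {"DSP 1": 0.9, "DSP 2": 0.9, "FPGA": 0.96},
    "Spreading": {"DSP 1": 1.542, "DSP 2": 1.542, "FPGA": 0.16},
    "Interleaving": {"DSP 1": 1.1, "DSP 2": 1.1, "FPGA": 0.96},
    "OFDM modulation": {"DSP 1": 28.7, "DSP 2": 28.7, "FPGA": 3.84},
}
placement, makespan = list_schedule(TX, PREDS, DUR, ["DSP 1", "DSP 2", "FPGA"], comm_us=1.0)
print(placement, round(makespan, 2))
```

With this hypothetical transfer cost, the heuristic keeps QPSK modulation on DSP 1 and moves spreading, interleaving, and OFDM modulation to the FPGA, for an estimated completion time of 6.86 µs for one pass through the chain; changing `comm_us` shifts the trade-off between computation and communication.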

Figure 13: Matching exploration result (function schedule on the target components versus estimated computation time, up to data reception).

Thanks to this exploration step, the performance of different distributions can easily be investigated, which eases the difficult task of HW-SW partitioning. The retained solution can then be evaluated more accurately in the CoMES tool in terms of pipelined behaviour. The concurrent simulation of the algorithms allocated on the architecture makes it possible both to optimise the target architecture and to evaluate implementation performance at an abstract level. The achievable throughputs for each architecture configuration can then be evaluated. Table 5 gives examples of simulation results obtained with the CoMES methodology. The ideal case denotes results obtained when the influence of the communication ports is neglected; the other cases take into account the kind of communication port (slow or quick). These results illustrate the poor efficiency of a fully SW implementation and the potential bottleneck represented by intercomponent communications.
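A back-of-the-envelope version of this evaluation treats the retained mapping as a pipeline whose iteration period is set by the busiest component, once the time it spends on communications is added to its computation time (assuming transfers are not overlapped with computation). The Python sketch below uses purely hypothetical loads, port costs, and bits per user per iteration to show how a slow port can make communication the bottleneck; it is not the CoMES concurrent simulation that produced Table 5.

```python
def pipeline_period_us(compute_us, comm_us):
    """Steady-state period of a pipelined mapping: the busiest component sets the pace."""
    return max(compute_us[c] + comm_us.get(c, 0.0) for c in compute_us)


def throughput_bps(bits_per_iteration, period_us):
    """Data rate obtained when bits_per_iteration bits are delivered every period_us."""
    return bits_per_iteration / (period_us * 1e-6)


# Hypothetical per-iteration computation loads (us) for a two-component HW-SW mapping.
compute = {"DSP": 5.0, "FPGA": 4.0}
BITS_PER_USER = 16  # hypothetical bits delivered to one user per iteration

for label, comm in [("ideal", {}),
                    ("quick port", {"DSP": 0.5, "FPGA": 0.5}),
                    ("slow port", {"DSP": 3.0, "FPGA": 3.0})]:
    period = pipeline_period_us(compute, comm)
    print(f"{label:10s} period {period:4.1f} us -> "
          f"{throughput_bps(BITS_PER_USER, period) / 1e6:.2f} Mbps per user")
```

With these illustrative numbers, the per-user rate drops from 3.20 Mbps (ideal) to 2.00 Mbps with the slow port, even though the computation load is unchanged.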


Table 5: Implementation performance evaluation per user.
Implementation, port: Configuration I (Nu = 8, Nc = 64) / Configuration II (Nu = 32, Nc = 256)
Fully SW, ideal: 181 Kbps / 36.4 Kbps
Fully SW, slow port: 93.9 Kbps / 31.4 Kbps
Fully SW, quick port: 167 Kbps / 35.8 Kbps
HW-SW, ideal: 2.9 Mbps / 780 Kbps
HW-SW, slow port: 444 Kbps / 125 Kbps
HW-SW, quick port: 2.5 Kbps / 645 Kbps

Figure 14: Component activity estimation for configuration II, Nu = 32, implemented on HW-SW architecture with quick ports (activity of the retained architectural solution on DSP 1, DSP 2, and the FPGA, split into data processing, data communication, and resources waiting).

Finally, the activity of the components can be measured for the most satisfactory solution, as illustrated in Figure 14 for the HW-SW implementation using quick ports. Before implementation on the testbed platform, the last step of the design process is the generation of the code for both the SW and the HW parts.

5.2. Hardware-Software code generation and implementation results

As indicated in Figure 3, the AAA methodology is used to generate the code for the SW part, the HW part, and the interfaces at once. The code generation process is described in Figure 15. After the distribution step, the SynDEx tool generates a distributed executive for each component. This code takes into account intercomponent synchronisations and function calls. Code generation relies on specific libraries built for each kind of component. These libraries are not described in the present paper; the reader is referred to the work described in [25]. The benefit of this approach is the generation of fully validated code, which reduces the verification effort once the implementation on the testbed is done. The libraries used for SW generation already exist [26]; we built the library needed for HW generation [27]. This library uses the different developed functions and the required interfaces. The main synthesis results, in terms of FPGA logic elements for each function implemented in the HW part, are given in Table 6. For example, in the Configuration I case, the automatically generated HW part of the retained transmitter solution, including both the required interface and the computation functions, corresponds to 1132 logic elements and 12 memory blocks. The HW synthesis results made it possible to fully validate this design at a 50 MHz frequency.

Figure 15: SynDEx code generation process (SynDEx code generation using C and VHDL libraries for communications and functions, followed by system validation on the prototyping board with DSP 1, DSP 2, and the FPGA).

6. CONCLUSION

We have presented a codesign approach and associated tools for the rapid prototyping of an MC-CDMA system on a mixed architecture. This design flow goes from system specification and simulation to HW-SW code generation and implementation on a testbed platform. The CoMES model allows system simulation at the functional level as well as at the architectural level. This top-down design approach thus makes it possible to evaluate system implementation efficiency accurately, according to function complexity and architecture properties. Finally, the AAA methodology completes this HW-SW design flow by covering the distribution and code generation steps. Applied to the MC-CDMA system, the described design process facilitates and shortens the development cycle, and different implementation solutions can easily be investigated for the considered HW platform. Moreover, the benefits of this approach fit the SoftWare Radio (SWR) requirements for efficient design methods. From the application point of view, the evaluation and implementation results show that high data rates can be obtained with a mixed architecture. The demonstrated feasibility of the studied MC-CDMA system could lead to its enhancement for outdoor propagation characteristics.

Table 6: Synthesis results in terms of occupied logic elements for the HW implemented functions.
Function: Configuration I / Configuration II
Spreading: 138 / 400
Interleaving: 143 / 325
OFDM modulation: 590 (6 memory blocks) / 655 (6 memory blocks)
Equalisation: 200 / 315
Quick communication ports: 80 / 80

ACKNOWLEDGMENT

This work has been partly carried out within the European Union IST research project MATRICE (MC-CDMA transmission techniques for integrated broadband cellular systems).

REFERENCES

[1] J. Pereira, “A personal perspective of fourth generation,” Telektronikk, vol. 97, no. 1, pp. 20–30, 2001.
[2] M. Arndt, S. Martin, B. Miscopein, V. Bella, L. Bollea, and E. Buracchini, “Software radio: the challenges for reconfigurable terminals,” Annals of Telecommunications, vol. 57, no. 7-8, pp. 570–612, 2002.
[3] S. Hara and R. Prasad, “Overview of multicarrier CDMA,” IEEE Communications Magazine, vol. 35, no. 12, pp. 126–133, 1997.
[4] N. Yee, J. P. Linnartz, and G. Fettweis, “Multi-carrier CDMA in indoor wireless radio networks,” in Proceedings of the IEEE Personal Indoor and Mobile Radio Communications (PIMRC ’93), pp. 109–113, Yokohama, Japan, September 1993.
[5] K. Fazel and L. Papke, “On the performance of convolutionally-coded CDMA/OFDM for mobile communication system,” in Proceedings of the IEEE Personal Indoor and Mobile Radio Communications (PIMRC ’93), pp. 468–472, Yokohama, Japan, September 1993.
[6] A. Chouly, A. Brajal, and S. Jourdan, “Orthogonal multicarrier techniques applied to direct sequence spread spectrum CDMA systems,” in Proceedings of Global Telecommunications Conference (GLOBECOM ’93), pp. 1723–1728, Houston, Tex, USA, November 1993.
[7] V. M. DaSilva and E. S. Sousa, “Performance of orthogonal CDMA codes for quasi-synchronous communication systems,” in Proceedings of the IEEE International Conference on Universal Personal Communications (ICUPC ’93), pp. 995–999, Ottawa, Ont, Canada, October 1993.
[8] M. Hélard, R. Le Gouable, J.-F. Hélard, and J.-Y. Baudais, “Multicarrier CDMA techniques for future wideband wireless networks,” Annals of Telecommunications, vol. 56, no. 5-6, pp. 260–274, 2001.
[9] A. A. Kountouris, C. Moy, and L. Rambaud, “Reconfigurability: a key property in software radio systems,” in First Karlsruhe Workshop on Software Radio, Karlsruhe, Germany, March 2000.
[10] J. Staunstrup and W. Wolf, Hardware/Software Co-Design: Principles and Practice, Kluwer Academic Publishers, Norwell, Mass, USA, 1997.
[11] E. Haas, H. Lang, and M. Schnell, “Development and implementation of an advanced airport data link based on multicarrier communications,” European Transactions on Telecommunications, vol. 13, no. 5, pp. 447–454, 2002.
[12] A. C. McCormick, P. M. Grant, J. S. Thompson, T. Arslan, and A. T. Erdogan, “Implementation of a SIC based MC-CDMA base station receiver,” European Transactions on Telecommunications, vol. 13, no. 5, pp. 513–518, 2002.
[13] A. C. McCormick, P. M. Grant, J. S. Thompson, T. Arslan, and A. T. Erdogan, “A low power MMSE receiver architecture for multi-carrier CDMA,” in Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS ’01), pp. 41–44, Sydney, Australia, May 2001.
[14] S. Nobilet, J.-F. Hélard, and D. Mottier, “Spreading sequences for uplink and downlink MC-CDMA systems: PAPR and MAI minimization,” European Transactions on Telecommunications, vol. 13, no. 5, pp. 465–473, 2002.
[15] P. Höher, S. Kaiser, and P. Robertson, “Two-dimensional pilot-symbol-aided channel estimation by Wiener filtering,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’97), pp. 1845–1848, Munich, Germany, April 1997.
[16] M. Vasilko, L. Machacek, M. Matej, P. Stepien, and S. Holloway, “A rapid prototyping methodology and platform for seamless communication systems,” in Proceedings of 12th IEEE International Workshop on Rapid System Prototyping (RSP ’01), pp. 70–76, Monterey, Calif, USA, June 2001.
[17] I. Seskar and N. B. Mandayam, “A software radio architecture for linear multiuser detection,” in Conference on Information Sciences and Systems, Princeton, NJ, USA, March 1998.
[18] J.-P. Calvez, Embedded Real-Time Systems: A Specification and Design Methodology, vol. 23 of Wiley Series in Software Engineering Practice Ser., John Wiley & Sons, New York, NY, USA, 1993.
[19] T. Grandpierre, C. Lavarenne, and Y. Sorel, “Optimized rapid prototyping for real-time embedded heterogeneous multiprocessors,” in Proceedings of the 7th International Workshop on Hardware/Software Codesign (CODES ’99), pp. 74–78, New York, NY, USA, May 1999.
[20] ETSI, “Project Broadband Radio Access Networks (BRAN), HIPERLAN Type 2; Physical layer,” Technical specification, October 1999.
[21] S. Kaiser, Multi-carrier CDMA mobile radio system-analysis and optimization of detection, decoding, and channel estimation, Ph.D. thesis, University of Munich, Munich, Germany, 1998.
[22] T. Sälzer, D. Mottier, and L. Brunel, “Influence of system load on channel estimation in MC-CDMA mobile radio communication systems,” in Proceedings IEEE Vehicular Technology Conference (VTC ’01), pp. 522–526, Rhodes, Greece, May 2001.
[23] J. P. Calvez, D. Heller, and O. Pasquier, “Uninterpreted cosimulation for performance evaluation of Hw/Sw systems,” in 4th International Workshop on Hardware/Software Co-Design (CODES/CASHE ’96), pp. 132–139, Pittsburgh, Pa, USA, March 1996.
[24] J.-F. Hélard, F. Nouvel, and S. Le Nours, “A MC-CDMA system analysis in a software radio context,” Annals of Telecommunications, vol. 57, no. 7-8, pp. 699–720, 2002.
[25] V. Fresse, O. Déforges, and J.-F. Nezan, “AVSynDEx: a rapid prototyping process dedicated to the implementation of digital image processing applications on multi-DSP and FPGA architectures,” EURASIP Journal on Applied Signal Processing, vol. 2002, no. 9, pp. 990–1002, 2002.
[26] Y. Le Mener, M. Raulet, J.-F. Nezan, A. Kountouris, and C. Moy, “SynDEx executive kernel development for DSPs TI C6X applied to real-time and embedded multiprocessors architectures,” in Proc. 11th European Signal Processing Conference (EUSIPCO ’02), Toulouse, France, September 2002.
[27] F. Nouvel, S. Le Nours, and I. Hermann, “AAA methodology and SynDEx tool capabilities for designing on heterogeneous architecture,” in 18th Conference on Design of Circuits and Integrated Systems (DCIS ’03), Ciudad Real, Spain, November 2003.

Sébastien Le Nours received the Engineering degree in electronics from the ISEN School, Brest, France, in 2000. In 2003, he received the Ph.D. degree in electronics from the National Institute of Applied Sciences (INSA), Rennes, France. His research interests include design methodologies for embedded systems and signal processing techniques, with particular emphasis on multicarrier spread spectrum for mobile communications. He is currently an Assistant Professor at the UBS University and is associated with the LESTER laboratory, Lorient, France.

Fabienne Nouvel received the Engineering degree in electronics from the National Institute of Applied Sciences (INSA), Rennes, France, in 1985. She worked for 5 years in the networks domain. In 1994, she received the Ph.D. degree in electronics from the INSA, Rennes. Since 1995, she has been an Associate Professor in electronics at INSA. Her research topics include electronics, signal processing techniques, especially spread spectrum for embedded indoor and outdoor communications, and design methodologies for heterogeneous systems.

Jean-François Hélard received his Dipl.Ing. degree from the National Institute of Applied Sciences (INSA), Rennes, France, in 1981.
From 1982 to 1997, he was a Research Engineer and then Head of the channel coding for digital broadcasting research group at the CCETT (France Telecom Research Center) in Rennes, where he worked successively on digital audio broadcasting (DAB) within EUREKA 147 and on terrestrial digital video broadcasting (DVB-T) within the framework of the European project dTTb. He received the Ph.D. degree in electronics in 1992 and joined INSA in 1997, where he is currently a Professor and Head of the Systems, Propagation and Radars group of the Rennes Institute for Electronics and Telecommunications (IETR), which is attached to the French National Centre for Scientific Research (CNRS). His present research interests lie in signal processing techniques for digital communications, such as space-time and channel coding, multicarrier modulation, spread spectrum, and multiuser communications. He is the author or coauthor of more than 48 technical papers in international scientific journals and conferences, and holds 11 European patents.
