2015 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 17–20, 2015, BOSTON, USA

AN ONLINE LEARNING APPROACH TO THROUGHPUT OPTIMIZATION IN WIRELESS NETWORKS UNDER DYNAMIC AND UNKNOWN INTERFERENCE CONDITIONS

Ramesh Annavajjala, Rami S. Mangoubi, Christopher C. Yu and James M. Zagami
The Charles Stark Draper Laboratory
555 Technology Square, Cambridge, MA, 02139

ABSTRACT

In this paper, we consider a multi-user communication system with dynamically varying interference on block fading channels. We focus on a multi-antenna receiver, single-antenna transmitters, and the case in which the receiver has no knowledge of the channel state information, the interference dynamics, or the variance of the additive noise. Pilot-assisted transmission techniques are employed to enable channel estimation at the receiver. For a given channel coherence length, increasing the number of pilots improves the estimation accuracy at the cost of reduced data throughput. Thus, we propose to optimize the pilot content within the data frame to maximize the average data throughput. We employ well-known cross-validation techniques from the machine learning literature to simultaneously improve the estimation accuracy and the average throughput. Simulation results with the proposed approach suggest that even when the average number of active interferers is larger than the number of degrees of freedom, at least 85% of the ideal throughput can be achieved with the optimum pilot overhead.

Index Terms: multiple-access communication, dynamic interference, optimum combining, diagonal loading, cross-validation

1. INTRODUCTION

1.1. Background

The recent decade has witnessed near-exponential growth in wireless data usage [1], and we are marching towards a fifth generation wireless technology to provide data rates of the order of multi-gigabits [2]. With network densification via small cells, and due to spatial-division multiple access (SDMA), interference in wireless networks has become one of the major bottlenecks to providing reliable communication.
A large body of research has focused on characterizing the effects of interference on system performance [3]. A well-known approach to signal detection in the presence of interference is the optimum combining receiver [4]. The optimum combining receiver is essentially a linear minimum-mean-square error (L-MMSE) receiver that maximizes the signal-to-interference-plus-noise ratio (SINR) at the output of the combiner. This is to be contrasted with the maximal ratio combining (MRC) receiver, which maximizes the signal-to-noise ratio (SNR) and is optimal when there is no interference. While MRC requires knowledge of the instantaneous channel state information (CSI) and the noise variance, the L-MMSE receiver additionally requires the instantaneous interference covariance matrix. Also, in a dynamic interference environment, the number of active interferers can be random, in which case neither the MRC nor the L-MMSE receiver is optimal. To enable practical implementation of these algorithms, known (or pilot) symbols are typically inserted within the data frame so that the receiver can estimate the CSI of the desired and interfering users. Although pilot symbols can improve the estimation accuracy, they lead to a reduction in the data throughput.

978-1-4673-7454-5/15/$31.00 ©2015 IEEE

Traditionally, interference mitigation algorithms in the literature assume a fixed number of interferers, and focus on detection performance with either known or estimated CSI [5]. On the other hand, using random set theory and approximate Bayesian recursions, optimal joint detection of multiple users is formulated in [6, 7]. Since sample matrix inversion (SMI) requires a sample covariance matrix estimated from at least as many samples as the number of receive antennas, the impact of diagonal loading on the SMI-based L-MMSE receiver performance is studied in [8]. Also, the robustness of the Capon beamformer is studied in [9], wherein the authors employ the Lagrangian multiplier methodology to precisely compute the diagonal loading based on the ellipsoidal uncertainty set of the array steering vector. While many works have addressed computation of the optimal diagonal loading, these approaches assume either known channel statistics or a deterministic number of interferers [10, 11].

Recently, there has been growing interest in the application of machine learning (ML) [12] techniques to wireless communications and networking [13]-[15], as problems in ML and communication share many similarities. For example, regression in ML is closely tied to continuous-valued parameter estimation in communication, whereas classification in ML bears similarity to detection of finite-dimensional signal constellations. In particular, for cognitive wireless networks, the authors in [16] use support vector machines to address the channel and modulation selection problem, whereas the radio-frequency channel characterization problem is studied in [17]. Using unsupervised learning, [18] studies the robust signal classification problem.

1.2. Problem Statement

In this paper, we address the problem of interference mitigation when the interference is dynamically varying and when the receiver has no knowledge of the CSI and the noise variance. We focus on a block fading channel model with single-antenna transmitters and multiple antennas at the receiver, and employ pilot-assisted transmission techniques to enable channel estimation at the receiver. Because increasing the number of pilots improves the channel estimation accuracy but the resulting overhead reduces the data throughput, we propose to optimize the pilot content within the data frame to maximize the average data throughput. We employ well-known cross-validation (CV) techniques from the machine learning literature to simultaneously improve the estimation accuracy and the average throughput. For practical channel coherence lengths, and for binary modulations, our simulation results suggest that significant throughput improvement can be achieved with minimal pilot overhead even when the total number of users is larger than the number of receive antennas.

The paper is organized as follows. In Section 2, we introduce the system model that captures dynamic interference conditions. The problem formulation is described in Section 3. A cross-validation approach to parameter estimation and signal detection is detailed in Section 4, and simulation results are presented in Section 5. We conclude this work in Section 6.

2. SYSTEM MODEL

Notation: Lower-case bold-faced variables denote column vectors (e.g., $\mathbf{x}$) whereas upper-case bold-faced variables denote matrices (e.g., $\mathbf{A}$). The identity matrix of size $N \times N$ is denoted by $\mathbf{I}_N$. The transpose (or Hermitian) of a vector or a matrix is denoted by $(\cdot)^{\top}$ (or $(\cdot)^{\dagger}$). A complex (or real)-Gaussian random vector (cgRV or rgRV) $\mathbf{x}$ with mean $\mathbf{m}$ and covariance matrix $\mathbf{C}$ is denoted by $\mathbf{x} \sim \mathcal{CN}(\mathbf{m}, \mathbf{C})$ (or $\mathbf{x} \sim \mathcal{N}(\mathbf{m}, \mathbf{C})$). The expectation operator is denoted by $\mathbb{E}[\cdot]$. The size of (or the number of elements in) a set $\mathcal{S}$ is denoted by $|\mathcal{S}|$. For a scalar/vector/matrix $\square$, $\Re\{\square\}$ denotes the corresponding real part.

We consider a communication link with a desired transmitter and its receiver, which is affected by a number of interfering transmitters. The maximum number of interferers is denoted by $K_{\max}$. In this work, we focus on the case when each transmitting node is equipped with a single transmit antenna. The desired receiver is assumed to have $N_R$ receive antennas. The air interface is such that a channel use is defined as communication on a specific frequency tone during a symbol period. This model, for example, corresponds to an OFDMA (orthogonal frequency-division multiple access) air interface.
We consider a block fading channel model wherein the channel remains constant within a block of $N$ channel uses, and varies slowly across the blocks. We denote the block index by $b$, and the index of the channel use within a block by $n$. Assuming perfect symbol synchronization at the receiver, the $N_R \times 1$-dimensional signal vector at the receiver can be written as

$$
\begin{aligned}
\mathbf{y}(b,n) &= \mathbf{h}_0(b)\,\alpha_0(b)\,x_0(b,n) + \sum_{k=1}^{K_{\max}} \mathbf{h}_k(b)\,\alpha_k(b)\,x_k(b,n) + \mathbf{v}(b,n) \\
&= \mathbf{H}(b)\,\boldsymbol{\alpha}(b)\,\mathbf{x}(b,n) + \mathbf{v}(b,n),
\end{aligned}
\tag{1}
$$

where $x_0(b,n)$ is the desired user's signal, $\mathbf{h}_0(b)$ is the $N_R \times 1$-dimensional channel from the desired user, $x_k(b,n)$ is the signal from the $k$-th interferer, $\mathbf{h}_k(b)$ is the $N_R \times 1$-dimensional channel from the $k$-th interferer, and $\mathbf{v}(b,n)$ is the additive noise with mean $\mathbf{0}$ and spatial covariance matrix $\mathbf{R}$. We also have $\mathbf{H}(b) = [\mathbf{h}_0(b), \ldots, \mathbf{h}_{K_{\max}}(b)]$, the global channel matrix; $\boldsymbol{\alpha}(b) = \mathrm{diag}\left([\alpha_0(b), \ldots, \alpha_{K_{\max}}(b)]^{\top}\right)$, the diagonal matrix of the user activation factors; and $\mathbf{x}(b,n) = [x_0(b,n), \ldots, x_{K_{\max}}(b,n)]^{\top}$, the vector of symbols of all the users. For simplicity, we assume $\mathbf{v}(b,n)$ to be independent and identically distributed (i.i.d.) complex-Gaussian across channel uses.

The coefficients $\alpha_k(b)$ represent the activity factors for user $k$, $k = 0, \ldots, K_{\max}$. A simple model for $\alpha_k(b)$ is an i.i.d. Bernoulli distribution. That is, $\alpha_k(b) = 1$ with probability $p_k$, and $\alpha_k(b) = 0$ with probability $1 - p_k$. In this work, we set $p_0 = 1$ and $p_k = p$, $k = 1, \ldots, K_{\max}$. That is, the desired user is always present in the received signal model of (1). Note that when $\alpha_k(b) = 1$, $\forall k = 0, \ldots, K_{\max}$, detection of $x_0(b,n)$ using a linear receiver requires $N_R \geq K_{\max} + 1$. It is important to realize that determining the set of active interferers from the received signal model in (1) is closely related to the model order determination problem [19]. With knowledge of the sample correlation matrix of the received signal and the underlying noise variance, this problem is well-studied in [20]-[24].

We note that a block of $N$ symbols, in practice, is generally partitioned into $N_P$ pilot symbols and $N_D$ data symbols. The pilot symbols enable the receiver to estimate the channel parameters, whereas the information is carried within the data symbols. At the receiver, a realistic assumption is that the pilot symbols from the desired transmitter are known whereas those from the interfering transmitters are unknown. Without loss of generality, the first $N_P$ positions of the block are assumed to contain pilots. As a result, for $n = 1, \ldots, N_P$, we set $x_0(b,n) = 1$ and $x_k(b,n) = \pm 1$, with equal probability, for $k = 1, \ldots, K_{\max}$. The remaining $N_D$ of the $N$ symbols contain the modulation data that must be detected at the desired receiver. Note that each transmitter can employ a modulation format that is different from the other transmitters. For simplicity, we assume a common signal constellation that has $M$ modulation symbols.

The channels $\mathbf{h}_k(b)$, $k = 0, \ldots, K_{\max}$, can have a variety of distributions that strongly depend on the propagation environment. For simplicity, we assume that $\mathbf{h}_k(b) \sim \mathcal{CN}(\mathbf{0}, \Omega_k \mathbf{I}_{N_R})$. This model assumes a rich scattering environment in which the channel gains are spatially uncorrelated, which is a valid assumption for widely spaced antenna elements. The variable $\Omega_k$ captures the distance-dependent average channel power from user $k$.
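As an illustration of the signal model in (1), the following Python sketch draws one coherence block with Rayleigh-fading channels, Bernoulli activity factors and binary symbols. It is only a sketch under stated assumptions: the names `crandn`, `simulate_block`, `omega` and `sigma2` are hypothetical and do not come from the paper, all users are assumed to transmit binary ($\pm 1$) symbols, the noise is assumed white with variance `sigma2` per antenna, and only the standard library is used.

```python
import random

def crandn(rng, var=1.0):
    """Draw one circularly symmetric complex Gaussian sample with variance `var`."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def simulate_block(n_r, k_max, n, n_p, p, omega, sigma2, rng):
    """Generate one coherence block of the received-signal model (1)."""
    # Channels h_k ~ CN(0, omega[k] * I), constant over the block (k = 0: desired user).
    h = [[crandn(rng, omega[k]) for _ in range(n_r)] for k in range(k_max + 1)]
    # Activity factors: desired user always active, interferers Bernoulli(p).
    alpha = [1] + [1 if rng.random() < p else 0 for _ in range(k_max)]
    x, y = [], []
    for t in range(n):
        # Desired pilots equal 1; every other symbol is +/-1 with equal probability.
        x0 = 1 if t < n_p else rng.choice((-1, 1))
        xs = [x0] + [rng.choice((-1, 1)) for _ in range(k_max)]
        # Received vector: sum over users of h_k * alpha_k * x_k, plus white noise.
        yt = [sum(h[k][a] * alpha[k] * xs[k] for k in range(k_max + 1))
              + crandn(rng, sigma2) for a in range(n_r)]
        x.append(xs)
        y.append(yt)
    return h, alpha, x, y

# Toy usage: 4 antennas, up to 5 interferers, 20 channel uses of which 5 are pilots.
rng = random.Random(0)
h, alpha, x, y = simulate_block(n_r=4, k_max=5, n=20, n_p=5, p=0.5,
                                omega=[1.0] * 6, sigma2=0.1, rng=rng)
```

Here `y[t]` holds the $N_R$ received samples of channel use `t`, and `alpha` records which interferers happened to be active in the block; the receiver in the paper's setting would observe only `y` and the desired user's pilots.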
We also assume spatial independence of fading channels across the users.

3. OPTIMUM ONLINE LEARNING

Our goal is to devise algorithms for channel parameter estimation and signal detection in dynamic interference conditions. From an implementation standpoint, we constrain the receiver to use linear detection algorithms (such as zero-forcing or L-MMSE receiver approaches). Note that, when $p = 1$, linear receivers are feasible only if $K_{\max} \leq N_R - 1$. However, the average number of interferers is $K_{\max}\,p$, which can be significantly less than $N_R$ depending upon the value of $p$, and a linear receiver with a fixed number of receive antennas can potentially withstand a group of interferers that is larger than $N_R - 1$.

We note that the number of symbols within a coherence block, $N$, is a function of the channel selectivity in time and frequency. Roughly, $N$ is proportional to the product of the channel coherence lengths in time and frequency. Also, $N$ should be higher than $N_R$ to make estimation of the channel covariance matrix at the receiver feasible using a portion of the pilot symbols. We also assume flexibility in our choice of $N_P$ and $N_D$ such that $N = N_P + N_D$. Note that choosing a higher $N_P$ provides good channel estimation accuracy, but at the cost of information rate. In this work, the noise covariance matrix is set to $\mathbf{R} = \sigma_n^2 \mathbf{I}_{N_R}$, and, in addition to the channel gains, the receiver does not have knowledge of $\sigma_n^2$.

With linear processing constraints at the receiver, let us denote by $\mathbf{w}_b$ the weight vector employed within block $b$ to detect the symbols $x_0(b,n)$, $n = N_P + 1, \ldots, N$. The detected symbol $\hat{x}_0(b,n)$ is simply

$$
\hat{x}_0(b,n) = \mathrm{slicer}\left\{\mathbf{w}_b^{\dagger}\,\mathbf{y}(b,n),\, \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N,
\tag{2}
$$

where $\mathcal{S}$ is the signal constellation employed by the desired user, and $\mathrm{slicer}\{z, \mathcal{S}\} = \arg\min_{x \in \mathcal{S}} |z - x|^2$ is the inverse mapping of the complex-valued signal $z$ to the nearest modulation symbol within $\mathcal{S}$. Note that since the channel gains and the noise variance are unknown, pilot symbols are used to estimate these parameters, which in turn are used to form the weight vector $\mathbf{w}_b$. The fraction of symbols that are correctly detected is termed the normalized throughput, and is given by

$$
\mathcal{T}_b = \frac{\sum_{n=N_P+1}^{N} \mathbb{1}_{\{\hat{x}_0(b,n) \equiv x_0(b,n)\}}}{N},
\tag{3}
$$

where $\mathbb{1}_{A}$ is the indicator function that evaluates to 1 when the event $A$ is true, and to 0 when $A$ is false. Our goal is, for a block length $N$, to optimally allocate the pilots and data to maximize the average normalized throughput, $\mathbb{E}[\mathcal{T}_b]$. More formally, the optimization problem is:

$$
\begin{aligned}
\left[N_{P,opt},\, N_{D,opt}\right] &= \arg\max_{N_P, N_D} \mathbb{E}[\mathcal{T}_b] \quad \text{subject to } N_P + N_D = N \\
&= \arg\max_{N_P, N_D} \frac{N_D}{N}\left(1 - \bar{P}_s\right) \quad \text{subject to } N_P + N_D = N,
\end{aligned}
\tag{4}
$$

where $\bar{P}_s$ is the average symbol error probability. Since the constraints are integer valued, and $\bar{P}_s$ is analytically intractable, it is rather hard to solve (4) analytically. Further, with pilot-based channel estimation, $\bar{P}_s$ itself is a function of $N_P$. To proceed further, we choose a set of pilot/data partitions, $\mathbf{n}^{(i)} = \left[N_P^{(i)}, N_D^{(i)}\right]^{\top}$, such that $N_P^{(i)} + N_D^{(i)} = N$. For each partition $i$, we employ cross-validation principles from the ML literature for robust weight vector computation, and record the throughput achieved, $\mathbb{E}[\mathcal{T}_b]^{(i)}$. The optimal partition, $i^{*}$, is simply $i^{*} = \arg\max_i \mathbb{E}[\mathcal{T}_b]^{(i)}$. The main advantage of this approach is that the search complexity is fully controlled by the number of partitions, and we only need to search around the small-to-moderate pilot sizes. Since many parameters in the model (1) are unknown, we expect cross-validation approaches to provide best-in-class estimation and detection performance for both in-sample and out-of-sample data.

4. CROSS VALIDATION APPROACH

4.1. Ideal Performance

Before we embark on cross-validation approaches to the throughput optimization problem in (4), we first look at the best possible performance under ideal channel knowledge. This ideal performance also serves as an upper bound on what is achievable by any learning algorithm. With ideal channel knowledge, we drop the index of the coherence block $b$. Within a coherence block, we denote by $\mathcal{I} = \{i_1, \ldots, i_K\}$ the set of active interferers. The instantaneous interference channel matrix can then be denoted by $\mathbf{H}_{\mathcal{I}}$, which is given by

$$
\mathbf{H}_{\mathcal{I}} = \left[\mathbf{h}_{i_1}, \ldots, \mathbf{h}_{i_K}\right].
\tag{5}
$$

Having knowledge of $\mathbf{h}_0$, $\mathbf{H}_{\mathcal{I}}$ and the noise variance $\sigma_n^2$, the linear MMSE weight vector at the receiver is

$$
\mathbf{w}_{\mathrm{eff}} = \frac{\mathbf{R}_{ideal}^{-1}\,\mathbf{h}_0}{\mathbf{h}_0^{\dagger}\,\mathbf{R}_{ideal}^{-1}\,\mathbf{h}_0},
\tag{6}
$$

where $\mathbf{R}_{ideal}$ is the ideal channel covariance matrix, which is given by

$$
\mathbf{R}_{ideal} = \mathbf{h}_0 \mathbf{h}_0^{\dagger} + \sum_{k \in \mathcal{I}} \mathbf{h}_k \mathbf{h}_k^{\dagger} + \sigma_n^2\,\mathbf{I}_{N_R},
\tag{7}
$$

and the denominator in (6) ensures that $\mathbf{w}_{\mathrm{eff}}^{\dagger}\mathbf{h}_0 = 1$ so that the desired user's signal, upon the application of $\mathbf{w}_{\mathrm{eff}}$, has no channel-specific scaling. The equalized symbols for the desired user are given by

$$
\hat{x}_0(n) = \mathrm{slicer}\left\{\mathbf{w}_{\mathrm{eff}}^{\dagger}\,\mathbf{y}(b,n),\, \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N.
\tag{8}
$$

The probability of error in detecting $x_0(n)$ when all the users employ a binary constellation is given by

$$
\bar{P}_s = \mathbb{E}\left[\mathcal{Q}\left(\sqrt{2}\,\frac{\Re\left\{\mathbf{w}_{\mathrm{eff}}^{\dagger}\mathbf{h}_0 + \sum_{k \in \mathcal{I}} \mathbf{w}_{\mathrm{eff}}^{\dagger}\mathbf{h}_k x_k\right\}}{\sigma_n \left\|\mathbf{w}_{\mathrm{eff}}\right\|}\right)\right],
\tag{9}
$$

where $\mathcal{Q}(x)$ is the complementary cumulative distribution function of a standard Gaussian random variable, and the expectation is over $\mathbf{h}_0$ and, for $k \in \mathcal{I}$, $\{\mathbf{h}_k, x_k\}$. In (9), $x_k = \pm 1$, with equal probability, are the modulation symbols of the $k$th active interferer. We also note that when there is no interference (i.e., $K = 0$), the optimal detection rule is maximal ratio combining (MRC) with the weights $\mathbf{w} = \mathbf{h}_0 / \|\mathbf{h}_0\|^2$, and the error probability in (9) simplifies to

$$
\bar{P}_{s,MRC,K=0} = \mathbb{E}\left[\mathcal{Q}\left(\sqrt{2}\,\frac{\|\mathbf{h}_0\|}{\sigma_n}\right)\right],
\tag{10}
$$

where the expectation is over the channel $\mathbf{h}_0$. Since (9) and (10) are not functions of $N_P$, it follows that the optimal throughput is achieved by setting $N_P = 0$ and $N_D = N$. That is, as one would expect, with genie-aided channel information, all the symbols within a block are used for data transmission.

4.2. Channel Estimation and Beamforming via Cross-Validation

We now describe a procedure that performs channel estimation, signal detection, and optimization of training and data phases to maximize the normalized throughput. We first divide the pilot portion of the frame into training and validation phases. We define by $\delta$ the ratio between the number of symbols for training and the number of pilot symbols. With this, $N_{P,t} = \delta N_P$ is the number of pilot symbols available for training and $N_{P,v} = (1 - \delta) N_P$ is the number of pilot symbols available for validation. The set of pilot indices $\mathcal{I}_P$ is partitioned into $\mathcal{I}_t$ and $\mathcal{I}_v$ such that $\mathcal{I}_t$ contains the pilot indices for training, whereas $\mathcal{I}_v$ contains the indices for validation (testing). Using $\mathcal{I}_t$, a sample-mean based channel estimate is

$$
\hat{\mathbf{h}}_0(b) = \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \mathbf{y}(b,n) = \mathbf{h}_0(b) + \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \sum_{k \in \mathcal{I}} \mathbf{h}_k(b)\,x_k(b,n) + \mathbf{v}_t(b),
\tag{11}
$$

where the second term in (11) is the inter-user (or multiple-access) interference, and $\mathbf{v}_t(b) \sim \mathcal{CN}\left(\mathbf{0}, \frac{\sigma_n^2}{|\mathcal{I}_t|}\mathbf{I}_{N_R}\right)$ is the channel estimation error (in the absence of any interference). An estimate of the overall covariance matrix using $\mathcal{I}_t$ is

$$
\hat{\mathbf{R}}_{total}(b) = \frac{1}{|\mathcal{I}_t|} \sum_{n \in \mathcal{I}_t} \mathbf{y}(b,n)\,\mathbf{y}^{\dagger}(b,n),
\tag{12}
$$

and an estimate of the noise variance is given by

$$
\hat{\sigma}_n^2 = \frac{1}{|\mathcal{I}_t|\,N_R} \sum_{n \in \mathcal{I}_t} \left\|\mathbf{y}(b,n) - \hat{\mathbf{h}}_0(b)\right\|^2.
\tag{13}
$$

Note that the estimate (13) is biased, and this bias can be corrected relatively easily only when there is no interference. We propose the following weight vector to detect the desired user's modulation symbols:

$$
\mathbf{w}_{\lambda}(b) = \frac{\hat{\mathbf{R}}_{est,\lambda}^{-1}(b)\,\hat{\mathbf{h}}_0(b)}{\hat{\mathbf{h}}_0^{\dagger}(b)\,\hat{\mathbf{R}}_{est,\lambda}^{-1}(b)\,\hat{\mathbf{h}}_0(b)},
\tag{14}
$$

where

$$
\hat{\mathbf{R}}_{est,\lambda}(b) = \hat{\mathbf{R}}_{total}(b) + \lambda\,\frac{\mathrm{trace}\left(\hat{\mathbf{R}}_{total}(b)\right)}{N_R}\,\mathbf{I}_{N_R}
\tag{15}
$$

is an estimated covariance matrix of the received signal augmented with a diagonal load that is parameterized by $\lambda$. We note that (14) provides a robust beamformer in the presence of unknown noise variance and dynamic interference, and, unlike [8], [9], [10], and [11], we determine the optimal $\lambda$ solely based on the received data and known pilot symbols without regard to the statistics of interference and noise. The detected symbols using (14) are simply

$$
\hat{x}_0(n) = \mathrm{slicer}\left\{\mathbf{w}_{\lambda}^{\dagger}(b)\,\mathbf{y}(b,n),\, \mathcal{S}\right\}, \quad n = N_P + 1, \ldots, N.
\tag{16}
$$

Using the fact that $x_0(b,n) = 1$ for $n \in \mathcal{I}_v$, an optimal $\lambda$ can be obtained by minimizing the sample MSE between the estimated and true pilot symbols, or by minimizing the sample error rate between the detected and true pilot symbols. That is,

$$
\lambda_{\star,MSE} = \arg\min_{\lambda \in \boldsymbol{\Lambda}} \frac{1}{|\mathcal{I}_v|} \sum_{n \in \mathcal{I}_v} \left|1 - \mathbf{w}_{\lambda}^{\dagger}(b)\,\mathbf{y}(b,n)\right|^2
\tag{17}
$$

is the optimal $\lambda$ that minimizes the sample MSE in the testing set, and

$$
\lambda_{\star,BER} = \arg\min_{\lambda \in \boldsymbol{\Lambda}} \frac{1}{|\mathcal{I}_v|} \sum_{n \in \mathcal{I}_v} \mathbb{1}_{\left\{\mathrm{sign}\left\{\Re\left\{\mathbf{w}_{\lambda}^{\dagger}(b)\,\mathbf{y}(b,n)\right\}\right\} \neq 1\right\}}
\tag{18}
$$

is the optimal $\lambda$ that minimizes the sample BER in the testing set. Note that in (17) and (18) $\boldsymbol{\Lambda}$ is the set of values of $\lambda$ that the receiver must search over, and the overall detection complexity grows linearly with $|\boldsymbol{\Lambda}|$. Once the optimal $\lambda$ is found, the receiver employs all the pilot symbols to estimate the channel, the overall covariance matrix, and the noise variance. The resulting weight vector $\mathbf{w}_{\lambda_\star}(b)$ is used to detect all the data symbols within the frame. We refer to the optimal beamformer based on (17) as the MSE-CV-BF, and to the one based on (18) as the BER-CV-BF. The conventional beamformer without CV is termed C-BF; it is obtained by using all the pilots to estimate the desired channel, the interference-plus-noise covariance matrix, and the additive noise variance, with the diagonal load set simply to the noise variance.

5. SIMULATION RESULTS

5.1. Parameters and Methodology

In all the simulations, we set $N_R = 4$ receive antennas, employ binary constellations for all the users (i.e., $\mathcal{S} = \{-1, +1\}$), and set $\delta = 0.8$ (i.e., 80% of pilots for training and the remaining 20% for validation), which is a general recommendation from the ML literature [12]. The channel coherence length $N$ is set to 1000 symbols, and the activity factor of interferers, $p$, is set to 0.5. The diagonal load search window $\boldsymbol{\Lambda}$, in dB, is chosen from $[-20, 20]$ in increments of 2. For each data/pilot partition, we generate 100000 independent realizations of (1). For each realization, we compute the optimal weight vector from (14) with the sample-MSE-minimizing $\lambda_{\star,MSE}$ from (17), or the sample-BER-minimizing $\lambda_{\star,BER}$ from (18). Using the optimal BF, the data symbols are detected as per (16), and the normalized throughput per realization is computed as per (3). Upon averaging (3) over the realizations, we obtain $\mathbb{E}[\mathcal{T}_b]$. In all the simulations, the throughput is further normalized by the ideal throughput with perfect CSI at the receiver.

The interference is modeled in two different approaches. In the first approach, all the active interferers transmit at the same average power level, which is denoted by $\bar{\gamma}_I = \Omega_i / \sigma_n^2$, $i \in \mathcal{I}$, relative to the thermal noise power. If $\bar{\gamma}_0 = \Omega_0 / \sigma_n^2$ denotes the average received SNR from the desired user, then $\bar{\gamma}_0 / \bar{\gamma}_I$ denotes the average carrier-to-interference ratio (CIR). In the second approach, each interferer is assumed to transmit at a power level that is uniformly distributed within $[-\Delta, \Delta]$ dB relative to a nominal value of $\bar{\gamma}_I$. This model allows for distance-dependent power variations and/or any residual errors due to open-loop power control. Under the second approach, we set the nominal CIR to be 0 dB and $\Delta = 10$ dB.

5.2. Results and Observations

Under the first approach to interference modeling, the throughput is plotted as a function of $N_P$ for $\bar{\gamma}_0 \in \{0, 20\}$ dB. Fig. 1 depicts the throughput performance when $K_{\max} = 0$, whereas Fig. 2 assumes $K_{\max} = 5$ and $p = 0.5$. We observe from Fig. 1 that, in the absence of any interference, there is very little to be gained from the CV approach as the optimum load is 0. In fact, at very low SNRs (i.e., $\bar{\gamma}_0 = 0$ dB) and at very low $N_P$ there is a small degradation in performance with both MSE-CV-BF and BER-CV-BF relative to the C-BF. As the SNR increases to 20 dB, all the approaches yield identical performances. However, with interference, the performances are remarkably different, as shown in Fig. 2. From Fig. 2, we observe that, at both lower and higher operating SNRs, the proposed MSE-CV-BF and BER-CV-BF approaches significantly outperform C-BF. For example, with 1% pilot overhead, the throughput of MSE-CV-BF is around 83%, which is 15% higher than that of C-BF. We also notice that at lower $N_P$, MSE-CV-BF has a small advantage over BER-CV-BF, whereas at higher SNR and with larger $N_P$ these two approaches have comparable performances.

Fig. 1: Normalized throughput under the first approach to interference modeling. Here, $K_{\max} = 0$, $\delta = 0.8$, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols. Panels: (a) SNR = 0 dB, (b) SNR = 20 dB. Axes: number of pilots (10 to 100) versus normalized throughput. Curves: With CV: Based on MSE; With CV: Based on mean pilot error; Without CV.

Fig. 2: Normalized throughput under the first approach to interference modeling. Here, $K_{\max} = 5$, $p = 0.5$, $\delta = 0.8$, CIR = -10 dB, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols. Panels: (a) SNR = 0 dB, (b) SNR = 20 dB. Axes: number of pilots (10 to 100) versus normalized throughput. Curves: With CV: Based on MSE; With CV: Based on mean pilot error; Without CV.

Under the second approach to interference modeling, Fig. 3 considers an over-loaded scenario with $K_{\max} = 10$ and $p = 0.5$. Note that the average number of interferers in this case is 5, which is higher than $N_R - 1 = 3$. When $\bar{\gamma}_0 = 0$ dB, we see that the normalized throughput with C-BF peaks around 87% with $N_P = 50$, whereas with just 25 pilots the normalized throughput improves to 92% with the MSE-CV-BF. As the SNR increases to 20 dB, we see a slight dip (to around 86% at $N_P = 50$) in the normalized throughput of C-BF, whereas it increases to around 93% with the MSE-CV-BF. We also observe that, in the region of higher pilot overhead, the BER-CV-BF has a slightly inferior performance compared with the MSE-CV-BF at lower SNRs, and the two approaches have comparable performances as the SNR increases. However, for lower pilot overhead, MSE-CV-BF offers superior performance compared with BER-CV-BF.

Fig. 3: Normalized throughput under the second approach to interference modeling. Here, $K_{\max} = 10$, $p = 0.5$, $\delta = 0.8$, CIR = 0 dB, $\Delta = 10$ dB, $N_R = 4$ antennas, and a frame length of $N = 1000$ symbols. Panels: (a) SNR = 0 dB, (b) SNR = 20 dB. Axes: number of pilots (10 to 100) versus normalized throughput. Curves: With CV: Based on MSE; With CV: Based on mean pilot error; Without CV.

6. CONCLUSION

Traditionally, interference mitigation algorithms in the literature have focused on either identifying/estimating a deterministic number of interference channels or employing a variety of receivers with either ideal or estimated channels. In this work, we have addressed the problem of robust interference mitigation with linear receivers for data throughput optimization when the receiver has no knowledge of the channel statistics, and when the interference itself is dynamically varying across the channel coherence length. Using cross-validation principles from machine learning, we have obtained the optimum data and pilot allocation to maximize the average throughput. Our results have shown that even when the average number of active interferers, $K_{\max}\,p$, is larger than the number of degrees of freedom, $N_R - 1$, at least 85% of the normalized throughput can be achieved with the optimum pilot overhead.

7. REFERENCES

[1] Cisco White Paper, “Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2014–2019.” Available at http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white_paper_c11-520862.html.

[2] J. G. Andrews, S. Buzzi, W. Choi, S. V. Hanly, A. Lozano, A. C. K. Soong, and J. C. Zhang, “What will 5G be?,” IEEE Journal on Selected Areas in Commun., vol. 32, no. 6, pp. 1065-1082, June 2014.

[3] P. Stavroulakis, Ed., Interference Analysis of Communication Systems, IEEE Press Selected Reprint Series, 1980.

[4] J. H. Winters, “Optimum combining in digital mobile radio with cochannel interference,” IEEE Journal on Selected Areas in Commun., vol. 2, no. 4, pp. 528-539, July 1984.

[5] M. L. Honig, Ed., Advances in Multiuser Detection, John Wiley & Sons, 2009.

[6] E. Biglieri and M. Lops, “Multiuser detection in a dynamic environment. Part I: User identification and data detection,” IEEE Trans. Info. Theory, vol. 53, no. 9, pp. 3158-3170, Sep. 2007.

[7] E. Biglieri and M. Lops, “Multiuser detection in a dynamic environment. Part II: Joint user identification and parameter estimation,” IEEE Trans. Info. Theory, vol. 55, no. 5, pp. 2365-2374, May 2009.

[8] B. D. Carlson, “Covariance matrix estimation errors and diagonal loading in adaptive arrays,” IEEE Trans. Aerospace and Electronics Systems, vol. 24, no. 4, pp. 397-401, July 1988.

[9] J. Li, P. Stoica, and Z. Wang, “On robust Capon beamforming and diagonal loading,” IEEE Trans. Signal Processing, vol. 51, no. 7, pp. 1702-1715, July 2003.

[10] N. Ma and J. Goh, “Efficient method to determine diagonal loading value,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Processing, vol. V, 2003, pp. 341-344.

[11] X. Mestre and M. A. Lagunas, “Finite sample size effect on minimum variance beamformers: Optimum diagonal loading factor for large arrays,” IEEE Trans. Sig. Processing, vol. 54, no. 1, pp. 69-82, Jan. 2006.

[12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Ed., Springer & Sons, 2013.

[13] C. Clancy, J. Hecker, E. Stuntebeck, and T. O'Shea, “Applications of machine learning to cognitive radio networks,” IEEE Wireless Commun., vol. 14, no. 4, pp. 47-52, Aug. 2007.

[14] A. He, K. K. Bae, T. Newman, J. Gaeddert, K. Kim, R. Menon, L. Morales-Tirado, J. Neel, Y. Zhao, J. Reed, and W. Tranter, “A survey of artificial intelligence for cognitive radios,” IEEE Trans. Vehicular Technol., vol. 59, no. 4, pp. 1578-1592, May 2010.

[15] M. Bkassiny, Y. Li, and S. K. Jayaweera, “A survey on machine-learning techniques in cognitive radios,” IEEE Comm. Surveys & Tutorials, vol. 15, no. 3, pp. 1136-1159, Third Quarter 2013.

[16] G. Xu and Y. Lu, “Channel and modulation selection based on support vector machines for cognitive radio,” in Proc. International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM), Sept. 2006, pp. 1-4.

[17] T. Atwood, “RF channel characterization for cognitive radio using support vector machines,” Ph.D. dissertation, University of New Mexico, Nov. 2009.

[18] T. Clancy, A. Khawar, and T. Newman, “Robust signal classification using unsupervised learning,” IEEE Trans. on Wireless Commun., vol. 10, no. 4, pp. 1289-1299, Apr. 2011.

[19] P. D. Grunwald, The Minimum Description Length Principle, The MIT Press, 2007.

[20] H. Akaike, “A new look at the statistical model identification,” IEEE Trans. Automat. Contr., vol. 19, pp. 716-723, 1974.

[21] G. Schwarz, “Estimating the dimension of a model,” Ann. Stat., vol. 6, pp. 461-464, 1978.

[22] J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, pp. 465-471, 1978.

[23] M. Wax and T. Kailath, “Detection of signals by information theoretic criteria,” IEEE Trans. on Acoustic, Speech, and Signal Processing (ASSP), vol. 33, pp. 387-392, Apr. 1985.

[24] R. R. Nadakuditi and A. Edelman, “Sample eigenvalue based detection of high-dimensional signals in white noise using relatively few samples,” IEEE Trans. Sig. Processing, vol. 56, no. 7, pp. 2625-2638, July 2008.
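As a supplementary illustration of Section 4.2, the cross-validated choice of the diagonal load can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`crandn`, `solve`, `inner`, `loaded_weights`, `select_lambda`) are hypothetical, the toy scenario uses two always-active interferers and only 20 pilots, the search grid in dB is converted to linear scale as $10^{\mathrm{dB}/10}$ (an assumption about how the window is applied), and only the MSE criterion (17) is shown. The weights follow (11), (12), (14) and (15).

```python
import random

def crandn(rng, var):
    """One circularly symmetric complex Gaussian sample with variance `var`."""
    s = (var / 2.0) ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def solve(a, b):
    """Solve the complex system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    z = [0j] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][j] * z[j] for j in range(r + 1, n))) / m[r][r]
    return z

def inner(w, y):
    """Return the inner product w^dagger y."""
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))

def loaded_weights(train, lam):
    """Diagonally loaded weight vector (14) from pilot observations (pilot symbol = 1)."""
    n_r, n_t = len(train[0]), len(train)
    h_hat = [sum(y[a] for y in train) / n_t for a in range(n_r)]       # eq. (11)
    r = [[sum(y[a] * y[c].conjugate() for y in train) / n_t           # eq. (12)
          for c in range(n_r)] for a in range(n_r)]
    load = lam * sum(r[a][a].real for a in range(n_r)) / n_r           # eq. (15)
    for a in range(n_r):
        r[a][a] += load
    u = solve(r, h_hat)                       # R_est^{-1} h_hat
    scale = inner(h_hat, u)                   # h_hat^dagger R_est^{-1} h_hat
    return [ui / scale for ui in u]

def select_lambda(pilots, delta, lam_grid):
    """Pick the load minimizing the validation MSE (17) on held-out pilots."""
    n_t = int(round(delta * len(pilots)))
    train, val = pilots[:n_t], pilots[n_t:]
    def mse(lam):
        w = loaded_weights(train, lam)
        return sum(abs(1 - inner(w, y)) ** 2 for y in val) / len(val)
    return min(lam_grid, key=mse)

# Toy pilot block: 4 antennas, desired pilot symbol 1, two active BPSK interferers.
rng = random.Random(1)
n_r, n_p, sigma2 = 4, 20, 0.1
h0 = [crandn(rng, 1.0) for _ in range(n_r)]
h_int = [[crandn(rng, 1.0) for _ in range(n_r)] for _ in range(2)]
pilots = []
for _ in range(n_p):
    xs = [rng.choice((-1, 1)) for _ in h_int]
    pilots.append([h0[a] + sum(h[a] * x for h, x in zip(h_int, xs))
                   + crandn(rng, sigma2) for a in range(n_r)])

lam_grid = [10 ** (db / 10) for db in range(-20, 21, 2)]   # -20 dB to 20 dB, step 2
lam_star = select_lambda(pilots, 0.8, lam_grid)
w = loaded_weights(pilots, lam_star)   # final weights re-estimated from all pilots
```

Data symbols would then be detected as the sign of the real part of `inner(w, y)` for each received data vector `y`, as in (16). Replacing the squared error inside `mse` with an indicator on that sign being different from 1 gives the BER criterion (18).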