Capacity bounds for noncoherent fading channels with a peak constraint

Vignesh Sethuraman
Dept. of Electrical Engineering
University of Illinois at Urbana-Champaign
[email protected]

Bruce Hajek
Dept. of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
[email protected]

Krishna Narayanan
Dept. of Electrical Engineering
Texas A&M University
[email protected]
Abstract— A discrete-time single-user channel with temporally correlated Rayleigh fading is considered. Neither the transmitter nor the receiver has channel side information (CSI), and both peak and average power constraints are placed on the inputs. Two lower bounds on the capacity are presented. One is motivated by the technique of decision feedback decoding. The other is related to the information rate with side information present, minus a penalty term to account for the information about the channel that is learned at the receiver; this second lower bound is a slight variation of a bound of Shamai and Marzetta. The bounds are compared numerically to two upper bounds for a channel with Gauss-Markov Rayleigh fading. One upper bound is the capacity with complete CSI and the peak constraint ignored, and the other is based on the capacity per unit energy. In general, the gap between the upper and lower bounds depends on the channel memory, but is quite small at low SNR.

I. INTRODUCTION AND PROBLEM STATEMENT

Consider the discrete-time channel
$$Y_k = H_k X_k + W_k \tag{1}$$
where $X$ is the input, $Y$ is the output, $H$ is the multiplicative noise (fading), and $W$ is the additive noise process. The additive noise $W$ is an independent and identically distributed (i.i.d.) process. The fading process $H$ is correlated in time and is assumed to be ergodic. The processes $X$, $H$ and $W$ are mutually independent. There is no side information at the transmitter or the receiver. Both average and peak power constraints are considered. Since the capacity of this channel is not known, it is of interest to find good lower bounds on the capacity. This is precisely the focus of this paper.

An $(n, M, P_{ave}, P_{peak}, \epsilon)$ code for this channel consists of $M$ codewords, each of block length $n$, such that each codeword $(x_{m1}, \ldots, x_{mn})$, $m = 1, \ldots, M$, satisfies the constraints
$$\sum_{i=1}^{n} |x_{mi}|^2 \le n P_{ave} \tag{2}$$
$$\max_{1 \le i \le n} |x_{mi}|^2 \le P_{peak} \tag{3}$$
and the average (assuming equiprobable messages) probability of correctly decoding the message based on the received vector $(Y_1, \ldots, Y_n)$ is at least $1 - \epsilon$.

The peak power constraint $P_{peak}$ can be expressed as a multiple of the average power constraint $P_{ave}$. Let $\beta$ be the peak-to-average ratio,
$$\beta = \frac{P_{peak}}{P_{ave}}. \tag{4}$$
Fix $P_{ave} > 0$ and $\beta \ge 1$. A number $R$ is an $\epsilon$-achievable rate per unit time if, for every $\gamma > 0$, there exists $n_o$ sufficiently large so that, if $n \ge n_o$, there exists an $(n, M, P_{ave}, \beta P_{ave}, \epsilon)$ code with $\log M \ge n(R - \gamma)$. A nonnegative number $R$ is an achievable rate per unit time if it is $\epsilon$-achievable for all $0 < \epsilon < 1$. The capacity, $C(P_{ave}, \beta)$, is defined in an operational sense as the maximum of the achievable rates per unit time.

Alternatively, an information-theoretic definition of capacity is as follows:
$$C(P_{ave}, \beta) = \lim_{n \to \infty} \sup_{P_{X_1^n}} \frac{1}{n} I(X_1^n; Y_1^n) \tag{5}$$
where the supremum is over the set of input distributions $P_{X_1^n}$ that satisfy the constraints
$$P_{X_1^n}\left[\max_{1 \le i \le n} |X_i|^2 \le \beta P_{ave}\right] = 1 \tag{6}$$
$$E_{X_1^n}\left[\frac{1}{n} \sum_{i=1}^{n} |X_i|^2\right] \le P_{ave}. \tag{7}$$

Since the fading process $H$ is stationary and ergodic, and the noise process is i.i.d. and hence weakly mixing, the channel is stationary and ergodic. Also, the channel has no input memory. It follows from ideas surrounding information stability [1], [2] and the Shannon-McMillan-Breiman theorem for finite alphabet ergodic sources that the operational definition of capacity per unit time above is equivalent to the information-theoretic definition (see [3] for details).

It is of interest to construct a good lower bound on $C(P_{ave}, \beta)$. In Proposition 2.1, a lower bound is constructed for the case $\beta = 1$. A method for adapting this lower bound to the case $\beta > 1$ is presented later. In Proposition 2.2, a different lower bound is obtained using an adaptation of a bound in [4]. These bounds are then compared to two known upper bounds and shown to be tight at low SNR.
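For concreteness, the channel (1) is easy to simulate. The following Python sketch (ours, not from the paper) generates the channel with unit-variance Gauss-Markov Rayleigh fading, the model used later in Section III, where the AR(1) recursion with correlation coefficient rho is an assumption consistent with that section, and QPSK inputs, which meet (2) and (3) with beta = 1:

    # Minimal simulation sketch of channel (1) with Gauss-Markov Rayleigh
    # fading; the AR(1) recursion matches the model assumed in Section III.
    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_channel(n, P, rho):
        """Pass n QPSK symbols of power P through Y_k = H_k X_k + W_k."""
        # QPSK input: constant modulus sqrt(P), so (2), (3) hold with beta = 1.
        phases = rng.integers(0, 4, size=n)
        X = np.sqrt(P) * np.exp(1j * np.pi / 2 * phases)
        # Unit-variance Gauss-Markov (AR(1)) fading: E[H_k H_{k+1}^*] = rho.
        H = np.empty(n, dtype=complex)
        H[0] = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
        for k in range(1, n):
            U = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
            H[k] = rho * H[k - 1] + np.sqrt(1 - rho**2) * U
        # Unit-variance i.i.d. proper complex Gaussian additive noise.
        W = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
        return X, H, H * X + W

    X, H, Y = simulate_channel(10**4, P=0.1, rho=0.99)
    assert np.max(np.abs(X)**2) <= 0.1 + 1e-12   # peak constraint (3)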

II. BOUNDS

Consider the channel modeled in (1). For a given $P > 0$, let $F_P$ be a distribution on the set $\chi_P = \{x \in \mathbb{C} : |x| \le \sqrt{P}\}$. Let the input $X$ be an i.i.d. process with marginal distribution $F_P$. Let $C_{l1}(F_P)$ be defined by
$$C_{l1}(F_P) = I(X_0; Y_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1}). \tag{8}$$

Proposition 2.1: For any average power constraint $P > 0$ and any distribution $F_P$ on $\chi_P$, $C_{l1}(F_P)$ is a lower bound on the capacity of the channel with average power constraint $P$ and peak-to-average ratio $\beta = 1$:
$$C_{l1}(F_P) \le C(P, 1). \tag{9}$$

Proof: Since the input process $X$ is i.i.d. and the channel is causal, $I(X_n; Y_n \mid X_1^{n-1}, Y_1^{n-1})$ is non-decreasing in $n$, so $\lim_{n \to \infty} I(X_n; Y_n \mid X_1^{n-1}, Y_1^{n-1})$ exists. Furthermore, it can be shown using the lower semi-continuity of divergence that
$$\lim_{n \to \infty} I(X_n; Y_n \mid X_1^{n-1}, Y_1^{n-1}) = I(X_0; Y_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1}). \tag{10}$$
Proving Proposition 2.1 is thus equivalent to proving that, for any $n \ge 1$,
$$C(P, 1) \ge I(X_n; Y_n \mid X_1^{n-1}, Y_1^{n-1}). \tag{11}$$
From the expression for capacity in terms of mutual information and the chain rule,
$$n C(P, 1) \ge I(X_1^n; Y_1^n) = \sum_{k=1}^{n} I(X_k; Y_1^n \mid X_1^{k-1}). \tag{12}$$
To establish (11), it therefore suffices to prove that, for any input process $X$,
$$I(X_k; Y_1^n \mid X_1^{k-1}) \ge I(X_k; Y_k \mid U_1^{k-1})$$
where $U_k = (X_k, Y_k)$. But
$$I(X_k; Y_1^n \mid X_1^{k-1}) = I(X_k; Y_1^{k-1} \mid X_1^{k-1}) + I(X_k; Y_k \mid U_1^{k-1}) + I(X_k; Y_{k+1}^n \mid U_1^{k-1}, Y_k) \ge I(X_k; Y_k \mid U_1^{k-1}). \tag{13}$$
This completes the proof. Note that $I(X_k; Y_1^{k-1} \mid X_1^{k-1})$ is zero for a causal channel.

An interpretation of $I(X_0; Y_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1})$ as a lower bound, together with a constructive scheme to obtain performance close to this lower bound, is based on the idea of coded decision feedback receivers [5]. Through the use of appropriate block interleaving as in [5] and of capacity-achieving codes for memoryless channels, the receiver decoding a symbol at time $n$ has access to the past outputs and also to the past decoded decisions, which are the channel inputs. Using $L$ past decoded decisions along with the channel outputs renders an equivalent channel with information rate $I(X_0; Y_0 \mid X_{-L}^{-1}, Y_{-L}^{-1})$. In the limit as $L \to \infty$, we get the aforementioned lower bound.
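To make the equivalent-channel view concrete: for constant-modulus inputs the known past symbols can be folded into the observations $V_k = Y_k X_k^* / P = H_k + \check{W}_k$ (anticipating (18) below), and for Gaussian processes the receiver's knowledge of the finite past reduces to a linear prediction of $H_0$. Below is a minimal numpy sketch of this finite-$L$ predictor, assuming the unit-variance AR(1) fading of Section III (autocorrelation $\rho^{|k|}$); the helper name lmmse_predictor is illustrative, not from the paper:

    # LMMSE prediction of H_0 from the L most recent cleaned observations
    # V_{-L},...,V_{-1}, where V_k = H_k + Wcheck_k and Var(Wcheck_k) = 1/P.
    # Assumes unit-variance AR(1) fading with autocorrelation rho^|k|.
    import numpy as np

    def lmmse_predictor(L, P, rho):
        """Return coefficients a and the prediction error variance, so that
        Hhat_0 = a @ [V_{-L}, ..., V_{-1}] is the LMMSE estimate of H_0."""
        lags = np.arange(L, 0, -1)                    # time offsets L, ..., 1
        r = rho ** lags                               # Cov(H_0, V_{-k}) = rho^k
        # Cov(V_{-i}, V_{-j}) = rho^|i-j| + (1/P) * 1{i = j}
        R = rho ** np.abs(np.subtract.outer(lags, lags)) + np.eye(L) / P
        a = np.linalg.solve(R, r)
        mse = 1.0 - r @ a                             # E|H_0 - Hhat_0|^2
        return a, mse

    for L in (1, 10, 100):
        _, mse = lmmse_predictor(L, P=0.1, rho=0.99)
        print(L, mse)   # decreases toward the infinite-past error of (22)-(24)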

The lower bound $C_{l1}(F_P)$ is particularly easy to compute in the specialized case when the channel processes $H$ and $W$ are proper complex Gaussian and the input process $X$ satisfies $|X_k| \equiv \sqrt{P}$, where $P > 0$ is the average power constraint. It is assumed without loss of generality that the channel processes $H$ and $W$ are normalized such that $E[|H_0|^2] = E[|W_0|^2] = 1$. Let $\hat{H}_0 = \hat{E}[H_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1}]$, where $\hat{E}$ denotes the linear minimum mean square error (LMMSE) estimator, and let $\tilde{H}_0 = H_0 - \hat{H}_0$. Since the process $H$ is proper complex Gaussian, both $\hat{H}_0$ and $\tilde{H}_0$ are proper complex Gaussian. It follows from the orthogonality principle that $\tilde{H}_0$ has mean zero and is independent of $(X_{-\infty}^{-1}, Y_{-\infty}^{-1})$. The output $Y_0$ is then given by
$$Y_0 = H_0 X_0 + W_0 \tag{14}$$
$$= \hat{H}_0 X_0 + \ddot{W}_0 \tag{15}$$
where $\ddot{W}_0 = \tilde{H}_0 X_0 + W_0$. Since $\tilde{H}_0$ is proper complex Gaussian and independent of the process $X$, and since $|X_k| \equiv \sqrt{P}$, $\tilde{H}_0 X_0$ is also a proper complex Gaussian random variable independent of the process $X$. So $\tilde{H}_0 X_0$, $W_0$ and $(X_{-\infty}^{-1}, Y_{-\infty}^{-1})$ are mutually independent. Hence $\ddot{W}_0$ and $(X_{-\infty}^{-1}, Y_{-\infty}^{-1})$ are mutually independent. From (15), it is clear that the following forms a Markov chain:
$$(X_{-\infty}^{-1}, Y_{-\infty}^{-1}) \leftrightarrow \hat{H}_0 \leftrightarrow (X_0, Y_0).$$
The LMMSE estimate $\hat{H}_0$ is thus a sufficient statistic of $(X_{-\infty}^{-1}, Y_{-\infty}^{-1})$ relative to $(X_0, Y_0)$. Since sufficient statistics preserve conditional mutual information, we have
$$I(X_0; Y_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1}) = I(X_0; Y_0 \mid \hat{H}_0). \tag{16}$$
From (8) and (16),
$$C_{l1}(F_P) = I(X_0; Y_0 \mid \hat{H}_0). \tag{17}$$

To evaluate the above mutual information expression, the variances of $\tilde{H}_0$ and $\hat{H}_0$ are needed. A method to calculate these quantities is described next. Let $V_k = \frac{1}{P} Y_k X_k^*$, where $*$ denotes the complex conjugate. It follows that
$$V_k = H_k + \check{W}_k \tag{18}$$
where $\check{W}_k = \frac{1}{P} W_k X_k^*$. The variance of $\check{W}_k$ is $\frac{1}{P}$. Since $W$ is an i.i.d. proper complex Gaussian process independent of the process $X$, and since $|X_k| \equiv \sqrt{P}$, $\check{W}$ is also an i.i.d. proper complex Gaussian process independent of $X$. So the processes $X$, $\check{W}$ and $H$ are mutually independent. It thus follows from (18) that the process $X$ is independent of $(H, V)$. It follows that
$$\mathrm{Var}(\tilde{H}_0) = \mathrm{MSEE}[H_0 \mid X_{-\infty}^{-1}, Y_{-\infty}^{-1}] \tag{19}$$
$$= \mathrm{MSEE}[H_0 \mid X_{-\infty}^{-1}, V_{-\infty}^{-1}] \tag{20}$$
$$= \mathrm{MSEE}[H_0 \mid V_{-\infty}^{-1}] \tag{21}$$
$$= \mathrm{MSEE}[V_0 \mid V_{-\infty}^{-1}] - \frac{1}{P}. \tag{22}$$
Here, MSEE denotes the mean square estimation error. The process $V$ is clearly stationary, ergodic and regular. It is a standard result in estimation theory [6, Chapter IV.9, Theorem 4] (also see [7, Chapter XII.4, Theorem 4.3]) that the one-step prediction error given the infinite past is given by
$$\mathrm{MSEE}[V_0 \mid V_{-\infty}^{-1}] = \exp\left(\int_{-\pi}^{\pi} \log\left(S_H(\omega) + \frac{1}{P}\right) \frac{d\omega}{2\pi}\right). \tag{23}$$
Here, $S_H(\omega)$ denotes the density of the absolutely continuous component of the power spectral measure of $H$.

Both $\tilde{H}_0$ and $\hat{H}_0$ are zero mean. Their second moments are given below:
$$E[|\tilde{H}_0|^2] = \frac{1}{P}\left(\exp\left(\int_{-\pi}^{\pi} \log\left(1 + P S_H(\omega)\right) \frac{d\omega}{2\pi}\right) - 1\right) \tag{24}$$
$$E[|\hat{H}_0|^2] = 1 - E[|\tilde{H}_0|^2]. \tag{25}$$
It follows that the second moment (and variance) of $\ddot{W}_0$ is given by
$$E[|\ddot{W}_0|^2] = \exp\left(\int_{-\pi}^{\pi} \log\left(1 + P S_H(\omega)\right) \frac{d\omega}{2\pi}\right). \tag{26}$$
In view of (15), (25) and (26), the mutual information expression in (17) can be evaluated in the following manner:
$$I(X_0; Y_0 \mid \hat{H}_0) = h(Y_0 \mid \hat{H}_0) - h(Y_0 \mid \hat{H}_0, X_0). \tag{27}$$
The first term in the above expression, $h(Y_0 \mid \hat{H}_0)$, depends on the input distribution $F_P$ and can be evaluated numerically. Since $Y_0$, given $\hat{H}_0$ and $X_0$, is proper complex Gaussian with variance $E[|\ddot{W}_0|^2]$, the second term is given by the following expression:
$$h(Y_0 \mid \hat{H}_0, X_0) = \log(\pi e) + \int_{-\pi}^{\pi} \log\left(1 + P S_H(\omega)\right) \frac{d\omega}{2\pi}. \tag{28}$$
In Section III, $C_{l1}(F_P)$ is evaluated for the Gauss-Markov fading channel using the above expressions.
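As a numerical illustration (ours, not from the paper), the spectral integrals in (23)-(26) are one-dimensional and easy to evaluate. For the unit-variance Gauss-Markov fading of Section III one can take the standard AR(1) spectrum $S_H(\omega) = (1-\rho^2)/(1 - 2\rho\cos\omega + \rho^2)$; this closed form is an assumption consistent with, but not stated in, the text above:

    # Numerically evaluate the spectral integrals in (23)-(26) for the
    # Gauss-Markov (AR(1)) fading model of Section III.
    import numpy as np

    def S_H(w, rho):
        # Unit-variance AR(1) power spectral density (assumed model).
        return (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)

    def second_moments(P, rho, n_grid=2**16):
        w = np.linspace(-np.pi, np.pi, n_grid)
        integral = np.trapz(np.log(1 + P * S_H(w, rho)), w) / (2 * np.pi)
        var_Htilde = (np.exp(integral) - 1) / P          # (24)
        var_Hhat = 1 - var_Htilde                        # (25)
        var_Wddot = np.exp(integral)                     # (26)
        return var_Htilde, var_Hhat, var_Wddot

    print(second_moments(P=0.1, rho=0.99))
    # As rho -> 0 (memoryless fading), var_Htilde -> 1: the receiver learns
    # nothing about H_0 from the past, and C_l1 becomes trivial.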

The following corollary of Proposition 2.1 gives a lower bound on capacity when the peak-to-average ratio $\beta$ is greater than 1. It is derived from the lower bound for the case $\beta = 1$ by the well-known time-sharing argument.

Corollary 2.1: For $\beta > 1$, a lower bound to $C(P, \beta)$ is given by
$$C_{l1}^{\beta}(F_P) = \max_{\gamma \in [1, \beta]} \frac{1}{\gamma} C_{l1}(F_{\gamma P}). \tag{29}$$
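In words, (29) transmits at the elevated power $\gamma P$ for a fraction $1/\gamma$ of the time and stays silent otherwise, which respects both the average power $P$ and the peak $\beta P$. A sketch of the outer maximization, assuming a routine c_l1(P) that evaluates $C_{l1}(F_P)$ (a hypothetical helper; e.g. the Monte Carlo sketch given in Section III):

    # Time-sharing lower bound (29): use power gamma*P a fraction 1/gamma
    # of the time.  c_l1 is a placeholder for any routine evaluating C_l1.
    import numpy as np

    def c_l1_beta(c_l1, P, beta, n_grid=200):
        gammas = np.linspace(1.0, beta, n_grid)
        return max(c_l1(g * P) / g for g in gammas)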

A second lower bound is given next. The bound is an adaptation of a bound of Shamai and Marzetta [4] (also see [8]). For a given $P > 0$, let $F_P$ be a distribution on the set $\chi_P$, and let $X$ be an i.i.d. process with marginal distribution $F_P$. Let
$$C_{l2}(F_P) = I(X_1; Y_1 \mid H_1) - \lim_{n \to \infty} \frac{1}{n} I(H_1^n; Y_1^n \mid X_1^n). \tag{30}$$

Proposition 2.2: For any power constraint $P > 0$ and any distribution $F_P$ on $\chi_P$,
$$C_{l2}(F_P) \le C(P, 1). \tag{31}$$

Proof:
$$\frac{1}{n} I(X_1^n; Y_1^n) = \frac{1}{n}\left(I(X_1^n, H_1^n; Y_1^n) - I(H_1^n; Y_1^n \mid X_1^n)\right) \ge \frac{1}{n}\left(I(X_1^n; Y_1^n \mid H_1^n) - I(H_1^n; Y_1^n \mid X_1^n)\right).$$
Since $X$ is i.i.d., $\frac{1}{n} I(X_1^n; Y_1^n \mid H_1^n)$ is just $I(X_1; Y_1 \mid H_1)$. So
$$\frac{1}{n} I(X_1^n; Y_1^n) \ge I(X_1; Y_1 \mid H_1) - \frac{1}{n} I(H_1^n; Y_1^n \mid X_1^n). \tag{32}$$
The first term on the right is independent of the channel memory. For the second term, it can be shown that $\frac{1}{n} I(H_1^n; Y_1^n \mid X_1^n)$ is non-increasing in $n$. From the above analysis, it is clear that $C_{l2}(F_P)$ is a lower bound on $C(P, 1)$.

Suppose the channel processes $H$ and $W$ are proper complex Gaussian, and the input process $X$ satisfies $|X_k| \equiv \sqrt{P}$, where $P > 0$ is the average power constraint. It is assumed without loss of generality that the channel processes $H$ and $W$ are normalized such that $E[|H_0|^2] = E[|W_0|^2] = 1$. Using Szegő's first limit theorem [9], it can be shown that
$$\lim_{n \to \infty} \frac{1}{n} I(H_1^n; Y_1^n \mid X_1^n) = \int_{-\pi}^{\pi} \log\left(1 + P S_H(\omega)\right) \frac{d\omega}{2\pi}. \tag{33}$$
The lower bound $C_{l2}(F_P)$ is then given by the following expression:
$$C_{l2}(F_P) = I(X_0; Y_0 \mid H_0) - \int_{-\pi}^{\pi} \log\left(1 + P S_H(\omega)\right) \frac{d\omega}{2\pi}. \tag{34}$$

It is not known whether, in general, the two lower bounds $C_{l1}(F_P)$ and $C_{l2}(F_P)$ are ordered. Nevertheless, it is possible to compare them for the Rayleigh fading channels specified above in the case $|X_k| \equiv \sqrt{P}$. From (17), (27) and (28), $C_{l1}(F_P)$ is given by
$$C_{l1}(F_P) = h(Y_0 \mid \hat{H}_0) - \int_{-\pi}^{\pi} \log\left(\pi e \left(1 + P S_H(\omega)\right)\right) \frac{d\omega}{2\pi}. \tag{35}$$
Noting that $h(Y_0 \mid X_0, H_0) = \log(\pi e)$, the following expression for $C_{l2}(F_P)$ is obtained from (34):
$$C_{l2}(F_P) = h(Y_0 \mid H_0) - \int_{-\pi}^{\pi} \log\left(\pi e \left(1 + P S_H(\omega)\right)\right) \frac{d\omega}{2\pi}. \tag{36}$$
Clearly, $I(Y_0; \tilde{H}_0 \mid \hat{H}_0) \ge 0$, so $h(Y_0 \mid \hat{H}_0) \ge h(Y_0 \mid \hat{H}_0, \tilde{H}_0)$. But $h(Y_0 \mid \hat{H}_0, \tilde{H}_0) = h(Y_0 \mid H_0)$ from (14). This implies the following inequality:
$$h(Y_0 \mid \hat{H}_0) \ge h(Y_0 \mid H_0). \tag{37}$$
From (35), (36) and (37), it follows that, for any $P > 0$, when the channel processes are proper complex Gaussian and the input process $X$ satisfies $|X_k| \equiv \sqrt{P}$,
$$C_{l1}(F_P) \ge C_{l2}(F_P). \tag{38}$$

An upper bound on $C(P_{ave}, \beta)$ is given in [3, (4), Proposition 2.3], and is a consequence of an expression for the capacity per unit energy:
$$C(P_{ave}, \beta) \le C_u(P_{ave}, \beta) \tag{39}$$
where
$$C_u(P_{ave}, \beta) = P_{ave} - \frac{1}{\beta} \int_{-\pi}^{\pi} \log\left(1 + \beta P_{ave} S_H(\omega)\right) \frac{d\omega}{2\pi}. \tag{40}$$
A second upper bound on $C(P_{ave}, \beta)$ is the capacity of the channel without the peak constraint and with CSI at the receiver, denoted by $C_{coh}(P_{ave})$.
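The upper bound (40) and the penalty term (33) involve the same one-dimensional spectral integral, so both are straightforward to tabulate. The sketch below (an illustration under the AR(1) spectrum assumption used earlier) also checks (33) at finite $n$, using the fact that for constant-modulus inputs $I(H_1^n; Y_1^n \mid X_1^n) = \log\det(I + P R_H)$, where $R_H$ is the fading covariance matrix:

    # Upper bound (40) and the Szego limit (33) for AR(1) fading.
    # S_H and spectral_log_integral assume the unit-variance AR(1) model.
    import numpy as np

    def S_H(w, rho):
        return (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)

    def spectral_log_integral(P, rho, n_grid=2**16):
        w = np.linspace(-np.pi, np.pi, n_grid)
        return np.trapz(np.log(1 + P * S_H(w, rho)), w) / (2 * np.pi)

    def C_u(P_ave, beta, rho):
        return P_ave - spectral_log_integral(beta * P_ave, rho) / beta   # (40)

    def penalty_finite_n(n, P, rho):
        # (1/n) log det(I + P R_H) approaches the integral in (33) as n grows
        # (slowly for rho near 1, so n must be large for a tight match).
        R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        sign, logdet = np.linalg.slogdet(np.eye(n) + P * R)
        return logdet / n

    P, rho = 0.1, 0.99
    print(C_u(P, beta=1.0, rho=rho))
    print(penalty_finite_n(500, P, rho), spectral_log_integral(P, rho))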

III. NUMERICAL RESULTS

The lower bounds on capacity hold for any stationary ergodic channel. In this section, the fading process $H$ is fixed to be Gauss-Markov, and the upper and lower bounds are compared. Let the correlation coefficient $(= E[H_0 H_1^*]/E[H_0 H_0^*])$ be denoted by $\rho$. It is noted that when $|\rho| = 1$, the process $H$ is not ergodic and the bounds do not hold.

The average power constraint on the input is denoted by $P > 0$. The peak power constraint is specified by the average power constraint $P$ together with the peak-to-average ratio $\beta \ge 1$. Since the additive noise process $W$ has unit variance, the average input power is also the signal-to-noise ratio (SNR). The cases $\beta = 1$ and $\beta > 1$ are considered separately.

A. Peak-to-average ratio $\beta = 1$

In this case, the peak power constraint coincides with the average power constraint. This applies well to constant amplitude modulation schemes such as QPSK and FSK. The two upper bounds for average power $P$ are given by $C_u(P, 1)$ and $C_{coh}(P)$. For the lower bounds, $F_P$ is fixed as a uniform distribution on the set $\{+\sqrt{P}, j\sqrt{P}, -\sqrt{P}, -j\sqrt{P}\}$; this corresponds to using QPSK, where all symbols in the constellation are equally likely. The bounds are calculated by numerical integration as explained in Section II.

In Figure 1, the bounds on capacity are plotted as functions of SNR, with $\rho$ fixed at 0.99. At low SNR, the upper and lower bounds are close; for $P < -10$ dB, the two bounds almost coincide, thus giving (up to plotting accuracy) the capacity of this channel. A significant part of the gap between the upper bound and the lower bound at high SNR is due to the fact that we use a QPSK constellation for the lower bound whereas the upper bound uses Gaussian signaling. By increasing the constellation size from QPSK to a larger PSK constellation, the lower bound can be improved and the gap reduced.

[Fig. 1. Bounds on capacity ($C_u$, $C_{coh}$, $C_{l1}$, $C_{l2}$; nats/s) vs average power $P$ (dB): $\beta = 1$, $\rho = 0.99$.]

Clearly, the memory in the channel is a significant factor in determining how good the bounds are. For example, when there is no memory in the channel, i.e. $\rho = 0$, both $C_{l1}(F_P)$ and $C_{l2}(F_P)$ become trivial lower bounds. The closer $\rho$ is to 1, the greater the memory in the channel. We use $\frac{1}{1-\rho}$ as a measure of the channel memory. To study the effect of channel memory on the capacity, the bounds are plotted in Figure 2 as functions of $\frac{1}{1-\rho}$, with $P$ fixed at $-5$ dB, $-10$ dB and $-15$ dB.

[Fig. 2. Bounds on capacity ($C_u$, $C_{l1}$, $C_{l2}$; nats/s) vs channel memory $T(\rho) = 1/(1-\rho)$: average power $P$ fixed at $-15$ dB, $-10$ dB and $-5$ dB, $\beta = 1$.]

B. Peak-to-average ratio $\beta > 1$

Suppose the peak-to-average ratio $\beta$ is greater than 1. An upper bound on $C(P, \beta)$ is given by $C_u(P, \beta)$ in (40). The expression $C_{l1}^{\beta}(F_P)$ in (29) gives the corresponding lower bound. The bounds corresponding to $\rho = 0.99$ and $\beta = 10$ are plotted in Figure 3. Instead of evaluating $C_{l1}^{\beta}(F_P)$ only at $\beta = 10$, the curves $\frac{1}{\gamma} C_{l1}(F_{\gamma P})$ are plotted for various values of $\gamma \in [1, 10]$. It is understood that the supremum of these curves forms a lower bound to the capacity $C(P, 10)$.

[Fig. 3. Bounds on capacity ($C_u$, $C_{coh}$, $C_{l1}$; nats/s) vs average power $P$ (dB): $\beta = 10$, $\rho = 0.99$.]
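The remaining ingredient for reproducing the $C_{l1}$ curves is the term $h(Y_0 \mid \hat{H}_0)$ in (35), which for QPSK can be estimated by Monte Carlo over the Gaussian mixture induced by (15). The sketch below is illustrative (not the authors' code) and reuses the second_moments helper from the earlier snippet:

    # Monte Carlo evaluation of C_l1(F_P) in (35) for QPSK inputs.
    # Model (15): Y_0 = Hhat_0 X_0 + Wddot_0, with Hhat_0 and Wddot_0
    # proper complex Gaussian with variances (25) and (26).  Assumes the
    # AR(1) helper second_moments() defined in the earlier sketch.
    import numpy as np

    def c_l1_qpsk(P, rho, n_samples=10**5, seed=0):
        rng = np.random.default_rng(seed)
        _, var_Hh, var_Wd = second_moments(P, rho)   # (25), (26)
        qpsk = np.sqrt(P) * np.array([1, 1j, -1, -1j])

        def cn(var, n):  # i.i.d. proper complex Gaussian samples
            return np.sqrt(var / 2) * (rng.standard_normal(n)
                                       + 1j * rng.standard_normal(n))

        x = rng.choice(qpsk, n_samples)
        hh = cn(var_Hh, n_samples)                   # Hhat_0
        y = hh * x + cn(var_Wd, n_samples)           # (15)
        # p(y | hh): equally weighted 4-component complex Gaussian mixture
        d = np.abs(y[:, None] - hh[:, None] * qpsk[None, :])**2
        p = np.mean(np.exp(-d / var_Wd), axis=1) / (np.pi * var_Wd)
        h_y_given_hh = -np.mean(np.log(p))           # estimates h(Y_0|Hhat_0)
        return h_y_given_hh - np.log(np.pi * np.e * var_Wd)   # (35)

    print(c_l1_qpsk(P=0.1, rho=0.99))   # nats per channel use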

IV. CONCLUSION

Two lower bounds on the capacity of the noncoherent fading channel with stationary and ergodic fading were derived. Using the Gauss-Markov channel as an example, the lower bounds were plotted and compared numerically with the upper bounds. The lower bounds compare very well with the upper bounds at low SNR. The performance of the lower bounds can be further improved by considering a larger input constellation such as $M$-PSK ($M > 4$) instead of QPSK. At high SNR, the lower bounds can be improved by considering multi-level signaling. For the case of Rayleigh fading with $\beta = 1$, it is shown that $C_{l1} \ge C_{l2}$, though it is not known whether the two bounds are ordered in general.

REFERENCES

[1] M. S. Pinsker, Information and Information Stability of Random Variables and Processes. Holden-Day, Inc., 1964.
[2] R. M. Gray, Entropy and Information Theory. Springer-Verlag, 1990.
[3] V. Sethuraman and B. Hajek, "Capacity per unit energy of fading channels with a peak constraint," submitted to IEEE Transactions on Information Theory, 2004.
[4] S. Shamai (Shitz) and T. Marzetta, "Multiuser capacity in block fading with no channel state information," IEEE Transactions on Information Theory, vol. 48, pp. 938-942, Apr. 2002.
[5] T. Guess and M. Varanasi, "An information-theoretic framework for deriving canonical decision-feedback receivers in Gaussian channels," IEEE Transactions on Information Theory, vol. 51, pp. 173-187, Jan. 2005.
[6] I. Gihman and A. Skorohod, The Theory of Stochastic Processes I. New York: Springer-Verlag, 1974.
[7] J. Doob, Stochastic Processes. New York: Wiley, 1953.
[8] Y. Liang and V. V. Veeravalli, "Capacity of noncoherent time-selective Rayleigh-fading channels," IEEE Transactions on Information Theory, vol. 50, pp. 3095-3110, Dec. 2004.
[9] G. Szegő, "Ein Grenzwertsatz über die Toeplitzschen Determinanten einer reellen positiven Funktion," Math. Ann., vol. 76, pp. 490-503, 1915.