SELECTED SOLUTIONS TO PRINCIPLES OF DIGITAL COMMUNICATION Cambridge Press 2008 by ROBERT G. GALLAGER A complete set of solutions is available from Cambridge Press for instructors teaching a class using this text. This is a subset of solutions that I feel would be valuable for those studying the subject on their own.

Chapter 2 Exercise 2.2: (a) V + W is a random variable, so its expectation, by definition, is X X E[V + W ] = (v + w)pV W (v, w) v∈V w∈W

=

X X

v pV W (v, w) +

v∈V w∈W

=

X

v∈V

=

X

v

X

X X

w pV W (v, w)

v∈V w∈W

pV W (v, w) +

w∈W

v pV (v) +

v∈V

X

X

w∈W

w

X

pV W (v, w)

v∈V

w pW (w)

w∈W

= E[V ] + E[W ].

(b) Once again, working from first principles, X X E[V · W ] = (v · w)pV W (v, w) v∈V w∈W

=

X X

v∈V w∈W

=

X

v∈V

(v · w)pV (v)pW (w)

v pV (v)

X

(Using independence)

w pW (w)

w∈W

= E[V ] · E[W ].

(c) To discover a case where E[V · W ] 6= E[V ] · E[W ], first try the simplest kind of example where V and W are binary with the joint pmf pV W (0, 1) = pV W (1, 0) = 1/2;

pV W (0, 0) = pV W (1, 1) = 0. 1

Clearly, V and W are not independent. Also, E[V · W ] = 0 whereas E[V ] = E[W ] = 1/2 and hence E[V ] · E[W ] = 1/4.

The second case requires some experimentation. One approach is to choose a joint distribution such that E[V · W ] = 0 and E[V ] = 0. A simple solution is then given by the pmf, pV W (−1, 0) = pV W (0, 1) = pV W (1, 0) = 1/3. Again, V and W are not independent. Clearly, E[V · W ] = 0. Also, E[V ] = 0 (what is E[W ]?). Hence, E[V · W ] = E[V ] · E[W ]. (d) σV2 +W

= E[(V + W )2 ] − (E[V + W ])2

= E[V 2 ] + E[W 2 ] + E[2V · W ] − (E[V ] + E[W ])2

= E[V 2 ] + E[W 2 ] + 2E[V ] · E[W ] − E[V ]2 − E[W ]2 − 2E[V ] · E[W ] = E[V 2 ] − E[V ]2 + E[W 2 ] − E[W ]2 2 = σV2 + σW .

Exercise 2.4: (a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2 ) = Pr(X2 > X1 ). These two events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2 ) and Pr(X1 < X2 ) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2 ) = Pr(X2 ≥ X1 ) = 1/2.

(b) Invoking the symmetry among X1 , X2 and X3 , we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3. (c) The event {N > n} is the same as the event {X1 is the minimum among the n iid random variables X1 , X2 , · · · , Xn }. By extending the argument in part (b), we see that 1 Pr(X1 is the smallest of X1 , . . . , Xn ) = 1/n. Finally, Pr {N ≥ n} = Pr {N > n − 1}= n−1 for n ≥ 2. (d) Since N is a non-negative integer random variable (taking on values from 2 to 1), we can use Exercise 2.3(a) as follows: E[N ] =

1 X

n=1

Pr {N ≥ n}

= Pr {N ≥ 1} + = 1+ = 1+

1 X

1 n−1 n=2 1 X 1 . n n=1

2

1 X

n=2

Pr {N ≥ n}

Since the series

P1

1 n=1 n

diverges, we conclude that E[N ] = 1.

(e) Since the alphabet has a finite number of letters,1 Pr(X1 = X2 ) is no longer 0 and depends on the particular probability distribution. Thus, although, Pr(X1 ≥ X2 ) = Pr(X2 ≥ X1 ) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin , then no subsequent rv X2 , X3 , . . . can have a smaller value, so N = 1 in this case. Since the event X1 = amin occurs with positive probability, E[N ] = 1. Exercise 2.6: (a) Assume the contrary; i.e., there is a suﬃx-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say, x1 , x2 , . . . , xn and x01 , x02 , . . . , x0m such that, C(x1 )C(x2 ) . . . C(xn ) = C(x01 )C(x02 ) . . . C(x0m ). Then one of the following must hold: • C(xn ) = C(x0m ) • C(xn ) is a suﬃx of C(x0m ) • C(x0m ) is a suﬃx of C(xn ). In the last two cases we arrive at a contradiction since the code is hypothesized to be suﬃx-free. In the first case, xn must equal x0m because of the suﬃx freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letter must diﬀer after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suﬃx-free codes are uniquely decodable. (b) Any prefix-free code becomes a suﬃx-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0,01,11} which can be seen to be a suﬃx-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code. A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder, 0111111111 . . . Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two possible source sequences, acccccccc . . . and bcccccccc . . . , 1

The same results can be obtained with some extra work for a countably infinite discrete alphabet.

3

till the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding. (c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suﬃx free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the codewords 00 and 01 for C2 or C3 , and thus must use 10 and 11, which is not suﬃx free. Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, u1 =0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails. Exercise 2.9: (a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9) H[X] − L =

M X j=1

∑ −lj ∏ M X 2−lj 2 pj log ≤ pj − 1 log e = 0. pj pj j=1

As is evident from Figure 2.7, the inequality is strict unless 2−lj = pj for each j. Thus if H[X] = L, it follows that 2−lj = pj for each j. (b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order node 1 corresponds to the occurence of either letter b or c, and thus has probability 1/2. ✏bb ✏✏ ✏P ✏ P ✏ Pbc ✟PP ✟ Pba✏ b ✏cb ✟ ✏✏ PP ✏ ✓❍❍ ✓ ❍✏ cPPPca Pcc 1✓ ✓ ✏ab ✏ ❅ PP 0❅ ✟ ✟✏ Pac a ❅✟ PP Paa

aa → 00 ab → 011 ac → 010 ba → 110 bb → 1111 bc → 1110 ca → 100 cb → 1011 cc → 1010

Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, 10 amd 11 correspond to b and c, with probabilities 1/4 each. One can proceed with higher order nodes in the same way, but what is the principle behind this? In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability pj and length `j = 2−j , weQsee that each −`xi node corresponding to an initial sequence of letters x1 , . . . , xn has a probability equal i2 P to the product of the individual letter probabilities and an order equal to i `xi . Thus each node labelled by a subsequence of letters has a probability 2−` where ` is the order of that node. The other nodes (those unlabelled in the example above) have a probability equal to the sum of the immediately following labelled nodes. This probability is again 2−` for an `th order node, which can be established by induction if one wishes to be formal.

4

Exercise 2.11: (a) For n = 2, 2 M M M M X M X X X X 2−lj = 2−lj1 2−lj2 = 2−(lj1 +lj2 ) . j=1

j1 =1

j2 =1

j1 =1 j2 =1

The same approach works for arbitrary n.

(b) Each source n-tuple x n = (aj1 aj2 , . . . , ajn ), is encoded into a concatenation C(aj1 )C(aj2 ) . . . C(ajn ) of binary digits of aggregate length l(x n ) = lj1 + lj2 + · · · , +ljn . Since there is one n-tuple x n for each choice of aj1 , aj2 , . . . , ajn , the result of part (a) can be rewritten as n M X X n 2−lj = 2−l(x ) . (1) xn

j=1

(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length i, n nl M max X X 2−lj = Ai 2−i . j=1

i=1

This uses the fact that since each codeword has length at most lmax , each concatenation has length at most nlmax .

(d) From unique decodability, each of these concatenations must be diﬀerent, so there are at most 2i concatenations of aggregate length i, i.e., Ai ≤ 2i . Thus, since the above sum contains at most nlmax terms, n M X 2−lj ≤ nlmax . (2) j=1

(e) Note that 1/n

[nlmax ]

∑

∏ ln(nlmax ) = exp −→ exp(0) = 1 n

as n → 1. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied. Exercise 2.13: (a) In the Huﬀman algorithm, we start by combining p3 and p4 . Since we have p1 = p3 +p4 ≥ p2 , we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1,2,3 and 3 respectively. (b) Note that p3 ≤ p2 and p4 ≤ p2 so p3 + p4 ≤ 2p2 . Thus p1 = p3 + p4 ≤ 2p2

which implies p1 + p3 + p4 ≤ 4p2 . 5

Since p2 = 1 − p1 − p3 − p4 , the latter equation implies that 1 − p2 ≤ 4p2 , or p2 ≥ 0.2. From the former equation, then, p1 ≤ 2p2 ≤ 0.4 shows that p1 ≤ 0.4. These bounds can be met by also choosing p3 = p4 = 0.2. Thus pmax = 0.4. (c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1 . Thus 1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 and p3 = p4 = 1/6. (d) The argument in part (b) remains the same if we assume p1 ≤ p3 +p4 rather than p1 = p3 +p4 , i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax . Thus assuming p1 > pmax implies that p1 > p3 + p4 . Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p2 = p1 ). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1. (e) The lengths of any optimal prefix free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > pmax , then, from (b), p1 > p3 + p4 , so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2). (f) The argument in part (c) remains almost the same if we start with the assumption that p1 ≥ p3 + p4 . In this case p2 = 1 − p1 − p3 − p3 ≥ 1 − 2p1 . Combined with p1 ≥ p2 , we again have p1 ≥ pmin . Thus if p1 < pmin , we must have p3 + p4 > p1 ≥ p2 . We then must combine p1 and p2 in the second step of the Huﬀman algorithm, so each codeword will have length 2. (g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 and all other symbols have an aggregate probability of 1/5, then the Huﬀman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1. Exercise 2.15: (a) This is the same as Lemma 2.5.1. (b) Since p1 < pM −1 + pM , we see that p1 < p0M −1 , where p0M −1 is the probability of the node in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus, 0 by part (a), l1 ≥ lM −1 = lM − 1.

(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l1 = lM , and otherwise, p1 < pM + pk , so l1 must be at least as large as the length of the immediate parent of M , showing that l1 ≥ lM − 1.

(d) and (e) We have shown that the shortest and longest length diﬀer by at most 1, with some number m ≥ 1 lengths equal to l1 and the remaining M −m lengths equal to l1 +1. It follows that 2l1 +1 = 2m + (M − m) = M + m. From this is follows that l1 = blog2 (M )c and m = 2l1 +1 − M . Exercise 2.16: (a) Grow a full ternary tree to a full ternary tree at each step. The smallest tree has 3 leaves. For the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves

6

from that node. We lose 1 leaf, but gain 2 more at each growth extension. Thus, M = 3 + 2n (for n an integer). (b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at each step. (c) If {a, b, c, d, e, f } have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huﬀman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}. Exercise 2.18: (a) Applying the Huﬀman coding algorithm to the code with M + 1 symbols with pM +1 = 0, we combine symbol M + 1 with symbol M and the reduced code has M symbols with probabilities p1 , . . . , pM . The Huﬀman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the 0 reduced code modified by a unit length increase in the codeword for symbol M . Thus L = L +pM 0 where L is the expected length for the code with M symbols. (b) All n of the zero probability symbols are combined together in the Huﬀman algorithm, and the reduced code from this combination is then the same as the code with M + 1 symbols in 0 part (a). Thus L = L + pM again. Exercise 2.19: (a) The entropies H(X), H(Y ), and H(XY ) can be expressed as X H(XY ) = − pXY (x, y) log pXY (x, y) x∈X ,y∈Y

H(X) = − H(Y ) = −

X

pXY (x, y) log pX (x)

x∈X ,y∈Y

X

pXY (x, y) log pY (y).

x∈X ,y∈Y

It is assumed that all symbol pairs x, y of zero probability have been removed from this sum, and thus all x (y) for which pX (x) = 0 ( pY (y) = 0) are consequently removed. Combining these equations, H(XY ) − H(X) − H(Y ) =

X

pXY (x, y) log

x∈X ,y∈Y

(b) Using the standard inequality log x ≤ (x − 1) log e, H(XY ) − H(X) − H(Y ) ≤

X

x∈X ,y∈Y

pX (x)pY (y) . pXY (x, y)

∑

∏ pX (x)pY (y) pXY (x, y) − 1 log e = 0. pXY (x, y)

Thus H(X, Y ) ≤ H(X) + H(Y ). Note that this inequality is satisfied with equality if and only if X and Y are independent. 7

(c) For n symbols, X1 , . . . , Xn , let Y be the ‘super-symbol’ X2 , . . . , Xn . Then using (b), H(X1 , . . . , Xn ) = H(X1 , Y ) ≤ H(X1 ) + H(Y ) = H(X1 ) + H(X2 , . . . , Xn ). Iterating this gives the desired result. An alternate approach generalizes part (b) in the following way: H(X1 , . . . , Xn ) −

X

H(Xi )

=

X

p(x1 , . . . , xn ) log

x1 ,... ,xn

i

≤ 0,

p(x1 ), . . . , ...p(xn ) p(x1 , . . . , xn )

where we have used log x ≤ (x − 1) log e again. Exercise 2.20 (a) Y is 1 if X = 1, which occurs with probability p1 . Y is 0 otherwise. Thus H(Y ) = −p1 log(p1 ) − (1 − p1 ) log(1 − p1 ) = Hb (p1 ). (b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0.

(c) Given Y =0, X=1 has probability 0, so X has M −1 possible choices with non-zero probability. The maximum entropy for an alphabet of size M −1 terms is log(M −1), so H(X|Y =0) ≤ log(M − 1). This upper bound is met with equality if Pr(X=j | X6=1) = M1−1 for all j 6= 1. Since Pr(X=j|X6=1) = pj /(1 − p1 ), this upper bound on H(X | Y =0) is achieved when p2 = p3 = · · · = pM . Combining this with part (b), H(X | Y ) = p1 H(X | Y =1) ≤ (1−p1 ) log(M − 1). (d) Note that H(XY ) = H(Y ) + H(X|Y ) ≤ Hb (p1 ) + (1−p1 ) log(M −1) and this is met with equality for p2 = · · · , pM . There are now two reasonable approaches. One is to note that H(XY ) can also be expressed as H(X) + H(Y |X). Since Y is uniquely specified by X, H(Y |X) = 0, H(X) = H(XY ) ≤ Hb (p1 ) + (1 − p1 ) log(M − 1),

(3)

with equality when p2 = p3 = · · · = pM . The other approach is to observe that H(X) ≤ H(XY ), which again leads (3), but this does not immediately imply that equality is met for p2 = · · · = pM . Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and plays a key role in the noisy channel coding theorem. (e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M . Thus it also applies to pmax . Exercise 2.22: One way to generate a source code for (X1 , X2 , X3 is to concatenate a Huﬀman code for (X1 , X2 ) with a Huﬀman code of X3 . The expected length of the resulting code for (X1 , X2 , X3 ) is Lmin,2 + Lmin . The expected length per source letter of this code is 23 Lmin,2 +

8

1 3 Lmin .

The expected length per source letter of the optimal code for (X1 , X2 , X3 ) can be no worse, so 2 1 Lmin,3 ≤ Lmin,2 + Lmin . 3 3 Exercise 2.23: (Run Length Coding) (a) Let C and C 0 be the codes mapping source symbols to intermediate integers and intermediate integers to outbit bits respectively. If C 0 is uniquely decodable, then the intermediate integers can be decoded from the received bit stream, and if C is also uniquely decodable, the original source bits can be decoded. The lengths specified for C 0 satisfy Kraft and thus this code can be made prefix-free and thus uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3 bit binary representation is prefix-free. C is a variable to fixed length code, mapping {b, ab, a2 b, . . . , a7 b, a8 } to the integers 0 to 8. This set of strings forms a full prefix-free set, and thus any binary string can be parsed into these ‘codewords’, which are then mapped to the integers 0 to 8. The integers can then be decoded into the ‘codewords’ which are then concatenated into the original binary sequence. In general, a variable to fixed length code is uniquely decodable if the encoder can parse, which is guaranteed if that set of ‘codewords’ is full and prefix-free. (b) Each occurence of source letter b causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 a’s causes 1 extra bit to leave the encoder. Thus, for each b, the encoder emits 4 bits with probability 1; it emits an extra bit with probability (0.9)8 ; it emits yet a further bit with probability (0.9)16 and so forth. Letting Y be the number of output bits per input b, E(Y ) = 4 + (.09)8 + (0.9)16 + · · · =

4 · (0.9)8 = 4.756. 1 − (0.9)8

(c) To count the number of b’s out of the source, let Bi = 1 if the P ith source letter is b and 2 Bi = 0 otherwise. Then E(Bi ) = 0.1 and σBi = 0.09. Let AB = (1/n) ni=1 Bi be the number of b’s per input in a run of n = 1020 inputs. This has mean 0.1 and variance (0.9) · 10−21 , which is close to 0.1 with very high probability. As the number of trials increase, it is closer to 0.1 with still higher probability. (d) The total number of output bits corresponding to the essentially 1019 b’s in the 1020 source letters is with high probability close to 4.756 · 1019 (1 + ≤) for small ≤. Thus, L≈

(0.1) · 4 · (0.9)8 = 0.4756. 1 − (0.9)8

Renewal theory provides a more convincing way to fully justify this solution. Note that the achieved L is impressive considering that the entropy of the source is −(0.9) log(0.9) − (0.1) log(0.1) = 0.469 bits/source symbol. Exercise 2.25: (a) Note that W takes on values − log(2/3) with probability 2/3 and − log(1/3) with probability 1/3. Thus E(W ) = log 3 − 23 . Note that E(W ) = H(X). The fluctuation of W around its means 2 2 1 2 2 is −1 3 with probability 3 and 3 with probability 3 . Thus σW = 9 . 9

(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality, and stated in (2.2) is: Pr(Xn ∈ T≤ ) ≥ 1 −

2 σW 1 =1− . n≤2 45

(c) To count the number of a’s out of the source, let the rv Yi (Xi ) be 1 for Xi = a and 0 for Xi = b. The Yi (Xi )’s are iid with mean Y = 2/3 and σY2 = 2/9. Na (X n ) is given by Na =

n X

Yi (Xi ),

i=1

which has mean 2n/3 and variance 2n/9.

P (d) Since the n-tuple Xn is iid, the sample outcome w(xn ) = i w(xi ). Let na be the sample value of Na corresponding to x n . Since w(a) = − log 2/3 and w(b) = − log 1/3, we have w(xn ) = na (− log 2/3) + (n − na )(− log 1/3) = n log 3 − na

W (Xn ) = n log 3 − Na .

˜ (X n ), the fluctuation of W (X n ) around its mean, is the negative of the In other words, W ˜ (X n ) = −N ˜a (X n ). fluctuation of Na (X n ); that is W (e) The typical set is given by: Ø Ø Ø Ø Ω æ Ω æ n n) Ø Ø Ø Ø w(x ˜ ≤ n Ø w(x ) n Ø 5/4 (and thus for f ≥ 2). Thus we can use the band edge symmetry condition, gˆ(f ) + gˆ(2 − f ) = 1/2 over 0 ≤ f ≤ 1. Since gˆ(f ) = 0 for 3/4 < f ≤ 1, it is necessary that gˆ(f ) = 1/2 for 1 < f ≤ 5/4. Similarly, since gˆ(f ) = 0 for f > 5/4, we must satisfy gˆ(f ) = 1/2 for |f | < 3/4. Thus, to satisfy the Nyquist criterion, gˆ(f ) is uniquely specified as below.

41

0.5

gˆ(f ) 0

3 4

5 4

ˆ )]. Elsewhere gˆ(f ) = 0 In the regions where gˆ(f ) = 1/2, we must choose qˆ(f ) = 1/[2ˆ p(f )h(f ˆ because h(f ) = 0, and thus qˆ(f ) is arbitrary. More specifically, we must choose qˆ(f ) to satisfy |f | ≤ 0.5; 0.5, 1 qˆ(f ) = 3−2|f | , 0.5 < |f | ≤ 0.75 1 , 1 ≤ |f | ≤ 5/4 3−2|f |

It makes no diﬀerence what qˆ(f ) is elsewhere as it will be multiplied by zero there. ˆ ) = 0 for f > 3/4, it is necessary that gˆ(f ) = 0 for |f | > 3/4. Thus, for all integers (c) Since h(f m, gˆ(f + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met. ˆ ) 6= 0, it is possible for gˆ(f ) to have an arbitrary value by (d) If for some frequency f , pˆ(f )h(f ˆ ) = 0 for some f , then gˆ(f ) = 0. choosing qˆ(f ) appropriately. On the other hand, if pˆ(f )h(f Thus, to avoid ISI, it is necessary that for each 0 ≤ f ≤ 1/(2T ),Pthere is some integer m such that ˆ +m/T )ˆ ˆ +m/T )ˆ h(f p(f +m/T ) 6= 0. Equivalently, it is necessary that m h(f p(f +m/T ) 6= 0 for all f . ˆ ) goes through There is one peculiarity here that you were not expected to deal with. If pˆ(f )h(f zero at f0 with some given slope, and that is the only f that can be used to satisfy the Nyquist criterion, then even if we ignore the point f0 , the response qˆ(f ) would approach infinity fast enough in the vicinity of f0 that qˆ(f ) would not be L2 . This overall problem shows that under ordinary conditions (i.e.non-zero filter transforms), there is no problem in choosing qˆ(f ) to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for qˆ(f ) be very large where pˆ(f ) is small, since this amplifies the noise in frequency regions where there is very little signal. Exercise 6.8: (a) With α = 1, the flat part of gˆ(f ) disappears. Using T = 1 and using the familiar formula cos2 x = (1 + cos 2x)/2, gˆ1 (f ) becomes ∑ ∏ 1 πf f gˆ1 (f ) = 1 + cos( ) rect( ). 2 2 2 Writing cos x = (1/2)[eix + e−ix ] and using the frequency shift rule for Fourier transforms, we

42

get 1 1 g1 (t) = sinc(2t) + sinc(2t + 1) + sinc(2t − 1) 2 2 sin(2πt) 1 sin(π(2t + 1)) 1 sin(π(2t − 1)) = + + 2πt 2 π(2t + 1) 2 π(2t − 1) sin(2πt) 1 sin(2πt) 1 sin(2πt) = − − 2πt 2 π(2t + 1) 2 π(2t − 1) ∑ ∏ sin(2πt) 1 1 1 sin(2πt) = − − = 2π t 2t + 1 2t − 1 2πt(1 − 4t2 ) sin(πt) cos(πt) sinc(t) cos(πt) = = . 2 πt(1 − 4t ) (1 − 4t2 ) This agrees with (6.18) in the text for α = 1, T = 1. Note that the denominator is 0 at t = ±0.5. The numerator is also 0, and it can be seen from the first equation above that the limiting value as t → ±0.5 is 1/2. Note also that this approaches 0 with increasing t as 1/t3 , much faster than sinc(t). (b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform gα (f ) that satisfies the Nyquist criterion for T = 1 and has a rolloﬀ of α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform vˆ(f ) is given by the following: vˆ(f + 1/2) =

dˆ g (f ) df

for

−(1 + α) (1 − α) fc and the overlapped portion can not be retrieved without additional knowledge about u(t). (c) The ambiguity is obvious by repeating the argument in (a). Now, since y(t) has some nonzero bandwidth, ambiguity might be possible in other ways also. We have already seen, however, that if u(t) has a bandwidth less than fc , then u(t) can be uniquely retrieved from x(t) in the absence of noise. (d) For u(t) real, x(t) = 2u(t) cos(2πfc t), so u(t) can be retrieved by dividing x(t) by 2 cos(2πfc t) except at those points of measure 0 where the cosine function is 0. This is not a reasonable approach, especially in the presence of noise, but it points out that the PAM case is essentially diﬀerent from the QAM case 44

(e) Since u∗ (t) exp(2πifc t) has energy at positive frequencies, the use of a Hilbert filter does not have an output equal to u(t) exp(2πifc t), and thus u(t) does not result from shifting this output down by fc . In the same way, the bands at 2fc and −2fc that result from DSB-QC demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means at all. For PAM it is surprising, since it says that these methods are not general. Since all time-limited waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem with the standard methods of demodulation. This is not a problem in practice, since fc is usually so much larger than the nominal bandwidth of u(t) that this problem is of no significance. Exercise 6.13: (a) Since u(t) is real, φ1 (t) =

Chapter 2 Exercise 2.2: (a) V + W is a random variable, so its expectation, by definition, is X X E[V + W ] = (v + w)pV W (v, w) v∈V w∈W

=

X X

v pV W (v, w) +

v∈V w∈W

=

X

v∈V

=

X

v

X

X X

w pV W (v, w)

v∈V w∈W

pV W (v, w) +

w∈W

v pV (v) +

v∈V

X

X

w∈W

w

X

pV W (v, w)

v∈V

w pW (w)

w∈W

= E[V ] + E[W ].

(b) Once again, working from first principles, X X E[V · W ] = (v · w)pV W (v, w) v∈V w∈W

=

X X

v∈V w∈W

=

X

v∈V

(v · w)pV (v)pW (w)

v pV (v)

X

(Using independence)

w pW (w)

w∈W

= E[V ] · E[W ].

(c) To discover a case where E[V · W ] 6= E[V ] · E[W ], first try the simplest kind of example where V and W are binary with the joint pmf pV W (0, 1) = pV W (1, 0) = 1/2;

pV W (0, 0) = pV W (1, 1) = 0. 1

Clearly, V and W are not independent. Also, E[V · W ] = 0 whereas E[V ] = E[W ] = 1/2 and hence E[V ] · E[W ] = 1/4.

The second case requires some experimentation. One approach is to choose a joint distribution such that E[V · W ] = 0 and E[V ] = 0. A simple solution is then given by the pmf, pV W (−1, 0) = pV W (0, 1) = pV W (1, 0) = 1/3. Again, V and W are not independent. Clearly, E[V · W ] = 0. Also, E[V ] = 0 (what is E[W ]?). Hence, E[V · W ] = E[V ] · E[W ]. (d) σV2 +W

= E[(V + W )2 ] − (E[V + W ])2

= E[V 2 ] + E[W 2 ] + E[2V · W ] − (E[V ] + E[W ])2

= E[V 2 ] + E[W 2 ] + 2E[V ] · E[W ] − E[V ]2 − E[W ]2 − 2E[V ] · E[W ] = E[V 2 ] − E[V ]2 + E[W 2 ] − E[W ]2 2 = σV2 + σW .

Exercise 2.4: (a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2 ) = Pr(X2 > X1 ). These two events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2 ) and Pr(X1 < X2 ) sum to 1, so must each be 1/2. Thus Pr(X1 ≥ X2 ) = Pr(X2 ≥ X1 ) = 1/2.

(b) Invoking the symmetry among X1 , X2 and X3 , we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3. (c) The event {N > n} is the same as the event {X1 is the minimum among the n iid random variables X1 , X2 , · · · , Xn }. By extending the argument in part (b), we see that 1 Pr(X1 is the smallest of X1 , . . . , Xn ) = 1/n. Finally, Pr {N ≥ n} = Pr {N > n − 1}= n−1 for n ≥ 2. (d) Since N is a non-negative integer random variable (taking on values from 2 to 1), we can use Exercise 2.3(a) as follows: E[N ] =

1 X

n=1

Pr {N ≥ n}

= Pr {N ≥ 1} + = 1+ = 1+

1 X

1 n−1 n=2 1 X 1 . n n=1

2

1 X

n=2

Pr {N ≥ n}

Since the series

P1

1 n=1 n

diverges, we conclude that E[N ] = 1.

(e) Since the alphabet has a finite number of letters,1 Pr(X1 = X2 ) is no longer 0 and depends on the particular probability distribution. Thus, although, Pr(X1 ≥ X2 ) = Pr(X2 ≥ X1 ) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric value. If X1 = amin , then no subsequent rv X2 , X3 , . . . can have a smaller value, so N = 1 in this case. Since the event X1 = amin occurs with positive probability, E[N ] = 1. Exercise 2.6: (a) Assume the contrary; i.e., there is a suﬃx-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say, x1 , x2 , . . . , xn and x01 , x02 , . . . , x0m such that, C(x1 )C(x2 ) . . . C(xn ) = C(x01 )C(x02 ) . . . C(x0m ). Then one of the following must hold: • C(xn ) = C(x0m ) • C(xn ) is a suﬃx of C(x0m ) • C(x0m ) is a suﬃx of C(xn ). In the last two cases we arrive at a contradiction since the code is hypothesized to be suﬃx-free. In the first case, xn must equal x0m because of the suﬃx freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letter must diﬀer after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suﬃx-free codes are uniquely decodable. (b) Any prefix-free code becomes a suﬃx-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0,01,11} which can be seen to be a suﬃx-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code. A codeword in the above code cannot be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder, 0111111111 . . . Assuming that source letters {a,b,c} map to {0,01,11}, we cannot distinguish between the two possible source sequences, acccccccc . . . and bcccccccc . . . , 1

The same results can be obtained with some extra work for a countably infinite discrete alphabet.

3

till the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding. (c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix free and suﬃx free. Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either the codewords 00 and 01 for C2 or C3 , and thus must use 10 and 11, which is not suﬃx free. Exercise 2.7: Consider the set of codeword lengths (1,2,2) and arrange them as (2,1,2). Then, u1 =0 is represented as 0.00. Next, u2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails. Exercise 2.9: (a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9) H[X] − L =

M X j=1

∑ −lj ∏ M X 2−lj 2 pj log ≤ pj − 1 log e = 0. pj pj j=1

As is evident from Figure 2.7, the inequality is strict unless 2−lj = pj for each j. Thus if H[X] = L, it follows that 2−lj = pj for each j. (b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order node 1 corresponds to the occurence of either letter b or c, and thus has probability 1/2. ✏bb ✏✏ ✏P ✏ P ✏ Pbc ✟PP ✟ Pba✏ b ✏cb ✟ ✏✏ PP ✏ ✓❍❍ ✓ ❍✏ cPPPca Pcc 1✓ ✓ ✏ab ✏ ❅ PP 0❅ ✟ ✟✏ Pac a ❅✟ PP Paa

aa → 00 ab → 011 ac → 010 ba → 110 bb → 1111 bc → 1110 ca → 100 cb → 1011 cc → 1010

Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, 10 amd 11 correspond to b and c, with probabilities 1/4 each. One can proceed with higher order nodes in the same way, but what is the principle behind this? In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability pj and length `j = 2−j , weQsee that each −`xi node corresponding to an initial sequence of letters x1 , . . . , xn has a probability equal i2 P to the product of the individual letter probabilities and an order equal to i `xi . Thus each node labelled by a subsequence of letters has a probability 2−` where ` is the order of that node. The other nodes (those unlabelled in the example above) have a probability equal to the sum of the immediately following labelled nodes. This probability is again 2−` for an `th order node, which can be established by induction if one wishes to be formal.

4

Exercise 2.11: (a) For n = 2, 2 M M M M X M X X X X 2−lj = 2−lj1 2−lj2 = 2−(lj1 +lj2 ) . j=1

j1 =1

j2 =1

j1 =1 j2 =1

The same approach works for arbitrary n.

(b) Each source n-tuple x n = (aj1 aj2 , . . . , ajn ), is encoded into a concatenation C(aj1 )C(aj2 ) . . . C(ajn ) of binary digits of aggregate length l(x n ) = lj1 + lj2 + · · · , +ljn . Since there is one n-tuple x n for each choice of aj1 , aj2 , . . . , ajn , the result of part (a) can be rewritten as n M X X n 2−lj = 2−l(x ) . (1) xn

j=1

(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length i, n nl M max X X 2−lj = Ai 2−i . j=1

i=1

This uses the fact that since each codeword has length at most lmax , each concatenation has length at most nlmax .

(d) From unique decodability, each of these concatenations must be diﬀerent, so there are at most 2i concatenations of aggregate length i, i.e., Ai ≤ 2i . Thus, since the above sum contains at most nlmax terms, n M X 2−lj ≤ nlmax . (2) j=1

(e) Note that 1/n

[nlmax ]

∑

∏ ln(nlmax ) = exp −→ exp(0) = 1 n

as n → 1. Since (2) must be satisfied for all n, the Kraft inequality must be satisfied. Exercise 2.13: (a) In the Huﬀman algorithm, we start by combining p3 and p4 . Since we have p1 = p3 +p4 ≥ p2 , we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1,2,3 and 3 respectively. (b) Note that p3 ≤ p2 and p4 ≤ p2 so p3 + p4 ≤ 2p2 . Thus p1 = p3 + p4 ≤ 2p2

which implies p1 + p3 + p4 ≤ 4p2 . 5

Since p2 = 1 − p1 − p3 − p4 , the latter equation implies that 1 − p2 ≤ 4p2 , or p2 ≥ 0.2. From the former equation, then, p1 ≤ 2p2 ≤ 0.4 shows that p1 ≤ 0.4. These bounds can be met by also choosing p3 = p4 = 0.2. Thus pmax = 0.4. (c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1 . Thus 1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3 and p3 = p4 = 1/6. (d) The argument in part (b) remains the same if we assume p1 ≤ p3 +p4 rather than p1 = p3 +p4 , i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax . Thus assuming p1 > pmax implies that p1 > p3 + p4 . Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p2 = p1 ). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1. (e) The lengths of any optimal prefix free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p1 > pmax , then, from (b), p1 > p3 + p4 , so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2). (f) The argument in part (c) remains almost the same if we start with the assumption that p1 ≥ p3 + p4 . In this case p2 = 1 − p1 − p3 − p3 ≥ 1 − 2p1 . Combined with p1 ≥ p2 , we again have p1 ≥ pmin . Thus if p1 < pmin , we must have p3 + p4 > p1 ≥ p2 . We then must combine p1 and p2 in the second step of the Huﬀman algorithm, so each codeword will have length 2. (g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5 and all other symbols have an aggregate probability of 1/5, then the Huﬀman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length 1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1. Exercise 2.15: (a) This is the same as Lemma 2.5.1. (b) Since p1 < pM −1 + pM , we see that p1 < p0M −1 , where p0M −1 is the probability of the node in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus, 0 by part (a), l1 ≥ lM −1 = lM − 1.

(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l1 = lM , and otherwise, p1 < pM + pk , so l1 must be at least as large as the length of the immediate parent of M , showing that l1 ≥ lM − 1.

(d) and (e) We have shown that the shortest and longest length diﬀer by at most 1, with some number m ≥ 1 lengths equal to l1 and the remaining M −m lengths equal to l1 +1. It follows that 2l1 +1 = 2m + (M − m) = M + m. From this is follows that l1 = blog2 (M )c and m = 2l1 +1 − M . Exercise 2.16: (a) Grow a full ternary tree to a full ternary tree at each step. The smallest tree has 3 leaves. For the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves

6

from that node. We lose 1 leaf, but gain 2 more at each growth extension. Thus, M = 3 + 2n (for n an integer). (b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at each step. (c) If {a, b, c, d, e, f } have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huﬀman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}. Exercise 2.18: (a) Applying the Huﬀman coding algorithm to the code with M + 1 symbols with pM +1 = 0, we combine symbol M + 1 with symbol M and the reduced code has M symbols with probabilities p1 , . . . , pM . The Huﬀman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the 0 reduced code modified by a unit length increase in the codeword for symbol M . Thus L = L +pM 0 where L is the expected length for the code with M symbols. (b) All n of the zero probability symbols are combined together in the Huﬀman algorithm, and the reduced code from this combination is then the same as the code with M + 1 symbols in 0 part (a). Thus L = L + pM again. Exercise 2.19: (a) The entropies H(X), H(Y ), and H(XY ) can be expressed as X H(XY ) = − pXY (x, y) log pXY (x, y) x∈X ,y∈Y

H(X) = − H(Y ) = −

X

pXY (x, y) log pX (x)

x∈X ,y∈Y

X

pXY (x, y) log pY (y).

x∈X ,y∈Y

It is assumed that all symbol pairs x, y of zero probability have been removed from this sum, and thus all x (y) for which pX (x) = 0 ( pY (y) = 0) are consequently removed. Combining these equations, H(XY ) − H(X) − H(Y ) =

X

pXY (x, y) log

x∈X ,y∈Y

(b) Using the standard inequality log x ≤ (x − 1) log e, H(XY ) − H(X) − H(Y ) ≤

X

x∈X ,y∈Y

pX (x)pY (y) . pXY (x, y)

∑

∏ pX (x)pY (y) pXY (x, y) − 1 log e = 0. pXY (x, y)

Thus H(X, Y ) ≤ H(X) + H(Y ). Note that this inequality is satisfied with equality if and only if X and Y are independent. 7

(c) For n symbols, X1 , . . . , Xn , let Y be the ‘super-symbol’ X2 , . . . , Xn . Then using (b), H(X1 , . . . , Xn ) = H(X1 , Y ) ≤ H(X1 ) + H(Y ) = H(X1 ) + H(X2 , . . . , Xn ). Iterating this gives the desired result. An alternate approach generalizes part (b) in the following way: H(X1 , . . . , Xn ) −

X

H(Xi )

=

X

p(x1 , . . . , xn ) log

x1 ,... ,xn

i

≤ 0,

p(x1 ), . . . , ...p(xn ) p(x1 , . . . , xn )

where we have used log x ≤ (x − 1) log e again. Exercise 2.20 (a) Y is 1 if X = 1, which occurs with probability p1 . Y is 0 otherwise. Thus H(Y ) = −p1 log(p1 ) − (1 − p1 ) log(1 − p1 ) = Hb (p1 ). (b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0.

(c) Given Y =0, X=1 has probability 0, so X has M −1 possible choices with non-zero probability. The maximum entropy for an alphabet of size M −1 terms is log(M −1), so H(X|Y =0) ≤ log(M − 1). This upper bound is met with equality if Pr(X=j | X6=1) = M1−1 for all j 6= 1. Since Pr(X=j|X6=1) = pj /(1 − p1 ), this upper bound on H(X | Y =0) is achieved when p2 = p3 = · · · = pM . Combining this with part (b), H(X | Y ) = p1 H(X | Y =1) ≤ (1−p1 ) log(M − 1). (d) Note that H(XY ) = H(Y ) + H(X|Y ) ≤ Hb (p1 ) + (1−p1 ) log(M −1) and this is met with equality for p2 = · · · , pM . There are now two reasonable approaches. One is to note that H(XY ) can also be expressed as H(X) + H(Y |X). Since Y is uniquely specified by X, H(Y |X) = 0, H(X) = H(XY ) ≤ Hb (p1 ) + (1 − p1 ) log(M − 1),

(3)

with equality when p2 = p3 = · · · = pM . The other approach is to observe that H(X) ≤ H(XY ), which again leads (3), but this does not immediately imply that equality is met for p2 = · · · = pM . Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and plays a key role in the noisy channel coding theorem. (e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M . Thus it also applies to pmax . Exercise 2.22: One way to generate a source code for (X1 , X2 , X3 is to concatenate a Huﬀman code for (X1 , X2 ) with a Huﬀman code of X3 . The expected length of the resulting code for (X1 , X2 , X3 ) is Lmin,2 + Lmin . The expected length per source letter of this code is 23 Lmin,2 +

8

1 3 Lmin .

The expected length per source letter of the optimal code for (X1 , X2 , X3 ) can be no worse, so 2 1 Lmin,3 ≤ Lmin,2 + Lmin . 3 3 Exercise 2.23: (Run Length Coding) (a) Let C and C 0 be the codes mapping source symbols to intermediate integers and intermediate integers to outbit bits respectively. If C 0 is uniquely decodable, then the intermediate integers can be decoded from the received bit stream, and if C is also uniquely decodable, the original source bits can be decoded. The lengths specified for C 0 satisfy Kraft and thus this code can be made prefix-free and thus uniquely decodable. For example, mapping 8 → 1 and each other integer to 0 followed by its 3 bit binary representation is prefix-free. C is a variable to fixed length code, mapping {b, ab, a2 b, . . . , a7 b, a8 } to the integers 0 to 8. This set of strings forms a full prefix-free set, and thus any binary string can be parsed into these ‘codewords’, which are then mapped to the integers 0 to 8. The integers can then be decoded into the ‘codewords’ which are then concatenated into the original binary sequence. In general, a variable to fixed length code is uniquely decodable if the encoder can parse, which is guaranteed if that set of ‘codewords’ is full and prefix-free. (b) Each occurence of source letter b causes 4 bits to leave the encoder immediately. In addition, each subsequent run of 8 a’s causes 1 extra bit to leave the encoder. Thus, for each b, the encoder emits 4 bits with probability 1; it emits an extra bit with probability (0.9)8 ; it emits yet a further bit with probability (0.9)16 and so forth. Letting Y be the number of output bits per input b, E(Y ) = 4 + (.09)8 + (0.9)16 + · · · =

4 · (0.9)8 = 4.756. 1 − (0.9)8

(c) To count the number of b’s out of the source, let Bi = 1 if the P ith source letter is b and 2 Bi = 0 otherwise. Then E(Bi ) = 0.1 and σBi = 0.09. Let AB = (1/n) ni=1 Bi be the number of b’s per input in a run of n = 1020 inputs. This has mean 0.1 and variance (0.9) · 10−21 , which is close to 0.1 with very high probability. As the number of trials increase, it is closer to 0.1 with still higher probability. (d) The total number of output bits corresponding to the essentially 1019 b’s in the 1020 source letters is with high probability close to 4.756 · 1019 (1 + ≤) for small ≤. Thus, L≈

(0.1) · 4 · (0.9)8 = 0.4756. 1 − (0.9)8

Renewal theory provides a more convincing way to fully justify this solution. Note that the achieved L is impressive considering that the entropy of the source is −(0.9) log(0.9) − (0.1) log(0.1) = 0.469 bits/source symbol. Exercise 2.25: (a) Note that W takes on values − log(2/3) with probability 2/3 and − log(1/3) with probability 1/3. Thus E(W ) = log 3 − 23 . Note that E(W ) = H(X). The fluctuation of W around its means 2 2 1 2 2 is −1 3 with probability 3 and 3 with probability 3 . Thus σW = 9 . 9

(b) The bound on the probability of the typical set, as derived using the Chebyshev inequality, and stated in (2.2) is: Pr(Xn ∈ T≤ ) ≥ 1 −

2 σW 1 =1− . n≤2 45

(c) To count the number of a’s out of the source, let the rv Yi (Xi ) be 1 for Xi = a and 0 for Xi = b. The Yi (Xi )’s are iid with mean Y = 2/3 and σY2 = 2/9. Na (X n ) is given by Na =

n X

Yi (Xi ),

i=1

which has mean 2n/3 and variance 2n/9.

P (d) Since the n-tuple Xn is iid, the sample outcome w(xn ) = i w(xi ). Let na be the sample value of Na corresponding to x n . Since w(a) = − log 2/3 and w(b) = − log 1/3, we have w(xn ) = na (− log 2/3) + (n − na )(− log 1/3) = n log 3 − na

W (Xn ) = n log 3 − Na .

˜ (X n ), the fluctuation of W (X n ) around its mean, is the negative of the In other words, W ˜ (X n ) = −N ˜a (X n ). fluctuation of Na (X n ); that is W (e) The typical set is given by: Ø Ø Ø Ø Ω æ Ω æ n n) Ø Ø Ø Ø w(x ˜ ≤ n Ø w(x ) n Ø 5/4 (and thus for f ≥ 2). Thus we can use the band edge symmetry condition, gˆ(f ) + gˆ(2 − f ) = 1/2 over 0 ≤ f ≤ 1. Since gˆ(f ) = 0 for 3/4 < f ≤ 1, it is necessary that gˆ(f ) = 1/2 for 1 < f ≤ 5/4. Similarly, since gˆ(f ) = 0 for f > 5/4, we must satisfy gˆ(f ) = 1/2 for |f | < 3/4. Thus, to satisfy the Nyquist criterion, gˆ(f ) is uniquely specified as below.

41

0.5

gˆ(f ) 0

3 4

5 4

ˆ )]. Elsewhere gˆ(f ) = 0 In the regions where gˆ(f ) = 1/2, we must choose qˆ(f ) = 1/[2ˆ p(f )h(f ˆ because h(f ) = 0, and thus qˆ(f ) is arbitrary. More specifically, we must choose qˆ(f ) to satisfy |f | ≤ 0.5; 0.5, 1 qˆ(f ) = 3−2|f | , 0.5 < |f | ≤ 0.75 1 , 1 ≤ |f | ≤ 5/4 3−2|f |

It makes no diﬀerence what qˆ(f ) is elsewhere as it will be multiplied by zero there. ˆ ) = 0 for f > 3/4, it is necessary that gˆ(f ) = 0 for |f | > 3/4. Thus, for all integers (c) Since h(f m, gˆ(f + 2m) is 0 for 3/4 < f < 1 and the Nyquist criterion cannot be met. ˆ ) 6= 0, it is possible for gˆ(f ) to have an arbitrary value by (d) If for some frequency f , pˆ(f )h(f ˆ ) = 0 for some f , then gˆ(f ) = 0. choosing qˆ(f ) appropriately. On the other hand, if pˆ(f )h(f Thus, to avoid ISI, it is necessary that for each 0 ≤ f ≤ 1/(2T ),Pthere is some integer m such that ˆ +m/T )ˆ ˆ +m/T )ˆ h(f p(f +m/T ) 6= 0. Equivalently, it is necessary that m h(f p(f +m/T ) 6= 0 for all f . ˆ ) goes through There is one peculiarity here that you were not expected to deal with. If pˆ(f )h(f zero at f0 with some given slope, and that is the only f that can be used to satisfy the Nyquist criterion, then even if we ignore the point f0 , the response qˆ(f ) would approach infinity fast enough in the vicinity of f0 that qˆ(f ) would not be L2 . This overall problem shows that under ordinary conditions (i.e.non-zero filter transforms), there is no problem in choosing qˆ(f ) to avoid intersymbol interference. Later, when noise is taken into account, it will be seen that it is undesirable for qˆ(f ) be very large where pˆ(f ) is small, since this amplifies the noise in frequency regions where there is very little signal. Exercise 6.8: (a) With α = 1, the flat part of gˆ(f ) disappears. Using T = 1 and using the familiar formula cos2 x = (1 + cos 2x)/2, gˆ1 (f ) becomes ∑ ∏ 1 πf f gˆ1 (f ) = 1 + cos( ) rect( ). 2 2 2 Writing cos x = (1/2)[eix + e−ix ] and using the frequency shift rule for Fourier transforms, we

42

get 1 1 g1 (t) = sinc(2t) + sinc(2t + 1) + sinc(2t − 1) 2 2 sin(2πt) 1 sin(π(2t + 1)) 1 sin(π(2t − 1)) = + + 2πt 2 π(2t + 1) 2 π(2t − 1) sin(2πt) 1 sin(2πt) 1 sin(2πt) = − − 2πt 2 π(2t + 1) 2 π(2t − 1) ∑ ∏ sin(2πt) 1 1 1 sin(2πt) = − − = 2π t 2t + 1 2t − 1 2πt(1 − 4t2 ) sin(πt) cos(πt) sinc(t) cos(πt) = = . 2 πt(1 − 4t ) (1 − 4t2 ) This agrees with (6.18) in the text for α = 1, T = 1. Note that the denominator is 0 at t = ±0.5. The numerator is also 0, and it can be seen from the first equation above that the limiting value as t → ±0.5 is 1/2. Note also that this approaches 0 with increasing t as 1/t3 , much faster than sinc(t). (b) It is necessary to use the result of Exercise 6.6 here. As shown there, the inverse transform of a real symmetric waveform gα (f ) that satisfies the Nyquist criterion for T = 1 and has a rolloﬀ of α ≤ 1 is equal to sinc(t)v(t). Here v(t) is lowpass limited to α/2 and its transform vˆ(f ) is given by the following: vˆ(f + 1/2) =

dˆ g (f ) df

for

−(1 + α) (1 − α) fc and the overlapped portion can not be retrieved without additional knowledge about u(t). (c) The ambiguity is obvious by repeating the argument in (a). Now, since y(t) has some nonzero bandwidth, ambiguity might be possible in other ways also. We have already seen, however, that if u(t) has a bandwidth less than fc , then u(t) can be uniquely retrieved from x(t) in the absence of noise. (d) For u(t) real, x(t) = 2u(t) cos(2πfc t), so u(t) can be retrieved by dividing x(t) by 2 cos(2πfc t) except at those points of measure 0 where the cosine function is 0. This is not a reasonable approach, especially in the presence of noise, but it points out that the PAM case is essentially diﬀerent from the QAM case 44

(e) Since u∗ (t) exp(2πifc t) has energy at positive frequencies, the use of a Hilbert filter does not have an output equal to u(t) exp(2πifc t), and thus u(t) does not result from shifting this output down by fc . In the same way, the bands at 2fc and −2fc that result from DSB-QC demodulation mix with those at 0 frequency, so cannot be removed by an ordinary LTI filter. For QAM, this problem is to be expected since u(t) cannot be uniquely generated by any means at all. For PAM it is surprising, since it says that these methods are not general. Since all time-limited waveforms are unbounded in frequency, it says that there is a fundamental theoretical problem with the standard methods of demodulation. This is not a problem in practice, since fc is usually so much larger than the nominal bandwidth of u(t) that this problem is of no significance. Exercise 6.13: (a) Since u(t) is real, φ1 (t) =