
On the Computation of Rate-Distortion Functions

I. CSISZÁR

Abstract—In a recent paper [1], Blahut suggested an efficient algorithm for computing rate-distortion functions. In this correspondence we show that the sequence of distributions used in that algorithm has a limit yielding a point on the $R(d)$ curve if the reproducing alphabet is finite, and we obtain a similar but weaker result for countable reproducing alphabets.

Manuscript received March 6, 1973; revised July 30, 1973. The author is with the Mathematical Institute of the Hungarian Academy of Sciences, Budapest, Hungary.

I. INTRODUCTION

In a recent paper [1], Blahut suggested efficient algorithms for computing channel capacity and rate-distortion functions; for the former case, the same algorithm appears (though in less generality) in the work of Arimoto [2]. The convergence proofs in [1] contain a gap: they claim that a product $b_1 b_2 \cdots b_k$ cannot have a limit (as $k \to \infty$) if a subsequence of the $b_i$ has a limit $> 1$; this is true for a positive limit only. In [2] the proofs are complete and, moreover, the distributions used in the algorithm are themselves shown to converge to one achieving capacity. We shall use a method similar to that of [2] to prove an analogous result for the rate-distortion function, supposing that the reproducing alphabet is finite. A similar but weaker result will also be obtained for an infinite reproducing alphabet. As reviewers pointed out, Blahut's original convergence statement [see (12)] has also been proved in a paper of Boukris [3], as well as by Blahut himself in his Ph.D. dissertation [4].

II. PRELIMINARIES

Suppose that symbols selected from a finite or countable alphabet with distribution $p = (p_j)$ are to be reproduced by symbols of another finite or countable alphabet. For a given loss matrix $(\rho_{jk})$, the rate-distortion function $R(d)$ is defined as the minimum amount of information needed for reproduction with average loss $\le d$, i.e.,

$$R(d) = \inf I(Q) = \inf \sum_j \sum_k p_j Q_{k|j} \log \frac{Q_{k|j}}{\sum_i p_i Q_{k|i}} \tag{1}$$

where the inf refers to $Q$ with $D(Q) \le d$, and

$$D(Q) = \sum_j \sum_k p_j Q_{k|j} \rho_{jk}. \tag{2}$$

(The argument $p$ is omitted because $p$ will be fixed throughout.) Setting $F_s = \inf_Q (I(Q) - sD(Q))$, the $R(d)$ curve is the envelope of the lines with slope $s \le 0$ and $R$-axis intercept $F_s$; see, e.g., [5]. It suffices therefore to compute $F_s$. Blahut's algorithm [1] is based on the observation that $F_s = \inf_{Q,q} F_s(Q,q)$, where

$$F_s(Q,q) = \sum_j \sum_k p_j Q_{k|j} \log \frac{Q_{k|j}}{q_k} - sD(Q), \tag{3}$$

and it consists in successive minimizations with respect to $Q$ and $q$. For fixed $Q$ or $q$, $F_s(Q,q)$ is minimized by $q(Q)$ or $Q(q)$, respectively, defined by

$$q_k(Q) = \sum_j p_j Q_{k|j}, \qquad Q_{k|j}(q) = q_k \exp(s\rho_{jk}) \Big[ \sum_l q_l \exp(s\rho_{jl}) \Big]^{-1}. \tag{4}$$

This was established in [1] by Lagrange multipliers. A more direct proof is given by the easily checked identities

$$F_s(Q,q) = F_s(Q,q(Q)) + I(q(Q) \| q) = F_s(Q(q),q) + \sum_j p_j I(Q_j \| Q_j(q)) \tag{5}$$

(for the first identity, write $\log (Q_{k|j}/q_k) = \log (Q_{k|j}/q_k(Q)) + \log (q_k(Q)/q_k)$ in (3) and sum), where $Q_j$ stands for the distribution $(Q_{k|j})$ for fixed $j$, and $I(q \| q')$ denotes the nonnegative quantity

$$I(q \| q') = \sum_k q_k \log \frac{q_k}{q_k'}. \tag{6}$$

Starting from an arbitrary $q^{(1)} = (q_k^{(1)})$ with $q_k^{(1)} > 0$ for each $k$, set recursively $Q^{(n)} = Q(q^{(n-1)})$ and $q^{(n)} = q(Q^{(n)})$, $n = 2, 3, \ldots$. Then clearly

$$F_s(Q^{(2)},q^{(1)}) \ge F_s(Q^{(2)},q^{(2)}) \ge F_s(Q^{(3)},q^{(2)}) \ge \cdots.$$
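The alternating minimization (3)-(4) is straightforward to implement when both alphabets are finite. The following is a minimal sketch (ours, not from [1]; the function name, the uniform start $q^{(1)}$, and the fixed iteration count are assumptions). For a slope $s \le 0$ it returns an estimate of $F_s$ together with the pair $(D(Q), I(Q))$, i.e., the point of the $R(d)$ curve supported by the line of slope $s$:

```python
import numpy as np

def blahut_fs(p, rho, s, n_iter=500):
    """Alternating minimization of F_s(Q, q) over Q and q, per (3)-(4).

    p   : source distribution (p_j), shape (J,)
    rho : loss matrix (rho_jk), shape (J, K)
    s   : slope parameter, s <= 0
    Returns (F_s estimate, D(Q), I(Q), q).
    """
    p, rho = np.asarray(p, float), np.asarray(rho, float)
    q = np.full(rho.shape[1], 1.0 / rho.shape[1])  # q^(1): strictly positive
    e = np.exp(s * rho)                            # e_jk = exp(s * rho_jk)
    for _ in range(n_iter):
        num = q[None, :] * e                       # numerator of Q_{k|j}(q), eq. (4)
        Q = num / num.sum(axis=1, keepdims=True)   # Q^(n) = Q(q^(n-1))
        q = p @ Q                                  # q^(n) = q(Q^(n)), eq. (4)
    D = float(np.sum(p[:, None] * Q * rho))            # average loss D(Q)
    I = float(np.sum(p[:, None] * Q * np.log(Q / q)))  # mutual information I(Q)
    return I - s * D, D, I, q
```

Sweeping $s$ over a grid of negative values traces out the $R(d)$ curve point by point; for a binary source under Hamming loss the output can be checked against the closed form $R(d) = h(p_1) - h(d)$ for $d \le \min(p_1, 1-p_1)$, with $h$ the binary entropy function.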

III. CONVERGENCE RESULTS FOR BLAHUT'S ALGORITHM

For arbitrary $Q$ and $q = q(Q)$, consider the "backward probabilities"

$$P_{j|k} = p_j Q_{k|j}/q_k \tag{7}$$

and let $P_k$ denote the corresponding distribution for fixed $k$ (with $q_k > 0$). We start from the easily checked identity

$$F_s(Q^{(n)},q^{(n-1)}) + \sum_k q_k I(P_k \| P_k^{(n)}) - F_s(Q,q) = \sum_k q_k \log \frac{q_k^{(n)}}{q_k^{(n-1)}} \tag{8}$$

where $P_k^{(n)}$ is defined as $P_k$ with $Q^{(n)}$ playing the role of $Q$, i.e.,

$$P_{j|k}^{(n)} = p_j q_k^{(n-1)} \exp(s\rho_{jk}) \Big[ q_k^{(n)} \sum_l q_l^{(n-1)} \exp(s\rho_{jl}) \Big]^{-1}. \tag{9}$$

Since the middle term on the left side of (8) is nonnegative, (8) implies, for any $N > M \ge 1$,

$$\sum_{n=M+1}^{N} \big[ F_s(Q^{(n)},q^{(n-1)}) - F_s(Q,q) \big] \le \sum_k q_k \log \frac{q_k^{(N)}}{q_k^{(M)}} = I(q \| q^{(M)}) - I(q \| q^{(N)}). \tag{10}$$

Supposing $F_s(Q,q) \le \lim_{n\to\infty} F_s(Q^{(n)},q^{(n)}) = \lim_{n\to\infty} F_s(Q^{(n)},q^{(n-1)})$, the terms on the left side of (10) are nonnegative and nonincreasing in $n$, and by (10) their partial sums remain bounded. This shows, in particular, that the series converges, and its terms tend to zero; thus

$$F_s(Q,q) = \lim_{n\to\infty} F_s(Q^{(n)},q^{(n-1)}) \tag{11}$$

if $I(q \| q^{(1)}) < \infty$. Since $F_s(Q^{(n)},q^{(n)}) \ge F_s$ for every $n$, this proves

$$\lim_{n\to\infty} F_s(Q^{(n)},q^{(n)}) = \lim_{n\to\infty} F_s(Q^{(n)},q^{(n-1)}) = F_s = \inf_{Q,q} F_s(Q,q) \tag{12}$$

provided that the infimum in (12) can be approached by $q$ with $I(q \| q^{(1)}) < \infty$. The latter condition is trivially fulfilled if the reproducing alphabet is finite; if it is countable, an easily checked sufficient condition is $d^* = \inf_k \sum_j p_j \rho_{jk} < \infty$.

Theorem 1: If the reproducing alphabet is finite, there exists a $q^*$ (possibly depending on $q^{(1)}$) such that $q^{(n)} \to q^*$, $Q^{(n)} \to Q^* = Q(q^*)$, and $F_s(Q^*,q^*) = F_s$.

Proof: Pick a convergent subsequence $q^{(n_i)} \to q^*$, say, of $q^{(n)}$. Then $Q^{(n_i+1)} = Q(q^{(n_i)}) \to Q(q^*) = Q^*$ and $F_s(Q^{(n_i+1)},q^{(n_i)}) \to F_s(Q^*,q^*)$. In view of (12), we have $F_s(Q^*,q^*) = F_s$; thus $q^* = q(Q^*)$, and hence (10) applies for $Q^*$ and $q^*$. In particular, $I(q^* \| q^{(n)})$ is a nonincreasing sequence. Since $q^{(n_i)} \to q^*$ implies $I(q^* \| q^{(n_i)}) \to 0$, this means $I(q^* \| q^{(n)}) \to 0$. Hence $q^{(n)} \to q^*$, completing the proof.

For a countable reproducing alphabet, this proof breaks down at two points: i) it may not be possible to extract a subsequence from $q^{(n)}$ converging to a distribution $q^*$; ii) even if such a subsequence $q^{(n_i)} \to q^*$ can be found, $I(q^* \| q^{(n_i)}) \to 0$ does not necessarily follow.
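As a small numerical illustration of (11) and of Theorem 1 (ours, not part of the correspondence; the source, loss matrix, and slope are arbitrary choices), one can run the iteration of Section II and watch $I(q^{(n)} \| q^{(n-1)})$ vanish while $q^{(n)}$ stabilizes:

```python
import numpy as np

# Binary source, Hamming loss, slope s = -2.
p = np.array([0.4, 0.6])
rho = np.array([[0.0, 1.0], [1.0, 0.0]])
s = -2.0
e = np.exp(s * rho)
q = np.array([0.5, 0.5])                     # q^(1)
for n in range(2, 10):
    num = q[None, :] * e                     # Q^(n) = Q(q^(n-1)), eq. (4)
    Q = num / num.sum(axis=1, keepdims=True)
    q_new = p @ Q                            # q^(n) = q(Q^(n)), eq. (4)
    print(n, float(np.sum(q_new * np.log(q_new / q))))  # I(q^(n) || q^(n-1))
    q = q_new
```

The printed divergences tend to zero and $q^{(n)}$ settles at the limit $q^*$ whose existence Theorem 1 asserts.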

Theorem 2: Even for a countable reproducing alphabet, if $q^*$ and $Q^* = Q(q^*)$ achieving $F_s$ exist and $I(q^* \| q^{(1)}) < \infty$, the backward probabilities (9) always converge to those corresponding to $Q^*$ [see (7)], for each $k$ with $q_k^* > 0$. Moreover,

$$\sum_k q_k^{(n)} \exp(s\rho_{jk}) \to \sum_k q_k^* \exp(s\rho_{jk}), \qquad \text{for all } j.$$

Proof: (8) implies not only (10) and (11), but similar results for $\sum_k q_k I(P_k \| P_k^{(n)})$ instead of $[F_s(Q^{(n)},q^{(n-1)}) - F_s(Q,q)]$ as well. Thus with $q = q^*$ we have $\lim_{n\to\infty} I(P_k^* \| P_k^{(n)}) = 0$ if $q_k^* > 0$, proving the first assertion. In view of (9), (7), and (4), the second assertion follows from the first one if we show $q_k^{(n)}/q_k^{(n-1)} \to 1$ for $q_k^* > 0$. However, since $I(q^* \| q^{(n)})$ is nonincreasing, $q_k^{(n)}$ must be bounded away from 0 for each fixed $k$ with $q_k^* > 0$; hence $q_k^{(n)}/q_k^{(n-1)} \to 1$ follows from the limit relation

$$I(q^{(n)} \| q^{(n-1)}) = F_s(Q^{(n)},q^{(n-1)}) - F_s(Q^{(n)},q^{(n)}) \to 0$$

(see the first identity in (5)).

IV. DISCUSSION

We have shown that the sequence of distributions appearing in Blahut's algorithm [1] has a limit yielding a point on the $R(d)$ curve if the reproducing alphabet is finite, and we have established the convergence of the "backward probabilities" also for a countable reproducing alphabet. The cardinality of the original alphabet has been irrelevant. Let us remark that, in general, each limiting distribution of a subsequence of $q^{(n)}$ (if any) achieves $F_s$, on account of the lower semicontinuity of $F_s(Q,q)$. Under a compactness condition, such as that for each $j$ and $K$ the set of $k$ with $\rho_{jk} < K$ be finite (in which case $Q^*$ and $q^* = q(Q^*)$ achieving $F_s$ certainly exist), one can therefore always assert $q^{(n)} \to q^*$ if $q^*$ is unique. While $q^*$ need not be unique, the sums $\sum_k q_k^* \exp(s\rho_{jk})$ are, by Theorem 2. The case of a countable reproducing alphabet has been included in spite of its limited practical interest, because the results obtained for this case have straightforward extensions to abstract alphabets, replacing probabilities by densities and sums by integrals.

REFERENCES

[1] R. E. Blahut, "Computation of channel capacity and rate-distortion functions," IEEE Trans. Inform. Theory, vol. IT-18, pp. 460-473, July 1972.
[2] S. Arimoto, "An algorithm for computing the capacity of arbitrary discrete memoryless channels," IEEE Trans. Inform. Theory, vol. IT-18, pp. 14-20, Jan. 1972.
[3] P. Boukris, "An upper bound on the speed of convergence of the Blahut algorithm for computing rate-distortion functions," IEEE Trans. Inform. Theory (Corresp.), vol. IT-19, pp. 708-709, Sept. 1973.
[4] R. E. Blahut, "An hypothesis-testing approach to information theory," Ph.D. dissertation, Dep. Elec. Eng., Cornell Univ., Ithaca, N.Y., Sept. 1972.
[5] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, N.J.: Prentice-Hall, 1971.

A Generalization of Huffman Coding for Messages with Relative Frequencies Given by Upper and Lower Bounds

STEPHEN A. SMITH

Abstract—A generalization of the Huffman coding procedure is given for cases in which the source letter probabilities are known only to fall in certain ranges.

Manuscript received January 5, 1973; revised July 30, 1973. The author is with the Palo Alto Research Center, Xerox Corporation, Palo Alto, Calif. 94304.

I. INTRODUCTION TO THE PROBLEM

Given a random variable $X$ taking on $N$ discrete values with

$$p_i = P\{X = x_i\}, \qquad i = 1, 2, \ldots, N,$$

and an alphabet with a finite number of elements $D$, the classical noiseless coding problem is to construct from the alphabet a code consisting of $N$ distinct words corresponding to the respective values of $X$ so that the average or expected word length for transmitting $X$ is minimized.¹ If we let $l_1(c), l_2(c), \ldots, l_N(c)$ denote the respective word lengths for a given code $c$, the Huffman coding procedure [3] gives us a solution to this problem:

$$\min_{c \in A} \sum_i p_i l_i(c) \tag{1.1}$$

where $A$ is the set of admissible codes. We will consider the following generalization of the problem. Suppose that the probability distribution for $X$, $p = p_1, p_2, \ldots, p_N$, is specified only within upper and lower bounds of the form

$$0 \le \alpha_i \le p_i \le \beta_i \le 1, \qquad i = 1, 2, \ldots, N,$$

giving a set

$$G = \Big\{ p \;\Big|\; \alpha_i \le p_i \le \beta_i \text{ and } \sum_i p_i = 1 \Big\}$$

of possible probability distributions. Such a situation might occur due to lack of precise information on the probabilities, or because the message frequencies are known to fluctuate as a function of time or other variables; e.g., the same channel and transmission code might be used for several types of communication, with each type using at most $N$ possible codewords. Suppose we want to maximize the rate of transmission $T$ that can be guaranteed for all probability distributions $p \in G$. That is, we wish to find a code $\hat{c} \in A$ that minimizes the largest average word length over probability distributions in $G$ or, using our previous notation, solves the problem

$$\min_{c \in A} \max_{p \in G} \sum_{i=1}^{N} p_i l_i(c). \tag{1.2}$$

A solution $\hat{c}$ would determine the best lower bound on transmission rate for given $G$, namely,

$$T = \Big[ \max_{p \in G} \sum_i p_i l_i(\hat{c}) \Big]^{-1}. \tag{1.3}$$
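For a fixed code $c$, the inner maximization in (1.2) is a linear program over a box intersected with the probability simplex, and it is solved by a fractional-knapsack greedy: start each $p_i$ at its floor $\alpha_i$, then assign the remaining mass to the longest codewords first. The sketch below is ours (the paper does not spell out this step) and assumes the feasibility condition $\sum_i \alpha_i \le 1 \le \sum_i \beta_i$:

```python
def worst_case_avg_length(alpha, beta, lengths):
    """Max of sum_i p_i * lengths[i] over alpha_i <= p_i <= beta_i, sum_i p_i = 1.

    Greedy: p starts at the floors alpha; the slack 1 - sum(alpha) is then
    poured onto the longest codewords first, up to each ceiling beta_i.
    Assumes sum(alpha) <= 1 <= sum(beta), so that G is nonempty.
    """
    p = list(alpha)
    slack = 1.0 - sum(alpha)
    for i in sorted(range(len(lengths)), key=lambda i: -lengths[i]):
        give = min(beta[i] - alpha[i], slack)
        p[i] += give
        slack -= give
    return sum(pi * li for pi, li in zip(p, lengths)), p
```

Evaluating this for every candidate code would solve (1.2) by brute force; the point of the paper is that a single Huffman code, built for the distribution $\hat{p}$ of Section II, already attains the minimax value.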

II. PROPOSED MINIMAX SOLUTIONS

An equivalent formulation of the problem (1.2) is to determine a pair of points $\hat{p} \in G$, $\hat{c} \in A$ satisfying

$$\sum_i \hat{p}_i l_i(\hat{c}) \le \sum_i \hat{p}_i l_i(c), \qquad \text{for all } c \in A \tag{2.1}$$

$$\sum_i \hat{p}_i l_i(\hat{c}) \ge \sum_i p_i l_i(\hat{c}), \qquad \text{for all } p \in G. \tag{2.2}$$

The proposed solutions are as follows. For $\hat{p}$, take the probability distribution in $G$, illustrated in Fig. 1, that is defined by

$$\hat{p}_i = \begin{cases} \lambda, & \text{if } \alpha_i < \lambda < \beta_i \\ \alpha_i, & \text{if } \alpha_i \ge \lambda \\ \beta_i, & \text{if } \beta_i \le \lambda \end{cases} \tag{2.3}$$

where the value of $\lambda$ is determined by the requirement that $\sum_i \hat{p}_i = 1$.²

It is clear that $\hat{p}$ is unique, since modifying it by increasing or decreasing the value of $\lambda$ immediately violates the requirement $\hat{p}_1 + \cdots + \hat{p}_N = 1$. In addition, by using a graphical representation like the one in Fig. 1, an appropriate value of $\lambda$ can be determined quickly by trial and error; a bisection search in the style of the sketch below does the same mechanically. For $\hat{c}$ we then use the Huffman code that is optimal for $\hat{p}$, and this code can, of course, be determined by the Huffman coding procedure. The optimality of these solutions is demonstrated in the Appendix.
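A sketch of the whole construction (ours, not from the paper; function names and the tolerance are assumptions): $\lambda$ is found by bisection on the monotone map $\lambda \mapsto \sum_i \hat{p}_i(\lambda)$, which crosses 1 under footnote 2's condition $\sum_i \alpha_i < 1 < \sum_i \beta_i$, and $\hat{c}$ is then the ordinary binary ($D = 2$) Huffman code for $\hat{p}$:

```python
import heapq

def clip(lmbda, alpha, beta):
    # p̂_i per (2.3): clamp the common level λ into [α_i, β_i]
    return [min(max(lmbda, a), b) for a, b in zip(alpha, beta)]

def p_hat(alpha, beta, tol=1e-12):
    """Find λ by bisection so that sum_i p̂_i = 1; return p̂ (eq. 2.3)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(clip(mid, alpha, beta)) < 1.0:
            lo = mid          # total mass too small: raise the level
        else:
            hi = mid
    return clip((lo + hi) / 2, alpha, beta)

def huffman_lengths(p):
    """Codeword lengths of a binary Huffman code for distribution p."""
    heap = [(pi, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    while len(heap) > 1:
        w1, m1 = heapq.heappop(heap)
        w2, m2 = heapq.heappop(heap)
        for i in m1 + m2:      # every symbol in a merged group gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, m1 + m2))
    return lengths

# Illustrative bounds admitting several distributions
alpha = [0.1, 0.1, 0.2, 0.1]
beta  = [0.5, 0.3, 0.4, 0.2]
ph = p_hat(alpha, beta)
print(ph, huffman_lengths(ph))
```

On these sample bounds the bisection settles at the level $\lambda = 4/15$, shared by the first three symbols, while the fourth is clamped at its ceiling $0.2$.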

The form of the solutions can be intuitively explained using Shannon's information measure [4] and the concept of maximum entropy: the entropy of $p$ for an alphabet of $D$ characters,

$$-\sum_i p_i \log_D p_i,$$

is a lower bound on the actual minimum average word length

$$\min_{c \in A} \sum_i p_i l_i(c).$$

¹ A complete discussion of coding problems is given in Abramson [1] or [2].
² Note that if $\sum_i \alpha_i < 1 < \sum_i \beta_i$, such a $\lambda$ always exists, but is not unique in some cases.
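To connect this with the entropy remark above, one can check numerically that the binary entropy of $\hat{p}$ lower-bounds the average length of its Huffman code (reusing p_hat and huffman_lengths from the previous sketch, with the same illustrative bounds):

```python
import math

ph = p_hat([0.1, 0.1, 0.2, 0.1], [0.5, 0.3, 0.4, 0.2])
H = -sum(pi * math.log2(pi) for pi in ph if pi > 0)     # entropy, D = 2
L = sum(pi * li for pi, li in zip(ph, huffman_lengths(ph)))
print(H, L)   # noiseless coding theorem: H <= L < H + 1
```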