Optimal Bit Allocation via the Generalized BFOS Algorithm

Eve A. Riskin*

Abstract

We analyze the use of the generalized Breiman, Friedman, Olshen, and Stone (BFOS) algorithm, a recently developed technique for variable rate vector quantizer design, for optimal bit allocation. It is shown that if each source has a convex quantizer function, then the complexity of the algorithm is low.

Key Words: Bit allocation, vector quantization, tree coding

1 Introduction

In bit allocation, a given number of bits is assigned to a set of different sources to minimize the overall distortion of the coder. Westerink, Biemond, and Boekee [1] developed an optimal bit allocation algorithm, which we simplify when all operational distortion-rate functions (which we refer to in this note as quantizer functions) are convex. We restate their algorithm using the generalized BFOS algorithm for both cases of convex and nonconvex quantizer functions (QFs) and analyze its complexity. Bit allocation using the generalized BFOS algorithm was first suggested in [2].

* Eve A. Riskin was with the Information Systems Laboratory, Stanford University. She is now with the Department of Electrical Engineering, FT-10, University of Washington, Seattle, WA 98195. This work was supported by ESL, a subsidiary of TRW, and by Rockwell International. It was presented in part at the 1990 International Symposium on Information Theory, San Diego, California, January 1990.


2 The Generalized BFOS Algorithm

The generalized BFOS algorithm is an extension of an algorithm for optimal pruning in tree-structured classification and regression [3] to coding. For a source coding application, it finds a sequence of nested subtrees (each subtree is a subtree of the next subtree) of a given tree-structured coder in which each one is optimal in that it has the lowest average distortion of all subtrees of the tree with the same or lower average rate. Specifically, let $S$ be a subtree of a complete tree $T$ that shares the same root (denoted $S \preceq T$). The generalized BFOS algorithm minimizes the tree functional $J(S) = u_2(S) + \lambda u_1(S)$ over all $S \preceq T$. It trades off two monotonic tree functionals, $u_1$ and $u_2$, where $u_1$ is monotonically increasing (never decreases) and $u_2$ is monotonically decreasing (never increases) as the tree grows. The parameter $\lambda = -\Delta u_2 / \Delta u_1$ can be interpreted as a Lagrange multiplier that trades off $u_2$ for $u_1$; here, $\Delta u_2$ is the change in $u_2$ and $\Delta u_1$ is the change in $u_1$ due to pruning off a branch (a subtree, not necessarily rooted at the root node, whose leaves are a subset of the leaves of the tree) of $T$. Thus, $\lambda$ is equal to the magnitude of the ratio of the increase in $u_2$ to the decrease in $u_1$. The generalized BFOS algorithm prunes off branches of a tree $T$ in order of increasing $\lambda$ to find its optimal pruned subtrees. It is proved in [2] that the optimal subtrees are nested. Henceforth, we will assume that $u_1$ and $u_2$ are respectively the average rate and distortion of a finite collection of vector quantizers.
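To make the pruning order concrete, the following is a minimal sketch (not from the paper, with hypothetical names) that sorts candidate branches by $\lambda = |\Delta u_2 / \Delta u_1|$; it ignores the recomputation of $\lambda$ values as the pruned subtrees nest, which the full generalized BFOS algorithm handles.

```python
# Toy sketch of the pruning order (hypothetical data structures).
# Each candidate branch is summarized by the change it causes when pruned:
#   delta_u1 < 0 : u1 (e.g., average rate) decreases
#   delta_u2 > 0 : u2 (e.g., average distortion) increases
# Branches are pruned in order of increasing lambda = |delta_u2 / delta_u1|.

def pruning_order(branches):
    """branches: list of (name, delta_u1, delta_u2); returns names sorted by lambda."""
    def lam(branch):
        _, delta_u1, delta_u2 = branch
        return abs(delta_u2 / delta_u1)
    return [name for name, _, _ in sorted(branches, key=lam)]

if __name__ == "__main__":
    candidates = [("A", -0.50, 0.10), ("B", -0.25, 0.20), ("C", -1.00, 0.05)]
    print(pruning_order(candidates))  # ['C', 'A', 'B']: smallest distortion cost per unit rate first
```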

3 Bit Allocation for Classified Vector Quantization

Assume that we have a classified vector quantizer (VQ) with $M$ classes [4]. The motivation of classified VQ is to code inputs with codebooks specifically designed for the type of input for better overall performance. The classification can be performed using such methods as decision trees, edge detection, or VQ, and side information is used to specify the class. If the overall code has a fixed rate, then each codebook contains the same number of codewords. The natural way to design a variable rate code is simply to vary the sizes of the codebooks by allocating differing numbers of bits to them.

The generalized BFOS algorithm determines the optimal codebook sizes by allocating a maximum number of bits to each class, and then deallocating or "pruning" bits optimally. For simplicity, here we allocate only integral numbers of bits, but the algorithm can be easily extended to non-integral bit rates. Let this maximum number of bits be $q$, meaning at most $2^q$ codewords are in any class codebook. The training sequence is partitioned by the classifier into $M$ subsequences which are used to design a size $2^q$ full search codebook (other types of codebooks could be used) for each class. The splitting method of the generalized Lloyd algorithm [5] produces a sequence of full search codebooks of sizes $2^0, 2^1, \ldots, 2^{q-1}$ as the size $2^q$ codebook is designed. From this, the quantizer function for each class for rates $0, 1, \ldots, q$ bits per vector is determined. If class $i$ has a convex QF, then
$$
\left| \frac{d_i(0) - d_i(1)}{r_i(0) - r_i(1)} \right| > \left| \frac{d_i(1) - d_i(2)}{r_i(1) - r_i(2)} \right| > \left| \frac{d_i(2) - d_i(3)}{r_i(2) - r_i(3)} \right| > \cdots
$$
where $d_i(j)$ is the average distortion and $r_i(j)$ is the average rate with $j$ bits allocated to class $i$. As the rate increases, the magnitude of the slope of the QF decreases.

If all QFs are convex, then the algorithm is very simple and involves deallocating (or allocating) only one bit at a time. Because the slope is the lowest at the highest rate, the minimum magnitude slope is due to deallocating the highest rate bit alone. If more than one bit were deallocated, the corresponding slope would be higher. A case where all class QFs are guaranteed to be convex is if each class has a set of codebooks that is a sequence of optimally pruned tree-structured vector quantizers [2]. In this case, non-integral numbers of bits would be deallocated at a time.

The bit allocation scheme can be modeled as an $(M, 1)$ tree in which the root has $M$ children (one for each class) and the subtree rooted at each child is unary and represents the linearly linked list of class codebooks of sizes $2^0, 2^1, \ldots, 2^q$ [2]. The average rate of the code is a monotonically increasing tree functional and the average distortion is a monotonically decreasing tree functional. See Figure 1 for a diagram of the tree model. Here, the depth in the tree represents one more than the number of bits allocated to the class, and the number of squares in the block represents the number of codewords in the codebook. In this case, $M = 4$ and $q = 4$.

Figure 1: $(M, 1)$ tree model for optimal bit allocation

Here, a bit allocation $b$ is an $M$-tuple of nonnegative integers which determines the rate and distortion, $R(b)$ and $D(b)$, of the code. Let $B$ be the set of all possible bit allocations. The operational distortion-rate function can be expressed as
$$
\hat{D}_T(R_d) = \min_{b \in B} \{ D(b) \mid R(b) \le R_d \}.
$$
It specifies the minimum average distortion for a desired rate, $R_d$, under the condition that the quantizer is based on an allowable bit allocation. Let $j_i$ bits be allocated to class $i$ for $i = 1, 2, \ldots, M$ in a given bit allocation, so that class $i$ measures an average distortion $d_i(j_i)$. Let $p_i$ be the probability that class $i$ is selected by the classifier. We can express the overall average distortion and rate, $D$ and $R$, as
$$
D = p_1 d_1(j_1) + p_2 d_2(j_2) + \cdots + p_M d_M(j_M)
$$
and
$$
R = p_1 j_1 + p_2 j_2 + \cdots + p_M j_M.
$$
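As a small illustration of these definitions (a sketch with hypothetical names, not from the paper), $D$ and $R$ for a given bit allocation follow directly from the class probabilities $p_i$ and the per-class distortions $d_i(j_i)$:

```python
# Sketch: overall average distortion D and rate R of a bit allocation b,
# given class probabilities p[i] and quantizer functions d[i][j] (the average
# distortion of class i when j bits are allocated to it).  Names are illustrative.

def overall_distortion_rate(p, d, b):
    """p: class probabilities, d: d[i][j] distortion table, b: bits per class."""
    D = sum(p_i * d_i[j_i] for p_i, d_i, j_i in zip(p, d, b))
    R = sum(p_i * j_i for p_i, j_i in zip(p, b))
    return D, R

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]                      # M = 3 classes
    d = [[8.0, 4.0, 2.5, 1.8],               # d[i][j] for j = 0, 1, 2, 3 (q = 3)
         [6.0, 3.5, 2.2, 1.6],
         [9.0, 5.0, 3.0, 2.1]]
    print(overall_distortion_rate(p, d, (2, 1, 3)))   # (2.72, 1.9)
```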

Assume that we get the next bit allocation by pruning bits from class 1 alone. Let $j_i'$ be the number of bits allocated to class $i$ under the new bit allocation. Since the QF of class 1 is assumed convex, we prune off only one bit, that is, $j_1' = j_1 - 1$. The other distortions and numbers of bits are not changed, i.e.,
$$
d_i(j_i') = d_i(j_i), \quad i = 2, 3, \ldots, M
$$
and
$$
j_i' = j_i, \quad i = 2, 3, \ldots, M.
$$

Then $D'$ and $R'$, the new average distortion and rate, are
$$
D' = p_1 d_1(j_1') + p_2 d_2(j_2) + \cdots + p_M d_M(j_M)
$$
and
$$
R' = p_1 j_1' + p_2 j_2 + \cdots + p_M j_M.
$$
Now,
$$
\lambda = -\frac{\Delta D_{\mathrm{overall}}}{\Delta R_{\mathrm{overall}}} = -\frac{D' - D}{R' - R} = -\frac{p_1 \big(d_1(j_1') - d_1(j_1)\big)}{p_1 (j_1' - j_1)} = -\frac{d_1(j_1') - d_1(j_1)}{j_1' - j_1}. \tag{1}
$$
The convexity assumption means that only one bit at a time is pruned off, and so
$$
j_1' - j_1 = (j_1 - 1) - j_1 = -1.
$$
Therefore,
$$
\lambda = d_1(j_1') - d_1(j_1),
$$
and $\lambda$ is simply the difference in distortion when there are $j_1'$ and $j_1$ bits allocated to class 1. Thus, the only necessary calculation for each class is to determine the slopes by subtracting the distortion at each rate from the distortion at a rate one bit higher. Within a class, the slopes are necessarily ordered. The ordered lists of slopes for each class are merged and the bits are pruned off in order of increasing slope magnitude in the merged list.
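The following is a minimal sketch of this slope-merge view (illustrative, with hypothetical names): each per-class slope list is computed by one subtraction per rate, and, since convexity makes each list ordered, the $M$ lists can simply be merged so that bits are pruned in order of increasing slope magnitude.

```python
import heapq

# Sketch of the convex case (not the paper's code).  The slope magnitude of
# removing the j-th bit from class i is d[i][j-1] - d[i][j]; as shown in
# Eq. (1), the class probability p_i cancels.  Under convexity the per-class
# list, taken in pruning order j = q, q-1, ..., 1, is already sorted, so the
# M class lists can simply be merged.

def pruning_sequence(d):
    """d: d[i][j] = average distortion of class i with j bits (j = 0..q).
    Returns (slope, class_index, bits_remaining) triples in pruning order."""
    per_class = []
    for i, di in enumerate(d):
        q = len(di) - 1
        per_class.append([(di[j - 1] - di[j], i, j - 1) for j in range(q, 0, -1)])
    return list(heapq.merge(*per_class))   # globally increasing slope magnitude

if __name__ == "__main__":
    d = [[8.0, 4.0, 2.5, 1.8],   # class 0
         [6.0, 3.5, 2.2, 1.6]]   # class 1
    for slope, i, bits_left in pruning_sequence(d):
        print(f"prune class {i} down to {bits_left} bits (slope {slope})")
```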

To run the complete algorithm, the codebooks are pruned back until no bits remain. One could of course stop if a desired average rate or distortion is reached. The sequence of pruned class codebooks is "nested," as are the tree-structured vector quantizers in [2]. This is similar to an observation by Shoham and Gersho that as $\lambda$ increases, the bits allocated to one particular codebook are strictly nonincreasing [6].

3.0.1 Bit Allocation Algorithm (With Convexity Assumption)

We here state the bit allocation algorithm given the convexity assumption. A sequence of optimal bit allocations is produced with monotonically decreasing bit rates. The key to the algorithm is that in Step 2, all of the slope calculations are performed ahead of time. We use modified notation from Westerink, Biemond, and Boekee [1] and from Gersho and Gray [7]. Here, our $S(\cdot,\cdot)$, which is the $\lambda$ of the pruning algorithm, is the reciprocal of their $s(\cdot,\cdot)$. $S_i(j, j-1)$ is just the magnitude of the slope of the QF of class $i$ between rates $j-1$ and $j$ bits. Let $B_i$ be the number of bits allocated to class $i$. Again, there are $M$ classes and a maximum of $q$ bits can be assigned to each class. (A code sketch of these steps follows the algorithm statement.)

1. For $i = 1, 2, \ldots, M$, set $B_i = q$. This is the initial bit allocation.

2. For each class $i = 1, 2, \ldots, M$ and each $j = 1, 2, \ldots, q$, calculate
$$
S_i(j, j-1) = -\frac{\Delta D_{\mathrm{overall}}}{\Delta R_{\mathrm{overall}}} = -\frac{d_i(j) - d_i(j-1)}{j - (j-1)} = d_i(j-1) - d_i(j). \tag{2}
$$

3. Determine the class for which $S_i(B_i, B_i - 1)$ is the lowest. Assume it is class $l$. (If the minimum $S(\cdot,\cdot)$ is not unique, then select all classes with this value.) Set $B_l = B_l - 1$.

4. Calculate the new overall average distortion and rate, $D$ and $R$, as
$$
D = \sum_{i=1}^{M} p_i d_i(B_i) \tag{3}
$$
and
$$
R = \sum_{i=1}^{M} p_i B_i. \tag{4}
$$
Check if $R = 0$; if so, stop.

5. Repeat steps 3 and 4.
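The following is a minimal sketch of Steps 1-5 (an illustration under the stated assumptions, not the paper's code; function and variable names are hypothetical). The slope table of Step 2 is precomputed once, and each pass through Steps 3-4 deallocates one bit from the class with the smallest remaining slope, recording the resulting allocation and its rate-distortion pair; ties are broken arbitrarily here rather than pruning all tied classes at once.

```python
# Sketch of the algorithm with the convexity assumption (hypothetical names).
# d[i][j] is the average distortion of class i with j bits (j = 0..q) and p[i]
# is the probability of class i.  Returns the sequence of optimal allocations
# with their rate-distortion pairs, at monotonically decreasing rates.

def bfos_bit_allocation(p, d):
    M, q = len(d), len(d[0]) - 1
    # Step 2: precompute S_i(j, j-1) = d_i(j-1) - d_i(j); stored at S[i][j-1].
    S = [[d[i][j - 1] - d[i][j] for j in range(1, q + 1)] for i in range(M)]
    B = [q] * M                                       # Step 1: initial allocation
    trace = []
    while True:
        R = sum(p[i] * B[i] for i in range(M))        # Step 4: overall rate
        D = sum(p[i] * d[i][B[i]] for i in range(M))  # and distortion
        trace.append((tuple(B), R, D))
        if R == 0:                                    # stop when no bits remain
            break
        # Step 3: class with the smallest S_i(B_i, B_i - 1); ties broken arbitrarily.
        l = min((i for i in range(M) if B[i] > 0), key=lambda i: S[i][B[i] - 1])
        B[l] -= 1                                     # deallocate one bit from class l
    return trace

if __name__ == "__main__":
    p = [0.5, 0.5]
    d = [[8.0, 4.0, 2.5, 1.8], [6.0, 3.5, 2.2, 1.6]]
    for b, R, D in bfos_bit_allocation(p, d):
        print(b, R, D)
```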

3.1 Bit Allocating

In the algorithm, the lower convex hull of the operational distortion-rate function is traced out as bits are deallocated. Given the assumption that the class QFs are all convex, we could equivalently allocate bits rather than deallocate them. In this case, we start with no bits allocated and modify Step 3 of the algorithm to search for the highest $S_i(B_i, B_i + 1)$. This would find the optimal bit allocations in order of increasing rate.
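A brief variant sketch in the allocating direction (hypothetical names, and a target-rate stopping rule chosen only for illustration): start from $B_i = 0$ and repeatedly give a bit to the class with the largest slope $S_i(B_i, B_i + 1) = d_i(B_i) - d_i(B_i + 1)$.

```python
# Sketch of allocating rather than deallocating bits (convex QFs assumed).
# The class probabilities enter only through the average rate; the slope
# S_i(B_i, B_i + 1) = d_i(B_i) - d_i(B_i + 1) is p-independent, as in Eq. (1).

def allocate_bits(p, d, target_rate):
    M, q = len(d), len(d[0]) - 1
    B = [0] * M
    while sum(p[i] * B[i] for i in range(M)) < target_rate:
        candidates = [i for i in range(M) if B[i] < q]
        if not candidates:                 # every class already has q bits
            break
        # give the next bit to the class whose QF drops the most per bit
        l = max(candidates, key=lambda i: d[i][B[i]] - d[i][B[i] + 1])
        B[l] += 1
    return B
```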

3.2 Complexity

The complexity of the algorithm with the convexity assumption can be measured as follows. First, $qM$ slope calculations are performed in Step 2. A slope calculation is simply one subtraction. Next, there are $M$ ordered lists of $q$ numbers to merge. This can be done with at most $qM \log_2 M$ comparisons, by repeatedly merging two lists at a time [8].

3.3 Removing the Convexity Assumption

If any class QF is not convex, the generalized BFOS algorithm can still be used for bit allocation, but with some modification and additional complexity. We must effectively find the convex hull of the nonconvex QFs by calculating the slopes due to deallocating more than one bit at a time. If such a slope is lower than one due to deallocating the highest rate bit alone, then that number of bits is deallocated. We could equivalently allocate bits here rather than deallocate them.

We here state the algorithm without the convexity assumption; a code sketch of the modified slope search appears at the end of this subsection. Let $B_i$ be the number of bits allocated to class $i$.

1. For $i = 1, 2, \ldots, M$, set $B_i = q$. This is the initial bit allocation.

2. For each class $i = 1, 2, \ldots, M$ and each $n = 1, 2, \ldots, B_i$, calculate
$$
S_i(B_i, B_i - n) = -\frac{\Delta D_{\mathrm{overall}}}{\Delta R_{\mathrm{overall}}} = -\frac{d_i(B_i) - d_i(B_i - n)}{B_i - (B_i - n)} = \frac{d_i(B_i - n) - d_i(B_i)}{n}. \tag{5}
$$

3. For each class $i = 1, 2, \ldots, M$, determine the $n$ for which $S_i(B_i, B_i - n)$ is minimized.

4. Determine the class for which $S_i(B_i, B_i - n)$ is the lowest. Assume it is class $l$. (If the minimum $S(\cdot,\cdot)$ is not unique, then select all classes with this value.) Set $B_l = B_l - n$.

5. Calculate the rate and distortion of the code. Check if $R = 0$; if so, stop.

6. Repeat steps 2, 3, 4, and 5, but do 2 and 3 only for class $l$; $S_i(B_i, B_i - n)$ will not have changed for the other classes.

The complexity here can be measured as follows. In the worst case, $n$ would always be 1 in Step 3 for every class. This would involve
$$
M \sum_{k=1}^{q} k = \frac{M q (q+1)}{2}
$$
slope calculations. A slope calculation consists of one subtraction and one division. The nonconvex case then requires $\frac{q+1}{2}$ times as many slope calculations as the convex case, and each slope calculation involves an additional division. In the worst case, Step 3 would also require a series of $q$-way, $(q-1)$-way, $\ldots$, 2-way comparisons to find the $n$ that minimizes $S(\cdot,\cdot)$; no comparisons would be necessary here for the convex case. The worst case complexity for Step 4 would be the same as for the convex case.

Shoham and Gersho's algorithm [6] describes how to quickly find one particular bit allocation for a target rate. This depends on guessing a value of $\lambda$ that corresponds to a rate close to the desired rate. With a good initial guess, the bit allocation can be found quickly. If the initial value of $\lambda$ does not correspond to a rate close to the target rate, then the number of necessary comparisons may be significant. With the generalized BFOS algorithm with the convexity assumption, for at most $qM$ subtractions and $qM \log_2 M$ comparisons, we get all the extreme points on the convex hull of the operational distortion-rate curve. (Without the convexity assumption, the complexity is increased as described above.) This makes the algorithm particularly well suited to buffer control problems, since the encoder can switch conveniently between higher and lower rate optimal codes. In addition, our development is conceptually simple and shows that optimal bit allocation can be treated as a special case of the generalized BFOS algorithm.
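A minimal sketch of the modified per-class search of Steps 2-3 without the convexity assumption (illustrative, with hypothetical names): for a class currently holding $B_i$ bits, all slopes $S_i(B_i, B_i - n)$ for $n = 1, \ldots, B_i$ are evaluated and the minimizing $n$ is kept, which in effect steps along the lower convex hull of that class's QF.

```python
# Sketch of Steps 2-3 without the convexity assumption: evaluate
# S_i(B_i, B_i - n) = (d_i(B_i - n) - d_i(B_i)) / n for n = 1, ..., B_i
# (one subtraction and one division each) and keep the minimizing n.

def best_prune(d_i, B_i):
    """d_i: distortion list for one class (d_i[j] for j bits); B_i: current bits.
    Returns (n, slope) minimizing S_i(B_i, B_i - n), or None if B_i == 0."""
    if B_i == 0:
        return None
    best_n, best_slope = None, float("inf")
    for n in range(1, B_i + 1):
        slope = (d_i[B_i - n] - d_i[B_i]) / n
        if slope < best_slope:
            best_n, best_slope = n, slope
    return best_n, best_slope

if __name__ == "__main__":
    # A nonconvex QF: the second bit buys almost nothing, so pruning two bits
    # at once gives a smaller slope than pruning the highest rate bit alone.
    d_i = [9.0, 5.0, 4.9, 2.0]
    print(best_prune(d_i, 3))   # (2, 1.5)
```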

4 Conclusions

We have analyzed a bit allocation algorithm using the generalized BFOS algorithm [2]. It involves assigning a maximum number of bits to each source and then deallocating the bits in order of increasing slope magnitude, that is, the magnitude of the ratio of the change in distortion to the change in rate. The algorithm is easy to understand and has low complexity, particularly when all class quantizer functions are convex. Better performance is expected if no maximum limit is placed on the number of bits that can be allocated to each source [9].

5 Acknowledgements

The author wishes to thank Kenneth Zeger, Tsutomo Kawabata, David Gluss, and the reviewers for helpful comments and discussions.

References

[1] P. H. Westerink, J. Biemond, and D. E. Boekee, "An optimal bit allocation algorithm for sub-band coding," in Proceedings of ICASSP, pp. 757-760, IEEE Acoustics, Speech and Signal Processing Society, 1988.

[2] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Optimal pruning with applications to tree-structured source coding and modeling," IEEE Transactions on Information Theory, vol. 35, pp. 299-315, March 1989.

[3] L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. The Wadsworth Statistics/Probability Series, Belmont, California: Wadsworth, 1984.

[4] B. Ramamurthi and A. Gersho, "Classified vector quantization of images," IEEE Transactions on Communications, vol. 34, pp. 1105-1115, Nov. 1986.

[5] Y. Linde, A. Buzo, and R. M. Gray, "An algorithm for vector quantizer design," IEEE Transactions on Communications, vol. 28, pp. 84-95, Jan. 1980.

[6] Y. Shoham and A. Gersho, "Efficient bit allocation for an arbitrary set of quantizers," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 36, pp. 1445-1453, Sep. 1988.

[7] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston: Kluwer Academic Publishers, 1990.

[8] D. E. Knuth, The Art of Computer Programming, Vol. 3. Reading, MA: Addison-Wesley, 1973.

[9] E. A. Riskin and R. M. Gray, "A greedy tree growing algorithm for the design of variable rate vector quantizers," IEEE Transactions on Acoustics, Speech and Signal Processing. Submitted for publication.