
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 9, SEPTEMBER 2010

Efficient Bit Allocation and Rate Control Algorithms for Hierarchical Video Coding

Chan-Won Seo, Student Member, IEEE, Jung Won Kang, Jong-Ki Han, and Truong Q. Nguyen, Fellow, IEEE

Abstract—Hierarchical structure is a useful tool for providing the scalability needed to adapt to a variety of channel environments. For schemes involving hierarchical picture structures, bit allocation and rate control algorithms are vital components for improving video codec performance. Since conventional bit allocation schemes do not efficiently consider the characteristics of the hierarchical structure, it is difficult to optimize the video quality at an arbitrary bitrate. Similarly, conventional quantization parameter decision methods are not appropriate for controlling the bitrate generated by a codec using a hierarchical encoding structure. In this paper, we propose an effective bit allocation scheme that assigns the target number of bits to pictures or macroblocks (MBs) and improves the overall quality of images encoded by a hierarchical encoder. A rate control scheme is also proposed to ensure that the generated bitrate is equal to the assigned target bitrate. In the simulation results, the proposed schemes outperformed conventional methods from a rate-distortion perspective by efficiently controlling the bitrate at the MB level. The algorithms regulated the generated bits to achieve the target bits by using the proposed linear R-Q model.

Index Terms—Bit allocation, hierarchical video coding, R-Q model, rate control.

I. Introduction

H.264/AVC Annex A [1] allows temporal scalability, which can be provided by a hierarchical structure. Temporal scalability is also provided in MPEG-2 and MPEG-4 Part 2. The hierarchical structure adopted in H.264/AVC can support several levels of temporal scalability, and the use of a hierarchical encoding structure is not restricted to the dyadic case. It is generally known that, in most cases, a hierarchical encoding structure is more efficient than the classical “IBBP” structure [2]. In H.264/AVC Annex G [3], [4], scalable video coding (SVC) has been developed by

Manuscript received November 5, 2008; revised April 3, 2009, July 7, 2009, and October 29, 2009. Date of publication July 26, 2010; date of current version September 9, 2010. This work was supported by the Korean Research Foundation, under Grant KRF-2009-013-1-D00078. This paper was recommended by Associate Editor J. Ridge. C.-W. Seo and J.-K. Han (corresponding author) are with the Department of Information and Communication Engineering, Sejong University, Seoul 143747, Korea (e-mail: [email protected]; [email protected]). J. W. Kang is with the Broadcasting Media Research Group of the Electronics and Telecommunications Research Institute (ETRI), Daejeon 305700, Korea (e-mail: [email protected]). T. Q. Nguyen is with the Department of Electrical and Computer Engineering, University of California, San Diego, CA 92037 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2010.2057011

incorporating signal-to-noise ratio (SNR) and spatial scalabilities. These scalabilities are needed to match the variety of channel environments and conditions. For schemes involving hierarchical picture structures, bit allocation and rate control are vital components [5]–[7]. The coding efficiency of a video codec can be increased by efficiently allocating target bits for each macroblock (MB) and controlling the quantity of bits generated. While conventional schemes for bit allocation and rate control do not efficiently consider the hierarchical structure, the algorithms proposed in this paper exploit the properties of the video coding structure.

As for schemes applied to the H.264/AVC reference software, bit allocation algorithms that assign target bits for a frame were studied in [5]–[7]. Li et al. [5] and Lim et al. [6] proposed algorithms to control the generated bitrate, where specific models using the mean absolute difference (MAD) and the ρ value were used to determine the quantization parameter (QP). The rate control algorithm proposed in [5] uses an R-Q model to determine the QP using the MAD of the current frame. Since the MAD value can be calculated only after the motion vectors of a frame have been estimated, the value is predicted by a temporal linear model. In [6], a ρ-domain source model was proposed to determine the QP, where ρ is the percentage of coefficients with a value of zero among the quantized transform coefficients. This model utilizes the relation between ρ and the number of generated bits. Leontaris et al. [7] proposed rate control schemes for a hierarchical structure-based encoder, where the number of allocated bits and the QP for a frame are determined by considering the temporal level and slice type.

The techniques proposed in this paper allocate target bits to temporal levels, frames, and MBs by considering their hierarchical levels, sensitivities, and complexities, where an improved R-Q model is proposed. The QPs are determined to control the generated bitrate by using the proposed R-Q model. The encoder using the proposed algorithm yields a bitstream with a bitrate that is equal to the target value. Note that the schemes proposed in this paper can be extended for use with SVC.

This paper is organized as follows. In Section II, we briefly describe the conventional schemes. A new R-Q model is proposed in Section III. In Section IV, we propose a bit allocation algorithm to assign the target bits for each MB. To generate a bitstream with a rate equal to the target bits, an efficient rate control algorithm is proposed in Section V. The proposed schemes are summarized in Section VI. We extend the proposed schemes for use with SVC in Section VII. Simulation results are presented in Section VIII. Section IX presents the conclusions.

© 2010 IEEE 1051-8215/$26.00

SEO et al.: EFFICIENT BIT ALLOCATION AND RATE CONTROL ALGORITHMS FOR HIERARCHICAL VIDEO CODING

II. Conventional Rate Control Schemes

In the motion estimation (ME) module, the QP has to be fixed before motion vectors are estimated and the mode is determined, since the QP is used in the ME/mode decision (MD). However, before the motion estimation and mode decision procedures have been performed, the QP value needed to generate a bitstream with the target bitrate cannot be computed. This is generally known as the “Chicken and Egg Dilemma” [5]. Several schemes have been proposed to solve this problem [5], [6], [8], [9].

The rate control scheme proposed in [9] uses two passes to overcome the dilemma. If the first encoding pass using a specific QP fails to generate the target bitrate, then a second pass is conducted to refine the QP value. Since multiple passes are used to encode video data, the complexity of this method is very high.

In the 7th Joint Video Team (JVT) meeting at Pattaya, Thailand, Li et al. [5] proposed a rate control scheme that uses a quadratic R-Q model. This scheme uses two steps, the first of which assigns the target bits for the current frame, where a fluid flow traffic model and the complexity are utilized. In the second step, Q is determined by using the proposed quadratic R-Q model [5], as follows:

$$\frac{V_t}{MAD} = \frac{X_1}{Q} + \frac{X_2}{Q^2} \tag{1}$$

where $V_t$, $Q$, $X_1$, and $X_2$ are the assigned target bits, the quantization step size, and two model parameters, respectively. In (1), the parameters $X_1$ and $X_2$ are calculated as follows:

$$X_1 = \frac{\sum_{i=1}^{n}\left(Q_i S_i - X_2 Q_i^{-1}\right)}{n} \tag{2}$$

$$X_2 = \frac{n\sum_{i=1}^{n} S_i - \left(\sum_{i=1}^{n} Q_i^{-1}\right)\left(\sum_{i=1}^{n} Q_i S_i\right)}{n\sum_{i=1}^{n} Q_i^{-2} - \left(\sum_{i=1}^{n} Q_i^{-1}\right)^2} \tag{3}$$

where $Q_i$ and $S_i$ are the actual quantization step size and actual bits used in previous pictures, respectively. $n$ denotes the number of data units (for example, MBs or frames) encoded previously. Equations (2) and (3) are derived using least mean square estimation.
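To make the use of (1)–(3) concrete, the following minimal Python sketch fits $X_1$ and $X_2$ from past observations and then solves (1) for $Q$. It is an illustration under stated assumptions, not the reference software implementation: the function names are hypothetical, each $S_i$ is assumed to be the bit count already normalized by the MAD so that it obeys (1) directly, and the history values are synthetic.

```python
import math


def fit_quadratic_rq(q_steps, s_values):
    """Estimate X1 and X2 from past (Q_i, S_i) pairs using (2) and (3).

    Assumption: each S_i is the bit count already divided by the MAD,
    so that S_i = X1 / Q_i + X2 / Q_i**2 as in (1).
    """
    n = len(q_steps)
    sum_s = sum(s_values)
    sum_inv_q = sum(1.0 / q for q in q_steps)
    sum_inv_q2 = sum(1.0 / (q * q) for q in q_steps)
    sum_qs = sum(q * s for q, s in zip(q_steps, s_values))
    denom = n * sum_inv_q2 - sum_inv_q ** 2
    if denom == 0:
        raise ValueError("need at least two distinct Q values")
    x2 = (n * sum_s - sum_inv_q * sum_qs) / denom                    # (3)
    x1 = sum(q * s - x2 / q for q, s in zip(q_steps, s_values)) / n  # (2)
    return x1, x2


def solve_q(target_bits, mad, x1, x2):
    """Solve (1) for the quantization step size Q given V_t and a MAD value."""
    t = target_bits / mad
    if abs(x2) < 1e-12:
        return x1 / t  # model degenerates to V_t / MAD = X1 / Q
    disc = x1 * x1 + 4.0 * x2 * t
    if disc < 0.0:
        raise ValueError("no real solution for Q")
    # Root of X2*u^2 + X1*u - t = 0 with u = 1/Q
    inv_q = (-x1 + math.sqrt(disc)) / (2.0 * x2)
    return 1.0 / inv_q


# Synthetic history generated from known parameters (illustrative values only)
true_x1, true_x2 = 1000.0, 5000.0
history_q = [10.0, 16.0, 22.0, 30.0]
history_s = [true_x1 / q + true_x2 / (q * q) for q in history_q]

x1, x2 = fit_quadratic_rq(history_q, history_s)
q_next = solve_q(target_bits=250.0, mad=4.0, x1=x1, x2=x2)
print(x1, x2, q_next)
```

In an encoder, the MAD passed to `solve_q` would be a predicted value, since the actual MAD is not available before motion estimation.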
In (1), MAD denotes the mean absolute difference between the current and reference blocks over a frame, which can be calculated from the current and reference blocks after the motion estimation procedure. However, since Q in (1) has to be determined before the ME procedure, the MAD cannot be calculated. Thus, the MAD of the current frame is predicted from the MAD of the previous frame using the linear model as follows:

$$MAD_c = \alpha_1 \times MAD_p + \alpha_2 \tag{4}$$

where $MAD_c$ and $MAD_p$ are the MADs of the current and previous frames, respectively. $\alpha_1$ and $\alpha_2$ are two parameters in the prediction model.

Recently, Lim et al. [6] proposed a rate control scheme using a ρ-domain source model which employs a linear relation between the generated bitrate and ρ. In [6], an efficient model related to the variables $\{R, \rho, E_\rho, E_{QP}, QP\}$ was proposed, where $R$, $\rho$, $E_\rho$, and $E_{QP}$ are the bitrate, the probability of zeros among the quantized transform coefficients, $e^{-\rho}$, and $e^{-QP/12.5}$, respectively.

In [7], two modes were proposed to control the bitrate considering the hierarchical B picture structure. The first mode is called RC MODE 2, where the quantization parameters $QP_{B_l}$ of B pictures in the $l$th level are determined based on the $QP_{I/P}$ values of previous key pictures (I or P pictures). In the other mode, RC MODE 3, the QPs for the I and B frames are determined using a quadratic R-Q model derived from the P frame property.

III. Proposed R-Q Model

To predict the number of bits generated by the encoder, models based on the MAD alone have been used in the conventional schemes of [5], [7], [11], and [12]. Since most of the generated bits come from encoding the high-frequency components (ac coefficients) of the residual signal, the MAD is not a sufficient quantity for computing the generated bits. To overcome this limitation, in this section we propose a new R-Q model based on both the variance of difference (VOD) and the MAD to predict the generated bitrates, in contrast to the conventional schemes in [5], [7], [11], and [12], which used the MAD and the parameters $X_1$ and $X_2$. In [11], the number, $S$, of bits generated from encoding a frame is

$$S = \kappa \cdot \sqrt{MAD} \tag{5}$$

where $\kappa$ is a constant value that depends on the coded data. The new model that considers both the VOD and MAD can be represented by

$$Z_{VOD} = \sigma \cdot \frac{\sqrt{VOD}}{Q} + \frac{\sqrt{MAD}}{Q} \tag{6}$$

where $Z_{VOD}$ denotes the cost related to the generated bits, and VOD is the averaged variance of the residuals of all of the 4×4 blocks in a frame. The relationship between the generated bits $S$ and $Z_{VOD}$ is shown in Fig. 1(a), (c), and (e), where the test sequence is Football (CIF size), the hierarchical group of picture (HGOP) size is set to 8, the intra period is 32, and the $QP_{I/P}$ value of the I and P frames is varied from 6 to 48 in intervals of 2. The $QP_{B_l}$ values of the B frames are set to “$QP_{I/P} + l$”, where $l$ denotes the hierarchical level of each B frame. To compare the proposed model of (6) with a MAD-based model, represented by

$$Z_{MAD} = \frac{\sqrt{MAD}}{Q} \tag{7}$$

the relationship between $Z_{MAD}$ and the generated bits, $S$, is shown in Fig. 1(b), (d), and (f). In Fig. 1, each symbol “o” is located at a point indicated by (S, Z), where “S” is the number of bits generated by the Joint Model (JM) 16.1 reference software [13] with a QP, and “Z” is the number


Fig. 1. Relation between $Z$ and the number of generated bits. (a) I slice with $Z_{VOD}$. (b) I slice with $Z_{MAD}$. (c) P slice with $Z_{VOD}$. (d) P slice with $Z_{MAD}$. (e) B slice with $Z_{VOD}$. (f) B slice with $Z_{MAD}$.

predicted by (6) and (7). As observed from Fig. 1, both (6) and (7) produce a nonlinear relationship between $Z$ and the actual bitrates, $S$. To linearize the relationship, we apply a square root function to the costs as follows:

$$\sqrt{Z_{VOD}} = \sqrt{\sigma \cdot \frac{\sqrt{VOD}}{Q} + \frac{\sqrt{MAD}}{Q}} \tag{8}$$

$$\sqrt{Z_{MAD}} = \sqrt{\frac{\sqrt{MAD}}{Q}}. \tag{9}$$

The relationships of the generated bitrates $S$ to $\sqrt{Z_{VOD}}$ and $\sqrt{Z_{MAD}}$ are shown in Fig. 2. To measure the accuracy of the models, we use the $R^2$ function given in [12], [14], and [15] as follows:

$$R^2 = 1 - \frac{\sum_i \left(X_i - \hat{X}_i\right)^2}{\sum_i \left(X_i - \bar{X}\right)^2}. \tag{10}$$

In (10), $X_i$ and $\hat{X}_i$ are the actual and predicted values of the $i$th data point, respectively, and $\bar{X}$ is the mean of all data points shown in Fig. 2. The last term in (10) is the squared error sum normalized by the variance of the actual values. The estimated value $\hat{X}_i$ is obtained by using the linear models in (8)


Fig. 2. Relation between $\sqrt{Z}$ and the number of generated bits. (a) I slice with $\sqrt{Z_{VOD}}$. (b) I slice with $\sqrt{Z_{MAD}}$. (c) P slice with $\sqrt{Z_{VOD}}$. (d) P slice with $\sqrt{Z_{MAD}}$. (e) B slice with $\sqrt{Z_{VOD}}$. (f) B slice with $\sqrt{Z_{MAD}}$.

TABLE I
$R^2$ Values When the Generated Bitrates are Estimated by Using $\sqrt{Z_{MAD}}$ and $\sqrt{Z_{VOD}}$

| Slice Type | $R^2$ Using $\sqrt{Z_{MAD}}$ | $R^2$ Using $\sqrt{Z_{VOD}}$ |
|---|---|---|
| I | 0.9619 | 0.9790 |
| P | 0.9706 | 0.9852 |
| B | 0.9598 | 0.9732 |

and (9). If the model can predict the generated bitrates exactly for all of the data, i.e., $X_i = \hat{X}_i$ for all $i$, $R^2$ becomes 1. The $R^2$ values for (8) and (9) are shown in Table I. As observed in Table I, the proposed model in (8) yields more accurate results than using the MAD only as in (9). Consequently, we propose a new R-Q model that uses the VOD as follows:

$$S = \xi_1 \times \sqrt{\sigma \cdot \frac{\sqrt{VOD}}{Q} + \frac{\sqrt{MAD}}{Q}} + \xi_2 \tag{11}$$

where $\xi_1$ and $\xi_2$ are linear model parameters.
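As a rough illustration of how the linear model in (11) and the accuracy measure in (10) fit together, the sketch below computes the cost $\sqrt{Z_{VOD}}$ of (8), fits $\xi_1$ and $\xi_2$ by ordinary least squares, evaluates $R^2$, and inverts (11) to obtain $Q$ for a target number of bits. The function names, the value of $\sigma$, and the sample data are assumptions made for this example only; they are not taken from the paper's experiments.

```python
import math


def sqrt_z_vod(vod, mad, q, sigma):
    """Cost of (8): sqrt(sigma * sqrt(VOD) / Q + sqrt(MAD) / Q)."""
    return math.sqrt((sigma * math.sqrt(vod) + math.sqrt(mad)) / q)


def fit_line(z, s):
    """Ordinary least-squares fit of S = xi1 * Z + xi2."""
    n = len(z)
    sum_z, sum_s = sum(z), sum(s)
    sum_zz = sum(v * v for v in z)
    sum_sz = sum(a * b for a, b in zip(s, z))
    xi1 = (n * sum_sz - sum_s * sum_z) / (n * sum_zz - sum_z ** 2)
    xi2 = (sum_s - xi1 * sum_z) / n
    return xi1, xi2


def r_squared(actual, predicted):
    """R^2 of (10): 1 - (squared error sum) / (total sum of squares)."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def q_from_target(target_bits, vod, mad, sigma, xi1, xi2):
    """Invert (11) for Q given a target number of bits."""
    return (sigma * math.sqrt(vod) + math.sqrt(mad)) / ((target_bits - xi2) / xi1) ** 2


# Hypothetical samples (VOD, MAD, Q) and bit counts generated from known parameters
sigma = 0.5
samples = [(400.0, 9.0, 10.0), (250.0, 6.0, 16.0), (900.0, 12.0, 22.0), (100.0, 4.0, 30.0)]
z = [sqrt_z_vod(vod, mad, q, sigma) for vod, mad, q in samples]
bits = [2000.0 * v + 150.0 for v in z]

xi1, xi2 = fit_line(z, bits)
r2 = r_squared(bits, [xi1 * v + xi2 for v in z])
q0 = q_from_target(bits[0], 400.0, 9.0, sigma, xi1, xi2)
print(xi1, xi2, r2, q0)
```

Because the synthetic bit counts lie exactly on a line, the fit recovers the generating parameters and $R^2$ is essentially 1; with measured encoder data the value would be lower, as in Table I.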


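TABLE II
$b^i$ and $\beta^i$ Values Used in (19)

| Slice Type | GOP Size 4: $b$ | GOP Size 4: $\beta$ | GOP Size 8: $b$ | GOP Size 8: $\beta$ | GOP Size 16: $b$ | GOP Size 16: $\beta$ |
|---|---|---|---|---|---|---|
| I | 0.2792 | 1.3684 | 0.2676 | 1.3815 | 0.2625 | 1.3902 |
| P | 0.2673 | 1.3903 | 0.2642 | 1.3926 | 0.2647 | 1.3966 |
| B1 | 0.2914 | 1.2979 | 0.3115 | 1.3305 | 0.3364 | 1.2991 |
| B2 | 0.2439 | 1.3061 | 0.3036 | 1.3003 | 0.3548 | 1.2554 |
| B3 | - | - | 0.2468 | 1.2816 | 0.3397 | 1.2348 |
| B4 | - | - | - | - | 0.2879 | 1.2111 |

Fig. 3. Hierarchical structure where HGOP size is 8.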

IV. Proposed Bit Allocation Scheme

We propose a bit allocation algorithm that considers the hierarchical level, complexity, and sensitivity of each frame. Fig. 3 represents the hierarchical structure of a sequence that gives temporal scalability to a bitstream, where some B frames are used as reference frames. The proposed scheme consists of three steps used to allocate bits for (a) a hierarchical level, (b) a frame, and (c) a MB.

A. Bit Allocation for a Hierarchical Level

The bits allocated for each intra-period are represented as follows:

$$I_{curr} = I_{prev} + \frac{BitRate}{FrameRate} \times N_{intra} \tag{12}$$

where $I_{curr}$ represents the bits assigned to the current intra-period and $I_{prev}$ represents the bits that remain after the data in the previous intra-period have been encoded. $N_{intra}$ denotes the number of frames in an intra-period. An intra-period consists of several HGOP structures. The bits assigned to an HGOP are calculated by a uniform assignment as follows:

$$R_{HGOP} = \frac{I_{curr}}{N_{HGOP}} \tag{13}$$

where $R_{HGOP}$ is the number of bits allocated to the current HGOP, and $N_{HGOP}$ is the number of remaining HGOPs in the current intra-period.

In Fig. 3, an HGOP consists of I, P, and B pictures, where the distortion resulting from encoding a particular picture affects the quality of other pictures due to their dependency in the hierarchical structure. The quality of P is affected by the distortion generated in encoding the I picture. The quality of B1 is affected by those of I and P. As the temporal level increases, the quality of B2 is affected by the distortion generated in encoding {I and B1} or {B1 and P}. This dependency shows that the significance of a picture depends on its level and slice type. To incorporate this significance into the algorithm, weighting factors have been used in bit allocation schemes [7], [16], where the weighting factors can be fixed or determined by users. In the proposed algorithms, the temporal weighting factors are optimized instead of using fixed values. From the dependency between pictures, we define the weighting factor $w^l$, which indicates the significance of the picture, where $l$ denotes the hierarchical level. In Fig. 3, all of the picture types in a level are the same within an HGOP. Thus, we apply weighting factors to assign bits to a frame, as in (14), which follows the ideas from [17]

$$T^l_{avg} = \frac{w^l}{w^l \cdot N^l + \sum_{k=l+1}^{L_{max}} w^k \cdot N^k} \times \left(R_{HGOP} - \sum_{u=0}^{l-1} G^u\right) \tag{14}$$

where $T^l_{avg}$ and $N^l$ denote the average target bits for a frame and the number of frames in temporal level $l$, respectively. $L_{max}$ is the maximum temporal level number in the current HGOP structure. $G^u$ denotes the number of bits generated from encoding the pictures in temporal level $u$. We note that $G^u$ is equal to $T^u_{avg} \cdot N^u$. Thus, (14) is rewritten as

$$T^l_{avg} = \frac{w^l}{w^l \cdot N^l + \sum_{k=l+1}^{L_{max}} w^k \cdot N^k} \times \left(R_{HGOP} - \sum_{u=0}^{l-1} T^u_{avg} \cdot N^u\right). \tag{15}$$

To increase the coding efficiency, the weighting factors have to be optimized as follows:

$$w^* = \arg\min_{w^l > 0,\; w \subset \mathbb{R}} \left[\sum_{i=0}^{L_{max}} \left\{\sum_{j=1}^{N^i} \left(Dist^i_j(w) + \lambda^i_j \cdot Rate^i_j(w)\right)\right\}\right] \tag{16}$$

where $Dist^i_j(w)$ and $Rate^i_j(w)$ are the distortion and the generated bitrate, respectively, when the weighting factors $w$ are used. $\lambda^i_j$ denotes a Lagrange multiplier for the $i$th level and the $j$th picture. The weighting factors $w$ can be represented as

$$w = \left[w^0, w^1, \ldots, w^{L_{max}}\right]. \tag{17}$$

In (16), $\sum_{i=0}^{L_{max}} \sum_{j=1}^{N^i} Rate^i_j(w)$ is equal to the number of bits assigned to an HGOP (i.e., $R_{HGOP}$), so it is independent of $w$. Thus, the optimization of (16) can be simplified to

$$w^* = \arg\min_{w^l > 0,\; w \subset \mathbb{R}} \left[\sum_{i=0}^{L_{max}} \left\{\sum_{j=1}^{N^i} Dist^i_j(w)\right\}\right] = \arg\min_{w^l > 0,\; w \subset \mathbb{R}} \left[D_{HGOP}\right] \tag{18}$$

where $D_{HGOP} = \sum_{i=0}^{L_{max}} \sum_{j=1}^{N^i} Dist^i_j(w)$. $D_{HGOP}$ in (18) can be approximated using the distortion model proposed in [8], as follows:

$$D_{HGOP} \approx \sum_{i=0}^{L_{max}} \left\{\left(\sum_{m=-\infty}^{\infty} \int_{(m-1/2)Q^i}^{(m+1/2)Q^i} \left(x - m \cdot Q^i\right)^2 \cdot f_X(x)\, dx\right) \cdot N^i\right\} \approx \sum_{i=0}^{L_{max}} b^i \left(Q^i\right)^{\beta^i} N^i \tag{19}$$

which is not restricted by the bitrate range [8]. In (19), $Q^i$ denotes the quantization step size in the $i$th level. $f_X(x)$ is the probability density function of the transformed coefficients, and $b^i$ and $\beta^i$ are model parameters that are set to the values in Table II. $D_{HGOP}$ in (19) can be modified according to the HGOP size, since the number of frames in a temporal level and the distance between the current and reference frames vary with the HGOP size. The values for $b^i$ and $\beta^i$ in Table II were determined empirically. Let $T^i_{avg}$ be the average target bitrate assigned to the $i$th level. Substituting $T^i_{avg}$ into $S$ in (11) gives

$$T^i_{avg} = \xi_1 \cdot \sqrt{\sigma^i \cdot \frac{\sqrt{VOD^i}}{Q^i} + \frac{\sqrt{MAD^i}}{Q^i}} + \xi_2 \tag{20}$$

which can be rewritten as

$$Q^i = \left(\sigma^i \cdot \sqrt{VOD^i} + \sqrt{MAD^i}\right) \times \left(\frac{T^i_{avg} - \xi_2}{\xi_1}\right)^{-2}. \tag{21}$$

After substituting (21) into (19), the equation is rewritten by using (15) as (22), which is displayed later in this section because of its length. In (22), $VOD^i_{pre}$ and $MAD^i_{pre}$ are predicted by the scheme described in [5]. $\xi^i_1$ and $\xi^i_2$ are linear model parameters for the $i$th layer. To simplify (22), some of the terms in (22) are denoted as follows:

$$C^i_1 \triangleq b^i \cdot N^i \cdot \left(\sigma^i \cdot \sqrt{VOD^i_{pre}} + \sqrt{MAD^i_{pre}}\right)^{\beta^i} \tag{23}$$

$$C^i_2 \triangleq \xi^i_1 \cdot N^i \tag{24}$$

$$C^i_3 \triangleq \xi^i_1 \cdot \sum_{k=i+1}^{L_{max}} w^k \cdot N^k \tag{25}$$

$$C^i_4 \triangleq \left(R_{HGOP} - \sum_{j=0}^{i-1} T^j_{avg} \cdot N^j\right) - \xi^i_2 \cdot N^i \tag{26}$$

$$C^i_5 \triangleq \xi^i_2 \cdot \sum_{k=i+1}^{L_{max}} w^k \cdot N^k. \tag{27}$$

$D_{HGOP}$ in (22) can be represented by the following equation by using (23)–(27):

$$D_{HGOP} = \sum_{i=0}^{L_{max}} \left[C^i_1 \times \left(\frac{C^i_2 \cdot w^i + C^i_3}{C^i_4 \cdot w^i - C^i_5}\right)^{2\beta^i}\right]. \tag{28}$$

To obtain the optimized weighting factors $w$, the conventional steepest descent algorithm [18] is used as follows:

$$w^{l[h+1]} = w^{l[h]} + \tau \cdot \delta \tag{29}$$

and

$$\delta = -\nabla D_{HGOP}\left(w^{l[h]}\right) \tag{30}$$

where $h$ and $\tau$ are the iteration number and scaling factor, respectively. $\nabla D_{HGOP}\left(w^{l[h]}\right)$ is calculated by

$$\nabla D_{HGOP} = \left[\frac{\partial D_{HGOP}}{\partial w^0}, \ldots, \frac{\partial D_{HGOP}}{\partial w^{L_{max}-1}}, \frac{\partial D_{HGOP}}{\partial w^{L_{max}}}\right]^T \tag{31}$$

where

$$\frac{\partial D_{HGOP}}{\partial w^i} = 2 \cdot C^i_1 \cdot \beta^i \cdot \left(\frac{C^i_2 \cdot w^i + C^i_3}{C^i_4 \cdot w^i - C^i_5}\right)^{2\beta^i - 1} \cdot \left\{\frac{-C^i_2 \cdot C^i_5 - C^i_3 \cdot C^i_4 - \left(C^i_2 \cdot \left(w^i\right)^2 + C^i_3 \cdot w^i\right) \cdot \frac{\partial C^i_4}{\partial w^i}}{\left(C^i_4 \cdot w^i - C^i_5\right)^2}\right\} \tag{32}$$

$$\frac{\partial C^i_4}{\partial w^i} = -\sum_{j=0}^{i-1} \frac{\partial T^j_{avg}}{\partial w^i} \cdot N^j \tag{33}$$

$$\frac{\partial T^j_{avg}}{\partial w^i} = \frac{-w^j \cdot N^i}{\left(w^j \cdot N^j + \sum_{k=j+1}^{L_{max}} w^k \cdot N^k\right)^2} \times \left(R_{HGOP} - \sum_{u=0}^{j-1} T^u_{avg} \cdot N^u\right) + \frac{w^j}{w^j \cdot N^j + \sum_{k=j+1}^{L_{max}} w^k \cdot N^k} \times \left(-\sum_{u=0}^{j-1} \frac{\partial T^u_{avg}}{\partial w^i} \cdot N^u\right). \tag{34}$$

Note that the derivatives of $C^i_3$ and $C^i_5$ with respect to $w^i$ are zero since these terms are independent of $w^i$, although they are functions of $w^k$ in (25) and (27). In (34), the partial derivative of $T^j_{avg}$ with respect to $w^i$ is calculated recursively. The initial value of the recursive function (34) is

$$\frac{\partial T^0_{avg}}{\partial w^i} = \frac{-w^0 \cdot N^i}{\left(w^0 \cdot N^0 + \sum_{k=1}^{L_{max}} w^k \cdot N^k\right)^2} \times R_{HGOP}. \tag{35}$$

Since the number of temporal levels is finite, the derivative is simple to calculate for most hierarchical structures. The initial values for the weighting factors are shown in Table III. These values were determined empirically through simulations using the test sequences Foreman, Football, Soccer, and Crew with various bitrates and HGOP structures.

TABLE III
Initial Weighting Factors, $w^l_t$

| Slice Type | HGOP Size 2 | HGOP Size 4 | HGOP Size 8 | HGOP Size 16 |
|---|---|---|---|---|
| I | 2.75 | 4.25 | 7.25 | 13.25 |
| P | 1.5 | 2.5 | 4.5 | 8.5 |
| B1 | 1.0 | 2.0 | 4.0 | 8.0 |
| B2 | - | 1.0 | 2.0 | 4.0 |
| B3 | - | - | 1.0 | 2.0 |
| B4 | - | - | - | 1.0 |

Based on the weighting factors, the allocated bits $F^l$ for the $l$th temporal level are calculated as follows:

$$F^l = \frac{w^l \cdot N^l}{w^l \cdot N^l + \sum_{i=l+1}^{L_{max}} w^i \cdot N^i} \times \left(R_{HGOP} - \sum_{i=0}^{l-1} G^i\right). \tag{36}$$
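The budget chain of (12), (13), (15), and (36) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the generated bits $G^u$ are taken to equal their targets $T^u_{avg} \cdot N^u$ as in (15), and the example maps the Table III weights for HGOP size 8 (P, B1, B2, B3) onto temporal levels holding 1, 1, 2, and 4 frames, which is an assumed reading of the dyadic structure in Fig. 3.

```python
def intra_period_budget(i_prev, bit_rate, frame_rate, n_intra):
    """Bits for the current intra-period, as in (12)."""
    return i_prev + bit_rate / frame_rate * n_intra


def hgop_budget(i_curr, n_hgop_remaining):
    """Uniform split of the intra-period budget over remaining HGOPs, as in (13)."""
    return i_curr / n_hgop_remaining


def allocate_levels(r_hgop, weights, frames_per_level):
    """Per-frame targets T_avg^l (15) and per-level budgets F^l (36).

    Assumption: bits generated in lower levels equal their targets.
    """
    t_avg, f_level = [], []
    used = 0.0
    for l, (w, n) in enumerate(zip(weights, frames_per_level)):
        # w^l * N^l + sum over k > l of w^k * N^k
        denom = sum(wk * nk for wk, nk in zip(weights[l:], frames_per_level[l:]))
        remaining = r_hgop - used
        t = w / denom * remaining
        t_avg.append(t)
        f_level.append(t * n)
        used += t * n
    return t_avg, f_level


# Illustrative settings: 512 kb/s, 30 frames/s, intra period 32, four HGOPs of size 8
i_curr = intra_period_budget(i_prev=0.0, bit_rate=512000.0, frame_rate=30.0, n_intra=32)
r_hgop = hgop_budget(i_curr, n_hgop_remaining=4)
weights = [4.5, 4.0, 2.0, 1.0]   # P, B1, B2, B3 from Table III (HGOP size 8)
frames = [1, 1, 2, 4]            # assumed number of frames per temporal level
t_avg, f_level = allocate_levels(r_hgop, weights, frames)
print(r_hgop, t_avg, f_level)
```

Under the stated assumption the level budgets sum to $R_{HGOP}$ and the per-frame targets are proportional to the weights; in a real encoder the generated bits differ from the targets, so the remaining budget would be updated from the actual bit counts.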


The numbers of bits for the frames in the $l$th level are assigned according to their significance. The details of this scheme are described in the next section.

B. Bit Allocation for a Frame

The frames in a temporal level have different coding efficiencies from each other. The peak signal-to-noise ratios (PSNRs) of some frames increase dramatically with the assignment of a few more bits, while others produce only a slight gain with many more allocated bits. While conventional schemes [5], [7], [16] do not consider this quality sensitivity, we consider it in this paper. To allocate target bits for a frame, it is necessary to consider the sensitivity of frames in the computation of the assigned bits. In this paper, we consider the increments of PSNR according to the variation in the assigned bits

$$J^l_u = \frac{\Delta PSNR}{\Delta Bits} \tag{37}$$

$$= \frac{PSNR\left(Q^l_{u-1} - 1\right) - PSNR\left(Q^l_{u-1} + 1\right)}{Bits\left(Q^l_{u-1} - 1\right) - Bits\left(Q^l_{u-1} + 1\right)} \tag{38}$$

where $J^l_u$ denotes the sensitivity of the $u$th frame at the $l$th hierarchical level in an HGOP. $Q^l_{u-1}$ is the quantization step size used in the $(u-1)$th frame. In (37) and (38), $J^l_u$ is calculated by applying the quantization step size $Q^l_{u-1}$ used in the previous frame to the current frame. The values of $PSNR(Q)$ and $Bits(Q)$ can be simply estimated by the method proposed in [8] without encoding the current frame. The target bits for a frame can be allocated with the ratio $J^l_u$ as follows:

$$T^l_u = \frac{J^l_u}{J^l_u + J^l_{avg} \times \left(N^l - u\right)} \times \left(F^l - \sum_{j=1}^{u-1} S^l_j\right) \tag{39}$$

where $T^l_u$ denotes the target bits for the $u$th frame at the $l$th level. $\sum_{j=1}^{u-1} S^l_j$ represents the bits generated while encoding the previous frames. If the frame is in the first or second HGOP, the target bits are assigned uniformly for frames in the HGOP by the following equation, since $J^l_u$ cannot be calculated with the parameters of the previous HGOP:

$$T^l_u = \frac{1}{N^l - u + 1} \times \left(F^l - \sum_{j=1}^{u-1} S^l_j\right). \tag{40}$$

C. Bit Allocation for a MB

In this section, we consider the complexity of a MB to allocate bits for the MB. The complexity reflects the difficulty of encoding the MB and indicates the quantity of bits generated from encoding it. This complexity is defined as

$$C^l_{(u,v)} = \eta \cdot \frac{VOD^l_{(u,v)} + MAD^l_{(u,v)}}{P(0)^l_{(u,v)} + 1} \tag{41}$$

where $P(0)$ is the probability that the transformed coefficients have zero values. In (41), $VOD^l_{(u,v)}$, $MAD^l_{(u,v)}$, and $P(0)^l_{(u,v)}$ are the averaged values of the VODs, MADs, and $P(0)$s of the 4×4 blocks at the $v$th MB in the $u$th frame, at the $l$th temporal level. VOD, MAD, and $P(0)$ are predicted from the data of the previous frame. $\eta$ is a constant that can be set to a value empirically determined from general video sequences. The bits assigned to a MB are computed as

$$B^l_{(u,v)} = \frac{C^l_{(u,v)}}{C^l_{(u,v)} + C^l_{avg} \times \left(N_{MB} - v - 1\right)} \times \left(T^l_u - \sum_{k=0}^{v-1} M^l_{(u,k)}\right) \tag{42}$$

where $B^l_{(u,v)}$ denotes the bits allocated for the $v$th MB in the $u$th frame. $C^l_{avg}$ denotes the average complexity of the previous MBs in the current frame. $M^l_{(u,k)}$ represents the bits generated from the $k$th MB in the $u$th frame. $N_{MB}$ represents the number of MBs in a frame. $\sum_{k=0}^{v-1} M^l_{(u,k)}$ is the total number of bits generated from encoding all of the previous MBs.

The expanded form of $D_{HGOP}$ obtained by substituting (21) into (19) and using (15), referred to as (22) in Section IV-A, is

$$D_{HGOP} = \sum_{i=0}^{L_{max}} \left[b^i \left\{\frac{\sigma^i \sqrt{VOD^i_{pre}} + \sqrt{MAD^i_{pre}}}{\left(\frac{T^i_{avg} - \xi^i_2}{\xi^i_1}\right)^2}\right\}^{\beta^i} N^i\right] = \sum_{i=0}^{L_{max}} \left[b^i N^i \left(\sigma^i \sqrt{VOD^i_{pre}} + \sqrt{MAD^i_{pre}}\right)^{\beta^i} \left(\frac{\xi^i_1}{\dfrac{w^i \cdot \left(R_{HGOP} - \sum_{k=0}^{i-1} T^k_{avg} \cdot N^k\right)}{w^i \cdot N^i + \sum_{k=i+1}^{L_{max}} w^k \cdot N^k} - \xi^i_2}\right)^{2\beta^i}\right]. \tag{22}$$

V. Proposed Rate Control Scheme

An efficient rate control algorithm is necessary to generate the number of bits assigned by the bit allocation process described in Section IV. The bitrate is controlled by adjusting the QP. When the first frame in a hierarchical level is encoded, the initial QP is set as follows:

$$QP^0_{init} = \begin{cases} 40, & \text{if } (L_1 > b/p) \\ 35, & \text{if } (L_1 \le b/p < L_2) \\ 30, & \text{if } (L_2 \le b/p < L_3) \\ 25, & \text{if } (L_3 \le b/p < L_4) \\ 20, & \text{if } (L_4 \le b/p < L_5) \\ 15, & \text{if } (L_5 \le b/p) \end{cases} \tag{43}$$

$$QP^l_{init} = QP^0_{init} + l + 1 \tag{44}$$


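where

$$L_1 = \begin{cases} 0.3, & \text{if } (H_{resol} \le QCIF) \\ 0.3, & \text{if } (QCIF < H_{resol} \le CIF) \\ 0.6, & \text{if } (H_{resol} > CIF) \end{cases}$$

$$L_2 = \begin{cases} 0.4, & \text{if } (H_{resol} \le QCIF) \\ 0.7, & \text{if } (QCIF < H_{resol} \le CIF) \\ 1.0, & \text{if } (H_{resol} > CIF) \end{cases}$$

$$L_3 = \begin{cases} 0.5, & \text{if } (H_{resol} \le QCIF) \\ 1.0, & \text{if } (QCIF < H_{resol} \le CIF) \\ 1.2, & \text{if } (H_{resol} > CIF) \end{cases}$$

$$L_4 = \begin{cases} 0.6, & \text{if } (H_{resol} \le QCIF) \\ 1.4, & \text{if } (QCIF < H_{resol} \le CIF) \\ 2.4, & \text{if } (H_{resol} > CIF) \end{cases}$$

$$L_5 = \begin{cases} 0.7, & \text{if } (H_{resol} \le QCIF) \\ 1.7, & \text{if } (QCIF < H_{resol} \le CIF) \\ 3.0, & \text{if } (H_{resol} > CIF) \end{cases} \tag{45}$$

where $b/p$ is the target bits/pixel, and $QP^l_{init}$ is the initial QP value at the $l$th hierarchical level. $H_{resol}$ denotes the resolution of the picture. For the 0th level, the initial parameter $QP^0_{init}$ is determined by (43). The $QP^l_{init}$ values of the other levels are determined by (44). When the frame is not the first one in a level, the Q value is decided using the R-Q model proposed in (11). In (11), $\sigma$, $\xi_1$, and $\xi_2$ have to be fixed before the calculation. We propose three steps for determining these parameters. In the first step, $\xi_1$ and $\xi_2$ are calculated for a fixed $\sigma$ value, where the cost function is

$$E_1 = \sum_{i=0}^{N-1} \left[S_i - \left\{\xi_1 \cdot \sqrt{\sigma \cdot \frac{\sqrt{VOD_i}}{Q_i} + \frac{\sqrt{MAD_i}}{Q_i}} + \xi_2\right\}\right]^2. \tag{46}$$

By using the partial differential method for $E_1$, the parameters $\xi_1$ and $\xi_2$ are calculated from the following operations:

$$Z_i = \sqrt{\sigma \cdot \frac{\sqrt{VOD_i}}{Q_i} + \frac{\sqrt{MAD_i}}{Q_i}} \tag{47}$$

$$\frac{\partial E_1}{\partial \xi_2} = -2 \times \sum_{i=0}^{N-1} \left[S_i - \left\{\xi_1 \times Z_i + \xi_2\right\}\right] = 0$$

$$\xi_2 = \frac{\sum_{i=0}^{N-1} S_i - \xi_1 \cdot \sum_{i=0}^{N-1} Z_i}{N} \tag{48}$$

$$\frac{\partial E_1}{\partial \xi_1} = -2 \times \sum_{i=0}^{N-1} \left[S_i - \left\{\xi_1 \times Z_i + \xi_2\right\}\right] \times Z_i = 0$$

$$\xi_1 = \frac{N \times \sum_{i=0}^{N-1} S_i \cdot Z_i - \left(\sum_{i=0}^{N-1} S_i\right) \cdot \left(\sum_{i=0}^{N-1} Z_i\right)}{N \times \sum_{i=0}^{N-1} Z_i^2 - \left(\sum_{i=0}^{N-1} Z_i\right)^2} \tag{49}$$

where $N$ is the number of previously encoded pictures in the current level. In the second step of the proposed scheme, $\sigma$ is calculated for fixed $\xi_1$ and $\xi_2$ values. To find $\sigma$, (11) is rewritten as

$$\left(\frac{S - \xi_2}{\xi_1}\right)^2 = \sigma \cdot \frac{\sqrt{VOD}}{Q} + \frac{\sqrt{MAD}}{Q}. \tag{50}$$

The cost function using (50) is

$$E_2 = \sum_{i=0}^{N-1} \left[\left(\frac{S_i - \xi_2}{\xi_1}\right)^2 - \left(\sigma \cdot \frac{\sqrt{VOD_i}}{Q_i} + \frac{\sqrt{MAD_i}}{Q_i}\right)\right]^2. \tag{51}$$

Applying the partial differential method to (51) gives the following scheme for determining $\sigma$ for fixed $\xi_1$ and $\xi_2$:

$$\frac{\partial E_2}{\partial \sigma} = -2 \times \sum_{i=0}^{N-1} \left[\left(\frac{S_i - \xi_2}{\xi_1}\right)^2 - \left(\sigma \cdot \frac{\sqrt{VOD_i}}{Q_i} + \frac{\sqrt{MAD_i}}{Q_i}\right)\right] \cdot \frac{\sqrt{VOD_i}}{Q_i} = 0$$

$$\sigma = \frac{\sum_{i=0}^{N-1} \left(\frac{S_i - \xi_2}{\xi_1}\right)^2 \cdot \frac{\sqrt{VOD_i}}{Q_i} - \sum_{i=0}^{N-1} \frac{\sqrt{VOD_i} \cdot \sqrt{MAD_i}}{Q_i^2}}{\sum_{i=0}^{N-1} \frac{VOD_i}{Q_i^2}}. \tag{52}$$

Finally, $\xi_1$ and $\xi_2$ are recalculated using (48) and (49) with the optimized $\sigma$ value to minimize the prediction error. After the parameters ($\xi_1$, $\xi_2$, and $\sigma$) have been calculated, Q is determined to control the generated bitrate. Q is determined by using an equation modified from (11):

$$Q = \frac{\sigma \cdot \sqrt{VOD_{curr}} + \sqrt{MAD_{curr}}}{\left(\frac{S - \xi_2}{\xi_1}\right)^2} \tag{53}$$

where $VOD_{curr}$ and $MAD_{curr}$ are the VOD and MAD of the current frame. The QP can be selected for a particular value of Q. The proposed R-Q model (53) is also applied to determine the Q of a MB. After the Q for a MB has been determined, $QP_{MB}$ is modified as follows to prevent a rapid fluctuation in the quality of a frame:

$$QP_{MB} = \begin{cases} QP_{frame} - 2, & \text{if } (QP_{frame} - 2 > QP_{MB}) \\ QP_{frame} + 2, & \text{if } (QP_{frame} + 2 < QP_{MB}) \\ QP_{MB}, & \text{otherwise} \end{cases} \tag{54}$$

where $QP_{MB}$ and $QP_{frame}$ are the QP values determined for a MB and a frame, respectively.

VI. Summary of the Proposed Schemes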

The proposed overall algorithm is summarized in Fig. 4. The overall algorithm consists of bit allocation and rate control parts. The bit allocation procedure is performed consecutively for an intra-period, HGOP, temporal level, frame, and MB. The bit allocation module has been described in (12), (13), (36), (39), and (42) in Section IV. After the number of bits for a frame has been assigned, the Q for the frame is determined by (53), where S is replaced by the target bitrate $T^l_u$ for a frame. The bit quantity for a MB is assigned by (42), where the complexities of MBs are used. The Q value for a MB is calculated by (53), where the bits $B^l_{(u,v)}$ assigned to the MB are substituted for S. The algorithm is applied to all HGOPs, temporal levels, frames, and MBs iteratively until all of the data have been encoded.

This paper proposed a new R-Q model in (11) of Section III, where both the VOD and MAD are utilized, while conventional schemes use only the MAD. This model is used to allocate the bitrate for a temporal level and a MB in (36) and (42) of Section IV, respectively. The model is also used to control the generated bitrate in (53) of Section V. The proposed schemes (bit allocation and rate control) outperform the conventional methods because the proposed R-Q model is more accurate than the conventional models, as shown in Table I. In addition, they use the optimized weighting factor $w^i$ and the PSNR sensitivity $J^l_u$ in (36) and (39) to obtain coding gain.

Fig. 4. Flowchart of the proposed bit allocation and rate control algorithms.

VII. Extension to SVC

The algorithms proposed in the previous sections can be extended to SVC (H.264/AVC Annex G). SVC has been developed as an extension of H.264/AVC, and many tools have been adopted in SVC [3], where the sequence structure consists of base and enhancement layers. The base layer encoded by the

SVC encoder can be decoded by an H.264/AVC decoder. In order to improve the coding efficiency of the enhancement layer, an inter-layer prediction mode using the coding information of the base layer is employed.

A. Conventional Rate Control Schemes for SVC

Some research [16], [17] has been conducted to control the bitrate in the SVC codec, where the bitrates of the base layer are controlled by conventional schemes [5], [12] (proposed for the JM reference software), since the base layer of SVC is compatible with H.264/AVC. In [16], a bit allocation algorithm for the hierarchical B picture structure was proposed, along with an improved MAD prediction scheme. When the base layer is encoded, the MAD is predicted from the previous pictures in the base layer. This method is called the temporal prediction of the MAD and is used in the H.264/AVC reference software [13]. On the other hand, when the enhancement layer is encoded, the base layer has already been encoded, so the data in the base layer can be used to predict the MAD of the current picture in the enhancement layer, which is called spatial prediction. In [16], to efficiently predict the MAD of the picture in the enhancement layer, the temporal and spatial predictions are adaptively selected. In [17], the bitrate of each temporal layer was allocated using the fixed weighting factors assigned to lev-

TABLE IV
Averaged Prediction Errors $e_{MAD} = \frac{1}{T} \sum_{i=1}^{T} \left|MAD^i_{act} - MAD^i_{pred}\right|$ When the Proposed MAD Prediction and Conventional Schemes are Applied to Base and Enhancement Layers in SVC Codec

TABLE V
Averaged Prediction Errors $e_{VOD}$

(b) Frame Level

| Sequence | Base | Enhancement [5] | Enhancement [16] | Enhancement Proposed |
|---|---|---|---|---|
| Football | 0.8378 | 0.5236 | 0.4879 | 0.4751 |
| Crew | 0.6972 | 0.4026 | 0.3696 | 0.3291 |
| Soccer | 0.2341 | 0.2462 | 0.2592 | 0.2378 |
| Tempete | 0.2923 | 0.1380 | 0.1362 | 0.1253 |

(b) MB Level

| Sequence | Base | Enhancement [5] | Enhancement [16] | Enhancement Proposed |
|---|---|---|---|---|
| Football | 2.8723 | 1.8228 | 1.6323 | 1.6261 |
| Crew | 1.2781 | 1.0111 | 0.9402 | 0.9261 |
| Soccer | 1.7175 | 1.2674 | 1.1929 | 1.1553 |
| Tempete | 1.6154 | 1.3868 | 1.2808 | 1.2407 |

B. Proposed Prediction Scheme for Variables Since inter-layer prediction of H.264/AVC Annex G can be applied to each MB adaptively in the enhancement layer, some MBs are encoded with inter-layer prediction and others are encoded without inter-layer prediction. Therefore the probability of inter-layer prediction should be considered when the MAD of the enhancement layer is predicted. In our paper, to predict the MAD of the enhancement layer, we propose a combined form with both the temporally and spatially predicted MADs, where the combination ratio is set to the inter-layer prediction probability Probinter-layer MADpred = 1 − Probinter-layer · MADtemp +Probinter-layer · MADspat (55) where MADpred denotes the MAD predicted for the current frame in the enhancement layer. MADtemp and MADspat denote the predicted MADs temporally and spatially predicted MADs, respectively. Probinter-layer is the probability that the MBs in the current frame are encoded with the inter layer prediction mode. Since Probinter-layer is unknown before the current frame has been encoded, the probability of the previous picture is used. The method of (55) can be used to predict other variables, VOD and P(0) in (11) and (41), respectively. Since the first frame is encoded with the initial QP, the R-Q model is not needed for the first frame. Consequently, the initial values of the variables [e.g., MAD, VOD, P(0)] are not needed either.

T i i − VODpred VODact

i=1

When the Proposed VOD Prediction and Conventional Schemes are Applied to Base and Enhancement Layers in SVC Codec

(a) Temporal Level Enhancement Base [5] [16] Proposed 1.0047 0.6100 0.5670 0.5563 0.6451 0.3991 0.3969 0.3273 0.2763 0.2881 0.2981 0.2772 0.3197 0.1445 0.1429 0.1294

els in the hierarchical structure. After the bits were allocated, the QP was determined by the R-Q model used in [5].

1 T

(a) Temporal Level Enhancement Base [5] [16] Proposed 10.7326 11.9954 11.0297 10.7326 2.7594 3.1535 3.0546 2.7594 4.4443 4.2357 4.5610 4.4443 2.9250 3.1334 3.1107 2.9250 (b) Frame Level Enhancement Base [5] [16] Proposed 18.9245 10.0545 9.0958 9.0289 5.1571 2.8626 2.7258 2.4762 5.1183 3.6927 3.9699 3.8462 7.1028 2.7999 2.7801 2.6423

Sequence

Base

Football Crew Soccer Tempete

70.8352 13.7526 37.0749 35.1141

(c) MB Level Enhancement [5] [16] Proposed 38.6286 33.9128 33.3922 10.2688 9.4488 9.3042 22.6264 21.3121 20.5147 25.0208 22.7753 22.0406
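The combined prediction in (55) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the per-MB flag list, and the numeric values are hypothetical, and the probability is estimated here as the fraction of MBs in the previous picture that used inter-layer prediction, which is one plausible reading of "the probability of the previous picture."

```python
def inter_layer_probability(mb_flags):
    """Fraction of MBs (in the previous picture) coded with inter-layer prediction.

    mb_flags is a hypothetical list of booleans, one per MB.
    """
    flags = list(mb_flags)
    if not flags:
        raise ValueError("at least one MB flag is required")
    return sum(1 for used in flags if used) / len(flags)


def predict_enhancement_mad(mad_temp, mad_spat, prob_inter_layer):
    """Blend the temporal and spatial MAD predictions as in (55)."""
    if not 0.0 <= prob_inter_layer <= 1.0:
        raise ValueError("prob_inter_layer must lie in [0, 1]")
    return (1.0 - prob_inter_layer) * mad_temp + prob_inter_layer * mad_spat


# Made-up example: three of four MBs in the previous picture used inter-layer prediction.
previous_picture_flags = [True, False, True, True]
prob = inter_layer_probability(previous_picture_flags)
mad_pred = predict_enhancement_mad(mad_temp=4.0, mad_spat=2.0, prob_inter_layer=prob)
```

With a probability of 0 the prediction falls back to the purely temporal estimate, and with a probability of 1 it becomes the purely spatial estimate, so the blend reduces to either conventional prediction at the extremes.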

TABLE VI
Error Ratio $e_r = \sum_i |T_i - S_i| / \sum_i T_i$ When the Proposed and Conventional Rate Control Algorithms are Applied to H.264/AVC Codec

| Sequence | HGOP Size | [5] | [7] RC MODE 2 | [7] RC MODE 3 | Proposed Rate Control |
|---|---|---|---|---|---|
| Soccer | 4 | 0.3599 | 0.3959 | 0.3115 | 0.1593 |
| Soccer | 8 | 0.5656 | 0.6637 | 0.8988 | 0.2118 |
| City | 4 | 0.6349 | 0.7265 | 0.7040 | 0.1965 |
| City | 8 | 0.8708 | 0.8430 | 1.0950 | 0.2066 |
| Ice | 4 | 0.3567 | 0.3651 | 0.7378 | 0.1586 |
| Ice | 8 | 0.7377 | 0.6747 | 1.4927 | 0.2413 |
| Crew | 4 | 0.5966 | 0.6388 | 0.6288 | 0.2430 |
| Crew | 8 | 0.8852 | 1.0581 | 0.9334 | 0.3080 |
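The error ratio reported in Table VI is the summed absolute deviation between target and generated bits, normalized by the total target bits. The following minimal sketch computes it; the function name is hypothetical, the targets follow the 10, 7, 5, 3, and 2 kbit settings used for I, P, B1, B2, and B3 pictures in the rate control experiment (assuming 1 kbit = 1000 bits), and the generated bit counts are made up for illustration.

```python
def error_ratio(target_bits, generated_bits):
    """Sum of |T_i - S_i| over all pictures divided by the sum of T_i."""
    if len(target_bits) != len(generated_bits):
        raise ValueError("target and generated lists must have the same length")
    total_target = sum(target_bits)
    if total_target <= 0:
        raise ValueError("total target bits must be positive")
    deviation = sum(abs(t - s) for t, s in zip(target_bits, generated_bits))
    return deviation / total_target


# Targets for I, P, B1, B2, B3 pictures; generated values are hypothetical.
targets = [10000, 7000, 5000, 3000, 2000]
generated = [10500, 6800, 5200, 2900, 2100]
e_r = error_ratio(targets, generated)
```

Because the deviations are accumulated as absolute values, overshoot in one picture cannot cancel undershoot in another, so a small value indicates per-picture accuracy rather than only a correct average bitrate.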

Tables IV and V¹ show the averaged prediction errors when the conventional and proposed schemes are applied to predict the MAD and VOD of the temporal level, frame, and MB of the test sequences Football, Crew, Soccer, and Tempete. The averaged prediction errors for the MAD and VOD are defined as

$$e_{MAD} = \frac{1}{T}\sum_{i=1}^{T}\left|MAD_{act}^{i} - MAD_{pred}^{i}\right| \tag{56}$$

$$e_{VOD} = \frac{1}{T}\sum_{i=1}^{T}\left|VOD_{act}^{i} - VOD_{pred}^{i}\right| \tag{57}$$

where $e_{MAD}$, $e_{VOD}$, and $T$ denote the averaged prediction errors and the number of coded data (e.g., temporal levels, frames, and MBs), respectively. $MAD_{act}^{i}$, $MAD_{pred}^{i}$, $VOD_{act}^{i}$, and $VOD_{pred}^{i}$ are the actual and predicted MADs and VODs for the ith data, respectively. In these tables, the averaged prediction errors of the proposed scheme are smaller than those of conventional schemes [5], [16].

¹ In Tables IV and V, HGOP size is set to 4. Resolutions of base and enhancement layers are quarter common intermediate format (QCIF) and common intermediate format (CIF), respectively. Option for inter-layer prediction of joint scalable video model (JSVM) is set to “Adaptive.” QPbase and QPenh are set to 32 and 28, respectively.
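The averaged prediction errors for the MAD and VOD are both mean absolute errors over the T coded units, so one helper covers (56) and (57). The sketch below is illustrative only; the function name and the sample values are hypothetical.

```python
def averaged_prediction_error(actual, predicted):
    """Mean absolute difference between actual and predicted values over T units."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted lists must have the same length")
    if not actual:
        raise ValueError("at least one coded unit is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical MAD values for three frames.
mad_actual = [4.0, 3.5, 5.0]
mad_predicted = [3.5, 3.5, 6.0]
e_mad = averaged_prediction_error(mad_actual, mad_predicted)

# The same helper applies to VOD values.
vod_actual = [20.0, 18.0]
vod_predicted = [22.0, 17.0]
e_vod = averaged_prediction_error(vod_actual, vod_predicted)
```

The unit of aggregation (temporal level, frame, or MB) only changes what each list element represents, not the computation.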

VIII. Simulation Results

In this section, we show simulation results to evaluate the performance of the proposed algorithm. The simulation was implemented with the reference software JM 16.1 [13]. The test sequences were Soccer, City, Ice, Crew, Football, Flower, and Tempete, whose sizes are QCIF or CIF. The HGOP size was set to 4 or 8. Context adaptive binary arithmetic coding was used as the entropy coder and two reference frames were used. We compared the proposed schemes with the conventional methods described in [5] and [7]. Among the schemes proposed in [7], RC MODE 2 and RC MODE 3 were used since these methods consider the hierarchical picture structure. We tested the algorithms for as many cases as possible.

A. Results From Rate Control Using the Proposed R-Q Model

In this section, we present the simulation results for the rate control scheme proposed in Section V. This technique is used to control the QP so that the generated bitrate is equal to the target bitrate. The target bitrates of I, P, B1, B2, and B3 were set to 10, 7, 5, 3, and 2 kbits, respectively. Table VI shows the error ratio between the target and the generated bits. The error ratio is defined as

$$e_r = \frac{\sum_{i=1}^{N_{Data}} |T_i - S_i|}{\sum_{i=1}^{N_{Data}} T_i} \tag{58}$$

where $T_i$ and $S_i$ are the target bits and the generated bits in the ith picture, respectively. Note that the R-Q models in [5] and [7] were derived based on only the characteristics of P frames. The conventional scheme in [5] cannot control the bitrate of B pictures, since the method in [5] does not consider B frames. RC MODE 2 of [7] sets the QP for a B picture to a value related to the QPs of the I and P pictures using offsets. RC MODE 3 of [7] assigns the target bits for B and I pictures, but decides the QP based on an R-Q model designed for P pictures, not for B and I pictures. In contrast to the conventional schemes [5], [7], the proposed R-Q model is derived to control the generated bits for all frame types, as shown in Table I and Fig. 2. As a result, the proposed rate control scheme controls the generated bitrate much more precisely than the conventional schemes, as seen in Table VI.

B. Results of Bit Allocation and Rate Control Schemes

In this section, the performance of the algorithm described in Section VI is evaluated. We compare the proposed scheme with conventional schemes [5], [7]. Table VII shows the bitrates and PSNRs for the various schemes where the intra period is set to 32. As observed in Table VII, the proposed scheme is more efficient at controlling the generated bits and maintaining high video quality than the conventional schemes under various circumstances.

TABLE VII
Performances of Bit Allocation and Rate Control Algorithms in Terms of Bitrate and PSNR Where the Schemes are Applied to H.264/AVC

| Sequence | HGOP Size | Target Bitrate (kb/s) | [5] Bitrate (kb/s) | [5] PSNR (dB) | [7] RC MODE 2 Bitrate (kb/s) | [7] RC MODE 2 PSNR (dB) | [7] RC MODE 3 Bitrate (kb/s) | [7] RC MODE 3 PSNR (dB) | Proposed Bitrate (kb/s) | Proposed PSNR (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| Football (QCIF) | 4 | 256 | 296.93 | 31.18 | 266.36 | 31.99 | 258.65 | 31.88 | 255.17 | 32.02 |
| Football (QCIF) | 4 | 384 | 408.15 | 34.04 | 396.78 | 34.57 | 388.47 | 34.47 | 383.00 | 34.42 |
| Crew (QCIF) | 8 | 256 | 327.12 | 35.72 | 264.91 | 35.42 | 260.05 | 35.22 | 255.70 | 35.40 |
| Crew (QCIF) | 8 | 384 | 432.34 | 37.45 | 396.14 | 37.49 | 393.74 | 36.63 | 383.56 | 37.82 |
| Soccer (QCIF) | 8 | 128 | 161.38 | 33.58 | 132.40 | 32.59 | 130.04 | 32.62 | 127.53 | 32.62 |
| Soccer (QCIF) | 8 | 256 | 289.60 | 36.69 | 260.90 | 36.66 | 258.97 | 36.04 | 255.87 | 36.59 |
| Ice (CIF) | 8 | 256 | 287.93 | 37.00 | 282.33 | 36.90 | 339.33 | 35.10 | 249.69 | 36.00 |
| Ice (CIF) | 8 | 512 | 583.49 | 40.26 | 597.79 | 40.28 | 598.88 | 38.44 | 498.26 | 39.94 |
| Flower (CIF) | 16 | 2000 | 4016.52 | 40.20 | 2818.53 | 37.05 | 2551.58 | 33.81 | 1962.48 | 33.65 |
| Flower (CIF) | 16 | 3000 | 4347.42 | 41.12 | 4062.97 | 40.61 | 3206.99 | 36.14 | 2944.56 | 36.53 |

The proposed scheme has coding gain over the conventional schemes for the following reasons. In the proposed algorithm, the weighting factors for temporal levels are optimized to assign target bits to each level as described in Section IV-A, while the weighting factors of the conventional schemes are set to fixed values or are selected by the user. Moreover, the proposed scheme considers the coding efficiency according to the increments of the assigned bits, while the conventional schemes do not.

The bit allocation scheme proposed in Section IV consists of three steps for the temporal, frame, and MB levels. The coding gains resulting from these steps are shown in Fig. 5, where the algorithms for temporal, frame, and MB are denoted by “A,” “B,” and “C,” respectively. “A” denotes the bit allocation algorithm for the temporal level described in Section IV-A. When bitrates are assigned by “A,” bit quantities for the frame and MB are assigned uniformly. When “A + B” is used for bit allocation, the bit quantities for the temporal and frame levels are assigned by the schemes proposed in Sections IV-A and IV-B, respectively, while those for MB are set uniformly. “IWF” denotes a scheme where bit quantities for the temporal level are assigned by using the initial weighting factor described in Table III, while the target bits for the frame and MB are set uniformly. As observed in Fig. 5, each step of the bit allocation scheme proposed in Section IV gives coding gain independently or cooperatively. This means that all of the steps of the proposed algorithm are useful for allocating the target bits for the temporal, frame, and MB levels.

Fig. 5. R-D curves when each step of bit allocation algorithm proposed in Section IV is used separately. Test sequence is Tempete.

In order to check the fluctuation of the generated bits per intra period, Fig. 6 shows the bitrates generated over an intra period. From these results, we conclude that our proposed scheme is able to control the bitrate over an intra period without fluctuation.

Fig. 6. Bitrate per intra period when the proposed and conventional algorithms are used. (a) Target bitrate is 256 kb/s. (b) Target bitrate is 96 kb/s.

C. Results From Schemes Extended to SVC

To verify the effectiveness of the schemes extended to SVC, a simulation was performed with JSVM 9.12 [19], [20]. Table VIII shows the results for the circumstance where spatial or SNR [medium grain scalability (MGS)] scalabilities were used. We applied the conventional schemes [5], [16], [17] to each layer independently. In the simulations for [16] and [17], the weighting factors for the temporal levels were set to the default values proposed in [16] and [17]. The option for the inter-layer prediction of JSVM was set to “Adaptive,” the intra period was set to 32, two layers {QCIF and CIF} or {QCIF and QCIF} were used, and the frame rates of the sequences were “30 Hz” for all layers. For the proposed scheme, the algorithms proposed in Sections III–V were applied to the base layer. For the enhancement layer of the SVC encoder, the algorithms that used the variable prediction scheme described in Section VII-B were used. Some of the conventional schemes cannot achieve the target bitrate even though their PSNRs are high. In Fig. 7, the performances of the proposed schemes are compared with those of the conventional schemes in terms of rate-distortion performance. The simulation conditions of Fig. 7 are the same as those of (b) in Table VIII. In most cases, the proposed scheme is more efficient than the conventional schemes. From these results, we verify that the proposed schemes can be extended to the SVC codec.

TABLE VIII
Performances of Bit Allocation and Rate Control Algorithms in Terms of Bitrate and PSNR When the Proposed Schemes are Applied to SVC

(a) Spatial and Temporal Scalabilities are Used

| Sequence | Layer | Target Bitrate (kb/s) | [5] Bitrate (kb/s) | [5] PSNR (dB) | [16] Bitrate (kb/s) | [16] PSNR (dB) | [17] Bitrate (kb/s) | [17] PSNR (dB) | Proposed Bitrate (kb/s) | Proposed PSNR (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| Football (HGOP 4) | Base (QCIF) | 128 | 127.18 | 26.66 | 127.11 | 26.97 | 125.51 | 27.24 | 127.86 | 27.46 |
| Football (HGOP 4) | Enh. (CIF) | 256 | 286.75 | 28.14 | 255.69 | 27.33 | 257.84 | 27.02 | 258.59 | 27.70 |
| Football (HGOP 4) | Base (QCIF) | 256 | 252.24 | 30.91 | 255.09 | 30.52 | 281.47 | 31.71 | 253.61 | 31.05 |
| Football (HGOP 4) | Enh. (CIF) | 512 | 511.53 | 31.83 | 510.30 | 31.08 | 615.40 | 32.49 | 511.21 | 31.60 |
| Flower (HGOP 16) | Base (QCIF) | 192 | 126.23 | 30.03 | 191.38 | 28.39 | 123.65 | 30.38 | 183.34 | 32.65 |
| Flower (HGOP 16) | Enh. (CIF) | 768 | 543.55 | 28.87 | 766.57 | 27.51 | 499.10 | 29.00 | 727.20 | 31.86 |
| Flower (HGOP 16) | Base (QCIF) | 256 | 213.63 | 33.25 | 257.16 | 30.54 | 192.38 | 33.38 | 245.59 | 35.02 |
| Flower (HGOP 16) | Enh. (CIF) | 1024 | 1008.47 | 32.42 | 1020.39 | 29.47 | 873.31 | 32.23 | 973.57 | 33.41 |

(b) SNR (MGS) and Temporal Scalabilities are Used

| Sequence | Layer | Target Bitrate (kb/s) | [5] Bitrate (kb/s) | [5] PSNR (dB) | [16] Bitrate (kb/s) | [16] PSNR (dB) | [17] Bitrate (kb/s) | [17] PSNR (dB) | Proposed Bitrate (kb/s) | Proposed PSNR (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| Crew (HGOP 8) | Base (QCIF) | 96 | 93.13 | 28.61 | 95.48 | 28.87 | 96.63 | 29.75 | 96.11 | 30.09 |
| Crew (HGOP 8) | Enh. (QCIF) | 128 | 143.93 | 34.13 | 127.38 | 32.78 | 148.47 | 34.56 | 125.58 | 34.62 |
| Crew (HGOP 8) | Base (QCIF) | 128 | 119.22 | 30.16 | 127.53 | 29.57 | 120.96 | 31.06 | 126.34 | 32.09 |
| Crew (HGOP 8) | Enh. (QCIF) | 160 | 214.65 | 35.37 | 159.36 | 33.71 | 199.87 | 35.21 | 157.46 | 35.99 |
| Soccer (HGOP 8) | Base (QCIF) | 192 | 189.48 | 33.97 | 191.33 | 32.69 | 200.52 | 34.60 | 188.11 | 34.72 |
| Soccer (HGOP 8) | Enh. (QCIF) | 256 | 398.98 | 39.85 | 255.80 | 36.41 | 394.87 | 40.05 | 250.57 | 39.57 |
| Soccer (HGOP 8) | Base (QCIF) | 256 | 237.30 | 35.73 | 255.20 | 34.52 | 237.32 | 35.90 | 251.68 | 36.30 |
| Soccer (HGOP 8) | Enh. (QCIF) | 320 | 503.82 | 41.99 | 318.88 | 38.25 | 437.16 | 41.40 | 313.23 | 41.07 |

Fig. 7. R-D curves when the bit allocation and rate control algorithms are applied to JSVM 9.12. Test sequence is Crew. MGS and temporal scalabilities are used. Option for inter-layer prediction of JSVM is set to “Adaptive.” (a) Base layer. (b) Enhancement layer.

IX. Conclusion

We proposed efficient bit allocation and rate control algorithms for hierarchical video coding. A new R-Q model was described, and the dependency between pictures and the sensitivity of each picture were utilized to allocate the bitrate optimally. The experimental results showed that the proposed schemes were more accurate at controlling the generated bitrate than the conventional schemes. We could observe from the simulation

results that conventional schemes could not efficiently control the bitrate in hierarchical B picture structure. The proposed bit allocation scheme could assign the target bits efficiently, since the proposed scheme considers many factors, such as the temporal level, frame sensitivity, and MB complexity.

Acknowledgment

The authors would like to thank the reviewers for their constructive and valuable comments on this paper.

References

[1] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10, Doc. E32768, Nov. 2007.
[2] H. Schwarz, D. Marpe, and T. Wiegand, “Analysis of hierarchical B-pictures and MCTF,” in Proc. ICME, Jul. 2006, pp. 1929–1932.
[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.
[4] H. C. Huang, W. H. Peng, T. Chiang, and H. M. Hang, “Advances in the scalable amendment of H.264/AVC,” IEEE Commun. Mag., vol. 45, no. 1, pp. 68–76, Jan. 2007.
[5] Adaptive Basic Unit Layer Rate Control for JVT, document JVT-G012.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Mar. 2003.
[6] S. C. Lim, H. R. Na, and Y. L. Lee, “Rate control based on linear regression for H.264/MPEG-4 AVC,” Signal Process.: Image Commun., vol. 22, no. 1, pp. 39–58, Jan. 2007.
[7] Rate Control Reorganization in the Joint Model (JM) Reference Software, document JVT-W042.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Apr. 2007.
[8] N. Karmaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 994–1006, Aug. 2005.
[9] Rate Control on JVT Standard, document JVT-D030.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jul. 2002.
[10] K. Sühring. H.264/AVC Reference Software (JM 11.0) [Online]. Available: http://iphome.hhi.de/suehring/tml/download/old_jm/
[11] B. Xie and W. Zeng, “A sequence-based rate control framework for consistent quality real-time video,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 56–71, Jan. 2006.
[12] Y. Liu, Z. G. Li, and Y. C. Soh, “A novel rate control scheme for low delay video communication of H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 68–78, Jan. 2007.
[13] K. Sühring. H.264/AVC Reference Software (JM 16.1) [Online]. Available: http://iphome.hhi.de/suehring/tml/download/
[14] J. L. Devore and N. R. Farnum, Applied Statistics for Engineers and Scientists. New York: Duxbury, 1999.
[15] D. K. Kwon, M. Y. Shen, and C. C. J. Kuo, “Rate control for H.264 video with enhanced rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 517–529, May 2007.
[16] Y. Liu and Z. G. Li, “Rate control of H.264/AVC scalable extension,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 116–121, Jan. 2008.
[17] Rate Control for the Joint Scalable Video Model, document JVT-W043.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Apr. 2007.
[18] G. Arfken, “The method of steepest descents,” in Mathematical Methods for Physicists, 3rd ed. Orlando, FL: Academic, 1985, sec. 7.4, pp. 428–436.
[19] JSVM Software, document JVT-Z203.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jan. 2008.
[20] Draft Reference Software for SVC, document JVT-Z211.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jan. 2008.

Chan-Won Seo (S’09) was born in Incheon, Korea, on March 23, 1982. He received the B.S. and M.S. degrees from the Department of Information and Communication Engineering, Sejong University, Seoul, Korea, in 2007 and 2009, respectively. He is currently pursuing the Ph.D. degree from the same university. His current research interests include video coding, scalable video coding, and future video coding.

Jung Won Kang received the B.S. and M.S. degrees in electrical engineering from Hankuk Aviation University, Seoul, Korea, in 1993 and 1995, respectively. She received the Ph.D degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, in 2003. Since 2003, she has been a Senior Research Staff Member with the Broadcasting Media Research Group of Electronics and Telecommunications Research Institute, Daejeon, Korea. Her current research interests include video signal processing, video coding, and video adaptation.


Jong-Ki Han was born in Seoul, Korea, on September 5, 1968. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Taejon, Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2001, he was a Technical Staff Member with the Corporate Research and Development Center, Samsung Electronics Company, Suwon, South Korea. He is currently an Associate Professor with the Department of Information and Communications Engineering, Sejong University, Seoul, Korea. His current research interests include image and audio signal compression, transcoding, and very large scale integrated signal processing.

Truong Q. Nguyen (F’05) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, in 1985, 1986, and 1989, respectively. He is currently a Professor with the Department of Electrical and Computer Engineering, University of California, San Diego. He is the coauthor (with Prof. G. Strang) of a popular textbook, Wavelets & Filter Banks (Cambridge, MA: Wellesley-Cambridge, 1997), and the author of several MATLAB-based toolboxes on image compression, electrocardiogram compression and filter bank design. He has over 300 publications. His current research interests include video processing algorithms and their efficient implementation. He received the IEEE Transaction in Signal Processing Paper Award (image and multidimensional processing area) for the paper he co-wrote with Prof. P. P. Vaidyanathan on linear-phase perfect-reconstruction filter banks in 1992. He received the NSF Career Award in 1995 and is currently the Series Editor of Digital Signal Processing for Academic Press. He has served as an Associate Editor for the IEEE Transaction on Signal Processing from 1994 to 1996, for the Signal Processing Letters from 2001 to 2003, for the IEEE Transaction on Circuits and Systems from 1996 to 1997 and 2001 to 2004, and for the IEEE Transaction on Image Processing from 2004 to 2005.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 9, SEPTEMBER 2010

Efficient Bit Allocation and Rate Control Algorithms for Hierarchical Video Coding Chan-Won Seo, Student Member, IEEE, Jung Won Kang, Jong-Ki Han, and Truong Q. Nguyen, Fellow, IEEE

Abstract—Hierarchical structure is a useful tool for providing the necessary scalability in adapting to the variety of channel environments. For schemes involving hierarchical picture structures, bit allocation and rate control algorithms are vital components for improving video codec performance. Since conventional bit allocation schemes do not efficiently consider the hierarchical structure characteristics, it is difficult to optimize the video quality at an arbitrary bitrate. Similarly, conventional quantization parameter decision methods are not appropriate for controlling the bitrate generated by a codec using a hierarchical encoding structure. In this paper, we propose an effective bit allocation scheme that assigns the target number of bits to pictures or macroblocks (MBs) and improves the overall quality of images encoded by a hierarchical-based encoder. A rate control scheme is also proposed to ensure that the generated bitrate is equal to the assigned target bitrate. From the simulation results, the proposed schemes outperformed conventional methods from a rate-distortion perspective, by efficiently controlling the bitrate of the MB unit. The algorithms regulated the generated bits to achieve the target bits by using the proposed linear R-Q model. Index Terms—Bit allocation, hierarchical video coding, R-Q model, rate control.

I. Introduction

H.264/AVC Annex A [1] allows temporal scalability, which can be provided by a hierarchical structure. Temporal scalability is also provided in MPEG-2 and MPEG-4 Part 2. The hierarchical structure adopted in H.264/AVC can support several levels of temporal scalability, and the use of hierarchical encoding structure is not restricted to the dyadic case. It is generally known that it is more efficient to use hierarchical encoding structure rather than the classical “IBBP” structure for most cases [2]. In H.264/AVC Annex G [3], [4], scalable video coding (SVC) has been developed by

Manuscript received November 5, 2008; revised April 3, 2009, July 7, 2009, and October 29, 2009. Date of publication July 26, 2010; date of current version September 9, 2010. This work was supported by the Korean Research Foundation, under Grant KRF-2009-013-1-D00078. This paper was recommended by Associate Editor J. Ridge. C.-W. Seo and J.-K. Han (corresponding author) are with the Department of Information and Communication Engineering, Sejong University, Seoul 143747, Korea (e-mail: [email protected]; [email protected]). J. W. Kang is with the Broadcasting Media Research Group of the Electronics and Telecommunications Research Institute (ETRI), Daejeon 305700, Korea (e-mail: [email protected]). T. Q. Nguyen is with the Department of Electrical and Computer Engineering, University of California, San Diego, CA 92037 USA (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2010.2057011

incorporating signal-to-noise ratio (SNR) and spatial scalabilities. These scalabilities are needed to match the variety of channel environments and conditions. For schemes involving hierarchical picture structures, bit allocation and rate control are vital components [5]–[7]. The coding efficiency of a video codec can be increased by efficiently allocating target bits for each macroblock (MB) and controlling the quantity of bits generated. While conventional schemes for bit allocation and rate control do not efficiently consider the hierarchical structure, the algorithms proposed in this paper exploit the properties of the video coding structure.

As for schemes applied to the H.264/AVC reference software, bit allocation algorithms were studied to assign target bits for a frame in [5]–[7]. Li et al. [5] and Lim et al. [6] proposed algorithms to control the generated bitrate, where specific models using the mean absolute difference (MAD) and ρ value were used to determine the quantization parameter (QP). The rate control algorithm proposed in [5] uses an R-Q model to determine QP using the MAD of the current frame. Since the MAD value can be calculated only after motion vectors have been estimated in a frame, the value is predicted by a temporal linear model. In [6], a ρ-domain source model was proposed to determine QP, where ρ is the percentage of coefficients with a value of zero among the quantized transform coefficients. This model utilizes the relation between ρ and the number of the generated bits. Leontaris et al. [7] proposed rate control schemes for a hierarchical structure-based encoder, where the number of allocated bits and QP for a frame are determined by considering the temporal level and slice type.

The techniques proposed in this paper allocate target bits to temporal levels, frames, and MBs by considering their hierarchical levels, sensitivities, and complexities, where an improved R-Q model is proposed. The QPs are determined to control the generated bitrate by using the proposed R-Q model. The encoder using the proposed algorithm yields a bitstream with a bitrate that is equal to the target value. Note that the schemes proposed in this paper can be extended for use with SVC.

This paper is organized as follows. In Section II, we briefly describe the conventional schemes. A new R-Q model is proposed in Section III. In Section IV, we propose a bit allocation algorithm to assign the target bits for each MB. To generate a bitstream with a rate equal to the target bits, an efficient rate control algorithm is proposed in Section V. The proposed schemes are summarized in Section VI. We extend the proposed schemes for use with SVC in Section VII. Simulation results are presented in Section VIII. Section IX presents the conclusions.

1051-8215/$26.00 © 2010 IEEE

II. Conventional Rate Control Schemes

In the motion estimation (ME) module, the QP has to be fixed before motion vectors are estimated and the mode is determined, since the QP is used in ME/mode decision (MD). However, before performing the motion estimation and mode decision procedures, the QP value needed to generate the bitstream with the target bitrate cannot be computed. This is generally known as the “chicken and egg dilemma” [5]. Several schemes have been proposed to solve this problem [5], [6], [8], [9].

The rate control scheme proposed in [9] uses two passes to overcome the dilemma. If the first encoding pass using a specific QP fails to generate the target bitrate, then a second pass is conducted to refine the QP value. Since multiple passes are used to encode video data, the complexity of this method is very high.

In the 7th Joint Video Team (JVT) meeting at Pattaya, Thailand, Li et al. [5] proposed a rate control scheme that uses a quadratic R-Q model. This scheme uses two steps, the first of which assigns the target bits for the current frame, where a fluid flow traffic model and the complexity are utilized. In the second step, Q is determined by using the proposed quadratic R-Q model [5], as follows:

$$\frac{V_t}{MAD} = \frac{X_1}{Q} + \frac{X_2}{Q^2} \tag{1}$$

where $V_t$, $Q$, $X_1$, and $X_2$ are the assigned target bits, the quantization step size, and two model parameters, respectively. In (1), the parameters $X_1$ and $X_2$ are calculated as follows:

$$X_1 = \frac{\sum_{i=1}^{n}\left(Q_i S_i - X_2 Q_i^{-1}\right)}{n} \tag{2}$$

$$X_2 = \frac{n\sum_{i=1}^{n} S_i - \left(\sum_{i=1}^{n} Q_i^{-1}\right)\left(\sum_{i=1}^{n} Q_i S_i\right)}{n\sum_{i=1}^{n} Q_i^{-2} - \left(\sum_{i=1}^{n} Q_i^{-1}\right)^2} \tag{3}$$

where $Q_i$ and $S_i$ are the actual quantization step size and actual bits used in previous pictures, respectively. $n$ denotes the number of data (for example, MB or frame) encoded previously. Equations (2) and (3) are derived using least mean square estimation.
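As an illustration of how the quadratic R-Q model is used, the sketch below fits $X_1$ and $X_2$ with the least-squares formulas in (2) and (3) and then solves (1) for Q. It is a minimal sketch under stated assumptions, not the JM reference implementation: the function names are hypothetical, each sample $S_i$ is assumed to be already normalized by the MAD of its picture so that it matches the left side of (1), and the positive root of the quadratic is selected.

```python
import math


def fit_quadratic_rq(q_steps, bits):
    """Least-squares estimates of X1 and X2 following (2) and (3).

    q_steps: quantization step sizes of previously coded units.
    bits:    corresponding bit counts (assumed MAD-normalized here).
    """
    if len(q_steps) != len(bits) or len(q_steps) < 2:
        raise ValueError("need at least two matching (Q, S) samples")
    n = len(q_steps)
    sum_s = sum(bits)
    sum_inv_q = sum(1.0 / q for q in q_steps)
    sum_inv_q2 = sum(1.0 / (q * q) for q in q_steps)
    sum_qs = sum(q * s for q, s in zip(q_steps, bits))
    denom = n * sum_inv_q2 - sum_inv_q ** 2
    if abs(denom) < 1e-12:
        raise ValueError("samples must use at least two distinct Q values")
    x2 = (n * sum_s - sum_inv_q * sum_qs) / denom
    x1 = (sum_qs - x2 * sum_inv_q) / n
    return x1, x2


def solve_q(target_bits, mad, x1, x2):
    """Solve V_t / MAD = X1 / Q + X2 / Q^2 for Q (positive root)."""
    r = target_bits / mad
    if r <= 0:
        raise ValueError("target_bits and mad must be positive")
    if x2 == 0:
        return x1 / r  # the model degenerates to a first-order relation
    disc = x1 * x1 + 4.0 * r * x2
    if disc < 0:
        raise ValueError("no real solution for Q")
    return (x1 + math.sqrt(disc)) / (2.0 * r)


# Synthetic samples generated from X1 = 1000 and X2 = 5000.
q_hist = [10.0, 20.0, 40.0]
s_hist = [1000.0 / q + 5000.0 / (q * q) for q in q_hist]
x1_est, x2_est = fit_quadratic_rq(q_hist, s_hist)
q_star = solve_q(target_bits=125.0, mad=2.0, x1=x1_est, x2=x2_est)
```

Multiplying (1) by Q turns the model into a straight line in $Q^{-1}$, which is why (2) and (3) have the form of an ordinary linear regression intercept and slope.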
In (1), MAD denotes the mean absolute difference between the current and reference blocks over a frame, which can be calculated from the current and reference blocks after the motion estimation procedure. However, in (1), since Q has to be determined before the ME procedure, MAD cannot be calculated. Thus, the MAD of the current frame is predicted from the MAD of the previous frame using the linear model as follows:

1211

$$\mathrm{MAD}_c = \alpha_1 \times \mathrm{MAD}_p + \alpha_2 \tag{4}$$

where $\mathrm{MAD}_c$ and $\mathrm{MAD}_p$ are the MADs of the current and previous frames, respectively, and $\alpha_1$ and $\alpha_2$ are the two parameters of the prediction model. Recently, Lim et al. [6] proposed a rate control scheme using a ρ-domain source model, which employs a linear relation between the generated bitrate and ρ. In [6], an efficient model relating the variables $\{R, \rho, E_\rho, E_{QP}, QP\}$ was proposed, where $R$, $\rho$, $E_\rho$, and $E_{QP}$ are the bitrate, the probability of zeros among the quantized transform coefficients, $e^{-\rho}$, and $e^{-QP/12.5}$, respectively. In [7], two modes were proposed to control the bitrate considering the hierarchical B picture structure. The first mode is called RC MODE 2, where the quantization parameters $QP_{B_l}$ of B pictures in the $l$th level are determined based on the $QP_{I/P}$ values of previous key pictures (I or P pictures). In the other mode, RC MODE 3, the QPs for the I and B frames are determined using a quadratic R-Q model derived from the P frame property.

III. Proposed R-Q Model

To predict the number of bits generated by the encoder, the conventional schemes in [5], [7], [11], and [12] use models based on the MAD only. Since most of the generated bits come from encoding the high-frequency components (ac coefficients) of the residual signal, the MAD alone is not sufficient for estimating the generated bits. To overcome this limitation, in this section we propose a new R-Q model based on both the variance of difference (VOD) and the MAD to predict the generated bitrates, in contrast to the conventional schemes in [5], [7], [11], and [12], which used the MAD and parameters $X_1$ and $X_2$. In [11], the number, $S$, of bits generated from encoding a frame is

$$S = \kappa \cdot \sqrt{\mathrm{MAD}} \tag{5}$$

where $\kappa$ is a constant value that depends on the coded data. The new model that considers both the VOD and MAD can be represented by

$$Z_{\mathrm{VOD}} = \sigma \cdot \frac{\sqrt{\mathrm{VOD}}}{Q} + \frac{\sqrt{\mathrm{MAD}}}{Q} \tag{6}$$

where $Z_{\mathrm{VOD}}$ denotes the cost related to the generated bits, and VOD is the averaged variance for the residuals of all of the 4×4 blocks in a frame. The relationship between the generated bits $S$ and $Z_{\mathrm{VOD}}$ is shown in Fig. 1(a), (c), and (e), where the test sequence is Football (CIF size), the hierarchical group of picture (HGOP) size is set to 8, the intra period is 32, and the $QP_{I/P}$ value of the I and P frames is varied from 6 to 48 in steps of 2. The $QP_{B_l}$ values of the B frames are set to "$QP_{I/P} + l$," where $l$ denotes the hierarchical level of each B frame. In order to compare the proposed model of (6) with a MAD-based model, which is represented by

$$Z_{\mathrm{MAD}} = \frac{\sqrt{\mathrm{MAD}}}{Q} \tag{7}$$

the relationship between $Z_{\mathrm{MAD}}$ and the generated bits, $S$, is represented in Fig. 1(b), (d), and (f). In Fig. 1, each symbol “o” is located at a point indicated by (S, Z), where “S” is the number of bits generated by the Joint Model (JM) 16.1 reference software [13] with a QP, and “Z” is the number


Fig. 1. Relation between Z and the number of the generated bits. (a) I slice with ZVOD . (b) I slice with ZMAD . (c) P slice with ZVOD . (d) P slice with ZMAD . (e) B slice with ZVOD . (f) B slice with ZMAD .

predicted by (6) and (7). As observed from Fig. 1, both (6) and (7) produce a nonlinear relationship between Z and the actual number of bits, $S$. In order to linearize the relationship, we apply a square root function to the costs as follows:

$$\sqrt{Z_{\mathrm{VOD}}} = \sqrt{\sigma \cdot \frac{\sqrt{\mathrm{VOD}}}{Q} + \frac{\sqrt{\mathrm{MAD}}}{Q}} \tag{8}$$

$$\sqrt{Z_{\mathrm{MAD}}} = \sqrt{\frac{\sqrt{\mathrm{MAD}}}{Q}}. \tag{9}$$

The relationships of the generated bits $S$ to $\sqrt{Z_{\mathrm{VOD}}}$ and $\sqrt{Z_{\mathrm{MAD}}}$ are shown in Fig. 2. To measure the accuracy of the models, we use the $R^2$ function given in [12], [14], and [15] as follows:

$$R^2 = 1 - \frac{\sum_i \left( X_i - \hat{X}_i \right)^2}{\sum_i \left( X_i - \bar{X} \right)^2}. \tag{10}$$

In (10), $X_i$ and $\hat{X}_i$ are the actual and predicted values of the $i$th data point, respectively, and $\bar{X}$ is the mean of all data points shown in Fig. 2. The last term in (10) is the squared error sum normalized by the variance of the actual values. The estimated value $\hat{X}_i$ is obtained by using the linear models in (8)


Fig. 2. Relation between $\sqrt{Z}$ and the number of the generated bits. (a) I slice with $\sqrt{Z_{\mathrm{VOD}}}$. (b) I slice with $\sqrt{Z_{\mathrm{MAD}}}$. (c) P slice with $\sqrt{Z_{\mathrm{VOD}}}$. (d) P slice with $\sqrt{Z_{\mathrm{MAD}}}$. (e) B slice with $\sqrt{Z_{\mathrm{VOD}}}$. (f) B slice with $\sqrt{Z_{\mathrm{MAD}}}$.

TABLE I
$R^2$ Values When the Generated Bitrates are Estimated by Using $\sqrt{Z_{\mathrm{MAD}}}$ and $\sqrt{Z_{\mathrm{VOD}}}$

Slice Type    Using sqrt(Z_MAD)    Using sqrt(Z_VOD)
I             0.9619               0.9790
P             0.9706               0.9852
B             0.9598               0.9732

and (9). If the model can predict the generated bitrates exactly for all of the data, i.e., $X_i = \hat{X}_i$ for all $i$, $R^2$ becomes 1. The $R^2$ values for (8) and (9) are shown in Table I. As observed in Table I, the proposed model in (8) yields more accurate results than using the MAD only as in (9). Consequently, we propose a new R-Q model that uses the VOD as follows:

$$S = \xi_1 \times \sqrt{\sigma \cdot \frac{\sqrt{\mathrm{VOD}}}{Q} + \frac{\sqrt{\mathrm{MAD}}}{Q}} + \xi_2 \tag{11}$$

where $\xi_1$ and $\xi_2$ are linear model parameters.
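As an illustration of how the model in (11) and the accuracy measure in (10) might be evaluated, the following minimal Python sketch computes the predicted bits for one frame and the $R^2$ value for a list of predictions. The function names and all numeric inputs (frame statistics, $\sigma$, $\xi_1$, $\xi_2$) are hypothetical and chosen only for the example; they are not values from the experiments.

```python
import math

def predicted_bits(vod, mad, q, sigma, xi1, xi2):
    """Bits predicted by the linear R-Q model (11)."""
    z = sigma * math.sqrt(vod) / q + math.sqrt(mad) / q  # cost Z_VOD of (6)
    return xi1 * math.sqrt(z) + xi2                      # linear in sqrt(Z_VOD)

def r_squared(actual, predicted):
    """Goodness of fit R^2 as defined in (10)."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst

# Hypothetical frame statistics and model parameters (assumptions, not measured data).
bits_fine = predicted_bits(vod=16.0, mad=4.0, q=2.0, sigma=0.5, xi1=1000.0, xi2=50.0)
bits_coarse = predicted_bits(vod=16.0, mad=4.0, q=8.0, sigma=0.5, xi1=1000.0, xi2=50.0)
```

With these inputs, a larger quantization step gives a smaller predicted number of bits, which is the monotonic behavior the model is expected to capture.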


Fig. 3. Hierarchical structure where HGOP size is 8.

TABLE II
$b^i$ and $\beta^i$ Values Used in (19)

               GOP Size 4          GOP Size 8          GOP Size 16
Slice Type     b        β          b        β          b        β
I              0.2792   1.3684     0.2676   1.3815     0.2625   1.3902
P              0.2673   1.3903     0.2642   1.3926     0.2647   1.3966
B1             0.2914   1.2979     0.3115   1.3305     0.3364   1.2991
B2             0.2439   1.3061     0.3036   1.3003     0.3548   1.2554
B3             -        -          0.2468   1.2816     0.3397   1.2348
B4             -        -          -        -          0.2879   1.2111

IV. Proposed Bit Allocation Scheme

We propose a bit allocation algorithm that considers the hierarchical level, complexity, and sensitivity of each frame. Fig. 3 represents the hierarchical structure of a sequence that gives temporal scalability to a bitstream, where some B frames are used as reference frames. The proposed scheme consists of three steps used to allocate bits for (a) a hierarchical level, (b) a frame, and (c) a MB.

A. Bit Allocation for a Hierarchical Level

The bits allocated for each intra-period are represented as follows:

$$I_{\mathrm{curr}} = I_{\mathrm{prev}} + \frac{\mathrm{BitRate}}{\mathrm{FrameRate}} \times N_{\mathrm{intra}} \tag{12}$$

where $I_{\mathrm{curr}}$ represents the bits assigned to the current intra-period and $I_{\mathrm{prev}}$ represents the bits that remain after the data in the previous intra-period have been encoded. $N_{\mathrm{intra}}$ denotes the number of frames in an intra-period. An intra-period consists of several HGOP structures. The bits assigned for an HGOP are calculated by a uniform assignment as follows:

$$R_{\mathrm{HGOP}} = \frac{I_{\mathrm{curr}}}{N_{\mathrm{HGOP}}} \tag{13}$$

where $R_{\mathrm{HGOP}}$ is the number of bits allocated for the current HGOP, and $N_{\mathrm{HGOP}}$ is the number of remaining HGOPs in the current intra-period. In Fig. 3, an HGOP consists of I, P, and B pictures, where the distortion resulting from encoding a particular picture affects the quality of other pictures due to their dependency in the hierarchical structure. The quality of P is affected by the distortion generated in encoding the I picture. The quality of B1 is affected by those of I and P. As the temporal level increases, the quality of B2 is affected by the distortion generated in encoding {I and B1} or {B1 and P}. We know from this dependency that the significance of a picture depends on its level and slice type. To incorporate this significance into the algorithm, weighting factors have been used in bit allocation schemes [7], [16], where the weighting factors can be fixed or determined by users. In the proposed algorithms, the temporal weighting factors are optimized instead of using fixed values. From the dependency between pictures, we define the weighting factor $w^l$, which indicates the significance of the picture, where $l$ denotes the hierarchical level. In Fig. 3, all of the picture types in a level are the same within an HGOP. Thus, we apply weighting factors to assign bits to a frame, as in (14), which follows the ideas from [17]

$$T^l_{\mathrm{avg}} = \frac{w^l}{w^l \cdot N^l + \sum_{k=l+1}^{L_{\max}} w^k \cdot N^k} \times \left( R_{\mathrm{HGOP}} - \sum_{u=0}^{l-1} G^u \right) \tag{14}$$

where $T^l_{\mathrm{avg}}$ and $N^l$ denote the average target bits for a frame and the number of frames in temporal level $l$, respectively. $L_{\max}$ is the maximum temporal level number in the current HGOP structure. $G^u$ denotes the number of bits generated from encoding pictures in temporal level $u$. We note that $G^u$ is equal to $T^u_{\mathrm{avg}} \cdot N^u$. Thus, (14) is rewritten as

$$T^l_{\mathrm{avg}} = \frac{w^l}{w^l \cdot N^l + \sum_{k=l+1}^{L_{\max}} w^k \cdot N^k} \times \left( R_{\mathrm{HGOP}} - \sum_{u=0}^{l-1} T^u_{\mathrm{avg}} \cdot N^u \right). \tag{15}$$
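The allocation chain in (12), (13), and (15) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the example assumes a dyadic HGOP of size 8 with one key picture and 1, 2, and 4 B pictures in levels 1 to 3, a 256 kb/s target at 30 frames/s, an intra period of 32, and the P/B weights for HGOP size 8 as initial values.

```python
def intra_period_bits(i_prev, bit_rate, frame_rate, n_intra):
    """Bits assigned to the current intra-period, as in (12)."""
    return i_prev + bit_rate / frame_rate * n_intra

def hgop_bits(i_curr, n_hgop_remaining):
    """Uniform share of the intra-period budget for one HGOP, as in (13)."""
    return i_curr / n_hgop_remaining

def level_targets(r_hgop, weights, counts):
    """Average target bits per frame for each temporal level, as in (15)."""
    targets = []
    spent = 0.0  # sum of T_avg^u * N^u over the lower levels u < l
    for l in range(len(weights)):
        denom = sum(w * n for w, n in zip(weights[l:], counts[l:]))
        t_avg = weights[l] / denom * (r_hgop - spent)
        targets.append(t_avg)
        spent += t_avg * counts[l]
    return targets

# Assumed example configuration (not taken from the experiments).
weights = [4.5, 4.0, 2.0, 1.0]   # key picture (P), B1, B2, B3
counts = [1, 1, 2, 4]            # frames per temporal level in one HGOP
r_hgop = hgop_bits(intra_period_bits(0.0, 256000.0, 30.0, 32), 4)
targets = level_targets(r_hgop, weights, counts)
```

A useful property of the recursion is that the per-level targets, multiplied by the frame counts, add up to the HGOP budget, and the per-frame targets stay proportional to the weights.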

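To increase the coding efficiency, the weighting factors have to be optimized as follows:

$$\mathbf{w}^* = \arg\min_{w^l > 0,\ \mathbf{w} \subset \mathbb{R}} \left[ \sum_{i=0}^{L_{\max}} \left\{ \sum_{j=1}^{N^i} \left( \mathrm{Dist}^i_j(\mathbf{w}) + \lambda^i_j \cdot \mathrm{Rate}^i_j(\mathbf{w}) \right) \right\} \right] \tag{16}$$

where $\mathrm{Dist}^i_j(\mathbf{w})$ and $\mathrm{Rate}^i_j(\mathbf{w})$ are the distortion and the generated bitrate, respectively, when weighting factors $\mathbf{w}$ are used. $\lambda^i_j$ denotes a Lagrange multiplier for the $i$th level and the $j$th picture. Weighting factors $\mathbf{w}$ can be represented as

$$\mathbf{w} = \left[ w^0, w^1, \ldots, w^{L_{\max}} \right]. \tag{17}$$

In (16), $\sum_{i=0}^{L_{\max}} \sum_{j=1}^{N^i} \mathrm{Rate}^i_j(\mathbf{w})$ is equal to the number of bits assigned to an HGOP (i.e., $R_{\mathrm{HGOP}}$). We know that $\sum_{i=0}^{L_{\max}} \sum_{j=1}^{N^i} \mathrm{Rate}^i_j(\mathbf{w})$ is independent of $\mathbf{w}$. Thus, the optimization of (16) can be simplified to

$$\mathbf{w}^* = \arg\min_{w^l > 0,\ \mathbf{w} \subset \mathbb{R}} \left[ \sum_{i=0}^{L_{\max}} \left\{ \sum_{j=1}^{N^i} \mathrm{Dist}^i_j(\mathbf{w}) \right\} \right] = \arg\min_{w^l > 0,\ \mathbf{w} \subset \mathbb{R}} \left[ D_{\mathrm{HGOP}} \right] \tag{18}$$

where $D_{\mathrm{HGOP}} = \sum_{i=0}^{L_{\max}} \sum_{j=1}^{N^i} \mathrm{Dist}^i_j(\mathbf{w})$. $D_{\mathrm{HGOP}}$ in (18) can be approximated using the distortion model proposed in [8], as follows: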

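$$D_{\mathrm{HGOP}} \approx \sum_{i=0}^{L_{\max}} \left\{ \sum_{m=-\infty}^{\infty} \left( \int_{(m-1/2)Q^i}^{(m+1/2)Q^i} \left( x - m \cdot Q^i \right)^2 \cdot f_X(x)\, dx \right) \cdot N^i \right\} \approx \sum_{i=0}^{L_{\max}} b^i \left( Q^i \right)^{\beta^i} N^i \tag{19}$$

which is not restricted by the bitrate range [8]. In (19), $Q^i$ denotes the quantization step size in the $i$th level. $f_X(x)$ is the probability density function of the transformed coefficients, and $b^i$ and $\beta^i$ are model parameters that are set to the values in Table II. $D_{\mathrm{HGOP}}$ in (19) can be modified according to the HGOP size, since the number of frames in a temporal level and the distance between the current and reference frames vary with the HGOP size. The values for $b^i$ and $\beta^i$ in Table II were determined empirically. Let $T^i_{\mathrm{avg}}$ be the average target bitrate assigned to the $i$th level. Substituting $T^i_{\mathrm{avg}}$ into $S$ in (11) gives

$$T^i_{\mathrm{avg}} = \xi_1 \cdot \sqrt{\sigma^i \cdot \frac{\sqrt{\mathrm{VOD}^i}}{Q^i} + \frac{\sqrt{\mathrm{MAD}^i}}{Q^i}} + \xi_2 \tag{20}$$

which can be rewritten as

$$Q^i = \left( \sigma^i \cdot \sqrt{\mathrm{VOD}^i} + \sqrt{\mathrm{MAD}^i} \right) \times \left( \frac{T^i_{\mathrm{avg}} - \xi_2}{\xi_1} \right)^{-2}. \tag{21}$$

After substituting (21) into (19), the equation is rewritten by using (15) as (22). In (22), $\mathrm{VOD}^i_{\mathrm{pre}}$ and $\mathrm{MAD}^i_{\mathrm{pre}}$ are predicted by the scheme described in [5]. $\xi^i_1$ and $\xi^i_2$ are linear model parameters for the $i$th layer. To simplify (22), some of the terms in (22) are denoted as follows:

$$C^i_1 \triangleq b^i \cdot N^i \cdot \left( \sigma^i \cdot \sqrt{\mathrm{VOD}^i_{\mathrm{pre}}} + \sqrt{\mathrm{MAD}^i_{\mathrm{pre}}} \right)^{\beta^i} \tag{23}$$

$$C^i_2 \triangleq \xi^i_1 \cdot N^i \tag{24}$$

$$C^i_3 \triangleq \xi^i_1 \cdot \sum_{k=i+1}^{L_{\max}} w^k \cdot N^k \tag{25}$$

$$C^i_4 \triangleq \left( R_{\mathrm{HGOP}} - \sum_{j=0}^{i-1} T^j_{\mathrm{avg}} \cdot N^j \right) - \xi^i_2 \cdot N^i \tag{26}$$

$$C^i_5 \triangleq \xi^i_2 \cdot \sum_{k=i+1}^{L_{\max}} w^k \cdot N^k. \tag{27}$$

$D_{\mathrm{HGOP}}$ in (22) can be represented by the following equation by using (23)–(27):

$$D_{\mathrm{HGOP}} = \sum_{i=0}^{L_{\max}} \left[ C^i_1 \times \left( \frac{C^i_2 \cdot w^i + C^i_3}{C^i_4 \cdot w^i - C^i_5} \right)^{2\beta^i} \right]. \tag{28}$$

To obtain the optimized weighting factors $\mathbf{w}$, the conventional steepest descent algorithm [18] is used as follows:

$$w^l_{[h+1]} = w^l_{[h]} + \tau \cdot \delta \tag{29}$$

and

$$\delta = -\nabla D_{\mathrm{HGOP}} \left( w^l_{[h]} \right) \tag{30}$$

where $h$ and $\tau$ are the iteration number and scaling factor, respectively. $\nabla D_{\mathrm{HGOP}}\left( w^l_{[h]} \right)$ is calculated by

$$\nabla D_{\mathrm{HGOP}} = \left[ \frac{\partial D_{\mathrm{HGOP}}}{\partial w^0}, \ldots, \frac{\partial D_{\mathrm{HGOP}}}{\partial w^{L_{\max}-1}}, \frac{\partial D_{\mathrm{HGOP}}}{\partial w^{L_{\max}}} \right]^T \tag{31}$$

where

$$\frac{\partial D_{\mathrm{HGOP}}}{\partial w^i} = 2 \cdot C^i_1 \cdot \beta^i \cdot \left( \frac{C^i_2 \cdot w^i + C^i_3}{C^i_4 \cdot w^i - C^i_5} \right)^{2\beta^i - 1} \cdot \left\{ \frac{-C^i_2 \cdot C^i_5 - C^i_3 \cdot C^i_4 - \left( C^i_2 \cdot \left( w^i \right)^2 + C^i_3 \cdot w^i \right) \cdot \dfrac{\partial C^i_4}{\partial w^i}}{\left( C^i_4 \cdot w^i - C^i_5 \right)^2} \right\} \tag{32}$$

$$\frac{\partial C^i_4}{\partial w^i} = -\sum_{j=0}^{i-1} \frac{\partial T^j_{\mathrm{avg}}}{\partial w^i} \cdot N^j \tag{33}$$

$$\frac{\partial T^j_{\mathrm{avg}}}{\partial w^i} = \frac{-w^j \cdot N^i}{\left( w^j \cdot N^j + \sum_{k=j+1}^{L_{\max}} w^k \cdot N^k \right)^2} \times \left( R_{\mathrm{HGOP}} - \sum_{u=0}^{j-1} T^u_{\mathrm{avg}} \cdot N^u \right) + \frac{w^j}{w^j \cdot N^j + \sum_{k=j+1}^{L_{\max}} w^k \cdot N^k} \times \left( -\sum_{u=0}^{j-1} \frac{\partial T^u_{\mathrm{avg}}}{\partial w^i} \cdot N^u \right). \tag{34}$$

Note that the derivatives of $C^i_3$ and $C^i_5$ with respect to $w^i$ are zero since these are independent of $w^i$, although they are functions of $w^k$ in (25) and (27). In (34), the partial derivative of $T^j_{\mathrm{avg}}$ with respect to $w^i$ is calculated recursively. The initial value of recursive function (34) is

$$\frac{\partial T^0_{\mathrm{avg}}}{\partial w^i} = \frac{-w^0 \cdot N^i}{\left( w^0 \cdot N^0 + \sum_{k=1}^{L_{\max}} w^k \cdot N^k \right)^2} \times R_{\mathrm{HGOP}}. \tag{35}$$

TABLE III
Initial Weighting Factors, $w^l_{[0]}$

               HGOP Size
Slice Type     2        4        8        16
I              2.75     4.25     7.25     13.25
P              1.5      2.5      4.5      8.5
B1             1.0      2.0      4.0      8.0
B2             -        1.0      2.0      4.0
B3             -        -        1.0      2.0
B4             -        -        -        1.0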

Since the number of temporal levels is finite, the derivative can be calculated with low complexity for most hierarchical structures. The initial values for the weighting factors are shown in Table III. These values were determined empirically through simulations using the test sequences Foreman, Football, Soccer, and Crew with various bitrates and HGOP structures. Based on the weighting factors, the allocated bits $F^l$ for the $l$th temporal level are calculated as follows:

$$F^l = \frac{w^l \cdot N^l}{w^l \cdot N^l + \sum_{i=l+1}^{L_{\max}} w^i \cdot N^i} \times \left( R_{\mathrm{HGOP}} - \sum_{i=0}^{l-1} G^i \right). \tag{36}$$
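To make the relation between the distortion model in (19) and (21), its compact form in (28), and the level budget in (36) more concrete, the following Python sketch evaluates all three for one set of weights. It is a hedged illustration only: the function names are hypothetical, the values of $\xi_1$, $\xi_2$, and the term $a^i = \sigma^i \sqrt{\mathrm{VOD}^i} + \sqrt{\mathrm{MAD}^i}$ are assumed for the example, and the budget function assumes that each lower level generated exactly its target bits. The $b$, $\beta$, and weight values are the HGOP size 8 entries (P, B1, B2, B3).

```python
import math

def avg_targets(r_hgop, w, n):
    """Average target bits per frame for each level, as in (15)."""
    t, spent = [], 0.0
    for l in range(len(w)):
        denom = sum(wk * nk for wk, nk in zip(w[l:], n[l:]))
        t.append(w[l] / denom * (r_hgop - spent))
        spent += t[l] * n[l]
    return t

def distortion_direct(r_hgop, w, n, b, beta, a, xi1, xi2):
    """D_HGOP from (19), with the step size Q^i obtained from (21)."""
    t = avg_targets(r_hgop, w, n)
    total = 0.0
    for i in range(len(w)):
        q = a[i] * ((t[i] - xi2[i]) / xi1[i]) ** -2
        total += b[i] * q ** beta[i] * n[i]
    return total

def distortion_compact(r_hgop, w, n, b, beta, a, xi1, xi2):
    """D_HGOP from the compact form (28), using C1..C5 of (23)-(27)."""
    t = avg_targets(r_hgop, w, n)
    total = 0.0
    for i in range(len(w)):
        tail = sum(wk * nk for wk, nk in zip(w[i + 1:], n[i + 1:]))
        c1 = b[i] * n[i] * a[i] ** beta[i]
        c2 = xi1[i] * n[i]
        c3 = xi1[i] * tail
        c4 = r_hgop - sum(t[j] * n[j] for j in range(i)) - xi2[i] * n[i]
        c5 = xi2[i] * tail
        total += c1 * ((c2 * w[i] + c3) / (c4 * w[i] - c5)) ** (2 * beta[i])
    return total

def level_budgets(r_hgop, w, n):
    """Bits F^l per temporal level, as in (36), assuming G^i equals its target."""
    return [ti * ni for ti, ni in zip(avg_targets(r_hgop, w, n), n)]

# Assumed example inputs (hypothetical except for the tabulated b, beta, w values).
w = [4.5, 4.0, 2.0, 1.0]
n = [1, 1, 2, 4]
b = [0.2642, 0.3115, 0.3036, 0.2468]
beta = [1.3926, 1.3305, 1.3003, 1.2816]
a = [13.0, 12.0, 11.0, 10.0]
xi1 = [2000.0] * 4
xi2 = [300.0] * 4
r_hgop = 68000.0
d_direct = distortion_direct(r_hgop, w, n, b, beta, a, xi1, xi2)
d_compact = distortion_compact(r_hgop, w, n, b, beta, a, xi1, xi2)
budgets = level_budgets(r_hgop, w, n)
```

Both distortion functions should return the same value up to rounding, which is a quick check that the substitutions leading to (28) were carried out consistently. A steepest descent loop as in (29) and (30) would repeatedly evaluate such a function (or its analytic gradient) while updating the weights.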


The bit numbers for frames in the $l$th level are assigned according to their significances. The details of this scheme are described in the next section.

For reference, (22) in full is

$$D_{\mathrm{HGOP}} = \sum_{i=0}^{L_{\max}} \left[ b^i \left\{ \frac{\sigma^i \sqrt{\mathrm{VOD}^i_{\mathrm{pre}}} + \sqrt{\mathrm{MAD}^i_{\mathrm{pre}}}}{\left( \dfrac{T^i_{\mathrm{avg}} - \xi^i_2}{\xi^i_1} \right)^2} \right\}^{\beta^i} N^i \right] = \sum_{i=0}^{L_{\max}} \left[ b^i N^i \left( \sigma^i \sqrt{\mathrm{VOD}^i_{\mathrm{pre}}} + \sqrt{\mathrm{MAD}^i_{\mathrm{pre}}} \right)^{\beta^i} \left( \frac{\xi^i_1}{\dfrac{w^i \cdot \left( R_{\mathrm{HGOP}} - \sum_{k=0}^{i-1} T^k_{\mathrm{avg}} \cdot N^k \right)}{w^i \cdot N^i + \sum_{k=i+1}^{L_{\max}} w^k \cdot N^k} - \xi^i_2} \right)^{2\beta^i} \right]. \tag{22}$$

B. Bit Allocation for a Frame

The frames in a temporal level have different coding efficiencies from each other. The peak signal-to-noise ratios (PSNRs) of some frames increase dramatically with a few more allocated bits, while others produce only a slight gain even with many more allocated bits. While conventional schemes [5], [7], [16] do not consider this quality sensitivity, we consider it in this paper. To allocate target bits for a frame, it is necessary to consider the sensitivity of frames in the computation of assigned bits. In this paper, we consider the increments of PSNR according to the variation in the assigned bits

$$J^l_u = \frac{\Delta \mathrm{PSNR}}{\Delta \mathrm{Bits}} \tag{37}$$

$$\phantom{J^l_u} = \frac{\mathrm{PSNR}\left( Q^l_{u-1} - 1 \right) - \mathrm{PSNR}\left( Q^l_{u-1} + 1 \right)}{\mathrm{Bits}\left( Q^l_{u-1} - 1 \right) - \mathrm{Bits}\left( Q^l_{u-1} + 1 \right)} \tag{38}$$

where $J^l_u$ denotes the sensitivity of the $u$th frame at the $l$th hierarchical level in an HGOP. $Q^l_{u-1}$ is the quantization step size used in the $(u-1)$th frame. In (37) and (38), $J^l_u$ is calculated by applying the quantization step size $Q^l_{u-1}$ used in the previous frame to the current frame. The values of PSNR(Q) and Bits(Q) can be simply estimated by the method proposed in [8] without encoding the current frame. The target bits for a frame can be allocated with the ratio $J^l_u$ as follows:

$$T^l_u = \frac{J^l_u}{J^l_u + J^l_{\mathrm{avg}} \times \left( N^l - u \right)} \times \left( F^l - \sum_{j=1}^{u-1} S^l_j \right) \tag{39}$$

where $T^l_u$ denotes the target bits for the $u$th frame at the $l$th level. $\sum_{j=1}^{u-1} S^l_j$ represents the bits generated while encoding the previous frames. If the frame is in the first or second HGOP, the target bits are assigned uniformly for frames in the HGOP by the following equation, since $J^l_u$ cannot be calculated with the parameters of the previous HGOP:

$$T^l_u = \frac{1}{N^l - u + 1} \times \left( F^l - \sum_{j=1}^{u-1} S^l_j \right). \tag{40}$$

C. Bit Allocation for a MB

In this section, we consider the complexity of a MB to allocate bits for the MB. The complexity reflects the difficulty of encoding the MB and indicates the quantity of bits generated from encoding it. This complexity is defined as

$$C^l_{(u,v)} = \frac{\eta \cdot \sqrt{\mathrm{VOD}^l_{(u,v)}} + \sqrt{\mathrm{MAD}^l_{(u,v)}}}{P(0)^l_{(u,v)} + 1} \tag{41}$$

where $P(0)$ is the probability that the transformed coefficients have zero values. In (41), $\mathrm{VOD}^l_{(u,v)}$, $\mathrm{MAD}^l_{(u,v)}$, and $P(0)^l_{(u,v)}$ are the averaged values of the VODs, MADs, and $P(0)$s of the 4×4 blocks at the $v$th MB in the $u$th frame, at the $l$th temporal level. VOD, MAD, and $P(0)$ are predicted from the data of the previous frame. $\eta$ is a constant that can be set to a value empirically determined from general video sequences. The assigned bits for a MB are computed as

$$B^l_{(u,v)} = \frac{C^l_{(u,v)}}{C^l_{(u,v)} + C^l_{\mathrm{avg}} \times \left( N_{\mathrm{MB}} - v - 1 \right)} \times \left( T^l_u - \sum_{k=0}^{v-1} M^l_{(u,k)} \right) \tag{42}$$

where $B^l_{(u,v)}$ denotes the bits allocated for the $v$th MB in the $u$th frame. $C^l_{\mathrm{avg}}$ denotes the average complexity of the previous MBs in the current frame. $M^l_{(u,k)}$ represents the bits generated from the $k$th MB in the $u$th frame. $N_{\mathrm{MB}}$ represents the number of MBs in a frame. $\sum_{k=0}^{v-1} M^l_{(u,k)}$ is the total bits generated from encoding all of the previous MBs.

V. Proposed Rate Control Scheme

An efficient rate control algorithm is necessary to generate the bits assigned by the bit allocation process described in Section IV. The bitrate is controlled by adjusting the QP. When the first frame in a hierarchical level is encoded, the initial QP is set as follows:

$$QP^0_{\mathrm{init}} = \begin{cases} 40, & \text{if } (L_1 > b/p) \\ 35, & \text{if } (L_1 \le b/p < L_2) \\ 30, & \text{if } (L_2 \le b/p < L_3) \\ 25, & \text{if } (L_3 \le b/p < L_4) \\ 20, & \text{if } (L_4 \le b/p < L_5) \\ 15, & \text{if } (L_5 \le b/p) \end{cases} \tag{43}$$

$$QP^l_{\mathrm{init}} = QP^0_{\mathrm{init}} + l + 1. \tag{44}$$

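The thresholds $L_1$ to $L_5$ in (43) depend on the picture resolution as follows:

$$L_1 = \begin{cases} 0.3, & \text{if } (H_{\mathrm{resol}} \le \mathrm{QCIF}) \\ 0.3, & \text{if } (\mathrm{QCIF} < H_{\mathrm{resol}} \le \mathrm{CIF}) \\ 0.6, & \text{if } (H_{\mathrm{resol}} > \mathrm{CIF}) \end{cases} \qquad L_2 = \begin{cases} 0.4, & \text{if } (H_{\mathrm{resol}} \le \mathrm{QCIF}) \\ 0.7, & \text{if } (\mathrm{QCIF} < H_{\mathrm{resol}} \le \mathrm{CIF}) \\ 1.0, & \text{if } (H_{\mathrm{resol}} > \mathrm{CIF}) \end{cases}$$

$$L_3 = \begin{cases} 0.5, & \text{if } (H_{\mathrm{resol}} \le \mathrm{QCIF}) \\ 1.0, & \text{if } (\mathrm{QCIF} < H_{\mathrm{resol}} \le \mathrm{CIF}) \\ 1.2, & \text{if } (H_{\mathrm{resol}} > \mathrm{CIF}) \end{cases} \qquad L_4 = \begin{cases} 0.6, & \text{if } (H_{\mathrm{resol}} \le \mathrm{QCIF}) \\ 1.4, & \text{if } (\mathrm{QCIF} < H_{\mathrm{resol}} \le \mathrm{CIF}) \\ 2.4, & \text{if } (H_{\mathrm{resol}} > \mathrm{CIF}) \end{cases}$$

$$L_5 = \begin{cases} 0.7, & \text{if } (H_{\mathrm{resol}} \le \mathrm{QCIF}) \\ 1.7, & \text{if } (\mathrm{QCIF} < H_{\mathrm{resol}} \le \mathrm{CIF}) \\ 3.0, & \text{if } (H_{\mathrm{resol}} > \mathrm{CIF}) \end{cases} \tag{45}$$

where $b/p$ is the target bits/pixel, and $QP^l_{\mathrm{init}}$ is the initial QP value at the $l$th hierarchical level. $H_{\mathrm{resol}}$ denotes the resolution of the picture. For the 0th level, the initial parameter $QP^0_{\mathrm{init}}$ is determined by (43). The $QP^l_{\mathrm{init}}$ values of the other levels are determined by (44). When the frame is not the first one in a level, the Q value is decided using the R-Q model proposed in (11). In (11), $\sigma$, $\xi_1$, and $\xi_2$ have to be fixed before the calculation. We propose three steps for determining these parameters. In the first step, $\xi_1$ and $\xi_2$ are calculated for a fixed $\sigma$ value, where the cost function is

$$E_1 = \sum_{i=0}^{N-1} \left[ S_i - \left\{ \xi_1 \cdot \sqrt{\sigma \cdot \frac{\sqrt{\mathrm{VOD}_i}}{Q_i} + \frac{\sqrt{\mathrm{MAD}_i}}{Q_i}} + \xi_2 \right\} \right]^2. \tag{46}$$

By using the partial differential method for $E_1$, the parameters $\xi_1$ and $\xi_2$ are calculated from the following operations:

$$Z_i = \sqrt{\sigma \cdot \frac{\sqrt{\mathrm{VOD}_i}}{Q_i} + \frac{\sqrt{\mathrm{MAD}_i}}{Q_i}} \tag{47}$$

$$\frac{\partial E_1}{\partial \xi_2} = -2 \times \sum_{i=0}^{N-1} \left[ S_i - \left\{ \xi_1 \times Z_i + \xi_2 \right\} \right] = 0, \qquad \xi_2 = \frac{\sum_{i=0}^{N-1} S_i - \xi_1 \cdot \sum_{i=0}^{N-1} Z_i}{N} \tag{48}$$

$$\frac{\partial E_1}{\partial \xi_1} = -2 \times \sum_{i=0}^{N-1} \left[ S_i - \left\{ \xi_1 \times Z_i + \xi_2 \right\} \right] \times Z_i = 0, \qquad \xi_1 = \frac{N \times \sum_{i=0}^{N-1} S_i \cdot Z_i - \left( \sum_{i=0}^{N-1} S_i \right) \cdot \left( \sum_{i=0}^{N-1} Z_i \right)}{N \times \sum_{i=0}^{N-1} Z_i^2 - \left( \sum_{i=0}^{N-1} Z_i \right)^2} \tag{49}$$

where $N$ is the number of previously encoded pictures in the current level. In the second step of the proposed scheme, $\sigma$ is calculated for fixed $\xi_1$ and $\xi_2$ values. To find $\sigma$, (11) is rewritten as

$$\left( \frac{S - \xi_2}{\xi_1} \right)^2 = \sigma \cdot \frac{\sqrt{\mathrm{VOD}}}{Q} + \frac{\sqrt{\mathrm{MAD}}}{Q}. \tag{50}$$

The cost function using (50) is

$$E_2 = \sum_{i=0}^{N-1} \left[ \left( \frac{S_i - \xi_2}{\xi_1} \right)^2 - \left( \sigma \cdot \frac{\sqrt{\mathrm{VOD}_i}}{Q_i} + \frac{\sqrt{\mathrm{MAD}_i}}{Q_i} \right) \right]^2. \tag{51}$$

Applying the partial differential method to (51) gives the following scheme for determining $\sigma$ for fixed $\xi_1$ and $\xi_2$:

$$\frac{\partial E_2}{\partial \sigma} = -2 \times \sum_{i=0}^{N-1} \left[ \left( \frac{S_i - \xi_2}{\xi_1} \right)^2 - \left( \sigma \cdot \frac{\sqrt{\mathrm{VOD}_i}}{Q_i} + \frac{\sqrt{\mathrm{MAD}_i}}{Q_i} \right) \right] \cdot \frac{\sqrt{\mathrm{VOD}_i}}{Q_i} = 0$$

$$\sigma = \frac{\sum_{i=0}^{N-1} \left( \dfrac{S_i - \xi_2}{\xi_1} \right)^2 \cdot \dfrac{\sqrt{\mathrm{VOD}_i}}{Q_i} - \sum_{i=0}^{N-1} \dfrac{\sqrt{\mathrm{VOD}_i} \cdot \sqrt{\mathrm{MAD}_i}}{Q_i^2}}{\sum_{i=0}^{N-1} \dfrac{\mathrm{VOD}_i}{Q_i^2}}. \tag{52}$$

Finally, in the third step, $\xi_1$ and $\xi_2$ are recalculated using (48) and (49) with the optimized $\sigma$ value to minimize the prediction error. After the parameters ($\xi_1$, $\xi_2$, and $\sigma$) have been calculated, Q is determined to control the generated bitrate. Q is determined by using an equation modified from (11):

$$Q = \frac{\sigma \cdot \sqrt{\mathrm{VOD}_{\mathrm{curr}}} + \sqrt{\mathrm{MAD}_{\mathrm{curr}}}}{\left( \dfrac{S - \xi_2}{\xi_1} \right)^2} \tag{53}$$

where $\mathrm{VOD}_{\mathrm{curr}}$ and $\mathrm{MAD}_{\mathrm{curr}}$ are the VOD and MAD of the current frame. The QP can be selected for a particular value of Q. The proposed R-Q model (53) is also applied to determine the Q of a MB. After a Q for a MB has been determined, $QP_{\mathrm{MB}}$ is modified as follows to prevent a rapid fluctuation in the quality of a frame:

$$QP_{\mathrm{MB}} = \begin{cases} QP_{\mathrm{frame}} - 2, & \text{if } (QP_{\mathrm{frame}} - 2 > QP_{\mathrm{MB}}) \\ QP_{\mathrm{frame}} + 2, & \text{if } (QP_{\mathrm{frame}} + 2 < QP_{\mathrm{MB}}) \\ QP_{\mathrm{MB}}, & \text{otherwise} \end{cases} \tag{54}$$

where $QP_{\mathrm{MB}}$ and $QP_{\mathrm{frame}}$ are the QP values determined for a MB and frame, respectively.

VI. Summary of the Proposed Schemes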
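As a compact illustration of the rate control steps of Section V, the following Python sketch implements the closed-form updates in (48), (49), and (52), the step-size rule in (53), and the QP clipping in (54). It is a minimal sketch under stated assumptions: the function names and the tuple layout `(bits, vod, mad, q)` are hypothetical, and the sample data are synthetic values generated from assumed parameters rather than measurements from an encoder.

```python
import math

def regressor(vod, mad, q, sigma):
    """Z_i of (47): square root of the cost sigma*sqrt(VOD)/Q + sqrt(MAD)/Q."""
    return math.sqrt(sigma * math.sqrt(vod) / q + math.sqrt(mad) / q)

def fit_xi(samples, sigma):
    """Steps 1 and 3: least-squares xi1 and xi2 for a fixed sigma, (48)-(49)."""
    z = [regressor(v, m, q, sigma) for _, v, m, q in samples]
    s = [sample[0] for sample in samples]
    n = len(samples)
    sum_z, sum_s = sum(z), sum(s)
    sum_zz = sum(zi * zi for zi in z)
    sum_sz = sum(si * zi for si, zi in zip(s, z))
    xi1 = (n * sum_sz - sum_s * sum_z) / (n * sum_zz - sum_z ** 2)
    xi2 = (sum_s - xi1 * sum_z) / n
    return xi1, xi2

def fit_sigma(samples, xi1, xi2):
    """Step 2: least-squares sigma for fixed xi1 and xi2, as in (52)."""
    num = den = 0.0
    for s, v, m, q in samples:
        lhs = ((s - xi2) / xi1) ** 2
        num += lhs * math.sqrt(v) / q - math.sqrt(v) * math.sqrt(m) / q ** 2
        den += v / q ** 2
    return num / den

def step_size(target_bits, vod, mad, sigma, xi1, xi2):
    """Quantization step size for a bit target, as in (53)."""
    return (sigma * math.sqrt(vod) + math.sqrt(mad)) / ((target_bits - xi2) / xi1) ** 2

def clip_mb_qp(qp_mb, qp_frame):
    """Keep the MB-level QP within +/-2 of the frame-level QP, as in (54)."""
    return max(qp_frame - 2, min(qp_frame + 2, qp_mb))

# Synthetic samples (bits, vod, mad, q) generated from assumed parameters.
TRUE_SIGMA, TRUE_XI1, TRUE_XI2 = 0.5, 2000.0, 300.0
stats = [(400.0, 9.0, 10.0), (625.0, 16.0, 16.0), (900.0, 25.0, 26.0), (256.0, 4.0, 40.0)]
samples = [(TRUE_XI1 * regressor(v, m, q, TRUE_SIGMA) + TRUE_XI2, v, m, q)
           for v, m, q in stats]
xi1, xi2 = fit_xi(samples, TRUE_SIGMA)
sigma = fit_sigma(samples, xi1, xi2)
```

On noise-free synthetic data the fitted parameters reproduce the assumed ones, and `step_size` returns the step size that was used to generate each sample. With real encoder statistics the three steps would be repeated as new pictures in the level are coded.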

The proposed overall algorithm is summarized in Fig. 4. The overall algorithm consists of bit allocation and rate control parts. The bit allocation procedure is performed consecutively for an intra-period, HGOP, temporal level, frame, and MB. The bit allocation module has been described in (12), (13), (36), (39), and (42) in Section IV. After the number of bits for a frame has been assigned, the Q for the frame is determined by (53), where S is substituted by the target bitrate Tul for a

Fig. 4. Flowchart of the proposed bit allocation and rate control algorithms.

frame. The bit quantity for a MB is assigned by (42), where the complexities of MBs are used. The Q value for a MB is calculated by (53), where the $B^l_{(u,v)}$ bits assigned to the MB are substituted for $S$. The algorithm is applied to all HGOPs, temporal levels, frames, and MBs iteratively until all of the data have been encoded.

This paper proposed a new R-Q model in (11) of Section III, where the VOD and MAD are utilized, while conventional schemes use only the MAD. This model is used to allocate the bitrate for a temporal level and a MB in (36) and (42) of Section IV, respectively. The model is also used to control the generated bitrate in (53) of Section V. The proposed schemes (bit allocation and rate control) outperform the conventional methods because the proposed R-Q model is more accurate than the conventional models, as shown in Table I. In addition, the proposed scheme uses the optimized weighting factor $w^i$ and the PSNR sensitivity $J^l_u$ in (36) and (39) to obtain coding gain.

VII. Extension to SVC

The algorithms proposed in the previous sections can be extended to SVC (H.264/AVC Annex G). SVC has been developed as an extension of H.264/AVC, and many tools have been adopted in SVC [3], where the sequence structure consists of base and enhancement layers. The base layer encoded by the

SVC encoder can be decoded by an H.264/AVC decoder. In order to improve the coding efficiency of the enhancement layer, an inter-layer prediction mode using the coding information of the base layer is employed.

A. Conventional Rate Control Schemes for SVC

Some research [16], [17] has been conducted to control the bitrate in the SVC codec, where the bitrates of the base layer are controlled by conventional schemes [5], [12] (proposed for the JM reference software), since the base layer of SVC is compatible with H.264/AVC. In [16], a bit allocation algorithm for the hierarchical B picture structure was proposed, along with an improved MAD prediction scheme. When the base layer is encoded, the MAD is predicted from the previous pictures in the base layer. This method is called the temporal prediction of the MAD and is used in the H.264/AVC reference software [13]. On the other hand, when the enhancement layer is encoded, since the base layer has been encoded, the data in the base layer can be used to predict the MAD of the current picture in the enhancement layer, which is called spatial prediction. In [16], to efficiently predict the MAD of the picture in the enhancement layer, the temporal and spatial predictions are adaptively selected. In [17], the bitrate of each temporal layer was allocated using the fixed weighting factors assigned to levels in the hierarchical structure. After the bits were allocated, the QP was determined by the R-Q model used in [5].

B. Proposed Prediction Scheme for Variables

Since inter-layer prediction of H.264/AVC Annex G can be applied to each MB adaptively in the enhancement layer, some MBs are encoded with inter-layer prediction and others are encoded without inter-layer prediction. Therefore, the probability of inter-layer prediction should be considered when the MAD of the enhancement layer is predicted. In our paper, to predict the MAD of the enhancement layer, we propose a combined form with both the temporally and spatially predicted MADs, where the combination ratio is set to the inter-layer prediction probability $\mathrm{Prob}_{\text{inter-layer}}$

$$\mathrm{MAD}_{\mathrm{pred}} = \left( 1 - \mathrm{Prob}_{\text{inter-layer}} \right) \cdot \mathrm{MAD}_{\mathrm{temp}} + \mathrm{Prob}_{\text{inter-layer}} \cdot \mathrm{MAD}_{\mathrm{spat}} \tag{55}$$

where $\mathrm{MAD}_{\mathrm{pred}}$ denotes the MAD predicted for the current frame in the enhancement layer. $\mathrm{MAD}_{\mathrm{temp}}$ and $\mathrm{MAD}_{\mathrm{spat}}$ denote the temporally and spatially predicted MADs, respectively. $\mathrm{Prob}_{\text{inter-layer}}$ is the probability that the MBs in the current frame are encoded with the inter-layer prediction mode. Since $\mathrm{Prob}_{\text{inter-layer}}$ is unknown before the current frame has been encoded, the probability of the previous picture is used. The method of (55) can be used to predict other variables, VOD and $P(0)$ in (11) and (41), respectively. Since the first frame is encoded with the initial QP, the R-Q model is not needed for the first frame. Consequently, the initial values of the variables [e.g., MAD, VOD, $P(0)$] are not needed either.

Tables IV and V¹ show the averaged prediction errors when the conventional and proposed schemes are applied to predict the MAD and VOD of the temporal level, frame, and MB of the test sequences Football, Crew, Soccer, and Tempete. The averaged prediction errors for the MAD and VOD are defined as

$$e_{\mathrm{MAD}} = \frac{1}{T} \sum_{i=1}^{T} \left| \mathrm{MAD}^i_{\mathrm{act}} - \mathrm{MAD}^i_{\mathrm{pred}} \right| \tag{56}$$

$$e_{\mathrm{VOD}} = \frac{1}{T} \sum_{i=1}^{T} \left| \mathrm{VOD}^i_{\mathrm{act}} - \mathrm{VOD}^i_{\mathrm{pred}} \right| \tag{57}$$

where $e_{\mathrm{MAD}}$ and $e_{\mathrm{VOD}}$ denote the averaged prediction errors, and $T$ denotes the number of coded data (e.g., temporal levels, frames, and MBs). $\mathrm{MAD}^i_{\mathrm{act}}$, $\mathrm{MAD}^i_{\mathrm{pred}}$, $\mathrm{VOD}^i_{\mathrm{act}}$, and $\mathrm{VOD}^i_{\mathrm{pred}}$ are the actual and predicted MADs and VODs for the $i$th data, respectively. In these tables, the averaged prediction errors of the proposed scheme are smaller than those of the conventional schemes [5], [16] in most cases.

¹ In Tables IV and V, HGOP size is set to 4. Resolutions of base and enhancement layers are quarter common intermediate format (QCIF) and common intermediate format (CIF), respectively. Option for inter-layer prediction of joint scalable video model (JSVM) is set to “Adaptive.” QPbase and QPenh are set to 32 and 28, respectively.

TABLE IV
Averaged Prediction Errors $e_{\mathrm{MAD}}$ When the Proposed MAD Prediction and Conventional Schemes are Applied to Base and Enhancement Layers in SVC Codec

(a) Temporal Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    1.0047      0.6100     0.5670     0.5563
Crew        0.6451      0.3991     0.3969     0.3273
Soccer      0.2763      0.2881     0.2981     0.2772
Tempete     0.3197      0.1445     0.1429     0.1294

(b) Frame Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    0.8378      0.5236     0.4879     0.4751
Crew        0.6972      0.4026     0.3696     0.3291
Soccer      0.2341      0.2462     0.2592     0.2378
Tempete     0.2923      0.1380     0.1362     0.1253

(c) MB Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    2.8723      1.8228     1.6323     1.6261
Crew        1.2781      1.0111     0.9402     0.9261
Soccer      1.7175      1.2674     1.1929     1.1553
Tempete     1.6154      1.3868     1.2808     1.2407

TABLE V
Averaged Prediction Errors $e_{\mathrm{VOD}}$ When the Proposed VOD Prediction and Conventional Schemes are Applied to Base and Enhancement Layers in SVC Codec

(a) Temporal Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    10.7326     11.9954    11.0297    10.7326
Crew        2.7594      3.1535     3.0546     2.7594
Soccer      4.4443      4.2357     4.5610     4.4443
Tempete     2.9250      3.1334     3.1107     2.9250

(b) Frame Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    18.9245     10.0545    9.0958     9.0289
Crew        5.1571      2.8626     2.7258     2.4762
Soccer      5.1183      3.6927     3.9699     3.8462
Tempete     7.1028      2.7999     2.7801     2.6423

(c) MB Level
                        Enhancement
Sequence    Base        [5]        [16]       Proposed
Football    70.8352     38.6286    33.9128    33.3922
Crew        13.7526     10.2688    9.4488     9.3042
Soccer      37.0749     22.6264    21.3121    20.5147
Tempete     35.1141     25.0208    22.7753    22.0406

TABLE VI
Error Ratio $e_r = \sum_i |T_i - S_i| / \sum_i T_i$ When the Proposed and Conventional Rate Control Algorithms are Applied to H.264/AVC Codec

                                    [7]
Sequence    HGOP Size    [5]        RC MODE 2    RC MODE 3    Proposed Rate Control
Soccer      4            0.3599     0.3959       0.3115       0.1593
Soccer      8            0.5656     0.6637       0.8988       0.2118
City        4            0.6349     0.7265       0.7040       0.1965
City        8            0.8708     0.8430       1.0950       0.2066
Ice         4            0.3567     0.3651       0.7378       0.1586
Ice         8            0.7377     0.6747       1.4927       0.2413
Crew        4            0.5966     0.6388       0.6288       0.2430
Crew        8            0.8852     1.0581       0.9334       0.3080
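The combined predictor in (55) and the averaged prediction error in (56) and (57) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up input values; in practice the inter-layer probability would be taken from the previously coded picture, as described above.

```python
def combined_prediction(temporal, spatial, prob_inter_layer):
    """Blend of temporal and spatial predictions, as in (55)."""
    return (1.0 - prob_inter_layer) * temporal + prob_inter_layer * spatial

def mean_abs_error(actual, predicted):
    """Averaged prediction error, as in (56) and (57)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values: a quarter of the MBs used inter-layer prediction.
mad_pred = combined_prediction(temporal=4.0, spatial=2.0, prob_inter_layer=0.25)
err = mean_abs_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
```

When the probability is 0 the predictor reduces to the temporal prediction, and when it is 1 it reduces to the spatial prediction, so the two conventional predictors are the limiting cases of (55).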

VIII. Simulation Results

In this section, we show simulation results to evaluate the performance of the proposed algorithm. The simulation was implemented with the reference software JM 16.1 [13]. The test sequences were Soccer, City, Ice, Crew, Football, Flower, and Tempete, whose sizes are QCIF or CIF. The HGOP size was set to 4 or 8. Context adaptive binary arithmetic coding was used as the entropy coder and two reference frames were used. We compared the proposed schemes with the conventional methods described in [5] and [7]. Among the schemes proposed in [7], RC MODE 2 and RC MODE 3 were used since these methods consider the hierarchical picture structure. We tested the algorithms for as many cases as possible.

A. Results From Rate Control Using the Proposed R-Q Model

In this section, we present the simulation results for the rate control scheme proposed in Section V. This technique is used to control the QP so that the generated bitrate is equal to the target bitrate. The target bitrates of I, P, B1, B2, and B3 were set to 10, 7, 5, 3, and 2 kbits, respectively. Table VI shows the error ratio between the target and the generated bits. The error ratio is defined as

$$e_r = \frac{\sum_{i=1}^{N_{\mathrm{Data}}} \left| T_i - S_i \right|}{\sum_{i=1}^{N_{\mathrm{Data}}} T_i} \tag{58}$$

where $T_i$ and $S_i$ are the target bits and the generated bits in the $i$th picture, respectively, and $N_{\mathrm{Data}}$ is the number of coded pictures. Note that the R-Q models in [5] and [7] were derived based on only the characteristics of P frames. The conventional scheme in [5] cannot control the bitrate of B pictures, since the method in [5] does not consider B frames. RC MODE 2 of [7] sets the QP for a B picture to a value related to the QPs of I and P pictures using offsets. RC MODE 3 of [7] assigns the target bits for B and I pictures, but decides the QP based on an R-Q model designed for P pictures, not for B and I pictures. In contrast to the conventional schemes [5], [7], the proposed R-Q model is derived to control the generated bits for all frame types, as shown in Table I and Fig. 2. As a result, the proposed rate control scheme controls the generated bitrates much more precisely than the conventional schemes, as seen in Table VI.

B. Results of Bit Allocation and Rate Control Schemes

In this section, the performance of the algorithm described in Section VI is evaluated. We compare the proposed scheme with conventional schemes [5], [7]. Table VII shows the bitrate and PSNRs for the various schemes where the intra period is set to 32. As observed in Table VII, the proposed scheme is more

Fig. 5. R-D curves when each step of bit allocation algorithm proposed in Section IV is used separately. Test sequence is Tempete.

efficient at controlling the generated bits and maintaining high video quality than the conventional schemes under various circumstances. The proposed scheme has coding gain over the conventional schemes for the following reasons. In the proposed algorithm, the weighting factors for temporal levels are optimized to assign target bits to each level as described in Section IV-A, while the weighting factors of the conventional schemes are set to fixed values or are selected by the user. Moreover, the proposed scheme considers the coding efficiency according to the increments of the assigned bits, while the conventional schemes do not.

The bit allocation scheme proposed in Section IV consists of three steps for the temporal, frame, and MB levels. The coding gains resulting from these steps are shown in Fig. 5, where the algorithms for temporal, frame, and MB are denoted by “A,” “B,” and “C,” respectively. “A” denotes the bit allocation algorithm for the temporal level described in Section IV-A. When bitrates are assigned by “A,” bit quantities for the frame and MB are assigned uniformly. When “A + B” is used for bit allocation, the bit quantities for the temporal and frame levels are assigned by the schemes proposed in Sections IV-A and IV-B, respectively, while those for the MB are set uniformly. “IWF” denotes a scheme where bit quantities for the temporal level are assigned by using the initial weighting factors described in Table III, while the target bits for the frame and MB are set uniformly. As observed in Fig. 5, each step of the bit allocation scheme proposed in Section IV gives coding gain independently or cooperatively. This means that all of the steps of the proposed algorithm are useful for allocating the target bits for the temporal, frame, and MB levels.

In order to check the fluctuation of the generated bits per intra-period, Fig. 6 shows the bitrates generated over each intra-period. From these results, we conclude that our proposed scheme is able to control the bitrate over an intra-period without fluctuation.

C. Results From Schemes Extended to SVC

To verify the effectiveness of the schemes extended to SVC, a simulation was performed with JSVM 9.12 [19], [20]. Table VIII shows the results for the circumstance where


Fig. 6. Bitrate per intra period when the proposed and conventional algorithms are used. (a) Target bitrate is 256 kb/s. (b) Target bitrate is 96 kb/s.

TABLE VII
Performances of Bit Allocation and Rate Control Algorithms in Terms of Bitrate and PSNR Where the Schemes are Applied to H.264/AVC

Sequence         HGOP  Target   [5]               [7] RC MODE 2     [7] RC MODE 3     Proposed Scheme
                 Size  Bitrate  Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR
                       (kb/s)   (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)
Football (QCIF)  4     256      296.93   31.18    266.36   31.99    258.65   31.88    255.17   32.02
Football (QCIF)  4     384      408.15   34.04    396.78   34.57    388.47   34.47    383.00   34.42
Crew (QCIF)      8     256      327.12   35.72    264.91   35.42    260.05   35.22    255.70   35.40
Crew (QCIF)      8     384      432.34   37.45    396.14   37.49    393.74   36.63    383.56   37.82
Soccer (QCIF)    8     128      161.38   33.58    132.40   32.59    130.04   32.62    127.53   32.62
Soccer (QCIF)    8     256      289.60   36.69    260.90   36.66    258.97   36.04    255.87   36.59
Ice (CIF)        8     256      287.93   37.00    282.33   36.90    339.33   35.10    249.69   36.00
Ice (CIF)        8     512      583.49   40.26    597.79   40.28    598.88   38.44    498.26   39.94
Flower (CIF)     16    2000     4016.52  40.20    2818.53  37.05    2551.58  33.81    1962.48  33.65
Flower (CIF)     16    3000     4347.42  41.12    4062.97  40.61    3206.99  36.14    2944.56  36.53
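The rate control accuracy reported in Table VII can be condensed into a single figure per scheme. As a minimal sketch, the Python code below computes the mean relative bitrate mismatch |R - R_target| / R_target over the ten rows of the table. The metric and the function name are assumptions introduced here for illustration; the table itself reports only the raw bitrates and PSNRs.

```python
# Target and generated bitrates (kb/s), transcribed from Table VII.
TARGET = [256, 384, 256, 384, 128, 256, 256, 512, 2000, 3000]
GENERATED = {
    "[5]": [296.93, 408.15, 327.12, 432.34, 161.38,
            289.60, 287.93, 583.49, 4016.52, 4347.42],
    "[7] RC mode 2": [266.36, 396.78, 264.91, 396.14, 132.40,
                      260.90, 282.33, 597.79, 2818.53, 4062.97],
    "[7] RC mode 3": [258.65, 388.47, 260.05, 393.74, 130.04,
                      258.97, 339.33, 598.88, 2551.58, 3206.99],
    "Proposed": [255.17, 383.00, 255.70, 383.56, 127.53,
                 255.87, 249.69, 498.26, 1962.48, 2944.56],
}


def mean_abs_mismatch(target, generated):
    """Mean of |R - R_target| / R_target over all rows, in percent (assumed metric)."""
    errors = [abs(r - t) / t for t, r in zip(target, generated)]
    return 100.0 * sum(errors) / len(errors)


for scheme, rates in GENERATED.items():
    print(f"{scheme:>14}: {mean_abs_mismatch(TARGET, rates):6.2f} %")
```

With the Table VII values, the proposed scheme yields the smallest mean mismatch of the four schemes.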

Fig. 7. R-D curves when the bit allocation and rate control algorithms are applied to JSVM 9.12. The test sequence is Crew. MGS and temporal scalabilities are used. The inter-layer prediction option of JSVM is set to “Adaptive.” (a) Base layer. (b) Enhancement layer.
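Table VIII(b) can be checked layer by layer for rows that miss their target bitrate. The Python sketch below flags every row whose generated bitrate deviates from the target by more than a tolerance. The 5% tolerance, the Row type, and the off_target helper are assumptions made for this illustration; only the bitrates are taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Row:
    layer: str
    target: float     # target bitrate, kb/s
    generated: float  # generated bitrate, kb/s


def off_target(rows, tolerance=0.05):
    """Return the rows whose generated bitrate misses the target by more than `tolerance`."""
    return [r for r in rows if abs(r.generated - r.target) / r.target > tolerance]


# Bitrates (kb/s) of scheme [5] and of the proposed scheme, Table VIII(b).
LAYERS = ["Base", "Enh."] * 4
TARGETS = [96, 128, 128, 160, 192, 256, 256, 320]
REF5 = [93.13, 143.93, 119.22, 214.65, 189.48, 398.98, 237.30, 503.82]
PROPOSED = [96.11, 125.58, 126.34, 157.46, 188.11, 250.57, 251.68, 313.23]

rows_ref5 = [Row(l, t, g) for l, t, g in zip(LAYERS, TARGETS, REF5)]
rows_proposed = [Row(l, t, g) for l, t, g in zip(LAYERS, TARGETS, PROPOSED)]

for name, rows in (("[5]", rows_ref5), ("Proposed", rows_proposed)):
    missed = off_target(rows)
    print(name, [(r.layer, r.target, r.generated) for r in missed])
```

Under this assumed tolerance, six of the eight rows of [5] are flagged, while none of the rows of the proposed scheme are.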


spatial or SNR [medium grain scalability (MGS)] scalabilities were used. We applied the conventional schemes [5], [16], [17] to each layer independently. In the simulations for [16] and [17], the weighting factors for the temporal levels were set to the default values proposed in [16] and [17]. The option for the inter-layer prediction of JSVM was set to “Adaptive,” the intra period was set to 32, two layers {QCIF and CIF} or {QCIF and QCIF} were used, and the frame rates of the sequences were “30 Hz” for all layers. For the proposed scheme, the algorithms proposed in Sections III–V were applied to the base layer. For the enhancement layer of the SVC encoder, the algorithms that used the variable prediction scheme described in Section VII-B were used. Some of the conventional schemes could not achieve the target bitrate, even when their PSNRs were high. In Fig. 7, the performances of the proposed schemes are compared with those of the conventional schemes in terms of rate-distortion performance. The simulation conditions of Fig. 7 are the same as those of Table VIII(b). In most cases, the proposed scheme is more efficient than the conventional schemes. From these results, we verify that the proposed schemes can be extended to the SVC codec.

TABLE VIII
Performances of Bit Allocation and Rate Control Algorithms in Terms of Bitrate and PSNR When the Proposed Schemes are Applied to SVC

(a) Spatial and Temporal Scalabilities are Used

Sequence          Layer        Target   [5]               [16]              [17]              Proposed Scheme
                               Bitrate  Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR
                               (kb/s)   (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)
Football (HGOP 4) Base (QCIF)  128      127.18   26.66    127.11   26.97    125.51   27.24    127.86   27.46
Football (HGOP 4) Enh. (CIF)   256      286.75   28.14    255.69   27.33    257.84   27.02    258.59   27.70
Football (HGOP 4) Base (QCIF)  256      252.24   30.91    255.09   30.52    281.47   31.71    253.61   31.05
Football (HGOP 4) Enh. (CIF)   512      511.53   31.83    510.30   31.08    615.40   32.49    511.21   31.60
Flower (HGOP 16)  Base (QCIF)  192      126.23   30.03    191.38   28.39    123.65   30.38    183.34   32.65
Flower (HGOP 16)  Enh. (CIF)   768      543.55   28.87    766.57   27.51    499.10   29.00    727.20   31.86
Flower (HGOP 16)  Base (QCIF)  256      213.63   33.25    257.16   30.54    192.38   33.38    245.59   35.02
Flower (HGOP 16)  Enh. (CIF)   1024     1008.47  32.42    1020.39  29.47    873.31   32.23    973.57   33.41

(b) SNR (MGS) and Temporal Scalabilities are Used

Sequence          Layer        Target   [5]               [16]              [17]              Proposed Scheme
                               Bitrate  Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR     Bitrate  PSNR
                               (kb/s)   (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)     (kb/s)   (dB)
Crew (HGOP 8)     Base (QCIF)  96       93.13    28.61    95.48    28.87    96.63    29.75    96.11    30.09
Crew (HGOP 8)     Enh. (QCIF)  128      143.93   34.13    127.38   32.78    148.47   34.56    125.58   34.62
Crew (HGOP 8)     Base (QCIF)  128      119.22   30.16    127.53   29.57    120.96   31.06    126.34   32.09
Crew (HGOP 8)     Enh. (QCIF)  160      214.65   35.37    159.36   33.71    199.87   35.21    157.46   35.99
Soccer (HGOP 8)   Base (QCIF)  192      189.48   33.97    191.33   32.69    200.52   34.60    188.11   34.72
Soccer (HGOP 8)   Enh. (QCIF)  256      398.98   39.85    255.80   36.41    394.87   40.05    250.57   39.57
Soccer (HGOP 8)   Base (QCIF)  256      237.30   35.73    255.20   34.52    237.32   35.90    251.68   36.30
Soccer (HGOP 8)   Enh. (QCIF)  320      503.82   41.99    318.88   38.25    437.16   41.40    313.23   41.07

IX. Conclusion

We proposed efficient bit allocation and rate control algorithms for hierarchical video coding. A new R-Q model was described, and the dependency between pictures and the sensitivity of each picture were utilized to allocate the bitrate optimally. The experimental results showed that the proposed schemes were more accurate at controlling the generated bitrate than the conventional schemes. We could observe from the simulation

results that conventional schemes could not efficiently control the bitrate in the hierarchical B-picture structure. The proposed bit allocation scheme could assign the target bits efficiently, since the proposed scheme considers many factors, such as the temporal level, frame sensitivity, and MB complexity.

Acknowledgment

The authors would like to thank the reviewers for their constructive and valuable comments on this paper.

References

[1] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10, Doc. E32768, Nov. 2007.
[2] H. Schwarz, D. Marpe, and T. Wiegand, “Analysis of hierarchical B-pictures and MCTF,” in Proc. ICME, Jul. 2006, pp. 1929–1932.
[3] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 9, pp. 1103–1120, Sep. 2007.
[4] H. C. Huang, W. H. Peng, T. Chiang, and H. M. Hang, “Advances in the scalable amendment of H.264/AVC,” IEEE Commun. Mag., vol. 45, no. 1, pp. 68–76, Jan. 2007.
[5] Adaptive Basic Unit Layer Rate Control for JVT, document JVT-G012.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Mar. 2003.
[6] S. C. Lim, H. R. Na, and Y. L. Lee, “Rate control based on linear regression for H.264/MPEG-4 AVC,” Signal Process.: Image Commun., vol. 22, no. 1, pp. 39–58, Jan. 2007.
[7] Rate Control Reorganization in the Joint Model (JM) Reference Software, document JVT-W042.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Apr. 2007.
[8] N. Karmaci, Y. Altunbasak, and R. M. Mersereau, “Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 8, pp. 994–1006, Aug. 2005.
[9] Rate Control on JVT Standard, document JVT-D030.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jul. 2002.


[10] K. Sühring. H.264/AVC Reference Software (JM 11.0) [Online]. Available: http://iphome.hhi.de/suehring/tml/download/old_jm/
[11] B. Xie and W. Zeng, “A sequence-based rate control framework for consistent quality real-time video,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 56–71, Jan. 2006.
[12] Y. Liu, Z. G. Li, and Y. C. Soh, “A novel rate control scheme for low delay video communication of H.264/AVC standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 1, pp. 68–78, Jan. 2007.
[13] K. Sühring. H.264/AVC Reference Software (JM 16.1) [Online]. Available: http://iphome.hhi.de/suehring/tml/download/
[14] J. L. Devore and N. R. Farnum, Applied Statistics for Engineers and Scientists. New York: Duxbury, 1999.
[15] D. K. Kwon, M. Y. Shen, and C. C. J. Kuo, “Rate control for H.264 video with enhanced rate and distortion models,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 517–529, May 2007.
[16] Y. Liu and Z. G. Li, “Rate control of H.264/AVC scalable extension,” IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 116–121, Jan. 2008.
[17] Rate Control for the Joint Scalable Video Model, document JVT-W043.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Apr. 2007.
[18] G. Arfken, “The method of steepest descents,” in Mathematical Methods for Physicists, 3rd ed. Orlando, FL: Academic, 1985, sec. 7.4, pp. 428–436.
[19] JSVM Software, document JVT-Z203.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jan. 2008.
[20] Draft Reference Software for SVC, document JVT-Z211.doc, Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, Jan. 2008.

Chan-Won Seo (S’09) was born in Incheon, Korea, on March 23, 1982. He received the B.S. and M.S. degrees from the Department of Information and Communication Engineering, Sejong University, Seoul, Korea, in 2007 and 2009, respectively. He is currently pursuing the Ph.D. degree at the same university. His current research interests include video coding, scalable video coding, and future video coding.

Jung Won Kang received the B.S. and M.S. degrees in electrical engineering from Hankuk Aviation University, Seoul, Korea, in 1993 and 1995, respectively. She received the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, in 2003. Since 2003, she has been a Senior Research Staff Member with the Broadcasting Media Research Group of the Electronics and Telecommunications Research Institute, Daejeon, Korea. Her current research interests include video signal processing, video coding, and video adaptation.


Jong-Ki Han was born in Seoul, Korea, on September 5, 1968. He received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Taejon, Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2001, he was a Technical Staff Member with the Corporate Research and Development Center, Samsung Electronics Company, Suwon, South Korea. He is currently an Associate Professor with the Department of Information and Communications Engineering, Sejong University, Seoul, Korea. His current research interests include image and audio signal compression, transcoding, and very large scale integrated signal processing.

Truong Q. Nguyen (F’05) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, in 1985, 1986, and 1989, respectively. He is currently a Professor with the Department of Electrical and Computer Engineering, University of California, San Diego. He is the coauthor (with Prof. G. Strang) of a popular textbook, Wavelets & Filter Banks (Cambridge, MA: Wellesley-Cambridge, 1997), and the author of several MATLAB-based toolboxes on image compression, electrocardiogram compression, and filter bank design. He has over 300 publications. His current research interests include video processing algorithms and their efficient implementation. He received the IEEE Transactions on Signal Processing Paper Award (image and multidimensional processing area) for the paper he co-wrote with Prof. P. P. Vaidyanathan on linear-phase perfect-reconstruction filter banks in 1992. He received the NSF Career Award in 1995 and is currently the Series Editor of Digital Signal Processing for Academic Press. He served as an Associate Editor for the IEEE Transactions on Signal Processing from 1994 to 1996, for the IEEE Signal Processing Letters from 2001 to 2003, for the IEEE Transactions on Circuits and Systems from 1996 to 1997 and 2001 to 2004, and for the IEEE Transactions on Image Processing from 2004 to 2005.