AN OPTIMAL PACKETIZATION SCHEME FOR FINE GRANULARITY SCALABLE BITSTREAM

Hua Cai (1), Guobin Shen (2), Zixiang Xiong (3), Shipeng Li (2), and Bing Zeng (1)

(1) Department of Electrical and Electronic Engineering, The Hong Kong University of Science and Technology
(2) Internet Media Group, Microsoft Research China
(3) Department of Electrical Engineering, Texas A&M University

ABSTRACT

This paper addresses the problem of optimal packetization for Fine Granularity Scalability (FGS) bit streams. A very general performance metric for streaming FGS bit streams over packet erasure networks is first defined and analyzed. Then three packetization strategies, namely the baseline packetization, the binary-tree packetization and the optimal packetization, are presented, justified and compared. Finally, experimental results are presented to demonstrate the effectiveness of the packetization schemes.

1. INTRODUCTION

Streaming multimedia content over the Internet has become increasingly popular in recent years, partly due to the extraordinary presentation capability of multimedia data and partly due to the ever wider deployment of broadband networking services. However, network heterogeneity and competition among traffic flows make the available bandwidth fluctuate for multimedia streaming applications. Furthermore, the delivery is not error-free, due to the best-effort nature of the current Internet. In order to provide the best possible quality of service (QoS) to end users, two kinds of effort have been made: one at the source coding stage and the other at the delivery stage.

Previously, layered coding [1] was proposed to combat the heterogeneity of networks. However, such schemes turn out to be inefficient when the bandwidth fluctuates. More recently, the Fine Granularity Scalability (FGS) coding scheme [2] was proposed. An FGS bit stream can be arbitrarily truncated and each truncated version is fully decodable. Therefore, it provides much better bandwidth adaptation capability. As a result, it was adopted in the latest video coding standard, MPEG-4 [3].

Despite the scalability provided by source coding schemes such as FGS, different delivery mechanisms still make a big difference, given the current state of the Internet. Much research has been carried out on this aspect.
For example, the optimal packetization problem of an embedded bit stream produced by a wavelet coding technique was studied in [4]. An optimal method to transport packetized media over a packet erasure channel such as the Internet was proposed by Philip Chou et al. [5]. These works provide a very general framework that takes into consideration almost all the factors that are critical for continuous playback of a multimedia stream, such as packet dependency, time constraints and error protection. It was shown that the dependency among packets plays an important role in the overall performance: the less dependency among packets, the better the performance.

In this paper, we present an optimal packetization method for delivering FGS bit streams over a lossy packet-switched network. We have chosen the FGS coding scheme because of its intrinsic bandwidth adaptation ability. Our goal is simple: packetize the bit stream such that the dependency among the resultant packets is minimized. To this end, we propose three packetization strategies, namely the baseline packetization strategy, the binary-tree packetization strategy and the optimal packetization strategy. The baseline packetization strategy is simple and straightforward, but it introduces heavy dependency among packets. The binary-tree packetization strategy greatly reduces the dependency, and the optimal packetization strategy eliminates it completely.

The rest of the paper is organized as follows. We first define the performance metric of a packetization scheme in Section 2. In Section 3, the three packetization strategies are elaborated and compared. Experimental results are shown in Section 4. Finally, we draw some conclusions and outline our future work in Section 5.

Note: This research was fully carried out at Microsoft Research China.

2. THE PERFORMANCE METRIC

Since the available bandwidth is time-varying, it is highly desirable to have a scalable video coding scheme that can finely adapt to the available bandwidth so as to achieve the highest bandwidth efficiency. The FGS coding scheme is such a technique: the encoder generates two bit streams, the base layer and the enhancement layer. The base layer is coded by the traditional motion-compensated DCT transform, the same as in other layered coding techniques. The base layer is usually of low quality and is very thin so as to fit in typical small bandwidths.
The residue between the original DCT coefficients and the dequantized base layer DCT coefficients forms the enhancement bit stream and is coded with the bit-plane coding technique. Bit-plane coding yields an embedded bit stream and achieves the desired fine granularity scalability. This radically distinguishes FGS from ordinary layered coding techniques. The dependency among bit planes is very strong: a bit plane is decodable only if all its ancestors, i.e., the lower bit planes that it depends on, have been received and successfully decoded.

In this paper, we define the performance metric of streaming FGS bit streams over packet erasure networks as follows:

$$J_1 = \sum_{(f,l,i) \in I} \Delta D(f,l,i) \times \bigl(1 - p_e(f,l,i)\bigr) \times \prod_{m \mapsto (f,l,i)} \bigl(1 - p_e(m)\bigr) \qquad (1)$$

where the term $(f,l,i)$ denotes the bit stream segment for the $f$th frame, $l$th bit plane and $i$th macroblock (MB). $\Delta D(f,l,i)$ is the contribution, i.e., the distortion reduction that is achieved if $(f,l,i)$ is successfully decoded. $p_e(f,l,i)$ and $p_e(m)$ are the packet loss probabilities of the packet in which the MB segment $(f,l,i)$ resides and of the $m$th packet, respectively. The bit stream to packet mapping, $A \mapsto B$, stands for the dependency that packet $A$ must be received in order to decode the received bit stream segment $B$. The selected MB segment set, $I$, contains all bit stream segments that will be transmitted in the current time slot (bandwidth adjusting interval). The transmission rate of $I$ should satisfy

$$\sum_{(f,l,i) \in I} \Delta R(f,l,i) \le B \cdot \tau - R_{BL} - R_{ARQ} - R_{FEC}$$

where $B$ is the current estimated bandwidth, $\tau$ is the time slot length and $B \cdot \tau$ is the estimated available rate. $R_{BL}$ is the bit rate of the base layer bit stream. $R_{ARQ}$ and $R_{FEC}$ are, respectively, the rates for retransmission and error protection, if adopted. Obviously, an optimal packetization scheme is obtained when $J_1$ is maximized.

Eqn. (1) is a very general performance metric. It accounts for the dependency among bit planes in the enhancement layer bit stream, the dependency between the base layer and enhancement layer bit streams, and also the error protection. The influence of error protection is reflected through $p_e(\cdot)$. If unequal error protection (UEP) is adopted, the $p_e$'s will differ from packet to packet. If equal error protection (EEP) or no error protection is adopted, the $p_e$'s will be the same for all packets.

3. THREE PACKETIZATION STRATEGIES

The performance metric presented above indicates that the streaming performance is very sensitive to dependencies among packets. This observation guides our effort to find the packetization scheme that results in the smallest packet dependency. Since the enhancement bit stream depends on the base layer bit stream, which is usually very thin, we assume in this paper that the base layer bit stream is transmitted correctly. Furthermore, for the sake of simplicity, no error protection is applied. In FGS, the macroblock is the minimum coding unit, as can be observed from Eqn. (1), and, therefore, we also adopt it as the minimum manipulation unit throughout the paper.

3.1. Baseline Packetization

The baseline packetization strategy generates packets in a simple and straightforward way. It scans and groups the bit stream segments from the segment set $I$ in raster-scan order, from the most significant bit plane to the least significant one and from left to right within each bit plane, as illustrated in Fig. 1-(a). When the pre-determined packet length is reached, a packet is formed. The remaining bit stream is packetized into new packets.
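As a rough illustration of how Eqn. (1) penalizes dependency, the following Python sketch evaluates $J_1$ for a raster-scan packetization of a single frame. It is only a sketch under stated assumptions: the `Segment` class, the function names and the toy rates and gains are hypothetical, every packet has the same loss probability, a segment is never split across packets, and a segment is assumed to depend only on the lower bit planes of the same macroblock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    plane: int    # bit plane index l (1 = most significant)
    mb: int       # macroblock index i
    rate: int     # Delta R in bits (assumed known from the encoder)
    gain: float   # Delta D, distortion reduction if the segment is decoded


def baseline_packetize(segments, packet_bits):
    """Raster-scan packetization: bit plane by bit plane, left to right.

    Simplification (assumption): a segment is never split across packets.
    Returns a dict mapping (plane, mb) to a packet index.
    """
    assignment = {}
    packet, used = 0, 0
    for seg in sorted(segments, key=lambda s: (s.plane, s.mb)):
        if used > 0 and used + seg.rate > packet_bits:
            packet, used = packet + 1, 0  # start a new packet
        assignment[(seg.plane, seg.mb)] = packet
        used += seg.rate
    return assignment


def expected_gain(segments, assignment, loss_prob):
    """Evaluate J1 for one frame with a common packet loss probability.

    Assumption: segment (l, i) needs its own packet plus every distinct
    packet that carries a lower bit plane of the same macroblock.
    """
    total = 0.0
    for seg in segments:
        needed = {assignment[(seg.plane, seg.mb)]}
        for lower in range(1, seg.plane):
            needed.add(assignment[(lower, seg.mb)])
        total += seg.gain * (1.0 - loss_prob) ** len(needed)
    return total


# Toy frame: 4 macroblocks, 2 bit planes (all numbers are made up).
segments = [Segment(1, i, 1000, 4.0) for i in range(4)]
segments += [Segment(2, i, 2000, 2.0) for i in range(4)]

assignment = baseline_packetize(segments, packet_bits=4000)
j1 = expected_gain(segments, assignment, loss_prob=0.1)
print(assignment)
print(j1)
```

In this toy case every second-plane segment needs two packets, so its contribution is scaled by $(1 - p_e)^2$ rather than $(1 - p_e)$, which is the effect that the product term in Eqn. (1) captures.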

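[Figure 1: three panels, (a) baseline packetization, (b) binary-tree packetization and (c) optimal packetization, each mapping the macroblocks (horizontal axis) and bit planes (vertical axis) of a frame to packets.]

Fig. 1. Illustration of the three packetization schemes.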

Poor performance of the baseline method can be expected, due to the heavy dependency introduced between packets. For example, if packet P1 is lost, then P3, P6, P10, P11 and P12 are totally undecodable, even if they are received successfully. This lost packet also contaminates packet P4: P4 may be partially decodable or totally undecodable, depending on the implementation of the decoder. The contamination propagates further, from P7 to P9 and from P13 to P16. In summary, the single loss of packet P1 may render all the packets except P2 and P5 useless. What is more, the rationale of this method lies in the consensus that the lower enhancement layers are more important than the higher layers. However, this is not always true in the rate-distortion sense. Therefore, the baseline packetization method generally does not yield optimal rate-distortion performance.

3.2. Binary Tree Packetization

The severe drawback of the baseline packetization method stems from the lack of packet alignment. The only constraint is the packet length: a packet is formed as soon as that length is reached, with no consideration of its performance. However, as can often be observed, the sizes of the bit streams of the FGS enhancement bit planes grow approximately exponentially (base 2) from one bit plane to the next. Therefore, all enhancement layers of a frame can be naturally represented by a binary tree, with the nodes standing for segments of the bit stream and the edges for dependency. To obtain such a binary tree structure, the following constraint is imposed: any left child is aligned to the left with its parent, while any right child is aligned to the right with its parent. This constraint is applied in a top-down fashion and it guarantees a well-aligned tree structure. Clearly, the alignment helps a great deal in eliminating the dependency among packets. The penalty is that the corresponding packet size will vary from node to node.

Each node in the binary tree is associated with a rate-distortion tuple. According to the available bit rate, the binary tree is first pruned using a rate-distortion criterion. For instance, in Fig. 1-(b), some bit stream segments at the 4th bit plane are left out. It is possible to send each selected node as a packet. However, this would induce too much overhead, mainly due to the packet headers, because the binary tree representation produces nodes of rather fine granularity. To reduce the overhead, we predefine a preferred transport packet length, which is 5 kilobits in our experiments. In order to fulfill the packet length requirement and to reduce the packet dependency further, a parent node and its two child nodes may be grouped to form an actual transport packet.
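The alignment constraint of the binary-tree strategy can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes that each node covers a contiguous range of macroblocks and that a parent's range is split at its midpoint, whereas the text does not state how the split point is chosen. The function names are hypothetical.

```python
def build_aligned_tree(num_mbs, num_planes):
    """Return {(plane, k): (start, end)} for a left/right-aligned binary tree.

    Node k of a bit plane covers macroblocks [start, end). Bit plane 1 is
    the root. Assumption: each parent range is split at its midpoint.
    """
    nodes = {(1, 0): (0, num_mbs)}
    for plane in range(2, num_planes + 1):
        for k in range(2 ** (plane - 2)):          # parents on the plane above
            start, end = nodes[(plane - 1, k)]
            mid = (start + end) // 2
            nodes[(plane, 2 * k)] = (start, mid)    # left child shares the left edge
            nodes[(plane, 2 * k + 1)] = (mid, end)  # right child shares the right edge
    return nodes


def ancestors(plane, k):
    """Nodes that must be decodable before node (plane, k) can be decoded."""
    chain = []
    while plane > 1:
        plane, k = plane - 1, k // 2
        chain.append((plane, k))
    return chain


nodes = build_aligned_tree(num_mbs=16, num_planes=4)
print(nodes[(4, 5)])     # macroblock range covered by one 4th-plane node
print(ancestors(4, 5))   # its dependency chain up to the root
```

Under this assumption a node depends only on the nodes along its path to the root, i.e., on one node per lower bit plane, instead of on whichever packets happen to overlap it in a raster scan.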

A better performance can be anticipated for the binary-tree packetization strategy because it removes much of the dependency among packets. It also yields fairly fine-grained bit stream segments that are selected according to a rate-distortion measure.

3.3. Optimal Packetization

The optimal packetization strategy goes further, probably to the extreme, in both directions. First, we try to construct each packet in such a way that it does not depend on other packets. Note that in a bit-plane coded bit stream, the dependency is between different bit planes, not between macroblocks that are spatially tiled. Therefore, the optimal packetization strategy should put all bit planes of the same macroblock into the same packet. As an example, the first packet in Fig. 1-(c) does not depend on the second packet, and vice versa. With this arrangement, Eqn. (1) reduces to

$$J_2 = \sum_{(f,l,i) \in I} \Delta D(f,l,i) \times \bigl(1 - p_e(f,l,i)\bigr) \qquad (2)$$

while the rate constraint remains unchanged.

Notice that Eqn. (2) contains the term $\Delta D(f,l,i)$. This term reminds us that the rate-distortion criterion needs to be considered in order to deliver optimal performance. Usually, the finer the granularity at which the rate-distortion criterion is applied, the better the resulting performance. In our scheme, a macroblock is the minimal unit. Therefore, we apply the rate-distortion criterion at the macroblock level. Each layer of a macroblock is represented by a rate-distortion tuple. For a given target bit rate, we determine which macroblocks, and up to which layers of these macroblocks, are packetized together in one packet, using the equal-slope argument of a Lagrangian minimization problem. Fig. 1-(c) clearly illustrates that different selected macroblocks may end up with different selected layers. Again, multiple macroblocks may be grouped to fulfill the packet length requirement so as to reduce the overhead.

4. EXPERIMENTAL RESULTS AND DISCUSSIONS

Many experiments have been performed to test and compare the performance of the proposed packetization strategies. The standard test sequences Foreman and Coastguard in both CIF and QCIF formats are used. The frame rate is set to 10 Hz. For the CIF and QCIF formats, the bit rate of the base layer is 128 kbps and 32 kbps, respectively. Only the first frame is encoded as an I frame and the others as P frames. The available bandwidth for transmitting the enhancement bit stream is 192 kbps for the QCIF format and 768 kbps for the CIF format. The packet length is set to 5 kilobits. The time slot (adjusting interval), $\tau$, is set to 1 second. In our experiments, the base layer packets are transmitted without error, and the enhancement layer packets are transmitted at packet loss ratios of 0, 2.5%, 5%, 10% and 20%.

Fig. 2 compares the results of the three packetization strategies for the Foreman sequence, where the upper subplot is for the CIF format and the lower for the QCIF format. As a reference, the performance of the base layer stream alone (with constant PSNR over the different packet loss ratios) is also shown. Correspondingly, Fig. 3 shows the experimental results for the Coastguard sequence.
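For the macroblock-level rate-distortion selection used by the optimal packetization strategy (Section 3.3), the following Python sketch may help. It swaps in a greedy steepest-slope selection, which matches the equal-slope condition only when each macroblock's rate-distortion points are convex (slopes decreasing from one layer to the next); it is not claimed to be the authors' exact procedure. The function names, the data layout `rd[i] = [(rate, gain), ...]` and the numbers are assumptions made for illustration.

```python
import heapq


def select_layers(rd, budget):
    """Greedily pick bit planes by steepest distortion-per-bit slope.

    rd[i] lists (rate, gain) for the bit planes of macroblock i, most
    significant first; rates must be positive. Returns (kept, spent),
    where kept[i] is the number of leading planes kept for macroblock i.
    """
    kept = [0] * len(rd)
    heap = []  # max-heap on slope, stored as (-slope, macroblock index)
    for i, planes in enumerate(rd):
        if planes:
            rate, gain = planes[0]
            heapq.heappush(heap, (-gain / rate, i))
    spent = 0
    while heap:
        _, i = heapq.heappop(heap)
        rate, gain = rd[i][kept[i]]
        if spent + rate > budget:
            continue  # plane does not fit; deeper planes of this MB depend on it
        spent += rate
        kept[i] += 1
        if kept[i] < len(rd[i]):
            next_rate, next_gain = rd[i][kept[i]]
            heapq.heappush(heap, (-next_gain / next_rate, i))
    return kept, spent


def pack_macroblocks(rd, kept, packet_bits):
    """Group whole macroblocks (all their kept planes) into packets."""
    packets, current, used = [], [], 0
    for i, n in enumerate(kept):
        if n == 0:
            continue
        size = sum(rate for rate, _ in rd[i][:n])
        if current and used + size > packet_bits:
            packets.append(current)
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        packets.append(current)
    return packets


# Toy rate-distortion tuples for three macroblocks (made-up numbers).
rd = [
    [(100, 50.0), (200, 40.0)],
    [(100, 10.0), (200, 4.0)],
    [(100, 30.0), (200, 30.0)],
]
kept, spent = select_layers(rd, budget=500)
packets = pack_macroblocks(rd, kept, packet_bits=300)
print(kept, spent, packets)
```

Because each packet carries all the kept bit planes of its macroblocks, no packet needs another one to be decoded, which is the property that reduces Eqn. (1) to Eqn. (2).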

[Figure 2: two panels, Foreman (CIF) and Foreman (QCIF), plotting PSNR (dB) against packet loss ratio (0 to 0.2) for the Baseline, Binary Tree, Optimized and Base layer curves.]

Fig. 2. Packet loss ratio vs. PSNR plots for test sequence Foreman.

[Figure 3: two panels, Coastguard (CIF) and Coastguard (QCIF), plotting PSNR (dB) against packet loss ratio (0 to 0.2) for the Baseline, Binary Tree, Optimized and Base layer curves.]

Fig. 3. Packet loss ratio vs. PSNR plots for Coastguard.

It is clear from the figures that the streaming performance drops very fast when the baseline packetization strategy is used. This is simply due to the heavy dependency among packets. The binary-tree packetization method works very well, since much of the dependency is eliminated and the rate-distortion information is used for tree pruning. Finally, the optimal packetization scheme delivers the best performance under all experimental conditions, exactly as expected, because all packets are independent, i.e., a lost packet does not contaminate any other packet, and because rate distortion has been considered during the optimization. However, the impact of the rate-distortion selection does not seem to be significant, as can be seen clearly when the packet loss ratio is 0: the performance gain is generally limited to less than 0.3 dB. Quantitatively, when the channel's packet loss ratio is 20%, the difference between the baseline and the optimal packetization strategies is about 1.5 to 2 dB for the Foreman sequence and about 1.5 dB for the Coastguard sequence.

We have mentioned throughout the paper that the binary-tree and the optimal packetization strategies greatly reduce the dependency among packets. To evaluate to what extent the dependency is reduced, we show the effective undecodable data ratio in Table 1. Admittedly, we could have shown the undecodable packet ratio, which is more explicit. However, in the experiments with the baseline packetization scheme, a received packet may be partially decodable even if some of its ancestors are not available, provided that the decoder is robust enough. For instance, in Fig. 1-(a), if P2 is lost, P4 is still partially decoded in our system (see footnote 1). Table 1 reports only the results for Coastguard in CIF format, as the other results are quite similar. Clearly, the binary-tree packetization strategy greatly reduces the contamination caused by lost packets, while the optimal strategy totally eliminates the packet dependency.

Table 1. Comparison of effective undecodable data ratio for the three packetization schemes. PLR stands for packet loss ratio.

PLR     Baseline   Binary-tree   Optimal
2.5%    7.4%       4.4%          2.5%
5%      14.7%      8.3%          5.0%
10%     28.8%      16.5%         10.0%
20%     49.5%      30.8%         20.0%

We note that, despite the seeming complexity of the binary-tree and optimal packetization strategies, they are actually very fast, and real-time implementation is not an issue. The only requirement is that side information such as rate and distortion must be available beforehand. This information can be generated at the encoding stage.

Footnote 1: Many thanks go to Dr. Feng Wu for providing the robust codec.

5. CONCLUSION AND FUTURE WORK

In this paper, we presented a very general performance metric for streaming FGS bit streams over lossy networks (e.g., the Internet). Considering the special structure of an FGS bit stream, three packetization methods were proposed. The baseline packetization method is simple and straightforward, but its performance is not good. The binary-tree packetization, as the name suggests, represents the bit stream as a binary tree and performs rate-distortion based pruning. As a result, much better performance is obtained. The optimal packetization goes further: it performs a rate-distortion based selection at the finest granularity and eliminates all the packet dependency. Its performance is by far the best.

In our experiments, we have assumed no error protection or equal error protection. However, unequal error protection schemes are readily applicable. In fact, we have interleaved packets naturally. Furthermore, the proposed technique can certainly be applied to other scalable coding techniques such as the Progressive Fine Granularity Scalability (PFGS) coding scheme [6]. However, more considerations need to be taken into account in that case, and this will be our future work.

6. REFERENCES

[1] Vasudev Bhaskaran and Konstantinos Konstantinides, Image and Video Compression Standards – Algorithms and Architectures, Kluwer Academic Publishers, second edition, 1997.
[2] W. Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301–317, Mar. 2001.
[3] ISO/IEC 14496-2/FPDAM4, “Coding of Audio-Visual Objects, Part-2 Visual, Amendment 4: Streaming Video Profile,” July 2000.
[4] Xiaolin Wu, Samuel Cheng, and Zixiang Xiong, “On Packetization of Embedded Multimedia Bitstreams,” IEEE Transactions on Multimedia, vol. 3, no. 1, pp. 132–140, Mar. 2001.
[5] Philip A. Chou and Zhourong Miao, “Rate-Distortion Optimized Streaming of Packetized Media,” Tech. Rep. MSR-TR-2001-35, Microsoft Research, Feb. 2001, http://www.research.microsoft.com.
[6] F. Wu, S. Li, and Y.-Q. Zhang, “A Framework for Efficient Progressive Fine Granularity Scalable Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 332–344, Mar. 2001.