A Hierarchical Unequal Packet Loss Protection Scheme for Robust H.264/AVC Transmission

Xingjun Zhang, Xiao-Hong Peng
School of Engineering and Applied Science, Aston University, UK
Email: {x.zhang, x-h.peng}@aston.ac.uk

Dajun Wu
Institute for Infocomm Research, Singapore
Email: [email protected]

Tim Porter and Richard Haywood
School of Engineering and Applied Science, Aston University, UK
Email: {portertr, haywoorj}@aston.ac.uk

Abstract: This paper addresses robust H.264/AVC video transmission over lossy packet networks and presents a hierarchical unequal packet loss protection (HULP) scheme within a transmission system that combines erasure coding, H.264/AVC error resilience techniques and importance measures in video coding schemes. The importance of each video stream packet is determined by three criteria: the frame sequence number in a group of pictures (GOP), the per-frame bitrate and the H.264/AVC data partition type. Using a fixed amount of redundancy, the more important packets of a video stream are protected with a more powerful erasure code than the less important packets. We demonstrate the effectiveness of this approach through system implementation and performance evaluation. In the presence of packet loss, the received video quality, as measured by PSNR, is significantly improved when the HULP scheme is used. More importantly, in our experiments HULP achieves higher PSNR values and better user perceived quality than equal loss protection (ELP) schemes while requiring less redundancy.

I. INTRODUCTION

Combining forward error correction (FEC) algorithms with an appropriate selection of error resilience techniques has often been shown to be advantageous for the transmission of real-time data. Most existing works that use FEC to design unequal loss protection schemes focus on the transmission of H.263 and MPEG-2/4 video [1][2][3]. Several protection schemes for H.264/AVC video transmission have been proposed, but those schemes consider only a single aspect: [4], [5] and [6] are based on Flexible Macroblock Ordering (FMO), and [7] considers only the data partitioning of H.264/AVC.

By analyzing the existing unequal importance settings in H.264/AVC coding schemes, and exploiting the H.264/AVC data partitioning resilience tool, we present a hierarchical unequal loss protection (HULP) scheme that is implemented over an emulation testbed. The scheme differentiates the importance of the stream packets according to their origin, which is classified at three levels. Firstly, the stream packets are classified by the GOP sequence number of the frames, indicating the position in the group of pictures that the packets come from. The packets to be transmitted earlier in the sequence are more important. Secondly, the importance of a packet is distinguished by the number of bits per frame (we use the term per-frame bitrate in this paper), which reflects the contribution of the frame to video quality and the importance of the frame in motion compensated prediction. Finally, the packets are also classified by the H.264/AVC data partition type. To the best of our knowledge, no existing unequal loss protection scheme for H.264/AVC transmission takes all three levels of unequal importance into account.

Reed-Solomon (RS) codes with different recovery capabilities are used in the erasure coding approach. A dropped packet can be regarded as an erasure, so we refer to the FEC code used in the video transmission system as an erasure code and, equivalently, to FEC coding as erasure coding. The performance in terms of PSNR (peak signal-to-noise ratio) is analyzed in the process of system implementation.

The paper is organized as follows. Section II examines the existing unequal importance settings at the different levels of H.264/AVC video encoding schemes. Section III presents the hierarchical ULP scheme and system implementation. In Section IV, we provide experimental results and performance analysis. Section V concludes the paper.

II. BACKGROUND

A. GOP Level Unequal Importance

In compressed video, different frame types (I, P and B) are assigned different levels of importance. The I-frame is the most important, and the P-frame is more important than the B-frame. In addition, the frames in a GOP have a descending order of importance from the first frames to the last. A GOP sequence starts with an I-frame, and all the other frames depend on it. The first P-frame is predicted from the I-frame. Subsequent P-frames use the previous P-frame as their reference until the next GOP starts. B-frames are predicted from the preceding and following I-frame or P-frame. Due to this temporal dependency, the decoding of the current frame strongly depends on its preceding frames in a GOP. The earlier a frame is lost in a GOP, the more frames will be corrupted afterwards [8].

B. Per-frame Bitrate vs. Unequal Importance

There is evidence of a relationship between per-frame bitrate and frame importance.
Figure 1 plots the per-frame bitrate of the first 100 P-frames and B-frames in the standard paris.cif video test sequence, which will be used in our experiments. We can see that the per-frame bitrate of P-frames is much higher than that of B-frames. The bitrate of the I-frame is 148160 bits (not shown in the figure), which is higher than any P-frame's bitrate.

Fig. 1. Per-frame bitrate: per-frame bitrate (bits) versus frame number for P-frames and B-frames.

H.264/AVC adopts variable bit rate (VBR) video compression. The per-frame bitrate in the video sequence mainly depends on the quantization parameter (QP) used, the sequence content and the frame type. In general, video can be encoded with fixed quantization scales, which result in nearly constant video quality at the expense of variable bit rates. Alternatively, it can be encoded with rate control, which adapts the quantization scales to keep the video bit rate nearly constant at the expense of variable video quality [9]. In our work, we focus on variable bit rate encoding with fixed quantization scales to implement the ULP schemes. Thus, for the P-frames, under fixed quantization scales, differences in the per-frame bitrate are mainly due to the frame content. A frame with a higher bitrate carries more information for predictive coding and is more likely than a lower-bitrate frame to be chosen as a reference frame for motion estimation when multiple reference frames are used.

C. H.264/AVC Data Partitioning

Normally, each coded slice is encapsulated into one NALU (Network Abstraction Layer Unit). In the case of data partitioning, each coded slice is split into three partitions, each of which is encapsulated in a NALU of its own. The H.264/AVC specification defines the three data partitions (A, B and C): partition A (PA) contains the slice header, macroblock types, quantization parameters, prediction modes, and motion vectors; partition B (PB) contains residual information of intra-coded macroblocks; and partition C (PC) contains residual information of inter-coded macroblocks. The purpose of data partitioning is to divide the coded data into several data streams of different importance. A network that can provide different transmission or protection priorities to packets according to their importance is able to protect the important data more efficiently [10].

III. HIERARCHICAL UNEQUAL LOSS PROTECTION

A. System Description

The architecture of the proposed transmission system, which also serves as the emulation testbed, is presented in Figure 2. The whole

system is composed of three parts: the server, the packet loss controller and the client. The server includes the hierarchical NALU classification component, the RS encoder, the embedded H.264/AVC encoder and the RTP server. The raw footage is encoded using data partitioning. There are twelve different partitions in total. Except for PA, PB and PC, we call all others key partitions, including SPS (Sequence Parameter Set), PPS (Picture Parameter Set), IDR (Instantaneous Decoder Refresh) data slices and setup parameters. The system assumes that these key partitions are transmitted over a secure channel. All PA, PB and PC partitions are classified according to the criteria analyzed in Section II. One of the motivations for this architecture is to keep the HULP method consistent with the H.264/AVC design idea, which specifies the video coding layer and the network abstraction layer separately. The system is intended to enable robust video transmission without requiring knowledge of the details of video source coding. According to the importance assigned by the HULP classification, the RS encoder generates parity packets by choosing suitable RS codec parameters. NALU and parity packets are packed into the RTP format and transmitted through the packet loss controller, which is built by encapsulating the Linux advanced networking traffic controller (TC) and is used to alter the network traffic in terms of bandwidth, delay, jitter, packet loss rate and packet loss pattern.

The receiver client is equipped with the RS decoder, the embedded H.264/AVC decoder and the RTP client. The lost packets are recovered by using the appropriate RS code according to the NALU header information. Finally, the H.264/AVC coding layer takes over to carry out the remaining decoding tasks.

Fig. 2. System architecture. Server: H.264 encoder (video coding layer with data partitioning enabled, network abstraction layer), NALU classification, RS encoder, transport (RTP). Packet loss controller. Client: transport (RTP), RS decoder (classification distinguished), H.264 decoder (network abstraction layer, video coding layer).

One concern at the server is how to set up the classification ranks for the GOP level and the per-frame bitrate. In general, the more classification levels that are used, the finer the unequal protection that can be achieved. However, a number of other issues must also be taken into account. The system assembles the classified packets into different blocks of packets (BOP) to implement the RS code across packets. More classification stages result in smaller BOP sizes, which directly affect the capability of the RS code to cope with burst packet losses. For example, suppose that 15 packets belonging to one RS (15, 11) BOP are transmitted over a lossy network; this code can recover a burst of up to 4 lost packets. But if those 15 packets are used to form two RS (7, 5) BOPs (with one packet unused), only up to 2 burst packet losses per BOP can be recovered. Thus, large BOPs are more robust than small ones. However, a larger BOP results in a longer delay. In practice, the total number of video packets in a BOP should be determined by considering the tradeoff between delay and robustness [3]. For Internet applications, the target number of video packets in one BOP can be determined according to the end-to-end system delay constraint [11]. If the HULP scheme is used in a real application, configuring the ranks requires knowledge of the frame bitrate. This value can be easily retrieved for off-line applications and for applications that can tolerate a delay of one GOP. For interactive applications, it is possible to predict the rank values from the encoding history of the same type of video.

B. A System Implementation

The system parameters are fixed for validating the HULP scheme. For simplicity, we configure just two stages for both the GOP-level and the per-frame bitrate classifications. This means that the frames in the GOP are first divided into the First Half (FH) and the Second Half (SH) frames, and then further divided into the High per-frame Bitrate (HB) and the Low per-frame Bitrate (LB) frames. The per-frame bitrate distribution is shown in Figure 1 and the classification threshold is set to 3000 bits per frame. Data partitioning uses the H.264/AVC standard method. In the system implementation, the GOP-level and per-frame bitrate classification is performed after data partitioning. The outputs of the VCL are NALUs. Each NALU carries one data partition. The system is aware of the GOP position, the bitrate of the frame, and the source from which a NALU is created. We implement systematic RS codes for erasure coding. Figure 3 shows the mapping of the classified partitions onto the RS codes chosen.

Fig. 3. Classified partitions vs. RS codes

Frame type   GOP half   Bitrate class   Partition   Protection
I-frame      -          -               IDR         secure channel
P-frames     FH         HB              PA          RS(15,9)
P-frames     FH         HB              PB, PC      RS(15,11)
P-frames     FH         LB              PA          RS(15,9)
P-frames     FH         LB              PB, PC      RS(15,13)
P-frames     SH         HB              PA          RS(15,11)
P-frames     SH         HB              PB, PC      RS(15,13)
P-frames     SH         LB              PA          RS(15,13)
P-frames     SH         LB              PB, PC      RS(15,13)
B-frames     -          -               PA, PC      RS(15,13)

IV. EXPERIMENTAL RESULTS

A. Experiment Design

The system is built on a modified version of the test model coder JM12.4, which is provided by the Joint Video Team (JVT). JM12.4 is chosen because it supports data partitioning for the purpose of error resilience. As an example, the first 100 frames of a standard video sequence in the 4:2:0 YUV CIF format have been encoded at 30 frames per second (fps) in an IPBPBPB ... structure. In this case, the errors caused by packet loss in any frame will be propagated throughout the whole video sequence. The encoder generated 4019 partitions in total, including 13 key parameter partitions (such as SPS and PPS), 2178 PA and 1828 PB/PC. On the network emulator, a random packet loss pattern is adopted and the packet loss rate is varied from 1% to 20%. For simplicity and clarity, we calculate the PSNR values using only the I-frame and P-frames of the GOP in the following performance comparison. Because the loss of packets in B-frames does not affect any other frames, this simplification does not lose generality. For system validation, an equal loss protection (ELP) scheme is also implemented by changing all the k values of the RS (n, k) codes used in HULP to 11, except for the one used for B-frame protection.

Fig. 4. Average PSNR values: average PSNR (dB) versus packet loss rate (%) for ELP and HULP.

TABLE I
UNRECOVERABLE PACKETS STATISTICS

         ELP                    HULP
PLR      LPk    UPk    UPp      LPk    UPk    UPb
1%       38     3      7.8%     36     0      0
2%       57     0      0        94     6      6.4%
3%       129    10     7.7%     120    8      6.7%
4%       178    14     7.8%     172    12     6.9%
5%       195    11     5.6%     219    18     8.2%
6%       205    7      3.4%     256    28     10.9%
7%       277    27     9.7%     298    53     17.8%
8%       339    65     19.2%    327    73     22.3%
9%       346    56     16.2%    372    78     21%
10%      417    94     22.5%    392    91     23.2%
11%      456    100    21.9%    452    183    40.4%
12%      483    114    23.6%    472    169    35.8%
13%      532    156    29.3%    509    201    39.4%
14%      563    201    35.7%    565    260    46%
15%      616    228    37%      589    268    45.5%
16%      655    240    36.6%    635    310    48.8%
17%      687    276    40.1%    657    323    49.1%
18%      698    303    43.4%    705    367    52%
19%      732    346    47.2%    726    409    56.3%
20%      803    397    49.4%    831    517    62.2%
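
As an illustration of the code assignment in Figure 3 and of the ELP baseline described in the experiment design, the following Python sketch maps a classified PA, PB or PC partition to an RS (n, k) code and derives the ELP variant by setting k to 11 everywhere except for B-frames. The function names (`hulp_code`, `elp_code`, `max_recoverable`) and the string labels are hypothetical and chosen for this example only. The k values are transcribed from Figure 3, and the sketch is not the authors' implementation. Key partitions are assumed to travel over the secure channel and are not handled here.

```python
from typing import Tuple

N = 15  # every RS code in Figure 3 has block length n = 15

# k for P-frame partitions, keyed by (GOP half, bitrate class).
# Each value is (k for PA, k for PB and PC), transcribed from Figure 3.
P_FRAME_K = {
    ("FH", "HB"): (9, 11),
    ("FH", "LB"): (9, 13),
    ("SH", "HB"): (11, 13),
    ("SH", "LB"): (13, 13),
}
B_FRAME_K = 13  # all B-frame partitions use RS(15, 13)
ELP_K = 11      # ELP baseline: k = 11 everywhere except for B-frames


def hulp_code(frame_type: str, gop_half: str, bitrate_class: str,
              partition: str) -> Tuple[int, int]:
    """Return the RS (n, k) code for a classified PA/PB/PC partition."""
    if partition not in ("PA", "PB", "PC"):
        raise ValueError(f"unsupported partition: {partition}")
    if frame_type == "B":
        # GOP half and bitrate class are not used for B-frames.
        return (N, B_FRAME_K)
    if frame_type != "P":
        raise ValueError(f"unsupported frame type: {frame_type}")
    k_pa, k_pb_pc = P_FRAME_K[(gop_half, bitrate_class)]
    return (N, k_pa if partition == "PA" else k_pb_pc)


def elp_code(frame_type: str, gop_half: str, bitrate_class: str,
             partition: str) -> Tuple[int, int]:
    """ELP variant: same n, but k = 11 for everything except B-frames."""
    n, k = hulp_code(frame_type, gop_half, bitrate_class, partition)
    return (n, k) if frame_type == "B" else (n, ELP_K)


def max_recoverable(n: int, k: int) -> int:
    """An RS (n, k) erasure code can recover up to n - k lost packets per block."""
    return n - k


if __name__ == "__main__":
    n, k = hulp_code("P", "FH", "HB", "PA")
    print(f"HULP: RS({n},{k}) tolerates {max_recoverable(n, k)} losses per block")
    n, k = elp_code("P", "FH", "HB", "PA")
    print(f"ELP:  RS({n},{k}) tolerates {max_recoverable(n, k)} losses per block")
```

The same helper reproduces the block size example of Section III-A: `max_recoverable(15, 11)` gives 4, while `max_recoverable(7, 5)` gives 2 for each of the two smaller blocks.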

B. Experimental Results

Table 1 presents the statistical features of packet losses and recovery in the experiments. For each packet loss rate (PLR), the table lists, for both the ELP and HULP schemes, the number of packets lost (LPk), the number of lost packets that could not be recovered (UPk), and the percentage of lost packets that are unrecoverable (UPp for ELP, UPb for HULP). Although the numbers of lost packets in the HULP and ELP experiments are not exactly the same for a given PLR, the same loss rate is still obtained in both cases because the total numbers of packets transmitted by the two schemes are not identical either. From Table 1, we can see that overall the unrecoverable rate of ELP is lower than that of HULP. This is because ELP uses slightly more redundancy than HULP: the redundancy rates are 21.604% for ELP and 20.997% for HULP. However, from the following results and analysis, we will see that HULP, though requiring less redundancy, still achieves higher PSNR values and better user perceived quality than ELP.

The measured average PSNR values are plotted in Figure 4, which clearly shows that the HULP scheme has much better PSNR values as the packet loss rate increases. In Figures 5, 6 and 7, the PSNR values with respect to the frame sequence number are shown for three different packet loss rates. We can see that: (1) the PSNR values are improved for both protection schemes compared with transmission without any protection; (2) Figure 5 shows that when the packet loss rate is low, both HULP and ELP can recover the majority of lost packets; (3) Figures 6 and 7 show that the frame PSNR values of HULP are dramatically higher than those of ELP when the packet loss rate is high; and (4) as the packet loss rate increases and errors accumulate, the PSNR values of HULP and ELP begin to converge, but HULP's PSNR is still marginally higher than that of the other schemes.

Fig. 5. PSNR comparison at 6% PLR: PSNR (dB) versus frame number for transmission without protection, ELP and HULP.

Fig. 6. PSNR comparison at 13% PLR: PSNR (dB) versus frame number for transmission without protection, ELP and HULP.

Fig. 7. PSNR comparison at 20% PLR: PSNR (dB) versus frame number for transmission without protection, ELP and HULP.

From Figure 5, we find that ELP's PSNR is higher than that of HULP in some cases. That is because ELP copes better with burst losses when the packet loss rate is low. In the experiments, 1071 of the 4006 packets are protected by the RS (15, 13) code in the HULP scheme. For the transmission of these packets, if more than two packets are lost consecutively, those losses will not be recovered. HULP uses the RS (15, 9) code, which can recover up to 6 losses and is hence more powerful than the RS (15, 13) code used by ELP, to protect the more important 539 of the 4006 packets. However, this advantage cannot be exploited in the low packet loss rate region, as it is very rare for around 6 consecutive packets out of 15 to be lost when the loss rate is low. In some cases, the curves are prone to jitter.
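
The observation that six losses in a block of 15 are rare at low loss rates can be made concrete under a simplifying assumption that is ours rather than the paper's: each of the n packets in a block is lost independently with probability p. The number of losses is then binomial, and a block is unrecoverable when more than n - k packets are lost. The Python sketch below evaluates this tail probability for the RS codes discussed; the function name `block_failure_probability` is hypothetical.

```python
from math import comb


def block_failure_probability(n: int, k: int, p: float) -> float:
    """Probability that more than n - k of n packets are lost.

    Assumes independent losses with probability p per packet, which is a
    modelling assumption made for this example only.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if not 0 < k <= n:
        raise ValueError("require 0 < k <= n")
    tolerated = n - k  # an RS (n, k) erasure code recovers up to n - k losses
    recoverable = sum(
        comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(tolerated + 1)
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - recoverable)


if __name__ == "__main__":
    for p in (0.01, 0.06, 0.20):
        for k in (13, 11, 9):
            prob = block_failure_probability(15, k, p)
            print(f"p={p:.2f}  RS(15,{k}): P(block unrecoverable) = {prob:.3e}")
```

Under this model, the failure probability of RS (15, 9) at low loss rates is several orders of magnitude below that of RS (15, 13), so the extra parity of the stronger code is rarely needed there and pays off mainly as the loss rate grows.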
This jitter arises because: (1) although the protection schemes are compared at the same packet loss rate, the locations of the lost packets may differ between schemes, so the accumulated errors can be very different; and (2) the frame structure chosen in the H.264 encoding scheme is quite sensitive to packet loss, and therefore any displayed error will be propagated throughout the entire video sequence. The accuracy of the PSNR comparisons may therefore be affected by these two factors.

The user perceived quality is also investigated. Figures 8 and 9 show snapshots of the 42nd and the 100th frames, respectively, for the proposed HULP, ELP and transmission without any protection at the 15% packet loss rate. We can see that: (1) compared with transmission without any protection, the visual quality of the frames is improved in both schemes; and (2) the frame quality for HULP is much better than that of ELP, which implies that user perceived quality is significantly improved when using the HULP scheme.

Fig. 8. Frame 42 at 15% PLR using: (a) without protection; (b) ELP; (c) HULP

Fig. 9. Frame 100 at 15% PLR using: (a) without protection; (b) ELP; (c) HULP

The decoding complexity and the added redundancy are the main costs of the system. Rizzo showed the feasibility of FEC coding in software at high speeds [12]. There are two kinds of redundancy in the system. The first is the parity packet redundancy. The other is the padding redundancy due to the different video packet sizes. The padding redundancy does not need to be transmitted, thanks to the virtual padding process [3]. The system uses no complex interleaving or FEC assignment algorithms, allowing for fast processing to meet the requirements on transmission throughput and latency.

V. CONCLUSIONS

In this paper, a hierarchical unequal packet loss protection scheme, named HULP, is presented for robust H.264/AVC video transmission over lossy networks. It combines erasure coding, H.264/AVC error resilience techniques and unequal importance at different levels of the syntax hierarchy in video coding schemes. The PSNR value and user perceived video quality are used to validate the proposed scheme and assess the performance of the implemented system. The experimental results demonstrate that the HULP scheme can significantly improve the video PSNR values. For the amount of redundancy specified, we have compared the performance of the HULP scheme and the equal loss protection (ELP) scheme and found that, even though the HULP scheme uses less parity redundancy than the ELP scheme in the experiments, it still achieves higher PSNR values and better user perceived quality than ELP as the packet loss rate increases.

VI. ACKNOWLEDGMENTS

This work was supported by grants from the Engineering and Physical Sciences Research Council of the United Kingdom and Xyratex Ltd. The authors wish to thank Dave Milward and Tim Courtney for useful discussions.

REFERENCES

[1] M. J. Boyce, “Packet loss resilient transmission of MPEG video over the Internet,” Signal Processing: Image Communication, vol. 15, pp. 7–24, 1999.
[2] J. Goshi, A. E. Mohr, R. E. Ladner, E. A. Riskin, and A. F. Lippman, “Unequal loss protection for H.263 compressed video,” IEEE Trans. Circuits Syst. Video Techn., vol. 15, pp. 412–419, 2005.
[3] X. Yang, C. Zhu, Z. Li, X. Lin, G. Feng, S. Wu, and N. Ling, “Unequal loss protection for robust transmission of motion compensated video over the Internet,” Signal Processing: Image Communication, vol. 18, pp. 157–167, 2003.
[4] N. Thomos, S. Argyropoulos, N. V. Boulgouris, and M. G. Strintzis, “Robust transmission of H.264 streams using adaptive group slicing and unequal error protection,” EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 120–120, 2006.
[5] H. K. Arachchi, W. Fernando, S. Panchadcharam, and W. Weerakkody, “Unequal error protection technique for ROI based H.264 video coding,” in Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, 2006.
[6] Y. Dhondt, P. Lambert, and R. Van de Walle, “A flexible macroblock scheme for unequal error protection,” in Proceedings of the 2006 IEEE International Conference on Image Processing, 2006.
[7] T. Stockhammer and M. Bystrom, “H.264/AVC data partitioning for mobile video communication,” in Proceedings of the International Conference on Image Processing, 2004.
[8] Y. Xu and Y. Zhou, “H.264 video communication based refined error concealment schemes,” IEEE Trans. on Consumer Electronics, vol. 50, pp. 1134–1141, 2004.
[9] G. V. der Auwera, P. T. David, and M. Reisslein, “Traffic characteristics of H.264/AVC variable bit rate video,” IEEE Communications Magazine, in print, 2008.
[10] M. Stefaan, D. Yves, V. D. W. Dieter, D. S. Davy, and V. D. W. Rik, “A performance evaluation of the data partitioning tool in H.264/AVC,” in Proceedings of the SPIE/Optics East conference, 2006.
[11] S.-W. Yuk, M.-G. Kang, B.-C. Shin, and D.-H. Cho, “An adaptive redundancy control method for erasure-code-based real-time data transmission over the Internet,” IEEE Transactions on Multimedia, vol. 3, pp. 366–374, 2001.
[12] L. Rizzo, “On the feasibility of software FEC,” University of Pisa, Tech. Rep., 1997.