On Video Streaming over Variable Bit-Rate and Wireless Channels

Hrvoje Jenkač, Thomas Stockhammer, Gabriel Kuhn

Abstract

We consider streaming of video sequences over both constant and variable bit-rate channels. Our goal is to enable decoding of each video unit before its display deadline is exceeded and, hence, to guarantee successful sequence presentation even if the media rate does not match the channel rate. In this work, we show that the separation between a delay jitter buffer and a decoder buffer is in general sub-optimal for VBR video transmitted over a VBR channel. We define the minimum initial delay and the minimum required buffer for a given video stream and a deterministic VBR channel. In addition, we provide some probabilistic statements for the case of random channel bit-rate behaviour. A specific example tailored to wireless video streaming is discussed in greater detail, and bounds are derived that allow a certain quality-of-service to be guaranteed even for random VBR channels in a wireless environment. A specific example validates the findings.

I. INTRODUCTION

The popularity of IP-based video streaming over the Internet grows every day, with hundreds of new subscribers registered daily. Advanced mobile systems such as 2G+ and 3G already allow IP-based multimedia transmission and reception at any place and time at reasonable and sufficient data rates. Multimedia data, such as streaming video, is usually stored on a server in the wired Internet. As the streaming data can be requested by "wired" clients, where packet losses mainly result from network congestion, as well as by wireless mobile clients, where packet losses mainly result from poor link quality, the transmission policy should be able to adapt to the individual transmission conditions. Due to predictive video coding, lost IP packets result not only in decoding errors in the current frame, but also in decoding errors in subsequent frames in the dependency chain. Therefore, channel coding and repeat request strategies are used to combat bad channel conditions resulting from multi-path propagation, scattering and fading, and to guarantee error-free reception of IP packets. Usually in mobile systems, not the whole IP packet has to be retransmitted, but only the lost unit on the radio link control (RLC) layer. Compared to the IP packet size, these transmission units are very small (100 - 400 bit), which makes them very attractive for retransmission policies.

Assume the general simple media streaming system setup consisting of a media streaming server, a transport channel, and a streaming client. The server stores several pre-encoded media streams, which are in general encoded with variable bit-rate (VBR). VBR coded video has in general much better rate-distortion performance than video coded with constant bit-rate (CBR) [1]. Each VBR media stream is characterized by a certain duration and a playout curve $p(t)$. The playout curve is defined as the overall amount of data (e.g., measured in bits) produced by the video encoder up to time $t$.
In general, this curve is monotonically increasing and has a staircase characteristic. In addition, we assume $p(t) = 0$ for all $t \le 0$ and $p(t) = S$ for all $t$ beyond the duration of the stream, with $S$ the size of the media stream in bits. In the following, we assume that all transport overhead such as IP headers, system-specific information or control messages is included in the playout curve.

The transport channel is usually characterized by the peak transmission bit rate $R$ (in bits per second) at which bits enter the decoder buffer. In constant bit-rate scenarios, $R$ is often both the channel bit rate and the average bit rate of the video clip. In a typical multimedia messaging application, the media stream would be downloaded and stored at the receiver. The downloading time $T_d$ is given as $T_d = S/R$. The minimum time interval between the moment the download starts and the moment the playback of the stream can start is obviously given by the downloading time $T_d$. In addition, the receiver requires a storage capacity of at least $S$ bits.

Hrvoje Jenkač and Thomas Stockhammer are with the Institute for Communications Engineering (LNT), Munich University of Technology (TUM), D-80290 Munich, Germany; Contact: jenkac, [email protected], Tel: +49 89 28923088. Gabriel Kuhn is with the Chair of Mathematical Statistics at the Center for Mathematical Sciences of the Munich University of Technology; Contact: [email protected]

In streaming applications, the goal is to start playback while downloading the media stream. This has two significant advantages. Firstly, the initial playback delay can be reduced significantly, and, secondly, the storage capacity at the receiver, usually referred to as the receiver buffer size $B$, can be much smaller than the total size $S$ of the bit-stream. The decoder buffer is necessary to smooth the bit-rate fluctuations of the VBR coded video. This buffer size cannot be larger than the physical buffer of the decoding device. However, starting the playback while downloading also involves the problem of a possible decoder buffer underflow or overflow. An underflow occurs if, due to the variable-rate nature of the encoded video, not enough data is present at the decoder at the time a certain video frame has to be decoded and displayed. This results in a display problem, and the continuous flow of the decoded video is interrupted. In contrast, an overflow occurs whenever too much data has arrived at the decoder. This results in a loss of data and, in general, in a significant performance degradation. To avoid these problems, the playout curve, the transmission rate, the initial delay and the buffer size have to be adjusted appropriately.

After requesting the stream, the video decoder starts removing bits from the buffer when the initial decoder buffer fullness $F$ (also in bits) is reached. $F$ and $R$ determine the initial or start-up delay $\delta$, where $\delta = F/R$. The connection between $R$, $B$, $F$ and the playout curve $p(t)$ can be formalized by the so-called leaky bucket model [2], [3]. A leaky bucket with parameters $(R, B, F)$ is said to contain a bit-stream with playout curve $p(t)$ if there is no underflow or overflow of the decoder buffer. If a complementary encoder buffer is used, decoder overflow can be avoided. This is because whenever the decoder buffer is full, the encoder buffer is empty and no data is transmitted. However, no data bits enter the channel in that case, which obviously is costly in terms of overall data rate. The bits enter the decoder buffer at rate $R$ until the level of fullness is $F$, and then the bits for the first media unit are removed instantaneously. The bits keep entering the buffer at rate $R$, and the decoder removes the bits for the following frames at given time instants, typically (but not necessarily) at the frame rate of the video stream.

Two different kinds of system design problems are typical. In a first common problem, the leaky bucket model is completely specified and the video has to be encoded such that the playout curve fulfills the requirement of no decoder buffer underflow. This application is very typical for live encoding and streaming of video data, as in this case the playout curve is not determined in advance. Therefore, in video standardization communities these considerations have been used to specify a hypothetical reference decoder (HRD) [4], [5]. System specifications of media transmission standards usually discuss and specify the connections between a leaky bucket and the playout curve. For the Video Buffering Verifier (VBV) design for MPEG-2, as well as for the HRD design for H.263, the buffer size $B$ and the initial delay $\delta$ are fixed. $B$ is either specified in the bit-stream (MPEG-2) or by some constraint in the specification of the profile and level. $\delta$ can be fixed by transmitting the value in the bit-stream (MPEG-2 CBR mode), by filling up the buffer completely, i.e., $\delta = B/R$ (MPEG-2 VBR mode), or by just using the size of the first frame, i.e., the first frame is decoded immediately (H.263). Then, with buffer and delay fixed, the video encoder has to design an appropriate playout curve such that a buffer underflow is avoided.
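The leaky bucket behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `leaky_bucket_contains`, the representation of the stream as a list of frame sizes with a fixed frame interval, and the choice that the sender transmits continuously at rate $R$ until the whole stream has been sent (no encoder buffer) are hypothetical and are not taken from any of the cited HRD specifications.

```python
from typing import Sequence


def leaky_bucket_contains(frame_bits: Sequence[int], frame_interval: float,
                          R: float, B: float, F: float) -> bool:
    """Check whether a leaky bucket (R, B, F) contains a stream.

    Assumptions (illustrative only): frame k is removed instantaneously at
    time F/R + k * frame_interval, and bits arrive at constant rate R until
    the whole stream has been delivered.
    """
    delay = F / R                  # start-up delay: time needed to reach fullness F
    total = sum(frame_bits)        # stream size S in bits
    removed = 0.0                  # bits already taken out by the decoder
    for k, bits in enumerate(frame_bits):
        t = delay + k * frame_interval      # decoding instant of frame k
        received = min(R * t, total)        # bits that have arrived by time t
        fullness = received - removed       # buffer level just before removal
        # The level only rises between removals, so it peaks right here.
        if fullness > B:
            return False                    # overflow
        if fullness < bits:
            return False                    # underflow: frame k is incomplete
        removed += bits
    return True


if __name__ == "__main__":
    frames = [4000, 1000, 3000, 2000]       # hypothetical frame sizes in bits
    print(leaky_bucket_contains(frames, 0.1, R=20000, B=5000, F=5000))
```

Checking the buffer level only at the removal instants is sufficient in this sketch because the arrival process is non-decreasing, so the occupancy between two removals is largest immediately before the next one.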
Recently, for the standardization of H.26L, a new approach to HRD design has been proposed [6], which generalizes and extends the concepts of previous HRDs. An encoder can create a video bit-stream that is contained by several leaky buckets. This HRD interpolates among the leaky bucket parameters and can operate at any desired transmission bit rate, buffer size or delay.

In a second problem, the playout curve is specified and fixed, and a transmission rate $R$, a buffer size $B$ and an initial fullness $F$ have to be determined such that the bit-stream is contained by this leaky bucket. In this work we focus entirely on the second problem, as we deal with pre-encoded video streams to be conveyed over channels and systems with different characteristics. The goal is to determine appropriate initial delays and buffer sizes for variable bit-rate channels, with a focus on wireless video streaming.

It has been recognized that for media streaming, especially over the Internet, the CBR assumption in common HRD models is not satisfactory. Delay jitter occurs due to packet delays and congestion in routers as well as end-to-end re-transmissions in case of packet losses. Therefore, channel-adaptive streaming technologies have gained significant interest. According to [7], these techniques can be grouped into three different categories. Adaptive media playout [8] is a new technique that allows a streaming media client, without the involvement of the server, to control the rate at which data is consumed by the playout process. Thereby, the probability of decoder buffer underflows and overflows can be reduced, but a noticeable artifact in the displayed video still occurs. A second technology is a streaming media system that makes decisions governing how it allocates transmission resources among packets. Recent work [9] provides a flexible framework to allow rate-distortion (RD) optimized packet scheduling. In this case, the system allocates time and bandwidth resources adaptively in response to the varying channel conditions. Finally, it is shown that this RD-optimized transmission can be supported if media streams are pre-encoded with appropriate packet dependencies, possibly adapted to the channel (channel-adaptive packet dependency control) [10].

Wireless video streaming offers additional challenges. The main constraints are limited bandwidth, hand-held devices with limited processing power and memory capacity, and packet losses on mobile links due to fading and interference. This requires an optimal exploitation of the bandwidth resources, a careful design of receiver buffers, and mechanisms to react either to packet losses or to delay jitter in case of re-transmissions. Usually, the latter technique, applying link-layer re-transmissions and trading off throughput and delay jitter for reliability on the wireless link, is preferred, as data losses result in a significant degradation of the displayed video. For video streaming over 3G mobile systems, a video buffering model has been proposed to support variable bit-rate coding.
In [11], the mapping of UMTS bearer bit-rates to media bit-rates has been proposed. In addition, the SDP-based transmission of the initial delay and the buffer size for a given playout curve is specified. The assumption of a CBR channel with a linear receiver curve is maintained. This is justified because at the receiver a separation is applied between the delay jitter buffer, which compensates link-layer re-transmissions, and the decoder buffer.

In this work, we show that the separation between delay jitter buffer and decoder buffer is in general sub-optimal for VBR video transmitted over a VBR channel. We define the minimum initial delay and the minimum required buffer for a given playout curve $p(t)$ and a deterministic VBR channel. In addition, we provide some probabilistic statements for the case of random channel bit-rate behaviour. A specific example tailored to wireless video streaming is discussed in greater detail, and bounds are derived that allow a certain quality-of-service to be guaranteed even for random VBR channels in a wireless environment. Conclusions and future work items will be discussed.

II. STREAMING MEDIA OVER VARIABLE BIT-RATE CHANNELS

A. Overview

In the following, we assume a media streaming system setup consisting of a media streaming server, a transport channel, and a streaming client. The server stores several pre-encoded media streams, each of which is characterized by a certain duration and a playout curve $p(t)$. At the encoder, no encoder buffer is present, so that both decoder buffer overflows and decoder buffer underflows can occur. We will discuss later why we dispense with the encoder buffer. We still assume the channel to be error-free with limited transmission rate. However, in contrast to common considerations, we assume that the bit-rate might vary over time. This channel can be characterized by the receiver curve $r(t)$, which specifies the total amount of data received error-free up to time $t$ at the receiver. Obviously, $r(t)$ is monotonically increasing, and we define $r(t) = 0$ for all $t \le 0$. In this case, not only the variability of the playout curve $p(t)$ has to be considered in the selection of the initial delay and the receiver buffer size, but also the receiver curve $r(t)$. The downloading time $T_d$ for a bit-stream of size $S$ in the VBR case is given as $\{T_d : r(T_d) = S\}$. However, we are interested in streaming applications and want to minimize the initial delay $\delta$ and the receiver buffer size $B$ for a VBR channel. To avoid a buffer underflow at the receiver buffer, the initial playback delay $\delta$ has to be chosen such that for any time instant $t$ at least $p(t - \delta)$ bits are available at the decoder, i.e.

$$\{\delta \in \mathbb{R} : r(t) \ge p(t - \delta)\ \ \forall t\}. \tag{1}$$



Furthermore, the receiver buffer should have the capacity to store all received and not yet decoded data, as a buffer overflow will result in lost data. The buffer size $B$ has to be chosen such that at any time instant $t$ all received, not yet decoded data can be stored in the receiver buffer, i.e.

$$\{B \in \mathbb{R} : r(t) \le p(t - \delta) + B\ \ \forall t\}. \tag{2}$$

Obviously, if $r(t)$ is known before transmission or even prior to encoding and generation of $p(t)$, the playout curve can be designed such that the video stream is contained in a leaky bucket with parameters $(r(t), B, F = r(\delta))$, or $\delta$ and $B$ can be selected appropriately according to (1) and (2), respectively. We will provide a simple solution for this problem in subsection II-B. However, in general, the variable bit-rate behaviour is not known in advance, neither at the encoder nor at the decoder. This problem will be discussed in detail in subsection II-C. A specific example tailored to wireless transmission will be provided in section III.



B. Deterministic Receiver Curves

Assume for now that the exact receiver curve $r(t)$ and the playout curve $p(t)$ are known in advance at the media streaming server. Let us additionally define the pseudo-inverse function of the monotonically increasing playout curve $p(t)$ as $p^{(-1)}(x) = \min\{t : x \le p(t)\}$. Then, the following proposition specifies the minimum initial delay and the decoder buffer size.

Proposition 1: Given a playout curve $p(t)$ and a receiver curve $r(t)$, the minimum initial delay to avoid buffer underflow should be chosen as

$$\delta = \max_t \left\{ t - p^{(-1)}(r(t)) \right\}, \tag{3}$$

and the corresponding minimum receiver buffer size to avoid a decoder buffer overflow as

$$B = \max_t \left\{ r(t) - p(t - \delta) \right\}. \tag{4}$$
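As a hedged illustration of Proposition 1, the following Python sketch determines the minimum delay and buffer size for curves sampled on a uniform time grid. It does not evaluate the closed form (3); instead it searches for the smallest delay that satisfies condition (1) directly and then evaluates (4). The function names, the sampling of both curves at the same instants, and the requirement that the download finishes within the sampled interval are assumptions made for this example only.

```python
from typing import Sequence


def delayed(p: Sequence[float], n: int, d: int) -> float:
    """Sample n of the playout curve delayed by d samples (p(t) = 0 for t <= 0)."""
    return p[n - d] if n >= d else 0


def min_initial_delay(p: Sequence[float], r: Sequence[float]) -> int:
    """Smallest delay d (in samples) with r[n] >= p[n - d] for all n, cf. (1)."""
    if len(p) != len(r) or r[-1] < p[-1]:
        raise ValueError("need equal-length curves and a completed download")
    for d in range(len(r)):
        if all(r[n] >= delayed(p, n, d) for n in range(len(r))):
            return d
    raise ValueError("no feasible delay on this grid")


def min_buffer_size(p: Sequence[float], r: Sequence[float], d: int) -> float:
    """Largest vertical gap between r and the delayed playout curve, cf. (4)."""
    return max(r[n] - delayed(p, n, d) for n in range(len(r)))


if __name__ == "__main__":
    playout = [0, 4, 5, 9, 10, 10, 10, 10]    # hypothetical cumulative bits
    receiver = [0, 2, 4, 6, 8, 10, 10, 10]    # hypothetical cumulative bits
    d = min_initial_delay(playout, receiver)
    print(d, min_buffer_size(playout, receiver, d))
```

The brute-force search is quadratic in the number of samples; it is chosen here for transparency, since it mirrors the horizontal and vertical distances between the two curves that the proposition refers to.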

Outline of proof. From (1) it is obvious that we seek the minimum initial delay, which is given as the maximum horizontal difference between the playout curve $p(t)$ and the receiver curve $r(t)$. With the properties of $p(t)$ and the definition of the pseudo-inverse, the condition in (3) is obvious. Similarly, the minimum buffer size is the maximum vertical difference for any $t$ between the receiver curve and the delayed playout curve. $\square$

In [11] the problem of streaming video over a VBR channel is also discussed. However, instead of just one receiver buffer, it is proposed to introduce two buffers at the receiver: a delay jitter buffer and a decoder buffer. The task of the delay jitter buffer is to compensate the delay jitter introduced by the channel in order to obtain a CBR channel with bit-rate $R$ at the entrance of the decoder buffer. Therefore, traditional HRDs such as the MPEG-2 VBV or the H.263 HRD can be applied. The receiver curve $r(t)$ is delayed by an initial delay $\delta_1$ and de-jittered such that at the output of the delay jitter buffer a CBR channel with bit-rate $R$ is visible to the decoder buffer. With Proposition 1 it is obvious that the minimum initial delay $\delta_1$ and the minimum delay jitter buffer size $B_1$ are given as

$$\delta_1 = \max_t \left\{ t - r(t)/R \right\}, \tag{5}$$

$$B_1 = \max_t \left\{ r(t) - R \cdot (t - \delta_1) \right\}. \tag{6}$$

The decoder buffer is specified such that the video stream with playout curve $p(t)$ is contained in the leaky bucket $(R, B_2, F = R \cdot \delta_2)$, and with Proposition 1 we can again specify the minimum initial delay $\delta_2$ and the minimum decoder buffer size $B_2$ as

$$\delta_2 = \max_t \left\{ t - p^{(-1)}(R \cdot t) \right\}, \tag{7}$$

$$B_2 = \max_t \left\{ R \cdot t - p(t - \delta_2) \right\}. \tag{8}$$

We will now compare the single buffer approach with the separate buffer approach. The relationships are illustrated graphically in Figure 1 and serve as an intuitive proof for several of the following statements. For any valid playout curve $p(t)$ and any valid receiver curve $r(t)$ the following statements can be shown:

1. For any fixed $R$, with $\delta$ the minimum delay for the single buffer case according to (1), $\delta_1$ the minimum delay for the delay jitter buffer according to (5), and $\delta_2$ the minimum delay for the separate decoder buffer according to (7), it is obvious that $\delta \le \delta_1 + \delta_2$.

2. Similarly, for the respective minimum buffer sizes $B$, $B_1$, and $B_2$ according to (4), (6), and (8), and for any $R$, it is obvious that $B \le B_1 + B_2$.

3. Only if there exists an $R > 0$ such that $r(t) \ge R \cdot (t - \delta_1)$ for all $t$ and $R \cdot t \ge p(t - \delta_2)$ for all $t$ can we design separate buffers such that the sum delay equals that of the single buffer case, i.e. $\delta = \delta_1 + \delta_2$. This means that the receiver curve $r(t)$ and the delayed playout curve $p(t - \delta)$ can be separated by a straight line. This is illustrated in Figure 1.

4. Only if there exists an $R$ that fulfills the previous condition and, in addition, $r(t) \le R \cdot (t - \delta_1) + B_1$ for all $t$ and $R \cdot t \le p(t - \delta_2) + B_2$ for all $t$ can we design two separate buffers with $B = B_1 + B_2$. This means that the receiver curve $r(t)$ and the delayed and shifted playout curve $p(t - \delta_1 - \delta_2) + B$ can be separated by a straight line of slope $R$. This is also illustrated in Figure 1.

This shows that two buffers are in general worse than one buffer in terms of minimum initial delay and minimum receiver buffer size. In addition, for separate delay jitter and decoder buffers it is worthwhile to optimize the intermediate bit-rate $R$ to minimize the delay. As a single receiver buffer generally performs at least as well as two separate buffers, we only discuss this case in the following.

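[Figure 1: two panels plotting cumulative data over time $t$. Single buffer panel: receiver curve, playout curve, initial delay $\delta$, maximum buffer capacity $B$. Separate buffer panel: receiver curve, de-jittered curve, playout curve, delays $\delta_1$ and $\delta_2$, buffer sizes $B_1$ and $B_2$.]

Fig. 1. Single receiver buffer versus separate delay jitter and decoder buffers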
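The comparison between one buffer and two separate buffers can be checked numerically on sampled curves. The sketch below is a minimal illustration under stated assumptions: the helper `delay_and_buffer` applies conditions (1) and (4) by brute force, the intermediate CBR curve is modelled as `min(R * n, S)` (capping it at the stream size $S$ is a choice made here so that the search terminates), and all names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def delay_and_buffer(playout: Sequence[float],
                     receiver: Sequence[float]) -> Tuple[int, float]:
    """Minimum delay (in samples) and buffer size for one buffer stage."""
    n_samples = len(receiver)

    def shifted(n: int, d: int) -> float:
        # Playout curve delayed by d samples; zero before playback starts.
        return playout[n - d] if n >= d else 0

    for d in range(n_samples):
        if all(receiver[n] >= shifted(n, d) for n in range(n_samples)):
            size = max(receiver[n] - shifted(n, d) for n in range(n_samples))
            return d, size
    raise ValueError("download does not finish within the sampled interval")


def compare_designs(p: Sequence[float], r: Sequence[float], R: float):
    """Return ((delta, B), (delta1, B1), (delta2, B2)) for rate R per sample."""
    S = p[-1]
    cbr: List[float] = [min(R * n, S) for n in range(len(p))]  # de-jittered curve
    single = delay_and_buffer(p, r)        # one receiver buffer
    jitter = delay_and_buffer(cbr, r)      # delay jitter buffer: r -> CBR
    decoder = delay_and_buffer(p, cbr)     # decoder buffer: CBR -> playout
    return single, jitter, decoder


if __name__ == "__main__":
    p = [0, 4, 5, 9, 10, 10, 10, 10, 10, 10]   # hypothetical playout curve
    r = [0, 0, 3, 3, 6, 8, 10, 10, 10, 10]     # hypothetical VBR receiver curve
    (d, B), (d1, B1), (d2, B2) = compare_designs(p, r, R=2)
    print(d <= d1 + d2, B <= B1 + B2)
```

In this sketch the intermediate rate `R` is a free parameter, so looping over several values of `R` is one simple way to explore how much the separate-buffer design loses relative to the single buffer.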

C. Random Receiver Curves

The assumption that the receiver curve $r(t)$ is known a priori at the transmitter or receiver is obviously not very realistic for most practical cases. However, to specify the initial delay and the minimum buffer size, exact knowledge of the receiver curve is necessary. Especially in systems where we want to guarantee a certain Quality-of-Service (QoS), events like buffer overflows or underflows should not happen too frequently. Therefore, it is reasonable to guarantee a certain QoS by specifying the probability that the transmission of a sequence is successful without any buffer overflow or underflow, under the constraint of minimum initial delay and minimum receiver buffer size.

To formalize the concept, let us define the stationary and ergodic random process $R(t)$ describing the random receiver curve behaviour. Obviously, each realization of this random process has the same monotonicity properties as the deterministic receiver curve $r(t)$ defined previously. However, we assume that some upper limit $r_u(t)$ and some lower limit $r_l(t)$ for the random receiver curve $R(t)$ exist, which determine the probability $\pi$ that any realization of $R(t)$ is entirely within these limits for all $t$. This also means that the receiver curve will lie between the limits with probability $\pi$. However, we are not necessarily interested in the event that the receiver curve lies entirely within this region, but in the event that successful sequence playout at the receiver is possible. A successful playout is defined such that no deadline violation and no receiver buffer overflow occurs while playing back the sequence. Hence, continuous decoding and error-free sequence presentation are assured. The following proposition provides guidelines on how to select the initial delay and the receiver buffer size to guarantee a certain probability that successful playout at the receiver is possible.

Proposition 2: For a given playout curve $p(t)$ and given upper and lower limits $r_u(t)$ and $r_l(t)$ such that

$$\pi := \Pr\{ r_l(t) \le R(t) \le r_u(t)\ \ \forall t \}, \tag{9}$$

if the initial delay is selected as

$$\delta \ge \max_t \left\{ t - p^{(-1)}(r_l(t)) \right\}, \tag{10}$$

and the receiver buffer size as

$$B \ge \max_t \left\{ r_u(t) - p(t - \delta) \right\}, \tag{11}$$

it can be guaranteed that the probability of successful playout of the sequence at the receiver is at least $\pi$, i.e., $\Pr\{\text{successful playout}\} \ge \pi$.

Outline of proof. According to (3), the condition in (10) guarantees that the delayed playout curve $p(t - \delta)$ is always below $r_l(t)$. Additionally, with (4), the condition in (11) guarantees that the delayed and shifted playout curve $p(t - \delta) + B$ is always above $r_u(t)$. Therefore, selecting $\delta$ according to (10) and $B$ according to (11) guarantees that the video stream with playout curve $p(t)$ can be decoded for all receiver curves $r(t)$ with the property $r_l(t) \le r(t) \le r_u(t)\ \forall t$. Note that there might exist additional receiver curves which ensure successful decoding but do not fulfill this property. However, all receiver curves fulfilling this property allow successful decoding, and, therefore, the probability that a receiver curve with the property $r_l(t) \le r(t) \le r_u(t)\ \forall t$ occurs is a lower bound on the probability of successful decoding. $\square$

If the streaming server has knowledge of the playout curve $p(t)$ and the random process of the VBR channel $R(t)$, or at least of the bounds $r_l(t)$ and $r_u(t)$, then the design criteria according to Proposition 2 allow the transmitter to select the initial delay and the buffer size such that a certain QoS in terms of successful decoding probability is guaranteed. We will discuss the problem of initial delay selection and buffer design for a typical wireless transmission scenario in the following.
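The design rule of Proposition 2 can be illustrated with a small Python sketch on sampled curves: the delay is derived from the lower limit, the buffer size from the upper limit, and randomly drawn receiver curves between the two limits are then checked for successful playout. The function names, the integer-valued sample curves, and the way random curves are drawn are assumptions for this example and do not model any particular channel.

```python
import random
from typing import List, Sequence


def delayed(p: Sequence[int], n: int, d: int) -> int:
    """Playout curve delayed by d samples; zero before playback starts."""
    return p[n - d] if n >= d else 0


def design_from_limits(p: Sequence[int], r_l: Sequence[int], r_u: Sequence[int]):
    """Smallest delay satisfying (10) and smallest buffer satisfying (11)."""
    N = len(p)
    for d in range(N):
        if all(r_l[n] >= delayed(p, n, d) for n in range(N)):
            B = max(r_u[n] - delayed(p, n, d) for n in range(N))
            return d, B
    raise ValueError("lower limit never reaches the playout curve on this grid")


def playable(p: Sequence[int], r: Sequence[int], d: int, B: int) -> bool:
    """True if r causes neither underflow nor overflow for delay d and size B."""
    return all(delayed(p, n, d) <= r[n] <= delayed(p, n, d) + B
               for n in range(len(p)))


def random_curve_between(r_l: Sequence[int], r_u: Sequence[int],
                         rng: random.Random) -> List[int]:
    """Draw a non-decreasing integer curve with r_l <= r <= r_u at every sample."""
    curve, prev = [], 0
    for lo, hi in zip(r_l, r_u):
        prev = rng.randint(max(lo, prev), hi)
        curve.append(prev)
    return curve


if __name__ == "__main__":
    p = [0, 4, 5, 9, 10, 10, 10, 10, 10, 10]      # hypothetical playout curve
    r_l = [0, 0, 2, 4, 6, 8, 10, 10, 10, 10]      # hypothetical lower limit
    r_u = [0, 3, 5, 7, 9, 10, 10, 10, 10, 10]     # hypothetical upper limit
    d, B = design_from_limits(p, r_l, r_u)
    rng = random.Random(1)
    print(all(playable(p, random_curve_between(r_l, r_u, rng), d, B)
              for _ in range(1000)))
```

Every curve drawn between the limits passes the check by construction, which mirrors the argument in the proof outline: the probability of staying within the limits is a lower bound on the probability of successful playout.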

III. APPLICATION TO WIRELESS VIDEO STREAMING

A. Channel Model

We briefly describe the structure of our wireless video transmission system. We assume a wireless environment with a feedback channel and assume that an a priori probabilistic model of the channel behaviour is available. The focus is on packet-based streaming to wireless receivers. We consider a communication system where application packets, e.g. video frames encapsulated in RTP/IP, are segmented into smaller link layer packets of constant payload length $C$ (in bits). The link layer packets, indexed by $i = 1, 2, \ldots, U$, are sent out at times $t = iL$. Link layer packets are either received correctly, immediately at the same time instant, or a loss of the packet is indicated. Let the probability of correct reception be $p$. In the remainder we assume that packet losses occur statistically independently. Although this seems to be very restrictive, for 3G mobile systems this is a valid assumption, as the fast power control adapts the transmission conditions such that a certain loss probability is maintained [12]. For example, for the UMTS Terrestrial Radio Access Network (UTRAN), a radio bearer using a dedicated channel and running in acknowledged mode could fulfill the requirements of recovering from lost RTP packets and having a fairly stable network throughput behaviour [13]. First of all, a dedicated channel can maintain a fixed transport channel rate on the physical layer. Secondly, when used in acknowledged mode, the probability of lost IP packets is virtually zero due to an efficient re-transmission protocol on the Radio Link Control (RLC) layer, which re-transmits only the erroneous LLC packets of an IP packet.

The delay of the re-transmission is neglected for the moment, as in general back-channel delay and re-transmission on link layers happen very rapidly. This assumption can be justified especially in scenarios where the channel propagation time of one packet is considerably smaller than the time interval between two consecutive video frames or IP packet transmissions. Moreover, in delayed feedback systems packet labeling allows received packets to be reordered. As we assume the link layer payload size $C$ and the channel propagation time to be sufficiently small, the additional feedback delay is considered to be negligible. In the following, we assume that erroneous LLC packets are re-transmitted immediately in the next available transmission slot and that re-transmissions of erroneous packets are performed until the current LLC packet is correctly received.

B. Delay and Buffer Design for Wireless Video Transmission

The presented model of wireless transmission with instantaneous re-transmission of LLC packets of payload size $C$, transmit time interval $L$, and success probability $p$ is abbreviated in the following by $W(C, L, p)$.
To define the random receiver curve $R(t)$, let $X_i$ be a random variable which describes the successful transmission of an LLC packet at time index $i$, with $X_i \in \{0, 1\}$, and let $p$ be the probability of a successful packet reception ($X_i = 1$) and $1 - p$ the probability of a lost packet ($X_i = 0$). A lost packet is immediately re-transmitted at time instant $i + 1$. Thus, we consider the $X_i$ to be i.i.d. and define the random variable $T_i$ as the number of successfully received LLC packets after $i$ link-layer transmission attempts, i.e.

$$T_i := \sum_{j=1}^{i} X_j. \tag{12}$$

With $T_i$, we define the random receiver curve for the investigated wireless channel $W(C, L, p)$ as

$$R(t) = C\,T_i \quad \forall\, t \in [iL, (i+1)L), \tag{13}$$

with $i = 0, 1, \ldots, U$. Assuming that the channel has been used $i$ times, according to our re-transmission policy $T_i \le i$ LLC packets have been received. Obviously, due to our assumption of statistically independent packet losses, $T_i$ is binomially distributed, i.e.

$$\Pr(T_i = j) := \binom{i}{j} p^j (1 - p)^{i - j}. \tag{14}$$
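To make the channel model $W(C, L, p)$ concrete, the following Python sketch draws one realization of the counts $T_i$ from (12), evaluates the staircase receiver curve (13), and computes the binomial probabilities (14). The function names, the use of a seeded `random.Random` instance, and the parameter values in the usage example are assumptions for illustration only.

```python
import math
import random
from typing import List


def simulate_received_counts(U: int, p: float, rng: random.Random) -> List[int]:
    """Return [T_0, ..., T_U]: packets received after i transmission attempts."""
    counts = [0]                                   # T_0 = 0: nothing sent yet
    for _ in range(U):
        success = 1 if rng.random() < p else 0     # X_i, i.i.d. Bernoulli(p)
        counts.append(counts[-1] + success)        # T_i = T_{i-1} + X_i
    return counts


def receiver_curve(t: float, counts: List[int], C: int, L: float) -> int:
    """R(t) = C * T_i for t in [i*L, (i+1)*L); zero before transmission starts."""
    if t < 0:
        return 0
    i = min(int(t // L), len(counts) - 1)          # hold the last value after slot U
    return C * counts[i]


def binomial_pmf(i: int, j: int, p: float) -> float:
    """Pr(T_i = j) for independent losses with success probability p, cf. (14)."""
    return math.comb(i, j) * p ** j * (1 - p) ** (i - j)


if __name__ == "__main__":
    rng = random.Random(0)                         # fixed seed for repeatability
    T = simulate_received_counts(U=100, p=0.9, rng=rng)
    print(receiver_curve(0.25, T, C=320, L=0.01))  # hypothetical C and L
    print(binomial_pmf(100, 90, 0.9))
```

Because a lost packet is simply retried in the next slot, the simulation never needs to track which packet is being sent; only the running count of successes enters the receiver curve.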

We are now interested in finding appropriate initial delays and buffer design criteria when transmitting over such a channel. Therefore, we want to find an upper limit $r_u(t)$ and a lower limit $r_l(t)$ for this random process, such that we can use Proposition 2. The following lemma is useful for finding these values.

Lemma 1: For a set of binomial random processes $T_i$ with $i = 0, 1, \ldots, U$, each with success probability $p$ according to (12), any positive constant, an upper bound

$$u_i := \left\lfloor ip + \sqrt{2p(1 - p)\, i \log(\log(i))} \right\rfloor, \tag{15}$$

a lower bound

$$l_i := \left\lfloor ip - \sqrt{2p(1 - p)\, i \log(\log(i))} \right\rfloor, \tag{16}$$

any L, such that fL by

= 1; 2;    : 8i