Telecommunication Systems 30:1/2/3, 255–277, 2005. © 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.

Radio Link Buffer Management and Scheduling for Wireless Video Streaming

GÜNTHER LIEBL, HRVOJE JENKAC
[email protected], [email protected]
Lehrstuhl für Nachrichtentechnik (LNT), Technische Universität München (TUM), D-80290 München, Germany

THOMAS STOCKHAMMER, CHRISTIAN BUCHNER
[email protected], [email protected]
Nomor Research GmbH, D-83346 Bergen, Germany

Abstract. In this paper we compare strategies for joint radio link buffer management and scheduling for wireless video streaming. Based on previous work in this area [8], we search for an optimal combination of scheduler and drop strategy for different end-to-end streaming options, including timestamp-based streaming and ahead-of-time streaming, both with variable initial playout delay. We will show that a performance gain versus the two best drop strategies in Liebl et al. [8], i.e. drop the Head-Of-Line (HOL) packet or drop the packet with the lowest priority starting from the HOL, is possible: Provided that some basic side-information on the structure of the incoming video stream is available, a more sophisticated drop strategy removes packets from an HOL group of packets in such a way that the temporal dependencies usually present in video streams are not violated. This advanced buffer management scheme yields significant improvements for almost all investigated scheduling algorithms and streaming options. In addition, we will demonstrate the importance of fairness among users when selecting a suitable scheduler, especially if ahead-of-time streaming is to be applied: Given a reasonable initial playout delay at the streaming media client, both the overall achievable quality averaged over all users and the individual quality of users with bad channel conditions can be increased significantly by trading off fairness against maximum throughput of the system.

Keywords: radio link buffer management, scheduling, wireless shared channel, video streaming

1. Introduction

Optimization and adaptation of video streaming strategies to the underlying network has become a challenging task, since streaming content is generally expected to be supplied both to wired clients and to clients which use a wireless connection, like High-Speed Downlink Packet Access (HSDPA), as the last hop in the overall transmission chain. This heterogeneous network structure results in a number of conflicting issues: On the one hand, significant performance gains for video transmission over wireless channels can be achieved by appropriate adaptation. On the other hand, optimization of the media encoding parameters or streaming server transmission strategies exclusively for wireless links will result in suboptimal performance for a wired transmission, and vice versa. Hence, in order to increase system performance significantly, so-called "cross-layer" design is required. In a wireless streaming environment a multitude of components have to be considered: Among them are the streaming server, which is located either on the Internet or in the operator's core network, the wireless streaming client, the media coding, intermediate buffering, channel resource allocation and scheduling, receiver buffering, admission control, media playout, error concealment, etc. Since the search for an optimal joint set of all parameters is usually not feasible, suboptimal solutions have to be considered which yield sufficient performance gains by jointly optimizing a subset of the above parameters.

In this work we focus on strategies for joint radio link buffer management and scheduling for incoming IP-based multimedia streams at the radio link layer. Based on the wireless shared channel scenario in Liebl et al. [8], we search for an optimal combination of scheduler and drop strategy for different end-to-end streaming options, including timestamp-based streaming and ahead-of-time streaming, both with variable initial playout delay. In addition to the previously proposed drop strategies at the radio link buffers, we will investigate the gains achievable by incorporating basic side-information on the structure of the video stream. Our advanced drop strategy removes elements from a Head-Of-Line (HOL) group of packets in such a way that the temporal dependencies usually present in video streams are not violated. We will assess the performance of this advanced buffer management scheme for an HSDPA scenario with the help of our wireless system emulator WiNe2.
Furthermore, we will demonstrate the importance of fairness among users when selecting a scheduler, especially if ahead-of-time streaming is to be applied. The remainder of this paper is organized as follows: In Section 2 we will review some preliminaries for video streaming applications, followed by an overview of a wireless multiuser system and the High-Speed Downlink Packet Access model in Section 3. The various scheduling algorithms used in this work, as well as both the existing and the newly proposed buffer management algorithms, will be explained in Section 4. Detailed results for different buffer management strategies, scheduling policies, and end-to-end streaming modes for typical test cases are given in Section 5. The paper concludes with a summary of the major issues.

2. Preliminaries and definitions for video streaming applications

2.1. End-to-end streaming system

We consider a typical end-to-end streaming scenario as described already in [8], which consists of a media server and one or more clients requesting pre-encoded multimedia data to be streamed to them over the network. The media server stores data in the form of a packet-stream, defined by a sequence of packets called data units in the following, i.e. P = {P1, P2, ...}. Each data unit Pn has a certain size rn in bits and assigned timing information in the form of a Decoding Time Stamp tDTS,n, indicating when this data unit must be decoded relative to tDTS,1. After the server has received a request from a client, it starts transmitting the first data unit P1 at time instant ts,1. The following data units Pn are equivalently transmitted at time instants ts,n. Data unit Pn is completely received at the far end at tr,n, and the interval between receiving time and sending time is called the channel delay δn = tr,n − ts,n. The latter is the most important value for a real-time application, as too long end-to-end delays result in significant performance degradation. For the sake of completeness we model the loss of a data unit by an infinite channel delay, i.e. δn = ∞. We assume in the following that if data units are received, they are correct, and that otherwise their loss or delayed arrival is detected by the use of appropriate sequence numbering. The received data unit Pn is kept in the receiver buffer until it is forwarded to the video decoder at decoding time td,n. Without loss of generality we assume that the arbitrary value of the Decoding Time Stamp of the first data unit is equivalent to the decoding time of the first data unit, i.e. tDTS,1 = td,1.

An important criterion for the performance of a video streaming system is the time between the request of a receiver for a certain stream and the time the first data unit is presented. Neglecting the delay for conveying the request from the receiver to the streaming server and the time for the streaming server to set up the streaming session, as well as assuming that the first frame is presented immediately after it has been decoded, this delay can be computed as the difference between the sending time of the first data unit, ts,1, and the decoding time of the first data unit, td,1, and is in the following defined as the initial delay δinit = td,1 − ts,1.
Then, data units which fulfill ts,n + δn ≤ tDTS,n can be decoded in time. The decoder buffer is used for de-jittering and data unit Pn is stored in it for some time tDTS,n − tr,n . In the remainder of this work we assume that the decoder buffer is sufficiently large such that no restrictions apply to the transmission time instant ts,n of data unit Pn . Small variations in the channel delay can be compensated for by this receiver-side buffer, but long-term variances result in loss of data units. Several advanced techniques have been proposed in the literature to cope with this so-called “late-loss”, which are, for example, summarized in [4]. However, state-of-the-art streaming systems available on the market in general do not apply any of these techniques yet. Therefore, we restrict ourselves in the following to an important practical scenario, but we note that generalization of our framework to more advanced technologies is easily feasible: To support a minimum amount of flexibility within the source data, we apply one of the simplest and widely-used rate adaptation schemes for pre-encoded video, namely temporal scalability. We achieve this by using a GOP structure, which includes I -frames, P-frames, and disposable B-frames.


2.2. Source abstraction for streaming

We will briefly introduce a formalized description of the encoded video following the definitions in [8]: The video encoder Qe maps the video signal s = {s1, ..., sN} onto a packet-stream P = Qe(s). We assume a one-to-one mapping between source units sn, n = 1, ..., N (i.e. video frames) and data units (i.e. packets). Therefore, each video frame sn generates exactly one data unit Pn, which can be transported separately over the network. For convenience, we define the sampling curve or encoding schedule, Bn, as the overall amount of data produced by the video encoder up to data unit n, i.e. $B_n = \sum_{j=1}^{n} r_j$. Encoding and decoding of sn with a specific video coder Q results in a reconstruction quality Qn = q(sn, Q(sn)), where q(s, ŝ) measures the rewards/costs when representing s by ŝ. We restrict ourselves in the following to the Peak Signal-to-Noise Ratio (PSNR), as it is accepted as a good measure to estimate video performance. The total average quality up to source unit n for a sequence of size N is therefore defined as

$$\bar{Q}_n(N) = \frac{1}{N} \sum_{i=1}^{n} Q_i, \qquad (1)$$

and the total quality is defined as $\bar{Q} = \bar{Q}_N(N)$. According to Chou and Miao [2], regardless of how many media objects there are in a multimedia presentation, and regardless of what algorithms are used for encoding and packetizing those media objects, the result is a set of data units for the presentation which can be represented as a directed acyclic graph. If such a set of data units is received by the client, only those data units whose ancestors have all also been received can be decoded. We will use this structure in the definition of a straightforward concealment algorithm: In case of a lost data unit, the corresponding source unit is represented by the timely-nearest received and reconstructed source unit (i.e. a direct or indirect ancestor). If there is no preceding source unit, e.g. for I-frames, the lost source unit is concealed with a standard representation, e.g. a grey image. In case of consecutive data unit loss, the concealment is applied recursively. The concealment quality $\tilde{Q}_n(i)$, if sn is represented with si, is defined as $\tilde{Q}_n(i) = q(s_n, Q(s_i))$.

Therefore, we express the importance of each data unit Pn as the amount by which the quality at the receiver increases if the data unit is correctly decoded, i.e.

$$I_n = \frac{1}{N} \left( Q_n - \tilde{Q}_n(c(n)) + \sum_{\substack{i=n+1 \\ n \prec i}}^{N} \left[ \tilde{Q}_i(n) - \tilde{Q}_i(c(n)) \right] \right), \qquad (2)$$

with c(n) the number of the concealing source unit for sn, and n ≺ i indicating that i depends on n due to the concealment strategy. Additionally, the concealment quality $\tilde{Q}_n(0)$ means that source unit sn is concealed with a standard representation. Hence, the overall quality can alternatively be computed as

$$\bar{Q} = \frac{1}{N} \sum_{n=1}^{N} Q_n = Q_0 + \sum_{n=1}^{N} I_n, \qquad (3)$$

with Q0 the minimum quality if all frames are presented as grey. Hence, quality is incrementally additive with respect to the partial order in the dependency graph.

2.3. Streaming parameters and performance criteria

In the following we will give a brief overview of how to evaluate the performance of streaming video over lossy and variable delay channels. The video decoder might experience the absence of certain data units Pn in the decoding process due to one of three different reasons:

1. The data unit can be lost due to impairments on the mobile radio channel or buffer overflow in the network, i.e. δn = ∞;
2. it arrives after its decoding time has expired such that it is no longer useful ("late-loss"), i.e. δn > tDTS,n − ts,n;
3. the server has not even attempted to transmit the unit.

Whereas the former two reasons mainly depend on the channel, the latter can be viewed as temporal scalability and a simple means for offline rate control. In what follows we assume that the server transmits all data units of the stream no later than at their nominal sending time ts,n, which is computed from tDTS,n and the sending time of the first data unit, ts,1.

Another important parameter in our end-to-end streaming system is the initial delay δinit at the client. On the one hand, this value should be kept as low as possible to avoid an annoying startup delay for the end user. On the other hand, a longer initial delay can compensate for larger variations in the channel delay and reduce late-loss. Since we have ruled out more advanced transmitter and receiver strategies, given a sequence of channel delays δ = {δ1, ..., δN} for the data units Pn, n = 1, ..., N, and a predefined initial delay δinit, the evaluation of the single-user streaming performance reduces to

$$Q(\boldsymbol{\delta}, \delta_{\mathrm{init}}) = Q_0 + \sum_{n=1}^{N} I_n \, \mathbf{1}\{\delta_n \leq \delta_{\mathrm{init}}\} \prod_{\substack{m=1 \\ m \prec n}}^{n-1} \mathbf{1}\{\delta_m \leq \delta_{\mathrm{init}}\}, \qquad (4)$$

where 1{A} denotes the indicator function, being 1 if A is true and 0 otherwise. As a second measure of interest we use the percentage of lost data units, P(δ, δinit), which is computed as

$$P(\boldsymbol{\delta}, \delta_{\mathrm{init}}) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\{\delta_n > \delta_{\mathrm{init}}\}. \qquad (5)$$
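To make the evaluation concrete, the computation behind Eqs. (4) and (5) can be sketched in a few lines of Python. This is a minimal illustration of the indicator-function logic; the function name, the list-based ancestor representation, and the toy numbers are our own and not part of the system described here:

```python
import math

def evaluate_stream(delays, importances, ancestors, d_init, q0):
    """Single-user streaming performance, cf. Eqs. (4) and (5).

    delays[n]      -- channel delay delta_n of data unit n (math.inf = lost)
    importances[n] -- importance I_n of data unit n, cf. Eq. (2)
    ancestors[n]   -- indices m with m preceding n in the dependency graph
    d_init         -- initial playout delay delta_init
    q0             -- base quality Q_0 (all frames concealed as grey)
    """
    n_units = len(delays)
    on_time = [delays[n] <= d_init for n in range(n_units)]
    quality = q0
    for n in range(n_units):
        # Eq. (4): a unit contributes I_n only if it AND all of its
        # ancestors arrived in time (the product of indicator functions).
        if on_time[n] and all(on_time[m] for m in ancestors[n]):
            quality += importances[n]
    # Eq. (5): fraction of data units that are late or lost.
    loss_rate = sum(1 for ok in on_time if not ok) / n_units
    return quality, loss_rate

# Toy example: unit 1 is lost, so unit 2 (which depends on it) cannot
# contribute even though it arrived in time.
q, p = evaluate_stream(
    delays=[0.1, math.inf, 0.2],
    importances=[1.0, 1.0, 1.0],
    ancestors=[[], [0], [1]],
    d_init=0.5,
    q0=30.0,
)
# q == 31.0 (only unit 0 contributes), p == 1/3
```

Note how the dependency check makes quality incrementally additive with respect to the partial order, exactly as stated after Eq. (3).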

To evaluate the performance of video streaming applications it is essential to obtain a realistic distribution of the channel delay sequence δ. However, for complex systems this sequence is neither deterministic nor can it be captured by a simple statistical description as, for example, suggested in Chou and Miao [2] for the modeling of Internet packet delays. The approach taken to simulate the channel delay sequences for a wireless multiuser system will be discussed next.

3. Streaming in a wireless multiuser environment

3.1. General system overview

Figure 1 shows a simplified model of a wireless multiuser streaming environment. We assume that M users in the serving area of a base station of a mobile system have requested to stream multimedia data from one or several streaming servers. In the following we are exclusively interested in the downlink of this system. We assume that the core network is over-provisioned such that congestion is not an issue on the backbone. The streaming server forwards the packets directly into the radio link buffers, where packets are kept until they are transmitted over a shared wireless link to the media clients. A scheduler then decides which users can access the wireless system resources (bandwidth and transmit power), and a resource allocation unit integrated in the scheduler assigns these resources appropriately. Obviously, for the same available resources different users can transmit different amounts of data; e.g. a user close to the base station can use a coding and modulation scheme which achieves a higher bit-rate than one at the boundary of the serving area.

Figure 1. Simplified multiuser streaming system.

In general, the performance of the streaming system depends significantly on many parameters, such as the buffer management, the scheduling algorithm, the resource allocation, the bandwidth and power share, the number of users, etc. In the remainder of this paper we will concentrate on the first two aspects and investigate their influence for a specific multiuser system, which is briefly introduced in the next section.

3.2. System example: HSDPA

The system under investigation in this work is High-Speed Downlink Packet Access, which is part of the Universal Mobile Telecommunications System (UMTS) Release 5 specification and has been introduced to increase the packet data throughput to mobile terminals significantly while optimizing the resource allocation efficiency at the same time [5]. The key new features compared to standard UMTS packet transmission modes are the use of adaptive modulation and coding to perform link adaptation instead of fast power control, fast Layer-1 hybrid ARQ with transmission combining, as well as fast scheduling directly at the Node-B on a very short time-scale of 2 ms. Thus, much of the signal processing previously performed in the radio network controller has been moved as close to the air interface as possible to allow immediate reaction to varying channel characteristics. As a result, HSDPA seems very well-suited to accommodate the high data rate and low latency requirements of streaming applications.

3.3. Extension of performance criteria to wireless multi-user streaming

In principle, each single end-to-end streaming connection behaves as discussed in Section 2. However, in case multiple users attempt to stream data over a wireless shared link, the channel delay sequences δm of each individual user depend not only on the variance and dynamics of the wireless channel, but to a large extent also on the scheduling and buffer management algorithms applied at the base station.


For this reason, we have used a straightforward extension of the performance criteria for the single-user system by averaging the values defined in Section 2.3 over all users, i.e.

$$Q(M, \delta_{\mathrm{init}}) = \frac{1}{M} \sum_{m=1}^{M} Q(\boldsymbol{\delta}_m, \delta_{\mathrm{init}}), \qquad (6)$$

$$P(M, \delta_{\mathrm{init}}) = \frac{1}{M} \sum_{m=1}^{M} P(\boldsymbol{\delta}_m, \delta_{\mathrm{init}}). \qquad (7)$$

4. Scheduling and buffer management strategies

4.1. Scheduling

An important component of every wireless multiuser system is the scheduling unit in the base station, which is typically not standardized and therefore allows for vendor-specific optimization. The scheduling unit often consists of two functional elements, a resource allocator and a main scheduler stage: Given a certain resource budget (e.g. transmission power and number of spreading codes in HSDPA), the task of the resource allocator is to determine an allocation for each active flow that best utilizes this budget. The scheduler then selects one or several flows for each Transmission Time Interval (TTI) according to a certain scheduling policy, possibly taking into account information available from the channel and the application. Several scheduling algorithms for wireless multiuser systems have already been proposed in the literature [1,3,7]. We will briefly review and characterize them in this section. For more details on the algorithms we refer the interested reader to the provided references.

4.1.1. Basic scheduling strategies
Well-known wireless and fixed network scheduling algorithms include, for example, the Round Robin scheduler, which serves users and flows cyclically without taking into account any information from the channel or the traffic characteristics.

4.1.2. Channel-state dependent schedulers
The simplest, but also most appealing, idea for wireless shared channels (in contrast to fixed network schedulers) is the exploitation of the channel state of individual users. Obviously, if the flow of the user with the best receiving conditions (e.g. highest signal-to-noise-and-interference ratio) is selected at any time instant, the overall system throughput is maximized. This scheduler is therefore referred to as the Maximum Throughput (MT) scheduler and may be the most appropriate if throughput is the measure of interest. However, as users with bad receiving conditions are blocked, some fairness is often required in the system.
For example, the Proportional-Fair policy schedules the user with the currently highest ratio of actual to average throughput.
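The selection rules of these two channel-state dependent schedulers can be contrasted in a few lines. The following Python sketch uses our own naming and simplifications; a real HSDPA scheduler would additionally track the average throughput with an exponential window and skip flows with empty queues:

```python
def select_mt(rates):
    """Maximum Throughput: serve the user with the best channel,
    i.e. the highest currently achievable rate."""
    return max(range(len(rates)), key=lambda m: rates[m])

def select_pf(rates, avg_tput, eps=1e-9):
    """Proportional Fair: serve the user with the highest ratio of
    currently achievable rate to long-term average throughput."""
    return max(range(len(rates)), key=lambda m: rates[m] / (avg_tput[m] + eps))

# A user close to the base station (user 0) always wins under MT,
# while PF eventually serves the cell-edge user (user 1) as well:
rates = [10.0, 2.0]       # achievable rates in this TTI
avg_tput = [100.0, 1.0]   # long-term average throughput so far
# select_mt(rates) == 0, select_pf(rates, avg_tput) == 1
```

The example illustrates the trade-off discussed above: MT maximizes cell throughput but starves users with bad receiving conditions, whereas PF trades some throughput for fairness.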


4.1.3. Queue-dependent schedulers
The previously presented algorithms do not take into account the buffer fullness at the entrance of the wireless system, except that flows without any data to be transmitted are excluded from the scheduling process. Queue-dependent schedulers exploit exactly this information; e.g. the Maximum Queue (MQ) scheduler selects the flow whose Head-Of-Line (HOL) packet currently has the largest waiting time in the queue.

4.1.4. Hybrid scheduling policies
It has been recognized that it might be beneficial to take both criteria, the channel state information and the queue information, into account in the scheduling algorithm. In Shakkottai and Stolyar [10] hybrid algorithms have been proposed under the acronyms Modified Largest Weighted Delay First (MLWDF) and Exponential Rule, which yield the most promising results among the standard solutions, but require a clever choice of gain factors and thresholds.

4.1.5. Application-aware schedulers
Experimental results with more advanced schedulers are already available [6,9]. These schedulers take into account the special properties of streaming media flows and yield significant improvements over the standard solutions described above. However, any scheduler of this type requires a significant amount of side-information that has to be passed from the application layer to the scheduling unit in the base station. Therefore, we will not consider them in this paper.

4.2. Radio link buffer management

The radio link buffer management controls the fill process of the respective buffer of each flow at the base station. Note that we assume that this buffer stores entire IP packets, which are in our case abstracted to data units (sent by the streaming server). For better insight into the problem, we assume that a single radio link buffer can store N data units, independent of their size (in practical systems, however, the physical memory is shared among the buffers and thus the individual packet size matters).
If the radio link buffers are not emptied fast enough, because the channel quality is too bad and/or too many streams are competing for the common resources, the wireless system approaches or even exceeds its capacity. When the buffer fill level of individual streams approaches the buffer size N, data units in the queue have to be dropped. We will present and discuss several possible buffer management strategies in the following:

4.2.1. Infinite buffer size (IBS)
Each radio link buffer has infinite buffer size N = ∞, which guarantees that the entire stream can be stored in this buffer. No packets are dropped, resulting in only delayed data units at the streaming client. This is the standard procedure for a system with sufficient physical memory for a practical number of parallel streams.


4.2.2. Drop new arrivals (DNA)
Only N packets are stored in the radio link buffer. In case of a full queue, newly arriving packets are dropped and are therefore lost. Note that this is the standard procedure applied in a variety of elements in a wired network, e.g. routers.

4.2.3. Drop random packet (DRP)
Similar to DNA, but instead of dropping the newly arrived packet we randomly pick a packet in the queue to be dropped. The incoming packet is enqueued at the last position. This strategy is somewhat uncommon, but we have included it here since all other possibilities are only specific deterministic variants of it.

4.2.4. Drop HOL packet (DHP)
Same as DRP, but instead of a random pick we drop the Head-Of-Line (HOL) packet, i.e. the packet which has resided longest in the buffer. The incoming packet is enqueued at the last position. This is motivated by the fact that streaming media packets usually have a deadline associated with them. Hence, in order to avoid inefficient use of channel resources for packets that are subject to late-loss at the media client anyway, we drop the packet with the highest probability of deadline violation.

4.2.5. Drop priority based (DPB)
Similar to DHP, but priority information is exploited. Assuming that each data unit has been assigned priority information, we drop the one with the lowest priority which has resided longest in the buffer. The incoming packet is again enqueued at the last position. Our motivation here is the fact that sophisticated media codecs, like H.264/AVC [14], provide options to indicate the importance of certain elements in a media stream on a very coarse scale. Hence, this side-information should be used to first remove those packets which do not affect the end-to-end quality significantly [8].

4.2.6. Drop dependency based (DDB)
The major drawback of both DHP and DPB is obvious: Starting from the HOL of the buffer when dropping medium to high priority packets is suboptimal due to the inherent temporal dependencies within the media stream.
For example, if I- or P-frames in a GOP are removed, all the remaining frames in the GOP are useless at the media decoder and have to be concealed anyway. Thus, leaving them in the buffer and transmitting them leads to inefficient utilization of the scarce wireless resources. Provided that basic side-information on the structure of the incoming media stream is available to the buffer management (e.g. the GOP structure and its relation to packet priorities), an optimized strategy operates on the HOL group of packets with interdependencies: While all low priority packets can be deleted starting from the beginning of the HOL group, any medium or high priority packets should be first removed from the end of the HOL group to avoid broken dependencies. Since the structure of the media stream is usually fixed during one session, the buffer management only has to determine this information once during the setup procedure, which we believe to be feasible at least in future releases of wireless systems.
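Assuming the priority labels introduced later in Section 5 (0 for disposable B-frames, higher values for referenced P- and I-frames), the DDB drop decision could look as follows. This is a hypothetical sketch of our own; the paper does not prescribe a concrete implementation, and the group-id bookkeeping is an assumption:

```python
def ddb_drop_index(priorities, group_id):
    """Pick the queue index of the packet to drop under DDB.

    priorities -- list of packet priorities, index 0 = Head-Of-Line (HOL)
    group_id   -- GOP-group id per packet (ids assumed unique per GOP);
                  the HOL group is the set of packets sharing group_id[0]
    """
    hol_group = [i for i, g in enumerate(group_id) if g == group_id[0]]
    # Low-priority (disposable) packets may be removed starting from
    # the beginning of the HOL group ...
    for i in hol_group:
        if priorities[i] == 0:
            return i
    # ... but referenced frames are removed from the END of the group,
    # so that no packet left in the buffer loses an ancestor.
    return hol_group[-1]

# GOP I B B P ...: the first disposable B-frame (index 1) goes first;
# if none is left, the last referenced frame of the HOL group is dropped.
# ddb_drop_index([3, 0, 0, 1], [7, 7, 7, 7]) == 1
# ddb_drop_index([3, 1, 2], [7, 7, 7]) == 2
```

In contrast, DHP would always drop index 0 (here the I-frame), rendering the rest of the GOP useless at the decoder.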


4.3. Streaming server rate control

Obviously, the radio link buffer management can also be viewed as a simple dynamic means to reduce the incoming media stream to a sustainable data rate at lower quality. We will assume in our analysis that no external dynamic end-to-end rate control is performed between the media server and the client. Although this assumption might not hold for more advanced streaming systems, the performance results without rate control can be easily transferred to rate-controlled systems. Therefore, we will only be concerned with two server modes:

4.3.1. Timestamp-based streaming (TBS)
In case of TBS the data units Pn are transmitted exactly at their sending time ts,n. If the radio link buffer is emptied faster than new data units arrive, it may underrun. In this case, the flow is temporarily excluded from the scheduling.

4.3.2. Ahead-of-time streaming (ATS)
In contrast, in case of ATS, the streaming server is notified that the radio link buffer can still accept packets. Hence, it forwards data units to the radio link buffer even before their nominal sending time ts,n, such that the radio link buffer never underruns and all flows are always considered by the scheduler. However, the streaming server has to forward each data unit no later than at ts,n regardless of the fill level notification. Thus, a drop strategy at the radio link buffer is still required even for ATS. Note that this mode requires pre-recorded streams at the server and a sufficiently large decoder buffer at the media client.
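The difference between the two server modes reduces to the question of when a data unit is handed to the radio link buffer. A minimal sketch in our own formulation, with `t_room` standing in for the buffer's free-space notification, which the paper only describes qualitatively:

```python
def forwarding_time(t_nominal, t_room, mode):
    """Time at which the server forwards a data unit to the radio link buffer.

    t_nominal -- nominal sending time t_{s,n}
    t_room    -- earliest time the radio link buffer reports free space
    mode      -- "TBS" or "ATS"
    """
    if mode == "TBS":
        # Timestamp-based: forward exactly at the nominal sending time.
        return t_nominal
    # Ahead-of-time: forward as soon as there is room, but never later
    # than the nominal sending time (any resulting overflow is handled
    # by the drop strategy of the radio link buffer management).
    return min(t_room, t_nominal)

# ATS sends early when the buffer has room, TBS never does:
# forwarding_time(5.0, 2.0, "ATS") == 2.0
# forwarding_time(5.0, 2.0, "TBS") == 5.0
# forwarding_time(5.0, 7.0, "ATS") == 5.0
```

The `min(...)` in the ATS branch captures the rule that the server must forward a data unit no later than at its nominal sending time, regardless of the fill level notification.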

5. Experimental results

5.1. Definition of test scenario

5.1.1. Wireless system part
We have used the same HSDPA scenario as in Liebl et al. [8] for our performance evaluations, which has been implemented with the use of our WiNe2 real-time network emulator [11]. We consider a hexagonal cellular layout with one serving base station (called Node-B in HSDPA) and 8 tiers of interfering base stations. The cell radius is 1 km and a Node-B serves three sectors of 120 degrees each. The propagation environment is of type Pedestrian-B as specified in [15]. The Node-B transmit power per sector is limited to 20 W, of which 90% are available for HSDPA. Together with the 15 possible spreading codes they represent the resource budget to be assigned by the scheduler per TTI. All user terminals apply HARQ with chase combining at the receiver, and the maximum number of transmissions of one radio link packet is 4. A total of M = 10 users are attached to the serving Node-B; they are placed randomly inside one sector and move at a speed of 3 km/h. By using appropriate mobility models we ensure that users do not enter or exit the serving area during a simulation run. The average signal-to-noise-and-interference ratio (SNIR) of each user over the whole simulation duration is given in Table 1.

Table 1. Average SNIR of all 10 users.

    User 1:  12.9 dB        User 6:   7.6 dB
    User 2:  10.7 dB        User 7:  14.5 dB
    User 3:  17.9 dB        User 8:   8.5 dB
    User 4:   9.7 dB        User 9:  16.8 dB
    User 5:  18.0 dB        User 10: 10.0 dB

5.1.2. Video streaming part
Each user has requested a streaming service for the same QCIF sequence of length N = 2698 frames with alternating speakers and sport scenes, which has been pre-encoded at a frame rate of 30 frames per second using the H.264/AVC [14] test model software JM4.0 with a single quantization parameter of 28 and no rate control enabled. The GOP structure is IBBPBBP...I, with an I-frame distance of 1 second. The I-frames have the Instantaneous Decoder Refresh (IDR) property; the B-frames are not referenced and are therefore disposable. The resulting PSNR is Q(N) = 36.98 dB, and the average bit-rate over the whole sequence duration is 178.5 kbit/s.

Figure 2 shows the normalized sampling curve Bn/BN and the normalized cumulative importance Q̄n(N)/Q̄ versus the DTS tDTS,n. Due to the Variable Bit-Rate encoding it is obvious that the PSNR increases almost linearly, whereas the sampling curve has

periods with slow increase, corresponding to the low-motion speaker scenes, and periods with fast increase for the sport scenes. In addition, especially during the low-motion parts, a staircase behavior at I-frame positions can be observed.

Figure 2. Normalized sampling curve Bn/BN and normalized cumulative importance Q̄n(N)/Q̄ for the 90 seconds test sequence with N = 2698, Q(N) = 36.9833 dB, and BN = 2008271 bytes, resulting in an average bit-rate of about 178.5 kbit/s.

Each encoded video frame is packetized into a single Network Abstraction Layer (NAL) unit, which itself is encapsulated into a Real-Time Transport Protocol (RTP) packet according to Wenger et al. [13]. The overhead for the RTP header and the NAL header is considered in the sampling curve. Therefore, each RTP packet with inherent timing information tDTS,n and packet size rn corresponds to a data unit Pn. In addition, the two-bit NAL Reference Identification (NRI) field in the NAL unit header is set according to the frames' importance In: disposable B-frames obtain the value 0, IDR frames the value 3, half of the P-frames the value 1, and the other half the value 2. For each user the sequence starts at a random initial frame and is looped six times, such that nine minutes of streaming service, resulting in about 18,000 data units, are simulated.

5.1.3. Evaluation criteria
We evaluate the performance in terms of the average PSNR Q(M = 10, δinit) and the average error rate P(M = 10, δinit) over all users versus the initial delay δinit, which is varied between 0 and 6 seconds. In addition, we compare the performance of different users based on their individual PSNR Q(δm, δinit) for selected parameter settings. All presented results with limited buffer size assume a restriction of N = 30 packets (i.e. 1 second of video) for the radio link buffers. However, the tendencies inherent in the diagrams also hold for larger buffer sizes. In addition, since IBS combined with ATS makes no sense, we only fill up each buffer to at most N = 30 for this combination.

5.2. Buffer management performance for MT scheduling

In a first set of experiments we use a maximum throughput (MT) scheduler, as a key motivation of HSDPA is the exploitation of the channel state to maximize the overall throughput. In figures 3(a) and (b) we compare the average PSNR over all users under this scheduling policy for different buffer management strategies and both TBS and ATS. It is obvious that regardless of the drop and streaming strategy, the performance of the system increases with larger initial delay, as the probability of late-loss decreases. As the system is overloaded (by about 30%), in case of IBS the fullness of the radio link buffers increases over the length of the streams. Since no dropping is performed, users with bad channel conditions experience significant HOL blocking, and an excessive initial delay at the media client is required for sufficient performance of the streaming application for both TBS and ATS. However, it is worth noting that due to the maximum throughput scheduler at least some users, namely those close to the base station, are served with good quality, but the


LIEBL ET AL.

Figure 3. (a) and (b) Average PSNR Q(M = 10, δinit) versus initial delay δinit for MT scheduling and buffer management strategies.

weaker users experience excessively high channel delays in this setup. This is especially evident in figure 3(b): due to the persistent occupation of the channel by the good users (who always have data to send in case of ATS), the quality for very short initial playout delays is already quite high, but then increases only very slowly.
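The maximum-throughput rule underlying these results can be sketched as follows, assuming per-user knowledge of the instantaneous supported rate. This is a minimal illustration with hypothetical names, not the HSDPA scheduler implementation used in the simulations.

```python
def mt_schedule(users):
    """One maximum-throughput (MT) scheduling decision per TTI.

    'users' maps a user id to (instantaneous supported rate in bit/s,
    queued bytes in the radio link buffer). Among the users with data
    queued, serve the one whose channel currently supports the highest
    rate; queue state and fairness are ignored entirely.
    """
    backlogged = {uid: rate for uid, (rate, queued) in users.items() if queued > 0}
    if not backlogged:
        return None                 # nothing to transmit this TTI
    return max(backlogged, key=backlogged.get)
```

This greedy rule reflects the behavior observed above: users close to the base station monopolize the channel as long as they are backlogged, which under ATS is essentially always.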


In order to improve the overall system performance it is beneficial to drop data units already at the radio link buffers (irrespective of the strategy), so as to reduce the excess load at the air interface and convert late-losses into controlled packet removals: By keeping the radio link buffer size within a reasonable range, at least the in-time delivery of a temporally scaled version of the video stream can be guaranteed. The simplest strategy, DNA, i.e. dropping the newly arrived packet, shows some gain, but is still worse than dropping packets randomly. The by far best performance is obtained by applying our proposed DDB algorithm, which is optimal for both TBS and ATS: While DHP and DPB intersect at relatively large initial delay, the curve for DDB shows good performance over the whole range of delay values, especially in case of ATS (see figure 3(b)).

Furthermore, an interesting observation can be made for all drop strategies under TBS, for which the performance curves consist of two parts: If the radio link buffer size is larger than the initial delay (the leftmost part), an almost linear gain can be achieved by increasing the latter. However, as soon as the initial delay matches the radio link buffer size, a further increase has no effect, since the majority of packets is now dropped due to the finite buffer size and not due to deadline expiration at the media client. Thus, the PSNR curve runs into saturation (remains flat over the initial delay). This can also be seen in figure 4(a), where the average probability of lost data units over all users, i.e. P(M = 10, δinit), is plotted versus the initial delay: While the loss rate for IBS is intolerably high over the whole range of delay values, the curves for DDB are promising and explain the good PSNR results. In case of ATS, the slow increase in PSNR over the initial delay is also reflected in the slow decrease of the loss rate in figure 4(b).
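A dependency-aware drop decision in the spirit of DDB can be sketched as follows. The HOL group size, the packet representation, and the use of the NRI value as the sole dependency indicator are assumptions of this sketch, not the exact algorithm evaluated in the paper.

```python
def ddb_drop(buffer, group_size=10):
    """Pick a drop victim from a full radio link buffer, DDB-style.

    'buffer' is the FIFO buffer as a list of (seq_no, nri) tuples, with
    NRI priorities as set by the server (0 = disposable B-frame, ...,
    3 = IDR). Within the head-of-line group, drop the oldest packet of
    the lowest NRI class present, so that frames which later packets
    depend on survive as long as possible. If all packets in the group
    share one class, this degenerates to dropping the HOL packet.
    """
    if not buffer:
        return None
    hol_group = buffer[:group_size]
    lowest = min(nri for _, nri in hol_group)
    for i, (_, nri) in enumerate(hol_group):
        if nri == lowest:
            return buffer.pop(i)    # remove and return the victim
```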

5.3. Buffer management performance for MLWDF scheduling

In this section we evaluate the performance of our drop strategies for a hybrid scheduler that has been designed to achieve a fair trade-off between channel and queue state. The first interesting observation is that IBS should never be combined with an MLWDF scheduler: Both figures 5(a) and (b) show disastrous consequences in terms of the achievable average PSNR over all users. This is also reflected in the respective average loss rates over all users in figures 6(a) and (b). It can be concluded that considering the delay of the HOL packet in the scheduling metric leads to large performance degradations for all users in the system if the amount of HOL blocking is not limited. On the other hand, if the radio link buffer size is limited, applying our DDB algorithm yields either close-to-optimum (TBS) or strictly optimum (ATS) performance. If one compares the slopes of the performance curves for the MT and the MLWDF scheduler, e.g. in figures 3(a) and 5(a), it is easy to see that they differ considerably. Hence, we will investigate the influence of the scheduler on the performance of the streaming application in the next section.
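The kind of hybrid metric evaluated in this section can be sketched with the M-LWDF rule of Andrews et al. [1]: serve the backlogged user maximizing γ_i · W_i(t) · r_i(t), where W_i is the HOL delay, r_i the instantaneous supported rate, and γ_i = −log(δ_i)/T_i for a delay bound T_i and allowed violation probability δ_i. The parameter layout below, and normalizing r_i by its long-term average, are assumptions of this sketch.

```python
from math import log

def mlwdf_schedule(users):
    """One M-LWDF scheduling decision over backlogged users.

    'users' maps a user id to (hol_delay_s, rate_bps, avg_rate_bps,
    delay_bound_s, loss_prob). Both channel state (the normalized rate)
    and queue state (the HOL delay) enter the metric, which is the
    source of the channel/queue trade-off discussed in the text.
    """
    def metric(entry):
        hol_delay, rate, avg_rate, bound, loss = entry
        gamma = -log(loss) / bound
        return gamma * hol_delay * rate / avg_rate

    return max(users, key=lambda uid: metric(users[uid]))
```

With this metric a user whose HOL packet has waited long can win the slot even against a user with a better channel, which illustrates why unbounded HOL delays under IBS degrade the scheduling decisions for all users.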


Figure 4. (a) and (b) Average loss rates P(M = 10, δinit) versus initial delay δinit for MT scheduling and buffer management strategies.


Figure 5. (a) and (b) Average PSNR Q(M = 10, δinit) versus initial delay δinit for MLWDF scheduling and buffer management strategies.


Figure 6. (a) and (b) Average loss rates P(M = 10, δinit) versus initial delay δinit for MLWDF scheduling and buffer management strategies.


5.4. Scheduler comparison and fairness

Figures 7(a) and (b) contain the average PSNR over all users for six different combinations of scheduler and optimal drop strategy under TBS and ATS. It is interesting to note

Figure 7. (a) and (b) Average PSNR Q(M = 10, δinit) versus initial delay δinit for different schedulers and optimal buffer management strategy.


Figure 8. (a) and (b) PSNR Q(δm, δinit) versus initial delay δinit of the best, worst, and average user for MT scheduling and DDB.

that, apart from the combination of the MQ scheduler with TBS, optimal buffer management is achieved in all other cases by choosing our DDB algorithm. The question of which scheduler to use, however, is not as simple to answer, but largely depends on whether the initial playout delay will be larger than the buffer size or not: For very


Figure 9. (a) and (b) PSNR Q(δm, δinit) versus initial delay δinit of the best, worst, and average user for MLWDF scheduling and DDB.

short initial playout delays, the MT scheduler performs best, while all other schedulers with queue-based metrics are not very efficient. For reasonable initial playout delays larger than one second, the MLWDF scheduler is the better choice. However, the type of scheduler has to be chosen at system startup, without knowledge of the individual initial playout delays.


A way to do this is to also consider the fairness among the users in the system by looking at the PSNR of the best, worst, and medium ("average") user, depicted in figures 8(a) and (b) and 9(a) and (b) for both MT and MLWDF: While the MT scheduler favors both the best and the medium user by suppressing the bad user significantly, the MLWDF scheduler tries to achieve a trade-off between maximum throughput of the system and fairness. Hence, the quality of the best and the medium user is reduced by a small amount, while the system tries to supply the bad user with sufficient quality as well. This is especially evident in case of ATS (figures 8(b) and 9(b)), where the gain of the bad user is large compared to the decrease for the other two. Obviously, for reasonable initial playout delays, this support of the bad users also results in an increase in average quality over all users, since more of them contribute to it.

We want to note that more advanced priority-based scheduling policies are expected to improve the performance of DDB further, since long delays of important frames then become less likely. Furthermore, a deadline-based scheduling policy may allow ATS to outperform TBS by always operating at the maximum capacity of the channel. For issues regarding the performance gain versus the complexity see, for example, [6,9].

6. Conclusion

In this paper we have investigated strategies for joint radio link buffer management and scheduling for video streaming over wireless shared channels. As a straightforward extension of previous work, we have proposed a more sophisticated drop strategy at the radio link buffer that incorporates side-information on the temporal dependency structure of typical video streams. Except for one combination of scheduler and streaming mode, our newly proposed algorithm provides optimal performance over the whole range of (unknown) initial playout delays. Since the side-information only has to be determined once during the setup phase (e.g. by adjusting to the fixed priority structure within a GOP), we consider it to be feasible within future releases of wireless systems like HSDPA.

Furthermore, our investigations have given us valuable insight into the applicability of certain types of schedulers for wireless video streaming: In particular, we showed that the combination of a queue-state dependent scheduler with infinite radio link buffer size may lead to disastrous results for all users. However, the combination of DDB with a hybrid, i.e. both channel- and queue-dependent, scheduler yields a good trade-off between average quality over all users and fairness among individual users. Since including side-information on the video stream in the buffer management has proven to be successful, an extension that also makes part of it available to the scheduler is the subject of ongoing research at our institute.

References

[1] M. Andrews, K. Kumaran, K. Ramanan, A. Stolyar and P. Whiting, Providing quality of service over a shared wireless link, IEEE Communications Magazine 39 (2001) 150–154.

[2] P.A. Chou and Z. Miao, Rate-distortion optimized streaming of packetized media, Tech. Rep. MSR-TR-2001-35, Microsoft Research (February 2001).
[3] H. Fattah and C. Leung, An overview of scheduling algorithms in wireless multimedia networks, IEEE Wireless Communications 9(5) (2002) 76–83.
[4] B. Girod, M. Kalman, Y.J. Liang and R. Zhang, Advances in channel-adaptive video streaming, in Proc. IEEE International Conference on Image Processing (Rochester, NY, USA, September 2002).
[5] H. Holma and A. Toskala, WCDMA for UMTS (John Wiley & Sons, Inc., New York, NY, USA, 2002).
[6] M. Kalman and B. Girod, Optimized transcoding rate selection and packet scheduling for transmitting multiple video streams over a shared channel, in Proc. ICIP 2005 (Genoa, Italy, September 2005).
[7] S.H. Kang and A. Zakhor, Packet scheduling algorithm for wireless video streaming, in Proc. International Packet Video Workshop 2002 (Pittsburgh, USA, April 2002).
[8] G. Liebl, H. Jenkac, T. Stockhammer and C. Buchner, Radio link buffer management and scheduling for video streaming over wireless shared channels, in Proc. Packet Video Workshop 2004 (Irvine, CA, USA, December 2004).
[9] G. Liebl, M. Kalman and B. Girod, Deadline-aware scheduling for wireless video streaming, in Proc. ICME 2005 (Amsterdam, NL, July 2005).
[10] S. Shakkottai and A.L. Stolyar, Scheduling algorithms for a mixture of real-time and non-real-time data in HDR, in Proc. 17th International Teletraffic Congress (ITC-17) (Salvador, Brazil, September 2001).
[11] T. Stockhammer, G. Liebl, H. Jenkac, P. Strasser, D. Pfeifer and J. Hagenauer, WiNe2—Wireless network demonstration platform for IP-based real-time multimedia transmission, in Proc. Packet Video Workshop 2003 (Nantes, France, April 2003).
[12] R.S. Tupelly, J. Zhang and E.K. Chong, Opportunistic scheduling for streaming video in wireless networks, in Proc. 37th Annual Conference on Information Sciences and Systems (Baltimore, MD, USA, March 2003).
[13] S. Wenger, M. Hannuksela, T. Stockhammer, M. Westerlund and D. Singer, RTP payload format for H.264 video, Internet Engineering Task Force (IETF), Internet Draft (work in progress), draft-ietf-avt-rtp-h264-04.txt (February 2004).
[14] T. Wiegand, G. Sullivan, G. Bjontegaard and A. Luthra, Overview of the H.264/AVC video coding standard, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on the H.264/AVC Video Coding Standard, 13(7) (2003).
[15] 3GPP TR 101.112 (UMTS v3.2.0), Selection Procedure for the Choice of Radio Transmission Technologies of the UMTS, ETSI (1998).