Dynamic Quality Adaptation and Bandwidth Allocation ...

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TWC.2017.2756887, IEEE Transactions on Wireless Communications

Dynamic Quality Adaptation and Bandwidth Allocation for Adaptive Streaming over Time-Varying Wireless Networks

Yashuang Guo, Qinghai Yang, F. Richard Yu, and Victor C.M. Leung

Abstract—Dynamic adaptive bitrate (ABR) streaming has recently been widely deployed in wireless networks. It, however, does not impose adaptation logic for selecting the quality of video chunks for mobile users. In this paper, we propose a two time-scale resource optimization scheme for ABR streaming over wireless networks under time-varying channels. Our proposed resource optimization scheme takes into account three key factors that make a critical impact on quality of experience (QoE) of ABR streaming, including video quality, quality variation and video rebuffer. Lyapunov optimization technique is employed to maximize the QoE of users by dynamically adapting the video quality at the application layer and allocating bandwidth at the physical layer. Without the prior knowledge of channel statistics, we develop a video streaming algorithm (VSA) to obtain the video quality adaptation and bandwidth allocation decisions. For the arbitrary sample path of channel states, we compare the QoE achieved by VSA with that achieved by an optimal T slot lookahead algorithm, i.e., knowledge of the future channel path over an interval of length T time slots. Simulation results demonstrate the effectiveness of the proposed VSA for ABR streaming over time-varying wireless networks. Index Terms—ABR streaming, dynamic resource optimization, wireless networks, quality of experience (QoE).

This research was supported in part by NSF China (61471287) and 111 Project (B08038). (Corresponding author: F. Richard Yu.) Y. Guo and Q. Yang are with the State Key Laboratory of ISN, School of Telecomm. Engineering, Xidian University, China. F. R. Yu is with the Department of Systems and Computer Engineering, Carleton University, Ottawa, Canada. V. C. M. Leung is with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

I. INTRODUCTION

Mobile video streaming, owing to the rapid development of smart devices, is fueling a dramatic growth of mobile data traffic. A Cisco technical report [1] predicted that mobile data traffic will increase to 24.3 EB per month by 2019, and that nearly 72% of this traffic will be video. Therefore, the design of efficient video streaming algorithms is of key significance for providing a high quality of experience (QoE) under the increasing demand for video streaming over wireless networks.

In practice, the wireless channel conditions of mobile users often vary for reasons such as user mobility. Thus, the fluctuating wireless channel condition must be properly handled to provide a “tolerable” video streaming service. To address this issue, adaptive bitrate (ABR) streaming [2] has become an important enabling technology for providing video streaming service to mobile users under time-varying channel conditions. In ABR streaming, a video is encoded into multiple versions at different rates. Each encoded version is further fragmented into small video chunks, each of which typically contains several seconds of video. In this way, a mobile user can adapt to changes in the channel condition by adjusting the video bitrate. For instance, a mobile user can receive a version encoded at a high bitrate when the channel is good, and a version encoded at a low bitrate when the channel is poor.

Currently, most of the video files requested by mobile users are sourced from external video servers [3], such as video servers on the Internet. Therefore, two components are required to realize practical ABR streaming to mobile users over time-varying channels: (1) quality adaptation (QA) at the application layer (APP), which determines the quality level at which each chunk should be fetched from the video server for each user, and (2) radio resource management (RRM) (e.g., bandwidth allocation) at the physical layer (PHY), which determines the transmission rate for each user. These two components are coupled.

However, in real wireless networks, providing ABR streaming to mobile users is very challenging, because it involves not only the receiver buffer state but also the channel state information. Besides, the QoE of video streaming is affected by many factors such as video quality, rebuffer time, and quality variation. Moreover, some of the contributing QoE factors conflict. For example, selecting a higher video bitrate improves the video quality but may also lead to more rebuffer time. Similarly, adjusting the video quality to track the current channel condition may help improve the video quality, but it may lead to frequent quality variation (or switching), which is visually annoying. In general, there is no closed-form expression available that shows the direct relation between QA, RRM and the many contributing factors of ABR streaming. The RRM problem is even harder in typical wireless networks where multiple users concurrently stream videos through the same access network, since the radio resource at the PHY is shared among the users. As a result, developing efficient algorithms that perform QA and RRM based on the contributing QoE factors is of great importance for improving the QoE of ABR streaming.

1536-1276 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Based on the network model for ABR streaming over realistic wireless networks, and from the system point of view, the first-order goal of this paper is to design a general framework and investigate the unifying principles for ABR streaming in practical wireless networks, taking into account the time-varying channels and the key factors affecting the QoE of ABR streaming. To do so, we design a two time-scale RRM framework and address the problem of fine-grained resource optimization for ABR streaming in time-varying networks by directly incorporating three of the key QoE factors, i.e., video quality, video rebuffer, and quality variation, into the problem formulation. Specifically, the main contributions of this paper are as follows:

• A stochastic optimization model is employed to maximize the long-term time-averaged QoE of users, which takes into account three QoE factors of ABR streaming: video quality, video rebuffer, and quality variation.

• By using the Lyapunov optimization technique, we develop an online video streaming algorithm (VSA) to perform the joint QA and bandwidth allocation (BA) without any prior knowledge of channel statistics. In particular, VSA separates the joint QA and BA problem into two independent subproblems, which can be solved at the APP and the PHY, respectively.

• We prove that the BA subproblem is a convex optimization problem. By exploiting the special structure of the BA subproblem, we develop the optimal algorithm for it.

• For an arbitrary sample path of channel states, we compare the QoE achieved by VSA with that achieved by an optimal T-slot lookahead algorithm, i.e., an algorithm with knowledge of the future channel path over an interval of T time slots.

The remainder of this paper is organized as follows. Section II presents the related works on ABR streaming over time-varying wireless networks. Section III provides the system model, followed by the problem formulation in Section IV.
We present the VSA in Section V and analyze its performance in Section VI. In Section VII, we present the simulation results, and we conclude the paper in Section VIII.

II. RELATED WORKS OF ABR STREAMING OVER WIRELESS NETWORKS

In recent years, RRM that leverages ABR in wireless networks has been studied in many works. We classify the existing RRM works into two main categories: snapshot-based and queue-based algorithms.

Snapshot-based algorithms rely on the infinite backlog assumption, namely, that users' requested video files are already pre-stored at the access network. Under this assumption, and using a quality-rate model that maps the physical layer rate into video quality, these algorithms are run repeatedly at each time slot to determine the physical layer rate (and hence the video quality at the APP) for the current time slot. In [5], considering the fairness between different users, the authors devised a low-complexity joint power allocation and subcarrier assignment scheme to solve the max-min video quality optimization problem. In addition, several studies have focused on multicell networks [6–9]. For example, Li et al. [8] developed a two-step user association and subcarrier assignment algorithm to improve the total video quality of the network with lower transmit power consumption. Xie et al. [9] considered the joint problem of user association and rate allocation in heterogeneous wireless networks to maximize the video quality of the network.

Under the infinite backlog assumption, the snapshot-based algorithms consider only one part (i.e., the RRM subproblem) of the joint QA and RRM problem of ABR streaming over real wireless networks. However, from the system point of view, the performance of the whole is not the sum of the individual parts, but is a consequence of the relationship of the performance between the parts [30]. Thus, the problems cannot be solved separately, since they are interdependent. The snapshot-based works are therefore heuristic algorithms, and they generally underutilize the available network resources, because video bitrates higher than the current channel condition can support are never selected in order to avoid playback interruptions (or rebuffer). Consequently, the snapshot-based algorithms usually cannot reach the maximum QoE of ABR streaming allowed by the available network [10].

Another category of RRM for ABR streaming is the queue-based algorithms. With consideration of both the time-varying nature of wireless channels and the fact that videos are from external servers, the authors in [4] proposed a cross-layer QA and RRM (e.g., power allocation and subcarrier assignment) scheme for ABR streaming over OFDMA networks. With the aim of maximizing the time-averaged video quality, an online algorithm was developed to obtain the optimal QA, power allocation and subcarrier assignment decisions at each time slot. Based on the queue model, Kim et al. [11] developed both centralized and distributed algorithms for the implementation of ABR streaming in caching D2D systems. In [12], the authors developed a scheduling policy for users to dynamically select the helpers to download from and to adjust the QA in a network formed by multiple helpers and several users. However, none of the aforementioned works [4, 11, 12] comprehensively takes into account the factors that critically influence the QoE of ABR streaming. That is, their optimization problems were formulated only to maximize the time-averaged video quality, while other QoE factors were not incorporated into the objective function.

TABLE I
SUMMARY OF KEY NOTATIONS

Notation          Meaning
K                 Number of users
t                 Index of time slot
T                 Number of time slots in the network
T_p               Time duration of transmission time at the PHY
T_c               Time duration of a chunk time
m                 Index of chunk time
Q_k(t)            Number of bits in the data queue for user k at time slot t
W                 Bandwidth of the network
g_k(t)            Channel gain from BS to user k during time slot t
P                 Constant transmit power of BS
N_0               Power spectral density of additive white Gaussian noise
x_k(t)            Fraction of bandwidth allocated to user k at time slot t
C_k(t)            Transmission rate from BS to user k during time slot t
R_k(t)            Number of bits placed in the data queue for user k at time slot t
U_k(t)            Video quality for user k at time slot t
s_k(t)            Remaining bitrate of the chunk that has been selected to be downloaded or is being downloaded during transmission time slot t
B_k(t-1)          Buffer video time for user k at the end of time slot (t-1)
T_{d,k}(t)        Time needed to completely download the remaining part of the ongoing chunk during transmission time slot t
T_{r,k}(t)        Rebuffer time of user k's playback buffer during time slot t
w_q, w_v, w_r     Weights of video quality, video quality variation and video rebuffer time
D                 Time duration of a chunk
D_1, D_2          Positive constants
V                 Lyapunov control parameter
∆_T(t)            T-slot Lyapunov drift
∆(Θ(t))           One-slot Lyapunov drift
Θ(t)              Network state at time slot t

III. SYSTEM MODEL

As shown in Fig. 1, we consider a downlink single-cell wireless network consisting of one base station (BS) and K users. The network operates in a time-slotted manner with time slot index t ∈ {0, 1, 2, · · · }. Each user requires an ABR streaming service from an external video server. Video data from the server arrives at the BS at each chunk time. The BS maintains a transmission queue for each user to temporarily store the arrived data before it is transmitted to the corresponding user. The controller at the BS can obtain queue state information (QSI) by observing the amount of backlogged data in these data queues. Users' channel state information (CSI) can also be reported to the controller at the BS over the feedback channel. The controller then uses the joint information of CSI and QSI to make the QA and BA decisions.

Fig. 1. System Model. (Diagram labels: video server, BS performing QA and BA, QSI, CSI feedback, users 1 to K, per-user arrivals R_k(t) and transmission rates C_k(t).)

A. Two-Timescale Operations

We envision that QA operates at a larger time scale than BA. This is because, in practice, the time scale for QA, which places chunks at the BS data queue, is much larger than the BA unit time, as can be seen in Fig. 2. In current video streaming technology [13], the duration of a video chunk is usually 2-10 seconds. By contrast, the time length of each physical resource block (PRB), denoted by $T_p$, in wireless networks is usually of a much lower order. For example, the time length of each PRB in Long Term Evolution (LTE) systems is 0.5 ms, i.e., $T_p = 0.5$ ms. We refer to every time slot with a duration of $T_p$ as the transmission time, and to every chunk time with a duration of $T_c$ as the video chunk time. For simplicity, $T_c$ is assumed to be an integer multiple of the transmission time $T_p$, i.e., new chunks are requested at integer multiples of the transmission time.

Fig. 2. Two differentiated time-scales of QA and BA. (Diagram labels: unit time for QA, for the placement of chunks at the application layer; unit time for BA, for the transmission of data at the physical layer.)

B. Data Queue Model and Network Dynamics

At time slot $t = mT_c$, i.e., the beginning of the mth chunk time, each user k ∈ K requests a quality mode for its requested video. That is, at each chunk time $t \in \{0, T_c, 2T_c, 3T_c, \cdots\}$, each user k ∈ K specifies the quality mode $U_k$ for its next video chunk. This decision specifies the quality $U_k(t)$ and the amount of bits $R_k(t)$ associated with the requested chunk. The bits $R_k(t)$ are placed in the data queue $Q_k$ at the BS. The data queue $Q_k(t)$ evolves over the transmission slots as follows:

$$Q_k(t+1) = \max[Q_k(t) - C_k(t)T_p,\ 0] + R_k(t), \tag{1}$$

where $C_k(t)$ is the transmission rate from the BS to user k at time slot t and $C_k(t)T_p$ is the amount of bits downloaded by user k from the BS during time slot t. Note that the data queue in (1) decreases in every transmission slot t as bits are transmitted from the BS to the user, but can only increase at chunk times $t = mT_c$, when new chunks are placed in the BS. Intuitively, $Q_k$ consists of the bits associated with video chunks that have been prepared in the BS (or at the edge of the network) for user k but not yet fully received by user k [21].

An individual queue $Q_k(t)$ is strongly stable [14] if

$$\limsup_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\{Q_k(t)\} < \infty, \tag{2}$$

and a network is stable if all individual queues in the network are stable. In real systems, strong stability ensures that the long-term average departure rate from the queue is greater than or equal to the long-term average input rate injected into the queue, i.e.,

$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} C_k(t) \ \ge\ \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} R_k(t)$$

[14]. Thus, all of the chunks placed in a data queue in the BS will eventually be transmitted to the corresponding user when the data queue is strongly stable.
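The queue recursion (1) can be illustrated with a short simulation. This is a minimal sketch, not the authors' implementation: the function names, the constant rate, and the chunk sizes are assumptions chosen only to show that the backlog drops in every transmission slot and rises only at chunk times.

```python
def update_queue(q_bits, rate_bps, t_p, arrival_bits=0.0):
    """One step of (1): Q(t+1) = max[Q(t) - C(t) * Tp, 0] + R(t)."""
    return max(q_bits - rate_bps * t_p, 0.0) + arrival_bits


def simulate_queue(rates_bps, chunk_bits, slots_per_chunk, t_p, q0=0.0):
    """Return the backlog trajectory [Q(0), Q(1), ...] for one user.

    rates_bps: transmission rate C(t) in each transmission slot.
    chunk_bits: sizes of successive chunks; chunk m is placed in the queue
        at slot m * slots_per_chunk (hypothetical arrival pattern).
    """
    q = q0
    trace = [q]
    for t, rate in enumerate(rates_bps):
        m, offset = divmod(t, slots_per_chunk)
        is_chunk_time = offset == 0 and m < len(chunk_bits)
        arrival = chunk_bits[m] if is_chunk_time else 0.0
        q = update_queue(q, rate, t_p, arrival)
        trace.append(q)
    return trace


def time_average(values):
    """Empirical counterpart of the time average that appears in (2)."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative numbers only: Tp = 0.5 ms, 4 Mbit/s, 6000-bit chunks.
    trace = simulate_queue([4e6] * 8, [6000.0, 6000.0], 4, 0.5e-3)
    print(trace, time_average(trace))
```

A bounded trajectory such as this one is what strong stability in (2) asks for on average; if the chunks were larger than what the rate can drain in one chunk time, the backlog in this sketch would grow without bound.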

C. Downlink Bandwidth Allocation

Let $x_k(t)$ be the fraction of bandwidth that the BS allocates to user k at time slot t. Because the total fraction of bandwidth allocated to all users cannot be greater than 1, the BA constraints are expressed as follows:

$$x_k(t) \in [0, 1], \quad \forall k, t, \tag{3}$$

$$0 \le \sum_{k=1}^{K} x_k(t) \le 1, \quad \forall t. \tag{4}$$

The transmission rate for user k at time slot t, denoted by $C_k(t)$, is expressed as [17]

$$C_k(t) = W x_k(t) \log_2\left(1 + \frac{P g_k(t)}{x_k(t) W N_0}\right), \quad \forall k, t, \tag{5}$$

where $W$ is the bandwidth of the network, $g_k(t)$ is the channel gain from the BS to user k at time slot t, $N_0$ is the power spectral density of additive white Gaussian noise, and $P$ is the constant transmit power of the BS.

Fig. 3. Illustration of buffer video time dynamics. (Plot of buffer video time versus time; annotations mark the start of the download of the selected chunk at time slot t, the instant at which the chunk being downloaded is completely downloaded, the chunk duration D, and the point at which the playback buffer is empty.)

D. Video Streaming Model

1) Video Quality Model: We model the video as a set of consecutive video chunks, each of which contains D seconds of video and is encoded at different bitrates. Let $L_{k,\max}$ be the maximum quality level for video k, and $\mathcal{L}_k = \{1, 2, \cdots, L_{k,\max}\}$ be the set of all available video quality levels for video k. Let $\mathcal{R}_k = \{R_{k,1}, R_{k,2}, \cdots, R_{k,L_{k,\max}}\}$ be the set of all available bitrate levels for user k. Generally, the higher the selected bitrate, the higher the video quality perceived by the user. Let $U_k(\cdot): \mathcal{R}_k \to \mathcal{U}_k$ be a nondecreasing function that maps the selected bitrate $R_k$ to the video quality perceived by user k. In practice, $U_k(\cdot)$ may depend on many factors, such as the content of the video. For example, for high-definition (HD) videos with more detail, such as animated movies, 3 Mbps and 1 Mbps may lead to a significant difference in user-perceived video quality, whereas for videos with less detail, such as weather forecasts, the perceived quality at 3 Mbps and 1 Mbps may be similar [22].

2) Buffer Video Time Model: The video chunks are downloaded into a playback buffer at the mobile user, which contains downloaded but not yet viewed video. In this paper, we do not consider buffer overflow, because most of today's mobile devices have enough storage capacity to download entire video chunks [10] and because, in current wireless networks, the dominant problem for the user playback buffer is rebuffer caused by insufficient wireless network resources [18]. A typical playback buffer may hold a few seconds of video chunks.
In conventional single quality (version) video streaming, the buffered video playback time can be easily measured by dividing the buffered video size by the average video playback rate [18]. In ABR streaming, however, since a playback buffer contains chunks of different qualities, there is no longer a direct mapping between the buffered video size and the buffered video time. To tackle this problem, we use the buffer video time [19] to measure the length of the playback buffer. Let $B_k(t-1)$ denote the buffer video time (measured in seconds) for user k at the end of time slot (t − 1). The buffer video time evolves as the chunks are being downloaded and the video is being played. Specifically, the buffer video time increases by D seconds after a chunk is downloaded and decreases as the user watches the video.

Fig. 3 illustrates the conceptual operation of the buffer video time. At chunk time $t = mT_c$, the playback buffer starts to download a chunk (say the o-th chunk). The download time for the o-th chunk depends on the selected bitrate as well as the download speed (the transmission rate for the user at the PHY) experienced during the whole download process. Let $s_k(t)$ be the remaining bitrate of the chunk that has been selected by the QA policy to be downloaded, or that is being downloaded, during transmission time slot t. Define

$$T_{d,k}(t) = \frac{s_k(t)}{C_k(t)} = \frac{s_k(t)}{x_k(t) W \log_2\left(1 + \frac{P g_k(t)}{x_k(t) W N_0}\right)} \tag{6}$$

as the download time of $s_k(t)$, which is the time needed to completely download the remaining part of the ongoing chunk during transmission time slot t at the PHY transmission rate $C_k(t)$. In this paper, we assume that no more than one chunk can be downloaded during a transmission slot; namely, if the ongoing chunk download finishes before the end of transmission time t, the user waits until the next transmission time to download a new chunk. This provision is reasonable for two reasons: (1) the chance of fully downloading a video chunk in one transmission slot is very small due to the limited radio resources; and (2) in practice, the video player needs some time to process the completely downloaded chunk before a new chunk is downloaded.

Let $T_{r,k}(t)$ be the rebuffer time (measured in seconds) of user k's playback buffer during time slot t:

1) If $T_p \ge T_{d,k}(t)$ and $T_{d,k}(t) < B_k(t-1)$, the remaining part of user k's chunk selected for transmission at time slot t will be completely downloaded during transmission time slot t, as $T_p \ge T_{d,k}(t)$. In addition, since $T_{d,k}(t) < B_k(t-1)$, there is no rebuffer, i.e.,

$$T_{r,k}(t) = 0. \tag{7}$$

The buffer video time of user k at the end of time slot t is given as

$$B_k(t) = B_k(t-1) - T_p + D. \tag{8}$$

2) If $T_p \ge T_{d,k}(t)$ and $T_{d,k}(t) \ge B_k(t-1)$, the remaining part of user k's chunk selected for transmission at time slot t will also be completely downloaded during time slot t, as $T_p \ge T_{d,k}(t)$. But user k will experience a duration of rebuffer since $T_{d,k}(t) \ge B_k(t-1)$. The rebuffer time $T_{r,k}(t)$ is given as (see footnote 1)

$$T_{r,k}(t) = T_{d,k}(t) - B_k(t-1), \tag{9}$$

and the buffer video time of user k at the end of time slot t is

$$B_k(t) = D - (T_p - T_{d,k}(t)) = D - T_p + T_{d,k}(t). \tag{10}$$

3) If $T_p < T_{d,k}(t)$, the remaining part of user k's chunk selected for transmission at time slot t will not be completely downloaded during time slot t. In this case, if $B_k(t-1) \ge T_p$, there is no rebuffer; otherwise, user k will experience a duration of rebuffer, which is given as $T_p - B_k(t-1)$. Thus, the rebuffer time during time slot t is given as

$$T_{r,k}(t) = (T_p - B_k(t-1))^+. \tag{11}$$

Here, the notation $(x)^+ = \max(x, 0)$ ensures that the term can never be negative. The buffer video time of user k at the end of time slot t is given as

$$B_k(t) = (B_k(t-1) - T_p)^+. \tag{12}$$

Footnote 1: In the paper, we consider that $D > T_p$. Thus, in (9) there is no rebuffer after $T_{d,k}(t)$ since $D > T_p > T_p - T_{d,k}(t)$.

Based on the above discussion, the rebuffer time of user k during time slot t can be summarized as

$$T_{r,k}(t) = \left[\min(T_{d,k}(t), T_p) - B_k(t-1)\right]^+, \tag{13}$$

and the buffer video time of user k at the end of time slot t can be summarized as

$$B_k(t) = \begin{cases} \max[B_k(t-1), T_{d,k}(t)] + D - T_p, & \text{if } T_p \ge T_{d,k}(t), \\ (B_k(t-1) - T_p)^+, & \text{otherwise.} \end{cases} \tag{14}$$

3) QoE Maximization: The ultimate goal of the joint QA and BA is to improve the QoE of users in order to achieve higher long-term user engagement. While many factors affect the QoE of ABR streaming, three of the key QoE factors are as follows [20]:

1) Average video quality: The average per-chunk quality over all chunks:
$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\left\{\sum_{k=1}^{K} U_k(R_k(t))\right\}.$$

2) Average quality variation: This tracks the magnitude of the changes in the quality from one chunk to another:
$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T-1} E\left\{\sum_{k=1}^{K} \left|U_k(R_k(t)) - U_k(R_k(t-1))\right|\right\}.$$

3) Rebuffer: This tracks the time lengths during which the playback buffer is empty:
$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\left\{\sum_{k=1}^{K} T_{r,k}(t)\right\}.$$

As different contributing factors may have different impacts on QoE, we define the QoE of ABR streaming as a weighted sum of the aforementioned factors [19]:

$$\begin{aligned} \mathrm{QoE} ={}& w_q \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\left\{\sum_{k=1}^{K} U_k(R_k(t))\right\} \\ &- w_v \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\left\{\sum_{k=1}^{K} \left|U_k(R_k(t)) - U_k(R_k(t-1))\right|\right\} \\ &- w_r \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} E\left\{\sum_{k=1}^{K} T_{r,k}(t)\right\}. \end{aligned} \tag{15}$$

Here $w_q$, $w_v$ and $w_r$ are non-negative weighting parameters corresponding to video quality, video quality variation, and rebuffer time, respectively.

IV. PROBLEM FORMULATION

From the above descriptions, the goal of the joint QA and BA is to maximize the time-averaged QoE of all the users in the network. Meanwhile, the stability of the network as well as the BA constraints must be satisfied. The QoE optimization problem is formulated as the following stochastic optimization problem:

$$\begin{aligned} \text{P1:}\quad & \max_{\mathbf{R}(t),\, \mathbf{X}(t)} \ \mathrm{QoE} \\ \text{s.t.}\quad & \text{C1: } x_k(t) \in [0, 1], \ \forall k, t, \\ & \text{C2: } \sum_{k=1}^{K} x_k(t) \le 1, \ \forall t, \\ & \text{C3: } 0 \le R_k(t) \le R_{k,L_{k,\max}}, \ \forall k, t, \\ & \text{C4: Queue } Q_k \text{ is strongly stable}, \ \forall k, \end{aligned} \tag{16}$$

where $\mathbf{R}(t) = \{R_k(t)\}$ and $\mathbf{X}(t) = \{x_k(t)\}$ are the vector of chunk bit amounts and the BA vector of the network, respectively. C1 and C2 are the BA constraints, which guarantee that the total fraction of bandwidth occupied by all of the users cannot be greater than 1. C3 is the instantaneous QA constraint, which means that each user can only be served with a quality no greater than the maximum available level $L_{k,\max}$. C4 is the network stability constraint.

Theoretically, the optimal solution to P1 can be found by methods such as dynamic programming if prior knowledge of CSI is available. However, these methods are computationally complex and suffer from the curse of dimensionality. For example, the computational complexity of a Markov decision process (MDP), a typical representative of dynamic programming, is exponential in the number of users. Furthermore, it is usually impractical to obtain prior knowledge of CSI in real networks. In this regard, the Lyapunov optimization technique [14], [25] is a good fit for wireless networks, because the algorithms developed from it have many advantages, e.g., a lower requirement for prior knowledge and low computational complexity. To this end, in this study, we develop an online algorithm, referred to as VSA, that depends solely on the current QSI and CSI to solve P1.
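The per-slot quantities that enter the objective of P1 can be computed directly from (5), (6), (13) and (14). The following is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the rate is defined as zero when no bandwidth is allocated (a convention added here so that the code is total, not something stated in the paper).

```python
import math


def transmission_rate(x, bandwidth_hz, power_w, gain, n0):
    """Rate (5): C = W x log2(1 + P g / (x W N0)); taken as 0 when x == 0."""
    if x <= 0.0:
        return 0.0
    snr = power_w * gain / (x * bandwidth_hz * n0)
    return bandwidth_hz * x * math.log2(1.0 + snr)


def download_time(s_bits, rate_bps):
    """Download time (6): T_d = s / C; infinite if nothing can be sent."""
    return math.inf if rate_bps <= 0.0 else s_bits / rate_bps


def rebuffer_time(t_d, t_p, b_prev):
    """Rebuffer time (13): [min(T_d, T_p) - B(t-1)]^+."""
    return max(min(t_d, t_p) - b_prev, 0.0)


def next_buffer(t_d, t_p, b_prev, chunk_sec):
    """Buffer video time (14) at the end of the slot."""
    if t_p >= t_d:
        # Chunk completes within the slot: cases 1) and 2) of Section III-D.
        return max(b_prev, t_d) + chunk_sec - t_p
    # Chunk still downloading: case 3).
    return max(b_prev - t_p, 0.0)


if __name__ == "__main__":
    # Illustrative numbers only (seconds); D > T_p as assumed in footnote 1.
    t_p, chunk_sec = 1.0, 2.0
    for t_d, b_prev in [(0.4, 0.6), (0.8, 0.5), (3.0, 0.4)]:
        print(rebuffer_time(t_d, t_p, b_prev),
              next_buffer(t_d, t_p, b_prev, chunk_sec))
```

The three example pairs exercise cases 1) to 3) in turn, so the printed values can be checked by hand against (7) to (12).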

V. DYNAMIC VIDEO STREAMING ALGORITHM

In this section, we present the design rationale of the proposed VSA using the Lyapunov optimization technique, which enables us to stabilize the data queues and maximize the QoE simultaneously. The application of this method transforms the stochastic optimization problem P1 into a series of successive instantaneous static optimization problems. We first show how the original problem P1 is transformed using the framework of the Lyapunov optimization technique. Then, we show the detailed procedures for implementing VSA.

A. Problem Transformation

Let $\Theta(t) = [\mathbf{Q}(t)]$ be a concatenated vector. Define the following Lyapunov function [14], [25]:

$$L(\Theta(t)) = \frac{1}{2}\sum_{k=1}^{K} Q_k^2(t). \tag{17}$$

Without loss of generality, we assume that all queues are empty when t = 0, such that $L(\Theta(0)) = 0$. Define the one-slot Lyapunov drift $\Delta(\Theta(t))$ as follows:

$$\Delta(\Theta(t)) \triangleq E\{L(\Theta(t+1)) - L(\Theta(t))\}. \tag{18}$$

Subtracting $V$ times the expectation of $\mathrm{QoE}(t) = \sum_{k=1}^{K} \mathrm{QoE}_k(t)$ at time slot t from (18), we obtain the following drift-minus-reward term:

$$\Delta(\Theta(t)) - V E\{\mathrm{QoE}(t)\}, \tag{19}$$

where $V$ is a non-negative constant parameter that controls the trade-off between the drift $\Delta(\Theta(t))$ and the reward $E\{\mathrm{QoE}(t)\}$. A greater value of $V$ assigns a greater priority to maximizing QoE at the expense of larger queue lengths, and vice versa. According to the design principle of the Lyapunov optimization technique, the VSA decisions should be chosen to minimize an upper bound of (19) at each time slot t. Theorem 1 stated below provides such an upper bound.

Theorem 1: (Upper Bound of the Drift-Minus-Reward Term) Under any control algorithm, the drift-minus-reward term has the following upper bound:

$$\begin{aligned} \Delta(\Theta(t)) - V E\{\mathrm{QoE}(t)\} \le{}& D_1 + E\left\{\sum_{k=1}^{K} \left[Q_k(t) R_k(t) - V w_q U_k(R_k(t))\right]\right\} \\ &+ E\left\{\sum_{k=1}^{K} V w_v \left|U_k(R_k(t)) - U_k(R_k(t-1))\right|\right\} \\ &+ E\left\{\sum_{k=1}^{K} \left[V w_r T_{r,k}(t) - Q_k(t) C_k(t) T_p\right]\right\}, \end{aligned} \tag{20}$$

where $D_1$ is a positive constant that satisfies the following inequality for time slot t:

$$D_1 \ge \frac{1}{2}\sum_{k=1}^{K} E\left\{R_k^2(t) + C_k^2(t) T_p^2\right\}. \tag{21}$$
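A brief sketch of why a bound of the form (20) arises from the queue update (1) may be helpful. It assumes that the per-slot reward is the per-slot summand of (15), i.e., $\mathrm{QoE}_k(t) = w_q U_k(R_k(t)) - w_v|U_k(R_k(t)) - U_k(R_k(t-1))| - w_r T_{r,k}(t)$, which the text does not state explicitly, and it uses a standard inequality from the Lyapunov optimization literature; the full proof remains the one in the cited references.

```latex
% Step 1: for any Q, a, b >= 0,
%   (max[Q - b, 0] + a)^2 <= Q^2 + a^2 + b^2 + 2Q(a - b).
% Apply it to (1) with a = R_k(t) and b = C_k(t) T_p:
\begin{align*}
\tfrac{1}{2}\left[Q_k^2(t+1) - Q_k^2(t)\right]
  &\le \tfrac{1}{2}\left[R_k^2(t) + C_k^2(t) T_p^2\right]
     + Q_k(t)\left[R_k(t) - C_k(t) T_p\right]. \\
% Step 2: sum over k, take expectations, and use (17), (18), (21):
\Delta(\Theta(t))
  &\le D_1 + E\Big\{\sum_{k=1}^{K} Q_k(t)\left[R_k(t) - C_k(t) T_p\right]\Big\}. \\
% Step 3: subtract V E{QoE(t)} with the assumed per-slot reward:
\Delta(\Theta(t)) - V E\{\mathrm{QoE}(t)\}
  &\le D_1 + E\Big\{\sum_{k=1}^{K}\left[Q_k(t) R_k(t) - V w_q U_k(R_k(t))\right]\Big\} \\
  &\quad + E\Big\{\sum_{k=1}^{K} V w_v \left|U_k(R_k(t)) - U_k(R_k(t-1))\right|\Big\} \\
  &\quad + E\Big\{\sum_{k=1}^{K}\left[V w_r T_{r,k}(t) - Q_k(t) C_k(t) T_p\right]\Big\}.
\end{align*}
```

The grouping in the last step is what makes the QA terms (those containing $R_k(t)$) and the BA terms (those containing $C_k(t)$ and $T_{r,k}(t)$) separable.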

Proof: A similar proof can be found in [14], [25].

By Theorem 1 and the principle of the Lyapunov optimization technique, the original problem P1 is transformed into minimizing the right-hand side (R.H.S.) of (20) subject to constraints C1, C2, and C3.

B. Algorithm Design

According to the design principle of the Lyapunov optimization technique, the joint QA and BA decisions are made to minimize the R.H.S. of (20) subject to C1, C2, and C3. Specifically, because the second and third terms on the R.H.S. of (20) involve only the QA decision $\{R_k(t)\}$, and the last term on the R.H.S. of (20) involves only the BA decision $\{x_k(t)\}$, the joint QA and BA problem can be separated into two independent subproblems, with QA minimizing the second and third terms of (20) over all possible QA decisions, and BA minimizing the last term of (20) over all possible BA decisions. Algorithm 1 gives the pseudo-code of VSA. The controller performs the following three operations: (1) QA, which determines the served quality for each user; (2) BA, which is responsible for the bandwidth allocation; and (3) updating of $\{Q_k(t)\}$ and $\{B_k(t)\}$.

Algorithm 1 Video Streaming Algorithm (VSA)
1: Input: (K, W, N0, Tc, Tp, T)
2: Initialization: t ← 0, Q(0) = 0, B(0) = 0
3: while t ≤ T
4:     At each chunk time t = mTc
5:         R(t) = QA(Q(t), B(t))
6:     At each transmission time slot t
7:         X(t) = BA(Q(t), H(t))
8:         Q(t + 1) = queue updating (Q(t)) according to (1).
9:         B(t + 1) = queue updating (B(t)) according to (14).
10:    t ← t + 1
11: end while
12: Output: R, X, Q, B
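The QA step in line 5 of Algorithm 1 minimizes the second and third terms on the R.H.S. of (20), which decouple across users. A minimal per-user sketch by exhaustive search over a discrete bitrate set is given below; the function name, the candidate set, and the identity utility used in the example are assumptions for illustration only, since the paper leaves $U_k(\cdot)$ general.

```python
def select_bitrate(q_bits, prev_bits, candidates, utility, v, w_q, w_v):
    """Pick R from `candidates` that minimizes, for one user,

        Q * R - V * w_q * U(R) + V * w_v * |U(R) - U(R_prev)|,

    i.e., the per-user part of the QA terms in (20).
    """
    u_prev = utility(prev_bits)

    def cost(r):
        u = utility(r)
        return q_bits * r - v * w_q * u + v * w_v * abs(u - u_prev)

    return min(candidates, key=cost)


if __name__ == "__main__":
    levels = [1.0, 2.0, 3.0]        # hypothetical chunk sizes (normalized)
    identity = lambda r: r          # hypothetical utility U(R) = R

    # Empty queue, no switching penalty: the highest level is chosen.
    print(select_bitrate(0.0, 1.0, levels, identity, v=10.0, w_q=1.0, w_v=0.0))
    # Large backlog: the lowest level is chosen.
    print(select_bitrate(20.0, 1.0, levels, identity, v=10.0, w_q=1.0, w_v=0.0))
    # Moderate backlog with a switching penalty: the previous level is kept.
    print(select_bitrate(5.0, 1.0, levels, identity, v=10.0, w_q=1.0, w_v=0.8))
```

The backlog $Q_k(t)$ acts as a price on bits in this sketch: a longer queue pushes the choice toward smaller chunks, while a larger $V$ pushes it toward higher quality.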



1) Video Quality Adaptation (at each chunk time): The QA subproblem determines the selected quality (or data rate at the APP) at each chunk time for each user. The QA subproblem is formulated as

$$\begin{aligned} \min_{\{R_k(t)\}} \ & \sum_{k=1}^{K} \left[Q_k(t) R_k(t) - V w_q U_k(R_k(t))\right] + \sum_{k=1}^{K} V w_v \left|U_k(R_k(t)) - U_k(R_k(t-1))\right| \\ \text{s.t.} \ & \text{C3}. \end{aligned} \tag{22}$$

The minimization variables $R_k(t)$ appear in separate terms of the sum and hence can be optimized separately for each user. Typically, in practical networks, $R_k(t)$ takes values in the finite and discrete set $\mathcal{R}_k$; thus, the optimal value of $R_k(t)$, denoted by $R_k^*(t)$, can be determined by searching all of the possible chunk sizes in $\mathcal{R}_k$.

By design, the rate adaptation in VSA is a radio access network (RAN)-side rate adaptation algorithm. Currently, client-side adaptation, in which rate adaptation is performed at the user side, is the usual case in real systems. Thus, third-party software [23, 24] that enables the individual user to obtain the instantaneous CSI and QSI at each time slot is needed to make VSA a general user-side rate adaptation algorithm [2], [21].

2) Bandwidth Allocation (at each transmission time): The BA subproblem determines the transmission rate at each transmission time t through the bandwidth allocation. The BA subproblem is formulated as

$$\begin{aligned} \min_{\{x_k(t)\}} \ & \sum_{k=1}^{K} \left[V w_r T_{r,k}(t) - Q_k(t) C_k(t) T_p\right] \\ \text{s.t.} \ & \text{C1 and C2}. \end{aligned} \tag{23}$$

In fact, the BA subproblem is a convex problem in $\{x_k(t)\}$. To prove the convexity of the BA subproblem, we start with the single-user BA subproblem:

• Case 1: When $T_p \le B_k(t-1)$, the rebuffer time $T_{r,k}(t) = 0$, and the objective function in (23) reduces to

$$\min_{x_k(t)} \; -Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right).$$

• Case 2: When $T_p > B_k(t-1)$ and $T_{d,k}(t) \le B_k(t-1)$, the rebuffer time $T_{r,k}(t) = 0$, and the objective function in (23) also reduces to

$$\min_{x_k(t)} \; -Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right).$$

• Case 3: When $T_p > B_k(t-1)$ and $T_{d,k}(t) \ge T_p$, the rebuffer time $T_{r,k}(t) = T_p - B_k(t-1)$, and the objective function in (23) reduces to

$$\min_{x_k(t)} \; V w_r \left[T_p - B_k(t-1)\right] - Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right).$$

• Case 4: When $T_p > B_k(t-1)$ and $T_p > T_{d,k}(t) > B_k(t-1)$, the rebuffer time $T_{r,k}(t) = T_{d,k}(t) - B_k(t-1)$, and the objective function in (23) becomes

$$\min_{x_k(t)} \; V w_r \left[\frac{s_k(t)}{x_k(t) W \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right)} - B_k(t-1)\right] - Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right).$$

Based on the above discussion, the objective functions in case 1 and case 3 at time slot t can be summarized as

$$\min_{x_k(t)} \; V w_r \left[\max\left(T_p - B_k(t-1), 0\right)\right] - Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right), \tag{24}$$

and the objective functions in case 2 and case 4 at time slot t can be summarized as

$$\min_{x_k(t)} \; V w_r \left[\max\left(T_{d,k}(t) - B_k(t-1), 0\right)\right] - Q_k(t) x_k(t) W T_p \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right). \tag{25}$$
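The four cases can be evaluated numerically for a single user. The sketch below is a hedged illustration: the function names are introduced here, and the compact expression max(min(Td, Tp) − B, 0) for the rebuffer time is our own restatement of Cases 1 to 4, not a formula taken from the paper.

```python
import math


def rate(x, w, p, g, n0):
    """Transmission rate C = x W log2(1 + P g / (N0 x W)); taken as 0 at x = 0."""
    if x <= 0.0:
        return 0.0
    return x * w * math.log2(1.0 + p * g / (n0 * x * w))


def rebuffer_time(x, s, b_prev, t_p, w, p, g, n0):
    """Rebuffer time T_r for bandwidth share x, chunk size s and buffer level b_prev."""
    c = rate(x, w, p, g, n0)
    t_d = math.inf if c == 0.0 else s / c      # download time T_d = s / C
    # Cases 1 and 2 give 0, Case 3 gives T_p - B, Case 4 gives T_d - B.
    return max(min(t_d, t_p) - b_prev, 0.0)


def ba_objective(x, q, s, b_prev, t_p, v, w_r, w, p, g, n0):
    """One user's term of (23): V w_r T_r - Q C T_p."""
    return (v * w_r * rebuffer_time(x, s, b_prev, t_p, w, p, g, n0)
            - q * rate(x, w, p, g, n0) * t_p)
```

Evaluating `ba_objective` on a grid of x values is a simple way to inspect the tradeoff in Case 4 between a shorter rebuffer time and a higher transmission rate.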

Theorem 2: Define

$$f(x_k(t)) = \frac{1}{T_{d,k}(t)} = \frac{1}{s_k(t)} x_k(t) W \log_2\left(1 + \frac{P g_k(t)}{N_0 W x_k(t)}\right);$$

then $f(x_k(t))$ is an increasing and concave function, and $T_{d,k}(t)$ is a decreasing and convex function, with respect to $x_k(t)$ in the interval [0, 1].

Proof: Please refer to Appendix A for the proof.

With Theorem 2, we can prove that the BA subproblem with multiple users is a convex optimization problem.

Proof: For case 1 and case 3, the objective function in (24) is convex, since $f(x_k(t))$ is concave. For the objective function in (25) for case 2 and case 4, we use the pointwise maximum property [15] (page 80): if $f_1$ and $f_2$ are convex functions, then their pointwise maximum $f_3$, defined by

$$f_3(x) = \max\{f_1(x), f_2(x)\}, \tag{26}$$

is also convex. With (26), $\max(T_{d,k}(t) - B_k(t-1), 0)$ in the objective function in (25) is convex. As a result, the objective function in (25) is also convex, as it is the sum of two convex functions, $V w_r \max(T_{d,k}(t) - B_k(t-1), 0)$ and $-Q_k(t) s_k(t) T_p f(x_k(t))$. Therefore, the objective function for the whole network with K users is convex, as it is the sum of K convex functions. In addition, C1 and C2 are both linear constraints; thus, the feasible set they define for X(t) is convex. Therefore, the BA optimization problem minimizes a convex function over a convex set, and thus it is a convex optimization problem.

For the multiple-user case, we can solve the BA subproblem by exploiting its convexity. Specifically, from a mathematical perspective, there are only two different kinds of BA problem formulation for each user, as expressed in (24) and (25), respectively. Thus, the BA subproblem of the network can be solved after formulating all of the possible BA formulations with K users. That is, for a user (say user k), if $T_p \le B_k(t-1)$ (case 1) is satisfied, or $T_p > B_k(t-1)$ and $T_{d,k}(t) \ge T_p$ (case 3) is satisfied, the objective function for user k is (24); otherwise the objective function for user k is (25). Therefore, to solve the BA subproblem with multiple users, we can carry out the following four steps:
(1) Check which of the objective functions in (24) and (25) each user may belong to;
(2) Formulate the BA subproblem of the network based on all of the possible problem formulations identified in (1);
(3) Solve each formulated BA subproblem of the network with standard convex optimization techniques using off-the-shelf solvers, e.g., CVX [16];
(4) After solving all of the possible BA problem formulations, select the X(t) with the minimum value of the objective function in (23) as the optimal BA decision.

Algorithm 2 The Optimal BA Algorithm at Each Transmission Slot with Multiple Users
for k = 1 to K
    check which of the objectives in (24) and (25) each user possibly belongs to
end for
Formulate all of the possible BA optimization problems of the network based on the checked results.
Adopt a standard convex optimization technique (e.g., CVX) to solve each formulated BA optimization problem.
Choose the X(t) with the minimum value of the objective function in (23) as the optimal BA decision.

Remark 1: (Intuitions behind the BA Policy) The common feature of Case 1 and Case 2 is that the rebuffer time is 0, which means that video rebuffering does not occur during transmission time slot t. This is because the video buffer time (or the pre-transmitted data in the user's playback buffer) at the beginning of time slot t (or at the end of time slot t − 1) is large enough to protect the user from video rebuffering (or starvation). In contrast, Case 3 and Case 4 mean that the video buffer time at the beginning of transmission time slot t is not large enough to protect the user from rebuffering. More specifically, Case 3 implies that the video buffer time at the beginning of transmission time slot t is so small that, even if the whole bandwidth of the network during transmission time slot t is allocated to the user, the chunk being downloaded still cannot be completed. Thus, the user is bound to experience a fixed rebuffer duration, given as $T_{r,k}(t) = T_p - B_k(t-1)$. Case 4 implies that the video buffer time at the beginning of time slot t is relatively small, such that the user's rebuffer time can be decreased by allocating more bandwidth; thus, the objective function in Case 4 becomes a tradeoff between the user's rebuffer time and the transmission rate.

In Case 1, Case 2 and Case 3, the objective functions of the BA subproblem all reduce to a pure transmission rate maximization problem, whereas in Case 4, the objective function is a tradeoff between the current video rebuffer time and the transmission rate. Currently, due to the insufficiency of wireless network resources, completely smooth video streaming without any video rebuffer (or stall event), as in Case 1 and Case 2, is hardly possible. Thus, the essence of the dynamic ABR streaming scheme, with consideration of the key QoE contributing factors, including video rebuffer, video quality, and quality variation, has been embedded in VSA.

Now, if we look at the whole network from the perspective of the scheduler (in the BS), the QA decision $R_k(t)$ at the APP is the input and the transmission rate $C_k(t)$ at the PHY (resulting from the BA decision $x_k(t)$) is the output. From (22) and (23), we know that the input and output of the network affect each other. That is, the whole network (or system) implementing VSA is effectively an open network (or system) with feedback, in which the feedback consists of the QA decision in the last time slot, the QSI and the CSI. For example, as expressed in (22), if there is a backlog in the data queue because of limited bandwidth and/or high quality selection in previous time slots, then it is more likely that a larger amount of new data corresponding to a higher quality level will be rejected in the next time slot to avoid congestion at the BS.

(Computation Complexity of the Optimal BA Algorithm with Multiple Users) From the above analysis, we know that each user can possibly have two BA problem formulations. Thus, the BA algorithm in Algorithm 2 with K users has at most $2^K$ formulations, which is exponential in the number of users. In this paper, we focus on the cross-layer resource management scheme for ABR streaming. Thus, we leave the development of an efficient and low-complexity BA algorithm as future work.
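The growth in the number of network-level formulations can be illustrated with a short enumeration. This is a sketch under an assumption of ours: a user whose buffer already covers the slot (Tp ≤ B, Case 1) is known to use (24) before any solve, so only the remaining users contribute two options each. The function name and arguments are hypothetical.

```python
from itertools import product


def candidate_formulations(buffers, t_p):
    """List the per-user objective labels (24 or 25) for every network formulation.

    buffers : playback-buffer levels B_k(t-1), one per user
    t_p     : transmission slot duration T_p
    """
    # Assumed pruning: Case 1 users can only use objective (24).
    options = [(24,) if t_p <= b else (24, 25) for b in buffers]
    return list(product(*options))
```

Each returned tuple corresponds to one convex problem to be solved; without pruning the list has 2^K entries, which matches the worst-case count stated above.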

VI. PERFORMANCE ANALYSIS OF VSA

In real-world scenarios, arbitrary user motion yields slower time variations of CSI at the same time-scale of the video chunk. As a result, any stationarity or ergodicity assumption about CSI is unlikely to hold in some practical networks. Therefore, we consider the optimality of VSA for an arbitrary sample path of CSI G(t).²

² Given the current state of the Lyapunov optimization technique, we cannot yet give a concrete performance analysis of the two-time-scale VSA.

Following the footsteps of [14], we compare the QoE achieved by the proposed VSA with that achieved by an optimal policy with T-slot lookahead, i.e., knowledge of the future CSI over an interval of length T time slots. Time is split into frames of duration T time slots and we consider F such frames. For an arbitrary CSI sample path G(t), we consider the following static optimization problem over the j-th ($j \in \{0, 1, \cdots, F-1\}$) frame:

$$\max \; \mathrm{QoE}_j \triangleq \frac{1}{T} \sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k(t) \quad \text{s.t.} \quad \sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \left[R_k(t) - T_p C_k(t)\right] \le 0. \tag{27}$$

In (27), G(t) for $t \in \{jT, \cdots, jT+T-1\}$ in the j-th frame are treated as known quantities. Define the T-slot Lyapunov drift $\Delta_T(t)$ as follows:

$$\Delta_T(t) \triangleq E\{L(\Theta(t+T-1)) - L(\Theta(t))\}. \tag{28}$$

Subtracting the expectation of $\mathrm{QoE}(t) = \sum_{k=1}^{K} \mathrm{QoE}_k(t)$ at time slot t, weighted by V, from (28), we obtain the following drift-minus-reward term:

$$\Delta_T(t) - V E\{\mathrm{QoE}(t)\}. \tag{29}$$

According to the design principle of Lyapunov optimization, the VSA decisions should be chosen to minimize an upper bound of (29) at each frame j. Theorem 3 stated below provides such an upper bound.

Theorem 3: Let $V \ge 0$ and $t = jT$ for some nonnegative integer j. Let $\mathcal{A}_{G(t)}$ represent the set of all QA and BA options available under a given G(t). Then, under any feasible decisions, which satisfy C1–C3 and (27), we have

$$\Delta_T(t) - V E\left\{\sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k(t)\right\} \le D_2 - V E\left\{\sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k^*(t)\right\} + E\left\{\sum_{k=1}^{K} Q_k(jT) \sum_{t=jT}^{jT+T-1} \left[R_k^*(t) - C_k^*(t) T_p\right]\right\}, \tag{30}$$

where $D_2$ is a positive constant, and $\mathrm{QoE}_k^*(t)$, $R_k^*(t)$, $C_k^*(t)$ are the resulting QoE, chunk size and transmission rate values under any feasible decision $a^*(t) \in \mathcal{A}_{G(t)}$.

Proof: Please refer to Appendix B for the proof.

Theorem 4: (a) Performance of VSA under an Arbitrary Sample Path of CSI: The VSA achieves the following performance:

$$\overline{\mathrm{QoE}} = \lim_{F \to \infty} \frac{1}{FT} \sum_{t=0}^{FT-1} \mathrm{QoE}(t) \ge \lim_{F \to \infty} \frac{1}{F} \sum_{j=0}^{F-1} \mathrm{QoE}_j^{opt} - \frac{D_2 T}{V}, \tag{31}$$

with the bounded queue backlogs satisfying

$$\frac{1}{FT} \sum_{t=0}^{FT-1} \sum_{k=1}^{K} Q_k(t) \le \frac{D_2 T}{\epsilon} + \frac{V(\mathrm{QoE}^{max} - \mathrm{QoE}^{min})}{\epsilon} + \frac{T-1}{2} \sum_{k=1}^{K} \max\left(R_k^{max}, C_k^{max} T_p\right). \tag{32}$$

In (31), $\mathrm{QoE}_j^{opt}$ is the maximum value of the j-th frame of (27). In (32), $\mathrm{QoE}^{max}$ and $\mathrm{QoE}^{min}$ are positive constants satisfying the following constraint:

$$\mathrm{QoE}^{min} \le \mathrm{QoE}(a^*(t)) \le \mathrm{QoE}^{max}, \quad a^*(t) \in \mathcal{A}_{G(t)}, \tag{33}$$

and $R_k^{max}$ and $C_k^{max}$ are positive constants satisfying the following constraints:

$$0 \le R_k^*(t) \le R_k^{max}, \; \forall t, k, \qquad 0 \le C_k^*(t) \le C_k^{max}, \; \forall t, k.$$

Proof: Please refer to Appendix C for the proof.

(b) Performance of VSA under Independently and Identically Distributed (i.i.d.) CSI: If the CSI is i.i.d., the VSA achieves the following performance:

$$\overline{\mathrm{QoE}} = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \mathrm{QoE}(t) \ge \mathrm{QoE}^{opt} - \frac{D_2}{V}, \tag{34}$$

with the bounded queue backlogs satisfying

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T-1} \sum_{k=1}^{K} Q_k(t) \le \frac{D_2 + V(\mathrm{QoE}^{max} - \mathrm{QoE}^{min})}{\epsilon}, \tag{35}$$

where $\mathrm{QoE}^{opt}$ is the optimal value of the stochastic optimization problem (16).

Proof: A similar proof can be found in [14], [25].

Remark 2: (Regarding the Performance of VSA under Arbitrary CSI) Equation (31) shows that the theoretical QoE achieved by VSA is within O(1/V) of the time-averaged QoE that can be achieved by the T-slot lookahead policy, which knows the CSI of the future T time slots in advance. That is, when V is sufficiently large, the QoE value achieved by VSA is arbitrarily close to the optimum QoE achieved by the T-slot lookahead algorithm, since $D_2$ is a constant (independent of V) and $D_2/V$ becomes arbitrarily small. However, from (32), we can see that the bound on the data queue backlogs also grows with V. Intuitively, growing data queue backlogs lead to a larger video rebuffer time, which can be observed in Fig. 5. In practice, for ABR streaming, video rebuffer time is a key QoE factor [26], [28], and ideally it should be made as small as possible. Considering this, for scenarios in which users are particularly concerned with the video rebuffer time, a time-averaged video rebuffer time constraint can be added to the stochastic problem formulation in (16) to guarantee that the average video rebuffer time of each user during the viewing of the video stays strictly below an upper bound.
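The tradeoff in Remark 2 can be made concrete by evaluating the two bounds for several values of V. The sketch below simply codes the gap term of (31) and the right-hand side of (32); the function names and the numbers in the usage note are illustrative assumptions, not values from the paper.

```python
def qoe_gap(d2, t, v):
    """Optimality gap D2 * T / V from (31); shrinks as V grows."""
    return d2 * t / v


def backlog_bound(d2, t, v, qoe_max, qoe_min, eps, r_max, c_max, t_p):
    """Right-hand side of (32); grows linearly in V.

    r_max, c_max : per-user constants R_k^max and C_k^max (sequences of equal length)
    """
    per_user = sum(max(r, c * t_p) for r, c in zip(r_max, c_max))
    return (d2 * t / eps
            + v * (qoe_max - qoe_min) / eps
            + (t - 1) / 2 * per_user)
```

For instance, with d2 = 10, t = 5, eps = 0.5, a QoE range of 1 and two users with r_max = 10 and c_max = 20 at t_p = 0.2, raising V from 1 to 10 cuts the gap from 50 to 5 while the backlog bound rises from 142 to 160.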



Fig. 4. Evolution of data queue Q(t) by VSA with V = 1,000. (Panels: Q1(t) (bit) and Q2(t) (bit) versus time slot.)

Fig. 5. Rebuffer time with different values of V by VSA. (Axes: video rebuffer time versus V.)

Remark 3: (Regarding the Performance of VSA under the i.i.d. CSI) In particular, if the CSI is i.i.d., the bounding term in (31) is explicitly given by QoE^opt, which is the optimal value of the stochastic problem in (16). Moreover, the time averages on the R.H.S. of (31) and (32) both become ensemble averages because of the i.i.d. CSI.

VII. NUMERICAL RESULTS AND DISCUSSIONS

In this section, we evaluate the performance of the proposed VSA.³ For simplicity, gk(t), ∀k, t, is selected from the values 0.001 to 0.1 with an interval of 0.001, each with probability 1/100. We set W = 1, K = 2, P = 1 W, N0 = 10^−8 W/Hz [5], D = 2 s, Tc = 2 s, Tp = 0.2 s, wq = 1, wv = 1 and wr = 3,000 [19]. Each user requests a video with R = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. The quality-rate functions of video 1 and video 2 are characterized by U1(R1) = 3R1 and U2(R2) = R2, respectively, with R1 and R2 both selected from R. The experiment is simulated for T = 2,000 consecutive time slots. With T = 2,000, the total time duration of ABR streaming is T × Tp = 2000 × 0.2 = 400 s, and the number of chunks for each user is (T × Tp)/D = 200.

³ In essence, we evaluate the performance of the proposed VSA numerically rather than by simulation. Thus, our work in this paper is a video-aware or QoE-aware RRM work rather than a video resource optimization work. Since the BA optimization problem is a composite function that needs to be constructed from some simple (standard) functions in CVX, using CVX with the Mosek bundle can improve the efficiency of the optimal BA algorithm.

Firstly, we illustrate the data queue stability with V = 1,000. As shown in Fig. 4, from the definition of strong stability, we know that Q(t) is stable, which guarantees the stability of the network.

Figs. 5, 6, 7 and 8 show the video rebuffer time, video quality, video quality variation and QoE, respectively, for different values of V. Please note that in the simulation, we present the video rebuffer time, video quality and QoE after normalization. As illustrated in Fig. 5, the video rebuffer time increases with increasing V. This is because, as formulated in (19), V represents the emphasis that the network controller places on QoE maximization relative to the network delay (or the network congestion). That is, if V is larger, QoE maximization takes higher priority, which means that QoE optimization is obtained at a higher cost in data queue backlogs; thus, a larger network delay (leading to a larger video rebuffer time) is observed in Fig. 5. On the other hand, a smaller V means that the network controller gives network delay higher priority than QoE maximization, which results in lower network delay and lower rebuffer time at the cost of a smaller QoE.

Fig. 6. Video quality with different values of V by VSA. (Axes: video quality versus V.)

Fig. 7. Quality variation with different values of V. (Axes: video quality variation versus V.)

As shown in Figs. 6 and 7, the video quality and the video quality variation both improve with increasing V, which is as expected considering the role of V expressed in (19). Specifically, the video quality increases and the video quality variation decreases with increasing V. In fact, the joint ABR streaming performance of VSA shown in Figs. 5, 6 and 7 accords with ABR streaming in real networks. For example, as reported in [11], selecting a higher video bitrate improves the video quality, but it may lead to more rebuffering at the same time. Thus, we know that the proposed model is effective in modeling ABR streaming over wireless networks, since "A model, in the context of science, is a simplified representation of some real phenomenon" [30] (page 209).

Fig. 8. QoE with different values of V by VSA. (Axes: QoE versus V.)

However, as shown in Fig. 8, the QoE value does not increase, and even decreases, with increasing V in the simulation. This differs from existing queue-aware algorithms for ABR streaming [4], [11], [21], in which the QoE improves with increasing V. This is because in [4], [11], [21], the QoE only considers one QoE factor, i.e., the video quality. It is easy to understand that the video quality will increase with increasing V, as expressed in (19). In fact, as shown in Fig. 6, the video quality achieved by VSA also increases with increasing V. But when we directly include three QoE factors in the objective function in (16), the relationship between QoE and the value of V is not as easy to explain as in [4], [11], [21]. In the simulation, with the weight of video rebuffer time wr = 3,000, compared with the weight of video quality wq = 1 and the weight of video quality variation wv = 1, the video rebuffer time dominates the weighted sum of QoE and determines the main trend of QoE. Thus, when the video rebuffer time does not approach 0 and increases with increasing V, the QoE decreases. However, as shown in Fig. 13, with a pre-buffering time of 3 seconds, the QoE value increases with increasing V. This is because in this case the video rebuffer time is as small as possible; thus, it is the video quality and video buffer time that make the major difference in the weighted sum of QoE.

Finally, we compare the proposed VSA with two baselines. Baseline 1 is the joint rate control and resource allocation (JRCRA) algorithm in [4], [11], [21]. Like VSA, JRCRA is a queue-aware ABR streaming algorithm, but JRCRA only considers the video quality in the problem formulation. Baseline 2 is the traditional queue-aware max-weight algorithm [25], [27], which aims to maximize the time-averaged throughput of the network. Typically, the max-weight algorithm is suitable for delay-tolerant applications like file downloading. Since the max-weight algorithm is usually intended for delay-tolerant applications, it does not consider quality adjustment at the application layer. Thus, in order to make a fair and reasonable comparison, we simulate the max-weight algorithm with a layer index. For example, for the max-weight algorithm with layer i, in the simulation, we select video quality i for the users at each chunk time, place the data associated with video quality i in the data queue Q(t), and perform the queue-aware max-weight algorithm at each transmission slot.

Fig. 9. QoE comparison. (Axes: QoE versus V; curves: VSA, JRCRA, and max-weight with layers 1, 3, 5, 7 and 9.)

Fig. 10. Video quality comparison. (Axes: video quality versus V.)

Fig. 11. Video rebuffer time comparison. (Axes: video rebuffer time versus V.)

Fig. 12. Video quality variation comparison. (Axes: video quality variation versus V.)

Figs. 9, 10, 11 and 12 show the performance comparison of VSA with the two baselines in QoE, video quality, video rebuffer time and video quality variation, respectively. From the joint comparisons, we observe that VSA achieves a better QoE, i.e., it achieves a better tradeoff between the factors affecting QoE. For example, when V ≤ 1, VSA achieves a better QoE than both JRCRA and the max-weight algorithm with layers 1, 3, 5, 7 and 9. This is because, as illustrated in Figs. 10 and 11, with V ≤ 1, VSA achieves a video quality comparable with that of JRCRA and a video rebuffer time comparable with that of the max-weight algorithm. On the other hand, given the effect of V on the performance of VSA explained above, we can understand that, when V is larger (i.e., V > 1 in this paper), video quality takes much higher priority than video rebuffer time. Hence, it is possible that the video quality achieved by VSA is higher while the video rebuffer time is also larger than for other techniques (mainly the max-weight algorithm with layers 1, 3, 5 and 7). Thus, with the weight of video rebuffer time wr = 3,000 compared with the weight of video quality wq = 1 in the simulation, from a purely mathematical perspective, when V > 1, the value of QoE by VSA is smaller than that of other techniques. But over the whole interval of V, especially when V ≤ 1, VSA achieves a much better ABR streaming performance.

Please note that, given the current state of the Lyapunov optimization technique, VSA cannot yet adjust V fully automatically, i.e., find the optimal value of V for the given network conditions on its own. Thus, an alternative way of finding an appropriate value of V is to use a conservative setting. For example, at the busy time (or day) of the associated network, like 8pm-10pm in a residential area, we may set a much smaller value of V to obtain a better video rebuffer time.

As shown in Fig. 11, the main drawback of JRCRA is that it incurs a large video rebuffer time. This is because it only includes the video quality in its maximization, while the video rebuffer time is ignored. In addition, for the max-weight algorithm with a layer index, the main drawback is that it cannot adapt automatically to the time-varying channel or the video content. For example, when the selected video quality is small (say layer 1), the max-weight algorithm with layer 1 leads to a smaller video rebuffer time, whereas the video quality is also lower than with VSA and JRCRA. By contrast, without prior knowledge of channel statistics, the proposed VSA obtains a better tradeoff between video quality and video rebuffer time. In fact, the performance of VSA can be further enhanced if some pre-buffering time is tolerable for mobile users. We plot the video rebuffer time and QoE by VSA with the pre-buffering time being [0, 1, 2, 3] seconds in Fig. 13. Thus, we claim that VSA is suitable for providing ABR streaming to mobile users over wireless networks with time-varying channel conditions.

Fig. 13. Video rebuffer time and QoE by VSA with different pre-buffering time. (Panels: (a) video rebuffer time and (b) QoE versus V; curves: pre-buffering time = 0, 1, 2, 3.)
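The numerical setup stated at the start of this section can be reproduced with a few lines of Python. This is a hedged sketch of the parameters only, not of the authors' evaluation code: the constant and function names are introduced here, and the use of Python's `random.Random` for the uniform draw of channel gains is an assumption.

```python
import random

T_SLOTS = 2000      # number of transmission slots T
T_P = 0.2           # slot duration T_p in seconds
CHUNK_D = 2.0       # chunk duration D in seconds
# Channel gains 0.001, 0.002, ..., 0.1, each drawn with probability 1/100.
GAIN_LEVELS = [round(0.001 * i, 3) for i in range(1, 101)]


def stream_duration():
    """Total streaming time T * T_p in seconds."""
    return T_SLOTS * T_P


def chunks_per_user():
    """Number of chunks per user, (T * T_p) / D."""
    return round(stream_duration() / CHUNK_D)


def sample_gains(num_users, rng):
    """Draw one channel gain per user uniformly from GAIN_LEVELS."""
    return [rng.choice(GAIN_LEVELS) for _ in range(num_users)]
```

Calling `sample_gains(2, random.Random(0))` once per slot yields an i.i.d. gain sequence for the two users; fixing the seed only makes a run repeatable.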

VIII. CONCLUSIONS AND FUTURE RESEARCH

In this paper, we proposed a two time-scale resource optimization scheme for ABR streaming over wireless networks, with QA performed at the APP and BA performed at the PHY. We formulated the problem as a stochastic optimization problem to maximize the long-term time-averaged QoE of the network, which takes into account three key QoE factors of ABR streaming, including video quality, quality variation and video rebuffer. Considering the difficulty of obtaining prior knowledge of channel statistics in practical wireless networks, an online algorithm, VSA, which only needs the current knowledge of CSI and QSI, was developed to obtain the QA and BA decisions. We also presented the performance analysis of VSA compared with the T-slot lookahead algorithm for an arbitrary CSI sample path. Simulation results verified the effectiveness of the proposed VSA for ABR streaming over time-varying wireless networks. In our future work, we will investigate the application of deep reinforcement learning [29] to ABR streaming.



APPENDIX A
PROOF OF THEOREM 2

Let $P g_k(t)/N_0 = c > 0$. Then $f(x_k(t))$ can be expressed as

$$f(x_k(t)) = \frac{1}{s_k(t)} W x_k(t) \log_2\left(1 + \frac{c}{W x_k(t)}\right). \tag{36}$$

Taking the first and second derivatives of $f(x_k(t))$ with respect to $x_k(t)$, respectively, we have

$$f'(x_k(t)) = \frac{df}{d[x_k(t)]} = \frac{W}{s_k(t)} \log_2\left[1 + \frac{c}{W x_k(t)}\right] - \frac{c}{s_k(t)\left[x_k(t) + c/W\right] \ln 2}, \tag{37}$$

and

$$f''(x_k(t)) = \frac{d^2 f}{d^2[x_k(t)]} = \frac{c}{\left[x_k(t) + c/W\right] s_k(t) \ln 2} \left\{\frac{1}{x_k(t) + c/W} - \frac{1}{x_k(t)}\right\}. \tag{38}$$

(a) Proof of concavity and monotonicity of $f(x_k(t))$

From (38), we know $f''(x_k(t)) < 0$ and $f(x_k(t))$ is concave in the interval [0, 1], since $c/W > 0$ and $0 \le x_k(t) \le 1$. Define $g(x_k(t)) = f'(x_k(t))$. Then from (38), we know $g(x_k(t))$ is a decreasing function of $x_k(t)$, since $g'(x_k(t)) = f''(x_k(t)) < 0$. Define

$$h(c) = \frac{W}{s_k(t)} \log_2(1 + c/W) - \frac{c}{s_k(t)(1 + c/W) \ln 2}.$$

Then $g(1) = f'(1) = h(c)$ with $c > 0$. It is easy to see that $h(c) > 0$, $\forall c > 0$, since $h'(c) = \frac{W c}{s_k(t)[W + c]^2 \ln 2} > 0$ for $c > 0$ and $h(0) = 0$. Thus, $g(1) = f'(1) > 0$. Therefore, $f'(x_k(t)) > 0$ in the interval [0, 1], since $f'(1) = g(1) > 0$ and $f'(x_k(t))$ is decreasing. Therefore, $f(x_k(t))$ is increasing. To sum up, $f(x_k(t))$ is an increasing and concave function of $x_k(t)$ in the interval [0, 1].

(b) Proof of convexity and monotonicity of $T_{d,k}(x_k(t))$

According to the composition rule in [15] (page 84, (3.10)): define $l(x) = h(f(x))$; if $h(x)$ is convex and non-increasing, and $f(x)$ is concave, then $l(x)$ is convex. In fact, $T_{d,k}(t)$ can be viewed as a composition $T_{d,k}(t) = h(f(x_k(t)))$, where $h(x) = \frac{1}{x}$ is convex and non-increasing and $f(x_k(t)) = \frac{1}{s_k(t)} x_k(t) W \log_2\left(1 + \frac{P g_k(t)}{N_0 x_k(t) W}\right)$ is concave. Therefore, according to the rule in [15] (page 84, (3.10)), $T_{d,k}(t)$ is convex. In addition, it is easy to verify that $T_{d,k}(x_k(t))$ is a decreasing function of $x_k(t)$ in the interval [0, 1], since the first derivative of $T_{d,k}(x_k(t))$ with respect to $x_k(t)$ satisfies $T'_{d,k}(x_k(t)) = -\frac{f'(x_k(t))}{[f(x_k(t))]^2} \le 0$. To sum up, $T_{d,k}(x_k(t))$ is a decreasing and convex function of $x_k(t)$ in the interval [0, 1].

APPENDIX B
PROOF OF THEOREM 3

From (20), we have

$$L(\Theta(t+1)) - L(\Theta(t)) - \alpha \mathrm{QoE}(t) \le D_1(t) - V \mathrm{QoE}(a^*(t)) + \sum_{k=1}^{K} Q_k(t)\left(R_k^*(a^*(t)) - T_p C_k^*(a^*(t))\right), \tag{39}$$

where $L(\Theta(t+1))$ is as defined in (17) and $a^*(t) \in \mathcal{A}_{G(t)}$. Now note that for all $\tau \in \{t, \cdots, t+T-1\}$:

$$|Q_k(\tau) - Q_k(t)| \le (\tau - t) \max\left[R_k^{max}, T_p C_k^{max}\right], \tag{40}$$

where $R_k^{max}$ and $C_k^{max}$ satisfy the following constraints:

$$0 \le R_k^*(t) \le R_k^{max} \le R_{max}, \; \forall t, k, \tag{41}$$

$$0 \le C_k^*(t) \le C_k^{max} \le C_{max}, \; \forall t, k. \tag{42}$$

Plugging (40) in (39), we get:

$$L(\Theta(\tau+1)) - L(\Theta(\tau)) - \alpha \mathrm{QoE}(a^*(\tau)) \le D_1(t) - V \mathrm{QoE}(a^*(\tau)) + \sum_{k=1}^{K} \left[Q_k(t) + (\tau - t)\max(R_k^{max}, T_p C_k^{max})\right]\left[R_k^*(\tau) - T_p C_k^*(\tau)\right]$$

$$\le D_2 + 2 D_2 (\tau - t) - V \mathrm{QoE}(a^*(\tau)) + \sum_{k=1}^{K} Q_k(t)\left[R_k^*(\tau) - T_p C_k^*(\tau)\right], \tag{43}$$

where $D_2 = \frac{1}{2} \sum_{k=1}^{K} \left[R_{max}^2 + T_p^2 C_{max}^2\right]$.

Summing (43) over $\tau \in \{jT, \cdots, jT+T-1\}$ and using the fact that $\sum_{\tau=jT}^{jT+T-1} (\tau - jT) = \frac{(T-1)T}{2}$, we have

$$\Delta_T(t) - V E\left\{\sum_{t=jT}^{jT+T-1} \mathrm{QoE}(t)\right\} \le D_2 T^2 - V E\left\{\sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k^*(t)\right\} + E\left\{\sum_{k=1}^{K} Q_k(jT) \sum_{t=jT}^{jT+T-1} \left[R_k^*(t) - T_p C_k^*(t)\right]\right\}. \tag{44}$$

APPENDIX C
PROOF OF THEOREM 4

(a) We now consider the policy comprising the control actions satisfying the following constraint:

$$\frac{1}{T} \sum_{t=jT}^{jT+T-1} \left[R_k^*(t) - T_p C_k^*(t)\right] \le -\epsilon, \tag{45}$$

where $\epsilon > 0$ is arbitrary. We plug (45) in (44) and obtain

$$L(\Theta((j+1)T)) - L(\Theta(jT)) - V \sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k(t) \le D_2 T^2 - V \sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k^*(t) - \epsilon \sum_{k=1}^{K} Q_k(jT). \tag{46}$$

Denote by $\mathrm{QoE}_j^{max}$ the maximum of the QoE for frame j. Using the fact (33), rearranging (46), and ignoring appropriate terms, we obtain

$$L(\Theta((j+1)T)) - L(\Theta(jT)) \le D_2 T^2 + V T\left(\mathrm{QoE}^{max} - \mathrm{QoE}^{min}\right) - \epsilon \sum_{k=1}^{K} Q_k(jT). \tag{47}$$

Summing the above over the frames $j \in \{0, \cdots, F-1\}$ yields

$$L(\Theta(FT)) - L(\Theta(0)) \le D_2 T^2 F + V F T\left(\mathrm{QoE}^{max} - \mathrm{QoE}^{min}\right) - \epsilon \sum_{j=0}^{F-1} \sum_{k=1}^{K} Q_k(jT). \tag{48}$$

By using the fact that $L(\Theta(FT)) \ge 0$ and the fact that

$$-\sum_{j=0}^{F-1} \sum_{k=1}^{K} Q_k(jT) \le -\sum_{t=0}^{FT-1} \sum_{k=1}^{K} Q_k(t) + \frac{FT(T-1)}{2} \sum_{k=1}^{K} \max\left(R_k^{max}, T_p C_k^{max}\right),$$

rearranging and neglecting appropriate terms, we get

$$\epsilon \sum_{t=0}^{FT-1} \sum_{k=1}^{K} Q_k(t) \le L(\Theta(0)) + D_2 T^2 F + V F T\left(\mathrm{QoE}^{max} - \mathrm{QoE}^{min}\right) + \frac{FT(T-1)}{2} \sum_{k=1}^{K} \max\left(R_k^{max}, T_p C_k^{max}\right). \tag{49}$$

Taking limits as $F \to \infty$ and dividing (49) by $\epsilon F T$ yield

$$\frac{1}{FT} \sum_{t=0}^{FT-1} \sum_{k=1}^{K} Q_k(t) \le \frac{V(\mathrm{QoE}^{max} - \mathrm{QoE}^{min})}{\epsilon} + \frac{D_2 T}{\epsilon} + \frac{T-1}{2} \sum_{k=1}^{K} \max\left(R_k^{max}, T_p C_k^{max}\right). \tag{50}$$

(b) Let $\mathrm{QoE}_j^{opt}$ represent the optimal solution for frame j to the static problem (27). We now consider the policy comprising the decisions which achieve the optimal solution $\mathrm{QoE}_j^{opt}$. Using $\mathrm{QoE}_j^{opt}$ in (46), we have

$$L(\Theta((j+1)T)) - L(\Theta(jT)) - V \sum_{t=jT}^{jT+T-1} \sum_{k=1}^{K} \mathrm{QoE}_k(t) \le D_2 T^2 - V T \, \mathrm{QoE}_j^{opt}. \tag{51}$$

Summing the above over the frames $j \in \{0, \cdots, F-1\}$ yields

$$L(\Theta(FT)) - L(\Theta(0)) - V \sum_{t=0}^{FT-1} \sum_{k=1}^{K} \mathrm{QoE}_k(t) \le D_2 F T^2 - V T \sum_{j=0}^{F-1} \mathrm{QoE}_j^{opt}. \tag{52}$$

Dividing both sides by $V F T$ and using the fact that $L(\Theta(FT)) \ge 0$, we get

$$\frac{1}{FT} \sum_{t=0}^{FT-1} \mathrm{QoE}(t) \ge \frac{-L(\Theta(0))}{V F T} - \frac{D_2 F T^2}{V F T} + \frac{V T}{V F T} \sum_{j=0}^{F-1} \mathrm{QoE}_j^{opt}. \tag{53}$$

Taking limits as $F \to \infty$ yields (31).

REFERENCES

REFERENCES

[1] White Paper, “Cisco visual networking index: Global mobile data traffic forecast update, 2014-2019,” technical report, Cisco, Feb. 2015.
[2] Y. Sanchez, “iDASH: Improved dynamic adaptive streaming over HTTP using scalable video coding,” in Proceedings of ACM Multimedia Systems, pp. 23-25, 2011.
[3] M. Chiang, Networked Life: 20 Questions and Answers. Cambridge, U.K.: Cambridge Univ. Press, Dec. 2012.
[4] Y. Guo, Q. Yang, and K. S. Kwak, “Quality-oriented rate control and resource allocation in time-varying OFDMA networks,” IEEE Transactions on Vehicular Technology, vol. 66, no. 3, pp. 2324-2338, March 2017.
[5] M. Rugelj, U. Sedlar, and M. Volk, “Novel cross-layer QoE-aware radio resource allocation algorithms in multiuser OFDMA systems,” IEEE Transactions on Communications, vol. 62, no. 9, pp. 3196-3208, Sept. 2014.
[6] Y. Xu, R. Q. Hu, and Y. Qian, “Video quality-based spectral and energy efficient mobile association in heterogeneous wireless networks,” IEEE Transactions on Communications, vol. 64, no. 2, pp. 805-817, Feb. 2016.
[7] J. Zheng, Y. Cai, and Y. Liu, “Optimal power allocation and user scheduling in multicell networks: Base station cooperation using a game-theoretic approach,” IEEE Transactions on Wireless Communications, vol. 13, no. 12, pp. 6928-6942, Dec. 2014.
[8] P. Li, Y. Wang, and W. Zhang, “QoE-oriented two-stage resource allocation in femtocell networks,” in Proceedings of IEEE VTC, pp. 1-5, Sept. 2014.
[9] R. Xie, F. R. Yu, and T. Huang, “Joint user association and rate allocation for HTTP adaptive streaming in heterogeneous cellular networks,” in Proceedings of IEEE ICC, pp. 1-6, 2016.
[10] C. Zhou, C. W. Lin, and Z. Guo, “mDASH: A Markov decision-based rate adaptation approach for dynamic HTTP streaming,” IEEE Transactions on Multimedia, vol. 18, no. 4, pp. 738-751, April 2016.
[11] J. Kim, G. Caire, and A. F. Molisch, “Quality-aware streaming and scheduling for device-to-device video delivery,” IEEE/ACM Transactions on Networking, vol. 24, no. 4, pp. 2319-2331, Aug. 2016.
[12] D. Bethanabhotla, G. Caire, and M. J. Neely, “Adaptive video streaming for wireless networks with multiple users and helpers,” IEEE Transactions on Communications, vol. 63, no. 1, pp. 268-285, Jan. 2015.
[13] T. Wiegand, G. J. Sullivan, and G. Bjontegaard, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.
[14] M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. San Rafael, CA, USA: Morgan & Claypool, 2010.
[15] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[16] M. Grant, S. Boyd, and Y. Ye, “CVX: Matlab software for disciplined convex programming,” version 2.0 beta, Sept. 2012. [Online]. Available: http://cvxr.com/cvx.
[17] A. J. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[18] J. Qiao, X. S. Shen, and J. W. Mark, “Video quality provisioning for millimeter wave 5G cellular networks with link outage,” IEEE Transactions on Wireless Communications, vol. 14, no. 10, pp. 5692-5703, Oct. 2015.
[19] X. Yin, A. Jindal, and V. Sekar, “A control-theoretic approach for dynamic adaptive video streaming over HTTP,” in Proceedings of ACM SIGCOMM, Aug. 2015.
[20] M. Li and C. Y. Lee, “A cost-effective and real-time QoE evaluation method for multimedia streaming services,” Telecommunication Systems, vol. 59, no. 3, pp. 317-327, July 2015.
[21] D. Bethanabhotla, G. Caire, and M. J. Neely, “WiFlix: Adaptive video streaming in massive MU-MIMO wireless networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 6, pp. 4088-4103, June 2016.
[22] M. Chen, M. Ponec, and S. Sengupta, “Utility maximization in peer-to-peer systems with applications to video conferencing,” IEEE/ACM Transactions on Networking, vol. 20, no. 6, pp. 1681-1694, Dec. 2012.
[23] “Mobile-Edge Computing - Introductory Technical White Paper,” Sept. 2014.
[24] C. Liang and F. R. Yu, “Wireless virtualization for next generation mobile cellular networks,” IEEE Wireless Communications, vol. 22, no. 1, pp. 61-69, Feb. 2015.
[25] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource allocation and cross-layer control in wireless networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-144, April 2006.
[26] S. Chen, J. Yang, and Y. Ran, “Adaptive layer switching algorithm based on buffer underflow probability for scalable video streaming over wireless networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 6, pp. 1146-1160, June 2016.
[27] X. Xiang, C. Lin, and X. Chen, “Toward optimal admission control and resource allocation for LTE-A femtocell uplink,” IEEE Transactions on Vehicular Technology, vol. 64, no. 7, pp. 3247-3261, July 2015.
[28] T. Hoßfeld, M. Seufert, and C. Sieber, “Identifying QoE optimal adaptation of HTTP adaptive streaming based on subjective studies,” Computer Networks, vol. 81, pp. 320-332, 2015.
[29] V. Mnih, K. Kavukcuoglu, and G. Ostrovski, “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[30] M. Mitchell, Complexity: A Guided Tour. Oxford University Press, 2009.

Yashuang Guo received the B.S. degree in electronic and information engineering from Dalian University, China, in 2011. She is currently pursuing a Ph.D. degree in Communication and Information Systems at Xidian University. She was a Visiting Scholar with the University of British Columbia from Sept. 2016 to Aug. 2017. Her research interests include cross-layer design, QoE provisioning, and applications of stochastic optimization in wireless networks.

Qinghai Yang received his B.S. degree in Communication Engineering from Shandong University of Technology, China, in 1998, his M.S. degree in Information and Communication Systems from Xidian University, China, in 2001, and his Ph.D. degree in Communication Engineering from Inha University, Korea, in 2007, with the University President Award. From 2007 to 2008, he was a research fellow at UWB-ITRC, Korea. Since 2008, he has been with Xidian University, China. His current research interests lie in the fields of autonomic communication, content delivery networks, and LTE-A techniques.

F. Richard Yu (S’00–M’04–SM’08) received the Ph.D. degree in electrical engineering from the University of British Columbia (UBC) in 2003. From 2002 to 2006, he was with Ericsson (in Lund, Sweden) and a start-up in California, USA. He joined Carleton University in 2007, where he is currently a Professor. He received the IEEE Outstanding Service Award in 2016, the IEEE Outstanding Leadership Award in 2013, the Carleton Research Achievement Award in 2012, the Ontario Early Researcher Award (formerly Premier’s Research Excellence Award) in 2011, the Excellent Contribution Award at IEEE/IFIP TrustCom 2010, the Leadership Opportunity Fund Award from Canada Foundation of Innovation in 2009, and the Best Paper Awards at IEEE VTC 2017 Spring, ICC 2014, Globecom 2012, IEEE/IFIP TrustCom 2009, and Int’l Conference on Networking 2005. His research interests include cross-layer/cross-system design, connected vehicles, security, and green ICT. He serves on the editorial boards of several journals, including as Co-Editor-in-Chief for Ad Hoc & Sensor Wireless Networks and Lead Series Editor for IEEE Transactions on Vehicular Technology, IEEE Transactions on Green Communications and Networking, and IEEE Communications Surveys & Tutorials. He has served as the Technical Program Committee (TPC) Co-Chair of numerous conferences. Dr. Yu is a registered Professional Engineer in the province of Ontario, Canada, a Fellow of the Institution of Engineering and Technology (IET), and a senior member of the IEEE. He is a Distinguished Lecturer and a member of the Board of Governors of the IEEE Vehicular Technology Society.

Victor C. M. Leung (S’75–M’89–SM’97–F’03) received the B.A.Sc. (Hons.) degree in electrical engineering from the University of British Columbia (UBC) in 1977, and was awarded the APEBC Gold Medal as the head of the graduating class in the Faculty of Applied Science. He attended graduate school at UBC on a Canadian Natural Sciences and Engineering Research Council Postgraduate Scholarship and received the Ph.D. degree in electrical engineering in 1982. From 1981 to 1987, Dr. Leung was a Senior Member of Technical Staff and satellite system specialist at MPR Teltech Ltd., Canada. In 1988, he was a Lecturer in the Department of Electronics at the Chinese University of Hong Kong. He returned to UBC as a faculty member in 1989, and currently holds the positions of Professor and TELUS Mobility Research Chair in Advanced Telecommunications Engineering in the Department of Electrical and Computer Engineering. Dr. Leung has co-authored more than 1000 journal/conference papers and 37 book chapters, and has co-edited 12 book titles. Several of his papers have been selected for best paper awards. His research interests are in the broad areas of wireless networks and mobile systems. Dr. Leung is a registered Professional Engineer in the Province of British Columbia, Canada. He is a Fellow of IEEE, the Royal Society of Canada, the Engineering Institute of Canada, and the Canadian Academy of Engineering. He was a Distinguished Lecturer of the IEEE Communications Society. He is serving on the editorial boards of the IEEE Wireless Communications Letters, IEEE Transactions on Green Communications and Networking, IEEE Transactions on Cloud Computing, IEEE Access, Computer Communications, and several other journals, and has previously served on the editorial boards of the IEEE Journal on Selected Areas in Communications - Wireless Communications Series and Series on Green Communications and Networking, IEEE Transactions on Wireless Communications, IEEE Transactions on Vehicular Technology, IEEE Transactions on Computers, and Journal of Communications and Networks. He has guest-edited many journal special issues, and provided leadership to the organizing committees and technical program committees of numerous conferences and workshops. He received the IEEE Vancouver Section Centennial Award and the 2011 UBC Killam Research Prize. He is the recipient of the 2017 Canadian Award for Telecommunications Research. He co-authored papers that won the 2017 IEEE ComSoc Fred W. Ellersick Prize and the 2017 IEEE Systems Journal Best Paper Award.