Resource Allocation for Multihomed Scalable ... - Stanford University

Resource Allocation for Multihomed Scalable Video Streaming to Multiple Clients

Nikolaos M. Freris∗, Cheng-Hsin Hsu∗, Xiaoqing Zhu†, and Jatinder Pal Singh∗

∗ Deutsche Telekom R&D Laboratories, 5050 El Camino Real 221, Los Altos, CA 94022
† Cisco Systems Inc., 170 West Tasman Drive, San Jose, CA 95134

Abstract: We consider multihomed scalable video streaming, where videos are transmitted by a single server to multiple clients over heterogeneous access networks. The specific problem that we address is to determine which video packets to transmit over each network, in order to minimize a cost function of the expected video distortion at the clients. We present a network model and a video model that capture the network conditions and video characteristics, respectively. We develop an integer program for deterministic packet scheduling. We propose different cost functions in order to provide service differentiation and address fairness among users. We propose several suboptimal convex problems for randomized packet scheduling, and study their performance and complexity. We propose an algorithm that yields good performance and is suitable for real-time applications. We conduct extensive trace-driven simulations to evaluate the proposed algorithms using real network conditions and scalable video streams. The simulation results show that the proposed algorithm: (i) outperforms the rate control algorithms defined in the Datagram Congestion Control Protocol (DCCP) by about 10 dB, (ii) results in video quality that is 4.33 dB and 1.84 dB higher than that of the two heuristics developed in [1], (iii) runs efficiently, up to six times faster than one of the heuristics, and (iv) indeed can provide service differentiation among users.

Index Terms: video streaming, quality optimization, rate control, stream adaptation

I. INTRODUCTION

Fig. 1. System architecture of a scalable video streaming system with $U$ clients and $N$ access networks.

Modern laptops and hand-held devices can access multiple networks with diverse and dynamic characteristics. For example, 3G data networks offer pervasive connectivity but may suffer from low network capacity [2], while Wireless Local-Area Networks (WLANs) can provide higher capacity but each access point only covers a small area. In multihomed video streaming [3], [4], a video is concurrently sent over multiple access networks in order to achieve higher aggregate bandwidth, more pervasive connectivity, improved error resilience, and lower communication delays [5]. Several US mobile service providers have reported large data traffic increases in their 3G data networks [6], [7]; multihoming can help offload traffic from congested networks, in order to attain better streaming quality, as well as lower transit costs for service providers.

Arbitrarily splitting a video stream into multiple substreams and sending each substream over an access network may lead to degraded video quality and playout glitches; this is because transmitting a substream at a low rate may underutilize the network resources, while transmitting at a rate close to the available bandwidth causes late packet delivery. Rate control based on measurements of available bit rate (ABR)

and round-trip time (RTT) needs to be used to achieve a good trade-off between throughput and delay.

Once the bit rate of each substream is determined, the video stream must be adapted into the right format so that it can be delivered to the client in a timely fashion. Stream adaptation is typically realized by computationally demanding transcoding [8], [9]. In contrast, scalable video coding, such as the H.264/SVC standard [9], supports efficient stream adaptation and allows service providers to save expenses on deploying streaming servers and transcoders.¹ Scalable video streams, however, feature complex interdependencies among video packets, which stream adaptation must carefully account for.

We study the joint rate control and scalable stream adaptation problem for multiple clients² concurrently competing for the same access networks (cf. Fig. 1). We formulate an optimization problem to determine, for each client: (i) the streaming rate over each access network, (ii) the video packets to be transmitted, and (iii) the access network each video packet is sent over. Our contributions can be summarized as follows:

• We formulate the rate control and stream adaptation problem as an integer program where the objective is to minimize a cost function of the expected video distortion. We propose different cost functions in order to provide service differentiation and address fairness among users.
• We consider randomized packet scheduling by relaxing the integer program into real-valued optimization programs. We derive convex programming approximations which can be efficiently solved using convex solvers such as CVX [11]. We analyze the trade-off between performance and computational complexity, and propose a convex program that yields good performance while being suitable for real-time applications.

Simulation results show that the proposed algorithm: (i) outperforms the rate control algorithms defined in the Datagram Congestion Control Protocol (DCCP) standard [12] by about 10 dB, (ii) achieves a better balance between performance and run-time, (iii) results in better performance than the heuristic algorithms proposed in [1], under diverse background traffic load, and (iv) indeed provides service differentiation among users.

¹ Despite a small cost on coding inefficiency, modern H.264/SVC coders are reported to significantly outperform previous scalable coding schemes, and even outperform some nonscalable coders such as MPEG-4 ASP (Advanced Simple Profile) [10].
² Throughout the paper, we use the terms client and user interchangeably.

II. RELATED WORK

Rate control for nonscalable video streams has been investigated in [3], [4], [13]–[16]. Szwabe et al. [13] propose an architecture to monitor network conditions and control the streaming rate over a single access network. Jurca and Frossard [14] study the problem of rate control for video streaming over a multi-hop network, assuming known packet loss rates and available bandwidths for each network link. Zhu et al. [15] propose joint routing and rate control algorithms for ad-hoc wireless networks. Rate control for clients with multiple interfaces has been studied in [3], [4], [16]. Singh et al. [16] propose a solution based on stochastic control of Markov Decision Processes, Alpcan et al. [4] give a solution based on $H^\infty$-optimal control, and Zhu et al. [3] present a solution based on convex optimization.

Efficient stream adaptation for scalable streams has been studied in [17]–[21]. Hefeeda and Hsu [17] consider the stream adaptation problem for Fine-Grained Scalable (FGS) video streaming from multiple senders to a single client; they employ a rate-distortion (R-D) function designed for FGS streams, and consider stream adaptation to maximize the overall video quality. Amonou et al. [18] study the problem of prioritizing video packets of H.264/SVC streams; they empirically calculate the distortion impact of dropping each video packet, and give higher priorities to video packets with higher impact values. Sun et al. [19] propose an R-D model for FGS streams coded by H.264/SVC, which captures the drifting error caused by truncating video packets. Mansour et al. [20] study stream adaptation between one base station and multiple clients in a single-hop wireless network. In [21] the authors proposed a streaming platform to support multihoming, which was tested to reduce video interruptions and achieve higher and more stable received video quality. In previous work [1], we considered scalable video streaming for a single client.

Fig. 2. Dependency among NALUs of H.264/SVC streams. Each square represents a NALU belonging to an MGS layer, and each rounded box represents a video frame.

III. SYSTEM ARCHITECTURE AND OPTIMIZATION PROBLEM

A. System Architecture

A multihomed scalable streaming system consists of a scalable streaming server and $U$ multihomed clients, where each client has access to $N$ heterogeneous networks (see Fig. 1). The server contains a database of scalable videos; when requested by a client, a video stream is divided into $N$ substreams by a video splitter which controls the rate of each substream to ensure timely delivery of video packets. For each client, the server sets up a connection over each access network, and transmits substream $n$ ($1 \le n \le N$) over access network $n$. Each client has a video assembler that combines the received substreams into a single scalable video stream, which is then fed to a video decoder.

Access networks are heterogeneous and time-varying; periodic measurements of the ABR, $c_n$, as well as the RTT, $\tau_n$, are carried out for each access network using a lightweight tool such as Abing [22]. This measurement tool runs on both server and client sides, and monitors end-to-end network conditions. Given the network conditions and the video characteristics for each client, we develop an algorithm to determine the streaming rates of individual access networks along with the video packets to be included in each substream.

B. Network Model

For a given user $u$ ($1 \le u \le U$), we let $r_{u,n}$ be the substream rate over access network $n$ and $r_n := \sum_{u=1}^{U} r_{u,n}$ be the total streaming rate for network $n$. For access network $n$, we use $p_n$ to denote the packet loss probability, which accounts for losses due to packets missing their playout deadlines. We assume that access networks are statistically independent and write $p_n = g_n(c_n, r_n)$, where $g_n(c_n, r_n)$ is increasing in $r_n$ and decreasing in $c_n$. While our analysis can accommodate various queueing models [23] in defining $g_n(c_n, r_n)$, we adopt the M/M/1 model, which was shown to yield a good approximation in typical streaming applications [3], [24].

We denote the playout deadline by $t_0$ and define the average one-way delay by $t_n := \tau_n / 2$. The one-way delay can be related to the residual bandwidth, $c_n - r_n$, as $t_n = \frac{\alpha_n}{c_n - r_n}$, where $\alpha_n$ is a parameter estimated from past observations of $t_n$, $c_n$, $r_n$ via linear regression [3]. We have

\[ p_n = e^{-\frac{t_0 (c_n - r_n)}{\alpha_n}}. \tag{1} \]
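As a numerical illustration of the network model, the following minimal Python sketch evaluates the one-way delay and the loss probability of Eq. (1). The function names, the example units (Mbps for rates, Mbit for $\alpha_n$, seconds for $t_0$) and the treatment of a saturated network ($r_n \ge c_n$) as certain loss are assumptions made here for illustration; they are not taken from the paper.

```python
import math

def one_way_delay(c_n, r_n, alpha_n):
    """Average one-way delay t_n = alpha_n / (c_n - r_n), valid for r_n < c_n."""
    return alpha_n / (c_n - r_n)

def loss_probability(c_n, r_n, alpha_n, t0):
    """Late-loss probability of Eq. (1): p_n = exp(-t0 * (c_n - r_n) / alpha_n)."""
    if r_n >= c_n:
        # Assumption: with no residual bandwidth, every packet misses its deadline.
        return 1.0
    return math.exp(-t0 * (c_n - r_n) / alpha_n)

if __name__ == "__main__":
    # Hypothetical values: 2 Mbps ABR, 1 Mbps streaming rate, alpha = 0.5 Mbit, 1 s deadline.
    print(one_way_delay(2.0, 1.0, 0.5))
    print(loss_probability(2.0, 1.0, 0.5, 1.0))
```

The sketch makes the monotonicity assumed for $g_n$ easy to check: raising `r_n` increases the returned probability, while raising `c_n` decreases it.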

C. Video Model

We consider H.264/SVC [9] video streams coded with medium-grained quality scalability (MGS). Each coded stream $u$, $1 \le u \le U$, is divided into multiple Network Abstraction Layer Units (NALUs). For user $u$, each NALU $g_{u,m,q}$ is identified by frame number $m$, $1 \le m \le M_u$, and quality layer $q$, $0 \le q \le Q_u$. NALU $g_{u,m,0}$ corresponds to the base layer of frame $m$, while $\{g_{u,m,q}\}_{q=1}^{Q_u}$ denote quality enhancement layers. The H.264/SVC standard imposes dependencies among NALUs: $g_{u,m,q}$ ($0 < q \le Q_u$) depends on all $g_{u,m,q'}$, $q' < q$, while $g_{u,m,0}$ depends on its ancestor frames as determined by the hierarchical prediction structure (cf. Fig. 2). We let³ $\mathbf{P}_{u,m}$ be the ancestor frames of frame $m$, and use $s_{u,m,q}$ to represent the size of NALU $g_{u,m,q}$.

Let $x_{u,m,q,n}$ be a boolean decision variable which is equal to 1 if $g_{u,m,q}$ is sent over access network $n$, and is 0 otherwise. We allow for a packet to be sent over at most one access network; this is because efficient link-layer error control mechanisms, such as forward error correction (FEC) and automatic repeat request (ARQ), are widely applied in wireless networks to reduce packet loss rates, hence sending a NALU over multiple access networks does not lead to significant improvements in video quality [25]. Let

\[ x_{u,m,q} = \sum_{n=1}^{N} x_{u,m,q,n} \tag{2} \]

be a binary variable with value 1 if NALU $g_{u,m,q}$ is sent over some network, and 0 otherwise.

We model video distortion in mean square error (MSE). We let $d_{u,m} = e_{u,m} + y_{u,m}$ be the total distortion of frame $m$, where $e_{u,m}$ denotes the truncation distortion, and $y_{u,m}$ denotes the drifting distortion. Truncation distortion refers to the quality degradation due to dropping NALUs of frame $m$. Let $\hat{\delta}_{u,m}$ be the full-quality distortion of frame $m$, achieved when all NALUs are received, and $\delta_{u,m,q}$ ($0 \le q \le Q_u$) be the additional distortion introduced by dropping NALU $g_{u,m,q}$. In order to decode $g_{u,m,q}$, all NALUs $g_{u,m,q'}$, $q' < q$, must have been decoded, thus we have

\[ e_{u,m} = \hat{\delta}_{u,m} + \sum_{q=0}^{Q_u} \Bigl( 1 - \prod_{q' \le q} x_{u,m,q'} \Bigr) \delta_{u,m,q}. \tag{3} \]

Drifting distortion is caused by imperfect reconstruction of ancestor frames $\mathbf{P}_{u,m}$ used for inter-frame prediction. Following the discussion in [1], [26], we propose an affine model:

\[ y_{u,m} = \alpha_{u,m} + \sum_{k \in \mathbf{P}_{u,m}} \beta_{u,m,k}\, e_{u,k}, \tag{4} \]

where $\alpha_{u,m}$, $\beta_{u,m,k}$ are parameters to be estimated from measurements; each $\beta_{u,m,k}$ is constrained to be nonnegative.

³ In this paper, we use bold symbols to represent vectors.

D. Optimization Problem

We denote, by some abuse of notation, the expected distortion, after accounting for random packet losses, of the $m$th frame of user $u$ by $d_{u,m}$ and define the vectors $\mathbf{d}_u := (d_{u,1}, \cdots, d_{u,M_u})'$, $\mathbf{d} := (\mathbf{d}_1', \cdots, \mathbf{d}_U')'$. We formulate the multihomed scalable video streaming problem as one of finding the $x_{u,m,q,n}$ values to minimize a convex cost function $C(\mathbf{d}) : \mathbb{R}_+^{\sum_{u=1}^{U} M_u} \to \mathbb{R}_+$, which is non-decreasing in each argument. One special case of interest is $C(\mathbf{d}) = \sum_{u=1}^{U} C_u(\mathbf{d}_u)$, where each $C_u(\mathbf{d}_u)$ is convex and non-decreasing in each argument. We can provide service differentiation among users and frames by considering different cost functions, e.g., $C_u(\mathbf{d}_u) = \sum_{m=1}^{M_u} w_{u,m} d_{u,m}$, $w_{u,m} \ge 0$. We can also address fairness among users, e.g., weighted min-max fairness, by setting $C(\mathbf{d}) = \max_{u=1,\cdots,U} w_u \sum_{m=1}^{M_u} d_{u,m}$, $w_u \ge 0$.

For user $u$, let $F_u$ be the frame rate in frames-per-second (fps). The average transport stream rate for network $n$ is given by

\[ r_n = \sum_{u=1}^{U} \frac{F_u}{M_u} \sum_{m=1}^{M_u} \sum_{q=0}^{Q_u} s_{u,m,q}\, x_{u,m,q,n}. \tag{5} \]

Using the network model (1), the expected delivery probability of NALU $g_{u,m,q}$, denoted, by some abuse of notation, by $x_{u,m,q} \in [0, 1]$, is given by

\[ x_{u,m,q} = \sum_{n=1}^{N} (1 - p_n)\, x_{u,m,q,n}, \tag{6} \]

while the expected truncation distortion is still given by (3), and the expected drifting distortion by (4).

Since NALUs have different sizes, some NALUs $g_{u,m,q}$ may comprise multiple, say $P_{u,m,q}$, packets. This case can be handled by letting $x_{u,m,q,p,n}$ be 1 if the $p$-th packet of NALU $g_{u,m,q}$ is sent over access network $n$, and 0 otherwise. Then, we may replace (6) and (3) with

\[ x_{u,m,q,p} = \sum_{n=1}^{N} (1 - p_n)\, x_{u,m,q,p,n}, \tag{7} \]

\[ e_{u,m} = \hat{\delta}_{u,m} + \sum_{q=0}^{Q_u} \Bigl( 1 - \prod_{q' \le q} \prod_{p=1}^{P_{u,m,q'}} x_{u,m,q',p} \Bigr) \delta_{u,m,q}, \tag{8} \]

respectively. In the sequel, we assume $P_{u,m,q} = 1$ for notational simplicity; extending the optimization program and the proposed optimization algorithms to handle this general case is straightforward.

The joint rate control and stream adaptation problem for the $U$ clients, considering $M_u$ frames for client $u$, is given by the integer program:

\begin{align}
\min_{\mathbf{x}} \quad & C(\mathbf{d}) \tag{9a} \\
\text{s.t.} \quad & r_n = \sum_{u=1}^{U} \frac{F_u}{M_u} \sum_{m=1}^{M_u} \sum_{q=0}^{Q_u} s_{u,m,q}\, x_{u,m,q,n}, \tag{9b} \\
& p_n = e^{-t_0 (c_n - r_n)/\alpha_n}, \tag{9c} \\
& x_{u,m,q} = \sum_{n=1}^{N} (1 - p_n)\, x_{u,m,q,n}, \tag{9d} \\
& e_{u,m} = \hat{\delta}_{u,m} + \sum_{q=0}^{Q_u} \Bigl( 1 - \prod_{q' \le q} x_{u,m,q'} \Bigr) \delta_{u,m,q}, \tag{9e} \\
& y_{u,m} = \alpha_{u,m} + \sum_{k \in \mathbf{P}_{u,m}} \beta_{u,m,k}\, e_{u,k}, \tag{9f} \\
& d_{u,m} = e_{u,m} + y_{u,m}, \tag{9g} \\
& \sum_{n=1}^{N} x_{u,m,q,n} \le 1, \tag{9h} \\
& x_{u,m,q,n} \in \{0, 1\}. \tag{9i}
\end{align}
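To make the chain of constraints (9b)–(9g) concrete, the following Python sketch evaluates the expected per-frame distortions of a given schedule for a single user. It is a minimal illustration rather than the authors' implementation: the function name, the nested-list data layout, the dictionary representation of $\beta_{u,m,k}$, and the clamping of $p_n$ at 1 are all assumptions introduced here.

```python
import math

def evaluate_schedule(x, s, delta_hat, delta, a, beta, parents, c, alpha, t0, fps):
    """Evaluate (9b)-(9g) for one user.

    x[m][q][n]   : schedule entry (0/1, or a probability in the relaxed problem)
    s[m][q]      : NALU sizes; delta_hat[m], delta[m][q]: distortion parameters
    a[m], beta[m]: affine drift parameters, beta[m] is a dict {k: beta_mk}
    parents[m]   : list of ancestor frame indices of frame m
    c[n], alpha[n], t0, fps: network parameters, playout deadline, frame rate
    """
    M, Q1, N = len(x), len(x[0]), len(c)
    # (9b): average streaming rate on each network
    r = [fps / M * sum(s[m][q] * x[m][q][n] for m in range(M) for q in range(Q1))
         for n in range(N)]
    # (9c): loss probability; clamping at 1 is an assumption for overloaded networks
    p = [min(1.0, math.exp(-t0 * (c[n] - r[n]) / alpha[n])) for n in range(N)]
    # (9d): expected delivery probability of every NALU
    xd = [[sum((1.0 - p[n]) * x[m][q][n] for n in range(N)) for q in range(Q1)]
          for m in range(M)]
    # (9e): truncation distortion; layer q counts only if all lower layers arrive
    e = []
    for m in range(M):
        prod, e_m = 1.0, delta_hat[m]
        for q in range(Q1):
            prod *= xd[m][q]
            e_m += (1.0 - prod) * delta[m][q]
        e.append(e_m)
    # (9f) and (9g): drifting distortion and total distortion
    d = [e[m] + a[m] + sum(beta[m][k] * e[k] for k in parents[m]) for m in range(M)]
    return r, p, d

if __name__ == "__main__":
    # Hypothetical toy instance: one frame, base layer sent, one enhancement layer dropped.
    r, p, d = evaluate_schedule(
        x=[[[1], [0]]], s=[[1.0, 1.0]], delta_hat=[1.0], delta=[[10.0, 5.0]],
        a=[0.0], beta=[{}], parents=[[]], c=[2.0], alpha=[1.0], t0=1.0, fps=1.0)
    print(r, p, d)
```

A cost function $C(\mathbf{d})$ can then be applied to the returned list `d`, for example its sum for the average-distortion objective.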

We also consider randomized packet scheduling by relaxing $x_{u,m,q,n} \in [0, 1]$, where, in this case, $x_{u,m,q,n}$ represent the mean values of $\sum_{u=1}^{U} M_u Q_u N$ mutually independent Bernoulli random variables. The expected truncation distortion is still given by (3), if we assume that packet losses of access networks are statistically independent from the decision variables $x_{u,m,q,n}$. This assumption is an approximation which can be made accurate by considering a two-timescale separation approach: suppose that the optimization window size is large enough for the stochastic process (such as a Markov chain) characterizing the network losses to converge to the stationary distribution. Then the approximation error in (9d) is negligible both in theory and in practice [27].

E. Properties of the Optimization Problem

The objective function is increasing in $r_n$, $p_n$ for any fixed $\mathbf{x}$, and is decreasing in $x_{u,m,q,n}$ for each $(u, m, q, n)$. The objective function is increasing in $e_{u,m}$ and $y_{u,m}$ for each $(u, m)$. Based on these properties, we can replace the equality constraints in (9c), (9d), (9e) with $\ge$, $\le$, $\ge$ inequality constraints, respectively. This yields an equivalent formulation with no nonlinear equality constraints. The above monotonicity properties guarantee that an optimal integer solution for $\mathbf{x}$ satisfies the property that $g_{u,m,q}$ is sent over some network only if all $g_{u,m,q'}$, $q' < q$, are sent over some network as well.

The randomized optimization problem is not convex due to the multinomial terms in (9d), (9e). The problem can neither be converted into an equivalent convex program by means of exponential transformations of the form $x \to e^{x'}$, nor can it be rendered in the format of geometric programming [28]. In Sec. IV-B, we present convex approximations to this problem.

IV. OPTIMIZATION ALGORITHMS

In this section, we propose several deterministic and randomized packet scheduling algorithms. We assume, for the sake of notational simplicity and without loss of generality, that $M_u = M$, $Q_u = Q$, for all $u = 1, \cdots, U$.

A. Optimal Algorithms

The integer program (9) can be solved by means of exhaustive search; the complexity of a naive exhaustive search is $2^{UM(Q+1)N}$, which can be reduced to $(N+1)^{UM(Q+1)}$ in the light of (9h). If we further exploit the monotonicity properties of Sec. III-E, the complexity can be reduced to $[(Q+2)(N+1)]^{MU}$.

B. Convex Approximations

We derive approximate convex programs for the randomized packet scheduling problem; the goal is to approximate the non-convex constraint set of (9) by a convex superset, by considering convex approximations to the multilinear product terms in (9d), (9e). Our approach also applies directly to the case where some NALUs $g_{u,m,q}$ comprise multiple packets, i.e., $P_{u,m,q} > 1$, since (7), (8) also feature multilinear product terms.
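The effect of constraint (9h) on the size of the search space in Sec. IV-A can be checked on a tiny instance. The Python sketch below compares the number of unconstrained binary schedules with the number of schedules in which each NALU is either dropped or sent over exactly one network; the function names and the encoding (0 for "not sent", $n$ for "sent over network $n$") are assumptions made for this illustration.

```python
from itertools import product

def count_naive(U, M, Q, N):
    """Number of binary vectors x when (9h) is ignored: 2^(U*M*(Q+1)*N)."""
    return 2 ** (U * M * (Q + 1) * N)

def count_single_network(U, M, Q, N):
    """Number of schedules satisfying (9h): N + 1 choices per NALU."""
    return (N + 1) ** (U * M * (Q + 1))

def enumerate_schedules(num_nalus, N):
    """Yield every schedule satisfying (9h) as a tuple of per-NALU choices.

    Entry 0 means the NALU is not sent; entry n in 1..N means network n.
    """
    return product(range(N + 1), repeat=num_nalus)

if __name__ == "__main__":
    U, M, Q, N = 1, 1, 1, 2  # hypothetical toy instance with two NALUs
    print(count_naive(U, M, Q, N), count_single_network(U, M, Q, N))
    for schedule in enumerate_schedules(U * M * (Q + 1), N):
        print(schedule)
```

Even for this toy instance the count drops from 16 to 9; the gap grows exponentially with the number of NALUs, which is why exhaustive search is only practical for very small problems.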

In the next lemma, we present a convex programming formulation that approximates the multilinear functions in (9d) and (9e), in a term-by-term fashion.

Lemma 1 (Term-by-Term Convex Approximation (TTC)): The optimization problem

\begin{align}
\min_{\mathbf{x}} \quad & C(\mathbf{d}) \tag{10a} \\
\text{s.t.} \quad & r_n = \sum_{u=1}^{U} \frac{F_u}{M} \sum_{m=1}^{M} \sum_{q=0}^{Q} s_{u,m,q}\, x_{u,m,q,n}, \tag{10b} \\
& p_n \ge e^{-\frac{t_0 (c_n - r_n)}{\alpha_n}}, \tag{10c} \\
& x_{u,m,q} \le \sum_{n=1}^{N} \min(1 - p_n,\, x_{u,m,q,n}), \tag{10d} \\
& e_{u,m} \ge \hat{\delta}_{u,m} + \sum_{q=0}^{Q} \Bigl( 1 - \min_{q' \le q} x_{u,m,q'} \Bigr) \delta_{u,m,q}, \tag{10e} \\
& y_{u,m} = \alpha_{u,m} + \sum_{k \in \mathbf{P}_{u,m}} \beta_{u,m,k}\, e_{u,k}, \tag{10f} \\
& d_{u,m} = e_{u,m} + y_{u,m}, \tag{10g} \\
& \sum_{n=1}^{N} x_{u,m,q,n} \le 1, \tag{10h} \\
& x_{u,m,q,n} \in [0, 1], \tag{10i}
\end{align}

is a convex program whose optimal value is an underestimate of the optimal value of (9). It consists of $UM(Q+1)N$ decision variables and $2UM(Q+1) + 3UM + 2N$ constraints. It can be written as an equivalent smooth convex program by substituting the min in (10d), (10e) with inequality constraints. If we assume that $C(\mathbf{d})$ is continuous on $\mathbb{R}_+^{UM}$, then the convex program admits an optimal solution and has the strong duality property⁴.

⁴ Strong duality is important for the performance of numerical methods [28].

Proof: The concave envelope of $f(x_1, \cdots, x_n) = \prod_{i=1}^{n} x_i$ on $[0, 1]^n$ is $\hat{f}(x_1, \cdots, x_n) = \min_{i=1,\cdots,n} x_i$ [29]. Applying this to each multinomial term in (9d) and (9e), and exploiting the monotonicity properties of Sec. III-E together with the fact that the minimum of affine functions is a concave function, we obtain the convex program (10). The program has a non-empty and compact set of optimal solutions since the constraint set on the decision variables $x_{u,m,q,n}$ is the compact unit hypercube $X = [0, 1]^{UM(Q+1)N}$, and since all inequality constraints along with the objective function involve continuous functions. The convex program (10) has the strong duality property as well as a nonempty and bounded set of dual optimal solutions, because it satisfies Slater's condition [28], i.e., there exists a feasible solution for which all inequality constraints are strictly satisfied (for example, let $\epsilon > 0$ be sufficiently small and $x_{u,m,q,n} = \epsilon$, $p_n = 1 - \epsilon$, $x_{u,m,q} = \frac{\epsilon}{2}$).

We present another method of approximating the nonconvex multilinear inequalities (9d) and (9e), by means of their convex envelopes. This yields the optimal convex approximation of the non-convex constraint set of (9).

Lemma 2 (Multilinear Convex Approximation (MC)): The optimization problem in (11) is a convex program whose optimal value is an underestimate of the optimal value of (9). If we assume that $C(\mathbf{d})$ is continuous on $\mathbb{R}_+^{UM}$, then the convex program admits an optimal solution and has the strong duality property.

Proof: Consider a multilinear function $f(x_1, \cdots, x_k)$ on $[0, 1]^{n_1} \times \cdots \times [0, 1]^{n_k}$, i.e., a function that is linear in each argument. The convex envelope of $f(\cdot)$ is given by $\mathrm{conv}(f) = \max f_\xi(x_1, \cdots, x_k)$ s.t. $f_\xi(x_1, \cdots, x_k) \le f(x_1, \cdots, x_k)$, $\forall (x_1, \cdots, x_k) \in \{0, 1\}^k$, where $f_\xi(\cdot)$ is defined as $f_\xi(x_1, \cdots, x_k) := \sum_{i=1}^{k} f(\xi_1, \cdots, x_i, \cdots, \xi_k) - (k - 1) f(\xi)$ [29]. The rest of the proof follows along the same lines as in Lemma 1.
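The envelope property used in the proof of Lemma 1 can be sanity-checked numerically. The Python sketch below verifies, on the vertices of the unit hypercube and on random interior points, that $\min_i x_i$ coincides with the product at every vertex and never falls below it elsewhere. This is only a spot check under assumed helper names, not a proof that the minimum is the tightest concave overestimator.

```python
import itertools
import math
import random

def product_term(xs):
    """Multilinear term prod_i x_i, as appears in (9d) and (9e)."""
    return math.prod(xs)

def concave_envelope(xs):
    """Concave envelope of the product on [0, 1]^n, i.e. min_i x_i."""
    return min(xs)

def check_envelope(n=4, samples=1000, seed=0):
    """Return True if min(x) == prod(x) on vertices and min(x) >= prod(x) on samples."""
    rng = random.Random(seed)
    for vertex in itertools.product((0.0, 1.0), repeat=n):
        if concave_envelope(vertex) != product_term(vertex):
            return False
    for _ in range(samples):
        xs = [rng.random() for _ in range(n)]
        if concave_envelope(xs) < product_term(xs):
            return False
    return True

if __name__ == "__main__":
    print(check_envelope())
    # Hypothetical point: the relaxation is optimistic away from the vertices.
    print(product_term([0.5, 0.8]), concave_envelope([0.5, 0.8]))
```

Because the envelope overestimates the delivery terms at fractional points, the relaxed programs can only report a distortion that is no larger than that of the original problem, which matches the "underestimate" statement in the lemmas.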

\begin{align}
\min_{\mathbf{x}} \quad & C(\mathbf{d}) \tag{11a} \\
\text{s.t.} \quad & r_n = \sum_{u=1}^{U} \frac{F_u}{M} \sum_{m=1}^{M} \sum_{q=0}^{Q} s_{u,m,q}\, x_{u,m,q,n}, \tag{11b} \\
& p_n \ge e^{-\frac{t_0 (c_n - r_n)}{\alpha_n}}, \tag{11c} \\
& x_{u,m,q} \le \min \Bigl\{ L^1_m(\xi, x, p) := \sum_{n=1}^{N} \bigl( \xi_1(n)\, x_{u,m,q,n} + (1 - p_n)\, \xi_2(n) - \xi_1(n)\, \xi_2(n) \bigr) \notag \\
& \qquad \text{s.t. } \xi \in \{0, 1\}^N \times \{0, 1\}^N,\ L^1_m(\xi, x, p) \ge \sum_{n=1}^{N} (1 - p_n)\, x_{u,m,q,n}\ \ \forall (x, p) \in \{0, 1\}^N \times \{0, 1\}^N \Bigr\}, \tag{11d} \\
& e_{u,m} \ge \hat{\delta}_{u,m} + \sum_{q=0}^{Q} \delta_{u,m,q} - \min \Bigl\{ L^2_m(\xi, \bar{x}) := \sum_{q=0}^{Q} \Bigl( \sum_{q'=0}^{q-1} \prod_{i \le q'} \xi(i)\, \delta_{u,m,q'} + \sum_{q'=q}^{Q} \prod_{i \le q',\, i \ne q} \xi(i)\, x_{u,m,q}\, \delta_{u,m,q'} \Bigr) - Q \sum_{q=0}^{Q} \prod_{i \le q} \xi(i)\, \delta_{u,m,q} \notag \\
& \qquad \text{s.t. } \xi \in \{0, 1\}^{Q+1},\ L^2_m(\xi, \bar{x}) \ge \sum_{q=0}^{Q} \prod_{q' \le q} \bar{x}_{u,m,q'}\, \delta_{u,m,q}\ \ \forall \bar{x} \in \{0, 1\}^{Q+1} \Bigr\}, \tag{11e} \\
& y_{u,m} = \alpha_{u,m} + \sum_{k \in \mathbf{P}_{u,m}} \beta_{u,m,k}\, e_{u,k}, \tag{11f} \\
& d_{u,m} = e_{u,m} + y_{u,m}, \tag{11g} \\
& \sum_{n=1}^{N} x_{u,m,q,n} \le 1, \tag{11h} \\
& x_{u,m,q,n} \in [0, 1]. \tag{11i}
\end{align}

Remark 1 (Hybrid Convex Approximation (HC)): We can replace (10d) with (11d) for a balance between performance and computational complexity. We found, through simulations, that HC significantly outperforms TTC in most cases, while achieving a low run-time, hence we report results using HC in the remainder of the paper.

Remark 2 (Computational Complexity): The TTC optimization program contains a polynomial number of constraints in $U, M, Q, N$. MC requires computing the convex envelope in (11d) and (11e). The convex envelopes in (11d) can be computed offline with $2^{4N}$ tests. This is because the calculation of the convex envelope of $f(x_1, x_2) = \sum_{n=1}^{N} x_1(n)\, x_2(n)$ on $[0, 1]^N \times [0, 1]^N$ does not depend on the problem parameters. However, the convex envelopes in (11e) depend on the problem parameters. The computation takes $UM2^{2(Q+1)}$ tests, which might take a prohibitively long time. HC contains an exponential number of constraints in $N$. For fixed $N$, and given $u, m, q$, the concave envelope in (11d) is given by the minimum of a fixed number, say $A_N$, of affine functions of $x_{u,m,q,n}$, $p_n$ (for example, for $N = 3$, $A_3 = 8$), while the total number of constraints is polynomial in $U, M, Q$. Therefore, for small values of $N$, we propose using HC for a good trade-off between performance and run-time.

Remark 3: We can improve the convex approximation in (10e) by replacing parameter $\delta_{u,m,q}$ with $\bar{\delta}_{u,m,q}$, where $\bar{\delta}_{u,m,q}$ are chosen to satisfy $\prod_{q' \le q} \hat{x}_{u,m,q'}\, \delta_{u,m,q} \approx \min_{q' \le q} \hat{x}_{u,m,q'}\, \bar{\delta}_{u,m,q}$ for a near-optimal solution $\hat{\mathbf{x}}$ computed by some heuristic algorithm, like the ones in Sec. IV-C.

C. Heuristic Algorithms

As a basis for comparison, we present multi-client extensions to the polynomial-complexity heuristic algorithms proposed in [1]. The Simple Rate-Distortion Optimization (SRDO) algorithm takes a maximal allowed packet loss rate $P_{\max}$ as the input and sorts NALUs in descending order of $\delta_{u,m,q}/s_{u,m,q}$. It sequentially assigns NALUs to the access network with the smallest $p_n$ until all access networks are fully loaded, i.e., right before the smallest $p_n$ exceeds $P_{\max}$. SRDO has a complexity of $O\bigl(UM(Q+1) \log[UM(Q+1)]\bigr)$.

The Progressive Rate-Distortion Optimization (PRDO) algorithm considers the net distortion gain of assigning NALU $g_{u,m,q}$ over access network $n$, namely $b_{u,m,q,n}$, based on the distortion model (cf. Sec. III-C). Following the video prediction structure, PRDO sequentially schedules the immediately decodable NALU $g_{u,m,q}$ with the largest nonnegative $\frac{b_{u,m,q,n}}{s_{u,m,q}}$ value, to network $n$. The algorithm stops when all unscheduled NALUs have non-positive net distortion values. PRDO has a complexity of $O\bigl(UM^3(Q+1)^2N^2\bigr)$.

V. EVALUATION

A. Setup

We use Abing [22], [30] to periodically collect ABR and RTT traces between two subnets at Stanford University and Deutsche Telekom Lab (Berlin) for two hours. At Deutsche Telekom Lab, Abing was run over three access networks: Ethernet, 802.11b, and 802.11g. Parts of the network traces were used in [1], [3], [16], and further details can be found therein.

We consider four 4CIF (704x576) video sequences: City, Soccer, Crew, and Harbour, encoded as scalable streams using JSVM Reference Software. Each scalable stream has a base layer and seven MGS layers. In Fig. 3 we plot the average video quality for the four streams, where each sample point represents an MGS layer. We estimate the video model parameters by extracting and decoding multiple substreams from each stream and measuring the video quality. To evaluate the accuracy of the video model, we randomly extracted 32 substreams from each video stream, computed the empirical per-frame video quality and compared it to the video quality estimated by the video model (cf. Fig. 4). The model approximation errors for City, Soccer, Crew, and Harbour were measured to be 2.82%, 1.38%, 0.74%, and 1.65%, respectively.

Fig. 3. R-D curves of the considered videos.

In [1], we implemented a multihomed streaming server in NS-2 [31] which supports the SRDO and PRDO algorithms. We have extended this streaming server to support the HC algorithm⁵ for multiple clients.

⁵ We solve HC numerically using CVX [11]. The run-time values correspond to a 2.8 GHz PC with Matlab R2010a.

Fig. 6. Streaming rate achieved by the different algorithms: (a) sample results from City and (b) overall results.

Several modern transport protocols, including unreliable DCCP [12] and reliable Stream Control Transmission Protocol (SCTP) [32], can be used as benchmarks for the proposed algorithms. Since reliable data delivery is not critical to real-time applications [12], we chose DCCP as the benchmark. We have implemented a multihomed DCCP streaming server, based on an open-source DCCP implementation [33] which supports two standard rate control algorithms: TCP-like and TCP-friendly rate control (TFRC). The DCCP streaming server sets up a connection over each access network and assigns NALUs to each connection from lower to higher quality layers until reaching the rate limit

computed by the rate control algorithms. The DCCP streaming servers with TCP-like and TFRC rate control algorithms are referred to as DCCP-TCP and DCCP-TFRC, respectively.

We simulate multihomed video streaming sessions using the four videos with random start times in the network traces. We inject background traffic in network $n$ using a constant bit rate (CBR) traffic generator, and we set its rate as $\lambda c_n$, where the background traffic load $\lambda$ is a parameter and $c_n$ is the ABR given in the network traces. We chose $M = 32$, $Q = 7$, $t_0 = 1$ sec, $P_{\max} = 0.1$, and $\lambda \in [0.20, 0.90]$. The maximum UDP packet size is set to 1000 bytes. Unless otherwise specified, the average distortion is employed as the cost function. We conduct simulations with a single user ($U = 1$) and compare the performance of HC against SRDO and PRDO. We also run HC for three streams ($U = 3$) of different videos. For each setup, we test the algorithms 300 times, and consider four performance metrics: video quality, streaming rate, packet delivery delay, and run-time.

Fig. 4. The proposed video model closely follows empirical results. Sample results from Soccer.

Fig. 5. Video quality achieved by different algorithms.

Fig. 7. Sample video quality from Harbour under different background traffic load.

B. Results

Comparison against DCCP. We compare the video quality achieved by the HC algorithm against the DCCP rate control algorithms under $\lambda = 40\%$. Fig. 5 presents the average video quality of the considered videos; the HC algorithm outperforms the DCCP rate control algorithms by about 10 dB.
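For reference when reading the comparisons in this subsection, the SRDO baseline of Sec. IV-C can be sketched in a few lines of Python. This is an interpretation of the textual description, not the implementation from [1]: the tuple layout of `nalus`, the `rate_per_size` factor (standing in for $F_u/M$), the choice of the network with the smallest loss probability after tentatively adding the NALU, and the omission of inter-NALU dependencies are all assumptions made here.

```python
import math

def srdo(nalus, c, alpha, t0, p_max, rate_per_size):
    """Greedy SRDO-style assignment.

    nalus         : list of (nalu_id, delta, size) tuples
    c, alpha      : per-network ABR and delay parameter of Eq. (1)
    t0, p_max     : playout deadline and maximal allowed loss probability
    rate_per_size : factor converting a NALU size into an average rate
    Returns a dict mapping nalu_id to the index of the chosen network.
    """
    num_networks = len(c)
    rates = [0.0] * num_networks
    schedule = {}

    def loss(n, rate):
        # Eq. (1); a saturated network is treated as always late (assumption).
        if rate >= c[n]:
            return 1.0
        return math.exp(-t0 * (c[n] - rate) / alpha[n])

    # Sort by distortion reduction per unit size, largest first.
    for nalu_id, delta, size in sorted(nalus, key=lambda t: t[1] / t[2], reverse=True):
        extra = rate_per_size * size
        candidate = [loss(n, rates[n] + extra) for n in range(num_networks)]
        best = min(range(num_networks), key=lambda n: candidate[n])
        if candidate[best] > p_max:
            break  # every network would exceed p_max: all networks are fully loaded
        rates[best] += extra
        schedule[nalu_id] = best
    return schedule

if __name__ == "__main__":
    # Hypothetical toy instance with three NALUs and two identical networks.
    nalus = [("a", 10.0, 1.0), ("b", 1.0, 1.0), ("c", 5.0, 1.0)]
    print(srdo(nalus, c=[3.0, 3.0], alpha=[0.5, 0.5], t0=1.0, p_max=0.1, rate_per_size=1.0))
```

The sort dominates the running time, which is consistent with the $O\bigl(UM(Q+1)\log[UM(Q+1)]\bigr)$ complexity quoted for SRDO when the number of networks is treated as a constant.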

Fig. 8. Video quality improvement achieved by HC over: (a) SRDO and (b) PRDO under different background traffic load.

Fig. 10. Sample video quality with cost function $C_{(100,10,1)}$.

Fig. 11. Average video quality with different cost functions.

We report the streaming rates achieved by different algorithms in Fig. 6. Fig. 6(a) shows a sample time period, which reveals that the HC algorithm also leads to smoother streaming rates. Fig 6(b) plots the average streaming rates for all videos, which indicates that the HC algorithm leads to streaming rates comparable to the DCCP rate control algorithms. Fig. 6 reveals that the HC algorithm is TCP-friendly, since the DCCP rate control algorithms were designed to be TCP-friendly. Next, we calculate the average packet delivery delay caused by different algorithms. We found that, for all videos, DCCPTCP and DCCP-TFRC lead to on average 1.69 and 2.45 sec delay, respectively, while the HC algorithm results in about 0.13 sec delay. This indicates that schedules produced by the HC algorithm deliver more packets on time, which, in turn, justifies the better video quality compared to DCCP. Comparison against SRDO and PRDO. We study the performance of the HC algorithm under different background traffic load, and compare it against SRDO and PRDO. Fig. 7 presents the achieved video quality from Harbour; the HC algorithm outperforms the PRDO algorithm, which in turn outperforms the SRDO algorithm. We also plot the quality improvement resulted by HC over SRDO and PRDO in Fig. 8; the HC algorithm almost always leads to quality improvement. Specifically, among all videos, the maximum, mean, and minimum quality improvements over SRDO are 7.36, 4.33, and 1.19 dB, respectively. The maximum, mean, and minimum

SRDO PRDO HC

20 15 10 5 0 20

90

30

40 50 60 70 80 Background Traffic (%)

90

Fig. 9. Sample run-time from Harbour under different background traffic load.

[Plot: axes Streaming Rate (Mbps), Improvement in PSNR (dB), Cost Function; legend entries City, Soccer, Crew, Harbour; cost functions C(1,1,1), C(5,1,1), C(100,10,1).]

Fig. 12. Average streaming rate with different cost functions.

quality improvements over PRDO are 4.71, 1.84, and -0.33 dB, respectively. Fig. 9 presents the run-time of the different algorithms; the HC algorithm runs up to six times faster than PRDO. Although SRDO runs fast, at 185 msec on average, it results in lower video quality, as illustrated in Figs. 7 and 8. Therefore, we propose to use the HC algorithm for good performance and reasonable run-time.

Multiple Clients and Service Differentiation. We use the HC algorithm to stream different videos to three clients under λ = 40%. Three cost functions C(1,1,1), C(5,1,1), and C(100,10,1) are considered, where $C_{(w_1,w_2,w_3)} := \sum_{u=1}^{3} w_u \sum_{m=1}^{M_u} d_{u,m}$. We plot the video quality of individual clients with C(100,10,1) in Fig. 10, which shows that the HC algorithm achieves service differentiation: client 3 (Harbour), which carries the smallest weight, has the lowest video quality among all clients. Fig. 11 presents the overall video quality under different cost functions, which shows that various degrees of service differentiation can be achieved by different cost functions. For example, with C(100,10,1), the video quality of client 3 (Harbour) is 10 dB lower than that of client 1 (Crew), while the gap is reduced to 3 dB with C(5,1,1). Fig. 12 plots the average streaming rate under different cost functions. For C(1,1,1), client 1 (Crew) achieves higher video quality than the other clients (cf. Fig. 11), despite receiving a lower rate (cf. Fig. 12); this is because Crew has a steeper R-D curve (cf. Fig. 3).
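The weighted cost used for service differentiation can be illustrated with a minimal Python sketch. It assumes that d_{u,m} is the distortion of the m-th video unit of client u; the function name `weighted_cost` and all distortion values below are hypothetical and chosen only for illustration, not taken from the simulations.

```python
def weighted_cost(weights, distortions):
    """Weighted sum of per-client distortions: sum_u w_u * sum_m d_{u,m}.

    weights:     one weight w_u per client.
    distortions: one list per client holding its values d_{u,m}
                 (assumed here to be per-unit distortions).
    """
    if len(weights) != len(distortions):
        raise ValueError("need exactly one weight per client")
    return sum(w * sum(d_u) for w, d_u in zip(weights, distortions))


# Hypothetical distortion values for three clients (illustrative only).
distortions = [
    [4.0, 5.0],   # client 1
    [6.0, 6.0],   # client 2
    [10.0, 8.0],  # client 3
]

equal_cost = weighted_cost((1, 1, 1), distortions)
differentiated_cost = weighted_cost((100, 10, 1), distortions)

# Removing one unit of distortion from a client lowers the cost by that
# client's weight, so under (100, 10, 1) an optimizer gains 100 times more
# by improving client 1 than by improving client 3.
improved_client1 = [[3.0, 5.0], [6.0, 6.0], [10.0, 8.0]]
improved_client3 = [[4.0, 5.0], [6.0, 6.0], [9.0, 8.0]]
gain_client1 = differentiated_cost - weighted_cost((100, 10, 1), improved_client1)
gain_client3 = differentiated_cost - weighted_cost((100, 10, 1), improved_client3)

print(equal_cost, differentiated_cost, gain_client1, gain_client3)
```

With equal weights every client's distortion counts the same, whereas unequal weights steer the minimization toward the heavily weighted clients, which is consistent with the lowest-weight client ending up with the lowest quality.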

VI. CONCLUSIONS

In this paper, we have addressed the problem of streaming scalable videos from a server to multiple clients over heterogeneous access networks. We have formulated this problem as an integer program for joint rate control and stream adaptation in order to determine, for each client: (i) the streaming rates of individual access networks, (ii) the video packets selected for transmission, and (iii) the access network each video packet is sent over, so as to minimize a cost function of received video distortions. We have proposed using different cost functions to account for service differentiation and fairness among users. We have derived convex programming approximations to the randomized packet-scheduling problem, and have studied the trade-off between performance and run-time: one of our algorithms (TTC) has a lower run-time at the cost of inferior performance, while the other one (MC) has better performance at the cost of exponential complexity. We have proposed a hybrid algorithm (HC) that yields good performance for a small number of access networks, and is suitable for real-time applications. We have also extended the heuristic algorithms SRDO and PRDO in [1] to multiple clients. We have conducted extensive simulations to compare the performance of HC against SRDO, PRDO, and the rate control algorithms defined in the DCCP standard. The simulation results have shown that the HC algorithm: (i) outperforms the rate control algorithms in the DCCP standard by about 10 dB in video quality, (ii) results in an average quality improvement of 4.33 dB vs. SRDO, and 1.84 dB vs. PRDO, under various background traffic loads, (iii) runs efficiently, up to six times faster than PRDO, and (iv) indeed provides service differentiation among users.

REFERENCES

[1] C. Hsu, N. Freris, J. Singh, and X. Zhu, “Rate control and stream adaptation for scalable video streaming over multiple access networks,” in Proc. of International Packet Video Workshop (PV’10), Hong Kong, China, December 2010.
[2] F. Hartung, U. Horn, J. Huschke, M. Kampmann, T. Lohmar, and M. Lundevall, “Delivery of broadcast services in 3G networks,” IEEE Transactions on Broadcasting, vol. 53, no. 1, pp. 188–199, March 2007.
[3] X. Zhu, P. Agrawal, J. Singh, T. Alpcan, and B. Girod, “Distributed rate allocation policies for multihomed video streaming over heterogeneous access networks,” IEEE Transactions on Multimedia, vol. 11, no. 4, pp. 752–764, June 2009.
[4] T. Alpcan, J. Singh, and T. Basar, “Robust rate control for heterogeneous network access in multihomed environments,” IEEE Transactions on Mobile Computing, vol. 8, no. 1, pp. 41–51, January 2009.
[5] J. Apostolopoulos and M. Trott, “Path diversity for enhanced media streaming,” IEEE Communications Magazine, vol. 42, no. 8, pp. 80–87, August 2004.
[6] “AT&T faces 5,000 percent surge in traffic,” http://www.internetnews.com/mobility/article.php/3843001, 2009.
[7] “T-Mobile’s growth focusing on 3G,” http://connectedplanetonline.com/wireless/news/t-mobile-3g-growth-0130, 2009.
[8] J. Xin, C. Lin, and M. Sun, “Digital video transcoding,” Proceedings of the IEEE, vol. 93, no. 1, pp. 84–97, January 2005.
[9] H. Schwarz, D. Marpe, and T. Wiegand, “Overview of the scalable video coding extension of the H.264/AVC standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1103–1120, September 2007.
[10] M. Wien, H. Schwarz, and T. Oelbaum, “Performance analysis of SVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1194–1203, September 2007.

[11] “CVX: Matlab software for disciplined convex programming,” http://www.stanford.edu/~boyd/cvx/, 2009.
[12] Y. Lai, “DCCP: Transport protocol with congestion control and unreliability,” IEEE Internet Computing Magazine, vol. 12, no. 5, pp. 78–83, September/October 2008.
[13] A. Szwabe, A. Schorr, F. Hauck, and A. Kassler, “Dynamic multimedia stream adaptation and rate control for heterogeneous networks,” in Proc. of IEEE International Packet Video Workshop (PV’06), Hangzhou, China, May 2006, pp. 63–69.
[14] D. Jurca and P. Frossard, “Media-specific rate allocation in heterogeneous wireless networks,” in Proc. of IEEE International Packet Video Workshop (PV’06), Hangzhou, China, May 2006, pp. 713–726.
[15] X. Zhu, J. Singh, and B. Girod, “Joint routing and rate allocation for multiple video streams in ad-hoc wireless networks,” in Proc. of IEEE International Packet Video Workshop (PV’06), Hangzhou, China, May 2006, pp. 727–736.
[16] J. Singh, T. Alpcan, P. Agrawal, and V. Sharma, “An optimal flow assignment framework for heterogeneous network access,” in Proc. of IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM’07), Helsinki, Finland, June 2007, pp. 1–12.
[17] M. Hefeeda and C. Hsu, “Rate-distortion optimized streaming of fine-grained scalable video sequences,” ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 4, no. 1, pp. 2:1–2:28, January 2008.
[18] I. Amonou, N. Cammas, S. Kervadec, and S. Pateux, “Optimized rate-distortion extraction with quality layers in the scalable extension of H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 9, pp. 1186–1193, September 2007.
[19] J. Sun, W. Gao, D. Zhao, and W. Li, “On rate-distortion modeling and extraction of H.264/SVC fine-granular scalable video,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 3, pp. 323–336, March 2009.
[20] H. Mansour, V. Krishnamurthy, and P. Nasiopoulos, “Channel aware multiuser scalable video streaming over lossy under-provisioned channels: Modeling and analysis,” IEEE Transactions on Multimedia, vol. 10, no. 7, pp. 1366–1381, November 2008.
[21] K. Evensen, T. Kupka, D. Kaspar, P. Halvorsen, and C. Griwodz, “Quality-adaptive scheduling for live streaming over multiple access networks,” in Proc. of ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV’10), Amsterdam, Netherlands, June 2010, pp. 21–26.
[22] “Abing project page,” http://www-iepm.slac.stanford.edu/tools/abing/.
[23] D. Gross, J. Shortle, J. Thompson, and C. Harris, Fundamentals of Queueing Theory, 4th ed. Wiley-Interscience, 2008.
[24] X. Zhu, E. Setton, and B. Girod, “Congestion-distortion optimized video transmission over ad hoc networks,” Signal Processing: Image Communication, vol. 20, no. 8, pp. 773–783, September 2005.
[25] Q. Zhang, W. Zhu, and Y. Zhang, “End-to-end QoS for video delivery over wireless Internet,” Proceedings of the IEEE, vol. 93, no. 1, pp. 123–134, January 2005.
[26] Y. Liang, J. Apostolopoulos, and B. Girod, “Analysis of packet loss for compressed video: Effect of burst losses and correlation between error frames,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 7, pp. 861–874, July 2008.
[27] L. Jiang and J. Walrand, “A distributed CSMA algorithm for throughput and utility maximization in wireless networks,” IEEE/ACM Transactions on Networking, November 2009, accepted to appear.
[28] S. Boyd and L. Vandenberghe, Convex Optimization, 1st ed. Cambridge University Press, 2004.
[29] H. Sherali, “Convex envelopes of multilinear functions over a unit hypercube and over special discrete sets,” Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 245–270, 1997.
[30] J. Navratil and R. Cottrell, “ABwE: A practical approach to available bandwidth estimation,” in Proc. of Passive and Active Measurement Workshop (PAM’03), La Jolla, CA, April 2003.
[31] “The network simulator,” http://www.isi.edu/nsnam/ns/.
[32] A. Caro, J. Iyengar, P. Amer, S. Ladha, G. Heinz, and K. Shah, “SCTP: A proposed standard for robust internet data transport,” IEEE Computer, vol. 36, no. 11, pp. 56–63, November 2003.
[33] N. Mattsson, “A DCCP module for NS-2,” Master’s thesis, Department of Computer Science and Electrical Engineering, Lulea Tekniska University, 2004.