Distributed Sender Scheduling for Multimedia Transmission in Wireless Peer-to-Peer Networks

Pengbo Si†‡, F. Richard Yu‡, Hong Ji† and Victor C.M. Leung§

‡ Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
† Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, P.R. China
§ Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada
Email: [email protected], richard [email protected], [email protected] and [email protected]

Abstract— Multi-source multimedia transmission is a popular approach in wireless peer-to-peer networks. Most previous related work concentrates on the protocols and network structures, and consequently ignores the multiple senders scheduling problem. In this paper, we present a distributed sender scheduling algorithm for multi-source transmission in wireless P2P networks, which can maximize the data rate and minimize the power consumption. Specifically, we formulate the network as a multi-arm bandit system. The optimal distributed scheduling policy can be found according to the Gittins indices of the senders. Numerical examples illustrate that the data rate and power consumption can be improved significantly compared to existing schemes.

I. INTRODUCTION

There has been significant growth in peer-to-peer (P2P) applications on the Internet. P2P traffic is reported to have accounted for up to 37.9% of global Internet traffic in 2007 [1]. To facilitate P2P applications, P2P lookup protocols have been proposed for wireline networks to find where the desired data are stored; Chord [2] is a typical example. For multimedia transmission in P2P networks, multi-source transmission is a popular architecture for increasing the scalability of these networks. A good example of a multi-source architecture is the BitTorrent system [3]. In [4], a multi-source transmission protocol using network coding is presented.

With the recent advances in wireless communication technologies, it is widely envisioned that P2P applications will be popular in wireless networks [5]. Although some work has been done on multi-source transmission in wireless P2P networks, most previous work concentrates on the protocols and network structures of this approach. Consequently, how to schedule the multiple senders optimally has been ignored. The authors of [6] make an attempt in this direction by using senders' states, such as bandwidth and availability, as the selection criteria in wireline P2P networks. However, it may not be an easy task to extend the scheme in [6] to wireless P2P networks because of the distinct characteristics of wireless networks. A scheme to reduce the energy consumption of mobile devices in wireless P2P file sharing is proposed in [5]; however, how to handle multiple sources is not studied in detail.

This work was jointly supported by the Hi-Tech Research and Development Program (National 863 Program) under Grant 2007AA01Z221, the National Natural Science Foundation of China under Grant 60672124 and the Scientific Research Foundation of Graduate School of BUPT under Grant No. 6, 2006.

In this paper, we present a distributed algorithm for scheduling the senders for multi-source multimedia transmission in wireless P2P networks, which can maximize the data rate and minimize the power consumption for wireless P2P applications. The motivations behind our work include:

1) A fundamental characteristic of wireless networks is the time-varying and user-dependent fading channel. An important means of exploiting it is multi-user diversity. It is shown that the optimal strategy is opportunistic scheduling, which schedules at any time only the user with the best channel to transmit [7].
2) The channels in wireless P2P networks experience time-varying and user-dependent fading. Selecting the best sender for multi-source multimedia transmission in wireless P2P networks may maximize the data rate.
3) System resource constraints are important issues in wireless mobile devices. Examples of the constraints include limited battery power, low-power microprocessors and small memory. In selecting the best sender, these resource constraints should be taken into account.
4) There is no centralized control point in wireless P2P networks. Therefore, the scheduling scheme should be distributed.

To the best of our knowledge, the design of a distributed sender scheduling scheme for multimedia transmission in wireless P2P networks has not been addressed in previous work. The main contributions of this paper are as follows.

1) We formulate the wireless P2P network as a multi-arm bandit system [8]. The optimal distributed sender policy can be found according to the Gittins indices [9] of the senders.
2) The proposed scheme can optimally select the best sender for multi-source transmission, taking into account channel conditions and resource constraints. It can maximize the data rate and minimize the power consumption.
3) It is a distributed and scalable scheme. There is no need for a centralized control point to coordinate the senders, and senders can join and leave the network freely.

Extensive numerical examples illustrate the effectiveness of the proposed scheme. It is shown that the data rate and power consumption in the proposed scheme can be improved significantly compared to traditional schemes.

The rest of the paper is organized as follows. Section II describes the system model for wireless P2P networks. The multi-arm bandit formulation is presented in Section III. Our proposed scheme is given in Section IV. Section V provides

978-1-4244-2324-8/08/$25.00 © 2008 IEEE. This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2008 proceedings.

the numerical results. We conclude this study in Section VI.

II. SYSTEM MODEL

Locating the desired files in P2P networks is realized by lookup protocols. Our research focuses on maximizing the receiving data rate and minimizing the power consumption by scheduling multiple potential senders, since it is probable that $L$ nodes have the desired file, where $L > 1$. We call them potential senders. Considering battery-powered wireless devices, it is assumed that minimizing the energy consumption and maximizing the network lifetime are equivalent. We adopt one of the commonly used lifetime definitions: the number of data collection time slots until the number of dead nodes reaches a threshold $L_T$ [10].
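As a small illustration of the lifetime definition above, the following sketch counts time slots until the number of dead nodes first reaches the threshold $L_T$. The function name `network_lifetime` and the per-slot list `dead_counts` are hypothetical and are not part of the original formulation.

```python
def network_lifetime(dead_counts, threshold):
    """Return the number of time slots elapsed until the dead-node count
    first reaches `threshold` (the L_T of the text), or None if it never does.

    dead_counts[k] is assumed to be the number of dead nodes at the end
    of slot k + 1 (hypothetical bookkeeping, for illustration only).
    """
    for slot, dead in enumerate(dead_counts, start=1):
        if dead >= threshold:
            return slot
    return None


# With L_T = 1, the network is considered dead at the first node failure.
print(network_lifetime([0, 0, 1, 1, 2], threshold=1))  # -> 3
```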

A. Channel Model

Wireless channels are not stable and provide only limited bandwidth. We consider slow-fading wireless channels that can be modeled using finite-state Markov channels [11]. The channel is characterized by a set of states, $C = (C_1, C_2, \ldots, C_G)$, where $G$ is the number of available channel state levels. The time axis is divided into slots of equal duration. The channel state is $s_C^k$ in time slot $k \in \{1, 2, \ldots, K\}$, where $K$ is the number of time slots. The transition probability matrix for sender $l$ is $A_C(l) = [c_{gh}]_{G \times G}$, where

$$c_{gh} = \Pr(s_C^{k+1} = g \mid s_C^k = h), \quad g, h \in C. \tag{1}$$

Assume that there are handshaking signals between potential senders and receivers at the beginning of each time slot, so that the senders can use the received signal for channel state observation. However, due to hardware limitations, potential senders can only know the probability of the channel state level. Denote the observation vector by $\Theta = (\theta_1, \theta_2, \ldots, \theta_G)$, where each element corresponds to an element in $C$. Assume the channel state observation at time slot $k$ is $y_C^k$. The probability matrix of the observations when sender $l$ is active is represented as $B_C(l) = [\sigma_{gh}]_{G \times G}$, where

$$\sigma_{gh} = \Pr(y_C^{k+1} = g \mid s_C^{k+1} = h, a^k = l), \quad g \in \Theta,\ h \in C, \tag{2}$$

where $a^k = l$, with $a^k \in \{1, 2, \ldots, L\}$, denotes that sender $l$ is the active sender at time instant $k$.

B. Energy Model

Most wireless devices are powered by batteries with limited energy. Assume that the residual energy of each wireless device does not change when it is not in use [12]. The residual energy of each device can be detected locally with a nonzero probability of false detection. We also divide the continuous battery residual energy into discrete levels, denoted by $E = (E_1, E_2, \ldots, E_H)$, where $H$ is the number of available energy state levels. The energy state is $s_E^k$ at time instant $k$, and the energy state observation is $y_E^k$. The residual energy is also a Markov chain when sender $l$ is active. The transition probability matrix is $A_E(l) = [e_{gh}]_{H \times H}$, where

$$e_{gh} = \Pr(s_E^{k+1} = g \mid s_E^k = h), \quad g, h \in E. \tag{3}$$

The observation vector of residual energy is $\Psi = (\psi_1, \psi_2, \ldots, \psi_H)$, where each element corresponds to an element in $E$. The probability matrix of the observations when sender $l$ is active is represented as $B_E(l) = [\nu_{gh}]_{H \times H}$, where

$$\nu_{gh} = \Pr(y_E^{k+1} = g \mid s_E^{k+1} = h, a^k = l), \quad g \in \Psi,\ h \in E, \tag{4}$$

where $a^k = l$, with $a^k \in \{1, 2, \ldots, L\}$, denotes that sender $l$ is the active sender at time instant $k$.

III. MULTI-ARM BANDIT FORMULATION

In this section, we formulate the sender scheduling problem for multi-source transmission as a multi-arm bandit system. It has an "indexable" property that dramatically simplifies the computation and implementation.

A. System Formulation

In practice, the state of battery residual energy is independent of the channel state. Therefore, the state of sender $l$, $s^k(l)$, can be modeled as $s^k(l) = [s_C^k(l), s_E^k(l)]$. If sender $l$ is active at time slot $k$, then the state $s^k(l)$ evolves according to a $U_l$-state Markov chain with transition probability matrix $A(l) = [(c_{gh}, e_{gh})]_{U_l \times U_l}$, where $c_{gh}$ and $e_{gh}$ are defined in (1) and (3), respectively, and $U_l = G \times H$. If sender $l$ is not an active sender at time instant $k$, $s^{k+1}(l) = s^k(l)$. That is to say, the states of all other $L-1$ passive senders do not change.

The state of the active sender is observed through the detector output $y^{k+1}(l)$ for the active sender state $s^{k+1}(l)$. Assume that there is a finite observation set of size $W_l$, indexed by $w(l) = 1, 2, \ldots, W_l$. Denote by $Y^k = (y^1(a^0), \ldots, y^k(a^{k-1}))$ the observation history up to time instant $k$. Let $B(l) = (b_{df}(l))_{d \in U_l, f \in W_l}$ denote the observation probability matrix, where each element is $b_{df}(l) = \Pr(y^{k+1}(l) = f \mid s^{k+1}(l) = d, a^k = l)$, in which $a^k = l$, with $a^k \in \{1, 2, \ldots, L\}$, denotes that sender $l$ is the active sender at time instant $k$ [13]. The observation is derived from $\sigma_{gh}$ and $\nu_{gh}$, defined in (2) and (4), respectively.

After a potential sender is selected as the active sender, an instantaneous reward $\beta^k R(s^k(l), l)$ is accrued, where $0 \le \beta \le 1$ is the discount factor. Define the total expected discounted reward over an infinite time horizon as [13]:

$$J_a = E\left\{ \sum_{k=0}^{\infty} \beta^k R(s^k(a^k), a^k) \right\}. \tag{5}$$
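Because the text takes the residual energy state to be independent of the channel state, one plausible way to assemble the joint $U_l \times U_l$ transition matrix $A(l)$, with $U_l = G \times H$, is as the Kronecker product of the channel and energy transition matrices. The sketch below illustrates this under stated assumptions: the numerical values are hypothetical, the matrices are stored row-stochastically (row = current state, column = next state), and the Kronecker construction is an interpretation of the notation $[(c_{gh}, e_{gh})]$ rather than something the paper states explicitly.

```python
import numpy as np

# Hypothetical 2-level channel chain (0 = bad, 1 = good).
# Storage assumption: A_C[h, g] = Pr(next = g | current = h).
A_C = np.array([[0.7, 0.3],
                [0.2, 0.8]])

# Hypothetical 2-level energy chain (0 = low, 1 = high); the battery of an
# active sender never recharges in this toy example.
A_E = np.array([[1.0, 0.0],
                [0.4, 0.6]])

G, H = A_C.shape[0], A_E.shape[0]

# Independence means the joint transition probability is the product of the
# channel and energy transition probabilities, which is exactly what the
# Kronecker product computes entry by entry.
# Joint state index = channel_index * H + energy_index.
A = np.kron(A_C, A_E)

assert A.shape == (G * H, G * H)
assert np.allclose(A.sum(axis=1), 1.0)  # every row is still a distribution
```

For instance, the entry for moving from (good channel, high energy) to (good channel, low energy) is `A[1 * H + 1, 1 * H + 0]`, which equals `A_C[1, 1] * A_E[1, 0]`.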

Consequently, the aim of the sender scheduling scheme is to find the optimal policy that produces the maximum reward. Based on the formulation above, we can compute the information state, which is a probability distribution and also a sufficient statistic for the decision and observation history. Define the information state $x^k(l)$ for each sender $l$ to be $x^k(l) = (x_i^k(l))$, $i = 1, 2, \ldots, U_l$, where $x_i^k(l) = \Pr(s^k(l) = i \mid Y^k, a^{k-1} = l)$. Denote by $\chi(l)$ the state space of information states, $\chi(l) = \{x(l) \in \mathbb{R}^{U_l} : \mathbf{1}'_{U_l} x(l) = 1,\ 0 \le x_i(l) \le 1 \text{ for all } i \in \{1, \ldots, U_l\}\}$, which is a $(U_l - 1)$-dimensional simplex. Here $\mathbf{1}_{U_l}$ is the $U_l$-dimensional column vector of ones and a prime denotes transpose. If sender $l$ is the active sender at time instant $k$, the information state $x^k(l)$ can be recursively updated by the hidden Markov model (HMM) state filter, also known as the forward algorithm, with the new observation $y^{k+1}(l)$ [13]:

$$x^{k+1}(l) = \frac{B(l, y^{k+1}(l))\, A'(l)\, x^k(l)}{\mathbf{1}'_{U_l} B(l, y^{k+1}(l))\, A'(l)\, x^k(l)}, \tag{6}$$

where, if $y^{k+1}(l) = m$, then $B(l, m)$ is the diagonal matrix whose main diagonal is $b_{1m}(l), b_{2m}(l), \ldots, b_{U_l m}(l)$. The states of all other $L - 1$ passive senders remain unaffected. That is, $x^{k+1}(q) = x^k(q)$ if sender $q$ is not the active sender, $q \in \{1, \ldots, L\}$, $q \ne l$. Denote the reward as the $N_{a^k}$-dimensional vector $R(a^k)$:

$$R(a^k) = [R(s^k(l) = 1, a^k), \ldots, R(s^k(l) = N_{a^k}, a^k)]'. \tag{7}$$
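The filter update (6) and the expected reward built from the vector in (7) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the two-state matrices are hypothetical, and the transition matrix is assumed to be stored row-stochastically so that the transpose in (6) yields the one-step prediction.

```python
import numpy as np


def hmm_filter_step(x, A, B, y):
    """One forward-filter update of the active sender's information state.

    x : (U,) current belief over the U states
    A : (U, U) transition matrix, A[i, j] = Pr(next = j | current = i)
        (row-stochastic storage is an assumption of this sketch)
    B : (U, W) observation matrix, B[i, m] = Pr(observe m | state i)
    y : index of the newly received observation
    """
    # diag(B[:, y]) @ A' @ x, written with an elementwise product.
    unnormalized = B[:, y] * (A.T @ x)
    return unnormalized / unnormalized.sum()


# Hypothetical two-state sender with a 10% false-observation probability.
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
R = np.array([100.0, 500.0])   # per-state reward vector, as in (7)

x = np.array([0.5, 0.5])
x_next = hmm_filter_step(x, A, B, y=1)

# Expected instantaneous reward R'x under the updated belief.
expected_reward = R @ x_next
```

With these numbers the prediction step gives (0.6, 0.4), and observing symbol 1 shifts the belief to (1/7, 6/7), since that observation is far more likely in the second state.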

Define $J_a$ to be the total expected discounted reward over an infinite time horizon. By substituting (7) into (5), we get

$$J_a = E\left\{ \sum_{k=0}^{\infty} \beta^k R'(a^k)\, x^k(a^k) \right\}.$$

Finally, for each sender, there is a function $\gamma^k(l, x^k(l))$ called the Gittins index. The optimal policy has an index rule: activate sender $q$, where $q = \arg\max_{l \in \{1, \ldots, L\}} \gamma(l, x^k(l))$. Thus, to solve the sender scheduling problem, computing the Gittins indices is the key step, which is described in the following.

B. Value Iteration Algorithm for Computing Gittins Index

The Gittins index for each sender $l$ is independent of the other $L - 1$ senders and is computed off-line with a dynamic programming formulation. For each potential sender $l$, let $M(l)$ denote a real number with $0 \le M(l) \le \bar{M}(l)$, where

$$\bar{M}(l) = \max_{i \in \{1, \ldots, U_l\}} \frac{R(s^k(l) = i, a^k = l)}{1 - \beta}.$$

For simplicity, we omit the $l$ in $M(l)$ and $\bar{M}(l)$ and the superscript $k$ in $x^k(l)$. Define $V^{l,K}(x(l), M)$ to be the value function for sender $l$ over a $K$-step horizon; it satisfies Bellman's functional recursion [14]:

$$V^{l,k+1}(x(l), M) = \max\left\{ M,\ R'(l)\, x(l) + \beta \sum_{m=1}^{W_l} V^{l,k}\!\left( \frac{B(l, m) A'(l) x(l)}{\mathbf{1}'_{U_l} B(l, m) A'(l) x(l)},\ M \right) \mathbf{1}'_{U_l} B(l, m) A'(l) x(l) \right\}, \tag{8}$$

where $M$ denotes the parameterized retirement reward and $k = 0, \ldots, K - 1$. We denote by $\gamma^K(l, x(l))$ the approximate Gittins index, $\gamma^K(l, x(l)) = \min\{M : V^{l,K}(x(l), M) = M\}$. The finite-horizon Gittins index can be made arbitrarily accurate by choosing the horizon $K$ large enough [13].
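To make recursion (8) and the definition of $\gamma^K$ concrete, the sketch below approximates the finite-horizon Gittins index of a hypothetical two-state sender. It is only an illustration under several assumptions that are not in the paper: the belief is discretized on a grid with linear interpolation, the recursion is started from $V^{l,0} = \max\{M, R'x\}$, the smallest $M$ with $V^{l,K} = M$ is located by bisection on $[0, \bar{M}]$, and the transition matrix is stored row-stochastically. The finite-dimensional method of Section III-C is the approach the paper actually proposes; this grid scheme is a simpler stand-in.

```python
import numpy as np


def value_function(A, B, R, beta, M, horizon, grid):
    """Approximate V^{l,K}(., M) of recursion (8) on a two-state belief grid.

    grid holds p = Pr(state 1); values between grid points are obtained by
    linear interpolation (an approximation introduced by this sketch).
    """
    beliefs = np.stack([1.0 - grid, grid], axis=1)   # one belief per row
    reward = beliefs @ R                             # R'x at each grid point
    V = np.maximum(M, reward)                        # assumed starting point V^0
    for _ in range(horizon):
        cont = reward.copy()
        for m in range(B.shape[1]):
            # Each row is (B(l, m) A' x)' for the corresponding belief x.
            unnorm = (beliefs @ A) * B[:, m]
            prob = unnorm.sum(axis=1)                # 1' B(l, m) A' x
            p_next = np.divide(unnorm[:, 1], prob,
                               out=np.zeros_like(prob), where=prob > 0)
            cont += beta * prob * np.interp(p_next, grid, V)
        V = np.maximum(M, cont)                      # retire (M) or continue
    return V


def gittins_index(A, B, R, beta, p, horizon=50, n_grid=201, tol=1e-3):
    """Approximate min{M : V^{l,K}(x, M) = M} at belief p by bisection."""
    grid = np.linspace(0.0, 1.0, n_grid)
    lo, hi = 0.0, R.max() / (1.0 - beta)             # hi is M-bar
    while hi - lo > tol:
        M = 0.5 * (lo + hi)
        V = np.interp(p, grid, value_function(A, B, R, beta, M, horizon, grid))
        if V > M + 1e-9:
            lo = M    # continuing still beats retiring, so the index is larger
        else:
            hi = M
    return hi


# Hypothetical two-state sender (state 0 = poor, state 1 = good).
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
R = np.array([100.0, 500.0])
beta = 0.8

index_poor = gittins_index(A, B, R, beta, p=0.0)
index_good = gittins_index(A, B, R, beta, p=1.0)
```

In this toy setting the index at the belief concentrated on the high-reward state reaches $\bar{M} = 500 / (1 - 0.8) = 2500$, while the index at the opposite corner is strictly smaller, so a sender believed to be in the good state would be preferred by the index rule.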

C. Finite Dimensional Characterization of Gittins Index

The value iteration recursion (8) can be translated into a partially observable Markov decision process (POMDP) value function with a different coordinate basis, so that the Gittins index can be computed [13]. A fictitious retirement information state is added to obtain a solution for computing the Gittins index. The $(U_l + 1)$-dimensional augmented information state is defined as $\bar{x} \in \{[x', 0]', [\mathbf{0}'_{U_l}, 1]'\}$, where $x \in \chi(l)$, and $\bar{x}^k = [\mathbf{0}'_{U_l}, 1]'$ denotes the retirement information state. Define the augmented observation process as $\bar{y}^k \in \{1, \ldots, W_l + 1\}$, where the observation $W_l + 1$ corresponds to a fictitious observation that causes the information state to jump into the retirement state. The $(U_l + 1) \times (U_l + 1)$-dimensional transition matrices and the observation probability matrices are defined as [13]:

$$A_1(l) = \begin{bmatrix} A(l) & \mathbf{0}_{U_l} \\ \mathbf{0}'_{U_l} & 1 \end{bmatrix}, \quad A_2(l) = \begin{bmatrix} \mathbf{0}_{U_l \times U_l} & \mathbf{1}_{U_l} \\ \mathbf{0}'_{U_l} & 1 \end{bmatrix},$$

$$B_1(l) = \begin{bmatrix} B(l) & \mathbf{0}_{U_l} \\ \mathbf{0}'_{U_l} & 1 \end{bmatrix}, \quad B_2(l) = I_{(U_l+1) \times (U_l+1)},$$

$$B_1(l, d) = \mathrm{diag}(\text{column } d \text{ of } B_1(l)), \quad B_2(l, d) = \mathrm{diag}(\text{column } d \text{ of } B_2(l)),$$

where $d \in \{1, \ldots, W_l + 1\}$. A coordinate transformation is used to construct a standard POMDP, and its value function is $V^l(x(l), M)$. The pseudo-information state is defined as [13]:

$$z = \begin{bmatrix} M/\bar{M} & 1 - M/\bar{M} \end{bmatrix}', \quad 0 \le M \le \bar{M}.$$

Define the information state $\pi$ and the coordinate transformation as

$$\begin{aligned}
\pi &= z \otimes \bar{x}, \\
\bar{A}_1 &= I_{2 \times 2} \otimes A_1, \quad \bar{A}_2 = I_{2 \times 2} \otimes A_2, \\
\bar{B}_1(l, d) &= I_{2 \times 2} \otimes B_1(l, d), \quad \bar{B}_2(l, d) = I_{2 \times 2} \otimes B_2(l, d), \\
\bar{R}_1(l) &= \begin{bmatrix} R'(l) & 0 & R'(l) & 0 \end{bmatrix}', \\
\bar{R}_2(l) &= \begin{bmatrix} \bar{M} \mathbf{1}'_{U_l} & 0 & \mathbf{0}'_{U_l} & 0 \end{bmatrix}',
\end{aligned} \tag{9}$$

where $\otimes$ denotes the Kronecker product. The information state $\pi(l)$ is a $2(U_l + 1)$-dimensional vector and belongs to $\Pi(l) = \{\pi : \mathbf{1}'_{2(U_l+1)} \pi(l) = 1 \text{ and } \pi_i(l) \ge 0,\ i = 1, 2, \ldots, 2(U_l + 1)\}$. Define the control variable $v^k \in \{1, 2\}$ at each time slot, where $v^k = 1$ means continue and $v^k = 2$ means retire.

With the definitions of the transition matrix $\bar{A}(l)$, the observation matrix $\bar{B}(l)$ and the reward vector $\bar{R}(l)$ for the two-valued control ($v^k \in \{1, 2\}$) in (9), and with the objective $\max E\left\{ \sum_{k=0}^{K} \beta^k \bar{R}'_{v^k}(l)\, \pi^k \right\}$, we define a standard POMDP. Its information state $\pi(l)$ is updated according to [13]:

$$\pi^{k+1}(l) = \frac{\bar{B}_{v^k}(l, \bar{y}^{k+1}) (\bar{A}_{v^k}(l))' \pi^k(l)}{\mathbf{1}'_{2(U_l+1)} \bar{B}_{v^k}(l, \bar{y}^{k+1}) (\bar{A}_{v^k}(l))' \pi^k(l)},$$

where $v^k \in \{1, 2\}$ and $\bar{y}^{k+1} \in \{1, \ldots, W_l + 1\}$. The value iteration recursion for optimizing this POMDP over the finite horizon $K$ is given by [13]:

$$\begin{aligned}
\bar{V}^{k+1}(l, \pi(l)) = \max\Bigg\{ & \bar{R}'_1(l) \pi(l) + \beta \sum_{m=1}^{W_l+1} \bar{V}^k\!\left(l, \frac{\bar{B}_1(l, m) (\bar{A}_1(l))' \pi(l)}{\mathbf{1}' \bar{B}_1(l, m) (\bar{A}_1(l))' \pi(l)}\right) \mathbf{1}' \bar{B}_1(l, m) (\bar{A}_1(l))' \pi(l), \\
& \bar{R}'_2(l) \pi(l) + \beta \sum_{m=1}^{W_l+1} \bar{V}^k\!\left(l, \frac{\bar{B}_2(l, m) (\bar{A}_2(l))' \pi(l)}{\mathbf{1}' \bar{B}_2(l, m) (\bar{A}_2(l))' \pi(l)}\right) \mathbf{1}' \bar{B}_2(l, m) (\bar{A}_2(l))' \pi(l) \Bigg\},
\end{aligned} \tag{10}$$

where $\bar{V}^0(l, \pi) = \max\{\bar{R}'_1(l) \pi(l), \bar{R}'_2(l) \pi(l)\}$, and $\bar{V}^k(l, \pi)$ denotes the value function of the dynamic program


$$\bar{V}^k(l, \pi) = \max_{v^t} E\left\{ \sum_{t=K-k}^{K} \beta^t \bar{R}'_{v^t}(l)\, \pi^t \,\Big|\, \pi^{K-k} = \pi \right\},$$

and $k = 1, 2, \ldots, K$. This value function has several characteristics [13]:

1) The value function in (8) is equal to the value function of the standard POMDP defined in (10).

2) $\bar{V}^k(l, \pi(l))$ is piecewise linear and convex and has the finite dimensional representation $\bar{V}^k(l, \pi(l)) = \max_{\lambda_i^k \in \Lambda^k(l)} (\lambda_i^k)' \pi(l)$, where each vector $\lambda_i^k$ is of the form $\lambda_i^k = \begin{bmatrix} (\lambda_i^k(1))' & 0 & (\lambda_i^k(3))' & 0 \end{bmatrix}'$, with $\lambda_i^k(1), \lambda_i^k(3) \in \mathbb{R}^{U_l}$. If the elements of $R(l)$ are not all equal, there always exists a unique vector in $\Lambda^k(l)$, denoted by $\lambda_1^k = \begin{bmatrix} \bar{M} \mathbf{1}'_{U_l} & \mathbf{0}'_{U_l+2} \end{bmatrix}'$, with optimal control $v^k = 2$. If all the elements of $R(l)$ are equal, then $\Lambda^k(l)$ consists of a single vector $\lambda_1^k = \begin{bmatrix} \bar{M} \mathbf{1}'_{U_l} & 0 & \bar{M} \mathbf{1}'_{U_l} & 0 \end{bmatrix}'$.

3) $\gamma^K(l, x(l))$ for the information state $x(l) \in \chi(l)$ of sender $l$ is given by the finite dimensional representation

$$\gamma^K(l, x(l)) = \max_{\lambda_i^K \in \Lambda^K} \frac{\bar{M}\, (\lambda_i^K(3))' x(l)}{(\lambda_i^K(3) - \lambda_i^K(1))' x(l) + \bar{M}}. \tag{11}$$
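Once the vector set $\Lambda^K(l)$ is available, evaluating (11) and applying the index rule of Section III-A reduce to a few inner products and an arg-max. The sketch below illustrates this with hypothetical vectors and beliefs; the numbers are not the output of any value iteration, and the helper names are made up for illustration. The guard that skips a vector whose denominator vanishes is an addition of this sketch, since the source does not say how such a vector should be treated.

```python
import numpy as np


def gittins_from_vectors(vector_pairs, x, M_bar):
    """Evaluate the finite-dimensional representation (11) at belief x.

    vector_pairs : list of (lam1, lam3) arrays, the U-dimensional blocks
                   lambda_i^K(1) and lambda_i^K(3) of each vector in Lambda^K
    x            : (U,) information state of the sender
    M_bar        : upper bound on the retirement reward
    """
    best = -np.inf
    for lam1, lam3 in vector_pairs:
        denom = (lam3 - lam1) @ x + M_bar
        if abs(denom) < 1e-12:        # guard added by this sketch only
            continue
        best = max(best, M_bar * (lam3 @ x) / denom)
    return best


def select_active_sender(indices):
    """Index rule: activate the sender with the largest Gittins index."""
    return int(np.argmax(indices))


M_bar = 2500.0

# Hypothetical vector sets and beliefs for two senders.
senders = [
    {"pairs": [(np.array([300.0, 800.0]), np.array([200.0, 700.0]))],
     "x": np.array([0.5, 0.5])},
    {"pairs": [(np.array([100.0, 400.0]), np.array([100.0, 400.0]))],
     "x": np.array([0.2, 0.8])},
]

indices = [gittins_from_vectors(s["pairs"], s["x"], M_bar) for s in senders]
active = select_active_sender(indices)
```

Because each sender's index depends only on its own vectors and its own belief, every sender can evaluate (11) locally, which is what allows the scheduling to be distributed.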

IV. OPTIMAL POTENTIAL SENDER SCHEDULING

A. Scheduling Process

To solve the sender scheduling problem, the off-line computation should be executed first. For each sender $l = 1, 2, \ldots, L$, the inputs are $A(l)$, $B(l)$, $R(l)$, $x^0(l)$, $K$, and $\beta$, and the finite set of vectors $\Lambda^K(l)$ is computed off-line. The real-time sender scheduling scheme is as follows:

Step 1: The receiver executes a P2P lookup protocol to find the desired file, and the address list of the $L$ potential senders, together with property information, is fed back to the receiver. According to the length of the multimedia file, the receiver calculates the number of file fragments.

Step 2: At $k = 0$, the receiver multicasts the request for the first fragment to all $L$ potential senders using the multicast algorithm described in Subsection IV-B. The address list of all potential senders is included in the request. Each sender $l$, $l = 1, 2, \ldots, L$, computes $\gamma^K(l, x^0(l))$ according to (11) and multicasts the index to the other senders, using the multicast algorithm described in Subsection IV-B.

Step 3: Each sender stores $(a^k, \gamma)$, where $a^k$ is the active sender and $\gamma$ is the $L$-dimensional vector of Gittins indices of the $L$ senders, arranged in descending order, i.e., $\gamma = (\gamma(1, x^k(1)), \gamma(2, x^k(2)), \ldots, \gamma(L, x^k(L)))$, where sender 1 is the sender that has the highest Gittins index.

Step 4: Sender 1 transmits the requested fragment to the receiver as the active sender.

Step 5: At the beginning of the next time slot, sender 1 obtains the observation $y^{k+1}(1)$ from the detector, updates the state estimate using (6), and then computes its $\gamma^K(1, x^{k+1}(1))$ according to (11).

Step 6: Keep the Gittins indices unchanged for the other senders, $q = 2, 3, \ldots, L$.

Step 7: If $\gamma^K(1, x^{k+1}(1)) \ge \gamma^K(1, x^k(1))$, sender 1 continues to be active. Otherwise, sender 1 multicasts $\gamma^K(1, x^{k+1}(1))$ to the other potential senders and becomes a passive sender.

Step 8: Go to Step 3 to rearrange the vector in each potential sender, until the last fragment is successfully transmitted.

B. Multicast Algorithm

In order to efficiently multicast the request from the receiver and the Gittins indices from the senders, we choose the Distributed Minimum Energy Multicast (DMEM) algorithm in [15], because it is designed to reduce as much as possible the total RF energy required by multicast communication in wireless networks. It is also source-initiated and distributed, and consequently consistent with our proposal.

In DMEM, the multicast source exploits the energy conservation offered by several localized operations, in which nodes make decisions based solely on the knowledge of distances to all their tree neighbors. Whenever there is at least one source and one receiver in the network, but no route information is known, a source-based shortest path tree (SPT) is created by performing a network-wide flood to propagate the MULTICAST-JOIN-REQUEST (MJREQ) messages initiated by the source. Then, the tree flood mechanism is used for data multicasting, allowing real-time transmission power adjustment at each relay node in the multicast tree.

V. NUMERICAL RESULTS AND DISCUSSIONS

In this section, we illustrate the performance of the proposed scheme by numerical examples. Two types of potential senders (type I and type II) with two state variables (wireless channel and residual energy) are considered. The extension to more sender types is straightforward. The parameters of these two types of senders are defined in the following subsections.

A. Gittins Indices

Since the Gittins indices are very important in our sender scheduling scheme, we show the Gittins indices of the two types of senders in the numerical examples.
There are four states: low battery with bad channel (b0c0), low battery with good channel (b0c1), high battery with bad channel (b1c0), and high battery with good channel (b1c1). Four observations are available, corresponding to the states. Assume the probability of false observation to be 10%, and the reward vectors of the two types of senders are $R(1) = R(2) = [100 \;\; 300 \;\; 400 \;\; 500]'$. Set $\beta = 0.8$; then matrices $A(1)$, $A(2)$, $B(1)$, $B(2)$, $R(1)$ and $R(2)$ can be calculated. The Gittins indices of the two types of senders are shown in Fig. 1. We can see that the Gittins index of the type II sender is larger than that of the type I sender when the channel is good and the energy is high, which indicates that the type II sender should be selected as the active sender in this case. The type I sender should be selected otherwise.

Fig. 1. Gittins Index of the 2 types of senders. (Gittins index versus information state for the sender of type I and the sender of type II; panel labels include good channel with low energy, good channel with high energy, and bad channel with high energy.)

B. Receiving Data Rate Improvement

The receiving data rate is an important QoS metric for wireless P2P networks. In this subsection, we use different parameters: the battery state change probability is $\alpha$ and 0.4 for senders of type I and type II, respectively, where $\alpha$ denotes the probability of the battery state changing from high to low during the active time slot. Here $R(1) = [100 \;\; 300 \;\; 200 \;\; 500]'$ and $R(2) = [50 \;\; 150 \;\; 100 \;\; 250]'$. We can see from Fig. 2 that the receiving data rate is larger in the proposed scheme than in the existing scheme. There are two potential senders in Fig. 2a. In Fig. 2b, there are more potential senders, of which only one is of type I and the others are of type II. We can observe that in the existing scheme, the receiving data rate decreases as the number of potential senders increases, because the probability of selecting a sender that provides a lower reward increases.

Fig. 2. Receiving data rate comparison. (Receiving bit rate in Kbits/s versus (a) battery change probability and (b) the number of senders, for the optimal selection scheme and the existing selection scheme.)

C. Network Life Time Improvement

In this subsection, we focus on the network life time with $L_T = 1$. Three residual energy levels are considered: dead, low and high, with the probability of the change from high to low and from low to dead both being $\alpha$ for the sender of type I and 0.4 for the sender of type II. We set $R(1) = R(2) = [0 \;\; 4 \;\; 6]'$ and different $\alpha$ for the two types of senders. Numerical results are shown in Fig. 3. In Fig. 3a, one quarter of the potential senders are of type I and the others are of type II. In Fig. 3b, two senders, one of type I and one of type II, are simulated. The optimal selection scheme is compared with an existing scheme, in which one sender is selected randomly at the beginning of the multimedia transmission and kept active until it is dead, after which another living sender is selected randomly and kept active until it is dead. In Fig. 3a, we observe that the network lifetime increases as the number of senders increases, and the proposed scheme always has a longer network lifetime than the existing scheme. A similar observation can be derived from Fig. 3b with different $\alpha$.

Fig. 3. Life time comparison. (Life time in time slots versus (a) the number of senders and (b) battery change probability, for the optimal selection scheme and the select-one-until-dead scheme.)

VI. CONCLUSIONS

With the optimization goal of maximizing the receiving data rate of the multimedia receiver and the life time of the network, we considered the multiple-sender scheduling problem in wireless P2P networks. An algorithm to optimally schedule the potential senders was proposed based on the multi-arm bandit problem formulation. The distributed transmission process was also discussed. Numerical results demonstrated that the Gittins index based optimal policy improves the receiving data rate and network life time significantly.

REFERENCES

[1] "Global IP traffic forecast and methodology, 2006-2011," Cisco Systems White Paper, Jan. 2008.
[2] I. Stoica, R. Morris, D. R. Karger, M. F. Kaashoek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for Internet," IEEE/ACM Trans. Netw., vol. 11, pp. 17–32, Feb. 2003.
[3] B. Cohen, "Incentives build robustness in BitTorrent," in Proc. P2P Economics Workshop, (Berkeley, CA), pp. 1978–1982.
[4] C. Gkantsidis and P. R. Rodriguez, "Network coding for large scale content distribution," in Proc. IEEE Infocom'05, (Miami, FL), Mar. 2005.
[5] A. K.-H. Leung and Y.-K. Kwok, "On localized application-driven topology control for energy-efficient wireless peer-to-peer file sharing," IEEE Trans. Mobile Comput., vol. 7, pp. 66–80, Jan. 2008.
[6] M. Hefeeda, A. Habib, B. Botev, D. Xu, and B. Bhargava, "Promise: peer-to-peer media streaming using CollectCast," in Proc. 11th ACM Conf. Multimedia, (Berkeley, California), pp. 45–54, Nov. 2003.
[7] P. Viswanath, D. N. C. Tse, and R. Laroia, "Opportunistic beamforming using dumb antennas," IEEE Trans. Inform. Theory, vol. 48, no. 6, pp. 1277–1294, 2002.
[8] P. Whittle, "Multi-armed bandits and the Gittins index," J. R. Statist. Soc. B, vol. 42, no. 2, pp. 143–149, 1980.
[9] J. Gittins, Multi-armed Bandit Allocation Indices. Wiley, 1989.
[10] Q. Dong, "Maximizing system lifetime in wireless sensor networks," in Proc. 4th Int. Symp. Inform. Proc. in Sensor Netw., (Los Angeles, California), pp. 13–19, Apr. 2005.
[11] W. Turin, Performance Analysis of Digital Transmission Systems. New York: Computer Science, 1990.
[12] Y. Chen, Q. Zhao, and V. Krishnamurthy, "Transmission scheduling for optimizing sensor network lifetime: A stochastic shortest path approach," IEEE Trans. Signal Proc., vol. 55, no. 5, pp. 2294–2309, 2007.
[13] V. Krishnamurthy, "A value iteration algorithm for partially observed Markov decision process multi-armed bandits," Math. of Oper. Res., pp. 133–152, May 2005.
[14] D. Bertsekas, Dynamic Programming and Optimal Control, vol. 1 and 2. Belmont, Massachusetts: Athena Scientific, 1995.
[15] S. Guo and O. Yang, "Localized operations for distributed minimum energy multicast algorithm in mobile ad hoc networks," IEEE Trans. Paral. and Dist. Sys., vol. 18, pp. 186–198, Feb. 2007.
