IEEE ICC 2014 - Cognitive Radio and Networks Symposium

Routing as a Bayesian Coalition Game in Smart Grid Neighborhood Area Networks: Learning Automata-Based Approach

Neeraj Kumar
Department of Computer Science and Engineering, Thapar University, Patiala (Pb.)
Email: [email protected]

Sudip Misra, Senior Member, IEEE
School of Information Technology, IIT Kharagpur (W.B.)
Email: [email protected]

Mohammad S. Obaidat, Fellow of IEEE and Fellow of SCS
CSSE Department, Monmouth University, NJ, USA
Email: obaidat@monmouth.edu

Abstract—Routing issues in the existing Smart Grid (SG) literature are focused on Home Area Networks (HANs), Neighborhood Area Networks (NANs), or Wide Area Networks (WANs). Among these, routing in NANs is the most challenging, as it entails the construction and maintenance of a backhaul comprising various Mesh Routers (MRs). Wireless networks are generally used for communication between the backhaul and the centralized controller for power distribution. This increases the chances of congestion, because the available bandwidth and the number of channels are scarce resources. With this in view, in this paper we propose a new Efficient Routing Scheme (ERS) formulated as a Bayesian Coalition Game (BCG). The solution strategy integrates the concepts of Learning Automata (LA) in NANs. The LA, which are deployed at the MRs in NANs, are assumed to be the players in the game. Coalitions among the players of the game are built upon the concepts of Bayesian Networks. Each player in the game is allowed to move from one coalition to another depending upon the payoff function. For each move of a player, its action may be rewarded or penalized by the environment. Based upon the reward or penalty received from the environment, each player updates its action probability vector. The proposed scheme is evaluated with respect to various performance evaluation metrics such as load utilization factor, user satisfaction level, delay, and probability of transmission.

978-1-4799-2003-7/14/$31.00 ©2014 IEEE

I. INTRODUCTION

A Smart Grid (SG) is conceptually the next-generation power grid, with advanced IT-enabled features such as automated power and communication infrastructure management, generation of bills, customer support, and online diagnosis, all integrated seamlessly into the core power systems. Traditional grids are generally used to transmit power from a centralized repository to the consumers [1], [2]. In contrast, an SG controls the two-way flow of electricity and information. The core components of an SG are the intelligent meters, wired/wireless backhaul systems, gateways, and consumers. Intelligent (smart) meters are devices installed at the consumers' houses to record the amount of electricity consumed. The collected data may be transmitted to the nearest wired/wireless backhaul network, in which various Mesh Routers (MRs) are installed. By utilizing advanced communication technologies such as 3G/4G and WiFi, an SG is capable of providing cost-effective services to the end users in an efficient manner. It may be considered a future grid that is capable of providing various services with the aid of wireless and allied technologies. As shown in Figure 1, it uses built-in intelligence techniques for electricity generation, transmission, distribution, and consumption for a safer and cleaner environment [1].

Fig. 1. Network infrastructure and power life cycle in Smart Grid [1]

The idea of SG evolves with the aim of providing safe and secure electricity consumption using smart meters. During the peak hours of electricity consumption, reliability and back-up are further issues that need special attention. With this in view, in this paper we propose a new Bayesian coalition game and Learning Automata (LA)-based solution for efficient routing in the NAN of an SG. Strategically, the players in the game build coalitions among themselves using Bayesian coalition game theory and have the flexibility to move from one coalition to another, depending upon the conditional probability. For each action taken by the automaton, it gets a reward or a penalty and updates its action probability vector accordingly. The motivation for solving the above problem using a Bayesian coalition game and LA is that a solution obtained with these techniques can be optimized after a finite number of iterations. LA is one of the meta-heuristic approaches that has been applied successfully in a wide range of engineering applications.

The rest of the paper is organized as follows. Section 2 describes the related existing literature. Section 3 provides a brief overview of Learning Automata. Section 4 provides the details of the proposed approach. Section 5 illustrates the simulation results with analysis and discussion. Finally, Section 6 concludes the paper while providing future research directions in this emerging area.

II. RELATED WORK

Many research proposals in the literature cover various aspects of SG. We review some of the most common existing proposals in this emerging technology. Most of the decisions are taken in SG NANs. The routing schemes are classified as wireless mesh based or PLC based [1]. A NAN builds the backbone, where various communication decisions are taken for reliable, secure, and Quality of Service (QoS)-aware routing in SG. Iwao et al. [2] proposed a multipath scheme for efficient routing in NANs in SG using a wireless mesh backbone. The approach uses various resources in the mesh backbone for efficient routing in NANs. Dawson-Haggerty et al. [3] proposed a multipath routing scheme for lossy links in NANs and illustrated it with respect to various performance parameters. Gharavi et al. [4] proposed a new scheme for multigate communication among the nodes in NANs of SG.

Li et al. [5] proposed a new secure information aggregation scheme for SG using homomorphic encryption. The designed scheme generates less computation and storage overhead in SG, and the authors illustrated it under various conditions using encryption. Bartoli et al. [6] proposed secure lossless aggregation for smart grid in M2M networks. The designed scheme provides secure aggregation at various levels in NANs of SG. Islam et al. [7] proposed a scheme for secure communication in wireless mesh networks. Because wireless mesh networks form the backbone of NANs in SG, essential security features have to be provided for the underlying network. With this in view, the scheme proposed in [7] can be used to mitigate various attacks in the SG environment. Othman et al. [8] proposed a new secure mechanism for communication among the clients in SG. Gamer et al. [9] proposed a new scheme for differentiated security in wireless mesh networks, which act as a backbone in the NANs. The designed scheme is effective against various security threats in wireless networks.

Li et al. [10] proposed a QoS-aware routing scheme for smart grid. The authors illustrated the scheme with various QoS parameters and found its performance to be better than that of the other schemes in the literature. Liang et al. [11] proposed a new broadcast algorithm for multipath routing in narrowband communication. The scheme was evaluated in different environments while varying several parameters, and its performance was found to be better than that of the other existing broadcast algorithms for multipath environments. Uludag et al. [12] evaluated various taxonomies for deploying mesh network based test-beds, which can be useful for constructing the backbone to be used in SGs. The authors have compared various techniques with respect to


different performance evaluation metrics which can be useful for the construction of next generation smart grid environment for generation, distribution, and electricity consumption for different end users.

III. BACKGROUND AND PRELIMINARIES

An LA is a control mechanism that reacts according to encoded instructions and performs a particular task with feedback from the environment [13]–[23]. Figure 2 describes the basic structure of an automaton. An automaton takes input parameters and acts according to them to produce an output. The automaton can improve by learning from its environment, so that it can choose the optimal action from a finite set of allowed actions through repeated interactions [13]–[23]. The objective of a learning automaton is to find the optimal solution with minimized penalty received from the environment [13]–[23].

Fig. 2. Relationship between Learning automata and environment

Mathematically, an LA is defined as $(Q, K, P, \delta, \omega)$, where $Q = (q_1, q_2, \ldots, q_n)$ is the finite set of states of the LA, $K = (k_1, k_2, \ldots, k_n)$ is the finite set of actions performed by the LA, $P = (p_1, p_2, \ldots, p_n)$ is the finite set of responses received from the environment, $\delta : Q \times P \to Q$ maps the current state and the input from the environment to the next state of the automaton, and $\omega$ is a function which maps the current state and the response from the environment to the state of the automaton [13]–[23].

Let $LA = (LA_1, LA_2, \ldots, LA_n)$ be the finite set of LAs in the proposed system. Each automaton is assumed to be a player in the game and performs some action. In return, it gets either a penalty or a reward from the environment in the form of a payoff and updates its action probability values (as defined below). In the present solution, the groups of automata are located at NANs and interact with the environment to get a probabilistic value (the payoff), according to which they update their action probability vectors. Finally, Nash Equilibrium (NE) is achieved if each player's strategy is the best response to the strategies of the other players. The NE marks the optimized solution of the game.

Definition 1: Let a game be defined as $T = (P, \sigma, f)$, where $P = (P_1, P_2, \ldots, P_n)$ are the players in the game such that each player $P_i \in P$ has an individual strategy space $\sigma_i$ and a payoff function $f : \sigma_i \times \sigma_j \times \cdots \times \sigma_n \to R$, where $R$ is the set of real numbers and $\sigma_i \times \sigma_j \times \cdots \times \sigma_n$ is the strategy space of the multiplayer game. Then Nash Equilibrium is achieved with a joint strategy profile satisfying $f(\sigma_i, \sigma_j) \le f^*(\sigma_i, \sigma_j)$, where $f^*(\sigma_i, \sigma_j)$ is the payoff of a joint strategy profile from which no player has an incentive to deviate [20]–[23].
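To make Definition 1 concrete, the following minimal sketch checks whether a joint strategy profile is a pure-strategy Nash Equilibrium in a small two-player game. The payoff tables, the route labels "A" and "B", and the function name `is_nash_equilibrium` are illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def is_nash_equilibrium(payoffs, profile):
    """Return True if no player gains by unilaterally deviating from `profile`.

    payoffs[i][profile] is the payoff of player i under the joint profile.
    """
    n_players = len(profile)
    # Strategy set of each player, recovered from the keys of the payoff table.
    strategies = [sorted({p[i] for p in payoffs[0]}) for i in range(n_players)]
    for i in range(n_players):
        current = payoffs[i][profile]
        for alt in strategies[i]:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if payoffs[i][deviated] > current:
                return False  # player i has an incentive to deviate
    return True


# Hypothetical game: two mesh routers each pick route "A" or "B";
# sharing a route is assumed to lower the payoff of both.
payoffs = [
    {("A", "A"): 1, ("A", "B"): 3, ("B", "A"): 3, ("B", "B"): 1},  # player 0
    {("A", "A"): 1, ("A", "B"): 3, ("B", "A"): 3, ("B", "B"): 1},  # player 1
]
equilibria = [p for p in product("AB", repeat=2) if is_nash_equilibrium(payoffs, p)]
```

In this toy game the equilibria are the two profiles in which the routers choose different routes, since either player would lose payoff by switching to the route the other one already uses.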


Definition 2: A Bayesian Coalition Game is based upon the concept of Bayesian Networks, in which players are represented as nodes of a Directed Acyclic Graph (DAG) [20]–[23] and their moves are on the arcs of the DAG. Each node in the graph has a discrete random variable, with a conditional probability for each state of the node given the combination of states of its parents [20]–[23].

Definition 3: A coalition [20]–[23] among the automata stationed at MRs is an ordered pair $(n, k)$, where $n$ is the number of players and $k : 2^n \to R$, where $R$ is a real number in $[0, 1]$.

A. Environment

The environment is the place where the automaton operates and performs its actions. It can be defined as a triplet $\langle X, Y, \rho \rangle$, where $X = (X_1, X_2, \ldots, X_n)$ are the finite inputs, $Y = (Y_1, Y_2, \ldots, Y_n)$ are the values of the reinforcement signal, and $\rho = (\rho_1, \rho_2, \ldots, \rho_n)$ are the penalty probabilities associated with each $X_i$, $1 \le i \le n$. The automaton performs a finite number of actions, to which the environment responds with either a reward or a penalty. According to the response received from the environment, the automaton decides its next action.

B. Action Probability Updates

Corresponding to each input parameter and action, the automaton updates its action probability vector by using the learning algorithm. In the proposed scheme, we consider the Linear Reward-Inaction ($L_{RI}$) scheme, in which the action probability is updated if the automaton receives a reward from the environment; otherwise, the probability remains the same [3]–[10], [21]–[23]. The formulas for probability updating are as follows:

$$p_j(n+1) = (1 - a)\,p_j(n), \quad j \neq i,\ Y = 0 \tag{1}$$

$$p_j(n+1) = a\,p_j(n), \quad j = i,\ Y = 0 \tag{2}$$

$$p_j(n+1) = p_j(n), \quad Y = 1 \tag{3}$$

where $a$ is the learning parameter, $i$ is the index of the selected action, $Y = 0$ denotes a reward, and $Y = 1$ denotes a penalty.
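The interaction between an automaton and its environment can be sketched in a few lines of Python. The sketch below uses the textbook form of the linear reward-inaction rule, in which a rewarded action's probability is increased by a(1 - p) and the others are scaled by (1 - a) so that the vector stays normalized; this is an assumption and may differ in detail from Eqs. (1)-(3) as printed. The function names, the penalty probabilities, and the default parameter values are hypothetical.

```python
import random


def lri_update(p, chosen, reward, a):
    """Textbook linear reward-inaction update of an action probability vector."""
    if not reward:
        return list(p)  # inaction: a penalty leaves the vector unchanged
    return [pj + a * (1.0 - pj) if j == chosen else (1.0 - a) * pj
            for j, pj in enumerate(p)]


def run_automaton(penalty_probs, a=0.05, steps=2000, seed=0):
    """Let one automaton interact with an environment <X, Y, rho>.

    penalty_probs[k] plays the role of rho_k: the probability that
    action k is penalized by the environment.
    """
    rng = random.Random(seed)
    n = len(penalty_probs)
    p = [1.0 / n] * n  # start from a uniform action probability vector
    for _ in range(steps):
        chosen = rng.choices(range(n), weights=p)[0]
        reward = rng.random() >= penalty_probs[chosen]
        p = lri_update(p, chosen, reward, a)
    return p


# Hypothetical environment in which action 1 is penalized less often.
final = run_automaton([0.8, 0.2])
```

Because only rewarded actions change the vector, probability mass tends to drift toward actions with a low penalty probability, while the entries always sum to one.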

IV. PROPOSED APPROACH

The proposed approach consists of forming coalitions among the players of the game using Bayesian coalition game theory. These coalitions are used in making routing decisions. All the players in the game are allowed to form coalitions among themselves. Each player chooses a strategy from the strategy space by looking at the strategies of the other players in the game, and is allowed to choose more than one strategy in this way. Decisions about the selection of a particular route and about coalition formation are made by comparing an individual player's payoff with those of all the other opponents in the game. The payoff function for an individual player is defined as follows:

$$P_i(\sigma_i) = \sum_{1}^{n} \frac{req_i^{in} + req_i^{out}}{ba_i \times C^n} \tag{4}$$
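As a quick numerical illustration of Eq. (4), the sketch below sums the ratio of requests to channel capacity over a list of entries. It assumes that the sum runs over the n mesh routers, that `requests_in` and `requests_out` hold the incoming and outgoing request counts, that `bandwidth` holds the available bandwidth per entry, and that the total number of channels is a single number; the function name and the sample values are hypothetical.

```python
def payoff(requests_in, requests_out, bandwidth, total_channels):
    """Evaluate the sum in Eq. (4): (req_in + req_out) / (ba * C) per entry."""
    return sum((r_in + r_out) / (ba * total_channels)
               for r_in, r_out, ba in zip(requests_in, requests_out, bandwidth))


# Two hypothetical mesh routers sharing 2 channels:
# (4 + 2) / (2.0 * 2) + (6 + 2) / (4.0 * 2) = 1.5 + 1.0
value = payoff([4, 6], [2, 2], [2.0, 4.0], total_channels=2)
```

Under this reading, the payoff grows with the request load and shrinks as more bandwidth or channels become available, so it behaves like a load measure.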


Equation (4) uses the numbers of incoming and outgoing requests, $req_i^{in}$ and $req_i^{out}$, from the clients to the MRs where the automata are deployed and which form the backbone of the mesh network in NANs. $ba_i$ and $C^n$ are the bandwidth available on the channel for satisfying the users' requests and the total number of channels available, respectively. The players and their action probabilities are represented as the arcs of a Directed Acyclic Graph (DAG). Arcs are added to or removed from the graph as players join or leave a coalition. Because the proposed scheme uses a probability updating strategy, the automaton gets a reward or a penalty for each of its actions. Let $p_i^k(\sigma_i)$ be the probability of selection of an action by the automaton. In the DAG, one automaton is assigned to each edge, along with a number of actions and a conditional probability of selecting one of them. Each player selects one of the actions depending upon the moves taken by the other players in the previous iterations. In the designed algorithm, each action of the automaton may be rewarded or penalized with respect to the finite number of actions it takes to join a coalition. According to the feedback from the environment, the automaton updates its action probability vector and may leave or join a coalition. These movements may change the structure of the DAG that represents the players and their action probabilities, i.e., coalition formation is dynamic and is based on the activities of the players in the game. As the players constantly watch the moves of the other players, adaptive decisions are taken by selecting the strategy from the strategy space that maximizes the profit with respect to those moves. Algorithm 1 describes the steps used for coalition formation among the LAs.
Algorithm 1 Dynamic Coalition Formation
Inputs: C, P, σ_i
Inputs: Output of cluster formation
Parameters: δ, λ

for (C = 1; C ≤ n; C++) do
    for (A = 1; A ≤ n; A++) do
        if (A_i reward) then
            P_i(σ_i) = P_i(σ_i) + λ
            P_i^k(σ_i) = P_i^k(σ_i) + δ
        else
            P_i(σ_i) = P_i(σ_i) − λ
            P_i^k(σ_i) = P_i^k(σ_i)
        end if
        Choose coalition having maximum value of p_i^k(σ_i)
        Send request to join coalition with maximum p_i^k(σ_i)
        if (p^n_requests ≤ (req_i^in + req_i^out)) then
            Allow the new player in the coalition
        else
            Deny the request
        end if
    end for
end for
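One possible reading of the inner loop of Algorithm 1 is sketched below for a single player. It is a sketch under stated assumptions, not the authors' implementation: the two nested loops are flattened into one pass over candidate coalitions, the admission test is read as comparing the number of join requests already pending at the target coalition with the player's request load (req_in + req_out), and all names and default values (`coalition_step`, `pending`, `capacity`, `lam`, `delta`) are hypothetical. As in Algorithm 1, δ is added without renormalizing the probabilities.

```python
def coalition_step(payoff, probs, rewarded, pending, capacity, lam=0.1, delta=0.05):
    """One pass over the candidate coalitions for a single player.

    payoff   -- current payoff P_i(sigma_i) of the player
    probs    -- dict: coalition id -> p_i^k(sigma_i)
    rewarded -- dict: coalition id -> True if the last action was rewarded
    pending  -- dict: coalition id -> join requests already pending (assumed)
    capacity -- req_in + req_out of the player (assumed admission bound)
    """
    probs = dict(probs)  # work on a copy so the caller's dict is untouched
    for k, ok in rewarded.items():
        if ok:
            payoff += lam      # reward: raise the payoff by lambda
            probs[k] += delta  # and the probability of coalition k by delta
        else:
            payoff -= lam      # penalty: lower the payoff, keep the probability
    # Request to join the coalition with the largest selection probability.
    target = max(probs, key=probs.get)
    admitted = pending[target] <= capacity
    return payoff, probs, target, admitted


new_payoff, new_probs, target, admitted = coalition_step(
    payoff=1.0,
    probs={"c1": 0.4, "c2": 0.5},
    rewarded={"c1": True, "c2": False},
    pending={"c1": 3, "c2": 1},
    capacity=5,
)
```

In the example, the reward for "c1" raises its probability to 0.45, which is still below that of "c2", so the player asks to join "c2" and is admitted because only one request is pending there.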

Once the coalitions among the players of the game are formed, adaptive routing decisions are taken by the automaton working in that cluster of the coalition. The decision about the selection of a particular route is taken by selecting one of the strategies from the strategy space. Algorithm 2 describes the detailed steps for route selection in the smart grid environment using LA.

Algorithm 2 LA-based Route Selection
Inputs: n, A, P_i(σ_i(0)), n_r, n_p
Output: Optimized route selection
Parameters: n_r, n_p, 0 ≤ c ≤ 1

n_r: the number of rewards to the LA
n_p: the number of penalties to the LA
P_i(σ_i(0)): the initial probability of selection of an action
for (i = 1; i ≤ n; i++) do
    Select a path from source to destination randomly
    Let the initial probability of selection of the action be σ_i(0)
    if (Source = Destination) then
        P_i(σ_i)(t + 1) = P_i(σ_i)(t)
    else
        Compute the payoff of each player as P_i(σ_i) = Σ_1^n (req_i^in + req_i^out) / (ba_i × C^n)
    end if
    if (P_i(σ_i) ≤ thr) then
        P_i(σ_i)(t + 1) = P_i(σ_i)(t) + c(1 − P_i(σ_i)(t))
        P_{n−i}(σ_i)(t + 1) = (1 − c)(1 − P_{n−i}(σ_i)(t))
        n_r = n_r + 1
    else
        P_i(σ_i)(t + 1) = P_i(σ_i)(t)
        P_{n−i}(σ_i)(t + 1) = P_{n−i}(σ_i)(t)
        n_p = n_p + 1
    end if
    Select the path with maximum of n_r / n_p
end for
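The route selection loop of Algorithm 2 can be sketched as follows. Several choices here are assumptions rather than details confirmed by the paper: rewards and penalties are counted per path, the payoff of each path is supplied as a precomputed number, the non-selected probabilities are scaled by (1 - c) in the textbook reward-inaction manner (which keeps the vector normalized and differs from the corresponding line of Algorithm 2 as printed), and 1 is added to the penalty count to avoid division by zero. The names `select_route`, `path_payoffs`, and `rounds` are hypothetical.

```python
import random


def select_route(path_payoffs, thr, c=0.1, rounds=200, seed=0):
    """Pick a route by sampling paths and rewarding those with payoff <= thr.

    path_payoffs -- hypothetical payoff of each candidate path (cf. Eq. (4))
    thr          -- threshold below which a sampled path is rewarded
    """
    rng = random.Random(seed)
    n = len(path_payoffs)
    probs = [1.0 / n] * n  # initial action probabilities
    n_r = [0] * n          # rewards received per path
    n_p = [0] * n          # penalties received per path
    for _ in range(rounds):
        i = rng.choices(range(n), weights=probs)[0]
        if path_payoffs[i] <= thr:
            # Reward: raise the chosen path, scale the others down.
            probs = [p + c * (1.0 - p) if j == i else (1.0 - c) * p
                     for j, p in enumerate(probs)]
            n_r[i] += 1
        else:
            n_p[i] += 1  # penalty: probabilities stay unchanged
    # Reward-to-penalty ratio; the +1 guards against a zero denominator.
    ratio = [r / (p + 1) for r, p in zip(n_r, n_p)]
    best = max(range(n), key=ratio.__getitem__)
    return best, probs


# Three hypothetical paths; only the middle one stays below the threshold.
best, probs = select_route([0.9, 0.2, 0.8], thr=0.5)
```

In this example only the second path can ever be rewarded, so it accumulates the highest reward-to-penalty ratio and most of the probability mass.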

V. SIMULATION RESULTS

We used Matlab to analyze the performance of the proposed algorithm by considering a scenario involving power consumption by different electrical appliances in a home. We considered three different zones for evaluation (underloaded, saturated, and overloaded) and computed the Load Utilization Factor (LUF) as follows:

$$LUF = \begin{cases} \text{Underloaded}, & Load < thr \\ \text{Saturated}, & Load = thr \\ \text{Overloaded}, & Load > thr \end{cases} \tag{5}$$

Also, the user satisfaction level is computed based upon the demand and supply factor. This factor is extremely important for estimating the user experience where there is a large number of power cuts. We varied the learning rate of the automaton operating at the backbone and obtained results with and without the LA-based approach. We carried out 10 simulation runs of 15 minutes each and used the average value for computing the results. We compared the proposed scheme with a conventional scheme in which no smart grid environment is created, i.e., the traditional power grid used to supply electricity to homes.
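The zone classification of Eq. (5) amounts to a three-way comparison of the load against the threshold, as the following minimal sketch shows. The function name `load_zone` and the sample values are hypothetical; with measured floating-point loads, an implementation would probably compare against the threshold with a tolerance rather than exact equality.

```python
def load_zone(load, thr):
    """Classify a load value into the three zones of Eq. (5)."""
    if load < thr:
        return "underloaded"
    if load > thr:
        return "overloaded"
    return "saturated"  # load == thr


zones = [load_zone(load, thr=10) for load in (4, 10, 17)]
```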


Figure 3 shows the impact of varying the learning rate of an automaton on LUF. We considered three different load thresholds, thr1 = high, thr2 = medium, and thr3 = low. At a low learning rate, LUF is high, as at this stage the automaton has fewer interactions with the environment in which it is operating. With an increase in the learning rate, LUF decreases, as there are more interactions with the environment, which gives the automaton more chances to take adaptive decisions. The automaton gets a reward or penalty from the environment for each action it performs, and hence more interactions with the environment occur at the later stages of the game. The reward and penalty values lie in the interval between 0 and 1. Initially, it is assumed that all the nodes get an equal reward value from [0, 1]. The thresholds shown in Figure 3 are used to compute the load values at the high, medium, and low levels. For these load thresholds, different values of LUF are analyzed by varying the learning rate of the automaton.

Figure 4 shows the impact of varying the learning rate of the automaton on user satisfaction levels. Users are more satisfied when there are fewer power cuts. At a low learning rate, the satisfaction level of users is lower, as the automaton has less interaction with the environment in terms of reward and penalty values. With an increase in the learning rate, the automaton interacts more with the environment, which guides it to take more adaptive decisions and results in a better match between demand and supply.

Figure 5 shows the information overhead with varying traffic flow in distribution, with and without the proposed scheme. The results show that the proposed scheme generates less overhead than the conventional scheme. This is because, in the proposed scheme, the automaton takes adaptive decisions for selecting the best route by interacting with the environment. The learning capability of the automaton makes it suitable for taking adaptive decisions at various stages in NANs of SG.

Figure 6 shows the impact of the proposed scheme on the delay in making the route decision with variation in traffic flow. As shown in the figure, the delay increases steadily with an increase in traffic flow, but the increase is smaller in the proposed scheme than in the conventional scheme, which does not use BCG and LA. Because the proposed scheme adapts its route selection decisions to the inputs it gets from the environment, it is more effective in making adaptive decisions than the other approach.

Figure 7 shows the impact of the proposed scheme on the probability of packet transmission with varying traffic flows. With an increase in traffic flows, the probability of packet transmission decreases steadily. As shown in Figure 7, the proposed approach shows a higher probability of packet transmission. Because the proposed approach uses the concepts of LA, in which each automaton can make adaptive decisions in NANs for transmission of the signals with respect to the available resources, the chances of taking the correct decisions are higher in the proposed approach than in other schemes of its category.


Fig. 3. Variation of LUF with varying learning rate

Fig. 4. User satisfaction level with varying learning rate

Fig. 5. Overhead generation with varying traffic flows

Fig. 6. Delay estimation with variation in traffic flow

Fig. 7. Probability of packets transmission

VI. CONCLUSION AND FUTURE DIRECTIONS

A number of ongoing efforts attempt to convert the traditional power grid to SG for efficient distribution of power in modern homes and offices. In this paper, we proposed a new approach for taking adaptive decisions for route selection in NANs of SGs using the Bayesian Coalition Game and Learning Automata. An automaton is assumed to be a player in the game, which interacts with the environment to take adaptive decisions. A strategy space is constructed, from which the players of the game select one of the strategies based upon the opponents' moves. A payoff function is used for this purpose. For each move, the automaton receives a reward or penalty, according to which it updates its action probability vector. The results and observations can be used for further improvements with respect to user satisfaction level in the next generation smart/digital home. In future, we will explore the possibility of creation of Public Key Infrastructure (PKI)

REFERENCES

[1] N. Saputro, K. Akkaya, S. Uludag, A survey on routing protocols for smart grid communications, Computer Networks, 56:2742-2771, 2012.

[2] T. Iwao, K. Yamada, M. Yura, Y. Nakaya, A. Cardenas, S. Lee, R. Masuoka, Dynamic data forwarding in wireless mesh networks, in: First IEEE International Conference on Smart Grid Communications (SmartGridComm), 2010, pp. 385-390.

[3] S. Dawson-Haggerty, A. Tavakoli, D. Culler, Hydro: A hybrid routing protocol for low-power and lossy networks, in: First IEEE International Conference on Smart Grid Communications (SmartGridComm), 2010, pp. 268-273.

[4] H. Gharavi, B. Hu, Multigate communication network for smart grid, Proceedings of the IEEE 99 (2011) 1028-1045.

[5] F. Li, B. Luo, P. Liu, Secure information aggregation for smart grids using homomorphic encryption, in: First IEEE International Conference on Smart Grid Communications (SmartGridComm), 2010, pp. 327-332.

[6] A. Bartoli, J. H. Serrano, M. Soriano, M. Dohler, A. Kountouris, D. Barthel, Secure lossless aggregation for smart grid M2M networks, in: First IEEE International Conference on Smart Grid Communications (SmartGridComm), 2010, pp. 333-338.

[7] M. S. Islam, Y. J. Yoon, M. A. Hamid, C. S. Hong, A secure hybrid wireless mesh protocol for 802.11s mesh network, in: Proceedings of the International Conference on Computational Science and Its Applications, Part I, ICCSA '08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 972-985.

[8] J. B. Othman, Y. Benitez, On securing HWMP using IBC, in: IEEE International Conference on Communications (ICC), 2011, pp. 1-5.

[9] T. Gamer, L. Völker, M. Zitterbart, Differentiated security in wireless mesh networks, Security and Communication Networks 4 (2011) 257-266.

[10] H. Li, W. Zhang, QoS routing in smart grid, in: IEEE Global Telecommunications Conference, GLOBECOM 2010, 2010, pp. 1-6.

[11] S. Liang, S. Chen, X. Ding, C. Zhang, Y. Xu, A broadcasting algorithm of multipath routing in narrowband power line communication networks, in: Proc. of ICCSN 2011, pp. 467-471.

[12] S. Uludag, T. Imboden, K. Akkaya, A taxonomy and evaluation for developing 802.11-based wireless mesh network testbeds, International Journal of Communication Systems, 2011.

[13] J. Torkestani, M. Meybodi, Learning automata based algorithms for solving stochastic minimum spanning tree problem, Applied Soft Computing, Vol. 11, Issue 6, pp. 4064-4077, 2011.

[14] J. Torkestani, M. Meybodi, An intelligent backbone formation algorithm for wireless ad hoc networks based upon distributed learning automata, Computer Networks, Vol. 54, Issue 5, pp. 826-843, 2010.

[15] K. Narendra, M. A. L. Thathachar, On the Behavior of a Learning Automaton in a Changing Environment with Application to Telephone Traffic Routing, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-10, Issue 5, pp. 262-269, 1980.

[16] S. Misra, P. V. Krishna, V. Saritha, LACAV: An energy efficient channel assignment mechanism for vehicular Ad Hoc Networks, Journal of Supercomputing, Vol. 62, Issue 3, pp. 1241-1262, 2012.

[17] S. Misra, P. V. Krishna, A. Bhiwal, A. Chawla, B. E. Wolfinger, C. Lee, A Learning automata based fault tolerant routing algorithm for mobile Ad Hoc Networks, Journal of Supercomputing, Vol. 62, Issue 1, pp. 4-23, 2012.

[18] S. Misra, B. J. Oommen, S. Yanamandra, M. S. Obaidat, Random Early detection for congestion avoidance in wired networks: a decentralized pursuit learning automata like solution, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 40, Issue 1, pp. 66-76, 2010.

[19] K. Akkarajitsakul, E. Hossain, D. Niyato, Coalition-Based Cooperative Packet Delivery under Uncertainty: A Dynamic Bayesian Coalitional Game, IEEE Transactions on Mobile Computing, Vol. 12, Issue 2, pp. 371-385, 2013.

[20] W. Saad, Z. Han, A. Hjørungnes, D. Niyato, E. Hossain, Coalition Formation Games for Distributed Cooperation among Roadside Units in Vehicular Networks, IEEE Journal on Selected Areas in Communications, Vol. 29, Issue 1, pp. 48-60, 2011.

[21] M. Naserian, K. Tepe, Game theoretic approach in routing protocol for wireless ad hoc networks, Ad Hoc Networks, Vol. 7, pp. 569-578, 2009.

[22] N. Kumar, J. Kim, ELACCA: Efficient Learning Automata based Cell Clustering algorithm for Wireless Sensor Networks, In Press, Wireless Personal Communications, vol. 73, no. 4, 1495-1512, 2013.

[23] N. Kumar, N. Chilamkurti, J. P. C. Rodrigues, Learning Automata-based Opportunistic Data Aggregation and Forwarding Scheme for Alert Generation in Vehicular Ad Hoc Networks, Computer Communications, vol. 59, no. 1, 22-32, 2014.