Technical Report CSTN-119

Betweenness Centrality Metrics for Assessing Electrical Power Network Robustness against Fragmentation and Node Failure

K.A. Hawick
Computer Science, Institute for Information and Mathematical Sciences, Massey University, North Shore 102-904, Auckland, New Zealand
email: [email protected]
Tel: +64 9 414 0800 Fax: +64 9 441 8181
March 2011

ABSTRACT

Electrical power distribution networks are generally constructed incrementally over time and designs are heavily influenced by spatial geography. Assessing the robustness of electrical distribution networks against vulnerability to fragmentation and failure of particular nodes is a difficult general problem. Graph metrics such as betweenness and other centrality measures can give some indications of vulnerable nodes. We analyse both a parameterised synthetic power distribution network, and a real specific electrical network to experiment with: betweenness; centrality; clustering; and shortest-path distance metrics. We experiment with systematically removing the current most central node to study overall network degradation properties. We find that even for simple networks a great deal of inherent complexity exists and that the emergent failure properties are not necessarily monotonic and can exhibit sudden catastrophic changes.

KEY WORDS

power network; betweenness centrality; robustness; fragmentation; complexity.

1 Introduction

Power distribution networks are highly complex graphs [31] with designed and legacy properties that are non-trivial to understand. Numerical simulation of the static characteristics of these networks is possible, and using optimised graph analysis software it is possible to investigate plausible dynamic failure scenarios when nodes of a network are postulated to fail.

A considerable body of recent work has been reported in the literature concerning complex networks [23], including work on distribution networks [19], hierarchical networks [6] and other methods for assessing load and points of potential failure [33]. There are many such applications for these techniques, including transport systems [14] and other utility distribution networks [36], but in this present paper we focus on electrical power distribution networks [1, 7] and associated graph issues [20, 24].

Many of the issues unique to electrical distribution networks arise because of their particular spatial structure and the topological constraints [17] with which they are built [28]. These give rise to various potential failure points and an overall fragility [4, 5] which must be addressed to avoid the consequences for the public [2, 18, 30]. Network analysis approaches such as agent based flow [27] and searching for identifiable network patterns [34] of potential failure are powerful techniques, but a range of static graph analysis metrics can also be brought to bear on particular network designs. In this present article we are interested in centrality and other metrics and how they characterise networks and give signatures of potential failure.

The notion of a network centrality metric [11] is not new and various centrality measures [12] have been developed over the last 40 years. The betweenness centrality metric [21] is of particular interest and is discussed in Section 2 below. In summary it helps identify the most central or important node in a given network and therefore the node whose failure will have the greatest consequences on the whole system. Recent work has explored this metric [16] for applications including social networks [9] and communications networks [32] as well as power systems.

It is difficult to obtain realistic and up-to-date electrical power network graphs. Such data is understandably sensitive. Real network data can have a lot of singularities and specific pathologies that are difficult to understand in the abstract. In addition to studying a public domain data set for the USA Western States Electrical Power network, we also investigate some synthetic networks based upon actual exit points for New Zealand’s national electrical grid system. Figure 1 shows one of our synthetic electrical power distribution networks based upon the actual major node exit points on New Zealand’s national grid, including both the North and South islands and the all-important link between them.

Figure 1: Simulated New Zealand Power Distribution Major Network (R = 75).

Synthetic data can give insights and help bridge the understanding between the unintuitive graph metrics and the real (noisy and messy) network data that represents the problem we are trying to characterise. We are also interested in addressing the problem of how one poses appropriate numerical experiments on complex networks and how one can generate suitable parameterised synthetic data sets.

Our article is structured as follows: In Section 2 we describe some of the formulae used to implement the various graph and network metrics including betweenness centrality. In Section 3 we describe how some synthetic networks were generated using actual node exit points for New Zealand’s electrical grid and present the metrics applied to this family of plausible electrical power networks. We apply the metrics to an actual power distribution network data set and give results in Section 4. In Section 5 we give some discussion of the implications of use of these synthesis techniques and centrality metric analyses and offer some conclusions and areas for further study in Section 6.

2 Metrics

Consider a graph G with a set V of vertices or nodes, of which there are $N_V$, and a set E of edges, of which there are $N_E$ individual connecting power lines. In this present paper we do not weight the arcs, and imagine them all part of a single integrated high voltage trunk system connecting exit points to down-transformers and the consumer network structure.

Centrality metrics attempt to rank the nodes in some order specifying which is the most connected or important to the network as a whole. A simple centrality measure is the in or out degree of a node, that is, the number of other nodes that it connects to or from. We look at plots of the input, output and total degree in Sections 3 and 4. However, the simple static degrees do not necessarily give insights into the wider implications of a node failure.

Another centrality metric is the so-called “betweenness.” It identifies the node through which the greatest number of pathways connecting any two other nodes pass. Computing the betweenness involves computing the shortest path distance between each pair of nodes (s, t); s ∈ V, t ∈ V. We then compute the fraction of the shortest paths that pass through each vertex v and sum this fraction over all possible pairs of vertices (s, t). This can be written as:

$$C_B(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{s,t}(v)}{\sigma_{s,t}} \qquad (1)$$

where $\sigma_{s,t}$ is the total number of shortest paths from s to t and $\sigma_{s,t}(v)$ is the number of those paths that pass through v. We can optionally normalise by dividing by the number of node pairs not including v. This factor is $(n-1)(n-2)$, with $n = N_V$, although for the work we report here with a fixed and known number of nodes in the graph it is instructive to plot the betweenness centrality un-normalised so we can see an actual number of pathways in the context of the whole network, and other distances and metrics.

Computing shortest path data for a network is a long-standing problem and although there are several algorithms available [8, 13, 29], in practice the choice (for networks that are not too large) is dominated by the ease of integration with the data structures and the other software apparatus used. Computationally, the complexity of obtaining the shortest paths is $O(N_V^3)$ using the Floyd-Warshall algorithm [10]. There are other and newer algorithms such as Brandes’ algorithm, which takes $O(N_V N_E)$ [3]. In this present work the calculations were done repeatedly as networks were progressively allowed to fail and it was sufficient and easiest to use the Floyd-Warshall algorithm.
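As an illustration of Equation (1), the following is a minimal sketch of an un-normalised betweenness calculation built on a Floyd-Warshall pass that also counts shortest paths. It is not the software used for the results reported here; the function name `betweenness_floyd_warshall`, the integer node labels 0..n-1 and the undirected, unweighted edge list are assumptions made for the example.

```python
INF = float("inf")


def betweenness_floyd_warshall(n, edges):
    """Un-normalised betweenness, Eq. (1), for an unweighted undirected graph.

    Assumptions of this sketch: nodes are labelled 0..n-1 and `edges` is a
    list of (a, b) pairs. The sum runs over ordered pairs (s, t), so each
    unordered pair of end points contributes twice.
    """
    dist = [[INF] * n for _ in range(n)]
    paths = [[0] * n for _ in range(n)]  # paths[s][t] plays the role of sigma_{s,t}
    for v in range(n):
        dist[v][v] = 0
        paths[v][v] = 1
    for a, b in edges:
        if a != b:  # ignore self-loops
            dist[a][b] = dist[b][a] = 1
            paths[a][b] = paths[b][a] = 1

    # Floyd-Warshall relaxation that also accumulates shortest-path counts.
    for k in range(n):
        for i in range(n):
            if i == k or dist[i][k] == INF:
                continue
            for j in range(n):
                if j == k or j == i or dist[k][j] == INF:
                    continue
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
                    paths[i][j] = paths[i][k] * paths[k][j]
                elif through_k == dist[i][j]:
                    paths[i][j] += paths[i][k] * paths[k][j]

    # v lies on a shortest s-t path exactly when d(s,v) + d(v,t) == d(s,t).
    score = [0.0] * n
    for v in range(n):
        for s in range(n):
            if s == v or dist[s][v] == INF:
                continue
            for t in range(n):
                if t == v or t == s or dist[s][t] == INF:
                    continue
                if dist[s][v] + dist[v][t] == dist[s][t]:
                    score[v] += paths[s][v] * paths[v][t] / paths[s][t]
    return score


# Four nodes in a ring: every opposite pair has two shortest paths,
# so each node carries half of each, in both directions.
print(betweenness_floyd_warshall(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

Dividing each score by $(n-1)(n-2)$ would give the normalised form mentioned above; the sketch leaves the values un-normalised to match the plots discussed in later sections.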

3 Synthetic NZ Power Net Results

It is useful when exploring abstract graph metrics to experiment with relatively small, visualisable but still realistic networks. The synthetic test sets were constructed from the actual main node exit points in the New Zealand electrical power distribution network. The approximate number and locations of the exit points were obtained from web map data available at http://www.gridnewzealand.co.nz/. The arc line connections, however, are not the actual ones used but are generated synthetically by applying a circular inclusion distance, so that all nodes within a radial distance parameter R of one another become connected. This adjustable parameter allows us to experiment with a family of related and realistic networks to assess the role and behaviour of the metrics described in Section 2. The data were generated using a graph editing tool [15] and the spatial units are consistent with a grid resolution (shown on the map diagrams) of 64 units. We experiment with R ranging from 25 up to 100, and the synthetic network is fully connected at R > 85.

Figure 2 shows how the synthetic NZ power network is fragmented at low values of the radial parameter R. The colours show the component clusters. The effect of varying R from 25 to 100 on the number of cluster components present is plotted in Figure 3. The all-important north-south island bridge only appears at R = 75 and the network is only fully connected above R = 85. Higher R values essentially just over-connect the network and give it some additional robustness against failure.

It should be emphasised that this is not a realistic power line connection pattern, and it most certainly is not an economically viable one. This algorithm does, however, give us a parameterised connection pattern. A realistic pattern would take into account line weights and use a radial connectivity approach only for the shorter range lines. Figure 3 also shows the effect of other metrics when R is varied.
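The radial construction described above can be sketched in a few lines. This is an illustrative example only: the function names `radius_graph` and `count_components` and the five coordinates are hypothetical and are not the New Zealand exit-point data, and the choice of "less than or equal to R" for inclusion is an assumption.

```python
import math


def radius_graph(points, R):
    """Connect every pair of points whose separation is at most R."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= R:
                edges.append((i, j))
    return edges


def count_components(n, edges):
    """Number of separate component clusters, using a simple union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)})


# Hypothetical node positions, chosen only to show fragmentation at low R.
points = [(0, 0), (30, 0), (60, 0), (60, 80), (200, 0)]
for R in (25, 35, 85, 150):
    edges = radius_graph(points, R)
    print(R, len(edges), count_components(len(points), edges))
```

Sweeping R in this way reproduces the qualitative behaviour discussed in this section: many isolated clusters at small R, with the most remote node joining the main cluster last.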
As can be seen, the network is almost fully connected and the biggest cluster size is almost saturated above R = 50, but the final remote South Island node does not become connected until R > 85. Similarly, one can see the number of distinct clusters fall relatively smoothly with increasing R. One can also identify humps in the all-pairs distance of the network at these points. Additional connectivity lowers this distance metric gradually with increasing R.

Figure 2: Clustered NZ Power Synthetic Data Set at R = 25, 35, 50, 100. Colours (online version) show separate component clusters.

Figure 3: Variation with spatial inclusion radius R.

The lower plot in Figure 3 shows how the mean degree rises monotonically, but not completely linearly, with increasing R. The number of leaf nodes (nodes that lead nowhere) is itself an interesting metric. These synthetic networks are small so the plot of the number of leaf nodes is somewhat noisy. Interestingly it does not behave monotonically but in fact dips then rises again.

The betweenness centrality is of particular interest. We can identify the node in the network that has the highest betweenness centrality and remove it. We can perform this progressively to reveal how the network might systematically fail. We hypothesise that this experiment models how an overloaded network might fail in practice. Probabilistically the most overloaded node might be the one that fails first. Its failure changes the properties of the network, and a new most overloaded node is identified and then removed in turn. This procedure can be repeated to give an indication of the progressive network failure.

Figure 4 shows three sets of curves for synthetic NZ Power distribution grids with R = 50, 75, 100 as the node with the current highest betweenness centrality is removed by disconnecting it completely. This procedure is repeated 40 times, which is enough to observe the network fail even with a high amount of redundant over-connectivity. Figure 4 shows a plot of: the number of arcs; the number of separate component clusters; the un-normalised betweenness; the Newman clustering coefficient; the all-pairs distance; and the number of leaf nodes for the three R valued networks.

The number of arcs falls monotonically as nodes are culled. This is not surprising. These networks are small enough that the curves are not completely linear. Likewise the number of component clusters rises almost monotonically and, as would be expected, the fragmentation is more severe for the least-connected R = 50 network.

The highest betweenness of the remaining nodes is seen to vary dramatically. In this particular pattern of nodes it starts high and then falls to a more steady value as the highly central nodes are cleared away. Interestingly it rises again dramatically for all three networks, showing that the network has changed properties considerably after 3, 16, 19, 27 and 35 nodes have been culled. These values are not trivially identifiable with component cluster fragmentation. The all-pairs distance (of the biggest connected component) shows similar properties, which therefore seem to be associated with the internal cluster properties rather than edge or leaf behaviour. The number of leaf nodes changes only very slightly, from two down to zero, for these changes.

The Newman clustering coefficient [25] gives an indication of how compact the network is and is seen to rise generally as more nodes are culled. It behaves more smoothly for the R = 75 network but exhibits more dramatic shifts for more over-connected or under-connected networks.
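The progressive culling experiment described in this section can be sketched as follows. This is an assumption-laden illustration rather than the software behind Figure 4: it swaps in a Brandes-style breadth-first accumulation [3] in place of the Floyd-Warshall calculation, it deletes the culled node outright (one reading of "disconnecting it completely"), and it breaks ties by smallest node label. The names `betweenness`, `count_components` and `cull_most_central` are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Un-normalised betweenness over ordered pairs, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path fractions from the farthest nodes inwards.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def count_components(adj):
    """Number of separate component clusters, by breadth-first search."""
    seen, clusters = set(), 0
    for start in adj:
        if start in seen:
            continue
        clusters += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return clusters


def cull_most_central(adj, steps):
    """Repeatedly delete the node with the current highest betweenness.

    Returns a list of (culled node, component clusters, remaining arcs).
    The input adjacency mapping is copied, not modified.
    """
    adj = {v: set(nb) for v, nb in adj.items()}
    history = []
    for _ in range(steps):
        if not adj:
            break
        score = betweenness(adj)
        victim = max(sorted(score), key=score.get)  # ties: smallest label
        for w in adj.pop(victim):
            adj[w].discard(victim)
        arcs = sum(len(nb) for nb in adj.values()) // 2
        history.append((victim, count_components(adj), arcs))
    return history


# Two triangles joined through a single bridging node (node 3).
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4},
           4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(cull_most_central(example, 2))
```

Recomputing the scores from scratch after every removal mirrors the procedure described above, where each failure changes the network and a new most central node must be identified.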

Figure 4: Three of the Synthetic NZ Power Distribution Networks (R = 50, 75, 100), showing variation with the number of (highest betweenness) culled nodes.

4 Western States Power Results

We can apply the graph metric analysis techniques to a real electrical power network data set. The Western States data [35] was compiled by Watts and Strogatz and is available online at Watts’ web-site (http://cdg.columbia.edu/cdg/datasets). It has 4,941 nodes, with a total of 6,594 edges in a fully connected component cluster. The data are available only as a structural graph without spatial locations so a map overlay is not possible.

Figure 5 shows the frequency of occurrence of degree for the 4,941 nodes in the Western States electrical power network data set. This is a simple centrality measure to compute. This particular data set has generally more outputs than inputs and these fall off in frequency of occurrence approximately exponentially with degree. This does emphasise that some nodes are very highly connected, with degrees of up to 19.

Figure 5: Degree Frequency of Occurrence in the Western States Data Set.

Figure 6 shows the betweenness centrality scatter-plotted against the input degree of each node in the network. This shows that there is a non-trivial relationship between betweenness and simple degree. Curiously the data suggest that the nodes with the highest betweenness actually have relatively low input degree.

Figure 6: Betweenness vs Input Degree for the Western States Data Set.

Figure 7 shows the betweenness centrality scatter-plotted against the input, output and total degree of the nodes. This emphasises the complex nature of the network betweenness for output and total degree as well as inputs. Low to medium degree appears to be associated with high values of betweenness for this data set.

Figure 7: Betweenness vs Degree (Total, Input, and Output) for the Western States Data Set. Colours show the three degree values.

Figure 8 shows the Western States electrical distribution network as the node with the current highest betweenness centrality is removed by disconnecting it completely. This procedure is repeated 38 times, which is enough to observe the network degrade considerably.

The Western States electrical network is more than ten times larger than our synthetic New Zealand grid networks. As such the number of arcs falls off much more smoothly, and almost linearly, with the culling of the highest-betweenness nodes. The number of component clusters rises almost linearly and certainly monotonically. The un-normalised betweenness is seen to fall off almost exponentially as nodes are culled. This emphasises the importance of certain key nodes, although not necessarily the ones with the highest degree, as we have seen from Figure 7.

The all-pairs distance of the largest cluster as shown in Figure 8 rises almost completely steadily as nodes are culled. Interestingly this network shows a much higher number of leaf nodes and a more obvious fall-off in leaf nodes as high betweenness nodes are culled. This is likely due to the formation of small islands (dimers and other small clusters) as the network fragments. We have not plotted the Newman clustering coefficient as it is uniformly low for this network.

Figure 9 shows a log-log plot of the betweenness centrality as it varies with the node rank. The limiting slope is given but the curve cannot really be ascribed to a power-law over most of the range. The tail-off is certainly smooth and nearly monotonic however.

Figure 9: Log-Log Ranked Betweenness for the Western States Data Set.
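The degree-based measures used in this section (input, output and total degree, their frequency of occurrence, and the leaf-node count) are simple to tabulate. The sketch below assumes the network is supplied as a list of directed arcs (source, target) and takes a leaf node to be one with a total degree of one; the function names and the tiny arc list are hypothetical and are not taken from the Western States data.

```python
from collections import Counter


def degree_tables(arcs):
    """Return (in_degree, out_degree, total_degree) for a directed arc list.

    Nodes that appear in no arc are not listed, since an arc list alone
    cannot reveal them.
    """
    out_deg = Counter(a for a, _ in arcs)
    in_deg = Counter(b for _, b in arcs)
    nodes = set(out_deg) | set(in_deg)
    total = {v: in_deg[v] + out_deg[v] for v in nodes}
    return in_deg, out_deg, total


def degree_frequency(degrees):
    """Map each degree value to the number of nodes having that degree."""
    return Counter(degrees.values())


def leaf_nodes(total):
    """Nodes with a single connection (assumed definition of a leaf)."""
    return sorted(v for v, d in total.items() if d == 1)


# Hypothetical five-node network: node 0 feeds 1, 2 and 3; node 3 feeds 4.
arcs = [(0, 1), (0, 2), (0, 3), (3, 4)]
in_deg, out_deg, total = degree_tables(arcs)
print(dict(degree_frequency(total)))
print(leaf_nodes(total))
```

Plotting the values from `degree_frequency` against degree on a logarithmic vertical axis is one way to check for the approximately exponential fall-off noted above.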

Figure 8: USA Western States Power Distribution Network data set, showing variation with the number of (highest betweenness) culled nodes.

5 Discussion

Although the networks we have studied are not huge, it is still computationally significant to compute the metrics we have discussed. A typical 2012 era desktop can compute everything we have discussed for the synthetic network (around 150 nodes) in a matter of minutes but several days of processing were required to compute everything for the Western States electrical network of around 5000 nodes. This is exacerbated by the need to recompute properties for the whole network when a single node is culled. There is scope for considerable efficiency if algorithms could be developed to reuse unaffected information as the network has individual nodes culled.

Generally we observe that the betweenness centrality metric is a much more subtle and insightful property than the simple degree centrality. The anti-correlations between these for the Western States power network are unintuitive and suggest that some combination of metrics is needed to categorise network properties.

We have seen that our synthetic network generation algorithm based upon a fixed pattern of realistic node information is at least a plausible way of generating networks with a variable parameter and with which we can study the centrality, clustering and distance metrics presented.

It appears that a very high betweenness of a relatively small number of key nodes can happen quite easily and by accident. A good strategy to design for network robustness therefore might be to simply identify them and add some extra redundant connectivity around these key nodes. It does not appear that simply identifying nodes as key because they have a high degree is necessarily valid.

Other work is reported on further refinements to centrality metrics, including the use of all paths and not just the shortest ones [26]; and also on the use of heuristics and second order centrality metrics [22]. There is therefore scope for experimenting further with these metrics on synthetic and well understood test networks to try to build up more intuition about what they reveal about complex networks based on actual power and distribution systems.
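As a rough, order-of-magnitude illustration of why the larger network is so much more expensive, the operation counts implied by the complexities quoted in Section 2 can be evaluated for the Western States figures ($N_V = 4941$, $N_E = 6594$). These are counts of elementary steps only; constant factors, memory effects and the cost of the other metrics are ignored, so they should not be read as timings.

```latex
\begin{align*}
N_V^3 &= 4941^3 \approx 1.2 \times 10^{11}
  && \text{(one Floyd-Warshall pass)} \\
38 \times N_V^3 &\approx 4.6 \times 10^{12}
  && \text{(recomputed after each of the 38 culls)} \\
N_V N_E &= 4941 \times 6594 \approx 3.3 \times 10^{7}
  && \text{(one pass at the } O(N_V N_E) \text{ cost quoted for Brandes' algorithm)}
\end{align*}
```

On these counts alone, a single pass at the $O(N_V N_E)$ cost is smaller by a factor of a few thousand for a sparse network of this size, which is consistent with the remark above that there is scope for considerable efficiency gains.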

6 Conclusion

We have studied synthetic test networks based on real node information but with an adjustable connectivity parameter based on radial distance. We have used the simpler test networks to explore the meaning and interpretation of various centrality, clustering and distance graph analysis metrics. We developed software that could apply these metrics both statically and to a network that is progressively culled of central nodes, and we particularly identify the betweenness centrality metric as indicating key nodes more effectively than a simple degree centrality metric. We have observed both monotonic and catastrophic regions of network failure, both as network inclusivity is varied and as key nodes were made to fail.

In summary, no one metric is enough to identify whether a real network is susceptible to catastrophic failure or which are the key nodes to be concerned about. However, collectively the metrics we discuss do appear to give some good indications, and a possible design strategy is to focus on supplying redundant bypasses to those nodes that do have a high betweenness centrality and where the all-pairs distance within the largest component cluster exhibits a discontinuity.

We have used simple unweighted networks in this study. Useful future work could include a consideration of network resistance/flow properties on weighted networks and a comparison of them with appropriately measured betweenness. Electrical power distribution networks are subject to many physical constraints and carry a legacy of historical design decisions. Simulation studies such as this can look at both static properties and semi-dynamic ones as nodes are made to fail.

References

[1] Barrows, E.C.S.P.H.C., Blumsack, S.: Comparing the topological and electrical structure of the north american electric power infrastructure. CoRR abs/1105.0214, 1–26 (2011)
[2] Bernstein, A., Bienstock, D., Hay, D., Uzunoglu, M., Zussman, G.: Power grid vulnerability to geographically correlated failures - a numerical evaluation. Tech Report 2011-05-06, Columbia University, Electrical Engineering (May 2011)
[3] Brandes, U.: A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology 25(2), 163–177 (2010)
[4] Buldyrev, S., Parshani, R., Paul, G., Stanley, H.E., Havlin, S.: Catastrophic cascade of failures in interdependent networks. Nature 464, 1025–1028 (April 2010)
[5] Callaway, D.S., Newman, M.E.J., Strogatz, S.H., Watts, D.J.: Network robustness and fragility: Percolation on random graphs. Phys. Rev. Lett. 85(255) (December 2000)
[6] Cloteaux, B.: Extracting hierarchies with overlapping structure from network data. In: Proc. 2011 Winter Simulation Conference. pp. 3335–3343. Phoenix, Arizona, USA (6 December 2011)
[7] Cropper, M.L., Limonov, A., Malik, K., Singh, A.: Estimating the impact of restructuring on electricity generation efficiency: The case of the indian thermal power sector. NBER Working Paper 17383, National Bureau of Economic Research (September 2011), http://www.nber.org/papers/w17383
[8] Dijkstra, E.: A Note on Two Problems in Connexion with Graphs. Numerische Mathematik 1(1), 269–271 (1959)
[9] Everett, M., Borgatti, S.P.: Ego network betweenness. Social Networks 27, 31–38 (2005)
[10] Floyd, R.W.: Algorithm 97: Shortest Path. Communications of the ACM 5(6), 345 (1962)
[11] Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1, 215–239 (1978)
[12] Freeman, L.C., Borgatti, S.P., White, D.R.: Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks 13, 141–154 (1991)
[13] Goldberg, A.V.: A simple shortest path algorithm with linear average time. In: Proc. 9th Annual European Symposium on Algorithms. pp. 230–241 (2001)
[14] Guimera, R., Amaral, L.A.N.: Modeling the world-wide airport network. Eur. Phys. J. B 38, 381–385 (2004)
[15] Hawick, K.: Interactive graph algorithm visualization and the graviz prototype. Tech. Rep. CSTN-061, Computer Science, Massey University (August 2008), www.massey.ac.nz/~kahawick/cstn/061/cstn-061.html
[16] Hines, P., Blumsack, S.: A centrality measure for electrical networks. In: Proc. 41st Hawaii Int. Conf. on System Sciences. pp. 1–8. HICSS, Hawaii, Big Island (7-10 January 2008)
[17] Hines, P., Blumsack, S., Cotilla-Sanchez, E., Barrows, C.: The topological and electrical structure of power grids. In: Proc. 43rd Hawaii Int. Conf. on System Sciences. Koloa, Kauai, Hawaii, USA (5 January 2010)
[18] Hines, P., Cotilla-Sanchez, E., Blumsack, S.: Topological models and critical slowing down: Two approaches to power system blackout risk analysis. In: Proc. 44th Hawaii Int. Conf. on System Sciences. pp. 1–10. Kauai, Hawaii, USA (4-7 January 2011)
[19] Hooyberghs, H., Lombeek, S.V., Giuraniuc, C., Schaeybroeck, B.V., Indekeu, J.O.: Ising model for distribution networks. arXiv 1105.5329v1, Instituut voor Theoretische Fysica, Katholieke Universiteit Leuven, Belgium (May 2011)
[20] Johnson, J.K., Chertkov, M.: A majorization-minimization approach to design of power transmission networks. In: Proc. 49th IEEE Conf. on Decision and Control (CDC). pp. 3996–4003. Atlanta, GA, USA (15-17 December 2010)
[21] Joy, M.P., Brock, A., Ingber, D.E., Huang, S.: High-betweenness proteins in the yeast protein interaction network. Journal of Biomedicine and Biotechnology 2, 96–103 (2005)
[22] Kermarrec, A.M., Merrer, E.L., Sericola, B., Tredan, G.: Second order centrality: Distributed assessment of nodes criticity in complex networks. Computer Communications 34, 619–628 (2011)
[23] Kocarev, L., In, V.: Network science: A new paradigm shift. IEEE Network Nov/Dec, 6–9 (2010)
[24] Ling, Z., Shan-Shui, Y., Lin, C., Li, W.: Reliability analysis of dc power distribution network based on minimal cut sets. IEEE Network 1, 1–7 (2011)
[25] Newman, M.E.J.: Small worlds: The structure of social networks. Complexity (2000)
[26] Newman, M.E.J.: A measure of betweenness centrality based on random walks. Social Networks 27, 39–54 (2005)
[27] Nguyen, P.H., Kling, W.L., Georgiadis, G., Papatriantafilou, M., Tuan, L.A., Bertling, L.: Distributed routing algorithms to manage power flow in agent-based active distribution network. In: Proc. IEEE Innovative Smart Grid Technologies Conference Europe (ISGT Europe). pp. 1–7. Gothenburg (11-13 October 2010)
[28] Pagani, G.A., Aiello, M.: The power grid as a complex network: a survey. arXiv 1105.3338v1, University of Groningen, The Netherlands, Groningen (May 2011)
[29] Prim, R.: Shortest connection networks and some generalizations. The Bell System Technical Journal November, 1389–1401 (1957)
[30] Queiroz, L.M.O., Lyra, C.: Adaptive hybrid genetic algorithm for technical loss reduction in distribution networks under variable demands. IEEE Trans. on Power Systems 24(1), 445–453 (Feb 2009)
[31] Standish, R.K.: Complexity of networks. In: Recent Advances in Artificial Life. vol. 3, pp. 253–263. World Scientific, ACAL’05, Sydney, Australia (2005), chapter 19
[32] Tizghadam, A., Leon-Garcia, A.: Betweenness centrality and resistance distance in communication networks. IEEE Network Nov/Dec, 10–16 (2010)
[33] Uriarte, F.M.: A tensor approach to the mesh resistance matrix. IEEE Trans. on Power Systems 26(4), 1989–1997 (2011)
[34] Watson, J., Hawkins, J., Bradley, D., Dassanayake, D., Wiles, J., Hanan, J.: Towards a network pattern language for complex systems. In: Advances in Artificial Life. vol. 3, pp. 309–317. World Scientific, ACAL’05, Sydney, Australia (2005), chapter 23
[35] Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature 393, 440–442 (June 1998)
[36] Yazdani, A., Jeffrey, P.: A complex network approach to robustness and vulnerability of spatially organized water distribution networks. arXiv 1008.1770v2, Cranfield University, UK (August 2010)