A New Small World Lattice

Abhishek Parakh¹ and Subhash Kak²

¹ College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE 68182
² Computer Science Department, Oklahoma State University, Stillwater, OK 74078

Abstract. This paper considers a scalable lattice that may be used to generate models of random small world networks. We describe its properties and investigate its robustness to random link failures. We also define group and reachability coefficients to characterize the properties of the network. Simulation results are presented which show that the new coefficients describe a social network well.

1 Introduction

Social networks have been modeled in many different ways, and their characteristics have been analyzed for connectivity, scalability and the flow of information through them [2, 5, 6, 3]. Among these models is the Watts-Strogatz (WS) small world model, which has been used in social, engineering and computer networks [11]. The WS model begins with a ring lattice of N nodes, where each node is directly connected to k immediate neighbors located symmetrically on its two sides. A small world network is generated by random “rewiring” of this ring lattice, and the extent of rewiring is controlled by a probability p. The rapid decrease in network diameter upon the addition of random long-distance connections between nodes is the “small world phenomenon”.

Small world networks are generally associated with a high clustering coefficient C and a small average shortest path length L, which capture many features of social computing networks. The clustering coefficient measures the cliquishness of a node’s neighbors as the ratio of the number of edges actually present between the node’s neighbors to the total number of edges allowable between them; the average path length is the average of the lengths of the shortest paths between a node and the other nodes in the network.

In this paper, we propose a new small world (NSW) lattice to serve as a template for generating larger small world networks. Our model has nodes with many local connections and a few far connections. The connections in the lattice are arranged to balance the number of links per node against the diameter of the network. In the resulting network every node can communicate with any other node either directly or over multiple hops. We show that the upper bound on the number of hops is $\left\lfloor \frac{\log_2 (N/8)}{2} \right\rfloor + 2$, where N is the number of nodes in the network.

Section 2 defines group and reachability coefficients that describe the characteristics of a social network well. Section 3 presents our NSW model and describes its characteristics, the upper bound on the path length in the network, and the effects of random link failures on it.

2 Group and Reachability Coefficient

2.1 The Group Coefficient

The clustering coefficient of a node is defined as

$$C = \frac{m}{k(k-1)/2},$$

where m is the actual number of connections existing between the neighbors of a node and k is the number of neighbors of that node [11]. This definition does not count the node in question as a part of the cluster. Consequently, tree networks (which form the basis for scale-free networks, commonly used to model the World Wide Web, citation networks and some social networks [1]) have a clustering coefficient of zero for every node even though the network may be connected, indicating that the clustering coefficient fails to characterize such networks. We therefore propose a group coefficient that also counts the node itself as a part of the cluster:

$$G = \text{group coefficient} = \frac{m + k}{(k+1)k/2}$$
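As a minimal sketch of the two definitions above, the following Python function computes C and G for one node of an undirected graph. The adjacency-dictionary representation and the name `clustering_and_group` are illustrative assumptions, as is the convention of returning C = 0 when k < 2, where the denominator k(k−1)/2 vanishes.

```python
from itertools import combinations


def clustering_and_group(adj, node):
    """Return (C, G) for `node`; `adj` maps each node to the set of its neighbors."""
    neighbors = adj[node]
    k = len(neighbors)
    if k == 0:
        # Isolated node: the paper defines G = 0; C = 0 is an assumed convention.
        return 0.0, 0.0
    # m = number of edges that actually exist among the neighbors of `node`.
    m = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    # Assumption: C = 0 when k = 1, since k(k-1)/2 would be zero.
    c = m / (k * (k - 1) / 2) if k > 1 else 0.0
    g = (m + k) / ((k + 1) * k / 2)
    return c, g


# Hub of a star with four leaves: no edges among the neighbors.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(clustering_and_group(star, 0))

# Complete graph on four nodes.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(clustering_and_group(k4, 0))
```

For the hub of the star this gives C = 0 and G = 0.4, while every node of the complete graph gets C = G = 1.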

Akin to C(p) in [11], let G(p) be the group coefficient with probability p of rewiring an edge (see figure 1). The relationship between the clustering coefficient and the group coefficient is given by

$$G = C \times \frac{k-1}{k+1} + \frac{2}{k+1}.$$

When the clustering coefficient is zero, the group coefficient is $\frac{2}{k+1} > 0$, which is intuitively satisfying: even though the neighbors may not have any connections among themselves, they can all communicate through a single node. The node for which we are calculating the group coefficient now acts as the root of a tree. The group coefficient is inversely proportional to the number of nodes that will get disconnected if the root node of the tree fails. For an isolated node (m = 0 and k = 0) we define G to be 0. For a fully connected network, the clustering coefficient of every node is 1 and k = N − 1; therefore, the resulting group coefficient is also 1.

2.2 The Reachability Coefficient

The reachability coefficient provides a means of comparing the ‘connectedness’ of nodes in the network. One may argue that this information may be deduced from the average path length, but the average path length is only meaningful for connected networks. We are considering a general situation where a given set of nodes may form directed or undirected networks. We express the reachability coefficient of a node as:

$$R = \frac{1}{P} \left[ \frac{h_1}{P} \cdot P + \frac{h_2}{P} \cdot (P-1) + \frac{h_3}{P} \cdot (P-2) + \ldots + \frac{h_{N-1}}{P} \cdot 1 \right]$$

where $h_i$ is the number of nodes at a distance of i hops, N is the network size and P = N − 1. Some of the properties of the reachability coefficient are:

1. The reachability coefficient for an isolated node is 0.
2. The reachability coefficient for a node with N − 1 neighbors is 1.
3. The reachability coefficient of a node at the edge of a connected chain network approaches 0.5 as N approaches infinity.
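The definition and its three properties can be checked with a short Python sketch. It obtains the hop counts $h_i$ by breadth-first search; the adjacency-dictionary input and the name `reachability` are assumptions made for illustration, and unreachable nodes are simply left out of the sum.

```python
from collections import deque


def reachability(adj, source):
    """Reachability coefficient R of `source`; `adj` maps node -> iterable of out-neighbors."""
    p = len(adj) - 1  # P = N - 1
    if p == 0:
        return 0.0  # assumption: a one-node network is treated as an isolated node
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields the hop distance of every reachable node
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # A node at i hops contributes (P - i + 1); the sum is then divided by P * P.
    total = sum(p - d + 1 for d in dist.values() if d > 0)
    return total / (p * p)


def chain(n):
    """Undirected chain 0 - 1 - ... - (n-1)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


complete = {i: [j for j in range(5) if j != i] for i in range(5)}
print(reachability({0: [], 1: [2], 2: [1]}, 0))  # isolated node
print(reachability(complete, 0))                 # node with N - 1 neighbors
print(reachability(chain(1001), 0))              # end of a long chain
```

For the end node of a chain, every $h_i$ equals 1, so R = (P + 1)/(2P); with N = 1001 this is 0.5005, consistent with the limit of 0.5 in property 3.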

3 The New Small World Lattice

We introduce the new small world lattice, NSW (figure 2), which achieves the smallest diameter and average path length using the fewest edges when routing is performed using only local information. By local information, we mean that a node sends a message to whichever of its direct connections is closest to the destination, i.e. the node chooses the local best at every step. The proposed small world lattice has O(log N) connections for each node. The lattice of figure 2 shows up in binary search of a sorted sequence, where a divide and conquer strategy is used [4]. Each node is connected so that it divides the sequence of nodes by successive halves. If we consider nodes with ids 1 to N to lie on a circle, then node i is connected to nodes i+1, i+2, i+4, i+8, i+16, i+32, and so on. Figure 2 depicts only the connections that node 1 establishes. Since the network is symmetric, nodes on the left of node 1 have similar connections with node 1, resulting in $2\log_2 N - 1$ connections for the node when $N = 2^x$ for some integer x.

Let $N_1(i)$ denote the set of neighbors of node i, $N_2(i)$ the set of neighbors of neighbors of node i, and so on. All the nodes in set $N_1(i)$ can be reached in h = 1 hop, all the nodes in set $N_2(i) - N_1(i)$ can be reached in h = 2 hops, all the nodes in set $N_3(i) - (N_2(i) + N_1(i))$ can be reached from node i in h = 3 hops, and so on. Let $|N_h(i)|$ denote the cardinality of set $N_h(i)$; then the expected number of hops between node i and a node chosen uniformly at random in a network of N nodes and diameter D is given by

$$E(\text{hops}) = \sum_{h=1}^{D} \frac{\left| N_h(i) - \left( \sum_{k=1}^{h-1} N_k(i) \right) \right|}{N-1} \cdot h$$

where “−” denotes set difference and “+” denotes set union. In general, the connections shown in the network in figure 2 may be taken as bidirectional or unidirectional connections. When taken as unidirectional connections, the network diameter D is $O(\lceil \log_2 N \rceil)$ [9]. In a social network each connection is considered to be bidirectional.

Fig. 1. The characteristic curves for the clustering coefficient and the group coefficient for the WS model, N = 512, where p is the probability of rewiring. The plot shows C(p)/C(0), G(p)/G(0) and C(p)/G(p) against p on a logarithmic axis from $10^{-4}$ to $10^{0}$.

Fig. 2. Only the connections of node 1 are shown in the NSW model with a network size of N = 16.

A message is routed via multiple hops in the following manner (the routing protocol):

1. To communicate with node q, node i first sends the message to node m, the direct contact of node i that is closest to the destination.
2. Node m then checks whether node q is its direct contact. If it is, node m sends the message to node q directly. If q is not one of m’s direct contacts, node m locates the node l that is closest to node q among its direct contacts and sends the message to it.
3. Node l repeats step 2, and so on until the message reaches node q.

Although the above routing protocol may cause a message to bounce back and forth about a destination, the distance from the destination will decrease at each step.

Theorem: Consider a network having bidirectional connections and N nodes, where $N = 2^k$ for some integer k ≥ 3, such that every node connects with $2\log_2 N - 1$ other nodes as described in the NSW model. Then the upper bound on the path length is $\left\lfloor \frac{\log_2 (N/8)}{2} \right\rfloor + 2$ hops.

Proof: A formal proof is omitted here because of space constraints; it may be found in [7] and [8].

Upon tabulating the number of nodes at different hops in the network, we see that for unidirectional connections the distribution forms Pingala’s Meru Prastara (more commonly known as Pascal’s triangle) [10]. However, when the network connections are bidirectional, the resulting distribution is that of table 1. It is seen that the distribution in table 1 is determinate.

Table 1. Distribution of nodes at various hops for different network sizes and bidirectional connections

N      Number of hops
       0    1    2    3    4    5
2      1    1
4      1    3
8      1    5    2
16     1    7    8
32     1    9    18   4
64     1    11   32   20
128    1    13   50   56   8
256    1    15   72   120  48
512    1    17   98   220  160  16
1024   1    19   128  364  400  112
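The routing protocol stated before the theorem can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: node ids run from 0 to N − 1 instead of 1 to N, "closest" is taken to mean the shortest distance around the circle, ties are broken by the smaller id, and the helper names are hypothetical.

```python
def nsw_neighbors(i, n):
    """Direct contacts of node i on a bidirectional NSW lattice of n = 2**x nodes."""
    contacts = set()
    step = 1
    while step <= n // 2:
        contacts.add((i + step) % n)  # contact on one side
        contacts.add((i - step) % n)  # symmetric contact on the other side
        step *= 2
    return contacts


def ring_distance(a, b, n):
    """Shortest distance between ids a and b around the circle (assumed metric)."""
    d = abs(a - b) % n
    return min(d, n - d)


def greedy_route(src, dst, n):
    """Forward the message to the direct contact nearest to dst until it arrives."""
    path = [src]
    current = src
    while current != dst:
        # Local best choice; ties are broken by the smaller node id (arbitrary).
        nxt = min(nsw_neighbors(current, n),
                  key=lambda c: (ring_distance(c, dst, n), c))
        # The distance to the destination strictly decreases at every step.
        assert ring_distance(nxt, dst, n) < ring_distance(current, dst, n)
        path.append(nxt)
        current = nxt
    return path


print(greedy_route(0, 7, 16))  # [0, 8, 7]
```

In the example, node 0 overshoots to node 8 and the message then steps back to node 7, which illustrates the bouncing about the destination mentioned above while the distance still shrinks at each hop.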

Our observations from table 1 are:

1. The number of nodes at 1 hop increases by 2 as the network size increases from $2^k$ to $2^{k+1}$. Another formula determining the number of nodes at 1 hop is $2\log_2 N - 1$.
2. The fourth column in the table is for nodes at 2 hops and is given by $2 \cdot n^2$, n = 1, 2, . . ., where n = 1 corresponds to N = 8.
3. The fifth column in the table is for nodes at 3 hops and is given by $\frac{2n(n+1)(2n+1)}{3}$, n = 1, 2, . . ., where n = 1 corresponds to N = 32.
4. Similarly, the sixth column in the table is for nodes at 4 hops and is given by $\frac{2 n^2 (n^2 - 1)}{3}$, where n = 2, 3, . . ., and n = 2 corresponds to N = 128.
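The hop distribution and the bound of the theorem can be recomputed with a breadth-first search. The sketch below assumes node ids 0 to N − 1 and bidirectional links; since the lattice is symmetric, a search from node 0 suffices. The function names are illustrative, and the same hop counts also give the expected number of hops defined earlier in this section.

```python
from collections import Counter, deque


def nsw_lattice(n):
    """Adjacency sets of a bidirectional NSW lattice on ids 0..n-1 (n a power of two)."""
    adj = {i: set() for i in range(n)}
    step = 1
    while step <= n // 2:
        for i in range(n):
            j = (i + step) % n
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
        step *= 2
    return adj


def hop_distribution(adj, source=0):
    """List whose entry h is the number of nodes exactly h hops from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = Counter(dist.values())
    return [counts[h] for h in range(max(counts) + 1)]


def expected_hops(counts):
    """E(hops): sum over h of h * (nodes at h hops) / (N - 1)."""
    n = sum(counts)
    return sum(h * c for h, c in enumerate(counts)) / (n - 1)


def hop_bound(n):
    """floor(log2(n / 8) / 2) + 2, using integer arithmetic for n = 2**k, k >= 3."""
    return ((n // 8).bit_length() - 1) // 2 + 2


for n in (8, 16, 32, 64):
    counts = hop_distribution(nsw_lattice(n))
    print(n, counts, "max hops:", len(counts) - 1, "bound:", hop_bound(n))
```

For N = 16 this yields the counts 1, 7, 8, matching the corresponding row of table 1, and an expected hop count of 23/15.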

Fig. 3. Plot of average path length (L) and the clustering coefficient (C) of the NSW model. The plot shows C(p)/C(0) and L(p)/L(0) against p on a logarithmic axis from $10^{-4}$ to $10^{0}$.

Fig. 4. A comparison between the reachability coefficients of the WS model and the NSW model for a network size of 1024 and varying probability of reconnection. The plot shows the average reachability coefficient (vertical axis, 0.988 to 1) against the probability of rewiring p (horizontal axis, 0 to 1).

We randomize the new lattice using a procedure in which, for every node on the lattice, each of its edges is rewired with probability p to a randomly chosen node on the lattice. As the probability of rewiring is increased, the network approaches a random network and loses its small world properties. Figure 3 plots the ratios of L(p) and C(p) to their respective values at p = 0, where p = 0 represents the starting NSW model and p = 1 is the random network. From the figure we see that, compared to a random network, the NSW model has a significantly higher clustering coefficient while still having comparable average path lengths. This is one of the desirable properties of a small world network.

Figure 4 compares the reachability coefficients of the WS model and the NSW model as the probability of rewiring is varied from 0.1 to 0.99. Note that as the rewiring probability increases the network approaches a random network; random networks are known to be characterized by low average path lengths, which result in high reachability coefficients. We see from the graph that the NSW model has a high reachability coefficient, close to that of random graphs.
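One possible reading of the randomization procedure is sketched below. Several details are assumptions, not taken from the paper: each undirected edge is visited once, only its higher-numbered end is moved, and self-loops and duplicate edges are avoided as in the WS procedure. The names `nsw_edges` and `rewire` are hypothetical.

```python
import random


def nsw_edges(n):
    """Undirected NSW edges on ids 0..n-1, stored as (smaller id, larger id) pairs."""
    edges = set()
    step = 1
    while step <= n // 2:
        for i in range(n):
            j = (i + step) % n
            if i != j:
                edges.add((min(i, j), max(i, j)))
        step *= 2
    return edges


def rewire(edges, n, p, rng):
    """Rewire one end of each edge with probability p to a uniformly chosen node."""
    result = set()
    for u, v in sorted(edges):
        if rng.random() < p:
            # Assumption: the new end must not create a self-loop or repeat an edge.
            candidates = [
                w for w in range(n)
                if w != u
                and (min(u, w), max(u, w)) not in edges
                and (min(u, w), max(u, w)) not in result
            ]
            if candidates:  # keep the original edge if no free end exists
                v = rng.choice(candidates)
        result.add((min(u, v), max(u, v)))
    return result


lattice = nsw_edges(16)
randomized = rewire(lattice, 16, 0.2, random.Random(0))
print(len(lattice), len(randomized), len(lattice & randomized))
```

Because rewired edges never coincide with an existing edge, the number of edges is preserved; p = 0 returns the lattice unchanged.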

Fig. 5. Percentage of nodes that get isolated as the probability of link failure increases. Network size is 512 (NSW model). The plot shows the percentage of nodes disconnected (vertical axis, 0 to 1) against the probability of link failure p (horizontal axis, 0 to 1).

3.1 Robustness of the Proposed Model

In order to model link failures, we start with the NSW model and then remove each link (representing a failure) with probability p. Figure 5 plots the percentage of nodes that get isolated, resulting in an unconnected network. The first isolated node occurs when the probability of link failure is about p = 0.58. This figure may be explained by the fact that nodes in the NSW model have about $2\log_2 N$ connections; as the probability of link failure increases above 0.5, the average number of surviving connections per node falls below $\log_2 N$, causing the network to become unconnected.
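The link-failure experiment can be imitated with the following sketch, which removes each link independently with probability p and reports the fraction of nodes left without any link. It is a simplified illustration under assumptions (node ids 0 to N − 1, a single trial per value of p, hypothetical function names) and is not expected to reproduce the exact curve of figure 5.

```python
import random


def nsw_edges(n):
    """Undirected NSW edges on ids 0..n-1, stored as (smaller id, larger id) pairs."""
    edges = set()
    step = 1
    while step <= n // 2:
        for i in range(n):
            j = (i + step) % n
            if i != j:
                edges.add((min(i, j), max(i, j)))
        step *= 2
    return edges


def isolated_fraction(n, p, rng):
    """Fraction of nodes with no surviving link when each link fails with probability p."""
    degree = [0] * n
    for u, v in sorted(nsw_edges(n)):
        if rng.random() >= p:  # the link survives
            degree[u] += 1
            degree[v] += 1
    return sum(1 for d in degree if d == 0) / n


rng = random.Random(42)  # fixed seed so that a run is repeatable
for p in (0.2, 0.5, 0.58, 0.8, 0.95):
    print(p, isolated_fraction(512, p, rng))
```

Averaging `isolated_fraction` over many trials for each p would give a smoother estimate than the single run printed here.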

Fig. 6. Reachability coefficient of the network plotted against the probability of link failure for a network of size 512 in the NSW model. The plot shows the average reachability coefficient (vertical axis, 0 to 1) against the probability of link failure p (horizontal axis, 0 to 1).

4 Conclusions

In this paper we introduced group and reachability coefficients that work well in characterizing the properties of a social network. The new small world lattice introduced in this paper achieves a balance between the number of connections and the average path length in the network. Compared to a random network, the NSW model possesses a high clustering coefficient while maintaining comparable path lengths. Upon testing the effects of random link failures on the NSW model, we see that the network characteristics deteriorate only when the node degree falls below $\log_2 N$. Further, unlike in random graphs, the routing algorithm and the path are predetermined in the NSW model.

References

1. A. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
2. J. Caverlee, L. Liu, and S. Webb. The SocialTrust framework for trusted social information management: Architecture and algorithms. Information Sciences, 180(1):95–112, 2010. Special Issue on Collective Intelligence.
3. C. Haythornthwaite. Social network analysis: An approach and technique for the study of information exchange. Library & Information Science Research, 18(4):323–342, 1996.
4. D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 2nd edition, 1998.
5. R. Monclar, A. Tecla, J. Oliveira, and J. M. de Souza. MEK: Using spatial-temporal information to improve social networks and knowledge dissemination. Information Sciences, 179(15):2524–2537, 2009.
6. Y. Ni, L. Xie, and Z.-Q. Liu. Minimizing the expected complete influence time of a social network. Information Sciences, 180(13):2514–2527, 2010.
7. A. Parakh and S. Kak. A key distribution scheme for sensor networks using structured graphs. In Emerging Trends in Electronic and Photonic Devices Systems, 2009. ELECTRO ’09. International Conference on, pages 10–13, Dec. 2009.
8. A. Parakh and S. Kak. Efficient key management in sensor networks. In GLOBECOM Workshops (GC Wkshps), 2010 IEEE, pages 1539–1544, Dec. 2010.
9. I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking, 11(1):17–32, 2003.
10. L. Varshney. Local fidelity, constrained codes, and the Meru Prastara. Potentials, IEEE, 27(2):27–32, March-April 2008.
11. D. J. Watts and S. H. Strogatz. Collective dynamics of ‘small-world’ networks. Nature, 393:440–442, 1998.