International Journal of Urban Design for Ubiquitous Computing Vol. 2, No. 1 (2014), pp.7-14 http://dx.doi.org/10.21742/ijuduc.2014.2.1.02

Community Detection Using Differential Evolution Algorithm with Multiple Objective Function

Harish Kumar Shakya, Kuldeep Singh and Bhaskar Biswas
Department of Computer Science & Engineering, Indian Institute of Technology (BHU), Varanasi, INDIA

Abstract

In this paper, we conduct an experiment on the differential evolution algorithm with multiple objective functions for community detection. Community detection in social networks is an important problem in many scientific fields. Differential evolution algorithms for community detection generally use modularity as the fitness function. In this work, we use different objective functions, namely conductance, normalized cut and average degree, and two well-known datasets, the Zachary Karate Club and the American College Football network. We evaluated DECD on several artificial and real-world social and biological networks using multiple objective functions. The analysis of the results shows that DECD with multiple objective functions has very competitive performance compared with other community detection algorithms.

Keywords: community detection, differential evolution algorithm, objective function, social network

1. Introduction

Recently, research on social networks has become a hotspot in interdisciplinary subjects. A social network can be seen as an abstraction of a complex network, where a node represents an individual or a component of the system and an edge represents a natural or artificial relationship. Community detection has been an active research area in social networks for the last few years. It is the process of finding groups according to different parameters and depends on the network's environment, e.g. distance based, property based, structure based or relation based. In recent years, many community detection methods and surveys have been introduced, and most surveys classify research papers and methods according to the type of community detection algorithm. The definition of a community varies from author to author and from algorithm to algorithm. In general, a community consists of a group of network nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network. The survey by S. Fortunato [2, 18] is exhaustive with respect to many community detection methods and is based on a graphical representation. The survey conducted by Porter [2, 44] only includes graph partitioning approaches and offers insight into graphical techniques by citing the first survey. The survey by B. Yang [2, 46] is quite exhaustive with respect to all techniques relying on a graphical representation and gives a good overview of the field by classifying all techniques in a tree structure. N. Gulbahce and S. Lehmann [2, 26] conducted a partial survey analyzing hierarchical community detection methods, which provides a number of leads for future community detection approaches. Another survey, by Pons [2, 43], incorporates several community detection methods and classifies them into five different families, i.e. classical

ISSN: 2205-8605 IJUDUC Copyright ⓒ 2014 GV School Publication

approaches, separative approaches, agglomerative approaches, random walk type algorithms and miscellaneous approaches. Papadopoulos [2, 42] classifies community detection techniques into five methodological categories, i.e. cohesive group discovery [2, 41], vertex clustering [2, 43], community quality optimization [2, 40, 11], divisive methods [2, 40] and model based methods [2, 23]. Danon [2, 14] primarily focuses on the performance of each type of algorithm. Treating a community as a group of network nodes, some other methods are available for community identification: hierarchical clustering [1, 20, 21], distance based structural equivalence [1, 20, 22], Pearson correlation [1, 20, 37], the Donetti-Muñoz method [1, 21, 37], the Capocci method [1, 24, 47], the Newman-Girvan method [1, 40], spectral partitioning [1, 8, 37], extremal optimization [1, 37, 38], the Potts method [6], the Louvain algorithm [5], modularity based algorithms [2, 48, 8, 12], dynamic algorithms [2, 45, 13], the centrality based technique by GN [2, 40], the modularity optimization method by GN [39], heuristic algorithms [2, 25, 28], spectral bisection [2, 29], the Kernighan-Lin method [2, 31] and random walk algorithms [2, 27].

2. Literature Survey

In recent years, many approaches to reveal community structure in networks have been proposed. In particular, modularity optimization is the best-known community detection method; it was proposed by Girvan and Newman [25]. They used the concept of modularity as the criterion to stop the division of a network into sub-networks in their divisive hierarchical clustering algorithm. Fast greedy modularity optimization was introduced by Clauset et al. [9]. This method is essentially a fast implementation of a previous technique proposed in Ref. [25]. Starting from a set of isolated nodes, the links of the original graph are iteratively added so as to produce the largest possible increase of the modularity of Newman and Girvan at each step [15]. In the following we will refer to this method as FM. In Ref. [16], Rosvall and Bergstrom turned the problem of finding the best cluster structure of a graph into the problem of optimally compressing the information on the structure of the graph, so that one can recover the original structure as closely as possible when the compressed information is decoded. In the following we will refer to this method as Infomap. Ref. [17] proposed a memetic algorithm for optimizing the modularity density [32], named Meme-Net, to reveal the community structure of a network. Meme-Net also shows its ability to explore the network at different resolutions and reveal the hierarchical structure of the network. GA-Net [9], a genetic algorithm for community detection in social networks proposed by Pizzuti, introduced the concept of community score to measure the quality of a partitioning of a network into communities, and tried to optimize this quantity with a genetic algorithm. In GA-Net, only one objective function, the community score, is optimized, so only a single solution is obtained in one run. Unlike most existing methods, the algorithm does not require the number of communities in advance.
This number is automatically determined by the optimal value of the community score. Pizzuti proposed MOGA-Net in [7], which employs a multiobjective genetic algorithm to uncover community structure in complex networks. This algorithm introduces two objective functions. The first objective function employs the concept of community score to measure the quality of the division of a network into communities. The higher the community score, the denser the clustering obtained. The second defines the fitness of the nodes belonging to a module and iteratively finds modules that have the highest sum of node fitness, referred to in the following as community fitness. When this sum reaches its maximum value, the number of external links is minimized. Both objective functions have a positive real-valued parameter controlling the size of the communities. The higher the value of the parameter, the smaller the size of the communities found. MOGA-Net exploits the


benefits of these two functions and obtains the communities present in the network by selectively exploring the search space, without needing to know the exact number of groups in advance. This number is automatically determined by the optimal compromise values of the objectives. An interesting result of the multiobjective approach is that it returns not a single partitioning of the network, but a set of solutions [7]. Each of these solutions corresponds to a different trade-off between the two objectives and thus to a different partitioning of the network consisting of a different number of clusters. This gives the user the opportunity to analyze several partitions at different hierarchical levels.

Modularity Q has been widely used recently. It was used in Ref. [10], which optimized network modularity using a genetic algorithm to detect communities. That approach is scalable to very large networks and does not need any a priori knowledge about the number of communities or any threshold value. However, Fortunato and Barthélemy [30] showed mathematically that the optimization of modularity has a resolution limit, raising important concerns about the reliability of the modules detected so far using this technique, or possibly using some other quality functions.

Differential evolution (DE) is a stochastic population-based optimization algorithm, introduced by Storn and Price in 1995. It is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. For these types of problems modularity is a major issue. Modularity is the degree to which a system's components may be separated and recombined. A system is called modular when it can be decomposed into a number of components that may be mixed and matched in a variety of configurations. In this work we use DE for community detection.
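As an illustration of the modularity Q that these methods optimize, the following minimal Python sketch computes the Newman-Girvan modularity [40] of a given partition as the sum over communities of l_c/m - (d_c/2m)^2, where m is the number of edges, l_c the number of edges inside community c and d_c the total degree of its nodes. It assumes an undirected, unweighted graph given as an edge list; the function name `modularity` and the toy graph are illustrative assumptions and are not taken from DECD.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition (undirected, unweighted graph).

    edges  : list of (u, v) pairs, each undirected edge listed once
    labels : dict mapping node -> community id
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph (an assumption for illustration): two triangles joined by one edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
SPLIT = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one community per triangle
MERGED = {node: 0 for node in range(6)}         # everything in one community

if __name__ == "__main__":
    print(modularity(EDGES, SPLIT), modularity(EDGES, MERGED))
```

On this toy graph the two-triangle split scores higher than the single merged community, which is the behaviour a modularity-maximizing search relies on.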
To detect the underlying community structure in social networks, many successful algorithms have been proposed, but most of them are based on greedy algorithms and perform poorly on large social networks. Moreover, many algorithms for community detection also require some prior knowledge about the community structure, e.g. the number of communities, which is very difficult to obtain in real-world networks. To overcome these drawbacks, a new community detection algorithm based on differential evolution (DE), named DECD, was proposed [3]. To the best of our knowledge, it was the first time DE was introduced for community detection. In DECD, DE is used to evolve a population of potential solutions for network partitions to maximize the network modularity [3, 40]. It is worth mentioning that DECD does not require any prior knowledge about the community structure when detecting communities in networks, which is beneficial for its application to real-world problems where prior knowledge is usually not available. Apart from introducing DE for community detection, the other key contributions of this algorithm are:
• The design of an improved version of the standard binomial crossover in DE to transmit some important information about the community structure during evolution in DECD.
• A biased process and a clean-up operation similar to [3, 33], introduced to DECD to improve the quality of the individuals in the population.
• A thorough evaluation of the performance of DECD on artificial and real-world social networks, which achieved better results than other state-of-the-art community detection algorithms.
• The application of DECD to a yeast interacting protein dataset [3, 34], which achieved the best results in the literature.
The differential evolution (DE) algorithm is a very simple yet efficient evolutionary algorithm proposed by Storn and Price in 1995 [3, 35].
The DE algorithm evolved from basic evolutionary algorithms such as the genetic algorithm (GA), with the help of some very important modifications, e.g. to the crossover operation, fitness function, mutation operation,


biased process and clean-up operation, similar to [3, 33], to improve the quality of the individuals in the population. DECD uses a greedy approach, meaning a random search over the whole dataset. On small social networks DECD gives the best results, but a study of differential evolution experiments and results shows that DE cannot handle large amounts of data (large networks). The random grouping scheme used in [4, 36] is not suitable for the large-scale social network community detection problem, because it loses the connectivity information of the network, which is crucial for the search performance of DE on modularity. Therefore, in order to achieve better scalability on large-scale networks, Qiang Huang et al. [4] made some changes to DECD: bias grouping, global network mutation and a divide-and-conquer strategy. This strategy divides a large-scale problem into subcomponents and evolves those subcomponents independently and co-adaptively. A bias grouping scheme is used for handling large-scale networks. The idea behind this scheme is to dynamically decompose the whole network into smaller subcomponents, each consisting of nodes that are more likely to be connected to each other. The search algorithm can therefore optimize these tightly interacting variables together, which ultimately leads to better results than splitting variables into subcomponents with unconnected nodes.
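As a point of reference for the DE operators mentioned in this section (mutation, crossover and selection), the sketch below shows the classic DE/rand/1/bin scheme of Storn and Price [35] minimizing a continuous toy function. It is only a sketch under stated assumptions: it does not reproduce the community-label encoding, the modified crossover, or the biased and clean-up operations of DECD, and the parameter values (`pop_size`, `F`, `CR`, `generations`) and the `sphere` test function are illustrative choices.

```python
import random


def differential_evolution(fitness, dim, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimize `fitness` with DE/rand/1/bin; returns (best_vector, best_score)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals, none of them i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Binomial crossover: at least one gene (j_rand) comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Keep the trial vector inside the search bounds.
            trial = [min(hi, max(lo, x)) for x in trial]
            # Greedy selection: the trial replaces the parent if it is no worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    print(differential_evolution(sphere, dim=3, bounds=(-5.0, 5.0)))
```

For community detection, the fitness would be a partition quality measure such as modularity, maximized rather than minimized, and the real-valued vector would be replaced by an encoding of community memberships.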

3. Experimental Work and Result Analysis

In this paper, we conducted a number of experiments to optimize the results of community detection in social networks. In these experiments, we used the differential evolution algorithm. The DE algorithm evolved from the genetic algorithm, and the modularity function has been used as its fitness function. The main idea behind these experiments is to evaluate different objective functions. We used conductance, normalized cut and average degree as objective functions. Table 1 was produced after performing the experiments 5 times, whereas in the original research they were run 50 times; this causes some variation in the results.

Table 1. Comparative Results between Used Objective Function

                      DATABASE
OBJECTIVE FUNCTION    Zachary Karate Club    American College Football
Modularity            0.710                  0.920
Conductance           0.571                  0.650
Normalized cut        0.226                  0.412
Average degree        0.699                  0.641
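The three alternative objective functions listed in Table 1 are not defined in the text. The following sketch computes them for a single community under commonly used definitions, which are an assumption here: conductance as cut(S) / min(vol(S), vol(V\S)), normalized cut as cut(S)/vol(S) + cut(S)/vol(V\S), and average degree as 2·m_S/|S|, where cut(S) counts edges leaving S, vol is the sum of node degrees and m_S is the number of edges inside S. The function name `community_scores` and the toy graph are hypothetical.

```python
def community_scores(edges, community):
    """Conductance, normalized cut and average degree of one community.

    edges     : list of (u, v) pairs of an undirected, unweighted graph
    community : iterable of the nodes that form the community S
    """
    S = set(community)
    m = len(edges)
    internal = sum(1 for u, v in edges if u in S and v in S)   # m_S
    cut = sum(1 for u, v in edges if (u in S) != (v in S))     # edges leaving S
    vol_s = 2 * internal + cut        # sum of degrees inside S
    vol_rest = 2 * m - vol_s          # sum of degrees outside S
    smaller = min(vol_s, vol_rest)
    conductance = cut / smaller if smaller else 0.0
    normalized_cut = ((cut / vol_s if vol_s else 0.0)
                      + (cut / vol_rest if vol_rest else 0.0))
    average_degree = 2 * internal / len(S) if S else 0.0
    return {"conductance": conductance,
            "normalized_cut": normalized_cut,
            "average_degree": average_degree}


# Toy graph (an assumption for illustration): two triangles joined by one edge.
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

if __name__ == "__main__":
    print(community_scores(TOY_EDGES, [0, 1, 2]))
```

Lower conductance and normalized cut indicate a better separated community, while a higher average degree indicates a denser one, so a DE fitness built on them would minimize the first two and maximize the third.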

We ran DE with multiple objective functions on two well-known datasets, i.e. the Zachary Karate Club and the American College Football network. We assessed the performance of the algorithms on the basis of NMI. The main purpose of using these 3 objective functions in place of the standard modularity function was to check whether any function exists that can replace modularity and give better results. As can be seen from the table above, very few functions come close to modularity, and the best of the 3 functions is average degree, which has the least difference between


NMI values. The worst function for calculating the fitness of communities is normalized cut. All the experiments were performed using two datasets, the Zachary Karate Club and the American College Football network.

3.1 Zachary Karate Club

The Zachary Karate Club is one of the classic studies in social network analysis [25]. Over the course of two years in the early 1970s, Wayne Zachary observed social interactions between the members of a karate club at an American university. He built a network of connections with 34 vertices and 78 edges among members of the club based on their social interactions. By chance, a dispute arose during the course of his study between the club's administrator and the karate teacher. As a result, the club split into two smaller communities, centered on the administrator and the teacher respectively.

3.2 American College Football

We now turn to the network of American College Football [25]. This network represents the schedule of Division I games for the 2000 season. It consists of 115 vertices and 616 edges, which represent the football teams and the regular season games among them, respectively. During the 2000 season, the 115 teams were divided into 12 conferences containing around 8 to 12 teams each. Games are more frequent between members of the same conference than between members of different conferences. Each conference can therefore be considered as one community of the network.

Figure 1. NMI Best for Different Objective Function Using Two Datasets

The experimental results are shown in tabular and graph formats. The results are close to those of the modularity objective function, which is used as the baseline. The results are reported as NMI (normalized mutual information) values. NMI is a standard quality metric for checking the performance of community detection algorithms. In social network analysis, NMI and the modularity function are the most commonly used evaluators for different algorithms. NMI values lie between 0 and 1; values near 1 represent better results and values near 0 the worst results. Table 1 contains three columns: the first is the objective function, the second is NMIbest for the Zachary Karate Club and the last is NMIbest for the American College Football network. In this table, DECD uses modularity as its objective function, and the whole experiment is performed with the differential evolution algorithm using multiple objective functions. The next objective function, conductance, performs similarly on both datasets, with a minor difference between the NMI values. Normalized cut is better for the larger dataset than for the smaller one: the NMI value is 0.226 for the karate club and 0.412 for the football network. Average degree is just the opposite of normalized cut, meaning the average degree objective


function is good for the small dataset (karate club = 0.699) and performs worse on the large dataset (football network = 0.641). Based on the experimental results, the differential evolution algorithm is not good for very large datasets, but in these results 3 out of 4 objective functions perform better on the larger dataset and only 1 objective function is better on the smaller dataset. Many other objective functions should be tried in DE with different concepts, such as bi-objective and multi-objective functions, because the objective function plays a key role in community detection in social networks. Each objective function has specific properties for calculating network metrics and actor metrics, so an objective function can be chosen for the specific conditions under which it provides the best results.
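To make the NMI values discussed above concrete, the following sketch computes the normalized mutual information between a detected partition and a ground-truth partition, each given as a list of community labels per node. It assumes the normalization 2·I(A;B) / (H(A) + H(B)), as used by Danon et al. [14]; the text does not state which variant was used, and the function name `nmi` is illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions of the same nodes."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("both partitions must label the same nodes")
    if n == 0:
        raise ValueError("partitions must not be empty")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Entropies H(A), H(B) of the two label distributions.
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    # Mutual information I(A;B) from the joint label counts.
    mi = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mi / (h_a + h_b)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    found = [1, 1, 1, 0, 0, 0]   # same grouping, different label names
    print(nmi(truth, found))
```

NMI depends only on how the nodes are grouped, not on the label names, so a partition that matches the ground truth up to renaming scores 1 and a partition that is statistically independent of it scores 0.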

4. Conclusion and Future Work

In this paper, we carried out experiments on an evolutionary algorithm for community detection in social networks. We focused only on the differential evolution (DE) algorithm, used with different types of objective functions. We also surveyed community detection algorithms for social networks. An objective function is a method for checking the results of community detection algorithms in social networks: it defines and verifies whether the detected communities are good according to a quality criterion (the definition of a community). We carried out experiments on DE using different objective functions (quality metrics) and compared the community detection results on a number of datasets. Based on our experiments, DE is more appropriate for the average degree and conductance objective functions than for the others. In future work we will pursue two tasks: first, using single-criterion and multi-criterion objective functions with DE, and second, repeating the same experiments with different swarm techniques.

References

[1] H. K. Shakya and B. Biswas, “Review of Community Detection Approaches in Social Networks using Bayesian Method and Graph Theory”, Trends in Innovative Computing 2012 Information Retrieval and Data Mining, WICT 2nd International Conference (IIITM-Kerala), (2012).
[2] M. Plantié and M. Crampes, “Survey on Social Community Detection”, Published in Social media retrieval, Springer, Version-1, (2013) March 25th, pp. 65-85.
[3] G. Jia, Z. Cai, M. Musolesi, Y. Wang, D. A. Tennant, R. J. M. Weber, J. K. Heath and S. He, “Community detection in social and biological networks using Differential Evolution”, Learning and Intelligent Optimization Conference LION 6, Paris, (2012) January 16-20.
[4] Q. Huang, G. Jia, T. White, M. Musolesi, N. Turan, K. Tang, S. He, J. K. Heath and X. Yao, “Community Detection Using Cooperative Co-evolutionary Differential Evolution”, PPSN 2012, Part II, LNCS 7492, (2012), pp. 235-244.
[5] V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, “Fast unfolding of communities in large networks”, Journal of Statistical Mechanics: Theory and Experiment, vol. 10, P10008, (2008) October.
[6] J. Reichardt and S. Bornholdt, “Detecting Fuzzy Community Structures in Complex Networks with a Potts Model”, Physical Review Letters, vol. 93, no. 21, (2004).
[7] C. Pizzuti, “A multi-objective genetic algorithm for community detection in networks”, Proceedings of the 21st IEEE International Conference on Tools with Artificial Intelligence, Newark, New Jersey, USA, (2009), pp. 379-386.
[8] U. Brandes, M. Gaertler and D. Wagner, “Experiments on Graph Clustering Algorithms”, In Proceedings 11th European Symposium on Algorithms (ESA ’03), (2003), pp. 568-579.


[9] C. Pizzuti, “GA-Net: a genetic algorithm for community detection in social networks”, Parallel Problem Solving from Nature - PPSN X, in: Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, vol. 5199, (2008), pp. 1081-1090.
[10] A. Lancichinetti, S. Fortunato and J. Kertész, “Detecting the overlapping and hierarchical community structure of complex networks”, New Journal of Physics, vol. 11, (2009).
[11] A. Clauset, “Finding local community structure in networks”, Physical Review E: Statistical, Nonlinear and Soft Matter Physics, vol. 72, no. 2, (2005).
[12] A. Clauset, M. Newman and C. Moore, “Finding community structure in very large networks”, Physical Review E, vol. 70, no. 6, (2004), pp. 1-6.
[13] D. Hughes, “Random walks and random environments, vol. 1: Random walks”, Bulletin of Mathematical Biology, vol. 58, no. 3, (1996), pp. 598-599.
[14] L. Danon, J. Duch, A. Diaz-Guilera and A. Arenas, “Comparing community structure identification”, Journal of Statistical Mechanics: Theory and Experiment, vol. 09, no. 10, (2005).
[15] E. Zitzler and L. Thiele, “Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach”, IEEE Transactions on Evolutionary Computation, vol. 3, no. 4, (1999), pp. 257-271.
[16] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure”, Proceedings of the National Academy of Sciences of the USA, vol. 105, no. 4, (2008), pp. 1118-1123.
[17] M. Gong, B. Fu, L. Jiao and H. Du, “A memetic algorithm for community detection in networks”, Phys. Rev. E, vol. 84, no. 5, 056101, (2011).
[18] S. Fortunato, “Community detection in graphs”, Physics Reports, vol. 486, no. 3-5, (2009) June, pp. 103.
[19] S. Fortunato and C. Castellano, “Community Structure in Graphs”, chapter of Springer’s Encyclopedia of Complexity and System Science, (2008).
[20] H. Bo, “Research on community detection and its applications [Thesis]”, Beijing Jiaotong University, (2009).
[21] L. Donetti and M. A. Muñoz, “Detecting network communities: a new systematic and efficient algorithm”, Journal of Statistical Mechanics: Theory and Experiment, (2004).
[22] L. A. Adamic and E. Adar, “Friends and neighbors on the web [Journal Article]”, Social Networks, vol. 25, no. 3, (2003), pp. 211-230.
[23] S. Gregory, “Finding overlapping communities in networks by label propagation”, New Journal of Physics, vol. 12, no. 10, (2009).
[24] A. Capocci, V. D. P. Servedio, G. Caldarelli and F. Colaiori, “Detecting communities in large networks”, Physica A, vol. 352, no. 2-4, pp. 669-676.
[25] M. Girvan and M. Newman, “Community structure in social and biological networks”, Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, (2002), pp. 7821-7826.
[26] N. Gulbahce and S. Lehmann, “The art of community detection”, BioEssays: news and reviews in molecular, cellular and developmental biology, vol. 30, no. 10, (2008) October.
[27] D. Harel and Y. Koren, “On Clustering Using Random Walks”, In R. Hariharan, M. Mukund, and V. Vinay, editors, FSTTCS 2001, LNCS 2245, Berlin Heidelberg, Springer-Verlag, (2001), pp. 18-41.
[28] J. R. Tyler, D. M. Wilkinson and B. A. Huberman, “Email as Spectroscopy: Automated discovery of Community Structure within Organizations”, In Communities and technologies, Kluwer, (2003), pp. 81-96.
[29] A. Pothen, H. D. Simon and K.-P. Liou, “Partitioning Sparse Matrices with Eigenvectors of Graphs”, SIAM Journal on Matrix Analysis and Applications, vol. 11, no. 3, (1990) May, pp. 430.
[30] S. Fortunato and M. Barthélemy, “Resolution limit in community detection”, Proceedings of the National Academy of Sciences of the USA, vol. 104, no. 1, (2007), pp. 36-41.


[31] B. W. Kernighan and S. Lin, “An Efficient Heuristic Procedure for Partitioning Graphs”, Bell System Technical Journal, vol. 49, no. 2, (1970), pp. 291-308.
[32] Z. Li, S. Zhang, R.-S. Wang, X.-S. Zhang and L. N. Chen, “Quantitative function for community detection”, Physical Review E, vol. 77, no. 3, (2008).
[33] M. Tasgin and H. Bingol, “Community detection in complex networks using genetic algorithm”, In Proceedings of the European Conference on Complex Systems, (2006).
[34] A. C. Gavin, “Proteome survey reveals modularity of the yeast cell machinery”, Nature, vol. 440, (2006), pp. 631-636.
[35] R. Storn and K. Price, “Differential evolution: a simple and efficient adaptive scheme for global optimization over continuous spaces”, Journal of Global Optimization, vol. 11, (1997), pp. 341-359.
[36] Z. Yang, K. Tang, and X. Yao, “Large scale evolutionary optimization using cooperative coevolution”, Information Sciences, vol. 178, (2008), pp. 2985-2999.
[37] P. Ioannis, S. Roberts and B. Sheldon, “Community Detection Algorithms: a comparative evaluation on artificial and real-world networks”, [Thesis], University of Oxford, (2010).
[38] J. Duch and A. Arenas, “Community detection in complex networks using extremal optimization”, Physical Review E, vol. 72, (2005).
[39] M. Newman, “Fast algorithm for detecting community structure in networks”, Physical Review E, vol. 69, no. 6, (2004) June.
[40] M. Newman and M. Girvan, “Finding and evaluating community structure in networks”, Physical Review E, vol. 69, no. 2, (2004) February.
[41] G. Palla, I. Derényi, I. Farkas and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society”, Nature, vol. 435, no. 7043, (2005) June, pp. 814-8.
[42] S. Papadopoulos, Y. Kompatsiaris, A. Vakali and P. Spyridonos, “Community detection in Social Media”, Data Mining and Knowledge Discovery, (2011) June, pp. 1-40.
[43] P. Pons, “Détection de communautés dans les grands graphes de terrain”, PhD thesis, Paris vol. 7, (2007).
[44] M. A. Porter, J. P. Onnela and P. J. Mucha, “Communities in Networks”, (2009).
[45] F. Y. Wu, “The Potts model”, Reviews of Modern Physics, vol. 54, no. 1, (1982) January.
[46] B. Yang, D. Liu, J. Liu and B. Furht, “Discovering communities from Social Networks: Methodologies and Applications”, Springer US, Boston, MA, (2010).
[47] N. Agarwal, H. Liu, L. Tang and P. S. Yu, “Identifying Influential Bloggers in a Community”, Proceedings of the 1st International Conference on Web Search and Data Mining (WSDM08), Stanford, California, (2008) February 11-12, pp. 207-218.
