Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 57 (2015) 1179 – 1188

3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015)

Potential Influencers Identification Using Multi-Criteria Decision Making (MCDM) Methods

Dr. Meera Gandhi (a), Muruganantham A (b,*)

(a) Professor, Department of Computer Science and Engineering, Sathyabama University, Chennai-600 119, India
(b) Research Scholar, Sathyabama University, Chennai-600 119, India

Abstract

Multi-Criteria Decision Making (MCDM) methods such as Pugh and the technique for order preference by similarity to an ideal solution (TOPSIS) are discussed as a means of identifying potential influencers in social media. With Web 2.0 technology, every user has become a major contributor to online social media. Social media sites such as Facebook and Twitter are now a common channel through which business organizations offer services to their customers. The interactions generated at these sites, in the form of posts, comments, tweets, likes and so on, influence the attitudes and behaviour of other users. It has therefore become important to monitor, estimate and engage the potential influencers who are most relevant to a brand, product or campaign. In this way, business enterprises can focus their efforts on sustaining the activity of influential users, who need minimal effort and resources to improve product sales and enhance the reputation of the enterprise. In this article, a research framework is proposed to estimate the influencers in a social media site using MCDM methods, and the results are compared. The proposed approach is more dynamic and identifies potential influencers more precisely than standard centrality measures, which cannot be applied to large-scale networks because of their computational complexity. The effectiveness of the MCDM-based approach was tested on a Facebook dataset, and the results are shown alongside existing measures such as degree, PageRank and other centrality measures. The rankings of influencers were compared to evaluate the performance of the MCDM methods, among which TOPSIS outperforms the other methods.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015).

Keywords: Influence Users; Facebook; TOPSIS; Social Influence; Social Network; Pugh technique

* Corresponding author. Tel.: +91-9884466284; E-mail address: [email protected], [email protected]

1877-0509 © 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the 3rd International Conference on Recent Trends in Computing 2015 (ICRTC-2015) doi:10.1016/j.procs.2015.07.411


Meera Gandhi and A. Muruganantham / Procedia Computer Science 57 (2015) 1179 – 1188

1. Introduction

Social networking sites have become very popular in recent years because of the increasing proliferation and affordability of internet-enabled devices such as mobile devices and tablets. Social networks such as Facebook, Twitter and LinkedIn are extremely rich in content, and they typically contain a tremendous amount of content and linkage data that can be leveraged for analysis and applied to business decisions. A study by Complete Incorporated for the tourism domain showed that more than 80% of users relied on other users' opinions when making their buying decisions. In fact, it has been concluded that 97% of Internet users have read and been influenced by other users' opinions while planning a trip [1]. Consumers tend to trust the opinion of other consumers, especially those with prior experience of a product or service, rather than company marketing. This effect, called social influence, occurs when an influential customer affects other customers' preferences by shaping their attitudes and behaviours.

Monitoring, identifying and engaging the potential influencers who are most relevant to a brand, product or campaign has therefore become important. In this way, business enterprises can focus their efforts on sustaining the activity of influencers, who need minimal effort and resources to improve product sales and enhance the reputation of the enterprise. A number of studies estimate the potential influencers who can maximize information propagation within a social network. They assume that the social network structure and the influence probabilities on the edges between pairs of customers are given as inputs to the influence maximization problem. However, the influence probabilities are not always available unless prior knowledge of the relationships between actors is accessible.

Our proposed technique therefore extracts the influencers from the user interactions of a social media platform rather than solving the influence maximization problem. In this work, we propose techniques to compute user influence by combining content-based and network-based approaches using the technique for order preference by similarity to an ideal solution (TOPSIS).

This article is organized as follows. In the next section, we discuss the related work and highlight the research gap. In Section 3, we introduce our proposed techniques to solve the research problem defined in Section 2. Experiment results and discussions are presented in Section 4. We conclude our work and discuss future work in Section 5.

2. Related Work

In the literature, there are a number of works related to the identification of influential users in social networks. Models have been proposed that compute social influence probabilities from real social network data using action logs of web data, and that define the propagation of actions and the propagation graph [2]. The algorithms first learn the model parameters and then test the learned models by making predictions. In the process, influence probabilities are calculated from the actions in the action logs, so each action appears entirely in either the training or the test dataset. King and Chan [3] summarize the tasks and techniques of social computing, which include but are not limited to: social network theory, modelling, and analysis; ranking; query log processing; web spam detection; graph/link analysis and mining; collaborative filtering; sentiment analysis and opinion mining. Bharathi [4] and Tang [5] proposed methods that focus on the influence maximization problem, and Pang [6] and Turney [7] addressed polarity detection in web text, but few of these works attempted to analyze in depth how a user sentimentally influences, or is influenced by, another user in a social network.

TwitterRank was proposed by Weng [8] to identify influential users in Twitter. As an extension of the PageRank algorithm, it measures influence by taking both the topical similarity between users and the link structure into account. The authors process actual tweets published on Twitter and present results that validate their solution to the influence maximization problem. Duanbing Chen [9] used the Susceptible-Infected-Recovered (SIR) model to examine the spreading influence of the nodes ranked by different centrality measures. Kaiquan [10] identified influencers using joint influence powers through an influence network, which took a long time to build. Zhigu [11] used a user trust network to identify influential users, but the trust list took a long time to build and remained incomplete. Qian [12] proposed the weighted LeaderRank technique, which allows users with more fans to receive higher scores from the ground node, that is, it replaces the standard random walk with a biased random walk.


Tang [13] proposed a new approach that incorporates users' reply relationships, conversation content and response immediacy, capturing both explicit and implicit interaction between users, to identify influential users in an online healthcare community.

Although previous research has examined the problem of discovering a group of influential users, it did not identify influential users quickly with minimal computing power, and it did not adapt dynamically to the situation. Although these works proposed multiple techniques for generating the social network and estimating influence probabilities, they do not directly address the problem of identifying influencers in a social network. To address that problem, it is necessary to understand the properties of the online social networking environment, develop a method to extract the social network structure, develop formulations to compute the influence probabilities, and develop algorithms to rank the influencers.

2.1. Directed Graph

Formally, assume a social network is modelled as a directed graph G(V, E), where the nodes V represent users, the edges E represent social relationships among users, and N represents the size of the network. Suppose user x adopts an innovation at time t1. We say that user x influences user y if and only if, at the time t2 when user y adopts the innovation, user x had already adopted it at the earlier time t1, and x and y were already friends at t1. We therefore assume that social influence occurs when the information that a friend has adopted the innovation flows to neighbouring nodes in the social network.

2.2. Social influence

Social influence refers to the behavioural change of individuals affected by others in a network. Social influence is an intuitive and well-accepted phenomenon in social networks [14]. The strength of social influence depends on many factors, such as the strength of relationships between people in the network, the network distance between users, temporal effects, and the characteristics of the network and of the individuals in it. Standard network graph metrics such as closeness centrality, eigenvector centrality and betweenness centrality are related to social influence in terms of the structural effects of different edges and nodes.

Degree: This is a radial, volume-based measure. The simplest and most popular measure is degree centrality. Let A be the adjacency matrix of a network, and deg(i) be the degree of node i. The degree centrality $C_i^{DEG}$ of node i is defined to be the degree of the node:

$$C_i^{DEG} = \deg(i) \tag{1}$$

Closeness: This is a radial, length-based measure. Unlike the volume-based measures, the length-based measures count the length of the walks. The most popular centrality measure in this group is closeness centrality [19]. It measures centrality from the shortest distances to all other nodes. The closeness centrality $C_i^{CLO}$ of node i is defined as follows:

$$C_i^{CLO} = e_i^{T} S \mathbf{1} \tag{2}$$

Here S is the matrix whose (i, j)th element contains the length of the shortest path from node i to node j, $e_i$ is the ith unit vector, and $\mathbf{1}$ is the all-ones vector.

Node Betweenness or Betweenness Centrality: Nodes of high betweenness occupy critical positions in the network structure and are therefore able to play critical roles. This is often enabled by a large amount of flow, which is carried by nodes that occupy a position at the interface of tightly-knit groups. Such nodes are considered to have high betweenness. The betweenness centrality $C_i^{BET}$ of node i is defined as follows:

$$C_i^{BET} = \sum_{j,k} \frac{b_{ijk}}{b_{jk}} \tag{3}$$

Here $b_{jk}$ is the number of shortest paths from node j to node k, and $b_{ijk}$ is the number of those shortest paths that pass through node i.
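As a minimal sketch of equations (1) to (3), the following Python code computes the three measures for a small unweighted, undirected graph using only the standard library. The function names, the adjacency-list input format and the four-node example graph are illustrative assumptions and are not taken from the study. The betweenness part uses Brandes' accumulation, a standard technique that the paper does not name, and the sketch assumes a connected graph.

```python
from collections import deque


def shortest_path_data(adj, source):
    """Breadth-first search from source on an unweighted graph.

    Returns the distance to each reachable node, the number of shortest
    paths (sigma), the visiting order, and the shortest-path predecessors.
    """
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:              # first time w is reached
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, order, preds


def centralities(adj):
    """Return (degree, total_distance, betweenness) dictionaries.

    degree          -> equation (1)
    total_distance  -> equation (2), the row sum e_i^T S 1
    betweenness     -> equation (3), summed over ordered pairs (j, k),
                       so each unordered pair of an undirected graph
                       is counted twice
    """
    degree = {v: len(adj[v]) for v in adj}
    total_distance = {}
    betweenness = {v: 0.0 for v in adj}
    for s in adj:
        dist, sigma, order, preds = shortest_path_data(adj, s)
        total_distance[s] = sum(dist.values())
        # Brandes' back-propagation of pair dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    return degree, total_distance, betweenness


# Hypothetical example: the path graph a - b - c - d.
example = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
deg, far, bet = centralities(example)
print(deg, far, bet)
```

On this path graph the two inner nodes b and c obtain the smallest total distance and the only non-zero betweenness values, which matches the intuition that they sit between the two ends of the network.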


Eigenvector Centrality: The eigenvector centrality of a node is defined as a function of the number and strength of its connections to its neighbours, as well as of those neighbours' centralities. Let x(i) be the eigenvector centrality of a node $v_i$. Then,

$$x(i) = \frac{1}{\lambda} \sum_{j=1}^{N} A_{i,j}\, x(j) \tag{4}$$

Here $\lambda$ is a constant and A denotes the adjacency matrix. In a nutshell, the eigenvector centrality metric takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to.

PageRank: In PageRank, the transition probability $P(v_j \mid v_i)$ equals 1/out-degree($v_i$), where out-degree($v_i$) is the number of out-links of $v_i$. In the PageRank algorithm, all vertices are initialized with the same PageRank score, which equals one over the total number of vertices in the network. The PageRank scores are then updated iteratively by the following formulation until convergence, where d is the damping factor:

$$PR(v_j) = (1 - d) + d \sum_{v_i :\, e_{ij} \in E} P(v_j \mid v_i)\, PR(v_i) \tag{5}$$
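The iteration in equation (5) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the study: the function name `pagerank`, the damping value d = 0.85, the convergence tolerance and the three-node example are all hypothetical, and nodes without out-links simply pass on no score.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=200):
    """Iterate equation (5) on a directed graph.

    out_links maps every node to the list of nodes it links to.
    The transition probability P(vj | vi) is 1 / out-degree(vi).
    """
    nodes = list(out_links)
    n = len(nodes)
    # Every vertex starts with the same score, 1 / N.
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: 1.0 - d for v in nodes}
        for vi, targets in out_links.items():
            if not targets:
                continue  # assumption: a node without out-links passes on nothing
            share = pr[vi] / len(targets)
            for vj in targets:
                new[vj] += d * share
        # Stop when no score changed by more than the tolerance.
        if max(abs(new[v] - pr[v]) for v in nodes) < tol:
            return new
        pr = new
    return pr


# Hypothetical example: a and b both link to c, and c links back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this example c receives links from both other nodes and is ranked first, while b, which receives no links, keeps only the base score 1 - d.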

The structural metrics of a social networking site, namely degree, closeness centrality, betweenness centrality, eigenvector centrality and PageRank, are calculated using equations (1), (2), (3), (4) and (5) respectively.

3. Our Methodology

A number of centrality measures [14] [15] and their applications have been proposed for identifying influential nodes. However, each of them focuses on only one centrality measure, and each has limitations and disadvantages [16]. Although many works proposed multiple techniques for generating the social network and estimating influence probabilities, they do not directly address the problem of estimating influencers in a social network. To address this problem, it is important to understand the properties of the social networking site, develop a mechanism to extract the social network structure, and develop formulations to compute the influence possibilities in a dynamic business situation. Multi-Criteria Decision Making (MCDM) methods such as Pugh and the technique for order preference by similarity to an ideal solution (TOPSIS) are discussed. The rankings of influencers are compared to evaluate the performance of the MCDM methods.

3.1. Pugh or Decision Matrix Method

Let C be the criteria vector of a Decision Matrix Method (DMM), $C = (c_1, c_2, \ldots, c_n)$, where $c_j$ belongs to the criteria domain of the problem and n is the total number of criteria.

Let W be the criteria weight vector of the DMM, $W = (w_1, w_2, \ldots, w_n)$, where $w_j \in [0, N]$ and N is finite.

Let $A_k$ be the rating vector of alternative k, $A_k = (a_{1k}, a_{2k}, \ldots, a_{nk})$, where $a_{jk} \in \{-1, 0, 1\}$.

Let the matrix D be defined by $D = (a_{jk})$, where $a_{jk} \in \{-1, 0, 1\}$ is the rating of alternative k on criterion j, so that the rating vectors of the alternatives form the columns of D. D is called the rating matrix of the DMM.

Let the vector S be defined by $S = W \times D$ with $S = (s_1, s_2, \ldots, s_m)$, where $s_k = \sum_{j=1}^{n} w_j a_{jk}$ is the weighted sum of the ratings of alternative k and m is the number of alternatives:

$$(s_1, s_2, \ldots, s_m) = (w_1, w_2, \ldots, w_n) \times \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}$$

The alternative with the highest $s_k$ is the team's proposal for the problem analysed. The scores also rank the remaining alternatives [17].


3.2. TOPSIS Method

The TOPSIS (technique for order preference by similarity to an ideal solution) method is presented in Chen and Hwang [18], with reference to Hwang and Yoon [19]. TOPSIS is a multiple-criteria method for identifying solutions from a finite set of alternatives. The basic principle is that the chosen alternative should have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution. The procedure of TOPSIS can be expressed in a series of steps, in which i = 1, ..., n indexes the criteria, j = 1, ..., m indexes the alternatives, and $x_{ij}$ is the value of alternative j on criterion i:

(1) Calculate the normalized decision matrix. The normalized value $n_{ij}$ is calculated as

$$n_{ij} = \frac{x_{ij}}{\sqrt{\sum_{j=1}^{m} x_{ij}^{2}}}$$

(2) Calculate the weighted normalized decision matrix. The weighted normalized value $v_{ij}$ is calculated as

$$v_{ij} = w_i\, n_{ij}, \quad j = 1, \ldots, m, \quad i = 1, \ldots, n,$$

where $w_i$ is the weight of the ith attribute or criterion, and $\sum_{i=1}^{n} w_i = 1$.

(3) Determine the positive ideal and negative ideal solutions:

$$A^{+} = \{v_1^{+}, \ldots, v_n^{+}\} = \{(\max_j v_{ij} \mid i \in I),\ (\min_j v_{ij} \mid i \in J)\},$$

$$A^{-} = \{v_1^{-}, \ldots, v_n^{-}\} = \{(\min_j v_{ij} \mid i \in I),\ (\max_j v_{ij} \mid i \in J)\},$$

where I is associated with benefit criteria, and J is associated with cost criteria.

(4) Calculate the separation measures, using the n-dimensional Euclidean distance. The separation of each alternative from the positive ideal solution is given as

$$d_j^{+} = \Big\{ \sum_{i=1}^{n} (v_{ij} - v_i^{+})^{2} \Big\}^{1/2}, \quad j = 1, \ldots, m.$$

Similarly, the separation from the negative ideal solution is given as

$$d_j^{-} = \Big\{ \sum_{i=1}^{n} (v_{ij} - v_i^{-})^{2} \Big\}^{1/2}, \quad j = 1, \ldots, m.$$

(5) Calculate the relative closeness to the ideal solution. The relative closeness of the alternative $A_j$ with respect to $A^{+}$ is defined as

$$R_j = \frac{d_j^{-}}{d_j^{+} + d_j^{-}}, \quad j = 1, \ldots, m.$$

Since $d_j^{-} \ge 0$ and $d_j^{+} \ge 0$, clearly $R_j \in [0, 1]$.

(6) Rank the preference order. Decision Making Units (DMUs) are ranked in decreasing order of the index $R_j$.
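Steps (1) to (6) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `topsis`, the boolean `benefit` flags that stand in for the index sets I and J, and the centrality values in the example are hypothetical. As in the steps above, rows are criteria (here, centrality measures) and columns are alternatives (here, users). The sketch assumes that no criterion row is all zeros and that the alternatives are not all identical, since either case would cause a division by zero.

```python
import math


def topsis(x, weights, benefit):
    """Rank alternatives with TOPSIS.

    x[i][j]    : value of alternative j on criterion i
    weights[i] : weight of criterion i (expected to sum to 1)
    benefit[i] : True for a benefit criterion, False for a cost criterion
    Returns (closeness, ranking); ranking lists alternative indices from
    best to worst.
    """
    n, m = len(x), len(x[0])
    # Steps 1 and 2: normalize each criterion row, then apply its weight.
    norms = [math.sqrt(sum(value * value for value in row)) for row in x]
    v = [[weights[i] * x[i][j] / norms[i] for j in range(m)] for i in range(n)]
    # Step 3: positive and negative ideal solutions, one value per criterion.
    pos = [max(v[i]) if benefit[i] else min(v[i]) for i in range(n)]
    neg = [min(v[i]) if benefit[i] else max(v[i]) for i in range(n)]
    # Step 4: Euclidean separation of every alternative from both ideals.
    d_pos = [math.sqrt(sum((v[i][j] - pos[i]) ** 2 for i in range(n)))
             for j in range(m)]
    d_neg = [math.sqrt(sum((v[i][j] - neg[i]) ** 2 for i in range(n)))
             for j in range(m)]
    # Step 5: relative closeness R_j in [0, 1].
    closeness = [d_neg[j] / (d_pos[j] + d_neg[j]) for j in range(m)]
    # Step 6: rank in decreasing order of closeness.
    ranking = sorted(range(m), key=lambda j: closeness[j], reverse=True)
    return closeness, ranking


# Hypothetical centrality values for four users (made-up numbers).
centrality = [
    [120.0, 15.0, 60.0, 0.0],    # betweenness centrality
    [0.40, 0.25, 0.35, 0.20],    # closeness centrality
    [0.09, 0.02, 0.07, 0.01],    # eigenvector centrality
]
closeness, ranking = topsis(centrality, [1 / 3, 1 / 3, 1 / 3], [True, True, True])
print(ranking)
```

Treating all three centrality measures as benefit criteria with equal weights is a design choice made only for this example; different weights or cost criteria can be supplied through the `weights` and `benefit` arguments.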

The TOPSIS method thus introduces two "reference" points, the positive ideal solution and the negative ideal solution, but it does not consider the relative importance of the distances from these points.

The flow chart of the proposed method is shown in Figure 1. The specific steps of the method are as follows:

Step 1 Construct network. The social network structure data is constructed as a network using any conventional tool such as NodeXL. Users and their relationships are represented as an undirected graph, as shown in Figure 2.


Step 2 Calculate the different centrality values. Degree centrality (DC), closeness centrality (CC), betweenness centrality (BC) and eigenvector centrality (EC) are calculated as given in equations (1), (2), (3) and (4).

Figure 1: The proposed method flow chart (Start → Represent social network structure data → Calculate the centrality measures, such as betweenness centrality, closeness centrality and eigenvector centrality → Apply MCDM methods such as Pugh and TOPSIS on the centrality measures to list the influencers → End)

Step 3 Identify influencers using the MCDM-based methods Pugh and TOPSIS. From the centrality values of the social network data, the top influencers are listed using the Pugh and TOPSIS methods.

4. Experiment and Discussion

In this section, a Facebook dataset is used to evaluate the performance of the MCDM methods Pugh and TOPSIS.

4.1. Experiment Datasets

Our Facebook dataset was downloaded and loaded into a network analysis tool to construct the network graph structure. The extracted Facebook data was represented in NodeXL as a network graph, which contained 260+ user connections together with structural metrics such as degree, closeness centrality, betweenness centrality, eigenvector centrality and PageRank.


Figure 2: Network Graph Structure and its structural metrics in a Facebook dataset

4.2. Identification of Influencers

In the traditional approach, influencers are extracted based on the values of a specific structural metric. For example, in Figure 2 the nodes are sorted in descending order of betweenness centrality. The node with the highest betweenness centrality can pass information to the rest of the users in the Facebook network much more rapidly than vertices with zero betweenness centrality. In a social network, a connection to a popular individual (one with the highest eigenvector centrality) is more important than a connection to a loner. The eigenvector centrality metric takes into consideration not only how many connections a vertex has (i.e., its degree), but also the degree of the vertices that it is connected to. Similarly, influencers are identified from the descending order of degree, closeness centrality and PageRank values, as shown in Figure 3 to Figure 7.

Depending on the business situation or the information flow in the social network, an appropriate centrality measure is chosen to identify the influencers; for example, eigenvector centrality is chosen when the popularity of a node is important. This approach does not work for all possible business situations. An MCDM-based approach such as TOPSIS evaluates the importance of each node without being limited to a single centrality measure; it considers different centrality measures together. The proposed approach, using MCDM methods such as TOPSIS, selects the influencers by considering more than one structural metric, as shown in Figure 8. The TOPSIS-based calculation on the structural metrics is shown in Figure 9.


Figure 9: TOPSIS calculation and ranked influencers

4.3. Experiment Results

MCDM methods such as TOPSIS estimate the influencers based on more than one relevant attribute, such as betweenness centrality, closeness centrality, eigenvector centrality, PageRank and degree. Depending on the business situation or the information flow, an appropriate combination of centrality measures is chosen as input to the TOPSIS method. In Figure 9, betweenness centrality, closeness centrality and eigenvector centrality are chosen as the attributes of the TOPSIS method. The MCDM-based TOPSIS approach attempts to choose alternatives that simultaneously have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution. It can aggregate multiple attributes from different aspects to make a decision or an evaluation depending on the business situation, context or interest.

5. Conclusion and Future Direction

In this paper, a new approach is presented to estimate the potential influencers in a social network such as Facebook using Multi-Criteria Decision Making (MCDM) methods such as TOPSIS rather than single-attribute network metrics. Influencer identification takes multiple network metrics into account rather than a single one. As a result, potential influencers are identified and ranked precisely across business situations, whereas an approach based on a single network metric works better only for a specific business situation or interest. Our approach has many potential applications in the context of understanding influential users. The influencers identified by our approach are meaningful because they hold across business situations.

The current work still has a few limitations and can be improved in the future. (1) Only structural network metrics are taken into account; dynamic properties, which evolve over time, are not considered. (2) The current work is limited to one social network site, Facebook, and it can be extended to other online social media such as blogs, e-mail, Twitter, Myspace etc. (3) The performance of our approach can be evaluated against other available methods.

References

[1] Gretzel, Ulrike and Kyung Hyan Yoo, 2008, Use and impact of online travel reviews. In Information and communication technologies in tourism 2008. Springer, p. 35-46.
[2] Goyal A., Bonchi F., and Laks V. S. Lakshmanan, 2010, Learning influence probabilities in social networks, WSDM '10 Proceedings of the third ACM international conference on Web search and data mining, p. 241-250.
[3] King I., Li J., and Chan T., 2009, A Brief Survey of Computational Approaches in Social Computing, IJCNN'09 Proceedings of the International joint conference on Neural Networks, IEEE Press, p. 2699-2706.
[4] Bharathi S., Kempe D., and Salek M., 2007, Competitive influence maximization in social networks, Lecture Notes in Computer Science, p. 306-311.
[5] Tang J., Sun J., Wang C., and Yang Zi, 2009, Social influence analysis in large-scale networks, Proceedings of the 15th ACM SIGKDD International conference on Knowledge discovery and data mining, p. 807-816.
[6] Pang B., Lee L., and Vaithyanathan S., 2002, Thumbs up? Sentiment classification using machine learning techniques, Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), p. 79-86.


[7] Turney P. D., 2002, Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL '02), p. 417-424.
[8] Weng J., Lim E. P., Jiang J., and Qi He, 2010, TwitterRank: finding topic sensitive influential twitterers, WSDM '10 Proceedings of the third ACM international conference on Web search and data mining, p. 261-270.
[9] Duanbing Chen, Linyuan Lü, Ming-Sheng Shang, Yi-Cheng Zhang, Tao Zhou, 2012, Identifying influential nodes in complex networks, Physica A: Statistical Mechanics and its Applications, Volume 391, Issue 4, p. 1777–1787.
[10] Kaiquan Xu, Xitong Guo, Jiexun Li, Raymond Y.K. Lau, Stephen S.Y. Liao, 2012, Discovering target groups in social networking sites: An effective method for maximizing joint influential power, Electronic Commerce Research and Applications, Volume 11, Issue 4, p. 318–334.
[11] Zhigu Zhu, 2013, Discovering the influential users oriented to viral marketing based on online social networks, Physica A: Statistical Mechanics and its Applications, Volume 392, Issue 16, p. 3459–3469.
[12] Qian Li, Tao Zhou, Linyuan Lü, Duanbing Chen, 2014, Identifying influential spreaders by weighted LeaderRank, Physica A: Statistical Mechanics and its Applications, Volume 404, p. 47–55.
[13] Tang Xuning, Yang Christopher, 2012, Ranking User Influence in Healthcare Social Media, ACM Transactions on Intelligent Systems and Technology, Vol. 3, No. 4, Article No. 73.
[14] S.P. Borgatti, 2005, Centrality and network flow, Social Networks, Volume 27, p. 55–71.
[15] S.P. Borgatti, M.G. Everett, 2006, A graph-theoretic perspective on centrality, Social Networks, Volume 28, p. 466–484.
[16] Tore Opsahl, Filip Agneessens, John Skvoretz, 2010, Node centrality in weighted networks: Generalizing degree and shortest paths, Social Networks, Volume 32, Issue 3, p. 245-251.
[17] Jose L. Salmeron, Florentin Smarandache, 2006, Redesigning Decision Matrix Method with an indeterminacy-based inference process, Advances in Fuzzy Sets and Systems, Vol. 1(2), p. 263-271.
[18] Chen S.J., Hwang C.L., 1992, Fuzzy Multiple Attribute Decision Making: Methods and Applications, Springer-Verlag, Berlin.
[19] Hwang C.L., Yoon K., 1981, Multiple Attribute Decision Making Methods and Applications, Springer, Berlin Heidelberg.