CREAT. MATH. INFORM. 23 (2014), No. 1, 73 - 80

Online version at http://creative-mathematics.ubm.ro/ Print Edition: ISSN 1584 - 286X Online Edition: ISSN 1843 - 441X

Investigating the effectiveness of advertising on declining social networks

GERGELY KOCSIS and IMRE VARGA

ABSTRACT. In this paper we investigate whether it is worth advertising on declining social networks. Our investigations are based on computer simulations using a previously defined, and here simplified, model of information spreading. To keep our results as close to real life as possible, we run simulations both on a real network sample and on several generated networks representing classical scale-free-like social topologies as well as declining social networks. As a result, we found how the continuous destruction of the network affects the spreading, and thereby also the effectiveness of advertising.

1. INTRODUCTION AND MOTIVATION

The life cycle of an on-line social network (such as Facebook, Myspace, etc.) can be described by 4 distinct states (see Fig. 1). First, when it is introduced, the number of users (nodes) of the network starts to increase slowly because only a few people know about the network. After a while, as more and more users connect to the network, this increase speeds up. This phase is called the growth of the network. At a given point the network reaches its possible maximum number of users. The growth slows down and the social network stays in its mature state for a shorter or longer time. History shows, however, that this state is not the last in the life of a social network. With time a new rival always appears, and sooner or later the network enters its final phase of decline [12].

It is easy to understand that advertising in social networks in their first 3 states, especially in the middle two, is worthwhile. These networks have an almost clean scale-free topology [1], meaning that they are highly connected. This property makes the spreading of information very rapid, as shown by several recent papers investigating information spreading on scale-free networks [8, 10].
However, the same has not been proven for declining social networks. In the following we examine information spreading on social network topologies to find out how the effectiveness changes in the final state of their life cycle. To do this, we first reproduce real networks based on a huge sample of Facebook data. After that we examine the effect of the decline.

Received: 04.11.2013. In revised form: 05.06.2014. Accepted: 12.06.2014
2010 Mathematics Subject Classification. 05C82, 90B15, 91D30.
Key words and phrases. Scale-free networks, social networks, information spreading, simulation, modeling.
Corresponding author: Imre Varga; [email protected]

2. REAL AND GENERATED NETWORKS

Based on the work of M. Gjoka et al. [6] we obtained a data set from a real piece of the global Facebook social network. This network contains more than 50 million nodes and almost 100 million edges, making it possible to get smooth results from our spreading simulations. This kind of network, however, has some drawbacks as well. First of all, Facebook is in its mature phase rather than in its declining phase, so results on this network alone cannot answer our starting question. That is why we try to find a


method to generate scale-free networks that takes into account the fact that users sometimes leave the network.

FIGURE 1. Life cycle of a social network [12], shown as the number of users versus time. The network goes through 4 distinct phases during its life: introduction, rapid growth, maturity, and finally decline. We focus on this last phase.

Examining the degree distribution of our sample network, it naturally turns out to be close to a power law. However, the exponent of the degree distribution is not the theoretical γ = 3 but a slightly lower one, which differs from the values produced by the scale-free network generating algorithm of the Barabási-Albert (BA) model [2] or by other models [4, 5]. To solve this problem in an easy way and get the desired degree distribution, we used the model of Lee et al. [9], in which the classical preferential attachment growth rule is slightly altered.

The generating model is a mixture of the popularity-driven BA algorithm and the so-called fitness-driven algorithm. The generation of a growing network starts from a small connected network of $m_0$ initial nodes. The network is grown node by node. Each new node is linked to $m = m_0 - 2$ existing nodes. In the BA model the probability of linking to node i is proportional to the number of links $k_i$ of this node, i.e. a node with more neighbors has a higher chance of getting a new one. In contrast, in the fitness-driven model each node has a randomly assigned fitness value between 0 and 1, and the probability of joining node i is given by the product of this fitness value and the number of existing links $k_i$. In our system each new node connects to the network according to the BA model with probability p, while the fitness-driven connection has probability 1 − p. Thus, on average, $p = N_P / (N_P + N_F)$, where $N_P$ and $N_F$ refer to the number of nodes attached by the popularity-driven and the fitness-driven method, respectively.
The number of nodes is given as $N_0 = m_0 + N_P + N_F$, where $m_0$ is negligible. As expected, the degree distribution P(k) of the generated networks follows a power law, which means that most of the nodes have only a few contacts while a few have a large number of connections. As can be seen in Fig. 2/a, almost half of the nodes have only the initial m neighbors, but there are also nodes with more than 10000 connections. The value of the exponent γ depends on the probability p, so by changing


the dominance of the BA algorithm, the slope of the degree distribution function can be controlled monotonically over a wide range (see Fig. 2/b).

FIGURE 2. a) Degree distribution P(k) of the generated networks at different values of p (p = 1.0: popularity-driven; p = 0.0: fitness-driven). The networks have $N_0$ = 5,000,000 nodes and m = 2 was used. b) The exponent γ(p) of the degree distribution increases monotonically with the BA dominance p.

The value of m has no influence on the exponent of the distribution; it only shifts the curves with the same exponent. If the distribution is rescaled by m, the curves for different values of m collapse onto each other. m only determines the average number of links. The degree distribution can be written as

$$P(k) = 2m^2 k^{-\gamma(p)}. \qquad (2.1)$$
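The mixed growth rule of this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `generate_mixed_network`, the ring-shaped seed network, the uniformly random fitness values, and the choice to draw the attachment rule once per new node (rather than once per link) are all assumptions made here. The sketch runs in quadratic time, so it suits small test networks rather than networks with millions of nodes.

```python
import random


def generate_mixed_network(n_nodes, m=2, p=0.5, seed=None):
    """Grow a network mixing popularity-driven and fitness-driven attachment.

    Returns (adj, fitness): adj maps each node to the set of its neighbours,
    fitness[i] is the random fitness value of node i.
    """
    rng = random.Random(seed)
    m0 = m + 2  # follows the relation m = m0 - 2 stated in the text

    # Assumption: the small connected seed network is a ring of m0 nodes.
    adj = {i: set() for i in range(m0)}
    for i in range(m0):
        j = (i + 1) % m0
        adj[i].add(j)
        adj[j].add(i)
    fitness = [rng.random() for _ in range(m0)]

    for new in range(m0, n_nodes):
        existing = list(range(new))
        if rng.random() < p:
            # Popularity-driven (BA) rule: weight proportional to degree k_i.
            weights = [len(adj[i]) for i in existing]
        else:
            # Fitness-driven rule: weight proportional to fitness_i * k_i.
            weights = [fitness[i] * len(adj[i]) for i in existing]

        # Draw until m distinct targets are found.
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=weights, k=1)[0])

        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        fitness.append(rng.random())

    return adj, fitness
```

A call such as `generate_mixed_network(10000, m=3, p=0.1, seed=42)` would use the values of p and m reported later in the paper; the degree histogram of the returned adjacency sets can then be compared against Eq. (2.1).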

Using the above process we were able to generate networks with almost the same degree distribution exponent as our sample network.

3. ATTACK OF NETWORKS

It is known that a growing social network has an almost clean scale-free degree distribution. However, we would like to examine networks in their declining phase. To simulate this declining process we have to remove nodes from the network through a so-called attack process [3, 11]. The problem is that it is not yet known which users leave an online social network first. (i) A possible scenario is that the pioneers of an online society are the first to leave and move to a new rival, becoming the pioneers of the new network as well. If we assume the rich-get-richer concept [2], this means that the users who leave first are the hubs of the system, i.e. those who have the most connections. (ii) On the other hand, users who have only a few connections cannot enjoy the advantages of the system; they may feel unsuccessful and leave the network. In this case peripheral users quit, and only a few connections are lost. (iii) A third case is also possible: the degree of a node may have no effect on its probability of leaving the network.

Because of this we implemented three different ways to attack the network based on the degree of nodes, namely (i) removing high-degree nodes with higher probability (central attack), (ii) removing low-degree nodes with higher probability (peripheral attack), and (iii) removing all nodes with the same probability (general attack). In


the central case the chance of a node being removed is proportional to the number of its neighbors k. Since preferential-attachment-based models obey the rich-get-richer philosophy, old nodes that joined the network at the beginning have a much higher degree k. Removing these popular nodes means removing the old, pioneer users. In this sense this kind of attack corresponds to the case when the online social system has FIFO-like behavior (who is "First In" will be "First Out").

In the second, peripheral case connected nodes with only a few links have a larger probability of being removed: the chance of removal is inversely proportional to k. Because of the growth method, recently joined nodes can have only small values of k. If we remove them more frequently during the peripheral attack, most new nodes spend only a short time in the network. In other words, a "Last In First Out" (LIFO) phenomenon can be observed. This is analogous to social networks where the society has a hard core and a changing, temporary cover layer.

In the third case nodes have the same probability of being removed, independently of their number of connections.

In addition, the attack can be described by the attack strength $\eta = N^- / N_0$, where $N^-$ is the number of removed nodes. Finally, the parameters of these model networks for each attack type are the original number of nodes $N_0$, the dominance of the BA model p, the number of links of new nodes m, and the attack strength η.

It is well known that scale-free networks are very resistant against general attack [3]. The degree distribution has almost the same exponent γ even if the attack strength η is close to 0.5. The exponent γ seems to be constant also in the case of peripheral attack. This can be understood as follows: removing recently joined nodes with small k leads to a state similar to an earlier and smaller one, provided the system is large enough.
When nodes with a larger number of links are removed more frequently, the slope of the degree distribution of the centrally attacked network decreases very fast. The degree distributions for the different attack mechanisms can be seen in Fig. 3.

FIGURE 3. Degree distribution P(k) of attacked networks (peripheral, general and central attack) at η = 0.4, plotted against k on logarithmic axes. The solid line is the fitted power-law curve of the original network. Peripheral and general attack do not change the exponent, while central attack changes even the functional form of the curve.
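The three attack types described in this section can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' code: the function name `attack` and the dictionary-of-sets network representation are hypothetical, and the sketch assumes that degrees are re-evaluated after every removal, that isolated nodes are never chosen by the central or peripheral attack while connected nodes remain, and that the selection falls back to a uniform choice once every remaining node is isolated.

```python
import random


def attack(adj, eta, mode="general", seed=None):
    """Remove a fraction eta of the nodes and return the damaged network.

    adj maps each node to the set of its neighbours; the input is not modified.
    mode is "central" (weight ~ k), "peripheral" (weight ~ 1/k) or "general".
    """
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n_remove = round(eta * len(adj))  # eta = N^- / N_0

    for _ in range(n_remove):
        nodes = list(adj)
        if mode == "central":
            weights = [float(len(adj[u])) for u in nodes]
        elif mode == "peripheral":
            # Assumption: only connected nodes (k > 0) are candidates.
            weights = [1.0 / len(adj[u]) if adj[u] else 0.0 for u in nodes]
        elif mode == "general":
            weights = [1.0] * len(nodes)
        else:
            raise ValueError("unknown attack mode: " + repr(mode))

        # Assumption: if every remaining node is isolated, pick uniformly.
        if sum(weights) == 0:
            weights = [1.0] * len(nodes)

        victim = rng.choices(nodes, weights=weights, k=1)[0]
        for neighbour in adj.pop(victim):
            adj[neighbour].discard(victim)

    return adj
```

Calling `attack(network, 0.4, mode="central")` on a generated network corresponds to the attack strength η = 0.4 shown in Fig. 3; the degree histogram of the result can then be compared across the three modes.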


The attack process influences not only the exponent of the degree distribution. Due to the removal of nodes, after a while the initially connected networks fall into pieces, forming smaller clusters. In most cases there is one cluster which is much larger than the others. This component is usually called the "giant component", and its size relative to the system size, $S_g/N$, is an important property of the network. The size distribution of the smaller clusters follows a power law with exponent τ. In the unattacked network the average number of connections of nodes ⟨k⟩ is determined only by the value of m, but after the attack process it changes. By tuning the input parameters (dominance of the BA model p, number of links of a new node m, type of attack and attack strength η) we can modify the 4 structural properties of the network, namely γ, ⟨k⟩, $S_g/N$ and τ [11]. Analyzing the Facebook data set, we found that to model its structure we have to generate a network where γ is around 2.5, ⟨k⟩ < 10 and $S_g/N$ > 0.99. Due to the last condition the value of τ is negligible.

The main goal of this project is to find out the effect of structural changes of the network on information spreading. Keeping this in mind, we also need to examine the network from the point of view of spreading while we generate it. Namely, when we try to generate a network that is as similar to our sample data as possible, we also have to check how a spreading phenomenon runs on it.

4. INFORMATION SPREADING

In a previous work Kocsis and Kun proposed a model for information spreading on complex networks [7]. In our current work the simulations are based on a simplified version of this model, in which we omit those parameters that were found not to have a relevant effect on the spreading. In short, this simplified model is built up as follows. The socio-economic system is represented by a set of agents which have a complex topology of social contacts.
Agents are placed on the nodes of an underlying network topology and the social contacts are represented by the edges. The agents can be in two different states describing whether they are informed or not. This state is characterized by an integer variable $S_i$, which can have two distinct values:

$$S_i = \begin{cases} 1, & \text{if agent } i \text{ is uninformed,} \\ 0, & \text{if agent } i \text{ is informed.} \end{cases} \qquad (4.2)$$

Initially all agents are uninformed. The state of an agent can change stochastically at discrete points of time based on the amount of information it receives from the outside. As presented in the original model, the agents get information from two competing channels: an external information source (which can represent e.g. advertisements) and each other (i.e. word of mouth). For simplicity we assume here that the amount of information coming from the outer source and the amount of information from an informed neighbor (i.e. an informed agent that is connected to the observed one) are both one "unit" per time step. However, agents have different sensitivities to the two channels. The sensitivity of agents to the external information field is described by a parameter β, while the sensitivity of agents to the information from the neighbors is described by α, where both α and β are real numbers. From the above it is easy to see that the total amount of information received by agent i, $I_i$, can be written in the form

$$I_i = \alpha S_i \sum_{j=1}^{k_i} (1 - S_j) + \beta S_i, \qquad (4.3)$$

where $k_i$ denotes the number of social contacts of agent i.


Uninformed agents can become informed in each time step with a certain probability A, which is a monotonically increasing function of the available information $I_i$:

$$A(I_i) = 1 - e^{-I_i}. \qquad (4.4)$$

Since the state of the surroundings of a selected agent can change, the total amount of new information $I_i$ and the resulting awareness probability $A(I_i)$ are both functions of time. This model was already thoroughly investigated in the original paper. In our current work the parameters of the model were set to the values previously found to be closest to real-world scenarios [7]. Namely, we set α = 0.01 and β = 0.001 in order to balance the effect of the two channels.
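A time step of the spreading model in Eqs. (4.3) and (4.4) might be sketched as follows. This is a minimal illustration, not the code used for the published results: the names `spread_step` and `simulate` are hypothetical, the network is assumed to be a dictionary mapping each node to the set of its neighbours, and a synchronous update (all agents decide based on the state at the start of the step) is an assumption, since the text does not specify the update order. The code stores a boolean `informed[i]`, which corresponds to $1 - S_i$ in the notation of Eq. (4.2).

```python
import math
import random


def spread_step(adj, informed, alpha=0.01, beta=0.001, rng=random):
    """Advance the model by one time step; return the number of new adopters."""
    newly_informed = []
    for i, neighbours in adj.items():
        if informed[i]:
            continue  # informed agents receive no further information (S_i = 0)
        informed_neighbours = sum(1 for j in neighbours if informed[j])
        info = alpha * informed_neighbours + beta   # Eq. (4.3) with S_i = 1
        probability = 1.0 - math.exp(-info)         # Eq. (4.4)
        if rng.random() < probability:
            newly_informed.append(i)
    # Synchronous update (assumption): apply all changes after the sweep.
    for i in newly_informed:
        informed[i] = True
    return len(newly_informed)


def simulate(adj, steps, alpha=0.01, beta=0.001, seed=None):
    """Return the fraction of informed agents after each of `steps` steps."""
    rng = random.Random(seed)
    informed = {i: False for i in adj}  # initially all agents are uninformed
    history = []
    for _ in range(steps):
        spread_step(adj, informed, alpha, beta, rng)
        history.append(sum(informed.values()) / len(adj))
    return history
```

For example, an uninformed agent with two informed neighbours receives $I_i = 0.01 \cdot 2 + 0.001 = 0.021$ and becomes informed in that step with probability $1 - e^{-0.021} \approx 0.021$. The first index at which the returned history exceeds a chosen threshold gives a simple measure of how fast the information spreads.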

FIGURE 4. a) Ratio of informed nodes ⟨S⟩ in the system as a function of time t for networks of different sizes ($N_0 = 10^4$ and $N_0 = 10^7$), without attack and attacked in three ways (peripheral, general, central). Note that networks attacked in the same way show the same spreading behavior independently of the size of the network (η = 0.4). b) The effect of attack on the spreading. The curves show ⟨S⟩ on the peripherally, generally and centrally attacked networks as a function of time t. The direction of the arrow shows the increase of the number of removed nodes (η = 0.1; 0.2; 0.3; 0.4).

5. RESULTS OF SIMULATIONS

We carried out a large number of simulations on several networks with different structural properties. Based on our results published in [11], we always set p = 0.1 and m = 3 in the growing phase of the network generation and we attacked the resulting networks in each of the three different ways. The resulting networks differ in size $N_0$, degree distribution exponent γ, average number of connections ⟨k⟩ and size of the giant component $S_g$. During these simulations we chose the ratio of informed agents ⟨S⟩ as the key indicator of the spreading.

We found that the spreading process behaves the same way independently of the size of the network over several different network sizes, as illustrated in Fig. 4/a. The reason for this size independence is the information spreading model itself. Namely, because of the outer information source, some agents always get informed at random and later serve as information sources for others (see Eq. 4.4). These agents pop up evenly distributed in the system, so the information only has to cover the distance between these agents acting as first information sources. And this distance is naturally independent of the system size.


What became clear directly from our previous simulations is that the different ways of attacking have different consequences for the spreading. Naturally, in all cases the spreading becomes slower as the number of removed nodes increases. However, the same attack strength affects the spreading in different ways. As seen above, if low-degree nodes are removed from the network, its structure does not change. This is why we see in Fig. 4/b that even if we remove 40% of the nodes, preferentially those of lowest degree, the behavior of the spreading does not change much. When all nodes are removed with equal probability, a smaller number of removed nodes produces the same effect. The third case, however, where we directly attack the highest-degree nodes, highlights that even if we remove only a few nodes from the network, the spreading slows dramatically, as expected from the literature. This means that the future of spreading depends strongly on the way the network declines. Both the effect of the different attack types and of the attack strength η can be seen in Fig. 4/b.

FIGURE 5. a) Information spreading (⟨S⟩ as a function of time t) on different networks: Facebook, grow-and-destroy and BA. Our model is much closer to the real case than a simple growing BA network. (Parameters of the grow-and-destroy network: p = 0.1, m = 3 and general attack with η = 0.4.) b) The time t∗ needed to reach a well-informed state (⟨S⟩ > 0.9) depends exponentially on the attack strength η. At a given value of η the spreading is much slower for central attack than for general attack, while the peripheral case is the fastest. The networks here were generated using the grow-and-destroy model with the same parameters as in panel a).

To find a parameter set of our model that results in a network with properties similar to the real graph and that behaves similarly during the spreading, we ran the same simulations on the real and the generated networks and compared the ratio of informed agents as a function of time. We found that our grow-and-destroy model gives a much better approximation of information spreading in real networks than other models of only growing networks. Fig. 5/a shows that the behavior of a simple popularity-driven BA network differs from the behavior of the real Facebook-based network, while our network gives a much better correspondence.

In order to quantify the speed of information spreading we determined the time t∗ which is necessary to reach a state where 90% of agents are informed. We found that t∗ changes exponentially with the attack strength η for not too large values of η. On a semi-log plot the slopes of the curves belonging to the different attack types clearly differ (see Fig. 5/b). These different behaviors, and the fact that the time needed to reach saturation increases monotonically, are easy to understand. However, the reason why the increase is


exponential may be the subject of further investigations. In conclusion, with the loss of users the effectiveness of advertising on social networks decreases rapidly, which makes the value of continuing to advertise on them fairly questionable.

6. SUMMARY

In this work our goal was to answer the question "How does information spread on a declining social network?" or, more generally, "What is the effect of the decline of the network on the effectiveness of advertising?". In order to keep our results close to real networks we used a data set of real Facebook users. We also generated scale-free networks using a mixture of popularity-driven and fitness-driven models of growing networks. Then the networks were attacked in three different ways to model the effects of decline. During the study of information spreading on the above networks we used the average ratio of informed agents as a function of time as an indicator of the spreading. We found that while the so-called peripheral attack hardly changes the process, the general attack affects it more and the central attack changes it dramatically. With our simulations we showed that the spreading does not depend qualitatively on the system size (even if the network is attacked). To answer our main question, we showed that as the number of users who have left the network grows, information spreading slows exponentially, making advertising less effective.

Acknowledgement. The publication was supported by the TÁMOP-4.2.2.C-11/1/KONV-2012-0001 project. The project has been supported by the European Union, co-financed by the European Social Fund.

REFERENCES

[1] Albert, R. and Barabási, A.-L., Statistical mechanics of complex networks, Rev. Mod. Phys., 74, 47–97 (2002)
[2] Albert, R. and Barabási, A.-L., Emergence of Scaling in Random Networks, Science 286 5439, 509–512 (1999)
[3] Albert, R., Jeong, H. and Barabási, A.-L., The Internet’s Achilles’ Heel: Error and attack tolerance of complex networks, Nature 406, 378–382 (2000)
[4] Caldarelli, G., Capocci, A., De Los Rios, P. and Muñoz, M. A., Scale-Free Networks from Varying Vertex Intrinsic Fitness, Phys. Rev. Lett., 89, 258702 (2002)
[5] Dorogovtsev, S. N. and Mendes, J. F. F., Effect of the accelerating growth of communications networks on their structure, Phys. Rev. E, 63, 025101 (2001)
[6] Gjoka, M., Kurant, M., Butts, C. T. and Markopoulou, A., Walking in Facebook: A Case Study of Unbiased Sampling of OSNs, Proceedings of IEEE INFOCOM ’10, San Diego, CA, (2010)
[7] Kocsis, G. and Kun, F., Competition of information channels in the spreading of innovations, Phys. Rev. E, 84, 026111 (2011)
[8] Kosmidis, K. and Bunde, A., Propagation of confidential information on SF networks, Phys. A, 376, 699–707 (2007)
[9] Lee, H. Y., Chan, H. Y. and Hui, P. M., Scale-free networks with tunable degree distribution exponents, Phys. Rev. E, 69, 067102 (2004)
[10] Tang, X. G. and Wong, E. W. M., Information traffic in scale-free networks with fluctuations in packet generation rate, Phys. A, 388, 4797–4802 (2009)
[11] Varga, I., Németh, A. and Kocsis, G., A novel method of generating tunable underlaying network topologies for social simulation, submitted to The 4th IEEE International Conference on Cognitive Infocommunications (2013)
[12] Varghese, B. M., The Life Cycle of a Social Network, Techpedia (http://www.techipedia.com/2011/socialnetwork-life-cycle/ - last visited: June 20, 2013) (2011)

DEPARTMENT OF INFORMATICS SYSTEMS AND NETWORKS
UNIVERSITY OF DEBRECEN
26, KASSAI ÚT, DEBRECEN, HUNGARY
E-mail address: [email protected]
E-mail address: [email protected]