Local Wealth Redistribution Promotes Cooperation in Multiagent Systems


Flávio L. Pinheiro
Collective Learning Group, The MIT Media Lab, Massachusetts Institute of Technology, 22 Ames Street, Cambridge, Massachusetts
[email protected]

Fernando P. Santos
INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, 2744-016 Porto Salvo, Portugal
[email protected]

ABSTRACT

Designing mechanisms that leverage cooperation between agents has been a long-lasting goal in Multiagent Systems. The task is especially challenging when agents are selfish, lack common goals and face social dilemmas, i.e., situations in which individual interest conflicts with social welfare. Past works explored mechanisms that explain cooperation in biological and social systems, providing important clues for the design of cooperative artificial societies. In particular, several works show that cooperation is able to emerge when specific network structures underlie agents' interactions. Notwithstanding, social dilemmas in which defection is highly tempting still pose challenges to the effective sustainability of cooperation. Here we propose a new redistribution mechanism that can be applied in structured populations of agents. Importantly, we show that, when implemented locally (i.e., when agents share a fraction of their wealth surplus with their nearest neighbors), redistribution excels in promoting cooperation under regimes where, before, only defection prevailed.


CCS CONCEPTS

• Computing methodologies → Multi-agent systems; Cooperation and coordination;


KEYWORDS


Emergent behaviour; Social networks; Social simulation; Simulation of complex systems; Cooperation

Proc. of the 17th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2018), M. Dastani, G. Sukthankar, E. Andre, S. Koenig (eds.), July 2018, Stockholm, Sweden. © 2018 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. https://doi.org/doi

1 INTRODUCTION

Explaining cooperation among selfish and unrelated individuals has been a central topic in evolutionary biology and the social sciences [23]. Simultaneously, the challenge of designing cooperative Multiagent Systems (MAS) has been a long-standing goal of researchers in artificial intelligence (AI) [10, 19]. More than thirty years ago it was already clear that "Intelligent agents will inevitably need to interact flexibly with other entities. The existence of conflicting goals will need to be handled by these automated agents, just as it is routinely handled by humans." [10]. In cooperative multiagent interactions, agents need to collaborate towards common goals, which introduces challenges associated with coordination, communication and teamwork modeling [19, 28].


Self-interested interactions, in contrast, require the design of indirect incentive schemes that motivate selfish agents to cooperate in a sustainable way [7, 19]. Cooperation is often framed as an altruistic act that requires an agent to pay a cost (c) in order to generate a benefit (b) for another. Refusing to incur such a cost is associated with an act of defection and results in no benefits being generated. Whenever the benefit exceeds the cost (b > c) and plays occur simultaneously, agents face the Prisoner's Dilemma, a decision-making challenge that embodies a fundamental social dilemma within MAS [21]: rational agents pursuing their self-interest are expected to defect, while the optimal collective outcome requires cooperation. If defection is the likely decision of rational agents, however, how can we justify the ubiquity of cooperation in the real world? Evolutionary biology has pursued this fundamental question by searching for additional evolutionary mechanisms that might help to explain the emergence of cooperative behavior [22, 23]. Some of these mechanisms led to solutions that found applications in computer science, informing, for example, ways of incentivizing cooperation in p2p networks [9, 11], wireless sensor networks [2], robotics [47], or resource allocation and distributed work systems [43], to name a few. Network reciprocity is one of the most popular mechanisms to explain the evolution of cooperation in social and biological systems [24, 26, 30–32, 38]. In this context, populations are structured and interactions among agents are constrained; these constraints are often modelled by means of a complex network of interactions. Applications of this mechanism have been explored in the design of MAS that reach high levels of cooperation [1, 17, 29, 34]. Despite these advances, cooperation on structured populations is still hard to achieve in social dilemmas with high levels of temptation to defect, and additional, complementary mechanisms are required. Here we consider that agents contribute a percentage of their surplus (defined below), which is later divided among a Beneficiary Set of other agents. In this context, we aim at answering the following questions:

• Does redistribution of wealth promote the evolution of cooperation?
• How should Beneficiary Sets be selected?
• What are the potential disadvantages of such a mechanism?

Using methods from Evolutionary Game Theory (EGT) [44] and resorting to computer simulations, we explore how wealth redistribution impacts the evolution of cooperation in a population of agents that are memoryless (i.e., unable to recall past interactions) and boundedly rational (i.e., lacking full information on the payoff structure of the game they are engaged in). We assume that agents resort


to social learning through peer imitation, which has been shown to be a predominant adaptation scheme employed by humans [36]. Also, we consider that strategies are binary – Cooperate and Defect – opting to focus our attention on the complexity provided by 1) heterogeneous populations, 2) the redistribution mechanism and 3) the self-organizing process of agents adapting over time. The role of larger strategy spaces (such as in [29, 34, 41]) lies outside the scope of the present work. With redistribution, we show that cooperation emerges in a parameter region where previously it was absent. Moreover, we show that the optimal choice of redistribution groups consists in picking the nearest neighbors (local redistribution). This result fits with a local and polycentric view of incentive mechanisms [27, 46] in MAS, which may not only be easier to implement but, as we show, also establishes an optimal scale of interaction in terms of eliciting cooperation.


2 RELATED WORK

The problem of cooperation is a broad and intrinsically multidisciplinary topic, which has been part of the MAS research agenda for a long time [10, 19]. In the realm of evolutionary biology, several mechanisms were proposed to explain the evolution of cooperation [22]. Kin selection [13], direct reciprocity [45], indirect reciprocity [25, 42] and network reciprocity [26, 38] constitute some of the most important mechanisms proposed. Remarkably, these mechanisms have been applied in AI in order to design MAS in which cooperation emerges. For example, Waibel et al. associated kin selection with evolutionary robotics [47]; Griffiths employed indirect reciprocity to promote cooperation in p2p networks, while Ho et al. investigated the social norms that, through a system of reputations and indirect reciprocity, promote cooperation in crowdsourcing markets [12, 16]. Similarly, Peleteiro et al. combined indirect reciprocity with complex networks to design a MAS where, again, cooperation is able to emerge [29]. On top of that, Han applied EGT – as we do in our study – in order to investigate the role of punishment and commitments in multiagent cooperation, both in pairwise [14] and group interactions [15]. Regarding alternative agent-oriented approaches to sustain cooperation in MAS, we shall underline the role of electronic institutions [4, 8], whereby agents' actions are explicitly constrained so that desirable collective behaviors can be engineered. The role of population structure and network reciprocity is, in this context, a prolific area of research. In [31] it was shown that complex networks are able to fundamentally change the dilemma at stake, depending on the particular topology considered [18, 31]; Ranjbar-Sahraei et al. applied tools from control theory to study the role of complex networks in the evolution of cooperation [34]. Importantly, dynamic networks – in which agents are able to rewire their links – were also shown to significantly improve the levels of cooperation, especially in networks with a high average degree of connectivity [32, 39]. A survey on complex networks and the emergence of cooperation in MAS can be found in [17]. Previous works found that cooperation in structured populations substantially decreases when the temptation to defect increases (see Model for a proper definition of Temptation).


Thereby, here we contribute an additional mechanism of cooperation on structured populations. We consider a mechanism of redistribution, inspired by the wealth redistribution mechanisms that prevail in modern economic/political systems, mainly through taxation. We are particularly interested in understanding how to sample redistribution groups in an effective way. In this context, we shall underline the works of Salazar et al. and Burguillo-Rial, in which a system of taxes and coalitions was shown to promote cooperation on complex networks [37] and regular grids [5]. While [37] and [5] do an excellent job showing how coalitions – led by a single agent – emerge, here we consider a simpler, decentralized model (e.g., no leaders are considered and taxes are redistributed rather than centralized in a single entity) and focus our analysis on showing that local redistribution sets are optimal. Our approach does not require additional means of reciprocity, memory, leadership, punishment or knowledge about features of the network. We cover a wide range of dilemma strengths and explicitly show when local redistribution promotes cooperation by itself. Notwithstanding, the analyses performed in [37] and [5] surely provide important insights to address in future work, namely on how to explicitly model the adherence to Beneficiary Sets and guarantee their stability. Also, while here we assume an egalitarian redistribution over each individual in the Beneficiary Set, we note that different redistribution heuristics may imply different levels of allocation fairness [33]. In this context, a recent work introduces the concept of Distributed Distributive Justice [20] and shows that local interactions may provide a reliable basis to build trust and reputation between agents, which can be used to regulate, in a decentralized way, the levels of justice in agents' actions. In this way, it is reassuring to note that local interactions not only constitute an optimal scale to form cooperative Beneficiary Sets (as we show below), but also provide a convenient interaction environment in which justice in contributions can be sustained.


3 MODEL

3.1 Three Stage Redistribution Game

Here we propose a sequential game dynamics composed of three stages. Focusing on an arbitrary agent i, these stages can be described as follows:

(1) Agent i participates in a one-shot game (here a Prisoner's Dilemma) with each of his/her neighbors j. From each interaction he/she obtains a payoff π_{i,j}. After all interactions, agent i accumulates a total payoff Π_i = Σ_j π_{i,j}.
(2) Next, agent i contributes a fraction α of his/her payoff surplus (Π_i − θ) to be redistributed. The group that benefits from agent i's contribution is called the Beneficiary Set of i (B_i).
(3) Finally, agent i receives his/her share from each Beneficiary Set that he/she is part of.

We refer to α as the level of taxation, as it defines the fraction of the surplus that agents contribute, while θ is the threshold level of payoff that defines the surplus. By definition, agents with negative payoff cannot contribute (i.e., θ > 0); they might, however, receive benefits from the Beneficiary Sets. Each agent i contributes to exactly one Beneficiary Set B_i, of which he/she cannot be a member; that is, agents do not receive from the Beneficiary Set they contribute to. A central question of this work is how to select B_i for each agent i.
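For concreteness, the following Python sketch (ours, not the authors' implementation) traces the three stages for a single agent in one round; the data structures `neighbors`, `strategy` and `B`, as well as the function names, are illustrative assumptions.

```python
def pd_payoff(s_i, s_j, T):
    """One-shot Prisoner's Dilemma payoff for agent i (R = 1, P = 0, S = 1 - T)."""
    if s_i == "C":
        return 1.0 if s_j == "C" else 1.0 - T
    return T if s_j == "C" else 0.0

def round_for_agent(i, neighbors, strategy, B, T, alpha, theta):
    # Stage 1: play the one-shot game with every neighbor and accumulate the payoff.
    Pi = sum(pd_payoff(strategy[i], strategy[j], T) for j in neighbors[i])
    # Stage 2: contribute a fraction alpha of the surplus (only agents above theta pay).
    contribution = alpha * (Pi - theta) if Pi > theta else 0.0
    # Stage 3, seen from the receiving side: i's contribution is split equally
    # among the members of its Beneficiary Set B[i] (of which i is not a member).
    share_per_member = contribution / len(B[i]) if B[i] else 0.0
    return Pi, contribution, share_per_member
```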


Figure 1: Solutions for the two-person game with wealth redistribution. Each curve indicates the critical taxation level (α∗) above which the nature of the social dilemma changes, for different payoff thresholds (θ) and as a function of the Temptation parameter (T).


As we show, this decision has a profound and non-trivial impact on the overall cooperation levels in the system.


3.2 The Prisoner's Dilemma Game

In general, all the possible outcomes of a two-strategy two-player game, in which two agents engage in a one-shot interaction that requires them to decide – independently and simultaneously – whether they wish to Cooperate (C) or to Defect (D), can be summarized in a payoff matrix, such as

        C   D
   C    R   S
   D    T   P

which reads as the payoff obtained by playing the row strategy against an opponent playing the column strategy. Here, R represents the Reward payoff for mutual cooperation and P the Punishment for mutual defection. When one of the individuals Defects and the other Cooperates, the first receives the Temptation payoff (T) while the second obtains the Sucker's payoff (S). In this manuscript we consider that agents interact according to the Prisoner's Dilemma (PD). Agents are said to face a PD whenever the relationship between the payoffs is such that T > R > P > S [44]. In such a scenario, rational agents seeking to optimize their self-returns are expected to always Defect. However, since the best aggregate outcome would have both players cooperating (2R > 2P), agents are said to face a social dilemma: optimizing self-returns clashes with optimizing the social outcome. In this sense, mutual cooperation is Pareto optimal and increases both the average payoff (over mutual defection) and egalitarian social welfare (over unilateral cooperation) [6]. It is worth mentioning that other situations – with different optimal rational responses – arise when the parameters obey a different relationship [21]: the Stag Hunt game when R > T > P > S; the Snowdrift Game when T > R > S > P; the Harmony Game when R > T > S > P; or the Deadlock Game when T > P > R > S, to name a few. Notwithstanding, the PD is by far the most popular metaphor for social dilemmas [44] and the one that poses the biggest challenge for cooperation to emerge.


Figure 2: Graphical depiction of the specific structures used in this work. a) Homogeneous Networks correspond to a structure in which all nodes have the same degree. b) Heterogeneous Networks are characterized by a high variance among the degrees of nodes. The color of each node indicates its degree: blue tones represent lower degree and red tones higher degree. Panels c) and d) show, respectively, the degree distributions of the Homogeneous and Heterogeneous networks under analysis. In particular, we use scale-free networks as representatives of heterogeneous structures; these have a degree distribution that decays as a power law.


For these reasons, the PD shall be the main focus of study in this manuscript. We further simplify the parameter space by considering R = 1, P = 0, S = 1 − T and 1 < T ≤ 2, so that the game is fully determined by the Temptation value (T). In that sense, a higher temptation creates more stringent conditions for the emergence of cooperation.
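As a minimal illustration of this one-parameter family of games (the function name is ours, not the authors'), the payoff matrix can be generated from T alone:

```python
def payoff_matrix(T):
    """2x2 payoff matrix of the single-parameter PD used here, indexed by (row, column) strategies."""
    R, P, S = 1.0, 0.0, 1.0 - T
    return {("C", "C"): R, ("C", "D"): S,
            ("D", "C"): T, ("D", "D"): P}

# Example: T = 1.5 gives R = 1, S = -0.5, T = 1.5, P = 0, i.e. T > R > P > S.
M = payoff_matrix(1.5)
```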

3.3 Prisoner's Dilemma with Wealth Redistribution

As an introductory example, let us start by analyzing the particular case of two interacting agents (i and j) in a one-shot event. In this case, the Beneficiary Sets of the two agents (B_i and B_j) are each composed only of the opponent. Wealth/payoff redistribution can thus be analyzed by considering a slightly modified payoff matrix that takes into account the second and third stages. The resulting payoff matrix becomes

        C                    D
   C    1                    1 − T + α(T − θ)
   D    T − α(T − θ)         0

where θ is the payoff threshold and α is the level of taxation. The rationale behind this payoff structure is the following: whenever both players act in the same way, the payoffs remain unchanged, as their contributions (from taxes) and benefits (from receiving the contributions of their opponent) cancel out.


Figure 3: Level of Cooperation on Homogeneous Random Networks (a) and Heterogeneous (Scale-free) Networks (b). Each plot shows the level of cooperation under a different combination of taxation level, α, and Temptation, T. In all cases the payoff threshold is fixed at θ = R = 1.0. Blue indicates regions where Cooperation dominates; Red delimits regions dominated by Defectors. Top bars above each panel indicate the level of cooperation in the absence of wealth redistribution, as a function of the Temptation payoff parameter. The level of cooperation is computed by estimating the expected fraction of cooperators when the population reaches a stationary state. To that end we run 10^4 independent simulations that start with 50% cooperators and 50% defectors. Population size of Z = 10^3 and intensity of selection β = 1.0.


A Defector playing against a Cooperator sees his/her payoff of T reduced by an amount α(T − θ) while receiving no benefit, since the Cooperator has a negative payoff and does not contribute. Likewise, the Cooperator is exempt from contributing but receives an additional contribution of α(T − θ), which corresponds to the amount taxed from the Defector. To inspect whether wealth redistribution changes the nature of the social dilemma (i.e., turns the Prisoner's Dilemma into another type of game), we have to check whether the ordering between the payoffs R and T, or between P and S, changes.

Figure 4: Level of cooperation on Heterogeneous (a) and Homogeneous (b) populations for different values of the payoff threshold (θ) as a function of the Temptation payoff parameter (T). The gray dashed line shows the results obtained in the absence of a wealth redistribution scheme. Population size of Z = 10^3 and intensity of selection β = 1.0.

This amounts to solving a single inequality,

T − α(T − θ) < 1,     (1)

which results in the critical values of α,

α∗ > (T − 1)/(T − θ).     (2)

Hence, depending on the choice of θ and for a given T, α∗ is the minimum level of taxation required to observe a change in the nature of the game faced by agents. It is straightforward to notice that the nature of the game changes from a Prisoner's Dilemma to a Harmony Game as the relationship moves from T > R > P > S to R > T > S > P. Figure 1 shows α∗ for different values of T and θ. Clearly, in well-mixed populations and under the simple scenario of a MAS composed of two agents, the redistribution mechanism has the simple effect of reshaping the payoff matrix, trivially changing the nature of the dilemma. Such a trivial conclusion cannot be drawn with large populations playing on networks, where we will show that different ways of assigning the Beneficiary Sets have a profound impact on the ensuing levels of cooperation.
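The two-player analysis reduces to evaluating Eqs. (1)-(2); a small sketch of our own (with illustrative names) that computes α∗ and the effective off-diagonal payoffs after redistribution:

```python
def alpha_star(T, theta):
    """Critical taxation level of Eq. (2); requires 1 < T and theta < T."""
    return (T - 1.0) / (T - theta)

def redistributed_payoffs(T, theta, alpha):
    """Effective Temptation and Sucker's payoffs of the two-player game of Section 3.3."""
    Tp = T - alpha * (T - theta)          # what the Defector keeps after being taxed
    Sp = 1.0 - T + alpha * (T - theta)    # what the Cooperator gets after receiving the share
    return Tp, Sp

# Example: T = 1.5 and theta = 0 give alpha_star = 1/3; for alpha = 0.5 > 1/3 the
# effective payoffs satisfy R > T' and S' > P, i.e. a Harmony-like game.
```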


3.4 Structured Populations

Let us consider a population of Z agents in which agents correspond to the nodes/vertices of a complex network, while links dictate who interacts with whom. The structure reflects the existence of constraints that limit interactions between agents; these constraints can arise, for example, from spatial or communication limitations. The number of interactions that each agent i participates in defines his/her degree z_i. The degree distribution, D(z), describes the fraction of agents with degree z. In this work we consider two structures: Homogeneous Random Graphs [40, 41] and Scale-Free Barabási Networks [3]. Homogeneous Random Graphs are generated by successively randomizing the ends of pairs of links of an initially regular graph (e.g., a lattice or ring). The resulting structure has a random interaction pattern, but all nodes in the network have the same degree. Figure 2a) depicts an example of such a structure and Figure 2c) the corresponding degree distribution. Scale-free networks are generated by an algorithm of growth and preferential attachment [3]: 1) start from three fully connected nodes; 2) add, sequentially, each of the Z − m remaining nodes; 3) each time a new node is added, connect it to m pre-existing nodes, selecting preferentially nodes with higher degree. Here we use m = 3. The resulting network is characterized by a heterogeneous degree distribution (one that decays as a power law), in which the majority of the nodes have few connections while a few have many. Figure 2b) shows an example of such a structure and Figure 2d) the corresponding degree distribution. In the following we explore networks with Z = 10^3 nodes and average degree ⟨z⟩ = Σ_z z D(z) = 4. During the simulations we make use of 20 independently generated networks of each type.
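A possible way to generate both structures with networkx (an assumption on our part; the authors do not specify their tooling) is sketched below.

```python
import networkx as nx

Z = 1000  # population size used in the text (Z = 10^3)

# Homogeneous random graph: random wiring, every node with the same degree (<z> = 4).
homogeneous = nx.random_regular_graph(d=4, n=Z, seed=42)

# Heterogeneous (scale-free) network via growth and preferential attachment.
# The text reports m = 3 and <z> = 4; note that barabasi_albert_graph with m = 3
# yields an average degree close to 2m, so treat m here as an adjustable assumption.
heterogeneous = nx.barabasi_albert_graph(n=Z, m=3, seed=42)

avg_degree = sum(z for _, z in heterogeneous.degree()) / Z
```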

3.5 Games on Networks

We study the expected level of cooperation attained by the population, a quantity we estimate through computer simulations. The level of cooperation corresponds to the expected fraction of cooperators in a population that evolved for 2.5 × 10^6 iterations. We estimate this quantity by averaging the fraction of cooperators observed at the end of each simulation over 10^4 independent simulations. Each simulation starts from a population with an equal number of Cooperators and Defectors, randomly placed on the nodes of the network. In between update rounds, each agent i plays once with all of his/her z_i nearest neighbors (i.e., the agents he/she is directly connected with). The payoff accumulated by agent i over all the interactions he/she participates in can be computed as

Π_i = n_i^C T − σ_i (T − 1)(n_i^C + n_i^D),     (3)

where n_i^D (n_i^C) is the number of neighbors of i that Defect (Cooperate) and σ_i is equal to 1 if i is a Cooperator and 0 otherwise. From the accumulated payoff, agents contribute to a pool a fraction α of the surplus Π_i − θ. The fitness f_i of agent i results from subtracting his/her contribution from the accumulated payoff and adding the share he/she obtains from each of the Beneficiary Sets he/she participates in.

Figure 5: Comparison between the effects of assigning the nearest neighbors of an agent i to the corresponding Beneficiary Set B_i (dark blue line) and of assigning agents at random to B_i (light blue line), on the level of cooperation as a function of the Temptation payoff parameter, T. Panel a) shows the results on Heterogeneous populations and panel b) the impact on Homogeneous populations. Population size of Z = 10^3 and intensity of selection β = 1.0.

We shall underline that, while T is the same for all agents (that is, the dilemma is the same for everyone in the population), heterogeneous populations introduce an additional layer of complexity, as different agents may differ in the maximum accumulated payoff they are able to earn. The fitness of agent i can be formalized as

f_i = (1 − α)(Π_i − θ) + Σ_{j=1}^{Z} δ_{i,j} α (Π_j − θ) / |B_j|,     (4)

where δ_{i,j} is equal to one if i is part of the Beneficiary Set towards which j contributes and zero otherwise, while |B_j| denotes the size of the set B_j. The frequency of strategies adopted in the population evolves through a process of imitation or social learning. At each iteration a random agent, say i, compares his/her fitness with the fitness of a neighbor, say j. Depending on the fitness difference, i adopts the strategy of j with probability

p = 1 / (1 + e^{−β(f_j − f_i)}).     (5)

The meaning of this sigmoid function can be understood as follows: if j is performing much better than i, then i updates his/her strategy, adopting the strategy of j; conversely, if j is performing much worse, i does not update the strategy. The parameter β, often called the intensity of selection and akin to a learning rate, dictates how sharp the transition between these two regimes is as f_j − f_i approaches zero. A large β means that individuals act in a more deterministic way, updating strategies at the smallest fitness difference; a small β means that individuals are prone to make imitation mistakes.
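To make these dynamics concrete, here is a minimal Python sketch (ours, not the authors' code) of one asynchronous update: Eq. (3) for the accumulated payoff, a redistribution step in which only positive surplus is taxed (one possible reading of Eq. (4), consistent with Section 3.1), and the imitation rule of Eq. (5). The graph `G` is assumed to be a networkx graph, `strategy` maps nodes to 'C'/'D', and `B` maps each node to its Beneficiary Set.

```python
import math
import random

def accumulated_payoff(i, G, strategy, T):
    # Eq. (3): defectors collect T from each cooperating neighbor;
    # cooperators additionally bear the cost of cooperating with everyone.
    nC = sum(1 for j in G[i] if strategy[j] == "C")
    nD = len(G[i]) - nC
    sigma = 1 if strategy[i] == "C" else 0
    return nC * T - sigma * (T - 1.0) * (nC + nD)

def fitness(G, strategy, B, T, alpha, theta):
    Pi = {i: accumulated_payoff(i, G, strategy, T) for i in G}
    surplus = {i: max(Pi[i] - theta, 0.0) for i in G}       # only surplus is taxed
    f = {i: Pi[i] - alpha * surplus[i] for i in G}          # pay the contribution
    for j in G:                                             # receive equal shares (Eq. 4)
        if B[j]:
            share = alpha * surplus[j] / len(B[j])
            for i in B[j]:
                f[i] += share
    return f

def imitation_step(G, strategy, B, T, alpha, theta, beta=1.0):
    # Eq. (5): pairwise comparison (Fermi) rule with intensity of selection beta.
    f = fitness(G, strategy, B, T, alpha, theta)
    i = random.choice(list(G))
    j = random.choice(list(G[i]))
    if random.random() < 1.0 / (1.0 + math.exp(-beta * (f[j] - f[i]))):
        strategy[i] = strategy[j]
```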

4 RESULTS

4.1 Wealth Redistribution and the Level of Cooperation in Structured Populations

In this section we start by analyzing the scenario in which the Beneficiary Set of each agent i corresponds to his/her nearest neighbors; hence, the size of the Beneficiary Set of i is |B_i| = z_i. These are also the agents with whom he/she interacts and from whom he/she obtains a payoff. Figure 3 shows the achieved levels of cooperation when the payoff threshold is set to θ = R = 1.0, as a function of the Temptation payoff (T) and the level of taxation (α). Figure 3a shows the results on Homogeneous networks and Figure 3b on Heterogeneous ones. We find that, for a fixed payoff threshold (θ), increasing the level of taxation increases the level of cooperation. This effect diminishes as the Temptation (T) increases: when T increases, the minimum value of α necessary to promote cooperation increases as well. The same behavior is observed in both structures. However, cooperation reaches a larger degree on Heterogeneous networks, where for any given Temptation there is always a level of taxation that guarantees 100% cooperation. Hence, for cooperation to be evolutionarily viable on Homogeneous networks, more stringent conditions are necessary, e.g., higher tax levels. Figure 4 shows how the level of cooperation depends on variations of the payoff threshold (0 ≤ θ ≤ 2.0, in intervals of 0.4) while keeping a fixed level of taxation (α = 0.5), under different levels of the Temptation payoff (T). Figure 4a shows the results obtained for Heterogeneous networks and Figure 4b the results on Homogeneous structures. For a constant level of taxation α, decreasing the payoff threshold θ increases the range of Temptation T under which cooperation can possibly evolve. This is the case in both types of structures; however, once again, the effect is more limited in Homogeneous populations. Both Figures 3 and 4 highlight the positive impact of a local wealth redistribution mechanism on the enhancement of cooperation. They also make evident that the success of such a mechanism depends on the volume of payoff that is redistributed, which can be increased either by raising the level of taxation, α, or by decreasing the payoff threshold, θ, that defines the taxable payoff.

4.2 Randomized Beneficiary Set

Next we explore to what extent the results obtained depend on the way agents are assigned to each Beneficiary Set. To that end, we compare two cases: i) nearest set assignment, in which the Beneficiary Set of each agent corresponds to his/her nearest neighbors, as above; and ii) random set assignment, in which agents are assigned at random to each Beneficiary Set. In both cases the number of agents assigned to each set is equal to the degree of the contributing agent, which guarantees that the payoff collected from each agent is distributed among the same number of individuals in both i) and ii). Figures 5a and 5b show the results obtained, respectively, on Heterogeneous and Homogeneous networks. We consider θ = 0.5, α = 0.9 and explore the domain 1.0 ≤ T ≤ 2.0. Dark blue curves show the results obtained under the nearest set assignment and light blue curves the results obtained under a random set assignment. The results show that the strength of the wealth redistribution mechanism lies in redistributing the taxed payoff among agents that are spatially related: a random assignment of agents drastically decreases the levels of cooperation obtained on both networks. But to what extent do the Beneficiary Sets need to be constrained spatially?
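The two assignments can be sketched as follows (illustrative networkx-based helpers, not the authors' implementation); in both, |B_i| equals the degree of i, so the taxed payoff is always split among the same number of agents.

```python
import random

def nearest_sets(G):
    """Beneficiary Set of i = the nearest neighbors of i."""
    return {i: list(G[i]) for i in G}

def random_sets(G, rng=random):
    """Beneficiary Set of i = randomly sampled agents, with |B_i| equal to i's degree."""
    nodes = list(G)
    B = {}
    for i in G:
        candidates = [j for j in nodes if j != i]   # i never benefits from its own set
        B[i] = rng.sample(candidates, k=G.degree(i))
    return B
```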


Figure 6: Panel a) compares how extending Beneficiary Sets, from the nearest neighbors (d = 1) to nodes at a distance of up to d = 4 links away, impacts the level of cooperation on Heterogeneous networks. Panel b) shows how extended Beneficiary Sets impact the level of cooperation on Homogeneous networks. In both cases, extending the set of beneficiaries has a negative impact on the levels of cooperation. Population size of Z = 10^3 and intensity of selection β = 1.0.

4.3 Extended Beneficiary Set

To answer the previous question, we explore the case in which all nodes up to a distance of d links are assigned to the Beneficiary Set of a focal agent i; for d = 1 the previous results are recovered. Figures 6a and 6b show the results up to d = 4 on Heterogeneous and Homogeneous networks, respectively. In both cases, we see that expanding the size of the Beneficiary Set leads to a decrease in the levels of cooperation. This result further reinforces the conclusion that wealth redistribution is only efficient when agents return, in the form of taxes, a share of their accumulated payoffs to the agents they have engaged with. We shall underline that here both the distance and the size of B_i play a role in the obtained results, whereas in the previous section the size of B_i was kept constant for each i across the different treatments, thus disentangling the effects of the size of B_i and of distance on the resulting cooperation levels.
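For illustration, an extended Beneficiary Set can be obtained from shortest-path distances (a networkx-based sketch under our assumptions, not the authors' code):

```python
import networkx as nx

def extended_sets(G, d):
    """Beneficiary Set of i = all nodes within d links of i, excluding i itself."""
    B = {}
    for i in G:
        within_d = nx.single_source_shortest_path_length(G, i, cutoff=d)
        B[i] = [j for j in within_d if j != i]
    return B

# extended_sets(G, 1) recovers the nearest-neighbor assignment used so far.
```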


Figure 7: Panel a) shows the fixation times (in generations) on Homogeneous networks. Panel b) shows the fixation times (in generations) on Heterogeneous networks. A generation corresponds to Z iteration steps, and the fixation time indicates the expected time the population takes to arrive at a state dominated by Cooperators or by Defectors when starting from a state with equal abundance of both strategies. Population size of Z = 10^3 and intensity of selection β = 1.0.


4.4 What is the cost of wealth redistribution?

Figures 7a and 7b show the fixation times of populations when θ = 1.0, over the domain bounded by 0.0 ≤ α ≤ 1.0 and 1.0 ≤ T ≤ 2.0. The fixation time corresponds to the expected number of generations (i.e., sets of Z potential imitation steps) for the population to reach a state in which only one strategy is present. These plots map directly onto Figures 3a and 3b, allowing a comparison of the relative fixation times of regions with high/low levels of cooperation. We observe that the evolution of cooperation is associated with an increase in the fixation times; in some situations this increase reaches an order of magnitude. The regions that exhibit the largest fixation times lie at the critical boundary that divides the areas of defector and cooperator dominance (Figure 3). Hence, promoting cooperation by redistributing wealth also requires a longer waiting time for the population to reach a state of full cooperation. However, setting taxation values higher than the bare minimum necessary for the emergence of cooperation allows populations to reach fixation more quickly.


4.5 Multiple Contribution Brackets

In the real world, taxes are unlikely to be defined by a single threshold (θ) separating agents who contribute from those who do not. In reality, taxes are progressive, in the sense that taxation levels (α) increase with increasing income (in our case, accumulated payoff). In this section we implement a similar approach and inspect the impact of increasing the number of taxation brackets. Let us consider that, instead of a single threshold, we have B taxation brackets divided by B − 1 threshold levels. For each bracket b ∈ {0, 1, 2, ..., B − 1, B} we define α_b as the effective tax and θ_b as the bottom threshold of bracket b. By definition, B = 0 corresponds to the case in which no taxes are collected and wealth redistribution is absent. B = 1 implies the existence of a single bracket in which all individuals contribute, a case that we do not explore in this manuscript. B = 2 corresponds to the case with two brackets, which is the scenario explored so far. We consider the case in which taxation increases linearly across brackets. Let us define θ_b = bθ/B. Individuals in bracket b < B have their payoff surplus taxed by α_b = bα/B when their accumulated payoff falls into θ_b < Π ≤ θ_{b+1}; for b = B the tax level is α_b = α and affects all individuals with Π > θ. As an example, for B = 3 each bracket would be characterized by the following tax levels:

(b = 0)  α_b = 0 for all individuals with Π ≤ θ/3;
(b = 1)  α_b = α/3 for all individuals with θ/3 < Π ≤ 2θ/3;
(b = 2)  α_b = 2α/3 for all individuals with 2θ/3 < Π ≤ θ;
(b = 3)  α_b = α for all individuals with Π > θ.

In this way, θ and α act as upper bounds and remain the only parameters of this scheme. We find that variations in the number of taxation brackets (B = 3, 4, 5) have only a marginal impact on the overall levels of cooperation when compared with the scenario studied so far (B = 2).
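A minimal sketch of this linear bracket scheme (our reading, consistent with the B = 3 example above; the helper name is hypothetical): it maps an accumulated payoff to its tax rate given the upper bounds θ and α.

```python
def bracket_tax_rate(Pi, theta, alpha, B=3):
    """Tax rate for accumulated payoff Pi under B linear brackets (theta_b = b*theta/B)."""
    for b in range(B):                    # brackets 0 .. B-1
        if Pi <= (b + 1) * theta / B:
            return b * alpha / B
    return alpha                          # top bracket: Pi > theta

# Example with B = 3: payoffs up to theta/3 pay nothing; payoffs above theta pay alpha.
```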


Figure 8: Relative wealth inequality after the redistribution step, in a heterogeneous population dominated by cooperators and for different combinations of taxation level (α) and payoff threshold (θ). We quantify the relative wealth inequality after the redistribution step as the ratio between the variance of the fitness distribution (Var_f, i.e., the variance of gains across the population after redistribution) and the variance of the accumulated payoff distribution (Var_p, i.e., the variance of gains before redistribution). Population size of Z = 10^3 and intensity of selection β = 1.0.


4.6 Wealth Inequality

Finally, we discuss the effect of wealth redistribution on fitness inequality. First, it is important to highlight that the observed levels of inequality depend, by default, on the distribution of strategies and on the network degree distribution. In homogeneous structures, if every agent adopts the same strategy – either all Defectors or all Cooperators – everyone obtains the same fitness. In heterogeneous structures, a cooperation-dominance scenario bounds the feasible equality levels, given the degree distribution of the population: some agents engage in more interactions than others, and Beneficiary Sets have different sizes, depending on the particular connectivity of agents. We shall focus on this scenario. We compare the variance of fitness (i.e., gains after the redistribution step) with the variance of accumulated payoff (i.e., gains before the redistribution step) in order to quantify the relative inequality after applying the proposed redistribution mechanism. In particular, we use the ratio between the variance of fitness and the variance of accumulated payoff as a metric of the resulting wealth inequality. Figure 8 shows how higher levels of θ and α reduce the resulting inequality: while increasing the payoff threshold limits taxation to the richer agents, increasing the level of taxation increases the flow of fitness from rich agents to their Beneficiary Sets. In the most extreme case – high θ and α – the variance of the fitness distribution is reduced to as little as 7% of the variance of the accumulated payoff distribution.
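The inequality metric of this section is simply a ratio of variances; a short sketch (assuming NumPy arrays of per-agent gains; names are ours):

```python
import numpy as np

def inequality_ratio(fitness_values, payoff_values):
    """Var_f / Var_p: values below 1 indicate that redistribution reduced inequality."""
    return np.var(fitness_values) / np.var(payoff_values)
```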

5 CONCLUSION

To sum up, we show that wealth redistribution embodies an effective mechanism that significantly helps cooperation to evolve. It works by fundamentally changing the nature of the dilemma at stake: by appropriately choosing the level of taxation (α) and the payoff threshold (θ), it is possible to shift from Defector-dominance to Cooperator-dominance dynamics. Moreover, we find that Heterogeneous populations allow us to ease the redistribution mechanism, that is, to impose lower taxation rates and/or lower taxable surplus values when compared with Homogeneously structured populations. Additionally, we show, for the first time, that different assignments of Beneficiary Sets significantly impact the ensuing levels of cooperation. Local Beneficiary Sets, where agents receive the contributions of their direct neighbors, constitute a judicious choice when compared with Beneficiary Sets formed by 1) agents randomly picked from the population or 2) agents at larger distances. Naturally, a local wealth redistribution scheme may not only prove optimal in terms of the achieved cooperation levels, but also be much simpler to implement, by removing the need for central redistribution entities and by minimizing the number of peers that agents need to interact with. We shall highlight, however, that promoting cooperation through a wealth redistribution mechanism entails longer fixation times, in terms of the number of iterations required to achieve overall cooperation.

Here we assume that the redistribution mechanism is externally imposed: agents are not able to opt out of the taxation scheme. Given that this mechanism increases the overall cooperation and the average payoff in the system, an argument for its acceptance by rational agents can be formulated based on the famous veil of ignorance proposed by John Rawls [35]: agents should decide the kind of society they would like to live in without knowing their social position. Agents would, this way, prefer a cooperative society where redistribution exists, provided that therein the average payoff is maximized. Notwithstanding, future research shall analyze the role of more complex strategies that give agents the opportunity to voluntarily engage (or not) in the proposed redistribution scheme. Alongside, effective mechanisms that discourage the second-order free-riding problem (i.e., free riding by not contributing to the redistribution pot while expecting others to do so) shall be examined. Future work shall also evaluate whether alternative taxation schemes are more efficient than the one proposed here. In all these cases, an evolutionary game theoretic framework – such as the one developed here – constitutes a promising toolkit.


6 ACKNOWLEDGMENTS

The authors acknowledge the useful discussions with Francisco C. Santos, Jorge M. Pacheco and Aamena Alshamsi. F.L.P. is thankful to the Media Lab Consortium for financial support. F.P.S. acknowledges the financial support of Fundação para a Ciência e Tecnologia (FCT) through PhD scholarship SFRH/BD/94736/2013, multi-annual funding of INESC-ID (UID/CEC/50021/2013) and grants PTDC/EEISII/5081/2014, PTDC/MAT/STA/3358/2014.


REFERENCES

[1] Stéphane Airiau, Sandip Sen, and Daniel Villatoro. 2014. Emergence of conventions through social learning. Autonomous Agents and Multi-Agent Systems 28, 5 (2014), 779–804.
[2] Ian F Akyildiz, Weilian Su, Yogesh Sankarasubramaniam, and Erdal Cayirci. 2002. Wireless sensor networks: a survey. Computer Networks 38, 4 (2002), 393–422.
[3] Réka Albert and Albert-László Barabási. 2002. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 1 (2002), 47.
[4] Josep Ll Arcos, Marc Esteva, Pablo Noriega, Juan A Rodríguez-Aguilar, and Carles Sierra. 2005. Engineering open environments with electronic institutions. Engineering Applications of Artificial Intelligence 18, 2 (2005), 191–204.
[5] Juan C Burguillo-Rial. 2009. A memetic framework for describing and simulating spatial prisoner's dilemma with coalition formation. In Proceedings of AAAI'09. AAAI Press, 441–448.
[6] Ulle Endriss and Nicolas Maudet. 2003. Welfare engineering in multiagent systems. In International Workshop on Engineering Societies in the Agents World. Springer, 93–106.
[7] Eithan Ephrati and Jeffrey S Rosenschein. 1996. Deriving consensus in multiagent systems. Artificial Intelligence 87, 1-2 (1996), 21–74.
[8] Marc Esteva, Bruno Rosell, Juan A Rodriguez-Aguilar, and Josep Ll Arcos. 2004. AMELI: An agent-based middleware for electronic institutions. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems. IEEE Computer Society, 236–243.
[9] Michal Feldman and John Chuang. 2005. Overcoming free-riding behavior in peer-to-peer systems. ACM SIGecom Exchanges 5, 4 (2005), 41–50.
[10] Michael R Genesereth, Matthew L Ginsberg, and Jeffrey S Rosenschein. 1986. Cooperation without communication. In Proceedings of AAAI'86. AAAI Press.
[11] Philippe Golle, Kevin Leyton-Brown, Ilya Mironov, and Mark Lillibridge. 2001. Incentives for sharing in peer-to-peer networks. In Electronic Commerce. Springer, 75–87.
[12] Nathan Griffiths. 2008. Tags and image scoring for robust cooperation. In Proceedings of the 2008 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 575–582.
[13] William D Hamilton. 1964. The genetical evolution of social behaviour. Journal of Theoretical Biology 7, 1 (1964), 17–52.
[14] TA Han. 2016. Emergence of Social Punishment and Cooperation through Prior Commitments. In Proceedings of AAAI'16. AAAI Press, 2494–2500.
[15] TA Han, Luís Moniz Pereira, Luis A Martinez-Vaquero, and Tom Lenaerts. 2017. Centralized vs. Personalized Commitments and their influence on Cooperation in Group Interactions. In Proceedings of AAAI'17. AAAI Press.
[16] Chien-Ju Ho, Yu Zhang, Jennifer Vaughan, and Mihaela Van Der Schaar. 2012. Towards social norm design for crowdsourcing markets. In AAAI'12 Technical Report WS-12-08. AAAI Press.
[17] Lisa-Maria Hofmann, Nilanjan Chakraborty, and Katia Sycara. 2011. The evolution of cooperation in self-interested agent societies: a critical study. In Proceedings of the 2011 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 685–692.
[18] Genki Ichinose, Yoshiki Satotani, and Hiroki Sayama. 2017. How mutation alters fitness of cooperation in networked evolutionary games. arXiv preprint arXiv:1706.03013 (2017).
[19] Nicholas R Jennings, Katia Sycara, and Michael Wooldridge. 1998. A roadmap of agent research and development. Autonomous Agents and Multi-Agent Systems 1, 1 (1998), 7–38.
[20] David Burth Kurka and Jeremy Pitt. 2016. Distributed distributive justice. In Self-Adaptive and Self-Organizing Systems (SASO), 2016 IEEE 10th International Conference on. IEEE, 80–89.
[21] Michael W Macy and Andreas Flache. 2002. Learning dynamics in social dilemmas. Proceedings of the National Academy of Sciences 99 (2002), 7229–7236.
[22] Martin A Nowak. 2006. Five rules for the evolution of cooperation. Science 314, 5805 (2006), 1560–1563.
[23] Martin A Nowak. 2012. Evolving cooperation. Journal of Theoretical Biology 299 (2012), 1–8.
[24] Martin A Nowak and Robert M May. 1992. Evolutionary games and spatial chaos. Nature 359, 6398 (1992), 826–829.
[25] Martin A Nowak and Karl Sigmund. 2005. Evolution of indirect reciprocity. Nature (2005).
[26] Hisashi Ohtsuki, Christoph Hauert, Erez Lieberman, and Martin A Nowak. 2006. A simple rule for the evolution of cooperation on graphs. Nature 441, 7092 (2006), 502.
[27] Elinor Ostrom. 2015. Governing the commons. Cambridge University Press.
[28] Liviu Panait and Sean Luke. 2005. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems 11, 3 (2005), 387–434.
[29] Ana Peleteiro, Juan C Burguillo, and Siang Yew Chong. 2014. Exploring indirect reciprocity in complex networks using coalitions and rewiring. In Proceedings of the 2014 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 669–676.
[30] Flávio L Pinheiro and Dominik Hartmann. 2017. Intermediate Levels of Network Heterogeneity Provide the Best Evolutionary Outcomes. Scientific Reports 7, 1 (2017), 15242.
[31] Flávio L Pinheiro, Jorge M Pacheco, and Francisco C Santos. 2012. From local to global dilemmas in social networks. PLoS ONE 7, 2 (2012), e32114.
[32] Flávio L Pinheiro, Francisco C Santos, and Jorge M Pacheco. 2016. Linking individual and collective behavior in adaptive social networks. Physical Review Letters 116, 12 (2016), 128702.
[33] Jeremy Pitt, Julia Schaumeier, Didac Busquets, and Sam Macbeth. 2012. Self-organising common-pool resource allocation and canons of distributive justice. In Self-Adaptive and Self-Organizing Systems (SASO), 2012 IEEE Sixth International Conference on. IEEE, 119–128.
[34] Bijan Ranjbar-Sahraei, Haitham Bou Ammar, Daan Bloembergen, Karl Tuyls, and Gerhard Weiss. 2014. Theory of cooperation in complex social networks. In Proceedings of AAAI'14. AAAI Press.
[35] John Rawls. 2009. A theory of justice. Harvard University Press.
[36] Luke Rendell, Robert Boyd, Daniel Cownden, Magnus Enquist, Kimmo Eriksson, Marc W Feldman, Laurel Fogarty, Stefano Ghirlanda, Timothy Lillicrap, and Kevin N Laland. 2010. Why copy others? Insights from the social learning strategies tournament. Science 328, 5975 (2010), 208–213.
[37] Norman Salazar, Juan A Rodriguez-Aguilar, Josep Ll Arcos, Ana Peleteiro, and Juan C Burguillo-Rial. 2011. Emerging cooperation on complex networks. In Proceedings of the 2011 International Conference on Autonomous Agents and Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 669–676.
[38] Francisco C Santos and Jorge M Pacheco. 2005. Scale-free networks provide a unifying framework for the emergence of cooperation. Physical Review Letters 95, 9 (2005), 098104.
[39] Francisco C Santos, Jorge M Pacheco, and Tom Lenaerts. 2006. Cooperation prevails when individuals adjust their social ties. PLoS Computational Biology 2, 10 (2006), e140.
[40] Francisco C Santos, JF Rodrigues, and Jorge M Pacheco. 2005. Epidemic spreading and cooperation dynamics on homogeneous small-world networks. Physical Review E 72, 5 (2005), 056128.
[41] Fernando P Santos, Jorge M Pacheco, Ana Paiva, and Francisco C Santos. 2017. Structural power and the evolution of collective fairness in social networks. PLoS ONE 12, 4 (2017), e0175687.
[42] Fernando P Santos, Jorge M Pacheco, and Francisco C Santos. 2018. Social norms of cooperation with costly reputation building. In AAAI'18. AAAI Press.
[43] Sven Seuken, Jie Tang, and David C Parkes. 2010. Accounting Mechanisms for Distributed Work Systems. In AAAI'10. AAAI Press.
[44] Karl Sigmund. 2010. The calculus of selfishness. Princeton University Press.
[45] Robert L Trivers. 1971. The evolution of reciprocal altruism. The Quarterly Review of Biology 46, 1 (1971), 35–57.
[46] Vítor V Vasconcelos, Francisco C Santos, and Jorge M Pacheco. 2015. Cooperation dynamics of polycentric climate governance. Mathematical Models and Methods in Applied Sciences 25, 13 (2015), 2503–2517.
[47] Markus Waibel, Dario Floreano, and Laurent Keller. 2011. A quantitative test of Hamilton's rule for the evolution of altruism. PLoS Biology 9, 5 (2011), e1000615.
