Cyberinfrastructure Requirements for Computational Social Science: Agent-Based Models at Scale

Santiago Núñez-Corrales and Les Gasser

Illinois Informatics Institute – UIUC; School of Information Sciences and National Center for Supercomputing Applications (NCSA) – UIUC

[email protected], [email protected]

Introduction

In the same sense in which telescopes, microscopes and particle accelerators are the tools of the trade in the natural sciences, modeling and simulation have become central in developing theory and practice in social science [1]. Computational social science (CSS) extends the reach of research in social science [2], and should facilitate the work of the practitioner in her particular discipline by providing at least three constructs: (a) an agile model development cycle that progresses from prototypes to fully fledged simulations, (b) integration with real data (archives or streams), and (c) the infrastructure to decouple models from their execution, thereby enhancing multidisciplinary collaboration.

Agent-based models (ABMs) have become essential for empirically exploring social hypotheses in silico and, more recently, part of standard practice in the field [3]. Communication, as defined by sets of constraints on the types of messages and their destinations, is a strong determinant both of the ability of an agent network to reach its goals and of the efficiency with which it does so. Communication patterns may also be abstracted as networks with causal structure, a case with analogues in biology, where understanding living systems depends on the availability and quality of measurements, the complexity of the system being measured and the recency of events [4]. In both cases, information allows finding invariants at higher levels of abstraction by determining how properties of information transformation and exchange evolve in time. Networks in this context are the main tool for untangling complex social phenomena.

Our aim is to design, implement and test novel software infrastructure serving social science scholars and practitioners by offering large-scale experiments, precise specification of models, and transparent resource integration. We hypothesize that the scaling properties of very large ABMs may be improved with better scaling strategies derived from a more general scaling and simulation theory.

Cyberinfrastructure-driven transformations in CSS

1. Moving from ontological debates to epistemic goals capable of finding adequate levels of explanation from observable facts and approximating models [5].
2. Moving from small worlds to large-scale experiments through appropriation of adequately designed cyberinfrastructure resources that allow models to be empirically embedded [6].
3. Moving from information by-products to information-centric representations that make use of high-level patterns, coupled with adequate experimental design and data analysis tools, ultimately leading to powerful insights that reveal the connection between information and circumstance [7].

Scale: beyond problem size

Intuitively, we interpret simulating at scale to mean executing a dynamic model whose size and complexity meet a frontier where research goals collide with practical feasibility, available resources, and outcome uncertainty/error. More formally, scales provide information about the relative states of reference systems where the same fundamental laws are expected to hold [8]. Scale is therefore a property of attributes in a delimited portion of an accessible world that rests on the concept of measurement. Given that social systems are complex dynamical systems [9], the definition of scale should be both generally applicable regardless of system and related to uncertainty and error. We summarize here our ongoing theoretical work towards providing precise definitions of scale by moving beyond the notion of problem size.

Definition (Measurement). Let the properties $A$, $A'$ of systems $S$, $S'$, respectively, be measurable sets. Let also $\sigma : A \to \mathbb{R}$, $\mu : A' \to \mathbb{R}$ and $f : A' \to A$ denote, respectively, the bi-monotonic reference and gauge functions for systems $S$ and $S'$ and the total mapping from reference to gauge systems. Let $\epsilon \in \mathbb{R}^{+}$ denote an arbitrarily small error. A measurement function is

$$M : \mathcal{P}(A \to \mathbb{R}) \times \mathcal{P}(A' \to \mathbb{R}) \times A' \to \mathbb{R} \times \{0, 1\}$$

such that, for particular values of $\sigma$, $\mu$ and $a'$,

$$M(\sigma, \mu, a') = \big(\sigma(f(a')),\ \delta(\sigma, \mu, a')\big)$$

where

$$\delta(\sigma, \mu, a') = \begin{cases} 1 & \text{if } |\sigma(f(a')) - \mu(a')| \le \epsilon \\ 0 & \text{otherwise.} \end{cases}$$
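A minimal Python sketch may help make the measurement definition concrete. It assumes that the mapping f and the tolerance ε are passed explicitly rather than fixed by context, and the unit-conversion example (centimetres versus metres) is purely hypothetical and not taken from the poster.

```python
from typing import Callable, Tuple


def measure(
    sigma: Callable[[float], float],  # reference function on A
    mu: Callable[[float], float],     # gauge function on A'
    f: Callable[[float], float],      # total mapping A' -> A
    a_prime: float,                   # element of A' being measured
    eps: float,                       # error tolerance (epsilon > 0)
) -> Tuple[float, int]:
    """Return (sigma(f(a')), delta) following the measurement definition."""
    reference_value = sigma(f(a_prime))
    # delta is 1 when reference and gauge readings agree within eps, else 0.
    delta = 1 if abs(reference_value - mu(a_prime)) <= eps else 0
    return reference_value, delta


# Hypothetical example: S' records lengths in centimetres, S in metres.
def f(cm: float) -> float:
    return cm / 100.0


def sigma(m: float) -> float:
    return m  # the reference reads metres directly


def mu(cm: float) -> float:
    return round(cm) / 100.0  # the gauge only resolves whole centimetres


result_loose = measure(sigma, mu, f, 150.2, eps=0.005)
result_strict = measure(sigma, mu, f, 150.2, eps=0.001)
```

With the looser tolerance the gauge reading counts as a valid measurement (delta is 1); with the stricter one it does not (delta is 0), although the reference value is the same in both cases.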

Definition (Scale). Let a system $S$ have problem size $\Omega$ and problem structure $\rho$. The scale $\Lambda$ of $S$ is the ratio

$$\Lambda(\Omega, \rho) = U(\Omega, \rho) \cdot \frac{V(\Omega, \rho)}{C(\Omega, \rho)}$$

for utility, volume and cost functions $U$, $V$ and $C$ for a particular response variable. $\Lambda$ is defined in multiples of its fundamental unit of inverse system effort, $V(\Omega, \rho) / C(\Omega, \rho)$.
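As an illustration only, the scale definition can be evaluated numerically once concrete U, V and C are chosen. The sketch below reads the definition as Λ = U · V / C and uses hypothetical functions: constant utility, one unit of volume per agent, and cost counted as messages sent per step under two assumed topologies ("ring" and "all_to_all"). None of these choices come from the poster.

```python
from typing import Callable

ScaleTerm = Callable[[int, str], float]


def scale(utility: ScaleTerm, volume: ScaleTerm, cost: ScaleTerm,
          omega: int, rho: str) -> float:
    """Scale read as U * V / C for problem size omega and structure rho."""
    c = cost(omega, rho)
    if c <= 0:
        raise ValueError("cost must be positive")
    return utility(omega, rho) * volume(omega, rho) / c


# Hypothetical problem structures: messages each agent sends per step.
MESSAGES_PER_AGENT = {
    "ring": lambda n: 2,            # two neighbours
    "all_to_all": lambda n: n - 1,  # every other agent
}


def utility(omega: int, rho: str) -> float:
    return 1.0  # assumed constant utility


def volume(omega: int, rho: str) -> float:
    return float(omega)  # assumed: one agent update per agent per step


def cost(omega: int, rho: str) -> float:
    return float(omega * MESSAGES_PER_AGENT[rho](omega))  # total messages


ring_scale = scale(utility, volume, cost, 101, "ring")
dense_scale = scale(utility, volume, cost, 101, "all_to_all")
```

Under these assumptions the ring topology keeps a constant scale of 0.5 as Ω grows, while the all-to-all topology falls as 1/(Ω − 1), which shows how the same problem size can yield different scales under different problem structures.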

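Definition (Scalability). Let a system $S$ have positive problem size $\Omega$ and problem structure $\rho$, with gauge function $\Lambda(\Omega, \rho)$ monotonically decreasing in $\Omega$. $S$ is scalable if there exists $\Omega^{*}$ such that

$$\Omega^{*} > \Omega, \qquad \Lambda(\Omega^{*}, \rho) = 1, \qquad \Lambda(\Omega, \rho) > \Lambda(\Omega^{*}, \rho).$$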
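The scalability condition can be checked numerically for a given Λ. The sketch below is one possible approach, not the authors' method: it assumes Λ is continuous and monotonically decreasing in Ω, treats Ω as a real number, and searches by bisection up to a hypothetical upper bound `omega_max`. The two example scale functions are invented for illustration.

```python
from typing import Callable, Optional


def find_unit_scale_size(
    scale_fn: Callable[[float], float],  # Lambda(omega) for a fixed rho
    omega: float,                        # current (positive) problem size
    omega_max: float,                    # assumed search bound, omega_max > omega
    tol: float = 1e-6,
) -> Optional[float]:
    """Return Omega* in (omega, omega_max] with Lambda(Omega*) ~ 1, or None."""
    if omega <= 0 or omega_max <= omega:
        raise ValueError("need 0 < omega < omega_max")
    if scale_fn(omega) <= 1.0:
        return None  # violates Lambda(omega) > Lambda(Omega*) = 1
    if scale_fn(omega_max) > 1.0:
        return None  # Lambda does not fall to 1 within the search bound
    lo, hi = omega, omega_max
    # Invariant: scale_fn(lo) > 1 >= scale_fn(hi); bisection narrows the gap.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if scale_fn(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical scale functions for illustration.
def decaying(omega: float) -> float:
    return 1000.0 / omega  # reaches 1 at omega = 1000


def saturating(omega: float) -> float:
    return 2.0 + 1.0 / omega  # decreasing, but never falls below 2


omega_star = find_unit_scale_size(decaying, omega=10.0, omega_max=1e6)
no_solution = find_unit_scale_size(saturating, omega=10.0, omega_max=1e6)
```

In this reading, `decaying` describes a scalable system with Ω* close to 1000, while `saturating` yields `None`, meaning no such Ω* exists below the chosen bound.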

Architecture for CSS at Scale

Figure 1: Block view of proposed cyberinfrastructure for CSS research [10].

Our architectural design (Fig. 1) attempts to match well-known concerns [11] with the possibilities brought forth by current and future computing technologies and infrastructure. All proposed elements correspond to a set of requirements that must hold throughout the course of each simulation:

Contract specification. Experiments are abstract contracts that contain sufficient information for their evaluation in terms of falsification of theories [12]. A simulation in our setting is specified through a formal language that encodes descriptions of (1) agent communication topologies, (2) their rules, parameters and number, and (3) the abstract representation of the world model.

Contract-to-model. The contract is instantiated in a parameterizable, massively parallel multi-agent system (MAS) simulation toolkit that supports parallel execution in terms of communication topologies, agent actions and world representations. The toolkit is configurable and makes use of available parallel middleware through various implementation strategies.

Model-to-execution. The model instance is translated into executable statements, where (1) matching the execution specification to a particular infrastructure setting is solved by reduction to a goal satisfaction problem in AI and (2) the final execution is found through optimization in the goal satisfaction space. Initial scaling properties are estimated through a library that implements the formal framework.

Execution-to-outcome. After the model is executed, outcomes are classified in three categories: (1) profiling data in relation to communication topologies, (2) human-readable machine execution logs and (3) experimental results in structured form. In particular, standard scientific file formats are expected for later analysis and visualization.

Post-processing and data management. A data pipeline similar in spirit to the GStreamer media framework [13] or yt [14] allows composing data layers and applying analysis/visualization filters. Outcomes and their human-readable representation are subject to provenance and data curation procedures.

Conclusion

• Our early work suggests that scale is feature-rich, and spanned by two fundamental objects: problem size and interaction topology.
• Systems integration, human-computer interaction, and model-to-infrastructure matching may be critical for research effectiveness.

Current work and next steps

We are implementing an end-to-end infrastructure prototype. We will demonstrate its generality and power on science questions in several collective domains: population dynamics of knowledge diffusion [15]; how soil fertility relates to social structures of bacterial communication [16]; collective effects of small interference RNA therapies against neurodegenerative diseases [17]; and (provisionally) how organizations form in networks of cognitively sophisticated agents.
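To give a rough sense of how the contract specification and model-to-execution requirements of the architecture might fit together in such a prototype, the following Python sketch encodes a contract as a plain data structure and matches it to infrastructure by exhaustive search. Every name here (`Contract`, `Resource`, `plan`, the cluster figures) is hypothetical. The poster specifies a formal language and a reduction to goal satisfaction followed by optimization, neither of which is reproduced: brute-force enumeration of feasible allocations stands in for the goal satisfaction step, and picking the cheapest one stands in for the optimization step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Contract:
    """Hypothetical experiment contract with the three described parts."""
    topology: str                 # (1) agent communication topology
    rules: Tuple[str, ...]        # (2) agent rules ...
    parameters: Dict[str, float]  # ... their parameters ...
    n_agents: int                 # ... and their number
    world: str                    # (3) abstract world model


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of one infrastructure option."""
    name: str
    nodes: int               # nodes available
    agents_per_node: int     # assumed capacity of a single node
    cost_per_node_hour: float


def feasible(contract: Contract, resource: Resource, nodes: int) -> bool:
    """Goal test: the allocation exists and can hold every agent."""
    enough_capacity = nodes * resource.agents_per_node >= contract.n_agents
    return nodes <= resource.nodes and enough_capacity


def plan(contract: Contract,
         resources: List[Resource]) -> Optional[Tuple[str, int, float]]:
    """Return the cheapest feasible (resource name, nodes, hourly cost)."""
    candidates = [
        (r.cost_per_node_hour * n, r.name, n)
        for r in resources
        for n in range(1, r.nodes + 1)
        if feasible(contract, r, n)
    ]
    if not candidates:
        return None  # no infrastructure setting satisfies the contract
    hourly_cost, name, nodes = min(candidates)
    return name, nodes, hourly_cost


contract = Contract(
    topology="small_world",
    rules=("imitate_neighbour",),
    parameters={"rewiring_probability": 0.1},
    n_agents=1_000_000,
    world="grid_2d",
)
resources = [
    Resource("cluster_a", nodes=64, agents_per_node=20_000, cost_per_node_hour=1.0),
    Resource("cluster_b", nodes=16, agents_per_node=100_000, cost_per_node_hour=3.0),
]
best = plan(contract, resources)
```

Here `cluster_a` needs 50 nodes (50.0 per hour) and `cluster_b` needs 10 nodes (30.0 per hour), so the sketch selects `cluster_b`. A real matcher would also need to account for communication topology, which this example ignores.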

References

[1] Rosaria Conte, Nigel Gilbert, Giulia Bonelli, Claudio Cioffi-Revilla, Guillaume Deffuant, Janos Kertesz, Vittorio Loreto, Suzy Moat, J-P Nadal, Anxo Sanchez, et al. Manifesto of computational social science. The European Physical Journal Special Topics, 214(1):325–346, 2012.
[2] Bent Flyvbjerg. Making social science matter: Why social inquiry fails and how it can succeed again. Cambridge University Press, 2001.
[3] Nigel Gilbert and Pietro Terna. How to build and use agent-based models in social science. Mind & Society, 1(1):57–72, 2000.
[4] Steven Bernstein, Richard Ned Lebow, Janice Gross Stein, and Steven Weber. God gave physics the easy problems: adapting social science to an unpredictable world. European Journal of International Relations, 6(1):43–76, 2000.
[5] Jeroen Van Bouwel and Erik Weber. De-ontologizing the debate on social explanations: A pragmatic approach based on epistemic interests. Human Studies, 31(4):423–442, 2008.
[6] Riccardo Boero and Flaminio Squazzoni. Does empirical embeddedness matter? Methodological issues on agent-based models for analytical social science. Journal of Artificial Societies and Social Simulation, 8(4), 2005.
[7] Jon Barwise. Information and circumstance. Notre Dame Journal of Formal Logic, 27(3):324–338, 1986.
[8] Laurent Nottale. The theory of scale relativity. International Journal of Modern Physics A, 7(20):4899–4936, 1992.
[9] Alessandro Vespignani. Predicting the behavior of techno-social systems. Science, 325(5939):425–428, 2009.
[10] S. Núñez Corrales and L. Gasser. Simulation-oriented cyberinfrastructure for computational social science. In Computational Social Science Society of the Americas 2016, Santa Fe NM, United States, (paper accepted).
[11] Robert Axelrod. Advancing the art of simulation in the social sciences. In Simulating social phenomena, pages 21–40. Springer, 1997.
[12] David C Gooding. Experiment and the making of meaning: Human agency in scientific observation and experiment, volume 5. Springer Science & Business Media, 2012.
[13] G Sundari, T Bernatin, and Pratik Somani. H.264 encoder using GStreamer. In Circuit, Power and Computing Technologies (ICCPCT), 2015 International Conference on, pages 1–4. IEEE, 2015.
[14] Matthew J Turk, Britton D Smith, Jeffrey S Oishi, Stephen Skory, Samuel W Skillman, Tom Abel, and Michael L Norman. yt: A multi-code analysis toolkit for astrophysical simulation data. The Astrophysical Journal Supplement Series, 192(1):9, 2010.
[15] Samarth Swarup and Les Gasser. The iterated classification game: a new model of the cultural transmission of language. Adaptive Behavior, 17(3):213–235, 2009.
[16] Paul D Straight and Roberto Kolter. Interspecies chemical communication in bacterial development. Annual Review of Microbiology, 63:99–118, 2009.
[17] Daniel H Kim and John J Rossi. Strategies for silencing human disease using RNA interference. Nature Reviews Genetics, 8(3):173–184, 2007.

Acknowledgements

We gratefully acknowledge the support of the National Center for Supercomputing Applications’ NCSA Fellows program via the project “Simulating Social Systems at Scale” (L. Gasser) and of the Illinois Informatics Institute through the Informatics PhD program (S. Núñez-Corrales).