A Hybrid Method for Macro and Micro Simulation of Crowd Behavior

INSTITUTO TECNOLÓGICO Y DE ESTUDIOS SUPERIORES DE MONTERREY

A HYBRID METHOD FOR MACRO AND MICRO SIMULATION OF CROWD BEHAVIOR

THESIS PRESENTED TO OPT FOR THE DEGREE OF DOCTOR OF COMPUTER SCIENCE

BY

SERGIO RUIZ LOZA

Advisors:
DR. BENJAMÍN HERNÁNDEZ ARREGUÍN
DR. JOSÉ MARTÍN MOLINA ESPINOSA

Referees:
DR. MOISÉS ALENCASTRE MIRANDA
DR. JUANA JULIETA NOGUEZ MONROY
DR. LOURDES MUÑOZ GÓMEZ

Thesis committee:
DR. JUANA JULIETA NOGUEZ MONROY (President)
DR. LOURDES MUÑOZ GÓMEZ (Secretary)
DR. MOISÉS ALENCASTRE MIRANDA (Vocal)
DR. JOSÉ MARTÍN MOLINA ESPINOSA (Vocal)
DR. BENJAMÍN HERNÁNDEZ ARREGUÍN (Vocal)

Ejidos de Huipulco, Tlalpan. Distrito Federal, México, November 2014

Publications

1. Reducing Memory Requirements for Diverse Animated Crowds. Sergio Ruiz, Benjamín Hernández, Adriana Alvarado, Isaac Rudomín; MIG '13 Proceedings of Motion on Games, pp. 77-86. ACM, New York, NY, USA, 2013. ISBN: 978-1-4503-2546-2. http://dl.acm.org/citation.cfm?id=2522901

2. Procesos de Decisión de Markov y Microescenarios para Navegación y Evasión de Colisiones para Multitudes. Sergio Ruiz, Benjamín Hernández; in proceedings of COMIA 2014, 6o Congreso Mexicano de Inteligencia Artificial, Zumpango, Estado de México. Research in Computing Science, Issue 74, 2014, pp. 103-116. ISSN 1870-4069. http://www.rcs.cic.ipn.mx/2014_74/

3. Generating Large Varied Animated Crowds in the GPU. Isaac Rudomín, Benjamín Hernández, Oriam de Gyves, Leonel Toledo, Ivan Rivalcoba, Sergio Ruiz; Computación y Sistemas, Vol. 17, No. 3, 2013, pp. 365-380. ISSN 1405-5546. http://www.repositoriodigital.ipn.mx/handle/123456789/17229

4. Simulating and Visualizing Real-time Crowds on GPU Clusters. Benjamín Hernández, Hugo Pérez, Isaac Rudomín, Sergio Ruiz, Oriam de Gyves, Leonel Toledo; Computación y Sistemas, Special Issue: Topic Trends in Computing Research. To be published in December, 2014.

5. A GPU Implementation of Markov Decision Processes for Path Planning. Sergio Ruiz, Benjamín Hernández, Isaac Rudomín; March, 2014. Lecture conducted at ISUM 2014, Fifth International Supercomputer Conference in Mexico. http://isum2014.cicese.mx/english/index.php

To Mom and Dad. To Andrea, Ricardo, Regina and María José. Your happiness always inspires me to carry on. This thesis is dedicated to you.

“We swim through a sea of people —a social version of Middle World. We are evolved to second-guess the behavior of others by becoming brilliant, intuitive psychologists. Treating people as machines may be scientifically and philosophically accurate, but it’s a cumbersome waste of time if you want to guess what this person is going to do next. The economically useful way to model a person is to treat him as a purposeful, goal-seeking agent with pleasures and pains, desires and intentions, guilt, blame-worthiness. Personification and the imputing of intentional purpose is such a brilliantly successful way to model humans, it’s hardly surprising the same modeling software often seizes control when we’re trying to think about entities for which it’s not appropriate, like Basil Fawlty with his car or like millions of deluded people with the universe as a whole” Richard Dawkins

ACKNOWLEDGEMENTS

Foremost, I would like to express gratitude to my advisor, Dr. Benjamín Hernández, who has guided me honestly and steadily over these past five years, allowing me to decide on the research subjects to pursue while constantly supervising my progress. Dr. Hernández suggested the title "A Hybrid Method for Macro and Micro Simulation of Crowd Behavior" while reviewing my early work; for this and all your selfless contributions to make this work possible, I am grateful.

I also thank Dr. Isaac Rudomín for his support and ideas over these years. Thank you, Doctor, for your superb Image Synthesis class and for introducing me to the Cellular Automata concept. Your wisdom and insight are always an inspiration to me.

I would like to thank Dr. Martín Molina, among many reasons, for giving me the opportunity to pursue higher studies, always offering his valuable advice and help, for which I am grateful. I also thank Dr. Julieta Noguez for always keeping track of my progress, and for sharing her expert advice on the correct structure of this document. I also thank Dr. Moisés Alencastre for introducing me to the computer graphics field some years ago; his ability to provoke scientific curiosity led me to investigate further, and here we are. I also thank Dr. Lourdes Muñoz for her valuable suggestions during earlier thesis defenses; her advice is now a part of this work.

I am also grateful to Dr. Arturo Molina, for welcoming me as a beginner student and seeing some potential that motivated him to authorize the scholarship application that eventually led to this work. Dr. Molina said to me: "it will take a brilliant idea, and a lot of hard work to see this through. Are you willing?". I still am, Doctor. Thank you very much for granting me this opportunity.
This work would not have been possible without the financial aid provided by Instituto Tecnológico y de Estudios Superiores de Monterrey, Campus Ciudad de México (scholarship for tuition number A01106919), and Consejo Nacional de Ciencia y Tecnología (Beca Nacional CVU: 375247). Financial support for this project was also granted by the CONACyT Becas Mixtas 2013 - mzo2014 Movilidad en el Extranjero program.

ABSTRACT

Real-time crowd simulation is relevant for applications related to urban planning, evacuations, crowd control training and entertainment. Fundamental problems such as agent Navigation and Collision Avoidance must be solved both accurately and efficiently to provide interactive simulations. The purpose of this thesis is to define a parallel Hybrid Crowd Movement Algorithm (HCMA), able to exploit the massive parallelism of the Graphics Processing Unit (GPU) in order to simulate large numbers of agents, including a strategy to plan optimal Long-Range paths as well as Short-Range collision-free trajectories for simulated pedestrians:

• For Long-Range Path Planning (Macro simulation, or Navigation), we present a parallel (GPU-based) Markov Decision Process (MDP) Solver for guided crowds. This Navigation aspect of the HCMA determines optimal paths for groups of agents towards one or multiple goals by mapping a penalty-reward system to a static scenario with obstacles. The Navigation aspect of the HCMA also adapts to dynamic topology, providing real-time solutions to changing scenarios.

• For Short-Range collision-free trajectory generation (Micro simulation), we present a parallel (GPU-based) Local Collision Avoidance (LCA) system. This LCA aspect of the HCMA makes use of a fine-grained partition of the MDP state representation in order to push agents towards local virtual exits placed along the Long-Range paths. This aspect of the HCMA is a cell-based, gather-scatter process similar to that of a Cellular Automaton, in which a free cell queries its neighbors for candidates to occupy it, pulling at most one agent per cycle, thus avoiding collisions between agents.
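The gather-scatter rule above can be illustrated with a minimal serial sketch. This is a hypothetical toy version, not the GPU implementation developed in this thesis: the grid size, goal, and the greedy one-step policy standing in for the MDP-derived directions are all assumptions made for the example. Each free cell examines the agents around it and pulls at most one whose desired move points into that cell, so no two agents can ever occupy the same cell.

```python
import random

W, H = 8, 8          # grid dimensions (assumed for the example)
GOAL = (7, 7)        # single navigation goal (assumed)

def desired_cell(pos):
    """One step toward GOAL; a greedy stand-in for the MDP policy."""
    x, y = pos
    step = lambda a, b: a + (b > a) - (b < a)   # move one cell toward b
    return (step(x, GOAL[0]), step(y, GOAL[1]))

def gather_scatter_step(grid):
    """One synchronous update; `grid` maps occupied cell -> agent id."""
    pos = {agent: cell for cell, agent in grid.items()}
    new_grid = dict(grid)
    free_cells = [(x, y) for x in range(W) for y in range(H)
                  if (x, y) not in grid]
    for cell in free_cells:
        # Candidates: agents still at their original cell whose desired
        # (always adjacent) move is exactly this free cell.
        candidates = [a for a, p in pos.items()
                      if desired_cell(p) == cell and new_grid.get(p) == a]
        if candidates:
            winner = random.choice(candidates)  # pull exactly one agent
            del new_grid[pos[winner]]           # vacate its old cell
            new_grid[cell] = winner             # occupy the free cell
    return new_grid

random.seed(1)
grid = {(0, 0): "A", (1, 0): "B"}   # two agents heading to the same goal
for _ in range(20):
    grid = gather_scatter_step(grid)
```

Because a cell admits at most one agent per cycle, the two agents converge without ever colliding: one ends up on the goal cell and the other waits on an adjacent cell, still desiring the occupied goal.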
We extend the work in the doctoral thesis of Benjamín Hernández [1] with: a) a virtual motion capture system for characters animated with commercially available 3D modeling software; b) a texture-based smooth skinning system complementing the previously designed texture-based rig system for animated characters in virtual crowds, which reduces memory requirements; c) character generation software to load and stitch topologically compatible body parts. We also present the architecture for a configurable rendering engine to simulate varied animated virtual crowds, moving within a virtual scenario and responding to user interaction. An implementation of this rendering engine architecture simulates a crowd with these characteristics, controlled in real time by the parallel HCMA. The above techniques are applied to simulate crowds for casual walking, evacuations and video games. Sections of this research are also applied to the interactive visualization of characters on GPU clusters at the Barcelona Supercomputing Center.

CONTENTS

1 Introduction
  1.1 Motivation
  1.2 Hypothesis
  1.3 Objectives
  1.4 Contributions
  1.5 Thesis organization

2 Related Work
  2.1 Behavioral Traits
    2.1.1 Individual Pedestrian Behavioral Traits
    2.1.2 Crowd Group Behavioral Traits
  2.2 Navigation
    2.2.1 Social Navigation Methods
    2.2.2 Grid Partitioning Navigation Methods
    2.2.3 Formation Navigation Methods
    2.2.4 Navigable Areas Methods
  2.3 Collision Avoidance
    2.3.1 Machine Learning Collision Avoidance Methods
    2.3.2 Hierarchical Collision Avoidance Methods
    2.3.3 Geometric-based Collision Avoidance Methods
    2.3.4 Distributed Collision Avoidance Methods
    2.3.5 Virtual Perception-based Collision Avoidance Methods
  2.4 Agent Animation
    2.4.1 Rigging stage
    2.4.2 Skinning stage
    2.4.3 Key-framing stage
  2.5 Rendering

3 GPU Acceleration of Markov Decision Processes for Path Planning
  3.1 Problem Solving Strategy
    3.1.1 Markov Decision Process Modeling
    3.1.2 Agent Density
  3.2 Algorithm
    3.2.1 Parameters
    3.2.2 MDP solving algorithm
    3.2.3 Complexity
    3.2.4 Implementation
  3.3 Crowd Navigation
  3.4 Results
  3.5 Conclusion

4 Cellular Automata for Local Collision Avoidance
  4.1 MDP and Micro Scenarios applied to Local Collision Avoidance for Crowds
    4.1.1 Cubic Bézier Least-Square Fitting
    4.1.2 Micro Scenario Selection Algorithm
    4.1.3 Implementation and Results
    4.1.4 Conclusion and Limitations
  4.2 A Parallel Lattice-Boltzmann Solver for Guided Crowds
    4.2.1 The Boltzmann Equation
    4.2.2 Characteristics of LGCA and LBM
    4.2.3 Generic LBM Simulation Algorithm
    4.2.4 Lattice-Boltzmann Model Definition
    4.2.5 Implementation and Results
    4.2.6 Conclusion and Limitations
  4.3 Cellular Automata applied to Local Collision Avoidance for Simulated Crowds
    4.3.1 Local Navigation Map
    4.3.2 Cellular Automaton Characteristics
    4.3.3 Cellular Automaton Update Rule
    4.3.4 Conclusion and Future work
  4.4 Conclusion

5 Hybrid Crowd Movement Algorithm
  5.1 Office Scenario
    5.1.1 Simulation Results
  5.2 Town Scenario
    5.2.1 Simulation Results
  5.3 Eiffel Scenario
    5.3.1 Simulation Results
  5.4 Maze Scenario
    5.4.1 Simulation Results
  5.5 Campus Scenario
    5.5.1 Simulation Results
  5.6 Multi-layered Interactive Scenario
    5.6.1 Multi-layered scenario Simulation Results
    5.6.2 Multi-layered Interactive scenario Simulation Results
  5.7 Conclusion

6 Reducing memory requirements for diverse animated crowds
  6.1 Method
    6.1.1 Pre-processing
    6.1.2 Rig and Skinning Information Embedding and Animation Clip Exporting
    6.1.3 Generation of Diversity
    6.1.4 Animation
  6.2 Implementation
  6.3 Additional Results
  6.4 Conclusions and Future work

7 Collision Avoidance Simulator
  7.1 Third Party Components
    7.1.1 Audio
    7.1.2 Images
    7.1.3 Model Parsing
    7.1.4 Scripting
    7.1.5 Computer Graphics Mathematics
    7.1.6 Code Profiling
    7.1.7 Documentation
  7.2 Internal Components
  7.3 Character Generator

8 Conclusion
  8.1 Summary and Contributions
  8.2 Future Work
    8.2.1 Performance improvement
    8.2.2 New applications

LIST OF ALGORITHMS

1 Algorithm to alter paths based on micro scenarios
2 Generic LGCA algorithm
3 SPSC-LBM algorithm
4 Local Navigation Map Algorithm
5 Race condition solution
6 Queuing and Waiting solution

LIST OF FIGURES

1.1 World population
1.2 World population density
1.3 Crowd models
2.1 Virtual crowd movement process
2.2 Three levels of processing
2.3 Steering Behaviors
2.4 A crowd simulated with the Layered approach
2.5 Indicative Routes Method
2.6 Geometric interpretation for RVO
2.7 The set of ORCA permitted velocities
2.8 Vision-based Collision Avoidance model
2.9 An Agent
2.10 Character rig and mesh
2.11 Linear Blend Skinning
2.12 Dual Quaternion Skinning
2.13 Flowchart for "Frankenrigs"
2.14 Motion Variations
2.15 Camera view frustum
3.1 Basic MDP model on a hexagonal grid
3.2 Π∗ override based on cell density information
3.3 Hexagonal tile
3.4 Hexagonal tiling
3.5 Method pipeline overview
3.6 MDP iteration overview
3.7 States data collection example
3.8 Direction evaluation point
3.9 Office complex scenario
3.10 Performance and speedup
3.11 Crowd navigating in dynamic environments
3.12 Test scenario to compare A∗ and MDP
4.1 Collision Avoidance problem
4.2 Micro scenarios of radius 1 and their respective signatures
4.3 Reference scenario solution
4.4 Rendering of scenarios
4.5 Arrays for the micro scenario selection algorithm
4.6 Global MDP micro scenario algorithm performance
4.7 Simulated Lemmings and virtual humans
4.8 D2Q9 model
4.9 SPSC-LBM Implementation results
4.10 Implementation with solid sites simulating a gate
4.11 LCA principles
4.12 LCA partition
4.13 Agent translation
4.14 Basic Scatter-Gather algorithm
4.15 Race condition solution
4.16 Similar directions
4.17 Scatter-Gather step with steering
5.1 Parallel HCMA method
5.2 Rendered Office scenario model
5.3 Office Complex Simulation
5.4 HCMA performance for the Office scenario
5.5 HCMA performance breakdown for the Office scenario
5.6 Rendered Town scenario model
5.7 Town Simulation
5.8 Town scenario simulation with 4,096 agents
5.9 HCMA performance for the Town scenario
5.10 HCMA performance breakdown for the Town scenario
5.11 Rendered Eiffel scenario model
5.12 Eiffel Simulation
5.13 Eiffel scenario simulation with 4,096 agents
5.14 Maze Simulation
5.15 HCMA performance for different LCA per MDP ratios
5.16 Rendered Campus scenario model
5.17 Rewards and speeds for the Campus scenario
5.18 Campus Simulation
5.19 Multi-layered Simulation
5.20 Multi-layered interactive Simulation
6.1 Diverse animated crowds method overview
6.2 Basis meshes
6.3 Character UVs
6.4 Patterns and foldings
6.5 Rig and Skinning embedding
6.6 Generation of Diversity
6.7 The pattern texture
6.8 Animation transfer
6.9 General diagram of the run-time stage
6.10 Rendering results
6.11 GoD for an additional character
7.1 CASIM documentation
7.2 Collision Avoidance Simulator architecture
7.3 CASIM rendering features
7.4 CHARACTOY workspace and visual aids
7.5 Novel character generation process with CHARACTOY

LISTINGS

3.1 MDP Solver on GPU pseudo code overview
3.2 mdp_init_permutations_on_device(...)
3.3 mdp_iterate_on_device(...)
7.1 CASIM File Parsing Descriptor
7.2 CASIM Network Descriptor
7.3 CASIM Rendering Constants Descriptor

LIST OF TABLES

2.1 Rendering Optimization Techniques
3.1 Time to find Π∗ for the Office Complex Scenario
3.2 Time to find Π∗ for a virtual hexagonal environment
3.3 A∗ performance
3.4 MDP performance
3.5 Comparison with other path planning methods
4.1 SPSC-LBM GPGPU without solid sites
4.2 SPSC-LBM GPGPU with solid sites
4.3 SPSC-LBM GPGPU with solid lanes
4.4 SPSC-LBM GPGPU with solid sites and agents
5.1 Scenario metrics summary
5.2 Office scenario results for 16 agents @ 5 ms per frame
5.3 Office scenario results for 64 agents @ 14 ms per frame
5.4 Office scenario results for 256 agents @ 16 ms per frame
5.5 Town scenario results for 1,024 agents @ 13 ms per frame
5.6 Town scenario results for 4,096 agents @ 34 ms per frame
5.7 Eiffel scenario results for 1,024 agents @ 16 ms per frame
5.8 Eiffel scenario results for 4,096 agents @ 34 ms per frame
5.9 Maze scenario results for 4,096 agents @ 45 ms per frame
5.10 Maze simulation performance for different LCA-MDP cell ratios
5.11 Campus scenario results for 4,096 agents @ 40 ms per frame
5.12 Multi-layered scenario results for 256 agents @ 7 ms per frame
6.1 Reduction Summary
6.2 GPU memory load
1. INTRODUCTION

As human beings arguably [2] constitute the most complex species known (both physically and behaviorally), it follows that simulating a crowd represents a considerable challenge. Although the first attempts to achieve such simulation began in the mid nineties [2] (and the first crowd behavior analysis, conducted by Le Bon [3], dates from 1896), the high demand for computational resources produced by a real-time crowd simulation keeps pushing the design and implementation of improved algorithms to their limits, even as faster and increasingly capable hardware becomes available [4].

But aside from the technical challenge, what is the utility of a crowd simulation? Growing world population numbers (Figure 1.1), combined with a limited surface of habitable land (Figure 1.2), confirm the necessity to cope with increasingly crowded environments. Although observation and measurement of the crowd phenomenon are feasible, experimentation, and therefore the formulation of hypotheses, is not always so, for example in the case of large-scale emergency evacuations. The impossibility of safely applying the scientific method to certain subsets of crowd phenomena, added to the prohibitive logistical and economic costs of conducting experimental events, leads to the development of a virtual alternative: a simulation.

But before discussing what a crowded environment implies, or how a simulation may aid in the analysis of its implications, let us establish the different types of crowds that will be referenced throughout this work. These will serve as a point of comparison when a coarse definition for a crowd is required. As Le Bon [3] defined it, "In its ordinary sense the word 'crowd' means a gathering of individuals of whatever nationality, profession, or sex, and whatever be the chances that have brought them together". In order to refer consistently to distinct types of crowd models, we resort to the taxonomy proposed by Zhou et al. [5] (Figure 1.3). This classification system characterizes a crowd model along two dimensions:


Figure 1.1: World population. Source: Index Mundi. World - Population growth rate - Historical Data Graphs per Year. Retrieved November 1, 2014, from http://www.indexmundi.com/g/g.aspx?v=21&c=xx&l=en

Time scale
Usually from a few minutes up to several hours, the time scope may define a long-term crowd phenomenon (such as the diffusion of ideas within a crowd) or a short-term crowd phenomenon (such as the reaction of a crowd to an emergency).

Crowd size
By the number of individuals in the model, a crowd may be:
Huge-sized: hundreds of thousands.
Medium-sized: hundreds.
Small-sized: tens.

We proceed to enunciate some exemplary implications of a crowded environment, and the utility found in its corresponding simulation, justifying the existence and importance of the latter.

Disease propagation
Besides the obvious utility in sparing a crowd from hazard, the long-term phenomenon may be modified and analyzed several times, varying weather conditions on each iteration. A simulation is then useful to determine the relation between weather conditions and disease propagation within a medium-sized crowd.

Evacuations
Whether in an emergency scenario or a regular evacuation, a simulation is useful to determine optimal evacuation routes, as well as the optimal stations for evacuation authorities when dealing with a medium- to large-sized crowd evacuation, adjusting the time scale for the size of the building or enclosing space.

Spectator safety
Simulation of a crowded architectural plan for a stadium is useful to determine the maximum capacity at which the facility can safely be occupied.


Figure 1.2: World population density. Source: Index Mundi. World - Population density - Historical Data Graphs per Year. Retrieved November 1, 2014, from http://www.indexmundi.com/g/g.aspx?v=21000&c=xx&l=en

Further examples follow of tragic events related to crowds that may be prevented through the analysis of simulations.

October 2014

“Patna has within two years seen two incidents of stampede in which 17 people died during Chhath in 2012 and 33 people died after Ravanvadh on October 3 this year at Gandhi Maidan: Mass gathering is a common phenomenon in Patna. We need to improve our drills for mob management. Chhath is a priority even though I, being new, could not prepare anything in advance. But things like clean roads, street lights and adequate forces and magistrates have been planned. Watch towers will be located at strategic points. All the entry and exit routes to the ghats will be manned by police”1 .

April 2012

“A devastating gust of wind traveling at approximately 40 miles per hour (80 km/hr) or more, toppled a tent packed with approximately 150 people in St. Louis, Missouri, according to the St. Louis Post-Dispatch. One person was killed, five more people were seriously injured and at least one hundred others were treated on site”2 .

August 2011

“The Saturday Indiana State Fair stage collapse tragedy in Indianapolis, Indiana, that claimed five lives so far and injured scores of fairgoers, appears to be one of the worst, if not the worst, stage collapse disaster in U.S. history”3.

1 Cannot control destiny, but will leave no stone unturned: Patna DM - The Times of India. (n.d.). Retrieved November 2, 2014, from http://timesofindia.indiatimes.com/city/patna/Cannot-control-destiny-but-will-leave-no-stone-unturned-Patna-DM/articleshow/44902984.cms
2 Crowd Management Strategies - News and Views. (n.d.). Crowd Management Strategies - crowdsafe for rock, rap, rave, public safety and crowd control. Retrieved July 3, 2012, from http://www.crowdsafe.com/new.asp?ID=2136

Figure 1.3: Crowd models. Source: Zhou, S. et al. Crowd modeling and simulation technologies. In TOMACS, (Oct 2010), p. 5.

July 2010

“A popular German dance and techno festival, the 2010 German Love Parade was marred by tragedy when 21 people were killed and over 500 were injured due to crushing and trampling. Although the venue was capable of handling more than 250,000 people, over one million attendees were anticipated. Entrance to the space, a former freight station, was through a series of underpasses and tunnels. Despite the venue being at capacity and the tunnels closed, people continued to press into the tunnel, resulting in deaths. It has since been announced that, as a direct result of this incident, future Love Parade events have been canceled, and the festival is no more”4.

3 Crowd Management Strategies - News and Views. (n.d.). Crowd Management Strategies - crowdsafe for rock, rap, rave, public safety and crowd control. Retrieved July 3, 2012, from http://www.crowdsafe.com/new.asp?ID=2075
4 Cooper, R. (n.d.). Worst Tragedies at Live Concerts - Concerts that have Resulted in Deaths. Punk Music. Retrieved July 3, 2012, from http://punkmusic.about.com/od/liveperformances/tp/Death-At-A-Concert.htm

February 2008

“An oversold venue was to blame for the deaths of 10 fans and the injuries of six at a concert by Indonesian metalcore band Beside. Conflicting reports on numbers seem to confirm that the venue, meant to hold 700, was well over capacity, with perhaps as many as 1,500 people inside”5.

1.1 MOTIVATION

A key component in every crowd simulation is the synthesis of behavior: a crowd simulation will be accurate to the extent that it is able to emulate both individual and group human behavior faithfully. This assumption does not imply that two types of simulations are required; it merely describes human behavior when observed from different perspectives, as an accurate simulation would. Again we refer to the interpretation by Zhou et al. [5]:

Micro level

When experienced from a close distance, a crowd appears as a small-sized crowd regardless of its actual size, revealing fine entity traits that will be discussed in Section 2.1.1.

Macro level

When viewed as a whole, from far distance, a crowd presents coarse, fluid-like traits that will be addressed in Section 2.1.2.

According to Martin [6] a real-time system is one which “controls an environment by receiving data, processing them, and returning the results sufficiently quickly to affect the environment at that time”. In the computer graphics field, Akenine-Möller et al. [7] refer to “algorithms that create synthetic images fast enough that the viewer can interact with a virtual environment” when they discuss the subject of real-time rendering. To simulate a real-time virtual crowd, both interpretations of the term real-time apply, since a careful orchestration of scarce computational resources is required to, on the one hand, receive, process and return AI data while, on the other hand, running fast algorithms that create synthetic images so that users can interact with the simulation.

The need to simulate both the Micro and Macro levels of the crowd phenomena in an accurate and efficient manner motivates the work of this thesis. As the number of simulated agents in a crowd rises, the processing power required to evaluate their AI states and to produce rendering in real-time quickly exhausts a CPU-RAM architecture, making crowd simulation a challenging and complex problem. We are interested in demonstrating that, in order to simulate the largest number of agents possible, the implementation of a Hybrid Crowd Movement Algorithm (HCMA) will leverage both the CPU serial processing speed and the GPU parallel processing capabilities by making use of the following tools:

Shaders

A graphical display oriented program, able to exploit the programmable capabilities of the Graphics Processing Unit (GPU). Implemented with a shading language.

Teens Killed In Punk Concert Crush. (n.d.). Sky News - First For Breaking News From The UK And Around The World. Retrieved July 3, 2012, from http://news.sky.com/story/579856/ teens-killed-in-punk-concert-crush

GPGPU

General-purpose computations performed on the GPU, implemented with a high-level programming language such as CUDA C1. With GPGPU, certain mathematical operations, such as dot products and matrix multiplications, commonly computed serially on the CPU, may be prepared and processed massively in parallel when sent in bulk to the GPU.

HC

Heterogeneous Computing (HC) refers to the synergistic use of CPU-RAM and GPU architectures to program high-performance applications.
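As a toy illustration of the GPGPU idea (the thesis implementation uses CUDA C; here NumPy stands in for a GPU kernel, and the array names are hypothetical), many independent per-agent dot products collapse into a single bulk matrix-vector product, which is exactly the shape of work a GPU processes massively in parallel:

```python
import numpy as np

rng = np.random.default_rng(0)
agents = rng.random((1000, 3))   # hypothetical per-agent state vectors
weights = rng.random(3)          # one weight vector applied to every agent

# Serial style: one dot product per agent (CPU-like loop).
serial = np.array([agents[i] @ weights for i in range(len(agents))])

# Bulk style: a single matrix-vector product, the kind of operation
# that can be sent in bulk to the GPU and evaluated in parallel.
bulk = agents @ weights

assert np.allclose(serial, bulk)
```

The two results are identical; the difference lies only in how the work is scheduled, which is the point of the GPGPU approach.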

As the challenge of a believable crowd simulation is better fulfilled [1] over time, concepts such as behavior, motion control and environmental integration find more available processing resources. We intend to integrate these concepts (restricted, for the scope of this work, to the navigable paths for agents, although the approach may be extended to motion planning) applied to simulated crowds into the core of a rendering engine, contributing an experimental tool that may prove useful not only for computer science research, but also for research in the behavioral sciences (psychology, cognitive science, anthropology).

We are interested in developing a configurable rendering engine able to perform a coherent crowd simulation making use of the HCMA in real-time. In this context, coherency refers to the fact that agents are able to avoid collisions and determine the best path towards a goal; it also means that an effort is made to display a believable and aesthetically pleasing image. A rendering engine designed from scratch to optimize and coordinate computational resources for rendering and AI will be used to demonstrate the viability of the HCMA, while providing the rendering capabilities required to display a varied, animated virtual crowd, moving in a simulated environment and responding to user interaction.

1 Parallel Programming and Computing Platform — CUDA — NVIDIA. (n.d.). World Leader in Visual Computing Technologies — NVIDIA. Retrieved June 25, 2012, from http://www.nvidia.com/object/cuda_home_new.html

1.2 HYPOTHESIS

Given the following definitions:

Definition 1. Macro crowd level optimal paths may be accurately simulated by Q-learning penalty-reward schemes evaluated over a mesh A that represents possible actions concerning path choices for groups of agents in large-sized crowds.

Definition 2. Micro crowd level least-effort paths may be accurately simulated by a Cellular Automaton over a lattice of sites B that partitions navigable space for Small-sized crowds.

Then the following propositions are to be proven: Proposition 1. A parallel algorithm is able to solve accurately and efficiently the mapping from a static navigable scenario with obstacles to a Q-learning formalism, delivering Long-Range paths for a Macro crowd simulation.

Proposition 2. A parallel algorithm is able to solve accurately and efficiently the Local Collision Avoidance problem via generation of cyclic phenomena typical of a Cellular Automaton.

Proposition 3. Let N be the area of navigable space in a virtual scenario for a simulated crowd. Let A be the partition of N corresponding to the collectively exhaustive, mutually exclusive set of states for the Q-learning formalism at Macro scope. Then B is a partition of A used for the Micro scope, allowing for a Hybrid Crowd Movement Algorithm. Therefore, a parallel hybrid algorithm for Macro and Micro simulation of crowd behavior is able to accurately and efficiently: • Generate an Optimal Policy (Π∗ ) from a Q-learning formalism to produce Long-Range paths for groups of agents. • Use Π∗ as input for a Cellular Automaton, that in turn will guide simulated agents in a crowd toward the directions signaled by Π∗ while avoiding local collisions.

1.3 OBJECTIVES

The objectives of this thesis are:

1. Through research of related work, establish the validity of Definition 1 and Definition 2.

2. Demonstrate and implement Proposition 1 by means of the Markov Decision Process (MDP) formalism.

3. Demonstrate Proposition 2 by researching implementations of Cellular Automata and later implementing a Local Collision Avoidance (LCA) technique based on their principles.

4. Implement and demonstrate Proposition 3 by producing a simulation that makes use of the HCMA with the following characteristics:

• An interactive large-sized crowd simulation running on a commercially available laptop PC.

• Simulation executed in test scenarios with diverse topology to demonstrate the accuracy and effectiveness of the HCMA:

– In a single-layer approach, the HCMA will be tested for a single crowd directed to the nearest goals.

– In a multiple-layer approach, the HCMA will be tested for two crowds directed to multiple goals in opposite directions.

• The HCMA tested under different environmental conditions, such as terrain diversity, individual agent speed and complex topology scenarios, as well as dynamic scenarios, in which part of the topology changes on user demand.

5. Produce a configurable crowd rendering engine and authoring tool, capable of interpreting rendering instructions and scenario topology instructions. Rendering instructions encompass information regarding character variety definitions, lighting, environmental objects and, if required, technical rendering data. Scenario topology instructions, on the other hand, specify scene dimensions, partition granularity, terrain types, dynamic obstacles and reward-penalty layered maps.

1.4 CONTRIBUTIONS

1. In crowd Navigation and Local Collision Avoidance. The main objective of this thesis is to formulate a parallel Hybrid Crowd Movement Algorithm to simulate large-sized crowds. The HCMA is our first contribution because:

• Although previous efforts have been made to map a penalty-reward system to a static scenario (specifically the work by Banerjee et al. [8]) in order to obtain paths for Navigation, this thesis presents the following advances: a) a detailed formulation of the MDP formalism as matrix operations in a GPGPU algorithm to optimize performance; b) separation of the algorithm execution into stages to produce interactive solutions in response to changes in scenario topology; c) the use of a hexagonal navigation grid that reduces the cardinality of the MDP state set; d) a GPU speedup analysis, reporting a 90x speedup on desktop platforms and a 30x speedup on embedded systems.

• This thesis presents a GPGPU interpretation of the Cellular Automata approach that focuses only on Collision Avoidance, solving the majority flow and direction problem by introducing the MDP Optimal Policy as a directional guide, granting separation and control for individuals.

2. In character animation and modeling. We extend a section of the doctoral thesis of Benjamín Hernández [1] with: a) a virtual motion capture system for characters animated with commercially available 3D modeling software; b) a texture-based smooth skinning system complementing the previously designed texture-based rig system for animated characters in virtual crowds, which reduces memory requirements; c) a character creation software to load and stitch topologically compatible body parts.
These extensions, working together as an animation technique, present four advantages: first, memory requirements are reduced by 97% when compared to traditional libraries; second, previously nonexistent characters can be generated from the original data set; third, embedding the rig and skinning into texture space allows painless animation transfer between different characters and, fourth, between different levels of detail of the same character.

3. Collision Avoidance Simulator. We also contribute an architecture for a configurable rendering engine to simulate a varied, animated virtual crowd, moving in its surrounding scenario and responding to user interaction. An implementation of this architecture presents a virtual crowd with these characteristics, controlled in real-time by the parallel HCMA.

4. Applications. The above techniques have been applied to crowd simulation of casual walking, evacuations and video games. Sections of this research are also applied to interactive visualization of characters on GPU clusters at the Barcelona Supercomputing Center.

1.5 THESIS ORGANIZATION

This thesis is organized as follows: Chapter 2 presents an analysis of efficient techniques related to the path planning process for crowds, establishing the validity of Definition 1 and Definition 2 according to Objective 1. In Chapter 2 several technical requirements for a rendering engine are also discussed: Behavior

In Section 2.1, techniques that simulate individual and group behavioral traits are discussed.

Navigation

In Section 2.2, algorithms that determine paths for agents in a crowd simulation using Social, Grid Partitioning, Formation and Navigable Areas approaches are described.

Collision Avoidance

In Section 2.3, Machine Learning, Hierarchical, Geometric and Distributed approaches are reviewed.

Animation

In Section 2.4, mechanisms to efficiently Rig, Skin and Key-frame animated characters are discussed.

Rendering

Section 2.5 lists optimization techniques for image synthesis.

Next we present the work done to achieve the remaining objectives stated in Section 1.3. To achieve Objective 2, GPU acceleration of Markov Decision Processes for path planning is discussed in Chapter 3. Objective 3 points to the need to research implementations of Cellular Automata, and to later implement an LCA version based on their characteristics; this is done in Chapter 4: Cellular Automata applied to Local Collision Avoidance for simulated crowds. A Hybrid Crowd Movement Algorithm to achieve Objective 4 is described and discussed in Chapter 5. Our contribution in character animation and modeling is presented in Chapter 6: Reducing memory requirements for diverse animated crowds. The Collision Avoidance Simulator rendering engine is outlined in Chapter 7 to achieve Objective 5. Finally, in Chapter 8 we present a summary of our contributions (Section 8.1) as well as a discussion of ideas for future work (Section 8.2).

2. RELATED WORK

As Zhou et al. [5] put in perspective: “Despite the many existing research efforts and applications in crowd modeling and simulation, it is still a young and emerging area”. This is why the following sections review modeling and technical tools related to critical stages of the crowd simulation process, focusing on those directly related to crowd movement and organizing them as the topics shown in Figure 2.1. Note that even though we use static 3D modeling and optimization to generate virtual scenarios, their design and construction are outside the scope of this thesis.

Figure 2.1: Virtual crowd movement process.


Behavioral traits

Individually (Micro) and in groups (Macro), pedestrians present distinct behavioral traits when choosing paths to navigate, thus affecting crowd movement.

Navigation

Refers to the choice of paths an agent makes in order to reach a goal, while avoiding collisions against non-intelligent objects that may be part of the environment. Path planning is essential to deliver collision-free trajectories in Macro simulations of crowd behavior.

Collision Avoidance

Refers to the anticipated movements with which an agent avoids colliding with other agents present in their environment. Collision Avoidance is relevant in Micro simulations of crowd behavior.

Animation

An essential step to simulate a virtual crowd consists in development and application of locomotion parameters for every character displayed, according to the situation produced by crowd movement at each simulation step.

Rendering

Refers to the efficient use of computational resources to ultimately display a sequence of images. Processing of fast algorithms that create synthetic images is required, so that users can interact with the simulation.

Johansson et al. [4] have discussed the increasing availability (in the last 20 years) of computational resources, which allows data capture in fine spatial scales. These resources provide advanced quantitative techniques for the simulation of pedestrian flows, automated computer vision as well as methods for modeling navigation and route finding. Methods for simulating pedestrian crowds include agent-based, social force, cellular automata, fluid dynamics, queuing models and the ones based on least-effort or plain heuristics. The approach used in this work is a hybrid agent-cellular automaton simulation, for it—as this thesis aims to establish—is able to simulate large-sized crowds accurately and efficiently. The following cited work describes the most effective crowd simulation techniques found during our research.

2.1 BEHAVIORAL TRAITS

As Monzani et al. [9] have stated, in behavioral animations “Virtual Humans acquire the ability to perceive their environment and are able to react and take decisions depending on this input (it is important to note that agents need to be situated in a common environment: otherwise, no interaction could be possible)”, when discussing Reynolds contribution at the SIGGRAPH’87 conference. Reynolds [10] himself introduced this notion by stating that “typical computer animation models only the shape and physical properties of the characters, whereas behavioral or character-based animation seeks to model the behavior of the character”.

2.1.1 INDIVIDUAL PEDESTRIAN BEHAVIORAL TRAITS

Personality Trait Theory

As McCrae and John [11] define, Personality Trait Theory refers to “the basic dimensions of personality, the most important ways in which individuals differ in their enduring emotional, interpersonal, experiential, attitudinal, and motivational styles”. While simulating crowd behavior using Personality Trait Theory, Guy et al. [12] define a mapping from high-level behaviors (aggressive, assertive, shy, active, tense, impulsive) to simulation parameters (preferred neighbor distance, maximum neighbors allowed within a given radius, planning horizon distance, radius of sight, preferred walking speed) by calculating a weighting matrix A_adj, as Equation 2.1 models:

\begin{pmatrix} Aggressive \\ Assertive \\ Shy \\ Active \\ Tense \\ Impulsive \end{pmatrix} = A_{adj} \begin{pmatrix} \frac{1}{13.5}(NeighborDist - 15) \\ \frac{1}{49.5}(Max.Neighbors - 10) \\ \frac{1}{14.5}(PlanningHoriz. - 30) \\ \frac{1}{0.85}(Radius - 15) \\ \frac{1}{0.5}(Pref.Speed - 1.4) \end{pmatrix} \qquad (2.1)

We have observed that matrix multiplication is subject to massive parallelization in a GPGPU approach, and is therefore compatible with real-time crowd simulation.
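For illustration, the mapping in Equation 2.1 reduces to one matrix-vector product. In the sketch below, A_adj is a placeholder matrix of uniform weights; the real coefficients are derived by Guy et al. [12] from a perceptual user study and are not reproduced here:

```python
import numpy as np

# Hypothetical 6x5 weighting matrix A_adj (placeholder values only;
# the fitted coefficients come from Guy et al. [12]).
A_adj = np.ones((6, 5)) * 0.2

def personality(neighbor_dist, max_neighbors, planning_horiz, radius, pref_speed):
    # Normalize the five simulation parameters as in Equation 2.1.
    p = np.array([
        (neighbor_dist - 15.0) / 13.5,
        (max_neighbors - 10.0) / 49.5,
        (planning_horiz - 30.0) / 14.5,
        (radius - 15.0) / 0.85,
        (pref_speed - 1.4) / 0.5,
    ])
    # Result order: [Aggressive, Assertive, Shy, Active, Tense, Impulsive]
    return A_adj @ p

# An agent with all-default parameters maps to the neutral personality vector.
print(personality(15.0, 10.0, 30.0, 15.0, 1.4))
```

Because the whole mapping is a single matrix product, evaluating it for thousands of agents at once is one batched multiplication, which is why it parallelizes so well on a GPU.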

Psychophysical, Psychosocial and Psychometric Factors

During their research, Beltaief et al. [13] have presented results on Psychophysical and Psychosocial studies applied to pedestrian crowd simulations, which will be used to model the starting conditions for simulations in this thesis:

• Psychophysical studies:

Perception of a normal person

The human horizontal perception field is divided into three main parts:
1. A vision angle of 30 degrees, in which humans perceive detailed objects.
2. A vision angle of 100 degrees, in which humans perceive only object forms.
3. A vision angle of 200 degrees, in which only the movement of objects is perceived.

Velocity of a normal person

Depending on the crowd density, the Kladek [14] function describes pedestrian velocity:

V_{F,hi} = V_{F,hf} \cdot \left[ 1 - e^{-\mu \left( \frac{1}{D} - \frac{1}{D_{max}} \right)} \right] \qquad (2.2)

where V_{F,hi} is the velocity of a pedestrian located in a crowd with a given density, V_{F,hf} is the pedestrian velocity with freedom of movement, \mu = 1.913 pedestrians/m², D is the crowd density and D_{max} is the crowd density at which a pedestrian will stop moving.

• Psychosocial studies:

Goals of a normal person

Ordered by importance within a tree structure.

Pedestrian preferences

Generally following the Least Effort principle, a pedestrian chooses a path moving towards its goal.

• Psychometric studies:

Hou et al. [15] introduce the use of Psychometrics for crowd simulations by applying external psychological stimuli in an influencing quintuple:

Frequency

The number of occurrences of the stimuli event per time unit.

Intensity

The power of certain psychological stimulus per unit volume in a crowd.

Range

The area that certain psychological stimulus can cover.

Efficiency coefficient

The psychological effect intensity due to environmental conditions.

Start and end time

The time interval for which a psychological stimulus lasts.
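The Kladek function (Equation 2.2) translates directly into code. The free-flow speed and jam density used in the example below are illustrative values, not prescribed by [13] or [14]; only µ = 1.913 pedestrians/m² comes from the text above:

```python
import math

MU = 1.913  # pedestrians/m^2, the constant cited for the Kladek function [14]

def kladek_velocity(v_free, density, density_max):
    """Pedestrian velocity in a crowd of a given density (Equation 2.2)."""
    if density >= density_max:
        return 0.0  # at (or beyond) jam density the pedestrian stops
    return v_free * (1.0 - math.exp(-MU * (1.0 / density - 1.0 / density_max)))

# Illustrative free-flow speed of 1.34 m/s and jam density of 5.4 ped/m^2:
print(kladek_velocity(1.34, 0.5, 5.4))  # sparse crowd: nearly free walking
print(kladek_velocity(1.34, 5.0, 5.4))  # dense crowd: almost jammed
```

The function reproduces the qualitative behavior described above: speed is close to the free-flow value at low densities and collapses toward zero as density approaches D_max.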

The Five Factor Model

During their investigation, García Rojas et al. [16, 17] refer to McCrae's Five-Factor Model (FFM) [11] (also referred to as the OCEAN model by Durupinar et al. [18]) for its ability to enhance the believability of Virtual Humans, effectively simulating individualized human reactions by managing the following dimensions:

Openness

How willing individuals are to adapt their conduct based on new ideas or situations.

Conscientiousness

Individuals high in Conscientiousness tend to be organized, thorough, and planful.

Extraversion

Individuals have a keen interest in other people and external events. They will venture with confidence into the unknown.

Agreeableness

How individuals are able to get along with others.

Neuroticism

A range covering from stability and low anxiety, to instability and high anxiety.

Three Levels of Processing

In his book “Emotional Design” [19], Donald Norman describes human behavioral reactions as a stage of processing that takes sensory information as input and outputs an optimal motor reaction. The three processing levels proposed by Norman are:

Visceral

It is not conscious, acts fast, makes rapid judgements of what is good or bad, safe or dangerous. Sends signals to the muscles and alerts the brain.

Behavioral

It is not conscious either, it is the site of most human behavior, it is enhanced or inhibited by the reflective layer and, in turn, it can enhance or inhibit the visceral layer as shown in Figure 2.2.

Reflective

It is the layer of reflective thought. It may not access directly the sensory input or the control of behavior. It watches over, reflects upon, and tries to bias the behavioral level.

Figure 2.2: Three levels of processing.

Although Norman applies the Levels of Processing theory to product design for effective marketing, we find it suitable for laying the foundations of our own GPGPU LCA system, instead of choosing the OCEAN model or opting for a Personality Trait Theory approach, for the following reasons:

Predictability

The three-layer process to generate a response is not only inspired by the human process, but also produces a choice for an agent within a predictable lapse of time.

Efficiency

Real-time crowd simulation demands an equilibrium between realism and computational efficiency; this is why we have chosen the Three Levels of Processing as the basis of an individual pedestrian reactive model.

2.1.2 CROWD GROUP BEHAVIORAL TRAITS

A crowd simulation would not be believable if it did not consider the governing dynamics at a medium scale: groups and formations within a crowd are constantly created and destroyed, and the efforts cited below have been made to model the group membership of an agent over time. Although each agent in a crowd has its own preferences and goals, a crowd is also composed of groups of agents which, for example, will cause a substantial increase in evacuation times when in a state of high panic, as studies conducted by Sharma [20] have shown. In recent years, several approaches have arisen to tackle the group behavior problem:

• Reynolds [21] defines steering behaviors for characters as “the ability to navigate around their world in a life-like and improvisational manner”, identifying three behaviors for groups:

Separation

Allows a character to keep a distance from its neighbors (Figure 2.3(a)).

Cohesion

Allows a character to approach and form a group with nearby characters (Figure 2.3(b)).

Alignment

Allows a character to head in the same direction and speed as nearby characters (Figure 2.3(c)).

Figure 2.3: Steering Behaviors: (a) Separation; (b) Cohesion; (c) Alignment. Source: Reynolds, C. Steering Behaviors For Autonomous Characters. In GDC, (1999), pp. 21-23.
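A minimal sketch of the three steering behaviors follows; the combination weights and neighborhood radius are arbitrary illustrative choices (Reynolds [21] leaves the weighting to the application):

```python
import numpy as np

def steering(positions, velocities, i, radius=2.0):
    """Combined separation/cohesion/alignment force for agent i (toy weights)."""
    pos, vel = positions[i], velocities[i]
    dists = np.linalg.norm(positions - pos, axis=1)
    near = (dists > 0) & (dists < radius)            # neighbors within radius
    if not near.any():
        return np.zeros(2)
    separation = (pos - positions[near]).sum(axis=0)  # keep distance from neighbors
    cohesion = positions[near].mean(axis=0) - pos     # approach the group center
    alignment = velocities[near].mean(axis=0) - vel   # match neighbors' velocity
    return 1.5 * separation + 1.0 * cohesion + 1.0 * alignment

pos = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
vel = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(steering(pos, vel, 0))   # agent 0 reacts only to its one close neighbor
print(steering(pos, vel, 2))   # agent 2 is isolated: zero steering force
```

Each of the three terms corresponds to one of the behaviors in Figure 2.3; tuning their relative weights produces visibly different group formations.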

• Johansson et al. [4] have identified five self-organising principles in crowds: Lane formation

Generated by bidirectional pedestrian movement in corridors.

Oscillations at bottlenecks

Bidirectional pedestrian flow over a narrow space.

Intermittency

Unidirectional pedestrian flow over narrow spaces is sometimes halted and gradually resumed.

Stop-and-go waves

Pedestrian movement flow breaks within high density crowds.

Crowd turbulence

Involuntary pedestrian movement in very high density crowds.

Some other relevant results found in Johansson et al. [4] research are used to model the starting conditions for simulations in this thesis:

1. Human mobility is not random, but characterized by temporal and spatial correlations.

2. On a small scale, pedestrian crowds do not spread out uniformly; for an average crowd density D (people per square meter) within a large area, local measurements of crowd density can be approximated by a Gaussian distribution with standard deviation σ = √(D/3).

3. Group sizes follow a zero-truncated Poisson distribution.

4. Walking speed decreases with larger crowd density in an inverse parabolic function, so that maximum flow occurs at an intermediate crowd density and a high throughput of people is difficult to keep.

Additional studies presented by Chattaraj et al. [22] demonstrate that the density D will vary on a cultural basis. For example, they found that a crowd of walking Germans will form tighter groups than a crowd of walking Indians because the minimum personal space is culture dependent. Moreover, Fridman et al. [23] summarize that:

1. Personal space is an invisible boundary that people maintain from each other. According to Hall [24] each person is surrounded by four invisible “bubbles” of space whose size depends, among other things, on the relationship to the closest person and also on cultural background:

Intimate

Refers to embracing, touching or whispering.

Personal

Refers to interactions among good friends or family members.

Social

Refers to interactions among acquaintances.

Public

Refers to all other interactions.

2. Pedestrian walking speed is also a cultural attribute [25]: people in England and France have faster walking paces than people in Jordan or Syria, as more populated cities are inhabited by faster walkers. 3. Up to 70% of the people in a crowd move in groups such as families or friends versus individuals [26]. 4. Group composition is culture-dependent in: Number and gender

For example, in Iraq there is a higher tendency to move in larger groups than in Canada, Israel, England or France.

Distribution

Walking side by side or men before women.

Gender homogeneity

In France, for instance, more heterogeneous groups were registered.

5. Men in all cultures walk faster than women both individually and when comparing groups of men versus groups of women. Moreover, in all cultures individuals move faster than people in groups.

6. Culture dictates the preference to evacuate individually or in groups.

7. The will to notify others about an evacuation event varies for each culture.

8. The level of seriousness of the participants' response to a fire alarm, as well as their reported emotions (the level of fear and insecurity) during the event, is also a cultural attribute.

9. The presence of guiding authorities accelerates an evacuation process by about 50% regardless of the culture.

Loscos et al. [27] report that “in a city, less than a half of the pedestrians walk alone”, while modeling 2D maps to manage flow streams in order to guide a crowd using a simple set of rules that resemble individual behavior.

• He et al. [28] discuss a multi-group hierarchy structure model, updating a dynamic database during execution time. Each group has a leader agent dictating common goals and behavior characteristics.

• Thalmann et al. [29, 30, 31] have determined a series of crowd definitions based on the level of behavior control involved. These levels of behavior control produce, in ascending complexity order: Guided, Programmed and Autonomous Crowds. These authors have also identified a generic, high-level group of model behaviors that effectively characterize a crowd:

Flocking

Group ability to walk together in a structured group movement where agents move at the same speed toward the same goal.

Following

Group ability to follow a group or an individual motion.

Goal Changing

Agents may have the intention of changing groups.

Attraction

Groups of agents are attracted around an attraction point.

Repulsion

Group ability to be repulsed from a specific location or region.

Split

The subdivision of a group generates one or more groups.

Space Adaptability

Group ability to occupy all the walking space.

Safe-Wandering

Group ability to evaluate and avoid collision contact.

• In a multi-tier approach to human simulation authoring, Blank et al. [32] introduce the Behavior Authoring Pyramid, incrementally designing the following authoring techniques: Actions, Paths, AI Base Behaviors and AI Minds. This approach is leveraged to populate virtual environments with realistic human characters in military applications.

• It is also relevant to note that the motion direction of a human group is rarely subject to manipulation by a self-interested and opinionated minority, because uninformed individuals play a central role in achieving democratic consensus, delegating the final group direction to the preference of the majority, as research by Couzin et al. [33] demonstrates.
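Two of the empirical regularities reported by Johansson et al. [4] earlier in this section, the Gaussian approximation of local density (with σ = √(D/3)) and the zero-truncated Poisson distribution of group sizes, can be used to seed the starting conditions of a simulation. The sampling sketch below uses illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(42)

def local_densities(mean_density, n):
    # Local crowd density ~ Gaussian around D with sigma = sqrt(D/3)  [4]
    sigma = np.sqrt(mean_density / 3.0)
    return np.clip(rng.normal(mean_density, sigma, n), 0.0, None)

def group_sizes(lam, n):
    # Group sizes ~ zero-truncated Poisson: resample any zeros away  [4]
    sizes = rng.poisson(lam, n)
    while (sizes == 0).any():
        sizes[sizes == 0] = rng.poisson(lam, (sizes == 0).sum())
    return sizes

print(local_densities(1.2, 5))   # e.g. local densities around 1.2 ped/m^2
print(group_sizes(1.5, 5))       # e.g. group sizes, always >= 1
```

Truncating the Poisson at zero matches the observation that every pedestrian belongs to a group of at least one; the clip on densities simply prevents nonphysical negative samples in the Gaussian tail.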

2.2 NAVIGATION

Navigation in a crowd simulation context refers to the ability of agents to effectively avoid collisions against objects within a scene while moving toward their goals. These obstacles may or may not move, but the criterion that classifies them as objects within a scene is the fact that obstacles do not avoid collisions: they do not present reactive behavior. When the number of agents is low (a number that a CPU can handle) and obstacles do not move, classic path finding algorithms such as A* by Hart et al. [34] or Dijkstra's algorithm [35, 36] can be pre-computed and applied successfully, as Rossmann et al. [37] demonstrate. For higher agent numbers and the option of moving obstacles, several techniques model and solve the agent Navigation problem, in approaches like Social, Grid Partitioning, Formation and Navigable Areas.
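For reference, the precomputable static case mentioned above is straightforward; a compact Dijkstra sketch on a 4-connected occupancy grid (the grid layout and unit costs are illustrative):

```python
import heapq

def dijkstra(grid, start, goal):
    """Shortest path on a 4-connected grid; 1 marks a blocked cell.
    The classic precomputable path finding for static scenes [35, 36]."""
    rows, cols = len(grid), len(grid[0])
    dist, prev = {start: 0}, {}
    pq = [(0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist[u]:
            continue                      # stale queue entry
        x, y = u
        for v in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if 0 <= v[0] < rows and 0 <= v[1] < cols and grid[v[0]][v[1]] == 0:
                if d + 1 < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + 1, u
                    heapq.heappush(pq, (d + 1, v))
    path, node = [], goal                 # walk predecessors back to start
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(dijkstra(grid, (0, 0), (2, 0)))  # detours around the blocked row
```

With uniform edge costs this is equivalent to breadth-first search; A* [34] adds a goal-distance heuristic to the queue priority but is otherwise the same loop.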

2.2.1 SOCIAL NAVIGATION METHODS

In his Social Force Model, Helbing [38] states that pedestrians keep a certain distance from obstacles, feeling uncomfortable the closer to a border they walk. Letting $B$ be a border, $\alpha$ a pedestrian, $\vec{r}_B^{\alpha}$ the location of border $B$ nearest to pedestrian $\alpha$, and $U_{\alpha B}(\| \vec{r}_B^{\alpha} \|)$ a repulsive, monotonically decreasing potential, the repulsive effect border $B$ evokes on pedestrian $\alpha$ is described by:

$$\vec{F}_{\alpha B}(\vec{r}_B^{\alpha}) = -\nabla_{\vec{r}_B^{\alpha}} U_{\alpha B}(\| \vec{r}_B^{\alpha} \|)$$

(2.3)

This principle has been successfully implemented by Pelechano et al. [39] in an integrated sociopsychological and physical model.
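Equation 2.3 can be sketched numerically with one common choice of potential; the exponential form and the constants `u0` and `r_scale` below are illustrative assumptions, not values from Helbing's model:

```python
import math

def border_repulsion(pedestrian, border_point, u0=10.0, r_scale=0.2):
    """Repulsive force on a pedestrian from the nearest border point.

    Uses a monotonically decreasing potential U(d) = u0 * exp(-d / r_scale),
    so the force F = -dU/dd pushes the pedestrian away from the border;
    u0 and r_scale are illustrative constants.
    """
    dx = pedestrian[0] - border_point[0]
    dy = pedestrian[1] - border_point[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return (0.0, 0.0)  # degenerate: pedestrian exactly on the border
    magnitude = (u0 / r_scale) * math.exp(-d / r_scale)
    # Unit vector from the border toward the pedestrian, scaled.
    return (magnitude * dx / d, magnitude * dy / d)
```

The force decays exponentially with distance, reproducing the qualitative behavior of pedestrians feeling increasing discomfort near walls.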

2.2.2 GRID PARTITIONING NAVIGATION METHODS

Still [40] proposes the use of a Cellular Automaton to model crowd flow as a fluid with viscosity, initiating a widely adopted resource [41, 42, 43, 44] in crowd behavior simulation that produces a discrete solution for the fluid dynamics equations:

$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \vec{u}) = 0$$

$$\rho \left( \frac{\partial \vec{u}}{\partial t} + \vec{u} \cdot \nabla \vec{u} \right) = -\nabla p + \rho \upsilon \nabla^2 \vec{u} + \rho \vec{f} \qquad (2.4)$$

where $\vec{u}$ is the fluid velocity, $\rho$ is the density of the fluid, $p$ is the pressure, $\upsilon$ is the kinematic viscosity of the fluid and $\vec{f}$ is the acceleration due to external forces acting upon the fluid element. General fluid dynamics are described by the Navier-Stokes [45] equations (Equation 2.4); when they are simulated with the aid of a discrete version obtained from Lattice Gas Cellular Automata (LGCA) models [46, 47], the generated Least Effort [40] paths solve crowd Navigation as a Particle System problem.

An advantage of the Cellular Automata approach is that it solves the Collision Avoidance and Navigation problems with a single algorithm. A disadvantage is the lack of separation and control for individuals, as they must follow the majority flow and direction towards an exit. Pettré [48] rejects this approach because “such models do not attempt to achieve realistic

individual trajectories”. Sarmady et al. [49] attempt to ease the lack of realism by proposing a Fine Grid Cellular Automata. Unrealistic paths, however, are still inherent to the LGCA model. Also, as Narain et al. [50] recognize, “the pressure projection looks only at local information”, and “there is often an informal cultural convention to walk on one side of the path to avoid oncoming pedestrian traffic”, noting that cultural factors (such as the factors mentioned in Section 2.1.2) are not considered by LGCA methods.

A Navigation method similar to Cellular Automata—in the sense of grid partitioning of the navigable space—is the one based on Vector Fields by Bian et al. [53] (called Motion Fields by Lee et al. [51] and also used by Je [52]), enriched with the concept of information entropy to describe the connection between individual behavior and the potential for disorder in a crowd. This method is able to produce a character motion flow for one agent, which is responsive to user input.

The work by Banerjee et al. [8] resorts to a “scalable layered intelligence technique from the game industry”, referring to a Markov Decision Process (MDP), using it to achieve adaptable navigation results by analysis of the navigable space. By means of a layered approach, the authors demonstrate that MDP is a viable tool to dynamically produce agent paths (Figure 2.4). This implementation (handling up to 40,000 agents), however, has a limit on the number of dynamically introduced obstacles it can manage, as it requires the aggregation of a new navigation layer for each dynamic obstacle.

Figure 2.4: A crowd simulated with the Layered approach a . a

Banerjee, B. et al. Advancing the Layered Approach to Agent-Based Crowd Simulation. In PADS, (2008), p. 5.

Batty [54] defines navigation as some product of randomness, geometry, economic intentions, and social preference, yet agrees that human navigation may be modeled with an MDP and a random component, as the geometry of the environment constrains and gives structure to the walking movements. Also resorting to a Reinforcement Learning technique, Pejsa and Andrist [55] evaluate the feasibility of applying Q-learning [56] to the navigation problem, finding it applicable for obtaining a near-optimal locomotion control policy for one virtual agent by satisfying an equation of the form $\Pi^*(s) = \operatorname{argmax}_a Q(s, a)$. This work analyzes the possibility of extending such a reward-penalization approach to Navigation for crowds in Section 3.
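A minimal tabular Q-learning sketch for one agent on a grid, in the spirit of the approach evaluated by Pejsa and Andrist; the environment, the reward of +1 at the goal, and all constants are our own illustrative assumptions:

```python
import random

def q_learn_grid(rows=3, cols=3, goal=(2, 2), episodes=2000,
                 alpha=0.5, gamma=0.9, eps=0.2, seed=7):
    """Tabular Q-learning for single-agent grid navigation (a sketch).

    Reward is +1 on reaching the goal and 0 elsewhere; actions are the
    four compass moves, clamped at the grid border. Returns the greedy
    policy pi*(s) = argmax_a Q(s, a).
    """
    rng = random.Random(seed)
    actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    q = {((r, c), a): 0.0 for r in range(rows) for c in range(cols)
         for a in range(4)}

    def step(s, a):
        r = min(max(s[0] + actions[a][0], 0), rows - 1)
        c = min(max(s[1] + actions[a][1], 0), cols - 1)
        s2 = (r, c)
        return s2, (1.0 if s2 == goal else 0.0)

    for _ in range(episodes):
        s = (0, 0)
        while s != goal:
            # Epsilon-greedy exploration.
            a = (rng.randrange(4) if rng.random() < eps
                 else max(range(4), key=lambda a: q[(s, a)]))
            s2, reward = step(s, a)
            best_next = max(q[(s2, b)] for b in range(4))
            q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
            s = s2
    return {s: max(range(4), key=lambda a: q[(s, a)])
            for s in {k[0] for k in q}}
```

After training, states adjacent to the goal select the action leading directly into it, which is the near-optimal policy behavior described above.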

As noted by Reyes et al. [57] and Foka and Trahanias [58], the major disadvantage of using MDPs is that they are computationally inefficient in terms of performance and memory consumption, since the state space grows exponentially with the number of domain variables, and inference methods grow with the number of actions. Different approaches have been proposed to overcome this problem. A naive approach consists in keeping the state space small by using a coarse discretization of the environment [59], typically a one-square-meter discretization.

Decomposition offers the possibility to solve large MDPs. An MDP can be decomposed based on pseudo-independent sub processes [60, 61, 62] or automatically decomposed into such sub processes [63]. The resulting sub MDPs are merged or used to construct an approximate global solution. Meuleau et al. [61] noted two types of decomposition: state space decomposition and process decomposition. In state space decomposition the state space is divided into regions to form sub MDPs, so that the MDP is the union of the sub MDPs [58, 64, 65]. State space decomposition is also known as serial decomposition, since one sub MDP is active at any given time. The overall state consists of an indicator of which sub MDP is active along with its state. Sub MDPs interact at their borders, which are the states where we can enter or leave them [66]. On the other hand, in process decomposition each sub MDP is treated as a concurrent process to compute coupled [67] or weakly coupled sub MDPs [60, 61].

Relevant work in robot path planning regarding state space decomposition is proposed by Foka and Trahanias. They present a hierarchical approach [58, 64, 65] to decompose a partially observable MDP (POMDP) into POMDPs with smaller state spaces. This decomposition is based on a quad-tree structure: at the top level there is a POMDP with a coarse discretization.
At the next level, each state of the top-level POMDP is divided into four states, and a new POMDP is constructed having these four states. Each state at the top level corresponds to a POMDP in the next level. Partitioning stops when the quad-tree expansion reaches the number of states of the original flat POMDP. The hierarchical POMDP is then solved from top to bottom: the root POMDP gives the general route the robot should follow; branch selection at lower levels depends on the current state of the robot, and selected POMDPs are solved at all levels until the bottom level is reached.

Mausam and Weld [67] follow the idea of concurrency to solve MDPs, i.e. solving simultaneously coupled MDPs for non-conflicting actions, generating solutions close to optimal. They use a factored action representation called probabilistic STRIPS [63] to determine non-conflicting actions, i.e. two actions may not be executed concurrently if in any state 1) they have inconsistent preconditions, 2) they have conflicting effects, or 3) the precondition of one conflicts with the (possibly probabilistic) effect of the other. They then modify the MDP definition given in [68] to introduce the associated parallel formulations. Mausam and Weld solve the action-state space growth problem and increase MDP solving performance by extending the Labeled Real-time Dynamic Programming method [68]. They use two pruning strategies: combo-elimination, which applies an action elimination theorem [69], and combo-skipping, which eliminates suboptimal parallel action combinations. Finally, combo-sampling, which uses a “likely parallel action combinations” distribution, is applied to achieve performance speed-ups.

Also following the idea of concurrency, Sucar [60] proposes a parallel implementation of weakly coupled MDPs. He decomposes a problem into different (independent) aspects. Each aspect is represented as an MDP and solved simultaneously. At any state of the problem, each MDP sends its value for each possible action and an “arbiter” selects the action with the greatest combined value.

Process decomposition is an approach that might resemble the concept of task parallelism, where a problem is formulated as a stream of instructions that can be broken into sequences called tasks that can be executed simultaneously [70]. However, these tasks should usually be independent. Following this terminology, data parallelism focuses on the data required by the tasks and how it can be decomposed into distinct chunks [70].

Recently, efforts have been made to optimize MDP implementations from the point of view of data parallelism using the GPU. These algorithms take advantage of the “massive” data parallelism available in GPU computing to reduce MDP computing time. One of the first GPU-based MDP solvers was introduced by Jóhannsson [71]. His master's thesis presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA; in particular, he introduces two algorithms: Block Divided Iteration and Result Divided Iteration. While the first algorithm is a parallel version of the original sequential value iteration algorithm that fits the CUDA computational model, the latter focuses on the parallelization of all subtasks of the value iteration process, i.e. each step of the sequential algorithm is distributed among as many threads as possible. Chen and Lu [72] also present a GPU implementation of the Value Iteration algorithm, which according to the authors uses OpenCL. Their description, however, lacks technical detail regarding how the implementation was performed.
In addition, the reported results are inconsistent: for example, MDPs with both a thousand and ten thousand states are solved in 1024 ms, but an MDP with four thousand states takes only 768 ms.

Noer [73] explores the design and implementation of a GPU point-based value iteration (PBVI) algorithm in the context of partially observable MDPs (POMDP). After presenting a survey of algorithms to solve POMDPs, the author notes that PBVI gives approximate solutions in a shorter running time. An interesting feature of PBVI and its GPU implementation is the support for belief state pruning, which avoids calculations.
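The sequential Value Iteration algorithm that these GPU solvers parallelize can be sketched as follows; this is a minimal CPU version, and the data layout chosen for transitions and rewards is an assumption made for illustration:

```python
def value_iteration(n_states, n_actions, transition, reward,
                    gamma=0.95, tol=1e-6):
    """Sequential Value Iteration; GPU versions parallelize the
    per-state max/sum. `transition[s][a]` is a list of
    (probability, next_state) pairs and `reward[s][a]` a scalar.
    """
    values = [0.0] * n_states
    while True:
        delta = 0.0
        new_values = []
        for s in range(n_states):
            # Bellman backup: best expected return over all actions.
            best = max(
                reward[s][a] + gamma * sum(p * values[s2]
                                           for p, s2 in transition[s][a])
                for a in range(n_actions))
            delta = max(delta, abs(best - values[s]))
            new_values.append(best)
        values = new_values
        if delta < tol:
            return values
```

The inner loop over states is embarrassingly parallel, which is exactly what the Block Divided and Result Divided variants exploit on the GPU.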

2.2.3 FORMATION NAVIGATION METHODS

In a Morphable Crowd Model, in the sense of formation, Ju et al. [74] explore the concept of scalable crowd formation blending, in which formation traits are extracted from the sampling of motion clips. A first algorithm generates a mathematical description of the spatial distribution of agents by incrementally adjusting agent density within a group radius until an average density threshold is satisfied. A second algorithm updates agent positions based on a collection of distributions and a collection of trajectory segments. This Morphable Crowd Model can be applied to Navigation for crowds, as groups of agents are shaped by the physical characteristics of a scenario.

2.2.4 NAVIGABLE AREAS METHODS

Pettré et al. [48, 75] opt for decomposing the navigable area into overlapping cylindrical cells with the aid of a Voronoi Diagram [76] that generates a Navigation Graph, providing path variety and batch processing for groups of pedestrians. It does not, however, support dynamic scenarios.

Karamouzas et al. [77] introduce the Indicative Route Method (IRM), capable of providing collision-free paths for agents moving in a dynamic scenario. IRM is a three-phase algorithm:

1. The generation of an indicative route (meaning that it should not be followed precisely by an agent, but is rather a description of the navigable area near a path towards a goal), either manually established or automatically determined by a grid searching algorithm such as A∗ (Figure 2.5(a)).

2. The creation of a collision-free corridor around the corresponding indicative route, where a character can move without colliding with the environment. This corridor is determined by applying a Generalized Voronoi Diagram to the space surrounding the indicative route (Figure 2.5(b)).

3. Helbing [38] Social Forces guide a character through the corridor as if it were a particle (Figure 2.5(c)).

(a) Indicative Route.

(b) Corridor.

(c) Final Path.

Figure 2.5: Indicative Routes Method (IRM). [Karamouzas, I. et al. Indicative Routes for Path Planning and Crowd Simulation. In ICFDG, (Apr 2009), p. 7.]

2.3 COLLISION AVOIDANCE

The Collision Avoidance (CA) problem is fundamental in robotics and crowd simulations, and presents a classic exercise of parallelism for the following reasons:

• A single agent in a scene will not imply CA calculations in a real-time simulation, because the only possible condition of collision will be verified against elements within the scene (this kind of collision is addressed in Section 2.2).

• Under a non-optimal simulation, as the number of simulated agents increases to N, the complexity of the CA algorithm is O(N²), as computations must be executed to compare each agent position against every other agent position in the scene.

• A parallel (possibly GPGPU) approach will decrease the time in which CA calculations are performed, because theoretically a processing core may be assigned the comparison task for each agent. However, the complexity of the CA algorithm remains O(N²).

The researchers cited below address the CA problem either by adapting existing CA techniques to the GPGPU paradigm, by attempting to reduce the order of complexity of the algorithm that computes the positional comparisons, or by adopting a Machine Learning approach, as Abdullah et al. [78] have summarized. It is worth mentioning that even though the Machine Learning approach is not currently a viable option for real-time crowd simulations, advances in hardware performance anticipate that it will gradually become one [79, 80]. Some of these works were selected during our research and are listed below.

2.3.1 MACHINE LEARNING COLLISION AVOIDANCE METHODS

Although not suitable for Huge-sized crowds because of its computationally heavy perception requirements and its original focus on vehicle-pedestrian CA, the CA method of Fernández et al. [81] has drawn our attention because of its resemblance to the processing levels of Norman [19]: by referring to a Knowledge Base (which implies a previous training and refinement stage), an agent state transits through the Perception, Planning and Actuation stages of a control architecture in order to successfully avoid collisions. Another relevant contribution related to Machine Learning is that of Li et al. [82], whose CA simulation results for vessels are the outcome of a carefully orchestrated interaction between a dynamic Rule Base, an Inference Engine and a Knowledge Base.

2.3.2 HIERARCHICAL COLLISION AVOIDANCE METHODS

In a hierarchical representation of the CA problem, Bonner and Kelley [83] introduce the concept of Successive Spherical Approximation (SSA), in which a hierarchy of rules is used to heuristically obtain precise collision-free paths within a structured environment.

In an unpublished work, Rudolph [84] proposes an improvement to the basic CA algorithm by dividing the rendered scene into sections aligned with the current rendering camera (a divide-and-conquer approach according to Cormen et al. [36], also classified as a Branch-and-Bound method by Velho et al. [85]). The procedure is GPU-based and performed as follows:

1. Locate the current camera position and use this 3D point as the center of two orthogonal planes that partition the scene into eight regions.

2. For each partition of the scene, generate a linked list in which each node points to an object inside the current partition.

3. For each partition, if the number of elements within the associated linked list is equal to or less than one, ignore the partition, for no collision is possible.

4. Perform collision tests for the linked lists with more than one element, effectively reducing the required time to test every object in the scene for collisions.

5. In order to solve the inter-section collision test, randomly vary the orientation of the partitioning planes and compute collisions again.

In another Branch-and-Bound approach, Bing et al. [86] implement a Space Division on the GPU similar to the one proposed by Rudolph [84] (and even earlier, in a broader sense, to the Axis-Aligned Bounding Boxes method by Larsson and Akenine-Möller [87]), but instead of creating linked lists for each partition, they associate each object to its smallest enclosing ellipsoid, moving this ellipsoid along with its associated object in each animation frame. The CA algorithm tests collisions between ellipsoids and triangular meshes, allowing more complex geometry to be tested for collision in a GPGPU approach.

In yet another Branch-and-Bound approach, Reynolds [10, 88] implements a lattice of buckets that partitions the navigable space, and reduces the collision search space even more by restricting the CA computations to a bounding radius associated with each agent.
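The bucket-lattice pruning attributed to Reynolds above can be sketched as a uniform spatial hash; the cell size equal to the search radius and the function name are illustrative choices:

```python
from collections import defaultdict
from itertools import product

def nearby_pairs(positions, radius):
    """Bucket agents into a uniform lattice with cell size = radius, then
    test only agents in the same or adjacent cells (lattice-of-buckets
    pruning); returns the index pairs closer than `radius`.
    """
    def cell(p):
        return (int(p[0] // radius), int(p[1] // radius))

    buckets = defaultdict(list)
    for i, p in enumerate(positions):
        buckets[cell(p)].append(i)
    pairs = set()
    for (cx, cy), members in buckets.items():
        # A pair within `radius` can differ by at most one cell per axis.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for i in members:
                for j in buckets.get((cx + dx, cy + dy), ()):
                    if i < j:
                        px, py = positions[i]
                        qx, qy = positions[j]
                        if (px - qx) ** 2 + (py - qy) ** 2 < radius ** 2:
                            pairs.add((i, j))
    return pairs
```

With roughly uniform agent density, each agent is compared against a constant number of neighbors instead of all N − 1 others, which is the point of the bucket lattice.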

2.3.3 GEOMETRIC-BASED COLLISION AVOIDANCE METHODS

Using concepts originated in the aerospace literature, Chakravarthy and Ghose [89] introduce a Collision Cone approach, capable of testing for collisions between irregular shapes by analytically reducing the intersection problem to that of the collision estimation between a point and a circle.

Another geometric CA method, developed by Li et al. [90], consists in projecting slices of the scene at previously defined distances and testing each of the generated slices for collisions, thus transforming a three-dimensional problem into a two-dimensional, more easily manageable one.

In a novel approach, van den Berg et al. [91] introduce the Optimal Reciprocal Collision Avoidance (ORCA) concept, extending their own Reciprocal Velocity Obstacles (RVO) [92] theory (also extended by Guy et al. [93] for parallel implementation on the CPU):

• For two agents A and B, $VO^{\tau}_{A|B}$ (the Velocity Obstacle for A induced by B for a time window τ) is the set of all relative velocities of A with respect to B that will result in a collision between A and B at some moment before time τ, as shown in Figure 2.6.

• Let $V_A$ and $V_B$ be the current velocities for agents A and B. If $V_A - V_B \notin VO^{\tau}_{A|B}$, then A and B will not collide within time τ if they keep their velocities.

• The set of collision-avoiding velocities for A, given that B selects its velocity from $V_B$, is $CA^{\tau}_{A|B}(V_B) = \{v \mid v \notin VO^{\tau}_{A|B} \oplus V_B\}$.

• A pair of velocity sets $V_A$ and $V_B$ are reciprocally collision-avoiding if $V_A \subseteq CA^{\tau}_{A|B}(V_B)$ and $V_B \subseteq CA^{\tau}_{B|A}(V_A)$.


Figure 2.6: Geometric interpretation for Reciprocal Velocity Obstacles. [Van den Berg, J. et al. Reciprocal n-body collision avoidance. In ISRR, (May 2011), p. 5.]

• Optimal sets of velocities $v^{opt}_A$ and $v^{opt}_B$ that maximize the amount of permitted velocities for A and B are $ORCA^{\tau}_{A|B}$ for A and $ORCA^{\tau}_{B|A}$ for B, as shown in Figure 2.7.

Figure 2.7: The set of ORCA permitted velocities. [Van den Berg, J. et al. Reciprocal n-body collision avoidance. In ISRR, (May 2011), p. 7.]
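For disc-shaped agents, the membership test $v \in VO^{\tau}_{A|B}$ reduces to a quadratic in time along the relative ray; the following is a simplified sketch of that geometric test, our own illustration rather than the published RVO/ORCA implementation:

```python
import math

def in_velocity_obstacle(p_rel, v_rel, combined_radius, tau):
    """Check whether the relative velocity v_rel = vA - vB lies in the
    velocity obstacle for time window tau.

    Disc agents A and B with combined radius r collide before tau iff
    ||p_rel - v_rel * t|| <= r for some t in (0, tau], where
    p_rel = pB - pA; this is a quadratic inequality in t.
    """
    px, py = p_rel
    vx, vy = v_rel
    a = vx * vx + vy * vy
    b = -2.0 * (px * vx + py * vy)
    c = px * px + py * py - combined_radius ** 2
    if c <= 0.0:
        return True  # already overlapping
    if a == 0.0:
        return False  # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False  # the relative ray misses the disc entirely
    t_hit = (-b - math.sqrt(disc)) / (2.0 * a)  # earliest contact time
    return 0.0 < t_hit <= tau
```

A velocity outside this set (for every neighbor) can be kept for the next τ seconds without collision, which is the guarantee the ORCA half-plane construction builds on.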

2.3.4 DISTRIBUTED COLLISION AVOIDANCE METHODS

From a Massively Multiplayer Online Game (MMOG) point of view, Morgan and Storey [94] propose a message exchange system between nodes in order to achieve collision detection in a Distributed Virtual Environment (DVE). In this scalable system, a collision detection server notifies users and members of a collision detection cluster about positional state updates and collision occurrences.

2.3.5 VIRTUAL PERCEPTION-BASED COLLISION AVOIDANCE METHODS

Ondřej et al. [95] propose the concept of Vision-based Collision Avoidance, starting from the idea that “Humans control their locomotion from their vision”, constructing a Synthetic-Vision approach that allows detection of future collisions by considering the bearing angle α from an observer to another walker, along with its time derivative α̇. When α remains constant (α̇ = 0) between two approaching walkers, a collision will occur, as shown in Figure 2.8.

Figure 2.8: Vision-based Collision Avoidance model. [Ondřej, J. et al. A Synthetic-Vision Based Steering Approach for Crowd Simulation. In TOG, (Jul 2010), p. 3.]

Ondřej [96] has indicated that humans use one of two strategies to avoid obstacles in the environment: “they can plan the trajectory in advance or the trajectory is a result of pure reactions to obstacles”. Some other relevant observations made by Ondřej [96] are used in our HCMA to avoid collisions:

1. Human CA against a stationary object is initiated from a constant distance.

2. Human CA against a moving object causes the speed of a pedestrian to decrease as the feasibility of a collision event is determined.

3. Humans prefer to avoid obstacles by facing them with their dominant side. For example, left-handed persons will prefer to avoid an obstacle while exposing the left side of their bodies toward it, as Moussaïd et al. [97] confirm.

4. Humans use a two-step strategy when performing CA, by first changing the heading course and then adjusting their walking speed.

Also, Viswanathan and Lees [98] propose an Information-based Perception Model that works with virtually perceived information, not spatial distance, in an effort to model human perception and natural Collision Avoidance methods. By defining information units (agents) and then clusters of them within circle-enclosed areas, a virtual sensing procedure is performed to estimate optimal velocities for agents.
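The bearing-angle criterion above can be sketched with a finite-difference estimate of α̇; keeping the observer heading fixed over the sampling step is a simplifying assumption of this illustration:

```python
import math

def bearing_rate(p_obs, v_obs, p_walk, v_walk, dt=1e-3):
    """Bearing angle alpha from an observer to another walker, plus a
    finite-difference estimate of its time derivative. A nearly constant
    bearing (alpha_dot ~ 0) between approaching walkers signals a
    future collision.
    """
    def alpha(po, pw, heading):
        # Angle between the observer heading and the line of sight,
        # wrapped to (-pi, pi].
        los = math.atan2(pw[1] - po[1], pw[0] - po[0])
        return (los - heading + math.pi) % (2 * math.pi) - math.pi

    heading = math.atan2(v_obs[1], v_obs[0])
    a0 = alpha(p_obs, p_walk, heading)
    # Advance both walkers by dt and re-measure the bearing.
    po2 = (p_obs[0] + v_obs[0] * dt, p_obs[1] + v_obs[1] * dt)
    pw2 = (p_walk[0] + v_walk[0] * dt, p_walk[1] + v_walk[1] * dt)
    a1 = alpha(po2, pw2, heading)  # heading kept fixed over dt
    return a0, (a1 - a0) / dt
```

On a true collision course the line of sight keeps its direction, so the estimated α̇ vanishes, while diverging trajectories produce a drifting bearing.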

2.4 AGENT ANIMATION

To produce a believable crowd simulation, agents must be displayed as characters that physically resemble a human in as much detail as possible (Figure 2.9). Such a simulation must mimic human motion accurately, from coarse aspects related to limb rotation limits or body orientation, to finer details such as facial expression or clothing collisions (for example, Pandzic et al. [99] propose the design of separate engines for body and facial representations).

Thalmann et al. [100] have determined a series of challenges for the animation of crowds, while explaining applications of the MASSIVE crowd authoring software (Massive Software, http://www.massivesoftware.com/):

Behavior

Defining collective behaviors while preserving agent individualities.

Flexibility

Sequences of actions should be fluid in any case.

Variety

Can be applied to an individual or a member of a crowd.

During the course of our research, we have found that the character animation discipline is linked to a three-stage process for characters that are not impostors (Section 2.5): the Rigging, Skinning and Keyframing stages.

Figure 2.9: An Agent.

2.4.1 RIGGING STAGE

This stage corresponds to the generation of a mathematical representation of the underlying rigid skeleton (a hierarchy of bones and joints, as shown in Figure 2.10) to which a three-dimensional mesh is attached. This skeleton, often referred to as the rig, according to Ritchie [101] “imbues the models with the ability to move believably”, and may be either generated by an artist, as in the locomotion clips used by Maïm et al. [102], or determined automatically by mathematical analysis of its character-associated mesh.

Baran and Popović [103] have presented a general algorithm that automatically produces a rig by progressively packing spheres inside the volume enclosed by the mesh, and then joining the centers of the spheres, resulting in a functional hierarchy of bones and joints. A proposition to ensure correct rig generation involves querying a character database to adjust the automatically generated rig, as Pantuwong and Sugimoto [104] describe. The idea of connecting segment centers is also used by Vasilakis and Fudos [105], who developed an algorithm to determine, group and connect what they refer to as opening centroids (the centroid of the joining points at two adjacent components generated during a mesh segmentation process), successfully generating a supporting rig. A semi-automatic approach introduced by Zheng et al. [106] generates more functional rigs (in the sense that they produce the desired animations) by contemplating a user-generated sketch of the rig before proceeding to analyze, segment and connect components.

2.4.2 SKINNING STAGE

The graphical representation of a character is achieved by rendering a mesh (Figure 2.10) with the following characteristics:



Deformable

The mesh will follow its associated rig rotations.

Three-dimensional

The mesh will represent a Virtual Human.

Textured

The mesh will imitate skin and clothing.

Geometrical

Although a mesh may be composed of quads, graphics chips perform best when they render triangles [100].

Figure 2.10: Character rig and mesh.

Every vertex in a character mesh is attached to at least one bone in the rig hierarchy, such that when a bone rotates, its attached vertices are translated accordingly, deforming the mesh altogether, as explained by Eberly [107]. The algorithm that associates vertex positions to joint rotations is known as the skinning algorithm, and has been explored in linear and non-linear variants.

1. Linear Skinning Techniques (LST). The Linear Blend Skinning (LBS) algorithm [108] (also known as Matrix palette skinning) is the starting point for every rig-skin-based model animation; it simply weights the contributions of a number of joints to influence a given vertex. Starting with a set of $n$ bones, influences $w_i$ (where $\sum_{i=1}^{n} w_i = 1$) are multiplied by transformation matrices $C_{j_i}$ (where subindex $j$ represents a joint in the rig hierarchy) and the result is then multiplied by the current vertex position $v$ to obtain the new vertex position:

$$v' = \left( \sum_{i=1}^{n} w_i C_{j_i} \right) v \qquad (2.5)$$

as shown in Figure 2.11(a). This technique can, however, produce unwanted results (known as the “candy-wrapper” effect, and loss of volume in general), as shown in Figure 2.11(b). This happens because the LBS formula does not preserve the orthogonality of the rotational part of the matrix. In an interesting comparison of LST, Jacka et al. [109] test performance for the following linear skinning algorithms:

(a) Joints, Bones and Vertices in LBS.

(b) The “candy-wrapper” effect.

Figure 2.11: Linear Blend Skinning (LBS). [Kavan, L. et al. Spherical Blend Skinning: A Real-time Deformation of Articulated Models. In I3D, (2005), p. 11.]

Skeletal Subspace Deformation (SSD)

Linearly combines the results of the vertex transformed rigidly with each bone:

$$v = \sum_{i=1}^{n} w_i T_i \hat{T}_i^{-1} \hat{v} \qquad (2.6)$$

where $T_i$ is the current transformation matrix for bone $i$, and $\hat{T}_i$ is the transformation matrix at the rest position for bone $i$, i.e. the initial pose of the rig hierarchy.

Animation Space

Changes the single rest-pose position of each vertex in Equation 2.6, substituting $p_i = w_i \hat{T}_i^{-1} \hat{v}$, producing:

$$v = \sum_{i=1}^{n} T_i p_i \qquad (2.7)$$

and allowing four variable weights per bone influence (the four components of the $p_i$ vector).

Multi-Weight Enveloping (MWE)

Also changes the weighting factor of Equation 2.6, substituting $M_i = T_i \hat{T}_i^{-1}$, producing:

$$v = \sum_{i=1}^{n} w_i M_i \hat{v} \qquad (2.8)$$

allowing independent bone influence for each vertex. Jacka et al. [109] conclude that Animation Space consistently performs best, avoiding many of the loss-of-volume errors present in SSD, and also preventing the overfitting effect of MWE.

A more precise approach is Particle Based Skinning, developed by Tokoi and Komiya [110] to model interactions (repulsion, static, attraction) between particles positioned over the mesh and linked statically to the nearest bone in the rig hierarchy. Whenever a joint rotates, the particles move accordingly, and their resulting interactions are modeled by deforming the

mesh. This method, however, performs slower than SSD, Animation Space and MWE, because precise deformations require a large number of particles. Chen et al. [111] have also tested a particle-based approach mixed with a cubic lattice surrounding the mesh, driving the surface deformation and preserving volume. This approach has been tested with successful results for the precise deformation of a single character, and even though the method is implemented on the GPU, the high demand of resources of its algorithm prevents its use for real-time crowd simulations.

In a successful effort to automatically skin deformable mesh animations without the definition of a rig, but with a consequential loss of animation accuracy, James et al. [112] present a method able to approximate the original animation, starting from a set of animation frames of a deformable mesh. This method can be applied to enrich LoD and CA techniques because of the underlying shape parameterization that these mesh animations imply: in order to obtain this parameterization, a Mean Shift Clustering algorithm [113] is successively applied, and every iteration may serve as an independent LoD.

2. Non-Linear Skinning Techniques (N-LST). The work by Kavan et al. [114, 115, 116, 117, 118, 119] from 2005 to 2009 is relevant to the field of virtual model skinning for the following reasons:

• Kavan et al. [114] propose the Spherical Blend Skinning (SBS) method to avoid the unwanted artifacts produced by the LBS algorithm, introducing the use of a center of rotation $r_c$ and a rotation matrix $Q$ computed with quaternion multiplication:

$$v' = Q(v - r_c) + \sum_{i=1}^{n} w_i C_{j_i} r_c \qquad (2.9)$$

This algorithm produces more accurate visual results, but the execution time is 20% to 25% slower than its LBS counterpart.

• The SBS approach can incorporate a Collision Detection structure based on a Bounding Volume Hierarchy (BVH) [115]. By making use of the unit quaternion sphere (implicit in the SBS algorithm), a refitting operation generates bounding spheres for joints in the rig, effectively creating a hierarchy similar to that of the rig, which can be used for Collision Detection routines involving complex meshes.

• The dual quaternion method of Kavan et al. [116, 117, 118] can be used to skin arbitrary deformations, distributing the control transformations uniformly over the animated 3D model (making no assumptions about the input animation); it requires larger pre-computation times but in return obtains much more accurate skinning approximations than the ones achieved by James et al. [112]. Dual quaternions have the following characteristics:

– They are similar to Hamilton quaternions [107], but in addition to the classical units $1, i, j, k$, there is a dual unit, denoted $\varepsilon$, that satisfies $\varepsilon^2 = 0$ and is commutative in multiplication with $i, j, k$.

– An overline denotes dual conjugation, i.e. the replacement of $\varepsilon$ by $-\varepsilon$; for example, $\overline{1 + 2\varepsilon - \varepsilon i + 3k} = 1 - 2\varepsilon + \varepsilon i + 3k$.

– As Kavan et al. [118] mathematically prove, dual quaternions represent general rigid transformations in computer graphics.

The dual quaternion method works as follows:

Rigid joint transformations

$C_1, \ldots, C_p$ are converted to dual quaternions $\hat{q}_1, \ldots, \hat{q}_p$.

Vertices

With components $v_k = (v_{k,x}, v_{k,y}, v_{k,z})$, vertices are represented as dual quaternions $\hat{v}_k = 1 + \varepsilon(v_{k,x} i + v_{k,y} j + v_{k,z} k)$.

Deformed vertices

Positions are obtained with the formula:

$$\hat{v}^{def}_k = \left( \sum_{i=1}^{n_k} w_{k,i} \hat{q}_{j_{k,i}} \right) \hat{v}_k \left( \sum_{i=1}^{n_k} w_{k,i} \hat{q}_{j_{k,i}} \right)^{-1} \qquad (2.10)$$

which is also of the form:

$$\hat{v}^{def}_k = 1 + \varepsilon(v^{def}_{k,x} i + v^{def}_{k,y} j + v^{def}_{k,z} k) \qquad (2.11)$$

The visual advantage of Dual Quaternion Skinning (DQS) over LBS is depicted in Figure 2.12.

Figure 2.12: Dual Quaternion Skinning (DQS). [Kavan, L. et al. Skinning with Dual Quaternions. In I3D, (2007), p. 1.]

• Kavan et al. [119] have also shown that DQS may be arbitrarily approximated by LBS with their own Automatic Linearization algorithm based on the placement of virtual bones in critical parts of the model, splitting large joint rotations without introducing any noticeable artifacts. In an integration effort, Ju et al. [120] propose the creation of a seamless cage composed of smaller cages, each modeling a possibly different skinning technique. Together, skinning technique

and associated cage form what the authors call a Reusable Skinning Template. The technique, however, is not fully automated. The work by Miller et al. [121] presents an automated method that solves both the rigging and skinning problems by tailoring (annotating, fitting and scaling) body parts stored in a database in order to rig an arbitrary target mesh. The stored body parts have both rigging and skinning information available, which is connected (by finding correspondences between the source and target meshes) and diffused (blending the skinning information over the whole mesh) by an algorithm named Skin Transfer. The authors call this rigging and skinning method “Frankenrigs”; its general workflow is shown in Figure 2.13.

Figure 2.13: Flowchart for “Frankenrigs”. (Source: Miller, C. et al. Frankenrigs: Building Character Rigs from Multiple Sources. In TVCG, Aug 2011, p. 2.)

2.4.3 KEY-FRAMING STAGE

As Eberly [107] defines, “Kinematics is the study of motion without considerations of mass or forces”, and Inverse Kinematics (the determination of joint configurations from known end-effector positions) is ideal to determine rotations for the bone-joint hierarchy elements as their parents in the hierarchy rotate. Eberly [107] summarizes the numerical solutions available for the Inverse Kinematics equations, namely Jacobian Methods, Nonlinear Optimization and Cyclic Coordinate Descent. As the rendering process is ultimately the successive display of raster images, key-frame animation based on Inverse Kinematics may be coupled with each rendering step. When plotted in a rotation-time graph, joint rotations should present a smooth, continuous curve, also known as a motion graph, if the flexibility requirement proposed by Thalmann et al. [100] is to be fulfilled. Ikemoto et al. [122] address this issue by proposing a framework to control the motion of autonomous agents through Reinforcement Learning-based synthesis of relevant motion graphs: proper motion graphs are chosen based on penalties represented by collisions against obstacles in the surrounding environment. Ma et al. [123] have explored the use of Bayesian Networks to model style and variation in human motion, describing a relationship between user-defined parameters (such as stride length) and latent variations, as shown in Figure 2.14, generating unlimited motion variants. Although these Machine Learning variants have only been tested for one agent, we find it relevant to include them in the present work for the reasons pertaining to hardware advances mentioned in Section 2.3.
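Of the three numerical IK solvers named above, Cyclic Coordinate Descent (CCD) is the simplest to illustrate. The following planar (2D) sketch is included for exposition only and is not the thesis implementation; each pass rotates every joint, from the end of the chain back to the root, so that the end effector swings toward the target. All names are illustrative.

```cpp
#include <array>
#include <cmath>
#include <vector>

// A planar kinematic chain: link lengths and relative joint angles (radians).
struct Chain {
    std::vector<double> length;
    std::vector<double> angle;
};

// Forward kinematics: world position of every joint, root at the origin.
std::vector<std::array<double, 2>> forward(const Chain& c) {
    std::vector<std::array<double, 2>> p(c.length.size() + 1, {0.0, 0.0});
    double a = 0.0;
    for (size_t i = 0; i < c.length.size(); ++i) {
        a += c.angle[i];
        p[i + 1] = { p[i][0] + c.length[i] * std::cos(a),
                     p[i][1] + c.length[i] * std::sin(a) };
    }
    return p;
}

// One CCD pass: rotate each joint (end to root) so that the end effector
// aligns with the target as seen from that joint.
void ccdPass(Chain& c, double tx, double ty) {
    for (int i = (int)c.length.size() - 1; i >= 0; --i) {
        std::vector<std::array<double, 2>> p = forward(c);
        std::array<double, 2> e = p.back(), j = p[i];
        double toEffector = std::atan2(e[1] - j[1], e[0] - j[0]);
        double toTarget   = std::atan2(ty - j[1], tx - j[0]);
        c.angle[i] += toTarget - toEffector;
    }
}

// Iterate until the end effector is within eps of the target (or give up);
// returns the final distance to the target.
double ccdSolve(Chain& c, double tx, double ty,
                int maxPasses = 200, double eps = 1e-6) {
    double err = 1e30;
    for (int k = 0; k < maxPasses && err > eps; ++k) {
        ccdPass(c, tx, ty);
        std::array<double, 2> e = forward(c).back();
        err = std::hypot(e[0] - tx, e[1] - ty);
    }
    return err;
}
```

CCD needs no Jacobian and converges quickly for short chains, which is why it is attractive for per-frame, per-agent use.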

Figure 2.14: Mapping from stride parameters to motion variations. (Source: Ma, W. et al. Modeling Style and Variation in Human Motion. In SCA, 2010, p. 2.)

2.5 RENDERING

Once the agents composing a crowd are organized to navigate avoiding collisions—against their surrounding environment and against each other—and are represented by characters that mimic human behavior, the final step in a real-time crowd simulation is to display the results: to render a sequence of images. The rendering process is a constant challenge: the available computational resources must be balanced in order to produce a believably animated virtual crowd while displaying as many characters as possible. Approaches to crowd visualization that use polygon and point based representations [124, 125, 126] demonstrated the usefulness of these primitives to represent characters geometrically. Nevertheless, image based representations, such as impostors [127], were better suited to the hardware available at the time. Improvements to the original technique were contributed by Aubel et al. [128, 129] and Tecchia et al. [130, 131], while the use of hybrid representations that overcome the flat and pixelated appearance problem was introduced by Dobbyn et al. [132] and further developed by Millan and Rudomin [133] and Maïm et al. [102]. Such hybrid representations consist in switching from detailed models to impostors according to the camera view. Impostors consume a large amount of memory because every animation keyframe and point of view must be computed and stored beforehand. Kavan et al. [118] and Beacco et al. [134] reduced the impostor memory footprint. However, with visual diversity (required for character uniqueness) even modified impostors require a large memory footprint, because visual variety strategies consist in modifying the characters' colors, textures, shapes and animations. Color modulation works well with impostors [129, 130, 131, 132, 133], but when the crowd is seen at closer distances the characters seem uniform.
Texture variation deals with this issue by modulating additional textures with the character’s textures to add effects such as make-up,

facial hair, wrinkles, eye sockets, facial spots, hair coloring and cloth patterns [135, 102]. Shape deformation requires a polygon mesh (body template) with a set of landmarks and user-defined or statistical data in order to generate novel characters [136, 137, 138], but the resulting models might be naked and have not been tested in massive real-time crowd rendering. On the other hand, Galvao et al. [139] and Maïm et al. [102] use displacement mapping to change the body fat or musculature of a character. In a sense, these approaches (mainly based on polygon meshes) reduce the time needed to model novel characters and minimize the memory footprint; nevertheless, an explicit analysis of this matter is not yet available. Recent methods such as [140, 141] successfully address the subject of outfit synthesis. The former implements a Bayesian network and support vector machines to learn dress codes and optimize outfit suggestions (casual, sportswear, business-casual, and business) from real-life variables (hair color, eye color, and skin color of the input body, plus a wardrobe of clothing items), while the latter uses “wear properly” and “wear aesthetically” criteria and support vector machines to propose outfit coordinates to the user. McDonnell et al. [142] found that animation diversity made it difficult to detect character duplicates, in particular when a duplicate was combined with random motions or when motions were out-of-step. Animation variety imposes challenges such as defining the means by which a character will move (rigging), reducing or eliminating the visual artifacts that joint motion produces (skinning), and animation reuse. Moreover, these must be performed automatically to reduce authoring time. As mentioned in Section 2.4, there are two approaches to these challenges: the first consists in generating skeletons [143], embedding preexistent skeletons [103] or reconstructing new rigs from a partial rig database [144] to rig a character mesh.
The second approach consists in calculating the bone transformations and their weights given a set of posed character meshes [112, 145, 146, 147]. All the same, the way in which the generated information should be stored, used and recycled in order to be applied to crowds remains an open problem. Many other rendering optimization techniques have been developed, in numbers and complexity that are out of the scope of this section; nevertheless, we present their definitions below, as well as, in Table 2.1, the sections that discuss some of their implementations, pointing out those that concern our effort to render a real-time virtual crowd. The following classification is based on the work by Ahearn [148], Eberly [107] and Akenine-Möller [7].

Asset-based optimizations: Those implemented during asset production, for example the production of triangle-based meshes instead of quad-based meshes to accelerate rendering.

MIP mapping: Also known as Texture-LoD; consists of multiple scaled versions of the same texture packed into one file, in such a way that a rendering engine may choose the texture version that consumes the least resources for the current view of a scene.

Masking and transparency: Masking is the process of marking zones of a texture to be transparent, simulating complex meshes where there are none. Transparency makes use of the alpha channel in a texture to simulate translucent objects.

Texture size and compression: The visual importance of the objects determines the texture size needed to cover them. A compressed texture file occupies less memory space while still preserving visual integrity.

Particle systems: The number and lifetime of particles determine the amount of computational resources required for their simulation.

Polygon optimization: Refers to the use of necessary vertices only, avoiding floating or unconnected vertices, and artifacts that produce more triangles than rendering efficiency demands.

LoD: Working on the same principle as MIP mapping, LoD techniques ensure a high polygon count model when the camera is close to it, and gradually lower polygon count models as the camera moves farther away. As GPU hardware reaches performance limits when the number of agents in a simulated real-time crowd increases, Level of Detail (LoD) techniques alleviate the processing burden, allowing higher agent numbers without sacrificing visual accuracy. An example of raw LoD would be substituting agents situated far from the camera by points colored with the average clothing tone of the character. Such a substitution for a character is known as an Impostor. Gradual Levels of Detail can be achieved with some of the techniques summarized by Thalmann et al. [100]:

Animated Impostor: “A set of transparent polygons onto which we map meaningful opaque images”, “placed in such a way that it faces the camera continuously and its texture is updated regularly enough for it not to be visually distracting”.

Factored Impostor: Factoring of a single layer by detection of mutual occluders, delivering the composition of an agent and its closest occluding objects in a single texture, resulting in resource optimization.

Z-buffer Corrected Impostor: The Z-buffer being a resource common in graphics programming libraries to determine linear depth of an object with respect to the camera, it can be used to sort occluders by depth whenever one of the participants in the occlusion test moves, triggering the generation of textures for the Impostors.

Collision-based optimizations: For Collision Detection routines, it is useful to represent an object with a low-resolution version or with its Bounding Volume: the basic geometric volume that best fits the simulated object. Either of these solutions is known as a collision hull.

Collision objects: Depending on the complexity of the displayed objects and the precision required for Collision Detection routines, collision hulls will be complex (such as an automatically generated convex hull) or simple (such as a cube or sphere).

Collision types: As described in Section 2.3: collisions against the environment (Navigation), or collisions against other agents (Collision Avoidance).

Occlusion and culling: When a near object blocks the view of a far object, the far object should not be rendered, to save rendering resources. Optimally, any occluded section of an object should not be rendered; furthermore, sections of an object that are occluded by the object itself should not be rendered either. Likewise, an object that is outside the camera view range (also known as the frustum) should not be rendered.

View frustum: As shown in Figure 2.15, the camera view frustum (a truncated pyramid) determines whether an object is visible and should be rendered.

(a) The sphere is inside the frustum and should be rendered. (b) The sphere is outside the frustum and should not be rendered.

Figure 2.15: Camera view frustum.
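The visibility decision of Figure 2.15 is typically implemented as a sphere-versus-plane test against the six frustum planes. A minimal sketch, assuming the frustum is stored as six inward-facing planes (the usage below uses an axis-aligned box for brevity; a real camera would extract its planes from the view-projection matrix):

```cpp
#include <array>

// A plane n·x + d = 0 with the normal n pointing toward the inside
// of the frustum.
struct Plane { double nx, ny, nz, d; };

// Conservative sphere-vs-frustum test: the sphere is culled only when its
// center lies farther than its radius behind any one of the six planes.
bool sphereVisible(const std::array<Plane, 6>& frustum,
                   double cx, double cy, double cz, double r) {
    for (const Plane& p : frustum) {
        double dist = p.nx * cx + p.ny * cy + p.nz * cz + p.d;
        if (dist < -r) return false;  // entirely outside this plane
    }
    return true;  // inside or intersecting the frustum
}
```

Bounding spheres make this test cheap enough to run per agent per frame, which is why it combines well with the per-character collision hulls described above.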


Optimization Technique          | Explored in this work | Section
Asset-based optimizations      | Yes                   | 7.1.3
MIP mapping                    | Yes                   | 7.1.2
Texture pages                  | No                    |
Unlit textures                 | No                    |
Multiple UV channels           | No                    |
Light maps                     | No                    |
Masking and transparency       | Yes                   | 7.1.2
Texture size and compression   | Yes                   | 7.1.2
Particle systems               | Yes                   | 4.2.5
Forced perspective             | No                    |
Polygon optimization           | Yes                   | 7.1.3
LoD                            | Yes                   | 7.2
Collision-based optimizations  | Yes                   | 3, 4
Collision objects              | No                    |
Collision types                | Yes                   | 3, 4
Occlusion-based optimizations  | No                    |
Occlusion and culling          | Yes                   | 7.2
View frustum                   | Yes                   | 7.2
Distance fog                   | No                    |
Cull distance                  | No                    |
Cells and portals              | Yes                   | 3.4, 4.2.5

Table 2.1: Rendering Optimization Techniques.

3. GPU ACCELERATION OF MARKOV DECISION PROCESSES FOR PATH PLANNING

In this chapter we focus on solving the Navigation problem for real-time crowds involving embodied agents within virtual environments. Our first observation is that an agent, while moving through an environment, solves a sequential decision problem to find a path that goes from its current configuration to a goal, accumulating additive rewards as it gets closer to it. This kind of behavior, where past states do not influence the resolution of future states, can be described by a Markov Decision Process, or MDP, introduced as a Dynamic Programming process by Bellman [149] in 1957. With the aim of applying Markov Decision Processes to crowd Navigation, in Section 3.1 we introduce some MDP terminology and overview the application of MDPs to the path planning problem for crowd simulation. Section 3.2 explains each step of our proposed algorithm, from the input parameters to the Value Iteration algorithm, as well as the associated data structures. In Sub-section 3.2.4 we present an implementation of the algorithm using the parallel transformation and reduction operations exposed by the Thrust library. We then show how these results are applied to crowd navigation (Section 3.3). Section 3.4 presents performance results of the algorithm on embedded and desktop platforms and its application in real-time 3D crowd simulation. We close the chapter in Section 3.5 with our conclusions.


3.1 PROBLEM SOLVING STRATEGY

A fully observable environment presenting a sequential decision problem with a Markovian transition model—a probabilistic decision model in which future states do not depend on prior states when a decision is made—and additive rewards is a Markov Decision Process, or MDP, as described by Russell [56]. An MDP is a tuple M = ⟨S, A, T, R⟩, where S is a finite set of states, A is a finite set of actions, T is a transition model T(s, a, s′) and R is a reward function R(s). A solution that specifies what an agent should do at any given state is a policy π(s). Considering a scenario for which the position of static obstacles is determined prior to the crowd simulation, a constrained, fully observable MDP can be evaluated in order to determine optimal navigation directions for groups of agents enclosed in cells, given that a 2D representation of the scenario is regularly partitioned into these cells, which are equivalent to the finite set of states S of the MDP. This set of optimal directions is the Optimal Policy (Π∗).

3.1.1 MARKOV DECISION PROCESS MODELING

The granularity and shape of the motion space partition translate to the cardinality of the state set of the MDP, which in turn is related to the solution complexity. Liu [150] indicates that the best disposition of points to sample a plane occurs when the points lie on a hexagonal grid as the centers of hexagons and, as Brimkov and Barneva [151] note, hexagonal uniform tiling is optimal for the approximation of a continuous object, requiring the least number of tiles—among all possible tilings—to fill a given area; for these reasons we opt to use a hexagonal grid in our experiments (but other tile shapes are compatible with the techniques presented in Section 3.2). Also, as Papadimitriou and Tsitsiklis [152] show, the complexity of an MDP depends on the variant of the algorithm used to search for Π∗: finite horizon, infinite horizon discounted, or infinite horizon average cost. In Section 3.2.4 we present a parallel implementation of the infinite horizon discounted variant that will, first, reduce the amount of time required to obtain an optimal MDP policy when compared to a CPU implementation and, second, operate while the crowd simulation is executed, rendering the large-problem MDP application affordable in terms of processing time when applied to path planning for crowd simulation. A starting—arbitrary—case scenario consists of a 4 × 4 hexagonal grid, representing a reward-penalty layout for an agent (Figure 3.1(a)). We define 6 possible directions as the set of actions from which an agent will choose at every discrete time evaluation in order to approach its goal (Figure 3.1(b)). By stating that an agent will move along the best path towards a goal position, we imply the solution of a utility function. In this case, Uh([s0, s1, ..., sn]) = R(s0) + γR(s1) + γ²R(s2) + ... + γⁿR(sn) represents the preference of an agent for immediate rewards over future rewards corresponding to states—in this case, cells—in the set {s0, ..., sn}, considering the discount factor γ ∈ [0, 1] until time horizon h. For the basic case scenario, a static obstacle is assigned a penalty (−1.0) and every available cell is assigned the same slight penalty (−0.04).


(a) Rewards R(s).

(b) Set of actions A.

(c) Optimal policy Π∗ .

Figure 3.1: Basic MDP model on a hexagonal grid.

Observation of this model reveals that several solutions, or policies, are feasible in order to reach the cell with the greatest reward (1.0), which represents an exit and is the goal for agent Ai. An optimal policy Π∗, one that maximizes the reward function R given the expected optimal utility from future actions in an infinite horizon, will be computed at a given time step t > 0, for a given state or cell s, as:

Π∗t(s) = argmax_a Qt(s, a)

Qt(s, a) = R(s, a) + γ Σ_{j=0}^{5} Tsja Vt−1(j)          (3.1)

Vt(s) = Qt(s, Π∗(s)),   V0(s) = 0

where Qt(s, a) represents the value of taking action a—in this case, moving towards direction a—from cell s, Vt(s) represents the value of cell s at time t, γ ∈ [0, 1] is a future reward discount factor and Tsja is the transition function, taking into account the probability with which an agent will choose state j from state s through action a. Evaluation of Equation 3.1 for t = 1 will modify values from V0(s) = 0 to V1(s), ∀s ∈ {0, ..., n}, and every successive evaluation will marginally modify these values. If after iteration x it results that Vx−1(s) = Vx(s) ∀s ∈ {0, ..., n}, then the policy has converged and is optimal. Six iterations of Equation 3.1 until convergence for the basic scenario in Figure 3.1(a) render the optimal policy shown in Figure 3.1(c). From this optimal policy we observe:

Observation 1. If a MDP goal cell represents an exit, then a group of agents may be distributed among a set of starting cells, and Π∗ will provide an optimal, collision-free set of general directions—with respect to the scenario, not other agents—towards the exit for all of them.
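To make the update in Equation 3.1 concrete, here is a minimal CPU sketch of infinite-horizon discounted value iteration. For brevity it uses a 1D chain of cells with two actions (left/right) instead of the hexagonal grid with |A| = 6; the Bellman update and the convergence test have the same structure. All names are illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

// A tiny MDP on a chain of cells: action 0 = left, action 1 = right.
// With probability p the intended move succeeds; otherwise the agent
// slips to the opposite neighbor. Moves off the chain keep the agent in place.
struct ChainMDP {
    int n;                        // number of states (cells)
    std::vector<double> reward;   // R(s)
    double gamma;                 // discount factor
    double p;                     // probability of the intended action
};

// Value iteration: returns (V, policy); stops when V stops changing.
std::pair<std::vector<double>, std::vector<int>>
valueIterate(const ChainMDP& m, int maxIter = 1000) {
    std::vector<double> V(m.n, 0.0);   // V0(s) = 0
    std::vector<int> pi(m.n, 0);
    auto clamp = [&](int s) { return s < 0 ? 0 : (s >= m.n ? m.n - 1 : s); };
    for (int it = 0; it < maxIter; ++it) {
        std::vector<double> Vn(m.n);
        double delta = 0.0;
        for (int s = 0; s < m.n; ++s) {
            double bestQ = -1e300;
            for (int a = 0; a < 2; ++a) {
                int intended = clamp(s + (a == 1 ? 1 : -1));
                int slipped  = clamp(s + (a == 1 ? -1 : 1));
                // Q_t(s,a) = R(s) + gamma * sum_j T(s,a,j) * V_{t-1}(j)
                double q = m.reward[s] +
                    m.gamma * (m.p * V[intended] + (1.0 - m.p) * V[slipped]);
                if (q > bestQ) { bestQ = q; pi[s] = a; }
            }
            Vn[s] = bestQ;
            delta = std::max(delta, std::fabs(Vn[s] - V[s]));
        }
        V = Vn;
        if (delta < 1e-9) break;   // converged: V_{x-1}(s) == V_x(s) for all s
    }
    return { V, pi };
}
```

With four cells, R = (−0.04, −0.04, −0.04, 1.0), γ = 0.9 and p = 0.8, the resulting policy sends every cell toward the rewarding end of the chain: the 1D analogue of Figure 3.1(c).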

3.1.2 AGENT DENSITY

Even when a group of agents contained within a cell can be directed towards a goal when each of them moves in the directions established by Π∗, several groups may collide when reaching the same cell from different directions, as Observation 1 notes. To overcome this problem we propose an approach similar to the one used by van Toll et al. [153], in the sense that non-congested routes will be used by agents when the density information indicates it, but:
• Instead of partitioning the environment into walkable regions, we propose to store the agent density information per grid cell, making use of the optimal hexagonal uniform tiling.
• Instead of partially re-planning paths during the simulation, we propose to temporarily override the optimal policy direction if the density threshold τi established for agent Ai is less than the actual agent density at the cell that Π∗ points at.
When density information allows it, the agent will automatically regain the optimal orientation even if the original path has been modified, by simply following once again the direction established by Π∗. Figure 3.2 shows an optimal policy override example: according to Π∗, agent Ai should move towards cell c3, but density σ3 is greater than or equal to the density threshold τi established for Ai. The navigation algorithm determines that c2 and c4 are near c3, and that c4 has the lowest agent density σ4, overriding Π∗, originally towards c3, in favor of c4.

Figure 3.2: Π∗ override based on cell density information.

Notice that this path update approach exploits the nature of a Markovian decision process: since previous decisions do not affect future actions, Observation 1 remains valid at any time step during the simulation.
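The override rule reduces to a single selection function. The sketch below is illustrative only; cell indexing, the neighbor query and all names are hypothetical:

```cpp
#include <vector>

// Density-based override of the optimal policy (Section 3.1.2): follow the
// cell that Pi* points at unless it is too crowded, in which case take the
// least dense of the cells near it. The agent regains the Pi* direction
// automatically as soon as densities allow it, because the choice is
// re-evaluated from scratch every step (no path state is kept).
int chooseNextCell(int policyCell,                    // cell Pi* points at
                   const std::vector<int>& nearby,    // cells near policyCell
                   const std::vector<double>& density,
                   double threshold) {                // tau_i for this agent
    if (density[policyCell] < threshold)
        return policyCell;                            // follow Pi* as usual
    int best = policyCell;
    for (int c : nearby)
        if (density[c] < density[best]) best = c;     // temporary override
    return best;
}
```

Replaying the example of Figure 3.2: Π∗ points at c3, σ3 exceeds τi, and of the nearby cells c2 and c4 the latter is the least dense, so c4 is chosen.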

3.2 ALGORITHM

In the previous section we outlined a strategy to solve the crowd simulation global planning problem based on MDPs and per-cell density. In order to implement this strategy we propose parallel algorithms to:
1. Determine and initialize the MDP parameters that will be used in the computations.

2. Solve navigation for the scenario interpreted as a MDP, obtaining the new Π∗.
3. Use the data from Π∗ to actually displace agents, possibly overriding the direction dictated by Π∗ when the density for the cell is higher than an established threshold.
Each step is explained in the next subsections.

3.2.1 PARAMETERS

The implementation of the proposed method is driven by the following parameters:

1. Scenario parameters
• Tile dimensions. For a hexagonal grid (Figure 3.3), the radius Rhex determines the cardinality of the set of states: |S| = (scenario_width / Shex) × (scenario_height / Hhex).
• Number of agents. The number of agents in the simulated crowd.

2. MDP parameters
• Available agent directions (set of actions A). Six, as modeled in Section 3.1.1, but this can be varied. |A| = 6.
• Discount value (γ ∈ [0, 1]). A low value forces the policy to converge in fewer iterations, but with less accurate results. For this work we use γ = 0.9.
• Number of cells (set of states S). Depends on the value of Rhex as described in the scenario parameters; the number of cells also determines the number of densities to store. MDProws = scenario_height / Hhex, MDPcolumns = scenario_width / Shex.
• Value of penalties and rewards (reward function R). The values assigned for the basic scenario in Section 3.1.1 are used for the sake of clarity, but every cell in a given scenario may be assigned a different reward or penalty as needed to model a diverse scenario. Several exits can even be simulated by assigning them high rewards, providing flexibility to model complex scenarios.
• Transition probability (transition function Tsja). The probability that an agent will go from state s to state j by taking action a also modifies the number of iterations required for the policy to converge. For this work we use a probability p = 0.8 to estimate that an agent will indeed take action a, and q = (1 − p) / (|A| − 1) for the other actions.

3. Density parameters
• Density threshold per agent (τi). The density value that determines when an agent should take an alternate route, as described in Section 3.1.2.
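The grid bookkeeping above reduces to a few lines. This sketch assumes the counts are rounded up so that the tiling covers the whole scenario (the rounding rule is our assumption; names are illustrative):

```cpp
#include <cmath>

// MDP grid dimensions from the scenario size and the hexagon radius
// (Section 3.2.1): horizontal step S_hex = (3/2) * R_hex, vertical step
// H_hex = sqrt(3) * R_hex (see Figure 3.3).
struct GridDims { int rows, columns, states; };

GridDims mdpDims(double scenarioWidth, double scenarioHeight, double rHex) {
    double sHex = 1.5 * rHex;
    double hHex = std::sqrt(3.0) * rHex;
    int columns = (int)std::ceil(scenarioWidth / sHex);   // MDP_columns
    int rows    = (int)std::ceil(scenarioHeight / hHex);  // MDP_rows
    return { rows, columns, rows * columns };             // |S| = rows * columns
}
```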

Figure 3.3: Hexagonal tile. Rhex is the hexagon radius, Shex = (3/2) · Rhex, Hhex = √3 · Rhex and Whex = 2 · Rhex.

3.2.2 MDP SOLVING ALGORITHM

In order to solve the MDP formalism on the GPU, we require an algorithm that is able to perform:
1. State data collection. Translating the geometric nature of the hexagonal uniform tiling into data suitable for parallel computations, namely the scenario and MDP parameters described in Section 3.2.1 and Figure 3.5.
2. Input generation. The creation of the data arrays and the transition matrix.
3. Value iteration. The output data from Step 2 is used as the input for Equation 3.1, which is solved by parallel computations.
4. Additionally, the algorithm should be modular, so that every—process intensive—iteration can be interleaved with the process of rendering frames, in order to achieve interactive rates when solving dynamic scenarios, as shown in Figure 3.6 and discussed in Section 3.2.2.

State data collection

As shown in Figure 3.4, the geometric representation of a hexagonal uniform tiling may be translated into a matrix representation when considering the half-height displacement for odd columns, allowing translation back and forth between the geometric form, used for rendering, and an array-based form, used for parallel computations. The remaining problem is how to treat the scenario and MDP parameters, also in matrix form, in order to obtain Π∗ with a parallel algorithm. We collect neighboring, directional information for each MDP state and store it in array form. Figure 3.7 shows an example of a 2 × 2 grid to illustrate the data collection process with a mask, which requires the temporary addition of a hexagonal tile perimeter in order to allow data collection from border tiles.


Figure 3.4: Hexagonal tiling. a) Matrix indexing and geometrical representation, where half-height displacement occurs for odd columns. b) Neighboring tile estimation for even columns. c) Neighboring tile estimation for odd columns.

Input generation

The required data structures are generated in two steps:

1. Creation of data arrays. In order to store state data, the following arrays are prepared:
• Rewards array (RW). For each state, stores the reward of the neighboring states.
• Previous value array (PV). Initially the same as the Rewards array; updated after each value iteration.
• In-bounds array (IB). Describes whether an action is valid for the scenario topology: 1 if it is valid, 0 otherwise.
• Is-wall array (IW). Describes whether an action is invalid due to the presence of a static obstacle: 1 if the action leads to a state with an obstacle, 0 otherwise.
• In-bounds-Is-wall array (BW). BW = IB − IW yields an array indicating valid actions for an agent with a 1, and invalid actions with a 0, due either to the presence of an obstacle or to the limits of the tiling.
• Reachable array (RE). A reduction where each element of RE holds the sum of |A| consecutive elements of BW. This structure indicates how many states are reachable from a given state, discarding invalid actions due either to the presence of an obstacle or to the limits of the tiling. REi ∈ [0, |A|] ∀i ∈ {0, ..., |S|}.
• Reduction array (RX). Auxiliary index array used in Section 3.2.2, defined as: RX = [⌊0/|A|⌋, ⌊1/|A|⌋, ..., ⌊|S|/|A|⌋].


Figure 3.5: Method pipeline overview. In order to achieve an interactive frame rate, MDP solution, navigation, local collision avoidance and rendering are computed with parallel algorithms. Dotted lines represent written data, and solid lines represent read data.

2. Creation of the Transition matrix (Tsja). At this stage, the transition function Tsja is encoded in a 2D matrix with the probability p.

(a) Probability matrices and auxiliary diagonal matrices are generated, all with dimensions |A| × |A|:

TP = [ p ... p ; ... ; p ... p ]          TQ_{r,c} = [ qi ... qi ; ... ; qi ... qi ]

DA = [ 1 ... 0 ; ... ; 0 ... 1 ] (ones on the diagonal, zeros elsewhere)
DB = [ 0 ... 1 ; ... ; 1 ... 0 ] (the complement of DA: zeros on the diagonal, ones elsewhere)

where r ∈ [1, MDProws] and c ∈ [1, MDPcolumns]. Also, qi = (1 − p) / (REi − 1), with i = r × MDPcolumns + c. This ensures that the sum of the probabilities involved in agent decisions is always 1.

(b) The addition of Hadamard products contributes a region of Tsja:

T_{r,c} = (TP ◦ DA) + (TQ_{r,c} ◦ DB)

Every row of T_{r,c} thus represents a permutation of probabilities for an agent to choose one action from set A over the remaining |A| − 1 actions from the MDP cell at row r and column c. The dimensions of matrix Tsja are (MDProws · |A|) × (MDPcolumns · |A|):

Tsja = [ T_{1,1} ... T_{1,MDPcolumns} ; ... ; T_{MDProws,1} ... T_{MDProws,MDPcolumns} ]

(c) This Tsja does not yet account for scenario obstacles and limits, so invalid entries are discarded with Hadamard products by a modified BW array, repeating every |A|-tuple |A| times to match the dimensions of Tsja and expressing it in matrix form:

Tsja = Tsja ◦ BWm
Tsja = Tsja ◦ BWm−1

Apparently unnecessary repetition of data is present in the described data structures, but it exists in exchange for speed, allowing bulk parallel processing towards the computation of Π∗ in the following steps.

Figure 3.6: MDP iteration overview. In order to achieve an interactive frame rate, our approach splits the MDP solution in stages over time. A change in scenario topology (frame_{i−1}) will sequentially trigger: i) initialization of data structures on the host (frame_i): update of the reward function R(s); ii) data upload to the GPU: scenario and MDP parameters (frame_{i+1}); iii) MDP iteration on the GPU towards the optimal policy until convergence, achieved in c frames or stopped at a safety threshold of 1,000 iterations (frame_{i+2} ... frame_{i+1+c}); iv) Π∗ update for the navigation algorithm when convergence is achieved (frame_{i+2+c}).

Value iteration

When the input data is ready in structures Tsja, RW, PV and RX, parallel algorithms can solve the following operations for every value iteration, as modeled in Equation 3.1:


Figure 3.7: States data collection example. a) Original 2 × 2 grid: indices and values. b) Surrounding perimeter for collection. c) Radius 1 mask to collect In-bounds data from cell 0 neighbors.

1. Computation of Q-values.
• Πt = RW + (γ · Tsja · PV) | a ∈ {1, ..., |A|} ∧ j ∈ {1, ..., |A|} ∧ s ∈ {0, ..., |S|}.
• Reduce Πt by key, where the keys are the values from RX. At this point, Πt holds an array of |A| consecutive values for every state.
2. Selection of best Q-values. From every consecutive |A|-tuple in Πt, the index of the largest value indicates the current best policy for the corresponding state, and the current iteration is complete. This selection is computed with another reduction operation.
3. Check for convergence. If the parallel subtraction of Πt−1 from Πt yields an array containing only zeros, then Πt ≡ Π∗ and the iteration process is complete.

3.2.3 COMPLEXITY

Let N = |S| and M = |A|, then solving the MDP in k number of steps, i.e. finite horizon case, and when the agent’s state is fully observable has a complexity of O(kM N 2 ) when the value iteration algorithm is used [154].

3.2.4 IMPLEMENTATION

In previous sections we have covered the steps required by our MDP solver to find the Optimal Policy Π∗ . These stages were defined in terms of matrices in order to map computations to the GPU programming model. In this section we will explain how the use of parallel transformation and reduction operations, which are exposed in the Thrust Library [155], provided the means to solve MDPs via Value Iteration.

Thrust is a parallel algorithms library based on the C++ Standard Template Library (STL). It provides a high-level interface that is fully interoperable with CUDA C. It eases common operations such as memory allocation and transfers from host to device and vice versa, and it allows developers to implement GPU code in terms of data-parallel patterns. Our MDP solver was designed to take advantage of Thrust's functionality. Thrust defines two kinds of vector containers, host_vector and device_vector: the first is stored in host memory, while a device_vector is allocated in GPU device memory. In the next paragraphs we will only show the relevant GPU pseudo code, omitting sections where data is transferred between host and device memory. The algorithms relevant to our MDP solver implementation are parallel transformations and reduce-by-key operations. The transform operation applies a unary function to each element of an input vector and stores the result at the corresponding position in an output vector; the unary operations we use are multiplication (mult) and summation (sum). Reduction operations, on the other hand, use a binary operation to “reduce” an input vector to a single value. In particular, reduce_by_key performs a reduction for key-value pairs: for each set of consecutive keys in the range [keys_first, keys_last) that are equal, reduce_by_key copies the first element of the group to the keys output. The corresponding values in the range are reduced using the plus operator and the result is copied to the values output. Listing 3.1 shows an overview of the pseudo code to solve a MDP on the GPU. Initially the rewards are stored in a comma-separated values (CSV) file. At runtime these rewards are read from the CSV file and loaded into host memory (line 1). According to this information, the MDP data structures are initialized on the host side (line 2), transferred to device memory, solved and finally transferred back to the host (lines 3-4).
Listing 3.1: MDP Solver on GPU pseudo code overview
1  rewards = read_matrix( csv );
2  init_mdp();
3  mdp_init_permutations_on_device(...);
4  mdp_iterate_on_device(...);
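Since the solver's data flow is built entirely from the transform and reduce-by-key primitives, their semantics are worth pinning down. The following is a minimal CPU sketch that mirrors the behavior of thrust::reduce_by_key with the plus operator (the function name and types here are illustrative, not part of the solver):

```cpp
#include <cstddef>
#include <vector>

// CPU sketch of thrust::reduce_by_key: consecutive equal keys form a group;
// the group's key is emitted once and its values are summed.
void reduce_by_key(const std::vector<int>& keys,
                   const std::vector<float>& values,
                   std::vector<int>& keys_out,
                   std::vector<float>& values_out) {
    keys_out.clear();
    values_out.clear();
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (keys_out.empty() || keys_out.back() != keys[i]) {
            keys_out.push_back(keys[i]);
            values_out.push_back(values[i]);
        } else {
            values_out.back() += values[i];
        }
    }
}
```

For keys {0, 0, 1, 1} and values {1, 2, 3, 4} this produces keys {0, 1} and values {3, 7}, which is exactly the per-state summation pattern the solver uses to accumulate Q-values.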

Listing 3.2 illustrates the creation of the T_{sja} transition matrix (T*) from the BW array (dev_BW) and the D_A (dev_DA), D_B (dev_DB), T_P (dev_T1) and T_{r,c}^Q (dev_T2) matrices. First, the auxiliary diagonal matrices are filled as described in Section 3.2.2 (lines 8-9). Second, matrices T_P and T_{r,c}^Q are filled with probabilities p and q_i respectively (lines 10-11); filling of matrix T_{r,c}^Q requires information from the BW array to account for scenario obstacles and limits, ensuring that the sum of probabilities for chosen actions is always 1. Third, T_{sja} is filled (line 12). Parallelization of the T_{sja} matrix generation is straightforward, since only addition and multiplication of floating point values are involved; these translate to standard parallel transformation operations from their sequential counterparts.


Listing 3.2: mdp_init_permutations_on_device(...)
1   float p = 0.8;
2   thrust::device_vector dev_BW;
3   thrust::device_vector T*;
4   thrust::device_vector dev_T1;
5   thrust::device_vector dev_T2;
6   thrust::device_vector dev_DA;
7   thrust::device_vector dev_DB;
8   fill_DA( dev_DA );
9   fill_DB( dev_DB );
10  fill_T1( dev_T1 );
11  fill_T2( dev_T2, dev_BW );
12  fill_T( p, T*, dev_DA, dev_T1, dev_DB, dev_T2 );

Listing 3.3 shows the pseudo code implementing the Value Iteration algorithm, following the steps from Section 3.2.2. First, Q-values are computed and stored in the dev_Q vector with parallel transformation and reduce-by-key operations (lines 15-18), using the previous value vector dev_PV. Second, the best Q-values are discriminated by selecting the largest value generated at the present iteration; the best value-policy tuple is stored in the current value vector dev_CV and the current policy vector dev_CP respectively (lines 20-23). Third, also in parallel, the previous policy vector dev_PP is subtracted from the current policy vector dev_CP; when the resulting vector contains only zeroes, meaning that no change was detected in the policy, convergence is achieved and dev_CP holds the Optimal Policy (lines 25-33).


Listing 3.3: mdp_iterate_on_device(...)
1   float discount = 0.9f;
2   bool convergence = false;
3   thrust::maximum op;
4   thrust::equal_to pred;
5   thrust::device_vector dev_Q;
6   thrust::device_vector dev_PV;
7   thrust::device_vector dev_CV;
8   thrust::device_vector dev_PP;
9   thrust::device_vector dev_CP;
10  thrust::device_vector indices;
11  thrust::device_vector indices8;
12  thrust::device_vector indices64;
13  while( !convergence ){
14    //1. RW+discount(T*)(PV)
15    thrust::transform( T*, dev_PV, T*, mult );
16    thrust::transform( T*, discount, T*, mult );
17    thrust::transform( T*, rewards, T*, sum );
18    thrust::reduce_by_key( indices64, T*, dev_Q );
19    //2. Select best Q-values
20    ZipIterator in ( make_tuple(dev_Q, indices) );
21    ZipIterator out( make_tuple(dev_CV, dev_CP) );
22    thrust::reduce_by_key( indices8, in, out,
23                           pred, op );
24    //3. Check for convergence:
25    thrust::transform( dev_CP.begin(),
26                       dev_CP.end(),
27                       dev_PP.begin(),
28                       dev_result.begin(),
29                       thrust::minus );
30    if( thrust::count( dev_result.begin(),
31                       dev_result.end(),
32                       0 ) == mdp_size )
33      convergence = true;
34  } // while
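The loop above computes the classic Value Iteration recurrence. A serial CPU sketch of the same logic, including the policy-difference convergence test of step 3, could look as follows (the toy data layout is illustrative and not the thesis' matrix formulation):

```cpp
#include <cstddef>
#include <vector>

// Minimal Value Iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ].
// Convergence is declared when the greedy policy stops changing, which
// mirrors the dev_CP - dev_PP subtraction test used on the GPU.
std::vector<int> value_iteration(
        const std::vector<std::vector<std::vector<float>>>& T, // T[s][a][s']
        const std::vector<std::vector<float>>& R,              // R[s][a]
        float gamma) {
    const std::size_t S = T.size(), A = T[0].size();
    std::vector<float> V(S, 0.0f);
    std::vector<int> policy(S, 0), prev(S, -1);
    while (policy != prev) {
        prev = policy;
        std::vector<float> newV(S);
        for (std::size_t s = 0; s < S; ++s) {
            float best = -1e30f;
            int bestA = 0;
            for (std::size_t a = 0; a < A; ++a) {
                float q = R[s][a];                     // immediate reward
                for (std::size_t sp = 0; sp < S; ++sp) // discounted expectation
                    q += gamma * T[s][a][sp] * V[sp];
                if (q > best) { best = q; bestA = (int)a; }
            }
            newV[s] = best;
            policy[s] = bestA;
        }
        V = newV;
    }
    return policy;
}
```

The GPU version performs the same computation, but the inner sums become transform plus reduce-by-key passes over flattened matrices.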

3.3 CROWD NAVIGATION

Optimal Policy Π∗ is used to steer a group of agents in a virtual environment. The navigation algorithm follows these steps in parallel:

1. For each agent, record the initial relative displacement from its tile center, so the agent can be oriented whenever the same relative displacement is detected (Figure 3.8) at another cell.

2. Determine the current hexagonal tile for an agent given its scenario coordinates, considering the parameterization shown in Figure 3.4. Then search for the corresponding Π∗ action to take from the agent's current state, as Figure 3.8 shows.

3. Predict the next hexagonal tile for the agent, as shown in Figure 3.2.

4. Orient the agent in the corresponding direction and displace it, using a Local Collision Avoidance method to evade impacts against other agents; either a discrete [92] or continuum [44] representation may be used for a subset of cells, according to agent density or level of detail. Collision avoidance, however, is out of the scope of the present work.
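Step 2, locating the hexagonal tile that contains an agent, can be sketched with a standard axial-coordinate lookup (this sketch assumes pointy-top hexagons of circumradius R centered at the origin; the thesis' exact parameterization in Figure 3.4 may differ in orientation and origin):

```cpp
#include <cmath>

// Map world coordinates (x, y) to the enclosing hexagonal cell in axial
// coordinates (q, r), assuming pointy-top hexagons of circumradius R.
void worldToHex(double x, double y, double R, int& q_out, int& r_out) {
    // Fractional axial coordinates.
    double q = (std::sqrt(3.0) / 3.0 * x - y / 3.0) / R;
    double r = (2.0 / 3.0 * y) / R;
    // Cube rounding: round each cube coordinate, then fix the one with the
    // largest rounding error so that cx + cy + cz == 0 still holds.
    double cx = q, cz = r, cy = -cx - cz;
    double rx = std::round(cx), ry = std::round(cy), rz = std::round(cz);
    double dx = std::fabs(rx - cx), dy = std::fabs(ry - cy), dz = std::fabs(rz - cz);
    if (dx > dy && dx > dz)      rx = -ry - rz;
    else if (dy > dz)            ry = -rx - rz;
    else                         rz = -rx - ry;
    q_out = (int)rx;
    r_out = (int)rz;
}
```

With the cell index in hand, the Π∗ action lookup of step 2 is a plain array access per agent, which is why the whole navigation loop parallelizes trivially.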

Figure 3.8: Direction evaluation point. A new direction is determined for agent A_i with the aid of Π∗ and cell density whenever the displacement of A_i relative to the cell center matches the recorded one.

3.4 RESULTS

We designed three experiments to inspect the behavior of our method. The first one consisted of evaluating our algorithm, executed on CPU platforms, when calculating the optimal policy for the case scenario shown in Figure 3.9. This scenario represents an office complex discretized using 1,584 cells. Experiments one and two consider the MDP solving algorithm using a traditional square tiling, while experiment three considers an interactive navigation case using the extended version of the algorithm that uses hexagonal tiling. The CPU platforms used for this test were: i) an embedded system, the Jetson TK1 development kit [156], with a 32-bit ARM quad-core Cortex-A15 CPU running at 2.32 GHz; ii) a desktop system with a 64-bit Intel Core i7 CPU running at 3.40 GHz.

The Thrust library supports different back-end systems, which control how algorithms implemented with Thrust get mapped to and executed on the available parallel platforms; the supported back ends are OpenMP, Intel Threading Building Blocks (TBB) and CUDA. For this test, our algorithm was executed on the OpenMP back end using up to 128 threads on both platforms. Figure 3.10(a) shows performance results for the first test. Finding the Optimal Policy Π∗ on the Intel CPU using one thread took 78.11s, whereas on the ARM processor it took 182.34s; the Optimal Policy was found in 178 iterations on both CPUs. Performance increases when using up


Figure 3.9: Office complex scenario used in the tests, composed of 1,584 cells. Different routes after Optimal Policy Π∗ calculations are highlighted.

GPU             Office Complex Time (s)
Jetson TK1      2.61769
Tesla K40c      0.363895
GTX Titan       0.320886

Table 3.1: Time to find Π∗ for the Office Complex Scenario composed of 1,584 cells using GPU architectures.

to eight threads; in this case Π∗ was calculated in 28.01s using the Intel CPU, and in 79.94s using the ARM CPU. Above eight threads there are no significant time gains. Further results are illustrated in Figure 3.10(b), which shows the speedup of the algorithm on both CPUs when using different numbers of threads. Notice that even though the Intel CPU performed better than the ARM processor, both show similar behavior, reaching the maximum speedup at eight threads.

The second experiment consisted of testing the algorithm on embedded and desktop GPU platforms using the Thrust CUDA back end. The embedded GPU is part of the Jetson TK1 development kit and is based on the Tegra K1 mobile GPU [157], which has 192 CUDA cores. The desktop GPU platforms were a Tesla K40c (2,880 CUDA cores) and a GeForce GTX TITAN (2,688 CUDA cores). For this experiment we also used the office complex scenario (Figure 3.9); its results are shown in Table 3.1. The GPU version of the algorithm took 184 iterations to converge to Π∗. In Figure 3.10(c), we show the speedup obtained with GPUs when compared to the best Intel and


Figure 3.10: Performance and speedup. (a) Performance results executing our MDP algorithm on Intel and ARM CPUs using different numbers of threads. (b) Speedup obtained using different numbers of threads on Intel and ARM CPUs. (c) Speedup obtained with the CUDA version of the algorithm.

ARM CPU performance (eight threads). Results confirm that our MDP implementation running on the GPU achieves up to almost 90x and 250x better performance than the same version running on Intel and ARM CPUs respectively, which is crucial for real-time crowd simulation in order to obtain interactive responsiveness.

The third test consisted of measuring the performance of interactive 3D crowd simulations with the full implementation of the MDP-solving and navigation algorithms, as described in Section 3.2 and Section 3.3. In addition, these algorithms were coupled with an interactive crowd rendering engine, implemented with OpenGL, to visualize detailed crowds using animated 3D characters.


Figure 3.11: Crowd navigating in dynamic environments. (a) MDP policy represented by directions. (b) Cell density represented by shades of color. (c) Narrower grid partitioning with a dynamically introduced exit point. (d) Simulated agents. (e) Navigation with dynamically introduced obstacles and exit points. (f)-(h) Different perspectives of the Office Complex Scenario taken at different moments of the simulation: (f) top view; notice how the characters form groups when reaching the exit located at the bottom left of the image; (g) perspective view; (h) close-up view.


MDP Size (Rows x Columns)    Cells     K40 Time (s)    GTX Titan Time (s)
8 x 8                        64        0.011134        0.010902
16 x 16                      256       0.049265        0.039090
32 x 32                      1,024     0.051187        0.051288
64 x 64                      4,096     0.983392        0.941780
128 x 128                    16,384    1.121590        1.065430
256 x 256                    65,536    2.123859        2.115497

Table 3.2: Time to find Π∗ for a virtual environment discretized with hexagonal tiles.

Figure 3.11 shows different views of a virtual environment that has been incrementally discretized into hexagonal cells. Table 3.2 reports the MDP solver performance on Tesla K40 and GeForce Titan GPUs. As a pre-process, the MDP is solved and Π∗ is used to steer agents in the virtual environment. Figure 3.11 also shows different perspectives of the Office Complex rendered interactively. Both environments were rendered using 4,096 characters at an average time of 16 ms per frame.

Further timing of the algorithm showed that, for the K40c or GTX Titan GPUs, the time required to upload data structures to the GPU, as well as the time required to update the policy data, is less than 1 millisecond. Therefore CPU-GPU data transfer is not a bottleneck for the proposed method. On the other hand, as implemented, MDP Value Iteration spends most of its processing time in the generation of matrix T_{sja}, and not in the actual solving of the MDP as would be expected.

The proposed method relies heavily on parallel algorithms, requiring a modern CPU-GPU platform to be executed. Also, the number of parameters to be finely tuned for the proposed method is high; but rather than considering this a limitation, we see an opportunity to generate finely-tuned simulations.

Finally, we have designed a case to compare the MDP method with a classic path-finding algorithm. Figure 3.12 presents the solution for a 400-cell scenario (20 rows and 20 columns), of which 164 cells represent obstacles, in order to compare performance and memory requirements when finding optimal navigation paths with the A∗ algorithm (Figure 3.12(a) and Table 3.3, using the implementation by Heyes [158]) and with the MDP (Figure 3.12(b) and Table 3.4). We found that execution time for A∗ depends on the number of agents to process, whereas the MDP solution is completed in constant time regardless of the number of agents.
To process all 236 free cells in the test scenario, the MDP-based method takes 0.02079 seconds on average per agent, whereas A∗ takes 0.078 seconds. We also found that, to process these 236 cells, A∗ requires 6,580 bytes on average per agent to store node lists, while the MDP-based method requires 1,803 bytes on average per agent to store Π∗_t, Q_t and V_t in vectors [159].


Figure 3.12: Test scenario to compare A∗ and MDP. (a) A∗ solution. (b) MDP solution.

Method                    A∗
CPU                       Intel Core i7
Options                   4
Goals able to process     1
Solution time             0.078 s
236 agents time           18.408 s
Solution steps            46
Search steps              235
Processed agents          1
Total memory              6.42 KB
Agent memory              6.42 KB

Table 3.3: A∗ performance

Method                    MDP
GPU                       NVIDIA GeForce GTX 780M
Options                   8
Goals able to process     235
Solution time             4.907 s
236 agents time           4.907 s
Solution steps            39
Iterations                167
Processed agents          236
Total memory              415.63 KB
Agent memory              1.76 KB

Table 3.4: MDP performance

3.5 CONCLUSION

We have presented a method to simulate crowds navigating in a dynamic environment, based on Markov Decision Processes over a uniform hexagonal tiling and designed to take advantage of the GPU. Crowd navigation paths estimated with the presented method are not limited by the number of obstacles the method can handle. The GPU mapping of the MDP solver was made possible by formulating the algorithm in terms of matrix operations, thus taking advantage of the massive data parallelism available in GPU computing to reduce the MDP computing time. For optimal performance, the proposed algorithm's state set is a hexagonal tile partition of the observable space, solving the "where to go" problem presented by Chen and Lu [72] while, in addition, producing the specific path for the agents to walk along until their goal is reached.

In addition, taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance (Table 3.2) as an alternative to Noer's [73] approximate solution over a POMDP. We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with optimal performance, simplifying and advancing the solver introduced by Jóhannsson [71].

Notice that global path planning is computed from the Optimal Policy generated by the MDP together with the measure of agent densities per grid cell. This makes the solution semi-optimal when the algorithm uses the Density Parameter. This parameter is important because it avoids crowd jams by overriding the Optimal Policy and allowing agents to move towards less crowded areas. On the other hand, our method is not a Local Collision Avoidance (LCA) solution, but a coarser approach that establishes more general directions over all the navigable space; for this reason our implementation remains in the context of fully observable MDPs.

Implementation with the Thrust library allowed us to test and analyze the performance of our MDP solver on different architectures. We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively. Interactive or real-time response is important in crowd applications such as training, steering crowd simulations, visualization of simulation results and entertainment, among others. We also found the Jetson TK1 GPU to have remarkable performance, opening many possibilities to incorporate real-time MDP solvers in mobile robotics.

We have also compared our MDP solver to other path planning algorithms, as shown in Table 3.5, noting that its parallel nature gives it unique advantages for crowd simulation path planning, such as the ability to handle multiple exit points and obstacles, each with a different priority.


Table 3.5: Comparison with other path planning methods

LPA* [160]: Incremental and heuristic search; an incremental version of A*. Base complexity depends on the heuristic function: O(log(h∗(x))). Inherently SERIAL. Shortest path guaranteed: YES. Dynamic scenario: NO. Parallel multi-agent path planning: NO. Multiple priority exit points: NO.

D* Lite [161]: Incremental and heuristic search; recalculates goal distances that have changed. Inherently SERIAL. Shortest path guaranteed: YES. Dynamic scenario: YES. Parallel multi-agent path planning: NO. Multiple priority exit points: NO.

ADA* [162]: Incremental replanning; a combination of D* Lite and ARA*. Inherently SERIAL. Shortest path guaranteed: YES. Dynamic scenario: YES. Parallel multi-agent path planning: NO. Multiple priority exit points: NO.

APPR [163]: Dynamic planning in state-time space; probabilistic sampling and deterministic planning. Inherently SERIAL. Shortest path guaranteed: NO. Dynamic scenario: YES. Parallel multi-agent path planning: NO. Multiple priority exit points: NO.

ARA* [164]: Progressively optimal heuristic search; tunes performance based on available search time. Inherently SERIAL. Shortest path guaranteed: YES. Dynamic scenario: YES. Parallel multi-agent path planning: NO. Multiple priority exit points: NO.

MDP: Dynamic planning in state space; uniform sampling and ML-based planning. Base complexity: O(kMN²). Inherently PARALLEL. Shortest path guaranteed: NO. Dynamic scenario: YES. Parallel multi-agent path planning: YES. Multiple priority exit points: YES.

4. CELLULAR AUTOMATA FOR LOCAL COLLISION AVOIDANCE

After completing the development of an MDP-based technique for Navigation, delivering Long-Range paths for a Macro crowd simulation, our next objective is to implement a Local Collision Avoidance (LCA) technique that is not only compatible with the Navigation stage but can also be coupled with its characteristics, in order to achieve Objective 3. We intend to show that, working together, Macro and Micro simulations can share and optimize computational resources. The final approach to this objective comes from a series of experiments whose goal is to exploit the already established, although rather coarse, functional grid partitioning scheme from the Navigation stage. Experiments towards our final LCA approach make use of the square grid, which is easier to implement and test, but remain compatible with the optimized hexagonal grid discussed for Navigation.

The present chapter is organized in the order in which our approaches were developed: in Section 4.1 we present the Micro Scenarios technique for LCA, listing its characteristics and limitations for our integrated method. In Section 4.2 an implementation of a parallel lattice-Boltzmann solver for guided crowds is discussed, also with a description of its characteristics and limitations for our integrated method. In Section 4.3, discussing our final LCA approach, the LGCA framework provides the basis for a Cellular Automaton applied to LCA for crowds. We present our conclusion for this chapter in Section 4.4.


4.1 MDP AND MICRO SCENARIOS APPLIED TO LOCAL COLLISION AVOIDANCE FOR CROWDS

Once an agent is assigned a static, obstacle-free MDP-Bézier generated path, it will reach its goal without collisions if no changes are registered in the scenario (Figure 4.1(a)). This case, however, is not probable, for two reasons:

• A crowd simulation implies that agents will compete to occupy the available navigational space as each of them tries to reach its goal.

• Not every scenario is static, for example in the case of a simulated earthquake-triggered evacuation.

Figure 4.1: Collision Avoidance problem. (a) Static scenario. (b) Dynamic scenario.

When evacuation scenarios are not static, agents need to avoid collisions with each other (Figure 4.1(b)). We propose a GPGPU-based technique that provides Adaptive MDP-Bézier (AMDP) paths for agents.

Figure 4.2: Micro scenarios of radius 1 and their respective signatures.

A micro scenario describes the possible directions an agent will choose from to avoid collision against moving objects, whether obstacles or other agents. Depending on the number and position of these objects in motion, a specific micro scenario is selected to program a small, MDP-precalculated trajectory segment in real time. Figure 4.2 shows 3 × 3 cell, radius-1 micro scenarios; the first two represent the case where no objects in motion exist around the agent, which has two different possible goals. The final micro scenario presents the case of an agent surrounded by objects in motion.
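One simple way to realize such per-configuration lookups, assuming a radius-1 neighborhood, is to encode the occupancy of the eight surrounding cells as a bitmask; the bit ordering below is an illustrative assumption, not the thesis' actual signature layout:

```cpp
// Encode a radius-1 micro scenario as a bitmask over the 8 cells surrounding
// the agent (row-major order, skipping the center cell). Each of the 256
// possible signatures can then index a table of precalculated trajectories.
unsigned signatureOf(const bool occupied[8]) {
    unsigned sig = 0;
    for (int bit = 0; bit < 8; ++bit)
        if (occupied[bit]) sig |= 1u << bit;
    return sig;
}
```

A larger radius grows the signature space exponentially (3 × 3 gives 2^8 configurations, 5 × 5 gives 2^24), which anticipates the precalculation cost discussed in the limitations of this section.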

4.1.1 CUBIC BÉZIER LEAST-SQUARE FITTING

If the array of directions containing the Optimal Policy Π∗ were used directly at the crowd rendering stage, agents would translate between discrete positions in an unrealistic fashion; therefore an extra step is needed to provide each agent with smooth translation paths. This step relies on the recursive least-square fitting (LSF) algorithm presented by Shao and Zhou [165] to achieve the mapping from a set of cells to a set of cubic Bézier curves (Figure 4.3(b)).

Figure 4.3: Reference scenario solution. (a) Computed discrete Optimal Policy. (b) Bézier curves form evacuation paths.

The fitting algorithm operates as follows. The equation for a cubic Bézier curve is defined as:

q(t_i) = (1 - t_i)^3 P_0 + 3t_i(1 - t_i)^2 P_1 + 3t_i^2(1 - t_i) P_2 + t_i^3 P_3    (4.1)

and the least square fitting (LSF) equation is:

S = \sum_{i=1}^{n} [p_i - q(t_i)]^2    (4.2)

where p_i represents the array of cell center points obtained by following the Optimal Policy, and q(t_i) their approximated counterparts. From Equation 4.1 and Equation 4.2:

S = \sum_{i=1}^{n} [p_i - (1 - t_i)^3 P_0 - 3t_i(1 - t_i)^2 P_1 - 3t_i^2(1 - t_i) P_2 - t_i^3 P_3]^2    (4.3)

Since P_0 and P_3 are fixed, in the sense that the curve will intersect them, the remaining calculations are for P_1 and P_2, which determine the curvature. Setting the partial derivatives of S with respect to them to zero,

\partial S / \partial P_1 = 0    (4.4)
\partial S / \partial P_2 = 0    (4.5)

yields

P_1 = (A_2 C_1 - B C_2) / (A_1 A_2 - B^2)    (4.6)
P_2 = (A_1 C_2 - B C_1) / (A_1 A_2 - B^2)    (4.7)

where:

A_1 = 9 \sum_{i=1}^{n} t_i^2 (1 - t_i)^4
A_2 = 9 \sum_{i=1}^{n} t_i^4 (1 - t_i)^2
B = 9 \sum_{i=1}^{n} t_i^3 (1 - t_i)^3
C_1 = \sum_{i=1}^{n} 3t_i (1 - t_i)^2 [p_i - (1 - t_i)^3 P_0 - t_i^3 P_3]
C_2 = \sum_{i=1}^{n} 3t_i^2 (1 - t_i) [p_i - (1 - t_i)^3 P_0 - t_i^3 P_3]

and (A_1 A_2 - B^2) ≠ 0.

By recursively partitioning the result set at the largest squared difference until an established threshold is reached, Equation 4.6 and Equation 4.7 output control points for the desired smooth paths. The GPU-based MDP solver and Cubic Bézier LSF algorithms generate smooth evacuation paths for agents making use of Observation 1 (Figure 4.4(a)). The scenario in Figure 4.4(b) illustrates a more elaborate exit path, and the corresponding simulation screen capture is shown in Figure 4.4(c).
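The closed-form solution above translates almost directly into code. The following sketch evaluates A_1, A_2, B, C_1 and C_2 and solves Equations 4.6 and 4.7 for one curve segment (the recursive partitioning step is omitted, and the caller is assumed to supply parameters t_i with A_1 A_2 - B^2 != 0):

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { double x, y; };

// Least-squares fit of the interior control points P1, P2 of a cubic Bezier
// with fixed endpoints P0, P3, given sample points p[i] at parameters t[i]
// (Equations 4.3-4.7).
void fitCubicBezier(const std::vector<Vec2>& p, const std::vector<double>& t,
                    Vec2 P0, Vec2 P3, Vec2& P1, Vec2& P2) {
    double A1 = 0, A2 = 0, B = 0;
    Vec2 C1{0, 0}, C2{0, 0};
    for (std::size_t i = 0; i < p.size(); ++i) {
        double ti = t[i], s = 1.0 - ti;
        double b1 = 3.0 * ti * s * s;    // Bernstein basis of P1
        double b2 = 3.0 * ti * ti * s;   // Bernstein basis of P2
        A1 += b1 * b1;  A2 += b2 * b2;  B += b1 * b2;
        // Residual after removing the fixed endpoint contributions.
        double rx = p[i].x - s * s * s * P0.x - ti * ti * ti * P3.x;
        double ry = p[i].y - s * s * s * P0.y - ti * ti * ti * P3.y;
        C1.x += b1 * rx;  C1.y += b1 * ry;
        C2.x += b2 * rx;  C2.y += b2 * ry;
    }
    double det = A1 * A2 - B * B;        // assumed nonzero (Equation 4.7)
    P1 = { (A2 * C1.x - B * C2.x) / det, (A2 * C1.y - B * C2.y) / det };
    P2 = { (A1 * C2.x - B * C1.x) / det, (A1 * C2.y - B * C1.y) / det };
}
```

Note that A_1 = \sum (3t(1-t)^2)^2 = 9 \sum t^2(1-t)^4, so the 9 factors of the equations are absorbed into the squared basis terms.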

4.1.2 MICRO SCENARIO SELECTION ALGORITHM

The algorithm to select micro scenarios is described next:


Figure 4.4: Rendering of scenarios. (a) Reference scenario simulation. (b) Random Bézier samples. (c) Simulation rendering.

1. In a pre-processing step, the MDP-Bézier Solver is used to determine optimal navigation paths for agents.

2. Also using the MDP-Bézier Solver, another pre-processing step solves every possible micro scenario (of user-defined radius) for which the agent is located at the center cell (Figure 4.2). This pre-processing step outputs two arrays of data:

(a) An array of unique Bézier curves (tagged "Bézier" in Figure 4.5) to be used in the simulation.

(b) An array of "signatures" (tagged "Signatures" in Figure 4.5) that uniquely relate each micro scenario to a set of Bézier curves (tagged "Bézier-Signature Indirect Index" in Figure 4.5). In the scope of these signatures it does not matter whether the surrounding objects are scenario obstacles or other agents.

3. At run-time, Algorithm 1 is run on the GPU, so that agents traverse their originally assigned cells if they are available, or adjust their route with the aid of a precalculated micro scenario if a cell is occupied. Since drifting from the original path would leave an agent without a long-term direction, micro scenarios are chosen so that their exit cell points toward the original path until the agent can navigate it again.

Figure 4.5: Arrays for the micro scenario selection algorithm.

Algorithm 1: Algorithm to alter paths based on micro scenarios.
 1: for agent = 1 → N do
 2:   bezierIndex = signature[agent]
 3:   if bezParam[agent] == 0 then
 4:     if occupancy[agentCell[agent]] == 1 then
 5:       aSignature = readMicroScenario( agentCell[agent] )
 6:       bezierIndex = BezierSignature[aSignature]
 7:       signature[agent] = bezierIndex
 8:     end if
 9:   end if
10:   position[agent] = evalBezier( bezParam[agent], bezierIndex )
11:   bezParam[agent] += bezInc[agent]
12:   if bezParam[agent] > 1 then
13:     bezParam[agent] = 0
14:   end if
15:   occupancy[agentCell[agent]] = 0
16:   agentCell[agent] = positionToCell( position[agent] )
17:   occupancy[agentCell[agent]] = 1
18: end for

4.1.3 IMPLEMENTATION AND RESULTS

We have designed an experiment that consists of simulating a massive crowd composed of Lemmings¹ and virtual humans. This case is useful to test micro scenarios as a Collision Avoidance technique. We found that agents effectively avoid collisions with each other, and also react to user interaction when obstacles are added during the simulation. This test was performed with a GeForce GTX 560M GPU. Numeric results for this simulation are shown in Figure 4.6(a). When agents only follow their computed trajectories, without encountering obstacles, the method executes in MIN TIME, between 1 and 2 milliseconds for a crowd composed of 128 to 8,192 agents. On the other hand, for congested scenarios the method executes in MAX TIME, from 6 to 13 milliseconds.

Figure 4.6: Global MDP micro scenario algorithm performance. (a) Simulation time. (b) Average time forecast.

As these measures indicate, an advantage of using MDP and micro scenarios is that, at runtime, since the solution for a scenario is independent of its size and complexity, the method is reasonably predictable (as the number of cells and agents implies) and stable (Figure 4.6(b)). Given the resulting performance and behavior for MAX TIME, we can predict that, if enough GPU memory and processing power are available, 1,000,000 agents can be simulated in about 806.37 milliseconds for congested scenarios. When micro scenarios solved during preprocessing are used to avoid collisions, the algorithm's complexity reduces to that of reading the neighboring cells, plus the cost of searching a signature list to obtain the Bézier curves for the continuous motion of each agent. This is, for N agents, O(10N), a linear cost, as Figure 4.6(a) shows.

Figure 4.7 shows visualizations of the Lemmings (Figure 4.7(a)) and virtual human (Figure 4.7(c)) simulations for this experiment. Note that in both crowds characters change direction to avoid collisions against scenario obstacles (Figure 4.7(b)) and against other agents (Figure 4.7(d)).

¹ de Rubens, Jean Dirk. Modeler. Personal interview. 26 October 2011.


Figure 4.7: Simulated Lemmings and virtual humans. (a) Lemmings simulation. (b) Navigation. (c) Virtual humans simulation. (d) Collision Avoidance.

4.1.4 CONCLUSION AND LIMITATIONS

GPU implementation of the MDP and micro scenarios method allows us to simulate real-time crowds while freeing enough computational resources to present their three-dimensional visualization on the same hardware. We have presented a reactive movement approach for unexpected changes in the environment, as opposed to re-planning paths for this kind of change, simplifying the structures involved in path adaptation calculations and improving computation speed, since the need for a Semi-Markov Decision Process (SMDP) [8] is eliminated. We find, however, that implementation of this method for an integrated Navigation and Local Collision Avoidance technique introduces the following possible limitations:

• Collision Avoidance accuracy depends entirely on the micro scenario radius, i.e. collision-free trajectories are guaranteed inside the micro scenario radius, but not beyond; agents may still collide at the end of the path delimited by the precalculated Bézier curve.

• Incrementing the micro scenario radius is computationally costly, since every agent, obstacle and free-space permutation must be precalculated. A larger radius may solve collisions at its boundary, but adds complexity and processing time to the preprocessing stage of a simulation.

• While in transit, agents must complete the path traced by the Bézier curve before they are able to modify their trajectory.

4.2 A PARALLEL LATTICE-BOLTZMANN SOLVER FOR GUIDED CROWDS

Experimentation with a reduced version of the MDP formalism as a resource to perform CA for crowds showed us that current hardware is able to support an integrated Navigation-CA method. It also showed, however, that accurate AI methods quickly escalate in complexity when individual path planning is attempted. Therefore, we opted to experiment with the alternative of managing the crowd as a whole, in order to explore the principles that govern fluid mechanics, with the idea of transferring as many of them as possible to an agent-based version. As mentioned in Section 2.2, general fluid dynamics are described by the Navier-Stokes equations [45] (Equation 2.4). These equations, however, present a discretization problem when it comes to simulation: how are they to be processed with a discrete algorithm? Moreover, with several possible discretizations [47], a simulation may use the most precise, but computationally expensive, one, or the one with the best performance. The latter is chosen when a virtual crowd is rendered, for it allows a large number of agents to be simulated without a noticeable loss in visual precision. The following experiment addresses the creation of a parallel lattice-Boltzmann solver for guided crowds.

4.2.1 THE BOLTZMANN EQUATION

As Nourgaliev et al. [166] summarize, the kinetic theory of gases states that the single-particle probability distribution function (PPDF) is f(r, e, t), defined in such a way that at time t, [f(r, e, t) · d³r · d³e] particles are located within a phase-space control element [d³r · d³e]. The probabilistic transport of a particle is then expressed by:

(\partial_t + e · \nabla_r + a · \nabla_e) f(r, e, t) = (\partial_t f)_{coll}    (4.8)

where e is the velocity of the particle, r is the position of the particle, and a accounts for external forces applied to the particle. Ludwig E. Boltzmann¹ stated that if only binary collisions are considered, and the velocity of a molecule is not correlated to its position, the collision term (\partial_t f)_{coll} from Equation 4.8 is defined as:

(\partial_t f)_{coll} = \int d\Omega \int d³e^{(0)} \sigma(\Omega) |e - e^{(0)}| (f' f'^{(0)} - f f^{(0)})    (4.9)

where Ω is the scattering angle of the binary collision {e', e'^{(0)}} → {e, e^{(0)}}, f and f' stand for the PPDF before and after the collision, and σ(Ω) is the differential cross-section of the collision. Lattice Gas Cellular Automata (LGCA) provide a framework (Section 4.2.2) to implement Boltzmann's molecular theory (Equation 4.9), generating the lattice Boltzmann Method (LBM), which is implemented in Section 4.2.5 in the form of a Single Component Single Phase (SCSP) [167] LBM adapted to function as an AI algorithm to guide virtual crowds.

¹ Lerner, Rita G. Encyclopedia of Physics. 2nd ed. New York: VCH, 1991. Print.

4.2.2 CHARACTERISTICS OF LGCA AND LBM

LGCA and LBM simulate a gas through particles at discrete points in space represented by Boolean variables, meaning that unitary mass is assigned to each particle. Both methods have a set of common characteristics:

A set of connected sites. Composing the lattice or mesh, as shown in Figure 4.8. The two-dimensional model with 9 velocities at each site is known as D2Q9 [167]. In order to model a crowd as viewed from the top, the D2Q9 model is chosen for the proposed LBM implementation, but the model may be extended to use a three-dimensional lattice, such as the D3Q15 model.

State variables. Defined at each site, these variables control the particle dynamics.

An update rule. Based on local and neighbor information; in the LBM case: collision and streaming. These operations model the turbulence and relaxation that lead to equilibrium.

Figure 4.8: Connected sites forming a lattice in the D2Q9 model. A site is represented as \vec{x}.

4.2.3 GENERIC LBM SIMULATION ALGORITHM

1. Each particle travels with a discrete direction e. This is the discretization of velocity space.

2. At each time step, the particles move along their assigned directions toward the next lattice point: they are streamed.

3. If more than one of these particles arrive simultaneously at the same lattice point, a collision rule is applied, redistributing the particles such that the conservation laws (Navier-Stokes for mass and momentum) are satisfied.

This simple procedure is stated in Algorithm 2.

Algorithm 2: Generic LGCA algorithm.
foreach time step do
    Stream();
    Collide();
end

4.2.4 LATTICE-BOLTZMANN MODEL DEFINITION

Based on the work by Nourgaliev et al. [166], an algorithm for the SPSC-LBM is condensed:

Algorithm 3: SPSC-LBM algorithm.
foreach time step do
    A. CalculateMacroscopicVariables();
    B. DetermineNewEquilibriumDF();
    C. Collide();
    D. ProcessBoundaries();
    E. Stream();
end

Each step in Algorithm 3 is described in the following subsections:

Flow macroscopic conserved variables. Local macroscopic density ρ for the fluid is the sum of the distribution functions at each site (Equation 4.10).

\[ \rho = \sum_{i=0}^{8} f_i \tag{4.10} \]

Components of the resulting macroscopic velocity \(\vec{u}\) are directly related to the discrete velocities, and inversely related to the local density (Equation 4.11).

\[ u_x = \frac{f_1 - f_3 + f_5 - f_6 - f_7 + f_8}{\rho}, \qquad u_y = \frac{f_2 - f_4 + f_5 + f_6 - f_7 - f_8}{\rho} \tag{4.11} \]
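As a concrete illustration, Equations 4.10 and 4.11 reduce to a handful of sums per site. The following serial C++ sketch mirrors what the thesis implements in CUDA C; the struct and function names are illustrative, not taken from the thesis code:

```cpp
#include <array>
#include <cmath>

// Macroscopic variables at one D2Q9 site (Equations 4.10 and 4.11).
// f holds the nine distribution functions f0..f8.
struct Macro { double rho, ux, uy; };

Macro macroscopic(const std::array<double, 9>& f) {
    double rho = 0.0;
    for (double fi : f) rho += fi;                                 // Equation 4.10
    double ux = (f[1] - f[3] + f[5] - f[6] - f[7] + f[8]) / rho;   // Equation 4.11
    double uy = (f[2] - f[4] + f[5] + f[6] - f[7] - f[8]) / rho;
    return {rho, ux, uy};
}
```

Since the site value is a pure reduction over nine inputs, one thread per site computes it independently, which is what makes the GPU mapping straightforward.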

New local equilibrium distribution functions. If the weights \(w_i\) for the D2Q9 model are:

\[ w_i = \begin{cases} \tfrac{4}{9} & \text{if } i = 0 \\[2pt] \tfrac{1}{9} & \text{if } 1 \leq i \leq 4 \\[2pt] \tfrac{1}{36} & \text{if } 5 \leq i \leq 8 \end{cases} \]

and c is the speed for a particle in a given lattice, then the local equilibrium distribution functions account for relaxation after collision takes place (Equation 4.12).

\[
\begin{aligned}
f_0^{eq} &= w_0 \rho \left( 1 - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_1^{eq} &= w_1 \rho \left( 1 + 3\,\tfrac{u_x}{c^2} + 4.5\,\tfrac{u_x^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_2^{eq} &= w_2 \rho \left( 1 + 3\,\tfrac{u_y}{c^2} + 4.5\,\tfrac{u_y^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_3^{eq} &= w_3 \rho \left( 1 - 3\,\tfrac{u_x}{c^2} + 4.5\,\tfrac{u_x^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_4^{eq} &= w_4 \rho \left( 1 - 3\,\tfrac{u_y}{c^2} + 4.5\,\tfrac{u_y^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_5^{eq} &= w_5 \rho \left( 1 + 3\,\tfrac{u_x + u_y}{c^2} + 4.5\,\tfrac{(u_x + u_y)^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_6^{eq} &= w_6 \rho \left( 1 + 3\,\tfrac{-u_x + u_y}{c^2} + 4.5\,\tfrac{(-u_x + u_y)^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_7^{eq} &= w_7 \rho \left( 1 + 3\,\tfrac{-u_x - u_y}{c^2} + 4.5\,\tfrac{(-u_x - u_y)^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right) \\
f_8^{eq} &= w_8 \rho \left( 1 + 3\,\tfrac{u_x - u_y}{c^2} + 4.5\,\tfrac{(u_x - u_y)^2}{c^4} - 1.5\,\tfrac{u_x^2 + u_y^2}{c^2} \right)
\end{aligned}
\tag{4.12}
\]
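The nine expressions in Equation 4.12 collapse into the generic form \(f_i^{eq} = w_i \rho (1 + 3\,e_i \cdot u / c^2 + 4.5\,(e_i \cdot u)^2 / c^4 - 1.5\,u^2 / c^2)\), where \(e_i\) is the i-th lattice direction. A serial C++ sketch with lattice speed c = 1 (array names are illustrative):

```cpp
#include <array>
#include <cmath>

// D2Q9 weights and lattice direction vectors, in the same numbering as the text.
static const double W[9]  = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                             1.0/36, 1.0/36, 1.0/36, 1.0/36};
static const int    EX[9] = {0, 1, 0, -1, 0, 1, -1, -1, 1};
static const int    EY[9] = {0, 0, 1, 0, -1, 1, 1, -1, -1};

// Equation 4.12 in its generic form, with c = 1 lattice unit per time step.
std::array<double, 9> equilibrium(double rho, double ux, double uy) {
    std::array<double, 9> feq;
    double usq = ux * ux + uy * uy;
    for (int i = 0; i < 9; ++i) {
        double eu = EX[i] * ux + EY[i] * uy;   // e_i . u
        feq[i] = W[i] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq);
    }
    return feq;
}
```

At rest (u = 0) each \(f_i^{eq}\) reduces to \(w_i \rho\), which is a convenient sanity check for an implementation.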

Compute collisions. The term τ represents a relaxation time (Equation 4.13) related to viscosity in the BGK collision operator model (Equation 4.14), where BGK stands for Bhatnagar, Gross and Krook [166], who introduced it in 1954.

\[ \tau > 0.5 \tag{4.13} \]

\[ f_i^{new} = f_i^{temp} - \frac{1}{\tau} \left( f_i^{temp} - f_i^{eq} \right) \tag{4.14} \]
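Equation 4.14 is a single relaxation per distribution function; a minimal sketch of the per-site update:

```cpp
// BGK relaxation (Equation 4.14): each distribution function relaxes toward
// its local equilibrium at a rate set by the relaxation time tau, which must
// satisfy tau > 0.5 (Equation 4.13) for a stable simulation.
double bgk_collide(double f_temp, double f_eq, double tau) {
    return f_temp - (f_temp - f_eq) / tau;
}
```

With τ = 1 the distribution relaxes to equilibrium in one step; larger τ (such as the τ = 2.8 used in the experiments below) makes the fluid more viscous.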

Boundary sites processing. A particle that reaches a site marked as 'solid' is bounced back (Equation 4.15).

\[
f^{temp2} = f; \qquad
\begin{aligned}
f_1 &= f_3^{temp2}, & f_3 &= f_1^{temp2}, & f_2 &= f_4^{temp2}, & f_4 &= f_2^{temp2}, \\
f_5 &= f_7^{temp2}, & f_7 &= f_5^{temp2}, & f_6 &= f_8^{temp2}, & f_8 &= f_6^{temp2}
\end{aligned}
\tag{4.15}
\]

The equation of state [167] in the D2Q9 model directly relates pressure p to density ρ and the pseudo speed of sound \(c_s\) (Equation 4.16).

\[ p = c_s^2 \rho = \frac{\rho}{3} \tag{4.16} \]

An assumption of constant pressure and constant velocity at boundaries has been explored by Zou and He [168], and is thoroughly described by Sukop et al. [167].

Zou and He [168] pressure boundary on the north side [167]:

\[
\begin{aligned}
\rho_0 &= \frac{f_0 + f_1 + f_3 + 2(f_2 + f_5 + f_6)}{1 + u_y}, \qquad r_u = \rho_0 u_y \\
f_4 &= f_2 - \tfrac{2}{3} r_u \\
f_7 &= f_5 - \tfrac{1}{6} r_u + \tfrac{1}{2}(f_1 - f_3) \\
f_8 &= f_6 - \tfrac{1}{6} r_u + \tfrac{1}{2}(f_3 - f_1)
\end{aligned}
\tag{4.17}
\]

Zou and He [168] pressure boundary on the south side [167]:

\[
\begin{aligned}
\rho_0 &= \frac{f_0 + f_1 + f_3 + 2(f_4 + f_7 + f_8)}{1 - u_y}, \qquad r_u = \rho_0 u_y \\
f_2 &= f_4 - \tfrac{2}{3} r_u \\
f_5 &= f_7 - \tfrac{1}{6} r_u - \tfrac{1}{2}(f_1 - f_3) \\
f_6 &= f_8 - \tfrac{1}{6} r_u - \tfrac{1}{2}(f_3 - f_1)
\end{aligned}
\tag{4.18}
\]

Zou and He [168] pressure boundary on the east side [167]:

\[
\begin{aligned}
\rho_0 &= \frac{f_0 + f_2 + f_4 + 2(f_1 + f_5 + f_8)}{1 + u_x}, \qquad r_u = \rho_0 u_x \\
f_3 &= f_1 - \tfrac{2}{3} r_u \\
f_7 &= f_5 - \tfrac{1}{6} r_u + \tfrac{1}{2}(f_2 - f_4) \\
f_6 &= f_8 - \tfrac{1}{6} r_u + \tfrac{1}{2}(f_4 - f_2)
\end{aligned}
\tag{4.19}
\]

Zou and He [168] pressure boundary on the west side [167]:

\[
\begin{aligned}
\rho_0 &= \frac{f_0 + f_2 + f_4 + 2(f_3 + f_7 + f_6)}{1 - u_x}, \qquad r_u = \rho_0 u_x \\
f_1 &= f_3 - \tfrac{2}{3} r_u \\
f_5 &= f_7 - \tfrac{1}{6} r_u - \tfrac{1}{2}(f_2 - f_4) \\
f_8 &= f_6 - \tfrac{1}{6} r_u - \tfrac{1}{2}(f_4 - f_2)
\end{aligned}
\tag{4.20}
\]
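The two boundary treatments above can be sketched in serial C++ under the D2Q9 numbering used throughout this section (function names are illustrative). Bounce-back swaps each distribution with its opposite, while the Zou-He north boundary reconstructs the three unknown incoming distributions from the known ones and the prescribed vertical velocity:

```cpp
#include <array>

using DF = std::array<double, 9>;

// Bounce-back at a solid site (Equation 4.15): opposite pairs are exchanged.
DF bounceBack(const DF& f) {
    DF t = f;                          // f^temp2 = f
    t[1] = f[3]; t[3] = f[1];          // axis pairs 1<->3, 2<->4
    t[2] = f[4]; t[4] = f[2];
    t[5] = f[7]; t[7] = f[5];          // diagonal pairs 5<->7, 6<->8
    t[6] = f[8]; t[8] = f[6];
    return t;
}

// Zou-He pressure boundary on the north side (Equation 4.17): the unknown
// f4, f7, f8 are rebuilt from the known distributions and uy.
DF zouHeNorth(DF f, double uy) {
    double rho0 = (f[0] + f[1] + f[3] + 2.0 * (f[2] + f[5] + f[6])) / (1.0 + uy);
    double ru = rho0 * uy;
    f[4] = f[2] - (2.0 / 3.0) * ru;
    f[7] = f[5] - (1.0 / 6.0) * ru + 0.5 * (f[1] - f[3]);
    f[8] = f[6] - (1.0 / 6.0) * ru + 0.5 * (f[3] - f[1]);
    return f;
}
```

The south, east and west boundaries (Equations 4.18 to 4.20) follow the same pattern with the roles of the indices rotated.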

4.2.5 IMPLEMENTATION AND RESULTS

Equations 4.10 to 4.20 define the proposed SPSC-LBM that will update site variables in time. A problem arises, however, if the number of sites to be processed is in the order of millions, because current CPU processing power is not enough to provide a real-time simulation for the AI alone. The SPSC-LBM algorithm is therefore implemented on the Graphics Processing Unit (GPU), specifically as a GPGPU (General-Purpose computation on the GPU) algorithm written in CUDA C. Moreover, Equations 4.10 to 4.20 lend themselves to a straightforward parallel implementation, since they are expressed as sums and multiplications.

The CUDA C implementation of the SPSC-LBM has been tested in a laptop PC, with performance reported in Table 4.1.

GPU             GeForce GTX 560M
Lattice width   1024
Lattice height  1024
Total sites     1,048,576
τ               2.8
Iteration time  90.4 ms

Table 4.1: SPSC-LBM without solid sites.

Figure 4.9(a) captures the execution of the algorithm without solid boundaries. It is worth mentioning that processing 1,048,576 sites in 0.0904 seconds leaves enough processing time for simultaneous character rendering.

In a second test, solid sites are introduced to simulate a gate through which the fluid (and eventually agents) will pass. Execution results are presented in Table 4.2 and Figure 4.9(b). As expected, an increase in flow velocity appears around the gate opening.

GPU             GeForce GTX 560M
Lattice width   1024
Lattice height  1024
Total sites     1,048,576
τ               2.8
Iteration time  90.3 ms

Table 4.2: SPSC-LBM with solid sites.

A third test arranges solid sites to simulate lanes through which the fluid (and eventually agents) will pass. Execution results are presented in Table 4.3 and Figure 4.9(c). As expected, an increase in flow velocity appears around the lanes, and the iteration time is reduced by about 2 milliseconds, because solid sites require no calculations.


Figure 4.9: SPSC-LBM Implementation results. Green and blue colors represent components ux and uy of macroscopic velocity ~u. a) Implementation without solid sites. b) Implementation with solid sites representing a gate. c) Implementation with solid sites representing lanes.

GPU             GeForce GTX 560M
Lattice width   1024
Lattice height  1024
Total sites     1,048,576
τ               2.8
Lanes           9
Iteration time  88.0 ms

Table 4.3: SPSC-LBM with solid lanes.

GPU             GeForce GTX 560M
Lattice width   1024
Lattice height  1024
Total sites     1,048,576
τ               2.8
Agents          256,000
Iteration time  90.4 ms

Table 4.4: SPSC-LBM with agents.

In a final test, 250,000 agents are introduced with a minimal collision rule: an occupied site

may not be occupied by another agent. Initially positioned randomly (restricted to the upper half of the scenario), agents are displaced by fluid forces, as recorded in Table 4.4 and Figure 4.10.


Figure 4.10: Implementation with solid sites simulating a gate. a) Agents flow through the walkable opening. b) A closer look at the gate zone.

4.2.6 CONCLUSION AND LIMITATIONS

Experimentation with the LGCA framework for crowd simulation demonstrates that:

• Its parallel implementation is viable and straightforward, since its describing equations map directly to transformation and reduction operations.
• Its performance frees enough computational processing resources for character rendering.
• It solves both the Collision Avoidance and Navigation problems.

Nevertheless, it also exemplifies the limitation stated in Section 2.2.2: "an advantage of using the Cellular Automata approach consists in solving the Collision Avoidance and Navigation problems with one algorithm. A disadvantage it presents is the lack of separation and control for individuals as they must follow the majority flow and direction towards an exit". Because of this lack of separation and control for individuals, modeling crowd behavior as a fluid with viscosity is not completely accurate. The fact remains, however, that we have found during our research [41, 42, 43, 44] that a crowd in motion, when observed from above, does resemble a fluid. Our next goal is to model crowd behavior with the same characteristics of LGCA, while at the same time introducing control for individual agents, with the idea of solving the lack of separation and control issue.

4.3 CELLULAR AUTOMATA APPLIED TO LOCAL COLLISION AVOIDANCE FOR SIMULATED CROWDS

In this section we present a formulation for a GPGPU interpretation of the Cellular Automata approach that focuses only on Collision Avoidance, solving the majority flow and direction problem by introducing the MDP Optimal Policy Π∗ as a local directional guide, granting separation and control for individual agents.

4.3.1 LOCAL NAVIGATION MAP

We begin by observing a basic reference MDP Navigation example (Figure 4.11), in which a group of agents is to be directed (Figure 4.11(a)) towards a unique goal (Figure 4.11(b)), while avoiding an obstacle. It is clear that agents are likely to collide by merely following Navigation


Figure 4.11: LCA principles. a) Set of directions (set of actions in the MDP formalism) for an agent. DIR = {1, 2, 3, 4, 5, 6, 7, 8}. b) An empty reference scenario. c) MDP generated policy (Π∗ ). Agents will collide when only following the policy.

paths, if we consider that each agent moves with a unique velocity, risking agent-to-agent collision (CA topic, Section 2.3) and, depending on its starting position within the cell, could even crash against the obstacle (Figure 4.11(c), Navigation topic, Section 2.2). These are the problems that coarse Navigation paths do not solve, and that Local Collision Avoidance will address. A sub-partition of the navigable space (Figure 4.12) for LCA (i.e. a partition of the MDP states set) has the advantage of preserving Navigation data intact (Figure 4.12(b)), while also allowing for individual agent control. Although there are infinite partitions for the states set, an area only big enough to enclose an agent (Figure 4.12(a)) plus the boundary space described in Section 2.1.2 will serve our purpose of adding the ability to avoid collisions to the algorithm.


Figure 4.12: LCA partition. a) Partition of the MDP states set to enclose an agent per LCA cell. b) Local Navigation Map preserving Π∗ data at heuristically selected cells. c) Local Navigation paths when agents are directed to their nearest DIR[d] LCA cell.

We observe from the example in Figure 4.12(c) that collisions against environmental objects will no longer be an issue once we implement a partition of the MDP states set that preserves Optimal Policy data at heuristically selected cells, therefore solving the local obstacle collision problem. We will refer to this concept as a Local Navigation Map, composed of LCA cells; its generation and its heuristic for Optimal Policy cell placement are defined in Algorithm 4. Also from the example in Figure 4.12 we can pinpoint the remaining issues to address in order to formulate a Local Collision Avoidance algorithm:

• Sub-partitioning of the navigable space as defined in Algorithm 4 has produced an array of cells (partitioning the MDP states set) marked either as OPENSPACE, OBSTACLE, GOAL or DIR[d], 0 < d < 9, d ∈ N; but the resulting Local Navigation Map is not useful by itself to actually displace agents.

• Agent-to-agent collisions may happen as agents progress toward the goal zone if no Local Collision Avoidance method is implemented.

• In the example of Figure 4.12(c), agents are directed to their nearest cell marked as DIR[d]. Although this would be optimal for each agent, the presence of other agents may produce collisions if there is no means to steer, or to modify the logic that selects the assigned DIR[d] cell.

Next, the Local Navigation Map will function as a Set of Connected Sites to implement a Cellular Automaton, addressing the Local Collision Avoidance problem.


Algorithm 4: Local Navigation Map Algorithm
input : MDP Policy Π∗ POLICY of size mdpWidth × mdpDepth
output: Local Navigation Map LNM of size lcaWidth × lcaDepth

if lcaWidth % mdpWidth == 0 then
    lcaWidthRatio ← lcaWidth / mdpWidth;
    lcaDepthRatio ← lcaDepth / mdpDepth;
    lcaRatio ← lcaWidthRatio × lcaDepthRatio;
    for i ← 0 to lcaWidth × lcaDepth do
        LNM[i] ← OPENSPACE;
    end
    for lx ← 0 to lcaWidth do
        for lz ← 0 to lcaDepth do
            mx ← lx / lcaWidthRatio;
            mz ← lz / lcaDepthRatio;
            mi ← mz × mdpWidth + mx;
            if POLICY[mi] == OBSTACLE || GOAL then
                li ← lz × lcaWidth + lx;
                LNM[li] ← POLICY[mi];
            end
        end
    end
    foreach mdpCell C do                          // Π∗ placement heuristic
        foreach lcaCell L in perimeter of C do
            direction ← POLICY[C];
            if DoesNotPointToObstacle(direction, L) then
                if EdgeMatchesPolicyDirection(direction, L) then
                    LNM[L] ← direction;
                end
            end
        end
    end
end

4.3.2 CELLULAR AUTOMATON CHARACTERISTICS

Following the description in Section 4.2.2 for the LGCA framework, the Cellular Automaton for LCA is composed of:

A set of connected sites. The Local Navigation Map (Figure 4.12(b)) defined with Algorithm 4, modeled with 9 velocities at each site (8 directions to move, and the ninth to hold) as in the D2Q9 model (Figure 4.8), fitting precisely our set of directions from Figure 4.11(a).

State variables. As defined by the Local Navigation Map: OPENSPACE, OBSTACLE, GOAL or DIR[d], 0 < d < 9, d ∈ N will determine the ability of agents to move, and their direction.

An update rule. For the LCA Cellular Automaton: a stage in which LCA cells Scatter agents currently in their area, and a stage in which LCA cells Gather nearby incoming agents.

4.3.3 CELLULAR AUTOMATON UPDATE RULE

For the update rule, our approach considers the following: to avoid the lack of separation and control for individuals that follow the majority flow and direction towards an exit, we abandon the pursuit of modeling crowds as a fluid, replacing steps Stream and Collide from Algorithm 2 with Scatter and Gather operations. The idea behind these modifications is to direct agents toward the nearest LCA cell marked as DIR[d] with a different transfer method. We will refer to this type of LCA cell, i.e. the nearest DIR[d], as the Local Goal for an agent.

Agent translation increment. We define agent translation between LCA cells (Figure 4.13(a)) by means of:

1. A starting relative position S = LCAs(x, z) + (∆x, ∆z) within the starting LCAs cell.

2. A vector to direct the agent. For the LCA Cellular Automaton definition, one of the 9 velocities (eight directions to move, and the ninth to hold) will suffice. Alternatively, the translation path may be a Bézier curve. Either this vector or calculation of the ending cell determines the ending position E = LCAe(x, z) + (∆x, ∆z).

3. A weighted parameter considering the agent's predefined speed, the terrain type at the current cell, as well as the elapsed time, generating a value 0 ≤ δ ≤ 1, δ ∈ R that is incremented by an amount λ at each simulation iteration. Parameter λ is relevant for our approach because it is one of the traits that differentiate each agent within a simulated crowd.

4. Linear interpolation, which computes the agent's position as P = S + δ(E − S).


Figure 4.13: Agent translation. a) LCA translation parameter. An agent conserves relative in-cell displacement. b) LCA translation between MDP cells.
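The translation increment above amounts to a linear interpolation driven by δ and advanced by λ; a minimal C++ sketch (the struct and function names are illustrative, not the thesis implementation):

```cpp
// 2D position on the ground plane; z is the depth axis used throughout.
struct Vec2 { double x, z; };

// Agent position between cells (step 4 above): P = S + delta * (E - S).
Vec2 lerpAgent(Vec2 S, Vec2 E, double delta) {
    return { S.x + delta * (E.x - S.x), S.z + delta * (E.z - S.z) };
}

// One simulation iteration (step 3 above): delta grows by the agent's own
// lambda and saturates at 1, the point at which the ending cell is reached.
double advanceDelta(double delta, double lambda) {
    double d = delta + lambda;
    return d > 1.0 ? 1.0 : d;
}
```

Because λ differs per agent, two agents crossing the same pair of cells reach δ = 1 at different iterations, which is what gives each agent its own speed.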

Basic Scatter-Gather Algorithm. A first approach to model the Scatter-Gather algorithm for our LCA Cellular Automaton introduces the functionality to displace agents between LCA cells using parameter δ. As an example we consider the elements in Figure 4.14, where agent A is at LCA cell (0, 1) and agent B is at LCA cell (1, 1); agent A is 25% faster than agent B. In each sub-figure, 16 LCA cells partition a single MDP cell (only the top 12 LCA cells are shown). The algorithm performs the Scatter step, then the Gather step, for each LCA cell in the Local Navigation Map as follows:

• Scatter.
  1. For each agent at this LCA cell, if its parameter is δ = 0, then assign a Local Goal as follows: if this LCA cell is NOT a LG, assign the nearest Local Goal (Figure 4.14(a)) within this MDP cell; if this LCA cell IS a LG, assign the Local Goal as the neighboring cell pointed to by Π∗, i.e. DIR[d] in the Local Navigation Map (Figure 4.13(b)).
  2. For each agent at this LCA cell, if its parameter is 0 < δ < 1, then increment its parameter so that δ = δ + λ. Note that in Figure 4.14(b), agent A will reach the end of its path first.

• Gather. For agents moving towards this cell (i.e. query neighboring cells), if their parameter is δ ≥ 1, then reset their parameter to δ = 0, and set the agent's current cell to this LCA cell (Figure 4.14(c)).

Iteration of this algorithm at each simulation time increment generates the cyclic behavior shown in Figure 4.14(d). Notice that translation between MDP cells is guaranteed because the Local Navigation Map preserves information from Π∗ at each LCA cell marked DIR[d].


Figure 4.14: Basic Scatter-Gather algorithm. a) Local Goal assignment. b) Scatter step c) Gather step. d) Final effect.

Improved Scatter-Gather Algorithm. A problem with the Scatter-Gather algorithm as defined in Section 4.3.3 is that, as the number of agents increases, a race condition arises between agents competing to occupy the same LCA cell (Figure 4.15(a)), causing unpredictable system behavior. Even without the race condition, another problem is the inability of faster agents to steer when they encounter a slower agent in their path, forming a single-lane queue even when open space is available (Queuing problem). An extension of this problem is the inability of agents to disperse around a congested area while waiting to occupy their Local Goal (Waiting problem). Logically, these issues derive from the removal of the turbulence and relaxation modeling that converges to equilibrium in the original LGCA framework. All the same, we can overcome these issues, continuing the idea of granting separation and control for individual agents, by adding General Semaphores as described by Dijkstra [169], as well as a Steering subsystem, to the basic Scatter-Gather algorithm, as we show next.


Figure 4.15: Race condition solution. a) Race condition problem: in direction toward their respective Local Goal at LCA cell (2, 1), agents compete to occupy LCA cell (1, 1). b) Flag implementation c) Distance increment measure step, here λA > λB . d) AgentB gives way to AgentA .

Race condition solution. In order to avoid the Race condition problem (Figure 4.15(a)), a boolean flag is implemented in the agent model, signaling the agent to stop moving when the flag's value is false. Additional steps are then scheduled in the Cellular Automaton for LCA, just before the Scatter-Gather step is performed:

1. For all agents incoming to this LCA cell (i.e. query neighboring cells), set all flags to false (Figure 4.15(b)).
2. If this LCA cell is occupied, i.e. the current cell assigned to Agenti is this LCA cell and 0 < δi < 1, do nothing further.
3. For all agents incoming to this LCA cell (i.e. query neighboring cells), determine the agent with the greatest parameter increment λ (Figure 4.15(c)).
4. Set the flag to true only for the agent with the greatest parameter increment λ (Figure 4.15(d)).

Finally, we can modify the Scatter-Gather algorithm to only scatter agents whose flag is set to true. The mechanism just described to avoid the Race Condition problem works as a General Semaphore [169], where LCA cells have the role of producers and agents have the role of consumers. LCA cells "produce" navigable area, while agents "consume" it, by displacing over it, in buffered portions: the distance increment λ.

Figure 4.16: Similar directions. DIR[4] and DIR[6] are similar to DIR[5] for AgentA .

Queue and Waiting solution. When Dijkstra [169] defines the Priority Rule concept, he states that ". . . when we have more than one producer and one of them is waiting, then the other ones may go on and reach the state that they wish to offer a portion." We now extend the Semaphore analogy, noting that Queues and Waiting agents occur when producers of navigable area, i.e. LCA cells, are waiting whenever an agent is standing over their area. We then come to the point when other producers wish to offer a portion for a consumer that wishes to consume it. Although Dijkstra's explanation of the Priority Rule concept goes on to a rich understanding of interconnected processes, the mere existence of other producers wishing to offer a portion will suffice for our Queue and Waiting solution. It remains to determine a method for agents to accept the offer from the producer that represents the next best alternative towards their goal. An agent may only move along adjacent cells, so for a given LCA cell, only the eight neighboring cells are to be considered as alternatives when the next cell is occupied. We determine the optimal alternative with the aid of the following example: suppose that AgentA is trying to move to the adjacent cell in direction DIR[5], but that cell is currently occupied, as shown in Figure 4.16. The directions close to DIR[5], namely DIR[4] and DIR[6], are the best alternatives, as they point to cells adjacent to the original target. In general, for an N-directional set, DIR[(i + 1)%N] and DIR[(i + N − 1)%N] are the best alternatives to DIR[i]. Of the cells pointed to by these alternative directions, the one closest to the Local Goal will have priority. In Dijkstra's terms, using this logic we can now program consumers to accept or reject portions offered by alternative producers,


Figure 4.17: Scatter-Gather step with steering for three simulation steps. a) Queue problem (left) solution (right) when free adjacent cells are available. b) Waiting problem (left) solution (right) when free adjacent cells are available.

whenever the producer of choice is waiting. Now we can modify the Scatter-Gather algorithm as follows:

• Agent instances will maintain the index of the next LCA cell in their path.
• Agent instances will maintain the index of the best LCA cell alternatives to the next LCA cell.
• If the next LCA cell is occupied AND the movement flag is set to false after the Race Condition solution:
  – Check whether the best alternative LCA cell closest to the Local Goal is occupied; set the next LCA cell index to this alternative if it is not occupied.
  – If the best alternative LCA cell closest to the Local Goal is occupied, then check whether the best alternative LCA cell farthest from the Local Goal is occupied; set the next LCA cell index to this alternative if it is not occupied.

  – If the next LCA cell index has changed: if this LCA cell is NOT a LG, assign the nearest Local Goal within this MDP cell; if this LCA cell IS a LG, assign the Local Goal as the neighboring cell pointed to by Π∗, i.e. DIR[d] in the Local Navigation Map.
  – If both of the best alternative cells are occupied, do nothing further (the agent is forced to wait).

Consideration of additional producers allows us to solve the Queuing problem (Figure 4.17(a)) and the Waiting problem (Figure 4.17(b)) with the same approach. Next we proceed to formulate the update rule for the LCA Cellular Automaton, knowing its basic Scatter-Gather algorithm and the solutions to the Race Condition, Queuing and Waiting problems derived from its characteristics. We define this update rule with Algorithm 5 for the Race Condition solution, and Algorithm 6 for the Queuing and Waiting solutions.
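The alternative-direction rule quoted above, DIR[(i + 1)%N] and DIR[(i + N − 1)%N], can be sketched directly (0-based direction indices; struct name illustrative):

```cpp
// Best steering alternatives to direction i in an N-directional set: the two
// neighbors of i on the direction ring, one step clockwise and one step
// counter-clockwise. Adding N before the modulo keeps the index non-negative.
struct Alternatives { int left, right; };

Alternatives bestAlternatives(int i, int N) {
    return { (i + 1) % N, (i + N - 1) % N };
}
```

For the D2Q9 example in Figure 4.16 (i = 5, N = 8) this yields directions 6 and 4, the two cells adjacent to the blocked target.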

4.3.4 CONCLUSION AND FUTURE WORK

We have presented a modified version of the LGCA framework for a LCA Cellular Automaton that shares its characteristics: a) a set of connected sites; b) state variables; c) an update rule. The proposed LCA Cellular Automaton differs from the LGCA framework in its update rule, replacing the Stream and Collide steps with Scatter and Gather functions, which no longer simulate a fluid, but introduce separation and control for agents. The LCA Cellular Automaton can be coupled to the MDP Navigation method because:

• Its set of connected sites is a partition of the MDP states set partition of the navigable space.
• Its Local Navigation Map preserves the Optimal Policy (Π∗) data as an input to direct and control agents.

Modeling of the LCA Cellular Automaton revealed Race Condition, Queuing and Waiting problems as a consequence of the removal of the turbulence and relaxation functions that converge to equilibrium in the LGCA framework. However, we have been able to overcome them by introducing General Semaphores and a Steering subsystem to the Scatter-Gather algorithm in the update rule of the LCA Cellular Automaton. Although we have provided a theoretical framework for the algorithm, we have yet to determine whether an implementation can verify the following hypotheses, since the method shares characteristics with the LGCA framework:

• Its parallel implementation is viable and straightforward, since its describing equations map directly to transformation and reduction operations.
• Its performance frees enough computational processing resources for character rendering.


Algorithm 5: Update rule algorithm for the LCA Cellular Automaton. Race condition solution.
input: Local Navigation Map LNM of size lcaWidth × lcaDepth

foreach lcaCell L in LNM do                                  // Racing Condition check
    moveAgent ← NULL;
    occupied ← FALSE;
    foreach Agent A in L do
        delta ← GetParamDelta(A);
        if GetCurrentCell(A) == L && delta < 1 && delta > 0 then
            occupied ← TRUE;
            break;
        end
    end
    foreach lcaNeighbor N next to L do
        λlargest ← 0;
        foreach Agent A in N do
            if GetNextCell(A) == L then
                SetFlag(A, FALSE);
                if GetParamLambda(A) > λlargest then
                    λlargest ← GetParamLambda(A);
                    moveAgent ← A;
                end
            end
        end
    end
    if occupied == FALSE then
        SetFlag(moveAgent, TRUE);
    end
end
foreach lcaCell L in LNM do                                  // Scatter step
    foreach Agent A in L do
        if GetParamDelta(A) == 0 then                        // Assign Local Goal
            if IsLocalGoal(L) == FALSE then
                SetLocalGoal(A, FindNearestLG(L));
            else
                SetLocalGoal(A, GetNeighborByPolicy(L));
            end
        end
        deltaA ← GetParamDelta(A);
        if deltaA > 0 && deltaA < 1 && GetFlag(A) == TRUE then
            SetParamDelta(A, deltaA + GetParamLambda(A));    // Scatter
        end
    end
end
foreach lcaCell L in LNM do                                  // Gather step
    foreach lcaNeighbor N next to L do
        foreach Agent A in N do
            if GetNextCell(A) == L then
                if GetParamDelta(A) ≥ 1 then
                    SetParamDelta(A, 0);                     // Gather
                    SetCurrentCell(A, L);
                end
            end
        end
    end
end


Algorithm 6: Update rule algorithm for the LCA Cellular Automaton. Queuing and Waiting solution.
input: Local Navigation Map LNM of size lcaWidth × lcaDepth

/* Execute after Racing Condition check, before Scatter step */
foreach lcaCell L in LNM do
    foreach Agent A in L do
        N ← GetNextCell(A);
        occupiedN ← FALSE;
        foreach Agent B in N do
            deltaB ← GetParamDelta(B);
            if deltaB < 1 && deltaB > 0 then
                occupiedN ← TRUE;
                break;
            end
        end
        if occupiedN && GetFlag(A) == FALSE then
            nextChanged ← FALSE;
            dir ← LNM[L];
            localGoal ← GetLocalGoal(A);
            altDir1 ← DIR[mod(dir + 1, 8)];
            altDir2 ← DIR[mod(dir + 7, 8)];
            altCell1 ← GetNeighborByDirection(L, altDir1);
            altCell2 ← GetNeighborByDirection(L, altDir2);
            best1 ← GetBestAlternative(localGoal, altCell1, altCell2);
            best2 ← GetSecondAlternative(localGoal, altCell1, altCell2);
            if IsOccupied(best1) == FALSE then
                SetNextCell(A, best1);
                nextChanged ← TRUE;
            else if IsOccupied(best2) == FALSE then
                SetNextCell(A, best2);
                nextChanged ← TRUE;
            end
            if nextChanged == TRUE then
                if IsLocalGoal(L) == FALSE then
                    SetLocalGoal(A, FindNearestLG(L));
                else
                    SetLocalGoal(A, GetNeighborByPolicy(L));
                end
            end
        end
    end
end
/* Scatter step follows... */

Since the LCA Cellular Automaton requires Π∗ as an input, we provide Implementation and Results for the full Hybrid Crowd Movement Algorithm in Chapter 5. Also, we believe that even though the nearest-Local-Goal solution to the Queuing and Waiting problems works, it may be improved if it considers the relevant individual and group behavioral traits listed in Section 2.1. This is because the proposed solution assumes a perfect and rational choice for agents in all situations, whereas the behavioral traits found during our research show that these choices are influenced by cultural and individual characteristics. The previous topic, as well as implementation of the Local Navigation Map on a more efficient hexagonal grid, are opportunities for future work.

4.4 CONCLUSION

The search for a Collision Avoidance method compatible with the Navigation partition resulting from mapping the MDP formalism to a navigable scenario with obstacles led to diverse attempts to couple a Local Collision Avoidance method to the MDP states set partition. From these experiments we learned:

For MDP and micro scenarios. Implementation of the method for an integrated Navigation and Local Collision Avoidance technique allows us to simulate real-time crowds while freeing enough computational resources to present their three-dimensional visualization with the same hardware. It does, however, introduce the following limitations: a) Collision Avoidance accuracy depends entirely on the micro scenario radius; b) incrementing the micro scenario radius is costly in computational terms; c) while in transit, agents must complete the Bézier curve path before they are able to modify their trajectory.

For the LGCA framework. Implementation of the Single Phase, Single Component LGCA framework applied to crowd movement shows that: a) its parallel implementation is viable and straightforward; b) its performance frees enough computational processing resources for character rendering; c) it solves both the Collision Avoidance and Navigation problems. Nevertheless, it implies lack of separation and control for individuals, as they must follow the majority flow and direction towards an exit.

For a LCA Cellular Automaton. With the following characteristics: a) a set of connected sites; b) state variables; c) an update rule, the LCA Cellular Automaton with Scatter and Gather functions for its update rule introduces separation and control for agents. It can be coupled to the MDP Navigation method because: a) its set of connected sites is in turn a partition of the MDP states set partition; b) its Local Navigation Map preserves Π∗ data as an input to direct and control agents. Also, General Semaphores and a Steering subsystem eliminate the Race Condition, Queuing and Waiting problems introduced by its initial design.

5. HYBRID CROWD MOVEMENT ALGORITHM To achieve Objective 4 we require an implementation of the HCMA, to demonstrate Proposition 3, with the following characteristics: (i) An interactive large-sized crowd simulation running in a commercially available laptop PC; (ii) Simulation will be executed in test scenarios with diverse topology to demonstrate accuracy and effectiveness of the HCMA; (iii) In a single-layer approach, the HCMA will be tested for a single crowd directed to the nearest goals; (iv) In a multiple-layer approach, the HCMA will be tested for two crowds directed to multiple goals in opposite directions and (v) The HCMA will be tested with different environmental conditions, such as terrain diversity, individual agent speed, complex topology scenarios, as well as dynamic scenarios, in which part of the topology changes on user demand. In order to achieve this objective, we have designed feature rich scenarios to test the accuracy TM R and effectiveness of the HCMA. All tests were performed using an NVIDIA GeForce GTX TM R 780M GPU1 (1,536 CUDA cores) in a commercially available laptop PC, with an Intel Core i7-4800MQ CPU @ 2.70GHz and 16.0 GB of RAM. Rendered scenarios for demonstration were Ray-traced with Visual Enterprise Author2 . The HCMA method is outlined in Figure 5.1 for the preprocessing and run-time stages. Functions in the update rule of the LCA Cellular Automaton are translated to parallel code, from Algorithm 5 and Algorithm 6 with the use of transform, reduction and custom kernel operations exposed by the Thrust Parallel Algorithms Library3 . 1

Specifications.

(n.d.).

Retrieved November 11, 2014, from http://www.geforce.com/hardware/

notebook-gpus/geforce-gtx-780m/specifications 2 3

SAP. (n.d.). Retrieved November 11, 2014, from http://www.simulistic.com/sap.html Thrust - Parallel Algorithms Library. (n.d.). Retrieved November 11, 2014, from http://thrust.github.io/


The parallel HCMA method (Figure 5.1) is executed at run-time (Figure 5.1(b)) using data structures prepared during the preprocessing (Figure 5.1(a)) stage, as described next.

Figure 5.1: Parallel HCMA method. a) Preprocessing algorithms. b) Run-time algorithms.

Preprocessing stage

Given the features of a specific scenario (scaled real-world features as in the Office, Eiffel and Campus scenarios, or virtual features as in the Town and Maze scenarios), topology data is the input for the MDP solving algorithm, which obtains the Optimal Policy Π∗ from the set of states via Value Iteration. The LNM Algorithm then generates an array holding the Local Navigation Map values.
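As a minimal CPU sketch of this preprocessing step (not the thesis' GPU implementation), the following Python code runs Value Iteration over a small grid MDP with deterministic four-neighbor moves; the grid size, blocked cell, rewards and discount factor are illustrative assumptions, not the thesis' actual parameters.

```python
# Value Iteration over a small grid MDP: each free cell is a state,
# actions move to 4-neighbors, the goal cell carries the reward.
# The converged values induce the Optimal Policy (best move per cell),
# which a Local Navigation Map would then discretize per LCA cell.

GAMMA, EPS = 0.9, 1e-6
W, H = 4, 3
obstacles = {(1, 1)}                 # illustrative blocked cell
goal = (3, 2)
states = [(x, y) for y in range(H) for x in range(W) if (x, y) not in obstacles]
state_set = set(states)

def neighbors(s):
    x, y = s
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in cand if c in state_set]

def reward(s):
    return 1.0 if s == goal else -0.04   # small per-step penalty

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        if s == goal:
            V[s] = reward(s)
            continue
        best = max(V[n] for n in neighbors(s))
        new_v = reward(s) + GAMMA * best  # deterministic transitions for brevity
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < EPS:
        break

# Optimal Policy: move to the neighbor with the highest value.
policy = {s: max(neighbors(s), key=lambda n: V[n]) for s in states if s != goal}
```

Following `policy` from any free cell greedily increases the value function, so agents converge to the goal without local minima; the terrain-dependent rewards of Section 5.5 would simply change `reward`.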

Run-time stage

Values in the Local Navigation Map are the input for parallel implementations of the Racing Condition Check, Queue-Waiting Solution, Scatter and Gather algorithms, which in turn modify an array of agent positions, controlling crowd movement.
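The thesis' parallel kernels are not reproduced here, but the run-time step can be illustrated with a schematic CPU version in Python: agents scatter into lattice cells, conflicting claims for the same target cell (the race condition) are resolved so only one agent advances, and the surviving moves are gathered back into the position array. Cell indexing and tie-breaking below are illustrative assumptions.

```python
# Run-time step sketch: scatter agents into LCA cells, resolve competing
# claims for the same target cell, then gather winning moves back into
# the agent position array. Agents whose target is busy queue and wait.

def step(agent_cells, desired_cells):
    """agent_cells[i] -> current cell of agent i;
    desired_cells[i] -> cell agent i wants next (from the LNM)."""
    occupied = set(agent_cells)
    claims = {}                      # target cell -> first claiming agent
    for i, tgt in enumerate(desired_cells):
        if tgt in occupied:          # target busy: agent waits this frame
            continue
        claims.setdefault(tgt, i)    # lowest index wins ties (illustrative)
    new_positions = list(agent_cells)
    for tgt, i in claims.items():
        new_positions[i] = tgt       # gather: apply the winning moves
    return new_positions
```

For example, if agents 0 and 1 both want cell 5, only agent 0 advances and agent 1 waits one frame, mirroring the Queuing and Waiting behavior measured in the tables below.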

Metrics for all scenario models are reported in Table 5.1. Execution results for simulations incorporating the parallel HCMA implementation for each scenario are presented next.

Table 5.1: Scenario metrics summary.

                          Office     Town    Eiffel     Maze   Campus  Multi-l.
Vertices                   2,003   53,789    29,542      448    9,376       448
Faces                      9,280   58,876    31,095      604   12,856       604
Texture points            21,359   60,827     5,703    5,703      714     5,703
Normal points                 44    1,899       616      242    2,157       242
Textures                       4       32        13        1       27         1
MDP width cells               60      100       100      100      200        10
MDP depth cells               60      100       100      100      200        10
Total MDP cells            3,600   10,000    10,000   10,000   40,000       100
LCA width cells              180      200       400      200      400        50
LCA depth cells              180      200       400      200      400        50
Total LCA cells           32,400   40,000   160,000   40,000  160,000     2,500
LCA cells per MDP cell         9        4        16        4        4        25

Simulations presented in this chapter have the following rendering optimizations:

• An implementation of the GPGPU Static Level of Detail and View Frustum Culling technique by Hernández and Rudomín [170] for characters (three levels of detail per character), accelerating rendering speed.
• Reduction of memory requirements for diverse characters, as described in Section 6.1.1.
• Reduction of memory requirements for animated characters, as described in Section 6.1.4.

Also, the Maze and Multi-layered scenarios make use of the Instancing⁴ rendering technique to draw obstacles, improving rendering performance.

⁴ Vertex Rendering. (n.d.). Retrieved November 11, 2014, from https://www.opengl.org/wiki/Vertex_Rendering#Instancing

5.1 OFFICE SCENARIO

Figure 5.2: Rendered Office scenario model.

Modeled after an actual office complex⁵ (Figure 5.2) with precise measurements, the Office scenario is ideal for testing simulated evacuations in a reduced environment surrounded by cubicles and narrow corridors. The original office complex has a wide main exit and a narrower emergency exit. The HCMA is tested by evacuating a guided crowd from this virtual version of the dual-exit scenario.

5.1.1 SIMULATION RESULTS

An evacuation of the Office scenario (Figure 5.3) with 16 agents was simulated at an average of 5 milliseconds per frame. Specific results for the HCMA are presented in Table 5.2. With scenario metrics held constant, results for the same evacuation with 64 agents, at an average of 14 milliseconds per frame, are presented in Table 5.3; results for 256 agents, at an average of 16 milliseconds per frame, are presented in Table 5.4. A performance summary based on these results is shown in Figure 5.5. We have found that, as the number of agents increases with scenario metrics held constant, the HCMA accounts for a gradually smaller fraction of the total frame time. This is because the additional time required for HCMA computations grows more slowly than the additional time required to animate and render the agents: although the whole lattice of connected states is always active, it only demands more computation time with agent proximity, as more states are required to coordinate advancing agents while avoiding collisions. Also, based on the results in Tables 5.2, 5.3 and 5.4, we note that in general the average HCMA time remains roughly constant regardless of the number of characters.
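The reported frame-time percentages follow directly from the per-frame averages (HCMA time is the sum of the Racing/Q&W and Scatter-Gather times); a quick arithmetic check in Python over the figures from Tables 5.2 to 5.4:

```python
# HCMA share of frame time = (Racing/Q&W time + Scatter-Gather time) / frame time.
cases = [
    # (racing_qw_ms, scatter_gather_ms, frame_ms, reported_percent)
    (0.213944, 1.476600, 5.0, 33.810880),   # 16 agents
    (0.215503, 1.508040, 14.0, 12.311021),  # 64 agents
    (0.227059, 1.579460, 16.0, 11.290744),  # 256 agents
]
for qw, sg, frame, reported in cases:
    hcma = qw + sg
    percent = 100.0 * hcma / frame
    print(f"HCMA {hcma:.6f} ms -> {percent:.2f}% (reported {reported:.2f}%)")
```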

⁵ Alvarado, Adriana. Modeler. Personal interview. 21 November 2011.


Figure 5.3: Office Complex Simulation. Top: general perspective. Bottom: close perspective.

Table 5.2: Office scenario results for 16 agents @ 5 ms per frame.

MDP width cells                                           60
MDP depth cells                                           60
Total MDP cells                                        3,600
MDP solution time (ms)                                 1,219
LCA width cells                                          180
LCA depth cells                                          180
Total LCA cells                                       32,400
LCA cells per MDP cell                                     9
Required GPU Memory for parallel structures (MB)   26.504517
Time to allocate parallel structures (ms)          11.984300
Time to initialize parallel structures (ms)         1.381730
Racing condition, Q&W average time (ms)             0.213944
Scatter-Gather average time (ms)                    1.476600
HCMA average time (ms)                              1.690544
Percentage of frame time                           33.810880


Table 5.3: Office scenario results for 64 agents @ 14 ms per frame.

Racing condition, Q&W average time (ms)    0.215503
Scatter-Gather average time (ms)           1.508040
HCMA average time (ms)                     1.723543
Percentage of frame time                  12.311021

Table 5.4: Office scenario results for 256 agents @ 16 ms per frame.

Racing condition, Q&W average time (ms)    0.227059
Scatter-Gather average time (ms)           1.579460
HCMA average time (ms)                     1.806519
Percentage of frame time                  11.290744

Figure 5.4: HCMA performance for the Office scenario.

Figure 5.5: HCMA performance breakdown for the Office scenario.

5.2 TOWN SCENARIO

Figure 5.6: Rendered Town scenario model.

“Instant Town for D&D Gaming⁶” is a 3D model (Figure 5.6) designed for video games. Its large area surrounded by walls makes it a convenient setting to test multiple goals, long paths and narrow portals with the HCMA.

5.2.1 SIMULATION RESULTS

Conditions change for the Town scenario: instead of an evacuation, we simulate an attack. Agents are immediately directed toward the nearest house, exploiting the ability of the HCMA to work with multiple destination goals. Agents outside the town walls must go around them to reach one of the two entrances. This sort of behavior may be applied not only to video games with crowds, but also to simulate emergency gatherings at rendezvous points in order to evaluate their locations. Simulation of an attack (Figure 5.7) on the Town scenario with 1,024 agents was performed at an average of 13 milliseconds per frame. Specific results for the HCMA are presented in Table 5.5. Next we increased the number of agents to 4,096 and measured an average time of 34 milliseconds per frame, as shown in Table 5.6. From the attack simulation with this scenario we note the following:

• Required GPU time, as well as required GPU memory to allocate and initialize parallel structures, depends only on scenario metrics, not on the number of agents.
• The Racing condition, Queuing and Waiting solution algorithms present only a marginal increase in required computation time compared to large increases in the number of simulated agents, as the simulation with the Office scenario confirms.

⁶ 3D Warehouse - View Model. (n.d.). Retrieved November 10, 2014, from https://3dwarehouse.sketchup.com/model.html?id=9bc83badd0061a8b1ba25132fa37e6d


Figure 5.7: Town Simulation. Top: general perspective. Bottom: close perspective.

• From 1,024 to 4,096 agents, the HCMA percentage of the time required to present a simulation frame went from 26% to almost 63%. As the global performance results shown in Figure 5.9 and detailed in Figure 5.10 indicate:
  – HCMA time represents a smaller percentage of the total frame time while agents flow freely, i.e. there are no bottlenecks due to narrow portals.
  – HCMA time represents a greater percentage of frame time as more agents wait around congested areas (Figure 5.8). This is because the LCA Cellular Automaton algorithm is evaluated to the full depth of its logical branches by more lattice sites until they determine that agents must wait.
  – Numeric results for the Town scenario confirm that even though the lattice of connected sites is always active, it only demands more computation time with agent proximity, as more states are required to coordinate advancing agents while avoiding collisions.
  – We believe an opportunity for future work resides in investigating thread synchronization issues, as uneven workloads may delay the Streaming Multiprocessors in the GPU.


Table 5.5: Town scenario results for 1,024 agents @ 13 ms per frame.

MDP width cells                                          100
MDP depth cells                                          100
Total MDP cells                                       10,000
MDP solution time (ms)                                 1,687
LCA width cells                                          200
LCA depth cells                                          200
Total LCA cells                                       40,000
LCA cells per MDP cell                                     4
Required GPU Memory for parallel structures (MB)   32.615662
Time to allocate parallel structures (ms)          14.212800
Time to initialize parallel structures (ms)         2.558080
Racing condition, Q&W average time (ms)             0.303139
Scatter-Gather average time (ms)                    3.106060
HCMA average time (ms)                              3.409199
HCMA Percentage of frame time                      26.224608

Table 5.6: Town scenario results for 4,096 agents @ 34 ms per frame.

Required GPU Memory for parallel structures (MB)   32.615662
Time to allocate parallel structures (ms)          13.432100
Time to initialize parallel structures (ms)         2.960990
Racing condition, Q&W average time (ms)             0.323706
Scatter-Gather average time (ms)                   21.050100
HCMA average time (ms)                             21.373806
HCMA Percentage of frame time                      62.864135

Figure 5.8: Town scenario simulation with 4,096 agents.


Figure 5.9: HCMA performance for the Town Scenario.

Figure 5.10: HCMA performance breakdown for the Town scenario.

5.3 EIFFEL SCENARIO

Figure 5.11: Rendered Eiffel scenario model.

A section of the Champ de Mars⁷ in Paris serves as a scenario to test the HCMA with a large-sized crowd evacuation in an urban environment. Actual Google Maps⁸ data was captured and processed to model the topology of the Eiffel scenario (Figure 5.11).

5.3.1 SIMULATION RESULTS

For the Eiffel scenario, a different type of simulation is evaluated: agents must evacuate toward the periphery of the Champ de Mars, using only the walkable areas defined by the processed maps. For this simulation we increased the total number of LCA cells to 160,000, recording an average frame time of 16 milliseconds for 1,024 agents, as reported in Table 5.7. 41% of the time required to present a frame corresponds to HCMA computations because of the large number of LCA cells. Rather than a limitation of the method, we find here an interesting opportunity: the ability to configure a trade-off between simulation accuracy and hardware resources simply by assigning the LCA-MDP cell ratio. Next we increased the number of agents (Figure 5.13) to 4,096, which is within the range of real attendance at any given time considering that an average of 19,000⁹ people visit the Eiffel Tower daily, and found that 63% of the time required to present a frame corresponds to the HCMA (Table 5.8). Here we find an interesting fact: for the same increase in agents, from 1,024 to 4,096, the HCMA fraction of frame time increased by 36 percentage points for the Town scenario, but only 22 percentage points for the Eiffel scenario. This supports the importance of congested scenario areas in evaluating the method's performance, since the Eiffel scenario models a much larger area compared to the

⁷ 3D Warehouse - View Model. (n.d.). Retrieved November 10, 2014, from https://3dwarehouse.sketchup.com/model.html?id=a31c40c55591aa6c59aed5538305c915
⁸ Tour Eiffel. (n.d.). Retrieved November 10, 2014, from https://www.google.com.mx/maps/place/EiffelTower
⁹ The Eiffel Tower. (n.d.). Retrieved November 13, 2014, from http://www.toureiffel.paris/


Town scenario. As stated above, we find an opportunity for future work in addressing this issue, which could be solved by applying an AI Level of Detail technique to the Local Navigation Map.

Figure 5.12: Eiffel Simulation. Top: general perspective. Bottom: close perspective.


Table 5.7: Eiffel scenario results for 1,024 agents @ 16 ms per frame.

MDP width cells                                           100
MDP depth cells                                           100
Total MDP cells                                        10,000
MDP solution time (ms)                                  1,672
LCA width cells                                           400
LCA depth cells                                           400
Total LCA cells                                       160,000
LCA cells per MDP cell                                     16
Required GPU Memory for parallel structures (MB)   130.119324
Time to allocate parallel structures (ms)           34.308600
Time to initialize parallel structures (ms)          5.528030
Racing condition, Q&W average time (ms)              0.874596
Scatter-Gather average time (ms)                     5.691640
HCMA average time (ms)                               6.566236
HCMA Percentage of frame time                       41.038975

Table 5.8: Eiffel scenario results for 4,096 agents @ 34 ms per frame.

Racing condition, Q&W average time (ms)     0.919143
Scatter-Gather average time (ms)           20.550800
HCMA average time (ms)                     21.469943
HCMA Percentage of frame time              63.146891

Figure 5.13: Eiffel scenario simulation with 4,096 agents.

5.4 MAZE SCENARIO

The HCMA is tested by guiding a crowd through a maze generated with Prim's algorithm¹⁰. The HCMA will generate long, non-trivial paths for agents.
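For reference, randomized Prim's algorithm grows a maze by repeatedly opening a wall between a visited cell and an unvisited frontier cell until every cell is reachable. This small Python sketch (dimensions, seeding and cell encoding are illustrative, not the thesis' generator) produces the kind of perfect maze the Maze scenario uses.

```python
import random

def prim_maze(w, h, seed=0):
    """Return a set of open passages (cell, cell) for a w x h grid maze."""
    rng = random.Random(seed)
    visited = {(0, 0)}
    # frontier walls: (visited_cell, unvisited_neighbor)
    walls = [((0, 0), n) for n in [(1, 0), (0, 1)]]
    passages = set()
    while walls:
        a, b = walls.pop(rng.randrange(len(walls)))
        if b in visited:
            continue
        visited.add(b)               # carve into the new cell
        passages.add((a, b))
        x, y = b
        for n in [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]:
            if 0 <= n[0] < w and 0 <= n[1] < h and n not in visited:
                walls.append((b, n))
    return passages

maze = prim_maze(10, 10)
# A perfect maze over N cells has exactly N - 1 passages (a spanning tree),
# so there is exactly one path between any two cells: paths are non-trivial.
assert len(maze) == 10 * 10 - 1
```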

5.4.1 SIMULATION RESULTS

For this simulation (Figure 5.14), agents are directed through a maze in order to evaluate Navigation with non-trivial paths.

Figure 5.14: Maze Simulation. Top: close perspective with 8,464 agents. Bottom: general perspective with 4,096 agents.

We tested the Maze simulation with 4,096 agents and recorded an average frame time of 45 milliseconds. The HCMA accounts for 8% of this time, due to the comparatively low number of LCA cells (40,000 for this evaluation), as shown in Table 5.9. Next we tested the HCMA performance for 8,464 agents, varying the number of LCA cells per MDP cell, recording the results shown in Table 5.10 and noting gradual increments in the time

¹⁰ Maze generation algorithm. (2014, October 24). Retrieved November 10, 2014, from http://en.wikipedia.org/wiki/Maze_generation_algorithm


required to process the Racing, Queue, Waiting, Scatter and Gather algorithms.

Table 5.9: Maze scenario results for 4,096 agents @ 45 ms per frame.

MDP width cells                                          100
MDP depth cells                                          100
Total MDP cells                                       10,000
MDP solution time (ms)                                 1,718
LCA width cells                                          200
LCA depth cells                                          200
Total LCA cells                                       40,000
LCA cells per MDP cell                                     4
Required GPU Memory for parallel structures (MB)   32.615662
Time to allocate parallel structures (ms)          12.759000
Time to initialize parallel structures (ms)         2.841500
Racing condition, Q&W average time (ms)             0.337889
Scatter-Gather average time (ms)                    3.546880
HCMA average time (ms)                              3.884769
HCMA Percentage of frame time                       8.632820

Table 5.10: Maze simulation performance for different LCA-MDP cell ratios (8,464 agents).

LCA cells per MDP cell                4         9         16        25
Racing, Q&W avg. time (ms)     0.356243  0.631404  0.966105   1.42076
Scatter-Gather avg. time (ms)   22.5748   24.0532   24.5246   26.4678
HCMA avg. time (ms)            22.93104   24.6846  25.49071  27.88856
HCMA Percentage of frame time  36.39848  37.97631  38.04583  41.01259
Average frame time (ms)              63        65        67        68

These gradual time increments (Figure 5.15(a)), resulting from additional LCA cells packed into MDP cells to refine simulation accuracy (while the LCA cell area is still large enough to enclose an agent), are predictable, as the linear trend in Figure 5.15(b) shows. By visual inspection we noticed tighter groups of agents, as well as more frequent character direction changes, as the area of the LCA cells decreased.


Figure 5.15: HCMA performance for 8,464 agents and 10,000 MDP cells. a) Different LCA per MDP cell ratios. b) HCMA time linear trendline.

5.5 CAMPUS SCENARIO

We have modeled a university campus¹¹ (Figure 5.16) to test the ability of the HCMA to guide crowds toward evacuation points, adding diverse terrain features that affect agent speeds.

Figure 5.16: Rendered Campus scenario model.

5.5.1 SIMULATION RESULTS

For the Campus scenario, a building evacuation (Figure 5.18) is simulated, exploiting the ability of the HCMA to use modified MDP reward values to model different types of terrain, as shown in Figure 5.17(a). This scheme, for example, rewards an agent that walks over grass to reach its objective, but offers an even greater reward to an agent that walks over clear ground. The HCMA is also compatible with this diverse-terrain scheme, modifying agent displacements accordingly. The resulting penalty-reward model is shown in Figure 5.17(b), where red areas represent the greatest reward for an agent: a rendezvous point. We measured the average frame time for the Campus scenario evacuation and registered 40 milliseconds for 4,096 agents, as shown in Table 5.11. This means that no execution-time penalty is added by the speed-variety improvement. We also verified that agents do prefer walking over areas with greater rewards, and avoid walking over areas that represent penalties, such as car circulation areas.
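One way to realize such a penalty-reward-speed scheme is a per-terrain table consulted both by the MDP reward function and by the run-time displacement update. The terrain labels, reward values and speed factors below are illustrative assumptions, not the thesis' calibrated model.

```python
# Illustrative terrain table: the MDP consults `reward`, the run-time
# displacement update scales agent velocity by `speed`.
TERRAIN = {
    "clear_ground": {"reward": 0.00,  "speed": 1.0},
    "grass":        {"reward": -0.02, "speed": 0.8},
    "road":         {"reward": -0.10, "speed": 1.0},   # penalized: car circulation
    "rendezvous":   {"reward": 1.00,  "speed": 1.0},   # the goal reward
}

def cell_reward(terrain):
    return TERRAIN[terrain]["reward"]

def displacement(base_speed, terrain, dt):
    """Distance covered in one frame, scaled by the terrain speed factor."""
    return base_speed * TERRAIN[terrain]["speed"] * dt

# An agent crossing grass covers less ground per frame than on clear ground,
# and Value Iteration over `cell_reward` steers agents away from roads:
assert displacement(1.4, "grass", 1 / 60) < displacement(1.4, "clear_ground", 1 / 60)
assert cell_reward("rendezvous") > cell_reward("grass") > cell_reward("road")
```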

¹¹ Mi Campus - Mi Campus ITESM CCM. (n.d.). Retrieved November 10, 2014, from http://micampus.ccm.itesm.mx/


Figure 5.17: Rewards and speeds for the Campus scenario. a) MDP rewards for the Campus scenario. b) Penalty-reward-speed model for the Campus scenario.

Table 5.11: Campus scenario results for 4,096 agents @ 40 ms per frame.

MDP width cells                                           200
MDP depth cells                                           200
Total MDP cells                                        40,000
MDP solution time (ms)                                  4,781
LCA width cells                                           400
LCA depth cells                                           400
Total LCA cells                                       160,000
LCA cells per MDP cell                                      4
Required GPU Memory for parallel structures (MB)   130.462646
Time to allocate parallel structures (ms)           32.699100
Time to initialize parallel structures (ms)          5.441920
Racing condition, Q&W average time (ms)              0.921846
Scatter-Gather average time (ms)                     5.843110
HCMA average time (ms)                               6.764956
HCMA Percentage of frame time                       16.912390


Figure 5.18: Campus Simulation. Top: general perspective with terrain features. Middle: close perspective. Bottom: agents favor crossing through designated areas.

5.6 MULTI-LAYERED INTERACTIVE SCENARIO

The multi-layer HCMA advances the method with the idea of guiding agents using different “channels” of Local Navigation Maps. Depending on its currently assigned channel, an agent can reach different goals under different reward-penalty schemes, opening new research and application opportunities.

5.6.1 MULTI-LAYERED SCENARIO SIMULATION RESULTS

For this Multi-layered scenario (Figure 5.19), two Local Navigation Maps are generated, one with goals located at the far left and another with goals located at the bottom cells of the scenario. In this way a busy set of central cells is created, as agents have to cross it to reach their assigned goals.

Figure 5.19: Multi-layered Simulation. Top: general perspective. Bottom: close perspective.

We found that, since Navigation is decoupled from Collision Avoidance in the HCMA implementation, the only modifications required are the ability to work with more than one LNM, and an index variable for agents to perform Navigation on their assigned channel. The modified, 2-channel HCMA presents a frame every 7 milliseconds on average for 256 agents (Table 5.12), while the original 1-channel HCMA presents a frame every 6.5 milliseconds for the same topology. Although there is no theoretical limit for the number of channels that can be processed,

preprocessing simulation time increases proportionally to the number of MDPs to solve, with the corresponding creation of Local Navigation Maps.

Table 5.12: Multi-layered scenario results for 256 agents @ 7 ms per frame.

MDP width cells                                         10
MDP depth cells                                         10
Total MDP cells                                        100
MDP (Layer 1) solution time (ms)                       375
MDP (Layer 2) solution time (ms)                       203
MDP total solution time (ms)                           578
LCA width cells                                         50
LCA depth cells                                         50
Total LCA cells                                      2,500
LCA cells per MDP cell                                  25
Required GPU Memory for parallel structures (MB)  2.042389
Time to allocate parallel structures (ms)         2.911300
Time to initialize parallel structures (ms)       0.846496
Racing condition, Q&W average time (ms)           0.109635
Scatter-Gather average time (ms)                  1.097600
HCMA average time (ms)                            1.207235
HCMA Percentage of frame time                    17.246214
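The per-agent channel index can be as simple as an extra field selecting which Local Navigation Map an agent reads. This schematic Python version (map contents, cell numbering and goals are invented for illustration) shows two agents starting at the same cell but reaching different goals through their assigned channels.

```python
# Two LNM "channels": each maps a cell to the next cell toward that
# channel's goal. Agents carry an index selecting their channel.
lnm_channels = [
    {0: 1, 1: 2, 2: 2},   # channel 0: goal at cell 2 (e.g. far-left goals)
    {0: 3, 3: 6, 6: 6},   # channel 1: goal at cell 6 (e.g. bottom goals)
]

agents = [
    {"cell": 0, "channel": 0},
    {"cell": 0, "channel": 1},
]

def navigate(agent):
    """Advance one step on the agent's assigned channel."""
    lnm = lnm_channels[agent["channel"]]
    agent["cell"] = lnm[agent["cell"]]

for _ in range(2):
    for a in agents:
        navigate(a)

# Same start cell, different channels, different goals reached:
assert agents[0]["cell"] == 2
assert agents[1]["cell"] == 6
```

Because Collision Avoidance is decoupled from Navigation, the run-time stage is unchanged; only the lookup selecting the LNM depends on the channel index.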

5.6.2 MULTI-LAYERED INTERACTIVE SCENARIO SIMULATION RESULTS

Finally, we also extended the HCMA method with the ability to update Local Navigation Maps on user demand. The user controls a cursor (Figure 5.20) that allows interactive changes to the scenario topology: obstacles can be added or removed at any MDP cell, and the interactive HCMA updates the required data structures with an extension of the modular process mentioned in Section 3.2.2. We found that, during the updating process, the simulation presents a frame every 9 milliseconds on average, meaning that two extra milliseconds is the overhead of updating dual 10 × 10 MDPs while searching for Π∗ convergence at one Value Iteration per frame. Visually and topologically, the scenario is not updated until convergence has been verified; this, however, is not mandatory, since every Value Iteration produces a candidate policy Πi, and we see an opportunity for future work in updating the scenario at every iteration, as the AI “learns” the Optimal Policy.


Figure 5.20: Multi-layered interactive Simulation. Top: general perspective with cursor. Bottom: modified scenario is solved at interactive rate.

5.7 CONCLUSION

We have presented a set of tests for the Hybrid Crowd Movement Algorithm, with the objective of verifying its accuracy and efficiency to fulfill Objective 4. For this purpose we used a parallel implementation of the HCMA to conduct six simulation experiments, obtaining results showing that, in the majority of cases, rendering is the most time-consuming task per simulated frame. These results also support that the HCMA has the following capabilities:

• An interactive large-sized crowd simulation running on a commercially available laptop PC. Results for the Town, Eiffel, Maze and Campus simulations show that crowds of up to 4,096 agents can be controlled by the HCMA, which frees enough processing resources to show varied animated characters and present a scenario model simultaneously at interactive rates, using a commercially available laptop PC.
• Simulation executed in test scenarios with diverse topology, demonstrating the accuracy and effectiveness of the HCMA. Results of the six experiments demonstrate the accuracy of the HCMA by performing Local Collision Avoidance for agents in a simulated crowd in scenarios with diverse topology, while also producing long-range Navigation paths for groups of agents within the crowd. The outcomes of these experiments confirm the effectiveness of the HCMA, as it performed both Navigation and Local Collision Avoidance tasks using from 8.6% of the time required to present a simulation frame in slightly congested scenarios, up to 62.8% of the frame time in heavily congested scenarios on the referenced hardware.
  – In a single-layer approach, the HCMA was tested with a single crowd directed to the nearest goals. This is the case for the Town scenario experiment, but we also showed additional goal configurations for which the algorithm can control and guide agents in a single-layer approach.
  – In a multiple-layer approach, the HCMA was tested with two crowds directed to multiple goals in opposite directions. With the Multi-layered scenario simulation we have shown that introducing multiple Navigation layers to the HCMA extends the goal configuration possibilities of the method.
• The HCMA was tested under different environmental conditions, such as terrain diversity, individual agent speed and complex topology, as well as dynamic scenarios in which part of the topology changes on user demand. The Campus scenario simulation results have shown that the HCMA can be configured with a speed model that affects individual agent speed as agents move over diverse terrain. With the Multi-layered Interactive simulation we presented a configurable scenario that updates Local Navigation Maps on user demand.

6. REDUCING MEMORY REQUIREMENTS FOR DIVERSE ANIMATED CROWDS

We extend a section from the doctoral thesis of Benjamín Hernández [1] with: a) a virtual motion capture system for characters animated with commercially available 3D modeling software, discussed in Section 6.1.2: Animation Clip Exporting; b) a texture-based smooth skinning system complementing the previously designed texture-based rig system for animated characters in virtual crowds, described in Section 6.1.2: Rig and Skinning Embedding and Section 6.1.4: Animation; and c) a character creation tool to load and stitch topologically compatible body parts, explained in Section 6.1.1: Pre-processing. The remaining equations and techniques are not part of our contribution, but are included to provide context for the method.

Working together as an animation technique, these methods present four advantages: first, memory requirements are reduced by 97% compared to traditional libraries; second, previously nonexistent characters can be generated from the original data set; third, embedding the rig and skinning into texture space allows painless animation transfer between different characters; and fourth, between different levels of detail of the same character. Our approach to animation diversity solves the rig and skinning reuse problem: we propose embedding the previously created rig and skinning information into a global texture parameterization shared by all generated characters. We will show that, by taking advantage of this global texture parameterization, our approach only needs one set of rigging and skinning maps per character gender in order to rig and animate the generated crowd while keeping memory requirements low.


Figure 6.1: Method Overview. Our approach has pre-processing and runtime steps. Asset organization, reduction, segmentation, and rig and skinning encoding in texture space are performed offline. During runtime the resulting assets are used for generation of diversity, animation and rendering.

6.1 METHOD

Figure 6.1 shows a general diagram of our approach, consisting of offline and runtime stages, each comprising two steps. In the first step, pre-processing (Section 6.1.1), character meshes are organized, classified, reduced and segmented, and pattern textures and color palettes are created. Then, the rig and skinning information is encoded into character texture space and animation clips are exported (Section 6.1.2). Finally, the user characterizes a desired crowd through XML scripting. At runtime, the XML description is parsed to generate outfit coordinates, body templates and crowds (Section 6.1.3), and to define which animation clips will be used (Section 6.1.4). The parsed data is then uploaded to graphics memory and, using instancing, displacement mapping and animation are computed by a vertex shader, while visual and texture diversity are computed by a fragment shader (Section 6.2).

6.1.1 PRE-PROCESSING

As crowds increase in complexity and size, more assets such as meshes, textures and animation clips need to be created. Commercially available or user-created character libraries (e.g. Rocketbox, aXYZ Design, Turbosquid) are becoming a better alternative than modeling and rigging from scratch. These libraries can be adapted to allow the generation of new models by reusing existing parts, and simplified to meet resource constraints, resulting in significant production and economic savings. In this section, we will describe the steps taken to prepare the assets for the


Figure 6.2: a) Some character examples from the library. Basis meshes: b) Male body parts. c) Female body parts. d) Children body parts.

next stages.

The library¹ (Figure 6.2a) used in this work is stored in a three-DVD set containing 104 characters (56 males, 42 females, 4 children and 2 grannies). Each character has three LOD meshes, a diffuse and a specular texture, and a normal map, each with a resolution of 2048 × 2048. Character meshes are fully rigged and available in 3ds Max and Maya binary formats. In addition, there are three sets of animation clips, one for each LOD mesh, classified by gender (female or male) and age (adult or child). All character UV maps follow a generic texture parameterization distribution (Figure 6.3). The library was classified into families according to gender, age and outfit style, and duplicates were discarded. The remaining characters were then manually segmented to generate a library of basis body-part meshes preserving their original information (absolute position and orientation, normals and texture coordinates), discarding rigging and skinning data, and maintaining topological compatibility between head-torso and torso-legs body-part meshes of the same family. After classification and segmentation, the meshes were reduced from 107 to 12 megabytes. Hair and skin textures were classified, resulting in twenty-five male and twenty-four female skin textures. These textures were downscaled and packed into a texture atlas per gender (Figure 6.4a). Following the fact that observers focus visual attention on upper body parts [135], the torso set was treated differently: we reuse a fabric-folds texture and generate two pattern textures per torso mesh in order to amplify visual variety at runtime (Figure 6.4b). Further detail is added to the facial area of the skin atlas by using an extra set of facial-feature textures (skin spots, eye sockets and facial hair). Since, in the current implementation, the illumination system only calculates diffuse lighting, specular and normal maps are discarded. After texture reorganization, texture data was reduced from 9.7 GB to 204 MB.

¹ Complete Characters. Rocketbox Libraries. http://www.rocketbox-libraries.com/

Figure 6.3: Character UVs cover these areas: 1) Head. 2) Legs. 3) Torso. 4) Arms. 5) Feet.

6.1.2 RIG AND SKINNING INFORMATION EMBEDDING AND ANIMATION CLIP EXPORTING

As mentioned in Section 6.1.1, rigging and skinning data was discarded during body-part segmentation. However, this information is required to animate new characters, and here we have two possibilities. The first is to rig and skin the characters after generation; this implies generating a specific rig and skin for each body template, specifically for each body part (e.g. the skinning information may differ between characters wearing jackets and those wearing sweaters), which is redundant and consumes valuable resources. A second option consists of maintaining partial rigs and using an approach similar to Miller et al. [144]. Nevertheless, this approach generates new information from existing information, which is not desirable in our case, and extra processing must be done for a character with different levels of detail, since LOD meshes


Figure 6.4: a) Example of the male skin texture atlas composed with the hair texture atlas. b) Examples of fabric folds and pattern textures. Notice that blank areas have a value of zero transparency.

are usually different in topology and overall shape. On the other hand, once a character has been rigged, another recurrent problem of character animation in crowds is defining how to use and reuse animations. Common approaches pack rig, skinning and animation for a character into a single file; the character is then loaded into memory and rendered several times using instancing and methods to enforce uniqueness. Different or new animations can be used by packing more animation cycles into the same file. However, this is time consuming, generates larger files, is redundant, and cannot easily incorporate new animation clips or character meshes. Our approach solves these problems, first, by generalizing rig and skinning information: we propose encoding the already existing rig and skinning information into the character UV areas (Figure 6.3) to take advantage of the dense UV space, avoiding the mesh dependency and topology issues common in traditional skinning. This generates one texture per gender that can be used interchangeably on any generated character, eliminating the need for specific or partial rigs and skins for each body part, body template or LOD mesh. It also reduces memory requirements, since no extra information needs to be generated once a new character is created. Animation clips, in turn, are stored in auxiliary textures and can thus be used at run-time interchangeably, according to user needs.

Rig and Skinning Embedding

Using the UV parameterization (Figure 6.3) for characters from the same data set, our technique stores skinning weights in the red, green and blue channels of a texture, while joint placement


(b) Skinning weights and virtual MoCap

Figure 6.5: a) Rig and Skinning embedding: smooth skinning weights from Maya are exported as textures for each bone. These textures are colored in a batch process according to a list stored in a CSV file. Finally, the textures are added to compose the weights texture and to generate joint placement zones. b) Animation Clip Exporting: an animation is captured in a CSV file where each line holds 16 floating point values representing the world space matrix for a bone in a given frame of the animation. An auxiliary program converts these CSV capture files into 4-pixel-wide RGBA textures holding the same information.

information is stored in the alpha channel of the same texture (Figure 6.5a).

1. Weight texture creation. First, the modeling software Maya is used to export a series of skinning weight textures for a generic rigged model representing one gender of the data set. Second, with the aid of the ImageMagick software and a bone-color list stored in a CSV file, a batch process colors each texture red, green or blue. This coloring process will, at run time, configure a vertex shader so that the displacement of a vertex is influenced by at most three bones in the rig hierarchy. Third, after each texture is colored, the result is added to a composite texture: the weight texture, which holds the skinning information in its red, green and blue channels (Figure 6.5a).

2. Zone texture creation. Whenever three bones in the rig hierarchy are directly connected, they are considered a rigging zone in UV space. We define, for instance, a Neck-Right Clavicle-Right Arm zone, a Right Arm-Right Forearm-Right Hand zone, and so on. Afterwards, each face in the generic character mesh is mapped to its corresponding rigging zone (with the aid of a MEL script that finds the nearest bone for each face), sequentially writing the alpha channel of the weight texture, which in our approach becomes the rig and skinning texture.
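The compositing in the weight texture creation step can be sketched as follows. This is a minimal CPU illustration, not the actual ImageMagick batch process: it assumes each bone's exported weight map is available as a float array, and that the CSV bone-color list has been reduced to a channel index per bone (all names are illustrative).

```cpp
#include <cassert>
#include <vector>

// One grayscale weight map per bone, as exported from Maya (illustrative).
using WeightMap = std::vector<float>;          // one float per texel
struct RGBTexel { float r, g, b; };

// Channel assigned to each bone by the CSV bone-color list: 0=R, 1=G, 2=B.
// Each bone's map is "tinted" with its channel and added into the composite.
std::vector<RGBTexel> compositeWeights(const std::vector<WeightMap>& boneMaps,
                                       const std::vector<int>& boneChannel,
                                       std::size_t texels) {
    std::vector<RGBTexel> weightTex(texels, {0.f, 0.f, 0.f});
    for (std::size_t b = 0; b < boneMaps.size(); ++b)
        for (std::size_t t = 0; t < texels; ++t) {
            float w = boneMaps[b][t];
            if      (boneChannel[b] == 0) weightTex[t].r += w;
            else if (boneChannel[b] == 1) weightTex[t].g += w;
            else                          weightTex[t].b += w;
        }
    return weightTex;
}
```

At run time the vertex shader reads the three channels back as the (at most) three bone weights for the texel covering a vertex.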


Figure 6.6: GoD. a) Geometry diversity after generating character templates. b) Different body shape results after applying displacement mapping and height variations to figure a. Notice that the overall shape of the heads is also modified. c) Applying visual diversity to b. d) Adding different combinations of skin features as well as clothing patterns.

Animation Clip Exporting

In a first step, within Maya, we use a Maya Embedded Language (MEL) script to capture an animation into a Comma Separated Values (CSV) file. For each frame and bone in an animation, the script records and writes a line in the CSV file with an array of 16 floating point values corresponding to the world space transformation matrix of the bone at the current animation frame (Figure 6.5b). In a second step, an auxiliary program reads the generated CSV file and writes a 4-pixel-wide RGBA texture for every recorded animation frame, so that each generated RGBA file holds the world space matrices for all bones in the rig hierarchy for one given frame. These RGBA files will form a 3D texture at run time to configure a vertex shader for animation. When compared to the original data set animations in ANIM format, exporting animation clips

in this way requires less space. This reduction in memory requirements for 50 captured animations relies on the global texture parameterization: in the original data set, for example, three animation sets (143 MB each) are required for the different ages in the female section (woman in flat heels, woman in high heels and child), whereas only one set of animation textures (56 MB) is required for all the female characters.
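The packing performed by the auxiliary converter can be sketched as below: one bone's 4x4 world-space matrix (one CSV line of 16 floats) becomes one 4-pixel-wide row of RGBA texels. The types and function name are illustrative, not the thesis' actual converter code.

```cpp
#include <array>
#include <cassert>

// A 4x4 world-space bone matrix, row-major, as read from one CSV line.
using Mat4  = std::array<float, 16>;
using Texel = std::array<float, 4>;            // one RGBA texel

// Pack the matrix into a 4-pixel-wide row of the animation texture:
// texel i holds row i of the matrix in its R, G, B and A channels.
std::array<Texel, 4> packBoneMatrix(const Mat4& m) {
    std::array<Texel, 4> row{};
    for (int i = 0; i < 4; ++i)
        for (int c = 0; c < 4; ++c)
            row[i][c] = m[i * 4 + c];
    return row;
}
```

Stacking one such row per bone yields one RGBA image per frame, and all frames together form the 3D animation texture sampled by the vertex shader.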

6.1.3 GENERATION OF DIVERSITY

We have described the offline stage of our approach so far: it resulted in asset reduction, rig and skinning information encoded in the characters' UV zones, and animation clips exported as textures. In this section we describe how these data are used to generate new characters with geometric, visual and texture diversity.

Geometrical Diversity

Generating groups of characters consists of mixing and stitching the body part meshes and texturing them automatically. The methods presented in [171, 172] introduce different approaches to the mesh fitting and mixing problem; as a preprocessing step, they follow an approach similar to the one presented in Section 6.1.1. Our case, however, is simpler because the meshes are of the same type; the original absolute positions, orientations, normals and texture coordinates are preserved, and after character segmentation the topological compatibility between body parts is preserved as well. Thus, body templates are generated by putting body part meshes together. Body part mixing and visual appearance are controlled by the user through XML scripting. Users can generate complex groups by specifying body templates with different outfit coordinates and textures. In addition, each group's body shape and height are also controllable. Crowds are formed from these groups by defining the number of characters, the type of behavior they follow (reserved for future extensions), and the animation cycles per group and their duration. Figure 6.6a shows a crowd composed of character templates; notice that characters of the same gender have the same body complexion and height. After parsing the XML file, assets are loaded into graphics memory, avoiding the loading of duplicate assets. The resulting character templates are stored as vertex buffer objects (VBOs), and textures in different texture arrays: a global texture array storing the skin, hair and legs texture atlases, a texture array for each torso storing the fabric folds and pattern textures, a texture array for lower body clothing, and a texture array storing facial features. Additional geometrical diversity is added using simple displacement mapping [139]. In our case, different heights and body and facial deformations are generated according to thin, fat and muscular bodies; these deformations are calculated using the character's unique instance id generated at initialization.
As a result, instances of the same character template are geometrically different (Figure 6.6b). Our displacement map follows the same distribution shown in Figure 6.3; thus only one map is used, encoding the thin, fat and muscular displacements in its color components.
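One way the id-driven deformation can be realized is sketched below: a deterministic hash of the unique instance id produces blending weights for the thin, fat and muscular displacement channels. The hash and the weighting scheme are illustrative assumptions; the thesis only requires some repeatable, id-driven variation.

```cpp
#include <cassert>
#include <cstdint>

// Per-instance displacement weights for the thin, fat and muscular maps.
struct BodyWeights { float thin, fat, muscular; };

// Derive deterministic, repeatable variation from the unique instance id.
// (Illustrative Knuth multiplicative hash, not the engine's actual code.)
BodyWeights bodyWeightsFromId(std::uint32_t instanceID) {
    std::uint32_t h = instanceID * 2654435761u;
    float a = (h & 0xFFu) / 255.0f;            // two pseudo-random values in [0,1]
    float b = ((h >> 8) & 0xFFu) / 255.0f;
    // Barycentric-style split so the three weights are non-negative and sum to one.
    if (a + b > 1.0f) { a = 1.0f - a; b = 1.0f - b; }
    return { a, b, 1.0f - a - b };
}
```

Because the weights depend only on the instance id, every instance reproduces the same body shape each frame without storing any per-instance geometry.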

Visual and Texture Diversity

Visual and texture diversity ([131, 132, 133] and [102], respectively) are used to obtain unique characters. Visual diversity is obtained by using color IDs, encoded in texture space or in the impostor's alpha component, to represent color addresses in a color palette. Segmentation maps for each body template, in turn, are used for texture diversity [102]. Our approach to visual and texture diversity is more general than [102] and is focused on the upper body parts. We encode generic features such as fabric folds and pattern textures for clothing, and facial hair, wrinkles, eye sockets or facial spots for facial diversity. Taking advantage of the characters' global texture parameterization, we can apply all these textures to any generated body template, thus reducing memory requirements. The character's upper body clothing texture is generated using a fabric folds texture, a pattern texture and a color palette designed by an artist. The pattern texture and color palette were designed based on fashion trends available in different on-line catalogs2 and were specifically created for each gender. Figure 6.7 shows an example of upper body clothing texture generation. The pattern texture is a color-coded texture used to look the appropriate color combination up in the color palette. Up to four patterns can be stored in a pattern texture: one pattern per color component. The male and female color palettes each store one hundred color combinations, and each color combination is made of eleven compatible colors. Different color combinations for different characters are obtained using the unique instance id; the instanceID is taken modulo 100 when it is greater than one hundred. More complex colorings are obtained by using patterns storing up to eleven color codes.
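The palette look-up described above can be sketched as follows, using the sizes stated in the text (one hundred combinations of eleven compatible colors each); the container types and function name are illustrative, not the shader's actual code.

```cpp
#include <cassert>
#include <vector>

struct RGB { float r, g, b; };

// Color palette: 100 combinations x 11 compatible colors each (as in the text).
using Palette = std::vector<std::vector<RGB>>;

// The pattern texture stores a color code (0..10) per texel; the instance id
// selects the combination, wrapping with modulus 100 when it exceeds the palette.
RGB paletteLookup(const Palette& palette, unsigned instanceID, unsigned colorCode) {
    const auto& combination = palette[instanceID % 100];
    return combination[colorCode % combination.size()];
}
```

In the fragment shader the same indexing is done with texture fetches into the palette texture rather than array accesses.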

Figure 6.7: The pattern texture defines color combination codes stored in the color palette, then results are blended with the fabric folds texture.

The fabric fold texture is blended using Equation 6.1 after color palette look-up to improve visual results. In Equation 6.1, Tc is the final clothing texture, base is the fabric fold texture and blend is the fragment color after palette look-up. Figure 6.6c shows some garments generated with our approach.

    Tc = 2.0 × base × blend                            if base < 0.5
    Tc = 1.0 − 2.0 × (1.0 − base) × (1.0 − blend)      otherwise        (6.1)

2 Fashion catalog: http://f-catalog.com/

The lower body clothing texture is generated by simple color modulation using its upper body color coordinate set. Facial features are randomly selected from the appropriate texture array and blended using Equation 6.1. The next step is the selection of the skin and hair textures to create the final texture composite for each character. In order to select different hair and skin textures, the unique character instanceID is used. The instanceID is a 1D index; to switch properly between different skin and hair textures, it is transformed into a 2D index using Equations 6.2a and 6.2b:

    X_instanceID = instanceID / N                        (6.2a)
    Y_instanceID = instanceID − X_instanceID × N         (6.2b)

where N is the number of characters. Then Equation 6.3 is used to calculate the texture coordinates that select different hair and skin textures from the atlases:

    [u, v] = [(U_bodypartmesh, V_bodypartmesh) + (X_instanceID, Y_instanceID)] × 1/5    (6.3)

Notice that the skin and hair atlases each hold twenty-five textures (Figure 6.4); thus Equation 6.3 is multiplied by a factor of 1/5. The final texture composite for each character is calculated by linearly interpolating each resultant texture. Let Ts be the skin, Th the hair, Tt the torso and Tl the lower body part texture; then the final texture T is calculated using Expression 6.4:

    T = lerp(T, Ts, αTs)
    T = lerp(T, Th, αTh)
    T = lerp(T, Tt, αTt)
    T = lerp(T, Tl, αTl)        (6.4)

where the αT values are the textures' transparency values. Figure 6.6d shows the final result of this step; in addition, it displays the additional facial features generated with the methods described above.
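Equations 6.2 and 6.3 translate directly into code. The sketch below mirrors the equations as given (integer division for Eq. 6.2a, the 1/5 factor for the 5x5 atlas); the function and variable names are illustrative, not the shader's actual identifiers.

```cpp
#include <cassert>

struct UV { float u, v; };

// Eq. 6.2a-6.2b: unfold the 1D instance id into a 2D index, then
// Eq. 6.3: offset the mesh UVs into the 5x5 atlas (factor 1/5 = 0.2).
// N is the number of characters, as in the text.
UV atlasUV(float meshU, float meshV, int instanceID, int N) {
    int x = instanceID / N;                   // Eq. 6.2a (integer division)
    int y = instanceID - x * N;               // Eq. 6.2b
    return { (meshU + x) * 0.2f, (meshV + y) * 0.2f };   // Eq. 6.3
}
```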

6.1.4 ANIMATION

Having discussed the creation of a varied family of characters, in this section we introduce the algorithm that uses both the generated characters and the rig to produce animation. For performance, this algorithm was implemented using parallel programming, in particular using shader

programs. In order to obtain the best frame rate possible, Linear Blend Skinning (LBS) with an influence limit of up to three bones and forward kinematics were selected to animate the characters, although any other skinning technique could be applied. The rig and skinning texture (Section 6.1.2) is the input for a vertex shader that will, first, determine the rigging zone for the current vertex; second, determine the corresponding skinning weights; and third, according to the rigging zone, multiply the corresponding bone matrices by their assigned weights, add them, and multiply the result by the original vertex in a classic matrix-palette skinning algorithm (chosen for performance, as mentioned earlier, but open to other skinning techniques). Assume that vertex v is attached to joints j1, j2, j3 with weights w1, w2, w3, and that M_{j_i} represents the transformation matrix of joint j_i. Then the transformed vertex v', in LBS, is obtained using Equation 6.5:

    v' = Σ_{i=1}^{n} w_i M_{j_i} v        (6.5)

Equation 6.5 is modified for texture-based rigging: let T_id be the rigging zone texture, T_w the smooth weight texture, (u, v) the texture coordinates of the character family, V the set of vertices of a given mesh, and M_id the transformation matrix for joint id. Then a pose is calculated using Expressions 6.6 and 6.7:

    id = T_id(u, v)                          (6.6)
    V[id] += T_w(u, v) M_id V[id]            (6.7)
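A CPU sketch of the linear blend skinning of Equation 6.5 with the three-bone limit follows. In the engine this runs in a vertex shader, with the weights fetched from the RGB channels of the rig and skinning texture; here the weights and matrices are passed in directly, and the types are illustrative.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;            // row-major

// 4x4 matrix times homogeneous vector.
Vec4 mulMV(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i * 4 + j] * v[j];
    return r;
}

// Equation 6.5: v' = sum_i w_i * M_{j_i} * v, with up to three influences
// (the weights come from the RGB channels of the weight texture).
Vec4 skinVertex(const Vec4& v,
                const std::array<Mat4, 3>& boneMatrices,
                const std::array<float, 3>& weights) {
    Vec4 out{};
    for (int i = 0; i < 3; ++i) {
        Vec4 t = mulMV(boneMatrices[i], v);
        for (int c = 0; c < 4; ++c) out[c] += weights[i] * t[c];
    }
    return out;
}
```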

Figure 6.8a shows an animation sequence applied to three different characters. Since the characters' UV maps are in full correspondence, the same rig and skinning can be used indistinctly for any character in the family. We note that:

1. The implementation is based on SIMD instructions; iterations over all the vertices of the set V are substituted by texture look-ups.

2. Storing the animations in a texture array also makes it possible to apply any of the animations simultaneously to different characters in the crowd, or to combine animations. We therefore have different characters, all animatable with any of the animations, or combinations of them, at any level of detail.

3. Our implementation encodes and reproduces a previously generated rigging and skinning process (in this implementation, in the form of world matrices, though Euler angles and displacements could also be used), embedded into textures.


Figure 6.8: a) Transferring animation to different characters. b) Transferring animation to the same character in different levels of detail.

6.2 IMPLEMENTATION

The system was implemented using C++, OpenGL and GLSL. Body part meshes, textures and the XML script are parsed using the Assimp, DevIL and TinyXML libraries, respectively. Figure 6.9 shows a general diagram of the implementation after assets have been loaded into graphics memory and after the culling and LOD-selection stage [170]. Character positions and unique ids are stored in textures, and meshes in vertex buffer objects. Rig and skinning information and the different displacement maps are stored in texture arrays. Animation clips and all the textures required for the generation of diversity are also stored in texture arrays.

Figure 6.9: General diagram of the run-time stage.

As can be seen, the run-time stage was implemented in vertex and fragment programs. The

vertex program performs character displacement mapping (Section 6.1.3) and animation (Section 6.1.4), and calculates auxiliary variables used by the fragment program. The fragment program performs the operations described in Section 6.1.3 and the final lighting calculations. An interesting feature of the run-time stage is its simplicity, which is important to preserve in real-time crowd simulations given the vast amount of calculations that must be performed (behavior simulation, visualization, illumination, motion control, level-of-detail computations, etc.).

6.3 ADDITIONAL RESULTS

In this section we present additional results and tests performed on a laptop computer with an NVIDIA GeForce 710M GPU and 1024 MB of VRAM. Table 6.1 summarizes the space savings3 of our approach; the Library and Our method values are given in megabytes. The results show that it is important to inspect and reorganize asset libraries in order to avoid redundancy and to ease asset organization in terms of application design. However, it is also necessary to propose methods that take advantage of such reductions by reusing the remaining assets to generate new items.

Table 6.1: Reduction Summary.

             Library   Our method   Reduction
Mesh             108           12         89%
Texture        9,700          204         98%
Animation        708          112         84%
Total         10,516          328         97%

Table 6.2 shows the GPU memory used when running our method. Notice that our approach benefits from asset reuse as the number of groups increases; Figure 6.10(a) illustrates this behavior. Numerical values are given in megabytes and, in this case, the number of character instances per group is irrelevant, since OpenGL instancing does not increase the total amount of memory used. We have also measured the time in seconds it takes to render a frame when running our method. As Figure 6.10(b) shows, rendering time increases as more groups are included, due to the intensive texture binding process.

6.4 CONCLUSIONS AND FUTURE WORK

We have presented a method to generate and animate virtual characters with four advantages:

1. Memory requirements are reduced by 97% when compared to traditional libraries. Results show that most of the memory space is spent on textures (204 MB, 62%), then animations (112 MB, 34%) and finally polygonal meshes (12 MB, 4%). However, the global texture

3 Space savings is calculated as follows: S = 1 − ReducedSize / OriginalSize


Table 6.2: GPU memory load. Values are given in megabytes.

Groups                 1        4        8       16       20
GPU memory load   110.17   144.96   293.68   388.10   424.10
Memory per group  110.17    36.24    36.71    24.26    21.21

Figure 6.10: Rendering results. a) GPU memory load. b) GPU rendering times per character and group.

parameterization the characters share allows the generation of a reduced number of texture sets that can be used indistinctly by any character and for visual and texture diversity generation, while additional savings are obtained by focusing texture detail on the upper body parts. Character uniqueness is enforced by using polygonal meshes and displacement mapping to depict geometrical and animation diversity. Figure 6.11 shows an animation example for a crowd of characters with a UV parameterization different from the one used in Figures 6.3, 6.5a and 6.8b.

2. Generation of previously nonexistent characters from the original data set. This is especially useful to reduce costs directly related to asset acquisition, while authoring time is reduced to the time it takes to type a simple XML script.

3. Embedding the rig and skinning into texture space allows painless animation transfer between different characters. Preserving the global texture parameterization ensures that full correspondence of the major body features is also preserved for the whole family of gener-


Figure 6.11: GoD for an additional character. a) UV parameterization, rig texture, skinning texture, skin map. b) Geometry for a single character. c) Different body shape results after applying displacement mapping and height variations to figure b. d) Applying visual diversity to c and adding different combinations of clothing patterns.

ated characters. Embedding the rig and skinning into this texture space allows us to generate these textures only once and apply them to other characters of the same family (Figure 6.8a); thus achieving animation diversity is straightforward and keeps animation space requirements low. Texture space has been found useful in several areas of computer graphics, such as global illumination [173] and geometry images [174, 175]; its associated dense space avoids the size dependency and topology issues common in vertex representations. In addition, in the case of facial modeling, it has been shown [176, 177] that using a parameterized space is author friendly and reduces the required modeling time and complexity. Our approach extends these methods, but instead of generating a specific parameterization, we take full advantage of the texture parameterization already existing in polygonal meshes (Figures 6.3 and 6.11). The method is more intuitive because it operates in texture space rather than in 3D, which reduces complexity, and not just for authoring new characters.

4. Different levels of detail handling. An important feature crowd visualization should add is the possibility to handle frustum culling, occlusion culling and level of detail. Our approach manages discrete LOD (Figure 6.8b), making possible the use of alternative representations such as impostors4 and point-based rendering for characters far from the camera's point of view.

However, our approach has some drawbacks that can be overcome in future work:

• Each body part family has its own physical proportions (weight and height) and its own topological compatibility between head-torso and torso-legs; thus, in the current implementation, meshes from different genders or ages cannot be combined.

• Our implementation of semi-automatic rig zone texture creation can be made fully automatic, based on a previously known skeleton hierarchy.
Moreover, as the skeleton is implicit in the rig zone texture, it can also be extracted from it.

4 Increasing memory requirements, which is undesired.

• We are working on a rigging zone mapping algorithm, so that an animation can be transferred not only among characters with the same UV parameterization, but also among characters from different UV families.

• Our current implementation uses manually pregenerated rig and skinning information. We are working on generating smooth skinning weights automatically. We currently use the most common skinning algorithm, but we are studying a better alternative: a distance-based approach [Rudomin and Castillo 2002]. In this way we would avoid the artifacts of the most common skinning algorithms at a negligible extra cost.

• Our algorithm uses textures extensively; the use of texture arrays reduced texture binding, but further improvements are possible by taking advantage of the OpenGL bindless textures extension.

• User characterization of crowds using XML can be enhanced by defining higher-order features such as weather conditions, fashion trends or season of the year to constantly update the pattern and color textures.

7. COLLISION AVOIDANCE SIMULATOR

In order to generate a crowd simulation useful for testing the concepts discussed in Chapters 3, 4 and 5, we require a configurable crowd rendering engine that meets the following requirements:

• Common configurable requirements:

  1. An efficient, interactive and extensible graphical rendering core.
  2. Loading and efficient display of user-defined 3D models for characters, scenarios and additional environmental objects.
  3. Implementation of virtual cameras to evaluate selected aspects of the simulation.
  4. Post-processing effects to enhance the appearance of the simulation.
  5. Loading and playback of sound effects and music to complete the simulation experience.

• For the GPU-based Markov Decision Process Solver:

  1. A mechanism to load the generated MDP-Bézier solutions.
  2. A mechanism to evaluate the loaded Bézier curves in real time for all agents.
  3. A configurable CUDA-compatible rendering core to test the adaptive aspect of the method.
  4. A mechanism to transform agent-related data into 2D textures to comply with the requirements of the method.


  5. A configurable GLSL-compatible rendering core to test the shaders defined in the method.

• For the Cellular Automaton for Local Collision Avoidance:

  1. A configurable CUDA-compatible rendering core to test the GPU implementation.
  2. A mechanism to displace the characters with information generated by the Cellular Automaton.

We have developed a crowd rendering engine named CASIM1 (Collision Avoidance Simulator) that fulfills these requirements. Its core is written in the C++ programming language and uses the OpenGL2 library for its graphical capabilities, with additional characteristics and extensions detailed in the rest of this section. We have also developed a Character Generator to produce diverse models from a set of topologically compatible parts that share similar UV maps.

7.1 THIRD PARTY COMPONENTS

External Open Source projects that fulfill requirements of the rendering engine have been integrated; they are listed in this section.

7.1.1 AUDIO

As Blauert [178] has stated, even though human beings are primarily visually oriented, auditory stimuli are an essential factor that helps the perceiver become conscious of what is perceived. This concept justifies integrating an audio processor into our rendering engine, as it helps the user experience a more realistic simulation. Two external Open Source projects therefore operate closely with our rendering engine: OpenAL3 and FMOD4. The former is relevant for its positional audio capabilities, the latter for its sound quality and mature programming interface.

7.1.2 IMAGES

Images in two dimensions directly impact the performance of a rendering engine for the following reasons:

1 Ruiz, S. (n.d.). Collision Avoidance Simulator. Retrieved June 21, 2012, from https://sites.google.com/site/collisionavoidancesimulator
2 OpenGL - The Industry Standard for High Performance Graphics. (n.d.). Retrieved June 21, 2012, from http://www.opengl.org/
3 Peacock, D. (n.d.). Home - OpenAL. Creative Labs: Connect. Retrieved June 21, 2012, from http://connect.creativelabs.com/openal/default.aspx
4 fmod Interactive Audio. (n.d.). Retrieved June 21, 2012, from http://www.fmod.org/

1. They are used to wrap, or "skin", objects present in a scene, to the point that they are known as textures. From the images used to represent skies, floors and walls, to the small details or decals in clothing, two-dimensional images are mapped onto three-dimensional objects by means of the uv-mapping technique described by Hernández [1]. This justifies the need to include an efficient two-dimensional image manager within our rendering engine.

2. As computational resources are scarce, every means capable of reducing processing burden should be utilized. One of these means is the MIP mapping technique [179], introduced by Williams [180] in 1983; its idea is to use a hierarchy of images to simplify the rendering of complex objects at varying levels of detail. In fact, MIP is the acronym for "multum in parvo", or "much in little", and our image manager should support this feature. Specifically, we want to exploit the DirectDraw Surface5 (DDS) file format, introduced with DirectX 7 to support uncompressed and compressed textures.

3. They are also used to create Geometry Images [174], and to perform Image-based Rigging [1], off-screen rendering (via Frame Buffer Objects) and post-processing effects that will be addressed in the following subsections.

Considering the importance of efficient two-dimensional image management, we delegate the core task of image loading to the Open Source project Developer's Image Library6 (DevIL), for its ability to efficiently load a wide variety of image formats (including the DDS format) into the memory of the host.
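The memory arithmetic behind MIP mapping can be made concrete with a small helper, shown here as a self-contained sketch rather than engine code: the number of levels follows from repeated halving, and a full chain of a square power-of-two texture costs only about one third more memory than the base level.

```cpp
#include <cassert>
#include <cstddef>

// Number of MIP levels for a w x h texture: halve each dimension until 1x1.
int mipLevels(int w, int h) {
    int levels = 1;
    while (w > 1 || h > 1) {
        w = (w > 1) ? w / 2 : 1;
        h = (h > 1) ? h / 2 : 1;
        ++levels;
    }
    return levels;
}

// Total texel count of the full chain; for square power-of-two textures this
// approaches 4/3 of the base level, i.e. roughly 33% extra memory.
std::size_t mipChainTexels(int w, int h) {
    std::size_t total = 0;
    for (int l = mipLevels(w, h); l > 0; --l) {
        total += static_cast<std::size_t>(w) * h;
        w = (w > 1) ? w / 2 : 1;
        h = (h > 1) ? h / 2 : 1;
    }
    return total;
}
```

For example, a 256x256 texture has 9 levels, and a 4x4 texture's full chain holds 16 + 4 + 1 = 21 texels.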

7.1.3 MODEL PARSING

Every object displayed by our rendering engine consists of a set of triangles that together form a closed three-dimensional surface, also known as a mesh. Several file formats, both proprietary and Open Source, have arisen to describe and compress this mesh data. Before considering model file parsing, however, an implementation question arises: why parse external model files into a real-time simulation instead of generating the data within the rendering engine? The answer comes from the fact that such file formats are the result of an artistic and technical process that requires its own set of tools, which in turn are out of the scope of real-time rendering. Based on this fact, a three-dimensional model parser is required in order to obtain uniformly, regardless of the enclosing file format, the following data for each vertex in every triangle in the model:

Location

The x, y and z coordinates for a vertex in a three-dimensional frame of reference.

5 DDS. (n.d.). MSDN. Retrieved June 21, 2012, from http://msdn.microsoft.com/en-us/library/windows/desktop/bb943990%28v=vs.85%29.aspx
6 DevIL - A full featured cross-platform Image Library. (n.d.). Retrieved June 21, 2012, from http://openil.sourceforge.net/

Normal

The normalized direction vector (nx, ny, nz) in which the light will reflect from a vertex.

Color

When the mesh is not textured, the (Red, Green, Blue) mixture of colors to render a vertex.

Tangent

The natural displacement normalized direction vector (tx, ty, tz) for a vertex.

Texture Coordinates

When the mesh is textured, the coordinates (u, v) determine the two-dimensional point assigned to a vertex within the texture associated with the model; hence the term "uv-mapping".

Once collected, this data is stored within a 64-byte padded structure that is efficient for rendering and defines our basic rendering unit. We entrust the model loading routines to the ASSIMP7 (Open Asset Import Library) project, which provides a full asset importing pipeline for use in real-time rendering systems.
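A 64-byte padded vertex of this kind can be sketched as below. The field order and the explicit padding are illustrative assumptions; the thesis only specifies the five attributes and the 64-byte size.

```cpp
#include <cassert>
#include <cstddef>

// One vertex of the basic rendering unit, padded to 64 bytes so that vertices
// fall on cache- and GPU-friendly boundaries (field layout illustrative).
struct Vertex {
    float location[3];   // x, y, z
    float normal[3];     // nx, ny, nz
    float color[3];      // r, g, b (used when the mesh is untextured)
    float tangent[3];    // tx, ty, tz
    float uv[2];         // u, v (used when the mesh is textured)
    float pad[2];        // explicit padding: 14 floats of data + 2 = 64 bytes
};

static_assert(sizeof(Vertex) == 64, "vertex must stay 64-byte padded");
```

Keeping the stride at exactly 64 bytes also makes the `glVertexAttribPointer` offsets trivial to compute when the structure is uploaded to a vertex buffer object.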

7.1.4 SCRIPTING

The rendering engine will be capable of interpreting rendering instructions and scenario topology instructions. We have decided to exploit the self-descriptive nature of the Extensible Markup Language (XML), as well as the simplicity of the INI file standard, for the following purposes:

• We have found that INI files are suitable for describing which XML files containing rendering or behavioral instructions must be parsed. An example follows in Listing 7.1:

Listing 7.1: CASIM File Parsing Descriptor.

    [constants]
    file = "CONFIG/THEMES/EVACA/constants.xml"
    [behavior]
    file = "CONFIG/THEMES/EVACA/behavior.xml"
    [objects]
    file = "CONFIG/THEMES/EVACA/objects.xml"
    [agents]
    file = "CONFIG/THEMES/EVACA/agents.xml"
    [skybox]
    file = "CONFIG/THEMES/EVACA/skybox.xml"
    [cameras]
    file = "CONFIG/THEMES/EVACA/cameras.xml"
    [lights]
    file = "CONFIG/THEMES/EVACA/lights.xml"
    [audio]
    file = "CONFIG/THEMES/EVACA/audio.xml"
    [menu]
    file = "CONFIG/THEMES/EVACA/menu.xml"
    [net]
    file = "CONFIG/THEMES/EVACA/netprotocol.ini"
    [log]
    file = "CONFIG/THEMES/EVACA/log.html"

7 Open Asset Import Library. (n.d.). Retrieved June 21, 2012, from http://assimp.sourceforge.net/
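The section-to-file mapping such a descriptor encodes can be resolved with a few lines of parsing. The sketch below is a toy stand-in (CASIM delegates real INI parsing to iniParser, whose API is not reproduced here); it assumes each `file = ` value appears on its own line under its `[section]`.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy reader for descriptors like Listing 7.1: returns section -> file path.
// (Illustrative only; the engine uses the iniParser library instead.)
std::map<std::string, std::string> parseFileDescriptor(const std::string& ini) {
    std::map<std::string, std::string> files;
    std::istringstream in(ini);
    std::string line, section;
    while (std::getline(in, line)) {
        if (line.size() > 1 && line.front() == '[')
            section = line.substr(1, line.find(']') - 1);   // "[agents]" -> "agents"
        else if (line.rfind("file = ", 0) == 0)
            files[section] = line.substr(7);                // value after "file = "
    }
    return files;
}
```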

• We also found that INI files are useful as simple network protocol describers, as shown in Listing 7.2:

Listing 7.2: CASIM Network Descriptor.

    [casim_net_protocol]
    CAM_FORWARD        = 1
    CAM_BACKWARD       = 2
    CAM_LEFT           = 3
    CAM_RIGHT          = 4
    CAM_NEXT           = 16
    CAM_PREV           = 17
    CAM_ZOOM_IN        = 18
    CAM_ZOOM_OUT       = 19
    CAM_PAN_LEFT       = 21
    CAM_PAN_RIGHT      = 22
    CAM_PAN_UP         = 23
    CAM_PAN_DOWN       = 24
    CAM_NO_STEREO      = 26
    CAM_ACTIVE_STEREO  = 27
    CAM_PASSIVE_STEREO = 28
    TOGGLE_BLOOM       = 5
    TOGGLE_SSAO        = 6
    TOGGLE_SHADOWS     = 7
    TOGGLE_BUMPMAPPING = 8
    TOGGLE_AXIS        = 9
    TOGGLE_PATHS       = 10
    TOGGLE_PARTICLES   = 11
    TOGGLE_COLLISIONS  = 12
    TOGGLE_GEOMETRIES  = 13
    TOGGLE_FRUSTUMS    = 14
    TOGGLE_TANGENTS    = 15
    TOGGLE_CAM_TILT    = 25
    EXIT_CASIM         = 20

• The Open Source project iniParser8 provides a suitable interface to read and extract data from INI files, and is integrated into our rendering engine.

• Utilizing the constructs provided by the XML Specifications9, we define the following instruction holders:


Agents

Defines input simulation data for agents: name, number of clones, visual effects, scale, format, model file, associated 2D textures, initial position, initial direction and static B´ezier control points.

Audio

Defines input simulation data for audible content: audio library provider, number of tracks, track name, track type (footstep, background or music) and audio file name.

Cameras

Defines input data for virtual cameras: camera name, type (free or fixed), visual aids for the camera object, 3D file name, initial position and direction,

8 iniParser: stand-alone ini parser library in ANSI C. (n.d.). Nicolas Devillard home page. Retrieved June 21, 2012, from http://ndevilla.free.fr/iniparser/
9 Extensible Markup Language (XML). (n.d.). World Wide Web Consortium (W3C). Retrieved June 22, 2012, from http://www.w3.org/XML/

Field of View (vertical aperture), frustum near and far distances, pivot point and up vector.

Constants

Defines constant simulation rendering values: OpenGL context version, 2D or Rectangular texture target and rendering techniques (Immediate mode, Instancing), scene dimensions (width, height, depth, center), window dimensions, post processing effects constants, collision detection technique, View Frustum Culling technique and provider (CPU or GPU) and 3D rendering type (Anaglyph, Active Stereo, No Stereo).

Lights

Defines light simulation data: for global ambient light, the RGBA values; for a user-defined number of positional or directional lights: position, color, ambient and specular components, and direction.

Menu

Defines a list of menu entries to be displayed on top of the simulation window.

Objects

Defines simulation data for additional environmental objects: name, number of clones, visual effects, scale, format, model file, associated 2D textures, initial position and direction.

Skybox

Defines input simulation data to construct a rectangular prism (in which the simulation will be displayed, providing a simulated physical environment): front, back, left, right, top and bottom textures with optional independent tiling factors.

• An example of a rendering constants instruction holder in XML form is presented in Listing 7.3 (tag names lost during document extraction are elided):

Listing 7.3: CASIM Rendering Constants Descriptor.

<glut_context_major_version>3</glut_context_major_version>
...
<texture_target>GL_TEXTURE_RECTANGLE</texture_target>
...
<ssao_ratio>1.0</ssao_ratio>
...
<collision_detection_enabled>TRUE</collision_detection_enabled>
<collision_detection_provider>CPU</collision_detection_provider>
<collision_vicinity>60</collision_vicinity>
<agent_translation_provider>CPU</agent_translation_provider>
...

• The Open Source project TinyXML10 provides a suitable interface to parse data from XML files, and will be integrated into our rendering engine.

7.1.5 COMPUTER GRAPHICS MATHEMATICS

In Computer Graphics, mathematical operations such as matrix multiplication, quaternion algebra and 3D vector algebra are common. A useful tool to program these routines is the GLM11 library, which is used extensively throughout the engine source code.

7.1.6 CODE PROFILING

Evaluating the efficiency of the rendering core is crucial to improving its performance. Knowing the time invested in every rendering stage reveals processing bottlenecks, which can then be analyzed and addressed with appropriate measures: from rewriting inefficient code to acquiring better-performing hardware. The VSL12 library provides a useful profiling tool that has been integrated into our rendering engine.

7.1.7 DOCUMENTATION

The Open Source project Doxygen13, along with the Microsoft® HTML Help Compiler14 (HHC), is used to generate a compiled HTML help file to aid in the development and maintenance of the CASIM project (Figure 7.1).

7.2 INTERNAL COMPONENTS

The following components have been developed to integrate the crowd rendering engine (Figure 7.2), each of them as a modular, object-oriented C++ project.

3D Model

An interface to ASSIMP and the rendering core; handles the loading and display functions for every object (whether a character or an environmental element) in a crowd simulation.

10 TinyXML — Free Development software downloads at SourceForge.net. (n.d.). SourceForge. Retrieved June 21, 2012, from http://sourceforge.net/projects/tinyxml/
11 OpenGL Mathematics. (n.d.). OpenGL Mathematics. Retrieved June 22, 2012, from http://glm.g-truc.net/
12 Very Simple * Libs. (n.d.). Lighthouse3d.com. Retrieved June 22, 2012, from http://www.lighthouse3d.com/very-simple-libs/
13 Doxygen. (n.d.). Retrieved June 22, 2012, from http://www.stack.nl/~dimitri/doxygen/
14 Download: HTML Help Workshop and Documentation - Microsoft Download Center. (n.d.). Microsoft Corporation. Retrieved June 22, 2012, from http://www.microsoft.com/en-us/download/details.aspx?id=21138


Figure 7.1: CASIM documentation.

AI

An extensible interface to the rendering core, to test diverse AI techniques for agents.

Audio Manager

An interface to OpenAL, FMOD and the rendering core, loads and reproduces audible crowd simulation data such as environmental sounds and music.

Camera

A programmed abstraction of a video camera that can emulate pan, tilt, dolly and zoom actions. Works closely with the Frustum project.

Config

In charge of reading and validating both INI and XML files to initialize a crowd simulation. Works closely with the XML Parser project.

CUDA Path Manager

An interface to OpenGL, CUDA15 and the rendering core, in charge of uploading data to the GPU, processing requests, and downloading results back from it.

FBO Manager

A handler for the OpenGL facility known as a Frame Buffer Object: an off-screen texture to store user-defined data. It is also used to generate post-processing visual effects such as Blur, Bloom and Screen-Space Ambient Occlusion.

Frustum

An efficient programmed representation of a truncated pyramid; implements the CPU-based Geometric and Radar View Frustum Culling methods. Every Camera instance has an associated Frustum object.

15 Parallel Programming and Computing Platform — CUDA — NVIDIA. (n.d.). NVIDIA. Retrieved June 25, 2012, from http://www.nvidia.com/object/cuda_home_new.html


Figure 7.2: Collision Avoidance Simulator architecture.

GL Error Manager

A utility to detect and display errors related to the OpenGL library.

GL Ext. Manager

A utility to detect hardware compatibility with OpenGL implementation extensions.

GLSL Manager

A utility to read, compile, and load Shaders into GPU memory, allowing user-managed activation.

Log Manager

A Singleton Pattern object to consistently report crowd simulation status in the console window and, in HTML format, to provide a persistent crowd simulation report.

Macros

A listing to define internal engine states and C++ compiler constants, exposing library functionalities to the rendering core. Also describes commonly used algorithms that are not programmed as functions, but declared as C Macros.

Projection Manager

An interface to OpenGL that conveniently handles rendering mode switches between 2D and 3D. This procedure is useful for generating post-processing effects and for printing text on screen.

Screen Text

A utility to encapsulate on-screen text display functionalities.

Skybox Manager

This project encapsulates Skybox (distant visual components of the scenario such as the sky, mountains or cityscapes) functionality, in order to efficiently load, tile and display its associated textures.

Socket Server

Provides a networking interface for the rendering engine, implementing the use of standard Network Sockets (currently implements Microsoft® Windows sockets16 (WinSocks), but is extensible to support other operating systems).

Spatial Config

A project that uniquely defines position and orientation for a simulated object in three dimensions using an azimuth and a zenith angle. It is constructed as a separate project because the spatial configuration of every simulated object must be referred to consistently and efficiently.

Static LoD

Implements a version of the GPGPU Static Level of Detail technique by Hernández and Rudomin [170], accelerating the rendering speed of the engine core.

String Utils

A utility project to manage Shader text strings. Also extends the functionality of the Standard Template Library17 (STL) to include every string handling routine required for the rendering engine.

System Info

A project to analyze and determine hardware capabilities prior to, and during, the execution of the simulation. Currently reports CPU percentage usage (only for Microsoft® Windows, but extensible to support other operating systems), GPU vendor and memory load.

Texture Manager

The purpose of this project is to efficiently load compressed and uncompressed 2D images (using the DevIL library), provide a mechanism to display them (whether they have transparency or not), and finally free the memory occupied when these images are no longer needed. The project also provides a programmed screen capture utility for the rendering engine.

VBO Manager

Vertex Buffer Objects are an OpenGL facility to construct basic rendering unit arrays that will be packed and sent in bulk to the GPU, achieving a more efficient, faster rendering. This project encapsulates the VBO functionality and is also able to bind a VBO to CUDA processing, in such a way that a change to the bound VBO made with OpenGL will be automatically reflected in CUDA, and vice versa.

16 Windows.Networking.Sockets namespace. (n.d.). MSDN. Retrieved June 22, 2012, from http://msdn.microsoft.com/en-us/library/windows/apps/windows.networking.sockets
17 Standard Template Library Programmer's Guide. (n.d.). SGI. Retrieved June 22, 2012, from http://www.sgi.com/tech/stl/

Vertex

Defines the basic modeling unit. Its importance resides in that (i) it is used by every project within the rendering engine that defines points to be eventually rasterized to the screen, and (ii) its memory usage efficiency impacts the performance of the rendering engine. The current engine design reserves 64 bytes of memory per vertex to ensure efficient rendering.

Figure 7.3: CASIM rendering features.

XML Parser

An interface to the TinyXML library; holds a mapping structure for tags that, when parsed, are stored in data structures the rendering engine uses in its initialization stage. This project may be extended to accept any user-defined data that TinyXML can parse.

Figure 7.3 shows a collection of screen captures demonstrating the rendering features of CASIM.

7.3 CHARACTER GENERATOR

Figure 7.4: CHARACTOY workspace and visual aids.

We present character authoring software with built-in functionality to load and automatically stitch compatible body parts. CHARACTOY (Figure 7.4) is a software tool that, given a set of the following 3D models: (i) arms, (ii) feet, (iii) hair, (iv) head, (v) legs, (vi) torso, and a set of textures of type: (i) template, (ii) diffuse, (iii) normal, (iv) specular, is able to stitch these topologically compatible body parts according to the layout in the template texture. The user can then exchange a specific body part or texture, while the software automatically finds the boundary UV coordinates and welds the parts to compose a single model with complete triangular geometry, UV parameterization, and diffuse, normal and specular textures. This interchangeable-parts scheme allows for novel character generation from a standard character database (Figure 7.5), given that the required parts are already sectioned.

Figure 7.5: Novel character generation process with CHARACTOY.

CHARACTOY is programmed in C++, and makes use of the Qt Project18 framework for the Graphical User Interface and the ImageMagick19 software for the image handling routines.

18 Qt Project. (n.d.). Retrieved November 10, 2014, from http://qt-project.org/
19 Convert, Edit, and Compose Images. (n.d.). Retrieved November 10, 2014, from http://www.imagemagick.org/

8. CONCLUSION

In the fields of entertainment, disaster prevention, public safety and urban planning, accurate crowd simulations are a valuable tool to the extent that they can reproduce human behavior. If, in addition, these simulations can optimize hardware resources, their value increases with the power to simulate larger numbers of agents, since world population and population density trends point to a reality in which dealing with crowd-related issues will be increasingly common. In this thesis we address the subject of simulating crowd phenomena in an accurate and efficient manner; for this purpose we designed and implemented a hybrid method for Macro and Micro simulation of crowd behavior. In this chapter we summarize our work and contributions related to the development of this method (Section 8.1), and finally discuss topics for future work in Section 8.2.

8.1 SUMMARY AND CONTRIBUTIONS

Through research of related work, we gathered results to establish the validity of two definitions that support the theory behind our hybrid method. These definitions are:

1. Macro crowd level optimal paths may be accurately simulated by Q-learning penalty-reward schemes evaluated over a mesh A that represents possible actions concerning path choices for groups of agents in Large-sized crowds.

2. Micro crowd level least-effort paths may be accurately simulated by a Cellular Automaton over a lattice of sites B that partitions navigable space for Small-sized crowds.

For Macro crowd simulations, we demonstrated that a parallel algorithm is able to accurately and efficiently solve the mapping from a navigable scenario with obstacles to a Q-learning formalism, delivering Long-Range paths for a Macro crowd simulation when such Q-learning formalism solves a Markov Decision Process via Value Iteration. For Micro crowd simulations, we showed that a parallel algorithm is able to accurately and efficiently solve the Local Collision Avoidance problem via generation of cyclic phenomena typical of a Cellular Automaton. We showed this with the implementation of an LCA Cellular Automaton based on Lattice Gas Cellular Automata principles, which grants separation and individual control for agents.

We designed our Hybrid Crowd Movement Algorithm (HCMA) with the idea that Macro and Micro simulations can share and optimize computational resources without sacrificing optimal Long-Range path planning or finer control for agents. To this end we defined a Local Navigation Map that functions as an interface between Navigation and Local Collision Avoidance, separating their concerns but at the same time allowing them to cooperate in their common task of moving crowds. This cooperative concept is the cornerstone of our HCMA, supporting the following conclusions:

Conclusion 1. Let SPACE be the area of navigable space in a virtual scenario for a simulated crowd. Let MACRO be the partition of SPACE corresponding to the set of states for the Navigation MDP. Then MICRO can in turn be a partition of MACRO used for the lattice of connected sites in the Local Collision Avoidance Cellular Automaton when cooperative execution of Macro and Micro simulation algorithms is required.

Conclusion 2. A hybrid algorithm for Macro and Micro simulation of crowd behavior is able to accurately and efficiently:

1. Generate an Optimal Policy Π∗ from the MDP to produce Long-Range paths for groups of agents.

2. Use Π∗ as input for a Local Navigation Map, which in turn will guide simulated agents in a crowd toward the directions signaled by Π∗ while avoiding local collisions guided by the LCA Cellular Automaton algorithm.
An implication of the above conclusions is that a parallel implementation of the HCMA is not only viable but straightforward, since we have shown that each of the algorithms that compose it has a parallel version. Finally, we tested the scope, accuracy and efficiency of the HCMA by presenting an analysis of execution results from crowd simulations controlled by a parallel HCMA implementation. For efficient character animation and modeling, we presented a virtual motion capture system for characters animated with commercially available 3D modeling software; a texture-based smooth skinning system complementing a previously designed texture-based rig system for animated characters in virtual crowds that reduces memory requirements; and character generation software to load and stitch topologically compatible body parts. To simulate virtual crowds moving within a virtual scenario and responding to user interaction, we presented the architecture for a configurable rendering engine. An implementation of this rendering engine architecture simulates a crowd with such characteristics, controlled in real time by the parallel HCMA.

We contributed applications of the above techniques to simulate crowds for casual walking, evacuations and video games. Sections of this research are also applied to interactive visualization of characters in GPU clusters at the Barcelona Supercomputing Center.

8.2 FUTURE WORK

During development and implementation of the Hybrid Crowd Movement Algorithm, we encountered opportunities to improve its performance, as well as ideas for future work which are listed below.

8.2.1 PERFORMANCE IMPROVEMENT

In order to improve performance of the HCMA, and considering the possibility of simulating Huge-sized crowds, we present the following topics as opportunities for future work.

LCA LoD

We have found that heavily congested scenarios impact performance of the HCMA; gradual accuracy resolution of Local Collision Avoidance, from individually controlled agents for close visualization to fluid-like control for characters far from the camera, would improve simulation performance.

LCA Culling

Since characters outside of the camera frustum are culled to improve rendering performance, statistical LCA estimation for characters outside of the camera frustum can also improve the performance of AI calculations.

Cluster implementation

The lattice of connected sites approach allows for a Cluster implementation of the HCMA if site boundary conditions are adapted to work in scheduled computing nodes set to perform the same task. A Cluster HCMA implementation would enable Huge-sized simulated crowds.

8.2.2 NEW APPLICATIONS

Agent type

A possible extension of the HCMA is to adapt it to simulate other types of agents, such as animals or vehicles.

Robotics

Another possible application for the MDP solver is to provide Optimal Policies faster than sequential methods for robots with on-board mobile GPUs.

Motion planning

The MDP solver can also be applied to provide locomotion parameters for characters faster than sequential Q-learning methods.

Scenario authoring

We have also noticed that scenario authoring can be a difficult task when we lack the means to reproduce topology features automatically. Another opportunity for future work is to standardize scenario authoring with the use of Geographical Information Systems.

Finally, we find an opportunity for future work in developing methods to correlate measurements from real crowd Navigation and Collision Avoidance to their simulated versions, producing test methods that can validate the latter.

BIBLIOGRAPHY

[1] ARREGUÍN, B. H. Modeling and Animation of Virtual Characters for Crowds. PhD thesis, ITESM, Feb 2008.

[2] THALMANN, D., AND MUSSE, S. R. Crowd Simulation. Springer, 2007.

[3] LE BON, G. The Crowd: A Study of the Popular Mind. The Criminology Series. The Macmillan Co., 1896. About the electronic version: The Crowd: A Study of the Popular Mind. (n.d.). University of Virginia Library. Retrieved March 7, 2012, from http://etext.lib.virginia.edu/toc/modeng/public/BonCrow.html.

[4] JOHANSSON, A., BATTY, M., HAYASHI, K., BAR, O. A., MARCOZZI, D., AND MEMISH, Z. A. Crowd and environmental management during mass gatherings. In Mass Gatherings Health. Centre for Advanced Spatial Analysis, University College, London, UK, 2012.

[5] ZHOU, S., CHEN, D., CAI, W., LUO, L., LOW, M. Y. H., TIAN, F., TAY, V. S.-H., ONG, D. W. S., AND HAMILTON, B. D. Crowd modeling and simulation technologies. Transactions on Modeling and Computer Simulation (TOMACS) 20, 4 (Oct 2010).

[6] MARTIN, J. Programming real-time computer systems. Prentice-Hall series in automatic computation. Prentice-Hall, 1965.

[7] AKENINE-MÖLLER, T., HAINES, E., AND HOFFMAN, N. Real-Time Rendering, third ed. A K Peters, Ltd., 2008.

[8] BANERJEE, B., ABUKMAIL, A., AND KRAEMER, L. Advancing the layered approach to agent-based crowd simulation. In Proceedings of the 22nd Workshop on Principles of Advanced and Distributed Simulation (2008), IEEE Computer Society, pp. 185–192.

[9] MAGNENAT-THALMANN, N., AND THALMANN, D. Handbook of Virtual Humans. John Wiley & Sons, Ltd, 2004.

[10] REYNOLDS, C. W. Flocks, herds, and schools: A distributed behavioral model. In SIGGRAPH (1987).

[11] MCCRAE, R. R., AND JOHN, O. P. An introduction to the five-factor model and its applications. Journal of Personality 60, 2 (Jun 1992), 175–215.

[12] GUY, S. J., KIM, S., LIN, M. C., AND MANOCHA, D. Simulating heterogeneous crowd behavior using personality trait theory. In SIGGRAPH Symposium on Computer Animation (2011), Eurographics/ACM.

[13] BELTAIEF, O., HADOUAJ, S. E., AND GHEDIRA, K. Multi-agent simulation model of pedestrians crowd based on psychological theories. In Logistics (LOGISTIQUA) (2011), IEEE.


[14] BRUNO, L., AND VENUTI, F. The pedestrian speed-density relation: Modelling and application. In Footbridge (2008), Wroclaw University of Technology.

[15] HOU, B., WANG, B., YAO, Y., LIAO, D., AND LIU, J. Crowd psychology simulation incorporating psychometrics and intervention of relationship spaces. In First International Conference on Advances in System Simulation (2009).

[16] GARCÍA-ROJAS, A., GUTIÉRREZ, M., AND THALMANN, D. Simulation of individual spontaneous reactive behavior. In International Conference on Autonomous Agents and Multiagent Systems (2008), IFAAMAS.

[17] GARCÍA-ROJAS, A., VEXO, F., AND THALMANN, D. Individualized reaction movements for virtual humans. In Conference on Computer Graphics and Interaction Techniques (2006), ACM.

[18] DURUPINAR, F., ALLBECK, J., PELECHANO, N., AND BADLER, N. Creating crowd variation with the OCEAN personality model. Autonomous Agents and Multi-Agent Systems (2008), 1217–1220.

[19] NORMAN, D. A. Emotional Design: Why we love (or hate) everyday things. Basic Books, 2004.

[20] SHARMA, S., OTUNBA, S., AND HAN, J. Crowd simulation in emergency aircraft evacuation using virtual reality. In The 16th International Conference on Computer Games (2011).

[21] REYNOLDS, C. W. Steering behaviors for autonomous characters. In Game Developers Conference (1999), Miller Freeman Game Group, pp. 763–782.

[22] CHATTARAJ, U., SEYFRIED, A., AND CHAKROBORTY, P. Understanding pedestrian motion across cultures: Experiments and modeling. Presentation for the Indian Institute of Technology at Kanpur, 2009.

[23] FRIDMAN, N., ZILKA, A., AND KAMINKA, G. A. The impact of cultural differences on crowd dynamics in pedestrian and evacuation domains. Tech. Rep. FA86551013048, The MAVERICK Group, Sep 2011.

[24] HALL, E. T. A system for the notation of proxemic behavior. American Anthropologist 65, 5 (Oct 1963), 1003–1026.

[25] LEVINE, R. V., AND NORENZAYAN, A. The pace of life in 31 countries. Journal of Cross-Cultural Psychology 30, 2 (Mar 1999), 178–205.

[26] MOUSSAÏD, M., PEROZO, N., GARNIER, S., HELBING, D., AND THERAULAZ, G. The walking behaviour of pedestrian social groups and its impact on crowd dynamics. PLoS ONE 5, 4 (Apr 2010).


[27] LOSCOS, C., MARCHAL, D., AND MEYER, A. Intuitive crowd behaviour in dense urban environments using local laws. In Theory and Practice of Computer Graphics (2003).

[28] HE, C., XIAO, H., DONG, W., AND DENG, L. Dynamic group behavior for real-time multi-agent crowd simulation. In International Conference on Computer and Automation Engineering (2010), IEEE.

[29] MAGNENAT-THALMANN, N., AND THALMANN, D. Artificial Life and Virtual Reality. John Wiley & Sons Ltd., 1994.

[30] RAUPP, S., AND THALMANN, D. Hierarchical model for real time simulation of virtual human crowds. In IEEE Transactions on Visualization and Computer Graphics (2001).

[31] MAGNENAT-THALMANN, N., AND THALMANN, D. Real-time individualized virtual humans. Lecture at SIGGRAPH Asia, 2008.

[32] BLANK, B., BROADBENT, A., CRANE, A., AND PASTERNAK, G. Defeating the authoring bottleneck: Techniques for quickly and efficiently populating simulated environments. In IMAGE Conference (2009).

[33] COUZIN, I. D., IOANNOU, C. C., DEMIREL, G., GROSS, T., TORNEY, C. J., HARTNETT, A., CONRADT, L., LEVIN, S. A., AND LEONARD, N. E. Uninformed individuals promote democratic consensus in animal groups. Science Magazine 334, 6062 (December 2011), 1578–1580.

[34] HART, P. E., NILSSON, N. J., AND RAPHAEL, B. A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on 4, 2 (1968), 100–107.

[35] DIJKSTRA, E. W. A note on two problems in connexion with graphs. Numerische Mathematik 1, 1 (1959), 269–271.

[36] CORMEN, T. H., LEISERSON, C. E., RIVEST, R. L., AND STEIN, C. Introduction to Algorithms, second ed. The MIT Press/McGraw-Hill Book Company, 2001.

[37] ROSSMANN, J., HEMPE, N., AND TIETJEN, P. A flexible model for real-time crowd simulation. In International Conference on Systems, Man, and Cybernetics (Oct 2009), IEEE, pp. 2085–2090.

[38] HELBING, D., AND MOLNAR, P. Social force model for pedestrian dynamics. Physical Review E 51 (1995), 4282.

[39] PELECHANO, N., O'BRIEN, K., SILVERMAN, B., AND BADLER, N. Crowd simulation incorporating agent psychological models, roles and communication. In First International Workshop on Crowd Simulation (V-CROWDS) (Oct 2005), vol. 29, ACM, Pergamon Press, Inc.


[40] STILL, G. K. Crowd Dynamics. PhD thesis, University of Warwick, Aug 2000.

[41] BLUE, V. J., EMBRECHTS, M. J., AND ADLER, J. L. Cellular automata modeling of pedestrian movements. In Systems, Man, and Cybernetics (Oct 1997), vol. 3, IEEE, pp. 2320–2323.

[42] GRONNELOV, T. A. Parallel implementation of large scale crowd simulation. Crowd simulation with CUDA - Greenleaf. (n.d.). Greenleaf. Retrieved March 10, 2012, from http://www.greenleaf.dk/projects/cudacrowd, 2009.

[43] ZHANG, S., LI, M., LI, F., LIU, A., AND CAI, D. A simulation model of pedestrian flow based on geographical cellular automata. In Geoinformatics (Jun 2011), IEEE.

[44] ZHIQUIANG, K., CHONGCHONG, Y., LI, T., AND JINGYAN, W. Simulation of evacuation based on multi-agent and cellular automaton. In Mechatronic Science, Electric Engineering and Computer (Aug 2011), IEEE.

[45] NAVIER, C. L. M. H. Mémoire sur les lois du mouvement des fluides. Mem. Acad. Sci. Inst. Fr. 6 (1823), 389–416.

[46] CHIRILA, D. B. Introduction to lattice Boltzmann methods. In Alfred-Wegener-Institut für Polar- und Meeresforschung (Feb 2010).

[47] WAGNER, A. J. A Practical Introduction to the Lattice Boltzmann Method. North Dakota State University, Mar 2008.

[48] PETTRÉ, J. Motion planning and autonomy for virtual humans: Part 4. Case study 2. Part I: design and simulation of virtual crowds. In SIGGRAPH (2008), ACM.

[49] SARMADY, S., HARON, F., AND TALIB, A. Z. Simulating crowd movements using fine grid cellular automata. In International Conference on Computer Modelling and Simulation (2010), IEEE, pp. 428–433.

[50] NARAIN, R., GOLAS, A., CURTIS, S., AND LIN, M. C. Aggregate dynamics for dense crowd simulation. In SIGGRAPH (2009), vol. 28, ACM.

[51] LEE, Y., WAMPLER, K., BERNSTEIN, G., POPOVIĆ, J., AND POPOVIĆ, Z. Motion fields for interactive character locomotion. In Graphics (2010), vol. 6, ACM.

[52] PARK, M. J. Guiding flows for controlling crowds. The Visual Computer 26, 11 (Jan 2010), 1383–1391.

[53] BIAN, C., CHEN, D., AND WANG, S. Velocity field based modelling & simulation of crowd in confrontation operations. In International Conference on Parallel and Distributed Systems (2010), IEEE, pp. 646–651.

[54] BATTY, M. Agent-based pedestrian modelling. In Centre for Advanced Spatial Analysis (2003), Working Paper Series, University College London.


[55] PEJSA, T., AND ANDRIST, S. Reinforcement learning approaches for locomotion planning in interactive applications. Final project for the Machine Learning course at the University of Wisconsin, Madison, 2011.

[56] RUSSELL, S. J., AND NORVIG, P. Artificial Intelligence: A Modern Approach, third ed. Prentice Hall, 2009.

[57] REYES, A., SPAAN, M. T., AND SUCAR, L. E. An intelligent assistant for power plants based on factored MDPs. In Intelligent System Applications to Power Systems, ISAP '09, 15th International Conference on (2009), IEEE, pp. 1–6.

[58] FOKA, A., AND TRAHANIAS, P. Predictive autonomous robot navigation. In Intelligent Robots and Systems, IEEE/RSJ International Conference (2002), IEEE, pp. 490–495.

[59] CASSANDRA, A., KAELBLING, L., AND KURIEN, J. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation. In Intelligent Robots and Systems '96, IROS 96, Proceedings of the 1996 IEEE/RSJ International Conference on (Nov 1996), vol. 2, pp. 963–972.

[60] SUCAR, L. Parallel Markov decision processes. In Advances in Probabilistic Graphical Models, P. Lucas, J. Gómez, and A. Salmerón, Eds., vol. 214 of Studies in Fuzziness and Soft Computing. Springer Berlin Heidelberg, 2007, pp. 295–309.

[61] MEULEAU, N., HAUSKRECHT, M., KIM, K.-E., PESHKIN, L., KAELBLING, L. P., DEAN, T., AND BOUTILIER, C. Solving very large weakly coupled Markov decision processes. In Proceedings of the Fifteenth National/Tenth Conference on Artificial Intelligence/Innovative Applications of Artificial Intelligence (Menlo Park, CA, USA, 1998), AAAI '98/IAAI '98, American Association for Artificial Intelligence, pp. 165–172.

[62] SINGH, S., AND COHN, D. How to dynamically merge Markov decision processes. In NIPS 11, M. Mozer, M. Jordan, and T. Petsche, Eds. MIT Press, Cambridge, MA, USA, 1998.

[63] BOUTILIER, C., BRAFMAN, R. I., AND GEIB, C. Prioritized goal decomposition of Markov decision processes: Toward a synthesis of classical and decision theoretic planning. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence - Volume 2 (San Francisco, CA, USA, 1997), IJCAI '97, Morgan Kaufmann Publishers Inc., pp. 1156–1162.

[64] FOKA, A., AND TRAHANIAS, P. Predictive control of robot velocity to avoid obstacles in dynamic environments. In International Conference on Intelligent Robots and Systems, IROS 2003 (2003), IEEE, pp. 370–375.

[65] FOKA, A., AND TRAHANIAS, P. Real-time hierarchical POMDPs for autonomous robot navigation. Robot. Auton. Syst. 55, 7 (July 2007), 561–571.


[66] GUESTRIN, C., AND GORDON, G. Distributed planning in hierarchical factored mdps. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (San Francisco, CA, USA, 2002), UAI’02, Morgan Kaufmann Publishers Inc., pp. 197–206. [67] MAUSAM, AND WELD, D. S. Solving concurrent markov decision processes. In Proceedings of the 19th National Conference on Artifical Intelligence (2004), AAAI’04, AAAI Press, pp. 716–722. [68] BONET, B., AND GEFFNER, H. Labeled RTDP: Improving the convergence of realtime dynamic programming. In Proc. 13th Int. Conf. on Automated Planning & Scheduling (ICAPS-2003) (2003), AAAI Press, AAAI Press, p. 12–31. [69] BERTSEKAS, D. P. Dynamic Programming and Optimal Control, 2nd ed. Athena Scientific, 2000. [70] MATTSON, T., SANDERS, B., AND MASSINGILL, B. Patterns for Parallel Programming, first ed. Addison-Wesley Professional, 2004. [71] JOHANNSSON, A. GPU-Based Markov Decision Process Solver. Master Thesis, School of Computer Science at Reykjav´ık University, 2009. [72] CHEN, P., AND LU, L. Markov decision process parallel value iteration algorithm on gpu. In International Conference on Information Science and Computer Applications (2013), Advances in Intelligent Systems Research, Atlantis Press. [73] NOER, D. Parallelization of the Value-Iteration algorithm for Partially Observable Markov Decision Processes. B.Sc. Thesis. Department of DTU Compute, Technical University of Denmark, 2013. [74] JU, E., CHOI, M. G., PARK, M., LEE, J., LEE, K. H., AND TAKAHASHI, S. Morphable crowds. In Graphics (Dec 2010), vol. 29, ACM. ´ J., GRILLON, H., AND THALMANN, D. Crowds of moving objects: Navigation [75] PETTRE, planning and simulation. In SIGGRAPH (2008), ACM, pp. 309–315. [76] LASZLO, M. J. Computational Geometry and Computer Graphics in C++. Rose Kernan/Prentice Hall, 1996. [77] KARAMOUZAS, I., GERAERTS, R., AND OVERMARS, M. Indicative routes for path planning and crowd simulation. In ICFDG (Apr 2009), ACM, pp. 113–120. 
[78] MUSTAPHA, N. A. B., BADE, A. B., AND KARI, S. A review of collision avoidance technique for crowd simulation. In International Conference on Information and Multimedia Technology (2009), IEEE.
[79] GREGG, C., AND HAZELWOOD, K. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In International Symposium on Performance Analysis of Systems and Software (ISPASS) (2011), IEEE.

BIBLIOGRAPHY

[80] FEINBUBE, F., TRÖGER, P., AND POLZE, A. Joint forces: From multithreaded programming to GPU computing. IEEE Software 28, 1 (Jan. 2011), 51–57.
[81] LLORCA, D. F., MILANÉS, V., ALONSO, I. P., GAVILÁN, M., DAZA, I. G., PÉREZ, J., AND SOTELO, M. A. Autonomous pedestrian collision avoidance using a fuzzy steering controller. In Intelligent Transportation Systems (Jun. 2011), IEEE.
[82] LI, L., YANG, S., ZHOU, W., AND CHEN, G. Mechanism for constructing the dynamic collision avoidance knowledge-base by machine learning. In International Conference on Manufacturing Automation (2010), IEEE.
[83] BONNER, S., AND KELLEY, R. B. A novel representation for planning 3-D collision-free paths. In Systems, Man, and Cybernetics (Nov. 1990), vol. 20, IEEE, pp. 1337–1351.
[84] NIENKEMPERSTR, A. R. Use of linked lists in collision detection ("Einsatz von verketteten Listen bei der Kollisionserkennung"). Web page, 3D-Spieleentwicklung, Jan. 2011. Retrieved March 7, 2012, from http://www.spieleprogrammierung.net/2011/01/einsatz-von-verketteten-listen-bei-der.html.
[85] VELHO, L., CARVALHO, P. C. P., DE FIGUEIREDO, L. H., AND GOMES, J. Mathematical Optimization in Computer Graphics and Vision. The Morgan Kaufmann Series in Computer Graphics. Elsevier, 2008.
[86] BING, H., YANGZIHAO, W., AND JIA, Z. An improved method of continuous collision detection using ellipsoids. In Computer-Aided Industrial Design & Conceptual Design (Nov. 2009), IEEE.
[87] LARSSON, T., AND AKENINE-MÖLLER, T. Collision detection for continuously deforming bodies. In EUROGRAPHICS (2001), The Eurographics Association.
[88] REYNOLDS, C. Big fast crowds on PS3. In Sandbox Symposium (Jul. 2006), ACM.
[89] CHAKRAVARTHY, A., AND GHOSE, D. Obstacle avoidance in a dynamic environment: A collision cone approach. In Systems, Man, and Cybernetics (Sep. 1998), vol. 28, IEEE.
[90] LI, X., ZHONG, Z., AND LU, Z. Collision detection algorithm based on slice projection. In International Conference on Mechatronics and Automation (Aug. 2009), IEEE.
[91] VAN DEN BERG, J., GUY, S. J., LIN, M. C., AND MANOCHA, D. Reciprocal n-body collision avoidance. In International Symposium on Robotics Research (ISRR) (May 2011), vol. 70, IFRR, pp. 3–19.
[92] VAN DEN BERG, J., LIN, M., AND MANOCHA, D. Reciprocal velocity obstacles for real-time multi-agent navigation. In International Conference on Robotics and Automation (ICRA) (May 2008), IEEE, pp. 1928–1935.


[93] GUY, S. J., CHHUGANI, J., KIM, C., SATISH, N., LIN, M., MANOCHA, D., AND DUBEY, P. ClearPath: Highly parallel collision avoidance for multi-agent simulation. In Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (New York, NY, USA, 2009), SCA '09, ACM, pp. 177–187.
[94] MORGAN, G., AND STOREY, K. Scalable collision detection for massively multiplayer online games. In The 19th International Conference on Advanced Information Networking and Applications (AINA) (2005), IEEE.
[95] ONDŘEJ, J., PETTRÉ, J., OLIVIER, A. H., AND DONIKIAN, S. A synthetic-vision based steering approach for crowd simulation. In Transactions on Graphics (Jul. 2010), vol. 29, ACM.
[96] ONDŘEJ, J. Modeling and planning a realistic navigation for autonomous virtual humans. PhD thesis, Université Européenne de Bretagne, May 2011.
[97] MOUSSAÏD, M., GARNIER, S., THERAULAZ, G., AND HELBING, D. Collective information processing and pattern formation in swarms, flocks and crowds. Topics in Cognitive Science 1, 3 (2009), 469–497.
[98] VISWANATHAN, V., AND LEES, M. An information-based perception model for agent-based crowd and egress simulation. In International Conference on Cyberworlds (CW) (Oct. 2011), IEEE, pp. 38–45.
[99] PANDZIC, I., CAPIN, T., MAGNENAT-THALMANN, N., AND THALMANN, D. A versatile navigation interface for virtual humans in collaborative virtual environments. In VRST (1997), ACM.
[100] THALMANN, D. Crowd and group animation. Course lecture notes, 2004.
[101] RITCHIE, K., ALEXANDER, O., AND BIRI, K. The Art of Rigging, vol. 2. Dover Publications, Aug. 1990.
[102] MAÏM, J., YERSIN, B., THALMANN, D., AND PETTRÉ, J. YaQ: An architecture for real-time navigation and rendering of varied crowds. In Computer Graphics and Applications (Jul. 2009), vol. 29, IEEE, pp. 44–53.
[103] BARAN, I., AND POPOVIĆ, J. Automatic rigging and animation of 3D characters. In SIGGRAPH (Jul. 2007), vol. 26, ACM.
[104] PANTUWONG, N., AND SUGIMOTO, M. A fully automatic rigging algorithm for 3D character animation. In SIGGRAPH (Dec. 2011), ACM.
[105] VASILAKIS, A., AND FUDOS, I. Skeleton-based rigid skinning for character animation. In International Conference on Computer Graphics Theory and Applications (Feb. 2009), VISIGRAPP.


[106] ZHENG, Q., LI, F. W., AND LAU, R. W. Sketching-based skeleton generation. In Ubi-media Computing (U-Media) (Jul. 2010), IEEE, pp. 179–186.
[107] EBERLY, D. H. 3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics. Morgan Kaufmann Publishers, 2000.
[108] MAGNENAT-THALMANN, N., LAPERRIÈRE, R., AND THALMANN, D. Joint-dependent local deformations for hand animation and object grasping. In Graphics Interface (1988), pp. 26–33.
[109] JACKA, D., REID, A., MERRY, B., AND GAIN, J. A comparison of linear skinning techniques for character animation. In AFRIGRAPH (Oct. 2007), ACM.
[110] TOKOI, K., AND KOMIYA, S. Particle based skinning. In Conference on Computer Graphics, Imaging and Visualization (2009), IEEE, pp. 32–37.
[111] CHEN, C.-H., LIN, I.-C., TSAI, M.-H., AND LU, P.-H. Lattice-based skinning and deformation for real-time skeleton-driven animation. In International Conference on Computer-Aided Design and Computer Graphics (2011), IEEE, pp. 306–312.
[112] JAMES, D. L., AND TWIGG, C. D. Skinning mesh animations. In SIGGRAPH (Aug. 2005), vol. 24.
[113] CHENG, Y. Mean shift, mode seeking, and clustering. In Transactions on Pattern Analysis and Machine Intelligence (Aug. 1995), vol. 17, IEEE, pp. 790–799.
[114] KAVAN, L., AND ŽÁRA, J. Spherical blend skinning: A real-time deformation of articulated models. In Symposium on Interactive 3D Graphics and Games (2005), ACM.
[115] KAVAN, L., O'SULLIVAN, C., AND ŽÁRA, J. Efficient collision detection for spherical blend skinning. In GRAPHITE (Dec. 2006), ACM, pp. 36–149.
[116] KAVAN, L., MCDONNELL, R., DOBBYN, S., ŽÁRA, J., AND O'SULLIVAN, C. Skinning arbitrary deformations. In Symposium on Interactive 3D Graphics and Games (I3D) (Apr. 2007), ACM.
[117] KAVAN, L., COLLINS, S., ŽÁRA, J., AND O'SULLIVAN, C. Skinning with dual quaternions. In Symposium on Interactive 3D Graphics and Games (Apr. 2007), ACM.
[118] KAVAN, L., COLLINS, S., ŽÁRA, J., AND O'SULLIVAN, C. Geometric skinning with approximate dual quaternion blending. In Transactions on Graphics (Oct. 2008), vol. 27, ACM.
[119] KAVAN, L., COLLINS, S., AND O'SULLIVAN, C. Automatic linearization of nonlinear skinning. In Symposium on Interactive 3D Graphics and Games (I3D) (Feb. 2009), ACM, pp. 49–56.


[120] JU, T., ZHOU, Q.-Y., VAN DE PANNE, M., AND COHEN-OR, D. Reusable skinning templates using cage-based deformations. In SIGGRAPH (Dec. 2008), vol. 27.
[121] MILLER, C., ARIKAN, O., AND FUSSELL, D. Frankenrigs: Building character rigs from multiple sources. In Transactions on Visualization and Computer Graphics (TVCG) (Aug. 2011), vol. 17, IEEE, pp. 1060–1070.
[122] IKEMOTO, L., ARIKAN, O., AND FORSYTH, D. Learning to move autonomously in a hostile world. In SIGGRAPH (2005), ACM.
[123] MA, W., XIA, S., HODGINS, J. K., YANG, X., LI, C., AND WANG, Z. Modeling style and variation in human motion. In Eurographics / ACM SIGGRAPH (2010), M. Otaduy and Z. Popović, Eds., The Eurographics Association.
[124] WAND, M., AND STRASSER, W. Multi-resolution rendering of complex animated scenes. Comput. Graph. Forum 21, 3 (2002), 483–491.
[125] ULICNY, B., CIECHOMSKI, P. D. H., AND THALMANN, D. CrowdBrush: Interactive authoring of real-time crowd scenes. In Proceedings of the 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (Aire-la-Ville, Switzerland, 2004), SCA '04, Eurographics Association, pp. 243–252.
[126] RUDOMÍN, I., AND MILLÁN, E. Point based rendering and displaced subdivision for interactive animation of crowds of clothed characters. In VRIPHYS 2004: Virtual Reality Interaction and Physical Simulation Workshop (2004), pp. 139–148.
[127] AUBEL, A., BOULIC, R., AND THALMANN, D. Animated impostors for real-time display of numerous virtual humans. In Proc. Virtual Worlds 98 (1998), Springer-Verlag, pp. 14–28.
[128] AUBEL, A., BOULIC, R., AND THALMANN, D. Lowering the cost of virtual human rendering with structured animated impostors. In Proceedings of WSCG 99 (Plzen, Czech Republic, 1999).
[129] AUBEL, A., BOULIC, R., AND THALMANN, D. Real-time display of virtual humans: Levels of details and impostors. IEEE Transactions on Circuits and Systems for Video Technology 10, 2 (2000), 207–217.
[130] TECCHIA, F., AND CHRYSANTHOU, Y. Real-time rendering of densely populated urban environments. In Proceedings of the Eurographics Workshop on Rendering Techniques 2000 (London, UK, 2000), Springer-Verlag, pp. 83–88.
[131] TECCHIA, F., LOSCOS, C., AND CHRYSANTHOU, Y. Image-based crowd rendering. IEEE Computer Graphics and Applications 22, 2 (2002), 36–43.


[132] DOBBYN, S., HAMILL, J., O'CONOR, K., AND O'SULLIVAN, C. Geopostors: A real-time geometry/impostor crowd rendering system. In Proceedings of the 2005 Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2005), I3D '05, ACM, pp. 95–102.
[133] MILLÁN, E., AND RUDOMÍN, I. Impostors and pseudo-instancing for GPU crowd rendering. In GRAPHITE '06: Proceedings of the 4th International Conference on Computer Graphics and Interactive Techniques in Australasia and Southeast Asia (New York, NY, USA, 2006), ACM Press, pp. 49–55.
[134] BEACCO, A., ANDUJAR, C., PELECHANO, N., AND SPANLANG, B. Efficient rendering of animated characters through optimized per-joint impostors. Comput. Animat. Virtual Worlds 23 (Feb. 2012), 33–47.
[135] MCDONNELL, R., LARKIN, M., HERNÁNDEZ, B., RUDOMÍN, I., AND O'SULLIVAN, C. Eye-catching crowds: Saliency based selective variation. In SIGGRAPH '09: ACM SIGGRAPH 2009 Papers (New York, NY, USA, 2009), ACM, pp. 1–10.
[136] SEO, H., AND MAGNENAT-THALMANN, N. An automatic modeling of human bodies from sizing parameters. In I3D '03: Proceedings of the 2003 Symposium on Interactive 3D Graphics (New York, NY, USA, 2003), ACM, pp. 19–26.
[137] SEO, H., AND MAGNENAT-THALMANN, N. An example-based approach to human body manipulation. Graph. Models 66, 1 (2004), 1–23.
[138] KASAP, M., AND MAGNENAT-THALMANN, N. Parameterized human body model for real-time applications. In CW '07: Proceedings of the 2007 International Conference on Cyberworlds (Washington, DC, USA, 2007), IEEE Computer Society, pp. 160–167.
[139] GALVAO, R., LAYCOCK, R. G., AND DAY, A. M. GPU techniques for creating visually diverse crowds in real-time. In VRST '08: Proceedings of the 2008 ACM Symposium on Virtual Reality Software and Technology (New York, NY, USA, 2008), ACM, pp. 79–86.
[140] YU, L.-F., YEUNG, S.-K., TERZOPOULOS, D., AND CHAN, T. F. DressUp!: Outfit synthesis through automatic optimization. ACM Trans. Graph. 31, 6 (Nov. 2012), 134:1–134:14.
[141] LIU, S., FENG, J., SONG, Z., ZHANG, T., LU, H., XU, C., AND YAN, S. Hi, magic closet, tell me what to wear! In Proceedings of the 20th ACM International Conference on Multimedia (New York, NY, USA, 2012), MM '12, ACM, pp. 619–628.
[142] MCDONNELL, R., LARKIN, M., DOBBYN, S., COLLINS, S., AND O'SULLIVAN, C. Clone attack! Perception of crowd variety. ACM Trans. Graph. 27, 3 (Aug. 2008), 26:1–26:8.


[143] POIRIER, M., AND PAQUETTE, E. Rig retargeting for 3D animation. In GI '09: Proceedings of Graphics Interface 2009 (Toronto, Ont., Canada, 2009), Canadian Information Processing Society, pp. 103–110.
[144] MILLER, C., ARIKAN, O., AND FUSSELL, D. Frankenrigs: Building character rigs from multiple sources. In I3D '10: Proceedings of the 2010 ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2010), ACM, pp. 31–38.
[145] KAVAN, L., SLOAN, P.-P., AND O'SULLIVAN, C. Fast and efficient skinning of animated meshes. Comput. Graph. Forum 29, 2 (2010), 327–336.
[146] HASLER, N., THORMÄHLEN, T., ROSENHAHN, B., AND SEIDEL, H.-P. Learning skeletons for shape and pose. In Proceedings of I3D 2010: ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (Washington, DC, USA, Feb. 2010), A. Varshney, C. Wyman, D. Aliaga, and M. M. Oliveira, Eds., ACM, pp. 23–30.
[147] LE, B., AND DENG, Z. Smooth skinning decomposition with rigid bones. ACM Trans. Graph. 31, 6 (Dec. 2012).
[148] AHEARN, L. 3D Game Environments. Elsevier, 2008.
[149] BELLMAN, R. A Markovian decision process. Indiana Univ. Math. J. 6 (1957), 679–684.
[150] YONG-KUI, L. The generation of straight lines on hexagonal grids. Computer Graphics Forum 12, 1 (1993), 27–31.
[151] BRIMKOV, V. E., AND BARNEVA, R. P. Analytical honeycomb geometry for raster and volume graphics. Comput. J. 48, 2 (Mar. 2005), 180–199.
[152] PAPADIMITRIOU, C., AND TSITSIKLIS, J. N. The complexity of Markov decision processes. Math. Oper. Res. 12, 3 (Aug. 1987), 441–450.
[153] VAN TOLL, W. G., COOK IV, A. F., AND GERAERTS, R. Real-time density-based crowd simulation. Comput. Animat. Virtual Worlds 23, 1 (Feb. 2012), 59–69.
[154] LITTMAN, M. L., DEAN, T. L., AND KAELBLING, L. P. On the complexity of solving Markov decision problems. In Proc. of the Eleventh International Conference on Uncertainty in Artificial Intelligence (1995), pp. 394–402.
[155] BELL, N., AND HOBEROCK, J. Thrust: A productivity-oriented library for CUDA. In GPU Computing Gems Jade Edition, W. W. Hwu, Ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2011.
[156] NVIDIA CORPORATION. NVIDIA Jetson TK1 development kit. Web page, Oct. 2014.
[157] NVIDIA CORPORATION. Tegra K1 mobile processor. Web page, Oct. 2014.


[158] HEYES-JONES, J. Implementation of the A* algorithm in C++. Web page, Mar. 2014. justinhj/astar-algorithm-cpp, GitHub. Retrieved May 6, 2014, from https://github.com/justinhj/astar-algorithm-cpp.
[159] RUIZ, S., HERNÁNDEZ, B., AND RUDOMÍN, I. A GPU implementation of Markov decision processes for path planning. In The 5th International Supercomputing Conference in Mexico (ISUM 2014) (2014).
[160] KOENIG, S., LIKHACHEV, M., AND FURCY, D. Lifelong Planning A*. Artificial Intelligence (2004), 93–146.
[161] KOENIG, S., AND LIKHACHEV, M. D* Lite. In Eighteenth National Conference on Artificial Intelligence (Menlo Park, CA, USA, 2002), American Association for Artificial Intelligence, pp. 476–483.
[162] LIKHACHEV, M., FERGUSON, D., GORDON, G., STENTZ, A. T., AND THRUN, S. Anytime Dynamic A*: An anytime, replanning algorithm. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS) (June 2005).
[163] VAN DEN BERG, J. Anytime path planning and replanning in dynamic environments. In Proceedings of the International Conference on Robotics and Automation (2006), pp. 2366–2371.
[164] LIKHACHEV, M., GORDON, G. J., AND THRUN, S. ARA*: Anytime A* with provable bounds on sub-optimality. In NIPS (2003).
[165] SHAO, L., AND ZHOU, H. Curve fitting with Bézier cubics. In Graphical Models and Image Processing (1996), vol. 58, pp. 223–232.
[166] NOURGALIEV, R., DINH, T., THEOFANOUS, T., AND JOSEPH, D. The lattice Boltzmann equation method: Theoretical interpretation, numerics and implications. In International Journal of Multiphase Flow (2003), vol. 29, Elsevier, pp. 117–169.
[167] SUKOP, M. C., AND THORNE, D. T. Lattice Boltzmann Modeling: An Introduction for Geoscientists and Engineers, 1st ed. Springer Publishing Company, 2007.
[168] ZOU, Q., AND HE, X. On pressure and velocity boundary conditions for the lattice Boltzmann BGK model. In Physics of Fluids (1997), vol. 9, pp. 1591–1598.
[169] DIJKSTRA, E. W. Cooperating sequential processes. In The Origin of Concurrent Programming, P. B. Hansen, Ed. Springer-Verlag New York, Inc., New York, NY, USA, 2002, pp. 65–138.
[170] HERNÁNDEZ, B., AND RUDOMÍN, I. A rendering pipeline for real-time crowds. In GPU Pro 2. A. K. Peters/CRC Press, 2011, pp. 369–383.


[171] VAN KAICK, O., TAGLIASACCHI, A., SIDI, O., ZHANG, H., COHEN-OR, D., WOLF, L., AND HAMARNEH, G. Prior knowledge for part correspondence. Computer Graphics Forum (Proc. Eurographics) 30 (2011), 553–562.
[172] ZHENG, Y., COHEN-OR, D., AND MITRA, N. J. Smart variations: Functional substructures for part compatibility. Computer Graphics Forum (Eurographics) 32 (2013), 195–204.
[173] THIEDEMANN, S., HENRICH, N., GROSCH, T., AND MÜLLER, S. Voxel-based global illumination. In Symposium on Interactive 3D Graphics and Games (New York, NY, USA, 2011), I3D '11, ACM, pp. 103–110.
[174] GU, X., GORTLER, S. J., AND HOPPE, H. Geometry images. In SIGGRAPH (2002), ACM.
[175] FENG, W.-W., KIM, B.-U., YU, Y., PENG, L., AND HART, J. Feature-preserving triangular geometry images for level-of-detail representation of static and skinned meshes. ACM Trans. Graph. 29, 2 (Apr. 2010), 11:1–11:13.
[176] BEIER, T., AND NEELY, S. Feature-based image metamorphosis. In Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 1992), SIGGRAPH '92, ACM, pp. 35–42.
[177] BLANZ, V., AND VETTER, T. A morphable model for the synthesis of 3D faces. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 1999), SIGGRAPH '99, ACM Press/Addison-Wesley Publishing Co., pp. 187–194.
[178] BLAUERT, J. Spatial Hearing: The Psychophysics of Human Sound Localization. MIT Press, 1997.
[179] BRONSON, J. R. Statistically-weighted visualization hierarchies. Dissertation, University of Maryland, Baltimore County, Apr. 2008.
[180] WILLIAMS, L. Pyramidal parametrics. In Computer Graphics (Jul. 1983), vol. 17, ACM.