Theor Ecol (2017) 10:1–20 DOI 10.1007/s12080-016-0313-0

REVIEW PAPER

Optimization methods to solve adaptive management problems

Iadine Chadès¹ & Sam Nicol¹ & Tracy M. Rout² & Martin Péron¹,³ & Yann Dujardin¹ & Jean-Baptiste Pichancourt¹ & Alan Hastings⁴ & Cindy E. Hauser²

Received: 7 December 2015 / Accepted: 22 September 2016 / Published online: 24 October 2016
© Springer Science+Business Media Dordrecht 2016

Abstract Determining the best management actions is challenging when critical information is missing. However, urgency and limited resources require that decisions be made despite this uncertainty. The best practice method for managing uncertain systems is adaptive management, or learning by doing. Adaptive management problems can be solved optimally using decision-theoretic methods; the challenge for these methods is to represent current and future knowledge using easy-to-optimize representations. Significant methodological advances have been made since the seminal adaptive management work was published in the 1980s, but despite these advances, guidance for implementing the approaches has been piecemeal and study-specific. There is a need to collate and summarize new work. Here, we classify methods and update the literature with the latest optimal or near-optimal approaches for solving adaptive management problems. We review three mathematical concepts required to solve adaptive management problems: Markov decision processes, sufficient statistics, and Bayes' theorem. We provide a decision tree to determine whether adaptive management is appropriate and then group adaptive management approaches based on whether they learn only from the past (passive) or anticipate future learning (active). We discuss the assumptions made when using existing models and provide solution algorithms for each approach. Finally, we propose new areas of development that could inspire future research. Long limited by the efficiency of solution methods, adaptive management can now draw on recent techniques for solving partially observable decision problems, allowing more realistic problems, such as those with imperfect detection and non-stationary dynamics, to be addressed.

Keywords Adaptive management · Markov decision process · MDP · Partially observable Markov decision process · POMDP · Stochastic dynamic programming · Value of information · Hidden Markov models · Natural resource management · Conservation

Electronic supplementary material The online version of this article (doi:10.1007/s12080-016-0313-0) contains supplementary material, which is available to authorized users.

* Corresponding author: Iadine Chadès, [email protected]

1 CSIRO, GPO Box 2583, Brisbane, QLD 4001, Australia
2 School of BioSciences, University of Melbourne, Parkville, VIC 3010, Australia
3 School of Mathematical Sciences, Queensland University of Technology, Brisbane, QLD 4000, Australia
4 Department of Environmental Science and Policy, University of California, Davis, CA 95616, USA

Introduction

Resources to manage ecological systems are limited worldwide. Managers have the difficult task of making decisions without perfect knowledge of system dynamics or the consequences of their actions (Wilson et al. 2006). In ecology, uncertainty may arise from measurement error, systematic error, natural variation, inherent randomness, structural uncertainty, and subjective judgment (Regan et al. 2002). In conservation, adaptive management is acknowledged as the principal tool for decision making under structural uncertainty (Keith et al. 2011), and it has the capacity to address most other forms of uncertainty. Decisions are selected to achieve a management objective while simultaneously gaining information to improve future management success (Holling 1978; McCarthy et al. 2012; Walters 1986). Adaptive management is designed to help managers learn about the best suite of management actions to implement by monitoring their effectiveness in complex ecological systems (Westgate et al. 2013).


In this sense, adaptive management is a systematic approach to improving the management process and accommodating change by learning while doing (Gregory et al. 2006; Holling 1978; Walters 1986). There are two main approaches to adaptive management: decision-theoretic and resilience-based. We provide an overview of the decision-theoretic approaches available for optimizing adaptive management; interested readers can refer to Runge (2011) for a discussion of resilience approaches.

Before solving an adaptive management problem, we need to characterize the type of uncertainty we are facing. The literature on adaptive management refers to four kinds of uncertainty: (1) environmental variation, or process uncertainty, (2) control uncertainty, (3) state uncertainty, or partial observability, and (4) structural uncertainty (Williams et al. 1996). Environmental variation, or process uncertainty, comes from the inherent variability in natural processes. Regardless of how effectively our models describe the behavior of a natural population, we cannot expect these models to predict the exact state of the system at any given time in the future; the future will, at best, be described in probabilistic terms (Parma 1998). Control uncertainty refers to partial controllability and arises because managers cannot perfectly predict the consequences of their management actions (Fackler and Pacifici 2014; Williams et al. 1996). State uncertainty, or partial observability, results from imperfections in measuring equipment and monitoring techniques; the state of a system must be inferred from imperfect monitoring (Fackler and Pacifici 2014; Williams et al. 1996). Adaptive management is particularly tailored to address the fourth type of uncertainty, structural uncertainty, which corresponds to imperfect knowledge of the system dynamics. Structural uncertainty is characterized by uncertainty in the parameters (parameter uncertainty) or in the model (model uncertainty) of the system dynamics.

Here, we emphasize approaches that have wide applicability, but fisheries management provides a useful illustration of how these different kinds of uncertainty arise (Fulton et al. 2011; Sethi et al. 2005). Year-to-year variation in currents and climate (as well as varying impacts of other species) leads to process uncertainty in the dynamics of fish populations. Even if managers set fisheries policy, it is not possible to predict with certainty how fishermen will respond (Fulton et al. 2011), which leads to control uncertainty. Since any assessment of the size of a stock is imperfect, there is clearly state uncertainty. Finally, even without environmental stochasticity, the true dynamics of fisheries are not known (and may depend on many factors, such as age and space, that may not be fully included), i.e., there is structural uncertainty. Similar issues clearly arise for other pressing environmental problems, such as the control of invasive species (Mehta et al. 2007).

To guide readers, we provide a decision tree that outlines the best order in which key questions should be addressed before undertaking an adaptive management approach (Fig. 1).


First, an adaptive management problem should satisfy three prerequisites: a clear management objective, an iterative action/observation process, and uncertain system dynamics. A management objective is required to distinguish adaptive management from post hoc learning, where learning may occur but is not planned as part of a targeted approach to reduce uncertainty. Iterative actions are essential because feedback is required to "learn by doing."

Second, it is essential to identify the type of structural uncertainty. The structural uncertainty can be driven by an uncertain quantity that may take infinitely many values, as in the case of parameter uncertainty. For example, the uncertain parameter can represent the growth rate of a population (Charles 1992), the recovery rate after stock collapse (Hauser and Possingham 2008; Moore et al. 2008), the survival rate of a species (Runge 2013; Springborn and Sanchirico 2013), the mortality rate of a translocated population (Rout et al. 2009), a colonization rate between subpopulations (Southwell et al. 2016), or the probability of success of a management action in forestry (McCarthy and Possingham 2007).

Fig. 1 Decision tree summarizing the main questions to address before undertaking an adaptive management approach


Alternatively, the structural uncertainty may concern a finite set of competing models for the system dynamics. Examples of model uncertainty include whether population dynamics follow a Ricker or a Beverton-Holt model (Walters and Hilborn 1976), uncertain population response to harvest and survival (Williams et al. 1996), uncertain growth and aging models of a forest (Moore and Conroy 2006), uncertain consequences of climate change (Nicol et al. 2014, 2015), and different plausible population growth models arising from uncertain disease latency (McDonald-Madden et al. 2010b).

Third, once the type of uncertainty is defined, the value of learning can be calculated using the value of information (Box 1; Canessa et al. 2015; Schlaifer and Raiffa 1961). Learning may not be inherently valuable for management, i.e., it may not result in improved management outcomes (Martin et al. 2016). If the value of learning is high, an adaptive management approach is usually justified. Otherwise, managers do not require an adaptive management approach, as reducing uncertainty will not improve the management outcomes.

Box 1: Will reducing uncertainty provide a better outcome? Principles of value of information analysis

Value of information (VoI) analysis (Schlaifer and Raiffa 1961) determines the critical uncertainties in a problem. VoI quantifies whether reducing uncertainty will improve performance (Canessa et al. 2015; Runge et al. 2011). Note that quantifying the value of information gained by resolving model or parameter uncertainty iteratively is difficult because it requires evaluating the gain of implementing an adaptive policy rather than a single action. Little guidance is available in the literature (Hauser and Possingham 2008; Walters 1986; Williams and Johnson 2015).

Calculation of the expected value of perfect information (EVPI)

The expected value of perfect information is the difference between the expected benefit with perfect information (PI) and the expected benefit given the current level of uncertainty (no learning; NL): EVPI = PI − NL. This value depends on the current knowledge of the system dynamics, given by the belief b (i.e., the probability that each uncertain quantity is correct). We provide the equations for model uncertainty; changing the summation to an integral leads to the parameter uncertainty formulation. The expected benefit with perfect information, PI, depends on the optimal values V*_m for each model m ∈ M (obtained with SDP): PI = Σ_{m∈M} b(m) V*_m. PI corresponds to the value we would obtain if we knew the true model from the beginning of the process. However, because the true model is unknown, PI is the average of the optimal values for each model weighted by the prior belief that each model is the true model. The literature provides alternative ways of calculating the expected benefit given the current level of uncertainty for dynamic systems, NL (Hauser and Possingham 2008; Walters 1986). Here, we provide a formulation that is easy to calculate. It requires creating a new MDP with transition function P_NL, the average of the transitions of each model P_m weighted by the model beliefs: P_NL(s_{t+1}|s_t, a_t) = Σ_{m∈M} b(m) P_m(s_{t+1}|s_t, a_t). The value NL is the optimal value of the MDP with transition function P_NL (states, actions, and rewards are assumed the same in all models). Because the transition function does not change over time, NL refers to the situation where no learning is undertaken.

Intuitive results

If we denote by AAM the optimal value obtained when implementing active adaptive management, we have NL ≤ AAM, because AAM anticipates the knowledge improvements brought by actions and optimally trades off informative decisions against rewarding ones, and AAM ≤ PI, because in PI the knowledge is perfect from the very first time step. This implies EVPI = PI − NL ≥ AAM − NL ≥ 0: the potential gain of implementing active adaptive management is no greater than the EVPI. A small EVPI (relative to the values) means that an adaptive management approach will bring very little improvement compared to a "no learning" approach. Note that this result relies heavily on the current knowledge of the system dynamics (b) and might be misleading if our estimate of b is wrong. A sensitivity analysis on b should be carried out.
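The EVPI calculation in Box 1 can be illustrated with a small numerical sketch. The two-state, two-action system and the two candidate models below are hypothetical and chosen only for illustration; PI and NL follow the definitions above, with the optimal values obtained by value iteration.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Optimal value function of a discounted MDP.
    P[a] is an |S| x |S| transition matrix for action a; R[a] is a length-|S| reward vector."""
    V = np.zeros(P[0].shape[0])
    while True:
        V_new = np.max([R[a] + gamma * P[a] @ V for a in range(len(P))], axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Two hypothetical models of how the managed action (action 1) moves the system
# from state 0 (degraded) to state 1 (recovered); action 0 is "do nothing".
P_m1 = [np.array([[0.8, 0.2], [0.5, 0.5]]), np.array([[0.3, 0.7], [0.1, 0.9]])]  # management effective
P_m2 = [np.array([[0.8, 0.2], [0.5, 0.5]]), np.array([[0.7, 0.3], [0.4, 0.6]])]  # management ineffective
R = [np.array([0.0, 1.0]), np.array([-0.2, 0.8])]  # being recovered is rewarded; acting costs 0.2
b = np.array([0.5, 0.5])                           # prior belief over the two models

# PI: expected value if the true model were revealed before management starts
V_models = [value_iteration(P_m1, R), value_iteration(P_m2, R)]
PI = b[0] * V_models[0] + b[1] * V_models[1]

# NL: optimal value of the MDP whose transitions are averaged under the prior belief (no learning)
P_NL = [b[0] * P_m1[a] + b[1] * P_m2[a] for a in range(2)]
NL = value_iteration(P_NL, R)

EVPI = PI - NL   # expected value of perfect information, per starting state
print(EVPI)
```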

Fourth, once it is established that adaptive management is justified, active and passive adaptive management are the two commonly used approaches to solve adaptive management problems (Walters and Hilborn 1978). Both approaches are iterative learning procedures that provide, at each time step, an action to implement given the existing knowledge of the system dynamics. The difference between the approaches lies in how the recommended action is calculated. Passive adaptive approaches act as if the current knowledge of the system were correct while expecting some mistakes, which can be used to improve the knowledge as management proceeds over time. Active adaptive approaches explicitly acknowledge that the current knowledge of the system might not be correct and anticipate the mistakes and knowledge improvements that will arise as management proceeds over time. Solutions to active adaptive management problems maximize the chance of achieving the objective by explicitly accounting for future learning opportunities. In contrast, passive adaptive management uses only past experience and does not account for future learning opportunities. From an optimization perspective, passive adaptive management methods are heuristics to solve more complex active adaptive management problems (Bertsekas 1995, p. 293).

In the following section, we introduce three mathematical concepts that are required to solve an adaptive management problem (Fig. 1): Markov decision processes, sufficient statistics, and Bayes' theorem. We then present existing decision models and algorithms that solve active and passive adaptive management problems for model and parameter uncertainty. Finally, we discuss the challenges that impede greater uptake of adaptive management approaches.

Important concepts

Markov decision processes


Solving an adaptive management problem results in a strategy that provides the best action given the available information, so that the chance of achieving a management objective is maximized. To provide these best decisions, adaptive management problems are modeled as sequential decision-making problems under uncertainty. Sequential decision-making processes in stochastic systems, including adaptive management, can be modeled using Markov decision processes (MDPs) (Bellman 1957; Marescot et al. 2013) as a theoretical foundation. MDP problems can be solved exactly using stochastic dynamic programming techniques (SDP; Marescot et al. 2013). Continuing the fisheries management example introduced earlier, these sequential decisions could be yearly limits on effort or catch (Sethi et al. 2005).

MDPs are controlled stochastic processes satisfying the Markov property and assigning reward values to state transitions (Puterman 1994; Sigaud and Buffet 2010). Formally, an MDP is described by a tuple 〈S, A, P, r, T〉, where S is the finite state space that describes the possible configurations of the system; A is the finite set of all possible actions or decisions that control the state dynamics; P denotes the state dynamics of the system, i.e., P(s_{t+1}|s_t, a_t) represents the probability of transitioning to state s_{t+1} given that the system is in state s_t and action a_t is applied; and r denotes the reward function defined on state transitions, r(s_t, a_t), with desirable transitions receiving high rewards. T is the time horizon over which decisions must be made and can be either finite or infinite. Because MDPs assume that the variables influencing the dynamics of the system are completely observable, a policy is simply defined as a function π: S → A that associates a decision (i.e., action) with each state configuration of the system. A policy provides the rules that a decision maker would follow to perform an optimal action in each state of the system. A policy may depend on the time step t or be independent of time.

The solution to a Markov decision process is an optimal policy, given an objective that decision makers wish to achieve (Sigaud and Buffet 2010). For fisheries, this objective could be maximizing the net present value of the fishery, which would depend on catch in the upcoming and all future years. More generally, an objective, also called an optimization criterion, may be γ-discounted, where the discount factor γ is a real number, 0 ≤ γ < 1 (Marescot et al. 2013). A value function is used to evaluate the expected performance of a policy π starting from state s. Solving an MDP problem means finding an optimal policy π* such that its value function V*(s) is the best value possible. Linear programming, value iteration, and policy iteration are among the most popular stochastic dynamic programming methods to solve MDPs exactly (Marescot et al. 2013; Puterman 1994). The ready-to-use solvers MDPSOLVE (Fackler 2013) and MDPtoolbox (Chadès et al. 2014) now empower users to solve MDPs.

MDP applications range from prioritizing global conservation effort (Wilson et al. 2006), weed control (Firn et al. 2008; Pichancourt et al. 2012), metapopulation management (Nicol and Possingham 2010), and fire regime management (McCarthy et al. 2001) to harvest problems (Hauser and Possingham 2008; Walters and Hilborn 1978), to cite a few. In behavioral ecology, MDPs have been used to test whether species optimize their reproductive fitness over time (Houston et al. 1988; Venner et al. 2006).
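As a concrete sketch of how an MDP of this form is solved by stochastic dynamic programming, the finite-horizon backward induction below uses a hypothetical three-state fishery; the states, actions, transition probabilities, and rewards are illustrative assumptions, not values from any of the cited studies. In practice, a ready-made solver such as MDPSOLVE or MDPtoolbox would replace this loop.

```python
import numpy as np

def backward_induction(P, R, T, gamma=0.95):
    """Finite-horizon stochastic dynamic programming for an MDP.
    P[a] is an |S| x |S| transition matrix, R[a] a length-|S| reward vector, T the horizon.
    Returns the optimal policy for each time step (a state-to-action map) and the values."""
    n_states = P[0].shape[0]
    V = np.zeros(n_states)                      # terminal value
    policy = np.zeros((T, n_states), dtype=int)
    for t in reversed(range(T)):
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(len(P))])
        policy[t] = Q.argmax(axis=0)
        V = Q.max(axis=0)
    return policy, V

# Hypothetical 3-state fishery (collapsed, vulnerable, robust) with actions
# 0 = no harvest, 1 = harvest; numbers are made up for illustration only.
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.6, 0.3],
               [0.0, 0.2, 0.8]]),    # no harvest: stock tends to recover
     np.array([[0.9, 0.1, 0.0],
               [0.4, 0.5, 0.1],
               [0.1, 0.5, 0.4]])]    # harvest: stock tends to decline
R = [np.zeros(3), np.array([0.0, 0.5, 1.0])]   # profit accrues only when harvesting
policy, V = backward_induction(P, R, T=20)
print(policy[0])   # recommended action in each state at the first time step
```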

Sufficient statistics

Adaptive management problems differ from classical MDPs because the value of a parameter or the true model is hidden from the decision maker and influences the dynamics of the system (Fig. S1) (Bertsekas 1995; Chadès et al. 2012). Because a state variable that influences the dynamics is hidden (and consequently also influences the best decision), the optimal policy π* depends on both the observable state variables and the value of the hidden variable. The value of the hidden variable must be estimated using the history of observations and actions. Because it is not feasible to remember the complete past history of observations and actions, sufficient statistics are used (Bertsekas 1995, p. 251; Fisher 1922). Sufficient statistics allow us to retain data without losing any important information. To be useful in adaptive management problems, sufficient statistics must obey the Markov property and be easy to represent and update. Finding sufficient statistics that best represent uncertain variables is a central and long-standing challenge of adaptive management (Walters 1986).

For problems with uncertain variables that can take finitely many values, belief states are widely used sufficient statistics. Belief states are probability distributions over finite quantities and can be updated using Bayes' theorem (Sigaud and Buffet 2010). When confronted with uncertain variables that can take infinitely many values, sufficient statistics that take finite values facilitate the use of fast and accurate solution methods. A common example of such convenient sufficient statistics is the number of successes and failures of an experiment used to represent an unknown probability of management success (see example 1, Box 2).

Box 2: Active adaptive management examples

Harvesting under parameter uncertainty

In Hauser and Possingham (2008), the optimal strategy recommends whether or not to harvest a population given a current three-state population size (S = {robust, vulnerable, collapsed}) and an unknown recovery rate p, the probability of transition from a collapsed to a vulnerable population size. All possible recovery rates between 0 and 1 are plausible, and the uncertainty surrounding the parameter p can be represented using a beta distribution. Given a Beta(α, β) prior for p, the posterior is a beta distribution with new parameters α + R and β + N − R, where the population is observed to recover in R out of N years spent in a collapsed state. Consequently, α and β can be used as sufficient statistics.

The optimal policy is derived by solving an MDP (Tables 2 and 3), as α and β take finite discrete values. The state space is defined as X = S × α × β. The action space is harvest or no harvest. The transition probabilities from collapsed to vulnerable are derived for all possible values of α and β. Profits are accrued when the population is harvested. The optimal policy matches an action to a population size and values of α and β. In exploring optimal strategies over short-, medium-, and long-term management time horizons, the authors found that active adaptive strategies could be more precautionary than passive strategies, depending on the length of time considered.

Climate change mitigation under model uncertainty

In Nicol et al. (2015), the optimal strategy recommends where to invest resources to protect migratory shorebird populations in the East Asian–Australasian (EAA) flyway given uncertain consequences of sea level rise (SLR). The impacts of sea level rise can be mitigated with protective management actions at a single location in each time step; the objective is to find the best location to invest resources at each time step. The consequences of sea level rise on shorebird populations are uncertain and are represented by three alternative SLR scenarios. Because there is a finite number of scenarios, belief states (probability distributions over the set of scenarios) are used as sufficient statistics. The optimal policy is derived by solving a factored POMDP (Table 5). The states are discrete breeding population sizes and the protection level of each location; actions are the level of protection applied to each location of the EAA flyway. States are fully observable; however, the correct scenario and the expected future states are only partially observable and must be learned by observing the system over time. Transition probabilities are derived based on the SLR scenario and the level of protection of each location. Rewards are a function of the population size and the cost of management actions. The optimal policy matches a protective action to a location, given the current belief in each SLR scenario.
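A minimal sketch of the conjugate update used in the harvesting example (Hauser and Possingham 2008) is given below. Starting from a hypothetical Beta(α, β) prior on the recovery rate p and hypothetical monitoring data (R recoveries in N collapsed years), the posterior parameters are simply α + R and β + N − R.

```python
from scipy.stats import beta

# Hypothetical prior and monitoring data (illustrative values only).
alpha0, beta0 = 1.0, 1.0     # uninformative Beta(1, 1) prior on the recovery rate p
R, N = 3, 10                 # population observed to recover in R of N collapsed years

# Conjugate beta-binomial update: the sufficient statistics are simply incremented.
alpha1, beta1 = alpha0 + R, beta0 + (N - R)

posterior_mean = alpha1 / (alpha1 + beta1)          # updated point estimate of p
ci_95 = beta.ppf([0.025, 0.975], alpha1, beta1)     # 95% credible interval for p
print(posterior_mean, ci_95)
```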


Bayes' theorem

The application of Bayes' rule is the underlying mechanism for learning in all adaptive management problems. Bayes' rule states that P(B|A), the probability that the system follows model B given the observed outcome A, can be calculated using P(A|B), the likelihood of receiving new information A when the system follows model B, and P(B), the prior probability that B is the best available model to describe the system. Mathematically, P(B|A) = P(A|B)P(B)/P(A). The probability P(A|B) frequently depends on the management action we take, enabling us to learn about the efficiency of management actions (McCarthy 2007).

Expressions of Bayes' theorem are different for unknown quantities that take a finite number of possible values compared with those that take an infinite number of possible values. When an uncertain parameter has an infinite range of possible values, the distribution b_t(θ) represents the values of parameter θ at time t as a probability density function and is referred to as the belief in θ. Observing the system response to management actions between times t and t + 1 provides information that can be used to update this belief. Bayes' theorem provides a means of updating the distribution b_t(θ) as the system is managed (a_t) in a given configuration (s_t) and data are gathered (s_{t+1}):

$$b_{t+1}(\theta \mid s_t, a_t, s_{t+1}, b_t) = \frac{P_\theta(s_{t+1} \mid s_t, a_t)\, b_t(\theta)}{\int_\theta P_\theta(s_{t+1} \mid s_t, a_t)\, b_t(\theta)\, d\theta}, \qquad (1)$$

where P_θ(s_{t+1}|s_t, a_t) is the state transition probability assuming that the true parameter value is θ. Useful sufficient statistics for b_t(θ) can be found when b_t(θ) is a conjugate prior for P_θ(s_{t+1}|s_t, a_t). This approach is elegant but addresses only a limited set of problem structures (Walters 1986, p. 202). Unknown quantities can also take a finite discrete set of values; this is often the case under model uncertainty. In this case, Eq. 1 is expressed in discrete form:

$$b_{t+1}(m \mid s_t, a_t, s_{t+1}, b_t) = \frac{P_m(s_{t+1} \mid s_t, a_t)\, b_t(m)}{\sum_{m' \in M} P_{m'}(s_{t+1} \mid s_t, a_t)\, b_t(m')}, \qquad (2)$$

where P_m(s_{t+1}|s_t, a_t) is the state transition probability assuming that the true model is m. The discrete belief value b_t(m) is interpreted as the probability that m best describes the system dynamics among the available models.
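The discrete update in Eq. 2 takes only a few lines of code. The sketch below assumes a hypothetical two-model, two-state system; the transition matrices are illustrative and not taken from any cited study.

```python
import numpy as np

def update_belief(b, P, s, a, s_next):
    """Bayes update of the model belief (Eq. 2).
    b[m] is the belief in model m; P[m][a] is the |S| x |S| transition matrix of model m."""
    likelihood = np.array([P[m][a][s, s_next] for m in range(len(P))])
    posterior = likelihood * b
    return posterior / posterior.sum()

# Hypothetical example: two models disagree on how likely action 0 is to move
# the system from state 0 to state 1.
P = [[np.array([[0.2, 0.8], [0.1, 0.9]])],    # model 1: transition is likely
     [np.array([[0.7, 0.3], [0.4, 0.6]])]]    # model 2: transition is unlikely
b = np.array([0.5, 0.5])
b = update_belief(b, P, s=0, a=0, s_next=1)   # observing the transition shifts belief towards model 1
print(b)                                      # approximately [0.73, 0.27]
```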

Given the mathematical concepts described in the previous section, we are now ready to introduce the two adaptive management solution approaches: active and passive adaptive management.

Active adaptive management

Introduction

Active management requires "thinking ahead" and calculating the consequences of all possible values of the unknown information before deciding on the optimal action. A probability distribution or belief is used to describe the range of plausible values and their relative credibility (Eqs. 1 and 2). Optimal decisions at a given time step depend on the current knowledge of the uncertain quantities θ. Formally, a policy is defined as π_t: S × B → A, where B is the space of beliefs b_t. The optimal value function V* that characterizes the performance of a policy is therefore a function of a probability distribution, b_t, which is a potentially continuous variable. An active adaptive manager projects future data generation and belief distributions using a variation of Bellman's optimality equation (Williams et al. 2009):


$$V^*(s_t, b_t) = \max_{a_t \in A}\left\{ r(s_t, a_t) + \gamma \int_\theta b_t(\theta) \sum_{s_{t+1} \in S} P_\theta(s_{t+1} \mid s_t, a_t)\, V^*(s_{t+1}, b_{t+1})\, d\theta \right\}. \qquad (3)$$

This optimization requires that the trajectory of the belief from b_t to b_{t+1} and the state transitions P_θ be calculated for all times t considered. There are two main branches of solution methods for adaptive management, based on the kind of uncertainty that needs to be resolved (Fig. 1). Parameter uncertainty refers to systems where the values of the parameters driving the system are uncertain. Model uncertainty refers to the lack of understanding about the structure of biological and ecological relations that drive resource dynamics (Williams et al. 2009). While parameter and model refer to different types of uncertainty, in terms of solution methods the most important question is whether the uncertain variable driving the dynamics of the system takes continuous or discrete values (Fig. 2). We first discuss exact continuous methods to solve problems of parameter uncertainty. Then, we discuss solution methods for a discrete number of models (model uncertainty).
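To make the recursion concrete before turning to the two branches, the sketch below evaluates the active adaptive value for a discrete set of candidate models (a discrete-model analogue of Eq. 3, in the spirit of Eq. 4 later in the text). The system, models, and rewards are hypothetical, and the brute-force recursion is exponential in the horizon, so it is only practical for tiny illustrative problems.

```python
import numpy as np

def active_value(s, b, t, P, R, horizon, gamma=0.95):
    """Finite-horizon active adaptive value of state s under model belief b.
    P[m][a] is the |S| x |S| transition matrix of model m; R[a][s] is the reward.
    The belief over models is updated inside the recursion (anticipated learning)."""
    if t == horizon:
        return 0.0
    n_models, n_actions = len(P), len(P[0])
    n_states = P[0][0].shape[0]
    best = -np.inf
    for a in range(n_actions):
        q = R[a][s]
        for s_next in range(n_states):
            # probability of s_next, averaged over models and weighted by the belief
            p = sum(b[m] * P[m][a][s, s_next] for m in range(n_models))
            if p == 0.0:
                continue
            # anticipated posterior belief after observing s_next (Eq. 2)
            b_next = np.array([b[m] * P[m][a][s, s_next] for m in range(n_models)]) / p
            q += gamma * p * active_value(s_next, b_next, t + 1, P, R, horizon, gamma)
        best = max(best, q)
    return best

# Hypothetical two-model, two-state, two-action problem (illustrative numbers only).
P = [[np.array([[0.8, 0.2], [0.5, 0.5]]), np.array([[0.3, 0.7], [0.1, 0.9]])],   # model 1
     [np.array([[0.8, 0.2], [0.5, 0.5]]), np.array([[0.7, 0.3], [0.4, 0.6]])]]   # model 2
R = [np.array([0.0, 1.0]), np.array([-0.2, 0.8])]
print(active_value(s=0, b=np.array([0.5, 0.5]), t=0, P=P, R=R, horizon=4))
```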

Fig. 2 Can it be solved using active adaptive management? Decision tree summarizing the choice of optimization approaches available to solve active adaptive management problems

Parameter uncertainty

In adaptive management to resolve parameter uncertainty, the task is to manage the system while simultaneously learning the value of the parameter to improve future management decisions. It is assumed that the unknown underlying parameter has a fixed value. The literature usually assumes that parameter uncertainty refers to an unknown quantity that could potentially take one of an infinite number of values. In the case where the unknown quantity takes a finite number of values, the optimization methods used to solve model uncertainty are applied.

Walters and Hilborn (1976) introduced the concepts of adaptive management from control theory to deal with uncertainty in the management of renewable resources such as fisheries and wildlife. In control theory, adaptive management is referred to as adaptive control (Åström and Wittenmark 2008; Bertsekas 1995). Parameter uncertainty was targeted in the earliest formulations of adaptive management problems. For system models that are perfectly observable and linear in the uncertain parameters, with additive, normally distributed natural variation, the parameter means and their covariance matrix are sufficient statistics for characterizing uncertainty (Walters 1986). That is, the means and covariance matrix can be carried as knowledge state variables in the dynamic programming equations used to determine the optimal policy (Eq. 3). This technique is known more generally as "adaptive filtering" in control theory (Walters 1986, pp. 200–202) and includes extended Kalman filters for nonlinear process models (Walters and Hilborn 1978). Because extended Kalman filters linearize the system around the best estimates, they can be a poor approximation for highly nonlinear systems and small data sets (Walters 1986, p. 211).

Sufficient statistics can also be developed using principal component analysis (Walters 1986, p. 178) or singular value decomposition (Walters 1986, p. 180). Early studies introduced other extensions of the basic adaptive management framework, such as exponential weighting to "forget" older data (Walters 1986, p. 213; Walters and Hilborn 1976), random or systematic shifts in the underlying parameter values over time (Smith and Walters 1981; Walters 1986, p. 212), partial controllability (Walters and Hilborn 1976), partial observability (Walters and Ludwig 1981), and a risk-averse utility function (Walters and Ludwig 1987).

Walters and Hilborn's (1976) parameter-uncertain Ricker models were the first to take advantage of conjugate distributions describing the prior and posterior to streamline Bayesian updating of uncertainty. In their case, describing parameter uncertainty with a normal distribution in a linear process model with additive normal environmental variation yielded a normally distributed posterior for the uncertain parameter. The advantage of using conjugate distributions is that it is possible to obtain a closed-form expression for the posterior, so the distribution can be updated exactly without resorting to numerical simulation methods. A list of some known conjugate distributions is included in Table S1.

Many new applications of adaptive management in areas outside of fisheries and harvest management have utilized conjugate distributions. McCarthy and Possingham (2007) posed a general conservation management problem where a manager must choose between implementing two actions, both with unknown probabilities of success. Each probability of success is described by a beta distribution with parameters α and β. After observing s successes and f failures from the trials implemented, the sufficient statistics α and β are updated as α + s and β + f. The authors used a case study of choosing between high- or low-density planting for successful revegetation. The beta-binomial conjugate relationship has since been applied to adaptive management of wildlife harvest (Hauser and Possingham 2008; Moore et al. 2008) (example 1, Box 2), threatened species translocation (Rout et al. 2009; Runge 2013), and conservation of a metapopulation (Southwell et al. 2016).

Tables 2 and 3 provide the algorithms required to solve active adaptive management under parameter uncertainty in the case where the uncertain parameter is defined by a beta distribution. First, the optimal policy is calculated for all possible beliefs (Table 2). The procedure then applies the best action given the current state and belief (Table 3). After each implementation, the system is monitored and the belief updated (Table 3, lines 7 to 9). The process repeats for the duration of the time horizon.

Although the use of conjugate distributions reduces the dimension of the optimization state space from a potentially (continuous) infinite state space to a finite state space, there remain computational challenges. In particular, the domain of plausible values can expand over time.


For example, a beta-binomial management problem with a known prior at time 0 and n trials per time step could project to any one of n·t + 1 states at time t (example 1, Box 2). It is not possible to find an exact conjugate prior for every parameter uncertainty problem, and numerical solutions may be required to update sufficient statistics. Density projection for distributions from the exponential family can provide an alternative (Zhou et al. 2010). This approach calculates the posterior distribution by projecting the continuous belief space of the unknown parameter onto the closest distribution (in the sense of the minimum Kullback–Leibler divergence) within the family of the prior distribution. The problem with the projected belief becomes a continuous-state MDP that can be solved in a number of ways, for example using discretization techniques. In resource management, this has been applied to a hierarchical beta-distributed model with both continuous action and belief state spaces (Springborn and Sanchirico 2013).

A simpler alternative treatment of parameter uncertainty is to discretize the parameter into a finite number of plausible values and attach a degree of belief to each value (Fig. 2). This is equivalent to the treatment of model uncertainty and can be computed using a discretized belief MDP or a partially observable Markov decision process (POMDP), as described in the next section.

Model uncertainty

In adaptive management, model uncertainty is represented as alternative hypotheses ("models") about how the system dynamics function. Adaptive management tools to reduce model uncertainty were first proposed in the fisheries literature as early as 1978 (Silvert 1978) and were included in Walters' seminal text on adaptive management (Walters 1986). In the mid-1990s, adaptive management under model uncertainty was successfully implemented by the US Fish and Wildlife Service to set harvest quotas for mallards in the USA (Johnson et al. 1997; Nichols et al. 1995), which set the stage for a plethora of other adaptive management studies designed to reduce model uncertainty in conservation and resource management (Johnson et al. 2002; Martin et al. 2009; McDonald-Madden et al. 2010b; Moore and Conroy 2006; Smith et al. 2013; Williams 2011a).

The key prerequisite for an optimal adaptive management system designed to reduce model uncertainty is that plausible alternative hypotheses about system function can be articulated. The hypotheses (models) can take many forms, so long as the transition probabilities between states can be computed under each possible hypothesis (model). This is a key point of difference from the methods used to solve parameter uncertainty, which require either specific parameter distributions with known conjugate priors or other suitable sufficient statistics.


Because convenient sufficient statistics may not exist when confronted with parameter uncertainty, many adaptive management studies use the methods of model uncertainty to distinguish between a small number of values of a single parameter (McDonald-Madden et al. 2010b; Moore et al. 2011; Runge 2013), and in these cases, the parameter uncertainty methods could, in principle, be used instead. However, where multiple parameters are uncertain and key hypotheses need to be tested, model uncertainty is currently the only tractable approach (Williams 2009). For example, Moore et al. (2008) posited two alternative models of how burning affects population growth of a threatened plant by varying parameters associated with juvenile survival and reproduction.

As with parameter uncertainty, when modeling an active adaptive management problem under model uncertainty we must predict how implementing actions will change our future knowledge, so that our chance of achieving our objective is maximized. To do so, one must include in the state space not only the information about the state of the system but also the current and future knowledge, i.e., the probability distribution over possible models (belief). Finding the best action to implement then becomes a function of both the state of the system and the belief over the models. The classic approach requires solving an MDP with a continuous belief state space (belief MDP). Continuous MDPs are computationally hard to solve, and approximate solution techniques must be used to derive solutions. A natural way to overcome this limitation is to discretize the continuous belief state space and solve a discrete-state MDP.

We describe a simple method that illustrates the required steps (see the algorithm in Table 4 and the sketch below). In the planning stage, the optimal policy is determined for discrete portions of the belief space. For a problem with M hypotheses or models, the belief state space is discretized into an M-dimensional grid representing the possible belief states of the system. Discrete elements of the grid contain areas of the continuous belief space B for which the transition matrix P(s_{t+1}, b_{t+1}|s_t, a_t, b_t) must be calculated. This can be done by repeatedly simulating each possible model in the proportions indicated by the target belief state (line 6, Table 4). Simulated results are stored, and the probability of transition is estimated by dividing the number of transitions observed by the total number of simulations. The policy can then be calculated and implemented by executing the optimal action for the current state and belief state. The execution stage is similar to the parameter uncertainty case (Table 3), except that the belief is updated using the discrete formulation (Eq. 2). This approach and variants have been applied broadly, from the harvest of natural resources (Martin et al. 2009) to the conservation of threatened species (McDonald-Madden et al. 2010b; Moore et al. 2011) (see Table 1 for additional references).
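The belief discretization and Monte Carlo transition estimation described above (line 6 of Table 4) can be sketched as follows. The grid resolution, the example models, and the nearest-point mapping from a simulated posterior back onto the grid are illustrative assumptions.

```python
import numpy as np
import itertools

def belief_grid(n_models, resolution):
    """All belief vectors on a regular grid over the belief simplex."""
    grid = []
    for counts in itertools.product(range(resolution + 1), repeat=n_models):
        if sum(counts) == resolution:
            grid.append(np.array(counts) / resolution)
    return grid

def update_belief(b, P, s, a, s_next):
    """Discrete Bayes update (Eq. 2)."""
    like = np.array([P[m][a][s, s_next] for m in range(len(P))])
    post = like * b
    return post / post.sum()

def estimate_transitions(P, grid, n_states, n_actions, n_sims=1000, seed=0):
    """Monte Carlo estimate of P(s', b' | s, b, a) over the discretized belief MDP."""
    rng = np.random.default_rng(seed)
    n_b, n_models = len(grid), len(P)
    T = np.zeros((n_actions, n_states * n_b, n_states * n_b))
    for bi, b in enumerate(grid):
        for s in range(n_states):
            for a in range(n_actions):
                for _ in range(n_sims):
                    m = rng.choice(n_models, p=b)                 # sample a model in proportion to the belief
                    s_next = rng.choice(n_states, p=P[m][a][s])   # simulate the system under that model
                    b_next = update_belief(b, P, s, a, s_next)
                    bj = int(np.argmin([np.linalg.norm(b_next - g) for g in grid]))  # snap to nearest grid point
                    T[a, s * n_b + bi, s_next * n_b + bj] += 1
                T[a, s * n_b + bi] /= n_sims
    return T   # this discrete MDP can then be solved with standard SDP

# Hypothetical two-model, two-state, one-action example:
P = [[np.array([[0.2, 0.8], [0.1, 0.9]])],
     [np.array([[0.7, 0.3], [0.4, 0.6]])]]
grid = belief_grid(n_models=2, resolution=10)
T = estimate_transitions(P, grid, n_states=2, n_actions=1)
```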


Despite the simplicity of solving discretized belief MDPs, the computational costs become very high as the dimensionality of the problem increases (Brafman 1997; Zhou and Hansen 2001). The limitations and inefficiency of discretized belief-MDP approaches (fixed or variable) are well documented, and this solution technique is inadequate for problems with more than a handful of models (Bonet 2002; Lovejoy 1991).

Following upon the work of Chadès et al. (2008), MacKenzie (2009) first raised the possibility of using POMDPs to tackle adaptive management problems. Later on, Williams (2011b) recognized that model uncertainty can be modeled using methods developed for dealing with partial observability. However, Williams (2011b) proposed a complex transition function suggesting that POMDPs must account for both observational and state uncertainty (Fackler and Pacifici 2014). Building on these previous works, Chadès et al. (2012) took advantage of a useful simplification: not knowing the correct model is equivalent to not being able to observe the model. This realization allows us to transform model uncertainty into observation uncertainty, making it analogous to a standard POMDP where the hidden variable represents the correct model. Chadès et al. (2012) further showed that where some state variables are perfectly observable and some partially observable, the problem can be modeled as a mixed observability MDP (MOMDP). This observation allows modelers to factor the state space, which exploits the conditional independence of variables within the joint probability of transition to develop even faster solution methods. Indeed, classic non-factored representations need to store the probabilities of transition between all possible states even when one state variable does not affect another and vice versa. In this way, the classic algorithms are inefficient because they store information that is not needed to find an optimal solution. A better way is to use the structure of the problem to store information only for the state variables that directly affect other state variables (Chadès et al. 2011, 2012). In the case of adaptive management (Fig. S1), an unknown variable (h_t, model or parameter) influences the state of the system (s_t, e.g., the abundance of a population); however, the unknown variable is not influenced by the state of the system or the management actions, i.e., p(h_{t+1} | s_t, h_t, a_t) = p(h_{t+1} | h_t). If an optimization problem has many independent variables, we can solve larger problems using factored representations because we have fewer state interactions to consider and store (Boutilier and Dearden 1994).

Partially observable MDPs are MDPs where one or more state variables cannot be observed with certainty. In the ecology literature, examples of state variables that cannot be observed with certainty include the abundance of a population (Nicol and Chadès 2012), the presence of cryptic threatened species (Chadès et al. 2008), and the infected or susceptible status of populations vulnerable to disease (Chadès et al. 2011). First studied in the operations research literature, POMDPs provide a way of reasoning about trade-offs between actions to gain rewards and actions to gain information (Monahan 1982).

Table 1 Non-exhaustive list of papers that use a decision-theoretic adaptive management approach. For each study, the table lists the objective of the problem, the uncertain parameter (if any), the uncertain model (if any), whether the approach is active and/or passive, and the optimization method used (e.g., SDP/MDP, discretized belief MDP, POMDP or factored POMDP, wide-sense dual control, simulation). Objectives range from maximizing harvest value in fisheries and long-term cumulative harvest of waterfowl to translocating or conserving threatened species, maintaining habitats under climate change, and maximizing migratory shorebird populations subject to sea level rise. Uncertain parameters include stock-recruitment and production parameters, recovery rates after stock collapse, survival, mortality, and colonization rates, typically represented with beta, normal, or gamma distributions and conjugate updating; uncertain models include alternative population responses to harvest, burning, drawdown, disease, or sea level rise. Studies included: Walters (1975); Walters and Hilborn (1976); Smith and Walters (1981); Ludwig and Walters (1981); Mangel and Clark (1983); Walters (1986, pp. 269–291); Charles (1992); Frederick and Peterman (1995); Williams et al. (1996); Johnson et al. (2002); Moore and Conroy (2006); McCarthy and Possingham (2007); Hauser and Possingham (2008); Moore et al. (2008); Martin et al. (2009); Rout et al. (2009); Moore and McCarthy (2010); McDonald-Madden et al. (2010b); Martin et al. (2011); Moore et al. (2011); Williams (2011a); Williams et al. (2011); Chadès et al. (2012); Runge (2013); Smith et al. (2013); Springborn and Sanchirico (2013); Fackler and Pacifici (2014); Nicol et al. (2014); Nicol et al. (2015); Southwell et al. (2016).


To take into account the incomplete observability of the system, POMDP models augment MDP models with a finite set of possible observations O and a corresponding observation function Z that maps each state-action pair to a probability distribution over O. In the case of model uncertainty, a factored POMDP is characterized by the tuple 〈X, A, O, P, Z, r, γ〉. X = S × M represents the factored state space. S denotes states in the MDP sense, i.e., the possible conditions of the system. We consider the unknown model set M to be state variables. A is the set of available management actions to control the system. O is the set of observations perceived by a manager; if all states in S are observable, then O = S. In adaptive management with model uncertainty, the unknown model is hidden and cannot be observed; we infer the state of the model through observation of the state variables S. P is the transition matrix, whose elements P_m(s_{t+1}|s_t, a_t) represent the probability of observing state s_{t+1} after taking action a_t given that the current state is s_t and the correct model is m. Z is the observation function p(o_{t+1}|x_{t+1}, a_t), describing the probability of observing o_{t+1} from factored state x_{t+1} after taking action a_t. r is the reward function and γ is the discount factor as defined for an MDP (section "Important concepts").

The optimal decision at time t depends on the complete history of past actions and observations. Belief states are sufficient statistics used to summarize this history and overcome the difficulties of incomplete detection (Åström 1965) (section "Important concepts"). Solving a factored POMDP means finding a policy π: S × B → A, mapping a decision to a state of the system (s ∈ S) and a current belief over the set of models M (b ∈ B). An optimal policy maximizes the discounted expected sum of rewards over a finite or infinite time horizon. As for MDPs, this expected sum is referred to as the value function. In the context of adaptive management problems subject to model uncertainty but with perfect state observation, the value function equations can be simplified because the model m is the only unknown variable (Chadès et al. 2012):

$$V^*(s_t, b_t) = \max_{a_t \in A}\left\{ r(s_t, a_t) + \gamma \sum_{m \in M} b_t(m) \sum_{s_{t+1} \in S} P_m(s_{t+1} \mid s_t, a_t)\, V^*(s_{t+1}, b_{t+1}) \right\}, \qquad (4)$$

where b_{t+1} is calculated according to Bayes' rule (Eq. 2). The optimal policy can be obtained from the optimal value function by selecting the action that gives the highest value for each state of the system. While algorithms have been developed over the past years, exact resolution of POMDPs is in general intractable: finite-horizon POMDPs are PSPACE-complete (Papadimitriou and Tsitsiklis 1987), and infinite-horizon POMDPs are undecidable (Madani et al. 2003). Approximate methods have been developed to solve large POMDPs. Among them, point-based approaches approximate the value function (Eq. 4) by updating it only for selected belief states (Pineau et al. 2003; Spaan and Vlassis 2005). Typical point-based methods sample belief states by simulating interactions with the environment and then update the value function and its gradient over a selection of those sampled belief states. In many cases, software is available for these methods. For example, methods such as SARSOP (Kurniawati et al. 2008), Perseus (Spaan and Vlassis 2005), and Symbolic Perseus (Poupart 2005) have been used successfully for adaptive management problems (Nicol et al. 2013) and spatial optimization (Chadès et al. 2011). The size and complexity of the problems that have been solved with these advanced methods are much larger than those solved with discretized belief-MDP approaches (Ong et al. 2010).

There are costs associated with modeling an adaptive management problem as a POMDP. Using a POMDP requires users to navigate the specialized literature on the topic; see Chadès et al. (2008), McDonald-Madden et al. (2011), and Regan et al. (2011) for illustrative applications in conservation and discussions of pros and cons. An additional issue arises when exploring the solutions provided by POMDPs. Although a POMDP solution can be represented as a decision graph (Nicol and Chadès 2012), decision graphs are often too detailed to be presented to managers, and simplifications are required.


Simplifications have traditionally been restricted to heuristics or rules of thumb (Chadès et al. 2011; Nicol and Chadès 2012). However, a recent approximate method (alpha-min) allows users to set the maximum number of decisions they are willing to consider at each time step, with a measure of the performance loss accrued (Dujardin et al. 2015). This approach empowers managers to trade the simplicity of the POMDP solution against performance. Alpha-min is able to provide simple, near-optimal policies for POMDPs, which enables improved interpretation of results and better communication of outcomes to decision makers. Although alpha-min does not yet scale well with large problems, this research direction warrants further study.

Perhaps because the solution methods for reducing model uncertainty are more general than those designed to reduce parameter uncertainty, the majority of adaptive management studies published today rely on the methods developed to reduce model uncertainty. Software for model uncertainty methods has also been more available than for parameter uncertainty (Fackler 2013; Lubow 1997), which may have contributed to its relative popularity.
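The hidden-model construction described above (Chadès et al. 2012) can be sketched in a few lines: the factored state pairs the observable system state with a static, hidden model variable, and the resulting transition, observation, and reward arrays could then be passed to a point-based POMDP solver. The array layout and example dimensions are illustrative assumptions, not a specific solver's input format.

```python
import numpy as np

def hidden_model_pomdp(P, R):
    """Cast model uncertainty as a POMDP with a hidden, static model variable.
    P[m][a] is the |S| x |S| transition matrix of model m; R[a] is a length-|S| reward vector.
    Factored states are pairs x = (s, m); only the s component is ever observed."""
    n_models, n_actions = len(P), len(P[0])
    n_states = P[0][0].shape[0]
    n_x = n_states * n_models
    T = np.zeros((n_actions, n_x, n_x))        # transition function over factored states
    Z = np.zeros((n_actions, n_x, n_states))   # observation function: s is observed perfectly
    Rx = np.zeros((n_actions, n_x))
    for m in range(n_models):
        for s in range(n_states):
            x = s * n_models + m
            for a in range(n_actions):
                Rx[a, x] = R[a][s]
                Z[a, x, s] = 1.0               # the model component m is never observed
                for s_next in range(n_states):
                    x_next = s_next * n_models + m   # the hidden model does not change: p(m'|m) = 1[m' = m]
                    T[a, x, x_next] = P[m][a][s, s_next]
    return T, Z, Rx
```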

Passive adaptive management

Active adaptive management is the state-of-the-art adaptive management method because it offers guaranteed optimal performance: there is, in theory, no better way of achieving our management objective. Where practical, active adaptive management should be used. However, active adaptive management requires augmenting the state space with sufficient statistics and projecting the sufficient statistics into the future, generating an important computational cost (this is known as the "curse of dimensionality"). This cost is particularly high when the sufficient statistics take continuous values. Motivated by the need to scale up applications of adaptive management, passive adaptive management methods (Walters 1975, 1986) designate heuristics that calculate the best next decision assuming that the belief will not change in the future (this assumption is referred to as "certainty equivalence" in control theory (Bertsekas 1995)). In many practical cases, it is possible to achieve good performance using passive adaptive management (Rout et al. 2009).

Two approaches are commonly used to generate passive adaptive management policies: the weighted average and the most likely value. The weighted average methods to solve adaptive management problems are similar regardless of whether the uncertainty is parameter uncertainty or model uncertainty, so we address the two together. Recall that when solving passive adaptive management problems, the optimization assumes that the current knowledge of the system will not change over time. Learning occurs once the effect of an implemented action is monitored, and not during the optimization procedure. The sufficient statistic used is a belief over the models or parameters and is updated using Bayes' rule (Eqs. 1 and 2). During the implementation phase, the optimal action, a_t, is selected using weighted averaging. For each potential action, model or parameter responses (i.e., transition probabilities) are averaged across all models or parameter values, with the weights given by the current belief (Williams 2011a). To determine an optimal policy, a passive adaptive manager averages future outcomes over all plausible parameter values (Walters 1975):

$$V^*(s_t) = \max_{a_t \in A}\left\{ r(s_t, a_t) + \gamma \sum_{s_{t+1} \in S} \left[ \int_\theta b_t(\theta)\, P_\theta(s_{t+1} \mid s_t, a_t)\, d\theta \right] V^*(s_{t+1}) \right\} \qquad (5)$$

When the unknown quantity θ takes discrete values, the integral in Eq. 5 is replaced by a summation. The resulting state, s_{t+1}, is observed and the belief, b_{t+1}, is updated according to Bayes' rule.

Table 2 Pseudocode for solving active adaptive management problems under parameter uncertainty using an exact stochastic dynamic programming approach, assuming that θ follows a Beta distribution

Calculate optimal active adaptive policy under parameter uncertainty assuming θ follows a Beta distribution with sufficient statistics (α,β)
Input: S: set of states; A: set of actions; r: rewards; T: time horizon; (α,β)t: t finite sets of shape parameters for the Beta distribution; P(.) = f(st, at, θ ~ Beta(α,β)): generates a vector of probability distributions over future states based on the equation of the dynamics of the system as a function of the current state, action and unknown parameter θ ~ Beta(α,β)
Output: P: probability of transitions over the set of states S, shape parameters (α,β)t and actions A; {π1, π2,…, πT}: optimal policy for the finite-horizon MDP
1  For all ai in A, all si in S, (α,β)i in (α,β)t  % calculate transition probabilities P using sufficient statistics to represent the uncertain parameter
2    P(.|si, (α,β)i, ai) = f(si, ai, θi ~ Beta(αi,βi));
3  endFor
4  {π1, π2,…, πT} = solve_MDP(S, {(α,β)t}t=1..T, A, P, r, T);  % solve the finite MDP problem, e.g. (Chadès et al. 2014; Fackler 2013; Marescot et al. 2013)
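To illustrate how the Beta sufficient statistics enter the transition function f in Table 2, the toy sketch below (ours) assumes a system where the managed action succeeds with unknown probability θ; because the dynamics are linear in θ, integrating over a Beta(α,β) belief reduces to plugging in the posterior mean α/(α+β). The state space, action names and dynamics are invented for illustration.

```python
def expected_theta(alpha, beta):
    """Posterior mean of theta under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

def transition_vector(state, action, alpha, beta):
    """Toy analogue of f(s, a, theta ~ Beta(alpha, beta)) in Table 2:
    the managed action moves the system to state 1 with probability theta."""
    theta = expected_theta(alpha, beta)
    if action == "manage":
        return [1.0 - theta, theta]
    return [0.9, 0.1]              # illustrative 'do nothing' dynamics

# Lines 1-3 of Table 2: tabulate P(.|s,(alpha,beta),a) over a small grid of shape parameters.
for alpha, beta in [(1, 1), (2, 1), (5, 2)]:
    for s in (0, 1):
        for a in ("manage", "do_nothing"):
            print(s, a, (alpha, beta), transition_vector(s, a, alpha, beta))
```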

For model uncertainty problems, there exists a simpler procedure for solving passive adaptive management problems: the "most likely value." In this method, actions are selected based on the model with the highest belief (Williams 2011a). In the planning phase, an MDP is solved for each of the |M| candidate models, resulting in |M| policies specifying the optimal action to take for each state of the system (Table 8). During the implementation phase, the optimal action, at, is selected according to the model with the highest belief, i. The resulting state, st+1, is observed and the belief, bt+1, is updated according to Bayes' rule. This approach does not require that an MDP is solved at every time step, so its computational cost is lower than that of the weighted average approach. Numerous approaches based on the certainty equivalence principle have been developed in the adaptive control literature (Filatov and Unbehauen 2000; Wittenmark 1995).

Similarly, in the artificial intelligence literature, heuristics have been developed to solve large POMDPs that could easily be used as passive adaptive management approaches (Cassandra and Kaelbling 1995). Advantages and limitations of these approaches have yet to be assessed in an ecological context.

Methodological challenges and discussion

Recent advances

Adaptive management methods have advanced significantly with the advent of powerful computational techniques for Bayesian updating. In particular, modeling adaptive management problems under model uncertainty as POMDPs allows us to solve previously unsolvable problems. As POMDP methods are more widely adopted in the ecological modeling community, the diversity of ecological challenges that can be managed with adaptive management methods will expand.

Table 3 Pseudocode to implement active adaptive management policies. In this case, we use the sufficient statistics (α,β) to represent our knowledge of the uncertain parameter θ

Implement active adaptive management
Input: sinit: initial state; α0,β0: initial shape parameters defining a Beta distribution; T: time horizon; {π1, π2,…, πT}: finite-horizon optimal policy
Output: a1, s1, α1,β1, …, aT, sT, αT,βT: history of actions implemented, states observed and updated shape parameters at each time step 1 to T
5   st = sinit; αt = α0; βt = β0;
6   For t = 1:T
7     at = πt(st, αt, βt);
8     st+1 = implement_action(st, at);  % implement the action using the optimal policy calculated in Table 2 and monitor
9     αt,βt = update_sufficient_statistics(st+1, st, αt, βt, at);  % see Eq. 1
10  endFor
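A hypothetical, self-contained version of this implementation loop is sketched below: the pre-computed policy and the system simulator are placeholders we invented, and the sufficient-statistic update uses Beta conjugacy (a success increments α, a failure increments β), standing in for the Eq. 1 update.

```python
import random

def update_sufficient_statistics(alpha, beta, success):
    """Conjugate Beta update after observing one Bernoulli outcome."""
    return (alpha + 1, beta) if success else (alpha, beta + 1)

def policy(state, alpha, beta):
    """Placeholder for the pre-computed active policy pi_t(s, alpha, beta)."""
    return "manage" if alpha / (alpha + beta) > 0.4 else "do_nothing"

def implement_action(state, action, true_theta=0.7):
    """Placeholder for the managed system: returns the next state (1 = success)."""
    if action == "manage":
        return 1 if random.random() < true_theta else 0
    return 0

state, alpha, beta = 0, 1, 1            # s_init and a flat Beta(1,1) prior
for t in range(10):
    action = policy(state, alpha, beta)
    next_state = implement_action(state, action)
    if action == "manage":              # learning occurs only when the uncertain action is monitored
        alpha, beta = update_sufficient_statistics(alpha, beta, next_state == 1)
    state = next_state
print("posterior mean of theta:", alpha / (alpha + beta))
```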

Table 4 Pseudocode for solving active adaptive management under model uncertainty using the discretized belief MDP approach

Calculate an active adaptive management policy using the discretized belief MDP approach
Input: S: set of states; A: set of actions; r: rewards; B: finite set of discretized belief points
Output: PB: probability of transitions over the set of states S and discretized belief points B; π: optimal policy for the discretized belief MDP
1   B = discretize_grid(k, distance);  % discretize the belief space over models using a grid approach
2   For action ai in A, state sj in S, belief point b in B  % calculate PB by means of simulations
3     ns = zeros(B,1);
4     For n = 1 to maxn
5       bn = sample(b, distance);  % sample neighboring points
6       [sj', b'] = simulate_action(ai, sj, bn);  % b' in B is the nearest belief point
7       ns(b') = ns(b') + 1;  % counting
8     endFor
9     PB(sj', b'|sj, b, ai) = ns(b')/maxn;  % averaging
10  endFor
11  π = solve_MDP(S, B, A, PB, r);  % solve the discretized belief MDP, e.g. (Chadès et al. 2014; Fackler 2013; Marescot et al. 2013)
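The same idea can be prototyped directly, as in the sketch below (illustrative throughout, and not code from the paper): beliefs over two candidate models are snapped to a regular grid, and the belief-MDP transition probabilities for one state, belief and action are estimated by repeated simulation; simulate_step stands in for the true dynamics and Bayes update, and the action argument is ignored in this toy model.

```python
import numpy as np

K = 11                                   # number of grid points over the belief simplex
grid = np.linspace(0.0, 1.0, K)          # belief in model 1 (belief in model 2 is 1 - b)

def nearest_grid_point(b):
    """Snap a belief to the nearest discretized belief point."""
    return grid[np.argmin(np.abs(grid - b))]

def simulate_step(state, action, belief, rng):
    """Placeholder dynamics: sample a model according to the belief, sample the
    next state from that model, then update the belief with Bayes' rule."""
    P = [np.array([[0.9, 0.1], [0.3, 0.7]]),   # model 1 (illustrative)
         np.array([[0.6, 0.4], [0.1, 0.9]])]   # model 2 (illustrative)
    model = 0 if rng.random() < belief else 1
    next_state = rng.choice(2, p=P[model][state])
    like = np.array([P[0][state, next_state], P[1][state, next_state]])
    post = like * np.array([belief, 1 - belief])
    return next_state, post[0] / post.sum()

rng = np.random.default_rng(0)
counts = {}                              # (s', b') visit counts for one (s, b, a) triple
s, b, n_sim = 0, 0.5, 1000
for _ in range(n_sim):
    s_next, b_next = simulate_step(s, "manage", b, rng)
    key = (s_next, nearest_grid_point(b_next))
    counts[key] = counts.get(key, 0) + 1
P_belief = {key: n / n_sim for key, n in counts.items()}   # estimated PB(s', b' | s, b, a)
print(P_belief)
```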

Major contemporary ecological issues like climate change (non-stationarity), imperfect detection, and multi-objective management are now being solved using POMDPs. Here, we review some recent advances in these cutting-edge problem domains and discuss how adaptive management optimization methods could be extended to robust and multi-actor decision making.

Non-stationary dynamics

The dynamics of the system are commonly assumed to stay the same over time, i.e., to be stationary. When the changing dynamics of the system must be accounted for over time, it is possible to calculate non-stationary adaptive management strategies. This is particularly useful for models that accommodate climate change.

There are two main approaches to incorporating non-stationary dynamics. First, the suite of models can be composed of different rates of change so that the transition matrix changes every year in a known way (Martin et al. 2011; Nichols et al. 2011; Nicol et al. 2014). This approach uses standard model uncertainty techniques to learn the true rate of change, but the system can only change at the rates specified in the model suite. In the second approach, the suite of models is composed of stationary models, so that the transition matrices may change at any rate (Nicol et al. 2013, 2015). Unlike the first approach, which pre-specifies the rate of change, Nicol et al. (2015) specify a small probability of transition between models, allowing any candidate model to be true at a given time. While this approach allows more freedom in the rate of change between models, change is assumed to be less gradual than in the first approach. This approach requires estimates of the probability of transition between models; however, this assumption can be removed by treating the rate of change between models as a hidden probability that can take discrete values (Nicol et al. 2015). This comes at an additional computational cost because an additional hidden parameter must be learned.
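As a hedged sketch of the second approach, a small probability of switching between candidate (stationary) models can be encoded as a Markov chain over the hidden model index; the 0.05 switch probability below is purely illustrative and would in practice be elicited or itself treated as an uncertain hidden quantity, and the function name is ours.

```python
import numpy as np

epsilon = 0.05                      # illustrative probability of switching to another model
n_models = 3
# Markov chain over hidden models: stay with probability 1 - epsilon, otherwise
# move to one of the other candidate models with equal probability.
model_transition = np.full((n_models, n_models), epsilon / (n_models - 1))
np.fill_diagonal(model_transition, 1.0 - epsilon)
assert np.allclose(model_transition.sum(axis=1), 1.0)

def predict_belief(belief):
    """Propagate the belief over models one step before conditioning on new data,
    allowing the 'true' model to drift over time (non-stationary dynamics)."""
    return belief @ model_transition

print(predict_belief(np.array([1.0, 0.0, 0.0])))
```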

Table 5 Pseudocode for solving active adaptive management under model uncertainty using a POMDP approach

Calculate optimal active adaptive policy under model uncertainty using belief states as sufficient statistics
Input: S: set of states; A: set of actions; r: rewards; T: time horizon; M: finite set of models; P(.) = f(st, at, mi): generates a vector of probability distributions over future states based on the equation of the dynamics of the system as a function of the current state st, action at and model mi in M
Output: P: probability of transitions over the set of states S, models M and actions A; π: optimal policy
1  For all ai in A, all si in S, mi in M  % calculate transition probabilities P using sufficient statistics to represent the uncertain model
2    P(.|si, mi, ai) = f(si, ai, mi);
3  endFor
4  π = solve_POMDP(S, M, A, P, r);  % solve the factored POMDP (Ong et al. 2010; Poupart 2005)
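One common way to pass such a problem to a generic POMDP solver is to augment the state with the hidden model index, as sketched below: the transition array is block-diagonal in the model dimension (the hidden model never changes), the state component is observed perfectly and the model component is never observed directly. The array layout, dimensions and model matrices here are illustrative assumptions of ours; the exact input format depends on the solver used.

```python
import numpy as np

n_s, n_m, n_a = 2, 2, 2                          # states, candidate models, actions
P_model = np.zeros((n_m, n_a, n_s, n_s))         # P(s'|s,a,m) for each candidate model
P_model[0, :, :, :] = [[0.9, 0.1], [0.3, 0.7]]   # model 1 dynamics (same for both actions here)
P_model[1, :, :, :] = [[0.6, 0.4], [0.1, 0.9]]   # model 2 dynamics

# Augmented transition over (s, m): the hidden model index never changes.
T = np.zeros((n_a, n_s * n_m, n_s * n_m))
for a in range(n_a):
    for m in range(n_m):
        for s in range(n_s):
            for s2 in range(n_s):
                T[a, m * n_s + s, m * n_s + s2] = P_model[m, a, s, s2]

# Observation model: the state component is observed perfectly, the model is not.
O = np.zeros((n_a, n_s * n_m, n_s))              # P(observation | (s', m'), a)
for a in range(n_a):
    for m in range(n_m):
        for s in range(n_s):
            O[a, m * n_s + s, s] = 1.0
```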


Imperfect detection and monitoring

Adaptive management techniques are tailored to address structural uncertainty; however, they can be extended to address other kinds of uncertainty, including state uncertainty due to measurement error. POMDPs are an obvious candidate for tackling imperfect detection in adaptive management, as they have been applied to partially observable problems in robotics for decades (Chadès et al. 2012; Fackler and Haight 2014). In adaptive management, POMDPs can help to decide when to change monitoring effort under imperfect detection (Fackler and Haight 2014). If the cost of adaptive management is an impediment to uptake, accounting for the cost of monitoring in adaptive management can help to minimize the cost of an adaptive management program (Haight and Polasky 2010; Moore and McCarthy 2010; White 2005). Indeed, some species might not require monitoring at each time step, as monitoring might not be needed to inform the next decisions. Monitoring decisions should be part of the optimization problem, or else we risk wasting precious resources (Chadès et al. 2008; MacKenzie 2009; McDonald-Madden et al. 2010a).
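For imperfect detection, the extra ingredient is an observation model linking the true (hidden) state to survey outcomes. The sketch below, which is ours, defines a detection matrix for a present/absent system with per-survey detection probability d; the value of d and the update function are illustrative, and d would normally be estimated from monitoring data.

```python
import numpy as np

d = 0.6                                   # illustrative per-survey detection probability
# Rows: true state (0 = absent, 1 = present); columns: observation (0 = not detected, 1 = detected)
observation = np.array([
    [1.0,     0.0],                       # no false positives when the species is absent
    [1.0 - d, d],                         # species present but missed with probability 1 - d
])

def update_state_belief(belief, obs):
    """Condition a belief over the true state on one survey outcome (Bayes' rule)."""
    post = belief * observation[:, obs]
    return post / post.sum()

print(update_state_belief(np.array([0.5, 0.5]), obs=0))   # non-detection still leaves some chance of presence
```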

Multi-objective approaches

A current challenge in applied ecology is the need to account for multiple objectives when deciding the best management action to implement (Kareiva et al. 2014). In multi-objective problems, the objective is transformed into a vector of objectives. Unlike single-objective problems, multi-objective problems generally admit several optimal vectors of values. Each optimal vector corresponds to a possible "best compromise" between the different objectives. The set of these optimal vectors is called the Pareto frontier (Ehrgott 2005). One way to solve a multi-objective problem consists of generating the entire Pareto frontier. Because the Pareto frontier can be exponentially large even for two objectives, approximate Pareto frontiers are usually sought. Roijers et al. (2015) propose a way of generating the Pareto frontier of a multi-objective POMDP, producing stochastic policies. This work is attractive because it enables multi-objective active adaptive management problems to be solved. However, dealing with stochastic policies is not convenient in applied fields where simplicity of the solution is important. In the case of the passive adaptive management approach, MDPs can be solved in a multi-objective context, providing deterministic policies at a small computational cost (Perny and Weng 2010).
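As a much simpler alternative to the exact methods cited above (and not the approach of Roijers et al. 2015 or Perny and Weng 2010), the weighted-sum scalarization sketched below sweeps a weight between two reward functions and solves each resulting single-objective MDP with a tiny value iteration; it only recovers points on the convex part of the Pareto frontier, and all transition matrices and rewards are invented for illustration.

```python
import numpy as np

def value_iteration(P, r, gamma=0.95, n_iter=500):
    """Tiny value iteration; P has shape (n_a, n_s, n_s), r has shape (n_a, n_s)."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        V = np.max(r + gamma * (P @ V), axis=0)
    return V

# Illustrative 2-state, 2-action problem with two competing objectives.
P = np.array([[[0.9, 0.1], [0.3, 0.7]],
              [[0.5, 0.5], [0.2, 0.8]]])
r_conservation = np.array([[0.0, 1.0], [0.0, 1.0]])   # reward for being in the 'good' state
r_cost = np.array([[0.0, 0.0], [-1.0, -1.0]])         # action 1 is expensive

for w in np.linspace(0, 1, 5):                        # sweep scalarization weights
    V = value_iteration(P, w * r_conservation + (1 - w) * r_cost)
    print(f"w={w:.2f}  value of state 0: {V[0]:.2f}")
```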

Table 6 Pseudocode for solving passive adaptive management problems under parameter uncertainty using the weighted average approach

Implement passive adaptive management policy under parameter uncertainty assuming θ follows a Beta distribution with sufficient statistics (α,β)
Input: S: set of states; sinit: initial state; A: set of actions; r: rewards; α0,β0: initial shape parameters defining a Beta distribution; T: time horizon
Output: a1, s1, α1,β1, …, aT, sT, αT,βT: actions implemented, states observed and updated shape parameters at each time step 1 to T
1  st = sinit; αt = α0; βt = β0;
2  For t = 1:T
3    Pαt,βt = calculateP(αt, βt);  % calculate P(st+1|st,at) for the current αt,βt
4    πt = solve_MDP(S, A, Pαt,βt, r, T);  % finite horizon or infinite horizon
5    at = πt(st);
6    st+1 = implement_action(st, at);  % implement the action using policy πt calculated at line 4 and monitor
7    αt,βt = update_sufficient_statistics(st+1, st, αt, βt, at);  % see Eq. 1
8  endFor

Table 7 Pseudocode for solving passive adaptive management under model uncertainty using the weighted average approach

Implement passive adaptive management policy under model uncertainty
Input: S: set of states; sinit: initial state; A: set of actions; binit: initial belief over the k models; r: rewards; T: time horizon; P1-k: transition probabilities for models 1 to k
Output: a1, s1, b1, …, aT, sT, bT: actions implemented, states observed and updated belief states at each time step 1 to T
1  st = sinit; bt = binit;
2  For t = 1:T
3    Pa = weighted_average(P1-k, bt);  % calculate the weighted average transition probabilities
4    πa = solve_MDP(S, A, Pa, r);  % solve the MDP with the new transition probabilities
5    at = πa(st);  % select action at using policy πa
6    st+1 = implement_action(st, at);  % implement and monitor
     bt+1 = update_sufficient_statistics(st+1, st, bt, at);  % see Eq. 2
7  endFor

Multi-actor approaches

In the case where several actors manage the same resource, a multi-actor adaptive management problem can be formulated as a sequential decision problem under uncertainty. In artificial intelligence, these types of decision models are known as decentralized MDPs, decentralized POMDPs (Bernstein et al. 2002) or multiagent MDPs (Boutilier 1999; Littman 1994). However, most of these multi-actor problems do not have exact solution methods (Amato and Oliehoek 2015; Chades et al. 2002; Dibangoye et al. 2016). Perhaps the most accessible, but also the most constrained, model is the multiagent MDP (Boutilier 1999; Chades and Bouteiller 2005). In its simplest form, a multiagent MDP assumes that the action space is factored and that actors share a common objective. The complexity of solving a multiagent MDP is the same as that of an MDP; however, the size of the action space is exponential in the number of actors, which is a strong computational limiting factor. Multiagent MDPs could easily be used in a passive adaptive management context to address some of the most pressing ecological problems.
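The sketch below illustrates why the factored action space is the computational bottleneck: with a common objective, the joint action set of a multiagent MDP is the Cartesian product of the individual actors' action sets, so its size grows exponentially with the number of actors. The actor and action names are invented.

```python
from itertools import product

actor_actions = {
    "manager_A": ["burn", "do_nothing"],
    "manager_B": ["bait", "do_nothing"],
    "manager_C": ["fence", "plant", "do_nothing"],
}

# Joint actions of the multiagent MDP: one entry per combination of individual actions.
joint_actions = list(product(*actor_actions.values()))
print(len(joint_actions))   # 2 * 2 * 3 = 12 joint actions; grows exponentially with the number of actors
```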

Robust approaches

In the case where decision makers are interested in risk-averse adaptive management solutions, methods for solving robust Markov decision processes (Givan et al. 2000; Wiesemann et al. 2013) could be adapted directly as passive adaptive management approaches. In operations research, Nilim and El Ghaoui (2005) provide an approach for solving robust MDPs with unknown transition matrices in a passive adaptive management setup. In the active case, we would rely on available methods for solving robust POMDPs (Osogami 2015). None of these approaches has been assessed in an ecological context.
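As a hedged sketch of the robust (worst-case) idea only, the backup below evaluates each action against every transition matrix in a small finite uncertainty set and keeps the minimum before maximizing over actions; real robust MDP formulations (e.g., Nilim and El Ghaoui 2005) use richer uncertainty sets, and this toy version is only meant to convey the max-min structure. All matrices and rewards are invented.

```python
import numpy as np

def robust_backup(V, P_set, r, gamma=0.95):
    """One robust Bellman backup: for each action, take the worst case over a
    finite set of plausible transition matrices, then take the best action.
    P_set: list of arrays of shape (n_a, n_s, n_s); r: array of shape (n_a, n_s)."""
    worst_Q = np.min([r + gamma * (P @ V) for P in P_set], axis=0)   # (n_a, n_s)
    return worst_Q.max(axis=0)                                       # (n_s,)

# Two plausible transition models for a 2-state, 2-action toy problem (illustrative).
P_set = [np.array([[[0.9, 0.1], [0.3, 0.7]], [[0.5, 0.5], [0.2, 0.8]]]),
         np.array([[[0.8, 0.2], [0.4, 0.6]], [[0.4, 0.6], [0.3, 0.7]]])]
r = np.array([[0.0, 1.0], [0.0, 1.0]])

V = np.zeros(2)
for _ in range(200):
    V = robust_backup(V, P_set, r)
print(V)
```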

Table 8 Pseudocode for solving the passive adaptive management problem under model uncertainty using the most likely model approach

Planning stage of passive adaptive management under model uncertainty
Input: S: set of states; A: set of actions; r: rewards; P1-k: transition probabilities for models 1 to k
Output: π1, π2,…, πk: optimal policies for models 1 to k
1  For all models (i = 1:k)
2    πi = solve_MDP(S, A, Pi, r);  % solve the corresponding MDP
3  endFor

Implement passive adaptive management policy under model uncertainty
Input: sinit: initial state; binit: initial belief over the k models; T: time horizon
Output: a1, s1, b1, …, aT, sT, bT: actions implemented, states observed and updated belief states at each time step 1 to T
4   st = sinit; bt = binit;
5   For t = 1:T
6     i = argmax(bt);  % select the model with the highest belief
7     at = πi(st);
8     st+1 = implement_action(st, at);  % implement and monitor
9     bt+1 = update_sufficient_statistics(st+1, st, bt, at);  % see Eq. 2
10  endFor
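A minimal sketch of the implementation stage in Table 8: policies are pre-computed offline for each candidate model (the placeholder dictionaries below stand in for the MDP solutions), and at each time step the policy of the model currently holding the highest belief is applied; belief updating would follow the same Bayes' rule shown in the weighted-average sketch earlier. All names are illustrative.

```python
import numpy as np

# Placeholder pre-computed policies: one action per state, for each candidate model.
policies = {
    "model_1": {0: "manage", 1: "do_nothing"},
    "model_2": {0: "do_nothing", 1: "do_nothing"},
}
models = list(policies)
belief = np.array([0.5, 0.5])

def most_likely_action(state, belief):
    """Pick the action recommended by the model with the highest belief."""
    best_model = models[int(np.argmax(belief))]
    return policies[best_model][state]

print(most_likely_action(0, belief))
```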

Caveats

Under parameter uncertainty, it is assumed that the uncertain parameter can be modeled using a specified probability distribution (Table S1). Research on the consequences of assuming the wrong probability distribution does not exist, but a poor model selection may result in poor performance. Similarly, under model uncertainty, it is assumed that one of the candidate models representing future dynamics must be close to the true model. If the model set does not approximate the true scenario, then the best solutions may not be optimal. POMDPs can in principle accommodate a large number of models, but as the number of possible models increases, distinguishing between models becomes difficult. Models must be similar enough to provide adequate resolution but different enough to require alternative optimal management strategies; there is no need to distinguish between models if the management response is the same (Nicol et al. 2015). Selecting the minimum set of models to include in adaptive management problems is a modeling decision for which no guidance can be found in the literature. For both parameter and model uncertainty, providing tools to detect when these assumptions about the true model are violated during the adaptive management process would give further confidence in optimization approaches.

We have assumed that the dynamics of the system, although uncertain, can be modeled as a Markov chain. This assumption is common and rarely discussed. Assuming the Markov property means that adaptive management problems can be modeled as MDPs or POMDPs and solved using stochastic dynamic programming techniques. However, many ecological systems do not fit this property. This is particularly true for systems that exhibit delays in response to management. For example, restoration problems require management actions such as planting trees for which the benefits will only be known in the future. More generally, management of species with complex life cycles, for which management only targets specific stages, ages, or sizes, is unlikely to fit a Markov chain. Unfortunately, there are no off-the-shelf adaptive management optimization methods for non-Markovian problems. In the control theory literature, relaxing the Markov property usually means that the method will rely on sub-optimal Monte Carlo simulation approaches, or the problem formulation would need to be simplified; e.g., linear transitions and a quadratic objective function can be solved using Kalman filters (Grewal 2011).

Conclusions

Despite progress in developing optimal adaptive management strategies, uptake of adaptive management remains low. Financial commitment to long-term monitoring and management is rarely achieved (Keith et al. 2011; Westgate et al. 2013). A major challenge for adaptive management theoreticians is to generate and communicate real-world applications to prove that these methods can work successfully in practice (Canessa et al. 2016). At least one impediment to uptake is that optimal adaptive management is designed for a specific set of pre-conditions (Fig. 1) but has been poorly defined in the past (Runge 2011). In fact, optimal adaptive management may not be the panacea for all management problems that common wisdom suggests. It is perhaps surprising that adaptive management has not had more of an impact even within fisheries management or, more generally, management of coastal ecosystems (Walters 1997). As outlined by Walters, some of the problems may be difficult to overcome, such as cross-scale issues. However, other challenges summarized by Walters are essentially the issues we have addressed, such as how to learn from limited empirical data and how to address structural uncertainty with modeling efforts. By specifying the appropriate decision context and providing detailed methods of the state of the art in optimal adaptive management, we have clarified which adaptive management approach is appropriate (Figs. 1 and 2) and how to implement it (Tables 2, 3, 4, 5, 6, 7, and 8), so that practitioners and modelers may finally bridge the gap between theory and implementation.

Acknowledgments The authors would like to thank Gwen Iacona and Ayesha Tulloch for commenting on earlier versions of this manuscript. The idea of this review paper emerged at the "Natural resource management" workshop organized by the Mathematical Biosciences Institute, Columbus (2013), and an adaptive management workshop supported by a CSIRO Julius Career Award (IC). TMR was supported by an Australian Research Council Discovery Grant (DP110101499). CEH was supported by the National Environmental Research Program Environmental Decisions Hub.

References Amato C, Oliehoek FA Scalable Planning and Learning for Multiagent POMDPs. In: Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015 Åström KJ (1965) Optimal control of Markov decision processes with incomplete state estimation. J Math Anal Appl 10:174–205 Åström K, Wittenmark B (2008) Adaptive control, 2nd edn. Dover Publications, Mineola Bellman RE (1957) Dynamic Programming. Princeton University Press, Princeton Bernstein DS, Givan R, Immerman N, Zilberstein S (2002) The complexity of decentralized control of Markov decision processes. Math Oper Res 27:819–840 Bertsekas DP (1995) Dynamic programming and optimal control vol 1, vol 2. Athena Scientific Belmont, MA Bonet B (2002) An epsilon-optimal grid-based algorithm for partially observable Markov decision processes. In: Proceedings of the 19th International Conference on Machine Learning (ICML-02), Sydney, Australia. Morgan Kaufman Publishers Inc., pp 51–58 Boutilier C, Dearden R (1994) Using abstractions for decision-theoretic planning with time constraints. In: Proceedings of the Twelfth AAAI National Conference on Artificial Intelligence. AAAI Press, pp 1016–1022 Boutilier C (1999) Sequential optimality and coordination in multiagent systems. In: IJCAI. pp 478–485 Brafman R (1997) A heuristic variable grid solution method for POMDPs. In: Proceedings of the National Conference on Artificial Intelligence (AAAI-97), Providence, Rhode Island. pp 727–733

18 Canessa S et al (2015) When do we need more data? A primer on calculating the value of information for applied ecologists. Methods Ecol Evol 6:1219–1228. doi:10.1111/2041-210x.12423 Canessa S et al (2016) Adaptive management for improving species conservation across the captive-wild spectrum. Biol Conserv 199:123– 131. doi:10.1016/j.biocon.2016.04.026 Cassandra AR, Kaelbling LP (1995) Learning policies for partially observable environments: Scaling up. In: Machine Learning Proceedings 1995: Proceedings of the Twelfth International Conference on Machine Learning, Tahoe City, California. Morgan Kaufmann, p 362 Chades I, Bouteiller B Solving multiagent Markov decision processes: a forest management example. In: Proceedings of the International Congress on Modelling and Simulation (MODSIM 2005), 2005. pp 1594–1600 Chades I, Scherrer B, Charpillet F (2002) A heuristic approach for solving decentralized-pomdp: Assessment on the pursuit problem. In: Proceedings of the 2002 ACM symposium on Applied computing. ACM, pp 57–62 Chadès I, McDonald-Madden E, McCarthy MA, Wintle B, Linkie M, Possingham HP (2008) When to stop managing or surveying cryptic threatened species. Proc Natl Acad Sci U S A 105:13936 Chadès I, Martin TG, Nicol S, Burgman MA, Possingham HP, Buckley YM (2011) General rules for managing and surveying networks of pests, diseases, and endangered species. Proc Natl Acad Sci 108: 8323–8328. doi:10.1073/pnas.1016846108 Chadès I, Carwardine J, Martin TG, Nicol S, Sabbadin R, Buffet O (2012) MOMDPs: a solution for modelling adaptive management problems. In: The Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12), Toronto, Canada. pp 267–273 Chadès I, Chapron G, Cros M-J, Garcia F, Sabbadin R (2014) MDPtoolbox: a multi-platform toolbox to solve stochastic dynamic programming problems. Ecography 37:916–920 Charles AT (1992) Uncertainty and information in fishery management models: a Bayesian updating algorithm. Am J Math Manag Sci 12: 191–225 Dibangoye JS, Amato C, Buffet O, Charpillet F (2016) Optimally solving Dec-POMDPs as continuous-state MDPs. J Artif Intell Res 55:443–497 Dujardin Y, Dietterich T, Chadès I (2015) alpha-min: a compact POMDP solver. In: International Joint Conference on Artificial Intelligence (IJCAI-2015), Buenos Aires, Argentina Ehrgott M (2005) Multicriteria optimization, 2nd edn. Springer, Berlin Fackler P (2013) MDPSOLVE Software for Dynamic Optimization Fackler PL, Haight RG (2014) Monitoring as a partially observable decision problem. Resour Energy Econ 37:226–241 Fackler P, Pacifici K (2014) Addressing structural and observational uncertainty in resource management. J Environ Manag 133:27–36. doi:10.1016/j.jenvman.2013.11.004 Filatov NM, Unbehauen H (2000) Survey of adaptive dual control methods. IEE Proc - Control Theory Appl 147:118–128. doi:10.1049/ip-cta:20000107 Firn J, Rout T, Possingham H, Buckley YM (2008) Managing beyond the invader: manipulating disturbance of natives simplifies control efforts. J Appl Ecol 45:1143–1151. doi:10.1111/j.13652664.2008.01510.x Fisher RA (1922) On the Mathematical Foundations of Theoretical Statistics. Philos Trans R Soc Lond A: Math, Phys Eng Sci 222: 309–368. doi:10.1098/rsta.1922.0009 Frederick SW, Peterman RM (1995) Choosing fisheries harvest policies: when does uncertainty matter? Can J Fish Aquat Sci 52:291–306. doi:10.1139/f95-030 Fulton EA, Smith ADM, Smith DC, van Putten IE (2011) Human behaviour: the key source of uncertainty in fisheries management. Fish Fish 12:2–17. 
doi:10.1111/j.1467-2979.2010.00371.x Givan R, Leach S, Dean T (2000) Bounded-parameter Markov decision processes. Artif Intell 1:71–109

Theor Ecol (2017) 10:1–20 Gregory R, Ohlson D, Arvai J (2006) Deconstructing adaptive management: citeria for applications to environmental management. Ecol Appl 16:2411–2425 Grewal MS (2011) Kalman filtering. Springer Haight RG, Polasky S (2010) Optimal control of an invasive species with imperfect information about the level of infestation. Resour Energy Econ 32:519–533 Hauser CE, Possingham HP (2008) Experimental or precautionary? Adaptive management over a range of time horizons. J Appl Ecol 45:72–81. doi:10.1111/j.1365-2664.2007.01395.x Holling CS (1978) Adaptive environmental assessment and management. John Wiley & Sons, London Houston A, Clark C, McNamara J, Mangel M (1988) Dynamic models in behavioural and evolutionary ecology. Nature 332:29–34 Johnson FA, Clinton TM, Kendall WL, Dubovsky JA, Caithamer DF, Kelley JR Jr, Byron KW (1997) Uncertainty and the Management of Mallard Harvests. J Wildl Manag 61:202–216. doi:10.2307/3802429 Johnson FA, Kendall WL, Dubovsky JA (2002) Conditions and limitations on learning in the adaptive management of mallard harvests. Wildl Soc Bull 176–185 Kareiva P, Groves C, Marvier M (2014) REVIEW: The evolving linkage between conservation science and practice at The Nature Conservancy. J Appl Ecol 51:1137–1147. doi:10.1111/1365-2664.12259 Keith DA, Martin TG, McDonald-Madden E, Walters C (2011) Uncertainty and adaptive management for biodiversity conservation. Biol Conserv 144:1175–1178 Kurniawati H, Hsu D, Lee W-S (2008) SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces. In: Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland. pp 65–72 Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the eleventh international conference on machine learning. pp 157–163 Lovejoy W (1991) Computationally feasible bounds for partially observed Markov decisions processes. Oper Res 39:162–175 Lubow BC (1997) Adaptive Stochastic Dynamic Programming (ASDP): Supplement to SFP User’s Guide, 20th edn. Colorado Cooperative Fish and Wildlife Research Unit, Colorado State University, Fort collins Ludwig D, Walters CJ (1981) Measurement Errors and Uncertainty in Parameter Estimates for Stock and Recruitment. Can J Fish Aquat Sci 38:711–720. doi:10.1139/f81-094 MacKenzie DI (2009) Getting the biggest bang for our conservation buck. Trends Ecol Evol (Personal Ed) 24:175–177 Madani O, Hanks S, Condon A (2003) On the undecidability of probabilistic planning and related stochastic optimization problems. Artif Intell 147:5–34 Mangel M, Clark CW (1983) Uncertainty, search, and information in fisheries. J Conseil 41:93–103. doi:10.1093/icesjms/41.1.93 Marescot L, Chapron G, Chadès I, Fackler P, Duchamp C, Marboutin E, Gimenez O (2013) Complex decisions made simple: a primer on stochastic dynamic programming. Methods Ecol Evol 4:872–884 Martin J, Runge MC, Nichols JD, Lubow BC, Kendall WL (2009) Structured decision making as a conceptual framework to identify thresholds for conservation and management. Ecol Appl 19:1079–1090 Martin J et al (2011) Structured decision making as a proactive approach to dealing with sea level rise in Florida. Clim Chang 107:185–202 Martin TG, Camaclang AE, Possingham HP, Maguire LA, Chadès I (2016) Timing of Protection of Critical Habitat Matters. Conserv Lett:n/a-n/a. doi:10.1111/conl.12266 McCarthy MA (2007) Bayesian methods for ecology. 
Cambridge University Press, Cambridge McCarthy MA, Possingham HP (2007) Active adaptive management for conservation. Conserv Biol 21:956–963 McCarthy MA, Possingham HP, Gill AM (2001) Using stochastic dynamic programming to determine optimal fire management for Banksia ornata. J Appl Ecol 38:585–592

Theor Ecol (2017) 10:1–20 McCarthy MA, Armstrong DP, Runge MC (2012) Adaptive Management of Reintroduction. In: Reintroduction Biology. John Wiley & Sons, Ltd, pp 256–289. doi:10.1002/9781444355833.ch8 McDonald-Madden E et al (2010a) Active adaptive conservation of threatened species in the face of uncertainty. Ecol Appl 20:1476– 1489. doi:10.1890/09-0647.1 McDonald-Madden E, Baxter PWJ, Fuller RA, Martin TG, Game ET, Montambault J, Possingham HP (2010b) Monitoring does not always count. Trends Ecol Evol 25:547–550. doi:10.1016/j. tree.2010.07.002 McDonald-Madden E, Chadès I, McCarthy MA, Linkie M, Possingham HP (2011) Allocating conservation resources between areas where persistence of a species is uncertain. Ecol Appl 21:844–858. doi:10.1890/09-2075.1 Mehta SV, Haight RG, Homans FR, Polasky S, Venette RC (2007) Optimal detection and control strategies for invasive species management. Ecol Econ 61:237–245. doi:10.1016/j.ecolecon.2006.10.024 Monahan GE (1982) Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms. MGMT SCI 28:1–16 Moore CT, Conroy MJ (2006) Optimal regeneration planning for oldgrowth forest: addressing scientific uncertainty in endangered species recovery through adaptive management. For Sci 52:155–172 Moore AL, McCarthy MA (2010) On Valuing Information in AdaptiveManagement Models. Conserv Biol 24:984–993. doi:10.1111/j.15231739.2009.01443.x Moore AL, Hauser CE, McCarthy MA (2008) How we value the future affects our desire to learn. Ecol Appl 18:1061–1069. doi:10.1890/07-0805.1 Moore CT et al (2011) An Adaptive Decision Framework for the Conservation of a Threatened Plant. J Fish Wildl Manag 2:247– 261. doi:10.3996/012011-jfwm-007 Nichols JD, Johnson FA, Byron KW (1995) Managing North American Waterfowl in the Face of Uncertainty. Annu Rev Ecol Syst 26:177– 199. doi:10.2307/2097204 Nichols JD et al (2011) Climate change, uncertainty, and natural resource management. J Wildl Manag 75:6–18 Nicol S, Chadès I (2012) Which States Matter? An Application of an Intelligent Discretization Method to Solve a Continuous POMDP in Conservation Biology. PLoS ONE 7:e28993. doi:10.1371 /journal.pone.0028993 Nicol SC, Possingham HP (2010) Should metapopulation restoration strategies increase patch area or number of patches? Ecol Appl 20: 566–581 Nicol S, Buffet O, Iwamura T, Chadès I (2013) Adaptive Management of Migratory Birds Under Sea Level Rise. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence, Beijing. pp 2955–2957 Nicol S, Griffith B, Austin J, Hunter CM (2014) Optimal water depth management on river-fed National Wildlife Refuges in a changing climate. Clim Chang 124:271–284 Nicol S, Fuller RA, Iwamura T, Chadès I (2015) Adapting environmental management to uncertain but inevitable change. Proc R Soc B 282 doi:10.1098/rspb.2014.2984 Nilim A, El Ghaoui L (2005) Robust control of Markov decision processes with uncertain transition matrices. Oper Res 53:780–798 Ong SCW, Png SW, Hsu D, Lee S (2010) Planning under Uncertainty for Robotic Tasks with Mixed Observability. Int J Robot Res 29:1053– 1068 Osogami T (2015) Robust partially observable Markov decision process. In: Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France. pp 106–115 Papadimitriou CH, Tsitsiklis JN (1987) The complexity of Markov decision processes. Math Oper Res 12:441–450. doi:10.1287 /moor.12.3.441 Parma AM (1998) What can adaptive management do for our fish, forests, food, and biodiversity? 
Integr Biol: Issues, News, Rev 1:16–26

19 Perny P, Weng P (2010) On finding compromise solutions in multiobjective Markov decision processes. In: European Conference on Artificial Intelligence (ECAI-2010), Lisbonne, Portugal. pp 969–970 Pichancourt JB, Chadès I, Firn J, van Klinken RD, Martin TG (2012) Simple rules to contain an invasive species with a complex life cycle and high dispersal capacity. J Appl Ecol 49:52–62 Pineau J, Gordon G, Thrun S (2003) Point-based value iteration: An anytime algorithm for POMDPs. In: International Joint Conference on Artificial Intelligence. Lawrence Erlbaum Associates LTD, pp 1025–1032 Poupart P (2005) Exploiting structure to efficiently solve large scale partially observable Markov decision processes. University of Toronto Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc, New York Regan HM, Colyvan M, Burgman MA (2002) A taxonomy and treatment of uncertainty for ecology and conservation biology. Ecol Appl 12: 618–628. doi:10.1890/1051-0761(2002)012[0618:atatou]2.0.co;2 Regan TJ, Chadès I, Possingham HP (2011) Optimal strategies for managing invasive plants in partially observable systems. J Appl Ecol 48:76–85 Roijers DM, Whiteson S, Oliehoek FA (2015) Point-based planning for multi-objective POMDPs. In: Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI2015), Buenos Aires, Argentina. Rout TM, Hauser CE, Possingham HP (2009) Optimal adaptive management for the translocation of a threatened species. Ecol Appl 19: 515–526. doi:10.1890/07-1989.1 Runge MC (2011) An Introduction to Adaptive Management for Threatened and Endangered Species. J Fish Wildl Manag 2:220– 233. doi:10.3996/082011-jfwm-045 Runge MC (2013) Active adaptive management for reintroduction of an animal population. J Wildl Manag 77:1135–1144. doi:10.1002/jwmg.571 Runge MC, Converse SJ, Lyons JE (2011) Which uncertainty? Using expert elicitation and expected value of information to design an adaptive program. Biol Conserv 144:1214–1223 Schlaifer R, Raiffa H (1961) Applied statistical decision theory. Clinton Press, Inc., Boston Sethi G, Costello C, Fisher A, Hanemann M, Karp L (2005) Fishery management under multiple uncertainty. J Environ Econ Manag 50:300–318. doi:10.1016/j.jeem.2004.11.005 Sigaud O, Buffet O (2010) Markov decision processes in artificial intelligence: MDPs, beyond MDPs and applications. ISTE/Wiley, Hoboken Silvert W (1978) The Price of Knowledge: Fisheries Management as a Research Tool. J Fish Res Board Can 35:208–212. doi:10.1139/f78-034 Smith ADM, Walters CJ (1981) Adaptive Management of Stock– Recruitment Systems. Can J Fish Aquat Sci 38:690–703. doi:10.1139/f81-092 Smith DR, McGowan CP, Daily JP, Nichols JD, Sweka JA, Lyons JE (2013) Evaluating a multispecies adaptive management framework: must uncertainty impede effective decision-making? J Appl Ecol 50: 1431–1440. doi:10.1111/1365-2664.12145 Southwell DM, Hauser CE, McCarthy MA (2016) Learning about colonization when managing metapopulations under an adaptive management framework. Ecol Appl 26:279–294. doi:10.1890/14-2430 Spaan M, Vlassis N (2005) Perseus: Randomized Point-based Value Iteration for POMDPs. J Artif Intell Res 24:195–220 Springborn M, Sanchirico JN (2013) A density projection approach for non-trivial information dynamics: adaptive management of stochastic natural resources. 
J Environ Econ Manag 66:609–624 Venner S, Chadès I, Bel-Venner M-C, Pasquet A, Charpillet F, Leborgne R (2006) Dynamic optimization over infinite-time horizon: Webbuilding strategy in an orb-weaving spider as a case study. J Theor Biol 241:725–733

20 Walters CJ (1975) Optimal Harvest Strategies for Salmon in Relation to Environmental Variability and Uncertain Production Parameters. J Fish Res Board Can 32:1777–1784. doi:10.1139/f75-211 Walters CJ (1986) Adaptive management of renewable resources. McGraw Hill, New York Walters C (1997) Challenges in adaptive management of riparian and coastal ecosystems. Conserv Ecol 1:1 Walters CJ, Hilborn R (1976) Adaptive Control of Fishing Systems. J Fish Res Board Can 33:145–159. doi:10.1139/f76-017 Walters CJ, Hilborn R (1978) Ecological optimization and adaptive management. Annu Rev Ecol Syst 9:157–188 Walters CJ, Ludwig D (1981) Effects of Measurement Errors on the Assessment of Stock–Recruitment Relationships. Can J Fish Aquat Sci 38:704–710. doi:10.1139/f81-093 Walters CJ, Ludwig D (1987) Adaptive management of harvest rates in the presence of a risk averse utility function. Nat Resour Model 1: 321–337 Westgate MJ, Likens GE, Lindenmayer DB (2013) Adaptive management of biological systems: A review. Biol Conserv 158:128–139. doi:10.1016/j.biocon.2012.08.016 White B (2005) An economic analysis of ecological monitoring. Ecol Model 189:241–250 Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov Decision Processes. Math Oper Res 38:153–183. doi:10.1287/moor.1120.0566 Williams BK (2009) Markov decision processes in natural resources management: Observability and uncertainty. Ecol Model 220:830– 840. doi:10.1016/j.ecolmodel.2008.12.023 Williams BK (2011a) Passive and active adaptive management: Approaches and an example. J Environ Manag 92:1371–1378. doi:10.1016/j.jenvman.2010.10.039

Theor Ecol (2017) 10:1–20 Williams BK (2011b) Resolving structural uncertainty in natural resources management using POMDP approaches. Ecol Model 222: 1092–1102. doi:10.1016/j.ecolmodel.2010.12.015 Williams BK, Johnson FA (2015) Value of information in natural resource management: technical developments and application to pink-footed geese. Ecol Evol 5:466–474. doi:10.1002 /ece3.1363 Williams BK, Johnson FA, Wilkins K (1996) Uncertainty and the adaptive management of waterfowl harvests. J Wildl Manag 60:223–232. doi:10.2307/3802220 Williams B, Szaro R, Shapiro C (2009) Adaptive management: the U.S. Department of the Interior technical guide, 2 edn. U.S. Department of the Interior, Washington, D.C. doi:http://www.doi. gov/initiatives/AdaptiveManagement/TechGuide.pdf Williams BK, Eaton MJ, Breininger DR (2011) Adaptive resource management and the value of information. Ecol Model 222:3429–3436. doi:10.1016/j.ecolmodel.2011.07.003 Wilson KA, McBride MF, Bode M, Possingham HP (2006) Prioritizing global conservation efforts. Nature 440:337–340 Wittenmark B (1995) Adaptive Dual Control Methods: An Overview. In: In 5th IFAC symposium on Adaptive Systems in Control and Signal Processing Zhou R, Hansen E (2001) An improved grid-based approximation algorithm for POMDPs. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI-2001), Seattle, Washington, USA Zhou E, Fu MC, Marcus S (2010) Solving continuous-state POMDPs via density projection. IEEE Trans Autom Control 55:1101–1116