Single-trajectory Opportunistic Planning under Uncertainty

Maria Fox and Derek Long
University of Durham, South Road, Durham, UK
emails: [email protected] and [email protected]

Abstract

An approach to planning for execution in uncertain environments is to characterise the action resource-use profiles using statistical techniques and then base predictions about action outcomes on these estimates. Because the exact profiles of actions cannot be determined prior to actual execution it is possible that these estimates will be flawed, resulting in the failure of the current plan trajectory. Terminal failure of the plan can be avoided if the plan trajectory branches at some of the anticipated failure points, allowing the executive to react to contingencies. This approach has the disadvantage that there are no existing contingency planners capable of handling realistic problems and no techniques available for validating contingent plans or for managing them efficiently at execution time. In this paper we present an alternative approach that executes a deterministic plan trajectory built from conservative estimates, exploiting local opportunities to use up the resources that accumulate as plan execution progresses.

Introduction

AI Planning has increased in sophistication at a very impressive rate over the last few years. There are now planning systems capable of producing complex concurrent plans in domains featuring temporal constraints and continuous resources. The third international planning competition was specifically concerned with assessing planning capabilities in such domains, and with the provision of a modelling language capable of expressing complex mixed discrete/continuous systems (Fox & Long 2002). Despite the advances that have been made, many realistic problems still seem to lie beyond the capabilities of the current technology. These problems tend to be characterised by the presence of uncertainty in the outcomes of the actions that can be performed. Uncertainty in outcomes can arise in different ways: there can be uncertainty in the discrete outcomes of an action (non-deterministic logical effects of an action) and there can be uncertainty in the continuous outcomes of an action (such as the precise consumption of resources by the action). In this paper we are concerned with uncertainty over continuous outcomes, although some of the observations we make about uncertainty and its impact on planning can be generalised to the discrete case. As an example of the sort of problem we are concerned with, planetary rovers, whose task it is to gather scientific data in unpredictable environments, are subject to uncertainty caused by

their interaction with the physical environment (factors such as slope and terrain characteristics affect speed of movement and rate of power consumption). As a result, it is not possible to state with accuracy how long it will take for a rover to travel between two points, or how much power it will consume in doing so. Further, the question of whether a rover has successfully executed a move action depends on why the effect of the move is required (different actions require different degrees of precision in the location of the rover). The planetary rover example is developed in detail by Bresina et al (Bresina et al. 2002). It is an example of a domain in which actions have continuous outcomes affecting the predictive power of the planner. Because actions may fail to complete after the expected amount of time, there appears to be a need to plan for contingencies rather than to construct a single plan trajectory that relies on the successful execution of every step. The authors provide a brief survey of existing contingent planning approaches and observe that these approaches lack expressive power and are computationally infeasible because of the branching factor involved when all contingencies are considered. Existing approaches lack the ability to select interesting contingencies for which to plan. One of the few exceptions identified by the authors is the JIC scheduler (Drummond, Bresina, & Swanson 1994) which limits its attention to constructing replacement schedules for use at the most likely failure points identified in an optimistic telescope operations schedule. Although this approach seems to work well in domains in which uncertainty is controlled, it does not work well in general because by the time a schedule has failed it is often too late to repair it by local rescheduling. An alternative approach, presented in this paper, is to avoid contingency planning and make use of single-trajectory plans constructed using deterministic planning technology.
The plan is constructed using estimates of the action durations based on (for example) the 95th percentile of the distributions representing action outcomes. These distributions are constructed from data accumulated over a series of experiments. The 95th percentile gives conservative estimates of resource-use profiles, resulting in a reliable and robust trajectory. Of course, the disadvantage with such a trajectory is that it can waste resources and leave rovers idle for long periods in which they could be acting usefully. We describe how the amount of unused resources can be predicted statistically, and how these resources, which accumulate over time as the plan is executed, can be consumed by exploiting local opportunities as they arise in the plan and its subsequent execution. We proceed by first describing why deterministic planning can be considered appropriate for planning under uncertainty. We then explain how opportunistic plan fragments can be constructed and made available for exploitation when sufficient resources have been accumulated to enable them to be executed without resource impacts on the remainder of the planned trajectory. Finally we discuss the advantages that can be obtained by our approach with reference to the small rover problem described in Bresina et al (Bresina et al. 2002).

Handling Uncertainty in Planning

Many real problems involve uncertainty. There are several approaches to handling uncertainty in planning. A traditional approach is to ignore it, assuming that it can be abstracted out in the domain descriptions and then handled by the executive in interpretation and execution of plans. In some domains the uncertainty appears to intrude at a level at which it becomes important for the planner to consider its impact on the planning process. To handle these cases, various possibilities have been explored (for example in (Bertoli et al. 2001; Bonet & Geffner 2000; Boutilier, Dean, & Hanks 1999; Cimatti & Roveri 2000; Majercik & Littman 1999; Onder & Pollack 1999; Rintanen 1999)), including building plans with high probability of successful execution, conformant planning, contingency planning, coupling planning with execution-monitoring and replanning, and finding plans with high expected utility. In all cases, it is assumed that there remains a motivation for planning at all. We briefly examine this assumption. The activity of planning is that of deciding what to do in advance, in order to achieve a desired goal. The plan is constructed prior to its execution and its subsequent execution is undertaken with reasonable expectation of success. In domains in which the planning agent has partial knowledge, it might be necessary to execute plan components and then repair the overall plan if the outcome of execution is not as expected. However, for planning to be a sensible strategy it must be that plan components consisting of more than one or two actions can be constructed with high confidence in their successful execution. If this is not the case then a reactive strategy in which the execution agent simply responds to environmental stimuli is more appropriate from the point of view of overall cost-benefit ratio. To see that this is so, consider the extreme case in which actions have entirely unpredictable effects (both logical and metric).
That is, executing an action could lead the executive into any possible state of the world. It is clear that there is no point in planning, since any action is as good or as bad as any other in this case. Reasoning about actions at all depends upon there being predictability in their behaviours. If actions are more predictable, but the state following execution of n of them is effectively unpredictable, then just as planning with entirely unpredictable actions is worthless, so in this case planning with more than n actions will be worthless. The horizon for

the utility of planning is certainly no further than the point at which states become effectively unpredictable. Therefore, to make planning worthwhile the execution environment must have certain properties:

1. Environment states do not change suddenly and unexpectedly (at least, not frequently). Changes in both metric and logical state only occur if they are implied by the domain model.

2. Actions have predictable outcomes and compose in a predictable way.

3. Small variations in initial conditions lead to small impacts on later states. The converse of this is sometimes known as the butterfly effect.

A domain with a significant butterfly effect undermines planning by making it possible that initial states that are equivalent at the level of abstraction available to the planner will lead to very different evolutions of behaviour during execution. Under these assumptions uncertainty in the execution of actions is controlled in its effects so that a planner can usefully predict the behaviour of the domain using its model and anticipate sufficiently accurately for the investment of planning effort to be repaid in improved performance compared with purely reactive behaviour. A particular form of uncertainty is in metric values such as the expected duration of actions or the consumption of continuously valued resources by those actions. The planetary rovers example developed by Bresina et al (Bresina et al. 2002) describes the uncertainty that features in the physical capabilities of autonomous vehicles. The outcomes of actions that govern such vehicles are continuous and take the form of probability distributions (for example, the time it takes for a rover to travel 100 meters depends on terrain, weather conditions, power, gearing, and so on). We claim that planning is of limited utility unless the actions have reasonably small standard deviations, since predictions become increasingly weak as the standard deviations grow.
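The claim that predictive power degrades with spread can be made concrete: under a normality assumption, the width of a 95% prediction interval grows linearly with the standard deviation. A minimal sketch (the numbers are illustrative, not taken from the paper):

```python
from statistics import NormalDist

def interval95(mean, sd):
    """Central 95% prediction interval for an assumed-normal action outcome."""
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(0.025), dist.inv_cdf(0.975)

for sd in (50, 500, 5000):
    lo, hi = interval95(1000, sd)
    print(sd, round(hi - lo))   # width = 2 * 1.96 * sd: 196, 1960, 19600
```

A tenfold increase in standard deviation gives a tenfold wider interval, so the planner's prediction of when an action completes becomes correspondingly weaker.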
With reasonable standard deviations, worst-case planning (that is, nominalising the expected behaviours of the actions at, say, the 95th percentile) has some important advantages. Foremost amongst these is that the planner can treat actions as deterministic and disregard the uncertainty that actually determines their true behaviours. Taking the 95th percentile (a traditional one for significance testing) has the disadvantage that it can waste resources by leaving the executive idle during episodes in which it could be active had the plan been more optimistic about resource use. In fact, if the distribution has a low standard deviation then the 95th percentile is likely to be a very accurate estimate of the time that will actually be taken. This means that narrow distributions represent a kind of uncertainty that can be reliably abstracted from the planning process without being overly conservative. If the standard deviations are larger then one can expect that the conservative 95th percentile plan will be far less efficient in its use of available resources. There is also a danger that no 95th percentile plan exists to achieve the stated goals, while a less conservatively constrained plan might be found. Of course, a threshold of lower than 95 percent can be used, but

one must consider whether executing a less robust plan that offers a possibility of achieving a goal is a sensible strategy compared with a robust plan for lesser goals. Figure 1 shows the means and standard deviations describing distributions modelling the travel times that might be required by a robot moving over different distances in smooth, moderate and rough terrain. The estimated travel times are nominalised at the 95th percentile. It can be seen that, as expected, the rough terrain and noisy sensors both increased the means and standard deviations of the behaviour, forcing a more conservative estimate for the 95th percentile. The data was actually generated using a Lego Mindstorms robot, but the details of this are not relevant to the paper so are omitted. As multiple legs are combined, the use of the 95th percentile as an estimate for the nominal execution time of each action leads to an accumulating expected error. So, if k actions all with mean execution time m and standard deviation s are executed in sequence, the nominalised times will yield a total time for execution of k(m + 1.65s). The time actually required to achieve the 95th percentile for the combined sequence of actions is only km + 1.65s√k, showing that the estimate based on individual 95th percentiles yields a 1.65s(k − √k) over-estimate of the time required for the 95th percentile. As a practical example, if 5 long legs over moderate terrain are to be executed, the use of the nominal time estimates will yield an estimated duration of 65.165 seconds. The 95th percentile for the estimated duration of the combined sequence is 60.948 seconds, so using the 95th percentile for nominalisation will lead to an expected overestimate for the execution time of just under four and a quarter seconds — 7% of the 95th percentile time for the execution of the complete sequence.
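The arithmetic in this passage can be checked directly. The sketch below uses the figure 1 values for a long leg over moderate terrain and z = 1.65, the paper's rounding of the exact 95th-percentile z-value of about 1.645; the small differences from the quoted 65.165 seconds come from figure 1 rounding the per-leg nominal time to a whole millisecond:

```python
from math import sqrt

z = 1.65                 # 95th percentile multiplier used throughout the paper
m, s, k = 11507, 925, 5  # long leg, moderate terrain (msecs), five legs in sequence

per_leg  = m + z * s                 # 13033.25 ms; figure 1 rounds this to 13033
naive    = k * per_leg               # sum of individual 95th percentiles: ~65166 ms
combined = k * m + z * s * sqrt(k)   # 95th percentile of the summed distribution: ~60948 ms
overest  = z * s * (k - sqrt(k))     # the over-estimate: ~4218 ms, about 4.22 s

# Discounting each action to m + z*s/sqrt(k) recovers the sequence-level percentile:
adjusted = m + z * s / sqrt(k)
assert abs(k * adjusted - combined) < 1e-6
```

The final assertion illustrates the identity k(m + zs/√k) = km + zs√k: discounted per-action nominal durations sum to exactly the 95th percentile of the whole sequence.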
As an alternative to allowing the expected accumulation of resources following the execution of a sequence of nominalised actions it would be possible to adjust the nominalised values to account for the length of the sequence. So, for k actions each with identical mean and standard deviation, the nominalised durations can be reduced to m + 1.65s/√k. More generally, where several actions are sequenced to achieve a goal it is possible to “discount” the nominalised time to allow for the expected accumulated benefits of using the 95th percentile as the nominal durations of the individual components. As a further concrete example, if a plan requires the robot to traverse a long leg over moderate terrain, two short legs over rough terrain, a short leg over smooth terrain and, finally, a long leg over moderate terrain with noisy sensors then we obtain a total duration of 74.05 seconds using the nominal durations listed above, but a time of only 67.204 seconds using the adjusted times, which are given in figure 2. As the data demonstrates, if the standard deviation is high then resources that are predicted, by the nominalised action consumptions, to be unavailable will actually accumulate during execution. For example, time and energy can accumulate as actions execute with lower consumption of resources than the nominal values predict. Central to our

proposal is the idea that these resources can be exploited in opportunistic activities during execution of the main trajectory of the plan. Of course, accumulated resources might sometimes be wasted. For example if there is insufficient time to use for an opportunity, but the next planned activity is some time in the future, then the executive is forced to wait. We claim that the occasional loss of resources is a reasonable price to pay for stability and robustness in the core planned activities, combined with the significant simplification of the plan construction problem leading to the development of plans that are both easier to construct and easier to verify. The stability of the outcome of a robust plan has an important benefit: once an executive has been instructed to follow such a plan, it is possible for a planner to continue to plan from the expected final state while the plan is being executed with a high degree of confidence that the plan being developed will be relevant to the executive following execution of the first plan. In the next section we address the obvious disadvantage of a conservative plan: the potential waste of resources associated with its execution.

Single-trajectory Opportunistic Planning

An important aspect of handling uncertainty is being able to react to unexpected outcomes and perform plan modifications in the light of the new circumstances. For example, if a robot reaches a location significantly earlier than expected, thereby making available time and power it had not expected to have, a desirable response might be to modify the rest of the plan to make use of the new resources. This might mean planning to visit an additional site and collect data there, which might have some impact on the remainder of the original plan structure. A well-explored approach to handling actions modelled as having a discrete number of different outcomes is to construct contingency plans (Pryor & Collins 1996; Majercik & Littman 1999; Onder & Pollack 1999) or multi-contingent schedules (Drummond, Bresina, & Swanson 1994). Contingency plans grow very complex even when relatively few choice points are explored. Choice points cause branching in the plan structure with the result that a completed plan can contain many alternative trajectories. Complex problems may contain many such branching points so that contingency planning rapidly becomes intractable. Part (b) of figure 3 shows how quickly a contingency plan can grow. The figure contains only seven anticipated branching points in a nine-step plan, resulting in eight different trajectories. Several authors have considered ways in which contingency branching can be minimised by including only the most valuable contingencies. For example, the JIC scheduler (Drummond, Bresina, & Swanson 1994) has to carry out relatively few “robustifying” operations on an initial nominal schedule before the schedule acquires a high degree of robustness. The approach taken is to generate a nominal schedule which has a significant probability of failure and then to improve it by rescheduling at the anticipated failure points.
Other potential contingencies are ignored because they do not have a high probability of causing the schedule to fail. Although this approach works well for scheduling telescope operations, in which there is little temporal uncertainty (in (Drummond, Bresina, & Swanson 1994) the authors observe that the standard deviation is just 2.5 percent of the mean for the variance in durations of their actions and that the performance they report would not be achieved were the value greater), it has not been demonstrated to be effective more generally and it remains an open research issue to determine how to restrict attention to the important contingencies.

Test type                                    Mean time (msecs)   Std. dev. (msecs)   Nominalised time (msecs)
Long leg, moderate terrain                   11507               925                 13033
Long leg, moderate terrain, noisy sensors    15937               6890                27306
Short leg, smooth terrain                    8421                858                 9837
Short leg, rough terrain                     10840               1937                14037

Figure 1: Travel times over different length journeys showing estimates nominalised at the 95th percentile.

Test type                                    Adjusted nominalised time (msecs)
Long leg, moderate terrain                   11700
Long leg, moderate terrain, noisy sensors    26642
Short leg, smooth terrain                    8588
Short leg, rough terrain                     11686

Figure 2: Travel times obtained over multiple leg journeys.

One way in which a rover might be equipped to handle unexpected effects, without incurring the high costs of contingency planning, is by constructing a single robust plan trajectory and then modifying it by local extensions. This represents a different view of the problem from that explored by JIC. There, rescheduling was used to improve a nominal schedule with an unacceptably low probability of success. In the approach proposed here the nominal plan is already highly robust (having been generated at the 95th percentile, or at some other suitably high confidence threshold) and the reason for modifying it is to reduce the resource wastage incurred as the price of this robustness. Local extensions constitute preplanned strategies for handling anticipated opportunities which will enable accumulated resources to be used. These preplanned components make local modifications that do not impact on the execution of the rest of the plan. They are executed when sufficient resources have been accumulated to enable their execution without affecting resources required for later activities. Plans to exploit opportunities take the form of loops attached to the plan trajectory which, if traversed, would increase the number of actions performed without consuming additional time or resources.
Loops are different from contingent branches because they do not lead the executive off the current trajectory and do not comprise alternative trajectories in the plan structure. A loop is traversed whenever its resource preconditions are satisfied and after traversal the executive is ready to perform the same action as if the loop had been ignored. The notion of a plan loop must be interpreted carefully. A loop is not intended to return an executive to exactly the same state that it leaves — if it did, then the loop could not achieve any benefits so would not be

worth executing. Instead, the loop must return the executive to a state from which the tail of the plan can be successfully executed. A plan segment defines, by regressing its preconditions back through its length, a set of states from which it could be successfully executed. A plan loop must take an executive from a state in such a set to another state in the same set. In overview the single-trajectory opportunistic planning approach works as follows. First, a nominal plan is produced using, say, the 95th percentile of each of the distributions representing the profiles of the actions. Existing deterministic planning technology, capable of handling durations and metric effects, can be used to generate the nominal plan. This plan is highly robust, having a statistically insignificant probability of failure, but is potentially wasteful of resources. Next, the nominal plan is analysed for states associated with opportunities. Offline planning effort can be utilised to plan for these opportunities. Each state identified in the analysis leads to the creation of a distinct planning problem in which the achievement of goals constituting the opportunity forms part of the goal state and the resource requirements needed to exploit the opportunity form part of the initial state of the problem. The remaining parts of the initial and goal state are provided by the identified state itself, since the execution of the opportunity plan must loop on that state. These plans can be generated and validated independently of the nominal plan and of each other. The probability of their being executed can be determined in advance since the amount of time and energy accumulated by the time the opportunity states are reached can be statistically predicted, as can the probability of enough resources having been accumulated for an available opportunity to be exploited. In principle this procedure can be recursively applied since opportunity plans can, themselves, give rise to opportunities. 
Pursuing this in depth would suggest significant investment in anticipating and planning for opportunities. We think it makes sense not to anticipate opportunities more than one or two deep, but if there is idle time during which a planner could be solving problems it would be possible to anticipate in more depth.
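The validity condition on loops can be made concrete in a simple STRIPS-like setting. The sketch below is our own illustration under an assumed propositional encoding (actions as add/delete/precondition sets), not the authors' formulation: regressing the goal back through the tail of the plan yields the condition set that any loop must re-establish.

```python
def regress(needed, action):
    """Conditions that must hold before `action` so that `needed` holds after it.
    `action` is a triple (adds, dels, pres) of sets of propositions."""
    adds, dels, pres = action
    if needed & dels:
        return None                  # the action destroys a needed condition
    return (needed - adds) | pres

def tail_conditions(goal, tail):
    """Regress the goal back through the tail of the plan; the result describes
    the set of states from which the tail succeeds. A valid loop must map this
    set into itself."""
    needed = set(goal)
    for action in reversed(tail):
        needed = regress(needed, action)
        if needed is None:
            raise ValueError("tail cannot achieve the goal")
    return needed

# Tiny invented example: an imaging action requires being backed away and powered.
image = ({"have_image"}, set(), {"backed_away", "power_ok"})
print(sorted(tail_conditions({"have_image"}, [image])))   # ['backed_away', 'power_ok']
```

A loop attached before this tail is admissible precisely when, started from any state satisfying the regressed conditions, it terminates in another state satisfying them.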

[Figure: part (a) shows a single main plan trajectory with opportunity loops attached; part (b) shows a contingent plan branching at choice points.]

Figure 3: The structural difference between single-trajectory opportunistic plans (a), and contingent plans (b).

Like contingencies, opportunities have to be anticipated and planned in advance of execution. At present we assume that one can rely on the domain expert to indicate what activities would be viewed as exploiting useful opportunities. For example, there is no advantage to be gained from adding navigational steps to the plan simply for the sake of consuming accumulated resources. On the other hand, if there are opportunities to collect data nearby it is worth adding the necessary navigation and collection actions if these can be executed within the resource bounds. The planetary rover example discussed below provides an example of how opportunities can be exploited during plan execution. In many cases opportunities can be planned as generic structures, executable in a wide range of contexts. For example, a search-and-rescue robot following a plan to explore a particular part of a disaster site could exploit a generic opportunity to capture a high resolution image of its environment whenever it is presented with the chance. Such unplanned images will always have the potential to be useful, and will not detract from the utility of the plan if they exploit opportunities without affecting the execution of the main trajectory of the plan. In general, it is only worth considering the addition of particular opportunistic exploitations when sufficient resources could have accumulated to allow execution of at least one action with attached utility. Similarly, only actions with utility are worth considering for opportunistic exploitation (though their execution might demand preparation using actions that do not offer utility). A critical point at which to consider adding opportunistic exploitations is before any action that depends on a window of opportunity, since these actions will not be executable before the window opens and hence there is a danger of losing accumulated temporal resource in a busy wait.
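The decision the executive faces at such an attachment point reduces to a resource-slack test. The following sketch is illustrative only (the class names and the slack bookkeeping are our assumptions, not the authors' implementation); it gates a pre-planned loop on the time and energy accumulated so far:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Action:
    name: str
    nominal_time: float    # conservative (e.g. 95th percentile) duration, seconds
    nominal_energy: float  # conservative energy requirement, Amp-hours

@dataclass
class OpportunityLoop:
    anchor: str                          # plan state the loop is attached to (and returns to)
    body: List[Action] = field(default_factory=list)

    def time_cost(self):
        return sum(a.nominal_time for a in self.body)

    def energy_cost(self):
        return sum(a.nominal_energy for a in self.body)

def should_traverse(loop, time_slack, energy_slack):
    """Traverse only if the accumulated slack covers the loop's nominal cost,
    so the tail of the main trajectory is unaffected."""
    return time_slack >= loop.time_cost() and energy_slack >= loop.energy_cost()

# The NIR opportunity from the small rover example (nominal values from figure 5):
nir_loop = OpportunityLoop("post-HiRes", [Action("NIR", 699, 2.83)])
print(should_traverse(nir_loop, time_slack=710.0, energy_slack=3.0))   # True
```

At the anchor state the executive compares actual elapsed time and energy use against the nominal totals for the plan prefix; the difference is the accumulated slack passed to `should_traverse`.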
The key advantages of taking this approach are that the main plan structure is robust and is unaffected by the effort involved in constructing the loops. Loops can be constructed and validated independently and the plan can be extended by as many loops as there are resources to invest in constructing them. Unlike contingency planning, which raises real difficulties for generation and validation, planning to take advantage of opportunities in this way represents a powerful decomposition of the problem. The main plan trajectory provides stability enabling long-term planning on the basis of guarantees about goals achieved and states explored by the executive (even if this stability is, as is sometimes the case, bought at the cost of slightly lower overall utility). Figure 3 depicts the differences between the single-trajectory opportunistic planning approach (part a) and the contingency planning approach. Of course, loops can interact so that the traversal of an early loop prevents the later traversal of a higher utility one because insufficient resources are available at the later time. We argue that this models opportunistic behaviour well – opportunities have to be taken advantage of as they arise, not missed in the hopes that there will be better opportunities ahead.
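The interaction noted above, in which an early loop consumes slack that a later, higher-utility loop needed, is exactly the behaviour of a greedy traversal rule. A minimal sketch (the costs and utilities are invented for illustration):

```python
def greedy_utility(slack, loops):
    """Traverse each loop in trajectory order as soon as its cost fits the
    remaining slack; return the total utility gathered."""
    total = 0.0
    for cost, utility in loops:
        if cost <= slack:
            slack -= cost
            total += utility
    return total

# An early cheap loop followed by a later high-utility one (invented numbers):
loops = [(5, 10), (8, 100)]
print(greedy_utility(10, loops))   # 10.0: the early loop starves the later one
print(greedy_utility(13, loops))   # 110.0: enough slack for both
```

With 10 units of slack the greedy rule takes the early loop and can no longer afford the later one; this mirrors the paper's position that opportunities should be taken as they arise rather than held back for better ones.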

The Small Rover Example

In order to explore the possible consequences of using the strategy we propose, we consider in detail the planetary rover example developed by Bresina et al in (Bresina et al. 2002). In order to make this paper complete we reproduce the example here. The actions we focus on are presented in figure 4 (in the original example there were additional actions, but they complicate the example without adding significantly to the discussion that follows). The means and standard deviations for the durations of the actions are measured in seconds and the energy requirements are measured in Amp-hours. The distributions for these measurements are not normal (no duration or energy requirement can be less than 0), but we will consider them approximately normal for the purposes of determining expected behaviours. In this example we consider that the plan starts executing at time 13:45 and that there is ample energy to allow execution of the plan. It should be noted that some of the actions involved have relatively large standard deviations — particularly the move and drive actions. The basic plan structure is to move to an observation target, dig over the surface, back away from this area and then to take an image. The image could be a high resolution image (HiRes) offering a utility of 10, or a spectral image (NIR) offering a utility of 100. For this situation, the authors observe that the best plan is to move to the target, dig and back up, followed by a decision, based on the time, that serves to determine which imaging action to execute. This is a contingent plan with a single branch. Assuming near normal distributions, the first three steps will complete by 14:00 in 34.5% of cases and by 14:30 in 99.9% of cases.
Thus, the contingent plan has an expected utility of 34.5 + 6.5 = 41.0 (in the 34.5% of cases where the first three steps end in time to perform an NIR the HiRes image is superfluous, while in the 65.4% of cases in which the first three steps end between 14:00 and 14:30 only the HiRes image is captured). By contrast, considering the case for single-trajectory plans, loop-free plans to collect the HiRes image or NIR image alone have expected utilities of 10.0 and 34.5 respectively. Since the distributions for the movement actions, at least, are necessarily skewed to the higher values (0 lies only one standard deviation from the mean in these actions), all of these calculated expected values are a little higher than their true values, but this does not significantly affect what follows. Following the proposals described in the previous sections the plan we produce will be based on 95th percentile nominalisations of the actions. These are given in figure 5. Using these values, the plan to achieve highest utility must select the HiRes action which will commence at 40 seconds after 14:17. The NIR cannot be executed because the deadline is missed by 17 minutes and 40 seconds. This plan is quite robust, since each nominalised action is based on the 95th percentile and, even if one of the actions were to fail to complete within its 95th percentile estimate, there is a good chance that the accumulated resources unused by one of the

other actions will provide a buffer to ensure the successful completion of the sequence. In fact, again assuming near normal distributions, we can estimate the chance of completing the first three actions within the nominal total duration of 32 minutes and 40 seconds as 95.7%. However, an interesting opportunity now arises (here we assume that all of the important preconditions and effects of the actions are indicated in the example): if the HiRes action completes and there is sufficient accumulated resource to execute the NIR action (using its nominalised values) while leaving the state within its expected final state following execution of the HiRes action alone, then the opportunity to enact the NIR imaging action is available for exploitation. The availability of this opportunity can be predicted in advance and added to the plan trajectory as a loop. For the loop to be traversed, the entire sequence up to and including the HiRes action must complete within 1268 seconds (the nominal expected time for the sequence is 1967 seconds, and 699 seconds must be left for completing the nominal duration of the NIR action within this deadline if the state is to remain within the predicted value). This happens 62.9% of the time (assuming near normal distributions). In fact, because the NIR action can only be executed if the first sequence is completed by 14:00, it is the precondition that dominates, rather than the limits imposed by nominalisation, and the NIR action will be executable in 34.1% of cases — a little lower than for the contingent plan because the HiRes action is executed before considering whether the NIR action can be opportunistically exploited. The energy requirement must also be met for the opportunity to be exploited. This requires that the total nominal energy requirement for the first four actions, which is 9.75 Ah, should exceed the actual energy use by sufficient to allow the nominal energy requirement of the NIR to be fitted into the excess. 
Thus, the four actions must use no more than 6.92 Ah, which will happen in 74.5% of cases. The original example given by Bresina et al. does not make clear whether energy demands are distributed independently of time requirements (it seems very likely that they are, in fact, correlated), but if we make this conservative assumption then we can suppose that the NIR action will be available to be exploited in 25.4% of cases. This gives the single-trajectory plan an overall expected utility of 25.4 + 7.4 = 32.8. In this calculation we assume that the energy levels after the imaging actions are important in the activities that follow (Bresina et al. do not appear to assume this) and that the energy requirements are entirely independent of the time (which seems unlikely). Ignoring energy (justifiable in this case if the energy demands are actually reasonably correlated with durations) gives an expected utility of 34.1 + 6.6 = 40.1, only a little lower than the utility of the contingent plan. We conclude with an observation that is relevant to the context in which this plan fragment might appear. Consider two cases: one in which the executive completes the first three actions and then the HiRes imaging action, and the other in which the executive completes the first three and then the NIR action. These will have time and energy requirements described by distributions with the following

means and standard deviations (these distributions ignore the preconditions on the actions, which will distort the distributions):

Plan     Time (s): µ    σ        Energy (Ah): µ    σ
HiRes    1105           500.4    5.26              2.51
NIR      1700           504      7.25              2.56

Action   Precondition   Start before   Time (s): µ    σ     Energy (Ah): µ    σ
Move     E > 10Ah       -              1000           500   5                 2.5
Dig      E > .1Ah       -              60             1     .05               .02
Drive    E > .6Ah       -              40             20    .2                .2
HiRes    E > .02Ah      14:30          5              1     .01               0
NIR      E > 3Ah        14:00          600            60    2                 .5

Figure 4: Time and energy distributions for actions in the small rover example.

Action   Nominal duration (s)   Nominal energy requirement (Ah)
Move     1825                   9.13
Dig      62                     0.08
Drive    73                     0.53
HiRes    7                      0.01
NIR      699                    2.83

Figure 5: 95th percentile nominalisations of the actions.
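The figures quoted in this example can be reproduced, under the stated near-normality and independence assumptions, with a few lines of Python. This is a sketch: the per-image utilities (10 for HiRes, 100 for NIR) and the 13:45 start time are inferences from the quoted numbers rather than values stated explicitly in the text.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# expected utility of the contingent plan (the utilities are an
# assumption: 10 for the HiRes image, 100 for the NIR image)
eu_contingent = 0.345 * 100 + 0.654 * 10          # = 41.04, quoted as 41.0

# (mean, std) for Move, Dig, Drive, HiRes, from Figure 4
time_dists   = [(1000, 500), (60, 1), (40, 20), (5, 1)]             # seconds
energy_dists = [(5.0, 2.5), (0.05, 0.02), (0.2, 0.2), (0.01, 0.0)]  # Ah

def total(dists):
    # sum of independent normals: means add, variances add
    mu = sum(m for m, _ in dists)
    sigma = math.sqrt(sum(s * s for _, s in dists))
    return mu, sigma

mu_t, sig_t = total(time_dists)      # 1105 s, ~500.4 s
mu_e, sig_e = total(energy_dists)    # 5.26 Ah, ~2.51 Ah

# time: the first four actions must finish within 1967 - 699 = 1268 s
p_loop_time = phi((1268 - mu_t) / sig_t)   # ~0.63 (quoted as 62.9%)

# but the 14:00 NIR deadline dominates: all four actions must be done
# 900 s after the (inferred) 13:45 start
p_deadline = phi((900 - mu_t) / sig_t)     # ~0.34 (quoted as 34.1%)

# energy: the first four actions must use at most 9.75 - 2.83 = 6.92 Ah
p_energy = phi((6.92 - mu_e) / sig_e)      # ~0.75 (quoted as 74.5%)

# combined, treating time and energy as independent (conservative)
p_nir = p_deadline * p_energy              # ~0.25 (quoted as 25.4%)
```

Each computed probability agrees with the quoted figure to within rounding.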

In the case where the contingent plan is executed, the alternative outcomes will be more widely separated, because the HiRes action will only be executed when the NIR cannot be, and the NIR action will only be executed if the first part is completed before 14:00. The final state after execution of the contingent plan is therefore widely variable, particularly with respect to energy consumption. This means that the contingent plan will lead to a significant split in the plan structure, with probably quite different developments in the plan structure following each branch. Indeed, the evaluation of the branches cannot actually stop at the point considered in the example as presented in (Bresina et al. 2002), since the resulting state will affect the expected utility of the actions that follow, making it necessary to consider a potentially much larger collection of contingencies than is likely to lead to genuinely effective plans. It is this that leads to the combinatorial growth in the work confronted by a contingency planner. In contrast, the single-trajectory plan, built using nominalised values, with the potential to exploit the opportunity for NIR imaging, yields a single stable base state from which to continue with further activities.
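The plan-level distributions tabulated above follow mechanically from the per-action distributions of Figure 4, since the mean of a sum of independent normals is the sum of the means and its variance is the sum of the variances. The same sketch recovers Figure 5's nominal values; the z = 1.65 percentile cut rounded upwards is our reconstruction of the nominalisation, chosen because it reproduces the tabulated durations exactly, not a rule stated in the text.

```python
import math

# per-action time (mean, std) in seconds, from Figure 4
TIME = {'Move': (1000, 500), 'Dig': (60, 1), 'Drive': (40, 20),
        'HiRes': (5, 1), 'NIR': (600, 60)}

def combine(dists):
    # sum of independent normals: means add, variances add in quadrature
    mu = sum(m for m, _ in dists)
    sigma = math.sqrt(sum(s * s for _, s in dists))
    return mu, sigma

# the two alternative four-action sequences discussed in the text
prefix = ['Move', 'Dig', 'Drive']
mu_h, sig_h = combine([TIME[a] for a in prefix + ['HiRes']])
mu_n, sig_n = combine([TIME[a] for a in prefix + ['NIR']])
print(mu_h, round(sig_h, 1))   # 1105 500.4
print(mu_n, round(sig_n, 1))   # 1700 504.0

# Figure 5's nominal durations: a ~95th percentile cut, rounded up
nominal = {a: math.ceil(m + 1.65 * s) for a, (m, s) in TIME.items()}
print(nominal['Move'], nominal['NIR'])  # 1825 699
```

The energy columns of the tables are obtained in exactly the same way from the energy distributions of Figure 4.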

Related Work

The approach to managing the uncertainty associated with continuous resource consumption that we have described is closely related to the just-in-case (JIC) scheduling work described, for example, in (Drummond, Bresina, & Swanson 1994). Broadly, in JIC the intention is to improve the robustness of a high-utility (and probably high-risk) plan, while in opportunistic planning the intention is to improve the quality of a highly robust (but conservative) plan. One can see the JIC approach and the opportunistic approach described here as arising from a common form: construct a seed plan and then find ways to improve on the plan by identifying points at which the plan structure is potentially deficient and adding structure to it at those points. In the case of JIC the additional structure is a complete new plan branch, while the opportunistic approach we describe involves adding loops, with the advantage that the plan still converges to the same goal state. The JIC approach does not necessarily distinguish the reason for a plan breaking at a particular point: plan breaks are considered to be points at which the enablement conditions for the next step are not met. It is possible that the break is caused by over-consumption of resources in an earlier part of the plan, leading to inability to execute the next action. Alternatively, it can be that the earlier part of the plan executes surprisingly efficiently and the next step cannot be executed because it must synchronise with a window of opportunity that has yet to open. These breaks can be seen as bad breaks and good breaks respectively: in the case of a bad break the plan is simply unexecutable, but with a good break the plan remains executable at the cost of lost resources. By failing to differentiate between these possibilities, the JIC approach can lead to widely varying plan structures (and, therefore, unpredictable final states), replacing potentially executable plan branches with completely newly constructed contingent branches. The problem of widely varying final states that can arise from execution of a contingent plan has important consequences.
Consider the case in which rovers are given plans during limited windows of communication opportunity: a contingent plan with a set of n possible final states (one along each of the contingent branches of the plan) will make the task of planning the next stage of a mission n times more difficult than the case in which the final state is a single (conservatively) predictable state. Since the final state of the rover cannot be known until communication is possible, in the contingent planning case one is faced with a choice. Either one builds a plan for each possible outcome of the contingent plan and selects the appropriate one for download once communication establishes which is needed, with all of the others having been a waste of time and effort, or else one must wait until communication is established before even commencing the planning. Neither of these options seems attractive. In the case of the conservative plan one can predict the final state and begin to plan from that state before execution is over and communication established.

Many people have considered the problem of managing uncertainty as a decision-theoretic one, to be solved by maximising expected utility in a Markov decision process (for example (Bertoli et al. 2001; Bonet & Geffner 2000; Boutilier, Dean, & Hanks 1999; Cimatti & Roveri 2000; Majercik & Littman 1999)). The decision-theoretic approach has been seriously questioned for its inability to account for decisions made by apparently rational human decision makers (Shafer 1986). In particular, it appears that human decision makers are risk-averse when considering potential profits, while taking the opposite approach when considering potential losses (Ellsberg 1961). This suggests that our approach is one that would harmonise well with human planners: a secure profit is considered better than a lottery ticket with which one might win a fortune. In the context of handing control of expensive hardware to a remote software system, it is not hard to imagine that risk aversion would be a popular strategy. Other difficulties with decision theory include the problem of separating utility assessments from risk assessments in subjective evaluations, and the attempt to place probability measures on situations that are, in fact, unknown (the Knightian distinction between risk and uncertainty (Knight 1921)). Even disregarding such concerns, the difficulty of applying a decision-theoretic approach to realistically large planning problems is formidable. Factorisation of MDPs (Dearden & Boutilier 1997) offers an interesting step towards scaling, but remains a long way from tackling state spaces of the size that characterises harder planning problems.

Conclusion

The classical planning approach of discretising effects and assuming predictability of action outcomes and state change has been accused of over-simplifying the problems associated with realistic planning domains. It has been claimed that, for planning to be realistic, it is necessary to reason directly with non-determinism and other forms of uncertainty, and to abandon the classical assumptions of perfect knowledge and predictable action outcomes. We argue that the converse is true: planning is only computationally realistic under these assumptions and, if a domain cannot be abstracted to a level at which these assumptions are reasonable, then planning is computationally infeasible for use in that domain. Because of the complexity of contingency planning, and the instability of the resulting plans, we propose an alternative approach to planning under uncertainty that exploits the strengths of deterministic planning. In this approach a single-trajectory plan is constructed and then extended by the addition of opportunities. These opportunities take the form of plan fragments that can be attached to the trajectory as loops on the states already present. These fragments can be constructed and validated independently of the main

trajectory and do not undermine any of the guarantees associated with the main trajectory. An executive executes a loop only if sufficient time and resources have been accumulated to support its execution. Statistically, these opportunities have a reasonable chance of being exploited because of the conservative estimates from which the main trajectory was constructed. We use the 95th percentile in order to ensure robustness. Opportunities provide a way of avoiding the wastage of resources traditionally associated with conservative action profile estimates. A criticism of our approach might be that the utility of the single-trajectory plan relies on the distributions being characterised by small standard deviations. In cases where the standard deviations are large, and opportunities cannot be successfully exploited, the approach will indeed lead to potentially great resource wastage. Any technology would be similarly affected, since large standard deviations also imply many unpredictable contingencies. However, we claim that standard deviations are unlikely to be much greater than 50% of the mean in domains in which advance planning constitutes a worthwhile investment of effort and that, in fact, most distributions will be characterised by smaller standard-deviation-to-mean ratios.
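The executive's rule for taking an opportunity loop can be sketched as a simple guard. The function and its argument names are illustrative rather than taken from any implementation described here; the numbers in the usage example reuse the rover example's nominal totals (1967 s and 9.75 Ah for the first four actions, 699 s and 2.83 Ah for the NIR loop).

```python
def loop_enabled(elapsed, used_energy,
                 nominal_time_so_far, nominal_energy_so_far,
                 loop_time, loop_energy):
    """Illustrative executive guard for an opportunity loop.

    The loop is taken only when the slack accumulated against the
    conservative (95th percentile) nominalisations covers the loop's
    own nominal requirements, so the main trajectory's guarantees
    continue to hold after the loop returns to the same state.
    """
    time_slack = nominal_time_so_far - elapsed
    energy_slack = nominal_energy_so_far - used_energy
    return time_slack >= loop_time and energy_slack >= loop_energy

# plenty of slack accumulated: take the NIR loop
print(loop_enabled(1100, 5.0, 1967, 9.75, 699, 2.83))  # True
# too little time slack left: stay on the main trajectory
print(loop_enabled(1500, 8.0, 1967, 9.75, 699, 2.83))  # False
```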

References

Bertoli, P.; Cimatti, A.; Roveri, M.; and Traverso, P. 2001. Planning in non-deterministic domains under partial observability via symbolic model-checking. In Proceedings of the 17th International Joint Conference on AI.

Bonet, B., and Geffner, H. 2000. Planning with incomplete information as heuristic search in belief space. In Proceedings of the 5th International Conference on AI Planning and Scheduling.

Boutilier, C.; Dean, T.; and Hanks, S. 1999. Decision theoretic planning: structural assumptions and computational leverage. Journal of AI Research 11:1–94.

Bresina, J.; Dearden, R.; Meuleau, N.; Smith, D.; and Washington, R. 2002. Planning under continuous time and resource uncertainty: A challenge for AI. In Proceedings of the AIPS'02 Workshop on Temporal Planning.

Cimatti, A., and Roveri, M. 2000. Conformant planning via symbolic model checking. Journal of AI Research 13:305–338.

Dearden, R., and Boutilier, C. 1997. Abstraction and approximate decision-theoretic planning. Artificial Intelligence 89(1–2):219–283.

Drummond, M.; Bresina, J.; and Swanson, K. 1994. Just-in-case scheduling. In Proceedings of the 12th National Conference on AI.

Ellsberg, D. 1961. Risk, ambiguity and the Savage axioms. Quarterly Journal of Economics.

Fox, M., and Long, D. 2002. PDDL2.1: A planning domain description language for modelling temporal and metric domains. Technical Report Computer Science no. 02/02, University of Durham, UK.

Knight, F. 1921. Risk, Uncertainty and Profit. Hart, Schaffner and Marx, Houghton Mifflin Company.

Majercik, S., and Littman, M. 1999. Contingent planning under uncertainty via stochastic satisfiability. In Proceedings of the 16th National Conference on AI.

Onder, N., and Pollack, M. 1999. Conditional probabilistic planning: a unifying algorithm and effective search control mechanisms. In Proceedings of the 16th National Conference on AI.

Pryor, L., and Collins, G. 1996. Planning for contingencies: a decision-based approach. Journal of AI Research 4:287–339.

Rintanen, J. 1999. Constructing conditional plans by a theorem prover. Journal of AI Research 10:323–352.

Shafer, G. 1986. Savage revisited. Statistical Science.