ADVERSARIAL REASONING UNDER UNCERTAINTY USING A DETERMINISTIC PLANNER

Paper ID# 900488

Mike Howard and Eric Huang, HRL Laboratories, LLC, Malibu, CA
Bradford W. Miller, Raytheon Integrated Defense Systems, Portsmouth, RI

September 28, 2009


ABSTRACT

We have added a restricted subset of probabilistic reasoning to our deterministic adversarial reasoning technique, RAPSODI. This hybrid approach strikes a balance between the intractable complexity of computing a fully probabilistic representation and the inadequate expressiveness of a simpler deterministic representation. After a brief review of the basic adversarial planning algorithm, we describe how the improvements handle probabilistic actions, multiple possible worlds, and probabilistic contingencies.

INTRODUCTION

We approach adversarial reasoning from a planning perspective, as an iterative search for alternative ways to achieve a set of objectives in the context of an adversary intent on subverting our goals. This is situation management at level 2 (situation refinement) and especially level 3 (estimates and COA selection) in the JDL system. Real-world problems include uncertainty due to factors like an inability to completely sense the world (partial observability) and the possibility of action failures. These problems can often be modeled as Partially Observable Markov Decision Processes (POMDPs), but the state space grows exponentially with the number of state variables. The literature is filled with strategies for dealing with the computational complexity of real-world planning problems (Cassandra 1998). In adversarial reasoning, one or more agents are added to the problem, making it much more difficult.

The contributions of this paper include a way to factor an otherwise intractable probabilistic planning problem into a set of deterministic sub-problems that are easier to plan, and then assemble the results into a solution to the original probabilistic problem. We handle some of the probabilistic parts automatically without losing responsiveness, and still keep the user in the loop to direct and focus the planning. We briefly review the original, fully deterministic version of RAPSODI before describing how it has been modified to offload some of the probability reasoning from the user.

BACKGROUND

RAPSODI consists of two modules (Figure 1): a multi-agent plan-critic reasoner, GameMaster, and a fast single-agent planner. GameMaster reduces a multi-agent planning problem to a series of single-agent planning sub-problems. GameMaster is relatively planner-agnostic; the only requirement is that the underlying planner support a defined interface. We adapted the LPG planner (Gerevini 2003) with such a messaging interface. This single-agent planner provides a plan service that can be run on any machine, and GameMaster may connect to more than one instance of the planner at a time in order to process different parts of a problem in parallel.

(Howard et al 2007) presented our adversarial reasoner RAPSODI (Rapid Adversarial Planning with Strategic Opponent-Driven Intelligence). RAPSODI concentrated on analysis of deterministic adversarial problems: the state of the world is known and actions have deterministic effects. Uncertainty was handed off to a user in the loop, who could selectively explore different possibilities during planning. This approach achieves a very fast response on moderately complex adversarial problems, presenting the user with a “what-if” sort of mission planning tool.

We approach adversarial reasoning as a competition between the plans of two or more opponents, where the plans for adversaries are based on our best model of their capabilities, assets, and intents. GameMaster embodies an iterative plan-critic process that finds specific conflicts between the single-agent plans and adds contingency branches to repair the conflicts in favor of one of the agents.

Since then we found that certain probabilistic considerations could be incorporated into the method in a tractable way without losing much of the responsiveness of deterministic planning. This hybrid of deterministic planning and decision-theoretic reasoning strikes a balance between the complexity of a more accurate decision-theoretic representation and the responsiveness of a less expressive deterministic representation. The new RAPSODI can now automatically handle uncertain initial conditions, probabilistic action effects, and partial observability.


[Figure 1 diagram: the Hybrid Reasoner (GameMaster) selects two plans from the multi-agent Possible World Plan Store and checks them for conflicts; if a conflict is found, it extracts a deterministic conflict-resolution planning problem for the Deterministic Planning Engine(s), converts the resulting deterministic partial plan into a hybrid partial plan, splices it in to resolve the conflict, writes the updated plans back to the Plan Store, and looks for further conflicts, with the User in the loop.]

Figure 1. All probabilistic considerations are built into the hybrid multi-agent reasoner (GameMaster), which constructs deterministic single-agent planning tasks for the plan servers. Initially it plans one plan for each agent in each possible world, without considering any other agent. After the initial plans are formed, it begins an iterative process of comparing the plans of different agents, looking for conflicts. Each conflict is addressed by planning a contingency branch that avoids or prevents the conflict. The hybrid reasoner extracts deterministic planning tasks from the hybrid deterministic/probabilistic contingency plans; these are planned by the deterministic plan servers, and the results are converted back into hybrid plans and spliced into the original plans as contingencies.

The user directs and focuses the search by choosing which conflicts the system should address, and which computed resolutions are most satisfactory. With each iteration, the plans are expanded to handle more possible conflicts with other agents. It is interesting to note that if you consider all of the possible resolutions of the first conflict in time that you see, and enumerate all of these branches, the result asymptotically approaches the full expansion of a minimax game tree. The iteratively improving "anytime" nature of this design is ideal for a decision support application. More details are provided in (Howard et al 2007).

Definition 1: A conflict on fact f exists between actions a1 and a2 if a1 and a2 overlap in time and [f ∈ preconditions(a1) and ¬f ∈ preconditions(a2)] OR [f ∈ preconditions(a1) and ¬f ∈ effects(a2)].

Definition 2: A resolution r in favor of a1 for a conflict between a1 and a2 is an action such that ∃ f ∈ post(r) with ¬f ∈ preconditions(a2) and ¬f ∉ preconditions(a1).
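As a concrete illustration of Definition 1, the following minimal Python sketch tests two actions for a conflict. The Action record, its time fields, and the fact encoding are illustrative assumptions for this example, not RAPSODI's internal data structures.

from dataclasses import dataclass

def neg(fact):
    # Complement of a fact literal, e.g. "(intact bridge)" <-> "~(intact bridge)"
    return fact[1:] if fact.startswith("~") else "~" + fact

@dataclass
class Action:
    # Hypothetical action record used only for this sketch.
    name: str
    start: float
    end: float
    preconditions: frozenset = frozenset()
    effects: frozenset = frozenset()

def conflicts(a1, a2):
    """Facts f satisfying Definition 1: a1 and a2 overlap in time and f is a
    precondition of a1 while not-f is a precondition or effect of a2."""
    if not (a1.start < a2.end and a2.start < a1.end):   # no temporal overlap
        return set()
    return {f for f in a1.preconditions
            if neg(f) in a2.preconditions or neg(f) in a2.effects}

# Example: RED needs the bridge intact while BLUE destroys it.
red = Action("cross-bridge", 3, 5, preconditions=frozenset({"(intact bridge)"}))
blue = Action("blow-bridge", 4, 6, effects=frozenset({"~(intact bridge)"}))
print(conflicts(red, blue))   # {'(intact bridge)'}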

Adding Uncertainty. The original RAPSODI system was deterministic, so in effect all uncertainty in the real-world problem had to be handled by the user in the loop, who had to sequentially pose different deterministic possibilities. Our new work adds some uncertainty reasoning to the RAPSODI system, while still retaining the benefits of fast deterministic single-agent planning. The user is still in the loop to direct and focus the planning, but now the system can automatically handle uncertainty in the initial state, goals, and action effects. Our changes are all in the GameMaster module, which can now detect probabilistic actions and expand certain of their effects. It can also plan different possible combinations of initial state and goals, and compute the best actions for an agent to take despite a limited ability to observe the world. In particular, this adds the following capabilities to our earlier work:

Multiple Possible Worlds. To handle partial observability and the consequent uncertainty about the initial state of the world, without fully enumerating the possible initial states, we allow the user to specify initial states of interest. RAPSODI will plan each of these "possible worlds", where a possible world is here a permutation of the indicated initial states and goal states, each associated with a likelihood rating assigned by the user. We take the term Possible Worlds from the philosophical concept used to express modal claims (Wikipedia:Possible_Worlds).

Probabilistic Action Effects. In addition to standard deterministic actions, we have defined Probabilistic Effects (PE) actions that can have two or more possible outcomes, each with a likelihood rating, and possibly conditional on propositions about the state. When the deterministic planner encounters a PE action, it considers only the most likely effect, but marks the other possible effects as stubs. Later the meta-planner GameMaster can selectively expand those stubs by asserting them as planning tasks to the planner and splicing the results into the plan. This decision is controlled by user-configurable thresholds and also by the user in the loop if desired.






DETAILS OF THE TECHNIQUE

Figure 2 shows the form of a contingency plan generated by RAPSODI. Different Possible World specifications are shown branching off the root node, each parenting plans for two or more adversarial agents. These plans are very simply illustrated; actions along the same branch are to be executed consecutively, and those on parallel branches can overlap in time. Each action is applied to the set of facts that describes the state in the box just above it, and its effects change some of those facts in the state of the world just below it. For example, action PEAb3 is a Possible Effects action with three possible outcomes.

The plans in Figure 2 are contingency plans with Decision Nodes (shown as diamonds) between contingencies. A decision node is a branch point between two optional courses of action in the plan; the node records a condition that is to be evaluated at mission time to decide which child branch to take. Each child branch is a partial plan: one constructed for the state in which the values of the facts match the branch condition, and one for the state in which they do not. (Howard et al 2007) employed a conservative model of the adversary, assuming he had full observability of the state and would make the decision that would cause us the most harm. This new approach allows us to selectively relax that assumption and admit the possibility that the adversary will not make the best decision.

Possible Worlds. We use a modified version of the standard Planning Domain Definition Language 2.2 (PDDL, Edelkamp 2004) and Non-Deterministic PDDL (NPDDL, Bertoli et al 2003) to define the adversarial problem to the RAPSODI system. The ":init" section in the problem file lists values of facts that characterize the world in the initial state of the planning problem. We have broadened the expressiveness of that section so that it can describe a set of possible initial states, each rated by a probabilistic likelihood. We added an ":init_pw" tag to differentiate the new section for the parser. The tag is followed by a conjunctive list of pairs, where the first item of each pair is of the form (prob <likelihood>), indicating the likelihood of the state. The second item in each pair is a conjunctive list of facts that are true in that possible world's initial state. An example follows:


(:init_pw (and
  ( (prob 0.25) (at R1 X) (at R2 Y) … )
  ( (prob 0.30) (at R1 Z) (at R2 Y) … )
  ( (prob 0.45) (at R1 Z) (at R2 Z) … )
  ( (prob 1.00) (at B Q) … ) ) )


The last possible world, with (prob 1.00), contains facts that are true in every possible world, e.g., agent B always begins at location Q (the "at" fluent takes an agent and a location). In the other possible worlds, whose probabilities add to 1.0, the positions of agents R1 and R2 differ. Facts from the last world are added by the parser to every other possible world; separating them out is a notational convenience. To be clear, the ":init_pw" statement is an abuse of notation and cannot be read as a correct logical statement about probabilities, due to a deficiency in our planner's parser that we have not yet addressed. We consider implicitly that if one of the top-level conjuncts is true, then all the others (except the (prob 1.00) conjunct) are false, so you cannot perform standard probabilistic reasoning over the statement. For instance, in proper notation P(A)=0.5 and P(B)=0.5 would, under independence, give P(A∧B)=0.25, but if we specify this to the planner, it does not interpret the statement that way.
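A minimal sketch of how the shared facts could be folded into each possible initial state, assuming the ":init_pw" entries have already been parsed into (probability, fact-list) pairs; the data layout and function name are assumptions for illustration only.

def expand_init_pw(entries):
    """entries: list of (prob, facts) pairs parsed from an :init_pw section.
    Facts from (prob 1.0) entries hold in every world and are distributed
    into the remaining worlds, whose probabilities should sum to 1."""
    shared = [f for p, facts in entries if p == 1.0 for f in facts]
    worlds = [(p, list(facts) + shared) for p, facts in entries if p != 1.0]
    total = sum(p for p, _ in worlds)
    assert abs(total - 1.0) < 1e-6, "world probabilities should sum to 1"
    return worlds

worlds = expand_init_pw([
    (0.25, ["(at R1 X)", "(at R2 Y)"]),
    (0.30, ["(at R1 Z)", "(at R2 Y)"]),
    (0.45, ["(at R1 Z)", "(at R2 Z)"]),
    (1.00, ["(at B Q)"]),
])
# -> three possible worlds, each also containing "(at B Q)"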


Figure 2. A fragment of a hybrid contingency planning graph, illustrating plans for two adversaries, RED and BLUE, based on the initial state and goals defined in Possible World 2. Ab and Ar nodes are deterministic actions for BLUE and RED respectively. Actions along the same branch are to be executed consecutively. PEA nodes are Possible Effects actions. Boxes with “S” labels represent planning states. Diamonds are Decision Nodes, each containing a set of facts “f”.

Multiple possible goals are described in a similar notation in a ":goal_pw" list, but it includes an agent specifier that allows different goals to be specified for each agent. In our older system we employed an agent-specific goal statement of the form "(:goal )". Here our variant of the PDDL 2.2 problem file defines the goal as a conjunctive list of goals gi, each labeled by a probability pi.


The probability pi is the likelihood that the agent has goals gi. The following example illustrates the format:

(:goal_pw (and (p1 g1) (p2 g2) … (pn gn)))

This would define n possible goals gi (n > 1), which lets the system consider actions in the context of more than one possible adversarial intent. Each gi takes the form of one or more facts that the agent is to make true.

The possible worlds are derived by permuting each possible initial fact list with each possible goal list. Our current implementation plans a few of the worlds with the highest joint probability; the user can get lower-probability worlds planned on request. For each possible world generated, we then know the initial state and goal state for each agent. From a single agent's perspective, this information completely specifies a single-agent planning problem, as in a standard PDDL 2.2 description.
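A sketch of that pairing, under the stated assumption that initial-state and goal likelihoods are independent; the helper name and data layout are hypothetical.

from itertools import product

def enumerate_worlds(init_states, goal_sets, top_k=3):
    """init_states, goal_sets: lists of (prob, payload) pairs.
    Pair every initial state with every goal set, rank by joint probability,
    and return the top_k most likely possible worlds."""
    worlds = [(pi * pg, init, goal)
              for (pi, init), (pg, goal) in product(init_states, goal_sets)]
    worlds.sort(key=lambda w: w[0], reverse=True)
    return worlds[:top_k]   # lower-probability worlds are planned only on request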

The planner performs an optional preplanning step that produces grounded versions of facts and actions for every possible binding of values to facts in the domain, and then a reachability analysis throws out those that are impossible. This is an important speedup (Bryce 2007). Since each possible world's initial state has some number of initial facts with different values, and some objects may exist only in a subset of possible worlds, reachability in some worlds differs from others. We do this search efficiently by performing a standard reachability analysis on a merged initial state, as follows:

A fact is reachable if there is some sequence of actions that can be applied from the initial state to make the fact true, without applying the actions' delete lists. To compute this, we start with the set of facts in the initial state. We then find actions whose preconditions are satisfied by the facts in this set, and add the actions' add lists to the set. This is repeated until no more new facts can be added. Notice that in this model the set may contain both a fact and its complement: with a movement operator that adds the new location, repeated applications might result in the unit being AT every location, but also NOT AT every location as well. In this way, reachability is the union of the facts over all reachable states. In the end, this analysis allows us to remove from the search anything that cannot be produced (i.e., unreachable grounded actions and facts). Note that the converse is not true: if a fact is found to be reachable in some world, it is not necessarily reachable in every possible world. The planning process will discover that.
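The relaxed reachability computation just described can be sketched in a few lines of Python; the action fields (.preconditions, .add_list) are illustrative assumptions, and the merged initial state is simply the union of the initial facts of all possible worlds.

def reachable_facts(initial_facts, actions):
    """Delete-relaxation reachability: repeatedly apply any action whose
    preconditions are already reachable, adding its add list (delete lists
    are ignored) until no new facts appear."""
    reached = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for a in actions:
            if a.preconditions <= reached and not a.add_list <= reached:
                reached |= a.add_list
                changed = True
    return reached

# Grounded actions and facts that never become reachable here can be pruned
# from the deterministic planning sub-problems.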

Possible Action Effects. The standard PDDL 2.2 format (Edelkamp 2004) only allows a single set of deterministic action effects to be declared. Some of those effects may be conditional on propositions about the state, e.g. (when <condition> e-list), where e-list is a list of one or more effects. We adopt a similar problem input format, along with a feature of the probabilistic version of PDDL (Younes 2005) that allows a probabilistic e-list denoted (probabilistic p1 e1 … pn en), where pi is the probability of effect ei, and we further constrain it by a when predicate.

As an example of this, consider the following sensing action specification illustrating conditional probabilistic effects:

(:action PEA1
  :parameters …
  :effect (and f5
               (when x (probabilistic 0.50 f6 0.25 f7))
               (when y (probabilistic 0.20 (and f6 f7)))))

Here x and y are predicates like (weather sunny).

f5, f6 and f7 are facts, the types of which are given in the :parameters line. According to this specification, action PEA1 can be represented by one node with four outcomes across three weather configurations (see Table 1).

Table 1. Action PEA1 effect likelihoods under each predicate configuration.

Outcome            x      y      ¬x ∧ ¬y
f5                 0.25   0.80   1.00
(and f5 f6)        0.50   0      0
(and f5 f7)        0.25   0      0
(and f5 f6 f7)     0      0.20   0

As a prelude to the reachability analysis described above, every action like PEA1 with multiple effects must be turned into multiple deterministic actions, each with a single effect. Basically, one deterministic action is instantiated for each possible effect of the PE action. Then GameMaster composes deterministic planning sub-problems as illustrated in Figure 3.
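A minimal sketch of that split, assuming the per-outcome distribution (such as a column of Table 1 selected by the current weather beliefs) has already been computed; the function and field names are hypothetical.

def split_pe_action(name, outcome_dist, threshold=0.5):
    """Compile one Probabilistic Effects (PE) action into deterministic
    variants, one per possible outcome, keeping only outcomes whose
    likelihood meets the threshold; the rest become stubs for later expansion.
    outcome_dist: list of (likelihood, effect_list) pairs."""
    kept, stubs = [], []
    for i, (likelihood, effects) in enumerate(outcome_dist):
        variant = {"name": f"{name}__o{i}", "effects": effects,
                   "likelihood": likelihood}
        (kept if likelihood >= threshold else stubs).append(variant)
    return kept, stubs

# With y true for PEA1 (Table 1) and a 0.5 threshold:
kept, stubs = split_pe_action("PEA1", [(0.8, ["f5"]), (0.2, ["f5", "f6", "f7"])])
# kept  -> the (f5) outcome, likelihood 0.8
# stubs -> the (f5 f6 f7) outcome, likelihood 0.2, expanded later on request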

[Figure 3 flow, summarized: translate each PE action into a set of deterministic actions (needed only once for a hybrid problem); as an optional speedup, run a reachability analysis on the union of all facts in all possible worlds (PW); compute the likelihood of any predicate in a when clause of a PE action; compose a deterministic planning problem supplying only the deterministic versions of PE actions that exceed the probability threshold and send it to the deterministic planner; when the deterministic partial plan comes back, find the splice point in the hybrid contingency plan, create a Decision node there, and compute the decision-branch likelihoods (Fig. 4, detail 1); for any PE actions in the deterministic partial plan, compute the likelihood of the chosen effect and insert a stub for every other effect, each labeled with its likelihood (Fig. 4, detail 2); while more stubbed-out PE effects are to be planned, set the initial state of the next planning problem to the state after applying the stubbed-out effect and repeat.]

Figure 3. Flow Diagram for making a deterministic planning sub-problem from a hybrid problem.


The sub-problem definition includes only the reachable deterministic actions. The when clauses of PE actions are evaluated, and only those deterministic actions representing PE effects that exceed a likelihood threshold are considered reachable. For example, if the weather parameter y were true for PEA1 in Table 1 and the probability threshold were 0.5, only the grounded deterministic versions with outcome effect f5 would be considered reachable.

When the planning service returns a partial plan containing a PEA, GameMaster splices the partial plan into the contingency plan and annotates each possible effect of each PEA with the likelihoods of the alternative outcomes. For PEA1 with y true, the f5 effect would be given a likelihood of 0.8, and since the planner only plans the single most likely effect, a stub is created for the (and f5 f6 f7) effect, labeled with 0.2 likelihood.

A user-configurable parameter limits the number of alternative outcomes that are explored. The user may have requested that no more than the n most likely possible outcomes of PEAs be explored, or a likelihood threshold may have been established. Likelihood of outcomes is discussed in the next section. For each alternative outcome that satisfies the threshold, GameMaster composes a planning task and asserts it back to the planner, requesting a plan from the state after applying the effects of the action to the original set of goals. The returned partial plan is spliced into the agent's plan as one of the outcomes of the action.

Partial Observability and Likelihood. Within each possible world, as mentioned previously, RAPSODI's iterative plan-critic technique illustrated in Figure 1 produces a contingency plan, with partial plans branching at decision nodes to handle conflicts with other agents, as illustrated in Figure 2. Our previous paper described how to identify which facts the user should use to decide which branch of a decision node to take, but no likelihood computations were performed to advise the user which branch of a decision node is more likely based on current beliefs. The procedure was entirely deterministic, and performed in the context of a fully observable world.

In this new version of RAPSODI, we build on that same plan-critic adversarial reasoning process to identify the most important states of the infinite number of states possible in a fully observable world. The new capability is to compute likelihoods of each branch node based on beliefs about the values of relevant propositions and variables. It should be noted that this work is specifically designed to work with a decision support system. Beliefs might change pre-mission as a user explores a what-if during a decision support session, or later during execution of the plan as the true values of certain variables are discovered. The reasoner would then recalculate those likelihoods and the changes would ripple down the contingency plan. Optionally, certain low-likelihood branches can then be pruned and new branches planned as necessary. The probabilistic update should be performed on a contingency plan any time the plan has been modified by adding a new possible world, a new probabilistic action, or a new branch at a decision node. Likelihoods on each branch in the contingency plan are based on the likelihood of certain facts that represent the values of certain beliefs.

What is required is to:
1) support the user's decision support process by showing how certain beliefs affect plans;
2) compute the likeliest path through the tree, e.g.:
   a) which PEA effects are likeliest,
   b) which way a decision node will branch;

3) while executing the plan, figure out which state in which world is the best match with our observations and beliefs (required because of partial observability).

Item 1 is accomplished by item 2. Item 2a depends on the predicates of any "when" clause of the PE action, and can be supplied by a belief network; multiplying that by the effect probabilities gives the likelihoods of each branch. To decide 2b, we need the joint probability of each fact in the condition of the decision node. This must be computed over any agent's plan that contains actions with any of those facts in their effects.

To decide 3, we compile our observations and beliefs into a state vector with likelihood ratings, and match that against every state in every world in the contingency plan tree for our agent. We do not know our current state with certainty, and at some point we may get evidence that changes those beliefs, indicating that the world is in a state better described by one of the other possible worlds. We have not yet dealt explicitly with sensing actions, but we do provide an inference system that the user can interact with to introduce new evidence.
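The matching rule itself is not spelled out here; the following Python sketch shows one simple possibility, scoring each candidate state by how well its facts agree with the belief vector and weighting by the world's likelihood. The scoring rule and data layout are illustrative assumptions only.

def match_score(state_facts, belief):
    """Score how well a candidate state matches current beliefs.
    state_facts: set of facts asserted true in that state.
    belief: dict mapping fact -> likelihood rating from observations.
    This particular scoring rule is an assumption for illustration."""
    score = 1.0
    for fact, p in belief.items():
        score *= p if fact in state_facts else (1.0 - p)
    return score

def best_matching_state(worlds, belief):
    """worlds: dict world_id -> (world_prob, list of candidate state fact-sets).
    Returns the (score, world_id, state) triple that best explains the observations."""
    best = None
    for wid, (p_world, states) in worlds.items():
        for s in states:
            candidate = (p_world * match_score(s, belief), wid, s)
            if best is None or candidate[0] > best[0]:
                best = candidate
    return best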

Belief States and Likelihood Propagation through the Contingency Plan Tree. Belief states govern the a priori likelihoods of each possible world and the likelihoods of predicates used in conditional ("when") clauses of possible effects actions (PEAs). A probabilistic belief network is used to evaluate a belief state subject to evidence, and can be implemented by a network such as a Bayesian network or a Dempster-Shafer network. There are only three types of nodes in a contingency plan tree, and each may affect fact likelihoods differently. Users will need to decide how to propagate likelihoods based on decisions about how independent the probabilities are; therefore, it is best if the user defines the propagation of likelihoods at the same time actions are defined. But these are the principles for each type of node:

Deterministic action node: Fact likelihoods are not altered by a deterministic action. This action is part of a plan that is based on certain beliefs about the world, but if we assume that we are in that world, a deterministic action will surely be taken.

Probabilistic action node: The likelihoods of each branch of a PEA are conditioned on our belief in the conditional predicate for each possible outcome, as illustrated by the discussion around Table 1 above. If the probabilities are independent, the likelihood of the state of the preconditions of a PEA is multiplied by the likelihood of the conditional predicate governing each particular outcome branch, and by the static likelihood of each effect given in the problem definition. This computation is carried out for each fact mentioned in each outcome branch of the PEA. The probability P(x), the belief in a predicate x, must either be supplied by the user or derived from a belief network that incorporates evidence from the user and other sources.
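A small sketch of that product, assuming independence as stated above; the belief() helper stands in for whatever belief network supplies P(x), and the names are hypothetical.

def pea_fact_likelihoods(p_state, branches, belief):
    """Label each fact on each outcome branch of a probabilistic action node.
    branches: list of (when_predicate, effect_prob, fact_list).
    belief(pred): current belief that the conditional predicate holds.
    Assumes the three probabilities are independent."""
    labeled = []
    for when, p_effect, facts in branches:
        p = p_state * belief(when) * p_effect   # joint likelihood of this branch
        labeled.append((p, {f: p for f in facts}))
    return labeled

# Example: state likelihood 0.45, belief in (weather sunny) of 0.9, static
# effect probability 0.2 gives a branch likelihood of 0.45 * 0.9 * 0.2 = 0.081.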


Decision node: As mentioned in the introduction, decision nodes are mainly created to splice in a resolution partial plan to resolve a conflict with another agent. The branching likelihood of a decision node is equal to the joint probability that the facts in the condition are true. It is very possible that other agents in the same possible world can affect the value of one or more of those facts. Actually, in order to fully admit the possibility that we do not know for sure which possible world we are in, we need to look at the Boolean value of facts across all possible worlds in which agents can affect those facts. To do this we apply a multi-agent simulation in each possible world in temporal lock-step, conditioning the fact likelihood in each world at the time of a branch by the likelihood of the world itself.

Likelihood updating is done by simulating forward the plans of each player in each possible world, starting from the root, maintaining a running likelihood rating on each fact involved in a branch. The simulation starts in each world by loading the initial conditions, applying the earliest action, and updating the world by interleaving actions from each plan, propagating likelihoods across nodes as illustrated in Figure 4.

Note that startQ is a queue that automatically keeps its contents sorted so that the action that starts at the earliest time is the one that will be removed next. Likewise, endQ contents stay sorted by end time, so the action with the earliest end time will be removed next. Basically, we pull all the actions that start at the same time off the start queue and put them in the end queue, replacing each by its successor. The actions in the end queue are then processed in order of their end times, earliest first. Processing depends on the type of node. Deterministic actions are straightforward. Possible effects actions conditional on a when clause are first evaluated in a belief net for the likelihood that the when predicate is true. Then any fact in the effects of any outcome of the action gets labeled with the joint probability of P(outcome), P(when), and the probability of the parent state.
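A sketch of that queue discipline for one possible world, using Python's heapq; the action fields (.start, .end, .successors) and the exact interleaving of start and end events are assumptions, since the text leaves those details open. The node-type-specific likelihood updates and the cross-world maximum of Figure 4 are left to the process() callback.

import heapq
from itertools import count

def simulate_plans(first_actions, process):
    """Forward-simulate one possible world's plans in temporal order.
    first_actions: the first node of every agent's plan in this world.
    process(action): update fact likelihoods according to the node type."""
    tie = count()                       # tie-breaker so the heaps never compare actions
    startQ = [(a.start, next(tie), a) for a in first_actions]
    heapq.heapify(startQ)               # ordered by start time
    endQ = []                           # ordered by end time
    while startQ or endQ:
        if startQ:
            t = startQ[0][0]
            # Move all actions with the earliest start time to the end queue,
            # replacing each by its successors on the start queue.
            while startQ and startQ[0][0] == t:
                _, _, a = heapq.heappop(startQ)
                heapq.heappush(endQ, (a.end, next(tie), a))
                for s in a.successors:
                    heapq.heappush(startQ, (s.start, next(tie), s))
        # Process end-queue entries in order of earliest end time.
        while endQ and (not startQ or endQ[0][0] <= startQ[0][0]):
            _, _, a = heapq.heappop(endQ)
            process(a)   # deterministic, probabilistic-effects, or decision node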

The simulation proceeds across all worlds one timestep at a time. When a Decision node is encountered, all actions that can affect the facts in the node's condition should already have been processed. We take the maximum likelihood of each fact across every possible-world simulation, conditioned by the likelihood of the world itself, and use that as the likelihood of the fact in the Decision node conditional.
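A one-function sketch of that cross-world combination; the field names and data layout are illustrative assumptions.

def decision_fact_likelihood(fact, world_sims):
    """world_sims: list of (world_likelihood, fact_likelihoods) pairs, where
    fact_likelihoods maps each fact to its running likelihood in that world
    at the time the decision node is reached."""
    return max(p_world * facts.get(fact, 0.0)
               for p_world, facts in world_sims)

print(decision_fact_likelihood("(intact bridge)",
      [(0.45, {"(intact bridge)": 0.1}), (0.55, {"(intact bridge)": 0.2})]))
# -> 0.11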




[Figure 4 flow, summarized: for each possible world (PW) in parallel, progressing each world forward in time in lock step, initialize startQueue with the first node in every plan; move all actions with the earliest start time from startQueue to endQueue, putting their successors into startQueue; take the action with the earliest end time off endQueue and process it by node type (apply a deterministic action with no change to probabilities, apply a conditional action, or branch on a decision node); update the likelihoods of facts in decision nodes for the current timestep to the maximum likelihood in any PW; repeat until both queues are empty.]

Figure 4. Likelihood update for contingency plans.


The "Branch on Decision Node" box in Figure 4 is a bit misleading. In effect it is a depth-first search down each branch of the tree: the startQ and endQ state are saved when a decision node is encountered and restored to that point when each branch is searched. The idea is that a user evaluating the contingency plan returned by the system would want to know the likelihood of a fact in the plan, across all possible evolutions of the world, taking into account which branch of his own plan he is on. For example, if the plan is to cross a bridge, there might be a decision node that branches on whether or not that bridge is intact. If we compute that, across all possible worlds, it is 90% likely that the bridge gets blown up, there will still be a plan branch with a 10% probability that the bridge does not get blown up. Recall that the plan for this 10% likelihood was computed deterministically with the bridge intact. But when computing the likelihood of the "(bridge intact)" fact after this decision point, we want to tell the user that although this branch has us marching across the bridge, the bridge is only 10% likely to be intact.

CONCLUSION

We have described a method to strike a compromise between the intractable complexity of a full-blown probabilistic adversarial reasoning approach and the inadequate expressiveness of a purely deterministic approach. We have recently implemented this in our adversarial reasoner, RAPSODI. The technique factors all adversarial reasoning and probabilistic considerations into a meta-reasoner that constructs single-agent planning tasks for an underlying deterministic planner. The main requirement on the planner is that it support the API that turns it into a plan server that can accept planning tasks and return a plan in the right form. RAPSODI was designed as a user-in-the-loop decision support system, so quick, reasonable results are better than slow, optimal results. The user can then question the plans generated and the belief computations, and modify them in an iterative refinement process.

The game of contract bridge is similar in some ways to our domain. (Smith et al 1998) employed Hierarchical Task Network planning very successfully, and their paper does a very good job of comparing alternative approaches to that domain. Like our approach, they use belief functions to compute likelihoods on facts. They also focus their search on the most likely subspaces. Theirs is a totally automated planner, unlike ours, which is a more brainstorming, user-in-the-loop approach. And in HTN planning, the tactics are built into the action specification by the user. Our approach uses a simple low-level STRIPS-like action specification, and our planner discovers strategies on its own, possibly ones that the user did not think of.


In general, our deterministic/probabilistic hybrid approach offers a speedup on the probabilistic reasoning that other planning approaches have not considered. We believe this approach is generally applicable to a wide variety of planners.

REFERENCES

P. Bertoli, A. Cimatti, U. Dal Lago, and M. Pistore. 2003. "Extending PDDL to nondeterminism, limited sensing and iterative conditional plans," in Proceedings of the International Conference on Planning and Scheduling.

C. Boutilier, T. Dean, and S. Hanks. 1999. "Decision Theoretic Planning: Structural Assumptions and Computational Leverage," Journal of AI Research (JAIR).

D. Bryce and S. Kambhampati. 2007. "A Tutorial on Planning Graph-Based Reachability Heuristics," AI Magazine, 28(1).

Anthony Cassandra. 1998. "A Survey of POMDP Applications," AAAI Fall Symposium.

S. Edelkamp and J. Hoffmann. 2004. "PDDL2.2: The Language for the Classical Part of the 4th International Planning Competition," Technical Report No. 195, Fachbereich Informatik, Institut für Informatik, Germany.

Alfonso Gerevini, Alessandro Saetti, and Ivan Serina. 2003. "Planning through Stochastic Local Search and Temporal Action Graphs," Journal of Artificial Intelligence Research (JAIR), vol. 20, pp. 239-290.

M. Howard, E. Huang, K. Leung, and P. Tinker. 2007. "RAPSODI Adversarial Reasoning," in 2007 IEEE Aerospace Conference, Big Sky, Montana.

S. Smith, D. Nau, and T. Throop. 1998. "Success in Spades: Using AI Planning Techniques to Win the World Championship of Computer Bridge," IAAI-98/AAAI-98 Proceedings, pp. 1079-1086.

H. Younes, M. Littman, D. Weissman, and J. Asmuth. 2005. "The First Probabilistic Track of the International Planning Competition," Journal of AI Research, 24.

Wikipedia: Possible world. http://en.wikipedia.org/wiki/Possible_world

