Symbolic Dynamic Programming within the Fluent Calculus

Axel Großmann

Steffen Hölldobler

Olga Skvortsova

Artificial Intelligence Institute, Department of Computer Science, Technische Universität Dresden, Dresden, Germany

Abstract

A symbolic dynamic programming approach for modelling first-order Markov decision processes within the fluent calculus is given. Based on an idea initially presented in [3], the major components of Markov decision processes, such as the optimal value function and a policy, are represented logically. The technique produces a set of first-order formulae with equality that minimally partitions the state space. Consequently, the symbolic dynamic programming algorithm presented here does not need to enumerate the state and action spaces, thereby removing a drawback of classical dynamic programming methods. In addition, we illustrate how conditional actions and specificity can be modelled by the approach.

Keywords: Knowledge Representation, Dynamic Programming.

1 Introduction

Reasoning under uncertainty is a major issue in the study of sequential decision problems and has been addressed by scientists in many different fields, including AI planning, decision analysis, operations research, control theory, and economics [2, 5]. A large class of planning problems in these areas can be modelled as Markov decision processes. In this case, dynamic programming (DP) methods can be used for decision-theoretic planning, i.e. the choices left to the agent are made by maximizing expected utility. An advantage of DP methods is their capability to deal with stochastic actions and incomplete world knowledge. However, classical dynamic programming algorithms do not scale up to large state and action spaces.

In this paper, we address the scalability problem mentioned above by proposing a fluent calculus formalization of the dynamic programming paradigm. We develop a symbolic description of the value iteration algorithm [1] by representing stochastic actions, value functions, and the optimality criterion for choosing an optimal value function within the fluent calculus. Our approach performs a minimal partition of the state space and associates with each obtained partition, or 'abstract state', a utility value. Consequently, our symbolic dynamic programming algorithm avoids the explicit enumeration of states and actions. (We believe, however, that it is possible to construct a worst-case example which would require considering all states, as in classical DP algorithms such as the one presented in Section 2, but we claim that such examples hardly ever arise in practice.) By using a logical description of states, we do not need to enumerate domain individuals, i.e. we avoid domain grounding. The latter is very impractical for realistic planning domains, which may even be infinite in some cases.

The paper is organized as follows. In Section 2, we describe the classical value iteration algorithm, an instance of a DP method. In Section 3, we introduce the version of the fluent calculus which is integrated with the value iteration algorithm in Section 4. Finally, we draw conclusions and give directions for future work in Section 5.

2 Classical DP

A Markov decision process, or MDP for short, is a tuple (Z, A, P, R), where Z is a finite set of states, A is a finite set of actions, and P : Z × A → Π(Z) is a state transition function, Π(Z) denoting the set of probability distributions over Z. P(z, a, z') denotes the probability with which state z' is reached by executing action a in state z. R : Z → ℝ is a reward function associating with each state z its immediate utility R(z). Throughout the paper, we consider only fully observable MDPs, i.e. the agent fully observes the state it is currently in, and only discrete state and action spaces.

A sequential decision problem consists of an MDP and is the problem of finding a policy π : Z → A that maximizes the total discounted and accumulated reward received when executing the policy. Such a policy is called optimal. The value of a state z with respect to a policy π is defined recursively as

    Vπ(z) = R(z) + γ · Σ_{z'∈Z} P(z, π(z), z') · Vπ(z'),

where 0 ≤ γ < 1 is a discount factor. Finding an optimal policy corresponds to finding an optimal value function V*, i.e. a value function which asserts that the value of a state z is the expected instantaneous reward plus the expected discounted value of the next state when executing the best available action:

    V*(z) = max_{a∈A} { R(z) + γ · Σ_{z'∈Z} P(z, a, z') · V*(z') }.

The value iteration algorithm in Figure 1 can be used to approximate an optimal value function arbitrarily well.

    n := 0. Specify ε > 0.
    Initialize V0(z) arbitrarily, e.g. V0(z) := R(z), for all z ∈ Z.
    loop
        n := n + 1.
        loop for all z ∈ Z
            loop for all a ∈ A
                Qn(a, z) := R(z) + γ · Σ_{z'∈Z} P(z, a, z') · Vn−1(z').
            end loop
            Vn(z) := max_{a∈A} Qn(a, z).
        end loop
    until |Vn(z) − Vn−1(z)| < ε for all z ∈ Z.

    Figure 1. The classical value iteration algorithm.

The function Qn(a, z) used in this algorithm is called the n-stage-to-go Q-function and denotes the expected value of executing action a in state z with n steps to go and acting optimally thereafter. Once an optimal value function has been computed, an optimal policy π* to be executed in state z can be obtained as

    π*(z) = arg max_{a∈A} { R(z) + γ · Σ_{z'∈Z} P(z, a, z') · V*(z') }.
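To make the procedure of Figure 1 concrete, here is a minimal Python sketch of classical value iteration for an explicitly enumerated MDP. The two states, three actions, transition probabilities, and rewards are invented purely for illustration (loosely inspired by the logistics example used later in the paper); they are not part of the formalization above.

```python
# A minimal sketch of the classical value iteration algorithm of Figure 1.
GAMMA = 0.9
EPSILON = 1e-3

states = ["in_mc", "not_in_mc"]          # hypothetical enumerated states
actions = ["unload", "load", "drive"]    # hypothetical enumerated actions

# P[(z, a)] maps successor states z' to the probability P(z, a, z').
P = {
    ("not_in_mc", "unload"): {"in_mc": 0.9, "not_in_mc": 0.1},
    ("not_in_mc", "load"):   {"not_in_mc": 1.0},
    ("not_in_mc", "drive"):  {"not_in_mc": 1.0},
    ("in_mc", "unload"):     {"in_mc": 1.0},
    ("in_mc", "load"):       {"in_mc": 1.0},
    ("in_mc", "drive"):      {"in_mc": 1.0},
}
R = {"in_mc": 10.0, "not_in_mc": 0.0}

V = {z: R[z] for z in states}            # V_0(z) := R(z)
while True:
    # Q_n(a, z) := R(z) + gamma * sum_{z'} P(z, a, z') * V_{n-1}(z')
    Q = {(a, z): R[z] + GAMMA * sum(p * V[z2] for z2, p in P[(z, a)].items())
         for z in states for a in actions}
    V_new = {z: max(Q[(a, z)] for a in actions) for z in states}
    if all(abs(V_new[z] - V[z]) < EPSILON for z in states):
        V = V_new
        break
    V = V_new

# Greedy policy extracted from the last Q-function.
policy = {z: max(actions, key=lambda a: Q[(a, z)]) for z in states}
print(V, policy)
```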

3 The Fluent Calculus

The fluent calculus (FC), much like the situation calculus, provides a methodology for specifying and reasoning about states, actions, and causality. It was originally set up as a first-order logic program with equality using SLDE- or SLDENF-resolution as the sole inference rule [6, 7]. In the meantime, the FC has been revised into a predicate logic specification language using constraint handling rules for reasoning [8, 9]. Whereas the original version allowed for backward as well as forward reasoning, the revised version was designed as a forward reasoning approach. In forward as well as in backward reasoning only reachable states are considered. This is in contrast to the DP approach described in the previous section, which always considers all states. On the other hand, the DP approach is a backward reasoning system. Thus, in order to combine FC and DP we have three possibilities: (1) use the original version of the FC, (2) use the revised version of the FC and specify a regression operator in this version, or (3) use the revised version of the FC and revise DP such that it reasons forwards. To our knowledge, there is no efficient forward dynamic programming algorithm; there always has to be a back-up at some point to do any form of prediction. Looking at the revised version of the FC, it is not immediately obvious how a regression operator could be formalized without a major change to the calculus. Because this paper is mainly a case study of whether it is beneficial to combine DP and reasoning in the FC, we opt for the first possibility. In this case we do not have to change existing systems but can rather concentrate on finding an FC formalization of the value iteration algorithm presented in the previous section. Before doing so, however, the original version of the FC is briefly sketched.

In the FC, a fluent is a first-order term and a state is a multi-set of fluents, represented as a term using the constant 1, denoting the empty multi-set, and a binary AC1-function symbol ◦, denoting multi-set union. For example, a state in which the racing cars Ferrari (f) and BMW (b) are in Monte Carlo (m) and it is raining (rain) is represented by

    rin(f, m) ◦ rin(b, m) ◦ rain,

where the fluent rin(R, C) states the presence of a racing car R in a city C. (Please note that f, b, and m are constants; rain/0 and rin/2 are fluents.) An action is represented using a predicate symbol action/3, whose arguments denote the preconditions, the name, and the effects of an action, respectively. As an example, consider the successful unloading of a car from a truck: if a truck is in a certain city and a car is loaded on this truck, then after successfully unloading the truck, both the car and the truck will be in this city:

    action(on(R, T) ◦ tin(T, C), unloadS(R, T), rin(R, C) ◦ tin(T, C)).

One may argue that tin(T, C) is a precondition of this action which remains unchanged when the action is executed, and is not an effect. For technical reasons, however, such preconditions must also be specified as effects in the original FC; this was one of the reasons for revising the FC. On the other hand, a user will ideally not see an FC specification but will rather specify actions in a much higher-level language such as, for example, a restricted subset of English; sentences in such a language will be compiled into FC expressions. We believe that in real-world applications there are only few preconditions which remain unchanged when an action is executed and, consequently, that the technical necessity to repeat these preconditions as effects is not a considerable drawback.

The action unloadS(R, T) is a deterministic primitive, referred to as nature's choice, of the action unload(R, T). The technique of decomposing stochastic actions into nature's choices is presented in Section 4. Please note that all formulae are supposed to be universally closed.

Causality is represented using a predicate symbol causes/3, whose arguments denote a state, a sequence of actions, and a successor state, respectively. Intuitively, an atom such as causes(Z, P, Z') is to be understood as: the execution of a plan P transforms a state Z into a state Z'. The predicate causes/3 is defined recursively on the structure of plans, which are lists of actions:

    causes(Z, [ ], Z') ← Z =AC1 Z'.
    causes(Z, [A|P], Z') ← action(C, A, E) ∧ Z' =AC1 Z'' ◦ E ∧ causes(Z, P, Z'' ◦ C).

These two formulae, together with the formulae representing actions as well as the equational theory for the AC1-function symbol ◦/2, specify a backward reasoning system. In particular, it is possible to regress a certain state. Formally, the FC theory comprises the equational axioms for ◦/2 and the definitions of action/3 and causes/3.

    rin(R, C)      racing car R is in city C
    tin(T, C)      truck T is in city C
    on(R, T)       racing car R is on truck T
    rain           it is raining

    unload(R, T)   racing car R is unloaded from truck T
    load(R, T)     racing car R is loaded onto truck T
    drive(T, C)    truck T is driven to city C

    Table 1. The fluents and actions of a logistics example.

Throughout the paper, we are going to use an example from logistics with trucks, racing cars, and cities. Trucks are driven between cities; racing cars are loaded onto and unloaded from trucks. The scenario can be described using the fluents and actions shown in Table 1.

Suppose we would like to know what a state must look like such that, after the successful unloading of the Ferrari (f) from a truck t, both the Ferrari and the truck are in Monte Carlo (m). Then we have to ask a query of the following kind:

    ?− causes(Z, [unloadS(f, t)], rin(f, m) ◦ tin(t, m)).

Applying SLDE-resolution yields the empty clause after four steps with the computed answer substitution {Z ↦ on(f, t) ◦ tin(t, m)}.

Occasionally, one would like to restrict states such that some or all fluents occur at most once. The solution proposed in [6] was to require that states are consistent and to assume that states in which one of these fluents occurs more than once are inconsistent. It was shown that if the execution of actions preserves consistency, then the check for consistency needs to be performed only once, for example by adding an appropriate condition to the first clause of causes/3. Otherwise, a consistency check needs to be performed whenever causes/3 is resolved upon.
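The refutation just described can be mimicked for ground states with ordinary multi-set operations. The following Python sketch encodes ground states as Counter multi-sets of fluent strings and performs a single regression step through the ground instance of unloadS(f, t) used in the query above. It is a minimal sketch: variables and full AC1-unification, which SLDE-resolution handles, are deliberately left out, and the string encoding of fluents is purely illustrative.

```python
from collections import Counter

def state(*fluents):
    # A ground state is a multi-set of fluents.
    return Counter(fluents)

def holds(fluent, z):
    # Ground version of holds(F, Z): F occurs in the multi-set Z.
    return z[fluent] > 0

# Ground instance of action(on(f,t) o tin(t,m), unloadS(f,t), rin(f,m) o tin(t,m)).
precondition = state("on(f,t)", "tin(t,m)")
effect = state("rin(f,m)", "tin(t,m)")

def regress(goal, precondition, effect):
    # One step with the recursive causes/3 clause: write the goal state as
    # Z'' o E and return the predecessor Z'' o C.
    if any(goal[f] < n for f, n in effect.items()):
        return None                      # the effect does not occur in the goal
    return (goal - effect) + precondition

goal = state("rin(f,m)", "tin(t,m)")
pred = regress(goal, precondition, effect)
print(pred)                              # Counter({'on(f,t)': 1, 'tin(t,m)': 1})
print(holds("rin(f,m)", pred))           # False: the Ferrari is not yet in Monte Carlo
```

This reproduces the computed answer substitution {Z ↦ on(f, t) ◦ tin(t, m)} obtained by SLDE-resolution above.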

4 Symbolic DP

We first define the notion of an 'abstract state', which we will need later.

Definition 1 Let f1, ..., fn be fluents and let X be a variable. Then a state of the form f1 ◦ ... ◦ fn ◦ X is called an abstract state.

Due to the presence of the variable X, an abstract state f1 ◦ ... ◦ fn ◦ X represents a set of states: in all these states the fluents f1, ..., fn are present and hold, but one should be aware that other fluents may hold as well.

The first step in modelling DP within the FC is to introduce stochastic actions. We need to do so because stochastic actions have different outcomes, which can be described using probabilities. The technique used here is to decompose a stochastic action into deterministic primitives under nature's control, referred to as nature's choices.

    prob(unloadS(R, T), unload(R, T), Z) = .7 ↔ holds(rain, Z)
    prob(unloadS(R, T), unload(R, T), Z) = .9 ↔ ¬holds(rain, Z)
    prob(unloadF(R, T), unload(R, T), Z) = .3 ↔ holds(rain, Z)
    prob(unloadF(R, T), unload(R, T), Z) = .1 ↔ ¬holds(rain, Z)
    prob(loadS(R, T), load(R, T), Z) = .99
    prob(loadF(R, T), load(R, T), Z) = .01
    prob(driveS(T, C), drive(T, C), Z) = .99
    prob(driveF(T, C), drive(T, C), Z) = .01

    Table 2. The probabilities in the logistics example.

We are going to use a relation symbol choice/2 to model nature's choice. As an example, consider the action unload(R, T) of unloading a racing car R from a truck T:

    choice(unload(R, T), A) ↔ (A = unloadS(R, T) ∨ A = unloadF(R, T)),

where unloadS(R, T) and unloadF(R, T) define the two nature's choices for the action unload(R, T), viz. that it is executed successfully or fails. Likewise, nature's choices can be defined for the other actions in the example scenario.

For each of nature's choices aj(X) associated with an action a(X) we define the probability prob(aj(X), a(X), Z), denoting the probability with which nature's choice aj(X) is selected in a state Z. For example,

    prob(unloadS(R, T), unload(R, T), Z) = .7 ↔ holds(rain, Z)

states that the probability of the successful execution of an unload action in state Z is .7 if it is raining, where holds/2 is a macro defined by

    holds(F, Z) =def (∃Z'). Z =AC1 F ◦ Z'.

Altogether, we specify the probabilities shown in Table 2. For each of the nature's choices, we specify the conditions and effects using the predicate symbol action/3; Table 3 shows the specification.

In the next step, we have to define the value of the reward function for each state. We give a reward of 10 to all states in which the Ferrari is in Monte Carlo and 0 otherwise:

    reward(Z) = 10 ↔ holds(rin(f, m), Z),
    reward(Z) = 0 ↔ ¬holds(rin(f, m), Z).

One should observe that we have specified the reward function without explicit state enumeration. Instead, the state space is divided into abstract states depending on whether or not the Ferrari is in Monte Carlo. Likewise, the value function can be specified with respect to the abstract states only. This is in contrast to classical DP algorithms like the one shown in Section 2, in which the states are explicitly enumerated.

    action(rin(R, C) ◦ tin(T, C), loadS(R, T), on(R, T) ◦ tin(T, C))
    action(rin(R, C) ◦ tin(T, C), loadF(R, T), rin(R, C) ◦ tin(T, C))
    action(on(R, T) ◦ tin(T, C), unloadS(R, T), rin(R, C) ◦ tin(T, C))
    action(on(R, T) ◦ tin(T, C), unloadF(R, T), on(R, T) ◦ tin(T, C))
    action(tin(T, C), driveS(T, C'), tin(T, C'))
    action(tin(T, C), driveF(T, C'), tin(T, C))

    Table 3. The actions in the logistics example.
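Tables 2 and 3 together amount to a state-dependent probability table plus a set of action specifications. As a purely illustrative aside, the following Python sketch shows one way such a table could be encoded in a conventional program; the choices dictionary, the string encoding of fluents, and the Counter-based ground states are conventions of this sketch, not of the FC formalization.

```python
from collections import Counter

def holds(fluent, z):
    # Ground version of the holds/2 macro over a multi-set of fluent strings.
    return z[fluent] > 0

# choices[a] lists nature's choices of action a, each paired with a function
# mapping a ground state to the probability that this choice is selected
# (mirroring Table 2).
choices = {
    "unload": [("unloadS", lambda z: 0.7 if holds("rain", z) else 0.9),
               ("unloadF", lambda z: 0.3 if holds("rain", z) else 0.1)],
    "load":   [("loadS", lambda z: 0.99), ("loadF", lambda z: 0.01)],
    "drive":  [("driveS", lambda z: 0.99), ("driveF", lambda z: 0.01)],
}

z = Counter(["on(f,t)", "tin(t,m)", "rain"])
print([(name, p(z)) for name, p in choices["unload"]])
# [('unloadS', 0.7), ('unloadF', 0.3)]
```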

We proceed by specifying the n-stage-to-go Q-function

    Qn(a, z) = R(z) + γ · Σ_{z'∈Z} P(z, a, z') · Vn−1(z')        (1)

within the FC. In our approach, the only way the execution of an action may lead to different outcomes is by means of nature's choices. Let a(X) be an action and aj(X), 1 ≤ j ≤ k, its nature's choices, where X is a sequence of arguments. We may replace (1) by its FC analogue

    qn(a(X), Z) = reward(Z) + γ · Σ_j prob(aj(X), a(X), Z) · vn−1(Zj')        (2)

with

    FC |= causes(Z, [aj(X)], Zj')        (3)

for all 1 ≤ j ≤ k. One should observe that in the FC formalization of the value iteration algorithm we are not going to enumerate all states but rather divide the state space into abstract states. In our running example, these are initially the two abstract states where the Ferrari is in Monte Carlo or is not. In order to meet condition (3), Zj' is instantiated with these abstract states whereas the variable Z and the parameters X are left uninstantiated. By computing the entailment relation, we effectively regress Zj' through the corresponding nature's choice aj(X) to obtain Z. This reflects the backward nature of the value iteration algorithm.

Finally, we have to find an FC analogue for the equation Vn(z) = max_{a∈A} Qn(a, z). But this is a rather straightforward exercise:

    vn(Z) = W ↔ ∨_a [ (∃X). qn(a(X), Z) = W ] ∧ ∧_{a'} [ (∀Y, W'). qn(a'(Y), Z) = W' → W' ≤ W ],        (4)

where Z is a state. In other words, the value function vn(Z) applied to a state Z has the value W iff there exists an action a with the q-value W and, for all other actions a' having the q-value W', we find that W' is less than or equal to W. In this way we formalize the maximality construct of the optimal value function.

We can now come back to the running example and illustrate the FC formalization of the value iteration algorithm. Let the discount factor γ be set to .9 and initialize v0(Z) = reward(Z) for all abstract states Z. The next step is to compute q1(a(X), Z) with respect to all given actions a(X) and the abstract states Z. We will illustrate this step using the action unload with its nature's choices unloadS and unloadF. In this case, we obtain

    q1(unload(R, T), Z) = reward(Z) + γ · prob(unloadS(R, T), unload(R, T), Z) · v0(Z1')
                                    + γ · prob(unloadF(R, T), unload(R, T), Z) · v0(Z2')

with the conditions

    FC |= causes(Z, [unloadS(R, T)], Z1'),        (5)
    FC |= causes(Z, [unloadF(R, T)], Z2').        (6)

Starting from the given abstract state space, we compute the possible abstract predecessor states with respect to (5) and (6). Consider first the abstract state where the Ferrari is in Monte Carlo, i.e. holds(rin(f, m), Z1') is true. In this case, the holds-macro can be merged into the causes-statement. We obtain:

    FC |= causes(Z, [unloadS(R, T)], rin(f, m) ◦ Z1''),
    FC |= causes(Z, [unloadF(R, T)], rin(f, m) ◦ Z2'').

To solve the first of these statements, we have to find a refutation for:

    ?− causes(Z, [unloadS(R, T)], rin(f, m) ◦ Z1'').

Within two steps, this goal can be reduced to:

    ?− rin(R, C) ◦ tin(T, C) ◦ X =AC1 rin(f, m) ◦ Z1'', causes(Z, [ ], on(R, T) ◦ tin(T, C) ◦ X).

In the next step, the AC1-unification algorithm is called and terminates successfully with a set of most general unifiers consisting of

    σ1 = {R ↦ f, C ↦ m, Z1'' ↦ tin(T, m) ◦ X},
    σ2 = {X ↦ rin(f, m) ◦ Y, Z1'' ↦ rin(R, C) ◦ tin(T, C) ◦ Y}.

Thus, we obtain the two goals

    ?− causes(Z, [ ], on(f, T) ◦ tin(T, m) ◦ X),
    ?− causes(Z, [ ], on(R, T) ◦ tin(T, C) ◦ rin(f, m) ◦ Y).

Both goals can be refuted in one step, leading to the computed answer substitutions

    θ1 = {Z ↦ on(f, T) ◦ tin(T, m) ◦ X, R ↦ f, C ↦ m, Z1'' ↦ tin(T, m) ◦ X},
    θ2 = {Z ↦ on(R, T) ◦ tin(T, C) ◦ rin(f, m) ◦ Y, Z1'' ↦ rin(R, C) ◦ tin(T, C) ◦ Y},

respectively.

Furthermore, it can be computed that reward(Zθ1) = 0 and reward(Zθ2) = 10. Thus, we obtain two abstract predecessor states Z, specified by

    A =def holds(on(f, T) ◦ tin(T, m), Z) ∧ ¬holds(rin(f, m), Z),
    B =def holds(on(R, T) ◦ tin(T, C) ◦ rin(f, m), Z).

Likewise, by considering the second abstract state, where the Ferrari is not in Monte Carlo, i.e. both ¬holds(rin(f, m), Z1') and ¬holds(rin(f, m), Z2') are true, we obtain another abstract predecessor state, specified by

    C =def holds(on(R, T) ◦ tin(T, C), Z) ∧ ¬holds(rin(f, m), Z) ∧ (R ≠ f ∨ C ≠ m).

For each of the abstract predecessor states, we can now compute its q1-value. There is a slight complication, though, because the probabilities of nature's choices for the unload action depend on whether it is raining. Altogether, we obtain the six cases shown in Table 4.

    Z                        |    | q1
    A ∧ holds(rain, Z)       | A1 | 0 + .9 · .7 · 10 + .9 · .3 · 0 = 6.3
    A ∧ ¬holds(rain, Z)      | A2 | 0 + .9 · .9 · 10 + .9 · .1 · 0 = 8.1
    B ∧ holds(rain, Z)       | B1 | 10 + .9 · .7 · 10 + .9 · .3 · 10 = 19
    B ∧ ¬holds(rain, Z)      | B2 | 10 + .9 · .9 · 10 + .9 · .1 · 10 = 19
    C ∧ holds(rain, Z)       | C1 | 0 + .9 · .7 · 0 + .9 · .3 · 0 = 0
    C ∧ ¬holds(rain, Z)      | C2 | 0 + .9 · .9 · 0 + .9 · .1 · 0 = 0

    Table 4. The q1-values for the unload action.

Because (B1) and (B2) lead to the same q1-value, it does not make a difference whether it is raining; the same holds for (C1) and (C2). In summary, we obtain an abstract predecessor state space with the abstract states (A1), (A2), (B), and (C). One should observe that, for example, in case (C), we were considering the action unload(R, T) without instantiating R and T. Thus, the FC formalization of classical DP abstracts not only from the state space but also from the action space.

Similarly, one should compute the q1-values for the actions drive and load. To compute v1(Z) one has to apply Equation (4), compare the corresponding values, and choose an optimal action. For the sake of simplicity, we assume that the termination condition, i.e. |vn(Z) − vn−1(Z)| < ε, is met and our algorithm terminates after a single iteration step. The algorithm is summarized in Figure 2.
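As a sanity check on the arithmetic in Table 4, the following Python sketch recomputes the six q1-values from the reward of each abstract state, the discount factor γ = .9, the probabilities of Table 2, and the value v0 = reward of the abstract successor state reached by each of the two nature's choices. The tuples below simply transcribe the worked example; they are not produced by the FC machinery itself.

```python
GAMMA = 0.9

# label: (reward(Z), prob of unloadS, v0 of its successor,
#         prob of unloadF, v0 of its successor)
cases = {
    "A1": (0.0, 0.7, 10.0, 0.3, 0.0),    # A, raining
    "A2": (0.0, 0.9, 10.0, 0.1, 0.0),    # A, not raining
    "B1": (10.0, 0.7, 10.0, 0.3, 10.0),  # B, raining
    "B2": (10.0, 0.9, 10.0, 0.1, 10.0),  # B, not raining
    "C1": (0.0, 0.7, 0.0, 0.3, 0.0),     # C, raining
    "C2": (0.0, 0.9, 0.0, 0.1, 0.0),     # C, not raining
}

for label, (r, p_s, v_s, p_f, v_f) in cases.items():
    q1 = r + GAMMA * p_s * v_s + GAMMA * p_f * v_f
    print(f"{label}: q1 = {q1:.2f}")
# Reproduces Table 4: 6.30, 8.10, 19.00, 19.00, 0.00, 0.00
```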

5 Discussion and Future Work

We have provided a symbolic DP approach for modelling first-order Markov decision processes within the FC. We have addressed the scalability problem of the classical DP concept by partitioning the state space into abstract states. The partitioning is done according to the conditions represented as first-order formulae produced by the decision-theoretic regression.

    n := 0. Specify ε > 0.
    Initialize v0(Z) arbitrarily, e.g. v0(Z) := reward(Z), for all abstract states Z.
    loop
        n := n + 1.
        For each action a(X), compute qn(a(X), Z) using vn−1(Zj'), where Zj' is an
        abstract state satisfying FC |= causes(Z, [aj(X)], Zj') for each nature's choice aj(X).
        Compute vn(Z) using qn(a(X), Z).
    until |vn(Z) − vn−1(Z)| < ε for all abstract states Z.

    Figure 2. The first-order value iteration algorithm.

Furthermore, by exploiting the logical structure of our approach, we avoid domain grounding. Our approach is very similar to the one presented within the situation calculus [3, 4] but is distinguished by the following feature: the FC operates on states directly, which turns out to be very efficient when computing the value of a fluent. In the FC the value is readily available from the collection of fluents. In the situation calculus, the current situation term must first be unfolded, either until the situation is reached where the fluent was caused to be true or false by the preceding action, or until the initial situation is reached; only after such unfolding do we have access to the value of the fluent.

The symbolic DP approach presented here can cover more complex scenarios than discussed so far. In particular, it can handle conditional actions as well as specificity. Consider a guard checking the doors of an office building. This scenario can be modelled with the help of the conditional action if(locked, continue, alarm), the intuitive meaning of which is: if a door is locked (fluent locked), then proceed to the next door (action continue), else initiate the alarm signal (action alarm). The primitive actions continue and alarm are specified as usual by

    action(current, continue, next),
    action(green, alarm, red),

where the fluents current and next define the current and the next position of the guard, and the fluents green and red state that the alarm lamp is green or red, respectively. Such a scenario can be modelled by specifying that continue and alarm are the two choices of the conditional action and defining the probabilities as follows:

    prob(continue, if(locked, continue, alarm), Z) = 1 ↔ holds(locked, Z),
    prob(continue, if(locked, continue, alarm), Z) = 0 ↔ ¬holds(locked, Z),
    prob(alarm, if(locked, continue, alarm), Z) = 1 ↔ ¬holds(locked, Z),
    prob(alarm, if(locked, continue, alarm), Z) = 0 ↔ holds(locked, Z).
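Continuing the purely illustrative Python encoding sketched after Table 3 (the probability-valued choice functions and the holds helper are conventions of that sketch, not of the FC), the conditional action can be rendered as two choices whose probabilities are 0 or 1 depending on the fluent locked:

```python
def holds(fluent, z):
    return fluent in z                   # here a ground state is simply a set of fluents

# Nature's choices of the conditional action if(locked, continue, alarm):
# the branch matching the condition is selected with probability 1.
conditional_choices = [
    ("continue", lambda z: 1.0 if holds("locked", z) else 0.0),
    ("alarm",    lambda z: 0.0 if holds("locked", z) else 1.0),
]

print([(name, p({"locked"})) for name, p in conditional_choices])  # door locked
print([(name, p(set())) for name, p in conditional_choices])       # door unlocked
```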

Likewise, consider the action of dropping an object:

    action(hold(X), drop(X), floor(X)).

If, in addition, it is known that the object is fragile, then after dropping it the object is also broken:

    action(hold(X) ◦ fragile(X), drop(X), floor(X) ◦ broken(X)).

In [7] the concept of specificity was introduced in order to select the appropriate instance of the drop-action given a particular state. In the symbolic DP approach we can instead specify that the drop-action has two choices, viz. dropping non-fragile and fragile objects:

    action(hold(X), dropN(X), floor(X)),
    action(hold(X) ◦ fragile(X), dropF(X), floor(X) ◦ broken(X)),

and define the probabilities for selecting one of these choices as follows:

    prob(dropN(X), drop(X), Z) = 1 ↔ ¬holds(fragile(X), Z),
    prob(dropN(X), drop(X), Z) = 0 ↔ holds(fragile(X), Z),
    prob(dropF(X), drop(X), Z) = 1 ↔ holds(fragile(X), Z),
    prob(dropF(X), drop(X), Z) = 0 ↔ ¬holds(fragile(X), Z).

Although some restrictions were made, the paper illustrates that the FC can be used for decision-theoretic regression and suggests that these restrictions can be lifted in a more realistic presentation. A number of interesting directions remain to be explored. 'Computer-based' simplification techniques, not used so far because of their complexity, would certainly enhance an implementation. As an alternative direction for future work, we may concentrate on integrating the presented decision-theoretic regression algorithm for the FC with powerful first-order theorem provers in order to enhance the performance of the algorithm. Finally, it looks promising to investigate further the version of the FC presented here, complementing it with such practical features as sensing, ramifications, or qualifications. Although the main objective of this work was a case study of whether it is beneficial to combine the DP and FC approaches, it is vital to address complexity issues in the future.

References

[1] R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, USA, 1957.

[2] C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1–94, 1999.

[3] C. Boutilier, R. Reiter, and B. Price. Symbolic dynamic programming for first-order MDPs. In B. Nebel, editor, Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 690–700. Morgan Kaufmann, 2001.

[4] C. Boutilier, R. Reiter, M. Soutchanski, and S. Thrun. Decision-theoretic, high-level agent programming in the situation calculus. In Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), 2000.

[5] S. Hanks, S. Russell, and W. Wellman, editors. Decision Theoretic Planning: Proceedings of the AAAI Spring Symposium. AAAI Press, Menlo Park, CA, USA, 1998.

[6] S. Hölldobler and J. Schneeberger. A new deductive approach to planning. New Generation Computing, 8:225–244, 1990.

[7] S. Hölldobler and M. Thielscher. Computing change and specificity with equational logic programs. Annals of Mathematics and Artificial Intelligence, 14:99–133, 1995.

[8] M. Thielscher. Introduction to the fluent calculus. Electronic Transactions on Artificial Intelligence, 2(3–4):179–192, 1998.

[9] M. Thielscher. Programming of reasoning and planning agents with FLUX. In D. Fensel, D. McGuinness, and M.-A. Williams, editors, Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR), pages 435–446, Toulouse, France, Apr. 2002. Morgan Kaufmann.