Probabilistic Satisfiability

Pierre HANSEN
GERAD and École des Hautes Études Commerciales
5255 avenue Decelles, Montréal (Québec), Canada H3T 1V6
[email protected]

Brigitte JAUMARD
GERAD and École Polytechnique de Montréal
P.O. Box 6079, Station "Centre-Ville", Montréal (Québec), Canada H3C 3A7
[email protected]

March 21, 1996

Contents

1 Introduction
  1.1 Uncertainty and Probability
  1.2 Probabilistic Satisfiability
  1.3 Extensions
      1.3.1 Probability Intervals (or Imprecise Probabilities)
      1.3.2 Conditional Probabilities
      1.3.3 Additional Linear Constraints
      1.3.4 Logical Operations on Conditional Events and their Probabilities

2 Analytical solution of PSAT
  2.1 Boole's algebraic method
  2.2 Hailperin's extensions of Boole's algebraic method
  2.3 Polyhedral methods to obtain rules for combining bounds on probabilities
  2.4 Automated theorem proving with probabilistic satisfiability
  2.5 Theorem proving with condensed forms of probabilistic satisfiability

3 Numerical Solution of PSAT
  3.1 Column Generation
  3.2 Solution of the auxiliary problem
      3.2.1 Heuristics
      3.2.2 Exact algorithms
  3.3 Computational Experience
  3.4 Computational Complexity

4 Decomposition

5 Nonmonotonic Reasoning and Restoring Satisfiability
  5.1 Minimal Extension of Probability Intervals
  5.2 Probabilistic Maximum Satisfiability

6 Other Uses of the Probabilistic Satisfiability Model
  6.1 Maximum Entropy Solution
  6.2 Anytime Deduction

7 Other Related Approaches
  7.1 Incidence Calculus
  7.2 Bayesian Logic
  7.3 Assumption-based Truth Maintenance Systems
  7.4 Probabilistic Logic via Capacities
  7.5 Other Applications

8 Conclusions

1 Introduction

1.1 Uncertainty and Probability

Due to the ubiquity of uncertainty both in knowledge and in inference rules, generalizations of logic-based methods including an explicit treatment of uncertainty have long been studied in Artificial Intelligence. In fact, probability logic predates AI by more than a century; see Hailperin [101] for a detailed historical survey. Uncertainty has been studied from different perspectives. It has been argued (e.g., Zadeh [174]) that probability theory is not adequate for the treatment of uncertainty in AI. Alternate frameworks, such as fuzzy sets (e.g., Zadeh [172], Dubois and Prade [74]) and possibility theory (e.g., Zadeh [173], Dubois and Prade [76]), were proposed. Many specific rules for combining uncertainty measures (or estimates) in expert systems were also elaborated (e.g., the certainty factor of MYCIN, see Buchanan and Shortliffe [35]). Dissatisfaction with such skepticism and with alternate solutions led to reactions. Among others, Cheeseman [49] makes "A defense of probability" and Pearl [149] explains "How to do with probabilities what people say you can't". Successful methods spurred a recent return in favor of probability theory, highlighted by Nilsson's [144] paper on "Probabilistic Logic" and Lauritzen and Spiegelhalter's [135] paper on local computation in Bayesian networks. The purpose of the present chapter is to survey the probability-based treatment of uncertainty in AI, from an algorithmic point of view. To this effect the focus will be on a central model, probabilistic satisfiability (equivalent to Nilsson's [144] probabilistic logic and probabilistic entailment) and its extensions. This model provides a way to make inferences of a general type. It is thus equivalent in practice, although expressed differently, to a probability logic.
For general discussions of propositional and first-order probability logic the reader is referred to, e.g., Hailperin [99], Fagin, Halpern and Megiddo [78], Bacchus [16], Halpern [102], Grzymala-Busse [94], Abadi and Halpern [2].

The chapter is organized as follows. A formal statement of probabilistic satisfiability is given in the next subsection. Extensions are considered in Subsection 1.3: probability intervals (or imprecise probabilities), conditional probabilities in the constraints or objective function, and further linear constraints on the probabilities are introduced, as well as probabilities for negations, conjunctions or disjunctions of conditional events and iterated conditionals. Analytical solution of probabilistic satisfiability and its extensions is studied in Section 2. It treats algebraic methods and methods based on enumeration of vertices and extreme rays of polytopes. Applications to automated theorem proving in the theory of probabilities are described. Numerical solution of probabilistic satisfiability is considered in Section 3. The column generation technique of linear programming is shown to play a crucial role. The auxiliary problem of finding the minimum (maximum) reduced cost, to be solved at each iteration when using column generation, is to minimize (maximize) a nonlinear function in 0-1 variables. It can be solved approximately, except when no more negative (positive) reduced costs can be found, by Tabu Search or some other heuristic. Then an exact solution method must be used: algebraic and linearization approaches appear to be the most efficient. Section 4 discusses the solution of large probabilistic satisfiability problems by decomposition. When the proposed probabilities are not consistent it is required to restore satisfiability with minimal changes, a form of nonmonotonic reasoning; this is the topic of Section 5. Several ways to do so are examined: probability intervals may be increased in a minimal way, which can again be done by linear programming, or a minimum subset of sentences may be deleted. The probabilistic maximum satisfiability problem arising in this last case requires for its solution to combine column generation with mixed integer programming. Two ways to do so, extending the primal and dual approaches to mixed-integer programming, are presented.
In Section 6, ways to exploit the probabilistic satisfiability model with aims different from those considered before are examined. Obtaining a unique solution, i.e., probability distribution, for probabilistic satisfiability is first examined. A natural tool is then entropy maximization. Next anytime deduction (Frisch and Haddawy [83]) is discussed. Bounds of increasing precision are computed using a set of rules whose application may be stopped whenever desired. This also gives an explicit justification for the results obtained. In Section 7, probabilistic satisfiability is compared with related approaches to the treatment of uncertainty in AI, mainly Bundy's [36, 37] incidence calculus, Bayesian networks (e.g., Pearl [150], Lauritzen and Spiegelhalter [135]) and their combination with probabilistic satisfiability known as Bayesian logic (Andersen and Hooker [11]). We also discuss the probabilistic assumption-based truth maintenance systems (e.g., Kohlas and Monney [129]) and the recent work of Kampke [118] on extending probabilistic satisfiability to belief functions using capacities (Choquet [52]). Applications of probabilistic satisfiability or related models outside AI are briefly mentioned. Conclusions on the role of probabilistic satisfiability in AI and related fields are drawn in Section 8.

1.2 Probabilistic Satisfiability

The probabilistic satisfiability problem in decision form may be defined as follows: Consider m logical sentences S1, S2, ..., Sm defined on n logical variables x1, x2, ..., xn with the usual Boolean operators ∨ (logical sum), ∧ (logical product) and ¯ (negation, or complementation). Assume probabilities π1, π2, ..., πm for these sentences to be true are given. Are these probabilities consistent?

There are 2^n complete products w_j, for j = 1, 2, ..., 2^n, of the variables x1, x2, ..., xn in direct or complemented form. These products may be called, following Leibniz, possible worlds. In each possible world w_j any sentence S_i is true or false. The probabilistic satisfiability problem may then be reformulated: is there a probability distribution p1, p2, ..., p_{2^n} on the set of possible worlds such that the sum of the probabilities of the possible worlds in which sentence S_i is true is equal to its probability π_i of being true, for i = 1, 2, ..., m? Defining the m × 2^n matrix A = (a_ij) by

    a_ij = 1 if S_i is true in possible world w_j, and a_ij = 0 otherwise,

the decision form of probabilistic satisfiability may be written:

    1p = 1
    Ap = π                                                          (1)
    p ≥ 0

where 1 is a 2^n unit row vector, and p and π are the column vectors (p1, p2, ..., p_{2^n})^T and (π1, π2, ..., πm)^T respectively. The answer is yes if there is a vector p satisfying (1) and no otherwise.

Note that not all columns of A need be different. Moreover, not all 2^m possible different column vectors of A need be, or in most cases will be, present. This is due to the fact that some subset of sentences being true will force other sentences to be true or prohibit them from being so. Guggenheimer and Freedman [96] study the particular case in which, for a subset of sentences, all possible corresponding subvectors of A are present and the values of all sentences of the complementary subset are fixed when the variables in any of these subvectors are fixed.

Considering one more sentence S_{m+1}, with an unknown probability π_{m+1}, leads to the optimization form of probabilistic satisfiability. Usually the constraints (1) do not impose a unique value for the probability π_{m+1} of S_{m+1}. As shown by de Finetti [67, 68, 69], a unique value is imposed if and only if the row vector A_{m+1} = (a_{m+1,j}), where a_{m+1,j} = 1 if S_{m+1} is true in possible world w_j and a_{m+1,j} = 0 if not, is a linear combination of the rows of A. Otherwise, the constraints (1) imply bounds on the probability π_{m+1}. The probabilistic satisfiability problem in optimization form is to find the best possible such bounds. It can be written

    min / max  A_{m+1} p
    subject to:  1p = 1
                 Ap = π                                             (2)
                 p ≥ 0.

Nilsson [144] calls (1) and (2) probabilistic logic and probabilistic entailment. However, while (1) and (2) are very useful inference tools they do not properly constitute a logic, i.e., a set of axioms and inference rules. The name probabilistic satisfiability, proposed by Georgakopoulos, Kavvadias and Papadimitriou [86], appears better suited as it stresses the relationship of (1) with the satisfiability problem, which is the particular case where π = 1 and a solution with a single positive p_j is required (such a solution can easily be deduced from any other solution of (2)). As stressed by Kane [120, 122], two columns of (2) may differ only in their value in A_{m+1}; they should not then be conflated and assumed to have the same probability, as suggested by Nilsson [144], for this would prohibit getting best possible bounds.

Both problems (1) and (2) have their origin in the work of Boole [26, 27, 28, 29, 30], where they are called "conditions of possible experience" and the "general problem in the theory of probabilities". Boole proposed algebraic methods for their solution (discussed below). Criticized by Wilbraham [171], and later by Keynes [125], Boole's work in probability was long forgotten in English-speaking countries. It seems however to have strongly influenced de Finetti [66, 67, 68, 69], through Medolaghi [143], in the development of his theory of subjective probabilities. Boole's work was revived by Hailperin [97, 98, 100], who wrote a seminal paper explaining it with the help of linear programming, and a book-length study of Boole's logic and probability [98, 100]. Hailperin [97, 98, 100] also obtained several new results and proposed extensions of probabilistic satisfiability, discussed below. Due to its basic character probabilistic satisfiability was often independently rediscovered, sometimes in particular cases or variants, e.g., by Adams and Levine [8], Kounias and Marin [130], Nilsson [144], Chesnokov [51], Gelembe [84] and probably others.
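Problems (1) and (2) are ordinary linear programs over the 2^n possible worlds, so for small n they can be solved directly with any LP solver. The following minimal sketch (the sentences, probability values and variable names are illustrative, not from the text) bounds the probability of one sentence given point probabilities for two others:

```python
from itertools import product
from scipy.optimize import linprog

# Illustrative sentences over x1, x2: S1 = x1 and S2 = x1 or x2,
# with assumed probabilities pi = (0.6, 0.8).
sentences = [lambda x: x[0], lambda x: x[0] or x[1]]
pi = [0.6, 0.8]
target = lambda x: x[1]                  # S3 = x2, whose probability we bound

worlds = list(product([0, 1], repeat=2))             # the 2^n possible worlds
A = [[int(s(w)) for w in worlds] for s in sentences]
A_eq = [[1] * len(worlds)] + A                       # 1p = 1 and Ap = pi
b_eq = [1.0] + pi
c = [int(target(w)) for w in worlds]                 # objective row A_{m+1}

lo = linprog(c, A_eq=A_eq, b_eq=b_eq)                # min A_{m+1} p
hi = linprog([-v for v in c], A_eq=A_eq, b_eq=b_eq)  # max A_{m+1} p
print(lo.fun, -hi.fun)
```

Here `linprog`'s default variable bounds already enforce p ≥ 0; an infeasible status would signal that the given probabilities are inconsistent, i.e., a "no" answer to (1).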

1.3 Extensions

1.3.1 Probability Intervals (or Imprecise Probabilities)

Several significant extensions of probabilistic satisfiability have been proposed. Hailperin [97] noted that the use of intervals instead of point values for probabilities is often more realistic, and more general than Boole's "general problem". Then problem (2) becomes:

    min / max  A_{m+1} p
    subject to:  1p = 1
                 π̲ ≤ Ap ≤ π̄                                        (3)
                 p ≥ 0

where π̲ and π̄ are the vectors of lower and upper bounds on the sentence probabilities. If bounded slack variables are used, an equivalent expression in which the number of constraints remains equal to m + 1 is obtained:

    min / max  A_{m+1} p
    subject to:  1p = 1
                 Ap + s = π̄                                        (4)
                 p ≥ 0,  0 ≤ s ≤ π̄ − π̲.

This problem is also discussed in Lad, Dickey and Rahman [132], Jaumard, Hansen and Poggi de Aragão [117], and Andersen and Hooker [11]. An extensive study of statistical reasoning with imprecise probabilities, using (3) and various extensions, is due to Walley [170].
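The interval form (3) fits the same linear-programming framework by encoding π̲ ≤ Ap ≤ π̄ as two blocks of inequality constraints. A sketch with invented interval values (same illustrative sentences as before):

```python
from itertools import product
from scipy.optimize import linprog

# Assumed intervals: P(x1) in [0.5, 0.7], P(x1 or x2) in [0.7, 0.9].
sentences = [lambda x: x[0], lambda x: x[0] or x[1]]
lower, upper = [0.5, 0.7], [0.7, 0.9]
target = lambda x: x[1]                       # bound P(x2)

worlds = list(product([0, 1], repeat=2))
A = [[int(s(w)) for w in worlds] for s in sentences]
# Encode lower <= Ap <= upper as Ap <= upper and -Ap <= -lower.
A_ub = A + [[-v for v in row] for row in A]
b_ub = upper + [-v for v in lower]
A_eq, b_eq = [[1] * len(worlds)], [1.0]
c = [int(target(w)) for w in worlds]

lo = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
hi = linprog([-v for v in c], A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
print(lo.fun, -hi.fun)
```

The widened constraint set can only widen the resulting bounds relative to point probabilities, which is the price of imprecision.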

1.3.2 Conditional Probabilities

Another important extension of probabilistic satisfiability is to consider conditional probabilities instead of, or in addition to, unconditioned ones. Indeed, in many cases probabilistic knowledge is only precise when some conditions hold. Use of conditional probabilities was already discussed by Boole [27] for particular examples. It is connected with his idea of independence, which is examined in Section 7. Other authors addressing conditional probabilities in the context of probabilistic satisfiability are Hailperin [100], Chesnokov [51], Jaumard, Hansen and Poggi de Aragão [117] and Coletti [57]. Two cases arise: conditionals may be in the constraints of (4) or in the objective function.

Several ways of representing the conditional probability prob(S_k | S_ℓ) = prob(S_k ∧ S_ℓ)/prob(S_ℓ) = π_{k|ℓ} in (2) have been proposed. Introducing a variable π_ℓ for the unknown probability prob(S_ℓ) leads to the two constraints (Jaumard et al. [117]):

    A_{k∧ℓ} p − π_{k|ℓ} π_ℓ = 0
    A_ℓ p − π_ℓ = 0                                                 (5)

where A_{k∧ℓ} = (a_{k∧ℓ,j}) with a_{k∧ℓ,j} = 1 if both S_ℓ and S_k are true in possible world w_j and 0 otherwise. This way to express conditional probabilities is close to that of Boole [26], who also introduces an unknown parameter. A more compact expression is obtained by eliminating π_ℓ (Hailperin [100]):

    A′_{k∧ℓ} p = (A_{k∧ℓ} − π_{k|ℓ} A_ℓ) p = 0                      (6)

i.e., A′_{k∧ℓ} = (a′_{k∧ℓ,j}) where a′_{k∧ℓ,j} = 1 − π_{k|ℓ} if S_k and S_ℓ are true, −π_{k|ℓ} if S_k is false and S_ℓ true, and 0 if S_ℓ is false in possible world w_j. Adding π_{k|ℓ} · 1 to both sides of (6) gives an equation

    A″_{k∧ℓ} p = π_{k|ℓ}                                            (7)

where A″_{k∧ℓ} = (a″_{k∧ℓ,j}) is such that a″_{k∧ℓ,j} = 1 if S_k and S_ℓ are true, 0 if S_k is false and S_ℓ true, and π_{k|ℓ} if S_ℓ is false. Observe that these three values coincide with those given by de Finetti [68, 69] in his definition of the probability of a conditional event in terms of a bet won, lost or cancelled.

If the conditional probability prob(S_k | S_ℓ) is in the objective function, the problem becomes one of hyperbolic (or fractional) programming:

    min / max  A_{k∧ℓ} p / A_ℓ p
    subject to:  1p = 1
                 Ap = π                                             (8)
                 p ≥ 0.

As noted by Hailperin [100] and by Chesnokov [51], a result of Charnes and Cooper [45] may be used to reduce the problem (8) to a linear program with one more variable:

    min / max  A_{k∧ℓ} p
    subject to:  A_ℓ p = 1
                 1p = t                                             (9)
                 Ap = πt
                 p ≥ 0,  t ≥ 0,

and the same optimal value; the corresponding solution is obtained by dividing the optimal solution p of (9) by t. Note that all but one of the equations of (9) are homogeneous. This may cause problems in numerical solution, due to degeneracy.

An alternate way to solve (8) is to apply Dinkelbach's [70] lemma, as done by Jaumard, Hansen and Poggi de Aragão [117]. Let r = 1 and let λ_r be an upper bound on the optimal value of (8) in case of minimization (which can always be taken as 1). Solve the problem

    min  (A_{k∧ℓ} − λ_r A_ℓ) p
    subject to:  1p = 1
                 Ap = π                                             (10)
                 p ≥ 0.

If the optimal value (A_{k∧ℓ} − λ_r A_ℓ) p* is non-negative, stop, p* being optimal. Otherwise, set r ← r + 1, λ_r = A_{k∧ℓ} p* / A_ℓ p* and iterate.
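Dinkelbach's iteration is easy to sketch numerically. The instance below (worlds, constraints and probability values) is invented for illustration; each pass solves the linearized problem (10) and updates the ratio λ_r:

```python
from itertools import product
from scipy.optimize import linprog

# Illustrative instance over worlds (x1, x2) with P(x1) = 0.7, P(x2) = 0.6.
# We minimize prob(x1 and x2 | x1 or x2) by Dinkelbach's iteration on (10).
worlds = list(product([0, 1], repeat=2))
A_eq = [[1] * 4, [w[0] for w in worlds], [w[1] for w in worlds]]
b_eq = [1.0, 0.7, 0.6]
num = [float(w[0] and w[1]) for w in worlds]   # A_{k^l}: x1 and x2
den = [float(w[0] or w[1]) for w in worlds]    # A_l:     x1 or x2

lam = 1.0   # any conditional probability is at most 1
for _ in range(50):
    c = [n_ - lam * d_ for n_, d_ in zip(num, den)]
    res = linprog(c, A_eq=A_eq, b_eq=b_eq)     # linearized problem (10)
    if res.fun >= -1e-9:
        break                                  # lam is the optimal ratio
    lam = (sum(n_ * x_ for n_, x_ in zip(num, res.x))
           / sum(d_ * x_ for d_, x_ in zip(den, res.x)))
print(lam)   # minimum of the conditional probability
```

On this instance the iteration stops after two LP solves, at the minimum 0.3; in general the ratio sequence λ_r decreases monotonically to the optimum.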

1.3.3 Additional Linear Constraints

Fagin, Halpern and Megiddo [78] note that if some of the π_i are not fixed they may be subject to v ≥ 1 further linear constraints. This leads to another extension:

    min / max  A_{m+1} p
    subject to:  1p = 1
                 Ap + s = π̄                                        (11)
                 Bπ = b

where B and b are a (v × m)-matrix and a v-column vector of real numbers. This includes the problem of coherence of qualitative probabilities studied by, among others, Coletti [55, 56, 57], where only order relations between probabilities are given (with an arbitrarily small approximation if some or all of the inequalities are strict). Qualitative conditional probabilities, also studied by Coletti [54, 57] and Coletti and Scozzafava [58], lead to a more complex nonlinear model.

Imprecise conditional probabilities can be treated similarly to imprecise probabilities. If π̲_{k|ℓ} ≤ π_{k|ℓ} ≤ π̄_{k|ℓ}, the corresponding lines in the linear program are

    A_{k∧ℓ} p − π̲_{k|ℓ} A_ℓ p ≥ 0
    A_{k∧ℓ} p − π̄_{k|ℓ} A_ℓ p ≤ 0.                                 (12)

Andersen and Hooker [10] propose a particular interpretation for this case, in terms of unreliable sources of information: prob(S_k | S_ℓ) is viewed as the probability that S_k is true given that the source of information ℓ is reliable. This last condition is expressed by a proposition S_ℓ, the probability of which is itself bounded by an interval:

    π̲_ℓ ≤ prob(S_ℓ) = A_ℓ p ≤ π̄_ℓ.                                 (13)

Conditional propositions themselves conditioned on the reliability of the source can also be expressed in a similar way. This is a particular case of iterated conditioning, a topic explored by, among others, Goodman, Nguyen and Walker [92] and Calabrese [42], and discussed below.

1.3.4 Logical Operations on Conditional Events and their Probabilities

Conditional probabilities P(S1|S2) may be viewed as probabilities of conditional events (S1|S2), which have three truth values: true if S1 and S2 are true, false if S1 is false and S2 true, and undetermined if S2 is false. Such conditional events, implicit in Boole [26], were defined by de Finetti [67, 68, 69] and rediscovered recently by many authors. Proposals for building an algebra of conditional events were made, more or less systematically, by Reichenbach [156], Schay [160], Adams [8], Hailperin [98, 100], Dubois and Prade [76], Bruno and Gilio [34], Calabrese [39, 40, 41, 42], and Goodman, Nguyen and Walker [92]. Several definitions, often justified on intuitive grounds, were given for conjunction and disjunction operations. The difficulty is largely due to the fact that, as shown by Lewis' Triviality Result [136], there is no expression S for (S1|S2) in Boolean algebra such that P(S) = P(S1|S2), except in very particular cases. Goodman, Nguyen and Walker [92] show that the space of conditional events is a Stone algebra, generalizing Boolean algebras. Moreover, they show that different ways to define conjunction and disjunction correspond to different three-valued logics. Schay [160] proposes two systems:

    (S1|S2) ∧ (S3|S4) = ((S̄2 ∨ S1)(S̄4 ∨ S3) | S2 ∨ S4)
    (S1|S2) ∨ (S3|S4) = (S1S2 ∨ S3S4 | S2 ∨ S4)                    (14)

and

    (S1|S2) ∧ (S3|S4) = (S1S3 | S2S4)
    (S1|S2) ∨ (S3|S4) = (S1 ∨ S3 | S2S4).                          (15)

Goodman and Nguyen [91] propose another one:

    (S1|S2) ∧ (S3|S4) = (S1S3 | S̄1S2 ∨ S̄3S4 ∨ S2S4)
    (S1|S2) ∨ (S3|S4) = (S1 ∨ S3 | S1S2 ∨ S3S4 ∨ S2S4).            (16)

All three systems have negation defined by

    ¬(S1|S2) = (S̄1S2 | S2).                                        (17)

Truth tables for S̄1, S1 ∨ S2 and S1 ∧ S2 as functions of S1 and S2 deduced from rules (14) and (17), and from (15) and (17), are those of Sobocinski's and Bochvar's three-valued logics. Those for the system (16)-(17) correspond to Lukasiewicz's and Kleene's three-valued logics (as well as to Heyting's three-valued logic concept for S̄1). These results show that any algebraic expression of conditional events can be reduced (in several ways) to a single conditional event. Probabilities of such compound expressions can thus be expressed in probabilistic satisfiability models as usual conditional probabilities. Iterated conditionals have also been reduced to conditionals in various ways. For instance, Calabrese [42] proposes the relation

    (S1|S2) | (S3|S4) = (S1|S2) ∧ (S3 ∨ S̄4).                       (18)

The subject is also discussed in detail in Goodman, Nguyen and Walker [92].
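The correspondence between system (16) and the strong Kleene connectives can be checked mechanically. The sketch below (assuming the reconstruction of (16) above; all names are illustrative) enumerates every truth assignment and compares the Goodman-Nguyen conjunction, evaluated world by world, with Kleene's three-valued conjunction of the component truth values:

```python
from itertools import product

T, F, U = "T", "F", "U"

def cond_value(a, b):
    """Three-valued status of the conditional event (a|b) in one world."""
    return (T if a else F) if b else U

def gn_and(a1, b1, a2, b2):
    """Goodman-Nguyen conjunction (16): S1S3 | (~S1 S2 v ~S3 S4 v S2 S4)."""
    antecedent = (not a1 and b1) or (not a2 and b2) or (b1 and b2)
    return cond_value(a1 and a2, antecedent)

def kleene_and(u, v):
    """Strong Kleene conjunction on {T, F, U}: F dominates, then U."""
    if F in (u, v):
        return F
    if U in (u, v):
        return U
    return T

# Check the claimed correspondence for all 16 truth assignments.
for a1, b1, a2, b2 in product([False, True], repeat=4):
    lhs = gn_and(a1, b1, a2, b2)
    rhs = kleene_and(cond_value(a1, b1), cond_value(a2, b2))
    assert lhs == rhs
print("Goodman-Nguyen conjunction matches strong Kleene conjunction")
```

The same exhaustive-enumeration pattern can be used to compare (14) and (15) with Sobocinski's and Bochvar's connectives.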

2 Analytical solution of PSAT

2.1 Boole's algebraic method

Boole [26, 27, 28, 29, 30] proposed several methods (some of which are approximate) to solve analytically the decision and optimization versions of probabilistic satisfiability. Methods for both cases are similar. Boole equates truth of a logical sentence with the value 1 and falsity with 0. His simplest and most efficient method proceeds as follows:

Algorithm B (Boole's method):
(i) express all logical sentences as sums of complete products, i.e., products of all variables in direct or complemented form;
(ii) associate to each of these products an unknown probability p_j; write linear equations stating that the sum of the probabilities p_j of the complete products associated with a logical sentence is equal to the (given) probability π_i of that sentence being true. Add constraints stating that the probabilities p_j of all complete products sum to 1 and are non-negative;
(iii) eliminate from the equalities and inequalities as many probabilities p_j as possible using the equalities;
(iv) eliminate from the inequalities obtained in the previous step the remaining probabilities p_j, as well as π_{m+1}, by considering all upper bounds and all lower bounds on one of them, stating that each lower bound is less than or equal to each upper bound, removing redundant constraints and iterating.

The relations obtained involving π1, ..., πm are Boole's conditions of possible experience; the relations also involving π_{m+1} give best possible bounds on this last probability, i.e., are the solution to Boole's general problem.


Example 1. (Boole's challenge problem, 1851 [23])

Let

    prob(S1 ≡ x1) = π1
    prob(S2 ≡ x2) = π2
    prob(S3 ≡ x1x3) = π3
    prob(S4 ≡ x2x3) = π4
    prob(S5 ≡ x̄1x̄2x3) = π5 = 0.

Find best possible bounds on the probability of S6 ≡ x3. Step (i) gives:

    x1   = x1x2x3 + x1x2x̄3 + x1x̄2x3 + x1x̄2x̄3
    x2   = x1x2x3 + x1x2x̄3 + x̄1x2x3 + x̄1x2x̄3
    x1x3 = x1x2x3 + x1x̄2x3
    x2x3 = x1x2x3 + x̄1x2x3.

Step (ii), after setting p1 = prob(x1x2x3), p2 = prob(x1x2x̄3), p3 = prob(x1x̄2x3), p4 = prob(x1x̄2x̄3), p5 = prob(x̄1x2x3), p6 = prob(x̄1x2x̄3), p7 = prob(x̄1x̄2x3), p8 = prob(x̄1x̄2x̄3), yields the following equalities and inequalities:

    p1 + p2 + p3 + p4 = π1
    p1 + p2 + p5 + p6 = π2
    p1 + p3 = π3
    p1 + p5 = π4
    p7 = 0
    p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 = 1
    p1, p2, p3, ..., p8 ≥ 0.

Eliminating successively the variables p7, p4, p3, p6, p5, p1 and p2 yields, at the end of Step (iii), the bounds

    max(π3, π4) ≤ π6 ≤ min(1 − π1 + π3, π3 + π4, 1 − π2 + π4)

and the conditions

    π1 ≥ π3,   π2 ≥ π4.

Eliminating π6 yields the additional condition

    π1 − π3 + π4 ≤ 1.   □
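As a numeric sanity check of Example 1, one can feed system (1) to a linear programming solver under an arbitrary consistent assignment of the π_i and compare with the analytic bounds. With the invented values π1 = 0.6, π2 = 0.5, π3 = 0.3, π4 = 0.2, π5 = 0, the bounds above give max(π3, π4) = 0.3 and min(0.7, 0.5, 0.7) = 0.5:

```python
from itertools import product
from scipy.optimize import linprog

# Assumed consistent probability assignment for Boole's challenge problem.
pi1, pi2, pi3, pi4 = 0.6, 0.5, 0.3, 0.2
worlds = list(product([0, 1], repeat=3))   # (x1, x2, x3)
rows = [
    lambda x: x[0],                              # S1 = x1
    lambda x: x[1],                              # S2 = x2
    lambda x: x[0] and x[2],                     # S3 = x1 x3
    lambda x: x[1] and x[2],                     # S4 = x2 x3
    lambda x: (not x[0]) and (not x[1]) and x[2] # S5, with probability 0
]
A_eq = [[1] * 8] + [[int(f(w)) for w in worlds] for f in rows]
b_eq = [1.0, pi1, pi2, pi3, pi4, 0.0]
c = [int(w[2]) for w in worlds]                  # objective: prob(x3)

lo = linprog(c, A_eq=A_eq, b_eq=b_eq)
hi = linprog([-v for v in c], A_eq=A_eq, b_eq=b_eq)
print(lo.fun, -hi.fun)   # should match Boole's analytic bounds 0.3 and 0.5
```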

2.2 Hailperin's extensions of Boole's algebraic method

Boole's algebraic method can be extended to deal with conditional probabilities. In fact, Boole [26] himself already considered some problems involving conditional probabilities, but a systematic treatment was only provided by Hailperin [100] (and independently, to some extent, by Chesnokov [51]). As mentioned above, two cases arise. First, one may have a conditional probability in the objective function only. Then one can set up the problem's constraints as done above, express the objective function as a ratio of linear expressions and use Charnes and Cooper's [45] result to obtain the equivalent linear program (9). Eliminating the variables p_j and t as above leads to an analytical solution.

Example 2. (Hailperin, 1986 [100])

Given prob(x1) = π1 and prob(x2) = π2, find best possible bounds on prob(x1x2 | x1 ∨ x2). Let p1 = prob(x1x2), p2 = prob(x1x̄2), p3 = prob(x̄1x2), p4 = prob(x̄1x̄2). Then this problem can be expressed as

    min / max  p1 / (p1 + p2 + p3)
    subject to:  p1 + p2 = π1
                 p1 + p3 = π2
                 p1 + p2 + p3 + p4 = 1
                 p1, p2, p3, p4 ≥ 0,  p1 + p2 + p3 > 0.

The equivalent problem (9) is

    min / max  p1
    subject to:  p1 + p2 = π1 t
                 p1 + p3 = π2 t
                 p1 + p2 + p3 + p4 = t
                 p1 + p2 + p3 = 1
                 p1, p2, p3, p4 ≥ 0,  t ≥ 0.

Eliminating successively p2, p4, p3 and t yields the bounds

    max{0, π1 + π2 − 1} ≤ prob(x1x2 | x1 ∨ x2) ≤ min{π1/π2, π2/π1}.

There are no other conditions than 0 ≤ π1 ≤ 1 and 0 ≤ π2 ≤ 1.   □
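Example 2 can be replayed numerically through the Charnes-Cooper program (9). The values π1 = 0.7 and π2 = 0.6 below are invented for illustration; the analytic bounds give max{0, 0.3} = 0.3 and min{7/6, 6/7} = 6/7:

```python
from scipy.optimize import linprog

# Charnes-Cooper form (9) of Example 2; variables are (p1, p2, p3, p4, t),
# already scaled so that the denominator p1 + p2 + p3 equals 1.
pi1, pi2 = 0.7, 0.6
A_eq = [
    [1, 1, 0, 0, -pi1],   # p1 + p2 = pi1 * t
    [1, 0, 1, 0, -pi2],   # p1 + p3 = pi2 * t
    [1, 1, 1, 1, -1],     # p1 + p2 + p3 + p4 = t
    [1, 1, 1, 0, 0],      # normalization of the denominator
]
b_eq = [0, 0, 0, 1]
c = [1, 0, 0, 0, 0]       # objective: the (scaled) numerator p1

lo = linprog(c, A_eq=A_eq, b_eq=b_eq)
hi = linprog([-v for v in c], A_eq=A_eq, b_eq=b_eq)
print(lo.fun, -hi.fun)    # compare with max{0, pi1+pi2-1}, min{pi1/pi2, pi2/pi1}
```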

Second, one may have conditional probabilities in the constraints. The elimination process can be applied to these constraints, written, e.g., in the form (6). Note that this procedure amounts to solving a linear program with parametric right-hand side, or with some parameters in the coefficient matrix, by Fourier-Motzkin elimination (Dantzig and Eaves [65]).

Example 3. (Suppes, 1966 [165]; Hailperin, 1986 [100])

Given prob(x1) = π1 and prob(x2 | x1) = π_{2|1}, find best possible bounds on prob(x2). Defining p1, p2, p3, p4 as in Example 2, this problem can be expressed as

    min / max  π2 = p1 + p3
    subject to:  p1 + p2 = π1
                 (1 − π_{2|1}) p1 − π_{2|1} p2 = 0
                 p1 + p2 + p3 + p4 = 1
                 p1, p2, p3, p4 ≥ 0.

Eliminating successively p4, p3, p2 and p1 yields the bounds

    π_{2|1} π1 ≤ π2 ≤ 1 − π1 (1 − π_{2|1}).

The lower bound was found in another way by Suppes [165]. Again, there are no conditions except 0 ≤ π1 ≤ 1 and 0 ≤ π_{2|1} ≤ 1.   □

Observe that these methods for determining combination rules for probability bounds on logical sentences are quite general. They could be used to obtain in a uniform way the many rules of probabilistic logic gathered by Frisch and Haddawy [83], or to study under which conditions high probability of a set of sentences entails high probability of another one, or of one among another set, as studied by Adams [3, 4, 5, 6, 7] and Bamber [18]. They can also be used to check whether combination rules based on other grounds than probability theory agree with this theory or not, possibly with further assumptions (e.g., Guggenheimer and Freedman [96], Dubois and Prade [75], Stephanou and Sage [164]).

2.3 Polyhedral methods to obtain rules for combining bounds on probabilities

Other methods than Fourier-Motzkin elimination for obtaining an analytical solution of probabilistic satisfiability have been devised. They are based on the study of the dual polyhedra for (2). Let the dual of (2) be written:

    min (max)  y0 + πy
    subject to:  1^t y0 + A^t y ≥ A^t_{m+1}     (1^t y0 + A^t y ≤ A^t_{m+1}).    (19)

Observe that the constraints of (19) are satisfied by the vector (1, 0) ((0, 0)), so the corresponding polyhedra are non-empty. Then, using the duality theorem of linear programming, yields

Theorem 1 (Hailperin [97]) The best lower (upper) bound for π_{m+1} is given by the following convex (concave) piecewise linear function of the probability assignment:

    π̲_{m+1}(π) = max_{j=1,2,...,k_max} (1, π)^t y^j_max
    π̄_{m+1}(π) = min_{j=1,2,...,k_min} (1, π)^t y^j_min                         (20)

where the y^j_max (y^j_min) for all j represent the k_max (k_min) extreme points of (19).

This result gives bounds on π_{m+1} but not the conditions of possible experience. It has recently been completed. Consider first the dual of the probabilistic satisfiability problem in decision form (1), after adding a dummy objective function 0p, to be maximized:

    min  y0 + πy
    subject to:  1^t y0 + A^t y ≥ 0.                                             (21)

Then, using the fact that any point in a polyhedron can be expressed as a convex linear combination of its extreme points plus a linear combination of its extreme rays (Caratheodory's theorem [43]), and once again the duality theorem, yields

Theorem 2 (Hansen, Jaumard and Poggi de Aragão [112]) The probabilistic satisfiability problem (1) is consistent if and only if

    (1, π)^t r ≥ 0                                                               (22)

for all extreme rays r of (21).

The same argument shows that (22) yields all the conditions of possible experience for problem (2). Both Theorems 1 and 2 readily extend to the case of probability intervals (problem (3)) but not to the case of conditional probabilities. The reason is that the constraints of (19) and (21) do not depend on π, but that property ceases to hold when there are conditional probabilities.

Several authors study analytically conditions of possible experience and bounds for particular classes of propositional logic sentences. Andersen [9] and Andersen and Hooker [12] consider a subclass of Horn clauses which can be represented by a directed graph G = (V, U). Vertices v_i ∈ V are associated with atomic propositions S_i (or logical variables) and arcs (v_i, v_k) with implications. Truth of the conjunction of the variables x_i associated with the predecessors v_i of a vertex v_k implies the truth of the variable x_k associated with that vertex. Adams [3, 4, 5, 6, 7] and Bamber [18] examine when high probability for a given set of sentences (possibly including conditionals) implies high probability of another sentence, or of at least one sentence among another given set.

2.4 Automated theorem proving with probabilistic satisfiability

The results of the previous subsection lend themselves easily to automation. While this could also be done for Fourier-Motzkin elimination, it would probably be more time-consuming, as finding all implied relations and eliminating redundant ones are tasks whose difficulty rapidly augments with problem size (but that approach remains of interest, and apparently the only feasible one, when there are conditional probabilities). Numerous algorithms have been proposed for vertex and extreme ray enumeration of polyhedra; see, e.g., Dyer [77] and Chen, Hansen and Jaumard [48] for surveys. Usually methods proposed for vertex enumeration can be extended to handle ray enumeration also. Approaches to vertex enumeration include:

(i) exploration of the adjacency graph G = (V, E) of the polyhedron, where vertices v_j of G are associated with extreme points x_j of the polyhedron and edges {v_j, v_k} ∈ E join pairs of vertices v_j, v_k associated with extreme points x_j, x_k which are the endpoints of an edge of this polyhedron. The exploration rule is depth-first search (Dyer [77]) or breadth-first search. The difficulty lies in determining whether a vertex has already been visited; long lists of visited vertices must be kept in most methods;

(ii) the reverse search approach of Avis and Fukuda [15], which avoids this last problem by defining a priori an arborescence on the graph G = (V, E). This is done by using Bland's [21] rule for the choice of the entering variable, even in case of degeneracy, in the simplex algorithm. When applying depth-first search, Bland's rule is reversed when arriving at a vertex x_ℓ. If the vertex x_ℓ is the one associated with the vertex x_k from which one comes, then x_ℓ is considered as first explored and stored; otherwise backtracking takes place;

(iii) the adjacency lists method of Chen, Hansen and Jaumard [47], which does not use the simplex algorithm but keeps adjacency lists for vertices of polyhedra having initially only a few constraints, and updates them when adding constraints one at a time.

Note that when applying such methods to probabilistic satisfiability degeneracy is frequent and must be taken care of. Automated derivation of bounds and conditions of possible experience makes it easy to study variants of a problem, e.g., going from point probabilities to intervals, as next illustrated.

Example 4.

Consider again Example 1, but without xing 5 at the value 0. Then conditions of possible experience and bounds, automatically obtained (Hansen, Jaumard and Poggi de Arag~ao [112]), are:

Conditions of possible experience Lower bounds 1 3 3 + 5 2 4 4 + 5 1 + 5 1 2 + 5 1 3 + 1 1 + 4 + 5 4 + 1 2 + 3 + 5 0 i 1 i = 1; 2; : : : ; 5

Upper bounds (1 ? 1) + 3 (1 ? 2) + 4 3 + 4 + 5

Replacing all point values 1; 2; : : :; 5 by intervals [1; 1]; [2; 2]; : : :; [5; 5] leads to:


Conditions of possible experience:
  πi⁻ ≤ πi⁺,   0 ≤ πi⁻,   πi⁺ ≤ 1,  i = 1, 2, …, 5,
  π3⁻ ≤ π1⁺,   π4⁻ ≤ π2⁺,
  π1⁻ + π5⁻ ≤ 1,   π2⁻ + π5⁻ ≤ 1,   π4⁻ + π5⁻ ≤ 1,
  π1⁻ + π4⁻ + π5⁻ ≤ π3⁺ + 1,   π2⁻ + π3⁻ + π5⁻ ≤ π4⁺ + 1.

Lower bounds:   π3⁻ + π5⁻,   π4⁻ + π5⁻.

Upper bounds:   (1 − π1⁻) + π3⁺,   (1 − π2⁻) + π4⁺,
  π3⁺ + π4⁺ + π5⁺,   π1⁺ + π4⁺ + π5⁺,   π2⁺ + π3⁺ + π5⁺,   π1⁺ + π2⁺ + π5⁺.

It can be proved (Hansen, Jaumard and Poggi de Aragão [112]) that the bounds obtained in the case of point probabilities are never redundant. In other words, for each bound there is always a vector (1, π) for which the corresponding vertex of the dual polytope is optimal, and the bound is attained. This property does not hold anymore in the case of probability intervals.

2.5 Theorem proving with condensed forms of probabilistic satisfiability

As mentioned above, probabilistic satisfiability as expressed in (1) or (2) leads to very large programs. When studying particular cases, one may condense rows or columns by summing them, to drastically reduce the size of these programs. This approach, explored by Kounias and Marin [130], Prekopa [151, 152, 153], Boros and Prekopa [31] and Kounias and Sotirakoglou [131], has led to generalizations and improvements of several important results in probability theory. Consider for instance n events and assume that the sums S1, S2, …, Sm of the probabilities of all products of 1, 2, …, m events, i.e., the first m binomial moments, are given.

Let vi for i = 0, 1, …, n denote the probability that exactly i events occur. Then Σ_{i=1}^n vi is the probability that at least one event occurs. The well-known Bonferroni [22] inequalities state that

  Σ_{i=1}^n vi ≤ S1,
  Σ_{i=1}^n vi ≥ S1 − S2,
  Σ_{i=1}^n vi ≤ S1 − S2 + S3,

and so on. Various authors have proposed improved formulae in which the right-hand side coefficients are not all equal to 1 or −1. The problem of finding best bounds can be written (Prekopa [151]):

  min / max  Σ_{i=1}^n vi
  subject to:
    Σ_{i=1}^n C_{i1} vi = S1
    ⋮                                  (23)
    Σ_{i=1}^n C_{im} vi = Sm
    vi ≥ 0,  i = 0, 1, …, n,

where the C_{ij} are binomial coefficients. Problem (23) can be viewed as a condensed form of a probabilistic satisfiability problem in which logical sentences correspond to all products of up to n variables in direct or complemented form. Using a result of Fekete and Polya [79], Prekopa [151] solves the dual of (23) explicitly, thus obtaining best possible "Boole-Bonferroni" bounds. Boros and Prekopa [31], Prekopa [152, 153] and Kounias and Sotirakoglou [131] generalize these results in several ways. Lad, Dickey and Rahman [133] use the probabilistic satisfiability model in a different way, to extend the classical Bienayme-Chebychev [46] inequality in the context of finite discrete quantities.
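For m = 2 given moments, the linear program (23) is small enough to solve by enumerating its basic feasible solutions, which have at most two positive vi. A minimal runnable sketch (the instance data below — four events with S1 = 1.2, S2 = 0.54 — are illustrative, not taken from the text):

```python
from math import comb
from itertools import combinations

def moment_bounds(n, s1, s2):
    """Best bounds on P(at least one of n events occurs), given the first
    two binomial moments s1, s2, by enumerating the basic feasible
    solutions of LP (23) with m = 2 (at most two positive v_i each)."""
    lo, hi = float("inf"), float("-inf")
    for i, j in combinations(range(1, n + 1), 2):
        # solve  i*vi + j*vj = s1,  C(i,2)*vi + C(j,2)*vj = s2
        a, b, c, d = i, j, comb(i, 2), comb(j, 2)
        det = a * d - b * c
        if det == 0:
            continue
        vi = (s1 * d - s2 * b) / det
        vj = (a * s2 - c * s1) / det
        if vi >= -1e-12 and vj >= -1e-12:    # basic feasible solution
            lo, hi = min(lo, vi + vj), max(hi, vi + vj)
    return lo, min(hi, 1.0)                  # a probability cannot exceed 1
```

For four independent events of probability 0.3 (so S1 = 1.2, S2 = 0.54) this yields the interval [0.66, 0.93], which brackets the true value 1 − 0.7⁴ = 0.7599; it matches the Bonferroni lower bound S1 − S2 = 0.66 here and improves the trivial upper bound min(1, S1) to 0.93.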

3 Numerical Solution of PSAT

3.1 Column Generation

The linear programs (1) and (2), which express the probabilistic satisfiability problem in decision and optimization versions, have a number of columns which grows exponentially with the minimum of the number m of sentences and the number n of logical variables in these sentences. In view of the enormous size of these programs (about 10^9 columns for min(m, n) = 30, 10^18 columns for min(m, n) = 60, etc.), it has been stated several times in the AI literature that they are intractable in a practical sense, not only in the worst case (as will be shown below). For instance, Nilsson [145], in a recent review of work subsequent to his "Probabilistic Logic" paper of 1986 [144], writes about the "total impracticality of solving large instances" and recommends looking for heuristics. Such views are overly pessimistic: while even writing large probabilistic satisfiability problems explicitly is impossible, they can be solved quite efficiently by keeping them implicit. The tool to be used is an advanced one of linear programming, called column generation. It extends the revised simplex method, in which only a small number of columns are kept explicitly, by determining the entering column through solution of an auxiliary subproblem. This subproblem depends on the type of problem considered and is usually one of combinatorial programming. We next recall the principle of the column generation method for linear programming. Consider the linear program

  min z = cx
  subject to:
    Ax = b,  x ≥ 0                       (24)

and its solution by the simplex algorithm (e.g., Dantzig [64]). At a current iteration (after a possible reindexing of the variables), let A = (B, N), where B and N denote the submatrices of basic and nonbasic columns respectively.

Problem (24) can be expressed as

  min z = c_B B⁻¹ b + (c_N − c_B B⁻¹ N) x_N
  subject to:
    x_B + B⁻¹ N x_N = B⁻¹ b,   x_B, x_N ≥ 0,       (25)

where x_B, x_N are the vectors of basic and nonbasic variables and c_B, c_N the corresponding vectors of coefficients in the objective function. In the revised simplex method, one stores only the matrix B⁻¹ (in compact form), the current basic solution B⁻¹ b and its value c_B B⁻¹ b, in addition to the data. The entering variable is determined by computing the smallest reduced cost, using the initial data, i.e.,

  c_k − c_B B⁻¹ A_k = min_{j∈N} c_j − c_B B⁻¹ A_j = min_{j∈N} c_j − u A_j       (26)

where u = c_B B⁻¹ is the current vector of dual variables. This computation is not too time-consuming provided the matrix A is sparse and the columns not too numerous. Then the entering column is computed as B⁻¹ A_k and the simplex iteration proceeds as usual (optimality check, unboundedness check, choice of leaving variable, updating of solution and basis inverse). If the number of columns is exponential in the input size, one must compute

  min_{j∈N} c_j − u A_j                               (27)

without considering nonbasic columns one at a time. This is done by a specific algorithm in which the coefficients in the columns A_j are the variables. For probabilistic satisfiability the subproblem (27) is

  min_{j∈N} c_j − u A_j = S_{m+1} − u_0 − Σ_{i=1}^m u_i S_i       (28)

where, as discussed above, the values True and False of the S_i, i = 1, …, m+1, are identified with the numbers 1 and 0. Then (28) is transformed into an arithmetical expression involving the logical variables x_1, …, x_n appearing in the S_i, with the values true and false also associated with 1 and 0. This is done by eliminating the usual Boolean connectives ∨, ∧ and ¬ using the relations

  x_i ∨ x_j → x_i + x_j − x_i x_j,
  x_i ∧ x_j → x_i x_j,                          (29)
  x̄_i → 1 − x_i.

The resulting expression is a nonlinear (or multilinear) real-valued function in 0–1 variables, or nonlinear 0–1 function, or pseudo-Boolean function (Hammer and Rudeanu [104]).

Example 5. Consider again the problem of Example 1. Then subproblem (28) is

  min S_6 − u_0 − u_1 S_1 − u_2 S_2 − u_3 S_3 − u_4 S_4 − u_5 S_5
    = x_3 − u_0 − u_1 x_1 − u_2 x_2 − u_3 x_1 x_3 − u_4 x_2 x_3 − u_5 x̄_1 x̄_2 x_3
    = (1 − u_5) x_3 − u_0 − u_1 x_1 − u_2 x_2 + (u_5 − u_3) x_1 x_3 + (u_5 − u_4) x_2 x_3 − u_5 x_1 x_2 x_3

with x_1, x_2, x_3 ∈ {0, 1}. □

Note that if the probabilistic satisfiability problem considered is in decision form, one performs only Phase 1 of the simplex algorithm, with column generation: minimization of the sum of artificial variables added to the constraints. The corresponding columns are kept explicit as long as their variables remain in the basis; they can be discarded otherwise.
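When the sentences are clauses, subproblem (28) together with the arithmetization (29) can be prototyped by brute force before any heuristic is attempted. A sketch (the clause data and dual weights below are illustrative, not from the text):

```python
from itertools import product

def clause_value(clause, x):
    # clause: iterable of signed literals, +i for x_i and -i for its negation;
    # returns 1 if the truth assignment x satisfies the clause, else 0
    return int(any(x[abs(l) - 1] == (l > 0) for l in clause))

def min_reduced_cost(n, clauses, objective, u0, u):
    """Brute-force version of subproblem (28): minimize
    S_{m+1}(x) - u0 - sum_i u_i S_i(x) over all 2^n truth assignments."""
    best = (None, float("inf"))
    for x in product([False, True], repeat=n):
        val = clause_value(objective, x) - u0 \
              - sum(ui * clause_value(c, x) for ui, c in zip(u, clauses))
        if val < best[1]:
            best = (x, val)
    return best
```

For realistic sizes the 2^n enumeration is hopeless, which is exactly why the heuristics and exact algorithms of the next subsections are needed.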

3.2 Solution of the auxiliary problem

3.2.1 Heuristics

Problem (28) must be solved at each iteration of the column generation method and may be time-consuming. Indeed, minimization of a nonlinear 0–1 function is NP-hard, as numerous NP-hard problems, e.g., independent set, can easily be expressed in that form. However, to guarantee convergence it is not mandatory to solve (28) exactly at all iterations. As long as a negative reduced cost (for minimization) is found, an iteration of the revised simplex algorithm may be done. If a feasible solution is obtained in that way, the decision version of probabilistic satisfiability is solved. When no more negative reduced cost is given by the heuristic, one must turn to an exact algorithm to prove that there is no feasible solution for the decision version of probabilistic satisfiability, or no feasible solution giving a better bound than the incumbent for the optimization version. It is worth stressing that while stopping the column generation method prior to optimality yields valid bounds for many combinatorial optimization problems (obtained by exploiting an equivalent but hard-to-solve compact formulation with a polynomial number of columns, and duality theory), this is not the case for probabilistic satisfiability. Indeed, no such compact form is known, and stopping before getting the best possible bounds yields only an upper bound on a lower bound (or a lower bound on an upper bound) of the objective function values. Such results are only estimates of those values and not bounds. The same is true when possible worlds are drawn at random, as suggested by Henrion [115]. As for large instances the number of iterations may be in the hundreds or thousands, designing efficient heuristics for (28) is of importance. Note that this problem may be viewed as a weighted version of maximum satisfiability (maxsat): given a set of m weighted clauses on n logical variables, determine a truth assignment such that the sum of weights of the satisfied clauses is greater than or equal to a given value. Therefore, algorithms for the subproblem (both heuristic and exact) also apply to the satisfiability (sat) problem and to constraint satisfaction problems expressed in satisfiability form. Conversely, some recent algorithms for sat (e.g., Selman, Levesque and Mitchell's gsat [161]) could be extended to weighted maxsat. An early heuristic which could apply to (28) (written in maximization form) is the steepest-ascent one-point move (saopma) method of Reiter and Rice [157]. It proceeds by choosing a first truth assignment (or 0–1 vector) at random, then complementing the variable for which the resulting increase in objective function value is largest, and iterating as long as there is a positive increase. The trouble with such a method is that it quickly gets stuck in a local optimum, which may have a value substantially worse than that of the global optimum. Improvements can be obtained by repeating the process a certain number of times (the so-called multistart procedure), but this may still give solutions far from the optimum. Much better results are obtained using so-called modern heuristics (see, e.g., Reeves [158] for a book-length survey), which provide ways to get out of local optima. Among the earliest and best known of such methods is simulated annealing (Kirkpatrick, Gelatt and Vecchi [127], Cerny [44]). In this method, moves (variable complementations for weighted maxsat) are made by choosing a direction at random, accepting the move if it improves the objective function value, and possibly also if it does not, with a probability which decreases with the amount of deterioration and the time since inception of the algorithm. Figure 1 provides a description of simulated annealing for weighted maxsat, adapted from Dowsland [72]; see also Hansen and Jaumard [106] for the unweighted case.


Simulated annealing for minimizing a weighted maxsat function, with objective function f(x) equal to the sum of weights of clauses satisfied by x, and neighborhood structure N(x) equal to the set of vectors obtained by complementing one variable of x.

  Select an initial solution x0;
  Select an initial temperature t0 > 0;
  Select a temperature reduction function α;
  Repeat
    Repeat
      Randomly select x ∈ N(x0);
      δ = f(x) − f(x0);
      If δ < 0 then x0 = x
      else
        generate random q uniformly in the range (0, 1);
        if q < exp(−δ/t) then x0 = x;
    Until iteration-count = nrep;
    Set t = α(t);
  Until stopping condition = true.
  x0 is an approximation to the optimal solution.

See Dowsland [72], van Laarhoven and Aarts [169] and Aarts and Korst [1] for discussions of the choice of the parameters t0 and nrep, the "cooling" function α and the stopping condition.

Figure 1

Simulated annealing exploits mostly the sign of the gradient of the objective value and not its magnitude (which enters only into the probability of accepting a deteriorating move). In contrast, Tabu Search methods (e.g., Glover [87, 88], Hansen and Jaumard [106]) fully exploit gradient information while still providing a way to get out of local minima. In a simple version of such a method for maxsat, called steepest-ascent-mildest-descent (samd) and due to Hansen and Jaumard [106], a direction of steepest ascent is followed until a local maximum is reached, then a direction of mildest descent is taken, and cycling is avoided (at least for some time) by forbidding a reverse move for a given number of iterations. Figure 2 provides a description of such an algorithm for weighted maxsat, close to that of Hansen and Jaumard [106].

Note that the unweighted version of samd also applies to the satisfiability problem sat, in which one is only interested in solutions satisfying all clauses. It exploits gradient information in the ascent phase, as do the gsat algorithm of Selman, Levesque and Mitchell [161] and the algorithm of Gu [95], and searches with tabus which forbid backtracking for some iterations to get out of a plateau. The latter two algorithms do this by flipping variables (in unsatisfied clauses for gsat) at random.

Steepest ascent mildest descent for minimizing a weighted maxsat function.

  Select an initial solution x0;
  fopt = f(x0); xopt = x0;
  Set tj = 0 for j = 1, …, n;
  Repeat
    f'opt = fopt;
    Repeat
      Select xk ∈ N(x0) such that δk = f(xk) − f(x0) = min_{j : tj = 0} δj;
      x0 = xk;
      If f(x0) < fopt then fopt = f(x0); xopt = x0; endif;
      If δk > 0 then tk = ℓ;
      Set tj = tj − 1 for all j with tj > 0, j = 1, 2, …, n;
    Until iteration-counter = nrep;
  Until f'opt = fopt.
  xopt is an approximation to the optimal solution.

See Hansen and Jaumard [106] for a discussion of the choice of the parameters nrep and ℓ (length of the tabu list).

Figure 2

Kavvadias and Papadimitriou [124] propose a different way to exploit the gradient, i.e., variable depth search, which is based on ideas of Lin and Kernighan [137] for the travelling salesman problem. An initial solution is drawn at random, then moves are made along a direction of steepest ascent or mildest descent among unexplored directions. In this way one eventually gets to the complement of the initial truth assignment.

Then the best solution along the path so explored is selected, and the procedure is iterated as long as an improved solution is found. The rules of this method are given in Figure 3. Experiments conducted in the unweighted case show Tabu Search to give better results, and to obtain them more quickly, than simulated annealing [106]. From further unpublished results, variable depth search appears to be almost as good as, but not better than, Tabu Search.
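The samd scheme of Figure 2 is equally short to prototype. A deterministic sketch, written for maximization; the tabu tenure and iteration limit are illustrative, not the paper's settings:

```python
def samd_maxsat(clauses, weights, n, tenure=3, max_iter=60):
    """Steepest-ascent mildest-descent sketch in the spirit of Figure 2,
    for weighted maxsat (maximization); signed-literal clauses as before."""
    def f(x):  # total weight of satisfied clauses
        return sum(w for w, c in zip(weights, clauses)
                   if any(x[abs(l) - 1] == (l > 0) for l in c))

    x = [False] * n
    cur = f(x)
    best, best_val = x[:], cur
    tabu = [0] * n                        # remaining tabu iterations per variable
    for _ in range(max_iter):
        tabu = [max(0, t - 1) for t in tabu]
        candidates = [j for j in range(n) if tabu[j] == 0]
        if not candidates:
            continue

        def gain(j):                      # effect of flipping variable j
            x[j] = not x[j]
            g = f(x) - cur
            x[j] = not x[j]
            return g

        k = max(candidates, key=gain)     # steepest ascent, else mildest descent
        g = gain(k)
        x[k] = not x[k]
        cur += g
        if cur > best_val:
            best, best_val = x[:], cur
        if g < 0:                         # downhill move: forbid flipping back
            tabu[k] = tenure
    return best, best_val
```

Unlike simulated annealing, every move here uses the magnitude of the gains, and the tabu list alone prevents immediate cycling out of a local maximum.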

Variable depth search for minimizing a weighted maxsat function.

  Select an initial solution x0;
  fopt = f(x0); xopt = x0;
  Repeat
    f'opt = fopt;
    Set tj = 0 for j = 1, …, n;
    Repeat
      Select xk ∈ N(x0) such that δk = f(xk) − f(x0) = min_{j : tj = 0} δj;
      x0 = xk; tk = 1;
      If f(x0) < fopt then fopt = f(x0); xopt = x0; endif;
    Until all tj = 1;
    x0 = xopt;
  Until f'opt = fopt.
  xopt is an approximation to the optimal solution.

Figure 3

3.2.2 Exact algorithms

When no more negative reduced cost (in case of minimization) can be found by a heuristic, and if no feasible solution has been obtained when considering the decision version of probabilistic satisfiability, the auxiliary problem must be solved exactly. Research on maximization of nonlinear functions in 0–1 variables is well developed; see Hansen, Jaumard and Mathon [109] for a recent survey. Methods are based on: (i) linearization; (ii) Boolean manipulations (or algebra); (iii) implicit enumeration; and (iv) cutting planes. The first two types of methods have been applied to probabilistic satisfiability and are the only ones reviewed here (the other ones also hold promise, and an experimental comparison for probabilistic satisfiability problems and extensions would be of interest). Linearization is done by replacing products of variables by new variables and adding constraints to ensure that values agree in 0–1 variables (Dantzig [63], Fortet [80, 81]). Consider a term

  c Π_{j∈J} x_j                            (30)

where c ∈ ℝ and x_j ∈ {0, 1} for j ∈ J. Then (30) is equivalent to c y subject to:

  y ≥ Σ_{j∈J} x_j − |J| + 1,
  y ≤ x_j,  j ∈ J,                          (31)
  y ≥ 0,

as the first constraint forces y to be equal to 1 when all x_j for j ∈ J are equal to 1, and the last two sets of constraints force y to be equal to 0 as soon as one of these x_j is equal to 0. Note that it need not be explicitly specified that y is a 0–1 variable. Moreover, if c > 0 the first constraint may be omitted, and if c < 0 the last constraints may be omitted, as the variable y appears in no other term or constraint and hence automatically takes the required value at the optimum. Linearization as done above introduces as many new variables as there are nonlinear terms in the function to be minimized (or maximized), and a number of new constraints equal to the number of nonlinear terms with a negative coefficient plus the number of nonlinear terms with a positive coefficient multiplied by the average number of variables in these terms. So the size of the resulting linear 0–1 program increases quickly with m, n and the number of non-zero dual variables u_i. Fortunately, it turns out that this last number tends to be small at the optimum.
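The bookkeeping in (31), together with the sign-dependent omission of constraints just described (stated for a maximization problem), is easy to mechanize. A sketch that emits the constraint rows as strings (the variable names are illustrative; a real model would hand them to an LP solver):

```python
def linearize_term(coeff, vars_, y):
    """Constraint rows (31) for replacing the product coeff * prod_{j} x_j
    (x_j in vars_) by coeff * y, in a maximization problem: for coeff < 0
    only the lower-bounding row is needed, for coeff > 0 only the
    upper-bounding rows."""
    rows = []
    if coeff < 0:   # y must not be able to stay at 0 when all x_j = 1
        rows.append(f"{y} >= {' + '.join(vars_)} - {len(vars_) - 1}")
    if coeff > 0:   # y must not be able to jump to 1 when some x_j = 0
        rows.extend(f"{y} <= {xj}" for xj in vars_)
    rows.append(f"{y} >= 0")
    return rows
```

Applied to every nonlinear term of the objective, this reproduces the growth in rows and columns discussed above.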

A slightly different linearization procedure has been proposed by Hooker [116]; see also Andersen and Hooker [11]. Algebraic methods for maximizing a nonlinear 0–1 function are based on variable elimination (Hammer, Rosenberg and Rudeanu [103], Hammer and Rudeanu [104], Crama et al. [61]). Let f1 be the function to be maximized and write

  f1(x1, x2, …, xn) = x1 g1(x2, x3, …, xn) + h1(x2, x3, …, xn)        (32)

where g1 and h1 do not depend on x1. Clearly there exists a maximizing point (x1, x2, …, xn) of f1 such that x1 = 1 if g1(x2, …, xn) > 0 and such that x1 = 0 if g1(x2, …, xn) ≤ 0. Then define the function

  Ψ1(x2, x3, …, xn) = { g1(x2, x3, …, xn)  if g1(x2, x3, …, xn) > 0;  0 otherwise }
                    = max{ g1(x2, x3, …, xn), 0 }.                    (33)

Let f2 = Ψ1 + h1 (where Ψ1 is expressed in polynomial form). The problem thus reduces to maximization of the (n−1)-variable function f2. Iterating yields sequences of functions f1, f2, …, fn and Ψ1, Ψ2, …, Ψ_{n−1}, where fi depends on n − i + 1 variables. A maximizing point (x1, x2, …, xn) is then obtained from the recursion

  xi = 1 if and only if Ψi(x_{i+1}, x_{i+2}, …, xn) > 0.              (34)

The crucial variable elimination step may be done by a branch-and-bound algorithm of a special type, which proceeds to the determination of Ψ1, adding variables in direct or complemented form when branching (Crama et al. [61]). Let J denote the index set of the variables appearing together with x1 in a term of f1. After replacing x̄1 by 1 − x1 and grouping the terms in which x1 appears, we have

  f1(x1, x2, …, xn) = x1 g1(x_j : j ∈ J) + h1(x2, …, xn)              (35)

and g1 can be written

  g1 = c_0 + Σ_{j∈J} c_j x_j^{α_j} + Σ_{t∈T} c_t Π_{j∈T(t)} x_j^{α_{jt}},   (36)

where x_j^α is equal to x_j if α = 1 and to x̄_j = 1 − x_j if α = 0. One then aims to find a polynomial expression of the nonlinear 0–1 function Ψ1 = max{g1, 0} for all 0–1 vectors (x_j, j ∈ J).

If it can be shown that g1 is always positive, it can be copied out, and if it is never positive it can be deleted. Otherwise, branching on a variable x_s gives

  g1 = x_s g′ + x̄_s g″,

where g′ and g″, the restrictions of g1 induced by x_s = 1 and x_s = 0, are considered in turn. Lower and upper bounds g̲1 and ḡ1 on g1 are given by

  g̲1 = c_0 + Σ_{j∈J} min{0, c_j} + Σ_{t∈T} min{0, c_t},
  ḡ1 = c_0 + Σ_{j∈J} max{0, c_j} + Σ_{t∈T} max{0, c_t}.               (37)

Moreover, penalties p_j^1 and p_j^0, q_j^1 and q_j^0, associated with fixing x_j at 1 or at 0, are

  p_j^1 = max{ α_j c_j , (α_j − 1) c_j + Σ_{t∈T : j∈T(t)} (1 − α_{jt}) max{−c_t, 0} },
  p_j^0 = max{ −α_j c_j , (1 − α_j) c_j + Σ_{t∈T : j∈T(t)} α_{jt} max{−c_t, 0} }      (38)

for g̲1, and

  q_j^1 = max{ −α_j c_j , (1 − α_j) c_j + Σ_{t∈T : j∈T(t)} (1 − α_{jt}) max{c_t, 0} },
  q_j^0 = max{ α_j c_j , (α_j − 1) c_j + Σ_{t∈T : j∈T(t)} α_{jt} max{c_t, 0} }        (39)

for ḡ1. These penalties can be added to g̲1 (subtracted from ḡ1) when x_j is fixed. They also lead to the improved lower and upper bounds

  g̲1′ = g̲1 + max_{j∈J} min{ p_j^1, p_j^0 },
  ḡ1′ = ḡ1 − max_{j∈J} min{ q_j^1, q_j^0 }.                           (40)

To describe the branch-and-bound algorithm we use the terminology of Hansen [105], with the following extended meaning: a resolution test exploits a sufficient condition for a particular formula to be the desired expression of the current nonlinear 0–1 function; a feasibility test exploits a sufficient condition for the null function to be such an expression. Let ℓ denote the product of the variables, in direct or complemented form, corresponding to the variables fixed at 1 and at 0 respectively in the current subproblem.

Algorithm C (Basic algorithm revisited, Crama et al. 1990)

a) Initialization. Set Ψ = 0, ℓ = 1.
b) First direct feasibility test. Compute ḡ. If ḡ ≤ 0, go to i).
c) Second direct feasibility test. Compute ḡ′. If ḡ′ ≤ 0, go to i).
d) First direct resolution test. Compute g̲. If g̲ ≥ 0, then Ψ ← Ψ + ℓg and go to i).
e) Second direct resolution test. Compute g̲′. If g̲′ ≥ 0, then Ψ ← Ψ + ℓg and go to i).
f) Conditional feasibility test. If, for some j ∈ J, ḡ − q_j^1 ≤ 0, set ℓ ← ℓx̄_j, J ← J \ {j} and fix x_j at 0 in g. If, for some j ∈ J, ḡ − q_j^0 ≤ 0, set ℓ ← ℓx_j, J ← J \ {j} and fix x_j at 1 in g. If at least one variable has been fixed in this test, return to b).
g) Conditional resolution test. If, for some j ∈ J, g̲ + p_j^1 ≥ 0, set ℓ ← ℓx_j, J ← J \ {j}, fix x_j at 1 in g, Ψ ← Ψ + ℓg and go to i). If, for some j ∈ J, g̲ + p_j^0 ≥ 0, set ℓ ← ℓx̄_j, J ← J \ {j}, fix x_j at 0 in g, Ψ ← Ψ + ℓg and go to i).
h) Branching. Choose a variable x_s to branch upon, setting x_s = 1 first. Set ℓ ← ℓx_s, J ← J \ {s}. Update g by setting x_s to 1. Return to b).
i) Backtracking. Find the last literal x_s chosen in h) for which the complementary value has not yet been explored. If there is none, stop. Otherwise delete from ℓ the literal x_s and the literals introduced after it, and free the corresponding variables in g. Update J, then fix x_s at 0 in g, set ℓ ← ℓx̄_s, J ← J \ {s}, and return to b).

An example and a discussion of how best to implement this algorithm are given in Crama et al. [61].

3.3 Computational Experience

Computational results for probabilistic satisfiability have been reported by Kavvadias and Papadimitriou [124] and by Jaumard, Hansen and Poggi de Aragão [117]. The former authors consider only the decision version and solve the auxiliary problem of the column generation method by a variable depth search heuristic. The algorithm so implemented remains incomplete, in that it is unable to prove that there are no feasible solutions when none is found. Problems with up to 70 variables and 70 sentences, which are clauses, are solved. The latter authors use both Tabu Search and the Basic Algorithm Revisited, described above, in their column generation algorithm. The linear programming part is done with the mpsx code of Marsten [142]. Probabilistic satisfiability problems with up to 140 variables and 300 sentences are solved, both in decision and in optimization form. Moreover, problems with conditional probabilities of comparable size are solved also. Recently, using the cplex code and linearization to solve the auxiliary problem has led to the solution of problems with up to 500 sentences (Douanya Nguetse [71]). It thus appears that advanced linear programming tools allow the solution of large-scale probabilistic satisfiability problems. To the best of our knowledge, no other method solves problems of comparable size within probability logic, except if strong independence assumptions are made (as, e.g., in Lauritzen and Spiegelhalter's [135] method for uncertainty propagation in Bayesian networks).

3.4 Computational Complexity

Georgakopoulos, Kavvadias and Papadimitriou [86] prove that probabilistic satisfiability is NP-complete. In the proof, these authors consider problem (1) with m clauses as sentences and n variables; the result holds for general sentences as a consequence. First, they show that solving the dual of (1) by Khachiyan's [126] ellipsoid method for linear programming takes O(m^2 log m) iterations, each of which requires solving an instance of a weighted maxsat (or unconstrained nonlinear 0–1 programming) auxiliary problem on the same clauses (with weights assumed to have O(n) bit length) to find a violated inequality, and performing O(m^3 log m) more computations per iteration. Second, they note (as mentioned above) that the classical NP-complete satisfiability (sat) problem is a particular case of (1). This proof shows that polynomiality of algorithms for probabilistic satisfiability hinges on polynomiality of algorithms for the weighted maxsat auxiliary problem. To study this point, the co-occurrence graph G = (V, E) (e.g., Crama et al. [61]) of the nonlinear 0–1 function is a useful tool. Its vertices are associated with the variables of that function, and edges join pairs of vertices associated with variables (in direct or complemented form) appearing together in at least one term. Kavvadias and Papadimitriou [124] show that probabilistic satisfiability remains NP-complete when all clauses have at most two literals and G is planar (planar 2psat). Moreover, compatible marginals, i.e., the problem of determining whether marginal probability distributions for all four conjunctions of given pairs of variables are compatible, is also NP-hard. However, the case where there are at most two literals per clause and G is outerplanar, i.e., may be embedded in the plane so that all vertices are on the same face, is polynomial (Georgakopoulos et al. [86]). Other known polynomial cases of unconstrained nonlinear 0–1 programming lead to further polynomial cases of probabilistic satisfiability. They include maximization of almost positive functions, in which all terms with more than one variable have positive coefficients (Rhys [159], Balinski [17]); unate functions, which are reducible to almost positive ones by switching some variables (Hansen and Simeone [114], Crama [60], Simeone, de Werra and Cochand [163]); unimodular functions, which lead to unimodular matrices of coefficients after linearization (Hansen and Simeone [114]); supermodular functions (Grötschel, Lovász and Schrijver [93], Billionnet and Minoux [20]); functions for which G contains no subgraph reducible to the complete graph on five vertices (Barahona [19]); and functions for which G is a partial k-tree (Crama et al. [61]); see the cited references for definitions not given here. Note that these results are theoretical, as Khachiyan's [126] algorithm is not efficient

in practice. A different type of result has been obtained by Crama and van de Klundert [62], who are interested in polynomial heuristics. They consider problem (3) with lower bounds only, and no objective function, i.e.,

  1p = 1,  Ap ≥ π,  p ≥ 0.                            (41)

Assuming, with almost no loss of generality, that π_i = b_i / q for i = 1, 2, …, m, with the b_i and q integers, (41) has a solution if and only if the optimal value of

  min Σ_{t=1}^{2^n} x_t
  subject to:
    Ax ≥ b,  x ≥ 0                                    (42)

is at most q. A heuristic solution to (42) can be obtained by a greedy column generation algorithm in which the polynomial heuristic for weighted maxsat with a 3/4 performance guarantee of Goemans and Williamson [89] is used to determine the columns to be selected, i.e., minimizing approximately 1/Σ_{i=1}^m a_{ij} (or maximizing Σ_{i=1}^m a_{ij}). This gives a solution of value at most (8/3) H(m) times the optimal one, where H(m) = Σ_{i=1}^m 1/i. If this value is less than q, a solution to (41) has been found. Otherwise, the selected columns may be used in an initial solution, completed by the usual column generation method.

4 Decomposition

Large instances of probabilistic satisfiability problems are difficult to solve, as the corresponding matrices A in (1), (2), (3), … tend to be dense. Even the use of an efficient code such as cplex does not allow solution of problems with more than 500 sentences in reasonable time. This suggests the interest of decomposition methods for probabilistic satisfiability, a topic first studied by Van der Gaag [167, 168]. The tools used are the same as for the expression of conditional independence between variables in the study of belief networks (e.g., Pearl [150]). Independence relations are represented by an I-map, i.e., a graph G = (V, E) with vertices v_j associated with variables x_j and where the edge (v_r, v_s) does not belong to E if and only if the variables x_r and x_s are conditionally independent. It is assumed that edges have been added, e.g., with the minimum fill-in algorithm of Tarjan and Yannakakis [166], until all cycles of length greater than three have a chord, i.e., an edge joining non-successive vertices. The resulting graph C is then a decomposable I-map. Assume further that all initially given probabilities, as well as the probability to be bounded, are local to the cliques of C. Under these conditions the joint probability distribution P can be expressed as a product of marginal probability distributions on the maximal cliques of C, adequately scaled. So the problem will be solved on each of the cliques C_1, C_2, …, C_t. However, it is necessary that the marginal distributions so obtained agree on the intersections of the cliques. To ensure this, one considers a join graph G1 in which vertices are associated with the cliques C_1, C_2, …, C_t and edges join vertices associated with cliques having a non-empty intersection. Then one determines a join tree, i.e., a spanning tree of G1. Compatibility conditions must be written for each edge of this tree. Probabilistic satisfiability in optimization form, with decomposition, may thus be written:

  min / max  A_{m+1} p
  subject to:
    1 p^i = 1,                  i = 1, …, t
    A^i p^i = π^i,              i = 1, …, t          (43)
    T_{ij} p^i − T_{ji} p^j = 0,  for all i, j such that C_i and C_j are adjacent in the join tree
    p^i ≥ 0,                    i = 1, …, t,

where t is the number of maximal cliques. The first sets of constraints correspond to the probabilistic satisfiability problem on each clique, and the linking constraints to the compatibility conditions between the marginal distributions.


Example 6. (Douanya Nguetse et al. [71])

Consider the following six logical sentences and their probabilities of being true:

  prob(S1 ≡ x1) = 0.6
  prob(S2 ≡ x̄1 ∨ x2) = 0.4
  prob(S3 ≡ x2 ∨ x3) = 0.8
  prob(S4 ≡ x3 ∧ x4) = 0.3
  prob(S5 ≡ x̄4 ∨ x5) = 0.5
  prob(S6 ≡ x2 ∨ x5) = 0.6

Find best possible bounds on the probability π7 of S7 ≡ x5. Number the columns so that p_k corresponds to the possible world whose characteristic vector (x1, x2, x3, x4, x5) is the binary representation of 32 − k. The objective function row (S7) and the matrix A are then:

       p1  …  p32
  S7:  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
  S1:  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  S2:  1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
  S3:  1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
  S4:  1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
  S5:  1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
  S6:  1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0
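The matrix can be regenerated mechanically. A sketch, under two assumptions consistent with all the printed data: column p_k encodes the world with the bits of 32 − k read as (x1, …, x5), and S2 and S5 carry the negations x̄1 and x̄4 (overbars are easily lost in reproduction):

```python
def world(k):
    # world attached to column p_k: bits of 32 - k, read as (x1, ..., x5)
    b = 32 - k
    return [bool(b >> i & 1) for i in range(4, -1, -1)]

# objective sentence S7 and the sentences S1, ..., S6 of Example 6
sentences = {
    "S7": lambda x: x[4],                 # objective: x5
    "S1": lambda x: x[0],                 # x1
    "S2": lambda x: not x[0] or x[1],     # ~x1 v x2   (negation assumed)
    "S3": lambda x: x[1] or x[2],         # x2 v x3
    "S4": lambda x: x[2] and x[3],        # x3 ^ x4
    "S5": lambda x: not x[3] or x[4],     # ~x4 v x5   (negation assumed)
    "S6": lambda x: x[1] or x[4],         # x2 v x5
}

# row a_{i,k} = 1 exactly when world k satisfies sentence S_i
A = {name: [int(S(world(k))) for k in range(1, 33)]
     for name, S in sentences.items()}
```

Under these assumptions, the generated rows reproduce the printed patterns, e.g., the S4 row is the block 1 1 0 0 0 0 0 0 repeated four times.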

Solving the linear program (2) gives, when minimizing, a lower bound of 0.2 on π7, with p13 = 0.2, p10 = 0.3, p12 = 0.1, p20 = 0.2, p22 = 0.2, all other p_k = 0; and, when maximizing, an upper bound of 0.5 on π7, with p11 = 0.1, p13 = 0.1, p17 = 0.3, p10 = 0.3, p14 = 0.1, p22 = 0.1, all other p_k = 0. The corresponding I-map, decomposable I-map and join tree are represented in Figure 4. This problem's objective function, coefficient matrix and right-hand side are:

Here the three cliques are C1 = {x1, x2}, C2 = {x2, x3, x4} and C3 = {x2, x4, x5}. Within each block, columns p^i_k are ordered by decreasing binary value of the restricted worlds.

Block 1 (p^1, over worlds (x1, x2) = 11, 10, 01, 00):
  normalization:      1 1 1 1                  = 1
  S1 ≡ x1:            1 1 0 0                  = 0.6
  S2 ≡ x̄1 ∨ x2:       1 0 1 1                  = 0.4

Linking row for C1 ∩ C2 = {x2} (condition x2 = 1):
  (1 0 1 0) p^1 − (1 1 1 1 0 0 0 0) p^2 = 0

Block 2 (p^2, over worlds (x2, x3, x4) = 111, 110, …, 000):
  normalization:      1 1 1 1 1 1 1 1          = 1
  S3 ≡ x2 ∨ x3:       1 1 1 1 1 1 0 0          = 0.8
  S4 ≡ x3 ∧ x4:       1 0 0 0 1 0 0 0          = 0.3

Linking rows for C2 ∩ C3 = {x2, x4}:
  (x2 = 1, x4 = 1):  (1 0 1 0 0 0 0 0) p^2 − (1 1 0 0 0 0 0 0) p^3 = 0
  (x2 = 1, x4 = 0):  (0 1 0 1 0 0 0 0) p^2 − (0 0 1 1 0 0 0 0) p^3 = 0
  (x2 = 0, x4 = 1):  (0 0 0 0 1 0 1 0) p^2 − (0 0 0 0 1 1 0 0) p^3 = 0

Block 3 (p^3, over worlds (x2, x4, x5) = 111, 110, …, 000):
  normalization:      1 1 1 1 1 1 1 1          = 1
  S5 ≡ x̄4 ∨ x5:       1 0 1 1 1 0 1 1          = 0.5
  S6 ≡ x2 ∨ x5:       1 1 1 1 1 0 1 0          = 0.6

The objective S7 ≡ x5, expressed on block 3, is the row (1 0 1 0 1 0 1 0) p^3.

[Figure 4 shows, from left to right: a) the I-map on vertices v1, …, v5; b) the decomposable I-map obtained by adding fill-in edges; c) the join tree on the cliques {v1, v2}, {v2, v3, v4} and {v2, v4, v5}, with separators {v2} and {v2, v4}.]

Figure 4: I-map, decomposable I-map and join tree

Solving the linear program (43) gives, when minimizing, a lower bound of 0.2 on π7, with p^1_2 = 0.6, p^1_3 = 0.4, p^2_2 = 0.3, p^2_3 = 0.1, p^2_5 = 0.1, p^2_6 = 0.1, p^2_7 = 0.1, p^2_8 = 0.1, p^3_2 = 0.1, p^3_4 = 0.3, p^3_6 = 0.4, p^3_7 = 0.2, all other p^i_k = 0; and, when maximizing, an upper bound of 0.5 on π7, with p^1_2 = 0.6, p^1_3 = 0.4, p^2_2 = 0.3, p^2_3 = 0.1, p^2_5 = 0.3, p^2_6 = 0.1, p^2_7 = 0.1, p^2_8 = 0.1, p^3_3 = 0.3, p^3_6 = 0.4, p^3_7 = 0.2, all other p^i_k = 0. □

Problem (43) is equivalent to problem (2), in the sense that it gives the same bounds. Indeed, (i) to any feasible solution of (2) corresponds a feasible solution of (43) with the same value, obtained by grouping terms to build the marginal distribution on each of the cliques, and (ii) to any feasible solution of (43) corresponds a solution of (2) with the same value: as C is a decomposable I-map, it follows from a result of Pearl [150] that there is a corresponding joint probability distribution given by

    p_k = prob(w_k) = ∏_{i=1,2,…,t}  prob_{C_{ℓi}}(w_k) / prob_{C_{ℓi} ∩ C_{ℓj}}(w_k)        (44)

where prob_{C_{ℓi}}(w_k) denotes the probability of the restriction of world w_k to C_{ℓi}, and prob_{C_{ℓi} ∩ C_{ℓj}}(w_k) the probability of the restriction of world w_k to the intersection of cliques C_{ℓi} and C_{ℓj}. Note that the probability distribution obtained with (44) from the solution of (43) will usually not be the same as that of (2). In particular, it may not be a basic solution.

Example (continued) For world w10 = (10110)^T, in the minimization case

    p10 = prob_{C_{ℓ1}}((10)^T) × [prob_{C_{ℓ2}}((011)^T) / prob_{C_{ℓ1} ∩ C_{ℓ2}}((011)^T)] × [prob_{C_{ℓ3}}((010)^T) / prob_{C_{ℓ2} ∩ C_{ℓ3}}((010)^T)] = 0.3.

Similarly, p11 = 0.1, p14 = 0.1, p15 = 0.1, p20 = 0.2, p24 = 0.1, all other pk = 0.

Van der Gaag [167, 168] proposes to exploit the structure of (39), eliminating subproblems one at a time, beginning from the leaves of the join tree, by: (i) finding their feasible set Fi from the local system of constraints A^i p^i = π^i; (ii) projecting this feasible set onto the clique intersection Si, i.e., finding the set {Tij p^i : p^i ∈ Fi}; (iii) transmitting these restrictions to the neighboring subproblem by the equations Tij p^i − Tji p^j = 0; (iv) iterating until only the problem corresponding to the root of the join tree, which contains the objective function, remains; (v) solving this last problem, with all additional constraints, by linear programming (and then the other subproblems if a solution is desired in addition to the bounds).

No details on how to perform these operations are given. The usual way would be to use vertex and extreme ray enumeration techniques for polytopes. Such methods are time-consuming, as the number of vertices of polytopes of fixed dimension usually grows exponentially with the number of constraints (Dyer [77], Chen et al. [47, 48]). It is of course also possible to apply Dantzig-Wolfe [63] decomposition to (43). Even without doing this, the form (43) is of interest. Indeed, while (43) usually has many fewer columns than (2) (i.e., Σ_{i=1}^{t} 2^{|C_{ℓi}|} instead of 2^n) and more rows (i.e., n + Σ_{i=1}^{t} 2^{|C_{ℓi} ∩ C_{ℓj}|} instead of m), it is also much sparser. This sparsity is well exploited in codes such as CPLEX. So an alternative is to solve (43) by column generation. To find the entering column it is necessary to solve up to t subproblems, one for each clique C_{ℓi}.


Example (continued) For the first clique the reduced cost expression becomes

    −u1 − u2x1 − u3(1 − x1 + x1x2) − u4x2 = −(u1 + u3) − (u2 − u3)x1 − u4x2 − u3x1x2;

for the second clique

    u4x2 − u5 − u6(x2 + x3 − x2x3) − u7x3x4 − u8x2x4 − u9(x2 − x2x4) − u10(x4 − x2x4)
      = −u5 + (u4 − u6 − u9)x2 − u6x3 − u10x4 + u6x2x3 + (u9 + u10 − u8)x2x4 − u7x3x4;

and for the third clique

    x5 − u8x2x4 + u9(x2 − x2x4) + u10(x4 − x2x4) − u11 − u12(1 − x4 + x4x5) − u13(x2 + x5 − x2x5)
      = −(u11 + u12) + (u9 − u13)x2 + (u10 + u12)x4 + (1 − u13)x5 − (u8 + u9 + u10)x2x4 + u13x2x5 − u12x4x5,

where x1, x2, …, x5 ∈ {0, 1}.

This approach allowed solving instances with up to m = 900 sentences and a few cliques with small intersections (Douanya Nguetse et al. [71]).
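To make the column-generation step concrete, here is a minimal sketch of finding an entering column for one clique by maximizing a quadratic pseudo-Boolean reduced-cost function over all 0-1 assignments by exhaustive enumeration. The dual values u1, ..., u4 below are illustrative assumptions, not taken from the example; real implementations use the heuristics and exact algorithms of Section 3.2 rather than brute force.

```python
from itertools import product

def best_column(reduced_cost, n):
    """Maximize a pseudo-Boolean reduced-cost function over {0,1}^n
    by exhaustive enumeration (viable only for small cliques)."""
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        val = reduced_cost(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Hypothetical dual values and a hypothetical quadratic reduced-cost
# expression of the kind arising for a two-variable clique.
u1, u2, u3, u4 = 0.2, -0.5, 0.1, -0.3

def rc(x):
    x1, x2 = x
    return -u1 + (u3 - u2) * x1 - u4 * x2 - u3 * x1 * x2

# A column with positive reduced cost would enter the basis.
x_star, val = best_column(rc, 2)
print(x_star, round(val, 6))  # -> (1, 1) 0.6
```

Enumeration is exponential in the clique size, which is exactly why small cliques with small intersections make the decomposed form (43) attractive.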

5 Nonmonotonic Reasoning and Restoring Satisfiability

5.1 Minimal Extension of Probability Intervals

When new sentences and their probabilities are added to a probabilistic satisfiability problem, consistency of the resulting system must be checked. In view of the "fundamental theorem of the theory of probabilities" of de Finetti [68], it is always possible to find a coherent extension of the probabilities of the initial sentences. Indeed, considering the optimization versions (2)-(6) and (8) of probabilistic satisfiability, it suffices, while adding one sentence at a time, to choose a probability π_{m+1} within the interval defined by the best lower and upper bounds on it. However, this might not correspond to the subjective view about π_{m+1} to be modelled, a sign that some previously chosen values should be revised. This situation is more likely to happen if several new sentences are added simultaneously, possibly by different experts. Two natural ways to restore satisfiability are to modify the probabilities πi (or their bounds) and to delete some of the sentences. We discuss them in

this subsection and the next one. To restore satisfiability (or coherence) with minimal changes one must solve the following linear program:

    min  1ℓ + 1u
    subject to:  1p = 1
                 π̲ − ℓ ≤ Ap ≤ π̄ + u
                 ℓ, u, p ≥ 0                               (45)

(Jaumard et al. [117]), i.e., minimize the sum of enlargements of the probability intervals needed to restore satisfiability. As confidence in the (subjective) estimates of the various sentences may vary substantially, use can be made of the weighted objective function

    min  w̲ℓ + w̄u                                          (46)

where w̲ and w̄ are vectors of positive weights, the larger the more accurate the corresponding probability intervals (π̲, π̄) are considered to be. Problems (45) and (46) can be solved by column generation algorithms, as discussed in Section 3, keeping the columns corresponding to ℓ and u explicit (or treating them separately as in the revised simplex algorithm). While similar extensions of probability intervals for conditional probabilities might be considered, the resulting problem would be a bilinear program, which is much more difficult to solve than a linear program.
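The idea behind (45) can be sketched on a tiny instance without an LP solver. With point probabilities (degenerate intervals), the minimal total enlargement equals the minimum over distributions p of Σi |(Ap)i − πi|; the instance and the grid scan below are illustrative assumptions standing in for the column-generation LP of the text.

```python
# Two worlds (x1 true / x1 false); S1 = x1 with pi1 = 0.9,
# S2 = not x1 with pi2 = 0.5 -- an incoherent pair, since the two
# probabilities must sum to 1.
A = [[1, 0],   # S1 true only in world 1
     [0, 1]]   # S2 true only in world 2
pi = [0.9, 0.5]

def total_slack(p):
    """Sum of interval enlargements forced by distribution p."""
    return sum(abs(sum(a * q for a, q in zip(row, p)) - t)
               for row, t in zip(A, pi))

# Scan p1 on a fine grid; p2 = 1 - p1.
best = min(total_slack((k / 1000, 1 - k / 1000)) for k in range(1001))
print(round(best, 6))  # -> 0.4
```

Any distribution leaves a total violation of at least 1.4 − (p1 + p2) = 0.4, so the intervals must be enlarged by 0.4 in total, e.g. by extending π2's interval to [0.1, 0.5].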

5.2 Probabilistic Maximum Satisfiability

A second way to restore satisfiability is to delete a subset of sentences of minimum cardinality (or possibly of minimum total weight, where weights of sentences are subjective estimates of their importance or reliability). This is done by solving the


following mixed 0-1 linear program:

    min  |y| = Σ_{i=1}^{m} y_i
    subject to:  1p = 1
                 π̲ − ℓ ≤ Ap ≤ π̄ + u
                 ℓ ≤ π̲ y
                 u ≤ (1 − π̄) y
                 ℓ, u, p ≥ 0
                 y ∈ {0, 1}^m                              (47)

The variables yi, for i = 1, …, m, are equal to 1 if sentence Si is deleted and to 0 otherwise. In the former case the interval [π̲i, π̄i] can be extended to [0, 1], so the probability of Si is no longer constrained, which is equivalent to deleting Si. In the latter case ℓi = 0 and ui = 0, so the probability interval [π̲i, π̄i] is unchanged. Problem (47) has an exponential number of columns and also some integer variables. To solve it, it is necessary to combine column generation with integer programming. Fortunately, the number of integer variables, i.e., m, is small. So the standard dual algorithm for mixed-integer programming can be extended fairly efficiently (Hansen, Minoux and Labbé [113]). It turns out that a primal integer programming algorithm (alternating phase 1 and phase 2 calculations as in the simplex algorithm) is even more efficient (Hansen, Jaumard and Poggi de Aragão [110, 111]).
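On very small instances the effect of (47) can be reproduced by brute force: try deletion sets of increasing size and check coherence of what remains. The instance is the same assumed incoherent pair as before, and the feasibility check is a grid scan standing in for a linear program.

```python
from itertools import combinations

# S1 = x1 with pi = 0.9, S2 = not x1 with pi = 0.5 (incoherent).
A = [[1, 0], [0, 1]]
pi = [0.9, 0.5]
m = len(pi)

def coherent(kept):
    """Check feasibility of {p >= 0, 1p = 1, A_i p = pi_i, i in kept}
    by scanning distributions on the two worlds (an LP in general)."""
    for k in range(1001):
        p = (k / 1000, 1 - k / 1000)
        if all(abs(sum(a * q for a, q in zip(A[i], p)) - pi[i]) <= 5e-4
               for i in kept):
            return True
    return False

# Smallest number of deleted sentences restoring coherence.
deleted = next(d for d in range(m + 1)
               for kept in combinations(range(m), m - d)
               if coherent(kept))
print(deleted)  # -> 1
```

With m sentences this enumeration is exponential; the point of the column-generation and primal integer programming algorithms cited above is precisely to avoid it.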

6 Other Uses of the Probabilistic Satisfiability Model

6.1 Maximum Entropy Solution

The model (2) has been criticized on the grounds that the bounds obtained may be weak for large instances or even provide no information at all, i.e., be equal to 0 or 1. It may be argued that in such a case, the bounds being best possible, nothing more can be said with the available information. But bounds being far apart also suggests the interest of a representative solution, if one can be defined. A natural choice is then to

seek the solution which makes the fewest assumptions, i.e., makes the probabilities of the possible worlds as equal as can be, subject to the given constraints. This solution is the maximum entropy one. The problem becomes

    max  −Σ_j p_j log p_j
    subject to:  1p = 1
                 Ap = π
                 p ≥ 0.                                    (48)

This problem is very hard to solve. Using Lagrange multipliers the objective function becomes

    max  −Σ_j p_j log p_j + λ_0 (1 − 1p) + λ (π − Ap)       (49)

and differentiating with respect to each p_j yields the first-order conditions

    log p_j + 1 + λ_0 + Σ_{i=1}^{m} λ_i a_{ij} = 0,         (50)

from which it follows that

    p_j = e^{−1} e^{−λ_0} e^{−Σ_{i=1}^{m} λ_i a_{ij}}.      (51)

Then setting

    a_0 = e^{−1} e^{−λ_0},   a_i = e^{−λ_i},  i = 1, …, m    (52)

each probability p_j can be expressed as a product of some of the a_0, a_1, …, a_m.

This reduces (48) to a system of multilinear equations in the quantities a_0, a_1, …, a_m (Cheeseman [50], Nilsson [144], Kane [122]). Such a system may be solved by an iterative method, but this is time-consuming even for small m. Moreover, as shown by Paris and Vencovská [146], computing the factors a_0, a_1, …, a_m to a reasonable accuracy is NP-hard. Nilsson [144] also proposes another approximate method for finding a representative solution of (2). McLeish [140, 141] characterizes when both solutions agree. Kane [119, 120, 121] considers systems in which sentences S1 to Sm−1 are atoms and Sm is an implication between the conjunction of these atoms and another atom, the conclusion. A closed-form solution for the factors is then obtained, from which the probabilities of the possible worlds are readily computed.
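The product form (52) can be illustrated on the smallest possible instance. Assume a single sentence S1 = x1 with prob(S1) = 0.7 over the four worlds of (x1, x2); the multilinear system then collapses to two equations that can be solved in closed form (this closed form is specific to the one-constraint case, not a general method).

```python
import math

pi1 = 0.7
a = [1, 1, 0, 0]          # a_{1j}: worlds where S1 is true (x1 = 1)

# p_j = a0 * a1^{a_1j}, so 1p = 1 and Ap = pi reduce to:
#   2*a0*a1 = pi1        (total mass on the two S1-worlds)
#   2*a0    = 1 - pi1    (total mass on the other two worlds)
a0 = (1 - pi1) / 2
a1 = pi1 / (2 * a0)
p = [a0 * (a1 if aj else 1) for aj in a]

print([round(q, 4) for q in p])   # -> [0.35, 0.35, 0.15, 0.15]
entropy = -sum(q * math.log(q) for q in p)
print(round(entropy, 4))
```

As expected, the maximum entropy solution spreads the mass 0.7 uniformly over the worlds satisfying S1 and the mass 0.3 uniformly over the rest.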

6.2 Anytime Deduction

Frisch and Haddawy [82, 83] consider models (1) and (2), as done by Nilsson [144], as well as their extension (3) to probability intervals. However, they do not apply the simplex algorithm, but consider instead a series of rules. Examples of such rules are

    prob(S1|S4) ∈ [x, y], prob(S1 ∨ S2|S4) ∈ [u, v], prob(S1 ∧ S2|S4) ∈ [w, z]
      ⟹ prob(S2|S4) ∈ [max(w, u − y + w), min(v, v − x + z)]        (53)

provided x ≤ y, x ≤ v, w ≤ v;

    prob(S1|S4) ∈ [x, y]  ⟹  prob(¬S1|S4) ∈ [1 − y, 1 − x];         (54)

and

    prob(S2|S4) ∈ [x, y], prob(S1|S2 ∧ S4) ∈ [u, v]
      ⟹ prob(S1 ∧ S2|S4) ∈ [xu, yv]                                 (55)

where S1, S2, S3, S4 represent arbitrary propositional formulas. These rules have been obtained from various known results by adding an arbitrary conditioning sentence S4 (the same in premises and conclusions), or proved by the authors. While the set of rules considered is not complete, it covers most rules proposed in the literature. Frisch and Haddawy [83] propose an anytime deduction procedure for model (3): starting with the interval [0, 1] for the probability of the objective function sentence, they apply the rules, in any order, to the data. The probability intervals so obtained shrink monotonically. The algorithm can be stopped at any time, when it is judged that enough computing has been done or no further progress is observed (even if the best bounds have been obtained, the incompleteness of the set of rules does not allow one to recognize that this is the case). An important feature of this approach is that it justifies what is done step by step and thus provides an explicit proof of the results, showing how they are obtained. There is, however, a difficulty when the interval values for the given sentences are not

coherent. Indeed, this fact may not be recognized, or even recognizable, with the given rules. Then, depending on which rules have been applied by the time it is decided to stop, a probability interval with high or low values may be obtained, and is arbitrary. A way out is to first check consistency with linear programming and column generation, then apply the rules (possibly until the best bounds are obtained) to get an explicit step-by-step proof. If the given intervals are not coherent, one may restore satisfiability by extending them as discussed in Section 5, and then proceed as above.
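Two of the simpler rules can be sketched as interval operations (the negation rule returns the interval in the order [1 − y, 1 − x], and the chaining rule multiplies the endpoints; the numbers below are illustrative):

```python
def negation(iv):
    """prob(S1|S4) in [x, y]  =>  prob(not S1|S4) in [1-y, 1-x]."""
    x, y = iv
    return (1 - y, 1 - x)

def chaining(iv_s2, iv_s1_given_s2):
    """prob(S2|S4) in [x, y] and prob(S1|S2 and S4) in [u, v]
    =>  prob(S1 and S2|S4) in [x*u, y*v]."""
    x, y = iv_s2
    u, v = iv_s1_given_s2
    return (x * u, y * v)

print(negation((0.3, 0.6)))               # -> (0.4, 0.7)
print(chaining((0.5, 0.8), (0.25, 0.5)))  # -> (0.125, 0.4)
```

An anytime procedure keeps, for each sentence, the intersection of all intervals derived so far, so every rule application can only tighten the current interval.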

7 Other Related Approaches

7.1 Incidence Calculus

Many approaches to uncertainty in artificial intelligence use probabilities. Some of them, which predate or are contemporary with Nilsson's [144] paper, are quite close to probabilistic satisfiability. This is the case for Quinlan's [155] Inferno system, which exploits various rules of probability logic, and for the incidence calculus developed by Bundy and coworkers [36, 37, 38, 139]. This logic for probabilistic reasoning proceeds, as probabilistic satisfiability does, from lower and upper bounds on sentences (axioms of a logical theory) to lower and upper bounds on the remaining sentences (formulas of the theory). Incidences, i.e., sets of possible worlds with a probability, are associated with sentences rather than probabilities or bounds on them. The intended meaning of the incidence of a sentence is the set of possible worlds in which the formula is true. This encoding makes incidence calculus truth functional, i.e., the incidence of a compound formula can be computed directly from those of its parts. Given a set W of worlds (which are here primitive objects of incidence calculus), the rules for extending incidence are as


follows:

    i(true) = W
    i(false) = ∅
    i(¬S1) = W \ i(S1)
    i(S1 ∧ S2) = i(S1) ∩ i(S2)
    i(S1 ∨ S2) = i(S1) ∪ i(S2)
    i(S1 → S2) = (W \ i(S1)) ∪ i(S2),                      (56)

from which one can deduce the rules for probabilities

    prob(true) = 1
    prob(false) = 0
    prob(¬S1) = 1 − prob(S1)
    prob(S1 ∨ S2) = prob(S1) + prob(S2) − prob(S1 ∧ S2)
    prob(S1 → S2) = prob(¬S1) + prob(S2) − prob(¬S1 ∧ S2)
    prob(S1 ∧ S2) = prob(S1)prob(S2)
                    + c(S1, S2) √(prob(S1)prob(¬S1)prob(S2)prob(¬S2))   (57)

where c(S1, S2) is the correlation between S1 and S2, defined by

    c(S1, S2) = (prob(S1 ∧ S2) − prob(S1)prob(S2)) / √(prob(S1)prob(¬S1)prob(S2)prob(¬S2)).   (58)

Using these rules it is usually only possible to determine lower and upper bounds on the incidences of conclusions, and the precision of these bounds depends on the number of possible worlds considered. While incidence calculus avoids considering all possible worlds in the sense of probabilistic logic (and hence is easier to use than Nilsson's original proposition, in which all such worlds are considered to set up (1) or (2)), the bounds obtained are not necessarily valid in the worst case.
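The set-based rules (56) are straightforward to implement; the worlds and weights below are an illustrative assumption (four equiprobable worlds), and the probability of a sentence is the total weight of its incidence:

```python
W = {1, 2, 3, 4}
weight = {w: 0.25 for w in W}      # illustrative uniform weights

def prob(incidence):
    """Probability of a sentence = total weight of its incidence."""
    return sum(weight[w] for w in incidence)

def i_not(i1):          return W - i1
def i_and(i1, i2):      return i1 & i2
def i_or(i1, i2):       return i1 | i2
def i_implies(i1, i2):  return (W - i1) | i2

i_S1, i_S2 = {1, 2}, {2, 3}
print(prob(i_and(i_S1, i_S2)))      # -> 0.25
print(prob(i_or(i_S1, i_S2)))       # -> 0.75
print(prob(i_implies(i_S1, i_S2)))  # worlds {2, 3, 4} -> 0.75
```

Because incidences are manipulated exactly, inclusion-exclusion identities such as the rule for prob(S1 ∨ S2) hold automatically; imprecision enters only when the incidence of a sentence is known merely to lie between two sets.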

7.2 Bayesian Logic

Consider a Bayesian network G = (V, U) (e.g., Pearl [150]). Vertices (or nodes) vj of V are associated with simple events (or logical variables xj; we assume here that only two outcomes, true or false, are possible for each event). Directed arcs (vi, vj) are used

to represent probabilistic dependence among events. Moreover, the network is acyclic. Probabilities of vertices conditioned on the values of their immediate predecessors are given. The probability that a vertex is true, when conditioned on the truth values of all its non-successors, is equal to the probability that it is true conditioned only on the truth values of its immediate predecessors. Consequently, the probability of any possible world can be computed by the chain rule using the specified conditional probabilities only. This leads in practice to fairly easy ways to compute probabilities or conditional probabilities of events, provided the immediate predecessors are not too numerous (although it is NP-hard to do so even if their number is bounded, as shown by Cooper [59]); see Pearl [150], Lauritzen and Spiegelhalter [135] and Andersen and Hooker [11] for examples. The assumptions made are, however, very strong ones: sufficient information must be given to define a unique point probability distribution, and this supposes giving 2^{|pred_j|} exact values for each vertex vj, where pred_j denotes the set of immediate predecessors of vj.

Andersen and Hooker [11] examine how some of the assumptions of belief networks could be relaxed, by combining this approach with probabilistic satisfiability. It is easy to see that the usual computations in Bayesian networks can be cast into the form (3) (Hansen, Jaumard, Douanya Nguetse and Poggi de Aragão [108]); Andersen and Hooker [11] propose a more complicated nonlinear formulation. Then precise probability values for simple or conditional events may be replaced by intervals, as discussed above. One can also add constraints different from the conditional implications, allow for networks with cycles, etc. Not all extensions remain linear, e.g., when marginal probabilities for simple events are given by intervals and these events are independent. While some proposals have been made (e.g., Andersen and Hooker [11] recommend using generalized Benders decomposition (Geoffrion [85]) and signomial geometric programming for solving subproblems), efficient solution of nonlinear probabilistic satisfiability problems is still largely unexplored.
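The chain-rule computation can be sketched on a two-vertex network x1 → x2; the conditional probabilities are illustrative assumptions:

```python
p_x1 = 0.6                   # prob(x1 = true)
p_x2_given = {True: 0.9,     # prob(x2 = true | x1 = true)
              False: 0.2}    # prob(x2 = true | x1 = false)

def joint(x1, x2):
    """prob(x1, x2) = prob(x1) * prob(x2 | x1)."""
    p1 = p_x1 if x1 else 1 - p_x1
    p2 = p_x2_given[x1] if x2 else 1 - p_x2_given[x1]
    return p1 * p2

total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(round(joint(True, True), 4))  # -> 0.54
print(round(total, 4))              # -> 1.0
```

This is exactly the information that, in the PSAT formulation (3), would appear as conditional-probability constraints; the relaxations discussed above replace these point values by intervals.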


7.3 Assumption-based Truth Maintenance Systems

Assumption-based truth maintenance systems (Laskey and Lehner [134], Kohlas and Monney [129]) may be viewed as probabilistic satisfiability problems of the form (2) or (3), in which sentences are of two types: assumptions, which are atoms having a given probability or a probability in a given interval, and rules, which have probability 1. Moreover, assumptions are assumed to be independent (in the usual sense of probability theory; this is different from the concept used by Boole [26], which corresponds to conditional logical independence). A very careful examination of algorithms for coherence and bounding in assumption-based truth maintenance systems is made in a recent book of Kohlas and Monney [129]. It seems likely that relaxing the independence assumption might make solution of such problems easier (and the bounds obtained less precise).

7.4 Probabilistic Logic via Capacities

An important relationship between probabilistic satisfiability and capacities (or belief functions) has recently been established by Kämpke [118]. A lower probability is the componentwise minimum of a set of probability distributions defined over the same space. The probabilistic satisfiability model (1) can be extended by considering several probability distributions p^1, p^2, …, p^N instead of a single one:

    1p^i = 1,                      i = 1, 2, …, N
    A min(p^1, p^2, …, p^N) = π
    p^i ≥ 0,                       i = 1, 2, …, N.          (59)

While a solution to (1) always yields a solution of (59), the converse is not necessarily true.

Example (Kämpke [118]) Let S1 = x1 ∨ x2, S2 = x1x2 ∨ x̄1x̄2, π1 = 0.4 and π2 = 0.3. Set p1 = prob(x1x2), p2 = prob(x1x̄2), p3 = prob(x̄1x2) and p4 = prob(x̄1x̄2). Then the probabilistic satisfiability problem (1) has no solution, but the lower probability problem (59) has a solution

    p^1 = (0.1, 0.1, 0.2, 0.6)
    p^2 = (0.2, 0.3, 0.3, 0.2).                            (60)

Kämpke [118] proves that lower probabilities which are solutions of (59) can always be represented as the minimum of only two distributions. Moreover, these two distributions define a totally monotone capacity, or belief function (see Choquet [52], Shafer [162] or Kämpke [118] for definitions). Problem (59) can be solved by extending solution techniques for (1) and (2).
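The example above is easy to verify mechanically: the componentwise minimum of the two distributions reproduces π on both sentences, while no single distribution can (worlds are ordered x1x2, x1x̄2, x̄1x2, x̄1x̄2):

```python
A = [[1, 1, 1, 0],   # S1 = x1 or x2
     [1, 0, 0, 1]]   # S2 = (x1 and x2) or (not x1 and not x2)
pi = [0.4, 0.3]

p1 = (0.1, 0.1, 0.2, 0.6)
p2 = (0.2, 0.3, 0.3, 0.2)
low = [min(a, b) for a, b in zip(p1, p2)]   # the lower probability

for row, target in zip(A, pi):
    assert abs(sum(a * q for a, q in zip(row, low)) - target) < 1e-12

# Infeasibility of (1): p1 + p2 + p3 = 0.4 forces p4 = 0.6,
# contradicting p1 + p4 = 0.3, so no single distribution works.
print(low)  # -> [0.1, 0.1, 0.2, 0.2]
```

Note that the lower probability `low` sums to 0.6 < 1, which is exactly what distinguishes it from a probability distribution.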

7.5 Other Applications

Due to its simplicity, it is not surprising that probabilistic satisfiability has many applications (and the potential for many more) in addition to those in AI and in probability discussed throughout this chapter. We mention a few. Zemel [175], Assous [14], Brecht and Colbourn [32], Colbourn [53], and Hansen, Jaumard and Douanya-Nguetse [107] consider two-terminal reliability of networks. Failure probabilities, not assumed to be independent, are given for all edges. The probability of an operational path from source to sink is to be bounded. Zemel [175] suggests the use of column generation and finds polynomial cases. Assous [13, 14] shows that the lower and upper bounds can be found by solving a shortest path and a minimum cut problem, respectively. Brecht and Colbourn [32] use this result to improve reliability bounds with independent probabilities of edge failures through a two-stage computation. Hansen et al. [107] obtain more precise bounds by considering also probabilities of simultaneous failure of pairs of edges. Prékopa and Boros [154] study electricity production and distribution systems. Assuming probability distributions for offers and demands to be given, they show how to compute the probability of non-satisfaction of demand due to insufficient production or transportation capacity.

Kane, McAndrew and Wallace [123] apply the maximum entropy algorithm of Kane [119, 121] to model-based object recognition, with a significant improvement over previous methods. Hailperin [100] suggests applying probabilistic logic to fault analysis in digital circuits (Parker and McCluskey [147, 148]).

8 Conclusions

While many proposals have been made for handling uncertainty in AI, there are few methods which apply to a large variety of problems, and also few which allow rigorous solution of large instances. The approach based on logic and probability, epitomized by probabilistic satisfiability and its extensions, is one of the most comprehensive and powerful available. This is largely due to the strength and versatility of the advanced linear and mixed-integer programming techniques upon which it relies. Both analytical and numerical solutions can be obtained for a large variety of problems. The former are obtained through Fourier-Motzkin elimination or enumeration of extreme points and extreme rays of polytopes, the latter through linear programming algorithms. Large instances can be solved using column generation and nonlinear 0-1 programming. These solution methods apply both to the consistency problem for given logical sentences and probabilities and to the problem of finding best bounds on the probability of an additional logical sentence. Both simple and conditional probabilities can be considered in the constraints and/or in the objective function, as well as probability intervals and additional linear constraints on the probabilities. Recent theories on combination or iteration of conditionals can also be expressed in this framework. Moreover, nonmonotonic reasoning can be applied, through the study of minimal changes needed to restore consistency.

No independence or conditional independence assumptions need be imposed, but conditional independence may be implicitly taken into account. Probabilistic satisfiability and its extensions may be viewed as the applied, computation-oriented (but including formal computing) side of probability logic, which is a very active research area. After a brilliant start with Boole's work, followed by a long dormant period until Hailperin's first paper, it is now gaining impetus. Much work remains to be done, but the perspectives for theory and applications of probabilistic satisfiability (including here the subjective probability approach of de Finetti and his school and its extension to imprecise probabilities by Walley) appear very promising.


References

[1] Aarts, E.H.L. and J.H.M. Korst, Simulated Annealing and Boltzmann Machines, Chichester: Wiley, 1989.
[2] Abadi, M., and J.Y. Halpern, Decidability and Expressiveness for First-Order Logics of Probability, Information and Computation 112 (1994) 1-36.
[3] Adams, E.W., Probability and the Logic of Conditionals, in Aspects of Inductive Logic, J. Hintikka and P. Suppes (eds.), Amsterdam: North-Holland (1966) 265-316.
[4] Adams, E.W., The Logic of "Almost All", Journal of Philosophical Logic 3 (1974) 3-17.
[5] Adams, E.W., The Logic of Conditionals, D. Reidel Publishing, Dordrecht, Holland, 1975.
[6] Adams, E.W., Probabilistic Enthymemes, Journal of Pragmatics 7 (1983) 283-295.
[7] Adams, E.W., On the Logic of High Probability, Journal of Philosophical Logic 15 (1986) 255-279.
[8] Adams, E.W. and H.P. Levine, On the Uncertainties Transmitted from Premises to Conclusions in Deductive Inferences, Synthese 30 (1975) 429-460.
[9] Andersen, K.A., Characterizing Consistency for a Subclass of Horn Clauses, Mathematical Programming 66 (1994) 257-271.
[10] Andersen, K.A., and J.N. Hooker, A Linear Programming Framework for Logics of Uncertainty, Matematisk Institut, Aarhus Universitet, May 1993, to appear in Decision Support Systems.
[11] Andersen, K.A., and J.N. Hooker, Bayesian Logic, Decision Support Systems 11 (1994) 191-210.
[12] Andersen, K.A. and J.N. Hooker, Determining Lower and Upper Bounds on Probabilities of Atomic Propositions in Sets of Logical Formulas Represented by Digraphs, to appear in Annals of Operations Research (1996).
[13] Assous, J.Y., Bounds on Network Reliability, Ph.D. Thesis, Northwestern University, 1983.

[14] Assous, J.Y., First and Second-Order Bounds on Terminal Reliability, Networks 16 (1986) 319-329.
[15] Avis, D., and K. Fukuda, A Pivoting Algorithm for Convex Hulls and Vertex Enumeration of Arrangements and Polyhedra, Discrete and Computational Geometry (1992).
[16] Bacchus, F., Representing and Reasoning with Probabilistic Knowledge: A Logical Approach to Probabilities, The MIT Press, Cambridge, Massachusetts, 1990.
[17] Balinski, M., On a Selection Problem, Management Science 17 (1970) 230-231.
[18] Bamber, D., Probabilistic Entailment of Conditionals by Conditionals, IEEE Transactions on Systems, Man and Cybernetics 24 (1994) 1714-1723.
[19] Barahona, F., The Max-cut Problem on Graphs Not Contractible to K5, Operations Research Letters 2 (1983) 107-111.
[20] Billionnet, A., and M. Minoux, Maximizing a Super-modular Pseudo-Boolean Function: A Polynomial Algorithm for Super-modular Cubic Functions, Discrete Applied Mathematics 12 (1985) 1-11.
[21] Bland, R.G., New Finite Pivoting Rules for the Simplex Method, Mathematics of Operations Research 2 (1977) 103-107.
[22] Bonferroni, C.E., Teoria statistica delle classi e calcolo delle probabilità, Volume in onore di Riccardo Dalla Volta, Università di Firenze, 1-62, 1937.
[23] Boole, G., Proposed Question in the Theory of Probabilities, The Cambridge and Dublin Mathematical Journal 6 (1851) 186.
[24] Boole, G., Further Observations on the Theory of Probabilities, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 4(2) (1851) 96-101.
[25] Boole, G., Collected Logical Works. Vol. I, Studies in Logic and Probability, ed. R. Rhees, LaSalle, Illinois: Open Court, 1952.
[26] Boole, G., An Investigation of the Laws of Thought, on which are Founded the Mathematical Theories of Logic and Probabilities, London: Walton and Maberley, 1854 (reprint New York: Dover, 1958).
[27] Boole, G., On the Conditions by which Solutions of Questions in the Theory of Probabilities are Limited, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(8) (1854) 91-98.

[28] Boole, G., On a General Method in the Theory of Probabilities, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(8) (1854) 431-444.
[29] Boole, G., On Certain Propositions in Algebra Connected to the Theory of Probabilities, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(9) (1855) 165-179.
[30] Boole, G., On Propositions Numerically Definite (read posthumously by De Morgan, March 16th, 1868), Transactions of the Cambridge Philosophical Society 11 (1871) 396-411.
[31] Boros, E., and A. Prékopa, Closed Form Two-sided Bounds for Probabilities that at Least r and Exactly r out of n Events Occur, Mathematics of Operations Research 14 (1989) 317-342.
[32] Brecht, T.B., and C.J. Colbourn, Improving Reliability Bounds on Computer Networks, Networks 16 (1986) 369-380.
[33] Bruno, G., and A. Gilio, Applicazione del metodo del simplesso al teorema fondamentale per le probabilità nella concezione soggettivistica, Statistica XL(3) (1980) 337-344.
[34] Bruno, G., and A. Gilio, Comparison of Conditional Events of Zero Probability in Bayesian Statistical Inference (Italian), Rivista di Matematica per le Scienze economiche e sociali (Milan) 8(2) (1985) 141-152.
[35] Buchanan, B.G., and E.H. Shortliffe, Rule-based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1985.
[36] Bundy, A., Incidence Calculus: A Mechanism for Probabilistic Reasoning, Journal of Automated Reasoning 1 (1985) 263-283.
[37] Bundy, A., Correctness Criteria of Some Algorithms for Uncertain Reasoning using Incidence Calculus, Journal of Automated Reasoning 2 (1986) 109-126.
[38] Bundy, A., Incidence Calculus, in: Encyclopedia of Artificial Intelligence, S.C. Shapiro (ed.), New York: Wiley (1991) 663-668.
[39] Calabrese, P.G., An Algebraic Synthesis of the Foundations of Logic and Probability, Information Sciences 42 (1987) 187-237.

[40] Calabrese, P.G., Reasoning with Uncertainty Using Conditional Logic and Probability, in: First International Symposium on Uncertainty Modeling and Analysis, IEEE Computer Society, 1990, 682-688.
[41] Calabrese, P.G., Deduction and Inference using Conditional Logic and Probability, Chapter 2 in Conditional Logic in Expert Systems, I.R. Goodman et al. (eds.), North-Holland, 1991, 71-100.
[42] Calabrese, P.G., A Theory of Conditional Information with Applications, IEEE Transactions on Systems, Man and Cybernetics 24(12) (1994) 1676-1684.
[43] Carathéodory, C., Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen, Mathematische Annalen 64 (1907) 95-115.
[44] Černý, V., A Thermodynamical Approach to the Traveling Salesman Problem: An Efficient Simulation Algorithm, Journal of Optimization Theory and Applications 45(1) (1985) 41-51.
[45] Charnes, A., and W.W. Cooper, Programming with Linear Fractional Functionals, Naval Research Logistics Quarterly 9 (1962) 181-186.
[46] Chebychev, P.L. (1867), On Mean Values, in: D.E. Smith (ed.), A Source Book of Mathematics, II, New York: Dover, 1959.
[47] Chen, P.C., P. Hansen and B. Jaumard, On-Line and Off-Line Vertex Enumeration by Adjacency Lists, Operations Research Letters 10(7) (1991) 403-409.
[48] Chen, P.C., P. Hansen and B. Jaumard, Partial Pivoting in Vertex Enumeration, GERAD Research Report 92-15, Montreal, 1992.
[49] Cheeseman, P., In Defense of Probability, Proc. 9th International Joint Conference on Artificial Intelligence, Los Angeles, 1985, 1002-1009.
[50] Cheeseman, P., A Method of Computing Generalized Bayesian Probability Values for Expert Systems, Proc. Eighth International Joint Conference on Artificial Intelligence, Karlsruhe (1983) 198-292.
[51] Chesnokov, S.V., The Effect of Semantic Freedom in the Logic of Natural Language, Fuzzy Sets and Systems 22 (1987) 121-154.
[52] Choquet, G., Theory of Capacities, Annales de l'Institut Fourier 5 (1954) 131-291.

[53] Colbourn, C.J., The Combinatorics of Network Reliability, Oxford: Oxford University Press, 1987.
[54] Coletti, G., Conditionally Coherent Qualitative Probabilities, Statistica 48 (1988) 235-242.
[55] Coletti, G., Coherent Qualitative Probability, Journal of Mathematical Psychology 34 (1990) 297-310.
[56] Coletti, G., Numerical and Qualitative Judgments in Probabilistic Expert Systems, in R. Scozzafava (ed.), Proceedings of the Workshop on Probabilistic Expert Systems, Roma: SIS (1993) 37-55.
[57] Coletti, G., IEEE Transactions on Systems, Man and Cybernetics 34(12) (1994).
[58] Coletti, G., and R. Scozzafava, Characterization of Coherent Conditional Probabilities as a Tool for their Assessment and Extension, Research Report, Dept. of Mathematics, University of Perugia, Italy, 1996.
[59] Cooper, G.F., The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks, Artificial Intelligence 42 (1990) 393-405.
[60] Crama, Y., Recognition Problems for Special Classes of Polynomials in 0-1 Variables, Mathematical Programming 44 (1989) 135-155.
[61] Crama, Y., P. Hansen and B. Jaumard, The Basic Algorithm for Pseudo-Boolean Programming Revisited, Discrete Applied Mathematics 29(2-3) (1989) 171-185.
[62] Crama, Y., and J. van de Klundert, Approximation Algorithms for Integer Covering Problems via Greedy Column Generation, RAIRO, Recherche Opérationnelle 28(3) (1994) 283-302.
[63] Dantzig, G.B., On the Significance of Solving Linear Programming Problems with Some Integer Variables, Econometrica 28 (1961) 30-44.
[64] Dantzig, G.B., Linear Programming and Extensions, Princeton University Press, Princeton, 1963.
[65] Dantzig, G.B. and B.C. Eaves, Fourier-Motzkin Elimination and its Dual, Journal of Combinatorial Theory (A) 14 (1973) 288-297.
[66] de Finetti, B., Problemi determinati e indeterminati nel calcolo delle probabilità, Rendiconti Reale Accademia dei Lincei 6(XII) (1930) 367-373.

[67] de Finetti, B., La prévision: ses lois logiques, ses sources subjectives, Annales de l'Institut Henri Poincaré 7 (1937) 1-68.
[68] de Finetti, B., Theory of Probability: A Critical Introductory Treatment, Vol. 1, Wiley, New York, 1974.
[69] de Finetti, B., Theory of Probability: A Critical Introductory Treatment, Vol. 2, Wiley, New York, 1975.
[70] Dinkelbach, W., On Nonlinear Fractional Programming, Management Science 13 (1967) 492-498.
[71] Douanya-Nguetse, G.-B., P. Hansen and B. Jaumard, Probabilistic Satisfiability and Decomposition, Les Cahiers du GERAD, G-94-55, December 1994, 15 pages.
[72] Dowsland, K.A., Simulated Annealing, in C.R. Reeves (ed.), Modern Heuristic Techniques for Combinatorial Problems, London: Blackwell (1993) 20-69.
[73] Driankov, D., Reasoning with Consistent Probabilities, Proceedings of IJCAI (1987) 899-901.
[74] Dubois, D. and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[75] Dubois, D. and H. Prade, A Tentative Comparison of Numerical Approximate Reasoning Methodologies, International Journal of Man-Machine Studies 27 (1987) 717-728.
[76] Dubois, D. and H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[77] Dyer, M.E., On the Complexity of Vertex Enumeration Methods, Mathematics of Operations Research 8(3) (1983) 381-402.
[78] Fagin, R., J.Y. Halpern and N. Megiddo, A Logic for Reasoning about Probabilities, Information and Computation 87 (1990) 78-128.
[79] Fekete, M., and G. Pólya, Über ein Problem von Laguerre, Rendiconti del Circolo Matematico di Palermo 23 (1912) 89-120.
[80] Fortet, R., L'algèbre de Boole et ses applications en Recherche Opérationnelle, Cahiers du Centre d'Études de Recherche Opérationnelle 1:4 (1959) 5-36.
[81] Fortet, R., Applications de l'algèbre de Boole en Recherche Opérationnelle, Revue Française d'Informatique et de Recherche Opérationnelle 4:14 (1960) 17-25.

[82] Frisch, A.M. and P. Haddawy, Convergent Reduction for Probabilistic Logic, Uncertainty in Artificial Intelligence 3, Amsterdam: Elsevier, 1987, 278–286.
[83] Frisch, A.M. and P. Haddawy, Anytime Deduction for Probabilistic Logic, Artificial Intelligence 69 (1994) 93–122.
[84] Gelenbe, E., Une généralisation probabiliste du problème SAT, Comptes Rendus de l'Académie des Sciences de Paris 315 (1992) 339–342.
[85] Geoffrion, A.M., Generalized Benders Decomposition, Journal of Optimization Theory and its Applications, 1972.
[86] Georgakopoulos, G., D. Kavvadias and C.H. Papadimitriou, Probabilistic Satisfiability, Journal of Complexity 4 (1988) 1–11.
[87] Glover, F., Tabu Search - Part I, ORSA Journal on Computing 1 (1989) 190–206.
[88] Glover, F., Tabu Search - Part II, ORSA Journal on Computing 2 (1990) 4–32.
[89] Goemans, M.X. and D.P. Williamson, A New 3/4-Approximation Algorithm for MAX SAT, in: Proceedings of the Third IPCO Conference, G. Rinaldi and L. Wolsey (eds.) (1993) 313–321.
[90] Goodman, I.R., A Measure-Free Approach to Conditioning, Proceedings of the 3rd AAAI Workshop on Uncertainty in AI, Seattle, July 1987, 270–277.
[91] Goodman, I.R. and H.T. Nguyen, Conditional Objects and the Modeling of Uncertainties, in: Fuzzy Computing, Theory, Hardware and Applications, M.M. Gupta and T. Yamakawa (eds.), North-Holland, Amsterdam, 1988, 119–138.
[92] Goodman, I.R., H.T. Nguyen and E.A. Walker, Conditional Inference and Logic for Intelligent Systems, Amsterdam: North-Holland, 1991.
[93] Grötschel, M., L. Lovász and A. Schrijver, The Ellipsoid Method and its Consequences in Combinatorial Optimization, Combinatorica 1 (1981) 169–197. (Corrigendum: 4 (1984) 291–295.)
[94] Grzymala-Busse, J.W., Managing Uncertainty in Expert Systems, Kluwer, Boston, 1991.
[95] Gu, J., Efficient Local Search for Very Large-Scale Satisfiability Problems, SIGART Bulletin 3 (1992) 8–12.

[96] Guggenheimer, H. and R.S. Freedman, Foundations of Probabilistic Logic, Proceedings of IJCAI (1987) 939–941.
[97] Hailperin, T., Best Possible Inequalities for the Probability of a Logical Function of Events, American Mathematical Monthly 72 (1965) 343–359.
[98] Hailperin, T., Boole's Logic and Probability, Studies in Logic and the Foundations of Mathematics 85, North-Holland, Amsterdam, first edition, 1976.
[99] Hailperin, T., Probability Logic, Notre Dame Journal of Formal Logic 25(3) (1984) 198–212.
[100] Hailperin, T., Boole's Logic and Probability, Studies in Logic and the Foundations of Mathematics 85, North-Holland, Amsterdam, 2nd enlarged edition, 1986.
[101] Hailperin, T., Probability Logic, Manuscript, 1993.
[102] Halpern, J.Y., A Study of First Order Logics of Probability, Artificial Intelligence, 1991.
[103] Hammer, P.L., I. Rosenberg and S. Rudeanu, On the Determination of the Minima of Pseudo-Boolean Functions (in Romanian), Studii si Cercetari Matematice 14 (1963) 359–364.
[104] Hammer, P.L. and S. Rudeanu, Boolean Methods in Operations Research and Related Areas, Berlin: Springer, 1966.
[105] Hansen, P., Les procédures d'optimisation et d'exploration par séparation et évaluation, in: B. Roy (ed.), Combinatorial Programming, Dordrecht: Reidel (1975) 19–65.
[106] Hansen, P. and B. Jaumard, Algorithms for the Maximum Satisfiability Problem, Computing 44 (1990) 279–303.
[107] Hansen, P., B. Jaumard and G.-B. Douanya Nguetse, Best Second Order Bounds for Two-terminal Network Reliability with Dependent Edge Failures, Les Cahiers du GERAD G–94–01, February 1994, 23 pages.
[108] Hansen, P., B. Jaumard, G.-B. Douanya Nguetse and M. Poggi de Aragão, Models and Algorithms for Probabilistic and Bayesian Logic, in: IJCAI-95, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (1995) 1862–1868.

[109] Hansen, P., B. Jaumard and V. Mathon, Constrained Nonlinear 0–1 Programming, ORSA Journal on Computing 5 (1993) 97–119.
[110] Hansen, P., B. Jaumard and M. Poggi de Aragão, Un algorithme primal de programmation linéaire généralisée pour les programmes mixtes, Comptes Rendus de l'Académie des Sciences de Paris 313(I) (1991) 557–560.
[111] Hansen, P., B. Jaumard and M. Poggi de Aragão, Mixed-Integer Column Generation Algorithms and the Probabilistic Maximum Satisfiability Problem, Integer Programming and Combinatorial Optimization II, E. Balas, G. Cornuéjols and R. Kannan (eds.), Pittsburgh: Carnegie Mellon University (1992) 165–180.
[112] Hansen, P., B. Jaumard and M. Poggi de Aragão, Boole's Conditions of Possible Experience and Reasoning under Uncertainty, Discrete Applied Mathematics 60 (1995) 181–193.
[113] Hansen, P., M. Minoux and M. Labbé, Extension de la programmation linéaire généralisée au cas des programmes mixtes, Comptes Rendus de l'Académie des Sciences de Paris 305 (1987) 569–572.
[114] Hansen, P. and B. Simeone, Unimodular Functions, Discrete Applied Mathematics 14 (1986) 269–281.
[115] Henrion, M., Propagating Uncertainty in Bayesian Networks by Probabilistic Logic Sampling, Uncertainty in Artificial Intelligence 2, J.F. Lemmer and L.N. Kanal (eds.), North-Holland, Amsterdam, 1988, 149–164.
[116] Hooker, J.N., A Mathematical Programming Model for Probabilistic Logic, Working Paper 05–88–89, Graduate School of Industrial Engineering, Carnegie Mellon University, Pittsburgh, PA 15123, July 1988.
[117] Jaumard, B., P. Hansen and M. Poggi de Aragão, Column Generation Methods for Probabilistic Logic, ORSA Journal on Computing 3 (1991) 135–148.
[118] Kämpke, T., Probabilistic Logic via Capacities, International Journal of Intelligent Systems 10 (1995) 857–869.
[119] Kane, T.B., Maximum Entropy in Nilsson's Probabilistic Logic, in: Proceedings of IJCAI 1989, Morgan Kaufmann, California, 442–447, 1989.
[120] Kane, T.B., Enhancing the Inference Mechanism of Nilsson's Probabilistic Logic, International Journal of Intelligent Systems 5(5) (1990) 487–504.

[121] Kane, T.B., Reasoning with Maximum Entropy in Expert Systems, in: W.T. Grandy and L.H. Schick (eds.), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Boston, 201–214, 1991.
[122] Kane, T.B., Reasoning with Uncertainty Using Nilsson's Probabilistic Logic and the Maximum Entropy Formalism, Doctoral Dissertation, Heriot-Watt University, Edinburgh, 1992.
[123] Kane, T.B., P. McAndrew and A.M. Wallace, Model-Based Object Recognition Using Probabilistic Logic and Maximum Entropy, International Journal of A.I. and Pattern Recognition 5(3) (1991) 425–437.
[124] Kavvadias, D. and C.H. Papadimitriou, A Linear Programming Approach to Reasoning about Probabilities, Annals of Mathematics and Artificial Intelligence 1 (1990) 189–205.
[125] Keynes, J.M., A Treatise on Probability, London: Macmillan, 1921.
[126] Khachiyan, L.G., A Polynomial Algorithm in Linear Programming (in Russian), Doklady Akademii Nauk SSSR 224 (1979) 1093–1096. (English translation: Soviet Mathematics Doklady 20 (1979) 191–194.)
[127] Kirkpatrick, S., C.D. Gelatt and M.P. Vecchi, Optimization by Simulated Annealing, Science 220(4598) (1983) 671–674.
[128] Kohlas, J. and P.-A. Monney, Probabilistic Assumption-Based Reasoning, Working Paper 94–22, Institute of Informatics, University of Fribourg, 1994.
[129] Kohlas, J. and P.-A. Monney, Assumption-Based Truth Maintenance, Lecture Notes in Computer Science, Berlin: Springer (1995).
[130] Kounias, S. and J. Marin, Best Linear Bonferroni Bounds, SIAM Journal on Applied Mathematics 30 (1976) 307–323.
[131] Kounias, S. and K. Sotirakoglou, Upper and Lower Bounds for the Probability that r Events Occur, Optimization 27 (1993) 63–78.
[132] Lad, F., J.M. Dickey and M.A. Rahman, The Fundamental Theorem of Prevision, Statistica 50 (1990) 19–38.
[133] Lad, F., J.M. Dickey and M.A. Rahman, Numerical Applications of the Fundamental Theorem of Prevision, Journal of Statistical Computing and Simulation 40 (1992) 135–151.

[134] Laskey, K.B. and P.E. Lehner, Assumptions, Beliefs and Probabilities, Artificial Intelligence 41 (1990) 65–77.
[135] Lauritzen, S.L. and D.J. Spiegelhalter, Computation with Probabilities in Graphical Structures and their Application to Expert Systems, Journal of the Royal Statistical Society B 50(2) (1988) 157–224.
[136] Lewis, D., Probabilities of Conditionals and Conditional Probabilities, Philosophical Review 85 (1976) 297–315.
[137] Lin, S. and B.W. Kernighan, An Effective Heuristic Algorithm for the Traveling Salesman Problem, Operations Research 21 (1973) 498–516.
[138] Liu, W. and A. Bundy, A Comprehensive Comparison between Generalized Incidence Calculus and the Dempster-Shafer Theory of Evidence, International Journal of Human-Computer Studies 40 (1994) 1009–1032.
[139] McLean, R.G., A. Bundy and W. Liu, Assignment Methods for Incidence Calculus, International Journal of Approximate Reasoning 12 (1995) 21–41.
[140] McLeish, M., A Note on Probabilistic Logic, Proceedings of the American Association for Artificial Intelligence Conference, St. Paul-Minneapolis, 1988.
[141] McLeish, M., Probabilistic Logic: Some Comments and Possible Use for Nonmonotonic Reasoning, in J.F. Lemmer and L.N. Kanal (eds.), Uncertainty in Artificial Intelligence 2, Amsterdam: North-Holland, 55–62, 1988.
[142] Marsten, R.E., The Design of the XMP Linear Programming Library, ACM Transactions on Mathematical Software 7(4) (1981) 481–497.
[143] Medolaghi, La logica matematica e il calcolo delle probabilità, Bollettino Associazione Italiani di Attuari 18 (1907).
[144] Nilsson, N.J., Probabilistic Logic, Artificial Intelligence 28(1) (1986) 71–87.
[145] Nilsson, N.J., Probabilistic Logic Revisited, Artificial Intelligence 59 (1993) 39–42.
[146] Paris, J. and A. Vencovská, On the Applicability of Maximum Entropy to Inexact Reasoning, International Journal of Approximate Reasoning 3 (1988) 1–34.
[147] Parker, K.P. and E.J. McCluskey, Analysis of Logic with Faults Using Input Signal Probabilities, IEEE Transactions on Computers C–24 (1975) 573–578.

[148] Parker, K.P. and E.J. McCluskey, Probabilistic Treatment of General Combinational Networks, IEEE Transactions on Computers C–24 (1975) 668–670.
[149] Pearl, J., How to Do with Probabilities what People Say You Can't, Proceedings of the Second Annual Conference on Artificial Intelligence Applications, December 11–13, Miami, Florida, 6–12, 1985.
[150] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, California, 1988.
[151] Prékopa, A., Boole–Bonferroni Inequalities and Linear Programming, Operations Research 36 (1988) 145–162.
[152] Prékopa, A., Sharp Bounds on Probabilities Using Linear Programming, Operations Research 38 (1990) 227–239.
[153] Prékopa, A., The Discrete Moment Problem and Linear Programming, Discrete Applied Mathematics 27 (1990) 235–254.
[154] Prékopa, A. and E. Boros, On the Existence of a Feasible Flow in a Stochastic Transportation Network, Operations Research 39 (1991) 119–129.
[155] Quinlan, J.R., Inferno: A Cautious Approach to Uncertain Inference, The Computer Journal 26 (1983) 255–269.
[156] Reichenbach, H., Philosophical Foundations of Quantum Mechanics, University of California Press, Berkeley, 1948.
[157] Reiter, S. and D.B. Rice, Discrete Optimization Solution Procedures for Linear and Nonlinear Integer Programming Problems, Management Science 12 (1966) 829–850.
[158] Reeves, C.R. (ed.), Modern Heuristic Techniques for Combinatorial Problems, London: Blackwell (1993).
[159] Rhys, J.M.W., A Selection Problem of Shared Fixed Costs and Network Flows, Management Science 17 (1970) 200–207.
[160] Schay, G., An Algebra of Conditional Events, Journal of Mathematical Analysis and Applications 24 (1968) 334–344.
[161] Selman, B., H. Levesque and D. Mitchell, A New Method for Solving Hard Satisfiability Problems, Proceedings of the Tenth National Conference on Artificial Intelligence, 1992, 440–446.

[162] Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[163] Simeone, B., D. de Werra and M. Cochand, Combinatorial Properties and Recognition of Some Classes of Unimodular Functions, Discrete Applied Mathematics 29 (1990) 243–250.
[164] Stephanou, H.S. and A.P. Sage, Perspectives on Imperfect Information Processing, IEEE Transactions on Systems, Man and Cybernetics 17 (1987) 780–798.
[165] Suppes, P., Probabilistic Inference and the Concept of Total Evidence, in: J. Hintikka and P. Suppes (eds.), Aspects of Inductive Logic, Amsterdam: North-Holland, 1966, 49–65.
[166] Tarjan, R.E. and M. Yannakakis, Simple Linear-Time Algorithms to Test Chordality of Graphs, Test Acyclicity of Hypergraphs and Selectively Reduce Acyclic Hypergraphs, SIAM Journal on Computing 13 (1984) 566–579.
[167] Van der Gaag, L.C., Probability-Based Models for Plausible Reasoning, Ph.D. Thesis, University of Amsterdam, 1990.
[168] Van der Gaag, L.C., Computing Probability Intervals Under Independency Constraints, in: P.P. Bonissone, M. Henrion, L.N. Kanal and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence 6 (1991) 457–466.
[169] Van Laarhoven, P.J.M. and E.H.L. Aarts, Simulated Annealing: Theory and Applications, Dordrecht: Kluwer, 1988.
[170] Walley, P., Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, Melbourne, 1991.
[171] Wilbraham, H., On the Theory of Chances Developed in Professor Boole's "Laws of Thought", The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 4(7) (1854) 465–476.
[172] Zadeh, L.A., Fuzzy Sets, Information and Control 8 (1965) 338–353.
[173] Zadeh, L.A., Fuzzy Sets as a Basis for a Theory of Possibility, Fuzzy Sets and Systems 1 (1978) 3–28.

[174] Zadeh, L.A., Is Probability Theory Sufficient for Dealing with Uncertainty in AI: A Negative View, in: L.N. Kanal and J.F. Lemmer (eds.), Uncertainty in Artificial Intelligence 4, North-Holland, 1986, 103–106.
[175] Zemel, E., Polynomial Algorithms for Estimating Network Reliability, Networks 12 (1982) 439–452.


Contents

1 Introduction
  1.1 Uncertainty and Probability
  1.2 Probabilistic Satisfiability
  1.3 Extensions
    1.3.1 Probability Intervals (or Imprecise Probabilities)
    1.3.2 Conditional Probabilities
    1.3.3 Additional Linear Constraints
    1.3.4 Logical Operations on Conditional Events and their Probabilities
2 Analytical solution of PSAT
  2.1 Boole's algebraic method
  2.2 Hailperin's extensions of Boole's algebraic method
  2.3 Polyhedral methods to obtain rules for combining bounds on probabilities
  2.4 Automated theorem proving with probabilistic satisfiability
  2.5 Theorem proving with condensed forms of probabilistic satisfiability
3 Numerical Solution of PSAT
  3.1 Column Generation
  3.2 Solution of the auxiliary problem
    3.2.1 Heuristics
    3.2.2 Exact algorithms
  3.3 Computational Experience
  3.4 Computational Complexity
4 Decomposition
5 Nonmonotonic Reasoning and Restoring Satisfiability
  5.1 Minimal Extension of Probability Intervals
  5.2 Probabilistic Maximum Satisfiability
6 Other Uses of the Probabilistic Satisfiability Model
  6.1 Maximum Entropy Solution
  6.2 Anytime Deduction
7 Other Related Approaches
  7.1 Incidence Calculus
  7.2 Bayesian Logic
  7.3 Assumption-based Truth Maintenance Systems
  7.4 Probabilistic Logic via Capacities
  7.5 Other Applications
8 Conclusions

1 Introduction

1.1 Uncertainty and Probability

Due to the ubiquity of uncertainty both in knowledge and in inference rules, generalizations of logic-based methods including an explicit treatment of uncertainty have long been studied in Artificial Intelligence. In fact, probability logic predates AI by more than a century; see Hailperin [101] for a detailed historical survey. Uncertainty has been studied from different perspectives. It has been argued (e.g., Zadeh [174]) that probability theory is not adequate for the treatment of uncertainty in AI. Alternative frameworks, such as fuzzy sets (e.g., Zadeh [172], Dubois and Prade [74]) and possibility theory (e.g., Zadeh [173], Dubois and Prade [76]), were proposed. Many specific rules for combining uncertainty measures (or estimates) in expert systems were also elaborated (e.g., the certainty factor of MYCIN, see Buchanan and Shortliffe [35]). Dissatisfaction with such skepticism and with alternative solutions led to reactions. Among others, Cheeseman [49] makes "A defense of probability" and Pearl [149] explains "How to do with probabilities what people say you can't". Successful methods spurred a recent return in favor of probability theory, highlighted by Nilsson's [144] paper on "Probabilistic Logic" and Lauritzen and Spiegelhalter's [135] paper on local computation in Bayesian networks.

The purpose of the present chapter is to survey the probability-based treatment of uncertainty in AI from an algorithmic point of view. To this effect the focus will be on a central model, probabilistic satisfiability (equivalent to Nilsson's [144] probabilistic logic and probabilistic entailment), and its extensions. This model provides a way to make inferences of a general type. It is thus equivalent in practice, although expressed differently, to a probability logic. For general discussions of propositional and first-order probability logic the reader is referred to, e.g., Hailperin [99], Fagin, Halpern and Megiddo [78], Bacchus [16], Halpern [102], Grzymala-Busse [94], Abadi and Halpern [2].

The chapter is organized as follows. A formal statement of probabilistic satisfiability is given in the next subsection. Extensions are considered in Subsection 1.3: probability intervals (or imprecise probabilities), conditional probabilities in the constraints or objective function, and further linear constraints on the probabilities are introduced, as well as probabilities for negations, conjunctions or disjunctions of conditional events and iterated conditionals. Analytical solution of probabilistic satisfiability and its extensions is studied in Section 2, which treats algebraic methods and methods based on enumeration of vertices and extreme rays of polytopes. Applications to automated theorem proving in the theory of probabilities are described. Numerical solution of probabilistic satisfiability is considered in Section 3. The column generation technique of linear programming is shown to play a crucial role. The auxiliary problem of finding the minimum (maximum) reduced cost, to be solved at each iteration when using column generation, is to minimize (maximize) a nonlinear function in 0-1 variables. It can be solved approximately by Tabu Search or some other heuristic, except when no more negative (positive) reduced costs can be found; then an exact solution method must be used, and algebraic and linearization approaches appear to be the most efficient. Section 4 discusses the solution of large probabilistic satisfiability problems by decomposition. When the proposed probabilities are not consistent, it is required to restore satisfiability with minimal changes, a form of nonmonotonic reasoning discussed in Section 5. Several ways to do so are examined: probability intervals may be increased in a minimal way, which can again be done by linear programming, or a minimum subset of sentences may be deleted. The probabilistic maximum satisfiability problem arising in this last case requires for its solution combining column generation with mixed-integer programming. Two ways to do so, extending the primal and dual approaches to mixed-integer programming, are presented.
In Section 6, ways to exploit the probabilistic satisfiability model with aims different from those considered before are examined. Obtaining a unique solution, i.e., a single probability distribution, for probabilistic satisfiability is first examined; a natural tool is then entropy maximization. Next, anytime deduction (Frisch and Haddawy [83]) is discussed. Bounds of increasing precision are computed using a set of rules whose application may be stopped whenever desired. This also gives an explicit justification for the results obtained.

In Section 7, probabilistic satisfiability is compared with related approaches to the treatment of uncertainty in AI, mainly Bundy's [36, 37] incidence calculus, Bayesian networks (e.g., Pearl [150], Lauritzen and Spiegelhalter [135]) and their combination with probabilistic satisfiability known as Bayesian logic (Andersen and Hooker [11]). We also discuss probabilistic assumption-based truth maintenance systems (e.g., Kohlas and Monney [129]) and the recent work of Kämpke [118] on extending probabilistic satisfiability to belief functions using capacities (Choquet [52]). Applications of probabilistic satisfiability or related models outside AI are briefly mentioned. Conclusions on the role of probabilistic satisfiability in AI and related fields are drawn in Section 8.

1.2 Probabilistic Satisfiability

The probabilistic satisfiability problem in decision form may be defined as follows. Consider m logical sentences S1, S2, ..., Sm defined on n logical variables x1, x2, ..., xn with the usual Boolean operators ∨ (logical sum), ∧ (logical product) and negation (complementation, denoted by an overbar). Assume probabilities π1, π2, ..., πm for these sentences to be true are given. Are these probabilities consistent?

There are 2^n complete products wj, for j = 1, 2, ..., 2^n, of the variables x1, x2, ..., xn in direct or complemented form. These products may be called, following Leibniz, possible worlds. In each possible world wj any sentence Si is true or false. The probabilistic satisfiability problem may then be reformulated: is there a probability distribution p1, p2, ..., p_{2^n} on the set of possible worlds such that the sum of the probabilities of the possible worlds in which sentence Si is true is equal to its probability πi of being true, for i = 1, 2, ..., m? Defining the m × 2^n matrix A = (aij) by

  aij = 1 if Si is true in possible world wj, and aij = 0 otherwise,

the decision form of probabilistic satisfiability may be written:

  1p = 1
  Ap = π        (1)
  p ≥ 0

where 1 is a 2^n-dimensional unit row vector, and p and π are the column vectors (p1, p2, ..., p_{2^n})^T and (π1, π2, ..., πm)^T respectively. The answer is yes if there is a vector p satisfying (1) and no otherwise.

Note that not all columns of A need be different. Moreover, not all 2^m possible different column vectors of A need be, or in most cases will be, present. This is due to the fact that some subset of sentences being true will force other sentences to be true or prohibit them from being so. Guggenheimer and Freedman [96] study the particular case in which, for a subset of sentences, all possible corresponding subvectors of A are present and the values of all sentences of the complementary subset are fixed when the variables in any of these subvectors are fixed.
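As a concrete sketch of the construction above (ours, not from the text: the two sentences and the certifying distribution are invented for illustration), the matrix A and system (1) can be checked directly on a tiny instance:

```python
from itertools import product

# Possible worlds: all 2^n assignments to (x1, ..., xn).
n = 2
worlds = list(product([0, 1], repeat=n))          # 2^n = 4 worlds

# Two illustrative sentences: S1 = x1, S2 = x1 AND x2.
sentences = [
    lambda w: w[0],                               # S1
    lambda w: w[0] and w[1],                      # S2
]

# a_ij = 1 if S_i is true in possible world w_j, 0 otherwise.
A = [[1 if s(w) else 0 for w in worlds] for s in sentences]

# A candidate distribution p over worlds (0,0), (0,1), (1,0), (1,1),
# and the proposed probabilities pi = (0.6, 0.3).
p = [0.3, 0.1, 0.3, 0.3]
pi = [0.6, 0.3]

# Decision form (1): 1p = 1, Ap = pi, p >= 0.
assert abs(sum(p) - 1) < 1e-9 and all(q >= 0 for q in p)
for row, target in zip(A, pi):
    assert abs(sum(a * q for a, q in zip(row, p)) - target) < 1e-9
print("pi is consistent: p certifies a yes answer to (1)")
```

Here p was simply guessed; in general, deciding whether such a p exists is a linear feasibility problem over the 2^n possible worlds.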

Considering one more sentence Sm+1, with an unknown probability πm+1, leads to the optimization form of probabilistic satisfiability. Usually the constraints (1) do not impose a unique value for the probability πm+1 of Sm+1. As shown by de Finetti [67, 68, 69], this is the case if and only if the row vector Am+1 = (am+1,j), where am+1,j = 1 if Sm+1 is true in possible world wj and am+1,j = 0 if not, is a linear combination of the rows of A. Otherwise, the constraints (1) imply bounds on the probability πm+1. The probabilistic satisfiability problem in optimization form is to find the best possible such bounds. It can be written:

  min / max Am+1 p
  subject to:
    1p = 1
    Ap = π        (2)
    p ≥ 0.

Nilsson [144] calls (1) and (2) probabilistic logic and probabilistic entailment. However, while (1) and (2) are very useful inference tools, they do not properly constitute a logic, i.e., a set of axioms and inference rules. The name probabilistic satisfiability, proposed by Georgakopoulos, Kavvadias and Papadimitriou [86], appears better suited, as it stresses the relationship of (1) with the satisfiability problem, which is the particular case where π = 1 and a solution with a single positive pj is required (such a solution can easily be deduced from any other solution of (2)). As stressed by Kane [120, 122], two columns of (2) may differ only in their value in Am+1; they should not then be conflated and assumed to have the same probability, as suggested by Nilsson [144], for this would prohibit getting best possible bounds.

Both problems (1) and (2) have their origin in the work of Boole [26, 27, 28, 29, 30], where they are called "conditions of possible experience" and the "general problem in the theory of probabilities". Boole proposed algebraic methods for their solution (discussed below). Criticized by Wilbraham [171], and later by Keynes [125], Boole's work in probability was long forgotten in English-speaking countries. It seems, however, to have strongly influenced de Finetti [66, 67, 68, 69], through Medolaghi [143], in the development of his theory of subjective probabilities. Boole's work was revived by Hailperin [97, 98, 100], who wrote a seminal paper explaining it with the help of linear programming, as well as a book-length study of Boole's logic and probability [98, 100]. Hailperin also obtained several new results and proposed extensions of probabilistic satisfiability, discussed below. Due to its basic character, probabilistic satisfiability was often independently rediscovered, sometimes in particular cases or variants, e.g., by Adams and Levine [8], Kounias and Marin [130], Nilsson [144], Chesnokov [51], Gelenbe [84] and probably others.
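For very small n the best bounds in (2) can be computed exactly by enumerating the basic solutions of the equality system, since a linear program attains its optimum at a vertex. The sketch below is ours (the data prob(x1) = 3/5 and prob(x2) = 1/2 are illustrative); it recovers the classical Boole-Fréchet bounds [1/10, 1/2] on prob(x1 ∧ x2):

```python
from fractions import Fraction as F
from itertools import combinations, product

def solve(M, b):
    """Gaussian elimination over the rationals; returns x with Mx = b, or None if M is singular."""
    k = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(k):
        piv = next((r for r in range(c, k) if aug[r][c] != 0), None)
        if piv is None:
            return None
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(k):
            if r != c and aug[r][c] != 0:
                aug[r] = [vr - aug[r][c] * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[r][k] for r in range(k)]

worlds = list(product([0, 1], repeat=2))
rows = [[F(1)] * 4,                        # 1p = 1
        [F(w[0]) for w in worlds],         # prob(x1) = 3/5
        [F(w[1]) for w in worlds]]         # prob(x2) = 1/2
rhs = [F(1), F(3, 5), F(1, 2)]
obj = [F(w[0] and w[1]) for w in worlds]   # objective row A_{m+1} for S3 = x1 AND x2

vals = []
for cols in combinations(range(4), 3):     # enumerate candidate bases
    x = solve([[rows[r][c] for c in cols] for r in range(3)], rhs)
    if x is not None and all(v >= 0 for v in x):
        vals.append(sum(obj[c] * v for c, v in zip(cols, x)))
print(min(vals), max(vals))                # 1/10 1/2
```

This brute force is only for illustration; Section 3 describes the column generation approach actually used for instances of realistic size.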

1.3 Extensions

1.3.1 Probability Intervals (or Imprecise Probabilities)

Several significant extensions of probabilistic satisfiability have been proposed. Hailperin [97] noted that the use of intervals instead of point values for probabilities is often more realistic, and more general than Boole's "general problem". With vectors π̲ and π̄ of lower and upper bounds, problem (2) becomes:

  min / max Am+1 p
  subject to:
    1p = 1
    π̲ ≤ Ap ≤ π̄        (3)
    p ≥ 0.

If bounded variables are used, an equivalent formulation in which the number of constraints remains equal to m + 1 is obtained:

  min / max Am+1 p
  subject to:
    1p = 1
    Ap + s = π̄        (4)
    p ≥ 0, 0 ≤ s ≤ π̄ − π̲.

This problem is also discussed in Lad, Dickey and Rahman [132], Jaumard, Hansen and Poggi de Aragão [117], and Andersen and Hooker [11]. An extensive study of statistical reasoning with imprecise probabilities, using (3) and various extensions, is due to Walley [170].
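The bounded-slack reformulation (4) can be checked numerically; in this small sketch (the interval and distribution are invented) the condition 0 ≤ s ≤ π̄ − π̲ is exactly the interval constraint of (3):

```python
from fractions import Fraction as F

# One row of A over four possible worlds and a hypothetical interval [2/5, 7/10].
row = [0, 0, 1, 1]
lo, hi = F(2, 5), F(7, 10)
p = [F(3, 10), F(1, 10), F(3, 10), F(3, 10)]   # candidate distribution, sums to 1

ap = sum(a * q for a, q in zip(row, p))        # Ap = 3/5
s = hi - ap                                    # slack chosen so that Ap + s = upper bound
# Bounded slack 0 <= s <= hi - lo holds exactly when lo <= Ap <= hi.
assert 0 <= s <= hi - lo
assert lo <= ap <= hi
print(ap, s)                                   # 3/5 1/10
```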

1.3.2 Conditional Probabilities

Another important extension of probabilistic satisfiability is to consider conditional probabilities instead of, or in addition to, unconditioned ones. Indeed, in many cases probabilistic knowledge is only precise when some conditions hold. Use of conditional probabilities was already discussed by Boole [27] for particular examples. It is connected with his idea of independence, which is examined in Section 7. Other authors addressing conditional probabilities in the context of probabilistic satisfiability are Hailperin [100], Chesnokov [51], Jaumard, Hansen and Poggi de Aragão [117] and Coletti [57]. Two cases arise: conditionals may be in the constraints of (4) or in the objective function.

Several ways of representing the conditional probability prob(Sk|Sℓ) = prob(Sk ∧ Sℓ)/prob(Sℓ) = πk|ℓ in (2) have been proposed. Introducing a variable πℓ for the unknown probability prob(Sℓ) leads to the two constraints (Jaumard et al. [117]):

  Ak∧ℓ p − πk|ℓ πℓ = 0
  Aℓ p − πℓ = 0        (5)

where Ak∧ℓ = (ak∧ℓ,j) with ak∧ℓ,j = 1 if both Sℓ and Sk are true in possible world wj and 0 otherwise. This way to express conditional probabilities is close to that of Boole [26], who also introduces an unknown parameter. A more compact expression is obtained by eliminating πℓ (Hailperin [100]):

  A′k∧ℓ p = (Ak∧ℓ − πk|ℓ Aℓ) p = 0        (6)

A00k^`p = kj`

(7)

where A00k^` = (a00k^`;j ) is such that a00k^`;j = 1 if Sk and S` are true, 0 if Sk is false and S` true and kj` if S` is false. Observe that these three values coincide with those given by de Finetti [68, 69] in his de nition of the probability of a conditional event in terms of a bet won, lost or cancelled. If the conditional probability prob(Sk jS`) is in the objective function, the problem becomes one of hyperbolic (or fractional) programming: min = max AA^pp subject to: 1p = 1 (8) Ap = p 0: k

`

`

As noted by Hailperin [100] and by Chesnokov [51], a result of Charnes and Cooper [45] may be used to reduce the problem (8) to a linear program with one more variable: min = max Ak^`p subject to: A` p = 1 1p = t (9) Ap = t p 0; t 0; 7

and the same optimal value; the corresponding solution is obtained by dividing the optimal solution p of (9) by t. Note that all but one of the equations of (9) are homogeneous. This may cause problems in numerical solution, due to degeneracy. An alternate way to solve (8) is to apply Dinkelbach's [70] lemma, as done by Jaumard, Hansen and Poggi de Arag~ao [117]. Let r = 1 and r be an upper bound for the optimal value of (8), in case of minimization (which can always be taken as 1). Solve the problem (Ak^` ? r A`)p 1p = 1 Ap = p0

min subject to:

(10)

If the optimal value (Ak^` ? r A`)p is non-negative, stop, p being optimal. Otherwise, let r r + 1, r = AA^pp and iterate. k

`

`

1.3.3 Additional Linear Constraints Fagin, Halpern and Megiddo [78] note that if some of the i are not xed they may be subject to v 1 further linear inequalities. This leads to another extension: min = max subject to:

Am+1p 1p = 1 Ap + s = B = b

(11)

where B and b are a (v m)-matrix and a v-column vector of real numbers. This includes the problem of coherence of qualitative probabilities studied by, among others, Coletti [55, 56, 57] where only order relations between probabilities are given (with an arbitrarily small approximation if some or all of the inequalities are strict). Qualitative conditional probabilities, also studied by Coletti [54, 57], Coletti and Scozzafava [58] lead to a more complex nonlinear model. 8

Imprecise conditional probabilities can be treated similarly to imprecise probabilities. If kj` kj` k` the corresponding lines in the linear program are Ak^`p ? kj`A`p 0 (12) Ak^`p ? kj`A`p 0 Andersen and Hooker [10] propose a particular interpretation for this case, in terms of unreliable sources of information: prob(Sk jS`) is viewed as the probability that Sk is true given that the source of information ` is reliable. This last condition is expressed by proposition S` the probabilities of which is itself bounded by an interval:

` prob(S` ) = A`p `:

(13)

Conditional propositions themselves conditioned on the reliability of the source can also be expressed in a similar way. This is a particular case of iterated conditioning, a topic explored by, among others, Goodman, Nguyen and Walker [92], Calabrese [42] and discussed below.

1.3.4 Logical Operations on Conditional Events and their Probabilities Conditional probabilities P (S1jS2) may be viewed as probabilities of conditional events (S1jS2) which have three truth values: true if S1 and S2 are true, false if S1 is false and S2 true and undetermined if S2 is false. Such conditional events, implicit in Boole [26] were de ned by de Finetti [67, 68, 69] and rediscovered recently by many authors. Proposals for building an algebra of conditional events were made, more or less systematically, by Reichenbach [156], Schay [160], Adams [8], Hailperin [98, 100], Dubois and Prade [76], Bruno and Gilio [34], Calabrese [39, 40, 41, 42], Goodman, Nguyen and Walker [92]. Several de nitions, often justi ed on intuitive grounds, were given for conjunction and disjunction operations. Diculty is largely due to the fact that as shown by Lewis' Triviality Result [136], there is no expression S for (S1jS2) in boolean algebra such that P (S ) = P (S1jS2) except in very particular cases. Goodman, Nguyen and Walker [92] show that the space of conditional events is a Stone algebra, generalizing Boolean algebras. Moreover, they show that dierent ways to de ne conjunction 9

and disjunction correspond to dierent three-valued logics. Schay [160] proposes two systems:

(S1|S2) ∧ (S3|S4) = ((S̄2 ∨ S1)(S̄4 ∨ S3) | S2 ∨ S4)
(S1|S2) ∨ (S3|S4) = (S1S2 ∨ S3S4 | S2 ∨ S4)    (14)

and

(S1|S2) ∧ (S3|S4) = (S1S3 | S2S4)
(S1|S2) ∨ (S3|S4) = (S1 ∨ S3 | S2S4).    (15)

Goodman and Nguyen [91] propose another one:

(S1|S2) ∧ (S3|S4) = (S1S3 | S̄1S2 ∨ S̄3S4 ∨ S2S4)
(S1|S2) ∨ (S3|S4) = (S1 ∨ S3 | S1S2 ∨ S3S4 ∨ S2S4).    (16)

All three systems have negation defined by

¬(S1|S2) = (S̄1S2 | S2).    (17)

Truth tables for S̄1, S1 ∨ S2 and S1 ∧ S2 as a function of S1 and S2, deduced from rules (14) and (17) and from rules (15) and (17), are those of Sobocinski's and Bochvar's 3-valued logics. Those for the system (16)-(17) correspond to Lukasiewicz's and Kleene's 3-valued logics (as well as to Heyting's 3-valued logic, except for S̄1). These results show that any algebraic expression of conditional events can be reduced (in several ways) to a single conditional event. Probabilities of such compound expressions can thus be expressed in probabilistic satisfiability models as usual conditional probabilities. Iterated conditionals have also been reduced to conditionals in various ways. For instance, Calabrese [42] proposes the relation

(S1|S2)|(S3|S4) = (S1 | S2 ∧ (S3 ∨ S̄4)).    (18)

The subject is also discussed in detail in Goodman, Nguyen and Walker [92].

2 Analytical solution of PSAT

2.1 Boole's algebraic method

Boole [26, 27, 28, 29, 30] proposed several methods (some of which are approximate) to solve analytically the decision and optimization versions of probabilistic satisfiability. The methods for both cases are similar. Boole equates truth of a logical sentence with the value 1 and falsity with 0. His simplest and most efficient method proceeds as follows:

Algorithm B (Boole's method)

(i) Express all logical sentences as sums of complete products, i.e., products of all variables in direct or complemented form;

(ii) associate with each of these products an unknown probability pj, and write linear equations stating that the sum of the probabilities pj of the complete products associated with a logical sentence is equal to the given probability πi of that sentence being true. Add constraints stating that the probabilities pj of all complete products sum to 1 and are non-negative;

(iii) eliminate from the equalities and inequalities as many probabilities pj as possible, using the equalities;

(iv) eliminate from the inequalities obtained in the previous step the remaining probabilities pj, as well as π_{m+1}, by considering all upper bounds and all lower bounds on one of them, stating that each lower bound is less than or equal to each upper bound, removing redundant constraints and iterating.

The relations obtained involving π1, ..., πm are Boole's conditions of possible experience; the relations also involving π_{m+1} give best possible bounds on this last probability, i.e., they are the solution to Boole's general problem.
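Steps (iii) and (iv) are Fourier-Motzkin elimination. As a minimal sketch (the variable names and the toy system below are invented for illustration, not taken from Boole), one elimination step pairs every lower bound on the chosen variable with every upper bound:

```python
from fractions import Fraction as F

# One Fourier-Motzkin step: eliminate variable `var` from a system of
# inequalities sum_j a_j x_j <= b, each encoded as ({var_name: coeff}, b).
def eliminate(ineqs, var):
    lowers, uppers, rest = [], [], []
    for a, b in ineqs:
        cv = a.get(var, F(0))
        if cv > 0:
            uppers.append((a, b, cv))
        elif cv < 0:
            lowers.append((a, b, cv))
        else:
            rest.append((a, b))
    # pair every lower bound on `var` with every upper bound on `var`
    for al, bl, cl in lowers:
        for au, bu, cu in uppers:
            a = {v: al.get(v, F(0)) / (-cl) + au.get(v, F(0)) / cu
                 for v in set(al) | set(au) if v != var}
            rest.append((a, bl / (-cl) + bu / cu))
    return rest

# Toy system: p <= pi1, p <= pi2, pi3 <= p.  Eliminating p must leave the
# conditions pi3 <= pi1 and pi3 <= pi2.
system = [({'p': F(1), 'pi1': F(-1)}, F(0)),
          ({'p': F(1), 'pi2': F(-1)}, F(0)),
          ({'p': F(-1), 'pi3': F(1)}, F(0))]
reduced = eliminate(system, 'p')
print(reduced)
```

Iterating this step over all the pj (and then over π_{m+1}) yields exactly the relations described above, at the cost of a rapid growth in the number of inequalities.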


Example 1. (Boole's challenge problem, 1851 [23]) Let

prob(S1 ≡ x1) = π1
prob(S2 ≡ x2) = π2
prob(S3 ≡ x1x3) = π3
prob(S4 ≡ x2x3) = π4
prob(S5 ≡ x̄1x̄2x3) = π5 = 0.

Find best possible bounds on the probability π6 of S6 = x3. Step (i) gives:

x1 = x1x2x3 + x1x2x̄3 + x1x̄2x3 + x1x̄2x̄3
x2 = x1x2x3 + x1x2x̄3 + x̄1x2x3 + x̄1x2x̄3
x1x3 = x1x2x3 + x1x̄2x3
x2x3 = x1x2x3 + x̄1x2x3.

Step (ii), after setting p1 = prob(x1x2x3), p2 = prob(x1x2x̄3), p3 = prob(x1x̄2x3), p4 = prob(x1x̄2x̄3), p5 = prob(x̄1x2x3), p6 = prob(x̄1x2x̄3), p7 = prob(x̄1x̄2x3), p8 = prob(x̄1x̄2x̄3), yields the following equalities and inequalities:

p1 + p2 + p3 + p4 = π1
p1 + p2 + p5 + p6 = π2
p1 + p3 = π3
p1 + p5 = π4
p7 = π5 = 0
p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 = 1
p1, p2, p3, ..., p8 ≥ 0.

Eliminating successively the variables p7, p4, p3, p6, p5, p1 and p2 yields, at the end of Step (iii), the bounds

max(π3, π4) ≤ π6 ≤ min(1 − π1 + π3, π3 + π4, 1 − π2 + π4)

and the conditions

π1 ≥ π3, π2 ≥ π4.

Eliminating π6 yields the additional condition

π1 − π3 + π4 ≤ 1.  □
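The bounds of Example 1 can be checked numerically by treating the system of Step (ii) as a linear program over the eight world probabilities. The sketch below assumes the probability assignment π = (0.8, 0.7, 0.5, 0.4, 0), which satisfies the conditions of possible experience; it enumerates the basic feasible solutions of the equality system, i.e., the vertices of the polytope, in exact rational arithmetic, and takes the extreme values of prob(x3):

```python
from fractions import Fraction as F
from itertools import combinations, product

# Worlds are the 8 assignments to (x1, x2, x3); p_j = probability of world j.
worlds = list(product([0, 1], repeat=3))

def row(sentence):          # indicator row of a sentence over all worlds
    return [F(int(sentence(*w))) for w in worlds]

# Constraints of Step (ii) for an assumed, feasible assignment pi1..pi5,
# plus the normalization sum p_j = 1.
pi = [F(4, 5), F(7, 10), F(1, 2), F(2, 5), F(0)]
A = [row(lambda x1, x2, x3: x1),
     row(lambda x1, x2, x3: x2),
     row(lambda x1, x2, x3: x1 and x3),
     row(lambda x1, x2, x3: x2 and x3),
     row(lambda x1, x2, x3: (not x1) and (not x2) and x3),
     row(lambda x1, x2, x3: 1)]
b = pi + [F(1)]
c = row(lambda x1, x2, x3: x3)          # objective: prob(x3)

def solve_square(M, rhs):               # Gauss-Jordan over the rationals
    n = len(M); M = [r[:] + [rhs[i]] for i, r in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None                 # singular submatrix: no basic solution
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * bv for a, bv in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

# Every vertex of {Ap = b, p >= 0} is a basic feasible solution.
vals = []
m, N = len(A), len(worlds)
for basis in combinations(range(N), m):
    sol = solve_square([[A[i][j] for j in basis] for i in range(m)], b)
    if sol is not None and all(v >= 0 for v in sol):
        p = [F(0)] * N
        for j, v in zip(basis, sol):
            p[j] = v
        vals.append(sum(cv * pv for cv, pv in zip(c, p)))

lower, upper = min(vals), max(vals)
print(lower, upper)   # best possible bounds on prob(x3)
```

For this assignment the program prints 1/2 and 7/10, matching max(π3, π4) = 0.5 and min(1 − π1 + π3, π3 + π4, 1 − π2 + π4) = 0.7.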

2.2 Hailperin's extensions of Boole's algebraic method

Boole's algebraic method can be extended to deal with conditional probabilities. In fact, Boole [26] himself already considered some problems involving conditional probabilities, but a systematic treatment was only provided by Hailperin [100] (and independently, to some extent, by Chesnokov [51]). As mentioned above, two cases arise. First, one may have a conditional probability in the objective function only. Then one can set up the problem's constraints as done above, express the objective function as a ratio of linear expressions, and use Charnes and Cooper's [45] result to obtain the equivalent linear program (9). Eliminating the variables pj and t as above leads to an analytical solution.

Example 2. (Hailperin, 1986 [100]) Given prob(x1) = π1 and prob(x2) = π2, find best possible bounds on prob(x1x2 | x1 ∨ x2). Let p1 = prob(x1x2), p2 = prob(x1x̄2), p3 = prob(x̄1x2), p4 = prob(x̄1x̄2). Then this problem can be expressed as

min / max p1 / (p1 + p2 + p3)
subject to:
p1 + p2 = π1
p1 + p3 = π2
p1 + p2 + p3 + p4 = 1
p1, p2, p3, p4 ≥ 0, p1 + p2 + p3 > 0.

The equivalent problem (9) is

min / max p1
subject to:
p1 + p2 = tπ1
p1 + p3 = tπ2
p1 + p2 + p3 + p4 = t
p1 + p2 + p3 = 1
p1, p2, p3, p4 ≥ 0.

Eliminating successively p2, p4, p3 and t yields the bounds

max{0, π1 + π2 − 1} ≤ prob(x1x2 | x1 ∨ x2) ≤ min{π1/π2, π2/π1}.

There are no other conditions than 0 ≤ π1 ≤ 1, 0 ≤ π2 ≤ 1.  □
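These bounds can be verified numerically. In the sketch below the values π1 = 3/5 and π2 = 7/10 are assumed for illustration only; feasible distributions are parameterized by p1 = prob(x1x2), and the objective ratio is sampled over the feasible range of p1:

```python
from fractions import Fraction as F

# Assumed assignment: prob(x1) = 3/5, prob(x2) = 7/10.
pi1, pi2 = F(3, 5), F(7, 10)

# Hailperin's bounds on prob(x1 x2 | x1 v x2):
lower = max(F(0), pi1 + pi2 - 1)
upper = min(pi1 / pi2, pi2 / pi1)

# Feasible distributions are parameterized by p1 = prob(x1 x2):
# p2 = pi1 - p1, p3 = pi2 - p1, p4 = 1 - p1 - p2 - p3, all >= 0.
def ratio(p1):
    p2, p3 = pi1 - p1, pi2 - p1
    assert min(p1, p2, p3, 1 - p1 - p2 - p3) >= 0   # feasibility
    return p1 / (p1 + p2 + p3)        # objective of the fractional program

lo_p1 = max(F(0), pi1 + pi2 - 1)      # feasible range of p1
hi_p1 = min(pi1, pi2)
samples = [lo_p1 + (hi_p1 - lo_p1) * F(k, 10) for k in range(11)]
values = [ratio(p) for p in samples]
print(min(values), max(values))
```

The ratio p1/(π1 + π2 − p1) is increasing in p1, so its extreme values are attained at the endpoints of the feasible range and coincide with the analytical bounds.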

Second, one may have conditional probabilities in the constraints. The elimination process can be applied to these constraints, written, e.g., in the form (6). Note that this procedure amounts to solving a linear program with parametric right-hand side, or with some parameters in the coefficient matrix, by Fourier-Motzkin elimination (Dantzig and Eaves [65]).

Example 3. (Suppes, 1966 [165], Hailperin, 1986 [100]) Given prob(x1) = π1 and prob(x2|x1) = π2|1, find best possible bounds on prob(x2).

Defining p1, p2, p3, p4 as in Example 2, this problem can be expressed as

min / max π = p1 + p3
subject to:
p1 + p2 = π1
(1 − π2|1)p1 − π2|1 p2 = 0
p1 + p2 + p3 + p4 = 1
p1, p2, p3, p4 ≥ 0.

Eliminating successively p4, p3, p2 and p1 yields the bounds

π2|1 π1 ≤ prob(x2) ≤ 1 − π1(1 − π2|1).

The lower bound was found in another way by Suppes [165]. Again, there are no conditions except 0 ≤ π1 ≤ 1, 0 ≤ π2|1 ≤ 1.  □

Observe that these methods for determining combination rules for probability bounds on logical sentences are quite general. They could be used to obtain in a uniform way the many rules of probabilistic logic gathered by Frisch and Haddawy [83], or to study under which conditions high probability of a set of sentences entails high probability of another one, or of one among another set, as studied by Adams [3, 4, 5, 6, 7] and Bamber [18]. They can also be used to check whether combination rules based on grounds other than probability theory agree with this theory or not, possibly under further assumptions (e.g., Guggenheimer and Freedman [96], Dubois and Prade [75], Stephanou and Sage [164]).

2.3 Polyhedral methods to obtain rules for combining bounds on probabilities

Methods other than Fourier-Motzkin elimination for obtaining an analytical solution of probabilistic satisfiability have been devised. They are based on the study of the dual polyhedra of (2). Let the dual of (2) be written:

min (max) y0 + πy
subject to: 1y0 + A^t y ≥ A^t_{m+1}   (1y0 + A^t y ≤ A^t_{m+1})    (19)

Observe that the constraints of (19) are satisfied by the vector (1, 0) ((0, 0)), so the corresponding polyhedra are non-empty. Then, using the duality theorem of linear programming yields

Theorem 1 (Hailperin [97]) The best lower (upper) bound for π_{m+1} is given by the following convex (concave) piecewise linear function of the probability assignment:

π̲_{m+1}(π) = max_{j=1,2,...,k_max} (1, π)^t y^j_max
π̄_{m+1}(π) = min_{j=1,2,...,k_min} (1, π)^t y^j_min    (20)

where the y^j_max (y^j_min) for all j represent the k_max (k_min) extreme points of (19).

This result gives bounds on π_{m+1} but not the conditions of possible experience. It has recently been completed. Consider first the dual of the probabilistic satisfiability problem in decision form (1), after adding a dummy objective function 0·p to be maximized:

min y0 + πy    (21)
subject to: 1y0 + A^t y ≥ 0.

Then, using the fact that any point in a polyhedron can be expressed as a convex linear combination of its extreme points plus a linear combination of its extreme rays (Caratheodory's Theorem [43]), and once again the duality theorem, yields

Theorem 2 (Hansen, Jaumard and Poggi de Aragão [112]) The probabilistic satisfiability problem (1) is consistent if and only if

(1, π)^t r ≥ 0    (22)

for all extreme rays r of (21).

The same argument shows that (22) yields all conditions of possible experience for problem (2). Both Theorems 1 and 2 readily extend to the case of probability intervals (problem (3)), but not to the case of conditional probabilities. The reason is that the constraints of (19) and (21) do not depend on π, but that property ceases to be true when there are conditional probabilities. Several authors study analytically conditions of possible experience and bounds for particular classes of propositional logic sentences. Andersen [9] and Andersen and Hooker [12] consider a subclass of Horn clauses which can be represented by a directed graph G = (V, U). Vertices vi ∈ V are associated with atomic propositions Si (or logical variables) and arcs (vi, vk) with implications. Truth of the conjunction of the variables xi associated with the predecessors vi of a vertex vk implies the truth of the variable xk associated with that vertex. Adams [3, 4, 8, 6, 7] and Bamber [18] examine when high probability of a given set of sentences (possibly including conditionals) implies high probability of another sentence, or of at least one sentence among another given set.

2.4 Automated theorem proving with probabilistic satisfiability

The results of the previous subsection lend themselves easily to automation. While this could also be done for Fourier-Motzkin elimination, it would probably be more time-consuming, as finding all implied relations and eliminating redundant ones are tasks whose difficulty increases rapidly with problem size (but that approach remains of interest, and apparently the only feasible one, when there are conditional probabilities). Numerous algorithms have been proposed for vertex and extreme ray enumeration of polyhedra; see, e.g., Dyer [77] and Chen, Hansen and Jaumard [48] for surveys. Usually, methods proposed for vertex enumeration can be extended to handle ray enumeration as well. Approaches to vertex enumeration include:

(i) exploration of the adjacency graph G = (V, E) of the polyhedron, where the vertices vj of G are associated with the extreme points xj of the polyhedron and edges {vj, vk} ∈ E join pairs of vertices vj, vk associated with extreme points xj, xk which are the endpoints of edges of this polyhedron. The exploration rule is depth-first search (Dyer [77]) or breadth-first search. The difficulty lies in determining whether a vertex has already been visited; long lists of visited vertices must be kept in most methods;

(ii) the reverse search approach of Avis and Fukuda [15], which avoids this last problem by defining a priori an arborescence on the graph G = (V, E). This is done by using Bland's [21] rule for the choice of the entering variable, even in case of degeneracy, in the simplex algorithm. When applying depth-first search, Bland's rule is reversed when arriving at a vertex xℓ. If the vertex xℓ is the one associated with the vertex xk from which one comes, then xℓ is considered as first explored and stored; otherwise backtracking takes place;

(iii) the adjacency lists method of Chen, Hansen and Jaumard [47], which does not use the simplex algorithm but keeps adjacency lists for the vertices of polyhedra having initially only a few constraints, and updates them when adding constraints one at a time.

Note that when applying such methods to probabilistic satisfiability, degeneracy is frequent and must be taken care of. Automated derivation of bounds and conditions of possible experience makes it easy to study variants of a problem, e.g., going from point probabilities to intervals, as next illustrated.

Example 4.

Consider again Example 1, but without fixing π5 at the value 0. Then the conditions of possible experience and bounds, automatically obtained (Hansen, Jaumard and Poggi de Aragão [112]), are:

Conditions of possible experience:
π1 ≥ π3
π2 ≥ π4
π1 + π5 ≤ 1
π2 + π5 ≤ 1
π1 + π4 + π5 ≤ π3 + 1
π2 + π3 + π5 ≤ π4 + 1
0 ≤ πi ≤ 1, i = 1, 2, ..., 5

Lower bounds: π3 + π5, π4 + π5.
Upper bounds: (1 − π1) + π3, (1 − π2) + π4, π3 + π4 + π5.

Replacing all point values π1, π2, ..., π5 by intervals [π̲1, π̄1], [π̲2, π̄2], ..., [π̲5, π̄5] leads to:

Conditions of possible experience:
π̲i ≤ π̄i, 0 ≤ π̲i, π̄i ≤ 1, i = 1, 2, ..., 5
π̲3 ≤ π̄1
π̲4 ≤ π̄2
π̲1 + π̲5 ≤ 1
π̲2 + π̲5 ≤ 1
π̲4 + π̲5 ≤ 1
π̲1 + π̲4 + π̲5 ≤ π̄3 + 1
π̲2 + π̲3 + π̲5 ≤ π̄4 + 1

Lower bounds: π̲3 + π̲5, π̲4 + π̲5.
Upper bounds: (1 − π̲1) + π̄3, (1 − π̲2) + π̄4, π̄3 + π̄4 + π̄5, π̄1 + π̄4 + π̄5, π̄2 + π̄3 + π̄5, π̄1 + π̄2 + π̄5.

It can be proved (Hansen, Jaumard and Poggi de Aragão [112]) that the bounds obtained in the case of point probabilities are never redundant. In other words, there is always a vector (1, π) for which the corresponding vertex of the dual polytope is optimal and the bound is attained. This property no longer holds in the case of probability intervals.

2.5 Theorem proving with condensed forms of probabilistic satisfiability

As mentioned above, probabilistic satisfiability as expressed in (1) or (2) leads to very large linear programs. When studying particular cases, one may condense rows or columns by summing them, to reduce drastically the size of these programs. This approach, explored by Kounias and Marin [130], Prekopa [151, 152, 153], Boros and Prekopa [31], and Kounias and Sotirakoglou [131], has led to generalizations and improvements of several important results in probability theory. Consider for instance n events and assume that the sums of probabilities S1, S2, ..., Sm of all products of 1, 2, ..., m events, i.e., the first m binomial moments, are given. Let vi for i = 1, 2, ..., n denote the probability that exactly i events occur. Then Σ_{i=1}^n vi is the probability that at least one event occurs. The well-known Bonferroni [22] inequalities state that:

Σ_{i=1}^n vi ≤ S1
Σ_{i=1}^n vi ≥ S1 − S2
Σ_{i=1}^n vi ≤ S1 − S2 + S3

and so on. Various authors have proposed improved formulae in which the coefficients of the right-hand sides are not all equal to 1 or −1. The problem of finding the best bounds can be written (Prekopa [151]):

min / max Σ_{i=1}^n vi
subject to:
Σ_{i=1}^n C(i, 1) vi = S1
...
Σ_{i=1}^n C(i, m) vi = Sm
vi ≥ 0, i = 0, 1, ..., n    (23)

where the C(i, j) are binomial coefficients. Problem (23) can be viewed as a condensed form of a probabilistic satisfiability problem in which the logical sentences correspond to all products of up to n variables in direct or complemented form. Using a result of Fekete and Polya [79], Prekopa [151] solves the dual of (23) explicitly, thus obtaining best possible "Boole-Bonferroni" bounds. Boros and Prekopa [31], Prekopa [152, 153] and Kounias and Sotirakoglou [131] generalize these results in several ways. Lad, Dickey and Rahman [133] use the probabilistic satisfiability model in a different way, to extend the classical Bienaymé-Chebyshev [46] inequality to the context of finite discrete quantities.

3 Numerical Solution of PSAT

3.1 Column Generation

The linear programs (1) and (2), which express the probabilistic satisfiability problem in decision and optimization versions, have a number of columns which grows exponentially in the minimum of the number m of sentences and the number n of logical variables in these sentences. In view of the enormous size of these programs (about 10^9 columns for min(m, n) = 30, 10^18 columns for min(m, n) = 60, etc.), it has been stated several times in the AI literature that they are intractable in a practical sense, and not only in the worst case (as will be shown below). For instance, Nilsson [145], in a recent review of work subsequent to his "Probabilistic Logic" paper of 1986 [144], writes about the "total impracticality of solving large instances" and recommends looking for heuristics. Such views are overly pessimistic: while even writing large probabilistic satisfiability problems explicitly is impossible, they can be solved quite efficiently by keeping them implicit. The tool to be used is an advanced one of linear programming, called column generation. It extends the revised simplex method, in which only a small number of columns are kept explicitly, by determining the entering column through the solution of an auxiliary subproblem. This subproblem depends on the type of problem considered and is usually one of combinatorial programming. We next recall the principle of the column generation method for linear programming. Consider the linear program

min z = cx

subject to:
Ax = b, x ≥ 0    (24)

and its solution by the simplex algorithm (e.g., Dantzig [64]). At a current iteration (after a possible reindexing of the variables), let A = (B, N), where B and N denote the submatrices of basic and nonbasic columns respectively.

Problem (24) can be expressed as

min z = cB B⁻¹b + (cN − cB B⁻¹N)xN    (25)
subject to:
xB + B⁻¹N xN = B⁻¹b, xB, xN ≥ 0

where xB, xN are the vectors of basic and nonbasic variables and cB, cN the corresponding vectors of coefficients in the objective function. In the revised simplex method, one stores only the matrix B⁻¹ (in compact form), the current basic solution B⁻¹b and the value cB B⁻¹b, in addition to the data. The entering variable is determined by computing the smallest reduced cost, using the initial data, i.e.,

ck − cB B⁻¹Ak = min_{j∈N} cj − cB B⁻¹Aj = min_{j∈N} cj − uAj    (26)

where u = cB B⁻¹ is the current vector of dual variables. This computation is not too time-consuming provided the matrix A is sparse and the columns are not too numerous. Then the entering column is computed as B⁻¹Ak and the simplex iteration proceeds as usual (optimality check, unboundedness check, choice of the leaving variable, updating of the solution and basis inverse). If the number of columns is exponential in the input size, one must compute

min_{j∈N} cj − uAj    (27)

without considering nonbasic columns one at a time. This is done by a specific algorithm in which the coefficients of the columns Aj are the variables. For probabilistic satisfiability the subproblem (27) is

min_{j∈N} cj − uAj = min S_{m+1} − u0 − Σ_{i=1}^m ui Si    (28)

where, as discussed above, the values True and False for the Si, i = 1, ..., m+1, are identified with the numbers 1 and 0. Then (28) is transformed into an arithmetical expression involving the logical variables x1, ..., xn appearing in the Si, with the values true and false also associated with 1 and 0. This is done by eliminating the usual boolean connectives ∨, ∧ and ¬ using the relations

xi ∨ xj ≡ xi + xj − xi xj
xi ∧ xj ≡ xi xj
x̄i ≡ 1 − xi.    (29)

The resulting expression is a nonlinear (or multilinear) real-valued function in 0-1 variables, also called a nonlinear 0-1 function or pseudo-boolean function (Hammer and Rudeanu [104]).

Example 5. Consider again the problem of Example 1. Then subproblem (28) is

min S6 − u0 − u1S1 − u2S2 − u3S3 − u4S4 − u5S5
= x3 − u0 − u1x1 − u2x2 − u3x1x3 − u4x2x3 − u5x̄1x̄2x3
= −u0 − u1x1 − u2x2 + (1 − u5)x3 + (u5 − u3)x1x3 + (u5 − u4)x2x3 − u5x1x2x3

with x1, x2, x3 ∈ {0, 1}.  □

Note that if the probabilistic satisfiability problem considered is in decision form, one performs only Phase 1 of the simplex algorithm, with column generation: minimization of the sum of the artificial variables added to the constraints. The corresponding columns are kept explicit as long as their variables remain in the basis; they can be discarded otherwise.
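The reduction of Example 5 can be checked by brute force over the eight truth assignments; the dual values u0, ..., u5 below are assumed, illustrative numbers only:

```python
from itertools import product

# Assumed dual values u0..u5 (illustrative only).
u = {0: 0.2, 1: 0.1, 2: 0.0, 3: 0.3, 4: 0.0, 5: 0.4}

def reduced_cost(x1, x2, x3):
    # S6 - u0 - u1 S1 - ... - u5 S5 for the sentences of Example 1,
    # with True/False identified with 1/0.
    s = [x1, x2, x1 * x3, x2 * x3, (1 - x1) * (1 - x2) * x3]
    return x3 - u[0] - sum(u[i + 1] * s[i] for i in range(5))

def multilinear(x1, x2, x3):
    # The same function after expanding with the rules (29).
    return (-u[0] - u[1] * x1 - u[2] * x2 + (1 - u[5]) * x3
            + (u[5] - u[3]) * x1 * x3 + (u[5] - u[4]) * x2 * x3
            - u[5] * x1 * x2 * x3)

# The two forms agree on every 0-1 point; the entering column corresponds
# to the minimizing truth assignment.
points = list(product([0, 1], repeat=3))
assert all(abs(reduced_cost(*p) - multilinear(*p)) < 1e-12 for p in points)
best = min(points, key=lambda p: reduced_cost(*p))
print(best, reduced_cost(*best))
```

A negative minimum means the corresponding possible-world column may enter the basis; on realistic instance sizes this brute-force step is replaced by the heuristics and exact algorithms of the next subsection.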

3.2 Solution of the auxiliary problem

3.2.1 Heuristics

Problem (28) must be solved at each iteration of the column generation method and may be time-consuming. Indeed, minimization of a nonlinear 0-1 function is NP-hard, as numerous NP-hard problems, e.g., independent set, can easily be expressed in that form. However, to guarantee convergence it is not mandatory to solve (28) exactly at all iterations. As long as a negative reduced cost (for minimization) is found, an iteration of the revised simplex algorithm may be done. If a feasible solution is obtained in that way, the decision version of probabilistic satisfiability is solved. When no more negative reduced cost is given by the heuristic, one must turn to an exact algorithm to prove that there is no feasible solution for the decision version of probabilistic satisfiability, or no feasible solution giving a better bound than the incumbent for the optimization version. It is worth stressing that while stopping the column generation method prior to optimality yields valid bounds for many combinatorial optimization problems (obtained by exploiting an equivalent but hard-to-solve compact formulation with a polynomial number of columns, and duality theory), this is not the case for probabilistic satisfiability. Indeed, no such compact form is known, and stopping before getting the best possible bounds yields only an upper bound on a lower bound (or a lower bound on an upper bound) of the objective function values. Such results are only estimates of those values and not bounds. The same is true when possible worlds are drawn at random, as suggested by Henrion [115]. As for large instances the number of iterations may be in the hundreds or thousands, designing efficient heuristics for (28) is important. Note that this problem may be viewed as a weighted version of maximum satisfiability (maxsat): given a set of m weighted clauses on n logical variables, determine a truth assignment such that the sum of the weights of the satisfied clauses is greater than or equal to a given value. Therefore, algorithms for the subproblem (both heuristic and exact) also apply to the satisfiability (sat) problem and to constraint satisfaction problems expressed in satisfiability form.
Conversely, some recent algorithms for sat (e.g., Selman, Levesque and Mitchell's gsat [161]) could be extended to weighted maxsat. An early heuristic which could apply to (28) (written in maximization form) is the steepest-ascent one-point move (saopma) method of Reiter and Rice [157]. It proceeds by choosing a first truth assignment (or 0-1 vector) at random, then complementing the variable for which the resulting increase in objective function value is largest, and iterating as long as there is a positive increase. The trouble with such a method is that it quickly gets stuck in a local optimum, which may have a value substantially worse than the global optimum. Improvements can be obtained by repeating the process a certain number of times (the so-called multistart procedure), but this may still give solutions far from the optimum. Much better results are obtained using so-called modern heuristics (see, e.g., Reeves [158] for a book-length survey), which provide ways to get out of local optima. Among the earliest and best known of such methods is simulated annealing (Kirkpatrick, Gelatt and Vecchi [127], Cerny [44]). In this method, moves (variable complementations for weighted maxsat) are made by choosing a direction at random, accepting the move if it improves the objective function value, and possibly also if it does not, with a probability which decreases with the amount of deterioration and with the time since the inception of the algorithm. Figure 1 provides a description of simulated annealing for weighted maxsat, adapted from Dowsland [72]; see also Hansen and Jaumard [106] for the unweighted case.


Simulated Annealing for minimizing a weighted maxsat function, with objective function f(x) equal to the sum of the weights of the clauses satisfied by x and neighborhood structure N(x) equal to the vectors obtained by complementing one variable of x.

Select an initial solution x0;
Select an initial temperature t0 > 0;
Select a temperature reduction function α;
Repeat
    Repeat
        Randomly select x ∈ N(x0);
        δ = f(x) − f(x0);
        If δ < 0 then x0 = x
        else generate random q uniformly in the range (0, 1);
            if q < exp(−δ/t) then x0 = x;
    Until iteration-count = nrep
    Set t = α(t)
Until stopping condition = true.
x0 is an approximation to the optimal solution.

See Dowsland [72], van Laarhoven and Aarts [169] and Aarts and Korst [1] for discussions of the choice of the parameters t0, nrep, "cooling" function α and stopping condition.

Figure 1

Simulated annealing exploits mostly the sign of the gradient of the objective value and not its magnitude (which enters only in the probability of accepting a deteriorating move). In contrast, tabu search methods (e.g., Glover [87, 88], Hansen and Jaumard [106]) fully exploit gradient information while still providing a way to get out of local optima. In a simple version of such a method for maxsat, called steepest-ascent-mildest-descent (samd) and due to Hansen and Jaumard [106], a direction of steepest ascent is followed up to a local maximum, then a direction of mildest descent is taken, and cycling is avoided (at least for some time) by forbidding the reverse move for a given number of iterations. Figure 2 provides a description of such an algorithm for weighted maxsat, close to that of Hansen and Jaumard [106].
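A runnable sketch of the scheme of Figure 1 follows; the instance, weights, cooling parameters and number of restarts are assumed for illustration, and the objective is written in maximization form, so deteriorations have δ < 0 and are accepted with probability exp(δ/t):

```python
import math
import random
from itertools import product

# A tiny weighted maxsat instance (assumed for illustration): each clause is
# (weight, [literals]), a literal being (var_index, sign) with sign True for x
# and False for the complemented variable.
clauses = [(3, [(0, True), (1, False)]),
           (2, [(1, True)]),
           (4, [(0, False), (2, True)]),
           (1, [(2, False)])]
n = 3

def f(x):   # sum of the weights of the satisfied clauses (to be maximized)
    return sum(w for w, lits in clauses
               if any(x[v] == s for v, s in lits))

def anneal(t0=2.0, alpha=0.95, nrep=20, nouter=50, seed=7):
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    best, fbest, t = x[:], f(x), t0
    for _ in range(nouter):
        for _ in range(nrep):
            y = x[:]
            y[rng.randrange(n)] ^= 1          # complement one variable
            delta = f(y) - f(x)
            # accept improvements, deteriorations with prob exp(delta/t)
            if delta >= 0 or rng.random() < math.exp(delta / t):
                x = y
            if f(x) > fbest:
                best, fbest = x[:], f(x)
        t *= alpha                            # geometric cooling
    return best, fbest

fbest = max(anneal(seed=s)[1] for s in range(5))
optimum = max(f(list(p)) for p in product([0, 1], repeat=n))
print(fbest, optimum)
```

With a handful of restarts on this tiny instance the best value found coincides with the exhaustive optimum; on large instances only the heuristic estimate is available.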

Note that the unweighted version of samd also applies to the satisfiability problem sat, in which one is only interested in solutions satisfying all clauses. It exploits gradient information, as in the gsat algorithm of Selman, Levesque and Mitchell [161] and in the algorithm of Gu [95], in the ascent phase, and search with tabus which forbid backtracking for some iterations to get out of a plateau. The latter two algorithms do this by flipping variables (in unsatisfied clauses for gsat) at random.

Steepest Ascent Mildest Descent for minimizing a weighted maxsat function.

Select an initial solution x0;
fopt = f(x0); xopt = x0;
Set tj = 0 for j = 1, ..., n;
Repeat
    f′opt = fopt
    Repeat
        Select xk ∈ N(x0) such that Δk = f(xk) − f(x0) = min_{j: tj = 0} Δj;
        x0 = xk;
        If f(xk) < fopt then fopt = f(x0); xopt = x0; endif;
        If Δk > 0 then tk = ℓ;
        Set tj = tj − 1 for tj > 0, j = 1, 2, ..., n;
    Until iteration-counter = nrep
Until f′opt = fopt
xopt is an approximation to the optimal solution.

See Hansen and Jaumard [106] for a discussion of the choice of the parameters nrep and ℓ (the length of the tabu list).

Figure 2

Kavvadias and Papadimitriou [124] propose a different way to exploit the gradient, i.e., variable depth search, which is based on the ideas of Lin and Kernighan [137] for the travelling salesman problem. An initial solution is drawn at random, then moves are made along a direction of steepest ascent or mildest descent among the unexplored directions. In this way one eventually gets to the complement of the initial truth assignment.

Then the best solution along the path so explored is selected and the procedure is iterated as long as an improved solution is found. The rules of this method are given in Figure 3. Experiments conducted in the unweighted case show tabu search to give better results, and to obtain them more quickly, than simulated annealing [106]. From further unpublished results, variable depth search appears to be almost as good as, but not better than, tabu search.

Variable Depth Search for minimizing a weighted maxsat function.

Select an initial solution x0;
fopt = f(x0); xopt = x0;
Repeat
    f′opt = fopt;
    Set tj = 0 for j = 1, ..., n;
    Repeat
        Select xk ∈ N(x0) such that Δk = f(xk) − f(x0) = min_{j: tj = 0} Δj;
        x0 = xk; tk = 1;
        If f(x0) < fopt then fopt = f(x0); xopt = x0; endif;
    Until all tj = 1
    x0 = xopt;
Until f′opt = fopt
xopt is an approximation to the optimal solution.

Figure 3

3.2.2 Exact algorithms

When no more negative reduced cost (in case of minimization) can be found by a heuristic, and if no feasible solution has been obtained when considering the decision version of probabilistic satisfiability, the auxiliary problem must be solved exactly. Research on the maximization of nonlinear functions in 0-1 variables is well developed; see Hansen, Jaumard and Mathon [109] for a recent survey. Methods are based on: (i) linearization; (ii) boolean manipulations (or algebra); (iii) implicit enumeration; and (iv) cutting planes. The first two types of methods have been applied to probabilistic satisfiability and are the only ones reviewed here (the other ones also hold promise, and an experimental comparison for probabilistic satisfiability problems and their extensions would be of interest). Linearization is done by replacing products of variables by new variables and adding constraints to ensure that the values agree in 0-1 variables (Dantzig [63], Fortet [80, 81]). Consider a term

c ∏_{j∈J} xj    (30)

where c ∈ ℝ and xj ∈ {0, 1} for j ∈ J. Then (30) is equivalent to

cy
subject to:
y ≥ Σ_{j∈J} xj − |J| + 1
y ≤ xj, j ∈ J
y ≥ 0    (31)

as the first constraint forces y to be equal to 1 when all the xj for j ∈ J are equal to 1, and the last constraints force y to be equal to 0 as soon as one of these xj is equal to 0. Note that it need not be explicitly specified that y is a 0-1 variable. Moreover, if c > 0 the first constraint may be omitted, and if c < 0 the last constraints may be omitted, as the variable y appears in no other term or constraint and hence automatically takes the required value at the optimum. Linearization as done above introduces as many new variables as there are nonlinear terms in the function to be minimized (or maximized), and a number of new constraints equal to the number of nonlinear terms with a negative coefficient plus the number of nonlinear terms with a positive coefficient multiplied by the average number of variables in these terms. So the size of the resulting linear 0-1 program increases quickly with m, n and the number of non-zero dual variables ui. Fortunately, it turns out that this last number tends to be small at the optimum.
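A small check of the linearization (31), in maximization form with an assumed positive coefficient c: for every 0-1 point, the value of y that is optimal subject to the constraints reproduces the product exactly:

```python
from itertools import product

# Linearize the term c * x1*x2*x3 (c assumed positive) as in (31):
# maximize c*y subject to y <= xj for all j, y >= x1+x2+x3-2, y >= 0.
c = 5
J = [0, 1, 2]

def best_y(x):
    # Feasible y maximizing c*y for fixed x (c > 0 pushes y to its upper bound).
    upper = min(x[j] for j in J)                   # y <= xj for all j
    lower = max(0, sum(x[j] for j in J) - len(J) + 1)
    assert lower <= upper                          # constraints are consistent
    return upper if c > 0 else lower

# At the optimum over y, the linearization reproduces the product exactly.
for x in product([0, 1], repeat=3):
    assert best_y(x) == x[0] * x[1] * x[2]

print("linearization of the cubic term verified on all 0-1 points")
```

As the text notes, with c > 0 only the constraints y ≤ xj are active at a maximum, which is why the first constraint may then be dropped.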

A slightly different linearization procedure has been proposed by Hooker [116]; see also Andersen and Hooker [11]. Algebraic methods for maximizing a nonlinear 0-1 function are based on variable elimination (Hammer, Rosenberg and Rudeanu [103], Hammer and Rudeanu [104], Crama et al. [61]). Let f1 be the function to be maximized and write

f1(x1, x2, ..., xn) = x1 g1(x2, x3, ..., xn) + h1(x2, x3, ..., xn)    (32)

where g1 and h1 do not depend on x1. Clearly there exists a maximizing point (x1, x2, ..., xn) of f1 such that x1 = 1 if g1(x2, x3, ..., xn) > 0 and such that x1 = 0 if g1(x2, x3, ..., xn) ≤ 0. Then define a function

ψ1(x2, x3, ..., xn) = g1(x2, x3, ..., xn) if g1(x2, x3, ..., xn) > 0, and 0 otherwise,
i.e., ψ1 = max{g1(x2, x3, ..., xn), 0}.    (33)

Let f2 = ψ1 + h1 (where ψ1 is expressed in polynomial form). The problem thus reduces to the maximization of the (n − 1)-variable function f2. Iterating yields sequences of functions f1, f2, ..., fn and ψ1, ψ2, ..., ψ_{n−1}, where fi depends on n − i + 1 variables. A maximizing point (x1, x2, ..., xn) is then obtained from the recursion

xi = 1 if and only if ψi(x_{i+1}, x_{i+2}, ..., xn) > 0.    (34)
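The elimination scheme (32)-(34) can be illustrated on an assumed three-variable function (the coefficients are invented; ψ1 is kept here as a Python function rather than re-expanded into polynomial form, which is the step the branch-and-bound below performs symbolically):

```python
from itertools import product

# An assumed small pseudo-boolean function to maximize:
# f1(x1, x2, x3) = x1 * g1(x2, x3) + h1(x2, x3)
def g1(x2, x3): return 2 - 3 * x2 + x2 * x3
def h1(x2, x3): return x2 + 2 * x3 - 3 * x2 * x3
def f1(x1, x2, x3): return x1 * g1(x2, x3) + h1(x2, x3)

# Eliminate x1: f2 = psi1 + h1 with psi1 = max(g1, 0), a function of x2, x3.
def psi1(x2, x3): return max(g1(x2, x3), 0)
def f2(x2, x3): return psi1(x2, x3) + h1(x2, x3)

# Maximize the reduced two-variable function, then recover x1 by rule (34):
# x1 = 1 if and only if psi1 > 0 at the maximizer.
x2, x3 = max(product([0, 1], repeat=2), key=lambda p: f2(*p))
x1 = 1 if psi1(x2, x3) > 0 else 0

brute = max(f1(*p) for p in product([0, 1], repeat=3))
print((x1, x2, x3), f1(x1, x2, x3), brute)
```

The recovered point attains the same value as brute-force maximization over all eight assignments, as the elimination argument guarantees.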

The crucial variable elimination step may be done by a branch-and-bound algorithm of a special type which proceeds to the determination of ψ1, adding variables in direct or complemented form when branching (Crama et al. [61]). Let J denote the index set of the variables appearing together with x1 in a term of f1. After replacing x̄1 by 1 − x1 and grouping the terms in which x1 appears, we have

f(x1, x2, ..., xn) = x1 g1(xj : j ∈ J) + h1(x2, ..., xn)    (35)

and g1 can be written

g1 = c0 + Σ_{j∈J} cj xj^{αj} + Σ_{t∈T} ct ∏_{j∈T(t)} xj^{αjt}    (36)

where xj^{αj} is equal to xj if αj = 1 and to x̄j if αj = 0. Then one aims to find a polynomial expression of the nonlinear 0-1 function ψ1 = max{g1, 0}, valid for all 0-1 vectors (xj, j ∈ J).

If it can be shown that g1 is always positive, it can be copied out, and if it is never positive, it can be deleted. Otherwise, branching on a variable xs gives

g1 = xs g′ + x̄s g″

where g′ and g″, the restrictions of g1 induced by xs = 1 and xs = 0, are considered in turn. Lower and upper bounds g̲1 and ḡ1 on g1 are given by

g̲1 = c0 + Σ_{j∈J} min{0, cj} + Σ_{t∈T} min{0, ct}
ḡ1 = c0 + Σ_{j∈J} max{0, cj} + Σ_{t∈T} max{0, ct}.    (37)

Moreover, penalties p_j^1 and p_j^0 (q_j^1 and q_j^0) associated with the fixation of xj at 1 or 0 are

p_j^1 = max{ αj cj, (αj − 1)cj + Σ_{t: j∈T(t)} (1 − αjt) max{−ct, 0} }
p_j^0 = max{ −αj cj, (1 − αj)cj + Σ_{t: j∈T(t)} αjt max{−ct, 0} }    (38)

for g̲1, and

q_j^1 = max{ −αj cj, (1 − αj)cj + Σ_{t: j∈T(t)} (1 − αjt) max{ct, 0} }
q_j^0 = max{ αj cj, (αj − 1)cj + Σ_{t: j∈T(t)} αjt max{ct, 0} }    (39)

for ḡ1. These penalties can be added to g̲1 (subtracted from ḡ1) when xj is fixed. They also lead to the improved lower and upper bounds

g̲1 = g̲1 + max_{j∈J} min{p_j^1, p_j^0}
ḡ1 = ḡ1 − max_{j∈J} min{q_j^1, q_j^0}.    (40)

To describe the branch-and-bound algorithm we use the terminology of Hansen [105], with the following extended meaning: a resolution test exploits a sufficient condition for a particular formula to be the desired expression of the current nonlinear 0-1 function; a feasibility test exploits a sufficient condition for the null function to be such an expression. Let ℓ denote the product of the variables, in direct or complemented form, corresponding to the variables fixed at 1 and at 0 respectively in the current subproblem.

Algorithm C (Basic Algorithm Revisited, Crama et al. 1990)

a) Initialization. Set $\psi = 0$, $\ell = 1$.

b) First direct feasibility test. Compute $\overline{g}$. If $\overline{g} \le 0$, go to i).

c) Second direct feasibility test. Compute $\overline{g}'$. If $\overline{g}' \le 0$, go to i).

d) First direct resolution test. Compute $\underline{g}$. If $\underline{g} \ge 0$ then $\psi \leftarrow \psi + \ell g$ and go to i).

e) Second direct resolution test. Compute $\underline{g}'$. If $\underline{g}' \ge 0$ then $\psi \leftarrow \psi + \ell g$ and go to i).

f) Conditional feasibility test. If, for some $j \in J$, $\overline{g} - q_j^1 \le 0$, set $\ell \leftarrow \ell \bar{x}_j$, $J \leftarrow J \setminus \{j\}$ and fix $x_j$ at 0 in $g$. If, for some $j \in J$, $\overline{g} - q_j^0 \le 0$, set $\ell \leftarrow \ell x_j$, $J \leftarrow J \setminus \{j\}$ and fix $x_j$ at 1 in $g$. If at least one variable has been fixed in this test, return to b).

g) Conditional resolution test. If, for some $j \in J$, $\underline{g} + p_j^1 \ge 0$, set $\ell \leftarrow \ell x_j$, $J \leftarrow J \setminus \{j\}$, fix $x_j$ at 1 in $g$, $\psi \leftarrow \psi + \ell g$ and go to i). If, for some $j \in J$, $\underline{g} + p_j^0 \ge 0$, set $\ell \leftarrow \ell \bar{x}_j$, $J \leftarrow J \setminus \{j\}$, fix $x_j$ at 0 in $g$, $\psi \leftarrow \psi + \ell g$ and go to i).

h) Branching. Choose a variable $x_s$ to branch upon, setting $\alpha_s = 1$ or $\alpha_s = 0$. Set $\ell \leftarrow \ell x_s^{\alpha_s}$, $J \leftarrow J \setminus \{s\}$. Update $g$ by setting $x_s$ to $\alpha_s$. Return to b).

i) Backtracking. Find the last literal $x_s^{\alpha_s}$ chosen in h) for which the complementary value has not yet been explored. If there is none, stop. Otherwise delete from $\ell$ the literal $x_s^{\alpha_s}$ and the literals introduced after it, and free the corresponding variables in $g$. Update $J$, then fix $x_s$ at $1 - \alpha_s$ in $g$, set $\ell \leftarrow \ell x_s^{1-\alpha_s}$, $J \leftarrow J \setminus \{s\}$ and return to b).

An example and a discussion of how best to implement this algorithm are given in Crama et al. [61].
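The elimination scheme (32)-(34) can be sketched in a few lines. The data structures below (a dict mapping sets of variable indices to coefficients, and a Moebius transform to re-express $\psi_i = \max\{g_i, 0\}$ as a multilinear polynomial) are our own illustrative choices, not those of Crama et al. [61]; the sketch enumerates truth tables, so it is only practical for small examples.

```python
from itertools import product

def evaluate(poly, point):
    # poly maps frozensets of variable indices to coefficients; at a 0-1
    # point a term contributes its coefficient iff all its variables are 1
    return sum(c for term, c in poly.items() if all(point[v] for v in term))

def multilinear(variables, f):
    # recover the multilinear coefficients of f from its 0-1 truth table
    # (Moebius transform over subsets of the variables)
    vs = sorted(variables)
    poly = {}
    for mask in product((0, 1), repeat=len(vs)):
        S = [v for v, m in zip(vs, mask) if m]
        c = sum((-1) ** (len(S) - sum(sub))
                * f({**{v: 0 for v in vs}, **dict(zip(S, sub))})
                for sub in product((0, 1), repeat=len(S)))
        if abs(c) > 1e-12:
            poly[frozenset(S)] = c
    return poly

def eliminate(poly, x):
    # split f = x*g + h as in (32); return psi + h with psi = max{g, 0} (33)
    g, h = {}, {}
    for term, c in poly.items():
        d = g if x in term else h
        key = term - {x} if x in term else term
        d[key] = d.get(key, 0.0) + c
    rest = {v for t in list(g) + list(h) for v in t}
    f2 = dict(h)
    for term, c in multilinear(rest, lambda pt: max(evaluate(g, pt), 0.0)).items():
        f2[term] = f2.get(term, 0.0) + c
    return f2

# maximize f = 3 x1 x2 - 2 x2 x3 + x3 - 1 over {0,1}^3
f = {frozenset({1, 2}): 3.0, frozenset({2, 3}): -2.0,
     frozenset({3}): 1.0, frozenset(): -1.0}
fk = f
for x in (1, 2, 3):                     # eliminate x1, then x2, then x3
    fk = eliminate(fk, x)
best = fk.get(frozenset(), 0.0)         # only the constant remains: max of f
brute = max(evaluate(f, dict(zip((1, 2, 3), pt)))
            for pt in product((0, 1), repeat=3))
assert abs(best - brute) < 1e-9
```

The branch-and-bound machinery of Algorithm C replaces the truth-table step above by the bounds (37)-(40), which is what makes the computation practical on larger instances.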

3.3 Computational Experience

Computational results for probabilistic satisfiability have been reported by Kavvadias and Papadimitriou [124] and by Jaumard, Hansen and Poggi de Aragão [117]. The former authors consider only the decision version, and solve the auxiliary problem of the column generation method by a variable depth search heuristic. The algorithm so implemented remains incomplete, in that it is unable to prove that there is no feasible solution when none is found. Problems with up to 70 variables and 70 sentences, which are clauses, are solved. The latter authors use both Tabu Search and the Basic Algorithm Revisited, described above, in their column generation algorithm. The linear programming part is done with the mpsx code of Marsten [142]. Probabilistic satisfiability problems with up to 140 variables and 300 sentences are solved, both in decision and in optimization form. Moreover, problems with conditional probabilities of comparable size are solved also. Recently, using the cplex code and linearization to solve the auxiliary problem made it possible to solve problems with up to 500 sentences (Douanya Nguetse [71]). It thus appears that advanced linear programming tools allow the solution of large-scale probabilistic satisfiability problems. To the best of our knowledge, no other method solves problems of comparable size within probability logic, except if strong independence assumptions are made (as, e.g., in Lauritzen and Spiegelhalter's [135] method for uncertainty propagation in Bayesian networks).

3.4 Computational Complexity

Georgakopoulos, Kavvadias and Papadimitriou prove that probabilistic satisfiability is NP-complete. In the proof, these authors consider problem (1) with m clauses as sentences and n variables; the result holds for general sentences as a consequence. First, they show that solving the dual of (1) by Khachiyan's [126] ellipsoid method for linear programming takes O(m^2 log m) iterations, each of which requires solving an instance of a weighted maxsat (or unconstrained nonlinear 0-1 programming) auxiliary problem on the same clauses (with weights assumed to have O(n) bit length) to find a violated inequality, and performing O(m^3 log m) more computations per iteration. Second, they note (as mentioned above) that the classical NP-complete satisfiability (sat) problem is a particular case of (1). This proof shows that polynomiality of algorithms for probabilistic satisfiability hinges on polynomiality of algorithms for the weighted maxsat auxiliary problem. To study this point, the co-occurrence graph G = (V, E) of the nonlinear 0-1 function (e.g., Crama et al. [61]) is a useful tool. Its vertices are associated with the variables of that function, and edges join pairs of vertices associated with variables (in direct or complemented form) appearing together in at least one term. Kavvadias and Papadimitriou [124] show that probabilistic satisfiability remains NP-complete when all clauses have at most two literals and G is planar (planar 2psat). Moreover, compatible marginals, i.e., the problem of deciding whether marginal probability distributions for all four conjunctions of given pairs of variables are compatible, is also NP-hard. However, the case where there are at most two literals per clause and G is outerplanar, i.e., may be embedded in the plane so that all vertices are on the same face, is polynomial (Georgakopoulos et al. [86]). Other known polynomial cases of unconstrained nonlinear 0-1 programming lead to further polynomial cases of probabilistic satisfiability.
They include maximization of almost positive functions, in which all terms with more than one variable have positive coefficients (Rhys [159], Balinski [17]); unate functions, which are reducible to almost positive ones by switching some variables (Hansen and Simeone [114], Crama [60], Simeone, de Werra and Cochand [163]); unimodular functions, which lead to unimodular matrices of coefficients after linearization (Hansen and Simeone [114]); supermodular functions (Grötschel, Lovász and Schrijver [93], Billionnet and Minoux [20]); functions for which G contains no subgraph reducible to the complete graph on five vertices (Barahona [19]); and functions for which G is a partial k-tree (Crama et al. [61]); see the cited references for definitions not given here.

Note that these results are theoretical, as Khachiyan's [126] algorithm is not efficient in practice. A different type of result has been obtained by Crama and van de Klundert [62], who are interested in polynomial heuristics. They consider problem (3) with lower bounds only, and no objective function, i.e.,

$\mathbf{1}p = 1$
$Ap \ge \underline{\pi}$   (41)
$p \ge 0.$

Assuming, with almost no loss of generality, that $\underline{\pi}_i = b_i/q$ for $i = 1, 2, \ldots, m$, with the $b_i$ and $q$ integers, (41) has a solution if and only if the optimal value of

$\min \sum_{t=1}^{n} x_t$
subject to: $Ax \ge b$, $x \ge 0$   (42)

is at most $q$. A heuristic solution to (42) can be obtained by a greedy column generation algorithm where the polynomial heuristic for weighted maxsat with a 3/4 performance guarantee of Goemans and Williamson [89] is used to determine the columns to be selected, i.e., minimizing approximately $1/\sum_{i=1}^{m} a_{ij}$ (or maximizing $\sum_{i=1}^{m} a_{ij}$). This gives a solution of value at most $(8/3)H(m)$ times the optimal one, where $H(m) = \sum_{i=1}^{m} \frac{1}{i}$. If this value is less than $q$, a solution to (41) has been found. Otherwise, the selected columns may be used in an initial solution completed by the usual column generation method.

4 Decomposition

Large instances of probabilistic satisfiability problems are difficult to solve, as the corresponding matrices A in (1), (2), (3), ... tend to be dense. Even the use of an efficient code such as cplex does not allow the solution of problems with more than 500 sentences in reasonable time. This suggests the interest of decomposition methods for probabilistic satisfiability, a topic first studied by Van der Gaag [167, 168].

The tools used are the same as for the expression of conditional independence between variables in the study of belief networks (e.g., Pearl [150]). Independence relations are represented by an I-map, i.e., a graph $G = (V, E)$ with vertices $v_j$ associated with the variables $x_j$, where the edge $(v_r, v_s)$ does not belong to $E$ if and only if the variables $x_r$ and $x_s$ are conditionally independent. It is assumed that edges have been added, e.g., with the minimum fill-in algorithm of Tarjan and Yannakakis [166], until every cycle of length greater than three has a chord, i.e., an edge joining two non-successive vertices. Then $G$ is a decomposable I-map. Assume further that all initially given probabilities, as well as the probability to be bounded, are local to the cliques of $G$. Under these conditions the joint probability distribution $p$ can be expressed as a product of marginal probability distributions on the maximal cliques $C_1, C_2, \ldots, C_t$ of $G$, adequately scaled. So the problem will be solved on each of the cliques $C_1, C_2, \ldots, C_t$. However, it is necessary that the marginal distributions so obtained agree on the intersections of the cliques. To ensure this, one considers a join graph $G_1$ in which vertices are associated with the cliques $C_1, C_2, \ldots, C_t$ and edges join vertices associated with cliques having a non-empty intersection. Then one determines a join tree, i.e., a spanning tree of $G_1$. Compatibility conditions must be written for each edge of this tree. Probabilistic satisfiability in optimization form, with decomposition, may thus be written

$\min = \max\ A_{m+1} p$
subject to: $\mathbf{1} p^i = 1, \quad i = 1, \ldots, t$
$A^i p^i = \pi^i, \quad i = 1, \ldots, t$   (43)
$T_{ij} p^i - T_{ji} p^j = 0$ for all $i, j$ such that $C_i$ and $C_j$ are adjacent in the join tree
$p^i \ge 0, \quad i = 1, \ldots, t,$

where $t$ is the number of maximal cliques. The first sets of constraints correspond to the probabilistic satisfiability problem on each clique, and the last set to the compatibility conditions between the marginal distributions.


Example 6. (Douanya Nguetse et al. [71])

Consider the following six logical sentences and their probabilities of being true:

prob$(S_1 \equiv x_1) = 0.6$
prob$(S_2 \equiv \bar{x}_1 \vee x_2) = 0.4$
prob$(S_3 \equiv x_2 \vee x_3) = 0.8$
prob$(S_4 \equiv x_3 \wedge x_4) = 0.3$
prob$(S_5 \equiv \bar{x}_4 \vee x_5) = 0.5$
prob$(S_6 \equiv x_2 \vee x_5) = 0.6$

Find the best possible bounds on the probability $\pi_7$ of $S_7 \equiv x_5$. Indexing the possible worlds $w_1, \ldots, w_{32}$ (and the variables $p_1, \ldots, p_{32}$) from $(x_1, \ldots, x_5) = (1,1,1,1,1)$ down to $(0,0,0,0,0)$, with $x_5$ varying fastest, the objective function row ($S_7$) and the matrix $A$ are

S7: 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
S1: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
S2: 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
S3: 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
S4: 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0
S5: 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
S6: 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0

Solving the linear program (2) gives, when minimizing, a lower bound of 0.2 on $\pi_7$, with $p_{13} = 0.2$, $p_{10} = 0.3$, $p_{12} = 0.1$, $p_{20} = 0.2$, $p_{22} = 0.2$ and all other $p_k = 0$; and, when maximizing, an upper bound of 0.5 on $\pi_7$, with $p_{11} = 0.1$, $p_{13} = 0.1$, $p_{17} = 0.3$, $p_{10} = 0.3$, $p_{14} = 0.1$, $p_{22} = 0.1$ and all other $p_k = 0$. The corresponding I-map, decomposable I-map and join tree are represented in Figure 4. This problem's objective function, coefficient matrix and right-hand side, in the decomposed form (43), are:
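The matrix above can be regenerated mechanically from the sentence definitions, and the quoted minimizing solution can be checked at the same time. The sketch below is our own illustration; it assumes the world ordering just described (from all-ones down to all-zeros, $x_5$ fastest).

```python
from itertools import product

# worlds w1..w32 from (1,1,1,1,1) down to (0,0,0,0,0), x5 varying fastest
worlds = list(product((0, 1), repeat=5))[::-1]

truth = [
    lambda x: x[0],                      # S1 = x1
    lambda x: (1 - x[0]) or x[1],        # S2 = not-x1 or x2
    lambda x: x[1] or x[2],              # S3 = x2 or x3
    lambda x: x[2] and x[3],             # S4 = x3 and x4
    lambda x: (1 - x[3]) or x[4],        # S5 = not-x4 or x5
    lambda x: x[1] or x[4],              # S6 = x2 or x5
]
pi = [0.6, 0.4, 0.8, 0.3, 0.5, 0.6]
A = [[int(bool(t(w))) for w in worlds] for t in truth]
obj = [w[4] for w in worlds]             # S7 = x5

p = [0.0] * 32                           # minimizing solution quoted above
for k, v in {13: 0.2, 10: 0.3, 12: 0.1, 20: 0.2, 22: 0.2}.items():
    p[k - 1] = v

assert abs(sum(p) - 1.0) < 1e-9                          # 1p = 1
for row, target in zip(A, pi):                           # Ap = pi
    assert abs(sum(a * q for a, q in zip(row, p)) - target) < 1e-9
lower = sum(o * q for o, q in zip(obj, p))               # objective value
assert abs(lower - 0.2) < 1e-9
```

The same loop with the maximizing solution plays the corresponding role for the upper bound.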

The columns are indexed by the worlds of each clique, listed from all-ones down to all-zeros: $p^1_1, \ldots, p^1_4$ over $(x_1, x_2)$ for clique $C_1$, $p^2_1, \ldots, p^2_8$ over $(x_2, x_3, x_4)$ for $C_2$, and $p^3_1, \ldots, p^3_8$ over $(x_2, x_4, x_5)$ for $C_3$; blank entries are zero.

                      p^1_1..p^1_4    p^2_1..............p^2_8    p^3_1..............p^3_8
S^1_0:                 1  1  1  1                                                            = 1
S_1:                   1  1  0  0                                                            = 0.6
S_2:                   1  0  1  1                                                            = 0.4
(x2 = 1):              1  0  1  0    -1 -1 -1 -1  0  0  0  0                                 = 0
S^2_0:                                1  1  1  1  1  1  1  1                                 = 1
S_3:                                  1  1  1  1  1  1  0  0                                 = 0.8
S_4:                                  1  0  0  0  1  0  0  0                                 = 0.3
(x2 = 1, x4 = 1):                     1  0  1  0  0  0  0  0    -1 -1  0  0  0  0  0  0      = 0
(x2 = 1, x4 = 0):                     0  1  0  1  0  0  0  0     0  0 -1 -1  0  0  0  0      = 0
(x2 = 0, x4 = 1):                     0  0  0  0  1  0  1  0     0  0  0  0 -1 -1  0  0      = 0
S^3_0:                                                           1  1  1  1  1  1  1  1      = 1
S_5:                                                             1  0  1  1  1  0  1  1      = 0.5
S_6:                                                             1  1  1  1  1  0  1  0      = 0.6
S_7 (objective):                                                 1  0  1  0  1  0  1  0

Here $S^i_0$ denotes the normalization constraint $\mathbf{1}p^i = 1$ of clique $C_i$, and the rows labelled by conditions such as $(x_2 = 1, x_4 = 1)$ are the compatibility constraints $T_{ij} p^i - T_{ji} p^j = 0$ on the clique intersections.

[Figure 4 appears here: panel a) shows the I-map on the vertices $v_1, \ldots, v_5$, with one edge per sentence on two variables; panel b) the decomposable I-map obtained by adding the chord $(v_2, v_4)$; panel c) the join tree on the cliques $\{v_1, v_2\}$, $\{v_2, v_3, v_4\}$ and $\{v_2, v_4, v_5\}$, with separators $\{v_2\}$ and $\{v_2, v_4\}$.]

Figure 4: I-map, decomposable I-map and join tree

Solving the linear program (43) gives, when minimizing, a lower bound of 0.2 on $\pi_7$, with $p^1_2 = 0.6$, $p^1_3 = 0.4$, $p^2_2 = 0.3$, $p^2_3 = 0.1$, $p^2_5 = 0.3$, $p^2_6 = 0.1$, $p^2_7 = 0.1$, $p^2_8 = 0.1$, $p^3_2 = 0.1$, $p^3_4 = 0.3$, $p^3_6 = 0.4$, $p^3_7 = 0.2$ and all other $p^i_k = 0$; and, when maximizing, an upper bound of 0.5 on $\pi_7$, with $p^1_2 = 0.6$, $p^1_3 = 0.4$, $p^2_2 = 0.3$, $p^2_3 = 0.1$, $p^2_5 = 0.3$, $p^2_6 = 0.1$, $p^2_7 = 0.1$, $p^2_8 = 0.1$, $p^3_2 = 0.1$, $p^3_3 = 0.3$, $p^3_6 = 0.4$, $p^3_7 = 0.2$ and all other $p^i_k = 0$. $\square$

Problem (43) is equivalent to problem (2), in the sense that it gives the same bounds. Indeed, (i) to any feasible solution of (2) corresponds a feasible solution of (43) with the same value, obtained by grouping terms to build the marginal distribution on each of the cliques, and (ii) to any feasible solution of (43) corresponds a solution of (2) with the same value: as $G$ is a decomposable I-map, it follows from a result of Pearl [150] that there is a corresponding joint probability distribution given by

$p_k = \mathrm{prob}(w_k) = \dfrac{\prod_{i=1}^{t} \mathrm{prob}_{C_i}(w_k)}{\prod_{(i,j)} \mathrm{prob}_{C_i \cap C_j}(w_k)}$   (44)

where the product in the denominator is over the edges $(i, j)$ of the join tree, $\mathrm{prob}_{C_i}(w_k)$ denotes the probability of the restriction of world $w_k$ to $C_i$, and $\mathrm{prob}_{C_i \cap C_j}(w_k)$ the probability of the restriction of $w_k$ to the intersection of the adjacent cliques $C_i$ and $C_j$. Note that the probability distribution obtained with (44) from the solution of (43) will usually not be the same as that of (2). In particular, it may not be a basic solution.

Example (continued). For world $w_{10} = (10110)^T$, in the minimization case,

$p_{10} = \dfrac{\mathrm{prob}_{C_1}((1,0)^T)\ \mathrm{prob}_{C_2}((0,1,1)^T)\ \mathrm{prob}_{C_3}((0,1,0)^T)}{\mathrm{prob}_{C_1 \cap C_2}((0)^T)\ \mathrm{prob}_{C_2 \cap C_3}((0,1)^T)} = \dfrac{0.6 \times 0.3 \times 0.4}{(0.3 + 0.1 + 0.1 + 0.1) \times 0.4} = 0.3.$

Similarly, $p_{11} = 0.1$, $p_{14} = 0.1$, $p_{15} = 0.1$, $p_{20} = 0.2$, $p_{24} = 0.1$, all other $p_k = 0$. $\square$

Van der Gaag [167, 168] proposes to exploit the structure of (43), eliminating subproblems one at a time, beginning from the leaves of the join tree, by: (i) finding their feasible set $F_i$ from the local system of constraints $A^i p^i = \pi^i$; (ii) projecting this feasible set on the clique intersection $S_i$, i.e., finding the set $\{T_{ij} p^i : p^i \in F_i\}$; (iii) transmitting these restrictions to the neighboring subproblem through the equations $T_{ij} p^i - T_{ji} p^j = 0$; (iv) iterating until only the problem corresponding to the root of the join tree, which contains the objective function, remains; (v) solving this last problem, with all additional constraints, by linear programming (and then the other subproblems, if a solution is desired in addition to the bounds).
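The chain-rule computation of the example can be replayed mechanically. The sketch below is our own illustration: it encodes the clique marginals of the minimizing solution of (43) (with $p^2_5 = 0.3$, the value that feasibility of the clique-2 constraints requires) and recovers $p_{10} = 0.3$ via (44).

```python
from itertools import product

def ordered_worlds(n):
    # worlds from all-ones down to all-zeros (same convention as above)
    return list(product((0, 1), repeat=n))[::-1]

# clique marginals of the minimizing solution of (43):
p1 = [0.0, 0.6, 0.4, 0.0]                        # C1 over (x1, x2)
p2 = [0.0, 0.3, 0.1, 0.0, 0.3, 0.1, 0.1, 0.1]    # C2 over (x2, x3, x4)
p3 = [0.0, 0.1, 0.0, 0.3, 0.0, 0.4, 0.2, 0.0]    # C3 over (x2, x4, x5)

def marg(p, n, positions, values):
    # marginal probability of a partial assignment under distribution p
    return sum(q for w, q in zip(ordered_worlds(n), p)
               if all(w[i] == v for i, v in zip(positions, values)))

w = (1, 0, 1, 1, 0)                              # world w10 = (10110)
r1, r2, r3 = (w[0], w[1]), (w[1], w[2], w[3]), (w[1], w[3], w[4])

num = (p1[ordered_worlds(2).index(r1)]
       * p2[ordered_worlds(3).index(r2)]
       * p3[ordered_worlds(3).index(r3)])
den = marg(p2, 3, (0,), (r1[1],)) * marg(p3, 3, (0, 1), (r2[0], r2[2]))
p10 = num / den                                  # equation (44)
assert abs(p10 - 0.3) < 1e-9
```

Running the same computation over all 32 worlds rebuilds a full joint distribution feasible for (2).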

No details on how to perform these operations are given. The usual way would be to use vertex and extreme ray enumeration techniques for polytopes. Such methods are time-consuming, as the number of vertices of polytopes of fixed dimension usually grows exponentially with the number of constraints (Dyer [77], Chen et al. [47, 48]). It is of course also possible to apply Dantzig-Wolfe [63] decomposition to (43). Even without doing this, the form (43) is of interest. Indeed, while (43) usually has many fewer columns than (2) (i.e., $\sum_{i=1}^{t} 2^{|C_i|}$ instead of $2^{|V|}$) and more rows (the compatibility conditions add $2^{|C_i \cap C_j|}$ rows for each edge of the join tree), it is also much sparser. This sparsity is well exploited in codes such as cplex. So an alternative is to solve (43) by column generation. To find the entering column it is then necessary to solve up to $t$ subproblems, one for each clique $C_i$.

Example (continued). For the first clique the reduced-cost expression becomes

$-u_1 - u_2 x_1 - u_3(1 - x_1 + x_1 x_2) - u_4 x_2 = -(u_1 + u_3) - (u_2 - u_3) x_1 - u_4 x_2 - u_3 x_1 x_2;$

for the second clique,

$u_4 x_2 - u_5 - u_6(x_2 + x_3 - x_2 x_3) - u_7 x_3 x_4 - u_8 x_2 x_4 - u_9(x_2 - x_2 x_4) - u_{10}(x_4 - x_2 x_4)$
$\quad = -u_5 + (u_4 - u_6 - u_9) x_2 - u_6 x_3 - u_{10} x_4 + u_6 x_2 x_3 + (u_9 + u_{10} - u_8) x_2 x_4 - u_7 x_3 x_4;$

and for the third clique,

$x_5 + u_8 x_2 x_4 + u_9(x_2 - x_2 x_4) + u_{10}(x_4 - x_2 x_4) - u_{11} - u_{12}(1 - x_4 + x_4 x_5) - u_{13}(x_2 + x_5 - x_2 x_5)$
$\quad = -(u_{11} + u_{12}) + (u_9 - u_{13}) x_2 + (u_{10} + u_{12}) x_4 + (1 - u_{13}) x_5 + (u_8 - u_9 - u_{10}) x_2 x_4 + u_{13} x_2 x_5 - u_{12} x_4 x_5,$

where $x_1, x_2, \ldots, x_5 \in \{0, 1\}$.
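Signs and complemented variables are easily lost in such expansions, so it is worth checking them mechanically: the factored and expanded forms must agree at every 0-1 point. The sketch below (with arbitrary multiplier values of our own choosing) verifies the second- and third-clique expressions above.

```python
import itertools
import random

random.seed(1)
u = {i: random.uniform(-1.0, 1.0) for i in range(1, 14)}   # arbitrary duals

def lhs2(x2, x3, x4):        # second clique, factored form
    return (u[4]*x2 - u[5] - u[6]*(x2 + x3 - x2*x3) - u[7]*x3*x4 - u[8]*x2*x4
            - u[9]*(x2 - x2*x4) - u[10]*(x4 - x2*x4))

def rhs2(x2, x3, x4):        # second clique, expanded form
    return (-u[5] + (u[4] - u[6] - u[9])*x2 - u[6]*x3 - u[10]*x4
            + u[6]*x2*x3 + (u[9] + u[10] - u[8])*x2*x4 - u[7]*x3*x4)

def lhs3(x2, x4, x5):        # third clique, factored form
    return (x5 + u[8]*x2*x4 + u[9]*(x2 - x2*x4) + u[10]*(x4 - x2*x4) - u[11]
            - u[12]*(1 - x4 + x4*x5) - u[13]*(x2 + x5 - x2*x5))

def rhs3(x2, x4, x5):        # third clique, expanded form
    return (-(u[11] + u[12]) + (u[9] - u[13])*x2 + (u[10] + u[12])*x4
            + (1 - u[13])*x5 + (u[8] - u[9] - u[10])*x2*x4
            + u[13]*x2*x5 - u[12]*x4*x5)

ok = True
for pt in itertools.product((0, 1), repeat=3):
    ok = ok and abs(lhs2(*pt) - rhs2(*pt)) < 1e-12
    ok = ok and abs(lhs3(*pt) - rhs3(*pt)) < 1e-12
assert ok
```

Minimizing such an expression over 0-1 points is exactly the auxiliary problem of the column generation scheme.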

This approach made it possible to solve instances with up to m = 900 sentences when there are a few cliques with small intersections (Douanya Nguetse et al. [71]).

5 Nonmonotonic Reasoning and Restoring Satisfiability

5.1 Minimal Extension of Probability Intervals

When new sentences and their probabilities are added to a probabilistic satisfiability problem, the consistency of the resulting system must be checked. In view of the "fundamental theorem of the theory of probabilities" of de Finetti [68], it is always possible to find a coherent extension of the probabilities of the initial sentences. Indeed, considering the optimization versions (2)-(6) and (8) of probabilistic satisfiability, it suffices, while adding one sentence at a time, to choose a probability $\pi_{m+1}$ within the interval $[\underline{\pi}_{m+1}, \overline{\pi}_{m+1}]$. However, this might not correspond to the subjective view about $\pi_{m+1}$ to be modelled, a sign that some previously chosen values should be revised. This situation is more likely to happen if several new sentences are added simultaneously, possibly by different experts. Two natural ways to restore satisfiability are to modify the probabilities $\pi_i$ (or their bounds $\underline{\pi}_i$ and $\overline{\pi}_i$) and to delete some of the sentences. We discuss them in this subsection and the next one.

To restore satisfiability (or coherence) with minimal changes, one must solve the following linear program:

$\min\ \mathbf{1}\ell + \mathbf{1}u$
subject to: $\mathbf{1}p = 1$
$\underline{\pi} - \ell \le Ap \le \overline{\pi} + u$   (45)
$\ell, u, p \ge 0$

(Jaumard et al. [117]), i.e., minimize the total enlargement of the probability intervals needed to restore satisfiability. As confidence in the (subjective) estimates for the various sentences may vary substantially, use can be made of the weighted objective function

$\min\ \underline{w}\ell + \overline{w}u$   (46)

where $\underline{w}$ and $\overline{w}$ are vectors of positive weights, the larger the more accurate the probability intervals $[\underline{\pi}_i, \overline{\pi}_i]$ are considered to be. Problems (45) and (46) can be solved by column generation algorithms, as discussed in Section 3, keeping the columns corresponding to $\ell$ and $u$ explicit (or treating them separately, as in the revised simplex algorithm). While similar extensions of probability intervals for conditional probabilities might be considered, the resulting problem would be a bilinear program, which is much more difficult to solve than a linear program.
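For intuition, (45) can be pictured on a one-variable toy instance of our own devising: $S_1 \equiv x_1$ with $\pi_1 = 0.7$ and $S_2 \equiv \bar{x}_1$ with $\pi_2 = 0.6$ are jointly incoherent, since the two probabilities must sum to 1. A real implementation would solve the linear program directly; here a grid search over $\mathrm{prob}(x_1)$ suffices.

```python
# With one variable, prob(x1) = a determines the whole distribution,
# so the optimum of (45) reduces to  min_a |a - pi1| + |(1 - a) - pi2|.
pi1, pi2 = 0.7, 0.6
best = min(abs(a / 1000 - pi1) + abs((1 - a / 1000) - pi2)
           for a in range(1001))
assert abs(best - 0.3) < 1e-9   # total enlargement 0.3 restores coherence
```

Any split of the 0.3 between the two intervals (e.g., lowering $\pi_2$ to 0.3, or lowering $\pi_1$ and $\pi_2$ partially) is optimal, which is why the weighted form (46) is useful in practice.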

5.2 Probabilistic Maximum Satisfiability

A second way to restore satisfiability is to delete a subset of sentences of minimum cardinality (or possibly of minimum total weight, where the weights of the sentences are subjective estimates of their importance or reliability). This is done by solving the following mixed 0-1 linear program:

$\min\ |y| = \sum_{i=1}^{m} y_i$
subject to: $\mathbf{1}p = 1$
$\underline{\pi} - \ell \le Ap \le \overline{\pi} + u$
$\ell_i \le \underline{\pi}_i y_i, \quad i = 1, \ldots, m$   (47)
$u_i \le (1 - \overline{\pi}_i) y_i, \quad i = 1, \ldots, m$
$\ell, u, p \ge 0$
$y \in \{0, 1\}^m.$

The variables $y_i$, for $i = 1, \ldots, m$, are equal to 1 if sentence $S_i$ is deleted and to 0 otherwise. In the former case the interval $[\underline{\pi}_i, \overline{\pi}_i]$ can be extended to $[0, 1]$, so the probability of $S_i$ is no longer constrained, which is equivalent to deleting $S_i$. In the latter case $\ell_i = 0$ and $u_i = 0$, so the probability interval $[\underline{\pi}_i, \overline{\pi}_i]$ is unchanged. Problem (47) has an exponential number of columns and also some integer variables. To solve it, it is necessary to combine column generation with integer programming. Fortunately, the number of integer variables, i.e., m, is small. So the standard dual algorithm for mixed-integer programming can be extended fairly efficiently (Hansen, Minoux and Labbé [113]). It turns out that a primal integer programming algorithm (alternating phase 1 and phase 2 calculations, as in the simplex algorithm) is even more efficient (Hansen, Jaumard and Poggi de Aragão [110, 111]).

6 Other Uses of the Probabilistic Satisfiability Model

6.1 Maximum Entropy Solution

The model (2) has been criticized on the grounds that the bounds obtained may be weak for large instances, or may even provide no information at all, i.e., be equal to 0 or 1. It may be argued that in such a case, the bounds being best possible, nothing more can be said with the available information. But bounds that are far apart also suggest the interest of a representative solution, if one can be defined. A natural choice is then to seek the solution which makes the fewest assumptions, i.e., makes the probabilities of the possible worlds as equal as they can be, subject to the given constraints. This solution is the maximum entropy one. The problem becomes

$\max\ -\sum_j p_j \log p_j$
subject to: $\mathbf{1}p = 1$   (48)
$Ap = \pi$
$p \ge 0.$

This problem is very hard to solve. Using Lagrange multipliers $\lambda_1, \ldots, \lambda_m$ for the constraints $Ap = \pi$ (the normalization constraint being included as a row of A), the problem becomes the unconstrained minimization of

$\sum_j p_j \log p_j + \lambda(Ap - \pi)$   (49)

and differentiating with respect to each $p_j$ yields the first-order conditions

$\log p_j + 1 + \sum_{i=1}^{m} \lambda_i a_{ij} = 0,$   (50)

from which it follows that

$p_j = e^{-1} e^{-\sum_{i=1}^{m} \lambda_i a_{ij}}.$   (51)

Then, setting

$a_0 = e^{-1}, \qquad a_i = e^{-\lambda_i}, \quad i = 1, \ldots, m,$   (52)

each probability $p_j$ can be expressed as the product of $a_0$ and of those $a_i$ for which $a_{ij} = 1$.

This reduces (48) to a system of multilinear equations in the quantities $a_0, a_1, \ldots, a_m$ (Cheeseman [50], Nilsson [144], Kane [122]). Such a system may be solved by an iterative method, but this is time-consuming even for small m. Moreover, as shown by Paris and Vencovská [146], computing the factors $a_0, a_1, \ldots, a_m$ to a reasonable accuracy is NP-hard. Nilsson [144] also proposes another approximate method for finding a representative solution of (2). McLeish [140, 141] characterizes when the two solutions agree. Kane [119, 120, 121] considers systems in which the sentences $S_1$ to $S_{m-1}$ are atoms and $S_m$ is an implication between the conjunction of these atoms and another atom, the conclusion. A closed-form solution for the factors is then obtained, from which the probabilities of the possible worlds are readily derived.
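The product form (51)-(52) is what iterative scaling schemes exploit. The toy instance below is our own illustration, not the procedure of the references above: it imposes $\mathrm{prob}(x_1) = 0.6$ and $\mathrm{prob}(x_2) = 0.7$ over four worlds, where the maximum entropy solution is known to be the independent product distribution.

```python
from itertools import product

worlds = list(product((0, 1), repeat=2))
feats = [lambda w: w[0], lambda w: w[1]]   # rows a_1j, a_2j of A
targets = [0.6, 0.7]                       # pi_1, pi_2
p = [0.25] * 4                             # start from the uniform distribution

for _ in range(200):                       # iterative proportional fitting
    for f, t in zip(feats, targets):
        cur = sum(q for w, q in zip(worlds, p) if f(w))
        p = [q * (t / cur if f(w) else (1 - t) / (1 - cur))
             for w, q in zip(worlds, p)]

# with only marginal constraints, max entropy = independence
assert abs(p[worlds.index((1, 1))] - 0.6 * 0.7) < 1e-6
```

Each multiplicative correction plays the role of one factor $a_i$; with overlapping constraints the loop no longer converges in one pass, which reflects the hardness result of Paris and Vencovská cited above.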

6.2 Anytime Deduction

Frisch and Haddawy [82, 83] consider models (1) and (2), as did Nilsson [144], as well as their extension (3) to probability intervals. However, they do not apply the simplex algorithm, but consider instead a series of deduction rules. Examples of such rules are:

from prob$(S_1 | S_4) \in [x, y]$, prob$(S_1 \vee S_2 | S_4) \in [u, v]$ and prob$(S_1 \wedge S_2 | S_4) \in [w, z]$, deduce
prob$(S_2 | S_4) \in \big[\max(w, u - y + w),\ \min(v, v - x + z)\big]$,   (53)
provided $x \le y$, $x \le v$ and $w \le v$;

from prob$(S_1 | S_4) \in [x, y]$, deduce prob$(\bar{S}_1 | S_4) \in [1 - y, 1 - x]$;   (54)

and from prob$(S_2 | S_4) \in [x, y]$ and prob$(S_1 | S_2 \wedge S_4) \in [u, v]$, deduce prob$(S_1 \wedge S_2 | S_4) \in [xu, yv]$,   (55)

where $S_1$, $S_2$ and $S_4$ represent arbitrary propositional formulas. These rules have been obtained from various known results by adding an arbitrary conditioning sentence $S_4$ (the same in premises and conclusion), or proved by the authors. While the set of rules considered is not complete, it covers most rules proposed in the literature.

Frisch and Haddawy [83] propose an anytime deduction procedure for model (3): starting with the interval $[0, 1]$ for the probability of the objective-function sentence, they apply the rules, in any order, to the data. The probability intervals so obtained decrease monotonically. The algorithm can be stopped at any time, when it is judged that enough computing has been done or no further progress is observed (even if the best bounds have been obtained, the incompleteness of the set of rules does not allow one to recognize that this is the case). An important feature of this approach is that it justifies what is done step by step, and thus provides an explicit proof of the results, showing how they are obtained. There is, however, a difficulty when the interval values for the given sentences are not coherent. Indeed, this fact may not be recognized, or even be recognizable, with the given rules. Then, depending on which rules have been applied when it is decided to stop, a probability interval with high or low values may be obtained, and is arbitrary. A way out is to first check consistency with linear programming and column generation, then apply the rules (possibly until the best bounds are obtained) to get an explicit step-by-step proof. If the given intervals are not coherent, one may restore satisfiability by extending them, as discussed in Section 5, and then proceed as above.
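Rules of this kind are directly executable interval computations. The helper below is our own sketch of (53) and (54), with $S_4 \equiv$ true so that the probabilities are unconditional; the numbers are illustrative.

```python
def negate(lo, hi):
    # rule (54): prob(not-S1) lies in [1 - hi, 1 - lo]
    return (1 - hi, 1 - lo)

def deduce_s2(s1, s1_or_s2, s1_and_s2):
    # rule (53): bounds on prob(S2) from S1, S1 v S2 and S1 ^ S2
    (x, y), (u, v), (w, z) = s1, s1_or_s2, s1_and_s2
    return (max(w, u - y + w), min(v, v - x + z))

a, b = negate(0.3, 0.4)
assert abs(a - 0.6) < 1e-9 and abs(b - 0.7) < 1e-9

lo, hi = deduce_s2((0.3, 0.4), (0.6, 0.7), (0.1, 0.2))
# consistent with prob(S2) = prob(S1 v S2) + prob(S1 ^ S2) - prob(S1)
assert abs(lo - 0.3) < 1e-9 and abs(hi - 0.6) < 1e-9
```

An anytime procedure repeatedly applies such functions, intersecting each new interval with the current one until no further tightening occurs or the time budget is exhausted.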

7 Other Related Approaches

7.1 Incidence Calculus

Many approaches to uncertainty in artificial intelligence use probabilities. Some of them, which predate or are contemporary with Nilsson's [144] paper, are quite close to probabilistic satisfiability. This is the case for Quinlan's [155] Inferno system, which exploits various rules of probability logic, and for the incidence calculus developed by Bundy and coworkers [36, 37, 38, 139]. This logic for probabilistic reasoning proceeds, as probabilistic satisfiability does, from lower and upper bounds on some sentences (the axioms of a logical theory) to lower and upper bounds on the remaining sentences (the formulas of the theory). Incidences, i.e., sets of possible worlds with a probability, are associated with sentences, rather than probabilities or bounds on them. The intended meaning of the incidence of a sentence is the set of possible worlds in which the formula is true. This encoding makes incidence calculus truth-functional, i.e., the incidence of a compound formula can be computed directly from those of its parts. Given a set W of worlds (which are here primitive objects of incidence calculus), the rules for extending incidence are as follows:

follows:

i(true) = W i(false) = ; i(S 1) = W n i(S1) i(S1 ^ S2) = i(S1) \ i(S2) i(Si _ S2) = i(S1) [ i(S2) i(S1 ! S2) = W n (S1) [ i(S2); from which one can deduce the rules for probabilities prob(true) = 1 prob(false) = ; prob(S1) = 1 ? prob(S1) prob(S1 _ S2) = prob(S1) + prob(S2) ? prob(S1 ^ S2) prob(S1 ! S2) = prob(S1) + prob(S2) ? prob(S 1 ^ S2) prob(S1 ^ S2) = prob(S 1) + prob( q S2 ) + c(S1; S2) prob(S1)prob(S1)prob(S2)prob(S 2) where c(S1; S2) is the correlation between S1 and S2 de ned by c(S1; S2) = q (S1 _ S2) ? prob(S1 ^ S2) : prob(S1)prob(S 1)prob(S2)prob(S 2)

(56)

(57)

(58)

Using these rules it is usually only possible to determine lower and upper bounds on the incidences of conclusions. The precision of these bounds will depend on the number of possible worlds considered. While incidence calculus avoids considering all possible worlds in the sense of probabilistic logic (and hence is easier to use than Nilsson's original proposition in which all such worlds are considered to set up (1) or (2)) the precision of the bounds obtained depends on the number of worlds considered and these bounds are not necessarily valid in worst case.

7.2 Bayesian Logic Consider a Bayesian network G = (V; U ) (e.g., Pearl [150]). Vertices (or nodes) vj of V are associated with simple events (or logical variables xj ; we assume here only two outcomes for each event are possible, i.e., true or false). Directed arcs (vi; vj ) are used 46

to represent probabilistic dependence among events. Moreover, the network is acyclic. Probabilities of vertices conditioned on the values of their immediate predecessors are given. The probability that a vertex is true, when conditioned on the truth values of all its non-successors, is equal to the probability that it is true, conditioned only on the truth values of its immediate predecessors. Consequently, probability of any possible world can be computed by the chain rule using the speci ed conditional probabilities only. This leads in practice to fairly easy ways to compute probabilities or conditional probabilities of events provided immediate predecessors are not too numerous (although it is NP-hard to do so even if their number is bounded as shown by Cooper [59]), see Pearl [150], Lauritzen and Spiegelhalter [135] and Andersen and Hooker [11] for examples. The assumptions made are, however, very strong ones: sucient information must be given to de ne a unique point probability distribution and this supposes giving 2jpred j ? 1 exact values, where predj denotes the number of predecessors of vertex j for exact vertex vj . j

Andersen and Hooker [11] examine how some of the assumptions of belief networks could be relaxed, by combining this approach with probabilistic satis ability. It is easy to see that usual computations in Bayesian networks can be cast into the form (3) (Hansen, Jaumard, Douanya Nguetse and Poggi de Arag~ao [108]), Andersen and Hooker [11] propose a more complicated nonlinear formulation. Then precise probability values for simple or conditional events may be replaced by intervals as discussed in Section 7. One can also add constraints dierent from the conditional implications, allow for networks with cycles, etc. not all extensions remain linear, as e.g., when marginal probabilities for simple events are given by intervals and these events are independent. While some proposals have been made (e.g., Andersen and Hooker [11] recommend using generalized Benders decomposition (Georion [85]) and signomial geometric programming for solving subproblems) ecient solution of nonlinear probabilistic satis ability problems is still largely unexplored.

47

7.3 Assumption-based Truth Maintenance Systems Assumption-based truth maintenance systems (Laskey, K.B. and Lehner [134], Kohlas and Monney [129]) may be viewed as probabilistic satis ability problems of the form (2) or (3), in which sentences are of two types: assumptions, which are atoms having a given probability or a probability in a given interval, and rules which have probability 1. Moreover, assumptions are assumed to be independent (in the usual sense of probability theory: this is dierent from the concept used by Boole [26] which corresponds to conditional logical independence). A very careful examination of algorithms for coherence and bounding in assumptionbased truth maintenance systems is made in a recent book of Kohlas and Monney [129]. It seems likely that relaxing the independence assumption might make solution of such problem easier (and the bounds obtained less precise).

7.4 Probabilistic Logic via Capacities An important relationship between probabilistic satis ability and capacities (or belief functions) has been recently established by Kampke [118]. A lower probability is the minimum of a set of probability distributions de ned over the same space. The probabilistic satis ability model (1) can be extended by considering several probability distributions p1; p2; : : :; pN instead of a simple one: 1 pi = 1 A min(p1; p2 ; : : :; pN ) = pi 0

i = 1; 2; : : : ; N i = 1; 2; : : : ; N:

(59)

While a solution to (1) always satis es (59) the converse is not necessarily true.

Example (Kampke [118]) Let S1 x1 _ x2, S2 x1x2 _ x1x2, 1 = 0:4 and 2 = 0:3. Set p1 = prob(x1x2),

p2 = prob(x1x2), p3 = prob(x1x2) and p4 = prob(x1x2). 48

Then the probabilistic satis ability problem (1) has no solution, but the lower probability problem (59) has a solution p1 = (:1; :1; :2; :6) (60) p2 = (:2; :3; :3; :2):

2

Kampke [118] proves that lower probabilities which are solutions of (59) are substitutable by the minimum of only two distributions. Moreover, these two distributions form a totaly monotone capacity, or belief function (see Choquet [52], Shafer [162] or Kampke [118] for de nitions). Problem (59) can be solved by extending solution techniques for (1) and (2).

7.5 Other Applications Due to its simplicity, it is not surprising that probabilistic satis ability has many applications (and the potential for many more) in addition to those in ai and in Probability discussed throughout this chapter. We mention a few Zemel [175], Assous [14], Brecht and Colbourn [32], Colbourn [53], Hansen, Jaumard and Douanya-Nguetse [107] consider two-terminal reliability of networks. Failure probabilities, not assumed to be independent, are given for all edges. The probability of an operational path from source to sink is to be bounded. Zemel [175] suggest the use of column generation and nds polynomial cases. Assous [13, 14] shows the lower and upper bounds can be found by solving a shortest path and a minimum cut problem. Brecht and Colbourn [32] use this result to improve reliability bounds with independent probabilities of edges failure through a two-stage computation. Hansen et al. [107] get more precise bounds by considering also probabilities of simultaneous failure of pairs of edges. Prekopa and Boros [154] study electricity production and distribution systems. Assuming probability distributions for oers and demands to be given they show how to compute the probability of non-satisfaction of demand due to insucient production or transportation capacity. 49

Kane, McAndrew and Wallace [123] apply the maximum entropy algorithm of Kane [119, 121] to model-based object recognition, with a significant improvement over previous methods. Hailperin [100] suggests applying probabilistic logic to fault analysis in digital circuits (Parker and McCluskey [147, 148]).

8 Conclusions

While many proposals have been made for handling uncertainty in ai, there are few methods which apply to a large variety of problems, and also few methods which allow rigorous solution of large instances. The approach based on logic and probability, epitomized by probabilistic satisfiability and its extensions, is one of the most comprehensive and powerful available. This is largely due to the strength and versatility of the advanced linear and mixed-integer programming techniques upon which it relies.

Both analytical and numerical solutions can be obtained for a large variety of problems. The former are obtained through Fourier-Motzkin elimination, or enumeration of extreme points and extreme rays of polytopes, the latter through linear programming algorithms. Large instances can be solved using column generation and nonlinear 0–1 programming. These solution methods apply both to the consistency problem for given logical sentences and probabilities and to the problem of finding best bounds for an additional logical sentence. Both simple and conditional probabilities can be considered in the constraints and/or in the objective function, as well as probability intervals and additional linear constraints on the probabilities. Recent theories on combination or iteration of conditionals can also be expressed in this framework. Moreover, nonmonotonic reasoning can apply, through the study of minimal changes to restore consistency. No independence or conditional independence assumptions need be imposed, but conditional independence may be implicitly taken into account.

Probabilistic satisfiability and its extensions may be viewed as the applied, computation-oriented (but including formal computing) side of probability logic, which is a very active research area. After a brilliant start with Boole's work, followed by a long dormant period until Hailperin's first paper, it is now gaining impetus. Much work remains to be done, but the perspectives for theory and applications of probabilistic satisfiability (including here the subjective probability approach of de Finetti and his school and its extension to imprecise probabilities by Walley) appear very promising.


References

[1] Aarts, E.H.L. and J.H.M. Korst, Simulated Annealing and Boltzmann Machines, Chichester: Wiley, 1989.
[2] Abadi, M., and J.Y. Halpern, Decidability and Expressiveness for First-Order Logics of Probability, Information and Computation 112 (1994) 1–36.
[3] Adams, E.W., Probability and the Logic of Conditionals, in: Aspects of Inductive Logic, J. Hintikka and P. Suppes (Eds.), Amsterdam: North-Holland (1966) 265–316.
[4] Adams, E.W., The Logic of "Almost All", Journal of Philosophical Logic 3 (1974) 3–17.
[5] Adams, E.W., The Logic of Conditionals, D. Reidel Publishing, Dordrecht, Holland, 1975.
[6] Adams, E.W., Probabilistic Enthymemes, Journal of Pragmatics 7 (1983) 283–295.
[7] Adams, E.W., On the Logic of High Probability, Journal of Philosophical Logic 15 (1986) 255–279.
[8] Adams, E.W. and H.P. Levine, On the Uncertainties Transmitted from Premises to Conclusions in Deductive Inferences, Synthese 30 (1975) 429–460.
[9] Andersen, K.A., Characterizing Consistency for a Subclass of Horn Clauses, Mathematical Programming 66 (1994) 257–271.
[10] Andersen, K.A., and J.N. Hooker, A Linear Programming Framework for Logics of Uncertainty, Matematisk Institut, Aarhus Universitet, May 1993, to appear in Decision Support Systems.
[11] Andersen, K.A., and J.N. Hooker, Bayesian Logic, Decision Support Systems 11 (1994) 191–210.
[12] Andersen, K.A. and J.N. Hooker, Determining Lower and Upper Bounds on Probabilities of Atomic Propositions in Sets of Logical Formulas Represented by Digraphs, to appear in Annals of Operations Research (1996).
[13] Assous, J.Y., Bounds on Network Reliability, Ph.D. Thesis, Northwestern University, 1983.

[14] Assous, J.Y., First and Second-Order Bounds on Terminal Reliability, Networks 16 (1986) 319–329.
[15] Avis, D., and K. Fukuda, A Pivoting Algorithm for Convex Hulls and Vertex Enumeration of Arrangements and Polyhedra, Discrete and Computational Geometry (1992).
[16] Bacchus, F., Representing and Reasoning with Probabilistic Knowledge: A Logical Approach to Probabilities, The MIT Press, Cambridge, Massachusetts, 1990.
[17] Balinski, M., On a Selection Problem, Management Science 17 (1970) 230–231.
[18] Bamber, D., Probabilistic Entailment of Conditionals by Conditionals, IEEE Transactions on Systems, Man and Cybernetics 24 (1994) 1714–1723.
[19] Barahona, F., The Max-cut Problem on Graphs Not Contractible to K5, Operations Research Letters 2 (1983) 107–111.
[20] Billionnet, A., and M. Minoux, Maximizing a Super-modular Pseudo-Boolean Function: A Polynomial Algorithm for Super-modular Cubic Functions, Discrete Applied Mathematics 12 (1985) 1–11.
[21] Bland, R.G., New Finite Pivoting Rules for the Simplex Method, Mathematics of Operations Research 2 (1977) 103–107.
[22] Bonferroni, C.E., Teoria statistica delle classi e calcolo delle probabilità, Volume in onore di Riccardo Dalla Volta, Università di Firenze, 1–62, 1937.
[23] Boole, G., Proposed Question in the Theory of Probabilities, The Cambridge and Dublin Mathematical Journal 6 (1851) 186.
[24] Boole, G., Further Observations on the Theory of Probabilities, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 4(2) (1851) 96–101.
[25] Boole, G., Collected Logical Works. Vol. I, Studies in Logic and Probability, ed. R. Rhees, LaSalle, Illinois: Open Court, 1952.
[26] Boole, G., An Investigation of the Laws of Thought, on which are Founded the Mathematical Theories of Logic and Probabilities, London: Walton and Maberly, 1854 (reprint New York: Dover, 1958).
[27] Boole, G., On the Conditions by which Solutions of Questions in the Theory of Probabilities are Limited, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(8) (1854) 91–98.

[28] Boole, G., On a General Method in the Theory of Probabilities, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(8) (1854) 431–444.
[29] Boole, G., On Certain Propositions in Algebra Connected to the Theory of Probabilities, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 4(9) (1855) 165–179.
[30] Boole, G., On Propositions Numerically Definite (read posthumously by De Morgan, March 16th, 1868), Transactions of the Cambridge Philosophical Society 11 (1871) 396–411.
[31] Boros, E., and A. Prekopa, Closed Form Two-sided Bounds for Probabilities that at Least r and Exactly r out of n Events Occur, Mathematics of Operations Research 14 (1989) 317–342.
[32] Brecht, T.B., and C.J. Colbourn, Improving Reliability Bounds on Computer Networks, Networks 16 (1986) 369–380.
[33] Bruno, G., and A. Gilio, Applicazione del metodo del simplesso al teorema fondamentale per le probabilità nella concezione soggettivistica, Statistica XL(3) (1980) 337–344.
[34] Bruno, G., and A. Gilio, Comparison of Conditional Events of Zero Probability in Bayesian Statistical Inference (Italian), Rivista di Matematica per le Scienze Economiche e Sociali (Milan) 8(2) (1985) 141–152.
[35] Buchanan, B.G., and E.H. Shortliffe, Rule-Based Expert Systems – The MYCIN Experiments of the Stanford Heuristic Programming Project, Addison-Wesley, Reading, MA, 1985.
[36] Bundy, A., Incidence Calculus: A Mechanism for Probabilistic Reasoning, Journal of Automated Reasoning 1 (1985) 263–283.
[37] Bundy, A., Correctness Criteria of Some Algorithms for Uncertain Reasoning using Incidence Calculus, Journal of Automated Reasoning 2 (1986) 109–126.
[38] Bundy, A., Incidence Calculus, in: Encyclopedia of Artificial Intelligence, S.C. Shapiro (Ed.), New York: Wiley (1991) 663–668.
[39] Calabrese, P.G., An Algebraic Synthesis of the Foundations of Logic and Probability, Information Sciences 42 (1987) 187–237.

[40] Calabrese, P.G., Reasoning with Uncertainty Using Conditional Logic and Probability, in: First International Symposium on Uncertainty Modeling and Analysis, IEEE Computer Society, 1990, 682–688.
[41] Calabrese, P.G., Deduction and Inference using Conditional Logic and Probability, Chapter 2 in: Conditional Logic in Expert Systems, I.R. Goodman et al. (Eds.), North-Holland, 1991, 71–100.
[42] Calabrese, P.G., A Theory of Conditional Information with Applications, IEEE Transactions on Systems, Man and Cybernetics 24(12) (1994) 1676–1684.
[43] Carathéodory, C., Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen, Mathematische Annalen 64 (1907) 95–115.
[44] Cerny, V., A Thermodynamical Approach to the Traveling Salesman Problem: An Efficient Simulation Algorithm, Journal of Optimization Theory and Applications 45(1) (1985) 41–51.
[45] Charnes, A., and W.W. Cooper, Programming with Linear Fractional Functionals, Naval Research Logistics Quarterly 9 (1962) 181–186.
[46] Chebychev, P.L. (1867), On Mean Values, in: D.E. Smith (Ed.), A Source Book of Mathematics, II, New York: Dover, 1959.
[47] Chen, P.C., P. Hansen and B. Jaumard, On-Line and Off-Line Vertex Enumeration by Adjacency Lists, Operations Research Letters 10(7) (1991) 403–409.
[48] Chen, P.C., P. Hansen and B. Jaumard, Partial Pivoting in Vertex Enumeration, GERAD Research Report 92-15, Montreal, 1992.
[49] Cheeseman, P., In Defense of Probability, Proceedings of the 9th International Joint Conference on Artificial Intelligence, Los Angeles, 1985, 1002–1009.
[50] Cheeseman, P., A Method of Computing Generalized Bayesian Probability Values for Expert Systems, Proceedings of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe (1983) 198–292.
[51] Chesnokov, S.V., The Effect of Semantic Freedom in the Logic of Natural Language, Fuzzy Sets and Systems 22 (1987) 121–154.
[52] Choquet, G., Theory of Capacities, Annales de l'Institut Fourier 5 (1954) 131–291.

[53] Colbourn, C.J., The Combinatorics of Network Reliability, Oxford: Oxford University Press, 1987.
[54] Coletti, G., Conditionally Coherent Qualitative Probabilities, Statistica 48 (1988) 235–242.
[55] Coletti, G., Coherent Qualitative Probability, Journal of Mathematical Psychology 34 (1990) 297–310.
[56] Coletti, G., Numerical and Qualitative Judgments in Probabilistic Expert Systems, in: R. Scozzafava (Ed.), Proceedings of the Workshop on Probabilistic Expert Systems, Roma: SIS (1993) 37–55.
[57] Coletti, G., IEEE Transactions on Systems, Man and Cybernetics 34(12) (1994).
[58] Coletti, G., and R. Scozzafava, Characterization of Coherent Conditional Probabilities as a Tool for their Assessment and Extension, Research Report, Dept. of Mathematics, University of Perugia, Italy, 1996.
[59] Cooper, G.F., The Computational Complexity of Probabilistic Inference using Bayesian Belief Networks, Artificial Intelligence 42 (1990) 393–405.
[60] Crama, Y., Recognition Problems for Special Classes of Polynomials in 0–1 Variables, Mathematical Programming 44 (1989) 135–155.
[61] Crama, Y., P. Hansen and B. Jaumard, The Basic Algorithm for Pseudo-Boolean Programming Revisited, Discrete Applied Mathematics 29(2–3) (1989) 171–185.
[62] Crama, Y., and J. van de Klundert, Approximation Algorithms for Integer Covering Problems via Greedy Column Generation, RAIRO Recherche Opérationnelle 28(3) (1994) 283–302.
[63] Dantzig, G.B., On the Significance of Solving Linear Programming Problems with Some Integer Variables, Econometrica 28 (1961) 30–44.
[64] Dantzig, G.B., Linear Programming and Extensions, Princeton University Press, Princeton, 1963.
[65] Dantzig, G.B. and B.C. Eaves, Fourier-Motzkin and its Dual, Journal of Combinatorial Theory (A) 14 (1973) 288–297.
[66] de Finetti, B., Problemi determinati e indeterminati nel calcolo delle probabilità, Rendiconti Reale Accademia dei Lincei 6(XII) (1930) 367–373.

[67] de Finetti, B., La prévision: ses lois logiques, ses sources subjectives, Annales de l'Institut Henri Poincaré 7 (1937) 1–68.
[68] de Finetti, B., Theory of Probability – A Critical Introductory Treatment, Vol. 1, Wiley, New York, 1974.
[69] de Finetti, B., Theory of Probability – A Critical Introductory Treatment, Vol. 2, Wiley, New York, 1975.
[70] Dinkelbach, W., On Nonlinear Fractional Programming, Management Science 13 (1967) 492–498.
[71] Douanya-Nguetse, G.-B., P. Hansen and B. Jaumard, Probabilistic Satisfiability and Decomposition, Les Cahiers du GERAD G-94-55, December 1994, 15 pages.
[72] Dowsland, K.A., Simulated Annealing, in: C.R. Reeves (Ed.), Modern Heuristic Techniques for Combinatorial Problems, London: Blackwell (1993) 20–69.
[73] Driankov, D., Reasoning with Consistent Probabilities, Proceedings of IJCAI (1987) 899–901.
[74] Dubois, D. and H. Prade, Fuzzy Sets and Systems: Theory and Applications, Academic Press, New York, 1980.
[75] Dubois, D. and H. Prade, A Tentative Comparison of Numerical Approximate Reasoning Methodologies, International Journal of Man-Machine Studies 27 (1987) 717–728.
[76] Dubois, D. and H. Prade, Possibility Theory, Plenum Press, New York, 1988.
[77] Dyer, M.E., On the Complexity of Vertex Enumeration Methods, Mathematics of Operations Research 8(3) (1983) 381–402.
[78] Fagin, R., J.Y. Halpern and N. Megiddo, A Logic for Reasoning about Probabilities, Information and Computation 87 (1990) 78–128.
[79] Fekete, M., and G. Polya, Über ein Problem von Laguerre, Rendiconti del Circolo Matematico di Palermo 23 (1912) 89–120.
[80] Fortet, R., L'algèbre de Boole et ses applications en Recherche Opérationnelle, Cahiers du Centre d'Études de Recherche Opérationnelle 1(4) (1959) 5–36.
[81] Fortet, R., Applications de l'algèbre de Boole en Recherche Opérationnelle, Revue Française d'Informatique et de Recherche Opérationnelle 4(14) (1960) 17–25.

[82] Frisch, A.M. and P. Haddawy, Convergent Deduction for Probabilistic Logic, Uncertainty in Artificial Intelligence 3, Amsterdam: Elsevier, 1987, 278–286.
[83] Frisch, A.M. and P. Haddawy, Anytime Deduction for Probabilistic Logic, Artificial Intelligence 69 (1994) 93–122.
[84] Gelenbe, E., Une généralisation probabiliste du problème SAT, Comptes Rendus de l'Académie des Sciences de Paris 315 (1992) 339–342.
[85] Geoffrion, A.M., Generalized Benders Decomposition, Journal of Optimization Theory and Applications 10 (1972) 237–260.
[86] Georgakopoulos, G., D. Kavvadias and C.H. Papadimitriou, Probabilistic Satisfiability, Journal of Complexity 4 (1988) 1–11.
[87] Glover, F., Tabu Search – Part I, ORSA Journal on Computing 1 (1989) 190–206.
[88] Glover, F., Tabu Search – Part II, ORSA Journal on Computing 2 (1990) 4–32.
[89] Goemans, M.X. and D.P. Williamson, A New 3/4-Approximation Algorithm for MAX SAT, in: Proceedings of the Third IPCO Conference, G. Rinaldi and L. Wolsey (Eds.), (1993) 313–321.
[90] Goodman, I.R., A Measure-Free Approach to Conditioning, Proceedings of the 3rd AAAI Workshop on Uncertainty in AI, Seattle, July 1987, 270–277.
[91] Goodman, I.R., and H.T. Nguyen, Conditional Objects and the Modeling of Uncertainties, in: Fuzzy Computing, Theory, Hardware and Applications, M.M. Gupta and T. Yamakawa (Eds.), North-Holland, Amsterdam, 1988, 119–138.
[92] Goodman, I.R., H.T. Nguyen and E.A. Walker, Conditional Inference and Logic for Intelligent Systems, Amsterdam: North-Holland, 1991.
[93] Grötschel, M., L. Lovász and A. Schrijver, The Ellipsoid Method and its Consequences in Combinatorial Optimization, Combinatorica 1 (1981) 169–197. (Corrigendum: 4 (1984) 291–295.)
[94] Grzymala-Busse, J.W., Managing Uncertainty in Expert Systems, Kluwer, Boston, 1991.
[95] Gu, J., Efficient Local Search for Very Large-Scale Satisfiability Problems, SIGART Bulletin 3 (1992) 8–12.

[96] Guggenheimer, H. and R.S. Freedman, Foundations of Probabilistic Logic, Proceedings of IJCAI (1987) 939–941.
[97] Hailperin, T., Best Possible Inequalities for the Probability of a Logical Function of Events, American Mathematical Monthly 72 (1965) 343–359.
[98] Hailperin, T., Boole's Logic and Probability, Studies in Logic and the Foundations of Mathematics 85, North-Holland, Amsterdam, first edition, 1976.
[99] Hailperin, T., Probability Logic, Notre Dame Journal of Formal Logic 25(3) (1984) 198–212.
[100] Hailperin, T., Boole's Logic and Probability, Studies in Logic and the Foundations of Mathematics 85, North-Holland, Amsterdam, 2nd enlarged edition, 1986.
[101] Hailperin, T., Probability Logic, Manuscript, 1993.
[102] Halpern, J.Y., A Study of First-Order Logics of Probability, Artificial Intelligence, 1991.
[103] Hammer, P.L., I. Rosenberg and S. Rudeanu, On the Determination of the Minima of Pseudo-Boolean Functions (in Romanian), Studii si Cercetari Matematice 14 (1963) 359–364.
[104] Hammer, P.L., and S. Rudeanu, Boolean Methods in Operations Research and Related Areas, Berlin: Springer, 1966.
[105] Hansen, P., Les procédures d'optimisation et d'exploration par séparation et évaluation, in: B. Roy (Ed.), Combinatorial Programming, Dordrecht: Reidel (1975) 19–65.
[106] Hansen, P. and B. Jaumard, Algorithms for the Maximum Satisfiability Problem, Computing 44 (1990) 279–303.
[107] Hansen, P., B. Jaumard and G.-B. Douanya Nguetse, Best Second-Order Bounds for Two-Terminal Network Reliability with Dependent Edge Failures, Les Cahiers du GERAD G-94-01, February 1994, 23 pages.
[108] Hansen, P., B. Jaumard, G.-B. Douanya Nguetse and M. Poggi de Aragão, Models and Algorithms for Probabilistic and Bayesian Logic, in: IJCAI-95 – Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (1995) 1862–1868.

[109] Hansen, P., B. Jaumard and V. Mathon, Constrained Nonlinear 0–1 Programming, ORSA Journal on Computing 5 (1993) 97–119.
[110] Hansen, P., B. Jaumard and M. Poggi de Aragão, Un algorithme primal de programmation linéaire généralisée pour les programmes mixtes, Comptes Rendus de l'Académie des Sciences de Paris 313(I) (1991) 557–560.
[111] Hansen, P., B. Jaumard and M. Poggi de Aragão, Mixed-Integer Column Generation Algorithms and the Probabilistic Maximum Satisfiability Problem, in: Integer Programming and Combinatorial Optimization II, E. Balas, G. Cornuéjols and R. Kannan (Eds.), Pittsburgh: Carnegie Mellon University (1992) 165–180.
[112] Hansen, P., B. Jaumard and M. Poggi de Aragão, Boole's Conditions of Possible Experience and Reasoning Under Uncertainty, Discrete Applied Mathematics 60 (1995) 181–193.
[113] Hansen, P., M. Minoux and M. Labbé, Extension de la programmation linéaire généralisée au cas des programmes mixtes, Comptes Rendus de l'Académie des Sciences de Paris 305 (1987) 569–572.
[114] Hansen, P. and B. Simeone, Unimodular Functions, Discrete Applied Mathematics 14 (1986) 269–281.
[115] Henrion, M., Propagating Uncertainty in Bayesian Networks by Probabilistic Logic Sampling, in: Uncertainty in Artificial Intelligence 2, J.F. Lemmer and L.N. Kanal (Eds.), North-Holland, Amsterdam, 1988, 149–164.
[116] Hooker, J.N., A Mathematical Programming Model for Probabilistic Logic, Working Paper 05-88-89, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA, July 1988.
[117] Jaumard, B., P. Hansen and M. Poggi de Aragão, Column Generation Methods for Probabilistic Logic, ORSA Journal on Computing 3 (1991) 135–148.
[118] Kampke, T., Probabilistic Logic via Capacities, International Journal of Intelligent Systems 10 (1995) 857–869.
[119] Kane, T.B., Maximum Entropy in Nilsson's Probabilistic Logic, in: Proceedings of IJCAI 1989, Morgan Kaufmann, California, 442–447, 1989.
[120] Kane, T.B., Enhancing the Inference Mechanism of Nilsson's Probabilistic Logic, International Journal of Intelligent Systems 5(5) (1990) 487–504.

[121] Kane, T.B., Reasoning with Maximum Entropy in Expert Systems, in: W.T. Grandy and L.H. Schick (Eds.), Maximum Entropy and Bayesian Methods, Kluwer Academic Publishers, Boston, 201–214, 1991.
[122] Kane, T.B., Reasoning with Uncertainty Using Nilsson's Probabilistic Logic and the Maximum Entropy Formalism, Doctoral Dissertation, Heriot-Watt University, Edinburgh, 1992.
[123] Kane, T.B., P. McAndrew and A.M. Wallace, Model-Based Object Recognition Using Probabilistic Logic and Maximum Entropy, International Journal of Pattern Recognition and Artificial Intelligence 5(3) (1991) 425–437.
[124] Kavvadias, D. and C.H. Papadimitriou, A Linear Programming Approach to Reasoning about Probabilities, Annals of Mathematics and Artificial Intelligence 1 (1990) 189–205.
[125] Keynes, J.M., A Treatise on Probability, London: Macmillan, 1921.
[126] Khachiyan, L.G., A Polynomial Algorithm in Linear Programming (in Russian), Doklady Akademii Nauk SSSR 244 (1979) 1093–1096. (English translation: Soviet Mathematics Doklady 20 (1979) 191–194.)
[127] Kirkpatrick, S., C.D. Gelatt and M.P. Vecchi, Optimization by Simulated Annealing, Science 220(4598) (1983) 671–674.
[128] Kohlas, J. and P.-A. Monney, Probabilistic Assumption-Based Reasoning, Working Paper 94-22, Institute of Informatics, University of Fribourg, 1994.
[129] Kohlas, J. and P.-A. Monney, Assumption Based Truth Maintenance, Lecture Notes in Computer Science, Berlin: Springer (1995).
[130] Kounias, S. and J. Marin, Best Linear Bonferroni Bounds, SIAM Journal on Applied Mathematics 30 (1976) 307–323.
[131] Kounias, S. and K. Sotirakoglou, Upper and Lower Bounds for the Probability that r Events Occur, Optimization 27 (1993) 63–78.
[132] Lad, F., J.M. Dickey and M.A. Rahman, The Fundamental Theorem of Prevision, Statistica 50 (1990) 19–38.
[133] Lad, F., J.M. Dickey and M.A. Rahman, Numerical Applications of the Fundamental Theorem of Prevision, Journal of Statistical Computation and Simulation 40 (1992) 135–151.

[134] Laskey, K.B. and P.E. Lehner, Assumptions, Beliefs and Probabilities, Artificial Intelligence 41 (1990) 65–77.
[135] Lauritzen, S.L. and D.J. Spiegelhalter, Local Computations with Probabilities on Graphical Structures and their Application to Expert Systems, Journal of the Royal Statistical Society B 50(2) (1988) 157–224.
[136] Lewis, D., Probabilities of Conditionals and Conditional Probabilities, Philosophical Review 85 (1976) 297–315.
[137] Lin, S. and B.W. Kernighan, An Effective Heuristic Algorithm for the Traveling Salesman Problem, Operations Research 21 (1973) 498–516.
[138] Liu, W. and A. Bundy, A Comprehensive Comparison between Generalized Incidence Calculus and the Dempster-Shafer Theory of Evidence, International Journal of Human-Computer Studies 40 (1994) 1009–1032.
[139] McLean, R.G., A. Bundy and W. Liu, Assignment Methods for Incidence Calculus, International Journal of Approximate Reasoning 12 (1995) 21–41.
[140] McLeish, M., A Note on Probabilistic Logic, Proceedings of the American Association for Artificial Intelligence Conference, St. Paul-Minneapolis, 1988.
[141] McLeish, M., Probabilistic Logic: Some Comments and Possible Use for Nonmonotonic Reasoning, in: J.F. Lemmer and L.N. Kanal (Eds.), Uncertainty in Artificial Intelligence 2, Amsterdam: North-Holland, 55–62, 1988.
[142] Marsten, R.E., The Design of the XMP Linear Programming Library, ACM Transactions on Mathematical Software 7(4) (1981) 481–497.
[143] Medolaghi, P., La logica matematica e il calcolo delle probabilità, Bollettino Associazione Italiani di Attuari 18 (1907).
[144] Nilsson, N.J., Probabilistic Logic, Artificial Intelligence 28(1) (1986) 71–87.
[145] Nilsson, N.J., Probabilistic Logic Revisited, Artificial Intelligence 59 (1993) 39–42.
[146] Paris, J.B. and A. Vencovska, On the Applicability of Maximum Entropy to Inexact Reasoning, International Journal of Approximate Reasoning 3 (1988) 1–34.
[147] Parker, K.P. and E.J. McCluskey, Analysis of Logic with Faults Using Input Signal Probabilities, IEEE Transactions on Computers C-24 (1975) 573–578.

[148] Parker, K.P. and E.J. McCluskey, Probabilistic Treatment of General Combinational Networks, IEEE Transactions on Computers C-24 (1975) 668–670.
[149] Pearl, J., How to Do with Probabilities what People Say you Can't, Proceedings of the Second Annual Conference on Artificial Intelligence Applications, December 11–13, Miami, Florida, 6–12, 1985.
[150] Pearl, J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, California, 1988.
[151] Prekopa, A., Boole-Bonferroni Inequalities and Linear Programming, Operations Research 36 (1988) 145–162.
[152] Prekopa, A., Sharp Bounds on Probabilities Using Linear Programming, Operations Research 38 (1990) 227–239.
[153] Prekopa, A., The Discrete Moment Problem and Linear Programming, Discrete Applied Mathematics 27 (1990) 235–254.
[154] Prekopa, A. and E. Boros, On the Existence of a Feasible Flow in a Stochastic Transportation Network, Operations Research 39 (1991) 119–129.
[155] Quinlan, J.R., Inferno: A Cautious Approach to Uncertain Inference, The Computer Journal 26 (1983) 255–269.
[156] Reichenbach, H., Philosophical Foundations of Quantum Mechanics, University of California Press, Berkeley, 1948.
[157] Reiter, S. and D.B. Rice, Discrete Optimizing Solution Procedures for Linear and Nonlinear Integer Programming Problems, Management Science 12 (1966) 829–850.
[158] Reeves, C.R. (Ed.), Modern Heuristic Techniques for Combinatorial Problems, London: Blackwell (1993).
[159] Rhys, J.M.W., A Selection Problem of Shared Fixed Costs and Network Flows, Management Science 17 (1970) 200–207.
[160] Schay, G., An Algebra of Conditional Events, Journal of Mathematical Analysis and Applications 24 (1968) 334–344.
[161] Selman, B., H. Levesque and D. Mitchell, A New Method for Solving Hard Satisfiability Problems, Proceedings of the Tenth National Conference on Artificial Intelligence, 1992, 440–446.

[162] Shafer, G., A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[163] Simeone, B., D. de Werra and M. Cochand, Combinatorial Properties and Recognition of Some Classes of Unimodular Functions, Discrete Applied Mathematics 29 (1990) 243–250.
[164] Stephanou, H.S. and A.P. Sage, Perspectives on Imperfect Information Processing, IEEE Transactions on Systems, Man and Cybernetics 17 (1987) 780–798.
[165] Suppes, P., Probabilistic Inference and the Concept of Total Evidence, in: J. Hintikka and P. Suppes (Eds.), Aspects of Inductive Logic, Amsterdam: North-Holland, 1966, 49–65.
[166] Tarjan, R.E. and M. Yannakakis, Simple Linear-Time Algorithms to Test Chordality of Graphs, Test Acyclicity of Hypergraphs and Selectively Reduce Acyclic Hypergraphs, SIAM Journal on Computing 13 (1984) 566–579.
[167] Van der Gaag, L.C., Probability-Based Models for Plausible Reasoning, Ph.D. Thesis, University of Amsterdam, 1990.
[168] Van der Gaag, L.C., Computing Probability Intervals Under Independency Constraints, in: P.P. Bonissone, M. Henrion, L.N. Kanal and J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence 6 (1991) 457–466.
[169] Van Laarhoven, P.J.M. and E.H.L. Aarts, Simulated Annealing: Theory and Applications, Dordrecht: Kluwer, 1988.
[170] Walley, P., Statistical Reasoning with Imprecise Probabilities, Chapman and Hall, Melbourne, 1991.
[171] Wilbraham, H., On the Theory of Chances Developed in Professor Boole's "Laws of Thought", The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 4(7) (1854) 465–476.
[172] Zadeh, L.A., Fuzzy Sets, Information and Control 8 (1965) 338–353.
[173] Zadeh, L.A., Fuzzy Sets as a Basis for a Theory of Possibility, Fuzzy Sets and Systems 1 (1978) 3–28.

[174] Zadeh, L.A., Is Probability Theory Sufficient for Dealing with Uncertainty in AI: A Negative View, in: L.N. Kanal and J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence 4, North-Holland, 1986, 103–106.
[175] Zemel, E., Polynomial Algorithms for Estimating Network Reliability, Networks 12 (1982) 439–452.
