Tractable Problems in Bayesian Networks

Shichao Zhang
School of Computing, National University of Singapore
55 Science Drive 2, Singapore 117599
[email protected]

Chengqi Zhang
School of Computing and Mathematics, Deakin University
Geelong, Vic 3217, Australia
[email protected]

Abstract

The complexity of probabilistic reasoning with Bayesian networks has been proven to be NP-hard, so reducing the complexity of such networks is an active issue in uncertainty reasoning. Much work has been suggested, such as compressing the probabilistic information of Bayesian networks and optimal approximation algorithms for Bayesian inference. In this paper, we study a class of tractable problems in Bayesian networks, aiming at reducing their computing complexity from non-polynomial to polynomial. The main challenge for tractable problems in Bayesian networks is the propagation of probabilities. To solve this problem, we construct a new model that integrates a statistical technique into Bayesian networks using an encoding technique. For effectiveness, an optimization of the conditional probability matrix is also built. We evaluated the proposed technique, and our experimental results show that the approach is efficient and promising.

Keywords: Probabilistic reasoning, Bayesian networks, encoding technique, belief networks, approximate reasoning.

1 Introduction

The complexity of probabilistic reasoning with Bayesian networks has been proven to be NP-hard [3], which has generally prevented problem formulations from utilising the full representational capabilities of Bayesian networks. In order to reduce the complexity of such networks, Santos [16] suggested a fast, efficient, and simple approach of compressing the probabilistic information in Bayesian networks by using an encoding technique. In this paper, we study a class of tractable problems in Bayesian networks, aiming at reducing their computing complexity from non-polynomial to polynomial. The main challenge for tractable problems in Bayesian networks is the propagation of probabilities. To solve this problem, we construct a new model that integrates a statistical technique into Bayesian networks using an encoding technique. We evaluated the proposed technique, and our experimental results show that the approach is efficient and promising.

1.1 Motivation

In Bayesian networks, the propagation of probabilities is based on matrices, so the complexity of such networks is non-polynomial. However, in our opinion, many nodes in Bayesian networks can be linearized. The linearized nodes are called tractable problems. The main motivation of this paper is to propose a model for tackling such tractable problems. We now use an example to illustrate tractable problems.

Example 1 (cited from [11, pages 151-153]): In a certain trial there are three suspects, one of whom has definitely committed a murder. The murder weapon, showing some fingerprints, was later found by police. Let X identify the last user of the weapon, namely the killer. Let Y identify the last holder of the weapon, i.e., the person whose fingerprints were left on the weapon, and let Z represent the possible readings that may be obtained in a fingerprint laboratory. The relations between these three variables would normally be expressed by the chain X → Y → Z: X generates expectations about Y, and Y generates expectations about Z, but X has no influence on Z once we know the value of Y. Let suspect1, suspect2, suspect3 be the three suspects. To represent the common-sense knowledge (X → Y) that the killer is normally the last person to hold the weapon, we use a 3 × 3 conditional probability matrix:

  M_{Y|X} = | 0.8  0.1  0.1 |
            | 0.1  0.8  0.1 |
            | 0.1  0.1  0.8 |

Now, let an evidence be x = (0.7, 0.1, 0.2), i.e., 0.7 is the probability that suspect1 is the last user of the weapon, 0.1 is the probability that suspect2 is the last user of the weapon, and 0.2 is the probability that suspect3 is the last user of the weapon. Then we have

  [0.7  0.1  0.2] · | 0.8  0.1  0.1 |
                    | 0.1  0.8  0.1 |  =  [0.59  0.17  0.24].
                    | 0.1  0.1  0.8 |

So y = [0.59, 0.17, 0.24] is the result we want: 0.59 is the probability that suspect1 is the last holder of the weapon, 0.17 is the probability that suspect2 is the last holder of the weapon, and 0.24 is the probability that suspect3 is the last holder of the weapon.

From this example we can see that the amount of probabilistic information needed for the computations in belief networks is overwhelming. Actually, there is a linear causality between the two variables X and Y in this example. However, in belief networks this linear causality is described with M_{Y|X}, and then the complexity of such networks is non-polynomial. Apparently, if it can be handled with a statistical technique, the computation may be compressed from non-polynomial to polynomial. We now demonstrate the argument with the example. For Example 1, let some random values of X and the corresponding values of Y obtained with the above propagating model be listed as follows:

  p(x1)  p(x2)  p(x3)  |  p(y1)  p(y2)  p(y3)
  0.9    0.1    0      |  0.73   0.17   0.1
  0      1      0      |  0.1    0.8    0.1
  0      0.7    0.3    |  0.1    0.59   0.31
  0.6    0      0.4    |  0.52   0.1    0.38
  0.1    0.1    0.8    |  0.17   0.17   0.66

Intuitively, there is no direct linear relation between (p(x1), p(x2), p(x3)) and (p(y1), p(y2), p(y3)) in the above table. Now we encode them as

  E_X(x) = p(x1)·10² + p(x2)·10⁴ + p(x3)·10⁶,
  E_Y(y) = p(y1)·10² + p(y2)·10⁴ + p(y3)·10⁶.

Then the above table can be transformed into a table of the form:

  E_X(x)   E_Y(y)
  1090     101773
  10000    108010
  307000   315910
  400060   381052
  801010   661717

Now we take E_X(x) and E_Y(y) as ordered pairs of the form (E_X(x), E_Y(y)). The points (1090, 101773), (10000, 108010), (307000, 315910), (400060, 381052) and (801010, 661717) can be fitted by a line. In fact, they are fitted by a line of the form

  E_Y(y) = 0.7·E_X(x) + 101010.

Once this formula is determined, then for any x we can determine E_Y(y) from it. Further, we can determine p(y1), p(y2), p(y3) by E_Y^{-1}(y), where E_Y^{-1}(y) is the inverse function of E_Y(y). For example, let x = (0.1, 0.3, 0.6); then E_X(x) = 603010. By the above formula, E_Y(y) = 523117. So, using E_Y^{-1}(y), p(y1) = 0.17, p(y2) = 0.31, p(y3) = 0.52. In fact, this result is equal to the result obtained with Pearl's inference model for the observation. Hence, we can replace M_{Y|X} with the above linear function.
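This construction is easy to check numerically. The following is a minimal Python sketch (helper names such as propagate, encode and decode are our own, not the paper's implementation) that reproduces the line E_Y(y) = 0.7·E_X(x) + 101010 of Example 1 with d = 2 digits per probability:

  # Quick numerical check of Example 1 (our own helper names; d = 2).
  M = [[0.8, 0.1, 0.1],
       [0.1, 0.8, 0.1],
       [0.1, 0.1, 0.8]]                      # M_{Y|X}

  def propagate(x, M):
      # propagating model for a single link: y_j = sum_i x_i * M[i][j]
      return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

  def encode(p, d=2):
      # E(p) = p_1*10^d + p_2*10^(2d) + p_3*10^(3d)
      return sum(round(v * 10**d) * 10**(i * d) for i, v in enumerate(p))

  def decode(code, n=3, d=2):
      # read the probabilities back, d digits at a time
      return [(code // 10**(i * d)) % 10**d / 10**d for i in range(n)]

  for x in [(0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.7, 0.3),
            (0.6, 0.0, 0.4), (0.1, 0.1, 0.8)]:
      ex, ey = encode(x), encode(propagate(x, M))
      assert ey == round(0.7 * ex + 101010)   # every sample lies on the fitted line

  x = (0.1, 0.3, 0.6)
  print(decode(round(0.7 * encode(x) + 101010)))   # [0.17, 0.31, 0.52] = propagate(x, M)

With d = 2 each probability is kept to two digits, so the encoding packs a whole distribution into one integer and the fitted line replaces the matrix multiplication.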

In this paper, we will establish an approximation function, similar to the above formula, that approximates the propagating model in Bayesian networks, using statistical techniques with better encoders. More formally, for a rule X → Y (or X is a direct parent of Y) in a given belief network, let the domain of X be R(X) = {x1, x2, ..., xm}, the domain of Y be R(Y) = {y1, y2, ..., yn}, and the conditional probability matrix be M_{Y|X}. Suppose s = (p(x1), p(x2), ..., p(xm)) is an observation, P(Y_P) = (p(y1), p(y2), ..., p(yn)) is the result obtained with the propagating model in Bayesian networks, and P(Y_Z) = (p(y1'), p(y2'), ..., p(yn')) is the result obtained with the approximation function in this paper. Our goal is to construct the approximating solution P(Y_Z) for tractable problems using the approximation function such that

  ||P(Y_Z) − P(Y_P)|| ≤ ε,

where

  ||P(Y_Z) − P(Y_P)|| = ||p(y1') − p(y1)|| + ||p(y2') − p(y2)|| + ... + ||p(yn') − p(yn)||

and ε > 0 is small enough. Once the approximation function for propagating probabilities is constructed for the above rule, we can replace M_{Y|X} with this function. Certainly, not only is the reasoning efficiency enhanced, but the amount of probabilistic information is also compressed to an acceptable level.

1.2 Related Work

Bayesian networks (or belief networks), one of the most popular models for probabilistic reasoning, have been widely accepted as a suitable, general and natural knowledge representation framework for reasoning and decision making under uncertainty. They have been successfully applied to such diverse areas as medical diagnosis [19], diagnosis of bottlenecks in computer systems [2], circuit fault detection [6, 11], planning systems [9], fraud detection [8], and advisory and control systems for colon endoscopy [10]. However, the computing complexity of reasoning with general belief networks has been proven to be NP-hard [3, 4, 18]. Recently, many researchers have attempted to improve the probabilistic reasoning model for belief networks. Some of this work concentrates on the conditional probability table size: probabilistic information can be compressed into an approximation function [16], and table-size problems have been addressed with independence-based assignments [15, 17] and "Noisy-OR" models [11, 12, 13, 20]. Other work focuses on perfecting the model, such as an optimal approximation algorithm for Bayesian inference [5], the creation of a hidden node [10], a search algorithm for estimating posterior probabilities in Bayesian networks [14], local conditioning in Bayesian networks [7], and optimization of Pearl's method of conditioning and greedy-like approximation algorithms for the vertex feedback set problem [1]. Some general studies can be found in [14, 17, 18, 21, 22]. Our work in this paper concentrates on perfecting the reasoning model with Bayesian networks.

1.3 Organization

This paper is organised as follows. In Section 2, we give several definitions needed. In Section 3, we present an optimization for the probability matrix. In Section 4, we establish an encoder technique for regression models employed in Bayesian networks. In Section 5, we propose a linear propagating model using the encoding technique. A summary of this paper is given in the last section.

2 Basic Definitions

In this paper, upper case letters will represent random variables and lower case letters will represent the possible assignments to the associated upper case random variables. (V, P) will represent a Bayesian network, where V is the set of random variables and P is a set of conditional probabilities associated with the network. P(A = a | C1 = c1, ..., Cn = cn) ∈ P iff C1, ..., Cn are all the immediate parents of A and there is an edge from Ci to A, i = 1, 2, ..., n, in the network.

Definition 1 Given a random variable A, the set of possible values of A, known as the range of A, will be denoted by R(A). For x ∈ R(A), x is a point value of A.

Definition 2 Given a random variable A, all point values of A can be arranged into a vector (x1, x2, ..., xn). Each state of A can be described by its point values together with their probabilities, (P(x1) = a1, P(x2) = a2, ..., P(xn) = an), written as (a1, a2, ..., an). All states of A constitute the state space of A, denoted by S(A). For a random finite sample space Ω(A) drawn from S(A), let Ω(A) have ℓ elements; ℓ is called the capacity of Ω(A), denoted by |Ω(A)|.

Definition 3 Given a random variable A ∈ V, let ...

To estimate the coefficients k1 and k0 of the approximation function F(a) = k1·E_X(a) + k0 over a sample space Ω(X), the sum of squared errors

  f(k1, k0) = Σ_{a∈Ω(X)} (k1·E_X(a) + k0 − E_Y(aM_{Y|X}))²

is minimized. Setting its partial derivatives to zero gives

  ∂f/∂k1 = 2 Σ_{a∈Ω(X)} ((k1·E_X(a) + k0 − E_Y(aM_{Y|X}))·E_X(a)) = 0,
  ∂f/∂k0 = 2 Σ_{a∈Ω(X)} (k1·E_X(a) + k0 − E_Y(aM_{Y|X})) = 0.

Or,

  k1 Σ_{a∈Ω(X)} E_X²(a) + k0 Σ_{a∈Ω(X)} E_X(a) − Σ_{a∈Ω(X)} (E_Y(aM_{Y|X})·E_X(a)) = 0,
  k1 Σ_{a∈Ω(X)} E_X(a) + k0·|Ω(X)| − Σ_{a∈Ω(X)} E_Y(aM_{Y|X}) = 0.

So, we can estimate k1 and k0 by solving the above equation group as follows:

  k1 = (ω1 − ω2)/(ω3 − ω4),
  k0 = (1/|Ω(X)|)·(ω5 − ω6),

where

  ω1 = (Σ_{a∈Ω(X)} E_X(a)) · (Σ_{a∈Ω(X)} E_Y(aM_{Y|X})),
  ω2 = |Ω(X)| · Σ_{a∈Ω(X)} (E_Y(aM_{Y|X})·E_X(a)),
  ω3 = (Σ_{a∈Ω(X)} E_X(a))²,
  ω4 = |Ω(X)| · Σ_{a∈Ω(X)} E_X²(a),
  ω5 = Σ_{a∈Ω(X)} E_Y(aM_{Y|X}),
  ω6 = k1 · Σ_{a∈Ω(X)} E_X(a).

Then the formula

  F(a) = k1·E_X(a) + k0

is the approximation function of (1) for propagating probabilities. For an observation a, F(a) can be obtained from this formula, and then we can solve b1, b2, ..., bm from F(a). That is,

  bi = (INT(F(a)/10^{(i−1)d}) − INT(F(a)/10^{id})·10^d)/10^d,  i = 1, 2, ..., m,

where INT() is the integer-part function. In order to assure the probability significance of the results, the final results are

  b1 := max{0, 1 − (b2 + b3 + ... + bm)},
  bi := bi/(b1 + b2 + ... + bm),  i = 1, 2, ..., m.
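The estimation of k1 and k0 and the recovery of b1, ..., bm can be sketched in Python as follows (a sketch with our own helper names; the sample Ω(X) is taken to be the five observations of Example 1, with d = 2):

  # Sketch of the least-squares estimate and the decoding step (our own code).
  M = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
  samples = [(0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.0, 0.7, 0.3),
             (0.6, 0.0, 0.4), (0.1, 0.1, 0.8)]

  def encode(p, d=2):
      return sum(round(v * 10**d) * 10**(i * d) for i, v in enumerate(p))

  ex = [encode(a) for a in samples]                                   # E_X(a)
  ey = [encode([sum(a[i] * M[i][j] for i in range(3)) for j in range(3)])
        for a in samples]                                             # E_Y(a M_{Y|X})

  l = len(samples)                      # |Omega(X)|
  w1 = sum(ex) * sum(ey)
  w2 = l * sum(x * y for x, y in zip(ex, ey))
  w3 = sum(ex) ** 2
  w4 = l * sum(x * x for x in ex)
  k1 = (w1 - w2) / (w3 - w4)
  k0 = (sum(ey) - k1 * sum(ex)) / l     # = (1/|Omega(X)|) * (w5 - w6)
  print(round(k1, 4), round(k0))        # 0.7  101010

  def decode(F, m, d=2):
      # b_i = (INT(F/10^((i-1)d)) - INT(F/10^(id)) * 10^d) / 10^d
      return [(int(F) // 10**((i - 1) * d) - (int(F) // 10**(i * d)) * 10**d) / 10**d
              for i in range(1, m + 1)]

  def normalize(b):
      # b_1 := max{0, 1 - (b_2 + ... + b_m)}, then b_i := b_i / sum(b)
      b = [max(0.0, 1.0 - sum(b[1:]))] + list(b[1:])
      return [v / sum(b) for v in b]

  F = k1 * encode((0.1, 0.3, 0.6)) + k0
  print(normalize(decode(round(F), 3)))  # approximately [0.17, 0.31, 0.52]

With these data the estimate is k1 ≈ 0.7 and k0 ≈ 101010, and the decoded, normalized result for the observation (0.1, 0.3, 0.6) agrees with the propagating model.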

Note that the value of b1 must first be modified in the above method because there are distortion factors (errors of approximation or of calculation) in F(a), k1 and k0. The errors of such an approximation usually influence the value of b1. Certainly, the errors may influence the values of the other points of Y in some problems, but if d is proper and the sample space is large enough, the error can be controlled to influence only b1. In other words, we can decrease the influencing range of the error by using some techniques. For simplicity, we consider the distortion factors only on the value of b1 in this paper. A simple example now demonstrates the use of the above approximation function.

Example 2 For a rule X → Y, let d = 2 and

  M_{Y|X} = | 0.1  0.8  0.1 |
            | 0.1  0.1  0.8 |
            | 0.8  0.1  0.1 |

For this matrix, the encoder of X needs to take into account the increasing order of the encoder of Y, because encoders defined directly by the method in Subsection 4.1 are not a good choice. In fact, according to the encoder method in Subsection 4.1, the minimum and the maximum of E_Y are 101080 and 801010, respectively. The minimum of E_Y corresponds to the probabilities of the point values of Y

  p(y1) = 0.8, p(y2) = 0.1, p(y3) = 0.1,

which arise from the state of X

  p(x1) = 0, p(x2) = 0, p(x3) = 1.

The maximum of E_Y corresponds to the probabilities of the point values of Y

  p(y1) = 0.1, p(y2) = 0.1, p(y3) = 0.8,

which arise from the state of X

  p(x1) = 0, p(x2) = 1, p(x3) = 0.

So these encoders are not a good choice for estimating the approximation function of the rule. Certainly, if the encoder E_X is determined with respect to the increasing order of the encoder of Y (that is, the encodings of the states (0, 0, 1) and (0, 1, 0) are the minimum and the maximum of the encoder of X, respectively), then the encoders are more suitable for estimating the approximation function. In order to realize this encoder, we re-arrange the order of the point values as

  x3, x1, x2

and rename them as

  z1 = x3, z2 = x1, z3 = x2.

Then the state space is S(X) = {(p(z1) = a1, p(z2) = a2, p(z3) = a3) | a1 + a2 + a3 = 1}, and the encoder is

  E_X(a) = E_X(a1, a2, a3) = 10^d·a1 + 10^{2d}·a2 + 10^{3d}·a3,

or

  E_X(a) = 10^d·p(x3) + 10^{2d}·p(x1) + 10^{3d}·p(x2),

and

  E_Y(b) = 10^d·p(y1) + 10^{2d}·p(y2) + 10^{3d}·p(y3).


Now we can solve the approximation function with the above encoder as follows:

  F(a) = 0.7·E_X(a) + 101010.

Given an observation a = (p(x1) = 0.2, p(x2) = 0.1, p(x3) = 0.7), the probabilities of the point values of Y_P can be obtained by using the propagating model in Bayesian networks:

  p(y1) = 0.59, p(y2) = 0.24, p(y3) = 0.17.

The corresponding state of this observation is (0.7, 0.2, 0.1) and its encoding is E_X(a) = 102070. Substituting it into the above approximation function, we have

  F(a) = 172459.

Using E_Y^{-1}(y), we can obtain the probabilities of Y_Z from F(a) as follows:

  p(y1) = 0.59, p(y2) = 0.24, p(y3) = 0.17.

So we have

  ||P(Y_Z) − P(Y_P)|| = ||p_Z(y1) − p_P(y1)|| + ||p_Z(y2) − p_P(y2)|| + ||p_Z(y3) − p_P(y3)|| = 0.

Generally, for some random observations on (p(x1), p(x2), p(x3)), the values of E_{Y_P} and E_{Y_Z} are listed as follows:

  p(x1)  p(x2)  p(x3)  |  E_X(a)   F(a)    |  E_{Y_Z}   E_{Y_P}
  0.2    0      0.8    |  2080     102466  |  102466    102466
  0      0.1    0.9    |  100090   171073  |  171073    171073
  0.2    0.1    0.7    |  102070   172459  |  172459    172459
  0.6    0.3    0.1    |  306010   315217  |  315217    315217
  0.1    0.7    0.2    |  701020   591724  |  591724    591724

This example shows that if the causality of a rule of the form X → Y is a tractable problem, then the causality can be perfectly fitted by the above F(a).

From these simple examples we can see that the above approximation functions can fit the propagating model in Bayesian networks. The key is to construct appropriate encoders. Hence, if the conditional probability matrix of a node in a given belief network has a potential linear relationship, we may replace it with the above approximation function.
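The reordering in Example 2 can be realized, for instance, by sorting X's point values by the encoding of their corresponding rows of M_{Y|X} (each row is the Y-distribution obtained from the degenerate state of X). This is our own reading of the construction, sketched below with assumed helper names:

  # Sketch of the encoder reordering of Example 2 (our own code).
  M = [[0.1, 0.8, 0.1],
       [0.1, 0.1, 0.8],
       [0.8, 0.1, 0.1]]                      # M_{Y|X} of Example 2

  def encode(p, d=2):
      return sum(round(v * 10**d) * 10**(i * d) for i, v in enumerate(p))

  # order X's point values by E_Y of the row they select
  rows = sorted(range(len(M)), key=lambda i: encode(M[i]))   # -> [2, 0, 1], i.e. x3, x1, x2

  def encode_x(p, d=2):
      return encode([p[i] for i in rows], d)                 # E_X over the reordered values

  a = (0.2, 0.1, 0.7)
  print(encode_x(a))                          # 102070
  F = round(0.7 * encode_x(a) + 101010)       # 172459
  print([(F // 10**(2 * i)) % 100 / 100 for i in range(3)])  # [0.59, 0.24, 0.17]

With the reordered E_X and the plain E_Y, the five observations in the table above all fall exactly on the line F(a) = 0.7·E_X(a) + 101010.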

5 Optimization of Matrix

We have seen that the complexity of Bayesian networks described with M_{Y|X} is non-polynomial. Actually, there are some tractable problems in applications. If they can be solved with regression models, the computation may be compressed from non-polynomial to polynomial. In Section 1, we have shown that tractable problems can be linearized using encoding techniques. However, there is sometimes much useless (unnecessary) information in a given conditional probability matrix. For effectiveness, the unnecessary information should be cleaned before constructing linear functions for the tractable problems. For example, let the domains of Education and Salary be {Doctor, Master, Bachelor, UnderBachelor} and {[3500, +∞), [2400, 3500), [0, 2400)}, respectively, and let X and Y stand for Education and Salary, respectively. The conditional probability matrix is as follows.


  M_{Y|X} = | 0.9   0.09  0.01 |
            | 0.31  0.31  0.38 |
            | 0.24  0.38  0.38 |
            | 0.2   0.4   0.4  |

where rows 2, 3 and 4 are useless for applications because each probability in these rows is less than a minimum probability (minprob > 0) specified by users. In this section, we present a method of merging the unnecessary information in such matrices. Certainly, for the above matrix, if X → Y with M_{Y|X} is taken as a causal rule, the unnecessary information in the rows p(Y|X = Master) = (0.31, 0.31, 0.38), p(Y|X = Bachelor) = (0.24, 0.38, 0.38) and p(Y|X = UnderBachelor) = (0.2, 0.4, 0.4) should be minimized as far as possible. So M_{Y|X} is expected to be refined as follows:

  M'_{Y|X} = | 0.9   0.09   0.01  |
             | 0.25  0.363  0.387 |

where the last three rows are reduced into one row. Or an optimal matrix is

  M''_{Y|X} = | 0.9   0.05  |
              | 0.25  0.375 |

where the last two columns are reduced into one column. However, merging unnecessary columns can cause the sum of the probabilities in a row to differ from 1, while in probability theory each row of this matrix is generally required to sum to 1. According to this requirement, the first row of the matrix is changed to (0.9/0.95, 0.05/0.95) = (0.947, 0.053), and similarly the second row to (0.25/0.625, 0.375/0.625) = (0.4, 0.6). The matrix becomes

  M'''_{Y|X} = | 0.947  0.053 |
               | 0.4    0.6   |

Apparently, this model of merging useless information is extremely useful for optimizing the knowledge in intelligent systems. The arithmetic behind M', M'' and M''' is written out below, and we then present a general method to reduce such unnecessary information in matrices.
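For concreteness, the merged and renormalized entries above are obtained as follows (our own write-up of the arithmetic, not taken from the paper):

\[
\begin{aligned}
\text{merged row of } M'_{Y|X}:\quad
& \left(\tfrac{0.31+0.24+0.2}{3},\ \tfrac{0.31+0.38+0.4}{3},\ \tfrac{0.38+0.38+0.4}{3}\right) = (0.25,\ 0.363,\ 0.387),\\
\text{merged column of } M''_{Y|X}:\quad
& \left(\tfrac{0.09+0.01}{2},\ \tfrac{0.363+0.387}{2}\right) = (0.05,\ 0.375),\\
\text{rows of } M'''_{Y|X}:\quad
& \tfrac{1}{0.9+0.05}\,(0.9,\ 0.05) = (0.947,\ 0.053),\qquad
  \tfrac{1}{0.25+0.375}\,(0.25,\ 0.375) = (0.4,\ 0.6).
\end{aligned}
\]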

5.1 Merging Unnecessary Information

For a conditional probability matrix, if the probabilities of a row (or a column) satisfy p(Y = yi | X = xj) < minprob for j = j0 and i = 1, 2, ..., n (or for i = i0 and j = 1, 2, ..., m), then this row (or column) is called unnecessary information in the conditional probability matrix. For example, let the domains of Education and Salary be

  {PostDoctor, Doctor, PostMaster, Master, Bachelor, UnderBachelor}

and

  {[3500, +∞), [2400, 3500), [0, 2400)},

respectively, and let X and Y stand for Education and Salary, respectively. The statistical results from a data set are as follows.

Table 1 Statistical results of interest data


  Education       Salary          Number
  PostDoctor      [3500, +∞)      9000
                  [2100, 3500)    900
                  [0, 2100)       100
  Doctor          [3500, +∞)      8000
                  [2100, 3500)    1900
                  [0, 2100)       100
  PostMaster      [3500, +∞)      1000
                  [2100, 3500)    7500
                  [0, 2100)       1500
  Master          [3500, +∞)      3100
                  [2100, 3500)    3100
                  [0, 2100)       3800
  Bachelor        [3500, +∞)      2400
                  [2100, 3500)    3800
                  [0, 2100)       3800
  UnderBachelor   [3500, +∞)      2000
                  [2100, 3500)    4000
                  [0, 2100)       4000

According to Bayesian networks, X → Y holds with the conditional probability matrix M¹_{Y|X} as follows:

  M¹_{Y|X} = | 0.9   0.09   0.01  |
             | 0.8   0.19   0.01  |
             | 0.1   0.75   0.15  |
             | 0.31  0.31   0.38  |
             | 0.24  0.38   0.38  |
             | 0.2   0.4    0.4   |

In this conditional probability matrix, if we let minprob = 0.6, then p(Y = yi | X = Master) < minprob, p(Y = yi | X = Bachelor) < minprob and p(Y = yi | X = UnderBachelor) < minprob for all i. That is, rows 4, 5 and 6 are certainly unnecessary information. In fact, when the given evidences are (0, 0, 0, 1, 0, 0), (0, 0, 0, 0, 1, 0) and (0, 0, 0, 0, 0, 1), the reasoning results (0.31, 0.31, 0.38), (0.24, 0.38, 0.38) and (0.2, 0.4, 0.4) cannot be useful to applications. The problem of merging such unnecessary information can be formally described as follows. Let X → Y be an extracted causal rule with conditional probability matrix M_{Y|X}, let the domain of X be R(X) = {x1, x2, ..., xm} and the domain of Y be R(Y) = {y1, y2, ..., yn}.

(1) Find all columns i1, i2, ..., is such that each column ik satisfies p(y_{ik}|xj) < minprob for j = 1, 2, ..., m and k = 1, 2, ..., s.
(2) Merge all columns i1, i2, ..., is into column i1 if s > 1, and delete columns i2, ..., is from M_{Y|X}.
(3) Find all rows i1, i2, ..., it such that each row ik satisfies p(yj|x_{ik}) < minprob for j = 1, 2, ..., n and k = 1, 2, ..., t.
(4) Merge all rows i1, i2, ..., it into row i1 if t > 1, and delete rows i2, ..., it from M_{Y|X}.
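As a quick illustration (our own code, with assumed variable names), M¹_{Y|X} above is just the row-normalized count table, and criterion (3) picks out its last three rows:

  # Build M^1_{Y|X} from the counts in Table 1 and flag unnecessary rows (our sketch).
  counts = {
      "PostDoctor":    [9000,  900,  100],
      "Doctor":        [8000, 1900,  100],
      "PostMaster":    [1000, 7500, 1500],
      "Master":        [3100, 3100, 3800],
      "Bachelor":      [2400, 3800, 3800],
      "UnderBachelor": [2000, 4000, 4000],
  }
  minprob = 0.6
  M = {x: [c / sum(row) for c in row] for x, row in counts.items()}   # row-normalize
  unnecessary = [x for x, row in M.items() if all(p < minprob for p in row)]
  print(M["Master"])      # [0.31, 0.31, 0.38]
  print(unnecessary)      # ['Master', 'Bachelor', 'UnderBachelor']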

Procedure 1 RefineRules(NewCRSET, RSET: sets of rules)
Input: NewCRSET: the set of original rules
Output: RSET: the set of optimized rules

(1) begin
      let RSET ← ∅;
      for each rule X → Y with M_{Y|X} in NewCRSET do begin
(2)     let col ← ∅;
        for each column i of M_{Y|X} do
          if p(yi|xj) < minprob for all j := 1 to m then let col ← col ∪ {i};
(3)     if |col| > 1 then begin
          for j := 1 to m do let pj ← 0;
          for each i ∈ col do
            for j := 1 to m do let pj ← pj + p(yi|xj);
          for j := 1 to m do let pj ← pj / |col|;
(4)       for each i ∈ col do delete column i from matrix M_{Y|X};
          add (p1, p2, ..., pm) as a new column of M_{Y|X};
        end
(5)     let r ← ∅;
        for each row i of M_{Y|X} do
          if p(yj|xi) < minprob for all j := 1 to n then let r ← r ∪ {i};
        if |r| > 1 then begin
          for j := 1 to n do let pj ← 0;
          for each i ∈ r do
            for j := 1 to n do let pj ← pj + p(yj|xi);
          for j := 1 to n do let pj ← pj / |r|;
(6)       for each i ∈ r do delete row i from matrix M_{Y|X};
          add (p1, p2, ..., pn) as a new row of M_{Y|X};
        end
        let RSET ← RSET ∪ {the optimized rule X → Y with M_{Y|X}};
      end
    end
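For reference, a small runnable sketch of the same merging step on a single matrix is given below (our own code, not the paper's implementation; it appends the merged column or row at the end rather than in the position of the first merged one):

  # Merge columns/rows whose entries are all below minprob (our sketch of Procedure 1).
  def merge_unnecessary(M, minprob):
      m, n = len(M), len(M[0])
      # columns with all entries below minprob are averaged into one column
      cols = [i for i in range(n) if all(M[j][i] < minprob for j in range(m))]
      if len(cols) > 1:
          merged = [sum(M[j][i] for i in cols) / len(cols) for j in range(m)]
          M = [[M[j][i] for i in range(n) if i not in cols] + [merged[j]]
               for j in range(m)]
          n = len(M[0])
      # rows with all entries below minprob are averaged into one row
      rows = [j for j in range(m) if all(p < minprob for p in M[j])]
      if len(rows) > 1:
          merged = [sum(M[j][i] for j in rows) / len(rows) for i in range(n)]
          M = [M[j] for j in range(m) if j not in rows] + [merged]
      return M

  M1 = [[0.9, 0.09, 0.01], [0.8, 0.19, 0.01], [0.1, 0.75, 0.15],
        [0.31, 0.31, 0.38], [0.24, 0.38, 0.38], [0.2, 0.4, 0.4]]
  print(merge_unnecessary(M1, 0.6))
  # the last three rows collapse into (0.25, 0.363, 0.387), giving M^2_{Y|X} below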

We now demonstrate the use of this model with the above rule. Actually, when X = Master, X = Bachelor, or X = UnderBachelor, we cannot determine which salary the person is likely to earn. In order to reduce this unnecessary information, we can merge these three quantitative items into one quantitative item M&U. Hence, the above data can be reduced as follows.


Table 2 Statistical results of interest data

  Education    Salary          Number
  PostDoctor   [3500, +∞)      9000
               [2100, 3500)    900
               [0, 2100)       100
  Doctor       [3500, +∞)      8000
               [2100, 3500)    1900
               [0, 2100)       100
  PostMaster   [3500, +∞)      1000
               [2100, 3500)    7500
               [0, 2100)       1500
  M&U          [3500, +∞)      7500
               [2100, 3500)    10900
               [0, 2100)       11600

Hence, the causal rule X → Y holds with the conditional probability matrix M²_{Y|X} as follows, where the domain of X is R(X) = {PostDoctor, Doctor, PostMaster, M&U} and the domain of Y is R(Y) = {[3500, +∞), [2100, 3500), [0, 2100)}:

  M²_{Y|X} = | 0.9   0.09   0.01  |
             | 0.8   0.19   0.01  |
             | 0.1   0.75   0.15  |
             | 0.25  0.363  0.387 |

Theorem 3 The merge of row unnecessary information is reasonable.

Proof: In the above merge, rows i1, i2, ..., it are called unnecessary information if all of them satisfy p(yj|x_{ik}) < minprob for j = 1, 2, ..., n and k = 1, 2, ..., t. Rows i1, i2, ..., it are all merged into row i1 with p(yj|x_{i1}) = (p(yj|x_{i1}) + p(yj|x_{i2}) + ... + p(yj|x_{it}))/t for j = 1, 2, ..., n. Certainly,

  Σ_{j=1}^{n} p(yj|x_{i1}) = (p(y1|x_{i1}) + p(y1|x_{i2}) + ... + p(y1|x_{it}))/t
                           + (p(y2|x_{i1}) + p(y2|x_{i2}) + ... + p(y2|x_{it}))/t
                           + ...
                           + (p(yn|x_{i1}) + p(yn|x_{i2}) + ... + p(yn|x_{it}))/t
                           = (1 + 1 + ... + 1)/t = 1.

Hence, the merge of row unnecessary information is reasonable. □

5.2 Merging Items with Identical Properties


We have seen that merging rows of a given matrix can improve the probability matrix. On the other hand, when X = PostDoctor or X = Doctor, we can determine that his/her salary is in [3500, +∞) with high confidence. Such quantitative items are called items with identical properties. For the same reason as for reducing unnecessary information, we can merge these two quantitative items into one quantitative item P&D. Hence, the above data can be reduced as follows.


Table 3 Statistical results of interest data

  Education   Salary          Number
  P&D         [3500, +∞)      17000
              [2100, 3500)    2800
              [0, 2100)       200
  PostMaster  [3500, +∞)      1000
              [2100, 3500)    7500
              [0, 2100)       1500
  M&U         [3500, +∞)      7500
              [2100, 3500)    10900
              [0, 2100)       11600

Hence, the causal rule X → Y holds with the conditional probability matrix M³_{Y|X} as follows, where the domain of X is R(X) = {P&D, PostMaster, M&U} and the domain of Y is R(Y) = {[3500, +∞), [2100, 3500), [0, 2100)}:

  M³_{Y|X} = | 0.85  0.14   0.01  |
             | 0.1   0.75   0.15  |
             | 0.25  0.363  0.387 |

Also, the problem of merging such quantitative items can be formally described as follows. Let X → Y be an extracted causal rule with conditional probability matrix M_{Y|X}, let the domain of X be R(X) = {x1, x2, ..., xm} and the domain of Y be R(Y) = {y1, y2, ..., yn}.

(1) Find all columns i1, i2, ..., is such that each column ik satisfies p(y_{ik}|xj) ≥ minprob at some j (1 ≤ j ≤ m), k = 1, 2, ..., s.
(2) Merge all columns i1, i2, ..., is into column i1 if s > 1, and delete columns i2, ..., is from M_{Y|X}.
(3) Find all rows i1, i2, ..., it such that each row ik satisfies p(yj|x_{ik}) ≥ minprob at some j (1 ≤ j ≤ n), k = 1, 2, ..., t.
(4) Merge all rows i1, i2, ..., it into row i1 if t > 1, and delete rows i2, ..., it from M_{Y|X}.

The algorithm for merging quantitative items with identical properties is similar to the above procedure for reducing unnecessary information, so we omit it here.

Theorem 4 The merge of column unnecessary information is reasonable.

Proof: The proof is similar to that of Theorem 3. □

Previously, we illustrated that such represented probabilistic dependencies are useful in reasoning and decision making under uncertainty. This merit is retained in our polished rules. When such rules are applied, the probabilities of the merged point values are added together as the probability of the new point value that substitutes for them. For example, let an evidence for M¹_{Y|X} be (0.7, 0.1, 0.08, 0.08, 0.02, 0.02). If it is used for inference with M³_{Y|X}, the evidence (0.7, 0.1, 0.08, 0.08, 0.02, 0.02) needs to be merged into (0.8, 0.08, 0.12). In this way we obtain Y = (0.718, 0.216, 0.066): 0.718 is the probability that he/she earns a salary in [3500, +∞), 0.216 is the probability that he/she earns a salary in [2100, 3500), and 0.066 is the probability that he/she earns a salary in [0, 2100).
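The inference with the polished rule can be checked in a few lines (our own sketch; the grouping of the six education values into P&D, PostMaster and M&U is the one used above):

  # Apply a polished rule: merge the evidence to match M^3_{Y|X}, then propagate.
  M3 = [[0.85, 0.14, 0.01],
        [0.10, 0.75, 0.15],
        [0.25, 0.363, 0.387]]
  evidence = [0.7, 0.1, 0.08, 0.08, 0.02, 0.02]   # over the original six point values
  groups = [[0, 1], [2], [3, 4, 5]]               # P&D, PostMaster, M&U
  merged = [sum(evidence[i] for i in g) for g in groups]          # [0.8, 0.08, 0.12]
  y = [sum(merged[i] * M3[i][j] for i in range(3)) for j in range(3)]
  print([round(v, 3) for v in y])                 # [0.718, 0.216, 0.066]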

6 Conclusion

Bayesian networks, one of the most popular models for probabilistic reasoning, have been widely accepted as a suitable, general and natural knowledge representation framework for reasoning and decision making under uncertainty. However, though the methods have a good theoretical basis and some successful applications, their computing complexity has been proven to be NP-hard [3]. Recently, various researchers have attempted to improve the probabilistic reasoning model [16, 11]. The main contribution of this paper is to establish an encoding technique for integrating statistical and probabilistic techniques into the reasoning model of Bayesian networks. It not only reduces the computing complexity of propagating probabilities in Bayesian networks from non-polynomial to polynomial for a class of problems, but also allows the results obtained with this method to be used for further reasoning in Bayesian networks.

To study the effectiveness of our model, we have performed several experiments. The algorithm is implemented on a Sun SparcServer in Java. For the convenience of comparison, we randomly generated four simple Bayesian networks. The main properties of the networks are the following. The first network consists of two matrices, which are 2 × 3 and 3 × 3 matrices, respectively. The second network consists of three matrices, which are 3 × 3, 3 × 4 and 4 × 5 matrices, respectively. The third network consists of four matrices, which are 3 × 3, 3 × 5, 5 × 4 and 4 × 4 matrices, respectively. The fourth network consists of five matrices, which are 2 × 3, 3 × 3, 3 × 4, 4 × 4 and 4 × 5 matrices, respectively. The comparison of our model (written as LINEAR) with Bayesian networks (written as MATRIX) in running time and space is illustrated in Figure 1 and Figure 2.

Figure 1: The comparison on time.

Figure 2: The comparison on space.

We have seen that our experimental results demonstrate that the proposed approach is efficient and promising.


Future work on integrating statistical and probabilistic techniques into the reasoning model of Bayesian networks will mainly include the acceptance test of this method and the application of further advantages of statistics to improving the probabilistic reasoning model. We are going to establish an optimal method of constructing encoders, and then apply it to some diagnosis systems.

References

[1] Becker A. and Geiger D., Optimization of Pearl's method of conditioning and greedy-like approximation algorithms for the vertex feedback set problem, Artificial Intelligence, 83(1996): 167-188.
[2] Breese J. and Blake R., Automating computer bottleneck detection with belief nets, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, Que, 1995: 36-45.
[3] Cooper G., The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence, 42(1990): 393-405.
[4] Dagum P. and Luby M., Approximating probabilistic inference in Bayesian belief networks is NP-hard, Artificial Intelligence, 60(1993): 141-153.
[5] Dagum P. and Luby M., An optimal approximation algorithm for Bayesian inference, Artificial Intelligence, 93(1997): 1-27.
[6] Davis R., Diagnostic reasoning based on structure and behaviour, Artificial Intelligence, 24(1984): 347-410.
[7] Diez F., Local conditioning in Bayesian networks, Artificial Intelligence, 87(1996): 1-20.
[8] Ezawa K. and Schuermann T., Fraud/uncollectible debt detection using a Bayesian network based learning system: a rare binary outcome with mixed data structures, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Seattle, WA, 1994: 227-234.
[9] Kirman J., Nicholson A., Lejter M., Santos J. and Dean T., Using goals to find plans with high expected utility, Proceedings of the 2nd European Workshop on Planning, 1993.
[10] Kwoh C. and Gillies D., Using hidden nodes in Bayesian networks, Artificial Intelligence, 88(1996): 1-38.
[11] Pearl J., Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, 1988.
[12] Peng Y. and Reggia J., Plausibility of diagnostic hypotheses: The nature of simplicity, Proceedings of AAAI'86, Menlo Park, Calif., 1986: 140-147.
[13] Peng Y. and Reggia J., A probabilistic causal model for diagnostic problem solving, Part 1: Integrating symbolic causal inference with numeric probabilistic inference, IEEE Trans. Systems, Man and Cybernetics, 17(1987): 146-162.
[14] Poole D., Probabilistic conflicts in a search algorithm for estimating posterior probabilities in Bayesian networks, Artificial Intelligence, 88(1996): 39-68.
[15] Santos J. and Shimony S., Belief updating by enumerating high-probability independence-based assignments, Proceedings of the Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, 1994: 506-513.
[16] Santos J., On linear potential functions for approximating Bayesian computations, Journal of the ACM, 43(1996): 399-430.
[17] Shimony S., The role of relevance in explanation, I: Irrelevance as statistical independence, Int. J. Approx. Reasoning, 1993.
[18] Shimony S. and Charniak E., A new algorithm for finding MAP assignments to belief networks, Proceedings of the Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, Calif., 1990.
[19] Shwe M., Middleton B., Heckerman D., Henrion M., Horvitz E. and Lehmann H., Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base: I. The probabilistic diagnosis model and inference algorithms, Meth. Inf. Med., 30(1991): 241-255.
[20] Srinivas S., A generalization of the noisy-OR model, Proceedings of the Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, Calif., 1993: 208-215.
[21] Shichao Zhang and Chengqi Zhang, A Model for Propagating Probabilities, Proceedings of ICCIMA'98, Australia, 1998.
[22] Shichao Zhang and Chengqi Zhang, A Method of Learning Probabilities in Bayesian Networks, Proceedings of ICCIMA'98, Australia, 1998.
