BENEDICT: An Algorithm for Learning Probabilistic Belief Networks

Silvia ACID

Depto. de Ciencias de la Computación e I.A.
Universidad de Granada
18071-Granada, Spain
[email protected]

Luis M. de CAMPOS

Depto. de Ciencias de la Computación e I.A.
Universidad de Granada
18071-Granada, Spain
[email protected]

Abstract

We develop a system that, given a database containing instances of the variables in a domain of knowledge, captures many of the dependence relationships displayed by those data and represents them as a belief network. To obtain the network structure, we have designed a new learning algorithm, called BENEDICT, which has been implemented and incorporated as a module within the system. The numerical component, i.e., the conditional probability tables, is estimated directly from the database. We have tested the system on databases generated from simulated networks by probabilistic sampling, including an extensive database corresponding to the well-known Alarm Monitoring System. These databases were used as inputs for the learning module, and the networks obtained were consistently similar to the originals.

1 Introduction

Belief networks [12] have become common knowledge representation tools, able to efficiently represent and manipulate dependence relationships by means of directed acyclic graphs (dags). A belief network has a graphical component, which explicitly reveals dependence and independence relationships, and a numerical component, used to quantify these relationships. This paper presents a system for the induction of probabilistic networks from databases. The complete representation is obtained by means of an algorithm for learning the graph structure, called BENEDICT (BElief NEtworks DIscovery using Cut-set Techniques)*, which determines which dependence relationships can be deduced from the data, and a parameter learning process over this structure that estimates the strength of these dependencies.

* This work has been supported by the DGICYT under Project PB92-0939.


We have tested the system by generating databases from known networks using the probabilistic logic sampling method, feeding those databases as inputs to BENEDICT, and then comparing the learnt networks with the original ones. The paper is organized as follows: in Section 2 we briefly describe some preliminaries which are necessary to understand the proposed algorithm. Section 3 contains a detailed description of the algorithm, as well as a study of its time complexity. The results of the experiments on different databases can be found in Section 4. Finally, Section 5 is devoted to conclusions and future research.
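To make the testing scheme concrete, the following is a minimal sketch of probabilistic logic sampling, i.e., forward sampling of each variable given its already-sampled parents in an ancestral ordering. The data structures and function names are illustrative assumptions, not the implementation used for the experiments.

import random

def logic_sample(order, parents, cpt, n_cases, seed=0):
    # order:   variables in an ancestral (topological) ordering
    # parents: dict mapping each variable to a tuple of its parents
    # cpt:     dict mapping (variable, parent_configuration) to a list
    #          of probabilities, one per value of the variable
    rng = random.Random(seed)
    database = []
    for _ in range(n_cases):
        case = {}
        for x in order:  # parents are always sampled before their children
            config = tuple(case[p] for p in parents[x])
            probs = cpt[(x, config)]
            r, acc = rng.random(), 0.0
            for value, p in enumerate(probs):
                acc += p
                if r < acc:
                    case[x] = value
                    break
            else:
                case[x] = len(probs) - 1  # guard against rounding error
        database.append(case)
    return database

# Tiny usage example: a two-node network x1 -> x2, both binary.
parents = {"x1": (), "x2": ("x1",)}
cpt = {("x1", ()): [0.7, 0.3],
       ("x2", (0,)): [0.9, 0.1],
       ("x2", (1,)): [0.2, 0.8]}
database = logic_sample(["x1", "x2"], parents, cpt, n_cases=1000)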

2 Learning Methods for Belief Networks

Nowadays, the problem of learning or estimating a belief network from data is receiving increasing attention within the community of researchers into Uncertainty in Artificial Intelligence. The reason is that partially automated learning methods may alleviate the knowledge acquisition bottleneck common to most techniques for knowledge representation and reasoning. In the literature on belief networks, we can basically find two approaches to the learning problem:

- Methods based on conditional independence tests
- Methods based on a scoring metric

The algorithms based on independence tests carry out a qualitative study of the dependence and independence properties among the variables in the domain, and then try to find a network representing these properties. So, they take as input a list of conditional independence relationships (obtained, for example, from a database by means of conditional independence tests), and the output is a graph displaying these relationships as far as possible. The main computational cost of these algorithms is due to the number and the complexity of the independence tests. Some of the algorithms based on this approach can be found in [4, 5, 8, 14, 15].
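As an illustration of this first family, a conditional independence test of x and y given a set z can be computed from a database with the G-squared statistic, which under independence asymptotically follows a chi-squared distribution. This generic sketch is ours (cases represented, as an assumption, by a list of dictionaries), not the specific test of the cited algorithms.

from collections import Counter
from math import log

def g_squared(database, x, y, z):
    # Count the joint and marginal configurations needed by the statistic.
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for case in database:
        zv = tuple(case[v] for v in z)
        n_xyz[(case[x], case[y], zv)] += 1
        n_xz[(case[x], zv)] += 1
        n_yz[(case[y], zv)] += 1
        n_z[zv] += 1
    # G^2 = 2 * sum N_xyz * log(N_xyz * N_z / (N_xz * N_yz))
    g = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        g += 2.0 * n * log(n * n_z[zv] / (n_xz[(xv, zv)] * n_yz[(yv, zv)]))
    # Degrees of freedom: (|x|-1)(|y|-1) times the number of z configurations.
    r = len({key[0] for key in n_xyz})
    s = len({key[1] for key in n_xyz})
    dof = max((r - 1) * (s - 1) * len(n_z), 1)
    return g, dof  # compare g with a chi-squared critical value at dof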

The algorithms based on a scoring metric try to find a graph with the minimum number of links that `properly' represents the data. They all use a function (the scoring metric) that measures the quality of each candidate structure, and a heuristic search method to explore the space of possible solutions, trying to select the best one. So, each of these algorithms is characterized by the specific scoring metric and search procedure it uses. Some algorithms that use this approach may be found in [5, 13], where the search space is restricted to trees or polytrees, and the scoring function is based on marginal and/or first-order conditional dependence degrees (such as the Kullback-Leibler cross entropy). Other algorithms, which work on general dags, almost invariably use greedy searches, and their scoring metrics are based on different principles, such as entropy [10], Bayesian approaches [6], or Minimum Description Length [11].

The algorithm we are going to describe belongs to the group of methods based on a scoring metric, although it also has some similarities with the algorithms based on independence tests: it explicitly uses the conditional independencies embodied by the topology of the network to elaborate the scoring metric.

3 The BENEDICT Algorithm

We have developed an algorithm, called BENEDICT, that takes a database as input and returns a belief network. BENEDICT determines the network structure under the assumption that a total ordering of the variables is known. This assumption, although somewhat restrictive, is quite frequent for learning algorithms [6]. We can understand this ordering as either causal (causes before effects) or temporal (determined by the occurrence time of each variable), although it may also be arbitrary. The basic idea of the algorithm is to measure the discrepancies between the conditional independencies represented in any given candidate network and the ones displayed by the database: the smaller these discrepancies are, the better the network fits the data. A measure of global discrepancy is then used as a heuristic by the search procedure, which explores the feasible solution space (in our case the set of all the networks compatible with the ordering), searching for optimum (no discrepancy) or at least good (low discrepancy) solutions. So, we have to study what conditional independencies a belief network represents, and how to measure the degree to which these independencies are supported by the database.

In order to measure the discrepancy between a conditional independence statement found in the graph, say I(X, Y | Z), and the database, we are going to use the Kullback-Leibler cross entropy D(X, Y | Z), which measures the degree of dependence between X and Y, given that we know Z (we could also use other dependence measures, such as those considered in [1]). The Kullback-Leibler cross entropy is defined as follows:

D(X, Y | Z) = \sum_{x,y,z} P(x, y, z) \log \frac{P(x, y | z)}{P(x | z) P(y | z)}

where x, y, z denote instantiations of the sets of variables X, Y and Z, respectively. Observe that the value of D(X, Y | Z) is zero if X and Y are indeed independent given Z, and that the more dependent X and Y are given Z, the greater D(X, Y | Z) is.
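In practice, D(X, Y | Z) has to be estimated from the database. A minimal sketch, assuming cases are stored as dictionaries and probabilities are replaced by relative frequencies (the function and variable names are ours):

from collections import Counter
from math import log

def dependency_degree(database, xs, ys, zs):
    # Estimate D(X, Y | Z) from frequency counts over the database.
    total = len(database)
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for case in database:
        x = tuple(case[v] for v in xs)
        y = tuple(case[v] for v in ys)
        z = tuple(case[v] for v in zs)
        n_xyz[(x, y, z)] += 1
        n_xz[(x, z)] += 1
        n_yz[(y, z)] += 1
        n_z[z] += 1
    d = 0.0
    for (x, y, z), n in n_xyz.items():
        # P(x,y|z) / (P(x|z) P(y|z)) reduces to N_xyz * N_z / (N_xz * N_yz)
        d += (n / total) * log(n * n_z[z] / (n_xz[(x, z)] * n_yz[(y, z)]))
    return d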

[Figure 1: A dag G of seven nodes, x1, ..., x7]

On the other hand, the capacity of belief networks for representing independence statements is based on the well-known graphical independence criterion called d-separation [12]. However, the number of d-separation statements displayed by a graph may be very high, and for efficiency and reliability reasons (in the computation of the global discrepancy) we want to exclude most of them and use only a `representative' set. To illustrate this point, let us consider the graph G in Figure 1. For example, the number of graphical independencies that G represents involving only the single nodes x7 and x1 is 18:

I(x7, x1 | x2), I(x7, x1 | x5),
I(x7, x1 | x2, x4), I(x7, x1 | x2, x5), I(x7, x1 | x2, x6), I(x7, x1 | x4, x5), I(x7, x1 | x5, x6),
I(x7, x1 | x2, x3, x6), I(x7, x1 | x2, x4, x5), I(x7, x1 | x2, x4, x6), I(x7, x1 | x2, x5, x6), I(x7, x1 | x3, x5, x6), I(x7, x1 | x4, x5, x6),
I(x7, x1 | x2, x3, x4, x6), I(x7, x1 | x2, x3, x5, x6), I(x7, x1 | x2, x4, x5, x6), I(x7, x1 | x3, x4, x5, x6),
I(x7, x1 | x2, x3, x4, x5, x6).
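Each of these statements can be verified mechanically. A standard way to test whether a dag displays I(X, Y | Z) is the moralization criterion, equivalent to d-separation: X and Y are d-separated by Z if and only if, after removing Z, they are disconnected in the moral graph of the smallest ancestral set containing X, Y and Z. A minimal sketch of this test (our own illustration, with the dag represented, as an assumption, by a dict of parent sets):

def d_separated(x, y, z, parents):
    # Smallest ancestral set containing x, y and z.
    relevant = set()
    stack = [x, y, *z]
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(parents[v])
    # Moralize: link each node to its parents, marry co-parents, drop arrows.
    adj = {v: set() for v in relevant}
    for v in relevant:
        ps = list(parents[v])
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b)
    # x and y are d-separated by z iff no path avoiding z connects them.
    blocked, seen, stack = set(z), {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True

Plain graph reachability on the moralized ancestral graph suffices here; no explicit enumeration of paths in the dag is needed.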

Instead of using all the d-separation statements displayed by the graph, we use only a representative subset of them. Given the ordering (x1, x2, ..., xn) of the variables in U, for each node xi let Ui denote the set of its predecessors in the ordering, and let Bi be the parent set of xi in the candidate network. The representative statements are I(xi, Ui \ Bi | Bi), i = 1, ..., n, and the global discrepancy between the network and the database is obtained by aggregating the corresponding dependency degrees D(xi, Ui \ Bi | Bi). For example, for the network in Figure 1, and assuming that the ordering of the variables in U is (x1, x2, x6, x3, x5, x4, x7), we need to compute the dependency degrees D(xi, Ui \ Bi | Bi) for each of the variables x2, x6, x3, x5, x4 and x7.
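Under these assumptions, the core of the structure search can be sketched as a greedy loop that grows each parent set Bi until the local discrepancy D(xi, Ui \ Bi | Bi) becomes small. This simplified sketch is our reading of the scheme described above, reusing the dependency_degree estimator sketched earlier and a hypothetical threshold eps; it omits the cut-set techniques of the actual algorithm.

def learn_structure(database, order, eps=0.01):
    # Greedy sketch: for each node x_i (with predecessors U_i in the
    # given ordering), grow its parent set B_i while the discrepancy
    # D(x_i, U_i \ B_i | B_i) estimated from the database exceeds eps.
    parents = {}
    for i, x in enumerate(order):
        u = list(order[:i])
        b = []
        while True:
            rest = [v for v in u if v not in b]
            if not rest or dependency_degree(database, [x], rest, b) <= eps:
                break
            # Add the candidate parent whose inclusion leaves the
            # smallest remaining discrepancy.
            best = min(rest, key=lambda c: dependency_degree(
                database, [x], [v for v in rest if v != c], b + [c]))
            b.append(best)
        parents[x] = tuple(b)
    return parents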