The CUBE lattice model and its applications

Laurent CHAUDRON¹, Nicolas MAILLE², Marc BOYER³
¹ ONERA Toulouse Research Center DCSD, 2 av. Belin, 31055 Toulouse Cedex 04, France
² ONERA Salon-de-Provence Research Center DCSD, BA 701, 13661 Salon Air, France
³ Associate researcher, Toulouse Science University, 118 route de Narbonne, 31062 Toulouse
[email protected] [email protected] [email protected]

Journal “Applied Artificial Intelligence” volume 17, number 3, March 2003, pages 207 - 242

Abstract

The aim of this article is to describe a basic algebraic structure on conjunctions of literals. As far as knowledge representation is concerned, the comparison of different pieces of information is a pivotal question; generally, the classical set operators (inclusion, union, intersection, subtraction) are used at least as a metaphorical model. In many applications, the core problem is the representation of actual data or information for which the basic unit of knowledge to represent is a conjunction of properties (while traditionally, AI is devoted to solving models for which the basic unit is a disjunction of properties, i.e. clauses). A specific model, called the CUBE model, has been designed so as to capture the extension of the natural set operators to a lattice on conjunctions of first order literals. The paper is organized as follows: after a description of the origin and the postulates of the model, i.e. the need for a formal structure for knowledge fusion (sections 1 and 2), the Cube model is described (section 3). Then applications are detailed: Cubical Formal Concept Analysis (section 4), Cubical Rule Induction (section 5) and Reasoning Tracking (section 6).

1 Introduction

In many applications¹: intelligence fusion (Chaudron, 1995), perception and situation assessment (Chaudron et al., 1997b,a), approximate reasoning (Chaudron & Maille, 1998), multi-agent information processing (Tessier & Chaudron, 1996, Fiorino et al., 1997, Chaudron et al., 1998, Tessier et al., 2000a), knowledge mining and incident analysis (Chaudron, 1997, Aeronautics & Agency, 2003)..., the comparison of different pieces of high level semantic information appeared as a fundamental question. When numerical data are considered, various tools are available to represent and deal with uncertainty or approximation: for example, the notions of mean, variance and standard deviation give a framework within which uncertain data can be captured and considered as non-aberrant. The problem is quite different when data are symbolic, insofar as it becomes impossible to consider a statement like "formula ζ is an approximation of formula ξ to the nearest 0.25%". The theoretical challenge could be to define a differential symbolic operator: "formula ϕ + δϕ". As a matter of fact, symbolic data are essentially based on discrete frames (Tarski, 1944). Of course, symbolic data may be projected on, or combined with, a numerical space equipped with predefined likelihood or preference measures. But, as it is a matter of context, no universal method is available (Dubois & Prade, 1994). Moreover, these measures automatically define a total order on the pieces of information, which may induce irrelevant relations between elements that were not comparable a priori; furthermore, the operators defined from these measures are often purely numerical and produce results in which both symbolic origins and causality links are lost. Considering it essential to keep symbolic information within symbolic spaces, we decided to rely entirely on symbolic data, both for our experimental results and for the theoretical models required by our applications.
Consequently, the challenge was to define a mathematical model providing a "symbolic network" of the knowledge to be captured; this is the purpose of the Cube model. The paper is organized as indicated in Figure 1: the structure of the article is centered on the Cube model, and the references to correlated domains are represented with dotted lines (the definitions are labeled in one sequence, while propositions, lemmas, and theorems share another sequence). Each section aims at being self-contained; the consummate reader will skip the basic recalls.¹

¹ The applied projects and studies will not be detailed in the paper.

[Figure 1: The structure of the paper. The Introduction (section 1) and the Postulates (section 2, related to Information Fusion) lead to the CUBE Model (section 3), followed by the applications: Cubical Formal Concept Analysis (section 4, related to Formal Concept Analysis), Cubical Rule Induction (section 5, related to Inductive Logic Programming), Cubical Reasoning Tracking (section 6, related to Mathematical Reasoning), and the Conclusion (section 7).]

2 Postulates for a lattice approach

2.1 From fusion to lattices

As explained above, the main requirement of our applications is to compare symbolic information within an adapted symbolic space. Therefore, the desired formal model must provide tools and operators for defining the similarities between two pieces of information and, moreover, for describing their differences. From a mathematical point of view, comparisons can rely on a partial order or on a complete order, as for numerical data. A bare partial order structure offers restricted capabilities for describing similarities and differences, as no assumptions are made upon the existence and uniqueness of specific elements such as upper bounds, lower bounds, suprema, infima... By providing a least upper bound and a greatest lower bound² for every couple of elements, lattice structures (Birkhoff, 1940, Davey & Priestley, 1990) give a reasonable improvement over the capabilities of partially ordered sets, while avoiding the strong requirements of totally ordered sets. Here, the lattice structure, which is used as a basic tool, offers constructive functionalities for a flexible knowledge model, which is required in our symbolic space.

² Also called, respectively: supremum/infimum or join/meet.

For example, in an Intelligence Fusion application (Chaudron et al., 1997a), a lattice framework allowed a sound definition of the "endogenous fusion" to be designed: assume the knowledge representation space K is equipped with a lattice structure; when two pieces of information i1 and i2 have to be fused, it is possible, even if they are not comparable, to define their infimum inf(i1, i2) and supremum sup(i1, i2) (see Figure 2). Any relevant fusion of i1 and i2 should at least be included in this local lattice. Indeed, either the criteria used to elaborate the fusion result are represented within K, hence they verify this local constraint, or else they are not, and then they remain unjustified. Thanks to such a hypothesis, in the worst case the fusion result is at least as "good" as the infimum, and it can be improved so as to reach the supremum or be kept at an optimal level. This hypothesis can be formally described: a fusion process F : K × K −→ K is said to be endogenous iff, by definition: F(i1, i2) ∈ [inf(i1, i2), sup(i1, i2)].

[Figure 2: The endogenous fusion principle. F(i1, i2) lies in the interval between inf(i1, i2) and sup(i1, i2).]

This approach allowed the definition of the basic processes of the symbolic level of a perception system and the correlated principles. Step by step, the requirements of the Cube model were designed.
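The endogenous-fusion constraint can be checked mechanically on any lattice. A minimal sketch, using the divisibility lattice on positive integers (inf = gcd, sup = lcm) as a stand-in for the knowledge space K; the lattice choice and the `fuse` rule are illustrative assumptions of this sketch, not the paper's operators:

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def is_endogenous(fusion, i1, i2):
    """F is endogenous at (i1, i2) iff F(i1, i2) lies in [inf, sup]:
    here, gcd(i1, i2) divides the result and the result divides lcm(i1, i2)."""
    r = fusion(i1, i2)
    return r % gcd(i1, i2) == 0 and lcm(i1, i2) % r == 0

fuse = lambda a, b: gcd(a, b)            # the most cautious fusion: keep the common part
print(is_endogenous(fuse, 12, 18))       # True: gcd(12, 18) = 6 is inside [6, 36]
print(is_endogenous(lambda a, b: a + b, 12, 18))  # False: 30 does not divide lcm = 36
```

Any fusion rule that leaves the interval [inf(i1, i2), sup(i1, i2)] is rejected, which is exactly the "local constraint" stated above.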

2.2 Conjunctions of elementary properties

A large class of projects and applications share many characteristics of the fusion process, as they rely on a description of an observed situation: perception of information, pilot activity, reasoning activity, incident reporting, beliefs modeling, multi-agent information exchanges... Therefore, the symbolic structure we want to build must capture the elementary items of observation, and it must also be adapted to the basic symbolic processing (deduction, ...). This can be achieved through a sound mathematical approach.


Let us begin with the elementary items of observation; they can be considered as facts: "the aircraft is climbing", "line A is parallel to line B", "there is a cat"... A description of the scene is then a conjunction of these elementary observations. We can notice that disjunctions are not directly observable³. Of course, a given system may observe a scene without being able to classify the different pieces of information: "the radar signature of object 1 is close to the one of a helicopter or a small aircraft". But this disjunction is a consequence of the classification process required by the system; it was not included in the original perceived data. In such a case, one can consider two possible worlds: world 1, within which object 1 is a helicopter, and world 2, within which object 1 is a small aircraft. The assessment of these experimental results led us to assume that a basic model dedicated to capturing elements of observation of a world should be based on conjunctions of properties. This is the purpose of the next section.

2.3 Conjunctions of logic properties and lattice structure

Logical languages offer a simple framework for the representation of elementary properties, as they make it possible to assign a symbolic word to an observable entity or characteristic and to define relations between all the defined words. In very simple applications, a world description based on a zero order logical language may be sufficient. In this case, each elementary property is captured by a literal of the language, i.e. a proposition. Let L = {A, B, C, ...} be the set of literals of this language. A conjunction of properties can be represented by a subset of L. In such a simple framework, the basic comparison tools fit the usual set operators: ⊂, ∪, ∩, −. As an example, the singleton {B} states that, in the world, the property B has been observed. The set {A, B} states that both properties A and B have been observed, and this is a more "complete"⁴ description of the world.

³ This important statement will not be discussed in depth here from the epistemological point of view. We just postulate that no perception system can capture disjunctive or negative information, as these operators are specific to reasoning or decision-making functionalities. This point was foreseen by Watzlawick et al. (1967) and can be exemplified by the challenge: "Find an example of a sensor of disjunctive information".
⁴ In the informal sense of the number of information items.

We will say that {A, B} is more informed than {B}, and the partial order

relation fits the set inclusion ⊂. From a mathematical point of view, each world description is a subset of L. The set of all possible descriptions is then the power set of L: P(L). It is known that (P(L), ⊂, ∩, ∪) is a lattice. Given two elements of P(L), e.g. {A, B} and {B, C}, the set intersection and union allow the construction of a sub-lattice (Figure 3).

[Figure 3: A small zero order sub-lattice. {A,B} and {B,C} join at {A,B,C} and meet at {B}; set inclusion corresponds to the converse logical implication.]

Then the similarity between {A, B} and {B, C} is represented by their infimum {B}, and the differences between {A, B} and {B} can be captured by the set difference: {A, B} − {B} = {A}. Considering a finite set of properties interpreted as their conjunction, such a very simple lattice may be used in various ways: if the elements represent the hypotheses of local subproblems to solve, the infimum will represent the most important property to be proved. Conversely, if they represent the conclusions of proved problems, the supremum is ensured. Trivially, the set inclusion represents the converse logical implication. One must notice that while the supremum corresponds exactly to the logical conjunction, the infimum is more general than the logical disjunction of the classical boolean logical lattice⁵.

⁵ Indeed, from the set point of view: {A, B} ∩ {B, C} = {B}. From the logical point of view: (A ∧ B) ∨ (B ∧ C) = B ∧ (A ∨ C), which implies B, the converse being false.

As a first conclusion, we can see that in the framework of zero order logic, it is easy to define a lattice structure which handles conjunctions of properties in a symbolic space relevant for applications based on descriptions of the world. The idea is now to extend the natural set operators used in the propositional calculus to first order logic. Indeed, propositional calculus may be sufficient to capture small logical exercises, but as far as general properties
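In this zero order setting, the lattice operators are the native set operators, so the example above can be reproduced directly; a minimal sketch where literals are plain strings (the names are illustrative):

```python
# Zero order case: cubes are plain sets of propositional literals,
# so meet, join and difference are the native set operators.
i1, i2 = frozenset({'A', 'B'}), frozenset({'B', 'C'})

inf = i1 & i2            # infimum: the shared properties
sup = i1 | i2            # supremum: all observed properties
delta = i1 - inf         # what distinguishes i1 from the common part

print(sorted(inf))       # ['B']
print(sorted(sup))       # ['A', 'B', 'C']
print(sorted(delta))     # ['A']
print(inf <= i1 <= sup)  # True: "more informed" is just set inclusion
```

It is precisely this direct correspondence that breaks down once literals carry first order terms, which motivates the Cube model of section 3.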

have to be stated, e.g. "A is parallel to another line x", first order properties must be addressed: paral(A,x). Unfortunately, the naive set notions of union and intersection are unable to deal with first order questions: how can i1 = {A(1), B(x)} and i2 = {B(2)} be compared? What is the local sub-lattice (see Figure 4)? Are they comparable to i3 = {A(x), B(x)}?

[Figure 4: The first order problem. The supremum and infimum of {A(1),B(x)} and {B(1)} are unknown (marked "??").]

A sound model is required; this is the purpose of section 3, which is the pivotal part of the article. The Cube model was previously introduced in (Chaudron et al., 1997b), and it has been detailed and proved in (Chaudron & Maille, 1999). In the present paper, the Cube model is polished and enriched with the definition of basic tools for the description of differences. In section 4, a first application to the definition of first order concept analysis is described. A combination of both theories applied to rule induction is presented in section 5. An application of the Cube to a reasoning tracking model is presented in section 6 (see Figure 1 for the global structure of the article).

3 The CUBE model

3.1 The definition of cubes

The cube model is an algebraic structure dedicated to the representation and the comparison of conjunctions of elementary properties. It is based on a classical first order logical language⁶: sets of variables (e.g. Var = {x, y, ...}), constants (e.g. Const = {BO727, DC9, MD82, Landing, TkOff, ...}), functions (e.g. Funct = {crew, ...}), predicates (e.g. Pred = {People, Aircraft, Scenario, ...}) and connectives (Con = {¬, ∨, ∧, →, ↔, ∃x, ∃y, ..., ∀x, ∀y, ...})

⁶ We just give here a short overview of logic as it relates to the cube model. Basic recalls on classical logic can be found in Chang and Lee (Chang & Lee, 1973) or any reference book.

are defined with their respective numbers of arguments. Then the set of terms Term is the functional closure of Funct over Var and Const (e.g. crew(DC9), Landing and crew(x) are terms), and the set of positive literals is the predicative closure of Pred over Term (e.g. People(crew(DC9)) and Aircraft(DC9,Landing) are positive literals). Positive literals and their negations form the set of literals L (e.g. ¬People(crew(x)) is a literal). The set of well-formed formulas (wff) is the closure of Con over positive literals (e.g. ∃x People(crew(x)) → Aircraft(DC9,Landing) is a wff). The cube model directly uses a fragment of such a general first order language: literals, conjunctions and the existential quantifier. It is an algebraic structure within which the definition of operators relies on the syntactical definition of the language. Nevertheless, a logical system can always be defined from two equivalent viewpoints: the semantic view and the syntactic view. We will sometimes refer to the semantic view, which allows a more intuitive comprehension of the meaning of the operators. Each elementary property is captured by a literal: if "A DC9 airplane is landing" is an elementary property of an incident analysis application, it could be represented by the literal Aircraft(DC9,Landing). Then, sets of elementary properties map the elements of the power set of L: P(L). Elements of C = P(L) are called cubes, as they are interpreted as the conjunction of their literals⁷ (e.g. {People(crew(DC9)), Aircraft(DC9,Landing)} and {People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y)} are cubes). Cubes obviously play a dual role with respect to the classical clauses; hence, by default, their variables are existentially quantified, as for queries in logic programming ((Shapiro & Sterling, 1986), pages 5-7).

Therefore the well-formed formula associated with the cube {People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y)} is:

∃x ∃y People(crew(x)) ∧ Aircraft(x,TkOff) ∧ Aircraft(DC9,y)

Supposing this cube comes from the codification of an aeronautical incident report, it could formalize the following information:
• an aircraft whose type is unknown (x) was taking off during the incident,
• the crew of this⁸ aircraft (People(crew(x))) is involved in the incident,
• a DC9 is involved in the incident; its flight phase is unknown.

⁷ "Cube" is the name first defined in 1989 by A. Thayse (Thayse & col., 1989). The word "product" was previously used, but not defined, in 1975 by Vere (Vere, 1975).

3.2 An order relation

As the aim of the cube model is to build a symbolic space allowing relevant comparisons between conjunctions of properties, an order relation on cubes is required. We would like this order relation built on C to capture the intuitive notion of "information enrichment". But such an enrichment can be obtained via different means: quantity of information, precision of terms, logical dependency... We want to take into account the criteria relevant to the description of an observed situation, i.e. "quantity" and "precision".

• "Quantity" of information: the description of an aeronautical incident by the sentence "A DC9 is involved in the incident, its flight phase is not assessed" is intuitively less informative than the description of this incident by the second sentence "A DC9 is involved in the incident, its flight phase is not assessed and another aircraft (the type of which is unknown) was taking off during the incident". Thus we would like the order relation ≤c to capture such a "quantity" level of information: {Aircraft(DC9,y)} ≤c {Aircraft(x,TkOff), Aircraft(DC9,y)}.

• "Precision" of the information: the sentence "A DC9 is involved in the incident, its flight phase is not assessed" provides less information than "A DC9 is involved in the incident and it was landing". Here we would like the order relation to capture such a "precision" level: {Aircraft(DC9,Landing)} ≥c {Aircraft(DC9,y)}.

⁸ It is the same variable (x), as it is in the scope of the same existential quantifier.

The combination of both intuitive criteria cannot directly give a total order: {Aircraft(DC9,Landing)} is not comparable with {Aircraft(x,TkOff), Aircraft(DC9,y)}, as the first one is more precise but less complete than the second. A definition of an order relation relevant for the combination of both criteria is given by the classical subsumption operator (Robinson, 1965), (Plotkin, 1970).

Definition 1: (Order relation) Let Sub be the set of substitutions⁹ and c1, c2 be two cubes. The order relation ≤c on cubes is defined by:

c1 ≤c c2 ⇔def (∃σ ∈ Sub) σ(c1) ⊂ c2

Then an equivalence relation ∼ on cubes is built:

c1 ∼ c2 ⇔def c1 ≤c c2 and c2 ≤c c1

Example: c1 = {Aircraft(DC9,y)} and c2 = {Aircraft(x,TkOff), Aircraft(DC9,Landing)}. The substitution σ such that σ(y) = Landing applied to c1 gives: σ(c1) = {Aircraft(DC9,Landing)} ⊂ {Aircraft(x,TkOff), Aircraft(DC9,Landing)} = c2, and we have c1 ≤c c2.
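Subsumption between small cubes can be decided by a brute-force search for a suitable substitution. A minimal sketch, assuming an illustrative encoding (flat literals as tuples, variables as strings prefixed with '?'); this is not the paper's CLP implementation:

```python
def is_var(t):
    # convention of this sketch: variables are strings prefixed with '?'
    return isinstance(t, str) and t.startswith('?')

def subsumes(c1, c2):
    """c1 <=c c2 iff some substitution sigma satisfies sigma(c1) subset of c2.
    Cubes are sets of flat literals (predicate, arg1, ..., argn)."""
    def search(lits, sub):
        if not lits:
            return True                     # every literal of c1 found an image in c2
        first, rest = lits[0], lits[1:]
        for target in c2:
            if target[0] != first[0] or len(target) != len(first):
                continue                    # predicate or arity mismatch
            new_sub, ok = dict(sub), True
            for a, b in zip(first[1:], target[1:]):
                a = new_sub.get(a, a)       # apply the bindings built so far
                if is_var(a):
                    new_sub[a] = b          # bind the variable to the target term
                elif a != b:
                    ok = False              # constant clash
                    break
            if ok and search(rest, new_sub):
                return True
        return False
    return search(sorted(c1), {})

c1 = {('Aircraft', 'DC9', '?y')}
c2 = {('Aircraft', '?x', 'TkOff'), ('Aircraft', 'DC9', 'Landing')}
print(subsumes(c1, c2))   # True: the substitution ?y -> Landing maps c1 into c2
```

The search simply tries every way of mapping each literal of c1 onto a literal of c2 while keeping the bindings consistent, which is exactly the existential condition of Definition 1.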

3.3 Lattice structure

We saw in the introduction that if L is a zero order logical language, then (P(L), ⊂, ∩, ∪) is a lattice. With a first order language, the order relation defined by the set inclusion ⊂ has been replaced by the subsumption operator. In the same way, sound definitions for the intuitive concepts of union and intersection of two finite information sets are needed, in accordance with the following requirements: the meet has to capture the common features (while giving more information than the empty set frequently generated by the unification rule); the join has to cope with the complementary criteria, quantity/precision of the information (while giving a more synthetic result than the set union). Our work relies on the definition of a lattice structure on the terms algebra (Plotkin, 1970), or on atomic formulas (Reynolds, 1970).

⁹ Substitutions are all mappings from Var to Term (Robinson, 1965), extended to literals (Huet, 1976), to clauses (Chang & Lee, 1973), and to cubes (Maille, 1999).

More precisely, we adopt the approach of Lassez (Lassez et al., 1987) and their definition of the anti-unification operator (noted finu), and then we extend this operator to

cope with the cube algebra (Chaudron et al., 1997b, Chaudron & Maille, 1999). In the following definitions, let Φ be any bijection between Term × Term and Var.

Definition 2: (Anti-unification (Lassez et al., 1987)) The anti-unification operator finu : Term × Term −→ Term is defined as follows: let f be a function and s1, ..., sn, t1, ..., tn, s, t be terms,
finu(f(s1, ..., sn), f(t1, ..., tn)) = f(finu(s1, t1), ..., finu(sn, tn)), for every function or constant symbol,
finu(s, t) = Φ(s, t), otherwise.

In a first step, finu is extended to L × L −→ C by the following rules (see (Maille, 1999)):

Definition 3: Let P be a predicate, s1, ..., sn, t1, ..., tn be terms and k, l be literals,
finu(P(s1, ..., sn), P(t1, ..., tn)) = {P(finu(s1, t1), ..., finu(sn, tn))}, for every predicate symbol,
finu(¬P(s1, ..., sn), ¬P(t1, ..., tn)) = {¬P(finu(s1, t1), ..., finu(sn, tn))}, for every predicate symbol,
finu(k, l) = {} otherwise.

Finally finu is extended to C × C −→ C:

Definition 4: Let k1, ..., kn, l1, ..., lp be literals.
finu({k1, ..., kn}, {l1, ..., lp}) = ⋃i∈[1,n] ⋃j∈[1,p] finu(ki, lj)

Examples:
(1) {P(x,g(y,b))} is the cube resulting from the anti-unification of the two literals P(a,g(a,b)) and P(1,g(b,b)).
(2) The anti-unification of {P(x),Q(x)} and {P(1),Q(2)} is {P(x),Q(y)}.
(3) finu({People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y)}, {Aircraft(x,TkOff), Aircraft(DC9,Landing)}) = {Aircraft(x1,TkOff), Aircraft(x2,x3), Aircraft(x4,x5), Aircraft(DC9,x6)}.
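Definitions 2-4 can be prototyped directly. A hedged sketch, assuming an illustrative encoding (functional terms as nested tuples, variables as '?'-prefixed strings) and implementing Φ as a memoised map from pairs of terms to fresh variables:

```python
def anti_unify_terms(s, t, phi, counter):
    """Definition 2: finu on terms; functional terms are tuples (f, arg1, ...)."""
    if isinstance(s, tuple) and isinstance(t, tuple) and s[0] == t[0] and len(s) == len(t):
        return (s[0],) + tuple(anti_unify_terms(a, b, phi, counter)
                               for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s                            # same constant (a 0-ary function)
    if (s, t) not in phi:                   # Phi: bijection Term x Term -> Var
        counter[0] += 1
        phi[(s, t)] = f'?x{counter[0]}'
    return phi[(s, t)]

def finu(c1, c2):
    """Definitions 3 and 4: finu lifted to literals, then to cubes
    (union over all pairs of literals with the same predicate symbol)."""
    phi, counter, out = {}, [0], set()
    for k in sorted(c1, key=repr):
        for l in sorted(c2, key=repr):
            if k[0] == l[0] and len(k) == len(l):   # same predicate symbol and sign
                out.add((k[0],) + tuple(anti_unify_terms(a, b, phi, counter)
                                        for a, b in zip(k[1:], l[1:])))
    return out

# Example (2) above: finu({P(x), Q(x)}, {P(1), Q(2)}) is a variant of {P(x), Q(y)}
print(sorted(finu({('P', '?x'), ('Q', '?x')}, {('P', '1'), ('Q', '2')})))
```

Note that sharing the single map `phi` across all pairs is what makes Φ a bijection: the same pair of terms always anti-unifies to the same variable, while distinct pairs get distinct fresh variables.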

The information carried by the literal Aircraft(x2,x3) in the cube {Aircraft(x1,TkOff), Aircraft(x2,x3), Aircraft(x4,x5), Aircraft(DC9,x6)} is not significant. The same issue was noted by Plotkin in (Plotkin, 1970) with clauses: "However, when C1 is equivalent to C2, C1 and C2 need not be alphabetic variants... It turns out that there is a reduced member of the equivalence class of any clause. This member is unique to within an alphabetic variant." As clauses and cubes play a dual role, we will use the same notion of reduction in order to obtain a unique (up to variable renaming) reduced member of any cube. The proof of the existence of such a unique reduced member reduc(c) of the equivalence class can be directly deduced from Plotkin's work and is also available in (Maille, 1999), p. 69.

Definition 5: A cube c is reducible if there exists a substitution θ such that cθ ⊊ c; cθ is then a reduction of c. An irreducible reduction of c, noted reduc(c), always exists, is unique up to variable renaming and is equivalent to c. Cr denotes the subset of all the irreducible cubes of C.

Example: it is clear that reduc({P(x),P(1)}) = {P(1)}, but reduc({P(1,x),P(y,2)}) = {P(1,x),P(y,2)}. Therefore {P(x),P(1)} ∉ Cr, while {P(1)} ∈ Cr and {P(1,x),P(y,2)} ∈ Cr.
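Definition 5 can be approximated by a greedy search for collapsing substitutions. A sketch under the same illustrative tuple encoding; it only tries single-variable substitutions on flat literals (a full reduction may require compound substitutions), so it is an assumption-laden illustration rather than the proved algorithm of (Maille, 1999):

```python
def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def apply_sub(cube, sub):
    # apply a substitution to every argument of every flat literal
    return {(l[0],) + tuple(sub.get(a, a) for a in l[1:]) for l in cube}

def reduc(cube):
    """Greedy reduction: while some single-variable substitution theta makes
    (cube theta) a proper subset of cube, replace cube by that image."""
    changed = True
    while changed:
        changed = False
        terms = {a for l in cube for a in l[1:]}
        for v in [t for t in terms if is_var(t)]:
            for t in terms - {v}:
                image = apply_sub(cube, {v: t})
                if image < cube:            # proper subset: cube was reducible
                    cube, changed = image, True
                    break
            if changed:
                break
    return cube

print(reduc({('P', '?x'), ('P', '1')}))    # {('P', '1')}
```

On the second example of Definition 5, {P(1,x), P(y,2)}, no single substitution produces a proper subset, so the cube is returned unchanged, as expected.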

We can now focus on the set of reduced cubes Cr and define the meet and join operators which take the place of the set intersection and union in the lattice structure.

Definition 6: (Join and meet) The operators ∪c and ∩c are defined on Cr as:
c1 ∪c c2 =def reduc(σ(c1) ∪ c2), where σ is a substitution which standardizes the variables of c1 and c2 apart;
c1 ∩c c2 =def reduc[finu(c1, c2)].

Theorem 1: (Cr, ≤c, ∪c, ∩c) is a non-modular lattice.

Proof: A complete detailed proof of this theorem can be found in (Maille, 1999), pp. 76-78. The pivotal hints are given hereafter. Let c = {l1, ..., ln} be a cube. We have to prove that ∩c and ∪c are idempotent, commutative, associative and satisfy the absorption laws.

1. Commutativity of ∩c relies on the commutativity of the anti-unification operator; that of ∪c relies on the commutativity of the set union.

2. Idempotency of ∩c: c ∩c c = reduc[finu(c, c)] = reduc[⋃i∈[1,n] ⋃j∈[1,n] finu(li, lj)]. As finu(l, l) = l for any literal l, we have c ⊂ ⋃i∈[1,n] ⋃j∈[1,n] finu(li, lj). As for any finu(li, lj) there exists a substitution σi such that σi(finu(li, lj)) = li, there exists a substitution σ such that σ(⋃i∈[1,n] ⋃j∈[1,n] finu(li, lj)) ⊂ c. Therefore c and ⋃i∈[1,n] ⋃j∈[1,n] finu(li, lj) are equivalent, and their reductions are equal up to variable renaming.
Idempotency of ∪c: we have c ⊂ (c ∪ σ(c)) and, as σ is a substitution which standardizes the variables of c and c apart, we also have σ−1(σ(c)) = c. So σ−1(c ∪ σ(c)) = c ∪ c = c and reduc(c ∪ σ(c)) = reduc(c).

3. Associativity of ∩c and ∪c relies on the associativity of the set union, as both ∪c and finu are mainly built on the set union operator.

4. Absorption laws: c1 ∩c (c1 ∪c c2) = c1 and c1 ∪c (c1 ∩c c2) = c1. These proofs are long and will not be presented in this paper. They rely on two properties: (1) finu(c1, c2) ∼ finu(reduc(c1), c2) and (2) finu(c1, c2) ≤c c1.

5. Non-modularity. Let c1, c2 and c3 be the three following cubes: c1 = {Q(x), P(y, x), P(y, 2)}, c2 = {Q(x), P(y, 2)}, c3 = {Q(2), P(2, 1)}. Clearly c2 ≤c c1. We have: c1 ∪c c3 = c2 ∪c c3 = {Q(2), P(2, 1), P(y, 2)} and c1 ∩c c3 = c2 ∩c c3 = {Q(x), P(y, z)}. The local lattice defined by these three cubes is called N5 (see Figure 5).

[Figure 5: A cubical non-modular sub-lattice. c1, c2 and c3 form the pentagon N5.]

Thus Cr has a sub-lattice isomorphic to N5 and is therefore non-modular (thus non-distributive) (Davey & Priestley, 1990). This result is theoretically important insofar as it assesses the intrinsic complexity of the logical representation of first order conjunctions, the cubes (resp. disjunctions, the clauses). Indeed, in the set theory framework, ∩ and ∪ are distributive over each other; likewise, in the Lindenbaum-Tarski lattice (i.e. the set F of all well-formed formulas quotiented by the equivalence relation), ∧ and ∨ are distributive. From the practical and programming point of view, distributivity guarantees that the result of a processing is independent from the order of the sub-processes. This "good" property is lost when the lattice is not distributive, and a fortiori when it is not modular (as distributive ⇒ modular). In particular, it is well known that (F, ∧, ∨, →, ¬, True, False) is a boolean lattice in which each formula ϕ has a unique complement ¬ϕ, i.e.: ϕ ∨ ¬ϕ = True and ϕ ∧ ¬ϕ = False. It is not the case in non-distributive
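The failure of the modular law on the pentagon can be checked mechanically on the abstract five-element N5 (the labels c1, c2, c3 echo the cubes of point 5); a small self-contained sketch:

```python
# N5 pentagon: 0 < c2 < c1 < 1 and 0 < c3 < 1, c3 incomparable to c1 and c2
elems = ['0', 'c1', 'c2', 'c3', '1']
above = {'0': {'0', 'c2', 'c1', 'c3', '1'},   # above[a] = all x with a <= x
         'c2': {'c2', 'c1', '1'},
         'c1': {'c1', '1'},
         'c3': {'c3', '1'},
         '1': {'1'}}
leq = lambda a, b: b in above[a]

def join(a, b):
    ubs = [x for x in elems if leq(a, x) and leq(b, x)]
    return next(u for u in ubs if all(leq(u, v) for v in ubs))  # least upper bound

def meet(a, b):
    lbs = [x for x in elems if leq(x, a) and leq(x, b)]
    return next(l for l in lbs if all(leq(v, l) for v in lbs))  # greatest lower bound

# Modularity would require: c2 <= c1 implies c2 v (c3 ^ c1) = (c2 v c3) ^ c1
print(join('c2', meet('c3', 'c1')))   # c2
print(meet(join('c2', 'c3'), 'c1'))   # c1 -- the two sides differ, so N5 is non-modular
```

Since the two sides of the modular law disagree on N5, any lattice containing N5 as a sub-lattice, in particular Cr, is non-modular.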

lattices (Figure 6).

[Figure 6: Lattices: D4 distributive, M5 non-distributive, N5 non-modular.]

In D4, the black element on the right has a unique complement (the gray element on the left); in M5 and N5, the unicity is lost. The non-modular situation has been widely studied in quantum logic (Dalla Chiara, 1986), and within specific works on negation (Hartonas, 1996, Dunn, 1996). But many theoretical improvements are needed in order to equip the Cube (resp. the Clause) model with an adapted negation operator allowing the definition of a complement, which is a prerequisite to a universal difference operator. This program is currently under study ((Tessier et al., 2000b), pp. 24-26). As a global solution for a sound difference operator in Cr is formally still out of range, a traditional way to deal with such a drawback consists in defining local solutions; this is the purpose of paragraph 3.5 (another sub-solution will be presented in section 6). Beforehand, the next paragraph emphasizes the role of the infimum as a similarity operator.

3.4 Lattice structure and similarity

Our aim was to build a symbolic space allowing the description of the similarities and differences between pieces of information captured by conjunctions of properties. The meet operator ∩c of the cube model describes the common knowledge between two pieces of information.

Example: given c1 and c2:
c1 = {People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y)}
c2 = {People(crew(DC9)), Aircraft(DC9,x), Scenario(Windshear)}
the similarities between c1 and c2 are captured by their infimum cube:
c1 ∩c c2 = {People(crew(x)), Aircraft(x,y), Aircraft(DC9,z)}
which means that in both cases the crew of an aircraft is involved in the incident and a DC9 is involved in the incident (it is not sure that the crew involved in the incident is that of the DC9, because of c1). Their supremum is:
c1 ∪c c2 = {People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y), People(crew(DC9)), Scenario(Windshear)}
All the other cubes of the symbolic space can be located with regard to these four cubes.

[Figure 7: The sub-lattice C{1,2} generated by c1 and c2, bounded below by c1 ∩c c2 and above by c1 ∪c c2.]

Thus, any information related to c1 and c2 will be represented by a cube situated inside this generated sub-lattice, say C{1,2} (its elements are not represented in extension in Figure 7). The Cube lattice model thus provides a relevant answer to the notion of similarity between conjunctive pieces of information. The converse question is the topic of the next section.

3.5 Lattice structure and differences

Looking for a local solution to characterize the differences between two cubes means describing what makes a given cube c1 different from another cube c2. The idea is to remain inside the lattice sub-space C{1,2} generated by c1 and c2; the intrinsic characteristics of this sub-lattice will allow the design of a local definition of these differences. Generally, the intrinsic meaning of any lattice relies greatly on the elements that cannot be defined as the join of two other elements (like prime numbers in ℕ). Such elements are named join-irreducible, and in a finite lattice every element can be represented as a join of a finite number of join-irreducible elements (Birkhoff, 1940). They represent the elementary knowledge of the domain, and all the differences between elements of the lattice can be decomposed on this basis.

Definition 7: (Join-irreducible (Birkhoff, 1940)) An element a of a lattice (L, ∧, ∨) is called join-irreducible if a is not one of the universal bounds, and (∀x, y ∈ L) x ∨ y = a ⇒ x = a or y = a.
If L is a lattice, L∨ denotes the set of all the join-irreducible elements of L. For example, in the lattice of Figure 8, the join-irreducible elements are marked.

[Figure 8: An example of join-irreducible elements.]

Definition 8: Let L be a lattice and c an element of L. The principal ideal generated by c, noted ↓c, is defined by ((Davey & Priestley, 1990), page 184):
↓c = {x ∈ L | x ≤ c}
Then, the set of all the join-irreducible elements of the principal ideal generated by c is: (↓c)∨ = {x ∈ (↓c) | x ∈ L∨}.

Examples:
c1 = {People(crew(x)), Aircraft(x,TkOff), Aircraft(DC9,y)}
(↓c1)∨ = { {People(x)}, {People(crew(x))}, {Aircraft(x,y)}, {Aircraft(x,TkOff)}, {Aircraft(DC9,y)}, {People(crew(x)), Aircraft(x,y)}, {People(crew(x)), Aircraft(x,TkOff)} }
c2 = {People(crew(DC9)), Aircraft(DC9,x), Scenario(Windshear)}
(↓c2)∨ = { {People(x)}, {People(crew(x))}, {People(crew(DC9))}, {Aircraft(x,y)}, {Aircraft(DC9,x)}, {Scenario(x)}, {Scenario(Windshear)}, {People(crew(x)), Aircraft(x,y)}, {People(crew(DC9)), Aircraft(DC9,y)} }


We can now define a difference operator ⊖ for elements of a lattice.

Definition 9: Let L be a lattice, c1 and c2 two elements of L. The difference operator ⊖ from L × L to P(L) is defined by: c1 ⊖ c2 = (↓c1)∨ − (↓c2)∨, where − denotes the classical set difference.

Examples:
c1 ⊖ c2 = { {Aircraft(x,TkOff)}, {People(crew(x)), Aircraft(x,TkOff)} }
c2 ⊖ c1 = { {People(crew(DC9))}, {Scenario(x)}, {Scenario(Windshear)}, {People(crew(DC9)), Aircraft(DC9,y)} }

Thus, c1 differs from c2 because it captures the information "an aircraft is taking off" (represented by the cube {Aircraft(x,TkOff)}) and the information "the crew of an aircraft which is taking off is involved in the incident" (represented by the cube {People(crew(x)), Aircraft(x,TkOff)}). In the same way, c2 ⊖ c1 describes the elementary properties captured by c2 but not by c1. The development of this difference operator ⊖, based on join-irreducible elements, is currently under study within an experience feedback project for flight safety improvement: the pilot activity and the deviations are tracked thanks to a cube-based model (Chaudron et al., 2002).
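The mechanics of Definition 9 can be sketched in the propositional special case, where a cube degenerates to a plain set of ground literals ordered by inclusion: in that powerset lattice the join is set union, so the join-irreducible elements below a cube are exactly the singletons of its literals (the full first-order model, with variables and the instantiation order, yields the richer (↓c)∨ shown above). The cubes used here are illustrative, not taken from the paper's examples.

```python
# Propositional sketch of Definition 9 (assumption: cubes are plain sets of
# ground literals; variables and anti-unification are ignored).

def down_join_irreducibles(cube):
    """(↓cube)∨ in the powerset lattice: one singleton per literal of cube."""
    return {frozenset([lit]) for lit in cube}

def difference(c1, c2):
    """c1 ⊖ c2 = (↓c1)∨ − (↓c2)∨ (classical set difference)."""
    return down_join_irreducibles(c1) - down_join_irreducibles(c2)

c1 = frozenset({"Aircraft(BO727,TkOff)", "Scen(Airmissground)"})
c2 = frozenset({"Aircraft(BO727,Landing)", "Scen(Airmissground)"})

# c1 ⊖ c2 keeps only the elementary knowledge proper to c1:
print(difference(c1, c2))  # {frozenset({'Aircraft(BO727,TkOff)'})}
```

In the first-order model this simple set difference is replaced by the difference of the two (↓c)∨ sets up to variable renaming, as in the examples above.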

3.6 Cube implementation

All the operators of the cube model (subsumption, anti-unification, infimum, supremum) have been developed and implemented in CLP (Constraint Logic Programming). The implementation features will not be presented in this paper; see Maille (1999) for details. A main application of these programs is the Cubical Formal Concept Analysis of section 4.

3.7 Cube conclusion

The cube model described in this section makes it possible:
• to represent conjunctions of properties in a first-order logical language: any piece of information is a cube,
• to have a partial order consistent with two criteria, the quantity and the precision of the information: (C, ≤c) is a partially ordered set,
• to organize all the information within a lattice structure: (Cr, ≤c, ∩c, ∪c) is a lattice,
• to describe the common information shared by pieces of information: the similarities between c1 and c2 are captured by their meet c1 ∩c c2,
• to describe their differences: the set of elementary properties captured by c1 and not by c2 is c1 ⊖ c2, and the set of elementary properties captured by c2 and not by c1 is c2 ⊖ c1.

Therefore the symbolic space described above makes it possible to compare symbolic data within a symbolic space. These basic tools are used in many applications: data analysis, activity modeling, symbolic fusion... As CLP is a natural extension of Logic Programming, an extension of the purely symbolic cube has been designed and implemented as a "Constrained Cubes Model", i.e. cubes associated with numerical ordering constraints. This work (see (Maille, 1999), pp. 145-180) is currently under study.

4 Application to "Cubical Formal Concept Analysis"

Foreword: this section is a deeply revised and augmented version of (Chaudron & Maille, 2000).

As far as knowledge representation is concerned, the induction of clustered groups of entities from a context describing their features is a pivotal topic. Generally, if numerical valuations (belief measures, preferences...) can be defined on the considered data, numerical or mixed methods can be used directly: classical classification tools, fuzzy methods and, closer to the symbolic community, rough sets (Pawlak, 1991, Skowron & Polkowski, 1998) or the cartesian space model (Ichino & Ono, 1998)... In some cases, requirements or accessibility constraints impose relying only on symbolic attributes. Thus, more fundamental models and techniques are required; Formal Concept Analysis -also called "Galois lattices"- (Ganter & Wille, 1999) is a suitable candidate for such a purpose. As FCA theory relies on propositional calculus, an extension may be required by a large number of applications so as to characterize the context with more detailed attributes (e.g. properties with arguments: Speed(vehicle1,12), Aircraft(BO727,landing)). The cube model is a relevant candidate for the representation of the context, and this section describes an adapted extension (see the foreword hereafter) of FCA

named Cubical Formal Concept Analysis (CFCA). As a matter of fact, an extension of FCA to any kind of lattice-structured set of properties, named Generalized Formal Concept Analysis (G-FCA), was built and published in (Chaudron & Maille, 2000). The idea of a generalized concept lattice was previously developed by Liquiere and is now also studied by Ganter & Kuznetsov (2001). This section focuses on the CFCA, which was completely programmed in a Constraint Logic Programming language and is nowadays used to identify the rules of linkage and correspondence for similar events issued from accident reports (Aeronautics & Agency, 2003), as well as in various knowledge discovery projects. The generic "incident analysis" example is used throughout the section. Proofs are omitted as they can be found in (Chaudron & Maille, 2000).

Foreword: the basic theorem (Ganter & Wille, 1999) which states that each lattice is isomorphic to a concept lattice lies at the foundation of FCA. Consequently, there is no actual "generalization" or "extension" of FCA of any kind. Nevertheless, the notion of context in FCA is defined as a binary relation between two sets, objects and propositional literals; describing the real properties of operational contexts may require first-order literals so as to correctly capture the knowledge involved (see the incident analysis example throughout this paper). This is the purpose of the so-called "Generalized Formal Concept Analysis"; given the previous reserves, it must be clear that this "generalization" only concerns the expression of the context.

4.1 Cubical Formal Concept Analysis

Let us consider a very simple example of aeronautical incident analysis, extracted from the large NASA incident / NTSB accident databases (Aeronautics & Agency, 2003, Board, 2003). The data are represented in a synthetic table (see Table 1). Each event is qualified by attributes that are supposed to be important from the expert's point of view. The first line captures that in the accident denoted "Acc1" the crew of the Boeing 727 (crew(BO727)) and the air traffic control (ATC) both bear a part of the responsibility, that the two aircraft (BO727 and DC9) were in the Take Off flight phase, and that the global scenario of the accident was a so-called

"Airmissground".

Accident | People           | Aircraft Type | Flight Phase | Scenario
Acc1     | crew(BO727), ATC | BO727         | Take Off     | Airmissground
         |                  | DC9           | Take Off     |
Acc2     | ATC, crew(MD82)  | BO727         | Landing      | Airmissground
         |                  | MD82          | Landing      |
Acc3     | ATC              | BO727         | Landing      | Airmissground
         |                  | DC9           | Take Off     |
Acc4     |                  | MD82          | Landing      | Windshear

Table 1: Accidents descriptions
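As an aside, the content of Table 1 can be encoded directly in Python in the propositional special case, each ground literal being an opaque string (a hedged sketch of the data only: the first-order structure of the terms is not interpreted here, and the paper's actual implementation is in CLP):

```python
# Propositional encoding of the Table 1 context (ground literals as strings).
context = {
    "Acc1": {"People(Crew(BO727))", "People(ATC)", "Aircraft(BO727,TkOff)",
             "Aircraft(DC9,TkOff)", "Scen(Airmissground)"},
    "Acc2": {"People(ATC)", "People(Crew(MD82))", "Aircraft(BO727,Landing)",
             "Aircraft(MD82,Landing)", "Scen(Airmissground)"},
    "Acc3": {"People(ATC)", "Aircraft(BO727,Landing)", "Aircraft(DC9,TkOff)",
             "Scen(Airmissground)"},
    "Acc4": {"Aircraft(MD82,Landing)", "Scen(Windshear)"},
}

# In this purely propositional shadow, the meet of two descriptions is plain
# set intersection (the cube meet ∩c would also keep first-order generalizations):
print(sorted(context["Acc2"] & context["Acc3"]))
# ['Aircraft(BO727,Landing)', 'People(ATC)', 'Scen(Airmissground)']
```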

Three predicates (People, Aircraft and Scen) are deduced from this table, the second one (Aircraft) having two arguments: the type of the aircraft and its flight phase. Each accident can then be described by a cube, and this table is considered as a context with four objects (the four accidents), each object being characterized by a cube.

Definition 10: A cubical context is a pair (O, ξ) where O is a finite set of objects and ξ is a mapping from O to Cr. Each object o in O has one and only one image p = ξ(o) in Cr.

As an example, the cubical context (O, ξ) extracted from Table 1 is defined by O = {Acc1, Acc2, Acc3, Acc4} and the mapping ξ from O to Cr such that:
ξ(Acc1) = {People(Crew(BO727)), People(ATC), Aircraft(BO727,TkOff), Aircraft(DC9,TkOff), Scen(Airmissground)}
ξ(Acc2) = {People(ATC), People(Crew(MD82)), Aircraft(BO727,Landing), Aircraft(MD82,Landing), Scen(Airmissground)}
ξ(Acc3) = {People(ATC), Aircraft(BO727,Landing), Aircraft(DC9,TkOff), Scen(Airmissground)}
ξ(Acc4) = {Aircraft(MD82,Landing), Scen(Windshear)}

We now have to define dual operators ′ and ◦ which make it possible to find the common knowledge of a set of objects and the set of objects sharing a given property.

Definition 11: The dual operators ′ and ◦ between O and ξ(O) are defined by:

A′ =def ∩c {ξ(oi) | oi ∈ A}  and  B◦ =def {oi ∈ O | B ≤c ξ(oi)}

Thanks to G-FCA, we know that the pair of operators ′ and ◦ is a Galois connection between O and ξ(O) ⊂ Cr. A proof is given in (Chaudron & Maille, 2000). For instance, we have:
{Acc1, Acc2}′ = ξ(Acc1) ∩c ξ(Acc2) = {People(ATC), People(Crew(x)), Aircraft(BO727,y), Aircraft(x,y), Scen(Airmissground)}
and:
{People(ATC), People(Crew(x)), Aircraft(BO727,y), Aircraft(x,y), Scen(Airmissground)}◦ = {Acc1, Acc2}

We are now equipped to define a concept of a cubical context as a couple of sets of objects and properties stable for the ′ and ◦ operators.

Definition 12: A cubical concept is a pair (A, B), A ⊂ O, B ∈ Cr, such that A′ = B (up to variable renaming) and B◦ = A. The set of all cubical concepts defined by the context (O, ξ) is denoted Lc.

({Acc1, Acc2}, {People(ATC), People(Crew(x)), Aircraft(BO727,y), Aircraft(x,y), Scen(Airmissground)}) is a concept of this cubical context. As in FCA, a lattice structure can be defined on the set of concepts Lc.

Proposition 2: For cubical concepts (A1, B1) and (A2, B2) the relation defined by: (A1, B1) ⊑ (A2, B2) ⇔ A1 ⊆ A2 (⇔ B2 ≤c B1) is an order relation on Lc.

Definition 13: The supremum ⊔ and infimum ⊓ are respectively defined on Lc as follows:
(A1, B1) ⊔ (A2, B2) =def ((A1 ∪ A2)′◦, B1 ∩c B2)
(A1, B1) ⊓ (A2, B2) =def (A1 ∩ A2, (B1 ∪c B2)◦′)

Theorem 3: (Lc, ⊑, ⊔, ⊓) is a complete lattice.
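A minimal sketch of the dual operators, again in the propositional special case: ∩c degrades to set intersection and B ≤c ξ(o) to set inclusion. Note that this shadow loses the first-order generalizations — the intersection of ξ(Acc1) and ξ(Acc2) below does not retain Aircraft(BO727,y) or People(Crew(x)), which the cube meet keeps:

```python
from functools import reduce

# Propositional shadow of the Table 1 context (ground literals as strings;
# an illustrative simplification, not the paper's CLP implementation).
context = {
    "Acc1": {"People(Crew(BO727))", "People(ATC)", "Aircraft(BO727,TkOff)",
             "Aircraft(DC9,TkOff)", "Scen(Airmissground)"},
    "Acc2": {"People(ATC)", "People(Crew(MD82))", "Aircraft(BO727,Landing)",
             "Aircraft(MD82,Landing)", "Scen(Airmissground)"},
    "Acc3": {"People(ATC)", "Aircraft(BO727,Landing)", "Aircraft(DC9,TkOff)",
             "Scen(Airmissground)"},
    "Acc4": {"Aircraft(MD82,Landing)", "Scen(Windshear)"},
}

def prime(A):
    """A′: the common literals of the objects in A (Definition 11)."""
    return reduce(set.intersection, (context[o] for o in A))

def circ(B):
    """B◦: the objects whose description contains B (Definition 11)."""
    return {o for o in context if B <= context[o]}

B = prime({"Acc1", "Acc2"})
print(sorted(B))        # ['People(ATC)', 'Scen(Airmissground)']
print(sorted(circ(B)))  # ['Acc1', 'Acc2', 'Acc3']
```

The second print shows that ({Acc1, Acc2}, B) is not a concept in this propositional shadow: B◦ strictly contains {Acc1, Acc2}, whereas the first-order intent of Definition 12 (with Aircraft(BO727,y), Aircraft(x,y), People(Crew(x))) closes back exactly on {Acc1, Acc2}.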


4.2 CFCA and KDD

Let us now exploit the concept lattice built upon our example. The 12 cubical concepts of Lc are:
1: ({Acc1, Acc2, Acc3, Acc4}, {Scen(x), Aircraft(y,z)})
2: ({Acc1, Acc2, Acc3}, {People(ATC), Aircraft(BO727,z), Scen(Airmissground)})
3: ({Acc2, Acc4}, {Scen(x), Aircraft(MD82,Landing)})
4: ({Acc2, Acc3}, {Scen(Airmissground), Aircraft(BO727,Landing), People(ATC)})
5: ({Acc1, Acc3}, {Scen(Airmissground), Aircraft(DC9,TkOff), Aircraft(BO727,z), People(ATC)})
6: ({Acc4}, {Aircraft(MD82,Landing), Scen(Windshear)})
7: ({Acc3}, {People(ATC), Aircraft(BO727,Landing), Aircraft(DC9,TkOff), Scen(Airmissground)})
8: ({Acc2}, {People(ATC), People(Crew(MD82)), Aircraft(BO727,Landing), Aircraft(MD82,Landing), Scen(Airmissground)})
9: ({Acc1}, {People(Crew(BO727)), People(ATC), Aircraft(BO727,TkOff), Aircraft(DC9,TkOff), Scen(Airmissground)})
10: ({}, {All-properties})
11: ({Acc2, Acc3, Acc4}, {Scen(x), Aircraft(y,Landing)})
12: ({Acc1, Acc2}, {People(ATC), People(Crew(x)), Aircraft(x,z), Aircraft(BO727,z), Scen(Airmissground)})

The Hasse diagram of the concept lattice shows all the links between the concepts (Figure 9):


Figure 9: Concept lattice diagram

Clearly, as {Acc1, Acc2} is the extent of a concept, these two accidents are correlated, and it is interesting to notice that the intent reveals that in both cases a B727 was involved with another aircraft x in a collision scenario; moreover they

were together in the same flight phase y (thus maybe contributing to an overload of the ATC's workload) and the crew of the second aircraft, Crew(x), was also involved. This kind of mixed link was hidden by the propositional representation and is revealed by the cubical formal analysis. Such knowledge mining through incident/accident reports is of major interest for aeronautical safety programs. Moreover, as the cube model allows the description of differences between cubes, we can use this facility to characterize differences between clusters. As an example, if we consider the concepts 5 and 12, we have:
c1 = {Acc1, Acc3}′ = {Scen(Airmissground), Aircraft(DC9,TkOff), Aircraft(BO727,z), People(ATC)}
c2 = {Acc1, Acc2}′ = {People(ATC), People(Crew(x)), Aircraft(x,z), Aircraft(BO727,z), Scen(Airmissground)}
Their common knowledge is captured by the cube {People(ATC), Aircraft(BO727,z), Scen(Airmissground)} (see concept 2) and their differences are described by:
c1 ⊖ c2 = { {Aircraft(DC9,x)}, {Aircraft(x,TkOff)}, {Aircraft(DC9,TkOff)} }
c2 ⊖ c1 = { {People(Crew(x))}, {People(Crew(x)), Aircraft(x,z)} }
Hence, the main characteristic of concept 5 compared to concept 12 is that it involves a DC9 taking off (TkOff). Dually, the identity of concept 12 relative to concept 5 is that the crew of one aircraft bears a responsibility in the accident. Therefore the CFCA makes it possible to cluster knowledge based on conjunctions of properties represented in a first-order language. As all the intents of the concepts are described by cubes, their similarities and differences can be studied within a symbolic space with the operators of the cube model. Methodologies for identifying relevant clusters in a cubical concept lattice are currently under study. They may allow us to find sets of accident reports which are both intrinsically homogeneous and well separated.
Beyond the CFCA capabilities in terms of symbolic clustering, the discovery of links and rules among heterogeneous symbolic data is a main need of a tremendous number of applications. This is the topic of the next section.


5 Application to "Cubical Rule Induction"

The aim of this section is to present a rule induction approach for symbolic data. It is based on Generalized Formal Concept Analysis -more precisely on the Cubical Formal Concept Analysis presented in the previous section- enriched with Inductive Logic Programming features. The rule status is first defined and algorithms are proposed so as to produce checked symbolic rules. Second, we show how to gradate the rule space using CFCA again in a bootstrap step; this rule classification makes it possible to evaluate the scope of each rule among the objects. Indeed, a specialist of the concerned domain examining his data with our tool can have a global view of the syntactic causal relations, the stratification of this set, and a graphical representation (via the rule concept lattice) of this classification.

5.1 Issues

The idea is to take advantage of two approaches: in ILP, the efficiency of the rule production; in FCA, the mathematical soundness. The Galois connection defines a sound rule production method that we extended thanks to ILP methodologies. The drawback of ILP lies in the bias of the rule production process, while the rigidity of the FCA rule criterion (a rule must be totally justified to be produced) needs to be softened. This is the purpose of this work: inspired by the ILP facilities, the CFCA capabilities are extended to a new production and classification mechanism, the Formal Rule Analysis (FRA) (et M. Boyer, 2002). The detailed proofs and references can be found in (Boyer, 2001).

• Inductive Logic Programming, or ILP (Muggleton & de Raerdt, 1994), is a research area formed at the intersection of Machine Learning and Logic Programming. ILP systems develop predicate descriptions from examples and background knowledge. The examples, background knowledge and final descriptions are all described as logic programs. We analyzed ILP capabilities through different approaches, in particular "Claudien", a software developed by a Louvain University team (Dehaspe et al., 1996). Given a positive example set, a negative example set and background knowledge, ILP systems can produce a theory T constituted by a set of attribute links. Unfortunately, ILP needs bias (information limiting the search space), which increases the expert's influence on the result of the mining.

• Rule induction by Galois connection is a production principle described in 1986 in the famous article (Guigues & Duquenne, 1986) and developed at the end of the 1980s by Burmeister and Wille (Burmeister, 2000). The basic principle relies on a fundamental lemma which is a consequence of the classical Galois connection properties10. Using the CFCA notations, for a given context (O, ξ) (see Definition 10), the following lemma is stated:

Lemma 4: (∀A ∈ Cr), |= A → A◦′.

Figure 10: The FCA rule principle

In fact, a more relevant rule should be: (∀ϕ ∈ (A◦′ ⊖ A)) |= A → ϕ, as in traditional FCA the rule principle is summarized as A → (A′′ − A) (see Figure 10). In the sequel, following the reserves of Definition 9, we refer to the general pattern of Lemma 4, but we will apply the subtraction in each non-ambiguous case.
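Lemma 4's production principle (A → A◦′, in practice A → (A◦′ − A)) can be sketched on the propositional shadow of the Table 1 context, with the operators of Definition 11 reduced to intersection and inclusion — an illustrative simplification, not the CLP saturation algorithm of the next paragraph:

```python
from functools import reduce

# Propositional shadow of the Table 1 context (ground literals as strings).
context = {
    "Acc1": {"People(Crew(BO727))", "People(ATC)", "Aircraft(BO727,TkOff)",
             "Aircraft(DC9,TkOff)", "Scen(Airmissground)"},
    "Acc2": {"People(ATC)", "People(Crew(MD82))", "Aircraft(BO727,Landing)",
             "Aircraft(MD82,Landing)", "Scen(Airmissground)"},
    "Acc3": {"People(ATC)", "Aircraft(BO727,Landing)", "Aircraft(DC9,TkOff)",
             "Scen(Airmissground)"},
    "Acc4": {"Aircraft(MD82,Landing)", "Scen(Windshear)"},
}

def circ(B):
    return {o for o in context if B <= context[o]}

def prime(A):
    return reduce(set.intersection, (context[o] for o in A)) if A else set()

def induced_rule(A):
    """The rule A → (A◦′ − A): valid on every object of the context."""
    return A, prime(circ(A)) - A

ant, cons = induced_rule({"People(ATC)"})
print(sorted(cons))  # ['Scen(Airmissground)'] — this is rule reg(1) below
```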

Rule Production Using CFCA: A saturation rule production algorithm has been implemented in CLP11 so as to provide the set of all the rules that are verified and not refuted by all the objects of the context O.

Example: back to the incident analysis context (cf. 4.1 and Definition 10), 24 rules are CFCA-induced. A rule is saved in the database in a predicate RULE(identifier, ,consequence):

RULE(reg(24),,People(Crew(MD82)))
...
RULE(reg(7),,People(ATC))
...
RULE(reg(1),,Scen(Airmissground))

10: (∀A ∈ Cr), A ≤c A◦′ ((Chaudron & Maille, 2000), def. 2).
11: The algorithm is based on: the incrementation of the cardinality of A; the Galois connection processing of the rule A → A◦′; a Horn clause splitting of the rule; and the update of the rule database. The programming features will not be detailed here.

One must notice that CFCA guarantees that no rule can be induced if there exists at least one counter-example in the base. This drawback is solved in the ILP methodology: thanks to different parameters, rules which are refuted by a limited number of objects (say "additional rules") can be ILP-induced. Conversely, a CFCA-induced rule based on one unique example is as credible as a rule based on a majority of examples. This second drawback must be solved thanks to a classification means. The situation can be summarized as follows: the finite set (considering that no functional term is in the language) of all subsets of literals generated by the context (O, ξ) determines the finite set of all possible rule productions (i.e. Horn clauses), say Rmax12; the ILP-induced rules, say RILP, form a subset of Rmax, and the CFCA-induced rules, RCFCA, form a subset of the former (Figure 11).


Figure 11: The different rule sets

Two aims appear: ① how to define and control in a simple way the relevance of any rule; ② how to control the extension of RCFCA to the "additional rules" of RILP. In the sequel, aim ① will be solved thanks to two parameters. These criteria will allow us to control the extension of aim ②. Finally we will apply CFCA itself on RCFCA and design a symbolic classification that will respect the criteria defined for ①.

12: The symbol max does not mean that Rmax is infinite; it aims at giving the idea that it is the greatest possible set of rules, and it can be very large.

5.2 Rule classification

In order to distinguish belief relations between rules, Stumme (1999) proposes a numerical approach calculating two numbers: support and confidence. We propose a slight generalization of these definitions. Let ant and cons be two operators returning respectively the antecedent and the consequence of a given rule.

Definitions 14:
supp : Rmax → IR, r ↦ |(ant(r) ∪c cons(r))◦| / |O|
conf : Rmax → IR, r ↦ |(ant(r) ∪c cons(r))◦| / |(ant(r))◦|

Of course, given r ∈ Rmax, if |(ant(r))◦| = 0, i.e. if no object of the context verifies the antecedent of the rule, conf(r) is taken to be equal to 0. Two trivial results can be stated:

Propositions 5: (∀r ∈ Rmax) 0 ≤ supp(r) ≤ conf(r) ≤ 1 and (∀r ∈ RCFCA) conf(r) = 1.

In our example, CFCA rule reg(1): People(ATC) → Scen(Airmissground) is verified by objects Acc1, Acc2 and Acc3, thus supp(reg(1)) = 0.75. Thanks to the two criteria supp and conf, it is easy to define a new map of Rmax and to give a first answer to aim ①: in Figure 12, the set RCFCA of strongly validated rules (conf(r) = 1) is represented by a bricked area. The set RILP is not detailed in the rest of Rmax, but thanks to this double scaling a subset of additional rules can be cautiously defined and produced. A threshold ρ ∈ [0, 1] for supp will allow us to select the rules that have a sufficient validity support on the objects (both in RCFCA and in all of Rmax). A second threshold σ ∈ [0, ρ] for conf will allow us to bound the confidence at the desired level. Thus aim ② is achieved by the determination of the set: Rσρ = {r ∈ Rmax | ρ ≤ supp(r) and σ ≤ conf(r) < 1}.
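On the propositional shadow of the context, Definitions 14 can be sketched directly: |(ant ∪ cons)◦| counts the objects verifying the whole rule and |ant◦| those verifying the antecedent (a hedged illustration; the paper's implementation is in CLP):

```python
# Propositional shadow of the Table 1 context (ground literals as strings).
context = {
    "Acc1": {"People(Crew(BO727))", "People(ATC)", "Aircraft(BO727,TkOff)",
             "Aircraft(DC9,TkOff)", "Scen(Airmissground)"},
    "Acc2": {"People(ATC)", "People(Crew(MD82))", "Aircraft(BO727,Landing)",
             "Aircraft(MD82,Landing)", "Scen(Airmissground)"},
    "Acc3": {"People(ATC)", "Aircraft(BO727,Landing)", "Aircraft(DC9,TkOff)",
             "Scen(Airmissground)"},
    "Acc4": {"Aircraft(MD82,Landing)", "Scen(Windshear)"},
}

def circ(B):
    return {o for o in context if B <= context[o]}

def supp(ant, cons):
    """Fraction of objects verifying antecedent and consequence together."""
    return len(circ(ant | cons)) / len(context)

def conf(ant, cons):
    """Among the objects verifying the antecedent, fraction verifying the rule."""
    denom = len(circ(ant))
    return len(circ(ant | cons)) / denom if denom else 0.0

ant, cons = {"People(ATC)"}, {"Scen(Airmissground)"}   # rule reg(1)
print(supp(ant, cons), conf(ant, cons))  # 0.75 1.0
```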



Figure 12: The set Rmax

Algorithms have been defined and software has been implemented in CLP, so as to induce Rσρ in a way consistent with the ILP methodology. The approach and results are detailed in (Boyer, 2001). The rule database is extended and weighted by supp and conf so as to provide a new rule database: RULEW(identifier w, ,consequence,supp,conf)

Example: the following set is the rule database RCFCA ∪ R0.50, i.e., whatever the support is (ρ = 0), the confidence is greater than 0.5 (Figure 13):


Figure 13: Example of the set RCFCA ∪ R0.50

In RCFCA ∪ R0.50, one can recognize the CFCA rules as their confidence is

equal to 1:
RULEW(regw(36),,Aircraft(MD82,Landing),0.25,1)
RULEW(regw(35),,Scen(Windshear),0.25,0.5)
RULEW(regw(34),,Scen(Airmissground),0.25,0.5)
RULEW(regw(33),,Aircraft(BO727,Landing),0.25,0.5)
RULEW(regw(32),,People(ATC),0.25,0.5)
RULEW(regw(31),,People(Crew(MD82)),0.25,0.5)
RULEW(regw(30),,Aircraft(BO727,Landing),0.25,0.5)
RULEW(regw(29),,Aircraft(BO727,TkOff),0.25,0.5)
RULEW(regw(28),,People(Crew(BO727)),0.25,0.5)
RULEW(regw(27),,Aircraft(MD82,Landing),0.25,0.5)
RULEW(regw(26),,Aircraft(DC9,TkOff),0.25,0.5)
RULEW(regw(25),,People(Crew(MD82)),0.25,0.5)
RULEW(regw(24),,People(Crew(MD82)),0.25,1)
RULEW(regw(23),,People(Crew(MD82)),0.25,0.5)
RULEW(regw(22),,Aircraft(MD82,Landing),0.25,1)
RULEW(regw(21),,Scen(Airmissground),0.25,1)
RULEW(regw(20),,Aircraft(BO727,Landing),0.25,1)
RULEW(regw(19),,People(ATC),0.25,1)
RULEW(regw(18),,Scen(Airmissground),0.25,1)
RULEW(regw(17),,Aircraft(DC9,TkOff),0.25,1)
RULEW(regw(16),,People(ATC),0.25,1)
RULEW(regw(15),,People(Crew(BO727)),0.25,1)
RULEW(regw(14),,Scen(Airmissground),0.25,1)
RULEW(regw(13),,Aircraft(DC9,TkOff),0.25,1)
RULEW(regw(12),,People(ATC),0.25,1)
RULEW(regw(11),,Aircraft(BO727,TkOff),0.25,1)
RULEW(regw(10),,Aircraft(DC9,TkOff),0.5,0.667)
RULEW(regw(9),,Aircraft(BO727,Landing),0.5,0.667)
RULEW(regw(8),,Scen(Airmissground),0.5,1)
RULEW(regw(7),,People(ATC),0.5,1)
RULEW(regw(6),,Scen(Airmissground),0.5,1)
RULEW(regw(5),,People(ATC),0.5,1)
RULEW(regw(4),,Aircraft(DC9,TkOff),0.5,0.667)
RULEW(regw(3),,Aircraft(BO727,Landing),0.5,0.667)
RULEW(regw(2),,People(ATC),0.75,1)
RULEW(regw(1),,Scen(Airmissground),0.75,1)


5.3 Symbolic approach for rule classification

The numerical supp and conf approach gives a useful tool to generate and compare the rules induced by a context. Unfortunately, our very first postulate (cf. Introduction), "rely entirely on symbolic data", is refuted here. The idea was thus to go deeper into the meaning of the numbers (supp, conf). Let us simplify and consider first only the support of the rules: it is easy to see that rule regw(1) (0.75) seems to be more efficient than rule regw(36) (0.25) as its spectrum on O is wider, but what are the reasons for this difference? Is rule regw(11) (0.25) in the same situation as regw(36)? Is it possible to have an explicit view of the range of a rule on O? The idea is to perform a self-analysis by CFCA: given a subset of rules R ⊂ Rmax, each object o in O verifies each rule r or not. Thus, it is possible to directly define a new classical FCA context, say the "Rule context", in which the "objects" are the rules of R and the "properties" are the objects of O (Table 2). Back to our example, if the subset of rules is the previous database RCFCA ∪ R0.50, we have the classical FCA context:

      1  2  3  4  5  6  ... 35 36
Acc1  ×  ×  ...
Acc2  ×  ×  ...
Acc3  ×  ×  ...
Acc4  ...            ×  ×

Table 2: The RCFCA ∪ R0.50 context.
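The bootstrap step above can be sketched as follows, again on the propositional shadow of the context: a rule becomes an object of the new context, an accident becomes a property, and a cell is marked when the accident satisfies both the antecedent and the consequence of the rule (matching the supp counting). The antecedent of reg(1) is the one given in section 5.2; the second rule is a hypothetical example, since the antecedents are not listed in the RULEW database above.

```python
# Propositional shadow of the Table 1 context (ground literals as strings).
context = {
    "Acc1": {"People(Crew(BO727))", "People(ATC)", "Aircraft(BO727,TkOff)",
             "Aircraft(DC9,TkOff)", "Scen(Airmissground)"},
    "Acc2": {"People(ATC)", "People(Crew(MD82))", "Aircraft(BO727,Landing)",
             "Aircraft(MD82,Landing)", "Scen(Airmissground)"},
    "Acc3": {"People(ATC)", "Aircraft(BO727,Landing)", "Aircraft(DC9,TkOff)",
             "Scen(Airmissground)"},
    "Acc4": {"Aircraft(MD82,Landing)", "Scen(Windshear)"},
}

rules = {
    "reg(1)": ({"People(ATC)"}, {"Scen(Airmissground)"}),          # from section 5.2
    "hyp":    ({"Scen(Windshear)"}, {"Aircraft(MD82,Landing)"}),   # hypothetical rule
}

def verifies(o, ant, cons):
    """Object o is a 'property' of the rule when it satisfies ant and cons."""
    return ant | cons <= context[o]

# The rule context: each rule is mapped to the set of accidents marking it.
rule_context = {r: {o for o in context if verifies(o, ant, cons)}
                for r, (ant, cons) in rules.items()}

print(sorted(rule_context["reg(1)"]))  # ['Acc1', 'Acc2', 'Acc3']
```

Applying classical FCA to this binary rule context then yields the Rule Lattice discussed below.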

Formal Concept Analysis of this new "Rule Context" provides a "Rule Lattice" (Figure 14). The Rule Lattice is labeled with the rule numbers, allowing a visualization of the hierarchy of induced rules. It is easy to see that the rules 1 and 2 appear at the bottom of the lattice: they are the most informative rules, even though the rules 5, 3, ..., 10 concern only two accidents (but not in the same way). We notice that this hierarchy matches exactly the support values (0.75 for rules 1 and 2, 0.5 for the others). Let