Prolog based Description Logic Reasoning

Gergely Lukácsy, Péter Szeredi and Balázs Kádár
Budapest University of Technology and Economics
Department of Computer Science and Information Theory
1117 Budapest, Magyar tudósok körútja 2., Hungary
{lukacsy,szeredi}@cs.bme.hu

Abstract. In this paper we present the recent developments of the DLog system, an ABox reasoning engine for the SHIQ description logic language. DLog differs from traditional description logic reasoners in that it transforms description logic axioms into a Prolog program. The transformation is done independently of the ABox, i.e. the set of assertions about the individuals. This makes it possible to store the ABox assertions in a database, rather than in memory. This approach results in better scalability and helps in using description logic ontologies directly on top of existing information sources. The transformation involves several optimisation steps, which aim at producing more efficient Prolog programs. In this paper we focus on the partial evaluation technique we apply to obtain programs that do not use logic variables. This simplifies the implementation, improves performance and opens up the possibility of compiling into Mercury code. In the paper we also present the recent architectural changes in the DLog system, summarise the most important features of the implementation and evaluate the performance of DLog by comparing it to the best available description logic reasoners.

Keywords: description logic, logic programming, resolution, large data sets

1 Introduction

Description Logics (DLs) are becoming widespread as more and more systems start using semantic technologies. Similarly to [1], the motivation for our work comes from the realisation that DLs are, or soon will be, used to reason on large amounts of data. On the Web, for example, we already have tremendous amounts of meta-information. Obviously, such information sources cannot be stored directly in memory. Thus, we are interested in querying DL concepts where the assertions about the individuals – the so called ABox – may be stored externally, e.g. in databases. We found that most existing DL reasoners are not suitable for this task, as the traditional algorithms for querying DL concepts need to examine the whole ABox to answer a query. This results in scalability problems and undermines the point of using databases.

We have developed an approach where the inference algorithm is divided into two phases. From the given terminological knowledge, without accessing the underlying data set, we first create a query-plan, in the form of a Prolog program. Subsequently, this query-plan can be run on the ABox data, to obtain the required results. This algorithm has been incorporated in the ABox reasoning engine DLog, which is available at http://dlog-reasoner.sourceforge.net. In this paper we focus on the recent developments of the DLog system, which include a partial evaluation technique we apply to avoid using logic variables in the Prolog programs generated, as well as the architectural redesign of the system. The complete description of other aspects of DLog, including the detailed explanation of the optimisation techniques, can be found in [2]. This paper is structured as follows. In Section 2 we introduce Description Logics and summarise existing theorem proving approaches for DLs. Section 3 gives an overview of the DLog approach. Section 4 discusses a new optimisation technique, called unfolding. Section 5 presents the architecture and the implementation details of the DLog server extension. Finally, before concluding the paper, we compare the performance of DLog with other reasoning systems.

2 Preliminaries and Related Work

Description Logics [3] are a family of simple logic languages used for knowledge representation. DLs are used for describing various kinds of knowledge of a selected field as well as of general nature. The Description Logic approach uses concepts to represent sets of objects, i.e. unary relations, and roles to describe binary relations between objects. Objects are the instances occurring in the modelled application field, and thus are also called instances or individuals.

A DL knowledge base is a set of DL axioms consisting of two disjoint parts: the TBox and the ABox. The TBox (terminology box) contains terminological axioms, such as C ⊑ D (concept C is subsumed by concept D). The ABox (assertion box) stores knowledge about the individuals in the world, e.g. the concept assertion C(i) states that individual i is an instance of the concept C. Concepts and roles may either be atomic or composite. A composite concept is built from atomic concepts using constructors. The expressiveness of a DL language depends on the constructors allowed for building composite concepts or roles. In this paper we use the DL language SHIQ, which is one of the most widely used DL variants. For more details we refer the reader to the first two chapters of [3].

In this paper, we will deal with two ABox-reasoning problems: instance check and instance retrieval. In an instance check problem, a query-concept C and an individual i are given. The question is whether C(i) is entailed by the TBox and the ABox. In an instance retrieval problem the task is to retrieve all the instances of a query-concept C, entailed by the given TBox and ABox.

Traditionally, ABox-reasoning is based on the tableau inference algorithm. An individual i is inferred to be an instance of a concept C, if the tableau algorithm reports inconsistency for the given TBox and ABox, when the latter is extended with the indirect assertion ¬C(i). This approach cannot be directly used for high volume instance retrieval, because it requires checking all instances in the ABox. Novel techniques have been developed recently, such as [4], to overcome this drawback of the tableau approach. These techniques have been incorporated in the state-of-the-art DL reasoners, such as RacerPro and Pellet, the two tableau reasoners used in our performance evaluation in Section 6.

In [5], a resolution-based inference algorithm is described, which is not as sensitive to the increase of the ABox size as the tableau-based methods. However, this approach still requires the input of the whole content of the ABox before attempting to answer any queries. The KAON2 system [1] provides an implementation of this approach.

Article [6] introduces the term Description Logic Programming. This idea uses a direct transformation of ALC concepts into definite Horn-clauses, and poses some restrictions on the form of the knowledge base, which disallow axioms requiring disjunctive reasoning. Further important work on Description Logic Programming includes [7,8,9].

The Prolog Technology Theorem Prover approach (PTTP) was developed by Mark E. Stickel in the late 1980s [10], providing a theorem prover for First Order Logic (FOL) on top of Prolog. This means that an arbitrary FOL formula is transformed into a set of Horn-clauses, and FOL reasoning is performed using Prolog execution. In PTTP, each first order clause gives rise to a number of Horn-clauses, the so-called contrapositives. By using contrapositives each literal of a FOL clause will appear in the head of a Horn clause, ensuring that it can participate in a resolution step in spite of the restricted selection rule of Prolog.

In the PTTP approach, ancestor resolution is used instead of the factoring inference rule. Ancestor resolution is implemented in Prolog by building an ancestor list which contains open predicate calls (i.e. calls which were entered or reentered, but have not been exited yet, according to the Procedure-Box model of Prolog execution). If the ancestor list contains a literal which can be unified with the negation of the current goal literal, then the goal literal succeeds and the unification with the ancestor element is performed. Note that in order to retain completeness, as an alternative to ancestor resolution, one has to try to prove the current goal using normal resolution as well. There are two further features to make the PTTP approach complete. First, to avoid infinite loops, iterative deepening is used as opposed to the standard depth-first Prolog search strategy. Second, in contrast with Prolog, PTTP uses occurs check during unification.
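The generation of contrapositives described above can be illustrated with a minimal sketch. It assumes a clause is represented as a list of literals, with negation written as not/1; this representation and the predicate names negate/2 and contrapositive/3 are illustrative assumptions and are not taken from PTTP or DLog.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% negate(+Literal, -Negated): flip the sign of a literal.
% Negation is written as not/1 (an assumption of this sketch).
negate(not(A), A) :- !.
negate(A, not(A)).

% contrapositive(+Clause, -Head, -Body):
% Clause is a list of literals read as a disjunction. On backtracking,
% each literal becomes the Head in turn; the Body consists of the
% negations of all the remaining literals.
contrapositive(Clause, Head, Body) :-
    select(Head, Clause, Rest),
    maplist(negate, Rest, Body).
```

For a clause with n literals, a query such as `?- contrapositive([not(p), q], H, B).` enumerates n Head/Body pairs, one for each literal of the clause.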

3 An Overview of the DLog Approach

In this section we give a high level overview of the DLog reasoner. Let us consider the following DL knowledge base example:

1  ∃hasFriend.Alcoholic ⊑ ¬Alcoholic
2  ∃hasParent.¬Alcoholic ⊑ ¬Alcoholic
3  hasParent(i1, i2)   hasParent(i1, i3)   hasFriend(i2, i3)

The axiom in line 1 states that if someone has a friend who is alcoholic, then he is not alcoholic. Line 2 states that if someone has a parent who is not alcoholic, then he is not alcoholic either. The ABox in line 3 contains assertions for the hasParent and hasFriend relations, but nothing about someone being alcoholic or non-alcoholic. Interestingly, it is possible to conclude that i1 is non-alcoholic, as one of his parents has to be non-alcoholic: if i3 is alcoholic, then i2, having an alcoholic friend, is not, and otherwise i3 itself is the non-alcoholic parent. The common property of such problems is that solving them requires case analysis, and therefore the trivial Prolog translation usually does not work.

The first step of our sound and complete SHIQ to Prolog transformation process is to convert a SHIQ knowledge base to a set of first order clauses of a specific form. Here we rely on the saturation techniques described in [1] and [11]. In the present paper we only make use of the fact that the output of these transformations takes a specific form: [1] and [11] prove that for an arbitrary SHIQ knowledge base KB, the resulting set of first-order clauses, denoted by DL(KB), only contains clauses of the form listed in Figure 1.

(1) ¬R(x, y) ∨ S(y, x)
(2) ¬R(x, y) ∨ S(x, y)
(3) P(x)
(4) ⋁_{i,j,k} ¬R_k(x_i, x_j) ∨ ⋁_i P(x_i) ∨ ⋁_{i,j} (x_i = x_j)
(5) R(a, b)
(6) C(a)

Fig. 1. The format of FOL clauses generated from SHIQ knowledge bases

Here clause types (1)–(4) correspond to the TBox, while (5) and (6) are ABox clause templates. P(x) denotes a nonempty disjunction of possibly negated unary literals: (¬)P_1(x) ∨ . . . ∨ (¬)P_n(x). A clause of type (4) has further properties: it contains at least one negative binary literal, and at least one unary literal, but the set of variable equalities may be empty. Also, its negative binary literals contain all the variables of the clause. Furthermore, if we build a graph from the binary literals by converting R(x, y) to an edge x → y, this graph will be a tree.

Note that, in contrast with [1], all clauses containing function symbols are eliminated: the resulting clauses can be resolved further only with ABox clauses. This forms the basis of a pure two phase reasoning framework, where every possible ABox-independent reasoning step is performed before accessing the ABox itself, allowing us to store the content of the ABox in an external database.

Actually, in the general transformation, we use only certain properties of the clauses in Figure 1. These properties are satisfied by a subset of first order clauses that is, in fact, larger than the set of clauses that can be generated from a SHIQ KB. We call these clauses DL clauses. As a consequence of this, our results can be used for DL knowledge bases that are more expressive than SHIQ. This includes the use of certain role constructors, such as union. Furthermore, some parts of the knowledge base can be supplied by the user directly in the form of first order clauses. More details can be found in [2].

As the clauses of a SHIQ knowledge base KB are normal first-order clauses, we can apply the PTTP technology (cf. Section 2) directly on these. This involves the generation of contrapositives of DL(KB), which also requires the introduction of new predicate names for negated literals.

We have simplified the PTTP approach for the special case of DL clauses. The following list is a brief summary of the principles we use in the execution of DL predicates, in comparison with their counterparts in PTTP:

– DLog uses normal Prolog unification, rather than occurs check;
– DLog uses loop elimination, instead of iterative deepening;
– DLog eliminates contrapositives with negated binary literals in the head;
– DLog does not apply ancestor resolution for roles;
– DLog uses deterministic ancestor resolution.

In [2], we have proved that these modifications result in a reasoner on DL clauses, which is sound and complete. We have implemented the specialised PTTP approach as follows. First, we transform the DL clauses to a DL predicate format simply by generating all contrapositives and grouping these into predicates. For very simple knowledge bases, requiring neither ancestor resolution nor loop elimination, the DL predicate translation produces sound, executable Prolog code. For more complex knowledge bases, such as the alcoholic example, one has to include loop elimination and ancestor resolution in the DL predicates themselves.¹ The complete and formalised transformation process is presented in [2]. As an example, the DL predicate format of the above alcoholic problem is shown below:

1  alcoholic(A) :- hasParent(B, A), alcoholic(B).
2  not_alcoholic(A) :- hasParent(A, B), not_alcoholic(B).
3  not_alcoholic(A) :- hasFriend(A, B), alcoholic(B).
4  not_alcoholic(A) :- hasFriend(B, A), alcoholic(B).
5  hasParent(i1, i2). hasParent(i1, i3). hasFriend(i2, i3).

Figure 2 shows the executable Prolog code generated for the DL predicate alcoholic, as shown in line 1 above. Lines 1 and 2 of the figure implement loop elimination and ancestor resolution, respectively. Line 3 is derived from the single DL clause of alcoholic, by extending the head and appropriate body calls with an additional argument, storing the ancestor list. Note that an additional clause is required in the Prolog code, if the ABox contains assertions for the given unary predicate. For example, if there were assertions for alcoholic in the ABox, then the alcoholic predicate would have a fourth clause of form:

alcoholic(A, _) :- alcoholic(A).

¹ Another option is to use an interpreter catering for loop elimination and ancestor resolution, see [12].


1  alcoholic(A, B) :- member(C, B), C == alcoholic(A), !, fail.
2  alcoholic(A, B) :- memberchk(not_alcoholic(A), B).
3  alcoholic(A, B) :- C = [alcoholic(A)|B], hasParent(D, A), alcoholic(D, C).

Fig. 2. The Prolog translation of the predicate alcoholic.
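To make the interplay of loop elimination and ancestor resolution concrete, the following sketch completes the alcoholic example into a runnable program. The clauses of alcoholic/2 are those of Fig. 2; the clauses of not_alcoholic/2 are written here by analogy, following the same three-part pattern, and are an assumption of this sketch rather than the verbatim output of DLog.

```prolog
:- use_module(library(lists)).

% ABox facts of the example knowledge base.
hasParent(i1, i2).
hasParent(i1, i3).
hasFriend(i2, i3).

% alcoholic/2 as in Fig. 2: loop elimination, ancestor resolution,
% then the single DL clause extended with the ancestor list.
alcoholic(A, B) :- member(C, B), C == alcoholic(A), !, fail.
alcoholic(A, B) :- memberchk(not_alcoholic(A), B).
alcoholic(A, B) :- C = [alcoholic(A)|B], hasParent(D, A), alcoholic(D, C).

% not_alcoholic/2, built by analogy (an assumption of this sketch).
not_alcoholic(A, B) :- member(C, B), C == not_alcoholic(A), !, fail.
not_alcoholic(A, B) :- memberchk(alcoholic(A), B).
not_alcoholic(A, B) :- C = [not_alcoholic(A)|B], hasParent(A, D), not_alcoholic(D, C).
not_alcoholic(A, B) :- C = [not_alcoholic(A)|B], hasFriend(A, D), alcoholic(D, C).
not_alcoholic(A, B) :- C = [not_alcoholic(A)|B], hasFriend(D, A), alcoholic(D, C).
```

An instance check is posed with an empty ancestor list, e.g. `?- not_alcoholic(i1, []).` The proof goes through the parent i2 and its friend i3: the attempt to show that i3 is alcoholic eventually calls alcoholic(i1, ...) with not_alcoholic(i1) on the ancestor list, where ancestor resolution (the second clause) succeeds. This is the Prolog counterpart of the case analysis on i3.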

4 Unfolding

The present section discusses an important optimisation in the translation of DL predicates to Prolog code. This transformation uses well known partial evaluation techniques to produce Prolog code that is both more efficient and uses simpler data structures, relying on the specific features of DL predicates.

4.1 Motivation and Goals

Recall that DL predicates contain body goals with at most two arguments, and that only unary predicates require ancestor resolution and loop elimination. These two execution elements rely on maintaining a list of (unary) ancestor goals, which, in general, are not necessarily ground. Non-ground ancestors can only be created in the early phase of execution. As soon as a binary goal exits successfully, or a unary goal succeeds and instantiates its argument, the query variable is instantiated and from this point onwards all unary goal invocations, as well as the ancestor list, are ground. This is because such goals are brought to the front of the clause body by the goal ordering algorithm, as described in [2].

The main goal of the unfolding transformation is to eliminate the phase involving non-ground ancestors. This is achieved by repeatedly unfolding clauses with unary goals only, until an invocation of either a binary goal, or of a unary ABox predicate, i.e. a predicate which is defined solely by ABox facts, appears in the body. Naturally, in the process of unfolding both ancestor resolution and loop elimination have to be taken care of.

This transformation has several advantages. Performing ancestor resolution at compile time obviously saves runtime. What is more important, it may well be the case that ancestor resolution can be fully eliminated for certain predicates, thus also avoiding the need for building ancestor lists for these predicates. Even when ancestor lists are needed, they are ground, and thus no logic variables occur in the DLog code. This means the DL predicates can potentially be compiled to Mercury, rather than to plain Prolog, with obvious efficiency implications. Unfolding also opens the possibility for a more efficient implementation of ancestor storage, such as hash tables.²

² Hash tables have already been introduced in the previous version of DLog. However, there we had to rely on the so-called superset transformation, cf. Section 3, to ensure that all unary goals are ground. By the use of unfolding, the calculation of the superset, a potentially expensive operation, can be avoided.


4.2 An Example

Let us consider the following simple TBox:

¬Alcoholic ⊒ Worried ⊓ Happy        (1)
Happy ⊒ Worried ⊓ ∃hasFriend.⊤      (2)
Worried ⊒ ¬Happy                    (3)
Worried ⊒ ∃hasFriend.Alcoholic      (4)

The DL predicates corresponding to the above TBox are shown below. Comments indicate which DL axiom gives rise to the given clause. Clauses which are contrapositives of “main” translations are marked with an asterisk.

not_alcoholic(X) :- worried(X), happy(X).              % (1)
not_alcoholic(Y) :- hasFriend(X, Y), not_worried(X).   % (4) *

happy(X) :- worried(X), hasFriend(X, _).               % (2)
happy(X) :- not_worried(X).                            % (3) *

worried(X) :- not_happy(X).                            % (3)
worried(X) :- hasFriend(X, Y), alcoholic(Y).           % (4)

not_worried(X) :- happy(X), alcoholic(X).              % (1) *
not_worried(X) :- not_happy(X), hasFriend(X, _).       % (2) *

not_happy(X) :- worried(X), alcoholic(X).              % (1) *

Note that the above code cannot be directly executed in Prolog. Because alcoholic can be called within not_alcoholic, the invocation of the latter has to be put on the ancestor list, while the definition of the former has to cater for ancestor resolution. The same holds for the happy–not_happy pair of predicates. Therefore, similarly to the Prolog code in Figure 2, a second argument is added to these predicates to store the ancestor list, and their definition is extended by a clause performing the ancestor resolution.

In contrast with this fairly complex translation, the unfolding optimisation results in Prolog code that does not require ancestor resolution. The executable Prolog code resulting from the optimisation is shown below, under the assumption that the ABox only contains facts for hasFriend and worried.

happy(A) :- hasFriend(A, _).

not_alcoholic(A) :- worried(A), hasFriend(A, _).
not_alcoholic(A) :- hasFriend(A, A).
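As a usage illustration, the unfolded program can be run directly against ABox facts, without any ancestor list argument. The ABox below (individuals a, b and c) is a hypothetical example made up for this sketch; only the three rules come from the text above.

```prolog
% Hypothetical ABox: facts for hasFriend/2 and worried/1 only.
hasFriend(a, b).
hasFriend(c, c).
worried(a).

% Unfolded program of Section 4.2: plain Prolog rules that need
% neither ancestor lists nor loop elimination.
happy(A) :- hasFriend(A, _).

not_alcoholic(A) :- worried(A), hasFriend(A, _).
not_alcoholic(A) :- hasFriend(A, A).
```

Instance retrieval is then an ordinary Prolog query, e.g. `?- not_alcoholic(X).`, which enumerates a (worried and having a friend) and c (a friend of itself) on backtracking.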

4.3 The Process of Transformation

The unfolding transformation takes a set of DL predicates and an ABox signature (the list of functors present in the ABox), and produces an equivalent set of annotated DL predicates, i.e. DL predicates in which each goal is associated with an explicit ancestor list. This list contains the terms that have to be added to the ancestor list, maintained by the DLog execution, when the given goal is invoked. Note that this annotation is implicit in the input DL predicates, as each goal is assumed to be annotated with its parent goal.

In general, each unary input predicate p/1 is duplicated in the output: there is an entry version, named p/1, and an inner version, named 'p$'/1. The inner version is equivalent to the original, while the entry version is a specialisation of the inner predicate, under the assumption that the ancestor list supplied to the predicate is empty. Note that this is the case when the given predicate is invoked from outside. The entry version of a predicate is omitted from the output, if it has no clauses, and the inner version is only included if it is invoked (perhaps indirectly, through other inner predicates) from one of the entry predicates.

For instance, consider the unfolded code of the example at the end of Section 4.2. This Prolog program does not contain the entry predicate not_worried/1 because all its clauses fail, when called with an empty ancestor list. Furthermore, the code contains no inner predicates, because none is invoked from the entry ones.

The transformation process consists of the following phases, discussed below in detail.

1. Equivalence transformations:
   (a) primary and secondary unfolding,
   (b) simplification
2. Specialisation of entry predicates
3. Composing the target program

Primary unfolding is a process applied to each unary DL predicate. We start with a most general invocation of the given predicate, say p(X), and repeatedly expand the goal sequence at hand by nondeterministically replacing a unary goal with a clause body of its definition. The expansion goes on until a binary or an ABox predicate invocation appears in the sequence (recall that this ensures that ancestor lists are ground). Depending on the option settings, the expansion will continue after this minimal objective is achieved, but only for those unary goals whose argument is the same variable as that of the original goal (i.e. X in our example). Having enumerated all expansions B of goal p(X), the set of clauses ‘p(X) :- B.’ forms the result of the primary unfolding.

For example, let us consider the following simple program:

p(X) :- q(X), r(X).
q(X) :- a(X).
q(X) :- b(X,Y), p(Y).

The minimal primary unfolding of p/1 is the following (assuming a/1 is an ABox predicate, and ignoring annotations for the moment):

p(X) :- a(X), r(X).
p(X) :- b(X,Y), p(Y), r(X).
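The minimal primary unfolding of this example can be reproduced with a small meta-program. This is only a sketch under simplifying assumptions: the program is stored as rule(Head, BodyList) facts, the leftmost expandable unary goal is chosen (not the one with the smallest definition), and ancestor lists, loop elimination and ancestor resolution are ignored, so it would not terminate on recursive unary-only definitions. The names rule/2, abox_pred/2, done/1, select_goal/4 and expand/2 are hypothetical and not part of DLog.

```prolog
:- use_module(library(lists)).

% The example program, stored as rule(Head, BodyGoals) facts.
rule(p(X), [q(X), r(X)]).
rule(q(X), [a(X)]).
rule(q(X), [b(X, Y), p(Y)]).

% a/1 is assumed to be an ABox predicate.
abox_pred(a, 1).

% done(+Goals): the minimal unfolding objective is met, i.e. the goal
% sequence contains a binary goal or an ABox predicate invocation.
done(Goals) :-
    member(G, Goals),
    functor(G, Name, Arity),
    (   Arity =:= 2
    ;   abox_pred(Name, Arity)
    ),
    !.

% select_goal(+Goals, -Pre, -Goal, -Post): split Goals around the
% leftmost goal that has at least one rule (simplified selection).
select_goal(Goals, Pre, G, Post) :-
    append(Pre, [G|Post], Goals),
    \+ \+ rule(G, _),
    !.

% expand(+Goals, -Unfolded): replace the selected goal by a clause
% body, nondeterministically, until the objective is met.
expand(Goals, Goals) :-
    done(Goals),
    !.
expand(Goals, Unfolded) :-
    select_goal(Goals, Pre, G, Post),
    rule(G, Body),
    append(Body, Post, Tail),
    append(Pre, Tail, Goals1),
    expand(Goals1, Unfolded).
```

The query `?- expand([p(X)], B).` enumerates the two bodies of the minimal primary unfolding on backtracking: first a(X), r(X) and then b(X,Y), p(Y), r(X).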

If multiple unary goals are available for expansion, as e.g. in q(X),r(X) after the first expansion of p(X), we select the goal whose definition is the smallest (in terms of clauses). Thus, if r/1 had only a single clause, then we would have chosen to expand the goal r(X), rather than q(X), in the above example.

In addition to using the clauses of its predicate definition, a goal can also be satisfied through ABox assertions or using ancestor resolution. Primary unfolding has to take this into account. If, for example, q/1 is present in the ABox signature, i.e. there are some ABox assertions of form q(...), then a third clause for p/1 is generated:

p(X) :- q_{abox}(X), r(X).

The abox subscript indicates here that it is not the whole q/1 predicate which is to be called, only its ABox assertions. The above clause can obviously be treated as satisfying the minimal unfolding objective. Furthermore, if q/1 can succeed via ancestor resolution (i.e. it can be reached from not_q/1) then a fourth clause is added:

p(X) :- '$checkanc'(not_q(X)), r(X).

Here the '$checkanc' goal indicates that an ancestor check has to be performed: if not_q(X) unifies with an element on the current ancestor list, then this call should succeed; otherwise it should fail. Note that a clause containing the special '$checkanc' goal is treated as satisfying the minimal unfolding requirement, as it will always be removed at entry predicate specialisation (see below). Also note that these additional clauses, generated during primary unfolding, correspond to the additional clauses in the Prolog translation, cf. Figure 2.

During primary unfolding an ancestor list is maintained. Each time an expansion is performed, the goal being expanded is added to the ancestor list. The goals in the unfolded clauses are annotated with their ancestor lists. In the above example this results in the following (the ancestors are shown as subscripts):

p(X) :- a(X), r_{p(X)}(X).                        (5)
p(X) :- b(X,Y), p_{p(X),q(X)}(Y), r_{p(X)}(X).    (6)

Note that no ancestors are given for the goals a and b, as these predicate invocations access the ABox only, and so are not dependent on the ancestor list.

The ancestor list maintained during unfolding is also needed to perform the loop elimination and ancestor resolution operations. For example, assume that the predicate q/1 has a third clause: ‘q(X) :- p(X).’ When unfolding this clause within p/1, we detect that its body, p(X), is bound to fail, because of loop elimination. Correspondingly, in spite of the third clause for q/1, the unfolded p/1 will still contain the clauses (5) and (6) only. Similarly, if the third clause added to q/1 is ‘q(X) :- not_p(X).’, then primary unfolding will determine that a successful ancestor resolution can be applied at this point. Thus the unfolded p/1 is extended with a third clause:

p(X) :- r_{p(X)}(X).                              (7)

Note that both (5) and (6) are consequences of the above clause. This is detected in the program simplification phase, and both clauses are removed. Thus, in this last variant, the unfolded Prolog code for p/1 contains a single clause only, (7). Primary unfolding is performed only once, at the beginning of the optimisation.

Secondary unfolding is the expansion of a unary goal in a unary clause, where the goal argument is different from the head variable. We apply secondary unfolding only in the deterministic case. This means that secondary unfolding is applicable to q_{Ancs}(Y) if:

– q/1 cannot succeed via ancestor resolution, has no ABox clauses, but does have a single TBox clause, and, furthermore, the Ancs ancestor list has no member with the functor q/1 (i.e. we are not inside another primary or secondary unfolding for q/1); or
– q/1 cannot succeed via ancestor resolution, has no TBox clauses, but does have ABox clauses; or
– q/1 has no TBox and no ABox clauses, but it can succeed via ancestor resolution.

In the first case the goal q_{Ancs}(Y) is expanded to the body of the single clause of predicate q/1. In the second case the goal is replaced by q_{abox}(Y). Finally, in the third case, the goal in question is replaced by '$checkanc'(not_q(Y)), provided no member of the ancestor list Ancs has the functor not_q/1. Otherwise, if Ancs has a member not_q(Z)³, then we still have two cases. Let p/1 denote the functor of the predicate in which the given invocation of q/1 is found. If p/1 cannot be reached from not_q/1, then the goal can only succeed if Y and Z are the same. Therefore the goal is replaced by the unification Y = Z. Otherwise, a goal '$checkanc'(not_q(Y), not_q(Z)) is generated, whose task is either to unify its two arguments, or to unify the first argument with a member of the current ancestor list.

The main purpose of secondary unfolding is to clarify the ancestor resolution dependencies in the program. When the goal in question succeeds through the ABox, it becomes clear that it does not need the ancestor list argument. Similarly, if the third case is resolved through the unification of two variables, the ancestor list argument becomes unnecessary. Secondary unfolding is first performed together with primary unfolding, but it is then repeated, possibly several times, within program simplification.

Program simplification is the phase of the transformation where redundancies are removed. There are three basic simplifications:

– removal of a group of goals posing a constraint which is weaker than some other group of goals in the same clause body,
– removal of a clause whose body poses a stronger constraint than some other body in the same predicate,
– removal of unnecessary ancestor annotations.

³ Each functor can occur only once on an Ancs ancestor list, because nested unfolding of the same predicate is disallowed, cf. the first case of secondary unfolding.


The essence of the first two simplifications is best illustrated with an example. Consider a clause body containing the following two groups of goals: b(X, Y), q(Y) and b(X, Z), q(Z), r(Z). Assuming that variables Y and Z do not occur elsewhere, we can notice that the first group of goals poses a weaker constraint than the second group, and therefore the first group is unnecessary. For the same reason, if the above two goal groups occur as complete clause bodies within the same predicate, then the clause with the second, stronger constraint will be removed.

Regarding the third simplification, let us note that a term p(X) is included in the ancestor annotation of a goal q(Y) for the purpose of being put on the ancestor list of q. However, there is no need to include p(X) in the ancestor list of q if there are no goals reachable from within q that may make use of this ancestor (i.e. goals invoking the predicate not_p/1 or being of form '$checkanc'(p(_)...)). Note also that there are several optimisations which result in the removal of goals and clauses. This means that even if p(X) was considered a necessary ancestor at an earlier stage, it may become superfluous later, when no more goals making use of it are reachable from q. To cater for this, simplifications and secondary unfolding are performed repeatedly, until a fixpoint is reached. At this point the first phase, that of equivalence transformations, is completed.

Specialisation of entry predicates is the second phase. Here we specialise the output of phase 1 under the assumption that the ancestor list of the predicate is empty (which is the case for entry predicates). Although the optimisation is driven by this assumption, we keep track of the functors whose absence from the ancestor list is really needed for the given optimisation to work. If the optimised entry version of p/1 is not guaranteed to work correctly for an invocation where q(_) is present on p/1's ancestor list, p/1 is said to be sensitive to q/1.
For each predicate, we keep track of the predicate functors it is sensitive to. We start the phase by removing all clauses that contain a '$checkanc'/1 or '$checkanc'/2 goal. Obviously, when the enclosing predicate is called with an empty ancestor list, these goals will fail. Whenever we remove a clause of a predicate p/1, because of the presence of a '$checkanc'(q(_)...) goal, we note that p/1 is sensitive to q/1.

When clauses are removed, some predicates may become empty. If such a predicate, say r/1, has no ABox clauses either, it will always fail, unless called with a goal in its ancestor list to which it is sensitive. Consequently, if a clause C of another predicate t/1 calls r_{Ancs}(Y), where Ancs does not contain any goals to which r/1 is sensitive, then C can be removed. Note that at the same time t/1 inherits the sensitivity of r/1, i.e. it becomes sensitive to all the predicates r/1 is sensitive to. This process is continued until a fixpoint is reached.

The next task of this phase is the identification of the query predicates, i.e. non-recursive predicates which need no ancestor resolution at all. A query predicate can only call ABox predicates and query predicates. Thus a list of query predicates is built, again iteratively, until a fixpoint is reached. Note that we assign a cumulated sensitivity to each query predicate, which is the union of its own sensitivity and the sensitivities of any predicates it calls, directly or indirectly. This will be used in the next, final phase.

Composing the target program means putting together the entry predicates, as produced by phase 2, with inner predicates, which are the output of phase 1. More precisely, the set of inner predicates is obtained by renaming: p/1 becomes 'p$'/1. Note that all goals in the entry predicate bodies also undergo this renaming, as do the terms on the ancestor lists. Next, as the final optimisation, we revert some goals to call the entry, rather than the inner version. Namely, if p/1 is a query predicate, which is called by a goal in a context where none of the functors of the cumulated sensitivity of p/1 can appear on the ancestor list of the goal, then this invocation can call the entry version of p/1, instead of its renamed, inner version. Having performed this optimisation we can remove those inner predicates which are never called.

Status of the implementation. The unfold optimisation has been implemented and is being integrated into the DLog system. At present we are capable of executing queries for those knowledge bases whose optimised form consists of query predicates only. This is the case, for example, for the LUBM test suite; its performance is evaluated in Section 6.
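The first step of entry predicate specialisation described in this section, namely dropping the clauses that contain a '$checkanc' goal and recording the resulting sensitivities, can be sketched as follows. The representation of clauses as cl(Head, BodyList) terms and the names checkanc_functor/2, specialise/3 and sensitive/2 are assumptions made for this illustration; the subsequent fixpoint iteration over emptied predicates is not covered.

```prolog
:- use_module(library(lists)).

% checkanc_functor(+Body, -Name/Arity): Body contains a '$checkanc'/1
% or '$checkanc'/2 goal whose first argument has functor Name/Arity.
checkanc_functor(Body, Name/Arity) :-
    member(Goal, Body),
    Goal =.. ['$checkanc', Anc|_],
    functor(Anc, Name, Arity).

% specialise(+Clauses, -Kept, -Sensitivities):
% Clauses is a list of cl(Head, Body) terms. A clause with a
% '$checkanc' goal cannot succeed with an empty ancestor list, so it
% is dropped and a sensitive(P/N, Q) entry is recorded instead.
specialise([], [], []).
specialise([cl(Head, Body)|Cs], Kept, Sens) :-
    (   checkanc_functor(Body, Q)
    ->  functor(Head, P, N),
        Sens = [sensitive(P/N, Q)|Sens1],
        Kept = Kept1
    ;   Kept = [cl(Head, Body)|Kept1],
        Sens = Sens1
    ),
    specialise(Cs, Kept1, Sens1).
```

For instance, given the two clauses cl(p(X), ['$checkanc'(not_q(X)), r(X)]) and cl(p(Y), [a(Y), r(Y)]), only the second one is kept, and p/1 is recorded as sensitive to not_q/1.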

5 DLog Server Architecture

We have extended the DLog implementation, as described in [2], into a server architecture which supports multiple interfaces such as DIG, OWL, etc., and is capable of operating in a server mode, as required by popular tools such as the Protégé ontology editor. DLog was originally developed in SICStus Prolog, and has recently been ported to the open source SWI Prolog.

The general architecture of the DLog system is shown in Figure 3. Here, the rectangles with rounded corners represent the modules of the DLog system. The logger and configuration modules, used by all other modules, are not shown in the figure. The configuration module manages both global settings (such as server ports) and knowledge base specific settings (such as the selection of optimisations to use). The modular structure of the DLog system makes implementing new features (such as new interfaces) fairly easy.

The system provides a console interface to access all features locally, and server interfaces for other applications (for example, the DIG interface [13] used by Protégé). The input arriving from these sources may contain TBox axioms, ABox assertions, queries, or control messages (e.g. creating a new database or setting system parameters). After transforming the input to an internal representation, the interfaces pass it to the knowledge base manager, which executes the command. The system can manage multiple knowledge bases simultaneously.

The ABox translator module processes the ABox, which either contains the assertions themselves, or the description of how to access the databases containing the assertions. It produces ABox code, which is a Prolog module containing either the assertions themselves or the appropriate database access predicates.

[Figure 3: block diagram. Labelled modules: Interfaces (KRSS, DIG, OWL, Local console, Remote client), Knowledge Base Manager, ABox translator, DL translator, TBox translator, Query module, Hash. Other labels: TBox, ABox, ABox sig., TBox code, ABox code, Control, Console, Queries, External.]

Fig. 3. The architecture of the DLog system.

The ABox translator also generates the signature of the ABox, as required by the TBox translator. The TBox axioms are first processed by the DL translator module, which transforms the DL formulae to a set of DL clauses [11]. The results are passed on to the TBox translator module which generates the TBox code, a Prolog program that can be directly executed to answer instance check and instance retrieval queries. The queries are executed by the Query module by using this program. The Prolog program generated as TBox code relies on the Hash module, which implements a hash table in C, to speed up loop elimination and ancestor resolution.
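To make the two forms of ABox code more concrete, the following sketch shows what such generated code might look like. It is an illustration under stated assumptions, not the actual output of the ABox translator: the predicate names, the individuals, the db_lookup/3 interface, the external_row/3 table and the abox_signature/1 fact are all hypothetical, and the module declaration is omitted for brevity.

```prolog
% Variant 1: the assertions themselves, stored as Prolog facts.
has_child(iocaste, oedipus).
has_child(oedipus, polyneikes).

% Variant 2: a database access predicate. db_lookup/3 stands in for
% a real database call; here it is simulated by the in-memory table
% external_row/3 so that the sketch runs on its own.
external_row(concept_assertion, oedipus, patricide).

db_lookup(Table, Key, Value) :-
    external_row(Table, Key, Value).

patricide(X) :-
    db_lookup(concept_assertion, X, patricide).

% A possible representation of the ABox signature handed over to the
% TBox translator: the predicates for which the ABox holds assertions.
abox_signature([has_child/2, patricide/1]).
```

From the point of view of the calling code the two variants look the same: has_child/2 and patricide/1 are called as ordinary Prolog goals, whether the data is held in memory or fetched from an external source.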

6 Evaluation

We have compared our system with three state-of-the-art ABox reasoners: RacerPro 1.9.0, Pellet 1.5.0, and the latest version of KAON2 (August 2007). For the benchmark we have used publicly available benchmark ontologies (LUBM and VICODI), as well as the ontology corresponding to the Iocaste problem introduced in [2]. The tests were performed on a Fujitsu-Siemens S7020 laptop with 1.25GB memory.

A sample of the test results is presented in Table 1. Here the values are given in seconds and a dash (-) indicates a timeout of 600 seconds. For the LUBM test cases we show the DLog execution with the unfolding optimisation turned off and on (the latter is denoted by the UF suffix). The fastest total time in each column is set in boldface (except for the results with unfolding). For a detailed performance evaluation, including tests with ABoxes stored in databases, see [2].

Table 1. Sample results of performance evaluation.

System  Measure  Iocaste10  Iocaste1000  LUBM1   LUBM1UF  LUBM4  LUBM4UF  VICODI
DLog    load     0.07       0.33         6.96    7.06     21.34  21.44    8.61
        runtime  0.00       0.01         0.26    0.23     1.32   1.10     0.05
        total    0.07       0.34         7.22    7.29     22.66  22.54    8.66
(a)     load     0.45       -            6.56    N/A      28.73  N/A      5.88
        runtime  0.72       -            0.70    N/A      1.69   N/A      0.36
        total    1.17       -            7.26    N/A      30.42  N/A      6.24
(b)     load     0.01       0.51         24.28   N/A      -      N/A      34.96
        runtime  0.07       1.68         117.89  N/A      -      N/A      76.48
        total    0.08       2.19         142.17  N/A      -      N/A      111.44
(c)     load     1.27       2.19         16.76   N/A      -      N/A      -
        runtime  0.19       456.40       31.93   N/A      -      N/A      -
        total    1.46       458.58       48.69   N/A      -      N/A      -

Row groups (a), (b) and (c) are the results of RacerPro, KAON2 and Pellet; the assignment of these three system names to the groups is not recoverable from the extracted table.

We found that the larger the ABox, the better DLog performs compared to its peers. Our implementation of unfolding, still in its prototype stage, produces a 10-15% speed-up at runtime, with a constant (approx. 0.1 sec) cost at load time, which we believe is very promising.
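The effect of unfolding can be checked against the rounded DLog figures of Table 1 with a few lines of arithmetic. The sketch below is only an illustration: the predicate names speedup_percent/3, run_time/3, load_time/3 and report/0 are introduced here, and the numbers are the DLog runtime and load values of the LUBM1, LUBM1UF, LUBM4 and LUBM4UF columns.

```prolog
% speedup_percent(+Before, +After, -Percent)
% Relative reduction of a time value, in percent.
speedup_percent(Before, After, Percent) :-
    Before > 0,
    Percent is (Before - After) / Before * 100.

% DLog times in seconds without and with unfolding, copied from Table 1.
run_time(lubm1, 0.26, 0.23).
run_time(lubm4, 1.32, 1.10).
load_time(lubm1, 6.96, 7.06).
load_time(lubm4, 21.34, 21.44).

% Print the runtime speed-up and the extra load time for each test.
report :-
    forall(run_time(Test, Before, After),
           ( speedup_percent(Before, After, P),
             format('~w: runtime speed-up ~1f%~n', [Test, P]) )),
    forall(load_time(Test, Before, After),
           ( Extra is After - Before,
             format('~w: extra load time ~2f s~n', [Test, Extra]) )).
```

On these two-decimal values the runtime speed-up comes out at roughly 11.5% for LUBM1 and 16.7% for LUBM4, and the additional load time is 0.10 seconds in both cases, which agrees with the constant load-time cost quoted above.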

7 Conclusions

In this paper we have presented the Description Logic reasoning system DLog. Unlike the traditional tableau-based approach, DLog determines the instances of a given SHIQ concept by transforming the knowledge base into a Prolog program. This technique allows us to use top-down query execution and to store the content of the ABox externally in a database, something which is essential when large amounts of data are involved.

Following an overview of other optimisation techniques we presented the newly introduced unfolding optimisation. This is basically a partial evaluation technique used to unfold clauses containing no binary literals. As a result we can obtain programs where we no longer need to cater for executing unary predicates with uninstantiated arguments (except for the outermost query predicate).

We have compared DLog with the best available ABox reasoning systems. From the test results we can conclude that in all of the scenarios DLog is significantly faster than traditional reasoning systems.

As an overall conclusion, we believe that our results are very promising and clearly show that Description Logic is an interesting application field for Prolog and logic programming.

Acknowledgements

The authors acknowledge the support of the Hungarian NKFP programme for the SINTAGMA project under grant no. 2/052/2004. We are also grateful to Tamás Benkő and the anonymous reviewers for their comments on earlier versions of the paper. Thanks are due to Zsolt Zombori for his work [11] on the design and implementation of critical components of the DLog system.

References

1. Motik, B.: Reasoning in Description Logics using Resolution and Deductive Databases. PhD thesis, Universität Karlsruhe, Karlsruhe, Germany (2006)
2. Lukácsy, G., Szeredi, P.: Efficient description logic reasoning in Prolog: the DLog system. Technical report, Budapest University of Technology and Economics (2008) Submitted to Theory and Practice of Logic Programming, http://sintagma.szit.bme.hu/lukacsy/publikaciok/dlog_tplp.pdf.
3. Baader, F., Nutt, W.: Basic description logics. In Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F., eds.: Description Logic Handbook, Cambridge University Press (2003)
4. Haarslev, V., Möller, R.: Optimization techniques for retrieving resources described in OWL/RDF documents: First results. In: Proc. KR 2004, Whistler, BC, Canada, June 2-5. (2004) 163–173
5. Hustadt, U., Motik, B., Sattler, U.: Reasoning for Description Logics around SHIQ in a resolution framework. Technical report, FZI, Karlsruhe (2004)
6. Grosof, B.N., Horrocks, I., Volz, R., Decker, S.: Description logic programs: Combining logic programs with description logic. In: Proc. of WWW 2003, ACM (2003) 48–57
7. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive description logics. In: Proceedings of IJCAI'05. (2005) 466–471
8. Samuel, K., Obrst, L., Stoutenburg, S., Fox, K., Franklin, P., Johnson, A., Laskey, K.J., Nichols, D., Lopez, S., Peterson, J.: Translating OWL and Semantic Web Rules into Prolog: Moving toward Description Logic Programs. TPLP 8(3) (2008) 301–322
9. Motik, B., Rosati, R.: A Faithful Integration of Description Logics with Logic Programming. In Veloso, M.M., ed.: Proc. of the 20th Int. Joint Conference on Artificial Intelligence (IJCAI'07), Hyderabad, India, Morgan Kaufmann Publishers (2007) 477–482
10. Stickel, M.E.: A Prolog technology theorem prover: a new exposition and implementation in Prolog. Theoretical Computer Science 104(1) (1992) 109–128
11. Zombori, Zs.: Efficient two-phase data reasoning for Description Logics. In: Proceedings of the IFIP International Conference on Artificial Intelligence, Milan, Italy (2008) http://www.cs.bme.hu/~zombori/BME/dlog/dl_reasoning.pdf.
12. Nagy, Zs., Lukácsy, G., Szeredi, P.: Translating description logic queries to Prolog. In: Symposium, PADL 2006, Charleston, SC, USA, January 9-10, 2006, Proceedings. Volume 3819 of LNCS. (2006) 168–182
13. Bechhofer, S.: The DIG interface. http://dig.cs.manchester.ac.uk/ (2006)
