
DBPQL: An Integrated Database Programming and Query Language for Distributed Object Bases

Markus Kirchberg ([email protected])
Information Science Research Centre, Massey University, Private Bag 11 222, Palmerston North 5301, New Zealand

Ph.D. Supervisor: Prof Dr Klaus-Dieter Schewe ([email protected])
Ph.D. Co-Supervisor: Ph.D. Ray Kemp ([email protected])

1 Introduction

Database systems are an indispensable tool in most information science applications. They are used extensively within business information systems, web-based information services, on-line teaching systems, genome databases, etc. Distributed databases and support for object-oriented concepts are becoming more and more vital for present applications. Distribution is needed because data is collected from various sources, especially in science applications, and used globally, especially in web-based applications. Object-orientation is advantageous, and in many cases necessary, to support more complex data structures and associated behaviour.

Object-oriented database systems (OODBSs) appeared in the 1980s. Initially, they were regarded as the answer to the limitations of relational database systems and to the need to manage a large volume of objects with object semantics. However, most of the early OODBSs did not survive for long. Among others, [9] investigates their disappearance and outlines research directions that need to be addressed before a sophisticated, successful OODBS can be built. Even nowadays, 15 years later, there is still a lack of standards, research projects and research results in this area.

These days, OODBSs are no longer seen as a replacement for existing relational and object-relational database systems, but rather as a complement to them. Embedded DBS applications, applications requiring complex object relationships, applications dealing with changing object structures, etc. are among the applications regarded as benefiting from OODBS technologies over relational and object-relational DBS technologies.

The Ph.D. work presented in this paper forms part of a bigger research project that aims to propose a distributed object-oriented database system based on a sound theoretical framework. An overview of the already proposed OODBS architecture can be found in [12]. Here, we address the core of the physical system of a distributed OODBS. A stack-based approach to database programming and querying is investigated as a correct way of integrating query and programming languages. Thus, capabilities of executing queries, transactions, methods, etc. on an operational database level are intended to be captured in one language.

1.1 Outline

The remainder of this paper is organised as follows: Section 2 provides a brief overview of existing work on the subject of database programming. At the end of that section, we outline our intended contribution in more detail. Section 3 discusses the target environment and relates a variety of languages and abstractions used frequently later in the paper. Section 4 details main characteristics of the proposed integrated language. Section 5 then addresses the realisation of this language on the level of evaluation engines. Finally, Section 6 concludes our work.

2 Related Work

The relationship between query languages (QLs) and general-purpose programming languages (PLs) has been studied for decades. The popular classification distinguishes between the embedded and the integrated approach. The standard solution, namely embedding a query language in the programming language, suffers from problems collectively known as impedance mismatch. Alternative, integrated approaches (e.g. Pascal/R [20], Napier88 [16], DBPL [5, 4], LOQIS [22], O2C [6], Fibonacci [1], and Oracle PL/SQL) circumvent such problems. However, most integrated approaches represent either a PL with added QL constructs (e.g. DBPL and O2C) or a QL with added PL constructs (e.g. Oracle PL/SQL). The first approach provides full computational and pragmatic universality and clean semantics, whereas the latter, commercially more popular one provides user friendliness, macroscopic programming, declarativeness, and data independence. Combining these two approaches (i.e. integrating programming and querying languages seamlessly) has been one of the main objectives of the LOQIS project [22]. We follow this line of thinking.

Before introducing the seamless integration approach in more detail (see Section 2.1), we would also like to refer to another relevant project: Tigukat [17] is a novel OODBS. Its object model is characterised by purely behavioural semantics and a uniform approach to objects. Research results of interest (w.r.t. this paper) include a type system for object-oriented database programming languages [14, 13] and a draft Tigukat user (query) language [15]. However, research was terminated (in 1999, to the best of our knowledge) without addressing the problems that arise when programming language constructs are included in a behavioural database language.

2.1 A Stack-Based Approach

[23] investigates the so-called ‘seamless’ integration of a query language with a programming language. It thereby lays the foundation for a QL-centralised programming language that follows the traditional paradigms of the programming languages domain. An extended approach to two-stack abstract machines, known from classical programming languages such as Pascal, is presented. This proposal includes definitions of an abstract storage model, of an abstract machine model, and of the semantics of query and programming language operators, which are defined through operations on these stacks. Figure 1 shows the relationship between data models and the abstract storage model proposed in [23].

[Figure 1: Relationship between data models and the abstract storage model of SBQL. Recoverable labels: Data Model; Query in the Data Model; Interpretation of the Result in the Data Model; Query Addressing the Abstract Storage Model; SBQL’s Abstract Storage Model; Machine Program; Query Result.]

In the abstract storage model, objects are defined as triples < id, name, value >, where id is an internal object identifier, name represents its external identifier, and value is either an atomic value, an identifier of another object or a set of objects. The abstract machine model introduces two stacks: the environment stack ES and the query result stack QRES. The environment stack, as usual, determines scoping and binding. The query result stack stores intermediate results, which are used either for the evaluation of query operators or for the evaluation of arithmetic-style expressions.

The semantics of operations are part of the proposed language SBQL (Stack-Based Query Language). It is an untyped, query-centralised programming language in the 4GL style. Semantics are defined for atomic queries, compound queries (using both algebraic and non-algebraic operators), selection, projection, navigation, path expressions, natural join, quantifiers, bounded variables, transitive closure, ordering, null values, variants, assignments, and for each statements. In addition, remarks are provided on how to deal with procedures, classes, class inheritance, methods, and encapsulation.

However, this approach has several shortcomings [11]. These include:

• It does not support arbitrary types, just relations. It is desirable to have a sound type system from which (complex) objects are built.

• It does not aim towards efficiency. [23] only details the underlying ideas and, when considering the implementation of language constructs, discusses the realisation of the most basic, straightforward algorithms. For instance, the join operation is realised as a Cartesian product. It is desirable to have a number of different implementations available for each language construct, from which the most efficient one can be selected for execution on the basis of the objects involved, the OODBS configuration, and the current system load.

• It does not support concurrency. Only a single abstract machine is defined that evaluates all SBQL statements in a serial manner. It is desirable to have a collection of these abstract machines that cooperatively and concurrently execute requests passed down from higher OODBS layers.

• It does not support the concept of transactions. It is desirable to execute operations in a way that ensures data consistency. Thus, transaction support and management have to be provided.

• It does not support distribution. Nowadays it is vital for the success of most systems, and of DBSs in particular, to operate in distributed computing environments. For instance, data should be stored at (or close to) the location where it is used most frequently. Thus, it is desirable to have a network of abstract machines that cooperatively (over many locations if necessary) execute DB operations.

• It is not suitable for large databases, as the stacks are main-memory based. This limits the size of objects and also restricts the scalability of the OODBS. It is desirable to support objects of any size. Even more importantly, database systems and applications have to be scalable to adapt ‘easily’ to changes in business structures, business demands, customer demands, etc.

• It provides operational semantics only. A more abstract, formal definition of the stack-based approach to integrating query and programming languages is desirable.
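To make the roles of the abstract storage model and the two stacks more concrete, the following Python sketch evaluates a selection of the form "Emp where sal > threshold". It is an illustration under stated assumptions and is not taken from [23]: the class `Obj`, the helper names `nested` and `bind`, the sample data, and the encoding of a set of objects as a tuple of identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Obj:
    """Hypothetical encoding of the triple <id, name, value>."""
    oid: str
    name: str
    value: Union[int, str, tuple]  # atomic value, or a tuple of member ids

# Illustrative object store keyed by internal identifier.
STORE = {
    "i1": Obj("i1", "Emp", ("i2", "i3")),
    "i2": Obj("i2", "name", "Ann"),
    "i3": Obj("i3", "sal", 2500),
    "i4": Obj("i4", "Emp", ("i5", "i6")),
    "i5": Obj("i5", "name", "Bob"),
    "i6": Obj("i6", "sal", 1800),
}

def nested(oid):
    """Build a scope (name -> ids) for the interior of a complex object."""
    scope = {}
    for member in STORE[oid].value:
        scope.setdefault(STORE[member].name, []).append(member)
    return scope

def bind(es, name):
    """Search the environment stack from the top for the first scope that knows name."""
    for scope in reversed(es):
        if name in scope:
            return scope[name]
    return []

def where_sal_above(es, qres, threshold):
    """Evaluate 'Emp where sal > threshold' using ES for scoping and QRES for results."""
    qres.append(bind(es, "Emp"))          # result of the left subquery
    selected = []
    for emp in qres.pop():
        es.append(nested(emp))            # open a scope for this Emp object
        qres.append([STORE[o].value for o in bind(es, "sal")])
        if any(v > threshold for v in qres.pop()):
            selected.append(emp)
        es.pop()                          # close the scope again
    qres.append(selected)
    return qres[-1]

base_scope = {"Emp": ["i1", "i4"]}        # root objects visible at the bottom of ES
print(where_sal_above([base_scope], [], 2000))
```

The sketch keeps both stacks in main memory and runs serially on a single machine, which is exactly the situation the shortcomings listed above refer to.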

2.2 Contribution

In this paper, we introduce an integrated approach to database programming and querying. The language DBPQL, its main characteristics and its realisation are addressed. While following the approach by Subieta et al. [23] and addressing the aforementioned shortcomings of that approach, we also aim to propose a modern database programming and querying language (i.e. one that supports the development of large applications, scalability, and various levels of transparency) that is realised efficiently.

3 Overview of Our Approach to Integrate Database Programming and Query Languages

In this section, we briefly discuss the distributed database environment we had in mind when designing DBPQL. We also provide an overview of the DBS and DBPQL components required to evaluate database requests.

In a truly distributed database system, a high-level database user is not aware of the distributed nature of the DBS. In fact, data independence, network transparency, replication transparency, and fragmentation transparency are key properties of this type of DBS. Thus, (DBPQL) programs (or, more precisely, modules) are free of location- or communication-specific information. Such information is added (in the form of annotations) by the DBS itself, and only during the compilation, fragmentation, allocation, code rewriting, and optimisation processes.

Figure 2 provides a brief overview of key DBS components (shaded rectangles) with some of their functionality (italic text), high- and low-level DBPQL code references, communication aspects (bold, solid arrows), etc. High-level user requests arrive in the form of DBPQL modules. The Request Processing Module (RPM) employs a number of components, which include a DBPQL compiler, code optimisers (compile-time code optimisation and also query optimisation), code rewriters (e.g. to map operations on objects that correspond to a global object-oriented database schema to schema fragments based on a fragmentation and allocation catalog), a reflection module (e.g. adding support for genericity by exploiting linguistic reflection [19]), etc. Details about these components and corresponding processes are beyond the scope of this paper. Here, we take a black box approach as discussed further below.

RPMs transform incoming user requests (i.e. DBPQL modules) into optimised execution plans, which are then translated into iDBPQL code, a lower-level version of DBPQL. In terms of the (more familiar) Java programming language, DBPQL would correspond to source code and iDBPQL to bytecode.

[Figure 2: Overview of Request Evaluation Components. Recoverable labels: DBS User Interfaces; DBPQL Programs; Collections of (global) Objects; RPM; Compilation, fragmentation, allocation, and optimisation of DBPQL programs; adds annotations (processing nodes, processing order, synchronisation signals, indices, ...); Generation of optimised execution plans and translation into iDBPQL code; adds support for generic operations; Execution Plans (iDBPQL code); Collections of (fragmented) Objects; TMS { agents }; REE { agents }; Other DBS nodes; DBACL; employs local and remote communication mechanisms; Persistent Object Store (POS).]

iDBPQL code is then evaluated by a collection (or, better, a network) of agents¹. Again, in terms of the Java programming language, the reader might think of a (networked) virtual machine. Code evaluation is governed by so-called Request Evaluation Engines (there exists one REE instance per DBS node). In relational DBSs such a component is usually referred to as the Query Evaluation Engine (e.g. in [21, 18]). However, more sophisticated database systems (e.g. object-relational, object-oriented and XML database systems) do not just evaluate queries. Thus, we will refer to such a DBS component as REE.

Agents of REEs are aware of the distributed nature of the DBS and know about transactions, indices, etc. Agent technologies are exploited by all lower-level DBS components. Agents of different components talk different languages, usually referred to as Agent Execution or Evaluation Languages (AELs). However, they all use the same (agent) communication language DBACL. Separating AELs and DBACL is straightforward due to the fact that original, high-level user requests do not contain any location- or communication-specific information, as mentioned above.

REE agents interpret the annotations that the RPM added while mapping DBPQL modules relating to global schemata to code that corresponds to schema fragments. In addition, REE agents optimise processing further by exploiting concurrent, distributed and (intra-)parallel processing capabilities (if and as available). More details are discussed in Section 5.

In order to support local and distributed (multi-level) transactions, REE agents cooperate with agents of a transaction management system (TMS). TMS agents ensure serialisability and recoverability. (Local and distributed) concurrency control, various levels of recovery, commit synchronisation, replication management, etc. are some of the key tasks realised by these agents.

Persistence and index management are provided by a Persistent Object Store (POS). For more details about this and other key OODBS components, the reader is referred to [12].

¹ An agent can be regarded as a piece of software (most commonly realised as a thread) that performs one or more relatively simple tasks to fulfil one or more given requests. Agents can work independently or cooperatively to fulfil one or more given requests.


[Figure 3: Relationship between DBPQL, data models and iDBPQL. Recoverable labels: DBPQL; DBPQL Module; Collection of (global) values / objects; Code Analysis, Type Checking, Code Optimisation, Target Code Generation, (Query) Optimisation, Fragmentation, Reflection Support, Code Translation, ...; Object-Oriented Data Model (global, fragmented); iDBPQL; Re-Transformation Fragments ---> Global; (Optimised) Execution Plan Formulated in iDBPQL Code; Network of Agents; Collection of (fragmented) values / objects.]

Figure 3 provides a more abstract view of the relationship between DBPQL, iDBPQL, conceptual data models and associated processes. DBPQL can be regarded as a (partial) high-level user language. It is partial because we only discuss features here that stem from iDBPQL. Schema evaluation support, more sophisticated (class and schema) constraints, generic operations and other features are not implemented on the level of evaluation engines, where iDBPQL resides. Thus, when support for these and other features is added, the DBPQL language will be enriched.

From a more abstract point of view, iDBPQL code can be considered as a DBPQL module where:

• Code fragments referring to higher-level features (e.g. class and schema constraints, generic operations, etc.) have either been removed or replaced by macros (e.g. in the case of generic operation support) formulated in iDBPQL code.

• Classes, views, objects, methods, statements, etc. relating to persistent or view-derived data now correspond to schema fragments (rather than a global schema).

• iDBPQL statements are allocated (i.e. by means of annotations) to the DBS nodes where their evaluation is to be done.

• Information about execution (in-)dependencies (e.g. serial, concurrent, pipelined, etc.) is added together with basic synchronisation commands. For this purpose, the concept of blocks is supported. Explicit beginBlock and endBlock statements are added during the code optimisation process. beginBlock statements also carry annotations that refer to the suggested type of processing (which is one of the following: serial, concurrent, parallel, distributed, or none). Agents use this information to decide on an efficient way of evaluating a single iDBPQL statement or a block of iDBPQL statements. Blocks may have sub-blocks. Blocks are also used to model transactions. As outlined in Section 4, explicit transaction (begin and end) statements are only added during compilation and code rewriting processes. In fact, the concept of blocks is used again: a block can be declared (i.e. another annotation is added) to correspond to a (sub-)transaction. Sub-transactions are only supported when using a more sophisticated transaction model (e.g. nested or multi-level transactions). Thus, execution (in-)dependencies evolve into (sub-)transaction (in-)dependencies when dealing with transaction blocks. In order to relate block statements or sub-transactions to each other, a block may have an associated identifier (e.g. refer to [2], where weak and strong input orders are supported that allow dependencies between (sub-)transactions to be specified).

• Indices are added (i.e. as annotations) to support a more efficient evaluation of iDBPQL statements.

Additional annotations are necessary, but are omitted here for simplicity.
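The block concept described above can be pictured as a small tree of annotated blocks and statements. The following Python sketch is only one possible representation, not the iDBPQL code format itself: the classes `Block` and `Stmt`, the textual rendering of annotations in square brackets, the `@node` suffix, and the sample statements are all assumptions made for illustration. Only the names beginBlock and endBlock and the five processing types come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Union

class Mode(Enum):
    """Suggested type of processing attached to a beginBlock statement."""
    SERIAL = "serial"
    CONCURRENT = "concurrent"
    PARALLEL = "parallel"
    DISTRIBUTED = "distributed"
    NONE = "none"

@dataclass
class Stmt:
    text: str
    node: Optional[str] = None        # hypothetical DBS node annotation

@dataclass
class Block:
    mode: Mode = Mode.NONE
    transactional: bool = False       # block corresponds to a (sub-)transaction
    ident: Optional[str] = None       # identifier used to relate blocks
    body: List[Union["Block", Stmt]] = field(default_factory=list)

def render(block, depth=0):
    """Flatten a block tree into lines with explicit begin and end markers."""
    pad = "  " * depth
    notes = [block.mode.value]
    if block.transactional:
        notes.append("transaction")
    if block.ident:
        notes.append("id=" + block.ident)
    lines = [f"{pad}beginBlock [{', '.join(notes)}]"]
    for item in block.body:
        if isinstance(item, Block):
            lines.extend(render(item, depth + 1))   # sub-block
        else:
            where = f" @{item.node}" if item.node else ""
            lines.append(f"{pad}  {item.text}{where}")
    lines.append(f"{pad}endBlock")
    return lines

plan = Block(Mode.SERIAL, transactional=True, ident="t1", body=[
    Block(Mode.CONCURRENT, body=[Stmt("scan Emp_a", "n1"), Stmt("scan Emp_b", "n2")]),
    Stmt("merge results"),
])
print("\n".join(render(plan)))
```

Nesting a concurrent block inside a transactional serial block mirrors the statement that blocks may have sub-blocks and that a block can additionally be declared to be a (sub-)transaction.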

4 Characteristics of (i)DBPQL

In this section, we discuss the main characteristics of both DBPQL and iDBPQL. The explanations below apply to both unless stated otherwise. Syntax snapshots are provided using the syntax of DBPQL since this is easier to understand.

(i)DBPQL is a modular language. Modules can be considered as compile-time abstractions that support the development of large-scale programs through import (explicit IMPORT declaration) and export (module interfaces) of services (i.e. data, behaviour or both).

    ProgUnit = Schema | ModIface | ModImpl;

Modules naturally support information hiding, separate compilation units, orderly scoping of names, etc. With explicit import declarations, an import graph is created that assists with the scoping of names. In summary, modules are only used as structuring and information hiding primitives. Sub-modules are not supported. Within modules, classes are used as (run-time) abstractions.

    ModIface = "MODULE INTERFACE", Id, '{', ModBlock, '}';
               % Module interfaces outline type and class interfaces of transient services.
    ModImpl  = "MODULE", Id, [ "IMPLEMENTS", Id-Set ], '{', ModBlock, '}';
               % Modules can implement multiple interfaces.
    ModBlock = { ImpDecl }, { ModDecl }, [ ModInit ];
               % ModInit is used to initialise module elements before the start of the execution.
    ImpDecl  = "IMPORTS", [ "SCHEMA" ], Id, [ '.', Id ], [ "AS", Id ];
               % IMPORTS SCHEMA is used to import whole DB schemata only.

Modules may import services implemented by other modules as well as database schemata. A database schema can be understood as a special module interface. Whereas a module interface is specified by the programmer, a schema is maintained by the DBS automatically. Also, there is no explicit implementation module. The database system ‘implements’ the functionality outlined in the schema.


    Schema   = "SCHEMA", Id, '{', SchemaBk, '}';
               % All persistent services are made available through such interfaces.
    SchemaBk = { ImpDecl }, { ModDecl }, [ ModInit ];
               % Import declarations are restricted to type interfaces. ModInit is used as above.

Explicit schema creation and manipulation commands are provided by DBPQL. A slightly simplified syntax snapshot of these statements is shown below:

    SchemaDf = NwSchema | UpSchema | RmSchema;
               % Create, Alter and Delete statements.
    NwSchema = "CREATE SCHEMA", Id, "WITH", "CLASSES", Id-Set,
               [ "CONSTRAINTS", Id-Set ], [ "VIEWS", Id-Set ];
               % Supports simple constraints and view definitions (through access expressions).
    UpSchema = "ALTER SCHEMA", Id,
               [ "ADD", [ "CLASSES", Id-Set ] | [ "CONSTRAINTS", Id-Set ] | [ "VIEWS", Id-Set ] ] |
               [ "REMOVE", [ "CLASSES", Id-Set ] | [ "CONSTRAINTS", Id-Set ] | [ "VIEWS", Id-Set ] ];
    RmSchema = "DELETE SCHEMA", Id;

As mentioned above, modules are compile-time abstractions. The concept of classes is used to model run-time abstractions. Modules contain classes, constants, simple constraints, type definitions, global variables, and view definitions.

    ModDecl  = ClsDecl | CnstDecl | CstrDecl | TypeDecl | VarDecl | ViewDecl;

(i)DBPQL distinguishes between values and objects. Values are data items that are identified by their value (e.g. boolean, character, natural, integer, and real, which are the atomic types of (i)DBPQL). Objects are data items that have an immutable object identifier (OID) independent of associated values. Types are used to group values, and classes are used to group objects. Separating types and classes is not common in today's programming environments (e.g. think of C++ and Java), but it is desirable in order to, among other things, clearly distinguish between inheritance and sub-typing². Class interfaces are used to define the common structure and behaviour of objects that belong to the same class. Classes are then used to implement one or more class interfaces. Object creation takes place only through classes. Class hierarchies are formed explicitly by supporting name-based sub- and super-classing.

    ClsDecl  = ClsIface | ClsImpl;
    ClsIface = "CLASS INTERFACE", Id, [ ( "SUBCLASS" | "SUPERCLASS" ), "OF", Id-Set ],
               '{', { AttrDecl }, { MethSig }, '}';
    ClsImpl  = "public" | "private", "CLASS", Id, [ ( "SUBCLASS" | "SUPERCLASS" ), "OF", Id-Set ],
               [ "IMPLEMENTS", Id-Set ], '{', { AttrDecl }, { VarDecl }, { CtrcDecl }, { MethDecl }, '}';
               % Used to implement (public) class interfaces and structure functionality.

² Please refer to [7] for an in-depth discussion of inheritance and sub-typing, and of why they are to be distinguished.
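The distinction between values and objects drawn above can be illustrated with a small Python sketch. It is an analogy under stated assumptions rather than a rendering of (i)DBPQL semantics: the names `Point` and `DbObject` and the counter-based OID generator are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

_next_oid = count(1)    # hypothetical OID generator

@dataclass(frozen=True)
class Point:
    """A value: identified solely by its content."""
    x: int
    y: int

class DbObject:
    """An object: identified by an OID that never changes."""
    def __init__(self, **state):
        self._oid = next(_next_oid)
        self.state = dict(state)      # associated values may change freely

    @property
    def oid(self):                    # read-only, so the identifier is immutable
        return self._oid

    def __eq__(self, other):
        return isinstance(other, DbObject) and self.oid == other.oid

    def __hash__(self):
        return hash(self._oid)

print(Point(1, 2) == Point(1, 2))                      # same content, same value
print(DbObject(name="Ann") == DbObject(name="Ann"))    # same content, distinct objects
```

Two points with equal coordinates are the same value, whereas two objects with equal state remain distinct because their OIDs differ, and an object keeps its identity when its state is updated.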


Types structure values. Type interfaces are used to define the common structure and behaviour (i.e. type operations) of values. Sub-typing is structural. Thus, behaviour is not inherited. Typeclasses are then used to implement type interfaces, exploiting implementation inheritance. Type definitions may contain type parameters, allowing for built-in support of type constructors. Pre-defined type constructors are multiset (with set and emptyset as specialisations), record (with emptylist, string, array, enum, and subrange as specialisations), and list. They can all be seen as specialisations of collections. Values are used to support attributes on the class level of a database schema.

    TypIface = "TYPE", TypeId, [ '(', TypePara, ')' ],
               '{', { ConsDecl }, { VarDecl }, { TypOpSig }, '}';
    TypImpl  = "public" | "private", "TYPECLASS", Id, [ '(', TypePara, ')' ],
               [ "INHERITS", Id-Set ], "IMPLEMENTS", TypeId,
               '{', { ConstDecl }, { VarDecl }, { TypODecl }, '}';
    SubType  = "SUBTYPE", TypeId, "OF", TypeId-Set;
               % Specify sub-type explicitly.
    SupType  = "SUPERTYPE", TypeId, "OF", TypeId-Set;
               % Specify super-type explicitly.
    TypOpSig = "public" | "private", "TYPEOP", TypeOpId, "AS", TypeOpAbbr,
               '(', [ TypeParaList ], ')', ResultType;
               % Specifies the signature of a type operation.
    TypOp    = TypOpSig, '{', Block, '}';
    Type     = ( AtomType, ';' ) | CollType | TypeId | TypePara;

Besides type, view, class, ... definitions, values, objects and behaviours, database programming languages are also concerned with persistence issues. In (i)DBPQL, every data item that can exist in main memory may also be persistent. Thus, as desired in [5, 3, 4, 14], (i)DBPQL’s underlying type system supports persistence independence and orthogonality of type and persistence. This results in a uniform treatment of volatile and persistent data.

Turning our attention to statements and expressions, (i)DBPQL includes (among others) the following PL constructs: assignment, break, condition, continue, label, loop (i.e. do-while, for, loop, and while loops), method invocation, and return statements. Supported QL constructs include ad-hoc or value queries (which only return values), select-from-where (SFW) statements (which return either collections of values or collections of values and objects), schema definition and manipulation statements as discussed above, view definition and manipulation statements, access expressions, simple constraint definition statements, etc.

While classes are used to group objects according to their common structure and behaviour (implicitly by the system, i.e. by maintaining shallow and deep extents), collections allow users to create and maintain additional groups of objects. Collections are similar to classes. However, they do not support object creation, are user maintained, may contain objects of different structure and behaviour, and have implicit interfaces. For instance, a SFW statement returns a (temporary, transient) collection that is derived from those members of the collections specified in the FROM expression that match the boolean expression given in the WHERE expression. The SELECT expression corresponds to a list of projections that then implicitly define the structure (and behaviour) of the resulting collection objects.

Finally, we want to refer to a concept that is supported differently in DBPQL and iDBPQL. DBPQL supports transactions implicitly. There is a default that can be summarised as follows: each invocation of a persistent or schema-related method, and each statement involving persistent or schema-related (e.g. view-derived) data items, corresponds to a transaction. Alternatively, users may declare transactions explicitly, overriding the default procedure. On the lower (iDBPQL) level, transactions are supported explicitly. Recall how blocks are used to model transactions. Supporting transactions explicitly on the lower level is necessary since fragmentation and allocation processes may result in replacing one method call by two or more method calls in case a class has been fragmented vertically. However, we have to retain the information that either all or none of these method calls must be executed. Forming a transaction block around them preserves this information.
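The need for an explicit transaction block around rewritten method calls can be illustrated as follows. The Python sketch assumes a class that has been fragmented vertically into two fragments, so that one logical update becomes two calls. The function `run_transaction_block`, the fragment names, and the snapshot-based rollback are hypothetical; a real TMS would rely on its own concurrency control and recovery mechanisms rather than copying data.

```python
import copy

class Abort(Exception):
    """Raised by a call that cannot complete."""

def run_transaction_block(fragments, calls):
    """Execute all calls or none of them; returns True on commit."""
    snapshot = copy.deepcopy(fragments)    # illustrative undo information
    try:
        for call in calls:
            call(fragments)
    except Exception:
        fragments.clear()
        fragments.update(snapshot)         # undo every call of the block
        return False
    return True

# One logical method call 'hire(7, "Ann", 2500)' after vertical fragmentation.
def hire_personal(fragments):
    fragments["emp_personal"][7] = "Ann"

def hire_pay(fragments):
    fragments["emp_pay"][7] = 2500

def failing_pay(fragments):
    raise Abort("pay fragment unavailable")

db = {"emp_personal": {}, "emp_pay": {}}
print(run_transaction_block(db, [hire_personal, hire_pay]), db)
```

If the second call fails, the effect of the first call is undone as well, which is the all-or-none property the transaction block is meant to preserve.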

5 Realising iDBPQL

Knowing the main characteristics of iDBPQL, we now turn our attention to some lower-level details of the realisation of this language. As mentioned above, REEs govern the evaluation of iDBPQL code. REEs comprise a network of agents that (in terms of the Java programming language) form a (networked) virtual machine evaluating iDBPQL code. Having such a (networked) virtual machine allows persistent behaviours to be re-used easily on any DBS node, no matter where they have been compiled. REEs also take advantage of services (e.g. transaction support, concurrency control, transaction, crash and media recovery, etc.) provided by other DBS components. Communication among agents of REEs and between agents of different DBS components is realised using the DBACL language, briefly introduced in Section 5.2. Networks of agents are built using agent-building blocks of the DBAA, briefly introduced in Section 5.1. Once we are able to build networks of agents and communicate within them, we consider how basic REE agents are structured and how they evaluate iDBPQL statements.

5.1 DBAA – A Database Agent Architecture

In this section, we briefly introduce the agent architecture DBAA, which defines building blocks used to form networks of agents that cooperatively evaluate iDBPQL code. Basic DBAA components are a DBS-wide registry, one or more Master Agents per agent-based DBS component and an agent pool per DBS node. Figure 4 details six common agent-building blocks.

[Figure 4: Overview of DBAA Agent Building Blocks. Recoverable labels: Master Agent; Agent; Collection of Agents; Block of Agents; Federation of Agents; Virtual Collection of Agents.]

Characteristics are as follows:

• Master Agent (MA). MAs provide a first point of contact for other DBS components. Each DBS component that intends to provide services to others has to register one or more (i.e. a collection of) MAs before its services can be accessed. MAs delegate incoming requests to one or more (idle) agents waiting in the local agent pool. Once a request has been delegated, the corresponding agent(s) take over all communication responsibilities w.r.t. this request or stream of requests.

• Agent. Agents are basic building blocks. We will discuss these in more detail in Section 5.3.

• Collection of Agents. A collection of agents is formed as soon as an agent delegates a number of sub-tasks to another agent on the local DBS node. Agents of the same collection work cooperatively on fulfilling a common set of requests. Collections can contain sub-collections.

• Block of Agents. A block of agents can be understood as a special type of collection of agents that becomes important in the DBS environment. For instance, during the evaluation of database requests intermediate results are created. Whenever possible, results are pipelined between agents implementing subsequent tasks as outlined in an execution plan. However, at times it is necessary to materialise some of these intermediate results. Blocks can be seen as boundaries. Pipelining is used for agent communication inside the block. Whenever the block of agents has fulfilled one or more tasks, results (or answers) are materialised and then returned to the caller.

• Federation of Agents. While agents in collections and blocks add new agents to help with the fulfilment of one or more tasks, agents that form (or join) a federation only share the knowledge or expertise they have already gained.

• Virtual Collection of Agents. Virtual collections can be used for different purposes. On the one hand, they can be used in the same way as collections but may span multiple DBS nodes. On the other hand, a virtual (sub-)collection can be maintained explicitly to group a number of agents that work towards the same goal. For instance, collections are used to model (local) transactions. Assuming the use of a multi-level transaction model [24, 2, 10], this may be done by designating a particular sub-collection as a sub-transaction. The outermost collection always corresponds to a top-level transaction. Distributed transactions and replica management can be governed using (virtual) collections.
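The interplay of registry, Master Agent and agent pool described above can be sketched in a few lines of Python. This is a simplified illustration, not the DBAA itself: the class and method names (`Registry.contact`, `MasterAgent.delegate`, `Agent.evaluate`) are assumptions, only one agent is chosen per request, and threads, collections and federations are left out.

```python
from collections import deque

class Agent:
    def __init__(self, name):
        self.name = name

    def evaluate(self, request):
        return f"{self.name} evaluated {request}"

class MasterAgent:
    """First point of contact for one DBS component on one node."""
    def __init__(self, component, pool):
        self.component = component
        self.pool = pool                   # idle agents of the local agent pool

    def delegate(self, request):
        if not self.pool:
            raise RuntimeError("no idle agent available")
        agent = self.pool.popleft()        # the agent leaves the pool of idle agents
        # From here on the caller talks to the agent directly, not to the MA.
        return agent, agent.evaluate(request)

class Registry:
    """DBS-wide registry of Master Agents, keyed by component name."""
    def __init__(self):
        self._masters = {}

    def register(self, master):
        self._masters.setdefault(master.component, []).append(master)

    def contact(self, component):
        return self._masters[component][0]

pool = deque([Agent("agent-1"), Agent("agent-2")])
registry = Registry()
registry.register(MasterAgent("REE", pool))
agent, answer = registry.contact("REE").delegate("evaluate block t1")
print(answer)
```

Returning the chosen agent to the caller reflects the statement that, once a request has been delegated, the agent rather than the Master Agent is responsible for all further communication.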

5.2 DBACL – A Database Agent Communication Language

In this section, we briefly outline the communication capabilities provided to basic agents as defined by the DBAA. The agent communication language DBACL is introduced to cover all communication demands.

DBACL is similar to the KQML language [8] in terms of separating communication aspects from what we call the Agent Evaluation/Execution Language. Thus, agents exchange messages (in an AEL of their choice) wrapped in a DBACL message. DBACL supports:

• One-to-one, one-to-many and broadcast styles of agent-to-agent communication.

• Synchronous or up- and down-stream pipelining of data items. Triggers can be set to control pipelining based on time and / or size constraints. Also, synchronisation signals may be used to override trigger constraints.

• A variety of communication patterns including signals, notifications, broadcasts, votes, negotiations, and requests. Requests, in turn, include registration requests, agent administration requests and evaluation requests (i.e. iDBPQL code when considering REE agents). Votes, negotiations and requests also support pre-conditions, post-conditions, and completion-conditions (formulated in AEL), which are forwarded to the inner-agent logic.
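The separation between the DBACL envelope and its AEL content, together with trigger-controlled pipelining, can be sketched as follows. The Python code is an assumption-laden illustration: the paper does not specify DBACL's message layout, so the fields of `DbaclMessage`, the class `Pipeline`, and its parameters `max_items` and `max_delay` are hypothetical.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class DbaclMessage:
    """Hypothetical DBACL envelope; the field names are assumptions."""
    sender: str
    receivers: tuple      # one receiver, several, or ("*",) for a broadcast
    pattern: str          # e.g. "request", "signal", "notification", "vote"
    ael: str              # language in which the content is written
    content: object       # opaque to DBACL, interpreted by the receiving agent

class Pipeline:
    """Buffers data items and ships them when a size or time trigger fires."""
    def __init__(self, send, max_items=None, max_delay=None, clock=time.monotonic):
        self.send = send
        self.max_items = max_items
        self.max_delay = max_delay
        self.clock = clock
        self.buffer = []
        self.last_flush = clock()

    def put(self, item):
        self.buffer.append(item)
        too_many = self.max_items is not None and len(self.buffer) >= self.max_items
        too_old = (self.max_delay is not None
                   and self.clock() - self.last_flush >= self.max_delay)
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Ship buffered items now; models a synchronisation signal overriding the triggers."""
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()
        self.last_flush = self.clock()

outbox = []
def ship(items):
    outbox.append(DbaclMessage("ree-1", ("ree-2",), "notification", "iDBPQL", items))

pipe = Pipeline(ship, max_items=2)
for row in ("r1", "r2", "r3"):
    pipe.put(row)
pipe.flush()              # explicit synchronisation ships the remaining item
print([m.content for m in outbox])
```

Because the envelope treats its content as opaque, agents of different components can keep their own AELs while sharing one communication language.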

5.3

Realisation of Basic REE Agents

Similar to the approach by Subieta et al. [23], we propose the use of stack-based abstract machines to evaluate iDBPQL code. Basic REE agents (also referred to as three-stack abstract machines or, for short, 3SAMs) consist of the following components:
• An Environment Stack (ES) that determines scoping and binding. In contrast to Subieta et al. [23], we separate private and public components of data held on the ES.
• A Result Stack (RS) that stores intermediate and final results.
• A new Parameter Stack (PS) that stores evaluation parameters. Supporting complex objects requires capabilities to unnest evaluation parameters. Therefore, a separate stack is necessary so as not to compromise system performance.
• A number of queues (i.e. InQueues and OutQueues) that hold data items to be sent to other agents, new requests, or results of previously delegated code segments.
• A 3SAM engine (code modules implementing core iDBPQL statements) evaluating iDBPQL code.

3SAMs implement a number of basic stack operations and exploit functionality defined by DBACL. Basic stack operations include:
• push, pop, top, empty, and unnest operations for the RS;
• push, pop, top, empty, findById, findByName, findNextName, bindNames, unnest, openNewScope, and closeScope operations for the ES together with a variety of scoping and search-related pointers; and
• push, pop, top, empty, and unnest operations for the PS.
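The three stacks and a subset of their operations can be sketched as follows. This is a simplified Python illustration, not the 3SAM implementation: the semantics of `unnest` (replacing a collection on top of the stack by its elements) and the innermost-scope-first lookup in `find_by_name` are assumptions, and the queues, the private/public separation on the ES, and operations such as findById and findNextName are omitted.

```python
class Stack:
    """Basic stack, used here for both the RS and the PS."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def top(self):
        return self._items[-1]

    def empty(self):
        return not self._items

    def unnest(self):
        # Assumed semantics: replace the collection on top by its elements.
        nested = self.pop()
        for element in nested:
            self.push(element)


class EnvironmentStack:
    """Scopes of name bindings; the innermost scope is searched first."""

    def __init__(self):
        self._scopes = [{}]

    def open_new_scope(self):
        self._scopes.append({})

    def close_scope(self):
        if len(self._scopes) == 1:
            raise RuntimeError("cannot close the outermost scope")
        self._scopes.pop()

    def bind_names(self, bindings):
        self._scopes[-1].update(bindings)

    def find_by_name(self, name):
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        raise KeyError(name)


class ThreeStackMachine:
    def __init__(self):
        self.es = EnvironmentStack()  # scoping and binding
        self.rs = Stack()             # intermediate and final results
        self.ps = Stack()             # evaluation parameters
```

Keeping the PS apart from the RS means that unnesting a complex parameter never disturbs intermediate results that are already on the RS.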

3SAM engines interpret iDBPQL statements and annotations, and decide how each statement is to be evaluated efficiently while respecting the information specified as annotations. First, block annotations have to be considered. When dealing with a transactional block, the local TMS has to be consulted. The 3SAM engine may continue only once approval to execute a particular block has been given. Secondly, sub-tasks (code pieces) may be delegated to other (local or remote) REE agents depending on the specified mode of processing. Processing annotations and corresponding actions are as follows:
• Serial. All actions can be executed by the same REE agent.
• Concurrent. A block of agents is used to exploit multi-threading. Whenever possible, arguments and results are pipelined between agents of the same block.
• Parallel. Same as Concurrent, but agents associated with different CPUs are chosen.
• Distributed. A (virtual) collection is formed. Whenever possible, arguments and results are pipelined in order to reduce delays due to data communication.
• None. Same as Serial.

Once it has been decided that a particular statement is to be evaluated by the local 3SAM, stacks are initialised, which may result in opening a large number of scopes and (potentially) retrieving data (i.e. operational, structural, statistical or index data) from persistent storage. In order to retrieve data, the service interface of the Persistent Object Store is used. Subsequently, the 3SAM macro for the particular iDBPQL statement (e.g. a hybrid hash join implementation) is invoked. Further distribution of tasks may occur here, e.g. when operating on a parallel machine.
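The mapping from processing annotations to actions can be summarised as a small dispatch table. The Python sketch below only restates the list above in executable form; the names `Mode`, `Plan` and `plan_block`, and the representation of the TMS as a callback, are hypothetical and chosen purely for illustration.

```python
from enum import Enum
from typing import Callable, NamedTuple


class Mode(Enum):
    SERIAL = "serial"
    CONCURRENT = "concurrent"
    PARALLEL = "parallel"
    DISTRIBUTED = "distributed"
    NONE = "none"


class Plan(NamedTuple):
    organisation: str    # who evaluates the block
    pipelined: bool      # pipeline arguments and results whenever possible
    distinct_cpus: bool  # choose agents associated with different CPUs


PLANS = {
    Mode.SERIAL: Plan("same agent", False, False),
    Mode.NONE: Plan("same agent", False, False),  # same as Serial
    Mode.CONCURRENT: Plan("block of agents", True, False),
    Mode.PARALLEL: Plan("block of agents", True, True),
    Mode.DISTRIBUTED: Plan("virtual collection", True, False),
}


def plan_block(mode: Mode, transactional: bool,
               tms_approves: Callable[[], bool]) -> Plan:
    # Transactional blocks require approval from the local TMS first.
    if transactional and not tms_approves():
        raise PermissionError("local TMS has not approved this block")
    return PLANS[mode]
```

For example, `plan_block(Mode.PARALLEL, True, lambda: True)` yields a pipelined block of agents on distinct CPUs, whereas a transactional block without TMS approval is rejected before any delegation takes place.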

6

Conclusion

In this paper, we have discussed an approach to integrating programming and query languages for distributed object-oriented database systems. The main focus has been on features that stem from a low-level version of this language, which is realised at the level of request evaluation engines and transaction management systems. A network of agents is used to form a networked virtual machine. The agents themselves are realised as three-stack abstract machines.

References

[1] Antonio Albano, Giorgio Ghelli, and Renzo Orsini. Fibonacci: a programming language for object databases. The VLDB Journal, 4(3):403–444, 1995.
[2] Gustavo Alonso, Stephen Blott, Armin Feßler, and Hans-Jörg Schek. Correctness and parallelism in composite systems. In PODS '97: Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pages 197–208. ACM Press, 1997.
[3] Malcolm Atkinson, François Bancilhon, David DeWitt, Klaus Dittrich, David Maier, and Stanley Zdonik. The object-oriented database system manifesto. In Proceedings of the First International Conference on Deductive and Object-Oriented Databases, pages 223–240, Kyoto, Japan, 1989.
[4] Malcolm Atkinson and Ronald Morrison. Orthogonally persistent object systems. The VLDB Journal, 4(3):319–402, 1995.
[5] Malcolm P. Atkinson and Peter Buneman. Types and persistence in database programming languages. ACM Computing Surveys (CSUR), 19(2):105–170, 1987.
[6] François Bancilhon, Claude Delobel, and Paris Kanellakis. Building an object-oriented database system: the story of O2. Morgan Kaufmann Publishers Inc., 1992.
[7] William R. Cook, Walter Hill, and Peter S. Canning. Inheritance is not subtyping. In POPL '90: Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 125–135, New York, NY, USA, 1990. ACM Press.
[8] Tim Finin, Richard Fritzson, Don McKay, and Robin McEntire. KQML as an agent communication language. In CIKM '94: Proceedings of the third international conference on Information and knowledge management, pages 456–463, New York, NY, USA, 1994. ACM Press.
[9] Won Kim. Research directions in object-oriented database systems. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, April 2-4, 1990, Nashville, Tennessee, pages 1–15. ACM Press, 1990.
[10] Markus Kirchberg. Exploiting multi-level transactions in distributed database systems. In Witold Litwin and Gérard Lévy, editors, Distributed Data & Structures 4: Records of the 4th International Meeting, volume 14 of Proceedings in Informatics, pages 37–58. Carleton Scientific, 2002.
[11] Markus Kirchberg. Two-stack abstract machines (2SAMs): An integrated approach to database query and transaction processing. PhD Workshop Communications 6/2003, Department of Information Systems, Massey University, New Zealand, Aug 2003.
[12] Markus Kirchberg, Klaus-Dieter Schewe, and Alexei Tretiakov. A multi-level architecture for distributed object bases. In Proceedings of the 5th International Conference on Enterprise Information Systems (ICEIS), volume 1, pages 63–70. ICEIS Press, 2003.
[13] Yuri Leontiev. Type system for an object-oriented database programming language. PhD thesis, 1999. Advisers: M. Tamer Özsu and Duane Szafron.
[14] Yuri Leontiev, M. Tamer Özsu, and Duane Szafron. On type systems for object-oriented database programming languages. ACM Computing Surveys (CSUR), 34(4):409–449, 2002.
[15] Anna Lipka. The design and implementation of TIGUKAT user languages. Technical Report TR9311, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, July 1993.
[16] R. Morrison, R. C. H. Connor, G. N. C. Kirby, D. S. Munro, M. P. Atkinson, Q. I. Cutts, A. L. Brown, and A. Dearle. The Napier88 persistent programming language and environment. In M. P. Atkinson and R. Welland, editors, Fully Integrated Data Environments, pages 98–154. Springer, 1999.
[17] M. Tamer Özsu, Randal J. Peters, Duane Szafron, Boman Irani, Anna Lipka, and Adriana Muñoz. TIGUKAT: A uniform behavioral objectbase management system. The VLDB Journal, 4(3):445–492, 1995.
[18] Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems. McGraw-Hill Higher Education, 2003.
[19] Klaus-Dieter Schewe, David W. Stemple, and Bernhard Thalheim. Higher-level genericity in object-oriented databases. In Conference on Management of Data, 1994.
[20] J. W. Schmidt and M. Mall. Pascal/R report. Technical Report 66, Fachbereich Automatik, University of Hamburg, Hamburg, Federal Republic of Germany, Jan 1980.
[21] Abraham Silberschatz, Henry F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill Higher Education, 2002.
[22] Kazimierz Subieta. LOQIS: The object-oriented database programming system. Lecture Notes in Computer Science, 504:403–421, 1991.
[23] Kazimierz Subieta, Catriel Beeri, Florian Matthes, and Joachim W. Schmidt. A stack-based approach to query languages. Technical Report 738, Institute of Computer Science, Polish Academy of Sciences, Warszawa, Poland, Dec 1993.
[24] Gerhard Weikum. Principles and realization strategies of multilevel transaction management. ACM Transactions on Database Systems (TODS), 16(1):132–180, 1991.
