Query Rewriting with Coko-Kola

Robin Aly
Database Seminar, University of Mannheim
Instructor: Prof. Dr. Guido Moerkotte
Revision: April 22, 2004

Contents
1. Overview
2. Definitions
3. Motivation for Query Rewrites
4. Problems in Query Rewriting
5. The Coko-Kola Query Rewriter
5.1. Extensibility
5.2. Correctness
5.3. Expressiveness
5.4. Efficiency
6. Syntax of a Query Rewrite rule
6.1. Kola Syntax
6.2. Firing-Algorithms Syntax
7. Examples using Coko-Kola
7.1. Conjunctive Normal Form
7.2. Query Unnesting
8. Conclusion
A. Theorem Proving

1. Overview

The Query Optimizer is a difficult but important component of every Database Management System (DBMS). Starting from a query definition, it produces a query execution plan, which can be seen as a strategy for answering the query. Query definitions are most often given in a standard query language such as SQL for relational database systems or OQL for object-oriented systems. The major task of the Query Optimizer is to produce a good execution plan, so that the query result is generated quickly and the system executing the query is loaded as little as possible.

A common architecture of a Query Optimizer splits this task into four subcomponents (see Figure 1). First, the Parser (the Translator in Figure 1) takes the query as input and transforms it into an internal representation. The Query Rewriter then applies several heuristics to this representation in order to make better plans possible. The Plan Generator uses the improved representation to create several alternative query evaluation plans, and the plan considered best is chosen. Finally, during Query Evaluation, this plan is used to retrieve the selected tuples from the database.

Figure 1: Common query processing steps: Query (SQL / OQL) → Translator → primary query representation → Query Rewriter → optimized query representation (algebra) → Plan Generator → query plan → Query Evaluator → result data (tuples).

This paper concentrates on the Query Rewriter Coko-Kola, a relatively new member of this class of software. One of the first Query Rewriters was included in IBM's System R in 1979. These early systems used Query Rewriting merely for view merging: rewriting a query over a view into a query over the data underlying the view. The emergence of alternative data models (such as the object-oriented data model) demanded extensible Query Optimizers. Rule-based Query Optimizers were first introduced with the Starburst system in 1984. Rule-based Query Rewriters map the query representation step by step into an improved representation by executing a sequence of rules.

Section 2 gives more formal definitions of the key terms used in this paper. Section 3 motivates the use of Query Rewriters. Section 4 points out some of the most crucial problems that arise when developing a Rewriter. Section 5 introduces the Coko-Kola system and describes its particular design goals. Section 6 introduces the syntax of the rewriting language, and Section 7 gives two examples of its use. Section 8 concludes the paper with a summary of what was learned from Coko-Kola.

2. Definitions

Here are short definitions of the most important terms used in this paper.

Query Optimization: the process of making a query more efficiently executable. This includes transforming the query into an equivalent query that is likely to execute faster, and increasing the choice of algorithms available for query execution.

Query Rewriting: a processing step of Query Optimization. It transforms the query's parse tree into a tree that produces equivalent results but gives the plan generator more choices.

Rule-based Query Rewriting: a special class of Query Rewriters that execute a given set of rules against a parse tree. Each rule first tries to identify a certain subtree that should be transformed. After a successful identification, this subtree is transformed into an equivalent subtree. The identification phase of a rule is also referred to as its left-hand side (lhs), the transformation phase as its right-hand side (rhs).

Correctness of a Query Rewrite: a query rewrite is correct if rewriting always produces a semantically equivalent query, i.e., a query that is guaranteed to produce the same results as the original query over all database states [1].
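To make the split between identification (lhs) and transformation (rhs) concrete, here is a minimal Python sketch of a generic rule-based rewriter. It assumes our own conventions, not Coko-Kola's: parse trees are nested tuples and rule symbols are strings starting with `?`. The hypothetical function `match` plays the role of the lhs and returns bindings; `instantiate` plays the role of the rhs. The example rule is d1 from Section 7.1.

```python
from typing import Dict, Optional, Union

Term = Union[str, tuple]        # an atom, or (operator, child1, child2, ...)
Bindings = Dict[str, Term]      # rule symbol -> bound subexpression

def is_symbol(t: Term) -> bool:
    """Rule symbols (p, q, r, ...) are written here as strings starting with '?'."""
    return isinstance(t, str) and t.startswith("?")

def match(pattern: Term, term: Term, env: Optional[Bindings] = None) -> Optional[Bindings]:
    """Identification (lhs): bindings if term has the shape of pattern, else None."""
    env = {} if env is None else env
    if is_symbol(pattern):
        if pattern in env:                       # symbol used twice: both uses must agree
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(pattern) != len(term):
            return None
        for p_part, t_part in zip(pattern, term):
            env = match(p_part, t_part, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None      # operators and atoms must be equal

def instantiate(template: Term, env: Bindings) -> Term:
    """Transformation (rhs): rebuild a term, replacing symbols by their bindings."""
    if is_symbol(template):
        return env[template]
    if isinstance(template, tuple):
        return tuple(instantiate(part, env) for part in template)
    return template

def apply_rule(lhs: Term, rhs: Term, term: Term) -> Optional[Term]:
    """Fire a rule at the root of term; None if the identification fails."""
    env = match(lhs, term)
    return None if env is None else instantiate(rhs, env)

# Rule d1: (p & q) | r  =>  (p | r) & (q | r)
D1_LHS = ("|", ("&", "?p", "?q"), "?r")
D1_RHS = ("&", ("|", "?p", "?r"), ("|", "?q", "?r"))

if __name__ == "__main__":
    print(apply_rule(D1_LHS, D1_RHS, ("|", ("&", "A", "B"), "C")))
    # ('&', ('|', 'A', 'C'), ('|', 'B', 'C'))
```

Keeping `match` purely syntactic is the point: as long as a rule never needs to look beyond the shape of the tree, identification is ordinary pattern matching.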

3. Motivation for Query Rewrites

The reason for Query Rewriting is that the direct translation of queries posed by users is often far from the optimal execution plan [6]. It is important that the DBMS allows users to pose their queries in the most natural way. In practice, a trade-off has to be made between user-friendliness and the degree of optimization opportunities in the Query Optimization process. A classical example is a query containing a subquery (see Figure 2). Evaluated naively, the inner query (which is potentially complex) is executed once for every tuple of the outer query, resulting in very long execution times. The semantically equivalent query formulated as a join between the two base relations can typically be evaluated much faster.

SELECT DISTINCT E.Name
FROM EMP E
WHERE E.Dept# IN (SELECT D.Dept#
                  FROM DEPT D
                  WHERE D.Loc = 'MA' AND E.Emp# = D.Mgr)

SELECT DISTINCT E.Name
FROM EMP E, DEPT D
WHERE E.Dept# = D.Dept# AND D.Loc = 'MA' AND E.Emp# = D.Mgr

Figure 2: Rewriting a nested query into a query using a join

It is also desirable to give the Plan Generator more possible execution plans. After a rewrite from a nested query into a join-based query, the plan generator has many more alternatives regarding which algorithms to use to execute the query (see Figure 2).

The emergence of new data models and query languages (object-oriented databases and OQL) also gave good reasons for rewriting. In particular, it increased the need for extensible Query Rewriters that can adjust to deeply nested data structures as well as complex query languages. Rule-based optimizers like Coko-Kola address this problem by allowing the developer to easily modify the behavior of the optimizer by changing the set of rules it operates on. More formally, the objectives of a rule-based Query Rewriter can be seen as:

Normalization: bring query subexpressions into a form that allows the optimizer to consider a greater number of alternative plans.

Improvement: apply heuristics that improve the execution of the query. For example, a rewrite could delete a duplicate elimination when it is not needed. This is the case if the analyzed collection is already a set and therefore free of duplicates.
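The equivalence of the two queries in Figure 2, and the extra freedom the join form gives the plan generator, can be illustrated with a small Python sketch over hypothetical in-memory EMP and DEPT tuples. The function `nested` mimics tuple-at-a-time evaluation of the subquery, `joined` evaluates the join predicate over all pairs, and `hash_joined` is one alternative physical algorithm (a hash join) that only becomes an option after unnesting. All names and data are illustrative, and agreement on one database state is of course not a proof of equivalence.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Emp:
    name: str
    emp_no: int
    dept_no: int

@dataclass(frozen=True)
class Dept:
    dept_no: int
    loc: str
    mgr: int

def nested(emps, depts, stats):
    """First query of Figure 2, evaluated tuple at a time."""
    result = set()
    for e in emps:
        stats["inner_runs"] += 1             # the subquery is re-evaluated per EMP tuple
        inner = {d.dept_no for d in depts if d.loc == "MA" and e.emp_no == d.mgr}
        if e.dept_no in inner:
            result.add(e.name)
    return result

def joined(emps, depts):
    """Second query of Figure 2: one predicate over EMP x DEPT pairs (nested-loop join)."""
    return {e.name for e in emps for d in depts
            if e.dept_no == d.dept_no and d.loc == "MA" and e.emp_no == d.mgr}

def hash_joined(emps, depts):
    """Same join evaluated as a hash join: build on DEPT, probe with EMP."""
    build = {(d.dept_no, d.mgr) for d in depts if d.loc == "MA"}
    return {e.name for e in emps if (e.dept_no, e.emp_no) in build}

# Hypothetical sample data
EMPS = [Emp("Alice", 1, 10), Emp("Bob", 2, 10), Emp("Carol", 3, 20), Emp("Dave", 4, 30)]
DEPTS = [Dept(10, "MA", 1), Dept(20, "HD", 3), Dept(30, "MA", 5)]

if __name__ == "__main__":
    stats = {"inner_runs": 0}
    print(nested(EMPS, DEPTS, stats), stats["inner_runs"])   # {'Alice'} 4
    print(joined(EMPS, DEPTS) == hash_joined(EMPS, DEPTS))   # True
```

The hash join can use a plain set as its build side only because the query asks for DISTINCT names; with bag semantics the build side would have to keep multiplicities.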

4. Problems in Query Rewriting

Query Rewriting is a complex and error-prone task, and rewritten queries have repeatedly been found to return incorrect results. One of the most notorious examples is the "COUNT bug" in the unnesting transformations of a paper by Kim [5], which were shown to be incorrect in several respects [4]. This motivates the requirement that query rewrites preserve the semantics of the queries they rewrite. Therefore the workings of the Rewriter must be understandable, and the developer must be able to reason about the correctness of the rules used for rewriting. This is difficult with most common Query Rewriters. Starburst [8], for example, uses C code to rewrite queries. This is problematic because algorithms written in a general-purpose programming language are hard to reason about. In general, the correctness of arbitrary programs cannot even be verified automatically; this undecidability is an important result of theoretical computer science.
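As an illustration of the kind of error meant here, the following Python sketch reproduces the textbook form of the COUNT bug on a hypothetical schema R(a, b) and S(b). The query keeps the R tuples whose a equals the number of matching S tuples; the flawed rewrite first groups S by b and then joins, which silently drops R tuples whose count should be 0. It is a simplified illustration of the bug, not a transcription of Kim's algorithm.

```python
from collections import Counter

# Hypothetical relations R(a, b) and S(b)
R = [(0, "x"), (1, "y"), (2, "z")]
S = ["y", "z", "z"]

def nested_count(R, S):
    """SELECT R.b FROM R WHERE R.a = (SELECT COUNT(*) FROM S WHERE S.b = R.b)"""
    return [b for (a, b) in R if a == sum(1 for sb in S if sb == b)]

def unnested_count(R, S):
    """Flawed rewrite: T = SELECT b, COUNT(*) FROM S GROUP BY b, then join R with T."""
    T = Counter(S)                                       # groups exist only for b values in S
    return [b for (a, b) in R if b in T and a == T[b]]   # inner join on R.b = T.b

if __name__ == "__main__":
    print(nested_count(R, S))    # ['x', 'y', 'z']
    print(unnested_count(R, S))  # ['y', 'z']: the tuple with count 0 is lost
```

Both versions look plausible in isolation; only reasoning about the empty-group case reveals that they differ, which is exactly the kind of reasoning a rewriter's rules must support.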


Another problem arises with variable-based Query Rewriters. These systems use variables to identify subexpressions of queries. Some subexpressions look syntactically the same but have different semantics because they are used in different contexts. For this reason, such systems also have to perform context checks during the identification phase of a rewrite. These checks require code and cannot be done by simple pattern matching.
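The following Python sketch shows what such a context check looks like, assuming a small, hypothetical variable-based term representation. Both predicates of the subquery in Figure 2 have the same syntactic shape (an equality whose left side is an attribute of a variable), yet only one of them mentions a single variable. Telling them apart requires a free-variable analysis, not a pattern match.

```python
from typing import FrozenSet

# Hypothetical variable-based terms:
#   ("var", name) | ("const", value) | ("attr", term, field)
#   ("eq", t1, t2) | ("and", t1, t2) | ("exists", var, collection, body)

def free_vars(t) -> FrozenSet[str]:
    """Variables that occur in t without being bound by an enclosing 'exists'."""
    kind = t[0]
    if kind == "var":
        return frozenset({t[1]})
    if kind == "const":
        return frozenset()
    if kind == "attr":
        return free_vars(t[1])
    if kind in ("eq", "and"):
        return free_vars(t[1]) | free_vars(t[2])
    if kind == "exists":
        _, v, coll, body = t
        return free_vars(coll) | (free_vars(body) - {v})   # v is bound inside body
    raise ValueError(f"unknown term kind: {kind!r}")

def only_one_free_var(t) -> bool:
    """A context condition a rule might impose before it is allowed to fire."""
    return len(free_vars(t)) == 1

# The two predicates of the subquery in Figure 2
loc_pred = ("eq", ("attr", ("var", "d"), "Loc"), ("const", "MA"))
mgr_pred = ("eq", ("attr", ("var", "e"), "Emp#"), ("attr", ("var", "d"), "Mgr"))
# d ranges over DEPT inside the subquery; the outer variable e stays free
subquery = ("exists", "d", ("const", "DEPT"), ("and", loc_pred, mgr_pred))

if __name__ == "__main__":
    for name, t in [("loc_pred", loc_pred), ("mgr_pred", mgr_pred), ("subquery", subquery)]:
        print(name, sorted(free_vars(t)), only_one_free_var(t))
```

The free variable `e` in the subquery is precisely what makes it correlated with the outer query.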

5. The Coko-Kola Query Rewriter

The Query Rewriter analyzed in this paper is the Coko-Kola rule-based Query Rewriter. It was developed at Brown University, USA, mainly by Mitch Cherniack and Stan Zdonik, starting in 1996. The new idea behind this rewriter is that its rewrite rules are expressed in a combinator-based syntax. This avoids many of the difficulties and shortcomings of other rewriters. The query rewriting component consists of two major parts:

Kola: the combinator-based language used to represent query parse trees and to specify the identification and transformation parts of a rewrite rule.

Coko: a language to specify rule firing algorithms on the query parse tree, i.e., the order in which the rewrite rules are executed.

The following goals were pursued during the design of the Query Rewriter:
1. Extensibility
2. Correctness
3. Expressiveness
4. Efficiency

The next sections elaborate on the reasons for these goals and explain the design decisions made by the inventors of Coko-Kola.

5.1. Extensibility

As mentioned before, extensibility is particularly important in Query Rewriters, as they need to adapt to new data models and query languages. Older versions of SQL only allowed subqueries in the WHERE clause, whereas OQL allows subqueries in all clauses, and these subqueries can in principle be nested arbitrarily deep. Along with this higher degree of complexity come more opportunities to optimize queries. To exploit them, the optimizer must be able to change its behavior easily as new challenges arise. Coko-Kola achieves this extensibility by making its base of relatively powerful rewrite rules easy to modify. It is important here that the Kola rules themselves are small and self-contained, so that new optimization ideas can be formulated easily without affecting the correctness of other rules.

A Coko-Transformation combines several rewrite rules into one logical unit, and Transformations can also use other Transformations. This increases the modularity of rules and Transformations and makes it easy to build new ones upon the existing base of rules, which makes the architecture flexible and extensible.

5.2. Correctness

As mentioned before, a rule-based rewrite consists of two steps: the identification of a subexpression, followed by its transformation. Most Query Rewriters use pattern matching algorithms to identify subexpressions, which are then transformed into new, equivalent expressions. Query rewriting with variable-based rewrite rules cannot rely on pattern matching alone, since in some cases the context of variables must be examined further. This is true for the identification as well as for the transformation step. For example, a rewrite rule may require that only one free variable appears in a certain subexpression. This cannot be checked by simple pattern matching algorithms, as they rely only on the syntactic equality of expressions.

Kola rules use combinators to represent the query tree as well as the left-hand side and the right-hand side of a rewrite rule. This eliminates the need for context checks altogether and makes the applicability of a rule depend solely on the syntax of the query. Consequently, no supplementary code is needed to check that a rule is applied correctly.

The biggest benefit of a combinator-based syntax is that the correctness of each rewrite rule can be verified with an automatic theorem prover. This kind of software is mainly used in the context of Artificial Intelligence, but it can also verify the correctness of a transformation from one query representation into another: proving the equivalence of two statements is a natural application of theorem provers. Of course, automated theorem proving cannot ultimately prove that a rewrite is correct, because the prover is itself only a piece of software. It can therefore increase confidence in the correctness of a rewrite only insofar as its proving capabilities are trusted more than those of a human being. Please refer to Appendix A for a more detailed description of how the theorem prover used works.

5.3. Expressiveness

As mentioned earlier, the expressiveness of a rewrite language is also important. It enables rule developers to reason more effectively about proposed rules and makes complex rewrites easier to formulate. Together with the correctness issue, this was the reason for designing Kola as a declarative language: the developer specifies not how a rewrite should happen, but what it should produce. On its own, however, this only allows a single rewrite expression to be specified, which is not very much. To give the rule developer control over the order of rule execution, multiple Kola rules can be combined in the header of a Coko-Transformation. The body of the transformation states how to apply the rewrites to the query tree; these statements are called the firing algorithms of the transformation. It is also possible to call other transformations for preprocessing steps that are shared with other transformations. A single Kola rule is too simple to express complex rewrites such as normalization; combined with Coko-Transformations, it yields a much more expressive tool for developing even complex rewrites.

5.4. Efficiency

As far as performance is concerned, it is also important to keep the rewriting phase as efficient as possible. One problem is to quickly find the nodes where rules can be applied. The Coko-Kola Query Rewriter addresses this issue by giving the developer a flexible set of firing algorithms with which to traverse the query parse tree efficiently. In their paper about Coko [7], the inventors compare several algorithms for a Conjunctive Normal Form (CNF) transformation (one of them is described in Section 7.1) against a naive exhaustive search, performed both top-down and bottom-up. According to their results, the advantage of their algorithm grows with the height of the query parse tree. Another problem is to perform the actual transformation of an identified subexpression efficiently. This is achieved by translating the declarative Kola rules and Coko-Transformations into C++ classes. This provides good execution speed, and the translation into C++ is a good entry point for further optimizations and enhancements.

6. Syntax of a Query Rewrite rule

As mentioned before, the Coko-Kola Query Rewriter consists of Kola rewrite rules and Coko-Transformations that specify their execution order. The rewrite rules are written in a combinator-based language, which makes them more difficult to read; on the other hand, this is exactly what allows us to prove their correctness. A Coko-Transformation lists one or more Kola rules in its USES section, each identified by a unique name. It may also declare the use of other Coko-Transformations simply by stating their names. Between the keywords BEGIN and END, the developer writes a sequence of firing algorithms (see Figure 3), which control the execution order of the rewrite rules as well as the traversal of the query tree. The following two subsections describe the syntax of Kola rules and Coko-Transformations in more detail.

6.1. Kola Syntax

Kola's combinator style may be difficult to read. However, it is not meant as a query language, only as an internal representation. (A more comprehensive introduction to Kola can be found in [2].)

TRANSFORMATION <Name>
USES
  <RuleName>: <lhs> => <rhs>
  ...
BEGIN
  ...
END

Figure 3: Coko-Transformation Syntax

The syntax of Kola includes no variables. Functions and predicates are built either from primitives or with formers. Because there are no variables, no environment checking is needed during subexpression identification: Kola rules depend solely on the syntax of the query, so simple pattern matching algorithms are sufficient to perform the identification.

To trace the operation of rewrite rules, it is important to understand the semantics of Kola's primitives and formers. The invocation of a function f on an object or value x is written f ! x. Predicates are likewise written with the object they are checked on, as p ? x. Tables 1, 2 and 3 show the most common formers. Tables 1 and 2 should be self-explanatory. Table 3 lists the formers used to represent a query. The former iterate, for example, can represent a simple SELECT-FROM-WHERE query over a single base relation: for every x ∈ A, the predicate p is checked, and if it holds, the result of applying f to x is returned. The former join joins two base relations. njoin works similarly, but additionally groups the values by the values of the "left" set.

To become more familiar with the syntax, we trace the evaluation of the equality predicate from the SQL statement in Figure 2 on two concrete objects e and d:

$$
\begin{aligned}
(\mathit{eq} \oplus \langle \mathit{DeptNo} \circ \pi_1, \mathit{DeptNo} \circ \pi_2 \rangle) \,?\, [e, d]
&= \mathit{eq} \,?\, (\langle \mathit{DeptNo} \circ \pi_1, \mathit{DeptNo} \circ \pi_2 \rangle \,!\, [e, d]) \\
&= \mathit{eq} \,?\, [(\mathit{DeptNo} \circ \pi_1) \,!\, [e, d],\; (\mathit{DeptNo} \circ \pi_2) \,!\, [e, d]] \\
&= \mathit{eq} \,?\, [\mathit{DeptNo} \,!\, e,\; \mathit{DeptNo} \,!\, d] \\
&= \mathit{eq} \,?\, [e.\mathit{DeptNo},\; d.\mathit{DeptNo}] \\
&= (e.\mathit{DeptNo} == d.\mathit{DeptNo})
\end{aligned}
$$

The identification of subexpressions can now be performed by pattern matching algorithms without any additional context checks. If the identification step of a rule succeeds, the result is a set of bindings of the rule's symbols to subexpressions. The expression is then rewritten into a new expression using these symbols.
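These semantics can be made executable with a minimal Python sketch, assuming Kola functions and predicates are modeled as Python closures: `f ! x` becomes `f(x)`, `p ? x` becomes `p(x)`, and pairs `[x, y]` become tuples. The helper names (`compose`, `pairing`, `oplus`, ...) are our own, and the sample tuples are hypothetical. The last lines rebuild the predicate traced above.

```python
from types import SimpleNamespace
from typing import Any, Callable

Fn = Callable[[Any], Any]      # a Kola function:  f ! x  ==  f(x)
Pred = Callable[[Any], bool]   # a Kola predicate: p ? x  ==  p(x)

# Function combinators (Table 1)
def id_(x): return x
def pi1(pair): return pair[0]
def pi2(pair): return pair[1]
def att(name: str) -> Fn: return lambda x: getattr(x, name)
def compose(f: Fn, g: Fn) -> Fn: return lambda x: f(g(x))
def pairing(f: Fn, g: Fn) -> Fn: return lambda x: (f(x), g(x))
def cross(f: Fn, g: Fn) -> Fn: return lambda xy: (f(xy[0]), g(xy[1]))
def Kf(c) -> Fn: return lambda _y: c
def Cf(f: Fn, x) -> Fn: return lambda y: f((x, y))

# Predicate combinators (Table 2)
def eq(xy) -> bool: return xy[0] == xy[1]
def oplus(p: Pred, f: Fn) -> Pred: return lambda x: p(f(x))
def and_(p: Pred, q: Pred) -> Pred: return lambda x: p(x) and q(x)
def or_(p: Pred, q: Pred) -> Pred: return lambda x: p(x) or q(x)
def not_(p: Pred) -> Pred: return lambda x: not p(x)
def inv(p: Pred) -> Pred: return lambda xy: p((xy[1], xy[0]))
def Kp(b: bool) -> Pred: return lambda _x: b
def Cp(p: Pred, x) -> Pred: return lambda y: p((x, y))

# The predicate traced above: eq ⊕ <DeptNo ∘ π1, DeptNo ∘ π2>
same_dept = oplus(eq, pairing(compose(att("DeptNo"), pi1), compose(att("DeptNo"), pi2)))

e = SimpleNamespace(Name="Alice", Emp=1, DeptNo=10)   # hypothetical tuples
d = SimpleNamespace(DeptNo=10, Loc="MA", Mgr=1)

if __name__ == "__main__":
    print(same_dept((e, d)))                           # True
```

Note that `same_dept` is assembled without naming any variable; the closures only ever receive the whole argument, which is what makes the rule language variable-free.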


KOLA     Semantics                             Comment
id       id ! x = x                            identity function
π1       π1 ! [x, y] = x                       projection
π2       π2 ! [x, y] = y                       projection
<att>    <att> ! x = x.<att>                   attribute access of an object
∘        (f ∘ g) ! x = f ! (g ! x)             composition
⟨ ⟩      ⟨f, g⟩ ! x = [f ! x, g ! x]           function pairing
×        (f × g) ! [x, y] = [f ! x, g ! y]     pairwise application
Kf       Kf(x) ! y = x                         constant function
Cf       Cf(f, x) ! y = f ! [x, y]             curried function

Table 1: Function Combinators

KOLA     Semantics                             Comment
eq       eq ? [x, y] = (x == y)                comparison operator (likewise lt, gt, leq, geq)
⊕        (p ⊕ f) ? x = p ? (f ! x)             combination of predicate and function
&        (p & q) ? x = (p ? x) AND (q ? x)     logical operator AND
|        (p | q) ? x = (p ? x) OR (q ? x)      logical operator OR
∼        ∼(p) ? x = NOT (p ? x)                logical operator NOT
⁻¹       p⁻¹ ? [x, y] = p ? [y, x]             inverse predicate
Kp       Kp(b) ? x = b                         constant predicate
Cp       Cp(p, x) ? y = p ? [x, y]             curried predicate

Table 2: Predicate Combinators

KOLA                        Semantics                                                   Comment
iterate(p, f) ! A           {(f ! x)^i | x^i ∈ A, p ? x}                                function f applied to the elements of bag A that satisfy predicate p
join(p, f) ! [A, B]         {(f ! [x, y])^(i·j) | x^i ∈ A, y^j ∈ B, p ? [x, y]}         joins two bags and applies f to all pairs satisfying p
njoin(p, f, g) ! [A, B]     {[x, g ! {(f ! y)^j | y^j ∈ B, p ? [x, y]}]^i | x^i ∈ A}    performs a join and groups the elements of B under the corresponding element of A
ex(p) ? A                   ∃x (x ∈ A ∧ p ? x)                                          existential quantifier
set ! A                     {x | x ∈ A}                                                 duplicate removal

Table 3: Query Formers (superscripts i and j denote multiplicities of bag elements)
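The query formers of Table 3 can be sketched in the same closure style, assuming bags are modeled as Python lists (order is irrelevant, duplicates are kept). The function `set_` carries an underscore to avoid shadowing Python's built-in `set`, and the sample relations are hypothetical. Note that `njoin` keeps the elements of A whose group is empty, exactly as its definition in Table 3 prescribes.

```python
from typing import Any, Callable, List

Bag = List[Any]            # a bag modeled as a list: order irrelevant, duplicates kept
Pred = Callable[[Any], bool]
Fn = Callable[[Any], Any]

def iterate(p: Pred, f: Fn) -> Callable[[Bag], Bag]:
    """iterate(p, f) ! A = {(f ! x)^i | x^i in A, p ? x}"""
    return lambda A: [f(x) for x in A if p(x)]

def join(p: Pred, f: Fn) -> Callable[[tuple], Bag]:
    """join(p, f) ! [A, B] = {(f ! [x, y])^(i*j) | x^i in A, y^j in B, p ? [x, y]}"""
    return lambda AB: [f((x, y)) for x in AB[0] for y in AB[1] if p((x, y))]

def njoin(p: Pred, f: Fn, g: Callable[[Bag], Any]) -> Callable[[tuple], Bag]:
    """njoin(p, f, g) ! [A, B]: each x in A paired with g applied to its group from B."""
    return lambda AB: [(x, g([f(y) for y in AB[1] if p((x, y))])) for x in AB[0]]

def ex(p: Pred) -> Callable[[Bag], bool]:
    """ex(p) ? A: is there an x in A with p ? x?"""
    return lambda A: any(p(x) for x in A)

def set_(A: Bag) -> Bag:
    """set ! A: duplicate removal (keeps first occurrences; elements must be hashable)."""
    return list(dict.fromkeys(A))

# Hypothetical sample relations: EMP rows are (Name, DeptNo), DEPT rows are (DeptNo, Loc).
EMP = [("Alice", 10), ("Bob", 10), ("Carol", 20)]
DEPT = [(10, "MA"), (20, "HD"), (30, "MA")]

if __name__ == "__main__":
    # Names of employees working in a department located in 'MA'
    in_ma = join(lambda ed: ed[0][1] == ed[1][0] and ed[1][1] == "MA", lambda ed: ed[0][0])
    print(set_(in_ma((EMP, DEPT))))   # ['Alice', 'Bob']
    # Number of employees per department, including empty groups
    per_dept = njoin(lambda de: de[0][0] == de[1][1], lambda e: e[0], len)
    print(per_dept((DEPT, EMP)))      # [((10, 'MA'), 2), ((20, 'HD'), 1), ((30, 'MA'), 0)]
```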


6.2. Firing-Algorithms Syntax

Coko firing algorithms control the execution of the Kola rules described above. Rules can be fired selectively on certain nodes of the query tree only, or made dependent on the outcome of the previously fired rule. The four kinds of firing algorithms are:

Explicit Firing is the simplest kind of firing algorithm. It works like a function call in C: stating one of the rules listed in the USES section, followed by a semicolon, executes this rule at that point. It is also possible to execute the inverse of a rule, which swaps its left-hand and right-hand sides and thus effectively undoes the changes made by a previous execution of the rule.

Traversal Control comprises algorithms that execute a rule not only on the current node but also on nodes in the subtree rooted at this node. The developer can choose between a top-down (TD) and a bottom-up (BU) traversal of the tree; the traversal specifier is followed by the name of the rule to be executed on every node. The third kind of traversal control works slightly differently: "REPEAT s" fires rule s as long as it executes successfully (i.e., until the left-hand side of the rule no longer matches the current node).

Selective Firing fires rules not on the current node itself but on isolated subtrees of it. The first part of a GIVEN statement resembles the left-hand side of a Kola rule and is used to identify the subtrees; the second part (after the DO keyword) can use the identified symbols to specify the nodes on which a rule should be fired.

Conditional Firing makes the execution of a rule dependent on the outcome of the previously executed rule. There are three kinds of conditional firing:
• S ⇒ S′ executes S′ only if S was executed successfully.
• S || S′ executes S′ only if S did not succeed.
• S; S′ executes S and then S′, and succeeds if either one succeeded.
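The firing algorithms can be modeled as strategies, i.e., functions from a term to a pair (succeeded?, new term). The Python sketch below is our own modeling, not Coko's implementation: terms are nested tuples, a rule returns `None` when its left-hand side does not match, and the success value of ⇒ and of REPEAT is an assumption, since the text above only specifies when the second statement runs. GIVEN and rule inversion are omitted, and the demo rule `not_not` (∼∼p ⇒ p) is hypothetical.

```python
from typing import Callable, Optional, Tuple, Union

Term = Union[str, tuple]                  # an atom, or (operator, child1, child2, ...)
Result = Tuple[bool, Term]                # (did the strategy succeed?, resulting term)
Strategy = Callable[[Term], Result]
Rule = Callable[[Term], Optional[Term]]   # returns None when its lhs does not match

def fire(rule: Rule) -> Strategy:
    """Explicit firing: try the rule once, at the current node only."""
    def s(t: Term) -> Result:
        out = rule(t)
        return (False, t) if out is None else (True, out)
    return s

def if_success(s1: Strategy, s2: Strategy) -> Strategy:
    """S => S': run S' only if S succeeded (assumption: succeeds iff S succeeded)."""
    def s(t: Term) -> Result:
        ok, t = s1(t)
        if ok:
            _, t = s2(t)
        return ok, t
    return s

def or_else(s1: Strategy, s2: Strategy) -> Strategy:
    """S || S': run S' only if S failed."""
    def s(t: Term) -> Result:
        ok, t = s1(t)
        return (ok, t) if ok else s2(t)
    return s

def seq(s1: Strategy, s2: Strategy) -> Strategy:
    """S; S': run both in order; succeed if either one succeeded."""
    def s(t: Term) -> Result:
        ok1, t = s1(t)
        ok2, t = s2(t)
        return ok1 or ok2, t
    return s

def repeat(s1: Strategy) -> Strategy:
    """REPEAT S: fire S until it fails (assumption: succeeds if S fired at least once)."""
    def s(t: Term) -> Result:
        fired = False
        while True:
            ok, t = s1(t)
            if not ok:
                return fired, t
            fired = True
    return s

def _on_children(t: Term, s: Strategy) -> Result:
    """Apply s to every child of a compound term and rebuild it."""
    if isinstance(t, str):
        return False, t
    results = [s(child) for child in t[1:]]
    return any(ok for ok, _ in results), (t[0], *(child for _, child in results))

def td(s1: Strategy) -> Strategy:
    """TD S: fire S on every node, parents before children."""
    def s(t: Term) -> Result:
        ok_here, t = s1(t)
        ok_below, t = _on_children(t, s)
        return ok_here or ok_below, t
    return s

def bu(s1: Strategy) -> Strategy:
    """BU S: fire S on every node, children before parents."""
    def s(t: Term) -> Result:
        ok_below, t = _on_children(t, s)
        ok_here, t = s1(t)
        return ok_below or ok_here, t
    return s

def not_not(t: Term) -> Optional[Term]:
    """Hypothetical demo rule: ~~p => p."""
    if isinstance(t, tuple) and t[0] == "~" and isinstance(t[1], tuple) and t[1][0] == "~":
        return t[1][1]
    return None

if __name__ == "__main__":
    t = ("~", ("~", ("~", ("~", "P"))))
    print(td(fire(not_not))(t))   # (True, ('~', ('~', 'P')))
    print(bu(fire(not_not))(t))   # (True, 'P')
```

A single top-down pass fires each node only once, so it can leave behind a new match at a node it has already visited; REPEAT, or the targeted re-firing used in Section 7.1, addresses this.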

7. Examples using Coko-Kola

Having described the Coko-Kola Query Rewriter, we now present two examples of its use.


7.1. Conjunctive Normal Form

As an entry point, we present a relatively simple example: transforming a predicate into Conjunctive Normal Form (CNF). As a reminder, a predicate in CNF is a conjunction of disjunctions of possibly negated literals. This rewrite is used as a preprocessing step in many other rewrites. For example, it is used in the transformation into Separate Normal Form, which in turn is used for magic-set rewriting as well as for predicate push-down rewrites. The two rules involved are:

$$
\begin{aligned}
d_1 &: (p \mathbin{\&} q) \mid r \;\stackrel{\rightarrow}{=}\; (p \mid r) \mathbin{\&} (q \mid r) \\
d_2 &: r \mid (p \mathbin{\&} q) \;\stackrel{\rightarrow}{=}\; (p \mid r) \mathbin{\&} (q \mid r)
\end{aligned}
$$

First, note that this rewrite cannot be accomplished by the two Kola rules on their own, since it involves recursion: once a predicate term has been rewritten, one of its new subtrees may have to be transformed again. This calls for a more complex firing algorithm, specified as a Coko-Transformation. There are several ways to do the job. One possibility is to traverse the query tree top-down and fire both rules on every node, repeating the traversal until neither rule applies anymore. This is clearly not ideal, as it scans the query tree far too often. Another alternative is a bottom-up scan of the tree that, whenever one of the rules succeeds, recursively re-scans the two new subtrees completely. This is not ideal either. A good alternative is a single bottom-up traversal of the query tree that, only when one of the rules has fired successfully, tries the rules again on the roots of the newly created subtrees (not recursively on every node of the tree). Figure 4 shows the "source code" of this alternative.

We demonstrate the algorithm on the combined predicate (P & Q & R) | S; Figure 5 shows the tree for each of the three execution steps. In (a), CNF is called and performs a bottom-up traversal of the tree, firing CNF_AUX on every node. It succeeds only at the root node, which is transformed according to rule d1. In (b), CNF_AUX is then fired on both disjunctive child nodes. It succeeds only on the right node and (again) fires on its left and right subtrees (c), this time without success.
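A direct Python transcription of the two transformations in Figure 4 may help to follow the walkthrough. It is a sketch under our own assumptions: predicates are nested tuples over the binary operators & and |, literals are plain strings, and P & Q & R is grouped as P & (Q & R), which matches the walkthrough in which the second round of firings succeeds only on the right child. The function `cnf_aux` mirrors `{d1 || d2} -> GIVEN p & q DO {CNF_AUX(p); CNF_AUX(q)}`: after a successful firing it re-fires only on the two new subtrees rather than re-scanning the whole tree.

```python
from typing import Optional, Tuple, Union

Term = Union[str, tuple]   # a literal (str), ("&", left, right) or ("|", left, right)

def d1(t: Term) -> Optional[Term]:
    """d1: (p & q) | r  =>  (p | r) & (q | r)"""
    if isinstance(t, tuple) and t[0] == "|" and isinstance(t[1], tuple) and t[1][0] == "&":
        (_, p, q), r = t[1], t[2]
        return ("&", ("|", p, r), ("|", q, r))
    return None

def d2(t: Term) -> Optional[Term]:
    """d2: r | (p & q)  =>  (p | r) & (q | r)"""
    if isinstance(t, tuple) and t[0] == "|" and isinstance(t[2], tuple) and t[2][0] == "&":
        r, (_, p, q) = t[1], t[2]
        return ("&", ("|", p, r), ("|", q, r))
    return None

def cnf_aux(t: Term) -> Tuple[bool, Term]:
    """CNF_AUX: {d1 || d2} -> GIVEN p & q DO {CNF_AUX(p); CNF_AUX(q)}"""
    new = d1(t)
    if new is None:
        new = d2(t)                  # d1 || d2: try d2 only if d1 failed
    if new is None:
        return False, t
    _, p, q = new                    # GIVEN p & q: both rules produce a conjunction
    _, p = cnf_aux(p)                # re-fire only at the roots of the two new subtrees
    _, q = cnf_aux(q)
    return True, ("&", p, q)

def cnf(t: Term) -> Term:
    """CNF: BU CNF_AUX, a single bottom-up pass firing CNF_AUX at every node."""
    if isinstance(t, tuple):
        t = (t[0], cnf(t[1]), cnf(t[2]))
    return cnf_aux(t)[1]

if __name__ == "__main__":
    pred = ("|", ("&", "P", ("&", "Q", "R")), "S")     # (P & (Q & R)) | S
    print(cnf(pred))
    # ('&', ('|', 'P', 'S'), ('&', ('|', 'Q', 'S'), ('|', 'R', 'S')))
```

Because the children are already in CNF when a node is visited bottom-up, distributing at that node and re-firing on the two new subtrees is enough; no further scan of the rest of the tree is needed.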

7.2. Query Unnesting

The second example is somewhat more complex: a nested query is transformed into a join-based query. As mentioned before, this technique gives the plan generator more choices. The SQL query shown in Figure 2 is used to demonstrate the transformation. First, the original query is translated into a Kola expression:

$$
\mathit{set} \,!\, \mathit{iterate}\big(\mathit{ex}(O) \oplus \langle \mathit{id},\; \mathit{iterate}(P \mathbin{\&} Q, \pi_2) \,!\, \langle \mathit{id}, K_f(\mathit{DEPT}) \rangle \rangle,\; \mathit{Name}\big) \,!\, \mathit{EMP}
$$


TRANSFORMATION CNF
USES CNF_AUX
BEGIN
  BU CNF_AUX
END

TRANSFORMATION CNF_AUX
USES
  d1: (p & q) | r => (p | r) & (q | r)
  d2: r | (p & q) => (p | r) & (q | r)
BEGIN
  {d1 || d2} -> GIVEN p & q DO {CNF_AUX(p); CNF_AUX(q)}
END

Figure 4: Rewrite rules to transform a predicate into CNF

Figure 5: Transformation of a predicate (P and Q and R) or S into CNF. Panels (a), (b) and (c) show the predicate tree in the three execution steps.


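Here, the predicates O, P and Q are:

$$
\begin{aligned}
O &= \mathit{eq} \oplus \langle \mathit{DeptNo} \circ \pi_1, \mathit{DeptNo} \circ \pi_2 \rangle \\
P &= \mathit{eq} \oplus \langle \mathit{Loc} \circ \pi_2, K_f(\text{'MA'}) \rangle \\
Q &= \mathit{eq} \oplus \langle \mathit{Emp} \circ \pi_1, \mathit{Mgr} \circ \pi_2 \rangle
\end{aligned}
$$

The rewrite into a join-based query is then performed with two Kola rules (see Figure 6). Rule (1) merges the selection predicate of the inner query into the existential quantifier, thereby eliminating the iteration of the inner query. Rule (2) then transforms the query into a join-based query. The Coko-Transformation surrounding these two rules is not stated explicitly, as it does not contain any special firing algorithms.

$$
\begin{aligned}
\mathit{ex}(p) \oplus \langle \mathit{id}, \mathit{iterate}(q, \pi_2) \,!\, \langle \mathit{id}, K_f(B) \rangle \rangle
&\;\stackrel{\rightarrow}{=}\; \mathit{ex}(q \mathbin{\&} p) \oplus \langle \mathit{id}, K_f(B) \rangle \qquad (1) \\
\mathit{set} \,!\, \mathit{iterate}(\mathit{ex}(p) \oplus \langle \mathit{id}, K_f(B) \rangle, f) \,!\, A
&\;\stackrel{\rightarrow}{=}\; \mathit{set} \,!\, \mathit{join}(p, f \circ \pi_1) \,!\, [A, B] \qquad (2)
\end{aligned}
$$

Figure 6: Kola rules for transforming a nested query into a join-based query

After both rules have been applied, the resulting query is:

$$
\mathit{set} \,!\, \mathit{join}(O \mathbin{\&} P \mathbin{\&} Q, \mathit{Name} \circ \pi_1) \,!\, [\mathit{EMP}, \mathit{DEPT}]
$$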
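To see that rules (1) and (2) preserve the result, the three stages of the query can be evaluated on sample data. The Python sketch below uses hypothetical EMP and DEPT rows and closure-based combinators as stand-ins for Kola. One assumption needs stating loudly: when `iterate` and `ex` are applied to a pair [x, B], as in the nested expression, we read them as iterating over B with x held fixed, passing [x, y] to the predicate and function (the helpers `iterate_env` and `ex_env` are our own names for this reading). This checks one database state; it is not a proof of equivalence.

```python
from types import SimpleNamespace as Row

# Hypothetical sample data; attribute names follow O, P and Q above.
EMP = [Row(Name="Alice", Emp=1, DeptNo=10), Row(Name="Bob", Emp=2, DeptNo=10),
       Row(Name="Carol", Emp=3, DeptNo=20), Row(Name="Alice", Emp=4, DeptNo=30),
       Row(Name="Eve", Emp=6, DeptNo=40)]
DEPT = [Row(DeptNo=10, Loc="MA", Mgr=1), Row(DeptNo=20, Loc="HD", Mgr=3),
        Row(DeptNo=30, Loc="MA", Mgr=4), Row(DeptNo=40, Loc="MA", Mgr=6)]

# Function and predicate combinators (Tables 1 and 2)
def ident(x): return x
def pi1(xy): return xy[0]
def pi2(xy): return xy[1]
def att(a): return lambda x: getattr(x, a)
def compose(f, g): return lambda x: f(g(x))          # f ∘ g
def pairing(f, g): return lambda x: (f(x), g(x))     # <f, g>
def Kf(c): return lambda _y: c                       # constant function
def eq(xy): return xy[0] == xy[1]
def oplus(p, f): return lambda x: p(f(x))            # p ⊕ f
def and_(p, q): return lambda x: p(x) and q(x)       # p & q

# Query formers
def iterate(p, f):        # iterate(p, f) ! A over a plain bag A
    return lambda A: [f(x) for x in A if p(x)]

def iterate_env(p, f):    # ASSUMED reading of iterate(p, f) ! [x, B]
    return lambda xB: [f((xB[0], y)) for y in xB[1] if p((xB[0], y))]

def ex_env(p):            # ASSUMED reading of ex(p) ? [x, B]
    return lambda xB: any(p((xB[0], y)) for y in xB[1])

def join(p, f):           # join(p, f) ! [A, B]
    return lambda AB: [f((x, y)) for x in AB[0] for y in AB[1] if p((x, y))]

def set_(A):              # set ! A: duplicate removal, first occurrences kept
    return list(dict.fromkeys(A))

O = oplus(eq, pairing(compose(att("DeptNo"), pi1), compose(att("DeptNo"), pi2)))
P = oplus(eq, pairing(compose(att("Loc"), pi2), Kf("MA")))
Q = oplus(eq, pairing(compose(att("Emp"), pi1), compose(att("Mgr"), pi2)))

# Original: set ! iterate(ex(O) ⊕ <id, iterate(P & Q, π2) ! <id, Kf(DEPT)>>, Name) ! EMP
inner = compose(iterate_env(and_(P, Q), pi2), pairing(ident, Kf(DEPT)))
nested = set_(iterate(oplus(ex_env(O), pairing(ident, inner)), att("Name"))(EMP))

# After rule (1): set ! iterate(ex((P & Q) & O) ⊕ <id, Kf(DEPT)>, Name) ! EMP
after_rule1 = set_(iterate(oplus(ex_env(and_(and_(P, Q), O)), pairing(ident, Kf(DEPT))),
                           att("Name"))(EMP))

# After rule (2): set ! join(O & P & Q, Name ∘ π1) ! [EMP, DEPT]
unnested = set_(join(and_(and_(O, P), Q), compose(att("Name"), pi1))((EMP, DEPT)))

if __name__ == "__main__":
    print(nested, after_rule1, unnested)   # ['Alice', 'Eve'] ['Alice', 'Eve'] ['Alice', 'Eve']
```

The two employees named Alice produce a duplicate in every stage before `set_` is applied, which is why the outer duplicate removal must survive both rewrites.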

8. Conclusion

We now come to our conclusion on the Coko-Kola Query Rewriter. We saw that it belongs to the class of rule-based rewriters. These rewriters operate on a set of rules, each executed in two steps: identification and transformation. What is special about Coko-Kola is that these rules are not formulated using variables but in a combinator-based syntax. The syntax is declarative, allowing us to concentrate on what we want to achieve rather than on how to achieve it.

The major goals the designers of Coko-Kola pursued were extensibility, correctness, expressiveness and efficiency. Extensibility was achieved through a modular design of the rules and a clear separation of rule definitions and firing strategies. The combinator-based syntax provides a high degree of verifiability (and therefore correctness), since automatic theorem provers can be used to check whether rules produce equivalent queries. Expressiveness is provided by flexible firing algorithms and by the declarative style, which allows fairly simple syntax even in complex scenarios. Finally, efficiency was achieved by translating the rules into C++ code and by allowing the user to specify efficient firing algorithms.

We then went through the concrete syntax of Kola, the language used for rule definitions, and introduced the Coko language for grouping Kola rules together with their firing algorithms. Finally, we saw two examples demonstrating the application of Coko-Kola: transforming a selection predicate into Conjunctive Normal Form and turning a nested query into a query using joins.

As the Coko-Kola system is rather new, it has not yet been benchmarked against other Query Optimizers. In general, the developers mainly pointed out the advantages of their system and did not discuss its weak points. We see a critical point in the choice of the combinator-based syntax: few developers are used to this kind of syntax, and it is difficult for others to read. The structure of more complex expressions further reduces readability. Another weak point is that the optimizer is not currently built into a working DBMS, which prevents users and other researchers from experimenting with the system in real-life situations.

A. Theorem Proving

The theorem prover used to verify the correctness of Kola rules is Larch [3]. All proofs in Larch operate on an underlying type system whose units are called traits (similar to abstract data types). Larch has several built-in traits, for example for boolean, string, integer and floating-point data types. The verification of Kola rules additionally uses two traits: BagBasics and Bag. BagBasics defines bag constructors, an empty-bag constant, the insert operation and several unary operators on bags. Bag defines several binary operators on bags, such as union, difference and intersection. Together with the built-in traits, they form a base of axioms upon which all proofs of query rewrite rules operate. A proof is performed by rewriting the terms on both sides of an equation until both are reduced to the same term. This technique is called term rewriting (not to be confused with query rewriting).

This is illustrated with a small example. The goal is to verify that the union of the iterations over two bags produces the same result as the iteration over the union of the same two bags. Expressed in Kola, the statement is:

$$
\mathit{iterate}(p, f) \,!\, (A \cup B) \;\stackrel{\rightarrow}{=}\; (\mathit{iterate}(p, f) \,!\, A) \cup (\mathit{iterate}(p, f) \,!\, B)
$$

Proof. The proof is based on the following axioms; axioms (6) and (7) apply only under the condition written before "::":

$$
\begin{aligned}
\{\} \cup B &\;\rightarrow\; B \qquad (3) \\
\mathit{insert}(x, A) \cup B &\;\rightarrow\; \mathit{insert}(x, A \cup B) \qquad (4) \\
\mathit{iterate}(p, f) \,!\, \{\} &\;\rightarrow\; \{\} \qquad (5) \\
(p \,?\, x) :: \mathit{iterate}(p, f) \,!\, \mathit{insert}(x, A) &\;\rightarrow\; \mathit{insert}(f \,!\, x, \mathit{iterate}(p, f) \,!\, A) \qquad (6) \\
\sim(p \,?\, x) :: \mathit{iterate}(p, f) \,!\, \mathit{insert}(x, A) &\;\rightarrow\; \mathit{iterate}(p, f) \,!\, A \qquad (7)
\end{aligned}
$$

The proof proceeds by induction on A, so that the statement holds for all bags A and B. In the base case, A is the empty bag:

$$
\begin{aligned}
& \mathit{iterate}(p, f) \,!\, (\{\} \cup B) = (\mathit{iterate}(p, f) \,!\, \{\}) \cup (\mathit{iterate}(p, f) \,!\, B) \\
\xrightarrow{(3),\,(5)}\; & \mathit{iterate}(p, f) \,!\, B = \{\} \cup (\mathit{iterate}(p, f) \,!\, B) \\
\xrightarrow{(3)}\; & \mathit{iterate}(p, f) \,!\, B = \mathit{iterate}(p, f) \,!\, B
\end{aligned}
$$

For the induction step, assume as induction hypothesis that the statement holds for a bag $A_c$. We show that it also holds for $\mathit{insert}(x, A_c)$, for an arbitrary element x:

$$
\begin{aligned}
& \mathit{iterate}(p, f) \,!\, (\mathit{insert}(x, A_c) \cup B) = (\mathit{iterate}(p, f) \,!\, \mathit{insert}(x, A_c)) \cup (\mathit{iterate}(p, f) \,!\, B) \\
\xrightarrow{(4)}\; & \mathit{iterate}(p, f) \,!\, \mathit{insert}(x, A_c \cup B) = (\mathit{iterate}(p, f) \,!\, \mathit{insert}(x, A_c)) \cup (\mathit{iterate}(p, f) \,!\, B)
\end{aligned}
$$

The rest of the proof proceeds by case analysis: one case assumes that the predicate p holds for x, the other that it does not. First, assume $p \,?\, x$:

$$
\begin{aligned}
\xrightarrow{(6)\ \text{twice}}\; & \mathit{insert}(f \,!\, x, \mathit{iterate}(p, f) \,!\, (A_c \cup B)) = \mathit{insert}(f \,!\, x, \mathit{iterate}(p, f) \,!\, A_c) \cup (\mathit{iterate}(p, f) \,!\, B) \\
\xrightarrow{(4)}\; & \mathit{insert}(f \,!\, x, \mathit{iterate}(p, f) \,!\, (A_c \cup B)) = \mathit{insert}(f \,!\, x, (\mathit{iterate}(p, f) \,!\, A_c) \cup (\mathit{iterate}(p, f) \,!\, B))
\end{aligned}
$$

and both sides agree by the induction hypothesis. If $\sim(p \,?\, x)$:

$$
\xrightarrow{(7)}\; \mathit{iterate}(p, f) \,!\, (A_c \cup B) = (\mathit{iterate}(p, f) \,!\, A_c) \cup (\mathit{iterate}(p, f) \,!\, B)
$$

which is exactly the induction hypothesis. Since B remained arbitrary throughout, the statement holds for all bags A and B.
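A Larch session is not reproduced here, but a quick randomized test can complement such a proof by catching a wrong rule early. The Python sketch below rests on our own modeling assumptions: bags are encoded with the constructors used in the axioms (`None` for {} and a nested pair for insert(x, A)), ∪ and iterate are defined exactly by axioms (3) to (7), and both sides of the lemma are compared as multisets. Passing such a test is evidence, not a proof.

```python
import random
from collections import Counter

EMPTY = None                          # the empty bag {}

def insert(x, A):
    """Bag constructor insert(x, A), represented as a nested pair."""
    return (x, A)

def union(A, B):
    if A is EMPTY:
        return B                      # (3) {} ∪ B -> B
    x, rest = A
    return insert(x, union(rest, B))  # (4) insert(x, A) ∪ B -> insert(x, A ∪ B)

def iterate(p, f, A):
    if A is EMPTY:
        return EMPTY                  # (5) iterate(p, f) ! {} -> {}
    x, rest = A
    if p(x):                          # (6) applies when p ? x holds
        return insert(f(x), iterate(p, f, rest))
    return iterate(p, f, rest)        # (7) applies when p ? x does not hold

def from_list(xs):
    A = EMPTY
    for x in reversed(xs):
        A = insert(x, A)
    return A

def as_multiset(A) -> Counter:
    """Bags are equal if every element occurs equally often, regardless of order."""
    counts = Counter()
    while A is not EMPTY:
        x, A = A
        counts[x] += 1
    return counts

def lemma_holds(p, f, A, B) -> bool:
    lhs = iterate(p, f, union(A, B))
    rhs = union(iterate(p, f, A), iterate(p, f, B))
    return as_multiset(lhs) == as_multiset(rhs)

if __name__ == "__main__":
    rng = random.Random(42)

    def p(x): return x % 3 != 0       # arbitrary predicate
    def f(x): return x // 2           # non-injective, so duplicates arise

    for _ in range(1000):
        A = from_list([rng.randrange(10) for _ in range(rng.randrange(6))])
        B = from_list([rng.randrange(10) for _ in range(rng.randrange(6))])
        assert lemma_holds(p, f, A, B)
    print("no counterexample found")
```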

References

[1] Mitch Cherniack. Building Query Optimizers with Combinators. PhD thesis, Brown University, USA, May 1999.
[2] Mitch Cherniack and Stanley B. Zdonik. Rule languages and internal algebras for rule-based optimizers. Pages 401–412, 1996.
[3] J. V. Guttag and J. J. Horning. Larch: Languages and tools for formal specification. Documentation, Laboratory for Computer Science, January 1993. http://www.sds.lcs.mit.edu/spd/larch/index.html.
[4] Werner Kiessling. SQL-like and QUEL-like correlation queries with aggregates revisited. Memorandum UCB/ERL M84/75, Electronics Research Laboratory, College of Engineering, University of California, Berkeley, Berkeley, CA, September 1984.
[5] Won Kim. On optimizing an SQL-like nested query. ACM Trans. Database Syst., 7(3):443–469, 1982.
[6] June Suk Lee. Design Documentation - COKO Compiler. Brown University, USA, May 1998. Documentation that comes with the source. Web site: http://www.cs.brandeis.edu/ cokokola/.
[7] Mitch Cherniack and Stan Zdonik. Changing the rules: Transformations for rule-based optimizers. In ACM SIGMOD International Conference on Management of Data, page 13. Brown University, USA, June 1998.
[8] Hamid Pirahesh, Joseph M. Hellerstein, and Waqar Hasan. Extensible/rule based query rewrite optimization in Starburst. In Michael Stonebraker, editor, Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data, San Diego, California, June 2-5, 1992, pages 39–48. ACM Press, 1992.