Supporting Array Types in Monoid Comprehensions

Antonio Martinez Alcantara and Bill P. Buckles
Dept. of Electrical Engineering and Computer Science
Tulane University, New Orleans, LA 70118
e-mail: [email protected] Phone: (504)862-3373 Fax: (504)862-3293

Abstract

A widely known problem of commercial database systems is their lack of support for "scientific" applications. This is in part derived from their inability to deal with some data types such as arrays. It is the purpose of this research to make possible the full manipulation of array types using the query language for an OODB. The proposal is to use the Monoid Comprehension Calculus (MCC) [11], properly augmented, to support array types. The approach is to define arrays as functions instead of specifying them directly as monoids. This overcomes significant disadvantages of previously reported approaches. We present some examples of queries involving intensive manipulation of arrays in the context of image processing.

1 Introduction

Object oriented database technology has reduced the impedance between programming languages and databases. In large part, this is due to the introduction of collections, some of which are recognizable abstract data types frequently employed in scientific programming. Greater attention and support, however, is being given to unordered collections such as sets and bags that are less frequently employed in scientific applications of databases. Arrays, while present in some form in most widely used OODBs, are less fully developed and supported.

We describe a systematic approach to incorporating arrays as functions in the MCC, avoiding the problem of finding an adequate monoid for vectors. Comprehensions are shown to be sufficiently expressive for nontrivial array operations such as transpose and inner product. The comprehensions can be further reduced to lambda calculus expressions in a manner similar to that of other collection types [11, 12, 15]. Finally, the utility of the entire framework is shown within a matrix intense scientific application: image processing. High level operations such as histograms and co-occurrence matrices are possible as well as smoothing and convolution.

The plan of the paper is to cover as discrete topics the several theories that are integrated to achieve the objectives. That is, the monoid comprehension calculus is described, and that is followed by a separate discussion of array representation issues. After that, the essential aspects of category theory necessary in the interpretation of arrays as functions are given. Only then is the integration of arrays and MCC fully described with examples.

2 Background

The problem of adding capabilities for manipulating arrays in a query language is only a part of the more widely discussed problem of enabling database languages to support a richer set of data types, which would allow users to model the real world in a more natural way. Those data types, "bulk types", often contain large amounts of data [23]. Amongst those types, the ones that have received the most attention from the research community are sets, bags and lists. There is a strong belief amongst database researchers that this lack of support for complex data structures is, in large part, one of the main reasons for the poor impact of commercial database management systems in scientific applications [16].

There have been attempts to more formally express the use of collections in programming languages (e.g., [3]) and at least one attempt to interpret collections in the context of relational query languages [5]. In the object-oriented database context, there exist an algebra/calculus (generator-free) approach [21], a mixed comprehension/algebra approach [13] and a comprehension approach [9, 12] for expressing the semantics of collection-based querying. One proposal representing the latter is the "Monoid Comprehension Calculus" [12], which has the property of capturing most of the operators that are commonly used in OODBs. For example, many of the most important features present in OQL of ODMG-93 [8] can be mapped directly to the calculus. It is this approach that we will use here.

Figure 1 provides a clearer context of the role of this work in OODB query language research. MCC is proposed as a normal form for query expression. We are extending the calculus to incorporate arrays uniformly. Given a suitably expressive calculus, further research becomes achievable, such as the translation of a query language to MCC [13], provisions for symbolic optimization, and definition of formal semantics as illustrated in the figure.

In this calculus it is possible to represent in a declarative form operations (queries) involving collection types whose output can be a collection type as well, but of a different kind. The argued advantages of this approach are attributed to the precise semantic meaning of all types (collection types like sets, bags and lists but also primitive types such as integers and booleans) and the operations that arise from their homogeneous representation via monoids. Monoids are used as general templates for all data types. This representation captures the differences between collection types (avoiding inconsistencies) and their similarities, yielding as a result a framework that treats collections in a uniform way. This uniformity can also be exploited to improve the process of query optimization because regular representation reduces the number of special cases.

Based on these ideas, for arrays to be defined as collection types in this calculus, we should be able to find a monoid for arrays. In fact there is a proposal for such [11, 12]. However, that proposal violates a basic constraint for collection monoids: the monoid is not free. This means the homomorphisms are not unique and so, presumably, the semantics is not well defined. The approach we use avoids the pitfall, in part, by defining arrays as functions.

The formal aspect of our approach bears much resemblance to both Fegaras/Maier [11, 12] and Libkin et al. [15] but differs in important ways. It is similar to [15] in that a) our arrays are functions as well and b) we also use comprehensions for denoting arrays. It differs in that it i) is based on the MCC and ii) is developed for use in object-oriented databases. Conversely, it is similar to [11] and [12] in the use of the MCC, but adds functions. Like Haskell [10], we treat arrays as functions. These functions are total over finite domains as in [22] rather than partial with finite support as in [15]. The notion of "finite domain" that we use to define arrays as total functions formalizes the idea of "rectangular domains" used by [15] (i.e. all indices must be successions of integers beginning at zero, and each subscript tuple must be well-defined over the array).

3 The Monoid Comprehension Calculus

In this section we describe the MCC. The monoid calculus has deep roots in the theory of categories; however, the exposition in this section is ad hoc and follows [12]. We begin by presenting the definition of monoid, the most basic idea in the MCC.

Definition 1 (Monoid) A monoid of type $T$ is a triple $M = (T, merge, zero)$; $merge$ is a binary ($merge : T \times T \to T$) and associative operation with an identity element called $zero$ ($merge(zero, x) = merge(x, zero) = x$).

Monoids can be qualified as idempotent and/or commutative depending on whether merge is idempotent (i.e. $merge(x, x) = x$) and/or commutative (i.e. $merge(x, y) = merge(y, x)$). Let us take for example the monoid for sets $(set, \cup, \{\})$ and the monoid for lists $(list, +\!+, [\,])$. The merge operation for sets ($\cup$) is idempotent and commutative, while the operation for lists ($+\!+$) is neither idempotent nor commutative. In fact, the properties of merge are what constitute the difference between sets and lists. A Collection or Free Monoid is the relevant construct with respect to this work.

Definition 2 (Collection Monoid) A collection monoid is a quadruple $(T(\alpha), unit, zero, merge)$ where $T(\alpha)$ is a parameterized type over $\alpha$ and $(T(\alpha), zero, merge)$ is a monoid as prescribed by Definition 1. The unit is a function of type $\alpha \to T(\alpha)$.

[Figure 1: Monoid Comprehensions in Query Language Processing. Labels: OQL Query, Translation Rules, Monoid Comprehension Calculus, Optimization, Lambda Calculus, Computational Rules (Rewrite Systems), Semantics, Query Execution Plan.]

In the theory of categories, collection monoids are known as free monoids and the MCC depends on their theoretical properties. As an example of a collection monoid, let us take again the monoid for sets. To define the collection monoid for sets we need the set monoid $(set, \{\}, \cup)$, a type $set(\alpha)$, a monoid type $\alpha$, and a function unit that takes $a$ of type $\alpha$ and outputs the singleton set $\{a\}$. Table 1 lists some examples of monoids corresponding to collection types frequently observed in commercial systems. Next we present the definition of primitive monoid in the MCC.

Definition 3 (Primitive Monoid) A primitive monoid is a quadruple $(T, unit, zero, merge)$, in which unit is the identity function (i.e. $unit(a) = a, \forall a \in T$) and $(T, zero, merge)$ is a monoid as prescribed by Definition 1.

Thus primitive monoids are a special case of collection monoids for which unit is the identity function. Because the MCC is based on the theoretical properties of collection monoids, this definition assumes primitive monoids have the necessary properties. Some useful primitive monoids are illustrated in Table 2.

  monoid    type           zero       unit        merge
  list      list($\alpha$)   [ ]        [a]         $+\!+$
  set       set($\alpha$)    { }        {a}         $\cup$
  bag       bag($\alpha$)    {{ }}      {{a}}       $\uplus$
  string    list(char)     ""         "a"         concat

Table 1: Some Collection Monoids

  monoid    type    zero     unit    merge
  sum       int     0        a       +
  prod      int     1        a       *
  some      bool    false    a       $\lor$
  all       bool    true     a       $\land$

Table 2: Some Primitive Monoids
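As an informal illustration of Definitions 1 to 3, the sketch below encodes a few of the monoids from Tables 1 and 2 as plain Python values. The class name `Monoid`, the helper `obeys_monoid_laws`, and the choice of Python lists and frozensets as carriers are assumptions made for this example only; they are not part of the MCC.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    """A (T, unit, zero, merge) quadruple; field names follow Definitions 2 and 3."""
    name: str
    zero: Any
    unit: Callable[[Any], Any]
    merge: Callable[[Any, Any], Any]


# Collection monoids from Table 1 (list and set only; the encodings are assumptions).
LIST = Monoid("list", [], lambda a: [a], lambda x, y: x + y)
SET = Monoid("set", frozenset(), lambda a: frozenset([a]), lambda x, y: x | y)

# Primitive monoids from Table 2: unit is the identity function.
SUM = Monoid("sum", 0, lambda a: a, lambda x, y: x + y)
PROD = Monoid("prod", 1, lambda a: a, lambda x, y: x * y)
SOME = Monoid("some", False, lambda a: a, lambda x, y: x or y)
ALL = Monoid("all", True, lambda a: a, lambda x, y: x and y)


def obeys_monoid_laws(m: Monoid, samples: list) -> bool:
    """Check identity and associativity of merge on a finite list of sample values."""
    identity = all(m.merge(m.zero, x) == x == m.merge(x, m.zero) for x in samples)
    assoc = all(
        m.merge(m.merge(x, y), z) == m.merge(x, m.merge(y, z))
        for x in samples for y in samples for z in samples
    )
    return identity and assoc
```

Checking the laws on samples is of course not a proof; it only shows how the set merge is idempotent (`SET.merge(s, s) == s`) while the list merge is not.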

Since monoids are used to represent all types, a query involving items of several different types would imply a mapping (called monoid homomorphism) from the several monoids to a specific monoid. That is, monoid homomorphisms capture the operations over multiple collection types, for example, the join of a bag with a list returning a set. The definition and properties of monoid homomorphisms are given next.

Definition 4 (Monoid Homomorphism) A homomorphism from the collection monoid $M = (T(\alpha), zero^M, unit^M, merge^M)$ to any other monoid $N$, where $M \preceq N$ and $N = (S, zero^N, merge^N)$, is defined by the following equations:

  (i)   $hom^{M \to N}(f)(zero^M) = zero^N$
  (ii)  $hom^{M \to N}(f)(unit^M(a)) = f(a)$
  (iii) $hom^{M \to N}(f)(merge^M(x, y)) = merge^N(hom^{M \to N}(f)\,x,\ hom^{M \to N}(f)\,y)$

In the above, $f$ is an arbitrary function over $T(\alpha)$, and $x$ and $y$ are elements of $T$. Thus a homomorphism sends functions over $T(\alpha)$ to functions over $S$.

There exists a partial order amongst the names of the monoids; the notation $M \preceq N$ indicates that name $M$ precedes $N$ in the order. This relationship is imposed by the properties of commutativity and idempotence of the merge operation. If we define a mapping $\psi : \mathcal{N} \to \{C, I\}$ ($\mathcal{N}$ is the set of monoid names) to be:

  $\psi(M) = \{I\}$     iff $M$ is idempotent
  $\psi(M) = \{C\}$     iff $M$ is commutative
  $\psi(M) = \{C, I\}$  iff $M$ is both

then the condition $M \preceq N$ is equivalent to $\psi(M) \subseteq \psi(N)$. The constraint $M \preceq N$ can be explained as follows: we said that monoids can be qualified as idempotent and/or commutative depending on whether merge is idempotent and/or commutative. Basically, one cannot deterministically map a monoid having either property into one not having the same property. To clarify, Fig. 2 depicts deterministic mappings. For example, $List \preceq Bag$ and $Bag \preceq Set$. Ordered sets are lists with non-repeated items.

[Figure 2: Collection Type Lattice. Nodes: List, Bag, Ordered Set, Set. Labels: Commutative, Idempotent, Commutative Idempotent.]

Monoid homomorphisms are used as the basis to define monoid comprehensions, which are the means to represent queries in the MCC. Queries may deal with one or more collection types. Monoid comprehensions take the general form

$$M\{e \mid q\} = M\{e \mid q_1, q_2, \ldots, q_n\}, \quad n \ge 0,$$

where element $e$ is known as the "head" of the comprehension and $q$ is a list of qualifiers $q_i$, each of which can be either:

• A generator $v \leftarrow e'$, where $v$ is a variable and $e'$ is an expression.
• A filter predicate. A filter is a boolean-valued expression. It expresses a condition that must be satisfied for an element to be selected for evaluation by subsequent qualifiers.

The generator $v \leftarrow e'$ makes the variable $v$ range over the elements of the collection-valued expression $e'$. A formal definition transforms a comprehension to a $\lambda$-calculus expression.

Definition 5 (Monoid Comprehension) A monoid comprehension over a primitive or collection monoid $M$ is defined by the following inductive equations

  (iv) $M\{e \mid\ \} = unit^M(e)$
  (v)  $M\{e \mid x \leftarrow u, q\} = hom^{N \to M}(\lambda x.\, M\{e \mid q\})\, u$
  (vi) $M\{e \mid pred, q\} = \text{if } pred \text{ then } M\{e \mid q\} \text{ else } zero^M$

where $u$ in $x \leftarrow u$ is an expression that computes an instance of the collection monoid $N$, $N \preceq M$, and $pred$ is a filter (a predicate). In the next example we will see how the rules in Definitions 4 and 5 are used to reduce a monoid comprehension.

Example 1 Let us apply the rules in the definitions above to reduce the comprehension:

$$[\,square\ x \mid x \leftarrow [1, 2, 3],\ odd\ x\,]$$

Before proceeding, we will agree on some notation that will ease the presentation of this document and in particular of this example:

• The notation $M\{e \mid q\}$ used to specify a comprehension can be simplified. We can drop the letter $M$ (that denotes the target monoid) and substitute the braces "{ }" by the brackets that correspond to the collection monoid $M$. For example, we can write $[e \mid q]$ instead of $list\{e \mid q\}$; similarly, we can write $\{\{e \mid q\}\}$ instead of $bag\{e \mid q\}$, etc.
• We will write $h^{source,target}$ instead of $hom^{source \to target}$. Here, source and target are respectively the names of the source and target monoids. We can even write $h^{s,t}$ when the names of the monoids are obvious from the context.

Let us begin by applying rule (v):

$$[\,square\ x \mid x \leftarrow [1, 2, 3],\ odd\ x\,] = h^{list,list}(\lambda x.[\,square\ x \mid odd\ x\,])\,[1, 2, 3]$$

making $\lambda x.[\,square\ x \mid odd\ x\,] \equiv f$, and applying rule (iii):

$$h^{l,l} f\ merge([1], [2, 3]) = merge^{list}(h^{l,l} f\,[1],\ h^{l,l} f\,[2, 3])$$

apply rule (ii) and rule (iii):

$$= f[1] +\!+\ merge^{list}(h^{l,l} f\,[2],\ h^{l,l} f\,[3])$$

apply rule (ii), two times:

$$= f[1] +\!+ (f[2] +\!+ f[3])$$

Developing $f[1]$:

  $f[1] = (\lambda x.[\,square\ x \mid odd\ x\,])[1]$
  substituting $x$: $= [\,square\ 1 \mid odd\ 1\,]$
  applying rule (vi): $= \text{if } odd\ 1 \text{ then } [\,square\ 1 \mid\ ] \text{ else } zero^{list}$
  applying rule (iv): $= unit^{list}[square\ 1] = [1]$

Similarly $f[2] = [\,]$ and $f[3] = [9]$; then, finally, as expected:

$$f[1] +\!+ f[2] +\!+ f[3] = [1, 9]$$

A final note before finishing the example. We have just seen that the process of reducing a comprehension is rather intricate; therefore, we have chosen to use the monoid calculus for its ability to give a unique meaning to the comprehensions and not because of its ease for reducing comprehensions.

Some interesting examples of comprehensions are:

  $length(x) = sum\{1 \mid e \leftarrow x\}$
  $count(x, a) = sum\{1 \mid e \leftarrow x,\ e = a\}$
  $sum(x) = sum\{v \mid v \leftarrow x\}$
  $filter(p)\,x = set\{e \mid e \leftarrow x,\ p(e)\}$
  $x \cap y = set\{e \mid e \leftarrow x,\ e \in y\}$

$length(x)$ takes, for example, a list $x$ and calculates the number of elements. $count(x, a)$ counts the number of times the element $a$ appears in a list (or bag) $x$. The expression $sum(x)$ gets the summation of the elements in a non-idempotent monoid, a bag, for example. $filter(p)\,x$ selects an element $e$ from a set $x$, iff the predicate $p(e)$ is true. Finally, $x \cap y$ selects elements $e$ of set $x$ iff they belong also to set $y$. Now, we are ready to give a definition for the monoid calculus.
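The comprehensions discussed above can be mimicked in a few lines of Python. This is only a sketch under simplifying assumptions: the source collection is any Python iterable, the target monoid is passed as a `(zero, merge)` pair, and the helper name `hom` as well as `total`, `filter_set` and `intersect` are names chosen for this example. Folding over the elements plays the role of rules (i) to (iii) for a collection that was built from units and merges.

```python
from functools import reduce
from operator import add, or_


def hom(zero, merge, f, xs):
    """Apply f to every element of xs and combine the results with the target monoid."""
    return reduce(merge, (f(x) for x in xs), zero)


def length(xs):
    # sum{ 1 | e <- xs }
    return hom(0, add, lambda e: 1, xs)


def count(xs, a):
    # sum{ 1 | e <- xs, e = a }; the filter contributes zero^sum = 0 when it fails
    return hom(0, add, lambda e: 1 if e == a else 0, xs)


def total(xs):
    # sum{ v | v <- xs }; named "total" to avoid shadowing Python's built-in sum
    return hom(0, add, lambda v: v, xs)


def filter_set(p, xs):
    # set{ e | e <- xs, p(e) }
    return hom(frozenset(), or_, lambda e: frozenset([e]) if p(e) else frozenset(), xs)


def intersect(xs, ys):
    # set{ e | e <- xs, e in ys }
    return filter_set(lambda e: e in ys, xs)


# Example 1: [ square x | x <- [1, 2, 3], odd x ]
squares_of_odds = hom([], add, lambda x: [x * x] if x % 2 == 1 else [], [1, 2, 3])
print(squares_of_odds)  # [1, 9]
```

The last line reproduces the result of Example 1: the elements 1 and 3 contribute singleton lists, while 2 contributes the zero of the list monoid.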

Definition 6 (Monoid Comprehension Calculus) The valid syntactic forms in the MCC are:

  $v$                                               variable
  $c$                                               constant
  $e.A$                                             projection
  $\langle A_1 = e_1, \ldots, A_n = e_n \rangle$    record construction
  $e_1\ \mathit{op}\ e_2$, $\ \mathit{op} \in \{=, <, \le, >, \ge, \ne\}$
  $zero^M$
  $unit^M(e)$
  $merge^M(e_1, e_2)$
  $M\{e \mid q_1, q_2, \ldots, q_n\}$               comprehension

where $e, e_1, \ldots, e_n$ are terms in the monoid calculus, $v$ is a variable, and $q_1, \ldots, q_n$ are qualifiers of the form $v \leftarrow e$ or $e$.

The definition above details the prevalent syntactic forms in the MCC. The only form that has not been addressed before is the record type. However, it can be easily shown that it is also a free monoid (see [11]). We proceed by presenting an example that involves a query to a (very) simple database.

Example 2 Let us suppose we have a database for keeping track of some data about the books in a library. A logical schema could be:

  books: bag(struct(title: string,
                    subject: string,
                    authors: list(struct(name: string,
                                         last_name: string))))

i.e. "books" is a bag of structures containing the relevant information of the different books. A book is a structure containing the title, the subject and a list of structures corresponding to the authors. For each author, the first name and the last name are kept.

A possibly interesting query could be: "for the books about computers in our library, retrieve the number of authors whose last name is Perez". A monoid comprehension for representing this query could be:

  $sum\{1 \mid b \leftarrow books,\ a \leftarrow b.authors,\ b.subject = \text{`computers'} \land a.last\_name = \text{`Perez'}\}$

Applying again the reduction rules in Definitions 4 and 5, the comprehension is transformed into:

  $h^{bag,sum}(\lambda b.\, h^{list,sum}(\lambda a.\ \text{if } b.subject = \text{`computers'} \land a.last\_name = \text{`Perez'} \text{ then } 1 \text{ else } 0)\ b.authors)\ books$

The process of evaluation of this expression would follow these steps:

1. The outer lambda expression takes each book from the bag "books".
2. The inner lambda expression takes the list of authors of the book: if the book's subject is `computers', it sums one for every author whose last name is Perez.
3. The outer lambda expression adds up the individual results obtained for each book.
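The evaluation steps of Example 2 can be traced with a small Python sketch. The sample data below is entirely hypothetical, the bag is approximated by a Python list, and `hom_sum` is a helper name introduced here to stand for a homomorphism into the `sum` monoid.

```python
# Hypothetical contents of the "books" bag (a Python list stands in for the bag).
books = [
    {"title": "Databases", "subject": "computers",
     "authors": [{"name": "Ana", "last_name": "Perez"},
                 {"name": "Luis", "last_name": "Gomez"}]},
    {"title": "Compilers", "subject": "computers",
     "authors": [{"name": "Juan", "last_name": "Perez"},
                 {"name": "Rosa", "last_name": "Perez"}]},
    {"title": "Rome", "subject": "history",
     "authors": [{"name": "Eva", "last_name": "Perez"}]},
]


def hom_sum(f, xs):
    """Homomorphism into the sum monoid: add up f(x) over the elements of xs."""
    result = 0  # zero of the sum monoid
    for x in xs:
        result = result + f(x)  # merge of the sum monoid
    return result


# h^{bag,sum}(lambda b. h^{list,sum}(lambda a. if ... then 1 else 0) b.authors) books
perez_count = hom_sum(
    lambda b: hom_sum(
        lambda a: 1 if b["subject"] == "computers" and a["last_name"] == "Perez" else 0,
        b["authors"],
    ),
    books,
)

# The same query written as a native Python generator expression.
perez_count_direct = sum(
    1
    for b in books
    for a in b["authors"]
    if b["subject"] == "computers" and a["last_name"] == "Perez"
)

print(perez_count, perez_count_direct)  # 3 3
```

With this sample data the first book contributes 1, the second contributes 2, and the history book contributes 0.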

4 Theoretical basis of the MCC

This section presents two results derived by using the category theoretical basis of the monoid calculus [1], [18], [24]:

1. We find disadvantages in the monoid for vectors as proposed in [11] and [12].
2. We propose a solution that does not disturb the existing structures of the MCC.

Figure 3 offers a map of the concepts used and the relationships amongst them. The purpose is to ease the presentation of ideas by providing the reader with a graphic notion of what is being discussed.

[Figure 3, part a): Monoid Comprehension Calculus. Labels: Monoids form a Category; Monoid; Freeness; Universal Property; Free (or collection) monoid; Homomorphism; Concept derived from algebra, "structure preserving mappings", homologous to functors in categories. Part b): Concepts from category theory. Labels: Category; Object; Arrow; Homomorphism; Functor; Natural transformation; Adjoint functors: Freeness, Universal Property.]

Figure 3: Map of Concepts. In part a) of the figure, we have enclosed in an ellipse the three more basic concepts of the MCC. They point, in turn, to concepts from the theory of categories that are relevant to them. In part b), there is a hierarchical diagram of those concepts from category theory that give support to the MCC. The terms common to the two sections of the figure appear in italics.

4.1 Adjunctions and Free Monoids

We begin by defining "adjunction", a concept which is central in the theory of categories. The notion of adjunction is fundamental to fully comprehend the idea of free (or collection) monoid, the core concept in the definition of the Monoid Comprehension Calculus. This background will enable us to demonstrate how arrays (as functions between sets) naturally fit into the MCC as homomorphisms between monoids.

Definition 7 (Adjunction) An adjunction consists of

• a pair of categories $C$ and $D$
• a pair of functors $F : C \to D$ and $G : D \to C$
• a natural transformation $\eta : I_C \to (G \circ F)$,

such that for each $C$-object $X$ and $C$-arrow $f : X \to G(Y)$, there is a unique $D$-arrow $f^{\#} : F(X) \to Y$ such that the triangle in Figure 4 commutes.

[Figure 4: Commutative Diagram for an Adjunction. Arrows: $\eta_X : X \to G(F(X))$, $G(f^{\#}) : G(F(X)) \to G(Y)$, $f : X \to G(Y)$.]

We say that $(F, G)$ is an adjoint pair of functors; $F$ is the left adjoint of $G$ and $G$ is the right adjoint of $F$. The natural transformation $\eta$ is called the unit of the adjunction.

Now we will see how the definition of free monoid can be obtained by making specific some of the elements of the diagram in Figure 4: make category $C$ the category of sets and category $D$ the category of monoids. Let the functors $F$ and $G$ be, respectively, $F : Set \to Mon$ and $G \equiv U : Mon \to Set$. Then a free monoid would be defined as follows:

Definition 8 (Free Monoid) Given the functor $U$ between the categories of sets and monoids, $Set \xleftarrow{U} Mon$, and an object $A \in Set$, an object $F(A) \in Mon$ is said to be the free monoid generated by $A$ (or more properly, the object $F(A)$ is free on $A$ with respect to $U$) if there is an arrow $\eta_A : A \to U(F(A))$ such that for any arrow $\alpha : A \to U(B)$ ($B \in Mon$) there is a unique arrow $\beta : F(A) \to B$ ($B \in Mon$) such that $U(\beta) \circ \eta_A = \alpha$.

[Figure 5: The Free Monoid F(A). In Set: $\eta_A : A \to U(F(A))$, $U(\beta) : U(F(A)) \to U(B)$, $\alpha : A \to U(B)$. In Mon: $\beta : F(A) \to B$. The functor $U$ goes from Mon to Set.]

Figure 5 shows the elements of this definition diagrammatically. The arrow $\eta_A$ is called a universal arrow and the definition is known as a universal property. It is called universal because the exact same $\eta_A$ composed with $U(\beta)$ (for the $\beta$ resulting from a given function $\alpha$) equals whatever $\alpha$ maps $A$ into $U(B)$. This $\eta_A$ makes $\beta$ unique for a specific monoid $B$ and function $\alpha$.

Many important characteristics can be deduced from the definition of free monoid. Amongst them:

1. Any function between sets induces a monoid homomorphism between the corresponding free monoids.
2. Each set $X \in Set$ generates a free monoid $F(X)$.
3. The free monoid functor $F : Set \to Mon$ takes a set $A$ to the free monoid $F(A)$, which is the Kleene closure $A^*$ with concatenation as the binary operation [1].

The first observation tells us that we can introduce arrays as functions and that those functions will be represented as monoid homomorphisms in the MCC. The second one tells us that it is actually possible. The third observation gives us another perspective for looking at the monoid for vectors in [12]: a free monoid for vectors should be able to generate any vector of any size and its merge operation should "look like" the concatenation operation for lists. This last observation raises doubt about the possibility of defining a proper free monoid for vectors.
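The universal property of Definition 8 can be illustrated concretely for the Kleene closure $A^*$. In the sketch below, words over $A$ are Python tuples, $\eta_A$ builds a one-letter word, and `extend` (a name chosen for this example) builds the arrow $\beta$ from a given $\alpha$ and a target monoid supplied as a `(zero, merge)` pair. The particular choices of $\alpha$ (string length) and of the target monoid (integers under addition) are assumptions made only for illustration.

```python
from functools import reduce


def eta(a):
    """The universal arrow eta_A: send an element of A to the one-letter word."""
    return (a,)


def extend(alpha, zero, merge):
    """Build beta : A* -> B from alpha : A -> U(B) and the target monoid (zero, merge)."""
    def beta(word):
        return reduce(merge, (alpha(a) for a in word), zero)
    return beta


# Illustrative choice: A = strings, B = (int, +, 0), alpha = string length.
alpha = len
beta = extend(alpha, 0, lambda x, y: x + y)

# U(beta) composed with eta_A equals alpha.
assert beta(eta("ab")) == alpha("ab")

# beta preserves the monoid structure: the empty word goes to zero,
# and concatenation of words goes to merge in the target monoid.
u, v = ("ab", "c"), ("def",)
assert beta(()) == 0
assert beta(u + v) == beta(u) + beta(v)
```

Uniqueness is not checked by the code; it follows because every word is a concatenation of one-letter words, so the two displayed equations leave no freedom in defining `beta`.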

4.2 A Free Monoid for Vectors?

In [11] and [12], we find the following proposal of a monoid for vectors (we will use the term "vector" to denote an array of one dimension):

  $zero^{M[n]} = (|zero^M, \ldots, zero^M|)$

  $unit^{M[n]}(a, k) = (|e_0, \ldots, e_{n-1}|)$, where $e_i = unit^M(a)$ if $i = (k \bmod n)$, and $e_i = zero^M$ otherwise

  $merge^{M[n]}((|a_0, \ldots, a_{n-1}|), (|b_0, \ldots, b_{n-1}|)) = (|merge^M(a_0, b_0), \ldots, merge^M(a_{n-1}, b_{n-1})|)$

This monoid has at least two disadvantages. One is that the merge operation depends on the choice of the monoid $M$; this implies that the operations on arrays are not standard (e.g. in [11, 12] the monoid chosen is sum, so the operations for vectors have to be defined using "+"). Another, more important, disadvantage is that, because of the way in which unit is defined, this monoid for vectors is not free. The reason is that such a definition violates the "universal mapping property" of unit (uniqueness property). The result is that unit not only depends on the base monoid argument $a$, but is also a function of $n$, the length of the vector. This has important practical implications:

• We have an infinite number of monoids, one for each value of $n$, because a single monoid is not capable of generating all possible vectors (linear combinations) of any size: one must always specify $n$.
• Second and most important: since the MCC is designed for collections defined as free monoids, there is no way of working in a uniform way with a monoid that is not free. In other words, there is no way of "transforming" (via a homomorphism) any other collection type into a vector or a vector into some other collection type. For example, there is no way of getting a vector of size 2 by merging 2 "vectors" of size 1. Conversely, there is no way of getting a set of five elements from a vector of size five.

Instead of trying to define a free monoid for vectors, the approach we use is to circumvent the difficulty by treating arrays as homomorphisms. The basis for this treatment resides in the capabilities of the monoid algebra to deal with general functions as homomorphisms between monoids. Before proceeding to the task of integrating arrays into the MCC, we study the details of how arrays can be implemented as functions in general. Later those functions will be properly represented as homomorphisms in the MCC.
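To make the disadvantages of the vector monoid discussed in this section tangible, here is a minimal Python sketch of that monoid with the base monoid $M$ fixed to `sum`, as in the example from [11, 12]. The function names `vec_zero`, `vec_unit` and `vec_merge` are chosen for this illustration, and Python lists stand in for the vectors.

```python
def vec_zero(n):
    """zero^{M[n]}: a vector of n copies of zero^M (here 0)."""
    return [0] * n


def vec_unit(n, a, k):
    """unit^{M[n]}(a, k): a at position k mod n, zero^M elsewhere. Note the dependence on n."""
    return [a if i == k % n else 0 for i in range(n)]


def vec_merge(u, v):
    """merge^{M[n]}: componentwise merge^M (here +); both vectors must have length n."""
    if len(u) != len(v):
        raise ValueError("vectors of different length belong to different monoids")
    return [x + y for x, y in zip(u, v)]


# Within a fixed n the operations behave as expected.
print(vec_merge(vec_unit(3, 5, 0), vec_unit(3, 7, 4)))  # [5, 7, 0]

# Merging two size-1 "vectors" yields a size-1 vector, never a vector of size 2.
print(vec_merge(vec_unit(1, 5, 0), vec_unit(1, 7, 0)))  # [12]

# The same arguments (a, k) give different units for different n.
print(vec_unit(2, 5, 0), vec_unit(3, 5, 0))  # [5, 0] [5, 0, 0]
```

Contrast this with lists, where merging two singleton lists yields a list of length two.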

5 Arrays as Functions

Arrays are considered as being fixed length sequences of values of a given type. There are, actually, several ways of modeling arrays. One of them is by using functions over finite types, an idea suggested by S. Thompson in [22]. We will use his TT (Type Theory) notation.

5.1 Finite Types

In its simplest form, the finite type $C_n$ has $n$ elements $1_n, 2_n, \ldots, n_n$. The subscript shows the type from which the element comes. For example, we can think of the boolean values as being elements of $C_2$, the two-element finite type bool, where True is a representation of the element $1_2$ and False represents the element $2_2$. Finite types defined in this way are not useful for the purpose of defining arrays. We need a special re-definition that considers an ordering of its elements.

Definition 9 (Finite Types) The finite type $C_n$ is defined as being

$$C_n \equiv (\exists m : N).(m < n)$$

In type theory, a type of the form $(\exists x : A).B$ consists of pairs $(a : A,\ b : B[a/x])$ (where $B[a/x]$ denotes the expression $B$ with the occurrences of the `variable' $x$ replaced by an actual value $a$). It can be interpreted as a subset of the elements $a$ in $A$ with the property $B[a/x]$. It can be shown that $C_n$ provides a consistent definition of the index space [22]. This in turn assumes that the codomain of the function is well-defined.

5.2 Vectors

While we do not present it here, it is possible to formally define a function for vectors using type theory [22]. Using $C_n$ as the domain (i.e. subscript space) for vectors $A_n$ of length $n$, it is straightforward to define $A[i]$, $i \in C_n$, to be the $i$-th element of $A_n$. Then vectors are defined as functions $Vec\ A_n : C_n \to A$. This notation denotes a vector of size $n$ storing elements of type $A$. Further, the use of $C_n$ as a domain formalizes the idea of "rectangular domains" used by Libkin et al. [15].
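As a rough illustration of a vector as a total function over $C_n$, the Python sketch below wraps an ordinary function together with its length and rejects any argument outside $\{0, \ldots, n-1\}$. The class name `Vec` and its methods are assumptions of this example, and the sketch reads Definition 9 with the naturals starting at zero.

```python
class Vec:
    """A vector of length n modelled as a total function on C_n = {0, ..., n-1}."""

    def __init__(self, n, f):
        self.n = n      # size of the index space C_n
        self._f = f     # the underlying function from indices to values

    def __call__(self, i):
        # Only members of C_n are valid subscripts.
        if not (isinstance(i, int) and 0 <= i < self.n):
            raise IndexError(f"{i!r} is not an element of C_{self.n}")
        return self._f(i)

    def to_list(self):
        """Tabulate the function over the whole of C_n."""
        return [self._f(i) for i in range(self.n)]


squares = Vec(4, lambda i: i * i)
print(squares(3))         # 9
print(squares.to_list())  # [0, 1, 4, 9]
```

Because the domain is exactly $C_n$, the function is total over its domain: every valid subscript yields a value, and anything else is not a subscript at all.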

5.3 Multidimensional Arrays

Addressing array storage at the physical level is not the intent here. At the physical level, issues such as row-major order and tagging as "sparse" or "triangular" are important. Here we are concerned with the logical structure of arrays. For multidimensional arrays we use the same function defined for vectors, appropriately extended to permit a data type $C_{n_i}$ for the $i$-th subscript. We need an ad hoc mechanism to univocally arrange multidimensional data as a vector.

A multidimensional array $B$ with shape $[n_1, n_2, \ldots, n_m]$ will be displayed by using two-dimensional subarrays (represented by means of grids) in alternating directions. We adopt the convention that the rightmost number in the shape vector (in this case $n_m$) represents the "columns" of the innermost grid (so $n_{m-1}$ represents the "rows" of the grid, $n_{m-2}$ is again columns, and so on). Then for an array with shape $[2, 3]$, the grid has three columns and two rows. The spacing between grids is increased between higher dimensions. The items in the "cells" are positioned in a row-major order.

Example 3 The array with dimensions $2 \times 2 \times 3 \times 3 \times 2$ is displayed in Figure 6. Note the difference with the array of dimensions $2 \times 6 \times 3 \times 2$ that appears in Figure 7. Both have space for 72 items but the layouts of the grids, as well as the numbering of the cells (i.e. the order in which the items are positioned in the array), are different. In those figures, the numbers in boldface (between parentheses next to the center of the brackets) represent the different dimensions of the array, and are to be read from the innermost number of columns to the outermost number in an alternating fashion. The other numbers (not within the cells) represent the numbering to identify "rows" and "columns" of the array. All of them range from zero to the length of the respective dimension minus one. Note that those "rows" and "columns" are actually rows and columns of grids, for dimensions greater than two.

These conventions are important for two reasons:

1. They provide us with a unique way of identifying the cells of an array. For an array $M$ of dimensions $n_1, \ldots, n_m$, we write the indexing function as $M[e_1][e_2] \ldots [e_m]$, with $e_1 < n_1, \ldots, e_m < n_m$.

[Figure 6: Grid diagram for an array of dimensions $2 \times 2 \times 3 \times 3 \times 2$. Cells are numbered 0 to 71.]

[Figure 7: Grid diagram for an array of dimensions $2 \times 6 \times 3 \times 2$. Cells are numbered 0 to 71.]

2. There is a direct relationship between the index $e_1 e_2 \ldots e_m$ (we omit the square brackets for the sake of clarity) and the labeling $k$ for that cell ($0 \le k < n_1 \times n_2 \times \cdots \times n_m$):

$$k = e_1 (n_2 \times \cdots \times n_m) + e_2 (n_3 \times \cdots \times n_m) + \cdots + e_m (1)$$

Then, the sequence of indices $e_1 e_2 \ldots e_m$ can be obtained from $k$ and the shape of an array by applying an ad hoc version of the division algorithm [17] (see footnote 1). See Figure 8: $e_1$ is the quotient of the division of $k$ by $n_2 \times \cdots \times n_m$ and $R_1$ is the remainder; $e_2$ is the quotient of the division of $R_1$ by $n_3 \times \cdots \times n_m$ and $R_2$ is the remainder; etc.

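[Figure 8: The division algorithm modified. Successive dividends $k, R_1, R_2, \ldots, R_{m-1}$; divisors $n_2 \times n_3 \times \cdots \times n_m$, $n_3 \times \cdots \times n_m$, $n_4 \times \cdots \times n_m$, ..., 1; quotients $e_1, e_2, \ldots, e_{m-1}, e_m$; remainders $R_1, R_2, \ldots, R_m$.]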

Example 4 For the $2 \times 2 \times 3 \times 3 \times 2$ array in Example 3, the index 10010 indicates position 38. Take $n_1 n_2 n_3 n_4 n_5 \equiv 2\ 2\ 3\ 3\ 2$ and $e_1 e_2 e_3 e_4 e_5 \equiv 1\ 0\ 0\ 1\ 0$, so

$$e_1 (n_2 \times n_3 \times n_4 \times n_5) + e_2 (n_3 \times n_4 \times n_5) + e_3 (n_4 \times n_5) + e_4 (n_5) + e_5 (1) = 38.$$

Conversely, the position 38 corresponds to index 10010 according to the division algorithm. See Figure 9.

[Figure 9: The division algorithm applied. Dividing 38 by 36 gives quotient 1 and remainder 2; dividing 2 by 18 and then by 6 gives quotient 0 and remainder 2 each time; dividing 2 by 2 gives quotient 1 and remainder 0; dividing 0 by 1 gives quotient 0.]

These two observations have a key relevance for our modeling of multidimensional arrays: they allow us to represent any multidimensional array as a single vector. The only additional information we need is its shape. The next section formally justifies our representation of arrays as homomorphisms in the context of the Monoid Comprehension Calculus.
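The correspondence between a multidimensional index and its position $k$, and the inverse obtained by repeated division, can be checked with a short Python sketch. The function names `strides`, `position_of` and `index_of` are chosen for this example; the arithmetic follows the formula for $k$ and the modified division algorithm described in this section.

```python
def strides(shape):
    """For shape [n1, ..., nm] return [n2*...*nm, n3*...*nm, ..., nm, 1]."""
    out = []
    acc = 1
    for n in reversed(shape):
        out.append(acc)
        acc *= n
    return list(reversed(out))


def position_of(index, shape):
    """k = e1*(n2*...*nm) + e2*(n3*...*nm) + ... + em*1."""
    if len(index) != len(shape) or any(not (0 <= e < n) for e, n in zip(index, shape)):
        raise IndexError("index does not fit the shape")
    return sum(e * s for e, s in zip(index, strides(shape)))


def index_of(k, shape):
    """Recover e1 ... em from k by successive divisions; each remainder feeds the next step."""
    index = []
    remainder = k
    for s in strides(shape):
        quotient, remainder = divmod(remainder, s)
        index.append(quotient)
    return index


shape = [2, 2, 3, 3, 2]
print(strides(shape))                       # [36, 18, 6, 2, 1]
print(position_of([1, 0, 0, 1, 0], shape))  # 38
print(index_of(38, shape))                  # [1, 0, 0, 1, 0]
```

The two functions are inverse to each other on valid indices, which is what allows a multidimensional array to be stored as a single vector plus its shape.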

Footnote 1 (Division Algorithm): If $a, b \in Z$ with $b > 0$, there exist unique integers $q$ and $r$ such that $a = qb + r$, $0 \le r < b$.

6 Arrays as Homomorphisms in the MCC

Now we show that our definition of arrays as homomorphisms in the MCC satisfies the conditions for an adjunction. This also provides details on how arrays must be implemented. We proceed simply by making specific the functions and free monoids involved in the adjunction of Figure 10.

In Figure 10, $U$ is the forgetful functor with domain in the category of monoids and codomain in the category of sets (i.e. $U : Mon \to Set$). $U$ takes any monoid $(M, \cdot, e)$ to its underlying set $M$, and any monoid homomorphism $hom : (M, \cdot, e) \to (M', \cdot', e')$ to the corresponding function $h : M \to M'$ on the underlying sets of the monoids. $unit$ is the injection taking an element $c$ of set $N$ (the set of the natural numbers) to the singleton set $\{c\} \in FiniteSet(N)$. The functor $FiniteSet : Set \to Set$ is defined to be the finite power set [20], i.e.

$$FiniteSet(S) = \{B \mid B \subseteq S,\ B \text{ finite}\}$$

We have defined vectors of elements of type $A$ as functions $Vec\ A_n : C_n \to A$. From its definition, it is clear that $C_n$ is a finite subset of $N$, so our more general discussion of $FiniteSet(N)$ and a general function $f$ necessarily covers the special case of $C_n$ and the specific function $Vec\ A_n$.

[Figure 10: An Adjunction Involving Natural Numbers. The diagram relates the category of sets (arrows $unit$, $f$ and $h$) to the category of monoids (arrow $hom$ from $S$ to $T$) through the functors $F$ and $U$.]

Again, from Figure 10, we see that $unit$ corresponds to a natural transformation: the object part² of functor $F$ takes the set $\mathbb{N}$ to the monoid $S = (FiniteSet(\mathbb{N}), \cup, \{\})$, and functor $U$ takes $S$ to its underlying set, $FiniteSet(\mathbb{N})$, fulfilling the conditions for the natural transformation in Definition 7. As a consequence of $unit$ being a natural transformation, and having a specific function $f$, we can make some observations:

• The function $h : FiniteSet(\mathbb{N}) \to U(T)$ has the property required in the adjunction: $h \circ unit(c) = f(c)$.
• $S$ turns out to be a free monoid (we prove below that $hom^{S \to T}$ is a unique homomorphism).
• $S$ being free does not require the monoid $T$ to be of a specific kind; this allows us to choose $T$ to be any free monoid. This means we can select it to be of any collection type³ (e.g. list, set, bag), as long as we maintain consistency with the other elements in the adjunction (Figure 10) and in the diagram for transference of types in the MCC (Figure 2).

2 We discuss the arrow part of $F$ later in this section.
3 Collection types are free monoids in the MCC.

The only part missing is to check that $hom^{S \to T}$ is a homomorphism, and that it is unique. This is equivalent to demonstrating that the representation of the function $f$ (which we will choose to be our function for vectors) as a homomorphism in the category $\mathbf{Mon}$ is unique. Consequently, this gives a unique semantics to our arrays. We follow closely the presentation of [19].

We said before that the object part of the functor $F : \mathbf{Set} \to \mathbf{Mon}$ takes the set $\mathbb{N}$ to the monoid $S = (FiniteSet(\mathbb{N}), \cup, \{\})$. The arrow part of $F$ takes a function (e.g. the function $f$) whose domain is $\mathbb{N}$ to a monoid homomorphism⁴ that maps that function over the elements of $FiniteSet(\mathbb{N})$. We give the name $map_f(\cdot)$ to the homomorphism $F(f) \equiv hom^{S \to T}$ and define it by

1. $hom^{S \to T}(\{\}) = map_f(\{\}) = \{\}$
2. $hom^{S \to T}(L) = map_f(L' \cup L'') = map_f(L') \cup map_f(L'')$, where $L = L' \cup L''$
3. $hom^{S \to T}(\{c\}) = map_f(\{c\}) = \{f(c)\}$

The fact that $map_f(\cdot)$ is a homomorphism is established by the first two lines of its definition. The third line ensures that these equations define the homomorphism $map_f(\cdot)$, in the sense that it satisfies them uniquely. Suppose there exists another $f'$ that satisfies equations 1-3. We show (by induction on the size of $L$) that $f'(L) = map_f(L)$ for a set $L$ of any size.

• Suppose $L = \{\}$. If $f'$ satisfies 1-3, then $f'(L) = \{\} = map_f(\{\})$ (because of the first condition).
• Now, if $L = c_1 \cup \ldots \cup c_{n+1}$, where each $c_i$ is a singleton set, then:

$$\begin{aligned}
f'(L) &= f'(c_1 \cup \ldots \cup c_n \cup c_{n+1}) \\
      &= f'(c_1 \cup \ldots \cup c_n) \cup f'(c_{n+1}) && \text{(because of the second condition)} \\
      &= map_f(c_1 \cup \ldots \cup c_n) \cup f'(c_{n+1}) && \text{(because of the induction hypothesis)} \\
      &= map_f(c_1 \cup \ldots \cup c_{n+1})
\end{aligned}$$

since, by the third condition, $f'(c_{n+1}) = f(c_{n+1}) = map_f(c_{n+1})$.

The most important result stemming from this section is the possibility of representing arrays as homomorphisms between free monoids.
An implication is that this makes possible the use of comprehensions to manipulate arrays, allowing their full integration into the MCC. A second consequence is the potential use of diverse functions that facilitate the manipulation of arrays.
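The three defining equations of $map_f$ can be checked on small finite sets. The following Python sketch is an illustration only: it models FiniteSet with `frozenset`, and the function `letters` is a hypothetical three-element vector chosen for the example.

```python
from typing import Callable, FrozenSet, Hashable


def map_f(f: Callable[[Hashable], Hashable], L: FrozenSet[Hashable]) -> FrozenSet[Hashable]:
    """Image of the finite set L under f, i.e. the homomorphism F(f) on (FiniteSet, union, {})."""
    return frozenset(f(c) for c in L)


def letters(i: int) -> str:
    """Hypothetical vector of three letters, viewed as a function on C_3 = {0, 1, 2}."""
    return "abc"[i]


left, right = frozenset({0}), frozenset({1, 2})

# Equation 1: the zero element (empty set) is preserved.
assert map_f(letters, frozenset()) == frozenset()
# Equation 2: the merge operation (union) is preserved.
assert map_f(letters, left | right) == map_f(letters, left) | map_f(letters, right)
# Equation 3: a singleton {c} is sent to the singleton {f(c)}.
assert map_f(letters, frozenset({1})) == frozenset({letters(1)})
```

Checking a few instances does not replace the induction argument; it only makes the shape of the three equations concrete.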

4 A monoid homomorphism from monoid $A = (M, \oplus, e)$ to monoid $B = (M', \oplus', e')$ is a function $g : M \to M'$ such that $g(e) = e'$ and $g(x \oplus y) = g(x) \oplus' g(y)$.

7 Supporting Arrays in the MCC

A conclusion from Section 4 is that, by representing arrays as functions, there is no need to find a free monoid for vectors. Instead, such functions are represented as homomorphisms between two free monoids that, in Section 6, we have selected to be of type FiniteSet. In making this selection, we have taken the following ideas into consideration:

1. We can choose any pair of monoids as long as we respect the limitations explained in the diagrams of Figures 4 and 2.
2. If we choose the two monoids to be of the same kind, we only need to know the function to be mapped, since the merge operation and the zero element remain the same after mapping.
3. By taking the two monoids to be of the same kind we can more "safely" and "easily" compose the functions that transform the elements of the arrays, given that we do not have to track the transformations happening in the merge and zero components of the monoids when we apply the homomorphism.
4. In selecting the monoids we must consider that we should be able to access the elements of an array in constant time.

Now we proceed to specify the way in which arrays are "constructed". After selecting FiniteSet as the free monoid, our arrays are represented as a homomorphism between a set S containing n natural numbers (beginning at zero) and a set T containing the items that we are interested in "storing" in that array. The function (array) mapped from S to T by means of the homomorphism establishes the associations desired in the array. From Definition 5, it is clear that monoid comprehensions are a way of specifying the characteristics of a homomorphism between monoids. That is, the homomorphism must indicate the source and target monoids as well as the "operations" (or function to be applied) on the elements of the source monoid. Since our arrays are monoid homomorphisms, they can be expressed by means of a monoid comprehension.
To understand this idea better, we need to look in more detail at the "inner mechanism" of a monoid comprehension. From the definition of monoid comprehension, it is easily shown that the monoid comprehension $M\{ f(x) \mid x \leftarrow E \}$ can be computed by a loop (in pseudocode):

    R := zero_M;
    foreach x in E
        R := merge_M(R, unit_M(f(x)));
    return R;

(if E happens to be a comprehension itself, this leads to a nested loop). If we think of the comprehension above as representing a vector, then $E$ is a set containing $n$ natural numbers beginning at zero (i.e. the elements of $C_n$), $M$ is the target monoid (a set) containing elements of the type to be stored in the array, and $f(x)$ is the function that represents the array: it specifies the associations between "cells" and the items "stored" in them. Also from the definition of monoid comprehension, we know that we could represent this function as a monoid comprehension as well; its result would be the set of associations that define an array. The comprehensions denoting an array (i.e. the function that determines it) will be enclosed between the special brackets $[\![\ ]\!]$.

Example 5 The array of $n$ elements that associates to position $i$ the value $f(i)$ could be represented by the comprehension:

$$[\![\, (i, f(i)) \mid i \leftarrow C_n \,]\!]$$

Example 6 The function that expresses the array containing the letters a, b and c, in that order, could be denoted by

$$[\![\, (0, a), (2, c), (1, b) \,]\!].$$

It represents the lambda expression

$$(\lambda x : C_3)(\text{if } (eq\ x\ 0) \text{ then } a \text{ else if } (eq\ x\ 1) \text{ then } b \text{ else } c)$$
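The evaluation loop and the construction of an array from its associations can be sketched in Python as follows. The class `Monoid`, the instances `SET` and `SUM`, and the helpers `comprehension` and `array_from_pairs` are hypothetical names chosen for this illustration; they are not part of the MCC definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Tuple


@dataclass(frozen=True)
class Monoid:
    """A collection or primitive monoid given by its zero, merge and unit."""
    zero: Any
    merge: Callable[[Any, Any], Any]
    unit: Callable[[Any], Any]


SET = Monoid(zero=frozenset(), merge=lambda a, b: a | b, unit=lambda x: frozenset([x]))
SUM = Monoid(zero=0, merge=lambda a, b: a + b, unit=lambda x: x)


def comprehension(M: Monoid, f: Callable[[Any], Any], E: Iterable[Any]) -> Any:
    """Evaluate M{ f(x) | x <- E } with the loop described in the text."""
    R = M.zero
    for x in E:
        R = M.merge(R, M.unit(f(x)))
    return R


def array_from_pairs(pairs: Iterable[Tuple[int, Any]]) -> Callable[[int], Any]:
    """Turn a set of (position, value) associations into a total function on C_n."""
    pairs = list(pairs)
    table = dict(pairs)
    n = len(table)
    if len(pairs) != n or set(table) != set(range(n)):
        raise ValueError("positions must be exactly 0..n-1, each given once")
    return lambda i: table[i]
```

With these definitions, `array_from_pairs([(0, "a"), (2, "c"), (1, "b")])` behaves like the lambda term of Example 6: the order in which the associations are written does not matter, only the positions do.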

8 Operations with Arrays

Given that arrays are functions, the way of "operating" on their elements is via function application (i.e. the application of the function to actual values of its domain). The two main complementary functions for defining a vector are:

• Subscript: given a value $i$ of type $C_n$ and a vector $a_n$ of type $Vec\,A_n$, the $i$th element of $a_n$ is obtained by the application $a_n(i)$, which we denote by $a_n[i]$.
• Update: the function

$$v_n[m] := b \;\equiv\; update_{A_n}\ v_n\ m\ b \;\equiv\; \lambda x.(\text{if } (eq\ m\ x) \text{ then } b \text{ else } (v_n\ x))$$

updates an existing vector $v_n$ (of type $Vec\,A_n$) in position $m$ with value $b$. It has type

$$update_{A_n} : Vec\,A_n \to C_n \to A \to Vec\,A_n$$

Another useful function is $dim$, which returns the dimensions of an existing array $X_{n_1 \ldots n_m}$. It has type

$$dim : Array\ A\ n_1 \ldots n_m \to (n_1, \ldots, n_m)$$
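Subscript, update and dim can be sketched in Python for the one-dimensional case. The class `Vec` below is a hypothetical encoding chosen for this illustration (a size together with a function on positions); the paper itself only fixes the types of the operations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Vec:
    """A vector of size n represented by a total function on C_n = {0, ..., n-1}."""
    n: int
    fn: Callable[[int], Any]

    def __getitem__(self, i: int) -> Any:
        # Subscript a_n[i]: plain function application, restricted to C_n.
        if not 0 <= i < self.n:
            raise IndexError(i)
        return self.fn(i)


def update(v: Vec, m: int, b: Any) -> Vec:
    """v[m] := b, i.e. the function x -> (b if x == m else v(x)); v itself is unchanged."""
    if not 0 <= m < v.n:
        raise IndexError(m)
    return Vec(v.n, lambda x: b if x == m else v.fn(x))


def dim(v: Vec) -> Tuple[int, ...]:
    """Shape of the array; a one-element tuple for vectors."""
    return (v.n,)
```

Because `update` returns a new function instead of modifying storage, repeated updates build a chain of closures. This keeps the sketch close to the lambda term in the definition, at the cost of the constant-time access that a real implementation would need.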

Example 7 Let us call $g_3$ the array in Example 6 above. Its elements can be obtained by means of a monoid comprehension:

$$\{ g_3[i] \mid i \leftarrow C_3 \}$$

The result is a set containing the elements "stored" in the array, namely $\{a, b, c\}$.

Example 8 A set containing the associations specified by the function $g_3(i)$ is obtained by reducing the comprehension:

$$\{ (i, g_3[i]) \mid i \leftarrow C_3 \}$$

The result is the set of pairs $\{(0, a), (1, b), (2, c)\}$ (it is different from the expression in Example 6, which defines a lambda term).

Example 9 The comprehension

$$[\![\, (i, g_3[i] := d) \mid i \leftarrow C_3 \,]\!]$$

makes all the elements of vector $g_3$ equal to $d$.

Based on this very simple set of functions we can define some others that could ease the manipulation of arrays. Examples include:

• Mapping a function over the elements of an array $x_n$:

$$map\ f\ x_n \equiv [\![\, (i, f(x_n[i])) \mid i \leftarrow C_n \,]\!]$$

• Sub-sequence or sub-array:

$$subseq(x_n, a, b) \equiv [\![\, ((i - a), x_n[i]) \mid i \leftarrow \{a..b\} \,]\!]$$

where $\{a..b\}$ is the subset of $C_n$ containing the sequence of natural numbers between $a$ and $b$. In the case of $x$ being a matrix, $subseq$ is the submatrix formed with the columns $a, \ldots, b$:

$$subseq(x_{mn}, a, b) \equiv [\![\, (i, j - a, x_{mn}[i, j]) \mid i \leftarrow C_m,\ j \leftarrow \{a..b\} \,]\!]$$

• Summation of two vectors $x_n$ and $y_n$:

$$SV(x_n, y_n) \equiv [\![\, (i, x_n[i] + y_n[i]) \mid i \leftarrow C_n \,]\!]$$

• The $j$th column of a matrix:

$$col(j, A_{mn}) \equiv [\![\, (i, A_{mn}[i, j]) \mid i \leftarrow C_m \,]\!]$$

• Concatenation of two vectors $x_n$ and $y_m$:

$$cat(x_n, y_m) \equiv [\![\, (i, \text{if } i < n \text{ then } x_n[i] \text{ else } y_m[i - n]) \mid i \leftarrow C_{n+m} \,]\!]$$

• Unscrolling a matrix: we discussed in Section 5 that, in general, an array $A_{n_1 \ldots n_k}$ can be transformed to a single vector:

$$unscroll(A_{n_1 \ldots n_k}) \equiv [\![\, (0, A[0, 0, \ldots, 0]), \ldots, (n_1 \times \ldots \times n_k - 1, A[n_1 - 1, \ldots, n_k - 1]) \,]\!]$$

The elements in the vector are in row-major order. For example, if

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$$

then $unscroll(A) = [\![\, (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6) \,]\!]$.

• Reshaping a matrix: $reshape(d, v)$ is the companion function, for a desired shape $d$ and a given vector $v$. In particular:

$$A = reshape(dim(A), unscroll(A))$$

• Zip $n$ vectors of size $m$:

$$zip(x_1 \ldots x_n) \equiv reshape((n, m), cat(x_1 \ldots x_n))$$

• Matrix multiplication: the product of two compatible matrices $x_{mn}$ and $y_{no}$:

$$Matprod(x_{mn}, y_{no}) \equiv [\![\, (i, j, sum\{ x_{mn}[i, k]\, y_{no}[k, j] \mid k \leftarrow C_n \}) \mid i \leftarrow C_m,\ j \leftarrow C_o \,]\!]$$

• Matrix transposition:

$$Transpose(x_{mn}) \equiv [\![\, (j, i, x_{mn}[i, j]) \mid j \leftarrow C_n,\ i \leftarrow C_m \,]\!]$$

Buneman [4] and Fegaras [12] use generators of the form $a[i] \leftarrow x$ where $x$ is an array. We find this notation useful and will use it as syntactic sugar (we write $a_i \leftarrow x$ instead, to avoid overloading the notation $a[i]$):

• If $x$ is a vector of size $n$, the generator accesses the pair $(a, i)$ (value $a$ and its position $i$). Then a comprehension $\{ (i, a) \mid a_i \leftarrow x \}$ is equivalent to our $\{ (i, x[i]) \mid i \leftarrow C_n \}$.
• If $x$ is a matrix of shape $(m, n)$, the generator $a_i \leftarrow x$ binds to $a$ the column (vector) in position $i$. Then a comprehension $\{ a \mid a_i \leftarrow x \}$ is equivalent to our $\{ col(j, x_{mn}) \mid j \leftarrow C_n \}$.
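The derived operations listed above can be sketched in Python with arrays encoded as a shape plus a function on indices. The class `Array` and the constructors `vector`, `matrix` and `values` are assumptions made for this illustration only; each operation body follows the corresponding comprehension.

```python
from dataclasses import dataclass
from functools import reduce
from itertools import product
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Array:
    shape: Tuple[int, ...]
    fn: Callable[..., Any]  # total function on the rectangular index domain

    def __getitem__(self, idx):
        idx = idx if isinstance(idx, tuple) else (idx,)
        if len(idx) != len(self.shape) or any(not 0 <= e < n for e, n in zip(idx, self.shape)):
            raise IndexError(idx)
        return self.fn(*idx)


def vector(items):
    items = list(items)
    return Array((len(items),), lambda i: items[i])


def matrix(rows):
    rows = [list(r) for r in rows]
    return Array((len(rows), len(rows[0])), lambda i, j: rows[i][j])


def values(a):
    """All elements in row-major order (the last index varies fastest)."""
    return [a[idx] for idx in product(*(range(n) for n in a.shape))]


def amap(f, x):
    return Array(x.shape, lambda *i: f(x[i]))


def subseq(x, a, b):
    if len(x.shape) == 1:
        return Array((b - a + 1,), lambda i: x[i + a])
    return Array((x.shape[0], b - a + 1), lambda i, j: x[i, j + a])  # columns a..b


def sv(x, y):
    return Array(x.shape, lambda i: x[i] + y[i])


def col(j, A):
    return Array((A.shape[0],), lambda i: A[i, j])


def cat(x, y):
    n = x.shape[0]
    return Array((n + y.shape[0],), lambda i: x[i] if i < n else y[i - n])


def unscroll(A):
    cells = list(product(*(range(n) for n in A.shape)))
    return Array((len(cells),), lambda p: A[cells[p]])


def reshape(d, v):
    def fn(*idx):
        p = 0
        for e, n in zip(idx, d):  # row-major position of idx
            p = p * n + e
        return v[p]
    return Array(tuple(d), fn)


def zip_vectors(*xs):
    return reshape((len(xs), xs[0].shape[0]), reduce(cat, xs))


def matprod(x, y):
    (m, n), o = x.shape, y.shape[1]
    return Array((m, o), lambda i, j: sum(x[i, k] * y[k, j] for k in range(n)))


def transpose(x):
    m, n = x.shape
    return Array((n, m), lambda j, i: x[i, j])
```

For the matrix `A = matrix([[1, 2, 3], [4, 5, 6]])`, `values(unscroll(A))` gives `[1, 2, 3, 4, 5, 6]`, and `reshape(A.shape, unscroll(A))` has the same elements as `A`. The index generator sugar corresponds to iterating over positions and values together, as Python's built-in `enumerate` does for lists.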

9 Applicative Examples

In this section we present some operations that could be useful in practice, given that many database applications store information that can be viewed as an array. The following operations cover a wide range of capabilities for which we believe queries using comprehensions can be constructed. While they tend toward image processing tasks, the proposed primitives (generators and filters) are not in themselves limited to images. Each of the tasks below would ordinarily be handled by a separate application package.

9.1 Histograms

A histogram of an image is an array whose length is the number of possible intensity values produced by the sensor. For example, histograms can be used to summarize the gray level content of a picture by showing the number of pixels at each level of gray [6]. For an image $f$, let $h_f$ be the histogram. $h_f$ could be constructed by the following algorithm:

1. Assign zero to all elements of the array $h_f$.
2. For all pixels $(x, y)$ of the image $f$, increment $h_f(f(x, y))$ by one.

This operation can be presented in comprehension form as:

$$H(f) \equiv [\![\, (k, sum\{ \text{if } (f[i, j] = k) \text{ then } 1 \text{ else } 0 \mid i \leftarrow C_m,\ j \leftarrow C_n \}) \mid k \leftarrow C_l \,]\!]$$

where $l$ (the number of levels of gray) is the length of the histogram array. Note that the number of calculations depends on the algorithm used. The complexity of the comprehension as written is $O(mnl)$, which is quite high. However, more efficient algorithms can be found. For example, Libkin et al. [15] discuss the implementation, in their Array Query Language, of a primitive function index that reduces the complexity to logarithmic time by creating an implicit group-by operation.
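Both formulations of the histogram can be sketched in Python, with the image given as a list of rows of integer gray levels. The function names are hypothetical and chosen for this illustration; the first function follows the comprehension literally, the second follows the two-step algorithm.

```python
from typing import List, Sequence


def histogram_comprehension(image: Sequence[Sequence[int]], levels: int) -> List[int]:
    """Literal reading of H(f): for each level k, scan the whole image (O(m*n*l))."""
    return [
        sum(1 if pixel == k else 0 for row in image for pixel in row)
        for k in range(levels)
    ]


def histogram_single_pass(image: Sequence[Sequence[int]], levels: int) -> List[int]:
    """Two-step algorithm: zero the array, then increment once per pixel (O(m*n + l))."""
    h = [0] * levels
    for row in image:
        for pixel in row:
            h[pixel] += 1
    return h
```

The two functions return the same array for any image whose pixel values lie in `range(levels)`; only the number of comparisons differs.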

9.2 Smoothing of Images

An operation frequently performed in image analysis is smoothing. The purpose of the operation is to distribute the sensor noise more uniformly across the image, which is represented as a 2-D grid. Each grid point is termed a pixel and consists of an integer representing intensity. Smoothing generally entails one of the variations of discrete convolution. One such method is as follows: replace each internal (non-border) pixel with the average of itself and its neighbors. There are eight neighbors in a standard square grid. For a matrix $x_{mn}$, a "smoothed" version can be obtained by applying the following comprehension:

$$\begin{array}{ll}
(1) & [\![\, (i, j, b) \mid j \leftarrow \{1, \ldots, n - 2\}, \\
(2) & \quad a \leftarrow subseq(x_{mn}, j - 1, j + 1), \\
(3) & \quad p \leftarrow transpose(a_{m,3}), \\
(4) & \quad i \leftarrow \{1, \ldots, m - 2\}, \\
(5) & \quad c \leftarrow subseq(p_{3,m}, i - 1, i + 1), \\
(6) & \quad d \leftarrow sum\_all(c), \\
(7) & \quad b \leftarrow 1/9 \times d \,]\!]
\end{array}$$

Lines (1) and (4) ensure that only non-border points are considered for smoothing. In line (2), $a$ is constructed so that it contains exactly three consecutive columns of matrix $x$. Lines (3) and (5), together, select three consecutive rows of $a$ (which are columns of $p$). Thus, $c$ is a $3 \times 3$ square matrix of adjacent elements from $x$. Line (6) simply sums the elements of the extracted submatrix. The last line is an application of map which takes the average from the sum defined in line (6).
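The seven lines of the smoothing comprehension can be followed step by step in Python. This is a sketch under the assumption that the image is a list of rows; the function name `smooth` and the dictionary result (keyed by the pair (i, j)) are choices made for this illustration.

```python
from typing import Dict, Sequence, Tuple


def smooth(x: Sequence[Sequence[float]]) -> Dict[Tuple[int, int], float]:
    """Average of each interior pixel and its eight neighbours, following lines (1)-(7)."""
    m, n = len(x), len(x[0])
    result = {}
    for j in range(1, n - 1):                      # (1) interior columns
        a = [row[j - 1:j + 2] for row in x]        # (2) three consecutive columns, m x 3
        p = [list(column) for column in zip(*a)]   # (3) transpose, 3 x m
        for i in range(1, m - 1):                  # (4) interior rows
            c = [row[i - 1:i + 2] for row in p]    # (5) three consecutive columns of p, 3 x 3
            d = sum(sum(row) for row in c)         # (6) sum of the 3 x 3 window
            result[(i, j)] = d / 9                 # (7) average
    return result
```

For the matrix with rows `[1, 2, 3]`, `[4, 5, 6]`, `[7, 8, 9]` the only interior pixel is (1, 1), and `smooth` maps it to 5.0. Border pixels are absent from the result, as in the comprehension.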

9.3 Fast Fourier Transformation

The discrete version of the Fourier Transformation (DFT) of a vector $a$ is known to be

$$F_k = \sum_{j=0}^{n-1} a_j\, \omega_n^{kj}, \qquad k = 0, \ldots, n - 1$$

where the $a_j$ are the elements of $a$ and $\omega_n$ is the principal $n$th root of unity:

$$\omega_n = e^{2\pi i / n}, \qquad i \equiv \sqrt{-1}, \qquad e^{i\theta} = \cos\theta + i \sin\theta$$

This DFT is easily represented in comprehension notation (the generator $a[j] \leftarrow a$ binds $a$ to the value at position $j$):

$$DFT(a) \equiv [\![\, (k, F) \mid k \leftarrow C_n,\ F \leftarrow sum\{ a\, \omega_n^{kj} \mid a[j] \leftarrow a \} \,]\!]$$

It is easily seen that the complexity of this algorithm is $O(n^2)$. That becomes burdensome for the large values of $n$ that are common in real applications. In the literature we can find several algorithms which are $O(n \log n)$. Those algorithms are known as "Fast Fourier Transformations", and they exploit the mathematical properties of the complex roots of unity. Buneman [4] presents, in comprehension notation, an algorithm to calculate the Discrete Fourier Transformation of a vector $V$ of size $nm$ using $m$ DFTs of size $n$. The argument is that the total amount of time needed to compute the DFT of $V$ is $T(mn) = mT(n) + Km^2 n$ for some constant $K$. When $m = 2$, the relation becomes $T(2n) = 2T(n) + K'n$, which shows that the total time needed to compute the DFT of $V$ is proportional to $n \log_2 n$. The algorithm Buneman presents is as follows:

Input: a vector $V$ of size $nm$
Output: a vector $F$ of size $nm$, containing the DFT of $V$

1. Get a matrix $W$ of dimensions $n \times m$ by reshaping $V$. E.g. if $V = [0\ 1\ 2\ 3]$, we can take $n = 2$, $m = 2$, and get

$$W = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}$$

2. For $d = 0, \ldots, m - 1$:
(a) Get the DFT of the $a$th column of $W$. E.g. the first column of $W$ is the vector $[0\ 2]$ and its DFT is $[2\ {-2}]$ (in this example we use the bracket notation "[ ]" as it is used in mathematics to denote a vector, and not to denote a list as in other parts of this document). Similarly, the DFT of the second column $[1\ 3]$ is $[4\ {-2}]$.

(b) To every element of the DFT vectors from the previous step, apply the operation $\omega_{mn}^{a(nd+c)}\, t_c$, where the notation $t_c$ indicates the value $t$ in position $c$ of the DFT vector. Get the summation of the resulting vectors. For example, with $d = 0$, for column 1 of $W$ the DFT vector is $[2\ {-2}]$, and the result of the operation $\omega_{mn}^{a(nd+c)}\, t_c$ is the vector $[2\ {-2}]$ as well. For the DFT of column 2, the result is $[4\ {-2i}]$. The sum of the resulting vectors is $[2\ {-2}] + [4\ {-2i}] = [6\ \ {-2-2i}]$.
(c) Concatenate the resulting vectors obtained for the different values of $d$ above.

This algorithm can be easily represented using comprehension notation (that is precisely the objective of Buneman's paper):

$$cat\ [\![\ SV(\, [\![\ [\![\, (c, \omega_{mn}^{a(nd+c)}\, t) \mid t_c \leftarrow DFT(z) \,]\!] \mid z_a \leftarrow W \ ]\!] \,) \mid d \leftarrow C_m \ ]\!]$$

In Figure 11 we show in detail an example of how this comprehension is reduced.

[Figure 11: An example of the Fast Fourier Transform as a query. For $V = [0\ 1\ 2\ 3]$, the branch $d = 0$ reduces to $(0, 6), (1, -2-2i)$, the branch $d = 1$ reduces to $(0, -2), (1, -2+2i)$, and cat yields $(0, 6), (1, -2-2i), (2, -2), (3, -2+2i)$.]
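The direct DFT and the column-wise decomposition can be compared numerically with a short Python sketch. The function names `dft` and `column_dft` are hypothetical, and the code assumes the reading of the algorithm given above: column $a$ of $W$ holds the elements $V[rm + a]$, and output position $nd + c$ is the sum over $a$ of $\omega_{mn}^{a(nd+c)}\, t_c$.

```python
import cmath
from typing import List, Sequence


def dft(a: Sequence[complex]) -> List[complex]:
    """Direct O(n^2) DFT with omega_n = exp(2*pi*i/n), as in the defining sum."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(a[j] * w ** (k * j) for j in range(n)) for k in range(n)]


def column_dft(V: Sequence[complex], n: int, m: int) -> List[complex]:
    """DFT of a vector of size n*m from m DFTs of size n (steps 1, 2(a)-(c))."""
    if len(V) != n * m:
        raise ValueError("V must have n*m elements")
    w = cmath.exp(2j * cmath.pi / (n * m))
    # Step 1: reshape V into an n x m matrix and take its columns.
    columns = [[V[r * m + a] for r in range(n)] for a in range(m)]
    # Step 2(a): one DFT of size n per column.
    t = [dft(z) for z in columns]
    F: List[complex] = []
    for d in range(m):
        # Step 2(b): scale each entry and sum the m vectors (the role of SV).
        block = [sum(w ** (a * (n * d + c)) * t[a][c] for a in range(m)) for c in range(n)]
        # Step 2(c): concatenate the blocks (the role of cat).
        F.extend(block)
    return F
```

Up to floating-point rounding, `column_dft([0, 1, 2, 3], 2, 2)` agrees with the result shown in Figure 11, namely 6, -2-2i, -2 and -2+2i at positions 0 to 3. The column DFTs are computed once and reused for every value of d, which is what the cost estimate $mT(n) + Km^2 n$ accounts for.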

10 Summary and Conclusions

The comprehensions of the previous section indicate the potential of arrays as collections in scientific applications. Using comprehensions that bind indices, Buneman [4] showed that a discrete fast Fourier transform could be expressed as a comprehension. We have all but accomplished the same using the array semantics defined here. In this paper we have taken steps to formally define the semantics of arrays in terms of the monoid calculus. The formulation is free of the inconsistencies of previously reported approaches. Such formalization enables a precise semantic definition of ordered collections such as arrays. Often, the semantics of object-oriented database query languages depend on pragmatics and intuition. The approach begun here can provide both a precise meaning and an operational semantics. Some query languages, e.g. OQL [7], are structured in a manner amenable to translation to the MCC. For example, Grust [13] purports to translate the whole of OQL to comprehensions. OQL and others, however, are limited to single-dimensioned arrays with limited functionality. This is unsatisfactory for domains such as geographic information systems (GISs) and image databases. We will soon be able to propose minor extensions to OQL that will increase its power to the level of the comprehensions described here. (This will include appropriate translation to the MCC.) Additionally, this is a starting point for the tight integration of functional programming languages and databases.

References

[1] M. Barr and C. Wells. Category Theory for Computing Science. Prentice Hall International, Cambridge, UK, 1995.
[2] V. Breazu-Tannen, P. Buneman, and S. Naqvi. Structural recursion as a query language. In Proceedings of the 3rd Intern. Workshop on Database Programming Languages, pages 9-19, Nafplion, Greece, Aug. 1991.
[3] V. Breazu-Tannen and R. Subrahmanyan. Logical and computational aspects of programming with sets/bags/lists. In Proceedings of the 18th Intern. Colloquium on Automata, Languages and Programming, pages 60-75, Madrid, Spain, July 1991.
[4] P. Buneman. The fast Fourier transform as a database query. Technical Report MS-CIS-93-37/L&C 60, University of Pennsylvania, Mar. 1993.
[5] P. Buneman, S. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. Theoretical Computer Science, 48(149):3-47, Mar. 1995.
[6] K. R. Castleman. Digital Image Processing. Prentice-Hall, Englewood Cliffs, N.J., 1979.
[7] R. G. G. Cattell. Object Data Management. Addison-Wesley Publishing Company, Reading, MA, 1991.
[8] R. G. G. Cattell. The Object Database Standard: ODMG-93. Morgan Kaufmann Publishers, San Francisco, CA, 1996.
[9] D. K. C. Chan, P. Trinder, and R. Welland. Evaluating object-oriented query languages. The Computer Journal, 37(10):858-872, Nov. 1994.
[10] A. J. T. Davie. An Introduction to Functional Programming Systems Using Haskell. Cambridge University Press, Cambridge, U.K., 1992.
[11] L. Fegaras. A uniform calculus for collection types. Technical Report 94030, Oregon Graduate Institute, Dept. of Computer Science & Engineering, 1994.
[12] L. Fegaras and D. Maier. Towards an effective calculus for object query languages. In Proc. SIGMOD'95, pages 47-58, San Jose, Cal., May 1995.
[13] T. Grust. Monoid comprehensions as a target for the translation of OQL. In Workshop on Performance Enhancement in Object Bases, Schloss Dagstuhl, Germany, Apr. 1996.
[14] P. Hudak and P. W. (editors). Report on the programming language Haskell, a non-strict purely functional language (version 1.0). Technical Report YALEU/DCS/RR777, Yale University, Apr. 1990.
[15] L. Libkin, R. Machlin, and L. Wong. A query language for multidimensional arrays: Design, implementation, and optimization techniques. In Proc. SIGMOD'96, pages 228-239, Montreal, June 1996.
[16] D. Maier and B. Vance. A call to order. In PODS, pages 1-16, May 1993.
[17] N. H. McCoy and T. R. Berger. Algebra: Groups, Rings and Other Topics. Allyn and Bacon, Inc, London, England, 1977.
[18] B. C. Pierce. Basic Category Theory for Computer Scientists. The M. I. T. Press, Cambridge, Massachusetts, 1991.
[19] D. E. Rydeheard. Adjunctions. In Category Theory and Computer Programming, pages 51-57, Guildford, U.K., Sept. 1985.
[20] D. E. Rydeheard. Functors and natural transformations. In Category Theory and Computer Programming, pages 43-50, Guildford, U.K., Sept. 1985.
[21] D. Straube and M. T. Ozsu. Queries and object processing in object-oriented database systems. ACM Transactions on Information Systems, 8(4):387-430, Oct. 1990.
[22] S. Thompson. Type Theory and Functional Programming. Addison-Wesley, Reading, MA, 1991.
[23] P. Trinder. Comprehensions, a query notation for DBPLs. In 3rd International Workshop on Database Programming Languages, pages 55-68, Nafplion, Greece, 1991.
[24] R. F. C. Walters. Categories and Computer Science. Cambridge University Press, Cambridge, G.B., 1991.