An Efficient Indexing and Search Technique for Multimedia Databases

Michael Clausen, Heiko Körner, Frank Kurth

Dept. of Computer Science III, University of Bonn, Römerstraße 164, 53117 Bonn, Germany

ABSTRACT

We present a novel index-based approach for searching multimedia databases by content. Our approach integrates methods from classical full-text retrieval with the mathematical concept of groups acting on sets. This yields a flexible framework applicable to a wide range of content-based search problems such as audio or image identification. We propose space-efficient indexing methods as well as very fast fault-tolerant search algorithms. In contrast to other approaches, our query response times decrease with increasing query complexity. As a further main benefit, a concept of partial matches is an integral part of our technique. We demonstrate the capabilities of our approach with examples from content-based music and image retrieval.

1. INTRODUCTION

The last few years have seen an increasing importance of multimedia databases for a wide range of applications. One major reason is that affordable high-performance hardware now allows for efficient processing and storage of the huge amounts of data that arise, e.g., in video, image, or audio applications. A key philosophy in accessing data in multimedia databases is content-based retrieval [26], where the content of the multimedia documents is processed rather than just some textual annotation describing the documents. Hence, a content-based query to an image database asking for all images showing a certain person would basically rely on a suitable feature extraction mechanism to scan images for occurrences of that person. On the other hand, a classical query based on additional textual information would rely on the existence of a suitable textual annotation of the contents of all images. Unfortunately, in most cases such a textual annotation is neither available nor easily extracted automatically, emphasizing the demand for feasible content-based retrieval methods.

∗ This work was supported in part by Deutsche Forschungsgemeinschaft under grant CL 64/3.

It turns out that many problems arising in content-based retrieval share essential structural properties. We briefly sketch some of those problems.

Let us first consider a problem from music information retrieval. Assume that a music database consists of a collection of scores. That is, each database document is a score representation of a piece of music containing the notes of that piece as well as additional information such as meter or tempo. Apart from using conventional music notation, musical scores may be visualized using the so-called piano roll representation. Fig. 1 shows the piano roll representation of a small part of a piece of music. In the figure, the horizontal axis represents time whereas the vertical axis represents pitches. Each rectangle of width d located (w.r.t. its lower left corner) at coordinates [t, p] represents one note of pitch p, onset time t, and duration d. Now we consider the following database search problem: Given a melody or, more generally, an arbitrary excerpt of a piece of music, we are looking for all occurrences of that query in the database documents. As an example, assume that we are looking for all positions where the musical fragment of Fig. 1 (or a pitch-transposed version thereof) occurs in J.S. Bach's Fugue in C major, BWV 846, which is partially depicted in Fig. 2. Fig. 3 shows all such occurrences.

Figure 1: A query to the database in the piano roll notation.

Figure 2: A part of J.S. Bach's Fugue in C major, BWV 846, in the piano roll notation.

Figure 3: Bach fugue of Fig. 2. All occurrences of the query in Fig. 1 are highlighted.

The second problem is concerned with content-based image retrieval. Consider a database consisting of digital copyrighted images. Assume that we are interested, e.g., for legal reasons, in finding all web pages on the Internet containing at least one of the copyrighted images or fragments thereof. This problem may again be considered as a database search problem: Given a (query) image taken from some web page, we are looking for an occurrence of that image as a subimage of one of the database images, including the case that the query matches one of those images as a whole. Extensions of this problem include finding rotated, resized, or lower-quality versions of the original images.

Both of the above problems may be viewed in the following general setting: A query to a database consisting of a collection of multimedia documents has to be answered in the sense that certain transformations (or generalized shift operations) have to be found which transport the query to its location within the database document. In this paper, we systematically exploit this principle for the case that admissible transformations are taken from a group (in the mathematical sense) acting on elements of a set which in turn constitute the database documents. It turns out that this approach leads to very efficient search algorithms for a large class of content-based search problems, which are in particular applicable to spatial, temporal, or spatio-temporal retrieval settings [26]. We briefly summarize the main contributions of our approach:

• We develop a general framework for retrieval of multimedia documents by example. Our technique’s flexibility has been demonstrated by prototypes in various fields (e.g., music, audio, image, and (relational) object retrieval).

• We propose generic algorithms for query evaluation together with efficient algorithms for fault-tolerant retrieval which consistently exploit the structure inherent in the retrieval problems.

• In contrast to other approaches, query evaluation becomes more efficient when the complexity of a query increases.

• The concept of partial matches (i.e., a query is only matched to a part of a document) is an integral part of our technique and requires no additional storage.

The proposed technique has been successfully tested on a variety of content-based retrieval problems. We summarize some figures on those prototypes:

• Our PROMS system, for the first time, allowed for efficient polyphonic search in polyphonic scores [4]. E.g., queries to a database of 12,000 pieces of music containing 33 million notes can be answered in about 50 milliseconds.

• Our system for searching large databases of audio signals allows for both identifying and precisely locating short fragments of audio signals w.r.t. the database. The sizes of our search indexes are very small, e.g., only between 1:1,000 and 1:15,000 of the size of the original audio data, depending on the required retrieval granularity. As an example, a database of 180 GB of audio material can be indexed using only about 50 MB, while still allowing audio signals of only several seconds in length to be located within fractions of a second [17].

• Index construction may be performed very efficiently. For the PROMS system, index construction takes only a few minutes, hence allowing for indexing on the fly. Indexing PCM audio data may be performed several times faster than real time on standard PC hardware.

• In our prototypic image retrieval system containing 3,300 images, exact sub-image queries require about 50 ms response time [19, 18]. The search index is compressed to 1:6 compared to the original (JPEG) data.

The paper is organized as follows. Section 2 reviews related work concerning both our general approach and the specific retrieval applications. In Section 3, we develop our general index-based approach to content-based multimedia search. This involves suitable data modeling, defining the notion of a match, developing an index data structure, and designing algorithms for fast index-based searching. Sections 4 and 5 deal with several general mechanisms to incorporate fault tolerance into our search algorithms and to exploit prior knowledge to obtain a significant retrieval speed-up. In Section 6, we address the issue of fast index-based search algorithms, followed by a discussion of index compression techniques in Section 7. Section 8 establishes the important link between our framework and feature extraction mechanisms. Finally, we describe the application of our technique to several retrieval scenarios and give test results demonstrating the capabilities of the proposed approach.

2. RELATED WORK

The techniques presented in this paper are related to work from several communities. In this section we try to establish the most important relations to previous as well as ongoing research efforts.

From a classical database point of view, multimedia data may be modeled using a relational framework. In a usual approach, multimedia documents are preprocessed, yielding certain feature vectors. The extracted features are then suitably stored in tables of a relational database [20]. Using relations, it is possible to model complex object dependencies like spatial constraints on regions in an image. Efficient retrieval methods have been proposed using approximate search like hill climbing [16]. Although our approach may be extended to a relational setting by introducing permutation groups, the methods proposed in this paper were primarily developed to exploit the structure of the underlying object set M and the group G acting on M (e.g., our musical documents are structured by specific time and pitch intervals between single notes, which are not changed by the group action). In this light, our model constitutes a special case of the general relational setting which, however, allows for very efficient query evaluation.

An important issue in multimedia indexing is the use of multidimensional access structures like k-d-trees or R*-trees [13]. Popular indexing approaches map multimedia documents such as time series to higher-dimensional feature sequences and use multidimensional access structures for searching in those structures [8]. In our approach we tried, as far as possible, to avoid higher-dimensional features in order to avoid problems resulting from the curse of dimensionality. In audio indexing this became possible, e.g., by exploiting the fixed temporal relationships between the features. On the other hand, when dealing with more complex groups (e.g., the group of Euclidean motions in 3D leads to a 6-parameter representation for each element), our approach also depends on efficient algorithms for higher-dimensional range and nearest-neighbor search [1].

Specializing to time series, there has been considerable recent interest in searching for a query sequence in large sets of time series w.r.t. various distance measures. Examples are Euclidean, ℓ_p, or dynamic time warping distances [11, 14]. Whereas our approach has up to now only been applied to audio signals, its high performance suggests an extension to general time-series search under the latter distance measures.

For the particular case of audio identification, several methods suitable for large data collections have been proposed recently. Among those, the hashing algorithms proposed by Kalker et al. [10] are the most similar to our approach, as hash tables are related to inverted files. Our approach has the advantage of needing significantly less memory for storing the index (about 100 MB as compared to 1–2 GB) with otherwise comparable (fast) performance.

In music retrieval, most of the early work has concentrated on similarity-based retrieval of melodies, see, e.g., [22]. For a long time, retrieval in polyphonic music (see, e.g., [12]) suffered from a lack of suitable data modeling. Our technique led to a breakthrough in allowing modeling of, as well as very efficient search in, polyphonic music [4]. As our approach has up to now mainly focused on modeling and efficient retrieval, the incorporation of measures of music similarity, which have been proposed in the music information retrieval community, is a natural challenge for future work.

An approach which is similar in spirit to our general technique is geometric hashing for object recognition, proposed in computer vision [25]. This approach shares the modeling of shift operations, which are considered for 2D/3D settings, as well as the exploitation of the data's structural (geometric) properties. In our approach, the data modeling is more general, which yields advantages in designing more efficient fault-tolerant retrieval algorithms. An extension of our approach [6] including general distance measures between query and matching position leads to (shape) matching problems which have been extensively treated in the area of computational geometry [23]. It will be a great challenge for future work to combine our approach, which is more tuned for use with larger datasets, with the sophisticated geometric matching techniques.

Many approaches to content-based image retrieval have been proposed in recent years, among which IBM's QBIC system (Query by Image Content) is perhaps the most popular [9]. In comparison to those, a considerable advantage of our technique is the natural integration of partial matches and the ability to locate queries as subimages of database images at no additional memory overhead. An interesting direction is the Blobworld approach of region-based image representation and retrieval [3], where image descriptors are created from a prior segmentation into similarly textured regions. It would be very interesting to combine such texture descriptors with our approach to spatial indexing.

3. GENERAL CONCEPT

In what follows, we use an abstract model of a database to describe collections of multimedia documents. Instances of this model may be implemented using standard databases; those issues are, however, beyond the scope of this paper.

Multimedia objects like score music or digital images may usually be described by certain atomic elements. Such elements may be (semantically) low-level descriptors like image pixels or musical notes. Higher-level descriptor elements, which are frequently called features, may include edges within images or melodies in a piece of music. Those elementary objects will be modeled by a set M. Then a document is a finite subset D ⊆ M. A database over M is a collection 𝒟 := (D_1, …, D_N) of documents D_i ⊆ M. Note that with this convention, we only consider homogeneous databases where all documents consist of elements taken from the same set M.

As an example consider the above scenario of score-based music search. We define the set of elementary objects as the set of all notes M := ℤ × ℤ × ℕ. In this idealized model, a note [t, p, d] ∈ M consists of an onset time t, a pitch p, and a duration d. As a second example consider the image retrieval scenario. In a first approach, we model images using pixels as elementary objects. Let P := ℝ² × ℕ denote the set of all pixels. Then [(x, y), c] ∈ P is a pixel of color c located at 2D-coordinates (x, y). As a more complex elementary object space consider L := ℝ² × ℝ², where each [p_1, p_2] ∈ L represents a line segment between coordinates p_1 and p_2.

To model queries, we follow a query-by-example approach, i.e., a query is also modeled as a document Q ⊆ M. Hence, in contrast to conventional database queries, which consist of certain declarative expressions, it makes sense in our case to speak of a query occurring in a document. E.g., we might use the photograph of a person as a query to search an image database for all photos showing this person.

We shall now formally specify occurrences of queries in documents. To this end, we first define shifted versions of a document D ⊆ M. We recall some basic definitions from elementary group theory. A group (G, ∗) consists of a set G and a map ∗ : G × G → G, the group operation, such that ∀f, g, h ∈ G : (f ∗ g) ∗ h = f ∗ (g ∗ h) (associative law), there exists a unit element e ∈ G such that ∀g ∈ G : e ∗ g = g = g ∗ e, and for all g ∈ G there exists an inverse element g′ ∈ G such that g ∗ g′ = e = g′ ∗ g. Frequently, as in the group (ℤ, +), the operation is written additively. In those cases we write g′ = −g and e = 0. In the case of multiplicative groups, such as the group GL_n(ℝ) of all invertible n × n matrices where the group operation is matrix multiplication, we write g′ = g⁻¹ and e = 1. A group is called abelian iff g ∗ h = h ∗ g for all g, h ∈ G. As usual, we frequently write G instead of (G, ∗). A group of great importance for the applications discussed later in this paper is the orthogonal group O_n of real n × n matrices A satisfying AᵀA = 1, together with standard matrix multiplication.

We say that a group G acts on a set M if there is a map G × M → M, (g, m) ↦ gm, satisfying g(hm) = (gh)m and 1_G m = m for all g, h ∈ G and m ∈ M. That is, using group elements, certain elements of M may be shifted into each other. Such a group action induces an equivalence relation on M by m ∼_G n iff ∃g ∈ G : gm = n. The equivalence class containing m is the so-called G-orbit Gm := {gm | g ∈ G}. Hence, if R denotes a transversal of the G-orbits, i.e., R contains exactly one element of each orbit, there is a disjoint decomposition of M into G-orbits:

$$ M = \bigsqcup_{r \in R} Gr. \tag{1} $$

As an example, consider the following action of the group T_2 := (ℤ × ℤ, +) of 2D-shifts on the set of notes M: (τ, ρ)[t, p, d] := [t + τ, p + ρ, d]. That is, (τ, ρ) ∈ T_2 shifts a note by τ time units and ρ pitch units. One easily verifies that there is exactly one orbit for each fixed duration d. The orthogonal group O_2 acts on the set of pixels P via A[p, c] := [Ap, c], where p ∈ ℝ², c ∈ ℕ, and Ap denotes standard matrix-vector multiplication. Orbits consist of all equally colored pixels on a circle around the origin, O_2[p, c] = {[q, c] | ‖q‖₂ = ‖p‖₂}.

The action of a group G on a set M naturally induces an action on the power set 2^M := {K | K ⊆ M} of M via gK := {gk | k ∈ K} for K ∈ 2^M. Now, the set of matches to a query Q ⊆ M w.r.t. a group action of G on M and a database 𝒟 is defined as the set G_𝒟(Q) := {(g, i) ∈ G × [1 : N] | gQ ⊆ D_i} of all pairs (g, i) such that a g-shift transports Q to a pattern gQ occurring in the i-th document. Consider a database 𝒟 := (D_1) consisting of the score document D_1 over the set of notes M depicted in Fig. 2, the group T_2 of time- and pitch-shifts, and a query Q as depicted in Fig. 1. Then the highlighted parts of Fig. 3 correspond to {gQ | (g, 1) ∈ (T_2)_𝒟(Q)}, the occurrences of all shifted and transposed versions of Q within D_1.

To obtain an indexing scheme, we observe that for A, B ⊆ M we have G_𝒟(A ∪ B) = G_𝒟(A) ∩ G_𝒟(B) and hence

$$ G_{\mathcal{D}}(Q) = \bigcap_{q \in Q} G_{\mathcal{D}}(q), \tag{2} $$

where we use G_𝒟(q) := G_𝒟({q}) as a shorthand notation. Hence, all queries may be evaluated using only the sets G_𝒟(m) for m ∈ M. As the set G_𝒟(m) points to all occurrences of object m within the database, these sets are actually generalizations of inverted files or inverted lists. However, as the set M is generally of infinite size, this does not yet yield a feasible indexing mechanism. Fortunately, it turns out that we only need to store one inverted list for each G-orbit, as the following basic result holds for all m ∈ M and g ∈ G:

$$ G_{\mathcal{D}}(gm) = G_{\mathcal{D}}(m)\, g^{-1}, \tag{3} $$

where G_𝒟(m)g⁻¹ := {(hg⁻¹, i) | (h, i) ∈ G_𝒟(m)}. To exploit the latter property, we choose a set R ⊆ M containing one representative of each G-orbit. Then each m ∈ M has a representation m = g_m r_m with a unique r_m ∈ R and a suitable g_m ∈ G. Based on the set R, a substitution of (3) into (2) yields

$$ G_{\mathcal{D}}(Q) = \bigcap_{q \in Q,\; q = g_q r_q} G_{\mathcal{D}}(r_q)\, g_q^{-1}. \tag{4} $$

Hence, arbitrary sets of matches may be calculated based only on the |R| inverted files {G_𝒟(r) | r ∈ R}. In the above T_2-example, we might choose R := {[0, 0, d] | d ∈ ℕ}, i.e., there is one inverted list for each admissible duration.
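The indexing scheme of this section can be illustrated with a minimal Python sketch for the T_2-example, assuming notes are given as tuples (t, p, d) and documents as sets of such tuples. The function names `build_index` and `search` are hypothetical and chosen for illustration only; the sketch uses unsorted Python sets rather than the sorted lists discussed later in the paper.

```python
from collections import defaultdict

def build_index(database):
    """Build one inverted list per duration d, i.e. per representative [0, 0, d].

    An entry ((t, p), i) records that the shift g = (t, p) moves the
    representative [0, 0, d] onto a note of document i (documents are
    numbered from 1, as in the text).
    """
    index = defaultdict(set)
    for i, doc in enumerate(database, start=1):
        for (t, p, d) in doc:
            index[d].add(((t, p), i))
    return index

def search(index, query):
    """Evaluate eq. (4): intersect the lists G_D(r_q) shifted by g_q^{-1}."""
    result = None
    for (t, p, d) in query:
        # q = g_q r_q with g_q = (t, p) and r_q = [0, 0, d]. In the additive
        # group T_2, multiplying by g_q^{-1} means subtracting (t, p).
        shifted = {((ht - t, hp - p), i) for ((ht, hp), i) in index.get(d, ())}
        result = shifted if result is None else result & shifted
    # Simplification of this sketch: an empty query returns no matches.
    return result if result is not None else set()

# Usage: a one-document database in which a three-note motif occurs twice.
database = [{(0, 60, 1), (1, 62, 1), (2, 64, 2), (4, 67, 1), (5, 69, 1), (6, 71, 2)}]
index = build_index(database)
hits = search(index, {(0, 0, 1), (1, 2, 1), (2, 4, 2)})
```

Here `hits` contains the two pairs ((0, 60), 1) and ((4, 67), 1), i.e., the time- and pitch-shifts that transport the query into document 1.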

4. FAULT TOLERANCE

4.1 Mismatches

In this section we introduce basic mechanisms to incorporate fault tolerance into our retrieval scheme. The first mechanism considers the case that a query can only be partially matched to a database document. Formally, for a maximum number k ∈ ℕ of mismatches and A \ B := {a ∈ A | a ∉ B}, we define the set G_{𝒟,k}(Q) := {(g, i) | |gQ \ D_i| ≤ k} of all matches of a query Q ⊆ M where each match contains up to k mismatching positions.

The matches G_{𝒟,k}(Q) may be determined efficiently using a dynamic programming approach. Consider an arbitrary enumeration Q =: {q_1, …, q_n} of the elements of Q. Let H_j := G_𝒟(q_j) be the inverted list of the j-th query component. Furthermore, let Γ_j := H_1 ∪ ⋯ ∪ H_j be the set of all potential match candidates up to the j-th step of the following algorithm. For each j ∈ [1 : n] we inductively define a credit function C_j : Γ_j → ℤ. For γ ∈ Γ_1, let C_1(γ) := k + 1. For 2 ≤ j ≤ n, we let

$$ C_j(\gamma) := \begin{cases} C_{j-1}(\gamma) & \text{if } \gamma \in \Gamma_{j-1} \cap H_j, \\ C_{j-1}(\gamma) - 1 & \text{if } \gamma \in \Gamma_{j-1} \setminus H_j, \\ k + 2 - j & \text{if } \gamma \in H_j \setminus \Gamma_{j-1}. \end{cases} $$

One easily shows that

$$ G_{\mathcal{D},k}(Q) = \{\gamma \in \Gamma_n \mid C_n(\gamma) > 0\}. \tag{5} $$

That is, the elements γ ∈ Γ_n with a positive remaining credit value C_n(γ) > 0 are exactly the hits with up to k mismatches. Note that we used the latter definition of credit functions for illustration only. In an implementation one would, in the j-th step, remove all candidates with non-positive credit from the set Γ_j in order to speed up the computation. Algorithms for computing intersections as discussed here are treated in [2].
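The credit scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the hypothetical function `k_mismatch_search` takes the lists H_1, …, H_n as Python sets of hashable candidates (g, i), and the pruning of candidates with non-positive credit mentioned above is applied after every step.

```python
def k_mismatch_search(lists, k):
    """Return all candidates that are missing from at most k of the given lists.

    lists[j - 1] plays the role of H_j, a set of candidates (g, i).
    """
    credit = {}
    for j, h in enumerate(lists, start=1):
        # Known candidates that do not occur in H_j lose one credit.
        for gamma in credit:
            if gamma not in h:
                credit[gamma] -= 1
        # New candidates have already missed the first j - 1 lists,
        # so they start with credit k + 1 - (j - 1) = k + 2 - j.
        for gamma in h:
            if gamma not in credit:
                credit[gamma] = k + 2 - j
        # Prune candidates that can no longer become hits.
        credit = {gamma: c for gamma, c in credit.items() if c > 0}
    return set(credit)

# Usage with three small lists of symbolic candidates.
lists = [{"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}]
exact_hits = k_mismatch_search(lists, 0)     # only "a" occurs in every list
one_mismatch = k_mismatch_search(lists, 1)   # "b" and "c" each miss one list
```

A pruned candidate that reappears in a later list is re-inserted with credit k + 2 − j, which is at most its true (non-positive) credit, so it is pruned again in the same step and the result agrees with (5).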

Fig. 4 illustrates the concept of k-mismatches for the case of score-based music search. The left hand side shows a query to a document (middle) in the piano roll notation. All except two notes of the query match the document at the position shown on the right.

4.2 Fuzzy Search

When posing a query, one is frequently in doubt about certain parts of the query. In several cases it would be desirable to specify alternative versions of the actual query. In the music example, one might be unsure about a certain musical interval and hence wish to query several alternative versions of a melody simultaneously. Such fuzziness may be incorporated into the proposed framework as follows. Let (Q_1, …, Q_n) be a sequence of alternatives (or fuzzy sets) Q_i ⊂ M. The corresponding fuzzy query 𝒬 is a family of queries defined by 𝒬 := {{q_1, …, q_n} | ∀i : q_i ∈ Q_i}. The set of all fuzzy matches G_𝒟(𝒬) := {(g, i) | ∃Q ∈ 𝒬 : gQ ⊆ D_i} may be computed efficiently using the formula

$$ G_{\mathcal{D}}(\mathcal{Q}) = \bigcap_{i=1}^{n} \Bigl( \bigcup_{q \in Q_i} G_{\mathcal{D}}(q) \Bigr). \tag{6} $$

Fig. 5 shows an example of a fuzzy query (left) to a database document (middle). The query contains two fuzzy sets, one containing two, the other containing three notes. The query is matched to the document as shown on the right.
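Formula (6) can be illustrated by the following Python sketch. It assumes, for brevity, notes without durations (pairs (t, p)) under the action of T_2, and it computes each list G_𝒟(q) directly from the documents instead of from a stored index; the names `match_list` and `fuzzy_search` are hypothetical.

```python
def match_list(database, q):
    """G_D(q): all ((tau, rho), i) such that q shifted by (tau, rho) lies in document i."""
    t, p = q
    return {((dt - t, dp - p), i)
            for i, doc in enumerate(database, start=1)
            for (dt, dp) in doc}

def fuzzy_search(database, alternatives):
    """Eq. (6): intersect, over all fuzzy sets, the union of their members' lists."""
    result = None
    for fuzzy_set in alternatives:
        union = set()
        for q in fuzzy_set:
            union |= match_list(database, q)
        result = union if result is None else result & union
    # Simplification of this sketch: an empty sequence of alternatives returns no matches.
    return result if result is not None else set()

# Usage: the third query note is uncertain (pitch offset 4 or 5).
database = [{(0, 60), (1, 62), (2, 65)}]
hits = fuzzy_search(database, [{(0, 0)}, {(1, 2)}, {(2, 4), (2, 5)}])
```

In this example `hits` is {((0, 60), 1)}: the alternative with pitch offset 5 matches, whereas a query containing only the offset 4 would return no match.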

4.3 Pdf-weighted Queries

Considering a fuzzy query, it may sometimes be appropriate to introduce weights on the above family 𝒬 of queries, e.g., as one might know in advance that some (combinations of) alternatives are more probable than others. This may be realized by introducing a probability density function (pdf) p : 𝒬 → [0, 1]. Note that one would frequently introduce pdfs on each set of alternatives Q_i and then use the product density for p. The pdf p is then used to obtain a ranking of the elements (g, i) ∈ G_𝒟(𝒬) by defining

$$ G_{\mathcal{D}}(\mathcal{Q}, p) := \Bigl\{ (g, i, \pi) \Bigm| \pi = \sum_{Q \in \mathcal{Q}:\; gQ \subseteq D_i} p(Q) \Bigr\}. $$
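As a rough illustration of this ranking, the following Python sketch enumerates all members of the family 𝒬 by brute force and accumulates the product density for every matching shift. It is exponential in the number of fuzzy sets and only meant to make the definition concrete; the function name `weighted_matches`, the representation of each fuzzy set as a dictionary from notes (t, p) to probabilities, and the use of T_2-shifts are assumptions of this sketch.

```python
from itertools import product

def weighted_matches(database, weighted_alternatives):
    """Map each match ((tau, rho), i) to the summed weight of all alternatives it matches.

    weighted_alternatives is a list with one dict {note: probability} per fuzzy set.
    """
    scores = {}
    for choice in product(*(alts.items() for alts in weighted_alternatives)):
        notes = [q for q, _ in choice]
        weight = 1.0
        for _, w in choice:
            weight *= w  # product density over the fuzzy sets
        t0, p0 = notes[0]
        for i, doc in enumerate(database, start=1):
            # Every document note is a candidate image of the first query note.
            for (dt, dp) in doc:
                tau, rho = dt - t0, dp - p0
                if all((t + tau, p + rho) in doc for (t, p) in notes):
                    key = ((tau, rho), i)
                    scores[key] = scores.get(key, 0.0) + weight
    return scores

# Usage: the second note has pitch offset 2 with probability 0.75, else offset 3.
database = [{(0, 60), (1, 62), (1, 63), (4, 70), (5, 72)}]
ranking = weighted_matches(database, [{(0, 0): 1.0}, {(1, 2): 0.75, (1, 3): 0.25}])
```

In this example the shift (0, 60) matches both alternatives and the shift (4, 70) matches only the more probable one, so the two hits receive different weights.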

Concluding this section we note that all of the introduced concepts of k-mismatches, fuzzy search, and pdf-weighting may be combined with each other.

Figure 4: Query (left) to a score document (middle) and a hit with two mismatching positions (right).

Figure 5: Query containing two fuzzy notes (left) and matching position (right) within a document (middle).

5. EXPLOITING PRIOR KNOWLEDGE

The sets of matches may be determined using intersections of inverted lists as in (4) or (6). Assuming the lists are always of similar lengths, the time complexity of evaluating a fixed query consisting of |Q| elements decreases with an increasing number of lists. Intuitively, a large number of lists, corresponding to a large number of G-orbits, means that our data are described in a rather detailed manner. Hence we might assume that detailed prior knowledge about the desired matches may be exploited to speed up the retrieval.

Consider again the example of score-based search and assume that all pieces of the database are in 4/4 time. Assume furthermore that each measure is subdivided into 16 metrical positions. When a user posing a query knows about the metrical position of the query, e.g., that the query's first note starts at an offbeat, this may be incorporated as follows: Instead of using the group T_2 = ℤ × ℤ of time- and pitch-translations, we now use the proper subgroup U := 16ℤ × ℤ for defining matches. Using this group, a match only gives us the measure number of a query; the exact metrical position has to be provided by the user. Next, assume the user additionally knows the correct key of the query. In this case we may replace U by its proper subgroup V := 16ℤ × {0}.

To clarify things, assume that T_2, U, and V act on the set of notes M := ℤ × ℤ, where [t, p] ∈ M consists of onset time t and pitch p, by the above time- and pitch-translation (i.e., durations are omitted). As the action of T_2 on M is transitive, i.e., each element of M may be shifted into every other element of M, there is only one orbit and hence only one inverted list in this case. For the U-action there are 16 orbits and a set of representatives is given by {[i, 0] | i ∈ [0 : 15]}. In the case of the V-action, a set of representatives is given by {[i, j] | i ∈ [0 : 15], j ∈ ℤ}. Assuming a finite pitch interval P ⊆ ℤ, e.g., P = [0 : 127], this yields a total of 16|P| potentially non-empty lists.

In general, we consider a subgroup U < G, i.e., U is a nonempty subset of G which is closed under group multiplication and inversion. Then for a suitable subset R of G with 1 ∈ R, there is a disjoint decomposition

$$ G = \bigsqcup_{r \in R} Ur $$

of G into right cosets of the form Ur := {ur | u ∈ U}. Using this decomposition, G-inverted lists and U-inverted lists are connected by

$$ G_{\mathcal{D}}(m) = \bigsqcup_{r \in R} U_{\mathcal{D}}(rm)\, r $$

for each m ∈ M. Then, the speed-up in query processing when restricting ourselves to subgroups U < G is explained by the fundamental property

$$ G_{\mathcal{D}}(Q) = \bigsqcup_{r \in R} U_{\mathcal{D}}(rQ)\, r $$

for all Q ∈ 2^M. Hence, G_𝒟(Q) is the disjoint union of many lists of the form U_𝒟(rQ)r. However, assuming prior knowledge as described above, we only need to determine the list for the case r = 1, i.e., U_𝒟(Q).
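The effect of restricting to the subgroup U = 16ℤ × ℤ can be sketched in Python as follows. The sketch assumes notes as pairs (t, p) and uses the 16 representatives [i, 0] from the example above; the names `BEATS`, `build_u_index`, and `u_search` are hypothetical.

```python
from collections import defaultdict

BEATS = 16  # metrical positions per measure, as in the example above

def build_u_index(database):
    """One inverted list per metrical position t mod 16 (representative [t mod 16, 0])."""
    index = defaultdict(set)
    for i, doc in enumerate(database, start=1):
        for (t, p) in doc:
            # [t, p] = (t - t % 16, p) applied to the representative [t % 16, 0];
            # Python's % always yields a value in [0 : 15], also for negative t.
            index[t % BEATS].add(((t - t % BEATS, p), i))
    return index

def u_search(index, query):
    """Return the U-matches, i.e. shifts by whole measures and arbitrary pitch offsets."""
    result = None
    for (t, p) in query:
        base = t - t % BEATS
        shifted = {((ht - base, hp - p), i)
                   for ((ht, hp), i) in index.get(t % BEATS, ())}
        result = shifted if result is None else result & shifted
    return result if result is not None else set()

# Usage: the motif occurs at onsets 4, 20 and 9; the query fixes metrical position 4.
database = [{(4, 60), (6, 62), (20, 60), (22, 62), (9, 60), (11, 62)}]
hits = u_search(build_u_index(database), {(4, 0), (6, 2)})
```

Only the occurrences at onsets 4 and 20 are reported, as shifts (0, 60) and (16, 60); the occurrence at onset 9 would require the shift (5, 60), which is not an element of U. Each query note touches only the list of its own metrical position, which is where the speed-up comes from.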

6. FAST INDEX-BASED SEARCH

Fast index-based search amounts to evaluating expressions such as (4) for exact search or (6) for fuzzy search. Hence, the operations of set intersection and set union have to be performed efficiently. Note that k-mismatch search may essentially be realized using modified set intersections. As set union, and hence fuzzy search, may be treated similarly, we shall from now on concentrate on the case of exact search.

To perform set intersection efficiently, we assume that each of the inverted lists {G_𝒟(r) | r ∈ R} constituting the search index is stored as a sorted sequence. To this end, we assume a suitable total (or linear) order on the elements of the underlying group G. For example, the group (ℤ, +) is linearly ordered by the natural order on the set of integers. The above group T_2 = ℤ × ℤ may be linearly ordered using a lexicographic order. To evaluate intersections (4), we consider elementary intersections of the form

$$ G_{\mathcal{D}}(r)\, g \cap G_{\mathcal{D}}(\tilde r)\, \tilde g. \tag{7} $$

That is, we have to consider g-shifted and g̃-shifted versions of the sorted lists G_𝒟(r) and G_𝒟(r̃), respectively. As the group action either may or may not preserve the order of the list elements, we have to consider two different cases.

First assume that the group action preserves the order of the list elements. Then, an intersection of two lists of lengths ℓ and k may be performed by, e.g., merging both lists in O(ℓ + k) steps or by searching the ℓ elements of one (the shorter) list in the other list, requiring O(ℓ log k) steps. Which intersection algorithms are most suitable depends, among other parameters, on the actual list lengths ℓ and k and on the distribution of the list entries [7]. As inverted list intersection is a standard technique, we only briefly consider the case of k-mismatch search, as it is most relevant to our approach. Assume that a database contains a total of n := |D_1| + ⋯ + |D_N| elements equally distributed over ℓ inverted lists and that a query is of size m := |Q|. Using sequential intersection of all m involved lists, while admitting unlimited mismatches, we obtain a total time complexity of O(m² n/ℓ) steps, which may be further reduced to O(m log(m) n/ℓ) steps using m-way merging. Assuming that we are able to choose ℓ ≥ m log(m), this amounts to a complexity of O(n). Note that this choice of ℓ is generally quite realistic. For example, in the case of score-based search, we obtain approximately ℓ = 2000 lists, whereas usual query sizes are about m = 10 elements. Further note that the O(n) estimate is quite pessimistic, as the number of allowed mismatches, and hence the number of intermediate results, will generally be much smaller than assumed here.

Now assume that the group action does not preserve the ordering of the list elements, i.e., even if the lists G_𝒟(r) are organized as ordered sequences, the lists G_𝒟(r)g will no longer be ordered.
To obtain an efficient strategy for list intersections in this case, observe that (7) is equivalent to

$$ \bigl( G_{\mathcal{D}}(r) \cap G_{\mathcal{D}}(\tilde r)\, \tilde g g^{-1} \bigr)\, g. \tag{8} $$

Note that as G_𝒟(r) is a sorted list, we may now apply binary search for the elements of G_𝒟(r̃)g̃g⁻¹ within G_𝒟(r) to obtain a fast intersection algorithm. For Q =: {q_1, …, q_n} we may iterate the above by

$$ G_1 := G_{\mathcal{D}}(r_{q_1})\, g_{q_1}^{-1} g_{q_2}, \qquad G_k := \bigl( G_{k-1} \cap G_{\mathcal{D}}(r_{q_k}) \bigr)\, g_{q_k}^{-1} g_{q_{k+1}} \ \text{ for } k \in [2 : n-1], \qquad G_n := \bigl( G_{n-1} \cap G_{\mathcal{D}}(r_{q_n}) \bigr)\, g_{q_n}^{-1}, $$

and thus also obtain a fast algorithm for evaluating G_𝒟(Q) = G_n (cf. (4)) in the second case.

We finally remark that in several applications, e.g., involving groups such as 2- or 3-dimensional Euclidean motions, fuzzy search leads to inverted lists consisting of intervals of group elements rather than single elements. This leads to interval intersection problems. When groups with higher-dimensional matrix representations are involved, the interval intersections are usually carried out using high-dimensional range search. Those problems are beyond the scope of this paper. For further reading we refer to the survey [1].
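The two intersection strategies of this section can be sketched in Python as follows. The sketch makes several simplifying assumptions: list entries are bare group elements (document IDs are omitted), lists contain no duplicates, and the group is passed in through the hypothetical callables `mul` and `inv`. The multiplicative group of integers modulo 7 is used in the usage example only because right multiplication there does not preserve the natural order; it is not an example taken from the paper.

```python
from bisect import bisect_left

def intersect_merge(a, b):
    """Intersect two sorted lists in O(len(a) + len(b)) steps by merging."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def intersect_search(short, long_sorted):
    """Binary-search each element of `short` (in any order) in the sorted list `long_sorted`."""
    out = []
    for x in short:
        pos = bisect_left(long_sorted, x)
        if pos < len(long_sorted) and long_sorted[pos] == x:
            out.append(x)
    return out

def intersect_shifted(l_sorted, g, l_tilde, g_tilde, mul, inv):
    """Eq. (8): compute L g ∩ L~ g~ while searching only in the sorted, unshifted list L."""
    h = mul(g_tilde, inv(g))
    candidates = [mul(x, h) for x in l_tilde]          # elements of L~ g~ g^{-1}
    return [mul(x, g) for x in intersect_search(candidates, l_sorted)]

# Usage with the units modulo 7 (pow with exponent -1 needs Python 3.8 or later).
mul = lambda a, b: (a * b) % 7
inv = lambda a: pow(a, -1, 7)
common = intersect_shifted([1, 2, 4], 3, [2, 3, 5], 5, mul, inv)
```

In the usage example the shifted lists are {3, 6, 5} and {3, 1, 4}, so `common` is [3]; the sorted list [1, 2, 4] itself is never reordered.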

7. INDEX COMPRESSION

Inverted file indexing is a well-established technique in full-text retrieval, and several mechanisms have been proposed to obtain compressed representations of an inverted index [24]. Generally, one exploits the (w.l.o.g. ascending) order of the inverted lists' elements and works on a differential representation of the lists. Consider, e.g., an inverted file over Z consisting of integers a_1 ≤ a_2 ≤ ··· ≤ a_k. A differential representation would be (a_1, a_2 − a_1, ..., a_k − a_{k−1}). The differential representation is usually run-length encoded based on a suitable model of the distribution of the differences. Examples of run-length codes are unary, gamma, or Golomb codes.

We shall demonstrate by an example how inverted file compression may be carried over to the case of our generalized inverted files. Consider the group of integers Z operating, e.g., by time-shifts on the set M := Z × Z of notes, τ[t, p] := [τ + t, p] for τ ∈ Z and [t, p] ∈ M. Using the set of representatives R := {[0, p] | p ∈ Z} leads to inverted lists of the form G_D([0, p]) for p ∈ Z with entries (t, i) whenever note [t, p] is contained in document i. Hence, we have to code inverted lists with entries from N × [1 : N].

To obtain simpler inverted files containing only integer entries, the idea is to create one long virtual document by concatenating D_1, ..., D_N. As all documents are finite, we may define ℓ(D) := min{t | [t, p] ∈ D}, u(D) := max{t | [t, p] ∈ D}, and hence the width w(D) := u(D) − ℓ(D) + 1 of document D. For i ∈ [1 : N] let

$T_i := \sum_{j=1}^{i-1} w(D_j)$

denote the starting position of document i within the virtual document. Then, the inverted files are converted to lists G(p) := {t + T_i | [t, p] ∈ D_i} over Z, which may be sorted appropriately. Inverted file search is carried out as before except for using the G(p) instead of G_D([0, p]). Assuming that a G(p)-based search to a query Q,

$G(Q) := \bigcap_{q \in Q,\ q = [t,p]} \bigl( G(p) - t \bigr),$

returns a hit t ∈ G(Q), we recover the document ID by determining the unique i such that T_i ≤ t + ℓ(Q) < T_{i+1}. Note that we have to eliminate possible false positive matches, which may occur iff w(Q) + t ≥ T_{i+1} for i ∈ [1 : N − 1]. Then g := t − T_i yields the position of the match within document D_i, i.e., (g, i) ∈ G_D(Q).

Using this type of document concatenation for the case of score-based music retrieval, an uncompressed index of size 110 MB created from a 33 million note database has been compressed to a size of only 22 MB. In this case, a modified Golomb code was used for index compression.
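The concatenation scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: every document and the query are assumed to be normalised so that their smallest onset time is 0, documents are numbered from 0 rather than 1, and false positives are rejected by a direct membership check instead of the boundary test. The names build_virtual_index, gaps and search are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_virtual_index(docs):
    """Build the integer lists G(p) for documents given as sets of (t, p) notes.

    Assumption: each document is normalised so that its smallest onset is 0.
    Returns (lists, starts); starts[i] is the offset T_i of document i and
    starts[-1] is the total length of the virtual document.
    """
    widths = [max(t for t, _ in doc) + 1 for doc in docs]
    starts = [0] + list(accumulate(widths))
    lists = {}
    for i, doc in enumerate(docs):
        for t, p in doc:
            lists.setdefault(p, []).append(t + starts[i])
    for entries in lists.values():
        entries.sort()
    return lists, starts


def gaps(sorted_list):
    """Differential representation (a1, a2 - a1, ..., ak - a(k-1))."""
    return sorted_list[:1] + [b - a for a, b in zip(sorted_list, sorted_list[1:])]


def search(query, docs, lists, starts):
    """Return all (g, i) such that the query shifted by g lies in document i."""
    candidates = None
    for t, p in query:
        shifted = {a - t for a in lists.get(p, [])}   # the set G(p) - t
        candidates = shifted if candidates is None else candidates & shifted
    hits = []
    for c in sorted(candidates or ()):
        if not 0 <= c < starts[-1]:
            continue
        i = bisect_right(starts, c) - 1               # T_i <= c < T_(i+1)
        g = c - starts[i]
        # Reject false positives that straddle a document boundary.
        if all((g + t, p) in docs[i] for t, p in query):
            hits.append((g, i))
    return hits


docs = [{(0, 60), (2, 62), (4, 64)}, {(0, 62), (1, 64), (3, 60)}]
lists, starts = build_virtual_index(docs)
print(gaps(lists[62]))                                   # [2, 3]
print(search({(0, 62), (1, 64)}, docs, lists, starts))   # [(0, 1)]
print(search({(0, 64), (1, 62)}, docs, lists, starts))   # []
```

The last query matches only across the boundary between the two documents in the virtual document, so the verification step discards it. The gap lists returned by gaps are what a Golomb or gamma coder would then compress.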

8. CONVERTING RAW- TO INDEX-DATA

In most cases, the actual (raw) data objects constituting a multimedia database are not directly used in the search index. Instead, we work on reduced versions which constitute a database of the form D as defined above. The reasons for using such a reduced representation may be the immense size of the raw data (consider, for example, raw high-quality image or audio datasets), a significant increase in robustness, e.g., against distortions of digitized data, or the possibility to omit unnecessary or misleading data.

Tacitly, we have already used several forms of preprocessing involving reduced data representations when modeling score-based data. First, we used a set M := Z × Z × N to model all possible notes. When converting an audio file in MIDI format to a document over M, we have to omit additional information like instrumentation, dynamics, etc. contained in the MIDI file. When we restricted ourselves to notes of the form Z × Z, the data was further reduced by omitting the third component, the note duration. However, in each case we retained the structure of the underlying data that we needed in defining matches w.r.t. a group G acting on the data.

In this section we will first investigate transforms between different sets of data- and index-objects that preserve structural properties of the underlying group acting on the data objects. We will then investigate how such structure-preserving mappings may be extended to composite objects. In particular, we obtain G-invariant feature extractors which enable us to preprocess digitized data such as images or audio tracks, which may then be indexed using the previously proposed techniques.

8.1 Structure-Preserving Transforms

Structure-preserving mappings are modeled by so-called G-morphisms. Let G be a group acting on two sets M and M′ of elementary objects. A map f : M → M′ will be called a G-morphism iff ∀m ∈ M, g ∈ G : f(gm) = g f(m), i.e., f commutes with the group action. Consider for example the two above note models M := Z × Z × N and M′ := Z × Z with G := (Z × Z, +) acting on both M and M′ by time- and pitch-translation, ((τ, ρ), [t, p, d]) := [t + τ, p + ρ, d] and ((τ, ρ), [t, p]) := [t + τ, p + ρ]. Then obviously f : M → M′, [t, p, d] ↦ [t, p] is a G-morphism from M to M′. In what follows, for Q ⊆ M we use f(Q) := {f(q) | q ∈ Q} as a shorthand notation.

The following basic result states that we do not lose any matches when using G-morphisms between sets of elementary objects. Let G be a group acting on both sets M and M′. Consider a database D := (D_1, ..., D_N) over M. Furthermore, consider a G-morphism f : M → M′, and let f(D) := (f(D_1), ..., f(D_N)) denote the transformed database. Then, for each match (g, i) to a query Q ⊆ M, g f(Q) = f(gQ) ⊆ f(D_i) holds and hence

G_D(Q) ⊆ G_{f(D)}(f(Q)).    (9)

As an example consider again the sets of notes M and M′ as defined above together with the document

D_1 := {[0, 1, 1], [10, 1, 1], [20, 1, 2]} ⊂ M,    f(D_1) = {[0, 1], [10, 1], [20, 1]} ⊂ M′.

For a query

Q := {[0, 1, 1], [10, 1, 2]} ⊂ M,    f(Q) = {[0, 1], [10, 1]} ⊂ M′

we are looking for all matches w.r.t. the above group G of time- and pitch-shifts. Then G_{(D_1)}(Q) = {((10, 0), 1)}, as a time-shift by 10 transports Q into document D_1. On the other hand, G_{(f(D_1))}(f(Q)) = {((0, 0), 1), ((10, 0), 1)} ⊃ G_{(D_1)}(Q) contains a second match, as the durations have been eliminated by f.
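The two match sets of this example can be checked by brute force. The following Python sketch is only an illustration: notes are represented as tuples, and the helper names act, matches and f are hypothetical. It enumerates the shifts that map one fixed query note onto a document note and keeps those that carry the whole query into the document.

```python
def act(g, note):
    """Apply a time/pitch shift g = (tau, rho) to a note [t, p] or [t, p, d]."""
    tau, rho = g
    t, p, *rest = note
    return (t + tau, p + rho, *rest)


def matches(query, docs):
    """Brute-force G_D(Q): all (g, i) with gQ contained in document D_i."""
    query = sorted(query)
    anchor = query[0]
    found = set()
    for i, doc in enumerate(docs, start=1):
        for note in doc:
            # Only shifts mapping the anchor note onto a document note can match.
            g = (note[0] - anchor[0], note[1] - anchor[1])
            if all(act(g, q) in doc for q in query):
                found.add((g, i))
    return found


def f(note):
    """The G-morphism [t, p, d] -> [t, p] that drops the duration."""
    return note[:2]


D1 = {(0, 1, 1), (10, 1, 1), (20, 1, 2)}
Q = {(0, 1, 1), (10, 1, 2)}

print(sorted(matches(Q, [D1])))                                   # [((10, 0), 1)]
print(sorted(matches({f(q) for q in Q}, [{f(n) for n in D1}])))   # [((0, 0), 1), ((10, 0), 1)]
```

Every match found on the original data is also found on the reduced data, which is the inclusion (9); the additional match ((0, 0), 1) is the price paid for discarding the durations.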

8.2 Feature Extraction

Considering the large amounts of data and the various signal qualities, a significant amount of data reduction will be necessary when attempting to index digitized audio-, image- and video-material. We shall illustrate the concept of invariant preprocessing considering the example of image retrieval described above. Assume an image is modeled as a set of elementary pixels D ⊂ P = R^2 × N. Strictly speaking, an image is a map d : I × J → N, where each coordinate (x, y) ∈ I × J ⊆ R^2 is assigned a color value c ∈ N. Hence, looking at an image as a subset D ⊂ P amounts to considering the graph D of the map d.

To achieve data reduction while retaining the essential image information, one applies a preprocessing step which is formalized by an operator

$F : \mathbb{N}^{\mathbb{R}^2} \to 2^{\mathbb{R}^2 \times X},$

where an image d : I × J → N is mapped to a subset F(d) ⊆ R^2 × X =: 𝒫. Here, X denotes a finite set of features and F is called a feature extractor. In applications, d and F(d) are quantized to discrete domains [i_1 : i_2] × [j_1 : j_2] ⊂ N^2 and N^2 × X, respectively. Then generally |F(d)| will be very small as compared to the image size |i_2 + 1 − i_1| · |j_2 + 1 − j_1|. As an example, consider the well-known SUSAN corner detector [21], which determines all positions in an image resembling a corner (according to some suitable criterion). Then X := {corner} and for an image d, we have [(x, y), corner] ∈ F(d) iff d contains a corner at coordinates (x, y).

In the above scenario of content-based image search we want to find all images in a database containing a certain query image as a subimage. In this setting the query has to be shifted along the x and y axes in order to match a database image. Hence we consider the action of the group G := (R^2, +) of 2D-translations on the set P of pixels: ((χ, µ), [(x, y), c]) := [(x + χ, y + µ), c]. As the index will be constructed based on the feature representation, we have to verify that G also acts on 𝒫, which is trivially the case.

Finally, in order to obtain meaningful G-matches based on a 𝒫-index we have to establish that shifted versions of an image yield likewise shifted feature representations, i.e., the feature extractor F should be invariant to translations. Formally, F is a map ℳ → ℳ′, where ℳ = N^{R^2} ⊂ 2^P denotes the set of all graphs of admissible images and ℳ′ ⊂ 2^𝒫 the corresponding set of all feature-sets. Then invariance will just mean that F is a G-morphism from ℳ to ℳ′. Note that this kind of G-morphism is defined on documents rather than on elementary objects.

Generally, assume a database as defined in the context of (9) and assume G acts on the sets M and M′ as well as on ℳ ⊆ 2^M and ℳ′ ⊆ 2^{M′}. Further assume that F : ℳ → ℳ′ is a G-morphism satisfying A ⊆ B ⇒ F(A) ⊆ F(B). Then for all Q ∈ ℳ we have G_D(Q) ⊆ G_{F(D)}(F(Q)), in analogy to (9).
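The two conditions on a feature extractor, translation invariance and monotonicity w.r.t. inclusion, can be illustrated on a toy example. The Python sketch below does not implement the SUSAN detector; it assumes a deliberately simple, hypothetical criterion (a pixel is a "peak" if all four axis neighbours exist and are strictly smaller), images on an integer grid stored as dictionaries, and integer translations.

```python
def extract_peaks(image):
    """Toy feature extractor F: image is a dict {(x, y): gray value}.

    A pixel yields the feature ((x, y), "peak") iff all four axis neighbours
    are present in the image and have a strictly smaller value
    (a hypothetical criterion chosen for illustration only).
    """
    features = set()
    for (x, y), c in image.items():
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        if all(n in image and image[n] < c for n in neighbours):
            features.add(((x, y), "peak"))
    return features


def shift_image(image, g):
    """Action of a translation g = (chi, mu) on the pixels of an image."""
    chi, mu = g
    return {(x + chi, y + mu): c for (x, y), c in image.items()}


def shift_features(features, g):
    """Action of the same translation on a feature set."""
    chi, mu = g
    return {((x + chi, y + mu), label) for (x, y), label in features}


image = {(x, y): 0 for x in range(5) for y in range(5)}
image[(2, 2)] = 9
image[(3, 1)] = 7
g = (10, -3)

# G-morphism property: F(g d) = g F(d)
print(extract_peaks(shift_image(image, g)) == shift_features(extract_peaks(image), g))

# Monotonicity: a sub-image never yields features that the full image lacks
sub = {pos: c for pos, c in image.items() if 1 <= pos[0] <= 3 and 1 <= pos[1] <= 3}
print(extract_peaks(sub) <= extract_peaks(image))
```

Requiring all neighbours to be present is what makes this toy extractor monotone: a feature detected in a cut-out query image is guaranteed to appear at the corresponding position in the full image.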

9. APPLICATIONS, PROTOTYPES, AND TEST RESULTS

In this section we give an overview of several applications of the proposed indexing and search technique. A more detailed treatment for the case of music retrieval may be found in our related work [5]. We describe prototype implementations and give some test results demonstrating the time- and space-efficiency of the proposed algorithms. Recall that for each application we have to specify an underlying set M of elementary objects, a group G operating on this set, and a set of representatives R ⊂ M to specify the inverted file index {G_D(r) | r ∈ R}.

9.1 Content-Based Music Retrieval

Score-based polyphonic search has been used as a running example within this paper. In our PROMS-system [4] we used a universe M := Z × [0 : 127] of notes consisting of onset-times and MIDI-pitches. The search is carried out w.r.t. the groups G := (Z, +) and V := (16Z, +) of time-shifts, where G shifts by metrical positions and V by whole measures, each consisting of 16 metrical positions. The latter models prior knowledge about the metrical position of a query within a measure. The sets of representatives are defined as in Section 5.

Our database consists of 12,000 classical pieces of music given in the MIDI format. The pieces consist of a total of about 33 million notes. Response times for queries of various lengths are summarized in Table 1. The response times for each query length were averaged over 100 randomly generated queries. As the table shows, the average total response time stays below 160 ms even for queries of 100 notes. In addition, the index requires only a small amount of disk space, and indexing of 330 MB of polyphonic music takes only 40 seconds. The uncompressed index requires 110 MB of disk space. Compressing the inverted lists using Golomb coding as described above reduces the space requirement to 22 MB.

a (notes per query)          |  4 |  8 | 12 | 16 |  20 |  30 |  50 | 100
b (total response time, ms)  | 51 | 86 | 92 | 97 | 100 | 107 | 125 | 159
c (disk access time)         |  1 |  5 |  7 | 10 |  12 |  19 |  31 |  64

Table 1: Average total system response time (row b) in ms for different numbers of notes per query (row a). Row c: Disk access time for fetching inverted lists. (Pentium II, 333 MHz, 256 MB RAM).

9.2 Audio Identification

The task of audio identification may be described as follows. Given a short part q of an audio track and a database x_1, ..., x_N of full-size audio tracks, locate all occurrences of q within the database tracks. Here, a pair (t, i) determines an occurrence of q in x_i iff q = x_i[t : t + |q| − 1]. Note that this problem may be stated in terms of our group-based approach as described above, as audio signals s ∈ R^Z may be interpreted as subsets S ⊂ Z × R (i.e., S ∈ 2^{Z×R}), where G = (Z, +) operates by addition in the Z- (time) component. Because of the robustness and space-efficiency reasons discussed above, audio signals are preprocessed using a G-invariant feature extractor F : R^Z → 2^{Z×X}, where X denotes a set of feature classes (in this case G-invariance denotes the usual time-invariance).

As an illustration we sketch a feature extractor which extracts significant local maxima from a smoothed version of a signal. For a detailed treatment of more robust feature extractors, we refer to [17]. F will be composed from several elementary operators, each of which is a map on the signal space, i.e., a map R^Z → R^Z. First, an input signal s is smoothed by linear filtering. The corresponding operator is

$C_f[s] : n \mapsto \sum_{k \in \mathbb{Z}} f(k)\, s(n-k),$

where f denotes a signal of finite support. Next, K-significant local maxima are extracted by an operator

$M_K[x] : n \mapsto \begin{cases} x(n) & \text{if } x(n-K) < \cdots < x(n) \,\wedge\, x(n) > \cdots > x(n+K), \\ 0 & \text{otherwise.} \end{cases}$

The resulting signal is again processed by an operator M′_{K′} extracting local maxima. M′_{K′} is defined exactly as M_{K′}, but regards only the support of the input signal. M′_{K′} usually returns a very sparse output signal. The operator ∆ assigns to each non-zero position of an input sequence the distance to the previous non-zero position (provided one exists), and zero otherwise. Finally, let Q_{|X|} denote a (e.g., linear) quantizer which reduces a signal's amplitude to |X| feature classes. Note that Q_{|X|} is an operator R^Z → 2^{Z×X} which, in addition to quantization, discards zero-positions of a signal. Then, our feature extractor may be written as

$F := Q_{|X|} \circ \Delta \circ M'_{K'} \circ M_K \circ C_f.$

In an example we could choose K = 5, K′ = 3, and X = [1 : 50]. To construct our search index, we calculate F[x_i] ∈ 2^{Z×X} for each signal x_i of our database. Here, {[0, x] | x ∈ X} serves as the set of representatives for index construction.
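The operator chain from smoothing to quantization can be sketched as follows. This is a minimal Python illustration under several assumptions that are not fixed by the text: signals are finite lists that are treated as zero outside their stored range, the filter f is supported on 0..len(f)-1, "regards only the support" is interpreted as applying the maximum test to the subsequence of non-zero samples, and the quantizer is a linear one with a hypothetical step parameter. All function names are illustrative.

```python
from math import ceil


def at(x, n):
    """Zero-extended access: the signal is treated as 0 outside its stored range."""
    return x[n] if 0 <= n < len(x) else 0


def smooth(s, f):
    """C_f[s](n) = sum_k f(k) s(n - k), with f supported on 0..len(f)-1."""
    return [sum(f[k] * at(s, n - k) for k in range(len(f))) for n in range(len(s))]


def is_peak(x, n, K):
    """True iff x(n-K) < ... < x(n) and x(n) > ... > x(n+K)."""
    rising = all(at(x, n - j - 1) < at(x, n - j) for j in range(K))
    falling = all(at(x, n + j) > at(x, n + j + 1) for j in range(K))
    return rising and falling


def significant_maxima(x, K):
    """M_K: keep K-significant local maxima, set everything else to 0."""
    return [x[n] if is_peak(x, n, K) else 0 for n in range(len(x))]


def support_maxima(x, K):
    """M'_K: like M_K, but applied to the subsequence of non-zero samples."""
    support = [n for n, v in enumerate(x) if v != 0]
    values = [x[n] for n in support]
    out = [0] * len(x)
    for j, n in enumerate(support):
        if is_peak(values, j, K):
            out[n] = x[n]
    return out


def delta(x):
    """Distance of each non-zero position to the previous one, 0 if none exists."""
    out, prev = [0] * len(x), None
    for n, v in enumerate(x):
        if v != 0:
            if prev is not None:
                out[n] = n - prev
            prev = n
    return out


def quantize(x, num_classes, step):
    """Linear quantizer to classes 1..num_classes; zero positions are discarded."""
    return {(n, min(num_classes, ceil(v / step))) for n, v in enumerate(x) if v != 0}


def extract(s, f, K, K2, num_classes, step=1):
    """F = Q o Delta o M'_K2 o M_K o C_f."""
    maxima = support_maxima(significant_maxima(smooth(s, f), K), K2)
    return quantize(delta(maxima), num_classes, step)


signal = [0, 1, 3, 6, 3, 1, 0, 0, 2, 5, 9, 5, 2, 0,
          0, 1, 2, 4, 2, 1, 0, 0, 3, 6, 8, 6, 3, 0]
features = extract(signal, f=[1], K=2, K2=1, num_classes=50)
print(features)  # {(24, 14)}

# Time-invariance: delaying the signal by 7 samples delays the features by 7.
delayed = [0] * 7 + signal
print(extract(delayed, f=[1], K=2, K2=1, num_classes=50)
      == {(n + 7, c) for n, c in features})
```

In the toy signal the four bumps have peaks at positions 3, 10, 17 and 24; the second-stage operator keeps only the peaks at 10 and 24, and the single resulting feature records their distance of 14 samples at position 24.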

We briefly summarize some results of our extensive tests in the audio identification scenario. Our database consists of 4,500 full-size audio tracks. This amounts to approximately 180 GB of original data or 13 days of high-quality audio. Using the above significant maxima as features, we obtain an (uncompressed) index size of about 128 MB, which is a compression ratio of about 1:1,400 as compared to the original data. Using different feature extraction methods, the index size may be further reduced to ratios of 1:5,000 or even lower. The query times range from only a few milliseconds (higher quality queries) to about one second. The required length of a query signal depends on the feature extractor and ranges from a fraction of a second (significant maxima features) to 5-15 seconds (robust features and low quality queries) [5]. Recent experiments with even larger audio databases of 10,000, 15,000, and even 50,000 full-size audio tracks indicate that our index structure and search algorithms scale well to very large data sets.

9.3 Content-Based Image Retrieval

In content-based 2D- or 3D-retrieval we are interested in finding possibly translated, rotated, or (uniformly) scaled versions of a query object in an underlying database. In this overview we shall consider translations and rotations only. Hence, the groups of interest are the group T_n of translations in R^n and the orthogonal group O_n defined above, consisting of all rotations including reflections w.r.t. the origin. Writing elements of T_n as τ_x for vectors x ∈ R^n, the set E_n of all pairs (τ_x, A) for τ_x ∈ T_n and A ∈ O_n together with the multiplication (τ_x, A) ∗ (τ_y, B) := (τ_{Ay+x}, AB) defines the group of Euclidean motions, i.e., rotations, reflections, and translations, in nD.

In content-based image retrieval, we are working with 2D images D ⊂ R^2 × N = P and are hence interested in the groups T_2 and E_2. Assume a suitable feature extractor yielding a set of features F(D) ⊂ R^2 × X =: 𝒫 for an image D. T_2 acts on the first two components of 𝒫 as described above. Hence, after feature extraction, we may create an index based on the set {[0, 0, x] | x ∈ X} of representatives of 𝒫. In our extensive tests described in [18] we investigated several kinds of feature extractors including corner detectors, gray value statistics and histograms.

Looking at retrieval under the group E_2 acting on 2D points from 𝒫, we face the problem that each point x ∈ R^2 is mapped to any other point y ∈ R^2 by infinitely many elements from E_2, resulting in inverted lists of infinite size. Hence we resort to indexing line segments, which are modeled by the set P_2(M) := {L ⊂ M | |L| = 2} of all two-element subsets of M = R^2. Using M′ := P_2(R^2) × X, the line segments {[(0, a), (0, −a), x] | x ∈ X, a ≥ 0} serve as a set of representatives for indexing. In [18], several types of features were tested in the latter setting.

Essentially two different image data collections were tested on a 1.8 GHz Pentium 4 processor with 512 MB of main memory running Windows NT 4.0. For testing T_2-invariant search, we used a collection consisting of 3,500 JPEG images of various contents, totaling 121 MB of compressed image material. The resulting index consists of 625,000 features and has a size of 18 MB. Various settings with between 3,000 and 90,000 T_2-lists were investigated. For additionally testing rotation-invariance, i.e., E_2-based search, we used 2,000 artificially generated polygon images, amounting to 25 MB of compressed data. The resulting index contains 100,000 features and has a size of 30 MB. We used several settings with between 20 and 150 E_2-lists.

Sufficient query sizes for image identification in the case of T_2-invariance are typically less than 50 features. The response time is less than 50 ms in this setting. E_2-invariant queries are considerably slower due to the reduced number of lists and require on the order of a few seconds of query time. As a well-known phenomenon, E_2-invariant features are very hard to construct. Generally, a coarse quantization level is necessary, resulting in longer inverted lists or, alternatively, high mismatch rates.
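The multiplication rule of the Euclidean group and the choice of line-segment representatives can be made concrete in a short Python sketch. It is an illustration only: motions are stored as pairs (x, A) of a translation vector and a 2×2 orthogonal matrix, integer matrices are used so that the composition check is exact, and the helper names compose, act and representative are hypothetical.

```python
import math


def mat_vec(A, v):
    """Product of a 2x2 matrix with a 2D vector."""
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])


def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2))
                 for r in range(2))


def compose(m1, m2):
    """(tau_x, A) * (tau_y, B) := (tau_{Ay + x}, AB)."""
    (x, A), (y, B) = m1, m2
    Ay = mat_vec(A, y)
    return ((Ay[0] + x[0], Ay[1] + x[1]), mat_mul(A, B))


def act(m, p):
    """Apply the motion (tau_x, A) to a point p: first A, then the translation."""
    x, A = m
    Ap = mat_vec(A, p)
    return (Ap[0] + x[0], Ap[1] + x[1])


def representative(segment, feature):
    """Representative [(0, a), (0, -a), x] of the E_2-orbit of a line segment."""
    p, q = segment
    a = math.hypot(p[0] - q[0], p[1] - q[1]) / 2
    return ((0.0, a), (0.0, -a), feature)


rot90 = ((0, -1), (1, 0))      # rotation by 90 degrees
mirror = ((1, 0), (0, -1))     # reflection at the x-axis
m1 = ((3, 4), rot90)
m2 = ((-1, 2), mirror)
p = (5, 7)

# The product acts like "first m2, then m1".
print(act(compose(m1, m2), p) == act(m1, act(m2, p)))

# A segment and its image under a Euclidean motion share one representative.
segment = ((0, 0), (6, 8))
moved = (act(m1, segment[0]), act(m1, segment[1]))
print(representative(segment, "edge"), representative(moved, "edge"))
```

Because a Euclidean motion preserves distances, the half-length a is the only geometric quantity left in the representative, which is why the number of distinct inverted lists is governed by how coarsely a is quantized.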

9.4 Searching 3D-Scenes

We investigated content-based search in 3D scenes for the case of a database of VRML (Virtual Reality Modeling Language) documents [15]. To obtain feasible inverted lists, the elementary objects were chosen to be all 3-sets in R^3, i.e., M = P_3(R^3), interpreted as the set of all triangles in R^3. Hence for indexing, all VRML documents were converted into documents consisting of triangles only. Indexing was performed for the group T_3 of 3D translations and for the group E_3 of Euclidean motions in R^3. As a set of representatives we chose all triangles with the origin as the center of gravity. Additionally, each representative is rotated such that one specific edge runs parallel to the x-axis, this edge depending on whether the triangle has one, two, or three different side lengths. This way, one obtains a finite set of inverted lists for this application.

Fig. 6 illustrates the concept of 3D-retrieval using an underlying toy database of 3D-objects (on the left). The top of the figure shows a part of an object which is used as a query to the database. An index-based search results in one object of the database matching the query (on the right). The position of that object matching the query is highlighted.

Figure 6: Toy database of 3D-objects (left), query object (top), and matching database object (right). The matching position is highlighted.
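For the translation group T_3, the triangle index can be sketched as follows. The Python code is a simplified illustration, not the VRML prototype: triangles are triples of integer or rational vertices, exact arithmetic with Fraction avoids rounding issues at the centroid, the E_3 case with rotated representatives is omitted, and the function names are hypothetical.

```python
from fractions import Fraction


def centroid(tri):
    """Center of gravity of a triangle given as three (x, y, z) vertices."""
    return tuple(sum(Fraction(v[k]) for v in tri) / 3 for k in range(3))


def normalise(tri):
    """Split a triangle into its representative (centroid at origin) and the centroid."""
    c = centroid(tri)
    rep = frozenset(tuple(Fraction(v[k]) - c[k] for k in range(3)) for v in tri)
    return rep, c


def build_index(docs):
    """Inverted lists: representative -> set of (translation, document number)."""
    index = {}
    for i, doc in enumerate(docs, start=1):
        for tri in doc:
            rep, c = normalise(tri)
            index.setdefault(rep, set()).add((c, i))
    return index


def search(query, index):
    """All (g, i) such that the query translated by g is contained in document i."""
    result = None
    for tri in query:
        rep, c = normalise(tri)
        # Shift each list entry by the query triangle's own centroid.
        shifted = {(tuple(a - b for a, b in zip(pos, c)), i)
                   for pos, i in index.get(rep, ())}
        result = shifted if result is None else result & shifted
    return result or set()


def moved(tri, g):
    """Translate a triangle by the vector g."""
    return tuple(tuple(v[k] + g[k] for k in range(3)) for v in tri)


A = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
B = moved(A, (0, 0, 1))
docs = [[A, B], [moved(A, (5, 5, 5))]]
index = build_index(docs)

query = [moved(A, (2, 0, 0)), moved(B, (2, 0, 0))]
print(search(query, index))
```

The query consists of the two triangles of the first document moved by (2, 0, 0), so the only translation that carries the whole query into a document is (−2, 0, 0) for document 1; the second document contains only one of the two triangles and drops out in the intersection.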

10. CONCLUSION AND ONGOING WORK

In this paper, we presented an index-based approach for searching multimedia databases by content. The approach has been derived from the general mathematical concept of groups acting on sets and yields a general methodology for defining the notion of a match based on group actions. Combining this approach with the classical concept of inverted files yields time- and space-efficient algorithms for content-based ('full-text') search in multimedia data. We furthermore introduced several general concepts for incorporating fault-tolerance into our retrieval scheme, including mismatch- and fuzzy search. A discussion of group-invariant transforms allowed us to specify a general framework for data preprocessing including feature extraction from digitized (audio, image, video, etc.) data. Using group-invariant feature extractors which preserve the essential structure of the underlying data (e.g., both image pixels and features have (x, y) coordinates), we obtain, in contrast to most existing approaches, the additional freedom to precisely localize a query pattern in a database object. The latter has many important applications, e.g., in music retrieval or object localization in 2D images.

The wide range of applications sketched in the previous section demonstrates the power and flexibility of our approach. The approach is unique in exploiting the general structure of the retrieval problems, which significantly contributes to its high performance, particularly when dealing with partial matches.

An important generalization of the notion of a match incorporates distance metrics to allow for matching w.r.t. those distances. An example is matching of polygonal curves under the Fréchet distance [6]. In future work, the concept for searching digital audio signals will be extended to searching general large time-series databases such as stock data. As a variant of our philosophy of matches we are investigating monitoring applications such as screening TV or radio programmes for certain audio or video clips.

A widely acknowledged problem with digitized data is to obtain features which are robust to various kinds of signal distortions in addition to being invariant w.r.t. group transformations. In our ongoing work we are trying to integrate robust feature extractors within our framework to obtain improved retrieval performance. Finally, ongoing work is concerned with continuously applying the proposed methods to new kinds of data, e.g., speech or video data, from several application areas, e.g., chemistry, geo-information science, etc.

The discussion in the previous section documents that the field of content-based multimedia retrieval is highly interdisciplinary. We hope that our work, in presenting a general framework and efficient algorithms for a wide variety of content-based retrieval problems, contributes to bringing together powerful techniques from the various areas.

Acknowledgements

The authors would like to thank Roland Engelbrecht, Rainer Manthey, Axel Mosig, Andreas Ribbrock and Tido Röder for their support within this project. The authors express their gratitude to the anonymous reviewers for various helpful comments and suggestions.

11. REFERENCES

[1] P. K. Agarwal. Geometric range searching. In Handbook of Comp. Geometry. CRC, 1997.
[2] J. Barbay and C. Kenyon. Adaptive Intersection and t-Threshold Problems. In Proc. 13th ACM-SIAM Symposium On Discrete Algorithms (SODA), San Francisco, Jan. 2002.
[3] C. Carson, M. Thomas, S. Belongie, J. M. Hellerstein, and J. Malik. Blobworld: A system for region-based image indexing and retrieval. In Third International Conference on Visual Information Systems. Springer, 1999.
[4] M. Clausen, R. Engelbrecht, D. Meyer, and J. Schmitz. PROMS: A Web-based Tool for Searching in Polyphonic Music. In Proceedings Intl. Symp. on Music Information Retrieval 2000, Plymouth, M.A., USA, 2000.
[5] M. Clausen and F. Kurth. A Unified Approach to Content-Based and Fault Tolerant Music Recognition, 2003. IEEE Transactions on Multimedia, Accepted for Publication.
[6] M. Clausen and A. Mosig. Approximately Matching Polygonal Curves under Translation, Rotation and Scaling with Respect to the Fréchet-Distance. In 19th European Workshop on Computational Geometry, 2003.
[7] E. D. Demaine, A. López-Ortiz, and J. I. Munro. Experiments on Adaptive Set Intersections for Text Retrieval Systems. In Proc. 3rd Workshop on Algorithm Engineering and Experiments (ALENEX 2001), pages 91–104, 2001.
[8] C. Faloutsos. Searching Multimedia Databases by Content. Kluwer, 1996.
[9] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28(9):23–32, 1995.
[10] J. Haitsma, T. Kalker, and J. Oostveen. Robust Audio Hashing for Content Identification. In 2nd Intl. Workshop on Content Based Multimedia and Indexing, Brescia, Italy, Sept. 2001.
[11] E. Keogh. Exact indexing of dynamic time warping. In 28th International Conference VLDB, Hong Kong, pages 406–417, 2002.
[12] K. Lemström and S. Perttu. SEMEX - An Efficient Music Retrieval Prototype. In Proceedings Intl. Symp. on Music Information Retrieval 2000, Plymouth, M.A., USA, 2000.
[13] G. Lu. Techniques and Data Structures for Efficient Multimedia Retrieval Based on Similarity. IEEE Trans. on Multimedia, 4(3):372–384, Sept. 2002.
[14] Y.-S. Moon, K.-Y. Whang, and W.-S. Han. General Match: A Subsequence Matching Method in Time-Series Databases Based on Generalized Windows. In SIGMOD Conference, pages 382–393, 2002.
[15] A. Mosig. Algorithmen und Datenstrukturen zur effizienten Konstellationssuche. Masters Thesis, Department of Computer Science III, University of Bonn, Germany, 2001.
[16] D. Papadias. Hill Climbing Algorithms for Content-Based Retrieval of Similar Configurations. In Proc. SIGIR, Greece, pages 240–247, 2000.
[17] A. Ribbrock and F. Kurth. A Full-Text Retrieval Approach to Content-Based Audio Identification. In Proc. 5. IEEE Workshop on MMSP, St. Thomas, Virgin Islands, USA, 2002.
[18] T. Röder. A Group Theoretical Approach to Content-Based Image Retrieval. Masters Thesis, Department of Computer Science III, University of Bonn, Germany, 2002.
[19] T. Röder, F. Kurth, A. Mosig, and M. Clausen. A Group Theoretical Approach to Content-Based Image Retrieval. NATO Advanced Study Institute (ASI), Lucca, Italy, 2003. Available via email to the authors.
[20] S. Santini and A. Gupta. Principles of Schema Design in Multimedia Data Bases. IEEE Trans. on Multimedia, 4(2), June 2002.
[21] S. M. Smith and J. M. Brady. SUSAN – A new approach to low level image processing. Int. Journal of Computer Vision, 40(1):45–78, 1997.
[22] A. L. Uitdenbogerd and J. Zobel. Melodic Matching Techniques for Large Music Databases. In Proc. ACM Multimedia, 1999.
[23] R. C. Veltkamp. Shape Matching: Similarity Measures and Algorithms. In Shape Modelling International, pages 188–199, 2001.
[24] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes. Van Nostrand Reinhold, 2nd edition, 1999.
[25] H. Wolfson and I. Rigoutsos. Geometric Hashing: An Overview. IEEE Computational Science and Engineering, 4(4):10–21, 1997.
[26] A. Yoshitaka and T. Ichikawa. A Survey on Content-Based Retrieval for Multimedia Databases. IEEE Transactions on Knowledge and Data Engineering, 11(1):81–93, 1999.