Similarity Queries in Image Databases

Simone Santini
Department of Computer Science, University of California, San Diego, La Jolla, CA 92093-0114

Ramesh Jain
Department of Electrical and Computer Eng., University of California, San Diego, La Jolla, CA 92093-0407

Abstract

Query-by-content image databases will be based on similarity, rather than on matching, where similarity is a measure that is defined and meaningful for every pair of images in the image space. Since it is the human user who, in the end, has to be satisfied with the results of the query, it is natural to base the similarity measure on the characteristics of human similarity assessment. In the first part of this paper, we review some of these characteristics and define a similarity measure based on them. Another problem that similarity-based databases will have to face is how to combine different queries into a single complex query. We present a solution based on three operators that are the analogues of the and, or, and not operators used in traditional databases. These operators are powerful enough to express queries of unlimited complexity, yet have a very intuitive behavior, making it easy for the user to specify a query tailored to a particular need.

1 Introduction

Multimedia databases are coming, and they will force us to rethink the way we build databases. The revision process will not be limited to finding better algorithms to store and retrieve large amounts of data, but it will take place at all the levels of a database system: the very foundations of a database will have to be adapted to the new reality.

We are analyzing, and challenging, one basic concept in the organization of a database: the concept of matching. Matching is the fundamental operation in a traditional database, and it consists in comparing an item with the query, and deciding whether the item satisfies the query or not. In textual databases, matching is a binary operation: every item either matches the query, or doesn't. Using distortion-robust matching is not a satisfactory solution: it is still based on the idea of matching, and it states basically that the database image and the query ought to be equal; they just happen to be slightly different due to some accident or imperfection.

Figure 1: Magritte's Le château des Pyrénées and a possible query to retrieve it.

The idea behind this approach is that the user wants to retrieve, say, Magritte's Le château des Pyrénées. The user knows that there is a grayish sea at the bottom of the painting, a blue sky in the back, and a big rock that dominates the center of the painting. A typical sketch query to retrieve the château could be that of Fig. 1. The user knows roughly what colors and objects are in the image, but his memory is not perfect: he will make some mistakes, misplace some objects, or ask for a color distribution that is slightly different from that in the painting. However, it is known a priori that the user wants to retrieve that particular painting, and not something merely similar to it.

We believe that future image databases should abandon the matching paradigm, and rely instead on similarity searches. In similarity search, we don't postulate the existence of a target image in the database. Rather, we order the images with respect to similarity with the query, given a fixed similarity criterion. A consequence of this is that, in principle, the answer to any query is the whole database. In practice, of course, we are interested only in a tiny fraction of the database: the images most similar to the query.

We can state the main difference between match-based and similarity searches as follows: the result of a match-based search is a partition of the database into the set of images that match the query and the set of images that don't; the result of a similarity search is a permutation (in particular, a sorting with respect to the similarity criterion) of the whole database.

Since we are abandoning the idea of matching, we must find a similarity measure that behaves well, at least in principle, for any pair of images, even two that are very different from each other. This is quite a different requirement from what we ask of a matching technique, which only has to behave well in the presence of an almost-match between the database image and the query. Our requirement is especially important if we want to manage complex queries: queries in which we ask for similarity with respect to two or more criteria, or for similarity with respect to two or more images.

Complex queries are not in general managed well by matching-based techniques. Systems based on matching either allow only one type of query [4], or allow a query to select exactly one type from those available [3]. The importance of a good similarity measure in a complex query can be understood from the following example. Suppose we have two similarity criteria, say S1 and S2. We ask the database to show us all the images that are similar to a given sample according to both S1 and S2. Even if an image is not very similar to the query according to S1, we need an accurate estimate of the similarity, since the same image can be very similar to the query according to S2 and, in that case, the precise similarity according to S1 might place that image in a different position among the top-ranking images.
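The contrast drawn above between a match-based search (a partition of the database) and a similarity search (a permutation of it) can be illustrated with a minimal Python sketch. The function names, the toy "images" (single gray-level integers), and the two predicates are hypothetical and chosen only for illustration; they are not part of any system discussed here.

```python
def match_search(db, query, matches):
    """Match-based search: split the database into (hits, misses)."""
    hits = [x for x in db if matches(x, query)]
    misses = [x for x in db if not matches(x, query)]
    return hits, misses


def similarity_search(db, query, sim, k=None):
    """Similarity search: return the whole database sorted by decreasing
    similarity to the query; k optionally keeps only the top k items."""
    ranked = sorted(db, key=lambda x: sim(x, query), reverse=True)
    return ranked if k is None else ranked[:k]


# Toy data: each "image" is reduced to one integer gray level (assumption).
db = [10, 50, 52, 95]
query = 50


def exact(x, q):
    return x == q


def closeness(x, q):
    # Hypothetical similarity: larger when the gray levels are closer.
    return -abs(x - q)


hits, misses = match_search(db, query, exact)
ranking = similarity_search(db, query, closeness)
top2 = similarity_search(db, query, closeness, k=2)
print(hits, misses)
print(ranking)
print(top2)
```

The match-based call says nothing about how the three misses relate to the query, while the ranking orders every item, so truncating it to the top few results is a presentation choice rather than part of the semantics.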
In searching for a reasonable similarity measure, the most obvious place to look is human similarity assessment. After all, when a user makes a sketch, or selects a prototype image, searching for something similar, he has in mind his own concept of similarity. The similarity used by the database should be as... similar as possible to human similarity, if the results of the search are to be satisfactory. A good place to look for the characteristics of human similarity is the psychological literature.

2 Similarity Theories

Similarity perception is a complicated activity. It results from the cooperation of a number of different mechanisms placed at different levels in the visual system. Because of this multiplicity, it is difficult to give similarity perception a unique characterization. Different experiments stimulate different mechanisms that operate at different levels, and result in different characteristics. In this section, we briefly review the most influential models proposed in the past decades, trying to identify important properties that we should replicate in our database measures.

Early models [7] hypothesized that similarity assessment was based on the measurement of a suitable distance in a psychological space. Stimuli were translated into points of the perceptual space, and the similarity between a stimulus $S_A$ and a stimulus $S_B$ was a function of $S_A - S_B$. This point of view implies that the distance function on which similarity is based must satisfy the metric axioms.

2.1 The metric axioms

Suppose $S_A$ and $S_B$ are two stimuli, represented as vectors in some space of suitable dimension, and let the similarity between the two be measured via a psychological distance function $d(S_A, S_B)$ [6]. In general, the assumption is made that the psychological distance function $d$ gives rise to the perceived dissimilarity $d$, which is different from the judged dissimilarity $\delta$ [1], and that the two are related by a monotonically nondecreasing function $g$:

\[ \delta(S_A, S_B) = g[d(S_A, S_B)] \tag{1} \]

If $d$ is a metric function, it has the following properties: constancy of self-similarity ($d(S, S) = 0 \;\; \forall S$), minimality ($d(S, S) \le d(S, R)$), symmetry ($d(S, R) = d(R, S)$), and the triangle inequality ($d(S_A, S_C) \le d(S_A, S_B) + d(S_B, S_C)$). All the distance axioms have been tested experimentally, and each has been shown not to hold in at least some of the experiments.
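The four axioms listed above can be checked mechanically on a finite set of stimuli. The following Python sketch is an illustration under stated assumptions: the function name `check_metric_axioms`, the tolerance, and the sample points are hypothetical, and a passing check on a finite sample does not prove that a function is a metric in general.

```python
from itertools import product


def check_metric_axioms(d, stimuli, tol=1e-9):
    """Test the four metric axioms for distance d on a finite set of stimuli.

    Returns a dict mapping each axiom name to True (holds on this sample)
    or False (violated by at least one pair or triple)."""
    pairs = list(product(stimuli, repeat=2))
    triples = product(stimuli, repeat=3)
    return {
        "self_similarity": all(abs(d(s, s)) <= tol for s in stimuli),
        "minimality": all(d(s, s) <= d(s, r) + tol for s, r in pairs),
        "symmetry": all(abs(d(s, r) - d(r, s)) <= tol for s, r in pairs),
        "triangle": all(
            d(a, c) <= d(a, b) + d(b, c) + tol for a, b, c in triples
        ),
    }


def city_block(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def squared_euclidean(x, y):
    # Not a metric: squaring breaks the triangle inequality.
    return sum((a - b) ** 2 for a, b in zip(x, y))


points = [(0, 0), (1, 0), (2, 0)]  # hypothetical stimuli
city_report = check_metric_axioms(city_block, points)
squared_report = check_metric_axioms(squared_euclidean, points)
print(city_report)
print(squared_report)
```

On these three collinear points the squared Euclidean distance gives 4 between the end points but only 1 + 1 through the middle one, so the triangle inequality fails while the other three axioms still hold.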

2.1.1 Thurstone-Shepard Similarity Models

An important class of similarity models, strongly connected to the metric approach, is the Thurstone-Shepard class, whose ideas go back to [7], and which has recently been revived in [2]. In this class of models, stimuli are represented as normally distributed vectors in a high-dimensional space. A momentary psychological value $x = (x_1, \ldots, x_n)$ is seen as one particular instance of the stimulus $x$, having distribution $h(x)$, with mean $\mu_x$ and variance $\Sigma_x$.

The momentary distance between $x$ and $y$ is defined to be a Minkowski distance:

\[ d = \left[ \sum_{k=1}^{n} |x_k - y_k|^{\gamma} \right]^{1/\gamma} \tag{2} \]

The similarity between two stimuli is a function $g$ of the distance, usually assumed to be of the form:

\[ g(d) = \exp(-d^{\alpha}) \tag{3} \]

where $\alpha$ is a positive parameter. The most common models derived from this family are the Euclidean/Gaussian model (for $\alpha = \gamma = 2$) and the City-block/Exponential model (for $\alpha = \gamma = 1$).

In more recent years, some models have been developed that abandon the strict distance model, in an attempt to account for the experimentally verified violation of the distance axioms.
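The Minkowski distance and the exponential similarity function described above can be sketched in a few lines of Python. This is a simplified illustration: it evaluates the similarity of two single momentary values and does not average over the stimulus distributions, and the function names and the exponent names `gamma` (Minkowski exponent) and `alpha` (decay exponent) are naming assumptions.

```python
import math


def minkowski(x, y, gamma):
    """Minkowski distance of order gamma between two equal-length vectors."""
    return sum(abs(a - b) ** gamma for a, b in zip(x, y)) ** (1.0 / gamma)


def shepard_similarity(x, y, gamma, alpha):
    """Similarity g(d) = exp(-d**alpha), with d the Minkowski distance."""
    return math.exp(-minkowski(x, y, gamma) ** alpha)


x, y = (0, 0), (3, 4)  # hypothetical momentary psychological values

euclid = minkowski(x, y, gamma=2)      # Euclidean distance
city = minkowski(x, y, gamma=1)        # city-block distance
gaussian = shepard_similarity(x, y, gamma=2, alpha=2)
exponential = shepard_similarity(x, y, gamma=1, alpha=1)
print(euclid, city, gaussian, exponential)
```

Both similarity variants equal 1 for identical inputs and decrease monotonically with distance, which is why a measure of this form inherits the symmetry of the underlying metric.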

2.1.2 Set-Theoretic Similarity

In a 1977 paper [8], Amos Tversky proposed his famous feature contrast model. Instead of considering stimuli as points in a metric space, Tversky characterized them as sets of features. Let $a$, $b$ be two stimuli, and $A$, $B$ the respective sets of features. Also, let $s(a, b)$ be a measure of the similarity between $a$ and $b$. The main result of Tversky's paper is the following representation theorem¹:

Theorem 2.1 Let $s$ be a similarity function. Then there are a similarity function $S$, a non-negative function $f$, and constants $\alpha, \beta \ge 0$ such that, for all $a, b, c, d$:
• $S(a, b) \ge S(c, d) \iff s(a, b) \ge s(c, d)$
• $S(a, b) = f(A \cap B) - \alpha f(A - B) - \beta f(B - A)$

This result implies that any similarity ordering that satisfies matching, monotonicity and independence can be obtained using a linear combination (contrast) of a function of the common features ($A \cap B$) and of the distinctive features ($A - B$ and $B - A$). This representation is called the contrast model.

This model can account for violation of all the geometric distance axioms. In particular, $S(a, b)$ is asymmetric if $\alpha \ne \beta$. If $S(a, b)$ is the answer to the question "how is a similar to b?" then, when making the comparison, subjects naturally focus more on the features of $a$ (the subject) than on those of $b$ (the referent). This corresponds to the use of Tversky's measure with $\alpha > \beta$. Since $S(a, b) - S(b, a) = (\alpha - \beta)[f(B - A) - f(A - B)]$, in this case the model predicts, for an additive $f$,

\[ S(a, b) > S(b, a) \quad \text{whenever} \quad f(B) > f(A) \tag{4} \]

This implies that the direction of the asymmetry is determined by the relative "salience" of the stimuli: if $b$ is more salient than $a$, then $a$ is more similar to $b$ than vice versa. In other words, the variant is more similar to the prototype than the prototype to the variant, a phenomenon that Tversky confirmed experimentally.

¹ The representation theorem holds under some hypotheses that Tversky called matching, monotonicity, and independence. We skipped these technical details for the sake of brevity. The reader should refer to [8, 5] for details.
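The asymmetry between variant and prototype can be reproduced with a small Python sketch of the contrast model on ordinary (crisp) feature sets. The salience function is taken to be the set cardinality, and the feature names and the weights 1.0 and 0.5 are hypothetical values chosen only to make the effect visible.

```python
def contrast_similarity(A, B, alpha, beta, f=len):
    """Contrast model: f(A & B) - alpha * f(A - B) - beta * f(B - A).

    A and B are feature sets; f is the salience function
    (set cardinality by default, which is an assumption of this sketch)."""
    A, B = set(A), set(B)
    return f(A & B) - alpha * f(A - B) - beta * f(B - A)


# Hypothetical stimuli: the prototype has every feature of the variant
# plus two more, so it is the more salient of the two.
variant = {"round", "red"}
prototype = {"round", "red", "shiny", "large"}

# alpha > beta: the features of the subject weigh more than the referent's.
variant_to_prototype = contrast_similarity(variant, prototype, 1.0, 0.5)
prototype_to_variant = contrast_similarity(prototype, variant, 1.0, 0.5)
print(variant_to_prototype, prototype_to_variant)
```

With equal weights the two directions give the same value, so the asymmetry comes entirely from weighting the subject's distinctive features more heavily than the referent's.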

3 Fuzzy Set-theoretic Measures

Tversky's experiments showed that the feature contrast model has a number of desirable properties; most notably, it explains violations of symmetry and of the corner equality. One serious problem for the adoption of the feature contrast model in image understanding applications is its characterization of features. In Tversky's theory, each stimulus is characterized by the presence or absence of features. This convention forces Tversky to adopt complex mechanisms for the representation of numerical quantities, which don't fit nicely into his framework. In the following subsection we introduce the use of fuzzy predicates in the feature contrast model. The use of fuzzy logic will allow us to extend Tversky's results to situations in which modeling by enumeration of features is impossible or problematic.

It has been noted [1] that not all the stimuli influence similarity perception according to the same mechanism. For some of them (color is an example) a Euclidean distance in an appropriate color space is adequate and can explain the results of the experiments. For other features, more complicated models are needed, as we have seen. Tversky's feature contrast model applies to a particular type of features: those that can be expressed as predicates over the stimuli domain. In this section we will consider only this type of features.

3.1 Fuzzy features contrast model

Consider a typical task in computer vision: assessing the similarity between faces. A face is characterized by a number of features of different types but, for the following discussion, we will only consider geometrical features, since these lead naturally to predicate features. It seems intuitive that face similarity is influenced by things like the size of the mouth, the shape of the chin, and so on. Also, intuitively, two faces with big mouths will be, all other things being equal, more similar than two faces of which one has a big mouth and the other a small mouth.

A predicate like "the mouth of this person is wide" can be modeled as a fuzzy predicate whose truth, to a first approximation, is based on the measurement we make of the width of the mouth. In general, we have an image $I$ and a number of measurements $\phi_i$ on the image. We want to use these measurements to assess the truth of $n$ fuzzy predicates, the truth of a predicate $P$ on a measurement $x_i$ being given by a membership value $\mu(x_i)$. From the measurements $\phi_i$ we derive the truth values of a number $p$ of fuzzy predicates, and collect them into a vector:

\[ \mu(\phi) = \{\mu_1(\phi), \ldots, \mu_p(\phi)\} \tag{5} \]

We call $\mu(\phi)$ the (fuzzy) set of true predicates on the measurements $\phi$. The set is fuzzy in that a predicate $P_j$ belongs to $\mu(\phi)$ to the extent $\mu_j(\phi)$. We use this fuzzy set as a basis to apply Tversky's theory. In order to apply the feature contrast model we need to compute the fuzzy set $\mu(\phi) \cap \mu(\psi)$, compute the fuzzy set $\mu(\phi) - \mu(\psi)$, and choose a suitable salience function $f$. The salience of the fuzzy set $\mu = \{\mu_1, \ldots, \mu_p\}$ is assumed to be its cardinality:

\[ f(\mu) = \sum_{i=1}^{p} \mu_i \tag{6} \]

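The intersection of the sets $\mu(\phi)$ and $\mu(\psi)$ is defined in the usual way:

\[ \mu_{\cap}(\phi, \psi) = \{\min\{\mu_i(\phi), \mu_i(\psi)\}\}_{i=1}^{p} \tag{7} \]

while the difference of two sets is defined as

\[ \mu_{-}(\phi, \psi) = \{\max\{\mu_i(\phi) - \mu_i(\psi), 0\}\}_{i=1}^{p} \tag{8} \]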

With these definitions, we can write Tversky's similarity function between two fuzzy sets $\mu(\phi)$ and $\mu(\psi)$, corresponding to measurements made on two images, as:

\[ S(\phi, \psi) = \sum_{i=1}^{p} \min\{\mu_i(\phi), \mu_i(\psi)\} - \alpha \sum_{i=1}^{p} \max\{\mu_i(\phi) - \mu_i(\psi), 0\} - \beta \sum_{i=1}^{p} \max\{\mu_i(\psi) - \mu_i(\phi), 0\} \tag{9} \]

We refer to the model defined by eq. (9) as the Fuzzy Features Contrast (FFC) model.
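The Fuzzy Features Contrast similarity can be computed directly from two vectors of predicate truth values. The following Python sketch assumes that each image has already been reduced to a list of membership values in [0, 1]; the function name `ffc_similarity` and the sample vectors and weights are hypothetical.

```python
def ffc_similarity(mu_phi, mu_psi, alpha, beta):
    """Fuzzy Features Contrast similarity between two predicate vectors.

    mu_phi, mu_psi: truth values in [0, 1] of the same p fuzzy predicates,
    evaluated on two images. alpha and beta weight the distinctive
    features of the first and of the second image respectively."""
    if len(mu_phi) != len(mu_psi):
        raise ValueError("both vectors must cover the same predicates")
    pairs = list(zip(mu_phi, mu_psi))
    common = sum(min(a, b) for a, b in pairs)           # fuzzy intersection
    only_first = sum(max(a - b, 0.0) for a, b in pairs)  # first minus second
    only_second = sum(max(b - a, 0.0) for a, b in pairs)  # second minus first
    return common - alpha * only_first - beta * only_second


# Hypothetical truth values of three predicates on two images.
mu_phi = [1.0, 0.5, 0.0]
mu_psi = [0.5, 0.5, 1.0]

forward = ffc_similarity(mu_phi, mu_psi, alpha=0.5, beta=0.25)
backward = ffc_similarity(mu_psi, mu_phi, alpha=0.5, beta=0.25)
print(forward, backward)
```

With both weights set to zero the measure reduces to the cardinality of the fuzzy intersection and becomes symmetric; the self-similarity of an image equals the cardinality of its own predicate set, so images for which more predicates are true have a higher self-similarity.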

Figure 2: A silhouette used for the first experiment in similarity. Each silhouette is characterized by 5 parameters: the height/width ratio of the central rectangle and the number of extrusions on each side. (Labels in the figure: $n_u = 3$, $n_l = 6$, $n_r = 3$, $n_d = 6$, $h = v\rho$.)

4 Similarity Experiments

In this section, we present an experimental comparison between some of the similarity measures introduced so far. We will consider Attneave city-block distance, a few Thurstone-Shepard models, and the Fuzzy feature contrast model. All the experiments reported in the following are done with rather small ensembles of stimuli. Because of this, it was impossible to test measures like Krumhansl's or the decision theoretic models that depend on a statistical characterization of the stimuli.

4.1 Similarity of Silhouettes

The first experiment uses simple silhouettes like that in Fig. 2. Each silhouette is characterized by five parameters: the ratio ($\rho$) between the width and the height of the central rectangle, and the number of extrusions on each side of the central rectangle ($n_l$, $n_r$, $n_u$, $n_d$). From these quantities, we derive the support for the following five propositions, which are used for Tversky similarity:
• "The vertical sides are complex," expressed as $\mu_B(n_l + n_r)$.
• "The horizontal sides are complex," expressed as $\mu_B(n_u + n_d)$.
• "The figure is complex," expressed as $\mu_B(n_l + n_r + n_u + n_d)$.
• "The figure is slender," expressed as $\mu_S(\rho)$.
• "The figure is equilibrated," expressed as $\mu_R\left(\frac{n_u + n_r}{n_l + n_d}\right)$.

Fig. 3 shows the result of a similarity experiment for several similarity measures. The figure in the upper left corner is the stimulus that has been presented.
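To make the step from silhouette parameters to predicate truth values concrete, here is a minimal Python sketch. The text does not specify the membership functions, so everything about them is an assumption: the piecewise-linear `ramp`, the thresholds 12, 1.0 and 3.0, the triangular shape used for "equilibrated", and the form of the ratio passed to it. The extrusion counts are the ones labeled in Fig. 2; the value of `rho` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Silhouette:
    rho: float  # ratio of the central rectangle
    n_l: int    # extrusions on the left side
    n_r: int    # extrusions on the right side
    n_u: int    # extrusions on the upper side
    n_d: int    # extrusions on the lower side


def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 below lo, 1 above hi (assumed shape)."""
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def mu_big(n):
    return ramp(n, 0, 12)        # hypothetical thresholds


def mu_slender(rho):
    return ramp(rho, 1.0, 3.0)   # hypothetical thresholds


def mu_balanced(ratio):
    # Assumed triangular membership, fully true when the ratio is 1.
    return max(0.0, 1.0 - abs(ratio - 1.0))


def predicate_vector(s):
    """Truth values of the five propositions for silhouette s."""
    numer, denom = s.n_u + s.n_r, s.n_l + s.n_d  # assumed form of the ratio
    if denom == 0:
        balanced = 1.0 if numer == 0 else 0.0
    else:
        balanced = mu_balanced(numer / denom)
    return [
        mu_big(s.n_l + s.n_r),                   # vertical sides are complex
        mu_big(s.n_u + s.n_d),                   # horizontal sides are complex
        mu_big(s.n_l + s.n_r + s.n_u + s.n_d),   # the figure is complex
        mu_slender(s.rho),                       # the figure is slender
        balanced,                                # the figure is equilibrated
    ]


example = Silhouette(rho=2.0, n_l=6, n_r=3, n_u=3, n_d=6)
vector = predicate_vector(example)
print(vector)
```

The resulting vector is exactly the kind of input that a fuzzy contrast measure consumes: one truth value per proposition, all in the unit interval.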

Figure 3: Each row contains the eight figures (out of the 30 that compose the database) ranked as most similar to the stimulus by every measure, excluding the stimulus itself. Rows: Euclid; Shepard ($\gamma = 1$, $\alpha = 1$); Fuzzy FC 1 ($\alpha = 0$, $\beta = 0$); Fuzzy FC 2 ($\alpha > 0$, $\beta > 0$).

5 Complex Queries

When we envision a database founded on similarity assessment, rather than on matching, we immediately face the problem of how to handle complex queries. For the purpose of the following discussion, we define as simple a query whose semantics is:

Given the similarity measure $s$ and the image $I$, order the images $I_1, \ldots, I_n$ with respect to the image $I$.

More formally, a simple query can be represented by the pair $(s, I)$ and, if $\mathcal{I}^n$ is the space of the $n$-tuples of images, the semantics of a simple query is the function

\[ F_{(s,I)} : \mathcal{I}^n \to \mathcal{I}^n : (I_1, \ldots, I_n) \mapsto (I_{p_1}, \ldots, I_{p_n}) \tag{10} \]

such that

\[ s(I_{p_{i-1}}, I) \ge s(I_{p_i}, I), \quad i = 2, \ldots, n \tag{11} \]

A query is complex if it doesn't have this semantics. Complex queries include operations like ordering with respect to two or more similarity measures ("show me the images with this dominant color and this texture"), or with respect to two or more images ("what is there similar to either of these?"). There are a number of plausible ways to define the semantics of a complex query. Unfortunately, in this case experiments don't help us much: all the relevant psychological literature is limited to experiments with simple queries, at least to the best of the authors' knowledge.

We have defined similarity based on a fuzzy logic construct. It seems natural to continue on the same path, and use fuzzy logic to define complex queries as well. We will therefore define complex queries based on the definition of and ($\land$), or ($\lor$), and not ($\lnot$) operators. The and operator takes two arguments, each one being a pair $(s_i, I_i)$, and returns a similarity function $s_\land$, defined as:

\[ s_\land(J) = \min(s_1(J, I_1), s_2(J, I_2)) \tag{12} \]

It is convenient to define a dummy image $I_\land$, so that we can write the result of the and operator as a pair $(s_\land, I_\land)$. The semantics of the and operator depends on the semantics of its two arguments, and is again a function

\[ F_\land : \mathcal{I}^n \to \mathcal{I}^n : (J_1, \ldots, J_n) \mapsto (J_{p_1}, \ldots, J_{p_n}) \tag{13} \]

such that:

\[ \min(s_1(J_{p_{i-1}}, I_1), s_2(J_{p_{i-1}}, I_2)) \ge \min(s_1(J_{p_i}, I_1), s_2(J_{p_i}, I_2)), \quad i = 2, \ldots, n \tag{14} \]

The or and not operators are defined similarly, based on the functions:

\[ s_\lor(J) = \max(s_1(J, I_1), s_2(J, I_2)) \tag{15} \]

and

\[ s_\lnot(J) = s(X, X) - s(J, I) \tag{16} \]

where $s(X, X)$ is the self-similarity of the universe of discourse, which is the maximum value the similarity functions can assume. The property that the semantics of the composition of two operators is formally equal to the semantics of a single operator allows us to nest complex queries and to express queries of arbitrary complexity.
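The and, or, and not operators defined above compose naturally as higher-order functions, which is one way to see why they can be nested. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, images are reduced to single numbers in [0, 1], and the toy similarity 1 - |J - I| (with maximum value 1) stands in for whatever measure a real system would use.

```python
def simple(s, I):
    """Simple query (s, I): maps a database image J to s(J, I)."""
    return lambda J: s(J, I)


def q_and(q1, q2):
    return lambda J: min(q1(J), q2(J))


def q_or(q1, q2):
    return lambda J: max(q1(J), q2(J))


def q_not(q, s_max):
    """s_max plays the role of the maximum value the similarity can assume."""
    return lambda J: s_max - q(J)


def run_query(q, images):
    """Semantics of any query: the database sorted by decreasing score."""
    return sorted(images, key=q, reverse=True)


def toy_similarity(J, I):
    # Hypothetical measure on images reduced to a number in [0, 1].
    return 1.0 - abs(J - I)


images = [0.0, 0.25, 0.5, 0.75, 1.0]
A = simple(toy_similarity, 0.0)
B = simple(toy_similarity, 1.0)

and_ranking = run_query(q_and(A, B), images)
or_ranking = run_query(q_or(A, B), images)
# Every operator returns a query, so operators can be nested freely.
and_not_ranking = run_query(q_and(A, q_not(B, 1.0)), images)
print(and_ranking)
print(or_ranking)
print(and_not_ranking)
```

In this toy setting the and query ranks the intermediate image first, as the best compromise between two very different references, while the or query ranks it last and puts the images closest to either reference on top.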

6 Experiments with complex queries

We used the same silhouette drawings discussed in the previous section to experiment with the behavior of complex similarity queries. In the experiment we present here, we have two stimuli, $A$ and $B$, and we draw the similarity rating resulting from the following four queries: $A$, $A \lor B$, $A \land B$, and $A \land \lnot B$. The similarity measure used for each simple query is the Fuzzy Feature Contrast. The results are presented in Fig. 4.

Figure 4: The stimuli A and B, and the silhouettes at ranks 1 to 6 for each of the queries.

Note that the or query contains either very simple or very complex shapes, while the and query contains shapes of medium complexity: the best compromise between the two contrasting requirements of being similar to two very different things at the same time. Also, in the or query, the complex figures dominate: they occupy the first two positions, while a silhouette identical to stimulus A appears only in third position. This is due to the fact that, for the complex figures, more predicates are true than for simple figures and, therefore, the similarity with a complex figure has, in general, a higher value than the similarity with a simple figure. It would be interesting to see whether this kind of behavior can be found in human observers too. That would give indications not only on the validity of this model, but also on the kind of predicates that our perceptual system uses to assess similarity.

7 Conclusions

We believe that a useful point of view for multimedia repositories will be to base the query process on similarity assessment, rather than on matching. This will require changes in the organization and implementation of databases, not only to cope with the sheer amount of data, which is far greater for multimedia than for textual data, but also to deal with the new requirements posed by similarity queries. One problem that similarity queries will pose is how to express requirements more complex than a simple "show me all the images similar to this," or "show me all the images with this texture." New tools for combining queries in a meaningful way are needed. In this paper we have discussed the introduction of and, or, and not operators in similarity queries. These operators can be nested, allowing the expression of very complex queries. In addition, the operators have a very intuitive behavior, making it easy for the user to express the "right" query for the problem at hand.

References

[1] F. Gregory Ashby and Nancy A. Perrin. Toward a unified theory of similarity and recognition. Psychological Review, 95(1):124-150, 1988.
[2] Daniel M. Ennis and Norman L. Johnson. Thurstone-Shepard similarity models as special cases of moment generating functions. Journal of Mathematical Psychology, 37:104-110, 1993.
[3] Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker. Query by image and video content: the QBIC system. IEEE Computer, 1995.
[4] Charles E. Jacobs, Adam Finkelstein, and David H. Salesin. Fast multiresolution image querying. In Proceedings of SIGGRAPH 95, Los Angeles, CA. ACM SIGGRAPH, New York, 1995.
[5] Simone Santini and Ramesh Jain. Similarity matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995. (submitted).
[6] Roger N. Shepard. The analysis of proximities: Multidimensional scaling with unknown distance function. Part I. Psychometrika, 27:125-140, 1962.
[7] L. L. Thurstone. A law of comparative judgement. Psychological Review, 34:273-286, 1927.
[8] Amos Tversky. Features of similarity. Psychological Review, 84(4):327-352, July 1977.