Spectrally Layered Color Indexing

Guoping Qiu (1, 2) and Kin-Man Lam (2)

(1) School of Computer Science, The University of Nottingham
(2) Center for Multimedia Signal Processing, The Hong Kong Polytechnic University

Abstract. Image patches of different roughness are likely to have different perceptual significance. In this paper, we introduce a method that separates an image into layers, each of which retains only the pixels in areas with similar spectral distribution characteristics. Simple color indexing is then used to index each layer individually. By indexing the layers separately, we implicitly associate the indices with perceptual and physical meanings, thus greatly enhancing the power of color indexing while keeping its simplicity and elegance. Experimental results are presented to demonstrate the effectiveness of the method.

1 Introduction

An effective, efficient, and suitable representation is the key starting point for building image processing and computer vision systems. In many ways, the success or failure of an algorithm depends greatly on an appropriately designed representation. In the computer vision community, it is common practice to classify representation schemes as low-level, intermediate-level and high-level. Low-level representations deal with pixel-level features, high-level representations deal with abstract concepts, and intermediate-level representations deal with something in between. Whilst low-level vision is fairly well studied and we have a good understanding at this level, mid- and high-level concepts are very difficult to grasp, and certainly extremely difficult to represent using computer bits.

In the signal processing community, an image can be represented in the time/spatial domain and in the frequency/spectral domain. In contrast to many vision approaches, signal processing is more deeply rooted in mathematical analysis. Both time-domain and frequency-domain analysis techniques are very well developed; see, for example, the many excellent textbooks in this area, e.g., [8]. A signal/image can be represented as a time sequence or as transform coefficients of various types: Fourier, wavelet, Gabor, KLT, etc. These coefficients often provide a convenient way of interpreting and exploiting the physical properties of the original signal. Exploiting well-established signal analysis technology to represent and interpret vision concepts could be a fertile area for making progress.

Content-based image and video indexing and retrieval have been a popular research subject in many fields related to computer science for over a decade [1]. Of all the challenging issues associated with the indexing and retrieval tasks, “retrieval relevance” [7] is probably the most difficult to achieve. The difficulties can be explained from a number of aspects. Firstly, relevance is a high-level concept and is therefore difficult to describe numerically, i.e., using computer bits. Secondly, traditional indexing approaches mostly extract low-level features in a low-level fashion, so it is difficult to represent relevance using these features. Because low-level features may bear no correlation with high-level concepts, the burden falls on high-level retrieval strategies, which again is hard. One way to improve the situation is to develop numerical representations (low-level features) that not only have clear physical meanings but can also be related to high-level perceptual concepts. Equally importantly, the representations have to be simple, easy to compute and efficient to store.

There is evidence to suggest that the human visual system consists of frequency-sensitive channels [6]. In other words, when we see the visual world, we perform some form of frequency analysis, among many other complex and not yet understood processes. Following this frequency analysis argument, when a subject is presented with an image, she will “decompose” the image into various frequency components and process each component with a different processing channel (presumably in parallel). It is convenient to view such a process as decomposing the image into different layers. Each layer is an image of the same size as the original, but only certain frequency components are retained in it, i.e., it is a band-pass filtered version of the original image. On each layer, only those grid positions where the pixels have a certain “busyness” have values; all other grid positions are empty. Note that a layered representation was also used in [9], but [9] dealt with motion and high-level concepts and is not related to what we propose in this paper.
By decomposing an image into spectrally separated layers, we apply this concept to the development of simple indexing features for content-based image indexing and retrieval (simplicity and effectiveness are important considerations in this paper). The rest of the paper is organized as follows. Section 2 presents the idea, Section 3 presents an algorithm, Section 4 presents experimental results, and Section 5 concludes the paper.

2 The Idea

We are interested in developing efficient and effective image indexing features for content-based image indexing and retrieval. Ideally, the indexing features should be chosen in such a way that simple retrieval methods, such as computing simple metric distances in the feature space, produce good results. It is also well known that simple low-level features bear little correlation with perceptual similarity if only simplistic distance measures are used. Our basic idea is to associate simple low-level features, such as color, with perceptual and physical meanings. Color is well known to be an effective cue for indexing [3], and its simplicity and effectiveness make it an attractive feature. However, it is also well known that simple use of color, i.e., the color histogram, has significant drawbacks [1]. Researchers have tried various ways of combining color with other features for indexing; see, e.g., [1]. Our idea is to group colors according to their associated physical and perceptual properties.
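To make one well-known drawback of the plain color histogram concrete, here is a minimal Python/NumPy sketch (our own illustration, not code from the paper). The function name `color_histogram` and the 4-levels-per-channel quantization are hypothetical choices. Two images with the same number of red and white pixels, one made of two flat halves and one a busy checkerboard, receive identical histograms:

```python
import numpy as np

def color_histogram(img: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """Normalized global color histogram of an H x W x 3 uint8 image.

    Each channel is quantized to `bins_per_channel` levels, giving
    bins_per_channel**3 joint color bins.
    """
    q = (img.astype(np.int64) * bins_per_channel) // 256  # per-channel level
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()

# Same red/white pixel counts, different layouts: two flat halves vs. a checkerboard.
red = np.array([255, 0, 0], dtype=np.uint8)
white = np.array([255, 255, 255], dtype=np.uint8)
flat = np.empty((8, 8, 3), dtype=np.uint8)
flat[:, :4] = red
flat[:, 4:] = white
is_red = (np.indices((8, 8)).sum(axis=0) % 2 == 0)[..., None]
checker = np.where(is_red, red, white)

d = np.abs(color_histogram(flat) - color_histogram(checker)).sum()
print(d)  # 0.0: the global histogram cannot tell the two images apart
```

Because the histogram discards where each color occurs, any distance computed on it treats a flat red region and a red texture as the same thing.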

The well-known opponent color theory [2] suggests that there are three visual pathways in the human color vision system, and that the spatial sharpness of a color image depends mainly on the sharpness of its light-dark component and very little on the structure of the opponent-color components. In terms of the perceptual significance of an image, the sharpness or roughness of an image region determines its perceptual importance. In other words, if two areas of an image contain the same color, their difference or similarity is determined by their spatial busyness. Digital signal processing researchers have developed a wealth of techniques for analyzing physical phenomena such as the sharpness/roughness of a signal/image. The most effective approach is frequency analysis; techniques ranging from FIR filters to filter banks are well studied [8]. A busy/sharp area is associated with higher-frequency components, and a flat area with lower-frequency components. A busy area may be associated with textured surfaces or object boundaries, and a flat area with backgrounds or the interiors of objects. Therefore, a red color in a flat area may signify a red background or a large red object with a flat surface, whereas a similar red color in a busy area may indicate a red textured surface or red object boundaries. Based on these observations, we propose a spectrally layered approach to image indexing, illustrated schematically in Fig. 1.

[Figure 1: Input → filter bank (Low-Pass Filter, Band-Pass Filter #1, …, Band-Pass Filter #n, High-Pass Filter) → Max selector → spectrally classified layers → indexing each layer → indexing features]

Fig. 1. Schematic of spectrally layered image indexing

Let x be the input image array and h_k the impulse response of the kth band-pass filter (the low-pass and high-pass filters included). Then the output of each filter is

$$y_k(i,j) = (x * h_k)(i,j) = \sum_{l}\sum_{m} x(i-l,\, j-m)\, h_k(l,m) \qquad (1)$$

where * denotes convolution. At each pixel position, the MAX selector identifies the filter that produces the largest output, and this selection is used to form the spectrally classified images. Let L_k be the kth layer image, corresponding to the kth filter; then

$$L_k(i,j) = \begin{cases} x(i,j), & \text{if } y_k(i,j) = \max\{y_1(i,j),\, y_2(i,j),\, \ldots,\, y_n(i,j)\} \\ \text{Empty}, & \text{otherwise} \end{cases} \qquad (2)$$
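Equations (1) and (2) can be sketched in Python/NumPy as follows. This is our own illustration under stated assumptions, not the authors' code: the helper names `conv2d_same` and `spectral_layers`, zero padding at the image border, the hypothetical two-filter bank, and NaN as the marker for “Empty” positions are all our choices. The raw filter outputs are compared exactly as Eq. (2) is written; ties are given to the lowest-numbered filter, whereas Eq. (2) read literally would put a tied pixel on several layers.

```python
import numpy as np

def conv2d_same(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Eq. (1): y(i, j) = sum_{l,m} x(i - l, j - m) h(l, m).

    h has odd size and is indexed from its centre; x is zero-padded
    (an assumption) so that y has the same size as x.
    """
    kh, kw = h.shape
    ch, cw = kh // 2, kw // 2
    H, W = x.shape
    xp = np.pad(x.astype(float), ((ch, ch), (cw, cw)))
    y = np.zeros((H, W))
    for a in range(kh):
        for b in range(kw):
            l, m = a - ch, b - cw  # kernel offsets relative to the centre
            # xp[ch - l + i, cw - m + j] == x(i - l, j - m) for every (i, j)
            y += h[a, b] * xp[ch - l : ch - l + H, cw - m : cw - m + W]
    return y

def spectral_layers(x: np.ndarray, ys: list[np.ndarray]) -> list[np.ndarray]:
    """Eq. (2): keep x(i, j) on the layer whose filter output is largest.

    x may be H x W (grey) or H x W x 3 (color); empty positions are NaN.
    """
    winner = np.argmax(np.stack(ys), axis=0)  # index k of the max y_k per pixel
    layers = []
    for k in range(len(ys)):
        layer = np.full(x.shape, np.nan)
        mask = winner == k
        layer[mask] = x[mask]
        layers.append(layer)
    return layers

# Hypothetical two-filter bank: a 3x3 box low-pass and its complement high-pass.
h_low = np.full((3, 3), 1 / 9)
h_high = -h_low
h_high[1, 1] += 1.0  # delta minus box average

x = np.zeros((5, 5))
x[2, 2] = 9.0  # an isolated bright dot on a flat background
ys = [conv2d_same(x, h) for h in (h_low, h_high)]
layers = spectral_layers(x, ys)
```

In this toy run the bright dot, where the high-pass response (8) exceeds the low-pass response (1), lands on the high-pass layer, while every other pixel stays on the low-pass layer.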

Indexing is then performed on each layer to obtain the indexing feature vector I_f for the image:

$$I_f = \{I_f(L_1),\, I_f(L_2),\, \ldots,\, I_f(L_n)\} \qquad (3)$$
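Eq. (3) leaves the form of the per-layer index I_f(L_k) open. Since the paper relies on simple color indexing, one plausible sketch is a color histogram of each layer's non-empty pixels, with the per-layer histograms concatenated. The function names, the 4-level per-channel quantization, NaN for empty positions, and normalizing by the total pixel count are our assumptions, not details given in the paper.

```python
import numpy as np

def layer_histogram(layer: np.ndarray, bins_per_channel: int = 4) -> np.ndarray:
    """I_f(L_k): color histogram of the non-empty pixels of one H x W x 3 layer.

    Empty positions are NaN (assumed). Counts are divided by the image's total
    pixel count (our choice), so the histogram also reflects how much of the
    image lies on this layer.
    """
    valid = ~np.isnan(layer).any(axis=-1)
    pixels = layer[valid].astype(np.int64)  # (count, 3) RGB values in 0..255
    q = (pixels * bins_per_channel) // 256
    idx = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
    hist = np.bincount(idx, minlength=bins_per_channel ** 3).astype(float)
    return hist / valid.size

def layered_index(layers: list[np.ndarray], bins_per_channel: int = 4) -> np.ndarray:
    """Eq. (3): I_f = {I_f(L_1), ..., I_f(L_n)}, stored as one concatenated vector."""
    return np.concatenate([layer_histogram(L, bins_per_channel) for L in layers])

# A 2x2 image split into two layers: red pixels on one, white pixels on the other.
L1 = np.full((2, 2, 3), np.nan)
L1[0] = [255.0, 0.0, 0.0]
L2 = np.full((2, 2, 3), np.nan)
L2[1] = [255.0, 255.0, 255.0]
f = layered_index([L1, L2])
print(f.shape)  # (128,): 64 color bins per layer, two layers
```

Keeping the layers as separate blocks of the feature vector means a red pixel on the first layer and a red pixel on the second layer fall into different entries of I_f.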

To summarize, an image is first passed through a filter bank, each filter of which covers a specified spectral band. The output of the filter bank is used to classify the pixels, so that pixels in areas with similar spatial frequencies are retained on the same layer. Each layer, which contains only those pixels in areas with similar frequency distributions, is used to form its own index, and the aggregation of the indices from all the layers forms the overall index of the image. In this way, we effectively classify image areas according to their frequency content and index them separately. When such a strategy is used to match two images, features from areas of similar spatial roughness are matched: flat-area features in one image are matched to features in the flat areas of the other image, and similarly, busy-area features are matched to busy-area features. When simple image features such as color are used for indexing, such a strategy should work very effectively; we introduce an implementation in the next section.
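The layer-to-layer matching described above can be sketched as follows, assuming (our choice; the paper does not fix a metric at this point) an L1 distance summed over corresponding layers. The toy two-bin histograms describe two images with the same overall color content, red and white, where red sits on the flat layer in one image and on the busy layer in the other:

```python
import numpy as np

def layered_distance(fa: list[np.ndarray], fb: list[np.ndarray]) -> float:
    """Sum of per-layer L1 distances: layer k of one image is compared only
    with layer k of the other (flat with flat, busy with busy)."""
    if len(fa) != len(fb):
        raise ValueError("both images must have the same number of layers")
    return float(sum(np.abs(a - b).sum() for a, b in zip(fa, fb)))

# Toy 2-bin color histograms [red, white] on two layers [flat, busy].
a = [np.array([0.5, 0.0]), np.array([0.0, 0.5])]  # red flat area, white busy area
b = [np.array([0.0, 0.5]), np.array([0.5, 0.0])]  # white flat area, red busy area

d_global = float(np.abs(sum(a) - sum(b)).sum())  # ignore layers: pool the histograms
d_layered = layered_distance(a, b)
print(d_global, d_layered)  # 0.0 2.0
```

Pooling the layers reproduces a single global histogram and hides the difference, whereas matching layer by layer separates a red background from a red texture.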

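[Figure 2: x → Gaussian filter G(x); the filtered image is subtracted from x and the difference is rectified (|…|) to give y; thresholding y yields binary maps b0 (y < T0), b1 (T0 ≤ y < T1), b2 (T1 ≤ y < T2), …, bn (y > Tn)]

Fig. 2. A simple layer classification method.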
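The classification pipeline of Fig. 2 can be sketched in Python/NumPy as below. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian width `sigma`, the 3-sigma kernel radius, edge-replicated borders and the threshold values are hypothetical, and the input is taken to be the Y component already extracted from the color image. The top band here uses y ≥ (last threshold), so every pixel receives exactly one band label.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Sampled, normalized 1-D Gaussian truncated at 3 sigma (our choice)."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(t ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian low-pass filter G(.) with edge-replicated borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    p = np.pad(img.astype(float), r, mode="edge")
    # Filter rows, then columns; the kernel is symmetric, so convolution needs no flip.
    p = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, p)

def classify_pixels(Y: np.ndarray, thresholds: list[float], sigma: float = 1.0) -> np.ndarray:
    """Band label per pixel: 0 if y < T0, k if T_(k-1) <= y < T_k,
    and len(thresholds) if y >= the last threshold, where y = |Y - G(Y)|."""
    y = np.abs(Y.astype(float) - gaussian_blur(Y, sigma))  # rectified high-pass
    return np.digitize(y, thresholds)  # thresholds must be increasing

# A vertical step edge in an 8x8 luminance image.
Y = np.zeros((8, 8))
Y[:, 4:] = 200.0
labels = classify_pixels(Y, thresholds=[5.0, 50.0])
b = [labels == k for k in range(3)]  # binary maps b_0, b_1, b_2 as in Fig. 2
print(labels[0])  # flat areas get band 0; pixels nearest the edge get band 2
```

Subtracting the Gaussian-smoothed image acts as a cheap high-pass filter, so a single filtering pass plus thresholding stands in for a full filter bank, which matches the efficiency motivation given for this method.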

3 An Algorithm

One of the considerations in our current work is efficiency. Although various features could be introduced to index each layer, we present a simple yet effective algorithm for implementing the scheme of the last section. The task involves two aspects: the first is how to implement the filter bank, and the second is how to index each layer. Filter banks are a well-studied area in image processing. However, we realize that spectral classification can be implemented in a variety of ways; the essence is to classify pixels in areas with similar frequency characteristics into the same layer. Here we present a simple spectral classification method that is not based on a filter bank, illustrated in Fig. 2. Note that only the Y component of the image is used in the classification process, because the sharpness of the image is mostly contained in this component. The image x is first low-pass filtered with a Gaussian kernel, and this low-passed version is subtracted from the original image. The difference is then rectified (i.e., its absolute value is taken), and multiple thresholds are applied. The binary images are obtained as in (4), and the layers are formed as in (5)

1, if Tk−1