Image Indexing using a Coloured Pattern Appearance Model



G. Qiu
School of Computer Science & IT, The University of Nottingham
Jubilee Campus, Nottingham NG8 1BB, United Kingdom
[email protected]

Abstract We introduce a new method for colour image indexing and content-based image retrieval. An image is divided into small sub-images, the visual appearance of which is characterised by a coloured pattern appearance model. The statistics of the local visual appearance of the image are then computed as measures of the global visual appearance of the image. The visual appearance of the small sub-images is modelled by their spatial pattern, colour direction and local energy strength. To encode the local visual appearance, an approach based on vector quantisation (VQ) is introduced. The distributions of the VQ code indices are then used to index and retrieve the images. The new method can not only be used to achieve effective image indexing and retrieval; it can also be used for image compression. Based on this method, indexing and retrieval can be easily and conveniently performed in the compressed domain without performing a decoding operation.

Keywords: colour vision, colour appearance model, image database, image indexing, content-based retrieval, vector quantization, image coding

1. Introduction

Image indexing and content-based image retrieval (CBIR) have been extensively researched in the computer vision community for the past decade, and many methods and techniques have been developed and published. A general approach to CBIR is to extract low-level visual features, such as colour [1], texture [2], and shape [3], and store them as meta-data alongside the imagery data itself. Content-based retrieval is achieved by comparing the visual features extracted from the query example image with those stored in the database, using some form of similarity measure. Early approaches used colour alone for indexing [1]. This approach has been very successful and is used extensively in many of today's research and commercial systems. Even though the concept of colour histogram matching is simple, the computation can be very expensive, and researchers have been trying to address this issue [4]. In addition to the computational problem, it is well known that the biggest drawback of colour-histogram-based methods is that the histogram is a global measure and contains no spatial information; again, researchers have recognised this issue and developed schemes to tackle it, e.g. [5]. Methods using Gabor filter banks to create texture features for CBIR have been reported as quite successful in certain areas of application [2]. In [2] only grey-scale textures were studied, while in [6] multi-scale Gabor filter banks were used for colour texture recognition. The authors of [6] also applied the double-opponent colour vision theory [7] and used Gabor filter outputs to create Gabor opponent features to enhance performance. In general, a combination of various visual features (colour, texture, shape, etc.) is used to achieve better performance. For the current state of the art in this area, there are several review papers, e.g. [8], and many annual conferences dedicated to the topic, e.g. [9], [10].
In this paper, we develop a new approach to CBIR for colour image databases. We were motivated by a study in the fields of psychology and human colour vision, specifically by the pattern colour separable (PCS) model of human colour vision [11]. According to [11], the visual appearance of a colour pattern to a

Part of this work was performed when the author was with the School of Computing, University of Leeds, United Kingdom. To view the images that appear in this paper in colour, a PDF version is available online at http://www.cs.nott.ac.uk/~qiu/Online/Publications.html

human observer is determined by three factors: 1) the spatial frequency, 2) the colour angle and 3) the strength of the stimulus. In other words, within the human visual system (HVS) there are three visual pathways: one sensitive to spatial frequency, one sensitive to colour and one sensitive to the strength of the visual stimulus. Following this PCS theory, we set out to process the three components of an image, pattern, colour and strength, separately. A coloured pattern appearance model (CPAM) is constructed to encode the signals of the visual pathways. We found that the codes used to encode the three channels can be used successfully for image indexing. The organisation of the paper is as follows. Section 2 briefly reviews the pattern colour separable theory. Section 3 introduces a CPAM model for small-size patterns which uses vector quantization to encode the pattern and colour signals. Section 4 explains how the codes for the pattern and colour visual pathways of the CPAM can be used for content-based image indexing and retrieval. Section 5 presents experimental results and Section 6 gives some concluding remarks.

2. Colour Vision Theories and Colour Spaces

There is evidence to suggest that different visual pathways process colour and pattern in the human visual system. In [11], experiments were carried out using square wave patterns with a range of different spatial frequencies, colours and stimulus strengths, to measure how colour appearance depends on spatial pattern. The results suggest that the value of one neural image is the product of three terms. One term defines the pathway's sensitivity to the square wave's colour direction; a second term defines the pathway's sensitivity to the spatial pattern; and the third term defines the pathway's sensitivity to the square wave's stimulus strength. This is the pattern-colour-separable (PCS) model of human colour vision. There is also physiological evidence to suggest the existence of opponent colour signals in the visual pathway [12]. The opponent colour theory suggests that there are three visual pathways in the human colour vision system. One pathway is sensitive mainly to light-dark variations; this pathway has the best spatial resolution. The other two pathways are sensitive to red-green and blue-yellow variation, with the blue-yellow pathway having the worst spatial resolution. In opponent-colour representations, the spatial sharpness of a colour image depends mainly on the sharpness of the light-dark component of the image and very little on the structure of the opponent-colour image components. The pattern-colour-separable model and the opponent colour theory are consistent with one another: in [11], the spatial and spectral tuning characteristics of the pattern, colour and strength pathways were estimated, and one broadband and two opponent colour pathways were inferred. The property of the HVS that different visual pathways have different spatial acuity is well known and is exploited in colour image processing in the form of colour models (spaces).
The earliest exploitation of this was perhaps the use of the YIQ signal in terrestrial TV broadcasting [13], where the Y component captures the light-dark variation of the TV signal and is transmitted at full bandwidth, whilst the I and Q channels capture the chromatic components of the signal and are transmitted using half the bandwidth. Similar colour models, such as YCbCr [15], Lab [14] and many more [16], were developed in different contexts and applications. Placing colour models such as YIQ and YCbCr in the pattern colour separable framework, they can be roughly interpreted as follows: the spatial patterns are mostly contained in the Y channel and the colours in the I and Q (or Cb and Cr) channels, whilst the strength is the overall energy of all three channels, although Y contains the vast majority of it. Because colours and patterns are separable, coding Y independently from I and Q (or Cb and Cr), plus the strength, should completely capture the visual appearance of an image. Given that these independent codes contain the visual appearance information, they can be used to index and retrieve (recognise) images of similar visual appearance, and are thus ideal features for image database indexing. In the rest of the paper, we show that this is indeed the case.

3. Coloured Pattern Appearance Model

We would like to translate the pattern-colour-separable model [11] into a computational system. Colour signals captured using a camera or other input devices normally appear in the form of RGB signals. It is first necessary to convert RGB to an opponent colour space. We use the YCbCr colour model¹ in this paper. The relation between YCbCr and the better-known RGB space is as follows:

    [ Y  ]   [  0.299   0.587   0.114 ] [ R ]
    [ Cb ] = [ -0.169  -0.331   0.500 ] [ G ]
    [ Cr ]   [  0.500  -0.419  -0.081 ] [ B ]

The Y component contains the luminance information; Cb and Cr contain (mostly) chromatic information as well as some luminance information. Because pattern and colour are separable, and the Y component has the highest bandwidth, the spatial patterns are mostly contained in Y, while Cb and Cr together can be roughly interpreted as colour. The stimulus strength of a small area of the image can be approximated by the mean value of the area in the Y channel only. The three visual pathways, pattern, colour and strength, for a small block of image are modelled in the coloured pattern appearance model (CPAM) shown in Fig. 1.
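The matrix above can be applied directly to RGB data. The following is a minimal sketch of the conversion using the paper's coefficients; the function name and array layout are illustrative choices, not part of the paper.

```python
import numpy as np

# RGB -> YCbCr matrix with the coefficients given in the paper.
M = np.array([
    [ 0.299,  0.587,  0.114],
    [-0.169, -0.331,  0.500],
    [ 0.500, -0.419, -0.081],
])

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W, 3) YCbCr array."""
    return rgb @ M.T

# Sanity check: an achromatic (grey) pixel carries no chromatic
# signal, so Cb and Cr should both be zero.
grey = np.array([[[0.5, 0.5, 0.5]]])
ycbcr = rgb_to_ycbcr(grey)
```

Note that the Cb and Cr rows each sum to zero, which is exactly why a grey input produces zero chroma.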

[Fig. 1: block diagram. The Y channel is averaged to give the strength S and divided by S to give the pattern P; the Cb and Cr channels are sub-sampled by 2 (↓2) and divided by S to give the colour C.]

Fig.1 Coloured Pattern Appearance Model (CPAM). The visual appearance of a small image block is modelled by three components: the stimulus strength (S), the spatial pattern (P) and the colour (C).

For a small image area, the stimulus strength S is approximated by the local mean of the Y component. The pixels in Y, normalised by S, form the spatial pattern. Because Cb and Cr have lower bandwidth, they are sub-sampled by a factor of 2 in both dimensions. The sub-sampled pixels of Cb and Cr are normalised by S to form the colour (C) component of the appearance model. Normalising the pattern and colour channels by the strength serves two purposes. Firstly, from a coding point of view, removing the DC component makes the code more efficient [17]. Secondly, from an image indexing point of view, it removes to a certain extent the effects of lighting conditions, making the visual appearance model somewhat "colour constant" [20], which should improve indexing and retrieval performance, especially when retrieving similar surfaces imaged under different conditions.

In order to use the model for image indexing, the S, P and C signals of the model have to be coded properly. Because the code should capture the visual appearance of the image and simultaneously be convenient to use as features for indexing in an image database, we design our encoder based on vector quantization [17]. Vector quantization (VQ) is a mature method of lossy signal compression/coding in which statistical techniques are used to optimise distortion/rate tradeoffs. A vector quantizer is described by an encoder Q, which maps the k-dimensional input vector X to an index i ∈ I specifying which one of a small collection of reproduction vectors (codewords) in a codebook C = {Ci; i ∈ I} is used for reconstruction, and a decoder, Q^-1, which maps the indices back into the reproduction vectors, i.e., X' = Q^-1(Q(X)).

¹ Other similar colour spaces can also be used.
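The encoder Q and decoder Q^-1 described above can be sketched in a few lines; the Euclidean distortion measure and the toy codebook values are illustrative assumptions.

```python
import numpy as np

def vq_encode(X, codebook):
    """Encoder Q: map vector X to the index of its nearest codeword."""
    dists = np.linalg.norm(codebook - X, axis=1)
    return int(np.argmin(dists))

def vq_decode(i, codebook):
    """Decoder Q^-1: map an index back to its reproduction vector."""
    return codebook[i]

# Toy 2-dimensional codebook with three codewords.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

# X' = Q^-1(Q(X)): the input is reconstructed as its nearest codeword.
i = vq_encode(np.array([0.9, 1.1]), codebook)
X_rec = vq_decode(i, codebook)
```

For indexing, only the index stream produced by `vq_encode` is needed; the distribution of these indices over an image is the feature the paper uses for retrieval.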

There are many methods for designing a VQ codebook. K-means-type algorithms, such as the LBG algorithm [17], and neural-network-based algorithms, such as the Kohonen feature map [18], are popular tools. In this work, we used a specific neural network training algorithm, the frequency sensitive competitive learning (FSCL) algorithm [19], to design our codebook. We find that FSCL is insensitive to the initial choice of codewords, and that the codewords designed by FSCL are more efficiently utilised than those designed by methods such as the LBG algorithm. The FSCL method can be briefly described as follows:

1. Initialise the codewords, Ci(0), i = 1, 2, …, I, to random numbers and set the counter associated with each codeword to 1, ni(0) = 1.
2. Present the training sample, X(t), where t is the sequence index; calculate the distance between X(t) and the codewords, Di(t) = D(X(t), Ci(t)), and modify the distance according to D*i(t) = ni(t)Di(t).
3. Find j, such that D*j
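The steps above can be sketched as follows. The winner update rule (moving the winning codeword towards the sample and incrementing its counter) follows the usual FSCL formulation; the learning rate, epoch count and cluster data are illustrative assumptions not specified in this excerpt.

```python
import numpy as np

def fscl_train(samples, num_codewords, epochs=10, lr=0.1, seed=0):
    """Frequency sensitive competitive learning codebook design (sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: random initial codewords; win counters n_i set to 1.
    codebook = rng.normal(size=(num_codewords, samples.shape[1]))
    counts = np.ones(num_codewords)
    for _ in range(epochs):
        for x in samples:
            # Step 2: distances D_i, modified by win frequency: D*_i = n_i D_i.
            d = np.linalg.norm(codebook - x, axis=1)
            d_star = counts * d
            # Step 3: the codeword with the smallest modified distance wins.
            j = int(np.argmin(d_star))
            # Winner update: move towards the sample and count the win,
            # so frequent winners are handicapped on later samples.
            codebook[j] += lr * (x - codebook[j])
            counts[j] += 1
    return codebook

# Two well-separated clusters: the frequency penalty ensures each
# cluster attracts its own codeword instead of one winning everything.
data = np.vstack([np.zeros((50, 2)), np.full((50, 2), 5.0)])
cb = fscl_train(data, num_codewords=2)
```

The multiplication by the win counter is the key difference from plain competitive learning: it is what makes FSCL insensitive to the initial codeword positions, as noted above.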