
Content-Based Image Retrieval Using Fuzzy Cognition Concepts

Yo-Ping Huang, Tienwei Tsai, and Te-Wei Chiang
Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, R.O.C.
Department of Accounting Information Systems, Chihlee Institute of Technology, Taipei County, Taiwan, R.O.C.

Abstract

In this paper, a content-based image retrieval method that exploits fuzzy cognition concepts is proposed. Owing to its energy compacting property and its ability to extract texture information, the discrete cosine transform (DCT) is applied to extract low-level features from the images. In the image database establishing phase, each image is first transformed from the standard RGB color space to the YUV space; then each component (i.e., Y, U, and V) of the image is further transformed to the DCT domain. In the image retrieving phase, the system compares the most significant DCT coefficients of the Y, U, and V components of the query image with those of the images in the database and finds good matches. To benefit from user-machine interaction, a GUI for fuzzy cognition is developed, allowing users to easily adjust the weight of each feature according to their preferences. In our preliminary experiment, 1000 test images are used to demonstrate the effectiveness of our system.

Abstract (in Chinese)

This paper proposes a content-based image retrieval (CBIR) method that incorporates fuzzy cognition. The discrete cosine transform (DCT) serves as the basic feature extraction tool: low-level texture features are extracted from each image to produce a feature vector describing it. In the database establishing phase, all images are converted from the original RGB color space to the YUV color space and then further transformed into DCT coefficients. In the retrieval phase, comparing the most significant DCT coefficients of the Y, U, and V components of the query image finds the images most similar to the query. Furthermore, to integrate the different features effectively and to foster good interaction between the user and the system, we designed a fuzzy-cognition user interface through which users can express the relative importance of the different features of the query image, making it easier for them to portray the target image they have in mind. In a preliminary experiment, 1000 images of various kinds were used to verify the effectiveness of the method.

Keywords: content-based image retrieval, image indexing, discrete cosine transform, fuzzy cognition.

1. Introduction

With the spread of the Internet and the growth of multimedia applications, the demand for image retrieval has increased. Image retrieval procedures can be roughly divided into two approaches: query-by-text (QBT) and query-by-example (QBE). In QBT, queries are texts and targets are images; in QBE, both queries and targets are images. In practice, images in QBT retrieval are often annotated with words; when images are sought using these annotations, the retrieval is known as annotation-based image retrieval (ABIR). ABIR has the following drawbacks. First, manual image annotation is time-consuming and therefore costly. Second, human annotation is subjective. Furthermore, some images cannot be annotated at all because their content is difficult to describe in words. In QBE, on the other hand, annotations are not necessary (although they can be used); retrieval is carried out according to the image contents. Such retrieval is known as content-based image retrieval (CBIR) [5]. CBIR has become popular as a means of retrieving desired images automatically [2], [6], [8]. However, standard CBIR techniques can only find images that exactly match the user query, and the result will be unsatisfactory if the query form cannot easily represent the content. To overcome this problem, we propose an information retrieval method that allows users to conduct a query by transforming their cognition, i.e., their impressions of or opinions about the target, into numerical data.

In QBE, retrieval is basically done via the similarity between the query image and all candidates in the image database. One of the simplest ways to evaluate the similarity between two images is to calculate the Euclidean distance between the feature vectors representing them. To obtain the feature vector of an image, transform-based feature extraction techniques can be applied, such as the wavelet, Walsh, Fourier, 2-D moment, DCT, and Karhunen-Loeve transforms. In our approach, the DCT is used to extract low-level texture features. Due to the energy compacting property of the DCT, much of the signal energy tends to lie at low frequencies.

In other words, most of the signal energy is preserved in a relatively small number of DCT coefficients. For instance, in an OCR application, 84% of the signal energy appeared in 2.8% of the DCT coefficients, corresponding to a saving of 97.2% [3]. Moreover, most current images are stored in the JPEG format, which is based on the DCT. With other transform-based feature extraction methods such as the wavelet transform, DCT-coded images must first be decompressed by the inverse DCT; the feature sets derived here, by contrast, can be generated directly in the DCT domain so as to reduce processing time.

In this paper, we propose a content-based image retrieval method that incorporates fuzzy cognition concepts [7]. In the image database establishing phase, each image is first transformed from the standard RGB color space to the YUV space; then each component (i.e., Y, U, and V) of the image is further transformed to the DCT domain. In the image retrieving phase, the system compares the most significant DCT coefficients of the Y, U, and V components of the query image with those of the images in the database and finds good matches. To benefit from user-machine interaction, a GUI for fuzzy cognition is developed, allowing users to adjust the weight of each feature according to their preferences.

The remainder of this paper is organized as follows. Section 2 formulates the problem. Section 3 presents the proposed image retrieval system. Experimental results are shown in Section 4. Finally, conclusions are drawn in Section 5.

Figure 1. The proposed system architecture.

2. Problem Formulation

In this work we focus on the QBE approach. The user gives an example image similar to the one he/she is looking for.

Formally, QBE can be defined as follows. Let I be the image database, I := {X_n | n = 1, ..., N}, where X_n is an image represented by a set of features: X_n := {x_{nm} | m = 1, ..., M}. N and M are the number of images in the image database and the number of features, respectively. Because the query Q is also an image, we have Q := {q_m | m = 1, ..., M}. To query the database, the dissimilarity (or distance) measure D(Q, X_n) is calculated for each n as

D(Q, X_n) = \sum_{m=1}^{M} w_m d_m(q_m, x_{nm}),  for n = 1, ..., N.  (1)

Here, d_m is the distance function (or dissimilarity measure) for the m-th feature and w_m \in R is the weight of the m-th feature, with \sum_{m=1}^{M} w_m = 1. By adjusting the weights w_m it is possible to emphasize the properties of different features. Finally, the images with the smallest dissimilarity values are retrieved from the image database as the result of the query Q. In the next section we introduce our image retrieval system.
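As an illustrative sketch (not code from the paper), Eq. (1) and the final ranking step can be written as follows in Python; the per-feature distance functions d_m are passed in, and all names are our own:

import numpy as np

def dissimilarity(q_feats, x_feats, weights, dist_fns):
    """Eq. (1): D(Q, X_n) = sum over m of w_m * d_m(q_m, x_nm)."""
    return sum(w * d(q, x)
               for w, d, q, x in zip(weights, dist_fns, q_feats, x_feats))

def rank_database(q_feats, database, weights, dist_fns, top_k=10):
    """Rank all images X_n by D(Q, X_n) and return the top_k indices."""
    scores = [dissimilarity(q_feats, x, weights, dist_fns) for x in database]
    return np.argsort(scores)[:top_k]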

3. The Proposed Image Retrieval System

3.1 System Architecture

Figure 1 shows the system architecture of our DCT-based QBE system. The system contains three major modules: the feature extraction module, the similarity measuring module, and the fuzzy cognition query module. The details of each module are introduced in the following sections.

3.2 Feature Extraction

Features are functions of measurements performed on a class of objects (or patterns) that enable that class to be distinguished from other classes in the same general category [9]. Basically, we have to extract distinguishable and reliable features from the images. Before the feature extraction process, the images have to be converted to the desired color space. Many models exist for defining the valid colors in image data; each is specified by a vector of values, with each component valid over a specified range. The major color space definitions include [4]: RGB (Red, Green, and Blue), CMYK (Cyan, Magenta, Yellow, and Key black), CIE (Commission Internationale de l'Eclairage), YUV (luminance and chrominance channels), etc. In our approach, the RGB images are first transformed to the YUV color space.


3.2.1 RGB color space

A gray-level digital image can be defined as a function of two variables, f(x, y), where x and y are spatial coordinates, and the amplitude f at a given pair of coordinates is called the intensity of the image at that point. Every digital image is composed of a finite number of elements, called pixels, each with a particular location and a finite value. Similarly, for a color image, each pixel (x, y) consists of three components, R(x, y), G(x, y), and B(x, y), which correspond to the intensities of the red, green, and blue color in the pixel, respectively.

3.2.2 YUV color space

Originally used for PAL (the European "standard") analog video, YUV is based on the CIE Y primary plus chrominance. The Y primary was specifically designed to follow the luminous efficiency function of the human eye. Chrominance is the difference between a color and a reference white at the same luminance. The following equations are used to convert from RGB to YUV:

Y(x, y) = 0.299 R(x, y) + 0.587 G(x, y) + 0.114 B(x, y),  (2)
U(x, y) = 0.492 (B(x, y) - Y(x, y)),  (3)
V(x, y) = 0.877 (R(x, y) - Y(x, y)).  (4)

Basically, the Y, U, and V components of an image can be regarded as the luminance, the blue chrominance, and the red chrominance, respectively. After conversion from RGB to YUV, the features of each image are extracted by the DCT.
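As a minimal NumPy sketch, Eqs. (2)-(4) translate directly into vectorized code; the function name and the H x W x 3 float-array convention are our assumptions, not the paper's:

import numpy as np

def rgb_to_yuv(rgb):
    """Convert an H x W x 3 RGB array to Y, U, V planes per Eqs. (2)-(4)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # Eq. (2): luminance
    u = 0.492 * (b - y)                     # Eq. (3): blue chrominance
    v = 0.877 * (r - y)                     # Eq. (4): red chrominance
    return y, u, v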

3.2.3 Discrete Cosine Transform

Developed by Ahmed et al. [1], the DCT is a technique for separating an image into parts (or spectral sub-bands) of differing importance with respect to the image's visual quality. It uses orthogonal real basis vectors whose components are cosines, has an excellent energy compaction property, and requires only real operations in the transformation process. Applying the 2-D DCT to a K x K image f(x, y), x, y = 0, 1, ..., K-1, yields the frequency spectrum (the 2-D DCT coefficients)

F(u, v) = \frac{2}{K} \alpha(u) \alpha(v) \sum_{x=0}^{K-1} \sum_{y=0}^{K-1} f(x, y) \cos\frac{(2x+1)u\pi}{2K} \cos\frac{(2y+1)v\pi}{2K},  (5)

where \alpha(w) = 1/\sqrt{2} for w = 0 and \alpha(w) = 1 otherwise. The DCT is a unitary transform and therefore has the energy preservation property

E = \sum_{x=0}^{K-1} \sum_{y=0}^{K-1} (f(x, y))^2 = \sum_{u=0}^{K-1} \sum_{v=0}^{K-1} (F(u, v))^2,  (6)

where E is the signal energy. In Eq. (5), coefficients with small u and v correspond to low-frequency components, while those with large u or v correspond to high-frequency components. For most images, much of the signal energy lies at low frequencies; the high-frequency coefficients are often small enough to be neglected with little visible distortion. The DCT therefore has a superior energy compacting property. DCT techniques can be applied to extract texture features from images owing to the following characteristics [2]:

- the DC coefficient (i.e., F(0, 0)) represents the average energy of the image;
- each of the remaining coefficients carries frequency information that produces a different pattern of image variation; and
- the coefficients of certain regions represent directional information.

Based on the above observations, we were motivated to devise an image retrieval scheme that uses the low-frequency 2-D DCT coefficients as the discriminating features.
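To make this step concrete, here is a minimal sketch under stated assumptions: it uses SciPy's dctn (a type-II DCT whose norm='ortho' option matches the unitary normalization of Eq. (5)), keeps a 5x5 low-frequency block (the size Section 4 reports as sufficient), and reuses the rgb_to_yuv sketch above; the function names are ours, not the paper's:

import numpy as np
from scipy.fft import dctn  # 2-D type-II DCT; norm='ortho' makes it unitary

def dct_features(channel, k=5):
    """Take the 2-D DCT of one image channel (Eq. (5)) and keep the
    k x k low-frequency block (small u, v) as the feature."""
    coeffs = dctn(np.asarray(channel, dtype=float), norm='ortho')
    return coeffs[:k, :k]

def extract_features(rgb, k=5):
    """Feature set of an image: one k x k DCT block for each of Y, U, V."""
    return [dct_features(c, k) for c in rgb_to_yuv(rgb)]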

3.3 Similarity Measurement

To decide which image in the database is most similar to the query image, we must define a measure of the degree of dissimilarity (or distance). Traditional distance measures typically apply the sum of absolute differences (SAD) to avoid multiplications. To exploit the energy preservation property of the DCT (see Eq. (6)), however, we use the sum of squared differences (SSD) instead, which can be implemented efficiently using a look-up table for the squares. Assume that F_{q_m}(u, v) and F_{x_{nm}}(u, v) represent the DCT coefficients of the m-th feature of the query image Q and of image X_n, respectively. Then the distance between q_m and x_{nm} over the low-frequency block of size k x k is defined as

d_m(q_m, x_{nm}) = \sum_{u=0}^{k-1} \sum_{v=0}^{k-1} \left( F_{q_m}(u, v) - F_{x_{nm}}(u, v) \right)^2.  (7)
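A minimal sketch of Eq. (7), assuming the two arguments are the k x k low-frequency DCT blocks produced above (a plain NumPy computation stands in for the paper's look-up-table implementation):

import numpy as np

def ssd_distance(fq, fx):
    """Eq. (7): sum of squared differences between two k x k blocks
    of low-frequency DCT coefficients."""
    diff = np.asarray(fq, dtype=float) - np.asarray(fx, dtype=float)
    return float(np.sum(diff * diff))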

3.4 Fuzzy Cognition Query

To benefit from user-machine interaction, we develop a GUI for fuzzy cognition, allowing users to adjust the weight of each feature easily according to their preferences. As introduced in Section 2, each image is represented by M features. In this paper, three features (the luminance Y, the chrominance U, and the chrominance V) are considered for each image. Thus M = 3 and the distance D(Q, X_n) is calculated as

D(Q, X_n) = \sum_{m=1}^{3} w_m d_m(q_m, x_{nm}),  for n = 1, ..., N.  (8)

Here, Q and X_n are the query image and one of the images in the image database, d_m is the distance function defined in Eq. (7), and w_m \in R is the weight of the m-th feature, with \sum_{m=1}^{3} w_m = 1. From another point of view, each weight can be regarded as the fuzziness of the user's cognition of the associated feature. The three weight factors can be adjusted by users via the GUI, so users can emphasize the features that are relatively important to them based on their feelings or opinions. Finally, the images most similar to the query image are retrieved from the image database.
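Putting the sketches above together, a hypothetical end-to-end query with user-chosen weights (normalized so that the weights sum to 1, as required) might look like this; query_rgb and image_collection are placeholders for the user's query image and the stored image set:

# Hypothetical end-to-end query; query_rgb and image_collection are
# placeholders, and the raw weights below are one user's GUI settings.
raw = {'Y': 1.0, 'U': 0.2, 'V': 2.0}                 # emphasize red chrominance
total = sum(raw.values())
weights = [raw[c] / total for c in ('Y', 'U', 'V')]  # enforce sum of w_m = 1

q_feats = extract_features(query_rgb)                # H x W x 3 query image
database = [extract_features(img) for img in image_collection]
dist_fns = [ssd_distance] * 3                        # same d_m (Eq. (7)) per channel
best = rank_database(q_feats, database, weights, dist_fns, top_k=10)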

4. Experimental Results

In this preliminary experiment, 1000 images downloaded from the WBIIS [10] database are used to demonstrate the effectiveness of our system. The user can query with an external image or an image from the database. The difference between the two options is that an external image must have its features extracted on the fly, whereas an image already in the database has its features extracted in advance and stored alongside the image; when a query uses a database image, only the index of that image is transferred to the server. In both cases, the features used are fixed by the database selected at the start. In this experiment, we use only images from the database as query images. In our experiments, we found that a 5x5 block of low-frequency DCT coefficients is enough to achieve fair retrieval quality. Figure 2 shows the result of using a butterfly as the query image and its luminance (the Y component) as the main feature.

Figure 2. Retrieved results using a butterfly as the query image and its luminance as the main feature.

The retrieved results are ranked in ascending order of distance to the query image, from left to right and then from top to bottom. The result is not good enough: the 8th and 9th retrieved images are red flowers, which are quite different from the query image. Users can therefore increase the importance of the red chrominance (i.e., the V component) via the text box (or scrollbar) provided by the system's GUI (shown at the bottom of Figure 1). Figure 3 shows the result of using the same butterfly as the query image and emphasizing the weight of its V component. Figure 4 shows the result of using a mountain scene as the query image and its Y component as the main feature. Again the result is not good enough. Since the major difference between the query image and the retrieved images lies in their color, users can improve the results by increasing the weight of the U component (i.e., the blue chrominance). Figure 5 shows the result of using the same mountain as the query image and its U component as the main feature. Using different components can thus lead to better retrieval results. Note that performance evaluation of image retrieval techniques is difficult in general, since there is no commonly agreed-upon image database for comparative study and performance depends highly on the selection of the query image.

5. Conclusions

In this paper, a content-based image retrieval method that exploits fuzzy cognition concepts is proposed. The DCT is applied to extract low-level features from the images, owing to its energy compacting property and its ability to capture texture information. To achieve QBE, the system compares the most significant DCT coefficients of the Y, U, and V components of the query image with those of the images in the database and finds good matches with the help of the user's cognition. Since no single feature can cover all aspects of an image, discrimination performance depends highly on the features selected and the images involved. Because several features are used simultaneously, the similarity scores resulting from the individual matching processes must be integrated; an important part of our system is the set of flexible weighting factors implemented for this purpose. Since only a preliminary experiment has been conducted, much work remains to improve the system. For each type of feature, we will continue investigating and improving its ability to describe the image and its performance in similarity measurement. A long-term aim is to combine semantic annotations and low-level features to improve retrieval performance.

Figure 3. Retrieved results using a butterfly as the query image and emphasizing the weight of its V component.


Figure 4. Retrieved results using a mountain scene as the query image and its Y component as the main feature: (a) the query image; (b) the retrieved images.


Figure 5. Retrieved results using a mountain scene as the query image and its U component as the main feature: (a) the query image; (b) the retrieved images.

For the analysis of complex scenes, concepts that provide a high degree of content understanding would enable highly differentiated queries at an abstract information level. This direction is worthy of further study to meet the demands of integrating semantics into CBIR.

References

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. on Computers, vol. C-23, pp. 90-93, 1974.
[2] H.-J. Bae and S.-H. Jung, "Image Retrieval Using Texture Based on DCT," Proc. Int. Conf. on Information, Communications and Signal Processing, Singapore, pp. 1065-1068, 1997.
[3] T.-W. Chiang, T. Tsai, and Y.-C. Lin, "Progressive Pattern Matching Approach Using Discrete Cosine Transform," Proc. Int. Computer Symposium, Taipei, Taiwan, pp. 726-730, Dec. 2004.
[4] S. Dunn, Digital Color, available at http://davis.wpi.edu/~matt/courses/color/, 1999.
[5] V. Gudivada and V. Raghavan, "Content-Based Image Retrieval Systems," IEEE Computer, vol. 28, no. 9, pp. 18-22, 1995.
[6] Y.-L. Huang and R.-F. Chang, "Texture Features for DCT-Coded Image Retrieval and Classification," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 3013-3016, 1999.
[7] Y.-P. Huang and T. Tsai, "A Fuzzy Semantic Approach to Retrieving Bird Information Using Handheld Devices," IEEE Intelligent Systems, pp. 16-23, Jan. 2005.
[8] K.-C. Liang and C.-C. J. Kuo, "WaveGuide: A Joint Wavelet-Based Image Representation and Description System," IEEE Trans. on Image Processing, vol. 8, no. 11, pp. 1619-1629, 1999.
[9] M. Nadler and E. P. Smith, Pattern Recognition Engineering, New York: Wiley-Interscience, 1993.
[10] J. Z. Wang, Content Based Image Search Demo Page, available at http://bergman.stanford.edu/~zwang/project/imsearch/WBIIS.html, 1996.