Fast Image Retrieval Using Low Frequency DCT Coefficients

Tienwei Tsai, Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, R.O.C., [email protected]

Yo-Ping Huang, Department of Computer Science and Engineering, Tatung University, Taipei, Taiwan, R.O.C., [email protected]

Te-Wei Chiang, Department of Information Networking Technology, Chihlee Institute of Technology, Taipei County, Taiwan, R.O.C., [email protected]

Abstract

Retrieval by image content has received great attention in recent decades, and fast content-based multimedia retrieval is becoming more important as the amount of compressed multimedia image and video data increases. In this paper, we investigate the use of low frequency DCT coefficients, computed in the YUV color space, as feature vectors for image retrieval. The proposed system allows users to select the dominant feature of the query image so as to improve retrieval performance. The experimental results show that the proposed features are sufficient for effective retrieval once users' opinions on the query images are introduced.

Keywords: Content-based image retrieval, discrete cosine transform, color space, query by example.

1. Introduction

Due to the rapid development of digital imaging, storage, and networking technologies, content-based image retrieval (CBIR) has emerged as an important area in computer vision and multimedia computing. Rather than relying on manual indexing or text descriptions by humans, CBIR systems search a collection of images using features that can be extracted from the image files themselves. How to achieve a high correct-retrieval rate at low feature dimensionality is an important issue in CBIR. Though a number of image features based on color, texture, and shape attributes in various domains have been reported in the literature [1-4], it is still a challenge to select a good feature set for image classification. Typically, CBIR comprises both indexing and retrieval. It has been observed that the problem of image indexing has been approached by two different groups using distinctive methodologies [5]. One group, clustered largely around the more traditional, text-oriented library informatics world, has approached the problem as a task of efficiently adding text descriptors to images. The second group, clustered largely around computer science, has approached the problem through image processing; this group has come to be largely identified with CBIR. We also note a third research direction,

proposed by Goodrum [6], that seeks to combine image processing with text labelling of images. Several techniques have been proposed for finding or indexing images based on their contents; for example, the Fourier transform, discrete cosine transform (DCT), Hough transform, wavelet transform, Gabor transform, and Hadamard transform coefficients have all been used as the engine of a CBIR system. Each method has its strong and weak points. In our approach, the DCT is used to extract low-level texture features for fast image retrieval. Due to the energy compaction property of the DCT, much of the signal energy tends to lie at low frequencies; in other words, most of the signal energy is preserved in a relatively small number of DCT coefficients. For instance, in the case of OCR, 84% of the signal energy appears in 2.8% of the DCT coefficients, which corresponds to a saving of 97.2% [9]. This means that the high frequency coefficients are close to zero, and therefore negligible in most cases. Moreover, most current images are stored in JPEG format, and the image-compression technology at the heart of the JPEG standard is the DCT. For other transform-based feature extraction methods such as the wavelet transform, DCT-coded images must first be decompressed via the inverse DCT. We therefore want the feature sets derived here to be generated directly in the DCT domain, so as to reduce processing time. Generally, CBIR systems follow the paradigm of representing images via a set of feature attributes, such as color, texture, shape, and layout. Selecting too many features can prevent the important ones from being taken into account. In this paper, we intend to achieve an acceptable retrieval result using a content descriptor formed by only one single feature. For a large image database, therefore, the system can support efficient and effective access to image data with a very low computational load. Our system mainly focuses on the texture feature in the YUV color space. An effective image feature vector can be derived from the low frequency DCT coefficients of each component of an image. A retrieval task is performed by matching the feature vector of the query image with

those of the database images. We have implemented an example system employing the proposed feature-extraction technique, using a 5×5 block of low frequency DCT coefficients. The experimental results show that the proposed feature is sufficient for highly efficient retrieval. Moreover, the retrieval quality can be further improved by introducing users' opinions on the query images; that is, users can conduct the retrieval based on the transform of the Y, U, or V component. Such multiple passes of retrieval can be of great help to novice or inexperienced users. The remainder of this paper is organized as follows. The next section describes the problem. Section 3 illustrates the proposed image retrieval system. Experimental results are shown in Section 4. Finally, conclusions are drawn in Section 5.

Figure 1. The architecture of the proposed system.

2. Problem Description

CBIR often comprises both indexing and retrieval [6]. Indexing is a process that performs data reduction of images into mathematical features; it may be subdivided into the steps of feature extraction, feature vector organization, and classification. Feature extraction finds the suitable properties of interest and converts them into mathematical "feature vectors". Feature vector organization arranges the feature vectors in the database into a structure optimized for search efficiency. Classification labels the images into categories of interest to narrow down the search space. Retrieval is a process that supports user interaction to retrieve desired images from the database; it comprises user query formulation, user query feature extraction, a query search-space strategy, and a similarity matching method. There are three typical approaches to formulating the user query: query by text (QBT), query by example (QBE), and query by sketch (QBS). User query feature extraction first converts the user's exemplary image or sketch into a mathematical feature vector compatible with the feature vectors stored in the database. Afterward, while conducting the retrieval, the CBIR system uses the search strategy to efficiently search the database of feature vectors for those near the user query feature vector. Note that a search strategy is often defined by the choice of organization of the feature vectors. Since similarity matching is typically defined by a distance function, e.g., the Euclidean distance, fewer items in a feature vector lead to faster matching at the expense of accuracy. In fact, using low dimensional features is an important (and challenging) issue in CBIR, and it is the very problem we tackle in this paper.

3. The Proposed Method

3.1 System Architecture

Figure 1 shows the architecture of the proposed CBIR system. There are two major modules in the system: the feature extraction module and the similarity measurement module. The feature extraction module computes image features automatically from given images in order to archive and/or retrieve images. The similarity measurement module computes the distance between the query image and the images in the database. From another point of view, the whole system involves two phases: a database-establishing phase and an image-retrieving phase. In the database-establishing phase, each image is first transformed from the standard RGB color space to the YUV space; then each component (i.e., Y, U, and V) of the image is further transformed to the DCT domain. In the image-retrieving phase, the system compares the low frequency DCT coefficients of the Y, U, or V component of the query image with those of the images in the database. Users can choose another component as the feature vector if they are not satisfied with the retrieval results obtained with the current one. A structural sketch of these two phases follows.
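To make the two phases concrete, the following is a minimal structural sketch of our own, not the authors' code: `extract_feature` is only a stub standing in for the RGB-to-YUV and DCT steps detailed in Section 3.2, so that the two-phase driver is runnable.

```python
import numpy as np

def extract_feature(image_name, component='Y'):
    # Stub for the feature extraction module of Section 3.2:
    # RGB -> YUV -> 2-D DCT -> 5x5 low-frequency block of one component.
    # A deterministic random block stands in so this driver runs.
    rng = np.random.default_rng(abs(hash((image_name, component))) % 2**32)
    return rng.normal(size=(5, 5))

# Phase 1: database establishing -- index each image once, offline.
database = {name: extract_feature(name) for name in ('img1', 'img2', 'img3')}

# Phase 2: image retrieving -- rank database images by distance to the query.
query = extract_feature('img2')
ranked = sorted(database, key=lambda n: ((database[n] - query) ** 2).sum())
print(ranked)  # 'img2' comes first: identical feature, zero distance
```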

3.2 Feature Extraction Module

Feature extraction is the most important step in image retrieval, and numerous feature extraction strategies have been proposed in the literature. Typically, the images are submitted to a linear transform, filter, or bank of filters, followed by some energy measure. The DCT is one of the best filters for feature extraction in the spatial frequency domain [7]. Furthermore, the DCT has useful properties such as energy compaction and image data decorrelation. In this paper, we use the DCT as a filter; the transformed DCT coefficients are then used to form a feature vector. There are K² DCT coefficients for a K×K image block, but not all of the coefficients contain useful information. Moreover, the complexity of similarity measurement depends on the number of coefficients. Our experiments have shown that a 5×5 block of low frequency coefficients performs quite well from the viewpoint of retrieval quality. The steps of feature extraction for both the objects in the image database and the query object can be summarized as:
• transforming an RGB image into the YUV color space, and
• calculating the low frequency DCT coefficients for all three components.
In the database-establishing phase, the generated feature vectors are stored in the database along with the original images. In the image-retrieving phase, the generated vectors are compared with those in the database. The details of the transformations are illustrated in the following subsections; an end-to-end sketch is given below.
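The following Python sketch is our own illustration, not the authors' implementation (it assumes numpy and scipy, and all function names are hypothetical). It composes the two steps above, using the orthonormal DCT that matches the normalization of Eq. (4) in Section 3.2.3:

```python
import numpy as np
from scipy.fftpack import dct

def rgb_to_yuv(rgb):
    # Split an H x W x 3 RGB array into Y, U, V planes (Eqs. (1)-(3)).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

def low_freq_block(plane, size=5):
    # Orthonormal 2-D DCT of one plane; keep the top-left size x size block.
    coeffs = dct(dct(plane, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs[:size, :size]

def extract_features(rgb):
    # One 5x5 low-frequency DCT block per YUV component.
    return tuple(low_freq_block(p) for p in rgb_to_yuv(rgb.astype(np.float64)))
```

At query time, only the block of the component the user selected is compared against the database.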

3.2.1 RGB Color Space

There are several different ways to describe color, known as color spaces. Many color spaces are in use today, and the issues of color modeling and computation are subtle. RGB is perhaps the simplest color space for people to understand: combining red, green, and blue in different ratios makes up all colors. That is, for a color image, each pixel (x, y) consists of three components, R(x, y), G(x, y), and B(x, y), which correspond to the intensity of red, green, and blue in that pixel, respectively. In each of these component images, lack of the color appears as black and full color appears as white, with gray values in between representing increasingly more of that color component.

3.2.2 YUV Color Space

The YUV color space differs from RGB, which is what the camera captures and what humans view. Y stands for "luminance" (or brightness) and is isolated from the color information; black-and-white TVs decode only the Y part of the signal. U and V are the chrominance components, defined as the difference between a color and a reference white at the same luminance. More precisely, U and V carry the color information as "color difference" signals: blue minus luminance (i.e., B − Y) and red minus luminance (i.e., R − Y). Psycho-perceptual studies have shown that the human brain perceives images largely through their luminance and only secondarily through their color information. In this paper, Y, U, and V are all regarded as distinguishable features for image retrieval. The following equations convert from RGB to YUV:

$$Y(x, y) = 0.299\,R(x, y) + 0.587\,G(x, y) + 0.114\,B(x, y), \qquad (1)$$

$$U(x, y) = 0.492\,\bigl(B(x, y) - Y(x, y)\bigr), \qquad (2)$$

$$V(x, y) = 0.877\,\bigl(R(x, y) - Y(x, y)\bigr). \qquad (3)$$

In summary, the Y, U, and V components of an image can be regarded as the luminance, the blue chrominance, and the red chrominance, respectively. After converting from RGB to YUV, the features of each image are extracted by applying the DCT to the Y, U, and V components.
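As a quick sanity check of Eqs. (1)-(3) (our own illustration, with example pixel values not taken from the paper): a pure white pixel has full luminance and zero chrominance, while a pure blue pixel carries its information mostly in U.

```python
def yuv(r, g, b):
    # Eqs. (1)-(3) applied to a single pixel, with R, G, B in [0, 1].
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v

print(yuv(1.0, 1.0, 1.0))  # white: (1.0, 0.0, 0.0) -- no chrominance
print(yuv(0.0, 0.0, 1.0))  # blue:  Y = 0.114, U ~ 0.436, V ~ -0.100
```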

3.2.3 Discrete Cosine Transform

Our feature is based on a measure of the local energy distribution of the DCT coefficients. We generate the DCT coefficients of a K×K image; these K×K coefficients capture the textural energy of the image. The DCT is computed as follows:

$$F(u,v) = \frac{2}{K}\,\alpha(u)\,\alpha(v)\sum_{x=0}^{K-1}\sum_{y=0}^{K-1} f(x,y)\cos\!\left(\frac{(2x+1)u\pi}{2K}\right)\cos\!\left(\frac{(2y+1)v\pi}{2K}\right), \qquad (4)$$

where f(x, y) is the pixel value at coordinate (x, y) in the image, F(u, v) is the DCT-domain representation of f(x, y), u and v represent the vertical and horizontal frequencies, u, v, x, and y all range from 0 to K−1, and

$$\alpha(w) = \begin{cases} 1/\sqrt{2} & \text{for } w = 0, \\ 1 & \text{otherwise.} \end{cases}$$

The DCT coefficients have an excellent energy preservation property, i.e.,

$$E = \sum_{x=0}^{K-1}\sum_{y=0}^{K-1} \bigl(f(x,y)\bigr)^2 = \sum_{u=0}^{K-1}\sum_{v=0}^{K-1} \bigl(F(u,v)\bigr)^2, \qquad (5)$$

where E is the signal energy. In Eq. (4), the coefficients with small u and v correspond to low frequency components, while those with large u or v correspond to high frequency components. For most images, much of the signal energy lies at low frequencies; the high frequency coefficients are often small enough to be neglected with little visible distortion. Therefore, the DCT has a superior energy compaction property. Based on the above observations, we were motivated to devise an image retrieval scheme using the low frequency 2-D DCT coefficients as discriminating features. In summary, DCT techniques can be applied to extract texture features from images due to the following characteristics [8]:
• F(0, 0), i.e., the DC coefficient, represents the average energy of the image;
• all the remaining coefficients, i.e., the AC coefficients, contain frequency information which produces different patterns of image variation; and
• the coefficients of some regions represent directional information.
We use the DCT because of its very good information packing: practically, this means that the DCT concentrates most of the signal's energy in a few coefficients. A minimal numerical check of these properties is sketched below.
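To illustrate Eqs. (4) and (5) numerically, the following is a minimal sketch of our own (assuming scipy; `norm='ortho'` matches the 2/K and α(u)α(v) normalization of Eq. (4)). It verifies energy preservation and measures how much energy the 5×5 low frequency block retains for a smooth synthetic image:

```python
import numpy as np
from scipy.fftpack import dct

def dct2(plane):
    # Orthonormal 2-D DCT-II; equivalent to Eq. (4) for a K x K plane.
    return dct(dct(plane, axis=0, norm='ortho'), axis=1, norm='ortho')

K = 64
x, y = np.meshgrid(np.arange(K), np.arange(K), indexing='ij')
f = 0.5 * x / K + 0.3 * np.sin(2 * np.pi * y / K)  # smooth synthetic image

F = dct2(f)

# Eq. (5): the transform preserves the total signal energy.
assert np.isclose((f ** 2).sum(), (F ** 2).sum())

# Energy compaction: fraction of the energy in the 5x5 low frequency block.
print('5x5 block retains %.4f of the energy' % ((F[:5, :5] ** 2).sum() / (F ** 2).sum()))
```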

3.3 Similarity Measurement Module

In this section, we describe the computation of the similarity measure between a pair of images. To decide which image in the database is the most similar to the query image, we have to define a measure that indicates the degree of similarity (or dissimilarity). Traditional dissimilarity (or distance) measures typically use the sum of absolute differences (SAD) to avoid multiplications. To exploit the energy preservation property of the DCT (see Eq. (5)), however, we use the sum of squared differences (SSD) instead, which can be implemented efficiently using a look-up table for the squares. In our system, a query image is any one of the images in the image database. If q and x are the n-dimensional feature vectors of a query image and a database image, then the distance is defined as

$$d_E(q, x) = \sum_{i=1}^{n} (q_i - x_i)^2.$$

From the above equation, the computing cost directly depends on the dimension of the feature vector. The lowest cost is achieved by using only the DC coefficient. For moderate computational complexity, the low frequency block size was chosen to be 5×5. Thus,

$$d_E(q, x) = \sum_{u=0}^{4}\sum_{v=0}^{4} \bigl(F_q(u,v) - F_x(u,v)\bigr)^2,$$

where F_q(u, v) and F_x(u, v) represent the DCT coefficients of the Y-, U-, or V-component of the query image q and a database image x, respectively. Because the per-dimension distances are squared before summation, great emphasis is placed on those dimensions for which the dissimilarity is large. Since we use only one feature vector for comparison, this does not become an issue. If several features are used simultaneously, it is necessary to integrate the similarity scores by normalizing the distance resulting from each individual feature.
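For completeness, here is a minimal sketch of our own (with hypothetical names, not the authors' code) of the SSD matching over 5×5 coefficient blocks and the top-10 ranking used in the retrieving phase:

```python
import numpy as np

def ssd(Fq, Fx):
    # Sum of squared differences between two 5x5 DCT coefficient blocks.
    d = Fq - Fx
    return float((d * d).sum())

def rank_top(query_block, db_blocks, top=10):
    # Indices of the `top` database images closest to the query; all blocks
    # come from the same YUV component the user selected.
    dists = [ssd(query_block, b) for b in db_blocks]
    return np.argsort(dists)[:top]

# Hypothetical usage with random stand-in features for 1000 images:
rng = np.random.default_rng(0)
db = [rng.normal(size=(5, 5)) for _ in range(1000)]
print(rank_top(db[42], db))  # the query image itself ranks first
```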

4. Experimental Results

To evaluate the proposed method, two fundamental questions must be asked: (1) are the extracted features able to preserve the contents of an image, and are they good enough for image indexing and retrieval? (2) is the retrieval process efficient enough for a large image database? However, evaluating a technique's performance in a CBIR system is often difficult because, to date, there are no agreed measuring criteria or benchmark data sets for comparing different methods. One objective of our present work is to develop more flexible and more efficient methods for CBIR. In this section, we demonstrate how well the feature vectors derived from the DCT coefficients in the YUV color space perform. We have tested an image database of 1000 color images downloaded from the WBIIS database [10]. It mainly consists of natural scenes, animals, insects, buildings, people, and so on. No preprocessing was done on the images. To evaluate the effectiveness of the proposed feature extraction method, we present three different query experiments, each of which uses one of the YUV components as the feature vector. Fig. 2 presents the main screen, which illustrates the flexibility of our experimental system. A search consists of comparing the query vector against all other feature vectors in the database using the proposed distance measure and then sorting the results in order. The query results are the best 10 images, ranked from left to right and then from top to bottom, with the query image itself appearing first, as expected. An image is deemed a "correct" retrieval if it contains objects similar to the query, as judged subjectively. For the first query, we obtain five correct answers (related to outer space, spacemen, or spaceflight) in the top 10 using the Y-component as the feature vector, as shown in Fig. 3. We obtain eight correct answers if we use the U-component as the feature vector instead, as shown in Fig. 4. For the second query, we obtain seven correct answers (related to flowers) in the top 10 using the Y-component as the feature vector, as shown in Fig. 5. We obtain ten correct answers if we use the V-component as the feature vector instead, as shown in Fig. 6. Note that the second query yields a better result than the first because the second query example has common textures. These results also confirm the importance of multiple retrieval passes and refinement by choosing different feature vectors. Moreover, the retrieval efficiency is excellent because we use only one feature vector at a time, without sacrificing retrieval quality.

Figure 2. The main screen of the proposed CBIR system.

5. Conclusions and Future Work

Directly retrieving feature vectors from the low-level contents of images has been a very active research area in the past decade. In this paper, we mainly focus on the texture features of images in the YUV color space, formed from the low frequency DCT coefficients. In the database-establishing phase, each component (i.e., Y, U, and V) of an image is transformed to the DCT domain. In the image-retrieving phase, the system compares the low frequency DCT coefficients of the Y, U, or V component of the query image with those of the images in the database. Users can choose any one of the components as the main feature for comparison, and can switch to another component if they are not satisfied with the current retrieval results. Such multiple passes of retrieval can be of great help to novice or inexperienced users. Two query examples were conducted to demonstrate the efficiency and effectiveness of the proposed feature extraction approach. The experimental results show a high retrieval efficiency. Since only a total of 25 DCT coefficients is calculated as the complete feature set, matching is fast with acceptable retrieval quality. The speed can be further improved by using fewer DCT coefficients at the expense of accuracy. Moreover, we conclude that DCT coefficients may be better suited for the retrieval of landscapes (natural scenes), where colors are relatively constant (grass has a yellowish-green hue, the sky has a blue hue, etc.). In the future, we will integrate multiple feature attributes to improve the retrieval results. When several features are used simultaneously, it is necessary to weight the relative importance of the individual features via a weighting vector. On the other hand, the availability of a summary view is important in situations where a user has no specific query image at the beginning of the search process and wants to explore the image collection to locate images of interest. After images are pre-classified by some algorithm, we will provide a thumbnail-browsing layout to address this need.

References

[1] G. Fan and X.-G. Xia, "Wavelet-Based Texture Analysis and Synthesis Using Hidden Markov Models," IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, vol. 50, no. 1, Jan. 2003, pp. 106-120.
[2] G. Qiu, "Color Image Indexing Using BTC," IEEE Trans. on Image Processing, vol. 12, no. 1, Jan. 2003, pp. 93-101.

Figure 3. (a) The query image, and (b) the retrieved results using the Y-component as the feature vector.

Figure 4. (a) The query image, and (b) the retrieved results using the U-component as the feature vector.

Figure 5. (a) The query image, and (b) the retrieved results using the Y-component as the feature vector.

Figure 6. (a) The query image, and (b) the retrieved results using the V-component as the feature vector.

[3] J. Wei, "Color Object Indexing and Retrieval in Digital Libraries," IEEE Trans. on Image Processing, vol. 11, no. 8, Aug. 2002, pp. 912-922.
[4] K.-C. Liang and C. C. Kuo, "WaveGuide: A Joint Wavelet-Based Image Representation and Description System," IEEE Trans. on Image Processing, vol. 8, no. 11, 1999, pp. 1619-1629.
[5] Communications Engineering Branch, "Content-Based Image Retrieval (CBIR) of Biomedical Images," a report to the Board of Scientific Counselors, National Library of Medicine. Available at: http://archive.nlm.nih.gov/pubs/reports/bosc02/node6.html
[6] A. A. Goodrum, M. E. Rorvig, K. T. Jeong, and C. Suresh, "An Open Source Agenda for Research Linking Text and Image Content Features," Journal of the American Society for Information Science and Technology, vol. 52, no. 11, 2001, pp. 948-953.
[7] G. Feng and J. Jiang, "JPEG Compressed Image Retrieval via Statistical Features," Pattern Recognition, vol. 36, 2003, pp. 977-985.
[8] H.-J. Bae and S.-H. Jung, "Image Retrieval Using Texture Based on DCT," Proc. of Int. Conf. on Information, Communications and Signal Processing, Singapore, 1997, pp. 1065-1068.
[9] T.-W. Chiang, T. Tsai, and Y.-C. Lin, "Progressive Pattern Matching Approach Using Discrete Cosine Transform," Proc. of Int. Computer Symposium, Taipei, Taiwan, Dec. 2004, pp. 726-730.
[10] J. Z. Wang, Content-Based Image Search Demo Page. Available at: http://bergman.stanford.edu/~zwang/project/imsearch/WBIIS.html, 1996.