IMAGE COMPRESSION MISMATCH EFFECT ON COLOR IMAGE BASED FACE RECOGNITION SYSTEM

Jae Young Choi1, Yong Man Ro1, Konstantinos N. Plataniotis2

1 Image and Video System Laboratory, Korea Advanced Institute of Science and Technology (KAIST), Yuseong-Gu, Daejeon, 305-701, Republic of Korea
2 The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, M5S 3GA, Canada

ABSTRACT

Face recognition (FR) for emerging applications such as face tagging for social networking, consumer products, and gaming utilizes color images stored in distributed repositories. Such images are often in compressed format and of different dimensions. This compression mismatch problem may adversely affect the performance of the face recognition engine. In this paper, we present a comparative investigation of the image compression mismatch problem. Two commonly used color image based face recognition solutions are utilized. More than three thousand images of 341 subjects, typical of the problem, are collected from three public databases. The experimental results support the main thesis of the paper: recognition performance depends critically on the color image properties.

978-1-4244-5654-3/09/$26.00 ©2009 IEEE

Index Terms — Color face recognition, image compression mismatch, Web based face recognition, video surveillance

1. INTRODUCTION

The main traditional application areas of automatic face recognition are national security, law enforcement, and biometric-based authentication. Most FR systems in these classic applications are developed under the assumption that the available training and testing data sets consist of grayscale (intensity) still images with the same data format. Moreover, they usually follow a controlled FR protocol for the training method and for the acquisition and transmission of the probe (testing) images to be recognized. Recently, much research has turned toward Web-based FR systems, including facial search of celebrities and annotation of faces in personal photos, because of their high commercial potential [1]. In these applications, images and videos stored in distributed repositories over the Web are used as training and testing data sets.

Grayscale image based FR is the natural setting for classical FR applications such as watch lists and related security tasks. Recently, however, considerable research effort has been devoted to color image based FR solutions [2], [5], [7], since color is readily available in most practical face image datasets. These works agree that the facial color cue can play an important role and can be used to improve the recognition performance obtained with grayscale image sets alone. Regardless of the type of the testing input images, grayscale or color, the majority of FR solutions employ a trainable feature extractor (e.g., eigenfaces) that transforms the input data into corresponding features in a reduced dimensional subspace, where the identification or verification process is performed. In most practical FR systems, the training process of constructing a feature extractor is isolated from the testing stage. Moreover, depending on the application, it may be difficult for FR designers at the training phase to predict which testing data formats will appear in future operation. As a result, the feature extractor construction may rely on a limited set of training data whose format is mismatched with that of the testing data encountered during actual (real-time) operation. Feature extractors constructed with training data that differ significantly from the testing data may thus end up extracting the low-dimensional features of those testing images anyway.

Such a format mismatch problem can be especially critical in color image based FR systems operated in uncontrolled FR environments such as the Web, where billions of users upload multimedia content, including images and videos, on a daily basis. This content can be acquired from heterogeneous devices, such as web cameras and cell phone cameras, and stored in a large number of repositories dispersed over the Web. Since the uploading process is highly uncontrolled in Web circumstances, the various acquisition devices inevitably produce irregular content formats; in particular, different image and video contents are very likely to have different compression factors. As such, feature extractors built with a certain training data format are commonly confronted with testing data in different compression formats. An extensive experimental study of the effect of JPEG2000 compression on grayscale iris recognition has been reported in the literature [3]. However, we have not found any systematic study of the impact of compression mismatch on FR methods using color face images. In this paper, we perform an extensive comparative study evaluating the compression mismatch FR scenarios that can occur in real-life color image based FR applications. To examine the compression mismatch effect, two representative color FR solutions, input-level augmentation and decision-level fusion, are used. The rest of the paper is organized as follows: Section 2 describes the two representative color FR design methods in detail to facilitate understanding of the compression mismatch problem; Section 3 presents the experimental conditions and results assessing the effect of compression mismatch on the employed color image based FR systems; the discussion and conclusions are given in Section 4.


2. COLOR IMAGE BASED FACE RECOGNITION

ICIP 2009

The general framework of color image based FR consists of three steps: preprocessing, feature extraction, and classification. Two representative color based FR design methods are the input-level augmentation [2] and decision-level fusion [5] solutions. To describe the frameworks of these two color FR design methods in detail, we first introduce the necessary notation. Let {I_t^(i)}_{i=1}^{N_t} be a training set of N_t RGB color face images. Further, let {I_g^(i)}_{i=1}^{N_g} and I_p be a gallery (or target) set consisting of N_g enrolled color images of known individuals and a color probe image, respectively.

2.1. Input-level augmentation based color FR design

In the training stage, each I_t^(i) (i = 1, ..., N_t) is first passed through a preprocessing step in which it is usually converted into another color space. Each individual plane (e.g., the luminance Y of the YCbCr color space) of a training image generated after the color space conversion is then transformed into an associated column vector s_t^(im), where m = 1, ..., K and s_t^(im) ∈ R^{D_m}. By concatenating the s_t^(im) in the standard column order, an augmented input vector x_t^(i) is generated such that

x_t^(i) = [(s_t^(i1))^T (s_t^(i2))^T ... (s_t^(iK))^T]^T,

where T denotes the matrix transpose, x_t^(i) ∈ R^D, and D = sum_{m=1}^{K} D_m. Note that each s_t^(im) should be normalized to zero mean and unit variance prior to augmentation. From the resulting training set {x_t^(i)}_{i=1}^{N_t}, a feature extractor (or feature subspace) is trained and constructed. The rationale behind the feature extractor construction is to find a projection matrix Φ that yields the low-dimensional features of I_g^(i) and I_p. The learned Φ is provided to the testing stage for the FR tasks.

Let {x_g^(i)}_{i=1}^{N_g} be the set of N_g augmented gallery vectors produced from {I_g^(i)}_{i=1}^{N_g} in the same manner as x_t^(i), and let x_p be the unknown augmented probe vector generated from I_p. To perform FR on I_p, the x_g^(i) and x_p are projected onto Φ as follows:

f_g^(i) = Φ^T x_g^(i) and f_p = Φ^T x_p.  (1)

A nearest neighbor classifier is then applied to determine the identity of I_p by finding the smallest distance between the f_g^(i) (i = 1, ..., N_g) and f_p as follows:

ℓ(I_p) = ℓ(I_g^(i*)) and i* = arg min_i d(f_g^(i), f_p),  (2)

where ℓ(·) returns the identity label of a face image and d(·) denotes a distance metric.

2.2. Decision-level fusion based color FR design

In the training stage, as opposed to the input-level augmentation method, each individual set {s_t^(im)}_{i=1}^{N_t} is used separately to compute the corresponding projection matrix Φ^(m), resulting in multiple feature extractors. The group of Φ^(m) (m = 1, ..., K) is then provided to the testing stage. Let s_g^(im) and s_p^(m) be the column vectors of the m-th spectral plane of I_g^(i) and I_p, respectively, where s_g^(im), s_p^(m) ∈ R^{D_m}. In the testing stage, the s_g^(im) and s_p^(m) are separately projected onto the individual Φ^(m) as follows:

f_g^(im) = (Φ^(m))^T s_g^(im) and f_p^(m) = (Φ^(m))^T s_p^(m),  (3)

where 1 ≤ i ≤ N_g and 1 ≤ m ≤ K. Using d(·) as defined in (2), N_g · K distinct distance scores d^(im) = d(f_g^(im), f_p^(m)) are computed. Note that the d^(im) computed for a particular m should be separately normalized to zero mean and unit standard deviation, and each normalized d^(im) is generally transformed into a corresponding confidence value c^(im) prior to classification based on a decision-level evidence fusion strategy [4]; the detailed transformation methods are given in [4]. In the classification step, the identity of I_p is determined from the confidence scores (i = 1, ..., N_g) with a Bayesian posterior probability as follows:

ℓ(I_p) = ℓ(I_g^(i*)) and i* = arg max_i P(ℓ(I_g^(i)) | c^(i1), ..., c^(iK)),  (4)

where the computation of P(ℓ(I_g^(i)) | c^(i1), ..., c^(iK)) depends on the selected combination rule (e.g., the sum rule), as described in [4].
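To make the input-level augmentation pipeline of Section 2.1 concrete, the sketch below implements Eqs. (1)-(2) end to end. This is a minimal illustration, not the authors' implementation: the full-range YCbCr conversion constants, the choice of PCA as the feature extractor, and the Euclidean distance metric are our assumptions for the example.

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an RGB image array (H, W, 3), values in [0, 255], to full-range YCbCr."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128.0
    cr =  0.5 * r - 0.4187 * g - 0.0813 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)

def augment(img):
    """Normalize each spectral plane to zero mean / unit variance, then
    concatenate the K = 3 plane vectors into one augmented vector x."""
    planes = []
    for m in range(img.shape[-1]):
        s = img[..., m].ravel().astype(float)
        planes.append((s - s.mean()) / (s.std() + 1e-12))
    return np.concatenate(planes)

def train_pca(X, n_components):
    """Learn a projection matrix Phi (D x n_components) from training rows X (N_t x D)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components].T

def identify(phi, gallery_feats, labels, x_p):
    """Eqs. (1)-(2): project the probe and return the label of the nearest gallery feature."""
    f_p = phi.T @ x_p
    d = np.linalg.norm(gallery_feats - f_p, axis=1)
    return labels[int(np.argmin(d))]
```

A usage pass would augment every training image, learn `phi` with `train_pca`, precompute `gallery_feats = X_gallery @ phi`, and call `identify` per probe; the decision-level variant of Section 2.2 would instead run `train_pca` once per spectral plane.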

3. EXPERIMENTS TO ASSESS THE IMAGE COMPRESSION MISMATCH EFFECT

Three de facto standard datasets, CMU PIE, Color FERET, and XM2VTSDB, were used to investigate the effect of image compression mismatch on the recognition performance of the color image based FR methods. To construct the training and probe sets, a total of 3,452 color images of 341 subjects were collected from these three datasets. Note that the initially collected images have an uncompressed RGB true color data format (24 bits per pixel). During the collection phase, 1,428 frontal-view images of 68 subjects (21 samples per subject) were selected from CMU PIE; for each subject, the face images span 21 different illumination variations under the 'room lighting on' condition. From Color FERET, 980 frontal-view images of 140 subjects (7 samples per subject) were chosen from the 'fa', 'fb', 'fc', and 'dup1' sets. From XM2VTSDB, 1,064 frontal-view images of 133 subjects were collected from two different sessions. In addition, we constructed a gallery set consisting of 341 distinct color images corresponding to the 341 target individuals. Following the usual regulations for enrolling gallery faces, the gallery images in our experiments have neutral illumination and expression and a frontal-view pose. To compress the collected uncompressed color face images, the JPEG standard image compression scheme was used in our experiments. This is because this color image compression standard is commonly used


in many Web applications, and is also adopted as a common face biometric data format across a wide variety of applications and dissimilar systems, as stated in [6]. Hence, JPEG images are expected to be widely used in practical FR applications as well. The JPEG standard includes two basic compression methods: a DCT-based method for "lossy" compression and a predictive method for "lossless" compression. In our experiments, the lossy technique known as the baseline method was employed, as it has been by far the most widely implemented JPEG method to date [6]. The degree of compression is measured in bits per pixel (bpp); for example, given uncompressed color images at 24 bpp, 0.3 bpp corresponds to an 80:1 compression ratio.
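The bpp arithmetic above can be captured in two small helpers; the function names are ours, not part of any standard API:

```python
def compression_ratio(bpp, uncompressed_bpp=24.0):
    """Compression ratio implied by a target bit rate in bits per pixel.

    For 24-bpp uncompressed RGB, 0.3 bpp gives the 80:1 ratio quoted in the text.
    """
    return uncompressed_bpp / bpp

def bits_per_pixel(file_size_bytes, width, height):
    """Measured bpp of a compressed image file of the given dimensions."""
    return 8.0 * file_size_bytes / (width * height)
```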

Fig. 1. Examples of decompressed JPEG color face images for seven different compression factors. JPEG lossy compression was performed on uncompressed face images (24 bpp) rescaled to 150 × 130 pixels. Note that 4:2:0 chroma sub-sampling was used for all compression factors.

Fig. 1 shows examples of decompressed JPEG images with different compression factors. The images in Fig. 1 were compressed to a given bpp and then decompressed prior to use in the compression mismatch FR experiments. Specifically, we took the original uncompressed face images (leftmost in the first row of Fig. 1) and performed lossy JPEG compression on them at seven different compression factors. Note that the chroma components (Cb, Cr) of the YCbCr color space are down-sampled during compression and later up-sampled during decompression; this up-sampling is known to be handled differently by different programs. Here, bicubic interpolation was employed: each output pixel value is a weighted average of the pixels in the nearest 4-by-4 neighborhood of the chromaticity planes. The color space used at the training and testing stages in both color FR solutions is YCbCr [2], a scaled and offset version of the YUV color space. As the feature extractor described in Section 2, five different methods were used in this experiment: Principal Component Analysis (PCA), Fisher Linear Discriminant Analysis (FLDA), Bayesian, Kernel PCA (KPCA), and Kernel Direct Discriminant Analysis (KDDA). For the kernel-based methods, the radial basis function (RBF) was adopted as the kernel for KPCA and KDDA. For the compression mismatch experiments under decision-level fusion based color FR, the z-score technique [2] was used to normalize the distances, as stated in Section 2.2. In addition, to compute the posterior probability in (4), a sigmoid function followed by unit-sum normalization [1] and the sum rule [4] were used; in [4], the sum rule outperformed other confidence fusion strategies such as the product and median rules. The best found correct recognition rate (BstCRR) [2] was adopted as the recognition performance metric for evaluating the image compression mismatch FR scenarios.

For the experimental protocol, the collected set of 3,452 images was randomly partitioned into training and probe sets. The training set consisted of (5 samples × 341 subjects) face images, with the remaining 1,647 face images (unseen during training) forming the probe set. To guarantee the reliability of the evaluation, 20 runs of random partitions were executed, and all experimental results reported here are averaged over the 20 runs. The compression mismatch FR scenarios that we test are twofold:

• Scenario 1: Only the probe color images are compressed, while the training and gallery color images are uncompressed. This scenario is appropriate for typical surveillance video applications (e.g., detecting criminals in public areas). In such FR applications, it is common for the FR engine built into a central unit to use high-quality training and gallery images [3], while a large number of surveillance cameras (e.g., CCTV) are distributed over various locations to acquire the probe images. For efficient use of channel bandwidth, the captured video frames should be transmitted in a compressed data format.

• Scenario 2: All color images in the training and testing stages are compressed. This scenario suits Web-based FR applications, in which training and gallery data as well as probe data are very likely to have different image and video compression formats. This can lead to compression mismatch between the compressed training and gallery data and the probe data.

[Fig. 2 plots omitted: Rank 1 BstCRR vs. bpp (bits per pixel) for PCA, FLDA, Bayesian, KPCA, and KDDA.]

Fig. 2. Rank 1 BstCRR (identification rate of the top response being correct) comparisons obtained from the five feature extraction methods in the aforementioned compression mismatch FR Scenario 1, with respect to eight different bpp factors. Training and gallery images are uncompressed (24 bpp), while probe images are compressed with one of the seven different compression factors. (a) Input-level augmentation based color FR. (b) Decision-level fusion based color FR.
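The decision-level fusion scoring described above (z-score normalization of per-plane distances, sigmoid confidences with unit-sum normalization, and a sum rule) can be sketched as follows. This is a simplified stand-in for the Bayesian combination of [4]: the sigmoid slope and the exact confidence mapping are our assumptions for illustration.

```python
import math

def zscore(scores):
    """Normalize one plane's distance scores to zero mean, unit std (Sec. 2.2)."""
    n = len(scores)
    mu = sum(scores) / n
    sd = (sum((s - mu) ** 2 for s in scores) / n) ** 0.5 or 1.0
    return [(s - mu) / sd for s in scores]

def sum_rule_identity(distance_matrix):
    """distance_matrix[m][i] is the distance of the probe to gallery subject i
    on spectral plane m.  Returns the index i* with the largest fused posterior
    (sum-rule combination of per-plane confidences, a stand-in for Eq. (4))."""
    n_gallery = len(distance_matrix[0])
    conf = [0.0] * n_gallery
    for plane in distance_matrix:
        for i, d in enumerate(zscore(plane)):
            conf[i] += 1.0 / (1.0 + math.exp(d))  # sigmoid: small distance -> high confidence
    total = sum(conf)
    posterior = [c / total for c in conf]          # unit-sum normalization
    return max(range(n_gallery), key=lambda i: posterior[i])
```

Because both the z-score and the sigmoid are monotone, the subject with the smallest distance on every plane always wins under this rule; disagreements between planes are resolved by the summed confidences.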



[Fig. 3 plots omitted: Rank 1 BstCRR vs. bpp (bits per pixel) for PCA, FLDA, Bayesian, KPCA, and KDDA.]

Fig. 3. Rank 1 BstCRR identification comparisons obtained from the five feature extraction methods in the aforementioned compression mismatch FR Scenario 2, with respect to eight different compression factors of the probe images. The graphs on the left side of Fig. 3 (a) and (b) were obtained from the input-level augmentation based color FR solution, while those on the right side were generated from the decision-level fusion based color FR solution. (a) Training and gallery images compressed to 9.0 bpp. (b) Training and gallery images compressed to 2.3 bpp.

As shown in Fig. 2 (a), in the input-level augmentation based color FR, the BstCRR performance of all five feature extraction methods remains acceptable, compared to the baseline compression matched FR case (i.e., training, gallery, and probe images all uncompressed), as long as the compression factor of the probe images exceeds 0.5 bpp. On the contrary, Fig. 2 (b) shows a noticeable BstCRR drop-off for all feature extraction methods under decision-level fusion based color FR when the compression factor falls below 1.4 bpp. It is reasonable to assume that, in most practical cases, the selected training and gallery images are of relatively high quality even when stored in a compressed data format. Accordingly, 9.0 and 2.3 bpp were adopted as the compression factors of the training and gallery images for evaluating compression mismatch Scenario 2. As the left-side graphs of Fig. 3 (a) and (b) show, there is little difference in the BstCRRs obtained from the input-level augmentation based method with all five feature extraction methods unless the compression factor falls below 0.5 bpp. In the decision-level fusion based method, however, PCA shows a gradual BstCRR decrease as the compression factor shrinks, while the other four feature extraction methods degrade noticeably once the compression factor drops below 0.8 bpp.

4. DISCUSSION AND CONCLUSION

The experimental results show that input-level augmentation based color FR is considerably more robust against image compression mismatch than the decision-level fusion based method. In both mismatch FR scenarios, the BstCRRs of the input-level augmentation approach taken even at 0.5 bpp (48:1 compression) are comparable to those attained in the matched cases for all five feature extraction methods. Since JPEG compression in most applications, including Web platforms, rarely goes below 0.75 bpp, we believe that the input-level augmentation FR design is a good choice for practical applications, which frequently have to deal with the image compression mismatch problem. These empirical results should help FR practitioners implement reliable color FR systems. In future work, we will examine the effect of compression mismatch on local feature based color FR methods. In addition, we will investigate how image enhancement of compressed images affects the FR performance of a color FR system facing the compression mismatch problem.

5. REFERENCES

[1] M. S. Kankanhalli and Y. Rui, "Application Potential of Multimedia Information Retrieval," Proc. IEEE, vol. 96, no. 4, pp. 712-720, 2008.
[2] J. Y. Choi, Y. M. Ro, and K. N. Plataniotis, "Color Face Recognition for Degraded Face Images," IEEE Trans. Systems, Man, and Cybernetics, Part B, vol. 39, no. 5, pp. 1217-1230, 2009.
[3] S. R. Rakshit and D. M. Monro, "An Evaluation of Image Sampling and Compression for Human Iris Recognition," IEEE Trans. Information Forensics and Security, vol. 2, no. 3, pp. 605-612, 2007.
[4] J. Kittler et al., "On Combining Classifiers," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 3, pp. 226-239, 1998.
[5] L. Torres, J. Y. Reutter, and L. Lorente, "The Importance of the Color Information in Face Recognition," Proc. IEEE ICIP, 1999.
[6] ANSI/NIST-ITL 1-2007, American National Standard for Information Systems, "Data Format for the Interchange of Fingerprint, Facial, & Other Biometric Information, Part 1".
[7] P. Shih and C. Liu, "Improving the Face Recognition Grand Challenge Baseline Performance Using Color Configurations Across Color Spaces," Proc. IEEE ICIP, 2006.