Multimed Tools Appl (2008) 38:271–291 DOI 10.1007/s11042-007-0176-x

Robust detection of outliers for projection-based face recognition methods Sid-Ahmed Berrani · Christophe Garcia

Published online: 2 November 2007 © Springer Science + Business Media, LLC 2007

Abstract In this paper, the impact of outliers on the performance of high-dimensional data analysis methods is studied in the context of face recognition. Most existing face recognition methods are based on PCA-like techniques: faces are projected into a lower-dimensional space in which similarity between faces is supposed to be more easily evaluated. These methods are, however, very sensitive to the quality of the face images used in the training and recognition phases. Their performance drops significantly when face images are not well centered or are taken under variable illumination conditions. In this paper, we study this phenomenon for two face recognition methods, namely PCA and LDA2D, and we propose a filtering process that allows the automatic selection of the noisy face images responsible for the performance degradation. This process uses two techniques. The first is based on the recently proposed robust high-dimensional data analysis method RobPCA and is specific to recognition from video sequences. The second is based on a novel and effective face classification technique; it isolates still face images that are not precisely cropped, not well centered or in a non-frontal pose. Experiments show that this filtering process significantly improves recognition rates, by 10 to 30%.

Keywords High-dimensional data analysis · Outliers · Face recognition · Dimensionality curse

Part of this work was supported by the European Commission under contract FP6-001765 aceMedia.

S.-A. Berrani (B) · C. Garcia
Orange—France Telecom Division R&D—TECH/IRIS, 4, rue du Clos Courtel, BP 91226, 35512 Cesson Sévigné Cedex, France
e-mail: [email protected]

C. Garcia
e-mail: [email protected]


1 Introduction

High-dimensional data analysis is a very important research topic with applications in numerous domains such as video indexing, data mining and pattern recognition. The aim is to develop methods that can extract knowledge from and explore high-dimensional datasets. Depending on the domain, the effect of increasing the dimension can be an exponential growth of computing time, a significant degradation of precision or a greater sensitivity to outliers. These problems are usually referred to collectively as the dimensionality curse, an expression introduced by Bellman in [5]. Many papers have studied this phenomenon with respect to specific domains: machine learning [26], similarity searches [2, 6], neural networks [27], etc.

In this paper, we study a specific aspect of this problem within the context of face recognition. In particular, we focus on the sensitivity to outliers of the high-dimensional data analysis methods used in face recognition. These methods are used to reduce the dimension of feature vectors extracted from face images. They aim at projecting face images into a lower-dimensional space while retaining most of the information expressed by the images. The similarity between face images is then evaluated in this projection space. Examples of such methods are Principal Component Analysis (PCA) [25], PCA2D [30], Linear Discriminant Analysis (LDA) and LDA2D [29].

This paper first studies the sensitivity of these methods when face images have not been precisely cropped, are not well centered or have been extracted from images taken under variable illumination conditions. This study is an extension of the one we presented in [7, 8]. As in [7, 8], we study this problem from a statistical point of view and show that "noisy" face images can be viewed as outliers in feature space.
The second contribution of the paper is a filtering procedure that automatically isolates the face images responsible for the performance degradation of face recognition methods. The idea was introduced in our previous work [7, 8]. It consists in filtering out feature vectors that affect the accuracy of the internal statistical quantities used in PCA-like methods (typically the mean and the covariance matrix). In this paper, we propose a complete procedure with two techniques that deal with both cases of face recognition: recognition from video sequences and recognition from still images. The first technique uses robust statistics, in particular RobPCA [13], in order to isolate noisy face images within a sequence of face images; it is suitable when performing face recognition from video sequences. The second technique is specific to recognition from still images. It targets noisy face images that are not precisely cropped, not well centered or in a non-frontal pose. Each face image is analyzed separately and noisy face images are isolated using a novel and effective face classification technique that relies on a new statistical distance.

Throughout this paper, we illustrate the introduced concepts using three face databases and two statistical methods for face recognition, namely PCA and LDA2D.

The rest of the paper is organized as follows. Section 2 presents the face recognition problem and gives a brief overview of state-of-the-art face recognition methods, focusing on statistical approaches. In Section 3, the sensitivity to outliers of


statistical methods for face recognition is studied. The notion of noisy face images is first introduced; then, a set of experiments showing the impact of noisy face images is presented; finally, the obtained results are analyzed. Section 4 discusses how noisy face images and the outliers they generate can be dealt with, and presents procedures for isolating noisy face images in both cases, recognition from video sequences and recognition from still images. Experimental results showing the effectiveness of the proposed solution are presented in Section 5. Section 6 concludes the paper and outlines future extensions.

2 The face recognition problem

During the past decade, automatic recognition of human faces has grown into a key technology and has been extensively studied due to its importance in domains such as biometrics, video surveillance and multimedia indexing. The basic objective of a face recognition system is to identify an unknown face using a database of known faces. In order to assess state-of-the-art performance, systematic empirical evaluations of face recognition techniques (FRT) have been conducted, including the FERET [20], FRVT 2000 [9], FRVT 2002 [18] and XM2VTS [16] protocols.

Face recognition techniques can be roughly classified into three groups: holistic matching methods, where the face image is classified as a whole; feature-based (structural) matching methods, where specific facial features and their spatial distribution are analyzed; and hybrid methods, where global and local approaches are fused. Newer approaches tend to use three-dimensional morphable models and 2D video faces. A detailed survey of existing methods is presented in [33]. Given their popularity in the face recognition community, in this paper we focus on statistical methods applied to 2D gray-level face images.

Like any other pattern recognition task, face recognition can be defined as a two-step process: feature extraction and classification. Since the seminal work of Sirovich and Kirby [24], statistical projection-based methods have been widely used for facial feature extraction. Turk and Pentland [25] proposed the well-known Eigenfaces method, based on PCA, in which a face image is represented as a weighted sum of a collection of images (eigenfaces) that define a facial basis. The principle of this method is to use the statistical properties of the feature vectors associated with face images to compute a projection subspace. Face images are projected into this subspace and their similarity is evaluated as a Euclidean distance.
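As a concrete illustration of the Eigenfaces pipeline just described, here is a minimal NumPy sketch (function and variable names are ours, not taken from the paper): it fits a PCA subspace from a stack of face images and identifies a query by nearest neighbor in that subspace.

```python
import numpy as np

def eigenfaces_fit(faces, k):
    """Compute a k-dimensional PCA subspace from a stack of face images.

    faces: array of shape (n, h, w); returns (mean_vector, projection_matrix).
    """
    n = faces.shape[0]
    X = faces.reshape(n, -1).astype(float)     # concatenate rows -> (n, h*w)
    mu = X.mean(axis=0)
    Xc = X - mu
    # Eigen-decomposition of the small n x n Gram matrix avoids forming
    # the full (h*w) x (h*w) covariance matrix explicitly.
    evals, evecs = np.linalg.eigh(Xc @ Xc.T)
    order = np.argsort(evals)[::-1][:k]
    P = Xc.T @ evecs[:, order]                 # eigenfaces, shape (h*w, k)
    P /= np.linalg.norm(P, axis=0)             # normalise each component
    return mu, P

def eigenfaces_identify(query, gallery, labels, mu, P):
    """Nearest-neighbour identification in the projection space."""
    q = (query.reshape(-1).astype(float) - mu) @ P
    G = (gallery.reshape(gallery.shape[0], -1).astype(float) - mu) @ P
    d = np.linalg.norm(G - q, axis=1)          # Euclidean distances
    return labels[int(np.argmin(d))]
```

The Gram-matrix trick used here is the standard way to make the eigendecomposition tractable when the number of training images n is much smaller than the pixel count h·w.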
In this method, the feature vector is obtained by concatenating the rows or the columns of the face image, and the projection subspace is the subspace spanned by the eigenvectors of the covariance matrix of the feature vectors. The identification of an unknown face is achieved by finding the known face image in the database whose projection vector is the closest to that of the face image to be recognized (nearest-neighbor classification).

As another way to perform face recognition, Belhumeur et al. [4] introduced the Fisherfaces method, based on Linear Discriminant Analysis (LDA), also known as Fisher Discriminant Analysis, in which the class information, i.e. the identity of each face image, is taken into account while building the face space in order to enhance the separation between the different classes. This supposes that several face images are


available for each person stored in the database. The projection space is built so that the projected face classes form clusters that are as compact as possible, with centers as far away as possible from each other (here, a class refers to the set of feature vectors extracted from the face images of a given person). Hence, LDA is carried out via a scatter matrix analysis. Once again, the identification of an unknown face is usually performed via nearest-neighbor classification.

In PCA-based and LDA-based face recognition methods, the h × w 2D face images must first be transformed into 1D image vectors of size h·w, which leads to a high-dimensional image vector space in which the statistical analysis, i.e. covariance matrix calculation and eigensystem resolution, is costly, difficult and possibly unstable. To overcome these drawbacks, Yang et al. [30] proposed the Two-Dimensional PCA (PCA2D) method, which performs PCA directly on the face image matrices, keeping their 2D structure. They have shown on various databases that PCA2D is more efficient and more robust than PCA when dealing with face segmentation inaccuracies, low-quality images and partial occlusions. More recently, Visani and Garcia [29] proposed the Two-Dimensional-Oriented Linear Discriminant Analysis (LDA2D) approach, which applies LDA to image matrices. They have shown on various face image databases that LDA2D is more efficient than both PCA2D and the Fisherfaces method for the task of face recognition, and that it is more robust to variations in lighting conditions, facial expressions and head poses. LDA2D also brings an important gain in storage and better numerical stability. In this paper, we perform experiments using the PCA and LDA2D methods.
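The PCA2D idea of working directly on image matrices can be sketched as follows, assuming the usual formulation in which a w × w image covariance matrix is built from the centered image matrices and each face is projected onto its leading eigenvectors (a simplified illustration, not the authors' code).

```python
import numpy as np

def pca2d_fit(faces, k):
    """2D-PCA: leading eigenvectors of the w x w image covariance matrix.

    faces: array of shape (n, h, w); returns the (w, k) projection axes.
    """
    mean = faces.mean(axis=0)                  # (h, w) mean face
    w = faces.shape[2]
    G = np.zeros((w, w))
    for A in faces:
        D = A - mean
        G += D.T @ D                           # accumulate image covariance
    G /= faces.shape[0]
    evals, evecs = np.linalg.eigh(G)
    return evecs[:, np.argsort(evals)[::-1][:k]]

def pca2d_features(face, X):
    """Feature matrix of a single face: (h, w) @ (w, k) -> (h, k)."""
    return face @ X
```

Note that the feature of each face is a small h × k matrix rather than a long vector, which is where the storage gain mentioned above comes from.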

3 The impact of outliers

As explained in the previous section, statistical methods are mainly based on the analysis of the covariance matrix of feature vectors. They are therefore likely to be sensitive to outliers, as they rely on first- and second-order moments. For example, in the case of PCA, only the first few components are kept. These are supposed to encode most of the information expressed by the data; they correspond to the directions along which the variance is maximal. However, if the dataset contains too many noisy vectors, the first components will encode only the variations due to the noise, and not the variations containing the information necessary to differentiate faces, i.e. the attributes and shape of the face. In the case of LDA, in addition to the impact of outliers on the scatter matrices involved in the computation of the projection space, noisy vectors introduce important overlap between classes and therefore reduce the performance of the classifier.

One of the main challenges of face recognition methods is to deal with the variations that might affect face images. These variations can be due to the acquisition process (important illumination variations, poor image quality, low image resolution, compression artifacts), to changes in facial expression and/or head pose, or to local facial occlusions. Moreover, as a necessary preliminary step, faces have to be automatically detected and robustly localized. Over the last decade, automatic face detection has become a very active research field [11, 28, 31] and several solutions can now be considered mature tools, like the CFF method [10], even though progress


is yet to be made in full-profile view detection and accurate facial feature detection, which would allow efficient face alignment through precise face cropping.

A robust face recognition method is therefore supposed to recognize a face despite these variations. In practice, this is far from being the case, as confirmed by the extensive experiments performed during FRVT 2002. Under controlled indoor lighting conditions, the best performing face recognition systems reached a good recognition rate of 90% on a database of 600 people with perfectly cropped frontal images. However, recognition rates dropped significantly, to 51%, for the same people when images were taken outdoors under uncontrolled lighting conditions. It has also been shown that state-of-the-art systems perform very poorly when faces are not in a frontal position, with average recognition rates of 30% for left/right turned faces and 42% for faces looking up/down.

In the following, we consider a face image as noisy if it presents a variation that the face recognition method cannot handle and that results in performance degradation. We propose to classify these variations into three categories:

1. Important illumination changes,
2. Imprecise face cropping,
3. Non-frontal poses.

Some face images presenting these kinds of variations are shown in Fig. 1. Face images from the second category usually arise from the automatic extraction of faces from still images or video sequences; they are due to the imprecision inherent to face detection methods. To illustrate the impact of these variations on statistical methods for face recognition, we have evaluated the performance of PCA and LDA2D on three databases that contain noisy face images. The obtained results and their analysis are presented in the two following subsections.

Fig. 1 Examples of noisy face images


3.1 Experimental evaluation of the impact of noisy face images

To evaluate the impact of noisy face images, we have used three databases. For each one, we have performed training with PCA and LDA2D and evaluated the recognition rates twice: a first time using all the images contained in the database, including noisy ones, and a second time without taking the noisy face images into account. In the latter case, noisy face images were manually removed from the databases. The three databases are as follows.

– The Asian Face Image Database PF01¹ is a database of 107 people with 17 different views per person: one in normal conditions, four taken under important illumination variations (e.g. the first two images of the first row of Fig. 1), eight with slight pose variations and four with facial expression variations (smiling, looking surprised, etc.). In this experiment, we have focused on the illumination variations and considered as noisy the four views taken under important illumination variations. Two of these four views were included in the training set and the two others in the test set. The 13 other images were randomly divided into two subsets: nine for training and four for testing.

– FDB15 is a database of 15 people that we created specially for the needs of this paper, by automatically extracting faces from video sequences using the CFF² face detector [10]. In the training set, we used 15 well-framed images and six noisy face images per person; in the test set, five well-framed images and five noisy face images per person. In this database, the considered variations concern the position of the face in the image and the head pose. Note that the training and test sets were created manually.

– PIE* is a subset of the CMU Pose, Illumination and Expression (PIE) database [23], which contains 41,368 images of 68 people; each person is taken under 13 different poses, 43 different illumination conditions and four different expressions. We constructed PIE* by selecting 30 people from PIE. For each person, we selected 20 views: 11 frontal views under correct illumination conditions and nine views combining pose and illumination variations. In the training set, we included seven well-framed images and five noisy images; the remaining images (four well-framed and four noisy) were used for testing. This database therefore aims at studying the impact of very noisy images due to both illumination and pose variations.

All the images in the three databases were resized to 65 × 75 pixels and underwent histogram equalization in order to reduce the effect of illumination variations.
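A minimal sketch of this preprocessing step, assuming a nearest-neighbour resize and classic histogram equalization on 8-bit grayscale images (the interpolation actually used by the authors is not specified, and we take 65 × 75 as width × height).

```python
import numpy as np

def resize_nearest(img, out_h=75, out_w=65):
    """Nearest-neighbour resize (stand-in for any interpolation routine)."""
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def equalize_histogram(img):
    """Classic histogram equalization on an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]              # cdf of the darkest present level
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    return lut[img]                        # remap every pixel through the LUT
```

Histogram equalization spreads the gray-level distribution over the full [0, 255] range, which partially compensates for global illumination differences between images.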

¹ Available at the URL: http://nova.postech.ac.kr/.

² The "Convolutional Face Finder" is a near-real-time neural-based face detector. It has been designed to precisely locate multiple faces of minimum size 20 × 20 pixels and variable appearance, rotated up to ±30° in the image plane and turned up to ±60°, in complex real-world images. A detection rate of 90.3% with eight false positives has been reported on the CMU test set, the best result published so far on this test set.

[Figure: three plots (PF01, PCA; FDB15, PCA; PIE*, PCA) of recognition rate (%) versus number of components, each comparing "Without noisy faces" and "Using all faces".]

Fig. 2 Impact of outliers on recognition capabilities of PCA

Following the two-phase evaluation protocol, the impact of noisy face images on PCA and LDA2D has been measured on the three databases. The obtained results are presented in Figs. 2 and 3. These two figures show that, when all the face images, including noisy ones, are used, the recognition rates of LDA2D and PCA decrease by about 10% for the databases PF01 and FDB15. For PIE*, in which very noisy face images were included, the recognition rate drops by more than 30% for both PCA and LDA2D. These results clearly show the significant performance degradation due to noisy face images. We can also note that this degradation is independent of the number of components retained during the analysis. We recall that this number of components corresponds to the dimension of the projection space in the case of PCA, and to the number of row-oriented discriminant (ROD) components in the case of LDA2D; the dimension of the LDA2D projection space is hence the number of ROD components multiplied by the height of the face images.

3.2 Result analysis

In order to provide evidence of the relation between noisy face images, their impact on low-order statistics and the degradation of recognition performance, we have studied the eigenvalues obtained when PCA is performed on the database with and

[Figure: three plots (PF01, LDA2D; FDB15, LDA2D; PIE*, LDA2D) of recognition rate (%) versus number of components, each comparing "Without noisy faces" and "Using all faces".]

Fig. 3 Impact of outliers on recognition capabilities of LDA2D

without noisy face images. First, if we examine the first three eigenvalues from PCA, we notice that the ratio of their sum to the sum of all the eigenvalues is much larger when noisy face images are included in the analysis. We recall that this ratio corresponds to the proportion of the variation expressed by the corresponding principal components. Table 1 gives the ratio of the first three eigenvalues w.r.t. the sum of all eigenvalues for PF01 and FDB15. It shows that, despite their relatively small number, the noisy face images in the training sets significantly alter the PCA results, and hence decrease the recognition rates, as they affect the discriminant information encoded by the first principal components.

On the other hand, the absolute values of the first eigenvalues are much greater when noisy face images are included in the PCA. Figure 4 shows the huge difference between the eigenvalues on PIE*. This means that noisy face images introduce important

Table 1 Ratio of the first three eigenvalues w.r.t. the sum of all eigenvalues

                               PF01 (%)    FDB15 (%)
Using all face images          35.51       35.02
Without noisy face images      25.68       29.13
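The variance ratio reported in Table 1 can be computed directly from a PCA of the training vectors. A small NumPy sketch (illustrative names; the synthetic data in the usage example below are ours):

```python
import numpy as np

def top_eigenvalue_ratio(X, m=3):
    """Share of total variance carried by the m largest PCA eigenvalues.

    X: (n, d) matrix of feature vectors, one face image per row.
    """
    Xc = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))
    evals = np.sort(evals)[::-1]          # descending order
    return evals[:m].sum() / evals.sum()
```

For instance, if one direction of the data carries most of the variance (as happens when strong illumination artifacts dominate), the ratio approaches 1; on isotropic data of dimension d it stays near m/d.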

Fig. 4 Fifteen first eigenvalues of PIE* with and without noisy face images

[Figure: plot of the 15 first eigenvalues (×10⁹) of PIE* by index, comparing "Using all faces" and "Without noisy faces".]

additional variations into the first eigenvalues, corrupting the information that should be encoded by them and on which recognition heavily depends.

3.3 Discussion: How to deal with outliers?

The experiments presented in Section 3.1 have shown that a few noisy face images can significantly reduce the performance of statistical face recognition methods. To deal with this problem, two orthogonal solutions can be considered. The first consists in developing methods that are able to cope implicitly with noisy face images; this is the aim of most face recognition techniques developed so far but, until now, it has remained an unreached objective. The second solution rests on the simple observation that, in most cases, both learning and identification are based on multiple face images per person; the solution can therefore be to discard the noisy face images. In this paper, we focus on this second solution and propose a novel and efficient procedure to spot and isolate noisy face images.

4 The proposed solution

Filtering out noisy face images is needed twice in the overall face recognition process: (1) off-line, to filter noisy face images out of the training set, and (2) on-line, to decide whether or not to perform recognition for a given query face image. Isolating noisy face images depends on the way face recognition is performed. When face recognition is performed from video sequences, a set of face images of the person to learn or to recognize is extracted from the video frames; these face images are temporally correlated and can be analyzed together in order to isolate noisy ones. When performing face recognition from still images, however, each face image has to be analyzed and classified separately.


Therefore, we propose two different solutions, one for the case where face recognition is performed from video sequences and one for the case where it is performed from still images.

4.1 Case 1: Recognition from video sequences

In this case, during the training phase, one video sequence is provided for each person to learn, and a set of face images is extracted from the video frames. If we assume that the proportion of noisy face images in this set is relatively low, we can consider the filtering problem from a statistical point of view: it can be viewed as an outlier detection problem. A face image is considered noisy if the associated feature vector is isolated as an outlier. The assumption on the proportion of noisy images during the training phase is not very restrictive, as training is done off-line and only once, and can therefore be performed in controlled conditions.

During the recognition phase, however, there is no restriction on the proportion of noisy face images, so the solution has to be modified. We propose to insert each query face image into a set of well-framed face images from the training set and then apply the same outlier detection procedure; the query face image is noisy if the corresponding feature vector is isolated as an outlier.

Feature vectors can be obtained simply by concatenating the rows or the columns of each face image. However, as we are only concerned with the head pose and the global illumination of the face in the image, we suggest reducing the resolution of the images beforehand. This accelerates the filtering process and avoids taking small details of the images into account. In the remainder of this section, we present an overview of outlier detection techniques and then focus on the technique we have chosen.

4.1.1 Outlier detection techniques

The outlier detection problem has been thoroughly studied; the interested reader can refer to [12] for a complete description of the problem, its applications and the state of the art of existing methods. In this section, we only recall different definitions of outliers and give a brief overview of the most recent outlier detection methods.

In the literature, different definitions of outliers have been given depending on the considered application. An outlier can refer to a vector that deviates markedly from the other members of the dataset, or to an observation that appears inconsistent with the remainder of the dataset [3]. It can also refer to surprising veridical data [14], to a noisy point lying outside a set of defined clusters, or to a point lying outside clusters but separated from noise [1]. Methods designed to deal with these different kinds of outliers are referred to as outlier detection, novelty detection, anomaly detection, noise detection, etc. [12]. They can be divided into three classes: (1) statistical methods, which use vector distribution models to characterize the proximity or the membership of a vector to a set of vectors; (2) neural network-based methods, which use neural training to isolate outliers; and (3) machine learning methods, which make use of decision trees and clustering techniques.

In the context of face recognition, no prior knowledge of the data is available; methods that use supervised learning or a predefined data model can therefore not be applied. Moreover, our notion of outliers differs slightly from the definitions given above: our main concern is the selection and removal of vectors


that affect the accuracy of the first- and second-order moments involved in the face recognition process, i.e. vectors whose deviation from the rest of the dataset is large enough to affect the accuracy of these moments. We have therefore chosen the RobPCA method introduced by Hubert et al. [13]. It is a statistical method that performs a robust principal component analysis, i.e. it finds principal components that are less influenced by outliers, and it also provides a useful way to flag outliers. The main idea of RobPCA is to find the subset of vectors that allows computing reliable basic statistics; these are then used to compute robust principal components and to isolate outliers.

4.1.2 Outlier detection using RobPCA

RobPCA [13] combines the ideas of two different approaches to the robust estimation of principal components. The first approach seeks a subset of vectors whose covariance matrix has the smallest determinant, i.e. the subset that is most compact in space; the mean and the covariance matrix are computed on this selected subset. The second approach uses Projection Pursuit techniques: the idea is to maximize a robust measure of spread to find the principal axes sequentially.

To estimate the robust mean (μ̂) and the robust covariance matrix (Ĉ) of a dataset X_{n,d} of n d-dimensional vectors, RobPCA proceeds in three steps:

1. Data vectors are processed using a classical PCA. The objective is not to reduce the dimension but only to remove superfluous dimensions.

2. The h "least outlying" vectors are searched for, where h < n and n − h is the maximum expected number of outliers. To do so, a measure of "outlyingness" is used, computed by projecting all the vectors on a set of lines and measuring the degree of outlyingness of each vector w.r.t. the spread of the projections. PCA is then applied to the h vectors found and the dimension is reduced.

3. The final μ̂ and Ĉ are estimated using a Minimum Covariance Determinant (MCD) estimator: they are computed on the h vectors whose covariance matrix has the smallest determinant. To find these vectors, the FAST-MCD algorithm [21] is applied. The principle of FAST-MCD is to draw a set of random subsets and to refine each of them iteratively:

   – compute the mean (m) and covariance matrix (C) of the current h vectors;
   – compute the C-Mahalanobis distances of all the vectors to m;
   – choose a new set composed of the h vectors with the smallest Mahalanobis distances. The determinant of the covariance matrix of these new h vectors is smaller than (or equal to) the determinant of C.
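The refinement loop above is the concentration step ("C-step") of FAST-MCD. Below is a toy sketch of the idea, assuming random h-subsets as starting points rather than the elemental subsets of the original algorithm (all names are illustrative, and no consistency correction factors are applied):

```python
import numpy as np

def c_step(X, subset, h):
    """One concentration step: re-select the h points with the smallest
    Mahalanobis distance to the current subset's mean/covariance."""
    m = X[subset].mean(axis=0)
    C = np.cov(X[subset], rowvar=False)
    diff = X - m
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff)
    return np.argsort(d2)[:h]

def fast_mcd(X, h, n_restarts=20, seed=0):
    """Toy FAST-MCD: random h-subsets refined by C-steps until convergence;
    keep the subset whose covariance has the smallest determinant."""
    rng = np.random.default_rng(seed)
    best, best_det = None, np.inf
    for _ in range(n_restarts):
        subset = rng.choice(len(X), size=h, replace=False)
        for _ in range(100):
            new = c_step(X, subset, h)
            if set(new) == set(subset):    # converged: subset unchanged
                break
            subset = new
        det = np.linalg.det(np.cov(X[subset], rowvar=False))
        if det < best_det:
            best, best_det = subset, det
    return X[best].mean(axis=0), np.cov(X[best], rowvar=False)
```

Each C-step cannot increase the covariance determinant, which is why the iteration converges; the multiple restarts guard against local minima.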

This procedure is repeated until convergence, i.e., until no further improvement is obtained. Once μ̂ and Ĉ have been estimated, the vectors are projected into a lower-dimensional space defined by the eigenvectors of Ĉ. Let Y_{n,k} be the new data matrix:

\[ Y_{n,k} = \left( X_{n,d} - \mathbf{1}_n \hat{\mu}^{t} \right) P_{d,k}, \tag{1} \]


where 1_n is an n-dimensional vector with all components equal to 1 and P_{d,k} is the projection matrix, computed from a spectral decomposition of Ĉ:

\[ \hat{C} = P_{d,k}\, L_{k,k}\, P_{d,k}^{t}, \tag{2} \]
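Equations (1)–(2) amount to an eigendecomposition of the robust covariance followed by a centered projection, from which the score and orthogonal distances used below for flagging can be computed. A small NumPy sketch (illustrative names, not the authors' implementation):

```python
import numpy as np

def robust_projection(X, mu_hat, C_hat, k):
    """Project data onto the k leading eigenvectors of the robust covariance
    (Eqs. 1-2): returns scores Y, projection matrix P and eigenvalues L."""
    evals, evecs = np.linalg.eigh(C_hat)
    order = np.argsort(evals)[::-1][:k]
    P = evecs[:, order]                  # P_{d,k}
    L = evals[order]                     # l_1 >= ... >= l_k
    Y = (X - mu_hat) @ P                 # Eq. (1)
    return Y, P, L

def outlier_distances(X, mu_hat, Y, P, L):
    """Score distance D1 and orthogonal distance D2 of each vector."""
    D1 = np.sqrt(((Y ** 2) / L).sum(axis=1))
    residual = (X - mu_hat) - Y @ P.T    # component outside the subspace
    D2 = np.linalg.norm(residual, axis=1)
    return D1, D2
```

When k equals the full dimension, the orthogonal distance D2 vanishes; with k smaller, it measures how far each vector lies from the projection subspace.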

where L_{k,k} is the diagonal matrix of the eigenvalues l_1, ..., l_k. The outliers are then determined by analyzing the distributions of the two following distances, computed for each vector i:

\[ D1_i = \sqrt{\sum_{j=1}^{k} \frac{y_{ij}^{2}}{l_j}}, \qquad D2_i = \left\| x_i - \hat{\mu} - P_{d,k}\, y_i^{t} \right\|. \tag{3} \]

Distance D1 is the distance to the robust center of the vectors: it evaluates the proximity of x_i to the cloud of vectors in the projection space. Distance D2 is the orthogonal distance to the projection space. Two thresholds are then derived from the distributions of these distances: if at least one of the two distances of a vector is greater than the associated threshold, the vector is considered an outlier.

The distribution of D1 can be approximated by a χ²_k distribution, because it is a Mahalanobis distance of normal vectors; a value of the associated threshold can therefore be χ²_{k,0.975}. The distribution of D2, however, is not exactly known. Hence, we use the approximation proposed in [13]: D2 to the power 2/3 is approximately normally distributed. The associated threshold is hence (m̂ + σ̂ z_{0.975})^{3/2}, where m̂ and σ̂ are respectively the robust estimations of the mean and the standard deviation, and z_{0.975} is the 97.5% quantile of the normal distribution.

4.2 Case 2: Recognition from still images

When performing recognition from still images, the problem is more difficult: each face image has to be analyzed separately to determine whether it is noisy or not. It is hence a binary classification problem, in which positive examples are well-framed face images and negative examples are noisy face images. In this section, we focus on noisy face images that are not well centered or that are in a non-frontal pose; illumination changes are relative and cannot be defined in a general manner, so it is difficult to build a classifier for that kind of noisy face image.

In the literature, the techniques that can be used to classify well-framed face images are facial pose classification techniques.
The facial pose classification problem has been extensively studied and many techniques have been proposed. A first category of techniques performs classification from videos, where tracking algorithms are used to estimate, for each frame, the face model and its pose [22, 32]. These techniques are limited to videos and cannot be used for still images. Other techniques use facial features (the eye centers, the tip of the nose and the center of the mouth) along with some biometric rules in order to deduce the pose of the face [15]. These techniques are inherently limited by the drawbacks and the imprecision of automatic facial feature detection: such detectors are not precise enough to make the applied biometric rules reliable, in particular in the presence of profile poses. Another class of techniques uses PCA [17, 25] or binary masks [15] to characterize a specific pose. These techniques are widely referenced; however, they are generally very sensitive to face occlusions and external variations (glasses, mustaches, etc.). Based on this state-of-the-art study, we have noticed that there is no technique that can be directly applied to our problem. We have therefore proposed a novel solution, and we have selected and implemented the well-known PCA-based classification method [17, 25] to carry out a comparative study.

4.2.1 Proposed classification technique

The proposed solution is based on an off-line training phase in which a set of well-framed face images (positive examples) and a set of noisy face images (negative examples) are required. Even if the positive examples are all correctly chosen, they are generally heterogeneous: they may contain neutral face images but also face images with glasses, with or without mustaches, smiling faces, etc. We therefore propose to partition the set of positive examples into a set of more homogeneous subsets and to build a specialized classifier for each subset. This improves the performance of each classifier, as it does not have to deal with important variations. This partitioning can be performed automatically, using a clustering algorithm such as k-means, or manually by an expert. For each subset of positive examples, a classifier is built. To build the classifiers, a feature vector is extracted from each face image. As usual in this field, the chosen feature vector is simply the concatenation of the gray-level values of the pixels of the image. The resolution of the face images is, however, reduced beforehand: a low resolution is sufficient, as we are only interested in the pose and the position of the face in the image.
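As a concrete illustration of the automatic partitioning step mentioned above, the following is a minimal k-means sketch in numpy (the function name `kmeans_partition` and the parameter defaults are ours, not from the paper); each resulting group of positive examples would then feed one specialized classifier:

```python
import numpy as np

def kmeans_partition(X, k, iters=50, seed=0):
    """Split the (n, d) matrix of positive-example feature vectors X
    into k more homogeneous subsets with plain k-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize centers from k distinct data points
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each vector to its nearest center
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned vectors
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

In the paper's setting, X would hold the concatenated gray-level vectors of the reduced-resolution positive examples.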
The classification method we propose is based on a statistical distance that measures how close a feature vector is to a set of feature vectors. For a subset E_1 of positive examples having the set of feature vectors {v_1, v_2, ..., v_n}, the distance of the feature vector q of a new face image w.r.t. E_1 is computed as follows:

    DH(q, E_1) = CD({v_1, v_2, ..., v_n}) − CD({v_1, v_2, ..., v_n} ∪ {q}),    (4)

where CD(A) is the determinant of the covariance matrix of a set of vectors A. This new distance makes use of the homogeneity of the subsets: it measures how close a feature vector q is to a subset E_1 through the impact that inserting q into E_1 has on the homogeneity of E_1. This idea is inspired by work done in robust statistics, where low-order moments are computed using only very homogeneous sets of vectors. Training a classifier thus amounts to computing, for each subset of positive examples, the covariance matrix, its determinant and a decision threshold. This threshold is computed using the set of negative examples and possibly another set of positive examples, if available; it defines the limit above which a vector does not belong to the subset. Online, to classify a face image, the distance DH of the associated feature vector to each of the subsets is computed. The face image is considered well framed if at least one of the computed distances is smaller than the threshold of the corresponding subset.
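Eq. (4) can be sketched in a few lines of numpy (function names are ours). Note that inserting an atypical vector inflates the covariance determinant sharply, while a vector similar to the subset leaves it nearly unchanged; it is this gap that the per-subset decision threshold exploits:

```python
import numpy as np

def cd(A):
    """CD(A): determinant of the covariance matrix of the row vectors of A."""
    return np.linalg.det(np.cov(np.asarray(A, dtype=float), rowvar=False))

def dh_distance(q, V):
    """DH(q, E1) as in Eq. (4): change in the covariance determinant when
    the feature vector q is inserted into the subset with feature vectors V."""
    return cd(V) - cd(np.vstack([V, q]))
```

For a homogeneous subset, a well-framed query gives a DH close to zero, whereas an atypical one drives it far from zero.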


Table 2 Summary of the content of the three used face databases

                                                 PF01   FDB15   PIE*
    Number of people                              107      15     30
    Total number of images                       1819     465    600
    Training set: number of images per person      11      21     12
    Training set: number of noisy images
      per person                                    2       6      5
    Test set: number of images per person           6      10      8
    Test set: number of noisy images per person     2       5      4

5 Experimental results

This section presents experimental results that assess the robustness and the effectiveness of the proposed approach for filtering out noisy face images. In the first subsection, experiments on the RobPCA-based filtering procedure for the case of recognition from video sequences are presented. The second subsection presents the results obtained on still images using the DH-based classifier. In both cases, the impact of noisy face images has been measured on two face recognition methods, PCA and LDA2D. For each method, we have varied the number of components retained when computing the projection space.

5.1 Case 1: Recognition from video sequences

As in the study of the impact of outliers presented in Section 3.1, the evaluation has been conducted using three databases: PF01, FDB15 and PIE*. These databases have been introduced and described in Section 3.1. Table 2 summarizes their content, and a sample of face images from each of them is given in Fig. 5.

Fig. 5 Samples of face images from the three databases (PF01, FDB15, PIE*). Images in the first column are well-framed; the other columns contain noisy face images

Table 3 The selection rates of face images (%)

                    PF01    FDB15    PIE*
    Training set   65.25    63.81   68.89
    Test set       40.03    36.00   51.25

The size of the face images in the three databases has been set to 65 × 75; it is, however, reduced to 32 × 37 during the RobPCA-based filtering process. All the images have undergone a histogram equalization in order to reduce the effect of illumination variations. The RobPCA-based filtering of noisy face images has been carried out on the face images of each person separately, on each training set. The selection rates of face images are presented in Table 3, for the training sets and also for the test sets. These rates have to be compared with the proportion of noisy face images that was included in the database. We can notice that, apart from the PIE* database, the selection rates are greater than the proportion of noisy face images. This can be explained by the fact that the number of noisy face images reported in Table 2 is not the exact number of noisy face images, but only the number of noisy face images we are focusing on w.r.t. the studied variation. For instance, in PF01, we have considered as noisy only the face images with important illumination variations, whereas the other face images contain other kinds of variations, in facial expressions and head poses. To illustrate the filtering procedure, we present in Fig. 6 a set of 21 face images that have been automatically extracted from a video sequence; among them, six are in a non-frontal pose or not well cropped. These images correspond to the training images associated with one person stored in the FDB15 database. The seven images in the last two rows of the figure are those that have been isolated as noisy. This example shows that the filtering method is able to isolate atypical face images, among which are the six non-frontal and not well-centered face images.
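The RobPCA-based filtering step applied here, i.e. the distances D1 and D2 of Eq. (3) and their thresholds from Section 4.1, can be sketched as follows. This is a minimal numpy/scipy sketch, not the authors' implementation: it assumes the robust center, projection matrix and eigenvalues come from an external RobPCA fit, and it uses the median and MAD of D2^{2/3} as stand-ins for the robust mean and standard-deviation estimates:

```python
import numpy as np
from scipy.stats import chi2, norm

def robpca_outlier_flags(X, mu, P, eigvals, q=0.975):
    """Flag rows of X as outliers from the score distance D1 and the
    orthogonal distance D2, with chi-square / normal-approximation cutoffs.
    X: (n, d) vectors; mu: robust center (d,); P: (d, k) projection;
    eigvals: (k,) eigenvalues l_1..l_k."""
    Y = (X - mu) @ P                                  # scores y_i
    D1 = np.sqrt(np.sum(Y ** 2 / eigvals, axis=1))    # distance within the space
    D2 = np.linalg.norm((X - mu) - Y @ P.T, axis=1)   # distance to the space
    t1 = np.sqrt(chi2.ppf(q, df=P.shape[1]))          # D1^2 approx. chi2_k
    d23 = D2 ** (2.0 / 3.0)                           # D2^(2/3) approx. normal [13]
    m = np.median(d23)
    s = 1.4826 * np.median(np.abs(d23 - m))           # MAD-based robust scale
    t2 = (m + s * norm.ppf(q)) ** 1.5
    return (D1 > t1) | (D2 > t2)
```

In the face-filtering setting, X would hold one vectorized 32 × 37 face image per row, and the flagged rows would be discarded before training or recognition.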

Fig. 6 Example of face images extracted from a video sequence. The face images in the last two rows have been automatically isolated as noisy using the RobPCA-based filtering procedure

Fig. 7 Recognition capabilities of PCA with and without an automatic selection of face images using a RobPCA-based filtering procedure (three panels: PF01, FDB15 and PIE*; recognition rate (%) vs. number of components; curves compare using all faces against using only the automatically selected faces)

To assess the ability of this procedure to improve the recognition rates,3 we have then evaluated the recognition rates considering only the selected face images (for both training and testing), and we have compared the results with those obtained without filtering. The results are summarized in Figs. 7 and 8. We can notice that, overall, the filtering procedure improves the recognition rates by 10 to 30%. These improvements are at least equivalent to, and sometimes better than, those obtained with a manual selection of face images (results shown in Figs. 2 and 3). The reason is that, in the experiments presented in Section 3, the selection concerned only some specific variations; for example, only face images presenting important illumination changes were filtered out of PF01. The proposed filtering procedure has, however, isolated all the noisy face images, including those presenting important expression changes or non-frontal poses. This shows that the proposed filtering method is able to handle different variations simultaneously.

3 In these experiments, we give only the recognition rates. As only the single top result is considered, the recognition rate is equal to the precision, which is itself equal to the recall.

Fig. 8 Recognition capabilities of LDA2D with and without an automatic selection of face images using a RobPCA-based filtering procedure (three panels: PF01, FDB15 and PIE*; recognition rate (%) vs. number of components)

5.2 Case 2: Recognition from still images

In order to evaluate the second filtering procedure we propose, for the case of recognition from still images, we have first evaluated our DH-based classification method by comparing its performance to the well-known PCA-based facial pose classification [17, 25]. We have then evaluated the effectiveness of our classifier in isolating noisy face images and in improving the recognition rates of PCA and LDA2D.

5.2.1 DH-based classifier vs. PCA-based classifier

In this comparative study, we focus on the task of frontal pose classification. The objective is to show the effectiveness of our classifier on a common classification task in the field of face image processing and to compare its performance to the PCA-based facial pose classification method.

The PCA-based facial pose classification method relies on the distance-from-feature-space (DFFS). This distance is used to select the eigenspace which best describes the face image to classify. In our case, only one eigenspace is computed (using positive examples of frontal face images). Negative examples (non-frontal face images) are used to determine the threshold of the DFFS above which a face image is considered non-frontal.

This first evaluation has been performed on two different datasets: a subset of the FERET database [19] and a subset of the Stirling University face image database. Due to lack of space, we only present the results obtained on the FERET database; identical results have been obtained on the Stirling database. The subset of FERET we used is composed of 540 frontal face images and 254 non-frontal face images. These sets have been divided as follows:

– 180 frontal face images to train the classifiers.
– 200 frontal face images and 98 non-frontal face images to determine the decision threshold.
– 160 frontal face images and 156 non-frontal face images to test the performance of the classifier.

Table 4 Classification rates of frontal and non-frontal face images using our DH-based classifier and the PCA-based technique

                                  DH (%)   PCA (%)
    Frontal face images            91.21     80.62
    Non-frontal face images        96.81     80.39
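The DFFS on which the PCA-based baseline relies can be sketched as follows; this is a minimal numpy sketch under the usual eigenspace conventions (orthonormal eigenvectors stored as the columns of P), with function names of our choosing:

```python
import numpy as np

def dffs(x, mu, P):
    """Distance-from-feature-space: squared norm of the component of x - mu
    lying outside the eigenspace spanned by the orthonormal columns of P."""
    r = np.asarray(x, dtype=float) - mu
    y = P.T @ r                     # projection onto the eigenspace
    return float(r @ r - y @ y)     # residual energy, by Pythagoras

def is_frontal(x, mu, P, threshold):
    """A face image is classified frontal when its DFFS stays below the
    threshold learned from the negative (non-frontal) examples."""
    return dffs(x, mu, P) < threshold
```

A vector lying inside the eigenspace has a DFFS of zero; the farther it lies outside, the larger the residual energy, which is what the threshold separates.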

In this experiment, the size of the face images has been reduced to 50 × 50. To build the DH-based classifier, the face images of the training set have been manually partitioned into seven homogeneous subsets, i.e., seven specialized classifiers have been created. Table 4 summarizes the classification rates obtained with both techniques. It shows that our solution performs significantly better than the PCA-based method. It should be noted that the rates reported for the PCA-based technique are the best we could obtain when varying its internal parameters.

5.2.2 Effectiveness of the DH-based classifier

In order to evaluate the ability of our solution to isolate noisy face images in the still-image case, we have used the same evaluation protocol as previously. However, we have considered each face image separately: each image is passed through the DH-based classifier in order to decide whether it is noisy or not. In this experiment, we have focused on the FDB15 dataset, whose face images exhibit important pose variations and cropping imprecisions. To train the DH-based classifier, a training set has been created using a set of 116 well-framed face images and a set of 50 noisy face images. These sets have been chosen w.r.t. the criteria we have defined for noisy still face images, that is, non-frontal or not precisely cropped. We have evaluated the recognition performance of both PCA and LDA2D using the FDB15 database. The obtained results are depicted in Fig. 9; they show an improvement of the recognition rate of more than 10% for both methods.

Fig. 9 Recognition capabilities of PCA and LDA2D with and without an automatic selection of face images using the DH-based classifier (two panels: FDB15 with PCA and FDB15 with LDA2D; recognition rate (%) vs. number of components)

6 Conclusions and future work

In this paper, we have analyzed the problem of outliers and their impact on statistical projective methods within the context of face recognition. First, we have shown the sensitivity of these methods to outliers and experimentally evaluated the performance degradation of two methods (PCA and LDA2D) on three different databases. This degradation can reach 30% when the database contains very noisy face images, even if their number is small. We have analyzed this phenomenon and proposed a solution that consists in discarding noisy face images from the face recognition process, during both training and recognition. This solution has been implemented by treating the problem as an outlier detection problem in the case of recognition from video sequences, and as a classification problem in the case of recognition from still images. The experiments performed have shown the effectiveness of the proposed approach, as it increases the recognition rates by 10 to 30%. In our future work, we will extend this study to other face recognition methods (e.g., methods based on neural networks). We will also consider ways of handling the noisy face images that are rejected by the proposed solution: once classified, they can be used to build multiple specific projection spaces for recognition.

References

1. Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the ACM SIGMOD international conference on management of data, Santa Barbara, CA, USA
2. Aggarwal CC (2001) On the effects of dimensionality reduction on high dimensional similarity search. In: Proceedings of the symposium on principles of database systems, Santa Barbara, CA, USA
3. Barnett V, Lewis T (1994) Outliers in statistical data, 3rd edn. Wiley
4. Belhumeur P, Hespanha J, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
5. Bellman R (1961) Adaptive control processes: a guided tour. Princeton University Press
6. Berrani S-A, Amsaleg L, Gros P (2003) Approximate searches: k-neighbors + precision. In: Proceedings of the 12th ACM international conference on information and knowledge management, New Orleans, LA, USA
7. Berrani S-A, Garcia C (2005) Enhancing face recognition from video sequences using robust statistics. In: Proceedings of the IEEE international conference on video and signal-based surveillance, Como, Italy
8. Berrani S-A, Garcia C (2005) On the impact of outliers on high-dimensional data analysis methods for face recognition. In: Proceedings of the international workshop on computer vision meets databases, Baltimore, MD, USA
9. Blackburn D, Bone JM, Phillips JP (2001) Face recognition vendor test 2000: evaluation report. Technical report, http://www.frvt.org
10. Garcia C, Delakis M (2004) Convolutional face finder: a neural architecture for fast and robust face detection. IEEE Trans Pattern Anal Mach Intell 26(11):1408–1423
11. Hjelmas E, Low B (2001) Face detection: a survey. Comput Vis Image Underst 83(3):236–274
12. Hodge VJ, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126
13. Hubert M, Rousseeuw P, Branden KV (2005) ROBPCA: a new approach to robust principal component analysis. Technometrics 47(1):64–79
14. John GH (1995) Robust decision trees: removing outliers from databases. In: Proceedings of the 1st international conference on knowledge discovery and data mining, Montreal, Canada
15. Lin C, Fan K-C (2003) Pose classification of human faces by weighting mask function approach. Pattern Recogn Lett 24:1857–1869
16. Messer K, Matas J, Kittler J, Lüettin J, Maitre (1999) XM2VTSDB: the extended M2VTS database. In: Proceedings of the international conference on audio- and video-based biometric person authentication, Washington, DC, USA
17. Pentland A, Moghaddam B, Starner T (1994) View-based and modular eigenspaces for face recognition. In: Proceedings of the 13th IEEE conference on computer vision and pattern recognition, Seattle, WA, USA
18. Phillips JP, Grother PJ, Micheals RJ, Blackburn D, Tabassi E, Bone JM (2003) Face recognition vendor test 2002: evaluation report. Technical report NIST IR 6965, http://www.frvt.org
19. Phillips PJ, Moon H, Rauss PJ, Rizvi S (2000) The FERET evaluation methodology for face recognition algorithms. IEEE Trans Pattern Anal Mach Intell 22(10):1090–1104
20. Phillips PJ, Wechsler H, Huang J, Rauss P (1998) The FERET database and evaluation procedure for face-recognition algorithms. Image Vis Comput 16(5):295–306
21. Rousseeuw P, Van Driessen K (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics 41:212–223
22. Seo K, Cohen I, You S, Neumann U (2004) Face pose estimation system by combining hybrid ICA-SVM learning and re-registration. In: Proceedings of the Asian conference on computer vision, Jeju, Korea
23. Sim T, Baker S, Bsat M (2002) The CMU pose, illumination, and expression (PIE) database. In: Proceedings of the 5th international conference on automatic face and gesture recognition, Washington, DC, USA
24. Sirovitch L, Kirby M (1987) A low-dimensional procedure for the characterization of human faces. J Opt Soc Am 4(3):519–524
25. Turk M, Pentland A (1991) Eigenfaces for recognition. J Cogn Neurosci 3(1):71–86
26. Verleysen M (2003) Learning high-dimensional data. In: Ablameyko S, Goras L, Gori M, Piuri V (eds) Limitations and future trends in neural computation. IOS Press, pp 141–162
27. Verleysen M, Francois D, Simon G, Wertz V (2003) On the effects of dimensionality on data analysis with neural networks. In: Proceedings of the 7th international work-conference on artificial and natural neural networks, Maó, Menorca, Spain
28. Viola PA, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137–154
29. Visani M, Garcia C, Jolion J-M (2004) Two-dimensional-oriented linear discriminant analysis for face recognition. In: Proceedings of the international conference on computer vision and graphics, Warsaw, Poland
30. Yang J, Zhang D, Frangi AF, Yang J-Y (2004) Two-dimensional PCA: a new approach to appearance-based face representation and recognition. IEEE Trans Pattern Anal Mach Intell 26(1):131–137
31. Yang M, Kriegman D, Ahuja N (2002) Detecting faces in images: a survey. IEEE Trans Pattern Anal Mach Intell 24(1):34–58
32. Yao P, Evans G, Calway A (2001) Face tracking and pose estimation using affine motion parameters. In: Proceedings of the 12th Scandinavian conference on image analysis, Bergen, Norway
33. Zhao W, Chellappa R, Phillips J, Rosenfeld A (2003) Face recognition: a literature survey. ACM Comput Surv 35(4):399–458


Sid-Ahmed Berrani received the Ph.D. degree in Computer Science from the University of Rennes 1, France, in February 2004. His Ph.D. work was carried out at IRISA-INRIA, Rennes, and was funded by Thomson R&D France; it concerned multimedia indexing, content-based image retrieval, and similarity search in very large image databases. He then spent six months as a Research Fellow in the Sigmedia Group at the University of Dublin, Trinity College, where he worked on video indexing and multidimensional data analysis. Since November 2004, Sid-Ahmed Berrani has been a Researcher at France Telecom R&D Rennes, where his research interests include image and video indexing, pattern recognition and image classification.

Christophe Garcia received the Ph.D. degree in Computer Vision from the University of Lyon I, France, in 1994. He has been involved in various computer vision research projects during his three-year stay at the IBM Vision Automation Group, France, and at the Computer Vision Center of the Autonomous University of Barcelona, Spain. As a fellow of the European Research Consortium for Informatics and Mathematics (ERCIM), he spent one year at the German National Research Center (now the Fraunhofer Institute), working toward the development of autonomous multi-agent robots. From 1997 to 2002, he was a researcher at the Foundation for Research and Technology Hellas (FORTH) and a visiting Professor at the Computer Science Department of the University of Crete, Greece, where he taught Artificial Neural Networks and Pattern Recognition. In 2003, he spent ten months at IRISA-INRIA, Rennes, France, working in the field of automatic video structuring and indexing. Since December 2003, Dr. Garcia has been a Senior Expert Researcher at France Telecom R&D, leading research activities in multimedia content indexing. His current research activities are in the areas of neural networks, pattern recognition, image and video analysis, and computer vision. He holds 15 industrial patents and has published more than 60 articles in international conferences and journals. He is currently an associate editor of the International Journal of Visual Communication and Image Representation (Elsevier), Image and Video Processing (Hindawi), Pattern Analysis and Applications (Springer-Verlag) and Pattern Recognition (Elsevier).