Face Recognition from Face Motion Manifolds


– First Year PhD Report –


Ognjen Arandjelović
[email protected]

Department of Engineering

Abstract

Automatic face recognition (AFR) is one of the key means of computer-human interaction. Its applications include security (access control, surveillance), intelligent environments and content-based multimedia retrieval. This report focuses on access-control AFR using video. Two novel face recognition algorithms are introduced, both based on the concept of a Face Motion Manifold (FMM) as a representation of a human face. The first part of the work introduces FMMs and addresses the fundamental issue of face recognition-specific means of robustly comparing them in the presence of noise. The second part of the report extends this work and proposes a novel method for comparing FMMs under severe illumination variation, while achieving significant robustness with respect to pose. A rigorous experimental framework is used to evaluate all the methods described in this report and to compare them with state-of-the-art algorithms from the literature. Consistent recognition rates of 97–100% are demonstrated under loosely controlled illumination and 94–100% under significant illumination changes.

Contents

1 Introduction
  1.1 Motivation
    1.1.1 A case for AFR from video
  1.2 Problem Specification
    1.2.1 Why is face recognition hard?
  1.3 Document Structure

2 Literature Review
  2.1 Neuroscience of Face Recognition
  2.2 Still-shot AFR
    2.2.1 Early work – geometric feature-based methods
    2.2.2 The 1990s – statistical, appearance-based methods
    2.2.3 Neural network-based methods
    2.2.4 Wavelet-based methods
    2.2.5 Shape+appearance generative model-based methods
  2.3 Video-based Recognition
    2.3.1 Simple extensions of still recognition methods
    2.3.2 Video-based recognition with temporal information
    2.3.3 Video-based recognition without temporal information
  2.4 Commercial Face Recognition Systems
    2.4.1 FaceIt®
    2.4.2 FacePASS®
    2.4.3 FacePass®
    2.4.4 FaceKey®
    2.4.5 FaceVACS®
    2.4.6 Other commercial software
  2.5 Conclusions

3 Recognition Framework
  3.1 Imaging Setup
  3.2 Face Video Database
    3.2.1 CamFace Database data acquisition
    3.2.2 Face database description
  3.3 Face Representation
  3.4 Face Motion Manifolds (FMMs)
    3.4.1 To model face dynamics or not?
  3.5 Registration
  3.6 Facial Feature Detection
    3.6.1 Facial feature detection using shape extraction and pattern matching
  3.7 Conclusions

4 Robust Recognition from FMMs - Controlled Illumination
  4.1 Comparing FMMs
    4.1.1 Nearest neighbour approach
    4.1.2 MSM and principal angles
    4.1.3 FMMs as probability distributions
  4.2 Distance Measures for Probability Distribution Functions (pdfs)
    4.2.1 Entropy
    4.2.2 Kullback-Leibler divergence
    4.2.3 Limitations of Kullback-Leibler divergence
    4.2.4 Jensen-Shannon divergence
    4.2.5 Resistor-Average distance
    4.2.6 Computing KLD, JSD and RAD
  4.3 Principal Component Analysis (PCA)
    4.3.1 Nonlinear manifolds
  4.4 Kernel Principal Component Analysis
  4.5 Combining Information Theory and Kernel PCA
  4.6 Modelling Registration Errors
    4.6.1 Small registration errors
    4.6.2 Large registration errors - outliers
  4.7 Wrapping it all up
  4.8 Conclusions

5 Face Recognition from FMMs - Achieving Illumination Invariance
  5.1 Limitations of the Robust Kernel RAD method
  5.2 Face Pose Clusters
    5.2.1 Pose-semantic clustering
  5.3 Illumination Normalization
    5.3.1 Gamma Intensity Correction
    5.3.2 Illumination subspace normalization
  5.4 Comparing Pose Clusters
  5.5 Unified Manifold Distance
    5.5.1 Likelihoods from annotated corpus
    5.5.2 Estimates using RBF networks
  5.6 Conclusions

6 Evaluation and Results
  6.1 Evaluation Data
  6.2 Evaluation Approach
  6.3 Facial Feature Detector
  6.4 Results
    6.4.1 Loosely controlled illumination
    6.4.2 Varying illumination
  6.5 Conclusions

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
    7.2.1 Detailed comparative evaluation of Robust Kernel KLD
    7.2.2 Alternative representations
    7.2.3 Manifold parameterizations
    7.2.4 Pose robustness
    7.2.5 Provisional research timetable

Chapter 1

Introduction

This chapter introduces the problem of automatic face recognition (AFR) considered in this report. A summary of the motivation and the most important applications of AFR is followed by a section on the difficulties of the problem. An overview of the structure of the report concludes the chapter.

1.1 Motivation

Face recognition is one of the best-known biometrics. With other biometrics, such as fingerprint [76], iris [49, 158] or retinal scan-based ones, having achieved impressively high identification rates [150] (retinal scan ∼10⁻⁷ error rate [57]), one might ask why face recognition would be of any practical interest. For one, face information may be the only cue available, such as in the increasingly important content-based multimedia retrieval applications [8, 136], see Figure 1.1(a). In others, such as in some surveillance environments (Figure 1.1(b)), bad imaging conditions render any single cue insufficient and a fusion of as many biometric cues as practical is needed (e.g. see [31, 32, 134] as well as Section 2.4.4).

Figure 1.1: Automatic face recognition for content-based multimedia retrieval (a) and surveillance (b).

Even for access-control applications (for buildings, computer systems, airports etc.), when more reliable cues are available [57], face recognition has the attractive property of being very intuitive to humans as well as non-invasive, making it readily acceptable by wider audiences. Finally, face recognition does not require user cooperation.


1.1.1 A case for AFR from video

The nature of many practical applications is such that more than a single image of a face is available. In surveillance, for example, the face can be tracked to provide a temporal sequence of a moving face. For access-control use of face recognition the user may be assumed to be cooperative and hence be instructed to move in front of a fixed camera. A number of technical advantages of using video also exist: person-specific dynamics can be learnt [144], or more effective face representations be obtained (e.g. super-resolution images [14] or a 3D face model [26]) than in the single-shot recognition setup. Regardless of the way in which multiple images of a face are acquired, this abundance of information can be used to achieve greater robustness of face recognition by resolving some of the inherent ambiguities (shape, texture, illumination etc.) of single-shot recognition.

1.2 Problem Specification

The problem of face recognition from single images concerns matching a detected (roughly localized) face (or faces) [166, 71] against a database of known faces with associated identities. This task, although very intuitive to humans and despite the vast amount of research behind it (see Chapter 2), still poses a significant challenge to automatic computer-based methods.

1.2.1 Why is face recognition hard?

A number of factors other than one's identity influence the way an imaged face appears. Lighting conditions, and especially the light angle (see Figure 1.2), can drastically change the appearance of a face. Facial expressions, including closed or partially closed eyes, also complicate the problem, just as head pose change does. Partial occlusions, be they artifacts in front of the face (such as glasses) or the result of a change of hair style, growing a beard or a moustache, all pose problems to automatic face recognition methods. Invariance to these factors is the main challenge of automatic face recognition.

Figure 1.2: Imaging conditions, such as head pose and illumination, dramatically change the appearance of faces.

1.3 Document Structure

The organization of this report is as follows. Chapter 2 reviews the literature relevant to automatic face recognition: the neuroscience of human face recognition, AFR from still images and from video, as well as commercial systems, are all covered in this chapter. Chapter 3 presents an introduction to the recognition setup considered in this work, the databases used for comparative method evaluation and the data pre-processing techniques employed. A novel information-theoretic method for face recognition is introduced in Chapter 4. This chapter focuses on the problem of modelling nonlinear FMMs and robustly comparing them in the presence of face registration noise. Chapter 5 begins with a discussion of the limitations of the aforementioned algorithm; the rest of the chapter introduces a novel AFR algorithm that is robust to pose and drastic illumination changes. Comparative evaluation of the methods presented in this report is given in Chapter 6 and the key failure modes are identified. Finally, Chapter 7 concludes the report, discusses promising directions for future research and presents a provisional timetable for the remainder of this PhD.


Chapter 2

Literature Review

Important practical applications of automatic face recognition have made it a very popular research area in computer vision. This is evidenced by the vast number of face recognition algorithms developed over the last three decades and, in recent years, by the emergence of a number of commercial face recognition systems. This chapter presents a review of the face recognition literature. The first section covers some basic neuroscience of human face recognition and its applicability to AFR. It is followed by sections on still-shot and video-based AFR and, finally, a section on the commercial AFR systems available on the market.

2.1 Neuroscience of Face Recognition

While AFR methods need not necessarily mimic the way humans recognize faces, many neuroscience studies are of immediate interest to the AFR community. The understanding of human face perception and AFR have recently been evolving together, very much benefiting from each other [78]. It is therefore instructive to take a brief look at the basics of the former. The neuroscience of face recognition literature is very rich and somewhat outside the scope of this report, so what follows should be regarded only as a concise overview of the topic; for a more detailed treatment see [23]. The most important questions that neuroscience addresses for AFR research are the following:

• Recognition performance – how well do humans recognize faces,
• Mechanistic issues – how humans do it, and
• Emulating performance – modelling of human face perception.

Performance. Humans recognize faces on a daily basis, seemingly effortlessly and with high accuracy. However, humans seldom recognize people using the face as the only cue – other biometric and non-biometric cues (such as gait and hair, or clothes) are frequently used, as well as prior, contextual knowledge. Wide variability (of an order of magnitude) in human ability to recognize faces was reported in the recent work of Adler [4] (also see [69] for a study on the distinctiveness of faces). Of special importance for the research presented in this report is the finding that humans recognize faces better from video sequences than from still shots; the phenomenon was more pronounced for familiar faces and in poor illumination conditions [113].

Mechanics. There is now little doubt that human face recognition is a dedicated process. This is evidenced by worse recognition performance [106], and worse illumination and pose generalization [108], for inverted faces or faces rotated in the image plane [53], as well as by experiments with subjects with pathological conditions such as prosopagnosia [157, 19]. Patients suffering from prosopagnosia, while able to recognize objects as well as detect faces, are unable to recognize faces. It is still, however, a topic of debate whether face recognition is innately a dedicated process or a consequence of the vast experience and exposure from early childhood.

Figure 2.1: The so-called "Thatcher Illusion".

Face representation is a topic of interest both to research in human and in computer face recognition. There is now ample evidence that humans recognize faces using both holistic and local features [106], well demonstrated by the "Thatcher effect" [143] (see Figure 2.1). Recognition performance was also found to be dependent on the distinctiveness of the faces. Non-distinctive faces were found to be the hardest to recognize [15, 69], while caricatural effects did not significantly degrade recognition performance [102], showing that humans automatically select the most discriminative features for recognition, as holistic features are deformed by caricaturization. In general, hair, mouth and eyes were found to be selected for recognition, while the nose contributed insignificantly. Both lower and higher spatial frequency bands seem to be used in recognition, the lower ones corresponding to coarser categorizations (male/female, white/black), while the higher ones are used for finer classification.

Figure 2.2: A model of human face recognition [23]. In the (a) pathway, modelled after Lades et al. [96], the default position of the columns (termed 'jets') of filters is a lattice similar to that of the input layer, but which can be deformed to provide a best match to a probe image. In the (b) pathway, modelled after Wiskott, Fellous, Krüger and von der Malsburg [159], the jets are centred on a particular facial feature, termed a fiducial point.


2.2 Still-shot AFR

Due to the practical limitations in storing and processing large amounts of data, the initial and most of the later work in AFR was on recognition from still images. As, additionally, many of the video-based methods developed later are based on the same or similar concepts as the still-shot ones, we present the still-shot AFR literature review first. Methods are organized into five (sometimes overlapping) groups and are covered in order of increasing sophistication:

1. Early work, 1970-80s – geometric feature-based,
2. The 1990s – statistical, appearance-based,
3. The 1990s – neural networks,
4. The 1990s – wavelet-based, and
5. The 2000s – shape+appearance generative model-based.

The key concepts addressed are those of face representation and, based on a given representation, classification. This is illustrated in Figure 2.3.

2.2.1 Early work – geometric feature-based methods.

The earliest attempts at AFR used a geometric, feature-based approach, first appearing in the works of Kelly [83] and Kanade [79]. These methods use geometric features (such as distances [34] or angles) derived from the locations of characteristic facial features (such as the eyes, see Figure 2.4) to discriminate between faces, typically using a Bayes classifier [34]. In his early work Kanade reported correct identification of only 15 out of 20 individuals in controlled imaging conditions, although Goldstein et al. [64] and Kaya and Kobayashi [82] (also see the work by Bledsoe et al. [28, 37]) showed geometric features to be sufficiently discriminatory if facial features are manually selected.

Figure 2.3: A schematic representation of a general face recognition system. Two key stages can be recognized. In the first, feature extraction stage, input images are processed to form a feature vector – the face 'fingerprint'. In the second, classification stage, the novel person's face is recognized by comparing its feature vector representation with entries in the database of known people.

In practice, geometric feature-based methods suffer from sensitivity to noise in the localization of facial features, especially prominent in difficult illumination conditions; by their very nature, these methods are also very sensitive to pose variation, i.e. camera angle. In the past, due to the compactness of the face representation they use, they were attractive for real-time applications. Nowadays, the enormous increase in raw processing power and readily available digital storage makes this a moot point. More importantly, geometric face representations are inherently robust to illumination changes. For this reason, while the primitive geometric features of the early AFR work have now been largely abandoned, shape and geometry-based features are still used by a number of methods (see Section 2.2.5).
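To make the representation concrete, the following is a minimal sketch of a geometric feature vector of the kind these early methods used. The particular landmarks, the use of all pairwise distances and the inter-ocular normalization are illustrative assumptions, not the specifics of any one published system.

```python
import numpy as np

def geometric_features(landmarks):
    """Build a scale-normalized vector of inter-landmark distances,
    in the spirit of the early geometric feature-based methods."""
    pts = np.asarray(landmarks, dtype=float)        # shape (n, 2)
    n = len(pts)
    # All pairwise Euclidean distances between landmarks.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    feats = d[np.triu_indices(n, k=1)]
    # Normalize by the inter-ocular distance (landmarks 0 and 1 are
    # assumed to be the eye centres) to remove dependence on scale.
    return feats / d[0, 1]

# Hypothetical landmarks: left eye, right eye, nose tip, mouth corners.
face_a = [(30, 40), (70, 40), (50, 65), (38, 85), (62, 85)]
face_b = [(33, 42), (72, 41), (52, 68), (40, 88), (64, 86)]

# Recognition then reduces to comparing feature vectors, e.g. by
# Euclidean distance (a Bayes classifier was typical in the literature).
dist = np.linalg.norm(geometric_features(face_a) - geometric_features(face_b))
```

The compactness of such a vector (here 10 numbers per face) is what made these methods attractive for real-time use, as noted above.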


Figure 2.4: Manually defined facial features [47].

2.2.2 The 1990s – statistical, appearance-based methods.

The recognition of the limitations of geometric feature-based methods in the early AFR work motivated research into alternative representations of faces. The late 1980s/early 1990s gave rise to the first statistical, appearance-based approaches.

Normalized cross-correlation. In the simplest of statistical methods, the similarity of two face images is quantified by the maximal response of their normalized cross-correlation [22]:

    \rho(x, y) = \frac{\sum_i \sum_j I_1(i+x, j+y)\, I_2(i, j)}{\sqrt{\sum_i \sum_j I_1(i+x, j+y)^2 \, \sum_i \sum_j I_2(i, j)^2}}    (2.1)

where I_1(x, y) and I_2(x, y) are two face images, treated as matrices (or, equivalently, discrete functions). It can be seen that this similarity measure is inherently robust with respect to additive and multiplicative illumination changes. To achieve further robustness with respect to expression, illumination and pose, instead of the whole image of a face, most methods use characteristic face regions and combine the corresponding highest cross-correlation scores [34].
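This similarity measure can be sketched directly from equation (2.1); the squared terms under the square root follow the conventional definition of the normalized correlation coefficient, which is assumed here, and the exhaustive sliding-window search is only illustrative (real systems use far more efficient implementations).

```python
import numpy as np

def ncc(patch, template):
    # One term of equation (2.1): correlation normalized by the
    # energies of the two windows.
    num = np.sum(patch * template)
    den = np.sqrt(np.sum(patch ** 2) * np.sum(template ** 2))
    return num / den if den > 0 else 0.0

def max_ncc_response(image, template):
    """Maximal response of the normalized cross-correlation over all
    offsets (x, y) - used as the similarity of the two faces."""
    H, W = image.shape
    h, w = template.shape
    best = -1.0
    for x in range(H - h + 1):
        for y in range(W - w + 1):
            best = max(best, ncc(image[x:x + h, y:y + w], template))
    return best

rng = np.random.default_rng(0)
face = rng.random((32, 32))
# A shifted crop of the same face yields a perfect score at the
# matching offset, illustrating the translation search built into (2.1).
crop = face[4:28, 4:28]
print(round(max_ncc_response(face, crop), 3))  # → 1.0
```

Note that scaling `crop` by a constant leaves the score unchanged, which is the multiplicative illumination robustness mentioned above.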

Linear subspace methods. Most linear subspace-based methods share a common approach to recognition. Given a database of faces and a novel face, all are first projected into a linear subspace. The novel face image is then classified using the nearest-neighbour or the Bayes classifier in this space [156]. The main difference between the methods lies in the choice of the projection space.

Eigenfaces. The "Eigenfaces" approach is one of the first appearance-based methods, first proposed by Turk and Pentland [145, 146]. This approach to face recognition was motivated by previous work by Kohonen [87] on auto-associative memory matrices for the storage and retrieval of face images, and by Kirby and Sirovich [135, 85] on compact representations of images of faces. The projection subspace chosen in Eigenfaces is the one that best approximates the training data in terms of the expected L2 error. This subspace is spanned by the eigenvectors corresponding to the largest eigenvalues (see Figure 2.5) of the data cross-correlation matrix. A number of different distance measures have been proposed for use with Eigenfaces, the best results being reported with the L1 and Mahalanobis norms [52, 21].

Based on the same idea as Eigenfaces, but decorrelating not only the first, but also higher moments of data variation, are methods that use Independent Component Analysis (ICA) [12, 52, 17, 18], see Figure 2.5. Reported performance in comparison with PCA-based methods is inconclusive (e.g. see [17] and [52]).
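The Eigenfaces pipeline just described – PCA projection followed by nearest-neighbour classification – can be sketched as follows. Random data stands in for a real face database, and the image size, subspace dimension and gallery layout are arbitrary choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_pixels, n_components = 40, 16 * 16, 8
train = rng.random((n_train, n_pixels))          # one vectorized face per row
labels = np.repeat(np.arange(10), 4)             # 10 people, 4 images each

# Centre the data and compute the principal directions via SVD (the
# right singular vectors are the eigenvectors of the covariance matrix,
# ordered by decreasing eigenvalue).
mean_face = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean_face, full_matrices=False)
eigenfaces = vt[:n_components]                   # (k, n_pixels)

def project(x):
    return (x - mean_face) @ eigenfaces.T

def identify(probe):
    """Nearest neighbour in the eigenface subspace (L2 norm; the text
    notes L1 or Mahalanobis often work better in practice)."""
    dists = np.linalg.norm(project(train) - project(probe), axis=1)
    return labels[np.argmin(dists)]

# A slightly perturbed copy of a training image should match its owner.
probe = train[13] + 0.001 * rng.standard_normal(n_pixels)
print(identify(probe))  # → 3, the identity of training image 13
```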


Figure 2.5: The first eight eigenvectors computed from 500 randomly selected images from the FERET gallery (top) and eight of the 200 ICA basis vectors computed using the technique of [17] (bottom). Note that the eigenvectors are 'global' in that they often overlap, assigning significant weights to the same pixels, while the ICA basis vectors are more spatially localized and never overlap, unlike their PCA counterparts [12].

The main limitation of Eigenfaces is that the derived features (eigenfaces) are selected to best describe the training classes, not to discriminate between them. Hence, PCA is recognized as more suitable for detection and compression tasks (see [103] and [84]) than for recognition. Since only the eigenvectors corresponding to the highest eigenvalues are actually used in recognition, the discriminating ability of the eigenvectors should ideally decay together with the eigenvalues, something shown not to be the case in the works of Yambor [165] and O'Toole [114] (for discussion also see [70] and [2]). Better results than with the original Eigenfaces were reported when the first 3 principal components were omitted [63]. Put differently, eigenfaces were found to describe well both intra-class and extra-class variation.

Fisherfaces. The Fisherfaces method was developed to address the main limitation of the original Eigenfaces method – that of limited discrimination ability. It is based on Fisher Linear Discriminant Analysis (LDA) [164, 169]. Unlike Eigenfaces, the subspace projection chosen by Fisherfaces is constructed not to best describe the training data, but to best discriminate between the training classes. The objective function that LDA minimizes is the ratio of the projected data within-class and between-class matrix norms. When the data is normally distributed, the constructed projection subspace is also optimal in the probabilistic, maximum likelihood sense. The projection space in Fisherfaces is spanned by the first eigenvectors of the matrix S_W^{-1} S_B, where:

    S_W = \sum_{i=1}^{K} \sum_{x \in X_i} (x - \bar{x}_i)(x - \bar{x}_i)^T    (2.2)

    S_B = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T    (2.3)

where \bar{x} is the mean face, \bar{x}_i the mean of class X_i and n_i the number of samples in class X_i.

A major shortcoming of Fisherfaces is its limited generalization ability. An advantage of the Eigenfaces method lies in its ability to describe novel faces that the algorithm has not seen before, for example due to mild lighting direction changes [55]. On the other hand, the projection subspace used by Fisherfaces is chosen to best discriminate between the training images, an approach that is found to typically perform better than Eigenfaces [164, 155]. However, the projection subspace is sensitive to the particular choice of training images [21]. More invariance to lighting conditions was demonstrated by applying Fisherfaces to Fourier-transformed images [7].

Bayesian Eigenfaces. The Bayesian Eigenfaces method is a generalization of Fisherfaces. In this method, PCA is used to separately characterize intra-class and inter-class variations, unified in a Bayesian framework. The intra-personal subspace is defined as the subspace spanned by the first eigenvectors of the intra-personal covariance matrix, which is the mean of the covariance matrices of the individual training persons. The orthogonal, extra-personal variation

subspace is spanned by the rest of the eigenvectors. In general, the difference vector between a novel input face and a reference face has components in both of these subspaces. Bayesian Eigenfaces classifies the novel face to the same class as the reference if the maximum a posteriori (MAP) estimate of the difference vector being intra-personal is greater than 0.5. Given a difference vector \Delta, and intra-personal and extra-personal subspaces \Omega_I, \Omega_E, the Bayesian MAP estimate gives:

    P(\Delta \in \Omega_I) = P(\Omega_I | \Delta) = \frac{P(\Delta | \Omega_I) P(\Omega_I)}{P(\Delta | \Omega_I) P(\Omega_I) + P(\Delta | \Omega_E) P(\Omega_E)}    (2.4)

The probabilities P(\Omega) are estimated from prior knowledge about the face database used (i.e. considering the number of images per person and the total number of images). On the other hand:

    P(\Delta | \Omega) = \left[ \frac{\exp\left( -\frac{1}{2} \sum_{i=1}^{M} y_i^2 / \lambda_i \right)}{(2\pi)^{M/2} \prod_{i=1}^{M} \lambda_i^{1/2}} \right] \cdot \left[ \frac{\exp\left( -\varepsilon^2(\Delta) / 2\rho \right)}{(2\pi\rho)^{(N-M)/2}} \right]    (2.5)

where y_i are the components of \Delta in \Omega, \lambda_i the corresponding eigenvalues and \varepsilon^2(\Delta) the energy of the component of \Delta in the complementary subspace \bar{\Omega}. The first term in the above expression can be seen as the Mahalanobis distance in \Omega (if the pattern belongs to this subspace, it is expected to behave according to the distribution characterized by the principal components of \Omega), whereas the second term is derived under the assumption that the pattern has added noise in the orthogonal subspace \bar{\Omega}, distributed according to an isotropic Gaussian.

Bayesian Eigenfaces attempts to separate the intra-personal and extra-personal sources of variation in face appearance. Each mode of variation is approximated by a Gaussian p.d.f. This assumption, while providing a convenient expression for the MAP estimate of class membership, does not accurately model these highly nonlinear manifolds. The modelling

constraint that these two distributions are confined to two orthogonal subspaces is also an oversimplification that does not reflect the true complexity of data behaviour. Additionally, another important source of variation in images is not addressed by this method – that of noise. Although noise energy is typically much lower than inter-personal variation energy, since the Mahalanobis distance measure is used (see (2.5)), its contribution can be significantly magnified [153]. Bayesian Eigenfaces was found to typically perform better than the original Eigenfaces [105].

In general, linear subspace methods were found to be very sensitive to image plane transformations [131], such as translation, rotation or scaling [66, 110], making registration a necessary pre-processing step. A fast drop-off in performance was also found with significant head rotation around the vertical axis [12, 16]. This effect cannot be corrected using plane transformations only; to counter it, in [116] the authors used a number of different linear subspaces to represent faces imaged from different angles. A harder problem to deal with is the performance drop-off due to varying expressions, stemming from the holistic nature of these methods. Local methods, such as Eigenfeatures [116] and Eigeneyes [50], were developed in the attempt to overcome this problem, reaching recognition rates better than those of the original Eigenfaces [50]; these also have performance advantages. However, the best results were obtained using a combination of holistic and feature-based features [116].
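As a concrete illustration of the Bayesian decision rule in equations (2.4)–(2.5), the following sketch computes the two subspace likelihoods and the posterior. The subspace bases, eigenvalues, residual variances and priors below are made-up placeholders; a real system would estimate them from training data, as described above.

```python
import numpy as np

def subspace_likelihood(delta, basis, eigvals, rho, n_total):
    """P(delta | Omega) for a PCA subspace, equation (2.5): an M-dim
    Gaussian in the subspace times an isotropic Gaussian (average
    residual eigenvalue rho) in the (N - M)-dim complement."""
    m = len(eigvals)
    y = basis @ delta                               # components in Omega
    eps2 = np.sum(delta ** 2) - np.sum(y ** 2)      # residual energy
    in_space = np.exp(-0.5 * np.sum(y ** 2 / eigvals)) / (
        (2 * np.pi) ** (m / 2) * np.prod(np.sqrt(eigvals)))
    out_space = np.exp(-eps2 / (2 * rho)) / (
        (2 * np.pi * rho) ** ((n_total - m) / 2))
    return in_space * out_space

def p_intrapersonal(delta, intra, extra, prior_i=0.5):
    """MAP estimate P(Omega_I | delta), equation (2.4)."""
    li = subspace_likelihood(delta, *intra)
    le = subspace_likelihood(delta, *extra)
    return li * prior_i / (li * prior_i + le * (1 - prior_i))

# Toy 4-pixel "images": orthonormal bases for two orthogonal 2-D
# subspaces, with tighter eigenvalues for the intra-personal one.
n = 4
intra = (np.eye(n)[:2], np.array([1.0, 0.5]), 0.1, n)
extra = (np.eye(n)[2:], np.array([4.0, 2.0]), 0.5, n)

delta = np.array([0.3, -0.2, 0.05, 0.0])   # mostly intra-personal
same_person = p_intrapersonal(delta, intra, extra) > 0.5  # True here
```

Since the toy difference vector lies almost entirely in the intra-personal subspace, the posterior exceeds 0.5 and the pair is classified as the same person.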

Other statistical methods. A number of other statistical, appearance-based approaches to face recognition have emerged over the years, most of them based on a combination of Eigenfaces and Fisherfaces. In a recent paper of Wang and Tang [153], for example, intra- and inter-personal face differences and noise are modelled using a fusion of PCA, LDA and Bayesian approaches. Other methods, such as Singular Value Decomposition (SVD) [119] or Hidden Markov Model (HMM) [124, 109, 151] approaches, never established themselves as very promising, so we only refer the interested reader to the relevant publications.

It is worth noting that a number of methods have been proposed to deal specifically with illumination invariance, typically under somewhat restrictive assumptions. In the important work of Belhumeur et al. [20] it was shown that the set of N-pixel images of a convex, Lambertian object, illuminated by an arbitrary number of point light sources at infinity, forms a convex polyhedral cone in