
2010 International Conference on Pattern Recognition

Automatic Asymmetric 3D-2D Face Recognition

Di Huang¹   Mohsen Ardabilian¹   Yunhong Wang²

¹ MI Department, LIRIS Laboratory, Ecole Centrale de Lyon, Lyon, France
{di.huang; mohsen.ardabilian; liming.chen}@ec-lyon.fr

Abstract—3D face recognition has been considered in recent years as a major solution to the unsolved issues of reliable 2D face recognition, i.e., lighting and pose variations. However, 3D techniques are currently limited by their high registration and computation cost. In this paper, an asymmetric 3D-2D face recognition method is presented: enrollment uses textured 3D face models, while automatic identification is performed using only 2D facial images. The goal is to limit the use of 3D data to where it really helps to improve face recognition accuracy. The proposed approach contains two separate matching steps: a Sparse Representation Classifier (SRC) is applied to 2D-2D matching, while Canonical Correlation Analysis (CCA) is exploited to learn the mapping between range LBP faces (3D) and texture LBP faces (2D). Both matching scores are combined for the final decision. Moreover, we propose a new preprocessing pipeline to enhance robustness to lighting and pose effects. The proposed method achieves better experimental results on the FRGC v2.0 dataset than 2D methods do, while avoiding the data acquisition and computation cost of 3D approaches.

Keywords—asymmetric 3D-2D face recognition; Canonical Correlation Analysis; Sparse Representation Classifier

I. INTRODUCTION

Face recognition is a critical and popular topic in computer vision and image processing for its wide application potential and scientific challenge. Unfortunately, despite the great progress made in the field [1], the face remains an unreliable biometric feature, affected by variations in illumination, pose, facial expression, etc. In recent years, 3D face recognition has emerged as a major alternative that handles the unsolved issues of reliable face recognition, i.e., lighting and pose changes [2, 3]. However, 3D approaches are currently limited by their registration and computation cost. Generally, face recognition requires data in the gallery and probe sets to share similar properties: 2D/3D, color/gray, or even capture by the same type of camera sensor. Yet, more recently, several novel applications, namely asymmetric [4, 5] or heterogeneous [6, 7] face analysis, match faces between different types of data, which can be related by applying certain techniques [8, 9]. It is thus possible to obtain a relationship between 2D and 3D face data. Few works in the literature have addressed such an asymmetric 3D-2D face recognition problem so far. Rama et al. [10] proposed Partial Principal Component Analysis (P2CA) for feature extraction and dimensionality reduction on both

1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.305


Liming Chen¹

² School of Computer Science and Engineering, Beihang University, Beijing, China
[email protected]

cylindrical texture representations (3D) in the gallery set and 2D images in the probe set. In [4], Riccio et al. used predefined control keypoints to compute geometrical invariants for 2D/3D face recognition. More recently, Yang et al. [11] proposed patch-based Kernel CCA to learn the mapping between range and texture face images. All these works partially rely on 2D data; however, none provides reliable performance when lighting conditions or pose change.

This paper presents a novel asymmetric 3D-2D face recognition method, aiming to limit the use of 3D data to where it really helps to improve performance. The approach utilizes textured 3D face models for enrollment, but only 2D facial images for identification, which makes it unique compared with the state of the art. Since each 3D face model consists of one point-cloud and its corresponding 2D image, our approach contains two separate matching steps: 2D-2D matching based on a Sparse Representation Classifier (SRC) [12], and 3D-2D matching by Canonical Correlation Analysis (CCA) [13]. Both matching scores are combined for the final decision. Robustness is greatly improved by a new preprocessing pipeline that uses Logarithmic Total Variation (LTV) [14] to decrease the influence of illumination and an Active Appearance Model (AAM) [15] to normalize pose. Experimental results achieved on the FRGC v2.0 database [15] are promising, showing that the proposed asymmetric method outperforms traditional 2D face recognition methods while, compared with 3D shape based ones, it reduces the high online cost and inconvenience of 3D data acquisition and computation.

The remainder of this paper is organized as follows: an overview of the proposed method is given in Section II, and Section III presents the preprocessing pipeline. Section IV describes the asymmetric face recognition approach in detail. Experimental results are reported in Section V, and Section VI concludes the paper.

II. THE APPROACH OVERVIEW

The training and test stage frameworks are shown in Figure 1 and Figure 2 respectively. At the training stage (Figure 1), textured 3D face models, each of which contains one densely registered 2D image and 3D point-cloud, are required. Each face model carries 64 manual landmarks. In the 2D phase, an AAM is built using illumination-normalized 2D face images; in the 3D phase, all point-clouds are first registered, and range face images are then extracted. All texture and range faces are transformed to the mean face shape of the AAM. Local Binary Patterns (LBP) [16] are applied to enhance the local structure of both texture and range images; the two types of resulting LBP faces are then utilized to train a CCA subspace that learns the mapping between 2D and 3D face data [5]. In addition, four PCA subspaces are produced from the sample sets of 2D face images, range face images, 2D LBP face images, and range LBP face images respectively.
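The paper applies LBP [16] to both texture and range images. As an illustration, the basic 8-neighbor, radius-1 LBP operator can be sketched as below; this is a minimal sketch of the standard operator, without the uniform-pattern mapping the authors may additionally use:

```python
import numpy as np

def lbp_8_1(img):
    """Basic 8-neighbor, radius-1 LBP: threshold each pixel's 8 neighbors
    against the center and pack the results into an 8-bit code.
    Returns codes for interior pixels only (a (H-2, W-2) array)."""
    img = np.asarray(img, dtype=np.int32)
    center = img[1:-1, 1:-1]
    # neighbor offsets, clockwise from top-left; each contributes one bit
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= center).astype(np.int32) << bit
    return code
```

The same operator is applied unchanged to range images, since a range image is simply a 2D array of depth values.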

III. DATA PREPROCESSING

A. 2D Preprocessing

In this paper, LTV [14] is applied to normalize illumination variations because it works on any single image, without prior information on 3D face geometry or light sources. It not only inherits from the TV-L1 model the ability to decompose an image f into a large-scale output u and a small-scale output v, but also retains its edge-preserving and multiscale additive signal decomposition properties. The LTV model is based on a general multiplicative light reflectance theory:

I(x, y) = ρ(x, y) · S(x, y)    (1)

where I is the reflected light intensity, ρ is the surface albedo, and S is the lighting amount. By applying a logarithmic transform to (1), the LTV model can be described as follows:

f(x, y) = log(I(x, y)) = log(ρ(x, y)) + log(S(x, y))    (2)

u* = arg min_u { ∫|∇u| + λ‖f − u‖_L1 }    (3)

v* = f − u*    (4)

where ∫|∇u| is the total variation of u, and λ is a scalar threshold on scale. In (3), minimizing ∫|∇u| makes the level sets of u have simple boundaries, while minimizing ‖f − u‖ ensures that u approximates f. S and ρ can then be approximately estimated by solving (5):

S ≈ exp(u*),  ρ ≈ exp(v*)    (5)

Figure 3 shows some LTV-based illumination-normalized 2D face image samples.

Figure 1. Training stage framework. (Diagram: the training set provides 3D point-clouds and 2D images; the 3D side undergoes ICP registration and range image extraction, the 2D side illumination preprocessing and 2D AAM building; all faces are converted to the mean shape; PCA subspaces are built for 2D images, range images, 2D LBP faces, and range LBP faces; the two LBP-face types train the CCA subspace.)
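The pipeline of equations (2)-(5) can be illustrated with a rough sketch. A real TV-L1 solver is well beyond a few lines, so here a Gaussian low-pass stands in for the large-scale component u*; unlike true LTV it is not edge-preserving, which is an assumption made purely for illustration:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def illumination_normalize(I, sigma=8.0, eps=1e-6):
    """Sketch of the LTV decomposition with a Gaussian low-pass standing in
    for the TV-L1 large-scale component u*. Returns the small-scale log
    component v* = f - u*, i.e. an estimate of log(albedo)."""
    f = np.log(np.asarray(I, dtype=np.float64) + eps)  # eq. (2)
    u = gaussian_filter(f, sigma)                      # stand-in for eq. (3)
    v = f - u                                          # eq. (4)
    return v                                           # rho ~ exp(v), eq. (5)
```

A useful property carried over from the multiplicative model (1): a global change of lighting intensity multiplies I by a constant, shifts f by a constant, and therefore leaves v essentially unchanged.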

At the test stage, textured 3D face models are used as gallery samples. After the same preprocessing as at the training stage, AAM fitting is performed on the 2D images to locate faces. Based on the positions of the localized keypoints, both the corresponding 2D and range face images are transformed to the mean face shape. All the operations above are offline. Probe samples are 2D face images; hence the online part includes illumination normalization, AAM fitting, 2D-2D and 3D-2D matching, as well as score fusion. See Figure 2 for more details.
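The online stage ends with fusing the 2D-2D and 3D-2D matching scores. The paper does not specify the fusion rule in this section, so the following is a hedged sketch of a common choice, min-max normalization followed by a weighted sum; the weight w is an assumption, not a value from the paper:

```python
import numpy as np

def fuse_scores(s2d, s3d, w=0.5):
    """Min-max normalize each matcher's similarity scores over the gallery,
    then combine them with an assumed weight w (0.5 = equal weight)."""
    def minmax(s):
        s = np.asarray(s, dtype=np.float64)
        rng = s.max() - s.min()
        return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
    return w * minmax(s2d) + (1.0 - w) * minmax(s3d)

# The claimed identity is the gallery index with the highest fused score.
```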


Figure 3. The upper row presents original images, while the bottom row shows the normalized ones.

After LTV illumination normalization, an AAM is applied to normalize the pose of the 2D face images. A training set is required to produce an AAM. The AAM fitting method follows [17]; the implementation uses the DTU source code [18].
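The fitted AAM is used to transform each face to the mean face shape. A full AAM uses a piecewise-affine warp over the landmark triangulation; as a simplified, hypothetical stand-in, the landmark-alignment part can be sketched as a least-squares similarity transform (scale, rotation, translation) onto the mean shape:

```python
import numpy as np

def similarity_align(pts, mean_shape):
    """Least-squares similarity transform mapping landmark set `pts`
    onto `mean_shape` (both (N, 2) arrays). A crude stand-in for the
    AAM piecewise-affine warp to the mean shape."""
    p = pts - pts.mean(0)
    m = mean_shape - mean_shape.mean(0)
    # optimal rotation from the SVD of the 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(p.T @ m)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.array([1.0, d])          # guard against a reflection solution
    R = (U * D) @ Vt
    s = (S * D).sum() / (p ** 2).sum()
    return s * (p @ R) + mean_shape.mean(0)
```

Applying a similarity transform followed by this alignment recovers the original shape exactly, which makes the routine easy to sanity-check.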

Figure 2. Test stage framework. (Diagram: gallery 2D images undergo illumination preprocessing, AAM fitting, and conversion to the mean shape; gallery 3D point-clouds undergo ICP registration and range image extraction; probe 2D images undergo illumination preprocessing and AAM fitting online; SRC matching compares 2D PCA sub-vectors, CCA regression relates 2D LBP face and range LBP face PCA sub-vectors, and the two scores feed the final fusion.)
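The SRC matching box in Figure 2 solves a sparse reconstruction of the probe from the gallery and classifies by class-wise residual [12]. A minimal sketch follows; the ISTA solver and the value of lam are assumptions for illustration, not choices stated in the paper:

```python
import numpy as np

def src_identify(A, labels, y, lam=0.01, iters=500):
    """Sparse-representation classification sketch: solve the l1-regularized
    least squares  min 0.5*||Ax - y||^2 + lam*||x||_1  with plain ISTA, then
    assign the class whose coefficients alone best reconstruct the probe y."""
    A = A / np.linalg.norm(A, axis=0)        # unit-norm gallery columns
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the grad
    x = np.zeros(A.shape[1])
    for _ in range(iters):                   # ISTA: gradient step + shrinkage
        g = x - (A.T @ (A @ x - y)) / L
        x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    classes = np.unique(labels)
    # residual using only each class's columns and coefficients
    res = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
           for c in classes]
    return classes[int(np.argmin(res))]
```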


B. 3D Preprocessing

3D face registration is very important in 3D face recognition for 3D pose correction. Based on the AAM fitting result on its corresponding 2D face image, the keypoints of each 3D face model can be obtained. Then, Region-based Iterative Closest Point (R-ICP) [19] is applied after removing spikes and filling holes. R-ICP only works on the rigid region around the nose and forehead, which is considered insensitive to facial expression changes. The registration adopts a coarse-to-fine strategy. The coarse step utilizes 11 of the 64 landmarks and applies SVD to recover the 3D rotation and translation. At the fine step, ICP is employed to match the rigid surfaces and refine the estimates of the translation and rotation parameters (see Figure 4).
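The coarse step's SVD-based recovery of rotation and translation from landmark correspondences is the standard Kabsch solution; a sketch under that assumption (variable names are illustrative, and the paper's 11 landmarks would be the rows of `src` and `dst`):

```python
import numpy as np

def rigid_align(src, dst):
    """Recover rotation R and translation t mapping 3D landmarks `src`
    onto `dst` (both (N, 3) arrays) in the least-squares sense, via the
    SVD of the cross-covariance (Kabsch)."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # guard against reflection
    R = Vt.T @ D @ U.T
    t = dc - R @ sc
    return R, t                          # dst ~ (R @ src.T).T + t
```

The fine ICP step would then alternate closest-point correspondence search on the rigid region with repeated calls to this same closed-form alignment.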

Figure 4. R-ICP based registration (best seen in color): (a) rigid region of textured 3D data; (b) coarse step; (c) fine step.

IV. ASYMMETRIC FACE RECOGNITION

A. 2D-2D Face Matching

The l1-minimization problem (7) can be solved as a good approximation to (6). With the solution x1 of (7), we can compute the residual between a given probe face y and each individual gallery face as:

R_i = ‖ y − Σ_{j=1..k} x1_{i,j} v_{i,j} ‖_2

where the v_{i,j} are the gallery samples of individual i and the x1_{i,j} their coefficients in x1. The identity of a given probe face is then determined as the one with the smallest residual R_i.

B. 3D-2D Face Matching

CCA [13] is a powerful analysis algorithm, especially useful for relating two sets of variables by maximizing their correlation in the CCA subspace. Here, it is introduced to learn the mapping between range and 2D LBP faces. Given N pairs of samples (x_i, y_i) of (X, Y), i = 1, 2, …, N, where X ∈ R^p and Y ∈ R^q, both with zero mean, the goal of CCA is to learn a pair of directions w_x and w_y that maximize the correlation between the projections x = w_x^T X and y = w_y^T Y. In the context of CCA, the two projections x and y are also referred to as canonical variates. Formally, the directions can be found as the maxima of the function:

ρ = E[w_x^T X Y^T w_y] / sqrt( E[w_x^T X X^T w_x] · E[w_y^T Y Y^T w_y] )

To test new pairs of variables, we first project them into the CCA subspace: x′ = w_x^T X′, y′ = w_y^T Y′
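The canonical directions maximizing the correlation above can be computed from sample covariances by solving the standard CCA eigenproblem. A one-direction numpy sketch follows; the ridge term `reg` is an added assumption for numerical stability, not part of the paper's formulation:

```python
import numpy as np

def cca_fit(X, Y, reg=1e-6):
    """One-direction CCA sketch: returns (w_x, w_y) maximizing
    corr(X @ w_x, Y @ w_y). X is (N, p), Y is (N, q), both assumed
    zero-mean (each row would be one LBP-face PCA sub-vector)."""
    N = X.shape[0]
    Sxx = X.T @ X / N + reg * np.eye(X.shape[1])  # auto-covariances
    Syy = Y.T @ Y / N + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / N                             # cross-covariance
    # w_x is the top eigenvector of Sxx^-1 Sxy Syy^-1 Syx
    M = np.linalg.solve(Sxx, Sxy) @ np.linalg.solve(Syy, Sxy.T)
    vals, vecs = np.linalg.eig(M)
    wx = np.real(vecs[:, np.argmax(np.real(vals))])
    wy = np.linalg.solve(Syy, Sxy.T) @ wx         # paired direction
    return wx, wy / np.linalg.norm(wy)
```

At test time, a probe 2D LBP-face vector and a gallery range LBP-face vector would each be projected (x′ = w_x^T X′, y′ = w_y^T Y′) and compared in the shared subspace.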