2010 International Conference on Pattern Recognition

Face Recognition at-a-Distance using Texture, Dense- and Sparse-Stereo Reconstruction

Ham M. Rara, Asem A. Ali, Shireen Y. Elhabian, Thomas L. Starr, Aly A. Farag
University of Louisville
{hmrara01,amali003,syelha01,tlstar01,aafara01}@louisville.edu

Abstract

This paper introduces a framework for long-distance face recognition using dense and sparse stereo reconstruction, together with the texture of the facial region. Two methods are used to determine correspondences in the stereo pair: (a) dense global stereo matching using maximum-a-posteriori Markov Random Field (MAP-MRF) algorithms and (b) Active Appearance Model (AAM) fitting of both images of the stereo pair, with the fitted AAM mesh serving as the sparse correspondences. Experiments are performed using combinations of different features extracted from the dense and sparse reconstructions, as well as facial texture. The cumulative match characteristic (CMC) curves generated using the proposed framework confirm its feasibility for long-distance recognition of human faces.

Figure 1. Illustration of captured images: (a) 3-meter indoor (b) 15-meter indoor, (c) 30-meter outdoor, and (d) 50-meter outdoor.

1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.304

1. Introduction

Face recognition is a challenging task that has been an attractive research area in the past three decades [9]. The main theme of the solutions provided by different researchers involves detecting one or more faces in the given image, followed by facial feature extraction, which can be used for recognition. Recently, there has been interest in face recognition at-a-distance. Yao, et al. [8] created a face video database acquired at long distances and high magnifications, both indoors and outdoors, under uncontrolled surveillance conditions. They created a comprehensive processing algorithm to deal with image degradations related to long-distance image acquisition and were successful in improving recognition rates. Medioni, et al. [4] presented an approach to identify non-cooperative individuals at a distance by inferring 3D shape from a sequence of images. We constructed our own passive stereo acquisition setup and an accompanying database in [6].

In this paper, we used the same stereo setup to increase our database to a total of 61 subjects. In addition to previous indoor ranges, we acquired samples from outdoor ranges of 30 and 50 meters (Fig. 1). Experiments are then performed using various combinations of different features extracted from the dense and sparse reconstructions, as well as facial texture. The paper is organized as follows: Section 2 describes stereo-based reconstruction, Section 3 discusses the features for recognition, Section 4 provides experimental results and related discussion, and Section 5 concludes the paper.

2. Stereo-matching Based Reconstruction

Dense, Global Stereo Matching: The objective of the classical stereo problem is to find the pair of corresponding points p and q that result from the projection of the same scene point (X, Y, Z) onto the two images of the stereo pair. The stereo problem is formulated in a MAP-MRF framework. To handle varying illumination between the images of a stereo pair, the data term of the energy function to be minimized is modified with a normalized cross-correlation (NCC) similarity measure. Due to space constraints, further details are left to [1].

Sparse-Stereo Reconstruction: This approach uses the fitted AAM mesh of both images of the stereo pair as the sparse correspondences [6]. To facilitate a successful fitting process, the AAM mesh is initialized according to detected face landmarks (eyes, mouth center, and nose tip). More details can be found in [5, 6].
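The NCC-based data term is only referenced in this section (its details are in [1]), so the following is a minimal sketch of plain zero-mean NCC between two image patches, not the authors' energy term. The function names `ncc` and `ncc_cost`, the flat-list patch layout, and the `1 - NCC` cost are assumptions made for illustration. The sketch shows why NCC suits stereo pairs with differing illumination: a gain and offset change between the two views leaves the score unchanged.

```python
import math


def ncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Patches are flat sequences of pixel intensities (an assumed layout).
    Returns a value in [-1, 1]; 0.0 is returned if either patch is constant.
    """
    if len(patch_a) != len(patch_b) or len(patch_a) == 0:
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [v - mean_a for v in patch_a]
    dev_b = [v - mean_b for v in patch_b]
    denom = math.sqrt(sum(v * v for v in dev_a) * sum(v * v for v in dev_b))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom


def ncc_cost(patch_a, patch_b):
    """Hypothetical matching cost: near 0 for a good match, up to 2 for an inverted one."""
    return 1.0 - ncc(patch_a, patch_b)


# Toy 3x3 patches, flattened. The right patch is the left one under a
# different gain and offset, mimicking an illumination change.
left_patch = [10, 12, 40, 43, 90, 88, 15, 11, 60]
right_patch = [1.5 * v + 20 for v in left_patch]
unrelated = list(reversed(left_patch))

print(ncc_cost(left_patch, right_patch))  # close to zero
print(ncc_cost(left_patch, unrelated))    # clearly larger
```

A raw sum of squared differences would penalize the gain and offset change heavily, which is the usual motivation for a normalized measure in the data term.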

3. Features for Face Recognition

For face recognition, various features are extracted from the dense and sparse reconstructions, as well as from facial texture.

Dense Reconstruction: The reconstruction result is a dense collection of 3D points resembling a face. The method used to classify the dense reconstructions in this paper is the straightforward Procrustes approach [3] between the gallery and the probe. The Procrustes distance between two shapes is a least-squares type of metric that requires one-to-one correspondence between the shapes. After some preprocessing steps, the squared Procrustes distance between two shapes $x_1$ and $x_2$ is the sum of squared point distances:

$$P_d^2 = \| x_1 - x_2 \|^2 \tag{1}$$

Sparse Reconstruction: The graph-cuts approach used in dense reconstruction of stereo pairs can be computationally expensive, especially with high-resolution images. Sparse-stereo reconstruction [6], which uses the fitted AAM mesh of the stereo pair as correspondences, provides a quick alternative to dense reconstruction. This paper builds on the finding in [6] that 2D projections of the 3D sparse reconstructions can provide decent classification via a 2D version of the Procrustes distance.

Texture: To visualize the dense reconstruction results, a pixel from the left/right image of the stereo pair is commonly mapped to each resulting 3D vertex (see Fig. 3). These textures can be used to classify faces, in addition to the two earlier methods. The method used in this paper is the classical principal component analysis (PCA) approach [2].

Confidence Levels: For each of the classification approaches (dense, sparse, and texture), we assign a confidence value to the identification result. The first step is to create a probability density function from the set of all possible distance (similarity) measures between authentic pairs (i.e., image pairs that belong to the same subject). Fig. 2 illustrates a real distribution obtained with the sparse-reconstruction approach. Since we are dealing with distance measures, lower values mean good matches and higher values indicate unsure matches. Therefore, given a distance d in Fig. 2, a reasonable measure of confidence is the area under the curve to the right of d. These confidence values are used as scores to be combined later in the multi-classifier architecture of Sec. 4.

Figure 2. Distribution of distance (similarity) measures between authentic pairs.

4. Experimental Results

Fig. 3 illustrates the 3D reconstruction results according to the methodology in [1]. The figure contains different views of the reconstruction with and without texture. The reconstructed face contains about 10,000 3D vertices. Fig. 4 shows stereo reconstruction results of three subjects, visualized with the x-y, x-z, and y-z projections, after rigid alignment to one of the subjects. Notice that in the x-y projections, the similarity (or difference) of 2D shapes coming from the same (or different) subject is visually enhanced. In particular, Subject 1 (probe) is visually more similar to Subject 1 (gallery) than to Subject 2 (gallery) in the x-y projection. This similarity (or difference) is not obvious in the other projections. This is the main reason behind the use of x-y projections as features in Sec. 3 (Sparse Reconstruction). This phenomenon is validated in [5] with the much larger FRGC database to show that it is not a small-sample occurrence.

Figure 3. Dense 3D reconstructions from the 15-meter range.

Figure 4. Sparse-stereo reconstruction results.

Experimental Setup: Our current database consists of 61 subjects, with a gallery at 3 meters and four different probe sets at 3-meter and 15-meter indoors, together with 30-meter and 50-meter outdoors. The 33-meter indoor of [6] is now replaced with 30-meter outdoor for brevity of results. Fig. 1 illustrates the captured images (left image of the stereo pair) at different ranges. The features discussed in Sec. 3 are now used for recognition. No training is required for identification using dense- and sparse-stereo reconstructions, i.e., the Procrustes distance is computed directly between the probe and each gallery instance, and the pair with the smallest distance is chosen as the match. For recognition using texture, the face space is determined by the gallery of 61 subjects at the 3-meter range.

Three quick observations can be drawn from Figs. 5-8. Texture alone performs well at short ranges but is worst at outdoor and farther distances, i.e., it achieves perfect recognition at the 3-meter range but is mediocre at the 30- and 50-meter outdoor ranges. This is expected since illumination variation can severely affect recognition. Dense reconstruction generally has lower rank-1 recognition rates than its two counterparts. However, it performs better than texture in the outdoor cases (e.g., 30 m and 50 m). Notice that the dense reconstruction results show a marked improvement over the results in [5], which makes use of moments for recognition. Sparse reconstruction results provide the middle ground between texture and dense reconstruction. Because sparse-stereo reconstruction is quick to compute, it is an attractive alternative method for recognition.

Multiple-Classifier Architecture: The observation that dense (and sparse) reconstruction reaches perfect recognition at a lower rank than texture in outdoor conditions leads to the design of the multiple-classifier architecture in Fig. 9. In this architecture, the top n (e.g., n = 7) candidates from the dense reconstruction classifier are submitted to both the sparse and texture classifiers. (An alternative would be to replace the dense classifier with sparse and submit the results to both dense and texture classifiers. This architecture, along with other possibilities, will be considered in the future.) The final result uses the sum rule of decision fusion [7], weighted by the rank-1 accuracy of each method at a specific range. Notice that the output of the multi-classifier approach is superior to all others for every distance range in Figs. 5-8.
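The multiple-classifier architecture combines a shortlist from the dense classifier with confidence scores fused by a weighted sum rule. The sketch below is one possible reading of that description, not the authors' implementation. It makes several assumptions: the confidence (area to the right of a distance) is estimated as the empirical fraction of authentic-pair distances larger than that distance, the dense classifier's own confidence is included in the sum, and all names (`confidence`, `fuse`, the dictionary layout) and all numbers are hypothetical toy values.

```python
from bisect import bisect_right


def confidence(distance, authentic_sorted):
    """Empirical area to the right of `distance` under the authentic-pair distribution.

    `authentic_sorted` is an ascending list of distances between authentic pairs.
    """
    n = len(authentic_sorted)
    return (n - bisect_right(authentic_sorted, distance)) / n


def fuse(distances, authentic, rank1_accuracy, n_top=7):
    """Shortlist candidates with the dense classifier, then apply a weighted sum rule.

    distances[name][gid] : distance from the probe to gallery subject `gid`
    authentic[name]      : sorted authentic-pair distances for classifier `name`
    rank1_accuracy[name] : weight of classifier `name` at the probe's range
    """
    dense = distances["dense"]
    # Keep the n_top gallery subjects closest to the probe under the dense classifier.
    shortlist = sorted(dense, key=dense.get)[:n_top]
    scores = {}
    for gid in shortlist:
        scores[gid] = sum(
            rank1_accuracy[name] * confidence(distances[name][gid], authentic[name])
            for name in distances
        )
    best = max(scores, key=scores.get)
    return best, scores


# Toy example with three gallery subjects and a shortlist of two.
authentic = {name: [0.1, 0.2, 0.3, 0.4] for name in ("dense", "sparse", "texture")}
distances = {
    "dense": {"A": 0.25, "B": 0.15, "C": 0.90},
    "sparse": {"A": 0.05, "B": 0.35, "C": 0.20},
    "texture": {"A": 0.10, "B": 0.30, "C": 0.05},
}
rank1_accuracy = {"dense": 0.5, "sparse": 0.8, "texture": 0.6}

best, scores = fuse(distances, authentic, rank1_accuracy, n_top=2)
print(best, scores)
```

In this toy case subject C is dropped by the dense shortlist even though its texture distance is the smallest, and subject B, the dense classifier's first choice, is overtaken by A once the sparse and texture confidences are added.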

Figure 5. Cumulative match characteristic (CMC) curves for the 3-meter indoor probe set.
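A CMC curve reports, for each rank k, the fraction of probes whose true identity appears among the k closest gallery entries. The following is a minimal sketch of how such a curve could be computed from the squared Procrustes distance of Eq. (1) for 2D shapes. The preprocessing is not specified in the text, so the sketch assumes translation, scale, and rotation normalization; shapes are represented as lists of complex numbers, and the helper names and toy shapes are hypothetical rather than data from the paper.

```python
import math


def normalize(shape):
    """Center a 2D shape (list of complex points) and scale it to unit norm."""
    centroid = sum(shape) / len(shape)
    centered = [p - centroid for p in shape]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centered))
    return [p / norm for p in centered]


def procrustes_distance_sq(x1, x2):
    """Squared distance between two shapes with one-to-one point correspondence,
    after removing translation, scale, and rotation (assumed preprocessing)."""
    a, b = normalize(x1), normalize(x2)
    s = sum(p * q.conjugate() for p, q in zip(a, b))
    rot = s / abs(s) if abs(s) > 0 else 1  # least-squares rotation of b onto a
    return sum(abs(p - rot * q) ** 2 for p, q in zip(a, b))


def cmc_curve(probes, gallery):
    """probes: list of (true_id, shape); gallery: dict id -> shape.

    Closed-set identification is assumed: every true_id must be in the gallery.
    Entry k-1 of the result is the fraction of probes matched within rank k.
    """
    hits = [0] * len(gallery)
    for true_id, shape in probes:
        ranked = sorted(
            gallery, key=lambda gid: procrustes_distance_sq(shape, gallery[gid])
        )
        hits[ranked.index(true_id)] += 1
    curve, total = [], 0
    for h in hits:
        total += h
        curve.append(total / len(probes))
    return curve


def similarity_transform(shape, angle, scale, shift):
    """Rotate, scale, and translate a shape (used only to build toy probes)."""
    rot = complex(math.cos(angle), math.sin(angle))
    return [scale * rot * p + shift for p in shape]


square = [0j, 1 + 0j, 1 + 1j, 1j]
rectangle = [0j, 1 + 0j, 1 + 3j, 3j]
kite = [0j, 2 + 1j, 4 + 0j, 2 - 3j]
gallery = {"s1": square, "s2": rectangle, "s3": kite}
probes = [
    ("s1", similarity_transform(square, 0.7, 3.0, 5 - 2j)),
    ("s2", similarity_transform(rectangle, -1.2, 0.5, 1 + 1j)),
]
curve = cmc_curve(probes, gallery)
print(curve)
```

The first entry of the curve is the rank-1 recognition rate, which is the quantity used to weight each classifier in the fusion step.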

5. Conclusions and Future Work

We have studied the use of texture, sparse-stereo, and dense-stereo reconstructions in the context of long-distance face recognition. With our database of images, we have illustrated the effectiveness of relatively straightforward algorithms, especially when combined in a multi-classifier manner. A continuing goal of this project is to further increase the database size and include images from as far as 100 meters under challenging imaging conditions. With that in mind, the straightforward PCA for texture recognition will be replaced by more elaborate algorithms. The authors are aware of more sophisticated techniques for pure 3D facial shape recognition. However, at this stage of the project, as long as the stereo reconstruction is done right and input images are devoid of expressions, this work has illustrated that simpler algorithms (e.g., Procrustes) can do the job. As more challenging scenarios are encountered later on, additional novel and existing approaches will be utilized. Another future step of this project is to use a more sophisticated facial feature localization method than the current active appearance model (AAM) approach.

Figure 6. CMC curves for the 15-meter indoor probe set.

Figure 7. CMC curves for the 30-meter outdoor probe set.

Figure 8. CMC curves for the 50-meter outdoor probe set.

Figure 9. Schematic diagram of the multi-classifier architecture.

References

[1] A. Ali, M. Miller, T. Starr, and A. Farag. Passive stereo-based 3D human face reconstruction at a distance. Technical report, CVIP Lab, Univ. of Louisville, Jan. 2010.
[2] P. N. Belhumeur, J. Hespanha, and D. J. Kriegman. Eigenfaces vs. fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):711–720, 1997.
[3] T. Cootes and C. Taylor. Statistical models of appearance for computer vision. Technical report, Univ. of Manchester, UK, Mar. 2004.
[4] G. Medioni, J. Choi, C.-H. Kuo, and D. Fidaleo. Identifying noncooperative subjects at a distance using face images and inferred three-dimensional face models. IEEE Trans. Sys. Man Cyber. Part A, 39(1):12–24, 2009.
[5] H. Rara, S. Elhabian, A. Ali, T. Gault, M. Miller, T. Starr, and A. Farag. A framework for long distance face recognition using dense- and sparse-stereo reconstruction. In ISVC '09: Proceedings of the 5th International Symposium on Advances in Visual Computing, pages 774–783, Berlin, Heidelberg, 2009. Springer-Verlag.
[6] H. Rara, S. Elhabian, A. Ali, M. Miller, T. Starr, and A. Farag. Face recognition at-a-distance based on sparse-stereo reconstruction. Computer Vision and Pattern Recognition Workshop, 0:27–32, 2009.
[7] A. Ross and A. Jain. Information fusion in biometrics. Pattern Recogn. Lett., 24(13):2115–2125, 2003.
[8] Y. Yao, B. R. Abidi, N. D. Kalka, N. A. Schmid, and M. A. Abidi. Improving long range and high magnification face recognition: Database acquisition, evaluation, and enhancement. Comput. Vis. Image Underst., 111(2):111–125, 2008.
[9] W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Comput. Surv., 35(4):399–458, 2003.