Example-Based Face Shape Recovery Using the Zenith Angle of the Surface Normal

Mario Castelán¹, Ana J. Almazán-Delfín², Marco I. Ramírez-Sosa-Morán³, and Luz A. Torres-Méndez¹

¹ CINVESTAV Campus Saltillo, Ramos Arizpe 25900, Coahuila, México
[email protected]
² Universidad Veracruzana, Facultad de Física e Inteligencia Artificial, Xalapa 91000, Veracruz, México
³ ITESM, Campus Saltillo, Saltillo 25270, Coahuila, México

Abstract. We present a method for recovering facial shape using an image of a face and a reference model. The zenith angle of the surface normal is recovered directly from the intensities of the image. The azimuth angle of the reference model is then combined with the calculated zenith angle in order to obtain a new field of surface normals. After integration of the needle map, the recovered surface has the effect of mapping the facial features of the input image onto the reference model. Experiments demonstrate that, for the Lambertian case, surface recovery is achieved with high accuracy. For non-Lambertian cases, experiments suggest potential for face recognition applications.

1 Introduction

Acquiring surface models of faces is an important problem in computer vision and visualization, since it has significant applications in biometrics, computer games and production graphics. Shape-from-shading (SFS) [1] seems to be an appealing method, since it is a non-invasive process which mimics the capabilities of the human vision system [2]. For face shape recovery, however, the use of SFS has proved to be an elusive task, since the concave-convex ambiguity can result in the inversion of important features such as the nose. To overcome this problem, domain-specific constraints have proved to be essential to improve the quality of the overall reconstructions, and the recovery of accurately detailed facial surfaces still proves to be a challenge. Despite the improvements achieved by using domain-specific information, it is fair to say that no SFS scheme has been demonstrated to work as well as statistical SFS [3,4]. In this framework, the main idea is to represent surfaces in a parametric eigenspace, constructed from the eigenvectors of the covariance matrix of a training set of 3D faces. Once the surfaces are parameterized, shape coefficients that satisfy image irradiance constraints are sought. Unfortunately, a computationally expensive parameter search has to be carried out, since the fitting procedure involves minimizing the error between the rendered facial surface and the observed intensity of the input image. This minimization procedure is subject to multiple local minima.

A. Gelbukh and A.F. Kuri Morales (Eds.): MICAI 2007, LNAI 4827, pp. 758–768, 2007. © Springer-Verlag Berlin Heidelberg 2007


More recently, Hancock and colleagues [5,6] have tried to relax this problem by using different surface representations and alternative parameter fitting procedures. They have proposed statistical models that can be fitted to image brightness data using geometric constraints on surface normal direction provided by Lambert's law [7]. Kemelmacher and Basri [8] have developed a novel method for 3D facial shape recovery from a single image using a single 3D reference surface height model of a different face. This example-based technique "molds" the reference model to the input image to achieve surface reconstruction. Their method seeks the shape, albedo, and lighting that best fit the image, while preserving the overall structure of the model. Although this method does not use statistical models of faces, good results can be achieved provided that the reference model bears a good resemblance to the input image. In this paper we test the simple idea of using the zenith angle of the surface normal to "map" facial features from an input image to a reference model. The zenith angle is calculated directly from the image intensities. This information is then coupled with the azimuth angle of a reference model in order to obtain a set of surface normals. The final result, however, is achieved only once these surface normals are integrated. Experiments on Lambertian data (i.e. ideal data) show that facial shape can be accurately recovered through the combination of zenith and azimuth angles. Similarly, experiments with non-Lambertian data suggest a potential for face recognition applications. The paper is organized as follows: in Section 2 we explain concepts related to surface orientation, and Section 3 describes the image irradiance equation. The combination of zenith and azimuth angles to recover facial surfaces is explained in Section 4. An experimental evaluation of the method is described in Section 5. We finally present conclusions and future work in Section 6.

2 Surface Orientation

Information about a surface that is intermediate between a full 3D representation and a 2D projection onto a plane is often referred to as a 2.5D surface representation [9]. Surface orientation is one of the most important 2.5D representations. For every visible point on a surface there exists a corresponding orientation, which is usually represented by either the surface normal, the surface gradient, or the azimuth and zenith angles of the surface normal. In contrast to height data, directional information cannot be used to generate novel views in a straightforward way. However, given the illumination direction and the surface albedo properties, the direction of the surface normal plays a central role in the radiance generation process. This is of particular interest in face analysis, since light-source effects are responsible for more variability in the appearance of facial images than changes in shape or identity [10]. The surface gradient is based on the directional partial derivatives of the height function Z,

p = ∂Z(x, y)/∂x  and  q = ∂Z(x, y)/∂y.  (1)


Fig. 1. The azimuth (φ) and zenith (θ) angles of a surface normal (left) and the visual interpretation of the slant and the tilt (right)

The set of first partial derivatives of a surface is also known as the gradient space. This is a 2D representation of the orientation of visible points on the surface. The surface normal is a vector perpendicular to the plane tangent to the surface at a point. The relation between the surface normal and the surface gradient is given by

(nx, ny, nz) = (p, q, −1) / √(p² + q² + 1).  (2)

Directional information can also be expressed using the zenith and azimuth angles of the surface normals. In terms of the slope parameters, the zenith angle is θ = arctan(√(p² + q²)) and the azimuth angle is φ = arctan(q/p). Here we use the four-quadrant arc-tangent function, and therefore −π ≤ φ ≤ π and 0 ≤ θ ≤ π/2. The zenith angle is related to inclination and the azimuth angle to orientation. We will refer to these two angles as slant and tilt (see Figure 1). Figure 2 presents examples of facial surfaces. The figure is divided into two panels. The leftmost panel illustrates surface normals: nx, ny and nz appear from left to right. The slant and tilt related to these surface normals are shown, respectively, in the columns of the rightmost panel. Two different examples are

Fig. 2. The two rows of the figure present two different subjects. The three columns of the leftmost panel show nx , ny , nz . Slant and tilt are shown in the two columns of the rightmost panel.


shown row-wise. We present images as intensity maps, where brighter and darker pixels correspond to higher and lower values for each measure. Note how the surface normals seem to characterize a face illuminated from three orthogonal directions. From the slant, we can observe which surface normals have a steep inclination (darker intensities) and which have a small inclination (brighter intensities). High slant values are located around regions such as the forehead, tip of the nose, chin, and centers of the eyes and mouth. Facial boundaries and sides of the nose correspond to low slant values. Another important feature to note from the figure is the similarity between nz and the slant. This similarity will be discussed in the next section.
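The relations above (Eqs. 1–2 and the slant/tilt definitions) can be sketched numerically. The following NumPy snippet is illustrative only; the function names are ours, not the authors':

```python
import numpy as np

def gradient_to_normal(p, q):
    """Unit surface normal from gradients, per Eq. (2):
    n = (p, q, -1) / sqrt(p^2 + q^2 + 1)."""
    norm = np.sqrt(p**2 + q**2 + 1.0)
    return np.stack([p / norm, q / norm, -1.0 / norm], axis=-1)

def slant_tilt(p, q):
    """Zenith (slant) and azimuth (tilt) angles of the surface normal.
    The four-quadrant arc-tangent gives -pi <= phi <= pi."""
    theta = np.arctan(np.sqrt(p**2 + q**2))   # 0 <= theta <= pi/2
    phi = np.arctan2(q, p)
    return theta, phi
```

For instance, a surface tilted only along x with unit slope (p = 1, q = 0) yields a slant of π/4 and a tilt of 0.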

3 The Image Irradiance Equation

The shape-from-shading (SFS) problem is that of recovering the surface that, after interaction with the environment (illumination conditions, objects' reflectance properties, inter-reflections), produces the radiances perceived by human eyes as intensities. In brief, SFS aims to solve the image irradiance equation, E(x, y) = R(p, q, s), where E is the image brightness value of the pixel with position (x, y), and R is a function referred to as the reflectance map [11]. The reflectance map uses the surface gradients p = ∂Z(x, y)/∂x and q = ∂Z(x, y)/∂y together with the light source direction vector s to compute a brightness estimate which can be compared with the observed brightness. If the surface normal at the image location (x, y) is n(x, y), then under the Lambertian reflectance model, with a single light source direction, no inter-reflections and constant albedo, the image irradiance equation becomes

E(x, y) = n · s.  (3)

The image irradiance equation demands that the recovered surface normals lie on the reflectance cone whose axis is the light source direction and whose opening angle is the inverse cosine of the normalized image brightness. This is why SFS is an under-constrained problem: the two degrees of freedom for surface orientation (slant and tilt) must be recovered from a single measured brightness value. Surfaces rendered under Lambert's law have a matte aspect. Examples can be seen in the third column of Figure 2. The nz component of the surface normal is indeed the Lambertian illumination of the surface, with the light source direction parallel to the viewer's direction (i.e. s = (0, 0, 1)). In contrast to the human visual system [12], computer vision systems seem to encounter more difficulty in estimating the tilt of a surface from a single image than its slant. This is not surprising, since the only measure that can be directly recovered from the (Lambertian) image brightness is the zenith angle. This fact is exploited in the next section, where we explain how the slant can be used to approximate a facial surface using only a reference model and a brightness image.
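Under the stated assumptions (frontal light s = (0, 0, 1), unit albedo), Eq. (3) reduces to E = nz = cos θ, so the zenith angle follows from a single inverse cosine. A minimal sketch, with helper names of our own choosing:

```python
import numpy as np

def lambertian_frontal(p, q):
    """Brightness under Lambert's law with frontal light s = (0, 0, 1):
    E = n . s = n_z = cos(theta) = 1 / sqrt(p^2 + q^2 + 1)."""
    return 1.0 / np.sqrt(p**2 + q**2 + 1.0)

def zenith_from_brightness(E):
    """Recover the zenith angle directly from image brightness:
    theta = arccos(E); clipping guards against noise outside [0, 1]."""
    return np.arccos(np.clip(E, 0.0, 1.0))
```

This is exactly the "directly recovered" quantity the text refers to: brightness determines the slant, while the tilt remains ambiguous.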

4 Using the Zenith Angle of the Surface Normal to Approximate Facial Shape

We exploit the fact that the inverse cosine of the measured irradiance equals the zenith angle of the surface normal, which means the zenith angle can be calculated in a straightforward way. The main idea in this paper is to pair the slant calculated from a brightness image with the tilt obtained from a reference model. This transfers the facial features of the input image onto the reference model. In our experiments, the input brightness data were taken from the Lambertian reillumination of the subjects in the database. The face database used for building the models was provided by the Max-Planck Institute for Biological Cybernetics in Tuebingen, Germany [13]. The reference model was the average face over all the subjects in the database.
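The pairing of image-derived slant with reference tilt can be sketched as a spherical-to-Cartesian conversion; here we use the convention nz = cos θ ≥ 0 (normals oriented toward the viewer, i.e. the sign-flipped form of Eq. 2). The helper name is ours:

```python
import numpy as np

def combine_slant_tilt(theta_im, phi_ref):
    """Build a needle map by pairing the zenith angle recovered from the
    image (theta_im) with the azimuth angle of the reference model
    (phi_ref). Illustrative sketch, not the authors' code."""
    nx = np.sin(theta_im) * np.cos(phi_ref)
    ny = np.sin(theta_im) * np.sin(phi_ref)
    nz = np.cos(theta_im)
    return np.stack([nx, ny, nz], axis=-1)
```

Integrating the resulting normals (see the footnote on Frankot–Chellappa below Eq. references in Section 4) yields the surface with the input image's features molded onto the reference shape.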

Fig. 3. The rows represent two different subjects (the same as used in Figure 2). The leftmost panel shows the case when θim and φref are combined. The rightmost panel shows the case of combining θref and φim.

To illustrate the combination of slant and tilt, let us denote by (θref, φref) the zenith and azimuth angles of the surface normals of a reference model. Similarly, let (θim, φim) be the zenith and azimuth angles derived from an input brightness image. Unlike the zenith angle, the azimuth angle cannot be calculated directly from the image brightness; however, we assume we have accurate tilt values in φim. In Figure 3 we show two examples. The figure contains two panels, each of which consists of columns showing slant, tilt, and a frontal reillumination. This reillumination represents the integrated surface obtained from the normals after combining slant and tilt values¹. The rows represent two different subjects (the same as used in Figure 2). The leftmost panel shows the case when θim and φref are combined. The rightmost panel shows the case of combining θref and φim. The important feature to note here is that the slant is mainly responsible for the facial appearance. In both cases, the tilt seems to provide the general shape of the face, but the slant dictates the perceptible changes in surface inclination (i.e. regions around the nose, lips and eyes).

¹ For surface integration from surface normals, we used the global integration method of Frankot and Chellappa [14].
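The Frankot–Chellappa method [14] used for integration admits a compact FFT-based sketch. This is our own implementation, assuming periodic boundary conditions, not the authors' code:

```python
import numpy as np

def frankot_chellappa(p, q):
    """Recover an integrable surface Z from gradient fields p = dZ/dx and
    q = dZ/dy by projecting onto the set of integrable slopes in the
    Fourier domain (Frankot and Chellappa [14])."""
    rows, cols = p.shape
    # Angular frequencies; x runs along columns, y along rows.
    u, v = np.meshgrid(2 * np.pi * np.fft.fftfreq(cols),
                       2 * np.pi * np.fft.fftfreq(rows))
    P, Q = np.fft.fft2(p), np.fft.fft2(q)
    denom = u**2 + v**2
    denom[0, 0] = 1.0                # avoid dividing by zero at DC
    Z = (-1j * u * P - 1j * v * Q) / denom
    Z[0, 0] = 0.0                    # absolute height offset is unrecoverable
    return np.real(np.fft.ifft2(Z))
```

Given a needle map, gradients are first obtained as p = −nx/nz and q = −ny/nz before integration; the recovered surface is unique only up to an additive constant.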

5 Experiments

In this section, we present an experimental evaluation of the method. First, we use Lambertian data to test the accuracy of surface shape recovery. Then we use real-world images to generate novel reilluminations and explore the usability of the recovered surfaces for face recognition applications.

5.1 Lambertian Examples

We performed experiments using the brightness images of each surface in the database (i.e. the nz component). The slant θim was obtained directly from each subject and then combined with the reference tilt φref. A new set of surface normals was derived from the pair (θim, φref). These surface normals were integrated to generate a surface. In this manner, 100 facial surfaces were approximated, one from each brightness image in the database. Profile comparisons of two examples are shown in Figure 4. The ground truth (solid line) is plotted against the recovered surface (dashed line). Note how the difference in surface height is negligible, while the profile contour reveals agreement in shape. In order to test the accuracy of facial reconstruction, we have explored the distribution of the recovered surfaces. We have used Multi-Dimensional Scaling (MDS) [15] to embed the faces in a low-dimensional pattern space. We built a 100-eigenfaces model based on the ground-truth database and determined the dissimilarity measure in the following manner:

1. Calculate the matrix of vector coefficients for each in-training element in the database (the columns of this matrix are parameter vectors representing the in-training sample faces).
2. Calculate the linear correlation coefficient between the columns of the parameter matrix. A correlation of 1 indicates a dissimilarity of 0; a correlation of -1 indicates a dissimilarity of 1.

We used only the first 40 coefficients for each vector, as these account for at least 90% of the total variance of the model. We repeated this procedure using the recovered height surfaces, building another height model from which the dissimilarity matrix was calculated in the same way as explained above. In Figure 5 the results of performing MDS are shown as gray and white circles for the ground truth and the recovered surfaces, respectively. The distribution of dissimilarities is very similar for both sets of data.
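The dissimilarity construction of steps 1–2 above can be sketched as follows. The mapping d = (1 − r)/2 realizes the stated correspondence (r = 1 → d = 0, r = −1 → d = 1); this is an illustrative sketch, not the authors' code:

```python
import numpy as np

def dissimilarity_matrix(coeffs):
    """Pairwise dissimilarities from a matrix whose columns are model
    coefficient vectors. The linear correlation r between columns is
    mapped to d = (1 - r) / 2, so identical vectors give d = 0 and
    perfectly anticorrelated ones give d = 1."""
    r = np.corrcoef(coeffs.T)   # correlations between column vectors
    return (1.0 - r) / 2.0
```

The resulting matrix can then be fed to any classical MDS routine to obtain the low-dimensional embedding shown in Figure 5.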
This suggests that the height surfaces recovered using the method reflect the same shape distribution as the ground-truth parameters. In other words, the output of the method may be suitable for recognition purposes.

5.2 Non-Lambertian Examples

Although our main interest in this paper has been the use of the zenith angle of the surface normal for reconstructing facial shape, we have also


Fig. 4. The ground truth (solid line) is plotted against the recovered surface (dashed line)


Fig. 5. Results of performing MDS are shown as gray and white circles for the ground truth and the recovered surfaces, respectively

performed some relatively limited experiments aimed at exploring the potential of the recovered surfaces for recognition. We also present experiments with a number of real-world face images. These images are drawn from the Yale B database [16]. In the images, the faces are in frontal pose and were illuminated by a point light source situated approximately in the viewer direction. The input images were aligned to the reference model using a simple warping operation with four landmarks (the centers of the eyes, the tip of the nose and the center of the mouth). The recovered surfaces were un-warped to better approximate the input image. The surface recovery results are shown in Figure 6. From left to right we show the input image, the aligned input, the slant, the tilt and the frontal illumination corresponding to the recovered surface. Note the errors due to the non-Lambertian nature of the input images. These manifest themselves as instabilities around the boundaries of


Fig. 6. Surface recovery results for non-Lambertian data. From left to right we show the input image, the aligned input, slant, tilt and frontal illumination corresponding to the recovered surface.

the face and in the proximity of the mouth and nose. Although the method struggled to recover the shape of the eye sockets, the overall structure of the face is well reconstructed. Moreover, the eyebrow location, nose length and width of the face clearly match those of the input images. Another important feature to note from the figure is that the recovered slant (third column) shows more correspondence with the input image than the recovered tilt (fourth column). This is because the facial features of the input images were better preserved in the slant, while the reference tilt merely served as the mold into which the input image was fitted. For the recognition experiments, we used the frontal-view subset of the Yale B database. In the database, the subjects are illuminated from many different light sources and photographs are taken to capture the appearance of the images over a wide range of illuminations. We rendered new reilluminations and performed recognition tests as follows:

1. Estimation of lighting coefficients. For each subject, we used the recovered surface and a photograph. With these two estimates at hand, we approximated the lighting coefficients of a spherical harmonic illumination model [17].
2. Rendering appearance. A novel reillumination is then generated using the lighting coefficients, an albedo map and the surface normals of the recovered surface.


3. Comparing rendered and original images. We computed the correlation coefficient between the rendered image and the original input image and located the element that maximized the correlation. The identity of this element was used to classify the input image.

In this manner we aimed to recognize views of the subject re-illuminated from different light source directions. In Figure 7 we show results of the rendering technique. The first column shows a sample image from which the illumination coefficients were recovered. These coefficients were used to render the recovered surfaces under similar illumination conditions. This rendering is illustrated in the remaining columns. The rows of the figure show three different illumination cases. Note that the model struggles to generate realistic views when the light source direction departs from the viewer direction. This is due to the simplicity of the rendering technique (cast shadows are not modeled) as well as the rough approximation of the surface (the bas-relief ambiguity was not considered). Despite the simplicity of the rendering technique and the surface recovery method, the recognition tests suggest that this simple approximation can lead to favorable results. This is illustrated in Figure 8, where a plot of the subject number against the correlation coefficient achieved is shown. We computed the average behaviour for the five subjects of the database. Consistent examples (i.e. the ones expected to attain the highest correlation coefficient) are represented with a solid line, and inconsistent examples with a dotted line. The figure shows how, on average, the recognition test assigns the highest correlation coefficient to consistent examples, thus achieving correct classification in all cases.
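Steps 1–3 can be sketched with a first-order spherical-harmonic lighting model [17]; this simplified version ignores cast shadows (as the text notes) and keeps only the constant and linear harmonic terms, with the basis taken up to scale. All function names are ours:

```python
import numpy as np

def sh_basis(normals, albedo):
    """Albedo-scaled first-order SH basis images (constant + linear terms,
    up to constant scale factors). normals: (H, W, 3); albedo: (H, W)."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    return np.stack([np.ones_like(nz), nx, ny, nz], axis=-1) * albedo[..., None]

def estimate_lighting(normals, albedo, image):
    """Step 1: least-squares fit of the four lighting coefficients from a
    recovered surface and one photograph of the same subject."""
    A = sh_basis(normals, albedo).reshape(-1, 4)
    coeffs, *_ = np.linalg.lstsq(A, image.ravel(), rcond=None)
    return coeffs

def render(normals, albedo, coeffs):
    """Step 2: re-render the recovered surface under the estimated lighting."""
    return sh_basis(normals, albedo) @ coeffs

def classify(image, gallery):
    """Step 3: return the index of the gallery rendering whose correlation
    with the input image is highest."""
    scores = [np.corrcoef(image.ravel(), g.ravel())[0, 1] for g in gallery]
    return int(np.argmax(scores))
```

In a full pipeline, `gallery` would contain one rendering per enrolled subject, each relit with the coefficients estimated from the probe image.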

Fig. 7. The first column shows a sample image from which the illumination coefficients were recovered. These coefficients were used to render the recovered surfaces under similar illumination conditions, as shown in the remaining columns. The rows show three different illumination cases.


Fig. 8. Average recognition performance: the average image correlation coefficient is plotted against the number of examples. Consistent examples are represented with a solid line; inconsistent examples with a dotted line.

6 Conclusions

We have presented a method for recovering facial shape using an image of a face and a reference model. The image has to be frontally illuminated and aligned to the reference model. The zenith angle of the surface normal is recovered directly from the intensities of the image. The azimuth angle of the reference model is then combined with the calculated zenith angle in order to obtain a new field of surface normals. After integration of the needle map, the recovered surface has the effect of mapping the facial features of the input image onto the reference model. Experiments demonstrate that, for the Lambertian case, surface recovery is achieved with high accuracy. For non-Lambertian cases, experiments suggest potential for face recognition applications. Future work includes using the method with more realistic rendering techniques as well as considering the bas-relief ambiguity.

References

1. Horn, B., Brooks, M.: Shape from Shading. MIT Press, Cambridge (1989)
2. Johnston, A., Hill, H., Carman, N.: Recognising faces: Effects of lighting direction, inversion and brightness reversal. Perception 21, 365–375 (1992)
3. Atick, J., Griffin, P., Redlich, N.: Statistical approach to shape from shading: Reconstruction of three-dimensional face surfaces from single two-dimensional images. Neural Computation 8, 1321–1340 (1996)
4. Blanz, V., Vetter, T.: Face recognition based on fitting a 3D morphable model. IEEE T. PAMI 25(9), 1063–1074 (2003)
5. Smith, W., Hancock, E.R.: Recovering facial shape and albedo using a statistical model of surface normal direction. In: Proc. IEEE ICCV 2005, pp. 588–595 (2005)
6. Castelán, M., Hancock, E.R.: Using cartesian models of faces with a data-driven and integrable fitting framework. In: Campilho, A., Kamel, M. (eds.) ICIAR 2006. LNCS, vol. 4142, pp. 134–145. Springer, Heidelberg (2006)
7. Worthington, P.L., Hancock, E.R.: New constraints on data-closeness and needle map consistency for shape-from-shading. IEEE T. PAMI 21(12), 1250–1267 (1999)
8. Kemelmacher, I., Basri, R.: Molding face shapes by example. In: Proc. European Conference on Computer Vision (2006)
9. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Freeman (1982)
10. Moses, Y., Adini, Y., Ullman, S.: Face recognition: The problem of compensating for changes in illumination direction. In: Proc. European Conference on Computer Vision, pp. 286–296 (1994)
11. Horn, B.: Understanding image intensities. Artificial Intelligence 8, 201–231 (1977)
12. Erens, R., Kappers, A., Koenderink, J.: Perception of local shape from shading. Perception and Psychophysics 54(2), 145–156 (1993)
13. Blanz, V., Vetter, T.: A morphable model for the synthesis of 3D faces. In: SIGGRAPH 1999, pp. 187–194 (1999)
14. Frankot, R., Chellappa, R.: A method for enforcing integrability in shape from shading algorithms. IEEE T. PAMI 10, 438–451 (1988)
15. Young, F.W., Hamer, R.M.: Theory and Applications of Multidimensional Scaling. Erlbaum Associates, Hillsdale (1994)
16. Georghiades, A., Belhumeur, P., Kriegman, D.: From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE T. PAMI 23(6), 643–660 (2001)
17. Basri, R., Jacobs, D.W.: Lambertian reflectance and linear subspaces. IEEE T. PAMI 25(2), 218–233 (2003)