Automated Multimodal Biometrics Using Face and Ear

Lorenzo Luciano and Adam Krzyżak
Department of Computer Science and Software Engineering, Concordia University
1455 de Maisonneuve Blvd. West, Montreal, Quebec, H3G 1M8, Canada
{l lucian,krzyzak}@cs.concordia.ca

Abstract. In this paper, we present an automated multimodal biometric system for the detection and recognition of humans using face and ear as input. The system is fully automated, with a trained detection system for both face and ear. We examine individual recognition rates for face and for ear, and then combined recognition rates, and show that an automated multimodal biometric system achieves significant performance gains. We also discuss methods of combining biometric input and the recognition rates each achieves.

Keywords: Face recognition, ear recognition, multimodal biometrics, eigenface, eigenear, PCA.

1 Introduction

The recognition of individuals without their full cooperation is in high demand by security and intelligence agencies requiring a robust person identification system. Such a system would allow identification from a reasonable distance without the subject's knowledge. To be of greatest value and use, it would also have to be fully automated, performing detection and recognition without manual intervention. Towards such a system, we combined face recognition with ear recognition in a multimodal biometric system to improve on the recognition rates of unimodal systems.

Face recognition is a heavily researched area with many methods; among the more popular are Eigenface [1], Gabor features [2], Fisherface [3] and Local Feature Analysis [4]. Because a fast automated system was desired, we used Eigenface, which improves recognition rates while keeping the system fast and fully automated. Other biometric modalities include iris, hand, gait, voice and fingerprint; see the Handbook of Biometrics [5].

Research in this area has shown some very interesting results. Chang et al. [7] used PCA on face and ear with a manual landmarking method; with their largest dataset of 111 subjects, they achieved a combined recognition rate of 90%. Rahman and Ishikawa [8] also used PCA to combine face and ear, working with profile images and manually extracted features; on a dataset of 18 subjects of profile face and ear, the recognition rate was 94.44%. Middendorff and Bowyer [6] used PCA/ICP for face/ear, manually annotating feature landmarks; on a 411-subject dataset they achieved a best fusion rate of 97.8%. Yuan et al. [14] used a full-space linear discriminant analysis (FSLDA) algorithm on a 75-subject database with 4 images each (USTB) and on the ORL database of 75 subjects, achieving a best recognition rate of 98.7%.

The novel feature of this paper is the development of a multibiometric system using face and ear as biometrics that requires no manual intervention. Evaluated on two separate databases, it achieved recognition rates of 98.4% on a subset of FERET [13] and 99.2% on the CVL Face Database [15]. The automation process includes trained face and ear detectors, extraction, cropping, and preprocessing.

This paper is organized as follows. We start with object detection in Section 2. In Section 3, we discuss unimodal biometrics for face and ear. In Section 4, we describe a multimodal biometric combining face and ear for recognition. In Section 5, we present the experimental data along with results. Finally, we give concluding remarks.

M. Kamel and A. Campilho (Eds.): ICIAR 2009, LNCS 5627, pp. 451–460, 2009. © Springer-Verlag Berlin Heidelberg 2009

2 Object Detection

The regions of interest are extracted using a Haar-like-feature-based object detector provided by the open-source OpenCV library [9]. This form of detection is based on features that carry information about the object class to be detected. Haar-like features encode oriented contrast regions in images wherever they are found; they are computed similarly to the coefficients of the Haar wavelet transform. These features can be used to detect objects in images, in this case the human face and the human ear. The Haar-like object detector was originally proposed by Viola and Jones [10] and later extended by Lienhart and Maydt [11].

2.1 Face Detection

To create a face detector we used 2000 positive face samples and 5000 negative samples. The positive samples were scaled to the same size of 24x24, which yielded the best and fastest results. The face detector worked very well, detecting all faces with a few false detections. The problem of false detection was overcome by selecting the largest detected region in the image, see Figure 1.

Fig. 1. Image of falsely detected face

2.2 Ear Detection

To create the ear detector we also used 2000 positive samples and 5000 negative images. The positive images were scaled to a size of 16x24 to reflect the rectangular proportions of the ear. The ear detector worked well, with a few falsely detected ears; the problem was overcome by selecting the larger detected object, see Figure 2. Some images had an undetected ear, due to too many occlusions around the ear; see Figure 2 for an example. In such cases the system can still perform simple face recognition, but we chose not to include these images because we were interested in multimodal results.

Fig. 2. On the left, image of falsely detected ear. On the right, undetected ear.
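The false-detection filter described above reduces to keeping the largest candidate box. A minimal sketch, with a hypothetical function name and the (x, y, w, h) box format that OpenCV's detectMultiScale returns assumed:

```python
# Sketch of the paper's false-detection filter: when the cascade detector
# returns several candidate boxes, keep only the one with the largest area.
# Box format (x, y, w, h); function name is ours, not from the paper.

def largest_detection(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])

# Example: a real face region plus a small false positive.
candidates = [(40, 30, 120, 120), (10, 10, 24, 24)]
best = largest_detection(candidates)  # -> (40, 30, 120, 120)
```

The same helper serves both face and ear detection; only the cascade differs.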

3 Unimodal Biometrics

We extracted only the portion of the image which was detected. For the face, the detected region was further cropped in width to remove unwanted areas not belonging to the face. The ear was also extracted and further cropped to obtain a more accurate ear representation. This was done automatically; the best cropping techniques and parameters were determined experimentally.

For both face and ear we used Principal Component Analysis (PCA) for recognition. PCA is a successful, largely statistical method for recognition in images. It transforms the image space into a feature space, which is then used for recognition, by translating the pixels of an image into principal components. The eigenspace is determined by the eigenvectors of the covariance matrix derived from the images.

Let a face/ear image be represented by an N × N matrix I(x, y), and let the training database consist of images I_1, ..., I_M. Each image is converted to a vector of length N². The average face Υ is

    Υ = (1/M) Σ_{n=1}^{M} I_n.


Each face differs from the average face Υ by the vector φ_i = I_i − Υ. This set of vectors is subjected to PCA, which seeks a set of N² orthonormal vectors μ_k with eigenvalues λ_k. Let C be the covariance matrix

    C = (1/M) Σ_{n=1}^{M} φ_n φ_nᵀ = A Aᵀ,

where A = [φ_1, φ_2, ..., φ_M], and μ_k and λ_k are its eigenvectors and eigenvalues. The eigenproblem for the N² × N² matrix C is computationally intensive, so instead we determine the M eigenvectors ν_k and M eigenvalues λ_k of the smaller M × M matrix AᵀA; observe that A ν_k are then eigenvectors of C = A Aᵀ. We then use linear combinations of the M training faces to form the eigenfaces u_l:

    u_l = Σ_{n=1}^{M} ν_{l,n} φ_n.

We usually use only a subset of M′ eigenfaces corresponding to the largest eigenvalues. For classification, an unknown face image I is resolved into weight components by the transformation

    ω_k = u_kᵀ (I − Υ),   k = 1, ..., M′,

and we form a new weight vector

    Ω = [ω_1, ..., ω_{M′}]ᵀ.

Let Ω_k be the vector describing the k-th face class. We then compute the Euclidean distance ε_k = ||Ω − Ω_k|| and classify face I to the class k for which ε_k is minimum.
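As an illustration, the PCA training and nearest-neighbour classification described above can be sketched in NumPy. This is our reconstruction, not the authors' code, and the function names are invented; it uses the small-matrix trick of diagonalizing AᵀA instead of AAᵀ:

```python
import numpy as np

def train_eigenfaces(images, k):
    """images: M flattened training images (M x N^2). Returns mean, eigenfaces, weights."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)                  # average face Y (Upsilon)
    A = (X - mean).T                       # N^2 x M matrix with columns phi_n
    # Small-matrix trick: eigendecompose A^T A (M x M) instead of A A^T (N^2 x N^2).
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1][:k]     # keep the k largest eigenvalues
    U = A @ vecs[:, order]                 # A v are eigenvectors of A A^T
    U /= np.linalg.norm(U, axis=0)         # orthonormal eigenfaces u_l as columns
    W = U.T @ A                            # k x M matrix of training weight vectors
    return mean, U, W

def classify(image, mean, U, W):
    """Project a probe image into eigenspace; return index of nearest training face."""
    omega = U.T @ (np.asarray(image, dtype=float) - mean)
    return int(np.argmin(np.linalg.norm(W - omega[:, None], axis=0)))

# Toy example: three 4-pixel "images"; each training image matches itself.
gallery = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0]]
mean, U, W = train_eigenfaces(gallery, k=2)
match = classify([1.0, 0, 0, 0], mean, U, W)  # -> 0
```

In practice k would be the M′ of the text, chosen well below M; the Euclidean distance in `classify` is the ε_k of the classification rule above.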

4 Multimodal Biometrics

A multibiometric system normally overcomes many of the factors that plague a unimodal biometric system such as noise, variability and error rates [12]. Apart from the benefit of a higher recognition rate, a multimodal biometric system can also help in lowering false rejection error rates. The approach we adopt for our multibiometric system is a multimodal approach (face and ear) with a single algorithm (PCA).


Fig. 3. Sample images

Fig. 4. Graph of unimodal recognition rates for face and ear

4.1 Database

We used two databases for our experiments. The first consisted of a subset of FERET [13] with 854 images of 100 subjects; see Figure 3 for sample images. For each subject there are at least two frontal images and two profile images, ensuring enough frontal and profile images per subject. The second database is the CVL Face Database [15], which consists of 798 images in total: 114 subjects, with 7 images per subject taken at various angles.

4.2 Individual Face and Ear Recognition

Each mode was first run separately: we ran face through to the recognition phase and then did the same for ear. Using the Euclidean distance, we achieved better results for face than for ear. The best results for face were 93.6% on FERET and 94.2% on CVL; the best results for ear were 75.8% and 76.4%, respectively.

4.3 Fusion Recognition

In a multibiometric system, fusion determines the classification from both individual biometrics. There are many ways to achieve this; one simple method is to sum the two biometric scores and classify on the combined result. We experimented with several fusion methods, discussed in detail in the experimental section, seeking to combine the results from both biometrics so as to maximize recognition.

5 Experiments

We experimented with several fusion techniques to discover which yielded the best results. We not only present the experimental data but also describe and analyze it, so that a better understanding of multimodal biometrics, specifically face and ear, can be gleaned from this research. Using the automation techniques previously described, we avoided the manual intervention, such as landmarking and feature extraction, required in [7,8,6]. Experimental results also show that our automated multimodal approach achieves significant improvements in recognition rates compared to those papers.

5.1 Multimodal Recognition

To properly compare and fuse distances from different modes, an accurate normalization must be applied to the distances. We used min-max normalization [6]: a distance x_i in a dataset is normalized to

    x′_i = (x_i − min_i) / (max_i − min_i),

where min_i and max_i are the minimum and maximum values of that dataset. This maps every distance into the range [0, 1], allowing face and ear values to be fused and compared more accurately.

5.2 Normalized Sum

The distances for face and ear are first normalized using min-max normalization; the two normalized distances are then summed to give a combined score, and the candidate with the smallest combined distance is selected. Using the Euclidean distance, the best recognition rate achieved was 95.2% for FERET and 96.1% for CVL.
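A sketch of min-max normalization followed by the normalized-sum rule might look like the following (our own illustration; the function names are hypothetical):

```python
import numpy as np

def min_max(d):
    """Min-max normalize a vector of gallery distances to the range [0, 1]."""
    d = np.asarray(d, dtype=float)
    return (d - d.min()) / (d.max() - d.min())

def fuse_normalized_sum(face_dist, ear_dist):
    """Normalize each modality's distances, sum them, and return the
    index of the candidate with the smallest fused distance."""
    fused = min_max(face_dist) + min_max(ear_dist)
    return int(np.argmin(fused))

# Toy gallery of 3 candidates: candidate 1 is closest overall.
face = [5.0, 4.9, 13.4]
ear = [9.7, 8.8, 7.4]
best = fuse_normalized_sum(face, ear)  # -> 1
```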


Table 1. Combined face/ear normalized weighted sum recognition rates using Euclidean distance

weight (face/ear)   FERET dataset   CVL dataset
(1.0/0.0)           93.6%           94.2%
(0.9/0.1)           98.4%           98.9%
(0.8/0.2)           98.4%           99.2%
(0.7/0.3)           96.8%           97.6%
(0.6/0.4)           96.8%           97.1%
(0.5/0.5)           95.2%           96.1%
(0.4/0.6)           91.9%           93.8%
(0.3/0.7)           91.9%           92.2%
(0.2/0.8)           90.3%           91.3%
(0.1/0.9)           85.5%           87.1%
(0.0/1.0)           75.8%           76.4%

Fig. 5. Recognition rates for different face/ear weights using normalized sum

5.3 Weighted Normalized Sum

Using weighted values, the best recognition rate was achieved with face/ear weights in the range (0.9–0.8)/(0.1–0.2) for FERET and at (0.8/0.2) for CVL; see Table 1 for all results. The table shows the effect of different weight values: (1.0/0.0) corresponds to face recognition only, (0.0/1.0) to ear recognition only, and (0.5/0.5) to the unweighted sum. Figure 5 graphs the recognition rates achieved for the different face/ear weights; the curve peaks at face/ear weights of (0.9–0.8)/(0.1–0.2) and then declines.
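The weighted variant simply scales the two normalized distances before summing. A toy sweep over the weights of Table 1 could be sketched as follows (our illustration, not the authors' code; function names hypothetical):

```python
import numpy as np

def weighted_fusion(face_dist, ear_dist, w_face):
    """Fuse min-max-normalized distances with weights (w_face, 1 - w_face);
    return the index of the candidate with the smallest fused distance."""
    norm = lambda d: (np.asarray(d, dtype=float) - min(d)) / (max(d) - min(d))
    fused = w_face * norm(face_dist) + (1.0 - w_face) * norm(ear_dist)
    return int(np.argmin(fused))

# Sweep the face weight from 0.0 to 1.0 in steps of 0.1, as in Table 1.
face = [5.0, 4.9, 13.4]
ear = [9.7, 8.8, 7.4]
picks = [weighted_fusion(face, ear, i / 10) for i in range(11)]
```

With real data, each weight's pick would be compared against ground truth to produce the recognition rates tabulated above.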

5.4 Interval

In our experiments, we also tried a measurement based on the distance between the first- and second-best matches, which we call the Interval-Euclidean distance. The reasoning is that a large gap between the first and second matches indicates that the first selection is more reliable, whereas two very close distances suggest the selection was a close call. Using this distance measurement, we achieved a recognition rate of 95.2% for FERET and 95.6% for CVL.
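The Interval-Euclidean idea, using the gap between the best and second-best distances as a reliability cue, can be sketched as follows (a hypothetical helper of our own, not the authors' code):

```python
def interval_gap(distances):
    """Gap between the best and second-best match distances: a larger
    gap suggests the top match is more reliable."""
    ranked = sorted(distances)
    return ranked[1] - ranked[0]

# A wide gap (confident match) versus a narrow gap (a 'close call').
confident = interval_gap([0.10, 0.90, 0.95])   # large gap, about 0.80
close_call = interval_gap([0.10, 0.12, 0.95])  # small gap, about 0.02
```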

5.5 Weighted Interval

We also experimented with weights on the interval recognition algorithm, using the same weights as for the normalized weighted sum. Table 2 presents the results for different face/ear weights on both datasets, and Figure 6 graphs them. There is a slight improvement in the face/ear weight range (0.7–0.6)/(0.3–0.4) for FERET and in the range (0.8–0.6)/(0.2–0.4) for CVL.

5.6 Errors

We did encounter a few errors with the algorithm; some error cases are shown in Table 3. The subject in this case is 00796, using a face/ear weighted sum with weights 0.8/0.2. The table shows the Euclidean distances, the weighted normalized values, and the weighted sum; the algorithm incorrectly selects subject 00768 as the best-fit candidate. Further research may help remedy this problem.

Table 2. Combined face/ear weighted interval recognition rates. The weight (1.0/0.0) represents face recognition only and (0.0/1.0) represents ear recognition only.

weight (face/ear)   FERET dataset   CVL dataset
(1.0/0.0)           93.6%           94.2%
(0.9/0.1)           93.6%           94.8%
(0.8/0.2)           95.2%           96.3%
(0.7/0.3)           96.8%           97.5%
(0.6/0.4)           96.8%           97.2%
(0.5/0.5)           95.2%           95.6%
(0.4/0.6)           93.6%           94.2%
(0.3/0.7)           88.7%           91.1%
(0.2/0.8)           83.9%           85.5%
(0.1/0.9)           79.0%           80.6%
(0.0/1.0)           75.8%           76.4%


Fig. 6. Recognition rates for different face/ear weights using interval

Table 3. Combined face/ear weighted sum error for subject 00796

            Euclidean Distance             Normalized Euclidean Distance
Candidate   face           ear             0.8*face   0.2*ear   0.8/0.2 sum
00796       5.02903e+006   9.73122e+006    0.00171    0.01595   0.01766
00768       4.92275e+006   8.77253e+006    0          0.00945   0.00945
00792       1.3419e+007    7.37937e+006    0.13631    0          0.13631

6 Concluding Remarks

In this paper, we described an automated multibiometric system using face and ear. Among several fusion methods, a normalized Euclidean weighted sum with face/ear weights of (0.8/0.2) gives the best results for both FERET and CVL: 98.4% and 99.2%, respectively. These results may aid the development of a passive recognition system in which the subject's cooperation is not required.

References

1. Turk, M.A., Pentland, A.P.: Eigenfaces for recognition. Journal of Cognitive Neuroscience 3(1), 71–86 (1991)
2. Qin, J., He, Z.S.: A SVM face recognition method based on Gabor-featured key points. In: Proc. Fourth Int. Conf. Machine Learning and Cybernetics, pp. 5144–5149 (2005)
3. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 711–720 (1997)
4. Penev, P.S., Atick, J.J.: Local feature analysis: a general statistical theory for object representation. Network: Comput. Neural Syst., 477–500 (1996)


5. Jain, A., Flynn, P., Ross, A.A.: Handbook of Biometrics. Springer, Heidelberg (2008)
6. Middendorf, C., Bowyer, K.W.: Multibiometrics using face and ear. In: Handbook of Biometrics, pp. 315–341. Springer, Heidelberg (2008)
7. Chang, K., Bowyer, K., Sarkar, S., Victor, B.: Comparison and combination of ear and face images in appearance-based biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9), 1160–1165 (2003)
8. Rahman, M.M., Ishikawa, S.: Proposing a passive biometric system for robotic vision. In: Proc. 10th International Symp. on Artificial Life and Robotics (2005)
9. OpenCV library, http://sourceforge.net/projects/opencvlibrary/
10. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: Proceedings of IEEE Computer Vision and Pattern Recognition (2001)
11. Lienhart, R., Maydt, J.: An extended set of Haar-like features for rapid object detection. In: Proceedings of IEEE International Conference on Image Processing, pp. 900–903 (2002)
12. Bubeck, U.M., Sanchez, D.: Biometric authentication: technology and evaluation. Technical Report, San Diego State University (2003)
13. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face-recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(10), 1090–1104 (2000)
14. Yuan, L., Mu, Z.-C., Xu, X.-N.: Multimodal recognition based on face and ear. In: International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2007), vol. 3, pp. 1203–1207 (2007)
15. Peer, P.: CVL Face Database, http://www.lrv.fri.uni-lj.si/facedb.html