Face Recognition Using Deep Features

Hamid Ouanan, Mohammed Ouanan, and Brahim Aksasse
Department of Computer Science, M2I Laboratory, ASIA Team, Faculty of Science and Techniques, Moulay Ismail University, BP 509 Boutalamine, 52000 Errachidia, Morocco
[email protected], [email protected], [email protected]

Abstract. Recent studies have discovered that the human brain has a deep face-processing network in which identity is processed by many different neurons. Consequently, we turn our attention to deep neural network architectures in order to approach human performance in face recognition. In this paper, we make the following contributions. First, we build a novel dataset with over four million faces labelled for identity, employing a smart synthetic-augmentation approach based on a rendering pipeline to increase pose and lighting variability. Second, we train a robust deep CNN model on this dataset. Finally, we present a new real-time application of the proposed approach, called PubFace, which allows users to identify anyone in public spaces. Experiments conducted on the well-known LFW dataset demonstrate that the proposed approach achieves state-of-the-art results.

Keywords: Face recognition · Artificial intelligence · Deep learning · Data augmentation · Big data · Smart digital

1 Introduction

Deep learning has been widely used in the computer vision community, significantly improving the state of the art. Thanks to deep learning, in particular convolutional neural networks (CNNs), the past year (2016) saw incredible breakthroughs in artificial intelligence. In March 2016, Google DeepMind's AlphaGo program [1] beat a Korean Go champion by four games to one, the first time a computer Go program had defeated a top human player. In June 2016, the Chinese search-engine company Baidu announced unmatched performance in machine translation: six points better than the state of the art. In September 2016, Google replied with a further improvement and integrated the technique into its well-known translation tool [2]. In November 2016, a team from Oxford and Google described their lip-reading program [3]. These are just a few of the milestones that artificial intelligence (AI) reached in the past year (2016). The success of deep learning stems from the availability of very large training datasets, which are the main key to building a great CNN-based model. However, in the area of face recognition, these new advances remain limited to Internet giants like Facebook, Flickr and Google, which hold the world's largest private databases. Besides, there are many challenges in such applications, including variation in illumination and variability in
© Springer International Publishing AG 2018 M. Ezziyyani et al. (eds.), Advanced Information Technology, Services and Systems, Lecture Notes in Networks and Systems 25, https://doi.org/10.1007/978-3-319-69137-4_8


scale, location, orientation and pose. Furthermore, facial expressions, facial decorations, partial occlusion, and changing lighting conditions alter the overall appearance, making faces harder to recognize. The remainder of the paper is organized as follows. The second section reviews recent advances in face recognition techniques. Our new approach to large-scale face recognition in the wild is given in the third section. The fourth section presents extensive experimental results. This is followed by a presentation of the main features and some specific application areas of the PubFace app; finally, the last section concludes the paper.

2 Related Works

In this section, we draw up a state-of-the-art review of data augmentation algorithms and of the face recognition methods that give good results.

Big data: Recently, the number of face images on social networks such as Facebook and Twitter has been growing exponentially. As an example, the director of Facebook AI Research, Yann LeCun, has said that "almost 1 billion new photos were uploaded each day on Facebook in 2016" [4]. Large face datasets are important for deep CNNs [5]. However, building a large dataset by downloading images from search engines is very difficult and financially costly. One way to get around the lack of large face datasets is to augment the data.

Data augmentation: To train face recognition systems based on deep convolutional neural networks, very large training sets with millions of labeled images are needed. However, such large training datasets are not publicly available and are very difficult to collect. In this work, we present a method to generate very large training datasets of synthetic images by compositing real face images from a small dataset collected from social networks. There are many smart approaches to augmenting the size of the training set 10-fold or more. Popular augmentation methods include simple geometric transformations such as oversampling [6], mirroring [7, 8] and rotating [9] the images, as well as hairstyle synthesis [10], glasses synthesis [11], 3D face reconstruction [12] and illumination synthesis [12].

State-of-the-art face recognition: Traditional feature extractors such as Gabor-Zernike features [13], HOG [14, 15] and SIFT [16] produce good results under controlled conditions (constrained environments), as represented by the FERET benchmark [17]. However, the recognition performance of these representations may decrease dramatically in the wild, as represented by LFW [18]. This is because these features are not robust to visual challenges such as pose, illumination and expression.
In light of these nuisance factors, deep CNN feature extractors, obtained by concatenating several linear and non-linear operators, have replaced conventional feature extractors. These features have demonstrated their potential by producing promising face recognition rates in the wild. A popular approach in this class of methods is DeepFace [19], which uses a CNN architecture trained on a dataset of four million images spanning 4,000 subjects. This approach achieves excellent recognition


accuracy, close to that of the human visual system, on the LFW benchmark [18]. This work was extended by the DeepID series of papers [20, 21] using multiple CNNs [22]. Other interesting approaches have also been proposed [23–25].
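As a concrete illustration, the simple geometric augmentations surveyed above (oversampling of crops [6], mirroring [7, 8] and rotation [9]) can be sketched with NumPy. The image size and helper names here are illustrative, not taken from any of the cited implementations:

```python
import numpy as np

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(152, 152, 3), dtype=np.uint8)  # H x W x RGB

def mirror(img):
    """Horizontal flip, as in [7, 8]."""
    return img[:, ::-1]

def rotate(img, k=1):
    """Rotation by k * 90 degrees; a stand-in for the rotations of [9]."""
    return np.rot90(img, k)

def oversample(img, size=128):
    """Four corner crops plus the centre crop, as in [6]."""
    h, w = img.shape[:2]
    c = ((h - size) // 2, (w - size) // 2)
    offsets = [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size), c]
    return [img[i:i + size, j:j + size] for i, j in offsets]

# 5 crops plus their mirrors: a 10-fold increase per training image
views = oversample(face) + [mirror(c) for c in oversample(face)]
print(len(views), views[0].shape)   # 10 views of size 128 x 128 x 3
assert rotate(face).shape == (152, 152, 3)
```

Combining crops with their mirrored versions is how a 10-fold augmentation factor is typically obtained in practice.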

3 The Proposed Method

In this section, we present our contributions to improving face recognition performance in the wild, particularly in terms of pose and illumination invariance.

3.1 Data Augmentation

In this sub-section, we describe the process used to build a large synthetic face dataset. The steps of this process, summarized in Fig. 1, are:

Step 1: Select a list of names of public figures.
Step 2: Filter the list of candidate identity names.
Step 3: Detect faces using a robust face detector.
Step 4: Build a small dataset by collecting some photos for each identity.
Step 5: Increase the size of the small dataset built in Step 4 using the method proposed in [26].

Fig. 1. Main stages of the dataset building process
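The five steps above can be sketched as a small pipeline skeleton. All function and variable names below are hypothetical placeholders: a real pipeline would run the detectors of [27–30] in Step 3 and the synthesis-based augmentation of [26] in Step 5.

```python
def filter_candidates(names, min_photos, photo_counts):
    """Step 2: keep only identities with enough collected photos."""
    return [n for n in names if photo_counts.get(n, 0) >= min_photos]

def augment(images, factor):
    """Step 5: placeholder for synthesis-based augmentation,
    e.g. rendering extra pose/illumination views per photo."""
    return images * factor

def build_dataset(names, photo_counts, photos_per_id=5, factor=10):
    identities = filter_candidates(names, photos_per_id, photo_counts)
    dataset = {}
    for name in identities:
        # Steps 3-4: in practice, detect and crop faces from each photo
        photos = [f"{name}_{i}.jpg" for i in range(photo_counts[name])]
        dataset[name] = augment(photos, factor)
    return dataset

counts = {"alice": 6, "bob": 2, "carol": 8}
ds = build_dataset(["alice", "bob", "carol"], counts)
print(len(ds))            # bob is filtered out: 2 identities remain
print(len(ds["alice"]))   # 6 photos x 10 augmented views = 60
```

The point of the sketch is the ordering of the stages: filtering shrinks the identity list before any expensive detection, and augmentation multiplies the dataset size only at the very end.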

First, we select a list of names of public figures (football players, actors and political figures) in order to obtain their face images and associated information via Facebook Graph Search. Next, we apply the filtering process to reduce the list of identities. Then, a robust face detector is applied [27–30]. Finally, we apply the smart augmentation approach of [12, 26] to increase the size of our dataset. In this manner, a final list of four thousand public-figure names is obtained. We call the resulting collection the Puball-dataset; all of its images have a size of 152 × 152 pixels. Table 1 gives some statistics on the largest public and private face datasets.

Table 1. Dataset comparisons

Dataset                  Identities    Images
Facebook                 4,030         4.4 M
Google                   8 M           200 M
MegaFace                 690,572       1.02 M
CASIA                    10,575        494,414
VGG Face                 2,622         2.6 M
LFW                      5,749         13,233
CelebFaces               10,177        202,599
Chen et al.              2,995         99,773
Puball-dataset (ours)    4,000         5 M


In the next sub-section, we present the deep CNN architecture adopted and the training process used in our experiments.

3.2 Network Architecture and Training
We use VGGNet, the off-the-shelf deep model of [31], originally trained on the ImageNet large-scale image recognition benchmark (ILSVRC) [32], and fine-tune it on our training dataset. The input to the adopted deep CNN architecture is an RGB face image (3 × 96 × 96). As shown in Fig. 2, the deep architecture that we use for representing faces consists of many function compositions, or layers, followed by a loss function; the loss function measures how accurately the network classifies a face image. The network comprises more than 40 layers, each linear operator being followed by spatial batch normalization (SBN) and one or more non-linearities such as ReLU and max pooling. The input image is passed through a series of convolution filters and non-linear projections to obtain the identity classification; this process is serially repeated several times, which gives such networks their popular "deep" label. A rectification layer (ReLU) follows every convolution layer. The last three blocks are fully connected (FC); they operate in the same way as convolutional layers. The resulting vector is passed to a classification layer to compute the class posterior probabilities.

Fig. 2. CNN architecture adopted in our approach
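To make the conv → ReLU → max-pool → FC → softmax flow of Fig. 2 concrete, here is a toy NumPy forward pass through one such block. It is at a much smaller scale than the real 3 × 96 × 96 input and 40-plus layers, uses random weights, and omits batch normalization; it illustrates the layer types, not the authors' actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w):
    """Naive valid convolution; x: (C, H, W), w: (K, C, kh, kw)."""
    K, C, kh, kw = w.shape
    _, H, W = x.shape
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[k, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w[k])
    return out

def relu(x):
    return np.maximum(x, 0)

def maxpool2(x):
    """2 x 2 max pooling; assumes even spatial dimensions."""
    C, H, W = x.shape
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

x = rng.standard_normal((3, 16, 16))           # toy RGB input
w1 = 0.1 * rng.standard_normal((4, 3, 3, 3))   # 4 convolution filters
h = maxpool2(relu(conv2d(x, w1)))              # conv -> ReLU -> max pool
fc = 0.01 * rng.standard_normal((10, h.size))  # fully connected layer
p = softmax(fc @ h.ravel())                    # class posterior probabilities
print(p.shape, round(float(p.sum()), 6))       # (10,) 1.0
```

The final softmax output is a probability distribution over identities, which is exactly what the classification layer described above produces.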

4 The Experiments and Tests

The performance of the proposed approach is assessed by conducting experiments on the well-known LFW dataset, which is described briefly below. In addition, we compare our approach with competitive supervised methods and with the best current commercial systems. The receiver operating characteristic (ROC) curve is used to evaluate the performance of our proposed approach.


4.1 Labeled Faces in the Wild

The dataset contains 13,233 images of 5,749 people downloaded from the Web. It covers large variations in subject, pose, illumination, occlusion, etc. Two views are provided: one to develop and validate models, and another for final testing. For evaluation, we use the standard protocol, which defines 3,000 positive pairs and 3,000 negative pairs in total and further splits them into 10 disjoint subsets for cross-validation. Each subset contains 300 positive and 300 negative pairs, portraying different people.

4.2 Results and Discussion

We present the average ROC curve in Fig. 3. In addition, we compare the mean accuracy of the proposed approach with several methods that achieve state-of-the-art results and with other commercial systems. The results are summarized in Table 2.
Fig. 3. ROC curve of our proposed approach on the LFW dataset

Table 2. Accuracy of different methods on the LFW dataset

Method                     Mean accuracy
DeepFace [19]              97.35%
DeepID2 [20]               95.43%
Yi et al. [33]             96.13%
Wang et al. [34]           96.95%
Human [35]                 97.53%
Our proposed approach      98.12%

As can be seen from Table 2, our approach compares favourably with the other methods and with commercial systems. The proposed method achieves good results on the LFW dataset, which contains faces under full pose variation, varying illumination, and other difficult conditions. It is robust, especially in the presence of large head pose variations.
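The 10-fold verification protocol of Sect. 4.1 can be illustrated as follows, using synthetic similarity scores in place of real CNN features. The embedding dimension, noise level and threshold grid are arbitrary choices for the illustration; accuracy on real LFW pairs would of course depend on the learned features.

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

# synthetic stand-in for the 6,000 LFW pairs: 3,000 matched, 3,000 mismatched
d = 128
base = rng.standard_normal((3000, d))
pos_a, pos_b = base, base + 0.3 * rng.standard_normal((3000, d))
neg_a, neg_b = rng.standard_normal((3000, d)), rng.standard_normal((3000, d))

scores = np.concatenate([cosine(pos_a, pos_b), cosine(neg_a, neg_b)])
labels = np.concatenate([np.ones(3000), np.zeros(3000)])

# 10 disjoint folds of 300 positive + 300 negative pairs each
folds = [np.r_[i*300:(i+1)*300, 3000+i*300:3000+(i+1)*300] for i in range(10)]

accs = []
for i in range(10):
    test = folds[i]
    train = np.setdiff1d(np.arange(6000), test)
    # pick the threshold maximizing accuracy on the 9 training folds
    ts = np.linspace(-1, 1, 201)
    best_t = max(ts, key=lambda t: np.mean((scores[train] > t) == labels[train]))
    accs.append(float(np.mean((scores[test] > best_t) == labels[test])))

print(round(float(np.mean(accs)), 3))   # mean verification accuracy
```

Reported LFW numbers such as those in Table 2 are exactly this kind of mean over the 10 held-out folds, with the decision threshold selected on the remaining nine.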


5 PubFace Application

Having presented the proposed approach to face recognition in the wild, we now look at the features and some specific applications of the PubFace app that we have developed. The number of Facebook users in Morocco is more than 12,000,000. PubFace can scan billions of Facebook profile images in real time against a database that we built by following the same process presented in Sect. 3.1. This dataset, called Pub-dataset, includes photos of approximately half of these adults, downloaded from Facebook without their knowledge or consent, in the hunt for suspected criminals. Moreover, PubFace is able to link most face photos (from a side view as well as when the person directly faces the camera) with a profile on the social network; the PubFace app can thus tell you who a person is. The PubFace app may be used in contexts such as:

City surveillance: Large cities require more resources to handle threats such as vehicle theft, pickpockets, assaults, gang violence and shootings. PubFace can improve overall public safety by reducing response times and providing law enforcement agencies with the ability to handle emergencies more effectively.

Photo organizing: With the rapid growth of personal digital content thanks to smart phones, there is an increasing need for automatic organization tools that cluster picture collections by person identity.
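A minimal sketch of the matching step behind such an app is nearest-neighbour search over normalized face embeddings with a rejection threshold. The gallery, embedding dimension and threshold below are invented for illustration; PubFace's actual index and features are not described at this level of detail.

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical gallery: one 128-d embedding per enrolled profile
gallery = {f"profile_{i}": rng.standard_normal(128) for i in range(1000)}
names = list(gallery)
G = np.stack([gallery[n] for n in names])
G /= np.linalg.norm(G, axis=1, keepdims=True)   # unit-norm rows

def identify(probe, threshold=0.6):
    """Return the best-matching profile, or None below the threshold."""
    p = probe / np.linalg.norm(probe)
    sims = G @ p                      # cosine similarity to every profile
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else None

# a probe that is a noisy copy of profile_42's embedding
probe = gallery["profile_42"] + 0.2 * rng.standard_normal(128)
print(identify(probe))   # matches profile_42 despite the noise
```

The rejection threshold is what keeps an unenrolled face from being forced onto its nearest gallery entry; in a deployed system it would be tuned on a validation set to trade false matches against missed ones.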

6 Conclusion

In this paper, we have presented a new approach to large-scale face recognition in the wild. Our approach, based on deep learning, was trained on the Puball-dataset (Sect. 3) and evaluated on the LFW dataset. Experimental results demonstrate that the performance of the proposed approach compares favourably with state-of-the-art methods and commercial systems. Moreover, we have briefly presented the main features and some specific application areas of our PubFace app, which could help reduce crime by making everyone identifiable.

References

1. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
2. Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., et al.: Google's neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144 (2016)
3. Assael, Y.M., Shillingford, B., Whiteson, S., de Freitas, N.: LipNet: end-to-end sentence-level lipreading. arXiv preprint arXiv:1611.01599 (2016)
4. https://www.youtube.com/watch?v=vlQomVlaNFg&t=317s


5. Masi, I., Tran, A.T., Leksut, J.T., Hassner, T., Medioni, G.: Do we really need to collect millions of faces for effective face recognition? CoRR, abs/1603.07057 (2016)
6. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems, pp. 1097–1105 (2012)
7. Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: delving deep into convolutional nets. In: Proceedings British Machine Vision Conference (2014)
8. Yang, H., Patras, I.: Mirror, mirror on the wall, tell me, is the error small? In: Proceedings Conference on Computer Vision and Pattern Recognition (2015)
9. Xie, S., Tu, Z.: Holistically-nested edge detection. In: Proceedings International Conference on Computer Vision (2015)
10. Liu, J.X., Liu, S., Xu, H., Zhou, X., Yan, S.: Wow! You are so beautiful today! ACM Trans. Multimedia Comput. Commun. Appl. 11(1s), 20 (2014)
11. Wen, Y., Liu, W., Yang, M., Fu, Y., Xiang, Y., Hu, R.: Structured occlusion coding for robust face recognition. Neurocomputing 178, 11–24 (2016)
12. Jiang, D., Hu, Y., Yan, S., Zhang, L., Zhang, H., Gao, W.: Efficient 3D reconstruction for face recognition. Pattern Recogn. 38(6), 787–798 (2005)
13. Ouanan, H., Ouanan, M., Aksasse, B.: Gabor-Zernike features based face recognition scheme. Int. J. Imaging Robot. 16(2), 118–131 (2015)
14. Déniz, O., Bueno, G., Salido, J., De la Torre, F.: Face recognition using histograms of oriented gradients. Pattern Recognit. Lett. 32(12), 1598–1603 (2011)
15. Ouanan, H., Ouanan, M., Aksasse, B.: Gabor-HOG features based face recognition scheme. TELKOMNIKA Indonesian J. Electr. Eng. 15(2), 331–335 (2015)
16. Liu, C., Yuen, J., Torralba, A.: SIFT flow: dense correspondence across scenes and its applications. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-23048-1_2
17. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.J.: The FERET evaluation methodology for face recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 22(10), 1090–1104 (2000)
18. Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. University of Massachusetts, Amherst, TR 07-49 (2007)
19. Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: IEEE CVPR (2014)
20. Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Advances in Neural Information Processing Systems (2014)
21. Sun, Y., Ding, L., Wang, X., Tang, X.: DeepID3: face recognition with very deep neural networks. CoRR, abs/1502.00873 (2015)
22. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)
23. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
24. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
25. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: CVPR (2015)
26. Ouanan, H., Ouanan, M., Aksasse, B.: Novel approach to pose invariant face recognition. Procedia Comput. Sci. 110, 434–439 (2017)


27. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vision 57(2), 137–154 (2004)
28. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Proceedings of Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, pp. 2879–2886 (2012)
29. Ouanan, H., Ouanan, M., Aksasse, B.: Facial landmark localization: past, present and future. In: 4th IEEE International Colloquium on Information Science and Technology (CiSt), pp. 487–493 (2016)
30. Ouanan, H., Ouanan, M., Aksasse, B.: Implementation and optimization of face detection framework based on OpenCV library on mobile platforms using Davinci's technology. Int. J. Imaging Robot. 15(4) (2015)
31. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: International Conference on Learning Representations (2015)
32. Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A.C., Li, F.F.: ImageNet large scale visual recognition challenge. IJCV (2015)
33. Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv preprint arXiv:1411.7923 (2014)
34. Wang, D., Otto, C., Jain, A.K.: Face search at scale: 80 million gallery. arXiv preprint arXiv:1507.07242 (2015)
35. Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: IEEE International Conference on Computer Vision (ICCV), October 2009