Combining Wavelets with HMM for Face Recognition

Li Bai and Linlin Shen
School of Computer Science & IT
University of Nottingham
Nottingham NG8 1BB
{bai,lls}@cs.nott.ac.uk

Abstract

This paper describes face recognition algorithms that improve upon the original DCT based HMM face recogniser by using wavelet multi-resolution analysis to extract observation sequences. In this approach a face image is divided into a number of overlapping subimages and wavelet decomposition is performed on each subimage. The ORL and our own face databases are used to test the algorithms, and our algorithms are observed to perform better than the original. The face recognition algorithm has been incorporated into a real-time face recognition system developed at the University of Nottingham.

1. Introduction

Automated face recognition systems have a wide range of important applications, such as crime prevention and detection. Major approaches to face recognition include graph matching, neural networks, and Hidden Markov Models. An example of graph matching is the Dynamic Link Architecture first proposed by von der Malsburg [1]. Gabor wavelet responses at the intersection points of a rectangular grid overlaid on the face are taken as image features and represented as a graph structure, with the extracted features as the nodes and the topological relationships between feature locations as the edges of the graph. The face recognition process then becomes an elastic matching process, in which the test face graph is associated with one of the stored model face graphs through an optimisation procedure that maximises the similarity between the test graph and the corresponding model face graph. More recently the Face Bunch Graph [2] was proposed to cope with variations in face images. Gabor wavelets are applied at manually selected fiducial points (eyes, mouth, nose, etc.) of several images of a face and the results, referred to as jets, are packed into a graph called the Face Bunch Graph. Alternatively, faces can be recognised by distinctive features suited to a particular classifier such as a neural network [3]. Liu described a method that forms feature vectors by convolving the image with Gabor wavelet kernels at five scales and eight orientations [4]. The rows of each resulting image are concatenated into a single vector and all vectors so obtained are concatenated, producing a vector of 10240 dimensions; the eigenface method [5] is then used to reduce the vector dimension. Eigenfaces are a set of orthogonal eigenvectors arising from applying Principal Component Analysis [6] to a collection of images in a face database, and any face image can then be described by its projection coefficients onto the eigenfaces.
This reduces the dimension of the face vectors before recognition takes place and removes the need to deal with the potentially difficult problem of locating facial features. Though the most well known, the eigenface method is by no means perfect. Belhumeur's fisherface method [7] is motivated by the observation that the eigenfaces corresponding to large eigenvalues often encode variations in illumination rather than the identity of the individual, and that the eigenface approach projects into a space that captures the greatest overall variance of all classes without discrimination. The fisherface approach uses within-class and between-class scatter matrices to formulate class separability criteria; it projects into a space that maximises the variance between different classes and minimises the variance within the same class. Other extensions of the eigenface method include McGuire's eigenpaxels [8], Etemad's Linear Discriminant Analysis [9], and Pentland's view-based and modular eigenspaces [10]. Independent Component Analysis (ICA) is believed to be more effective than PCA [11] as it imposes a much stronger criterion. However, Baek [12] presents a rigorous comparison of PCA and ICA and finds that, when a proper distance metric is used, PCA significantly outperforms ICA on the FERET face database [13] of more than 1000 face images, even though ICA is computationally more complex and uses PCA for pre-processing. In the following sections we describe HMM based face recognition and our experiments in using wavelets to improve the accuracy of the embedded HMM.
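The eigenface projection described above can be sketched in a few lines. This is a generic illustration of PCA on flattened face images, with our own function and variable names rather than code from any of the cited papers:

```python
import numpy as np

def eigenfaces(X, k):
    """Compute the mean face and the top-k eigenfaces of the data matrix
    X (n_images x n_pixels, one flattened face per row). The SVD of the
    centred data yields the same eigenvectors as PCA on the covariance."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]  # each row of Vt is one orthonormal eigenface

def project(x, mean, V):
    """Describe a face x by its projection coefficients onto the eigenfaces."""
    return V @ (x - mean)
```

A face is then compared against stored faces via distances between these low-dimensional coefficient vectors rather than between raw pixel vectors.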

2. The Hidden Markov Model

Hidden Markov Models (HMMs) see a face image as a sequence of states produced when the face is scanned from top to bottom. More interesting is the embedded HMM proposed by Nefian [14]. The embedded HMM consists of a set of super states, and each super state is associated with a set of embedded states. The super states of such an HMM represent the primary facial regions, whilst the embedded states within each super state describe those facial regions in more detail. States are associated with probability distributions, and transitions from state to state are governed by a transition probability matrix; transitions between embedded states of different super states are not allowed. At a particular state an observation sequence can be generated according to the associated probability distribution. A face image is divided into overlapping subimages. To deal with illumination changes, image shifts and rotation, 2D DCT [15] coefficients of the subimages are extracted to produce observation vectors, but only the coefficients representing low frequencies in the 2D-DCT domain are chosen, to further reduce the dimension of the observation vectors. Specifically, an HMM can be represented as a triplet λ = {A, B, Π} consisting of:

• The number of states N, and the state at time t, given by q_t, 1 ≤ t ≤ T, where T is the length of the observation sequence.
• The initial state distribution Π = {π_i}, where π_i = p{q_1 = i}, 1 ≤ i ≤ N.
• The state transition probability matrix A = {a_ij}, where a_ij = p{q_{t+1} = j | q_t = i}, 1 ≤ i, j ≤ N, 0 ≤ a_ij ≤ 1, and Σ_{j=1}^{N} a_ij = 1.
• A probability distribution for each of the states, B = {b_j(o_t)}. Usually the probability density function is approximated by a weighted sum of M Gaussian distributions: b_j(o_t) = Σ_{m=1}^{M} c_jm N(µ_jm, Σ_jm, o_t), where N(µ_jm, Σ_jm, o_t) is a Gaussian pdf with mean vector µ_jm and covariance matrix Σ_jm, and c_jm is the weight coefficient of the m-th mixture in state j, with the constraints c_jm ≥ 0, 1 ≤ j ≤ N, 1 ≤ m ≤ M, and Σ_{m=1}^{M} c_jm = 1.
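As a concrete illustration of the emission term b_j(o_t) above, the weighted Gaussian mixture density for one state can be evaluated as follows (a minimal sketch with our own function names, not code from the paper):

```python
import numpy as np

def mixture_emission(o, weights, means, covs):
    """b_j(o) = sum_m c_jm * N(o; mu_jm, Sigma_jm) for one state j.
    o: (D,) observation vector; weights: (M,) mixture weights c_jm summing to 1;
    means: (M, D) mean vectors mu_jm; covs: (M, D, D) covariance matrices Sigma_jm."""
    D = o.shape[0]
    density = 0.0
    for c, mu, S in zip(weights, means, covs):
        diff = o - mu
        # Gaussian normalisation constant 1 / sqrt((2*pi)^D * |Sigma|)
        norm = 1.0 / np.sqrt((2.0 * np.pi) ** D * np.linalg.det(S))
        density += c * norm * np.exp(-0.5 * diff @ np.linalg.solve(S, diff))
    return density
```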

3. Extracting Features for HMM

As described in the last section, the HMM approach to face recognition involves extracting observation vector sequences from subimages. We believe that the DCT may not be the best way to describe an image and that wavelets may be a better choice, so we propose to use wavelet multi-resolution analysis [16]. Wavelets are orthonormal basis functions which are scaled and translated versions of a 'mother wavelet'. The fundamental idea behind wavelet analysis is to look at a function at different scales and resolutions by spanning the function with wavelets [17][18][19]. In practice this is done by convolving the function with wavelet kernels to obtain wavelet coefficients representing the contributions of wavelets to the function at different scales and orientations. Wavelets have many advantages over other mathematical transforms such as the Fourier Transform or the DCT: functions with discontinuities or sharp spikes usually require substantially fewer wavelet basis functions than sine-cosine basis functions to achieve a comparable approximation. Mallat started to use wavelets in signal processing in 1985 when he discovered the relationship between wavelet bases and multi-resolution analysis: constructing wavelet bases corresponds to a multi-resolution analysis process [20]. The multi-resolution analysis process is associated with a father wavelet (the scaling function) and a mother wavelet, and it is the mother wavelet that forms the basis of the discrete wavelet transform. In implementing the discrete wavelet transform, two filters and their adjoints are used to decompose a signal into a pyramid structure in the form of approximation images and detail images. It is observed that features of the approximation images change smoothly, which is a suitable property for discrimination. The approximation images are also known to form a robust and distortion-tolerant feature space for many pattern recognition applications.
In using wavelets to produce observation vectors, our algorithms scan the image from left to right and top to bottom using a P × L sized window and perform wavelet multi-resolution analysis on each subimage using the Symmlet 5 wavelet. Each subimage is decomposed to a certain level, and either the coefficients at the lowest resolution level or the energies of the subbands are selected to form the observation vectors for the HMM. A 2D J-level wavelet decomposition of an image I[P, L] represents the image by 3J + 1 subimages:

[a_J, {d_j^1, d_j^2, d_j^3}_{j=1,...,J}]

where a_J is a low resolution approximation of the original image, and the d_j^k are detail images containing the details of the image at different scales (2^j) and orientations (k). The wavelet coefficients in d_j^1, d_j^2 and d_j^3 correspond to vertical high frequencies (horizontal edges), horizontal high frequencies (vertical edges), and high frequencies in both directions, respectively. Figure 1 shows the 2-level wavelet decomposition of a subimage representing an eye. For this 2-level decomposition, in total 7 subbands

{a_2, d_1^1, d_1^2, d_1^3, d_2^1, d_2^2, d_2^3}

are generated. In the following experiments we produce observation vectors in two different ways:

(i) Extract the wavelet coefficients of a_J.
(ii) Extract the subband energies, i.e., the observation vector is defined as [{e_j^1, e_j^2, e_j^3}_{j=1,...,J}], where e_j^k denotes the l_2-norm of the detail subimage d_j^k.

Figure 1 (a) Extracting observation vectors (b) Wavelet decomposition
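The sliding-window extraction above can be sketched without external wavelet libraries. For brevity this illustration uses the Haar wavelet in place of the paper's Symmlet 5, and the function names are our own; with the paper's settings (12 × 12 window, overlap 3, 2 decomposition levels), method (i) produces observation vectors of the same length (9) as reported in Section 4:

```python
import numpy as np

def haar_step(a):
    """One level of 2D Haar decomposition: split `a` (even-sized) into a
    half-resolution approximation and three detail subbands."""
    lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)    # row lowpass
    hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)    # row highpass
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)  # approximation a
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)  # horizontal edges d^1
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)  # vertical edges   d^2
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)  # diagonal details d^3
    return ll, (lh, hl, hh)

def observation_vectors(image, win=12, overlap=3, level=2, method="approx"):
    """Scan with a win x win window (neighbouring windows overlap by
    `overlap` pixels) and build one observation vector per subimage:
    method="approx": coefficients of the approximation a_J   (method i)
    method="energy": l2-norm of every detail subband d_j^k   (method ii)"""
    step = win - overlap
    vectors = []
    for top in range(0, image.shape[0] - win + 1, step):
        for left in range(0, image.shape[1] - win + 1, step):
            a = image[top:top + win, left:left + win]
            energies = []
            for _ in range(level):
                a, details = haar_step(a)
                energies += [np.linalg.norm(d) for d in details]
            vectors.append(a.ravel() if method == "approx" else np.array(energies))
    return np.array(vectors)
```

The sequence of vectors, read in scan order, is the observation sequence fed to the HMM.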

4. Experimental Results

Two databases, the ORL database and a face database created by ourselves, are used to test the proposed algorithms. The ORL database contains 400 frontal face images of 40 people, whilst our own database consists of 300 images of 30 people. The images in our own database were captured using a Creative webcam in an office environment. These images are more difficult than those in the ORL database due to large lighting and background variations. In the experiments, both P and L are set to 12 and neighbouring subimages are allowed to overlap by 3 pixels in each direction. We first produce observation vectors from the approximation images using method (i), performing a 2-level wavelet decomposition on each subimage. We then use method (ii) to extract subband energies, performing a 5-level wavelet decomposition on each subimage. The lengths of the observation vectors are 9 for method (i) and 15 for method (ii) respectively.

No. of training images    1        2        3        4        5
DCT HMM                   76.1%    78.1%    85.7%    92.5%    98.5%
DWT HMM (i)               77.5%    88.1%    95.4%    95.4%    97.5%
DWT HMM (ii)              80.8%    84.7%    92.5%    95.8%    96.5%

Table 1: Experimental results for the ORL database

Table 1 above shows the comparative results of our new algorithms and the original algorithm, with the number of training images per person varying from 1 to 5. It is clear that when fewer than 5 images are used for training, both of our new methods perform significantly better than the original algorithm. Table 2 below shows the results on our own database when 5 images of each person are used for training; it also shows the computation time required by each of the algorithms. The proposed method (i) performs best and can be applied in real-time systems. A Pentium 4 1.8 GHz PC was used for all the experiments.

              Recognition rate   Training time (per person)   Recognition time (per image)
DCT HMM           92.6%              0.30 seconds                  0.48 seconds
DWT HMM (i)       95.3%              0.68 seconds                  1.55 seconds
DWT HMM (ii)      90.7%              1.27 seconds                  2.49 seconds

Table 2: Experimental results for our own database
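Recognition with the trained models amounts to evaluating each person's model likelihood p(O | λ) on the test observation sequence and choosing the person whose model scores highest. A log-space forward-algorithm sketch of the likelihood computation (a generic illustration, not the authors' implementation) follows:

```python
import numpy as np

def _logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a, axis=axis, keepdims=True)
    s = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(s, axis=axis) if axis is not None else s.item()

def forward_log_likelihood(log_b, log_A, log_pi):
    """log p(o_1..o_T | lambda) via the forward recursion.
    log_b: (T, N) log emission probabilities log b_j(o_t);
    log_A: (N, N) log transition matrix; log_pi: (N,) log initial distribution."""
    alpha = log_pi + log_b[0]                      # alpha_1(j) = pi_j * b_j(o_1)
    for t in range(1, log_b.shape[0]):
        # alpha_t(j) = b_j(o_t) * sum_i alpha_{t-1}(i) * a_ij, in log space
        alpha = log_b[t] + _logsumexp(alpha[:, None] + log_A, axis=0)
    return _logsumexp(alpha)
```

A test face is then assigned the identity argmax over persons of forward_log_likelihood for that person's model.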

5. The Face Recognition System

The face recognition algorithm described in the previous section is incorporated into a real-time face recognition system encompassing two major components: face detection and face recognition. Faces are automatically detected from a video sequence and recognised against a face database. The face detection component is based on that described in [21], which uses Haar-like features to detect facial regions in an image. A tuple r = (x, y, w, h, a) (x, y, w, h ≥ 0 and a ∈ {0°, 45°}) denotes a rectangle, and its pixel sum is represented by S(r). Given a subimage to be tested for the presence of a face, a feature of the subimage is defined as:

feature = w_1 S(r_1) + w_2 S(r_2)

where the weights w_1, w_2 ∈ ℝ and the pair of rectangles r_1, r_2 represent one of the 14 feature prototypes shown in Figure 2; the black and white areas are the two rectangles used to calculate the feature in the equation above. These comprise four edge features, eight line features, and two centre-surround features. A cascade of classifiers with 13 stages is trained to detect faces while rejecting non-face patterns. At each stage a classifier is trained using the Discrete AdaBoost algorithm [22].

Figure 2 Feature prototypes of simple Haar-like features [21].
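The pixel sum S(r) in the feature formula above is typically computed in constant time with an integral image (summed-area table). The following sketch covers upright rectangles only (the 45° rotated variant of [21] is omitted, and the function names and default weights are our own illustrative choices):

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero row/column so that
    ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    """Pixel sum S(r) of the upright rectangle r = (x, y, w, h) in 4 lookups."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_feature(ii, r1, r2, w1=-1.0, w2=1.0):
    """feature = w1 * S(r1) + w2 * S(r2); the weights and rectangle pair
    correspond to one of the feature prototypes in Figure 2."""
    return w1 * rect_sum(ii, *r1) + w2 * rect_sum(ii, *r2)
```

Because every S(r) costs only four table lookups, the cascade can evaluate thousands of candidate subwindows per frame.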

Figure 3 below shows an example image captured from a webcam and the detected face image. It is obvious that the detection is not very accurate: too much background is included in the detected face, which will undoubtedly affect recognition accuracy. We therefore incorporate into the detection algorithm a skin mask module to refine the detected region. The skin colour module is described in detail in one of our earlier papers [23]. To test the effect of the skin mask on face recognition, we used a Creative webcam to capture pictures of 30 people; 10 pictures were taken automatically as each person stood in front of the camera. Figure 4 shows the flow chart used to generate the face databases. Databases B and C are generated with the skin mask on, so the faces thus captured contain less background. The face detection process takes about 100 ms and the skin masking process takes about 45 ms.

Figure 3 (a) Original input image (b) Detected face image

[Flow chart: captured images → Face Detection → Database A; → Skin Masking → Database B; → Skin Masking → Histogram Equalization → Database C]

Figure 4 Creating face databases.

6. Conclusion

We have proposed DWT based HMM algorithms for face recognition. 2D wavelet decomposition is applied to image subwindows to extract observation vectors, either from the low resolution approximation images or from the subband energies. Both methods were tested on two face databases and the results compared with those of the original DCT based HMM algorithm.

References

1. von der Malsburg C., Pattern Recognition by Labelled Graph Matching, Neural Networks, vol. 1, pp. 141-148, 1988.
2. Wiskott L., Fellous J.M., Krüger N., von der Malsburg C., Face Recognition by Elastic Bunch Graph Matching, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, July 1997.
3. Bartlett M.S. and Sejnowski T.J., Viewpoint Invariant Face Recognition using Independent Component Analysis and Attractor Networks, in Neural Information Processing Systems - Natural and Synthetic, M. Mozer, M. Jordan, and T. Petsche (eds.), MIT Press, 1997.
4. Liu C. and Wechsler H., Gabor Feature Based Classification Using the Enhanced Fisher Linear Discriminant Model, IEEE Transactions on Image Processing, vol. 11, no. 4, April 2002.
5. Turk M. and Pentland A., Eigenfaces for Recognition, Journal of Cognitive Neuroscience, 3(1), 1991.
6. Jolliffe I.T., Principal Component Analysis, Springer-Verlag, New York, 1986.
7. Belhumeur P., Hespanha J. and Kriegman D., Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, July 1997.
8. McGuire P. and D'Eleuterio M.T., Eigenpaxels and a Neural Network Approach to Image Classification, IEEE Transactions on Neural Networks, vol. 12, no. 3, May 2001.
9. Etemad K. and Chellappa R., Discriminant Analysis for Recognition of Human Face Images, in 1st International Conference on Audio- and Video-Based Biometric Person Authentication, 1997.
10. Pentland A., Moghaddam B., Starner T., View-Based and Modular Eigenspaces for Face Recognition, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994.
11. Comon P., Independent Component Analysis, A New Concept?, Signal Processing, 1994.
12. Baek K., et al., PCA vs. ICA: A Comparison on the FERET Data Set, International Conference on Computer Vision, Pattern Recognition and Image Processing, Durham, North Carolina, March 8-14, 2002.
13. Phillips P.J., Wechsler H., Huang J., Rauss P., The FERET Database and Evaluation Procedure for Face Recognition Algorithms, Image and Vision Computing, 16, 1998.
14. Nefian A. and Hayes M., An Embedded HMM-based Approach for Face Detection and Recognition, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 6, 1999.
15. Ahmed N., Natarajan T. and Rao K.R., Discrete Cosine Transform, IEEE Trans. Computers, vol. C-23, Jan. 1974.
16. Garcia C., Zikos G., Tziritas G., Wavelet Packet Analysis for Face Recognition, Image and Vision Computing, 2000.
17. Daubechies I., Orthonormal Bases of Compactly Supported Wavelets, Comm. Pure Applied Math., vol. 41, 1988.
18. Weiss L.G., Wavelets and Wideband Correlation Processing, IEEE Transactions on Signal Processing, January 1994.
19. Strang G. and Nguyen T., Wavelets and Filter Banks, Wellesley-Cambridge Press, 1996.
20. Mallat S. and Zhong S., Wavelet Transform Maxima and Multiscale Edges, in Wavelets and Their Applications, 1992.
21. Lienhart R. and Maydt J., An Extended Set of Haar-like Features for Rapid Object Detection, IEEE ICIP 2002, vol. 1, pp. 900-903, Sep. 2002.
22. Viola P. and Jones M., Rapid Object Detection using a Boosted Cascade of Simple Features, IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, Dec. 2001.
23. Bai L. and Shen L., Face Detection by Orientation Map Matching, Proceedings of International Conference on Computational Intelligence for Modelling, Control and Automation, Austria, Feb. 2003.