Improved HMM based face recognition system

P. Corcoran, C. Iancu and G. Costache
Dept. of Electronic Engineering, NUI Galway, Ireland

Abstract— In this paper we present a face recognition system based on embedded hidden Markov models, which uses an efficient set of observation vectors obtained from the 2D-DCT coefficients. Two-dimensional data such as images are modeled much better by a two-dimensional HMM than by a one-dimensional HMM, but the computational complexity of the former makes the recognition process difficult. The embedded HMM is a compromise between the two models: owing to its pseudo two-dimensional structure it models two-dimensional data better than the one-dimensional HMM, while remaining computationally less complex than the two-dimensional HMM. In order to improve the robustness of the recognition system to different illumination conditions, we apply an illumination normalization technique (CLAHE) prior to analysis.

I. INTRODUCTION

Face recognition remains one of the most important areas in computer vision, with a long and successful history, and it still captures the attention of many researchers from both academia and industry. One of the main reasons for this interest is the wide range of commercial applications in which face recognition systems can be used, from small, basic login applications to high-security access control, secure biometric-based transactions and security surveillance, which require very high accuracy. Although many face recognition systems with high recognition rates have been reported in the literature, there is still no widely accepted method to detect, analyze and classify the face regions inside an image. The reason is that every recognition system has weak points in its robustness to different factors, mostly related to the conditions under which the images are captured: illumination, size, quality and face orientation or pose. Even systems reported with high recognition rates on standard databases see their performance degrade when tested on a different database under real-world conditions. Another reason is that a face recognition system is usually designed for a certain type of application, which means that its characteristics, such as the recognition rate, the false acceptance and false rejection rates, its complexity or the size of the resulting database that has to be stored for classification, may make it unsuitable for other types of applications [1]. Good reviews of face recognition systems are presented in [2] and [3]; as they show, a wide variety of methods and algorithms have been applied to the problem by a large community of researchers over a long period of time.

The proposed system uses a variant of 2D HMMs called embedded HMMs. HMM classification is based on modeling the pattern to be recognized as a process represented by a succession of states. The states represent parts of the pattern, and by analyzing them we can obtain sets of observations, which are used to train the models and can then be used for classification. The HMM has been used successfully in continuous speech recognition because it can cope with different lengths of the same pattern (word, phoneme). Since the facial features in an image occur in a natural order when the image is scanned, we can, in the same way, assign the same states in the model to the same regions in the image. One advantage of using HMMs for face recognition is their robustness to small rotations and small variations in face orientation.

II. THE EMBEDDED HMM

Hidden Markov Models are statistical models used to characterize the statistical properties of a signal. An HMM consists of two interrelated processes: an underlying, unobservable Markov chain with a finite number of states, a state transition probability matrix and an initial state probability distribution; and a set of observations, defined by the observation density functions associated with each state [4]. The embedded HMM was first introduced for character recognition by Kuo and Agazzi in [5] and has wide applicability in pattern recognition involving two-dimensional data. The embedded HMM is a generalization of the classic HMM in which each state of the one-dimensional HMM is itself an HMM. Thus, an embedded HMM consists of a set of super states along with a set of embedded states. The super states model the two-dimensional data along one direction, while the embedded HMMs model the data along the other direction. The elements of an embedded HMM are:

• A set of $N_0$ super states, $S_0 = \{S_{0,i}\}$, $1 \le i \le N_0$.

• The initial probabilities of the super states, $\Pi_0 = \{\pi_{0,i}\}$, where $\pi_{0,i}$ is the probability of being in super state $i$ at time zero.

• The transition probability matrix, $A_0 = \{a_{0,ij}\}$, where $a_{0,ij}$ is the probability of transitioning from super state $i$ to super state $j$.

• The parameters of the embedded HMM for super state $k$, $1 \le k \le N_0$, $\Lambda^k = (\Pi_1^k, A_1^k, B^k)$, which include:
  - the number of embedded states in the $k$-th super state, $N_1^k$, and the set of embedded states, $S_1^k = \{S_{1,i}^k\}$;
  - the initial state distribution, $\Pi_1^k = \{\pi_{1,i}^k\}$, where $\pi_{1,i}^k$ is the probability of being in state $i$ of super state $k$ at time zero;
  - the state transition probability matrix, $A_1^k = \{a_{1,ij}^k\}$, where $a_{1,ij}^k$ is the transition probability from state $i$ to state $j$;
  - the probability distribution matrix of the observations, $B^k$. The observations are characterized by continuous probability density functions, considered to be finite Gaussian mixtures of the form

$$b_i^k(O_{t_0,t_1}) = \sum_{m=1}^{M_i^k} c_{im}^k \, N(O_{t_0,t_1}, \mu_{im}^k, U_{im}^k)$$

where $c_{im}^k$ is the mixture coefficient for the $m$-th mixture in state $i$ of super state $k$, and $N(O_{t_0,t_1}, \mu_{im}^k, U_{im}^k)$ is a Gaussian density with mean vector $\mu_{im}^k$ and covariance matrix $U_{im}^k$.

Using a shorthand notation, an embedded HMM is defined as the triplet $\lambda = (\Pi_0, A_0, \Lambda)$, where $\Lambda = \{\Lambda^1, \Lambda^2, \ldots, \Lambda^{N_0}\}$.

This model is appropriate for face images since it exploits an important facial characteristic: frontal faces preserve the same top-to-bottom structure of "super states" and the same left-to-right structure of "states" inside each of these "super states" [6, 7]. The state structure of the face model and the non-zero transition probabilities of the embedded HMM are shown in Fig. 1; each state in the overall top-to-bottom HMM is itself assigned a left-to-right HMM.

Figure 1. Embedded HMM for face recognition
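The emission densities and parameters defined in this section map directly onto a few arrays and a short function. The following is a minimal sketch, not the implementation used in the paper: the function name and argument layout are ours, and it relies on SciPy's multivariate normal density.

```python
from scipy.stats import multivariate_normal

def emission_probability(o, weights, means, covs):
    """Gaussian-mixture emission density b_i^k(O_{t0,t1}) of one embedded state.

    o       : observation vector (e.g. the six 2D-DCT coefficients of a block)
    weights : mixture coefficients c_im^k, length M_i^k
    means   : component mean vectors mu_im^k
    covs    : component covariance matrices U_im^k
    """
    return sum(c * multivariate_normal.pdf(o, mean=mu, cov=U)
               for c, mu, U in zip(weights, means, covs))
```

A full embedded HMM can then be stored as the vector $\Pi_0$, the matrix $A_0$ and, for each super state $k$, the tuple $(\Pi_1^k, A_1^k, B^k)$, with $B^k$ given by the mixture parameters above.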

III. THE OBSERVATION VECTORS

The observation sequence for a face image is formed from image blocks that are extracted by scanning the image from left-to-right and top-to-bottom. Adjacent image blocks overlap in the vertical and in the horizontal direction, as shown in Fig. 2. In the embedded HMM, the observation vectors consist of the six coefficients within a rectangular window (3 x 2) over the lowest frequencies in the 2D-DCT domain.

Figure 2. Face image parameterization and block extraction
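As an illustration of this parameterization, the sketch below extracts overlapping blocks and keeps the 3 x 2 lowest-frequency 2D-DCT coefficients of each block. The block size and overlap used here are placeholder values chosen for illustration, not settings taken from the paper, and the 2D-DCT comes from SciPy.

```python
import numpy as np
from scipy.fft import dctn  # type-II 2D discrete cosine transform

def observation_vectors(face, block_h=12, block_w=10, overlap_h=9, overlap_w=8):
    """Scan the face top-to-bottom / left-to-right with overlapping blocks and
    keep the 3 x 2 lowest-frequency 2D-DCT coefficients of each block as a
    6-dimensional observation vector. Block size and overlap are illustrative.
    """
    step_h, step_w = block_h - overlap_h, block_w - overlap_w
    obs = []
    for y in range(0, face.shape[0] - block_h + 1, step_h):
        row = []
        for x in range(0, face.shape[1] - block_w + 1, step_w):
            block = face[y:y + block_h, x:x + block_w].astype(float)
            coeffs = dctn(block, norm='ortho')
            row.append(coeffs[:3, :2].ravel())  # 3 x 2 low-frequency window
        obs.append(row)
    return np.array(obs)  # shape: (rows of blocks, columns of blocks, 6)
```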

IV. TRAINING THE FACE MODELS

Each individual in the database is represented by an embedded HMM face model. A set of images representing different instances of the same face is used in the training phase. Fig. 3 presents the training scheme used in the face recognition process.

Figure 3. Training the HMM

1. First, the data is uniformly segmented: the observations of the overall top-to-bottom HMM are segmented into $N_0$ vertical super states, and the data corresponding to each of the super states is then segmented from left to right into $N_1^k$ states. After segmentation, the initial estimates of the model parameters are obtained.

2. At the next iteration, a doubly embedded Viterbi algorithm replaces the uniform segmentation. The Viterbi algorithm evaluates the likelihood of the best match between the given image observations and the given HMM, $P(O, Q \mid \lambda)$, and segments the image observations into HMM states on the basis of the match found.

3. The model parameters are re-estimated, using a Euclidean distance to group vectors around the existing mixture centers.

4. The iteration stops, and the parameters of the embedded HMM are considered final, when the difference between the Viterbi segmentation likelihoods at consecutive iterations falls below a threshold.
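The four steps above can be summarized in a short training loop. This is only a schematic under our own naming: uniform_segmentation, viterbi_segmentation and estimate_parameters are placeholder callables standing in for the steps described above, not functions of any existing library.

```python
def train_face_model(observations, uniform_segmentation, viterbi_segmentation,
                     estimate_parameters, threshold=1e-2, max_iter=20):
    """Schematic of the iterative training described in steps 1-4 above."""
    # Step 1: uniform segmentation into N0 super states and N1^k states,
    # followed by the initial parameter estimates.
    segmentation = uniform_segmentation(observations)
    model = estimate_parameters(observations, segmentation)

    prev_likelihood = None
    for _ in range(max_iter):
        # Step 2: doubly embedded Viterbi gives the best-match likelihood
        # P(O, Q | lambda) and a new segmentation of the observations.
        likelihood, segmentation = viterbi_segmentation(observations, model)
        # Step 3: re-estimate parameters; vectors are grouped around the
        # existing mixture centers by Euclidean distance.
        model = estimate_parameters(observations, segmentation)
        # Step 4: stop when the Viterbi likelihood change falls below a threshold.
        if prev_likelihood is not None and abs(likelihood - prev_likelihood) < threshold:
            break
        prev_likelihood = likelihood
    return model
```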

V. FACE RECOGNITION SYSTEM

In the recognition phase, the remaining images, not used in the training process, are used to determine the recognition performance of the system. After extracting the observation vectors as in the training phase, the probability of the observation sequence given each embedded HMM face model is computed via a doubly embedded Viterbi recognizer. The model with the highest likelihood is selected, and this model reveals the identity of the unknown face, as shown in Fig. 4.

Figure 4. HMM recognition scheme
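Recognition thus reduces to scoring the probe observations against every stored model and keeping the best one. A minimal sketch, with viterbi_likelihood as a placeholder for the doubly embedded Viterbi recognizer:

```python
def recognise(observations, face_models, viterbi_likelihood):
    """Return the identity whose embedded HMM best explains the probe.

    face_models maps person -> trained embedded HMM; viterbi_likelihood is a
    placeholder callable for the doubly embedded Viterbi recognizer.
    """
    scores = {person: viterbi_likelihood(observations, model)
              for person, model in face_models.items()}
    return max(scores, key=scores.get)
```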

VI. EXPERIMENTAL RESULTS

The face recognition system has been tested on two databases: the BioID database [8], which contains 400 pictures of 20 people's faces, frontal views with high variations in facial expression and illumination, and the Achermann database from the University of Bern in Switzerland, which contains 200 pictures of 20 people, 10 face images per individual, with different head orientations and very small differences in illumination. All pictures are grayscale images. Prior to analysis, a face detector [9] was applied to the entire image to extract only the face region, and no other method was used to align the faces; this makes the recognition system fully automatic.

The first series of tests consists of choosing a number of images from each database for training the HMMs and using the remaining faces for testing. For the second series of tests we normalized both databases using the Contrast-Limited Adaptive Histogram Equalization (CLAHE) [10, 11] normalization technique. CLAHE examines, for each pixel, the histogram of an image region centered on it and assigns a new intensity to the pixel based on the rank of its intensity in this histogram. The histogram used is a modified form of the ordinary histogram, in which the contrast enhancement induced by the method at each intensity value is limited to a selectable maximum. Fig. 5 shows the effect of the normalization processing.

Figure 5. The effect of applying the CLAHE normalization method
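The paper does not name a particular CLAHE implementation; for illustration, OpenCV provides one that can be applied to the detected face region as follows. The file name, clip limit and tile size below are placeholder values, not necessarily the settings used in these experiments.

```python
import cv2

# Load the detected face region as a grayscale image (hypothetical file name).
gray = cv2.imread('face.png', cv2.IMREAD_GRAYSCALE)

# Contrast-Limited Adaptive Histogram Equalization with illustrative settings.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
normalized = clahe.apply(gray)
```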

We tested the system using 1 to 5 training images per person for the BioID database, with the remaining faces (19 down to 15 per person) used for testing, and 1 to 4 training images per person for the Achermann database, with the remaining faces (9 down to 6) used for testing. The recognition performance of the system is shown in Table 1 for the BioID database and in Table 2 for the Achermann database.

Table 1. Recognition rates for the BioID database

    Faces train / test    Original images (%)    Normalized images (%)
    1 / 19                41.7                   45
    2 / 18                51.3                   56
    3 / 17                58                     67.7
    4 / 16                62                     67.7
    5 / 15                66.3                   77

Table 2. Recognition rates for the Achermann database

    Faces train / test    Original images (%)    Normalized images (%)
    1 / 9                 64.2                   77.5
    2 / 8                 69.2                   87.5
    3 / 7                 74.2                   91.7
    4 / 6                 83.3                   95

Graphical representations of the results are given in Fig. 6 for the BioID database and in Fig. 7 for the Achermann database.

Figure 6. Recognition rates (%) on the BioID database versus the number of training images, with and without normalization

Figure 7. Recognition rates (%) on the Achermann database versus the number of training images, with and without normalization

It can be observed that, in all cases and for both databases, the results obtained with the normalization method improve substantially compared with the results without normalization, showing that the recognition system becomes more robust to variations in illumination. As expected, the recognition rate also increases when more images are used to train the HMM, because each new image adds details and information that improve the model for each person and are useful for classification.

VII. CONCLUSIONS

We presented in this paper a face recognition system that uses embedded HMMs for classification, with a set of observation vectors obtained from 2D-DCT coefficients. To make the system more robust to illumination we applied an illumination normalization technique based on histogram equalization, CLAHE. We used two standard databases to test the accuracy of the system and compared the results obtained with and without the normalization algorithm for different numbers of faces used in training and testing. The results showed that by normalizing the illumination inside the face regions we can achieve higher recognition accuracy; the increase in recognition rate was between 10% and 15% for different configurations. Higher recognition rates could also be achieved by using more faces in the training stage, in order to cope with variations other than illumination in the databases. Future work will address normalizing other variations as well, such as face orientation.

REFERENCES

[1] P. Corcoran and G. Costache, "Automated sorting of consumer image collections using face and peripheral region image classifiers", IEEE Transactions on Consumer Electronics, vol. 51, no. 3, Aug. 2005, pp. 747-754.
[2] W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld, "Face recognition: A literature survey", ACM Comput. Surv., vol. 35, no. 4, pp. 399-458, 2003.
[3] R. Chellappa, C.L. Wilson, and S. Sirohey, "Human and machine recognition of faces: A survey", Proc. IEEE, vol. 83, pp. 705-740, 1995.
[4] L.R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, vol. 77, no. 2, Feb. 1989, pp. 257-286.
[5] S. Kuo and O. Agazzi, "Keyword spotting in poorly printed documents using pseudo 2-D hidden Markov models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 842-848, Aug. 1994.
[6] A.V. Nefian and M.H. Hayes III, "An embedded HMM-based approach for face detection and recognition", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999, vol. 6, pp. 3553-3556.
[7] A.V. Nefian and M.H. Hayes III, "Maximum likelihood training of the embedded HMM for face detection and recognition", International Conference on Image Processing (ICIP), Sept. 2000, vol. 1, pp. 33-36.
[8] http://www.humanscan.de/support/downloads/facedb.php
[9] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", Conference on Computer Vision and Pattern Recognition (CVPR), 2001, pp. 511-518.
[10] K. Zuiderveld, "Contrast limited adaptive histogram equalization", in P. Heckbert (ed.), Graphics Gems IV, Academic Press, Boston, 1994, pp. 474-485.
[11] S.M. Pizer, E.P. Amburn, J.D. Austin, et al., "Adaptive histogram equalization and its variations", Computer Vision, Graphics, and Image Processing, vol. 39, pp. 355-368, 1987.