Chapter 1 Introduction



Biometrics has become common in applications such as attendance recording, traffic and toll monitoring, and personal identification, and the need for systems that can recognize people for various purposes is growing rapidly. A conventional biometric or person authentication system operates in one of two modes: verification, which performs one-to-one matching, and identification, which performs one-to-many matching. Traditional credentials have well-known drawbacks. Cards and keys can be lost, stolen, shared or easily duplicated; passwords and PINs (personal identification numbers) can be forgotten, shared, stolen or cracked by an unauthorized user. Biometrics offers a strong alternative for security technologies. However, when a system relies on a single biometric modality, the possibility of an inaccurate match is high. Using more than one modality is therefore suggested, so that recognition accuracy improves and the scope for breaching authenticity is reduced. This chapter gives an overview of biometric security technologies and the need for multimodal biometrics.

1.1 Introduction to Biometrics

According to Ross et al. (2003), biometrics deals with the automatic recognition of a person using distinguishing traits, which are also known as biometric modalities. A more expansive definition of biometrics is “any automatically measurable, robust and distinctive physical characteristic or personal trait that can be used to identify an individual or verify the claimed identity of an individual”. Jain et al. (2004) defined biometrics as the system used for human recognition, consisting of identification and verification, and discussed the advantages, disadvantages and challenges of both. Biometric technologies are built on recognizing the characteristics of a person and are used in numerous applications. Examples include face recognition, fingerprint matching, DNA matching, iris recognition and gait recognition.

1.2 Biometric Modalities

Biometric security technology relies on characteristics known as biometric modalities, also referred to as biometric traits, which can be classified as physiological or behavioral. Some modalities relate to the physical structure or properties of the human body; others are associated with human behavior. Jain et al. (2006) opined that all biometric verifiers may be considered combinations of physiological and behavioral characteristics because of the interaction between the user and the system. Any physiological or behavioral feature may be used as a biometric verifier as long as it satisfies the requirements of a biometric identifier, such as universality, distinctiveness, permanence and collectability. A few examples of physiological and behavioral traits are:

1.2.1 Physiological Traits: Physical characteristics of a human being, such as the face, hand, hand geometry, fingerprints, ear shape, iris or retina, are called physiological traits. A few traits of the same person are shown in Fig. 1.1. The shape of the ear, for example, can form the basis for matching in a biometric system.


1.2.2 Behavioral Traits: The characteristics associated with the behavior of persons, such as signature, voice, keystroke and gait pattern, are known as behavioral traits.

Fig. 1.1: Physiological traits of a person: (a) ear, (b) face, (c) foot, (d) eye highlighting the iris.

1.3 Biometric Authentication System

A biometric authentication system can operate in two modes, namely identification and verification. Identification performs one-to-many matching: the captured sample is compared against all templates in the database, and the user makes no identity claim (Sinha et al., 2013). Verification performs one-to-one matching: the captured sample is compared only against the template of the identity the user claims. In identification, the system establishes the presence or absence of the person among the persons enrolled in the database. Fig. 1.2 shows an example of a biometric attendance system in which a number of employees are registered, or enrolled, in the database. At the time of testing, the system detects the presence of a person and marks his or her attendance.

Fig. 1.2: An example of typical biometric attendance system.

Enrolment is an important process in biometrics and is illustrated in Fig. 1.3. The steps of biometric trait enrolment are:

The biometric data or input is captured using suitable acquisition system or sensor.

The modality is stored inside the biometric database.

Features are extracted from the traits and converted into suitable representations, called biometric templates.


Fig.1.3: Enrollment.

1.3.1 Training and Testing involved in Biometric System

A biometric system is divided into two major processes, namely training and testing. During training, the biometric modality is captured and converted into a suitable template. This process is performed as follows:

Input image or signal is captured or acquired.

The signal is pre-processed to remove noise or other unwanted signal components; for images, resizing and reformatting also take place.

Feature extraction is performed.

Features are transformed into suitable templates, which are stored in the template database.

Testing is performed at the time of matching and follows steps similar to training. The input is captured and subjected to:

Feature extraction.

Template conversion.

The resulting template is matched against the templates already present in the template database. If a matching template is found, the input is reported as matched; otherwise it is reported as not matched. The training and testing processes are illustrated in Fig. 1.4 and Fig. 1.5 respectively.
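The training and testing steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the feature extractor (row and column means), the unit-length template, the Euclidean distance and the threshold value are placeholders chosen for brevity, and the function names `train_templates` and `match_probe` are hypothetical rather than taken from this work.

```python
import numpy as np

def preprocess(image):
    """Scale pixel values to [0, 1]; a stand-in for noise removal and resizing."""
    image = np.asarray(image, dtype=float)
    span = image.max() - image.min()
    return (image - image.min()) / span if span > 0 else np.zeros_like(image)

def extract_features(image):
    """Toy feature extractor (assumption): row means followed by column means."""
    return np.concatenate([image.mean(axis=1), image.mean(axis=0)])

def to_template(features):
    """Convert a feature vector into a unit-length template."""
    norm = np.linalg.norm(features)
    return features / norm if norm > 0 else features

def train_templates(samples):
    """Training: build the template database from {user_id: image}."""
    return {uid: to_template(extract_features(preprocess(img)))
            for uid, img in samples.items()}

def match_probe(image, database, threshold=0.05):
    """Testing: return (matched, closest_user_id) for a probe image."""
    probe = to_template(extract_features(preprocess(image)))
    distances = {uid: float(np.linalg.norm(probe - tpl))
                 for uid, tpl in database.items()}
    best = min(distances, key=distances.get)
    return distances[best] <= threshold, best

# Hypothetical 4x4 "images" for two enrolled users.
img_a = np.arange(16).reshape(4, 4)
img_b = np.eye(4)
database = train_templates({"alice": img_a, "bob": img_b})
```

A brighter copy of the first image (`img_a * 2.0`) yields the same template after pre-processing and is therefore matched, whereas an image unlike either enrolled sample falls outside the threshold and is reported as not matched.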


Fig. 1.4: Training process involved in biometrics (blocks: capturing input image, feature extraction, conversion into templates, template database).

1.3.2 Verification and Identification

In the verification mode, a person’s identity is validated by comparing the captured biometric data with the biometric template stored for that person in the system database. An individual who desires to be recognized claims an identity, and the system accepts or rejects that claim. Verification is therefore known as one-to-one matching. In the identification mode, the system recognizes an individual by searching the templates of the entire database for a match, and therefore performs a one-to-many comparison. Identification establishes whether the individual subjected to authentication is present in the database.
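The difference between the two modes can be illustrated with a short sketch. The templates, the Euclidean distance and the threshold below are assumptions made for illustration, and the functions `verify` and `identify` are hypothetical names. Returning the label of the closest template in `identify` is a design choice of this sketch; a system that only needs to confirm presence could return a Boolean instead.

```python
import numpy as np

def verify(probe, claimed_id, database, threshold=0.5):
    """One-to-one matching: compare the probe only with the claimed identity."""
    if claimed_id not in database:
        return False
    return float(np.linalg.norm(probe - database[claimed_id])) <= threshold

def identify(probe, database, threshold=0.5):
    """One-to-many matching: search every template; return best match or None."""
    best_id, best_dist = None, float("inf")
    for user_id, template in database.items():
        dist = float(np.linalg.norm(probe - template))
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    return best_id if best_dist <= threshold else None

# Hypothetical three-dimensional templates for two enrolled users.
database = {
    "alice": np.array([0.1, 0.9, 0.3]),
    "bob": np.array([0.8, 0.2, 0.5]),
}
probe = np.array([0.12, 0.88, 0.31])
```

Verification performs a single comparison regardless of database size, while the cost of identification grows with the number of enrolled templates.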


Fig. 1.5: Testing process involved in biometrics (blocks: image to be tested, preprocessing and feature extraction, conversion into template, matching against template database, matched or not matched).


1.4 Comparison of Biometrics using Different Traits

Biometrics uses two main types of traits: physiological and behavioral. Accordingly, there are several biometric techniques, distinguished by the modality used in the system. Bhattacharyya et al. (2009) noted that biometric authentication systems have been developed on the basis of these characteristics or traits of a person. The important types of biometric security technologies are briefly reviewed here.

Face biometrics: Face recognition is the process of recognizing a person from features extracted from the person’s face. It is a computer application for automatic identification or verification of a person using a digital image or a captured video frame, and it is among the most commonly used biometric techniques. Face recognition methods are of various types, including facial metrics and Eigen faces. The facial metric method relies on specific facial features, such as the positions of the eyes, nose and mouth and the distances between them, whereas the Eigen face method differentiates faces according to their degree of fit with a fixed set of 100 to 150 Eigen faces.

Ear biometrics: Ear biometrics uses the ear as a modality, with features of the ear forming the basis of matching. The ear is a stable trait: it undergoes only very slight changes from infancy to adulthood, and its appearance is not altered by hair growth as the appearance of the face can be. The ear is also a visible part of the human body, so it can be used for a non-invasive biometric technique. Many methods for ear recognition are available in the literature.

Footprint biometrics: Footprint identification is the measurement of footprint features for recognizing a person. The footprint is universal, easy to capture and does not change much over time, and a footprint biometric system does not require specialized acquisition devices. Footprint images of the left or right foot are captured at different angles, without special lighting. The foot image is positioned and cropped according to key points, and various techniques are applied to the resized footprint image to obtain features. These features are compared with the feature vectors stored in the database using a distance measure.

Fingerprint biometrics: A fingerprint is an impression of the ridges and valleys of a finger or thumb. A friction ridge is a raised portion of the epidermis on the palmar (palm), digit (fingers and toes) or plantar (sole) skin, consisting of one or more connected ridge units of friction ridge skin. The traditional method uses ink to transfer the fingerprint onto a piece of paper. In the modern approach, fingerprint readers or scanners based on optical, thermal, silicon or ultrasonic principles are used. The optical fingerprint reader is the most commonly used and is based on reflection changes at the spots where the finger lines touch the surface. This biometric has difficulties when the finger is dirty or wet.

Iris biometrics: Iris recognition uses the iris, the colored area of the eye that surrounds the pupil. Iris patterns are unique and are obtained through a suitable image acquisition system. Each iris has a complex pattern, which can be a combination of specific characteristics known as corona, crypts, filaments, freckles, pits, furrows, striations and rings. Although this biometric is not very user friendly, it gives high recognition performance.

Hand Geometry biometrics: This technique is based on the characteristics of a person’s hand geometry. Every person’s hand is shaped differently, and the shape does not change significantly after a certain age. The methods include estimating the length, width, thickness and surface area of the hand. Hand geometry can be measured using mechanical or optical principles.

Voice biometrics: The voice of every person has a different pitch, and voice is considered a behavioral trait. Voice recognition, also known as speaker recognition, is mainly based on the way a person speaks; it focuses on the vocal characteristics that produce speech and not on the sound or the pronunciation of the speech itself. The vocal characteristics depend on the dimensions of the vocal tract, mouth, nasal cavities and the other speech-producing mechanisms of the human body. This biometric does not require any special or expensive hardware.

Signature biometrics: Signature recognition is based on the dynamics of making the signature. These dynamics are measured in terms of the pressure, direction, acceleration and length of the strokes, the number of strokes and their duration. The most important advantage is that a fraudster cannot glean how to write the signature simply by looking at one that has been written previously. Various devices are used to capture signature dynamics, either traditional tablets or special-purpose devices.

Keystroke biometrics: Keystroke dynamics is a method of verifying the identity of an individual by the way the person types on a keyboard. The typing rhythm plays an important role in this type of biometrics, and the method can cope with trained typists as well as amateur two-finger typists.


1.5 Challenges in Biometrics

Some of the challenges commonly encountered in implementation of biometric systems are given as: 

Noise: Biometric data captured through an image acquisition system or sensor may be corrupted by noise introduced by the sensor itself or by imperfect acquisition conditions. Noise can also arise from subtle variations in the biometric itself, for example a fingerprint image with a scar or a voice sample altered by a cold. Noisy data can also result from defective or improperly maintained sensors, such as accumulation of dirt on a fingerprint sensor, or from unfavorable ambient conditions, for example poor illumination.

Intra-class variations: These variations are generally caused by a user who interacts incorrectly with the sensor, for example with an incorrect facial pose, or by changes in the characteristics of the sensor during authentication.

Non-universality: The biometric system may not be able to acquire meaningful biometric data from a subset of users. A fingerprint biometric system may extract incorrect minutiae features from the fingerprints of certain individuals, due to the poor quality of the ridges.

Spoof attacks: This type of attack is especially relevant when behavioral traits such as signature or voice are used. However, physical traits such as fingerprints are also susceptible to spoof attacks.

1.6 Drawbacks of Unimodal Biometrics

Biometric systems that use a single trait or characteristic are simple and easy to use. However, they have the following major drawbacks:

The lack of universality of some characteristics.

Noisy signals captured by the sensors because of incorrect usage and environmental conditions such as humidity, dirt and dust.

Limited discrimination because of high intra-class and low inter-class variability.

The recognition performance of unimodal systems is limited to a certain level.

Error rates are sometimes unacceptable.

The lack of permanence of biometric characteristics, which vary over time.

The possibility of fraud through voluntary or involuntary cloning.

If there are problems with the trait being used, no alternative is available to save the biometric system.

1.7 Multimodal Biometrics System and its Need

According to Jain et al. (2004), the term “multimodal” refers to combining two or more different biometric sources of a person, such as face, ear, iris and foot, sensed by different sensors. Multiple sources of biometric information are combined to overcome some of the limitations of unimodal biometric systems (Ross et al., 2007). Jing et al. (2007) observed that most biometric systems deployed in real-world applications are unimodal, relying on the evidence of a single source of information, such as fingerprint, face or voice, for authentication. These systems are vulnerable to a variety of problems such as noisy data, intra-class variations, inter-class similarities, non-universality and spoofing, which may lead to a considerably high false acceptance rate (FAR) and false rejection rate (FRR), limited discrimination capability, an upper bound on performance and lack of permanence. A typical example of multimodal biometrics is shown in Fig. 1.6, where four modalities of the same person, namely face, ear, iris and foot, are captured and subjected to the training process. These modalities can be combined at various levels, as shown in the figure. The fusion can be applied at the feature level, matching score level or decision level, as discussed briefly in the next section.
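The two error rates mentioned above can be computed directly from sets of genuine and impostor match scores. The sketch below assumes similarity scores, where a score at or above the threshold means acceptance; the score values and the function name `far_frr` are hypothetical.

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when score >= threshold means 'accept'."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine users wrongly rejected
    return far, frr

# Hypothetical similarity scores in [0, 1].
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.3, 0.35, 0.55, 0.2]
```

Raising the threshold lowers the FAR at the cost of a higher FRR, which is why the two rates are usually reported together.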

1.8 Introduction to Fusion levels

A multimodal biometric system involves various levels of fusion. The main aim of fusion is to determine the best set of experts in a problem domain and to devise an appropriate function that combines the decisions of the individual experts optimally. Fusion schemes are categorized as:

Prior to matching

After matching

Fig. 1.6: Multimodal Biometric system (blocks: Eigen Face, Eigen Ear and Iris Template inputs, feature extraction, matching modules, fusion module, decision module).


Fusion schemes prior to matching are used to integrate the evidence before matching. Sensor level and feature level fusion are important fusion schemes under this category. 

Sensor level: The raw data acquired from multiple sensors can be processed and combined to generate new data from which features can be extracted.

Feature level: The feature sets extracted from multiple data sources can be combined to create a new feature set to represent the individual. For example, the geometric features of the hand may be augmented with the Eigen coefficients of the face to construct a new high-dimensional feature vector.
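Feature level fusion by concatenation can be sketched as follows. The z-score normalization step is an assumption added here so that feature sets with different ranges contribute comparably; the numeric values and the function names `zscore` and `fuse_features` are hypothetical.

```python
import numpy as np

def zscore(features):
    """Standardize one feature vector to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    centered = features - features.mean()
    std = features.std()
    return centered / std if std > 0 else centered

def fuse_features(*feature_sets):
    """Feature level fusion: normalize each set, then concatenate."""
    return np.concatenate([zscore(f) for f in feature_sets])

hand_geometry = [18.2, 7.9, 2.1, 140.5]             # hypothetical measurements
face_coefficients = [0.31, -1.2, 0.85, 0.02, -0.4]  # hypothetical Eigen coefficients
fused = fuse_features(hand_geometry, face_coefficients)
```

The fused vector has as many entries as the two inputs combined, which illustrates how concatenation increases dimensionality.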

Fusion schemes after matching are used to combine pieces of evidence after matching. This type of fusion includes the following: 

Match Score level: The scores output by multiple classifiers are combined to generate a single scalar score. For example, the match scores generated by the face and hand modalities of a user may be combined using a simple sum rule to obtain a new match score, which is then used to make the final decision.

Rank level: This type of fusion is relevant in identification systems where each classifier associates a rank with every enrolled identity.

Decision level: In decision level fusion, the accept or reject decisions output by the individual matchers are combined into a final decision, for example in a verification system.
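The sum rule for face and hand match scores mentioned above can be sketched as follows. The sketch assumes that both scores have already been normalized to the range [0, 1]; the score values, the equal default weights and the threshold are hypothetical.

```python
def sum_rule(scores, weights=None):
    """Combine normalized match scores (each in [0, 1]) into one scalar score."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)  # equal weights by default
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")
    return sum(w * s for w, s in zip(weights, scores))

def decide(fused_score, threshold=0.5):
    """Final accept/reject decision taken on the fused score."""
    return fused_score >= threshold

face_score, hand_score = 0.82, 0.46  # hypothetical, already normalized
fused = sum_rule([face_score, hand_score])
accepted = decide(fused)
```

Here the weak hand score alone would fall below the threshold, but the fused score does not, which is the kind of compensation between modalities that match score fusion is intended to provide.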

1.9 Thesis Organization

The present chapter introduced the basic concepts of biometrics and the various modalities used, along with the challenges of unimodal biometric systems. The statement of the problem is also presented. Chapter 2 highlights the existing research contributions in the field of multimodal biometrics and their implementation challenges, findings and research scope. Multimodal biometrics and the associated processes are reported in Chapter 3. Chapter 4 discusses the fusion schemes used in biometric systems. The proposed methodology, implementation algorithms and results are discussed in Chapter 5. Conclusions and future scope are reported in Chapter 6.



In this doctoral research, an extensive literature survey was carried out on research contributions in multimodal biometrics, in comparison with single-modality biometrics. Fusion schemes such as matching score level and rank level fusion were given special emphasis in the context of multimodal biometric systems. Since four traits, namely face, ear, iris and foot, were used, research works dealing with these traits were studied specifically in terms of their findings and limitations. This chapter discusses important research works on information fusion methods used in multimodal biometrics.

2.1 Fusion Scheme

Multimodal biometrics employs fusion of multiple biometric modalities, and the literature in this direction is reviewed here. Linas et al. (2004) described information fusion as an information process that associates, correlates and combines data and information from single or multiple sensors or sources to achieve refined estimates of parameters, characteristics, events and behaviours. A good information fusion method is characterized by a minimal influence of unreliable sources compared with reliable ones. In this work, the information being fused is the biometric modality. Kludas et al. (2008) explored a number of disparate research areas, including robotics, image processing, pattern recognition and information retrieval, that use fusion schemes in their applications. Image processing applications mainly use multimodal biometrics and image retrieval methods, in which multiple traits of a person are fused using a suitable scheme so that performance improves over a single modality of the same person. Multimodal biometric systems rely on the evidence presented by multiple sources of biometric information, and hence fusion is a very important step for the analysis, indexing and retrieval of such information. A number of fusion techniques are available for any particular kind of information, and the most appropriate one needs to be chosen according to the needs of the application and the performance of the biometric system. Sanderson et al. (2001) divided fusion methods into two broad categories, namely fusion before matching and fusion after matching, depending on the fusion elements or type of biometric information involved. Fusion before matching includes sensor level fusion and feature level fusion, whereas fusion after matching includes match score level fusion, rank level fusion and decision level fusion.

2.2 Multimodal Fusion

As discussed in Chapter 1, several challenges are associated with unimodal biometric data, such as small variation over the population and large intra-class variability over time. Multimodal biometrics is therefore preferred in authentication-based image processing applications. Ross et al. (2004) stated that the main objective of a multimodal biometric system is to improve recognition performance and to make the system robust against the limitations of unimodal biometric systems. Several approaches have been proposed and developed for multimodal biometric authentication with different biometric traits and different fusion schemes. Automatic person authentication using biometric traits has been a research area for many years. Bigun et al. (2005) showed that recognizing a person using multiple biometric traits has significant advantages, such as better recognition accuracy and higher robustness, that is, resistance to sub-system failures. Tumer et al. (1999) summarized information fusion as a necessary step for utilizing multiple biometrics for decision making in a single system. Initially, neural networks were used for information fusion; later, information fusion was applied in various areas such as econometrics, machine learning, pattern recognition and information retrieval. Ross et al. (2003) introduced biometric fusion in a multimodal system, performed at three fusion levels: feature level, matching score level and decision level. Of these, feature-level fusion retains the most identity information and is expected to perform better.

Abate et al. (2007) proposed a hybrid face/ear recognition system based on IFS (Iterated Function System) theory, which is largely used in image compression and indexing. The face and ear images are first normalized; feature extraction is then made local to the region of interest (ROI), defined as the union of eight main areas, namely the left eye, right eye, nose and mouth from the face and the upper-left, upper-right, lower-left and lower-right corners from the ear, to improve robustness against occlusions. Once the face and the ear have been segmented, the objects are indexed separately by IFS as separate regions of interest. Yao et al. (2007) proposed a multimodal biometric system using face and palm print at the feature level. Gabor features of the face and palm print were extracted individually and then analyzed using a linear projection scheme, principal component analysis (PCA), to obtain the dominant principal components of face and palm print separately.


2.2.1 Sensor Level Fusion

Nandakumar et al. (2009) presented a multisensor multimodal biometric system that fused multiple sources of raw data, such as image, video, sound, text and symbols, at the sensor level. By integrating this information, the method was expected to produce more accurate results than a unimodal system. Liu et al. (2003) proposed a face mosaicking technique, which is a method for combining two or more images of the same face. 3D ellipsoidal models were used to approximate human images. Sim et al. (2003) used geometric mapping and projection of 2D face images onto the ellipsoidal model and used a probabilistic model for classification. Raghavendra et al. (2010) proposed an approach that combines information obtained from face and palm print images using particle swarm optimization (PSO). Tronci et al. (2007) suggested a process for selecting the best score fusion method for the problem of verifying a person’s identity from the features of handwritten signature, iris and speech. Score fusion here includes score combination, score classification and dynamic score selection. Four methods were chosen: weighted sum (Σ), weighted product (Π), neural networks (NN) and support vector machines (SVM).

2.2.2 Feature Level Fusion

Ross et al. (2005) employed feature level fusion of multiple biometric feature sets of the same person. Since most of the information related to the identity of a person is available at this level, feature level fusion is likely to perform better than match score level or decision level fusion. Ross et al. (2006) pointed out, however, that some inherent problems are associated with this fusion method: the feature spaces of different biometric traits may not be compatible, and concatenating several feature sets may lead to a dimensionality problem. Jing et al. (2007) suggested concatenation as the feature fusion method for a multimodal biometric system; two feature extraction approaches were used, resulting in better recognition performance. Rattani et al. (2010) proposed a multimodal biometric system that combines face and fingerprint modalities at the feature level. In this work, feature sets were extracted from face and fingerprint images and concatenated after normalization to obtain a combined feature set, and dimensionality reduction was achieved by applying several feature reduction techniques. Quan-Sen et al. (2005) proposed a feature fusion method based on canonical correlation analysis (CCA). Two groups of feature vectors were extracted from the same sample, and a correlation criterion function between the two groups was established. The canonical correlation features were extracted according to this criterion to form an effective set of vectors, so that the method exploits the correlation between the two groups of feature vectors. Yuan et al. (2007) developed a multimodal biometric recognition system based on the geometric characteristics of face and ear. The feature vector includes eight characteristics, for example the width of the ears and the height compared with the vertical distance between the eyes and mouth. The method was not robust enough, since the extracted features are composed of relative values.
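The dimensionality problem of concatenated features, and its reduction by a linear projection, can be illustrated with a short sketch. PCA computed through the singular value decomposition is used here as one common reduction technique; it is not claimed to be the technique of any cited work. The random matrices stand in for real face and fingerprint features, and the function name `pca_reduce` is hypothetical.

```python
import numpy as np

def pca_reduce(fused_matrix, n_components):
    """Project rows of a (samples x features) matrix onto the top principal components."""
    X = np.asarray(fused_matrix, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, components

rng = np.random.default_rng(0)
face = rng.normal(size=(20, 30))     # 20 samples, 30 hypothetical face features
finger = rng.normal(size=(20, 40))   # 20 samples, 40 hypothetical fingerprint features
fused = np.hstack([face, finger])    # concatenation gives 70 features per sample
reduced, components = pca_reduce(fused, 5)
```

The concatenated matrix has 70 columns, while the reduced representation keeps only 5 values per sample.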

2.2.3 Score Level Fusion

Matching score fusion consolidates the matching scores generated by different classifiers and can be applied to most multibiometric scenarios, because matching scores are readily available and contain adequate information to distinguish genuine and impostor cases. Jain et al. (1997) combined matching scores from different classifiers after normalizing them. Using a fingerprint database and a public domain face database, the system achieved higher recognition accuracy with match score level fusion than with a single biometric trait. Hong et al. (1998) suggested a bimodal approach combining a Principal Component Analysis (PCA) based face identification system and a minutiae-based fingerprint identification system, with fusion at the match score level. This was done by integrating the matching scores of the different classifiers and making a decision based on the consensus matching scores. Jain et al. (2005) proposed a multimodal approach for face, fingerprint and hand geometry with fusion at the score level. Simple max-rule and min-rule methods of match score fusion were employed with seven normalization techniques. The final results demonstrated that all fusion approaches exhibit better recognition performance than unimodal methods. Kittler et al. (2005) proposed a theoretical framework for transformation-based score level fusion approaches such as the sum rule, median rule, min rule, max rule and product rule. The framework was evaluated by combining the scores from three modalities, namely face (frontal and profile) and speech, and the results indicated the superior performance of the sum rule. Wang et al. (2003) proposed a weighted sum rule in which the weights are calculated from the individual performance of the modalities. Snelick et al. (2005) proposed different normalization schemes for transformation-based score level fusion, and their experimental results indicate that the Min-Max normalization scheme is more efficient than the other schemes, such as Decimal scaling, Median, Double sigmoid and Tanh normalization.

A density-based alternative adopts a parametric approach to estimate the conditional densities of the match scores from different modalities. The densities are assumed to follow a Gaussian distribution, and classification is finally carried out using the Bayes rule.
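Min-Max normalization followed by the fixed combination rules can be sketched as follows. The raw score values are hypothetical, and the sum rule is implemented as the mean of the normalized scores, which differs from the plain sum only by a constant factor; the names `min_max`, `RULES` and `fuse_scores` are illustrative.

```python
import numpy as np

def min_max(scores):
    """Min-Max normalization: map one matcher's raw scores onto [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    return (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)

# Each rule combines a (matchers x candidates) array column by column.
RULES = {
    "sum": lambda s: s.mean(axis=0),      # mean = sum rule up to a constant factor
    "product": lambda s: s.prod(axis=0),
    "min": lambda s: s.min(axis=0),
    "max": lambda s: s.max(axis=0),
    "median": lambda s: np.median(s, axis=0),
}

def fuse_scores(raw_scores_per_matcher, rule="sum"):
    """Normalize each matcher's scores, then combine them per candidate."""
    normalized = np.vstack([min_max(s) for s in raw_scores_per_matcher])
    return RULES[rule](normalized)

face_raw = [120.0, 40.0, 80.0]   # hypothetical raw scores for three candidates
speech_raw = [0.9, 0.1, 0.7]     # a second matcher on a different scale
```

Without the normalization step, the face matcher's larger numeric range would dominate any sum of the two raw scores.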

2.2.4 Decision Level Fusion

Ross et al. (2006) reported a decision level fusion method that integrates the final decisions of the individual biometric matchers into a consolidated decision. This consolidated decision can be obtained using various techniques, including “AND”/“OR” rules, majority voting, weighted majority voting, decision tables, Bayesian decision and the Dempster-Shafer theory of evidence. Decision level fusion is rigid and comparatively less sophisticated than other fusion methods, as it operates only on binary information.

Frischholz et al. (2000) developed a commercial multimodal approach combining a model-based face classifier, a vector quantization (VQ)-based voice classifier and an optical-flow-based lip movement classifier for verifying persons. Weighted sum rule and majority voting were used for fusion. The system was tested on 150 persons and reduced the FAR (false acceptance rate) significantly. Yu et al. (2009) presented a multibiometric approach that combines palm print, fingerprint and finger geometry, collected by a digital camera, at the decision fusion level. Three decision fusion rules, namely the “AND” rule, the “OR” rule and majority voting, were employed to perform the fusion. Experiments conducted on a database of 86 hands (10 impressions per hand) showed that the proposed decision fusion methods are effective. Xu et al. (2007) used a subset of a multimodal image database consisting of 294 images of 42 persons. Each person has seven profile-view head images with variations in head position and slight variations in facial expression, and some persons wear glasses. The first image is the profile view of the head, and the second to the seventh are rotated by -10°, +10°, -20°, +20°, -30° and +30° respectively. After pre-processing, the ear images were transformed. In their experiment, 5 images per person were used as the gallery and the other 2 as the probe, so the gallery and the probe consist of 210 and 84 images respectively.
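The “AND”, “OR” and majority voting rules, together with weighted majority voting, can be written compactly. The sketch below assumes that every matcher outputs a Boolean accept/reject decision; the example decisions and weights are hypothetical.

```python
def and_rule(decisions):
    """Accept only if every matcher accepts."""
    return all(decisions)

def or_rule(decisions):
    """Accept if at least one matcher accepts."""
    return any(decisions)

def majority_vote(decisions):
    """Accept when strictly more than half of the matchers accept."""
    return sum(bool(d) for d in decisions) > len(decisions) / 2

def weighted_majority_vote(decisions, weights):
    """Accept when the accepting matchers hold more than half of the total weight."""
    accepting = sum(w for d, w in zip(decisions, weights) if d)
    return accepting > sum(weights) / 2

# Hypothetical decisions from three matchers.
decisions = [True, False, True]
```

The “AND” rule favours a low false acceptance rate and the “OR” rule a low false rejection rate, with majority voting lying between the two.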

2.2.5 Rank Level Fusion

Ross et al. (2006) reported rank level fusion, in which the individual ranking preferences of several biometric matchers are combined into a single consensus ranking list that aids in establishing the final authentication decision. Farah et al. (2008) proposed a method in which rankings of documents are combined to produce a consensus ranking; the criterion for success is the position of the true class in the consensus ranking compared with its position in the rankings before fusion. The method was based on decision rules and performed better than other positional data fusion methods. Ailon et al. (2010) discussed rank aggregation from partial ranking lists. Kumar et al. (2010) investigated an approach for person recognition using rank level combination of multiple palm print representations. Among the fusion approaches they investigated, the use of nonlinearities in conjunction with weights resulted in the highest performance improvement.
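A consensus ranking can be illustrated with the Borda count, a widely used positional method. It is chosen here only as an example and is not necessarily the method of the works cited above. The sketch assumes that every matcher ranks the same set of enrolled identities, best first; the identity labels and rankings are hypothetical.

```python
def borda_count(rankings):
    """Combine best-first rankings of the same identities into a consensus ranking."""
    identities = rankings[0]
    n = len(identities)
    points = {identity: 0 for identity in identities}
    for ranking in rankings:
        for position, identity in enumerate(ranking):
            points[identity] += n - 1 - position  # top rank earns n - 1 points
    # Sort by descending points; ties are broken alphabetically for determinism.
    return sorted(points, key=lambda identity: (-points[identity], identity))

# Hypothetical rankings from three matchers over identities A, B and C.
face_rank = ["A", "B", "C"]
ear_rank = ["B", "A", "C"]
iris_rank = ["A", "C", "B"]
consensus = borda_count([face_rank, ear_rank, iris_rank])
```

In this example identity A collects 5 points, B collects 3 and C collects 1, so A heads the consensus ranking although one matcher placed it second.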

2.3 Biometric Technologies

Biometrics is the science of identifying humans using biological characteristics. Here, the study is based on biometric methods. Jain et al. (2005) reported that biometrics is becoming more commonly used in several devices in many places, including computer rooms, research labs, airports, blood banks, ATMs and military installations. Researchers have investigated different biometric identifiers based on several factors including application scenario, associated cost and availability of the identifiers. Each biometric trait has its advantages and disadvantages, and no single trait is expected to effectively meet all the requirements of all applications.

2.3.1 Face Recognition

Bruner et al. (1954) started to analyse faces to distinguish them in order to conduct psychological research. However, research on automatic machine recognition of faces started only in the 1970s. Chellappa et al. (1995) did extensive research with the help of psychologists, neuroscientists and engineers on various aspects of face recognition by humans and machines. Early face recognition was mainly based on measured facial attributes such as eyes, eyebrows, nose, lips and chin shape. Hong et al. (1998) regarded the lack of appropriate resources, particularly suitable algorithms, as the obstacle to achieving satisfactory performance from a face-based biometric system. Face recognition algorithms can be divided into three categories: holistic methods, which use the whole face image for recognition; feature-based methods, which use local regions such as eyes or mouth; and hybrid methods, which use both local regions and the whole face. Turk et al. (1991) used PCA for face recognition using Eigen space decomposition. The faces were compared using a Euclidean distance measure after projecting them onto Eigen face components, and results were provided for a 16-user database of 2500 images in various conditions. Belhumeur et al. (1997) proposed a face recognition algorithm, known as Fisher face, using both PCA and FLDA (Fisher’s Linear Discriminant Analysis) methods to overcome the problem of illumination and pose variations. Wiskott et al. (1997) studied deformations of faces using local features (chin, eyes, nose, etc.) represented by wavelets and computed from different face images of the same subject. Huang et al. (2003) developed a hybrid face recognition system in which a combination of component-based recognition and 3D morphable models was used for pose and illumination invariant face recognition.

2.3.2 Ear Recognition

The ear is a relatively new class of biometrics used for person authentication. Iannarelli et al. (1989) used manual techniques to identify ear images. Over 10,000 samples of ears were studied to prove the distinctiveness of ears. Although the techniques were manual, the potential for using the ear’s appearance as a means of personal identification was recognized. Victor et al. (2002) used PCA and the FERET evaluation protocol for ear identification. Burge et al. (1998) introduced a geometric algorithm that builds a neighbourhood graph from the Voronoi diagram of the detected edges of the ear’s curve segments for automated ear recognition. Hurley et al. (2005) applied a force field transform to ear images in order to find energy lines, wells and channels. Each image is represented by a compact characteristic vector, which is remarkably invariant to initialization, scale, rotation and noise. The experiment showed the robustness of the technique in extracting the 2D ear. Yan and Bower (2005) compared PCA and ICP methods on 2D and 3D ear images and introduced a fast method based on ICP for 3D shape images. A high recognition rate was obtained for 3D ear shapes, but 3D image acquisition requires more time, so 2D images are preferred for real-time purposes. Feng et al. (2000) computed Eigen faces from a mid-range wavelet sub-band. The method applies PCA to a wavelet sub-band for human face recognition; a mid-range frequency sub-band is selected for the PCA representation. Chang et al. (2003) developed an ear recognition system using the Eigen ear method and compared it with the Eigen face method. The Eigen face and Eigen ear were also combined to evaluate the performance of the system. Bhanu et al. (2003) presented a 3D ear recognition method using a new local surface descriptor. The similarity of two ears was determined by three factors, namely the number of similar local surface descriptors in the ears, the geometric constraint and the match quality.

2.3.3 Iris Recognition

Dessimoz et al. (2006) described the human iris as a very complex layered structure that is unique to an individual and an extremely valuable source of biometric information. The general structure of the iris is genetically determined, but its particular characteristics are critically dependent on circumstances and are stable with age. Daugman et al. (1993) developed iris recognition systems which exploit the complexity and stability over time of iris patterns and claim to be highly accurate. This is the most well-known algorithm, on which the principal state-of-the-art iris recognition systems are based. The approach comprises four steps: localization of the iris position, normalization, feature extraction and matching.

Daugman et al. (1994) suggested an iris recognition algorithm for a biometric personal identification system based on iris analysis. This algorithm is the basis for all commercial iris recognition systems and uses 2D Gabor wavelets to perform feature extraction from the iris and the Hamming distance to compare those features for classification. Daugman et al. (2003) defined a decision process in which the matching software takes two iris codes and computes the Hamming distance based on the number of differing bits. The Hamming distance score (where 0 means identical iris codes) is then compared with the security threshold to make the final decision. Computing the Hamming distance of two iris codes is very fast, since it only requires counting the number of set bits in the exclusive OR of the two iris codes. Other approaches to iris recognition were also introduced. Wildes et al. (1997) used a histogram-based model-fitting method to localize the iris. For representation and matching, the author registered a captured image to a stored model, filtered it with an isotropic 2D band-pass decomposition (Laplacian pyramid), and followed this by correlation matching based on Fisher’s Linear Discriminant.

Marin et al. (2006) noted that retina scans require the person to remove their glasses, place their eye close to the scanner, and remain still while focusing on a specified location for approximately 10 to 15 seconds until the scan is completed. A retinal scan involves the use of a low-intensity coherent light source, which is projected onto the retina to illuminate the blood vessels, which are then photographed and analyzed. A coupler is used to read the blood vessel patterns. A retina scan cannot be faked, as it is currently impossible to forge a human retina. Furthermore, the retina of a deceased person decays too rapidly to be used to deceive a retinal scan.

2.3.4 Footprint Recognition

Very limited literature is available on footprint recognition systems. Nakajima et al. (2000) proposed a technique for footprint-based recognition. Footprints are normalized, both in direction and in position, for robust image matching between the input pair of footprints and the pair of registered footprints. The Euclidean distance between them is used together with the geometric information of the input footprint prior to the normalization. The pressure distribution of the footprint was measured with a pressure-sensing mat.

Jung et al. (2003) observed that, for methods based on human gait, stable and relatively long walking data are crucial conditions for person recognition. In practice, these conditions are very difficult to satisfy because of the frequent changes of walking velocity that occur during real walking. They therefore recommended a technique which uses only single-step walking data from a mat-type pressure sensor. Wang et al. (2004) proposed an alternative system based on gait analysis. The distribution of footprint pressure over the contact surface reflects the behavioural and physiological characteristics of the human body. Consequently, the acquisition and representation of the footprint pressure surface is the foundation of footprint-based biometric identification. Kuragano et al. (2005) suggested a novel approach based on gait and footprint analysis. Health care providers in Japan assess the recovery status of patients by detecting variation in the patient’s style of walking. In the first phase, the manner of walking is unstable. As rehabilitation progresses, the patient’s walking becomes stable. Techniques for binarization of a footprint image, noise reduction, and erosion and dilation to smooth the edge of the binary image, in order to find the edge of the footprint image, are described. Wild et al. (2008) published work on single-sensor hand- and footprint-based multimodal biometric recognition, on the assumption that no single biometric modality suitable for all applications has been established. The novel modality offers lower accuracy.

2.4 Summary

This chapter has presented an extensive literature survey on multimodal biometrics using different fusion schemes and biometric modalities. PCA-based methods worked well but used small databases. Some methods were based on iris and other traits, but fusion did not work properly. Appropriate feature extraction plays an important role. In this work, neural network based principal component analysis for face, Eigen images for ear, Hamming distance for iris and the modified sequential Haar transform for foot are used. The logistic regression method has been used as a tool for biometric rank fusion.



This chapter presents an overview of multimodal biometrics and the proposed methods used for recognition of the various biometric modalities. The different fusion schemes, namely matching level fusion and rank level fusion, utilizing face, ear, iris and foot biometric information are also discussed.

3.1 Basic Architecture of Multimodal Biometrics

First of all, four unimodal biometric systems utilizing face, ear, iris and foot are discussed individually. Fig. 3.1 shows a simple unimodal biometric system, using face recognition as the example. The face image is enrolled and pre-processed, and its features are extracted and stored in the template database. The test facial image is then checked against those stored in the database. If a matching template is available in the database, a match is reported. If more than one modality is involved, the system becomes a multimodal biometric system. Fig. 3.2 shows an example of multimodal biometrics utilizing face, ear, iris and foot as the four traits. Eigen images for ear recognition, Hamming distance for iris recognition and the sequential modified Haar transform for foot recognition are used as the classifiers for the different traits. Rank level fusion is applied to calculate the rank of individuals. For a multimodal biometric system, the selection of appropriate biometric traits is one of the main tasks; it depends on several factors such as the type of biometric operation (identification or verification), perceived risks, type of users, and need for security. The main aim is to evaluate the performance of the multimodal biometric system based on matching level fusion using different classifier approaches.


Fig. 3.1: Face recognition as an example of Unimodal Biometrics. (Blocks: enrollment of a single biometric trait, feature extraction, feature matching, matching result.)

Fig. 3.2: Multimodal Biometrics with rank level fusion. (Blocks: enrollment of multiple biometric traits, feature extraction, feature matching, rank fusion, final result.)


The image databases of the biometric traits contain images of the faces, ears, irises and feet of individuals. The database was prepared by collecting images from different persons and subjecting them to certain pre-processing tasks.

3.2 Face Recognition

The multimodal biometric system uses recognizable characteristics of faces, and the important features are obtained as templates. The templates are equivalent feature transformations, which are matched against a known training set of face images. The faces are given as input, and with the help of Principal Component Analysis (PCA) the features are extracted and stored in the template database.

Fig. 3.3: Face recognition process. (Blocks: test face image and training face images, apply PCA, Eigen image and generation of feature vectors, template database, calculate Euclidean distance, classify face image.)

The output is a reconstructed image based on minimum Euclidean distance matching of faces. Fig. 3.3 shows the face recognition process. Faces are recognized on the basis of features extracted from the faces to represent them. The features are stored as feature vectors. There are numerous research contributions on feature extraction from facial images. Belhumeur et al. (1997) suggested a few successful appearance-based approaches, which operate directly on images and process them as two-dimensional (2-D) patterns. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are two powerful tools used for data reduction and feature extraction in the appearance-based approaches. We have used PCA for face recognition in the multimodal biometric system. Monwar et al. (2009) introduced the method to produce a sub-space projection matrix of the training images in image space. This method is robust against noise and occlusion, and also against illumination, scaling and orientation; it is therefore used in this biometric system, because the problems mentioned are likely to occur in faces.

3.2.1 Principal Component Analysis

The most popular algorithm used in face recognition methods is Principal Component Analysis (PCA). The main concept of the algorithm is to de-correlate the data and expose differences and similarities by finding the principal directions, which are the Eigenvectors of the covariance matrix of the multidimensional data. First, the system is initialized with a training set of face image vectors $\Gamma_1, \Gamma_2, \ldots, \Gamma_M$ containing images of each subject. Testing of the biometric system makes use of face images from the training set of face images. The training images are then subjected to PCA to generate the Eigenvectors. The mean image is computed as:

$$\Psi = \frac{1}{M} \sum_{i=1}^{M} \Gamma_i$$

The mean-subtracted image $\Phi_i$ is obtained by:

$$\Phi_i = \Gamma_i - \Psi, \qquad i = 1, 2, \ldots, M$$

This set of large vectors is subjected to PCA to obtain a set of M orthonormal vectors $U_n$. The kth vector, $U_k$, is selected such that

$$\lambda_k = \frac{1}{M} \sum_{n=1}^{M} \left( U_k^{T} \Phi_n \right)^2$$

is a maximum, where the vectors $U_k$ and scalars $\lambda_k$ are the Eigenvectors and Eigen values respectively of the covariance matrix (CM), which is given as:

$$CM = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^{T} = A A^{T}$$

where $A = [\Phi_1, \Phi_2, \ldots, \Phi_M]$, up to the constant scale factor. A face image $\Gamma$ is mean-subtracted ($\Phi = \Gamma - \Psi$) and projected onto the “face space” by the M Eigenvectors, resulting in the weights:

$$\omega_k = U_k^{T} \Phi, \qquad k = 1, \ldots, M$$

which together form the vector $\Omega = [\omega_1, \omega_2, \ldots, \omega_M]^{T}$. The distance between the training and test projections in the classification space is calculated as the Euclidean distance:

$$d_k = \left\| \Omega - \Omega_k \right\|$$

where $\Omega_k$ is the vector describing the kth face class. Each image in the training set is transformed into the image space and its components are stored in memory. An input face is presented to the system and projected onto the face space, and the Euclidean distance is then computed. Whether the image presented to the system is a face or not also needs to be checked. Fig. 3.4 shows the training set of face images and Fig. 3.5 shows the equivalent Eigen faces.
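The PCA steps above (mean image, mean-subtracted images, Eigenvectors, projection and Euclidean distance) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the images are random synthetic vectors, and the Eigenvectors are obtained from the small M x M matrix, a common shortcut when the number of pixels is much larger than the number of images.

```python
import numpy as np


def train_eigenfaces(images: np.ndarray, num_components: int):
    """images: (M, N) array with one flattened training image per row.

    num_components should not exceed M - 1, the rank of the centred data.
    """
    mean = images.mean(axis=0)                  # mean image (Psi)
    phi = images - mean                         # mean-subtracted images (Phi_i)
    # Eigen-decompose the small M x M matrix instead of the N x N covariance.
    small = phi @ phi.T / len(images)
    eigvals, eigvecs = np.linalg.eigh(small)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    basis = phi.T @ eigvecs[:, order]           # map back to image space, (N, K)
    basis /= np.linalg.norm(basis, axis=0)      # orthonormal vectors U_k
    weights = phi @ basis                       # projection of each training image
    return mean, basis, weights


def classify(image, mean, basis, weights, labels):
    """Return the label of the nearest training projection and its distance."""
    omega = (image - mean) @ basis              # omega_k = U_k^T (Gamma - Psi)
    distances = np.linalg.norm(weights - omega, axis=1)
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])


rng = np.random.default_rng(0)
train = rng.random((4, 64))                     # four synthetic 8x8 "images"
labels = ["p1", "p2", "p3", "p4"]
mean, basis, weights = train_eigenfaces(train, num_components=3)
probe = train[2] + 0.01 * rng.random(64)        # slightly perturbed copy of p3
print(classify(probe, mean, basis, weights, labels))
```

In practice a threshold on the returned distance would also be needed to reject unknown persons or non-face inputs.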


Fig. 3.4: Training set of faces.

Fig. 3.5: Eigen faces of facial images of the image database.


3.3 Ear Recognition

The ear is a relatively new class of biometrics. Much like the face, it is a visible part of the human body which can be used for a non-invasive biometric technique. The ear is an example of a stable biometric which does not vary with age. The ear does not have a completely random structure, as shown in Fig. 3.6.

Fig. 3.6: Anatomy of the ear.

Several methods are reported in the existing literature on biometric systems. For the proposed multimodal system, ear databases of different persons were developed. The database contains ear images with varying illumination and orientation. The Eigen image algorithm is used for ear recognition. The steps for generating Eigen ears are similar to those for generating Eigen faces. Chang et al. (2003) used the standard PCA algorithm for ear recognition and concluded that ear and face do not differ much in terms of recognition rate. Fig. 3.7 shows the ear recognition system.

Fig. 3.7: Ear recognition system. (Blocks: test image, feature extraction module, Eigen image, template database, decision module.)

3.3.1 Eigen Image

Darwish et al. (2009) described the Eigen image method as the most effective method for face recognition systems, and it can also be used in ear biometrics. The ear recognition process is initialized using the training set of ear images. The side-face images were acquired using a high-quality camera under the same lighting conditions. The ear portion is cropped from the side-face image in a pre-processing operation. The color images are converted to grayscale images, which are passed to the subsequent stages of the biometric system. Fig. 3.8 shows the dataset of grayscale images obtained by cropping the ear part of the image. Each set of images contains images of the training set and the test set. An ear image can be considered as a vector in a high-dimensional space, formed by concatenating its columns. The proposed method is based on normalized, pre-processed ear images. The Eigenvectors and Eigen values are then calculated from the covariance matrix of the images.

Fig. 3.8: Gray scale images of ears.

The ear images are projected onto the image space, and their weights are stored. Once the Eigen space is defined, the test image is projected into the Eigen space. Images with a low correlation can be rejected. Acceptance or rejection is determined by applying a threshold; a distance below the threshold is a match. Fig. 3.9 shows the resulting Eigen images for the ear images.

Fig. 3.9: Eigen images of ears.

3.4 Iris Recognition

The iris is the visible ring structure that surrounds the pupil of the eye. It is a muscular structure that controls the amount of light entering the eye. The iris recognition system, which utilizes the measurable features of the iris, is considered to be the most accurate biometric. Fig. 3.10 shows an eye, highlighting the iris and other parts.

Fig. 3.10: Human eye highlighting iris and other parts. (Labels: eyelid, iris.)

The Hamming distance, introduced by Hamming et al. (1950), is used for iris recognition in the proposed work. The iris images are cropped from the eye images and then subjected to pre-processing and encoding. The iris region, which lies outside the pupil, is localized in the eye image using an automatic segmentation algorithm based on the Hough transform, as defined by Hough et al. (1962). The Hough transform is a general-purpose method for identifying the locations and orientations of features in a digital image. The method is simple, easy to implement, handles missing and occluded data and can be adapted to many types of forms, not just lines. As the iris has edges with a known circular shape, the Hough transform is suitable for detecting and linking edges to form closed iris areas.
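The voting idea behind the circular Hough transform can be sketched in plain Python as below. The sketch assumes that edge detection has already produced a list of edge pixel coordinates; the function name, the number of angle steps and the synthetic edge points are hypothetical choices made for illustration only.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def circular_hough(edge_points: Iterable[Tuple[int, int]],
                   radii: Iterable[int],
                   angle_steps: int = 72) -> Tuple[int, int, int]:
    """Return the (xc, yc, r) accumulator cell that collects the most votes."""
    votes: Counter = Counter()
    radii = list(radii)
    for x, y in edge_points:
        for r in radii:
            cells = set()
            for step in range(angle_steps):
                theta = 2.0 * math.pi * step / angle_steps
                # A circle of radius r passing through (x, y) has its centre here.
                xc = round(x - r * math.cos(theta))
                yc = round(y - r * math.sin(theta))
                cells.add((xc, yc, r))
            votes.update(cells)          # one vote per cell per edge point
    return votes.most_common(1)[0][0]


# Synthetic edge points on a circle with centre (20, 15) and radius 8.
points = [(round(20 + 8 * math.cos(2 * math.pi * k / 36)),
           round(15 + 8 * math.sin(2 * math.pi * k / 36))) for k in range(36)]
print(circular_hough(points, range(5, 12)))
```

The maximum of the accumulator gives the centre coordinates and radius of the circle best supported by the edge points, which is how the pupil and iris boundaries can be located.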

3.4.1 Segmentation

The iris region is segmented from the eye image and approximated by two circles, indicating the iris boundary and the pupil boundary respectively. The eyelids and eyelashes normally occlude the upper and lower parts of the iris region. Kong et al. (2001) presented a method for eyelash detection in which eyelashes are treated either as separable eyelashes, which are isolated in the image, or as multiple eyelashes, which are bunched together and overlap in the eye image. Separable eyelashes are detected using 1D Gabor filters. Specular reflections along the eye image are detected using thresholding, since the intensity values at these regions are higher than at any other region in the image. The Hough transform is a standard computer vision algorithm used to detect geometric shapes such as lines and circles in an image. The circular Hough transform can be used to detect the radius and centre coordinates of the iris region. Wildes et al. (1994) developed an automated iris recognition system. Kong et al. (2001) suggested an accurate iris segmentation method based on a novel reflection and eyelash detection model. An individual is identified using human iris recognition. The parameters used are the centre coordinates $x_c$ and $y_c$, and the radius r. A maximum point in the Hough space corresponds to the radius and centre coordinates of the circle best defined by the edge points. The eyelids are detected using derivatives in the horizontal direction, and the outer circular boundary of the iris using derivatives in the vertical direction.

A rubber sheet model, shown in Fig. 3.11, is used to remap each point within the iris region to a pair of polar coordinates (r, θ), where r lies in the interval [0,1] and θ is the angular variable, cyclic over [0,2π]. This remapping of the iris region can be modeled as

$$I\big(x(r,\theta),\, y(r,\theta)\big) \rightarrow I(r,\theta)$$

where

$$x(r,\theta) = (1 - r)\, x_p(\theta) + r\, x_i(\theta), \qquad y(r,\theta) = (1 - r)\, y_p(\theta) + r\, y_i(\theta)$$

I(x, y) is the iris region image; (x, y) are the original Cartesian coordinates; (r, θ) are the corresponding normalized polar coordinates; and $(x_p, y_p)$ and $(x_i, y_i)$ are the coordinates of the pupil and iris boundaries along the θ direction respectively. The transformed pattern produces a 2D array whose horizontal dimension is the angular resolution and whose vertical dimension is the radial resolution.
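The rubber sheet remapping can be sketched in plain Python as follows. The sketch assumes that the pupil and iris boundaries are given as circles (centre x, centre y, radius), that the image is a list of rows, and that nearest-neighbour sampling is acceptable; the function name and the resolutions are hypothetical.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (centre x, centre y, radius)


def rubber_sheet(image: List[List[float]], pupil: Circle, iris: Circle,
                 radial_res: int = 8, angular_res: int = 16) -> List[List[float]]:
    """Unwrap the iris ring into a radial_res x angular_res array."""
    normalized = []
    for i in range(radial_res):
        r = i / (radial_res - 1)                       # r in [0, 1]
        row = []
        for j in range(angular_res):
            theta = 2.0 * math.pi * j / angular_res
            # Boundary points of the pupil and the iris along direction theta.
            xp = pupil[0] + pupil[2] * math.cos(theta)
            yp = pupil[1] + pupil[2] * math.sin(theta)
            xi = iris[0] + iris[2] * math.cos(theta)
            yi = iris[1] + iris[2] * math.sin(theta)
            # Linear interpolation between the two boundaries.
            x = (1.0 - r) * xp + r * xi
            y = (1.0 - r) * yp + r * yi
            # Nearest-neighbour sampling, clamped to the image (a simplification).
            col = min(max(round(x), 0), len(image[0]) - 1)
            line = min(max(round(y), 0), len(image) - 1)
            row.append(image[line][col])
        normalized.append(row)
    return normalized


# Synthetic 41x41 image whose pixel value equals its column index.
image = [[float(x) for x in range(41)] for _ in range(41)]
strip = rubber_sheet(image, pupil=(20, 20, 5), iris=(20, 20, 15))
print(len(strip), len(strip[0]))
```

Each row of the result corresponds to one ring of the iris and each column to one angular position, so the output has the same dimensions whatever the pupil dilation.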

Fig. 3.11: The rubber sheet model.


3.4.2 Normalization and Encoding

After successful segmentation, the iris region is transformed so that it has fixed dimensions. The dimensional inconsistencies between eye images are mainly due to the stretching of the iris caused by pupil dilation under varying levels of illumination. The normalization process produces iris regions with the same constant dimensions. The homogenous rubber sheet model suggested by Daugman et al. (1988) remaps each point within the iris region to a pair of polar coordinates (r, θ), where r is on the interval [0, 1] and θ is the angle in [0, 2π]. In this system, rotation is accounted for during matching by shifting the iris templates in the θ direction until the two iris templates are aligned. Fig. 3.12 shows the cropped gray scale eye image, and the iris after segmentation is shown in Fig. 3.13.

Fig. 3.12: Cropped and grayscale image of eye.

Fig. 3.13: Segmented iris region.

Encoding of the features extracted from the iris region is achieved by convolving the normalized iris pattern with 1D Log-Gabor wavelets. The 2D normalized pattern is broken up into a number of 1D signals, and these 1D signals are convolved with the 1D Log-Gabor wavelets. The rows of the 2D normalized pattern are taken as the 1D signals; each row corresponds to a circular ring on the iris region. The intensity values at known noise areas in the normalized pattern are set to the average intensity of the surrounding pixels to prevent noise from influencing the output of the filtering. The output of the filtering is phase quantized to four levels, with each filter producing two bits of data for each phase.
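The phase quantization step can be sketched in plain Python as below. The sketch covers only the quantization, not the Log-Gabor filtering itself: it assumes that the complex filter responses are already available and that the four levels correspond to the four quadrants of the complex plane, encoded by the signs of the real and imaginary parts. The function name and the sample responses are hypothetical.

```python
from typing import Iterable, List


def quantize_phase(responses: Iterable[complex]) -> List[int]:
    """Encode each complex filter response as two bits, one per sign."""
    bits: List[int] = []
    for z in responses:
        bits.append(1 if z.real >= 0 else 0)   # sign of the real part
        bits.append(1 if z.imag >= 0 else 0)   # sign of the imaginary part
    return bits


# Hypothetical responses, one in each quadrant of the complex plane.
responses = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
print(quantize_phase(responses))
# prints: [1, 1, 0, 1, 0, 0, 1, 0]
```

Because only the quadrant is kept, the resulting bit template is insensitive to the magnitude of the response, which makes it suitable for comparison with the Hamming distance.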

3.4.3 Iris Matching

The Hamming distance is chosen as the distance measure for recognition. The Hamming distance between two iris templates is calculated using only the significant bits: only those bits of the iris pattern that correspond to ‘0’ bits in the noise masks of both iris patterns are used in the calculation. The Hamming distance thus considers only the bits generated from the true iris region, and the calculation is modified by the mask of each template. The Hamming distance (HD) between two Boolean iris vectors is defined as:

$$HD = \frac{\left\| (C_A \otimes C_B) \cap M_A \cap M_B \right\|}{\left\| M_A \cap M_B \right\|}$$

where $C_A$ and $C_B$ are the coefficients of the two iris images, and $M_A$ and $M_B$ are the mask images of the two iris images respectively. The ⊗ is the XOR operator, which shows the difference between a corresponding pair of bits, and ∩ is the AND operator, which restricts the comparison to the bits selected by both masks. The denominator of the above equation is used to reduce the effect of the unwanted portions of the iris due to eyelashes or eyelids. Ideally, the Hamming distance between two templates of the same iris should be 0.
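The masked Hamming distance can be sketched in plain Python as follows. The sketch assumes that the templates and masks are equal-length sequences of 0/1 values and that a mask bit of 1 marks a usable bit, as the AND form of the equation suggests; masks that mark noise with 1 would have to be inverted first. The function name and the sample templates are hypothetical.

```python
from typing import Sequence


def hamming_distance(code_a: Sequence[int], code_b: Sequence[int],
                     mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Fraction of disagreeing bits among those usable in both templates."""
    usable = 0
    differing = 0
    for a, b, ma, mb in zip(code_a, code_b, mask_a, mask_b):
        if ma & mb:                  # bit is usable in both masks (AND)
            usable += 1
            differing += a ^ b       # 1 when the two code bits differ (XOR)
    if usable == 0:
        raise ValueError("no bits are usable in both templates")
    return differing / usable


code_a = [1, 0, 1, 1, 0, 0, 1, 0]
code_b = [1, 1, 1, 0, 0, 0, 1, 1]
mask_a = [1, 1, 1, 1, 1, 1, 0, 0]
mask_b = [1, 1, 1, 1, 0, 1, 1, 1]
print(hamming_distance(code_a, code_b, mask_a, mask_b))
# prints: 0.4
```

Rotation of the eye could be handled by evaluating this distance for several circular shifts of one template and keeping the minimum.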


3.5 Foot Recognition

Footprint identification deals with the measurement of footprint features for recognizing the identity of a person. The footprint is universal, easy to capture and does not change much over time. A footprint biometric system does not require specialized acquisition devices. Footprint images of the left or right foot of people are captured at different angles. No special lighting is used in this setup. Footprint texture features are usually extracted using transform-based methods such as the Fourier Transform (Wenxin et al. (2002)) and the Discrete Cosine Transform (Jing et al. (2004)). The Wavelet Transform, introduced by Qian et al. (2002), is also used to extract the texture features of the footprint. Fig. 3.14 shows the footprint recognition process.

Fig. 3.14: Footprint biometric system. (Blocks: image acquisition, feature extraction, template database, matching.)

A footprint-based personal recognition method is used in which the foot image is positioned and cropped according to the key points, and then the sequential modified Haar wavelet is applied to find the modified Haar Energy (MHE) feature. Fig. 3.15 shows the proposed footprint identification using the sequential modified Haar Transform.

3.5.1 Sequential Modified Haar Wavelet Transform Method

Foot image features are extracted by transform-based methods such as the Discrete Cosine Transform and the Fourier transform. Wenxin et al. (2002) suggested that the Fourier transform has floating-valued signals that are converted into integer-valued signals, giving less accuracy, and Jing et al. (2004) observed that in the Discrete Cosine Transform some points are missed, leading to incorrect inference. Qian et al. (2002) introduced another transform method, the wavelet transform, which is used to extract the features of the foot image.

Fig. 3.15: Footprint identification system. (Blocks: footprint, RGB to gray scale conversion, segmentation of the foot region, Haar energy, database storage.)

The sequential modified Haar wavelet is used to find the modified Haar energy (MHE) feature: the sequential modified Haar transform is applied to the resized footprint image to obtain the MHE. The Haar wavelet coefficients are represented using decimal numbers. The MHE feature is compared with the feature vectors stored in the database using the Euclidean distance. The accuracy of the MHE feature and the Haar energy feature under different decomposition levels and combinations is compared. Fig. 3.16 shows a cropped gray scale image of a foot.

Fig. 3.16: Cropped and grayscale image of foot.

The footprint samples of different people are cropped and resized. The modified Haar energy of an image is obtained by dividing the image into 4 x 4 blocks. The detail coefficients of every image are then determined. The modified Haar energy for each block is calculated as:

$$\mathrm{MHE}_{i,j,k} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \left( C_{p,q} \right)^2$$

where i is the level of decomposition; j denotes horizontal, vertical or diagonal details; k is the block number from 1 to 16; P x Q is the size of the block; and $C_{p,q}$ is the detail coefficient at position (p, q) of the block. Fig. 3.17 shows the 4 x 4 blocks of a foot image.

Fig. 3.17: Foot image in 4x4 blocks.

The minimum MHE is selected out of the 16 blocks. Let $\mathrm{MHE}_1, \mathrm{MHE}_2, \mathrm{MHE}_3, \ldots, \mathrm{MHE}_{16}$ be the modified Haar energy values for the 16 blocks. Then a modified value is calculated by taking the minimum of all the values:

$$\mathrm{MHE} = \min\left(\mathrm{MHE}_1, \mathrm{MHE}_2, \ldots, \mathrm{MHE}_{16}\right)$$


The MHE is compared with the MHE of different persons stored in the database.
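The block-energy computation can be sketched in plain Python as below. Several assumptions are made and should be read as such: the sketch uses an ordinary one-level 2-D Haar step rather than the sequential modified Haar transform, it takes the energy of a block to be the sum of its squared detail coefficients, and the function names and the synthetic image are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_details(image: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """One-level 2-D Haar step on an image with even height and width.

    Returns the horizontal, vertical and diagonal detail sub-bands.
    """
    horizontal: Matrix = []
    vertical: Matrix = []
    diagonal: Matrix = []
    for r in range(0, len(image), 2):
        h_row, v_row, d_row = [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            h_row.append((a + b - d - e) / 4.0)   # top row minus bottom row
            v_row.append((a - b + d - e) / 4.0)   # left column minus right column
            d_row.append((a - b - d + e) / 4.0)   # difference of the two diagonals
        horizontal.append(h_row)
        vertical.append(v_row)
        diagonal.append(d_row)
    return horizontal, vertical, diagonal


def block_energies(coeffs: Matrix, blocks_per_side: int = 4) -> List[float]:
    """Sum of squared coefficients in each of blocks_per_side ** 2 blocks."""
    p = len(coeffs) // blocks_per_side            # block height P
    q = len(coeffs[0]) // blocks_per_side         # block width Q
    energies = []
    for br in range(blocks_per_side):
        for bc in range(blocks_per_side):
            energies.append(sum(
                coeffs[r][c] ** 2
                for r in range(br * p, (br + 1) * p)
                for c in range(bc * q, (bc + 1) * q)))
    return energies


# Synthetic 16x16 image that is zero except for one bright pixel.
image = [[0.0] * 16 for _ in range(16)]
image[0][0] = 4.0
horizontal, vertical, diagonal = haar_details(image)
energies = block_energies(horizontal)             # 16 block energies
print(energies[0], min(energies))
# prints: 1.0 0.0
```

The minimum of the 16 block energies plays the role of the single MHE value that is then compared with the stored values.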


3.6 Summary

In this chapter, we presented the methodology for the multimodal biometric system, in which the individual biometric systems for the four modalities (face, ear, iris and foot) were discussed as recognition processes. Face recognition uses a principal component analysis classifier. Ear recognition utilizes the Eigen image approach, and iris recognition involves segmentation, normalization and feature encoding, with the Hamming distance used as the measure for matching. A sequential modified Haar wavelet transform approach is suggested for foot biometrics.



Biometrics uses different unique features which are extracted using different methods. Robustness is improved by incorporating many unique features in the recognition system. This is achieved using fusion schemes that combine various features of different biometric modalities. With such a system, it becomes extremely difficult for an intruder to violate the integrity of a system requiring multiple biometric indicators.

4.1 Fusion Scheme

A multimodal biometric system can be designed by integrating several modalities using different fusion schemes. The main goal of fusion is to determine the best set of expert values that optimally combines the decisions rendered by the individual experts. Several types of fusion schemes are used in the multimodal biometrics literature. A brief description of the fusion methods is presented here.

• Sensor Level: The raw data obtained from the sensors for the different modalities are fused. Ross et al. (2006) suggested that sensor level fusion can be performed if the data are obtained from multiple compatible sensors or are multiple instances of the same biometric trait obtained using a single sensor. Since sensor level fusion combines information from different sensors, it requires some pre-processing, such as sensor calibration and data registration, before the fusion is performed.

• Feature Extraction Level: Feature level fusion consolidates the features obtained from different modalities using suitable methods of feature extraction. Ross et al. (2006) stated that if the features obtained from the different sources are structurally compatible, then they can be combined. This approach also introduces the curse of dimensionality, and hence either feature transformation or feature selection can be applied to reduce the dimensionality of the fused feature set.

• Matching Score Level: The match score is a measure of the similarity between the input and the template biometric feature vectors. Ross et al. (2006) described match score level fusion, in which the match scores obtained from different matchers are combined. Since the scores obtained from different matchers are not homogeneous, a score normalization technique maps the scores obtained from the different matchers onto the same range.

• Rank Level and Decision Level: Decision level fusion involves the fusion of decisions obtained using different modalities. Since decision level fusion operates on binary values, it is also called abstract level fusion.

The strategy adopted for integrating biometric modalities depends on the level at which fusion is performed. Fusion at the feature level can be accomplished by concatenating two compatible feature sets. Feature selection/reduction techniques may be employed to handle the curse-of-dimensionality problem.

Duin et al. (2000) studied fusion at the match score level. Verlinde et al. (1999) stated that, in the context of verification, two distinct strategies exist for fusion at this level. In the first approach, the fusion is viewed as a classification problem in which a feature vector is constructed using the matching scores output by the individual matchers. This feature vector is then classified into one of two classes: Accept (genuine user) or Reject (impostor). In the second approach, described by Dieckmann et al. (1997), the fusion is viewed as a combination problem in which the individual matching scores are combined to generate a single scalar score, which is then used to make the final decision. Ross et al. (2003) demonstrated that the simple sum rule is sufficient to obtain a significant improvement in the matching performance of a multimodal biometric system. Sensor level and feature level fusion are difficult because these schemes require the data acquired by different sensors to be compatible, and the feature sets obtained from different traits may be either inaccessible or incompatible. Fusion at the matching score level is the most preferred because the scores carry sufficient information and are easy to access and combine.

A decision made by a biometric system is either a “genuine” type of decision or an “impostor” type of decision. For each type of decision, there are two possible outcomes, true or false. Therefore, there are a total of four possible outcomes: a genuine individual is accepted (a genuine match), a genuine individual is rejected (a false rejection), an impostor is rejected (a genuine rejection) and an impostor is accepted (a false match).

In the present work, we have used fusion at the matching level and the rank level for the multimodal biometric system. All the biometric traits (face, ear, iris and foot) were acquired and subjected to the biometric system. The genuine score, impostor score, False Accept Rate (FAR) and False Reject Rate (FRR) were calculated. The EER/ROC (equal error rate/receiver operating characteristic) curve was plotted against FAR at various values of the threshold. We also normalized the scores of all traits, combined them and calculated the matching score for all possible combinations of the four biometric modalities. At the rank level, the logistic regression method has been used to identify the relative position of the individual in the ranked list.

Fig. 4.1 illustrates the fusion schemes used in the proposed multimodal biometric system.

[Figure: block diagram showing the Data Acquisition, Feature Extraction, Matching and Decision modules leading to the Final Result, with fusion points labelled Sensor Level Fusion, Feature Level Fusion, Score Level Fusion and Decision Level Fusion.]

Fig. 4.1: Different Fusion Schemes.

4.1.1 Normalization Technique

Normalization is required when combining the scores of different traits into a single score, because the match scores at the output of the individual matchers may not be homogeneous. If the scores are not comparable, it becomes very difficult to combine them. The range of score values may also differ between modalities. To address these disparities, normalization techniques are used to transform the scores of each trait into a common range.

The Min-Max normalization technique has been used in this work; it shifts the minimum and maximum bounds of the scores produced by a particular trait to 0 and 1, respectively. If the matching score of a particular trait is not bounded, the minimum and maximum values are found from the training set of match scores of that trait. Let ‘x’ and ‘y’ be the matching score before and after normalization, respectively. The Min-Max technique computes the value of ‘y’ as:

$$ y = \frac{x - \min(S_x)}{\max(S_x) - \min(S_x)} \qquad (4.1) $$

where Sx is the set of all possible matching scores generated by a particular modality. Min-Max normalization retains the original distribution of scores and transforms all the scores into a common range [0, 1].
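Min-Max normalization of a set of match scores can be sketched in a few lines. This is an illustrative sketch only: the function name `min_max_normalize` and its optional `lo`/`hi` bounds (for bounds taken from a training set) are assumptions, and the three sample values are simply ear distances of the magnitude that appears in Table 5.1.

```python
import numpy as np

def min_max_normalize(scores, lo=None, hi=None):
    """Map scores to [0, 1]; lo/hi default to the observed minimum and maximum."""
    scores = np.asarray(scores, dtype=float)
    lo = scores.min() if lo is None else lo
    hi = scores.max() if hi is None else hi
    if hi == lo:
        # Degenerate case: every score is identical, so no spread to rescale.
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)

normalized = min_max_normalize([15282.0, 15289.0, 15301.0])
```

When the bounds come from training data, a test score can fall slightly outside [0, 1]; whether to clip such values is a design choice that the text does not specify.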

4.1.2 The Fusion Technique

Matching score level and rank level fusion schemes have been used in the present work. Four biometric traits, namely face, ear, iris and foot, are used, and a different approach was applied for the recognition of each trait: neural network based PCA for face, Eigen image for ear, Hamming distance for iris and sequential modified Haar transform for foot. At the rank level, logistic regression was used as a positional method that considers the relative position of the person in the ranked list. At the matching score level, the matching module compares the extracted features against the stored templates to generate a match score in verification mode. In verification mode, the system validates a person’s identity by comparing the captured biometric data with his or her own biometric template stored in the system database. We have prepared the database for all biometric traits (face, ear, iris and foot). A few major score values are defined below.

Genuine score: A match score is referred to as a genuine score if it results from matching two samples of a biometric trait of the same user. In this work, a genuine score is expected to fall below the predefined threshold.

Imposter score: A score is known as an imposter score if it results from matching two samples of a biometric trait originating from different users. An imposter score is expected to exceed the predefined threshold.

Some further parameters are used in performance evaluation, such as FAR and FRR.

False accept rate (FAR): The false accept rate is defined as the probability of an impostor being accepted as a genuine individual. The FAR is computed as the ratio of the number of people falsely accepted to the total number of enrolled people.

False reject rate (FRR): The false reject rate is defined as the probability of a genuine individual being rejected as an impostor. It is computed as the ratio of the number of people falsely rejected to the total number of enrolled people.

Relative Operating Characteristic (ROC): The values of FAR and FRR can be traded off against each other by changing parameters such as the decision threshold. The ROC is plotted as a graph of FAR against FRR, with the threshold varying implicitly along the curve.

Equal Error Rate (EER): The rate at which the accept and reject errors are equal is called the EER. The EER is commonly used when a quick comparison of two systems is required. It is obtained from the ROC at the point where FAR and FRR have the same value. The lower the EER, the more accurate the system is.
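The relationship between the threshold, FAR, FRR and EER can be illustrated with a short sketch. It makes several assumptions that are not stated in the text: scores are treated as distances (a comparison is accepted when its score is at or below the threshold), the rates are computed as fractions of impostor and genuine comparisons, and the EER is approximated at the candidate threshold where |FAR - FRR| is smallest. The function names and the sample scores are hypothetical.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """Error rates for distance scores: accept when score <= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.mean(impostor <= threshold)   # impostors wrongly accepted
    frr = np.mean(genuine > threshold)     # genuine users wrongly rejected
    return float(far), float(frr)

def equal_error_rate(genuine, impostor):
    """Scan observed scores as thresholds; return (threshold, EER estimate)."""
    candidates = np.unique(np.concatenate([genuine, impostor]))
    best = None
    for t in candidates:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, float(t), (far + frr) / 2.0)
    return best[1], best[2]

# Hypothetical distance scores, not results from this work.
genuine_scores = [0.10, 0.20, 0.25, 0.40]
impostor_scores = [0.30, 0.50, 0.60, 0.70]
threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
```

Sweeping the threshold over all candidates and recording each (FAR, FRR) pair gives the points of the ROC curve; the EER is the point on that curve where the two rates meet.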

Weight of biometric traits: The fusion technique used in the experiment is based on assigning a different weight to each biometric trait. The weight for the ith trait, Wi, is calculated as:

$$ W_i = \frac{1}{EER_i} $$


Normalized score: The match scores of the individual traits may not be homogeneous, and the match scores at the output of different traits may follow different statistical distributions. Therefore, the Min-Max normalization technique is used to calculate the normalized score of each trait. The weight of a particular trait relative to all biometric traits being fused is calculated in normalized form as:

$$ W_i = \frac{1/EER_i}{\sum_{j=1}^{n} 1/EER_j} $$

where EERj is the equal error rate for the jth trait and ‘n’ is the number of traits.

Score after fusion: Sum rule based fusion is used in this work. The score after fusion is calculated as:

$$ S = \sum_{j=1}^{n} W_j S_j $$

where Sj is the match score and Wj is the weight of the jth trait, respectively.
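Weighted sum rule fusion can be sketched as below. The sketch assumes weights proportional to 1/EER (as Section 5.2.4 states) that are normalized to sum to one over the traits being fused; the normalization step, the function names and all numeric values are assumptions made for illustration, not results from this work.

```python
def eer_weights(eers):
    """Map {trait: EER} to weights proportional to 1/EER that sum to 1."""
    inverse = {trait: 1.0 / eer for trait, eer in eers.items()}
    total = sum(inverse.values())
    return {trait: value / total for trait, value in inverse.items()}

def sum_rule_fusion(scores, weights):
    """Weighted sum of normalized match scores, one score per trait."""
    return sum(weights[trait] * scores[trait] for trait in scores)

# Hypothetical EERs and normalized scores for two traits.
example_eers = {"face": 0.10, "ear": 0.20}
example_scores = {"face": 0.30, "ear": 0.60}
weights = eer_weights(example_eers)
fused_score = sum_rule_fusion(example_scores, weights)
```

In this example the trait with the lower EER (face) receives twice the weight of the other, so its score contributes more to the fused result.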

4.2 Logistic Regression Method

The logistic regression method is used to correctly predict the category of outcome for individual cases using the most parsimonious model. A model is created for this purpose that includes all predictor variables that are useful in prediction. Logistic regression is used to calculate the probability of success over the probability of failure. Hypothesis testing in logistic regression involves reasoning by contradiction: the null hypothesis is that the predictor coefficient is zero in the population.

Rank level fusion was used along with the logistic regression method. Ho et al. (1994) suggested the logistic regression method as the best in terms of recognition performance. A weighted sum of the individual ranks is calculated, and the weight to be assigned to each matcher is determined by logistic regression. In this method, the final consensus rank is obtained by sorting the identities according to the sum of their rankings obtained from the individual matchers, each multiplied by the assigned weight. The process involves the following steps:

Get the list of ranks from different biometric classifiers.

Assign different weights to all ranks.

For all ranks: The total rank score of each identity is estimated as:

$$ R_c = \sum_{i=1}^{n} W_i R_i $$

where n is the number of ranking lists, Ri is the ith ranking list and Wi is the weight assigned to it.

Rc is sorted in descending order and replaced with corresponding identity.
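The steps above can be sketched as a weighted rank sum. The matcher weights are those quoted with Table 5.13; the three identities and their ranks are hypothetical. The sketch assumes that rank 1 denotes the best match, in which case the consensus list is obtained by sorting the fused scores in ascending order; because the text states descending order (which would apply if larger values denote better matches), the direction is exposed as a parameter rather than fixed.

```python
def fuse_ranks(rank_lists, weights, descending=False):
    """rank_lists: {matcher: {identity: rank}}; weights: {matcher: weight}."""
    identities = next(iter(rank_lists.values())).keys()
    fused = {
        identity: sum(weights[m] * ranks[identity] for m, ranks in rank_lists.items())
        for identity in identities
    }
    # Sort identities by their weighted rank sum.
    ordered = sorted(fused, key=fused.get, reverse=descending)
    return fused, ordered

matcher_weights = {"face": 0.24, "ear": 0.26, "iris": 0.25, "foot": 0.24}
# Hypothetical ranks (1 = best match) for three identities.
example_ranks = {
    "face": {"A": 1, "B": 2, "C": 3},
    "ear":  {"A": 2, "B": 1, "C": 3},
    "iris": {"A": 1, "B": 3, "C": 2},
    "foot": {"A": 1, "B": 2, "C": 3},
}
fused, ordered = fuse_ranks(example_ranks, matcher_weights)
```

Identity "A" is ranked first by three of the four matchers, so it obtains the smallest weighted sum and heads the consensus list under the rank-1-is-best assumption.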

The weight to be assigned to the different matchers is determined by the recognition performances obtained through numerous trial executions of the system. This method is very useful when the different matchers have significant differences in their accuracies, but it requires a training phase to determine the weights. Fig. 4.2 shows the enrollment process used in the multimodal biometric system with the logistic regression method. It includes pre-processing of the modalities and then converting them into templates, which are stored in the database.



[Figure: enrollment block diagram. Face Preprocessing → Eigenface projection → Face Template; Ear Preprocessing → Eigen ear projection → Ear Template; Iris Preprocessing → Iris code Generation → Iris Template; Foot Preprocessing → Modified Haar Energy Generation → Foot Template; all templates are stored in the Template Database.]

Fig. 4.2: Logistic Regression method for enrollment process of biometrics.

Fig. 4.3 shows the use of the method in the identification process of multimodal biometrics. The same processing steps as in enrollment are repeated, and the resulting templates are matched against the templates stored in the database.



[Figure: identification block diagram. Face Preprocessing → Eigen face Projection; Ear Preprocessing → Eigen ear Projection; Iris Preprocessing → Iris code Generation; Foot Preprocessing → Modified Haar Energy Generation; Features matching (based on Euclidean distance and Hamming distance) against the System Database yields Face Rank, Ear Rank, Iris Rank and Foot Rank, which feed Rank level fusion and the Final Ranking.]

Fig. 4.3: Logistic regression in Testing.

4.3 Summary

The fusion technique for multimodal biometrics was discussed. The normalization method (Min-Max normalization) has been briefly explained along with the steps used. The logistic regression method is suggested for rank level fusion; this method works on the recognition results of the single matchers with different databases.



This chapter discusses the implementation and analysis of all procedures and the databases used for testing and training of the biometric modalities. For this multimodal biometric system we did not use predefined databases for the different modalities; we prepared our own database of 100 persons for each modality (face, ear, iris and foot). The performance of the biometric system was evaluated on these databases. To compare different classifier fusion methods, the outcomes of each classifier approach have been tested against the performance of the match score level and rank level fusion methods.

5.1 Software and Databases

A multimodal biometric system has been developed using MATLAB 7.10.0 on a Pentium-IV machine. The research work involved four biometric traits: face, ear, iris and foot. For implementation, a self-created database consisting of images of 100 persons for face, ear, iris and foot was created. Biometric information (face, ear, iris and foot) for every user was captured and stored as a training set. The necessary image pre-processing was also applied so that the images could be passed properly to all subsequent stages. The multiple biometric modalities of a single person can be chosen by selecting among the face, ear, iris and footprint images of that person. After the training process, the necessary features are extracted. Different classifier approaches were used at the match score level for the different modalities: neural network based principal component analysis for face, Eigen image for ear, a Hamming distance based approach for iris and sequential modified Haar transform for foot. At the rank level, the logistic regression method was applied to assign the rank of each identity.

For face recognition, frontal face images were acquired using a high quality camera under the same lighting condition with no illumination changes. The face images were saved in JPEG format. In pre-processing, the face part was manually cropped from the image and the RGB image was converted to a gray scale image. The face images were resized to 170 x 190 pixels. Figure 5.1 shows a few sample face images from the database.





Fig. 5.1: Faces of the databases.


For ear recognition, side face images were captured using a high quality camera under the same lighting condition with no illumination changes. Image pre-processing was then applied to obtain cropped ear images. The ear images were saved in JPEG format. The RGB images were then converted to gray scale images and resized to 190 x 170 pixels. Figure 5.2 shows a few sample ear images from the ear database.





Fig. 5.2: Sample ear images of the database.

The iris is a thin circular diaphragm, which lies between the cornea and the lens of the human eye. Formation of the unique patterns of the iris is random and not related to any genetic factors. Wildes et al. (1997) suggested that the iris region can be approximated by two circles, one for the sclera boundary and another, interior to the first, for the pupil boundary. The eye images were captured using a high quality camera in a dark room with no illumination changes. All the images were taken from a distance of 10-15 cm and saved in JPEG format. The eye part was manually cropped from the face image, converted to gray scale and resized. Figure 5.3 shows sample iris images from the image database.





Fig. 5.3: Sample iris image of the database.

For footprint recognition, right-foot footprint images of 100 different persons were captured using a digital camera without any special lighting condition. The foot images were saved in JPEG format. After acquiring the foot image, key points were extracted from the image. The RGB footprint image was converted into a gray scale image. Figure 5.4 shows samples of footprint images.





Fig. 5.4: Sample footprint images.

5.2 Experimental Results

Face, ear, iris and foot modalities were used in the multimodal biometric recognition system, so four physically independent biometric traits of a person were tested. Biometric recognition involves image acquisition, feature extraction, feature matching and decision making. Each individual biometric trait was first applied to its classifier algorithm. The matching scores of the individual traits were then combined using the fusion method, and the best results among all possible combinations of the multimodal biometric system were determined.

The face modality was tested using PCA, which projects the image onto the Eigen face space. The corresponding set of weights was obtained and compared with the sets of weights of the faces in the training set. The Euclidean distance is used for the matching. Figure 5.5 to Figure 5.10 show various experimental results obtained with MATLAB 7.10.0 on face images. Figure 5.5 shows the training set of face images in PCA space with the corresponding transformation matrix. The individual images were normalized and then subjected to pre-processing operations, and clear face images were constructed. Figure 5.6 shows the normalized training set of face images.

Fig. 5.5: Training set of face images.


Fig. 5.6: Normalized face images.

Figure 5.7 shows the mean image of the training data of faces; and Figure 5.8 shows Eigen faces representing feature set.


Fig. 5.7: Face Mean image.

Fig. 5.8: Eigen faces.


Figure 5.9 shows an input image and its reconstructed image. Figure 5.10 shows the weights of the input face and the distance of the input image. The weights are stored. Acceptance or rejection is determined by comparing the Euclidean distance with a threshold.

Fig. 5.9: Input image and its reconstructed image.

Fig. 5.10: Weight of input face and the Euclidean distance.
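The projection-and-matching procedure described for face images can be sketched as follows. This is not the MATLAB implementation used in this work: it computes the eigenvectors with a plain singular value decomposition rather than a neural network based PCA, and it uses small synthetic arrays in place of real images. The function names, the number of components and the noise level are all assumptions.

```python
import numpy as np

def train_eigenspace(images, n_components):
    """images: (n_samples, n_pixels) array of flattened gray-scale images."""
    X = np.asarray(images, dtype=float)
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of vt are the principal directions (eigenfaces) of the training set.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    weights = centered @ basis.T          # weights of each training image
    return mean, basis, weights

def match(image, mean, basis, train_weights):
    """Return (index of nearest training image, Euclidean distance in eigenspace)."""
    w = (np.asarray(image, dtype=float) - mean) @ basis.T
    distances = np.linalg.norm(train_weights - w, axis=1)
    best = int(np.argmin(distances))
    return best, float(distances[best])

rng = np.random.default_rng(0)
train = rng.random((6, 64))               # six synthetic 8x8 "images"
mean, basis, train_weights = train_eigenspace(train, n_components=4)
probe = train[2] + 0.01 * rng.random(64)  # slightly perturbed copy of image 2
index, distance = match(probe, mean, basis, train_weights)
```

The returned distance would then be compared with a threshold to accept or reject the probe; the threshold itself has to be chosen from genuine and imposter score distributions.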

In similar manner, ear recognition was implemented. Eigen vectors and Eigen values were estimated. The known images are projected onto the image space, and their weights are stored. Figure 5.11 to Figure 5.14 are results of biometrics tested over ear images.

Fig. 5.11: Training set of ear images.

Fig. 5.12: Normalized ear images.

Fig. 5.13: Eigen ear images.


Next, the test image was projected into the Eigen space. When an unknown image is projected into the Eigen space, the distance between its position and the positions of all the known images in the Eigen space is measured. The known image closest to the unknown image in the Eigen space is taken as the match, and acceptance or rejection is determined by applying a threshold.

Fig. 5.14: An ear image and its reconstructed image.


The iris recognition system was implemented and tested over the database of eye images. The system included segmentation, normalization and feature encoding as its main stages. Segmentation locates the iris region in an eye image, as shown in Figure 5.15.

Fig. 5.15: Segmented iris image.

Normalization is then used to create a consistent representation of the iris region, and feature encoding produces templates as a set of discriminating features of the iris. The input to the system is an eye image, and the output is an iris template that provides a mathematical representation of the iris region. Figure 5.16 shows the result of segmentation and normalization.

Fig. 5.16: Segmented and normalized image.


For matching, the Hamming distance is used as the metric for recognition. The Hamming distance algorithm was employed with noise masking, so that the distance between two iris templates is calculated using only the bits generated from the true iris region. The proposed work yielded a Hamming distance of 0.31 for two iris templates generated from the same iris, which confirms that iris recognition is a reliable and accurate biometric. The genuine score and the imposter score are calculated on the basis of the minimum Euclidean distance of matched and non-matched images.

The foot modality was subjected to the sequential modified Haar transform technique for foot recognition. The sequential modified Haar wavelet maps integer-valued signals onto integer-valued signals, abandoning the property of perfect reconstruction. In the standard transform, the wavelet coefficients are represented using decimal numbers, which need eight bytes for storing each of the Haar coefficients. Cancelling the division in the subtraction avoids the use of decimal numbers while preserving the difference between two adjacent pixels. Figure 5.18 shows the result for a foot image.

Fig. 5.18: Foot image divided into 4x4 blocks.
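The noise-masked Hamming distance used for iris matching can be sketched as below. The sketch assumes binary iris codes of equal length and masks in which 1 marks a valid (true iris) bit; the mask convention, the function name `masked_hamming` and the example bit patterns are assumptions, since the text does not give these details.

```python
import numpy as np

def masked_hamming(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits, counted only where both masks mark valid bits."""
    code_a = np.asarray(code_a, dtype=bool)
    code_b = np.asarray(code_b, dtype=bool)
    valid = np.asarray(mask_a, dtype=bool) & np.asarray(mask_b, dtype=bool)
    n_valid = int(valid.sum())
    if n_valid == 0:
        return float("nan")   # no usable bits to compare
    disagreements = np.logical_xor(code_a, code_b) & valid
    return float(disagreements.sum()) / n_valid

# Hypothetical 8-bit codes; bit 6 of the first code is flagged as noise.
a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 1, 0, 0, 0, 0, 0]
mask_a = [1, 1, 1, 1, 1, 1, 0, 1]
mask_b = [1, 1, 1, 1, 1, 1, 1, 1]
hd = masked_hamming(a, b, mask_a, mask_b)
```

A distance near 0 indicates templates from the same iris, while unrelated templates are expected to disagree on roughly half of their valid bits.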


The middle portion of the foot was cropped because of its higher intensity, and this portion was divided into 4x4 blocks using the sequential modified Haar transform. The accuracy of the MHE feature and the Haar energy feature under different decomposition levels and combinations was compared. The modified Haar energy (MHE), or threshold value, of each footprint image is stored in the database, and the minimum value is selected from all the calculated MHEs. Finally, the minimum MHE of the test image is calculated, along with the Euclidean distance between the test image and the training set.

5.2.1 Genuine Score and Imposter Score

The minimum genuine score and imposter score for each biometric trait are calculated. The minimum distance between the test image and the training set is calculated through the Euclidean distance for the face, ear, iris and foot modalities; this is referred to as the genuine score. The imposter score is calculated using another image that is not included in the training set. Table 5.1 lists the Euclidean distances for the face, ear, iris and foot modalities. The minimum distance of each modality, indicated in yellow, is the minimum genuine score resulting from matching two samples of a biometric trait of the same person. A genuine score falls below the predefined threshold, whereas an imposter score, which results from matching two samples of a biometric trait originating from different users, exceeds the predefined threshold.


Table 5.1: Euclidean distance for face, ear, iris and foot.

[Table: most cell values did not survive; the recoverable entries are listed by row label.]
Row 1: 1.7121E+04; Pfo 1
Row 2: 1.7236E+04; Pfo 2
Row 14: Pe14 1.5282E+04; 1.7265E+04; Pfo 14
Row 15: Pe15 1.5289E+04; 1.7254E+04; Pfo 15
Row 17: Pe17 1.5301E+04; 1.7270E+04; Pfo 17
Highlighted cells: minimum Euclidean distance for face, ear, iris and foot.

Table 5.2 shows genuine score, imposter score and threshold value for face, ear, iris and foot images.

Table 5.2: Genuine Score and Imposter Score for face, ear, iris and foot images.

Trait | Genuine Score | Imposter Score | Threshold Value

5.2.2 FAR and FRR

Face, ear, iris and foot biometrics were tested individually, and the results of the individual modalities were calculated in terms of False Accept Rate (FAR) and False Reject Rate (FRR). For similarity scores, the false accept rate of a biometric system is the fraction of imposter scores exceeding the threshold, and the false reject rate is the fraction of genuine scores falling below the threshold; for distance scores, where a smaller value means a closer match, the two directions are reversed. Table 5.3 shows the values of FAR and FRR for the individual traits face, ear, iris and foot.

Table 5.3: FAR and FRR for Individual Traits.

Traits

5.2.3 EER

The Equal Error Rate (EER) is the point where the FAR equals the FRR. The EER is calculated using the FAR/FRR curve for each modality. Figure 5.19 shows the EER curves for all four modalities (face, ear, iris and foot), and the EER value of each modality is indicated. Table 5.4 shows the EER for the face, ear, iris and foot modalities.

(a) EER curve for face.
(b) EER curve for ear.
(c) EER curve for iris.
(d) EER curve for foot.

Fig. 5.19: EER (FAR/FRR) curves for various traits.


Table 5.4: EER for Individual Trait.

Traits

5.2.4 Calculation of Weight

The weight of each individual modality (face, ear, iris and foot) was calculated as 1/EER, as shown in Table 5.5.

Table 5.5: Weight for all Modalities.

Traits

5.2.5 Score Normalization

Face recognition and ear recognition produced similar scores, whereas the iris recognition and foot recognition algorithms produced dissimilarity scores. The Min-Max normalization technique was used to convert all dissimilar data into similar data. The upper and lower bounds of the matching scores for each modality are taken from its training data. Table 5.6 shows the normalized scores for all four modalities (face, ear, iris and foot).

Table 5.6: Normalized Score.

Traits | Normalized Score

After the normalized scores were calculated, the weights for fusion were computed. There are 6 possible fusion schemes combining two of the four modalities: (face + ear), (face + iris), (face + foot), (ear + iris), (ear + foot) and (iris + foot). Table 5.7 shows the weights assigned to each trait in all possible fusions of two traits. Table 5.8 shows the weights assigned to each trait in all possible fusions of three traits, and Table 5.9 shows the weights for the fusion of all four traits (face + ear + iris + foot). The matching scores of the combinations of traits are shown in Tables 5.10 to 5.12, where “+” denotes fusion.
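The set of fusion schemes can be enumerated mechanically, which is a convenient check that no combination is missed. The snippet below is only an illustration; the function name `fusion_schemes` is hypothetical.

```python
from itertools import combinations

TRAITS = ("face", "ear", "iris", "foot")

def fusion_schemes(traits=TRAITS):
    """All subsets of two or more traits, grouped by subset size."""
    return {k: list(combinations(traits, k)) for k in range(2, len(traits) + 1)}

schemes = fusion_schemes()
counts = {k: len(v) for k, v in schemes.items()}
```

For four traits this gives six pairs, four triples and one combination of all four, that is, eleven fusion schemes in total.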

Table 5.7: Weight for each Trait in all possible Fusion of two Traits.

Traits
Face + Ear
Face + Iris
Face + Foot
Iris + Foot
Iris + Ear
Ear + Foot

Table 5.8: Weight for each Trait in all possible Fusion of three Traits.

Traits
Face + Ear + Iris
Face + Iris + Foot
Ear + Iris + Foot
Face + Foot + Ear

Table 5.9: Weight for each Trait in all possible Fusion of four Traits.

Traits
Face + Ear + Iris + Foot

Table 5.10 shows all possible combinations of two traits and their matching scores. Table 5.11 shows the results for all possible combinations of three traits, and Table 5.12 shows the matching score for the combination of the four traits face, ear, iris and foot.

Table 5.10: Matching Score of combination of two Traits.

Traits
Face + Ear
Face + Iris
Face + Foot
Ear + Iris
Ear + Foot
Iris + Foot

Table 5.11: Matching Score of combination of three Traits.

Traits
Face + Ear + Iris
Face + Ear + Foot
Face + Foot + Iris
Ear + Iris + Foot

Table 5.12: Matching Score of combination of four Traits.

Traits
Face + Ear + Iris + Foot

The highest matching score was 0.11 when two modalities, ear and foot, were combined. A matching score of 0.19 was obtained when three modalities (face, ear and foot) were combined, and a matching score of 0.27 was obtained when all four modalities (face, iris, ear and foot) were combined. Table 5.13 illustrates the highest rank obtained with the logistic regression rank fusion approach. The larger the weight, the lower the performance. The weights are chosen by reviewing the previous results obtained by the different classifiers. ‘Person 5’ gets the top position in the reordered rank list, as can be seen in the table.


Table 5.13: Result of logistic regression method.

Logistic regression method weights: Face=0.24, Ear=0.26, Iris=0.25, Foot=0.24

Identities | Fused score | Reordered Rank
Person 1
Person 2
Person 3
Person 4
Person 5
Person 6
Person 7
Person 8
Person 9
Person 10

5.3 Summary

A multimodal biometric system was implemented and the images were tested in MATLAB. We developed a multimodal biometric system employing rank level fusion for four biometric modalities: face, ear, iris and foot. The performance has been evaluated using different classifier approaches, namely PCA for face, Eigen image for ear, a Hamming distance based approach for iris and sequential modified Haar transform for foot, based on the calculated weight of each individual biometric trait in terms of FAR, FRR and EER.

The implementation covered all possible combinations of the four modalities. The matching score was calculated for all combinations using the weights of the modalities and the normalized scores. The best result was obtained for two modalities (ear and foot) with a matching score of 0.11, for three modalities (face, ear and foot) with a matching score of 0.19, and a matching score of 0.27 was found for four modalities (face, iris, ear and foot). The logistic regression method has been used at the rank level.



6.1 Conclusions

In this research work, a multimodal biometric system has been developed that uses face, ear, iris and foot biometric traits. The weight of each biometric trait was calculated independently and applied to different classifier approaches: PCA for the face modality, Eigen image for the ear modality, a Hamming distance based technique for the iris modality and sequential modified Haar transform for the foot modality. The information from these four biometric identifiers was combined after normalization of the individual modalities using the Min-Max normalization technique. The normalized data was applied to a sum rule based fusion scheme over the four modalities (face, ear, iris and foot) and all possible combinations of modalities, and the matching score for every combination was calculated. The highest matching score was 0.11 when two modalities (ear and foot) were combined, 0.19 when three modalities (face, ear and foot) were combined, and 0.27 when all four modalities (face, ear, iris and foot) were combined. The recognition performance of the multimodal biometric system was greatly improved. The rank level logistic regression method was used over the self-created multimodal databases.

6.2 Future Scope

We used a different classifier approach for each trait. In future work, a single classifier could be used for the different modalities.