Face Recognition using Neural Network

Face Recognition using Neural Network

A THESIS SUBMITTED TO THE UNIVERSITY OF TECHNOLOGY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN ELECTRICAL ENGINEERING / COMMUNICATION AND COMPUTER ENGINEERING

BY

KHAMIS A. ZIDAN

SUPERVISED BY

Prof. Dr. MUNTHER N.
Dr. MUZHIR SH.

2003

DEDICATIONS

To my Father...... Mother...... Brothers, Sisters and my Family
The catalysts of my "Dreams"

To all those

......

For making it worthwhile

Khamis

Acknowledgment

I would like to express my deep appreciation and gratitude to my supervisors, Prof. Dr. Munther N. and Dr. Muzhir Sh., whose dedication to this work has been boundless. They worked to ensure the high quality of the material of this text, and it has been my good fortune to have their advice and guidance. I would like to thank the anonymous reviewers for helpful comments, and the staff and students of the Computer and Software Engineering Departments at Al-Mustansiriya University and the University of Technology for compiling and maintaining the database and for their kind assistance. I would also like to convey my appreciation to Karam Al-Ani, Rokaia Shalal, Laith Al-Rawi, Walaa Mohammed, Raghad Al-Macdici and Ali Hussein for the cooperation and assistance that helped to achieve this goal. Sincere appreciation is due to my family, for their patience, encouragement and help during the work. Finally, I would like to thank all the kind, helpful and lovely people who helped me, directly or indirectly, to complete this work, and to apologize for not being able to mention them by name except in the stillness of my heart.

Khamis

In the name of Allah, the Most Gracious, the Most Merciful

"... and He has taught you that which you did not know, and great is the favor of Allah upon you." (Qur'an 4:113)

Almighty Allah has spoken the truth

Abstract

Human faces are very similar in structure, with minor differences from person to person. Changes in lighting conditions, facial expressions, and pose variations further complicate the face recognition task. This thesis presents an efficient, low-complexity hybrid approach to face recognition, which combines image preprocessing based on histogram equalization, the wavelet transform, and multiple neural networks. Face preprocessing applies histogram equalization to improve contrast and compensate for differences in camera input gain. The preprocessed image is then compressed with the wavelet transform to reduce the number of input pixels, which speeds up the system and provides invariance to minor changes in the image samples. Multiple neural networks are trained to deal with the remaining variation (rotation, scale, and deformation). The outputs of the multiple recognizers are combined in a single decision unit, which decides the face class and retrieves the associated information already stored in the database. Owing to this arbitration structure, the system recognizes stimulus images correctly without being affected by shifts in position, rotation, scaling, or distortion in shape; it also recognizes images with changes in angle and expression. Only a small set of images per person in the training database is needed to produce acceptable classification accuracy. The results obtained show that the proposed system gives very encouraging performance. The proposed system was implemented as a software package using C++ and Visual Basic.

Keywords: Pattern Recognition, Face Detection, Human Face Recognition, Computer Vision, Feature Extraction, Artificial Neural Networks, Machine Learning, Pattern Classification, Multilayer Perceptrons, Statistical Classification.


List of Contents

Abstract
List of Abbreviations
List of Symbols
List of Figures
List of Tables

Chapter One: Overview
1.1 Introduction
1.2 Challenges in Face Recognition
1.2.1 Variation in the Image Plane
1.2.2 Pose Variation
1.2.3 Lighting and Texture Variation
1.2.4 Background Variation
1.2.5 Shape Variation
1.3 Literature Survey (Related Work)
1.3.1 Geometrical Feature Analysis
1.3.2 Eigenface Approach
1.3.3 Template Matching
1.3.4 Hidden Markov Model
1.3.5 Graph Matching
1.3.6 Neural Network Approaches
1.3.7 Automatic Face Processing Approach
1.4 Scope of the Work
1.5 Outline of the Thesis

Chapter Two: Theoretical Principles of Image Processing
2.1 Introduction
2.2 Computer Imaging System
2.3 Elements of Digital Image Processing System
2.3.1 Image Acquisition
2.3.2 Storage
2.3.3 Processing
2.4 Image Capturing by Camera
2.4.1 Mechanical Camera
2.4.2 Digital Camera
2.5 Image Representation
2.5.1 Binary Images
2.5.2 Gray Scale Images
2.5.3 Colour Images
2.5.4 Multispectral Images
2.6 Image File Formats
2.6.1 Bitmap Image File (BMP)
2.6.2 Joint Photographic Experts Group (JPEG)
2.6.3 Graphic Interchange Format (GIF)
2.6.4 Tag Image File Format (TIFF)
2.7 Image Processing
2.7.1 Image Smoothing
2.7.2 Image Sharpening
2.7.3 Image Enhancement
2.7.3.1 Enhancement by Histogram Modification
2.7.3.2 Image Histogram
2.7.4 Segmentation
2.7.4.1 Thresholding
2.7.4.2 Edge Finding
2.7.5 Image Compression
2.7.6 Fourier Transforms and Bandpass Filtering
2.8 The Discrete Wavelet Transform
2.9 Dilation
2.10 Modeling and Simulation
2.11 Digital Signal Processors
2.12 Programming Language

Chapter Three: Artificial Neural Networks Basic Concepts
3.1 Introduction
3.2 Definition of Artificial Neural Networks
3.3 Artificial Neuron Structure
3.4 Neural Model
3.5 Neural Network Learning
3.6 Structure of Connections
3.7 Neural Network Taxonomies
3.7.1 Neural Net Taxonomy Based on Training Strategies
3.7.2 Neural Net Taxonomy Based on Input Nature
3.7.3 Neural Net Taxonomy Based on Architectures
3.7.4 Neural Net Taxonomy Based on Building Element
3.7.4.1 Single Artificial Neuron (The Perceptron)
3.7.4.2 Single Layer Artificial Neural Networks
3.7.4.3 Multilayer Artificial Neural Networks
3.8 Reasons for Using Neural Networks
3.9 Limits in the Use of Neural Networks
3.10 Artificial Neural Network Applications
3.10.1 Characteristics of Suitable Applications
3.10.2 Methodology of Neural Network Applications
3.11 Convergence
3.12 Strengths and Limitations of Backpropagation
3.13 Network Size

Chapter Four: Proposed Face Recognition System Design
4.1 Introduction
4.2 Characteristics of the Implemented Images
4.3 Image Datasets (Databases)
4.4 Image Thresholding
4.5 Limitations and Research Direction of the Proposed Method
4.6 Characteristics of the Face Recognition System
4.7 Feature Selection and Classification
4.7.1 Training Phase
4.7.2 Test Phase (Retrieving Phase)
4.8 The Structure of the Proposed Face Recognition System
4.8.1 Background of Database Images
4.8.2 Preprocessing for Brightness (Lighting Variation) & Contrast
4.8.3 Face Preprocessing (Image Size Reduction)
4.8.3.1 Wavelet Transform
4.8.3.2 Advantages of Using the Wavelet Transform
4.8.4 Individual Face Recognition Networks
4.8.5 The Backpropagation Training Algorithm
4.8.5.1 Forward Propagation
4.8.5.2 Backward Propagation
4.8.6 Arbitration among Multiple Networks
4.8.7 Classification
4.9 Face Training Images
4.9.1 Training Mode
4.9.2 Testing Mode
4.9.3 Non-Face Training Images

Chapter Five: Simulation and Experimental Results
5.1 Introduction
5.2 System User Interface
5.3 Results of Testing Images
5.3.1 Test of Image with Glasses
5.3.2 Test of Image with Noise
5.3.3 Test of Image with Orientations (Pan, Tilt & Translate)
5.3.4 Test of Image with Intensity Variations
5.3.5 Test on Images with Open/Closed Eyes
5.4 Additional Experiments
5.4.1 Face Recognition under Controlled/Ideal Conditions and Size Variation
5.4.2 Face Recognition under Varying Lighting Conditions
5.4.3 Face Recognition under Facial Expression Changes
5.4.4 Face Recognition under Varying Poses
5.5 Experiments on Variation of System Components
5.5.1 Variation on the Number of Nodes for a Single Hidden Layer
5.5.2 Variation on the Number of Hidden Layers
5.5.3 Variation on the Number of Nodes for a Second Hidden Layer
5.5.4 Effect of Learning Rate and Momentum
5.5.5 Generalization of the Network
5.6 Recognition Results
5.6.1 Recognition Results with Variation of the Number of Training Images per Person
5.6.2 Recognition Results with Variation of the Number of Output Classes
5.6.3 Recognition Results with Arbitration among Networks
5.6.4 Recognition Approaches Based on the Different Databases

Chapter Six: Conclusions and Suggestions for Further Work
6.1 Conclusions
6.2 Suggestions for Further Work

References

List of Abbreviations

ANN    Artificial Neural Network
ANS    Artificial Neural Systems
BMP    Bitmap Image File
BPNN   Backpropagation Neural Network
CCD    Charge Coupled Device
D      Dilation
DCT    Discrete Cosine Transform
DIPP   Digital Image Processing Package
DSP    Digital Signal Processing
DWT    Discrete Wavelet Transform
GIF    Graphic Interchange Format
HH     Highpass-Highpass
HL     Highpass-Lowpass
HMM    Hidden Markov Model
HVS    Human Vision System
JPEG   Joint Photographic Experts Group
LEM    Line Edge Map
LH     Lowpass-Highpass
LL     Lowpass-Lowpass
LUT    Lookup Table
MLP    Multilayer Perceptron
NFL    Nearest Feature Line
ORL    Olivetti Oracle Research Lab
PCA    Principal Component Analysis
PDBNN  Probabilistic Decision-Based Neural Network
RGB    Red, Green and Blue
RMS    Root Mean Squared
SNR    Signal-to-Noise Ratio
SOFM   Self-Organizing Feature Map
SOM    Self-Organizing Map
TIFF   Tag Image File Format
VLSI   Very Large Scale Integration
WWW    World Wide Web

List of Symbols

L_Ij   Inverse low-pass output of wavelet function
H_Ij   Inverse high-pass output of wavelet function
(i,j)  Spatial coordinates of a pixel in an image
ai     Low-pass output of wavelet function
bi     High-pass output of wavelet function
ck     Wavelet coefficients
g(i,j) Thresholded function of image
H      Entropy
I(i,j) Two-dimensional light intensity function of image
N(b)   Number of pixels at gray level b
Nm     Number of nodes in the output layer
Nn     Number of nodes in the input layer
Np1    Number of nodes in the first hidden layer
Np2    Number of nodes in the second hidden layer
P(b)   First-order histogram probability
t      Output target vector
T      Threshold value for image
voj    Bias on hidden unit j
wok    Bias on output unit k
x      Input training vector
Xi     Input unit i
Yk     Output unit k
Zj     Hidden unit j
α      Learning rate
δ      Error correction weight adjustment
Δ      Update variation
θ      Threshold for the activation function
μ      Momentum parameter

List of Figures

Figure (1.1)  Schematic diagram of the main steps of the face recognition system developed in this thesis.
Figure (2.1)  Computer imaging system.
Figure (2.2)  A typical mechanical camera.
Figure (2.3)  (a) Original image. (b) Contrast stretched image. (c) Histogram equalized image.
Figure (2.4)  (a) Image to be thresholded. (b) Brightness histogram of the image. Pixels below the threshold (I(i,j) < T) will be labeled as object pixels; those above the threshold will be labeled as background pixels.
Figure (2.5)  Edge finding. (a) SNR = 30 dB. (b) SNR = 20 dB.
Figure (2.6)  An image called boy (320*320).
Figure (2.7)  boy decomposed into 10 subbands with the 2-dimensional wavelet transform for 3 levels.
Figure (2.8)  The notation for corresponding subbands.
Figure (2.9)  (a) The data structure for the corresponding subbands. (b) Tree structure.
Figure (2.10) Dilation of a sixteen-sample block of data.
Figure (3.1)  The biological neuron.
Figure (3.2)  The artificial neuron.
Figure (3.3)  Artificial neuron with activation function.
Figure (3.4)  Activation signal functions.
Figure (3.5)  Supervised and unsupervised learning.
Figure (3.6)  Neural networks taxonomy (based on input nature and training strategy).
Figure (3.7)  Neural networks taxonomy (based on network structure and training strategy).
Figure (3.8)  The perceptron.
Figure (3.9)  Single layer neural network.
Figure (3.10) Multilayer neural network.
Figure (4.1)  Example of first database face files.
Figure (4.2)  Example of second database face files.
Figure (4.3)  Example of third database face files. Continued: table of face image information of the third database. P: original face image number; P1-P9: face images for each person.
Figure (4.4)  Example of ORL database face files.
Figure (4.5)  Example of rejection database face files.
Figure (4.6)  Classification stages.
Figure (4.7)  Two phases of classification: (a) Training phase. (b) Testing phase.
Figure (4.8)  Block diagram of the proposed face recognition system.
Figure (4.9)  Sample results from preprocessing. (a) Original image. (b) Equalized image.
Figure (4.10) Histogram of original and equalized images. (a) Original image. (b) Histogram of image in (a). (c) Image after histogram equalization. (d) Histogram of equalized image in (c).
Figure (4.11) Reduction of image size. (a) Original image (160x160 pixels). (b) Compressed image (20x20 pixels).
Figure (4.12) Wavelet transform using Daubechies 2. (a) The input vector of 160*160 elements. (b) Distribution of the WT coefficients. (c) Reconstructed vector using 50% of the coefficients.
Figure (4.13) Location of frequency bands in a four-band wavelet-transformed image. Designation is row/column.
Figure (4.14) Wavelet transform.
Figure (4.15) The basic algorithm used for face recognition.
Figure (4.16) Flowchart of backpropagation neural network training steps.
Figure (4.17) Training and testing modes for the face recognition system.
Figure (4.18) Examples of original, rotated, non-face, and translated images. (a) & (f) Original images. (b), (c), (d) & (e) Translated images. (g) & (h) Rotated images. (i) & (j) Non-face images.
Figure (5.1)  Face recognition system user interface (continued over two pages). (a) Splash form. (b) Main form of recognized face image. (c) Main form of unrecognized face image (new face image). (d) Main form of unrecognized face image (rejected image).
Figure (5.2)  Flow chart of the DIPP of the system user interface with the processing steps of the proposed face recognition system.
Figure (5.3)  Test faces with and without glasses. (a) & (b) Training images. (c) & (d) Testing images.
Figure (5.4)  Test images with/without noise (continued over two pages). (a) Test image without noise. (b) Test image with grain noise (SNR = 27 dB). (c) Test image with grain noise after enhancement. (d) Histogram of original image. (e) Histogram of noisy image. (f) Histogram of enhanced image. (g) Test image with medium diffuse glow noise (SNR = 8 dB). (h) Test image with noise after enhancement. (i) Histogram of noisy image. (j) Histogram of enhanced image. (k) Test image with highly distorting diffuse glow noise (SNR = 5 dB). (l) Test image with noise after enhancement. (m) Histogram of noisy image. (n) Histogram of enhanced image.
Figure (5.5)  Test of face images with some rotations and translations. (a) Original test image. (b) Test image rotated 7 degrees clockwise. (c) Test image rotated 7 degrees counterclockwise. (d) Test image translated to the left. (e) Test image translated down. (f) Test image translated up.
Figure (5.6)  Test of face images with intensity variation. (a) Training image. (b), (c) and (d) Testing images.
Figure (5.7)  Test of face images for open/closed eyes. (a) Training images. (b) Testing images.
Figure (5.8)  An example pair of testing faces. The two faces were taken with a two-week interval.
Figure (5.9)  Sample cropped images of model and test faces under varying lighting. (a) Training images. (b) Testing image, left light on. (c) Testing image, right light on. (d) Testing image, both lights on.
Figure (5.10) Sample cropped faces used in the experiment under facial expression changes. (a) Training image. (b) Testing image of smiling expression. (c) Testing image of surprised expression. (d) Testing image of sleepy expression.
Figure (5.11) Sample faces used in the experiment under varying poses. (a) Training image. (b) Testing looking-right image. (c) Testing looking-left image. (d) Testing looking-up image. (e) Testing looking-down image.
Figure (5.12) Effect of α on maximum iterations needed and minimum error obtained when keeping μ constant.
Figure (5.13) Effect of μ on maximum iterations needed and minimum error obtained when keeping α constant.
Figure (5.14) The error rate as a function of the number of classes. The network is modified from that used for the 40-class case.

List of Tables

Table (2.1)  The BMP file format.
Table (2.2)  Coefficients for three named wavelet functions.
Table (3.1)  Current application areas for different neural networks.
Table (5.1)  Recognition results under varying lighting.
Table (5.2)  Recognition results under different facial expressions.
Table (5.3)  Face recognition results under different pose variations.
Table (5.4)  Accuracy of the face recognition system with varying number of nodes for the single hidden layer: five test images per person.
Table (5.5)  Accuracy of the face recognition system with varying number of hidden layers in the network subnets: five test images per person.
Table (5.6)  Accuracy of the face recognition system with varying number of nodes for the second hidden layer in the network subnets: five test images per person.
Table (5.7)  Results of the proposed system for each stage.
Table (5.8)  Error rate as the size of the training set is varied from 1 to 5 images per person, averaged over two different selections of the training and test sets.
Table (5.9)  Error rate of the face recognition system with varying number of classes (subjects).
Table (5.10) Recognition rate with varying arbitration output.
Table (5.11) Performance comparison of recognition applied to the different databases.

Chapter One
Overview

1.1 Introduction

Computerized human face recognition has been an active research area in recent years. It has a wide range of military and civilian applications, such as identity authentication, access control, digital libraries, bankcard identification, mug shot searching, and surveillance systems [1,2,3]. A general statement of the face recognition problem can be formulated as follows: identify one or more persons in a still image or a video image sequence of a scene by comparing input faces with faces stored in a database [4,5]. The solution of the problem involves segmentation of faces from cluttered scenes, extraction of features from the face region, identification, and matching. Face recognition problems and techniques can be separated into two groups [6]: dynamic (video) and static matching. Dynamic matching is used when a video sequence is available. The video images tend to be of low quality, the background is very cluttered, and more than one face may be present in the picture. However, since a video sequence is available, one can use motion as a strong cue for segmenting the faces of moving persons. Static matching uses images with typically reasonably controlled illumination, background, resolution, and distance between the camera and the person; some of the images in this group can be acquired from a video camera. Mug shot matching is the most common application in the static matching group. Typically, in mug shot photographs one frontal and one or more side views of a person's face are taken. Profile images provide a detailed structure of the face that is not seen in frontal images. Face recognition from profiles


concentrates on locating points of interest called fiducial points; recognition involves the determination of relationships among these fiducial points. The applications of face recognition technology can be categorized into two main areas [1]: law enforcement and commercial. Face recognition technology is primarily used in law enforcement applications, especially mug shot albums (static matching) and video surveillance (real-time matching of video image sequences). The commercial applications range from static matching of photographs on credit cards, passports, driver's licenses, and photo identification to real-time matching with still images or video image sequences for access control. Each presents different constraints in terms of processing requirements.

1.2 Challenges in Face Recognition

In face images, the variation caused by viewpoint, illumination, and expression is almost always larger than the image variation due to a change in face identity, and a capable face recognition system must cope with this [4,7]. This makes face recognition a great challenge. Two issues are central: first, what features should be used so that the representation can deal with variation of face images in viewpoint, illumination, and expression; second, how to classify a new face image based on the chosen representation. There are many sources of variability in the face recognition problem, as follows:

1.2.1 Variation in the Image Plane
The simplest type of variability of images of a face can be expressed independently of the face itself, by rotating, translating, scaling, and mirroring its image. Also included in this category are changes in the overall brightness and contrast of the image, and occlusion by other objects.

1.2.2 Pose Variation
Some aspects of the pose of a face are included in image plane variations, such as rotation and translation. Rotations of the face that are not in the image plane can have a larger impact on its appearance. Another

source of variation is the distance of the face from the camera, changes in which can result in perspective distortion.

1.2.3 Lighting and Texture Variation
The variations described up to now are due to the position and orientation of the object with respect to the camera. Lighting and texture variations are caused by the face and its environment, specifically the face's surface properties and the light sources. Changes in the light source in particular can radically change a face's appearance.

1.2.4 Background Variation
This thesis suggests that for profile faces, the border of the face itself is the most important feature, and its shape varies from person to person. Thus the boundary is not predictable, so the background cannot simply be masked off and ignored. A variety of different backgrounds can be seen in the examples.

1.2.5 Shape Variation
A final source of variation is the shape of the face itself. This type of variation includes facial expressions, whether the mouth and eyes are open or closed, the presence or absence of glasses, and the shape of the individual's face.

1.3 Literature Survey (Related Work)

Face recognition has been actively studied [8,9], particularly over the last few years. The research effort has focused on the subproblem of frontal face recognition with limited variance in illumination and facial expression. A literature review of face recognition techniques is given in this section; some are based on traditional methods, while others use neural networks. The most successful approaches to frontal face recognition, namely geometrical feature matching, eigenfaces, template matching, hidden Markov models, graph matching, neural networks, and automatic face processing, are discussed.


1.3.1 Geometrical Feature Analysis

Geometrical feature matching is based on the computation of a set of geometrical features from the picture of a face; the measured distances between features may be most useful for finding possible matches in a large database such as a mug shot album. A vector representing the position and size of the main facial features, such as the eyes and eyebrows, nose, mouth, and the shape of the face outline, can describe the overall configuration (see the sketch below). Goldstein et al. [10] showed that a face recognition program provided with manually extracted features could perform recognition with apparently satisfactory results. Brunelli and Poggio [11] computed a set of geometrical features such as nose width and length, mouth position, and chin shape; they report a 90% recognition rate on a database of 47 persons. Cox et al. [12] introduced a mixture-distance technique which achieves a recognition rate of 90% using a query database of 95 images. In [13], Bartlett and Sejnowski presented a method in which facial features such as the eyes, nose, mouth, and chin are detected; properties of and relations between the features (e.g. areas, distances, angles) are used as the descriptions of faces for recognition. Although such features are economical, efficient in achieving data reduction, and insensitive to variations in illumination and viewpoint, they rely heavily on the accurate extraction of the facial features. One of the pioneering works on automated face recognition using geometrical features is by Kanade [1]; the reported recognition rate is between 45% and 75% on a database of 20 persons. The algorithms for automatic location of feature points do not provide a high degree of accuracy and require considerable computational capacity.
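To make the idea concrete, the following C++ sketch builds such a vector of pairwise distances from a set of facial landmark points. It is an illustration only, under the assumption that landmarks (e.g. eye centres, nose tip, mouth corners) have already been located; it is not code from any of the systems cited above.

#include <vector>
#include <cmath>
#include <cstddef>

struct Point { double x, y; };

// Build a geometric feature vector from facial landmark points: all
// pairwise Euclidean distances, normalized by the first distance so the
// vector is invariant to overall face scale.
std::vector<double> geometricFeatures(const std::vector<Point>& landmarks)
{
    std::vector<double> features;
    for (std::size_t a = 0; a < landmarks.size(); ++a)
        for (std::size_t b = a + 1; b < landmarks.size(); ++b)
            features.push_back(std::hypot(landmarks[a].x - landmarks[b].x,
                                          landmarks[a].y - landmarks[b].y));
    if (!features.empty() && features[0] > 0.0)
        for (double& f : features) f /= features[0];
    return features;
}

Two faces can then be compared by the distance between their feature vectors, which is what makes this representation suitable for searching a large database.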


1.3.2 Eigenface Approach

The eigenface approach is one of the most thoroughly investigated approaches to face recognition. It is also known as the Karhunen-Loeve expansion, eigenpicture, eigenvector, or principal component approach. Sirovich and Kirby [14] and Kirby et al. [15] used Principal Component Analysis (PCA) to efficiently represent pictures of faces. They argued that any face image can be approximately reconstructed from a standard face picture (eigenpicture) and a small collection of weights for each face; the weights describing each face are obtained by projecting the face image onto the eigenpictures. Turk and Pentland [16] presented a face recognition scheme in which face images are projected onto the principal components of the original set of training images; the resulting eigenface weights are classified by comparison with those of known individuals. Turk and Pentland presented results on a database of 16 subjects with various head orientations, scaling, and lighting; their images otherwise appeared identical, with little variation in facial expression, facial details, pose, etc. For lighting, orientation, and scale variation their system achieves 96%, 85%, and 64% correct classification, respectively. In Pentland et al. [17] good results are reported on a large database (95% recognition of 200 persons). It is difficult to draw broad conclusions, as many of the images of the same people look very similar, and the database has accurate registration and alignment [18,19]. Nevertheless, the eigenface approach is a fast, simple, and practical algorithm. It may be limited, however, because optimal performance requires a high degree of correlation between the pixel intensities of the training and test images; this limitation has been addressed by using extensive preprocessing to normalize the images [16].
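As a concrete illustration of the projection and classification steps described above, the C++ sketch below reduces a face image to a small weight vector and classifies it by the nearest stored weight vector. It assumes the mean face and eigenfaces have already been computed offline by PCA; all names are illustrative, not taken from the cited papers.

#include <vector>
#include <cstddef>
#include <limits>

// Project a face image onto precomputed eigenfaces: the weight for each
// eigenface is the dot product of the mean-subtracted image with it.
std::vector<double> projectOntoEigenfaces(
    const std::vector<double>& image,                   // flattened face, N pixels
    const std::vector<double>& meanFace,                // mean of training faces
    const std::vector<std::vector<double>>& eigenfaces) // M eigenfaces, N pixels each
{
    std::vector<double> weights(eigenfaces.size(), 0.0);
    for (std::size_t k = 0; k < eigenfaces.size(); ++k)
        for (std::size_t i = 0; i < image.size(); ++i)
            weights[k] += eigenfaces[k][i] * (image[i] - meanFace[i]);
    return weights;
}

// Return the index of the gallery face whose weight vector is closest
// (squared Euclidean distance) to the query's weight vector.
std::size_t nearestFace(const std::vector<double>& query,
                        const std::vector<std::vector<double>>& gallery)
{
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t j = 0; j < gallery.size(); ++j) {
        double d = 0.0;
        for (std::size_t i = 0; i < query.size(); ++i) {
            const double diff = query[i] - gallery[j][i];
            d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = j; }
    }
    return best;
}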


1.3.3 Template Matching

Template matching methods operate by performing direct correlation of image segments. Template matching is only effective when the query images have the same scale, orientation, and illumination as the training images. In a simple version of template matching, a test image, represented as a two-dimensional array of intensity values, is compared using a suitable metric, such as the Euclidean distance, with a single template representing the whole face. There are several more sophisticated versions of template matching for face recognition. One can use more than one face template from different viewpoints to represent an individual's face. A face from a single viewpoint can also be represented by a set of multiple distinctive smaller templates, and the gray-level face image may be suitably processed before matching. In [11], Brunelli and Poggio automatically selected a set of four feature templates, i.e. the eyes, nose, mouth, and the whole face, for all of the available faces. They compared the performance of their geometrical matching algorithm and template matching algorithm on the same database of faces; template matching was superior in recognition to geometrical matching, and simpler. Since the principal components (also known as eigenfaces or eigenfeatures) are linear combinations of the templates in the database, that technique cannot achieve better results than correlation [11], but it may be less computationally expensive. Li and Lu [8] presented a method for face recognition called the Nearest Feature Line (NFL). The feature line (FL) passing through two feature points of the same class (person) generalizes them; the derived FL can capture more variation in face images than the original points and thus expands the capacity of the available database. Classification is based on the nearest distance from the query feature point to each FL. Face recognition using profile images, based on the representation of the original and morphologically


derived profile shapes, is presented in [6]. From the profile shapes, feature vectors are obtained using distances between outline curve points and the shape centroid. After normalizing the vector components, the Euclidean distance measure is used for measuring the similarity of the feature vectors derived from different profiles. Spies and Ricketts [20] described a simple face recognition system based on an analysis of faces via their Fourier spectra, in which facial information is analyzed in terms of spatial frequencies. Faces contain both featural and configural properties that coexist; these properties are not conveyed by the same spatial frequency ranges and are not equally useful, depending on the nature of the processes involved in performing a particular task. A low frequency representation provides information concerning facial configuration (i.e. general shape, outer contour, and hairline of a face) but does not provide a detailed representation of the individual features. As a consequence, categorization tasks may be accurately achieved by processing only the low frequency range of a face image. In contrast, because recognition and identification tasks require finer information than a simple categorization task, performance for these tasks can be improved by the addition of a high frequency range; high frequencies carry information concerning the inner features of a face (e.g. the specific shape of the eyes, nose, and mouth). The recognition system developed simply takes the values of a few selected frequencies in Fourier space as features. The resulting feature vectors are compared to each other using the Euclidean distance, and the closest match is selected by applying a threshold on the distance between the two nearest vectors; however, this results in severe performance penalties. Gao and Leung [1] presented the Line Edge Map (LEM), which uses edge-based face coding and line matching techniques to integrate geometrical and structural features into template matching. This method demonstrates that the LEM, together


with a generic line segment and distance measure, provides a new way of coding and recognizing faces. The earliest methods for template matching were correlation based; they were computationally very expensive and required a great amount of storage. A further problem lies in the description of the templates: since the recognition system has to be tolerant to certain discrepancies between the template and the test image, this tolerance might average out the differences that make individual faces unique. In general, template-based approaches are a more logical approach than feature matching.
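As a minimal illustration of correlation-based matching (a generic sketch, not the algorithm of any particular paper cited above), the following C++ function computes the normalized cross-correlation between a test image and a whole-face template of the same size; the stored template with the highest score would identify the person.

#include <vector>
#include <cmath>
#include <cstddef>

// Normalized cross-correlation between a test image and a template of the
// same size; both are flattened grayscale arrays and assumed non-empty.
// Returns a score in [-1, 1]; higher means a better match.
double normalizedCrossCorrelation(const std::vector<double>& image,
                                  const std::vector<double>& templ)
{
    const std::size_t n = image.size();
    double meanI = 0.0, meanT = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanI += image[i]; meanT += templ[i]; }
    meanI /= n; meanT /= n;

    double num = 0.0, varI = 0.0, varT = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double di = image[i] - meanI;
        const double dt = templ[i] - meanT;
        num  += di * dt;
        varI += di * di;
        varT += dt * dt;
    }
    return num / (std::sqrt(varI * varT) + 1e-12); // guard against flat images
}

Subtracting the means and dividing by the standard deviations is what gives the tolerance to overall brightness and contrast changes mentioned in section 1.2.1.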

1.3.4 Hidden Markov Model

Stochastic modeling of nonstationary vector time series based on Hidden Markov Models (HMMs) has been very successful for speech applications, and Samaria and Fallside [21] applied this method to human face recognition. Faces are intuitively divided into regions such as the eyes, nose, mouth, etc., which can be associated with the states of a hidden Markov model. Since HMMs require a one-dimensional observation sequence and images are two-dimensional, the images must be converted into either 1-D temporal sequences or 1-D spatial sequences. A spatial observation sequence is extracted from a face image using a band sampling technique: each face image is represented by a 1-D vector series of pixel observations. The sequence is then matched against every HMM in the model face database (each HMM represents a different subject); the highest match score is considered the best match, and the corresponding model reveals the identity of the test face. Classification time and training time were not given (both are believed to be very expensive).
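The band sampling step can be sketched as follows, assuming a flattened grayscale image; the band height and step are illustrative parameters, not values taken from [21].

#include <vector>
#include <cstdint>
#include <cstddef>

// Band sampling: convert a 2-D face image into a 1-D spatial observation
// sequence by sliding a horizontal band of bandHeight rows down the image
// with the given vertical step; each band becomes one observation vector.
std::vector<std::vector<std::uint8_t>> bandSample(
    const std::vector<std::uint8_t>& img, std::size_t w, std::size_t h,
    std::size_t bandHeight, std::size_t step)
{
    std::vector<std::vector<std::uint8_t>> sequence;
    for (std::size_t top = 0; top + bandHeight <= h; top += step)
        sequence.emplace_back(img.begin() + top * w,
                              img.begin() + (top + bandHeight) * w);
    return sequence;
}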


1.3.5 Graph Matching

Graph matching is another approach to face recognition. Lades et al. [22] presented a dynamic link structure for distortion-invariant object recognition, which employs elastic graph matching to find the closest stored graph. The dynamic link architecture is an extension of classical artificial neural networks. Memorized objects are represented by sparse graphs whose vertices are labeled with a multi-resolution description in terms of a local power spectrum, and whose edges are labeled with geometrical distance vectors. Object recognition can then be formulated as elastic graph matching, which is performed by stochastic optimization of a matching cost function. In general, the dynamic link architecture is good in terms of rotation invariance; however, the matching process is computationally expensive.

1.3.6 Neural Network Approaches

Neural networks have been studied intensively in the past few years and have become one of the major approaches to face recognition, resulting in well-established techniques for using artificial neural networks in recognition systems [23]. This approach, theoretically, leads to an increased ability to identify faces in difficult conditions. As with all primary technologies, neural network facial recognition can do 1-to-1 or 1-to-many matching [24]; it is therefore perhaps the most widely utilized face recognition technology. Neural methods [24] generally operate directly on an image-based representation (i.e. a pixel intensity array); this class of methods has been more practical and reliable compared to geometric feature-based methods. The attractiveness of using neural networks also stems from the nonlinearity in the network, which can make the feature extraction step more efficient than the linear Karhunen-Loeve methods. One of the first artificial neural network (ANN) techniques used for face recognition was a single layer network containing a separate network for each stored individual [25]. The


way a neural network structure is constructed is crucial for successful recognition and depends very much on the intended application. Weng and Huang [26] presented a multi-resolution pyramid structure for face verification. Jahren et al. [27] presented a neural network-based system in which a small window of an image is examined to find and identify a face. The problem of facial recognition from gray-scale images is approached using a two-stage neural network implemented in software: the first net finds the eyes of a person, and the second neural network uses an image of the area around the eyes to identify the person. Lawrence et al. [2] proposed a hybrid neural network which combines local image sampling, a self-organizing map (SOM) neural network, and a convolutional neural network. The SOM provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample. The convolutional network extracts successively larger features in a hierarchical set of layers and provides partial invariance to translation, rotation, scale, and deformation. It achieved the lowest error rate reported to date for the Olivetti Oracle Research Lab (ORL) database. Lin et al. [28] used a probabilistic decision-based neural network (PDBNN), which can be applied effectively as (1) a face detector, which finds the location of a human face in a cluttered image; (2) an eye localizer, which determines the positions of both eyes in order to generate meaningful feature vectors; and (3) a face recognizer. Recognition of low-resolution facial images utilizing a Hopfield network combined with pattern matching is presented in [7]. In this method, a Hopfield memory model is organized and the optimal procedure for unlearning is determined. Based on the composed Hopfield memory model, the


relation between the reliability of the recalls and the number of faces memorized in the Hopfield memory is analyzed. Pan and Bolouri [29] used the discrete cosine transform (DCT) to reduce image information redundancy, since only a subset of the transform coefficients is necessary to preserve the most important facial features, such as the hair outline, eyes, and mouth; the DCT coefficients are then fed into a backpropagation neural network for classification. Ibrahem [30] presented two computer-program systems to recognize human faces: one is concerned with recognition of persons from their side view (profile image), while the other is concerned with recognition from the frontal view. A simple memory-based technique for appearance-based face recognition, motivated by the real-world task of visitor identification, that can outperform more sophisticated algorithms using principal component analysis (PCA) and neural networks is presented in [31,32]. Irzoqi [33] presented a method for face recognition that combines local image sampling, a Self-Organizing Feature Map (SOFM) neural network, and a Multilayer Perceptron (MLP) neural network for classification. Dawood [34] presented a modified image recognition system based on the neocognitron model for feature extraction, with a multilayer feedforward network to associate the features with their labeled recognition codes. As for neural network recognition systems in general, due to the difficulty of selecting a representation that captures features robustly, most approaches avoid the feature extraction procedure by feeding the pixel images directly to the neural networks and making use of the ability of neural networks as information processing tools. Nevertheless, Lawrence et al. applied a SOM as a feature extractor, and the generated features were then exploited as the input of a convolutional neural network for recognition, an architecture quite similar to the


neocognitron. Training either the SOM or the convolutional neural network is tremendously computationally expensive.

1.3.7 Automatic Face Processing Approach

Automatic face processing is a more basic technology, using distances and distance ratios between easily acquired features such as the eyes, the end of the nose, and the corners of the mouth. Though overall it is not as robust as eigenfaces, feature analysis, or neural networks, it may be more effective in dimly lit, frontal image capture situations [5,35].

In summary, no existing technique is free from limitations. Further efforts are required to improve the performance of face recognition techniques, especially in the wide range of environments encountered in the real world. Neural networks are one of the most thoroughly investigated approaches and have demonstrated excellent performance; hence, this technique is used in this thesis as a baseline for recognition performance comparison.

1.4 Scope of the Work

The goal of this thesis is to develop an efficient and accurate face recognition system implemented with artificial neural networks. Given a new picture of a human face (in frontal view), the system should identify it against a database of faces "learned" in the past: it compares the image with a collection of images of known individuals in the database and reports to whom it belongs. In developing a view-based face recognition system that uses machine learning, three main subproblems arise. First, images of faces vary considerably with lighting, occlusion, pose, facial expression, and identity; the recognition algorithm should explicitly deal with as many of these sources of variation as possible. To further reduce variation caused by lighting or camera

differences, the images are preprocessed with histogram equalization to improve the overall brightness and contrast. Second, one or more neural networks must be trained to deal with all remaining variation in distinguishing faces. Third, the outputs from multiple recognizers must be combined into a single decision about the face. Feeding the original bitmap face images to the neural network also poses a dimensionality problem, which is expensive in terms of the time and space complexity of the network; it is solved by reducing the dimensionality of the input space while extracting only the most relevant information, in this case by using the wavelet transform. A high-level block diagram of the face recognition system is shown in figure (1.1), which shows a breakdown of the various subsystems.

Input Image → Histogram Equalization → Dimensionality Reduction → Neural Network → Face Recognition Output

Figure (1.1): Schematic diagram of the main steps of the face recognition system developed in this thesis.
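The first two stages of figure (1.1) can be sketched in C++ as follows. This is an illustrative sketch only: it assumes 8-bit grayscale images stored as non-empty flat arrays, and it uses a single Haar-style averaging level for the reduction stage, whereas the thesis package itself uses a Daubechies wavelet. Applying the reduction three times takes a 160x160 image down to 20x20, as in chapter 4.

#include <vector>
#include <cstdint>
#include <cstddef>

// Histogram equalization of an 8-bit grayscale image: remap each gray
// level through the normalized cumulative histogram.
std::vector<std::uint8_t> equalize(const std::vector<std::uint8_t>& img)
{
    std::size_t hist[256] = {0};
    for (std::uint8_t p : img) ++hist[p];

    std::uint8_t lut[256];
    std::size_t cum = 0;
    for (int b = 0; b < 256; ++b) {
        cum += hist[b];
        lut[b] = static_cast<std::uint8_t>((255 * cum) / img.size());
    }

    std::vector<std::uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i) out[i] = lut[img[i]];
    return out;
}

// One level of Haar-style reduction: each output pixel is the average of
// a 2x2 block, i.e. the low-low (LL) subband of a one-level decomposition.
std::vector<std::uint8_t> reduceLL(const std::vector<std::uint8_t>& img,
                                   std::size_t w, std::size_t h)
{
    std::vector<std::uint8_t> out((w / 2) * (h / 2));
    for (std::size_t y = 0; y + 1 < h; y += 2)
        for (std::size_t x = 0; x + 1 < w; x += 2) {
            const unsigned sum = img[y * w + x] + img[y * w + x + 1]
                               + img[(y + 1) * w + x] + img[(y + 1) * w + x + 1];
            out[(y / 2) * (w / 2) + x / 2] = static_cast<std::uint8_t>(sum / 4);
        }
    return out;
}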

The work in this thesis concentrates on the face recognition problem, incorporating recognition techniques that deal with changes in facial details: expression (smiling/non-smiling, opened/closed eyes), details (with/without glasses and slight changes in hair style), and lighting conditions. The thesis provides a rigorous analysis of the accuracy of the algorithms developed. A number of test sets were used, with images collected from a variety of sources, including the World Wide Web, scanned photographs, and a digital camera.


Each test set is designed to test one aspect of the algorithm, including the ability to recognize faces in cluttered backgrounds, the ability to recognize a wide variety of faces of different persons, and the recognition of faces in different poses. The net is therefore trained to achieve a balance between the ability to respond correctly to the input patterns that are used for training (memorization) and the ability to give reasonable (good) responses to input that is similar, but not identical, to that used in training (generalization).

1.5 Outline of the Thesis

The remainder of the thesis is organized as follows. Chapter 2 covers basic image processing operations, discusses histogram equalization for correcting the overall lighting of a face to remove variation caused by lighting and camera parameters, and presents the principles of the wavelet transform. Chapter 3 reviews the concepts of neural networks and discusses some of their basic theoretical principles. Chapter 4 describes the implementation of the proposed face recognition system design with the database, which is limited to upright, frontal faces; the system is evaluated over several large test sets. Chapter 5 gives the results of the proposed face recognition system, together with a discussion of these results, relates them to other work in the face recognition domain, and presents comparisons of the accuracy of the algorithms when applied to the same test sets. The resulting system is evaluated over the same test sets as the upright case, as well as a new test set specifically for tilted faces.


Chapter 5 also examines some techniques for speeding up the face recognition algorithms, including the use of a fast and accurate selection transform. Chapter 6 summarizes the conclusions and contributions of the thesis and points out directions for future work.


Chapter Two
Theoretical Principles of Image Processing

2.1 Introduction

An image or picture produced by a camera or a non-photographic imaging sensor is a flat representation of a scene whose visual properties, such as brightness and colour, vary from point to point. When a picture is digitized, it is represented by a regularly spaced array of samples of the picture function. These samples are quantized so that the numbers take on a discrete set of possible values, generally integers. The elements of a digital picture array are called pixels, and the pixel value is termed the gray level [36]. Image processing has become a common technique for making images more comprehensible to the human eye. In recent years image processing techniques have become so widely used that they are now routinely built into common everyday appliances: scanners and fax machines regularly do contrast and brightness enhancement and balancing, and many digital cameras and video cameras use techniques such as the Fourier transform in order to automatically obtain focal lengths [9]. Image processing is primarily a set of techniques for making images more meaningful to humans, as opposed to computers (many computer techniques work best with the original, unaltered images). Image processing works by removing features of an image that confuse the human eye, and enhancing other features that make the underlying structure of the image recognizable. This is often performed by attempting to understand the physical process that causes the imperfections in the image and then, by mathematically reversing their effect,


returning to a closer approximation of the 'ideal', defect-free image. Image processing is now a standard procedure in high-resolution optical microscopy.

2.2 Computer Imaging System

Computer imaging systems comprise two primary component types: hardware and software. The hardware components, as illustrated in figure (2.1), can be divided into the image acquisition system, the computer itself, and the display device. The software allows manipulation of the image and performs any desired processing on the image data; additionally, software may be used to control the image acquisition and storage process [37]. Referring to figure (2.1), the computer system may be a general-purpose computer with a frame grabber, or image digitizer, board in it. The frame grabber is a special-purpose piece of hardware that accepts a standard video signal and outputs an image in a form that a computer can understand, called a digital image [38]. The process of transforming a standard video signal into a digital image is called digitization. This transformation is necessary because the standard video signal is in analog form and the computer requires a digitized or sampled version of that analog form [37].

(Figure components: Image → Digitizer → Computer System → Display)

Figure (2.1): Computer imaging system.

2.3 Elements of Digital Image Processing System

A digital image processing system generally consists of the following elements:

2.3.1 Image Acquisition

Two elements are required to acquire digital images. The first is a physical device that is sensitive to a band in the electromagnetic energy spectrum (such as the x-ray, ultraviolet, visible, or infrared bands) and that produces an electrical signal output proportional to the level of energy sensed. The second, called a digitizer, is a device for converting the electrical output of the physical sensing device into digital form [39]. As an example, a major sensor category deals with visible and infrared light; among the devices most frequently used for this purpose are vidicon cameras and photosensitive solid-state arrays. Some devices require that the image to be digitized is in the form of a transparency. Vidicon cameras and solid-state arrays can accept images recorded in this manner, and they can also digitize natural images that have sufficient light intensity to excite the detector [39].

2.3.2 Storage

Storage for digital image processing applications falls into three principal categories: (1) short-time storage for use during processing, (2) on-line storage for relatively fast recall, and (3) archival storage [39]. One method of providing short-time storage is the memory of the computer itself, which may be a general-purpose computer with an image digitizer. On-line storage generally takes the form of magnetic disks (3.5 and 5.25 inch). Archival storage is characterized by massive storage requirements but infrequent need for access; magnetic tapes and compact disks are the usual media for archival applications.


2.3.3 Processing

Processing of digital images involves procedures that are usually expressed in algorithmic form; thus, with the exception of image acquisition and display, most image processing functions can be implemented in software. The only reason for specialized image processing hardware is the need for speed in some applications, or to overcome some fundamental computer limitation [39]. The most obvious limitation of image processing is that it cannot reveal details that are not in some way present in the original picture. Further difficulties occur if the techniques are misused: if images are over-processed, it is possible to produce artifacts that are not present in the original image, by amplifying random noise until it becomes a feature of the image [4]. For these reasons it is important to always have the original image available when viewing the results of image processing, so as to be able to compare the processed result with its base image.

2.4 Image Capturing by Camera

There are two types of image capture by camera, as follows:

2.4.1 Mechanical Camera

Standard mechanical cameras, as well as TV cameras, are modeled after the human eye. A simple mechanical camera is shown in figure (2.2). The aperture setting governs the amount of light entering the camera, as the iris does in the human eye. The camera lens is moved back and forth to focus the image on the film. Light striking the camera film causes a chemical change within the film, which can subsequently be developed to reproduce the image [40].


(Figure labels: object, aperture, iris diaphragm, lens, focus, film)

Figure (2.2): A typical mechanical camera.

2.4.2 Digital Camera

Digital technology replaces the film of the mechanical camera by an integrated light detector (a CCD chip: Charge Coupled Device), a microelectronic unit based on semiconductor technology. This chip may be regarded as a large number of individual light detectors arranged like the squares of a chessboard. The individual detectors are called the pixels of the CCD, and a typical number might be 1000 x 1000, a spatial resolution comparable to, or exceeding, television quality [40]. CCD cameras have become extremely popular as imaging devices due to their high sensitivity over a wide light spectrum, low power consumption, and small size. Many applications use CCDs, such as digital cameras and medical imaging systems.

2.5 Image Representation

The human visual system receives an input image as a collection of spatially distributed light energy; this form is called an optical image. Optical images are

represented as video information in the form of analog electrical signals that are sampled to generate a digital image I(i,j). A digital image I(i,j) is represented as a two-dimensional array of data, where each pixel value corresponds to the brightness of the image at the point (i,j). In linear algebra terms, a 2-D array like our image model I(i,j) is referred to as a matrix, and one row or column is called a vector. This image model is for monochrome (one-colour, normally referred to as black and white) image data, but there are other types of image data that require extensions or modifications to this model. Typically, these are multiband images (colour, multispectral), and they can be modeled by a different I(i,j) function corresponding to each separate band of brightness information [36,37,41].
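The matrix view of I(i,j) maps directly onto a simple container type. The Image struct below is only a sketch of this representation, not code from the thesis package.

#include <vector>
#include <cstdint>
#include <cstddef>

// A monochrome digital image I(i,j): a 2-D array of brightness values
// stored row by row in a flat vector.
struct Image {
    std::size_t width = 0;
    std::size_t height = 0;
    std::vector<std::uint8_t> pixels;   // height * width gray levels

    Image(std::size_t w, std::size_t h) : width(w), height(h), pixels(w * h, 0) {}

    // I(i,j): brightness at row i, column j.
    std::uint8_t& at(std::size_t i, std::size_t j)       { return pixels[i * width + j]; }
    std::uint8_t  at(std::size_t i, std::size_t j) const { return pixels[i * width + j]; }
};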

2.5.1 Binary Images

Binary images are the simplest type of images; their pixels can take on two values, typically black and white, or 0 and 1. A binary image is referred to as a 1-bit/pixel image because it takes only one binary digit to represent each pixel. These types of images are most frequently used in computer vision applications where the only information required for the task is general shape or outline information. Binary images are often created from gray scale images via a threshold operation, where every pixel above the threshold value is turned white (1) and those below it are turned black (0).
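Using the Image type sketched in the previous section, the threshold operation is a single pass over the pixels (illustrative only; in practice the threshold value T would be chosen from the image histogram, as discussed later in this chapter).

// Produce a binary image: pixels above the threshold become white (1),
// pixels at or below it become black (0).
Image thresholdImage(const Image& src, std::uint8_t t)
{
    Image out(src.width, src.height);
    for (std::size_t k = 0; k < src.pixels.size(); ++k)
        out.pixels[k] = (src.pixels[k] > t) ? 1 : 0;
    return out;
}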

2.5.2 Gray Scale Images

Gray scale images are referred to as monochrome, or one-colour, images. They contain brightness information only, no colour information. The number of bits used for each pixel determines the number of different brightness levels available. A typical image contains 8 bits/pixel of data, which gives rise to 256 (0-255) different brightness (gray) levels. The 8-bit representation is typical due to


the fact that the byte, which corresponds to 8 bits of data, is the standard small unit in the world of digital computers.

2.5.3 Colour Images
Colour images can be modeled as three-band monochrome image data, where each band corresponds to a different colour. The actual information stored in the digital image data is the brightness information in each spectral band. When the image is displayed, the corresponding brightness information is displayed on the screen by picture elements that emit light energy corresponding to that particular colour. Typical colour images are represented as red, green, and blue, or RGB images. Using the 8-bit monochrome standard as a model, the corresponding colour image would have 24 bits/pixel: 8 bits for each of the three colour bands (red, green, and blue).

2.5.4 Multispectral Images
Multispectral images typically contain information outside the normal human perceptual range. This may include infrared, ultraviolet, X-ray, acoustic, or radar data. These are not images in the usual sense because the information represented is not directly visible to the human visual system. However, the information is often represented in visual form by mapping the different spectral bands to RGB components. Sources of these types of images include satellite systems, infrared imaging systems, and medical diagnostic imaging systems.

2.6 Image File Formats
Image files store persistent graphics data intended for eventual rendering and display. The various ways in which these files are structured are called graphics file formats. There are two broad


categories of graphics file formats, termed bitmapped formats and vector formats. They are fundamentally very different: a bitmapped format stores a complete, digitally encoded image, while a vector format stores the individual graphical elements that make up an image. The concern here is with bitmapped formats only [33].

2.6.1 Bitmap Image File (BMP)
The BMP (Bitmap image) file format was selected due to its popularity. A BMP file consists of three parts: a header, an RGB colour table, and the image pixel data, as shown in table (2.1); the fourth byte of each colour palette entry is reserved for future development [42]. The header consists of a file header (14 bytes) and an image header (40 bytes), which contains information about the image (image size, width, number of colours, etc.). The RGB colour table lists the colour components (red, green, blue) of each existing colour. The remaining part of the BMP file is the image pixel data. This part contains the colour number of each image element (pixel), listed sequentially according to its raster position.

Table (2.1): The BMP file format.
Header:                 54 bytes
Colour Palette Table:   (No. of colours) * 4 bytes
Image Pixels Data:      (Image height) * (Image width) bytes

Each pixel colour number is mapped through the RGB colour table to construct and display the RGB components. To display the image as a gray scale image, the following formula is used to mix the three colour bands [42,43]:

Gray Scale = 0.3*Red + 0.59*Green + 0.11*Blue ……………..……(2.1)

The BMP format has become very popular; it is considered a standard file format in the Windows environment, and it is a common way of dealing with images


in which data access speed is more important than the amount of disk space. The BMP format has many features:
1. No limits on the dimensions of the image.
2. The data is uncompressed, to provide high-speed access to the image.
3. It can support images with 1, 4, 8 or 24 bit colours.
4. It stores lines from bottom to top.
5. Monochrome images may use any two colours, not necessarily black and white.
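As an illustration of the gray scale mapping of equation (2.1), consider the following minimal C++ sketch (our own, not part of the thesis software package; the function name toGray is hypothetical):

    #include <cstdint>

    // Mix the three colour bands of one RGB pixel into a single gray
    // level using equation (2.1): Gray = 0.3*R + 0.59*G + 0.11*B.
    std::uint8_t toGray(std::uint8_t red, std::uint8_t green, std::uint8_t blue)
    {
        double gray = 0.3 * red + 0.59 * green + 0.11 * blue;
        return static_cast<std::uint8_t>(gray + 0.5);  // round to nearest level
    }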

2.6.2 Joint Photographic Experts Group (JPEG)
The Joint Photographic Experts Group format was created ten years ago. JPEG is the most widely used compression standard for true-colour continuous-tone images. JPEG image features:
1. JPEG supports 24 bit colours.
2. It uses a lossy compression method.
3. It gives the best compression for photographs.
4. Software compression and decompression are rather slow.

2.6.3 Graphic Interchange Format (GIF)
GIF is one of the image file formats commonly used on many different computer platforms, as well as on the World Wide Web. GIF files are limited to a maximum of 8 bits/pixel and allow compression. The 8 bits/pixel limitation does not mean that the format does not support colour images; it simply means that no more than 256 colours (2^8) are allowed in an image. This is implemented by means of a LookUp Table (LUT), where the 256 colours are stored in a table and one byte is used as an index (address) into that table for each pixel. The GIF header is 13 bytes long and contains the basic information required [43]. The format has a few extra features, such as the ability to include multiple images in sequence, hints on how long an image should be displayed, how it should be overlaid on previous images on the screen, and so forth.


2.6.4 Tag Image File Format (TIFF)
TIFF is also commonly used on many different computer platforms, as well as on the World Wide Web. It was developed by the ALDUS corporation to save images created by scanners, frame grabbers, and photo-editing programs. TIFF files allow a maximum of 24 bits/pixel and support five types of compression. The TIFF header is of variable size and is arranged in a hierarchical manner. Unfortunately, it is also by far the most difficult format to handle because it has dozens of different subformats [44].

2.7 Image Processing
Images are signals with special characteristics. First, they are a measure of a parameter over space (distance), while most signals are a measure of a parameter over time. Second, they contain a great deal of information. For example, more than 10 megabytes can be required to store one second of television video. This is more than a thousand times greater than for a similar length of voice signal. Third, the final judge of quality is often a subjective human evaluation, rather than objective criteria. These special characteristics have made image processing a distinct subgroup within Digital Signal Processing (DSP). The following sections outline a number of the most common techniques that are used.

2.7.1 Image Smoothing
Image smoothing has a number of uses. First, if an image is degraded, smoothing may result in an image more meaningful to human eyes. Second, if an image is obtained via some digital technique, such as a fax or scanner, it is possible that the image has been artificially sharpened; that is, the edges of objects in the image have in some way been intensified. Smoothing, which leaves large


regions of the same colour and intensity unchanged but smears edges, may restore an image's original appearance [45]. The danger with image smoothing is that it may blur lines and borders, and that it will destroy the fine detail of a picture. The same process that prevents us from seeing the wrinkles in an actor's skin may prevent us from seeing the fine structure under an electron microscope.
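A minimal sketch of this idea, assuming an 8-bit grayscale image stored row by row (our own illustration, not the thesis implementation), is a 3x3 mean (box) filter:

    #include <vector>
    #include <cstdint>

    // Smooth a grayscale image with a 3x3 mean filter. Large uniform
    // regions are left essentially unchanged while edges are smeared.
    // Border pixels are copied unchanged for simplicity.
    std::vector<std::uint8_t> smooth(const std::vector<std::uint8_t>& img,
                                     int width, int height)
    {
        std::vector<std::uint8_t> out = img;           // keeps the borders
        for (int i = 1; i < height - 1; ++i)
            for (int j = 1; j < width - 1; ++j) {
                int sum = 0;
                for (int di = -1; di <= 1; ++di)       // 3x3 neighbourhood
                    for (int dj = -1; dj <= 1; ++dj)
                        sum += img[(i + di) * width + (j + dj)];
                out[i * width + j] = static_cast<std::uint8_t>(sum / 9);
            }
        return out;
    }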

2.7.2 Image Sharpening
The converse of image smoothing is image sharpening, which deliberately intensifies the edges of objects. The eye, in combination with the various visual portions of the nervous system, acts to emphasize lines and borders at the expense of broad regions of colour, an adaptation that allows swift recognition of the outline of an object [45]. The difficulty with image sharpening lies in its sensitivity to changes in shade. Whereas it can easily emphasize a boundary between two slightly different shades of gray, it will also emphasize the boundary around an isolated dark or bright dot; thus, if there is any noise in an image, image sharpening will increase it, possibly making the image useless.

2.7.3 Image Enhancement
Image enhancement techniques are used to emphasize and sharpen image features for display and analysis. Image enhancement is the process of applying these techniques to facilitate the development of a solution to a computer imaging problem. Consequently, the enhancement methods are application specific and are often developed empirically. The range of applications includes using enhancement techniques as preprocessing steps to ease the next processing step, or as post-processing steps to improve the visual perception of a processed image; image enhancement may also be an end in itself. Enhancement methods


operate in the spatial domain by manipulating the pixel data, or in the frequency domain by modifying the spectral components. Some enhancement algorithms use both the spatial and frequency domains [37]. The types of techniques include point operations, where each pixel is modified according to a particular equation that is independent of other pixel values; mask operations, where each pixel is modified according to the values of its neighbors (using convolution masks); and global operations, where all pixel values in the image (or sub-image) are taken into consideration. Spatial domain processing methods include all three types, but frequency domain operations, by the nature of the frequency transforms, are global operations. Image enhancement is used in post-processing to generate a visually desirable image. For instance, image restoration may be performed to eliminate image distortion, only to find that the output image has lost most of its contrast. Here, some basic image enhancement methods may be applied to restore the image contrast.

2.7.3.1 Enhancement by Histogram Modification
The gray-level histogram of an image is the distribution of the gray levels in that image. A histogram with a small spread indicates low contrast, and a histogram with a wide spread indicates high contrast. An image with its gray levels clustered at the low end of the range will be dark, and an image with the values clustered at the high end of the range will be bright. The histogram can be modified by a mapping function, which will stretch, shrink (compress), slide or equalize the histogram [46].

2.7.3.2 Image Histogram
The histogram of an image is a plot of the gray level values versus the number of pixels at each value. The shape of the histogram provides information about the nature of the image. Three histogram-based techniques are as follows:

1. Histogram Equalization: Histogram equalization is a popular technique for improving the appearance of a poor image. It is used to process images that are lacking in contrast. The lack of contrast may be in all shades (when an image is under- or over-exposed), or may be due to certain shades bunching up. The latter occurs when the image is correctly exposed, but the objects photographed have insufficient contrast between each other. Equalization is the process whereby the computer attempts to redistribute the shades of the picture to produce a more easily interpreted image. The intensity range of a heavily under-exposed image (which would normally appear to be black and shades of dark gray) may be stretched so that while the black stays black, the lightest of the dark grays becomes white. All the intermediate shades of dark gray are interpolated between white and black, and all become lighter than before. Since a computer image has only a finite number of elements, or pixels, and a finite number of shades of gray, the number of pixels at each shade of gray can be tabulated into a histogram. Using this histogram, the computer can assign each pixel of a given shade of gray a new value, possibly producing a more informative picture. Histogram equalization that has stretched dark grays to light gray and white, while leaving pure black unchanged, recovers most of the original contrast. This technique is often used to improve badly exposed photographs, and indeed is often done in hardware by modern digital cameras. In general, the optimality criterion of the histogram equalization transformation is to minimize the error between the transformed and desired histogram. The theoretical basis for histogram equalization involves probability theory, where the histogram is treated as the probability distribution of the gray levels. The first-order histogram probability P(b) can be defined as:


P(b) = N(b) / Mp ………………………………………………………..(2.2)

where Mp is the number of pixels in the image or sub-image, and N(b) is the number of pixels at gray level b. The statistical measure based on the histogram probability used to measure image regularity is the entropy, which is a measure of randomness, achieving its highest value when all gray levels of the image are equally likely. It is given by [37]:

L = - Σ (b = 0 to N-1) P(b) log2[P(b)] ………………………………….(2.3)

As the pixel values in the image are distributed among more gray levels, the entropy increases.
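A minimal sketch of histogram equalization, assuming an 8-bit grayscale image held in a flat pixel vector (our own illustration, not the thesis package):

    #include <vector>
    #include <cstdint>

    // Equalize an 8-bit grayscale image in place. The cumulative histogram
    // probability is used as the mapping function, redistributing the gray
    // levels toward a uniform (maximum-entropy) histogram.
    void equalize(std::vector<std::uint8_t>& pixels)
    {
        const int levels = 256;
        std::vector<double> hist(levels, 0.0);
        for (std::uint8_t p : pixels) hist[p] += 1.0;          // N(b)

        const double mp = static_cast<double>(pixels.size()); // Mp in (2.2)
        double cum = 0.0;
        std::vector<std::uint8_t> map(levels);
        for (int b = 0; b < levels; ++b) {
            cum += hist[b] / mp;                               // cumulative P(b)
            map[b] = static_cast<std::uint8_t>(cum * (levels - 1) + 0.5);
        }
        for (std::uint8_t& p : pixels) p = map[p];             // new gray levels
    }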

2. Localized Histograms: A more elaborate use of histogram techniques is to apply the same principle on a regional basis within an image. This allows detail to be extracted from regions that are unusually lacking in contrast within a larger image that may be perfectly well adjusted. There are a number of ways in which localized histograms can be used. A naive approach is simply to divide an image into subsections and individually process each subsection as if it were an entire image. The shortcoming of this approach is revealed when an interesting region straddles the intersection of multiple subregions [47].


3. Contrast Stretching: An important class of point operations is based upon the manipulation of an image histogram or a region histogram. Frequently, an image is scanned in such a way that the resulting brightness values do not make full use of the available dynamic range. Stretching the histogram over the available dynamic range is an attempt to correct this situation. If the image is intended to go from brightness 0 to brightness 2^B - 1, then one generally maps the 0% value (or minimum) to the value 0 and the 100% value (or maximum) to the value 2^B - 1. The appropriate transformation is given by:

b(i,j) = (2^B - 1) * [I(i,j) - minimum] / [maximum - minimum] ……….(2.4)
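A sketch of this transformation for B = 8 (our own illustration, not the thesis code):

    #include <vector>
    #include <algorithm>
    #include <cstdint>

    // Stretch an 8-bit image over the full dynamic range 0..255 following
    // equation (2.4): b = 255 * (I - minimum) / (maximum - minimum).
    void stretch(std::vector<std::uint8_t>& pixels)
    {
        auto [lo, hi] = std::minmax_element(pixels.begin(), pixels.end());
        std::uint8_t minv = *lo, maxv = *hi;        // copy before overwriting
        if (minv == maxv) return;                   // flat image: nothing to do
        double range = static_cast<double>(maxv) - static_cast<double>(minv);
        for (std::uint8_t& p : pixels)
            p = static_cast<std::uint8_t>(255.0 * (p - minv) / range + 0.5);
    }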

It is possible to apply the contrast-stretching operation on a regional basis, using the histogram from a region to determine the appropriate limits for the algorithm [37]. Figure (2.3) illustrates the effect of contrast stretching and histogram equalization on a standard image. The histogram equalization procedure can also be applied on a regional basis.

Figure (2.3): (a) Original image; (b) contrast-stretched image; (c) histogram-equalized image.


2.7.4 Segmentation
In the analysis of the objects in images, it is essential to distinguish between the objects of interest and the rest. This latter group is also referred to as the background. The techniques that are used to find the objects of interest are usually referred to as segmentation techniques: segmenting the foreground from the background. In this section, two of the most common techniques, thresholding and edge finding, will be presented, together with ways to improve the quality of the segmentation result [48]. It is important to understand that:
- There is no universally applicable segmentation technique that will work for all images, and
- No segmentation technique is perfect.

2.7.4.1 Thresholding
This technique is based upon a simple concept. A parameter called the brightness threshold is chosen and applied to the image I(i,j) as follows:

If I(i,j) >= T then I(i,j) = object (1), else I(i,j) = background (0) ………(2.5)

where T is the threshold value. This version of the algorithm assumes that the interest is in light objects on a dark background. For dark objects on a light background, the following equation is used:

If I(i,j) <= T then I(i,j) = object (1), else I(i,j) = background (0) ………(2.6)

The output is the label "object" or "background" which, due to its dichotomous nature, can be represented as a Boolean variable "1" or "0". In


principle, the test condition could be based upon some other property than simple brightness (for example, If Redness{I(i,j)} >= T_red), but the concept is clear [46]. The central question in thresholding is: how should the threshold be chosen? While there is no universal procedure for threshold selection that is guaranteed to work on all images, there are a variety of alternatives.

1. Fixed Threshold: One alternative is to use a threshold that is chosen independently of the image data. If it is known that one is dealing with very high contrast images, where the objects are very dark and the background is homogeneous and very light, then a constant threshold of 128 on a scale of 0 to 255 might be sufficiently accurate. By accuracy it is meant that the number of falsely classified pixels should be kept to a minimum.
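A sketch of equation (2.5) with a fixed threshold (our own illustration; light objects on a dark background):

    #include <vector>
    #include <cstdint>

    // Label each pixel as object (1) or background (0) following
    // equation (2.5), using a fixed brightness threshold T.
    std::vector<std::uint8_t> threshold(const std::vector<std::uint8_t>& img,
                                        std::uint8_t t)
    {
        std::vector<std::uint8_t> out(img.size());
        for (std::size_t k = 0; k < img.size(); ++k)
            out[k] = (img[k] >= t) ? 1 : 0;   // 1 = object, 0 = background
        return out;
    }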

2. Histogram Derived Thresholds: In most cases the threshold is chosen from the brightness histogram of the region or image that is to be segmented. An image and its associated brightness histogram are shown in figure (2.4). A variety of techniques have been devised to automatically choose a threshold starting from the gray-value histogram, {P[b] | b = 0, 1, ... , 2^B - 1}.

Figure (2.4): (a) Image to be thresholded; (b) brightness histogram of the image. Pixels below the threshold are labeled as object pixels; those above the threshold are labeled as background pixels.


2.7.4.2 Edge Finding
Thresholding produces a segmentation that yields all the pixels that, in principle, belong to the object or objects of interest in an image. An alternative is to find those pixels that belong to the borders of the objects. Techniques directed to this goal are termed edge finding techniques. From this, it is clear that there is an intimate relationship between edges and regions [46].
Gradient Based Procedure: The central challenge for edge finding techniques is to find procedures that produce closed contours around the objects of interest. For objects of particularly high signal-to-noise ratio (SNR), this can be achieved by calculating the gradient and then using a suitable threshold. This is illustrated in figure (2.5). While the technique works well for the 30 dB image in figure (2.5a), it fails to provide an accurate determination of the pixels associated with the object edges for the 20 dB image in figure (2.5b).

Figure (2.5): Edge finding. (a) SNR = 30 dB; (b) SNR = 20 dB.


2.7.5 Image Compression
Digital information has some advantages over its analog counterpart in terms of processing flexibility, easy or random access in storage, higher SNR, the possibility of errorless transmission, etc. Image data compression techniques therefore aim to reduce the bandwidth of the digital signal compared to its analog bandwidth requirements, while on occasion making use of this processing flexibility [49]. Image data compression represents an image with as few bits as possible while preserving the level of quality required for the given application [45]. Ideally this does not destroy image information and allows a good reconstruction of the original image; in practice, the reconstructed image will be accompanied by some distortion. The efficiency of a data compression technique is measured by its resulting distortion, its data compression ability, and the complexity of the system. The key to practical system implementation lies in selecting an efficient technique which achieves a successful compromise between complexity and performance [45]. In light of the foregoing, the fundamental aim of image data compression techniques is to reduce the bit rate for transmission and storage of image data, while maintaining an acceptable fidelity or image quality. Fidelity criteria can be divided into two classes: (1) objective fidelity criteria and (2) subjective fidelity criteria. The objective fidelity criteria are borrowed from digital signal processing and information theory and provide us with equations that can be used to measure the amount of error in the reconstructed (decompressed) image. Subjective fidelity criteria require the definition of a qualitative scale to assess image quality. This scale can then be used by human test subjects to determine image fidelity. In order to provide unbiased results, evaluation with subjective measures requires careful selection of the test subjects and carefully designed evaluation experiments. The objective


criteria, although widely used, are not necessarily correlated with our perception of image quality. However, they are useful as a relative measure in comparing different versions of the same image [49].
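As an example of an objective fidelity criterion (a sketch of our own, assuming 8-bit images held in equal-length vectors), the mean-squared error and the peak signal-to-noise ratio derived from it can be computed as:

    #include <vector>
    #include <cmath>
    #include <cstdint>

    // Mean-squared error between an original image and its reconstruction.
    double mse(const std::vector<std::uint8_t>& orig,
               const std::vector<std::uint8_t>& rec)
    {
        double sum = 0.0;
        for (std::size_t k = 0; k < orig.size(); ++k) {
            double d = static_cast<double>(orig[k]) - static_cast<double>(rec[k]);
            sum += d * d;
        }
        return sum / static_cast<double>(orig.size());
    }

    // Peak signal-to-noise ratio in dB; 255 is the peak value for 8 bits.
    double psnr(double mseValue)
    {
        return 10.0 * std::log10(255.0 * 255.0 / mseValue);
    }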

2.7.6 Fourier Transforms and Bandpass Filtering
Another set of common image processing techniques uses Fourier transforms. A Fourier transform models a particular shape as a series of overlapping sine waves of different frequencies, a process that is a little counterintuitive but nonetheless very powerful. Fourier transforms can be applied in two dimensions, the image being transformed into a two-dimensional grid of the different frequency waveforms, or components, underlying the image. The Fourier transform takes a normal image of pixels and converts it into frequency space. Since no information is lost during the process, the resultant Fourier transform of frequency components can be represented in an array of the same size as the original [38]. An image can also be smoothed, and high-frequency noise (i.e. isolated spots and lines) in the image can be removed, by eliminating all the high-frequency components of the Fourier transform image. Conversely, removing all the low-frequency components of the Fourier transform image can enhance small features such as edges and points. This exclusion of a set of frequencies in the Fourier transform is called bandpass filtering (since a range, or band, of frequencies is included/excluded).


Transform Efficiency: In order to discuss the advantage of using transforms, a statistical model will be considered. Efficient transforms compact the variance in the frequency domain in such a way that a few coefficients hold most of the energy in the signal. The most significant parameters considered when evaluating transform efficiency are [49]:
1. Correlation Reduction: the ability of the transform to produce uncorrelated coefficients.
2. Energy Packing: the measure of the transform's ability to pack the signal energy into the first few transformed coefficients, relative to the total energy resident in all coefficients.
3. Computational Load: an important issue concerning the choice of transform is the computational complexity required for its implementation. This complexity can be measured in operations per sample, where operations can be additions, multiplications, or others. During the last three decades, a major effort was spent on developing efficient algorithms. Fast algorithms reduce digital computer processing time, reduce round-off errors, save on storage requirements, and simplify digital hardware.

2.8 The Discrete Wavelet Transform (DWT)
During the last few years, digital images have been expanding in usage and size, and this has led to improved image compression algorithms that can produce smaller file sizes with lower computational requirements. The purpose of this work is to code images through the use of the wavelet transform in such a way that the redundancy and the visually unimportant information in the original image are removed, while the coded images represent the original image faithfully. Image data compression is concerned with minimizing the number of bits required to represent an image with minimum perceived loss of image quality.

Digital image compression techniques are therefore necessary for applications in communications, remote surveillance, remote sensing, facsimile transmission, etc. [50]. In general, compression is possible because of the redundancy in uncompressed images. The DWT consists of applying a wavelet coefficient matrix hierarchically: first to the full data vector of length N, then to the "smooth" vector of length N/2, then to the "smooth-smooth" vector of length N/4, and so on, until only a trivial number of "smooth-…-smooth" components (usually two) remain [51]. This transform provides an attractive tradeoff between spatial and frequency resolution. This unique property of the wavelet transform does not exist in other transforms. It also has a better energy concentration property than the other transforms [52,53]. Figure (2.6) shows an image called "boy" and figure (2.7) shows its 10 subbands decomposed by the two-dimensional wavelet transform. Figures (2.8) and (2.9) show the notation and the data structure for these subbands. The original image of N*N is decomposed into 4 subimages of N/2*N/2 in the subband decomposition of level 1, which are the LL, LH, HL, and HH bands. The subimage of the LL band is the coarse image of the original image. Similarly, the LL band is decomposed into the LL, LH, HL, and HH bands of N/4*N/4 in level 2. Finally, the LL band in level 2 is decomposed into the LL, LH, HL, and HH bands of N/8*N/8 in level 3 [40].


Figure (2.6): An image called "boy" (320*320).
Figure (2.7): "boy" decomposed into 10 subbands with the two-dimensional wavelet transform for 3 levels.

Figure (2.8): The notation for the corresponding subbands (levels 1-3; LL, LH, HL and HH bands).



Figure (2.9): (a) The data structure for the corresponding subbands; (b) tree structure.

Since the scaling and wavelet functions are separable, each convolution breaks down into one-dimensional convolutions on the rows and columns of I(i,j), which is of dimension N*N. At stage 1, the rows of the image I(i,j) are first convolved with a low-pass filter and with a high-pass filter. The columns of each of the N*N/2 arrays are then convolved with a low-pass filter and with a high-pass filter. The result is the four N/2*N/2 arrays required for that stage of the transform. The transform process can be carried to m stages, where the integer m <= log2(N) for an N*N pixel image. For a one-dimensional calculation, one half of the output is produced by the low-pass filter function:

a_i = (1/2) Σ (j = 1 to N) c_(j+1-2i) I_j ,  i = 1,2,…,N/2 ………..…..… (2.7)


while the other half is produced by the 'high-pass' filter function:

b_i = (1/2) Σ (j = 1 to N) (-1)^(j+1) c_(2i-j) I_j ,  i = 1,2,…,N/2 …....… (2.8)

where N is the input row size, the c's are the wavelet coefficients, and a, b are the output functions. The nonzero coefficients c_k, which determine these functions, are summarized in table (2.2). Using other coefficients and other orders of wavelets yields similar results, except that the outputs are not exactly averages and differences, as they are in the case of the Haar coefficients [54,55]. The coefficients for the Daubechies-6 wavelet, the one used in the discussion of the wavelet transformer implementation, are also given in table (2.2).

Table (2.2): Coefficients for three named wavelet functions.
Wavelet        C0        C1        C2        C3         C4         C5
Haar           1         1
Daubechies-4   0.683012  1.183012  0.316987  -0.183012
Daubechies-6   0.332671  0.806891  0.459877  -0.135011  -0.085441  0.035226

In many situations, the low-pass output contains most of the "information content" of the input row. The high-pass output contains the differences between the true input and the value of the input as it would be reconstructed from only the information given in the low-pass (smooth) output. Therefore, for the Haar transform, the output of the low-pass filter consists of the average of every two samples, and the output of the high-pass filter consists of half the difference of every two samples. To reconstruct the original matrix from this final matrix, the inverse low-pass filter function is applied:


I_j^L = Σ (i = 1 to N/2) c_(j+1-2i) a_i ,  j = 1,2,…,N …………..……… (2.9)

The inverse high-pass filter function is applied:

I_j^H = Σ (i = 1 to N/2) (-1)^(j+1) c_(2i-j) b_i ,  j = 1,2,…,N ……....….. (2.10)

The perfect reconstruction is the sum of the inverse low-pass and inverse high-pass filters:

I_j = I_j^L + I_j^H ……………………..………..……….………… (2.11)

The human visual system (HVS) is taken into consideration by quantizing the coefficients in the higher spatial-frequency bands coarsely, and the coefficients in the middle-level and lower-resolution bands finely, to prevent perceptible degradation.
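To make equations (2.7)-(2.11) concrete, the sketch below (our own illustration, not the thesis transformer) performs one split of the one-dimensional Haar transform, for which c0 = c1 = 1, so the low-pass output is the average of each sample pair and the high-pass output is half their difference:

    #include <vector>

    // One low-pass/high-pass split with the Haar coefficients (c0 = c1 = 1),
    // as in equations (2.7) and (2.8). The input length must be even.
    void haarStep(const std::vector<double>& in,
                  std::vector<double>& a, std::vector<double>& b)
    {
        std::size_t half = in.size() / 2;
        a.assign(half, 0.0);
        b.assign(half, 0.0);
        for (std::size_t i = 0; i < half; ++i) {
            a[i] = (in[2 * i] + in[2 * i + 1]) / 2.0;  // smooth: pair average
            b[i] = (in[2 * i] - in[2 * i + 1]) / 2.0;  // detail: half difference
        }
    }

    // Perfect reconstruction, equations (2.9)-(2.11): the two samples of
    // each pair are recovered as a[i] + b[i] and a[i] - b[i].
    std::vector<double> haarInverse(const std::vector<double>& a,
                                    const std::vector<double>& b)
    {
        std::vector<double> out(2 * a.size());
        for (std::size_t i = 0; i < a.size(); ++i) {
            out[2 * i]     = a[i] + b[i];
            out[2 * i + 1] = a[i] - b[i];
        }
        return out;
    }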

2.9 Dilation
Since most of the information exists in the low-pass filter output, one can imagine taking this filter output and transforming it again, to get two new sets of data, each one half the size of the original input. If, again, little information is carried by the high-pass output, then it can be discarded to yield 4x data compression. Each step of retransforming the low-pass output is called a dilation, and if the number of input samples is N = 2^D, then a maximum of D dilations can be performed, the last dilation resulting in a single low-pass value and a single high-pass value. This process is shown in figure (2.10), where each "x" is an actual system output. If the original input stream is segmented into blocks of size 2^D, then the result of the Dth level of transformations is a single data point, and further splitting of the data is impossible [56,57].


Figure (2.10): Dilation of a sixteen-sample block of data (four successive low-pass/high-pass splits of the input stream).

2.10 Modeling and Simulation
Modeling is a broad term covering any abstract representation of a physical object or process. The term 'model' is sometimes reserved for static representations, while 'simulation' is used for representations of dynamic processes, but the difference is largely arbitrary, and the two terms are used interchangeably within this thesis. Modeling of the sort needed for modern engineering requires consideration of many physical parameters. A mathematical abstraction of the object is constructed, and mathematical terms representing the various physical systems that act on or comprise the object are introduced, yielding a system of equations. Due to the complexity of the problem, these various equations are broken up into separate computer algorithms and solved numerically, rather than analytically [50].

Limitations of Modeling and Simulation: Most complex modeling requires even the mathematics of the simplified model to be approximated, and the degree to which this approximation is done must be selected with care. Almost always (and certainly in this thesis) there are tradeoffs to be made between accuracy, speed, the usability of the resulting data, and how closely the mathematical model approaches the physical reality being modeled. Despite these limitations, modeling has been an extraordinarily successful method of understanding the world, and models with known limitations are used to great effect by scientists and engineers every day.


2.11 Digital Signal Processors
DSP is carried out by mathematical operations. Digital signal processors are microprocessors specifically designed to handle digital signal processing tasks. These devices have seen tremendous growth in the last decade, finding use in everything from cellular telephones to advanced scientific instruments [46].

How Fast are DSPs? The primary reason for using a DSP instead of a traditional microprocessor is speed: the ability to move samples into the device, carry out the needed mathematical operations, and output the processed data. This brings up the question: how fast are DSPs? The idea behind benchmarks is to provide a head-to-head comparison to show which is the best device. To handle high-power tasks that exceed a single device, several DSPs can be combined into a single system. This is called multiprocessing or parallel processing.

2.12 Programming Language
The highest level of programming sophistication is found in applications packages for DSP. These come in a variety of forms, and are often provided to support specific hardware. Which programming language should you use? That depends on who you are and what you plan to do. Most computer scientists and programmers use the C language (or the more advanced C++). Power, flexibility, modularity: the C language has it all. As a middle-level language, C allows the manipulation of bits, bytes, and addresses, the basic elements with which the computer functions [58]. C is commonly referred to simply as a structured language; a structured language allows a variety of programming


possibilities. It directly supports several loop constructs, such as while, do-while, and for. Programs written in assembly can execute faster, while programs written in C are easier to develop and maintain. In traditional applications, such as programs run on personal computers, C is almost always the first choice. If assembly is used at all, it is restricted to short subroutines that must run with the utmost speed. Future improvements will minimize the difference in execution time between C and assembly, and allow C to be used in more applications. If you need flexibility and fast development, choose C. On the other hand, use assembly if you need the best possible performance.


Chapter Three
Artificial Neural Networks Basic Concepts

3.1 Introduction
The hallmark of the neural net is massive parallelism and high interconnectivity between a large number of relatively simple processors. The information in a neural processor is stored in the interconnection pattern rather than at specific spatial locations uniquely defined by a memory address. Instead of a central processor that acts on a few bits of information at a given time, as found in standard computers, a neural network recruits its entire force of processors, or "neurons", to work on a given problem all at once. In this manner, the devices seemingly mimic the brain, in which signals are fired from one region to another and back. Two major benefits of artificial neural systems (ANS) are storage capacity and classification speed. ANS can store a large number of complex patterns: visual scenes, speech templates and robot movements. They can classify new patterns against stored patterns quickly, and the classification speed is independent of the number of patterns stored. These features promise new fields of applications, such as real-time pattern recognition, sensory processors, real-time fuzzy expert systems, robot control, and others [59]. The aim of this chapter is to establish the fundamentals of neural networks. It discusses neural network definitions, structures and taxonomies, the reasons for using neural networks, the limitations of using them, and the methodology of application, with some of the networks' application areas.


3.2 Definition of Artificial Neural Networks
Artificial neural networks (ANN) are biologically inspired; that is, they are composed of elements that perform in a manner analogous to the most elementary functions of the biological neuron. These elements are then organized in a way that may (or may not) be related to the anatomy of the brain [2]. A neural network is a parallel, distributed information-processing structure consisting of processing elements interconnected via unidirectional signal channels called connections. From a geometrical view, a neural network is a directed graph consisting of a set of nodes (processing elements) along with a set of links (connections) between them [41]. Thus an ANN resembles the brain in that:
1. Knowledge is acquired from its environment through a learning process.
2. Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.
The generalization capacity of a neural network is its capacity to give a satisfactory response for an input which is not part of the examples on which it was trained. The degree of possible generalization is related to the quality of the result set up by the network.

3.3 Artificial Neuron Structure
ANN are computational structures modeled after biological processes. Researchers usually think about the organization of the brain when considering network configurations and algorithms; the artificial neuron was designed to mimic the first-order characteristics of the biological neuron. Biological neurons have five functions: they receive signals coming from neighboring neurons, integrate these signals, give rise to nerve pulses, conduct the pulses, and transmit them to other neurons. The biological neuron is built up of three parts: the cell body, the dendrites, and the axon, as shown in figure (3.1) [59].

The body of the cell contains the nucleus of the neuron and the molecules necessary to the life of the neuron, while the dendrites are the principal receptors of the neuron and serve to conduct its incoming signals. The axon is the outgoing connection for signals emitted by the neuron, and the connection between neurons takes place at synapses, where they are separated by a synaptic gap of the order of one-hundredth of a micron.

Figure (3.1): The Biological Neuron (cell body, nucleus, dendrites, axon and synapse).

In the artificial neuron, a set of inputs is applied, each representing the output of another neuron. Each input is multiplied by a corresponding weight, and all weighted inputs are then summed to determine the activation level of the neuron. Figure (3.2) shows a model that implements this idea, where a set of inputs (x1,x2,…,xn) is applied to the artificial neuron. These inputs are collectively referred to as the vector X; each signal is multiplied by an associated weight w1,w2,…,wn before it is applied to a summation block. If this summation exceeds a certain threshold, the neuron responds by issuing a new pulse, which is propagated along its axon; otherwise the neuron remains inactive.


Each weight corresponds to the "strength" of a single biological synaptic connection; the set of weights is referred to collectively as the vector W. The summation block, corresponding roughly to the biological cell body, adds all of the weighted inputs algebraically, producing an output S; in vector notation, S = XW.

Figure (3.2): The Artificial Neuron.
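A minimal sketch of this neuron model in C++ (our own illustration; the names Neuron and fire are hypothetical):

    #include <vector>

    // The summation block forms S = XW; the threshold decides whether
    // the neuron issues a pulse (1) or remains inactive (0).
    struct Neuron {
        std::vector<double> w;   // one weight per input
        double threshold;

        int fire(const std::vector<double>& x) const
        {
            double s = 0.0;
            for (std::size_t i = 0; i < w.size(); ++i)
                s += x[i] * w[i];            // weighted input sum
            return (s > threshold) ? 1 : 0;
        }
    };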

3.4 Neural Model
A general neural model may be defined by the following five elements:
1. The nature of its input.
2. The input function, defining the preprocessing carried out on its input.
3. The activation function of the neuron, defining its internal state as a function of its total input.
4. The output function, which determines the output of the neuron as a function of its activation state.
5. The nature of the output of the neuron.


The neuron inputs may be binary, with values of (-1, +1) or (0, 1), or they may be continuous or real numbers. The total input function can be:
- Boolean.
- Linear: S(x1,…,xn) = Σ (i = 1 to n) Wi Xi
- Affine: S(x1,…,xn) = Σ (i = 1 to n) Wi Xi - a

The linear and affine functions are the most frequent; the term '-a' in the affine function can be implemented using an additional neuron (bias neuron), which always supplies an input of -1 to the neuron under consideration. This is particularly useful when considering the problems of learning using activation functions with a threshold [60]. Figure (3.3) shows the block F that accepts the output signal S of a summation block to produce the neuron output signal OUT.


Figure (3.3): Artificial Neuron with Activation Function.

Several of the most common types of activation signal functions are depicted in figure (3.4). With the linear signal function, the node activation forms the actual output from the node. The linear threshold function is known as saturation; it is similar to the linear function but limits the output to the range [-λ, λ]. With the "threshold" or "unit step" function, the neuron carries out a summation of the input signals. If this summation exceeds a certain threshold λ, the neuron responds by issuing a new pulse, which propagates along its output connection. If the summation is less than the threshold, the neuron remains inactive. A typical function is:

f(x) = 1 if x > λ; otherwise f(x) = 0,

while the hard limiter function is:

f(x) = 1 if x > 0; f(x) = 0 if x = 0; f(x) = -1 if x < 0.

The activation function may be a sigmoid function (S-shaped), which is a bounded, differentiable real function that is defined for all real input values and has a positive derivative everywhere. A typical sigmoid function is

f(x) = 1 / (1 + e^(-λx)) ……..…………………………………………(3.1)

where the constant λ determines the steepness of the rise.

Figure (3.4): Activation Signal Functions (linear, threshold linear (saturation), hard limiter (sign), threshold (unit step), sigmoidal logistic and hyperbolic tangent functions).

Another commonly used activation function is the hyperbolic tangent, which is used by biologists as a mathematical model of nerve-cell activation; it is expressed as f(x) = tanh(x). The hyperbolic tangent function is S-shaped, symmetrical about the origin, and has a bipolar value for f(x). Any other function may be used for activation, but it is generally chosen to be monotonic and odd. The output function of the neuron is defined to be the identity function; in other words, the output of the neuron is made identical to its activation level [61].
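For illustration, the sigmoid of equation (3.1) and the hyperbolic tangent can be written directly (a sketch of our own):

    #include <cmath>

    // The sigmoidal logistic function of equation (3.1);
    // lambda controls the steepness of the rise.
    double sigmoid(double x, double lambda)
    {
        return 1.0 / (1.0 + std::exp(-lambda * x));
    }

    // The hyperbolic tangent activation: S-shaped, symmetrical about
    // the origin, with bipolar output in (-1, +1).
    double tanhActivation(double x)
    {
        return std::tanh(x);
    }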

Biases and Thresholds: A bias acts exactly as a weight on a connection from a unit whose activation is always 1. Increasing the bias increases the net input to the unit. If a bias is included, the activation function is typically taken to be [59]:

f(net) = 1 if net >= 0; f(net) = -1 if net < 0 ………………..………..…..(3.2)

where

net = b + Σ (over i) xi wi ………………………………………….……(3.3)

Some authors do not use a bias weight, but instead use a fixed threshold θ for the activation function. In that case:

f(net) = 1 if net >= θ; f(net) = -1 if net < θ …………………….……….(3.4)

where

net = Σ (over i) xi wi ……………………………..…………….………..(3.5)

3.5 Neural Network Learning
The most intriguing property of an ANN is its learning ability. Like biological systems, networks modify themselves as a result of experience to produce a


more desirable behavior pattern. Adjusting the weights is commonly called "training" and the network is said to "learn". Training is accomplished by sequentially applying input vectors, while adjusting the network weights according to a predetermined procedure. During training, the network weights gradually converge to values such that each input vector produces the desired output vector. Learning can be defined as "the process which is completed by incorporating past experience into interconnection patterns in neural nets", and it depends on all the relatively long-term modifications in behavior that can be attributed to the input of the network [59].
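As a concrete (and deliberately simple) sketch of such weight adjustment, the fragment below applies one pass of a delta-rule update to a single-layer linear network; this is our own illustration of the general idea, not the specific training algorithm used later in this thesis:

    #include <vector>

    // One supervised training pass: for every training pair the output is
    // computed, the error against the target is formed, and each weight is
    // nudged in the direction that reduces the error (learning rate eta).
    void trainEpoch(std::vector<double>& w,
                    const std::vector<std::vector<double>>& inputs,
                    const std::vector<double>& targets,
                    double eta)
    {
        for (std::size_t p = 0; p < inputs.size(); ++p) {
            double out = 0.0;
            for (std::size_t i = 0; i < w.size(); ++i)
                out += w[i] * inputs[p][i];        // network output
            double err = targets[p] - out;         // desired minus actual
            for (std::size_t i = 0; i < w.size(); ++i)
                w[i] += eta * err * inputs[p][i];  // weight adjustment
        }
    }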

3.6 Structure of Connections
The architecture of an ANN may specify total connectivity, where all the neurons are connected to all others, or local connectivity, in which neurons are connected only to their nearest neighbors. However, it is common to use networks with a regular connection structure to facilitate their implementation. There are two classical models. In fully connected networks, each neuron is connected to every other neuron, including itself; this structure of connections makes the networks very complex to simulate, and for this reason layered networks are frequently used. Layered neural networks involve a set of input cells connected to a collection of output cells by means of one or more layers of modifiable intermediate connections; these intermediate layers are called hidden layers, and they may vary in number. Neurons that belong to a particular layer are not connected to each other. Each layer receives signals only from the previous layer and transmits results to the following layer [62].

3.7 Neural Networks Taxonomies
Neural networks constitute a field having demonstrated performance, unique potential, many limitations, and a host of unanswered questions.


Unfortunately, there are neither published standards nor a general taxonomy for ANN. Identical network taxonomies can appear differently when presented by different authors. These taxonomies classify neural networks according to different aspects of the networks and their characteristics, such as: learning strategies, the nature of the input data, connection structures, the state of the activation function, and fixed or adaptive network weights. Some of these taxonomies will be listed here. They are presented in different ways, and they sometimes conflict with each other.

3.7.1 Neural Net Taxonomy Based on Training Strategies
At the most fundamental level, training strategies can be divided into two categories: supervised training and unsupervised training [63]. Supervised training requires the pairing of each input vector with a target vector representing the desired output; together these are called a training pair. Usually a network is trained over a number of such training pairs, called a training set. An input vector is applied, the output of the network is calculated and compared to the corresponding target vector, and the difference (error) is fed back through the network; the weights are changed according to an algorithm that tends to minimize the error. The vectors of the training set are applied sequentially, and errors are calculated and weights adjusted for each vector, until the total error for the training set is at an acceptably low level. In unsupervised training, a network modifies itself in response to input vectors; there are no target outputs and hence no comparisons to predetermined ideal responses. From just the input vectors, the network is expected to organize itself into some useful configuration and to modify its weights to produce consistent output vectors. The unsupervised learning methods fall into one of two categories: competitive and non-competitive paradigms. Our emphasis will be on the competitive type, i.e. neurons that compete with each other (in accordance with


a learning rule) to respond to features contained in the input data. In such a strategy the neuron with the greatest total input wins the competition and turns ON; all other neurons then switch OFF. The Self-Organizing Feature Map (SOFM) is an example of this type [34]. The SOFM is an unsupervised (self-organizing) neural network that can transform nonlinear, high-dimensional data onto a low-dimensional illustrative display, and perform this transformation adaptively in a topologically ordered fashion. In other words, supervised learning requires an external "teacher" that evaluates the behavior of the system and directs the subsequent modifications, while unsupervised learning requires no teacher; the network organizes itself to produce the desired changes, as shown in figure (3.5) [64].

Figure (3.5): (a) Supervised learning; (b) unsupervised learning.

3.7.2 Neural Net Taxonomy Based on Input Nature
Another taxonomy is based on the nature of the inputs and the training strategy used [59]. As shown in figure (3.6), this taxonomy first divides nets into those with binary and those with continuous-valued inputs, and then into nets trained with supervision, such as the Hopfield net and perceptrons, and nets trained without supervision, such as Kohonen's feature-map forming nets. Although all the nets shown in


figure (3.6) can be trained adaptively, the Hopfield net and the Hamming net are generally used with fixed weights.

Figure (3.6): Neural Networks Taxonomy (based on input nature and training strategy). Binary-input nets comprise the Hopfield net and Hamming net (supervised) and the Carpenter/Grossberg classifier (unsupervised); continuous-input nets comprise the perceptron and multi-layer perceptron (supervised) and the Kohonen self-organizing feature maps (unsupervised).

3.7.3 Neural Net Taxonomy Based on Architectures
Karayiannis and Venetsanopoulos follow a different philosophy in neural network taxonomy [63]. They classified the most widely used neural networks into three basic categories, feedforward, feedback, and self-organization, as shown in figure (3.7). These main categories are then divided into subcategories according to different principles, such as the linearity of the activation function and the learning strategy.


Figure (3.7): Neural Networks Taxonomy (based on network structure and training strategy).

On the first level of this taxonomy there are three categories: feedforward, feedback, and self-organization. As mentioned before, a self-organization network, such as the Kohonen feature map, monitors itself and corrects errors without receiving any additional information.


The second main category of this taxonomy is the feedforward networks, the best-known type of neural network. This type is called "non-recurrent" and has no feedback connections, that is, no connections through weights extending from the outputs of a layer back to the inputs of the same or preceding layers. The most widely used feedforward neural network architectures are trained with supervised learning, but they can use unsupervised learning when they are based on the Hebbian learning principle. The Hebbian learning rule is a biologically inspired scheme which has strongly influenced unsupervised learning. Feedback networks are the type of neural networks that contain feedback connections, and they are called "recurrent". Recurrent networks feed previous outputs back to the inputs; hence, their output is determined both by their current input and by their previous outputs. For this reason recurrent networks can exhibit properties very similar to short-term memory in humans, in that the state of the network outputs depends in part upon the previous inputs. The Hopfield model is the simplest and most widely used feedback architecture; another example of a feedback neural network appearing in this taxonomy is the Boltzmann machine, which is close to the Hopfield model architecture [7].

3.7.4 Neural Net Taxonomy Based on Building Element
In this taxonomy, artificial neurons (perceptrons) are used as basic building elements for the development of single- and multi-layered neural networks [63]. This will be clarified in the following sections.

3.7.4.1 Single Artificial Neuron (The Perceptron)
The perceptrons are the earliest feedforward neural architecture. Nevertheless, the potential of feedforward neural networks was revealed quite recently, after the appearance of multi-layered neural networks. Perceptrons are the logical starting point for a study of ANN because their theory is the


foundation for many other forms of ANN, and they demonstrate important principles. As shown in figure (3.8), the perceptron is a processing element whose input consists of an (n+1)-dimensional vector, X = (x0, x1,…,xn), where x0 is permanently set to 1 (this is called the bias input). The output of the perceptron is 1 if the weighted input sum x0w0 + x1w1 + … + xnwn is greater than or equal to zero; the output is zero if the weighted sum is less than zero [63].

Figure (3.8): The Perceptron.
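A sketch of the perceptron's decision rule as defined above (our own code; w holds w0..wn and the bias input x0 = 1 is implied):

    #include <vector>

    // Perceptron output: 1 if the weighted input sum x0*w0 + ... + xn*wn
    // is greater than or equal to zero, and 0 otherwise. x0 is fixed at 1.
    int perceptronOutput(const std::vector<double>& w,  // w0..wn
                         const std::vector<double>& x)  // x1..xn
    {
        double sum = w[0];                  // x0 * w0 with x0 = 1 (bias)
        for (std::size_t i = 0; i < x.size(); ++i)
            sum += w[i + 1] * x[i];
        return (sum >= 0.0) ? 1 : 0;
    }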

3.7.4.2 Single Layer Artificial Neural Networks
Although a single neuron can perform certain simple pattern detection functions, the power of neural computation comes from connecting neurons into networks. The simplest network is a group of neurons arranged in a single layer, as shown in figure (3.9). The nodes of the input layer serve only to receive the inputs from the outside world and distribute them to the other processing elements of the network. For this reason the first layer is not included in the layer count, since it performs no computation. It is convenient to consider the weights to be elements of a matrix W. The dimensions of the W matrix are m rows by n columns, where n is the number of neurons and m is the number of inputs. For example, the weight connecting the third input to the second neuron would be W2,3; in this way, calculating the set


0 0

of neuron summation outputs N for a layer is a simple matrix multiplication, thus N = XW, where N and X are row vectors [59].

Figure (3.9): Single layer neural network.

3.7.4.3 Multilayer Artificial Neural Networks
Larger and more complex networks generally offer greater computational capabilities. Although networks have been constructed in every imaginable configuration, arranging neurons in layers mimics the layered structure of certain portions of the brain. These multilayer networks have proven to have capabilities beyond those of a single layer, and in recent years algorithms have been developed to train them. Multilayer networks may be formed by simply cascading a group of single layers; the output of one layer provides the input to the subsequent layer. Figure (3.10) shows such a network.


The literature is inconsistent in defining the number of layers in these networks. Some authors refer to the number of layers of neurons (including the non-summing input layer), others to the layers of weights, or to the number of layers without including the input layer (since it performs no computation), with the input or distribution layer designated layer 0. Because the latter definition is more functionally descriptive, it is used throughout this thesis.

Figure (3.10): Multilayer neural network (input layer 0, hidden layers 1 to L-1, and output layer L).

Hidden layers (hidden neurons) play a critical role in the operation of a multilayer perceptron (MLP) with backpropagation learning because they act as feature detectors. As the learning process progresses, the hidden neurons in these layers gradually begin to discover the salient features that characterize the training data [59].

Figure (3.10): Multilayer neural network. Hidden layers (hidden neurons) play a critical role in the operation of multilayer perceptron (MLP) with backpropagation learning because they act as a feature detector. As the learning process progresses, the hidden neurons in these layers begin to gradually discover the salient features that characterize the training data [59]. For most applications, a single hidden layer is sufficient. Sometimes, difficult learning tasks can be simplified by increasing the number of internal


layers. So, for complex mappings, two hidden layers may give better generalization and may make training easier than a single hidden layer [63].
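For illustration, a forward pass through such a layered network can be sketched as follows (our own code, assuming sigmoid activations and weight matrices W[i][j] connecting input i to neuron j):

    #include <vector>
    #include <cmath>

    using Matrix = std::vector<std::vector<double>>;

    // Forward pass through one layer: out_j = f(sum_i x_i * W[i][j]),
    // with the sigmoid applied to each neuron's summation output.
    std::vector<double> layerForward(const std::vector<double>& x, const Matrix& W)
    {
        std::vector<double> out(W[0].size(), 0.0);
        for (std::size_t j = 0; j < out.size(); ++j) {
            double s = 0.0;
            for (std::size_t i = 0; i < x.size(); ++i)
                s += x[i] * W[i][j];
            out[j] = 1.0 / (1.0 + std::exp(-s));   // sigmoid activation
        }
        return out;
    }

    // A network with one hidden layer: hidden outputs feed the output layer.
    std::vector<double> mlpForward(const std::vector<double>& x,
                                   const Matrix& Wh, const Matrix& Wo)
    {
        return layerForward(layerForward(x, Wh), Wo);
    }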

3.8 Reasons for Using Neural Networks
After two decades of near eclipse, interest in ANN has grown rapidly over the past few years, because the field demands computational capabilities that could not be delivered until silicon Very Large Scale Integration (VLSI) circuit technology matured sufficiently in the mid-1980s [41]. It was only then that the field could emerge from its former status as an esoteric research subject to become an important "new" practical technology. Suddenly, it appeared possible to apply computation to realms previously restricted to human intelligence; to make machines that learn and remember in ways that bear a striking resemblance to human mental processes, and that exhibit a surprising number of the brain's characteristics, giving a new and significant meaning to the much-abused term artificial intelligence. Some of the specific properties of neural networks responsible for the interest that they arouse today are listed below [24,65].
1. Adaptation or learnability, "learning rather than programming": A highly important feature of neural networks is their ability to learn to solve a problem rather than be programmed for it, which gives the networks the ability to adapt and continue learning new things from the external world as they arise. This capacity for adaptation is particularly relevant for problems which do not yet have technical solutions, and, for solved problems, to resolve them in a new manner. The networks ensure their stability as dynamic systems through their capacity to self-organize [63].
2. Parallelism: Parallelism is fundamental in the architecture of neural networks, where the neurons are considered as sets of elementary units operating simultaneously, which allows a great speed of

calculation.

The

neural

network

speed

is

measured

in

interconnections per second, and the degree of its interconnection determines its parallelism. This parallelism in data processing needing an enormous quantity of data and computational time [66]. 3. Capacity of generalization: The generalization capacity of a neural network is its capacity to give a satisfactory response for input, which is not part of examples on which it was trained. The degree of generalization possible is related to the quality of the result set up by the network and limited to a certain degree. Training for too long, for example, by forcing the output to reach a very small error over the training set, damages the network‟s capacity for generalization, since the frontiers obtained by the network become very close to the examples that have been learned [59]. 4. Distributed Memory (Fault Tolerance): Each neuron has a memory, which is distributed over many units, giving a valuable property of resistance to noise. In a neural network the loss of one individual component does not necessarily cause the loss of stored data item; and the destruction of one memory unit (neuron) only changes the activation map of the neuron. This will provide a great degree of robustness of fault tolerance [67]. 5. Abstraction and solving problems with noisy data: Some ANN are capable of abstracting the essence of a set of inputs, the neural nets can start with noisy data and make the correct data appear from the network‟s activation map without noise. This ability to extract an ideal from imperfect inputs raises interesting philosophical issues, extracting idealized prototypes is a highly useful ability in humans; it seems that now it may be shared with ANN [68].


6. Ease of construction and learning: Computer simulation of a neural network is simple and requires only a short development time. For more complex applications, simulators or accelerator cards have proved useful. Neurocomputing is relatively easy to learn compared with most domain knowledge [69].
7. Mapping capabilities: A neural network can be regarded as a black box that transforms an input vector x from an m-dimensional space to an output vector y in an n-dimensional space. The mapping may be autoassociative (mapping to an original pattern from a noisy or partially given input pattern) or heteroassociative (mapping from an input pattern to a different output pattern), and may be realized, for example, by a multilayer perceptron or a self-organizing feature map [70].

3.9 Limits in the Use of Neural Networks
There is an operational problem encountered when attempting to simulate the parallelism of neural networks on sequential machines, namely a rapid increase in processing time requirements as the size of the problem grows. Implementing a neural network directly in hardware enables the true exploitation of the network's parallelism, but may lose much of the flexibility of software simulations.

3.10 Artificial Neural Network Applications
3.10.1 Characteristics of Suitable Applications
The following are some of the characteristics, given by experts, of problems that are suitable for implementation on neural networks:
1. The problem makes use of noisy data.
2. The problem evolves, for example by varying its set of initial conditions.


3. The problem may need very high speed processing, for example processing in real time.
4. The problem may not have current technical solutions.
These constraints suggest a possible list of application domains: pattern recognition, signal processing, forecasting and modeling, decision-making aids, and robotics. In addition, neural networks help in the study of the brain, in developing neurocomputers, and in building new computer architectures [71].

3.10.2 Methodology of Neural Network Applications
The methodology for using a neural network in a novel application consists of two stages. The first stage is studying the problem and determining its suitability for solution using a neural network, considering the characteristics explained in the previous section. The problem must provide a large number of data points, enabling learning to take place and the performance of the network to be verified. Finally, the size of the problem must be determined, and the possibility of dividing it into sub-problems is considered if necessary. At this stage, the desired performance is defined both for the learning phase and for production use. The second stage is the choice of the type of neural network; its type of implementation is based on the nature of the application, the nature of the data, and performance considerations. Table (3.1) gives an indication of the types of neural network suitable for different applications [72,73].
Table (3.1): Current application areas for different neural networks.


Network Model        Application
Backpropagation      Classification, image processing, decision-making
Hopfield             Optimization
Boltzmann machine    Optimization
Kohonen              Classification, image processing

3.11 Convergence
When a network is trained successfully, it produces correct answers more and more often as the training session progresses. It is important, then, to have a quantitative measure of learning. The root-mean-squared (RMS) error is usually calculated to reflect the degree to which learning has taken place in the network. This measure reflects how close the network is to getting the correct answers. As the network learns, its RMS error decreases. Generally, an RMS value below 0.1 indicates that a network has learned its training set. Convergence is the process whereby the RMS value for the network gets closer and closer to 0. Convergence is not always easy to achieve, because the process may take an exceedingly long time, and sometimes the network gets stuck in a local minimum and stops learning altogether. Typically, an application of backpropagation requires both a training set and a test set. Both the training set and the test set contain input/output pattern pairs. While the training set is used to train the network, the test set is used to assess the performance of the network after training is complete. In a typical application both sets are taken from real data, although sometimes simulated data is used as well. If the available data is scarce, then small amounts of noise may be added to the data to simulate additional patterns for the training or test sets. In any case, the training and test sets should use patterns typical of the type of data that the network is to encounter later. To give the best measure of network performance, the test set should be different from the training set [74].
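As a concrete illustration, the following is a minimal C++ sketch of the RMS error computation described above; the function name and the use of nested vectors for the pattern pairs are illustrative choices, not taken from the thesis software.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Root-mean-squared error over a set of output/target pattern pairs.
// A value below roughly 0.1 is taken here as an indication that the
// network has learned its training set.
double rmsError(const std::vector<std::vector<double>>& outputs,
                const std::vector<std::vector<double>>& targets)
{
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t p = 0; p < outputs.size(); ++p) {
        for (std::size_t k = 0; k < outputs[p].size(); ++k) {
            double e = targets[p][k] - outputs[p][k];   // per-unit error
            sum += e * e;
            ++count;
        }
    }
    return std::sqrt(sum / count);
}
```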

3.12 Strengths and Limitations of Backpropagation
The principal strength of back-error propagation is its relatively general pattern-mapping capability; it can learn a tremendous variety of pattern-mapping relationships. It does not require any prior knowledge of a mathematical function that maps the input patterns to the output patterns; backpropagation merely needs examples of the mapping to be learned. The flexibility of the paradigm is enhanced by the large number of design choices available for the number of layers, interconnections, processing units, the learning constant, and data representations. As a result, back-error propagation can address a broad spectrum of applications. Despite many successful applications of backpropagation, it is not a panacea; it still presents a certain range of difficulties which have not been resolved.
1- There is neither a theoretical result nor even a satisfactory empirical rule suggesting how a network should be dimensioned to solve a particular problem. Should the network use one hidden layer or more? How many neurons should there be on the hidden layers? What is the relationship between the number of training examples, the number of classes to separate these examples into, and the overall size of the network? [75].
2- The largest drawback of back-error propagation appears to be its convergence time. Training sessions can require hundreds or thousands of iterations for relatively simple problems. Realistic applications may have thousands of examples in a training set; it may require days or weeks to train the network, and it may not train at all. Usually this lengthy training needs to be done only during the development of the network, because most applications require a trained network and do not need on-line retraining of the net [76].
3- As the network trains, the weights can become adjusted to very large values. This can force all or most of the neurons to operate at large values, in a region where the derivative of the sigmoid function is very small. Since the error sent back for training is proportional to this derivative, the training process can come to a virtual standstill (network paralysis). There is little theoretical understanding of this problem. It is commonly avoided by reducing the step size α, but this extends training time. Various heuristics have been employed to prevent paralysis, or to recover from its effects, but these can only be described as experimental. If the step size is too small, convergence of the entire network will take too many steps, while if the step size is too large, there is the risk of oscillation [77].
4- Backpropagation employs a type of gradient descent; that is, it follows the slope of the error surface downward, constantly adjusting the weights toward a minimum. The error surface of a complex network is highly convoluted, full of hills, valleys, folds, and gullies in high-dimensional space. The network can get trapped in a local minimum (a shallow valley) when there is a much deeper minimum nearby. From the limited viewpoint of the network, all directions are up, and it has no way to escape. Statistical training methods can help avoid this trap, but they tend to be slow [78]. Improvements in convergence have also been found by varying the learning parameter α, starting with a larger value and progressing to smaller values. Techniques for avoiding local minima include changing the network or the training set, and adding noise to the weights. In spite of these improvements, application developers utilize a variety of specialized accelerator boards, parallel processing machines, and other fast computers in training back-error propagation nets.


3.13 Network Size
It is easy to see that the network required to perform recognition of the considered images is a very complex one. The number of connections between two layers is given by multiplying the total numbers of neurons of the two layers and then adding the number of bias connections of the second layer (the bias connections of a layer equal the number of its neurons). If there are N0 nodes in the input layer, N1 nodes in the hidden layer, and N2 nodes in the output layer, the total number of connections is given by [(N0*N1)+N1]+[(N1*N2)+N2].
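This count follows directly from the formula; the short C++ function below is a sketch, with a hypothetical name and an example network size chosen purely for illustration.

```cpp
// Total number of weights for a three-layer network, following the
// formula in the text: [(N0*N1)+N1] + [(N1*N2)+N2], where the added
// N1 and N2 terms are the bias connections of the hidden and output
// layers respectively.
long totalConnections(long n0, long n1, long n2)
{
    return (n0 * n1 + n1) + (n1 * n2 + n2);
}

// For example, a 100-70-40 network would have
// (100*70 + 70) + (70*40 + 40) = 9910 connections.
```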


Chapter Four
Proposed Face Recognition System Design
4.1 Introduction
Face recognition is one of the most attractive and challenging topics in the fields of pattern recognition and computer vision [79]. The main difficulty in face recognition lies in the fact that complete and effective sets of features to describe and distinguish persons' faces have not been found yet. One source of difficulty is that the face image can change completely in intensity, and the face itself may be modified, for instance by cosmetics, oil, dirt, hair covering, glasses, etc.

Another difficulty in face recognition is the contradiction between the high dimensionality and the low sample number in the training set on the original face image space, which limits the generalization ability of classical classifiers [80,81]. Neural-network-related methods have achieved better recognition results compared with classical methods. In this research, a multi-network solution for face recognition is suggested. The system is based on a wavelet feature map for feature extraction and multi-layer perceptron neural networks for recognition.

4.2 Characteristics of the Implemented Images
Before going into the details of the face recognition system, a study of digital image representation is needed. Simply, the term monochrome image refers to a two-dimensional light intensity function I(i,j), where i and j denote spatial coordinates and the value of I at any pixel (i,j) is proportional to the brightness (gray level) of the image at that pixel. In real images, neighboring pixels tend by nature to be highly correlated.


The images are static, noise-free, of BMP standard format, with a size of 160x160 pixels. Such an image size, compared to other sizes, seems to provide a reasonable resolution. In addition, it is assumed that these images have 256 gray-scale levels [8 bits (2^8 = 256)]. The resolution provided by this gray scale is quite adequate for such an application.

4.3 Image Datasets (Databases)
The image databases are collected such that they have differences in lighting conditions, facial expressions, and background. In a first approach, a 256-grayscale bitmap image of 320x320 pixels is obtained using a digital camera coupled to a 1000 MHz PIII personal computer. The database is prepared under several conditions, as follows:

- Head position: this field has 4 types: straight, left, right, up.
- Facial expression: this field has 4 types: neutral, smile, sad, and angry.
- Eye state: this field has 4 types: open, closed, glasses, no-glasses.
- Scale of the image: this field has 2 types: full and half. Full indicates a full-resolution image (320 columns * 320 rows); half indicates a half-resolution image (160*160). For this work, the half-resolution images are used for the experiments, to keep training time to a reasonable level.


The databases are presented briefly in the following:
1- The first database contains frontal views of 40 persons, each person having one image. Examples are shown in figure (4.1).

Figure (4.1): Example of first database face files.

2- The second database contains 40 persons, each person having three different images taken under different conditions (varying lighting and facial expressions). Examples are shown in figure (4.2).

Figure (4.2): Example of second database face files.

3- The third database contains 40 distinct persons, each person having 10 different images, taken at different times, with slightly varying lighting, facial expressions (open / closed eyes, smiling / non-smiling), and facial details


(with glasses / without glasses). All the images are taken against a white homogeneous background, and the persons are in an upright, frontal position (with tolerance for some side movement). Many students, faculty, and staff in the Computer and Software Engineering Department in Al-Mustansiriya University have digital images online. Examples are shown in figure (4.3), which contains a relational database consisting of a set of face images and a table of information connected together by the primary key (face number).


Figure (4.3): Example of third database face files.



P   Name             Sex     Age   Work Place
1   Bushra Adel      Female  21    Comp. Eng. Dept.
2   Ahlam Najem      Female  35    Elect. Eng. Dept.
3   Ali Huseen       Male    23    Comp. Eng. Dept.
4   Anmar Ahmed      Male    31    Comp. Eng. Dept.
5   Aus Raheem       Male    32    Comp. Eng. Dept.
6   Asraa Mahmod     Female  30    Comp. Eng. Dept.
7   Raad Mahsen      Male    27    Comp. Eng. Dept.
8   Ali Jalawy       Male    25    Comp. Eng. Dept.
9   Basma Fozy       Female  21    Comp. Eng. Dept.
10  Fuaad Alwan      Male    40    Mech. Eng. Dept.
11  Harithj Ali      Male    35    Elect. Eng. Dept.
12  Qutiba Fahid     Male    31    Comp. Eng. Dept.
13  Qusan Farhan     Male    24    Comp. Eng. Dept.
14  Hewa Jumaa       Male    25    Comp. Eng. Dept.
15  Geny Ahmed       Male    34    Comp. Eng. Dept.
16  Sedeq Baker      Male    31    Comp. Eng. Dept.
17  Raged Yosef      Male    29    Comp. Eng. Dept.
18  Valee Abrahim    Female  30    Comp. Eng. Dept.
19  Aumer Ramzi      Male    26    Comp. Eng. Dept.
20  Yaser Kaldon     Male    33    Comp. Eng. Dept.
21  Safia Taher      Female  45    Comp. Eng. Dept.
22  Hasim Fatuhi     Male    32    Comp. Eng. Dept.
23  Riyad Abedraba   Male    35    Comp. Eng. Dept.
24  Ahmed Jasim      Male    27    Comp. Eng. Dept.
25  Qusay Adnan      Male    38    Comp. Eng. Dept.
26  Luay Tareq       Male    33    Comp. Eng. Dept.
27  Mohand Fadel     Male    37    Comp. Eng. Dept.
28  Majed Mehsen     Male    23    Comp. Eng. Dept.
29  Kalid Abedallah  Male    25    Comp. Eng. Dept.
30  Amaar Jasim      Male    21    Comp. Eng. Dept.
31  Qies Radi        Male    26    Comp. Eng. Dept.
32  Marwan Ahmed     Male    29    Comp. Eng. Dept.
33  Mushib Lazam     Male    26    Comp. Eng. Dept.
34  Fadel Muni       Male    33    Comp. Eng. Dept.
35  Dafer Rafa       Male    25    Comp. Eng. Dept.
36  Wasem Nehad      Male    23    Comp. Eng. Dept.
37  Nian Nehaad      Female  31    Comp. Eng. Dept.
38  Karam Hafed      Male    32    Comp. Eng. Dept.
39  Taraq Jasim      Male    21    Elect. Eng. Dept.
40  Basem Kalid      Male    27    Comp. Eng. Dept.

Figure (4.3): Continued. Table of face image information of third database. P: Original face image number. P1, P2, P3, P4, P5, P6, P7, P8, P9: Face images for each person.


4- The experiments also used the ORL face database [2]. The database contains 400 images of 40 persons, with 10 images each. The faces are in a frontal upright position and show a range of expressions. Side movement and head tilt were tolerated to a limited extent only. Figure (4.4) shows examples from the database.

Figure (4.4): Example of ORL database face files.

5- Rejection faces: there are 18 face images used to train the neural network to reject faces that are not in the database, as shown in figure (4.5).

Figure (4.5): Example of rejection database face files.

Two test schemes are designed to compare the error. In the first scheme, the query set is composed of the half-set of images that is not used for training (the compound data set minus the training set). The second scheme takes all the images as the query set for both training and testing. The faces are consistently positioned in the image frame, and very little background is visible.

4.4 Image Thresholding
A major technique in image processing is the location of prominent edges in an image. In general, this can be achieved by generating an edge map and selecting some threshold based on edge strength such that only prominent edges are retained. The way in which the threshold is chosen is usually non-trivial. A thresholded image g(i,j) is defined as:

$$g(i,j) = \begin{cases} 1 & \text{if } I(i,j) \ge T \\ 0 & \text{if } I(i,j) < T \end{cases} \qquad (4.1)$$


$$T = \frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N} I(i,j) \qquad (4.2)$$

where I(i,j) is the gray level of pixel (i,j), N^2 is the number of pixels in the square N x N image, and T denotes the threshold value.

4.5 Limitations and Research Direction of the Proposed Method
In this thesis, the main emphasis is on reducing the number of necessary calculations before training through backpropagation. One limitation of this technique lies in the fact that faces should be under the same lighting effects and in the same position. Since recognition is carried out on values of pixel intensities, any minor change in the lighting reflects on the whole image and makes it almost impossible for the network to generalize. Current work is geared towards the use of multiple networks that implement layers of different types of networks, each of which specializes in a specific task; backpropagation-based networks can be placed at the end of the recognition process. In such architectures, a lot of effort is spent on reducing the input space and guaranteeing good performance under usual conditions, where lighting, rotation, and tilting effects should not affect recognition.

4.6 Characteristics of the Proposed Face Recognition System
1. The system is on-line with human interaction.
2. It uses static matching.
3. Images are gray-scale, 2-D, frontal, upright views of almost uniform illumination.

4.7 Feature Selection and Classification
Any object which can be recognized and classified possesses a number of discriminatory properties or features. The process of extracting these essential properties from an input data set, such that they are invariant to transformations of the input, is referred to as feature extraction. Figure (4.6) shows the classification stages that should be contained in any classification system.

[Input face images → Feature extraction → Feature vectors → Classification → Output classes]

Figure (4.6): Classification stages.

In feature extraction, one seeks a new feature space which better represents the observations for a given purpose. In this way, data can be transformed from a high-dimensional pattern space to a low-dimensional feature space. Recently, the wavelet transform has been used to solve the task of feature extraction. Some networks provide impressive mappings of fairly complex structure. Application of a classifier first requires selection of features that contain the information required to distinguish between classes and are insensitive to irrelevant variability in the input. Following feature selection, classifier development requires collection of training and testing data sets, and separation of the training and testing (or use) phases, as shown in figure (4.7).

4.7.1 Training Phase
The classifier is trained by a set of training face images (feature vectors). Training the network involves moving from the training set to a set of weights that correctly classifies the training vectors, at least within some defined error limit.


[(a) Training face images → feature extraction → classifier → classification weight vectors. (b) Testing face images → feature extractor → classifier → face classes.]

Figure (4.7): Two phases of classification: (a) Training phase. (b) Testing phase.

4.7.2 Test Phase (Retrieving Phase)
The trained classifier assigns the unknown (new) input face images to one of the categories or clusters based on the extracted feature vectors. It is important to note that test data should never be used to estimate classifier parameters or to determine classifier structure. In this research, 1-10 images per person were taken: half of the images for training and the other half for testing.

4.8 The Structure of the Proposed Face Recognition System
In the following sections, the techniques which form the components of the face recognition system are introduced, as shown in figure (4.8). Section (4.8.2) presents the histogram equalization, while section (4.8.3) describes the wavelet transform in detail. Sections (4.8.4) and (4.8.5) introduce the network architecture, and section (4.8.6) presents the arbitration method.

[Block diagram: Data Preparation → Histogram Equalization → Wavelet Transform → Networks Formation (N1, N2, N3, N4) → Threshold → Judgment Process → Database & Lookup Table → Result]

Figure (4.8): Block Diagram of Proposed Face Recognition System.

4.8.1 Background of Database Images
One characteristic of these databases is that the images have fairly uniform backgrounds. Because the recognizer itself will be trained with regions of the image which include the background, care is needed to make sure that the recognizer is not taught to simply look for the uniform background.


4.8.2 Preprocessing for Brightness (Lighting Variation) and Contrast
Real-world input data always contains some amount of noise, and certain preprocessing is needed to reduce its effect. The term noise is to be understood broadly: anything that hinders a face recognition system from fulfilling its objective may be regarded as noise, no matter how inherent this noise is in the nature of the data. Some desirable properties of the data may also be enhanced with preprocessing before the data is fed into the recognition system. Some images had glitches due to problems with the camera setup; these are bad images. Apart from intrinsic differences between faces, a major source of variation is lighting and camera characteristics, which can result in brightly or poorly lighted images, or images with poor contrast. These problems are first addressed using a simple image-processing approach. This preprocessing technique first attempts to equalize the intensity values across the image. Histogram equalization is performed, which non-linearly maps the intensity values to expand the range of intensities in the image. The histogram is computed for pixels inside the image. This compensates for differences in camera input gains, as well as improving contrast in some cases. Some sample results of the preprocessing are shown in figure (4.9). It is noted that the histogram equalization step flattens the histogram.


Figure (4.9): Sample results from preprocessing. (a) Original image. (b) Equalized image.

Part of the motivation of the preprocessing steps in the previous section is to have robustness to variations in the lighting conditions, for instance lighting from the side of the face, which changes its overall appearance.

Histogram Equalization: The gray-level histogram of an image is the distribution of the gray levels in the image. Figure (4.10) illustrates an image and its corresponding histogram for an 8-bit, 160x160-pixel image. In general, a histogram with a small spread indicates low contrast, and a histogram with a wide spread indicates high contrast. Histogram equalization is a non-linear process reassigning the brightness of pixels on the basis of the image histogram. Individual pixels retain their brightness order (each pixel remains brighter or darker than other pixels) but the values are shifted, so that an approximately equal number of pixels have each possible brightness value. In many cases, this spreads out the values in areas where different regions meet, showing details in areas with a high brightness gradient. This normalization operation expands the range of brightness intensities in the image and compensates for differences in camera input gains (response curves), as well as improving contrast in some cases. The histogram equalization process for digital images consists of four steps:
1- Find the running sum of the histogram values.
2- Normalize the values from step 1 by dividing by the total number of pixels.
3- Multiply the values from step 2 by the maximum gray level value and round.
4- Map the original gray level values to the results from step 3 using a one-to-one correspondence.
Figure (4.10b) shows the histogram of an image with very poor contrast. Figure (4.10d) illustrates the result of applying histogram equalization to the image. The results of this process are often very dramatic.
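The four steps above translate directly into code. The following C++ sketch equalizes an 8-bit image stored as a flat pixel array; the function name and data layout are illustrative assumptions rather than the thesis's actual routine.

```cpp
#include <cmath>
#include <vector>

// Histogram equalization of an 8-bit image, following the four steps in
// the text: running sum of the histogram, normalization by the pixel
// count, scaling by the maximum gray level, and one-to-one remapping.
void equalizeHistogram(std::vector<unsigned char>& pixels)
{
    const int levels = 256;
    std::vector<long> hist(levels, 0);
    for (unsigned char p : pixels) ++hist[p];            // build histogram

    std::vector<unsigned char> map(levels);
    long running = 0;
    const double total = static_cast<double>(pixels.size());
    for (int g = 0; g < levels; ++g) {
        running += hist[g];                              // step 1: running sum
        double normalized = running / total;             // step 2: normalize
        map[g] = static_cast<unsigned char>(
            std::lround(normalized * (levels - 1)));     // step 3: scale, round
    }
    for (auto& p : pixels) p = map[p];                   // step 4: remap
}
```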



Figure (4.10): Histogram of original and equalized images. (a) Original image. (b) Histogram of image in (a). (c) Image after histogram equalization (d) Histogram of equalized image in (c).

4.8.3 Face Preprocessing (Image Size Reduction)
The main drawback of such a network, regardless of its architecture and learning algorithm, is the size of its input. A small 160x160-pixel image represents 25600 inputs; besides, the network should be provided with a large number of images so that decent generalization can be achieved. Combined with a large number of training epochs, training would therefore require a huge amount of time and resources. Many solutions have been suggested to remedy this problem; one of the simplest is to apply a spatial resolution reduction to the image. This method consists of extracting key features of the face and feeding them into the network instead of utilizing all the pixels, as shown in figure (4.11).


Figure (4.11): Reduction of image size. (a) Original image (160x160 pixels). (b) Compressed image (20x20 pixels).

In a face recognition system, searching is the most computationally expensive operation, due to the large number of images available in the database. Therefore, efficient search algorithms are a prerequisite of recognition systems. In most systems, face features are extracted from the original images and stored in a face feature database. In recognition, the same features are extracted from the input face, and the features of the input image are compared with the features of each model image in the database. Apart from adopting a fast face-matching algorithm, a preprocessing operation can further speed up the search by reducing the number of candidates and the amount of actual face feature matching.

4.8.3.1 Wavelet Transform
The drawback of the suggested system is that when it is run on a Pentium-I personal computer it is too slow, and when implemented in hardware it is too large and expensive. Clearly, one would like to reduce the number of inputs by using some kind of feature-extracting preprocessor, keeping the total number of input nodes below one hundred. A method is presented to reduce the number of inputs while, at the same time, increasing the flexibility and the redundancy of the total system. One of the most modern transforms, used today in a number of applications, is the discrete wavelet transform (DWT). This is a very versatile orthogonal transform. It is used with very good results for data compression and for the enhancement of the signal-to-noise ratio [56]. In fact, only approximately 50% of the largest wavelet coefficients need to be retained in order to yield a fair reconstruction of the input vector, as shown in figure (4.12). The wavelet transform can be described as a transform whose basis functions are shifted and expanded versions of themselves. Because of this, the wavelet transform contains not just frequency information but spatial information as well. One of the most common models for a wavelet transform uses the Fourier transform together with highpass and lowpass filters. The wavelet transform breaks an image down into four subsampled, or decimated, images; they are subsampled by keeping every other pixel. The results consist of one image that has been highpass filtered in both the horizontal and vertical directions, one that has been highpass filtered in the vertical and lowpass filtered in the horizontal direction, one that has been lowpass filtered in the vertical and highpass filtered in the horizontal direction, and one that has been lowpass filtered in both directions.



Figure (4.12): Wavelet transform using Daubechies 4. (a) The input vector of 160*160 elements. (b) Distribution of the WT coefficients. (c) Reconstructed vector using 50% of the coefficients.

The following algorithm shows how such an implementation can be simulated; the transform is performed by the following steps:
1. Convolve the lowpass filter with the rows (this is done by sliding, multiplying coincident terms, and summing the results) and save the results. (Note: for the basis vectors as given, they do not need to be reversed for convolution.)
2. Convolve the lowpass filter with the columns (of the results from step 1) and subsample this result by taking every other value; this gives the lowpass-lowpass version of the image.
3. Convolve the result from step 1, the lowpass filtered rows, with the highpass filter on the columns. Subsample by taking every other value to produce the lowpass-highpass image.
4. Convolve the original image with the highpass filter on the rows and save the result.
5. Convolve the result from step 4 with the lowpass filter on the columns; subsample to yield the highpass-lowpass version of the image.
6. To obtain the highpass-highpass version, convolve the columns of the result from step 4 with the highpass filter.
In practice, the convolution sum at every other pixel is not performed, since the resulting values are not used. This is typically done by shifting the basis vector by 2, instead of by 1, at each convolution step. The convention for displaying the wavelet transform results as an image is shown in figure (4.13). Figure (4.14) shows the results of applying the wavelet transform to an image. Figure (4.14b) illustrates the lowpass-lowpass image in the lower-left corner, the lowpass-highpass and highpass-lowpass images on the diagonals, and the highpass-highpass image in the upper-right corner. The same wavelet transform can be applied again to the lowpass-lowpass version of the image to obtain seven subimages, as in figure (4.14c), or it may be performed another time to obtain ten subimages, as in figure (4.14d). This process is called multiresolution decomposition.
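As a sketch of steps 1-6, the following C++ code performs one level of a separable 2-D decomposition with the standard Daubechies-4 filter coefficients. The periodic boundary wrapping and the function names are assumptions for illustration; the thesis does not specify the boundary handling.

```cpp
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Standard Daubechies-4 lowpass coefficients; the highpass filter is the
// quadrature mirror of the lowpass one.
static const double H[4] = { 0.4829629131445341,  0.8365163037378079,
                             0.2241438680420134, -0.1294095225512604 };
static const double G[4] = { H[3], -H[2], H[1], -H[0] };

// Convolve along rows (axis==0) or columns (axis==1) with filter f,
// shifting by 2 each step so the output is subsampled along that axis.
static Mat filterAxis(const Mat& in, const double* f, int axis)
{
    int rows = static_cast<int>(in.size());
    int cols = static_cast<int>(in[0].size());
    int outR = (axis == 1) ? rows / 2 : rows;
    int outC = (axis == 0) ? cols / 2 : cols;
    Mat out(outR, std::vector<double>(outC, 0.0));
    for (int r = 0; r < outR; ++r)
        for (int c = 0; c < outC; ++c)
            for (int k = 0; k < 4; ++k) {
                int rr = (axis == 1) ? (2 * r + k) % rows : r;  // wrap
                int cc = (axis == 0) ? (2 * c + k) % cols : c;  // wrap
                out[r][c] += f[k] * in[rr][cc];
            }
    return out;
}

// One decomposition level: produces the four quarter-size subimages.
void dwt2d(const Mat& img, Mat& LL, Mat& LH, Mat& HL, Mat& HH)
{
    Mat lowRows  = filterAxis(img, H, 0);   // step 1: lowpass on rows
    Mat highRows = filterAxis(img, G, 0);   // step 4: highpass on rows
    LL = filterAxis(lowRows,  H, 1);        // step 2: lowpass-lowpass
    LH = filterAxis(lowRows,  G, 1);        // step 3: lowpass-highpass
    HL = filterAxis(highRows, H, 1);        // step 5: highpass-lowpass
    HH = filterAxis(highRows, G, 1);        // step 6: highpass-highpass
}
```

Applying dwt2d again to the returned LL subimage yields the seven- and ten-band multiresolution decompositions mentioned below.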


Figure (4.13): Wavelet transform display location of frequency bands in a four-band wavelet-transformed image. Designation is row / column.

The resulting images show that the transform contains spatial information, because the image itself is still visible in the transform domain. Compare this to the spectra of the previous transforms, where there is no visible correlation to the image itself.


Figure (4.14): Wavelet transform. (a) Original image. (b) Wavelet transform using Daubechies basis vectors, four bands. (c) Wavelet transform using Daubechies basis vectors, seven bands. (d) Wavelet transform using Daubechies basis vectors, ten bands.

4.8.3.2 Advantages of Using the Wavelet Transform
The advantages of the wavelet transform for static image analysis are well known. It has been found that using the wavelet transform in time and space, combined with a multiresolution approach, leads to an efficient and effective method of compression. In addition, the computational requirements are considerably less than for other compression methods and are more suited to VLSI implementation [53,58]. The transform relies on redundancy within the image and on the characteristics of the human visual system to achieve high rates of compression without causing intolerable image degradation, and it has a number of advantages over more conventional techniques.


4.8.4 Individual Face Recognition Networks
An image is a rectangular grid of pixels; each pixel has an integer value ranging from 0 to 255. Images are indexed by rows and columns; row 0 is the top row of the image, and column 0 is the left column. There are two key problems in using neural networks for pattern recognition. The first is how to design a suitable network topology (i.e., the architecture of the network, the form of connection, the number of neurons and weights, etc.). The other is to find a fast and effective algorithm for invariance extraction. The system operates in two stages: it applies a set of neural-network-based recognizers to an image, and then uses an arbitrator to combine the outputs. The individual recognizers examine each location in the image at several scales, looking for locations that might contain a face. The arbitrator then merges detections from the individual networks and eliminates overlapping detections. The network has retinal connections to its input layer; the receptive fields of the hidden units are shown in figure (4.15). The input image is broken down into smaller pieces of four types: 4 pieces of 10x10-pixel subregions, 16 pieces of 5x5-pixel subregions, 16 overlapping 5x20-pixel horizontal stripes of pixels, and 16 overlapping 20x5-pixel vertical stripes of pixels. The neural network and image access code are implemented as C++ code for a three-layer, fully connected, feedforward neural network, which uses the backpropagation algorithm to tune its weights. Although figure (4.15) shows a single hidden unit for each subregion of the input, these units can be replicated. For the experiments described later, networks with two sets of these hidden units are used. The shapes of these subregions were chosen to allow the hidden units to detect local features that might be important for face recognition. It is important that the input is broken into smaller pieces instead of using complete connections to the entire input. In particular, the horizontal stripes allow the hidden units to detect such features as mouths or pairs of eyes even when the image is translated up or down, and the vertical stripes allow the hidden units to detect such features when the image is translated left or right, while the hidden units with square receptive fields might detect features such as individual eyes, the nose, or the corners of the mouth.

Activation Function: An activation function for a backpropagation net should have several important characteristics: it should be continuous, differentiable, and monotonically non-decreasing. Furthermore, for computational efficiency, it is desirable that its derivative be easy to compute. For the most commonly used activation functions, the value of the derivative (at a particular value of the independent variable) can be expressed in terms of the value of the function itself (at that value of the independent variable). Usually, the function is expected to saturate, i.e. approach finite maximum and minimum values asymptotically. The common activation function is the bipolar sigmoid, which has range (-1,1) and is defined as:

$$f(x) = \frac{2}{1 + \exp(-x)} - 1 \qquad (4.3)$$

with

$$f'(x) = \frac{1}{2}\,[1 + f(x)]\,[1 - f(x)] \qquad (4.4)$$

where f'(x) is the derivative of f(x).
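In code, equations (4.3) and (4.4) are two one-line functions; the sketch below is illustrative. Expressing the derivative in terms of the already-computed function value, as in equation (4.4), avoids a second call to exp() during training.

```cpp
#include <cmath>

// Bipolar sigmoid activation, equation (4.3), with range (-1, 1).
double f(double x)       { return 2.0 / (1.0 + std::exp(-x)) - 1.0; }

// Its derivative expressed in terms of the function value, equation (4.4).
// Note the argument is f(x), not x.
double fPrime(double fx) { return 0.5 * (1.0 + fx) * (1.0 - fx); }
```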


[Figure: the input image is preprocessed (histogram equalization, then wavelet transform compression to 20x20 pixels) and fed to receptive fields of four types (10x10-pixel subregions, 5x5-pixel subregions, 5x20-pixel horizontal stripes, and 20x5-pixel vertical stripes), whose hidden units connect to the output units of the neural network.]

Figure (4.15): The basic algorithm used for face recognition.

4.8.5 The Backpropagation Training Algorithm
The backpropagation learning algorithm involves a forward-propagating step followed by a backward-propagating step. Both the forward and the backward propagation steps are performed for each pattern presentation during training, as shown in figure (4.16). The usual motivation for applying a backpropagation net is to achieve a balance between correct responses to training patterns and good responses to new input patterns (i.e., a balance between memorization and generalization). It is not necessarily advantageous to continue training until the total squared error actually reaches a minimum.

4.8.5.1 Forward Propagation
The forward-propagation step begins with the presentation of an input pattern to the input layer of the network and continues as activation-level calculations propagate forward through the hidden layers. In each successive layer, every processing unit sums its inputs and applies a sigmoid function to compute its output. The output layer of units then produces the output of the network. Backpropagation networks employ a bias unit (sometimes called a threshold unit) as part of every layer except the output layer. This unit has a constant activation value of 1. Each bias unit is connected to all units in the next higher layer, and the weight on this connection is the negative of the threshold; it can be learned in just the same way as the other weights. The bias units provide a constant term in the weighted sum of the units in the next layer, which sometimes improves the convergence properties of the network. The bias unit also provides a "threshold" effect on each unit it targets, acting on the operand of the sigmoid function. This is equivalent to translating the sigmoid curve to the left or to the right. In this way, the bias units provide an adjustable threshold for each target unit.

[Flowchart: start of learning → select random values of weights → select the next training pair (inputs and desired output) → forward propagation: calculate the actual output of the network → calculate the error (error = desired output - actual output) → backward propagation: adjust the weights of the network in a way that minimizes the error → if the error is not acceptable, repeat from the next training pair; otherwise end of learning.]

Figure (4.16): Flowchart of back propagation neural network training steps.


4.8.5.2 Backward Propagation
The backward-propagation step begins with the comparison of the network's output pattern to the target vector, from which the difference, or "error", is calculated. The backward-propagation step then calculates error values for the hidden units and changes for their incoming weights, starting with the output layer and moving backward through the successive hidden layers. In this backpropagation step the network corrects its weights in such a way as to decrease the observed error. A typical backpropagation example might entail hundreds or thousands of training iterations. The nomenclature used in the training algorithm for the backpropagation net is as follows:
x: input training vector, x = (x1, ..., xi, ..., xn).
t: output target vector, t = (t1, ..., tk, ..., tm).
δk: portion of the error-correction weight adjustment for wjk that is due to an error at output unit Yk; also, the information about the error at unit Yk that is propagated back to the hidden units that feed into unit Yk.
δj: portion of the error-correction weight adjustment for vij that is due to the backpropagation of error information from the output layer to the hidden unit Zj.
α: learning rate.
Xi: input unit i.
voj: bias on hidden unit j.
Zj: hidden unit j.
wok: bias on output unit k.
Yk: output unit k.


The error value, denoted by the variable δ, is simple to compute for the output layer and somewhat more complicated for the hidden layers. The amount adjusted depends on two factors: δ and α. This weight adjustment equation is known as the generalized δ rule. The variable α in the weight adjustment equation is the learning rate. Its value (commonly between 0 and 1) is chosen by the neural network's user and usually reflects the rate of learning of the network. Values that are very large can lead to instability in the network and unsatisfactory learning, while values that are too small can lead to excessively slow learning. Sometimes the learning rate is varied in an attempt to produce more efficient learning of the network. Training the backpropagation network requires the following steps (as shown in figure (4.16)):
1. Initialize the weights (set to small random values).
2. Select the next training pair from the training set; apply the input vector to the network input and specify the desired output vector.
3. Calculate the actual outputs of the network. Each input unit (Xi, i = 1, 2, ..., n) receives input signal xi and broadcasts this signal to all units in the layer above (the hidden units):

$$z\_in_j = v_{oj} + \sum_{i=1}^{n} x_i v_{ij} \qquad (4.5)$$

Each hidden unit (Zj, j = 1, 2, ..., p) sums its weighted input signals, applies its activation function to compute its output signal, and sends this signal to all units in the layer above (the output units):

$$z_j = f(z\_in_j) \qquad (4.6)$$

$$y\_in_k = w_{ok} + \sum_{j=1}^{p} z_j w_{jk} \qquad (4.7)$$

Each output unit (Yk, k = 1, 2, ..., m) sums its weighted input signals and applies its activation function to compute its output signal:

$$y_k = f(y\_in_k) \qquad (4.8)$$

4. Calculate the error between the actual output of the network and the desired output (the target vector from the training pair). Each output unit (Yk, k = 1, 2, ..., m) receives a target pattern corresponding to the input training pattern and computes its error information term:

$$\delta_k = (t_k - y_k)\, f'(y\_in_k) \qquad (4.9)$$

It then calculates its weight correction term (used to update wjk later):

$$\Delta w_{jk} = \alpha\, \delta_k\, z_j \qquad (4.10)$$

calculates its bias correction term (used to update wok later):

$$\Delta w_{ok} = \alpha\, \delta_k \qquad (4.11)$$

and sends δk to the units in the layer below. Each hidden unit (Zj, j = 1, 2, ..., p) sums its delta inputs (from the units in the layer above):

$$\delta\_in_j = \sum_{k=1}^{m} \delta_k w_{jk} \qquad (4.12)$$

multiplies by the derivative of its activation function to calculate its error information term:

$$\delta_j = \delta\_in_j\, f'(z\_in_j) \qquad (4.13)$$

calculates its weight correction term (used to update vij later):

$$\Delta v_{ij} = \alpha\, \delta_j\, x_i \qquad (4.14)$$

and calculates its bias correction term (used to update voj later):

$$\Delta v_{oj} = \alpha\, \delta_j \qquad (4.15)$$

5. Adjust the weights of the network in a way that minimizes the error, updating the weights and biases. Each output unit (Yk, k = 1, 2, ..., m) updates its bias and weights (j = 0, 1, ..., p):

$$w_{jk}(new) = w_{jk}(old) + \Delta w_{jk} \qquad (4.16)$$

Each hidden unit (Zj, j = 1, 2, ..., p) updates its bias and weights (i = 0, 1, ..., n):

$$v_{ij}(new) = v_{ij}(old) + \Delta v_{ij} \qquad (4.17)$$

6. Repeat steps 2 through 5 for each vector in the training set until the total error for the entire set is acceptable.
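A compact C++ sketch of steps 3-5 for one pattern is given below, assuming the bipolar sigmoid f and its derivative fPrime from section (4.8.4); the weight layout (row 0 holding the biases) and the function name are illustrative assumptions.

```cpp
#include <cmath>
#include <vector>

static double f(double x)       { return 2.0 / (1.0 + std::exp(-x)) - 1.0; }
static double fPrime(double fx) { return 0.5 * (1.0 + fx) * (1.0 - fx); }

// One training iteration for a single hidden layer, following equations
// (4.5)-(4.17). v is (n+1) x p with v[0][j] = bias v_oj; w is (p+1) x m
// with w[0][k] = bias w_ok; alpha is the learning rate.
void trainPattern(const std::vector<double>& x, const std::vector<double>& t,
                  std::vector<std::vector<double>>& v,
                  std::vector<std::vector<double>>& w, double alpha)
{
    const int n = static_cast<int>(x.size());
    const int p = static_cast<int>(v[0].size());
    const int m = static_cast<int>(t.size());

    // Forward pass: equations (4.5)-(4.8).
    std::vector<double> z(p), y(m);
    for (int j = 0; j < p; ++j) {
        double zin = v[0][j];
        for (int i = 0; i < n; ++i) zin += x[i] * v[i + 1][j];
        z[j] = f(zin);
    }
    for (int k = 0; k < m; ++k) {
        double yin = w[0][k];
        for (int j = 0; j < p; ++j) yin += z[j] * w[j + 1][k];
        y[k] = f(yin);
    }

    // Backward pass: error terms, equations (4.9), (4.12), (4.13).
    std::vector<double> dk(m), dj(p);
    for (int k = 0; k < m; ++k) dk[k] = (t[k] - y[k]) * fPrime(y[k]);
    for (int j = 0; j < p; ++j) {
        double din = 0.0;
        for (int k = 0; k < m; ++k) din += dk[k] * w[j + 1][k];
        dj[j] = din * fPrime(z[j]);
    }

    // Weight and bias updates: equations (4.10)-(4.11), (4.14)-(4.17).
    for (int k = 0; k < m; ++k) {
        w[0][k] += alpha * dk[k];
        for (int j = 0; j < p; ++j) w[j + 1][k] += alpha * dk[k] * z[j];
    }
    for (int j = 0; j < p; ++j) {
        v[0][j] += alpha * dj[j];
        for (int i = 0; i < n; ++i) v[i + 1][j] += alpha * dj[j] * x[i];
    }
}
```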


To improve the training time of the backpropagation algorithm, as well as to enhance the stability of the process, a method called momentum can be used. This method involves adding a term to the weight adjustment that is proportional to the previous weight change: once an adjustment is made, it is "remembered" and serves to modify all subsequent weight adjustments. The adjustment equations are modified to the following:

$$w_{jk}(t+1) = w_{jk}(t) + \alpha\, \delta_k z_j + \mu\,[w_{jk}(t) - w_{jk}(t-1)] \qquad (4.18)$$

or

$$\Delta w_{jk}(t+1) = \alpha\, \delta_k z_j + \mu\, \Delta w_{jk}(t) \qquad (4.19)$$

and

$$v_{ij}(t+1) = v_{ij}(t) + \alpha\, \delta_j x_i + \mu\,[v_{ij}(t) - v_{ij}(t-1)] \qquad (4.20)$$

or

$$\Delta v_{ij}(t+1) = \alpha\, \delta_j x_i + \mu\, \Delta v_{ij}(t) \qquad (4.21)$$

where μ is the momentum parameter, in the range 0 to 1. Momentum allows the net to make reasonably large weight adjustments as long as the corrections are in the same general direction for several patterns, while using a smaller learning rate to prevent a large response to the error from any single training pattern. When using momentum, the net proceeds not in the direction of the gradient, but in the direction of a combination of the current gradient and the previous direction of weight correction.
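In code, equation (4.19) only requires remembering the previous change for each weight; the helper below is a hypothetical sketch (the same rule applied to vij gives equation (4.21)).

```cpp
// Weight update with a momentum term, equation (4.19): the previous
// weight change dwPrev is scaled by mu and added to the new adjustment
// gradTerm (= alpha * delta_k * z_j for an output weight).
double momentumUpdate(double& w, double gradTerm, double& dwPrev, double mu)
{
    double dw = gradTerm + mu * dwPrev;   // equation (4.19)
    w += dw;
    dwPrev = dw;                          // remember for the next step
    return dw;
}
```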

4.8.6 Arbitration among Multiple Networks
To further reduce the number of false recognitions, it is possible to apply multiple networks and arbitrate between their outputs to produce the final decision. Each network is trained using the same algorithm with the same set of face examples. However, because of differences in training, the networks will make different errors. This strategy signals a recognition only if the networks recognize the same face at precisely the same location. Due to the different conditions of the individual networks, they will rarely agree on a false recognition of a face, which allows most false recognitions to be eliminated. Unfortunately, this heuristic can decrease the recognition rate, because a face recognized by only one network will be thrown out. However, it is seen later that the individual networks can all detect roughly the same set of faces. These arbitration heuristics can all be implemented with variants of the threshold algorithm; for instance, arbitration can be implemented by combining the results of every network and applying a threshold. The use of arbitration among multiple networks significantly improves the accuracy of the recognizer. Among the methods of merging overlapping regions where a face was found, the technique used here, "thresholding", involves combining multiple nearby detections into one.

4.8.7 Classification
Classification can be performed by means of the threshold. The threshold for a presented test face decides whether the output belongs to the same person or not; the classification selects the nearest faces. This allows the use of a threshold in order to gain some confidence in the decision: if the output of the closest match is smaller than the selected threshold, the system rejects the test image. Use of a threshold makes it possible to reject faces that do not belong to the trained subjects, which is a very important property for a face recognition system. Therefore, a non-linear classifier is also constructed.
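A minimal sketch of this decision rule follows; the function name is illustrative. With the bipolar outputs used later in this chapter, a threshold of 0 is the natural choice.

```cpp
#include <cstddef>
#include <vector>

// Threshold-based decision: pick the output unit with the largest
// activation and accept it only if the activation exceeds the chosen
// threshold; otherwise the face is rejected as unknown.
// Returns the class index, or -1 for rejection.
int classify(const std::vector<double>& outputs, double threshold)
{
    int best = 0;
    for (std::size_t k = 1; k < outputs.size(); ++k)
        if (outputs[k] > outputs[best]) best = static_cast<int>(k);
    return (outputs[best] > threshold) ? best : -1;
}
```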

4.9 Face Training Images
The first-stage neural network adopted is a conventional feedforward network with one or two hidden layers. It was trained using the backpropagation algorithm with sigmoid and bipolar sigmoid transfer functions. This neural network has many input nodes, many hidden nodes, and as many output nodes as there are individuals to be recognized, encoded on (n) binary outputs to allow for the maximum number of faces to be recognized. The number of output units can be increased to fit a larger number of faces. To give the recognizer some robustness to slight variations in the faces, for the experiments the weights and biases of the network are initialized to random values in [-0.05, 0.05]. The maximum number of training epochs is 3000. To allow comparisons, the same training and test set sizes are used; hence there are 200 training images and 200 test images in total, and no overlap exists between the training and test images. On the first iteration of the training loop, the network's weights are initialized randomly. After the first iteration, the weights computed by training in the previous iteration are used as the starting point. The process is repeated many times for each pattern in the training set, until the total output error converges to the minimum or until some limit is reached in the number of training iterations completed.

4.9.1 Training Mode
Each network sub-net consists of an input layer (100 neurons), one or two hidden layers (the first hidden layer contains 70 neurons and the second hidden layer contains 50 neurons), and an output layer of 6 neurons. Each sub-net is trained on the half-set of images per person. The training algorithm used is the error backpropagation algorithm with a momentum term. The neurons use the bipolar activation function, which gives an output ranging from -1 to 1; hence a threshold of 0 is used for the recognition of a face. Neural network training usually requires training the network many times on its training images; a single pass through 10560000 scenery images not only requires a huge amount of storage, as shown in figure (4.17) (where M = 2^n), but also takes nearly a day on a computer. Therefore, the preprocessing stage compresses the face image to reduce the number of input pixels, to speed up the system and decrease the storage requirements.

4.9.2 Testing Mode
During this mode, the new half-set of images is applied to each of the sub-nets to find a local winner within each network. Then the global winner pattern is found (the winning pattern among the 4 local winners).

[Training set of images 1, 2, 3, ..., M, and a test image: which image in the training set matches the test image best?]

Figure (4.17): Training and testing modes for face recognition system.

The alignment algorithm converts each face to a 20x20-pixel image. Three face examples are generated for the testing set from each original image, by randomly rotating the images (about their center points) by up to 7 degrees, translating up, down, left, or right, mirroring, adding or removing glasses, and varying expression. A few example images are shown in figure (4.18).



Figure (4.18): Examples of original, rotated, non-face, and translated images. (a) & (f) Original images. (b), (c), (d) & (e) Translated images. (g) & (h) Rotated images. (i) & (j) Non-face images.

4.9.3 Non-Face Training Images
A large number of non-face or wrong (rejected) face images are needed to train the face recognizer, because the variety of non-face images is much greater than the variety of face images, as shown in figure (4.18). The number of output nodes depends on the number of classes to be recognized. Since the recognition system is specified for 40 classes, the total number of output nodes in the network structure is 41; the appended node is for the rejection of unknown faces that do not belong to the 40 classes. When the coded output nodes of the neural network are used, a rejected image gives a specified code that is not similar to the code of any of the 40 known classes. A disadvantage of the neural network is the probability of wrongly classifying unknown images into the known classes. Therefore, to alleviate this problem, the network is trained on a larger number of these rejected faces.


Chapter Five
Simulation and Experimental Results
5.1 Introduction
A central characteristic of the face recognition problem is that the number of different targets or faces that the system has to cope with is, at least in principle, unlimited. Thus the problem cannot be solved by straightforward matching or database searches. Each face has to be classified to one of the classes. The system is designed based on a sample of typical faces representing different classes, and hence the system must be able to classify new, unknown faces with minimum error. This is often called generalization. For the face recognition system, several persons' face images were used to measure its performance. The tests are made on different images for each person: the half-set of images is used for training, and the new half-set of images for testing. There was no overlap between the training and test sets. To evaluate the performance of any neural network recognition system, the accuracy of the system can be calculated as:

$$\text{Accuracy} = \frac{\text{Number of correctly classified patterns}}{\text{Total number of patterns}}$$

The proposed system is implemented using computer simulation with the C++ programming language and Visual Basic.

5.2 System User Interface
According to the BMP image format, the digital image header and palette represent the first 1078 bytes of the original BMP image (the header (54 bytes) plus the palette (1024 bytes)).
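The fixed 1078-byte offset means the pixel data of an 8-bit BMP can be reached by a simple seek. The following C++ sketch assumes the fixed 160x160 image size used in this work; note that BMP rows are stored bottom-up, which a full reader would also account for. The function name is illustrative.

```cpp
#include <cstdio>
#include <vector>

// Reads the pixel data of an 8-bit grayscale BMP by skipping the first
// 1078 bytes (54-byte header plus 1024-byte palette), as described above.
std::vector<unsigned char> readBmpPixels(const char* path)
{
    const long headerAndPalette = 54 + 1024;           // = 1078 bytes
    std::vector<unsigned char> pixels(160 * 160);
    if (FILE* fp = std::fopen(path, "rb")) {
        std::fseek(fp, headerAndPalette, SEEK_SET);    // skip to pixel data
        std::size_t got = std::fread(pixels.data(), 1, pixels.size(), fp);
        (void)got;                                     // size check omitted
        std::fclose(fp);
    }
    return pixels;
}
```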


In this research, a Digital Image-Processing Package (DIPP) for the user interface has been implemented using Visual Basic V.6. Using the Visual Basic language to design and write the computer program has the following merits:
1. It provides a user-friendly environment for designing the user interface and menus.
2. There is almost no limitation on the computer memory needed to hold the image, owing to the virtual-memory concept.
3. Since all Windows languages deal with objects, there is no need to read the image header to reach the image data, i.e. no information about the image format is required. Instead, Visual Basic objects (such as the picture box object) offer an easy way to access the image data of any digital image format.
Figure (5.1) shows the user interface of the system, which consists of the command buttons, the test-face image (make test image), the search process (search), and the results of processing (the recognized face image, execution time, search date, name, sex, age, and work place) for the recognized face from the database, as well as the lookup table of face image information. Figure (5.2) illustrates the flow chart of the DIPP user interface together with the processing steps of the proposed face recognition system and the language used to implement each step. The test image is compared with the most similar images encountered in the database. The similarity is judged against a threshold, so that face images whose similarity falls below it are not identified as belonging to any of the trained subjects. The judgment process identifies the recognized face image: if the tested face image belongs to the database, the system loads and displays the information of the recognized face; if it does not (a new or rejected face image), the system displays the message "undefined face in database", as shown in figures (5.1c) and (5.1d).


Figure (5.1): Face recognition system user interface. (a) Splash form. (b) Main form of recognized face image.


Figure (5.1): Continued. (c) Main form of not-recognized face image (new face image). (d) Main form of not-recognized face image (rejected image).

Figure (5.2): Flow chart of the DIPP system user interface with the processing steps of the proposed face recognition system. (Visual Basic handles the start of processing, displaying the splash and main forms, the make-test-image and search command buttons, loading the information of the recognized face image from the lookup table, displaying the recognized face image from the database, and displaying the execution time and search date; the C++ language handles the histogram equalization, wavelet transform, network formation, and judgment operations.)

5.3 Results of Testing Images
In the following sections the results of the system implementation are presented. In all the experiments the image size is 160x160 pixels with 256 gray levels.

5.3.1 Test of Image with Glasses
In the training phase the system has learned pictures randomly with and without glasses, as shown in figure (5.3). In the test phase images with and without glasses were presented and the system recognized them correctly. Nevertheless, images of persons wearing glasses were the ones most often missed, because of light reflected into the camera: the recognizer is not trained on such images and expects the eyes to be darker than the rest of the face, so the recognition rate for such faces is lower.

Figure (5.3): Test faces with and without glasses. (a) & (b) Training images. (c) & (d) Testing images.


5.3.2 Test of Image with Noise
Random noise is an important topic in digital image processing, so the performance of the proposed algorithms must be tested in the presence of noise. In this work Adobe Photoshop was used to create arbitrarily distributed noise and add it to an image. Figure (5.4) shows the effect of adding noise to an image.

Figure (5.4): Test images with / without noise. (a) Test image without noise. (b) Test image with grain noise (SNR = 27 dB). (c) Test image with grain noise after enhancement. (d) Histogram of original image. (e) Histogram of noisy image. (f) Histogram of enhanced image. (g) Test image with medium diffuse-glow distortion (SNR = 8 dB). (h) Test image after enhancement. (i) Histogram of noisy image. (j) Histogram of enhanced image. (k) Test image with heavy diffuse-glow distortion (SNR = 5 dB). (l) Test image after enhancement. (m) Histogram of noisy image. (n) Histogram of enhanced image. (Each histogram plots the number of pixels against the gray value, 0-255.)

The histogram equalization option performs digital image enhancement using the histogram equalization technique. Figure (5.4) shows the input image and the enhanced image together with sketches of their histograms. A valuable property of neural networks is their ability to recognize noisy images: a test on a noisy image was made and the system recognized it successfully, as shown in figure (5.4). This demonstrates the noise immunity that is well known for neural networks in general, and that holds in particular for the networks designed in this thesis, as is clear from the noisy test face images.
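A minimal sketch of the histogram equalization step for an 8-bit image is given below (the standard technique; the variable names are illustrative):

    // Histogram equalization of an 8-bit grayscale image with n pixels.
    void equalize(unsigned char* img, int n)
    {
        long hist[256] = {0};
        for (int i = 0; i < n; ++i) ++hist[img[i]];

        // cumulative distribution, mapped back to the 0..255 range
        unsigned char map[256];
        long cum = 0;
        for (int g = 0; g < 256; ++g) {
            cum += hist[g];
            map[g] = (unsigned char)((255L * cum) / n);
        }
        for (int i = 0; i < n; ++i) img[i] = map[img[i]];
    }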

5.3.3 Test of Image with Orientations (pan, tilt & translate)
The system was tested with images having some side-to-side rotation, translation, and tilting, as shown in figure (5.5). The system is invariant to approximately 7 degrees of tilt from frontal upright (both clockwise and counterclockwise), i.e. it is able to recognize images rotated by up to about 7 degrees. The results were therefore very encouraging.


Figure (5.5): Test of face images with some rotation and translation. (a) Original test image. (b) Test image rotated 7 degrees clockwise. (c) Test image rotated 7 degrees counterclockwise. (d) Test image translated left. (e) Test image translated down. (f) Test image translated up.

5.3.4 Test of Image with Intensity Variations
The face recognition system shows the ability to recognize images with some variations in intensity, as shown in figure (5.6). This ability is due to the normalization applied to the data entering the neural network: instead of a binary or bipolar representation, the inputs are mapped to values between zero and one, which keeps them close to one another.
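The normalization amounts to scaling the 0-255 gray values into [0,1] before presenting them to the network, as in the following sketch:

    // Scale 8-bit gray values into [0,1] for the network input layer.
    void normalize(const unsigned char* pix, double* in, int n)
    {
        for (int i = 0; i < n; ++i)
            in[i] = pix[i] / 255.0;   // values land close together in [0,1]
    }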


Figure (5.6): Test of face images with intensity variation. (a) Training image. (b), (c) and (d) Testing images.

5.3.5 Test on Images with Open/Closed Eyes
An experiment with a face having closed eyes was made; in spite of the fact that the image is degraded, the proposed system still shows the ability to recognize it. This is presented in figure (5.7).


Figure (5.7): Test of face images with open / closed eyes. (a) Training images. (b) Testing images.

5.4 Additional Experiments
An investigation of system performance covering all conditions of human face recognition has been conducted, namely face recognition under:
1. controlled conditions and size variation,
2. varying lighting conditions,
3. varying facial expression, and
4. varying pose.
To examine the system performance under controlled/ideal conditions and head pose variations, a database containing frontal views of 40 people was used. Each person has 10 images with different head pose variations, controlled/ideal conditions, varying lighting conditions, and facial expressions (normal, smile, sleepy, and surprised). The database thus contains 400 face images, without any restrictions on clothing, glasses, make-up, hairstyle, etc.

5.4.1 Face Recognition under Controlled/Ideal Condition and Size Variation
A sensitivity analysis to size variation was conducted using the database, applying the best level of reduction to obtain an acceptable image size of 160x160 pixels. The training set should include multiple images per person with some variation to obtain better performance; here, however, only one image per person was used for training and the others were used for testing. Another consideration is that the difference between two images of the same face is larger in the first database than in the second and third databases; in particular, the illuminations of the input and the model are slightly different (figure (5.8)). The face images under controlled conditions in the database were used to evaluate the performance of the proposed approach. Two example pairs of the face images are illustrated in figure (5.8).

Figure (5.8): An example pair of testing faces. The two faces were taken with a two-week interval.


5.4.2 Face Recognition under Varying Lighting Conditions
Ideally, a face representation employed for recognition should be invariant to lighting variations, although it has been shown theoretically that, for the general case, a function invariant to illumination does not exist. It can be shown that the proposed system on a smooth background is stable under changes in the lighting direction. Figure (5.9) shows sample corrupted images of the model (upper leftmost) and of test faces under varying lighting, for which the representation proves insensitive to illumination. The experiment was designed using face images taken under different lighting conditions (figure (5.9)). The upper leftmost face image, in natural face expression with natural background illumination, was used as the single model for each subject, and the images under three different lighting conditions were used as test images. The experimental results for the three lighting conditions are given in table (5.1); there are 120 test images in total. These experiments reveal a number of interesting points:
1. In all the experiments, the system consistently performed with better accuracy, with an improvement of 3 percent in recognition rate.
2. The system performance was affected by the variations in lighting conditions; nevertheless, the recognition rates of the approach remained high and acceptable.
3. The recognition rates with the right-side light on were always higher than with the left-side light on. This could be because the illumination on the faces from the left light was slightly stronger than that from the right light.
4. When both lights were on, the error rates became much lower than with only one light on: the shape information of the face is rendered more clearly, which decreases the classification error rate.

Figure (5.9): Sample corrupted images of model and test faces under varying lighting. (a) Training images. (b) Testing image, left light on. (c) Testing image, right light on. (d) Testing image, both lights on.

Table (5.1): Recognition results under varying lighting.

Testing faces       Recognition rate
Left light on       89.4%
Right light on      90.2%
Both lights on      92.2%

5.4.3 Face Recognition under Facial Expression Changes
Similar experiments were conducted to evaluate the effect of different facial expressions (normal, smile, sleepy, and surprised) on the system performance. The faces in neutral expression (the upper leftmost image in figure (5.10)) were used as the single models of the subjects. In total there were 40 models (representing 40 individuals) and 120 test images. The experimental results for faces with smiling, sleepy, and surprised expressions are summarized in table (5.2). The smiling expression caused the recognition rate to drop by 9.5 percent compared with the neutral expression, while the surprised expression caused only a 6 percent drop, because the surprised expression produces less physical variation from the neutral expression than smiling does. The sleepy expression could be the extreme case of deformation among the various human facial expressions, i.e. most facial features are distorted, producing heavily corrupted images. Figure (5.10) displays samples of 3 corrupted images of one subject from the database.

Table (5.2): Recognition results under different facial expressions.

Testing faces           Recognition rate
Smiling expression      83.5%
Surprised expression    87%
Sleepy expression       79.5%


Figure (5.10): Sample corrupted faces used in the experiment under facial expression changes. (a) Training image. (b) Testing image of smiling expression. (c) Testing image of surprised expression. (d) Testing image of sleepy expression.

The various facial features could be ranked according to their importance in recognizing faces, and separate modules could be introduced for the various parts of the face; for example, the eye, nose, and mouth regions give very good performance when a strategy is focused on precisely these regions.

5.4.4 Face Recognition under Varying Poses
The face database is used to evaluate the system performance on face images of different poses. The system was tested using the four poses looking right, left, up, and down for each person, as shown in figure (5.11); there are 160 test images in total. The recognition results are summarized in table (5.3). It can be observed that pose variation degrades the recognition rate more than the other factors considered.


Table (5.3): Face recognition results under different pose variations.

Testing faces        Recognition rate
Looks right / left   70%
Looks up             74%
Looks down           73.5%

Figure (5.11): Sample faces used in the experiment under varying poses. (a) Training image. (b) Testing looks-right image. (c) Testing looks-left image. (d) Testing looks-up image. (e) Testing looks-down image.

5.5 Experiments on Variation of System Components
The results have been extensively compared with other state-of-the-art face recognition algorithms. Owing to space limitations, only one parameter is varied in each case and only a brief summary of these experiments is presented.


5.5.1 Variation on the Number of Nodes for Single Hidden Layer
The network was fully connected and consisted of a single hidden layer; the results are shown in table (5.4). Correct recognition was 87.3% with 60 neurons in the hidden layer and 89% with 70 hidden neurons. To reduce the classification error and exclude the mistaken acceptance of unknown faces, the output threshold is varied.

Table (5.4): Accuracy of face recognition system with varying number of nodes for the single hidden layer: five test images per person.

No. of nodes for single hidden layer    60       70
Accuracy                                87.3%    89%

5.5.2 Variation on the Number of Hidden Layers
Hidden layers (hidden neurons) play a critical role in backpropagation learning because they act as feature detectors. As the learning process progresses, the hidden neurons gradually discover the salient features that characterize the training data. For most applications a single hidden layer is sufficient, but difficult learning tasks can sometimes be simplified by increasing the number of internal layers; for a complex mapping, two hidden layers may give better generalization and may make training easier than a single hidden layer. Table (5.5) shows the accuracy of the system as the number of hidden layers in the network is varied. The network size was 70 nodes for the first hidden layer and 50 nodes for the second. As expected, the performance increased with the two-hidden-layer network.


Table (5.5): Accuracy of face recognition system with varying number of hidden layers in the network subnets: five test images per person.

No. of hidden layers    One hidden layer    Two hidden layers
Accuracy                89%                 93%
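For reference, the forward pass through such a 100-70-50-6 structure can be sketched as follows (sigmoid activations are assumed here, as is standard for backpropagation networks; this is an illustration, not the thesis code):

    #include <cmath>

    double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Forward pass of one layer: out[j] = f( b[j] + sum_i w[j*nIn+i]*in[i] ).
    void layer(const double* in, int nIn, const double* w, const double* b,
               double* out, int nOut)
    {
        for (int j = 0; j < nOut; ++j) {
            double s = b[j];
            for (int i = 0; i < nIn; ++i)
                s += w[j * nIn + i] * in[i];
            out[j] = sigmoid(s);
        }
    }
    // Chaining three calls, 100 inputs -> 70 -> 50 -> 6 outputs, yields
    // the 6-element class code discussed in section 5.5.5.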

5.5.3 Variation on the Number of Nodes for Second Hidden Layer
Table (5.6) shows the accuracy of the system as the number of nodes in the second hidden layer is varied, with the first hidden layer fixed at 70 nodes. From these sections it is clear that the best performance is obtained with the two-hidden-layer network (70 and 50 nodes respectively).

Table (5.6): Accuracy of face recognition system with varying number of nodes for the second hidden layer in the network subnets: five test images per person.

No. of nodes for second hidden layer    30     40     50
Accuracy                                65%    81%    93%

5.5.4 Effect of Learning Rate and Momentum
In backpropagation with momentum, the weight change is in a direction that combines the current gradient and the previous gradient. Momentum is useful when some training data are very different from the majority of the data: it is desirable to use a small learning rate to avoid a major disruption of the direction of learning when an unusual pair of training patterns is presented, yet it is also preferable to keep training at a fairly rapid pace as long as the training data are relatively similar. When the learning rate is low, the network adjusts its weights gradually, but convergence may be slow; with a high learning rate, the network can make drastic changes that are undesirable in a nearly trained network, although this is not a problem when starting from random weights. The momentum term adds to each weight adjustment a contribution proportional to the previous weight change. Figure (5.12) shows the effect of α on the maximum number of iterations needed and the minimum error obtained when μ is kept at its nominal value (0.3). Figure (5.13) shows the effect of μ when α is kept at its nominal value (0.3).
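In sketch form, the update rule is dw(t) = -α (∂E/∂w) + μ dw(t-1), which can be written as (variable names are illustrative):

    // Backpropagation weight update with momentum:
    //   delta(t) = -alpha * dE/dw + mu * delta(t-1)
    void updateWeights(double* w, double* prevDelta, const double* grad,
                       int n, double alpha, double mu)
    {
        for (int i = 0; i < n; ++i) {
            double delta = -alpha * grad[i] + mu * prevDelta[i];
            w[i] += delta;
            prevDelta[i] = delta;    // remembered for the next iteration
        }
    }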

Figure (5.12): Effect of α on maximum iteration needed and minimum error obtained when keeping μ constant. (Two plots: error and number of iterations versus learning rate, 0 to 1.)

Figure (5.13): Effect of μ on maximum iteration needed and minimum error obtained when keeping α constant. (Two plots: error and number of iterations versus momentum parameter, 0 to 1.)

5.5.5 Generalization of the Network
The number of connections between two layers is found by multiplying the numbers of neurons in the two layers and then adding the number of bias connections of the second layer (the bias connections of a layer equal the number of its neurons). For a one-hidden-layer network with Nn nodes in the input layer, Np nodes in the hidden layer and Nm nodes in the output layer, the total number of connections is:

Size of Network = [(Nn*Np)+Np] + [(Np*Nm)+Nm]
Capacity = Size of Network / Nm


Generalization = Number of patterns / Capacity

For Nn = 100, Np = 70, Nm = 6:
Size of Network = 7496, Capacity = 1249.333
For the 10*10 sub-net the number of patterns is 880, so Generalization = 0.704.
For the 5*5 sub-net the number of patterns is 3520, so Generalization = 2.817.
For the 5*20 sub-net the number of patterns is 3520, so Generalization = 2.817.
For the 20*5 sub-net the number of patterns is 3520, so Generalization = 2.817.

For two hidden layers, with Nn nodes in the input layer, Np1 nodes in the first hidden layer, Np2 nodes in the second hidden layer and Nm nodes in the output layer, the total number of connections is:

Size of Network = [(Nn*Np1)+Np1] + [(Np1*Np2)+Np2] + [(Np2*Nm)+Nm]
Capacity = Size of Network / Nm
Generalization = Number of patterns / Capacity

For Nn = 100, Np1 = 70, Np2 = 50, Nm = 6:
Size of Network = 10926, Capacity = 1821
For the 10*10 sub-net the number of patterns is 880, so Generalization = 0.5.
For the 5*5 sub-net the number of patterns is 3520, so Generalization = 1.94.
For the 5*20 sub-net the number of patterns is 3520, so Generalization = 1.94.
For the 20*5 sub-net the number of patterns is 3520, so Generalization = 1.94.

Since generalization is proportional to the number of patterns and inversely proportional to the size of the network, the above results show that the 10*10 sub-net has the lowest generalization, and it is therefore given the least weight in the judgment process of recognition.
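These figures can be reproduced with the short sketch below (illustrative code, not part of the thesis package):

    #include <cstdio>

    // Connections of a fully connected layer pair, including biases.
    long conns(int from, int to) { return (long)from * to + to; }

    int main()
    {
        // Two-hidden-layer case: 100-70-50-6
        long size = conns(100, 70) + conns(70, 50) + conns(50, 6); // 10926
        double capacity = size / 6.0;                              // 1821
        double gen10x10 = 880.0 / capacity;                        // ~0.5
        double gen5x5   = 3520.0 / capacity;                       // ~1.94
        std::printf("size=%ld capacity=%.0f g(10x10)=%.2f g(5x5)=%.2f\n",
                    size, capacity, gen10x10, gen5x5);
        return 0;
    }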


5.6 Recognition Results
The test set consists of images that were not part of the training set. The initial size of the images before undergoing the transformation is 25600 pixels, and the results are shown in table (5.7). The recognition time is the time required to obtain an output classification when an image is applied to the network input. Since the network designed in this work is implemented with special-purpose programming, the recognition time depends on the efficiency of the recognition program, the programming language, and the speed of the computer used; it also depends on the network structure.

Table (5.7): Results of proposed system for each stage.

Size of images after transformation (pixels):   100
Level of spatial resolution reduction:          1
Percentage of successful recognition:           13%
Recognition time (sec), 550 MHz:                1.52
Recognition time (sec), 1000 MHz:               1.13

5.6.1 Recognition Results with Variation of the Number of Training Images per Person
Table (5.8) shows the results of varying the number of images per class in the training set from 1 to 5. Very good results are obtained when the system is tested with different images. These results were obtained using three database sets of different images for each person, in order to study the effect on system performance.


Table (5.8): Error rate as the size of the training set is varied from 1 to 5 images per person, averaged over two different selections of the training and test sets.

Images per person    1      3      5
Error rate           55%    32%    7%

5.6.2 Recognition Results with Variation of the Number of Output Classes
Table (5.9) and figure (5.14) show the error rate of the system as the number of classes is varied from 10 to 20 to 40. No attempt was made to optimize the system for the smaller numbers of classes. As expected, performance improves when there are fewer classes to discriminate between. Increasing the number of classes demands a larger number of varying images per class in the training models, which in turn requires a larger network structure for acceptable performance.

Table (5.9): Error rate of the face recognition system with varying number of classes (subjects).

Number of classes    10      20      40
Error rate           1.5%    4.3%    5.8%

Figure (5.14): Test error (%) versus number of classes.

5.6.3 Recognition Results as an Arbitrating among Networks
It is interesting to note that the system generally achieves a higher recognition rate when arbitrating among four networks than with the other types of arbitration, as shown in table (5.10). Most of the missed faces show variability due to rotation, translation, scaling, expression, and lighting conditions, as well as glasses reflecting light into the camera; the recognizer is not trained on such images, so the recognition rate for some faces is lower when only some of these arbitrations are used.

Table (5.10): Recognition rate with varying arbitration output.

Type                                                                  Recognition rate
Output of one network (10*10)                                         18%
Arbitrating among two networks (10*10) & (5*5)                        40%
Arbitrating among three networks (10*10) & (5*5) & (5*20)             72%
Arbitrating among four networks (10*10) & (5*5) & (5*20) & (20*5)     93%
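The arbitration can be sketched as a weighted vote among the four subnets (an illustrative sketch only; the weighting and the majority rule are assumptions here, since section 5.5.5 only establishes that the 10*10 sub-net carries the least weight):

    // Combine the class votes of the four subnets into one decision.
    // votes[k] is the class chosen by subnet k (-1 = rejected);
    // weight[k] reflects each subnet's generalization (10x10 lowest).
    int arbitrate(const int votes[4], const double weight[4], int nClass)
    {
        double score[64] = {0};          // assumes nClass <= 64 (here 40)
        for (int k = 0; k < 4; ++k)
            if (votes[k] >= 0 && votes[k] < nClass)
                score[votes[k]] += weight[k];

        int best = -1;
        double bestScore = 0.0;
        for (int c = 0; c < nClass; ++c)
            if (score[c] > bestScore) { bestScore = score[c]; best = c; }

        // Reject unless the winning class gathers a clear majority.
        double total = weight[0] + weight[1] + weight[2] + weight[3];
        return (bestScore > total / 2.0) ? best : -1;
    }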

5.6.4 Recognition Approaches based on Different Databases
The recognition rates of the best models and their training/recognition times (where available) are shown in table (5.11). For comparison, the performance of the network applied to similarly reduced images is also shown. The network has two hidden layers with 70 and 50 hidden neurons; the numbers of input and output neurons are 100 and 6 respectively. The output neurons represent the codes of all classes as well as the rejected class.

Table (5.11): Performance comparison of recognition applied to the different databases.

Based approach of database    Recognition rate    Training time (hours)    Recognition time (second)
First database                45%                 0.5                      1.13
Second database               68%                 2                        1.13
Third database                93%                 23                       1.13


Chapter Six
Conclusions and Suggestions for Future Work

6.1 Conclusions
This work has shown that the proposed concept provides a new way of performing human face recognition that is robust to variations in image conditions such as lighting changes and size variations. The performance of the proposed face recognition technique can be considered superior to the well-known methods in most of the comparison experiments. From the previous simulation results and discussions, some remarks on the behaviour and performance of the suggested face recognition system can be reported. A summary of the most important conclusions follows:
1. The system provides a user-friendly environment for online interaction with the computer.
2. It deals with grayscale face images instead of colour information. Colour images were avoided for two reasons: first, humans can easily locate faces in grayscale images, so it was interesting to see if a computer could do the same; second, and more pragmatically, colour would increase the number of inputs to the neural networks, making them slower and requiring more training examples to train and generalize correctly. However, given appropriate training data, this additional data source might be valuable.
3. This work has demonstrated that face image preprocessing is indeed a practical way to speed up image searching by carefully selecting/generating proper representation features. This is believed to be the first piece of work on face image preprocessing.


4. The wavelet transform used for feature extraction shows robustness against minor changes in the image sample (i.e. orientation, scaling, lighting conditions, expression and noise).
5. The HL and LH orientations contain more energy than the HH subband. The coefficients in the HH orientation are less important, since they correspond to patterns that can hardly be seen by the human eye. In addition, the first resolution level contains less energy than the second, which contains less energy than the third, and so on.
6. The coarsest level, orientation LL, consists of a wider range of larger positive coefficients than the other orientations. This orientation contains the most important coefficients, so great care should be given to this subband.
7. The face recognition system requires a considerable amount of time to train the subnets so as to choose the appropriate weight vectors that recognize each class successfully.
8. The neural network provides partial invariance to translation, rotation, scale, and deformation.
9. The face recognition system needs to learn more than one image per person in order to increase the recognition ability.
10. The generalization values were calculated for each sub-net. It was found that good generalization requires more than one image per person (to increase the number of patterns) while the size of the network stays constant. The networks are therefore trained to achieve a balance between the ability to respond correctly to the input patterns used for training (memorization) and the ability to give reasonable (good) responses to input that is similar, but not identical, to the training input (generalization).
11. Experiments on frontal faces under controlled/ideal conditions indicate that the proposed system consistently identifies the input frontal faces correctly on the face databases. Moreover, the proposed approach is much more robust to size variation and superior under slight appearance variations.
12. The error rate of the proposed method is 1.5% - 5.8%, lower than that of the other methods. The improvement is due to the combined network and its ability to expand the representational capacity of the available sets and to account for new conditions not represented by the original face images.
13. To ensure that the random selection of training and test sets does not affect the performance of the system, it was tested using two randomly chosen divisions of training and test sets. In both experiments the correct recognition remained at 93%.
14. The system easily handles incremental updates to the face recognition database.
15. The neural network structure with two hidden layers provides better recognition results than one hidden layer.
16. The major difficulty was in taking pictures. When the system cannot recognize a person this does not mean that the system is not good; the image may simply be too difficult to recognize.
17. The proposed system is a novel compact face feature representation. It is less sensitive to illumination changes and requires less storage space for faces. It is a very encouraging finding that the proposed face recognition approach can achieve higher recognition accuracy than the other approaches with much less storage requirement and computational time.
18. The work presented in this thesis achieves 93 percent correct recognition. In particular it was found to be superior to other methods that have been used on the ORL database of facial images, among them a system using Hidden Markov Models [Samaria, 1993][21] and a principal components analysis approach termed eigenfaces [Turk and Pentland, 1991][16].
19. The network is trained on a large number of rejected-face and non-face images to reduce the probability of wrongly classifying unknown images into the known classes.
20. No exhaustive effort was made to optimize the execution time; although training takes several hours, the complete testing takes 1.13 seconds on a PC with a 1000 MHz CPU.
21. Referring to the literature survey, the face recognition system has been compared with other research activities in which some aspects had been overlooked:
1. The work includes the system user interface.
2. The work includes the effect of glasses.
3. The system can recognize faces with scale variations.
4. The system can recognize faces with intensity variations.
5. The system can recognize faces with some noise.
6. The system can recognize faces with orientations of less than 8 degrees.
7. The work includes the effect of smiling.
8. The system recognizes faces even if the person has closed his eyes.
9. The work includes the effect of generalization.
10. The work includes the effect of system component variation on the results.

6.2 Suggestions for Future Work
During the development of the work presented in this thesis, many recommendations have come to mind. In this context some ideas may be considered for further work on the proposed method:
1. A combination of the holistic approach presented here with a method based on the extraction of geometrical features might lead to a further increase in reliability.

2. Optimize the MLP neural network structure using a genetic algorithm.
3. Try another feature extraction method, such as SOFM, and compare the results with the wavelet approach.
4. An interesting extension of the thesis work would be to recognize colour images.
5. Future work may involve applying the presented work to three-dimensional images (movies).
6. Another interesting extension would be to join the software algorithms developed in this study with a hardware implementation of the face recognition system, leading to a very fast system with the ability to control building access. The problem is then reduced to a simple yes-or-no question: either known or unknown.


References
[1] Y. Gao and M. K. H. Leung, “Face recognition using line edge map”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 6, June 2002.
[2] Steve Lawrence, C. Lee Giles, Ah Chung Tsoi, and Andrew D. Back, “Face recognition: A convolutional neural network approach”, IEEE Transactions on Neural Networks, Special Issue on Neural Networks and Pattern Recognition, Vol. 8, No. 1, pp. 98-113, 1997.
[3] M. S. Bartlett, G. Donato, J. R. Movellan, J. C. Hager, P. Ekman and T. J. Sejnowski, “Image representations for facial expression coding”, supported by NIH Grant No. 1F32 MH12417-01.
[4] M. S. Bartlett and T. J. Sejnowski, “Independent components of face images: A representation for face recognition”, Institute for Neural Computation, UCSD 0523, La Jolla, CA 92093.
[5] R. Chellappa, C. Wilson and S. Sirohey, “Human and machine recognition of faces: A survey”, Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-740, 1995.
[6] Z. Liposcak and S. Loncaric, “Face recognition from profiles using morphological operations”, Electrical Engineering, University of Zagreb, Unska 3, Zagreb, Croatia, 2000.
[7] Ying Dai and Yasuaki Nakano, “Recognition of facial images with low resolution using a Hopfield memory model”, Pattern Recognition, Vol. 31, pp. 159-167, 1998.
[8] Stan Li and Juwei Lu, “Face recognition using the nearest feature line method”, IEEE Transactions on Neural Networks, Vol. 10, No. 2, pp. 439-443, 1999.
[9] D. Anifantis, E. Dermatas and G. Kokkinakis, “A neural network method for accurate face detection on arbitrary images”, Electrical & Computer Engineering Department, University of Patras, Kato Kastritsi, anifant, [email protected].
[10] A. Goldstein, L. Harmon and A. Lesk, “Identification of human faces”, Proceedings of the IEEE, Vol. 59, pp. 748-, 1971.
[11] R. Brunelli and T. Poggio, “Face recognition: Features versus templates”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042-1052, October 1993.
[12] Ingemar J. Cox, Joumana Ghosn, and Peter N. Yianilos, “Feature-based face recognition using mixture-distance”, in Computer Vision and Pattern Recognition, IEEE Press, 1996.
[13] M. S. Bartlett, T. J. Sejnowski, G. L. Movellan and J. R. Ekman, “Automating the facial action coding system: Issues and image representations”, NIPS Conference on Affective Computing, Breckenridge, 2000.
[14] L. Sirovich and M. Kirby, “Low dimensional procedure for the characterization of human faces”, Journal of the Optical Society of America, Vol. 4, No. 3, pp. 519-524, 1987.
[15] M. Kirby and L. Sirovich, “Application of the Karhunen-Loeve procedure for the characterization of human faces”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, pp. 831-835, 1990.
[16] M. Turk and A. Pentland, “Eigenfaces for recognition”, Journal of Cognitive Neuroscience, Vol. 3, pp. 71-86, 1991.
[17] A. Pentland, T. Starner, N. Etcoff, O. Oliyide, and M. Turk, “Experiments with eigenfaces”, International Joint Conference on Artificial Intelligence, Chambery, France, 1993.
[18] B. Moghaddam and A. Pentland, “Face recognition using view-based and modular eigenspaces”, in Automatic Systems for the Identification and Inspection of Humans, SPIE, Vol. 2257, 1994.
[19] A. Pentland, B. Moghaddam and T. Starner, “View-based and modular eigenspaces for face recognition”, in IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[20] Hagen Spies and Ian Ricketts, “Face recognition in Fourier space”, Department of Applied Computing, University of Dundee, Dundee DD1 4HN, UK, 2000, [email protected].
[21] F. Samaria and F. Fallside, “Face identification and feature extraction using hidden Markov models”, Image Processing Theory and Applications, 1993.
[22] M. Lades, J. Vorbruggen, J. Buhmann, J. Lange, R. Wurtz and W. Konen, “Distortion invariant object recognition in the dynamic link architecture”, IEEE Transactions on Computers, Vol. 42, No. 3, pp. 300-311, 1993.
[23] Kumpati S. Narendra and P. Kannan, “Identification and control of dynamical systems using neural networks”, IEEE Transactions on Neural Networks, Vol. 1, No. 1, pp. 4-27, March 1990.
[24] Kwok-Wo Wong, Sheng-Jiang Chang and Chi-Sing Leung, “Handwritten digit recognition using trace neural network with EKF training algorithm”, 2000, [email protected].
[25] T. J. Stonham, “Practical face recognition and verification with WISARD”, Aspects of Face Processing, pp. 426-441, 1984.
[26] J. Weng, J. Huang and N. Ahuja, “Learning recognition and segmentation of 3-D objects from 2-D images”, Proceedings of the IEEE International Conference on Computer Vision, pp. 121-128, 1993.
[27] Christer Jahren, Clark S. Lindsey, Thomas Lindblad, and Kare Osterud, “Eye identification for face recognition with neural networks”, Pattern Recognition, pp. 150-163, 1996.
[28] S. H. Lin, S. Y. Kung, and L. J. Lin, “Face recognition / detection by probabilistic decision-based neural network”, IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 114-131, 1997.
[29] Zhengjun Pan and Hamid Bolouri, “High speed face recognition based on discrete cosine transforms and neural networks”, Science & Technology Research Centre, University of Hertfordshire, Hatfield, Herts, AL10 9AB, UK, 1999, Z.pan, [email protected].
[30] Haitham F. Ibrahem, “Human face recognition by computer”, M.Sc. Thesis, Saddam University, 1999.
[31] T. Sim, R. Sukthankar, M. Mullin, and S. Baluja, “Memory-based face recognition for visitor identification”, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, 2000, tsim, rahuls, mdm, [email protected].
[32] D. Valentin, H. Abdi and J. Toole, “Principal component and neural network analysis of face images: Explorations into the nature of information available for classifying faces by sex”, University of Texas at Dallas, MS: GR 4.1, Richardson, USA, 1999.
[33] Nagam K. Irzoqi, “Neural network for face recognition”, M.Sc. Thesis, University of Technology, 2000.
[34] Shefa A. Dawood, “Face recognition using neocognitron neural network”, M.Sc. Thesis, University of Mosul, 2000.
[35] M. S. Bartlett, G. Donato, J. R. Movellan, P. Ekman and T. J. Sejnowski, “Automating the facial action coding system: Issues and image representations”, NIPS Post-Conference Workshop on Affective Computing, Breckenridge, CO, December 2000.
[36] R. S. AL-Huwayzi, “Design of image filtering system based on color model”, M.Sc. Thesis, National Computer Center, Institute of Higher Studies in Computer and Informatics, 2002.
[37] Scott E. Umbaugh, “Computer vision and image processing”, Prentice Hall PTR, 1998.
[38] R. Gonzalez and R. Woods, “Digital image processing”, Addison-Wesley, 1992.
[39] R. Z. Al-Macdici, “Designing security keys for satellite image encryption”, M.Sc. Thesis, AL-Mustansiriya University, 2001.
[40] W. M. AL-Waily, “Satellite and CCD image compression using wavelet transform”, M.Sc. Thesis, AL-Mustansiriya University, 2000.
[41] N. Dolia, A. Burian, V. V. Lukin, C. Rusu, A. A. Kurekin and A. A. Zelensky, “Neural network application for primary local recognition and nonlinear adaptive filtering of images”, [email protected].
[42] A. A. Shenshtawy, “Image programming and processing”, Dar Al-Kutub for Publication and Distribution, Cairo, Egypt, 1997.
[43] M. Sonka, V. Hlavac and R. Boyle, “Image processing, analysis and machine vision”, Brooks/Cole Publishing Company, 1999.
[44] J. L. Starck, “Image processing and data analysis”, Cambridge University Press, 1998.
[45] W. K. Pratt, “Digital image processing”, John Wiley & Sons, Inc., 1978.
[46] D. Phillips, “Image processing: Analyzing and enhancing digital images”, R & D Publications, Inc., Prentice Hall, 1994.
[47] J. Gomez and L. Velho, “Image processing for computer graphics”, Springer-Verlag New York Inc., 1997.
[48] Bryan Morse, “Color image processing”, 1995, http://iul.cs.byu.edu/morse/sso-f95/node19.html.
[49] K. A. Zidan, “High performance technique for image data compression”, M.Sc. Thesis, AL-Mustansiriya University, 1997.
[50] A. Isar, D. Isar, and T. Asztals, “Nonlinear adaptive filters and wavelets: A statistical analysis”, Electronics and Telecommunications Faculty, 2 Bd. V. Parvan, 1900, Romania, [email protected].
[51] R. Forchheimer and T. Kronander, “Image coding - from waveforms to animation”, IEEE Transactions on ASSP, Vol. 37, No. 12, pp. 2008-2023, December 1989.
[52] Tim Edwards, “Discrete wavelet transforms: Theory and implementation”, Stanford University, September 1991.
[53] S. Pittner and S. Kamarthi, “Feature extraction from wavelet coefficients for pattern recognition tasks”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 1, pp. 83-88, January 1999.
[54] S. Ranganath and K. Arun, “Face recognition using transform features and neural networks”, Pattern Recognition, Vol. 30, No. 10, pp. 1615-1622, 1997.
[55] A. S. Lewis and G. Knowles, “Video compression using 3-D wavelet transforms”, Electronics Letters, Vol. 26, No. 6, March 1990.
[56] C. Garcia, G. Zikos and G. Tziritas, “Face detection in color images using wavelet packet analysis”, [email protected].
[57] C. Garcia, G. Zikos and G. Tziritas, “A wavelet-based framework for face recognition”, ICS - Foundation for Research and Technology - Hellas (FORTH), cgarcia, gzikos, [email protected].
[58] H. Schildt, “Complete reference of Borland C++”, Prentice Hall, 1997.
[59] L. Fausett, “Fundamentals of neural networks: Architectures, algorithms and applications”, Prentice-Hall, Inc., 1994.
[60] Kah-Kay Sung and T. Poggio, “Learning human face detection in cluttered scenes”, in Computer Analysis of Images and Patterns, pp. 432-439, 1995.
[61] Trevor Darrell, “Computer vision for interface and surveillance”, [email protected].
[62] Stan Franklin, “Artificial minds”, United States of America, 1997.
[63] J. M. Zurada, “Introduction to artificial neural systems”, Jaico Publishing House, Delhi, 1996.
[64] Ruchir Patel, “Face detection using neural network”, http://www.ri.cmu.edu/pubs/pub_926.html.
[65] B. Forrest, D. Roweth, N. Stroud, D. Wallace and G. Wilson, “Neural network models”, Parallel Computing, Elsevier Science Publishers B.V. (North-Holland), 1988.
[66] Mohamed Amtoun, “Face recognition using back-propagation”, Computer Science, University of Windsor, May 2001, [email protected].
[67] M. S. Bartlett, J. C. Hager, P. Ekman and T. J. Sejnowski, “Measuring facial expression by computer image analysis”, University of California, San Francisco, supported by NSF Grant No. BS-9120868, National Laboratories, 1998, [email protected].
[68] H. A. Rowley, S. Baluja, and T. Kanade, “Neural network-based face detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, 1998.
[69] R. V. Rullen, J. Gautrais, A. Delorme and S. Thorpe, “Face processing using one spike per neurone”, Centre de Recherche Cerveau & Cognition, UMR 5549, 1998.
[70] H. A. Rowley, S. Baluja and T. Kanade, “Rotation invariant neural network-based face detection”, supported by Hewlett-Packard Corporation (Grant No. DAAH04-94-G-0006).
[71] Laurenz Wiskott, Jean-Marc Fellous, Norbert Krüger, and Christoph von der Malsburg, “Face recognition and gender determination”, in Proceedings of the International Workshop on Automatic Face and Gesture Recognition, Zürich, 1995.
[72] D. Valentin, A. T. Garrison and W. Cottrell, “Connectionist models of face processing: A survey”, Pattern Recognition, Vol. 27, pp. 1209-1230, 1994.
[73] SIENA, “Stimulation initiative for European neural applications”, ESPRIT Project 9811, http://www.mbfys.kun.nl/snn/siena.
[74] Steven W. Smith, “Digital signal processing”, California Technical Publishing, USA, 1999.
[75] O. Nakamura, S. Mathur, and T. Minami, “Identification of human faces based on isodensity maps”, Pattern Recognition, Vol. 24, pp. 263-271, 1991.
[76] Ruchir Patel, “Face detection using neural network”, http://www.ri.cmu.edu/pubs/pub_926.html.
[77] Roberto Cipolla and Alex Pentland, “Computer vision for human-machine interaction”, Cambridge University Press, 1998.
[78] Y. Le Cun and Yoshua Bengio, “Convolutional networks for images, speech, and time series”, in Michael A. Arbib, editor, The Handbook of Brain Theory and Neural Networks, pp. 255-258, MIT Press, Cambridge, Massachusetts, 1995.
[79] A. N. Dolia, A. Burian, V. V. Lukin, C. Rusu, A. A. Kurekin and A. A. Zelensky, “Neural network application for primary local recognition and nonlinear adaptive filtering of images”, [email protected].
[80] Henry A. Rowley, Shumeet Baluja, and T. Kanade, “Human face detection in visual scenes”, Technical Report CMU-CS-95-158, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, July 1995.
[81] Dan W. Patterson, “Artificial neural networks: Theory and applications”, Prentice Hall, Inc., 1996.

Face Recognition using Neural Networks

Abstract (in Arabic; translated)
The human face is a complex, multi-dimensional pattern; recognizing this pattern and finding mathematical features with which to characterize the face is a difficult and complicated task.

In this research an efficient, low-complexity hybrid method for face recognition is proposed, in which the raw image enters the system through preliminary processing operations and is then passed to artificial neural networks. The method consists of three stages:

Stage one: improving the image properties using Histogram Equalization. The enhancement used makes the image information insensitive to the slight changes that occur in it as a result of differences in camera illumination, and improves the contrast within the image.

Stage two: compressing the image using the Wavelet Transform, to increase the speed of the system and to reduce the data entering the neural network.

Stage three: designing an integrated artificial neural system to recognize and classify the images. The information from the previous stage is fed into Multi-Neural Networks, which make the information insensitive to partial changes of the image, such as rotation and certain deformations that occur in the images.

The stage that maps these features to the recognition code combines the outputs produced by each part of the multi-neural networks to obtain the final output of the network, which determines the recognized face and its associated information stored in the database.

The images used contained many variations in expression, face orientation, and facial detail. The performance of the system remained stable under changes in the size, position, and rotation of the face to be classified, thanks to the use of a network with a multiple (arbitration) structure. A number of images per person is needed in the training process to obtain an acceptable recognition rate. The results obtained show that the performance of the proposed system is very encouraging.

The structure of the proposed system was built as a software package using C++ and Visual Basic.

Keywords: Pattern Recognition, Face Detection, Human Face Recognition, Computer Vision, Feature Extraction, Artificial Neural Networks, Machine Learning, Pattern Classification, Multilayer Perceptrons, Statistical Classification.