
ZAGAZIG UNIVERSITY BANHA BRANCH FACULTY OF ENGINEERING-SHOUBRA ELECTRICAL ENGINEERING DEPARTMENT COMPUTER SYSTEM DIVISION

FACE RECOGNITION USING NEURAL NETWORKS
By
Ahmed Usama Faramawi
Ahmed Bayomy Zaky
Amir Esmaiel Yossef
Eman Farouk El-tokhi
Hassan Mohamed Naguib
Supervised by:

Ass. Prof. Raffat A. El-kammar Electrical Eng. Dept.- Faculty of Engineering

Dr. Hala H. Zayed Electrical Eng. Dept.- Faculty of Engineering

July 2003

Abstract
In the last few years, recognition systems using biometric features have been developed in many research efforts. The human face is one of the biometric features that contain many characteristics usable in the recognition process. Many systems have already been built that use the human face for recognition. Such systems can serve applications that need a high degree of security, such as airports, banking systems, and criminal investigations.

In this project we implemented one of the most efficient face recognition systems. Our system uses Gabor filters and neural networks and consists of three stages:
1. A preprocessing stage applied to the input image to determine the centre of the eyes, that is, the point between the two eyes and above the nose.
2. Extraction of the basic features of the input face and conversion into a short code to ease the recognition process.
3. A recognition stage using a modular neural network.

We performed many tests to measure the performance of the system and to reach the best results by using different values for the neural network parameters. The recognition rate of the system exceeded 97%. The result is a complete automated face recognition system.


CHAPTER 1  INTRODUCTION
1.1. Why face recognition?
1.2. Problem Definition

CHAPTER 2  Computer Vision
2.1. Introduction
2.3. 2D Image Input
2.3.1. 2D Input Devices
2.3.1.1. TV Camera or Vidicon Tube
2.3.1.2. CCD Camera
2.4. 3D imaging
2.4.1. Methods of Acquisition
2.5. Image processing
2.5.1. Smoothing Noise
2.5.2. Edge Detection
2.5.2.1.1 Gradient based methods
2.5.3. Edge Linking
2.5.3.1. Local Edge Linking Methods
2.5.4. Segmentation
2.5.4.1. Region Splitting
2.5.4.2. Region Growing

CHAPTER 3  Neural Network
3.1 Introduction
3.1.1 Human and Computer
3.1.2 The structure of the brain
3.1.3 Learning in biological systems
3.1.4 Learning in Machines
3.2 Pattern Recognition
3.2.1 Pattern Recognition: in Perspective
3.2.2 Pattern Recognition - a Definition
3.2.3 Discriminate Functions
3.2.4. Classification techniques
3.2.5. Linear Classifiers
3.3. The Basic Neuron
3.3.1. Modeling the single neuron
3.3.2. Learning in simple neuron
3.3.3. The Perceptron: A Vectorial Perspective
3.4. Neural Network Applications
3.5. Altering the Perceptron Model
3.6. Neural Network Architectures
3.6.1. Multilayer Perceptron
3.6.2. Radial Basis Function
3.6.3. Kohonen Self-Organizing Map Networks
3.6.4. Hopfield Networks
3.7. Hardwired Neural Network
3.7.1. NNW Chips
3.7.2. Comparison Ratings
3.7.3. Digital NNW Chips
3.7.4. Analog & Hybrid NNW Chips
3.7.5. Neuromorphic NNWs
3.7.6. General Purpose Vs. Algorithm Specific
3.7.7. Hardware Systems
3.7.8. NeuroComputers
3.7.9. NNW Accelerator Cards

CHAPTER 4  PAST RESEARCHES ON FACE RECOGNITION
4.1. Human Face Recognition
4.1.1. Discussion
4.2. Automatic Face Recognition
4.2.1. Representation, Matching and Statistical Decision
4.2.2. Early face recognition methods
4.2.3. Statistical approaches to face recognition
4.2.4. Hidden Markov Model Based Methods
4.2.5. Neural Networks Approach
4.2.6. Template Based Methods
4.2.7. Feature Based Methods
4.2.8. Current State of the Art

CHAPTER 5  The Proposed System
5.1 Introduction
5.2 The Preprocessing Stage
5.2.1 Histogram Equalization
5.2.2 Edge Detection
5.2.3 Finding Eyes Region
5.2.4 Eyes Detection (Neural Network)
5.3 The Feature Extraction Stage
5.3.1 Gabor filtering
5.3.2 Face Code
5.4 The Recognition Stage

CHAPTER 6  Results and Conclusion
6.1 The System Database
6.2 Neural Network Training
6.3 Neural Network Testing and Results
6.4 Conclusion
6.5 Future work

Appendix A  Important Codes
A.1 Histogram Equalization Class
A.2 Sobel Edge Detection Class
A.4 Gabor Filter Function
A.5 Convolution Class
A.6 Image Normalization Function
A.6 Face Code Generation
A.7 Face Position Class
A.8 Figures of the program operation steps


CHAPTER 1
INTRODUCTION
Machine recognition of faces has emerged as an active research area spanning several disciplines such as image processing, pattern recognition, computer vision and neural networks. Face recognition technology has numerous commercial and law enforcement applications. These applications range from static matching of controlled-format photographs, such as passports, credit cards, photo IDs, driver's licenses and mug shots, to real-time matching of surveillance video images [16].

Humans seem to recognize faces in cluttered scenes with relative ease, having the ability to identify distorted images, coarsely quantized images, and faces with occluded details. Machine recognition is a much more daunting task. Understanding the human mechanisms employed to recognize faces constitutes a challenge for psychologists and neural scientists. In addition to the cognitive aspects, understanding face recognition is important, since the same underlying mechanisms could be used to build a system for the automatic identification of faces by machine.

A formal method of classifying faces was first proposed by Francis Galton in 1888 [10]. During the 1980s, work on face recognition remained largely dormant. Since the 1990s, research interest in face recognition has grown significantly as a result of the following facts:
1. The increase in emphasis on civilian/commercial research projects,
2. The re-emergence of neural network classifiers with emphasis on real time computation and adaptation,
3. The availability of real time hardware,
4. The increasing need for surveillance related applications due to drug trafficking, terrorist activities, etc.

Still, most access control methods, with all their legitimate applications in an expanding society, have a bothersome drawback. Except for face and voice recognition, these methods require the user to remember a password, to enter a PIN code, to carry a badge, or, in general, require a human action in the course of identification or authentication. In addition, the corresponding means (keys, badges, passwords, PIN codes) are prone to being lost or forgotten, whereas fingerprints and retina scans suffer from low user acceptance. Modern face recognition has reached an identification rate greater than 90% with well-controlled pose and illumination conditions. While this is a high rate for face recognition, it is not comparable to methods using keys, passwords or badges.

1.1. Why face recognition?
Within today's environment of increased importance of security and organization, identification and authentication methods have developed into a key technology in various areas: entrance control in buildings; access control for computers in general or for automatic teller machines in particular; day-to-day affairs like withdrawing money from a bank account or dealing with the post office; or in the prominent field of criminal investigation. Such requirements for reliable personal identification in computerized access control have resulted in an increased interest in biometrics.

Biometric identification is the technique of automatically identifying or verifying an individual by a physical characteristic or personal trait. The term "automatically" means the biometric identification system must identify or verify a human characteristic or trait quickly with little or no intervention from the user. Biometric technology was developed for use in high-level security systems and law enforcement markets. The key element of biometric technology is its ability to identify a human being and enforce security [12].

Biometric characteristics and traits are divided into behavioral and physical categories. Behavioral biometrics encompasses behaviors such as signature; physical biometric systems use the eye, finger, hand, voice, and face for identification.

A biometric-based system was developed by Recognition Systems Inc., Campbell, California, as reported by Sidlauskas [25]. The system was called ID3D Handkey and used the three-dimensional shape of a person's hand to distinguish people. The side and top views of a hand positioned in a controlled capture box were used to generate a set of geometric features. Capturing takes less than two seconds, and the data can be stored efficiently in a 9-byte feature vector. The system could store up to 20,000 different hands.

Another well-known biometric measure is that of fingerprints. Various institutions around the world have carried out research in the field. Fingerprint systems are unobtrusive and relatively cheap to buy. They are used in banks and to control entrance to restricted access areas. Fowler [21] has produced a short summary of the available systems.

Fingerprints are unique to each human being. It has been observed that the iris of the eye, like fingerprints, displays patterns and textures unique to each human and that it remains stable over decades of life, as detailed by Siedlarz [22]. Daugman designed a robust pattern recognition method based on 2-D Gabor transforms to classify human irises. Speech recognition also offers one of the most natural and least obtrusive biometric measures, where a user is identified through his or her spoken words; AT&T have produced a prototype that stores a person's voice on a memory card, details of which are described by Mandelbaum [13].

While appropriate for bank transactions and entry into secure areas, such technologies have the disadvantage that they are intrusive both physically and socially. They require the user to position his or her body relative to the sensor and then pause for a second to declare himself or herself. This pause-and-declare interaction is unlikely to change because of the fine-grain spatial sensing required. Moreover, since people cannot recognize other people using this sort of data, these types of identification do not have a place in normal human interactions and social structures.

While the pause-and-present interaction is acceptable in high security applications, it is exactly the opposite of what is required when building a store that recognizes its best customers, an information kiosk that remembers you, or a house that knows the people who live there.

A face recognition system would allow a user to be identified by simply walking past a surveillance camera. Human beings often recognize one another by unique facial characteristics. One of the newest biometric technologies, automatic facial recognition, is based on this phenomenon. Facial recognition is the most successful form of human surveillance. Facial recognition technology, which is being used to improve human efficiency when recognizing faces, is one of the fastest growing fields in the biometric industry. Interest in facial recognition is being fueled by the availability and low cost of video hardware, the ever-increasing number of video cameras being placed in the workspace, and the noninvasive aspect of facial recognition systems.

Although facial recognition is still in the research and development phase, several commercial systems are currently available and research organizations, such as Harvard University and the MIT Media Lab, are working on the development of more accurate and reliable systems.

1.2. Problem Definition
A general statement of the problem can be formulated as follows: given still or video images of a scene, identify one or more persons in the scene using a stored database of faces.

The environment surrounding a face recognition application can cover a wide spectrum from a well-controlled environment to an uncontrolled one. In a controlled environment, frontal and profile photographs are taken complete with uniform background and identical poses among the participants. These face images are commonly called mug shots. Each mug shot can be manually or automatically cropped to extract a normalized subpart called a canonical face image. In a canonical face image, the size and position of the face are normalized approximately to the predefined values and background region is minimal.

General face recognition, a task that humans perform routinely in daily activities, deals with a virtually uncontrolled environment. Systems that automatically recognize faces from an uncontrolled environment must first detect faces in images. The face detection task is to report the location, and typically also the size, of all the faces in a given image; it is a completely different problem from face recognition.

Face recognition is a difficult problem due to the general similar shape of faces combined with the numerous variations between images of the same face.

Recognition of faces from an uncontrolled environment is a very complex task: lighting conditions may vary tremendously; facial expressions also vary from time to time; faces may appear at different orientations, and a face can be partially occluded. Further, depending on the application, handling facial features that change over time (aging) may also be required.

Although existing methods perform well under constrained conditions, the problems of illumination changes, out-of-plane rotations and occlusions still remain unsolved. The proposed algorithm deals with two of these three important problems, namely occlusion and illumination changes.

Since the techniques used in the best face recognition systems may depend on the application of the system, one can identify at least two broad categories of face recognition systems [23]:
1. Finding a person within a large database of faces (e.g. in a police database). Often only one image is available per person, and it is usually not necessary for recognition to be done in real time.
2. Identifying particular people in real time (e.g. a location tracking system). Multiple images per person are often available for training, and real time recognition is required.


CHAPTER 2
Computer Vision

2.1. Introduction
The goal of computer vision is to process images acquired with cameras or scanners in order to produce a representation of objects in the world. Computer vision can be described as a process that converts a digitized image of sensor values into a symbolic description of patterns and objects in the scene, suitable for subsequent use in a computer-dependent task.

Computer vision has been an active area of research for more than three decades. A quick review of the field reveals that image processing and pattern recognition have been tremendously successful in terms of delivering operational systems. Every day, barcode scanners are used in supermarkets, and pattern recognition techniques are used for purposes such as signature verification, bill recognition, and address recognition. This progress has been driven by significant increases in computer power and the widespread use of multimedia technology. The use of imaging technology for everyday tasks and the wide use of images as part of the web have resulted in a tremendous reduction in price and the delivery of images to virtually every desktop. In terms of full-scale computer vision applications that involve motion estimation, depth recovery, and scene interpretation, fairly limited progress has been achieved. Few systems have been deployed for regular use. A few notable exceptions include some highway driving systems, the NAVLAB system from Carnegie Mellon University, and a few systems for medical diagnostics and computer-aided surgery. Overall, however, it has not been possible to deliver the expected set of applications, due to a lack of robustness and of methods for integrating components into full systems.

At the beginning of a new millennium, there are now a number of promising efforts to construct fully operational systems. The construction of systems gradually provides the insight needed to formulate basic methods for the design, analysis, implementation, and evaluation of operational and robust systems that have a wide domain of application. Most of the established conferences in computer vision, such as ICCV, ECCV, CVPR, ACCV, and ICPR, focus on component methods; for discussion of system issues, the first International Conference on Computer Vision Systems (ICVS) was held in January 1999 at Gran Canaria island in Spain. The program included more than sixty papers and three invited lectures by Prof. Dickmanns (automated highway driving), Prof. Kanade (autonomous helicopter flying), and Prof. Brady (image-based medical diagnostics).

Organizations use computer-based vision and automation tools in a wide variety of industrial and scientific applications, including pharmaceutical, semiconductor, electronics, automotive, and research applications. These systems perform process monitoring, information gathering, and on-the-fly feedback control to correct manufacturing problems. Laboratory automation and image processing applications use filtering and analysis techniques to perform cell and biomaterial counting and sizing. With computer-based vision systems, we can increase productivity, flexibility, consistency and reliability, lower production costs, and perform complex inspection tasks.

There already exist a number of working systems that perform parts of this task in specialized domains. For example:
- A map of a city or a mountain range can be produced semi-automatically from a set of aerial images.
- A robot can use the several image frames per second produced by one or two cameras for obstacle avoidance.
- A printed circuit inspection system may take one picture per board on a conveyer belt and produce a binary image flagging possible faulty soldering points on the board.
- A zip code reader takes single snapshots of envelopes and translates a handwritten number into an ASCII string.
- A security system can match one or a few pictures of a face with a database of known employees for recognition.

Most current "image understanding" tasks fall under one or more of the following categories:
- Pattern Classification: This category describes perhaps the widest range of machine vision tasks that researchers have been able to formulate algorithmically. The human visual system can detect and recognize complex visual patterns with a minimal amount of training; this capability is very useful for recognizing objects, inspecting industrial products and organizing perception of cluttered environments. Researchers are working at duplicating this capability in an artificial system. A pattern classification problem involves assigning identities or labels to patterns in an image, or sometimes to an entire image. Starting from a small set of sample images of an object, or of a category of objects, the system should be able to detect and locate instances of the object in previously unseen images. Many classical computer vision problems, like image annotation, object recognition and object detection, can be posed, at least in part, as pattern classification problems.
- Registration: A registration problem involves establishing a match between an input image and a reference image or model. An image registration system "understands" the input image by explaining its appearance in terms of a transformed reference image or model.
- Reconstruction: Reconstruction problems "understand" an input image by creating a computer representation of surfaces and objects in the scene. An appropriate representation scheme could be a depth map or a CAD model. In some reconstruction problems the goal may be to determine the imaging conditions under which the image was taken, such as the distribution and color of light sources.

However, the generic "vision problem" is far from being solved. No existing system can come close to emulating the capabilities of a human. Systems such as the ones described above are fundamentally brittle: as soon as the input deviates ever so slightly from the intended format, the output becomes almost invariably meaningless.

Vision is therefore one of the problems of computer science worthy of investigation, because we know that it can be solved, yet we do not know how to solve it well. In fact, to solve the "general vision problem" we will have to come up with answers to deep and fundamental questions about representation and computation at the core of human intelligence.


There have been successful computer vision systems in the past, especially object recognition and localization systems based on pictorial templates and geometric models, but most such systems perform very specific tasks at best and operate only under very heavily constrained conditions. For example, some systems can deal only with a very limited library of objects and patterns made up of only rigid parts. In other cases, lighting must be controlled and the imaging geometry must be fixed. Computer vision systems today are still too inflexible for any widespread use beyond the specific tasks that they have been designed to perform.

2.2. Image Acquisition
The first stage of any vision system is the image acquisition stage. After the image has been obtained, various methods of processing can be applied to the image to perform the many different vision tasks required today.

However, if the image has not been acquired satisfactorily then the intended tasks may not be achievable, even with the aid of some form of image enhancement. Images of the manufactured products are obtained by cameras, which form the basis of every vision system. The camera contains sophisticated sensors that convert visual scenes to electrical video signals that a computer can then analyze.

Vision system developers can choose from two types of cameras, analog and digital:
1. Analog cameras: Analog video predominates. Video professionals have widely adopted the grayscale and color analog video formats; therefore, any basic computer-based vision system should easily accept both monochrome and color analog signals. However, different types of analog cameras transmit color and brightness information in different ways, and a flexible computer-based system will accept several different types of these color analog signals.
2. Digital cameras: Digital cameras transmit a cleaner image than analog because a digital stream of data is less susceptible to noise and distortion. An analog camera sends its signal to a plug-in computer board that transforms the signal from the camera into digital code. The computer's central processing unit can then process the code for analysis and presentation. With digital cameras, the analog signal becomes digital inside the camera before it reaches the computer.

Digital cameras also transmit their signals at higher speeds than their analog counterparts. Some digital cameras can output data at rates greater than 100 megabytes per second. To handle this speed, it is necessary to have a computer plug-in board specifically suited for digital cameras; some of these computer boards can accept information at rates as high as 200 Mbytes/s.

2.3. 2D Image Input
The basic two-dimensional image is a monochrome (greyscale) image which has been digitised. We can describe the image as a two-dimensional light intensity function f(x,y), where x and y are spatial coordinates and the value of f at any point (x, y) is proportional to the brightness or grey value of the image at that point. A digitized image is one where the spatial and greyscale values have been made discrete:
- Intensity is measured across a regularly spaced grid in the x and y directions.
- Intensities are sampled to 8 bits (256 values).
For computational purposes, we may think of a digital image as a two-dimensional array where x and y index an image point. Each element in the array is called a pixel (picture element).
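The array view of a digital image can be sketched directly in code. The following is a minimal illustration only (it is not one of the project classes listed in Appendix A): a digitised greyscale image stored as a two-dimensional array of 8-bit values, indexed by pixel coordinates.

#include <cstdint>
#include <vector>

// Minimal sketch: a digitised greyscale image f(x, y) with 256 grey levels,
// stored row by row in a flat array and indexed by pixel coordinates.
struct GreyImage {
    int width  = 0;
    int height = 0;
    std::vector<std::uint8_t> pixels;                 // row-major pixel array

    GreyImage(int w, int h) : width(w), height(h), pixels(w * h, 0) {}

    // f(x, y): the grey value (brightness) sampled at grid point (x, y)
    std::uint8_t& at(int x, int y)       { return pixels[y * width + x]; }
    std::uint8_t  at(int x, int y) const { return pixels[y * width + x]; }
};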

2.3.1. 2D Input Devices
2.3.1.1. TV Camera or Vidicon Tube
A first choice for a two-dimensional image input device may be a television camera; its output is a video signal:
- The image is focused onto a photoconductive target.
- The target is scanned line by line horizontally by an electron beam.
- An electric current is produced as the beam passes over the target.
- The current is proportional to the intensity of light at each point.
- The current is tapped to give a video signal.
This form of device has several disadvantages:
- Limited resolution: a finite number of scan lines (about 625) and frame rate (30 or 60 frames per second).
- Distortion: unwanted persistence between one frame and the next.
- Non-linear video output with respect to light intensity.
- Non-flat target on the tube.

2.3.1.2. CCD Camera
By far the most popular two-dimensional imaging device is the charge-coupled device (CCD) camera:
- A single IC device.
- Consists of an array of photosensitive cells; each cell produces an electric current dependent on the incident light falling on it.
- Video signal output.
- Less geometric distortion.
- More linear video output.

2.4. 3D imaging
The simplest and most convenient way of representing and storing the depth measurements taken from a scene is a depth map. A depth map is a two-dimensional array where the x and y distance information corresponds to the rows and columns of the array as in an ordinary image, and the corresponding depth readings (values) are stored in the array's elements (pixels). A depth map is therefore like a grey scale image, except that depth information replaces the intensity information.

Why use 3D data?
A 3D image has many advantages over its 2D counterpart, such as explicit geometry:
- 2D images give only limited information about the physical shape and size of an object in a scene.
- 3D images express the geometry in terms of three-dimensional coordinates; e.g. the size (and shape) of an object in a scene can be straightforwardly computed from its three-dimensional coordinates.
Recent technological advances (e.g. in camera optics, CCD cameras and laser rangefinders) have made the production of reliable and accurate three-dimensional depth data possible.

2.4.1. Methods of Acquisition
2.4.1.1 Laser Ranging Systems
Laser ranging works on the principle that the surface of the object reflects laser light back towards a receiver, which then measures the time (or phase difference) between transmission and reception in order to calculate the depth. Most laser rangefinders:
- Work at long distances.
- Consequently, their depth resolution is inadequate for detailed vision tasks.
- Shorter range systems exist but still have an inadequate depth resolution (1 cm at best) for most practical industrial vision purposes.

Structured Light
Basic idea:
- Project patterns of light (grids, stripes, elliptical patterns, etc.) onto an object.
- Surface shapes are then deduced from the distortions of the patterns that are produced on the object's surface.
- Knowing the relevant camera and projector geometry, depth can be inferred by triangulation.
- Many methods have been developed using this approach.
- Major advantage: simple to use.
- Low spatial resolution: patterns become sparser with distance.
- Some close range (4 cm) sensors exist with good depth resolution (around 0.05 mm) but have a very narrow field of view and close range of operation.

2.4.1.2. Moire Fringe Methods
The essence of the method is that a grating is projected onto an object and an image is formed in the plane of some reference grating. The image then interferes with the reference grating to form Moire fringe contour patterns which appear as dark and light stripes. Analysis of the patterns then gives accurate descriptions of changes in depth and hence shape.
- It is not possible to determine whether adjacent contours are higher or lower in depth.
- This is resolved by moving one of the gratings and taking multiple Moire images.
- The reference grating can also be omitted and its effect simulated in software.
Moire fringe methods are capable of producing very accurate depth data (resolution to within about 10 microns), but the methods have certain drawbacks:
- The methods are relatively computationally expensive.
- Surfaces at a large angle are sometimes unmeasurable: the fringe density becomes too dense.

2.4.1.3 Shape from Shading Methods
Methods based on shape from shading employ photometric stereo techniques to produce depth measurements. Using a single camera, two or more images are taken of an object in a fixed position but under different lighting conditions. By studying the changes in brightness over a surface and employing constraints on the orientation of surfaces, certain depth information may be calculated. Methods based on these techniques are not suited for general three-dimensional depth data acquisition:
- They are sensitively dependent on the illumination and surface reflectance properties of objects present in the scene.
- They only work well on objects with uniform surface texture.
- It is difficult to infer absolute depth; only surface orientation is easily inferred.
- They are mostly used when it is desired to extract surface shape information.

2.4.1.4 Passive Stereoscopic Methods
Stereoscopy is a technique for measuring range by triangulation to selected locations in a scene imaged by two cameras. The primary computational problem of stereoscopy is to find the correspondence of various points in the two images. This requires:
- Reliable extraction of certain features (such as edges or points) from both images.
- Matching of corresponding features between images.
- Both of these tasks are non-trivial and computationally complex.
- Passive stereo may not produce depth maps within a reasonable time.
- The depth data produced is typically sparse, since high level features such as edges are used rather than points.
NOTE:
- Problems in finding and accurately locating features in each image can be hard.
- Care is needed not to introduce errors.
- Depth measurements are accurate to a few millimetres.
- One such passive stereo vision system is TINA, developed at Sheffield University.

2.4.1.5 Active Stereoscopic Methods
The problems of passive stereoscopic techniques may be overcome by:
- Illuminating the scene with a strong source of light (in the form of a point or line of light) which can be observed by both cameras.
- Known corresponding points are thus provided in each image.
- Depth maps can then be produced by sweeping the light source across the whole scene.
- A laser light source is typically employed.
- Active stereo can only be applied in controlled environments, e.g. industrial applications.

Active Stereo Vision System
The active stereoscopic subsystem provides the three-dimensional data to a system for automatically inspecting mechanical parts. The vision system consists of:
- a matched pair of high sensitivity CCD cameras,
- a laser scanner, all mounted on an optical bench to reduce vibration.
Initially the cameras of the system must be calibrated in order to determine:
- the 3D position of the cameras relative to some world coordinates,
- the focal length and lens distortion of the camera (+ lens etc.).
Depth maps are extracted from the scene by:
- Moving the laser stripe across the scene to obtain a series of vertical columns of pixels.
- Triangulating pixels to give the required dense depth map. The depth of a point is measured as the distance from one of the cameras, chosen as the master camera.
Knowing the relevant geometry and optical properties of the cameras, the depth map is constructed using the following method for measuring a depth value:
1. For each vertical stripe of laser light, form an image of the stripe in the pair of frames from each camera.
2. For each row in the master camera image, search until the stripe is found at a point P(i,j), say.
3. Form a three-dimensional line l passing through the centre Cm of the master camera and P(i,j).
4. Construct the epipolar line, which is the projection of the line l into the image formed by the other camera. Do this by projecting two arbitrary points P1 and P2 into the image and constructing a line between the two projected points.
5. Search along the epipolar line for the laser stripe. If it is found at Q, proceed to Step 6.
6. Find the point Pp on line l which corresponds to Q. Calculate the (x,y,z) coordinates of Pp, and store the z value at position (i,j) corresponding to x and y in the depth map. The position of the point Pp is easily found by projecting a line L' from the centre C0 of the secondary camera passing through Q. The intersection of the lines l and L' gives the coordinates of Pp.
The depth map is formed by using a world coordinate system fixed on the master camera with its origin at Cm.
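Step 6 amounts to intersecting the ray l from the master camera with the ray L' from the secondary camera. The sketch below is illustrative only; the Vec3 type and the closestPointOnLine function are hypothetical names, not part of the system described above. Because noisy measurements mean the two rays rarely meet exactly, it returns the point on l that is closest to L'.

#include <cmath>

struct Vec3 { double x, y, z; };

static Vec3   sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3   pointAlong(Vec3 p, Vec3 d, double t) { return {p.x + t * d.x, p.y + t * d.y, p.z + t * d.z}; }

// Line l : P = Cm + t * d1   (master camera centre Cm through pixel P(i,j))
// Line L': P = C0 + s * d2   (secondary camera centre C0 through Q)
// Returns the point Pp on l that is closest to L'; its z value goes into the depth map.
Vec3 closestPointOnLine(Vec3 Cm, Vec3 d1, Vec3 C0, Vec3 d2)
{
    const Vec3   r   = sub(Cm, C0);
    const double a   = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    const double d   = dot(d1, r),  e = dot(d2, r);
    const double den = a * c - b * b;                  // close to zero when the rays are parallel
    const double t   = (den != 0.0) ? (b * e - c * d) / den : 0.0;
    return pointAlong(Cm, d1, t);                      // Pp = Cm + t * d1
}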

2.5. Image processing
Image processing is in many cases concerned with taking one array of pixels as input and producing another array of pixels as output which in some way represents an improvement to the original array. For example, this processing:
- may remove noise,
- improve the contrast of the image,
- remove blurring caused by movement of the camera during image acquisition,
- or correct for geometrical distortions caused by the lens.
Image processing methods may be broadly divided into:
- Real space methods, which work by directly processing the input pixel array.
- Fourier space methods, which work by firstly deriving a new representation of the input data by performing a Fourier transform, which is then processed, and finally performing an inverse Fourier transform on the resulting data to give the final output image.

Fourier Methods
Let us consider a 1D Fourier transform example. Consider a complicated sound such as the noise of a car horn. We can describe this sound in two related ways:
- sample the amplitude of the sound many times a second, which gives an approximation to the sound as a function of time;
- analyse the sound in terms of the pitches of the notes, or frequencies, which make up the sound, recording the amplitude of each frequency.
Similarly, brightness along a line can be recorded as a set of values measured at equally spaced distances apart, or equivalently, at a set of spatial frequency values. Each of these frequency values is referred to as a frequency component. An image is a two-dimensional array of pixel measurements on a uniform grid. This information can be described in terms of a two-dimensional grid of spatial frequencies. A given frequency component now specifies what contribution is made by data which is changing with specified x and y direction spatial frequencies.

What do frequencies mean in an image? If an image has large values at high frequency components, then the data is changing rapidly on a short distance scale, e.g. a page of text. If the image has large low frequency components, then the large scale features of the picture are more important, e.g. a single fairly simple object which occupies most of the image.

Fourier Theory: The tool which converts a spatial (real space) description of an image into one in terms of its frequency components is called the Fourier transform. The new version is usually referred to as the Fourier space description of the image. The corresponding inverse transformation, which turns a Fourier space description back into a real space one, is called the inverse Fourier transform.
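As a rough illustration of what the transform computes (this is not code used anywhere in the project), the following one-dimensional discrete Fourier transform turns N equally spaced samples, such as the brightness values along one line of an image, into N frequency components. Practical systems use the much faster FFT algorithm rather than this direct O(N^2) form.

#include <cmath>
#include <complex>
#include <vector>

// Direct 1D discrete Fourier transform: spectrum[u] holds the amplitude and
// phase of the u-th frequency component of the input samples.
std::vector<std::complex<double>> dft(const std::vector<double>& samples)
{
    const std::size_t n  = samples.size();
    const double      pi = std::acos(-1.0);
    std::vector<std::complex<double>> spectrum(n);

    for (std::size_t u = 0; u < n; ++u) {              // each output frequency
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t x = 0; x < n; ++x) {          // sum over all samples
            const double angle = -2.0 * pi * static_cast<double>(u * x) / static_cast<double>(n);
            sum += samples[x] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        spectrum[u] = sum;
    }
    return spectrum;
}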

2.5.1. Smoothing Noise
The idea with noise smoothing is to reduce various spurious effects of a local nature in the image, caused perhaps by:
- noise in the image acquisition system,
- or arising as a result of transmission of the image, for example from a space probe utilising a low-power transmitter.
The smoothing can be done either by considering the real space image, or its Fourier transform.

2.5.1.1. Fourier Space Smoothing Methods
Noise in an image means there are many rapid transitions (over a short distance) in intensity from high to low and back again or vice versa, as faulty pixels are encountered.


Therefore noise will contribute heavily to the high frequency components of the image when it is considered in Fourier space. Thus if we reduce the high frequency components, we should reduce the amount of noise in the image. We thus create a new version of the image in Fourier space by computing

G(u,v) = H(u,v) F(u,v)

where F(u,v) is the Fourier transform of the original image, H(u,v) is a filter function designed to reduce high frequencies, and G(u,v) is the Fourier transform of the improved image.

2.5.1.1.1 Ideal Low Pass Filter
The simplest sort of filter to use is an ideal lowpass filter, which in one dimension appears as shown in Figure 2.1.

Figure 2.1 Lowpass filter


This is a top hat function which is 1 for u between 0 and u0, the cut-off frequency, and zero elsewhere:

H(u) = 1 if |u| <= u0,  H(u) = 0 otherwise.

So all frequency space information above u0 is thrown away, and all information below u0 is kept. The two-dimensional analogue of this is the function

H(u,v) = 1 if D(u,v) <= D0,  H(u,v) = 0 otherwise,

where D(u,v) = sqrt(u^2 + v^2) and D0 is now the cut-off frequency. Thus, all frequencies inside a radius D0 are kept, and all others are discarded.

The problem with this filter is that, as well as the noise, edges (places of rapid transition from light to dark) also contribute significantly to the high frequency components. Thus an ideal lowpass filter will tend to blur edges; the lower the cut-off frequency is made, the more pronounced this effect becomes.

2.5.1.1.2. Low Pass Butterworth Filter
Another filter sometimes used is the Butterworth lowpass filter. In this case, H(u,v) takes the form

H(u,v) = 1 / (1 + [D(u,v)/D0]^(2n))


where n is called the order of the filter. This keeps some of the high frequency information, as illustrated by the second order one dimensional Butterworth filter shown in Figure. 2.2 and consequently reduces the blurring.
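A small sketch of how such a transfer function could be tabulated is given below. It is illustrative only and assumes a frequency grid whose origin has been shifted to the centre of the array, which is a common but not the only convention.

#include <cmath>
#include <vector>

// Butterworth low-pass transfer function H(u,v) = 1 / (1 + [D(u,v)/D0]^(2n)),
// sampled on a width x height frequency grid centred on (width/2, height/2).
std::vector<double> butterworthLowPass(int width, int height, double cutoff, int order)
{
    std::vector<double> H(static_cast<std::size_t>(width) * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            const double du = u - width  / 2.0;        // distance of (u, v) from the
            const double dv = v - height / 2.0;        // centre of frequency space
            const double D  = std::sqrt(du * du + dv * dv);
            H[static_cast<std::size_t>(v) * width + u] =
                1.0 / (1.0 + std::pow(D / cutoff, 2.0 * order));
        }
    }
    return H;                                          // multiply element-wise with F(u,v) to get G(u,v)
}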

Figure 2.2 A Butterworth filter

2.5.1.2 Real Space Smoothing Methods
2.5.1.2.1 Neighbourhood Averaging
The simplest approach is neighbourhood averaging, where each pixel is replaced by the average value of the pixels contained in some neighbourhood about it. The simplest case is probably to consider the 3x3 group of pixels centred on the given pixel, and to replace the central pixel value by the unweighted average of these nine pixels. If any one of the pixels in the neighbourhood has a faulty value due to noise, this fault will now be smeared over nine pixels as the image is smoothed. This tends to blur the image.


2.5.1.2.2. Median Filter
A better approach is to use a median filter. A neighbourhood around the pixel under consideration is used, but this time the pixel value is replaced by the median pixel value in the neighbourhood. Thus, if we have a 3x3 neighbourhood, we write the 9 pixel values in sorted order and replace the central pixel by the fifth highest value. This approach has two advantages:
- Occasional spurious high or low values are not averaged in; they are simply ignored.
- The sharpness of edges is preserved. When the neighbourhood covers the nine pixels on the left-hand side of an edge, the median value is 10; when it covers the right-hand ones, the median value is 20, and the edge is preserved.
If there are large amounts of noise in an image, more than one pass of median filtering may be useful to further reduce the noise. A rather different real space technique for smoothing is to average multiple copies of the image. The idea is that over several images, the noise will tend to cancel itself out if it is independent from one image to the next. Statistically, we expect the effects of noise to be reduced by a factor of sqrt(n) if we use n images. One particular situation where this technique is of use is in low lighting conditions.
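A minimal 3x3 median filter along these lines might look as follows; it is an illustrative sketch rather than the project's own code, and border pixels are simply copied unchanged.

#include <algorithm>
#include <cstdint>
#include <vector>

// 3x3 median filter: each interior pixel is replaced by the median (the fifth
// of the nine sorted values) of the window centred on it.
void medianFilter3x3(const std::vector<std::uint8_t>& in,
                     std::vector<std::uint8_t>& out,
                     int width, int height)
{
    out = in;                                          // borders left unchanged
    std::uint8_t window[9];
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = in[(y + dy) * width + (x + dx)];
            std::nth_element(window, window + 4, window + 9);
            out[y * width + x] = window[4];
        }
    }
}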

2.5.2. Edge Detection
Edges are very important to any vision system (biological or machine):
- They are fairly cheap to compute.
- They provide strong visual clues that can help the recognition process.
- They are, however, affected by noise present in an image.
An edge may be regarded as a boundary between two dissimilar regions in an image. These may be different surfaces of the object, or perhaps a boundary between light and shadow falling on a single surface. In principle an edge is easy to find, since differences in pixel values between regions are relatively easy to calculate by considering gradients.

2.5.2.1 Detecting Edge Points
2.5.2.1.1 Gradient based methods
An edge point can be regarded as a point in an image where a discontinuity (in gradient) occurs across some line. A discontinuity may be classified as one of three types.

Figure 2.3 Types of edge discontinuity

1. A Gradient Discontinuity, where the gradient of the pixel values changes across a line. This type of discontinuity can be classed as:
- roof edges
- ramp edges
- convex edges
- concave edges
by noting the sign of the component of the gradient perpendicular to the edge on either side of the edge. Ramp edges have the same signs in the gradient components on either side of the discontinuity, while roof edges have opposite signs in the gradient components.
2. A Jump or Step Discontinuity, where pixel values themselves change suddenly across some line.
3. A Bar Discontinuity, where pixel values rapidly increase then decrease again (or vice versa) across some line.

For example, if the pixel values are depth values:
- jump discontinuities occur where one object occludes another (or another part of itself);
- gradient discontinuities usually occur between adjacent faces of the same object.
If the pixel values are intensities:
- a bar discontinuity would represent cases like a thin black line on a white piece of paper;
- step edges may separate different objects, or may occur where a shadow falls across an object.

The gradient is a vector whose components measure how rapidly pixel values are changing with distance in the x and y directions. Thus, the components of the gradient may be found using the following approximations:

∂f/∂x ≈ Δf/Δx = [f(x + Δx, y) - f(x, y)] / Δx
∂f/∂y ≈ Δf/Δy = [f(x, y + Δy) - f(x, y)] / Δy

where Δx and Δy measure distance along the x and y directions respectively. In (discrete) images we can consider distances between pixels. Thus, when Δx = Δy = 1 (in terms of pixel spacing) and we are at the point whose pixel coordinates are (i, j), we have

Gx = f(i + 1, j) - f(i, j)
Gy = f(i, j + 1) - f(i, j)          (Eqn. 2.6)

In order to detect the presence of a gradient discontinuity we must calculate the change in gradient at (i, j). We can do this by finding the following gradient magnitude measure

M(i, j) = sqrt(Gx^2 + Gy^2)

and the gradient direction, θ, given by

θ(i, j) = arctan(Gy / Gx)
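Assuming unit pixel spacing, these measures can be computed per pixel as in the following sketch (illustrative only; the caller must keep (i, j) away from the right and bottom borders so that the forward differences stay inside the image).

#include <cmath>
#include <cstdint>
#include <vector>

struct Gradient { double magnitude; double direction; };

// Forward-difference gradient at pixel (i, j) of a row-major greyscale image.
Gradient gradientAt(const std::vector<std::uint8_t>& img, int width, int i, int j)
{
    // Gx = f(i+1, j) - f(i, j),  Gy = f(i, j+1) - f(i, j)
    const double gx = static_cast<double>(img[j * width + (i + 1)]) - img[j * width + i];
    const double gy = static_cast<double>(img[(j + 1) * width + i]) - img[j * width + i];
    return { std::sqrt(gx * gx + gy * gy),             // M(i, j)
             std::atan2(gy, gx) };                     // theta(i, j)
}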

2.5.2.1.2. Detection Implementation
2.5.2.1.2.1. Edge Operator
The difference operators in Eqn. 2.6 correspond to convolving the image with the two masks in Figure 2.4. This is easy to compute:
- The top left-hand corner of the appropriate mask is superimposed over each pixel of the image in turn.
- A value is calculated for Gx or Gy by using the mask coefficients in a weighted sum of the value of pixel (i, j) and its neighbours.
- These masks are referred to as convolution masks or sometimes convolution kernels.

Figure 2.4 Edge operator convolution masks
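The weighted-sum step can be sketched as a small helper function (hypothetical code, not the Convolution Class of Appendix A). It anchors the top left-hand corner of the mask at pixel (i, j), exactly as described above.

#include <cstdint>
#include <vector>

// Applies one convolution mask at pixel (i, j): the mask coefficients form a
// weighted sum of the pixel and its neighbours. The caller must keep the mask
// inside the image bounds.
double applyMask(const std::vector<std::uint8_t>& img, int width,
                 int i, int j,
                 const std::vector<std::vector<int>>& mask)
{
    double sum = 0.0;
    for (std::size_t dy = 0; dy < mask.size(); ++dy)
        for (std::size_t dx = 0; dx < mask[dy].size(); ++dx)
            sum += mask[dy][dx] * static_cast<double>(img[(j + dy) * width + (i + dx)]);
    return sum;                                        // e.g. one component of the gradient estimate
}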


2.5.2.1.2.2. Roberts Edge Operator
Instead of finding approximate gradient components along the x and y directions, we can also approximate gradient components along directions at 45° and 135° to the axes. In this case the following equations are used:

G1 = f(i, j) - f(i + 1, j + 1)
G2 = f(i + 1, j) - f(i, j + 1)

This form of operator is known as the Roberts edge operator and was one of the first operators used to detect edges in images. The corresponding convolution masks are given in Figure 2.5.

Figure 2.5 Roberts edge operator convolution masks

2.5.2.1.2.3. Sobel Edge Operator
Many edge detectors have been designed using convolution mask techniques, often using 3x3 mask sizes or even larger.


An advantage of using a larger mask size is that errors due to the effects of noise are reduced by local averaging within the neighbourhood of the mask. An advantage of using a mask of odd size is that the operators are centred and can therefore provide an estimate that is biased towards a centre pixel (i,j). One important edge operator of this type is the Sobel edge operator. The Sobel edge operator masks are given in Figure 2.6.
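A hedged sketch of the Sobel operator follows. It assumes the masks of Figure 2.6 are the standard Sobel coefficients and returns the gradient magnitude at an interior pixel (i, j); this is an illustration, not the Sobel Edge Detection Class of Appendix A.

#include <cmath>
#include <cstdint>
#include <vector>

// Sobel gradient magnitude at interior pixel (i, j) of a row-major image.
double sobelMagnitude(const std::vector<std::uint8_t>& img, int width, int i, int j)
{
    static const int sx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };   // responds to horizontal change
    static const int sy[3][3] = { {-1,-2,-1}, { 0, 0, 0}, { 1, 2, 1} };   // responds to vertical change

    double gx = 0.0, gy = 0.0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            const double p = img[(j + dy) * width + (i + dx)];
            gx += sx[dy + 1][dx + 1] * p;
            gy += sy[dy + 1][dx + 1] * p;
        }
    }
    return std::sqrt(gx * gx + gy * gy);
}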

Figure 2.6 Sobel edge operator convolution masks

2.5.2.1.2.4. Laplacian Operator
All of the previous edge detectors have approximated the first order derivatives of pixel values in an image. It is also possible to use second order derivatives to detect edges. A very popular second order operator is the Laplacian operator. The Laplacian of a function f(x,y), denoted by ∇²f(x,y), is defined by:

∇²f(x,y) = ∂²f/∂x² + ∂²f/∂y²

Once more we can use discrete difference approximations to estimate the derivatives and represent the Laplacian operator with the convolution mask shown in Figure 2.7.

Figure 2.7 Laplacian operator convolution mask

However, there are disadvantages to the use of second order derivatives:
- We should note that first derivative operators exaggerate the effects of noise; second derivatives will exaggerate noise roughly twice as much.
- No directional information about the edge is given.
The problems that the presence of noise causes when using edge detectors mean we should try to reduce the noise in an image prior to, or in conjunction with, the edge detection process.

2.5.2.1.2.5. LOG Operator
Another smoothing method is Gaussian smoothing:
- Gaussian smoothing is performed by convolving an image with a Gaussian operator, which is defined below.
- By using Gaussian smoothing in conjunction with the Laplacian operator, or another Gaussian operator, it is possible to detect edges.
Let us look at the Gaussian smoothing process first.


The Gaussian distribution function in two variables, g(x,y), is illustrated in Figure 2.8 and is defined by

g(x,y) = (1 / (2πσ²)) exp( -(x² + y²) / (2σ²) )

where σ is the standard deviation representing the width of the Gaussian distribution. The shape of the distribution, and hence the amount of smoothing, can be controlled by varying σ. In order to smooth an image f(x,y), we convolve it with g(x,y) to produce a smoothed image s(x,y), i.e.

s(x,y) = f(x,y) * g(x,y)          (2.10)

Figure 2.8 The Gaussian distribution in two variables
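For illustration only (this is not the project's own code), a discrete smoothing mask can be built by sampling g(x,y) on a small grid and normalising the weights. The kernel radius is an assumption of the sketch; a radius of about three standard deviations is a common choice.

#include <cmath>
#include <vector>

// Builds a (2r+1) x (2r+1) Gaussian smoothing mask by sampling g(x,y) and
// normalising so that the weights sum to 1.
std::vector<std::vector<double>> gaussianKernel(double sigma, int radius)
{
    const double pi = std::acos(-1.0);
    std::vector<std::vector<double>> k(2 * radius + 1, std::vector<double>(2 * radius + 1));
    double sum = 0.0;
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            const double g = std::exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                             / (2.0 * pi * sigma * sigma);
            k[y + radius][x + radius] = g;
            sum += g;
        }
    for (auto& row : k)                                // normalise the weights
        for (double& w : row) w /= sum;
    return k;
}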

Having smoothed the image with a Gaussian operator, we can now take the Laplacian of the smoothed image. Therefore the total operation of edge detection after smoothing on the original image is

∇²[ f(x,y) * g(x,y) ]          (2.11)

It is simple to show that this operation can be reduced to convolving the original image f(x,y) with a "Laplacian of a Gaussian" (LOG) operator, ∇²g(x,y), which is shown in Figure 2.9. Thus the edge pixels in an image are determined by a single convolution operation.

Figure 2.9 The LOG operator

Figure 2.10 Steps of the LOG operator
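As an illustrative sketch, a discrete LOG mask can be tabulated directly and then applied with a single convolution. It assumes the standard closed form of the Laplacian of a Gaussian, ∇²g(x,y) = ((x² + y² - 2σ²) / σ⁴) g(x,y), up to a constant factor; it is not the operator implementation used in the project.

#include <cmath>
#include <vector>

// Builds a (2r+1) x (2r+1) Laplacian-of-Gaussian mask; convolving an image
// with it combines Gaussian smoothing and the Laplacian in one operation,
// after which edges are located at the zero-crossings of the result.
std::vector<std::vector<double>> logKernel(double sigma, int radius)
{
    const double pi = std::acos(-1.0);
    const double s2 = sigma * sigma;
    std::vector<std::vector<double>> k(2 * radius + 1, std::vector<double>(2 * radius + 1));
    for (int y = -radius; y <= radius; ++y)
        for (int x = -radius; x <= radius; ++x) {
            const double r2 = static_cast<double>(x * x + y * y);
            const double g  = std::exp(-r2 / (2.0 * s2)) / (2.0 * pi * s2);
            k[y + radius][x + radius] = (r2 - 2.0 * s2) / (s2 * s2) * g;
        }
    return k;
}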


This method of edge detection was first proposed by Marr and Hildreth at MIT who introduced the principle of the zero-crossing method.

The basic principle of this method is to find the position in an image where the second derivatives become zero. These positions correspond to edge positions as shown in Figure 2.10. 

 The Gaussian function first smooths or blurs any step edges.
 The second derivative of the blurred image is taken; it has a zero-crossing at the edge.
 NOTE: blurring is advantageous here:
  o The Laplacian would be infinite at an (unsmoothed) step edge.
  o The edge position is still preserved.
 NOTE also:
  o The LOG operator is still susceptible to noise, but the effects of noise can be reduced by ignoring zero-crossings produced by small changes in image intensity.
  o The LOG operator gives edge direction information as well as edge points, determined from the direction of the zero-crossing.

2.5.2.1.2.6. DOG Operator
A related method of edge detection is that of applying the Difference of Gaussian (DOG) operator to an image.

 The DOG is computed by applying two Gaussian operators with different values of \sigma to an image and forming the difference of the resulting two smoothed images.
 It can be shown that the DOG operator approximates the LOG operator.
 Evidence exists that the human visual system uses a similar method.

2.5.2.1.2.7. Canny Edge Detector
Another important recent edge detection method is the Canny edge detector.


Canny's approach is based on optimising the trade-off between two performance criteria:
 Good edge detection -- there should be low probabilities of failing to mark real edge points and of marking false edge points.
 Good edge localisation -- the positions of edge points marked by the edge detector should be as close as possible to the real edge.

The optimisation can be formulated by maximising a function that is expressed in terms of:
 the signal-to-noise ratio of the image,
 the localisation of the edges,
 a probability that the edge detector only produces a single response to each actual edge in an image.

2.5.3. Edge Linking
Edge detectors yield pixels in an image that lie on edges. The next step is to try to collect these pixels together into a set of edges. Thus, our aim is to replace many points on edges with a few edges themselves. The practical problem may be much more difficult than the idealised case:

 Small pieces of edges may be missing,
 Small edge segments may appear to be present due to noise where there is no real edge, etc.


In general, edge linking methods can be classified into two categories:

 Local Edge Linkers, where edge points are grouped to form edges by considering each point's relationship to any neighbouring edge points.
 Global Edge Linkers, where all edge points in the image plane are considered at the same time and sets of edge points are sought according to some similarity constraint, such as points which share the same edge equation.

2.5.3.1. Local Edge Linking Methods
Most edge detectors yield information about the magnitude of the gradient at an edge point and, more importantly, the direction of the edge in the locality of the point. This is obviously useful when deciding which edge points to link together, since edge points in a neighbourhood which have similar gradient directions are likely to lie on the same edge. Local edge linking methods usually start at some arbitrary edge point and consider points in a local neighbourhood for similarity of edge direction, as shown in Figure 2.11.

Figure. 2.11 Edge linking


If the points satisfy the similarity constraint then they are added to the current edge set. The neighbourhoods based around the recently added edge points are then considered in turn, and so on. If the points do not satisfy the constraint then we conclude we are at the end of the edge, and so the process stops. A new starting edge point is found which does not belong to any edge set found so far, and the process is repeated. The algorithm terminates when all edge points have been linked to one edge or at least have been considered for linking once. Thus the basic process used by local edge linkers is that of tracking a sequence of edge points (a minimal sketch of this tracking loop is given after the list below). An advantage of such methods is that they can readily be used to find arbitrary curves. The basic idea is to find sets of edges; the assignment of each edge point to an edge is based on a probability estimate that the particular edge point and its local neighbours lie on the same edge. Other methods have posed the edge linking problem as:

 a graph or tree search problem,
 a dynamic programming problem, where functions measuring the error in the fitting of an edge to a set of points are minimised to find the best fitting edges in the image.
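The following C++ sketch illustrates the tracking loop described above under simplifying assumptions: the detected edge points are supplied as a flat list with their gradient directions, neighbourhood means 8-connectivity, and a single angle threshold acts as the similarity constraint. The structure and names are hypothetical.

#include <cmath>
#include <cstdlib>
#include <vector>

struct EdgePoint { int x, y; double direction; bool linked; };

// Grow one edge set from a starting point: add unlinked neighbouring points
// with a similar edge direction, then expand from each newly added point.
void linkFrom(std::vector<EdgePoint>& pts, std::size_t start,
              double maxAngleDiff, std::vector<std::size_t>& currentEdge)
{
    currentEdge.push_back(start);
    pts[start].linked = true;
    for (std::size_t k = 0; k < currentEdge.size(); ++k) {
        EdgePoint p = pts[currentEdge[k]];            // copy: list may grow below
        for (std::size_t q = 0; q < pts.size(); ++q) {
            if (pts[q].linked) continue;
            bool neighbour = std::abs(pts[q].x - p.x) <= 1 &&
                             std::abs(pts[q].y - p.y) <= 1;
            bool similar = std::fabs(pts[q].direction - p.direction) < maxAngleDiff;
            if (neighbour && similar) {               // similarity constraint satisfied
                pts[q].linked = true;
                currentEdge.push_back(q);             // expanded on a later iteration
            }
        }
    }
}

Repeating linkFrom from every edge point that is not yet linked yields the set of edges; the scan over all points is kept simple here, whereas a real implementation would index points by position.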

2.5.4. Segmentation
Another way of extracting and representing information from an image is to group pixels together into regions of similarity. This process is commonly called segmentation.
 In 2D, we would group pixels together according to the rate of change of their intensity over a region.
 In 3D, we group pixels together according to the rate of change of depth in the image, corresponding to pixels lying on the same surface such as a plane, cylinder or sphere.

There are two main approaches to segmentation:

2.5.4.1. Region Splitting
The basic idea of region splitting is to break the image into a set of disjoint regions which are coherent within themselves:

 Initially take the image as a whole to be the area of interest.
 Look at the area of interest and decide if all pixels contained in the region satisfy some similarity constraint.
 If TRUE, then the area of interest corresponds to a region in the image.
 If FALSE, split the area of interest (usually into four equal sub-areas) and consider each of the sub-areas as the area of interest in turn.
 This process continues until no further splitting occurs. In the worst case this happens when the areas are just one pixel in size.
 This is a divide and conquer, or top down, method.

If only a splitting schedule is used then the final segmentation would probably contain many neighbouring regions that have identical or similar properties. Thus, a merging process is used after each split which compares adjacent regions and merges them if necessary. Algorithms of this nature are called split and merge algorithms. To illustrate the basic principle of these methods let us consider an imaginary image. 

 Let I denote the whole image, shown in Figure 2.12(a).
 Not all the pixels in I are similar, so the region is split as in Figure 2.12(b).
 Assume that all pixels within regions I1, I2 and I3 respectively are similar, but those in I4 are not.
 Therefore I4 is split next, as in Figure 2.12(c).
 Now assume that all pixels within each region are similar with respect to that region, and that after comparing the split regions, regions I43 and I44 are found to be identical.
 These are thus merged together, as in Figure 2.12(d).

Figure. 2.12 Example of region splitting and merging

We can describe the splitting of the image using a tree structure, a modified quadtree. Each non-terminal node in the tree has at most four descendants, although it may have fewer due to merging. See Figure 2.13.
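A minimal sketch of the splitting step is given below; it assumes a simple max-minus-min intensity test as the similarity constraint, and the Region structure and names are illustrative choices rather than the method used later in this project.

#include <vector>

struct Region { int x, y, w, h; };   // top-left corner plus width and height

// Similarity constraint assumed here: the intensity range inside the region
// must not exceed a tolerance.
bool isHomogeneous(const std::vector<std::vector<int> >& img, const Region& r, int tol)
{
    int lo = img[r.y][r.x], hi = lo;
    for (int i = r.y; i < r.y + r.h; ++i)
        for (int j = r.x; j < r.x + r.w; ++j) {
            if (img[i][j] < lo) lo = img[i][j];
            if (img[i][j] > hi) hi = img[i][j];
        }
    return hi - lo <= tol;
}

// Recursive splitting: keep the area of interest if it is coherent, otherwise
// split it into four sub-areas (down to single pixels in the worst case).
void split(const std::vector<std::vector<int> >& img, const Region& r, int tol,
           std::vector<Region>& out)
{
    if (isHomogeneous(img, r, tol) || (r.w <= 1 && r.h <= 1)) { out.push_back(r); return; }
    int hw = r.w / 2, hh = r.h / 2;
    Region q[4] = { {r.x, r.y, hw, hh},            {r.x + hw, r.y, r.w - hw, hh},
                    {r.x, r.y + hh, hw, r.h - hh}, {r.x + hw, r.y + hh, r.w - hw, r.h - hh} };
    for (int k = 0; k < 4; ++k)
        if (q[k].w > 0 && q[k].h > 0) split(img, q[k], tol, out);
}

A merge pass over the resulting regions (comparing adjacent regions and joining those that satisfy the same constraint) completes the split and merge algorithm.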


Figure. 2.13 Region splitting and merging tree

2.5.4.2. Region Growing
The region growing approach is the opposite of the split and merge approach:
 An initial set of small areas is iteratively merged according to similarity constraints.
 Start by choosing an arbitrary seed pixel and compare it with neighbouring pixels.
 A region is grown from the seed pixel by adding in neighbouring pixels that are similar, increasing the size of the region.
 When the growth of one region stops, we simply choose another seed pixel which does not yet belong to any region and start again.
 This whole process is continued until all pixels belong to some region.
 This is a bottom up method.

Region growing methods often give very good segmentations that correspond well to the observed edges.
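A minimal sketch of growing a single region from a seed pixel is shown below, assuming 4-connectivity and a fixed intensity tolerance as the similarity constraint; the label convention and names are illustrative assumptions.

#include <cstdlib>
#include <queue>
#include <utility>
#include <vector>

// Grow one region: neighbours whose intensity is within `tol` of the seed are
// added to the region, and their neighbours examined in turn.
// A label value of 0 means "not yet assigned to any region".
void growRegion(const std::vector<std::vector<int> >& img,
                std::vector<std::vector<int> >& label,
                int seedRow, int seedCol, int regionId, int tol)
{
    int rows = img.size(), cols = img[0].size();
    int seedValue = img[seedRow][seedCol];
    std::queue<std::pair<int, int> > frontier;
    frontier.push(std::make_pair(seedRow, seedCol));
    label[seedRow][seedCol] = regionId;
    while (!frontier.empty()) {
        std::pair<int, int> p = frontier.front();
        frontier.pop();
        const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
        for (int k = 0; k < 4; ++k) {
            int r = p.first + dr[k], c = p.second + dc[k];
            if (r < 0 || r >= rows || c < 0 || c >= cols) continue;
            if (label[r][c] != 0) continue;                        // already assigned
            if (std::abs(img[r][c] - seedValue) > tol) continue;   // not similar enough
            label[r][c] = regionId;
            frontier.push(std::make_pair(r, c));
        }
    }
}

Calling growRegion repeatedly with a fresh seed (any pixel still labelled 0) and an incremented regionId continues until every pixel belongs to some region.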


Figure. 2.14 Example of region growing

However, starting with a particular seed pixel and letting this region grow completely before trying other seeds biases the segmentation in favour of the regions which are segmented first. This can have several undesirable effects:
 The current region dominates the growth process; ambiguities around edges of adjacent regions may not be resolved correctly.
 Different choices of seeds may give different segmentation results.
 Problems can occur if the (arbitrarily chosen) seed point lies on an edge.

To counter the above problems, simultaneous region growing techniques have been developed:

 Similarities of neighbouring regions are taken into account in the growing process.
 No single region is allowed to completely dominate the proceedings.
 A number of regions are allowed to grow at the same time.
  o Similar regions will gradually coalesce into expanding regions.
 Control of these methods may be quite complicated, but efficient methods have been developed.
 These methods are easy and efficient to implement on parallel computers.


CHAPTER 3
Neural Network

3.1 Introduction
3.1.1 Human and Computer
What is the difference between the human brain and the computer? The computer can do logical things well, but it is very bad at performing simple visual tasks, whereas a human can easily decide on and give a name to anything he sees.

The problem is that computers often do not perform as well as we want them to. This is the problem that people in AI want to tackle, but their efforts have not been sufficient to allow them to claim that they have computer systems that are artificially intelligent in any general sense that we would recognize.

The approach of neural computing is to capture the guiding principles that underlie the brain's solution to these problems and apply them to computer systems.

The most important feature of the human brain is that it is able to learn; it can teach itself, which is not true of conventional computer systems. In these, the computer usually has a long and complicated program which gives it specific instructions as to what to do at every stage in its operation.


3.1.2 The structure of the brain
The brain consists of about 10^10 basic units called neurons; each neuron is a stand-alone analogue logical processing unit. The neurons are of two types:
a) Local processing interneuron cells that have their input and output connections over about 100 microns.
b) Output cells that connect different regions of the brain to each other, connect the brain to muscle, or connect from sensory organs into the brain.

The operation of a neuron is simple. It accepts many inputs, which are all added up in some fashion. If enough active inputs are received at once, then the neuron will be activated and will "fire"; if not, the neuron will remain in its inactive, quiet state.

The soma is the body of the neuron. Attached to the soma are long, irregularly shaped filaments called dendrites; the dendrites act as the connections through which all the inputs to the neuron arrive. The cells are able to perform more complex functions than simple addition on the inputs they receive, but a simple summation is a reasonable approximation.

Another part of the neuron is the axon. It is electrically active and serves as the output channel of the neuron. Axons always appear on output cells, but are often absent from interneurons. The axon is a non-linear threshold device producing a voltage pulse called an action potential, which is in fact a series of rapid voltage spikes.

The synapse is at the end of the axon; it is a specialized contact that couples the axon with the dendrite of another cell. The synapse releases chemicals called neurotransmitters when its potential is raised sufficiently by

the action potential. It may take the arrival of more than one action potential before the synapse is triggered.

Figure 3.1 The basic features of a biological neuron (soma, dendrites, axon and synapses)

3.1.3 Learning in biological systems
Learning is thought to occur when modifications are made to the effective coupling between one cell and another, at the synaptic junction.

The mechanism for achieving this seems to be to facilitate the release of more neurotransmitters. This has the effect of opening more gates on the dendrite on the post-synaptic side of the junction, and so increasing the coupling of the two cells. The adjustment of coupling so as to favorably reinforce good connections is an important feature of artificial neural net models, as is the effective coupling, or weighting, that occurs on connections into a neural cell.


3.1.4 Learning in Machines
The concept of machine learning goes against many of the commonly held beliefs about computers: that they can do only what they are programmed to do and cannot adapt to their surroundings. Whilst it is true on an atomic level that the program controls the machine, the behavior that results does not have to be as rigid and deterministic as is commonly felt.

Having a computer learn to respond correctly to a given input, or learn to play a game, is not a simple concept, and it is often felt that complicated programs and systems are required to achieve behavior such as this, which many would class as one of the requirements for intelligence. As an example we take MENACE as a reference. The first game is played with the machine moving completely at random. When the game is over, the outcome is fed back into the machine so that it can adapt its behavior in the light of the outcome, and so learn to play better next time. This is achieved by reinforcing all the moves that were ultimately successful when the machine won, and by decreasing the chance of it repeating the bad moves that led to defeat. This process continues until the probability of the machine making a good move far outweighs the chance of it making a bad one.

The important feature of machine learning is that it usually takes some time for a machine to achieve a good probabilistic solution to a problem, but it is possible, given that reinforcement learning takes place. MENACE treats the process of learning to play a game as a series of smaller subproblems, each not enough on its own to play the game.


3.2 Pattern Recognition
Pattern recognition is currently the dominant area for the application of neural networks. It is a large area of computer science in itself.

3.2.1 Pattern Recognition in Perspective
To appreciate what the pattern recognition problem is, let us take reading as an example. A significant proportion of the information that we absorb is presented to us in the form of patterns. Before we even start to consider the far-reaching cognitive issues of language processing, the visual system must solve the pattern recognition problem. However, if we present this task to a computer we soon begin to realize the enormous complexity of the problem. Classification is one of the simpler pattern recognition tasks. It could be considered to be resolved using a template matching technique, where each letter is read into a fixed-size frame and the frame compared to templates of all possible characters.

But consider the case for handwritten text-it would prove a near impossibility to provide templates to cope with the widely varying patterns in cursive script. Text processing is just one example of the pattern recognition problem. The difficulties described above are further complicated when we turn our attention to processing images, speech, or even stock market trends.

3.2.2 Pattern Recognition - a Definition
A pattern recognition system can be considered as a two-stage device. The first stage is feature extraction; the second is classification.


We define a feature as a measurement taken on the input pattern that is to be classified. Typically we are looking for features that will provide the defining characteristics of that input type. Feature extraction is rarely trivial and often poses the greater part of the recognition problem.

The classifier is supplied with the list of measured features. Its task is to map these input features onto a classification state; that is, given the input features, the classifier must decide which class they match most closely. Classifiers typically rely on distance metrics and probabilities to do this.

3.2.3 Discriminant Functions
Discriminant functions are the basis for the majority of pattern recognition techniques. For example, take the two-dimensional rugby player and ballet dancer example shown in Figure 3.2. Looking at the spread of measured samples we can see that they form two distinct clusters.

By looking again, we could intuitively decide that some line (a decision boundary) drawn between the two classes could separate them, as in figure 3.3.

The mathematical definition of such a decision boundary is a “discriminant function”. It is a function that maps our input features onto a classification space. In our simple example there is an infinite number of boundaries we could have drawn to separate the two regions. In practice, it is advisable to make the discriminant function as simple as possible.


Figure 3.2 A two-dimensional Euclidean feature space (height against weight) for rugby players and ballet dancers

Figure 3.3 A linear classification decision boundary.


3.2.4. Classification Techniques
Pattern classification techniques fall into two broad categories: numeric and non-numeric. Numeric techniques include deterministic and statistical measures, which can be considered as measures made on the geometric pattern space. Non-numeric techniques are those which take us into the domain of symbolic processing that is dealt with by such methods as fuzzy sets.

3.2.4.1. Nearest Neighbor Classification
Nearest neighbor techniques, in essence, make a decision based on the shortest distance to the neighboring class samples: the input is assigned to whichever class it appears to be closest to. So the discriminant function will be of the form

f(x) = closest(class1) - closest(class2)          (3-1)

3.2.4.2. Distance Metrics
A) Hamming distance measure
This is the simplest one to use. For two vectors X = (x1, x2, x3, ...) and Y = (y1, y2, y3, ...), the Hamming distance is found by evaluating the difference between each component of one vector and the corresponding component of the other, and summing these differences to provide an absolute value for the variation between the two vectors:

H = \sum_i |x_i - y_i|          (3-2)

B) Euclidean distance measure
Consider two vectors X and Y between which we wish to find the distance d(X, Y); the shortest (straight-line) distance is

d(X, Y)_euc = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}          (3-3)

where n is the dimensionality of the vectors.

C) City block distance
This method approximates the Euclidean measure without calculating the square or square-root functions:

D_cb = \sum_i |x_i - y_i|          (3-4)

D) Square distance
With this measure the distance between two vectors is defined as the maximum difference between corresponding elements of the two vectors:

D_sq = \max_i |x_i - y_i|          (3-4)
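The four measures translate directly into code; the sketch below assumes real-valued feature vectors of equal length and is only an illustration of equations (3-2) to (3-4).

#include <algorithm>
#include <cmath>
#include <vector>

double hammingDistance(const std::vector<double>& x, const std::vector<double>& y)
{
    double h = 0.0;                                   // equation (3-2)
    for (std::size_t i = 0; i < x.size(); ++i) h += std::fabs(x[i] - y[i]);
    return h;
}

double euclideanDistance(const std::vector<double>& x, const std::vector<double>& y)
{
    double s = 0.0;                                   // equation (3-3)
    for (std::size_t i = 0; i < x.size(); ++i) s += (x[i] - y[i]) * (x[i] - y[i]);
    return std::sqrt(s);
}

double cityBlockDistance(const std::vector<double>& x, const std::vector<double>& y)
{
    double d = 0.0;                 // same sum of absolute differences as above,
    for (std::size_t i = 0; i < x.size(); ++i) d += std::fabs(x[i] - y[i]);
    return d;                       // but with no square or square root involved
}

double squareDistance(const std::vector<double>& x, const std::vector<double>& y)
{
    double m = 0.0;                                   // maximum component difference
    for (std::size_t i = 0; i < x.size(); ++i) m = std::max(m, std::fabs(x[i] - y[i]));
    return m;
}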

3.2.5. Linear Classifiers
The decision boundary defines a discriminant function f(x) of the form

f(x) = \sum_{i=1}^{n} w_i x_i          (3-5)

where x_i is the ith component of the input vector, w_i is the ith component of a weight vector, and n is the dimensionality of the input vector.

The output of the function for any input will be either a positive or negative value depending upon the value of the weight vector and the input vector. Then we have a decision mechanism that simply looks for the sign of f(x) for any input value.

The problem is finding a suitable weight vector that will give these results for all inputs. If we expand the discriminant function we can visualize the dependence of the output on the weight vector. We have

f(x) = \sum_i w_i x_i + \theta          (3-6)

which expands to

f(x) = (|W| |X| \cos\alpha) + \theta          (3-7)

where \alpha is the angle between the vectors X and W, and \theta is the bias (offset) term.

There are two parameters that control the position of the decision boundary in the pattern space: the slope of the line and the Y-axis intercept. The slope of the line is determined by the value of the weight vector. When the output of the classifier is zero we have

\sum_i w_i x_i + \theta = 0

so, in two dimensions,

x_1 w_1 + x_2 w_2 + \theta = 0,    x_2 = -\frac{w_1}{w_2} x_1 - \frac{\theta}{w_2}          (3-8)

We can note that the slope is determined by the ratio of the weight values w1 and w2, and the intercept is controlled by the bias value. Finding the weight vector is a problem, but it is usually found by iterative trial-and-error methods that modify the weight values according to some error function. The error function typically compares the output of the classifier with a desired response and gives an indication of the difference between the two.

3.3. The Basic Neuron
The idea behind neural network computing is that by modeling the major features of the brain and its operation we can produce computers that exhibit many of the useful properties of the brain.

3.3.1. Modeling the single neuron
The basic function of the biological neuron is to add up its inputs and to produce an output if this sum is greater than some value, known as the threshold.

Our model of the neuron must capture these important features. We can summarize them as follows:
 The output from a neuron is either on or off.


 The output depends only on the inputs. A certain number must be on at any one time in order to make the neuron fire.

Figure 3.4 shows the outline of the basic model. It performs a weighted sum of its inputs, compares this to some internal threshold level, and turns on only if this level is exceeded. If not, it stays off. This system is called feed-forward because the inputs pass through the model neuron to produce the output.

Figure 3.4 Outline of the basic model: each input is scaled by a multiplicative weight, and the body adds up the weighted inputs and then thresholds the sum to produce the output

The total input to the model is

Total = (weight on line 1 * input no. 1) + (weight on line 2 * input no. 2) + ...
      = w_1 x_1 + w_2 x_2 + ... = \sum_{i=1}^{n} w_i x_i          (3-9)

This sum then has to be compared to a certain value in the neuron, the threshold value. If the sum is greater than the threshold value, the output is a 1; if less, the output is a 0. An alternative way of achieving the same effect is to take the threshold out of the body of the model neuron and connect it to an extra input that is fixed to be on all the time, with weight -\theta; this is known as biasing the neuron. The value -\theta is therefore known as the neuron's bias or offset.

Then the output of the neuron will be of the form

y = f_h\left( \sum_{i=1}^{n} w_i x_i - \theta \right)          (3-10)

where f_h is a step function (the Heaviside function):

f_h(x) = 1 for x > 0,    f_h(x) = 0 for x <= 0          (3-11)

so that it does what we want. If we use the bias term as an extra input, input 0, which is always set to be on, with a weight w_0 = -\theta that represents the bias applied to the neuron, the equation becomes

y = f_h\left( \sum_{i=0}^{n} w_i x_i \right)          (3-12)

This model is shown in figure 3.5.


Figure 3.5 Details of the basic model: inputs x0, x1, ..., xn are weighted by w0, w1, ..., wn, summed, and passed through the threshold unit

3.3.2. Learning in the simple neuron
The guiding principle is to allow the neuron to learn from its mistakes: if it produces an incorrect output, we want to reduce the chance of that happening again. We set up the neuron with random weights on its input lines, corresponding to a starting state in which it knows nothing. The neuron then performs the weighted sum of the inputs and compares this to the threshold. If it exceeds the threshold, it will output a 1, whilst if it does not, it will output a 0.

Let us assume it gets the correct answer; then we do not need to do anything, since the model has been successful. But if the neuron produces a 0 when we wanted a 1, we want to increase the weighted sum so that next time it will exceed the threshold and so produce the correct output. We do this by increasing the weights.


This means that for the network to learn, we want to increase the weights on the active inputs when we want the output to be active, and to decrease them when we want the output to be inactive. We can achieve this by adding the input values to the weights when we want the output to be on, and subtracting the input values from the weights when we want the output to be off. Since the learning is guided by knowing what we want to achieve, it is known as supervised learning. Our learning paradigm can be summarized as follows:
 Set the weights and threshold randomly.
 Present an input.
 Calculate the actual output by taking the thresholded value of the weighted sum of the inputs.
 Alter the weights to reinforce correct decisions and discourage incorrect ones - i.e. reduce the error.
 Present the next input.
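A minimal sketch of one presentation under this paradigm is given below; it assumes the bias is handled as weight w[0] with a constant input x[0] = 1, and it uses the plain add/subtract rule from the text (a learning-rate factor is often included in practice).

#include <vector>

// Recall: thresholded weighted sum of the inputs (w[0] acts as the bias).
int recall(const std::vector<double>& w, const std::vector<double>& x)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) sum += w[i] * x[i];
    return sum > 0.0 ? 1 : 0;
}

// One supervised learning step: reinforce or discourage the weights on the
// presented input according to the desired output.
void train(std::vector<double>& w, const std::vector<double>& x, int target)
{
    int output = recall(w, x);
    if (output == target) return;                 // correct: leave the weights alone
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += (target == 1 ? x[i] : -x[i]);     // add or subtract the input values
}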

3.3.3. The Perceptron: A Vectorial Perspective
If we write the input to the perceptron as a vector X with n elements, and write the weights as another vector W, then we can replace the weighted sum with the identical vector dot product:

\sum_{i=0}^{n} w_i x_i = W \cdot X          (3-13)

We can understand how the perceptron learning procedure works on an intuitive level by examining the behavior of the weight vector as the perceptron learns patterns. The solution to classifying the patterns is to produce a dividing line between them. That line is what we want our perceptron to discover for itself. A line such as this, which separates two classes in pattern space, is said to partition the space into two classes. The perceptron generates this line by adjusting the value of the weight vector. The perceptron starts with a random weight vector that points anywhere in the pattern space. A pattern is presented, and the learning procedure ensures that if the output is incorrect, the weight vector is altered to reduce the error. This is achieved by moving the vector a small amount towards the ideal weight vector. See figure 3.6.

Figure 3.6 Behavior of the weight vector in pattern space

3.4. Neural Network Applications
Neural networks perform successfully in some fields, such as recognizing and matching complicated, vague, or incomplete patterns. Neural networks have been applied to solving a wide variety of problems. The most

common use for neural networks is to project what will most likely happen. There are many areas where prediction can help in setting priorities. For example, the emergency room at a hospital can be a hectic place; knowing who most critically needs help can enable a more successful operation. Basically, all organizations must establish priorities, which govern the allocation of their resources. Neural networks have been used as a mechanism of knowledge acquisition for expert systems in stock market forecasting with astonishingly accurate results. Neural networks have also been used for bankruptcy prediction for credit card institutions.

Although one may apply neural network systems for interpretation, prediction, diagnosis, planning, debugging, repair, instruction, and control, the most successful applications of neural networks are in categorization and pattern recognition. Such a system classifies the object under investigation (e.g. an illness, a pattern, a picture, a chemical compound, a word, the financial profile of a customer) as one of numerous possible categories that, in return, may trigger the recommendation of an action (such as a treatment plan or a financial plan). A company called Nestor has a neural network for financial risk assessment for mortgage insurance decisions, categorizing the risk of loans as good or bad. Neural networks have also been applied to convert text to speech; NETtalk is one of the systems developed for this purpose. Image processing and pattern recognition form an important area of neural networks, and probably one of the most active research areas of neural networks.

Another application of neural networks is character recognition and handwriting recognition. This area has applications in banking, credit card processing and other financial services, where reading and correctly recognizing


handwriting on documents is of crucial significance. The pattern recognition capability of neural networks has been used to read handwriting in processing checks, where the amount must normally be entered into the system by a human. A system that could automate this task would expedite check processing and reduce errors.

Basically, most applications of neural networks fall into the following five categories:
(i) Prediction: use input values to predict some output (e.g. pick the best stocks in the market, predict the weather, identify people at risk of cancer).
(ii) Classification: use input values to determine the classification (e.g. is the input the letter A; is the blob of video data a plane, and what kind of plane is it).
(iii) Data association: like classification, but it also recognizes data that contains errors (e.g. not only identify the characters that were scanned, but identify when the scanner is not working properly).
(iv) Data conceptualization: analyze the inputs so that grouping relationships can be inferred (e.g. extract from a database the names of those most likely to buy a particular product).
(v) Data filtering: smooth an input signal (e.g. take the noise out of a telephone signal).


3.5. Altering the Perceptron Model
An initial approach to overcoming the problem of being unable to solve linearly inseparable problems with our perceptron would be to use more than one perceptron, each set up to identify small, linearly separable sections of the inputs, and then combine their outputs into another perceptron, which would produce a final indication of the class to which the input belongs.

But this approach is unable to learn. For the perceptrons in the first layer, the inputs come from the actual inputs to the network, while the perceptrons in the second layer take as their inputs the outputs from the first layer. This means that the perceptrons in the second layer do not know which of the real inputs were on or not.

Since learning corresponds to strengthening the connections between active inputs and active units, it is impossible to strengthen the correct parts of the network, since the actual inputs are effectively masked off from the output units by the intermediate layer. The two-state neuron gives us no indication of the scale by which we need to adjust the weights, and so we cannot make a reasonable adjustment. This is known as the credit assignment problem: the network is unable to determine which of the input weights should be increased and which should not, and so is unable to work out what changes should be made to produce a better solution next time. We can solve this problem by adjusting the thresholding process slightly and using a slightly different non-linearity: if we smooth out the step function (figure 3.7), there is a region in the middle that gives us some information on whether to strengthen or weaken the relevant weights. This means that the network will be able to learn as required.

Figure 3.7 Two possible thresholding functions: (A) the linear (hard-limit) threshold, which outputs 0 or 1; (B) the sigmoidal threshold.

3.6. Neural Network Architectures
3.6.1. Multilayer Perceptron
This model consists of three layers: an input layer, an output layer, and a hidden layer (figure 3.8). Each unit in the hidden layer and the output layer is like a perceptron unit, except that the thresholding function is the sigmoid function (figure 3.7 B).

The units in the input layer serve only to distribute the values they receive to the next layer, and so do not perform a weighted sum or threshold.


Figure 3.8 The multilayer perceptron neural network (input layer, hidden layer, output layer)

3.6.1.1. The Learning Rule of the MLP
The rule for learning in the MLP is called the back propagation rule. The operation of the network is similar to that of the single-layer perceptron, in that we show the net a pattern and calculate its response; comparison with the desired response enables the weights to be altered so that the network can produce a more accurate output next time. The use of the sigmoid function means that enough information about the output is available to units in earlier layers, so that these units can have their weights adjusted so as to decrease the error next time.

The learning rule is a little more complex than the previous one. We show the untrained network an input pattern; it will produce some random output. We need to define an error function that represents the difference between the network's current output and the correct output that we want it to


produce. Because we need to know the "correct" pattern, this type of learning is known as "supervised learning".

In order to learn successfully we want to make the output of the net approach the desired output; that is, we want to continually reduce the value of this error function. This is achieved by adjusting the weights on the links between the units, and the generalized delta rule does this by calculating the value of the error function for that particular input, and then back-propagating the error from one layer to the previous one.

Each unit in the net has its weights adjusted so that it reduces the value of the error function. For units actually on the output, their output and the desired output are known, so adjusting the weights is relatively simple; but for units in the middle layer the adjustment is not so obvious. We can summarize the MLP learning process as follows:
 Initialize weights and thresholds with random values.
 Present the input and desired output. Calculate the actual output of each layer,

y_{pj} = f\left( \sum_{i=0}^{n-1} w_{ij} x_i \right)          (3-14)

and pass that as input to the next layer. The final layer outputs the values o_{pj}.
 Adapt the weights, starting from the output layer:

w_{ij}(t+1) = w_{ij}(t) + \eta \, \delta_{pj} \, o_{pi}          (3-15)

For output units:

\delta_{pj} = o_{pj} (1 - o_{pj}) (t_{pj} - o_{pj})          (3-16)

For hidden units:

\delta_{pj} = o_{pj} (1 - o_{pj}) \sum_k \delta_{pk} w_{jk}          (3-17)

where \eta is the learning rate, t_{pj} is the desired (target) output, and the sum in (3-17) runs over the units k in the following layer.
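A sketch of one training presentation for a single-hidden-layer network, following equations (3-14) to (3-17), is given below; the layer layout, variable names and the learning rate eta are assumptions made for illustration.

#include <cmath>
#include <vector>

double sigmoid(double a) { return 1.0 / (1.0 + std::exp(-a)); }

// One back-propagation step for a network with one hidden layer.
// wHidden[j][i]: weight from input i to hidden unit j.
// wOutput[k][j]: weight from hidden unit j to output unit k.
void backpropStep(std::vector<std::vector<double> >& wHidden,
                  std::vector<std::vector<double> >& wOutput,
                  const std::vector<double>& x,
                  const std::vector<double>& target, double eta)
{
    std::size_t nHid = wHidden.size(), nOut = wOutput.size();

    // Forward pass, equation (3-14), applied layer by layer.
    std::vector<double> hid(nHid), out(nOut);
    for (std::size_t j = 0; j < nHid; ++j) {
        double a = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) a += wHidden[j][i] * x[i];
        hid[j] = sigmoid(a);
    }
    for (std::size_t k = 0; k < nOut; ++k) {
        double a = 0.0;
        for (std::size_t j = 0; j < nHid; ++j) a += wOutput[k][j] * hid[j];
        out[k] = sigmoid(a);
    }

    // Output deltas, equation (3-16).
    std::vector<double> deltaOut(nOut);
    for (std::size_t k = 0; k < nOut; ++k)
        deltaOut[k] = out[k] * (1.0 - out[k]) * (target[k] - out[k]);

    // Hidden deltas, equation (3-17): error propagated back through wOutput.
    std::vector<double> deltaHid(nHid);
    for (std::size_t j = 0; j < nHid; ++j) {
        double s = 0.0;
        for (std::size_t k = 0; k < nOut; ++k) s += deltaOut[k] * wOutput[k][j];
        deltaHid[j] = hid[j] * (1.0 - hid[j]) * s;
    }

    // Weight updates, equation (3-15).
    for (std::size_t k = 0; k < nOut; ++k)
        for (std::size_t j = 0; j < nHid; ++j)
            wOutput[k][j] += eta * deltaOut[k] * hid[j];
    for (std::size_t j = 0; j < nHid; ++j)
        for (std::size_t i = 0; i < x.size(); ++i)
            wHidden[j][i] += eta * deltaHid[j] * x[i];
}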

3.6.1.2. Learning Difficulties
Sometimes the network settles into a stable solution that does not provide the correct output. In these cases the error (energy) function is in a local minimum. There are alternative approaches to minimizing these occurrences, such as:
 Lowering the gain term.
 Addition of internal nodes.
 Using a momentum term.
 Addition of noise.

3.6.1.3. The Multilayer Perceptron as a Classifier
Consider a net of three perceptron devices as shown in figure 3.9. If the unit in the second layer has its threshold set so that it turns on only when both of the first-layer units are on, it is performing a logical AND operation. Since each of the units in the first layer defines a line in pattern space, the second unit produces a classification based on a combination of these lines.


Figure 3.9 Two perceptron units can be combined to produce input for a third.

If one unit is set to respond with a 1 if the input is above its decision line, and the other responds with a 1 if the same input is below its decision line, then the second layer produces a solution as shown in figure 3.10, producing a 1 if the input is above line 1 and below line 2.

Figure 3.10 Three perceptrons: the decision region produced by combining two perceptrons with another.

More than two units can be used in the first layer, which produces a pattern space partition that is a combination of more than two lines. All regions produced in this way are known as convex regions or convex hulls.


The addition of more perceptron units in the first layer allows us to define more and more edges - from the points made above, it is obvious that the total number of sides that we can have in our regions will be at most equal to the number of units in the first layer, and that the regions defined will still be convex.

However, if we add another layer of perceptrons, the units in this layer will receive as inputs not lines but convex hulls, and combinations of these are not necessarily convex: the convex regions may intersect, overlap, or be separate from each other, producing arbitrary shapes.

Three layers of perceptron units can therefore form arbitrarily complex shapes and are capable of separating any classes. The complexity of the shapes is limited by the number of nodes in the network, since these define the number of edges that we can have. The arbitrary complexity of the shapes we can create means that we never need more than three layers in a network, a statement that is referred to as the Kolmogorov theorem; this can be proved with a bit of complex maths.

3.6.2. Radial Basis Functions
These are a set of generally non-linear functions that are built up into one function that can partition the pattern space successfully. The radial basis approach uses hyper-ellipsoids to partition the pattern space. These are defined by functions of the form

\phi(\|x - y\|)          (3-18)

where ||...|| denotes some distance measure. This expression describes some sort of multi-dimensional ellipse, since it represents a function whose argument is related to the distance from a centre y. The function s in K-dimensional space, which partitions the space, has elements s_k given by

s_k = \sum_{j=1}^{m} \lambda_{jk} \, \phi(\|x - y_j\|)          (3-19)

It is a linear combination of these basis functions.

The advantage of using the radial basis approach is that once the basis functions have been chosen, all that is left to determine are the coefficients \lambda_j for each, to allow them to partition the space correctly. Since these coefficients are added in a linear fashion, this approach has no nasty local minima in which to fall. The radial basis functions have expanded the inputs into a higher-dimensional space where they are now linearly separable.
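As an illustration of equation (3-19), the sketch below evaluates one output component s_k, assuming Gaussian basis functions of a common width; the choice of basis function and the parameter names are assumptions, since the text only requires some radially symmetric non-linear function.

#include <cmath>
#include <vector>

// Evaluate s_k = sum_j lambda[j] * phi(||x - centre_j||) with a Gaussian phi.
double rbfOutput(const std::vector<double>& x,
                 const std::vector<std::vector<double> >& centres,
                 const std::vector<double>& lambda, double width)
{
    double s = 0.0;
    for (std::size_t j = 0; j < centres.size(); ++j) {
        double r2 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            r2 += (x[i] - centres[j][i]) * (x[i] - centres[j][i]);
        s += lambda[j] * std::exp(-r2 / (2.0 * width * width));  // linear combination
    }
    return s;
}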

The fact that this approach is guaranteed to produce a function fitting every input does mean that noisy or anomalous data points will also be classified, however, and these will tend to cause distortion. This noise distortion causes problems with generalization: since the classification surface is not necessarily smooth, very similar inputs may find themselves assigned to very different classes. The solution to this is to reduce the number of basis functions to a level at which an acceptable fit to the data is still achieved.

The choice of which radial basis functions to use is usually made in one of two ways:


1. In the absence of any knowledge about the data, the basis functions are chosen so that they fit points evenly distributed through the set of possible inputs.
2. If we have some knowledge as to the overall structure of the inputs, then it is better to try and mirror that structure in the choice of functions.

This is most easily achieved by choosing a subset of the input points, which should have a similar distribution to the overall input.

The difficulty in using radial basis functions is in deciding on the set of basis functions to be used in order to get an acceptable fit to the data; this is one of a number of techniques that essentially preprocess the data and transform it into a higher-dimensional space in which the classes are linearly separable.

3.6.3. Kohonen Self-Organizing Map Networks
Kohonen uses the idea that the brain uses spatial mapping to model complex data structures internally to good advantage in his network; it allows him to perform data compression on the vectors to be stored in the network, using a technique known as vector quantization. It also allows the network to store data in such a way that spatial or topological relationships in the training data are maintained and represented in a meaningful way.

The network shown in figure 3.11 is a one-layer, two-dimensional Kohonen network. The most obvious point to note is that the neurons are not arranged in layers as in the MLP, but rather on a flat grid. All inputs connect to every node in the network, and feedback is restricted to lateral interconnections to immediate neighboring nodes. Note that there is no separate output layer - each of the nodes in the grid is itself an output node.

The learning algorithm organizes the nodes in the grid into local neighborhoods that act as feature classifiers on the input data. The topographic map is autonomously organized by a cyclic process of comparing input patterns to vectors "stored" at each node. No training response is specified for any training input.

Figure 3.11 A Kohonen feature map. Note that there is only one layer of neurons and all inputs are connected to all nodes.

Where inputs match the node vectors, that area of the map is selectively optimized to represent an average of the training data for that class. From a randomly organized set of nodes, the grid settles into a feature map that has local representation and is self-organized.


3.6.3.1. Weight Training
We can see that the change in a weight value is proportional to the difference between the input vector and the weight vector:

w_{ij}(t+1) = w_{ij}(t) + \alpha(t) (x_i(t) - w_{ij}(t))          (3-20)

where w_{ij} is the ith component of the weight vector to node j, for j in the neighborhood N_{j*}(t) (0 <= i <= n-1), and \alpha(t) is the learning rate coefficient (0 < \alpha(t) < 1).

The training cycle has two stages:
1. The training process attempts to cluster the nodes on the topological map to reflect the range of class types found in the training data.
2. The map is fine-tuned to the input training vectors. To achieve this fine-tuning, much smaller changes must be made to the weight vectors at each node, so the adaptation rate is reduced as training progresses.

Each time a new training input is applied to the network, the winning node must first be located; this identifies the region of the feature map that will have its weight values updated. The winning node is the node that has the closest matching weight vector to the input vector, and the metric used to measure the similarity of the vectors is the Euclidean distance measure.
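A sketch of one training presentation following equation (3-20) is given below; it assumes the nodes are laid out row by row on a square grid and that the neighbourhood is a square of the given radius. In a full implementation both the radius and the learning rate alpha would shrink as training progresses, as described in the next section.

#include <cstdlib>
#include <vector>

// One SOM training step: find the winning node (closest weight vector by
// Euclidean distance), then move the weight vectors of all nodes within
// `radius` grid steps of the winner towards the input.
void somStep(std::vector<std::vector<double> >& w,   // w[node][component]
             const std::vector<double>& x, int gridWidth, int radius, double alpha)
{
    std::size_t winner = 0;
    double best = 1e30;
    for (std::size_t n = 0; n < w.size(); ++n) {
        double d2 = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            d2 += (x[i] - w[n][i]) * (x[i] - w[n][i]);
        if (d2 < best) { best = d2; winner = n; }     // best match so far
    }
    int wr = int(winner) / gridWidth, wc = int(winner) % gridWidth;
    for (std::size_t n = 0; n < w.size(); ++n) {
        int r = int(n) / gridWidth, c = int(n) % gridWidth;
        if (std::abs(r - wr) > radius || std::abs(c - wc) > radius) continue;
        for (std::size_t i = 0; i < x.size(); ++i)
            w[n][i] += alpha * (x[i] - w[n][i]);      // equation (3-20)
    }
}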

The network weights should be set to small, normalized random values. Typically the input training vectors will fall into clusters over limited regions of the pattern space, corresponding to their classes. If the weight vectors stored at the nodes in the network are randomly spread, then the situation could quite easily arise where many of the weight vectors are in a very

different orientation to the majority of the training inputs. These nodes will not win any of the best – match comparisons and will remain unused in forming the topological map.

3.6.3.2. Neighborhoods
A neighborhood is a dynamically changing boundary that defines how many nodes surrounding the winning node will be affected by weight modifications during the training process.

When a node is selected as the closest match to an input it will have its weights adapted to tune it to the input signal. However, all the nodes in the neighborhood will also be adapted by a similar amount. As training progresses the size of the neighborhood is slowly decreased to a predefined limit.

The adaptation rate must be reduced during the training cycle so that weight changes are made more and more gradually as the map develops. This ensures that clusters form accurate internal representations of the training data, as well as causing the network to converge to a solution within a predefined time limit.

The training is also affected by the shape of the neighborhood boundary. It is preferable to start with a fairly wide neighborhood initially and allow it to decrease slowly with the number of training passes.

3.6.4. Hopfield Networks
The Hopfield network consists of a number of nodes, each connected to every other node; it is a fully-connected network, as shown in figure 3.12. It is also a symmetrically-weighted network, since the weights on the link from one node to another are the same in both directions.

Figure 3.12 Hopfield Network

Each node has a threshold and a step function. The nodes calculate the weighted sum of their inputs minus the threshold value, passing that through the step function to determine their output state. The net takes only 2-state inputs - these can be binary (0,1) or bipolar (-1,+1). From figure 3.12, we can see that there are no obvious input or output connections - each node is the same as any other, so the network operates in a different way.

Inputs to the network are applied to all nodes at once, and consist of a set of values, each +1 or -1. The network is then left alone, and it proceeds to cycle through a succession of states until it converges to a stable solution, which happens when the values of the nodes no longer alter. The output of the network is taken to be the value of all the nodes when the network has reached this stable, steady state.
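A sketch of the recall cycle is given below; it assumes bipolar states, zero thresholds and a synchronous update of all nodes per cycle, which keeps the code short. Real Hopfield networks usually update nodes one at a time (asynchronously), which is the setting in which convergence to a stable state is guaranteed, so this is only an illustration of the idea.

#include <vector>

// Cycle a bipolar Hopfield net until no node changes state (or maxCycles).
// w[i][j] is the symmetric weight between nodes i and j, with w[i][i] unused.
std::vector<int> hopfieldRecall(const std::vector<std::vector<double> >& w,
                                std::vector<int> state, int maxCycles)
{
    std::size_t n = state.size();
    for (int cycle = 0; cycle < maxCycles; ++cycle) {
        bool changed = false;
        std::vector<int> next(n);
        for (std::size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) sum += w[i][j] * state[j];
            next[i] = (sum >= 0.0) ? 1 : -1;          // step function, zero threshold
            if (next[i] != state[i]) changed = true;
        }
        state = next;
        if (!changed) break;     // stable solution: node values no longer alter
    }
    return state;
}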

3.7. Hardwired Neural Networks
We define NNW hardware as those devices designed to implement neural architectures and learning algorithms, especially those devices that take advantage of the parallel nature inherent in NNWs. We give here a brief review of NNW hardware architectures and some of the products developed over the past decade for NNW implementations.

3.7.1. NNW Chips
 Although NNWs have been built with discrete components, the heart of the modern hardware NNW is a VLSI chip.
 The basic categories are:
1. Digital.
2. Analog.
3. Hybrid.
 Other major distinguishing features:
1. Neural network architecture.
2. Programmable or hardwired network.
3. On-chip learning or chip-in-the-loop training.
4. Low, medium or high number of parallel processing elements (PEs).
5. Maximum network size.
6. Whether chips can be chained together to increase network size.
7. Bits of precision (estimated for analog).
8. Transfer function on-chip or off-chip.
9. Accumulator size in bits.
10. Expensive or cheap.

3.7.2. Comparison Ratings
 Comparing hardware NNW performance can be tricky.
 The most common performance measure is the Connections-Per-Second (CPS) rate, defined as the number of multiply-and-accumulate operations per second during recall, or forward, processing.
 However, a device with only 1-bit weights and inputs may not always be considered superior to another device that has a lower CPS but, say, 16-bit weights and inputs.
 For a measure of training speed, the Connection-Updates-Per-Second (CUPS) rate is sometimes provided.
 The learning rate also depends on the algorithm implemented. A chip with an RBF algorithm could learn more slowly per pass than a feed-forward chip trained with back-propagation, but learns with far fewer passes.
 Unfortunately, just as for software network algorithms, there is no standard benchmark data set on which hardware networks are tested.

3.7.3. Digital NNW Chips
 Digital designs include:
1. Slice architectures.
2. Single Instruction Multiple Data (SIMD).
3. Systolic array devices.
 Digital advantages include:
1. Mature fabrication techniques.
2. Weight storage in RAM.


3. Arithmetic operations are exact to the number of bits of the operands and accumulators.
4. Digital chips are easily integrated into most applications.
 Digital disadvantages:
1. Digital operations are usually slower than analog systems, especially in the weight*input multiplication.
2. The real world is analog, so sensor inputs need conversion to digital, and control outputs need converting back to analog.

3.7.3.1. Digital Designs
 Slice architectures
1. Inspired by conventional bit-slice architectures.
2. Simple, cheap building blocks to construct networks of arbitrary size and precision.
3. Micro Devices MD1220 - one of the first NNW chips:
 8 neurons with hard-limit thresholds.
 8 16-bit synapses with 1-bit inputs.
 With bit-serial multipliers in the synapse, the chip provides about 9 MCPS.
 Bigger networks and networks with higher-bit inputs are constructed with multiple chips.
 A 16-bit accumulator limited the total number of inputs because of overflows.
 Single Instruction Multiple Data (SIMD)
1. Multiple PEs, each simultaneously running the same instruction, but on different data.
2. Powerful design that allows for many different NNWs to be programmed.

3. Adaptive Solutions N6400:
 64 PEs.
 Each PE holds a 9*16-bit integer multiplier, a 32-bit accumulator, and 4 KByte of on-chip memory for weight storage.
 Common control and data buses allow for multiple chips.
 Systolic array devices:
1. Each PE does one step of a calculation (always the same step).
2. It then passes its result on to the next processor in the pipeline.
3. Excellent for the matrix-matrix multiplications common to NNWs.
4. Siemens MA-16:
 Fast matrix-matrix operations (multiply, subtract or add).
 4*4 matrices with 16-bit elements.
 The multiplier outputs and accumulators have 48-bit precision.
 Weights are stored off-chip, and neuron transfer functions are applied off-chip via lookup tables.
 Multiple chips can be cascaded.

3.7.4. Analog & Hybrid NNW Chips
 Analog advantages:
1. They exploit physical properties to do network operations, thereby obtaining high speed and density.
2. A common output line, for example, can sum the current outputs of synapses to form the neuron inputs.
 Analog disadvantages:


1. Design can be very difficult because of the need to compensate for variations in manufacturing, in temperature, etc.
2. Analog weight storage is complicated, especially if non-volatility is required.
3. The weight*input product must be linear over a wide range.
 Hybrids combine digital and analog technology to attempt to get the best of both. Variations include:
1. Internal processing is analog for speed, but weights are set digitally.
2. Pulse networks use the rate or width of pulses to emulate the amplitude of I/O and weights.

3.7.4.1. Analog & Hybrid Designs
 Intel Electrically Trainable Analog Neural Network (ETANN) - no longer available, but the most elaborate and powerful analog chip to date:
 64 neurons.
 10280 synapses with 5-6 bit precision.
 Non-volatile gates and a Gilbert multiplier provided 4-quadrant multiplication.
 Internal feedback and division of the weights into two 64*80 banks (including 16 biases) allowed for multiple configurations:
1. Including 2 layers of 64 neurons/layer.
2. 1 layer with 128 inputs and 64 neurons.
 No on-chip learning is provided, so a chip-in-the-loop mode with a PC is necessary.


3.7.5. Neuromorphic NNWs
 Neuromorphic refers to circuit designs that closely emulate biological neural design.
 The processing is mostly analog, although outputs can be digital.
 The function may be as a sensor rather than as a pattern classifier.
 Examples include:
1. Silicon retina.
2. Synaptic touchpad.
3. Pulse Coupled Neural Networks (PCNN):
 Recently implemented in hardware.
 PCNNs can perform image preprocessing, such as edge finding and segmentation.
 Time series output is invariant to scaling, rotation and translation.

3.7.6. General Purpose vs. Algorithm Specific
 Intel Pentiums have made life difficult for NNW hardware designers.
 NNW hardware has gone in two directions:
1. General purpose (but expensive) NeuroComputers.
2. Simple, cheap, algorithm-specific chips.
 It seems to be important that algorithm-specific chips not be expensive and complicated.
 Intel's own ETANN, for example, could not find a market niche large enough to justify its continued development.
1. The ETANN could implement feed-forward networks and run at high speed (10s).
2. However, it was complicated and time consuming to train.


3. Simply downloading known weights took substantial time due to the iterative technique needed to set the floating gate charges.
4. Expensive: $2K per chip, $10 for a full development system including software and EEPROM.
 Adaptive Solutions' CNAPS has done better (although not spectacularly).
1. Multiple PEs each have their own multiplier, accumulator, memory buffer, etc.
2. Besides the NNW algorithm processing, other tasks can be parallelized, e.g. initializing inputs, normalizing outputs, etc.
3. CNAPS has been used for other purposes, such as image processing, and so broadened its market.
4. It is a complex system requiring its own C compiler, assembler, debugger, etc.
 On the other hand, Sensory Inc.'s cheap dedicated chips for speech recognition seem to be doing quite well - nearly 1 million sold so far.

3.7.7. Hardware Systems
 Besides NNW chips, NNW systems are available.
 A system comprises:
1. A fully loaded board, or a standalone computer, including the NNW chips.
2. A software development environment to train and run the networks.
 Even when one intends to implement a chip on one's own custom board, having a development system to gain experience with the chip can be very useful.


3.7.8. NeuroComputers
 NeuroComputers are defined here as stand-alone systems with elaborate hardware.
 They are intended for large-scale processing applications such as high-throughput OCR, e.g. forms processing.
 They are expensive and complex.
 Examples:
 Siemens SYNAPSE-1 NeuroComputer:
 Uses 8 of the MA-16 systolic array chips.
 It resides in its own cabinet and communicates via Ethernet to a host workstation.
 Peak performance of 3.2 billion multiplications (16-bit * 16-bit) and additions (48-bit) per second at a 25 MHz clock rate.
 Adaptive Solutions CNAPServer VME System:
 VME boards in a custom cabinet run from a UNIX host via an Ethernet link.
 Boards come with 1 to 4 chips, and up to two boards, to give a total of 512 PEs.
 Software includes a C-language library, assembler, compiler, and a package of NN algorithms.

3.7.9. NNW Accelerator Cards
 Another approach to dealing with the PC is to work with it in partnership.
 Accelerator cards reside in the expansion slots and are used to speed up the NNW computations.
 They are cheaper than NeuroComputers.

 Usually based on NNW chips, but some just use fast digital signal processors (DSPs) that do very fast multiply-accumulate operations.
 Examples:
 IBM ZISC ISA and PCI cards:
 ZISC implements an RBF architecture with RCE learning.
 The ISA card holds up to 16 ZISC036 chips, giving 576 prototype neurons.
 The PCI card holds up to 19 chips for 684 prototypes.
 The PCI card can process 165,000 patterns/sec, where patterns are vectors of 64 8-bit elements.


CHAPTER 4
PAST RESEARCH ON FACE RECOGNITION
The task of recognizing faces has attracted much attention both from neuroscientists and from computer vision scientists. This chapter reviews some of the well-known approaches from both of these fields.

4.1. Human Face Recognition
The major research issues of interest to neuroscientists include the human capacity for face recognition, the modeling of this capability, and the apparent modularity of face recognition. In this section, some findings reached as the result of experiments on the human face recognition system that are potentially relevant to the design of face recognition systems will be summarized.

One of the basic issues that has been argued by several scientists is the existence of a dedicated face processing system [1,10]. Physiological evidence indicates that the brain possesses specialized 'face recognition hardware' in the form of face detector cells in the inferotemporal cortex and regions in the frontal right hemisphere; impairment in these areas leads to a syndrome known as prosopagnosia. Interestingly, prosopagnosics, although unable to recognize familiar faces, retain their ability to visually recognize non-face objects. As a result of many studies, scientists have come to the conclusion that face recognition is not like other object recognition [11].


Hence, the question is what features humans use for face recognition. The results of the related studies are very valuable for the algorithm design of some face recognition systems. It is interesting that when all facial features like the nose, mouth, eyes etc. are contained in an image, but in a different order than the ordinary one, recognition is not possible for humans. Explanation of face perception as the result of holistic or feature analysis alone is not possible, since both are true. In humans, both global and local features are used in a hierarchical manner [1]. Local features provide a finer classification system for face recognition. Simulations show that the most difficult faces for humans to recognize are those faces which are neither attractive nor unattractive [12]. Distinctive faces are more easily recognized than typical ones. Information contained in low frequency bands is used to determine the sex of the individual, while the higher frequency components are used in recognition. The low frequency components contribute to the global description, while the high frequency components contribute to the finer details required in the identification task [13, 14, 15]. It has also been found that the upper part of the face is more useful for recognition than the lower part [1].

In [11], Bruce explains an experiment realized by superimposing the low spatial frequency components of Margaret Thatcher's face on the high spatial frequency components of Tony Blair's face. Although when viewed close up only Tony Blair was seen, viewed from a distance Blair disappears and Margaret Thatcher becomes visible. This demonstrates that the important information for recognizing familiar faces is contained within a particular range of spatial frequencies.

Another important finding is that the human face recognition system is disrupted by changes in lighting direction and also by changes of

viewpoint. Although some scientists tend to explain the human face recognition system based on the derivation of 3D models of faces using shape-from-shading derivatives, it is difficult to understand why face recognition appears so viewpoint dependent [16]. The effects of lighting change on face identification and matching suggest that representations for face recognition are crucially affected by changes in low level image features.

Bruce and Langton found that negation (inverting both hue and luminance values of an image) badly affects the identification of familiar faces [17]. They also observed that negation has no significant effect on the identification and matching of surface images that lack any pigmented and textured features; this led them to attribute the negation effect to the alteration of the brightness information about pigmented areas. A negative image of a dark-haired Caucasian, for example, will appear to be a blonde with dark skin. Kemp et al [18] showed that the hue values of these pigmented regions do not themselves matter for face identification. Familiar faces presented in 'hue negated' versions, with preserved luminance values, were recognized as well as those with the original hue values maintained, though there was a decrement in recognition memory for pictures of faces when hue was altered in this way [19]. This suggests that episodic memory for pictures of unfamiliar faces can be sensitive to hue, though the representations of familiar faces seem not to be. This distinction between memory for pictures and memory for faces is important. It is clear that recognition of familiar and unfamiliar faces is not the same for humans. It is likely that unfamiliar faces are processed in order to recognize a picture, whereas familiar faces are fed into the face recognition system of the human brain. A detailed discussion of recognizing familiar and unfamiliar faces can be found in [20].


Young children typically recognize unfamiliar faces using unrelated cues such as glasses, clothes, hats, and hairstyle. By the age of twelve, these paraphernalia are usually reliably ignored. Curiously, when children as young as five years are asked to recognize familiar faces, they do pretty well in ignoring paraphernalia. Several other interesting studies related to how children perceive inverted faces are summarized in [41, 22].

Humans recognize people from their own race better than people from another race. Humans may encode an 'average' face; these averages may be different for different races, and recognition may suffer from prejudice and unfamiliarity with the class of faces of another race or gender [1]. The poor identification of other races is not a psychophysical problem but more likely a psychosocial one. One of the interesting results of studies quantifying the role of gender in face recognition is that in the Japanese population, the majority of women's facial features are more heterogeneous than men's features. It has also been found that white women's faces are slightly more variable than men's, but that the overall variation is small [23, 24].

4.1.1. Discussion

The recognition of familiar faces plays a fundamental role in our social interactions. Humans are able to identify a large number of faces reliably, and psychologists are interested in understanding the perceptual and cognitive mechanisms at the base of the face recognition process. This research informs the studies of computer vision scientists.


We can summarize the findings of studies on the human face recognition system as follows:
1. The human capacity for face recognition is a dedicated process, not merely an application of the general object recognition process. Thus artificial face recognition systems should also be face specific.
2. Distinctive faces are more easily recognized than typical ones.
3. Both global and local features are used for representing and recognizing faces.
4. Humans recognize people from their own race better than people from another race. Humans may encode an 'average' face.
5. Certain image transformations, such as intensity negation, strange viewpoint changes, and changes in lighting direction can severely disrupt human face recognition.

Using the present technology it is impossible to completely model the human recognition system and reach its performance. However, the human brain has its shortcomings in the total number of persons that it can accurately 'remember'. The benefit of a computer system would be its capacity to handle large datasets of face images.

The observations and findings about the human face recognition system form a good starting point for automatic face recognition methods. As mentioned above, an automated face recognition system should be face specific. It should effectively use features that discriminate one face from others and, as in caricatures, preferably amplify such distinctive characteristics of a face [25, 15].

The difference between recognition of familiar and unfamiliar faces must also be noticed. First of all, we should find out what makes us familiar with a face. Does seeing a face under many different conditions (different illuminations, rotations, expressions, etc.) make us familiar with that face, or can we become familiar with a face just by frequently looking at the same face image? Seeing a face under many different conditions is something related to training; however, the interesting point is how, using only the same 2D information, we can pass from unfamiliarity to familiarity. Methods which recognize faces from a single view should pay attention to this familiarity subject.

Some of the early scientists were inspired by watching bird flight and built their vehicles with mobile wings. Although a single underlying principle, the Bernoulli effect, explains both biological and man-made flight, we note that no modern aircraft has flapping wings. Designers of face recognition algorithms and systems should be aware of relevant psychophysics and neurophysiologic studies, but should be prudent in using only those that are applicable or relevant from a practical/implementation point of view.

4.2. Automatic Face Recognition

Although humans perform face recognition in an effortless manner, the underlying computations within the human visual system are of tremendous complexity. The seemingly trivial task of finding and recognizing faces is the result of millions of years of evolution, and we are far from fully understanding how the brain performs this task.

To date, no complete solution has been proposed that allows the automatic recognition of faces in real images. In this section we will review existing face recognition systems in five categories: early methods, neural network approaches, statistical approaches, template based approaches, and feature based methods. Finally, the current state of the art of face recognition technology will be presented.

4.2.1. Representation, Matching and Statistical Decision

The performance of face recognition depends on the solution of two problems: representation and matching. At an elementary level, the image of a face is a two-dimensional (2-D) array of pixel gray levels,

x = {x_(i,j), (i, j) ∈ S},                                            (4.1)

where S is a square lattice. However, in some cases it is more convenient to express the face image x as a one-dimensional (1-D) column vector of concatenated rows of pixels,

x = [x_1, x_2, ..., x_n]^T,                                           (4.2)

where n = |S| is the total number of pixels in the image. Therefore x ∈ R^n, the n-dimensional Euclidean space.

For a given representation, two properties are important: discriminating power and efficiency, i.e. how far apart the faces are under the representation and how compact the representation is.

While many previous techniques represent faces in their most elementary forms of (4.1) or (4.2), many others use a feature vector F(x) = [f_1(x), f_2(x), ..., f_m(x)]^T, where f_1(.), f_2(.), ..., f_m(.) are linear or nonlinear functionals. Feature-based representations are usually more efficient since generally m is much smaller than n.


A simple way to achieve good efficiency is to use an alternative orthonormal basis of R^n. Specifically, suppose e_1, e_2, ..., e_n are an orthonormal basis. Then x can be expressed as

x = Σ_(i=1..n) x̃_i e_i                                               (4-3)

where x̃_i = <x, e_i> (inner product), and x can equivalently be represented by x̃ = [x̃_1, x̃_2, ..., x̃_n]^T. Two examples of orthonormal bases are the natural basis used in (4.2), with e_i = [0, ..., 0, 1, 0, ..., 0]^T where the one is in the i-th position, and the Fourier basis

e_t = (1/√n) [1, e^(-j2πt/n), e^(-j2π(2t)/n), ..., e^(-j2π((n-1)t)/n)]^T.

If, for a given orthonormal basis, the x̃_i are small when i > m, then the vector x̃ can be compressed into an m-dimensional vector, x̃ ≈ [x̃_1, x̃_2, ..., x̃_m]^T.

It is important to notice that an efficient representation does not necessarily have good discriminating power.

In the matching problem, an incoming face is recognized by identifying it with a prestored face. For example, suppose the input face is x and there are K prestored faces c_k, k = 1, 2, ..., K. One possibility is to assign x to c_(k0) if

k_0 = arg min_(1≤k≤K) ||x − c_k||                                     (4-4)

where ||·|| represents the Euclidean distance in R^n. If c_k is normalized so that ||c_k|| = c for all k, the minimum distance matching in (4-4) simplifies to the correlation matching

k_0 = arg max_(1≤k≤K) <x, c_k>                                        (4-5)

Since the distance and the inner product are invariant to a change of orthonormal basis, minimum distance and correlation matching can be performed using any orthonormal basis and the recognition performance will be the same. To do this, simply replace x and c_k in (4-4) or (4-5) by x̃ and c̃_k. Similarly, (4-4) and (4-5) could also be used with feature vectors. Due to factors such as viewing angle, illumination, facial expression, distortion, and noise, the face images for a given person can have random variations and therefore are better modeled as a random vector. In this case, maximum likelihood (ML) matching is often used,

k_0 = arg max_(1≤k≤K) log p(x | c_k)                                  (4-6)

where p(x | c_k) is the density of x conditioned on its being the k-th person. The ML criterion minimizes the probability of recognition error when, a priori, the incoming face is equally likely to be that of any of the K persons. Furthermore, if we assume that the variations in face vectors are caused by additive white Gaussian noise (AWGN),

x_k = c_k + w_k                                                       (4.7)

where w_k is a zero-mean AWGN with power σ², then the ML matching becomes the minimum distance matching of (4-4).
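As a concrete illustration of (4-4) and (4-5), the following C++ sketch performs minimum distance and correlation matching over a gallery of prestored face vectors. The container layout and function names are assumptions of this sketch, and the correlation rule as written assumes all gallery vectors have equal norm, as stated above.

#include <vector>
#include <cmath>
#include <cstddef>

// Face vectors are plain arrays of gray levels; the gallery holds one
// prestored vector c_k per person.
using FaceVec = std::vector<double>;

double euclideanDistance(const FaceVec& a, const FaceVec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

double innerProduct(const FaceVec& a, const FaceVec& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// k0 = arg min_k ||x - c_k||   (Equation 4-4)
std::size_t minDistanceMatch(const FaceVec& x, const std::vector<FaceVec>& gallery) {
    std::size_t best = 0;
    double bestDist = euclideanDistance(x, gallery[0]);
    for (std::size_t k = 1; k < gallery.size(); ++k) {
        double d = euclideanDistance(x, gallery[k]);
        if (d < bestDist) { bestDist = d; best = k; }
    }
    return best;
}

// k0 = arg max_k <x, c_k>      (Equation 4-5, valid when all ||c_k|| are equal)
std::size_t correlationMatch(const FaceVec& x, const std::vector<FaceVec>& gallery) {
    std::size_t best = 0;
    double bestCorr = innerProduct(x, gallery[0]);
    for (std::size_t k = 1; k < gallery.size(); ++k) {
        double c = innerProduct(x, gallery[k]);
        if (c > bestCorr) { bestCorr = c; best = k; }
    }
    return best;
}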

4.2.2. Early face recognition methods

The initial work in automatic face processing dates back to the end of the 19th century, as reported by Benson and Perrett [26]. In his lecture on personal identification at the Royal Institution on 25 May 1888, Sir Francis


Galton [2], an English scientist, explorer and a cousin of Charles Darwin, explained that he had "frequently chafed under the sense of inability to verbally explain hereditary resemblance and types of features". In order to relieve himself from this embarrassment, he took considerable trouble and made many experiments. He described how French prisoners were identified using four primary measures (head length, head breadth, foot length and middle digit length of the foot and hand respectively). Each measure could take one of three possible values (large, medium, or small), giving a total of 81 possible primary classes. Galton felt it would be advantageous to have an automatic method of classification. For this purpose he devised an apparatus, which he called a mechanical selector, that could be used to compare measurements of face profiles. Galton reported that most of the measures he had tried were fairly efficient. Early face recognition methods were mostly feature based. Galton's proposed method, and a lot of the work that followed, focused on detecting important facial features such as eye corners, mouth corners, the nose tip, etc. By measuring the relative distances between these facial features, a feature vector can be constructed to describe each face. By comparing the feature vector of an unknown face to the feature vectors of known faces from a database, the closest match can be determined.

One of the earliest works is reported by Bledsoe [27]. In this system, a human operator located the feature points on the face and entered their positions into the computer. Given a set of feature point distances of an unknown person, nearest neighbor or other classification rules were used for identifying the test face. Since feature extraction is done manually, this system could accommodate wide variations in head rotation, tilt, image quality, and contrast.

In Kanade's work [28], a series of fiducial points is detected using relatively simple image processing tools (edge maps, signatures, etc.) and the Euclidean distances between them are then used as a feature vector to perform recognition. The face feature points are located in two stages. The coarse-grain stage simplifies the succeeding differential operation and feature finding algorithms. Once the eyes, nose and mouth are approximately located, more accurate information is extracted by confining the processing to four smaller groups, scanning at higher resolution, and using the 'best beam intensity' for the region. The four regions are the left and right eye, nose, and mouth. The beam intensity is based on the local area histogram obtained in the coarse-grain stage. A set of 16 facial parameters, which are ratios of distances, areas, and angles to compensate for the varying size of the pictures, is extracted. To eliminate scale and dimension differences, the components of the resulting vector are normalized. A simple distance measure is used to check the similarity between two face images.

4.2.3. Statistical approaches to face recognition

4.2.3.1. Karhunen-Loeve Expansion Based Methods

4.2.3.1.1. Eigenface

A face image I(x, y) of size NxN is simply a matrix with each element representing the intensity at that particular pixel. I(x, y) may also be considered as a vector of length N², or a single point in an N²-dimensional space. So a 128x128 pixel image can be represented as a point in a 16,384-dimensional space. Facial images in general will occupy only a small subregion of this high dimensional 'image space' and thus are not optimally represented in this coordinate system.


As mentioned in Section 4.2.1, alternative orthonormal bases are often used to compress face vectors. One such basis is the Karhunen-Loeve (KL) basis. The 'Eigenfaces' method proposed by Turk and Pentland [29] is based on the Karhunen-Loeve expansion and is motivated by the earlier work of Sirovitch and Kirby [30] for efficiently representing pictures of faces. Eigenface recognition derives its name from the German prefix 'eigen', meaning 'own' or 'individual'. The Eigenface method of facial recognition is considered the first working facial recognition technology.

The eigenfaces method presented by Turk and Pentland finds the principal components (Karhunen-Loeve expansion) of the face image distribution, or the eigenvectors of the covariance matrix of the set of face images. These eigenvectors can be thought of as a set of features which together characterize the variation between face images.

Let a face image I(x, y) be a two-dimensional array of intensity values, or a vector of dimension n. Let the training set of images be I_1, I_2, ..., I_N. The average face image of the set is defined by

Ψ = (1/N) Σ_(i=1..N) I_i

Each face differs from the average by the vector Φ_i = I_i − Ψ.

This set of very large vectors is subject to principal component analysis, which seeks a set of K orthonormal vectors v_k, k = 1, ..., K, and their associated eigenvalues λ_k which best describe the distribution of the data. The vectors v_k and scalars λ_k are the eigenvectors and eigenvalues of the covariance matrix

C = (1/N) Σ_(t=1..N) Φ_t Φ_t^T = A A^T                                (4-9)

where the matrix A = [Φ_1, Φ_2, ..., Φ_N]. Finding the eigenvectors of the n x n matrix C is computationally intensive. However, the eigenvectors of C can be determined by first finding the eigenvectors of a much smaller matrix of size N x N and taking a linear combination of the resulting vectors.
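The following is a minimal C++ sketch (using the Eigen library) of this smaller-matrix trick: the N x N matrix A^T A is diagonalized and its eigenvectors are mapped back through A. The function name and data layout are assumptions of this sketch, not part of the original formulation.

#include <Eigen/Dense>

// Column t of A is the mean-subtracted face Phi_t.  Instead of the n x n
// covariance C = A A^T (n = number of pixels), diagonalize the N x N
// matrix A^T A (N = number of training images) and map its eigenvectors
// back with A; the nonzero eigenvalues of the two matrices coincide.
Eigen::MatrixXd computeEigenfaces(const Eigen::MatrixXd& A, int K) {
    const int N = static_cast<int>(A.cols());

    Eigen::MatrixXd L = A.transpose() * A;           // only N x N
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> solver(L);

    // Eigen returns eigenvalues in increasing order; take the K largest.
    Eigen::MatrixXd eigenfaces(A.rows(), K);
    for (int k = 0; k < K; ++k) {
        Eigen::VectorXd u = solver.eigenvectors().col(N - 1 - k);
        Eigen::VectorXd v = A * u;                    // eigenvector of A A^T
        eigenfaces.col(k) = v.normalized();           // unit-length eigenface
    }
    return eigenfaces;
}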

4 - 10  4 - 11

Cv k  k vk vkT  vkT  K vk

Since eigen vectors, vk are orthonormal and normalized v kT  1 , 4 - 12

vT Cv   k k k

k 

1 T N v k   t  tT v k N t 1



1 N T v k  t  tT v k  N t 1

4 - 13 

T

1  N

 v   v  

1  N

 v  

N

k

T t

k

T t

t 1

2

N

k

T t

t 1





1 N   v k I tT  mean v k I tT N t 1 



1 N var v k I tT  N t 1

2





Thus eigen value k represents the variance of the representative facial image set along the axis described by eigenvector k.

101

The space spanned by the eigenvectors v_k, k = 1, ..., K, corresponding to the K largest eigenvalues of the covariance matrix C, is called the face space. The eigenvectors of matrix C, which are called eigenfaces, form a basis set for the face images. A new face image Γ is transformed into its eigenface components (projected onto the face space) by

ω_k = v_k^T (Γ − Ψ),   for k = 1, ..., K.                             (4-14)

The projections ω_k form the feature vector Ω = [ω_1, ω_2, ..., ω_K]^T which describes the contribution of each eigenface in representing the input image.

Given a set of face classes E_q and the corresponding feature vectors Ω_q, the simplest method for determining which face class provides the best description of an input face image is to find the face class that minimizes the Euclidean distance in the feature space

ε_q = ||Ω − Ω_q||                                                     (4-15)

A face is classified as belonging to class E_(q0) when the minimum ε_(q0) is below some threshold θ_ε, where

q_0 = arg min_q ε_q                                                   (4-16)

Otherwise, the face is classified as unknown.
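The projection (4-14) and the matching rule of (4-15) and (4-16) can be sketched in a few lines of C++ (using the Eigen library for the vector algebra). The names, and the assumption that the eigenfaces and class feature vectors are precomputed, are illustrative choices of this sketch rather than part of the original method description.

#include <Eigen/Dense>
#include <limits>
#include <vector>

// 'eigenfaces' holds one eigenface per column; 'classFeatures' holds one
// feature vector Omega_q per known class.

Eigen::VectorXd projectToFaceSpace(const Eigen::VectorXd& face,
                                   const Eigen::VectorXd& meanFace,
                                   const Eigen::MatrixXd& eigenfaces) {
    // omega_k = v_k^T (Gamma - Psi), stacked into one feature vector (4-14)
    return eigenfaces.transpose() * (face - meanFace);
}

// Returns the index of the best class, or -1 if the face is "unknown".
int classifyFace(const Eigen::VectorXd& omega,
                 const std::vector<Eigen::VectorXd>& classFeatures,
                 double threshold) {
    int best = -1;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t q = 0; q < classFeatures.size(); ++q) {
        double dist = (omega - classFeatures[q]).norm();   // epsilon_q in (4-15)
        if (dist < bestDist) { bestDist = dist; best = static_cast<int>(q); }
    }
    return (bestDist < threshold) ? best : -1;             // (4-16) with rejection
}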

Turk and Pentland [29] tested how their algorithm performs under changing conditions by varying the illumination, size and orientation of the faces. They found that their system had the most trouble with faces scaled larger or smaller than those in the original dataset. To overcome this problem they suggest using a multi-resolution method in which faces are compared to eigenfaces of varying sizes to compute the best match. They also note that the image background can have a significant effect on performance, which they minimize by multiplying input images with a 2-D Gaussian to diminish the contribution of the background and highlight the central facial features. The system performs face recognition in real time. Turk and Pentland's paper was seminal in the field of face recognition, and their method is still quite popular due to its ease of implementation.

Murase and Nayar [31] extended the capabilities of the eigenface method to general 3D-object recognition under different illumination and viewing conditions. Given N object images taken under P views and L different illumination conditions, a universal image set is built which contains all the available data. In this way a single 'parametric space' describes the object identity as well as the viewing or illumination conditions. The eigenface decomposition of this space was used for feature extraction and classification. However, in order to ensure discrimination between different objects, the number of eigenvectors used in this method was increased compared to the classical eigenface method.

Later, based on the eigenface decomposition, Pentland et al [32] developed a 'view-based' eigenspace approach for human face recognition under general viewing conditions. Given N individuals under P different views, recognition is performed over P separate eigenspaces, each capturing the variation of the individuals in a common view. The 'view-based' approach is essentially an extension of the eigenface technique to multiple sets of eigenvectors, one for each face orientation. In order to deal with multiple views, in the first stage of this approach the orientation of the test face is determined and the eigenspace which best describes the input image is selected. This is accomplished by calculating the residual description error (distance from feature space: DFFS) for each view space. Once the

proper view is determined, the image is projected onto the appropriate view space and then recognized. The view-based approach is computationally more intensive than the parametric approach because P different sets of V projections are required (V is the number of eigenfaces selected to represent each eigenspace). Naturally, the view-based representation can yield a more accurate representation of the underlying geometry.

4.2.3.1.2. Face Recognition Using Eigenfaces

There are two main approaches to recognizing faces by using eigenfaces.

Appearance model:
1- A database of face images is collected.
2- A set of eigenfaces is generated by performing principal component analysis (PCA) on the face images. Approximately 100 eigenvectors are enough to code a large database of faces.
3- Each face image is represented as a linear combination of the eigenfaces.
4- A given test image is approximated by a combination of eigenfaces. A distance measure is used to compare the similarity between two images.


Figure 4.1: Appearance model

Figure 4.2: Discriminative model

Discriminative model:
1- Two datasets Ω_I and Ω_E are obtained: one by computing intrapersonal differences (by matching two views of each individual in the dataset) and the other by computing extrapersonal differences (by matching different individuals in the dataset), respectively.
2- Two datasets of eigenfaces are generated by performing PCA on each class.
3- The similarity score between two images is derived by calculating S = P(Ω_I | Δ), where Δ is the difference between a pair of images. Two images are determined to be of the same individual if S > 0.5.

Although the recognition performance is lower than that of the correlation method, the substantial reduction in computational complexity of the eigenface method makes this method very attractive. The recognition rates increase with the number of principal components (eigenfaces) used and, in the limit, as more principal components are used, performance approaches that of correlation. In [29] and [32], the authors reported that the performance levels off at about 45 principal components.

It has been shown that removing the first three principal components results in better recognition performance (the authors reported an error rate of 20% when using the eigenface method with 30 principal components on a database strongly affected by illumination variations, and only a 10% error rate after removing the first three components). The recognition rates in this case were better than the recognition rates obtained using the correlation method. This was argued based on the fact that the first components are more influenced by variations in lighting conditions.


4.2.3.1.3. Eigenfeatures

Pentland et al. [32] discussed the use of facial features for face recognition. This can be either a modular or a layered representation of the face, where a coarse (low-resolution) description of the whole head is augmented by additional (high-resolution) details in terms of salient facial features. The eigenface technique was extended to detect facial features. For each of the facial features, a feature space is built by selecting the most significant eigenfeatures (the eigenvectors corresponding to the largest eigenvalues of the feature correlation matrix).

After the facial features in a test image are extracted, a score of similarity between the detected features and the features corresponding to the model images is computed. A simple approach for recognition is to compute a cumulative score in terms of equal contributions by each of the facial feature scores. More elaborate weighting schemes can also be used for classification. Once the cumulative score is determined, a new face is classified such that this score is maximized.

The performance of the eigenfeatures method is close to that of eigenfaces; however, a combined representation of eigenfaces and eigenfeatures shows higher recognition rates.

4.2.3.1.4. The Karhunen-Loeve Transform of the Fourier Spectrum

Akamatsu et al. [33] illustrated the effectiveness of the Karhunen-Loeve Transform of the Fourier Spectrum in the Affine Transformed Target Image (KL-FSAT) for face recognition. First, the original images were standardized with respect to position, size, and orientation using an affine transform so that three reference points satisfy a specific spatial arrangement. The position of these points is related to the position of some significant facial features.

The eigenface method discussed in Section 4.2.3.1.1 is then applied to the magnitude of the Fourier spectrum of the standardized images (KL-FSAT). Due to the shift invariance property of the magnitude of the Fourier spectrum, KL-FSAT performed better than the classical eigenfaces method under variations in head orientation and shifting. However, the computational complexity of the KL-FSAT method is significantly greater than that of the eigenface method due to the computation of the Fourier spectrum.

4.2.3.2. Linear Discriminant Methods - Fisher Faces

In [34], [35], the authors proposed a new method for reducing the dimensionality of the feature space: Fisher's Linear Discriminant (FLD) [36]. The FLD uses the class membership information and develops a set of feature vectors in which variations between different faces are emphasized, while variations between different instances of the same face due to illumination conditions, facial expression and orientation are de-emphasized.

4.2.3.2.1. Fisher's Linear Discriminant

Given c classes with a priori probabilities P_i, let N_i be the number of samples of class i, i = 1, ..., c. Then the following positive semi-definite scatter matrices are defined:

S_B = Σ_(i=1..c) P_i (μ_i − μ)(μ_i − μ)^T                             (4-17)

S_W = Σ_(i=1..c) P_i Σ_(j=1..N_i) (x_j^i − μ_i)(x_j^i − μ_i)^T         (4-18)

where x_j^i denotes the j-th n-dimensional sample vector belonging to class i, and μ_i is the mean of class i:

μ_i = (1/N_i) Σ_(j=1..N_i) x_j^i                                       (4-19)

and μ is the overall mean of the sample vectors:

μ = (1/N) Σ_(i=1..c) Σ_(j=1..N_i) x_j^i,   with N = Σ_(i=1..c) N_i     (4-20)

S_W is the within-class scatter matrix and represents the average scatter of the sample vectors of class i; S_B is the between-class scatter matrix and represents the scatter of the class means μ_i around the overall mean vector μ. If S_W is non-singular, Linear Discriminant Analysis (LDA) selects a matrix V_opt ∈ R^(n x k) with orthonormal columns which maximizes the ratio of the between-class scatter to the within-class scatter of the projected samples,

V_opt = arg max_V ( |V^T S_B V| / |V^T S_W V| ) = [v_1, v_2, ..., v_k]        (4-21)

where {v_i | i = 1, ..., k} is the set of generalized eigenvectors of S_B and S_W corresponding to the set of decreasing eigenvalues {λ_i | i = 1, ..., k}, i.e.

S_B v_i = λ_i S_W v_i                                                  (4-22)

The upper bound of k is c − 1. The matrix V_opt describes the Optimal Linear Discriminant Transform, or Foley-Sammon Transform. While the Karhunen-Loeve Transform performs a rotation onto a set of axes along which the projections of the sample vectors differ most in the autocorrelation sense, the Linear Discriminant Transform performs a rotation onto a set of axes [v_1, v_2, ..., v_k] along which the projections of the sample vectors show maximum discrimination.
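A minimal C++ sketch (using the Eigen library) of how the scatter matrices (4-17) and (4-18) could be built and the generalized eigenproblem (4-22) solved. The data layout and function names are assumptions of this sketch, and, as discussed in the next subsection, S_W is usually singular for raw face vectors, so in practice the data is first reduced with PCA before this step is applied.

#include <Eigen/Dense>
#include <vector>

// 'samples[i]' holds the vectors of class i as columns; priors are taken
// as P_i = N_i / N.
Eigen::MatrixXd fisherDirections(const std::vector<Eigen::MatrixXd>& samples, int k) {
    const int n = static_cast<int>(samples[0].rows());
    int total = 0;
    for (const auto& cls : samples) total += static_cast<int>(cls.cols());

    // Overall mean mu of (4-20).
    Eigen::VectorXd mu = Eigen::VectorXd::Zero(n);
    for (const auto& cls : samples)
        for (int j = 0; j < cls.cols(); ++j) mu += cls.col(j);
    mu /= total;

    Eigen::MatrixXd Sb = Eigen::MatrixXd::Zero(n, n);
    Eigen::MatrixXd Sw = Eigen::MatrixXd::Zero(n, n);
    for (const auto& cls : samples) {
        double Pi = double(cls.cols()) / total;              // class prior
        Eigen::VectorXd mui = cls.rowwise().mean();          // class mean mu_i
        Sb += Pi * (mui - mu) * (mui - mu).transpose();      // (4-17)
        for (int j = 0; j < cls.cols(); ++j) {
            Eigen::VectorXd d = cls.col(j) - mui;
            Sw += Pi * d * d.transpose();                    // (4-18)
        }
    }

    // Generalized eigenproblem S_B v = lambda S_W v of (4-22); eigenvalues
    // come out in increasing order, so the last k columns are kept.
    Eigen::GeneralizedSelfAdjointEigenSolver<Eigen::MatrixXd> solver(Sb, Sw);
    return solver.eigenvectors().rightCols(k);
}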


4.2.3.2.2. Face Recognition Using Linear Discriminant Analysis

Let a training set of N face images represent c different subjects. The face images in the training set are two-dimensional arrays of intensity values, represented as vectors of dimension n. Different instances of a person's face (variations in lighting, pose or facial expression) are defined to be in the same class, and faces of different subjects are defined to be from different classes.

The scatter matrices S_B and S_W are defined in Equations (4-17) and (4-18). However, the matrix V_opt cannot be found directly from Equation (4-21), because in general the matrix S_W is singular. This stems from the fact that the rank of S_W is at most N − c, and in general the number of pixels in each image, n, is much larger than the number of images in the learning set, N. Many solutions have been presented in the literature to overcome this problem [37, 38].

In [34], the authors propose a method which is called Fisherfaces. The problem of S_W being singular is avoided by projecting the image set onto a lower dimensional space so that the resulting within-class scatter matrix is non-singular. This is achieved by using Principal Component Analysis (PCA) to reduce the dimension of the feature space to N − c and then applying the standard linear discriminant defined in Equation (4-21) to reduce the dimension to c − 1. More formally, V_opt is given by

V_opt^T = V_fld^T V_pca^T                                              (4-23)

where

V_pca = arg max_V |V^T C V|                                            (4-24)

V_fld = arg max_V ( |V^T V_pca^T S_B V_pca V| / |V^T V_pca^T S_W V_pca V| )    (4-25)

and C is the covariance matrix of the set of training images, computed from Equation (4-9). The columns of V_opt are orthogonal vectors which are called Fisherfaces. Unlike the eigenfaces, the Fisherfaces do not correspond to face-like patterns. All example face images E_q, q = 1, ..., Q in the example set S are projected onto the vectors corresponding to the columns of V_fld, and a set of features is extracted for each example face image. These feature vectors are used directly for classification.

Having extracted a compact and efficient feature set, the recognition task can be performed by using the Euclidean distance in the feature space. However, in [35] a weighted mean absolute/square distance, with weights obtained from the reliability of each decision axis, is proposed as the measure in the feature space:

D(Ω, Ω_E) = Σ_(v=1..K) w_v (ω_v − ω_(E,v))²,   E ∈ S                   (4-26)

Therefore, for a given face image Ω, the match E_0 is given by

E_0 = arg min_(E ∈ S) D(Ω, Ω_E)                                        (4-27)

The confidence measure is defined as

conf(Ω, E_0) = 1 − D(Ω, Ω_(E_0)) / D(Ω, Ω_(E_1))                        (4-28)

where E_1 is the second best candidate.

In [33], Akamatsu applied LDA to the Fourier spectrum of the intensity image. The results reported by the authors showed that LDA in the Fourier domain is significantly more robust to variations in lighting than LDA applied directly to the intensity images. However, the computational complexity of this method is significantly greater than that of the classical Fisherface method due to the computation of the Fourier spectrum.

4.2.3.3. Singular Value Decomposition Methods

4.2.3.3.1. Singular Value Decomposition

Methods based on the Singular Value Decomposition (SVD) for face recognition use the general result stated by the following theorem.

Theorem: Let I_(p x q) be a real rectangular matrix with Rank(I) = r. Then there exist two orthonormal matrices U_(p x p), V_(q x q) and a diagonal matrix Λ_(p x q) such that

I = U Λ V^T = Σ_(i=1..r) λ_i u_i v_i^T                                 (4-29)

where U = (u_1, u_2, ..., u_r, u_(r+1), ..., u_p), V = (v_1, v_2, ..., v_r, v_(r+1), ..., v_q), Λ = diag(λ_1, λ_2, ..., λ_r, 0, ..., 0), and λ_1 ≥ λ_2 ≥ ... ≥ λ_r > 0. The λ_i², i = 1, ..., r are the eigenvalues of I I^T and I^T I, and u_i, v_j, i = 1, ..., p, j = 1, ..., q are the corresponding eigenvectors of I I^T and I^T I respectively.


4.2.3.3.2. Face Recognition Using Singular Value Decomposition

Let a face image I(x, y) be a two-dimensional (m x n) array of intensity values and [λ_1, λ_2, ..., λ_r] be its singular value (SV) vector. In [38] Hong revealed the importance of using the SVD for human face recognition by proving several important properties of the SV vector, such as: the stability of the SV vector to small perturbations caused by stochastic variation in the intensity image, the proportional variance of the SV vector with proportional variance of the pixels in the intensity image, and the invariance of the SV feature vector to rotation, translation and mirror transforms. The above properties of the SV vector provide the theoretical basis for using singular values as image features. However, it has been shown that compressing the original SV vector into a low dimensional space, by means of various mathematical transforms, leads to higher recognition performance. Among the various transformations for compressing dimensionality, the Foley-Sammon transform based on the Fisher criterion, i.e. the optimal discriminant vectors, is the most popular one. Given N face images representing c different subjects, the SV vectors are extracted from each image. According to Equations (4-17) and (4-18), the scatter matrices S_B and S_W of the SV vectors are constructed. It has been shown that it is difficult to obtain the optimal discriminant vectors in the case of a small number of samples, i.e. when the number of samples is less than the dimensionality of the SV vector, because the scatter matrix S_W is singular in this case. Many solutions have been proposed to overcome this problem. Hong [38] circumvented the problem by adding a small singular value perturbation to S_W, resulting in S_W(t) such that S_W(t) becomes nonsingular. However, the perturbation of S_W introduces an arbitrary parameter, and the range to which the authors restricted the perturbation is not appropriate to ensure that the inversion of S_W(t) is numerically stable. Cheng et al [37] solved the problem by rank decomposition of S_W. This is a generalization of Tian's method [9], which substitutes S_W with the positive pseudo-inverse S_W^+.

After the set of optimal discriminant vectors {v1, v2, ..., vk} has been extracted, the feature vectors are obtained by projecting the SV vectors onto the space spanned by {v1, v2, ..., vk}.

When a test image is acquired, its SV vector is projected onto the space spanned by {v1, v2, ..., vk} and classification is performed in the feature space by measuring the Euclidean distance in this space and assigning the test image to the class of images for which the minimum distance is achieved.
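As an illustration of the basic SV-vector pipeline, the following C++ sketch (using the Eigen library) extracts the singular values of an image and matches them against stored SV vectors with the Euclidean distance. The projection onto the optimal discriminant vectors used in [37, 38] is omitted here, and all names are illustrative assumptions.

#include <Eigen/Dense>
#include <vector>

// Singular values are returned in decreasing order and used directly as
// the feature vector of the image.
Eigen::VectorXd svFeatureVector(const Eigen::MatrixXd& image) {
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(image);   // values only, no U or V
    return svd.singularValues();
}

// Nearest-neighbour matching of a test SV vector against a gallery.
int nearestSvMatch(const Eigen::VectorXd& testSv,
                   const std::vector<Eigen::VectorXd>& gallerySv) {
    int best = 0;
    double bestDist = (testSv - gallerySv[0]).norm();
    for (std::size_t k = 1; k < gallerySv.size(); ++k) {
        double d = (testSv - gallerySv[k]).norm();
        if (d < bestDist) { bestDist = d; best = static_cast<int>(k); }
    }
    return best;
}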

Another method to reduce the feature space of the SV feature vectors was described by Cheng et al [39]. The training set used consisted of a small sample of face images per person. If I_j^i represents the j-th face image of person i, then the average image of person i is given by

Ī_i = (1/N) Σ_(j=1..N) I_j^i

Eigenvalues and eigenvectors are determined for this average image using the SVD. The eigenvalues are thresholded to disregard values close to zero. Average eigenvectors (called feature vectors) for all the average face images are calculated. A test image is then projected onto the space spanned by the eigenvectors. The Frobenius norm is used as a criterion to determine to which person the test image belongs.

4.2.4. Hidden Markov Model Based Methods

Hidden Markov Models (HMMs) are a set of statistical models used to characterize the statistical properties of a signal. Rabiner [40][41] provides an extensive and complete tutorial on HMMs. An HMM is made of two interrelated processes:

- An underlying, unobservable Markov chain with a finite number of states, a state transition probability matrix and an initial state probability distribution.
- A set of probability density functions associated with each state.

The elements of an HMM are:

N, the number of states in the model. If S is the set of states, then S = {S_1, S_2, ..., S_N}. The state of the model at time t is given by q_t ∈ S, 1 ≤ t ≤ T, where T is the length of the observation sequence (number of frames).

M, the number of different observation symbols. If V is the set of all possible observation symbols (also called the codebook of the model), then V = {V_1, V_2, ..., V_M}.

A, the state transition probability matrix, A = {a_ij}, where

a_ij = p(q_t = S_j | q_(t-1) = S_i),   1 ≤ i, j ≤ N                    (4-30)

with 0 ≤ a_ij ≤ 1 and Σ_(j=1..N) a_ij = 1 for 1 ≤ i ≤ N                (4-31)

B, the observation symbol probability matrix, B = {b_j(k)}, where

b_j(k) = p(O_t = v_k | q_t = S_j),   1 ≤ j ≤ N, 1 ≤ k ≤ M              (4-32)

and O_t is the observation symbol at time t.

π, the initial state distribution, π = {π_i}, where

π_i = p(q_1 = S_i),   1 ≤ i ≤ N                                        (4-33)

Using a shorthand notation, an HMM is defined as

λ = (A, B, π)                                                          (4-34)

The above characterization corresponds to a discrete HMM, where the observations are characterized as discrete symbols chosen from a finite alphabet V = {v_1, v_2, ..., v_M}. In a continuous density HMM, the states are characterized by continuous observation density functions. The most general representation of the model probability density function (PDF) is a finite mixture of the form

b_i(O) = Σ_(k=1..M) c_ik N(O, μ_ik, U_ik),   1 ≤ i ≤ N                 (4-35)

where c_ik is the mixture coefficient for the k-th mixture in state i. Without loss of generality, N(O, μ_ik, U_ik) is assumed to be a Gaussian PDF with mean vector μ_ik and covariance matrix U_ik.

HMMs have been used extensively for speech recognition, where the data is naturally one-dimensional (1-D) along the time axis. The equivalent fully connected two-dimensional HMM would lead to a very high computational cost [42]. Attempts have been made to use multi-model representations that lead to a pseudo 2-D HMM [43]. These models are currently used in character recognition [44][45].


Figure 4.3: Image sampling technique for HMM recognition

In [46], Samaria et al proposed the use of a 1-D continuous HMM for face recognition. Assuming that each face is in an upright, frontal position, features will occur in a predictable order. This ordering suggests the use of a top-to-bottom model, where only transitions between adjacent states in a top-to-bottom manner are allowed [47]. The states of the model correspond to the facial features forehead, eyes, nose, mouth and chin [48]. The observation sequence O is generated from an X x Y image using an X x L sampling window with X x M pixels of overlap (Figure 4.3). Each observation vector is a block of L lines, and there is an M-line overlap between successive observations. The overlapping allows the features to be captured in a manner which is independent of vertical position, while a disjoint partitioning of the image could result in the truncation of features occurring across block boundaries. In [49] the effect of different sampling parameters is discussed. With no overlap, if a small sampling window height is used, the segmented data do not correspond to significant facial features; however, as the window height increases, there is a higher probability of cutting across the features.

Given c face images for each subject of the training set, the goal of the training stage is to optimize the parameters λ_i = (A, B, π) to 'best' describe the observations O = {o_1, o_2, ..., o_T}, in the sense of maximizing P(O | λ). The general HMM training scheme is illustrated in Figure 4.4 and is a variant of the K-means iterative procedure for clustering data:
1. The training images are collected for each subject in the database and are sampled to generate the observation sequences.
2. A common prototype (state) model is constructed with the purpose of specifying the number of states in the HMM and the state transitions allowed, A (model initialization).


3. A set of initial parameter values is computed iteratively using the training data and the prototype model. The goal of this stage is to find a good estimate for the observation model probability matrix B. In [41] it has been shown that good initial estimates of the parameters are essential for rapid and proper convergence (to the global maximum of the likelihood function) of the re-estimation formulas. On the first cycle, the data is uniformly segmented, matched with each model state, and the initial model parameters are extracted. On successive cycles, the set of training observation sequences is segmented into states via the Viterbi algorithm [50]. The result of segmenting each of the training sequences is, for each of the N states, a maximum likelihood estimate of the set of observations that occur within each state according to the current model.
4. Following the Viterbi segmentation, the model parameters are re-estimated using the Baum-Welch re-estimation procedure. This procedure adjusts the model parameters so as to maximize the probability of observing the training data, given each corresponding model.
5. The resulting model is then compared to the previous model (by computing a distance score that reflects the statistical similarity of the HMMs). If the model distance score exceeds a threshold, then the old model λ is replaced by the new model λ~, and the overall training loop is repeated. If the model distance score falls below the threshold, then model convergence is assumed and the final parameters are saved.

Recognition is carried out by matching the test image against each of the trained models (Figure 4.5). In order to achieve this, the image is converted to an observation sequence and the model likelihoods P(O_test | λ_i) are computed for each λ_i, i = 1, ..., c. The model with the highest likelihood reveals the identity of the unknown face:

v = arg max_(1≤i≤c) [ P(O_test | λ_i) ]                                (4-36)

Figure 4.4: HMM training scheme

The HMM-based method showed significantly better performance for face recognition than the eigenface method. This is due to the fact that the HMM-based method offers a solution to facial feature detection as well as face recognition.
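The likelihood P(O | λ) required by (4-36) can be computed with the forward algorithm. The following is a minimal C++ sketch for a discrete HMM, using scaling to avoid numerical underflow; the structure layout and names are illustrative assumptions, not the implementation used in the cited works.

#include <vector>
#include <cmath>

struct DiscreteHMM {
    std::vector<std::vector<double>> A;   // N x N state transitions a_ij
    std::vector<std::vector<double>> B;   // N x M emission probabilities b_j(k)
    std::vector<double> pi;               // N   initial state distribution
};

// Scaled forward algorithm: returns log P(O | lambda).  Recognition as in
// (4-36) keeps the model with the largest returned value.
double logLikelihood(const DiscreteHMM& hmm, const std::vector<int>& O) {
    const std::size_t N = hmm.pi.size();
    std::vector<double> alpha(N), next(N);
    double logProb = 0.0;

    // Initialization: alpha_1(i) = pi_i * b_i(o_1), then scale.
    double scale = 0.0;
    for (std::size_t i = 0; i < N; ++i) {
        alpha[i] = hmm.pi[i] * hmm.B[i][O[0]];
        scale += alpha[i];
    }
    for (std::size_t i = 0; i < N; ++i) alpha[i] /= scale;
    logProb += std::log(scale);

    // Induction over the remaining observations.
    for (std::size_t t = 1; t < O.size(); ++t) {
        scale = 0.0;
        for (std::size_t j = 0; j < N; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < N; ++i) sum += alpha[i] * hmm.A[i][j];
            next[j] = sum * hmm.B[j][O[t]];
            scale += next[j];
        }
        for (std::size_t j = 0; j < N; ++j) next[j] /= scale;
        alpha = next;
        logProb += std::log(scale);
    }
    return logProb;
}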


Figure 4.5: HMM recognition scheme

However, the 1-D continuous HMM is computationally more complex than the eigenface method. A solution for reducing the running time of this method is the use of a discrete HMM. Extremely encouraging preliminary results (error rates below 5%) were reported in [51] when a pseudo 2-D HMM is used. Furthermore, the authors suggested that a Fourier representation of the images can lead to better recognition performance, as frequency and frequency-space representations can lead to better data separation.

4.2.5. Neural Networks Approach

In principle, the popular back-propagation (BP) neural network [52] can be trained to recognize face images directly. However, such a network can be very complex and difficult to train. A typical image recognition network requires N = m x n input neurons, one for each of the pixels in an m x n image. For example, if the images are 128x128, the number of inputs of the network would be 16,384. In order to reduce the complexity, Cottrell and Fleming [53] used two BP nets (Figure 4.6). The first net operates in the autoassociation mode [54] and extracts features for the second net, which operates in the more common classification mode.

The autoassociation net has n inputs, n outputs and p hidden layer nodes. Usually p is much smaller than n. The network takes a face vector x as an input and is trained to produce an output y that is a 'best approximation' of x. In this way, the hidden layer output h constitutes a compressed version of x, or a feature vector, and can be used as the input to the classification net.

Figure 4.6: Auto-association and classification networks

Bourland and Kamp [54] showed that, 'under the best circumstances', when the sigmoidal functions at the network nodes are replaced by linear functions (when the network is linear), the feature vector is the same as that produced by the Karhunen-Loeve basis, or the eigenfaces. When the network is nonlinear, the feature vector could deviate from this optimum. The problem here turns out to be an application of the singular value decomposition.

Specifically, suppose that for each training face vector x_k (n-dimensional), k = 1, 2, ..., N, the outputs of the hidden layer and the output layer of the autoassociation net are h_k (p-dimensional, usually p < n) and y_k respectively. [...]

... first by 20 pixels, take a point (last + first)/2.
V- The points resulting from step III will work as centers for a 20x10 window that represents an eye and will be the input for the next stage.

This step reduces the number of windows given to the neural stage and so reduces the time needed for detecting the eye positions.

Figure 5.12 shows the phases of the Finding Eyes Region stage: (a) the original image, (b) the histogram image, (c) the edge detection image, (d) the image thresholded with a threshold of 100, (e) the row array used for counting the black pixels row by row, (f) the column array used for counting the black pixels column by column, (g) the cropped eye region as a 15x70 pixel image, and (h) the generated eye probabilities.
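The row and column arrays of panels (e) and (f) amount to a single scan over the thresholded edge image that accumulates black pixels per row and per column. The following C++ sketch illustrates this; the image layout and the convention of which gray levels count as 'black' are assumptions of this sketch.

#include <vector>

using GrayImage = std::vector<std::vector<int>>;   // gray levels 0..255

// Counts, for a thresholded edge image, the black pixels in every row and
// every column (the profiles of Figure 5.12 (e) and (f)).
void blackPixelProfiles(const GrayImage& edges, int threshold,
                        std::vector<int>& rowCounts,
                        std::vector<int>& colCounts) {
    const std::size_t rows = edges.size();
    const std::size_t cols = edges[0].size();
    rowCounts.assign(rows, 0);
    colCounts.assign(cols, 0);

    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Here a pixel below the threshold is treated as black; the
            // polarity depends on the thresholding convention used.
            if (edges[r][c] < threshold) {
                ++rowCounts[r];
                ++colCounts[c];
            }
        }
    }
}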

Figure 5.13 shows samples of the Finding Eyes Region stage and its intermediate steps for three persons (a), (b), and (c).


Figure 5.12: Finding Eyes Region stage


Figure 5.13: Samples for the Finding Eyes Region stage

5.2.4 Eyes Detection (Neural Network)

The purpose of this stage is to find the eye using neural networks. It takes as input a number of 20x10 windows that may contain an eye and outputs, for each window, a decision of whether it is an eye or not.

Neural networks are made up of neurons, also called processing elements or nodes, that work in parallel; thus, they can solve classification problems quickly. A neural network is first trained to classify the input data into one of a set of pre-specified categories. In this stage, we try two different types of neural networks: a back-propagation network and a modular network.

As sample data for training the network we take 320 eye samples and 960 non-eye samples (Figure 5.14), organized in one input file as a 20x12800 image.

5.2.4.1 The back-propagation network

The back-propagation learning algorithm is used to train the neural network (Figure 5.15). The first step is to build a multi-layered neural network (Figure 5.10). The input layer is a single layer with 200 input processing elements that accept 200 values from the training file. These values represent the 20x10 eye image.

We use two hidden layers for the back-propagation network; the first layer has 6 processing elements and the second has 4. The number of processing elements in the hidden layers was determined empirically by changing the number of processing elements from 2 to 10 for each layer and testing the neural network performance after each change. The best results were obtained when the number of processing elements was 6 for the first layer and 4 for the second. The learning rule used in the hidden layers is back-propagation with momentum. The momentum provides the gradient descent with some inertia, so that it tends to move along a direction that decreases the mean square error function. The amount of inertia is dictated by the momentum parameter. The higher the momentum, the more it smoothes the gradient estimate and the less effect a single change in the gradient has on the weight change. The major benefit is the ability to break out of local minima that the neural network might get caught in. We note that oscillations may occur if the momentum is set too high. We performed several tests to estimate an optimal value for the momentum, and it was empirically set to 0.9.

The output layer has 2 processing elements, such that each processing element in the output layer represents one decision: eye or not eye. The learning rule used is also back-propagation with momentum, with a momentum rate of 0.9. The non-linear activation function used in the output layer is the TanhAxon. Figure 5.16 shows the neural network used in this stage.

The task of the training process is to adjust the weights of the neural network. This is done by applying the training file that contains the eye images to the input layer and applying a desired output file to the output layer. When the training process begins, the neural network tries to adjust its weights so as to make the actual output of each processing element in the output layer equal the desired output in the response file introduced to the output layer. We note that the response file will have only one processing element whose output is required to be high (equal to one) while the other processing element will be low (equal to zero). This indicates whether the input image read by the input layer for this sample belongs to the class eye or not eye. We trained the back-propagation network for 10000 epochs.
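As an illustration of the network just described, the following C++ sketch runs a 200-6-4-2 fully connected network forward over one 20x10 window using tanh activations. The weight layout and names are assumptions of this sketch, and the back-propagation-with-momentum training itself (momentum 0.9, 10000 epochs) is not shown.

#include <vector>
#include <cmath>

struct Layer {
    // weights[j] holds the input weights of neuron j; biases[j] its bias.
    std::vector<std::vector<double>> weights;
    std::vector<double> biases;
};

// One fully connected layer with a tanh (TanhAxon-style) activation.
std::vector<double> forward(const Layer& layer, const std::vector<double>& in) {
    std::vector<double> out(layer.biases.size());
    for (std::size_t j = 0; j < out.size(); ++j) {
        double sum = layer.biases[j];
        for (std::size_t i = 0; i < in.size(); ++i)
            sum += layer.weights[j][i] * in[i];
        out[j] = std::tanh(sum);
    }
    return out;
}

// Runs the 200 gray levels of one 20x10 window through the trained
// 200 -> 6 -> 4 -> 2 network and returns true when the "eye" output wins.
bool isEye(const std::vector<Layer>& net, const std::vector<double>& window200) {
    std::vector<double> activ = window200;
    for (const Layer& layer : net) activ = forward(layer, activ);
    return activ[0] > activ[1];   // output 0 = eye, output 1 = not eye
}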

5.2.4.2 The modular network

Modular networks are a special class of feed-forward MLPs, with multiple parallel MLPs (Figure 5.17).

Modular networks process their input using several parallel MLPs and then recombine the results. This leads to faster specialization of function in each sub-module. In contrast to the MLP, modular networks do not have full interconnectivity between their layers and thus require a smaller number of weights for the same size of network. This tends to speed up training times and reduce the number of required training exemplars.

Figure 5.14: Samples from the input data to the eye neural network: (a) eye samples, (b) non-eye samples


Figure 5.15 the back-propagation network model

Figure 5.16: The back-propagation network used in eye detection (an input layer of 200 source nodes, hidden layers of 6 and 4 neurons, and an output layer of 2 neurons compared against the desired response file during training)


We use the modular network with the same parameters as the back-propagation network for each branch: 200 input processing elements, 2 hidden layers with 6 processing elements in the first and 4 in the second, and 2 output processing elements to classify the two classes (eye or not eye).

Figure 5.17: A model of a modular network with two parallel MLPs

We use the same input file of training data and the same desired output file. We trained the modular network for 9000 epochs.


Figure 5.18 samples for eye detection stage

5.3 The Feature Extraction Stage

In this stage, the features of the face are extracted and joined to form the face code. There are two main steps in this stage:
(1) Applying Gabor filters
(2) Computing the face code
Figure 5.19 shows the block diagram of this stage.


Figure 5.19: The feature extraction stage (normalized face, a bank of Gabor filters, computing the face code, face code)

5.3.1 Gabor filtering

Using local features is a mature approach to the face recognition problem [14, 67, 60, 28]. One of the main motivations of feature-based methods is that they represent the face image in a very compact way and hence lower the memory requirements, which gains importance especially when there is a huge face database. Feature-based methods are based on finding fiducial points (or local areas) on a face and representing the corresponding information in an efficient way. However, choosing suitable feature locations and the corresponding values is extremely critical for the performance of a recognition system. Searching nature for an answer has led researchers to examine the behavior of the human visual system (HVS).

Physiological studies found simple cells in the human visual cortex that are selectively tuned to orientation as well as to spatial frequency. It was suggested that the response of a simple cell could be approximated by 2D Gabor filters. Over the last couple of years, it has been shown that using Gabor filters as the front-end of an automated face recognition system can be highly successful [67, 60, 68]. One of the most successful face recognition methods is based on graph matching of coefficients which are obtained from Gabor filter responses [4]. However, such graph-matching methods have some disadvantages due to their matching complexity, the manual localization of training graphs, and the overall execution time. They use a general face structure to generate the graphs, and such an approach raises the question of how efficiently the features represent the special facial characteristics of each individual. A novel Gabor-based method may overcome those disadvantages.

2D Gabor functions are well suited to enhancing edge contours, as well as the valley and ridge contours of the image. This corresponds to enhancing the eye, mouth and nose edges, which are supposed to be the most important points on a face. Moreover, such an approach also enhances moles, dimples, scars, etc. Hence, by using such enhanced points as feature locations, a feature map for each facial image can be obtained and each face can be represented with its own characteristics without any initial constraints. Having feature maps specialized for each face makes it possible to keep the overall face information while enhancing local characteristics.

Gabor functions were first proposed by Dennis Gabor as a tool for signal detection in noise. Gabor showed that there exists a 'quantum principle' for information: the conjoint time-frequency domain for a 1D signal must necessarily be quantized, so that no signal or filter can occupy less than a certain minimal area in it. There is a trade-off between time resolution and frequency resolution, and Gabor discovered that Gaussian-modulated complex exponentials provide the best trade-off. In this case, the original Gabor elementary functions are generated with a fixed Gaussian, while the frequency of the modulating wave varies.

The Gabor filter has the general form

G(x, y, f, θ) = exp[ -(1/2)( x'²/σ_x² + y'²/σ_y² ) ] J(x, y, f, θ)      (5.5)

where J(x, y, f, θ) is the harmonic oscillator, given by

J(x, y, f, θ) = exp(2πifx')                                             (5.6)

and

x' = x sin θ + y cos θ
y' = x cos θ - y sin θ                                                  (5.7)

where f is the frequency of the sinusoidal plane wave along the direction θ from the x axis, and σ_x, σ_y are the space constants of the Gaussian envelope along the x and y axes respectively. The overall shape of the two-dimensional Gabor filter is divided into two elementary functions:

(i) Even symmetric (cosine), in which the Gabor filter has the form

G(x, y, f, θ) = exp[ -(1/2)( x'²/σ_x² + y'²/σ_y² ) ] cos(2πfx')         (5.8)

(ii) Odd symmetric (sine), in which the Gabor filter has the form

G(x, y, f, θ) = exp[ -(1/2)( x'²/σ_x² + y'²/σ_y² ) ] sin(2πfx')         (5.9)



The special characteristics of an even symmetric Gabor filter, for θ = 0° and θ = 60°, can be seen in Figure 5.20.

Figure 5.20- Two Gabor filters for θ = 0 and 30 degrees

We use the even symmetric Gabor filter with 5 frequencies (f = 0.2, 0.3, 0.4, 0.5, 0.6) and evaluate each mask for 8 different orientations (θ = 0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5 degrees). Each mask, corresponding to a specific frequency and angle, is stored in a separate file to avoid re-evaluating the mask each time. To demonstrate the values and shapes of the Gabor masks, we normalized the values to the range 0 to 255 and constructed an image in which each pixel represents the corresponding mask element value. We use a filter size of 10 for each mask.


The x and y (the standard deviation of the Guassian envelope) must be chosen carefully. If x and y are too high then the filter will be more robust to noise, however the filter will more likely smooth the image to the extent that the ridge and valley details of the face image will be lost. If x and y are too small then the filter is not effective in removing the noise. We set the values of x and y to 3.0.

Applying a Gabor filter in a certain direction enhances the features parallel to that direction while suppressing the features in the other directions.

The convolution process is the last step of the feature extraction stage. Convolution is a simple mathematical operation which is fundamental to many common image processing operators. It provides a way of 'multiplying together' two arrays of numbers, generally of different sizes but of the same dimensionality, to produce a third array of numbers of the same dimensionality. This can be used in image processing to implement operators whose output pixel values are simple linear combinations of certain input pixel values. In an image processing context, one of the input arrays is normally just a gray level image. The second array is usually much smaller, is also two-dimensional, and is known as the kernel. Figure 5.21 shows an example image and kernel.

The convolution is performed by sliding the kernel over the image, generally starting at the top left corner, so as to move the kernel through all the positions where the kernel fits entirely within the boundaries of the image. (Note that implementations differ in what they do at the edges of images, as explained below.) Each kernel position corresponds to a single output pixel, the value of which is calculated by multiplying together the kernel value and the underlying image pixel value for each of the cells in the kernel, and then adding all these numbers together.

Figure 5.21 An example small image (left) and kernel (right)

If the image has M rows and N columns, and the kernel has m rows and n columns, then the output image will have M - m + 1 rows and N - n + 1 columns. Mathematically we can write the convolution as

O(i, j) = Σ_(k=1..m) Σ_(l=1..n) I(i + k - 1, j + l - 1) K(k, l)          (5-10)

where i runs from 1 to M - m + 1 and j runs from 1 to N - n + 1. As an example, for a kernel with 2 rows and 3 columns,

O(5, 7) = I(5, 7) K(1, 1) + I(5, 8) K(1, 2) + I(5, 9) K(1, 3) + I(6, 7) K(2, 1) + I(6, 8) K(2, 2) + I(6, 9) K(2, 3)          (5-11)

Note that many implementations of convolution produce a larger output image than this, because they relax the constraint that the kernel may only be moved to positions where it fits entirely within the image. Instead, these implementations typically slide the kernel to all positions where just the top left corner of the kernel is within the image, so the kernel 'overlaps' the image on the bottom and right edges. One advantage of this approach is that the output image is the same size as the input image. Unfortunately, in order to calculate the output pixel values for the bottom and right edges, it is necessary to invent input pixel values for the places where the kernel extends off the end of the image. Typically pixel values of zero are chosen for the regions outside the true image, but this can distort the output image at these places.

Our convolution implementation therefore removes these spurious regions by dropping n - 1 pixels from the right-hand side and m - 1 pixels from the bottom. (We tried a filter size of 33x33, but it could not be used: it would cut away too large a border from the 92x112 face image and most of the face features would be lost, so we use the smaller 10x10 filter size only.)

In our system the face image is 92x112 and the kernel (filter size) is 10x10, so the output image is 82x102. To make the output image the same size as the face image, we first normalize the filter response to values between 0 and 255 and then fill the missing border with the gray level 127, which represents the zero level of the filter response. Figures (5.22) and (5.23) show the 40 Gabor filters used and the output of the convolution process with Gabor filters at 5 different frequencies and 8 different directions, giving 40 mask filters. These filters are sufficient to capture the basic features of the face; this can be verified by reconstructing the face image by adding the filtered images together. The resulting image is an enhanced version of the original face image, and the main features of the face are the same in both images.
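The following C# sketch (again illustrative, not the project's code) shows one way to implement this 'valid' convolution followed by normalization to 0-255 and padding back to the original size with the gray level 127.

    using System;

    // Sketch only: 'valid' convolution of an image with a mask (equation 5-10),
    // then normalization to 0..255 and padding back to the original size with 127.
    public static class GaborConvolution
    {
        public static double[,] ConvolveValid(double[,] image, double[,] kernel)
        {
            int M = image.GetLength(0), N = image.GetLength(1);
            int m = kernel.GetLength(0), n = kernel.GetLength(1);
            var output = new double[M - m + 1, N - n + 1];

            for (int i = 0; i < M - m + 1; i++)
                for (int j = 0; j < N - n + 1; j++)
                {
                    double sum = 0.0;
                    for (int k = 0; k < m; k++)
                        for (int l = 0; l < n; l++)
                            sum += image[i + k, j + l] * kernel[k, l];
                    output[i, j] = sum;
                }
            return output;       // e.g. 82x102 for a 92x112 image and 10x10 kernel
        }

        public static byte[,] NormalizeAndPad(double[,] response, int rows, int cols)
        {
            double min = double.MaxValue, max = double.MinValue;
            foreach (double v in response) { if (v < min) min = v; if (v > max) max = v; }
            double range = (max - min) == 0.0 ? 1.0 : (max - min);

            var padded = new byte[rows, cols];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    padded[i, j] = 127;          // gray level for the missing border

            for (int i = 0; i < response.GetLength(0); i++)
                for (int j = 0; j < response.GetLength(1); j++)
                    padded[i, j] = (byte)(255.0 * (response[i, j] - min) / range);
            return padded;                        // same size as the input face image
        }
    }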


Figure 5.22 The 40 Gabor filters used, varying over 5 frequencies and 8 orientation angles

Figure 5.23 (a)-(c) Gabor filter responses

Figure 5.24 shows samples from the Gabor feature extraction stage: the original image, the filter response, and the Gabor reconstruction image.


Figure 5.24 Samples from the Gabor filter stage


5.3.2 Face Code

After obtaining the 40 filtered images, we generate the face code that will be used in the identification stage. The face code must be powerful enough to distinguish persons from each other and must give a clear representation of each face and of each feature in the face, without losing data and without introducing redundancy in the feature representation. The representation must start from a predetermined point of the face, the same point in every image of that person; we chose the point between the eyes as this start point. We tried two different representations for the face coding, as shown in Figure (5.25).

Figure 5.25 Methods for face coding: (a) representing the face by a grid of points; (b) window responses

After finding the point between the eyes, we take a 60x60 window whose center is that point shifted by -10 pixels in the y direction. We then divide the window width and height into ten parts with a 6-pixel separation, so we obtain 100 points at which the 40 Gabor mask responses are evaluated. The face code is then 100 points * 40 Gabor masks per point, i.e. 4000 features per face, and we use 40 persons. With 6 training images per person, the training file contains 6*40*4000 entries for the neural stage. This input is very large and would take a long time to train, so we reduced the training file by selecting only 14 persons from the whole database and by using a 60x60 window subdivided into 36 sub-windows of 10x10, taking the mean of the responses over each sub-window.

The face code is therefore 36 windows * 40 responses = 1440 features, the training file is 14*6*1440 entries, and this is sufficient for recognizing the 14 persons.

We calculate the average absolute deviation from the mean of the pixel values in each window. This gives the concentration of the face feature along each direction in that part of the face image. Let F_iθ(x, y) be the θ-direction filtered image over window S_i, where i ∈ {1, 2, …, 36} and θ ∈ {0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5}. The feature value V_iθ is the average absolute deviation from the mean, defined as:

V_iθ = (1 / n_i) Σ_{(x,y) ∈ S_i} | F_iθ(x, y) - P_iθ |        (5-12)

where n_i is the number of pixels in window S_i (10 x 10 = 100) and P_iθ is the mean of the pixel values of F_iθ(x, y) in S_i. The resulting 1440 average absolute deviation values (40 filters x 36 windows) form the feature vector of the face image.
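A minimal C# sketch of assembling this 1440-value face code from the 40 filtered images is shown below; the window layout follows the description above (a 60x60 region around the between-eyes point split into 36 sub-windows of 10x10), and all names are illustrative rather than taken from the project's code.

    using System;
    using System.Collections.Generic;

    // Sketch only: average absolute deviation (AAD) of each 10x10 sub-window,
    // for each of the 40 filtered images (36 windows x 40 filters = 1440 values).
    public static class FaceCode
    {
        public static double[] Build(IList<byte[,]> filteredImages, int centerX, int centerY)
        {
            // 60x60 region whose centre is the between-eyes point shifted by -10 in y;
            // the caller is assumed to guarantee the region lies inside the image.
            int startX = centerX - 30;
            int startY = (centerY - 10) - 30;

            var code = new List<double>(36 * filteredImages.Count);

            foreach (var img in filteredImages)          // 40 filter responses
                for (int wy = 0; wy < 6; wy++)           // 6 x 6 = 36 sub-windows
                    for (int wx = 0; wx < 6; wx++)
                    {
                        double mean = 0.0;
                        for (int y = 0; y < 10; y++)
                            for (int x = 0; x < 10; x++)
                                mean += img[startY + wy * 10 + y, startX + wx * 10 + x];
                        mean /= 100.0;

                        double aad = 0.0;                // equation (5-12)
                        for (int y = 0; y < 10; y++)
                            for (int x = 0; x < 10; x++)
                                aad += Math.Abs(img[startY + wy * 10 + y, startX + wx * 10 + x] - mean);
                        code.Add(aad / 100.0);
                    }
            return code.ToArray();                       // 1440 features per face
        }
    }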


5.4 The Recognition Stage
We use the modular network of Figure (5.14) for the identification stage. Its configuration is:
1. It has two MLP branches.
2. Each branch has two hidden layers.
3. Each hidden layer has 150 PEs (processing elements).
4. The input layer has 40*36 PEs.
5. The output layer has 14*1 PEs, to classify 14 classes (persons).
6. The training file contains 14*6*36*40 entries; each 36*40 block represents one face code.
7. The desired output file is a matrix of (14)*(14*6) entries; each line of 14 values identifies one image class.
8. We trained the network for 2000 epochs.
9. The training took about 3 hours (on an 800 MHz Pentium III PC with 256 MB RAM).
10. The mean square error decayed to 8*10^-10, which is an acceptable error.
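For reference, the topology described above can be summarized as a plain configuration object; the following short C# sketch uses our own class and property names, not identifiers from the project's source code.

    // Sketch only: the modular network topology described above.
    public sealed class ModularNetworkConfig
    {
        public int Branches { get; } = 2;              // two MLP branches
        public int HiddenLayersPerBranch { get; } = 2; // two hidden layers each
        public int HiddenLayerSize { get; } = 150;     // 150 PEs per hidden layer
        public int InputSize { get; } = 40 * 36;       // 1440-value face code
        public int OutputSize { get; } = 14;           // one output per person
        public int TrainingPatterns { get; } = 14 * 6; // 6 training images per person
        public int Epochs { get; } = 2000;
        public double TargetMse { get; } = 8e-10;
    }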


CHAPTER 6
Results and Conclusion

6.1 The System Database
We use the Olivetti and Oracle Research Laboratory (ORL) face database, which contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, with varying lighting, facial expressions (open / closed eyes, smiling / not smiling), facial details (glasses / no glasses) and head pose (tilting and rotation up to 20 degrees). All the images were taken against a dark homogeneous background. Figure 5.21 shows a sample of the ORL database, ten images per person. We took the first images of each of the 40 individuals as references and used the rest for testing purposes. The proposed method achieved 92.8% correct classification for the 14 persons selected from the ORL database. The large variation in the database makes it suitable for testing the ability of the system to recognize faces under different conditions.


Figure 5.21 A sample of the database used: 10 individuals, 10 images per person.

6.2 Neural Network Training
Our database contains 14 persons, each with 10 images; we took 6 images per person for training and 4 for testing, so we have 84 images for training and 56 for testing. We used a modular neural network because it is more accurate, though more complex, than a traditional back-propagation network. The network configuration is as follows:

1. It has two branches.
2. Each branch has two hidden layers.
3. Each hidden layer has 150 PEs (processing elements).
4. The input layer has 40*36 PEs.
5. The output layer has 14*1 PEs.
6. The input file contains 14*6*36*40 entries; each 36*40 block represents one face code.
7. The output file is a matrix of (14)*(14*6) entries; each line of 14 values identifies one image class.
8. β = 0.5.
9. We trained the network for 2000 epochs.
10. The training took about 3 hours (on an 800 MHz Pentium III PC with 256 MB RAM).
11. The mean square error decayed to 8*10^-10, which is an acceptable error.

Figure 6.1 The decay of the error during training


# A sample of the output file: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

# A sample of the input file (a vector of one point): {10.0363073712469 12.5954810024872 17.4409733276314 10.6604715486051 11.4178553405722 16.6736165000813 9.78902669156465 32.8947156721091 22.2342997571521 26.2050751474695 27.1196203150192 4.78682053411067 9.65855514397201 43.0787713810643 12.9466895818991 15.0395544118385 15.6701741806719 20.5780404371784 6.30255651148294 17.9863935553509 10.822663570695 20.9562088619384 25.0238543814462 44.2981139112541 18.0494677224891 23.1998039533293 16.2940292908745 30.5295731199756 28.2437110105454 19.6969335114395 9.44988556582318 25.747498944781 17.860679323392 36.8855217895322 12.3389144291669 11.2990588039568 }

6.3 Neural Network Testing and Results

For every person in the database we have 6 images for training and 4 for testing. In the tables below, the first table for each person gives the testing results and the second gives the training results; each row is the 14-value network output for one image.

Results for person NO. One

Testing results:
0.982 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.015
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.035 0.000 0.066 0.005 0.001 0.005 0.000 0.102 0.004 0.733 0.048 0.000 0.002 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Two

Testing results:
0.000 0.999 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.991 0.003 0.000 0.000 0.000 0.000 0.000 0.000 0.003 0.002 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Three

Testing results:
0.000 0.000 0.993 0.000 0.003 0.002 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.992 0.000 0.000 0.007 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.996 0.000 0.001 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Four

Testing results:
0.000 0.000 0.000 0.994 0.005 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.004 0.992 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.000
0.000 0.000 0.000 0.999 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Five

Testing results:
0.000 0.000 0.000 0.001 0.922 0.005 0.000 0.022 0.002 0.005 0.000 0.043 0.000 0.000
0.000 0.000 0.000 0.000 0.998 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.001 0.000 0.991 0.001 0.000 0.002 0.000 0.001 0.000 0.003 0.000 0.000
0.000 0.000 0.000 0.000 0.997 0.001 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Six

Testing results:
0.000 0.000 0.000 0.000 0.000 0.999 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.999 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.999 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.006 0.002 0.000 0.991 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Seven

Testing results:
0.000 0.000 0.000 0.000 0.000 0.008 0.936 0.000 0.056 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.011 0.965 0.000 0.023 0.000 0.000 0.000 0.000 0.001
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Eight

Testing results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.999 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.996 0.003 0.001 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.999 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.999 0.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Nine

Testing results:
0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.999 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.997 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000

Results for person NO. Ten

Testing results:
0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.998 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.998 0.001 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000

Results for person NO. Eleven

Testing results:
0.005 0.001 0.000 0.000 0.001 0.000 0.003 0.000 0.002 0.026 0.959 0.000 0.000 0.002
0.017 0.003 0.063 0.001 0.000 0.225 0.002 0.029 0.000 0.163 0.489 0.006 0.000 0.000
0.016 0.004 0.000 0.000 0.000 0.000 0.001 0.014 0.036 0.321 0.526 0.046 0.001 0.034
0.000 0.016 0.010 0.007 0.027 0.003 0.001 0.010 0.001 0.063 0.792 0.001 0.069 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000

Results for person NO. Twelve

Testing results:
0.000 0.000 0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.998 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.011 0.000 0.002 0.000 0.000 0.000 0.987 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000

Results for person NO. Thirteen

Testing results:
0.000 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.998 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.999 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.011 0.014 0.000 0.000 0.974 0.000

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000

Results for person NO. Fourteen

Testing results:
0.002 0.035 0.869 0.000 0.000 0.006 0.010 0.000 0.017 0.003 0.012 0.000 0.040 0.005
0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.999
0.000 0.005 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.008 0.000 0.000 0.985
0.000 0.010 0.090 0.002 0.002 0.000 0.000 0.001 0.001 0.003 0.000 0.002 0.000 0.891

Training results:
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000
0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000

To evaluate the system performance we use three measures:
1. True acceptance: the network recognizes the image as the right person.
2. False acceptance: the network recognizes the image as the wrong person.
3. Rejection: the network fails to recognize a person who is in the database.

           True acceptance     False acceptance    Rejection
Training   (84/84) 100%        (0/84) 0%           (0/84) 0%
Testing    (52/56) 92.8%       (3/56) 5.3%         (1/56) 1.7%
Total      (136/140) 97.1%     (3/140) 2.1%        (1/140) 0.7%
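These rates can be tallied directly from the network output vectors, as in the C# sketch below; the 0.5 rejection threshold is our assumption, since the report does not state the exact rejection rule used, and the names are illustrative.

    // Sketch only: classify each 14-value output and count true acceptances,
    // false acceptances and rejections. Threshold value is an assumption.
    public static class Evaluation
    {
        public static (int correct, int wrong, int rejected) Tally(
            double[][] outputs, int[] trueClasses, double threshold = 0.5)
        {
            int correct = 0, wrong = 0, rejected = 0;
            for (int i = 0; i < outputs.Length; i++)
            {
                int best = 0;
                for (int c = 1; c < outputs[i].Length; c++)
                    if (outputs[i][c] > outputs[i][best]) best = c;

                if (outputs[i][best] < threshold) rejected++;   // no confident class
                else if (best == trueClasses[i]) correct++;     // true acceptance
                else wrong++;                                   // false acceptance
            }
            return (correct, wrong, rejected);
        }
    }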


6.4 Conclusion
We have developed a face recognition system that uses facial features for recognition. The recognition stage decides whether an input face image belongs to one of the predefined persons in the system database. In this project we went through the following steps:
1. A preprocessing stage that produces a starting point for the next stage; it performs histogram equalization, which yields a clearer image with more detail and less noise.
2. A second preprocessing stage, eye detection, whose result is the eye position. It has two sub-stages: the first is an estimate that narrows the search to the eye region, and the second is a neural network that produces the eye location.
3. The Gabor stage, which produces 40 Gabor filters, each with a different orientation and frequency.
4. The feature extraction stage, which is considered the main stage of the system; its output is a set of files containing the Gabor filter responses and the face code that is used in the identification stage.
5. After feature extraction, the response file is passed to the next stage, which recognizes and classifies the input image into one of the database classes (persons).
6. The proposed system was tested on a 14-person database and gave a recognition rate of 97.1%.


6.5 Future Work
Several improvements could enhance the performance of the proposed system. A different algorithm could be used for the eye detection stage to achieve better performance. Other parameters could be tried in the Gabor filter stage; this may lead to more accurate feature extraction and hence a more accurate recognition system. The way the face code is selected from the response file could also be changed. Other neural networks could be tested and implemented to achieve better results. Finally, the system could be made updatable, meaning that the end user would be able to add or remove persons (face images) from the database.


Appendix A Important Codes

This appendix contains some of the important code we implemented for the different stages of the proposed system. Most of the code and user interfaces were implemented in VC# and VC++ using Microsoft compilers.

A.1 Histogram Equalization Class
The histogram equalization class is used in the first step of the preprocessing stage (Section 5.2.1). It was implemented in VC#; see Figure (A.1).

using System;
using System.Windows.Forms;
using System.Drawing;

namespace Face_Recognition
{
    public class HistoEqu
    {
        public Point[] equtran = new Point[240];
        public Point[] equtest = new Point[160];
        int yy = 0;
        public Bitmap res;      // equalized output image
        Bitmap sou;             // source image
        int[] arr = new int[256];

        public HistoEqu(int width, int hight, Bitmap source)
        {
            res = new Bitmap(width, hight);
            sou = source;
            counthiso();
        }

        public void counthiso()
        {
            int temp = 0, maxgray = 255;
            double[] arr1 = new double[256];
            for (int i = 0; i