PEOPLE'S DEMOCRATIC REPUBLIC OF ALGERIA
MINISTRY OF HIGHER EDUCATION AND SCIENTIFIC RESEARCH
ECOLE NATIONALE SUPÉRIEURE D'INFORMATIQUE (ex INI)
ECOLE DOCTORALE: STIC
OPTION: SIC

THESIS SUBMITTED FOR THE DEGREE OF DOCTOR IN COMPUTER SCIENCE BY GATTAL Abdeljalil

Segmentation-Verification for Handwritten Digit Recognition

This thesis was publicly defended on June 16, 2016 in front of the examination committee composed of:

President: BENATCHBA Karima, Professor, ESI, Algiers, Algeria

Examiners:
BELHADJ AISSA Aichouche, Professor, USTHB, Algiers, Algeria
HADDADOU Hamid, Associate Professor, ESI, Algiers, Algeria
AOUAT Saliha, Associate Professor, USTHB, Algiers, Algeria

Thesis Supervisor: CHIBANI Youcef, Professor, LISIC Lab, USTHB, Algiers, Algeria

À CEUX QUE J'AIME… ET CEUX QUI M'AIMENT
TO THOSE I LOVE AND THOSE WHO LOVE ME

ABSTRACT

Automatic reading of digit fields from a document image has been proposed in several applications such as bank checks, postal codes and forms. In this context, two main problems arise when attempting to design a handwritten digit string recognition system. The first problem is the link between adjacent digits, which may be naturally spaced, overlapping and/or connected. The second problem is the unknown length of the digit string, which is not carefully written by people in real-life situations.

In this thesis, an SVM-based segmentation-verification system is proposed for segmenting two connected handwritten digits using an oriented sliding window. The system jointly uses the oriented sliding window and Support Vector Machine (SVM) classifiers. Experimental results show that the proposed system is well suited to segmenting both simple and multiple connections. Its main advantage lies in using only a few rules to find the optimal segmentation path. Hence, the proposed approach constitutes a trade-off between correct segmentation and the number of segmentation cuts.

Thereafter, we propose a new design of a handwritten digit string recognition system based on the explicit approach for digit strings of unknown length. Three methods are combined according to the type of link between adjacent digits: the histogram of the vertical projection for spaced digits, contour analysis for overlapping digits, and the Radon transform performed on a sliding window for connected digits. A recognition and verification module based on Support Vector Machine (SVM) classifiers analyzes each segmented digit image and decides its rejection or acceptance. Experimental results conducted on the benchmark dataset show that, compared with the state of the art, the proposed system is effective for segmenting handwritten digit strings without prior knowledge of their length.


RÉSUMÉ

Automatic reading of digit fields from a document image has been considered in several applications such as bank checks, postal codes and forms. In this context, two main problems arise when attempting to design a handwritten digit string recognition system. The first is the type of link between adjacent digits, which may be naturally spaced, overlapping and/or connected. The second is the unknown length of the digit string, which is not carefully written by people in real-life situations.

In this thesis, an SVM-based segmentation-verification system is proposed for segmenting two connected handwritten digits using the oriented sliding window. It jointly employs the oriented sliding window and Support Vector Machine (SVM) classifiers. Experimental results showed that the proposed system is well suited to segmenting both simple and multiple connections. Its main advantage lies in using only a few rules to find the optimal segmentation path. Consequently, the proposed approach represents a trade-off between correct segmentation and the number of segmentation cuts.

We then propose a new design of a handwritten digit string recognition system based on the explicit approach for digit strings of unknown length. Three methods are combined according to the type of link between adjacent digits: the histogram of the vertical projection for spaced digits, contour analysis for overlapping digits, and the Radon transform performed on a sliding window for connected digits. A recognition and verification module based on Support Vector Machine (SVM) classifiers analyzes each segmented digit image and decides its rejection or acceptance. Experimental results conducted on the benchmark dataset show that, compared with the state of the art, the proposed system is effective for segmenting handwritten digit strings without prior knowledge of their length.


ACKNOWLEDGEMENTS

First, I would like to thank Prof. Youcef CHIBANI for his valuable contributions, corrections and discussions. I would like to thank the members of the LISIC Laboratory for their encouragement and support during the preparation of this thesis. In addition, I would like to thank the members of my examining committee; their comments helped to improve the quality of the final version of this thesis.


CONTENTS

Abstract
Résumé
Acknowledgements
Contents
List of Figures
List of Tables
List of Acronyms and Symbols
Introduction
Chapter 1. Overview of handwritten digit recognition
1.1. Introduction
1.2. Character recognition systems
1.2.1. Acquisition mode
1.2.1.1. On-line mode
1.2.1.2. Off-line mode
1.2.2. Preprocessing
1.2.2.1. Noise reduction
1.2.2.2. Normalization
1.2.2.3. Smoothing
1.2.2.4. Skeletonization
1.2.3. Segmentation
1.2.4. Feature generation
1.2.4.1. Global features
1.2.4.2. Statistical features
1.2.4.3. Geometrical and topological features
1.2.5. Classification techniques
1.2.5.1. Statistical techniques
1.2.5.2. Structural techniques
1.2.5.3. Stochastic techniques
1.3. Overview of Support Vector Machines (SVM)
1.3.1. Basic principles
1.3.2. SVM kernel
1.3.3. Multi-class SVM
1.4. Summary
Chapter 2. Isolated handwritten digit recognition
2.1. Introduction
2.2. Overview of isolated handwritten digit recognition
2.3. Size normalization
2.4. Feature generation
2.4.1. Global features
2.4.1.1. Density
2.4.1.2. Center of gravity
2.4.1.3. Second order geometrical moments
2.4.1.4. Number of transitions
2.4.2. Hu's moment invariants
2.4.3. Skew
2.4.4. Zernike moments
2.4.5. Projections
2.4.6. Profile features
2.4.7. Background features
2.4.8. Foreground features
2.4.8.1. Contour based features
2.4.8.2. Skeleton based features
2.4.9. Ridgelet transform



2.4.10. Region sampling: uniform grid
2.5. Recognition
2.6. Experimental results
2.6.1. Experimental results on NIST SD19
2.6.2. Experimental results on CVL Single Digit Database
2.7. Summary
Chapter 3. Segmentation of two connected handwritten digit recognition
3.1. Introduction
3.2. Overview of two connected handwritten digit recognition
3.3. Segmentation-verification system
3.3.1. Segmentation of connected digits
3.3.1.1. Finding the interconnection points
3.3.1.2. Finding the cutting path
3.3.2. Feature generation
3.3.3. Recognition and verification
3.4. System evaluation
3.4.1. Databases
3.4.2. Parameter tuning of the SVM model
3.4.3. Results
3.4.3.1. Influence of the orientation angle
3.4.3.2. Complexity of the proposed segmentation-verification
3.4.3.3. Comparative analysis
3.5. Summary
Chapter 4. Handwritten digit string recognition
4.1. Introduction
4.2. Overview of handwritten digit string recognition
4.3. Segmentation methods of the digit string
4.3.1. Segmentation of spaced digits
4.3.2. Segmentation of overlapped digits
4.3.3. Segmentation of connected digits
4.3.3.1. Adjustment of the width (W_SWRT)


4.3.3.2. Finding the angular cut via the Radon transform
4.4. Design of the digit string recognition system
4.4.1. Digit recognition-verification
4.4.1.1. Feature generation
4.4.1.2. Design of the SVM classifiers on isolated digits
4.4.1.3. Digit verification
4.4.2. Spaced digit recognition-verification
4.4.3. Overlapped digit recognition-verification
4.4.4. Connected digit recognition-verification
4.5. Experimental results
4.5.1. Databases and evaluation criteria
4.5.2. Experimental setup
4.5.3. Design of SVM classifiers on isolated digits
4.5.4. Evaluation of the digit segmentation
4.5.4.1. Influence of the range of the projection angle
4.5.4.2. Adjustment of the angular step
4.5.5. Comparative analysis
4.5.6. Computation cost of the proposed system
4.6. Summary
Conclusion
References




LIST OF FIGURES

Figure 1.1. Steps of the typical character recognition system.
Figure 1.2. Samples of the basic methods for normalization: (a) skew normalization, (b) slant normalization, (c) size normalization.
Figure 1.3. Example of skeletonization.
Figure 1.4. Misclassification caused by over-segmentation.
Figure 1.5. Misclassification caused by under-segmentation.
Figure 1.6. Maximum margin hyperplane and margins for an SVM.
Figure 1.7. Hyperplane soft margin for non-linearly separable data.
Figure 1.8. Data not linearly separable in 2-D mapped onto three dimensions, where a linear decision surface between the classes can be made.
Figure 1.9. Regions not classified by the OAA approach for a three-class problem.
Figure 1.10. Regions not classified by the OAO approach for a three-class problem.
Figure 1.11. Directed acyclic graph SVM method.
Figure 2.1. Different configurations of concavity.
Figure 2.2. Concavity labels for digit '9'.
Figure 2.3. Contour detection: (a) contour of the upper region, (b) feature vector, (c) 8 Freeman directions.
Figure 2.4. Features extracted from the skeleton.
Figure 2.5. Example of splitting a digit using a uniform grid (2x2).
Figure 3.1. Different types of connected digit samples.
Figure 3.2. Possible segmentation paths using IP and BP: (a) hypothesis 1, (b) hypothesis 2, (c) hypothesis 3.
Figure 3.3. Crossing and the segmentation paths according to the orientation of the window.
Figure 3.4. Illustrative example for segmenting two connected digits using the oriented sliding window and the recognition-verification strategy.
Figure 3.5. Number of segmentation cuts versus the orientation angle D (°).
Figure 3.6. Overall rate (%) versus the orientation angle D (°).
Figure 3.7. Overall rate (%) versus the number of segmentation cuts.
Figure 3.8. Unsuccessful segmentation-verification of connected digits produced by the proposed system.
Figure 3.9. Performance of the segmentation-recognition algorithms.
Figure 4.1. Correct and incorrect segmentation using the HVP.
Figure 4.2. Sample images of broken handwritten digits.
Figure 4.3. Segmentation by contour analysis: (a) possible segmentation, (b) impossible segmentation.
Figure 4.4. Examples of incorrect segmentation when using the contour analysis.
Figure 4.5. Processing of broken parts: (a) two distinct overlapped single digits (Rule 1), (b) broken single digit (Rule 2), (c, d) broken digit (Rule 3).
Figure 4.6. Influence of adjusting the width of the sliding window.
Figure 4.7. Radon transform for three examples of connected digits. The maximum value of the Radon transform corresponds to the orientation angle θ_cut for cutting two connected digits.
Figure 4.8. Two selected projections of the Radon transform showing θ_cut = 51°.
Figure 4.9. Impact of selecting the orientation angle for segmentation: (a) segmentation with SWRT, (b) segmentation without SWRT.
Figure 4.10. Full segmentation system for handwritten digit string recognition.
Figure 4.11. Some samples of connected digits considered by SCA as isolated digits.
Figure 4.12. Connected digit recognition-verification: (a) original connected digits, (b) scanned IPs, (c) fixing the sliding window Radon transform around an IP, (d) segmentation paths, (e) final decision.
Figure 4.13. Segmentation example of the proposed digit string recognition system.



Figure 4.14. Impact of selecting the range of the projection angle for detecting the cutting path, from [1, 179] to [80, 100].
Figure 4.15. Influence of selecting the range of the projection angle (°) for different digit string lengths: (a) 2-digit, (b) 3-digit, (c) 4-digit, (d) 5-digit, (e) 6-digit, (f) 10-digit string length.
Figure 4.16. Influence of adjusting the angular step (°) for different digit string lengths: (a) 2-digit, (b) 3-digit, (c) 4-digit, (d) 5-digit, (e) 6-digit, (f) 10-digit string length.
Figure 4.17. Impact of selecting the angular step for detecting the cutting path: (a) correct segmentation when the angular step is fixed to 3°, (b) incorrect segmentation when the angular step is fixed to 6°.



LIST OF TABLES

Table 2.1. Summary of features
Table 2.2. The recognition rate for various experiments
Table 2.3. Recognition results on individual features
Table 2.4. Recognition results on feature combinations
Table 2.5. Recognition rates on individual digits
Table 2.6. Comparison of the proposed method with state-of-the-art methods
Table 3.1. Comparison of different segmentation methods
Table 3.2. Distribution of the NIST SD19 database for handwritten digits
Table 3.3. Types of connected numeral strings
Table 3.4. Distribution of the database regarding the type of connection
Table 3.5. The recognition rate for various orientation angles
Table 3.6. Segmentation effect according to the orientation angle
Table 3.7. Some examples of the successful segmentation-verification produced by the proposed system according to the connection type
Table 3.8. Some examples of the unsuccessful segmentation-verification produced by the proposed system according to the connection type
Table 3.9. Comparative analysis
Table 3.10. Rank of our method compared with existing methods according to the connection type
Table 4.1. Number of digit string samples (#Strings) distributed according to the numbers of Spaced Digits (#SD) and Connected and/or Overlapped Digits (#C-OD), also expressed in %
Table 4.2. Recognition rate (%) obtained when training and retraining SVM classifiers
Table 4.3. Average rates obtained for all digit string lengths for different ranges of the projection angle (°)
Table 4.4. Recognition rates according to the angular step for each string length
Table 4.5. Comparative analysis of various segmentation systems for the unknown-length string performed on NIST NSTRING SD19
Table 4.6. Examples of the correct and incorrect segmentation-recognition produced by the proposed system from the NSTRING SD19 database



LIST OF ACRONYMS AND SYMBOLS

PDA – Personal Digital Assistant
Tablet PCs – Tablet Personal Computers
SAM – Spectral Angle Mapper
ANN – Artificial Neural Networks
SVM – Support Vector Machines
NN – Nearest Neighbor
HMM – Hidden Markov Models
VC – Vapnik and Chervonenkis
KKT – Karush Kuhn Tucker
RBF – Radial Basis Function kernel
KMOD – Kernel with Moderate Decreasing
OAA – One Against All
OAO – One Against One
DAG – Directed Acyclic Graph
GMM – Gaussian Mixture Models
CVL – Computer Vision Lab
NIST – National Institute of Standards and Technology
MNIST – Modified NIST
NIST SD19 – NIST Special Database 19
CENPARMI – Centre for Pattern Recognition and Machine Intelligence
CEDAR – Center of Excellence for Document Analysis and Recognition
NSTRING SD19 – NIST STRING Special Database 19
MLP – Multi-Layer Perceptron
FIR MLP – Finite Impulse Response Multi-Layer Perceptron
LIBSVM – A Library for Support Vector Machines
SOM – Self-Organizing Maps
IP – Interconnection Point
BP – Base Point
SWRT – Sliding Window Radon Transform
AHDSR – Automatic Handwritten Digit String Recognition
HVP – Histogram of the Vertical Projection
SC – Segmented Components
SCA – Segmented Component Analysis
DRV – Digit Recognition-Verification
GC – Grouped Component
GCA – Grouped Component Analysis
BSVM – Binary SVM
SD – Spaced Digits
C-OD – Connected and/or Overlapped Digits
w – Vector perpendicular to the separating hyperplane of an SVM
w^t – Transpose of w
x – Input vector (an example from the database)
w_0 – Bias of an SVM
u – Possible decision space associated with an example x
h – VC dimension
N – Number of training examples
η – Probability
C – Regularization constant
α_i – Lagrange multipliers
ξ_i – Non-negative slack variables
L – Lagrangian function
Φ – Mapping function (quadratic transform)
K – Kernel function
m – Dimension of the feature space
– Decision function of the SVM
– Scale parameter of RBF and KMOD
– Parameter of KMOD
m_pq – Two-dimensional geometric moment
m_00 – Zeroth order moment
μ_pq – Central moments
m_20, m_02 – Second-order moments
φ_7 – The seventh moment invariant
V_{n,m} – Zernike polynomials
R_{n,m} – Radial polynomial
Z_{n,m} – Zernike moments
I(·) – Image function
|Z_{n,m}| – Amplitudes of Zernike moments
δ – Dirac distribution
θ – Angular variable
r – Radial variable
T_rad – Radon transform
D – Orientation angle
t_d – Distance between two digits
W_SWRT – Width of the Sliding Window Radon Transform
θ_cut – Angular cut
β – Parameter defined experimentally
g(r, θ) – Radon transform
f_max(x_SC) – Maximal value selected from the 10 responses provided by the SVM classifiers
x_i – Feature vector
t_f – Threshold for digit verification
t_HW – Fixed decision threshold


INTRODUCTION

Automatic handwriting recognition aims to convert images that are understandable by humans into a code interpretable by a machine. It has been a subject of intensive research for about fifty years. The problem, which is very simple for almost every human, is extremely complicated for the machine. Several techniques and methods have been proposed in order to build faster and more reliable systems. However, despite all the efforts in this area, there is still a significant gap between human and machine performance.

Two fields can be distinguished: on-line handwriting recognition and off-line handwriting recognition. In on-line recognition, characters are recognized as they are written: stroke information is captured dynamically, represented by the pen trajectory. Off-line handwriting recognition differs from on-line recognition because stroke information is not accessible; the information is two-dimensional, represented as an image obtained by scanning the characters. Off-line recognition is less accurate than on-line systems because the temporal information is absent. The off-line case is the subject of this thesis.

Off-line handwriting recognition itself can be divided into two approaches, according to the number of persons whose writing must be recognized. When this number is limited, the recognition system can be trained specifically on the writing of those persons; such a system is called mono-script recognition. When the writing comes from a very large number of different persons, the system is called multi-script recognition. The present work addresses the latter setting.

A typical off-line handwriting recognition system usually consists of three main processing steps: preprocessing, feature generation and recognition using a classifier. Preprocessing consists of a sequence of operations performed on the images in order to prepare them for feature generation. These common operations include noise reduction, document skew correction, slant correction, normalization, smoothing and skeletonization. Feature generation represents an image as a vector of features using various extraction techniques, which may or may not be redundant. When features are redundant, a selection algorithm can be applied to reduce the size of the input feature vector and avoid the so-called curse of dimensionality. Finally, a large number of classifiers are available for recognition, including statistical, structural and stochastic classifiers, as well as combinations of classifiers. At each step, selecting appropriate parameters can affect the final classification performance.


Problem Statement

The challenges of unconstrained handwritten digit recognition stem mainly from the following factors:

- Multi-script recognition: considered a complex problem for the recognition of unconstrained handwritten digits because of the variability of writing;
- Lack of constraints imposed upon the writing (slopes, slants, overlaps and interconnections of digits);
- Variety of writing styles, such as stick, detached, mixed and cursive script;
- Writing tools, which make the line thickness non-uniform;
- Insufficient inking, which causes scanning defects largely due to the age of the writer and the quality of the paper;
- Segmentation into isolated digits, which can produce pseudo-digits or connected digits; at the end of this step, the system must be robust in order to avoid over-segmentation and under-segmentation.

In summary, the difficulties of unconstrained handwritten digit recognition lie not only in recognizing individual digits, but also in separating the digits from each other within the string through segmentation. Segmentation can be conducted by considering the three following situations: spaced, overlapping or connected digits. In most cases, overlapping and connected digits are the most frequently observed situations.

Currently, the focus of our work is the development of an Automatic Handwritten Digit String Recognition (AHDSR) system, which is required in many applications such as the amount on bank checks, postal codes and forms. In this context, several complex problems arise when recognizing handwritten digit strings: the presence of noise, broken digits, overlapping digits, connected digits and the unknown length of the string.

The literature shows different methods for AHDSR. Some works are based on contours, others on the background, and others on the skeleton. Still others combine the skeleton and the contour to deduce the potential cutting points and achieve better performance.

In reviewing the various explicit segmentation methods for AHDSR, the literature presents two different approaches: segmentation-recognition and recognition-based. In the recognition-based approach, the segmentation and recognition steps depend on each other; in this case, prior knowledge is required in the segmentation step. Generally, algorithms based on the segmentation-recognition approach separate the digit string into segments using rules, without recognition.


In this work, the segmentation of a handwritten digit string into digits is considered the most crucial part of AHDSR. We address the foregoing problems using a recognition-based approach built on segmentation and verification.
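To make the explicit-segmentation idea concrete, the simplest cue, the histogram of the vertical projection (HVP) later used for spaced digits, can be sketched as follows. This is an illustrative minimal implementation; the function names and the zero-ink gap criterion are assumptions for the sketch, not the exact algorithm of this thesis.

```python
import numpy as np

def vertical_projection(img):
    """Histogram of the vertical projection: ink pixels per column."""
    return img.sum(axis=0)

def split_spaced_digits(img):
    """Split a binary digit-string image at columns containing no ink."""
    hvp = vertical_projection(img)
    cols = np.flatnonzero(hvp > 0)               # columns containing ink
    if cols.size == 0:
        return []
    breaks = np.flatnonzero(np.diff(cols) > 1)   # gaps in the run of inked columns
    starts = np.r_[cols[0], cols[breaks + 1]]
    ends = np.r_[cols[breaks], cols[-1]]
    return [img[:, s:e + 1] for s, e in zip(starts, ends)]
```

Such a cue succeeds only for naturally spaced digits; overlapping or connected digits leave no zero-ink column, which is precisely why the contour and Radon-based methods are needed.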

Goals of the Work

The primary goal of this research is the recognition of isolated digits. The main challenges in handwritten digit recognition arise from variations in size, shape and slant, and, most importantly, from differences in the writing styles of individuals. We are interested in enhancing the feature generation step for isolated digit recognition so as to avoid digit normalization. The idea is to find a combination of multiple features that improves the overall recognition rates by minimizing intra-class variability and maximizing inter-class variability, the most desirable requirement of any pattern recognition system. The performance of an AHDSR system depends in particular on the design of a robust Support Vector Machines (SVM) classifier.

A second goal of this research lies in the recognition of two connected handwritten digits using an SVM-based segmentation-verification system that segments simply and multiply connected digits with an oriented sliding window. Two morphological features, based on the contour and the skeleton, are combined for detecting and correctly splitting the connected digits. In order to avoid over-segmentation and under-segmentation, a sliding window is used for finding interconnection points. Once the interconnection points are found, the window is oriented to define the optimal cutting path. Recognition and verification are performed using SVM classifiers through the One-Against-All (OAA) implementation.

The third goal of this research is to propose a solution for recognizing handwritten digit strings based on the explicit approach for unknown-length strings. Three methods are combined according to the type of link between adjacent digits: the histogram of the vertical projection for spaced digits, contour analysis for overlapping digits, and the Sliding Window Radon Transform method for connected digits. Besides, the proposed system uses Support Vector Machine (SVM) classifiers for recognition and verification.

Experiments carried out on isolated digits, two connected digits and strings of digits evaluate these three contributions using standard datasets.
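The Radon-based cut-angle idea for connected digits can be sketched by approximating each Radon projection as a rotation followed by a column sum, and keeping the angle whose projection peaks highest, i.e. where the ink mass concentrates along the ligature. The function name and the peak-selection criterion below are illustrative assumptions, not the exact SWRT procedure developed in Chapter 4.

```python
import numpy as np
from scipy.ndimage import rotate

def radon_peak_angle(img, angles):
    """Return the candidate angle whose projection has the highest peak.

    For each angle, the image is rotated and its columns are summed,
    which approximates one projection of the Radon transform. A straight
    stroke aligned with the projection direction produces a sharp peak,
    so the arg-max angle hints at the orientation of the connecting stroke.
    """
    best_angle, best_peak = angles[0], -1.0
    for a in angles:
        rotated = rotate(img, a, reshape=True, order=1)
        projection = rotated.sum(axis=0)   # column sums = one projection
        peak = projection.max()
        if peak > best_peak:
            best_angle, best_peak = a, peak
    return best_angle
```

In the thesis's setting the transform is restricted to a sliding window around an interconnection point, which keeps the peak sensitive to the ligature rather than to the digit bodies.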

Outline of the Thesis

This thesis consists of four chapters, as follows.

Chapter 1 presents a brief overview of handwritten character recognition systems in order to introduce all definitions related to such systems and to bring out all steps of the recognition process. It provides a detailed description of the popular Support Vector Machine (SVM) classification technique, since it is more accurate than other classifiers in many application areas of data classification.

Chapter 2 describes the combination of different statistical and structural features for the recognition of isolated handwritten digits. These features include some global statistics, moments, profile and projection based features, and features generated from the contour and skeleton of the digits. Some of these features are extracted from the complete image of the digit, while others are extracted from different regions of the image by first applying uniform grid sampling. Classification is carried out using the one-against-all SVM implementation. The experiments are conducted on the isolated handwritten digits of NIST SD19 and the CVL Single Digit Database.

Chapter 3 addresses the SVM-based segmentation-verification system for segmenting two connected handwritten digits using the oriented sliding window. It describes a segmentation-verification system that jointly uses the oriented sliding window and support vector machine (SVM) classifiers. The oriented sliding window is used for finding, at the same time, the interconnection points and the optimal angle for cutting the adjacent digits, while the classifiers recognize and verify the correct segmentation via a global decision module that accepts or rejects the processed image. Experimental results conducted on a large synthetic database of handwritten digits show the effectiveness of the oriented sliding window for segmentation-verification.

Chapter 4 proposes a new segmentation and recognition system for unknown-length handwritten digit strings. Three segmentation methods, based on the histogram of the vertical projection, contour analysis and the Sliding Window Radon Transform (SWRT), are combined for recognizing unknown-length handwritten digit strings with the help of a verification stage. A recognition module based on the SVM classifier analyzes each segmented digit image and decides its rejection or acceptance. Experimental results conducted on the benchmark dataset show that the proposed strategy is effective for segmenting handwritten digit strings.

Finally, we conclude this thesis by summarizing its main contributions and proposing new directions for improvement.


CHAPTER 1. OVERVIEW OF HANDWRITTEN DIGIT RECOGNITION

Handwritten character recognition is an important area in the image processing and pattern recognition field. Its goal is to transform handwritten characters into a machine-readable form that can be recognized automatically. In this context, this thesis focuses on handwritten digit recognition from document images, since many applications can be envisaged, such as bank checks, postal codes and forms. Hence, this chapter gives an overview of the main modules composing a handwritten digit recognition system.

1.1. Introduction

Automatic reading of digit fields from a document image has been proposed in several applications such as bank checks [1], postal codes [2] and forms [3]. The digit fields can be printed (or typewritten) or handwritten. The first case is considered a closed problem, since all its difficulties have been overcome. In contrast, the second case is entirely different because of the high variability of the writing [4]. This chapter overviews the different modules that allow recognizing a handwritten digit. Basically, a handwritten digit recognition system is composed of five distinct modules: acquisition, preprocessing, segmentation, feature generation and classification, followed by an optional post-processing module.

1.2. Character recognition systems

The design of a handwritten character recognition system can be conducted mainly in five steps: acquisition, preprocessing, segmentation, feature generation and classification. The first module is image acquisition, where a digital image of the text is obtained. This can be done off-line using a scanner or on-line with a digital pen or stylus. Next, the preprocessing module generally consists of several methods used to improve the quality of images for further processing. The segmentation module separates overlapping and/or joined adjacent digits into elementary digits in order to deduce the possible distinct classes [5-9]. After that, feature generation is performed on the digit image to reduce the dimension of the representation and thus simplify the design of the classification system. Next, a decision function


allows assigning a character image to a predefined class. Figure 1.1 illustrates the different modules for recognizing a handwritten digit. The following sections describe each module composing a handwritten digit recognition system.


Figure 1.1. Steps of the typical character recognition system.
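As an illustration of the module chain of Figure 1.1, a toy end-to-end pipeline might look as follows. Every component here (the binarization threshold, the three global features, the nearest-reference decision rule) is a hypothetical stand-in for the sake of the sketch, not the system developed in this thesis.

```python
import numpy as np

def preprocess(img):
    """Crude preprocessing: binarize the grayscale image."""
    return (img > 0.5).astype(np.float64)

def generate_features(img):
    """Three toy global features: ink density and center of gravity."""
    h, w = img.shape
    total = max(img.sum(), 1)
    density = img.mean()
    cx = (img * np.arange(w)).sum() / total / w   # normalized x center
    cy = (img.T * np.arange(h)).sum() / total / h # normalized y center
    return np.array([density, cx, cy])

def classify(features, reference_patterns):
    """Assign the label of the nearest reference pattern."""
    dists = {label: np.linalg.norm(features - ref)
             for label, ref in reference_patterns.items()}
    return min(dists, key=dists.get)

def recognize(img, reference_patterns):
    """Acquisition is assumed done; chain the remaining modules."""
    return classify(generate_features(preprocess(img)), reference_patterns)
```

A real system would replace each stand-in with the modules described below: noise reduction and normalization in preprocessing, the feature sets of Chapter 2, and SVM classification in place of the nearest-reference rule.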

1.2.1. Acquisition mode

The acquisition of handwritten characters is done in two modes: on-line or off-line. Each mode has its own acquisition tools and its corresponding recognition algorithms. The on-line mode takes into account the chronological information of the movements of the writer's hand, whereas the off-line mode processes the information afterwards, independently of time.

1.2.1.1. On-line mode

This mode works in real time (during writing): the digital ink sample is a set of coordinates ordered in time. It is possible to track the path of the pen tip, pen-up/pen-down switching, and possibly the slope and speed. This is notably the case with light pens, digital pens or styluses on touch screens, PDAs or tablet PCs. The symbols are recognized as they are written by hand. The obtained signal is converted into character codes by means of character recognition systems. The writing is represented as a set of points whose coordinates are a function of time, which can be regarded as a digital representation of handwriting [10]. On-line recognition has the great advantage that the writing can be corrected and modified interactively, given the continuous response of the system [11].
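A minimal sketch of what such an on-line ink sample looks like, and of one dynamic cue (pen speed) derivable from it but lost in the off-line setting. The sample values are invented for illustration.

```python
import numpy as np

# A hypothetical on-line stroke: samples of (t, x, y), ordered in time.
stroke = np.array([
    # t(s)  x     y
    [0.00, 10.0, 20.0],
    [0.01, 12.0, 21.0],
    [0.02, 15.0, 23.0],
])

def pen_speed(stroke):
    """Speed of the pen tip between consecutive samples."""
    dt = np.diff(stroke[:, 0])
    dxy = np.diff(stroke[:, 1:], axis=0)
    return np.linalg.norm(dxy, axis=1) / dt

speeds = pen_speed(stroke)
```

An off-line system only sees the rasterized trace of such a stroke, so neither `dt` nor the resulting speeds can be recovered, which is why temporal cues are unavailable in this thesis's setting.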


1.2.1.2. Off-line mode
Off-line handwriting recognition is performed on the image of a scanned document. In this context, the acquired data is regarded as a static representation of handwriting. This mode is appropriate for printed documents and for manuscripts that were written before the acquisition. The design of an off-line handwriting recognition system is difficult compared to an on-line one, since many desirable characteristics are not available, such as the velocity, the pressure and the time-ordered coordinates of the character. This mode can be considered as the most general case for recognizing a handwritten character. In our case, the image of the written text acquired in the off-line mode is captured using physical sensors (scanner, camera ...) with a minimum of degradation. During this step, despite the good quality of acquisition systems, noise might appear in the scanned document images. This is caused by the texture type, the area and its lighting.

1.2.2. Preprocessing
Preprocessing is performed essentially to reduce the noise superimposed on the image data while trying to keep only the material information of the document. The noise may be due to the acquisition conditions (lighting, wrong placement of the document ...) or to the quality of the original document. One of the problems in the recognition of handwritten characters is skew/slant detection and correction in the text document, which introduces challenges for segmentation. Therefore, preprocessing the images normally yields better results. These tasks commonly include noise reduction, normalization, smoothing and skeletonization.

1.2.2.1. Noise reduction
Prior to character recognition, it is necessary to eliminate imperfections. Noise reduction is the process of removing noise from an image. Imperfections in the optical scanning devices, the intensity of light, scratches on the camera or scanner lens, or a writing instrument that produces disconnected line segments all introduce noise into the scanned images. There are many techniques to reduce the noise. Basically, a filtering function is used to remove noise and diminish spurious points in the image. For example, the symmetric Gaussian filter smooths equally in all directions. An alternative approach is the use of morphological operations, which are basically neighbourhood operations performed on the input image using a structuring element. Two basic morphological operations are used: dilation and erosion [1-2, 6-7].
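The two operations described above can be sketched with standard tools. A minimal illustration, assuming NumPy and SciPy are available (the sample image and the helper names `smooth`/`remove_speckle` are invented for the example, not taken from the thesis):

```python
import numpy as np
from scipy import ndimage

# Symmetric Gaussian filter: smooths the image equally in all directions.
def smooth(gray, sigma=0.8):
    return ndimage.gaussian_filter(gray.astype(float), sigma=sigma)

# Morphological opening (erosion followed by dilation) with a 3x3
# structuring element removes foreground speckle smaller than the element.
def remove_speckle(binary):
    structure = np.ones((3, 3), dtype=bool)
    return ndimage.binary_opening(binary, structure=structure)

img = np.zeros((12, 12), dtype=bool)
img[2:9, 3:6] = True   # a 7x3 vertical stroke
img[0, 10] = True      # an isolated noise pixel
clean = remove_speckle(img)
blurred = smooth(img)
```

Opening removes the isolated pixel but leaves the 7x3 stroke intact, because the stroke can contain the 3x3 structuring element while the speckle cannot.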

1.2.2.2. Normalization
Normalization is popularly used in character recognition to reduce all types of variation and to obtain standardized data. However, it can also give rise to excessive shape distortion and eliminate some useful information. The usual methods for normalizing a character are the following:
- Skew normalization: Due to variation in writing style, skew can hurt the effectiveness of recognition and therefore should be detected and corrected with respect to the baseline (see Fig. 1.2.a). Various methods have been used, such as the projection profile of the image [12], the Hough transform [13] or nearest-neighbor clustering of shapes [14]. After skew detection, the character or word is translated to the origin and rotated until the baseline is horizontal.
- Slant normalization: The character inclination typically found in cursive script is called slant. Formally, it is defined as the angle between the longest stroke in a word and the vertical direction. Slant normalization is used to bring all characters to a standard form with no slant (see Fig. 1.2.b). Many methods have been proposed to detect and correct the slant of cursive words. One method is based on the center of gravity [15], another uses projection profiles [16], and some use a variant of the Hough transform [17].
- Size normalization: It is used to adjust the size, position and shape (dimension) of the character image. This step is required to reduce the shape variation between images of the same class, to facilitate feature generation and to improve classification [18] (see Fig. 1.2.c).

Figure 1.2. Samples of the basic methods for normalization: (a) Skew normalization (b) Slant normalization (c) Size normalization.
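Size normalization is the simplest of the three to sketch. A hedged illustration assuming SciPy's `ndimage.zoom` for resampling (the function name `normalize_size` and the 28x28 target size are illustrative choices, not the thesis's actual settings):

```python
import numpy as np
from scipy import ndimage

def normalize_size(binary, out_size=28):
    """Crop the digit to its bounding box and rescale it to out_size x out_size."""
    rows = np.any(binary, axis=1)
    cols = np.any(binary, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    crop = binary[r0:r1 + 1, c0:c1 + 1].astype(float)
    zoom = (out_size / crop.shape[0], out_size / crop.shape[1])
    # order=1 -> bilinear interpolation; threshold back to a binary image.
    return ndimage.zoom(crop, zoom, order=1) > 0.5

img = np.zeros((50, 40), dtype=bool)
img[10:30, 5:15] = True   # a 20x10 filled block, off-center in the frame
norm = normalize_size(img)
```

After cropping, the digit's position no longer matters, and every sample ends up with the same dimensions regardless of its original size.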


1.2.2.3. Smoothing
The smoothing operation is done to regularize the edges in the image, to remove small bits of noise and to reduce the high-frequency noise in the image [19]. Furthermore, different preprocessing methods are used for smoothing the image in order to obtain a more accurate output image. In Freeman chain-code extraction, smoothing is done by comparing each code with the previous and next codes [6, 18].

1.2.2.4. Skeletonization
Skeletonization is a morphological operation used to reduce foreground regions in a binary image to a skeletal remnant in which the connectivity of the original region is preserved while most of the original foreground pixels are discarded. Methods for skeletonization are divided into two main approaches: iterative and non-iterative [20]. In the iterative approach, the contour is peeled in parallel or sequentially by erasing the unwanted pixels at each iteration. In contrast, in the non-iterative approach, the skeleton is extracted directly without examining each pixel individually. Unfortunately, these techniques are difficult to implement and slow as well. Thinning can be performed for skeletonization using methods like erosion or opening. It is commonly used to reduce all lines to single-pixel thickness, as shown in Fig. 1.3.

Figure 1.3. Example of skeletonization.
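One classical way to compute a skeleton using only erosions and openings is Lantuéjoul's morphological skeleton. The sketch below assumes SciPy and is illustrative only; unlike thinning algorithms (e.g. Zhang-Suen), this skeleton is not guaranteed to be connected:

```python
import numpy as np
from scipy import ndimage

def morphological_skeleton(binary):
    """Lantuejoul's formula: union over n of erode^n(A) minus
    opening(erode^n(A)), with a 4-connected cross structuring element."""
    structure = ndimage.generate_binary_structure(2, 1)  # cross-shaped SE
    skeleton = np.zeros_like(binary, dtype=bool)
    eroded = binary.astype(bool)
    while eroded.any():
        opened = ndimage.binary_opening(eroded, structure=structure)
        skeleton |= eroded & ~opened      # residue at this erosion level
        eroded = ndimage.binary_erosion(eroded, structure=structure)
    return skeleton

img = np.zeros((9, 9), dtype=bool)
img[2:7, 2:7] = True                      # a 5x5 square blob
skel = morphological_skeleton(img)
```

For the 5x5 square, the skeleton keeps the center pixel plus the corner residues left behind at each erosion level.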

1.2.3. Segmentation
When designing a digit recognition system, the most important step is the segmentation of the digit string to obtain isolated digits. This step is a non-trivial problem due to several complexities. The first one is the inherent nature of the script, which can be cursive and overlapping at the same time. The second one is the high degree of variation in the writing styles produced by writers.


Segmentation systems can be divided into two approaches: implicit and explicit.
- The implicit approach considers all traced points as potential segmentation points [3, 9]. In this case, segmentation and recognition are performed simultaneously to recognize a digit string. Indeed, this approach does not attempt to separate digits, but rather incorporates the implicit segmentation into the recognition module.
- The explicit approach finds the best way to separate adjacent digits before recognition. In this case, segmentation and recognition are performed separately. Three cases can occur when attempting to separate two adjacent digits: spaced, overlapped and/or connected digits. Overlapped and connected digits are the most frequently observed situations. Hence, many algorithms have been proposed to separate a pair or a string of contiguous digits. The explicit segmentation algorithms can be categorized into two groups [8-9]:
- The recognition-based approaches usually generate multiple candidate segmentation points, which are verified with a recognizer in order to choose the optimal segmentation points. Although this approach gives better efficiency than the segmentation-recognition approaches, its main weakness lies in the computational time needed to compare all the segmentation hypotheses.
- The segmentation-recognition approaches are based on the morphology of the connected digits, represented by the contour (concavities, profiles) and the skeleton (background, foreground). This information is used to construct the segmentation paths such that each segmented component should contain an isolated digit, which is then submitted to the recognizer. These methods are generally faster than the recognition-based ones because fewer segmentation hypotheses need to be evaluated by the recognizer.
Generally, recognition reaches very high performance when dealing with spaced digits. However, when a complete segmentation system is used, recognition is more difficult since the system can generate either an over-segmentation or an under-segmentation. When the over-segmentation problem occurs, a segmentation cut is performed inside a digit (intra-digit) instead of between digits, producing spurious components and wrong results. Figure 1.4 shows some examples of the over-segmentation problem.


Figure 1.4. Misclassification caused by over-segmentation.

When the under-segmentation problem occurs, the missing segmentation cut produces a wrong result compared to the correct segmentation. For example, connected digits are often recognized as a single isolated digit. Therefore, this problem can cause confusion between isolated and under-segmented digits. Some examples of the under-segmentation problem are shown in Figure 1.5.

Figure 1.5. Misclassification caused by under-segmentation.

The proposed segmentation system for separating a pair or a string of contiguous digits will be discussed in detail in Chapters 3 and 4.

1.2.4. Feature generation
Feature generation can be defined as the problem of extracting the most pertinent information from the image for a classification task, i.e. information that minimizes the intra-class variability and maximizes the inter-class variability [21-23]. This pertinent information is often represented by a vector of numeric values. Feature generation methods are many and varied. Each one has its own properties and applies only in certain contexts and under certain conditions. Before choosing one method over another, it is important to know all the ins and outs of their use. There are numerous types of features, but they are broadly classified into three types: global features, statistical features, and geometrical and topological features.


1.2.4.1. Global features
Global features are generated from the entire character image using, for instance, the center of gravity, moments, or coefficients obtained from mathematical transforms such as Fourier, wavelet and ridgelet coefficients [23]. Global features are fast to compute and convenient for matching-based classification, but their ability to distinguish character details is weak and they are sensitive to image deformation. They are generally used for simple character detection.

1.2.4.2. Statistical features
Statistical features are derived from the statistical distribution of the pixels in the character image [21]. They offer high speed and low complexity and accommodate writing variations to some extent. They may also be used to reduce the dimension of the feature set. Among the most common statistical features are the moments extracted from images (Hu, Zernike moments ...). Zoning divides the character into several frames containing overlapping or non-overlapping zones (angular, concentric or circular grids). Finally, the horizontal or vertical projection features count the number of foreground pixels along each row or column.
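Projection and zoning features are simple to express directly. A hedged sketch with NumPy (the helper names and the 2x2 grid are illustrative, not the thesis's actual configuration):

```python
import numpy as np

def projection_features(binary):
    """Horizontal and vertical projection histograms (foreground counts)."""
    return binary.sum(axis=1), binary.sum(axis=0)

def zoning_features(binary, grid=(4, 4)):
    """Foreground density in each cell of a rectangular grid partition."""
    h, w = binary.shape
    gh, gw = grid
    feats = []
    for i in range(gh):
        for j in range(gw):
            cell = binary[i * h // gh:(i + 1) * h // gh,
                          j * w // gw:(j + 1) * w // gw]
            feats.append(cell.mean())
    return np.array(feats)

img = np.zeros((8, 8), dtype=bool)
img[:, 3:5] = True                 # a vertical bar, roughly like a "1"
hproj, vproj = projection_features(img)
fv = zoning_features(img, grid=(2, 2))
```

For this vertical bar, every row projection equals the stroke width (2), the column projections peak on the stroke columns, and each 2x2 zone has the same foreground density.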

1.2.4.3. Geometrical and topological features
Geometrical and topological features may represent global and local properties of the character's structure and have high tolerance to deformations, distortions and writing variations (translation and rotation) [22]. Topological features can encode some knowledge about the character structure or may require some knowledge about this kind of component. The specific features include shape descriptors and geometric structure features. A shape descriptor is used to describe the character structure, while a geometric structure feature reflects the character's shape structure and stroke-segment changes, such as the direction of a stroke segment due to writing variations. The geometrical and topological features generated from the character include complex strokes such as curves and splines, stroke directions, end points, intersections of line segments and loops, the number of holes, the size of slopes and curves, the length and thickness of strokes, areas and perimeters... All these measures can be integrated into a single feature vector for the recognition of a handwritten character.

1.2.5. Classification techniques
Classification is a technique that assigns an unknown pattern to a predefined class [24]. The classification workflow uses either unsupervised or supervised methods to categorize character features into classes. Unsupervised classification can be performed without providing training data; in this case, the classes are generated according to the resemblance between samples. In contrast, supervised classification requires training data and a specified classification method such as maximum likelihood, minimum distance, Mahalanobis distance, or the Spectral Angle Mapper (SAM). For handwritten digit recognition, the nearest neighbour classifier is one of the simplest and most widely used methods for classifying the digit image [32]. Artificial Neural Networks (ANN) are another well-studied classification method, widely used for handwritten character recognition [33]. The Hidden Markov Model (HMM) is also a popular approach to this problem, which uses both the statistical and structural information contained in handwritten digit shapes [85]. Support Vector Machines (SVM), originally developed as two-class classifiers, have been extended to multi-class problems and applied to handwritten character recognition [28, 88]. In the case of handwritten digit recognition, supervised classification has been adopted since the digit classes are well specified. It is performed in two stages:
- Training stage: The goal of the training stage is to train the classifier with a known digit dataset for the further recognition of unknown digits.
- Recognition and decision stage: This stage classifies input patterns by comparing them to a list of reference patterns and taking a decision. The decision stage is strongly influenced by the feature generation step, on which the success of the classifier depends.
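The two stages can be sketched with scikit-learn, using its bundled 8x8 digit images as a stand-in for a real dataset (the kernel and parameter values here are illustrative, not the thesis's tuned settings):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()   # 1797 8x8 grayscale digit images, 10 classes
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0)

clf = SVC(kernel='rbf', gamma='scale', C=10)
clf.fit(X_train, y_train)                 # training stage: known digit dataset
accuracy = clf.score(X_test, y_test)      # recognition and decision stage
```

The held-out test set plays the role of the "unknown digit dataset": the classifier never sees those labels during training.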

1.2.5.1. Statistical techniques
Statistical techniques are based on statistical decision theory, which uses statistical decision functions and a set of optimality criteria applied to the shapes to be recognized. However, they require a significant number of samples in order to correctly learn the probability distributions of the different classes. Studying these distributions in a metric space, together with the statistical characteristics of the classes, allows a decision to be taken in favour of the class with the highest membership probability (most confidence) [24]. The main statistical techniques applied in the character recognition field are the following [18, 24]:
- Parametric recognition: This method requires a priori information about the characters in order to obtain a parametric model for each character. Once the parameters of the model, which may be based on some probabilities, are obtained, the characters are classified according to decision rules such as maximum likelihood or Bayes's method [33, 28, 85].
- Non-parametric recognition: This method does not require a priori information about the data. It is used to separate different pattern classes along hyperplanes defined in a given hyperspace. The best-known non-parametric method is the Nearest Neighbor (NN) rule, which is extensively used in character recognition. An incoming pattern is assigned to the cluster whose center is at the minimum distance from the pattern over all the clusters [18, 32].

1.2.5.2. Syntactic and structural techniques
Syntactic and structural techniques are based on structural primitives that take into account the physical structure of characters. In general, it is assumed that the character primitives extracted from the writing are quantifiable and that relations can be found among them. For example, syntactic pattern recognition can be used to find out which objects are present in an image. Furthermore, structural methods are strong at finding a correspondence mapping between two images of an object [18, 32-33]. The main difference between these structural techniques and statistical techniques is that the features are topological primitives rather than measures. Several techniques are available, such as graph-matching algorithms, grammatical methods and string matching.

1.2.5.3. Stochastic techniques
Stochastic techniques use a model for recognition that takes into account the high variability of the shapes. In these techniques, the models are often discrete, and many studies are based on the theory of Markov fields and Bayesian estimation. Markov fields permit the modeling of global properties while using local constraints. The model describes its states using state transition probabilities and observation probabilities. The most common methods of this kind are those using Hidden Markov Models (HMM) [18, 24, 85].

1.3. Overview of Support Vector Machines (SVM)
The support vector machine (SVM) is a popular classification technique, both linear and non-linear, which not only has a solid theoretical foundation but is also more accurate than other classifiers in many areas of application for data classification. SVMs were originally designed around the principle of structural risk minimization [25]. In addition, they address two major difficulties in machine learning:
- parametrically controlling the capacity of the SVM, measured by the Vapnik-Chervonenkis (VC) dimension;
- avoiding over-learning (overfitting).
Two other advantages are particularly offered by SVMs. First, with an appropriate kernel, the SVM can work well even if the data are not linearly separable in the input space. Second, SVMs are especially popular in text and image classification problems, where very high-dimensional spaces are the norm. These two advantages lead us to select SVMs as good candidates for performing handwritten digit recognition.


SVMs are a machine learning algorithm for performing classification and regression via a hyperplane in a high-dimensional feature space. The SVM selects a representative subset of the training dataset, commonly called the "support vectors", and attempts to find the optimal hyperplane that maximizes the distance (margin) between the closest points of the two classes. Once learned, a decision function is constructed in order to classify data according to the region separated by the hyperplane. The generalization error is minimized by maximizing the margin, which controls the VC dimension of the classifier. For better clarity, the theoretical aspects of SVMs are explained in the following sections [25-31].

1.3.1. Basic principles
The Support Vector Machine (SVM), introduced by Vapnik in 1995 [25], is a binary classification technique belonging to the supervised learning methods, which can be used for both classification and regression. In simple terms, given a set of training samples, each labeled as a member of one of two classes, an SVM training algorithm tries to construct a decision model able to predict whether a new sample falls into one class or the other. If the samples are represented as points in space, a linear SVM model can be interpreted as a division of the space such that the examples belonging to the separate classes are divided by a clear gap that is as wide as possible. New samples are then predicted to belong to a class based on which side of the gap they fall on. The SVM is based on the use of a decision function that separates the data optimally. The principle of the SVM during optimization is to maximize the margin between the classes in order to increase the separation capability. The SVM classifier classifies all the points on one side of the decision boundary as belonging to one class and all those on the other side as belonging to the other class.
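This geometric picture can be checked numerically on a toy 2-D problem. A sketch assuming scikit-learn's `SVC` (the data points are invented, and a very large C approximates the hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds in 2-D (invented toy data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # class -1
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])  # class +1
u = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)   # very large C ~ hard margin
clf.fit(X, u)

w = clf.coef_[0]                    # normal w of the separating hyperplane
w0 = clf.intercept_[0]              # offset w_0
margin = 2.0 / np.linalg.norm(w)    # distance between the two margin hyperplanes
```

For this data the learned normal is proportional to (1, 1) and the margin 2/||w|| is about 3.54, the projection of the closest opposite-class points onto the normal direction.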


Figure 1.6. Maximum margin hyperplane and margins for an SVM.

A point x_i belongs to the hyperplane defined by the equation w^T x_i + w_0 = 0, where w is the normal of the hyperplane and |w_0| / ||w|| is the perpendicular distance between the hyperplane and the origin (Figure 1.6). Figure 1.6 shows an example where the input data are grouped into two classes to be identified by the SVM classifier (the support vectors are circled). When the training data are linearly separable, the classifier can select two parallel hyperplanes that separate the data with no points between them, and then maximize their distance. For all training data x_i with class labels u_i ∈ {−1, +1}, these hyperplanes (w, w_0) can be described through the following equations:

    w^T x_i + w_0 > 0   if u_i = +1
    w^T x_i + w_0 < 0   if u_i = −1                                (1.1)

It is possible to rescale w and w_0 such that:

    w^T x_i + w_0 ≥ +1  if u_i = +1
    w^T x_i + w_0 ≤ −1  if u_i = −1                                (1.2)

These can be combined into one set of inequalities:

    u_i (w^T x_i + w_0) ≥ 1   for all i                            (1.3)

The points for which the equality in equation (1.2) holds lie on the supporting hyperplane h1: w^T x + w_0 = 1 or on the supporting hyperplane h2: w^T x + w_0 = −1. Note that h1 and h2 are parallel and that no training points fall between them. For a hyperplane defined by h(x) = w^T x + w_0, the distance of a point x to the hyperplane is |h(x)| / ||w||.

The SVM constructs a decision hyperplane that maximizes the separation margin, following an approach rooted in statistical learning theory. More precisely, the VC dimension is a measure of the capacity and complexity of a classifier, and it yields a probabilistic upper bound on the test error of a classification model. Let N denote the number of training samples (this bound is valid when h ≪ N) and choose some probability η with 0 ≤ η ≤ 1. Then, with probability 1 − η, the following bound holds (Vapnik [30]):

    test error ≤ training error + sqrt( (h/N) (log(2N/h) + 1) − (1/N) log(η/4) )      (1.4)

where h is a non-negative integer called the VC dimension, a measure of the capacity of the learning machine. More precisely, the support vector machine is an approximate implementation of the method of structural risk minimization. This induction principle is based on the fact that the error rate of a learning machine on test data is bounded by the sum of the training-error rate and a term that depends on the VC dimension. In the case of separable patterns, an SVM produces a zero value for the first term on the right-hand side of equation (1.4) and minimizes the second term, thus minimizing the overall risk. In the case of non-separable patterns, the cost function to be minimized contains an extra term weighted by a parameter C, which penalizes the patterns that the SVM leaves on the wrong side of the margin.

For the SVM optimization problem, the maximum possible margin and the support vectors are obtained by minimizing (1/2) ||w||² under the constraints u_i (w^T x_i + w_0) ≥ 1 for all i ∈ {1, …, m}. According to optimization theory, the objective (minimize (1/2) ||w||²) and the constraints are strictly convex. This problem can be solved by introducing Lagrange multipliers α_i, so the constrained problem can be expressed as:

    L(w, w_0, α) = (1/2) ||w||² − Σ_{i=1}^{m} α_i [ u_i (w^T x_i + w_0) − 1 ]          (1.5)

whose partial derivatives with respect to w and w_0 must be set to zero. In the resulting expression, called the dual form, the correct-classification constraints become a trade-off between a large margin and a small error penalty.


The Karush-Kuhn-Tucker (KKT) conditions play a central role in both the theory and practice of constrained minimization; they show that this problem is equivalent to solving the equations obtained by setting the derivatives of the Lagrangian with respect to the variables w, w_0 and α to zero. Cancelling these partial derivatives yields the dual problem in α:

    max_α { Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} α_i α_j u_i u_j (x_i · x_j) }
    subject to Σ_{i=1}^{m} α_i u_i = 0                                                 (1.6)

and w can then be computed from the α terms:

    w = Σ_{i=1}^{m} α_i u_i x_i                                                        (1.7)

The non-null Lagrange multipliers correspond to the support vectors; all other points have α_i = 0. Finally, the equation of the separating hyperplane is:

    h(x) = (w*)^T x + w_0* = Σ_{i=1}^{m} α_i* u_i (x · x_i) + w_0*                     (1.8)

where the α_i* are the non-null solutions and w_0* is found by inserting the coordinates of a support vector x_i of class u_i into w^T x_i + w_0 = 1 if u_i = +1, or into w^T x_i + w_0 = −1 if u_i = −1.

The soft margin method allows selecting a hyperplane that splits the examples as cleanly as possible while still maximizing the distance to the nearest cleanly split examples. In the case of non-linearly separable patterns, the objective function is augmented by a term that penalizes non-null slack variables ξ_i. The correct-classification constraint, initially defined by:

    u_i (w^T x_i + w_0) ≥ 1                                                            (1.9)

is relaxed through the non-negative slack variables ξ_i, which measure the degree of misclassification of the data x_i:

    u_i (w^T x_i + w_0) ≥ 1 − ξ_i                                                      (1.10)

The maximum possible margin between the hyperplane and the support vectors was initially obtained by minimizing (1/2) ||w||². If the penalty function is linear, the minimization problem becomes:

    min (1/2) ||w||² + C Σ_{i=1}^{m} ξ_i                                               (1.11)

where C is a strictly positive soft-margin parameter, also called the regularization constant. This parameter strikes a balance between the two competing criteria of margin maximization and error minimization, whereas the slack variables ξ_i indicate the distance of the incorrectly classified points from the optimal hyperplane. Figure 1.7 shows incompletely separable data and the slack variables (points on the wrong side of the classifier).

Figure 1.7. Hyperplane soft margin for non-linearly separable data (x_1 is misclassified while x_2 is properly classified although its ξ_i is non-null).

1.3.2. SVM kernel
The problem of a nonlinear classifier is addressed by a nonlinear transformation of the inputs into a high-dimensional feature space, in which the probability of linear separation is high. This transformation of the input data into a high-dimensional feature space is achieved using kernel functions. The method of linear separation presented previously is somewhat limited; the introduction of kernel functions allows this limitation to be overcome. The main idea of the nonlinear extension is to map the data from the original input space to a feature space and to find a hyperplane with a large margin in that feature space. For an enlarged feature space, consider a mapping Φ as follows:

    Φ : ℝ^N → H                                                                        (1.12)

Figure 1.8. Data not linearly separable in 2-D are mapped into three dimensions, where a linear decision surface between the classes can be found.

The mapping Φ takes the training data from ℝ^N to some higher-dimensional Euclidean space H, which may even have infinite dimension. In this high-dimensional space, the data can be linearly separable, hence the linear SVM formulation above can be applied. In the SVM formulation, the training data only appear in the form of dot products x_i · x_j. These can be replaced by dot products in the space H:

    K(x_i, x_j) = Φ(x_i) · Φ(x_j)                                                      (1.13)

In the equation of the separating hyperplane (1.8), the dot product (x · x_i) may thus be replaced by any kernel function K(x, x_i) providing a dot product. The equation of the separating hyperplane becomes:

    h(x) = (w*)^T x + w_0* = Σ_{i=1}^{m} α_i* u_i K(x, x_i) + w_0*                     (1.14)

where the α_i* are the solutions of:

    max_α { Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} α_i α_j u_i u_j K(x_i, x_j) }
    subject to Σ_{i=1}^{m} α_i u_i = 0                                                 (1.15)

The kernel functions commonly used are:
- Linear kernel: K(x, x_i) = (x · x_i)
- Gaussian Radial Basis Function (RBF) kernel: K(x, x_i) = exp(−γ ||x − x_i||²) for γ > 0, sometimes parametrized using γ = 1 / (2σ²)
- Polynomial kernel: K(x, x_i) = (x · x_i + c)²
- Sigmoid (hyperbolic tangent) kernel: K(x, x_i) = tanh(x · x_i + c)
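These four kernels are one-liners; a sketch in NumPy (the parameter defaults are illustrative):

```python
import numpy as np

def linear_kernel(x, xi):
    return np.dot(x, xi)

def rbf_kernel(x, xi, gamma=0.5):
    # gamma = 1 / (2 * sigma**2) in the sigma parametrization
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(xi)) ** 2))

def polynomial_kernel(x, xi, c=1.0):
    return (np.dot(x, xi) + c) ** 2

def sigmoid_kernel(x, xi, c=0.0):
    return np.tanh(np.dot(x, xi) + c)

x = np.array([1.0, 2.0])
xi = np.array([1.0, 2.0])
```

For identical inputs the RBF kernel is exactly 1 (zero distance), while the polynomial kernel gives (x · x_i + c)² = (5 + 1)² = 36 here.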


The effectiveness of an SVM depends on the type of kernel selected, the kernel's parameters and the soft-margin parameter C. A common choice is the Gaussian kernel, which has a single parameter γ. The best combination of C and γ is selected using cross-validation [26].
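A cross-validated grid search over C and γ can be sketched with scikit-learn (the grid values and 3-fold setting are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

digits = load_digits()

# Cross-validate every (C, gamma) pair and keep the best combination.
param_grid = {'C': [1, 10, 100], 'gamma': [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=3)
search.fit(digits.data, digits.target)

best_C = search.best_params_['C']
best_gamma = search.best_params_['gamma']
```

Each of the 9 pairs is scored by 3-fold cross-validation; `best_params_` holds the winning combination, which would then be used to retrain the final classifier.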

1.3.3. Multi-class SVM
The SVM is designed to separate only two classes. Its extension to multiple classes has led to various SVM implementations [27-30]. The multiclass problem can be processed as a combination of binary classifiers. When using SVMs for handwritten digit recognition, we have a problem with ten (10) classes. Hence, several approaches are used for extending the binary SVM to the multi-class case: One-Against-All (OAA), One-Against-One (OAO) and the Directed Acyclic Graph SVM. We review in the following the main properties of these three implementations.
- One-against-all: The most intuitive approach to the multiclass SVM involves building as many binary SVM classifiers as there are classes [28, 30]. The one-against-all approach constructs N binary SVM classifiers, each of which separates one class from all the rest, and requires N comparisons for the decision. The classification of new samples is done by a winner-takes-all strategy, in which the classifier with the highest output function assigns the class:

    arg max_i f_i(x)                                                                   (1.16)

In this way, the number of unclassified data is reduced. However, when the maximum output value is obtained for two SVMs, the concerned data cannot be assigned to a class. Hence, this method leaves undecided regions of the feature space where more than one class is accepted or all classes are rejected. This is illustrated in Figure 1.9.

Figure 1.9. Regions not classified by the OAA approach for a problem of three classes (decision boundaries f_A(x) = 0, f_B(x) = 0, f_C(x) = 0).
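The winner-takes-all rule arg max_i f_i(x) can be sketched by training one binary classifier per class and taking the argmax of the N decision functions (scikit-learn assumed; a linear SVM stands in for whatever binary classifier is actually used):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

digits = load_digits()
X, y = digits.data, digits.target
classes = np.unique(y)

# One binary SVM per class: class c (label 1) versus all the rest (label 0).
models = []
for c in classes:
    clf = LinearSVC(dual=False)
    clf.fit(X, (y == c).astype(int))
    models.append(clf)

# Winner-takes-all: pick the class whose decision function f_i(x) is largest.
scores = np.column_stack([m.decision_function(X[:5]) for m in models])
pred = classes[np.argmax(scores, axis=1)]
```

Each test sample gets one score per class; ties between two maximal scores are exactly the undecided regions discussed above.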


- One-against-one: Another combination method is one-against-one, also known as pairwise coupling, which builds one SVM for each pair of classes, i.e. N(N − 1)/2 classifiers for a problem with N classes. The classification is done by a max-wins voting strategy, in which every classifier assigns the sample to one of its two classes; the vote for the assigned class is then increased by one, and finally the class with the most votes determines the sample's classification. Here f_ij(x) = 0 is the decision function that separates class i from class j, with f_ij(x) = −f_ji(x). During the classification step, the max-wins voting strategy is calculated as:

    arg max_{i=1,…,N} Σ_{j=1, j≠i}^{N} sign(f_ij(x))                                   (1.17)

However, when two classes obtain the same score, x will not be classified. This situation is shown in Figure 1.10.

Figure 1.10. Regions not classified by the OAO approach for a problem of three classes.

- Directed Acyclic Graph SVM (DAGSVM): To resolve the unclassifiable regions of the pairwise classification, Platt et al. [31] proposed organizing the one-against-one classifiers in a decision DAG (decision-tree-based pairwise classification), which uses a directed acyclic graph to reduce the number of SVM evaluations needed during the testing phase. For N classes, the one-against-one method requires evaluating N(N − 1)/2 classifiers, while the DAG method requires evaluating only N − 1 of them. Fig. 1.11 shows the decision tree for three classes. In the figure, "not i" indicates that x does not belong to class i. Any pair of classes can form the top-level classifier of the tree. At each non-leaf node, if f_ij(x) > 0 then x does not belong to class j. For example, if f_12(x) > 0, x does not belong to Class 2; thus it can belong to Class 1 or Class 3, and the next classification pair, Classes 1 and 3, is supplied by the corresponding decision function.

(Decision DAG for three classes: the root evaluates f_12(x); depending on its sign, the next node evaluates f_13(x) or f_23(x), whose leaves assign Class 1, 2 or 3.)

Figure 1.11. Directed acyclic graph SVM method.
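The DAG traversal described above can be sketched as follows. This is an illustrative pure-Python sketch, not the thesis implementation; the pairwise scores are hypothetical stand-ins for trained one-against-one SVM outputs, using the convention that f_ij(x) > 0 eliminates class j.

```python
# Sketch of DAGSVM testing (Figure 1.11): starting from the full candidate
# list, each pairwise decision f_ij eliminates one class, so only N-1 of
# the N(N-1)/2 trained classifiers are evaluated per sample.

def dag_predict(classes, pairwise_score):
    """pairwise_score(i, j) returns f_ij(x) for the sample under test,
    with f_ij(x) > 0 meaning 'x does not belong to class j'."""
    candidates = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]
        evaluations += 1
        if pairwise_score(i, j) > 0:
            candidates.pop()       # x is not class j
        else:
            candidates.pop(0)      # x is not class i
    return candidates[0], evaluations

# Toy three-class problem in which class 3 wins (hypothetical scores):
scores = {(1, 2): 0.8, (1, 3): -0.6, (2, 3): -0.9}
label, n_eval = dag_predict([1, 2, 3], lambda i, j: scores[(i, j)])
print(label, n_eval)  # only N - 1 = 2 evaluations for N = 3 classes
```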

1.4. Summary

The objective of this chapter was to overview the main modules composing a character recognition system. Depending on the application, each module has its own importance. In our case, we focus on the development of a digit recognition system built on three main modules: feature generation, segmentation of the digit string into isolated digits, and classification based on the SVM. In the next chapter, we introduce our system with a brief overview of the recognition of isolated handwritten digits.


CHAPTER 2: ISOLATED HANDWRITTEN DIGIT RECOGNITION

This chapter investigates the combination of different statistical and structural features for the recognition of isolated handwritten digits, a classical pattern recognition problem. The objective of this study is to improve the recognition rates by combining different representations of non-normalized handwritten digits. These features include some global statistics, moments, profile and projection based features, and features computed from the contour and skeleton of the digits. Some of these features are extracted from the complete image of the digit while others are extracted from different regions of the image by first applying a uniform grid sampling to the image. Classification is carried out using one-against-all SVMs. The experiments conducted on the NIST SD19 isolated handwritten digit database and the CVL Single Digit Database realized high recognition rates, which are comparable to state-of-the-art methods on this subject.

2.1. Introduction

Handwriting recognition has been a premier research problem of the document analysis and recognition community for over three decades. Its subproblems mainly include line, word or character level segmentation, recognition of isolated characters, words or complete lines/paragraphs, and recognition of numerical strings and isolated digits. Among these different modalities of handwriting recognition, this chapter focuses on the recognition of isolated digits, a classical pattern recognition problem that offers a wide range of applications. Unlike alphabets, the ten glyphs of the most commonly used Western Arabic numerals are shared by many scripts and languages around the world, making them globally acceptable. The main challenges in handwritten digit recognition arise from variations in size, shape and slant and, most importantly, from the differences in the writing styles of individuals. With the recent advancements in image analysis and pattern classification, sophisticated digit recognition systems have been proposed which aim to enhance the overall recognition performance by improving the feature generation and/or classification techniques used. Some studies aim to improve the classification performance by using a combination of multiple classifiers, while others aim to combine multiple features and select the most pertinent and optimal set of features for this problem.

24

In this chapter, we are interested in enhancing the feature generation step for isolated digit recognition while avoiding digit normalization. The idea is to find a combination of multiple features which improves the overall recognition rates by minimizing the intra-class variability and maximizing the inter-class variability [21-23], the most desirable requirement of any pattern recognition system.

2.2. Overview of isolated handwritten digit recognition

Over the years, various handwritten isolated digit recognition systems reporting high recognition rates have been proposed. Most of these systems have been evaluated on the widely used MNIST database [32]. Among significant contributions to digit recognition, LeCun et al. [32] presented a comprehensive comparison of different classification algorithms on the recognition task. Heutte et al. [33] proposed a combination of seven different features to feed a linear discrimination based classifier. Cai and Liu [34] presented an approach that integrates both statistical and structural information for the recognition of unconstrained handwritten numerals. Dong et al. [35] extracted a set of gradient features, while Teow and Loe [36] computed linearly separable features from the MNIST database and applied a triowise linear support vector machine with soft voting for classification. Belongie et al. [37] developed a novel similarity measure by finding the correspondences between points in two shapes and estimating an aligning transform; the proposed matching technique achieved high recognition rates when applied to digit recognition. In another notable contribution, Lauer et al. [38] proposed a trainable feature extractor based on the LeNet5 neural network architecture, and classification carried out using Support Vector Machines realized promising recognition rates. A comprehensive survey on handwritten digit recognition on the CENPARMI, CEDAR and MNIST databases can be found in [39]. Recently, the handwritten digit recognition competition [40] held in conjunction with ICDAR 2013 also provided a platform for the comparison of state-of-the-art digit recognition techniques under the same experimental conditions. In this competition, we participated with two methods for handwritten digit recognition (Tébessa I and Tébessa II).
In the following, these techniques are briefly described:

• Salzburg I method: The approach is based on the Finite Impulse Response Multilayer Perceptron (FIR MLP). First, the color images are transformed to 8-bit gray-scale images. These gray-scale images are then resized to 20×20 pixels and their center of mass is computed. Each scaled image is positioned by its center of mass in the center of a 28×28 pixel image, and each pixel value is normalized into the range [−1, 1]. For the experiments, a neural network framework has been adapted to the FIR MLP. This method uses one partially connected FIR MLP with four layers.

• Salzburg II method: The description of this method is similar to the previous one (Salzburg I), but it uses an ensemble of four partially and fully connected FIR MLPs with four layers.

• Orand method: The approach is based on the combination of four descriptors which exploit three different characteristics of digit images. For the pre-processing, a thresholding operation using Otsu's method is applied. For the feature generation, three different characteristics of the digits are exploited: the stroke orientations based on the HOG approach, the relation between background and foreground based on concavities, and the contour. In particular, the image is divided into 2×2 regions and, for each region, a histogram of orientations using 32 bins is computed; the descriptor is produced by concatenating the region histograms. The image is also resized to 40×40 pixels and a thinning operation is applied; the profile with respect to the left and right sides is computed, yielding an 80-dimensional descriptor. Finally, the digit descriptor is the concatenation of the four described descriptors, yielding a 240-dimensional vector. For classification, a multi-class SVM classifier with an RBF kernel is used; the cost parameter is set to 6 and the gamma parameter to 1.4.

• Jadavpur method: The system is based on a Fuzzy-Entropy-based feature selection strategy over a combination of Quad-tree-based longest-run and convex-hull-based feature sets with SVM classifiers. The system is finally developed using the 190 selected features.

• Paris Sud method: The images are pre-processed as follows: first, color is removed; second, the images are down-sampled to a 20×20 resolution; finally, they are placed on a 28×28 grid by centering their center of gravity. The Hamming tree algorithm and the AdaBoost.MH implementation are then used as the classifier. The chosen classifier has 47,642 trees of 4 nodes (5 leaves) each. In each boosting iteration, 100 random Haar filters are tested, chosen uniformly from the possible geometries.

• François Rabelais method: First, the image is cropped to the bounding box of the digit, deleting the white border, and is surrounded with a 1-pixel margin. It is then magnified to a final size of 128×128, and a skew and slant normalization is applied. The black pixels are considered as input data points and are reduced using the k-means algorithm. The Delaunay triangulation is built on the input data points, and a multilevel static uniform zoning is computed at K distinct orders. For each cell of a grid, two values are appended: the number of input elements in the cell, and the average of input elements in the neighborhood of the cell; both the centers of gravity of the triangles and the black pixels within the cell are input elements. An SVM classifier (libSVM) with an RBF kernel is trained and then used for the prediction.


• Hannover method: The numerals are normalized, binarized and slope-corrected. The feature vector is composed of three parts: first, the number of black pixels in each row and column; second, the lengths of 12 probes in different directions from different positions; and third, normalized central moments. The digits are classified with a k-nearest-neighbor classifier.

• Tébessa I method: This method combines two structural features computed without the uniform grid method (background features and foreground features of the skeleton) with three features for which the image is divided into four regions using the uniform grid method (global features, the ridgelet transform and foreground features). The recognition module is based on the SVM multi-class approach using the one-against-all implementation; the SVM regularization parameter C and the RBF kernel parameter σ are fixed to constant values.

• Tébessa II method: This method is based on multi-scale run-length features, which are determined on the binary image taking into consideration both the black pixels corresponding to the ink trace and the white pixels corresponding to the background; the probability distributions of black and white run lengths are used. The proposed method includes global features, the ridgelet transform, background features, foreground features and multi-scale run-length features, completing the system with a multi-class SVM classifier based on the one-against-all approach.

The objective of our study is to find a combination of features which achieves high recognition rates on non-normalized isolated handwritten digits. We have considered global and local, structural and statistical features [22-23, 41] in our work. The features considered in our study include Hu's moment invariants, the skew angle, Zernike moments, projection and profile based features, background and foreground features and the Ridgelet transform. The proposed approach aims to combine these different features to best represent the digits.

2.3. Size normalization

To evaluate our technique on the same database after normalization, we used a size normalization module which adjusts the size, position and shape of the digit image in order to reduce the shape variation between images. In this way, the feature generation task is facilitated and the classification is expected to improve [42]. In our case, a method based on bilinear interpolation was selected, which assigns to each target point a linear combination of the four nearest source points and then performs the inverse transformation. For example, suppose that the gray levels u(i1, i2) of an image are known at the points with integer coordinates (0, 0), (1, 0), (0, 1), (1, 1); assigning a gray-scale value at the point (x, y) is effected by bilinear interpolation [43], defined by:

u(x, y) = (1 − x)(1 − y) u(0,0) + (1 − x) y u(0,1) + x (1 − y) u(1,0) + x y u(1,1)    (2.1)


Other types of interpolation can be defined by using more points with integer coordinates. By these methods, it is possible to increase the resolution of images artificially. In our work, the image size is reduced to a normalized size of 24 × 24 pixels.
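The normalization step can be sketched as follows. This is a minimal pure-Python illustration of Eq. (2.1) with inverse mapping, not the thesis's actual implementation; a real system would typically rely on an image-processing library.

```python
# Size normalization by bilinear interpolation (Eq. 2.1): each target pixel
# is mapped back into the source image, and its gray level is a linear
# combination of the four nearest source pixels.

def normalize_size(img, out_h=24, out_w=24):
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # inverse transformation into source coordinates
            y = r * (in_h - 1) / (out_h - 1)
            x = c * (in_w - 1) / (out_w - 1)
            i, j = int(y), int(x)
            dy, dx = y - i, x - j
            i2, j2 = min(i + 1, in_h - 1), min(j + 1, in_w - 1)
            out[r][c] = ((1 - dx) * (1 - dy) * img[i][j]
                         + (1 - dx) * dy * img[i2][j]
                         + dx * (1 - dy) * img[i][j2]
                         + dx * dy * img[i2][j2])
    return out

src = [[0, 100], [100, 200]]       # tiny 2x2 test image
dst = normalize_size(src, 3, 3)    # upsample to 3x3
print(dst[1][1])                   # center pixel: 100.0
```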

2.4. Feature generation

Feature generation aims to express input data using a numerical representation or a set of symbols (coding) in order to select the best set of features (feature vector) for a particular problem, digit recognition in our case [22-23, 33, 39, 43]. Features are generally categorized into global or local and statistical or structural features. Statistical features represent pattern classes by statistical measures, while structural features use formal structures for data representation. Commonly used statistical features include moments, descriptors and geometrical measurements. Examples of structural features include bends, end points, intersections, loops and measures of concavity [44]. Structural properties can sometimes also be expressed using statistical measures. Each of these types of features can be extracted globally from the objects (digits, characters, words or paragraphs) under study or locally from small regions of these objects; however, each feature is more suited to one type or the other, giving rise to global and local features. In the following sections, we provide an overview of the features used in our study.

2.4.1. Global features

Global features are computed from the image of the digit as a whole, and include the following.

2.4.1.1. Density

The density is computed by counting the number of black pixels and dividing by the total number of pixels.

2.4.1.2. Center of gravity

The center of gravity, sometimes called the center of mass, is the point of an image at which its mass is concentrated. Moments characterize images in a way that has analogies to statistics. Generally, an image may be considered as a Cartesian density distribution function I(x, y), where I(x, y) is the intensity of the pixel. The two-dimensional geometric moment of order (p + q) of a function I(x, y) is then defined as:

m_pq = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} x^p y^q I(x, y) dx dy    (2.2)

where x^p y^q is the basis or weighing kernel with p, q = 0, 1, 2, …, ∞. The basis may have properties that are passed on to the moments, producing descriptors that are invariant to scale, translation or rotation.


In order to express the above equation in discrete form, the image is sampled into square unit pixels:

m_pq = Σ_x Σ_y x^p y^q I(x, y)    (2.3)

The zeroth-order moment defines the total mass of the image (for a segmented image, its area, i.e. the total number of pixels):

m_00 = Σ_{x=1}^{M} Σ_{y=1}^{N} I(x, y)    (2.4)

The two first-order geometric moments define the center of gravity of the image, i.e. its center of coordinates:

x̄ = m_10 / m_00 ,  ȳ = m_01 / m_00    (2.5)

These two coordinates represent the center of gravity of the digit image. The central moments in discrete form can then be expressed as:

μ_pq = Σ_{x=1}^{M} Σ_{y=1}^{N} (x − x̄)^p (y − ȳ)^q I(x, y)    (2.6)
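The density, geometric moments and center of gravity described above can be sketched as follows. This is an illustrative pure-Python sketch of Eqs. (2.3)–(2.6) for a binary image given as a list of rows of 0/1 pixels, not the thesis implementation.

```python
# Global features from Eqs. (2.3)-(2.6): raw moments, center of gravity
# and central moments of a binary digit image.

def raw_moment(img, p, q):
    # m_pq = sum_x sum_y x^p y^q I(x, y)   (Eq. 2.3)
    return sum((x ** p) * (y ** q) * img[x][y]
               for x in range(len(img)) for y in range(len(img[0])))

def center_of_gravity(img):
    m00 = raw_moment(img, 0, 0)                       # total mass (Eq. 2.4)
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00

def central_moment(img, p, q):
    xb, yb = center_of_gravity(img)                   # Eq. (2.5)
    return sum(((x - xb) ** p) * ((y - yb) ** q) * img[x][y]
               for x in range(len(img)) for y in range(len(img[0])))

digit = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]                          # a tiny vertical stroke
density = raw_moment(digit, 0, 0) / 9        # black pixels / total pixels
print(density, center_of_gravity(digit))     # centered at (1.0, 1.0)
```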

2.4.1.3. Second order geometrical moments

The second-order geometric moments are a statistical measure of the distribution of pixels around the center of gravity. The second-order moments m_20 and m_02 describe the distribution of the mass of the image with respect to the coordinate axes. In mechanics, they are called the moments of inertia, and they may be used to determine the principal axes, the image ellipse and the radii of gyration.

2.4.1.4. Number of transitions

The number of white-to-black transitions (or vice versa) is counted along each line of pixels in the four principal directions.

2.4.2. Hu's Moment Invariants

Hu proposed the application of moment invariants to image analysis and object representation problems in [45, 46]. Since then, they have been effectively applied to a large number of shape matching and similar problems. Hu's seven moments are invariant with respect to position, scale and orientation; they capture information on the image area, centroid and orientation, and are calculated using combinations of second- and third-order normalized central moments. The normalized central moments are given by:

η_pq = μ_pq / μ_00^γ    (2.7)

where

γ = (p + q)/2 + 1 ,  for all (p + q) ≥ 2    (2.8)


Hu's seven invariant moments are then calculated by:

φ1 = η20 + η02    (2.9)

φ2 = (η20 − η02)² + 4η11²    (2.10)

φ3 = (η30 − 3η12)² + (3η21 − η03)²    (2.11)

φ4 = (η30 + η12)² + (η21 + η03)²    (2.12)

φ5 = (η30 − 3η12)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] + (3η21 − η03)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]    (2.13)

φ6 = (η20 − η02)[(η30 + η12)² − (η21 + η03)²] + 4η11(η30 + η12)(η21 + η03)    (2.14)

φ7 = (3η21 − η03)(η30 + η12)[(η30 + η12)² − 3(η21 + η03)²] − (η30 − 3η12)(η21 + η03)[3(η30 + η12)² − (η21 + η03)²]    (2.15)

The seventh moment invariant, φ7, is also skew invariant. These seven invariant moments are calculated for each binary digit image.
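The construction of Eqs. (2.7)–(2.10) can be sketched for the first two invariants; the remaining five follow the same pattern from the normalized central moments. This is an illustrative pure-Python sketch for a binary image, not the thesis implementation.

```python
# Hu's first two invariants from normalized central moments (Eqs. 2.7-2.10),
# for a binary image given as a list of rows of 0/1 pixels.

def hu_first_two(img):
    pts = [(x, y) for x in range(len(img))
                  for y in range(len(img[0])) if img[x][y]]
    m00 = len(pts)                                  # mass = mu_00
    xb = sum(x for x, _ in pts) / m00
    yb = sum(y for _, y in pts) / m00
    def mu(p, q):                                   # central moment (Eq. 2.6)
        return sum(((x - xb) ** p) * ((y - yb) ** q) for x, y in pts)
    def eta(p, q):                                  # normalized moment (Eq. 2.7)
        return mu(p, q) / m00 ** ((p + q) / 2 + 1)
    phi1 = eta(2, 0) + eta(0, 2)                              # Eq. (2.9)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2  # Eq. (2.10)
    return phi1, phi2

# phi1 is identical for a stroke and its 90-degree rotated copy:
bar_h = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
bar_v = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(hu_first_two(bar_h)[0] == hu_first_two(bar_v)[0])  # True
```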

2.4.3. Skew

The skew or orientation of the digit is calculated using the Radon transform of the image [47, 48]. The Radon transform of the image is the sum of the Radon transforms of each pixel in the image. The Radon function takes parallel-beam projections of the image from different angles, and the skew angle is determined from the maximum value of the Radon function, which is used as a feature.

2.4.4. Zernike moments

Zernike moments [49] have been widely employed in a wide variety of pattern recognition problems, and we use them in our study for characterizing the digits. For efficient computation of Zernike moments, we implemented the method in [49], which is based on recurrence relations for fast computation of the radial polynomials of Zernike moments. Zernike complex moments are constructed using a set of Zernike polynomials, a sequence of orthogonal polynomials on the unit disk, which can be expressed as:

V_{n,m}(x, y) = V_{n,m}(ρ, ϑ) = R_{n,m}(ρ) e^{imϑ}    (2.16)

where i = √−1, n is the order of the radial polynomial and m is a positive or negative integer representing the repetition of the azimuthal angle, subject to the conditions n − |m| even and |m| ≤ n. From x and y we obtain ρ and ϑ by simple conversion to polar coordinates (x = ρ cos ϑ, y = ρ sin ϑ). The radial polynomial R_{n,m}(ρ) is defined as:

R_{n,m}(ρ) = Σ_{s=0}^{(n−|m|)/2} (−1)^s (n − s)! / [ s! ((n + |m|)/2 − s)! ((n − |m|)/2 − s)! ] ρ^{n−2s}    (2.17)


Zernike moments for a digit image can be obtained by making use of the complex conjugate property Z_{n,m} = Z*_{n,−m}. Skipping directly to the formulation of the moments in discrete form:

Z_{n,m} = (n + 1)/π Σ_x Σ_y I(x, y) V*_{n,m}(x, y) ,  x² + y² ≤ 1    (2.18)

where I(x, y) is the image function and * denotes the complex conjugate, with order n and repetition m. The procedure for computing Zernike moments can be seen as an inner product between the image function and the Zernike basis function. If an image function I(ρ, ϑ) having Zernike moments Z_{n,m} is rotated counterclockwise by an angle α, the transformed image function is I^r(ρ, ϑ) = I(ρ, ϑ − α); substituting this into Eq. (2.18) gives:

Z^r_{n,m} = Z_{n,m} e^{−imα} and |Z^r_{n,m}| = |Z_{n,m}|    (2.19)

The amplitudes of the Zernike moments |Z_{n,m}| are therefore rotation invariant. Invariance to scale and translation can be obtained by shifting and scaling the image before the computation of the Zernike moments. In our implementation, we compute Zernike moments up to the fourth order.
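The radial polynomial of Eq. (2.17) can be sketched directly from the factorial formula. Note that the thesis uses faster recurrence relations from [49]; this direct evaluation is only an illustrative sketch.

```python
# Zernike radial polynomial R_{n,m}(rho) of Eq. (2.17), computed directly
# from the factorial formula (illustrative; a recurrence is faster).

from math import factorial

def radial_poly(n, m, rho):
    m = abs(m)
    assert (n - m) % 2 == 0 and m <= n, "requires n - |m| even and |m| <= n"
    return sum((-1) ** s * factorial(n - s)
               / (factorial(s)
                  * factorial((n + m) // 2 - s)
                  * factorial((n - m) // 2 - s))
               * rho ** (n - 2 * s)
               for s in range((n - m) // 2 + 1))

# R_{2,0}(rho) = 2*rho^2 - 1, a well-known low-order case:
print(radial_poly(2, 0, 1.0))   # 1.0
print(radial_poly(2, 0, 0.0))   # -1.0
```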

2.4.5. Projections

Horizontal and vertical projections are determined by counting the total number of text pixels in each row/column of the image. These values are normalized by the width/height of the image, and the mean and variance of these projections are used as features in our study.

2.4.6. Profile Features

Left and right profiles are computed by considering, for each image row, the distance between the first text pixel and the left (right) boundary of the digit image. Like the projections, the profiles are normalized to the interval [0, 1], and the mean and variance of these profiles are employed as features.
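The projection and profile features can be sketched together as follows. This is an illustrative pure-Python sketch for a binary image (1 = text pixel), not the thesis implementation; here the eight mean/variance values of both feature groups are returned in one vector.

```python
# Projection and profile features: normalized per-row/column text-pixel
# counts and left/right profiles, reduced to their means and variances.

def mean_var(v):
    m = sum(v) / len(v)
    return m, sum((x - m) ** 2 for x in v) / len(v)

def projection_profile_features(img):
    h, w = len(img), len(img[0])
    h_proj = [sum(row) / w for row in img]                       # row counts
    v_proj = [sum(img[r][c] for r in range(h)) / h for c in range(w)]
    left = [row.index(1) / w if 1 in row else 1.0 for row in img]
    right = [row[::-1].index(1) / w if 1 in row else 1.0 for row in img]
    feats = []
    for v in (h_proj, v_proj, left, right):
        feats.extend(mean_var(v))         # mean and variance of each group
    return feats                          # 8 values in total

digit = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 0]]
print(len(projection_profile_features(digit)))  # 8
```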

2.4.7. Background features

The background feature vector is based on concavity information. These features are aimed at capturing the topological and geometrical properties of the digits. Each concavity feature is the number of white pixels belonging to a specific concavity configuration [41]. The label for each white pixel is chosen based on the Freeman code with four directions; each direction is explored until a black pixel or the extremity of the digit is met.


Figure 2.1. Different configurations of concavity

In addition to the nine standard concavity configurations, we also consider five additional configurations to more accurately model the loops in digits. These configurations are illustrated in Figure 2.1 and produce a 14-dimensional feature vector, which is normalized between 0 and 1. Figure 2.2 shows the concavity labels of the background pixels for a sample of digit '9'.

Figure 2.2. Concavity labels for digit ‘9’
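The per-pixel concavity exploration can be sketched as follows. This is an illustrative pure-Python sketch: for one white pixel, each of the four principal directions is scanned until a black pixel or the image border is met, and the set of "blocked" directions determines the pixel's concavity configuration. The direction names and return format are hypothetical.

```python
# Concavity labeling of a background (white) pixel: explore the four
# principal directions until a black pixel (ink) or the border is met.

DIRS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def concavity_label(img, r, c):
    """Return the directions in which a black pixel is found from (r, c)."""
    h, w = len(img), len(img[0])
    blocked = []
    for name, (dr, dc) in DIRS.items():
        y, x = r + dr, c + dc
        while 0 <= y < h and 0 <= x < w:
            if img[y][x] == 1:          # hit the ink trace
                blocked.append(name)
                break
            y, x = y + dr, x + dc
    return tuple(sorted(blocked))

# A white pixel inside a loop is blocked on all four sides:
ring = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
print(concavity_label(ring, 1, 1))  # ('down', 'left', 'right', 'up')
```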


2.4.8. Foreground features

The foreground features are computed from two different representations of the digit: the contour and the skeleton. Each of these is discussed in the following.

2.4.8.1. Contour Based Features

These features are aimed at capturing the dominant orientations in the shape of the digit and are computed from the contour of the digit image. The contour is detected using morphological operations [18, 41, 44, 50] and is represented by Freeman chain codes, traversing the pixels in the clockwise direction. This generates a string of codes in the interval [1, 8] for the contour of a digit. The normalized histogram of these codes is then computed and used as a feature to characterize the digit (Figure 2.3). This histogram of contour chain codes is effective in capturing the dominant stroke directions (horizontal, vertical or diagonal); however, these features are very sensitive to noise and also fail to capture the structure and topology of the digit.

Figure 2.3. Contour detection: (A) contour of the upper region, (B) feature vector, and (C) the 8 Freeman directions
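The normalized chain-code histogram can be sketched as follows. This is an illustrative pure-Python sketch assuming the contour is already available as a sequence of Freeman codes in [1, 8]; the example code string is hypothetical.

```python
# Normalized histogram of Freeman chain codes (codes 1..8): count each
# direction along the contour and normalize by the contour length.

def chaincode_histogram(codes):
    hist = [0] * 8
    for c in codes:
        hist[c - 1] += 1
    n = len(codes)
    return [h / n for h in hist]

# Hypothetical contour codes of a mostly horizontal stroke:
codes = [1, 1, 1, 5, 5, 5, 1, 5]
print(chaincode_histogram(codes))  # peaks at directions 1 and 5
```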

2.4.8.2. Skeleton Based Features

These features are computed from the skeleton of the digit image. The skeleton is computed [2, 8] and searched for end points, crossings (interconnections) and (horizontal and vertical) directional points.


The neighborhood N(p) of a current pixel p on the skeleton is calculated using the 8 Freeman directions in clockwise order as:

N(p) = Σ_{i=1}^{8} I(np_i)    (2.20)

where np_i is the i-th neighbor of p. An end point is then defined when N(p) = 1, a directional point when N(p) = 2, and an Interconnection Point (IP) when N(p) > 2. Figure 2.4 illustrates examples of each type of point in a digit. The normalized histogram of the occurrences of these points in a digit is used as a feature.

Figure 2.4. Features extracted from the skeleton

These skeleton based features capture the structural information of the digit but, like the contour based features, they are also sensitive to noise in the image.
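The point classification of Eq. (2.20) can be sketched as follows. This is an illustrative pure-Python sketch for a skeleton given as a binary grid, not the thesis implementation; the label names are hypothetical.

```python
# Skeleton point classification via Eq. (2.20): N(p) is the number of
# skeleton pixels among the 8 neighbors of p. N = 1 marks an end point,
# N = 2 a directional point, and N > 2 an interconnection point.

NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]   # the 8 Freeman directions

def classify_point(skel, r, c):
    h, w = len(skel), len(skel[0])
    n = sum(skel[r + dr][c + dc]
            for dr, dc in NEIGHBORS
            if 0 <= r + dr < h and 0 <= c + dc < w)
    if n == 1:
        return "end"
    if n == 2:
        return "directional"
    return "interconnection" if n > 2 else "isolated"

# A small T-shaped skeleton: the junction pixel has three neighbors.
skel = [[1, 1, 1],
        [0, 1, 0],
        [0, 1, 0]]
print(classify_point(skel, 0, 1))  # interconnection
print(classify_point(skel, 2, 1))  # end
```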

2.4.9. Ridgelet Transform

The Ridgelet transform, defined by Candès and Donoho [51], has been effectively employed for pattern recognition [52]. The Ridgelet transform combines the Radon and wavelet transforms: the Radon transform has the ability to detect lines in the image, while the wavelet transform detects line singularities along the Radon slices. The Ridgelet transform has been successfully applied to a number of problems including image compression, image transform coding, character recognition, watermarking, texture based image retrieval and biometric identification [52]. The Ridgelet transform is based on the Radon transform, which is computed along several angular directions. The Radon coefficients correspond to projections representing the shadow of the shape


at each angle [51]. Consequently, significant linear features in any direction are expressed by high magnitudes. Thus, in order to characterize linear singularities, the one-dimensional wavelet transform is applied on the Radon slices to yield the Ridgelet coefficients. Hence, along the Radon axis projection the Ridgelet is constant, while in the direction orthogonal to these ridges it is a wavelet [53]. For an image I(p1, p2), the Ridgelet transform can be computed by first calculating the Radon transform as defined in [52]:

T_rad(θ, r) = ∬ I(p1, p2) δ(p1 cos(θ) + p2 sin(θ) − r) dp1 dp2    (2.21)

where δ, θ and r are the Dirac distribution and the angular and radial variables, respectively. The 1-D wavelet transform is then applied on each Radon slice T_rad(θ, r) in order to obtain the Ridgelet coefficients. The sum of the normalized values of the coefficients is used as a feature.

2.4.10. Region sampling: uniform grid

Uniform grid sampling [54] is applied to the image of the digit, which allows extracting features from different regions of the image separately. A uniform grid creates rectangular sampling regions, each of the same size and shape. For a given image of size H × L, the positions of the horizontal and vertical grid lines are determined as follows:

p_i = [i × k/n] ,  i = 1, 2, …, n − 1    (2.22)

where p is the vector of line positions, n is the number of horizontal or vertical regions, and k is the width or height of the image. Figure 2.5 illustrates an example of a digit split into a 2×2 grid. Once the image is divided into different regions, features are extracted from each region separately. This allows a different level of granularity, and features extracted from similar regions of two digits can be compared with one another, allowing more effective matching.

Figure 2.5. Example of splitting a digit using a uniform grid (2x2).
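The grid-line positions of Eq. (2.22) can be sketched as follows. This is an illustrative pure-Python sketch; the bracket in Eq. (2.22) is interpreted here as rounding to the nearest integer, which is an assumption.

```python
# Uniform grid sampling (Eq. 2.22): positions of the grid lines that split
# an image dimension of size k into n equal regions, plus a helper that
# splits a binary image into an n x n grid of sub-images.

def grid_lines(k, n):
    # p_i = [i * k / n], i = 1..n-1 (rounding assumed for the bracket)
    return [round(i * k / n) for i in range(1, n)]

def grid_regions(img, n):
    """Split a binary image (list of rows) into an n x n list of sub-images."""
    h, w = len(img), len(img[0])
    rows = [0] + grid_lines(h, n) + [h]
    cols = [0] + grid_lines(w, n) + [w]
    return [[[r[cols[j]:cols[j + 1]] for r in img[rows[i]:rows[i + 1]]]
             for j in range(n)] for i in range(n)]

print(grid_lines(24, 2))   # [12]
print(grid_lines(24, 4))   # [6, 12, 18]
```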


A summary of the features used in our study, along with the dimensionality of each, is given in Table 2.1.

Feature   Description                         Dimension
f1        Global features                     8
f2        Hu's Moment Invariants              7
f3        Skew Angle                          1
f4        Zernike Moments                     50
f5        Projection Histograms               4
f6        Profile based features              4
f7        Background features                 14
f8        Histogram of contour chain code     8
f9        Skeleton based features             4
f10       Ridgelet transform                  1

Table 2.1. Summary of features

2.5. Recognition

The proposed recognition engine is based on the SVM multi-class approach using the one-against-all implementation [28, 29]. The features discussed in the previous section are extracted from the training data set (to be discussed in the next section) and are fed to the SVM. Two important parameters required for training the SVM are the regularization parameter C and the Radial Basis Function (RBF) kernel parameter σ. Thus, some properties should be considered when setting C and σ:

• A low value of C (