
ISOLATED HANDWRITTEN LATIN AND DEVANAGARI NUMERAL RECOGNITION USING FOURIER DESCRIPTORS AND CORRELATION

R.V.KULKARNI
Department of Technology, Shivaji University, Kolhapur 416004, India.

P.N.VASAMBEKAR
Department of Computer Science, Shivaji University, Kolhapur 416004, India.

ABSTRACT

Automatic recognition of handwritten numerals is important in many practical applications. In this communication we propose an efficient automatic recognition system for isolated handwritten Latin and Devanagari numerals. Features based on Fourier descriptors are extracted and input to a feedforward backpropagation neural network for classification. Numeral recognition is also performed by a template matching classifier based on a correlation metric. Using a total of 10000 training samples, the proposed technique is tested on a total of 2360 handwritten Latin and Devanagari numerals extracted from dates on bank cheques. Average recognition accuracies of 98.42% and 99.63% are obtained with the artificial neural network (ANN) classifier and the template matching (TM) classifier, respectively.

KEY WORDS

Handwritten numeral recognition, Neural Networks, Fourier Descriptors, Correlation, Devanagari numerals

1. INTRODUCTION

Numeral recognition plays a significant role in document image analysis and recognition. Applications such as the automated processing of bank cheques, postal documents, income tax forms and historic documents all need a number recognition module. Different approaches to handwritten script identification have been proposed in the literature. Surveys of feature extraction in handwritten script recognition are available [1,2]. B.P.Chacko and Babu A.P. [3] compared different statistical and structural features. The statistical feature extraction methods [4-7] are based on zoning, moments, F-ratio etc. The structural features [8-11] are obtained from geometrical and topological properties of a character, such as endpoints and chain codes. Different classifiers, including Artificial Neural Network (ANN), Hidden Markov Model (HMM), Nearest Neighbor (NN), K-Nearest Neighbor (KNN), Template Matching and Support Vector Machine (SVM), have been reported for recognition [12,13]. Various combination strategies that result in more accurate recognition [9, 14-16] and work on Devanagari character recognition [17-20] have also been reported.

In the present communication we report an automatic recognition system for handwritten Latin and Devanagari numerals extracted from dates on bank cheques, using features based on Fourier descriptors and two classifiers: Artificial Neural Network (ANN) and Template Matching (TM).

2. DATA COLLECTION AND PREPROCESSING

To prepare the training database of digits 0-9, 5000 Latin numerals and 5000 Devanagari numerals are collected from 100 writers with a variety of handwriting styles. Each writer contributed 5 unconstrained samples per digit (0-9) per script (Latin and Devanagari). The Latin and Devanagari numerals for testing are obtained from handwritten dates written in a fixed format on bank cheques; 170 cheques are written in Latin script and 130 in Devanagari, by different professionals. In preprocessing, skew is eliminated using the Radon transform and noise is reduced with a 3x3 median filter. The isolated numerals extracted from the date field of the bank cheques are then converted into normalized binary images. Preprocessed binary image samples are shown in Fig.1.

Fig.1 Preprocessed Latin and Devanagari numerals 1 to 3
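The noise reduction and binarization steps described in this section can be sketched in Python as follows, using only the standard library. The border handling (edge pixels reuse their nearest neighbour), the threshold of 128 and the toy image patch are assumptions made for illustration and are not taken from the paper; skew correction by the Radon transform is not shown.

```python
from statistics import median


def median_filter_3x3(img):
    """Apply a 3x3 median filter to a grayscale image given as a list of rows.

    Border pixels are handled by clamping indices to the image (an assumption).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [
                img[min(max(r + dr, 0), h - 1)][min(max(c + dc, 0), w - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)
    return out


def binarize(img, threshold=128):
    """Map dark ink (< threshold) to 1 and background to 0 (threshold is hypothetical)."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


# Toy 5x6 grayscale patch: a two-pixel-wide dark stroke plus one isolated dark speck.
patch = [
    [255, 0, 0, 255, 255, 255],
    [255, 0, 0, 255, 255, 255],
    [255, 0, 0, 255, 255, 0],
    [255, 0, 0, 255, 255, 255],
    [255, 0, 0, 255, 255, 255],
]

noisy = binarize(patch)
clean = binarize(median_filter_3x3(patch))

for row in clean:
    print("".join(str(px) for px in row))
```

In this toy case the isolated speck is removed while the two-pixel stroke survives; a stroke only one pixel wide would be erased by the same filter.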

3. PROPOSED SYSTEM

The proposed system layout is shown in Fig.2.

Fig.2 Proposed System Layout

Binarization, skew elimination, noise removal and digit extraction are applied to the handwritten Latin and Devanagari bank cheque samples to obtain isolated numerals from their date fields. After normalization to a size of 40x40 pixels, these preprocessed isolated digits are passed on for classification.

4. FOURIER DESCRIPTORS BASED FEATURE EXTRACTION

The boundary coordinates of a test numeral are obtained at 128 points, and 64-dimensional Fourier descriptors are computed from them. Each coordinate pair s(k) = [x(k), y(k)] of the 128-point boundary can be treated as a complex number,

$$s(k) = x(k) + j\,y(k), \qquad k = 0, 1, 2, \ldots, 127. \tag{1}$$

That is, the x-axis is treated as the real axis and the y-axis as the imaginary axis of a sequence of complex numbers. The discrete Fourier transform (DFT) of s(k) is

$$a(u) = \frac{1}{128} \sum_{k=0}^{127} s(k)\, e^{-j 2\pi u k / 128}, \qquad u = 0, 1, \ldots, 127. \tag{2}$$

The complex coefficients a(u) are called the Fourier descriptors of the boundary. The inverse Fourier transform of these coefficients restores s(k):

$$s(k) = \sum_{u=0}^{127} a(u)\, e^{j 2\pi u k / 128}, \qquad k = 0, 1, \ldots, 127. \tag{3}$$

If only the first 64 coefficients are used instead of all of them, the result is the following approximation to s(k):

$$\hat{s}(k) = \sum_{u=0}^{63} a(u)\, e^{j 2\pi u k / 128}, \qquad k = 0, 1, \ldots, 127. \tag{4}$$

The resulting 64-dimensional feature vector is input to the ANN classifier to identify the best-matching numeral.
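Equations (1) to (4) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the ellipse used as a test boundary and the function names are hypothetical, and a direct O(K^2) DFT is written out for clarity instead of an FFT.

```python
import cmath
import math

N_POINTS = 128  # boundary samples, as in the text
N_KEEP = 64     # descriptors retained, as in the text


def fourier_descriptors(boundary):
    """Eq. (1)-(2): treat (x, y) as x + jy and return a(u) for u = 0..K-1."""
    K = len(boundary)
    s = [complex(x, y) for x, y in boundary]
    return [
        sum(s[k] * cmath.exp(-2j * math.pi * u * k / K) for k in range(K)) / K
        for u in range(K)
    ]


def reconstruct(a, n_keep):
    """Eq. (3)-(4): rebuild the boundary from the first n_keep descriptors."""
    K = len(a)
    return [
        sum(a[u] * cmath.exp(2j * math.pi * u * k / K) for u in range(n_keep))
        for k in range(K)
    ]


# Hypothetical test boundary: an ellipse centred at (20, 20), sampled at 128 points.
boundary = [
    (20 + 15 * math.cos(2 * math.pi * k / N_POINTS),
     20 + 10 * math.sin(2 * math.pi * k / N_POINTS))
    for k in range(N_POINTS)
]

a = fourier_descriptors(boundary)
features = a[:N_KEEP]              # 64-dimensional complex feature vector
exact = reconstruct(a, N_POINTS)   # Eq. (3): all 128 coefficients
approx = reconstruct(a, N_KEEP)    # Eq. (4): first 64 coefficients only

max_err = max(abs(p - q) for p, q in zip(approx, exact))
print(len(features), abs(a[0]), abs(a[1]), max_err)
```

For this ellipse the only non-zero descriptors are a(0), a(1) and a(127), so the reconstruction from the first 64 coefficients omits the a(127) term and differs from the exact boundary by that component.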

5. ANN CLASSIFICATION

A feedforward backpropagation neural network with two hidden layers of 80 and 10 neurons, respectively, and an output layer of 10 neurons corresponding to the 10 digit classes 0 to 9 is implemented for the classification of numerals. The ANN is trained on a total of 10000 numeral samples (5000 Latin and 5000 Devanagari) of digits 0-9 collected from 100 writers from various professions. For testing, a total of 2360 numerals (1342 Latin and 1018 Devanagari) are extracted from the date fields of the 300 collected handwritten bank cheques.

6. CORRELATION BASED RECOGNITION

Each digit class 0-9 is represented by prototype pattern vectors known as templates. An unknown pattern is assigned to the class to which it is closest in terms of the correlation metric, computed as the cross-correlation of the test image with each template. A set of 1000 templates per script is extracted from the collected training samples, so each digit has 100 templates per script. For recognition by template matching, correlation coefficients between the preprocessed image of a separated test digit and all 1000 templates (100 templates per digit) are calculated, and the digit is assigned to the class of the best-matching template.

7. RESULTS

The system is run on an Intel Core i5 CPU at 2.66 GHz with 4 GB of RAM. The training time of the ANN with 5000 training samples is 2488.66 s for Latin and 2187.74 s for Devanagari numerals. Tab.1 shows the recognition accuracy for the extracted Latin numerals by ANN and by template matching with 1000 templates. Tab.2 shows the corresponding accuracy for the extracted Devanagari numerals. The average accuracy and the average recognition time per digit for ANN and template matching are presented in Tab.3. The table shows that the ANN classifier trained with 5000 samples per script provides an average accuracy of 98.42% for numeral recognition, whereas template matching with 1000 templates per script gives an average accuracy of 99.63%. Once trained, however, ANN recognition is much faster than template matching: about 0.008 s per digit against about 2.57 s per digit (Tab.3).

8. SUMMARY

An efficient method for the automatic recognition of isolated handwritten Latin and Devanagari numerals extracted from dates on bank cheques is implemented using Fourier descriptors and two classifiers (ANN and TM). A novel aspect of the present system is that it requires no thinning step. Promising recognition accuracies of 98.42% and 99.63% are achieved by the ANN classifier and by TM, respectively. The response time for numeral recognition depends on the number of templates and the number of training samples in the database. The developed system can be used to recognize numeric data on various document images and, by applying contextual rules in an integrated manner, to interpret that data.
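The template matching rule of Section 6 can be sketched in Python as below. The sketch assumes that the correlation metric is the Pearson correlation coefficient between two equally sized binary images, which the paper does not state explicitly; the 3x3 toy templates and the function names are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient between two equally sized images (lists of rows)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # constant images carry no shape information


def classify(test_img, templates):
    """Return (label, score) of the template with the highest correlation.

    `templates` is a list of (label, image) pairs; every template is visited once.
    """
    best_label, best_score = None, -2.0  # correlation is always >= -1
    for label, tmpl in templates:
        score = correlation(test_img, tmpl)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical 3x3 binary templates standing in for the 40x40 digit images.
templates = [
    ("1", [[0, 1, 0], [0, 1, 0], [0, 1, 0]]),
    ("7", [[1, 1, 1], [0, 0, 1], [0, 0, 1]]),
]
test_digit = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]  # a "1" with one extra pixel

label, score = classify(test_digit, templates)
print(label, round(score, 2))
```

Each test digit is compared with every template, so the running time of this rule grows linearly with the number of templates, in line with the remark above on response time.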

Tab.1 Latin numeral recognition result by ANN and TM

Digits   Test samples   ANN, 5000 training samples          TM, 1000 templates
(Latin)  (170 dates)    Accuracy (%)   Time per digit (s)   Accuracy (%)   Time per digit (s)
0        292            97.26          0.007                100            2.59
1        230            95.22          0.007                100            2.56
2        240            89.58          0.007                98.75          2.56
3        43             100            0.01                 100            2.62
4        78             100            0.007                100            2.62
5        68             100            0.008                100            2.62
6        56             100            0.008                100            2.57
7        62             100            0.008                100            2.55
8        83             91.57          0.008                100            2.54
9        190            96.84          0.008                99.47          2.59
TOTAL    1342           97.05 avg      0.008 avg            99.82 avg      2.58 avg

Tab.2 Devanagari numeral recognition result by ANN and TM

Digits   Test samples   ANN, 5000 training samples          TM, 1000 templates
(Dev.)   (130 dates)    Accuracy (%)   Time per digit (s)   Accuracy (%)   Time per digit (s)
0        239            98.33          0.009                100            2.56
1        237            99.58          0.008                99.58          2.54
2        173            100            0.01                 97.11          2.55
3        47             100            0.006                100            2.58
4        31             100            0.007                100            2.60
5        62             100            0.006                100            2.58
6        38             100            0.007                100            2.58
7        36             100            0.006                100            2.60
8        31             100            0.007                100            2.58
9        124            100            0.006                97.58          2.55
TOTAL    1018           99.79 avg      0.007 avg            99.43 avg      2.57 avg

Tab.3 Average accuracy rate and average time/digit by ANN and TM

Script                ANN, 5000 training samples                   TM, 1000 templates
                      Average accuracy (%)   Time per digit (s)    Average accuracy (%)   Time per digit (s)
Latin                 97.05                  0.008                 99.82                  2.58
Devanagari            99.79                  0.007                 99.43                  2.57
Avg. per classifier   98.42                  0.008                 99.63                  2.57
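The "avg" rows of Tab.1 and Tab.2 appear to be unweighted means over the ten digit classes, and Tab.3 averages the two per-script figures for each classifier. The Python sketch below reproduces that arithmetic from the tabulated Latin ANN values. The sample-weighted mean is an additional quantity computed here only for comparison; it is not reported in the paper.

```python
# Per-digit values copied from Tab.1 (Latin numerals, ANN classifier).
samples = [292, 230, 240, 43, 78, 68, 56, 62, 83, 190]
ann_acc = [97.26, 95.22, 89.58, 100, 100, 100, 100, 100, 91.57, 96.84]

# Unweighted mean over the ten digit classes, matching the "avg" row of Tab.1.
macro_avg = sum(ann_acc) / len(ann_acc)

# Sample-weighted mean, for comparison only (not a figure from the paper).
weighted_avg = sum(n * a for n, a in zip(samples, ann_acc)) / sum(samples)

# Tab.3: mean of the Latin and Devanagari averages for each classifier.
ann_overall = (97.05 + 99.79) / 2
tm_overall = (99.82 + 99.43) / 2

print(f"Latin ANN, unweighted mean over digits: {macro_avg:.3f}")
print(f"Latin ANN, sample-weighted mean:        {weighted_avg:.3f}")
print(f"ANN average over scripts:               {ann_overall:.3f}")
print(f"TM average over scripts:                {tm_overall:.3f}")
```

Because the digit classes have very different sample counts (for example 292 zeros against 43 threes), the two ways of averaging need not agree.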

REFERENCES

[1] Anil K. Jain and Torfinn Taxt. Feature Extraction Methods for Character Recognition-A Survey, Pattern Recognition, 1996, Vol. 29, No. 4, PP.641-662.
[2] Ms. Snehal Dalal and Mrs. Latesh Malik. A Survey of Methods and Strategies for Feature Extraction in Handwritten Script Identification, 1st Int. Conf. on Emerging Trends in Engineering and Technology, 2008.
[3] Binu P Chacko and Babu Anto P. Comparison of Statistical and Structural Features for Handwritten Numeral Recognition, Int. Conf. on Computational Intelligence and Multimedia Applications, 2007.
[4] S. Impedovo; G. Pirlo; R. Modugno; A. Ferrante. Zoning Methods for Hand-Written Character Recognition: An Overview, 12th Int. Conf. on Frontiers in Handwriting Recognition, 2010.
[5] Satish Kumar and Chandan Singh. A Study of Zernike Moments and its use in Devnagari Handwritten Character Recognition, Int. Conf. on Cognition and Recognition, 2005, PP.514-520.
[6] Reena Bajaj; Lipika Dey; Santanu Chaudhury. Devanagari Numeral Recognition by Combining Decision of Multiple Connectionist Classifiers, Sadhana, 2002, Vol. 27, Part 1, PP.59-72.
[7] T. Wakabayashi; U. Pal; F. Kimura; Y. Miyake. F-ratio Based Weighted Feature Extraction for Similar Shape Character Recognition, 10th Int. Conf. on Document Analysis and Recognition, 2009.
[8] Saima Farhan; Muhammad Abuzar Fahiem; Huma Tauseef. Geometrical Features Based Approach for the Classification and Recognition of Handwritten Characters, 2nd Int. Conf. in Visualization, 2009.
[9] Sushama Shelke and Shaila Apte. A Novel Multi-feature Multi-Classifier Scheme for Unconstrained Handwritten Devanagari Character Recognition, 12th Int. Conf. on Frontiers in Handwriting Recognition, 2010.
[10] S. Arora; D. Bhattacharjee; M. Nasipuri; D.K. Basu; M. Kundu; L. Malik. Study of Different Features on Handwritten Devnagari Character, 2nd Int. Conf. on Emerging Trends in Engineering and Technology, 2009.
[11] M.H.B. Zulkefly. Number Recognition System by using Chain code Technique, 2010.
[12] Alceu de S. Britto Jr et al. A Two-Stage HMM-Based System for Recognizing Handwritten Numeral Strings, 6th Int. Conf. on Document Analysis and Recognition, 2001.
[13] G.G. Rajput and S.M. Mali. Fourier Descriptor based Isolated Marathi Handwritten Numeral Recognition, Int. Journal of Computer Applications, 2010, Vol. 3, No. 4.
[14] U. Bhattacharya; S. K. Parui; B. Shaw; K. Bhattacharya. Neural combination of ANN and HMM for handwritten Devnagari Numeral Recognition, 10th IWFHR, 2006, PP.613-618.
[15] Promod Kumar Sharma. Multiple Classifiers for Unconstrained Offline Handwritten Numeral Recognition, Int. Conf. on Computational Intelligence and Multimedia Applications, 2007.
[16] P. Zhang; T. D. Bui; C. Y. Suen. Hybrid Feature Extraction and Feature Selection for Improving Recognition Accuracy of Handwritten Numerals, 8th Int. Conf. on Document Analysis and Recognition, 2005.
[17] U. Pal; R. K. Roy; K. Roy; F. Kimura. Indian Multi-Script Full Pin-code String Recognition for Postal Automation, 10th Int. Conf. on Document Analysis and Recognition, 2009.
[18] M. Hanmandlu; J. Grover; V. K. Madasu; S. Vasikarla. Input Fuzzy Modeling for the Recognition of Handwritten Hindi Numerals, Int. Conf. on Information Technology, 2007.
[19] C. Vasantha Lakshmi; Ritu Jain; C. Patvardhan. Handwritten Devnagari Numerals Recognition with higher accuracy, Int. Conf. on Computational Intelligence and Multimedia Applications, 2007.
[20] U. Pal; T. Wakabayashi; F. Kimura. Comparative Study of Devnagari Handwritten Character Recognition using Different Feature and Classifiers, 10th Int. Conf. on Document Analysis and Recognition, 2009.
