Available online at www.sciencedirect.com

ScienceDirect Procedia Computer Science 73 (2015) 416 – 425

The International Conference on Advanced Wireless, Information, and Communication Technologies (AWICT 2015)

Persian handwritten digit recognition using ensemble classifiers

Hossein Karimi^a,*, Azadeh Esfahanimehr^b, Mohammad Mosleh^b, Faraz Mohammadian jadval ghadam^c, Simintaj Salehpour^c, Omid Medhati^a

^a Sama Technical and Vocational Training College, Islamic Azad University, Yasouj Branch, Yasouj, Iran.
^b Department of Computer Science and Engineering, Islamic Azad University of Dezful, Dezful, Iran.
^c Department of Computer Science and Engineering, The Pooya University of Yasouj, Yasouj, Iran.

Abstract

Optical character recognition (OCR) comprises three main stages: pre-processing, feature extraction and classification. Pre-processing removes noise and smooths and normalizes the input data, which helps separate patterns in the feature space. Feature extraction assigns each sample a feature vector that represents it in the feature space and distinguishes it from the other samples; the choice of features strongly affects classification. Classification then draws boundaries between feature vectors so that the samples of each class are clearly separated from those of the other classes. Persian handwritten digit recognition is a branch of pattern recognition. In this paper, a method is proposed to recognize Persian handwritten digits. The proposed framework consists of the same three stages: pre-processing, feature extraction and classification. In the feature extraction stage, an appropriate and complementary set of 115 features is extracted from each Persian handwritten digit. In the classification stage, ensemble classifiers are used to separate the classes. The method was evaluated on the TMU (Tarbiat Modares University) digit database, and the best recognition rate for Persian handwritten digits was 95.280%.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the International Conference on Advanced Wireless, Information, and Communication Technologies (AWICT 2015).

* Corresponding author. Tel.: +98-0937 913 0864; E-mail addresses: [email protected] (H. Karimi), [email protected] (A. Esfahanimehr), [email protected] (M. Mosleh), [email protected] (F. Mohammadian), [email protected] (S. Salehpour), [email protected] (O. Medhati).

1877-0509 © 2015 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of the International Conference on Advanced Wireless, Information, and Communication Technologies (AWICT 2015) doi:10.1016/j.procs.2015.12.018

Hossein Karimi et al. / Procedia Computer Science 73 (2015) 416 – 425


Keywords: Persian handwritten digit recognition, pre-processing, feature extraction, classification, ensemble classifier.

1. Introduction

Image processing and analysis began to develop in the 1960s. Handwriting recognition is a branch of image processing with many uses, such as machine reading of postal addresses, checks and banknotes, old documents, barcodes, polls and tax forms [6]. Optical Character Recognition (OCR) is a field of research in pattern recognition, artificial intelligence and machine vision. It refers to the mechanical or electronic translation of images of handwritten, typewritten or printed text into machine-editable text. Today, accurate recognition of machine-printed characters is largely considered a solved problem. Handwritten character recognition is comparatively difficult, however, because different people have different handwriting styles, so handwritten OCR remains a subject of active research [7]. Handwritten character recognition has been a popular field of research for the last three decades [4, 5, 6, 8, 9, 10]. The first research on character recognition was carried out on Chinese and Japanese scripts [11]. Many researchers have studied online recognition of handwritten Latin characters, and that work continues [3]. Persian is the official language of more than 150 million people worldwide [13]. Various methods have been proposed for recognizing English handwritten digits, and high recognition rates have been reported. For Farsi and Arabic handwritten digits, however, relatively few works have been reported, and their results are less satisfactory [12]. While recognition of typed text and of limited sets of handwritten scripts has been solved to a great extent, recognizing handwriting and converting it into editable documents still faces many problems. There is a long way to go before recognition systems work without constraints and machines can match human reading of handwritten and printed documents.

Research on recognizing English handwriting began about half a century ago, whereas research on Persian and Arabic started roughly twenty years later [12]. The main challenge in recognizing Persian handwritten digits is achieving high recognition rates, which is complicated by the similarity between some Persian digits. Developing new and better recognition techniques is therefore important. Persian and Arabic digits are not all identical, but they are similar enough that both variants need to be studied (Figure 1). In addition, although Persian is written from right to left, numbers and formulae in Persian are written from left to right, like their English counterparts [14].

Figure 1: Sample (Persian/Arabic) handwritten and printed digits.

As shown in Figure 2, a typical OCR system consists of three major phases: pre-processing, feature extraction and classification, where the output of each phase is the input to the next. Pre-processing binarizes the images, removes noise, smooths, thins or thickens the strokes, and normalizes the input data, which helps separate patterns in the feature space. In the feature extraction phase, arguably the most important, each sample is assigned a feature vector that represents it in the feature space and distinguishes it from the other samples. The choice of feature type is application-dependent, and suitable features are usually found by heuristic evaluation of the input data. Good feature extraction, which isolates patterns in the feature space, greatly reduces the amount of data passed to the classification stage. Choosing more suitable features usually both increases the recognition rate and reduces recognition time. Many feature extraction methods have been shown to improve recognition rates. Papers on pattern recognition published in the last ten years have used feature extraction methods such as wavelets [1], profiles [3], structural features [7], Zernike moments, gradients [24], the Fourier transform [6], Kirsch [24], crossing counts, projection histograms [3], local chain codes, the Discrete Cosine Transform (DCT), vertical and horizontal projections, statistical features [24], geometrical features [24] and shadow coding. In the classification stage, boundaries are drawn between feature vectors so that the samples of each pattern are clearly separated from the other samples. Classification methods include Support Vector Machines (SVMs) [2, 25], Neural Networks (NNs) [6], K-Nearest Neighbours (KNN), decision trees, Hidden Markov Models (HMMs), fuzzy logic [26], genetic algorithms, Bagging, statistical classifiers, Fisher's linear discriminant, hybrid classifiers [23] and Bayesian decision theory.
The classification stage may also combine several classifiers to improve results. Combining complementary classifiers is common practice in pattern recognition and improves both recognition rates and system reliability.

Figure 2: The core stages of OCR in Persian handwritten digit recognition

So far, numerous studies on Persian OCR have been conducted, based on various feature extraction methods and classifiers. Most of them, however, have faced practical challenges, the most important being the need for high accuracy, which is especially demanding in Persian because some Farsi digits are similar to one another. The aim of this study is to build an offline recognition system with improved performance and accuracy by extracting fewer but more significant and complementary features. Each of the pre-processing, feature extraction and classification stages was therefore designed with care. In the classification stage we use ensemble classifiers to increase the recognition accuracy for Persian handwritten digits. The rest of this paper is organized as follows. Section 2 introduces the underlying concepts of this study. Section 3 presents the framework of the proposed method for recognizing Persian handwritten digits. Section 4 presents the experimental results of the proposed method. Finally, Section 5 summarizes the work, draws conclusions and suggests directions for future research.

2. Concepts (Foundations of Research)

This section introduces the ensemble classifier techniques used in this study.

2.1. Ensemble classifiers

A classifier is a function that maps a sample pattern to a class. As shown in Figure 3, an ensemble classifier consists of several individual classifiers that all contribute to the final result [16].


Figure 3: Structure of an ensemble classifier

The main idea behind an ensemble classifier is to train a group of parallel or cascaded classifiers instead of a single one, and to take advantage of their combined results [17]. The main motivations are:

• Reduced variance: the final result is less influenced by the peculiarities of any one training set. (Variance measures the dispersion of a set of values around their mean.)
• Reduced bias: combined classifiers can represent the concept of a class better than a single classifier.
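A small calculation makes the benefit of combining classifiers concrete. The Python sketch below is illustrative only and not part of the paper's method: it computes the probability that a majority vote of m classifiers is correct, under the idealized assumption of a two-class problem in which each classifier has accuracy p and the classifiers make independent errors. The helper name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(m: int, p: float) -> float:
    """Probability that more than half of m classifiers are correct, assuming
    two classes, m odd, and independent classifiers each with accuracy p."""
    if m % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    # Sum the binomial probabilities of k correct votes for every k > m/2.
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k)
               for k in range(m // 2 + 1, m + 1))


if __name__ == "__main__":
    for m in (1, 5, 15):
        print(m, round(majority_vote_accuracy(m, 0.7), 4))
```

With p = 0.7, one classifier is right 70% of the time, while five independent ones voting together are right about 83.7% of the time. Real ensemble members trained on the same data make correlated errors, so the gain in practice is smaller.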

Examples of ensemble classifiers include Boosting [18], Bagging [19] and Random Forest.

2.1.1. Bagging classifier

Bootstrap aggregation, or Bagging, is an ensemble classification method that trains each member classifier on a sample drawn with replacement from the original dataset. An analogy helps: a patient seeking a diagnosis can, instead of asking one doctor, consult several and decide based on their combined judgment. In Bagging, each classifier plays the role of one of those doctors. The training set of each classifier is built by drawing N samples at random, with replacement, from the original training set, where N is the size of that set. Some samples may therefore appear several times in a given bootstrap set, and others may not appear at all. Because each member classifier sees a different sample of the training data, with some data left out, an individual classifier may have a higher error, but combining the classifiers' results reduces the overall error. Leo Breiman showed that Bagging works best with unstable learning algorithms, such as decision trees and neural networks, whose output changes substantially with small changes in the training data [20, 21]. Algorithm 1 gives the pseudo-code for Bagging:


For m = 1 to M                        // M: number of iterations
    Draw (with replacement) a bootstrap sample S_m of the data
    Learn a classifier C_m from S_m
For each test example
    Try all classifiers C_m
    Predict the class that receives the highest number of votes

Algorithm 1. Bagging classifier
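Algorithm 1 can be turned into a short runnable Python sketch. It is not the WEKA implementation the authors used. As an assumption made only to keep the example small, the base learner is a one-level decision stump (the hypothetical helper `fit_stump`), and ties in the vote are broken by `Counter.most_common`.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Hypothetical base learner: the one-feature threshold split with the
    fewest training errors; each side predicts its majority class."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_cls for lab in left)
                      + sum(lab != r_cls for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    if best is None:
        # Every sample has identical features: fall back to the majority class.
        cls = Counter(y).most_common(1)[0][0]
        return lambda x: cls
    _, f, t, l_cls, r_cls = best
    return lambda x: l_cls if x[f] <= t else r_cls


def bagging_fit(X, y, n_models=25, seed=0):
    """Training loop of Algorithm 1: one classifier C_m per bootstrap sample S_m."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Draw n indices with replacement: the bootstrap sample S_m.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Prediction step of Algorithm 1: every C_m votes and the majority wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.1], [0.3], [0.4], [0.6], [0.8], [0.9]]
    y = [0, 0, 0, 1, 1, 1]
    models = bagging_fit(X, y)
    print([bagging_predict(models, x) for x in ([0.2], [0.85])])
```

Because each bootstrap sample draws n items with replacement, it contains on average a fraction 1 - (1 - 1/n)^n of the distinct training samples, which tends to 1 - 1/e ≈ 0.632 for large n; the remaining samples are left out of that classifier's training set.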

2.1.2. Boosting classifiers

In the analogy of the previous section, a patient consulted several doctors to determine his illness. Now assume the patient values each diagnosis differently and assigns it a weight; the final conclusion combines these weighted judgments. This is the nature of Boosting. In this method, each training sample is assigned a weight, and a series of classifiers is trained successively on the data. After classifier M_i is trained, the weights are updated so that classifier M_{i+1} pays more attention to the training samples that M_i misclassified. The final ensemble combines the results of all classifiers to produce the classification result [21, 22]. AdaBoost is a well-known variant of Boosting. It builds a set of classifiers, each of which casts a weighted vote. It works on a set D of d labeled samples (x_1, y_1), (x_2, y_2), ..., (x_d, y_d), where y_j is the label of x_j. AdaBoost initializes the weight of each training sample to 1/d. In k rounds, the algorithm builds k classifiers. In round i, samples are drawn from D with replacement to form a training set D_i of size d, so a sample may be drawn several times; the probability of drawing a sample depends on its weight. Model M_i is trained on D_i, and its error is then computed using D_i as the test set. The weight of each training sample is then adjusted according to how it was classified: correctly classified samples have their weights reduced, so after normalization the misclassified samples carry relatively more weight. The weights determine the training samples for the next round's classifier. The main idea is to build a classifier that focuses on the samples misclassified in the previous round, so that it may classify some of them better than the earlier classifiers did. Algorithm 2 shows the pseudo-code for AdaBoost:

Input: D, a set of d class-labeled training tuples; k, the number of rounds (one classifier is generated per round); a classification learning scheme.
Output: A composite model.
Method:
1)  Initialize the weight of each tuple in D to 1/d;
2)  For i = 1 to k do                  // for each round
3)      Sample D with replacement according to the tuple weights to obtain D_i;
4)      Use training set D_i to derive a model M_i;
5)      Compute error(M_i), the weighted error rate of M_i on D_i;
6)      If error(M_i) > 0.5 then
7)          Reinitialize the weights to 1/d;
8)          Go back to step 3 and try again;
9)      End if;
10)     For each tuple in D_i that was correctly classified do
11)         Multiply the weight of the tuple by error(M_i) / (1 - error(M_i));   // update weights
12)     Normalize the weight of each tuple;
13) End for;
14) To use the composite model to classify a tuple X:
15) Initialize the weight of each class to 0;
16) For i = 1 to k do                  // for each classifier
17)     w_i = log((1 - error(M_i)) / error(M_i));   // weight of the classifier's vote
18)     c = M_i(X);                    // get class prediction for X from M_i
19)     Add w_i to the weight for class c;
20) End for;
21) Return the class with the largest weight;

Algorithm 2. AdaBoost classifier.
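Algorithm 2 can likewise be sketched as runnable Python. This illustrates the textbook resampling procedure only; it is not WEKA's implementation, which the authors actually used. Assumptions: the weak learner is a hypothetical decision stump `fit_stump`; error(M_i) in step 5 is taken as the weighted fraction of misclassified tuples among the distinct tuples drawn into D_i; the error is clamped to at least 1e-10 so that log((1 - error)/error) stays finite; and the retry of steps 6-8 is capped by a hypothetical `max_retries` parameter.

```python
import math
import random
from collections import Counter, defaultdict


def fit_stump(X, y):
    """Hypothetical weak learner: best single-feature threshold split,
    each side predicting its majority class."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_cls = Counter(left).most_common(1)[0][0]
            r_cls = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_cls for lab in left)
                      + sum(lab != r_cls for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_cls, r_cls)
    if best is None:  # no split possible: predict the majority class
        cls = Counter(y).most_common(1)[0][0]
        return lambda x: cls
    _, f, t, l_cls, r_cls = best
    return lambda x: l_cls if x[f] <= t else r_cls


def update_weights(w, correct, err):
    """Steps 10-12: scale the weights of correctly classified tuples by
    err / (1 - err), then renormalize so the weights sum to 1."""
    factor = err / (1 - err)
    new = [wj * factor if ok else wj for wj, ok in zip(w, correct)]
    total = sum(new)
    return [wj / total for wj in new]


def adaboost_fit(X, y, k=10, seed=0, max_retries=20):
    """Steps 1-13 of Algorithm 2; returns a list of (model, error) pairs."""
    rng = random.Random(seed)
    d = len(X)
    w = [1.0 / d] * d                                    # step 1
    ensemble = []
    for _ in range(k):                                   # step 2: k rounds
        for _attempt in range(max_retries):
            idx = rng.choices(range(d), weights=w, k=d)  # step 3: draw D_i
            model = fit_stump([X[j] for j in idx], [y[j] for j in idx])  # step 4
            drawn = set(idx)
            # Step 5 (assumed form): weighted error over the distinct tuples in D_i.
            total = sum(w[j] for j in drawn)
            err = sum(w[j] for j in drawn if model(X[j]) != y[j]) / total
            if err <= 0.5:
                break
            w = [1.0 / d] * d                            # steps 6-8: reset, retry
        else:
            break                                        # retry cap hit: stop early
        err = max(err, 1e-10)                            # keep the vote weight finite
        correct = [j in drawn and model(X[j]) == y[j] for j in range(d)]
        w = update_weights(w, correct, err)              # steps 10-12
        ensemble.append((model, err))
    return ensemble


def adaboost_predict(ensemble, x):
    """Steps 14-21: weighted vote with w_i = log((1 - error(M_i)) / error(M_i))."""
    score = defaultdict(float)                           # class weights start at 0
    for model, err in ensemble:
        score[model(x)] += math.log((1 - err) / err)
    return max(score, key=score.get)


if __name__ == "__main__":
    X = [[i] for i in range(8)]
    y = [0, 0, 1, 0, 1, 1, 0, 1]
    ens = adaboost_fit(X, y, k=5)
    print(len(ens), [adaboost_predict(ens, x) for x in X])
```

As a worked example of steps 10-12: with d = 10 tuples of weight 0.1 and error(M_i) = 0.2 (two misclassified tuples), each of the eight correct tuples is scaled to 0.025; after normalization the correct tuples hold 0.0625 each and the two misclassified tuples 0.25 each, so half of the total weight now sits on the hard cases. That model's vote weight is log(0.8/0.2) = log 4 ≈ 1.386 (natural logarithm).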

3. The Proposed Approach

The input to the Persian handwritten digit recognition procedure is a set of digit images, and the classes are the digits to be recognized; the system must map each image to its digit. The goal of this study is to use ensemble algorithms to achieve high recognition rates. The machine is first trained on samples (a set of Persian handwritten digits) and then tested. The proposed framework consists of three core stages.

3.1. First Stage: Pre-processing

This stage, developed in Matlab®, consists of several steps.

3.1.1. Normalizing images

Because the images in the TMU database differ in size, the first step is to normalize them. We resized all images to 40×40 pixels in Matlab (xn = 40; yn = 40), a size that gave better results. A target size close to the average resolution of all images works best, since large size changes can lose information.

3.1.2. Binarization

Our algorithm requires monochrome input images, i.e. matrices of 1s and 0s. The most common binarization method is Otsu's method, which Matlab supports. Figure 4 shows an example result.

Figure 4: Example of a binary image of a digit
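The authors implemented the normalization and binarization steps above in Matlab. For readers without Matlab, a rough pure-Python sketch of the same two steps follows: resizing a grayscale image (a list of rows of integer gray levels 0-255) to 40 × 40 and binarizing it with Otsu's method. Nearest-neighbour sampling and the convention that dark ink maps to 1 are assumptions, not details given in the paper, and the function names are hypothetical.

```python
def resize_nearest(img, out_h=40, out_w=40):
    """Resize a 2-D list by nearest-neighbour sampling (no smoothing)."""
    in_h, in_w = len(img), len(img[0])
    # Each output pixel copies the source pixel its index maps onto.
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def otsu_threshold(pixels):
    """Otsu's method: the gray level t (0-255) that maximizes the
    between-class variance when pixels are split into <= t and > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    count0 = sum0 = 0
    for t in range(256):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty: no valid split at this level
        mu0, mu1 = sum0 / count0, (sum_all - sum0) / count1
        # Proportional to w0 * w1 * (mu0 - mu1)^2; the 1/total^2 factor is dropped.
        var_between = count0 * count1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(img):
    """Map dark ink to 1 and light background to 0 (assumed convention)."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p <= t else 0 for p in row] for row in img]


if __name__ == "__main__":
    gray = [[30, 220], [220, 30]]
    print(binarize(resize_nearest(gray, 4, 4)))
```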

3.1.3. Thinning the digits

This step applies a thinning function that reduces the strokes of each digit to thin skeletons, removing variation due to stroke thickness and eliminating some noise along the contours.
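The paper does not name the thinning function it used. As one widely used option, and explicitly a swapped-in technique rather than necessarily the authors' choice, the Python sketch below implements the Zhang-Suen thinning algorithm on a binary image stored as lists of 0/1 values (1 = ink). It repeatedly removes deletable boundary pixels in two alternating sub-iterations until nothing changes.

```python
def zhang_suen_thin(img):
    """Thin a binary image (lists of 0/1, 1 = ink) with Zhang-Suen."""
    h, w = len(img), len(img[0])
    # Pad with a zero border so every pixel has eight neighbours.
    g = [[0] * (w + 2)] + [[0] + list(row) + [0] for row in img] + [[0] * (w + 2)]

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel directly above.
        return [g[r - 1][c], g[r - 1][c + 1], g[r][c + 1], g[r + 1][c + 1],
                g[r + 1][c], g[r + 1][c - 1], g[r][c - 1], g[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if g[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of ink neighbours, B(P1)
                    # A(P1): 0 -> 1 transitions in the cyclic sequence P2..P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Delete in parallel, after the whole sub-iteration has been scanned.
            for r, c in to_delete:
                g[r][c] = 0
            if to_delete:
                changed = True
    return [row[1:-1] for row in g[1:-1]]
```

A one-pixel-wide stroke passes through unchanged, while a solid 3 × 3 blob collapses to its centre pixel. A known weakness of Zhang-Suen is that an isolated 2 × 2 square of ink is erased entirely, which is worth keeping in mind for very small digit fragments.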


3.2. The Second Stage: Feature Extraction

This stage extracts features from the digit images, with the aim of obtaining the most informative features.

3.2.1. First feature set

The normalized 40×40 images are divided into twenty-five 8×8 frames. The first feature set is the number of 1s in each frame, giving 25 features per image.

3.2.2. Second feature set

The second feature set is based on the distance of the 1s from the origin of the coordinate system in each frame. Since a frame may contain more than one 1, the distance of every 1 from the origin is computed and averaged, giving a mean distance for each frame and another 25 features per image (Figure 5).

Figure 5: Framing and calculating the distance and angle of 1s from the origin
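Feature sets 1 and 2 can be sketched in Python for a 40 × 40 binary image given as lists of 0/1 values. The paper does not pin down where the origin of each frame's coordinate system lies; purely as an assumption, this sketch places it at the bottom-left corner of each 8 × 8 frame, measures to pixel centres (x to the right, y upward), and uses a mean distance of 0 for frames that contain no 1s. The function names are hypothetical.

```python
import math

FRAME = 8  # a 40 x 40 image gives a 5 x 5 grid of 8 x 8 frames


def frame_ink_coords(img, fr, fc):
    """(x, y) of every 1 in frame (fr, fc), measured from the frame's
    bottom-left corner to the pixel centre (assumed convention)."""
    pts = []
    for r in range(FRAME):
        for c in range(FRAME):
            if img[fr * FRAME + r][fc * FRAME + c] == 1:
                pts.append((c + 0.5, FRAME - r - 0.5))
    return pts


def zone_features(img):
    """Feature set 1 (count of 1s) and set 2 (mean distance), frame by frame."""
    counts, mean_dists = [], []
    for fr in range(5):
        for fc in range(5):
            pts = frame_ink_coords(img, fr, fc)
            counts.append(len(pts))                     # feature set 1
            dists = [math.hypot(x, y) for x, y in pts]
            mean_dists.append(sum(dists) / len(dists) if dists else 0.0)  # set 2
    return counts, mean_dists
```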

3.2.3. Third feature set

The third feature set consists of the angles θ between the horizontal axis and the lines connecting each 1 to the origin of the coordinate system (Figure 5). To obtain 25 features, θ is computed for every 1 in a frame and averaged over the frame. θ is computed with the inverse tangent:

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$

where x and y are the coordinates of the pixel relative to the origin (see Figure 5).
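Under an assumed coordinate convention (origin at the bottom-left corner of each 8 × 8 frame, coordinates measured to pixel centres), the third feature set can be sketched as follows: for every 1, it takes the angle whose tangent is y/x, as described above, and averages per frame. Because pixel centres give x ≥ 0.5, y/x is always defined. The paper does not say whether angles are in degrees or radians; this sketch returns radians, and frames with no 1s get 0. The function name is hypothetical.

```python
import math

FRAME = 8  # a 40 x 40 image gives a 5 x 5 grid of 8 x 8 frames


def angle_features(img):
    """Feature set 3: per frame, the mean angle (radians) between the
    horizontal axis and the line from the frame origin to each 1."""
    feats = []
    for fr in range(5):
        for fc in range(5):
            angles = []
            for r in range(FRAME):
                for c in range(FRAME):
                    if img[fr * FRAME + r][fc * FRAME + c] == 1:
                        # Assumed origin: bottom-left corner, y pointing upward.
                        x, y = c + 0.5, FRAME - r - 0.5
                        angles.append(math.atan(y / x))  # inverse tangent of y/x
            feats.append(sum(angles) / len(angles) if angles else 0.0)
    return feats
```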

3.2.4. Fourth feature set

The fourth feature set consists of the first 40 coefficients of the fast Fourier transform (FFT). This feature set contributes substantially to accurate recognition. Together with the three frame-based sets (3 × 25 = 75 features), it gives the 115 features per image used for classification.

3.3. The Third Stage: Classification

The first and second stages, implemented in Matlab, end by writing their results to a Comma-Separated Values (CSV) file. This file is read by WEKA, the program that performs the third stage, classification. WEKA (Waikato Environment for Knowledge Analysis) is a free, open-source application developed at the University of Waikato in New Zealand. (The weka is also a bird endemic to New Zealand, which appears on the application's logo.) Its modern form was first released in 1997 and is licensed under the GNU General Public License (GPL). It runs on the Java platform and is available for multiple operating systems. WEKA exposes a public API that allows other applications to use its data-mining features. It provides many classifiers, including the ensemble classifiers Boosting and Bagging introduced in Section 2, which have strong pattern recognition capabilities.

4. Implementation, tests and results

4.1. Database

Many databases exist for Western European digits, but databases of Persian digits and letters have become available only in recent years. These include the TMU (Tarbiat Modares University) database, the Hoda database, the CENPARMI (Centre for Pattern Recognition and Machine Intelligence) database and MadBase. This study uses the TMU database, which provides 1699 scanned samples of handwritten Persian digits in monochrome bitmap format through its website. Figure 6 shows an example, and Table 1 gives the number of samples in each class (c0 through c9).

Figure 6: A sample of scanned handwritten Persian digits from the TMU database.

Table 1: Number of samples in the TMU database for each Persian digit

Digit class    Number of samples
0              176
1              170
2              163
3              171
4              173
5              172
6              174
7              172
8              169
9              159

4.2. Implementation and test

In this study, various experiments were conducted to achieve high recognition rates, as explained above. In the experiments on the TMU database, 1000 digit samples were used for training and the remaining 699 for testing. Matlab was used for the first and second stages, and WEKA for the third stage, with Boosting and Bagging as the ensemble classifiers. Because the choice of features and classifier strongly affects the result, we made every effort to choose the most suitable ones for the proposed framework. The time spent in the test run depends heavily on the dataset size and the hardware.

4.3. Test results

This section summarizes the results of the test run on a Sony VAIO laptop with an Intel Core i7 CPU and 4 GB of RAM. Table 2 shows the results on the TMU database.

Table 2: The result of extracting 115 features from 1000 training samples and 699 test samples

Classifier                Boosting    Bagging
Training samples          1000        1000
Examined samples          699         699
Features extracted        115         115
Modeling time (s)         25.7        93.41
Recognition rate (%)      95.280      95.14
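The evaluation protocol (1000 training and 699 test samples drawn from the 1699 samples of Table 1, scored by recognition rate) can be sketched as below. The paper does not say how the 1000/699 split was made, so the seeded random split here is an assumption; `holdout_split` and `recognition_rate` are hypothetical helpers, the latter simply returning the percentage of test samples whose predicted class equals the true class.

```python
import random


def holdout_split(n_samples, n_train, seed=0):
    """Randomly partition sample indices into training and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:]


def recognition_rate(y_true, y_pred):
    """Percentage of samples whose predicted class matches the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


if __name__ == "__main__":
    class_counts = [176, 170, 163, 171, 173, 172, 174, 172, 169, 159]  # Table 1
    n = sum(class_counts)  # 1699 samples in total
    train_idx, test_idx = holdout_split(n, 1000)
    print(n, len(train_idx), len(test_idx))
```

The class counts in Table 1 sum to 1699, matching the database size stated in Section 4.1, so a 1000-sample training set leaves exactly 699 samples for testing.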


5. Conclusion and recommendations

This study aimed at high recognition accuracy for Persian handwritten digits. Because some digits in Persian script are written in different styles, a machine may have difficulty distinguishing them. Fast and accurate handwritten digit recognition is of great importance in financial and administrative departments. A handwritten pattern recognition system includes three main stages: pre-processing, feature extraction and classification. Pre-processing removes noise and smooths and normalizes the input data, which helps separate patterns in the feature space. In the feature extraction stage, arguably the most important, each sample is assigned a feature vector that represents it in the feature space and distinguishes it from the other samples. Feature extraction depends on the application; suitable features play a significant role in increasing the recognition rate and reducing recognition time, and they strongly affect classification. In the classification stage, boundaries are drawn between feature vectors so that the samples of each pattern are clearly separated from the other samples. Although the main purpose of this study was to improve the recognition rate of Persian handwritten digits, other objectives, such as faster modeling and lower computational complexity, were also considered, and complementary features were favored when choosing the feature set. The best recognition rate of the proposed method on the TMU digit database was 95.280%. This result was obtained with 115 appropriate and complementary features extracted from the digit images, using the ensemble classifiers available in WEKA to classify the samples and evaluate the proposed method.
Boosting gave the best result (95.280%, against 95.14% for Bagging) and also needed less modeling time (25.7 s against 93.41 s). Since classifiers have different strengths and weaknesses, combining them so that each compensates for the weaknesses of the others is a direction for future research.

References

[1] Mowlaei, A., Faez, K., Haghighat, T., 2002, “Feature Extraction With Wavelet Transform for Recognition of Isolated Handwritten Farsi/Arabic Characters and Numerals”, IEEE, Digital Signal Processing, Vol. 2, pp. 923-926.

[2] Alaei, A., Pal, U., Nagabhushan, P., 2009, “Using Modified Contour Features and SVM Based Classifier for the Recognition of Persian/Arabic Handwritten Numerals”, Seventh International Conference on Advances in Pattern Recognition, IEEE, DOI: 10.1109/ICARP.2009.14.

[3] Soltanzadeh, H., Rahmati, M., 2004, “Recognition of Persian handwritten digits using image profiles of multiple orientations”, Pattern Recognition Letters, Elsevier, Vol. 25, pp. 1569-1576.

[4] Ebrahimpour, R., Davoudi Vahid, R., Mazloom Nezhad, B., 2011, “Decision Template with Gradient based features for Farsi Handwritten word recognition”, International Journal of Hybrid Information Technology, Vol. 4, No. 1.

[5] Khosravi, H., Kabir, E., 2009, “A blackboard approach towards integrated Farsi OCR system”, IJDAR, Springer, Vol. 12, pp. 21-32, DOI: 10.1007/s10032-009-0079-7.

[6] Bagheri Noaparast, K., Broumandnia, A., 2009, “Persian handwritten word recognition using Zernike and Fourier-Mellin moments”, 5th International Conference: Sciences of Electronic, Technologies of Information and Telecommunications (SETIT 2009), IEEE, Tunisia, March 22-26, 2009.

[7] Vamvakas, G., Gatos, B., Perantonis, S. J., 2010, “Handwritten character recognition through two-stage foreground sub-sampling”, Pattern Recognition, Elsevier, Vol. 43, pp. 2807-2816, doi: 10.1016/j.patcog.2010.02.018.

[8] Suen, C. Y., Berthod, M., 1980, “Automatic Recognition of Handprinted Characters: The State of the Art”, Proceedings of the IEEE, Vol. 68, No. 4, pp. 469-487.

[9] Suen, C. Y., Tappert, C. C., Wakahara, T., 1990, “The State of the Art in Online Handwriting Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 12, pp. 787-808.

[10] Trier, O. D., Jain, A. K., Taxt, T., 1996, “Feature Extraction Methods for Character Recognition: A Survey”, Pattern Recognition, Vol. 29, No. 4, pp. 641-662.

[11] Al-Usefi, H., Udpa, S. S., 1992, “Recognition of Arabic characters”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 8, pp. 853-857.

[12] Impedovo, S., Wang, P. S., Bunke, H. (Eds.), 1997, “Automatic Bankcheck Processing”, World Scientific, Singapore.

[13] Khosravi, H., Kabir, E., 2009, “A blackboard approach towards integrated Farsi OCR system”, IJDAR, Springer, Vol. 12, pp. 21-32, DOI: 10.1007/s10032-009-0079-7.

[14] Pan, W. M., Bui, T. D., Suen, C. Y., 2009, “Isolated Handwritten Farsi Numerals Recognition Using Sparse and Over-Complete Representations”, 10th International Conference on Document Analysis and Recognition.

[15] Abdelazeem, S., El-Sherif, E., 2008, “Arabic Handwritten Digit Recognition”, IJDAR, Vol. 11, pp. 127-141.

[16] Essa, E. M., Tolba, A. S., Elmougy, S., 2008, “A Comparison of Combined Classifier Architectures for Arabic Speech Recognition”.

[17] Chan, J. C.-W., Demarchi, L., Van de Voorde, T., Canters, F., 2008, “Binary Classification Strategies for Mapping Urban Land Cover With Ensemble Classifiers”, IGARSS.

[18] Garcia-Pedrajas, N., 2009, “Constructing Ensemble of Classifiers by Means of Weighted Instance Selection”, IEEE Transactions on Neural Networks, Vol. 20, No. 2.

[19] Polikar, R., “Ensemble Based Systems in Decision Making”, IEEE Circuits and Systems Magazine.

[20] Duda, R. O., Hart, P. E., Stork, D. G., 2000, “Pattern Classification”, 2nd Edition, Wiley-Interscience.

[21] Chen, L., 2007, “A New Design of Multiple Classifier System and its Application to Classification of Time Series Data”, PhD thesis, Electrical and Computer Engineering, Canada.

[22] Opelt, A., Pinz, A., Fussenegger, M., Auer, P., 2006, “Generic Object Recognition With Boosting”, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Niu, X.-X., Suen, C. Y., 2012, “A novel hybrid CNN-SVM classifier for recognizing handwritten digits”, Pattern Recognition, Elsevier, Vol. 45, pp. 1318-1325, doi: 10.1016/j.patcog.2011.09.021.

[24] Zhang, P., Bui, T. D., Suen, C. Y., 2007, “A novel cascade ensemble classifier system with a high recognition performance on handwritten digits”, Pattern Recognition, Elsevier, Vol. 40, pp. 3415-3429.

[25] Sadri, J., Suen, C. Y., Bui, T. D., 2003, “Application of support vector machines for recognition of handwritten Arabic/Persian digits”, Proceedings of the Second Iranian Conference on Machine Vision and Image Processing, Vol. 1, pp. 300-307.

[26] Hanmandlu, M., Murali Mohan, K. R., Chakraborty, S., Goyal, S., Roy Choudhary, D., 2003, “Unconstrained handwritten character recognition based on fuzzy logic”, Pattern Recognition, Elsevier, Vol. 36, pp. 603-623.